[MAJOR FEATURE] Overhaul library to use Data Validation via Pydantic #58

@gregparkes

Aim

I propose overhauling the library as it currently stands to introduce a dependency on pydantic, an excellent data validation library that works hand in glove with JSON and JSON Schema.

Let me know what you think of the proposal below, and I'd consider developing a basic PR.

Note: I'm aware that Pydantic requires Python >= 3.7, and there are comments mentioning that backwards compatibility (specifically with Python 3.6) is needed. If that is the case, the Pydantic parts could be implemented as an optional dependency.

Benefits

This would serve to provide the following benefits:

  1. Provide a clear dataclass-style model for each of the data components. This would make it easy to deprecate or change fields if modifications were introduced to the core SigMF schema.
  2. Replace the hand-written validation code for checking .sigmf-meta JSON files. Pydantic would perform type, length and semantic checking on your behalf.
  3. Significantly extend the existing validation at minimal cost (without requiring lots of extra tests) by introducing semantic and regex checks; see the sketch after this list.
  4. Minimal computational overhead (pydantic v2's validation core, pydantic-core, is written in Rust).
  5. Work seamlessly with the IQEngine package you are also developing, as Pydantic pairs well with JSON for HTTP requests, FastAPI and other common web libraries.
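
As a sketch of point 3: a regex constraint on the datatype field would let pydantic reject malformed values with no hand-written checking code. The pattern below is only a rough approximation of the SigMF datatype grammar (e.g. "cf32_le"), and uses the pydantic v2 Field(pattern=...) API:

from typing import Annotated
from pydantic import BaseModel, Field, ValidationError

class DatatypeDemo(BaseModel):
    # rough sketch of the SigMF datatype pattern, e.g. "cf32_le", "ru8"
    datatype: Annotated[str, Field(pattern=r"^[cr](f32|f64|i32|i16|i8|u32|u16|u8)(_le|_be)?$")]

DatatypeDemo(datatype="cf32_le")  # passes validation

try:
    DatatypeDemo(datatype="not-a-type")
except ValidationError as err:
    print(err)  # reports the field name, the bad value and the failed pattern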

Examples

For example, in defining the schema for the SigMF Global Object, one might write:

from typing import Optional
from pydantic import BaseModel, FilePath

# mirrors the Global Object as defined in the SigMF schema
class SigMFGlobal(BaseModel):
    sample_rate: float
    datatype: str
    author: Optional[str] = None  # optional in the schema, so defaults to None
    dataset: FilePath  # Path object which also automatically checks that the file exists on disk
    ...

and for a single meta file one would define:

from typing import Annotated
from pydantic import BaseModel, Field

class SigMFFile(BaseModel):
    # "global" is a reserved keyword in Python, hence the aliased attribute name
    global_: Annotated[SigMFGlobal, Field(alias="global")]
    # SigMFCapture and SigMFAnnotation would be defined like SigMFGlobal above
    captures: Annotated[list[SigMFCapture], Field(min_length=1)]
    annotations: list[SigMFAnnotation]

Calling BaseModel.model_dump_json() automatically converts nested BaseModel objects into a JSON-compliant string, which is trivial to write to disk. Similarly, a .sigmf-meta file read from disk could be parsed into a SigMFFile object (which extends BaseModel), validating the JSON against the schema without any custom checking code.
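
As a sketch of that round trip (assuming the SigMFFile model above and a hypothetical recording.sigmf-meta file):

from pathlib import Path

# read + validate in one step; raises pydantic.ValidationError on a bad file
meta = SigMFFile.model_validate_json(Path("recording.sigmf-meta").read_text())

# ... edit fields in Python, then serialize back out using the "core:" aliases ...
Path("recording.sigmf-meta").write_text(meta.model_dump_json(by_alias=True, indent=2))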

Pydantic can handle custom serialization formats by using aliasing, for example:

from typing import Annotated
from pydantic import BaseModel, Field, AliasChoices

class SigMFGlobal(BaseModel):
    sample_rate: Annotated[float, Field(
        # stores "core:sample_rate" in the JSON file when serialized
        serialization_alias="core:sample_rate",
        # accepts either "core:sample_rate" or "sample_rate" as the JSON key when reading a file
        validation_alias=AliasChoices("core:sample_rate", "sample_rate"),
    )]

This handles any discrepancies between the JSON naming format, e.g. "core:" or "my_custom_extension:<item>", and the Python attribute name.
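
A quick sketch of the alias behaviour using the class above (the sample rate value is illustrative):

g = SigMFGlobal.model_validate({"core:sample_rate": 48000.0})
print(g.sample_rate)                     # 48000.0
print(g.model_dump_json(by_alias=True))  # {"core:sample_rate":48000.0}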

Extensions

Pydantic would also be a great way to handle custom SigMF extensions: any custom user extension could simply subclass a predefined extension class.
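
As a rough sketch (the SigMFExtension base class and the antenna fields are purely illustrative, not existing classes in the codebase):

from typing import Annotated
from pydantic import BaseModel, Field

class SigMFExtension(BaseModel):
    """Hypothetical base class that all extension models would inherit from."""

class AntennaExtension(SigMFExtension):
    # each field serializes under the extension's own namespace
    model: Annotated[str, Field(serialization_alias="antenna:model")]
    gain_db: Annotated[float, Field(serialization_alias="antenna:gain")]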
