## Aim
I would propose to overhaul the library as it currently stands to introduce a dependency on pydantic, an excellent data validation library that works hand-in-glove with JSON and JSON schemas.
Let me know what you think of the below, and I'd consider developing a basic PR.
Note: I'm aware that Pydantic requires Python >= 3.7, and there are comments mentioning that backwards compatibility (specifically with Python 3.6) still needs to be maintained. If that is the case, the Pydantic parts could be implemented as optional if desired.
## Benefits
This would provide the following benefits:
- A clear dataclass-style model for each of the data components. This would make it easy to deprecate and change fields if modifications were introduced to the core SigMF schema.
- Replacement of the hand-written validation code for checking the `.sigmf-meta` JSON files; Pydantic performs type, length and semantic checking on your behalf.
- Significantly extended validation at minimal cost (not requiring lots of extra tests), introducing semantic and regex checks.
- Minimal computational overhead (Pydantic v2's validation core, `pydantic-core`, is written in Rust).
- Seamless interoperability with the IQEngine package you are also developing, as Pydantic works well with JSON for HTTP requests, FastAPI and other common web libraries.
## Examples
For example, in defining the schema for a SigMF Global object, one might write:

```python
from pydantic import BaseModel, FilePath

# mirrors the Global Object as defined in the SigMF schema
class SigMFGlobal(BaseModel):
    sample_rate: float
    datatype: str
    author: str | None = None
    dataset: FilePath  # Path field that also automatically checks the file exists on disk
```
...and for a single meta file one would define:

```python
from typing import Annotated

from pydantic import BaseModel, Field

class SigMFFile(BaseModel):
    global_: Annotated[SigMFGlobal, Field(alias="global")]
    captures: Annotated[list[SigMFCapture], Field(min_length=1)]
    annotations: list[SigMFAnnotation]
```

Calling `BaseModel.model_dump_json()` automatically converts nested `BaseModel` objects into a JSON-compliant string, which is trivial to store to disk. Similarly, a `.sigmf-meta` file could be parsed into a `SigMFFile` object (which extends `BaseModel`) to validate the JSON against the schema without any custom checking code.
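A round-trip sketch of that workflow (assuming pydantic v2; the stub `SigMFCapture` and `SigMFAnnotation` models here are placeholders, not the full schema):

```python
from typing import Annotated

from pydantic import BaseModel, Field

# Placeholder capture/annotation models, not the full SigMF schema.
class SigMFCapture(BaseModel):
    sample_start: int = 0

class SigMFAnnotation(BaseModel):
    sample_start: int = 0

class SigMFGlobal(BaseModel):
    sample_rate: float
    datatype: str

class SigMFFile(BaseModel):
    global_: Annotated[SigMFGlobal, Field(alias="global")]
    captures: Annotated[list[SigMFCapture], Field(min_length=1)]
    annotations: list[SigMFAnnotation] = []

meta = SigMFFile.model_validate({
    "global": {"sample_rate": 1e6, "datatype": "cf32_le"},
    "captures": [{"sample_start": 0}],
})

# Serialize to a .sigmf-meta style JSON string; by_alias restores the "global" key.
text = meta.model_dump_json(by_alias=True)

# Reading the file back validates it against the schema, no custom checking code.
parsed = SigMFFile.model_validate_json(text)
assert parsed.global_.sample_rate == 1e6
```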
Pydantic can handle custom serialization formats via aliasing, for example:

```python
from typing import Annotated

from pydantic import AliasChoices, Field

sample_rate: Annotated[float, Field(
    # stores "core:sample_rate" in the JSON file when serializing
    serialization_alias="core:sample_rate",
    # accepts either "core:sample_rate" or "sample_rate" as keys when reading a file
    validation_alias=AliasChoices("core:sample_rate", "sample_rate"),
)]
```

This helps to handle any discrepancies between the JSON naming format, e.g. "core:" or "my_custom_extension:<item>", and the Python variable name.
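A runnable sketch of this aliasing in a model (pydantic v2), showing that both spellings validate and the namespaced key is written back out:

```python
from typing import Annotated

from pydantic import AliasChoices, BaseModel, Field

# Minimal model using the aliased field from the example above.
class Global(BaseModel):
    sample_rate: Annotated[float, Field(
        serialization_alias="core:sample_rate",
        validation_alias=AliasChoices("core:sample_rate", "sample_rate"),
    )]

# Either key is accepted when reading...
g = Global.model_validate({"core:sample_rate": 1e6})
assert g.sample_rate == 1e6
assert Global.model_validate({"sample_rate": 1e6}).sample_rate == 1e6

# ...and serialization writes the namespaced key back out.
assert '"core:sample_rate"' in g.model_dump_json(by_alias=True)
```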
## Extensions
Pydantic would also be a great way to handle custom SigMF extensions: any custom user extension could simply subclass a pre-defined extension class.
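For instance, such an extension base class might be subclassed like this (a hypothetical sketch; `SigMFExtension` and the antenna field names are illustrative, not part of any existing SigMF API):

```python
from typing import Annotated

from pydantic import BaseModel, Field

# Hypothetical base class the library could ship; names are illustrative only.
class SigMFExtension(BaseModel):
    """Shared behaviour (aliasing, validation) for SigMF extensions."""

# A user extension just subclasses it and declares its namespaced fields.
class AntennaExtension(SigMFExtension):
    gain: Annotated[float, Field(serialization_alias="antenna:gain")]
    azimuth_angle: Annotated[float, Field(serialization_alias="antenna:azimuth_angle")] = 0.0

ext = AntennaExtension(gain=3.0)
assert ext.model_dump(by_alias=True)["antenna:gain"] == 3.0
```

The extension's validation and serialization then come for free from the shared base class, with no extra code in the core library.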