|
| 1 | +# Mapping Schemas to Programming Language Structures |
| 2 | + |
| 3 | +LinkML Schemas can be mapped to modeling constructs in different programming languages. |
| 4 | +This allows for *code generation*, in which a LinkML schema is used as the source to generate |
| 5 | +code in a target language. |
| 6 | + |
| 7 | +This has a number of advantages, including type safety, programmer efficiency, ease of mapping |
| 8 | +to other serializations, and helping ensure that code and domain models stay aligned. |
| 9 | + |
| 10 | +As each programming language differs in which constructs it offers and the precise |
| 11 | +semantics of these constructs, there is no single standard for mapping. Instead, we provide |
| 12 | +a set of general recommendations that can be adapted to each language. |
| 13 | + |
| 14 | +## Terminology |
| 15 | + |
| 16 | +* Programming language constructs: |
| 17 | + * `Structure`: a compound datatype that consists of one or more attributes |
| 18 | + * `Class`: a Structure that supports or partially supports inheritance |
| 19 | + * `Attribute`: a field or property of a class or struct |
| 20 | + * `Class-level variable`: a property of a class rather than of an instance of that class |
| 21 | + * `Module`: A file-level bundle of classes or structures |
| 22 | + * `Package`: A collection of modules |
| 23 | + |
| 24 | +## Mapping of LinkML Schemas |
| 25 | + |
| 26 | +### Schemas to Modules |
| 27 | + |
| 28 | +A schema SHOULD be mapped to EITHER a module or a collection of modules, depending on the idioms |
| 29 | +of the target language. |
| 30 | + |
| 31 | +For languages where it is conventional to include multiple classes or structures in |
| 32 | +a single module (e.g. Python), the schema SHOULD correspond to a module. |
| 33 | + |
| 34 | +For languages where it is conventional to include a single class or structure in |
| 35 | +a single module (e.g. Java), a single module will correspond to a single LinkML class or enum. |
| 36 | + |
| 37 | +Current implementations: |
| 38 | + |
| 39 | +| Target | Default Mapping | |
| 40 | +|---------------|---------------------| |
| 41 | +| Dataclasses | One file per schema | |
| 42 | +| Pydantic | One file per schema | |
| 43 | +| Java | One file per class | |
| 44 | +| Typescript | One file per schema | |
| 45 | + |
| 46 | +### Imports |
| 47 | + |
| 48 | +A mapping MAY choose to merge imports prior to code generation. If imports are not merged, |
| 49 | +then each `imports` entry in the SchemaDefinition MUST be mapped to an import statement in the target |
| 50 | +language. |
| 51 | + |
| 52 | +Where modules correspond to structures, there SHOULD be one import in the target language module for |
| 53 | +every import in the source LinkML schema. |
| 54 | + |
| 55 | +Mappings MAY choose to selectively import via inspection of all used elements. |
| 56 | + |
| 57 | +### Naming Conventions for Modules |
| 58 | + |
| 59 | +There MUST be a correspondence between schema `name` and module name. The mapping MAY |
| 60 | +prioritize idioms of the target language over LinkML idioms, although the mapping MUST |
| 61 | +be deterministic. |
| 62 | + |
| 63 | +For example, if the target language conventionally writes module names in `CamelCase` then a mapping MAY |
| 64 | +translate all module names using a standard camel case string transformation. |
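As a minimal sketch of a deterministic name transformation, the Python functions below convert a schema `name` to CamelCase or to an underscore form. The function names and the word-splitting rule (any run of non-alphanumeric characters separates words) are assumptions made for illustration, not the behaviour of any existing LinkML generator.

```python
import re


def _words(schema_name: str) -> list[str]:
    """Split a schema name into words on any run of non-alphanumeric characters."""
    return [w for w in re.split(r"[^0-9A-Za-z]+", schema_name) if w]


def to_camel_case(schema_name: str) -> str:
    """Hypothetical helper: upper-case the first letter of each word and join."""
    return "".join(w[0].upper() + w[1:] for w in _words(schema_name))


def to_underscore(schema_name: str) -> str:
    """Hypothetical helper: lower-case each word and join with underscores."""
    return "_".join(w.lower() for w in _words(schema_name))
```

Because both functions depend only on their input string, the same schema name always yields the same module name, which is what the determinism requirement asks for.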
| 65 | + |
| 66 | +Schema level metadata MAY be included in the header of the module. This MAY |
| 67 | +be as comments, but if the target language supports module-level variables or other |
| 68 | +ways to make schema metadata introspectable at runtime, these mechanisms SHOULD be used. |
| 69 | + |
| 70 | +Current implementations: |
| 71 | + |
| 72 | +| Target | Default Mapping | |
| 73 | +|---------------|-----------------| |
| 74 | +| Dataclasses | underscore | |
| 75 | +| Pydantic | underscore | |
| 76 | +| Java | CamelCase | |
| 77 | +| Typescript | CamelCase | |
| 78 | + |
| 79 | +## Mapping of LinkML Classes |
| 80 | + |
| 81 | +### Target Constructs |
| 82 | + |
| 83 | +Different languages support different constructs, which may all make |
| 84 | +appropriate targets for LinkML classes. For example: |
| 85 | + |
| 86 | +* Java has both Interfaces and Classes, and newer versions of Java support Records |
| 87 | +* Scala has traits and sealed traits |
| 88 | +* Rust has structs, traits, and enums |
| 89 | +* Typescript has classes and interfaces |
| 90 | + |
| 91 | +The choice should reflect whatever is most idiomatic for the target language. |
| 92 | +The generation MAY allow for different mappings, controlled either by user configuration |
| 93 | +or by properties of the schema or its elements. |
| 94 | + |
| 95 | +For example, when mapping to Rust, a generator MAY choose to map to either structs |
| 96 | +or to struct/enum/trait combinations, depending on whether polymorphism is used |
| 97 | +in the schema. |
| 98 | + |
| 99 | +A mapping MAY be non-isomorphic (i.e. not one-to-one). For example, |
| 100 | +in languages that have a split between |
| 101 | +interface-like constructs and concrete class-like constructs, a generator MAY choose to |
| 102 | +implement a mapping where each LinkML class creates *both* structures, in order |
| 103 | +to leverage the full benefits of the target language. |
| 104 | + |
| 105 | + |
| 106 | + |
| 107 | +### Class level variables |
| 108 | + |
| 109 | +A generator SHOULD map certain properties of schema elements to class level |
| 110 | +variables where the target language allows. When this mapping occurs, |
| 111 | +the names of the class level variables MUST correspond to LinkML metamodel elements, |
| 112 | +allowing for translation to language idioms. |
| 113 | + |
| 114 | +The following class-level variables are recommended: |
| 115 | + |
| 116 | +* class_class_uri: the semantic URI of the class, as defined by `class_uri` |
| 117 | +* class_class_curie: the CURIE form of class_class_uri |
| 118 | +* class_name: the normalized name of the LinkML class, corresponding to the name of the class |
| 119 | +* class_model_uri: the URI of the class within the namespace of the schema |
| 120 | + |
| 121 | +A generator MAY allow for generation of class level variables to be suppressed. |
| 122 | + |
| 123 | +A generator MAY choose to use annotations in place of class-level variables. |
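The recommended class-level variables could look roughly like this in a Python dataclass target. This is a sketch only: the class, its attributes, and all URI and CURIE values are made-up examples, and `ClassVar` is used so that the variables belong to the class rather than to each instance.

```python
from dataclasses import dataclass, fields
from typing import ClassVar, Optional


@dataclass
class Person:
    # Class-level variables: shared by the class, excluded from the dataclass fields.
    # All values below are illustrative placeholders.
    class_class_uri: ClassVar[str] = "https://schema.org/Person"
    class_class_curie: ClassVar[str] = "schema:Person"
    class_name: ClassVar[str] = "Person"
    class_model_uri: ClassVar[str] = "https://example.org/my-schema/Person"

    # Instance attributes generated from the slots of the LinkML class.
    id: str
    address: Optional[str] = None


# The metadata is introspectable at runtime without creating an instance.
instance_field_names = [f.name for f in fields(Person)]
```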
| 124 | + |
| 125 | +Current implementations: |
| 126 | + |
| 127 | +| Target | Default Mapping | |
| 128 | +|---------------|-----------------| |
| 129 | +| Dataclasses | all | |
| 130 | +| Pydantic | none (planned) | |
| 131 | +| Java | none (planned) | |
| 132 | +| Typescript | none | |
| 133 | + |
| 134 | +### Inheritance |
| 135 | + |
| 136 | +Mapping of `is_a` and `mixins` may be dependent on properties of target language |
| 137 | +constructs. |
| 138 | + |
| 139 | +Generators MAY choose to roll-down attributes from parent classes. |
| 140 | + |
| 141 | +If the target construct for a class supports single inheritance, then the is_a SHOULD |
| 142 | +correspond to the analogous construct (for example, `extends` in Java). The mixins MAY be |
| 143 | +represented using an alternative construct (such as `implements` in Java). |
| 144 | + |
| 145 | +Generators MAY choose to create type checkers for runtime inspection |
| 146 | +of instantiated classes. This SHOULD NOT be done in languages that |
| 147 | +support polymorphism and type checking natively (for example `isinstance` in Python). |
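For a target such as Python, which supports multiple inheritance, one possible mapping places the `is_a` parent first in the list of base classes and the mixins after it. The sketch below assumes a hypothetical mixin `HasAliases` that carries behaviour only; native `isinstance` then covers runtime type checks, so no generated type checker is needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NamedThing:
    id: str


class HasAliases:
    """Hypothetical mixin; deliberately not a dataclass, so it adds no fields."""

    def alias_count(self) -> int:
        return len(getattr(self, "aliases", ()))


@dataclass
class Person(NamedThing, HasAliases):  # is_a parent first, then mixins
    address: Optional[str] = None
    aliases: tuple = ()


person = Person("P:1", aliases=("Al",))
# Native polymorphism: both the is_a parent and the mixin are visible to isinstance.
is_named_thing = isinstance(person, NamedThing)
has_aliases = isinstance(person, HasAliases)
```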
| 148 | + |
| 149 | +### Mapping of class slots and attributes |
| 150 | + |
| 151 | +Mapping of class slots and attributes should be entailment-preserving, such that |
| 152 | +the semantics of the generated code in the target language corresponds to |
| 153 | +a *derived* schema. |
| 154 | + |
| 155 | +For example, consider a schema with classes: |
| 156 | + |
| 157 | +```yaml |
| 158 | +classes: |
| 159 | + NamedThing: |
| 160 | + attributes: |
| 161 | + id: |
| 162 | + Person: |
| 163 | + is_a: NamedThing |
| 164 | + attributes: |
| 165 | + address: |
| 166 | +``` |
| 167 | + |
| 168 | +The following are both valid ways to map `Person` to a target language construct: |
| 169 | + |
| 170 | +* a non-class structure with two asserted attributes, `id` and `address` (`id` has been "rolled down" from `NamedThing`) |
| 171 | +* a class structure whose structure mirrors the source LinkML, with `id` only asserted on the parent |
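The two options above can be sketched with Python dataclasses as follows. The name `PersonFlat` is hypothetical and exists only so that both variants can sit side by side; the slots are typed as optional strings because the example schema gives no ranges or cardinalities.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Option 1: a flat structure with `id` rolled down from NamedThing.
@dataclass
class PersonFlat:
    id: Optional[str] = None
    address: Optional[str] = None


# Option 2: a class hierarchy mirroring the schema; `id` is asserted on the parent only.
@dataclass
class NamedThing:
    id: Optional[str] = None


@dataclass
class Person(NamedThing):
    address: Optional[str] = None


# Both variants expose the same attributes, in the same order.
flat_names = [f.name for f in fields(PersonFlat)]
mirrored_names = [f.name for f in fields(Person)]
```

Both mappings accept the same data, which is the sense in which they correspond to the same derived schema; only the second preserves the `is_a` relationship in the target language.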
| 172 | + |
| 173 | +### Constructors |
| 174 | + |
| 175 | +Generation of constructors will be highly dependent on the target language, but the following |
| 176 | +guidelines should be followed: |
| 177 | + |
| 178 | +- if the target language allows named assignment of attributes, then this SHOULD be the default constructor style |
| 179 | +- if the target language allows positional assignment of attributes, then this MAY be allowed: |
| 180 | + - the order of attributes MUST correspond to `rank` metaslots in the derived schema, if specified |
| 181 | + - otherwise the ordering MUST correspond to the order in which slots or attributes are specified |
| 182 | + - starting from the is_a root, working down the is_a hierarchy |
| 183 | + - `slots` order prioritized over `attributes` |
| 184 | +- if the target language and idiom uses builder patterns then these may be used |
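The ordering rules for positional constructors can be sketched as a small function. `SlotSpec`, `ClassSpec`, and `positional_order` are hypothetical names, not part of any LinkML library. The sketch also makes one assumption that the guidelines leave open: when only some slots carry a `rank`, ranked slots come first and unranked slots follow in declaration order.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotSpec:
    name: str
    rank: Optional[int] = None


@dataclass
class ClassSpec:
    name: str
    is_a: Optional[str] = None
    slots: list = field(default_factory=list)       # list of SlotSpec
    attributes: list = field(default_factory=list)  # list of SlotSpec


def positional_order(class_name: str, classes: dict) -> list:
    """Return slot names in the order a positional constructor would take them."""
    # Walk up the is_a chain, then reverse so the root class comes first.
    chain = []
    current = class_name
    while current is not None:
        chain.append(classes[current])
        current = classes[current].is_a
    chain.reverse()
    # Within each class, `slots` come before `attributes`.
    declared = [s for c in chain for s in (*c.slots, *c.attributes)]
    # sorted() is stable: ranked slots sort by rank, unranked keep declaration order.
    ordered = sorted(declared, key=lambda s: (s.rank is None, s.rank or 0))
    return [s.name for s in ordered]


classes = {
    "NamedThing": ClassSpec("NamedThing", attributes=[SlotSpec("id")]),
    "Person": ClassSpec(
        "Person",
        is_a="NamedThing",
        slots=[SlotSpec("name")],
        attributes=[SlotSpec("address")],
    ),
}
order = positional_order("Person", classes)
```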
| 185 | + |
| 186 | +### Mapping of schema-level slots |
| 187 | + |
| 188 | +### Mapping of constraints and rules |
| 189 | + |
| 190 | +Constraints and rules in LinkML SHOULD be mapped to declarative target |
| 191 | +constructs where possible. If this is not possible then the generator |
| 192 | +MAY choose to generate code that implements the constraint or rule. |
| 193 | + |
| 194 | +For example, when mapping the LinkML metaslot `maximum_value` to Pydantic, |
| 195 | +a `Field` with the corresponding constraint argument (`le`) should be used. This makes the generated |
| 196 | +code more declarative, and takes advantage of the target framework's |
| 197 | +builtin abilities to perform validation. |
| 198 | + |
| 199 | +If mapping to a target that does not support the feature then code generation |
| 200 | +may be applied. For example, if there is no direct equivalent of |
| 201 | +`maximum_value`, then the generated construct MAY include a validation |
| 202 | +procedure that checks the value against the maximum. |
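Where the target offers no declarative equivalent, the generated validation procedure might look like the following sketch, which uses a plain Python dataclass and its `__post_init__` hook. The slot name `age_in_years` and the maximum of 200 are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class Person:
    # Hypothetical slot `age_in_years` declared with `maximum_value: 200`.
    AGE_IN_YEARS_MAXIMUM: ClassVar[int] = 200

    age_in_years: Optional[int] = None

    def __post_init__(self) -> None:
        # Generated check: runs after the dataclass-generated __init__.
        if (
            self.age_in_years is not None
            and self.age_in_years > self.AGE_IN_YEARS_MAXIMUM
        ):
            raise ValueError(
                f"age_in_years must be <= {self.AGE_IN_YEARS_MAXIMUM}, "
                f"got {self.age_in_years}"
            )
```

The check runs only at construction time; a generator that also needs to validate later assignments would have to emit additional code, for example property setters.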
| 203 | + |
| 204 | +### Complex boolean ranges |
| 205 | + |
| 206 | +## Mapping of enums |
| 207 | + |
| 208 | +## Runtime dependencies |
| 209 | + |
| 210 | +A generator MAY choose to generate code that is either self-contained, |
| 211 | +or that has runtime dependencies. If a runtime library is required, it SHOULD be kept |
| 212 | +as minimal as its own requirements allow. |
| 213 | + |
| 214 | +## Mapping of types |
| 215 | + |
| 216 | +Types SHOULD be mapped to one of the following: |
| 217 | + |
| 218 | +* primitive types in the target language |
| 219 | +* type variables |
| 220 | +* class-like structures that can emulate scalar-like properties |
| 221 | + |
| 222 | +Some languages like Java offer a choice for scalar values: either builtin primitives |
| 223 | +like `int` or wrapper classes like `Integer`. |
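In Python, the second and third options might be sketched as below. `Curie` and `Uri` are hypothetical type names, `typing.NewType` is used here as one reading of "type variables", and the `://` check is a deliberately crude stand-in for real URI validation.

```python
from typing import NewType

# A distinct name for static type checkers; at runtime the value is a plain str.
Curie = NewType("Curie", str)


class Uri(str):
    """Class-like structure that behaves as a scalar (a str) but validates on creation."""

    def __new__(cls, value: str) -> "Uri":
        if "://" not in value:
            raise ValueError(f"not a URI: {value!r}")
        return super().__new__(cls, value)


curie = Curie("ex:1")
uri = Uri("https://example.org/x")
```

The `NewType` variant adds no runtime cost but also no runtime checking, whereas the `str` subclass can enforce constraints while still being usable wherever a string is expected.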
| 224 | + |
| 225 | +## Metaclasses |
| 226 | + |
| 227 | +If the target language supports it then metaclasses may be used to |
| 228 | +type generated LinkML classes. |
| 229 | + |
| 230 | + |
| 231 | + |
| 232 | +## Loaders and Dumpers |
| 233 | + |
| 234 | +## Packages and package distribution |
| 235 | + |