Commit bfb6424

Merge pull request #157 from linkml/adding-codegen-spec
Adding a codegen spec
2 parents c948de3 + b6df923 commit bfb6424

1 file changed: 235 additions, 0 deletions
# Mapping Schemas to Programming Language Structures

LinkML schemas can be mapped to modeling constructs in different programming languages.
This allows for *code generation*, in which a LinkML schema is used as the source to generate
code in a target language.

This has a number of advantages, including type safety, programmer efficiency, ease of mapping
to other serializations, and helping ensure that code and domain models are aligned.

As each programming language differs in which constructs it offers and the precise
semantics of these constructs, there is no single standard for mapping. Instead, we provide
a set of general recommendations that can be adapted to each language.

## Terminology

* Programming language constructs:
    * `Structure`: a compound datatype that consists of one or more attributes
    * `Class`: a Structure that supports or partially supports inheritance
    * `Attribute`: a field or property of a class or struct
    * `Class-level variable`: a property of a class rather than of an instance of that class
    * `Module`: a file-level bundle of classes or structures
    * `Package`: a collection of modules

## Mapping of LinkML Schemas

### Schemas to Modules

A schema SHOULD be mapped to EITHER a module or a collection of modules, depending on the idioms
of the target language.

For languages where it is conventional to include multiple classes or structures in
a single module (e.g. Python), the schema SHOULD correspond to a module.

For languages where it is conventional to include a single class or structure in
a single module (e.g. Java), a single module will correspond to a single LinkML class or enum.

Current implementations:

| Target      | Default Mapping     |
|-------------|---------------------|
| Dataclasses | One file per schema |
| Pydantic    | One file per schema |
| Java        | One file per class  |
| TypeScript  | One file per schema |

### Imports

A mapping MAY choose to merge imports prior to code generation. If imports are not merged,
then each entry in the `imports` of the SchemaDefinition MUST be mapped to an import statement
in the target language.

Where modules correspond to structures, there SHOULD be one import in the target language module for
every import in the source LinkML schema.

Mappings MAY choose to import selectively, by inspecting which elements are actually used.

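As a rough illustration of the unmerged case, the sketch below emits one Python import statement per entry in a schema's `imports` list. The helper name `render_imports`, the wildcard-import style, and the way module names are derived are all assumptions made for this example; they do not describe the behavior of any existing generator.

```python
import re


def render_imports(schema_imports: list[str]) -> list[str]:
    """Return one Python import statement per LinkML import (hypothetical helper)."""
    statements = []
    for name in schema_imports:
        # Assumption: use the last path or CURIE segment as the module name.
        local = re.split(r"[/:]", name)[-1]
        # Assumption: replace characters that are not valid in Python identifiers.
        module = re.sub(r"\W", "_", local).lower()
        statements.append(f"from .{module} import *")
    return statements


print(render_imports(["linkml:types", "core-model"]))
```

A generator that imports selectively would replace the wildcard with the names of the elements actually used.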
### Naming Conventions for Modules

There MUST be a correspondence between schema `name` and module name. The mapping MAY
prioritize idioms of the target language over LinkML idioms, although the mapping MUST
be deterministic.

For example, if module names in the target language are conventionally `CamelCase`, then a mapping MAY
translate all module names using a standard camel case string transformation.

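A deterministic transformation of this kind can be sketched as a pair of pure string functions. The function names and the exact word-splitting rules below are illustrative assumptions, not the transformation used by any particular generator.

```python
import re


def split_words(name: str) -> list[str]:
    """Split a schema name on whitespace, hyphens, underscores and lower-to-upper case boundaries."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [word for word in re.split(r"[\s_\-]+", spaced) if word]


def underscore(name: str) -> str:
    """Lower-case, underscore-separated form, e.g. for Python module names."""
    return "_".join(word.lower() for word in split_words(name))


def camelcase(name: str) -> str:
    """Upper camel case form, e.g. for Java or TypeScript names."""
    return "".join(word[0].upper() + word[1:] for word in split_words(name))


print(underscore("personinfo-schema"), camelcase("personinfo-schema"))
```

Because both functions depend only on their input string, the same schema `name` always yields the same module name.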
Schema-level metadata MAY be included in the header of the module. This MAY
be as comments, but if the target language supports module-level variables or other
ways to make schema metadata introspectable at runtime, these mechanisms SHOULD be used.

Current implementations:

| Target      | Default Mapping |
|-------------|-----------------|
| Dataclasses | underscore      |
| Pydantic    | underscore      |
| Java        | CamelCase       |
| TypeScript  | CamelCase       |

## Mapping of LinkML Classes

### Target Constructs

Different languages support different constructs, which may all make
appropriate targets for LinkML classes. For example:

* Java has both Interfaces and Classes, and newer versions of Java support Records
* Scala has traits and sealed traits
* Rust has structs, traits, and enums
* TypeScript has classes and interfaces

The choice should reflect whatever is most idiomatic for the target language.
The generation MAY allow for different mappings, controlled either by user configuration
or by properties of the schema or its elements.

For example, when mapping to Rust, a generator MAY choose to map to either structs
or to struct/enum/trait combinations, depending on whether polymorphism is used
in the schema.

A mapping MAY be non-isomorphic (i.e. not one-to-one). For example,
in languages that have a split between
interface-like constructs and concrete class-like constructs, a generator MAY choose to
implement a mapping where each LinkML class creates *both* structures, in order
to leverage the full benefits of the target language.

### Class level variables

A generator SHOULD map certain properties of schema elements to class level
variables where the target language allows. When this mapping occurs,
the names of the class level variables MUST correspond to LinkML metamodel elements,
allowing for translation to language idioms.

The following class-level variables are recommended:

* `class_class_uri`: the semantic URI of the class, as defined by `class_uri`
* `class_class_curie`: the CURIE form of `class_class_uri`
* `class_name`: the normalized name of the LinkML class, corresponding to the name of the class
* `class_model_uri`: the URI of the class within the namespace of the schema

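In Python, one way to express these is with `typing.ClassVar` on a dataclass, which keeps them out of the instance fields. This is a minimal sketch: the `Person` class, its attributes, and all URI values are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import ClassVar, Optional


@dataclass
class Person:
    # Class-level variables: shared by all instances and ignored by the
    # dataclass machinery, so they never become constructor arguments.
    # All values here are illustrative placeholders.
    class_class_uri: ClassVar[str] = "https://schema.org/Person"
    class_class_curie: ClassVar[str] = "schema:Person"
    class_name: ClassVar[str] = "Person"
    class_model_uri: ClassVar[str] = "https://example.org/personinfo/Person"

    # Instance attributes.
    id: str
    name: Optional[str] = None


print(Person.class_class_curie)
print([f.name for f in fields(Person)])
```

The metadata is available both on the class and on its instances, without appearing in the list of instance attributes.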
A generator MAY allow for generation of class level variables to be suppressed.

A generator MAY choose to use annotations in place of class-level variables.

Current implementations:

| Target      | Default Mapping |
|-------------|-----------------|
| Dataclasses | all             |
| Pydantic    | none (planned)  |
| Java        | none (planned)  |
| TypeScript  | none            |

### Inheritance

Mapping of `is_a` and `mixins` may depend on the properties of the target language
constructs.

Generators MAY choose to roll down attributes from parent classes.

If the target construct for a class supports single inheritance, then the `is_a` SHOULD
correspond to the analogous construct (for example, `extends` in Java). The `mixins` MAY be
represented using an alternative construct (such as `implements` in Java).

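Python supports multiple inheritance, so one possible mapping (an assumption for this sketch, not a rule stated above) is to list the `is_a` parent as the first base class and each mixin as a further base. The classes `NamedThing`, `HasAliases` and `Person` are hypothetical.

```python
class NamedThing:
    """Hypothetical is_a parent."""

    def __init__(self, id: str):
        self.id = id


class HasAliases:
    """Hypothetical mixin: contributes an attribute but is not the is_a parent."""

    aliases: tuple = ()


class Person(NamedThing, HasAliases):
    """is_a: NamedThing, mixins: [HasAliases]."""

    def __init__(self, id: str, address: str = None):
        super().__init__(id)
        self.address = address


person = Person("P1", address="1 Main St")
print(isinstance(person, NamedThing), isinstance(person, HasAliases))
```

Native `isinstance` checks work for both the parent and the mixin here, so no generated type checker is needed.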
Generators MAY choose to create type checkers for runtime inspection
of instantiated classes. This SHOULD NOT be done in languages that
support polymorphism and type checking natively (for example, `isinstance` in Python).

### Mapping of class slots and attributes

Mapping of class slots and attributes should be entailment-preserving, such that
the semantics of the generated code in the target language correspond to
a *derived* schema.

For example, consider a schema with classes:

```yaml
classes:
  NamedThing:
    attributes:
      id:
  Person:
    is_a: NamedThing
    attributes:
      address:
```

The following are both valid ways to map `Person` to a target language construct:

* a non-class structure with two asserted attributes, `id` and `address` (`id` has been "rolled down" from `NamedThing`)
* a class whose structure mirrors the source LinkML, with `id` asserted only on the parent

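The two options can be sketched in Python as follows. The name `PersonFlat` and the `Optional[str]` types are assumptions made for the example, since the schema above does not declare ranges.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Option 1: a flat structure with no parent; `id` is rolled down from NamedThing.
@dataclass
class PersonFlat:
    id: Optional[str] = None
    address: Optional[str] = None


# Option 2: a class hierarchy mirroring the schema; `id` is asserted only on the parent.
@dataclass
class NamedThing:
    id: Optional[str] = None


@dataclass
class Person(NamedThing):
    address: Optional[str] = None


# Both mappings expose the same attributes on a Person-like object.
print([f.name for f in fields(PersonFlat)])
print([f.name for f in fields(Person)])
```

Both structures carry the same attributes, which is the sense in which each mapping preserves the entailments of the derived schema.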
### Constructors

Generation of constructors will be highly dependent on the target language, but the following
guidelines should be followed:

- if the target language allows named assignment of attributes, then this SHOULD be the default constructor style
- if the target language allows positional assignment of attributes, then this MAY be allowed:
    - the order of attributes MUST correspond to `rank` metaslots in the derived schema, if specified
    - otherwise the ordering MUST correspond to the order in which slots or attributes are specified
        - starting from the `is_a` root, working down the `is_a` hierarchy
        - `slots` order prioritized over `attributes`
- if the target language and idiom use builder patterns, then these MAY be used

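The positional ordering rules can be sketched as a small function over a dictionary view of the schema. The function `positional_order`, the dictionary layout, and the separate `ranks` argument are hypothetical. The guidelines above do not say how ranked and unranked attributes interleave, so this sketch assumes ranked attributes come first.

```python
from typing import Optional


def positional_order(classes: dict, class_name: str,
                     ranks: Optional[dict] = None) -> list:
    """Return attribute names in positional-constructor order (illustrative only)."""
    ranks = ranks or {}

    # Collect the is_a lineage, then reverse it so the root comes first.
    lineage = []
    current = class_name
    while current is not None:
        lineage.append(current)
        current = classes[current].get("is_a")
    lineage.reverse()

    # Within each class, `slots` are listed before `attributes`.
    declared = []
    for name in lineage:
        definition = classes[name]
        for attr in [*definition.get("slots", []), *definition.get("attributes", {})]:
            if attr not in declared:  # a child may redeclare an inherited name
                declared.append(attr)

    # Assumption: ranked names come first, in rank order. sorted() is stable,
    # so unranked names keep their declaration order.
    return sorted(declared, key=lambda a: (a not in ranks, ranks.get(a, 0)))


classes = {
    "NamedThing": {"slots": ["id"], "attributes": {"name": {}}},
    "Person": {"is_a": "NamedThing", "slots": ["address"], "attributes": {"age": {}}},
}
print(positional_order(classes, "Person"))
# → ['id', 'name', 'address', 'age']
print(positional_order(classes, "Person", ranks={"age": 1}))
# → ['age', 'id', 'name', 'address']
```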
### Mapping of schema-level slots

### Mapping of constraints and rules

Constraints and rules in LinkML SHOULD be mapped to declarative target
constructs where possible. If this is not possible then the generator
MAY choose to generate code that implements the constraint or rule.

For example, when mapping the LinkML metaslot `maximum_value` to Pydantic,
a `Field` with the corresponding upper-bound constraint (`le` in Pydantic) should be used.
This makes the generated
code more declarative, and takes advantage of the target framework's
builtin abilities to perform validation.

If mapping to a target that does not support the feature then code generation
may be applied. For example, if there is no direct equivalent of
`maximum_value`, then the generated construct MAY include a validation
procedure that checks the value against the maximum.

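For the fallback case, a plain Python dataclass has no declarative equivalent of `maximum_value`, so a generator could emit a check in `__post_init__`. This is a sketch only: the `Person` class, the `age` attribute and the bound of 150 are invented for the example.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Person:
    age: int

    # Hypothetical: generated from `maximum_value: 150` on the `age` attribute.
    age_maximum_value: ClassVar[int] = 150

    def __post_init__(self):
        # Generated validation procedure: check the value against the maximum.
        if self.age > self.age_maximum_value:
            raise ValueError(
                f"age must be <= {self.age_maximum_value}, got {self.age}"
            )


print(Person(age=42))
```

Constructing `Person(age=200)` raises a `ValueError`, which is the behavior a declarative constraint would otherwise provide.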
### Complex boolean ranges

## Mapping of enums

## Runtime dependencies

A generator MAY choose to generate code that is either self-contained,
or that has runtime dependencies. Any runtime library SHOULD be kept
as minimal as its requirements allow.

## Mapping of types

Types SHOULD be mapped to one of the following:

* primitive types in the target language
* type variables
* class-like structures that can emulate scalar-like properties

Some languages, such as Java, offer a choice for primitives: either built-in
primitive types like `int` or wrapper classes like `Integer`.

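In Python, the three options might look as follows. The type names are hypothetical, and reading "type variables" as named types built with `typing.NewType` is an interpretation, not something the text above specifies.

```python
from typing import NewType

# Option 1: map the LinkML type directly to a primitive.
Age = int

# Option 2: a named type that static checkers treat as distinct from `str`.
# At runtime, CurieStr(...) simply returns its argument.
CurieStr = NewType("CurieStr", str)


# Option 3: a class-like structure that still behaves like a scalar.
class Identifier(str):
    """Hypothetical string subclass that rejects values containing spaces."""

    def __new__(cls, value: str):
        if " " in value:
            raise ValueError(f"invalid identifier: {value!r}")
        return super().__new__(cls, value)


ident = Identifier("schema:Person")
print(ident.upper(), isinstance(ident, str))
```

The subclass approach keeps all scalar behavior, such as string methods and comparisons, while leaving room for validation.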
## Metaclasses

If the target language supports it then metaclasses MAY be used to
type generated LinkML classes.

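As one possible reading of this in Python, a shared metaclass can mark every generated class and make the set of generated classes discoverable at runtime. The metaclass `LinkMLClassMeta` and its registry are hypothetical and serve only to illustrate the idea.

```python
class LinkMLClassMeta(type):
    """Hypothetical metaclass that types and registers generated classes."""

    registry: dict = {}

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        mcls.registry[name] = cls  # record each generated class by name
        return cls


class NamedThing(metaclass=LinkMLClassMeta):
    pass


class Person(NamedThing):  # subclasses inherit the metaclass
    pass


print(type(Person).__name__, sorted(LinkMLClassMeta.registry))
```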
## Loaders and Dumpers

## Packages and package distribution
