# Training API

Under the hood, EDS-NLP uses PyTorch to train and run deep-learning models. EDS-NLP acts as a sidekick to PyTorch, providing a set of tools to perform preprocessing, composition and evaluation. The trainable [`TorchComponents`][edsnlp.core.torch_component.TorchComponent] are actually PyTorch modules with a few extra methods to handle the feature preprocessing and postprocessing. Therefore, EDS-NLP is fully compatible with the PyTorch ecosystem.

To build and train a deep learning model, you can either write a training script from scratch (check out the [*Make a training script*](/tutorials/make-a-training-script) tutorial) or use the provided training API. The training API is designed to be flexible and can handle various types of models, including Named Entity Recognition (NER) models, span classifiers, and more. However, if you need more control over the training process, consider writing your own training script.

EDS-NLP supports training models either from the command line or from a Python script or notebook, and switching between the two is relatively straightforward thanks to the use of [Confit](https://aphp.github.io/confit/).

??? note "A word about Confit"

    EDS-NLP makes heavy use of [Confit](https://aphp.github.io/confit/), a configuration library that allows you to call functions from Python or the CLI, and validate and optionally cast their arguments.

    The EDS-NLP function described on this page is the `train` function of the `edsnlp.train` module. When you pass a dict to a type-hinted argument (either from a `config.yml` file, or when calling the function in Python), Confit instantiates the correct class with the arguments provided in the dict. For instance, if we pass a dict to the `train_data` parameter, which is type-hinted as a `TrainingData`, this dict is used as keyword arguments to instantiate that `TrainingData` object. You can also instantiate a `TrainingData` object directly and pass it to the function.

    You can also tell Confit specifically which class you want to instantiate by using the `@register_name = "name_of_the_registered_class"` key and value in a dict or config section. We make heavy use of this mechanism to build pipeline architectures.
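
    For instance, assuming `train` and `TrainingData` are importable from `edsnlp.training`, and with purely illustrative paths, component parameters and batch sizes, the two calls in the sketch below are equivalent:

    ```python
    import edsnlp
    import edsnlp.pipes as eds
    from edsnlp.training import train, TrainingData

    # A pipeline with one trainable component, kept minimal on purpose
    nlp = edsnlp.blank("eds")
    nlp.add_pipe(
        eds.ner_crf(
            embedding=eds.transformer(model="prajjwal1/bert-tiny"),
            target_span_getter="gold_spans",
        ),
        name="ner",
    )
    docs = edsnlp.data.read_standoff("corpus/train")

    # 1. Pass a plain dict: Confit builds the TrainingData object from it
    train(nlp=nlp, train_data={"data": docs, "batch_size": "2000 words"}, max_steps=100)

    # 2. Build the TrainingData object yourself: strictly equivalent
    train(nlp=nlp, train_data=TrainingData(data=docs, batch_size="2000 words"), max_steps=100)
    ```
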
## How it works

To train a model with EDS-NLP, you need the following ingredients (a complete example putting them together follows the list):

- **Pipeline**: a [pipeline][edsnlp.core.pipeline.Pipeline] with at least one trainable component. Components that share parameters or that must be updated together are trained in the same phase.

- **Training streams**: one or more streams of documents, each wrapped in a `TrainingData` object. Each of these specifies how to shuffle the stream, how to batch it with a stat expression such as `2000 words` or `16 spans`, whether to split batches into sub-batches for gradient accumulation, and which components it feeds.

- **Validation streams**: optional streams of documents used for periodic evaluation.

- **Scorer**: a [scorer][edsnlp.training.trainer.GenericScorer] that defines the metrics to compute on the validation set. By default, it reports speed and uses autocast during scoring unless disabled.

- **Optimizer**: an [optimizer][edsnlp.training.optimizer.ScheduledOptimizer]. Defaults to AdamW with linear warmup and two groups of parameters: one for the transformer with a learning rate of 5e-5, and one for the rest of the model with a learning rate of 3e-4.

- **A bunch of hyperparameters**: finally, the function accepts various hyperparameters (most of them set to sensible defaults), such as `max_steps`, `seed`, `validation_interval`, `checkpoint_interval`, `grad_max_norm`, and more.

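Putting these ingredients together, a training run may look like the sketch below. It only illustrates how the pieces fit: the corpus paths, model name, component parameters and metric are placeholders, and some argument names (for instance those of `GenericScorer`, `ScheduledOptimizer` or `NerExactMetric`) may differ slightly depending on your EDS-NLP version, so check the parameter reference at the bottom of this page.

```python
import torch
import edsnlp
import edsnlp.pipes as eds
from edsnlp.metrics.ner import NerExactMetric
from edsnlp.training import GenericScorer, ScheduledOptimizer, TrainingData, train

# 1. A pipeline with a single trainable NER component
nlp = edsnlp.blank("eds")
nlp.add_pipe(
    eds.ner_crf(
        embedding=eds.transformer(model="prajjwal1/bert-tiny", window=128, stride=96),
        target_span_getter="gold_spans",
        mode="joint",
    ),
    name="ner",
)

# 2. Training and validation streams (standoff/BRAT files here, but any reader works)
train_docs = edsnlp.data.read_standoff("corpus/train")
val_docs = edsnlp.data.read_standoff("corpus/val")

train(
    nlp=nlp,
    train_data=TrainingData(
        data=train_docs,
        shuffle="dataset",        # reshuffle the dataset at each epoch
        batch_size="2000 words",  # stat expression: at most 2000 words per batch
        pipe_names=["ner"],       # which components this stream feeds
    ),
    val_data=val_docs,
    # 3. Metrics computed on the validation set at every validation interval
    scorer=GenericScorer(ner=NerExactMetric(span_getter="gold_spans")),
    # 4. Optimizer: could be omitted to fall back on the default AdamW with warmup
    optimizer=ScheduledOptimizer(
        optim=torch.optim.AdamW,
        module=nlp,
        total_steps=2000,
        groups={
            "^transformer": {"lr": 5e-5},  # transformer parameters
            "": {"lr": 3e-4},              # everything else
        },
    ),
    # 5. A few hyperparameters
    max_steps=2000,
    validation_interval=200,
    grad_max_norm=1.0,
    seed=42,
    output_dir="artifacts",
)
```
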
The training then proceeds in several steps:

**Setup**
The function prepares the device with [Accelerate](https://huggingface.co/docs/accelerate/index), creates the output folders, materializes the validation set from the user-provided stream, and runs a post-initialization pass on the training data when requested. This `post_init` pass lets the pipeline inspect the data before training, for instance to size its classification heads according to the labels encountered. Finally, the optimizer is instantiated.
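
The post-initialization pass is handled by `train` itself, but conceptually it amounts to the short sketch below, reusing the `nlp` and `train_docs` objects from the example above and assuming the pipeline exposes a `post_init` method, as trainable EDS-NLP pipelines do:

```python
# Let the pipeline see the training documents once before any optimization,
# so that label-dependent components can discover the labels they must predict
# and size their classification heads accordingly.
nlp.post_init(train_docs)
```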

**Phases**
Training runs **by phases**. A phase groups components that should be optimized together because they share parameters (think for instance of a BERT encoder shared between multiple components). During a phase, losses are computed for each of these "active" components at each step, and only their parameters are updated.
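
For instance, in the sketch below, a NER component and a span classifier reuse the same transformer embedding instance, so they would be trained together in a single phase. The component and attribute parameters are purely illustrative, not a prescription:

```python
import edsnlp
import edsnlp.pipes as eds

nlp = edsnlp.blank("eds")

# A single transformer embedding instance...
shared_embedding = eds.transformer(model="prajjwal1/bert-tiny", window=128, stride=96)

# ...reused by two trainable components. Because they share parameters,
# they belong to the same phase and are optimized together.
nlp.add_pipe(
    eds.ner_crf(embedding=shared_embedding, target_span_getter="gold_spans"),
    name="ner",
)
nlp.add_pipe(
    eds.span_classifier(
        embedding=shared_embedding,
        span_getter="gold_spans",
        attributes={"_.negation": [True, False]},  # illustrative attribute scheme
    ),
    name="qualifier",
)
```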

**Data preparation**
Each `TrainingData` object turns its stream of documents into device-ready batches. It optionally shuffles the stream, preprocesses the documents for the active components, builds stat-aware batches (for instance, limiting the number of tokens per batch), optionally splits batches into sub-batches for gradient accumulation, then converts everything into device-ready tensors. This can happen in parallel with the actual deep-learning work.
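
Concretely, these knobs are set on the `TrainingData` object itself. In the sketch below, the parameter names (`shuffle`, `batch_size`, `sub_batch_size`, `pipe_names`) are indicative and the values illustrative; double-check them against the parameter reference at the bottom of this page:

```python
import edsnlp
from edsnlp.training import TrainingData

train_docs = edsnlp.data.read_standoff("corpus/train")

training_data = TrainingData(
    data=train_docs,
    shuffle="dataset",           # reshuffle the whole dataset at each epoch
    batch_size="2000 words",     # stat expression: cap the number of words per batch
    sub_batch_size="500 words",  # split each batch into sub-batches for gradient accumulation
    pipe_names=["ner"],          # only feed the component(s) named here
)
```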

**Optimization**
At every training step, the function draws one batch from each training stream (if there is more than one) and synchronizes statistics across processes (when training on multiple GPUs) to keep supports and losses consistent. It then runs forward passes for the phase's components. When several components reuse the same intermediate features, a cache avoids recomputation. Gradients are accumulated over sub-batches.

**Gradient safety**
Gradients are always clipped to `grad_max_norm`. Optionally, the function tracks an exponential moving mean and variance of the gradient norm. If a spike is detected, the gradients can be clipped to the running mean or to a threshold, or the update can be skipped entirely, depending on `grad_dev_policy`. This protects training from rare extreme updates.

**Validation and logging**
At regular intervals, the scorer evaluates the pipeline on the validation documents. It isolates each task by copying the docs and disabling unrelated pipes to avoid leakage. It reports throughput, metrics for NER and span attribute classifiers, and any custom metrics.
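
A scorer is simply a `GenericScorer` holding one metric per task. The sketch below scores NER predictions against the gold spans; the metric class, its arguments and the `speed` flag are indicative, so check the metrics documentation for the exact names:

```python
from edsnlp.metrics.ner import NerExactMetric
from edsnlp.training import GenericScorer

scorer = GenericScorer(
    speed=True,                                    # also report docs/words per second
    ner=NerExactMetric(span_getter="gold_spans"),  # exact-match NER metric
)
```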

**Checkpoints and output**
The model is saved on schedule and at the end of training in `output_dir/model-last`, unless saving is disabled.
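
Once training is over, the saved pipeline can be reloaded like any other EDS-NLP model; for instance, assuming `output_dir` was set to `artifacts`:

```python
import edsnlp

# Reload the final checkpoint written at the end of training
nlp = edsnlp.load("artifacts/model-last")
doc = nlp("Le patient est admis pour une pneumopathie.")
```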

## Tutorials and examples

--8<-- "docs/tutorials/index.md:deep-learning-tutorials"

## Parameters of `edsnlp.train` {: #edsnlp.training.trainer.train }

Here are the parameters you can pass to the `train` function:

::: edsnlp.training.trainer.train
    options:
        heading_level: 4
        only_parameters: no-header
        skip_parameters: []
        show_source: false
        show_toc: false