changelog.md
- Sort files before iterating over a standoff or JSON folder to ensure reproducibility
- Sentence detection now correctly matches capitalized letters followed by an apostrophe
- We now ensure that the worker pool is properly closed whatever happens (exception, garbage collection, end of data) in the `multiprocessing` backend. This prevents some executions from hanging indefinitely at the end of processing (see the sketch below for the general pattern)
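
As an illustration of the guarantee above, here is a generic sketch of the pattern (not EDS-NLP's actual implementation): wrapping the pool in a generator's `try`/`finally` ensures cleanup on normal exhaustion, on exceptions in the consumer, and on garbage collection.

```python
import multiprocessing

def process_all(items, worker, num_workers=4):
    """Yield worker(item) for each item, never leaving workers behind."""
    pool = multiprocessing.Pool(num_workers)
    try:
        yield from pool.imap(worker, items)
    finally:
        # Runs on normal exhaustion, on an exception in the consumer,
        # and when the generator is closed or garbage-collected
        # mid-iteration, so no lingering worker can hang the program.
        pool.terminate()
        pool.join()
```
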
### Data API changes
- `LazyCollection` objects are now called `Stream` objects
- By default, the `multiprocessing` backend now preserves the order of the input data. To disable this and improve performance, pass `deterministic=False` to the `set_processing` method (see the first sketch after this list)
  - For simple {pre-process → model → post-process} pipelines, GPU inference can be up to 30% faster in non-deterministic mode (results may arrive out of order) and up to 20% faster in deterministic mode (results stay in order)
  - For multitask pipelines, GPU inference can be up to twice as fast (measured on a two-task BERT+NER+Qualif pipeline on T4 and A100 GPUs)
- The `.map_batches`, `.map_pipeline` and `.map_gpu` methods now support a specific `batch_size` and batching function, instead of having a single batch size for all pipes
- Readers now have a `loop` parameter to cycle over the data indefinitely (useful for training)
- Readers now have a `shuffle` parameter to shuffle the data before iterating over it (the second sketch after this list combines `loop` and `shuffle` with a per-pipe `batch_size`)
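
A minimal sketch of the ordering trade-off described above, assuming a stream built with the `edsnlp.data` readers; the path and converter are illustrative placeholders.

```python
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")

# Path and converter are illustrative placeholders
stream = edsnlp.data.read_json("corpus/", converter="omop")
stream = stream.map_pipeline(nlp)

# Order is preserved by default; deterministic=False trades
# ordering for throughput, as described in the entry above.
stream = stream.set_processing(
    backend="multiprocessing",
    num_cpu_workers=4,
    deterministic=False,  # results may come back out of order
)

for doc in stream:
    ...
```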
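
And a second sketch combining the new reader options with a per-pipe batch size; the `keep_long_docs` helper is hypothetical, and the exact accepted values of `shuffle` may differ from the boolean used here.

```python
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")

# loop=True cycles over the data indefinitely (useful for training);
# the boolean form of shuffle is an assumption here.
stream = edsnlp.data.read_json("corpus/", converter="omop", loop=True, shuffle=True)
stream = stream.map_pipeline(nlp)

# Hypothetical filtering step with its own batch size, independent
# of the batch sizes used by the other pipes.
def keep_long_docs(batch):
    return [doc for doc in batch if len(doc) > 10]

stream = stream.map_batches(keep_long_docs, batch_size=64)
```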