Commit 6fb35b9

ci: drop conjugator for now
1 parent cb0d7b6 commit 6fb35b9

6 files changed (+146 -251 lines)

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ jobs:
 if: matrix.python-version != '3.10' && matrix.python-version != '3.12' && matrix.python-version != '3.13' && matrix.python-version != '3.14'
 
 - name: Install dependencies
-run: uv pip install -e . --group dev --group setup
+run: uv pip install -e . --group dev
 if: matrix.python-version == '3.10'
 
 - name: Install dependencies

contributing.md

Lines changed: 19 additions & 42 deletions
@@ -9,52 +9,35 @@ We welcome contributions ! There are many ways to help. For example, you can:
 
 ## Development installation
 
-To be able to run the test suite, run the example notebooks and develop your own pipeline component, you should clone the repo and install it locally.
+To be able to run the test suite, run the example notebooks and develop your own pipeline component, you should clone the repo and install it locally. We use `uv` to manage virtual environments, and think you should too.
 
-<div class="termy">
-
-```console
+```bash { data-md-color-scheme="slate" }
 # Clone the repository and change directory
-$ git clone https://github.com/aphp/edsnlp.git
----> 100%
-$ cd edsnlp
+git clone https://github.com/aphp/edsnlp.git
+cd edsnlp
 
 # Optional: create a virtual environment
-$ python -m venv venv
-$ source venv/bin/activate
+uv venv
+source .venv/bin/activate
 
-# Install the package with common, dev, setup dependencies in editable mode
-$ pip install -e . --group dev --group setup
-# And build resources
-$ python scripts/conjugate_verbs.py
+# Install the package with common, dev dependencies in editable mode
+uv pip install -e . --group dev
 ```
 
-</div>
-
 To make sure the pipeline will not fail because of formatting errors, we added pre-commit hooks using the `pre-commit` Python library. To use it, simply install it:
 
-<div class="termy">
-
-```console
-$ pre-commit install
+```bash { data-md-color-scheme="slate" }
+pre-commit install
 ```
 
-</div>
-
 The pre-commit hooks defined in the [configuration](https://github.com/aphp/edsnlp/blob/master/.pre-commit-config.yaml) will automatically run when you commit your changes, letting you know if something went wrong.
 
 The hooks only run on staged changes. To force-run it on all files, run:
 
-<div class="termy">
-
-```console
-$ pre-commit run --all-files
----> 100%
-color:green All good !
+```bash { data-md-color-scheme="slate" }
+pre-commit run --all-files
 ```
 
-</div>
-
 ## Proposing a merge request
 
 At the very least, your changes should :
@@ -70,7 +53,7 @@ We use the Pytest test suite.
 The following command will run the test suite. Writing your own tests is encouraged !
 
 ```shell
-python -m pytest
+pytest
 ```
 
 !!! warning "Testing Cython code"
@@ -93,11 +76,11 @@ edsnlp/pipes/<pipe>
 
 ### Style Guide
 
-We use [Black](https://github.com/psf/black) to reformat the code. While other formatter only enforce PEP8 compliance, Black also makes the code uniform. In short :
+We use [Ruff](https://github.com/astral-sh/ruff) to reformat the code. While other formatter only enforce PEP8 compliance, Ruff also makes the code uniform. In short :
 
-> Black reformats entire files in place. It is not configurable.
+> Ruff reformats entire files in place. It is not configurable.
 
-Moreover, the CI/CD pipeline enforces a number of checks on the "quality" of the code. To wit, non black-formatted code will make the test pipeline fail. We use `pre-commit` to keep our codebase clean.
+Moreover, the CI/CD pipeline enforces a number of checks on the "quality" of the code. To wit, non ruff-formatted code will make the test pipeline fail. We use `pre-commit` to keep our codebase clean.
 
 Refer to the [development install tutorial](#development-installation) for tips on how to format your files automatically.
 Most modern editors propose extensions that will format files on save.
@@ -109,19 +92,13 @@ as well as in the documentation itself if need be.
 
 We use `MkDocs` for EDS-NLP's documentation. You can check out the changes you make with:
 
-<div class="termy">
-
-```console
+```bash { data-md-color-scheme="slate" }
 # Install the requirements
-$ pip install -e . --group docs
----> 100%
-color:green Installation successful
+uv pip install -e . --group dev --group docs
 
 # Run the documentation
-$ mkdocs serve
+mkdocs serve
 ```
 
-</div>
-
 Go to [`localhost:8000`](http://localhost:8000) to see your changes. MkDocs watches for changes in the documentation folder
 and automatically reloads the page.

edsnlp/conjugator.py

Lines changed: 0 additions & 119 deletions
This file was deleted.

pyproject.toml

Lines changed: 4 additions & 4 deletions
@@ -77,10 +77,10 @@ dev = [
 "configobj>=5.0.9",
 "tensorboardx>=2.6.4",
 ]
-setup = [
-"mlconjug3<3.9.0", # bug https://github.com/Ars-Linguistica/mlconjug3/pull/506
-"numpy<2", # mlconjug has scikit-learn dep which doesn't support for numpy 2 yet
-]
+# setup = [
+# "mlconjug3<3.9.0", # bug https://github.com/Ars-Linguistica/mlconjug3/pull/506
+# "numpy<2", # mlconjug has scikit-learn dep which doesn't support for numpy 2 yet
+# ]
 ml = [
 "edsnlp[ml]"
 ]

scripts/conjugate_verbs.py

Lines changed: 122 additions & 5 deletions
@@ -1,21 +1,138 @@
 import warnings
 from pathlib import Path
+from typing import Dict, List, Union
 
 import context # noqa
+import mlconjug3
+import pandas as pd
 import typer
 
-from edsnlp.conjugator import conjugate
-from edsnlp.pipelines.qualifiers.hypothesis.patterns import verbs_eds, verbs_hyp
-from edsnlp.pipelines.qualifiers.negation.patterns import verbs as neg_verbs
-from edsnlp.pipelines.qualifiers.reported_speech.patterns import verbs as rspeech_verbs
+from edsnlp.pipes.qualifiers.hypothesis.patterns import verbs_eds, verbs_hyp
+from edsnlp.pipes.qualifiers.negation.patterns import verbs as neg_verbs
+from edsnlp.pipes.qualifiers.reported_speech.patterns import verbs as rspeech_verbs
 
 warnings.filterwarnings("ignore")
 
 
+def conjugate_verb(
+    verb: str,
+    conjugator: mlconjug3.Conjugator,
+) -> pd.DataFrame:
+    """
+    Conjugates the verb using an instance of mlconjug3,
+    and formats the results in a pandas `DataFrame`.
+
+    Parameters
+    ----------
+    verb : str
+        Verb to conjugate.
+    conjugator : mlconjug3.Conjugator
+        mlconjug3 instance for conjugating.
+
+    Returns
+    -------
+    pd.DataFrame
+        Normalized dataframe containing all conjugated forms
+        for the verb.
+    """
+
+    df = pd.DataFrame(
+        conjugator.conjugate(verb).iterate(),
+        columns=["mode", "tense", "person", "term"],
+    )
+
+    df.term = df.term.fillna(df.person)
+    df.loc[df.person == df.term, "person"] = None
+
+    df.insert(0, "verb", verb)
+
+    return df
+
+
+def conjugate(
+    verbs: Union[str, List[str]],
+    language: str = "fr",
+) -> pd.DataFrame:
+    """
+    Conjugate a list of verbs.
+
+    Parameters
+    ----------
+    verbs : Union[str, List[str]]
+        List of verbs to conjugate
+    language: str
+        Language to conjugate. Defaults to French (`fr`).
+
+    Returns
+    -------
+    pd.DataFrame
+        Dataframe containing the conjugations for the provided verbs.
+        Columns: `verb`, `mode`, `tense`, `person`, `term`
+    """
+    if isinstance(verbs, str):
+        verbs = [verbs]
+
+    conjugator = mlconjug3.Conjugator(language=language)
+
+    df = pd.concat([conjugate_verb(verb, conjugator=conjugator) for verb in verbs])
+
+    df = df.reset_index(drop=True)
+
+    return df
+
+
+def get_conjugated_verbs(
+    verbs: Union[str, List[str]],
+    matches: Union[List[Dict[str, str]], Dict[str, str]],
+    language: str = "fr",
+) -> List[str]:
+    """
+    Get a list of conjugated verbs.
+
+    Parameters
+    ----------
+    verbs : Union[str, List[str]]
+        List of verbs to conjugate.
+    matches : Union[List[Dict[str, str]], Dict[str, str]]
+        List of dictionary describing the mode/tense/persons to keep.
+    language : str, optional
+        [description], by default "fr" (French)
+
+    Returns
+    -------
+    List[str]
+        List of terms to look for.
+
+    Examples
+    --------
+    >>> get_conjugated_verbs(
+            "aimer",
+            dict(mode="Indicatif", tense="Présent", person="1p"),
+        )
+    ['aimons']
+    """
+
+    if isinstance(matches, dict):
+        matches = [matches]
+
+    terms = []
+
+    df = conjugate(
+        verbs=verbs,
+        language=language,
+    )
+
+    for match in matches:
+        q = " & ".join([f'{k} == "{v}"' for k, v in match.items()])
+        terms.extend(df.query(q).term.unique())
+
+    return list(set(terms))
+
+
 def conjugate_verbs(
     output_path: Path = typer.Argument(
         "edsnlp/resources/verbs.csv.gz", help="Path to the output CSV table."
-    )
+    ),
 ) -> None:
     """
     Convenience script to automatically conjugate a set of verbs,
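After this commit, `mlconjug3` no longer appears in any dependency group, so running `scripts/conjugate_verbs.py` presumably means installing it by hand first. The two post-processing steps the script applies to mlconjug3's output can still be shown without it. The sketch below is a pure-Python analogue, not the script's code. It replaces pandas with plain dicts. The input rows are hand-written stand-ins for what `conjugator.conjugate(verb).iterate()` yields, not real mlconjug3 output. The names `normalize`, `get_terms` and `ROWS` are hypothetical. As the `fillna` step suggests, the sketch assumes that forms without a person arrive as 3-tuples, which leaves the conjugated form in the `person` slot.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

# Hypothetical rows standing in for mlconjug3's `iterate()` output:
# (mode, tense, person, term), or (mode, tense, term) for forms without a person.
ROWS = [
    ("Indicatif", "Présent", "1s", "aime"),
    ("Indicatif", "Présent", "1p", "aimons"),
    ("Indicatif", "Imparfait", "1p", "aimions"),
    ("Infinitif", "Infinitif Présent", "aimer"),
]

Record = Dict[str, Optional[str]]


def normalize(verb: str, rows: Iterable[Tuple[str, ...]]) -> List[Record]:
    """Pure-Python analogue of the clean-up done in `conjugate_verb`."""
    records = []
    for mode, tense, person, *rest in rows:
        term = rest[0] if rest else None
        # Mirrors `df.term = df.term.fillna(df.person)`: a 3-tuple leaves
        # `term` empty and puts the conjugated form in the `person` slot.
        if term is None:
            term = person
        # Mirrors `df.loc[df.person == df.term, "person"] = None`.
        if person == term:
            person = None
        records.append(
            {"verb": verb, "mode": mode, "tense": tense, "person": person, "term": term}
        )
    return records


def get_terms(
    records: List[Record],
    matches: Union[Dict[str, str], List[Dict[str, str]]],
) -> List[str]:
    """Pure-Python analogue of the `df.query(...)` loop in `get_conjugated_verbs`.

    Each dict is an AND of equality tests, like the `" & ".join(...)` query;
    terms matched by any of the dicts are pooled and deduplicated.
    """
    if isinstance(matches, dict):
        matches = [matches]
    terms = set()
    for match in matches:
        for record in records:
            if all(record.get(key) == value for key, value in match.items()):
                terms.add(record["term"])
    # Sorted for a stable order; the script returns `list(set(terms))`.
    return sorted(terms)


records = normalize("aimer", ROWS)
print(get_terms(records, dict(mode="Indicatif", tense="Présent", person="1p")))
# ['aimons']
print(get_terms(records, [dict(mode="Indicatif", person="1p"), dict(mode="Infinitif")]))
# ['aimer', 'aimions', 'aimons']
```

The sort at the end is a choice made here only so that the output is reproducible. The script's `list(set(terms))` returns the same terms in arbitrary order.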
