 # TSV - Processing Tools
 
+Create .tsv files that can be viewed and edited with [neat](https://github.com/qurator-spk/neat).
+
 ## Installation:
 
+Clone this project and the [SBB-utils](https://github.com/qurator-spk/sbb_utils).
+
 Setup virtual environment:
 ```
 virtualenv --python=python3.6 venv
@@ -19,7 +23,8 @@ pip install -U pip
 
 Install package together with its dependencies in development mode:
 ```
-pip install -e ./
+pip install -e sbb_utils
+pip install -e page2tsv
 ```
 
 ## PAGE-XML to TSV Transformation:
@@ -59,3 +64,33 @@ Create a URL-annotated TSV file from an existing TSV file:
 ```
 annotate-tsv enp_DE.tsv enp_DE-annotated.tsv
 ```
+
+# Command-line interface:
+
+```
+page2tsv [OPTIONS] PAGE_XML_FILE TSV_OUT_FILE
+
+Options:
+  --purpose [NERD|OCR]      Purpose of output tsv file.
+
+                            NERD: NER/NED application/ground-truth creation.
+
+                            OCR: OCR application/ground-truth creation.
+
+                            default: NERD.
+  --image-url TEXT
+  --ner-rest-endpoint TEXT  REST endpoint of sbb_ner service. See
+                            https://github.com/qurator-spk/sbb_ner for
+                            details. Only applicable in case of NERD.
+  --ned-rest-endpoint TEXT  REST endpoint of sbb_ned service. See
+                            https://github.com/qurator-spk/sbb_ned for
+                            details. Only applicable in case of NERD.
+  --noproxy                 disable proxy. default: enabled.
+  --scale-factor FLOAT      default: 1.0
+  --ned-threshold FLOAT
+  --min-confidence FLOAT
+  --max-confidence FLOAT
+  --ned-priority INTEGER
+  --help                    Show this message and exit.
+
+```
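
The usage line and options above could be driven from a script, for example when converting many PAGE-XML files in a batch. The following is only a minimal sketch: `build_page2tsv_command` is a hypothetical helper that is not part of this project, the file names are placeholders, and only flags listed in the help text above are used. It assembles the argument list without running anything.

```python
import shlex
from typing import List, Optional


def build_page2tsv_command(
    page_xml_file: str,
    tsv_out_file: str,
    purpose: str = "NERD",
    image_url: Optional[str] = None,
    scale_factor: Optional[float] = None,
    noproxy: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble an argument list for the page2tsv CLI."""
    # The help text lists exactly these two choices for --purpose.
    if purpose not in ("NERD", "OCR"):
        raise ValueError("purpose must be 'NERD' or 'OCR'")

    cmd = ["page2tsv", "--purpose", purpose]
    if image_url is not None:
        cmd += ["--image-url", image_url]
    if scale_factor is not None:
        cmd += ["--scale-factor", str(scale_factor)]
    if noproxy:
        cmd.append("--noproxy")

    # Positional arguments come last, matching the usage line.
    cmd += [page_xml_file, tsv_out_file]
    return cmd


# Placeholder file names, not files shipped with the project.
ocr_cmd = build_page2tsv_command(
    "page.xml", "page.tsv", purpose="OCR", scale_factor=0.5
)

# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in ocr_cmd))
# prints: page2tsv --purpose OCR --scale-factor 0.5 page.xml page.tsv
```

To execute the command rather than print it, the list can be passed to `subprocess.run(ocr_cmd, check=True)`, assuming `page2tsv` is installed in the active virtual environment.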