Commit 2ea8287

tests with clang19
1 parent e854fe7 commit 2ea8287

21 files changed, +111 -103 lines changed

CRAN-SUBMISSION

Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,3 @@
 Version: 5.3.5
-Date: 2024-12-22 02:05:11 UTC
-SHA: e61f3ff34d5b28e39f4725f700d7350a503a5103
+Date: 2025-01-14 17:49:03 UTC
+SHA: e854fe79164464d5ba65a4029f767c2f32c51c48

R/tessdata.R

Lines changed: 3 additions & 5 deletions
@@ -3,17 +3,15 @@
 #' Helper function to download training data from the official
 #' [tessdata](https://tesseract-ocr.github.io/tessdoc/Data-Files) repository.
 #' On Linux, the fast training data can be installed directly with
-#' [yum](https://src.fedoraproject.org/rpms/tesseract) or
-#' [apt-get](https://packages.debian.org/search?suite=stable&section=all&arch=any&searchon=names&keywords=tesseract-ocr-).
+#' yum or apt-get.
 #'
 #' Tesseract uses training data to perform OCR. Most systems default to English
 #' training data. To improve OCR performance for other languages you can to
 #' install the training data from your distribution. For example to install the
 #' spanish training data:
 #'
-#' - [tesseract-ocr-spa](https://packages.debian.org/testing/tesseract-ocr-spa)
-#' (Debian, Ubuntu)
-#' - `tesseract-langpack-spa` (Fedora, EPEL)
+#' - tesseract-ocr-spa (Debian, Ubuntu)
+#' - tesseract-langpack-spa (Fedora, EPEL)
 #'
 #' On Windows and MacOS you can install languages using the [tesseract_download]
 #' function which downloads training data directly from

README.Rmd

Lines changed: 10 additions & 12 deletions
@@ -64,38 +64,38 @@ Installation from source on Linux or OSX requires the `Tesseract` library (see b
 
 ### Install from source
 
-On Debian or Ubuntu install [libtesseract-dev](https://packages.debian.org/testing/libtesseract-dev) and
-[libleptonica-dev](https://packages.debian.org/testing/libleptonica-dev). Also install [tesseract-ocr-eng](https://packages.debian.org/testing/tesseract-ocr-eng) to run examples.
+On Debian or Ubuntu install libtesseract-dev, libleptonica-dev, and
+tesseract-ocr-eng to run examples.
 
-```
+```bash
 sudo apt-get install -y libtesseract-dev libleptonica-dev tesseract-ocr-eng
 ```
 
 On Ubuntu you can optionally use [this PPA](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel) to get the latest version of Tesseract:
 
-```
+```bash
 sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
 sudo apt-get install -y libtesseract-dev tesseract-ocr-eng
 ```
 
 On Fedora you need [tesseract-devel](https://src.fedoraproject.org/rpms/tesseract) and
 [leptonica-devel](https://src.fedoraproject.org/rpms/leptonica)
 
-```
+```bash
 sudo yum install tesseract-devel leptonica-devel
 ````
 
 On RHEL and CentOS you need [tesseract-devel](https://src.fedoraproject.org/rpms/tesseract) and
 [leptonica-devel](https://src.fedoraproject.org/rpms/leptonica) from EPEL
 
-```
+```bash
 sudo yum install epel-release
 sudo yum install tesseract-devel leptonica-devel
 ````
 
 On OS-X use [tesseract](https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb) from Homebrew:
 
-```
+```bash
 brew install tesseract
 ```
 
@@ -109,10 +109,8 @@ tesseract_download('fra')
 ```
 
 On Linux you need to install the appropriate training data from your distribution.
-For example to install the spanish training data:
-
-- [tesseract-ocr-spa](https://packages.debian.org/testing/tesseract-ocr-spa) (Debian, Ubuntu)
-- [tesseract-langpack-spa](https://src.fedoraproject.org/rpms/tesseract-langpack) (Fedora, EPEL)
+For example to install the spanish training data you need tesseract-ocr-spa
+(Debian, Ubuntu) or tesseract-langpack-spa (Fedora, EPEL).
 
 Alternatively you can manually download training data from [github](https://github.com/tesseract-ocr/tessdata)
 and store it in a path on disk that you pass in the `datapath` parameter or set a default path via the
@@ -121,7 +119,7 @@ training data format. Make sure to download training data from the branch that m
 
 ## Testing with docker (development)
 
-```
+```bash
 mkdir check
 docker run -v `pwd`/check:/check ghcr.io/r-hub/containers/clang19:latest apt install apt-utils libcurl4-openssl-dev &\
 R -q -e "install.packages(c('Rcpp', 'jsonlite', 'curl', 'httr', 'yaml', 'rex', 'digest', 'crayon', 'withr', 'cli', 'magick', 'processx', 'tibble', 'V8', 'testthat', 'mockery', 'whoami', 'covr', 'asciicast'), repos = 'https://cloud.r-project.org')" &\

README.md

Lines changed: 31 additions & 20 deletions
@@ -75,42 +75,47 @@ library (see below).
 
 ### Install from source
 
-On Debian or Ubuntu install
-[libtesseract-dev](https://packages.debian.org/testing/libtesseract-dev)
-and
-[libleptonica-dev](https://packages.debian.org/testing/libleptonica-dev).
-Also install
-[tesseract-ocr-eng](https://packages.debian.org/testing/tesseract-ocr-eng)
-to run examples.
+On Debian or Ubuntu install libtesseract-dev, libleptonica-dev, and
+tesseract-ocr-eng to run examples.
 
-sudo apt-get install -y libtesseract-dev libleptonica-dev tesseract-ocr-eng
+``` bash
+sudo apt-get install -y libtesseract-dev libleptonica-dev tesseract-ocr-eng
+```
 
 On Ubuntu you can optionally use [this
 PPA](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel)
 to get the latest version of Tesseract:
 
-sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
-sudo apt-get install -y libtesseract-dev tesseract-ocr-eng
+``` bash
+sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
+sudo apt-get install -y libtesseract-dev tesseract-ocr-eng
+```
 
 On Fedora you need
 [tesseract-devel](https://src.fedoraproject.org/rpms/tesseract) and
 [leptonica-devel](https://src.fedoraproject.org/rpms/leptonica)
 
-sudo yum install tesseract-devel leptonica-devel
+``` bash
+sudo yum install tesseract-devel leptonica-devel
+```
 
 On RHEL and CentOS you need
 [tesseract-devel](https://src.fedoraproject.org/rpms/tesseract) and
 [leptonica-devel](https://src.fedoraproject.org/rpms/leptonica) from
 EPEL
 
-sudo yum install epel-release
-sudo yum install tesseract-devel leptonica-devel
+``` bash
+sudo yum install epel-release
+sudo yum install tesseract-devel leptonica-devel
+```
 
 On OS-X use
 [tesseract](https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb)
 from Homebrew:
 
-brew install tesseract
+``` bash
+brew install tesseract
+```
 
 Tesseract uses training data to perform OCR. Most systems default to
 English training data. To improve OCR results for other languages you
@@ -122,12 +127,9 @@ tesseract_download('fra')
 ```
 
 On Linux you need to install the appropriate training data from your
-distribution. For example to install the spanish training data:
-
-- [tesseract-ocr-spa](https://packages.debian.org/testing/tesseract-ocr-spa)
-(Debian, Ubuntu)
-- [tesseract-langpack-spa](https://src.fedoraproject.org/rpms/tesseract-langpack)
-(Fedora, EPEL)
+distribution. For example to install the spanish training data you need
+tesseract-ocr-spa (Debian, Ubuntu) or tesseract-langpack-spa (Fedora,
+EPEL).
 
 Alternatively you can manually download training data from
 [github](https://github.com/tesseract-ocr/tessdata) and store it in a
@@ -136,3 +138,12 @@ path via the `TESSDATA_PREFIX` environment variable. Note that the
 Tesseract 4 and Tesseract 3 use different training data format. Make
 sure to download training data from the branch that matches your
 libtesseract version.
+
+## Testing with docker (development)
+
+``` bash
+mkdir check
+docker run -v `pwd`/check:/check ghcr.io/r-hub/containers/clang19:latest apt install apt-utils libcurl4-openssl-dev &\
+R -q -e "install.packages(c('Rcpp', 'jsonlite', 'curl', 'httr', 'yaml', 'rex', 'digest', 'crayon', 'withr', 'cli', 'magick', 'processx', 'tibble', 'V8', 'testthat', 'mockery', 'whoami', 'covr', 'asciicast'), repos = 'https://cloud.r-project.org')" &\
+r-check
+```
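
The package-name hint in the README change above can be turned into a small lookup. This is a minimal sketch and not part of the commit: `tess_lang_pkg` is a hypothetical helper, and it assumes that other language codes follow the same `tesseract-ocr-<lang>` (Debian, Ubuntu) and `tesseract-langpack-<lang>` (Fedora, EPEL) naming that the README shows for `spa`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the tesseract R package): print the name of
# the distribution package that should provide Tesseract training data for a
# given language code. The naming patterns are assumed from the README's
# Spanish example.
tess_lang_pkg() {
  local family="$1" lang="$2"
  case "$family" in
    debian|ubuntu) printf 'tesseract-ocr-%s\n' "$lang" ;;
    fedora|epel)   printf 'tesseract-langpack-%s\n' "$lang" ;;
    *)
      # Unknown family: report on stderr and signal failure to the caller.
      printf 'unsupported distribution family: %s\n' "$family" >&2
      return 1
      ;;
  esac
}
```

For example, `tess_lang_pkg debian spa` prints `tesseract-ocr-spa`, which could then be passed to `sudo apt-get install -y`. Checking that the package actually exists in the distribution's repositories is left to the package manager.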

docs/404.html

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/LICENSE.html

Lines changed: 2 additions & 2 deletions

docs/articles/index.html

Lines changed: 2 additions & 2 deletions

docs/articles/intro.html

Lines changed: 3 additions & 3 deletions

docs/authors.html

Lines changed: 6 additions & 6 deletions
