Commit b0e15ce

Taginfo update scripts (#382)
* Upload taginfo files in a directory
* Add URL path to download files
Update taginfo script
Fix script
Fix config for taginfo serviceAccount
* Upload only .db files from the main directory
Update taginfo script and readme
Update taginfo
Path to taginfo to save db files
Install cron in taginfo
Fetch data if required - taginfo
Update config
* Update env configs - taginfo
1 parent d6f06ed commit b0e15ce

7 files changed, +99 -52 lines changed

envs/.env.taginfo.example

Lines changed: 12 additions & 13 deletions
@@ -1,15 +1,14 @@
-#######################################
-# Environment variables for taginfo database
-#######################################
-URL_PLANET_FILE_STATE=https://planet.openhistoricalmap.org.s3.amazonaws.com/planet/state.txt
-URL_HISTORY_PLANET_FILE_STATE=https://planet.openhistoricalmap.org.s3.amazonaws.com/planet/full-history/state.txt
-URL_PLANET_FILE=https://planet.openhistoricalmap.org.s3.amazonaws.com/planet/planet-200526_0000.osm.pbf
-URL_HISTORY_PLANET_FILE=https://planet.openhistoricalmap.org.s3.amazonaws.com/planet/full-history/history-200526_0000.osh.pbf
-INSTANCE_URL=http://localhost:4567
-INSTANCE_NAME="OHM Taginfo"
-INSTANCE_DESCRIPTION="This is a <b>taginfo test instance</b>. Change this text in your <tt>taginfo-config.json</tt>."
-INSTANCE_ICON=https://www.openhistoricalmap.org/assets/ohm_logo-2d97749faddd5bd051d846ed1be0544aa7c92422b673eb43d2fd6edf3428986d.svg
-INSTANCE_CONTACT= "Anonymous"
+URL_PLANET_FILE_STATE=https://s3.amazonaws.com/osm-seed.org/planet/state.txt
+URL_HISTORY_PLANET_FILE_STATE=https://s3.amazonaws.com/osm-seed.org/planet/full-history/state.txt
+URL_PLANET_FILE='none'
+URL_HISTORY_PLANET_FILE='none'
+TIME_UPDATE_INTERVAL=7d
 TAGINFO_PROJECT_REPO=https://github.com/OpenHistoricalMap/taginfo-projects.git
 DOWNLOAD_DB='languages wiki'
-CREATE_DB='db projects chronology'
+CREATE_DB='db projects chronology'
+ENVIRONMENT=production
+INTERVAL_DOWNLOAD_DATA=7d
+FETCH_DB_FILES=false
+TAGINFO_DB_BASE_URL=https://osm-seed.s3.amazonaws.com/taginfo/staging
+AWS_S3_BUCKET=osm-seed
+

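This example now sets `INTERVAL_DOWNLOAD_DATA=7d` (and `TIME_UPDATE_INTERVAL=7d`), while `values.yaml` keeps `INTERVAL_DOWNLOAD_DATA: 3600`, and `start.sh` hands the value straight to `sleep "$INTERVAL_DOWNLOAD_DATA"`. Assuming the image's `sleep` is GNU coreutils (the Dockerfile installs packages with `apt-get`), a bare number means seconds and the suffixes `s`, `m`, `h` and `d` scale it. The helper below is hypothetical and not part of the commit; it only mirrors that rule for integer values so the two settings can be compared in seconds.

```bash
# Hypothetical helper (not in start.sh): convert an interval such as
# "3600", "50m", "20h" or "7d" into seconds, following GNU sleep's
# suffix rules (no suffix = seconds). Accepts integers only.
interval_to_seconds() {
  local value=$1
  local re='^([0-9]+)([smhd]?)$'
  if [[ ! "$value" =~ $re ]]; then
    echo "invalid interval: $value" >&2
    return 1
  fi
  # 10# forces base 10, so values like "08" are not parsed as octal
  local n=$((10#${BASH_REMATCH[1]})) unit=${BASH_REMATCH[2]}
  case "$unit" in
    ""|s) echo "$n" ;;
    m) echo $(( n * 60 )) ;;
    h) echo $(( n * 3600 )) ;;
    d) echo $(( n * 86400 )) ;;
  esac
}

interval_to_seconds 3600   # prints 3600
interval_to_seconds 7d     # prints 604800
```

Under these rules the example file syncs every 604800 seconds, 168 times less often than the Helm default of one hour.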
images/taginfo/Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ RUN apt-get update && apt-get install -y \
 jq \
 python3-pip \
 wget \
+cron \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

images/taginfo/README.md

Lines changed: 30 additions & 28 deletions
@@ -1,41 +1,43 @@
 # OSM-Seed taginfo

-We build a docker container for taginfo software, the container will start the web service and also process required files to create databases.
+Docker container for taginfo that runs the web service and processes PBF files to create databases.

 ## Environment Variables

-All environment variables are located at [`.env.taginfo.example`](./../../envs/.env.taginfo.example), make a copy and name it as `.env.tagninfo` to use in osm-seed.
+Copy [`.env.taginfo.example`](./../../envs/.env.taginfo.example) to `.env.taginfo` and configure:

-- `URL_PLANET_FILE_STATE`: Url to the state file, that contains the URL for the latest planet PBF file. e.g [`state.txt`](https://planet.openhistoricalmap.org.s3.amazonaws.com/planet/state.txt), This is no required in case you set the `URL_PLANET_FILE` env var
+### Planet Files
+- `URL_PLANET_FILE_STATE`: URL to state file with latest planet PBF URL (optional if `URL_PLANET_FILE` is set)
+- `URL_HISTORY_PLANET_FILE_STATE`: URL to state file with latest history PBF URL (optional if `URL_HISTORY_PLANET_FILE` is set)
+- `URL_PLANET_FILE`: Direct URL to planet PBF file
+- `URL_HISTORY_PLANET_FILE`: Direct URL to history PBF file

-- `URL_HISTORY_PLANET_FILE_STATE`: Url to the full history state file, that contains the URL for the latest full history planet PBF file. e.g [`state.txt`](https://planet.openhistoricalmap.org.s3.amazonaws.com/planet/full-history/state.txt), This is no required in case you set the `URL_HISTORY_PLANET_FILE` env var
+### Database Configuration
+- `TAGINFO_DB_BASE_URL`: Base URL to download SQLite database files. Downloads: projects-cache.db, selection.db, taginfo-chronology.db, taginfo-db.db, taginfo-history.db, taginfo-languages.db, taginfo-master.db, taginfo-projects.db, taginfo-wiki.db, taginfo-wikidata.db
+- Example: `https://osm-seed.org.s3.amazonaws.com/taginfo`

-- `URL_PLANET_FILE`: URL for the latest planet PBF file.
-- `URL_HISTORY_PLANET_FILE`: URL for the latest full history planet PBF file.
-- `TIME_UPDATE_INTERVAL` Interval time to update the databases, e.g: `50m` = every 50 minutes, `20h` = every 20 hours , `5d` = every 5 days
+- `DOWNLOAD_DB`: Which databases to download (e.g., `languages wiki` or `languages wiki projects chronology`)

-The following env vars are required in the instance to update the values at: https://github.com/taginfo/taginfo/blob/master/taginfo-config-example.json
+- `CREATE_DB`: Which databases to create from PBF files (e.g., `db projects` or `db projects chronology`)
+- `db` requires `URL_PLANET_FILE` or `URL_PLANET_FILE_STATE`
+- `projects` requires `TAGINFO_PROJECT_REPO`
+- `chronology` requires `URL_PLANET_FILE` or `URL_HISTORY_PLANET_FILE`

-- `OVERWRITE_CONFIG_URL`: config file with the values to update
+### Other
+- `TAGINFO_PROJECT_REPO`: Repository URL for taginfo projects (default: https://github.com/taginfo/taginfo-projects.git)
+- `OVERWRITE_CONFIG_URL`: URL to custom taginfo config JSON file
+- `INTERVAL_DOWNLOAD_DATA`: Interval to sync databases (e.g., `3600` for 1 hour, `7d` for 7 days)

-- `DOWNLOAD_DB`: Taginfo instances need 7 Sqlite databases to start up the web service, all of them can be downloaded from https://taginfo.openstreetmap.org/download. Or if you can download only some of them you can pass herec. e.g DOWNLOAD_DB=`languages wiki`, or DOWNLOAD_DB=`languages wiki projects chronology`.
-
-- `CREATE_DB`: If you want process you of data using the PBF files, you can pass the values. eg. CREATE_DB=`db projects` or CREATE_DB=`db projects chronology`.
-Note:
-- Value `db` require to pass `URL_PLANET_FILE` or `URL_PLANET_FILE_STATE`
-- Value `projects` require to pass `TAGINFO_PROJECT_REPO`
-- Value `chronology` require to pass `URL_PLANET_FILE` or `URL_HISTORY_PLANET_FILE`
-
-#### Running taginfo container
+## Running

 ```sh
-# Docker compose
-docker-compose run taginfo
-
-# Docker
-docker run \
---env-file ./envs/.env.taginfo \
--v ${PWD}/data/taginfo-data:/apps/data/ \
---network osm-seed_default \
--it osmseed-taginfo:v1
-```
+# Docker compose
+docker-compose run taginfo
+
+# Docker
+docker run \
+--env-file ./envs/.env.taginfo \
+-v ${PWD}/data/taginfo-data:/usr/src/app/data \
+--network osm-seed_default \
+-it osmseed-taginfo:v1
+```
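The `CREATE_DB` requirements listed in this README can be expressed as a fail-fast check. This is a sketch only: `check_create_db_requirements` is a hypothetical name, the commit adds no such check to `start.sh`, and any non-empty value (including the example file's placeholder `'none'`) counts as set.

```bash
# Hypothetical pre-flight check (not part of start.sh) encoding the
# CREATE_DB requirements from the README. Returns 1 if anything is missing.
check_create_db_requirements() {
  local missing=0 item
  # CREATE_DB is a space-separated list, so it is word-split on purpose
  for item in $CREATE_DB; do
    case "$item" in
      db)
        if [ -z "${URL_PLANET_FILE:-}" ] && [ -z "${URL_PLANET_FILE_STATE:-}" ]; then
          echo "db requires URL_PLANET_FILE or URL_PLANET_FILE_STATE" >&2
          missing=1
        fi ;;
      projects)
        if [ -z "${TAGINFO_PROJECT_REPO:-}" ]; then
          echo "projects requires TAGINFO_PROJECT_REPO" >&2
          missing=1
        fi ;;
      chronology)
        if [ -z "${URL_PLANET_FILE:-}" ] && [ -z "${URL_HISTORY_PLANET_FILE:-}" ]; then
          echo "chronology requires URL_PLANET_FILE or URL_HISTORY_PLANET_FILE" >&2
          missing=1
        fi ;;
      *)
        echo "unknown CREATE_DB value: $item" >&2
        missing=1 ;;
    esac
  done
  return "$missing"
}
```

Reporting every missing variable before returning, rather than stopping at the first, lets one run surface the whole configuration gap.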

images/taginfo/start.sh

Lines changed: 50 additions & 10 deletions
@@ -49,8 +49,8 @@ process_data() {
 mv $DATADIR/*.db $DATADIR/
 mv $DATADIR/*/*.db $DATADIR/
 # if AWS_S3_BUCKET is set upload data
-if ! aws s3 ls "s3://$AWS_S3_BUCKET/$ENVIRONMENT" 2>&1 | grep -q 'An error occurred'; then
-aws s3 sync $DATADIR/ s3://$AWS_S3_BUCKET/$ENVIRONMENT/ --exclude "*" --include "*.db"
+if ! aws s3 ls "s3://$AWS_S3_BUCKET/taginfo" 2>&1 | grep -q 'An error occurred'; then
+aws s3 sync $DATADIR/ s3://$AWS_S3_BUCKET/taginfo/ --exclude "*" --include "*.db"
 fi
 }

@@ -63,32 +63,72 @@ compress_files() {
 }

 download_db_files() {
-if ! aws s3 ls "s3://$AWS_S3_BUCKET/$ENVIRONMENT" 2>&1 | grep -q 'An error occurred'; then
-aws s3 sync "s3://$AWS_S3_BUCKET/$ENVIRONMENT/" "$DATADIR/"
-mv $DATADIR/*.db $DATADIR/
-mv $DATADIR/*/*.db $DATADIR/
-compress_files
+local base_url=$1
+
+if [ -z "$base_url" ]; then
+echo "Error: URL base is required for download_db_files"
+return 1
 fi
+
+# Ensure base_url ends with /
+if [[ ! "$base_url" =~ /$ ]]; then
+base_url="${base_url}/"
+fi
+
+# List of SQLite database files to download
+local db_files=(
+"projects-cache.db"
+"selection.db"
+"taginfo-chronology.db"
+"taginfo-db.db"
+"taginfo-history.db"
+"taginfo-languages.db"
+"taginfo-master.db"
+"taginfo-projects.db"
+"taginfo-wiki.db"
+"taginfo-wikidata.db"
+)
+
+echo "Downloading SQLite database files from: $base_url"
+
+for db_file in "${db_files[@]}"; do
+local file_url="${base_url}${db_file}"
+local output_path="${DATADIR}/${db_file}"
+
+echo "Downloading: $db_file"
+if wget -q --show-progress -O "$output_path" --no-check-certificate "$file_url"; then
+echo "Successfully downloaded: $db_file"
+else
+echo "Warning: Failed to download $db_file from $file_url"
+# Continue with other files even if one fails
+fi
+done
+
+echo "Database files download completed"
 }

 sync_latest_db_version() {
 while true; do
+download_db_files "$TAGINFO_DB_BASE_URL"
 sleep "$INTERVAL_DOWNLOAD_DATA"
-download_db_files
 done
 }

 start_web() {
 echo "Start...Taginfo web service"
-download_db_files
-cd $WORKDIR/taginfo/web && ./taginfo.rb & sync_latest_db_version
+cd $WORKDIR/taginfo/web && ./taginfo.rb
 }

 ACTION=$1
 # Overwrite the config file
 [[ ! -z ${OVERWRITE_CONFIG_URL} ]] && wget $OVERWRITE_CONFIG_URL -O /usr/src/app/taginfo-config.json
 updates_source_code
 if [ "$ACTION" = "web" ]; then
+# Start sync in background if enabled
+if [ "${FETCH_DB_FILES:-true}" = "true" ] && [ ! -z "$TAGINFO_DB_BASE_URL" ]; then
+sync_latest_db_version &
+fi
+# Start web server in foreground (so the loop can detect if it fails)
 start_web
 elif [ "$ACTION" = "data" ]; then
 process_data
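One caveat about the new `download_db_files`: `wget -O` opens the output file before the transfer completes, so a failed download can leave an empty or partial `.db` in place of the previous copy, even though the loop only prints a warning and moves on while `taginfo.rb` keeps serving from `$DATADIR`. A possible hardening, sketched below and not part of this commit, is to download into a temporary file next to the target and rename it only on success; `download_db_atomically` is a hypothetical name.

```bash
# Hypothetical alternative to the per-file wget call in download_db_files
# (not part of this commit): fetch into a temporary file beside the target
# and rename it over the target only when the download succeeded.
download_db_atomically() {
  local url=$1 dest=$2 tmp
  tmp=$(mktemp "${dest}.XXXXXX") || return 1
  if wget -q -O "$tmp" "$url" && [ -s "$tmp" ]; then
    chmod 644 "$tmp"       # mktemp creates the file with mode 0600
    mv -f "$tmp" "$dest"   # same directory, so rename(2) replaces atomically
    echo "Successfully downloaded: $(basename "$dest")"
  else
    rm -f "$tmp"           # the previous copy of $dest stays untouched
    echo "Warning: Failed to download $(basename "$dest") from $url" >&2
    return 1
  fi
}
```

Inside the existing loop, the `if wget ...; then ... else ... fi` block would reduce to `download_db_atomically "$file_url" "$output_path" || true`, keeping the continue-on-failure behavior. The original also passes `--no-check-certificate`; this sketch leaves TLS verification on.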

osm-seed/templates/taginfo/taginfo-configMap.yaml

Lines changed: 2 additions & 1 deletion
@@ -15,5 +15,6 @@ data:
 TIME_UPDATE_INTERVAL: {{ .Values.taginfo.env.TIME_UPDATE_INTERVAL | quote }}
 AWS_S3_BUCKET: {{ .Values.taginfo.env.AWS_S3_BUCKET | quote }}
 ENVIRONMENT: {{ .Values.taginfo.env.ENVIRONMENT | quote }}
-INTERVAL_DOWNLOAD_DATA: {{ .Values.taginfo.env.INTERVAL_DOWNLOAD_DATA | quote}}
+INTERVAL_DOWNLOAD_DATA: {{ .Values.taginfo.env.INTERVAL_DOWNLOAD_DATA | quote }}
+TAGINFO_DB_BASE_URL: {{ .Values.taginfo.env.TAGINFO_DB_BASE_URL | quote }}
 {{- end }}
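Assuming the taginfo pods load this ConfigMap as environment variables, exporting `TAGINFO_DB_BASE_URL` here is what lets the reworked `start.sh` start `sync_latest_db_version` in the background: the `web` branch requires that variable to be non-empty and `FETCH_DB_FILES` to be `true`. The condition can be isolated for testing as below; `should_sync_db_files` is a hypothetical name.

```bash
# Hypothetical helper (not in start.sh) mirroring the new condition in the
# "web" branch. ${FETCH_DB_FILES:-true} yields "true" when the variable is
# unset or empty, so the background sync is opt-out rather than opt-in.
should_sync_db_files() {
  [ "${FETCH_DB_FILES:-true}" = "true" ] && [ -n "${TAGINFO_DB_BASE_URL:-}" ]
}

# Usage, in the same shape as start.sh:
#   if should_sync_db_files; then sync_latest_db_version & fi
```

With the example `.env.taginfo` (`FETCH_DB_FILES=false`) the loop stays off; if the pod environment does not set `FETCH_DB_FILES` at all, the loop runs whenever `TAGINFO_DB_BASE_URL` has a value.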

osm-seed/templates/taginfo/taginfo-cronJob.yaml

Lines changed: 3 additions & 0 deletions
@@ -19,7 +19,10 @@ spec:
 spec:
 template:
 spec:
+{{- if .Values.taginfo.serviceAccount.enabled }}
 serviceAccountName: {{ .Values.taginfo.serviceAccount.name }}
+automountServiceAccountToken: true
+{{- end }}
 containers:
 - name: {{ .Release.Name }}-taginfo-job
 image: "{{ .Values.taginfo.image.name }}:{{ .Values.taginfo.image.tag }}"

osm-seed/values.yaml

Lines changed: 1 addition & 0 deletions
@@ -1030,6 +1030,7 @@ taginfo:
 ENVIRONMENT: development
 AWS_S3_BUCKET: taginfo
 INTERVAL_DOWNLOAD_DATA: 3600
+TAGINFO_DB_BASE_URL: https://planet.openhistoricalmap.org.s3.amazonaws.com/taginfo
 resources:
 enabled: false
 requests:
