Priority Level
Low
Background
We may want to provide compressed CSV versions of the reconciliation indexes, and/or copies of the LMDB databases themselves, as input options for the index_loader. Loading from these files is much faster than building the indexes from scratch each time, and would be very useful for others reproducing the LUX environment.
The reconciliation indexes are as follows:
URI index of VIAF
URI index of Wikidata
URI index of LCSH
URI index of LCNAF
URI index of AAT
URI index of ULAN
Name index of LCSH
Name index of LCNAF
Name index of AAT
Name index of ULAN
Description
Provide downloads of the LMDB databases and compressed versions of the CSV files, and make them available somewhere in the codebase. The index_loader scripts will need to be updated with a flag/option to load from these files instead of re-indexing the source data entirely.
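As a rough illustration of the loader option, here is a minimal Python sketch. Everything named in it is an assumption made for the example: the `--from-dump` flag, the function names, and the two-column `key,value` layout of the dump are hypothetical and do not reflect the current index_loader code. The `put` callback stands in for whatever actually writes into the index (for instance an LMDB write), so the sketch stays independent of the storage backend.

```python
import argparse
import csv
import gzip
from pathlib import Path
from typing import Callable, Iterable, Iterator, Tuple


def read_index_dump(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs from a gzip-compressed CSV dump.

    Assumes a hypothetical two-column layout: key,value.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) != 2:
                raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
            yield row[0], row[1]


def load_index(rows: Iterable[Tuple[str, str]],
               put: Callable[[str, str], None]) -> int:
    """Feed every row to `put` and return the number of rows loaded."""
    count = 0
    for key, value in rows:
        put(key, value)
        count += 1
    return count


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with a hypothetical --from-dump option."""
    parser = argparse.ArgumentParser(description="Load a reconciliation index")
    parser.add_argument(
        "--from-dump",
        type=Path,
        default=None,
        help="load from a compressed CSV dump instead of re-indexing the source",
    )
    return parser
```

A loader script could then branch on `args.from_dump`: when it is set, call `load_index(read_index_dump(args.from_dump), put)`; otherwise fall back to the existing full re-index.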
Notes about how to make these files available:
We could ask ITS to make the S3 bucket for downloads publicly available.
Tasks
- Create compressed CSV versions of the reconciliation indexes
- Make them available somewhere in the pipeline codebase
- Update the index_loader scripts with an option to load from these files
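For the first task, a minimal sketch of the export step might look like the following. It assumes each index can be iterated as (key, value) string pairs and written as a two-column gzip-compressed CSV. The function name and file layout are hypothetical, and the sketch uses only the Python standard library rather than any existing pipeline code.

```python
import csv
import gzip
from pathlib import Path
from typing import Iterable, Tuple


def write_index_dump(rows: Iterable[Tuple[str, str]], path: Path) -> int:
    """Write (key, value) pairs to a gzip-compressed CSV; return the row count.

    `rows` is assumed to come from wherever the index currently lives,
    for example an iteration over an existing database.
    """
    count = 0
    with gzip.open(path, mode="wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        for key, value in rows:
            writer.writerow([key, value])
            count += 1
    return count
```

Using the `csv` module rather than joining strings by hand keeps values that contain commas or quotes (common in name indexes) intact when the file is read back.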