[Enhancement]: Compressed CSV Versions of Reconciliation Indexes #213

@kkdavis14

Priority Level

Low

Background

We may want to provide compressed CSV versions of the reconciliation indexes, and/or copies of the LMDBs themselves, as options for the index_loader. Loading these is much faster than building the indexes from scratch each time, and would be very useful for others reproducing the LUX environment.

The reconciliation indexes are as follows:
URI index of VIAF
URI index of Wikidata
URI index of LCSH
URI index of LCNAF
URI index of AAT
URI index of ULAN

Name index of LCSH
Name index of LCNAF
Name index of AAT
Name index of ULAN
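
As a rough illustration of what a compressed CSV dump of one of these indexes could look like, here is a minimal sketch. It assumes each index can be iterated as (key, value) string pairs and written as a two-column CSV; the function name `export_index_to_csv_gz` and that layout are hypothetical, not something the pipeline currently defines.

```python
import csv
import gzip
from typing import Iterable, Tuple


def export_index_to_csv_gz(pairs: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (key, value) pairs to a gzip-compressed CSV and return the row count.

    Hypothetical helper: the two-column layout is an assumption about
    how a reconciliation index could be serialized.
    """
    count = 0
    # Text mode with newline="" lets the csv module control line endings.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for key, value in pairs:
            writer.writerow([key, value])
            count += 1
    return count
```

The pairs could come from any iterator over an existing index, so the same helper would serve both the URI indexes and the name indexes.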

Description

Produce downloads of the LMDB databases and compressed versions of the CSVs, and make these available somewhere in the codebase. The index_loader scripts will need to be updated with a flag/option to load from these files instead of re-indexing the source entirely.
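
One possible shape for the loader option is sketched below. The `--from-csv` flag, the function names, and the two-column (key, value) CSV layout are all assumptions made for illustration; the real index_loader interface may differ, and the `rebuild` callable stands in for the existing re-indexing path.

```python
import argparse
import csv
import gzip
from typing import Callable, Dict, List


def load_index_from_csv_gz(path: str) -> Dict[str, str]:
    """Read a gzip-compressed two-column CSV into a dict (assumed layout)."""
    index: Dict[str, str] = {}
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            if len(row) != 2:
                continue  # skip blank or malformed rows
            index[row[0]] = row[1]
    return index


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Load a reconciliation index")
    # Hypothetical flag: when given, skip re-indexing and load the dump instead.
    parser.add_argument("--from-csv", metavar="PATH", default=None,
                        help="load the index from a compressed CSV dump")
    return parser


def load_index(argv: List[str], rebuild: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Load from the dump if --from-csv is set, else fall back to rebuilding."""
    args = build_parser().parse_args(argv)
    if args.from_csv:
        return load_index_from_csv_gz(args.from_csv)
    return rebuild()
```

Keeping the rebuild path as the default means existing invocations would behave as before, and the dump is only used when explicitly requested.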

Notes about how to make these files available:
We could ask ITS to make the S3 bucket for downloads publicly available.

Tasks

  • Create compressed CSV versions of the reconciliation indexes
  • Make them available somewhere in the pipeline codebase
  • Update the index_loader scripts with an option to load from these files

Metadata

Assignees

Labels

Low (Low priority task), enhancement (New feature to add to the code)
