[Dataset]: International Standard Name Identifier (ISNI) #244

@wjbmattingly

Priority Level

Medium

Dataset Name

ISNI

Description

Overview

This issue proposes adding support for the International Standard Name Identifier (ISNI) dataset to the Lux pipeline. ISNI provides authoritative identification for the public identities of parties involved in content creation across the creative industries. Much of the information below comes from ISNI's website. I would like to thank drjwbaker for getting this issue started; the questions raised there helped us identify the new data dump files provided by ISNI.

What is ISNI?

ISNI (International Standard Name Identifier) is an ISO Standard (ISO 27729) that assigns unique identifiers to public identities of parties involved throughout the media content industries. ISNI identifies contributors to creative works such as:

  • Authors, writers, and creators
  • Publishers and imprints
  • Recording artists and performers
  • Researchers and academics
  • Organizations and institutions

Each ISNI consists of 16 characters: 15 digits followed by a check character, which is either a digit or "X". It provides persistent identification across different platforms and databases, enabling disambiguation of entities with similar names.
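Because the last character of an ISNI is a check character (ISO/IEC 7064 MOD 11-2, the same scheme ORCID documents for its identifiers), the pipeline could reject malformed identifiers before loading them. The following is a minimal sketch; the function names are illustrative and not part of the Lux pipeline.

```python
def isni_check_character(first15: str) -> str:
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - (total % 11)) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(isni: str) -> bool:
    """Validate a 16-character ISNI; spaces are ignored."""
    compact = isni.replace(" ", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return isni_check_character(compact[:15]) == compact[15]


# The two example identifiers from the ISNI data model tables in this issue
print(is_valid_isni("0000 0000 8045 6315"))
print(is_valid_isni("0000000123531945"))
```

A check like this only catches transcription errors; it says nothing about whether the ISNI has actually been assigned or deprecated.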

Benefits to Lux

  1. Authority Control: ISNI provides authoritative identification for creators and contributors
  2. Cross-Platform Linking: Enables connections to major library and cultural heritage databases
  3. Disambiguation: Resolves name conflicts for entities with similar names
  4. International Scope: Covers entities from creative industries worldwide
  5. Standard Compliance: Based on ISO 27729 international standard

Configuration Requirements

The pipeline would need configuration entries for:

  • Person data URL: https://isni.org/isni/data-person/data.jsonld
  • Organization data URL: https://isni.org/isni/data-organization/data.jsonld
  • Update schedule: Every 6 months
  • Data format: JSON-LD preferred for easier processing
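As a rough sketch, the entries listed above could be expressed as a configuration dictionary. The key names `input_files`, `records`, `url`, and `dumps_dir` mirror what the example downloader in this issue reads; `filename`, `update_interval_months`, and the `dumps_dir` path are assumptions made for illustration, and the real pipeline configuration schema may differ.

```python
import os

# Hypothetical configuration entry for ISNI (not the pipeline's confirmed schema)
isni_config = {
    "name": "isni",
    "dumps_dir": "/data/dumps/isni",  # assumed local path
    "update_interval_months": 6,      # assumed key; ISNI refreshes every 6 months
    "input_files": {
        "records": [
            {
                "url": "https://isni.org/isni/data-person/data.jsonld",
                "filename": "isni-persons.jsonld",
            },
            {
                "url": "https://isni.org/isni/data-organization/data.jsonld",
                "filename": "isni-organizations.jsonld",
            },
        ]
    },
}


def download_targets(config):
    """Pair each configured URL with a local path under dumps_dir."""
    return [
        {"url": rec["url"], "path": os.path.join(config["dumps_dir"], rec["filename"])}
        for rec in config["input_files"]["records"]
    ]


for target in download_targets(isni_config):
    print(target["url"], "->", target["path"])
```

Keeping the filename next to each URL avoids relying on the order of the `records` list.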

Dataset Characteristics

  • License: Creative Commons CC0 1.0 Universal Public Domain Dedication
  • Update Frequency: Every 6 months
  • URI Pattern: https://isni.org/isni/{ISNI} (where ISNI is the 16-digit identifier)
  • Size: Millions of person and organization records
  • No SPARQL endpoint currently available
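The URI pattern above suggests a pair of small helpers for moving between the display form of an ISNI (with spaces, sometimes prefixed with "ISNI") and its URI. This is only a sketch: the helper names are hypothetical, and it assumes the URIs in the dump use exactly the `https://isni.org/isni/` prefix shown above. The final character is allowed to be "X" because it is a check character.

```python
import re

ISNI_URI_PREFIX = "https://isni.org/isni/"
_ISNI_RE = re.compile(r"^\d{15}[\dX]$")


def normalize_isni(value: str) -> str:
    """Return the compact 16-character form, dropping 'ISNI', spaces, and hyphens."""
    compact = re.sub(r"[\s-]", "", value.upper())
    if compact.startswith("ISNI"):
        compact = compact[4:]
    if not _ISNI_RE.match(compact):
        raise ValueError(f"not a 16-character ISNI: {value!r}")
    return compact


def isni_to_uri(value: str) -> str:
    """Build the ISNI URI from any accepted spelling of the identifier."""
    return ISNI_URI_PREFIX + normalize_isni(value)


def uri_to_isni(uri: str) -> str:
    """Extract the compact ISNI from a URI that follows the pattern above."""
    if not uri.startswith(ISNI_URI_PREFIX):
        raise ValueError(f"not an ISNI URI: {uri!r}")
    return normalize_isni(uri[len(ISNI_URI_PREFIX):])


print(isni_to_uri("ISNI 0000 0000 8045 6315"))
```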

Data Access Method

Data Sources

Available Formats

  • RDF/XML: https://isni.org/isni/data-person/data.rdf and https://isni.org/isni/data-organization/data.rdf
  • JSON-LD: https://isni.org/isni/data-person/data.jsonld and https://isni.org/isni/data-organization/data.jsonld

Data Format

ISNI RDF Data Model

Person Entity Schema

| RDF Property | Expected Value/Range | Definition | Cardinality |
| --- | --- | --- | --- |
| rdfs:label | Literal | 16 digit ISNI presented as stated in the ISNI ISO standard, e.g. ISNI 0000 0000 8045 6315 | 1 |
| rdf:type | Class | Always schema:Person | 1 |
| schema:alternateName | Literal | Name of the public identity | 1..* |
| schema:birthDate | Literal | Year of birth of the public identity | 0..1 |
| schema:deathDate | Literal | Year of death of the public identity | 0..1 |
| schema:identifier | Class | Always schema:PropertyValue | 1 |
| isni:hasDeprecatedISNI | Literal | Deprecated ISNI; 16 digits with no space | 0..* |
| owl:sameAs | Class | Entity identified by a URI and modelled as a real world object | 0..* |
| madsrdf:isIdentifiedByAuthority | Class | Entity identified by a URI and modelled as an authority/skos:Concept | 0..* |
| dcterms:source | Class | Entity identified by a non machine actionable URI, i.e. a URL | 0..* |

Organization or Group Entity Schema

| RDF Property | Expected Value/Range | Definition | Cardinality |
| --- | --- | --- | --- |
| rdfs:label | Literal | 16 digit ISNI presented as stated in the ISNI ISO standard, e.g. ISNI 0000 0001 2353 1945 | 1 |
| rdf:type | Class | Always schema:Organization | 1 |
| schema:alternateName | Literal | Name of the public identity | 1..* |
| schema:identifier | Class | Always schema:PropertyValue | 1 |
| isni:hasDeprecatedISNI | Literal | Deprecated ISNI; 16 digits with no space | 0..* |
| owl:sameAs | Class | Entity identified by a URI and modelled as a real world object | 0..* |
| madsrdf:isIdentifiedByAuthority | Class | Entity identified by a URI and modelled as an authority/skos:Concept | 0..* |
| dcterms:source | Class | Entity identified by a non machine actionable URI, i.e. a URL | 0..* |

Property Value Schema

Always a blank node:

| RDF Property | Expected Value/Range | Definition | Cardinality |
| --- | --- | --- | --- |
| rdf:type | Class | schema:PropertyValue | 1 |
| schema:propertyID | Class | Always the Wikidata identifier for the ISNI schema, i.e. http://www.wikidata.org/entity/Q423048 | 1 |
| schema:value | Literal | 16 digit ISNI, no blank spaces | 1 |
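To make the schema concrete, here is a sketch of how a person record and its property value node might look once the JSON-LD is parsed into Python, together with a helper that reads the ISNI from the `schema:identifier` node. The compact key spellings, the nesting of the blank node, and the placeholder name are assumptions; the actual serialization depends on the JSON-LD context ISNI ships with the dump. Only the ISNI itself is taken from the table above.

```python
ISNI_WIKIDATA_ID = "http://www.wikidata.org/entity/Q423048"

# Hypothetical record shape; the name is a placeholder, not real ISNI data
sample_person = {
    "@id": "https://isni.org/isni/0000000080456315",
    "@type": "schema:Person",
    "rdfs:label": "ISNI 0000 0000 8045 6315",
    "schema:alternateName": ["Placeholder, Name"],
    "schema:identifier": {
        "@type": "schema:PropertyValue",
        "schema:propertyID": {"@id": ISNI_WIKIDATA_ID},
        "schema:value": "0000000080456315",
    },
}


def isni_from_property_value(record):
    """Return schema:value from the PropertyValue node tagged as an ISNI, if any."""
    nodes = record.get("schema:identifier")
    if not isinstance(nodes, list):
        nodes = [nodes]
    for node in nodes:
        if not isinstance(node, dict):
            continue
        prop = node.get("schema:propertyID")
        prop_id = prop.get("@id") if isinstance(prop, dict) else prop
        if prop_id == ISNI_WIKIDATA_ID:
            return node.get("schema:value")
    return None


print(isni_from_property_value(sample_person))
```

Reading the identifier from the property value node rather than from the record URI gives a second, independent source to cross-check against.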

Entity Matching

The dataset includes links to:

  • LC/NACO (Library of Congress Name Authority Cooperative Program)
  • data.bnf.fr (Bibliothèque nationale de France)
  • Wikidata
  • MusicBrainz
  • National Library of Korea
  • National Assembly Library of Korea

Linking Properties Used

  • owl:sameAs: For resources modelled as real world objects (e.g., Wikidata)
  • madsrdf:isIdentifiedByAuthority: For resources modelled as authorities (e.g., Library of Congress)
  • dcterms:source: For non-machine actionable URLs
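The three linking properties carry different semantics, so a mapper would probably want to keep them apart rather than treat every link as an equivalent. A minimal sketch, assuming compact JSON-LD keys and values that are either bare URI strings or `{"@id": ...}` objects (the helper names and the example.org URIs are illustrative only):

```python
# Maps each linking property to the kind of target it points at
LINK_PROPERTIES = {
    "owl:sameAs": "real_world_object",
    "madsrdf:isIdentifiedByAuthority": "authority",
    "dcterms:source": "source_url",
}


def as_uri_list(value):
    """Normalize a JSON-LD value (string, object, or list of either) to a list of URIs."""
    if value is None:
        return []
    if not isinstance(value, list):
        value = [value]
    uris = []
    for item in value:
        uri = item.get("@id") if isinstance(item, dict) else item
        if isinstance(uri, str) and uri:
            uris.append(uri)
    return uris


def collect_links(record):
    """Group a record's outgoing links by the kind of resource they target."""
    return {kind: as_uri_list(record.get(prop)) for prop, kind in LINK_PROPERTIES.items()}


example = {
    "owl:sameAs": {"@id": "https://example.org/entity/1"},
    "madsrdf:isIdentifiedByAuthority": ["https://example.org/authority/a"],
}
print(collect_links(example))
```

Only the `real_world_object` group is an obvious candidate for Linked Art `equivalent`; how authority links and plain source URLs should be mapped is left open here.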

Technical Requirements

  • Review and approve the proposal (below)
  • Implement the downloader, loader, and mapper components
  • Add ISNI configuration to the pipeline
  • Test with sample data
  • Schedule regular updates aligned with ISNI's 6-month refresh cycle

Known Limitations

No response

Example Integration

1. Example Downloader

import os
from pipeline.process.base.downloader import BaseDownloader

class ISNIDownloader(BaseDownloader):
    """
    Person data URL: https://isni.org/isni/data-person/data.jsonld
    Organization data URL: https://isni.org/isni/data-organization/data.jsonld
    """
    def get_urls(self):
        person_url = self.config['input_files']["records"][0]['url']
        org_url = self.config['input_files']["records"][1]['url']
        dumps_dir = self.config['dumps_dir']
        
        person_path = os.path.join(dumps_dir, 'isni-persons.jsonld')
        org_path = os.path.join(dumps_dir, 'isni-organizations.jsonld')
        
        return [
            {"url": person_url, "path": person_path}, 
            {"url": org_url, "path": org_path}
        ]

2. Example Loader

import time

import ujson as json
from pipeline.process.base.loader import Loader

class ISNILoader(Loader):
    
    def extract_identifier(self, record):
        """Extract ISNI identifier from the record URI"""
        uri = record.get('@id', '')
        if 'isni.org/isni/' in uri:
            return uri.split('/')[-1]
        return None

    def load(self):
        """Load ISNI JSON-LD data"""
        start = time.time()
        record_count = 0
        
        with open(self.in_path, 'r', encoding='utf-8') as fh:
            data = json.load(fh)
            
            # Handle different JSON-LD structures
            if '@graph' in data:
                records = data['@graph']
            elif isinstance(data, list):
                records = data
            else:
                records = [data]
            
            for record in records:
                identifier = self.extract_identifier(record)
                if identifier:
                    self.out_cache[identifier] = record
                    record_count += 1
                    
                    if record_count % 10000 == 0:
                        elapsed = time.time() - start
                        rate = record_count / elapsed
                        print(f"Processed {record_count} records in {elapsed:.2f}s ({rate:.2f}/s)")
        
        print(f"Loaded {record_count} ISNI records")
        self.out_cache.commit()
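The shape handling and identifier extraction in the loader can be exercised without the pipeline by factoring them into a plain generator. This is a sketch under the same assumptions as the loader (records carry an `@id` containing `isni.org/isni/`); the function name is hypothetical. Note that it still expects the whole document in memory, which may not be practical for dumps with millions of records.

```python
def iter_isni_records(data):
    """Yield (identifier, record) pairs from a parsed JSON-LD document."""
    if isinstance(data, dict) and "@graph" in data:
        records = data["@graph"]
    elif isinstance(data, list):
        records = data
    else:
        records = [data]

    for record in records:
        if not isinstance(record, dict):
            continue
        uri = record.get("@id", "")
        if "isni.org/isni/" not in uri:
            continue  # e.g. blank nodes such as PropertyValue entries
        identifier = uri.rstrip("/").rsplit("/", 1)[-1]
        if identifier:
            yield identifier, record


doc = {
    "@graph": [
        {"@id": "https://isni.org/isni/0000000080456315", "@type": "schema:Person"},
        {"@id": "_:b0", "@type": "schema:PropertyValue"},
    ]
}
print([identifier for identifier, _ in iter_isni_records(doc)])
```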

3. Example Mapper

from pipeline.process.base.mapper import Mapper
from cromulent import model, vocab

class ISNIMapper(Mapper):
    def __init__(self, config):
        Mapper.__init__(self, config)
        self.factory.auto_assign_id = False
    
    def guess_type(self, data):
        """Determine entity type from RDF type"""
        rdf_type = data.get("@type", [])
        if isinstance(rdf_type, str):
            rdf_type = [rdf_type]
        
        if "schema:Person" in rdf_type or "Person" in rdf_type:
            return model.Person
        elif "schema:Organization" in rdf_type or "Organization" in rdf_type:
            return model.Group
        return model.Person  # Default fallback
    
    def extract_isni_number(self, uri):
        """Extract 16-digit ISNI from URI"""
        if 'isni.org/isni/' in uri:
            return uri.split('/')[-1]
        return None
    
    def format_isni_display(self, isni):
        """Format ISNI for display: 0000 0000 0000 0000"""
        if len(isni) == 16:
            return f"{isni[:4]} {isni[4:8]} {isni[8:12]} {isni[12:16]}"
        return isni
    
    def parse_person(self, record):
        """Map ISNI person record to Linked Art Person"""
        uri = record.get('@id', '')
        isni_number = self.extract_isni_number(uri)
        if not isni_number:
            return None
        
        top = model.Person(ident=uri)
        
        # Add ISNI as identifier
        isni_id = vocab.LocalNumber(content=self.format_isni_display(isni_number))
        isni_id.assigned_by = model.AttributeAssignment()
        isni_id.assigned_by.carried_out_by = model.Group(ident="https://isni.org/", _label="ISNI International Agency")
        top.identified_by = isni_id
        
        # Add names from schema:alternateName
        alt_names = record.get('schema:alternateName', [])
        if isinstance(alt_names, str):
            alt_names = [alt_names]
        
        if alt_names:
            # First name as primary
            primary_name = vocab.PrimaryName(content=alt_names[0])
            top.identified_by = primary_name
            
            # Rest as alternate names
            for name in alt_names[1:]:
                alt_name = vocab.AlternateName(content=name)
                top.identified_by = alt_name
        
        # Add birth date
        birth_date = record.get('schema:birthDate')
        if birth_date:
            birth = model.Birth()
            birth.timespan = model.TimeSpan()
            birth.timespan.identified_by = vocab.DisplayName(content=str(birth_date))
            top.born = birth
        
        # Add death date
        death_date = record.get('schema:deathDate')
        if death_date:
            death = model.Death()
            death.timespan = model.TimeSpan()
            death.timespan.identified_by = vocab.DisplayName(content=str(death_date))
            top.died = death
        
        # Add external equivalents
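        same_as = record.get('owl:sameAs', [])
        if isinstance(same_as, (str, dict)):
            same_as = [same_as]  # a single value may be a bare URI or one object
        for equiv in same_as:
            # Accept either a URI string or an {"@id": ...} object
            equiv_uri = equiv.get('@id') if isinstance(equiv, dict) else equiv
            if not equiv_uri:
                continue  # skip nodes without a usable URI
            top.equivalent = model.Person(ident=equiv_uri)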
        
        data = model.factory.toJSON(top)
        return {"identifier": isni_number, "data": data, "source": "isni"}
    
    def parse_organization(self, record):
        """Map ISNI organization record to Linked Art Group"""
        uri = record.get('@id', '')
        isni_number = self.extract_isni_number(uri)
        if not isni_number:
            return None
        
        top = model.Group(ident=uri)
        
        # Add ISNI as identifier
        isni_id = vocab.LocalNumber(content=self.format_isni_display(isni_number))
        isni_id.assigned_by = model.AttributeAssignment()
        isni_id.assigned_by.carried_out_by = model.Group(ident="https://isni.org/", _label="ISNI International Agency")
        top.identified_by = isni_id
        
        # Add names from schema:alternateName
        alt_names = record.get('schema:alternateName', [])
        if isinstance(alt_names, str):
            alt_names = [alt_names]
        
        if alt_names:
            # First name as primary
            primary_name = vocab.PrimaryName(content=alt_names[0])
            top.identified_by = primary_name
            
            # Rest as alternate names
            for name in alt_names[1:]:
                alt_name = vocab.AlternateName(content=name)
                top.identified_by = alt_name
        
        # Add external equivalents
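        same_as = record.get('owl:sameAs', [])
        if isinstance(same_as, (str, dict)):
            same_as = [same_as]  # a single value may be a bare URI or one object
        for equiv in same_as:
            # Accept either a URI string or an {"@id": ...} object
            equiv_uri = equiv.get('@id') if isinstance(equiv, dict) else equiv
            if not equiv_uri:
                continue  # skip nodes without a usable URI
            top.equivalent = model.Group(ident=equiv_uri)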
        
        data = model.factory.toJSON(top)
        return {"identifier": isni_number, "data": data, "source": "isni"}
    
    def transform(self, record, rectype=None, reference=False):
        if not rectype:
            rectype = self.guess_type(record)
        
        if rectype == model.Person or "Person" in str(rectype):
            return self.parse_person(record)
        elif rectype == model.Group or "Organization" in str(rectype):
            return self.parse_organization(record)
        else:
            return None
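The mapper records birth and death years only as display names on the timespan. Linked Art timespans usually also carry machine-readable bounds (`begin_of_the_begin` and `end_of_the_end`), so a year could be expanded into a full-year range before being assigned. The helper below is a sketch that assumes the values really are plain four-digit-or-shorter years, as the data model table states; the function name is hypothetical and anything else is left for the display name only.

```python
import re

_YEAR_RE = re.compile(r"^\d{1,4}$")


def year_to_bounds(value):
    """Expand a year into (begin, end) timestamps, or return None if it is not a plain year."""
    text = str(value).strip()
    if not _YEAR_RE.match(text):
        return None
    year = int(text)
    if year < 1:
        return None  # year 0 and BCE dates are out of scope for this sketch
    return (f"{year:04d}-01-01T00:00:00Z", f"{year:04d}-12-31T23:59:59Z")


print(year_to_bounds("1920"))
```

In the mapper, the returned pair would be assigned to `birth.timespan.begin_of_the_begin` and `birth.timespan.end_of_the_end` when it is not None.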

Labels: enhancement (New feature to add to the code)