[Dataset]: International Standard Name Identifier (ISNI)

### Priority Level

Medium

### Dataset Name

ISNI

### Description

## Overview

This issue proposes adding support for the **International Standard Name Identifier (ISNI)** dataset to the Lux pipeline. ISNI provides authoritative identification for the public identities of parties involved in content creation across the creative industries. Much of the information below comes from [ISNI's website](https://isni.org/page/linked-data/). I would like to thank [drjwbaker](https://github.com/drjwbaker) for getting this issue started; thanks to their questions, we were able to identify the new data dump files provided by ISNI.

## What is ISNI?

ISNI (International Standard Name Identifier) is an ISO standard (ISO 27729) that assigns unique identifiers to the public identities of parties involved throughout the media content industries. ISNI identifies contributors to creative works such as:

- Authors, writers, and creators
- Publishers and imprints  
- Recording artists and performers
- Researchers and academics
- Organizations and institutions

Each ISNI consists of 16 characters: 15 digits followed by a check character (a digit or `X`), e.g. `0000 0000 8045 6315`. ISNIs provide persistent identification across different platforms and databases, enabling disambiguation of entities with similar names.
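
To make the check character concrete, here is a minimal validation sketch. It assumes the ISO 7064 MOD 11-2 algorithm (the same scheme ORCID uses for its identifiers); `normalize_isni`, `isni_check_char`, and `is_valid_isni` are hypothetical helper names, not part of the Lux pipeline.

```python
DIGITS = set("0123456789")


def normalize_isni(value: str) -> str:
    """Strip an optional "ISNI" prefix and all spaces, e.g. "ISNI 0000 0000 8045 6315" -> "0000000080456315"."""
    value = value.strip().upper()
    if value.startswith("ISNI"):
        value = value[4:]
    return value.replace(" ", "")


def isni_check_char(base: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(value: str) -> bool:
    """True if value is 15 ASCII digits followed by the matching check character."""
    isni = normalize_isni(value)
    if len(isni) != 16 or not set(isni[:15]) <= DIGITS:
        return False
    return isni_check_char(isni[:15]) == isni[15]


print(is_valid_isni("ISNI 0000 0000 8045 6315"))  # True
print(is_valid_isni("0000000080456316"))          # False (wrong check character)
```

A check like this could let the loader reject malformed identifiers before they reach the cache.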

## Benefits to Lux

1. **Authority Control**: ISNI provides authoritative identification for creators and contributors
2. **Cross-Platform Linking**: Enables connections to major library and cultural heritage databases
3. **Disambiguation**: Resolves name conflicts for entities with similar names
4. **International Scope**: Covers entities from creative industries worldwide
5. **Standard Compliance**: Based on ISO 27729 international standard

## Configuration Requirements

The pipeline would need configuration entries for:
- Person data URL: `https://isni.org/isni/data-person/data.jsonld`
- Organization data URL: `https://isni.org/isni/data-organization/data.jsonld`
- Update schedule: Every 6 months
- Data format: JSON-LD (preferred over RDF/XML because it is easier to process)
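
A minimal sketch of such a configuration entry, written as a Python dict shaped to match what the example `ISNIDownloader` below reads (`input_files["records"][i]["url"]` and `dumps_dir`). The `name`, `dumps_dir`, and `update_interval_months` values and the `check_isni_config` helper are placeholders for illustration; the real Lux config layout may differ.

```python
ISNI_CONFIG = {
    "name": "isni",                   # hypothetical source name
    "dumps_dir": "/data/dumps/isni",  # placeholder path
    "update_interval_months": 6,      # hypothetical key: ISNI refreshes dumps every 6 months
    "input_files": {
        # Order matters for the example downloader: persons first, organizations second
        "records": [
            {"url": "https://isni.org/isni/data-person/data.jsonld"},
            {"url": "https://isni.org/isni/data-organization/data.jsonld"},
        ]
    },
}


def check_isni_config(config: dict) -> list:
    """Return a list of problems with the config (empty if it looks usable)."""
    problems = []
    records = config.get("input_files", {}).get("records", [])
    if len(records) != 2:
        problems.append("expected exactly two input records (persons, organizations)")
    for rec in records:
        url = rec.get("url", "")
        if not (url.startswith("https://isni.org/") and url.endswith(".jsonld")):
            problems.append(f"unexpected URL: {url!r}")
    if "dumps_dir" not in config:
        problems.append("missing dumps_dir")
    return problems


print(check_isni_config(ISNI_CONFIG))  # []
```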


## Dataset Characteristics
- **License**: Creative Commons CC0 1.0 Universal Public Domain Dedication
- **Update Frequency**: Every 6 months
- **URI Pattern**: `https://isni.org/isni/{ISNI}` (where `{ISNI}` is the 16-character identifier without spaces, e.g. `https://isni.org/isni/0000000080456315`)
- **Size**: Millions of person and organization records
- **No SPARQL endpoint currently available**
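
The URI pattern can be applied in both directions. A small sketch, assuming the identifier in the URI carries no spaces (as in `schema:value`); `isni_to_uri` and `uri_to_isni` are hypothetical helper names.

```python
from typing import Optional

ISNI_URI_PREFIX = "https://isni.org/isni/"


def isni_to_uri(isni: str) -> str:
    """Build the ISNI URI from an identifier, dropping any spaces."""
    return ISNI_URI_PREFIX + isni.replace(" ", "")


def uri_to_isni(uri: str) -> Optional[str]:
    """Return the 16-character ISNI from an ISNI URI, or None for other URIs."""
    marker = "isni.org/isni/"  # matches both http:// and https://
    if marker not in uri:
        return None
    candidate = uri.split(marker, 1)[1].rstrip("/")  # tolerate a trailing slash
    return candidate if len(candidate) == 16 else None


print(isni_to_uri("0000 0000 8045 6315"))                      # https://isni.org/isni/0000000080456315
print(uri_to_isni("http://isni.org/isni/0000000080456315/"))   # 0000000080456315
```

The length check keeps dump URLs such as `https://isni.org/isni/data-person` from being mistaken for identifiers.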

### Data Access Method

### Data Sources
- **Person data**: https://isni.org/isni/data-person
- **Organization data**: https://isni.org/isni/data-organization

### Available Formats
- RDF/XML: `https://isni.org/isni/data-person/data.rdf` and `https://isni.org/isni/data-organization/data.rdf`
- JSON-LD: `https://isni.org/isni/data-person/data.jsonld` and `https://isni.org/isni/data-organization/data.jsonld`
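
Because the dumps hold millions of records, one reasonable approach is to stream each file to disk rather than reading it into memory. A standard-library sketch; the `download_dump` name and the 1 MiB chunk size are illustrative assumptions, and the Lux `BaseDownloader` may already handle this.

```python
import os
import shutil
import urllib.request


def download_dump(url: str, path: str, chunk_size: int = 1 << 20) -> int:
    """Stream url to path in chunks and return the number of bytes written."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # Write to a temporary file so an interrupted download never leaves a partial file at path
    tmp_path = path + ".part"
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
    return os.path.getsize(path)


# Example (a large download):
# download_dump("https://isni.org/isni/data-person/data.jsonld", "dumps/isni-persons.jsonld")
```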



### Data Format

![ISNI RDF Data Model](https://isni.org/resources/images/pages/data-model-implemented.jpg)

### Person Entity Schema

| RDF Property | Expected Value/Range | Definition | Cardinality |
|--------------|---------------------|------------|-------------|
| `rdfs:label` | Literal | 16 digit ISNI presented as stated in the ISNI ISO standard, e.g. ISNI 0000 0000 8045 6315 | 1 |
| `rdf:type` | Class | Always `schema:Person` | 1 |
| `schema:alternateName` | Literal | Name of the public identity | 1..* |
| `schema:birthDate` | Literal | Year of birth of the public identity | 0..1 |
| `schema:deathDate` | Literal | Year of death of the public identity | 0..1 |
| `schema:identifier` | Class | Always `schema:PropertyValue` | 1 |
| `isni:hasDeprecatedISNI` | Literal | Deprecated ISNI; 16 digits with no space | 0..* |
| `owl:sameAs` | Class | Entity identified by a URI and modelled as a real world object | 0..* |
| `madsrdf:isIdentifiedByAuthority` | Class | Entity identified by a URI and modelled as an authority/skos:Concept | 0..* |
| `dcterms:source` | Class | Entity identified by a non machine actionable URI, i.e. a URL | 0..* |
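
To show how the Person properties above map onto a parsed JSON-LD node, here is a sketch that collects them into a plain dict. It assumes the dump uses compact `prefix:name` keys exactly as listed in the table (the example mapper below makes the same assumption); a differently compacted `@context` would need different keys. `summarize_person` is a hypothetical helper and the sample values are invented.

```python
def as_list(value):
    """Normalize a JSON-LD value (absent, single, or list) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_person(record: dict) -> dict:
    """Collect the Person-schema fields from one JSON-LD node."""
    return {
        "label": record.get("rdfs:label"),                                   # 1
        "names": as_list(record.get("schema:alternateName")),                # 1..*
        "birth_year": record.get("schema:birthDate"),                        # 0..1, year only
        "death_year": record.get("schema:deathDate"),                        # 0..1, year only
        "deprecated_isnis": as_list(record.get("isni:hasDeprecatedISNI")),   # 0..*
    }


# Invented sample values, for illustration only
sample = {
    "schema:alternateName": ["Example, Name", "Name Example"],
    "schema:birthDate": "1900",
}
print(summarize_person(sample)["names"])
```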

### Organization or Group Entity Schema

| RDF Property | Expected Value/Range | Definition | Cardinality |
|--------------|---------------------|------------|-------------|
| `rdfs:label` | Literal | 16 digit ISNI presented as stated in the ISNI ISO standard, e.g. ISNI 0000 0001 2353 1945 | 1 |
| `rdf:type` | Class | Always `schema:Organization` | 1 |
| `schema:alternateName` | Literal | Name of the public identity | 1..* |
| `schema:identifier` | Class | Always `schema:PropertyValue` | 1 |
| `isni:hasDeprecatedISNI` | Literal | Deprecated ISNI; 16 digits with no space | 0..* |
| `owl:sameAs` | Class | Entity identified by a URI and modelled as a real world object | 0..* |
| `madsrdf:isIdentifiedByAuthority` | Class | Entity identified by a URI and modelled as an authority/skos:Concept | 0..* |
| `dcterms:source` | Class | Entity identified by a non machine actionable URI, i.e. a URL | 0..* |
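
Both entity types can carry `isni:hasDeprecatedISNI`. One way to use it, sketched here as an assumption rather than existing pipeline behavior, is to build a lookup from each deprecated ISNI to the current record's ISNI so that older references still resolve. Record keys are assumed to match the tables above; `build_redirects` and `resolve_isni` are hypothetical.

```python
def build_redirects(records):
    """Map each deprecated ISNI (16 characters, no spaces) to the ISNI of the record that replaced it."""
    redirects = {}
    for record in records:
        uri = record.get("@id", "")
        if "isni.org/isni/" not in uri:
            continue
        current = uri.rstrip("/").split("/")[-1]
        deprecated = record.get("isni:hasDeprecatedISNI", [])
        if isinstance(deprecated, str):
            deprecated = [deprecated]
        for old in deprecated:
            redirects[old] = current
    return redirects


def resolve_isni(isni, redirects):
    """Follow deprecated-ISNI redirects to the current ISNI, stopping if a cycle is found."""
    seen = set()
    while isni in redirects and isni not in seen:
        seen.add(isni)
        isni = redirects[isni]
    return isni
```

Usage: build the map once over the loaded records, then call `resolve_isni(old_isni, redirects)` wherever another source cites an ISNI.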

### Property Value Schema

The value of `schema:identifier` is always a blank node with the following properties:

| RDF Property | Expected Value/Range | Definition | Cardinality |
|--------------|---------------------|------------|-------------|
| `rdf:type` | Class | `schema:PropertyValue` | 1 |
| `schema:propertyID` | Class | Always the Wikidata identifier for the ISNI schema, i.e. `http://www.wikidata.org/entity/Q423048` | 1 |
| `schema:value` | Literal | 16 digit ISNI – no blank spaces | 1 |
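
A sketch of reading the ISNI from the `schema:identifier` blank node, which can then be cross-checked against the record URI. It assumes the blank node is embedded in the record as a nested object with compact keys, and that `schema:propertyID` may appear either as a string or as an `{"@id": ...}` object; `identifier_from_property_value` is a hypothetical helper.

```python
from typing import Optional

ISNI_PROPERTY_ID = "http://www.wikidata.org/entity/Q423048"


def _id_of(value):
    """Return a JSON-LD object's @id, or the value itself if it is a plain string."""
    return value.get("@id") if isinstance(value, dict) else value


def identifier_from_property_value(record: dict) -> Optional[str]:
    """Return the ISNI from the schema:identifier PropertyValue node, or None."""
    nodes = record.get("schema:identifier", [])
    if not isinstance(nodes, list):
        nodes = [nodes]
    for node in nodes:
        if not isinstance(node, dict):
            continue  # a bare reference carries no value
        if _id_of(node.get("schema:propertyID")) != ISNI_PROPERTY_ID:
            continue
        value = node.get("schema:value")
        if isinstance(value, dict):  # typed or tagged literal: {"@value": ...}
            value = value.get("@value")
        return str(value).replace(" ", "") if value is not None else None
    return None


record = {
    "@id": "https://isni.org/isni/0000000080456315",
    "schema:identifier": {
        "@type": "schema:PropertyValue",
        "schema:propertyID": {"@id": ISNI_PROPERTY_ID},
        "schema:value": "0000000080456315",
    },
}
isni = identifier_from_property_value(record)
assert record["@id"].endswith(isni)  # URI and PropertyValue agree
```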


### Entity Matching

The dataset includes links to:
- **LC/NACO** (Library of Congress Name Authority Cooperative Program)
- **data.bnf.fr** (Bibliothèque nationale de France)
- **Wikidata**
- **MusicBrainz**
- **National Library of Korea**
- **National Assembly Library of Korea**
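
When mapping these links, it can help to know which source a URI points at. The sketch below classifies URIs by host name; it only covers hosts for LC, BnF, Wikidata, and MusicBrainz, and leaves out the two Korean libraries because their URI patterns are not documented here. `AUTHORITY_HOSTS` and `authority_for` are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host-to-source table; extend once the other sources' URI patterns are confirmed
AUTHORITY_HOSTS = {
    "id.loc.gov": "LC/NACO",
    "data.bnf.fr": "data.bnf.fr",
    "www.wikidata.org": "Wikidata",
    "wikidata.org": "Wikidata",
    "musicbrainz.org": "MusicBrainz",
}


def authority_for(uri: str) -> Optional[str]:
    """Return the linked source's name for a URI, or None if the host is not in the table."""
    host = urlparse(uri).hostname or ""  # urlparse lowercases the hostname
    return AUTHORITY_HOSTS.get(host)


print(authority_for("http://www.wikidata.org/entity/Q423048"))  # Wikidata
```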

### Linking Properties Used
- `owl:sameAs`: For resources modelled as real world objects (e.g., Wikidata)
- `madsrdf:isIdentifiedByAuthority`: For resources modelled as authorities (e.g., Library of Congress)
- `dcterms:source`: For non-machine actionable URLs
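
A sketch of collecting a record's outbound links grouped by these three properties, normalizing values that appear either as plain URI strings or as `{"@id": ...}` objects. The compact property names are assumed to match the dump; `collect_links` is a hypothetical helper.

```python
LINK_PROPERTIES = ("owl:sameAs", "madsrdf:isIdentifiedByAuthority", "dcterms:source")


def collect_links(record: dict) -> dict:
    """Return {property: [uri, ...]} for each linking property present on the record."""
    links = {}
    for prop in LINK_PROPERTIES:
        values = record.get(prop, [])
        if not isinstance(values, list):
            values = [values]
        uris = []
        for value in values:
            uri = value.get("@id") if isinstance(value, dict) else value
            if isinstance(uri, str) and uri:
                uris.append(uri)
        if uris:
            links[prop] = uris
    return links
```

Only `owl:sameAs` targets describe the same real-world entity, so they are the natural candidates for `equivalent` in the mapper; `madsrdf:isIdentifiedByAuthority` targets are authority records and `dcterms:source` values are web pages, so whether to link those at all is a design decision for the pipeline.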

### Technical Requirements

- [ ] Review and approve the proposal (below)
- [ ] Implement the downloader, loader, and mapper components
- [ ] Add ISNI configuration to the pipeline
- [ ] Test with sample data
- [ ] Schedule regular updates aligned with ISNI's 6-month refresh cycle
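
For the scheduling item, a minimal sketch of deciding whether a refresh is due. It assumes the pipeline records the date of the last successful ISNI download (a hypothetical input) and treats the interval as six calendar months; `add_months` and `isni_update_due` are hypothetical helpers.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months to d, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def isni_update_due(last_download: date, today: date, interval_months: int = 6) -> bool:
    """True once interval_months calendar months have passed since last_download."""
    return today >= add_months(last_download, interval_months)


# Hypothetical usage: the last download date would come from pipeline state
print(isni_update_due(date(2024, 1, 15), date(2024, 7, 15)))  # True
```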

### Known Limitations

_No response_

### Example Integration

### 1. Example Downloader

```python
import os
from pipeline.process.base.downloader import BaseDownloader

class ISNIDownloader(BaseDownloader):
    """
    Person data URL: https://isni.org/isni/data-person/data.jsonld
    Organization data URL: https://isni.org/isni/data-organization/data.jsonld
    """
    def get_urls(self):
        person_url = self.config['input_files']["records"][0]['url']
        org_url = self.config['input_files']["records"][1]['url']
        dumps_dir = self.config['dumps_dir']
        
        person_path = os.path.join(dumps_dir, 'isni-persons.jsonld')
        org_path = os.path.join(dumps_dir, 'isni-organizations.jsonld')
        
        return [
            {"url": person_url, "path": person_path}, 
            {"url": org_url, "path": org_path}
        ]
```

### 2. Example Loader

```python
import time
import ujson as json
from pipeline.process.base.loader import Loader

class ISNILoader(Loader):

    def extract_identifier(self, record):
        """Extract the ISNI from the record URI, e.g. https://isni.org/isni/0000000080456315"""
        uri = record.get('@id', '')
        if 'isni.org/isni/' in uri:
            # rstrip guards against a trailing slash yielding an empty identifier
            return uri.rstrip('/').split('/')[-1]
        return None

    def load(self):
        """Load ISNI JSON-LD data into the output cache"""
        start = time.time()
        record_count = 0

        # Note: json.load reads the whole dump into memory; with millions of
        # records a streaming parser may be needed.
        with open(self.in_path, 'r', encoding='utf-8') as fh:
            data = json.load(fh)

        # Handle different top-level JSON-LD structures
        if isinstance(data, dict) and '@graph' in data:
            records = data['@graph']
        elif isinstance(data, list):
            records = data
        else:
            records = [data]

        for record in records:
            identifier = self.extract_identifier(record)
            if not identifier:
                continue
            self.out_cache[identifier] = record
            record_count += 1

            if record_count % 10000 == 0:
                elapsed = time.time() - start
                rate = record_count / elapsed if elapsed else 0.0
                print(f"Processed {record_count} records in {elapsed:.2f}s ({rate:.2f}/s)")

        print(f"Loaded {record_count} ISNI records")
        self.out_cache.commit()
```

### 3. Example Mapper

```python
from pipeline.process.base.mapper import Mapper
from cromulent import model, vocab


def as_list(value):
    """Normalize a JSON-LD value (absent, single value, or list) to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


class ISNIMapper(Mapper):
    def __init__(self, config):
        Mapper.__init__(self, config)
        self.factory.auto_assign_id = False

    def guess_type(self, data):
        """Determine entity type from RDF type (bare, prefixed, or full IRI)"""
        rdf_types = as_list(data.get("@type"))
        if any(t.endswith("Person") for t in rdf_types):
            return model.Person
        if any(t.endswith("Organization") for t in rdf_types):
            return model.Group
        return model.Person  # Default fallback

    def extract_isni_number(self, uri):
        """Extract 16-character ISNI from URI (tolerates a trailing slash)"""
        if 'isni.org/isni/' in uri:
            return uri.rstrip('/').split('/')[-1]
        return None

    def format_isni_display(self, isni):
        """Format ISNI for display: 0000 0000 0000 0000"""
        if len(isni) == 16:
            return f"{isni[:4]} {isni[4:8]} {isni[8:12]} {isni[12:16]}"
        return isni

    def add_common_properties(self, top, record, isni_number, entity_class):
        """Add the ISNI identifier, names, and owl:sameAs equivalents shared by persons and groups"""
        # Add ISNI as identifier, assigned by the ISNI International Agency
        isni_id = vocab.LocalNumber(content=self.format_isni_display(isni_number))
        isni_id.assigned_by = model.AttributeAssignment()
        isni_id.assigned_by.carried_out_by = model.Group(ident="https://isni.org/", _label="ISNI International Agency")
        top.identified_by = isni_id

        # Add names from schema:alternateName: first as primary, the rest as alternates
        for idx, name in enumerate(as_list(record.get('schema:alternateName'))):
            name_class = vocab.PrimaryName if idx == 0 else vocab.AlternateName
            top.identified_by = name_class(content=name)

        # Add external equivalents; values may be URI strings or {"@id": ...} objects
        for equiv in as_list(record.get('owl:sameAs')):
            equiv_uri = equiv.get('@id') if isinstance(equiv, dict) else equiv
            if equiv_uri:  # skip nodes without an @id instead of passing a dict as ident
                top.equivalent = entity_class(ident=equiv_uri)

    def make_life_event(self, event_class, year):
        """Build a Birth or Death event whose timespan is labelled with the year"""
        event = event_class()
        event.timespan = model.TimeSpan()
        event.timespan.identified_by = vocab.DisplayName(content=str(year))
        return event

    def parse_person(self, record):
        """Map ISNI person record to Linked Art Person"""
        uri = record.get('@id', '')
        isni_number = self.extract_isni_number(uri)
        if not isni_number:
            return None

        top = model.Person(ident=uri)
        self.add_common_properties(top, record, isni_number, model.Person)

        # schema:birthDate / schema:deathDate hold years only
        birth_date = record.get('schema:birthDate')
        if birth_date:
            top.born = self.make_life_event(model.Birth, birth_date)
        death_date = record.get('schema:deathDate')
        if death_date:
            top.died = self.make_life_event(model.Death, death_date)

        data = model.factory.toJSON(top)
        return {"identifier": isni_number, "data": data, "source": "isni"}

    def parse_organization(self, record):
        """Map ISNI organization record to Linked Art Group"""
        uri = record.get('@id', '')
        isni_number = self.extract_isni_number(uri)
        if not isni_number:
            return None

        top = model.Group(ident=uri)
        self.add_common_properties(top, record, isni_number, model.Group)

        data = model.factory.toJSON(top)
        return {"identifier": isni_number, "data": data, "source": "isni"}

    def transform(self, record, rectype=None, reference=False):
        if not rectype:
            rectype = self.guess_type(record)

        if rectype == model.Person or "Person" in str(rectype):
            return self.parse_person(record)
        elif rectype == model.Group or "Organization" in str(rectype) or "Group" in str(rectype):
            return self.parse_organization(record)
        return None
```
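
The example mapper records birth and death years only as a display name on the `TimeSpan`. Since ISNI dates are year-only, the pipeline could also compute machine-readable bounds for Linked Art's `begin_of_the_begin` and `end_of_the_end` TimeSpan properties. A standalone sketch of that conversion, assuming values are plain four-digit years; anything else returns `None` so the display-name-only behavior is kept. `year_bounds` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

YEAR_RE = re.compile(r"\d{4}")


def year_bounds(value) -> Optional[Tuple[str, str]]:
    """Return (begin_of_the_begin, end_of_the_end) ISO timestamps covering a whole year, or None."""
    text = str(value).strip()
    if not YEAR_RE.fullmatch(text):
        return None  # unexpected format: keep only the display name
    return (f"{text}-01-01T00:00:00Z", f"{text}-12-31T23:59:59Z")


print(year_bounds("1950"))  # ('1950-01-01T00:00:00Z', '1950-12-31T23:59:59Z')
```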

