End-to-end, mostly automatic pipeline for turning your own stereo masters into aligned, labeled multi-track MIDI suitable for model training.
Stereo in → stems → tempo/meter → transcription → canonical tracks → (optional) key normalization → cleaned multi-track MIDI
- Splits each track into stems:
  `vocals`, `drums`, `bass`, `guitar`, `other`
- Outputs:
  - Stems: `data/stems/<Song>/...`
  - Manifest: `manifests/<Song>.json`
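For reference, a minimal sketch of how the separation step might call Demucs (the `htdemucs_6s` model name, the `separate_stems` helper, and the output layout shown here are assumptions, not necessarily what the pipeline does):

```python
# Sketch only: invokes the Demucs CLI via subprocess; the pipeline's own
# separation step may use a different model or invocation.
import subprocess
from pathlib import Path

def separate_stems(song_path: str, out_dir: str = "data/stems") -> Path:
    """Run Demucs on one stereo file and return the folder holding its stems."""
    song = Path(song_path)
    subprocess.run(
        ["python", "-m", "demucs", "-n", "htdemucs_6s", "-o", out_dir, str(song)],
        check=True,
    )
    # Demucs writes <out_dir>/<model_name>/<song_name>/<stem>.wav by default.
    return Path(out_dir) / "htdemucs_6s" / song.stem
```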
- Uses `librosa` (and optionally `madmom`, if installed) to estimate:
  - tempo (stored as `meter_key.tempo`)
  - downbeat positions
  - rough time signature
- Estimated tempo is reused downstream (e.g. as the Basic Pitch `midi_tempo`).
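A minimal sketch of the tempo/beat part of this step using `librosa` (the `estimate_meter` helper and the returned keys are assumptions; `madmom`, when installed, could refine downbeats):

```python
import librosa

def estimate_meter(audio_path: str) -> dict:
    """Estimate a global tempo and beat grid for one audio file."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return {
        "tempo": float(tempo),  # reused downstream, e.g. as Basic Pitch midi_tempo
        "beat_times": [float(t) for t in beat_times],
    }
```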
Run on stems with tempo-aware settings and cleanup.
- Basic Pitch → note events
- Vocal-specific tweaks:
  - higher onset/frame thresholds
  - minimum note length
  - merge same-pitch segments (reduce double hits)
  - squash tiny vibrato / slides
- Split into:
  - `voxlead` — highest active line
  - `voxbg` — background / harmonies
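A minimal sketch of what the vocal pass might look like (`predict` and its arguments are the real `basic-pitch` API; the specific threshold values, the `TEMPO` placeholder, and the `split_lead_bg` heuristic are illustrative assumptions):

```python
from basic_pitch import ICASSP_2022_MODEL_PATH
from basic_pitch.inference import predict

TEMPO = 120.0  # in the pipeline this would come from meter_key.tempo

model_output, midi_data, note_events = predict(
    "data/stems/YourSong/vocals.wav",
    ICASSP_2022_MODEL_PATH,
    onset_threshold=0.6,        # raised vs. the 0.5 default to cut spurious onsets
    frame_threshold=0.4,        # raised vs. the 0.3 default
    minimum_note_length=120.0,  # milliseconds; drops very short fragments
    midi_tempo=TEMPO,
)

def split_lead_bg(events):
    """Highest sounding line -> voxlead; overlapped lower notes -> voxbg."""
    lead, bg = [], []
    for e in events:
        start, end, pitch = e[0], e[1], e[2]
        covered = any(
            o is not e and o[2] > pitch and o[0] < end and o[1] > start
            for o in events
        )
        (bg if covered else lead).append(e)
    return lead, bg

voxlead, voxbg = split_lead_bg(note_events)
```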
- Basic Pitch on `bass`
  - Optional filtering to reduce obvious junk / octave errors
- Basic Pitch on `guitar`
  - Exported with a guitar-like GM program
- Basic Pitch on `other`
  - Treated as pads/synths/etc. with a pad-like GM program (see the sketch below)
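A minimal sketch of assigning GM programs to the pitched tracks with `pretty_midi` (the specific program choices and the `make_instrument` helper are assumptions):

```python
import pretty_midi

# Guitar-like and pad-like General MIDI programs; the pipeline may pick others.
GM_PROGRAMS = {
    "bass": pretty_midi.instrument_name_to_program("Electric Bass (finger)"),
    "guitar": pretty_midi.instrument_name_to_program("Electric Guitar (clean)"),
    "other": pretty_midi.instrument_name_to_program("Pad 2 (warm)"),
}

def make_instrument(label: str) -> pretty_midi.Instrument:
    return pretty_midi.Instrument(program=GM_PROGRAMS[label], name=label)
```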
Pitched transcription status is stored under:
"transcription": {
"pitched": {
"...": "..."
}
}
`steps/transcribe_drums.py` uses `adtof_pytorch` on the `drums` stem:
- Merges hits into a single `drums` kit
- Velocities derived from the stem's RMS (dynamic, not all-100), as sketched below
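A minimal sketch of deriving per-hit velocities from the drum stem's RMS envelope (frame/hop sizes, the velocity range, and the `velocities_from_rms` helper are assumptions):

```python
import librosa
import numpy as np

def velocities_from_rms(drum_stem: str, hit_times: list[float]) -> list[int]:
    """Map each detected hit time to a MIDI velocity scaled by the local RMS level."""
    y, sr = librosa.load(drum_stem, sr=None, mono=True)
    rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=512)[0]
    times = librosa.times_like(rms, sr=sr, hop_length=512)
    lo, hi = float(rms.min()), float(rms.max()) + 1e-9
    velocities = []
    for t in hit_times:
        i = int(np.argmin(np.abs(times - t)))
        level = (rms[i] - lo) / (hi - lo)            # scale to 0..1
        velocities.append(int(np.clip(30 + level * 97, 1, 127)))
    return velocities
```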
Drum transcription status is stored under:
"transcription": {
"drums": {
"...": "..."
}
}
`steps/assign_parts.py` maps detected parts into consistent labels:
`drums`, `voxlead`, `voxbg`, `bass`, `guitar`, `keys` (optional), `other`
Only non-empty tracks are kept (see the sketch below).
Recorded under:
"assignment": {
"tracks": {
"...": "..."
}
}
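A minimal sketch of the label filtering (the canonical label list matches the one above; the dict-based track format is an assumption):

```python
CANONICAL = ["drums", "voxlead", "voxbg", "bass", "guitar", "keys", "other"]

def assign_parts(detected: dict[str, list]) -> dict[str, list]:
    """Keep only canonical, non-empty tracks (label -> list of note events)."""
    return {label: notes for label, notes in detected.items()
            if label in CANONICAL and notes}
```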
`steps/key_normalize.py`:
- Detects a global key from the pitched notes (ignoring drums) using `music21` (sketched after this section)
- If enabled:
  - major-ish → transposed to C major
  - minor-ish → transposed to A minor
When enabled:
"key": {
"detected_tonic": "...",
"detected_mode": "...",
"normalized": true,
"transpose_semitones": <int>,
"target": "C major" | "A minor"
}
When disabled:
"key": {
"detected_tonic": "...",
"detected_mode": "...",
"normalized": false,
"transpose_semitones": 0,
"target": null,
"reason": "key normalization disabled via CLI"
}
- Key normalization is OFF by default. Enable it per run with `--normalize-key`.
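A minimal sketch of the key detection and transposition logic using `music21` (`analyze("key")` is the real API; the semitone math and the `detect_key` helper are assumptions):

```python
from music21 import converter, interval, pitch

def detect_key(midi_path: str) -> tuple[str, str, int]:
    """Return (tonic, mode, semitones to reach C major / A minor)."""
    score = converter.parse(midi_path)
    key = score.analyze("key")                       # Krumhansl-style key estimate
    target = "C" if key.mode == "major" else "A"
    semitones = interval.Interval(key.tonic, pitch.Pitch(target)).semitones % 12
    if semitones > 6:                                # prefer the shorter direction
        semitones -= 12
    return key.tonic.name, key.mode, semitones
```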
`steps/meter_apply.py` can inject simple time signature meta events when meter estimation is confident.
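A minimal sketch of what injecting a time signature could look like with `pretty_midi` (the confidence gate and the `apply_meter` helper are assumptions):

```python
import pretty_midi

def apply_meter(pm: pretty_midi.PrettyMIDI, numerator: int, denominator: int,
                confident: bool) -> None:
    """Add a single time-signature meta event at t=0 when the estimate is trusted."""
    if confident:
        pm.time_signature_changes.append(
            pretty_midi.TimeSignature(numerator, denominator, 0.0))
```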
`steps/clean_quantize.py`:
- Removes obvious junk events
- Applies gentle timing/length cleanup
- Tries not to destroy groove/feel
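A minimal sketch of the "gentle" timing cleanup idea: pull each note partway toward the nearest grid line instead of snapping hard (the grid size and strength values are assumptions):

```python
def soft_quantize(start: float, tempo: float, strength: float = 0.4,
                  grid_division: int = 4) -> float:
    """Nudge a note-start (seconds) toward the nearest 1/grid_division-beat line."""
    beat = 60.0 / tempo
    grid = beat / grid_division
    snapped = round(start / grid) * grid
    return start + strength * (snapped - start)  # strength=1.0 would be a hard snap
```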
`steps/write_midi.py` builds, for each song:
- One multi-track MIDI file: `data/midi/<Song>/<Song>.mid`
- Uses:
  - tempo from `meter_key.tempo`
  - one track per canonical class
  - `is_drum = True` for drums
  - track names = canonical labels
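A minimal sketch of the multi-track write with `pretty_midi` (the `write_song_midi` helper and the note-tuple format are assumptions):

```python
import pretty_midi

def write_song_midi(tracks: dict[str, list[tuple]], tempo: float, out_path: str) -> None:
    """tracks: canonical label -> list of (start_s, end_s, pitch, velocity)."""
    pm = pretty_midi.PrettyMIDI(initial_tempo=tempo)  # tempo from meter_key.tempo
    for label, notes in tracks.items():
        inst = pretty_midi.Instrument(program=0, is_drum=(label == "drums"), name=label)
        for start, end, pitch, velocity in notes:
            inst.notes.append(
                pretty_midi.Note(velocity=velocity, pitch=pitch, start=start, end=end))
        pm.instruments.append(inst)
    pm.write(out_path)
```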
- `python pipeline.py review-pending` surfaces items flagged for human review
- `steps/qc_render.py` provides optional utilities for quick audio/MIDI spot checks
Use Python 3.10 (this repo is tuned for it).
# 1) Create & activate venv
python3.10 -m venv .venv-ai-midi
source .venv-ai-midi/bin/activate
# 2) Install dependencies
pip install -r requirements.txt
Key dependencies (see `requirements.txt` for exact pins):
- Core: `numpy`, `typing-extensions`, `librosa`, `soundfile`, `scipy`, `pretty_midi`, `mido`
- Separation: `demucs>=4.0.0`
- Key detection: `music21`
- Transcription: `basic-pitch==0.2.6` (+ appropriate `tensorflow` for your platform)
- Drums: `adtof_pytorch`
- CLI / misc: `gradio`, `tqdm`, `pyyaml`
- Optional: `madmom` for extra beat/downbeat features
mkdir -p data/raw
cp /path/to/YourSong.wav data/raw/
Default (no key normalization):
python pipeline.py run-batch "data/raw/*.wav"
With key normalization (C major / A minor):
python pipeline.py run-batch "data/raw/*.wav" --normalize-key
For `YourSong.wav`:
- Stems: `data/stems/YourSong/...`
- Manifest: `manifests/YourSong.json`
- MIDI: `data/midi/YourSong/YourSong.mid`
See items flagged for human review:
python pipeline.py review-pending
Export all final MIDIs to a flat folder:
python pipeline.py export-midi --out out_midis/