End-to-end, mostly automatic pipeline for turning your own stereo masters into aligned, labeled multi-track MIDI suitable for model training.
Stereo in → stems → tempo/meter → transcription → canonical tracks → (optional) key normalization → cleaned multi-track MIDI
- Splits each track into stems:
  `vocals`, `drums`, `bass`, `guitar`, `other`
- Outputs:
  - Stems: `data/stems/<Song>/...`
  - Manifest: `manifests/<Song>.json`
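For reference, a minimal sketch of how the separation step might call Demucs (the `htdemucs_6s` model name, the `separate_stems` helper, and the output layout shown here are assumptions, not necessarily what the pipeline does):

```python
# Sketch only: invokes the Demucs CLI via subprocess; the pipeline's own
# separation step may use a different model or invocation.
import subprocess
from pathlib import Path

def separate_stems(song_path: str, out_dir: str = "data/stems") -> Path:
    """Run Demucs on one stereo file and return the folder holding its stems."""
    song = Path(song_path)
    subprocess.run(
        ["python", "-m", "demucs", "-n", "htdemucs_6s", "-o", out_dir, str(song)],
        check=True,
    )
    # Demucs writes <out_dir>/<model_name>/<song_name>/<stem>.wav by default.
    return Path(out_dir) / "htdemucs_6s" / song.stem
```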
- Uses `librosa` (and optionally `madmom`, if installed) to estimate:
  - tempo (stored as `meter_key.tempo`)
  - downbeat positions
  - rough time signature
- Estimated tempo is reused downstream (e.g. as the Basic Pitch `midi_tempo`).
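A minimal sketch of the tempo/beat part of this step using `librosa` (the `estimate_meter` helper and the returned keys are assumptions; `madmom`, when installed, could refine downbeats):

```python
import librosa

def estimate_meter(audio_path: str) -> dict:
    """Estimate a global tempo and beat grid for one audio file."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return {
        "tempo": float(tempo),  # reused downstream, e.g. as Basic Pitch midi_tempo
        "beat_times": [float(t) for t in beat_times],
    }
```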
Run on stems with tempo-aware settings and cleanup.
- Basic Pitch → note events
- Vocal-specific tweaks:
  - higher onset/frame thresholds
  - minimum note length
  - merge same-pitch segments (reduce double hits)
  - squash tiny vibrato / slides
- Split into:
  - `voxlead` — highest active line
  - `voxbg` — background / harmonies
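A minimal sketch of what the vocal pass might look like (`predict` and its arguments are the real `basic-pitch` API; the specific threshold values, the `TEMPO` placeholder, and the `split_lead_bg` heuristic are illustrative assumptions):

```python
from basic_pitch import ICASSP_2022_MODEL_PATH
from basic_pitch.inference import predict

TEMPO = 120.0  # in the pipeline this would come from meter_key.tempo

model_output, midi_data, note_events = predict(
    "data/stems/YourSong/vocals.wav",
    ICASSP_2022_MODEL_PATH,
    onset_threshold=0.6,        # raised vs. the 0.5 default to cut spurious onsets
    frame_threshold=0.4,        # raised vs. the 0.3 default
    minimum_note_length=120.0,  # milliseconds; drops very short fragments
    midi_tempo=TEMPO,
)

def split_lead_bg(events):
    """Highest sounding line -> voxlead; overlapped lower notes -> voxbg."""
    lead, bg = [], []
    for e in events:
        start, end, pitch = e[0], e[1], e[2]
        covered = any(
            o is not e and o[2] > pitch and o[0] < end and o[1] > start
            for o in events
        )
        (bg if covered else lead).append(e)
    return lead, bg

voxlead, voxbg = split_lead_bg(note_events)
```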
- Basic Pitch on `bass`
  - Optional filtering to reduce obvious junk / octave errors
- Basic Pitch on `guitar`
  - Exported with a guitar-like GM program
- Basic Pitch on `other`
  - Treated as pads/synths/etc. with a pad-like GM program (see the sketch below)
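A minimal sketch of assigning GM programs to the pitched tracks with `pretty_midi` (the specific program choices and the `make_instrument` helper are assumptions):

```python
import pretty_midi

# Guitar-like and pad-like General MIDI programs; the pipeline may pick others.
GM_PROGRAMS = {
    "bass": pretty_midi.instrument_name_to_program("Electric Bass (finger)"),
    "guitar": pretty_midi.instrument_name_to_program("Electric Guitar (clean)"),
    "other": pretty_midi.instrument_name_to_program("Pad 2 (warm)"),
}

def make_instrument(label: str) -> pretty_midi.Instrument:
    return pretty_midi.Instrument(program=GM_PROGRAMS[label], name=label)
```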
Pitched transcription status is stored under:
"transcription": {
"pitched": {
"...": "..."
}
}
`steps/transcribe_drums.py` uses `adtof_pytorch` on the `drums` stem:
- Merges hits into a single `drums` kit
- Velocities derived from the stem's RMS (dynamic, not all-100), as sketched below
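A minimal sketch of deriving per-hit velocities from the drum stem's RMS envelope (frame/hop sizes, the velocity range, and the `velocities_from_rms` helper are assumptions):

```python
import librosa
import numpy as np

def velocities_from_rms(drum_stem: str, hit_times: list[float]) -> list[int]:
    """Map each detected hit time to a MIDI velocity scaled by the local RMS level."""
    y, sr = librosa.load(drum_stem, sr=None, mono=True)
    rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=512)[0]
    times = librosa.times_like(rms, sr=sr, hop_length=512)
    lo, hi = float(rms.min()), float(rms.max()) + 1e-9
    velocities = []
    for t in hit_times:
        i = int(np.argmin(np.abs(times - t)))
        level = (rms[i] - lo) / (hi - lo)            # scale to 0..1
        velocities.append(int(np.clip(30 + level * 97, 1, 127)))
    return velocities
```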
Drum transcription status is stored under:
"transcription": {
"drums": {
"...": "..."
}
}
`steps/assign_parts.py` maps detected parts into consistent labels:
`drums`, `voxlead`, `voxbg`, `bass`, `guitar`, `keys` (optional), `other`
Only non-empty tracks are kept (see the sketch below).
Recorded under:
"assignment": {
"tracks": {
"...": "..."
}
}
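A minimal sketch of the label filtering (the canonical label list matches the one above; the dict-based track format is an assumption):

```python
CANONICAL = ["drums", "voxlead", "voxbg", "bass", "guitar", "keys", "other"]

def assign_parts(detected: dict[str, list]) -> dict[str, list]:
    """Keep only canonical, non-empty tracks (label -> list of note events)."""
    return {label: notes for label, notes in detected.items()
            if label in CANONICAL and notes}
```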
`steps/key_normalize.py`:
- Detects a global key from the pitched notes (ignoring drums) using `music21` (sketched after this section)
- If enabled:
  - major-ish → transposed to C major
  - minor-ish → transposed to A minor
When enabled:
"key": {
"detected_tonic": "...",
"detected_mode": "...",
"normalized": true,
"transpose_semitones": <int>,
"target": "C major" | "A minor"
}
When disabled:
"key": {
"detected_tonic": "...",
"detected_mode": "...",
"normalized": false,
"transpose_semitones": 0,
"target": null,
"reason": "key normalization disabled via CLI"
}
- Key normalization is OFF by default. Enable it per run with `--normalize-key`.
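A minimal sketch of the key detection and transposition logic using `music21` (`analyze("key")` is the real API; the semitone math and the `detect_key` helper are assumptions):

```python
from music21 import converter, interval, pitch

def detect_key(midi_path: str) -> tuple[str, str, int]:
    """Return (tonic, mode, semitones to reach C major / A minor)."""
    score = converter.parse(midi_path)
    key = score.analyze("key")                       # Krumhansl-style key estimate
    target = "C" if key.mode == "major" else "A"
    semitones = interval.Interval(key.tonic, pitch.Pitch(target)).semitones % 12
    if semitones > 6:                                # prefer the shorter direction
        semitones -= 12
    return key.tonic.name, key.mode, semitones
```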
`steps/meter_apply.py` can inject simple time signature meta events when meter estimation is confident.
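A minimal sketch of what injecting a time signature could look like with `pretty_midi` (the confidence gate and the `apply_meter` helper are assumptions):

```python
import pretty_midi

def apply_meter(pm: pretty_midi.PrettyMIDI, numerator: int, denominator: int,
                confident: bool) -> None:
    """Add a single time-signature meta event at t=0 when the estimate is trusted."""
    if confident:
        pm.time_signature_changes.append(
            pretty_midi.TimeSignature(numerator, denominator, 0.0))
```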
`steps/clean_quantize.py`:
- Removes obvious junk events
- Applies gentle timing/length cleanup
- Tries not to destroy groove/feel
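A minimal sketch of the "gentle" timing cleanup idea: pull each note partway toward the nearest grid line instead of snapping hard (the grid size and strength values are assumptions):

```python
def soft_quantize(start: float, tempo: float, strength: float = 0.4,
                  grid_division: int = 4) -> float:
    """Nudge a note-start (seconds) toward the nearest 1/grid_division-beat line."""
    beat = 60.0 / tempo
    grid = beat / grid_division
    snapped = round(start / grid) * grid
    return start + strength * (snapped - start)  # strength=1.0 would be a hard snap
```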
`steps/write_midi.py` builds, for each song:
- One multi-track MIDI file: `data/midi/<Song>/<Song>.mid`
- Uses:
  - tempo from `meter_key.tempo`
  - one track per canonical class
  - `is_drum = True` for drums
  - track names = canonical labels
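A minimal sketch of the multi-track write with `pretty_midi` (the `write_song_midi` helper and the note-tuple format are assumptions):

```python
import pretty_midi

def write_song_midi(tracks: dict[str, list[tuple]], tempo: float, out_path: str) -> None:
    """tracks: canonical label -> list of (start_s, end_s, pitch, velocity)."""
    pm = pretty_midi.PrettyMIDI(initial_tempo=tempo)  # tempo from meter_key.tempo
    for label, notes in tracks.items():
        inst = pretty_midi.Instrument(program=0, is_drum=(label == "drums"), name=label)
        for start, end, pitch, velocity in notes:
            inst.notes.append(
                pretty_midi.Note(velocity=velocity, pitch=pitch, start=start, end=end))
        pm.instruments.append(inst)
    pm.write(out_path)
```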
- `python pipeline.py review-pending` surfaces items flagged for human review
- `steps/qc_render.py` provides optional utilities for quick audio/MIDI spot checks
Use Python 3.10 (this repo is tuned for it).
# 1) Create & activate venv
python3.10 -m venv .venv-ai-midi
source .venv-ai-midi/bin/activate
# 2) Install dependencies
pip install -r requirements.txt
Key dependencies (see `requirements.txt` for exact pins):
- Core: `numpy`, `typing-extensions`, `librosa`, `soundfile`, `scipy`, `pretty_midi`, `mido`
- Separation: `demucs>=4.0.0`
- Key detection: `music21`
- Transcription: `basic-pitch==0.2.6` (+ appropriate `tensorflow` for your platform)
- Drums: `adtof_pytorch`
- CLI / misc: `gradio`, `tqdm`, `pyyaml`
- Optional: `madmom` for extra beat/downbeat features
mkdir -p data/raw
cp /path/to/YourSong.wav data/raw/
Default (no key normalization):
python pipeline.py run-batch "data/raw/*.wav"
With key normalization (C major / A minor):
python pipeline.py run-batch "data/raw/*.wav" --normalize-key
For `YourSong.wav`:
- Stems: `data/stems/YourSong/...`
- Manifest: `manifests/YourSong.json`
- MIDI: `data/midi/YourSong/YourSong.mid`
See items flagged for human review:
python pipeline.py review-pending
Export all final MIDIs to a flat folder:
python pipeline.py export-midi --out out_midis/