Request for step-by-step SFT (single & multi-task) pipeline guidance for MFTCoder #91

@alvi75

Description

Hello MFTCoder authors 👋,

First, thank you for releasing MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning and for sharing the code. I'm a third-year PhD student currently attempting to reproduce your SFT experiments (both the single-task and multi-task baselines) using this repo.

I've successfully set up the environment (conda, CUDA 12.x; multi-GPU available) and explored the repo (e.g., build_model.py, atorch_trainer.py). However, I'm still unclear about the exact SFT pipelines. Could you please clarify the points below or provide a minimal set of example scripts/configs?

What I’m Hoping To Clarify

  1. Data Loading & Formats
    • Expected JSON/JSONL schema per task (fields for input/output, roles, label masking); Sketch 1 below this list shows the record format I'm currently assuming.
    • Where task IDs / TASK2ID are defined and how they map to datasets.

  2. SFT-Single (SFT-S-*)
    • One concrete command (e.g., CodeLlama-13B-Python + QLoRA) to fine-tune on a single task (e.g., text-to-code or code completion).
    • Example config/flags (Sketch 2 below this list shows the hyperparameters I plan to start from) for:
      • optimizer, LR schedule
      • max sequence length
      • gradient accumulation
      • PEFT settings (LoRA/QLoRA)

  3. SFT-Mixed (Multi-task)
    • How to specify multiple datasets in one run (CLI flags vs config file).
    • Task sampling/mixing policy: uniform vs size-based?
    • How to switch between (Sketch 3 below this list shows my current reading of both schemes):
      • sample-count weighted loss
      • valid-token weighted loss
    • Any recommendations on per-task batch sizes or temperature scaling.

  4. Loss Functions
    • Confirmation that the SFT experiments used cross-entropy with weighted loss.
    • Whether focal loss and FAMO were excluded from the official SFT baseline results.
    • The exact flag names to enable (cf. Sketch 3 below this list):
      • weighted by valid tokens
      • weighted by samples

  5. Evaluation
    • Commands to evaluate on:
      • HumanEval / HumanEval-X
      • MBPP
      • CodeFuseEval
    • pass@k evaluation protocol, execution-based scoring, and seeds for reproducibility (Sketch 4 below this list shows the pass@k estimator I intend to use).

  6. Reproducibility
    • Example run logs or expected training curves.
    • Early stopping criteria and typical step counts.
    • Which of the two implementations (mftcoder_accelerate vs mftcoder_atorch) contains the canonical SFT scripts.
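
My Current Assumptions (Please Correct Anything Wrong)

Sketch 1 (for question 1): the JSONL record layout I am currently assuming for a single task, pieced together from the sample data in the repo and common instruction-tuning formats. Every field name here is a guess of mine rather than a statement about the actual schema; in particular I'm unsure whether data_name is what TASK2ID is built from, and how label masking (loss only on bot turns?) is controlled.

```python
import json

# My assumed layout of one training record (one JSONL line) for one task.
# All field names are guesses on my part; please point me to the real schema.
record = {
    "id": 0,
    "data_name": "text2code",   # guess: is this the key TASK2ID is derived from?
    "chat_rounds": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "human",  "content": "Write a Python function that reverses a string."},
        {"role": "bot",    "content": "def reverse(s: str) -> str:\n    return s[::-1]"},
    ],
}

# One record per line, one file per task (is that the expected layout?).
with open("text2code.train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```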
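
Sketch 2 (for question 2): the hyperparameters I plan to start from for the single-task CodeLlama-13B-Python + QLoRA run. The dictionary keys below are placeholders of my own, not the repo's actual config keys or CLI flags; mapping them onto the real training config is exactly what I'm asking for.

```python
# Planned hyperparameters for SFT-S on CodeLlama-13B-Python with QLoRA.
# Key names are my own placeholders, NOT MFTCoder's actual config keys/flags.
sft_single_plan = {
    "base_model": "codellama/CodeLlama-13b-Python-hf",
    "peft": {
        "type": "qlora",
        "lora_rank": 64,
        "lora_alpha": 16,
        "lora_dropout": 0.05,
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    },
    "optimizer": "adamw",
    "learning_rate": 2e-4,
    "lr_scheduler": "cosine_with_warmup",
    "warmup_steps": 100,
    "max_seq_length": 4096,
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_train_epochs": 2,
}
```

If any of these defaults are far from what you used for the 13B model, a pointer to your values would help a lot.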
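
Sketch 3 (for questions 3 and 4): my reading of the two loss-weighting schemes described in the paper, written out as plain PyTorch so you can tell me which one (if either) matches the repo's behaviour and which flag selects it. This is only a check on my understanding, not a claim about the actual implementation; label shifting for causal LM is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def mft_weighted_losses(logits, labels, task_ids, num_tasks, ignore_index=-100):
    """My reading of the two weighting schemes (not the repo's code).
    logits: [B, L, V]; labels: [B, L], ignore_index on non-target positions;
    task_ids: [B], integer task id of each sample."""
    # Per-token cross-entropy with no reduction, so it can be re-weighted per task.
    tok_loss = F.cross_entropy(
        logits.transpose(1, 2), labels,
        ignore_index=ignore_index, reduction="none",
    )                                                # [B, L]
    valid = (labels != ignore_index).float()         # [B, L], 1 where a loss exists

    token_weighted, sample_weighted = [], []
    for t in range(num_tasks):
        in_task = task_ids == t
        if not in_task.any():
            continue
        tl, v = tok_loss[in_task], valid[in_task]
        # (a) "valid-token weighted": every valid token of this task counts equally.
        token_weighted.append(tl.sum() / v.sum().clamp(min=1.0))
        # (b) "sample-count weighted": average within each sample, then over samples.
        per_sample = tl.sum(dim=1) / v.sum(dim=1).clamp(min=1.0)
        sample_weighted.append(per_sample.mean())

    # I assume the final objective averages one of these over tasks; which one do
    # the official SFT baselines use, and what is the corresponding flag called?
    return torch.stack(token_weighted).mean(), torch.stack(sample_weighted).mean()
```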
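
Sketch 4 (for question 5): the unbiased pass@k estimator from the original HumanEval paper (Chen et al., 2021), which is what I plan to report unless your evaluation scripts compute it differently. Could you confirm the number of generations per problem, sampling temperature, and seeds behind the published numbers?

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): n = generations per problem,
    c = generations passing all unit tests, k = evaluation budget."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per problem, 37 of which pass the tests.
print(pass_at_k(200, 37, 1), pass_at_k(200, 37, 10))
```

Thanks in advance!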
