Commit 0eb3320: cleaned and scaffolded
1 parent: cc2b07c

40 files changed: +1560 lines added, 0 removed

.idea/.gitignore

Lines changed: 8 additions & 0 deletions

.idea/FSDP-Multi-GPU-Training.iml

Lines changed: 8 additions & 0 deletions

.idea/inspectionProfiles/profiles_settings.xml

Lines changed: 6 additions & 0 deletions

.idea/misc.xml

Lines changed: 7 additions & 0 deletions

.idea/modules.xml

Lines changed: 8 additions & 0 deletions

.idea/vcs.xml

Lines changed: 6 additions & 0 deletions

Makefile

Whitespace-only changes.

common/configs/base_config.yaml

Lines changed: 35 additions & 0 deletions
```yaml
defaults:
  - fsdp_defaults  # For FSDP jobs only

training:
  batch_size: 4            # Unsloth: per-device batch size; FSDP: may use batch_size_per_gpu
  batch_size_per_gpu: 4    # FSDP trainer expects this
  grad_accum_steps: 1
  lr: 2e-5
  max_steps: 1000
  optimizer: adamw_torch

checkpoint:
  save_interval: 100
  output_dir: ./outputs

logging:
  wandb_project: "unsloth-qlora"
  log_interval: 10

data:
  name: "gbharti/finance-alpaca"
  prompt_template: |
    ### Instruction: {instruction}
    ### Input: {input}
    ### Response: {output}{eos_token}

model:
  name: "meta-llama/Llama-2-7b-hf"
  max_length: 4096
  load_in_4bit: true
  hf_token: null

fsdp:
  mixed_precision: true
  # sharding_strategy and other FSDP-specific options can be provided in strategy-specific configs
```
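The `defaults:` list here, together with the `# @package _global_` directive in `fsdp_defaults.yaml` below, follows Hydra's config-composition conventions. A minimal sketch of loading this config through Hydra's compose API; the loader script, the sample record, and the `</s>` eos token (Llama-2's tokenizer default) are illustrative assumptions, not part of this commit:

```python
# Minimal sketch (not in this commit): composing base_config.yaml with Hydra.
from hydra import compose, initialize
from omegaconf import OmegaConf

# config_path is resolved relative to this script's location.
with initialize(version_base=None, config_path="common/configs"):
    cfg = compose(config_name="base_config")

print(OmegaConf.to_yaml(cfg))  # merged result of base_config + fsdp_defaults

# Render one record through the prompt template.
sample = {
    "instruction": "Define a stock split.",
    "input": "",
    "output": "A stock split increases the share count without changing total value.",
    "eos_token": "</s>",  # assumed: Llama-2's eos token
}
print(cfg.data.prompt_template.format(**sample))
```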

common/configs/fsdp_defaults.yaml

Lines changed: 9 additions & 0 deletions
```yaml
# @package _global_
fsdp:
  sharding_strategy: "FULL_SHARD"
  mixed_precision: true
  activation_checkpointing: true

checkpoint:
  save_optimizer: false  # Saves VRAM
  use_sharded_state: true
```
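A sketch of how these keys could map onto PyTorch's FSDP wrapper. Only `sharding_strategy` and `mixed_precision` are wired up here, the bf16 dtype is an assumed choice (the YAML only toggles mixed precision on), and this is not the repo's actual trainer code:

```python
# Sketch: translating fsdp_defaults.yaml keys into torch.distributed.fsdp
# arguments. Requires an initialized process group (e.g. launched via torchrun).
import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

def wrap_with_fsdp(model: nn.Module, fsdp_cfg: dict) -> FSDP:
    mp = None
    if fsdp_cfg.get("mixed_precision"):
        mp = MixedPrecision(
            param_dtype=torch.bfloat16,   # assumed dtype; the config does not pin one
            reduce_dtype=torch.bfloat16,
            buffer_dtype=torch.bfloat16,
        )
    return FSDP(
        model,
        sharding_strategy=ShardingStrategy[fsdp_cfg["sharding_strategy"]],  # "FULL_SHARD"
        mixed_precision=mp,
    )
```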

common/docs/eval_protocol.md

Lines changed: 29 additions & 0 deletions
````markdown
# Evaluation Protocol

## Automated Metrics
1. **Training Loss** - Tracked every `log_interval` steps
2. **GPU Utilization** - Via CloudWatch metrics
3. **Memory Usage** - Peak VRAM/CPU recorded
4. **Gradient Metrics** - Norm distribution, kurtosis

## Manual Evaluation
For each checkpoint:
1. Run 10 prompts from `eval_samples.txt`
2. Score responses (1-5 scale) on:
   - **Accuracy**: Factual correctness
   - **Coherence**: Logical flow
   - **Conciseness**: Brevity of response
   - **Relevance**: Adherence to prompt

3. Use scoring template:
```json
{
  "prompt": "Explain quantum computing",
  "response": "...",
  "scores": {
    "accuracy": 4,
    "coherence": 5,
    "conciseness": 3,
    "relevance": 5
  }
}
```
````
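To close the loop on the template, a small sketch that aggregates a batch of such records into per-dimension mean scores; the `scores.jsonl` layout (one JSON record per line) is an assumption, not something the protocol above specifies:

```python
# Sketch: averaging manual-eval records shaped like the JSON template above.
# "scores.jsonl" (one record per line) is an assumed file layout.
import json
from collections import defaultdict

def mean_scores(path: str) -> dict:
    totals, count = defaultdict(float), 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            for dim, score in record["scores"].items():
                totals[dim] += score
            count += 1
    return {dim: total / count for dim, total in totals.items()}

print(mean_scores("scores.jsonl"))
```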
