Commit 04031a8

fix call to blendable dataset

Author: Raymond Li
1 parent ac497ce commit 04031a8

1 file changed: +2 -1 lines changed

megatron/data/gpt_dataset.py

Lines changed: 2 additions & 1 deletion
@@ -145,7 +145,8 @@ def build_dataset_group(dataset_group_name, paths, weights, splits, data_impl,
             assert ds is not None, \
                 f"Got an empty split when trying to create dataset: {prefixes[i], splits[i]}"
             datasets.append(ds)
-        all_datasets = BlendableDataset(datasets, weights)
+        total_size = sum(len(ds) for ds in datasets)
+        all_datasets = BlendableDataset(datasets, weights, total_size)
 
         return all_datasets
 
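
The change above suggests that the constructor expects a third argument giving the size of the blend, and that the caller now supplies the combined length of the per-split datasets. The sketch below illustrates that calling pattern only. ToyBlendableDataset is a hypothetical stand-in written for this note, not Megatron's BlendableDataset, and its index-building rule (pick the dataset furthest behind its weighted share) is an assumption made for illustration.

```python
# Hypothetical stand-in for illustration; NOT Megatron's BlendableDataset.
class ToyBlendableDataset:
    def __init__(self, datasets, weights, size):
        assert len(datasets) == len(weights)
        total_weight = sum(weights)
        assert total_weight > 0
        self.datasets = datasets
        self.size = size
        norm = [w / total_weight for w in weights]

        # Build `size` (dataset_index, sample_index) pairs. At each step,
        # choose the dataset that lags most behind its target share.
        self.index = []
        counts = [0] * len(datasets)
        for step in range(size):
            errors = [norm[d] * (step + 1) - counts[d]
                      for d in range(len(datasets))]
            d = errors.index(max(errors))
            self.index.append((d, counts[d]))
            counts[d] += 1

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        d, sample = self.index[idx]
        ds = self.datasets[d]
        # Wrap around if a dataset is asked for more samples than it holds.
        return ds[sample % len(ds)]


# Plain lists stand in for the per-split datasets collected in the loop.
datasets = [["a0", "a1", "a2"], ["b0"]]
weights = [0.75, 0.25]

# Same pattern as the fixed call: the blend size is the sum of the lengths.
total_size = sum(len(ds) for ds in datasets)
blended = ToyBlendableDataset(datasets, weights, total_size)
```

With the size passed explicitly, len(blended) equals the combined number of samples, so one pass over the blend draws as many items as the underlying datasets hold in total.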
