
Commit 7c77e7f

remove duplicated content in parallel-training.md
1 parent 8c794e5 commit 7c77e7f

File tree

1 file changed: 0 additions & 56 deletions

doc/train/parallel-training.md

Lines changed: 0 additions & 56 deletions
@@ -190,62 +190,6 @@ torchrun --rdzv_endpoint=node0:12321 --nnodes=2 --nproc_per_node=4 --node_rank=1

 ## Paddle Implementation {{ paddle_icon }}

-Currently, parallel training in the Paddle version is implemented in the form of Paddle Distributed Data Parallelism ([DDP](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/06_distributed_training/cluster_quick_start_collective_cn.html)).
-DeePMD-kit will decide whether to launch the training in parallel (distributed) mode or in serial mode depending on your execution command.
-
-### Dataloader and Dataset
-
-First, we establish a DeepmdData class for each system, which is consistent with the TensorFlow version at this level. Then, we create a dataloader for each system, resulting in the same number of dataloaders as the number of systems. Next, we create a dataset for the dataloaders obtained in the previous step. This allows us to query the data for each system through this dataset, while the iteration pointers for each system are maintained by their respective dataloaders. Finally, a dataloader is created for the outermost dataset.
-
-We achieve custom sampling methods using a weighted sampler. The length of the sampler is set to total_batch_num \* num_workers. The parameter "num_workers" defines the number of threads involved in multi-threaded loading, which can be modified by setting the environment variable NUM_WORKERS (default: min(8, ncpus)).
-
-> **Note** The underlying dataloader will use a distributed sampler in parallel mode, so that each GPU receives batches with different content, and a sequential sampler in serial mode. In the TensorFlow version, Horovod shuffles the dataset using different random seeds for the same purpose.
-
-```mermaid
-flowchart LR
-subgraph systems
-subgraph system1
-direction LR
-frame1[frame 1]
-frame2[frame 2]
-end
-subgraph system2
-direction LR
-frame3[frame 3]
-frame4[frame 4]
-frame5[frame 5]
-end
-end
-subgraph dataset
-dataset1[dataset 1]
-dataset2[dataset 2]
-end
-system1 -- frames --> dataset1
-system2 --> dataset2
-subgraph distributed sampler
-ds1[distributed sampler 1]
-ds2[distributed sampler 2]
-end
-dataset1 --> ds1
-dataset2 --> ds2
-subgraph dataloader
-dataloader1[dataloader 1]
-dataloader2[dataloader 2]
-end
-ds1 -- mini batch --> dataloader1
-ds2 --> dataloader2
-subgraph index[index on Rank 0]
-dl11[dataloader 1, entry 1]
-dl21[dataloader 2, entry 1]
-dl22[dataloader 2, entry 2]
-end
-dataloader1 --> dl11
-dataloader2 --> dl21
-dataloader2 --> dl22
-index -- for each step, choose 1 system --> WeightedSampler
---> dploaderset --> bufferedq[buffered queue] --> model
-```
-
 ### How to use

 We use [`paddle.distributed.fleet`](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/06_distributed_training/cluster_quick_start_collective_cn.html) to launch a DDP training session.
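The retained "How to use" line points to `paddle.distributed.fleet`. As a rough illustration only, not DeePMD-kit's actual training code, a collective (DDP) setup with fleet could look like the sketch below; the network, optimizer, and training step are placeholders.

```python
# Minimal sketch of a fleet-based DDP setup; the network and optimizer are
# placeholders, not DeePMD-kit's. Typically launched with something like
#   python -m paddle.distributed.launch --gpus "0,1,2,3" train.py
import paddle
from paddle.distributed import fleet

fleet.init(is_collective=True)  # set up the collective (DDP) environment

model = paddle.nn.Linear(10, 1)  # placeholder model
opt = paddle.optimizer.Adam(parameters=model.parameters())

# Wrap the model and optimizer so gradients are synchronized across ranks.
model = fleet.distributed_model(model)
opt = fleet.distributed_optimizer(opt)

# Placeholder training step: forward, backward, update.
x = paddle.randn([8, 10])
loss = model(x).mean()
loss.backward()
opt.step()
opt.clear_grad()
```

The exact launch command for DeePMD-kit may differ; refer to the linked PaddlePaddle collective training guide.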

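For the per-system dataloading scheme described in the removed "Dataloader and Dataset" section (one dataset and dataloader per system, with a weighted choice of system at every step), a rough sketch under assumed names could look as follows; `SystemDataset`, `make_loader`, and the random frame data are illustrative, not DeePMD-kit code.

```python
# Illustrative sketch of "one dataloader per system + weighted system choice";
# SystemDataset and the random frame arrays are made up for this example.
import numpy as np
from paddle.io import DataLoader, Dataset


class SystemDataset(Dataset):
    """One dataset per system; each item is a single frame."""

    def __init__(self, frames):
        self.frames = frames.astype("float32")

    def __getitem__(self, idx):
        return self.frames[idx]

    def __len__(self):
        return len(self.frames)


def make_loader(ds):
    # In parallel mode the real implementation would use a distributed
    # (batch) sampler here so that each rank sees different batches.
    return iter(DataLoader(ds, batch_size=4, shuffle=True))


# One dataset and one dataloader per system.
systems = [SystemDataset(np.random.rand(100, 3)), SystemDataset(np.random.rand(300, 3))]
loaders = [make_loader(ds) for ds in systems]

# Weight systems (here simply by frame count) and, at every training step,
# pick one system and pull the next mini batch from its dataloader.
probs = np.array([len(ds) for ds in systems], dtype=np.float64)
probs /= probs.sum()
rng = np.random.default_rng(0)
for step in range(10):
    idx = rng.choice(len(systems), p=probs)
    try:
        batch = next(loaders[idx])
    except StopIteration:  # system exhausted: restart its dataloader
        loaders[idx] = make_loader(systems[idx])
        batch = next(loaders[idx])
```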
0 commit comments
