Hi there.
I tried running this code on one of my machines with four RTX 3090 GPUs (24GB of memory each):
python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3]
I did not change any other part of this repo. However, I encountered a CUDA out-of-memory error saying that I needed more GPU memory. Later I modified the command as follows:
python script/run.py -c config/inductive/wn18rr.yaml --gpus [0]
and ran it on a machine with a single A100 GPU with 40GB of memory. The code ran successfully, using roughly 32GB of GPU memory. This really puzzles me: why does the code not utilize the combined 24GB × 4 = 96GB of GPU memory across the four cards, instead reporting a memory error? Is there something wrong with my setup?
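For context on my confusion, my (possibly wrong) understanding is that data-parallel launchers like `torch.distributed.launch` replicate the full model on every rank rather than pooling memory across cards, which would explain the numbers I am seeing. A tiny arithmetic sketch of that assumption, using the figures from above:

```python
# Assumption (not verified against this repo): under data-parallel training
# (e.g. PyTorch DDP), every GPU holds a full replica of the model and its
# activations, so per-GPU memory requirements do NOT pool across cards.
single_gpu_footprint_gb = 32   # observed usage on the 40GB A100 (from above)
per_card_capacity_gb = 24      # each RTX 3090
num_cards = 4

# The combined capacity looks large...
total_capacity_gb = per_card_capacity_gb * num_cards  # 96 GB

# ...but each card would still need to fit roughly the full footprint.
fits_on_one_3090 = single_gpu_footprint_gb <= per_card_capacity_gb

print(f"total capacity: {total_capacity_gb} GB")      # 96 GB in total
print(f"fits on a single 3090: {fits_on_one_3090}")   # False: 32 GB > 24 GB
```

If that assumption is right, the OOM on the 3090s would be expected regardless of how many cards are used; please correct me if the repo shards memory differently.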