Hi @anchitdharmw,
I seem to be having problems with large RAM usage and large negative losses. I'll start by explaining the RAM usage.
RAM Usage
I have attempted to train on my GPU (NVIDIA GeForce 1060, 6 GB VRAM) and quickly ran out of memory. I understand that Mask-RCNN is a large network and will most likely need more memory than that, so I can't complain there. However, when running on my CPU I see memory usage of 24 GB+. With only 2 workers and a mini-batch size of 2, I consistently see memory usage spike to ~20 GB. My images are only 512x512, so I can't see why this network would take so much memory. This also prevents me from fully utilizing my CPU or GPU, since memory is the limiting factor.
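If it helps, here is a minimal sketch of one way the host RAM could be tracked during training. `DummyDataset` and `total_rss_gb` are placeholder names of mine (not from this repo); the loader settings just mirror the 2 workers / batch size 2 above, and the RSS is summed over the main process and its DataLoader workers since each worker is a separate process.

```python
import os

import psutil
import torch
from torch.utils.data import DataLoader, Dataset


class DummyDataset(Dataset):
    """Stand-in dataset that yields random 512x512 RGB tensors (no annotations)."""

    def __len__(self):
        return 245

    def __getitem__(self, idx):
        return torch.rand(3, 512, 512)


def total_rss_gb() -> float:
    """Resident memory of this process plus its DataLoader workers, in GB."""
    proc = psutil.Process(os.getpid())
    rss = proc.memory_info().rss
    for child in proc.children(recursive=True):
        try:
            rss += child.memory_info().rss
        except psutil.NoSuchProcess:
            pass
    return rss / 1024 ** 3


if __name__ == "__main__":
    loader = DataLoader(DummyDataset(), batch_size=2, num_workers=2)

    for step, images in enumerate(loader):
        # ...model forward/backward would go here...
        if step % 10 == 0:
            print(f"step {step}: host RAM in use ~{total_rss_gb():.2f} GB")
```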
Negative Losses
I know there was an issue (which I encountered as well) with the negative variance, and I have applied the workaround you suggested. Even so, I am getting negative losses that hover around -2,000. I haven't been able to finish an epoch of training (245 images) due to the large memory usage and the time it takes. If the negative loss would sort itself out by letting training run for all 10 epochs, I'm happy to do that.
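For reference, here is a rough sketch of how I understand the variance constraint; `BoxVarianceHead` and `gaussian_nll` are my own illustration, not this repo's code. Predicting a log-variance and exponentiating it keeps the variance strictly positive, although a Gaussian negative log-likelihood can still legitimately dip below zero when the predicted variance gets very small, so maybe a negative loss isn't a bug on its own (though -2,000 seems extreme).

```python
import math

import torch
import torch.nn as nn


class BoxVarianceHead(nn.Module):
    """Illustrative head that predicts log-variance so the variance itself stays positive."""

    def __init__(self, in_features: int, num_outputs: int = 4):
        super().__init__()
        self.fc = nn.Linear(in_features, num_outputs)

    def forward(self, x):
        log_var = self.fc(x)
        # Clamping the log-variance avoids exp() overflow/underflow; exp() guarantees var > 0.
        return torch.exp(log_var.clamp(min=-10.0, max=10.0))


def gaussian_nll(pred, target, var):
    """Per-element Gaussian negative log-likelihood; note it can be negative for small var."""
    return 0.5 * (torch.log(2 * math.pi * var) + (pred - target) ** 2 / var)


# A small error paired with a small variance already gives a negative loss value:
print(gaussian_nll(torch.tensor(1.00), torch.tensor(1.01), torch.tensor(1e-3)))  # ~ -2.5
```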
For some context, here is the relevant information about my machine:
CPU: Intel Core i9-10850K (10 cores)
GPU: NVIDIA GeForce 1060 (6 GB VRAM)
RAM: 32 GB DDR4
Let me know if there is any more information you need from me.
Best,
Adam