Checklist
Problem description
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
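The message above suggests `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch of one way to set it from inside the script, assuming the lines are placed at the very top of the entry point before `torch` is imported (the helper name `cuda_launch_blocking_enabled` is hypothetical and only there for a sanity check):

```python
import os

# Set this before CUDA is initialized; the safest place is before `import torch`.
# With blocking launches, the Python stack trace should point at the call that
# actually failed instead of a later, unrelated API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_launch_blocking_enabled() -> bool:
    """Return True if CUDA_LAUNCH_BLOCKING is set to "1" for this process."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"
```

The same effect can be had by exporting the variable in the shell before launching the script. Runs will be slower with this setting, so it is meant for debugging only.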
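Steps to reproduce
I modified local_chat.py to read questions from a JSONL file and process them in batch. After roughly 10 to 20 questions, the error above appears. Does anyone know how to resolve this?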
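To help narrow down which input triggers the failure, here is a rough sketch of a JSONL reader that yields each record together with its line number. The function name `iter_jsonl` is hypothetical and not part of local_chat.py, and the sketch assumes one JSON object per line:

```python
import json
from typing import Iterator, Tuple


def iter_jsonl(path: str) -> Iterator[Tuple[int, dict]]:
    """Yield (line_number, record) for each non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, but keep the original line numbering
            yield line_number, json.loads(line)
```

Printing the line number (with `flush=True`) before each question is sent to the model means the last number printed before the crash identifies the record to inspect.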
Environment
Single A100 GPU, Ubuntu 22.04, CPU: Intel(R) Xeon(R) Gold 6336Y CPU @ 2.40GHz