Is your feature request related to a problem? Please describe.
When running a Bloqade simulation on GPUs (with CUDA and Adapt), the measure and rydberg_corr functions seem to be performed on the CPU rather than the GPU. I'm not sure whether rydberg_density and apply are affected as well.
Describe the solution you'd like
@Roger-luo suggests writing a GPU kernel for each of these functions.
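
To make the request more concrete, here is a rough CPU-side sketch of the kind of computation such a kernel would need to cover, using the Rydberg density as the example. It is not Bloqade code: the function name rydberg_density_sketch is hypothetical, and it assumes the state is a flat list of 2^n amplitudes in which site 0 is the least significant bit of the basis-state index, which may not match Bloqade's internal layout.

```python
def rydberg_density_sketch(amplitudes, site):
    """Return the expected Rydberg occupation <n_site> of a state vector.

    Assumption (not taken from Bloqade): `amplitudes` has length 2**n and
    bit `site` of the basis-state index marks whether that site is excited,
    with site 0 as the least significant bit.
    """
    total = 0.0
    for index, amp in enumerate(amplitudes):
        # Only basis states with the chosen site excited contribute.
        if (index >> site) & 1:
            total += abs(amp) ** 2
    return total


# Two sites, state with only site 0 excited (basis index 1).
single = [0.0, 1.0, 0.0, 0.0]
density_site0 = rydberg_density_sketch(single, 0)
density_site1 = rydberg_density_sketch(single, 1)

# Equal superposition over all four basis states.
uniform = [0.5, 0.5, 0.5, 0.5]
density_uniform = rydberg_density_sketch(uniform, 0)
```

Each loop iteration only reads one amplitude, so on a GPU this would presumably become one thread per amplitude followed by a sum reduction, which keeps the state on the device instead of copying it back to the CPU.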
Describe alternatives you've considered
Not calling measure during the evolution simulation.
Additional context