
Conversation

@danieljvickers
Member

@danieljvickers danieljvickers commented Nov 6, 2025

User description

Description

I have refactored the GPU macros to separate the OpenMP and OpenACC wrappers into two macros: one that starts a parallel loop, GPU_PARALLEL_LOOP, and one that ends the parallel region, END_GPU_PARALLEL_LOOP. This allows the preprocessor to insert new lines only at the start and end of GPU parallel loops, which means line markers can now progress naturally instead of every line inside a loop sharing the same line number.

I have also adjusted the macro files to remove the additional warnings that were previously printed for every file that uses GPU macros.

This eliminates over half of the warnings printed when the code compiles, and compiler errors now report the correct line numbers. I have tested this locally and found that the reported line numbers are correct, at least in the regions I checked. The result should be a much-improved developer experience.
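
As a minimal before/after sketch of the usage change (the loop bounds and iterator names below are placeholders, not taken from any particular MFC file):

    ! Old style: the loop body is passed to the wrapper macro, so the preprocessor
    ! re-emits it and every line inside the region maps to the macro call site.
    #:call GPU_PARALLEL_LOOP(collapse=3)
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    ! ... loop body ...
                end do
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP

    ! New style: the macros emit only the opening and closing directives, so the
    ! loop body keeps its own source lines and compiler diagnostics point at them.
    $:GPU_PARALLEL_LOOP(collapse=3, private='[j,k,l]')
    do l = 0, p
        do k = 0, n
            do j = 0, m
                ! ... loop body ...
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()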

Fixes #1028

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

I compiled locally and confirmed that the warnings were removed. I also made intentional changes that trigger compiler errors and saw that the correct line numbers were reported.

Test Configuration:

Checklist

  • I ran ./mfc.sh format before committing my code
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

PR Type

Enhancement

Description

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP syntax to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() syntax

  • Separated OpenMP and OpenACC macro implementations into distinct start and end directives (OMP_PARALLEL_LOOP / END_OMP_PARALLEL_LOOP and ACC_PARALLEL_LOOP / END_GPU_PARALLEL_LOOP)

  • Added explicit private clause parameters to all GPU parallel loop macros to specify privatized loop variables

  • Modified macro definitions to emit only opening directives, allowing loop code to follow separately for improved line numbering accuracy

  • Applied changes consistently across multiple simulation and common modules including viscous stress, RHS, pressure relaxation, CBC, levelset computations, and MPI common utilities

  • Adjusted indentation of loop bodies to align with new macro structure

  • Removed embedded code execution from macro implementations to allow line markers to progress naturally

  • Result eliminates over half of compilation warnings and provides correct line numbers for error reporting

Diagram Walkthrough

flowchart LR
  A["Old Macro Style<br/>#:call GPU_PARALLEL_LOOP()"] -->|Refactor| B["New Macro Style<br/>$:GPU_PARALLEL_LOOP()"]
  B -->|Add| C["Explicit private<br/>clause"]
  B -->|Add| D["$:END_GPU_PARALLEL_LOOP()"]
  C -->|Result| E["Correct line numbers<br/>Fewer warnings"]
  D -->|Result| E

File Walkthrough

Relevant files
Enhancement
8 files
m_viscous.fpp
Refactor GPU parallel loop macros for improved line numbering

src/simulation/m_viscous.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() /
    #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() /
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private parameter declarations to all GPU_PARALLEL_LOOP
    macro calls to specify loop variables
  • Restructured loop nesting and indentation to align with new macro
    syntax requirements
  • Applied changes consistently across multiple subroutines including
    s_compute_viscous_stress_tensor and gradient computation functions
+786/-786
acc_macros.fpp
Separate ACC parallel loop opening directive from code     

src/common/include/acc_macros.fpp

  • Modified ACC_PARALLEL_LOOP macro definition to remove code parameter
    and acc_end_directive from macro body
  • Changed macro to only emit opening directive, allowing loop code to
    follow separately
  • Removed embedded code execution and end directive from macro
    implementation
+2/-5     
m_rhs.fpp
Refactor GPU loop macros with explicit private variables 

src/simulation/m_rhs.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP()
    and #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() and
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause to all GPU_PARALLEL_LOOP macros
    specifying loop variables to be privatized
  • Adjusted indentation of loop bodies to align with new macro syntax
    (removed extra indentation level)
  • Applied changes consistently across all GPU parallel regions
    throughout the file
+572/-572
m_mpi_common.fpp
Update GPU macros with explicit private variable declarations

src/common/m_mpi_common.fpp

  • Converted GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() to
    $:GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause listing all loop variables (i, j, k, l,
    q, r) to GPU parallel regions
  • Updated macro closing from #:endcall GPU_PARALLEL_LOOP to
    $:END_GPU_PARALLEL_LOOP()
  • Corrected indentation of loop bodies to match new macro structure
+222/-222
m_pressure_relaxation.fpp
Refactor GPU parallel loop macros with private variables 

src/simulation/m_pressure_relaxation.fpp

  • Replaced #:call GPU_PARALLEL_LOOP() with $:GPU_PARALLEL_LOOP() macro
    syntax
  • Added explicit private='[j,k,l]' clause to specify privatized
    variables
  • Changed closing macro from #:endcall GPU_PARALLEL_LOOP to
    $:END_GPU_PARALLEL_LOOP()
  • Adjusted loop body indentation to align with new macro format
+7/-7     
m_cbc.fpp
Refactor GPU loop macros to separate start and end directives

src/simulation/m_cbc.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() /
    #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() /
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause parameters to all GPU_PARALLEL_LOOP
    calls specifying loop variables
  • Adjusted indentation of loop bodies to align with new macro structure,
    removing extra indentation from the old call-based approach
  • Updated numerous GPU parallel regions throughout CBC calculations,
    flux reshaping, and output operations
+590/-590
m_compute_levelset.fpp
Refactor GPU parallel loop macros in levelset computations

src/common/m_compute_levelset.fpp

  • Converted all #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP
    invocations to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP()
    syntax
  • Added explicit private clause with loop variable lists to all GPU
    parallel loop macros
  • Reduced indentation of loop bodies by one level to match new macro
    structure
  • Applied changes across all levelset computation functions (circle,
    airfoil, 3D airfoil, rectangle, cuboid, sphere, cylinder)
+274/-274
omp_macros.fpp
Split GPU parallel loop macro into separate start and end macros

src/common/include/omp_macros.fpp

  • Separated OMP_PARALLEL_LOOP macro into two parts: start directive
    generation and end directive generation
  • Removed code parameter from OMP_PARALLEL_LOOP macro definition
  • Created new END_OMP_PARALLEL_LOOP macro to emit appropriate end
    directives based on compiler type
  • Removed inline $:code execution from OMP_PARALLEL_LOOP, allowing loop
    bodies to be written directly in code (see the sketch after this entry)
+15/-6   
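
A minimal sketch of the split described above, with the directive text deliberately simplified (the actual macros in omp_macros.fpp emit compiler-specific directives and handle additional clauses):

    #:def OMP_PARALLEL_LOOP(collapse=None, private=None)
        #! Emit only the opening directive; the loop body now follows in the caller.
        !$omp target teams distribute parallel do${' collapse(' + str(collapse) + ')' if collapse else ''}$${' private(' + private.strip('[]') + ')' if private else ''}$
    #:enddef

    #:def END_OMP_PARALLEL_LOOP()
        #! Emit the matching end directive once the caller's loop has closed.
        !$omp end target teams distribute parallel do
    #:enddef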
Formatting
1 files
shared_parallel_macros.fpp
Add required newline at end of file                                           

src/common/include/shared_parallel_macros.fpp

  • Added newline at end of file to comply with FYPP requirements
+1/-1     
Additional files
27 files
parallel_macros.fpp +17/-7   
m_boundary_common.fpp +345/-345
m_chemistry.fpp +128/-128
m_finite_differences.fpp +31/-31 
m_ib_patches.fpp +189/-189
m_phase_change.fpp +118/-118
m_variables_conversion.fpp +316/-316
m_acoustic_src.fpp +105/-105
m_body_forces.fpp +48/-48 
m_bubbles_EE.fpp +171/-171
m_bubbles_EL.fpp +285/-285
m_bubbles_EL_kernels.fpp +101/-101
m_data_output.fpp +13/-13 
m_derived_variables.fpp +217/-217
m_fftw.fpp +59/-59 
m_hyperelastic.fpp +93/-93 
m_hypoelastic.fpp +255/-255
m_ibm.fpp +158/-158
m_igr.fpp +1728/-1728
m_mhd.fpp +48/-48 
m_mpi_proxy.fpp +53/-53 
m_muscl.fpp +141/-141
m_qbmm.fpp +218/-218
m_riemann_solvers.fpp +3155/-3155
m_surface_tension.fpp +157/-157
m_time_steppers.fpp +103/-103
m_weno.fpp +494/-494

CodeAnt-AI Description

Restore accurate line numbers for GPU loops

What Changed

  • GPU loops now open and close with the new parallel macros so compilers see the real loop bodies, which fixes reported error line numbers and eliminates the extra warning noise from the old wrappers.
  • MPI unpack/pack paths plus viscous, gradient, and MHD calculations now use the same macros, extending the clearer diagnostics to those GPU-bound routines.

Impact

✅ Clearer GPU error line numbers
✅ Fewer GPU build warnings
✅ Consistent diagnostics across viscous and MPI kernels

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Summary by CodeRabbit

  • New Features

    • Added a new public interface-compression routine for improved interface reconstruction.
  • Documentation

    • GPU parallelization guide updated to a new invocation style and clarifies that parallelization directives wrap code blocks.
  • Refactor

    • Standardized GPU parallel-region syntax and end markers across the codebase, tightened private-variable scoping, preserved legacy variants for compatibility, and unified 3D boundary handling across modules.

@danieljvickers danieljvickers requested a review from a team as a code owner November 6, 2025 16:37
@qodo-merge-pro
Contributor

qodo-merge-pro bot commented Nov 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1028 - Partially compliant

Compliant requirements:

  • Ensure FYPP-generated line markers advance correctly inside GPU parallel regions.
  • Refactor GPU loop macros so that start/end markers are separate (no single wrapper causing same-line markers).
  • Preserve functional equivalence of existing GPU loops (OpenACC/OpenMP paths).
  • Reduce excessive FYPP/compile-time warnings caused by previous macro usage.

Non-compliant requirements:

  • Validate that compiler error line numbers inside GPU loops point to the correct source lines.

Requires further human verification:

  • Validate that compiler error line numbers inside GPU loops point to the correct source lines.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Many new GPU_PARALLEL_LOOP insertions use arrays indexed at k-1 or similar (e.g., flux at k-1) while outer loops start at 0 or include boundaries. Please verify boundary ranges to avoid out-of-bounds when k=0 (examples include flux access patterns around lines adding inv_ds with flux_face1 at index-1).

        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, 1, irx, iry, irz)
    end if

    $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k_loop,l_loop,q_loop,inv_ds,flux_face1,flux_face2]')
    do j = 1, sys_size
        do q_loop = 0, p
            do l_loop = 0, n
                do k_loop = 0, m
                    inv_ds = 1._wp/dx(k_loop)
                    flux_face1 = flux_n(1)%vf(j)%sf(k_loop - 1, l_loop, q_loop)
                    flux_face2 = flux_n(1)%vf(j)%sf(k_loop, l_loop, q_loop)
                    rhs_vf(j)%sf(k_loop, l_loop, q_loop) = inv_ds*(flux_face1 - flux_face2)
                end do
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()

    if (model_eqns == 3) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[i_fluid_loop,k_loop,l_loop,q_loop,inv_ds,advected_qty_val, pressure_val,flux_face1,flux_face2]')
        do q_loop = 0, p
            do l_loop = 0, n
                do k_loop = 0, m
                    do i_fluid_loop = 1, num_fluids
                        inv_ds = 1._wp/dx(k_loop)
                        advected_qty_val = q_cons_vf%vf(i_fluid_loop + advxb - 1)%sf(k_loop, l_loop, q_loop)
                        pressure_val = q_prim_vf%vf(E_idx)%sf(k_loop, l_loop, q_loop)
                        flux_face1 = flux_src_n_vf%vf(advxb)%sf(k_loop, l_loop, q_loop)
                        flux_face2 = flux_src_n_vf%vf(advxb)%sf(k_loop - 1, l_loop, q_loop)
                        rhs_vf(i_fluid_loop + intxb - 1)%sf(k_loop, l_loop, q_loop) = &
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(k_loop, l_loop, q_loop) - &
                            inv_ds*advected_qty_val*pressure_val*(flux_face1 - flux_face2)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if

    call s_add_directional_advection_source_terms(idir, rhs_vf, q_cons_vf, q_prim_vf, flux_src_n_vf, Kterm)

case (2) ! y-direction
    if (bc_y%beg <= BC_CHAR_SLIP_WALL .and. bc_y%beg >= BC_CHAR_SUP_OUTFLOW) then
        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, -1, irx, iry, irz)
    end if
    if (bc_y%end <= BC_CHAR_SLIP_WALL .and. bc_y%end >= BC_CHAR_SUP_OUTFLOW) then
        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, 1, irx, iry, irz)
    end if

    $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k,l,q,inv_ds,flux_face1,flux_face2]')
    do j = 1, sys_size
        do l = 0, p
            do k = 0, n
                do q = 0, m
                    inv_ds = 1._wp/dy(k)
                    flux_face1 = flux_n(2)%vf(j)%sf(q, k - 1, l)
                    flux_face2 = flux_n(2)%vf(j)%sf(q, k, l)
                    rhs_vf(j)%sf(q, k, l) = rhs_vf(j)%sf(q, k, l) + inv_ds*(flux_face1 - flux_face2)
                end do
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()

    if (model_eqns == 3) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[i_fluid_loop,k,l,q,inv_ds,advected_qty_val, pressure_val,flux_face1,flux_face2]')
        do l = 0, p
            do k = 0, n
                do q = 0, m
                    do i_fluid_loop = 1, num_fluids
                        inv_ds = 1._wp/dy(k)
                        advected_qty_val = q_cons_vf%vf(i_fluid_loop + advxb - 1)%sf(q, k, l)
                        pressure_val = q_prim_vf%vf(E_idx)%sf(q, k, l)
                        flux_face1 = flux_src_n_vf%vf(advxb)%sf(q, k, l)
                        flux_face2 = flux_src_n_vf%vf(advxb)%sf(q, k - 1, l)
                        rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) = &
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) - &
                            inv_ds*advected_qty_val*pressure_val*(flux_face1 - flux_face2)
                        if (cyl_coord) then
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) = &
                                rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) - &
                                5.e-1_wp/y_cc(k)*advected_qty_val*pressure_val*(flux_face1 + flux_face2)
                        end if
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if

    if (cyl_coord) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k,l,q,flux_face1,flux_face2]')
        do j = 1, sys_size
            do l = 0, p
                do k = 0, n
                    do q = 0, m
                        flux_face1 = flux_gsrc_n(2)%vf(j)%sf(q, k - 1, l)
                        flux_face2 = flux_gsrc_n(2)%vf(j)%sf(q, k, l)
                        rhs_vf(j)%sf(q, k, l) = rhs_vf(j)%sf(q, k, l) - &
                                                5.e-1_wp/y_cc(k)*(flux_face1 + flux_face2)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if
Off-by-one Risk

Writes to data_real_gpu use +1 offsets and ranges l=0..p; ensure array extents and leading indices match device allocations after macro refactor, especially with firstprivate and collapse usage.

#endif
                #:endcall GPU_HOST_DATA

                $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                do k = 1, sys_size
                    do j = 0, m
                        do l = 0, p
                            data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)/real(real_size, dp)
                            q_cons_vf(k)%sf(j, 0, l) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)
                        end do
                    end do
                end do
                $:END_GPU_PARALLEL_LOOP()

                do i = 1, fourier_rings

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 1, cmplx_size
                                data_fltr_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size) = (0_dp, 0_dp)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3, firstprivate='[i]')
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 0, p
                                data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = q_cons_vf(k)%sf(j, i, l)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    #:call GPU_HOST_DATA(use_device_ptr='[p_real, p_cmplx]')
#if defined(__PGI)
                        ierr = cufftExecD2Z(fwd_plan_gpu, data_real_gpu, data_cmplx_gpu)
#else
                        ierr = hipfftExecD2Z(fwd_plan_gpu, c_loc(p_real), c_loc(p_cmplx))
                        call hipCheck(hipDeviceSynchronize())
#endif
                    #:endcall GPU_HOST_DATA

                    Nfq = min(floor(2_dp*real(i, dp)*pi), cmplx_size)
                    $:GPU_UPDATE(device='[Nfq]')

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 1, Nfq
                                data_fltr_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size) = data_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    #:call GPU_HOST_DATA(use_device_ptr='[p_real, p_fltr_cmplx]')
#if defined(__PGI)
                        ierr = cufftExecZ2D(bwd_plan_gpu, data_fltr_cmplx_gpu, data_real_gpu)
#else
                        ierr = hipfftExecZ2D(bwd_plan_gpu, c_loc(p_fltr_cmplx), c_loc(p_real))
                        call hipCheck(hipDeviceSynchronize())
#endif
                    #:endcall GPU_HOST_DATA

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3, firstprivate='[i]')
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 0, p
                                data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)/real(real_size, dp)
                                q_cons_vf(k)%sf(j, i, l) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                end do
Macro Consistency

A comment line was moved to ensure newline at EOF. Confirm that macro endings and inserted END_GPU_PARALLEL_LOOP everywhere match the corresponding start to avoid unterminated regions in some files not shown here.

    #:endif
    $:extraArgs_val
#:enddef
! New line at end of file is required for FYPP

Comment on lines 917 to 940
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of rho*v_i in the ${XYZ}$ direction
! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(rho_R*vel_R(i)*vel_R(norm_dir) &
- B%R(i)*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(rho_L*vel_L(i)*vel_L(norm_dir) &
- B%L(i)*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) &
/(s_M - s_P)
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
elseif (mhd .and. relativity) then
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of m_i in the ${XYZ}$ direction
! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(cm%R(i)*vel_R(norm_dir) &
- b4%R(i)/Ga%R*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(cm%L(i)*vel_L(norm_dir) &
- b4%L(i)/Ga%L*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(cm%L(i) - cm%R(i))) &
/(s_M - s_P)
end do
elseif (bubbles_euler) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_R - ptilde_R)) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_L - ptilde_L)) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
end do
else if (hypoelasticity) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R &
- tau_e_R(dir_idx_tau(i))) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L &
- tau_e_L(dir_idx_tau(i))) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P)
end do
else
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if

Suggestion: Restrict the 1D MHD flux calculation to only execute for the x-direction (NORM_DIR == 1) to prevent incorrect flux calculations for the y and z directions. [possible issue, importance: 9]

Suggested change
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of rho*v_i in the ${XYZ}$ direction
! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(rho_R*vel_R(i)*vel_R(norm_dir) &
- B%R(i)*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(rho_L*vel_L(i)*vel_L(norm_dir) &
- B%L(i)*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) &
/(s_M - s_P)
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
elseif (mhd .and. relativity) then
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of m_i in the ${XYZ}$ direction
! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(cm%R(i)*vel_R(norm_dir) &
- b4%R(i)/Ga%R*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(cm%L(i)*vel_L(norm_dir) &
- b4%L(i)/Ga%L*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(cm%L(i) - cm%R(i))) &
/(s_M - s_P)
end do
elseif (bubbles_euler) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_R - ptilde_R)) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_L - ptilde_L)) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
end do
else if (hypoelasticity) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R &
- tau_e_R(dir_idx_tau(i))) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L &
- tau_e_L(dir_idx_tau(i))) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P)
end do
else
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
#:if (NORM_DIR == 1)
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
#:endif
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if

rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')

Suggestion: Add the loop variable q to the private clause of the GPU_PARALLEL_LOOP directive to prevent a race condition. [possible issue, importance: 9]

Suggested change
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,q,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')

@danieljvickers danieljvickers requested a review from a team as a code owner November 6, 2025 16:43
@danieljvickers
Member Author

I just built this on Frontier manually and both OMP and ACC compile fine. I am going to rerun the build for Frontier.

@codecov

codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 39.68038% with 1774 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.34%. Comparing base (7169711) to head (30417cb).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_viscous.fpp 40.42% 190 Missing and 6 partials ⚠️
src/simulation/m_cbc.fpp 45.03% 162 Missing and 4 partials ⚠️
src/simulation/m_rhs.fpp 44.40% 128 Missing and 11 partials ⚠️
src/simulation/m_derived_variables.fpp 8.73% 114 Missing and 1 partial ⚠️
src/simulation/m_bubbles_EL.fpp 37.27% 105 Missing and 1 partial ⚠️
src/common/m_compute_levelset.fpp 40.00% 82 Missing and 20 partials ⚠️
src/simulation/m_hypoelastic.fpp 30.06% 96 Missing and 4 partials ⚠️
src/simulation/m_weno.fpp 55.83% 83 Missing and 4 partials ⚠️
src/common/m_boundary_common.fpp 55.08% 65 Missing and 10 partials ⚠️
src/simulation/m_surface_tension.fpp 29.16% 65 Missing and 3 partials ⚠️
... and 19 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1029      +/-   ##
==========================================
- Coverage   46.23%   44.34%   -1.90%     
==========================================
  Files          68       71       +3     
  Lines       13548    20578    +7030     
  Branches     1589     1990     +401     
==========================================
+ Hits         6264     9125    +2861     
- Misses       6356    10308    +3952     
- Partials      928     1145     +217     

☔ View full report in Codecov by Sentry.

@sbryngelson
Member

Running this by @anandrdbz and @prathi-wind. Also, the MHD parts might need to be examined by @ChrisZYJ.

@danieljvickers
Member Author

We can get their input. Luckily a few tests are actually passing on Frontier, and the ones that are failing mostly have "Adaptive time stepping failed to converge." as the error message, which is a workable debug point.

@anandrdbz
Contributor

I personally think this should be fine; the changes seem mostly non-intrusive.

@danieljvickers
Member Author

We are still getting the weird timeout bug on Frontier. All tests pass except for two on the last job, and those two hung rather than failing. Mostly just stating that Frontier has been pretty finicky, since I've resubmitted it about 4 times now without seeing any cases actually fail, just run out of time. After I get this merged, and maybe after SC, I want to open a branch and debug what in the toolchain is causing this.

@prathi-wind
Collaborator

prathi-wind commented Nov 10, 2025

Is there a reason why you add loop iterator variables to the private list of the parallel loops? Loop iterators should automatically be private.

@prathi-wind
Collaborator

To make the macros consistent, GPU_DATA and GPU_HOST_DATA should follow the same style as GPU_PARALLEL_LOOP.

@danieljvickers
Member Author

There is no particular reason I added the private variables other than being thorough. I found a bug with I/O as a result of explicitly placing iterators in the private list, so I think it was a good thing to add.

For style, I have no problem with changing the host data macros. I would prefer to do that in a separate PR, only because of the massive number of lines the indentation change modified here, and because the host data change is stylistic, whereas this PR actually encompasses a bug fix.

@danieljvickers
Member Author

Logging again that Frontier ran out of time. This time only one job hung, and it was a different one than the previous hang. Weird; I guess I'll start again. @sbryngelson, do you think I should just keep resubmitting until I finally get a failure or it passes?

@codeant-ai

codeant-ai bot commented Nov 17, 2025

CodeAnt AI is running Incremental review

@codeant-ai codeant-ai bot added and then removed the size:XXL label (This PR changes 1000+ lines, ignoring generated files) on Nov 17, 2025
@codeant-ai

codeant-ai bot commented Nov 17, 2025

CodeAnt AI Incremental review completed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/common/m_mpi_common.fpp (1)

760-771: Fix incorrect macro syntax: Use GPU_PARALLEL_LOOP instead of GPU_PARALLEL throughout this file.

The pipeline reports that the GPU_PARALLEL macro doesn't accept the collapse keyword argument. Based on the PR objectives, the macro invocations should use the new $:GPU_PARALLEL_LOOP() and $:END_GPU_PARALLEL_LOOP() syntax, not the call-style syntax with GPU_PARALLEL.

This issue affects all 16 GPU parallel regions in this file at lines: 760, 774, 790, 807, 823, 840, 858, 961, 982, 998, 1015, 1037, 1054, 1073, 1096, and 1114.

Apply this pattern to fix all occurrences (example for lines 760-771):

-                    #:call GPU_PARALLEL(collapse=4,private='[r]')
+                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
                                     do i = 1, nVar
                                         r = (i - 1) + v_size*(j + buff_size*(k + (n + 1)*l))
                                         buff_send(r) = real(q_comm(i)%sf(j + pack_offset, k, l), kind=wp)
                                     end do
                                 end do
                             end do
                         end do
-                    #:endcall GPU_PARALLEL
+                    $:END_GPU_PARALLEL_LOOP()

Also applies to: 774-788, 790-804, 807-820, 823-838, 840-855, 858-871, 961-979, 982-996, 998-1012, 1015-1034, 1037-1052, 1054-1069, 1073-1093, 1096-1112, 1114-1130

♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-788: Add q to the private list for consistency and correctness.

The loop body uses q as a loop index (line 779), but q is not included in the private clause. This is inconsistent with other qbmm loops in this file (lines 790, 823, 840, etc.) which include q in their private lists.

After applying the macro syntax fix, also add q to the private list:

-                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                    $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db852bf and 8578364.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (21 hunks)
🧰 Additional context used
🪛 GitHub Actions: Coverage Check
src/common/m_mpi_common.fpp

[error] 760-760: FyppFatalError: exception occurred when calling 'GPU_PARALLEL' and macro 'GPU_PARALLEL' called with unknown keyword argument(s) 'collapse'. Preprocessing failed during Fypp step in pre_process target.

🪛 GitHub Actions: Pretty
src/common/m_mpi_common.fpp

[error] 1871-1871: ./mfc.sh format -j $(nproc) failed: MFC formatting detected an end-of-file newline issue in 'src/common/m_mpi_common.fpp' (diff shows missing newline at end of file).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Build & Publish


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-788: Add missing 'q' to private list in qbmm pack loop.

The GPU_PARALLEL_LOOP_OLD at line 774 has private='[r]', but the inner loop at lines 779-783 uses q as a loop index:

do q = 1, nb
    r = (i - 1) + (q - 1)*4 + v_size* ...

Variable q should be included in the private clause to prevent potential data races. All other qbmm loops in this file (lines 790, 823, 840, etc.) correctly include q in their private lists.

Apply this fix:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=4,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=4,private='[q,r]')
🧹 Nitpick comments (3)
src/common/include/omp_macros.fpp (1)

199-253: Consider extracting compiler-specific directive mapping to reduce duplication.

The compiler-specific OpenMP directive logic (NVIDIA/PGI, CCE, AMD) is duplicated between OMP_PARALLEL_LOOP (lines 226-234) and END_OMP_PARALLEL_LOOP (lines 242-250). If a compiler's directive syntax changes, both macros must be updated consistently.

Consider extracting the directive mappings to shared variables or a helper macro to maintain consistency and reduce maintenance burden.
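
One possible shape for such a helper, purely illustrative (the compiler detection and directive strings here are placeholders rather than the actual branches in omp_macros.fpp):

    #:def OMP_LOOP_CONSTRUCT(compiler)
        #! Single place that maps a compiler family to its combined loop construct.
        #:if compiler == 'cce'
            target teams distribute parallel do simd
        #:else
            target teams distribute parallel do
        #:endif
    #:enddef
    #! OMP_PARALLEL_LOOP would then prepend '!$omp ' (plus clauses) to this text,
    #! and END_OMP_PARALLEL_LOOP would prepend '!$omp end ' to the same text.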

src/common/include/parallel_macros.fpp (1)

55-66: Consider adding END_ACC_PARALLEL_LOOP macro for consistency.

In END_GPU_PARALLEL_LOOP, the OpenACC end directive is hardcoded as a string (line 57), while the OpenMP end directive calls END_OMP_PARALLEL_LOOP() (line 58).

For consistency and future extensibility, consider defining an END_ACC_PARALLEL_LOOP macro in acc_macros.fpp and calling it here, mirroring the OMP pattern. This would make the codebase more uniform and easier to maintain if compiler-specific ACC end directives are needed later.

src/common/include/acc_macros.fpp (1)

164-191: Consider adding END_ACC_PARALLEL_LOOP macro for consistency with OMP pattern.

The new ACC_PARALLEL_LOOP macro (lines 164-191) only emits the start directive, following the refactoring pattern. However, unlike omp_macros.fpp which defines END_OMP_PARALLEL_LOOP, there is no corresponding END_ACC_PARALLEL_LOOP macro in this file.

While the current usage through GPU_PARALLEL_LOOP works (it hardcodes the ACC end directive at line 57 of parallel_macros.fpp), adding END_ACC_PARALLEL_LOOP (a minimal sketch follows this list) would:

  • Maintain consistency with the OMP macro pattern
  • Enable direct usage of ACC macros if needed
  • Improve maintainability by centralizing the end directive logic
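
A minimal sketch of such a macro, assuming it simply mirrors the OMP pattern (this is not part of the PR):

    #:def END_ACC_PARALLEL_LOOP()
        !$acc end parallel loop
    #:enddef
    #! END_GPU_PARALLEL_LOOP in parallel_macros.fpp could then call this macro
    #! instead of hardcoding the OpenACC end-directive string.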
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b521697 and 973ea73.

📒 Files selected for processing (4)
  • src/common/include/acc_macros.fpp (3 hunks)
  • src/common/include/omp_macros.fpp (2 hunks)
  • src/common/include/parallel_macros.fpp (2 hunks)
  • src/common/m_mpi_common.fpp (20 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Build & Publish


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 973ea73 and 2ede9a1.

📒 Files selected for processing (1)
  • src/common/m_boundary_common.fpp (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish
🔇 Additional comments (1)
src/common/m_boundary_common.fpp (1)

94-118: LGTM: Clean macro refactoring.

The refactoring from call-style macros to explicit start/end directives is well-executed. The explicit private variable declarations (private='[l,k]') and collapse=2 directive are appropriate for the nested loop structure and will help produce accurate line numbers in compiler diagnostics.

Comment on lines +267 to +268
case (BC_SlIP_WALL)
call s_slip_wall(q_prim_vf, 3, 1, k, l)

⚠️ Potential issue | 🔴 Critical

Fix typo in case statement.

Line 267 contains a typo: BC_SlIP_WALL (capital 'I') should be BC_SLIP_WALL to match the constant used throughout the rest of the file. This typo would prevent the slip wall boundary condition from being applied correctly at the z-direction end boundary.

Apply this diff to fix the typo:

-                        case (BC_SlIP_WALL)
+                        case (BC_SLIP_WALL)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case (BC_SlIP_WALL)
call s_slip_wall(q_prim_vf, 3, 1, k, l)
case (BC_SLIP_WALL)
call s_slip_wall(q_prim_vf, 3, 1, k, l)
🤖 Prompt for AI Agents
In src/common/m_boundary_common.fpp around lines 267 to 268, the case label has
a typo `BC_SlIP_WALL` (capital 'I') that should be `BC_SLIP_WALL`; update the
case statement to use the correct constant name `BC_SLIP_WALL` so the slip wall
boundary condition is matched and the call to s_slip_wall executes as intended.

@danieljvickers
Member Author

Just a note on my debugging: it does look like reverting solves the segfault issue. I couldn't begin to guess why. If this passes, it should not be merged until I resolve the MPI common file. I will likely do a series of tests where I change the file a few loops at a time until I identify the compiler segfault. I will make another comment when it is ready to be merged.

Daniel Vickers added 2 commits November 17, 2025 00:16
…of fake errors. Now testing with half the loops in the new configuration.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/common/m_mpi_common.fpp (1)

760-906: Complete migration to new-style GPU macros within the buffer-packing subroutine.

Within the same subroutine (lines 760–906), different MPI directions use inconsistent macro styles:

  • mpi_dir==1 (lines 760–771): new-style $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP()
  • mpi_dir==2 (line 840): OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall GPU_PARALLEL_LOOP_OLD
  • mpi_dir==3 (lines 858–906): OLD-style macros

This mixed usage contradicts the PR's objective to uniformly replace call-style macros with start/end directives. The inconsistency creates maintenance burden and undermines the goal of accurate line-number mapping.

Migrate the remaining OLD-style macros at lines 840–855 and 858–906 to the new style for consistency.

♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-788: q loop variable still missing from private clause (previously flagged).

The GPU_PARALLEL_LOOP on line 774 omits q from its private list despite using it as a loop iterator on line 779. This was flagged in a previous review but remains unaddressed.

Apply this fix:

-                        $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
🧹 Nitpick comments (1)
src/common/m_mpi_common.fpp (1)

961-1130: Unpacking sections uniformly use OLD-style macros while packing is mixed.

All buffer-unpacking loops (lines 961-1130) across all three MPI directions consistently use the OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall macros. This contrasts with the packing sections (lines 760-906), which mix new-style and OLD-style.

While internal consistency within unpacking is good, the asymmetry between packing and unpacking sections suggests incomplete migration. Consider migrating unpacking to the new style in a follow-up to achieve uniform line-number mapping across the entire subroutine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5a3f74 and abf2f5a.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (15 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/common/m_mpi_common.fpp (4)

874-906: Add q to private clause in qbmm packing loops (mpi_dir == 3, OLD-style macros).

Both qbmm GPU pack loops for mpi_dir == 3 use the OLD-style macro with private='[r]' but iterate over q (lines 879 and 896), creating the same data race as the other mpi_dir branches.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, buff_size - 1
                                     do k = -buff_size, n + buff_size

Also applies to line 891:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, buff_size - 1
                                     do k = -buff_size, n + buff_size

982-1012: Add q to private clause in qbmm unpacking loops (mpi_dir == 1, OLD-style macros).

Both qbmm unpack loops for mpi_dir == 1 use OLD-style macros with private='[r]' but iterate over q (lines 987 and 1003), creating a data race during buffer unpacking.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do l = 0, p
                                 do k = 0, n
                                     do j = -buff_size, -1

Also applies to line 998:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do l = 0, p
                                 do k = 0, n
                                     do j = -buff_size, -1

1037-1069: Add q to private clause in qbmm unpacking loops (mpi_dir == 2, OLD-style macros).

Both qbmm unpack loops for mpi_dir == 2 iterate over q (lines 1042 and 1059) but only include [r] in their private clauses, causing a data race.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

Also applies to line 1054:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

1096-1130: Add q to private clause in qbmm unpacking loops (mpi_dir == 3, OLD-style macros).

Both qbmm unpack loops for mpi_dir == 3 iterate over q (lines 1101 and 1119) but only include [r] in their private clauses, causing a data race.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size

Also applies to line 1114:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size
♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-804: Add q to private clause in qbmm packing loops (mpi_dir == 1).

Both qbmm GPU pack loops for mpi_dir == 1 have private='[r]' but the innermost loop iterates over q (line 779 and 795). Since q is declared in the subroutine scope (line 688) and not marked private, this creates a data race where multiple parallel iterations may conflict on the shared q variable, leading to incorrect buffer packing.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1

Also applies to line 790:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
🧹 Nitpick comments (1)
src/common/m_mpi_common.fpp (1)

760-1130: Consider completing macro migration for consistency.

The file mixes new-style $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() macros (lines 760-771, 774-788, 790-804, 807-820, 823-838, 840-855, 858-871) with OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall GPU_PARALLEL_LOOP_OLD macros (lines 874-906 for mpi_dir == 3 packing and lines 961-1130 for all unpacking).

Given your comments about encountering compiler segfaults and intending additional debugging before merging these MPI changes, this partial migration appears intentional. Once debugging is complete, consider migrating all remaining OLD-style macros to the new style for consistency with the rest of the PR.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abf2f5a and 71f19b1.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish

Comment on lines +823 to +855
$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, p
        do k = 0, buff_size - 1
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         (k + buff_size*l))
                    buff_send(r) = real(pb_in(j, k + pack_offset, l, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, p
        do k = 0, buff_size - 1
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         (k + buff_size*l))
                    buff_send(r) = real(mv_in(j, k + pack_offset, l, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🔴 Critical

Add q to private clause in qbmm packing loops (mpi_dir == 2).

Both qbmm GPU pack loops for mpi_dir == 2 iterate over q (lines 828 and 845) but only include [r] in their private clauses. This creates the same data race as in the mpi_dir == 1 case above.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1

Also applies to line 840:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 823 to 855, the two
GPU_PARALLEL_LOOP directives for the qbmm packing loops (mpi_dir == 2) only
declare [r] as private but the loops also iterate over q, causing a data race;
update both GPU_PARALLEL_LOOP(private='[r]') clauses to include q (e.g.,
private='[r,q]') for each of the two loops (also adjust the similar private
clause at the nearby line ~840) so that q is private within the parallel region.

Daniel Vickers added 2 commits November 17, 2025 00:44
…ng loops or both. Changing one of them now.
…n it is line 891-906 in m_mpi_common.fpp. I will then have to figure out why that causes a seg fault.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/common/m_mpi_common.fpp (2)

823-855: Add q to private clauses in both qbmm packing loops (mpi_dir == 2).

Both qbmm GPU packing loops for mpi_dir == 2 iterate over q (lines 828 and 845) but only include [r] in their private clauses. The loop index q must be declared private to avoid potential data races.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1
                                     do j = -buff_size, m + buff_size
                                         do q = 1, nb

Also at line 840:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1

774-804: Add q to private clauses in both qbmm packing loops (mpi_dir == 1).

Both qbmm GPU packing loops for mpi_dir == 1 iterate over q (lines 779 and 795) but only include [r] in their private clauses. The loop index q must be declared private to avoid potential data races.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
                                     do i = nVar + 1, nVar + 4
                                         do q = 1, nb

Also at line 790:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1875e3c and b5079f6.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (11 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pretty
src/common/m_mpi_common.fpp

[error] 896-896: MFC formatting changed the file. Process completed with exit code 1 after diff in src/common/m_mpi_common.fpp.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish

Comment on lines 874 to 906
$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, buff_size - 1
        do k = -buff_size, n + buff_size
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         ((k + buff_size) + (n + 2*buff_size + 1)*l))
                    buff_send(r) = real(pb_in(j, k, l + pack_offset, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, buff_size - 1
        do k = -buff_size, n + buff_size
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         ((k + buff_size) + (n + 2*buff_size + 1)*l))
                    buff_send(r) = real(mv_in(j, k, l + pack_offset, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🔴 Critical

Add q to private clauses in both qbmm packing loops (mpi_dir == 3).

Both qbmm GPU packing loops for mpi_dir == 3 iterate over q (lines 879 and 896) but only include [r] in their private clauses. The loop index q must be declared private to avoid potential data races.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, buff_size - 1
                                 do k = -buff_size, n + buff_size
                                     do j = -buff_size, m + buff_size
                                         do q = 1, nb

Also at line 891:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, buff_size - 1
                                 do k = -buff_size, n + buff_size
🧰 Tools
🪛 GitHub Actions: Pretty

[error] 896-896: MFC formatting changed the file. Process completed with exit code 1 after diff in src/common/m_mpi_common.fpp.

🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 874 to 906, both GPU_PARALLEL_LOOP
pragmas for the qbmm packing path (mpi_dir == 3) currently declare only r as
private but the inner loop also iterates over q; declare q private as well to
prevent data races by updating the private clauses to include q (e.g.,
private='[r,q]') for both loops spanning the blocks starting at ~line 876 and
~line 892.

…ll be curious if the format change does anything.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/common/m_mpi_common.fpp (6)

982-996: Add q to private clause in qbmm pb_in unpacking loop (mpi_dir == 1).

The qbmm GPU unpacking loop for pb_in when mpi_dir == 1 iterates over q (line 987) but only includes [r] in the private clause at line 961's OLD-style wrapper. Variable q must be declared private to avoid potential data races.

The private clause is set at the GPU_PARALLEL_LOOP_OLD invocation. Update it to include q:

-                    #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                    #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = -buff_size, -1

998-1012: Add q to private clause in qbmm mv_in unpacking loop (mpi_dir == 1).

The qbmm GPU unpacking loop for mv_in when mpi_dir == 1 iterates over q (line 1003) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do l = 0, p
                                 do k = 0, n
                                     do j = -buff_size, -1

1037-1052: Add q to private clause in qbmm pb_in unpacking loop (mpi_dir == 2).

The qbmm GPU unpacking loop for pb_in when mpi_dir == 2 iterates over q (line 1042) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

1054-1069: Add q to private clause in qbmm mv_in unpacking loop (mpi_dir == 2).

The qbmm GPU unpacking loop for mv_in when mpi_dir == 2 iterates over q (line 1059) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

1096-1112: Add q to private clause in qbmm pb_in unpacking loop (mpi_dir == 3).

The qbmm GPU unpacking loop for pb_in when mpi_dir == 3 iterates over q (line 1101) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size

1114-1130: Add q to private clause in qbmm mv_in unpacking loop (mpi_dir == 3).

The qbmm GPU unpacking loop for mv_in when mpi_dir == 3 iterates over q (line 1119) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5079f6 and 31d913f.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish

Comment on lines +790 to +804
$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do l = 0, p
    do k = 0, n
        do j = 0, buff_size - 1
            do i = nVar + 1, nVar + 4
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                        (j + buff_size*(k + (n + 1)*l))
                    buff_send(r) = real(mv_in(j + pack_offset, k, l, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🔴 Critical

Add q to private clause in qbmm mv_in packing loop (mpi_dir == 1).

The second qbmm GPU packing loop for mpi_dir == 1 iterates over q (line 795) but only includes [r] in the private clause. This creates a potential data race on q.

Apply this diff:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 790 to 804, the GPU_PARALLEL_LOOP
for the qbmm mv_in packing (mpi_dir == 1) declares private='[r]' but the loop
also uses loop variable q, which can cause a data race; update the private
clause to include q as well (e.g., private='[r,q]') so q is private per GPU
thread and no race occurs, preserving the existing loop structure and
indentation.

…VHOPC openACC, which must have been overwritten
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/simulation/m_bubbles_EL_kernels.fpp (2)

58-93: Incomplete private clause causes data races.

The private clause only includes l, s_coord, and cell, but the loop body also modifies volpart, strength_vol, strength_vel, Vol, addFun1, addFun2, and addFun3. Without privatizing these temporaries, concurrent threads will race on shared variables, producing nondeterministic results.

Apply this diff to add missing variables to the private clause:

-        $:GPU_PARALLEL_LOOP(private='[l,s_coord,cell]')
+        $:GPU_PARALLEL_LOOP(private='[l,s_coord,cell,volpart,strength_vol,strength_vel,Vol,addFun1,addFun2,addFun3]')

124-199: Incomplete private clause causes data races.

The private clause only includes nodecoord, l, s_coord, cell, and center, but the loop body also uses per-bubble temporaries volpart, strength_vol, strength_vel, stddsv and per-cell temporaries func, func2, addFun1, addFun2, addFun3, cellaux, and celloutside. Without privatizing these variables in the outer parallel loop, concurrent threads will race on shared storage.

Apply this diff to add missing variables to the private clause:

-        $:GPU_PARALLEL_LOOP(private='[nodecoord,l,s_coord,cell,center]', copyin='[smearGrid,smearGridz]')
+        $:GPU_PARALLEL_LOOP(private='[nodecoord,l,s_coord,cell,center,volpart,strength_vol,strength_vel,stddsv,func,func2,addFun1,addFun2,addFun3,cellaux,celloutside]', copyin='[smearGrid,smearGridz]')
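The same principle extends from loop indices to per-iteration temporaries: any scalar the loop body overwrites on every iteration has to be privatized. A small self-contained sketch of that pattern (hypothetical names, shown with the OpenMP offload directive for the other backend; not MFC's actual kernel):

subroutine smear_sketch(nBubs, volpart_in, beta)
    implicit none
    integer, intent(in) :: nBubs
    real, intent(in)    :: volpart_in(nBubs)
    real, intent(inout) :: beta(nBubs)
    integer :: k
    real :: strength_vol, addFun1   ! scratch temporaries at subroutine scope

    ! Without private(strength_vol, addFun1), all threads would share one copy
    ! of each temporary and the results become nondeterministic.
    !$omp target teams distribute parallel do private(strength_vol, addFun1)
    do k = 1, nBubs
        strength_vol = volpart_in(k)
        addFun1 = 0.5*strength_vol
        beta(k) = beta(k) + addFun1   ! the indexed write is unique per iteration
    end do
    !$omp end target teams distribute parallel do
end subroutine smear_sketch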
🧹 Nitpick comments (1)
src/simulation/m_bubbles_EL.fpp (1)

1516-1529: Void-fraction statistics reduction is correctly expressed

The $:GPU_PARALLEL_LOOP in s_write_void_evol uses a grouped reduction [[lag_vol, lag_void_avg], [lag_void_max]] with operators [+, MAX], which matches the accumulation pattern inside the loop. Privatizing volcell avoids reuse across iterations, and the prior initialization of the reduced variables ensures a well-defined starting point.

If you ever touch this block again, you might consider switching the literal 5.0d-11 to a wp-kind literal (e.g., 5.0e-11_wp) for consistency with the rest of the file, but it’s not functionally urgent.
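A compact, self-contained sketch of what that grouped reduction amounts to at the directive level, with the wp-kind literal folded in (hypothetical names and a made-up loop body; the real macro expansion may order or group the clauses differently):

program reduction_sketch
    implicit none
    integer, parameter :: wp = selected_real_kind(15)
    integer, parameter :: nBubs = 1000
    integer :: k
    real(wp) :: vol(nBubs), void(nBubs)
    real(wp) :: lag_vol, lag_void_avg, lag_void_max, volcell

    call random_number(vol)
    call random_number(void)
    lag_vol = 0._wp; lag_void_avg = 0._wp; lag_void_max = 0._wp

    ! One reduction clause per operator group: '+' for the sums, 'max' for the peak.
    ! volcell is a per-iteration temporary, so it is privatized.
    !$acc parallel loop private(volcell) reduction(+:lag_vol, lag_void_avg) reduction(max:lag_void_max)
    do k = 1, nBubs
        volcell = max(vol(k), 5.0e-11_wp)   ! wp-kind literal instead of 5.0d-11
        lag_vol = lag_vol + volcell
        lag_void_avg = lag_void_avg + void(k)*volcell
        lag_void_max = max(lag_void_max, void(k))
    end do

    print *, lag_vol, lag_void_avg/lag_vol, lag_void_max
end program reduction_sketch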

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31d913f and 30417cb.

📒 Files selected for processing (2)
  • src/simulation/m_bubbles_EL.fpp (10 hunks)
  • src/simulation/m_bubbles_EL_kernels.fpp (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Build & Publish
🔇 Additional comments (8)
src/simulation/m_bubbles_EL_kernels.fpp (1)

384-384: No breaking changes detected within the codebase.

The search found no calls to s_get_char_vol anywhere in the repository, and the module m_bubbles_EL_kernels is not imported elsewhere. Removing the elemental keyword will not break any existing code within this codebase.

However, if this function is part of a public library API consumed by external projects, those external callers may have elemental calls that would now fail. Please verify whether this module is intended as a public API or an internal implementation detail.
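As a quick refresher on what the elemental attribute buys, and therefore what its removal can break for array-style callers, a self-contained sketch with a hypothetical routine:

program elemental_sketch
    implicit none
    real :: r1, v1
    real :: r(3), v(3)

    r1 = 1.0
    r = [1.0, 2.0, 3.0]

    call get_vol(r1, v1)   ! scalar call: fine with or without 'elemental'
    call get_vol(r, v)     ! element-wise array call: only legal while 'elemental'
    print *, v1, v

contains

    elemental subroutine get_vol(radius, vol)
        real, intent(in)  :: radius
        real, intent(out) :: vol
        vol = (4.0/3.0)*3.14159265*radius**3
    end subroutine get_vol

end program elemental_sketch

If the module really is internal-only, dropping elemental is harmless; if it were ever consumed externally, any such array-style call sites would stop compiling.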

src/simulation/m_bubbles_EL.fpp (7)

598-615: GPU_PARALLEL_LOOP refactor in s_compute_bubble_EL_dynamics looks sound

The new $:GPU_PARALLEL_LOOP blocks correctly privatize loop indices and per‑iteration temporaries (k, i, cell, myalpha_rho, myalpha, Re) and use a MAX reduction for adap_dt_stop_max consistent with the scalar update pattern. I don’t see new data races or scope issues introduced by this macro restructuring, and the added copyin='[stage]' is appropriate.

Also applies to: 620-686, 692-699


723-738: Source-term GPU loops have consistent scoping and no obvious races

In s_compute_bubbles_EL_source, the converted $:GPU_PARALLEL_LOOP regions (with collapse and explicit private='[i,j,k,l]') align with the nested loop structure, and each iteration writes to a unique (l,i,j,k) or (i,j,k) location in rhs_vf/q_beta. The absence of reductions here is correct, and I don’t see any new parallel write conflicts introduced by the macro changes.

Also applies to: 740-754, 762-775, 779-787, 792-804


849-859: Void-fraction smearing loops remain correct under new macros

The two $:GPU_PARALLEL_LOOP usages in s_smear_voidfraction properly privatize indices and cover all (i,j,k,l) combinations once, so zeroing q_beta and then applying the 1 - beta/clamping transform is still race‑free and logically unchanged by the refactor.

Also applies to: 866-877


1107-1117: Time-stepper update kernels preserve per-bubble independence

In s_update_lagrange_tdv_rk, all RK1/RK2/RK3 update loops now use $:GPU_PARALLEL_LOOP(private='[k]') over k = 1:nBubs, with each iteration only touching state for that k. This keeps the original per‑bubble independence, so introducing the explicit start/end GPU macros does not add races or change the update ordering semantics.

Also applies to: 1131-1141, 1144-1154, 1168-1179, 1182-1192, 1193-1204


1280-1292: Temporary-state transfer loop is a good fit for GPU_PARALLEL_LOOP

The $:GPU_PARALLEL_LOOP(private='[k]') around s_transfer_data_to_tmp correctly parallelizes a pure per‑bubble copy of state from slot 1 → 2. There are no cross‑iteration dependencies, so this is safe and matches the intended data movement.


1378-1391: Gradient kernels: GPU loop restructuring keeps stencil semantics intact

For all three directions in s_gradient_dir, the new $:GPU_PARALLEL_LOOP(private='[i,j,k]', collapse=3) wrappers preserve the central‑difference stencil (neighbors at ±1 along the active axis) and only update dq(i,j,k) per iteration. Given the existing ghost‑cell layout, this macro refactor does not introduce boundary or race issues.

Also applies to: 1394-1406, 1409-1421
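For concreteness, the shape of kernel being described: a central difference along one axis with ghost cells, needing no reductions or atomics because each iteration writes a single unique element. A minimal sketch with hypothetical bounds and names, not MFC's actual routine:

subroutine gradient_x_sketch(q, dq, dx, m, n, p)
    implicit none
    integer, intent(in) :: m, n, p
    real, intent(in)  :: dx
    real, intent(in)  :: q(-1:m + 1, 0:n, 0:p)   ! one ghost layer in x
    real, intent(out) :: dq(0:m, 0:n, 0:p)
    integer :: i, j, k

    ! Mirrors the $:GPU_PARALLEL_LOOP(private='[i,j,k]', collapse=3) pattern:
    ! collapsed indices, stencil reads at i-1 and i+1, one write per iteration.
    !$acc parallel loop collapse(3) private(i, j, k)
    do k = 0, p
        do j = 0, n
            do i = 0, m
                dq(i, j, k) = (q(i + 1, j, k) - q(i - 1, j, k))/(2.0*dx)
            end do
        end do
    end do
end subroutine gradient_x_sketch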


1712-1721: Bubble-radius stats reduction matches update logic

In s_calculate_lag_bubble_stats, the $:GPU_PARALLEL_LOOP with reduction='[[Rmax_glb], [Rmin_glb]]' and reductionOp='[MAX, MIN]' matches the max/min usage, and each k iteration only writes to Rmax_stats(k)/Rmin_stats(k). This keeps global and per‑bubble statistics consistent without adding races.

Collaborator

@wilfonba wilfonba left a comment

benchmark

@danieljvickers
Copy link
Member Author

@anandrdbz I think you told me that I should expect the benchmark to fail because the master benchmark is not currently working. Can you confirm whether I'm remembering this correctly?

@danieljvickers
Member Author

danieljvickers commented Nov 17, 2025

@sbryngelson My code is finally rebased and passing regression tests, but it is failing the benchmark. The last PR I can find that even ran the benchmark tests is this one: 8a66941

That PR fails the benchmark as well.

I recall Anand telling me that the benchmark would fail, and I have asked him to comment here for clarity. I think this PR is as complete as it can be, and we simply expect the benchmarking to fail. My preference would be to merge sooner rather than later, because every PR that lands forces a significant amount of debugging and rebasing on code I am unfamiliar with, and I don't know whether it is efficient for me to keep updating this PR until the test suite is fixed. Are you inclined to wait for the benchmark results, or can we merge? What do you think?

Edit: Anand confirmed over Slack that the NVHPC benchmark is expected to fail because of issues with OpenMP on it. The rest appear to fail because they are canceled when the NVHPC benchmark test fails.

@danieljvickers
Member Author

@sbryngelson this merge passed all the benchmarks except for the broken one. Do you think we can just merge it now and work on the NVHPC issue next?

@sbryngelson sbryngelson merged commit 22af239 into MFlowCode:master Nov 19, 2025
39 of 44 checks passed

Labels

Review effort: 4/5 · size:XXL (this PR changes 1000+ lines, ignoring generated files)

Development

Successfully merging this pull request may close these issues.

add-correct-line-numbering-to-fypp

5 participants