
Conversation

@danieljvickers
Member

@danieljvickers danieljvickers commented Nov 6, 2025

User description

Description

I have refactored the GPU macros to separate the OpenMP and OpenACC wrappers into two macros: one that starts a parallel loop, GPU_PARALLEL_LOOP, and one that ends the parallel region, END_GPU_PARALLEL_LOOP. This allows the preprocessor to insert new lines only at the start and end of GPU parallel loops, which means line markers can now progress naturally instead of every line inside a loop sharing the same line number.

I have also adjusted the macro files to remove the additional warnings that were previously printed for every file that uses GPU macros.

This eliminates over half of the warnings printed when the code compiles, and compiler errors now report the correct line numbers. I have tested this locally and found that the reported line numbers are correct, at least in the regions I checked. The result should be a much-improved developer experience.
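
As a minimal before/after sketch of the usage change (the loop bounds and iterator names below are placeholders, not taken from any particular MFC file):

    ! Old style: the loop body is passed to the wrapper macro, so the preprocessor
    ! re-emits it and every line inside the region maps to the macro call site.
    #:call GPU_PARALLEL_LOOP(collapse=3)
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    ! ... loop body ...
                end do
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP

    ! New style: the macros emit only the opening and closing directives, so the
    ! loop body keeps its own source lines and compiler diagnostics point at them.
    $:GPU_PARALLEL_LOOP(collapse=3, private='[j,k,l]')
    do l = 0, p
        do k = 0, n
            do j = 0, m
                ! ... loop body ...
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()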

Fixes #1028

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

I compiled locally and confirmed that the warnings were removed. I also made intentional changes that trigger compiler errors and saw that the correct line numbers were reported.

Test Configuration:

Checklist

  • I ran ./mfc.sh format before committing my code
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

PR Type

Enhancement

Description

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP syntax to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() syntax

  • Separated OpenMP and OpenACC macro implementations into distinct start and end directives (OMP_PARALLEL_LOOP / END_OMP_PARALLEL_LOOP and ACC_PARALLEL_LOOP / END_GPU_PARALLEL_LOOP)

  • Added explicit private clause parameters to all GPU parallel loop macros to specify privatized loop variables

  • Modified macro definitions to emit only opening directives, allowing loop code to follow separately for improved line numbering accuracy

  • Applied changes consistently across multiple simulation and common modules including viscous stress, RHS, pressure relaxation, CBC, levelset computations, and MPI common utilities

  • Adjusted indentation of loop bodies to align with new macro structure

  • Removed embedded code execution from macro implementations to allow line markers to progress naturally

  • Result eliminates over half of compilation warnings and provides correct line numbers for error reporting

Diagram Walkthrough

flowchart LR
  A["Old Macro Style<br/>#:call GPU_PARALLEL_LOOP()"] -->|Refactor| B["New Macro Style<br/>$:GPU_PARALLEL_LOOP()"]
  B -->|Add| C["Explicit private<br/>clause"]
  B -->|Add| D["$:END_GPU_PARALLEL_LOOP()"]
  C -->|Result| E["Correct line numbers<br/>Fewer warnings"]
  D -->|Result| E

File Walkthrough

Relevant files
Enhancement
8 files
m_viscous.fpp
Refactor GPU parallel loop macros for improved line numbering

src/simulation/m_viscous.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() /
    #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() /
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private parameter declarations to all GPU_PARALLEL_LOOP
    macro calls to specify loop variables
  • Restructured loop nesting and indentation to align with new macro
    syntax requirements
  • Applied changes consistently across multiple subroutines including
    s_compute_viscous_stress_tensor and gradient computation functions
+786/-786
acc_macros.fpp
Separate ACC parallel loop opening directive from code     

src/common/include/acc_macros.fpp

  • Modified ACC_PARALLEL_LOOP macro definition to remove code parameter
    and acc_end_directive from macro body
  • Changed macro to only emit opening directive, allowing loop code to
    follow separately
  • Removed embedded code execution and end directive from macro
    implementation
+2/-5     
m_rhs.fpp
Refactor GPU loop macros with explicit private variables 

src/simulation/m_rhs.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP()
    and #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() and
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause to all GPU_PARALLEL_LOOP macros
    specifying loop variables to be privatized
  • Adjusted indentation of loop bodies to align with new macro syntax
    (removed extra indentation level)
  • Applied changes consistently across all GPU parallel regions
    throughout the file
+572/-572
m_mpi_common.fpp
Update GPU macros with explicit private variable declarations

src/common/m_mpi_common.fpp

  • Converted GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() to
    $:GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause listing all loop variables (i, j, k, l,
    q, r) to GPU parallel regions
  • Updated macro closing from #:endcall GPU_PARALLEL_LOOP to
    $:END_GPU_PARALLEL_LOOP()
  • Corrected indentation of loop bodies to match new macro structure
+222/-222
m_pressure_relaxation.fpp
Refactor GPU parallel loop macros with private variables 

src/simulation/m_pressure_relaxation.fpp

  • Replaced #:call GPU_PARALLEL_LOOP() with $:GPU_PARALLEL_LOOP() macro
    syntax
  • Added explicit private='[j,k,l]' clause to specify privatized
    variables
  • Changed closing macro from #:endcall GPU_PARALLEL_LOOP to
    $:END_GPU_PARALLEL_LOOP()
  • Adjusted loop body indentation to align with new macro format
+7/-7     
m_cbc.fpp
Refactor GPU loop macros to separate start and end directives

src/simulation/m_cbc.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() /
    #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() /
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause parameters to all GPU_PARALLEL_LOOP
    calls specifying loop variables
  • Adjusted indentation of loop bodies to align with new macro structure,
    removing extra indentation from the old call-based approach
  • Updated numerous GPU parallel regions throughout CBC calculations,
    flux reshaping, and output operations
+590/-590
m_compute_levelset.fpp
Refactor GPU parallel loop macros in levelset computations

src/common/m_compute_levelset.fpp

  • Converted all #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP
    invocations to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP()
    syntax
  • Added explicit private clause with loop variable lists to all GPU
    parallel loop macros
  • Reduced indentation of loop bodies by one level to match new macro
    structure
  • Applied changes across all levelset computation functions (circle,
    airfoil, 3D airfoil, rectangle, cuboid, sphere, cylinder)
+274/-274
omp_macros.fpp
Split GPU parallel loop macro into separate start and end macros

src/common/include/omp_macros.fpp

  • Separated OMP_PARALLEL_LOOP macro into two parts: start directive
    generation and end directive generation
  • Removed code parameter from OMP_PARALLEL_LOOP macro definition
  • Created new END_OMP_PARALLEL_LOOP macro to emit appropriate end
    directives based on compiler type
  • Removed inline $:code execution from OMP_PARALLEL_LOOP, allowing loop
    bodies to be written directly in code (see the sketch after this entry)
+15/-6   
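
A minimal sketch of the split described above, with the directive text deliberately simplified (the actual macros in omp_macros.fpp emit compiler-specific directives and handle additional clauses):

    #:def OMP_PARALLEL_LOOP(collapse=None, private=None)
        #! Emit only the opening directive; the loop body now follows in the caller.
        !$omp target teams distribute parallel do${' collapse(' + str(collapse) + ')' if collapse else ''}$${' private(' + private.strip('[]') + ')' if private else ''}$
    #:enddef

    #:def END_OMP_PARALLEL_LOOP()
        #! Emit the matching end directive once the caller's loop has closed.
        !$omp end target teams distribute parallel do
    #:enddef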
Formatting
1 files
shared_parallel_macros.fpp
Add required newline at end of file                                           

src/common/include/shared_parallel_macros.fpp

  • Added newline at end of file to comply with FYPP requirements
+1/-1     
Additional files
27 files
parallel_macros.fpp +17/-7   
m_boundary_common.fpp +345/-345
m_chemistry.fpp +128/-128
m_finite_differences.fpp +31/-31 
m_ib_patches.fpp +189/-189
m_phase_change.fpp +118/-118
m_variables_conversion.fpp +316/-316
m_acoustic_src.fpp +105/-105
m_body_forces.fpp +48/-48 
m_bubbles_EE.fpp +171/-171
m_bubbles_EL.fpp +285/-285
m_bubbles_EL_kernels.fpp +101/-101
m_data_output.fpp +13/-13 
m_derived_variables.fpp +217/-217
m_fftw.fpp +59/-59 
m_hyperelastic.fpp +93/-93 
m_hypoelastic.fpp +255/-255
m_ibm.fpp +158/-158
m_igr.fpp +1728/-1728
m_mhd.fpp +48/-48 
m_mpi_proxy.fpp +53/-53 
m_muscl.fpp +141/-141
m_qbmm.fpp +218/-218
m_riemann_solvers.fpp +3155/-3155
m_surface_tension.fpp +157/-157
m_time_steppers.fpp +103/-103
m_weno.fpp +494/-494

CodeAnt-AI Description

Restore accurate line numbers for GPU loops

What Changed

  • GPU loops now open and close with the new parallel macros so compilers see the real loop bodies, which fixes reported error line numbers and eliminates the extra warning noise from the old wrappers.
  • MPI unpack/pack paths plus viscous, gradient, and MHD calculations now use the same macros, extending the clearer diagnostics to those GPU-bound routines.

Impact

✅ Clearer GPU error line numbers
✅ Fewer GPU build warnings
✅ Consistent diagnostics across viscous and MPI kernels

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Summary by CodeRabbit

  • New Features

    • Added a new public interface-compression routine for improved interface reconstruction.
  • Documentation

    • GPU parallelization guide updated to a new invocation style and clarifies that parallelization directives wrap code blocks.
  • Refactor

    • Standardized GPU parallel-region syntax and end markers across the codebase, tightened private-variable scoping, preserved legacy variants for compatibility, and unified 3D boundary handling across modules.

@danieljvickers danieljvickers requested a review from a team as a code owner November 6, 2025 16:37
@qodo-merge-pro
Contributor

qodo-merge-pro bot commented Nov 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1028 - Partially compliant

Compliant requirements:

  • Ensure FYPP-generated line markers advance correctly inside GPU parallel regions.
  • Refactor GPU loop macros so that start/end markers are separate (no single wrapper causing same-line markers).
  • Preserve functional equivalence of existing GPU loops (OpenACC/OpenMP paths).
  • Reduce excessive FYPP/compile-time warnings caused by previous macro usage.

Non-compliant requirements:

  • Validate that compiler error line numbers inside GPU loops point to the correct source lines.

Requires further human verification:

  • Validate that compiler error line numbers inside GPU loops point to the correct source lines.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Many new GPU_PARALLEL_LOOP insertions use arrays indexed at k-1 or similar (e.g., flux at k-1) while outer loops start at 0 or include boundaries. Please verify boundary ranges to avoid out-of-bounds when k=0 (examples include flux access patterns around lines adding inv_ds with flux_face1 at index-1).

        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, 1, irx, iry, irz)
    end if

    $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k_loop,l_loop,q_loop,inv_ds,flux_face1,flux_face2]')
    do j = 1, sys_size
        do q_loop = 0, p
            do l_loop = 0, n
                do k_loop = 0, m
                    inv_ds = 1._wp/dx(k_loop)
                    flux_face1 = flux_n(1)%vf(j)%sf(k_loop - 1, l_loop, q_loop)
                    flux_face2 = flux_n(1)%vf(j)%sf(k_loop, l_loop, q_loop)
                    rhs_vf(j)%sf(k_loop, l_loop, q_loop) = inv_ds*(flux_face1 - flux_face2)
                end do
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()

    if (model_eqns == 3) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[i_fluid_loop,k_loop,l_loop,q_loop,inv_ds,advected_qty_val, pressure_val,flux_face1,flux_face2]')
        do q_loop = 0, p
            do l_loop = 0, n
                do k_loop = 0, m
                    do i_fluid_loop = 1, num_fluids
                        inv_ds = 1._wp/dx(k_loop)
                        advected_qty_val = q_cons_vf%vf(i_fluid_loop + advxb - 1)%sf(k_loop, l_loop, q_loop)
                        pressure_val = q_prim_vf%vf(E_idx)%sf(k_loop, l_loop, q_loop)
                        flux_face1 = flux_src_n_vf%vf(advxb)%sf(k_loop, l_loop, q_loop)
                        flux_face2 = flux_src_n_vf%vf(advxb)%sf(k_loop - 1, l_loop, q_loop)
                        rhs_vf(i_fluid_loop + intxb - 1)%sf(k_loop, l_loop, q_loop) = &
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(k_loop, l_loop, q_loop) - &
                            inv_ds*advected_qty_val*pressure_val*(flux_face1 - flux_face2)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if

    call s_add_directional_advection_source_terms(idir, rhs_vf, q_cons_vf, q_prim_vf, flux_src_n_vf, Kterm)

case (2) ! y-direction
    if (bc_y%beg <= BC_CHAR_SLIP_WALL .and. bc_y%beg >= BC_CHAR_SUP_OUTFLOW) then
        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, -1, irx, iry, irz)
    end if
    if (bc_y%end <= BC_CHAR_SLIP_WALL .and. bc_y%end >= BC_CHAR_SUP_OUTFLOW) then
        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, 1, irx, iry, irz)
    end if

    $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k,l,q,inv_ds,flux_face1,flux_face2]')
    do j = 1, sys_size
        do l = 0, p
            do k = 0, n
                do q = 0, m
                    inv_ds = 1._wp/dy(k)
                    flux_face1 = flux_n(2)%vf(j)%sf(q, k - 1, l)
                    flux_face2 = flux_n(2)%vf(j)%sf(q, k, l)
                    rhs_vf(j)%sf(q, k, l) = rhs_vf(j)%sf(q, k, l) + inv_ds*(flux_face1 - flux_face2)
                end do
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()

    if (model_eqns == 3) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[i_fluid_loop,k,l,q,inv_ds,advected_qty_val, pressure_val,flux_face1,flux_face2]')
        do l = 0, p
            do k = 0, n
                do q = 0, m
                    do i_fluid_loop = 1, num_fluids
                        inv_ds = 1._wp/dy(k)
                        advected_qty_val = q_cons_vf%vf(i_fluid_loop + advxb - 1)%sf(q, k, l)
                        pressure_val = q_prim_vf%vf(E_idx)%sf(q, k, l)
                        flux_face1 = flux_src_n_vf%vf(advxb)%sf(q, k, l)
                        flux_face2 = flux_src_n_vf%vf(advxb)%sf(q, k - 1, l)
                        rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) = &
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) - &
                            inv_ds*advected_qty_val*pressure_val*(flux_face1 - flux_face2)
                        if (cyl_coord) then
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) = &
                                rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) - &
                                5.e-1_wp/y_cc(k)*advected_qty_val*pressure_val*(flux_face1 + flux_face2)
                        end if
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if

    if (cyl_coord) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k,l,q,flux_face1,flux_face2]')
        do j = 1, sys_size
            do l = 0, p
                do k = 0, n
                    do q = 0, m
                        flux_face1 = flux_gsrc_n(2)%vf(j)%sf(q, k - 1, l)
                        flux_face2 = flux_gsrc_n(2)%vf(j)%sf(q, k, l)
                        rhs_vf(j)%sf(q, k, l) = rhs_vf(j)%sf(q, k, l) - &
                                                5.e-1_wp/y_cc(k)*(flux_face1 + flux_face2)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if
Off-by-one Risk

Writes to data_real_gpu use +1 offsets and ranges l=0..p; ensure array extents and leading indices match device allocations after macro refactor, especially with firstprivate and collapse usage.

#endif
                #:endcall GPU_HOST_DATA

                $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                do k = 1, sys_size
                    do j = 0, m
                        do l = 0, p
                            data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)/real(real_size, dp)
                            q_cons_vf(k)%sf(j, 0, l) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)
                        end do
                    end do
                end do
                $:END_GPU_PARALLEL_LOOP()

                do i = 1, fourier_rings

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 1, cmplx_size
                                data_fltr_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size) = (0_dp, 0_dp)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3, firstprivate='[i]')
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 0, p
                                data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = q_cons_vf(k)%sf(j, i, l)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    #:call GPU_HOST_DATA(use_device_ptr='[p_real, p_cmplx]')
#if defined(__PGI)
                        ierr = cufftExecD2Z(fwd_plan_gpu, data_real_gpu, data_cmplx_gpu)
#else
                        ierr = hipfftExecD2Z(fwd_plan_gpu, c_loc(p_real), c_loc(p_cmplx))
                        call hipCheck(hipDeviceSynchronize())
#endif
                    #:endcall GPU_HOST_DATA

                    Nfq = min(floor(2_dp*real(i, dp)*pi), cmplx_size)
                    $:GPU_UPDATE(device='[Nfq]')

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 1, Nfq
                                data_fltr_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size) = data_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    #:call GPU_HOST_DATA(use_device_ptr='[p_real, p_fltr_cmplx]')
#if defined(__PGI)
                        ierr = cufftExecZ2D(bwd_plan_gpu, data_fltr_cmplx_gpu, data_real_gpu)
#else
                        ierr = hipfftExecZ2D(bwd_plan_gpu, c_loc(p_fltr_cmplx), c_loc(p_real))
                        call hipCheck(hipDeviceSynchronize())
#endif
                    #:endcall GPU_HOST_DATA

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3, firstprivate='[i]')
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 0, p
                                data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)/real(real_size, dp)
                                q_cons_vf(k)%sf(j, i, l) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                end do
Macro Consistency

A comment line was moved to ensure newline at EOF. Confirm that macro endings and inserted END_GPU_PARALLEL_LOOP everywhere match the corresponding start to avoid unterminated regions in some files not shown here.

    #:endif
    $:extraArgs_val
#:enddef
! New line at end of file is required for FYPP

Comment on lines 917 to 940
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of rho*v_i in the ${XYZ}$ direction
! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(rho_R*vel_R(i)*vel_R(norm_dir) &
- B%R(i)*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(rho_L*vel_L(i)*vel_L(norm_dir) &
- B%L(i)*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) &
/(s_M - s_P)
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
elseif (mhd .and. relativity) then
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of m_i in the ${XYZ}$ direction
! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(cm%R(i)*vel_R(norm_dir) &
- b4%R(i)/Ga%R*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(cm%L(i)*vel_L(norm_dir) &
- b4%L(i)/Ga%L*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(cm%L(i) - cm%R(i))) &
/(s_M - s_P)
end do
elseif (bubbles_euler) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_R - ptilde_R)) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_L - ptilde_L)) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
end do
else if (hypoelasticity) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R &
- tau_e_R(dir_idx_tau(i))) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L &
- tau_e_L(dir_idx_tau(i))) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P)
end do
else
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if

Suggestion: Restrict the 1D MHD flux calculation to only execute for the x-direction (NORM_DIR == 1) to prevent incorrect flux calculations for the y and z directions. [possible issue, importance: 9]

Suggested change
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of rho*v_i in the ${XYZ}$ direction
! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(rho_R*vel_R(i)*vel_R(norm_dir) &
- B%R(i)*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(rho_L*vel_L(i)*vel_L(norm_dir) &
- B%L(i)*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) &
/(s_M - s_P)
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
elseif (mhd .and. relativity) then
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of m_i in the ${XYZ}$ direction
! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(cm%R(i)*vel_R(norm_dir) &
- b4%R(i)/Ga%R*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(cm%L(i)*vel_L(norm_dir) &
- b4%L(i)/Ga%L*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(cm%L(i) - cm%R(i))) &
/(s_M - s_P)
end do
elseif (bubbles_euler) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_R - ptilde_R)) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_L - ptilde_L)) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
end do
else if (hypoelasticity) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R &
- tau_e_R(dir_idx_tau(i))) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L &
- tau_e_L(dir_idx_tau(i))) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P)
end do
else
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
#:if (NORM_DIR == 1)
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
#:endif
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if

rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')

Suggestion: Add the loop variable q to the private clause of the GPU_PARALLEL_LOOP directive to prevent a race condition. [possible issue, importance: 9]

Suggested change
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,q,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')

@danieljvickers danieljvickers requested a review from a team as a code owner November 6, 2025 16:43
@danieljvickers
Member Author

I just built this on Frontier manually and both OMP and ACC compile fine. I am going to rerun the build for Frontier.

@codecov

codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 39.68038% with 1774 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.34%. Comparing base (7169711) to head (30417cb).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_viscous.fpp 40.42% 190 Missing and 6 partials ⚠️
src/simulation/m_cbc.fpp 45.03% 162 Missing and 4 partials ⚠️
src/simulation/m_rhs.fpp 44.40% 128 Missing and 11 partials ⚠️
src/simulation/m_derived_variables.fpp 8.73% 114 Missing and 1 partial ⚠️
src/simulation/m_bubbles_EL.fpp 37.27% 105 Missing and 1 partial ⚠️
src/common/m_compute_levelset.fpp 40.00% 82 Missing and 20 partials ⚠️
src/simulation/m_hypoelastic.fpp 30.06% 96 Missing and 4 partials ⚠️
src/simulation/m_weno.fpp 55.83% 83 Missing and 4 partials ⚠️
src/common/m_boundary_common.fpp 55.08% 65 Missing and 10 partials ⚠️
src/simulation/m_surface_tension.fpp 29.16% 65 Missing and 3 partials ⚠️
... and 19 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1029      +/-   ##
==========================================
- Coverage   46.23%   44.34%   -1.90%     
==========================================
  Files          68       71       +3     
  Lines       13548    20578    +7030     
  Branches     1589     1990     +401     
==========================================
+ Hits         6264     9125    +2861     
- Misses       6356    10308    +3952     
- Partials      928     1145     +217     

☔ View full report in Codecov by Sentry.

@sbryngelson
Member

Running this by @anandrdbz and @prathi-wind. Also, the MHD parts might need to be examined by @ChrisZYJ.

@danieljvickers
Member Author

We can get their input. Luckily a few tests are actually passing on Frontier, and the ones that are failing mostly have "Adaptive time stepping failed to converge." as the error message, which is a workable debug point.

@anandrdbz
Contributor

I personally think this should be fine; the changes seem mostly non-intrusive.

@danieljvickers
Member Author

We are still getting the weird timeout bug on Frontier. All tests pass except for two on the last job, and those two hung rather than failing. Mostly just stating that Frontier has been pretty finicky, since I've resubmitted it about 4 times now without seeing any cases actually fail, just run out of time. After I get this merged, and maybe after SC, I want to open a branch and debug what in the toolchain is causing this.

@prathi-wind
Collaborator

prathi-wind commented Nov 10, 2025

Is there a reason why you add loop iterator variables to the private list of the parallel loops? Loop iterators should automatically be private.

@prathi-wind
Collaborator

To make the macros consistent, GPU_DATA and GPU_HOST_DATA should follow the same style as GPU_PARALLEL_LOOP.

@danieljvickers
Member Author

There is no particular reason I added the private variables other than being thorough. I found a bug with I/O as a result of explicitly placing iterators in the private list, so I think it was a good thing to add.

For style, I have no problem with changing the host data macros. I would prefer to do that in a separate PR, only because of the massive number of lines the indentation change modified here, and because the host data change is stylistic, whereas this PR actually encompasses a bug fix.

@danieljvickers
Member Author

Logging again that Frontier ran out of time. This time only one job hung, and it was a different one than the previous hang. Weird; I guess I'll start again. @sbryngelson, do you think I should just keep resubmitting until I finally get a failure or it passes?

@codeant-ai

codeant-ai bot commented Nov 17, 2025

CodeAnt AI is running Incremental review

@codeant-ai codeant-ai bot added and then removed the size:XXL label (This PR changes 1000+ lines, ignoring generated files) on Nov 17, 2025
@codeant-ai

codeant-ai bot commented Nov 17, 2025

CodeAnt AI Incremental review completed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/common/m_mpi_common.fpp (1)

760-771: Fix incorrect macro syntax: Use GPU_PARALLEL_LOOP instead of GPU_PARALLEL throughout this file.

The pipeline reports that the GPU_PARALLEL macro doesn't accept the collapse keyword argument. Based on the PR objectives, the macro invocations should use the new $:GPU_PARALLEL_LOOP() and $:END_GPU_PARALLEL_LOOP() syntax, not the call-style syntax with GPU_PARALLEL.

This issue affects all 16 GPU parallel regions in this file at lines: 760, 774, 790, 807, 823, 840, 858, 961, 982, 998, 1015, 1037, 1054, 1073, 1096, and 1114.

Apply this pattern to fix all occurrences (example for lines 760-771):

-                    #:call GPU_PARALLEL(collapse=4,private='[r]')
+                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
                                     do i = 1, nVar
                                         r = (i - 1) + v_size*(j + buff_size*(k + (n + 1)*l))
                                         buff_send(r) = real(q_comm(i)%sf(j + pack_offset, k, l), kind=wp)
                                     end do
                                 end do
                             end do
                         end do
-                    #:endcall GPU_PARALLEL
+                    $:END_GPU_PARALLEL_LOOP()

Also applies to: 774-788, 790-804, 807-820, 823-838, 840-855, 858-871, 961-979, 982-996, 998-1012, 1015-1034, 1037-1052, 1054-1069, 1073-1093, 1096-1112, 1114-1130

♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-788: Add q to the private list for consistency and correctness.

The loop body uses q as a loop index (line 779), but q is not included in the private clause. This is inconsistent with other qbmm loops in this file (lines 790, 823, 840, etc.) which include q in their private lists.

After applying the macro syntax fix, also add q to the private list:

-                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                    $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db852bf and 8578364.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (21 hunks)
🧰 Additional context used
🪛 GitHub Actions: Coverage Check
src/common/m_mpi_common.fpp

[error] 760-760: FyppFatalError: exception occurred when calling 'GPU_PARALLEL' and macro 'GPU_PARALLEL' called with unknown keyword argument(s) 'collapse'. Preprocessing failed during Fypp step in pre_process target.

🪛 GitHub Actions: Pretty
src/common/m_mpi_common.fpp

[error] 1871-1871: ./mfc.sh format -j $(nproc) failed: MFC formatting detected an end-of-file newline issue in 'src/common/m_mpi_common.fpp' (diff shows missing newline at end of file).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Build & Publish


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-788: Add missing 'q' to private list in qbmm pack loop.

The GPU_PARALLEL_LOOP_OLD at line 774 has private='[r]', but the inner loop at lines 779-783 uses q as a loop index:

do q = 1, nb
    r = (i - 1) + (q - 1)*4 + v_size* ...

Variable q should be included in the private clause to prevent potential data races. All other qbmm loops in this file (lines 790, 823, 840, etc.) correctly include q in their private lists.

Apply this fix:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=4,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=4,private='[q,r]')
🧹 Nitpick comments (3)
src/common/include/omp_macros.fpp (1)

199-253: Consider extracting compiler-specific directive mapping to reduce duplication.

The compiler-specific OpenMP directive logic (NVIDIA/PGI, CCE, AMD) is duplicated between OMP_PARALLEL_LOOP (lines 226-234) and END_OMP_PARALLEL_LOOP (lines 242-250). If a compiler's directive syntax changes, both macros must be updated consistently.

Consider extracting the directive mappings to shared variables or a helper macro to maintain consistency and reduce maintenance burden.
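
One possible shape for such a helper, purely illustrative (the compiler detection and directive strings here are placeholders rather than the actual branches in omp_macros.fpp):

    #:def OMP_LOOP_CONSTRUCT(compiler)
        #! Single place that maps a compiler family to its combined loop construct.
        #:if compiler == 'cce'
            target teams distribute parallel do simd
        #:else
            target teams distribute parallel do
        #:endif
    #:enddef
    #! OMP_PARALLEL_LOOP would then prepend '!$omp ' (plus clauses) to this text,
    #! and END_OMP_PARALLEL_LOOP would prepend '!$omp end ' to the same text.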

src/common/include/parallel_macros.fpp (1)

55-66: Consider adding END_ACC_PARALLEL_LOOP macro for consistency.

In END_GPU_PARALLEL_LOOP, the OpenACC end directive is hardcoded as a string (line 57), while the OpenMP end directive calls END_OMP_PARALLEL_LOOP() (line 58).

For consistency and future extensibility, consider defining an END_ACC_PARALLEL_LOOP macro in acc_macros.fpp and calling it here, mirroring the OMP pattern. This would make the codebase more uniform and easier to maintain if compiler-specific ACC end directives are needed later.

src/common/include/acc_macros.fpp (1)

164-191: Consider adding END_ACC_PARALLEL_LOOP macro for consistency with OMP pattern.

The new ACC_PARALLEL_LOOP macro (lines 164-191) only emits the start directive, following the refactoring pattern. However, unlike omp_macros.fpp which defines END_OMP_PARALLEL_LOOP, there is no corresponding END_ACC_PARALLEL_LOOP macro in this file.

While the current usage through GPU_PARALLEL_LOOP works (it hardcodes the ACC end directive at line 57 of parallel_macros.fpp), adding END_ACC_PARALLEL_LOOP (a minimal sketch follows this list) would:

  • Maintain consistency with the OMP macro pattern
  • Enable direct usage of ACC macros if needed
  • Improve maintainability by centralizing the end directive logic
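
A minimal sketch of such a macro, assuming it simply mirrors the OMP pattern (this is not part of the PR):

    #:def END_ACC_PARALLEL_LOOP()
        !$acc end parallel loop
    #:enddef
    #! END_GPU_PARALLEL_LOOP in parallel_macros.fpp could then call this macro
    #! instead of hardcoding the OpenACC end-directive string.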
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b521697 and 973ea73.

📒 Files selected for processing (4)
  • src/common/include/acc_macros.fpp (3 hunks)
  • src/common/include/omp_macros.fpp (2 hunks)
  • src/common/include/parallel_macros.fpp (2 hunks)
  • src/common/m_mpi_common.fpp (20 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Build & Publish


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 973ea73 and 2ede9a1.

📒 Files selected for processing (1)
  • src/common/m_boundary_common.fpp (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish
🔇 Additional comments (1)
src/common/m_boundary_common.fpp (1)

94-118: LGTM: Clean macro refactoring.

The refactoring from call-style macros to explicit start/end directives is well-executed. The explicit private variable declarations (private='[l,k]') and collapse=2 directive are appropriate for the nested loop structure and will help produce accurate line numbers in compiler diagnostics.

Comment on lines +267 to +268
case (BC_SlIP_WALL)
call s_slip_wall(q_prim_vf, 3, 1, k, l)

⚠️ Potential issue | 🔴 Critical

Fix typo in case statement.

Line 267 contains a typo: BC_SlIP_WALL (capital 'I') should be BC_SLIP_WALL to match the constant used throughout the rest of the file. This typo would prevent the slip wall boundary condition from being applied correctly at the z-direction end boundary.

Apply this diff to fix the typo:

-                        case (BC_SlIP_WALL)
+                        case (BC_SLIP_WALL)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case (BC_SlIP_WALL)
call s_slip_wall(q_prim_vf, 3, 1, k, l)
case (BC_SLIP_WALL)
call s_slip_wall(q_prim_vf, 3, 1, k, l)
🤖 Prompt for AI Agents
In src/common/m_boundary_common.fpp around lines 267 to 268, the case label has
a typo `BC_SlIP_WALL` (capital 'I') that should be `BC_SLIP_WALL`; update the
case statement to use the correct constant name `BC_SLIP_WALL` so the slip wall
boundary condition is matched and the call to s_slip_wall executes as intended.

@danieljvickers
Member Author

Just a note on my debugging: it does look like reverting solves the segfault issue. I couldn't begin to guess why. If this passes, it should not be merged until I resolve the MPI common file. I will likely do a series of tests where I change the file a few loops at a time until I identify the compiler segfault. I will make another comment when it is ready to be merged.

Daniel Vickers added 2 commits November 17, 2025 00:16
…of fake errors. Now testing with half the loops in the new configuration.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/common/m_mpi_common.fpp (1)

760-906: Complete migration to new-style GPU macros within the buffer-packing subroutine.

Within the same subroutine (lines 760–906), different MPI directions use inconsistent macro styles:

  • mpi_dir==1 (lines 760–771): new-style $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP()
  • mpi_dir==2 (line 840): OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall GPU_PARALLEL_LOOP_OLD
  • mpi_dir==3 (lines 858–906): OLD-style macros

This mixed usage contradicts the PR's objective to uniformly replace call-style macros with start/end directives. The inconsistency creates maintenance burden and undermines the goal of accurate line-number mapping.

Migrate the remaining OLD-style macros at lines 840–855 and 858–906 to the new style for consistency.

♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-788: q loop variable still missing from private clause (previously flagged).

The GPU_PARALLEL_LOOP on line 774 omits q from its private list despite using it as a loop iterator on line 779. This was flagged in a previous review but remains unaddressed.

Apply this fix:

-                        $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
🧹 Nitpick comments (1)
src/common/m_mpi_common.fpp (1)

961-1130: Unpacking sections uniformly use OLD-style macros while packing is mixed.

All buffer-unpacking loops (lines 961-1130) across all three MPI directions consistently use the OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall macros. This contrasts with the packing sections (lines 760-906), which mix new-style and OLD-style.

While internal consistency within unpacking is good, the asymmetry between packing and unpacking sections suggests incomplete migration. Consider migrating unpacking to the new style in a follow-up to achieve uniform line-number mapping across the entire subroutine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5a3f74 and abf2f5a.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (15 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/common/m_mpi_common.fpp (4)

874-906: Add q to private clause in qbmm packing loops (mpi_dir == 3, OLD-style macros).

Both qbmm GPU pack loops for mpi_dir == 3 use the OLD-style macro with private='[r]' but iterate over q (lines 879 and 896), creating the same data race as the other mpi_dir branches.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, buff_size - 1
                                     do k = -buff_size, n + buff_size

Also applies to line 891:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, buff_size - 1
                                     do k = -buff_size, n + buff_size

982-1012: Add q to private clause in qbmm unpacking loops (mpi_dir == 1, OLD-style macros).

Both qbmm unpack loops for mpi_dir == 1 use OLD-style macros with private='[r]' but iterate over q (lines 987 and 1003), creating a data race during buffer unpacking.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do l = 0, p
                                 do k = 0, n
                                     do j = -buff_size, -1

Also applies to line 998:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do l = 0, p
                                 do k = 0, n
                                     do j = -buff_size, -1

1037-1069: Add q to private clause in qbmm unpacking loops (mpi_dir == 2, OLD-style macros).

Both qbmm unpack loops for mpi_dir == 2 iterate over q (lines 1042 and 1059) but only include [r] in their private clauses, causing a data race.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

Also applies to line 1054:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

1096-1130: Add q to private clause in qbmm unpacking loops (mpi_dir == 3, OLD-style macros).

Both qbmm unpack loops for mpi_dir == 3 iterate over q (lines 1101 and 1119) but only include [r] in their private clauses, causing a data race.

Apply this diff to add q to both private clauses:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size

Also applies to line 1114:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size
♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

774-804: Add q to private clause in qbmm packing loops (mpi_dir == 1).

Both qbmm GPU pack loops for mpi_dir == 1 have private='[r]' but the innermost loop iterates over q (line 779 and 795). Since q is declared in the subroutine scope (line 688) and not marked private, this creates a data race where multiple parallel iterations may conflict on the shared q variable, leading to incorrect buffer packing.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1

Also applies to line 790:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
🧹 Nitpick comments (1)
src/common/m_mpi_common.fpp (1)

760-1130: Consider completing macro migration for consistency.

The file mixes new-style $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() macros (lines 760-771, 774-788, 790-804, 807-820, 823-838, 840-855, 858-871) with OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall GPU_PARALLEL_LOOP_OLD macros (lines 874-906 for mpi_dir == 3 packing and lines 961-1130 for all unpacking).

Given your comments about encountering compiler segfaults and intending additional debugging before merging these MPI changes, this partial migration appears intentional. Once debugging is complete, consider migrating all remaining OLD-style macros to the new style for consistency with the rest of the PR.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abf2f5a and 71f19b1.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish

Comment on lines +823 to +855
$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, p
        do k = 0, buff_size - 1
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         (k + buff_size*l))
                    buff_send(r) = real(pb_in(j, k + pack_offset, l, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, p
        do k = 0, buff_size - 1
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         (k + buff_size*l))
                    buff_send(r) = real(mv_in(j, k + pack_offset, l, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🔴 Critical

Add q to private clause in qbmm packing loops (mpi_dir == 2).

Both qbmm GPU pack loops for mpi_dir == 2 iterate over q (lines 828 and 845) but only include [r] in their private clauses. This creates the same data race as in the mpi_dir == 1 case above.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1

Also applies to line 840:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 823 to 855, the two
GPU_PARALLEL_LOOP directives for the qbmm packing loops (mpi_dir == 2) only
declare [r] as private but the loops also iterate over q, causing a data race;
update both GPU_PARALLEL_LOOP(private='[r]') clauses to include q (e.g.,
private='[r,q]') for each of the two loops (also adjust the similar private
clause at the nearby line ~840) so that q is private within the parallel region.

Daniel Vickers added 2 commits November 17, 2025 00:44
…ng loops or both. Changing one of them now.
…n it is line 891-906 in m_mpi_common.fpp. I will then have to figure out why that causes a seg fault.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/common/m_mpi_common.fpp (2)

823-855: Add q to private clauses in both qbmm packing loops (mpi_dir == 2).

Both qbmm GPU packing loops for mpi_dir == 2 iterate over q (lines 828 and 845) but only include [r] in their private clauses. The loop index q must be declared private to avoid potential data races.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1
                                     do j = -buff_size, m + buff_size
                                         do q = 1, nb

Also at line 840:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, p
                                 do k = 0, buff_size - 1

774-804: Add q to private clauses in both qbmm packing loops (mpi_dir == 1).

Both qbmm GPU packing loops for mpi_dir == 1 iterate over q (lines 779 and 795) but only include [r] in their private clauses. The loop index q must be declared private to avoid potential data races.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
                                     do i = nVar + 1, nVar + 4
                                         do q = 1, nb

Also at line 790:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1875e3c and b5079f6.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (11 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pretty
src/common/m_mpi_common.fpp

[error] 896-896: MFC formatting changed the file. Process completed with exit code 1 after diff in src/common/m_mpi_common.fpp.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish

Comment on lines 874 to 906
$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, buff_size - 1
        do k = -buff_size, n + buff_size
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         ((k + buff_size) + (n + 2*buff_size + 1)*l))
                    buff_send(r) = real(pb_in(j, k, l + pack_offset, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do i = nVar + 1, nVar + 4
    do l = 0, buff_size - 1
        do k = -buff_size, n + buff_size
            do j = -buff_size, m + buff_size
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                         ((k + buff_size) + (n + 2*buff_size + 1)*l))
                    buff_send(r) = real(mv_in(j, k, l + pack_offset, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🔴 Critical

Add q to private clauses in both qbmm packing loops (mpi_dir == 3).

Both qbmm GPU packing loops for mpi_dir == 3 iterate over q (lines 879 and 896) but only include [r] in their private clauses. The loop index q must be declared private to avoid potential data races.

Apply this diff to add q to both private clauses:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, buff_size - 1
                                 do k = -buff_size, n + buff_size
                                     do j = -buff_size, m + buff_size
                                         do q = 1, nb

Also at line 891:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do i = nVar + 1, nVar + 4
                             do l = 0, buff_size - 1
                                 do k = -buff_size, n + buff_size
🧰 Tools
🪛 GitHub Actions: Pretty

[error] 896-896: MFC formatting changed the file. Process completed with exit code 1 after diff in src/common/m_mpi_common.fpp.

🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 874 to 906, both GPU_PARALLEL_LOOP
pragmas for the qbmm packing path (mpi_dir == 3) currently declare only r as
private but the inner loop also iterates over q; declare q private as well to
prevent data races by updating the private clauses to include q (e.g.,
private='[r,q]') for both loops spanning the blocks starting at ~line 876 and
~line 892.

…ll be curious if the format change does anything.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/common/m_mpi_common.fpp (6)

982-996: Add q to private clause in qbmm pb_in unpacking loop (mpi_dir == 1).

The qbmm GPU unpacking loop for pb_in when mpi_dir == 1 iterates over q (line 987) but only includes [r] in the private clause at line 961's OLD-style wrapper. Variable q must be declared private to avoid potential data races.

The private clause is set at the GPU_PARALLEL_LOOP_OLD invocation. Update it to include q:

-                    #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                    #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = -buff_size, -1

998-1012: Add q to private clause in qbmm mv_in unpacking loop (mpi_dir == 1).

The qbmm GPU unpacking loop for mv_in when mpi_dir == 1 iterates over q (line 1003) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do l = 0, p
                                 do k = 0, n
                                     do j = -buff_size, -1

1037-1052: Add q to private clause in qbmm pb_in unpacking loop (mpi_dir == 2).

The qbmm GPU unpacking loop for pb_in when mpi_dir == 2 iterates over q (line 1042) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

1054-1069: Add q to private clause in qbmm mv_in unpacking loop (mpi_dir == 2).

The qbmm GPU unpacking loop for mv_in when mpi_dir == 2 iterates over q (line 1059) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = 0, p
                                     do k = -buff_size, -1

1096-1112: Add q to private clause in qbmm pb_in unpacking loop (mpi_dir == 3).

The qbmm GPU unpacking loop for pb_in when mpi_dir == 3 iterates over q (line 1101) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size

1114-1130: Add q to private clause in qbmm mv_in unpacking loop (mpi_dir == 3).

The qbmm GPU unpacking loop for mv_in when mpi_dir == 3 iterates over q (line 1119) but only includes [r] in the private clause. Variable q must be declared private to avoid potential data races.

Apply this diff:

-                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]')
+                        #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]')
                             do i = nVar + 1, nVar + 4
                                 do l = -buff_size, -1
                                     do k = -buff_size, n + buff_size
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5079f6 and 31d913f.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish

Comment on lines +790 to +804
$:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
do l = 0, p
    do k = 0, n
        do j = 0, buff_size - 1
            do i = nVar + 1, nVar + 4
                do q = 1, nb
                    r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                        (j + buff_size*(k + (n + 1)*l))
                    buff_send(r) = real(mv_in(j + pack_offset, k, l, i - nVar, q), kind=wp)
                end do
            end do
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🔴 Critical

Add q to private clause in qbmm mv_in packing loop (mpi_dir == 1).

The second qbmm GPU packing loop for mpi_dir == 1 iterates over q (line 795) but only includes [r] in the private clause. This creates a potential data race on q.

Apply this diff:

-                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+                        $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
                         do l = 0, p
                             do k = 0, n
                                 do j = 0, buff_size - 1
🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 790 to 804, the GPU_PARALLEL_LOOP
for the qbmm mv_in packing (mpi_dir == 1) declares private='[r]' but the loop
also uses loop variable q, which can cause a data race; update the private
clause to include q as well (e.g., private='[r,q]') so q is private per GPU
thread and no race occurs, preserving the existing loop structure and
indentation.

…VHOPC openACC, which must have been overwritten
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/simulation/m_bubbles_EL_kernels.fpp (2)

58-93: Incomplete private clause causes data races.

The private clause only includes l, s_coord, and cell, but the loop body also modifies volpart, strength_vol, strength_vel, Vol, addFun1, addFun2, and addFun3. Without privatizing these temporaries, concurrent threads will race on shared variables, producing nondeterministic results.

Apply this diff to add missing variables to the private clause:

-        $:GPU_PARALLEL_LOOP(private='[l,s_coord,cell]')
+        $:GPU_PARALLEL_LOOP(private='[l,s_coord,cell,volpart,strength_vol,strength_vel,Vol,addFun1,addFun2,addFun3]')

124-199: Incomplete private clause causes data races.

The private clause only includes nodecoord, l, s_coord, cell, and center, but the loop body also uses per-bubble temporaries volpart, strength_vol, strength_vel, stddsv and per-cell temporaries func, func2, addFun1, addFun2, addFun3, cellaux, and celloutside. Without privatizing these variables in the outer parallel loop, concurrent threads will race on shared storage.

Apply this diff to add missing variables to the private clause:

-        $:GPU_PARALLEL_LOOP(private='[nodecoord,l,s_coord,cell,center]', copyin='[smearGrid,smearGridz]')
+        $:GPU_PARALLEL_LOOP(private='[nodecoord,l,s_coord,cell,center,volpart,strength_vol,strength_vel,stddsv,func,func2,addFun1,addFun2,addFun3,cellaux,celloutside]', copyin='[smearGrid,smearGridz]')
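The same principle extends from loop indices to per-iteration temporaries: any scalar the loop body overwrites on every iteration has to be privatized. A small self-contained sketch of that pattern (hypothetical names, shown with the OpenMP offload directive for the other backend; not MFC's actual kernel):

subroutine smear_sketch(nBubs, volpart_in, beta)
    implicit none
    integer, intent(in) :: nBubs
    real, intent(in)    :: volpart_in(nBubs)
    real, intent(inout) :: beta(nBubs)
    integer :: k
    real :: strength_vol, addFun1   ! scratch temporaries at subroutine scope

    ! Without private(strength_vol, addFun1), all threads would share one copy
    ! of each temporary and the results become nondeterministic.
    !$omp target teams distribute parallel do private(strength_vol, addFun1)
    do k = 1, nBubs
        strength_vol = volpart_in(k)
        addFun1 = 0.5*strength_vol
        beta(k) = beta(k) + addFun1   ! the indexed write is unique per iteration
    end do
    !$omp end target teams distribute parallel do
end subroutine smear_sketch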
🧹 Nitpick comments (1)
src/simulation/m_bubbles_EL.fpp (1)

1516-1529: Void-fraction statistics reduction is correctly expressed

The $:GPU_PARALLEL_LOOP in s_write_void_evol uses a grouped reduction [[lag_vol, lag_void_avg], [lag_void_max]] with operators [+, MAX], which matches the accumulation pattern inside the loop. Privatizing volcell avoids reuse across iterations, and the prior initialization of the reduced variables ensures a well-defined starting point.

If you ever touch this block again, you might consider switching the literal 5.0d-11 to a wp-kind literal (e.g., 5.0e-11_wp) for consistency with the rest of the file, but it’s not functionally urgent.
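A compact, self-contained sketch of what that grouped reduction amounts to at the directive level, with the wp-kind literal folded in (hypothetical names and a made-up loop body; the real macro expansion may order or group the clauses differently):

program reduction_sketch
    implicit none
    integer, parameter :: wp = selected_real_kind(15)
    integer, parameter :: nBubs = 1000
    integer :: k
    real(wp) :: vol(nBubs), void(nBubs)
    real(wp) :: lag_vol, lag_void_avg, lag_void_max, volcell

    call random_number(vol)
    call random_number(void)
    lag_vol = 0._wp; lag_void_avg = 0._wp; lag_void_max = 0._wp

    ! One reduction clause per operator group: '+' for the sums, 'max' for the peak.
    ! volcell is a per-iteration temporary, so it is privatized.
    !$acc parallel loop private(volcell) reduction(+:lag_vol, lag_void_avg) reduction(max:lag_void_max)
    do k = 1, nBubs
        volcell = max(vol(k), 5.0e-11_wp)   ! wp-kind literal instead of 5.0d-11
        lag_vol = lag_vol + volcell
        lag_void_avg = lag_void_avg + void(k)*volcell
        lag_void_max = max(lag_void_max, void(k))
    end do

    print *, lag_vol, lag_void_avg/lag_vol, lag_void_max
end program reduction_sketch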

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31d913f and 30417cb.

📒 Files selected for processing (2)
  • src/simulation/m_bubbles_EL.fpp (10 hunks)
  • src/simulation/m_bubbles_EL_kernels.fpp (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Self Hosted (cpu, none, frontier)
  • GitHub Check: Self Hosted (gpu, acc, gt)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Self Hosted (gpu, acc, frontier)
  • GitHub Check: Self Hosted (cpu, none, gt)
  • GitHub Check: Self Hosted (gpu, omp, frontier)
  • GitHub Check: Self Hosted (gpu, omp, gt)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Build & Publish
🔇 Additional comments (8)
src/simulation/m_bubbles_EL_kernels.fpp (1)

384-384: No breaking changes detected within the codebase.

The search found no calls to s_get_char_vol anywhere in the repository, and the module m_bubbles_EL_kernels is not imported elsewhere. Removing the elemental keyword will not break any existing code within this codebase.

However, if this function is part of a public library API consumed by external projects, those external callers may have elemental calls that would now fail. Please verify whether this module is intended as a public API or an internal implementation detail.
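As a quick refresher on what the elemental attribute buys, and therefore what its removal can break for array-style callers, a self-contained sketch with a hypothetical routine:

program elemental_sketch
    implicit none
    real :: r1, v1
    real :: r(3), v(3)

    r1 = 1.0
    r = [1.0, 2.0, 3.0]

    call get_vol(r1, v1)   ! scalar call: fine with or without 'elemental'
    call get_vol(r, v)     ! element-wise array call: only legal while 'elemental'
    print *, v1, v

contains

    elemental subroutine get_vol(radius, vol)
        real, intent(in)  :: radius
        real, intent(out) :: vol
        vol = (4.0/3.0)*3.14159265*radius**3
    end subroutine get_vol

end program elemental_sketch

If the module really is internal-only, dropping elemental is harmless; if it were ever consumed externally, any such array-style call sites would stop compiling.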

src/simulation/m_bubbles_EL.fpp (7)

598-615: GPU_PARALLEL_LOOP refactor in s_compute_bubble_EL_dynamics looks sound

The new $:GPU_PARALLEL_LOOP blocks correctly privatize loop indices and per‑iteration temporaries (k, i, cell, myalpha_rho, myalpha, Re) and use a MAX reduction for adap_dt_stop_max consistent with the scalar update pattern. I don’t see new data races or scope issues introduced by this macro restructuring, and the added copyin='[stage]' is appropriate.

Also applies to: 620-686, 692-699


723-738: Source-term GPU loops have consistent scoping and no obvious races

In s_compute_bubbles_EL_source, the converted $:GPU_PARALLEL_LOOP regions (with collapse and explicit private='[i,j,k,l]') align with the nested loop structure, and each iteration writes to a unique (l,i,j,k) or (i,j,k) location in rhs_vf/q_beta. The absence of reductions here is correct, and I don’t see any new parallel write conflicts introduced by the macro changes.

Also applies to: 740-754, 762-775, 779-787, 792-804


849-859: Void-fraction smearing loops remain correct under new macros

The two $:GPU_PARALLEL_LOOP usages in s_smear_voidfraction properly privatize indices and cover all (i,j,k,l) combinations once, so zeroing q_beta and then applying the 1 - beta/clamping transform is still race‑free and logically unchanged by the refactor.

Also applies to: 866-877


1107-1117: Time-stepper update kernels preserve per-bubble independence

In s_update_lagrange_tdv_rk, all RK1/RK2/RK3 update loops now use $:GPU_PARALLEL_LOOP(private='[k]') over k = 1:nBubs, with each iteration only touching state for that k. This keeps the original per‑bubble independence, so introducing the explicit start/end GPU macros does not add races or change the update ordering semantics.

Also applies to: 1131-1141, 1144-1154, 1168-1179, 1182-1192, 1193-1204


1280-1292: Temporary-state transfer loop is a good fit for GPU_PARALLEL_LOOP

The $:GPU_PARALLEL_LOOP(private='[k]') around s_transfer_data_to_tmp correctly parallelizes a pure per‑bubble copy of state from slot 1 → 2. There are no cross‑iteration dependencies, so this is safe and matches the intended data movement.


1378-1391: Gradient kernels: GPU loop restructuring keeps stencil semantics intact

For all three directions in s_gradient_dir, the new $:GPU_PARALLEL_LOOP(private='[i,j,k]', collapse=3) wrappers preserve the central‑difference stencil (neighbors at ±1 along the active axis) and only update dq(i,j,k) per iteration. Given the existing ghost‑cell layout, this macro refactor does not introduce boundary or race issues.

Also applies to: 1394-1406, 1409-1421
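For concreteness, the shape of kernel being described: a central difference along one axis with ghost cells, needing no reductions or atomics because each iteration writes a single unique element. A minimal sketch with hypothetical bounds and names, not MFC's actual routine:

subroutine gradient_x_sketch(q, dq, dx, m, n, p)
    implicit none
    integer, intent(in) :: m, n, p
    real, intent(in)  :: dx
    real, intent(in)  :: q(-1:m + 1, 0:n, 0:p)   ! one ghost layer in x
    real, intent(out) :: dq(0:m, 0:n, 0:p)
    integer :: i, j, k

    ! Mirrors the $:GPU_PARALLEL_LOOP(private='[i,j,k]', collapse=3) pattern:
    ! collapsed indices, stencil reads at i-1 and i+1, one write per iteration.
    !$acc parallel loop collapse(3) private(i, j, k)
    do k = 0, p
        do j = 0, n
            do i = 0, m
                dq(i, j, k) = (q(i + 1, j, k) - q(i - 1, j, k))/(2.0*dx)
            end do
        end do
    end do
end subroutine gradient_x_sketch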


1712-1721: Bubble-radius stats reduction matches update logic

In s_calculate_lag_bubble_stats, the $:GPU_PARALLEL_LOOP with reduction='[[Rmax_glb], [Rmin_glb]]' and reductionOp='[MAX, MIN]' matches the max/min usage, and each k iteration only writes to Rmax_stats(k)/Rmin_stats(k). This keeps global and per‑bubble statistics consistent without adding races.

Collaborator

@wilfonba wilfonba left a comment

benchmark

@danieljvickers
Copy link
Member Author

@anandrdbz I think you told me that I should expect the benchmark to fail because the master benchmark is not currently working. Can you confirm whether I'm remembering this correctly?

@danieljvickers
Member Author

danieljvickers commented Nov 17, 2025

@sbryngelson My code is finally rebased and passing regression tests, but it is failing the benchmark. The last PR I can find that even ran the benchmark tests is this one: 8a66941

That PR fails the benchmark as well.

I recall Anand telling me that the benchmark would fail, and I have asked him to comment here for clarity. I think this PR is as complete as it can be, and we simply expect the benchmarking to fail. My preference would be to merge sooner rather than later, because every PR that lands forces a significant amount of debugging and rebasing on code I am unfamiliar with, and I don't know whether it is efficient for me to keep updating this PR until the test suite is fixed. Are you inclined to wait for the benchmark results, or can we merge? What do you think?

Edit: Anand confirmed over Slack that the NVHPC benchmark is expected to fail because of issues with OpenMP on it. The rest appear to fail because they are canceled when the NVHPC benchmark test fails.

@danieljvickers
Member Author

@sbryngelson this merge passed all the benchmarks except for the broken one. Do you think we can just merge it now and work on the NVHPC issue next?

@sbryngelson sbryngelson merged commit 22af239 into MFlowCode:master Nov 19, 2025
39 of 44 checks passed

Labels

Review effort: 4/5 · size:XXL (this PR changes 1000+ lines, ignoring generated files)

Development

Successfully merging this pull request may close these issues.

add-correct-line-numbering-to-fypp

5 participants