Add line numbering to gpu loops #1029
Conversation
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
| if (mhd) then | ||
| if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const. | ||
| ! B_y flux = v_x * B_y - v_y * Bx0 | ||
| ! B_z flux = v_x * B_z - v_z * Bx0 | ||
| $:GPU_LOOP(parallelism='[seq]') | ||
| do i = 1, 3 | ||
| ! Flux of rho*v_i in the ${XYZ}$ direction | ||
| ! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot | ||
| flux_rs${XYZ}$_vf(j, k, l, contxe + i) = & | ||
| (s_M*(rho_R*vel_R(i)*vel_R(norm_dir) & | ||
| - B%R(i)*B%R(norm_dir) & | ||
| + dir_flg(i)*(pres_R + pres_mag%R)) & | ||
| - s_P*(rho_L*vel_L(i)*vel_L(norm_dir) & | ||
| - B%L(i)*B%L(norm_dir) & | ||
| + dir_flg(i)*(pres_L + pres_mag%L)) & | ||
| + s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) & | ||
| /(s_M - s_P) | ||
| do i = 0, 1 | ||
| flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) & | ||
| - s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) & | ||
| + s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P) | ||
| end do | ||
| elseif (mhd .and. relativity) then | ||
| else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction | ||
| ! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$) | ||
| ! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$) | ||
| ! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$) | ||
| $:GPU_LOOP(parallelism='[seq]') | ||
| do i = 1, 3 | ||
| ! Flux of m_i in the ${XYZ}$ direction | ||
| ! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot | ||
| flux_rs${XYZ}$_vf(j, k, l, contxe + i) = & | ||
| (s_M*(cm%R(i)*vel_R(norm_dir) & | ||
| - b4%R(i)/Ga%R*B%R(norm_dir) & | ||
| + dir_flg(i)*(pres_R + pres_mag%R)) & | ||
| - s_P*(cm%L(i)*vel_L(norm_dir) & | ||
| - b4%L(i)/Ga%L*B%L(norm_dir) & | ||
| + dir_flg(i)*(pres_L + pres_mag%L)) & | ||
| + s_M*s_P*(cm%L(i) - cm%R(i))) & | ||
| /(s_M - s_P) | ||
| end do | ||
| elseif (bubbles_euler) then | ||
| $:GPU_LOOP(parallelism='[seq]') | ||
| do i = 1, num_vels | ||
| flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = & | ||
| (s_M*(rho_R*vel_R(dir_idx(1)) & | ||
| *vel_R(dir_idx(i)) & | ||
| + dir_flg(dir_idx(i))*(pres_R - ptilde_R)) & | ||
| - s_P*(rho_L*vel_L(dir_idx(1)) & | ||
| *vel_L(dir_idx(i)) & | ||
| + dir_flg(dir_idx(i))*(pres_L - ptilde_L)) & | ||
| + s_M*s_P*(rho_L*vel_L(dir_idx(i)) & | ||
| - rho_R*vel_R(dir_idx(i)))) & | ||
| /(s_M - s_P) & | ||
| + (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i))) | ||
| end do | ||
| else if (hypoelasticity) then | ||
| $:GPU_LOOP(parallelism='[seq]') | ||
| do i = 1, num_vels | ||
| flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = & | ||
| (s_M*(rho_R*vel_R(dir_idx(1)) & | ||
| *vel_R(dir_idx(i)) & | ||
| + dir_flg(dir_idx(i))*pres_R & | ||
| - tau_e_R(dir_idx_tau(i))) & | ||
| - s_P*(rho_L*vel_L(dir_idx(1)) & | ||
| *vel_L(dir_idx(i)) & | ||
| + dir_flg(dir_idx(i))*pres_L & | ||
| - tau_e_L(dir_idx_tau(i))) & | ||
| + s_M*s_P*(rho_L*vel_L(dir_idx(i)) & | ||
| - rho_R*vel_R(dir_idx(i)))) & | ||
| /(s_M - s_P) | ||
| end do | ||
| else | ||
| $:GPU_LOOP(parallelism='[seq]') | ||
| do i = 1, num_vels | ||
| flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = & | ||
| (s_M*(rho_R*vel_R(dir_idx(1)) & | ||
| *vel_R(dir_idx(i)) & | ||
| + dir_flg(dir_idx(i))*pres_R) & | ||
| - s_P*(rho_L*vel_L(dir_idx(1)) & | ||
| *vel_L(dir_idx(i)) & | ||
| + dir_flg(dir_idx(i))*pres_L) & | ||
| + s_M*s_P*(rho_L*vel_L(dir_idx(i)) & | ||
| - rho_R*vel_R(dir_idx(i)))) & | ||
| /(s_M - s_P) & | ||
| + (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i))) | ||
| do i = 0, 2 | ||
| flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( & | ||
| s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - & | ||
| s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + & | ||
| s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P) | ||
| end do | ||
| end if | ||
| flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp | ||
| end if |
Suggestion: Restrict the 1D MHD flux calculation to only execute for the x-direction (NORM_DIR == 1) to prevent incorrect flux calculations for the y and z directions. [possible issue, importance: 9]
| if (mhd) then | |
| if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const. | |
| ! B_y flux = v_x * B_y - v_y * Bx0 | |
| ! B_z flux = v_x * B_z - v_z * Bx0 | |
| $:GPU_LOOP(parallelism='[seq]') | |
| do i = 1, 3 | |
| ! Flux of rho*v_i in the ${XYZ}$ direction | |
| ! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot | |
| flux_rs${XYZ}$_vf(j, k, l, contxe + i) = & | |
| (s_M*(rho_R*vel_R(i)*vel_R(norm_dir) & | |
| - B%R(i)*B%R(norm_dir) & | |
| + dir_flg(i)*(pres_R + pres_mag%R)) & | |
| - s_P*(rho_L*vel_L(i)*vel_L(norm_dir) & | |
| - B%L(i)*B%L(norm_dir) & | |
| + dir_flg(i)*(pres_L + pres_mag%L)) & | |
| + s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) & | |
| /(s_M - s_P) | |
| do i = 0, 1 | |
| flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) & | |
| - s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) & | |
| + s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P) | |
| end do | |
| elseif (mhd .and. relativity) then | |
| else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction | |
| ! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$) | |
| ! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$) | |
| ! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$) | |
| $:GPU_LOOP(parallelism='[seq]') | |
| do i = 1, 3 | |
| ! Flux of m_i in the ${XYZ}$ direction | |
| ! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot | |
| flux_rs${XYZ}$_vf(j, k, l, contxe + i) = & | |
| (s_M*(cm%R(i)*vel_R(norm_dir) & | |
| - b4%R(i)/Ga%R*B%R(norm_dir) & | |
| + dir_flg(i)*(pres_R + pres_mag%R)) & | |
| - s_P*(cm%L(i)*vel_L(norm_dir) & | |
| - b4%L(i)/Ga%L*B%L(norm_dir) & | |
| + dir_flg(i)*(pres_L + pres_mag%L)) & | |
| + s_M*s_P*(cm%L(i) - cm%R(i))) & | |
| /(s_M - s_P) | |
| end do | |
| elseif (bubbles_euler) then | |
| $:GPU_LOOP(parallelism='[seq]') | |
| do i = 1, num_vels | |
| flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = & | |
| (s_M*(rho_R*vel_R(dir_idx(1)) & | |
| *vel_R(dir_idx(i)) & | |
| + dir_flg(dir_idx(i))*(pres_R - ptilde_R)) & | |
| - s_P*(rho_L*vel_L(dir_idx(1)) & | |
| *vel_L(dir_idx(i)) & | |
| + dir_flg(dir_idx(i))*(pres_L - ptilde_L)) & | |
| + s_M*s_P*(rho_L*vel_L(dir_idx(i)) & | |
| - rho_R*vel_R(dir_idx(i)))) & | |
| /(s_M - s_P) & | |
| + (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i))) | |
| end do | |
| else if (hypoelasticity) then | |
| $:GPU_LOOP(parallelism='[seq]') | |
| do i = 1, num_vels | |
| flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = & | |
| (s_M*(rho_R*vel_R(dir_idx(1)) & | |
| *vel_R(dir_idx(i)) & | |
| + dir_flg(dir_idx(i))*pres_R & | |
| - tau_e_R(dir_idx_tau(i))) & | |
| - s_P*(rho_L*vel_L(dir_idx(1)) & | |
| *vel_L(dir_idx(i)) & | |
| + dir_flg(dir_idx(i))*pres_L & | |
| - tau_e_L(dir_idx_tau(i))) & | |
| + s_M*s_P*(rho_L*vel_L(dir_idx(i)) & | |
| - rho_R*vel_R(dir_idx(i)))) & | |
| /(s_M - s_P) | |
| end do | |
| else | |
| $:GPU_LOOP(parallelism='[seq]') | |
| do i = 1, num_vels | |
| flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = & | |
| (s_M*(rho_R*vel_R(dir_idx(1)) & | |
| *vel_R(dir_idx(i)) & | |
| + dir_flg(dir_idx(i))*pres_R) & | |
| - s_P*(rho_L*vel_L(dir_idx(1)) & | |
| *vel_L(dir_idx(i)) & | |
| + dir_flg(dir_idx(i))*pres_L) & | |
| + s_M*s_P*(rho_L*vel_L(dir_idx(i)) & | |
| - rho_R*vel_R(dir_idx(i)))) & | |
| /(s_M - s_P) & | |
| + (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i))) | |
| do i = 0, 2 | |
| flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( & | |
| s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - & | |
| s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + & | |
| s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P) | |
| end do | |
| end if | |
| flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp | |
| end if | |
| if (mhd) then | |
| if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const. | |
| #:if (NORM_DIR == 1) | |
| ! B_y flux = v_x * B_y - v_y * Bx0 | |
| ! B_z flux = v_x * B_z - v_z * Bx0 | |
| $:GPU_LOOP(parallelism='[seq]') | |
| do i = 0, 1 | |
| flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) & | |
| - s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) & | |
| + s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P) | |
| end do | |
| #:endif | |
| else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction | |
| ! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$) | |
| ! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$) | |
| ! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$) | |
| $:GPU_LOOP(parallelism='[seq]') | |
| do i = 0, 2 | |
| flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( & | |
| s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - & | |
| s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + & | |
| s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P) | |
| end do | |
| end if | |
| flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp | |
| end if |
src/simulation/m_viscous.fpp
Outdated
| rho_visc = 0._wp | ||
| gamma_visc = 0._wp | ||
| pi_inf_visc = 0._wp | ||
| $:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]') |
Suggestion: Add the loop variable q to the private clause of the GPU_PARALLEL_LOOP directive to prevent a race condition. [possible issue, importance: 9]
| $:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]') | |
| $:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,q,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]') |
I just built this on Frontier manually and both omp and acc compile fine. I am going to rerun the build for Frontier.
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##           master    #1029     +/-  ##
==========================================
- Coverage   46.23%   44.34%   -1.90%
==========================================
  Files          68       71       +3
  Lines       13548    20578    +7030
  Branches     1589     1990     +401
==========================================
+ Hits         6264     9125    +2861
- Misses       6356    10308    +3952
- Partials      928     1145     +217

☔ View full report in Codecov by Sentry.
Running this by @anandrdbz and @prathi-wind. Also, the MHD parts might need to be examined by @ChrisZYJ.
We can get their input. Luckily a few tests are actually passing on Frontier, and the ones that are failing appear to mostly have |
…ectives getting printed. Thanks Anand for looking at things with me
…was never called, breaking the IO
I personally think this should be fine, seems mostly non-intrusive to the PR |
We are still getting the weird timeout bug on Frontier. All tests pass except for two on the last job, and those two hung instead of failed. Mostly I'm just stating that Frontier has been pretty finicky, since I've resubmitted it about 4 times now without seeing any cases actually fail, just run out of time. After I get this merged, and maybe after SC, I want to open a branch and debug what is going on in the toolchain that's causing this.
Is there a reason why you add loop iterator variables to the private list of the parallel loops? Loop iterators should automatically be private. |
To make the macros consistent, |
There is not a particular reason I added private variables other than being thorough. I found a bug with IO as a result of explicitly placing iterators in the private variables, so I think it was a good thing to add. For style, I have no problem with changing the host data macros. I would prefer to do that in a separate PR, only because of the massive number of lines the indentation change modified here, and because the host data change is stylistic, whereas this PR actually encompasses a bug fix.
Logging again that Frontier ran out of time. This time only one job hung, and it was different than the previous hang. Weird. I guess I'll start it again. @sbryngelson do you think I should just keep resubmitting until I finally get a failure or it passes?
CodeAnt AI is running Incremental review |
CodeAnt AI Incremental review completed. |
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/common/m_mpi_common.fpp (1)
760-771: Fix incorrect macro syntax: use GPU_PARALLEL_LOOP instead of GPU_PARALLEL throughout this file. The pipeline reports that the GPU_PARALLEL macro doesn't accept the collapse keyword argument. Based on the PR objectives, the macro invocations should use the new $:GPU_PARALLEL_LOOP() and $:END_GPU_PARALLEL_LOOP() syntax, not the call-style syntax with GPU_PARALLEL. This issue affects all 16 GPU parallel regions in this file at lines: 760, 774, 790, 807, 823, 840, 858, 961, 982, 998, 1015, 1037, 1054, 1073, 1096, and 1114.
Apply this pattern to fix all occurrences (example for lines 760-771):
- #:call GPU_PARALLEL(collapse=4,private='[r]')
+ $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
      do l = 0, p
          do k = 0, n
              do j = 0, buff_size - 1
                  do i = 1, nVar
                      r = (i - 1) + v_size*(j + buff_size*(k + (n + 1)*l))
                      buff_send(r) = real(q_comm(i)%sf(j + pack_offset, k, l), kind=wp)
                  end do
              end do
          end do
      end do
- #:endcall GPU_PARALLEL
+ $:END_GPU_PARALLEL_LOOP()
Also applies to: 774-788, 790-804, 807-820, 823-838, 840-855, 858-871, 961-979, 982-996, 998-1012, 1015-1034, 1037-1052, 1054-1069, 1073-1093, 1096-1112, 1114-1130
♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)
774-788: Add q to the private list for consistency and correctness. The loop body uses q as a loop index (line 779), but q is not included in the private clause. This is inconsistent with other qbmm loops in this file (lines 790, 823, 840, etc.), which include q in their private lists. After applying the macro syntax fix, also add q to the private list:
- $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+ $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/common/m_mpi_common.fpp(21 hunks)
🧰 Additional context used
🪛 GitHub Actions: Coverage Check
src/common/m_mpi_common.fpp
[error] 760-760: FyppFatalError: exception occurred when calling 'GPU_PARALLEL' and macro 'GPU_PARALLEL' called with unknown keyword argument(s) 'collapse'. Preprocessing failed during Fypp step in pre_process target.
🪛 GitHub Actions: Pretty
src/common/m_mpi_common.fpp
[error] 1871-1871: ./mfc.sh format -j $(nproc) failed: MFC formatting detected an end-of-file newline issue in 'src/common/m_mpi_common.fpp' (diff shows missing newline at end of file).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Self Hosted (cpu, none, frontier)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Self Hosted (cpu, none, gt)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Self Hosted (gpu, omp, frontier)
- GitHub Check: Self Hosted (gpu, acc, gt)
- GitHub Check: Build & Publish
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)
774-788: Add missing 'q' to private list in qbmm pack loop. The GPU_PARALLEL_LOOP_OLD at line 774 has private='[r]', but the inner loop at lines 779-783 uses q as a loop index:
do q = 1, nb
    r = (i - 1) + (q - 1)*4 + v_size* ...
Variable q should be included in the private clause to prevent potential data races. All other qbmm loops in this file (lines 790, 823, 840, etc.) correctly include q in their private lists. Apply this fix:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=4,private='[r]')
+ #:call GPU_PARALLEL_LOOP_OLD(collapse=4,private='[q,r]')
🧹 Nitpick comments (3)
src/common/include/omp_macros.fpp (1)
199-253: Consider extracting the compiler-specific directive mapping to reduce duplication. The compiler-specific OpenMP directive logic (NVIDIA/PGI, CCE, AMD) is duplicated between OMP_PARALLEL_LOOP (lines 226-234) and END_OMP_PARALLEL_LOOP (lines 242-250). If a compiler's directive syntax changes, both macros must be updated consistently. Consider extracting the directive mappings into shared variables or a helper macro to maintain consistency and reduce maintenance burden.
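One way to express that shared mapping, sketched in fypp. This is only an illustration: the helper name OMP_LOOP_CLAUSES, the compiler identifiers, and the directive strings are assumptions, not the actual contents of omp_macros.fpp, and the real macros take their configuration from the build setup rather than an argument.

#! Hypothetical sketch: centralize the compiler-to-directive mapping so the
#! start and end macros can never drift apart. Names and strings are illustrative.
#:def OMP_LOOP_CLAUSES(compiler)
${'target teams loop' if compiler in ('nvidia', 'pgi') else 'target teams distribute parallel do simd' if compiler == 'cce' else 'target teams distribute parallel do'}$
#:enddef

#:def OMP_PARALLEL_LOOP(compiler, clauses)
!$omp ${OMP_LOOP_CLAUSES(compiler).strip()}$ ${clauses}$
#:enddef

#:def END_OMP_PARALLEL_LOOP(compiler)
!$omp end ${OMP_LOOP_CLAUSES(compiler).strip()}$
#:enddef

With this layout, a change to one compiler's directive is made in a single place and both the opening and closing directives pick it up.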
src/common/include/parallel_macros.fpp (1)
55-66: Consider adding an END_ACC_PARALLEL_LOOP macro for consistency. In END_GPU_PARALLEL_LOOP, the OpenACC end directive is hardcoded as a string (line 57), while the OpenMP end directive calls END_OMP_PARALLEL_LOOP() (line 58). For consistency and future extensibility, consider defining an END_ACC_PARALLEL_LOOP macro in acc_macros.fpp and calling it here, mirroring the OMP pattern. This would make the codebase more uniform and easier to maintain if compiler-specific ACC end directives are needed later.
src/common/include/acc_macros.fpp (1)
164-191: Consider adding an END_ACC_PARALLEL_LOOP macro for consistency with the OMP pattern. The new ACC_PARALLEL_LOOP macro (lines 164-191) only emits the start directive, following the refactoring pattern. However, unlike omp_macros.fpp, which defines END_OMP_PARALLEL_LOOP, there is no corresponding END_ACC_PARALLEL_LOOP macro in this file. While the current usage through GPU_PARALLEL_LOOP works (it hardcodes the ACC end directive at line 57 of parallel_macros.fpp), adding END_ACC_PARALLEL_LOOP would:
- Maintain consistency with the OMP macro pattern
- Enable direct usage of ACC macros if needed
- Improve maintainability by centralizing the end directive logic
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/common/include/acc_macros.fpp(3 hunks)src/common/include/omp_macros.fpp(2 hunks)src/common/include/parallel_macros.fpp(2 hunks)src/common/m_mpi_common.fpp(20 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Self Hosted (cpu, none, gt)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Self Hosted (gpu, omp, frontier)
- GitHub Check: Self Hosted (cpu, none, frontier)
- GitHub Check: Self Hosted (gpu, acc, gt)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Build & Publish
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/common/m_boundary_common.fpp(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Self Hosted (cpu, none, gt)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Self Hosted (cpu, none, frontier)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Self Hosted (gpu, omp, frontier)
- GitHub Check: Self Hosted (gpu, acc, gt)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
🔇 Additional comments (1)
src/common/m_boundary_common.fpp (1)
94-118: LGTM: Clean macro refactoring. The refactoring from call-style macros to explicit start/end directives is well-executed. The explicit private variable declarations (private='[l,k]') and the collapse=2 directive are appropriate for the nested loop structure and will help produce accurate line numbers in compiler diagnostics.
| case (BC_SlIP_WALL) | ||
| call s_slip_wall(q_prim_vf, 3, 1, k, l) |
Fix typo in case statement.
Line 267 contains a typo: BC_SlIP_WALL (capital 'I') should be BC_SLIP_WALL to match the constant used throughout the rest of the file. This typo would prevent the slip wall boundary condition from being applied correctly at the z-direction end boundary.
Apply this diff to fix the typo:
- case (BC_SlIP_WALL)
+ case (BC_SLIP_WALL)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| case (BC_SlIP_WALL) | |
| call s_slip_wall(q_prim_vf, 3, 1, k, l) | |
| case (BC_SLIP_WALL) | |
| call s_slip_wall(q_prim_vf, 3, 1, k, l) |
🤖 Prompt for AI Agents
In src/common/m_boundary_common.fpp around lines 267 to 268, the case label has
a typo `BC_SlIP_WALL` (capital 'I') that should be `BC_SLIP_WALL`; update the
case statement to use the correct constant name `BC_SLIP_WALL` so the slip wall
boundary condition is matched and the call to s_slip_wall executes as intended.
Just as a note about my debugging now: it does look like reverting solves the seg fault issue. I couldn't begin to guess why. If this passes, it should not be merged until I resolve the MPI common file. I will likely do a series of tests where I change the file a few loops at a time until I identify the compiler seg fault. I will make another comment when it is ready to be merged.
…of fake errors. Now testing with half the loops in the new configuration.
…otal loops in this test with new macros
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/common/m_mpi_common.fpp (1)
760-906: Complete migration to new-style GPU macros within the buffer-packing subroutine. Within the same subroutine (lines 760-906), different MPI directions use inconsistent macro styles:
- mpi_dir == 1 (lines 760-771): new-style $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP()
- mpi_dir == 2 (line 840): OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall GPU_PARALLEL_LOOP_OLD
- mpi_dir == 3 (lines 858-906): OLD-style macros
This mixed usage contradicts the PR's objective to uniformly replace call-style macros with start/end directives. The inconsistency creates maintenance burden and undermines the goal of accurate line-number mapping.
Migrate the remaining OLD-style macros at lines 840–855 and 858–906 to the new style for consistency.
♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)
774-788: q loop variable still missing from private clause (previously flagged). The GPU_PARALLEL_LOOP on line 774 omits q from its private list despite using it as a loop iterator on line 779. This was flagged in a previous review but remains unaddressed. Apply this fix:
- $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
+ $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]')
🧹 Nitpick comments (1)
src/common/m_mpi_common.fpp (1)
961-1130: Unpacking sections uniformly use OLD-style macros while packing is mixed. All buffer-unpacking loops (lines 961-1130) across all three MPI directions consistently use the OLD-style #:call GPU_PARALLEL_LOOP_OLD() / #:endcall macros. This contrasts with the packing sections (lines 760-906), which mix new-style and OLD-style. While internal consistency within unpacking is good, the asymmetry between packing and unpacking sections suggests incomplete migration. Consider migrating unpacking to the new style in a follow-up to achieve uniform line-number mapping across the entire subroutine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/common/m_mpi_common.fpp(15 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Self Hosted (cpu, none, gt)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Self Hosted (gpu, omp, frontier)
- GitHub Check: Self Hosted (cpu, none, frontier)
- GitHub Check: Self Hosted (gpu, acc, gt)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
…two more from that list for this test.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/common/m_mpi_common.fpp (4)
874-906: Addqtoprivateclause in qbmm packing loops (mpi_dir == 3, OLD-style macros).Both qbmm GPU pack loops for
mpi_dir == 3use the OLD-style macro withprivate='[r]'but iterate overq(lines 879 and 896), creating the same data race as the other mpi_dir branches.Apply this diff to add
qto both private clauses:- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, buff_size - 1 do k = -buff_size, n + buff_sizeAlso applies to line 891:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, buff_size - 1 do k = -buff_size, n + buff_size
982-1012: Addqtoprivateclause in qbmm unpacking loops (mpi_dir == 1, OLD-style macros).Both qbmm unpack loops for
mpi_dir == 1use OLD-style macros withprivate='[r]'but iterate overq(lines 987 and 1003), creating a data race during buffer unpacking.Apply this diff to add
qto both private clauses:- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do l = 0, p do k = 0, n do j = -buff_size, -1Also applies to line 998:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do l = 0, p do k = 0, n do j = -buff_size, -1
1037-1069: Addqtoprivateclause in qbmm unpacking loops (mpi_dir == 2, OLD-style macros).Both qbmm unpack loops for
mpi_dir == 2iterate overq(lines 1042 and 1059) but only include[r]in their private clauses, causing a data race.Apply this diff to add
qto both private clauses:- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, p do k = -buff_size, -1Also applies to line 1054:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, p do k = -buff_size, -1
1096-1130: Addqtoprivateclause in qbmm unpacking loops (mpi_dir == 3, OLD-style macros).Both qbmm unpack loops for
mpi_dir == 3iterate overq(lines 1101 and 1119) but only include[r]in their private clauses, causing a data race.Apply this diff to add
qto both private clauses:- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = -buff_size, -1 do k = -buff_size, n + buff_sizeAlso applies to line 1114:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = -buff_size, -1 do k = -buff_size, n + buff_size
♻️ Duplicate comments (1)
src/common/m_mpi_common.fpp (1)
774-804: Addqtoprivateclause in qbmm packing loops (mpi_dir == 1).Both qbmm GPU pack loops for
mpi_dir == 1haveprivate='[r]'but the innermost loop iterates overq(line 779 and 795). Sinceqis declared in the subroutine scope (line 688) and not marked private, this creates a data race where multiple parallel iterations may conflict on the sharedqvariable, leading to incorrect buffer packing.Apply this diff to add
qto both private clauses:- $:GPU_PARALLEL_LOOP(collapse=4,private='[r]') + $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]') do l = 0, p do k = 0, n do j = 0, buff_size - 1Also applies to line 790:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') + $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]') do l = 0, p do k = 0, n do j = 0, buff_size - 1
🧹 Nitpick comments (1)
src/common/m_mpi_common.fpp (1)
760-1130: Consider completing macro migration for consistency.The file mixes new-style
$:GPU_PARALLEL_LOOP()/$:END_GPU_PARALLEL_LOOP()macros (lines 760-771, 774-788, 790-804, 807-820, 823-838, 840-855, 858-871) with OLD-style#:call GPU_PARALLEL_LOOP_OLD()/#:endcall GPU_PARALLEL_LOOP_OLDmacros (lines 874-906 formpi_dir == 3packing and lines 961-1130 for all unpacking).Given your comments about encountering compiler segfaults and intending additional debugging before merging these MPI changes, this partial migration appears intentional. Once debugging is complete, consider migrating all remaining OLD-style macros to the new style for consistency with the rest of the PR.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/common/m_mpi_common.fpp(13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Self Hosted (cpu, none, frontier)
- GitHub Check: Self Hosted (cpu, none, gt)
- GitHub Check: Self Hosted (gpu, omp, frontier)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Self Hosted (gpu, acc, gt)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
| $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') | ||
| do i = nVar + 1, nVar + 4 | ||
| do l = 0, p | ||
| do k = 0, buff_size - 1 | ||
| do j = -buff_size, m + buff_size | ||
| do q = 1, nb | ||
| r = (i - 1) + (q - 1)*4 + v_size* & | ||
| ((j + buff_size) + (m + 2*buff_size + 1)* & | ||
| (k + buff_size*l)) | ||
| buff_send(r) = real(pb_in(j, k + pack_offset, l, i - nVar, q), kind=wp) | ||
| end do | ||
| end do | ||
| end do | ||
| end do | ||
| #:endcall GPU_PARALLEL_LOOP | ||
| end do | ||
| $:END_GPU_PARALLEL_LOOP() | ||
| #:call GPU_PARALLEL_LOOP(collapse=5,private='[r]') | ||
| do i = nVar + 1, nVar + 4 | ||
| do l = 0, p | ||
| do k = 0, buff_size - 1 | ||
| do j = -buff_size, m + buff_size | ||
| do q = 1, nb | ||
| r = (i - 1) + (q - 1)*4 + nb*4 + v_size* & | ||
| ((j + buff_size) + (m + 2*buff_size + 1)* & | ||
| (k + buff_size*l)) | ||
| buff_send(r) = real(mv_in(j, k + pack_offset, l, i - nVar, q), kind=wp) | ||
| end do | ||
| $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') | ||
| do i = nVar + 1, nVar + 4 | ||
| do l = 0, p | ||
| do k = 0, buff_size - 1 | ||
| do j = -buff_size, m + buff_size | ||
| do q = 1, nb | ||
| r = (i - 1) + (q - 1)*4 + nb*4 + v_size* & | ||
| ((j + buff_size) + (m + 2*buff_size + 1)* & | ||
| (k + buff_size*l)) | ||
| buff_send(r) = real(mv_in(j, k + pack_offset, l, i - nVar, q), kind=wp) | ||
| end do | ||
| end do | ||
| end do | ||
| end do | ||
| #:endcall GPU_PARALLEL_LOOP | ||
| end do | ||
| $:END_GPU_PARALLEL_LOOP() |
Add q to private clause in qbmm packing loops (mpi_dir == 2).
Both qbmm GPU pack loops for mpi_dir == 2 iterate over q (lines 828 and 845) but only include [r] in their private clauses. This creates the same data race as in the mpi_dir == 1 case above.
Apply this diff to add q to both private clauses:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+ $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
do i = nVar + 1, nVar + 4
do l = 0, p
do k = 0, buff_size - 1Also applies to line 840:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
+ $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
do i = nVar + 1, nVar + 4
do l = 0, p
do k = 0, buff_size - 1Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 823 to 855, the two
GPU_PARALLEL_LOOP directives for the qbmm packing loops (mpi_dir == 2) only
declare [r] as private but the loops also iterate over q, causing a data race;
update both GPU_PARALLEL_LOOP(private='[r]') clauses to include q (e.g.,
private='[r,q]') for each of the two loops (also adjust the similar private
clause at the nearby line ~840) so that q is private within the parallel region.
…ng loops or both. Changing one of them now.
…n it is line 891-906 in m_mpi_common.fpp. I will then have to figure out why that causes a seg fault.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/common/m_mpi_common.fpp (2)
823-855: Addqto private clauses in both qbmm packing loops (mpi_dir == 2).Both qbmm GPU packing loops for
mpi_dir == 2iterate overq(lines 828 and 845) but only include[r]in their private clauses. The loop indexqmust be declared private to avoid potential data races.Apply this diff to add
qto both private clauses:- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') + $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, p do k = 0, buff_size - 1 do j = -buff_size, m + buff_size do q = 1, nbAlso at line 840:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') + $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, p do k = 0, buff_size - 1
774-804: Addqto private clauses in both qbmm packing loops (mpi_dir == 1).Both qbmm GPU packing loops for
mpi_dir == 1iterate overq(lines 779 and 795) but only include[r]in their private clauses. The loop indexqmust be declared private to avoid potential data races.Apply this diff to add
qto both private clauses:- $:GPU_PARALLEL_LOOP(collapse=4,private='[r]') + $:GPU_PARALLEL_LOOP(collapse=4,private='[q,r]') do l = 0, p do k = 0, n do j = 0, buff_size - 1 do i = nVar + 1, nVar + 4 do q = 1, nbAlso at line 790:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') + $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]') do l = 0, p do k = 0, n do j = 0, buff_size - 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/common/m_mpi_common.fpp(11 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pretty
src/common/m_mpi_common.fpp
[error] 896-896: MFC formatting changed the file. Process completed with exit code 1 after diff in src/common/m_mpi_common.fpp.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Self Hosted (gpu, omp, frontier)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Self Hosted (cpu, none, frontier)
- GitHub Check: Self Hosted (gpu, acc, gt)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Self Hosted (cpu, none, gt)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
| $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') | ||
| do i = nVar + 1, nVar + 4 | ||
| do l = 0, buff_size - 1 | ||
| do k = -buff_size, n + buff_size | ||
| do j = -buff_size, m + buff_size | ||
| do q = 1, nb | ||
| r = (i - 1) + (q - 1)*4 + v_size* & | ||
| ((j + buff_size) + (m + 2*buff_size + 1)* & | ||
| ((k + buff_size) + (n + 2*buff_size + 1)*l)) | ||
| buff_send(r) = real(pb_in(j, k, l + pack_offset, i - nVar, q), kind=wp) | ||
| end do | ||
| end do | ||
| end do | ||
| end do | ||
| #:endcall GPU_PARALLEL_LOOP | ||
| end do | ||
| $:END_GPU_PARALLEL_LOOP() | ||
| #:call GPU_PARALLEL_LOOP(collapse=5,private='[r]') | ||
| do i = nVar + 1, nVar + 4 | ||
| do l = 0, buff_size - 1 | ||
| do k = -buff_size, n + buff_size | ||
| do j = -buff_size, m + buff_size | ||
| do q = 1, nb | ||
| r = (i - 1) + (q - 1)*4 + nb*4 + v_size* & | ||
| ((j + buff_size) + (m + 2*buff_size + 1)* & | ||
| ((k + buff_size) + (n + 2*buff_size + 1)*l)) | ||
| buff_send(r) = real(mv_in(j, k, l + pack_offset, i - nVar, q), kind=wp) | ||
| end do | ||
| $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') | ||
| do i = nVar + 1, nVar + 4 | ||
| do l = 0, buff_size - 1 | ||
| do k = -buff_size, n + buff_size | ||
| do j = -buff_size, m + buff_size | ||
| do q = 1, nb | ||
| r = (i - 1) + (q - 1)*4 + nb*4 + v_size* & | ||
| ((j + buff_size) + (m + 2*buff_size + 1)* & | ||
| ((k + buff_size) + (n + 2*buff_size + 1)*l)) | ||
| buff_send(r) = real(mv_in(j, k, l + pack_offset, i - nVar, q), kind=wp) | ||
| end do | ||
| end do | ||
| end do | ||
| end do | ||
| #:endcall GPU_PARALLEL_LOOP | ||
| end do | ||
| $:END_GPU_PARALLEL_LOOP() |
Add q to private clauses in both qbmm packing loops (mpi_dir == 3).
Both qbmm GPU packing loops for mpi_dir == 3 iterate over q (lines 879 and 896) but only include [r] in their private clauses. The loop index q must be declared private to avoid potential data races.
Apply this diff to add q to both private clauses:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+ $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
do i = nVar + 1, nVar + 4
do l = 0, buff_size - 1
do k = -buff_size, n + buff_size
do j = -buff_size, m + buff_size
do q = 1, nbAlso at line 891:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+ $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
do i = nVar + 1, nVar + 4
do l = 0, buff_size - 1
do k = -buff_size, n + buff_size🧰 Tools
🪛 GitHub Actions: Pretty
[error] 896-896: MFC formatting changed the file. Process completed with exit code 1 after diff in src/common/m_mpi_common.fpp.
🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 874 to 906, both GPU_PARALLEL_LOOP
pragmas for the qbmm packing path (mpi_dir == 3) currently declare only r as
private but the inner loop also iterates over q; declare q private as well to
prevent data races by updating the private clauses to include q (e.g.,
private='[r,q]') for both loops spanning the blocks starting at ~line 876 and
~line 892.
…ll be curious if the format change does anything.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/common/m_mpi_common.fpp (6)
982-996: Addqtoprivateclause in qbmm pb_in unpacking loop (mpi_dir == 1).The qbmm GPU unpacking loop for
pb_inwhenmpi_dir == 1iterates overq(line 987) but only includes[r]in the private clause at line 961's OLD-style wrapper. Variableqmust be declared private to avoid potential data races.The private clause is set at the
GPU_PARALLEL_LOOP_OLDinvocation. Update it to includeq:- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do l = 0, p do k = 0, n do j = -buff_size, -1
998-1012: Addqtoprivateclause in qbmm mv_in unpacking loop (mpi_dir == 1).The qbmm GPU unpacking loop for
mv_inwhenmpi_dir == 1iterates overq(line 1003) but only includes[r]in the private clause. Variableqmust be declared private to avoid potential data races.Apply this diff:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do l = 0, p do k = 0, n do j = -buff_size, -1
1037-1052: Addqtoprivateclause in qbmm pb_in unpacking loop (mpi_dir == 2).The qbmm GPU unpacking loop for
pb_inwhenmpi_dir == 2iterates overq(line 1042) but only includes[r]in the private clause. Variableqmust be declared private to avoid potential data races.Apply this diff:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, p do k = -buff_size, -1
1054-1069: Addqtoprivateclause in qbmm mv_in unpacking loop (mpi_dir == 2).The qbmm GPU unpacking loop for
mv_inwhenmpi_dir == 2iterates overq(line 1059) but only includes[r]in the private clause. Variableqmust be declared private to avoid potential data races.Apply this diff:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = 0, p do k = -buff_size, -1
1096-1112: Addqtoprivateclause in qbmm pb_in unpacking loop (mpi_dir == 3).The qbmm GPU unpacking loop for
pb_inwhenmpi_dir == 3iterates overq(line 1101) but only includes[r]in the private clause. Variableqmust be declared private to avoid potential data races.Apply this diff:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = -buff_size, -1 do k = -buff_size, n + buff_size
1114-1130: Addqtoprivateclause in qbmm mv_in unpacking loop (mpi_dir == 3).The qbmm GPU unpacking loop for
mv_inwhenmpi_dir == 3iterates overq(line 1119) but only includes[r]in the private clause. Variableqmust be declared private to avoid potential data races.Apply this diff:
- #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[r]') + #:call GPU_PARALLEL_LOOP_OLD(collapse=5,private='[q,r]') do i = nVar + 1, nVar + 4 do l = -buff_size, -1 do k = -buff_size, n + buff_size
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/common/m_mpi_common.fpp(11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Build & Publish
| $:GPU_PARALLEL_LOOP(collapse=5,private='[r]') | ||
| do l = 0, p | ||
| do k = 0, n | ||
| do j = 0, buff_size - 1 | ||
| do i = nVar + 1, nVar + 4 | ||
| do q = 1, nb | ||
| r = (i - 1) + (q - 1)*4 + nb*4 + v_size* & | ||
| (j + buff_size*(k + (n + 1)*l)) | ||
| buff_send(r) = real(mv_in(j + pack_offset, k, l, i - nVar, q), kind=wp) | ||
| end do | ||
| end do | ||
| end do | ||
| end do | ||
| #:endcall GPU_PARALLEL_LOOP | ||
| end do | ||
| $:END_GPU_PARALLEL_LOOP() |
Add q to private clause in qbmm mv_in packing loop (mpi_dir == 1).
The second qbmm GPU packing loop for mpi_dir == 1 iterates over q (line 795) but only includes [r] in the private clause. This creates a potential data race on q.
Apply this diff:
- $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
+ $:GPU_PARALLEL_LOOP(collapse=5,private='[q,r]')
do l = 0, p
do k = 0, n
do j = 0, buff_size - 1🤖 Prompt for AI Agents
In src/common/m_mpi_common.fpp around lines 790 to 804, the GPU_PARALLEL_LOOP
for the qbmm mv_in packing (mpi_dir == 1) declares private='[r]' but the loop
also uses loop variable q, which can cause a data race; update the private
clause to include q as well (e.g., private='[r,q]') so q is private per GPU
thread and no race occurs, preserving the existing loop structure and
indentation.
…VHOPC openACC, which must have been overwritten
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/simulation/m_bubbles_EL_kernels.fpp (2)
58-93: Incomplete private clause causes data races.The private clause only includes
l,s_coord, andcell, but the loop body also modifiesvolpart,strength_vol,strength_vel,Vol,addFun1,addFun2, andaddFun3. Without privatizing these temporaries, concurrent threads will race on shared variables, producing nondeterministic results.Apply this diff to add missing variables to the private clause:
- $:GPU_PARALLEL_LOOP(private='[l,s_coord,cell]') + $:GPU_PARALLEL_LOOP(private='[l,s_coord,cell,volpart,strength_vol,strength_vel,Vol,addFun1,addFun2,addFun3]')
124-199: Incomplete private clause causes data races.The private clause only includes
nodecoord,l,s_coord,cell, andcenter, but the loop body also uses per-bubble temporariesvolpart,strength_vol,strength_vel,stddsvand per-cell temporariesfunc,func2,addFun1,addFun2,addFun3,cellaux, andcelloutside. Without privatizing these variables in the outer parallel loop, concurrent threads will race on shared storage.Apply this diff to add missing variables to the private clause:
- $:GPU_PARALLEL_LOOP(private='[nodecoord,l,s_coord,cell,center]', copyin='[smearGrid,smearGridz]') + $:GPU_PARALLEL_LOOP(private='[nodecoord,l,s_coord,cell,center,volpart,strength_vol,strength_vel,stddsv,func,func2,addFun1,addFun2,addFun3,cellaux,celloutside]', copyin='[smearGrid,smearGridz]')
🧹 Nitpick comments (1)
src/simulation/m_bubbles_EL.fpp (1)
1516-1529: Void-fraction statistics reduction is correctly expressedThe
$:GPU_PARALLEL_LOOPins_write_void_evoluses a grouped reduction[[lag_vol, lag_void_avg], [lag_void_max]]with operators[+, MAX], which matches the accumulation pattern inside the loop. Privatizingvolcellavoids reuse across iterations, and the prior initialization of the reduced variables ensures a well-defined starting point.If you ever touch this block again, you might consider switching the literal
5.0d-11to awp-kind literal (e.g.,5.0e-11_wp) for consistency with the rest of the file, but it’s not functionally urgent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/simulation/m_bubbles_EL.fpp(10 hunks)src/simulation/m_bubbles_EL_kernels.fpp(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Self Hosted (cpu, none, frontier)
- GitHub Check: Self Hosted (gpu, acc, gt)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Self Hosted (gpu, acc, frontier)
- GitHub Check: Self Hosted (cpu, none, gt)
- GitHub Check: Self Hosted (gpu, omp, frontier)
- GitHub Check: Self Hosted (gpu, omp, gt)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Code Cleanliness Check
- GitHub Check: Build & Publish
🔇 Additional comments (8)
src/simulation/m_bubbles_EL_kernels.fpp (1)
384-384: No breaking changes detected within the codebase.The search found no calls to
s_get_char_volanywhere in the repository, and the modulem_bubbles_EL_kernelsis not imported elsewhere. Removing theelementalkeyword will not break any existing code within this codebase.However, if this function is part of a public library API consumed by external projects, those external callers may have elemental calls that would now fail. Please verify whether this module is intended as a public API or an internal implementation detail.
src/simulation/m_bubbles_EL.fpp (7)
598-615: GPU_PARALLEL_LOOP refactor ins_compute_bubble_EL_dynamicslooks soundThe new
$:GPU_PARALLEL_LOOPblocks correctly privatize loop indices and per‑iteration temporaries (k,i,cell,myalpha_rho,myalpha,Re) and use aMAXreduction foradap_dt_stop_maxconsistent with the scalar update pattern. I don’t see new data races or scope issues introduced by this macro restructuring, and the addedcopyin='[stage]'is appropriate.Also applies to: 620-686, 692-699
723-738: Source-term GPU loops have consistent scoping and no obvious racesIn
s_compute_bubbles_EL_source, the converted$:GPU_PARALLEL_LOOPregions (withcollapseand explicitprivate='[i,j,k,l]') align with the nested loop structure, and each iteration writes to a unique(l,i,j,k)or(i,j,k)location inrhs_vf/q_beta. The absence of reductions here is correct, and I don’t see any new parallel write conflicts introduced by the macro changes.Also applies to: 740-754, 762-775, 779-787, 792-804
849-859: Void-fraction smearing loops remain correct under new macrosThe two
$:GPU_PARALLEL_LOOPusages ins_smear_voidfractionproperly privatize indices and cover all(i,j,k,l)combinations once, so zeroingq_betaand then applying the1 - beta/clamping transform is still race‑free and logically unchanged by the refactor.Also applies to: 866-877
1107-1117: Time-stepper update kernels preserve per-bubble independenceIn
s_update_lagrange_tdv_rk, all RK1/RK2/RK3 update loops now use$:GPU_PARALLEL_LOOP(private='[k]')overk = 1:nBubs, with each iteration only touching state for thatk. This keeps the original per‑bubble independence, so introducing the explicit start/end GPU macros does not add races or change the update ordering semantics.Also applies to: 1131-1141, 1144-1154, 1168-1179, 1182-1192, 1193-1204
1280-1292: Temporary-state transfer loop is a good fit for GPU_PARALLEL_LOOPThe
$:GPU_PARALLEL_LOOP(private='[k]')arounds_transfer_data_to_tmpcorrectly parallelizes a pure per‑bubble copy of state from slot 1 → 2. There are no cross‑iteration dependencies, so this is safe and matches the intended data movement.
1378-1391: Gradient kernels: GPU loop restructuring keeps stencil semantics intactFor all three directions in
s_gradient_dir, the new$:GPU_PARALLEL_LOOP(private='[i,j,k]', collapse=3)wrappers preserve the central‑difference stencil (neighbors at±1along the active axis) and only updatedq(i,j,k)per iteration. Given the existing ghost‑cell layout, this macro refactor does not introduce boundary or race issues.Also applies to: 1394-1406, 1409-1421
1712-1721: Bubble-radius stats reduction matches update logicIn
s_calculate_lag_bubble_stats, the$:GPU_PARALLEL_LOOPwithreduction='[[Rmax_glb], [Rmin_glb]]'andreductionOp='[MAX, MIN]'matches themax/minusage, and eachkiteration only writes toRmax_stats(k)/Rmin_stats(k). This keeps global and per‑bubble statistics consistent without adding races.
wilfonba
left a comment
benchmark
@anandrdbz I think that you told me that I should expect benchmark to fail because the master benchmark is not currently working. Can you comment if I am remembering this correctly or not? |
@sbryngelson My code is finally rebased and passing regression tests, but failing benchmark. The last PR that I can find that even ran the benchmark tests is this one: 8a66941. That PR fails benchmark. I think I recall Anand telling me that the benchmark would fail; I asked him to comment to provide clarity on it. I think this PR is actually as complete as it can be, and we just expect the benchmarking to fail. My preference would be to merge this sooner if we can, because every PR that comes in causes a significant amount of debugging and rebasing on code I am unfamiliar with. I don't know if it is efficient for me to keep updating this PR until we have a fix to the test suite. I am curious if you want to wait for benchmarking to return or if we can merge. What do you think? Edit: Anand confirmed over Slack that the NVHPC bench is expected to fail because of issues with OpenMP on it. The rest appear to fail due to being canceled because the NVHPC bench test fails.
@sbryngelson this merge passed all the benchmarks except for the broken one. Do you think we can just merge it now and work on the NVHPC issue next?
User description
Description
I have refactored the macros to separate out the OpenMP and OpenACC macros into two separate macros: one for starting a parallel loop, GPU_PARALLEL_LOOP, and one for ending a parallel region, END_GPU_PARALLEL_LOOP. This allows the preprocessor to only insert new lines at the start and end of GPU parallel loops, which means line markers can now progress naturally instead of all lines inside a loop having the same line number. I have also adjusted the macro files to remove the additional warnings that were being printed with every single file that has GPU macros.
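As a concrete before-and-after of the usage pattern (a minimal sketch assembled from the diffs in this PR; the loop bounds, the loop body, and the array name q_vf are placeholders):

! Before: the loop body is passed through the call-style macro, so every
! generated line collapses onto the macro's line number.
#:call GPU_PARALLEL_LOOP(collapse=3, private='[j,k,l]')
    do l = 0, p
        do k = 0, n
            do j = 0, m
                q_vf(j, k, l) = 0._wp
            end do
        end do
    end do
#:endcall GPU_PARALLEL_LOOP

! After: only the opening and closing directives come from macros, so the
! loop body keeps its own line numbers in compiler diagnostics.
$:GPU_PARALLEL_LOOP(collapse=3, private='[j,k,l]')
do l = 0, p
    do k = 0, n
        do j = 0, m
            q_vf(j, k, l) = 0._wp
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()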
The result is that this eliminates over half of the warnings that are being printed when the code compiles, and the line numbers for errors are correct. I have tested this locally and found that the line numbers reported are correct, at least in regions that I checked. The result should be a much improved developer experience.
Fixes #1028
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
I compiled locally and saw that the warnings were removed. I also tried making intentional changes and saw that the correct line numbers were reported.
Test Configuration:
Checklist
./mfc.sh format before committing my code
If your code changes any code source files (anything in src/simulation):
To make sure the code is performing as expected on GPU devices, I have:
PR Type
Enhancement
Description
Refactored GPU parallel loop macros from
#:call GPU_PARALLEL_LOOP()/#:endcall GPU_PARALLEL_LOOP syntax to $:GPU_PARALLEL_LOOP()/$:END_GPU_PARALLEL_LOOP() syntax
OMP_PARALLEL_LOOP/END_OMP_PARALLEL_LOOPandACC_PARALLEL_LOOP/END_GPU_PARALLEL_LOOP)Added explicit
privateclause parameters to all GPU parallel loop macros to specify privatized loop variablesModified macro definitions to emit only opening directives, allowing loop code to follow separately for improved line numbering accuracy
Applied changes consistently across multiple simulation and common modules including viscous stress, RHS, pressure relaxation, CBC, levelset computations, and MPI common utilities
Adjusted indentation of loop bodies to align with new macro structure
Removed embedded code execution from macro implementations to allow line markers to progress naturally
Result eliminates over half of compilation warnings and provides correct line numbers for error reporting
Diagram Walkthrough
File Walkthrough
8 files
m_viscous.fpp
Refactor GPU parallel loop macros for improved line numberingsrc/simulation/m_viscous.fpp
#:call GPU_PARALLEL_LOOP()/#:endcall GPU_PARALLEL_LOOPto$:GPU_PARALLEL_LOOP()/$:END_GPU_PARALLEL_LOOP()syntaxprivateparameter declarations to allGPU_PARALLEL_LOOPmacro calls to specify loop variables
syntax requirements
s_compute_viscous_stress_tensorand gradient computation functionsacc_macros.fpp
Separate ACC parallel loop opening directive from codesrc/common/include/acc_macros.fpp
ACC_PARALLEL_LOOPmacro definition to removecodeparameterand
acc_end_directivefrom macro bodyfollow separately
implementation
m_rhs.fpp
Refactor GPU loop macros with explicit private variablessrc/simulation/m_rhs.fpp
#:call GPU_PARALLEL_LOOP()and
#:endcall GPU_PARALLEL_LOOPto$:GPU_PARALLEL_LOOP()and$:END_GPU_PARALLEL_LOOP()syntaxprivateclause to allGPU_PARALLEL_LOOPmacrosspecifying loop variables to be privatized
(removed extra indentation level)
throughout the file
m_mpi_common.fpp
Update GPU macros with explicit private variable declarationssrc/common/m_mpi_common.fpp
#:call GPU_PARALLEL_LOOP()to$:GPU_PARALLEL_LOOP()syntaxprivateclause listing all loop variables (i,j,k,l,q,r) to GPU parallel regions#:endcall GPU_PARALLEL_LOOPto$:END_GPU_PARALLEL_LOOP()m_pressure_relaxation.fpp
Refactor GPU parallel loop macros with private variablessrc/simulation/m_pressure_relaxation.fpp
#:call GPU_PARALLEL_LOOP()with$:GPU_PARALLEL_LOOP()macrosyntax
private='[j,k,l]'clause to specify privatizedvariables
#:endcall GPU_PARALLEL_LOOPto$:END_GPU_PARALLEL_LOOP()m_cbc.fpp
Refactor GPU loop macros to separate start and end directivessrc/simulation/m_cbc.fpp
#:call GPU_PARALLEL_LOOP()/#:endcall GPU_PARALLEL_LOOPto$:GPU_PARALLEL_LOOP()/$:END_GPU_PARALLEL_LOOP()syntaxprivateclause parameters to allGPU_PARALLEL_LOOPcalls specifying loop variables
removing extra indentation from the old call-based approach
flux reshaping, and output operations
m_compute_levelset.fpp
Refactor GPU parallel loop macros in levelset computationssrc/common/m_compute_levelset.fpp
#:call GPU_PARALLEL_LOOP()/#:endcall GPU_PARALLEL_LOOPinvocations to
$:GPU_PARALLEL_LOOP()/$:END_GPU_PARALLEL_LOOP()syntax
privateclause with loop variable lists to all GPUparallel loop macros
structure
airfoil, 3D airfoil, rectangle, cuboid, sphere, cylinder)
omp_macros.fpp
Split GPU parallel loop macro into separate start and end macrossrc/common/include/omp_macros.fpp
OMP_PARALLEL_LOOPmacro into two parts: start directivegeneration and end directive generation
codeparameter fromOMP_PARALLEL_LOOPmacro definitionEND_OMP_PARALLEL_LOOPmacro to emit appropriate enddirectives based on compiler type
$:codeexecution fromOMP_PARALLEL_LOOP, allowing loopbodies to be written directly in code
1 files
shared_parallel_macros.fpp
Add required newline at end of filesrc/common/include/shared_parallel_macros.fpp
27 files
CodeAnt-AI Description
Restore accurate line numbers for GPU loops
What Changed
Impact
✅ Clearer GPU error line numbers✅ Fewer GPU build warnings✅ Consistent diagnostics across viscous and MPI kernels💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.
Summary by CodeRabbit
New Features
Documentation
Refactor