@jiez jiez commented Nov 26, 2025

For GMAC4, when split header is enabled, in some rare cases, the hardware does not fill buf2 of the first descriptor with payload. Thus we cannot assume buf2 is always fully filled if it is not the last descriptor. Otherwise, the length of buf2 of the second descriptor will be calculated wrong and cause an oops:

Unable to handle kernel paging request at virtual address ffff00019246bfc0
Mem abort info:
  ESR = 0x0000000096000145
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
  CM = 1, WnR = 1, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000090d8b000
[ffff00019246bfc0] pgd=180000009dfff403, p4d=180000009dfff403, pud=0000000000000000
Internal error: Oops: 0000000096000145 [#1]  SMP
Modules linked in:
CPU: 0 UID: 0 PID: 157 Comm: iperf3 Not tainted 6.18.0-rc6 #1 PREEMPT
Hardware name: ADI 64-bit SC598 SOM EZ Kit (DT)
pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dcache_inval_poc+0x28/0x58
lr : arch_sync_dma_for_cpu+0x28/0x34
sp : ffff800080dcbc40
x29: ffff800080dcbc40 x28: 0000000000000008 x27: ffff000091c50980
x26: ffff000091c50980 x25: 0000000000000000 x24: ffff000092a5fb00
x23: ffff000092768f28 x22: 000000009246c000 x21: 0000000000000002
x20: 00000000ffffffdc x19: ffff000091844c10 x18: 0000000000000000
x17: ffff80001d308000 x16: ffff800080dc8000 x15: ffff0000929fb034
x14: 70f709157374dd21 x13: ffff000092812ec0 x12: 0000000000000000
x11: 000000000000dd86 x10: 0000000000000040 x9 : 0000000000000600
x8 : ffff000092a5fbac x7 : 0000000000000001 x6 : 0000000000004240
x5 : 000000009246c000 x4 : ffff000091844c10 x3 : 000000000000003f
x2 : 0000000000000040 x1 : ffff00019246bfc0 x0 : ffff00009246c000
Call trace:
 dcache_inval_poc+0x28/0x58 (P)
 dma_direct_sync_single_for_cpu+0x38/0x6c
 __dma_sync_single_for_cpu+0x34/0x6c
 stmmac_napi_poll_rx+0x8f0/0xb60
 __napi_poll.constprop.0+0x30/0x144
 net_rx_action+0x160/0x274
 handle_softirqs+0x1b8/0x1fc
 __do_softirq+0x10/0x18
 ____do_softirq+0xc/0x14
 call_on_irq_stack+0x30/0x48
 do_softirq_own_stack+0x18/0x20
 __irq_exit_rcu+0x64/0xe8
 irq_exit_rcu+0xc/0x14
 el1_interrupt+0x3c/0x58
 el1h_64_irq_handler+0x14/0x1c
 el1h_64_irq+0x6c/0x70
 __arch_copy_to_user+0xbc/0x240 (P)
 simple_copy_to_iter+0x28/0x30
 __skb_datagram_iter+0x1bc/0x268
 skb_copy_datagram_iter+0x1c/0x24
 tcp_recvmsg_locked+0x3ec/0x778
 tcp_recvmsg+0x10c/0x194
 inet_recvmsg+0x64/0xa0
 sock_recvmsg_nosec+0x1c/0x24
 sock_read_iter+0x8c/0xdc
 vfs_read+0x144/0x1a0
 ksys_read+0x74/0xdc
 __arm64_sys_read+0x14/0x1c
 invoke_syscall+0x60/0xe4
 el0_svc_common.constprop.0+0xb0/0xcc
 do_el0_svc+0x18/0x20
 el0_svc+0x80/0xc8
 el0t_64_sync_handler+0x58/0x134
 el0t_64_sync+0x170/0x174
Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
Kernel Offset: disabled
CPU features: 0x080000,00008000,08006281,0400520b
Memory Limit: none
---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
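
As a sanity check on the trace, the faulting address can be reproduced from the register dump. This is only a sketch under assumptions that the trace does not state outright: that x0 is the start of the buffer being synced, that x20 is the 32-bit sync length (0xffffffdc, which reads as -36 when treated as signed), and that the cache line is 64 bytes (x2 = 0x40). The helper and constant names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: end address of a cache-invalidate range, rounded
 * down to a cache line boundary. "line" must be a power of two.
 */
static uint64_t inval_end_aligned(uint64_t start, uint32_t len, uint64_t line)
{
	uint64_t end = start + (uint64_t)len;

	return end & ~(line - 1);
}

/* Values copied from the register dump above. */
static const uint64_t oops_x0  = 0xffff00009246c000ULL; /* assumed: buffer start */
static const uint32_t oops_x20 = 0xffffffdcU;           /* assumed: sync length  */
static const uint64_t oops_far = 0xffff00019246bfc0ULL; /* faulting address      */
```

Under these assumptions, `inval_end_aligned(oops_x0, oops_x20, 64)` equals the faulting address, which is consistent with a small negative length having wrapped around to a huge unsigned value before the DMA sync.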

To fix this, the PL bit-field in RDES3 register is used for all descriptors, whether it is the last descriptor or not.
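
The difference between the two length calculations can be sketched with a small user-space model. This is not the driver code: the struct, the flag, and the function names are hypothetical, and it assumes that PL in a non-last descriptor reports the number of packet bytes written so far. The header, payload, and buffer sizes used in the note below are illustrative values, not taken from the patch.

```c
#define RX_NOT_LS 0x1	/* hypothetical flag: descriptor is not the last one */

/* Simplified view of one RX descriptor as seen by the driver (hypothetical). */
struct rx_desc_model {
	int status;		/* RX_NOT_LS or 0 */
	unsigned int pl;	/* assumed: packet bytes written so far */
};

/* Before: a non-last descriptor is assumed to have a completely full buf2. */
static unsigned int buf2_len_assume_full(const struct rx_desc_model *d,
					 unsigned int len, unsigned int buf_sz)
{
	if (d->status & RX_NOT_LS)
		return buf_sz;
	return d->pl - len;
}

/* After: the buf2 length is always derived from PL. */
static unsigned int buf2_len_from_pl(const struct rx_desc_model *d,
				     unsigned int len)
{
	return d->pl - len;
}

/*
 * Walk a two-descriptor packet in the rare case described above: the first
 * descriptor carries only the header in buf1 and its buf2 is left empty.
 * Returns the buf2 length computed for the second (last) descriptor.
 */
static unsigned int second_buf2_len(int use_pl, unsigned int hdr,
				    unsigned int payload, unsigned int buf_sz)
{
	struct rx_desc_model first = { RX_NOT_LS, hdr };
	struct rx_desc_model last = { 0, hdr + payload };
	unsigned int len = hdr;	/* bytes accounted for by buf1 of "first" */

	len += use_pl ? buf2_len_from_pl(&first, len)
		      : buf2_len_assume_full(&first, len, buf_sz);

	return use_pl ? buf2_len_from_pl(&last, len)
		      : buf2_len_assume_full(&last, len, buf_sz);
}
```

With a 66-byte header, a 1500-byte payload, and 1536-byte buffers, the old calculation yields 1566 - 1602, which wraps to 0xffffffdc as an unsigned value, while the PL-based calculation yields 1500. Judging from the review snippet further down, the actual patch keeps the old assumption for non-GMAC4 cores.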

PR Type

  • Bug fix (a change that fixes an issue)
  • New feature (a change that adds new functionality)
  • Breaking change (a change that affects other repos or cause CIs to fail)

PR Checklist

  • I have conducted a self-review of my own code changes
  • I have compiled my changes, including the documentation
  • I have tested the changes on the relevant hardware
  • I have updated the documentation outside this repo accordingly
  • I have provided links for the relevant upstream lore

@pamolloy pamolloy added this to ADSP Nov 26, 2025

@nunojsa nunojsa left a comment

Just a minor nit. As for the fix itself, I don't know the IP well enough to have an opinion on it, but it looks good.

	 */
	if (!priv->plat->has_gmac4 &&
	    /* Not last descriptor */
	    (status & rx_not_ls))
Collaborator

nit: line break seems unnecessary. If I'm not mistaken, checkpatch should even complain about it

Author

The line break is there to put the "Not last descriptor" comment in the right place. checkpatch is OK with it.

Collaborator

I have gotten checkpatch complaints for things like the above before, though they might only pop up with --strict or could be a checkpatch false positive. More importantly, the line does not go over the 80-column limit, so it's very likely you will get comments from the maintainers, because the above, AFAICT, does not follow the Linux coding style.

I would rather update the comment (even though the condition is fairly obvious already).

Author

--strict does not give an error either. This is a common way to attach a comment to only one condition, and it does follow the Linux coding style. The linked code is for preventing another way of breaking conditionals across multiple lines (one that the GNU coding style requires, so it is a common mistake for people who need to switch between Linux and GNU).

But after reading the patch several times, I feel I should also add a comment for the GMAC4 condition. So I merged them into one comment and put the two conditions on one line.

I also added back the comment from the original code.

I agree they are obvious. I am just following what the original code does.
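
For readers following the style discussion, the three layouts being compared can be sketched as below. The function names and the `RX_NOT_LS` flag are hypothetical stand-ins, not the driver's code. As far as I know, only the last form (operator at the start of the continuation line) is what `checkpatch --strict` reports as a misplaced logical continuation.

```c
#include <stdbool.h>

#define RX_NOT_LS 0x1	/* hypothetical stand-in for rx_not_ls */

/*
 * Layout A: the condition is split so that a comment sits next to the one
 * operand it describes. The && stays at the end of the first line.
 */
static bool assume_full_a(bool has_gmac4, int status)
{
	if (!has_gmac4 &&
	    /* Not last descriptor */
	    (status & RX_NOT_LS))
		return true;
	return false;
}

/* Layout B: one comment covers both operands, condition on a single line. */
static bool assume_full_b(bool has_gmac4, int status)
{
	/* Not GMAC4 and not the last descriptor */
	if (!has_gmac4 && (status & RX_NOT_LS))
		return true;
	return false;
}

/* GNU-style continuation: the operator starts the continuation line. */
static bool assume_full_gnu(bool has_gmac4, int status)
{
	if (!has_gmac4
	    && (status & RX_NOT_LS))
		return true;
	return false;
}
```

All three functions compute the same result; the difference is purely one of layout.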

Author

jiez commented Nov 26, 2025

@jiez how does this compare to the original fix referenced in #2887?

https://github.com/analogdevicesinc/lnxdsp-adi-meta/blob/main/meta-adi-adsp-sc5xx/recipes-kernel/linux/linux-adi/0001-SC598-fix-stmmac-dma-split-header-crash.patch

That fix is just a hack, and it is wrong. It would compute incorrect statistics and DMA-sync bytes unnecessarily.

Collaborator

pamolloy commented Nov 26, 2025

You can submit it to mainline and see what folks say. It would be good to add a Link: https://lore.kernel.org/lkml/... trailer to your commit message here before we merge it.

For GMAC4, when split header is enabled, in some rare cases, the
hardware does not fill buf2 of the first descriptor with payload.
Thus we cannot assume buf2 is always fully filled if it is not
the last descriptor. Otherwise, the length of buf2 of the second
descriptor will be calculated wrong and cause an oops:

Unable to handle kernel paging request at virtual address ffff00019246bfc0
Mem abort info:
  ESR = 0x0000000096000145
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
  CM = 1, WnR = 1, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000090d8b000
[ffff00019246bfc0] pgd=180000009dfff403, p4d=180000009dfff403, pud=0000000000000000
Internal error: Oops: 0000000096000145 [#1]  SMP
Modules linked in:
CPU: 0 UID: 0 PID: 157 Comm: iperf3 Not tainted 6.18.0-rc6 #1 PREEMPT
Hardware name: ADI 64-bit SC598 SOM EZ Kit (DT)
pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : dcache_inval_poc+0x28/0x58
lr : arch_sync_dma_for_cpu+0x28/0x34
sp : ffff800080dcbc40
x29: ffff800080dcbc40 x28: 0000000000000008 x27: ffff000091c50980
x26: ffff000091c50980 x25: 0000000000000000 x24: ffff000092a5fb00
x23: ffff000092768f28 x22: 000000009246c000 x21: 0000000000000002
x20: 00000000ffffffdc x19: ffff000091844c10 x18: 0000000000000000
x17: ffff80001d308000 x16: ffff800080dc8000 x15: ffff0000929fb034
x14: 70f709157374dd21 x13: ffff000092812ec0 x12: 0000000000000000
x11: 000000000000dd86 x10: 0000000000000040 x9 : 0000000000000600
x8 : ffff000092a5fbac x7 : 0000000000000001 x6 : 0000000000004240
x5 : 000000009246c000 x4 : ffff000091844c10 x3 : 000000000000003f
x2 : 0000000000000040 x1 : ffff00019246bfc0 x0 : ffff00009246c000
Call trace:
 dcache_inval_poc+0x28/0x58 (P)
 dma_direct_sync_single_for_cpu+0x38/0x6c
 __dma_sync_single_for_cpu+0x34/0x6c
 stmmac_napi_poll_rx+0x8f0/0xb60
 __napi_poll.constprop.0+0x30/0x144
 net_rx_action+0x160/0x274
 handle_softirqs+0x1b8/0x1fc
 __do_softirq+0x10/0x18
 ____do_softirq+0xc/0x14
 call_on_irq_stack+0x30/0x48
 do_softirq_own_stack+0x18/0x20
 __irq_exit_rcu+0x64/0xe8
 irq_exit_rcu+0xc/0x14
 el1_interrupt+0x3c/0x58
 el1h_64_irq_handler+0x14/0x1c
 el1h_64_irq+0x6c/0x70
 __arch_copy_to_user+0xbc/0x240 (P)
 simple_copy_to_iter+0x28/0x30
 __skb_datagram_iter+0x1bc/0x268
 skb_copy_datagram_iter+0x1c/0x24
 tcp_recvmsg_locked+0x3ec/0x778
 tcp_recvmsg+0x10c/0x194
 inet_recvmsg+0x64/0xa0
 sock_recvmsg_nosec+0x1c/0x24
 sock_read_iter+0x8c/0xdc
 vfs_read+0x144/0x1a0
 ksys_read+0x74/0xdc
 __arm64_sys_read+0x14/0x1c
 invoke_syscall+0x60/0xe4
 el0_svc_common.constprop.0+0xb0/0xcc
 do_el0_svc+0x18/0x20
 el0_svc+0x80/0xc8
 el0t_64_sync_handler+0x58/0x134
 el0t_64_sync+0x170/0x174
Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
Kernel Offset: disabled
CPU features: 0x080000,00008000,08006281,0400520b
Memory Limit: none
---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

To fix this, the PL bit-field in RDES3 register is used for all
descriptors, whether it is the last descriptor or not.

Signed-off-by: Jie Zhang <jie.zhang@analog.com>

@jiez jiez force-pushed the 2887-oops-in-stmmac-driver-parsing-rx-packets branch from 8f28e7c to 15773a9 on December 2, 2025 02:24