Commit 6cf8a1f
committed
Improves attention bias numerical stability
Replaces $\exp(A\cdot\mathrm{softplus}(\Delta V))$ with $A\cdot\mathrm{softplus}(\Delta V)$ to prevent overflow/NaNs in the attention bias and stabilize training/inference.
Preserves tensor shape/dtype and adds a clarifying comment on the rationale.
1 parent 78bb93d
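The changed code itself is not shown in the diff below, so here is a minimal sketch of the before/after forms as described in the commit message. The function and tensor names (`attention_bias_old`, `attention_bias_new`, `A`, `delta_v`) are illustrative assumptions, not identifiers from the actual file.

```python
# Hedged sketch of the described change; names are assumptions, not from the real diff.
import torch
import torch.nn.functional as F

def attention_bias_old(A: torch.Tensor, delta_v: torch.Tensor) -> torch.Tensor:
    # Pre-change form: exp(A * softplus(ΔV)) overflows to inf (or propagates NaN)
    # once A * softplus(ΔV) grows large.
    return torch.exp(A * F.softplus(delta_v))

def attention_bias_new(A: torch.Tensor, delta_v: torch.Tensor) -> torch.Tensor:
    # Post-change form: dropping the outer exp keeps the bias bounded by the
    # magnitudes of A and softplus(ΔV); output shape and dtype are unchanged.
    return A * F.softplus(delta_v)

if __name__ == "__main__":
    A = torch.randn(4, 8, dtype=torch.float32)
    delta_v = torch.randn(4, 8, dtype=torch.float32) * 50.0  # large ΔV stresses the old form
    print(attention_bias_old(A, delta_v).isfinite().all())  # may print False (inf entries)
    print(attention_bias_new(A, delta_v).isfinite().all())  # prints True
```

Without the outer exponential the bias grows only linearly in $\mathrm{softplus}(\Delta V)$, so it stays finite for any finite inputs, whereas the exponential form overflows float32 once its argument exceeds roughly 88.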
1 file changed: +2 −1 lines changed
(Diff hunk spanning original lines 217–223: original line 220 replaced by new lines 220–221.)
0 commit comments