From e23968f8a7b9d1d76890ca6a0eae9c5f24c0f73c Mon Sep 17 00:00:00 2001
From: pnazari <pnazari@student.ethz.ch>
Date: Tue, 12 Aug 2025 22:49:23 +0200
Subject: [PATCH 01/10] updated the technical note for dplr

---
 fla/ops/generalized_delta_rule/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md
index f96c22f44..c5e067b01 100644
--- a/fla/ops/generalized_delta_rule/README.md
+++ b/fla/ops/generalized_delta_rule/README.md
@@ -34,4 +34,4 @@ Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transit
 
 ## Efficient Chunkwise Implementation
 
-For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1rJbO3dU4fe7OKG3w7Yg058z_BNIuavNF/view?usp=sharing).
+For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing).

From e6f10969c67cdec9d5cffa0cac717010685dcdf8 Mon Sep 17 00:00:00 2001
From: pnazari <pnazari@student.ethz.ch>
Date: Sat, 16 Aug 2025 17:13:51 +0200
Subject: [PATCH 02/10] put the computations in the README

---
 fla/ops/generalized_delta_rule/README.md | 64 +++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md
index c5e067b01..99ef213e7 100644
--- a/fla/ops/generalized_delta_rule/README.md
+++ b/fla/ops/generalized_delta_rule/README.md
@@ -33,5 +33,67 @@ The second variant is DPLR, where we have:
 Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transition matrix structure has been utilized in RWKV7.
 
 ## Efficient Chunkwise Implementation
+The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we re-do the computations.
 
-For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing).
+
+Our goal is to show how to efficiently compute the DPLR representation
+$$
+    \mathbf S_t = \mathbf S_{t-1} \left( \mathbf D_t + \mathbf a_t \mathbf b_t^\top \right) + \mathbf v_t \mathbf k_t^\top
+$$
+for vectors $\mathbf a_t, \mathbf b_t, \mathbf v_t, \mathbf k_t \in \mathbb R^d$ and matrices $\mathbf D_t \in \mathbb R^{d \times d}$.
+
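+In particular, if the $\mathbf D_t$ are diagonal matrices, the representation derived below for $\mathbf P_t \coloneqq \prod_{i=1}^t \left( \mathbf D_i + \mathbf a_i \mathbf b_i^\top \right)$ is the WY representation for products of DPLR matrices.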
+
+### $WY$ Representation for $P_t$
+Let $\mathbf \Gamma_i^t \coloneqq \prod_{j=i}^t \mathbf D_j$. Then
+$$
+\begin{equation*}
+    \mathbf P_t = \mathbf \Gamma_1^t + \left( \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)
+\end{equation*}
+$$
+with
+$$
+    \mathbf w_i = \begin{cases}
+        \mathbf a_1, & i=1 \\
+        \mathbf \Gamma_1^{i-1} \mathbf a_i + \sum_{j=1}^{i-1} \mathbf w_j \mathbf b_j^\top \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2.
+    \end{cases}
+$$
+where we define $\mathbf \Gamma_m^{n} \coloneqq \mathbf I$ for $m > n$.
+
+We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$.
+
+For the induction step, note that
+$$
+\begin{align*}
+    \mathbf P_{t+1} &= \mathbf P_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) \\
+    &= \left( \mathbf \Gamma_{1}^t + \sum_{i=1}^t\mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}  \right) \mathbf D_{t+1} + \left( \mathbf \Gamma_1^t + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)\mathbf a_{t+1} \mathbf b_{t+1}^\top\\
+    &= \mathbf \Gamma_{1}^{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left(\mathbf \Gamma_1^{t} \mathbf a_{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right)}_{\eqqcolon \mathbf w_{t+1}} \mathbf b_{t+1}^\top \\
+    &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_i^{t},
+\end{align*}
+$$
+where we used $\mathbf \Gamma_{t+1}^t = \mathbf I$ in the last step. 
+
+### $WY$ Representation for $S_t$
+The $WY$ representation for $\mathbf S_t$ reads
+$$
+    \mathbf S_t = \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}
+$$
+where
+$$
+    \mathbf u_i = \begin{cases}
+        0, & i=1 \\
+        \sum_{j=1}^{i-1} \left( \mathbf v_j \mathbf k_j^\top + \mathbf u_j \mathbf b_j^\top \right) \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2.
+    \end{cases}
+$$
+We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^0 \coloneqq \mathbf I$.
+
+For the induction step, we compute
+$$
+\begin{align*}
+    \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1} \\
+    &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1} \\
+    &= \sum_{i=1}^t  (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t  \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right) \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\
+    &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1},
+\end{align*}
+$$
+where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$.

From 8003a5f9d968e676e65f96e04b6d9c36ac62cc99 Mon Sep 17 00:00:00 2001
From: pnazari <pnazari@student.ethz.ch>
Date: Sat, 16 Aug 2025 17:37:19 +0200
Subject: [PATCH 03/10] fixed some typos

---
 fla/ops/generalized_delta_rule/README.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md
index 99ef213e7..2a79c612e 100644
--- a/fla/ops/generalized_delta_rule/README.md
+++ b/fla/ops/generalized_delta_rule/README.md
@@ -35,6 +35,8 @@ Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transit
 ## Efficient Chunkwise Implementation
 The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we re-do the computations.
 
+If you have questions about or comments on the below derivations, feel free to reach out: philipp.nazari@tuebingen.mpg.de.
+
 
 Our goal is to show how to efficiently compute the DPLR representation
 $$
@@ -68,10 +70,10 @@ $$
     \mathbf P_{t+1} &= \mathbf P_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) \\
     &= \left( \mathbf \Gamma_{1}^t + \sum_{i=1}^t\mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}  \right) \mathbf D_{t+1} + \left( \mathbf \Gamma_1^t + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)\mathbf a_{t+1} \mathbf b_{t+1}^\top\\
     &= \mathbf \Gamma_{1}^{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left(\mathbf \Gamma_1^{t} \mathbf a_{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right)}_{\eqqcolon \mathbf w_{t+1}} \mathbf b_{t+1}^\top \\
-    &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_i^{t},
+    &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1},
 \end{align*}
 $$
-where we used $\mathbf \Gamma_{t+1}^t = \mathbf I$ in the last step. 
+where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step. 
 
 ### $WY$ Representation for $S_t$
 The $WY$ representation for $\mathbf S_t$ reads
@@ -85,14 +87,14 @@ $$
         \sum_{j=1}^{i-1} \left( \mathbf v_j \mathbf k_j^\top + \mathbf u_j \mathbf b_j^\top \right) \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2.
     \end{cases}
 $$
-We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^0 \coloneqq \mathbf I$.
+We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^1 \coloneqq \mathbf I$.
 
 For the induction step, we compute
 $$
 \begin{align*}
-    \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1} \\
-    &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1} \\
-    &= \sum_{i=1}^t  (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t  \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right) \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\
+    \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\
+    &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\
+    &= \sum_{i=1}^t  (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t  \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top \right)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1} \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\
     &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1},
 \end{align*}
 $$

From cfcd92645d01a31b1f3487314cee0fd648f34f1d Mon Sep 17 00:00:00 2001
From: Philipp Nazari <41115254+phnazari@users.noreply.github.com>
Date: Fri, 17 Oct 2025 10:58:09 +0200
Subject: [PATCH 04/10] fixed latex parsing and trailing whitespace

---
 fla/ops/generalized_delta_rule/README.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md
index 2a79c612e..c1dae7981 100644
--- a/fla/ops/generalized_delta_rule/README.md
+++ b/fla/ops/generalized_delta_rule/README.md
@@ -48,54 +48,54 @@ In particular, if the $\mathbf D_t$ are diagonal matrices, this identity provide
 
 ### $WY$ Representation for $P_t$
 Let $\mathbf \Gamma_i^t \coloneqq \prod_{j=i}^t \mathbf D_j$. Then
-$$
+```math
 \begin{equation*}
     \mathbf P_t = \mathbf \Gamma_1^t + \left( \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)
 \end{equation*}
-$$
+```
 with
-$$
+```math
     \mathbf w_i = \begin{cases}
         \mathbf a_1, & i=1 \\
         \mathbf \Gamma_1^{i-1} \mathbf a_i + \sum_{j=1}^{i-1} \mathbf w_j \mathbf b_j^\top \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2.
     \end{cases}
-$$
+```
 where we define $\mathbf \Gamma_m^{n} \coloneqq \mathbf I$ for $m > n$.
 
 We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$.
 
 For the induction step, note that
-$$
+```math
 \begin{align*}
     \mathbf P_{t+1} &= \mathbf P_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) \\
     &= \left( \mathbf \Gamma_{1}^t + \sum_{i=1}^t\mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}  \right) \mathbf D_{t+1} + \left( \mathbf \Gamma_1^t + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)\mathbf a_{t+1} \mathbf b_{t+1}^\top\\
     &= \mathbf \Gamma_{1}^{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left(\mathbf \Gamma_1^{t} \mathbf a_{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right)}_{\eqqcolon \mathbf w_{t+1}} \mathbf b_{t+1}^\top \\
     &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1},
 \end{align*}
-$$
+```
 where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step. 
 
 ### $WY$ Representation for $S_t$
 The $WY$ representation for $\mathbf S_t$ reads
-$$
+```math
     \mathbf S_t = \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}
-$$
+```
 where
-$$
+```math
     \mathbf u_i = \begin{cases}
         0, & i=1 \\
         \sum_{j=1}^{i-1} \left( \mathbf v_j \mathbf k_j^\top + \mathbf u_j \mathbf b_j^\top \right) \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2.
     \end{cases}
-$$
+```
 We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^1 \coloneqq \mathbf I$.
 
 For the induction step, we compute
-$$
+```math
 \begin{align*}
     \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\
     &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\
     &= \sum_{i=1}^t  (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t  \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top \right)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1} \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\
     &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1},
 \end{align*}
-$$
+```
 where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$.

From 838537863ebb5a89cdf1a9f30ae83fe57604d7fd Mon Sep 17 00:00:00 2001
From: Philipp Nazari <41115254+phnazari@users.noreply.github.com>
Date: Fri, 17 Oct 2025 11:00:54 +0200
Subject: [PATCH 05/10] removed trailing whitespace


From 2bf2c552c562b3881ebe333a8d410c578cddce0f Mon Sep 17 00:00:00 2001
From: Philipp Nazari <41115254+phnazari@users.noreply.github.com>
Date: Fri, 17 Oct 2025 11:01:09 +0200
Subject: [PATCH 06/10] removed trailing whitespace


From 7880a70ff1e2e973d209088ec44a0749af502ccc Mon Sep 17 00:00:00 2001
From: Philipp Nazari <41115254+phnazari@users.noreply.github.com>
Date: Fri, 17 Oct 2025 11:01:27 +0200
Subject: [PATCH 07/10] removed trailing whitespace


From d957896944182d2ba82eda1788bbe7d9fe609621 Mon Sep 17 00:00:00 2001
From: pnazari <pnazari@student.ethz.ch>
Date: Fri, 17 Oct 2025 11:04:42 +0200
Subject: [PATCH 08/10] removed trailing white-space

---
 fla/ops/generalized_delta_rule/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md
index c1dae7981..5a513e839 100644
--- a/fla/ops/generalized_delta_rule/README.md
+++ b/fla/ops/generalized_delta_rule/README.md
@@ -98,4 +98,4 @@ For the induction step, we compute
     &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1},
 \end{align*}
 ```
-where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$.
+where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$.
\ No newline at end of file

From a5b09ca607d4357347b50d3a22dd201eea82a288 Mon Sep 17 00:00:00 2001
From: pnazari <pnazari@student.ethz.ch>
Date: Fri, 17 Oct 2025 11:11:25 +0200
Subject: [PATCH 09/10] ran pre-commits

---
 fla/ops/generalized_delta_rule/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md
index 5a513e839..5a97a5ada 100644
--- a/fla/ops/generalized_delta_rule/README.md
+++ b/fla/ops/generalized_delta_rule/README.md
@@ -73,7 +73,7 @@ For the induction step, note that
     &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1},
 \end{align*}
 ```
-where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step. 
+where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step.
 
 ### $WY$ Representation for $S_t$
 The $WY$ representation for $\mathbf S_t$ reads
@@ -98,4 +98,4 @@ For the induction step, we compute
     &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1},
 \end{align*}
 ```
-where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$.
\ No newline at end of file
+where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$.

From e1c2f2ee5b466e841290103572cc280c4c9cd47b Mon Sep 17 00:00:00 2001
From: pnazari <pnazari@student.ethz.ch>
Date: Fri, 17 Oct 2025 11:14:29 +0200
Subject: [PATCH 10/10] changed contact information

---
 fla/ops/generalized_delta_rule/README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md
index 5a97a5ada..7ac2a76ba 100644
--- a/fla/ops/generalized_delta_rule/README.md
+++ b/fla/ops/generalized_delta_rule/README.md
@@ -35,8 +35,7 @@ Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transit
 ## Efficient Chunkwise Implementation
 The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we re-do the computations.
 
-If you have questions about or comments on the below derivations, feel free to reach out: philipp.nazari@tuebingen.mpg.de.
-
+If you have questions or comments about the derivations below, feel free to [reach out](https://phnazari.github.io).
 
 Our goal is to show how to efficiently compute the DPLR representation
 $$
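
Reviewer note (not part of the patch): the two WY representations added to the README can be checked numerically. Below is a minimal NumPy sketch, assuming $\mathbf P_0 = \mathbf I$, $\mathbf S_0 = \mathbf 0$, and random diagonal $\mathbf D_t$. The helper names `gamma` and `wy_vectors` are hypothetical and do not correspond to anything in `fla`. The sketch compares the sequential recurrences against the closed forms for $\mathbf P_t$ and $\mathbf S_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4  # sequence length and state dimension (arbitrary test sizes)

# Random diagonal D_t and random vectors a_t, b_t, v_t, k_t (row t-1 holds step t).
D = [np.diag(rng.uniform(0.5, 1.0, d)) for _ in range(T)]
a, b, v, k = (rng.standard_normal((T, d)) for _ in range(4))


def gamma(m, n):
    """Gamma_m^n = D_m D_{m+1} ... D_n (1-indexed); identity when m > n."""
    out = np.eye(d)
    for j in range(m, n + 1):
        out = out @ D[j - 1]
    return out


def wy_vectors(t):
    """Build w_1..w_t and u_1..u_t from the recursive definitions in the README."""
    w, u = [], []
    for i in range(1, t + 1):
        a_i = a[i - 1]
        w_i = gamma(1, i - 1) @ a_i  # equals a_1 for i = 1
        u_i = np.zeros(d)            # equals 0 for i = 1
        for j in range(1, i):
            g = gamma(j + 1, i - 1) @ a_i
            w_i = w_i + w[j - 1] * (b[j - 1] @ g)
            u_i = u_i + v[j - 1] * (k[j - 1] @ g) + u[j - 1] * (b[j - 1] @ g)
        w.append(w_i)
        u.append(u_i)
    return w, u


# Sequential reference: P_t = P_{t-1} M_t and S_t = S_{t-1} M_t + v_t k_t^T.
P_seq, S_seq = np.eye(d), np.zeros((d, d))
for t in range(T):
    M = D[t] + np.outer(a[t], b[t])
    P_seq = P_seq @ M
    S_seq = S_seq @ M + np.outer(v[t], k[t])

# Closed forms; index i is 0-based here, so Gamma_{i+1}^T becomes gamma(i + 2, T).
w, u = wy_vectors(T)
P_wy = gamma(1, T) + sum(np.outer(w[i], b[i]) @ gamma(i + 2, T) for i in range(T))
S_wy = sum(
    (np.outer(v[i], k[i]) + np.outer(u[i], b[i])) @ gamma(i + 2, T) for i in range(T)
)

assert np.allclose(P_seq, P_wy)
assert np.allclose(S_seq, S_wy)
```

The double loop is quadratic in `T` and is only meant as a correctness check of the formulas, not as a sketch of the chunkwise kernel.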