From e23968f8a7b9d1d76890ca6a0eae9c5f24c0f73c Mon Sep 17 00:00:00 2001 From: pnazari Date: Tue, 12 Aug 2025 22:49:23 +0200 Subject: [PATCH 01/10] updated the technical note for dplr --- fla/ops/generalized_delta_rule/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md index f96c22f44..c5e067b01 100644 --- a/fla/ops/generalized_delta_rule/README.md +++ b/fla/ops/generalized_delta_rule/README.md @@ -34,4 +34,4 @@ Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transit ## Efficient Chunkwise Implementation -For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1rJbO3dU4fe7OKG3w7Yg058z_BNIuavNF/view?usp=sharing). +For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing). From e6f10969c67cdec9d5cffa0cac717010685dcdf8 Mon Sep 17 00:00:00 2001 From: pnazari Date: Sat, 16 Aug 2025 17:13:51 +0200 Subject: [PATCH 02/10] put the computations in the README --- fla/ops/generalized_delta_rule/README.md | 64 +++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md index c5e067b01..99ef213e7 100644 --- a/fla/ops/generalized_delta_rule/README.md +++ b/fla/ops/generalized_delta_rule/README.md @@ -33,5 +33,67 @@ The second variant is DPLR, where we have: Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transition matrix structure has been utilized in RWKV7. ## Efficient Chunkwise Implementation +The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we re-do the computations. 
-For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing). + +Our goal is to show how to efficiently compute the DPLR representation +$$ + \mathbf S_t = \mathbf S_{t-1} \left( \mathbf D_t + \mathbf a_t \mathbf b_t^\top \right) + \mathbf v_t \mathbf k_t^\top +$$ +for vectors $\mathbf a_t, \mathbf b_t, \mathbf v_t, \mathbf k_t \in \mathbb R^d$ and matrices $\mathbf D_t \in \mathbb R^{d, d}$. + +In particular, if the $\mathbf D_t$ are diagonal matrices, this identity provides the WY representation for products of DPLR matrices. + +### $WY$ Representation for $P_t$ +Let $\mathbf \Gamma_i^t \coloneqq \prod_{j=i}^t \mathbf D_j$. Then +$$ +\begin{equation*} + \mathbf P_t = \mathbf \Gamma_1^t + \left( \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right) +\end{equation*} +$$ +with +$$ + \mathbf w_i = \begin{cases} + \mathbf a_1, & i=1 \\ + \mathbf \Gamma_1^{i-1} \mathbf a_i + \sum_{j=1}^{i-1} \mathbf w_j \mathbf b_j^\top \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2. + \end{cases} +$$ +where we define $\mathbf \Gamma_m^{n} \coloneqq \mathbf I$ for $m > n$. + +We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$. 
+ +For the induction step, note that +$$ +\begin{align*} + \mathbf P_{t+1} &= \mathbf P_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) \\ + &= \left( \mathbf \Gamma_{1}^t + \sum_{i=1}^t\mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right) \mathbf D_{t+1} + \left( \mathbf \Gamma_1^t + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)\mathbf a_{t+1} \mathbf b_{t+1}^\top\\ + &= \mathbf \Gamma_{1}^{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left(\mathbf \Gamma_1^{t} \mathbf a_{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right)}_{\eqqcolon \mathbf w_{t+1}} \mathbf b_{t+1}^\top \\ + &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_i^{t}, +\end{align*} +$$ +where we used $\mathbf \Gamma_{t+1}^t = \mathbf I$ in the last step. + +### $WY$ Representation for $S_t$ +The $WY$ representation for $\mathbf S_t$ reads +$$ + \mathbf S_t = \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t} +$$ +where +$$ + \mathbf u_i = \begin{cases} + 0, & i=1 \\ + \sum_{j=1}^{i-1} \left( \mathbf v_j \mathbf k_j^\top + \mathbf u_j \mathbf b_j^\top \right) \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2. + \end{cases} +$$ +We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^0 \coloneqq \mathbf I$. 
+ +For the induction step, we compute +$$ +\begin{align*} + \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1} \\ + &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1} \\ + &= \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right) \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\ + &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1}, +\end{align*} +$$ +where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$. From 8003a5f9d968e676e65f96e04b6d9c36ac62cc99 Mon Sep 17 00:00:00 2001 From: pnazari Date: Sat, 16 Aug 2025 17:37:19 +0200 Subject: [PATCH 03/10] fixed some typos --- fla/ops/generalized_delta_rule/README.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md index 99ef213e7..2a79c612e 100644 --- a/fla/ops/generalized_delta_rule/README.md +++ b/fla/ops/generalized_delta_rule/README.md @@ -35,6 +35,8 @@ Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transit ## Efficient Chunkwise Implementation The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we re-do the computations. +If you have questions about or comments on the below derivations, feel free to reach out: philipp.nazari@tuebingen.mpg.de. 
+ Our goal is to show how to efficiently compute the DPLR representation $$ @@ -68,10 +70,10 @@ $$ \mathbf P_{t+1} &= \mathbf P_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) \\ &= \left( \mathbf \Gamma_{1}^t + \sum_{i=1}^t\mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right) \mathbf D_{t+1} + \left( \mathbf \Gamma_1^t + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)\mathbf a_{t+1} \mathbf b_{t+1}^\top\\ &= \mathbf \Gamma_{1}^{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left(\mathbf \Gamma_1^{t} \mathbf a_{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right)}_{\eqqcolon \mathbf w_{t+1}} \mathbf b_{t+1}^\top \\ - &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_i^{t}, + &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1}, \end{align*} $$ -where we used $\mathbf \Gamma_{t+1}^t = \mathbf I$ in the last step. +where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step. ### $WY$ Representation for $S_t$ The $WY$ representation for $\mathbf S_t$ reads @@ -85,14 +87,14 @@ $$ \sum_{j=1}^{i-1} \left( \mathbf v_j \mathbf k_j^\top + \mathbf u_j \mathbf b_j^\top \right) \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2. \end{cases} $$ -We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^0 \coloneqq \mathbf I$. +We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^1 \coloneqq \mathbf I$. 
For the induction step, we compute $$ \begin{align*} - \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1} \\ - &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1} \\ - &= \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right) \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\ + \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\ + &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\ + &= \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top \right)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1} \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\ &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1}, \end{align*} $$ From cfcd92645d01a31b1f3487314cee0fd648f34f1d Mon Sep 17 00:00:00 2001 From: Philipp Nazari <41115254+phnazari@users.noreply.github.com> Date: Fri, 17 Oct 2025 10:58:09 +0200 Subject: [PATCH 04/10] fixed latex parsing and trailing whitespace --- fla/ops/generalized_delta_rule/README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git 
a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md index 2a79c612e..c1dae7981 100644 --- a/fla/ops/generalized_delta_rule/README.md +++ b/fla/ops/generalized_delta_rule/README.md @@ -48,54 +48,54 @@ In particular, if the $\mathbf D_t$ are diagonal matrices, this identity provide ### $WY$ Representation for $P_t$ Let $\mathbf \Gamma_i^t \coloneqq \prod_{j=i}^t \mathbf D_j$. Then -$$ +```math \begin{equation*} \mathbf P_t = \mathbf \Gamma_1^t + \left( \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right) \end{equation*} -$$ +``` with -$$ +```math \mathbf w_i = \begin{cases} \mathbf a_1, & i=1 \\ \mathbf \Gamma_1^{i-1} \mathbf a_i + \sum_{j=1}^{i-1} \mathbf w_j \mathbf b_j^\top \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2. \end{cases} -$$ +``` where we define $\mathbf \Gamma_m^{n} \coloneqq \mathbf I$ for $m > n$. We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$. For the induction step, note that -$$ +```math \begin{align*} \mathbf P_{t+1} &= \mathbf P_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) \\ &= \left( \mathbf \Gamma_{1}^t + \sum_{i=1}^t\mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right) \mathbf D_{t+1} + \left( \mathbf \Gamma_1^t + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)\mathbf a_{t+1} \mathbf b_{t+1}^\top\\ &= \mathbf \Gamma_{1}^{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left(\mathbf \Gamma_1^{t} \mathbf a_{t+1} + \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1}\right)}_{\eqqcolon \mathbf w_{t+1}} \mathbf b_{t+1}^\top \\ &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1}, \end{align*} -$$ +``` where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step. 
### $WY$ Representation for $S_t$ The $WY$ representation for $\mathbf S_t$ reads -$$ +```math \mathbf S_t = \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t} -$$ +``` where -$$ +```math \mathbf u_i = \begin{cases} 0, & i=1 \\ \sum_{j=1}^{i-1} \left( \mathbf v_j \mathbf k_j^\top + \mathbf u_j \mathbf b_j^\top \right) \mathbf \Gamma_{j+1}^{i-1} \mathbf a_i, & i \geq 2. \end{cases} -$$ +``` We again show this claim by induction. The base case $t=1$ is clear, once we realize that $\mathbf u_1 \coloneqq 0$ and $\mathbf \Gamma_2^1 \coloneqq \mathbf I$. For the induction step, we compute -$$ +```math \begin{align*} \mathbf S_{t+1} &= \mathbf S_t (\mathbf D_{t+1} + \mathbf a_{t+1} \mathbf b_{t+1}^\top) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\ &= \left[\sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t}\right] \left(\mathbf D_{t+1} + \mathbf a_{t+1}\mathbf b_{t+1}^\top\right) + \mathbf v_{t+1} \mathbf k_{t+1}^\top \\ &= \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top) \mathbf \Gamma_{i+1}^{t+1} + \underbrace{\left[ \sum_{i=1}^t \left(\mathbf v_i\mathbf k_i^\top + \mathbf u_i\mathbf b_i^\top \right)\mathbf \Gamma_{i+1}^{t}\mathbf a_{t+1} \right]}_{\eqqcolon \mathbf u_{t+1}}\mathbf b_{t+1}^\top + \mathbf v_{t+1}\mathbf k_{t+1}^\top \\ &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1}, \end{align*} -$$ +``` where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$. 
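The two WY representations derived above can be sanity-checked numerically. The following is a minimal sketch, not part of `fla`: it assumes NumPy, takes $\mathbf S_0 = 0$, and reads $\mathbf P_t$ as the running product of the transition matrices $\mathbf D_i + \mathbf a_i \mathbf b_i^\top$ (the convention the induction step implies). The helper names `gamma` and `wy_check` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gamma(D, i, t):
    """Gamma_i^t = D_i D_{i+1} ... D_t (1-indexed); identity when i > t."""
    out = np.eye(D.shape[-1])
    for j in range(i, t + 1):
        out = out @ D[j - 1]
    return out


def wy_check(T=5, d=4, seed=0):
    """Return max abs errors of the WY forms of P_T and S_T vs. the recurrences."""
    rng = np.random.default_rng(seed)
    # Diagonal D_t as in DPLR; a, b, v, k are random vectors (illustrative data).
    D = np.stack([np.diag(rng.uniform(0.5, 1.0, d)) for _ in range(T)])
    a, b, v, k = (rng.standard_normal((T, d)) for _ in range(4))

    # Reference: run the recurrences directly, with P_0 = I and S_0 = 0 (assumed).
    P, S = np.eye(d), np.zeros((d, d))
    for t in range(T):
        M = D[t] + np.outer(a[t], b[t])
        P = P @ M
        S = S @ M + np.outer(v[t], k[t])

    # WY vectors w_i and u_i from their recursive definitions.
    w, u = [], []
    for i in range(1, T + 1):
        ai = a[i - 1]
        wi = gamma(D, 1, i - 1) @ ai  # equals a_1 for i = 1
        ui = np.zeros(d)              # equals 0 for i = 1
        for j in range(1, i):
            g = gamma(D, j + 1, i - 1) @ ai
            wi = wi + w[j - 1] * (b[j - 1] @ g)
            ui = ui + v[j - 1] * (k[j - 1] @ g) + u[j - 1] * (b[j - 1] @ g)
        w.append(wi)
        u.append(ui)

    # Assemble P_T and S_T from the WY representations.
    P_wy = gamma(D, 1, T)
    S_wy = np.zeros((d, d))
    for i in range(1, T + 1):
        G = gamma(D, i + 1, T)
        P_wy = P_wy + np.outer(w[i - 1], b[i - 1]) @ G
        S_wy = S_wy + (np.outer(v[i - 1], k[i - 1]) + np.outer(u[i - 1], b[i - 1])) @ G

    return np.abs(P - P_wy).max(), np.abs(S - S_wy).max()


if __name__ == "__main__":
    err_P, err_S = wy_check()
    print(f"max |P - P_wy| = {err_P:.2e}, max |S - S_wy| = {err_S:.2e}")
```

Both returned errors should sit at the level of floating-point round-off. The sketch recomputes every $\mathbf \Gamma$ product from scratch for readability; it illustrates the identities only and makes no attempt at the chunkwise efficiency the section is about.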
From 838537863ebb5a89cdf1a9f30ae83fe57604d7fd Mon Sep 17 00:00:00 2001 From: Philipp Nazari <41115254+phnazari@users.noreply.github.com> Date: Fri, 17 Oct 2025 11:00:54 +0200 Subject: [PATCH 05/10] removed trailing whitespace From 2bf2c552c562b3881ebe333a8d410c578cddce0f Mon Sep 17 00:00:00 2001 From: Philipp Nazari <41115254+phnazari@users.noreply.github.com> Date: Fri, 17 Oct 2025 11:01:09 +0200 Subject: [PATCH 06/10] removed trailing whitespace From 7880a70ff1e2e973d209088ec44a0749af502ccc Mon Sep 17 00:00:00 2001 From: Philipp Nazari <41115254+phnazari@users.noreply.github.com> Date: Fri, 17 Oct 2025 11:01:27 +0200 Subject: [PATCH 07/10] removed trailing whitespace From d957896944182d2ba82eda1788bbe7d9fe609621 Mon Sep 17 00:00:00 2001 From: pnazari Date: Fri, 17 Oct 2025 11:04:42 +0200 Subject: [PATCH 08/10] removed trailing white-space --- fla/ops/generalized_delta_rule/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md index c1dae7981..5a513e839 100644 --- a/fla/ops/generalized_delta_rule/README.md +++ b/fla/ops/generalized_delta_rule/README.md @@ -98,4 +98,4 @@ For the induction step, we compute &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1}, \end{align*} ``` -where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$. +where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$. 
\ No newline at end of file From a5b09ca607d4357347b50d3a22dd201eea82a288 Mon Sep 17 00:00:00 2001 From: pnazari Date: Fri, 17 Oct 2025 11:11:25 +0200 Subject: [PATCH 09/10] ran pre-commits --- fla/ops/generalized_delta_rule/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md index 5a513e839..5a97a5ada 100644 --- a/fla/ops/generalized_delta_rule/README.md +++ b/fla/ops/generalized_delta_rule/README.md @@ -73,7 +73,7 @@ For the induction step, note that &= \mathbf \Gamma_1^{t+1} + \sum_{i=1}^{t+1} \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t+1}, \end{align*} ``` -where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step. +where we used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$ in the last step. ### $WY$ Representation for $S_t$ The $WY$ representation for $\mathbf S_t$ reads @@ -98,4 +98,4 @@ For the induction step, we compute &= \sum_{i=1}^{t+1} (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top ) \mathbf \Gamma_{i+1}^{t+1}, \end{align*} ``` -where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$. \ No newline at end of file +where we again used $\mathbf \Gamma_{t+2}^{t+1} = \mathbf I$. From e1c2f2ee5b466e841290103572cc280c4c9cd47b Mon Sep 17 00:00:00 2001 From: pnazari Date: Fri, 17 Oct 2025 11:14:29 +0200 Subject: [PATCH 10/10] changed contact information --- fla/ops/generalized_delta_rule/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fla/ops/generalized_delta_rule/README.md b/fla/ops/generalized_delta_rule/README.md index 5a97a5ada..7ac2a76ba 100644 --- a/fla/ops/generalized_delta_rule/README.md +++ b/fla/ops/generalized_delta_rule/README.md @@ -35,8 +35,7 @@ Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. 
This transit ## Efficient Chunkwise Implementation The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we re-do the computations. -If you have questions about or comments on the below derivations, feel free to reach out: philipp.nazari@tuebingen.mpg.de. - +If you have questions or comments about the derivations below, feel free to [reach out](https://phnazari.github.io). Our goal is to show how to efficiently compute the DPLR representation $$