feat: use original cuda scan from linear RNN #22

yoyolicoris · 2025-05-04T09:11:18Z

import torch
import torchlpc
from numba import cuda

from timeit import timeit


def numba_scan(impulse, decay, initial_state):
    """Run the Numba CUDA linear recurrence on (n_dims, n_steps) tensors and return the result."""
    out = torch.empty_like(impulse)
    n_dims, n_steps = decay.shape
    torchlpc.recurrence.compute_linear_recurrence(
        cuda.as_cuda_array(decay.detach()),
        cuda.as_cuda_array(impulse.detach()),
        cuda.as_cuda_array(initial_state.detach()),
        cuda.as_cuda_array(out),
        n_dims,
        n_steps,
    )
    return out


batch_size = 64
samples = 2**19

x = torch.randn(batch_size, samples).cuda()
A = torch.rand(batch_size, samples).cuda() * 2 - 1
zi = torch.randn(batch_size).cuda()

t_torch = timeit(
    "torch.ops.torchlpc.scan(x, A, zi)",
    globals=globals(),
    number=100,
)
print(f"Torch time: {t_torch:.4f} seconds")

# Warm-up call so that Numba's one-time JIT compilation is excluded from the timing.
numba_scan(x, A, zi)

t_numba = timeit(
    "numba_scan(x, A, zi)",
    globals=globals(),
    number=100,
)

print(f"Numba time: {t_numba:.4f} seconds")

print(f"Torch is {t_numba / t_torch:.2f}x faster than Numba")
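A caveat when reading timings like these: CUDA kernel launches are asynchronous, so a wall-clock measurement can under-report GPU time unless the device is synchronized before the clock stops. The helper below is a minimal sketch of that pattern, not part of this PR; `time_call` and its parameters are hypothetical names chosen for illustration.

```python
from timeit import default_timer


def time_call(fn, number=100, warmup=1, sync=None):
    """Return total wall-clock seconds for `number` calls of `fn`.

    `sync` is an optional zero-argument callable (hypothetical parameter)
    that blocks until previously queued work has finished.
    """
    # Warm-up calls keep one-time costs (JIT compilation, caches) out of the timing.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain work queued by the warm-up calls

    start = default_timer()
    for _ in range(number):
        fn()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    return default_timer() - start
```

With PyTorch on a GPU this could be called as `time_call(lambda: torch.ops.torchlpc.scan(x, A, zi), sync=torch.cuda.synchronize)`, giving both implementations the same warm-up and synchronization treatment.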

Results on a Linux machine with a 5060 Ti GPU:

Torch time: 0.4350 seconds
Numba time: 0.4457 seconds
Torch is 1.02x faster than Numba

Performance is roughly the same as before.
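Equal speed is only meaningful if both paths compute the same thing. The sketch below is a sequential reference for the recurrence, under the assumption (inferred from the argument names `impulse`, `decay`, and `initial_state`, not confirmed in this thread) that the scan computes `h[t] = decay[t] * h[t-1] + impulse[t]` independently for each row. `reference_scan` is a hypothetical name and works on plain Python lists.

```python
def reference_scan(impulse, decay, initial_state):
    """Sequential first-order linear recurrence, one row at a time.

    Assumed definition: h[t] = decay[t] * h[t-1] + impulse[t],
    with h[-1] taken from initial_state for each row.
    """
    out = []
    for x_row, a_row, h0 in zip(impulse, decay, initial_state):
        h = h0
        row = []
        for x_t, a_t in zip(x_row, a_row):
            h = a_t * h + x_t  # one recurrence step
            row.append(h)
        out.append(row)
    return out
```

On small inputs, the output of this loop could be compared against both GPU implementations (for example after converting tensors with `.tolist()`) before trusting the timings.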


Copilot

Pull Request Overview

This PR reintroduces the original CUDA scan implementation for linear RNN recurrence computation while also cleaning up and reorganizing the recurrence functions and extension API. Key changes include:

Splitting recurrence functionality into separate _cuda_recurrence and _cpu_recurrence functions.
Renaming and re-registering C++ extension functions from “scan_cpu”/“lpc_cpu” to “scan”/“lpc”.
Updating tests and setup configuration to support the new extension API and parameterizations.

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.


File	Description
torchlpc/recurrence.py	Refactored recurrence functions with separate CPU/CUDA paths.
torchlpc/csrc/scan_cpu.cpp	Updated API registration and const/mutable data pointer usage.
torchlpc/csrc/cuda/LICENSE.txt	Added license text for CUDA code.
torchlpc/core.py	Updated operator calls to reflect new API names.
torchlpc/__init__.py	Changed extension loading logic using try-except for _C import.
tests/test_grad.py	Added parameterization for complex data tests.
tests/test_extension.py	Updated tests to use new operator names and added device parameter.
setup.py	Revised extension build configuration and source file discovery.

Files not reviewed (1)

torchlpc/csrc/cuda/linear_recurrence.cu: Language not supported
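The try-except loading mentioned for `torchlpc/__init__.py` follows a common pattern for optional compiled extensions. The snippet below is a generic sketch of that pattern, not the code from this PR; `load_optional_extension` is a hypothetical helper, and how the package actually falls back is not shown in this thread.

```python
import importlib
import warnings


def load_optional_extension(module_name):
    """Import a compiled extension if available, else return None.

    Hypothetical helper: callers are expected to check for None and
    fall back to a pure-Python code path.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        warnings.warn(f"Could not load {module_name}: {err}; using fallback path.")
        return None
```

Returning `None` rather than raising keeps the package importable on machines where the extension was not built.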


yoyolicoris added 11 commits May 3, 2025 14:47

feat: add linear recurrence with MIT license

ad77bbb

refactor linear recurrence code

9164dcb

fix: correct ndims and nsteps

4dcde47

refactor: rename scan_cpu and lpc_cpu functions to scan and lpc

6c8570c

refactor: update function calls to use unified 'scan' operation

a4fd535

refactor: reorganize setup.py for building CUDA extensions

98f314f

refactor: update tests to include device and complex parameterization for scan and lpc functions

206497e

refactor: implement separate recurrence functions for improved clarity and maintainability

6d4a2de

refactor: create dummy _C module for python loading

086bbb4

fix: typo

5e140c5

apply copilot's suggestion

bf9acfe

yoyolicoris requested a review from Copilot May 4, 2025 09:17

Copilot AI reviewed May 4, 2025

torchlpc/recurrence.py (resolved)

torchlpc/__init__.py (resolved)

refactor: use channel-first format, swap the role of lane and warp to run faster

a98aae9

yoyolicoris merged commit c662e34 into main May 4, 2025
6 of 8 checks passed

yoyolicoris deleted the feat/cuda-scan branch May 4, 2025 10:31
