fix: add tensormatmul and tensortranspose to iengine for proper integration #504
Conversation
Added missing IEngine methods for tensor-level matrix operations to enable GPU acceleration in autodiff computation graphs. TensorOperations currently uses Tensor.MatrixMultiply and Tensor.Transpose, which don't leverage IEngine, but this infrastructure enables future optimization.

Changes:
- Add TensorMatMul and TensorTranspose methods to the IEngine interface
- Implement both methods in CpuEngine with standard matrix algorithms
- Implement both methods in GpuEngine, delegating to Matrix operations
- Add helper methods ToMatrix/ToTensor for 2D tensor-matrix conversion

This completes the IEngine interface for tensor operations required by autodiff, even though TensorOperations doesn't use them yet because ComputationNode lacks an Engine reference.

Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
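For orientation, here is a minimal sketch of what the two added interface members could look like, based only on the names and types mentioned in this description; the actual signatures and XML documentation in src/Engines/IEngine.cs may differ.

```csharp
// Sketch only: shows just the two members this PR adds, not the full IEngine surface.
public interface IEngine
{
    /// <summary>Multiplies two 2D tensors (matrices) for autodiff computation graphs.</summary>
    Tensor<T> TensorMatMul<T>(Tensor<T> a, Tensor<T> b);

    /// <summary>Transposes a 2D tensor by swapping its rows and columns.</summary>
    Tensor<T> TensorTranspose<T>(Tensor<T> tensor);
}
```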
Summary by CodeRabbit

New Features

Walkthrough

Adds two new tensor operation methods — TensorMatMul and TensorTranspose — to the IEngine interface, with implementations in CpuEngine and GpuEngine.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant GpuEngine
    participant CpuEngine
    participant GPU
    Client->>GpuEngine: TensorMatMul(a, b)
    alt Shape is Small
        GpuEngine->>CpuEngine: TensorMatMul(a, b)
        CpuEngine-->>GpuEngine: result
    else Shape is Large & GPU Healthy
        GpuEngine->>GpuEngine: ToMatrix(a), ToMatrix(b)
        GpuEngine->>GPU: MatrixMultiply
        GPU-->>GpuEngine: matrix_result
        GpuEngine->>GpuEngine: ToTensor(matrix_result)
    else GPU Unavailable
        GpuEngine->>CpuEngine: TensorMatMul(a, b)
        CpuEngine-->>GpuEngine: result
    end
    GpuEngine-->>Client: Tensor<T> result
```
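The diagram above can be read as the following C# sketch of the GpuEngine dispatch path; the threshold constant, the _cpuEngine field, and the exception filter are illustrative assumptions rather than the exact implementation (the catch clause follows the repo guidance of using standard .NET exception types for GPU failures).

```csharp
// Illustrative dispatch sketch; GpuMatMulThreshold and _cpuEngine are assumed names.
public Tensor<T> TensorMatMul<T>(Tensor<T> a, Tensor<T> b)
{
    long workSize = (long)a.Shape[0] * a.Shape[1] * b.Shape[1];

    // Small shapes or an unhealthy GPU: transfer overhead dominates, use the CPU path.
    if (workSize < GpuMatMulThreshold || !_gpuHealthy)
        return _cpuEngine.TensorMatMul(a, b);

    try
    {
        // Large shapes on a healthy GPU: convert to matrices, multiply on the device, convert back.
        var result = MatrixMultiply(ToMatrix(a), ToMatrix(b));
        return ToTensor(result);
    }
    catch (InvalidOperationException)
    {
        // GPU unavailable or failed mid-operation: fall back to the CPU engine.
        return _cpuEngine.TensorMatMul(a, b);
    }
}
```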
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Engines/GpuEngine.cs (1)
5010-5023: Fix infinite recursion in RecordGpuFailure when max recovery attempts are exceeded

When _consecutiveFailures >= MaxRecoveryAttempts, RecordGpuFailure calls itself recursively:

```csharp
if (_consecutiveFailures >= MaxRecoveryAttempts)
{
    RecordGpuFailure(exception);
    return true;
}
```

This never changes state in a way that breaks the recursion and will lead to a stack overflow once the limit is crossed, completely breaking GPU failure handling and potentially the entire process.
The likely intent is to permanently disable GPU usage after the maximum attempts, not to recurse. Suggested fix:
```diff
 private bool RecordGpuFailure(Exception exception)
 {
 @@
-    // If we've exceeded maximum recovery attempts, permanently disable GPU
-    if (_consecutiveFailures >= MaxRecoveryAttempts)
-    {
-        RecordGpuFailure(exception);
-        return true;
-    }
+    // If we've exceeded maximum recovery attempts, permanently disable GPU
+    if (_consecutiveFailures >= MaxRecoveryAttempts)
+    {
+        _gpuHealthy = false;
+        Console.WriteLine($"[GpuEngine] GPU permanently disabled after {_consecutiveFailures} consecutive failures.");
+        return true;
+    }
```

This avoids recursion, correctly marks the GPU as permanently unhealthy, and keeps the rest of the health/recovery logic meaningful.
🧹 Nitpick comments (1)
src/Engines/GpuEngine.cs (1)
4036-4064: Consider more efficient 2D Tensor↔Matrix conversion for large shapes
ToMatrix/ToTensor are functionally correct and nicely encapsulate 2D conversions, but they perform element-by-element copies via nested indexer loops. For large matrices, this adds extra CPU overhead on top of the unavoidable host↔device copies.

If Tensor<T> and Matrix<T> share compatible row-major layouts (as they appear to), you could eventually:

- Expose spans from both types (AsSpan/AsWritableSpan) and
- Replace the nested loops with a single span copy where possible.

That would turn the conversions into a single linear copy instead of O(m*n) indexer calls, without changing semantics.

This is not blocking, but worth tracking as a perf improvement once the layout guarantees are confirmed.
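A minimal sketch of the suggested single-copy conversion, assuming both types are row-major and expose span accessors; AsSpan and AsWritableSpan are hypothetical names taken from the comment, not confirmed APIs, and the Matrix<T> constructor is likewise assumed.

```csharp
// Hypothetical span-based conversion: one linear copy instead of O(m*n) indexer calls.
private static Matrix<T> ToMatrixViaSpan<T>(Tensor<T> tensor)
{
    int rows = tensor.Shape[0];
    int cols = tensor.Shape[1];
    var matrix = new Matrix<T>(rows, cols);          // assumed constructor

    // Requires both types to guarantee contiguous row-major storage.
    tensor.AsSpan().CopyTo(matrix.AsWritableSpan()); // hypothetical accessors
    return matrix;
}
```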
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/Engines/CpuEngine.cs (1 hunks)
- src/Engines/GpuEngine.cs (1 hunks)
- src/Engines/IEngine.cs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-19T04:08:26.895Z
Learnt from: ooples
Repo: ooples/AiDotNet PR: 0
File: :0-0
Timestamp: 2025-11-19T04:08:26.895Z
Learning: For ILGPU GPU operations in GpuEngine.cs, use standard .NET exception types (InvalidOperationException, ArgumentException, OutOfMemoryException) instead of ILGPU-specific exception types, as ILGPU exception types may be version-specific. Combine with message-based filtering using ex.Message.Contains("device") or ex.Message.Contains("accelerator") as a fallback for GPU-specific errors.
Applied to files:
src/Engines/GpuEngine.cs
🧬 Code graph analysis (1)
src/Engines/IEngine.cs (1)
src/Engines/CpuEngine.cs (20)
T (266-279), T (282-300), T (303-312), T (532-548), Tensor (676-730), Tensor (733-774), Tensor (777-800), Tensor (803-822), Tensor (825-844), Tensor (847-866), Tensor (869-882), Tensor (885-910), Tensor (938-1004), Tensor (1007-1079), Tensor (1082-1170), Tensor (1209-1215), Tensor (1217-1223), Tensor (1225-1241), Tensor (1263-1268), Tensor (1270-1275)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build All Frameworks
🔇 Additional comments (3)
src/Engines/CpuEngine.cs (2)
732-774: LGTM! Correct standard matrix multiplication implementation.

The implementation correctly validates inputs, checks dimension compatibility, and performs standard O(n³) matrix multiplication. The logic appropriately mirrors the existing MatrixMultiply method (lines 387-417) but operates on Tensor<T> types.
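For reference, a sketch of the standard algorithm being described, specialized to double for brevity; the real CpuEngine.TensorMatMul is generic over T, and the shape-array constructor and 2D indexer used here are assumptions about the Tensor<T> API.

```csharp
// Standard O(n^3) matrix multiplication over 2D tensors (sketch, double-specialized).
public static Tensor<double> TensorMatMulSketch(Tensor<double> a, Tensor<double> b)
{
    if (a.Shape.Length != 2 || b.Shape.Length != 2)
        throw new ArgumentException("TensorMatMul requires 2D tensors.");
    if (a.Shape[1] != b.Shape[0])
        throw new ArgumentException($"Inner dimensions must match: {a.Shape[1]} vs {b.Shape[0]}.");

    int m = a.Shape[0], k = a.Shape[1], n = b.Shape[1];
    var result = new Tensor<double>(new[] { m, n }); // assumed shape-array constructor

    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
        {
            double sum = 0;
            for (int p = 0; p < k; p++)
                sum += a[i, p] * b[p, j]; // row-by-column dot product
            result[i, j] = sum;
        }

    return result;
}
```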
776-800: LGTM! Correct transpose implementation.

The implementation correctly validates inputs and performs 2D transpose by swapping indices. The logic appropriately mirrors the existing MatrixTranspose method (lines 449-464) but operates on Tensor<T> types.

src/Engines/IEngine.cs (1)
694-731: Interface and implementations are correct; integration to TensorOperations is pending future work.

The TensorMatMul<T> and TensorTranspose<T> interface declarations are well-documented with proper XML comments explaining their purpose in autodiff computation graphs. Both CpuEngine and GpuEngine provide implementations with appropriate adaptive execution logic and fallback mechanisms.

The search confirms that integration to TensorOperations is not yet complete (as already noted in the PR description), with TensorOperations still calling Tensor.MatrixMultiply() and Tensor.Transpose() directly rather than delegating to IEngine. This aligns with the PR's design decision to add the infrastructure now and integrate it in future work.
Pull request overview
This PR adds TensorMatMul and TensorTranspose methods to the IEngine interface and implements them in both CpuEngine and GpuEngine. These additions complete the IEngine interface for tensor matrix operations, establishing infrastructure for future GPU acceleration in autodiff computation graphs.
Key Changes:
- Added two new methods to the IEngine interface with comprehensive XML documentation
- Implemented CPU-based matrix multiplication and transpose for 2D tensors
- Implemented GPU-based versions with adaptive thresholds and type-specific delegation
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/Engines/IEngine.cs | Added TensorMatMul and TensorTranspose interface methods with detailed documentation explaining usage in autodiff computation graphs |
| src/Engines/CpuEngine.cs | Implemented standard matrix multiplication and transpose algorithms for 2D tensors with proper dimension validation |
| src/Engines/GpuEngine.cs | Implemented GPU-accelerated versions with adaptive thresholds, conversion helpers, and CPU fallback for unsupported types |
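The GpuEngine row above mentions conversion helpers; the following is an illustrative sketch of the element-by-element ToMatrix/ToTensor helpers as the review describes them (the Matrix<T> constructor and its Rows/Columns properties are assumptions about the library's API).

```csharp
// Illustrative 2D conversion helpers; element-by-element copies as described in the review.
private static Matrix<T> ToMatrixSketch<T>(Tensor<T> tensor)
{
    int rows = tensor.Shape[0], cols = tensor.Shape[1];
    var matrix = new Matrix<T>(rows, cols);            // assumed constructor
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            matrix[i, j] = tensor[i, j];
    return matrix;
}

private static Tensor<T> ToTensorSketch<T>(Matrix<T> matrix)
{
    var tensor = new Tensor<T>(new[] { matrix.Rows, matrix.Columns }); // assumed members
    for (int i = 0; i < matrix.Rows; i++)
        for (int j = 0; j < matrix.Columns; j++)
            tensor[i, j] = matrix[i, j];
    return tensor;
}
```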
- PR #509: Added comprehensive gradient tests for TaylorSoftmax and GumbelSoftmax including temperature scaling, hard mode, and validation
- PR #508: Verified Sparsemax threshold algorithm and SphericalSoftmax gradient implementation (correct standard algorithms)
- PR #504: Verified GpuEngine TensorMatMul/TensorTranspose threshold logic
- PR #500: Fixed 76+ redundant null check patterns in TensorOperations.cs using proper local variable approach for null safety instead of verbose nested if/else blocks
- Fixed CreateRandomTensor helper in tests to use proper Tensor constructor
- Added braces to if statements for proper block statements

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add GradHardSigmoid with proper masking for -3 < x < 3
- Add GradHardTanh with proper masking for minVal < x < maxVal
- Add GradSoftPlus with numerically stable implementation
- Fix Softplus forward pass: use max(0,x) + log(1+exp(-|x|)) formula
- Add comprehensive TensorMatMul/TensorTranspose tests (20 tests)

Addresses PR review comments for #499, #500, #503, #504, #508, #509

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
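The Softplus fix above refers to the numerically stable identity softplus(x) = max(0, x) + log(1 + exp(-|x|)), which is mathematically equal to log(1 + exp(x)); a minimal sketch:

```csharp
// Numerically stable softplus: equivalent to log(1 + exp(x)) but avoids overflow
// for large positive x, since exp(-|x|) never exceeds 1.
static double SoftPlus(double x) =>
    Math.Max(0.0, x) + Math.Log(1.0 + Math.Exp(-Math.Abs(x)));
```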
Summary
Verified and completed IEngine integration for tensor matrix operations. Added missing TensorMatMul and TensorTranspose methods to the IEngine interface to enable future GPU acceleration in autodiff computation graphs.

Key Findings:
- TensorOperations currently uses Tensor.MatrixMultiply() (not IEngine)
- TensorOperations currently uses Tensor.Transpose() (not IEngine)

Changes

IEngine Interface (src/Engines/IEngine.cs)
- Added TensorMatMul<T>(Tensor<T> a, Tensor<T> b) method
- Added TensorTranspose<T>(Tensor<T> tensor) method

CpuEngine (src/Engines/CpuEngine.cs)
- Implemented TensorMatMul with standard matrix multiplication algorithm
- Implemented TensorTranspose with row/column swap logic

GpuEngine (src/Engines/GpuEngine.cs)
- Implemented TensorMatMul delegating to Matrix operations with adaptive thresholds
- Implemented TensorTranspose delegating to Matrix operations
- Added ToMatrix<T> and ToTensor<T> helper methods for conversion

Architecture Impact
While TensorOperations doesn't use these methods yet (because ComputationNode lacks an Engine reference), this PR completes the IEngine interface for tensor operations and puts the infrastructure in place for future GPU acceleration of autodiff computation graphs.
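As a purely hypothetical illustration of that future integration, a TensorOperations-style helper might one day delegate to the engine along these lines; the Engine and Value properties on ComputationNode are assumptions and do not exist yet.

```csharp
// Hypothetical future wiring only: ComputationNode does not yet carry an Engine reference.
public static Tensor<T> MatMulViaEngine<T>(ComputationNode<T> a, ComputationNode<T> b)
{
    IEngine engine = a.Engine ?? new CpuEngine();  // assumed property and default engine
    return engine.TensorMatMul(a.Value, b.Value);  // instead of Tensor.MatrixMultiply
}
```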
Testing
Next Steps
Future work could include giving ComputationNode an Engine reference and updating TensorOperations to call IEngine's TensorMatMul/TensorTranspose instead of Tensor.MatrixMultiply and Tensor.Transpose.
Generated with Claude Code