Conversation

@ooples (Owner) commented Nov 24, 2025

Summary

Verified and completed IEngine integration for tensor matrix operations. Added missing TensorMatMul and TensorTranspose methods to the IEngine interface to enable future GPU acceleration in autodiff computation graphs.

Key Findings:

  • TensorOperations.MatrixMultiply currently uses Tensor.MatrixMultiply() (not IEngine)
  • TensorOperations.Transpose currently uses Tensor.Transpose() (not IEngine)
  • ComputationNode lacks an Engine property, preventing direct IEngine usage in TensorOperations
  • DenseLayer ExportComputationGraph method not present in this branch

Changes

IEngine Interface (src/Engines/IEngine.cs)

  • Added TensorMatMul<T>(Tensor<T> a, Tensor<T> b) method
  • Added TensorTranspose<T>(Tensor<T> tensor) method
  • Both methods documented with full XML comments (illustrative signatures sketched below)
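
A minimal sketch of what the new interface members might look like (XML docs and namespace omitted; return types inferred from the usage described elsewhere in this PR):

public interface IEngine
{
    // ... existing Matrix/Vector operations ...

    // Multiplies two 2D tensors: a is [m, k], b is [k, n], result is [m, n].
    Tensor<T> TensorMatMul<T>(Tensor<T> a, Tensor<T> b);

    // Transposes a 2D tensor, swapping rows and columns.
    Tensor<T> TensorTranspose<T>(Tensor<T> tensor);
}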

CpuEngine (src/Engines/CpuEngine.cs)

  • Implemented TensorMatMul with standard matrix multiplication algorithm
  • Implemented TensorTranspose with row/column swap logic
  • Both methods validate that inputs are 2D tensors with compatible dimensions (see the rough sketch below)
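
A rough sketch of the CPU path described above, assuming Tensor<T> exposes a Shape array, a two-index indexer, and a shape-array constructor, and that generic arithmetic goes through a numeric-operations helper (MathHelper.GetNumericOperations is an illustrative name; the actual members in the repository may differ):

public Tensor<T> TensorMatMul<T>(Tensor<T> a, Tensor<T> b)
{
    if (a == null) throw new ArgumentNullException(nameof(a));
    if (b == null) throw new ArgumentNullException(nameof(b));
    if (a.Shape.Length != 2 || b.Shape.Length != 2)
        throw new ArgumentException("TensorMatMul requires 2D tensors.");
    if (a.Shape[1] != b.Shape[0])
        throw new ArgumentException("Inner dimensions must match.");

    var ops = MathHelper.GetNumericOperations<T>(); // illustrative helper name
    int m = a.Shape[0], k = a.Shape[1], n = b.Shape[1];
    var result = new Tensor<T>(new[] { m, n });

    // Standard O(m*k*n) triple-nested loop.
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
        {
            T sum = ops.Zero;
            for (int p = 0; p < k; p++)
                sum = ops.Add(sum, ops.Multiply(a[i, p], b[p, j]));
            result[i, j] = sum;
        }

    return result;
}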

GpuEngine (src/Engines/GpuEngine.cs)

  • Implemented TensorMatMul delegating to Matrix operations with adaptive thresholds
  • Implemented TensorTranspose delegating to Matrix operations
  • Added ToMatrix<T> and ToTensor<T> helper methods for conversion
  • Uses GPU acceleration for large tensors, with CPU fallback for small ones (routing sketched below)
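
An illustrative view of that routing (the threshold constant, health flag, and fallback field names below are placeholders, not the actual GpuEngine members):

public Tensor<T> TensorTranspose<T>(Tensor<T> tensor)
{
    // Small shapes or an unhealthy GPU fall back to the CPU implementation.
    bool isSmall = tensor.Shape[0] * tensor.Shape[1] < SmallShapeThreshold;
    if (isSmall || !_gpuHealthy)
        return _cpuFallback.TensorTranspose(tensor);

    var matrix = ToMatrix(tensor);            // 2D Tensor<T> -> Matrix<T>
    var transposed = MatrixTranspose(matrix); // existing GPU matrix path
    return ToTensor(transposed);              // Matrix<T> -> 2D Tensor<T>
}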

Architecture Impact

While TensorOperations doesn't use these methods yet (due to ComputationNode lacking an Engine reference), this PR:

  1. Completes the IEngine interface for tensor operations
  2. Provides infrastructure for future optimization
  3. Maintains consistency between Matrix and Tensor operation APIs
  4. Enables GPU acceleration pathways when autodiff is refactored

Testing

  • Build succeeds for all target frameworks (net462, net471, net8.0)
  • No compilation errors
  • IEngine interface now complete for tensor operations

Next Steps

Future work could include:

  1. Add an Engine property to ComputationNode
  2. Update TensorOperations to use the IEngine methods directly (a rough sketch of both steps follows this list)
  3. Update Tensor.MatrixMultiply and Tensor.Transpose to delegate to IEngine
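
A rough sketch of how steps 1 and 2 could fit together (the ComputationNode members and the engine-resolution strategy shown here are assumptions, not existing code):

public class ComputationNode<T>
{
    public Tensor<T> Value { get; set; }
    public IEngine Engine { get; set; } = new CpuEngine(); // assumed default
    // ... parents, gradient, and backward function omitted ...
}

// TensorOperations.MatrixMultiply could then delegate to the node's engine
// instead of calling Tensor.MatrixMultiply() directly.
public static ComputationNode<T> MatrixMultiply<T>(ComputationNode<T> a, ComputationNode<T> b)
{
    var engine = a.Engine; // how the engine is resolved is an open design question
    var value = engine.TensorMatMul(a.Value, b.Value);
    return new ComputationNode<T> { Value = value, Engine = engine };
    // (Gradient/backward registration omitted; it would remain as it is today.)
}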

Generated with Claude Code

…ration

Added missing IEngine methods for tensor-level matrix operations to enable
GPU acceleration in autodiff computation graphs. TensorOperations currently
uses Tensor.MatrixMultiply and Tensor.Transpose which don't leverage IEngine,
but this infrastructure enables future optimization.

Changes:
- Add TensorMatMul and TensorTranspose methods to IEngine interface
- Implement both methods in CpuEngine with standard matrix algorithms
- Implement both methods in GpuEngine delegating to Matrix operations
- Add helper methods ToMatrix/ToTensor for 2D tensor-matrix conversion

This completes the IEngine interface for tensor operations required by
autodiff, even though TensorOperations doesn't use them yet due to
ComputationNode lacking an Engine reference.

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai bot (Contributor) commented Nov 24, 2025

Summary by CodeRabbit

New Features

  • Added 2D tensor matrix multiplication operation with CPU and GPU backend support
  • Added 2D tensor transpose operation with CPU and GPU backend support
  • Both operations include automatic backend routing with fallback capabilities


Walkthrough

Adds two new tensor operation methods—TensorMatMul for matrix multiplication and TensorTranspose for tensor transposition—to the IEngine interface and both CpuEngine and GpuEngine implementations. GpuEngine includes adaptive routing to CPU for small shapes, while CpuEngine provides direct CPU implementations.

Changes

Interface Definitions (src/Engines/IEngine.cs)
Declares two new public generic methods, TensorMatMul<T> and TensorTranspose<T>, with comprehensive XML documentation describing their 2D tensor operation behavior. Note: the methods appear duplicated in the interface definition.

CPU Implementation (src/Engines/CpuEngine.cs)
Implements TensorMatMul<T> with input validation (null checks, rank verification, inner-dimension compatibility) and triple-nested-loop multiplication; implements TensorTranspose<T> with validation and a transpose loop.

GPU Implementation (src/Engines/GpuEngine.cs)
Implements TensorMatMul<T> and TensorTranspose<T> with adaptive routing (small shapes fall back to CPU; otherwise the GPU is used if available and healthy). Adds private helper methods ToMatrix<T> and ToTensor<T> to convert between Tensor and Matrix representations for GPU processing. Includes type-specialized handling for float and double.

Sequence Diagram

sequenceDiagram
    actor Client
    participant GpuEngine
    participant CpuEngine
    participant GPU

    Client->>GpuEngine: TensorMatMul(a, b)
    alt Shape is Small
        GpuEngine->>CpuEngine: TensorMatMul(a, b)
        CpuEngine-->>GpuEngine: result
    else Shape is Large & GPU Healthy
        GpuEngine->>GpuEngine: ToMatrix(a), ToMatrix(b)
        GpuEngine->>GPU: MatrixMultiply
        GPU-->>GpuEngine: matrix_result
        GpuEngine->>GpuEngine: ToTensor(matrix_result)
    else GPU Unavailable
        GpuEngine->>CpuEngine: TensorMatMul(a, b)
        CpuEngine-->>GpuEngine: result
    end
    GpuEngine-->>Client: Tensor<T> result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • GpuEngine adaptive routing logic: Verify correct shape threshold for CPU fallback and GPU health checks
  • Type specialization in GpuEngine: Confirm float/double handling and type casting in conversion helpers
  • Interface duplication: Investigate why TensorMatMul and TensorTranspose appear twice in IEngine
  • Input validation consistency: Cross-check null checks and rank validation across CpuEngine and GpuEngine implementations
  • Tensor-Matrix conversion helpers: Validate correctness of ToMatrix and ToTensor bidirectional conversions

Poem

🐰 Two tensors dance, seeking dimensions bright,
Transposed and multiplied with CPU might!
When GPU's ready, with adaptive grace,
Small shapes hop back to the safe CPU space. ✨

Pre-merge checks

✅ Passed checks (3 passed)

  • Title check (✅ Passed): The title accurately summarizes the main change: adding TensorMatMul and TensorTranspose methods to the IEngine interface for proper integration.
  • Description check (✅ Passed): The description is well-structured and directly related to the changeset, providing context about the changes to IEngine, CpuEngine, and GpuEngine with clear explanations of the implementation approaches.
  • Docstring Coverage (✅ Passed): Docstring coverage is 84.62%, which meets the required threshold of 80.00%.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Engines/GpuEngine.cs (1)

5010-5023: Fix infinite recursion in RecordGpuFailure when max recovery attempts are exceeded

When _consecutiveFailures >= MaxRecoveryAttempts, RecordGpuFailure calls itself recursively:

if (_consecutiveFailures >= MaxRecoveryAttempts)
{
    RecordGpuFailure(exception);
    return true;
}

This never changes state in a way that breaks the recursion and will lead to a stack overflow once the limit is crossed, completely breaking GPU failure handling and potentially the entire process.

The likely intent is to permanently disable GPU usage after the maximum attempts, not to recurse. Suggested fix:

     private bool RecordGpuFailure(Exception exception)
     {
@@
-            // If we've exceeded maximum recovery attempts, permanently disable GPU
-            if (_consecutiveFailures >= MaxRecoveryAttempts)
-            {
-                RecordGpuFailure(exception);
-                return true;
-            }
+            // If we've exceeded maximum recovery attempts, permanently disable GPU
+            if (_consecutiveFailures >= MaxRecoveryAttempts)
+            {
+                _gpuHealthy = false;
+                Console.WriteLine($"[GpuEngine] GPU permanently disabled after {_consecutiveFailures} consecutive failures.");
+                return true;
+            }

This avoids recursion, correctly marks the GPU as permanently unhealthy, and keeps the rest of the health/recovery logic meaningful.

🧹 Nitpick comments (1)
src/Engines/GpuEngine.cs (1)

4036-4064: Consider more efficient 2D Tensor↔Matrix conversion for large shapes

ToMatrix/ToTensor are functionally correct and nicely encapsulate 2D conversions, but they perform element‑by‑element copies via nested indexer loops. For large matrices, this adds extra CPU overhead on top of the unavoidable host↔device copies.

If Tensor<T> and Matrix<T> share compatible row‑major layouts (as they appear to), you could eventually:

  • Expose spans from both types (AsSpan/AsWritableSpan) and
  • Replace the nested loops with a single span copy where possible.

That would turn the conversions into a single linear copy instead of O(m*n) indexer calls, without changing semantics.

This is not blocking, but worth tracking as a perf improvement once the layout guarantees are confirmed.
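
For illustration, such a span-based conversion might look like the following, where AsSpan and AsWritableSpan are hypothetical members assumed to expose the same row-major backing storage:

private static Matrix<T> ToMatrix<T>(Tensor<T> tensor)
{
    // Hypothetical: relies on Tensor<T>.AsSpan() and Matrix<T>.AsWritableSpan()
    // sharing a row-major layout, so one linear copy replaces the nested loops.
    var matrix = new Matrix<T>(tensor.Shape[0], tensor.Shape[1]);
    tensor.AsSpan().CopyTo(matrix.AsWritableSpan());
    return matrix;
}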

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef6df16 and 1db47cd.

📒 Files selected for processing (3)
  • src/Engines/CpuEngine.cs (1 hunks)
  • src/Engines/GpuEngine.cs (1 hunks)
  • src/Engines/IEngine.cs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-19T04:08:26.895Z
Learnt from: ooples
Repo: ooples/AiDotNet PR: 0
File: :0-0
Timestamp: 2025-11-19T04:08:26.895Z
Learning: For ILGPU GPU operations in GpuEngine.cs, use standard .NET exception types (InvalidOperationException, ArgumentException, OutOfMemoryException) instead of ILGPU-specific exception types, as ILGPU exception types may be version-specific. Combine with message-based filtering using ex.Message.Contains("device") or ex.Message.Contains("accelerator") as a fallback for GPU-specific errors.

Applied to files:

  • src/Engines/GpuEngine.cs
🧬 Code graph analysis (1)
src/Engines/IEngine.cs (1)
src/Engines/CpuEngine.cs (20)
  • T (266-279)
  • T (282-300)
  • T (303-312)
  • T (532-548)
  • Tensor (676-730)
  • Tensor (733-774)
  • Tensor (777-800)
  • Tensor (803-822)
  • Tensor (825-844)
  • Tensor (847-866)
  • Tensor (869-882)
  • Tensor (885-910)
  • Tensor (938-1004)
  • Tensor (1007-1079)
  • Tensor (1082-1170)
  • Tensor (1209-1215)
  • Tensor (1217-1223)
  • Tensor (1225-1241)
  • Tensor (1263-1268)
  • Tensor (1270-1275)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build All Frameworks
🔇 Additional comments (3)
src/Engines/CpuEngine.cs (2)

732-774: LGTM! Correct standard matrix multiplication implementation.

The implementation correctly validates inputs, checks dimension compatibility, and performs standard O(n³) matrix multiplication. The logic appropriately mirrors the existing MatrixMultiply method (lines 387-417) but operates on Tensor<T> types.


776-800: LGTM! Correct transpose implementation.

The implementation correctly validates inputs and performs 2D transpose by swapping indices. The logic appropriately mirrors the existing MatrixTranspose method (lines 449-464) but operates on Tensor<T> types.

src/Engines/IEngine.cs (1)

694-731: Interface and implementations are correct; integration to TensorOperations is pending future work.

The TensorMatMul<T> and TensorTranspose<T> interface declarations are well-documented with proper XML comments explaining their purpose in autodiff computation graphs. Both CpuEngine and GpuEngine provide implementations with appropriate adaptive execution logic and fallback mechanisms.

The search confirms that integration to TensorOperations is not yet complete (as already noted in the PR description), with TensorOperations still calling Tensor.MatrixMultiply() and Tensor.Transpose() directly rather than delegating to IEngine. This aligns with the PR's design decision to add the infrastructure now and integrate it in future work.

Copilot AI review requested due to automatic review settings December 1, 2025 02:06
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds TensorMatMul and TensorTranspose methods to the IEngine interface and implements them in both CpuEngine and GpuEngine. These additions complete the IEngine interface for tensor matrix operations, establishing infrastructure for future GPU acceleration in autodiff computation graphs.

Key Changes:

  • Added two new methods to IEngine interface with comprehensive XML documentation
  • Implemented CPU-based matrix multiplication and transpose for 2D tensors
  • Implemented GPU-based versions with adaptive thresholds and type-specific delegation

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

  • src/Engines/IEngine.cs: Added TensorMatMul and TensorTranspose interface methods with detailed documentation explaining their usage in autodiff computation graphs
  • src/Engines/CpuEngine.cs: Implemented standard matrix multiplication and transpose algorithms for 2D tensors with proper dimension validation
  • src/Engines/GpuEngine.cs: Implemented GPU-accelerated versions with adaptive thresholds, conversion helpers, and CPU fallback for unsupported types


@ooples closed this Dec 1, 2025
ooples pushed a commit that referenced this pull request Dec 1, 2025
- PR #509: Added comprehensive gradient tests for TaylorSoftmax and
  GumbelSoftmax including temperature scaling, hard mode, and validation
- PR #508: Verified Sparsemax threshold algorithm and SphericalSoftmax
  gradient implementation (correct standard algorithms)
- PR #504: Verified GpuEngine TensorMatMul/TensorTranspose threshold logic
- PR #500: Fixed 76+ redundant null check patterns in TensorOperations.cs
  using proper local variable approach for null safety instead of verbose
  nested if/else blocks
- Fixed CreateRandomTensor helper in tests to use proper Tensor constructor
- Added braces to if statements for proper block statements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
ooples pushed a commit that referenced this pull request Dec 1, 2025
- Add GradHardSigmoid with proper masking for -3 < x < 3
- Add GradHardTanh with proper masking for minVal < x < maxVal
- Add GradSoftPlus with numerically stable implementation
- Fix Softplus forward pass: use max(0,x) + log(1+exp(-|x|)) formula
- Add comprehensive TensorMatMul/TensorTranspose tests (20 tests)

Addresses PR review comments for #499, #500, #503, #504, #508, #509

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>