fix: add tensormatmul and tensortranspose to iengine for proper integration #504
Conversation
Added missing IEngine methods for tensor-level matrix operations to enable GPU acceleration in autodiff computation graphs. TensorOperations currently uses Tensor.MatrixMultiply and Tensor.Transpose, which don't leverage IEngine, but this infrastructure enables future optimization.

Changes:
- Add TensorMatMul and TensorTranspose methods to the IEngine interface
- Implement both methods in CpuEngine with standard matrix algorithms
- Implement both methods in GpuEngine, delegating to Matrix operations
- Add helper methods ToMatrix/ToTensor for 2D tensor-matrix conversion

This completes the IEngine interface for tensor operations required by autodiff, even though TensorOperations doesn't use them yet because ComputationNode lacks an Engine reference.

Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
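For orientation, here is a minimal sketch of what the two added interface members could look like, based only on the names and types mentioned in this description; the actual signatures and XML documentation in src/Engines/IEngine.cs may differ.

```csharp
// Sketch only: shows just the two members this PR adds, not the full IEngine surface.
public interface IEngine
{
    /// <summary>Multiplies two 2D tensors (matrices) for autodiff computation graphs.</summary>
    Tensor<T> TensorMatMul<T>(Tensor<T> a, Tensor<T> b);

    /// <summary>Transposes a 2D tensor by swapping its rows and columns.</summary>
    Tensor<T> TensorTranspose<T>(Tensor<T> tensor);
}
```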
Summary by CodeRabbit

New Features

Walkthrough

Adds two new tensor operation methods — TensorMatMul and TensorTranspose — to the IEngine interface, with implementations in CpuEngine and GpuEngine.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant GpuEngine
    participant CpuEngine
    participant GPU
    Client->>GpuEngine: TensorMatMul(a, b)
    alt Shape is Small
        GpuEngine->>CpuEngine: TensorMatMul(a, b)
        CpuEngine-->>GpuEngine: result
    else Shape is Large & GPU Healthy
        GpuEngine->>GpuEngine: ToMatrix(a), ToMatrix(b)
        GpuEngine->>GPU: MatrixMultiply
        GPU-->>GpuEngine: matrix_result
        GpuEngine->>GpuEngine: ToTensor(matrix_result)
    else GPU Unavailable
        GpuEngine->>CpuEngine: TensorMatMul(a, b)
        CpuEngine-->>GpuEngine: result
    end
    GpuEngine-->>Client: Tensor<T> result
```
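The diagram above can be read as the following C# sketch of the GpuEngine dispatch path; the threshold constant, the _cpuEngine field, and the exception filter are illustrative assumptions rather than the exact implementation (the catch clause follows the repo guidance of using standard .NET exception types for GPU failures).

```csharp
// Illustrative dispatch sketch; GpuMatMulThreshold and _cpuEngine are assumed names.
public Tensor<T> TensorMatMul<T>(Tensor<T> a, Tensor<T> b)
{
    long workSize = (long)a.Shape[0] * a.Shape[1] * b.Shape[1];

    // Small shapes or an unhealthy GPU: transfer overhead dominates, use the CPU path.
    if (workSize < GpuMatMulThreshold || !_gpuHealthy)
        return _cpuEngine.TensorMatMul(a, b);

    try
    {
        // Large shapes on a healthy GPU: convert to matrices, multiply on the device, convert back.
        var result = MatrixMultiply(ToMatrix(a), ToMatrix(b));
        return ToTensor(result);
    }
    catch (InvalidOperationException)
    {
        // GPU unavailable or failed mid-operation: fall back to the CPU engine.
        return _cpuEngine.TensorMatMul(a, b);
    }
}
```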
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Engines/GpuEngine.cs (1)
5010-5023: Fix infinite recursion in RecordGpuFailure when max recovery attempts are exceeded

When _consecutiveFailures >= MaxRecoveryAttempts, RecordGpuFailure calls itself recursively:

```csharp
if (_consecutiveFailures >= MaxRecoveryAttempts)
{
    RecordGpuFailure(exception);
    return true;
}
```

This never changes state in a way that breaks the recursion and will lead to a stack overflow once the limit is crossed, completely breaking GPU failure handling and potentially the entire process.
The likely intent is to permanently disable GPU usage after the maximum attempts, not to recurse. Suggested fix:
```diff
 private bool RecordGpuFailure(Exception exception)
 {
 @@
-    // If we've exceeded maximum recovery attempts, permanently disable GPU
-    if (_consecutiveFailures >= MaxRecoveryAttempts)
-    {
-        RecordGpuFailure(exception);
-        return true;
-    }
+    // If we've exceeded maximum recovery attempts, permanently disable GPU
+    if (_consecutiveFailures >= MaxRecoveryAttempts)
+    {
+        _gpuHealthy = false;
+        Console.WriteLine($"[GpuEngine] GPU permanently disabled after {_consecutiveFailures} consecutive failures.");
+        return true;
+    }
```

This avoids recursion, correctly marks the GPU as permanently unhealthy, and keeps the rest of the health/recovery logic meaningful.
🧹 Nitpick comments (1)
src/Engines/GpuEngine.cs (1)
4036-4064: Consider more efficient 2D Tensor↔Matrix conversion for large shapes
ToMatrix/ToTensor are functionally correct and nicely encapsulate 2D conversions, but they perform element-by-element copies via nested indexer loops. For large matrices, this adds extra CPU overhead on top of the unavoidable host↔device copies.

If Tensor<T> and Matrix<T> share compatible row-major layouts (as they appear to), you could eventually:

- Expose spans from both types (AsSpan/AsWritableSpan) and
- Replace the nested loops with a single span copy where possible.

That would turn the conversions into a single linear copy instead of O(m*n) indexer calls, without changing semantics.

This is not blocking, but worth tracking as a perf improvement once the layout guarantees are confirmed.
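A minimal sketch of the suggested single-copy conversion, assuming both types are row-major and expose span accessors; AsSpan and AsWritableSpan are hypothetical names taken from the comment, not confirmed APIs, and the Matrix<T> constructor is likewise assumed.

```csharp
// Hypothetical span-based conversion: one linear copy instead of O(m*n) indexer calls.
private static Matrix<T> ToMatrixViaSpan<T>(Tensor<T> tensor)
{
    int rows = tensor.Shape[0];
    int cols = tensor.Shape[1];
    var matrix = new Matrix<T>(rows, cols);          // assumed constructor

    // Requires both types to guarantee contiguous row-major storage.
    tensor.AsSpan().CopyTo(matrix.AsWritableSpan()); // hypothetical accessors
    return matrix;
}
```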
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/Engines/CpuEngine.cs (1 hunks)
- src/Engines/GpuEngine.cs (1 hunks)
- src/Engines/IEngine.cs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-19T04:08:26.895Z
Learnt from: ooples
Repo: ooples/AiDotNet PR: 0
File: :0-0
Timestamp: 2025-11-19T04:08:26.895Z
Learning: For ILGPU GPU operations in GpuEngine.cs, use standard .NET exception types (InvalidOperationException, ArgumentException, OutOfMemoryException) instead of ILGPU-specific exception types, as ILGPU exception types may be version-specific. Combine with message-based filtering using ex.Message.Contains("device") or ex.Message.Contains("accelerator") as a fallback for GPU-specific errors.
Applied to files:
src/Engines/GpuEngine.cs
🧬 Code graph analysis (1)
src/Engines/IEngine.cs (1)
src/Engines/CpuEngine.cs (20)
T (266-279), T (282-300), T (303-312), T (532-548), Tensor (676-730), Tensor (733-774), Tensor (777-800), Tensor (803-822), Tensor (825-844), Tensor (847-866), Tensor (869-882), Tensor (885-910), Tensor (938-1004), Tensor (1007-1079), Tensor (1082-1170), Tensor (1209-1215), Tensor (1217-1223), Tensor (1225-1241), Tensor (1263-1268), Tensor (1270-1275)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build All Frameworks
🔇 Additional comments (3)
src/Engines/CpuEngine.cs (2)
732-774: LGTM! Correct standard matrix multiplication implementation.

The implementation correctly validates inputs, checks dimension compatibility, and performs standard O(n³) matrix multiplication. The logic appropriately mirrors the existing MatrixMultiply method (lines 387-417) but operates on Tensor<T> types.
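For reference, a sketch of the standard algorithm being described, specialized to double for brevity; the real CpuEngine.TensorMatMul is generic over T, and the shape-array constructor and 2D indexer used here are assumptions about the Tensor<T> API.

```csharp
// Standard O(n^3) matrix multiplication over 2D tensors (sketch, double-specialized).
public static Tensor<double> TensorMatMulSketch(Tensor<double> a, Tensor<double> b)
{
    if (a.Shape.Length != 2 || b.Shape.Length != 2)
        throw new ArgumentException("TensorMatMul requires 2D tensors.");
    if (a.Shape[1] != b.Shape[0])
        throw new ArgumentException($"Inner dimensions must match: {a.Shape[1]} vs {b.Shape[0]}.");

    int m = a.Shape[0], k = a.Shape[1], n = b.Shape[1];
    var result = new Tensor<double>(new[] { m, n }); // assumed shape-array constructor

    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
        {
            double sum = 0;
            for (int p = 0; p < k; p++)
                sum += a[i, p] * b[p, j]; // row-by-column dot product
            result[i, j] = sum;
        }

    return result;
}
```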
776-800: LGTM! Correct transpose implementation.

The implementation correctly validates inputs and performs 2D transpose by swapping indices. The logic appropriately mirrors the existing MatrixTranspose method (lines 449-464) but operates on Tensor<T> types.

src/Engines/IEngine.cs (1)
694-731: Interface and implementations are correct; integration to TensorOperations is pending future work.

The TensorMatMul<T> and TensorTranspose<T> interface declarations are well-documented with proper XML comments explaining their purpose in autodiff computation graphs. Both CpuEngine and GpuEngine provide implementations with appropriate adaptive execution logic and fallback mechanisms.

The search confirms that integration to TensorOperations is not yet complete (as already noted in the PR description), with TensorOperations still calling Tensor.MatrixMultiply() and Tensor.Transpose() directly rather than delegating to IEngine. This aligns with the PR's design decision to add the infrastructure now and integrate it in future work.
Pull request overview
This PR adds TensorMatMul and TensorTranspose methods to the IEngine interface and implements them in both CpuEngine and GpuEngine. These additions complete the IEngine interface for tensor matrix operations, establishing infrastructure for future GPU acceleration in autodiff computation graphs.
Key Changes:
- Added two new methods to the IEngine interface with comprehensive XML documentation
- Implemented CPU-based matrix multiplication and transpose for 2D tensors
- Implemented GPU-based versions with adaptive thresholds and type-specific delegation
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/Engines/IEngine.cs | Added TensorMatMul and TensorTranspose interface methods with detailed documentation explaining usage in autodiff computation graphs |
| src/Engines/CpuEngine.cs | Implemented standard matrix multiplication and transpose algorithms for 2D tensors with proper dimension validation |
| src/Engines/GpuEngine.cs | Implemented GPU-accelerated versions with adaptive thresholds, conversion helpers, and CPU fallback for unsupported types |
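The GpuEngine row above mentions conversion helpers; the following is an illustrative sketch of the element-by-element ToMatrix/ToTensor helpers as the review describes them (the Matrix<T> constructor and its Rows/Columns properties are assumptions about the library's API).

```csharp
// Illustrative 2D conversion helpers; element-by-element copies as described in the review.
private static Matrix<T> ToMatrixSketch<T>(Tensor<T> tensor)
{
    int rows = tensor.Shape[0], cols = tensor.Shape[1];
    var matrix = new Matrix<T>(rows, cols);            // assumed constructor
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            matrix[i, j] = tensor[i, j];
    return matrix;
}

private static Tensor<T> ToTensorSketch<T>(Matrix<T> matrix)
{
    var tensor = new Tensor<T>(new[] { matrix.Rows, matrix.Columns }); // assumed members
    for (int i = 0; i < matrix.Rows; i++)
        for (int j = 0; j < matrix.Columns; j++)
            tensor[i, j] = matrix[i, j];
    return tensor;
}
```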
- PR #509: Added comprehensive gradient tests for TaylorSoftmax and GumbelSoftmax including temperature scaling, hard mode, and validation
- PR #508: Verified Sparsemax threshold algorithm and SphericalSoftmax gradient implementation (correct standard algorithms)
- PR #504: Verified GpuEngine TensorMatMul/TensorTranspose threshold logic
- PR #500: Fixed 76+ redundant null check patterns in TensorOperations.cs using proper local variable approach for null safety instead of verbose nested if/else blocks
- Fixed CreateRandomTensor helper in tests to use proper Tensor constructor
- Added braces to if statements for proper block statements

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add GradHardSigmoid with proper masking for -3 < x < 3
- Add GradHardTanh with proper masking for minVal < x < maxVal
- Add GradSoftPlus with numerically stable implementation
- Fix Softplus forward pass: use max(0,x) + log(1+exp(-|x|)) formula
- Add comprehensive TensorMatMul/TensorTranspose tests (20 tests)

Addresses PR review comments for #499, #500, #503, #504, #508, #509

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
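The Softplus fix above refers to the numerically stable identity softplus(x) = max(0, x) + log(1 + exp(-|x|)), which is mathematically equal to log(1 + exp(x)); a minimal sketch:

```csharp
// Numerically stable softplus: equivalent to log(1 + exp(x)) but avoids overflow
// for large positive x, since exp(-|x|) never exceeds 1.
static double SoftPlus(double x) =>
    Math.Max(0.0, x) + Math.Log(1.0 + Math.Exp(-Math.Abs(x)));
```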
Summary
Verified and completed IEngine integration for tensor matrix operations. Added missing TensorMatMul and TensorTranspose methods to the IEngine interface to enable future GPU acceleration in autodiff computation graphs.

Key Findings:
- TensorOperations currently uses Tensor.MatrixMultiply() (not IEngine)
- TensorOperations currently uses Tensor.Transpose() (not IEngine)

Changes

IEngine Interface (src/Engines/IEngine.cs)
- Added TensorMatMul<T>(Tensor<T> a, Tensor<T> b) method
- Added TensorTranspose<T>(Tensor<T> tensor) method

CpuEngine (src/Engines/CpuEngine.cs)
- Implemented TensorMatMul with standard matrix multiplication algorithm
- Implemented TensorTranspose with row/column swap logic

GpuEngine (src/Engines/GpuEngine.cs)
- Implemented TensorMatMul delegating to Matrix operations with adaptive thresholds
- Implemented TensorTranspose delegating to Matrix operations
- Added ToMatrix<T> and ToTensor<T> helper methods for conversion

Architecture Impact
While TensorOperations doesn't use these methods yet (because ComputationNode lacks an Engine reference), this PR completes the IEngine interface for tensor operations and puts the infrastructure in place for future GPU acceleration of autodiff computation graphs.
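As a purely hypothetical illustration of that future integration, a TensorOperations-style helper might one day delegate to the engine along these lines; the Engine and Value properties on ComputationNode are assumptions and do not exist yet.

```csharp
// Hypothetical future wiring only: ComputationNode does not yet carry an Engine reference.
public static Tensor<T> MatMulViaEngine<T>(ComputationNode<T> a, ComputationNode<T> b)
{
    IEngine engine = a.Engine ?? new CpuEngine();  // assumed property and default engine
    return engine.TensorMatMul(a.Value, b.Value);  // instead of Tensor.MatrixMultiply
}
```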
Testing
Next Steps
Future work could include giving ComputationNode an Engine reference and updating TensorOperations to call IEngine's TensorMatMul/TensorTranspose instead of Tensor.MatrixMultiply and Tensor.Transpose.
Generated with Claude Code