feat(Story-2): Add IR Operations for ReLU Family (Group 1) #498
Conversation
Added IROp interface and IR operation classes for 8 ReLU-family activations:

- ReLUOp (uses IEngine.ReLU)
- GeluOp (uses IEngine.GELU)
- EluOp (uses IEngine.ELU with alpha parameter)
- SeluOp (with α and λ constants for self-normalizing networks)
- CeluOp (continuously differentiable ELU with alpha parameter)
- LeakyReLUOp (parameterized negative slope, default 0.01)
- PReLUOp (learnable per-channel parameters)
- RReLUOp (randomized slope for regularization)
- ThresholdedReLUOp (sparse activations above threshold)

Also included SigmoidOp and TanhOp for completeness.

These enable JIT compilation for layers using these activation functions. Forward() methods use IEngine where available; others are marked for future implementation. Backward() methods are placeholders for gradient support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
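For context, a minimal sketch of the marker interface described above, based on the review notes that it lives in `src/Interfaces/IJitCompilable.cs`, uses `namespace AiDotNet.Interfaces;`, and has no members (the XML doc wording here is illustrative, not the PR's actual comment):

```csharp
namespace AiDotNet.Interfaces;

/// <summary>
/// Marker interface for operations that can participate in JIT compilation.
/// Sketch only: the real interface is defined in this PR and has no members.
/// </summary>
public interface IROp
{
}
```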
Summary by CodeRabbit
Walkthrough: Adds a public marker interface (`IROp`) for JIT-compilable operations and a new set of activation IR operation classes.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ActivationOp as Activation Operator
    participant Engine as IEngine
    participant Tensor
    Client->>ActivationOp: Forward(input Tensor)
    ActivationOp->>ActivationOp: validate input
    ActivationOp->>Engine: call activation (e.g., ReLU/GELU/ELU...)
    Engine->>Tensor: compute output Tensor
    Tensor-->>ActivationOp: return output
    ActivationOp-->>Client: return output Tensor
    Note right of ActivationOp: Backward currently throws NotImplementedException
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
- ❌ Failed checks (1 warning)
- ✅ Passed checks (2 passed)
- ✨ Finishing touches
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/Interfaces/IJitCompilable.cs (1)
1-1: Remove unused using directive. The `using AiDotNet.LinearAlgebra;` directive is not used in this file since the interface is a marker with no members.

```diff
-using AiDotNet.LinearAlgebra;
-
 namespace AiDotNet.Interfaces;
```

src/JIT/ActivationOps.cs (3)
103-109: Consider validating the alpha parameter. The `alpha` parameter accepts any double value without validation. While negative or zero values won't cause runtime errors, they may produce unexpected behavior:
- Negative alpha inverts the ELU curve for negative inputs
- Zero alpha makes ELU equivalent to ReLU
If these edge cases are intentional, consider documenting them. Otherwise, add validation.
```diff
 public EluOp(IEngine engine, double alpha = 1.0)
 {
     if (engine == null)
         throw new ArgumentNullException(nameof(engine));
+    if (alpha <= 0)
+        throw new ArgumentException("Alpha must be positive", nameof(alpha));
     _engine = engine;
     _alpha = alpha;
 }
```
277-277: Make the field readonly. The `_alpha` field is only assigned in the constructor and could be marked `readonly` to indicate immutability.

```diff
- private Tensor<T> _alpha;
+ private readonly Tensor<T> _alpha;
```
372-378: Consider documenting or validating the threshold parameter. The `threshold` parameter accepts any double value without validation. While negative thresholds won't cause runtime errors, they may produce unexpected behavior (e.g., a negative threshold would allow activation for almost all values). If this is intentional, consider documenting the expected range.
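One possible way to address this, mirroring the alpha suggestion above. The constructor signature, the default value of 1.0, and the `_engine`/`_threshold` field names shown here are assumptions for illustration, not the PR's actual code:

```csharp
/// <param name="threshold">Activation cutoff: inputs at or below it map to zero. Expected to be non-negative.</param>
public ThresholdedReLUOp(IEngine engine, double threshold = 1.0) // default value is assumed
{
    if (engine == null)
        throw new ArgumentNullException(nameof(engine));
    if (threshold < 0)
        throw new ArgumentException("Threshold should be non-negative.", nameof(threshold));

    _engine = engine;
    _threshold = threshold;
}
```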
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/Interfaces/IJitCompilable.cs (1 hunks)
- src/JIT/ActivationOps.cs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build All Frameworks
🔇 Additional comments (1)
src/JIT/ActivationOps.cs (1)
200-207: Forward method implementations deferred as expected. Several activation operations (CELU, LeakyReLU, PReLU, RReLU, ThresholdedReLU) have their `Forward` methods throwing `NotImplementedException` with clear explanations about missing IEngine capabilities. This is consistent with the PR description indicating these are marked for future implementation. The error messages clearly communicate what's needed:
- Element-wise conditional operations
- Broadcasting operations
- Random number generation
This provides a clear roadmap for future work.
Also applies to: 245-251, 289-295, 338-344, 380-386
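To make the first of those missing capabilities concrete, here is a minimal standalone illustration of the element-wise conditional that LeakyReLU needs. It operates on a plain `double[]` rather than the project's `Tensor<T>`/`IEngine` types, which are not shown in this review:

```csharp
// Standalone sketch: the per-element conditional an engine primitive would have to provide.
static double[] LeakyReLU(double[] x, double negativeSlope = 0.01)
{
    var y = new double[x.Length];
    for (int i = 0; i < x.Length; i++)
        y[i] = x[i] > 0 ? x[i] : negativeSlope * x[i];
    return y;
}
```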
Replace unsafe `(T)(object)Lambda` cast with proper `_numOps.FromDouble(Lambda)` pattern using `INumericOperations<T>` to safely convert double constants to generic type T.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
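The reason the boxed cast is unsafe: unboxing requires the exact runtime type, so casting a boxed `double` to `float` (or `decimal`, etc.) throws at runtime. The self-contained demo below illustrates this; the `FromDouble` helper here is a stand-in written with `Convert.ChangeType` and is not the library's `INumericOperations<T>.FromDouble` implementation:

```csharp
using System;

static class NumericCastDemo
{
    // Standard SELU lambda, used here only as an example double constant.
    const double Lambda = 1.0507009873554805;

    // The problematic pattern: box the double, then unbox it as T.
    static T UnsafeCast<T>(double value) => (T)(object)value;

    // Stand-in for a FromDouble-style conversion (not the library's implementation).
    static T FromDouble<T>(double value) => (T)Convert.ChangeType(value, typeof(T));

    static void Main()
    {
        Console.WriteLine(FromDouble<float>(Lambda));   // ~1.050701 — converts fine

        try
        {
            Console.WriteLine(UnsafeCast<float>(Lambda)); // boxed double unboxed as float
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException: cannot unbox a boxed double as float.");
        }
    }
}
```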
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/JIT/ActivationOps.cs (2)
232-399: Advanced ReLU variants: parameter validation is good; stubs are acceptable but consider usage guards. LeakyReLU, PReLU, RReLU, and ThresholdedReLU all validate constructor inputs (e.g., non-negative slopes, valid bounds, non-null `alpha`) and then throw `NotImplementedException` in `Forward`/`Backward`. That's fine as long as no current JIT path emits these ops yet, but it will surface as runtime failures if they accidentally enter a graph.

Two small improvements you might consider (low priority):
- Make it explicit in higher-level IR/JIT builders that these ops are not yet supported, so they’re never constructed at runtime.
- Optionally add a brief XML `<remarks>` note per class stating that only the IR "shape" is defined and execution will throw until engine support is added (see the sketch below).

No code changes are strictly required for this PR, but clarifying their "stub" status would reduce surprises later.
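For the second point, a brief doc comment along these lines would do; the wording is only a suggestion:

```csharp
/// <remarks>
/// Only the IR "shape" of this operation is defined here. Forward and Backward
/// throw NotImplementedException until the required IEngine support is added.
/// </remarks>
```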
411-479: Sigmoid/Tanh ops correctly delegate to engine and follow the shared pattern
`SigmoidOp` and `TanhOp` mirror the pattern of the other simple activations: ctor null-checks, `Forward` delegating to `_engine.Sigmoid`/`_engine.Tanh`, and explicit `NotImplementedException` for `Backward`. This is clean and consistent.

If the number of such ops grows further, you might later extract a small base class to share the `IEngine` field and argument null-checking, but that's purely a convenience refactor.
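If that refactor is ever picked up, it could look roughly like the sketch below. `IROp`, `IEngine`, and `Tensor<T>` are the PR's own types; the exact generic constraints and method signatures here are assumptions:

```csharp
using System;

// Sketch of a shared base for the simple engine-delegating activation ops.
public abstract class UnaryActivationOp<T> : IROp
{
    protected readonly IEngine Engine;

    protected UnaryActivationOp(IEngine engine)
    {
        Engine = engine ?? throw new ArgumentNullException(nameof(engine));
    }

    public Tensor<T> Forward(Tensor<T> input)
    {
        if (input == null)
            throw new ArgumentNullException(nameof(input));
        return Apply(input);
    }

    // Each subclass supplies its engine call, e.g. Engine.Sigmoid(input) or Engine.Tanh(input).
    protected abstract Tensor<T> Apply(Tensor<T> input);

    public Tensor<T> Backward(Tensor<T> outputGradient)
        => throw new NotImplementedException("Gradient support is not implemented yet.");
}
```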
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/JIT/ActivationOps.cs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: CodeQL analysis (csharp)
- GitHub Check: Agent
- GitHub Check: Build All Frameworks
🔇 Additional comments (2)
src/JIT/ActivationOps.cs (2)
18-87: Simple ReLU/GELU IR ops look correct and consistent. Constructor null-checks, generic constraints, and `Forward` delegations to `_engine.ReLU`/`_engine.GELU` are straightforward and consistent with the intended IROp pattern; `Backward` stubs clearly signal incomplete gradient support without hiding failures.
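For readers without the diff open, the pattern being described looks roughly like the sketch below (reconstructed from the review comments; the generic parameter placement and exact signatures of `IEngine.ReLU`, `Forward`, and `Backward` are assumptions):

```csharp
using System;

// Sketch of the simple engine-delegating pattern described above (ReLUOp shown).
public class ReLUOp<T> : IROp
{
    private readonly IEngine _engine;

    public ReLUOp(IEngine engine)
    {
        _engine = engine ?? throw new ArgumentNullException(nameof(engine));
    }

    // Forward delegates directly to the engine's ReLU kernel.
    public Tensor<T> Forward(Tensor<T> input)
    {
        if (input == null)
            throw new ArgumentNullException(nameof(input));
        return _engine.ReLU(input);
    }

    // Gradient support is deliberately not implemented yet.
    public Tensor<T> Backward(Tensor<T> outputGradient)
        => throw new NotImplementedException("Backward for ReLUOp is not implemented yet.");
}
```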
99-175: SELU implementation and numeric conversion now look sound. Using `MathHelper.GetNumericOperations<T>()` plus `_numOps.FromDouble(Lambda)` to scale the ELU result avoids the earlier boxed cast issue and should be safe for supported numeric `T` types, assuming `MathHelper` only returns non-null operations for valid tensor element types.
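For reference, the λ-scaling step corresponds to the standard SELU definition (Klambauer et al., 2017), assuming the PR uses the usual self-normalizing constants:

$$\operatorname{selu}(x) = \lambda \cdot \operatorname{ELU}_{\alpha}(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}, \qquad \alpha \approx 1.6733,\; \lambda \approx 1.0507$$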
Pull request overview
This pull request introduces an Intermediate Representation (IR) layer for activation functions, adding 11 new operation classes for ReLU-family and common activations. The IR operations serve as a bridge between high-level neural network layers and low-level execution engines, with support for JIT compilation.
Key Changes:
- Added `IROp` marker interface for JIT-compilable operations
- Implemented 11 activation operation classes (ReLU, GELU, ELU, SELU, CELU, LeakyReLU, PReLU, RReLU, ThresholdedReLU, Sigmoid, Tanh)
- Six operations have fully implemented forward passes using existing IEngine methods; five are marked for future implementation
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/Interfaces/IJitCompilable.cs | Introduces the IROp marker interface for operations that can be JIT-compiled |
| src/JIT/ActivationOps.cs | Implements 11 activation operation classes with forward/backward method stubs and comprehensive XML documentation |
Comments suppressed due to low confidence (1)
src/JIT/ActivationOps.cs:279
- Field '_alpha' can be 'readonly'.
`private Tensor<T> _alpha;`
Story 2: IR Operations - ReLU Family
Added 8 IR operation classes for ReLU-family activations plus 2 additional common activations.
Activations Added:
- ReLU, GELU, ELU, SELU, CELU, LeakyReLU, PReLU, RReLU, ThresholdedReLU, plus Sigmoid and Tanh

Pattern:
- Constructor takes `IEngine` (null-checked); `Forward()` delegates to the engine where a matching method exists, otherwise throws `NotImplementedException`; `Backward()` is a placeholder for future gradient support

Files Modified:
- `src/Interfaces/IJitCompilable.cs` - New IROp marker interface for JIT-compilable operations
- `src/JIT/ActivationOps.cs` - New file with all activation IR operations

Build Status: ✅ Build succeeded with 0 warnings, 0 errors
🤖 Generated with Claude Code