# JIT Activation Mapping Reference

This document provides a complete reference for all activation functions available in AiDotNet, their JIT compilation support status, and how to use them in your layers.

## Quick Reference

**Total Activations**: 37
**Production-Ready**: 10
**Available (Pending Integration)**: 27

---

## Production-Ready Activations (10)

These activations are fully integrated into DenseLayer and ready for use in JIT compilation.

### ReLU Family (1)

| Activation Class | TensorOperations Method | IEngine Method | Parameters | Status |
|------------------|-------------------------|----------------|------------|--------|
| `ReLUActivation<T>` | `TensorOperations<T>.ReLU(node)` | `IEngine<T>.ReLU(tensor)` | None | ✅ Ready |

**Usage Example:**
```csharp
// In CanActivationBeJitted()
if (ScalarActivation is ReLUActivation<T>)
    return true;

// In ApplyActivationToGraph()
if (ScalarActivation is ReLUActivation<T>)
    return TensorOperations<T>.ReLU(input);
```

**Forward Function**: `f(x) = max(0, x)`

**Use Cases**: Default activation for hidden layers in most neural networks.

---

### Sigmoid Family (5)

| Activation Class | TensorOperations Method | IEngine Method | Parameters | Status |
|------------------|-------------------------|----------------|------------|--------|
| `SigmoidActivation<T>` | `TensorOperations<T>.Sigmoid(node)` | `IEngine<T>.Sigmoid(tensor)` | None | ✅ Ready |
| `TanhActivation<T>` | `TensorOperations<T>.Tanh(node)` | `IEngine<T>.Tanh(tensor)` | None | ✅ Ready |
| `SwishActivation<T>` | `TensorOperations<T>.Swish(node)` | `IEngine<T>.Swish(tensor)` | None | ✅ Ready |
| `SiLUActivation<T>` | `TensorOperations<T>.SiLU(node)` | `IEngine<T>.SiLU(tensor)` | None | ✅ Ready |
| `MishActivation<T>` | `TensorOperations<T>.Mish(node)` | `IEngine<T>.Mish(tensor)` | None | ✅ Ready |

**Usage Example (Sigmoid):**
```csharp
// In CanActivationBeJitted()
if (ScalarActivation is SigmoidActivation<T>)
    return true;

// In ApplyActivationToGraph()
if (ScalarActivation is SigmoidActivation<T>)
    return TensorOperations<T>.Sigmoid(input);
```

**Forward Functions**:
- **Sigmoid**: `f(x) = 1 / (1 + e^(-x))`
- **Tanh**: `f(x) = (e^x - e^(-x)) / (e^x + e^(-x))`
- **Swish**: `f(x) = x * sigmoid(x)` (also known as SiLU)
- **SiLU**: Same as Swish
- **Mish**: `f(x) = x * tanh(softplus(x))`

**Use Cases**:
- **Sigmoid**: Binary classification output layers, LSTM gates
- **Tanh**: RNN hidden states, centered outputs (-1 to 1)
- **Swish/SiLU**: Modern alternative to ReLU with smooth gradients
- **Mish**: Self-regularized activation, good for deep networks
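
Because all five of these activations are parameterless, the whole family can be mapped with one chain of type checks. The sketch below simply repeats the Sigmoid pattern above for the remaining types, using the same `ScalarActivation` property as the other examples.

```csharp
// In ApplyActivationToGraph() - parameterless sigmoid-family mappings
if (ScalarActivation is SigmoidActivation<T>)
    return TensorOperations<T>.Sigmoid(input);
if (ScalarActivation is TanhActivation<T>)
    return TensorOperations<T>.Tanh(input);
if (ScalarActivation is SwishActivation<T>)
    return TensorOperations<T>.Swish(input);
if (ScalarActivation is SiLUActivation<T>)
    return TensorOperations<T>.SiLU(input);
if (ScalarActivation is MishActivation<T>)
    return TensorOperations<T>.Mish(input);
```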

---

### Modern Activations (2)

| Activation Class | TensorOperations Method | IEngine Method | Parameters | Status |
|------------------|-------------------------|----------------|------------|--------|
| `GELUActivation<T>` | `TensorOperations<T>.GELU(node)` | `IEngine<T>.GELU(tensor)` | None | ✅ Ready |
| `ELUActivation<T>` | `TensorOperations<T>.ELU(node, alpha)` | `IEngine<T>.ELU(tensor, alpha)` | `alpha` (default: 1.0) | ✅ Ready |

**Usage Example (GELU):**
```csharp
// In CanActivationBeJitted()
if (ScalarActivation is GELUActivation<T>)
    return true;

// In ApplyActivationToGraph()
if (ScalarActivation is GELUActivation<T>)
    return TensorOperations<T>.GELU(input);
```

**Usage Example (ELU with parameter):**
```csharp
// In CanActivationBeJitted()
if (ScalarActivation is ELUActivation<T>)
    return true;

// In ApplyActivationToGraph()
if (ScalarActivation is ELUActivation<T> elu)
    return TensorOperations<T>.ELU(input, elu.Alpha);
```

**Forward Functions**:
- **GELU**: `f(x) = x * Φ(x)` where Φ is the cumulative distribution function of the standard normal distribution
- **ELU**: `f(x) = x if x > 0, else alpha * (e^x - 1)`

**Use Cases**:
- **GELU**: Used in Transformers (BERT, GPT); generally outperforms ReLU in NLP models
- **ELU**: Helps reduce the vanishing gradient problem and provides smooth negative values

---

### Vector Activations (1)

| Activation Class | TensorOperations Method | IEngine Method | Parameters | Status |
|------------------|-------------------------|----------------|------------|--------|
| `SoftmaxActivation<T>` | `TensorOperations<T>.Softmax(node, axis)` | `IEngine<T>.Softmax(tensor, axis)` | `axis` (default: -1) | ✅ Ready |

**Usage Example:**
```csharp
// In CanActivationBeJitted()
if (VectorActivation is SoftmaxActivation<T>)
    return true;

// In ApplyActivationToGraph()
if (VectorActivation is SoftmaxActivation<T>)
    return TensorOperations<T>.Softmax(input);
```

**Forward Function**: `f(x_i) = e^(x_i) / Σ(e^(x_j))`

**Use Cases**: Multi-class classification output layers, attention mechanisms.
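
The `axis` parameter defaults to `-1` (the last dimension). When softmax must be applied over a specific dimension, for example attention scores laid out along the last axis, it can be passed explicitly; the named-argument form below is just a readability choice.

```csharp
// In ApplyActivationToGraph() - softmax over an explicit axis
if (VectorActivation is SoftmaxActivation<T>)
    return TensorOperations<T>.Softmax(input, axis: -1); // -1 = last dimension (class/key axis)
```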

---

### Identity (1)

| Activation Class | TensorOperations Method | IEngine Method | Parameters | Status |
|------------------|-------------------------|----------------|------------|--------|
| `IdentityActivation<T>` | `input` (no-op) | N/A | None | ✅ Ready |

**Usage Example:**
```csharp
// In CanActivationBeJitted()
if (ScalarActivation is IdentityActivation<T>)
    return true;

// In ApplyActivationToGraph()
if (ScalarActivation is IdentityActivation<T>)
    return input; // No transformation
```

**Forward Function**: `f(x) = x`

**Use Cases**: Linear layers, skip connections, output layers for regression.

---

## Available Activations - Pending Integration (27)

These activations have TensorOperations methods implemented but are not yet integrated into layer implementations. To use them, follow the pattern shown in the "Production-Ready" section above.

### ReLU Family (6)

| Activation Class | TensorOperations Method | Parameters | Forward Function | IEngine Status |
|------------------|-------------------------|------------|------------------|----------------|
| `LeakyReLUActivation<T>` | `TensorOperations<T>.LeakyReLU(node, negativeSlope)` | `negativeSlope` (default: 0.01) | `f(x) = max(negativeSlope*x, x)` | ✅ Integrated |
| `SELUActivation<T>` | `TensorOperations<T>.SELU(node)` | None | `f(x) = scale * (max(0,x) + min(0, alpha*(e^x-1)))` | ✅ Integrated |
| `CELUActivation<T>` | `TensorOperations<T>.CELU(node, alpha)` | `alpha` (default: 1.0) | `f(x) = max(0,x) + min(0, alpha*(e^(x/alpha)-1))` | ✅ Integrated |
| `PReLUActivation<T>` | `TensorOperations<T>.PReLU(node, alpha)` | `alpha` (default: 0.25) | `f(x) = max(alpha*x, x)` | ✅ Integrated |
| `RReLUActivation<T>` | `TensorOperations<T>.RReLU(node, lower, upper)` | `lower` (0.125), `upper` (0.333) | `f(x) = max(a*x, x)` where a ~ U(lower, upper) | ✅ Integrated |
| `ThresholdedReLUActivation<T>` | `TensorOperations<T>.ThresholdedReLU(node, threshold)` | `threshold` (default: 1.0) | `f(x) = x if x > threshold, else 0` | ✅ Integrated |

**Integration Example (LeakyReLU):**
```csharp
// Add to CanActivationBeJitted()
if (ScalarActivation is LeakyReLUActivation<T>)
    return true;

// Add to ApplyActivationToGraph()
if (ScalarActivation is LeakyReLUActivation<T> leakyRelu)
    return TensorOperations<T>.LeakyReLU(input, leakyRelu.NegativeSlope);
```

---

### Sigmoid Family (6)

| Activation Class | TensorOperations Method | Parameters | Forward Function | IEngine Status |
|------------------|-------------------------|------------|------------------|----------------|
| `HardSigmoidActivation<T>` | `TensorOperations<T>.HardSigmoid(node)` | None | `f(x) = clip((x+1)/2, 0, 1)` | ✅ Integrated |
| `HardTanhActivation<T>` | `TensorOperations<T>.HardTanh(node)` | None | `f(x) = clip(x, -1, 1)` | ✅ Integrated |
| `ScaledTanhActivation<T>` | `TensorOperations<T>.ScaledTanh(node, alpha, beta)` | `alpha` (1.0), `beta` (1.0) | `f(x) = alpha * tanh(beta * x)` | ✅ Integrated |
| `SoftplusActivation<T>` | `TensorOperations<T>.Softplus(node)` | None | `f(x) = log(1 + e^x)` | ✅ Integrated |
| `SoftsignActivation<T>` | `TensorOperations<T>.Softsign(node)` | None | `f(x) = x / (1 + abs(x))` | ✅ Integrated |
| `BentIdentityActivation<T>` | `TensorOperations<T>.BentIdentity(node)` | None | `f(x) = (sqrt(x^2 + 1) - 1)/2 + x` | ✅ Integrated |

**Integration Example (Softplus):**
```csharp
// Add to CanActivationBeJitted()
if (ScalarActivation is SoftplusActivation<T>)
    return true;

// Add to ApplyActivationToGraph()
if (ScalarActivation is SoftplusActivation<T>)
    return TensorOperations<T>.Softplus(input);
```
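
For the parameterized members of this family, follow the same extraction pattern as the ELU and LeakyReLU examples: pattern-match to the concrete type and forward its parameters. A sketch for `ScaledTanhActivation<T>` is shown below; the `Alpha` and `Beta` property names are assumptions, so check how the class actually exposes its coefficients.

```csharp
// Add to ApplyActivationToGraph()
// NOTE: Alpha/Beta property names are assumed - verify them on ScaledTanhActivation<T>
if (ScalarActivation is ScaledTanhActivation<T> scaledTanh)
    return TensorOperations<T>.ScaledTanh(input, scaledTanh.Alpha, scaledTanh.Beta);
```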

---

### Softmax Family (3)

| Activation Class | TensorOperations Method | Parameters | Forward Function | IEngine Status |
|------------------|-------------------------|------------|------------------|----------------|
| `SoftminActivation<T>` | `TensorOperations<T>.Softmin(node, axis)` | `axis` (default: -1) | `f(x_i) = e^(-x_i) / Σ(e^(-x_j))` | ✅ Integrated |
| `LogSoftmaxActivation<T>` | `TensorOperations<T>.LogSoftmax(node, axis)` | `axis` (default: -1) | `f(x_i) = log(e^(x_i) / Σ(e^(x_j)))` | ✅ Integrated |
| `LogSoftminActivation<T>` | `TensorOperations<T>.LogSoftmin(node, axis)` | `axis` (default: -1) | `f(x_i) = log(e^(-x_i) / Σ(e^(-x_j)))` | ✅ Integrated |

**Integration Example (LogSoftmax):**
```csharp
// Add to CanActivationBeJitted() - check VectorActivation
if (VectorActivation is LogSoftmaxActivation<T>)
    return true;

// Add to ApplyActivationToGraph() - check VectorActivation
if (VectorActivation is LogSoftmaxActivation<T>)
    return TensorOperations<T>.LogSoftmax(input);
```

---

### Special Activations (7)

| Activation Class | TensorOperations Method | Parameters | Forward Function | IEngine Status |
|------------------|-------------------------|------------|------------------|----------------|
| `SignActivation<T>` | `TensorOperations<T>.Sign(node)` | None | `f(x) = 1 if x > 0, -1 if x < 0, 0 if x == 0` | ✅ Integrated |
| `GaussianActivation<T>` | `TensorOperations<T>.Gaussian(node)` | None | `f(x) = e^(-x^2)` | ✅ Integrated |
| `ISRUActivation<T>` | `TensorOperations<T>.ISRU(node, alpha)` | `alpha` (default: 1.0) | `f(x) = x / sqrt(1 + alpha*x^2)` | ✅ Integrated |
| `LiSHTActivation<T>` | `TensorOperations<T>.LiSHT(node)` | None | `f(x) = x * tanh(x)` | ✅ Integrated |
| `SQRBFActivation<T>` | `TensorOperations<T>.SQRBF(node, center, width)` | `center` (0.0), `width` (1.0) | `f(x) = e^(-((x-center)/width)^2)` | ✅ Integrated |
| `SquashActivation<T>` | `TensorOperations<T>.Squash(node)` | None | `f(x) = (norm^2 / (1 + norm^2)) * (x / norm)` | ✅ Integrated |
| `BinarySpikingActivation<T>` | `TensorOperations<T>.BinarySpiking(node, threshold)` | `threshold` (default: 0.0) | `f(x) = 1 if x > threshold, else 0` | ✅ Integrated |

**Integration Example (Gaussian):**
```csharp
// Add to CanActivationBeJitted()
if (ScalarActivation is GaussianActivation<T>)
    return true;

// Add to ApplyActivationToGraph()
if (ScalarActivation is GaussianActivation<T>)
    return TensorOperations<T>.Gaussian(input);
```
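
Most of these apply element-wise, but `Squash` (the capsule-network squashing function) operates on whole vectors, since its formula divides by the vector norm. If your layer exposes it through the vector-activation slot (an assumption here, so verify against your layer), check `VectorActivation` instead of `ScalarActivation`, mirroring the Softmax example:

```csharp
// Add to ApplyActivationToGraph()
// Assumes Squash is configured as a vector activation on this layer
if (VectorActivation is SquashActivation<T>)
    return TensorOperations<T>.Squash(input);
```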

---

### Complex Activations - Placeholder Status (6)

These activations have placeholder implementations in TensorOperations. Full implementation requires complex algorithms and will be completed in the gradient computation phase.

| Activation Class | TensorOperations Method | Parameters | Description | Status |
|------------------|-------------------------|------------|-------------|--------|
| `SparsemaxActivation<T>` | `TensorOperations<T>.Sparsemax(node, axis)` | `axis` (default: -1) | Projects onto simplex, produces sparse outputs | ⚠️ Placeholder |
| `SphericalSoftmaxActivation<T>` | `TensorOperations<T>.SphericalSoftmax(node, axis)` | `axis` (default: -1) | Normalizes to unit sphere | ⚠️ Placeholder |
| `GumbelSoftmaxActivation<T>` | `TensorOperations<T>.GumbelSoftmax(node, temp, axis)` | `temp` (1.0), `axis` (-1) | Differentiable sampling | ⚠️ Placeholder |
| `TaylorSoftmaxActivation<T>` | `TensorOperations<T>.TaylorSoftmax(node, order, axis)` | `order` (2), `axis` (-1) | Taylor approximation of softmax | ⚠️ Placeholder |
| `HierarchicalSoftmaxActivation<T>` | `TensorOperations<T>.HierarchicalSoftmax(node)` | None | Tree-structured softmax | ⚠️ Placeholder |
| `MaxoutActivation<T>` | `TensorOperations<T>.Maxout(node, numPieces)` | `numPieces` (default: 2) | Learnable piecewise linear | ⚠️ Placeholder |

**Note**: These activations currently throw `NotImplementedException` in the backward pass. Do not use them in production until they are fully implemented.
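
The safest integration strategy is an allow-list: return `true` from `CanActivationBeJitted()` only for activations you have explicitly mapped, so anything unmapped (including the placeholders above) falls back to the eager path automatically. A minimal sketch, following the fall-through shape of the earlier examples:

```csharp
// In CanActivationBeJitted() - allow-list style keeps placeholder activations out
if (ScalarActivation is ReLUActivation<T> ||
    ScalarActivation is SigmoidActivation<T> ||
    ScalarActivation is GELUActivation<T>)
{
    return true;
}

return false; // anything unmapped (including placeholders) uses eager mode
```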

---

## Backward Pass Status

**Current Status**: Placeholder implementations only

All TensorOperations activation methods currently have placeholder backward functions:

```csharp
backward: (gradOutput) =>
{
    throw new NotImplementedException("Backward pass for [Activation] not yet implemented");
}
```

**Future Work**: Gradient computation will be implemented in a future phase. This includes:
- Analytical gradient formulas for all 37 activations
- Efficient backward pass implementations
- Support for training with JIT-compiled graphs

**Current Limitation**: JIT compilation is only suitable for **inference** (forward pass only). For **training**, use eager mode until the backward pass is implemented.
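
In practice this means a layer (or the surrounding model code) should take the JIT path only when it is not training. The sketch below is purely illustrative: `IsTrainingMode`, `ForwardJit`, `ForwardEager`, and the method signature itself are hypothetical names, not actual AiDotNet APIs; only `SupportsJitCompilation` is referenced elsewhere in this guide.

```csharp
// Illustrative only - IsTrainingMode, ForwardJit, and ForwardEager are hypothetical names
public Tensor<T> Forward(Tensor<T> input)
{
    // JIT-compiled graphs currently have no backward pass, so use them for inference only
    bool useJit = SupportsJitCompilation && !IsTrainingMode;

    return useJit
        ? ForwardJit(input)    // compiled graph (forward only)
        : ForwardEager(input); // eager path (supports backpropagation)
}
```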

---

## Activation Selection Guide

### For Image Classification (CNNs)

**Recommended**:
- Hidden layers: `ReLUActivation<T>` (fast, effective)
- Modern alternative: `GELUActivation<T>` (smoother gradients)
- Output layer: `SoftmaxActivation<T>` (multi-class)

**Example**:
```csharp
var conv1 = new ConvolutionalLayer<float>(filters: 32, kernelSize: 3, activation: new ReLUActivation<float>());
var conv2 = new ConvolutionalLayer<float>(filters: 64, kernelSize: 3, activation: new ReLUActivation<float>());
var dense = new DenseLayer<float>(inputSize: 1024, outputSize: 10, activation: new SoftmaxActivation<float>());
```

### For Natural Language Processing (Transformers)

**Recommended**:
- Hidden layers: `GELUActivation<T>` (used in BERT, GPT)
- Alternative: `SwishActivation<T>` or `MishActivation<T>`
- Output layer: `SoftmaxActivation<T>` (classification) or `IdentityActivation<T>` (regression)

**Example**:
```csharp
var feedForward = new DenseLayer<float>(inputSize: 768, outputSize: 3072, activation: new GELUActivation<float>());
var output = new DenseLayer<float>(inputSize: 3072, outputSize: 768, activation: new IdentityActivation<float>());
```

### For Recurrent Networks (RNNs, LSTMs, GRUs)

**Recommended**:
- Gates: `SigmoidActivation<T>` (LSTM/GRU gates)
- Hidden state: `TanhActivation<T>` (LSTM/GRU hidden state)
- Output layer: `SoftmaxActivation<T>` (classification)

**Example**:
```csharp
// LSTMLayer uses Sigmoid for its gates and Tanh for the cell state internally
var lstm = new LSTMLayer<float>(inputSize: 100, hiddenSize: 128);
```

### For Generative Models (GANs, VAEs)

**Recommended**:
- Generator hidden: `LeakyReLUActivation<T>` or `ELUActivation<T>` (avoids the dying ReLU problem)
- Generator output: `TanhActivation<T>` (normalize to [-1, 1])
- Discriminator: `LeakyReLUActivation<T>` (stable gradients)

**Example**:
```csharp
var genHidden = new DenseLayer<float>(inputSize: 100, outputSize: 256, activation: new LeakyReLUActivation<float>());
var genOutput = new DenseLayer<float>(inputSize: 256, outputSize: 784, activation: new TanhActivation<float>());
```

---

## Integration Checklist

When adding JIT support for an activation to your layer (a consolidated sketch follows this checklist):

- [ ] Check whether the activation is in the "Production-Ready" list
- [ ] If not, check the "Available Activations - Pending Integration" list
- [ ] Add an activation type check to `CanActivationBeJitted()`
- [ ] Add the activation mapping to `ApplyActivationToGraph()`
- [ ] Handle parameterized activations correctly (extract their parameters)
- [ ] Update the `SupportsJitCompilation` property
- [ ] Update the XML documentation with the supported activations
- [ ] Test with sample data
- [ ] Verify that JIT compilation succeeds
- [ ] Benchmark performance
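
The sketch below pulls the patterns from this guide into the two hooks. The `protected` signatures and the `ComputationNode<T>` graph-node type are assumptions about how your layer declares these methods; the individual mappings follow the tables above.

```csharp
// Sketch only - method signatures and ComputationNode<T> are assumptions; adjust to your base class
protected bool CanActivationBeJitted()
{
    return ScalarActivation is ReLUActivation<T>
        || ScalarActivation is GELUActivation<T>
        || ScalarActivation is ELUActivation<T>
        || ScalarActivation is IdentityActivation<T>
        || VectorActivation is SoftmaxActivation<T>;
}

protected ComputationNode<T> ApplyActivationToGraph(ComputationNode<T> input)
{
    if (ScalarActivation is ReLUActivation<T>)
        return TensorOperations<T>.ReLU(input);
    if (ScalarActivation is GELUActivation<T>)
        return TensorOperations<T>.GELU(input);
    if (ScalarActivation is ELUActivation<T> elu)
        return TensorOperations<T>.ELU(input, elu.Alpha);
    if (ScalarActivation is IdentityActivation<T>)
        return input; // no-op
    if (VectorActivation is SoftmaxActivation<T>)
        return TensorOperations<T>.Softmax(input);

    throw new NotSupportedException("Activation not supported by JIT compilation; use eager mode.");
}
```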

---

## See Also

- [JIT_COMPILATION_PATTERN_GUIDE.md](JIT_COMPILATION_PATTERN_GUIDE.md) - Complete implementation guide
- [JIT_ROADMAP.md](JIT_ROADMAP.md) - Current status and future work