
Release v1.0 - Neural Network in C with CUDA, MPI, and Bayesian Optimization


Released by @NeuralAditya on 29 Mar (03:11)

This release introduces a high-performance neural network framework written in C, featuring CUDA acceleration, MPI-based distributed training, Bayesian optimization, and real-time monitoring with TensorBoard. It is designed for researchers and engineers who need low-level control over deep learning architectures, along with GPU acceleration and multi-node scalability.

🛠️ Key Features:
✅ Dynamic Neural Network Architecture - Define custom layers and neuron counts, similar to PyTorch/TensorFlow
✅ CUDA & OpenCL Optimized - cuBLAS/cuDNN, FP16, Tensor Cores
✅ Advanced Optimization Techniques - Adam, RMSprop, Dropout, BatchNorm, L2 Regularization
✅ Convolutional & Attention Mechanisms - CNNs with MaxPooling, Self-Attention
✅ RNNs & LSTMs - Bidirectional LSTMs, GRUs
✅ Parallel & Distributed Training - Multi-GPU support via OpenMP; MPI for cluster-based training
✅ Dataset Loader & Preprocessing - OpenCV for image augmentation, HDF5 for large datasets
✅ Compiler & CPU Optimizations - AVX/SIMD instructions, memory pooling, thread pools
✅ Real-Time Monitoring - TensorBoard integration for loss/accuracy tracking
✅ Quantization Support - FP16 weight quantization for reduced memory usage

📥 Installation & Usage
🔧 Dependencies:
CUDA 11+/cuDNN (for GPU acceleration)
MPI (OpenMPI, MPICH) (for distributed training)
Torch C++ API (for Bayesian optimization)
OpenCV (for image processing)
HDF5 (for dataset management)
JSON-C (for configuration files)

📌 What's Next?
🔹 Transformer-based NLP models (BERT, GPT-like architectures)
🔹 Multi-GPU support via NCCL
🔹 Integration with PyTorch/TensorFlow datasets
🔹 Faster quantization with INT8 optimizations

✨ Contributions are welcome, and please star the repo if you find it useful! 🚀