GSoC 2025 ‐ Shabareesh Shetty
I’m Shabareesh Shetty, an undergraduate at NMAMIT, Nitte, Udupi. I am from Mangalore, Karnataka, India. I started exploring scientific computation because of my educational background in Robotics and Artificial Intelligence, and my interest in robot kinematics and machine learning drew me deeper into scientific computing. I also enjoy exploring applications of Artificial Intelligence across different sectors.
In linear algebra, BLAS routines are predominantly used to perform operations on matrices and vectors. BLAS is commonly used in the development of advanced linear algebra software owing to its efficiency, portability, and wide availability. The major goals of the project are:
- Add JavaScript implementations.
- Add C implementations.
- Re-implement BLAS subroutines by porting the existing lapack-netlib Fortran implementations from fixed-form Fortran to free-form Fortran 95.
- Write Node.js bindings to allow calling compiled C/Fortran BLAS routines from JavaScript.

I followed the approach outlined in my proposal, but diverged slightly as I came to understand the requirements of the project more deeply. The plan was to start with the Level 2 routines. In each level, the priority was real datatypes first, followed by complex datatypes; for real datatypes, real double precision came first, followed by real single precision, and similarly for complex double and single precision. I had proposed implementing both precisions simultaneously, but with the right guidance I understood that I should implement one precision first and, after getting approval for it, move on to the next.
During the community bonding period, I began coding by focusing on the BLAS Level 2 routines, which involve matrix-vector operations, as outlined in tracking issue #2039. I started with the C implementations of the Level 2 packages because I had gained a grip on C implementations in the pre-GSoC period and because they had a higher priority. In parallel, I worked on the Level 1 complex routines, which went better than I expected. Implementing Fortran was initially challenging, but reading the existing code, along with some practice, made it easier than before. I also contributed JavaScript implementations, which I had a good grip on, but those were a lower priority since some of them required R&D, as they were new kinds of packages, especially banded matrices.
Native implementation Level 1 signature:

```
sswap( N, x, strideX, y, strideY )
```

Ndarray implementation Level 1 signature:

```
sswap( N, x, strideX, offsetX, y, strideY, offsetY )
```

The additional offset parameters in the ndarray implementation give the user the freedom to select the starting index for the operation along with the stride.
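To illustrate how the stride and offset parameters behave, here is a minimal plain-JavaScript sketch of an ndarray-style `sswap` (not the actual stdlib implementation, which operates on typed arrays):

```javascript
// Sketch: swap N elements of x and y, honoring strides and offsets.
function sswap( N, x, strideX, offsetX, y, strideY, offsetY ) {
    var ix = offsetX;
    var iy = offsetY;
    var tmp;
    for ( var i = 0; i < N; i++ ) {
        tmp = x[ ix ];
        x[ ix ] = y[ iy ];
        y[ iy ] = tmp;
        ix += strideX;
        iy += strideY;
    }
    return y;
}

var x = [ 0.0, 1.0, 2.0, 3.0 ];
var y = [ 5.0, 6.0, 7.0, 8.0 ];

// Swap the last three elements of x with the first three of y:
sswap( 3, x, 1, 1, y, 1, 0 );
console.log( x ); // => [ 0, 5, 6, 7 ]
console.log( y ); // => [ 1, 2, 3, 8 ]
```

Without the offset parameters, the same operation would require creating views or copies of the arrays before calling the routine.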
Only two packages remained to be implemented among the Level 1 real routines; I covered those, and they are currently under review.
The Level 2 routines have an additional parameter: order. The existing reference LAPACK implementation is Fortran-based and hence follows a column-major layout by default. In JavaScript, however, we can give the user the freedom to choose whether to pass the matrix in row-major or column-major order. This flexibility matters because matrices are represented as flat arrays in JavaScript, ensuring contiguous memory allocation. The key innovation, though, comes after this. These routines were made easier for me by similar existing implementations in the repository, but things became challenging when moving to banded matrices.
This is a small example:
```
A = [ 1, 2, 3 ]
    [ 4, 5, 6 ]   (2x3)

A = [ 1, 2, 3, 4, 5, 6 ]   (row-major)
A = [ 1, 4, 2, 5, 3, 6 ]   (column-major)
```
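The two layouts only change how an element (i, j) maps to a flat-array index. A minimal sketch (plain JavaScript, not the stdlib API) of the lookup under each layout:

```javascript
// Element (i, j) of an M x N matrix stored in a flat array:
//   row-major:    index = i * N + j
//   column-major: index = j * M + i
function at( A, M, N, i, j, order ) {
    return ( order === 'row-major' ) ? A[ (i*N) + j ] : A[ (j*M) + i ];
}

var rowMajor = [ 1, 2, 3, 4, 5, 6 ]; // 2x3 matrix, row-major
var colMajor = [ 1, 4, 2, 5, 3, 6 ]; // same matrix, column-major

console.log( at( rowMajor, 2, 3, 1, 2, 'row-major' ) );    // => 6
console.log( at( colMajor, 2, 3, 1, 2, 'column-major' ) ); // => 6
```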
Native implementation Level 2 signatures (the reference signature, and ours with the additional order parameter):

```
sgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )
sgemv( order, trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )
```

Similar to the ndarray implementation for the Level 1 routines, we have an offset parameter associated with the matrix and each vector. Additionally, there are use cases where we need to perform operations on a specific submatrix within a larger global matrix. To accommodate this, our ndarray implementation includes the following parameters:
- sa1: stride along the first dimension of matrix A.
- sa2: stride along the second dimension of matrix A.

These parameters give users full flexibility to utilize the BLAS implementation as needed, even allowing for negative stride values depending on the use case. At each stage of implementation, the idea was to reduce code duplication and maintain cache optimality.
Ndarray implementation signature:

```
sgemv( trans, M, N, alpha, A, strideA1, strideA2, offsetA, x, strideX, offsetX, beta, y, strideY, offsetY )
```

A short example to understand how the stride parameters for matrix A operate:
```
A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0 ]
    [ 999.0, 4.0, 999.0, 5.0, 999.0, 6.0 ]
    [ 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ]
```

Flattened in row-major order:

```
A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0, 999.0, 4.0, 999.0, 5.0, 999.0, 6.0, 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ]
```
Now let's say our required sub-matrix for operation is:
```
A = [ 1, 2, 3 ]
    [ 4, 5, 6 ]
    [ 7, 8, 9 ]
```
Hence, the stride parameters together with the offset give us the needed flexibility:

```
sa1     = 6
sa2     = 2
offsetA = 1
```

With the initial offset, we land on the position of 1 in the flat array; stepping by sa1 = 6 moves down one row (1 → 4 → 7), while stepping by sa2 = 2 moves along a row (1 → 2 → 3), so we can perform the required operation on the submatrix seamlessly.
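The traversal described above can be sketched in plain JavaScript (a simplified illustration, not the actual stdlib implementation):

```javascript
// Sketch: collect the elements of an M x N submatrix of a flat array
// using ndarray-style stride and offset parameters.
function subMatrix( A, M, N, strideA1, strideA2, offsetA ) {
    var out = [];
    for ( var i = 0; i < M; i++ ) {
        for ( var j = 0; j < N; j++ ) {
            out.push( A[ offsetA + (i*strideA1) + (j*strideA2) ] );
        }
    }
    return out;
}

var A = [
    999.0, 1.0, 999.0, 2.0, 999.0, 3.0,
    999.0, 4.0, 999.0, 5.0, 999.0, 6.0,
    999.0, 7.0, 999.0, 8.0, 999.0, 9.0
];

console.log( subMatrix( A, 3, 3, 6, 2, 1 ) );
// => [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
```

A routine such as `sgemv` never materializes the submatrix like this; it applies the same index arithmetic on the fly, which is what keeps the ndarray interface copy-free.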
Once the implementation of a package was ready, the next step was adding test suites. Most existing BLAS implementation test suites were neither comprehensive nor extensive, but reference implementations by previous contributors helped me create comprehensive suites, and I ensured 100% test coverage. In parallel, I also worked toward 100% test coverage for the existing packages. Given our unique ndarray implementation, covering all edge cases, particularly validating how the custom strides, offsets, and other parameters interact, was important. Each possible combination of the parameters involved was checked, so we can now claim a robust test suite for our routines. The test coverage report script command also helped me a lot.
Test coverage report command:
### Completed work
TODO: include a list of links to all relevant PRs along with a short description for each. For small bug fix PRs, you can group them together and describe them as, e.g., "various maintenance work".
### Current state
TODO: add a summary of the current state of the project.
### What remains
TODO: add a summary of what remains left to do. If there is a tracking issue, include a link to that issue.
## Challenges and lessons learned
TODO: add a summary of any unexpected challenges that you faced, along with any lessons learned during the course of your project.
## Conclusion
TODO: add a report summary and include any acknowledgments (e.g., shout outs to contributors/maintainers who helped out along the way, etc).