GSoC 2025 ‐ Shabareesh Shetty

Shabareesh Shetty edited this page Aug 27, 2025 · 18 revisions

About me

I’m Shabareesh Shetty, an undergraduate at NMAMIT, Nitte, Udupi, from Mangalore, Karnataka, India. I started exploring scientific computation because of my educational background in Robotics and Artificial Intelligence. My interest in robot kinematics and machine learning led me to dive deep into scientific computing. Additionally, I enjoy exploring applications of Artificial Intelligence across different sectors.

Project overview

In linear algebra, BLAS routines are predominantly employed to perform operations on matrices and vectors. BLAS routines are commonly utilized in the development of advanced linear algebra software owing to their efficiency, portability, and extensive accessibility. The major goals associated with the project are:

  • Add JavaScript implementations.
  • Add C implementations.
  • Re-implement BLAS subroutines by porting the existing lapack-netlib Fortran implementation to the current free-form Fortran-95.
  • Write Node.js bindings to allow calling BLAS routines in compiled C/Fortran from JavaScript.

I largely followed the approach described in my proposal, but diverged slightly as I understood the requirements of the project more deeply. I started with the Level 2 routines. In each level, real datatypes took priority, followed by complex datatypes. For real datatypes, real double precision came first, followed by real single precision, and similarly for complex double and single precision. I had proposed implementing both single and double precision simultaneously, but with the right guidance I understood that I should implement one precision first and, after getting approval for it, move on to the next.

Project recap

During the community bonding period, I began coding by focusing on the BLAS Level 2 routines, which involve matrix-vector operations, as outlined in tracking issue #2039. I started with the C implementations of the Level 2 packages because I had gained a grip on C implementations in the pre-GSoC period and because they had a higher priority. In parallel, I worked on the Level 1 complex routines, which went better than I expected. Implementing Fortran was initially challenging, but after reading the existing code and with some practice it became easier than before. I also made contributions to the JavaScript implementations, which I had a good grip on, but those were less prioritized, since some of them required R&D: they involved new types of packages, especially banded matrices.

Native Implementation Level 1 Signature:

sswap( N, x, strideX, y, strideY )

Ndarray Implementation Level 1 Signature:

sswap( N, x, strideX, offsetX, y, strideY, offsetY )

This additional offset parameter in the ndarray implementation gives the user the freedom to select their starting index for the operation along with the stride.
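To make the role of the offset and stride parameters concrete, here is a minimal sketch of what such an ndarray-style `sswap` loop could look like. This is a simplified, hypothetical version for illustration, not the actual stdlib source:

```javascript
// Hypothetical ndarray-style sswap: swaps N elements of x and y,
// honoring per-array strides and starting offsets.
function sswap( N, x, strideX, offsetX, y, strideY, offsetY ) {
    var tmp;
    var ix = offsetX;
    var iy = offsetY;
    var i;
    for ( i = 0; i < N; i++ ) {
        tmp = x[ ix ];
        x[ ix ] = y[ iy ];
        y[ iy ] = tmp;
        ix += strideX;
        iy += strideY;
    }
    return y;
}

// Swap every other element of x (starting at index 1) with y:
var x = [ 0.0, 1.0, 0.0, 2.0, 0.0, 3.0 ];
var y = [ 7.0, 8.0, 9.0 ];
sswap( 3, x, 2, 1, y, 1, 0 );
// x => [ 0.0, 7.0, 0.0, 8.0, 0.0, 9.0 ]
// y => [ 1.0, 2.0, 3.0 ]
```

Without the offset parameter, the same access pattern would require the caller to create a typed-array view, so exposing it directly keeps the API allocation-free.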

Only two packages remained to be implemented among the Level 1 real routines; I covered those, and they are in the review stage.

In Level 2 routines, we have an additional parameter: order. The existing reference LAPACK implementation is Fortran-based and hence follows a column-major layout by default. However, in JavaScript, we can give the user the freedom to choose whether to pass the matrix in row-major or column-major order. This flexibility matters because matrices are represented as flat arrays in JavaScript, ensuring contiguous memory allocation. Implementing this became easier for me since similar implementations already existed in the repository, but things grew challenging when moving on to banded matrices.

This is a small example:

A = [ 1, 2, 3 ]
    [ 4, 5, 6 ] (2X3)
A = [ 1, 2, 3, 4, 5, 6 ] (row-major)
A = [ 1, 4, 2, 5, 3, 6 ] (column-major)
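The two layouts above can be reproduced with a small (hypothetical) helper that flattens a nested array in either order:

```javascript
// Flatten a nested 2D array into a flat array in the given layout.
function flatten( A, order ) {
    var M = A.length;
    var N = A[ 0 ].length;
    var out = [];
    var i;
    var j;
    if ( order === 'row-major' ) {
        // Walk each row left to right:
        for ( i = 0; i < M; i++ ) {
            for ( j = 0; j < N; j++ ) {
                out.push( A[ i ][ j ] );
            }
        }
    } else {
        // Column-major: walk each column top to bottom:
        for ( j = 0; j < N; j++ ) {
            for ( i = 0; i < M; i++ ) {
                out.push( A[ i ][ j ] );
            }
        }
    }
    return out;
}

var A = [ [ 1, 2, 3 ], [ 4, 5, 6 ] ];
console.log( flatten( A, 'row-major' ) );    // [ 1, 2, 3, 4, 5, 6 ]
console.log( flatten( A, 'column-major' ) ); // [ 1, 4, 2, 5, 3, 6 ]
```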

Native Implementation Level 2 Signature (the reference Fortran-style signature, followed by the variant with the additional order parameter):

sgemv( trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )
sgemv( order, trans, M, N, alpha, A, lda, x, strideX, beta, y, strideY )

Similar to the ndarray implementation for the Level 1 routine, we have an offset parameter associated with the matrix and the vector. Additionally, there are use cases where we need to perform operations on a specific submatrix within a larger global matrix. To accommodate this, our ndarray implementation includes the following parameters:

  • sa1: stride along the first dimension of matrix A.
  • sa2: stride along the second dimension of matrix A.

These parameters give users full flexibility to utilize the BLAS implementation as needed, even allowing for negative stride values depending on the use case. At each stage of implementation, the idea was to reduce code duplication and maintain cache optimality.

Ndarray implementation signature:

sgemv( trans, M, N, alpha, A, strideA1, strideA2, offsetA, x, strideX, offsetX, beta, y, strideY, offsetY )
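As a rough illustration of how these parameters fit together, here is a simplified, hypothetical sketch of the no-transpose kernel of such an ndarray-style `sgemv`. The real implementation also handles the `trans` parameter, input validation, and special cases, all omitted here:

```javascript
// Hypothetical ndarray-style sgemv (no-transpose case only):
// computes y := alpha*A*x + beta*y, where A is M-by-N.
function sgemv( M, N, alpha, A, strideA1, strideA2, offsetA, x, strideX, offsetX, beta, y, strideY, offsetY ) {
    var tmp;
    var ia;
    var ix;
    var iy;
    var i;
    var j;
    iy = offsetY;
    for ( i = 0; i < M; i++ ) {
        tmp = 0.0;
        // Start of row i of A in the flat array:
        ia = offsetA + ( i * strideA1 );
        ix = offsetX;
        for ( j = 0; j < N; j++ ) {
            tmp += A[ ia ] * x[ ix ];
            ia += strideA2; // move along the row
            ix += strideX;
        }
        y[ iy ] = ( alpha * tmp ) + ( beta * y[ iy ] );
        iy += strideY;
    }
    return y;
}

// 2x3 row-major matrix: strideA1 = 3 (between rows), strideA2 = 1 (between columns):
var A = [ 1, 2, 3, 4, 5, 6 ];
var x = [ 1, 1, 1 ];
var y = [ 0, 0 ];
sgemv( 2, 3, 1.0, A, 3, 1, 0, x, 1, 0, 0.0, y, 1, 0 );
// y => [ 6, 15 ]
```

Note how column-major order falls out of the same kernel by swapping the stride values rather than duplicating the loop, which is the code-deduplication idea mentioned above.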

A short example to understand how stride parameters for matrix A operate:

A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0 ]
    [ 999.0, 4.0, 999.0, 5.0, 999.0, 6.0 ]
    [ 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ] (3X6)

A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0, 999.0, 4.0, 999.0, 5.0, 999.0, 6.0, 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ] (row-major)

Now let's say our required sub-matrix for operation is:

A = [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ]

Hence, the stride parameters along with offset give flexibility here.

sa1 = 6
sa2 = 2
offsetA = 1

With the initial offset, we land on the element 1.0 in the flat array representation; from there, sa1 = 6 moves us down one row of the submatrix and sa2 = 2 moves us across one column, so we perform the required operation seamlessly.
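The index arithmetic can be checked with a few lines of JavaScript, using the example values above:

```javascript
// Walk the 3x3 submatrix embedded in the row-major flat array using
// sa1 = 6, sa2 = 2, offsetA = 1:
var A = [ 999.0, 1.0, 999.0, 2.0, 999.0, 3.0, 999.0, 4.0, 999.0, 5.0, 999.0, 6.0, 999.0, 7.0, 999.0, 8.0, 999.0, 9.0 ];
var sa1 = 6;
var sa2 = 2;
var offsetA = 1;
var out = [];
var i;
var j;
for ( i = 0; i < 3; i++ ) {
    for ( j = 0; j < 3; j++ ) {
        // Element (i,j) of the submatrix lives at this flat index:
        out.push( A[ offsetA + ( i * sa1 ) + ( j * sa2 ) ] );
    }
}
console.log( out ); // [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
```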

Once the implementation was ready for a package, the next step was adding test suites. Most existing BLAS test suites were neither comprehensive nor extensive, but reference implementations from previous contributors helped me create comprehensive ones, and I ensured 100% test coverage. Side by side, I also worked toward 100% test coverage for the existing packages. Given our unique ndarray implementation, covering all edge cases, particularly validating how the custom strides, offsets, and other parameters interact, was important. Each possible combination of the parameters involved was checked, so we can now claim a robust test suite for our routines. The test coverage report script command helped me a lot here.

Test coverage report command:

$ make test-cov TESTS_FILTER=".*/math/base/special/floorn/.*"

One trick I used to generate test fixtures was writing my own custom C code using CBLAS and passing my required inputs as parameters.

Here is an example of such custom code, calling zgemv:

/**
* Example usage of CBLAS zgemv
*
* Performs y := alpha*A*x + beta*y
*/

#include <stdio.h>
#include <complex.h>
#include <cblas.h>

int main() {
    // Matrix dimensions
    int m = 2;  // rows
    int n = 2;  // cols

    // Scalars
    double complex alpha = 1.0 + 0.0*I;
    double complex beta  = 0.0 + 0.0*I;

    // Matrix A (2x2, row-major order)
    double complex A[] = {
        1.0 + 1.0*I, 2.0 + 2.0*I,
        3.0 + 3.0*I, 4.0 + 4.0*I
    };

    // Vector x (length n)
    double complex x[] = { 1.0 + 2.0*I, 3.0 + 4.0*I };

    // Vector y (length m)
    double complex y[] = { 0.0 + 0.0*I, 0.0 + 0.0*I };

    // Perform y := alpha*A*x + beta*y
    // Using Row-major, No transpose
    cblas_zgemv(
        CblasRowMajor,     // Row-major storage
        CblasNoTrans,      // No transpose
        m, n,              // Dimensions of A
        &alpha, A, n,      // alpha, A, lda = n
        x, 1,              // x, incx
        &beta, y, 1        // beta, y, incy
    );

    // Print result
    printf("Result y = [%.2f%+.2fi, %.2f%+.2fi]\n",
           creal(y[0]), cimag(y[0]),
           creal(y[1]), cimag(y[1]));

    return 0;
}

The same approach applies to the other Level 2 routines: as mentioned above, the operation may look similar, but the matrix representation varies across packages.

There exist sets such as:

  • SGEMV - where A is an M-by-N matrix.
  • SSYMV - where A is an N-by-N symmetric matrix.
  • STPMV - where A is an N-by-N triangular matrix supplied in packed form.
  • STBMV - where A is an N-by-N triangular band matrix.
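To illustrate what "packed form" means for a routine like STPMV, here is a small sketch assuming the standard column-major packed-upper convention, in which element (i, j) with i <= j is stored at flat index i + j*(j+1)/2. The helper name is hypothetical:

```javascript
// Flat index of element (i,j), i <= j, of an upper-triangular matrix
// stored in packed column-major form:
function packedIndex( i, j ) {
    return i + ( ( j * ( j + 1 ) ) / 2 );
}

// Upper-triangular 3x3 matrix:
//   [ 1 2 4 ]
//   [ . 3 5 ]
//   [ . . 6 ]
// packed column by column, storing only the upper triangle:
var AP = [ 1, 2, 3, 4, 5, 6 ];

console.log( AP[ packedIndex( 0, 2 ) ] ); // 4  (element (0,2))
console.log( AP[ packedIndex( 2, 2 ) ] ); // 6  (element (2,2))
```

Packed storage halves the memory footprint for triangular and symmetric matrices, which is why these routines take an AP array instead of a full matrix.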

After getting a few of the real single- and double-precision routines over the finish line, I moved on to implementing a Level 2 complex routine: zher. This was one of the most challenging parts for me, as there were no reference implementations; that is exactly why I wanted to implement these, so that I could create reference implementations for the future. I also worked on zher2, another complex Level 2 routine.

Completed work

TODO: include a list of links to all relevant PRs along with a short description for each. For small bug fix PRs, you can group them together and describe them as, e.g., "various maintenance work".

Current state

TODO: add a summary of the current state of the project.

What remains

TODO: add a summary of what remains left to do. If there is a tracking issue, include a link to that issue.

Challenges and lessons learned

TODO: add a summary of any unexpected challenges that you faced, along with any lessons learned during the course of your project.

Conclusion

TODO: add a report summary and include any acknowledgments (e.g., shout outs to contributors/maintainers who helped out along the way, etc).
