
v0.2: Pushing FLOPS in Assembly 🏋️‍♂️


ashvardanian released this 20 Jan 09:57

Release: v0.2.0 [skip ci]

Minor

  • Add: Latency Hiding & Port Interleaving (086f8d7)
  • Add: AMX kernels (0cb024d)
  • Add: Inline Assembly kernels (89095a6)
  • Add: BLAS & Eigen TOPS benchmarks (28ca39b)
  • Add: AVX2 & low-precision AVX-512 TOPS (0a48108)
  • Add: i8, f16, and bf16 kernels (3f54200)
  • Add: Arm NEON FMAs (d0e521e)
  • Add: vfmadd231ps kernels (7ca3161)
  • Add: Assembly micro-kernels (2e71e76)

Patch

  • Docs: Zen4 matmul-benchmarks (2476310)
  • Docs: H100 Tensor Cores vs Intel (fa86663)
  • Fix: Illegal instruction for AMX (a7243dd)
  • Fix: Duplicate .global symbols (c732234)
  • Docs: Recommended Eigen macros (7be2d58)
  • Fix: Missing tops_u8_neon (d97bbfc)
  • Fix: Missing tops_f64_neon (4afa7e3)
  • Improve: Shorter TOPS names (be0c94b)