perf: NEON SIMD matmul optimization (~4x speedup) by dddimcha · Pull Request #218 · dddimcha/embodiOS

dddimcha · 2026-02-02T19:21:22Z

Summary

Replace scalar tensor_gemm() with SIMD-accelerated matmul_neon() in tensor_dense_forward().

Changes

- 1 file changed: kernel/ai/tensor_ops.c (+62, -8 lines)
- Reuses the existing matmul_neon() from simd_ops.c
- Converts inputs and weights from float to Q16.16 fixed point, runs the SIMD kernel, and converts the result back to float
- Falls back to the scalar path if buffer allocation fails
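Q16.16 stores a real number in an int32_t with 16 integer bits (including sign) and 16 fractional bits, so 1.0 is 65536 and a product has to be widened to 64 bits before it is shifted back. A minimal sketch of the conversions and the multiply, assuming helper names of our own (q16_from_float, q16_to_float and q16_mul are hypothetical; the project's fixed_point.h may define different ones):

```c
#include <stdint.h>

/* Q16.16: 16 integer bits (including sign) and 16 fractional bits in an
 * int32_t. Hypothetical helper names; fixed_point.h may differ. */
typedef int32_t q16_16_t;

#define Q16_ONE 65536 /* 1.0 in Q16.16 (1 << 16) */

/* Round half away from zero; inputs outside about [-32768, 32768) do not fit. */
static q16_16_t q16_from_float(float x) {
    return (q16_16_t)(x * (float)Q16_ONE + (x >= 0.0f ? 0.5f : -0.5f));
}

static float q16_to_float(q16_16_t q) {
    return (float)q / (float)Q16_ONE;
}

/* Widen to 64 bits so the 32.32 product cannot overflow, then shift out the
 * extra 16 fractional bits (arithmetic shift on mainstream compilers). */
static q16_16_t q16_mul(q16_16_t a, q16_16_t b) {
    return (q16_16_t)(((int64_t)a * (int64_t)b) >> 16);
}
```

For example, 1.5 becomes 98304 and -2.25 becomes -147456; q16_mul of the two yields -221184, which q16_to_float maps back to -3.375. The resolution is 1/65536 (about 1.5e-5), so the conversion trades some precision and range for integer arithmetic.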

Benchmark Results (ARM64)

Matrix Size	Scalar	NEON	Speedup
64×64	0.16ms	0.04ms	4.48x
128×128	1.52ms	0.38ms	4.03x
256×256	15.2ms	3.76ms	4.04x
512×512	144ms	36ms	4.00x
1024×1024	1278ms	319ms	4.01x
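One plausible reading of the steady ~4x ratio is lane width: a 128-bit NEON register holds four 32-bit Q16.16 values. The sketch below is a hypothetical Q16.16 dot product (q16_dot is not from the PR, and the real matmul_neon() in simd_ops.c may be structured differently) that loads four lanes per iteration and uses widening multiplies on AArch64, with a scalar loop everywhere else, such as on x86_64:

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Dot product of two Q16.16 vectors, result in Q16.16 (hypothetical helper).
 * Products are accumulated in 64 bits and shifted once at the end. */
static int32_t q16_dot(const int32_t *a, const int32_t *b, size_t n) {
    int64_t acc = 0;
    size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
    int64x2_t acc_v = vdupq_n_s64(0);
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i); /* four int32 lanes */
        int32x4_t vb = vld1q_s32(b + i);
        /* Widening multiplies: lanes 0-1, then lanes 2-3, into int64x2. */
        acc_v = vaddq_s64(acc_v, vmull_s32(vget_low_s32(va), vget_low_s32(vb)));
        acc_v = vaddq_s64(acc_v, vmull_high_s32(va, vb));
    }
    acc = vaddvq_s64(acc_v); /* horizontal add of the two 64-bit lanes */
#endif
    /* Scalar tail, or the whole loop on targets without NEON. */
    for (; i < n; i++)
        acc += (int64_t)a[i] * (int64_t)b[i];
    return (int32_t)(acc >> 16);
}
```

Accumulating full 64-bit products and shifting once avoids rounding after every multiply. A matmul built on this would call it once per output element; vld1q_s32 needs contiguous data, so the weight column would have to be stored contiguously (for example, from a transposed weight matrix).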

Testing

✅ Compiles for x86_64
✅ Compiles for aarch64 (cross-compile)
✅ Benchmarked on Apple Silicon

Code Diff

```c
// BEFORE (scalar):
tensor_gemm(in_data, weight_data, out_data, ...);

// AFTER (SIMD):
matmul_neon(in_fixed, weight_fixed, out_fixed, ...);
```
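The BEFORE/AFTER call swap hides the buffer handling around it. A hedged sketch of the float→Q16.16→matmul→float path with the scalar fallback on allocation failure, assuming row-major [M x K]·[K x N] shapes, malloc()/free() in place of whatever allocator the kernel uses, and hypothetical scalar stand-ins gemm_f32() and matmul_q16() for tensor_gemm() and matmul_neon() (their real arguments are elided in the diff):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for tensor_gemm(): out[MxN] = in[MxK] * w[KxN]. */
static void gemm_f32(const float *in, const float *w, float *out,
                     size_t M, size_t K, size_t N) {
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++) {
            float s = 0.0f;
            for (size_t k = 0; k < K; k++)
                s += in[i * K + k] * w[k * N + j];
            out[i * N + j] = s;
        }
}

/* Hypothetical scalar stand-in for matmul_neon(), in Q16.16. */
static void matmul_q16(const int32_t *in, const int32_t *w, int32_t *out,
                       size_t M, size_t K, size_t N) {
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++) {
            int64_t acc = 0;
            for (size_t k = 0; k < K; k++)
                acc += (int64_t)in[i * K + k] * w[k * N + j];
            out[i * N + j] = (int32_t)(acc >> 16);
        }
}

/* float -> Q16.16, rounding half away from zero. */
static int32_t to_q16(float x) {
    return (int32_t)(x * 65536.0f + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Returns 1 if the fixed-point path ran, 0 if it fell back to float. */
static int dense_forward_sketch(const float *in, const float *w, float *out,
                                size_t M, size_t K, size_t N) {
    int32_t *in_q  = malloc(M * K * sizeof *in_q);
    int32_t *w_q   = malloc(K * N * sizeof *w_q);
    int32_t *out_q = malloc(M * N * sizeof *out_q);
    if (!in_q || !w_q || !out_q) {
        /* Graceful fallback: release whatever was allocated, use floats. */
        free(in_q); free(w_q); free(out_q);
        gemm_f32(in, w, out, M, K, N);
        return 0;
    }
    for (size_t i = 0; i < M * K; i++) in_q[i] = to_q16(in[i]);
    for (size_t i = 0; i < K * N; i++) w_q[i] = to_q16(w[i]);
    matmul_q16(in_q, w_q, out_q, M, K, N);
    for (size_t i = 0; i < M * N; i++) out[i] = (float)out_q[i] / 65536.0f;
    free(in_q); free(w_q); free(out_q);
    return 1;
}
```

Freeing all three pointers on the failure path is safe because free(NULL) is a no-op, so a partial allocation failure needs no special casing. The conversion passes are O(MK + KN + MN) while the multiply is O(MKN), so their relative cost shrinks as the matrices grow.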

Replace scalar tensor_gemm() with SIMD-accelerated matmul_neon() for ~4x faster matrix multiplication on ARM64.

Changes:
- Add fixed_point.h include for Q16.16 format
- Convert float→fixed→SIMD→float in tensor_dense_forward()
- Graceful fallback to scalar on allocation failure

Benchmark results (Apple Silicon ARM64):
- 64x64: 4.48x speedup
- 256x256: 4.04x speedup
- 512x512: 4.00x speedup
- 1024x1024: 4.01x speedup

Tested on: macOS ARM64, cross-compiled for aarch64-elf

dddimcha merged 1 commit (a840eb8) from simd/neon-matmul-optimization into main on Feb 2, 2026
2 of 4 checks passed

dddimcha deleted the simd/neon-matmul-optimization branch February 2, 2026 19:24
