
[PERFORMANCE] Optimize some additional small convolution paths #297

Merged: sdatkinson merged 1 commit into main from conv-implementations on Jun 23, 2026


@sdatkinson (Owner)

Summary

Adds specialized fast paths for additional small convolution shapes:

  • Adds unrolled Conv1D paths for 8x4 and 1x4 channel layouts.
  • Expands Conv1x1 optimized paths for 4x4 with fused bias, 4x6, and 8x6.
  • Enables NAM_USE_INLINE_GEMM for tool/test targets that exercise these paths.
  • Adds reference-comparison tests for the new optimized convolution cases.
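
For readers unfamiliar with this kind of fast path, here is a minimal sketch of what an unrolled 4x4 1x1 convolution with fused bias might look like next to a generic loop. It is an illustration only, not the code in this PR: the function names, the frame-major buffer layout, and the row-major weight layout are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generic reference path with runtime channel counts.
// Layout (an assumption for this sketch): frame t occupies
// in[t * in_ch .. t * in_ch + in_ch - 1]; weights are w[o * in_ch + i].
void conv1x1_reference(const std::vector<float>& w, const std::vector<float>& bias,
                       const std::vector<float>& in, std::vector<float>& out,
                       std::size_t in_ch, std::size_t out_ch, std::size_t frames)
{
  assert(w.size() == in_ch * out_ch && bias.size() == out_ch);
  assert(in.size() == in_ch * frames && out.size() == out_ch * frames);
  for (std::size_t t = 0; t < frames; ++t)
    for (std::size_t o = 0; o < out_ch; ++o)
    {
      float acc = bias[o];
      for (std::size_t i = 0; i < in_ch; ++i)
        acc += w[o * in_ch + i] * in[t * in_ch + i];
      out[t * out_ch + o] = acc;
    }
}

// Hypothetical 4x4 fast path: both channel loops are unrolled by hand and
// the bias is added in the same expression instead of in a second pass.
void conv1x1_4x4_fused_bias(const float* w, const float* bias, const float* in,
                            float* out, std::size_t frames)
{
  for (std::size_t t = 0; t < frames; ++t)
  {
    const float* x = in + 4 * t;
    float* y = out + 4 * t;
    // Load the four inputs once so every output row reuses them.
    const float x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
    y[0] = bias[0] + w[0] * x0 + w[1] * x1 + w[2] * x2 + w[3] * x3;
    y[1] = bias[1] + w[4] * x0 + w[5] * x1 + w[6] * x2 + w[7] * x3;
    y[2] = bias[2] + w[8] * x0 + w[9] * x1 + w[10] * x2 + w[11] * x3;
    y[3] = bias[3] + w[12] * x0 + w[13] * x1 + w[14] * x2 + w[15] * x3;
  }
}

// Element-wise comparison with an absolute tolerance.
bool all_close(const std::vector<float>& a, const std::vector<float>& b, float tol)
{
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (std::fabs(a[i] - b[i]) > tol)
      return false;
  return true;
}
```

The comparison uses a tolerance rather than exact equality because a compiler may contract or reorder the unrolled arithmetic differently from the loop version.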

Testing

  • Added coverage in tools/test/test_conv1d.cpp for:
    • 8x4, kernel size 6, dilation 3
    • 1x4, kernel size 16
  • Added coverage in tools/test/test_conv_1x1.cpp for:
    • 4x6
    • 8x6
    • 4x4 with bias
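
A reference-comparison test of this kind can be sketched as follows. This is not the code in tools/test/test_conv1d.cpp; it is a standalone illustration with hypothetical names. It assumes that "8x4" means 8 input channels and 4 output channels, a causal convolution with zero padding on the left, channel-major buffers, and weights indexed as w[(k * out_ch + o) * in_ch + i]. The "fast" variant simply fixes all sizes at compile time so the compiler can unroll the inner loops.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference: runtime-sized causal dilated convolution.
// Buffers are channel-major: x[c * frames + t].
std::vector<float> conv1d_reference(const std::vector<float>& w, const std::vector<float>& x,
                                    std::size_t in_ch, std::size_t out_ch, std::size_t kernel,
                                    std::size_t dilation, std::size_t frames)
{
  std::vector<float> y(out_ch * frames, 0.0f);
  for (std::size_t t = 0; t < frames; ++t)
    for (std::size_t k = 0; k < kernel; ++k)
    {
      // Tap k looks back (kernel - 1 - k) * dilation frames.
      const std::size_t lag = (kernel - 1 - k) * dilation;
      if (lag > t)
        continue; // before the start of the buffer: treated as zero
      for (std::size_t o = 0; o < out_ch; ++o)
        for (std::size_t i = 0; i < in_ch; ++i)
          y[o * frames + t] += w[(k * out_ch + o) * in_ch + i] * x[i * frames + t - lag];
    }
  return y;
}

// Hypothetical specialized path: same arithmetic, sizes known at compile time.
template <std::size_t InCh, std::size_t OutCh, std::size_t Kernel, std::size_t Dilation>
std::vector<float> conv1d_fixed(const std::vector<float>& w, const std::vector<float>& x,
                                std::size_t frames)
{
  std::vector<float> y(OutCh * frames, 0.0f);
  for (std::size_t t = 0; t < frames; ++t)
  {
    float acc[OutCh] = {}; // per-frame accumulators stay in registers
    for (std::size_t k = 0; k < Kernel; ++k)
    {
      const std::size_t lag = (Kernel - 1 - k) * Dilation;
      if (lag > t)
        continue;
      for (std::size_t o = 0; o < OutCh; ++o)
        for (std::size_t i = 0; i < InCh; ++i)
          acc[o] += w[(k * OutCh + o) * InCh + i] * x[i * frames + t - lag];
    }
    for (std::size_t o = 0; o < OutCh; ++o)
      y[o * frames + t] = acc[o];
  }
  return y;
}

// Largest absolute element-wise difference; sizes must match.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b)
{
  assert(a.size() == b.size());
  float worst = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i)
    worst = std::fmax(worst, std::fabs(a[i] - b[i]));
  return worst;
}
```

A test would then fill the weights and input with deterministic values, run both functions for a shape such as conv1d_fixed<8, 4, 6, 3> or conv1d_fixed<1, 4, 16, 1>, and require max_abs_diff to stay below a small tolerance.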

@sdatkinson sdatkinson merged commit dd972d6 into main Jun 23, 2026
4 checks passed
@sdatkinson sdatkinson deleted the conv-implementations branch June 23, 2026 22:01
