Spatz Platform by Bumblebee00 · Pull Request #168 · pulp-platform/Deeploy

Bumblebee00 · 2026-03-02T18:42:24Z

This pr aims to add Spatz platform

Added

Deeploy/Targets/Spatz/Deployer.py, Platform.py have been added (but for now SpatzMapping contains only the mapping of Add that is from Deeploy.Targets.Generic.Bindings)
DeeployTest/Platforms/Spatz/main.c, DeeployTest/testUtils/platformMapping.py to be able to generate c code. The generated c code I tested it compiling manually and executing on the Questa Sim-64 simulator

Changed

nothing

Fixed

nothing

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

…ctions

… version

…env in the makefile

Bumblebee00 · 2026-06-22T08:39:14Z

ok i deleted the unnecesary tests tell me if its ok

bowwwang

Hello Mattia, thank you for the efforts. Here is the second round of the review.

After the fixes, you could enable the CI checks to see if your implementation can pass all the test. It would also make sense to first rebase it to the latest commit.

bowwwang · 2026-06-22T14:04:12Z

+    --gcc-toolchain=/usr/pack/riscv-1.0-kgf/spatz-gcc-7.1.1
+)
+
+# User command linked: -lm -lgcc -lm -lgcc libsnRuntime-cluster.a


remove the comments if unused

bowwwang · 2026-06-22T14:05:09Z

 width = int(data_in_type.referencedType.typeWidth/8)
 %>
 BEGIN_SINGLE_CORE
+% if num_indices == 1:


why do we have this case here? what is the meaning of num_indices?

this is to not generate a loop with one iteration:

for (uint32_t j=0; j<1; ++j) {

is cleaner c code

bowwwang · 2026-06-22T14:06:52Z

+    _waitTemplate = NodeTemplate("if (snrt_is_dm_core()) snrt_dma_wait_all();")
+
+
+# LMACAN: TODO: Add single transfer waiting


Is this something still in progress?

bowwwang · 2026-06-22T14:07:23Z

+        2:
+            NodeTemplate("""
+            if (snrt_is_dm_core()) {
+                ${future} = snrt_dma_start_2d(${dest}, ${src}, ${size}, ${stride_dest}, ${stride_src}, ${repeat});


Is it by default using 2D-DMA now? If so why?

yes the function snrt_dma_start_2d is used for all nodes. The reason is that to transfer tiles of matrices you need to have a stride. For the other nodes where stride is not needed, repeat is set to 1

bowwwang · 2026-06-22T14:08:29Z

+% else:
+for (uint32_t i=0; i<${batch}; ++i) {
+    for (uint32_t j=0; j<${num_indices}; ++j) {
+        memcpy(${data_out} + i * (${num_indices} * ${axis_length}) + j * ${axis_length}, 


can we align the usage of memcpy and DMA? why sometimes DMA is used but here is memcpy?

those are the templates for the old versions of the gather node, not used anymore, now the memcpy should never be used. I will remove now those old templates

bowwwang · 2026-06-22T14:19:50Z

+  }
+}
+
+void gemv_col_reduction(float *a, float *b, float *c, int N, int P) {


please make the function number easy to inpertret. You can specify it is either innor_product, gustv, or outer_product, as we discussed eariler

bowwwang · 2026-06-22T14:20:47Z

+  }
+}
+
+void matmul_naive_vanilla(float *a, float *b, float *c, int M, int N, int P) {


good to keep this one, change the name of the kernel to matmul_vanila

bowwwang · 2026-06-22T14:21:14Z

+
+
+
+void Spatz_MatMul_fp32_fp32_fp32(const float32_t *__restrict__ a,


align the function usage here accordingly

bowwwang · 2026-06-22T14:22:29Z

@@ -0,0 +1,202 @@
+#include "DeeployBasicMath.h"


could you please keep a vanila version of the softmax, and one with the best performance? The rest can be removed. We want to keep a clean and concise codebase if possible.
If certain kernel has a special reason to keep, please specify in a comment.

actually i have only one version of softmax with the best performance

bowwwang · 2026-06-22T14:23:33Z

@@ -0,0 +1,81 @@
+name: /scratch/mmm/.conda/envs/deeploy_conda_venv2


make be change the name of this file to a more generic one without your usrname

done but for me we can also remove this file

Bumblebee00 · 2026-06-22T19:13:53Z

i should rebase on latest commit of devel or main?

Victor-Jung · 2026-06-23T07:08:23Z

i should rebase on latest commit of devel or main?

Devel please 😁

coderabbitai · 2026-06-23T07:26:11Z

📝 Walkthrough

Walkthrough

This PR introduces a new Spatz RISC-V vector processor deployment target for the Deeploy framework. It adds a generic TopK operator (parser, type checker, template, bindings), implements Spatz-specific C kernels (float32 MatMul, Softmax, TopK), defines DMA/tiling infrastructure, and provides a complete test harness with CMake build support.

Changes

Spatz Platform and TopK Operator

Layer / File(s)	Summary
Generic TopK operator: parser, type checker, template, bindings, and platform wiring `Deeploy/Targets/Generic/Parsers.py`, `Deeploy/Targets/Generic/TypeCheckers.py`, `Deeploy/Targets/Generic/Layers.py`, `Deeploy/Targets/Generic/Bindings.py`, `Deeploy/Targets/Generic/Templates/TopKTemplate.py`, `Deeploy/Targets/Generic/Templates/GatherTemplate.py`, `Deeploy/Targets/Generic/Platform.py`	Adds `TopKParser`, `TopKChecker`, `TopKLayer`, `BasicTopKBindings`, a selection-sort `referenceTemplate`, and wires them into `GenericMapping`. Also generalizes `GatherParser` to handle multi-index tensors and updates `GatherTemplate` with a conditional multi-index loop.
Tiling extension fixes: alignment, multi-output, ConstantBuffer, and docs `Deeploy/TilingExtension/MemoryScheduler.py`, `Deeploy/TilingExtension/TileConstraint.py`, `Deeploy/TilingExtension/TilerModel.py`, `Deeploy/TilingExtension/TilerExtension.py`, `Deeploy/TilingExtension/TilingCodegen.py`	Increases `byteAlignment` from 4 to 8, relaxes the single-output assertion in `wrapTilingSolution`, skips `ConstantBuffer` in heuristics and element-count model, scales MiniMalloc I/O in alignment units, and adds comprehensive docstrings across the tiling extension.
Spatz C kernels: MatMul, Softmax, TopK, and public header `TargetLibraries/Spatz/src/MatMul_fp32_spatz.c`, `TargetLibraries/Spatz/src/Softmax_fp32_spatz.c`, `TargetLibraries/Spatz/src/TopK_fp32_int32_spatz.c`, `TargetLibraries/Spatz/src/Util.c`, `TargetLibraries/Spatz/inc/DeeploySpatzMath.h`, `TargetLibraries/Spatz/inc/Util.h`, `TargetLibraries/Spatz/CMakeLists.txt`, `TargetLibraries/Spatz/cmake/spatz-runtime-precompiled.cmake`	Implements `Spatz_MatMul_fp32_fp32_fp32` (GEMV/matmul dispatch using RISC-V vector assembly), `Spatz_Softmax_fp32_fp32` (two-core parallel with custom `exp`/`inv` approximations), and `compute_topk_min_heap`/`compute_topk_vector_instructions`; exposes all via `DeeploySpatzMath.h` and builds as `deeployspatz` static library.
Spatz code-gen templates: alloc/free, MatMul, Softmax, TopK, Gather DMA `Deeploy/Targets/Spatz/Templates/AllocateTemplate.py`, `Deeploy/Targets/Spatz/Templates/FreeTemplate.py`, `Deeploy/Targets/Spatz/Templates/MatMulTemplate.py`, `Deeploy/Targets/Spatz/Templates/SoftmaxTemplate.py`, `Deeploy/Targets/Spatz/Templates/TopKTemplate.py`, `Deeploy/Targets/Spatz/Templates/GatherTemplate.py`	Adds `NodeTemplate` strings for L1/L3 allocation, signed-integer and fp32 MatMul, float softmax tiling, TopK (selection-sort and min-heap call wrappers), and a DMA multi-index Gather template (`dynamicDMAtemplate`).
Spatz DMA, bindings, benchmarking, and code-transformation pipelines `Deeploy/Targets/Spatz/DMA/SpatzDma.py`, `Deeploy/Targets/Spatz/CodeTransformationPasses/Benchmarking.py`, `Deeploy/Targets/Spatz/Bindings.py`	Introduces `SpatzDma` (`AsyncDma` subclass with `SnitchFuture`, 2D DMA template, contiguous-stride enforcement), `SpatzBenchmarkInnerPass`/`SpatzBenchmarkOuterPass`, and `BasicTransformer`/`TiledTransformer` pipelines, plus per-operator bindings (`SpatzMatMulBindings`, `SpatzGatherBindings`, `SpatzTopKBindings`, `SpatzSoftmaxBindings`).
Spatz tile constraints: MatMul, Gather, TopK, Softmax `Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py`, `Deeploy/Targets/Spatz/TileConstraints/GatherTileConstraint.py`, `Deeploy/Targets/Spatz/TileConstraints/TopKTileConstraint.py`, `Deeploy/Targets/Spatz/TileConstraints/SoftmaxTileConstraint.py`, `Deeploy/Targets/Spatz/Tiler.py`	Implements `TileConstraint` subclasses for all four Spatz operators (geometric, policy, and serialization logic), then assembles four `TilingReadyNodeBindings` in `Tiler.py`.
Spatz platform, deployer, and buffer classes `Deeploy/Targets/Spatz/Platform.py`, `Deeploy/Targets/Spatz/Deployer.py`	Defines `SpatzPlatform`, `SpatzEngine`, `SpatzOptimizer`, four `SpatzBuffer` subclasses with Spatz alloc/free templates, `SpatzMapping`, and `SpatzDeployer` extending `SignPropDeployer`.
Build system: CMake, toolchain, Makefile, and simulation macros `CMakeLists.txt`, `cmake/spatz/spatz.cmake`, `cmake/spatz/toolchain_llvm.cmake`, `cmake/simulation.cmake`, `Makefile`	Extends `CMakeLists.txt` with a Spatz platform branch; adds `spatz.cmake` (compile defs, `add_spatz_vsim_simulation`), `toolchain_llvm.cmake` (RISC-V LLVM/Clang flags, lld, newlib), `add_spatz_gvsoc_emulation` macro; updates Makefile with SPATZ clone/build/runtime targets and updated GVSOC dependencies.
Test harness, platform mapping, runners, and test ONNX models `DeeployTest/CMakeLists.txt`, `DeeployTest/Platforms/Spatz/CMakeLists.txt`, `DeeployTest/Platforms/Spatz/main.c`, `DeeployTest/testUtils/platformMapping.py`, `DeeployTest/testUtils/deeployRunner.py`, `DeeployTest/testMVP.py`, `DeeployTest/testUtils/codeGenerate.py`, `DeeployTest/testUtils/core/execution.py`, `DeeployTest/deeployRunner_spatz.py`, `DeeployTest/deeployRunner_tiled_spatz.py`, `DeeployTest/test_platforms.py`, `DeeployTest/Tests/Kernels/FP32/...`	Adds Spatz test `main.c` (DM-core startup, zero-copy input binding via weak hook, cycle measurement, float/int output validation), CMake test wiring, `deeployRunner_spatz.py`/`_tiled` entry points, `mapPlatform`/`mapDeployer` Spatz branches, two-level (L3/L1) memory hierarchy, streaming simulation output, `DeeployNetwork_BindExternalInputs` codegen, and new ONNX test models.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

pulp-platform/Deeploy#105: Shares the same tiling infrastructure (TilerExtension, MemoryScheduler, TileConstraint) that this PR modifies for Spatz alignment and multi-output support.
pulp-platform/Deeploy#178: Both PRs modify DeeployTest/testUtils/core/execution.py, specifically the run_simulation subprocess handling.

Suggested labels

Feature

Suggested reviewers

Victor-Jung
Xeratec

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.01% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description is directly related to the changeset, outlining the Spatz platform integration effort including added files (Deployer.py, Platform.py, main.c) and test infrastructure modifications that are present in the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title 'Spatz Platform' is clear and directly reflects the main objective of this PR, which is to integrate the Spatz platform into Deeploy with extensive supporting infrastructure.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

♻️ Duplicate comments (1)

Deeploy/Targets/Spatz/Deployer.py (1)

31-38: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Forward inputOffsets to preserve sign-propagation.

Line 31 calls SignPropDeployer.__init__ without inputOffsets, so any caller-provided offsets are dropped. This changes input signedness inference for Spatz deployments.

Suggested fix

         super().__init__(graph,
                          deploymentPlatform,
                          inputTypes,
                          loweringOptimizer,
                          scheduler,
                          name,
                          default_channels_first = default_channels_first,
-                         deeployStateDir = deeployStateDir)
+                         deeployStateDir = deeployStateDir,
+                         inputOffsets = inputOffsets)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/Deployer.py` around lines 31 - 38, The
SpatzDeployer.__init__ method is not forwarding the inputOffsets parameter to
the parent SignPropDeployer class via the super().__init__() call. Add
inputOffsets as a parameter to the super().__init__() call alongside the
existing parameters (graph, deploymentPlatform, inputTypes, loweringOptimizer,
scheduler, name, default_channels_first, and deeployStateDir) to ensure that
caller-provided input offsets are properly propagated to the parent class and
preserve the sign-propagation behavior.

🟠 Major comments (20)

Deeploy/Targets/Spatz/Templates/AllocateTemplate.py-17-20 (1)

17-20: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Use the same byte-size formula in the fallback allocation path.

The else branch switches from sizeof(...) * size to typeWidth//8 * size. That can allocate too few bytes and cause buffer overruns for width/ABI mismatches.

Suggested fix

 % else:
 // COMPILER WARNING — unsupported memory level ${_memoryLevel}, defaulting to L3
-${name} = (${type.typeName}) snrt_l3alloc(${type.referencedType.typeWidth//8} * ${size});
+${name} = (${type.typeName}) snrt_l3alloc(sizeof(${type.referencedType.typeName}) * ${size});
 % endif

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/Templates/AllocateTemplate.py` around lines 17 - 20,
The fallback allocation path in the else branch uses a different byte-size
formula than the primary path. In the snrt_l3alloc call within the else block,
replace the calculation formula that uses typeWidth//8 with the same
sizeof-based formula that should be used in the non-fallback allocation path to
ensure consistent byte-size calculation and prevent potential buffer overruns
due to width/ABI mismatches.

Deeploy/Targets/Generic/Templates/TopKTemplate.py-16-16 (1)

16-16: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Avoid allocating pairs[${data_in_size}] on the stack.

Line 16 uses an O(N) VLA on stack; realistic tensor sizes can exhaust stack and crash generated kernels.

Suggested direction

- topk_pair_t pairs[${data_in_size}];
+ // Prefer an O(k) approach (min-heap/selection buffer) or a managed transient buffer
+ // instead of a stack VLA proportional to input size.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Templates/TopKTemplate.py` at line 16, Replace the
stack-allocated Variable Length Array (VLA) declaration `topk_pair_t
pairs[${data_in_size}]` with dynamic heap allocation to prevent stack overflow
crashes with large tensor sizes. Allocate the pairs array dynamically on the
heap using appropriate memory allocation functions and ensure proper
deallocation before the function returns to avoid memory leaks.

Deeploy/Targets/Generic/Bindings.py-333-336 (1)

333-336: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

TopK binding uses 8-bit integers for k and indices, which truncates valid values.

Lines 334-335 cap both k and output indices to int8_t. This will corrupt indices for typical tensor sizes and reject/overflow larger k.

Suggested fix

         TopKChecker(
-            [PointerClass(float32_t), PointerClass(int8_t)], # inputs
-            [PointerClass(float32_t), PointerClass(int8_t)] # outputs
+            [PointerClass(float32_t), PointerClass(int32_t)], # inputs
+            [PointerClass(float32_t), PointerClass(int32_t)] # outputs
         ),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Bindings.py` around lines 333 - 336, The TopKChecker
binding is using int8_t (8-bit signed integers) for both the k parameter and the
output indices, which is too restrictive and will cause data corruption for
typical tensor operations. Replace the PointerClass(int8_t) references in the
TopKChecker constructor with PointerClass(int32_t) or PointerClass(int64_t) to
allow for larger k values and tensor indices that can exceed the range of an
8-bit integer.

Deeploy/Targets/Generic/Templates/GatherTemplate.py-19-22 (1)

19-22: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Add index normalization/bounds checks in multi-index gather path.

Line 21 dereferences ${indices}[j] directly. Negative or out-of-range values can produce invalid memcpy source addresses.

Suggested fix

 for (uint32_t i=0; i<${batch}; ++i) {
     for (uint32_t j=0; j<${num_indices}; ++j) {
+        int32_t idx = ${indices}[j];
+        const int32_t axis_dim = (${batch_length} / ${axis_length});
+        if (idx < 0) idx += axis_dim;
+        if (idx < 0 || idx >= axis_dim) { continue; }
         memcpy(${data_out} + i * (${num_indices} * ${axis_length}) + j * ${axis_length},
-               ${data_in} + i * ${batch_length} + ${indices}[j] * ${axis_length},
+               ${data_in} + i * ${batch_length} + idx * ${axis_length},
                ${axis_length} * ${width});
     }
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Templates/GatherTemplate.py` around lines 19 - 22,
The loop in the memcpy operation directly dereferences ${indices}[j] without
validating that the index value is within valid bounds or handling negative
values. Add index normalization and bounds checking before using ${indices}[j]
to access memory. Validate that each index value is non-negative and within the
valid range (0 to the appropriate dimension size) to prevent invalid memcpy
source addresses. Apply this validation within the for loop that iterates with j
before the memcpy call that uses ${indices}[j].

Deeploy/Targets/Generic/Parsers.py-1003-1007 (1)

1003-1007: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Normalize Gather axis before deriving strides.

Line 1005 treats any axis <= 0 as batch 1. For negative axes this computes invalid offsets (batch, batch_length, axis_length) and can produce out-of-bounds addresses in generated gather code.

Suggested fix

-        axis = self.operatorRepresentation['axis']
-        shape = ctxt.lookup(node.inputs[0].name).shape
+        axis = self.operatorRepresentation['axis']
+        shape = ctxt.lookup(node.inputs[0].name).shape
+        rank = len(shape)
+        if axis < 0:
+            axis += rank
+        if axis < 0 or axis >= rank:
+            return ctxt, False
+        self.operatorRepresentation['axis'] = axis
         self.operatorRepresentation['batch'] = int(np.prod(shape[:axis])) if axis > 0 else 1
         self.operatorRepresentation['batch_length'] = int(np.prod(shape[axis:]))
         self.operatorRepresentation['axis_length'] = int(np.prod(shape[axis + 1:])) if axis + 1 < len(shape) else 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Parsers.py` around lines 1003 - 1007, Negative axes
are not being normalized before calculating batch, batch_length, and axis_length
values, causing invalid offset computations. In the code block starting with the
axis assignment, normalize the axis value to its positive equivalent when it is
negative (by adding len(shape) to negative axis values, similar to Python's
indexing behavior) before using it in the subsequent calculations for batch,
batch_length, and axis_length.

Deeploy/Targets/Generic/Parsers.py-2914-2916 (1)

2914-2916: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

TopK parsing currently flattens the full input tensor without a shape/axis guard.

Line 2915 collapses every dimension (np.prod(data_in.shape)), while this implementation path is 1D-only. For non-1D inputs this silently changes TopK semantics.

Suggested fix

         self.operatorRepresentation['data_in'] = data_in.name
+        if len(data_in.shape) != 1:
+            return ctxt, False
         self.operatorRepresentation['data_in_size'] = int(np.prod(data_in.shape))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Parsers.py` around lines 2914 - 2916, The TopK
parsing logic in the section setting operatorRepresentation for data_in and
data_in_size does not validate input dimensions. Add a guard to ensure the input
tensor is 1D before calculating data_in_size using np.prod(data_in.shape), since
this implementation path assumes 1D inputs. For non-1D tensors, either validate
and raise an appropriate error or adjust the logic to handle the correct TopK
semantics based on the actual input shape. Reference the data_in.shape check
before the line containing int(np.prod(data_in.shape)) to properly validate
input dimensionality.

Deeploy/TilingExtension/TilerExtension.py-443-448 (1)

443-448: 🩺 Stability & Availability | 🟠 Major

Add a timeout to the MiniMalloc subprocess call.

Line 443 invokes an external allocator without a timeout. If minimalloc hangs, the tiling flow can block indefinitely.

🔧 Proposed fix

-        minimallocOutput = subprocess.run([
-            f"{minimallocInstallDir}/minimalloc", f"--capacity={capacity // alignment}",
-            f"--input={self._minimalloc_input}.csv", f"--output={self._minimalloc_output}.csv"
-        ],
-                                          capture_output = True,
-                                          text = True)
+        try:
+            minimallocOutput = subprocess.run([
+                f"{minimallocInstallDir}/minimalloc", f"--capacity={capacity // alignment}",
+                f"--input={self._minimalloc_input}.csv", f"--output={self._minimalloc_output}.csv"
+            ],
+                                              capture_output = True,
+                                              text = True,
+                                              timeout = 60)
+        except subprocess.TimeoutExpired as exc:
+            log.error(f"Memory allocator timed out after {exc.timeout}s at memory level {memoryLevel}.")
+            raise

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/TilingExtension/TilerExtension.py` around lines 443 - 448, The
subprocess.run() call to the minimalloc allocator lacks a timeout parameter,
which can cause the tiling flow to block indefinitely if the process hangs. Add
a timeout parameter to the subprocess.run() invocation that specifies a
reasonable maximum duration for the minimalloc execution. Choose an appropriate
timeout value based on expected execution time for the allocator, and ensure
that if the timeout is exceeded, the subprocess is properly terminated and an
appropriate error is handled.

Deeploy/Targets/Spatz/Platform.py-110-117 (1)

110-117: 🩺 Stability & Availability | 🟠 Major

Avoid shared engine state via default arguments.

Line 111 creates one SpatzEngine at definition time and reuses it for every SpatzPlatform() instance. Because NodeMapper objects in the engine's mapping are stateful (containing mutable binder, bound, and discardedBindings fields), this leaks binding state across deployments and makes compilation behavior order-dependent.

Suggested fix

 class SpatzPlatform(DeploymentPlatform):
 
     def __init__( self,
-        engines = [SpatzEngine("SpatzVectorProcessor")],
+        engines = None,
         variableBuffer = SpatzVariableBuffer,
         transientBuffer = SpatzTransientBuffer,
         constantBuffer = SpatzConstantBuffer,
         structBuffer = SpatzStructBuffer,
         includeList: List[str] = includeList
     ):
+        if engines is None:
+            engines = [SpatzEngine("SpatzVectorProcessor")]
         super().__init__(engines, variableBuffer, constantBuffer, structBuffer, transientBuffer)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/Platform.py` around lines 110 - 117, The default
argument engines = [SpatzEngine("SpatzVectorProcessor")] in the __init__ method
creates a shared mutable list at class definition time, causing all instances of
SpatzPlatform to reuse the same SpatzEngine object and its stateful NodeMapper
bindings. Fix this by changing the engines default parameter to None, then
inside the __init__ method body, check if engines is None and instantiate a
fresh SpatzEngine list only when needed, ensuring each SpatzPlatform instance
gets its own independent engine with clean state.

Deeploy/Targets/Spatz/TileConstraints/TopKTileConstraint.py-38-40 (1)

38-40: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Guard TopK shape handling for scalar tensors.

At Line 38 and Line 66, scalar shape: int will raise when iterated or passed to len().

🔧 Suggested patch

-            for idx, shapeDim in enumerate(_buffer.shape):
+            shape = (_buffer.shape,) if isinstance(_buffer.shape, int) else tuple(_buffer.shape)
+            for idx, shapeDim in enumerate(shape):
                 tileDimVar = tilerModel.getTensorDimVar(tensorName = tensorName, dimIdx = idx)
                 tilerModel.addConstraint(tileDimVar == shapeDim)
@@
-        dataInCube = HyperRectangle(offset = (0,) * len(dataInBuffer.shape), dims = tuple(dataInBuffer.shape))
+        dataInShape = (dataInBuffer.shape,) if isinstance(dataInBuffer.shape, int) else tuple(dataInBuffer.shape)
+        dataInCube = HyperRectangle(offset = (0,) * len(dataInShape), dims = dataInShape)

Also applies to: 66-67

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/TileConstraints/TopKTileConstraint.py` around lines 38
- 40, The code iterates over _buffer.shape without guarding for scalar tensors,
which will fail when shape is an integer rather than a tuple or list. Add a
guard condition in the loop that starts at line 38 (the for loop iterating
through enumerate(_buffer.shape)) to check if the shape is scalar (an int) and
handle it appropriately, potentially by wrapping it in a tuple or conditionally
processing it. Apply the same guard logic to the similar code block at line 66
that also iterates over shape dimensions to ensure consistency across both
locations where tensor shapes are processed.

Deeploy/Targets/Spatz/TileConstraints/SoftmaxTileConstraint.py-38-40 (1)

38-40: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Normalize scalar shapes in Softmax tiling paths.

At Line 38 and Line 65, scalar shapes (int) will fail with enumerate()/len(); this should be normalized first.

🔧 Suggested patch

-            for idx, shapeDim in enumerate(_buffer.shape):
+            shape = (_buffer.shape,) if isinstance(_buffer.shape, int) else tuple(_buffer.shape)
+            for idx, shapeDim in enumerate(shape):
                 tileDimVar = tilerModel.getTensorDimVar(tensorName = tensorName, dimIdx = idx)
                 tilerModel.addConstraint(tileDimVar == shapeDim)
@@
-        dataInCube = HyperRectangle(offset = (0,) * len(dataInBuffer.shape), dims = tuple(dataInBuffer.shape))
+        dataInShape = (dataInBuffer.shape,) if isinstance(dataInBuffer.shape, int) else tuple(dataInBuffer.shape)
+        dataInCube = HyperRectangle(offset = (0,) * len(dataInShape), dims = dataInShape)

Also applies to: 65-66

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/TileConstraints/SoftmaxTileConstraint.py` around lines
38 - 40, The code attempts to enumerate over _buffer.shape which may be a scalar
integer value instead of an iterable, causing the enumerate() call to fail.
Before the for loop that uses enumerate(_buffer.shape), normalize scalar shapes
by converting them to a tuple format (scalar values should be wrapped as a tuple
with a single element). Apply this normalization in both locations where
enumerate() is used with shape data, specifically where the for loop iterates
over _buffer.shape at the start of the constraint addition logic and at the
second occurrence around line 65.

Deeploy/Targets/Spatz/TileConstraints/GatherTileConstraint.py-39-40 (1)

39-40: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle scalar tensor shapes before iterating or taking len().

At Line 39 and Line 70, shape is treated as iterable. Deeploy tensors can carry scalar shapes as int, which will crash here.

🔧 Normalize shape to tuple before use

-                for idx, shapeDim in enumerate(_buffer.shape):
+                shape = (_buffer.shape,) if isinstance(_buffer.shape, int) else tuple(_buffer.shape)
+                for idx, shapeDim in enumerate(shape):
                     tilerModel.addConstraint(tilerModel.getTensorDimVar(tensorName = value, dimIdx = idx) == shapeDim)
@@
-        dataInCube = HyperRectangle(offset = (0,) * len(dataInBuffer.shape), dims = tuple(dataInBuffer.shape))
-        indicesCube = HyperRectangle(offset = (0,) * len(indicesBuffer.shape), dims = tuple(indicesBuffer.shape))
+        dataInShape = (dataInBuffer.shape,) if isinstance(dataInBuffer.shape, int) else tuple(dataInBuffer.shape)
+        indicesShape = (indicesBuffer.shape,) if isinstance(indicesBuffer.shape, int) else tuple(indicesBuffer.shape)
+        dataInCube = HyperRectangle(offset = (0,) * len(dataInShape), dims = dataInShape)
+        indicesCube = HyperRectangle(offset = (0,) * len(indicesShape), dims = indicesShape)

Also applies to: 70-71

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/TileConstraints/GatherTileConstraint.py` around lines
39 - 40, The code iterates over _buffer.shape in the enumerate loop without
checking if it's a scalar integer, which will crash when shape is an int rather
than a tuple or list. Before the for loop that enumerates over _buffer.shape,
normalize the shape to ensure it's always a tuple - either by wrapping a scalar
int in a tuple or using a utility function. Apply the same normalization at the
second location (around line 70-71) where shape is also iterated over to handle
both tensor and scalar shapes consistently.

Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py-53-63 (1)

53-63: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Constrain or explicitly reject broadcasted batch MatMul in geometry constraints.

At Line 53, constraints are only added when bufferA.shape[:-2] == bufferB.shape[:-2]. For broadcasted batches, input/output batch tile dimensions are left unconstrained, which can make the solver optimize against a different tile relation than the serializer later enforces.

💡 Safe fail-fast fix until full broadcast constraints are encoded

-        if (bufferA.shape[:-2] == bufferB.shape[:-2]):
+        if bufferA.shape[:-2] != bufferB.shape[:-2]:
+            raise NotImplementedError(
+                "Broadcasted batch MatMul requires explicit geometric constraints; "
+                "current tiling relation is only safe for identical batch prefixes."
+            )
+        else:
             for idx in range(tensorsShapeLenA - 2):
                 tilerModel.addConstraint(
                     tilerModel.getTensorDimVar(tensorName = outputBuffer.name, dimIdx = tensorsShapeLenOutput - 3 - idx)
                     == tilerModel.getTensorDimVar(tensorName = bufferA.name, dimIdx = tensorsShapeLenA - 3 - idx))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py` around lines
53 - 63, The constraint block starting at the if statement checking
bufferA.shape[:-2] == bufferB.shape[:-2] only handles the case where batch
dimensions match, leaving broadcasted batches unconstrained. Add an else clause
after the second for loop to explicitly handle the broadcasted batch case by
raising an exception or rejecting the constraint setup, ensuring that mismatched
batch dimensions in MatMul operations are caught early rather than allowing the
solver to operate on unconstrained dimensions that may conflict with later
serialization.

Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py-152-155 (1)

152-155: 🩺 Stability & Availability | 🟠 Major

Replace assert with explicit runtime validation in serializeTilingSolution.

At line 53, the assert validates an unsupported tiling constraint. Using assert for this check is unsafe because assertions can be disabled (via Python's -O flag), allowing invalid tiling schedules to silently generate incorrect code.

✅ Deterministic runtime check

-                    assert all(off == 0 for off in cube.offset[:-3]), (
-                        f"Unsupported tiling across leading batch dims: offsets={cube.offset}. "
-                        "Only the last batch dim (besides M/O) may be tiled.")
+                    if not all(off == 0 for off in cube.offset[:-3]):
+                        raise ValueError(
+                            f"Unsupported tiling across leading batch dims: offsets={cube.offset}. "
+                            "Only the last batch dim (besides M/O) may be tiled."
+                        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py` around lines
152 - 155, In the serializeTilingSolution method, replace the assert statement
that validates unsupported tiling across leading batch dimensions (checking that
all offsets except the last three are zero in cube.offset) with explicit runtime
validation using an if condition that raises a ValueError instead. This ensures
the validation always executes at runtime regardless of Python optimization
flags, preventing invalid tiling schedules from silently generating incorrect
code.

DeeployTest/testUtils/platformMapping.py-281-291 (1)

281-291: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Spatz deployer branch breaks mapDeployer contract (optimizer/defaults/input offsets).

Line 286 ignores caller-provided loweringOptimizer; Line 289 may pass None for default_channels_first; and inputOffsets is not forwarded at all despite being accepted by mapDeployer and SpatzDeployer.

Proposed fix

-    elif isinstance(platform, (SpatzPlatform)):
-        deployer = SpatzDeployer(
-            graph,
-            platform,
-            inputTypes,
-            SpatzOptimizer,
-            scheduler,
-            name = name,
-            default_channels_first = default_channels_first,
-            deeployStateDir = deeployStateDir
-        )
+    elif isinstance(platform, SpatzPlatform):
+        if loweringOptimizer is None:
+            loweringOptimizer = SpatzOptimizer
+        if default_channels_first is None:
+            default_channels_first = False
+
+        deployer = SpatzDeployer(
+            graph,
+            platform,
+            inputTypes,
+            loweringOptimizer,
+            scheduler,
+            name = name,
+            default_channels_first = default_channels_first,
+            deeployStateDir = deeployStateDir,
+            inputOffsets = inputOffsets
+        )

Also update Deeploy/Targets/Spatz/Deployer.py to forward inputOffsets to super().__init__(..., inputOffsets=inputOffsets) so the value is not dropped downstream.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DeeployTest/testUtils/platformMapping.py` around lines 281 - 291, The
SpatzDeployer instantiation in the elif block for SpatzPlatform is breaking the
mapDeployer contract in three ways: it hardcodes SpatzOptimizer instead of using
the caller-provided loweringOptimizer parameter, it may pass None for
default_channels_first instead of using the parameter value, and it completely
omits the inputOffsets parameter despite accepting it. Fix these by replacing
SpatzOptimizer with loweringOptimizer, ensuring default_channels_first uses the
passed parameter, and adding inputOffsets=inputOffsets to the SpatzDeployer
constructor call. Additionally, update the SpatzDeployer class in
Deeploy/Targets/Spatz/Deployer.py to forward inputOffsets to its
super().__init__() call so the value is properly passed to the parent class
instead of being dropped.

cmake/simulation.cmake-106-115 (1)

106-115: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Add the missing GVSOC_INSTALL_DIR precondition check in the Spatz macro.

Line 114 dereferences ${GVSOC_INSTALL_DIR} without a guard, unlike the generic macro. This can fail late with unclear errors.

Proposed fix

 macro(add_spatz_gvsoc_emulation name target)
+	if(NOT DEFINED GVSOC_INSTALL_DIR)
+		message(FATAL_ERROR "CMake variable GVSOC_INSTALL_DIR not set. Please specify it with -DGVSOC_INSTALL_DIR=<path>")
+	endif()
 	set(GVSOC_WORKDIR ${CMAKE_BINARY_DIR}/gvsoc_workdir)
 	make_directory(${GVSOC_WORKDIR})
 	set(GVSOC_BINARY "${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${name}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmake/simulation.cmake` around lines 106 - 115, The macro
add_spatz_gvsoc_emulation dereferences the variable GVSOC_INSTALL_DIR without
first verifying that it is defined, which can cause unclear errors at runtime.
Add a precondition check at the beginning of the macro (before the custom target
is created) that validates GVSOC_INSTALL_DIR is set and non-empty, and if not,
issue a clear error message using message(FATAL_ERROR ...) to fail early with
helpful context about the missing requirement.

DeeployTest/Tests/Kernels/FP32/Softmax/Regular2D/network.onnx-1-13 (1)

1-13: 📐 Maintainability & Code Quality | 🟠 Major

ONNX test fixtures lack Git LFS configuration — resolve binary storage strategy.

This file is a serialized ONNX model (118 bytes) that cannot be reviewed as text. The repository contains 189 ONNX files without Git LFS rules: files range from 118 bytes to 16.9 MB (e.g., DeeployTest/Tests/Models/Transformer/network.onnx), with many multi-megabyte models like microLlama variants (2.2–2.3 MB each) and CCT models (up to 3.7 MB). All are tracked directly in Git rather than via Git LFS or external artifact storage.

Migrate test fixtures to Git LFS or an external artifact system to reduce repository bloat and improve clone/checkout performance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DeeployTest/Tests/Kernels/FP32/Softmax/Regular2D/network.onnx` around lines 1
- 13, Binary ONNX model files including network.onnx and other test fixtures are
being tracked directly in Git, causing repository bloat. Configure Git LFS for
all ONNX files by adding appropriate patterns to the .gitattributes file to
handle these binary artifacts, or alternatively migrate the test fixtures to an
external artifact storage system. This will prevent large model files from
inflating repository size during clones and checkouts.

cmake/spatz/toolchain_llvm.cmake-5-12 (1)

5-12: 🩺 Stability & Availability | 🟠 Major

Replace ${SPATZ_HOME} with $ENV{SPATZ_HOME} in the toolchain file.

CMake processes toolchain files in an isolated scope before the parent CMakeLists.txt variables are available. The SPATZ_HOME variable defined via set(SPATZ_HOME $ENV{SPATZ_HOME}) in CMakeLists.txt is a normal (non-cache) variable and is not accessible within the toolchain scope. As a result, ${SPATZ_HOME} resolves to an empty string, causing SPATZ_TOOLCHAIN_DIR to become /sw/toolchain/llvm-project/build/bin and all compiler/tool paths to be incorrect.

Access the environment variable directly using $ENV{SPATZ_HOME} instead on both lines 5 and 41.
🔧 Proposed fix
-set(SPATZ_TOOLCHAIN_DIR ${SPATZ_HOME}/sw/toolchain/llvm-project/build/bin)
+set(SPATZ_TOOLCHAIN_DIR $ENV{SPATZ_HOME}/sw/toolchain/llvm-project/build/bin)
--isystem ${SPATZ_HOME}/sw/toolchain/riscv-gnu-toolchain/riscv-newlib/newlib/libc/include
+-isystem $ENV{SPATZ_HOME}/sw/toolchain/riscv-gnu-toolchain/riscv-newlib/newlib/libc/include
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmake/spatz/toolchain_llvm.cmake` around lines 5 - 12, CMake toolchain files
execute in an isolated scope where regular variables from CMakeLists.txt are not
accessible, causing the SPATZ_TOOLCHAIN_DIR variable definition to fail. Replace
all occurrences of ${SPATZ_HOME} with $ENV{SPATZ_HOME} in the
toolchain_llvm.cmake file to directly access the environment variable instead.
This change must be applied wherever SPATZ_HOME is referenced, including the
SPATZ_TOOLCHAIN_DIR assignment and any other locations in the file.

DeeployTest/testUtils/codeGenerate.py-220-221 (1)

220-221: 🩺 Stability & Availability | 🟠 Major

Gate DeeployNetwork_BindExternalInputs generation to Spatz platform only.

_generateBindExternalInputs is called unconditionally on line 220, generating the DeeployNetwork_BindExternalInputs function with references to DeeployNetwork_input_{i} and DeeployNetwork_inputs. The TODO comment correctly identifies this should be Spatz-specific—other platforms (Chimera, CortexM, GAP9, Generic, MemPool, Neureka, PULPOpen, Snitch, SoftHier) do not declare these symbols and will fail compilation. Apply an isinstance(deployer.Platform, SpatzPlatform) guard matching the pattern already used in this file (lines 127, 185, 190, 202). Also fix the typo: "unputs" → "inputs".
🔧 Proposed guard
-    # TODO: make this work only for spatz and with the correct number of unputs every time
-    retStr += _generateBindExternalInputs(deployer)
+    # Only Spatz uses zero-copy external input binding for now.
+    if isinstance(deployer.Platform, SpatzPlatform):
+        retStr += _generateBindExternalInputs(deployer)
(Note: requires importing SpatzPlatform from Deeploy.Targets.Spatz.Platform)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DeeployTest/testUtils/codeGenerate.py` around lines 220 - 221, The call to
_generateBindExternalInputs(deployer) on line 220 is unconditional but should
only execute for the Spatz platform, as other platforms lack the
DeeployNetwork_input_{i} and DeeployNetwork_inputs symbols that this function
generates. Wrap the retStr += _generateBindExternalInputs(deployer) line with an
isinstance(deployer.Platform, SpatzPlatform) guard following the same pattern
used elsewhere in the file (see lines 127, 185, 190, 202). Also import
SpatzPlatform from Deeploy.Targets.Spatz.Platform at the top of the file, and
fix the typo in the TODO comment by changing "unputs" to "inputs".

DeeployTest/Platforms/Spatz/main.c-2-5 (1)

2-5: 🎯 Functional Correctness | 🟠 Major

Add missing <string.h> header for memcpy function.

memcpy is called on lines 41 and 46 but <string.h> is not included. This can cause compilation failures or warnings depending on compiler strictness settings.

Proposed fix

 `#include` <stdint.h>
 `#include` <stddef.h>
+#include <string.h>
 `#include` <benchmark.h>
 `#include` "printf.h"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DeeployTest/Platforms/Spatz/main.c` around lines 2 - 5, Add the missing
`#include <string.h>` header to the include section at the top of the file.
Insert this include statement with the other standard library headers (alongside
the existing `#include <stdint.h>` and `#include <stddef.h>` includes) to
provide the declaration for the `memcpy` function that is being called in the
code.

DeeployTest/Platforms/Spatz/main.c-77-84 (1)

77-84: 🎯 Functional Correctness | 🟠 Major

Pointer aliasing and incomplete NaN detection in float validation.

Lines 80 and 83 use *(uint32_t*)&variable casts, which violate C strict aliasing rules and trigger undefined behavior. Additionally, checking only 0x7FC00000 detects one quiet NaN pattern but misses other NaN encodings; a complete check must cover all values where the exponent bits are 0xFF with a non-zero mantissa.

Use isnan() to detect all NaN cases, and replace unsafe pointer casts with memcpy() for safe bit reinterpretation:

Suggested fix

 `#if` ISOUTPUTFLOAT == 1
-        // RUNWANG: Allow margin of error for float32_t
-        // MATTIA: if diff is a quiet nan 0x7FC00000 we want to error
-        if ((diff < -1e-4f) || (diff > 1e-4f) || *(uint32_t*)&diff == 0x7FC00000) {
+        // Allow margin of error for float32_t; fail on any NaN.
+        if (isnan((float)expected) || isnan((float)actual) || isnan((float)diff) ||
+            (diff < -1e-4f) || (diff > 1e-4f)) {
           tot_err += 1;
-          // printf("Expected: %f  Actual: %f  Diff: %f at Index %12u in Output %u\r\n", expected, actual, diff, i, buf);  
-          printf("Expected: 0x%08x  Actual: 0x%08x  Diff: 0x%08x at Index %4u in Output %u\r\n", *(uint32_t*)&expected, *(uint32_t*)&actual, *(uint32_t*)&diff, i, buf);  
+          uint32_t expected_bits, actual_bits, diff_bits;
+          memcpy(&expected_bits, &expected, sizeof(expected_bits));
+          memcpy(&actual_bits, &actual, sizeof(actual_bits));
+          memcpy(&diff_bits, &diff, sizeof(diff_bits));
+          printf("Expected: 0x%08x  Actual: 0x%08x  Diff: 0x%08x at Index %4u in Output %u\r\n",
+                 expected_bits, actual_bits, diff_bits, i, buf);
         }
 `#else`

Note: If math.h cannot be included, use __builtin_isnan() or implement a safe bit-pattern check via memcpy().

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DeeployTest/Platforms/Spatz/main.c` around lines 77 - 84, The code uses
unsafe pointer casting with `*(uint32_t*)&` to reinterpret float values, which
violates strict aliasing rules and causes undefined behavior. Additionally, the
NaN detection only checks for one specific pattern (`0x7FC00000`) and misses
other NaN encodings. Replace the direct pointer cast NaN check in the condition
(`*(uint32_t*)&diff == 0x7FC00000`) with the `isnan()` function to properly
detect all NaN values. For the printf statement that casts the `expected`,
`actual`, and `diff` variables to uint32 pointers, replace these unsafe casts
with `memcpy()` calls to safely reinterpret the float bits as uint32 values.

🟡 Minor comments (4)

TargetLibraries/Spatz/src/TopK_fp32_int32_spatz.c-59-63 (1)
59-63: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

root_idx should be int32_t, not float32_t.

heap_indices[0] is an int32_t, but it is copied through a float32_t temporary and written back into heap_indices[i]. For indices above 2^24 the int→float→int round-trip loses precision and yields a wrong index. Use the correct integer type.
🐛 Fix the temporary type
-    float32_t root_val = heap_values[0]; float32_t root_idx = heap_indices[0];
+    float32_t root_val = heap_values[0]; int32_t root_idx = heap_indices[0];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Spatz/src/TopK_fp32_int32_spatz.c` around lines 59 - 63, The
variable `root_idx` is incorrectly declared as `float32_t` when it should be
`int32_t` to match the type of elements in the `heap_indices` array. When
assigning an `int32_t` value from `heap_indices[0]` into a `float32_t` temporary
and then back to `heap_indices[i]`, precision loss occurs for large index values
above 2^24, resulting in incorrect indices. Change the declaration of `root_idx`
from `float32_t` to `int32_t` while keeping `root_val` as `float32_t` since it
stores values from the `heap_values` array.
TargetLibraries/Spatz/src/Softmax_fp32_spatz.c-66-78 (1)
66-78: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

myinv violates strict aliasing.

uint32_t i = *(uint32_t*)&x; and float y = *(float*)&i; reinterpret through incompatible pointer types, which is undefined behavior and may be miscompiled under optimization. Ironically expf_nodiv_reduced already adopts a union float_bits precisely to avoid this; reuse that approach here (or memcpy).
🛡️ Use the type-punning union instead of pointer casts
 float32_t myinv(float32_t x){
-    uint32_t i = *(uint32_t*)&x;
-    i = 0x7EEEEEEE - i; 
-    float y = *(float*)&i;
+    union float_bits u;
+    u.f = x;
+    u.i = 0x7EEEEEEE - u.i;
+    float y = u.f;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Spatz/src/Softmax_fp32_spatz.c` around lines 66 - 78, The
myinv function violates strict aliasing rules by using pointer casts to
reinterpret float as uint32_t and back, which is undefined behavior. Replace the
pointer casts in myinv with a union type approach (reuse the existing float_bits
union already adopted in expf_nodiv_reduced) or use memcpy to safely perform the
type punning while maintaining type safety and compatibility with compiler
optimizations.
conda_enviroment_topk_attention.yml-1-1 (1)
1-1: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

name/prefix use a ~ path, which conda will not handle as intended.

name: should be a plain environment name (e.g. deeploy_conda_venv), not ~/.conda/envs/.... The prefix: on Line 81 is a machine-specific absolute-ish path that conda does not tilde-expand and that conda env create ignores/overrides; it leaks the author's local layout. Consider removing prefix and using a simple name. Also the filename is misspelled: "enviroment" → "environment".

Also applies to: 81-81
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conda_enviroment_topk_attention.yml` at line 1, The environment configuration
file has two issues: the `name` field on line 1 contains a tilde-path
(`~/.conda/envs/deeploy_conda_venv`) when it should only contain a plain
environment name like `deeploy_conda_venv` without any path, and the `prefix`
field on line 81 contains a machine-specific absolute path that conda does not
expand properly. Remove the `prefix` field entirely and replace the `name` value
with just the environment name `deeploy_conda_venv`. Additionally, rename the
file from `conda_enviroment_topk_attention.yml` to
`conda_environment_topk_attention.yml` to fix the spelling of "environment".
cmake/spatz/toolchain_llvm.cmake-68-68 (1)
68-68: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Hardcoded, machine-specific --gcc-toolchain path.

--gcc-toolchain=/usr/pack/riscv-1.0-kgf/spatz-gcc-7.1.1 ties the build to one specific cluster filesystem and will fail elsewhere. Derive it from an environment variable (e.g. $ENV{GCC_INSTALL_DIR}) or a configurable cache option.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmake/spatz/toolchain_llvm.cmake` at line 68, Replace the hardcoded path in
the `--gcc-toolchain` flag (currently `/usr/pack/riscv-1.0-kgf/spatz-gcc-7.1.1`)
with a dynamic reference that can be configured across different environments.
Use either a CMake cache variable (defined with `set()` and available as an
option) or an environment variable referenced via `$ENV{VARIABLE_NAME}` syntax.
This allows the toolchain path to be customized at build time without modifying
the CMake file directly.

🧹 Nitpick comments (9)

TargetLibraries/Spatz/src/MatMul_fp32_spatz.c (2)
26-26: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

const is discarded when passing inputs to the GEMV kernels.

Spatz_MatMul_fp32_fp32_fp32 declares a/b as const float32_t *__restrict__, but gemv_v32b_m4/gemv_col_reduction take non-const float *, and Line 130/137 assign b (const) to float *b_offset. These all silently discard const, which will emit warnings and fail under -Werror.
♻️ Make the GEMV input pointers const
-void gemv_v32b_m4(float *a, float *b, float *c, int N, int local_P, int total_P) {
+void gemv_v32b_m4(const float *a, const float *b, float *c, int N, int local_P, int total_P) {
-void gemv_col_reduction(float *a, float *b, float *c, int N, int local_P, int total_P) {
+void gemv_col_reduction(const float *a, const float *b, float *c, int N, int local_P, int total_P) {
-      float *b_offset = b + cols_core0; float *c_offset = c + cols_core0;
+      const float *b_offset = b + cols_core0; float *c_offset = c + cols_core0;
Please confirm the Spatz toolchain flags (e.g. whether -Werror is set in cmake/spatz/toolchain_llvm.cmake); under -Werror the discarded-qualifier warnings would break the build.

Also applies to: 51-51, 130-130, 137-137
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Spatz/src/MatMul_fp32_spatz.c` at line 26, The function
signature for gemv_v32b_m4 on line 26 and gemv_col_reduction on line 51 are
discarding const qualifiers from their input pointer parameters a and b, which
causes compiler warnings and fails under -Werror. Update both gemv_v32b_m4 and
gemv_col_reduction function signatures to declare their float pointer parameters
as const float * instead of non-const float * to match the const qualifiers in
the caller Spatz_MatMul_fp32_fp32_fp32. Additionally, update any intermediate
pointer variables like b_offset on lines 130 and 137 that are assigned from
these parameters to also be declared as const pointers to maintain const
correctness throughout.
122-123: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Resolve the TODO on the GEMV algorithm-selection heuristic.

The N > 4*P threshold for choosing gemv_col_reduction vs. gemv_v32b_m4 is marked as provisional. Worth pinning down before this leaves draft.

Want me to open an issue to track tuning this dispatch threshold?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Spatz/src/MatMul_fp32_spatz.c` around lines 122 - 123, The
TODO comment in the M==1 conditional block indicates the algorithm-selection
threshold for choosing between gemv_col_reduction and gemv_v32b_m4 needs to be
finalized. Replace the provisional threshold logic with either a tuned constant
value based on performance benchmarking or define a named constant that clearly
represents the optimal heuristic. Remove the TODO comment once the threshold is
properly determined and expressed as a meaningful named constant or
well-justified numeric value in the conditional that compares N against a
multiple of P.
Deeploy/Targets/Spatz/Templates/TopKTemplate.py (1)
39-39: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Misleading comment.

The comment reads # compute_topk_vector_instructions, but minHeapTemplate invokes compute_topk_min_heap. Update or remove it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Spatz/Templates/TopKTemplate.py` at line 39, The comment at
line 39 incorrectly states `compute_topk_vector_instructions` but the actual
code invokes `compute_topk_min_heap` through the `minHeapTemplate` function.
Update the comment to accurately reflect that `minHeapTemplate` invokes
`compute_topk_min_heap`, or remove the misleading comment if it no longer serves
a purpose.
TargetLibraries/Spatz/src/Softmax_fp32_spatz.c (1)
4-23: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

myexpf appears to be dead code.

The softmax paths use expf_nodiv_reduced exclusively; myexpf (with its ldexpf dependency) is unused. Per the earlier request to keep the codebase concise, consider removing it unless it is kept intentionally.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Spatz/src/Softmax_fp32_spatz.c` around lines 4 - 23, The
myexpf function is unused dead code that should be removed to keep the codebase
concise. Since the softmax paths exclusively use expf_nodiv_reduced instead of
myexpf, simply delete the entire myexpf function definition that includes the
range reduction logic, polynomial approximation, and ldexpf call.
TargetLibraries/Spatz/inc/DeeploySpatzMath.h (1)
27-29: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Single-core macros reference an undefined core_id.

The kernels in this cohort use cid from snrt_cluster_core_idx(), while these macros expand to if (core_id == 0). Any translation unit using BEGIN_SINGLE_CORE/SINGLE_CORE must declare a core_id in scope, which is inconsistent with the rest of the codebase and easy to misuse.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Spatz/inc/DeeploySpatzMath.h` around lines 27 - 29, The
macros BEGIN_SINGLE_CORE, END_SINGLE_CORE, and SINGLE_CORE reference an
undefined variable `core_id`, while the codebase consistently uses `cid`
obtained from snrt_cluster_core_idx(). Update these three macro definitions to
use snrt_cluster_core_idx() directly within each macro expansion instead of
relying on an external `core_id` variable. This will make the macros
self-contained and eliminate the requirement for callers to declare `core_id` in
scope, ensuring consistency across the codebase.
Deeploy/TilingExtension/TileConstraint.py (1)
134-136: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Guard the multi-output invariant explicitly.

At Line 138, tiling structure is derived only from the first output. Add a fail-fast check that every output has the same shape and memory-path as the first output, otherwise schedules can be silently incorrect for future heterogeneous multi-output operators.
Suggested patch
-        outVar, outTensorConstraint = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        outVar, outTensorConstraint = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        refShape = tuple(ctxt.lookup(outVar).shape)
+        refPath = tuple(outTensorConstraint.memoryConstraints.keys())
+        for otherVar, otherConstraint in tilingSolution.outputTensorMemoryConstraints.items():
+            otherShape = tuple(ctxt.lookup(otherVar).shape)
+            otherPath = tuple(otherConstraint.memoryConstraints.keys())
+            assert otherShape == refShape, \
+                f"Multi-output tiling requires matching output shapes, got {outVar}:{refShape} vs {otherVar}:{otherShape}"
+            assert otherPath == refPath, \
+                f"Multi-output tiling requires matching memory paths, got {outVar}:{refPath} vs {otherVar}:{otherPath}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/TilingExtension/TileConstraint.py` around lines 134 - 136, The
current assertion in the multi-output check only verifies that at least one
output exists, but does not validate that all outputs have compatible tiling
characteristics. Add an explicit fail-fast guard after the existing assertion
that iterates through tilingSolution.outputTensorMemoryConstraints and verifies
that every output shares the same shape and memory-path properties as the first
output (which is used to determine the tiling structure). This will prevent
silent correctness issues when handling heterogeneous multi-output operators in
the future.
Makefile (2)
76-76: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Declare orchestration targets as .PHONY.

These targets should be phony to avoid accidental no-op builds when files/directories with the same names exist.
Proposed fix
+.PHONY: all toolchain emulators spatz_runtime echo-bash
+
 all: toolchain emulators # docs echo-bash
Also applies to: 100-102, 458-458
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 76, Add `.PHONY` declarations for the `all` target on line
76 and the other orchestration targets referenced at lines 100-102 and 458.
Include each target name in a `.PHONY` declaration placed before its target
definition to ensure Make always executes these targets regardless of whether
files or directories with matching names exist.
Source: Linters/SAST tools

450-452: 📐 Maintainability & Code Quality | 🔵 Trivial

Consider using generic python3 instead of hardcoded python3.6 for improved portability.

Line 450 hardcodes python3.6, while other recipes in the Makefile (e.g., SNITCH_INSTALL_DIR at line 432) do not specify a Python version. The project contains no documented Python version requirement, and the commented-out gvsoc_spatz recipe (line 465) uses generic python3. Using a configurable interpreter will make the target compatible with modern Python environments without sacrificing functionality.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 450 - 452, Replace the hardcoded python3.6 interpreter
reference with generic python3 in the venv creation command. This change
improves portability by allowing the Makefile to work with any available Python
3 version on the system, consistent with other recipes in the Makefile that do
not specify a version. Locate the line where python3.6 -m venv .venv is called
and change python3.6 to python3 while keeping the rest of the venv and pip
install commands intact.
conda_enviroment_topk_attention.yml (1)
5-38: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Fully-pinned conda build strings make this env non-reproducible off the author's machine.

The conda section pins exact OS-specific build hashes (e.g. =h5eee18b_6), which is the output of conda env export on one Linux host and will fail to solve on other platforms. If this file is meant to be a shared/reproducible env, prefer conda env export --from-history or loosen to version-only pins.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conda_enviroment_topk_attention.yml` around lines 5 - 38, The conda
environment file contains OS-specific build hashes (the characters after the
last equals sign in each package specification, such as h5eee18b_6 in
bzip2=1.0.8=h5eee18b_6) which make the environment non-reproducible across
different platforms. Remove the build hash suffix from each package line,
keeping only the package name and version number (e.g., change
bzip2=1.0.8=h5eee18b_6 to bzip2=1.0.8). Alternatively, regenerate this file
using conda env export --from-history instead of conda env export to produce a
more portable specification.

coderabbitai · 2026-06-23T07:26:15Z

+        self.operatorRepresentation['k_value'] = int(k_in.values[0])
+        self.operatorRepresentation['values_out'] = values_out.name


🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Validate k as a constant scalar and bound it to input size.

Line 2916 assumes k_in.values[0] exists and is valid. If k is non-constant, scalar-shaped differently, or k > data_in_size, the generated TopK loop indexes beyond the local pairs buffer.

Suggested fix

- self.operatorRepresentation['k_value'] = int(k_in.values[0]) + if not hasattr(k_in, "values"): + return ctxt, False + k_value = int(np.asarray(k_in.values).item()) + if k_value <= 0 or k_value > self.operatorRepresentation['data_in_size']: + return ctxt, False + self.operatorRepresentation['k_value'] = k_value

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

self.operatorRepresentation['k_value'] = int(k_in.values[0])

self.operatorRepresentation['values_out'] = values_out.name

if not hasattr(k_in, "values"):

return ctxt, False

k_value = int(np.asarray(k_in.values).item())

if k_value <= 0 or k_value > self.operatorRepresentation['data_in_size']:

return ctxt, False

self.operatorRepresentation['k_value'] = k_value

self.operatorRepresentation['values_out'] = values_out.name

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Parsers.py` around lines 2916 - 2917, Before assigning the k value at line 2916 in the TopK operator representation setup, add validation to ensure that k_in is a constant scalar and that its value does not exceed the size of the input data. Check that k_in is constant-valued, verify it has scalar shape, extract the k value safely, and validate that the k value is within bounds relative to data_in_size to prevent buffer overruns in the TopK loop. If validation fails, raise an appropriate error describing which constraint was violated.

coderabbitai · 2026-06-23T07:26:15Z

+target_include_directories(network PUBLIC ${SPATZ_BENCHMARK_INCLUDE_DIR})
+
+target_link_libraries(${ProjectId} PRIVATE network deeploylib)
+target_compile_options(${ProjectId} INTERFACE network)


🎯 Functional Correctness | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Inspect the changed CMake file:" sed -n '1,80p' DeeployTest/Platforms/Spatz/CMakeLists.txt echo echo "Find target_compile_options usages under DeeployTest/Platforms:" fd -i 'CMakeLists.txt' DeeployTest/Platforms --exec rg -n 'target_compile_options\(' {}

Repository: pulp-platform/Deeploy

Length of output: 1294

🌐 Web query:

CMake target_compile_options passing target name as compile option

💡 Result:

In CMake, the target_compile_options command is used to add compiler options to a specific target [1][2]. The syntax is: target_compile_options( [BEFORE] <INTERFACE|PUBLIC|PRIVATE> ... [<INTERFACE|PUBLIC|PRIVATE> ...]...) You do not "pass" the target name as a compile option in the sense of a flag passed to the compiler; rather, you specify the target name as the first argument to the command to indicate which target the options apply to [1][3]. If you need to access the name of the target being built from within your compile options (e.g., to pass the target name as a preprocessor definition), you can use the $<TARGET_NAME:...> generator expression or simply reference the target name directly in your command if you are generating the build system [4]. However, for passing the name as a macro definition, it is standard practice to use: target_compile_definitions( PRIVATE TARGET_NAME="${target_name}") If you are looking to retrieve or inspect the compile options that have already been set for a specific target, you can use the get_target_property command [5][6]: get_target_property(my_options COMPILE_OPTIONS) message("Options are: ${my_options}") Key points: - The target name is the mandatory first argument to target_compile_options [1][7]. - Options added via this command are stored in the COMPILE_OPTIONS or INTERFACE_COMPILE_OPTIONS target properties [1][3]. - Generator expressions, such as $<...>, can be used to dynamically determine options based on the target or other conditions [1][4].

Citations:

1: https://cmake.org/cmake/help/latest/command/target_compile_options.html?highlight=target_compile_options

2: https://cmake.org/cmake/help/latest/command/target_compile_options.html

3: https://cmake.org/cmake/help/v4.3/command/target_compile_options.html

4: https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html

5: https://cmake.org/cmake/help/latest/command/get_target_property.html

6: https://stackoverflow.com/questions/56104607/how-to-print-current-compilation-flags-that-are-set-with-target-compile-options

7: https://cmake.org/cmake/help/git-master/command/target_compile_options.html

🏁 Script executed:

# Check what 'network' is in the codebase - is it a library, executable, or custom target? rg -n "add_library\(network" --type cmake rg -n "add_executable\(network" --type cmake rg -n "project\(network" --type cmake # Check if there's any documentation or comments about this pattern rg -B3 -A3 'target_compile_options.*network' DeeployTest/Platforms --type cmake

Repository: pulp-platform/Deeploy

Length of output: 159

🏁 Script executed:

# Find where network is defined and what type of target it is rg -n "add_library.*network|add_executable.*network" --type cmake # Look at all CMakeLists.txt files that might define network fd CMakeLists.txt -x grep -l "network" {} \; # Check the specific context around line 20 again and nearby lines cat -n DeeployTest/Platforms/Spatz/CMakeLists.txt | sed -n '1,30p' # Search for any documentation about the network target rg -n "network" DeeployTest/Platforms --type cmake | head -20

Repository: pulp-platform/Deeploy

Length of output: 159

🏁 Script executed:

# Simple direct approach - check if the file exists and show its content if [ -f "DeeployTest/Platforms/Spatz/CMakeLists.txt" ]; then echo "=== File exists, showing lines 15-25 ===" head -25 DeeployTest/Platforms/Spatz/CMakeLists.txt | tail -11 else echo "File not found" fi # Try to find network definition echo "" echo "=== Searching for 'network' target definition ===" find . -name CMakeLists.txt -type f -exec grep -l "add_library.*network\|add_executable.*network" {} \; 2>/dev/null | head -5

Repository: pulp-platform/Deeploy

Length of output: 723

🏁 Script executed:

# Look at the main CMakeLists.txt to see how network is defined cat -n ./DeeployTest/CMakeLists.txt | grep -A5 -B5 "network" # Also check if there are any build errors documented find . -name "*.md" -o -name "*.txt" | xargs grep -l "network" 2>/dev/null | head -5

Repository: pulp-platform/Deeploy

Length of output: 2064

Remove the invalid target_compile_options call passing a target name as a compile flag.

Line 20 passes network (a library target) as a compile option, which is invalid CMake syntax. Compiler options must be actual flags (like -Wall, -O2) or generator expressions. The dependency is already correctly handled by target_link_libraries on line 19. Similar problematic patterns appear elsewhere in the file and should be audited as well.

💡 Proposed fix

target_link_libraries(${ProjectId} PRIVATE network deeploylib) -target_compile_options(${ProjectId} INTERFACE network)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

target_compile_options(${ProjectId} INTERFACE network)

target_link_libraries(${ProjectId} PRIVATE network deeploylib)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DeeployTest/Platforms/Spatz/CMakeLists.txt` at line 20, The target_compile_options call is incorrectly passing the network target name as a compile flag. Remove the line containing target_compile_options(${ProjectId} INTERFACE network) since the network dependency is already correctly handled by the target_link_libraries call on the preceding line. Compiler options should only contain actual compiler flags like -Wall or -O2, not target names. After fixing this line, audit the rest of the CMakeLists.txt file for similar invalid patterns where target names might be mistakenly used as compile options.

coderabbitai · 2026-06-23T07:26:15Z

+    # Instantiate Classes Required for Memory Level Annotation Extension
+    if args.platform == "Spatz":
+        # Spatz cluster has only TCDM (L1) + external DRAM (L3). No on-chip L2.
+        # Declare L1 and L3 as direct neighbours so BFS-based tile-path
+        # generation does not insert a phantom L2 staging buffer.
+        L3 = MemoryLevel(name = "L3", neighbourNames = ["L1"], size = 64000000)
+        L1 = MemoryLevel(name = "L1", neighbourNames = ["L3"], size = args.l1)
+        memoryLevels = [L3, L1]
+    else:
+        L3 = MemoryLevel(name = "L3", neighbourNames = ["L2"],        size = 64000000)
+        L2 = MemoryLevel(name = "L2", neighbourNames = ["L3", "L1"],  size = args.l2)
+        L1 = MemoryLevel(name = "L1", neighbourNames = ["L2"],        size = args.l1)
+        memoryLevels = [L3, L2, L1]
+


🎯 Functional Correctness | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check defaultMemLevel defaults/usages for Spatz paths rg -n --type=py 'defaultMemLevel|--defaultMemLevel|platform.*Spatz|Spatz' \ DeeployTest/testMVP.py \ DeeployTest/deeployRunner_spatz.py \ DeeployTest/deeployRunner_tiled_spatz.py \ DeeployTest/test_platforms.py

Repository: pulp-platform/Deeploy

Length of output: 1387

🏁 Script executed:

sed -n '150,160p' DeeployTest/testMVP.py

Repository: pulp-platform/Deeploy

Length of output: 701

🏁 Script executed:

sed -n '270,285p' DeeployTest/testMVP.py

Repository: pulp-platform/Deeploy

Length of output: 886

🏁 Script executed:

cat DeeployTest/deeployRunner_spatz.py

Repository: pulp-platform/Deeploy

Length of output: 401

🏁 Script executed:

cat DeeployTest/deeployRunner_tiled_spatz.py

Repository: pulp-platform/Deeploy

Length of output: 400

Spatz platform will fail at runtime with default --defaultMemLevel L2 since L2 is not defined.

The default value for --defaultMemLevel is "L2" (line 155), but lines 255-261 remove L2 for Spatz, creating only L3 and L1. The Spatz runners (deeployRunner_spatz.py and deeployRunner_tiled_spatz.py) do not override this default, so when Spatz is executed without an explicit --defaultMemLevel argument, line 278 will attempt to access memoryHierarchy.memoryLevels["L2"], which does not exist and will cause a runtime failure.

Either pass --defaultMemLevel L3 in Spatz runners or update the default to be platform-aware.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DeeployTest/testMVP.py` around lines 254 - 267, The Spatz platform configuration only creates L3 and L1 memory levels but the default value for the --defaultMemLevel argument is "L2", which does not exist in Spatz, causing a runtime failure when accessing memoryHierarchy.memoryLevels["L2"]. Update the Spatz runners (deeployRunner_spatz.py and deeployRunner_tiled_spatz.py) to explicitly pass --defaultMemLevel L3 when invoking the test, or alternatively make the default value for --defaultMemLevel platform-aware by checking the platform argument first and setting the default to "L3" for Spatz and "L2" for other platforms.

Bumblebee00 and others added 30 commits March 2, 2026 19:31

add placeholdeer code for spatz platform

86cfce6

code generation with generic c code

0774556

modified spatz c code to use proper memory allocation and copying fun…

a8fd323

…ctions

tmp commit to send to badie103

bbf2751

modified Makefile and cmakefiles to build and use spatz runtime. ugly…

b1a5868

… version

vsim simulator runnable by deeployRunner

0acf6e7

Removed reverence to conda environment and added commands to create v…

8e4e8c3

…env in the makefile

typo

9cab786

double gvsoc temporaney configuration

e3c46c4

reunited gvsoc build for spatz and other platforms

651d4cf

forgot comment

908261d

forgot comment

edc461f

added topk generic binding (hardcoded k=10)

cd13ce4

added matmul softmax and topk generic bindings to spatz

f72e147

switched default simulator to gvsoc bc is faster

93db81b

added topk test network

eee910c

added sparse attention test network

1523825

added topk binding to generic platform

8b99e3a

improved generic gather node to support more than one index

b541e17

added gather binding for spatz

e35433d

now for any k inot the graph

7eb2e35

added big attention

8f99e67

first draft of tiling (not working)

bfcdda6

minimalloc fix

bdcdd70

fixed makefile adding missing things

64e3874

added yaml for my conda environment

23017e8

modified bindings to use snrt_dma_wait_all function

8b46fd5

added different dimensions of FP32/MatMul

3187d64

fixed simulation staling issue

1624642

fixed memory levels

08ddf29

removed unnecessary tests

83a67a5

bowwwang requested changes Jun 22, 2026

View reviewed changes

Bumblebee00 added 10 commits June 22, 2026 18:18

modified tests

cbdf855

removed unised comments

b941002

necessary spatz makefile modification

c414332

unecessary comment

c25a855

missing slash in makefile

7b0f848

removed old gather templates

348eda3

removed comments

855065e

removed unnecessary matmuls

e773691

rmoved unnecessary softmax

82f7071

renamed .yml

ba320c6

Bumblebee00 marked this pull request as ready for review June 22, 2026 21:21

Bumblebee00 requested review from Victor-Jung, Xeratec and runwangdl as code owners June 22, 2026 21:21

Bumblebee00 marked this pull request as draft June 22, 2026 21:21

Victor-Jung added this to the Release 0.2.2 milestone Jun 23, 2026

Victor-Jung added this to Deeploy Jun 23, 2026

Victor-Jung removed the Feature Addition of new features label Jun 23, 2026

Victor-Jung self-assigned this Jun 23, 2026

bowwwang marked this pull request as ready for review June 23, 2026 07:10

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Victor-Jung changed the title ~~[DRAFT] Spatz integration~~ Spatz integration Jun 23, 2026

Victor-Jung changed the title ~~Spatz integration~~ Spatz Integration Jun 23, 2026

Victor-Jung changed the title ~~Spatz Integration~~ Spatz Platform Jun 23, 2026

		_waitTemplate = NodeTemplate("if (snrt_is_dm_core()) snrt_dma_wait_all();")


		# LMACAN: TODO: Add single transfer waiting




		void Spatz_MatMul_fp32_fp32_fp32(const float32_t *__restrict__ a,

		@@ -0,0 +1,81 @@
		name: /scratch/mmm/.conda/envs/deeploy_conda_venv2

		self.operatorRepresentation['k_value'] = int(k_in.values[0])
		self.operatorRepresentation['values_out'] = values_out.name

	target_compile_options(${ProjectId} INTERFACE network)
	target_link_libraries(${ProjectId} PRIVATE network deeploylib)

Conversation

Bumblebee00 commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Added

Changed

Fixed

PR Merge Checklist

Uh oh!

Bumblebee00 commented Jun 22, 2026

Uh oh!

bowwwang left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Bumblebee00 commented Jun 22, 2026

Uh oh!

Victor-Jung commented Jun 23, 2026

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Bumblebee00 commented Mar 2, 2026 •

edited

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading