Spatz Platform#168
Conversation
…env in the makefile
|
ok i deleted the unnecesary tests tell me if its ok |
bowwwang
left a comment
There was a problem hiding this comment.
Hello Mattia, thank you for the efforts. Here is the second round of the review.
After the fixes, you could enable the CI checks to see if your implementation can pass all the test. It would also make sense to first rebase it to the latest commit.
| --gcc-toolchain=/usr/pack/riscv-1.0-kgf/spatz-gcc-7.1.1 | ||
| ) | ||
|
|
||
| # User command linked: -lm -lgcc -lm -lgcc libsnRuntime-cluster.a |
There was a problem hiding this comment.
remove the comments if unused
| width = int(data_in_type.referencedType.typeWidth/8) | ||
| %> | ||
| BEGIN_SINGLE_CORE | ||
| % if num_indices == 1: |
There was a problem hiding this comment.
why do we have this case here? what is the meaning of num_indices?
There was a problem hiding this comment.
this is to not generate a loop with one iteration:
for (uint32_t j=0; j<1; ++j) {
is cleaner c code
| _waitTemplate = NodeTemplate("if (snrt_is_dm_core()) snrt_dma_wait_all();") | ||
|
|
||
|
|
||
| # LMACAN: TODO: Add single transfer waiting |
There was a problem hiding this comment.
Is this something still in progress?
| 2: | ||
| NodeTemplate(""" | ||
| if (snrt_is_dm_core()) { | ||
| ${future} = snrt_dma_start_2d(${dest}, ${src}, ${size}, ${stride_dest}, ${stride_src}, ${repeat}); |
There was a problem hiding this comment.
Is it by default using 2D-DMA now? If so why?
There was a problem hiding this comment.
yes the function snrt_dma_start_2d is used for all nodes. The reason is that to transfer tiles of matrices you need to have a stride. For the other nodes where stride is not needed, repeat is set to 1
| % else: | ||
| for (uint32_t i=0; i<${batch}; ++i) { | ||
| for (uint32_t j=0; j<${num_indices}; ++j) { | ||
| memcpy(${data_out} + i * (${num_indices} * ${axis_length}) + j * ${axis_length}, |
There was a problem hiding this comment.
can we align the usage of memcpy and DMA? why sometimes DMA is used but here is memcpy?
There was a problem hiding this comment.
those are the templates for the old versions of the gather node, not used anymore, now the memcpy should never be used. I will remove now those old templates
| } | ||
| } | ||
|
|
||
| void gemv_col_reduction(float *a, float *b, float *c, int N, int P) { |
There was a problem hiding this comment.
please make the function number easy to inpertret. You can specify it is either innor_product, gustv, or outer_product, as we discussed eariler
| } | ||
| } | ||
|
|
||
| void matmul_naive_vanilla(float *a, float *b, float *c, int M, int N, int P) { |
There was a problem hiding this comment.
good to keep this one, change the name of the kernel to matmul_vanila
|
|
||
|
|
||
|
|
||
| void Spatz_MatMul_fp32_fp32_fp32(const float32_t *__restrict__ a, |
There was a problem hiding this comment.
align the function usage here accordingly
| @@ -0,0 +1,202 @@ | |||
| #include "DeeployBasicMath.h" | |||
There was a problem hiding this comment.
could you please keep a vanila version of the softmax, and one with the best performance? The rest can be removed. We want to keep a clean and concise codebase if possible.
If certain kernel has a special reason to keep, please specify in a comment.
There was a problem hiding this comment.
actually i have only one version of softmax with the best performance
| @@ -0,0 +1,81 @@ | |||
| name: /scratch/mmm/.conda/envs/deeploy_conda_venv2 | |||
There was a problem hiding this comment.
make be change the name of this file to a more generic one without your usrname
There was a problem hiding this comment.
done but for me we can also remove this file
|
i should rebase on latest commit of devel or main? |
Devel please 😁 |
📝 WalkthroughWalkthroughThis PR introduces a new Spatz RISC-V vector processor deployment target for the Deeploy framework. It adds a generic TopK operator (parser, type checker, template, bindings), implements Spatz-specific C kernels (float32 MatMul, Softmax, TopK), defines DMA/tiling infrastructure, and provides a complete test harness with CMake build support. ChangesSpatz Platform and TopK Operator
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
♻️ Duplicate comments (1)
Deeploy/Targets/Spatz/Deployer.py (1)
31-38: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winForward
inputOffsetsto preserve sign-propagation.Line 31 calls
SignPropDeployer.__init__withoutinputOffsets, so any caller-provided offsets are dropped. This changes input signedness inference for Spatz deployments.Suggested fix
super().__init__(graph, deploymentPlatform, inputTypes, loweringOptimizer, scheduler, name, default_channels_first = default_channels_first, - deeployStateDir = deeployStateDir) + deeployStateDir = deeployStateDir, + inputOffsets = inputOffsets)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/Deployer.py` around lines 31 - 38, The SpatzDeployer.__init__ method is not forwarding the inputOffsets parameter to the parent SignPropDeployer class via the super().__init__() call. Add inputOffsets as a parameter to the super().__init__() call alongside the existing parameters (graph, deploymentPlatform, inputTypes, loweringOptimizer, scheduler, name, default_channels_first, and deeployStateDir) to ensure that caller-provided input offsets are properly propagated to the parent class and preserve the sign-propagation behavior.
🟠 Major comments (20)
Deeploy/Targets/Spatz/Templates/AllocateTemplate.py-17-20 (1)
17-20: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winUse the same byte-size formula in the fallback allocation path.
The
elsebranch switches fromsizeof(...) * sizetotypeWidth//8 * size. That can allocate too few bytes and cause buffer overruns for width/ABI mismatches.Suggested fix
% else: // COMPILER WARNING — unsupported memory level ${_memoryLevel}, defaulting to L3 -${name} = (${type.typeName}) snrt_l3alloc(${type.referencedType.typeWidth//8} * ${size}); +${name} = (${type.typeName}) snrt_l3alloc(sizeof(${type.referencedType.typeName}) * ${size}); % endif🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/Templates/AllocateTemplate.py` around lines 17 - 20, The fallback allocation path in the else branch uses a different byte-size formula than the primary path. In the snrt_l3alloc call within the else block, replace the calculation formula that uses typeWidth//8 with the same sizeof-based formula that should be used in the non-fallback allocation path to ensure consistent byte-size calculation and prevent potential buffer overruns due to width/ABI mismatches.Deeploy/Targets/Generic/Templates/TopKTemplate.py-16-16 (1)
16-16: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftAvoid allocating
pairs[${data_in_size}]on the stack.Line 16 uses an O(N) VLA on stack; realistic tensor sizes can exhaust stack and crash generated kernels.
Suggested direction
- topk_pair_t pairs[${data_in_size}]; + // Prefer an O(k) approach (min-heap/selection buffer) or a managed transient buffer + // instead of a stack VLA proportional to input size.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Templates/TopKTemplate.py` at line 16, Replace the stack-allocated Variable Length Array (VLA) declaration `topk_pair_t pairs[${data_in_size}]` with dynamic heap allocation to prevent stack overflow crashes with large tensor sizes. Allocate the pairs array dynamically on the heap using appropriate memory allocation functions and ensure proper deallocation before the function returns to avoid memory leaks.Deeploy/Targets/Generic/Bindings.py-333-336 (1)
333-336: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winTopK binding uses 8-bit integers for
kand indices, which truncates valid values.Lines 334-335 cap both
kand output indices toint8_t. This will corrupt indices for typical tensor sizes and reject/overflow largerk.Suggested fix
TopKChecker( - [PointerClass(float32_t), PointerClass(int8_t)], # inputs - [PointerClass(float32_t), PointerClass(int8_t)] # outputs + [PointerClass(float32_t), PointerClass(int32_t)], # inputs + [PointerClass(float32_t), PointerClass(int32_t)] # outputs ),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Bindings.py` around lines 333 - 336, The TopKChecker binding is using int8_t (8-bit signed integers) for both the k parameter and the output indices, which is too restrictive and will cause data corruption for typical tensor operations. Replace the PointerClass(int8_t) references in the TopKChecker constructor with PointerClass(int32_t) or PointerClass(int64_t) to allow for larger k values and tensor indices that can exceed the range of an 8-bit integer.Deeploy/Targets/Generic/Templates/GatherTemplate.py-19-22 (1)
19-22: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winAdd index normalization/bounds checks in multi-index gather path.
Line 21 dereferences
${indices}[j]directly. Negative or out-of-range values can produce invalid memcpy source addresses.Suggested fix
for (uint32_t i=0; i<${batch}; ++i) { for (uint32_t j=0; j<${num_indices}; ++j) { + int32_t idx = ${indices}[j]; + const int32_t axis_dim = (${batch_length} / ${axis_length}); + if (idx < 0) idx += axis_dim; + if (idx < 0 || idx >= axis_dim) { continue; } memcpy(${data_out} + i * (${num_indices} * ${axis_length}) + j * ${axis_length}, - ${data_in} + i * ${batch_length} + ${indices}[j] * ${axis_length}, + ${data_in} + i * ${batch_length} + idx * ${axis_length}, ${axis_length} * ${width}); } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Templates/GatherTemplate.py` around lines 19 - 22, The loop in the memcpy operation directly dereferences ${indices}[j] without validating that the index value is within valid bounds or handling negative values. Add index normalization and bounds checking before using ${indices}[j] to access memory. Validate that each index value is non-negative and within the valid range (0 to the appropriate dimension size) to prevent invalid memcpy source addresses. Apply this validation within the for loop that iterates with j before the memcpy call that uses ${indices}[j].Deeploy/Targets/Generic/Parsers.py-1003-1007 (1)
1003-1007: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winNormalize Gather axis before deriving strides.
Line 1005 treats any
axis <= 0as batch1. For negative axes this computes invalid offsets (batch,batch_length,axis_length) and can produce out-of-bounds addresses in generated gather code.Suggested fix
- axis = self.operatorRepresentation['axis'] - shape = ctxt.lookup(node.inputs[0].name).shape + axis = self.operatorRepresentation['axis'] + shape = ctxt.lookup(node.inputs[0].name).shape + rank = len(shape) + if axis < 0: + axis += rank + if axis < 0 or axis >= rank: + return ctxt, False + self.operatorRepresentation['axis'] = axis self.operatorRepresentation['batch'] = int(np.prod(shape[:axis])) if axis > 0 else 1 self.operatorRepresentation['batch_length'] = int(np.prod(shape[axis:])) self.operatorRepresentation['axis_length'] = int(np.prod(shape[axis + 1:])) if axis + 1 < len(shape) else 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Parsers.py` around lines 1003 - 1007, Negative axes are not being normalized before calculating batch, batch_length, and axis_length values, causing invalid offset computations. In the code block starting with the axis assignment, normalize the axis value to its positive equivalent when it is negative (by adding len(shape) to negative axis values, similar to Python's indexing behavior) before using it in the subsequent calculations for batch, batch_length, and axis_length.Deeploy/Targets/Generic/Parsers.py-2914-2916 (1)
2914-2916: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winTopK parsing currently flattens the full input tensor without a shape/axis guard.
Line 2915 collapses every dimension (
np.prod(data_in.shape)), while this implementation path is 1D-only. For non-1D inputs this silently changes TopK semantics.Suggested fix
self.operatorRepresentation['data_in'] = data_in.name + if len(data_in.shape) != 1: + return ctxt, False self.operatorRepresentation['data_in_size'] = int(np.prod(data_in.shape))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Parsers.py` around lines 2914 - 2916, The TopK parsing logic in the section setting operatorRepresentation for data_in and data_in_size does not validate input dimensions. Add a guard to ensure the input tensor is 1D before calculating data_in_size using np.prod(data_in.shape), since this implementation path assumes 1D inputs. For non-1D tensors, either validate and raise an appropriate error or adjust the logic to handle the correct TopK semantics based on the actual input shape. Reference the data_in.shape check before the line containing int(np.prod(data_in.shape)) to properly validate input dimensionality.Deeploy/TilingExtension/TilerExtension.py-443-448 (1)
443-448: 🩺 Stability & Availability | 🟠 MajorAdd a timeout to the MiniMalloc subprocess call.
Line 443 invokes an external allocator without a timeout. If
minimallochangs, the tiling flow can block indefinitely.🔧 Proposed fix
- minimallocOutput = subprocess.run([ - f"{minimallocInstallDir}/minimalloc", f"--capacity={capacity // alignment}", - f"--input={self._minimalloc_input}.csv", f"--output={self._minimalloc_output}.csv" - ], - capture_output = True, - text = True) + try: + minimallocOutput = subprocess.run([ + f"{minimallocInstallDir}/minimalloc", f"--capacity={capacity // alignment}", + f"--input={self._minimalloc_input}.csv", f"--output={self._minimalloc_output}.csv" + ], + capture_output = True, + text = True, + timeout = 60) + except subprocess.TimeoutExpired as exc: + log.error(f"Memory allocator timed out after {exc.timeout}s at memory level {memoryLevel}.") + raise🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/TilingExtension/TilerExtension.py` around lines 443 - 448, The subprocess.run() call to the minimalloc allocator lacks a timeout parameter, which can cause the tiling flow to block indefinitely if the process hangs. Add a timeout parameter to the subprocess.run() invocation that specifies a reasonable maximum duration for the minimalloc execution. Choose an appropriate timeout value based on expected execution time for the allocator, and ensure that if the timeout is exceeded, the subprocess is properly terminated and an appropriate error is handled.Deeploy/Targets/Spatz/Platform.py-110-117 (1)
110-117: 🩺 Stability & Availability | 🟠 MajorAvoid shared engine state via default arguments.
Line 111 creates one
SpatzEngineat definition time and reuses it for everySpatzPlatform()instance. BecauseNodeMapperobjects in the engine's mapping are stateful (containing mutablebinder,bound, anddiscardedBindingsfields), this leaks binding state across deployments and makes compilation behavior order-dependent.Suggested fix
class SpatzPlatform(DeploymentPlatform): def __init__( self, - engines = [SpatzEngine("SpatzVectorProcessor")], + engines = None, variableBuffer = SpatzVariableBuffer, transientBuffer = SpatzTransientBuffer, constantBuffer = SpatzConstantBuffer, structBuffer = SpatzStructBuffer, includeList: List[str] = includeList ): + if engines is None: + engines = [SpatzEngine("SpatzVectorProcessor")] super().__init__(engines, variableBuffer, constantBuffer, structBuffer, transientBuffer)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/Platform.py` around lines 110 - 117, The default argument engines = [SpatzEngine("SpatzVectorProcessor")] in the __init__ method creates a shared mutable list at class definition time, causing all instances of SpatzPlatform to reuse the same SpatzEngine object and its stateful NodeMapper bindings. Fix this by changing the engines default parameter to None, then inside the __init__ method body, check if engines is None and instantiate a fresh SpatzEngine list only when needed, ensuring each SpatzPlatform instance gets its own independent engine with clean state.Deeploy/Targets/Spatz/TileConstraints/TopKTileConstraint.py-38-40 (1)
38-40: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winGuard TopK shape handling for scalar tensors.
At Line 38 and Line 66, scalar
shape: intwill raise when iterated or passed tolen().🔧 Suggested patch
- for idx, shapeDim in enumerate(_buffer.shape): + shape = (_buffer.shape,) if isinstance(_buffer.shape, int) else tuple(_buffer.shape) + for idx, shapeDim in enumerate(shape): tileDimVar = tilerModel.getTensorDimVar(tensorName = tensorName, dimIdx = idx) tilerModel.addConstraint(tileDimVar == shapeDim) @@ - dataInCube = HyperRectangle(offset = (0,) * len(dataInBuffer.shape), dims = tuple(dataInBuffer.shape)) + dataInShape = (dataInBuffer.shape,) if isinstance(dataInBuffer.shape, int) else tuple(dataInBuffer.shape) + dataInCube = HyperRectangle(offset = (0,) * len(dataInShape), dims = dataInShape)Also applies to: 66-67
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/TileConstraints/TopKTileConstraint.py` around lines 38 - 40, The code iterates over _buffer.shape without guarding for scalar tensors, which will fail when shape is an integer rather than a tuple or list. Add a guard condition in the loop that starts at line 38 (the for loop iterating through enumerate(_buffer.shape)) to check if the shape is scalar (an int) and handle it appropriately, potentially by wrapping it in a tuple or conditionally processing it. Apply the same guard logic to the similar code block at line 66 that also iterates over shape dimensions to ensure consistency across both locations where tensor shapes are processed.Deeploy/Targets/Spatz/TileConstraints/SoftmaxTileConstraint.py-38-40 (1)
38-40: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winNormalize scalar shapes in Softmax tiling paths.
At Line 38 and Line 65, scalar shapes (
int) will fail withenumerate()/len(); this should be normalized first.🔧 Suggested patch
- for idx, shapeDim in enumerate(_buffer.shape): + shape = (_buffer.shape,) if isinstance(_buffer.shape, int) else tuple(_buffer.shape) + for idx, shapeDim in enumerate(shape): tileDimVar = tilerModel.getTensorDimVar(tensorName = tensorName, dimIdx = idx) tilerModel.addConstraint(tileDimVar == shapeDim) @@ - dataInCube = HyperRectangle(offset = (0,) * len(dataInBuffer.shape), dims = tuple(dataInBuffer.shape)) + dataInShape = (dataInBuffer.shape,) if isinstance(dataInBuffer.shape, int) else tuple(dataInBuffer.shape) + dataInCube = HyperRectangle(offset = (0,) * len(dataInShape), dims = dataInShape)Also applies to: 65-66
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/TileConstraints/SoftmaxTileConstraint.py` around lines 38 - 40, The code attempts to enumerate over _buffer.shape which may be a scalar integer value instead of an iterable, causing the enumerate() call to fail. Before the for loop that uses enumerate(_buffer.shape), normalize scalar shapes by converting them to a tuple format (scalar values should be wrapped as a tuple with a single element). Apply this normalization in both locations where enumerate() is used with shape data, specifically where the for loop iterates over _buffer.shape at the start of the constraint addition logic and at the second occurrence around line 65.Deeploy/Targets/Spatz/TileConstraints/GatherTileConstraint.py-39-40 (1)
39-40: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winHandle scalar tensor shapes before iterating or taking
len().At Line 39 and Line 70,
shapeis treated as iterable. Deeploy tensors can carry scalar shapes asint, which will crash here.🔧 Normalize shape to tuple before use
- for idx, shapeDim in enumerate(_buffer.shape): + shape = (_buffer.shape,) if isinstance(_buffer.shape, int) else tuple(_buffer.shape) + for idx, shapeDim in enumerate(shape): tilerModel.addConstraint(tilerModel.getTensorDimVar(tensorName = value, dimIdx = idx) == shapeDim) @@ - dataInCube = HyperRectangle(offset = (0,) * len(dataInBuffer.shape), dims = tuple(dataInBuffer.shape)) - indicesCube = HyperRectangle(offset = (0,) * len(indicesBuffer.shape), dims = tuple(indicesBuffer.shape)) + dataInShape = (dataInBuffer.shape,) if isinstance(dataInBuffer.shape, int) else tuple(dataInBuffer.shape) + indicesShape = (indicesBuffer.shape,) if isinstance(indicesBuffer.shape, int) else tuple(indicesBuffer.shape) + dataInCube = HyperRectangle(offset = (0,) * len(dataInShape), dims = dataInShape) + indicesCube = HyperRectangle(offset = (0,) * len(indicesShape), dims = indicesShape)Also applies to: 70-71
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/TileConstraints/GatherTileConstraint.py` around lines 39 - 40, The code iterates over _buffer.shape in the enumerate loop without checking if it's a scalar integer, which will crash when shape is an int rather than a tuple or list. Before the for loop that enumerates over _buffer.shape, normalize the shape to ensure it's always a tuple - either by wrapping a scalar int in a tuple or using a utility function. Apply the same normalization at the second location (around line 70-71) where shape is also iterated over to handle both tensor and scalar shapes consistently.Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py-53-63 (1)
53-63: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winConstrain or explicitly reject broadcasted batch MatMul in geometry constraints.
At Line 53, constraints are only added when
bufferA.shape[:-2] == bufferB.shape[:-2]. For broadcasted batches, input/output batch tile dimensions are left unconstrained, which can make the solver optimize against a different tile relation than the serializer later enforces.💡 Safe fail-fast fix until full broadcast constraints are encoded
- if (bufferA.shape[:-2] == bufferB.shape[:-2]): + if bufferA.shape[:-2] != bufferB.shape[:-2]: + raise NotImplementedError( + "Broadcasted batch MatMul requires explicit geometric constraints; " + "current tiling relation is only safe for identical batch prefixes." + ) + else: for idx in range(tensorsShapeLenA - 2): tilerModel.addConstraint( tilerModel.getTensorDimVar(tensorName = outputBuffer.name, dimIdx = tensorsShapeLenOutput - 3 - idx) == tilerModel.getTensorDimVar(tensorName = bufferA.name, dimIdx = tensorsShapeLenA - 3 - idx))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py` around lines 53 - 63, The constraint block starting at the if statement checking bufferA.shape[:-2] == bufferB.shape[:-2] only handles the case where batch dimensions match, leaving broadcasted batches unconstrained. Add an else clause after the second for loop to explicitly handle the broadcasted batch case by raising an exception or rejecting the constraint setup, ensuring that mismatched batch dimensions in MatMul operations are caught early rather than allowing the solver to operate on unconstrained dimensions that may conflict with later serialization.Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py-152-155 (1)
152-155: 🩺 Stability & Availability | 🟠 MajorReplace
assertwith explicit runtime validation inserializeTilingSolution.At line 53, the assert validates an unsupported tiling constraint. Using
assertfor this check is unsafe because assertions can be disabled (via Python's-Oflag), allowing invalid tiling schedules to silently generate incorrect code.✅ Deterministic runtime check
- assert all(off == 0 for off in cube.offset[:-3]), ( - f"Unsupported tiling across leading batch dims: offsets={cube.offset}. " - "Only the last batch dim (besides M/O) may be tiled.") + if not all(off == 0 for off in cube.offset[:-3]): + raise ValueError( + f"Unsupported tiling across leading batch dims: offsets={cube.offset}. " + "Only the last batch dim (besides M/O) may be tiled." + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/TileConstraints/MatMulTileConstraint.py` around lines 152 - 155, In the serializeTilingSolution method, replace the assert statement that validates unsupported tiling across leading batch dimensions (checking that all offsets except the last three are zero in cube.offset) with explicit runtime validation using an if condition that raises a ValueError instead. This ensures the validation always executes at runtime regardless of Python optimization flags, preventing invalid tiling schedules from silently generating incorrect code.DeeployTest/testUtils/platformMapping.py-281-291 (1)
281-291: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winSpatz deployer branch breaks
mapDeployercontract (optimizer/defaults/input offsets).Line 286 ignores caller-provided
loweringOptimizer; Line 289 may passNonefordefault_channels_first; andinputOffsetsis not forwarded at all despite being accepted bymapDeployerandSpatzDeployer.Proposed fix
- elif isinstance(platform, (SpatzPlatform)): - deployer = SpatzDeployer( - graph, - platform, - inputTypes, - SpatzOptimizer, - scheduler, - name = name, - default_channels_first = default_channels_first, - deeployStateDir = deeployStateDir - ) + elif isinstance(platform, SpatzPlatform): + if loweringOptimizer is None: + loweringOptimizer = SpatzOptimizer + if default_channels_first is None: + default_channels_first = False + + deployer = SpatzDeployer( + graph, + platform, + inputTypes, + loweringOptimizer, + scheduler, + name = name, + default_channels_first = default_channels_first, + deeployStateDir = deeployStateDir, + inputOffsets = inputOffsets + )Also update
Deeploy/Targets/Spatz/Deployer.pyto forwardinputOffsetstosuper().__init__(..., inputOffsets=inputOffsets)so the value is not dropped downstream.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DeeployTest/testUtils/platformMapping.py` around lines 281 - 291, The SpatzDeployer instantiation in the elif block for SpatzPlatform is breaking the mapDeployer contract in three ways: it hardcodes SpatzOptimizer instead of using the caller-provided loweringOptimizer parameter, it may pass None for default_channels_first instead of using the parameter value, and it completely omits the inputOffsets parameter despite accepting it. Fix these by replacing SpatzOptimizer with loweringOptimizer, ensuring default_channels_first uses the passed parameter, and adding inputOffsets=inputOffsets to the SpatzDeployer constructor call. Additionally, update the SpatzDeployer class in Deeploy/Targets/Spatz/Deployer.py to forward inputOffsets to its super().__init__() call so the value is properly passed to the parent class instead of being dropped.cmake/simulation.cmake-106-115 (1)
106-115: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winAdd the missing
GVSOC_INSTALL_DIRprecondition check in the Spatz macro.Line 114 dereferences
${GVSOC_INSTALL_DIR}without a guard, unlike the generic macro. This can fail late with unclear errors.Proposed fix
macro(add_spatz_gvsoc_emulation name target) + if(NOT DEFINED GVSOC_INSTALL_DIR) + message(FATAL_ERROR "CMake variable GVSOC_INSTALL_DIR not set. Please specify it with -DGVSOC_INSTALL_DIR=<path>") + endif() set(GVSOC_WORKDIR ${CMAKE_BINARY_DIR}/gvsoc_workdir) make_directory(${GVSOC_WORKDIR}) set(GVSOC_BINARY "${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${name}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmake/simulation.cmake` around lines 106 - 115, The macro add_spatz_gvsoc_emulation dereferences the variable GVSOC_INSTALL_DIR without first verifying that it is defined, which can cause unclear errors at runtime. Add a precondition check at the beginning of the macro (before the custom target is created) that validates GVSOC_INSTALL_DIR is set and non-empty, and if not, issue a clear error message using message(FATAL_ERROR ...) to fail early with helpful context about the missing requirement.DeeployTest/Tests/Kernels/FP32/Softmax/Regular2D/network.onnx-1-13 (1)
1-13: 📐 Maintainability & Code Quality | 🟠 MajorONNX test fixtures lack Git LFS configuration — resolve binary storage strategy.
This file is a serialized ONNX model (118 bytes) that cannot be reviewed as text. The repository contains 189 ONNX files without Git LFS rules: files range from 118 bytes to 16.9 MB (e.g.,
DeeployTest/Tests/Models/Transformer/network.onnx), with many multi-megabyte models like microLlama variants (2.2–2.3 MB each) and CCT models (up to 3.7 MB). All are tracked directly in Git rather than via Git LFS or external artifact storage.Migrate test fixtures to Git LFS or an external artifact system to reduce repository bloat and improve clone/checkout performance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DeeployTest/Tests/Kernels/FP32/Softmax/Regular2D/network.onnx` around lines 1 - 13, Binary ONNX model files including network.onnx and other test fixtures are being tracked directly in Git, causing repository bloat. Configure Git LFS for all ONNX files by adding appropriate patterns to the .gitattributes file to handle these binary artifacts, or alternatively migrate the test fixtures to an external artifact storage system. This will prevent large model files from inflating repository size during clones and checkouts.cmake/spatz/toolchain_llvm.cmake-5-12 (1)
5-12: 🩺 Stability & Availability | 🟠 MajorReplace
${SPATZ_HOME}with$ENV{SPATZ_HOME}in the toolchain file.CMake processes toolchain files in an isolated scope before the parent CMakeLists.txt variables are available. The
SPATZ_HOMEvariable defined viaset(SPATZ_HOME $ENV{SPATZ_HOME})in CMakeLists.txt is a normal (non-cache) variable and is not accessible within the toolchain scope. As a result,${SPATZ_HOME}resolves to an empty string, causingSPATZ_TOOLCHAIN_DIRto become/sw/toolchain/llvm-project/build/binand all compiler/tool paths to be incorrect.Access the environment variable directly using
$ENV{SPATZ_HOME}instead on both lines 5 and 41.🔧 Proposed fix
-set(SPATZ_TOOLCHAIN_DIR ${SPATZ_HOME}/sw/toolchain/llvm-project/build/bin) +set(SPATZ_TOOLCHAIN_DIR $ENV{SPATZ_HOME}/sw/toolchain/llvm-project/build/bin)--isystem ${SPATZ_HOME}/sw/toolchain/riscv-gnu-toolchain/riscv-newlib/newlib/libc/include +-isystem $ENV{SPATZ_HOME}/sw/toolchain/riscv-gnu-toolchain/riscv-newlib/newlib/libc/include🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmake/spatz/toolchain_llvm.cmake` around lines 5 - 12, CMake toolchain files execute in an isolated scope where regular variables from CMakeLists.txt are not accessible, causing the SPATZ_TOOLCHAIN_DIR variable definition to fail. Replace all occurrences of ${SPATZ_HOME} with $ENV{SPATZ_HOME} in the toolchain_llvm.cmake file to directly access the environment variable instead. This change must be applied wherever SPATZ_HOME is referenced, including the SPATZ_TOOLCHAIN_DIR assignment and any other locations in the file.DeeployTest/testUtils/codeGenerate.py-220-221 (1)
220-221: 🩺 Stability & Availability | 🟠 MajorGate
DeeployNetwork_BindExternalInputsgeneration to Spatz platform only.
_generateBindExternalInputsis called unconditionally on line 220, generating theDeeployNetwork_BindExternalInputsfunction with references toDeeployNetwork_input_{i}andDeeployNetwork_inputs. The TODO comment correctly identifies this should be Spatz-specific—other platforms (Chimera, CortexM, GAP9, Generic, MemPool, Neureka, PULPOpen, Snitch, SoftHier) do not declare these symbols and will fail compilation. Apply anisinstance(deployer.Platform, SpatzPlatform)guard matching the pattern already used in this file (lines 127, 185, 190, 202). Also fix the typo: "unputs" → "inputs".🔧 Proposed guard
- # TODO: make this work only for spatz and with the correct number of unputs every time - retStr += _generateBindExternalInputs(deployer) + # Only Spatz uses zero-copy external input binding for now. + if isinstance(deployer.Platform, SpatzPlatform): + retStr += _generateBindExternalInputs(deployer)(Note: requires importing
SpatzPlatformfromDeeploy.Targets.Spatz.Platform)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DeeployTest/testUtils/codeGenerate.py` around lines 220 - 221, The call to _generateBindExternalInputs(deployer) on line 220 is unconditional but should only execute for the Spatz platform, as other platforms lack the DeeployNetwork_input_{i} and DeeployNetwork_inputs symbols that this function generates. Wrap the retStr += _generateBindExternalInputs(deployer) line with an isinstance(deployer.Platform, SpatzPlatform) guard following the same pattern used elsewhere in the file (see lines 127, 185, 190, 202). Also import SpatzPlatform from Deeploy.Targets.Spatz.Platform at the top of the file, and fix the typo in the TODO comment by changing "unputs" to "inputs".DeeployTest/Platforms/Spatz/main.c-2-5 (1)
2-5: 🎯 Functional Correctness | 🟠 MajorAdd missing
<string.h>header formemcpyfunction.
memcpyis called on lines 41 and 46 but<string.h>is not included. This can cause compilation failures or warnings depending on compiler strictness settings.Proposed fix
`#include` <stdint.h> `#include` <stddef.h> +#include <string.h> `#include` <benchmark.h> `#include` "printf.h"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DeeployTest/Platforms/Spatz/main.c` around lines 2 - 5, Add the missing `#include <string.h>` header to the include section at the top of the file. Insert this include statement with the other standard library headers (alongside the existing `#include <stdint.h>` and `#include <stddef.h>` includes) to provide the declaration for the `memcpy` function that is being called in the code.DeeployTest/Platforms/Spatz/main.c-77-84 (1)
77-84: 🎯 Functional Correctness | 🟠 MajorPointer aliasing and incomplete NaN detection in float validation.
Lines 80 and 83 use
*(uint32_t*)&variablecasts, which violate C strict aliasing rules and trigger undefined behavior. Additionally, checking only0x7FC00000detects one quiet NaN pattern but misses other NaN encodings; a complete check must cover all values where the exponent bits are 0xFF with a non-zero mantissa.Use
isnan()to detect all NaN cases, and replace unsafe pointer casts withmemcpy()for safe bit reinterpretation:Suggested fix
`#if` ISOUTPUTFLOAT == 1 - // RUNWANG: Allow margin of error for float32_t - // MATTIA: if diff is a quiet nan 0x7FC00000 we want to error - if ((diff < -1e-4f) || (diff > 1e-4f) || *(uint32_t*)&diff == 0x7FC00000) { + // Allow margin of error for float32_t; fail on any NaN. + if (isnan((float)expected) || isnan((float)actual) || isnan((float)diff) || + (diff < -1e-4f) || (diff > 1e-4f)) { tot_err += 1; - // printf("Expected: %f Actual: %f Diff: %f at Index %12u in Output %u\r\n", expected, actual, diff, i, buf); - printf("Expected: 0x%08x Actual: 0x%08x Diff: 0x%08x at Index %4u in Output %u\r\n", *(uint32_t*)&expected, *(uint32_t*)&actual, *(uint32_t*)&diff, i, buf); + uint32_t expected_bits, actual_bits, diff_bits; + memcpy(&expected_bits, &expected, sizeof(expected_bits)); + memcpy(&actual_bits, &actual, sizeof(actual_bits)); + memcpy(&diff_bits, &diff, sizeof(diff_bits)); + printf("Expected: 0x%08x Actual: 0x%08x Diff: 0x%08x at Index %4u in Output %u\r\n", + expected_bits, actual_bits, diff_bits, i, buf); } `#else`Note: If
math.hcannot be included, use__builtin_isnan()or implement a safe bit-pattern check viamemcpy().🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DeeployTest/Platforms/Spatz/main.c` around lines 77 - 84, The code uses unsafe pointer casting with `*(uint32_t*)&` to reinterpret float values, which violates strict aliasing rules and causes undefined behavior. Additionally, the NaN detection only checks for one specific pattern (`0x7FC00000`) and misses other NaN encodings. Replace the direct pointer cast NaN check in the condition (`*(uint32_t*)&diff == 0x7FC00000`) with the `isnan()` function to properly detect all NaN values. For the printf statement that casts the `expected`, `actual`, and `diff` variables to uint32 pointers, replace these unsafe casts with `memcpy()` calls to safely reinterpret the float bits as uint32 values.
🟡 Minor comments (4)
TargetLibraries/Spatz/src/TopK_fp32_int32_spatz.c-59-63 (1)
59-63: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
root_idxshould beint32_t, notfloat32_t.
heap_indices[0]is anint32_t, but it is copied through afloat32_ttemporary and written back intoheap_indices[i]. For indices above 2^24 theint→float→intround-trip loses precision and yields a wrong index. Use the correct integer type.🐛 Fix the temporary type
- float32_t root_val = heap_values[0]; float32_t root_idx = heap_indices[0]; + float32_t root_val = heap_values[0]; int32_t root_idx = heap_indices[0];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Spatz/src/TopK_fp32_int32_spatz.c` around lines 59 - 63, The variable `root_idx` is incorrectly declared as `float32_t` when it should be `int32_t` to match the type of elements in the `heap_indices` array. When assigning an `int32_t` value from `heap_indices[0]` into a `float32_t` temporary and then back to `heap_indices[i]`, precision loss occurs for large index values above 2^24, resulting in incorrect indices. Change the declaration of `root_idx` from `float32_t` to `int32_t` while keeping `root_val` as `float32_t` since it stores values from the `heap_values` array.TargetLibraries/Spatz/src/Softmax_fp32_spatz.c-66-78 (1)
66-78: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
myinvviolates strict aliasing.
uint32_t i = *(uint32_t*)&x;andfloat y = *(float*)&i;reinterpret through incompatible pointer types, which is undefined behavior and may be miscompiled under optimization. Ironicallyexpf_nodiv_reducedalready adopts aunion float_bitsprecisely to avoid this; reuse that approach here (ormemcpy).🛡️ Use the type-punning union instead of pointer casts
float32_t myinv(float32_t x){ - uint32_t i = *(uint32_t*)&x; - i = 0x7EEEEEEE - i; - float y = *(float*)&i; + union float_bits u; + u.f = x; + u.i = 0x7EEEEEEE - u.i; + float y = u.f;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Spatz/src/Softmax_fp32_spatz.c` around lines 66 - 78, The myinv function violates strict aliasing rules by using pointer casts to reinterpret float as uint32_t and back, which is undefined behavior. Replace the pointer casts in myinv with a union type approach (reuse the existing float_bits union already adopted in expf_nodiv_reduced) or use memcpy to safely perform the type punning while maintaining type safety and compatibility with compiler optimizations.conda_enviroment_topk_attention.yml-1-1 (1)
1-1: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
name/prefixuse a~path, which conda will not handle as intended.
name:should be a plain environment name (e.g.deeploy_conda_venv), not~/.conda/envs/.... Theprefix:on Line 81 is a machine-specific absolute-ish path that conda does not tilde-expand and thatconda env createignores/overrides; it leaks the author's local layout. Consider removingprefixand using a simplename. Also the filename is misspelled: "enviroment" → "environment".Also applies to: 81-81
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@conda_enviroment_topk_attention.yml` at line 1, The environment configuration file has two issues: the `name` field on line 1 contains a tilde-path (`~/.conda/envs/deeploy_conda_venv`) when it should only contain a plain environment name like `deeploy_conda_venv` without any path, and the `prefix` field on line 81 contains a machine-specific absolute path that conda does not expand properly. Remove the `prefix` field entirely and replace the `name` value with just the environment name `deeploy_conda_venv`. Additionally, rename the file from `conda_enviroment_topk_attention.yml` to `conda_environment_topk_attention.yml` to fix the spelling of "environment".cmake/spatz/toolchain_llvm.cmake-68-68 (1)
68-68: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winHardcoded, machine-specific
--gcc-toolchainpath.
--gcc-toolchain=/usr/pack/riscv-1.0-kgf/spatz-gcc-7.1.1ties the build to one specific cluster filesystem and will fail elsewhere. Derive it from an environment variable (e.g.$ENV{GCC_INSTALL_DIR}) or a configurable cache option.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmake/spatz/toolchain_llvm.cmake` at line 68, Replace the hardcoded path in the `--gcc-toolchain` flag (currently `/usr/pack/riscv-1.0-kgf/spatz-gcc-7.1.1`) with a dynamic reference that can be configured across different environments. Use either a CMake cache variable (defined with `set()` and available as an option) or an environment variable referenced via `$ENV{VARIABLE_NAME}` syntax. This allows the toolchain path to be customized at build time without modifying the CMake file directly.
🧹 Nitpick comments (9)
TargetLibraries/Spatz/src/MatMul_fp32_spatz.c (2)
26-26: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
constis discarded when passing inputs to the GEMV kernels.
Spatz_MatMul_fp32_fp32_fp32declaresa/basconst float32_t *__restrict__, butgemv_v32b_m4/gemv_col_reductiontake non-constfloat *, and Line 130/137 assignb(const) tofloat *b_offset. These all silently discardconst, which will emit warnings and fail under-Werror.♻️ Make the GEMV input pointers const
-void gemv_v32b_m4(float *a, float *b, float *c, int N, int local_P, int total_P) { +void gemv_v32b_m4(const float *a, const float *b, float *c, int N, int local_P, int total_P) {-void gemv_col_reduction(float *a, float *b, float *c, int N, int local_P, int total_P) { +void gemv_col_reduction(const float *a, const float *b, float *c, int N, int local_P, int total_P) {- float *b_offset = b + cols_core0; float *c_offset = c + cols_core0; + const float *b_offset = b + cols_core0; float *c_offset = c + cols_core0;Please confirm the Spatz toolchain flags (e.g. whether
-Werroris set incmake/spatz/toolchain_llvm.cmake); under-Werrorthe discarded-qualifier warnings would break the build.Also applies to: 51-51, 130-130, 137-137
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Spatz/src/MatMul_fp32_spatz.c` at line 26, The function signature for gemv_v32b_m4 on line 26 and gemv_col_reduction on line 51 are discarding const qualifiers from their input pointer parameters a and b, which causes compiler warnings and fails under -Werror. Update both gemv_v32b_m4 and gemv_col_reduction function signatures to declare their float pointer parameters as const float * instead of non-const float * to match the const qualifiers in the caller Spatz_MatMul_fp32_fp32_fp32. Additionally, update any intermediate pointer variables like b_offset on lines 130 and 137 that are assigned from these parameters to also be declared as const pointers to maintain const correctness throughout.
122-123: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueResolve the TODO on the GEMV algorithm-selection heuristic.
The
N > 4*Pthreshold for choosinggemv_col_reductionvs.gemv_v32b_m4is marked as provisional. Worth pinning down before this leaves draft.Want me to open an issue to track tuning this dispatch threshold?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Spatz/src/MatMul_fp32_spatz.c` around lines 122 - 123, The TODO comment in the M==1 conditional block indicates the algorithm-selection threshold for choosing between gemv_col_reduction and gemv_v32b_m4 needs to be finalized. Replace the provisional threshold logic with either a tuned constant value based on performance benchmarking or define a named constant that clearly represents the optimal heuristic. Remove the TODO comment once the threshold is properly determined and expressed as a meaningful named constant or well-justified numeric value in the conditional that compares N against a multiple of P.Deeploy/Targets/Spatz/Templates/TopKTemplate.py (1)
39-39: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueMisleading comment.
The comment reads
# compute_topk_vector_instructions, butminHeapTemplateinvokescompute_topk_min_heap. Update or remove it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Spatz/Templates/TopKTemplate.py` at line 39, The comment at line 39 incorrectly states `compute_topk_vector_instructions` but the actual code invokes `compute_topk_min_heap` through the `minHeapTemplate` function. Update the comment to accurately reflect that `minHeapTemplate` invokes `compute_topk_min_heap`, or remove the misleading comment if it no longer serves a purpose.TargetLibraries/Spatz/src/Softmax_fp32_spatz.c (1)
4-23: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
myexpfappears to be dead code.The softmax paths use
expf_nodiv_reducedexclusively;myexpf(with itsldexpfdependency) is unused. Per the earlier request to keep the codebase concise, consider removing it unless it is kept intentionally.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Spatz/src/Softmax_fp32_spatz.c` around lines 4 - 23, The myexpf function is unused dead code that should be removed to keep the codebase concise. Since the softmax paths exclusively use expf_nodiv_reduced instead of myexpf, simply delete the entire myexpf function definition that includes the range reduction logic, polynomial approximation, and ldexpf call.TargetLibraries/Spatz/inc/DeeploySpatzMath.h (1)
27-29: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueSingle-core macros reference an undefined
core_id.The kernels in this cohort use
cidfromsnrt_cluster_core_idx(), while these macros expand toif (core_id == 0). Any translation unit usingBEGIN_SINGLE_CORE/SINGLE_COREmust declare acore_idin scope, which is inconsistent with the rest of the codebase and easy to misuse.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Spatz/inc/DeeploySpatzMath.h` around lines 27 - 29, The macros BEGIN_SINGLE_CORE, END_SINGLE_CORE, and SINGLE_CORE reference an undefined variable `core_id`, while the codebase consistently uses `cid` obtained from snrt_cluster_core_idx(). Update these three macro definitions to use snrt_cluster_core_idx() directly within each macro expansion instead of relying on an external `core_id` variable. This will make the macros self-contained and eliminate the requirement for callers to declare `core_id` in scope, ensuring consistency across the codebase.Deeploy/TilingExtension/TileConstraint.py (1)
134-136: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winGuard the multi-output invariant explicitly.
At Line 138, tiling structure is derived only from the first output. Add a fail-fast check that every output has the same shape and memory-path as the first output, otherwise schedules can be silently incorrect for future heterogeneous multi-output operators.
Suggested patch
- outVar, outTensorConstraint = next(iter(tilingSolution.outputTensorMemoryConstraints.items())) + outVar, outTensorConstraint = next(iter(tilingSolution.outputTensorMemoryConstraints.items())) + refShape = tuple(ctxt.lookup(outVar).shape) + refPath = tuple(outTensorConstraint.memoryConstraints.keys()) + for otherVar, otherConstraint in tilingSolution.outputTensorMemoryConstraints.items(): + otherShape = tuple(ctxt.lookup(otherVar).shape) + otherPath = tuple(otherConstraint.memoryConstraints.keys()) + assert otherShape == refShape, \ + f"Multi-output tiling requires matching output shapes, got {outVar}:{refShape} vs {otherVar}:{otherShape}" + assert otherPath == refPath, \ + f"Multi-output tiling requires matching memory paths, got {outVar}:{refPath} vs {otherVar}:{otherPath}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/TilingExtension/TileConstraint.py` around lines 134 - 136, The current assertion in the multi-output check only verifies that at least one output exists, but does not validate that all outputs have compatible tiling characteristics. Add an explicit fail-fast guard after the existing assertion that iterates through tilingSolution.outputTensorMemoryConstraints and verifies that every output shares the same shape and memory-path properties as the first output (which is used to determine the tiling structure). This will prevent silent correctness issues when handling heterogeneous multi-output operators in the future.Makefile (2)
76-76: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winDeclare orchestration targets as
.PHONY.These targets should be phony to avoid accidental no-op builds when files/directories with the same names exist.
Proposed fix
+.PHONY: all toolchain emulators spatz_runtime echo-bash + all: toolchain emulators # docs echo-bashAlso applies to: 100-102, 458-458
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` at line 76, Add `.PHONY` declarations for the `all` target on line 76 and the other orchestration targets referenced at lines 100-102 and 458. Include each target name in a `.PHONY` declaration placed before its target definition to ensure Make always executes these targets regardless of whether files or directories with matching names exist.Source: Linters/SAST tools
450-452: 📐 Maintainability & Code Quality | 🔵 TrivialConsider using generic
python3instead of hardcodedpython3.6for improved portability.Line 450 hardcodes
python3.6, while other recipes in the Makefile (e.g., SNITCH_INSTALL_DIR at line 432) do not specify a Python version. The project contains no documented Python version requirement, and the commented-out gvsoc_spatz recipe (line 465) uses genericpython3. Using a configurable interpreter will make the target compatible with modern Python environments without sacrificing functionality.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 450 - 452, Replace the hardcoded python3.6 interpreter reference with generic python3 in the venv creation command. This change improves portability by allowing the Makefile to work with any available Python 3 version on the system, consistent with other recipes in the Makefile that do not specify a version. Locate the line where python3.6 -m venv .venv is called and change python3.6 to python3 while keeping the rest of the venv and pip install commands intact.conda_enviroment_topk_attention.yml (1)
5-38: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueFully-pinned conda build strings make this env non-reproducible off the author's machine.
The conda section pins exact OS-specific build hashes (e.g.
=h5eee18b_6), which is the output ofconda env exporton one Linux host and will fail to solve on other platforms. If this file is meant to be a shared/reproducible env, preferconda env export --from-historyor loosen to version-only pins.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@conda_enviroment_topk_attention.yml` around lines 5 - 38, The conda environment file contains OS-specific build hashes (the characters after the last equals sign in each package specification, such as h5eee18b_6 in bzip2=1.0.8=h5eee18b_6) which make the environment non-reproducible across different platforms. Remove the build hash suffix from each package line, keeping only the package name and version number (e.g., change bzip2=1.0.8=h5eee18b_6 to bzip2=1.0.8). Alternatively, regenerate this file using conda env export --from-history instead of conda env export to produce a more portable specification.
| self.operatorRepresentation['k_value'] = int(k_in.values[0]) | ||
| self.operatorRepresentation['values_out'] = values_out.name |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win
Validate k as a constant scalar and bound it to input size.
Line 2916 assumes k_in.values[0] exists and is valid. If k is non-constant, scalar-shaped differently, or k > data_in_size, the generated TopK loop indexes beyond the local pairs buffer.
Suggested fix
- self.operatorRepresentation['k_value'] = int(k_in.values[0])
+ if not hasattr(k_in, "values"):
+ return ctxt, False
+ k_value = int(np.asarray(k_in.values).item())
+ if k_value <= 0 or k_value > self.operatorRepresentation['data_in_size']:
+ return ctxt, False
+ self.operatorRepresentation['k_value'] = k_value📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| self.operatorRepresentation['k_value'] = int(k_in.values[0]) | |
| self.operatorRepresentation['values_out'] = values_out.name | |
| if not hasattr(k_in, "values"): | |
| return ctxt, False | |
| k_value = int(np.asarray(k_in.values).item()) | |
| if k_value <= 0 or k_value > self.operatorRepresentation['data_in_size']: | |
| return ctxt, False | |
| self.operatorRepresentation['k_value'] = k_value | |
| self.operatorRepresentation['values_out'] = values_out.name |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Deeploy/Targets/Generic/Parsers.py` around lines 2916 - 2917, Before
assigning the k value at line 2916 in the TopK operator representation setup,
add validation to ensure that k_in is a constant scalar and that its value does
not exceed the size of the input data. Check that k_in is constant-valued,
verify it has scalar shape, extract the k value safely, and validate that the k
value is within bounds relative to data_in_size to prevent buffer overruns in
the TopK loop. If validation fails, raise an appropriate error describing which
constraint was violated.
| target_include_directories(network PUBLIC ${SPATZ_BENCHMARK_INCLUDE_DIR}) | ||
|
|
||
| target_link_libraries(${ProjectId} PRIVATE network deeploylib) | ||
| target_compile_options(${ProjectId} INTERFACE network) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Inspect the changed CMake file:"
sed -n '1,80p' DeeployTest/Platforms/Spatz/CMakeLists.txt
echo
echo "Find target_compile_options usages under DeeployTest/Platforms:"
fd -i 'CMakeLists.txt' DeeployTest/Platforms --exec rg -n 'target_compile_options\(' {}Repository: pulp-platform/Deeploy
Length of output: 1294
🌐 Web query:
CMake target_compile_options passing target name as compile option
💡 Result:
In CMake, the target_compile_options command is used to add compiler options to a specific target [1][2]. The syntax is: target_compile_options( [BEFORE] <INTERFACE|PUBLIC|PRIVATE> ... [<INTERFACE|PUBLIC|PRIVATE> ...]...) You do not "pass" the target name as a compile option in the sense of a flag passed to the compiler; rather, you specify the target name as the first argument to the command to indicate which target the options apply to [1][3]. If you need to access the name of the target being built from within your compile options (e.g., to pass the target name as a preprocessor definition), you can use the
Citations:
- 1: https://cmake.org/cmake/help/latest/command/target_compile_options.html?highlight=target_compile_options
- 2: https://cmake.org/cmake/help/latest/command/target_compile_options.html
- 3: https://cmake.org/cmake/help/v4.3/command/target_compile_options.html
- 4: https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html
- 5: https://cmake.org/cmake/help/latest/command/get_target_property.html
- 6: https://stackoverflow.com/questions/56104607/how-to-print-current-compilation-flags-that-are-set-with-target-compile-options
- 7: https://cmake.org/cmake/help/git-master/command/target_compile_options.html
🏁 Script executed:
# Check what 'network' is in the codebase - is it a library, executable, or custom target?
rg -n "add_library\(network" --type cmake
rg -n "add_executable\(network" --type cmake
rg -n "project\(network" --type cmake
# Check if there's any documentation or comments about this pattern
rg -B3 -A3 'target_compile_options.*network' DeeployTest/Platforms --type cmakeRepository: pulp-platform/Deeploy
Length of output: 159
🏁 Script executed:
# Find where network is defined and what type of target it is
rg -n "add_library.*network|add_executable.*network" --type cmake
# Look at all CMakeLists.txt files that might define network
fd CMakeLists.txt -x grep -l "network" {} \;
# Check the specific context around line 20 again and nearby lines
cat -n DeeployTest/Platforms/Spatz/CMakeLists.txt | sed -n '1,30p'
# Search for any documentation about the network target
rg -n "network" DeeployTest/Platforms --type cmake | head -20Repository: pulp-platform/Deeploy
Length of output: 159
🏁 Script executed:
# Simple direct approach - check if the file exists and show its content
if [ -f "DeeployTest/Platforms/Spatz/CMakeLists.txt" ]; then
echo "=== File exists, showing lines 15-25 ==="
head -25 DeeployTest/Platforms/Spatz/CMakeLists.txt | tail -11
else
echo "File not found"
fi
# Try to find network definition
echo ""
echo "=== Searching for 'network' target definition ==="
find . -name CMakeLists.txt -type f -exec grep -l "add_library.*network\|add_executable.*network" {} \; 2>/dev/null | head -5Repository: pulp-platform/Deeploy
Length of output: 723
🏁 Script executed:
# Look at the main CMakeLists.txt to see how network is defined
cat -n ./DeeployTest/CMakeLists.txt | grep -A5 -B5 "network"
# Also check if there are any build errors documented
find . -name "*.md" -o -name "*.txt" | xargs grep -l "network" 2>/dev/null | head -5Repository: pulp-platform/Deeploy
Length of output: 2064
Remove the invalid target_compile_options call passing a target name as a compile flag.
Line 20 passes network (a library target) as a compile option, which is invalid CMake syntax. Compiler options must be actual flags (like -Wall, -O2) or generator expressions. The dependency is already correctly handled by target_link_libraries on line 19. Similar problematic patterns appear elsewhere in the file and should be audited as well.
💡 Proposed fix
target_link_libraries(${ProjectId} PRIVATE network deeploylib)
-target_compile_options(${ProjectId} INTERFACE network)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| target_compile_options(${ProjectId} INTERFACE network) | |
| target_link_libraries(${ProjectId} PRIVATE network deeploylib) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@DeeployTest/Platforms/Spatz/CMakeLists.txt` at line 20, The
target_compile_options call is incorrectly passing the network target name as a
compile flag. Remove the line containing target_compile_options(${ProjectId}
INTERFACE network) since the network dependency is already correctly handled by
the target_link_libraries call on the preceding line. Compiler options should
only contain actual compiler flags like -Wall or -O2, not target names. After
fixing this line, audit the rest of the CMakeLists.txt file for similar invalid
patterns where target names might be mistakenly used as compile options.
| # Instantiate Classes Required for Memory Level Annotation Extension | ||
| if args.platform == "Spatz": | ||
| # Spatz cluster has only TCDM (L1) + external DRAM (L3). No on-chip L2. | ||
| # Declare L1 and L3 as direct neighbours so BFS-based tile-path | ||
| # generation does not insert a phantom L2 staging buffer. | ||
| L3 = MemoryLevel(name = "L3", neighbourNames = ["L1"], size = 64000000) | ||
| L1 = MemoryLevel(name = "L1", neighbourNames = ["L3"], size = args.l1) | ||
| memoryLevels = [L3, L1] | ||
| else: | ||
| L3 = MemoryLevel(name = "L3", neighbourNames = ["L2"], size = 64000000) | ||
| L2 = MemoryLevel(name = "L2", neighbourNames = ["L3", "L1"], size = args.l2) | ||
| L1 = MemoryLevel(name = "L1", neighbourNames = ["L2"], size = args.l1) | ||
| memoryLevels = [L3, L2, L1] | ||
|
|
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check defaultMemLevel defaults/usages for Spatz paths
rg -n --type=py 'defaultMemLevel|--defaultMemLevel|platform.*Spatz|Spatz' \
DeeployTest/testMVP.py \
DeeployTest/deeployRunner_spatz.py \
DeeployTest/deeployRunner_tiled_spatz.py \
DeeployTest/test_platforms.pyRepository: pulp-platform/Deeploy
Length of output: 1387
🏁 Script executed:
sed -n '150,160p' DeeployTest/testMVP.pyRepository: pulp-platform/Deeploy
Length of output: 701
🏁 Script executed:
sed -n '270,285p' DeeployTest/testMVP.pyRepository: pulp-platform/Deeploy
Length of output: 886
🏁 Script executed:
cat DeeployTest/deeployRunner_spatz.pyRepository: pulp-platform/Deeploy
Length of output: 401
🏁 Script executed:
cat DeeployTest/deeployRunner_tiled_spatz.pyRepository: pulp-platform/Deeploy
Length of output: 400
Spatz platform will fail at runtime with default --defaultMemLevel L2 since L2 is not defined.
The default value for --defaultMemLevel is "L2" (line 155), but lines 255-261 remove L2 for Spatz, creating only L3 and L1. The Spatz runners (deeployRunner_spatz.py and deeployRunner_tiled_spatz.py) do not override this default, so when Spatz is executed without an explicit --defaultMemLevel argument, line 278 will attempt to access memoryHierarchy.memoryLevels["L2"], which does not exist and will cause a runtime failure.
Either pass --defaultMemLevel L3 in Spatz runners or update the default to be platform-aware.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@DeeployTest/testMVP.py` around lines 254 - 267, The Spatz platform
configuration only creates L3 and L1 memory levels but the default value for the
--defaultMemLevel argument is "L2", which does not exist in Spatz, causing a
runtime failure when accessing memoryHierarchy.memoryLevels["L2"]. Update the
Spatz runners (deeployRunner_spatz.py and deeployRunner_tiled_spatz.py) to
explicitly pass --defaultMemLevel L3 when invoking the test, or alternatively
make the default value for --defaultMemLevel platform-aware by checking the
platform argument first and setting the default to "L3" for Spatz and "L2" for
other platforms.
This pr aims to add Spatz platform
Added
Changed
Fixed
PR Merge Checklist
develcommit and pointing todevel.CHANGELOG.mdfile has been updated.