-
Notifications
You must be signed in to change notification settings - Fork 46
Add FP32 operators and tiling support for MicroLlama on Snitch #153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Changes from all commits
1a6308e
1bdf9c9
35d51ef
76a4678
c222810
497e9c1
bf5ddb7
7e3659d
28280fb
ac5d541
c90b35c
5669c28
cf4d9bd
c04bd6a
bdc550e
b53ff75
b355624
5306134
32d88c0
7092e35
b2199cb
7ad03a3
d76f6f1
1c62b68
32d4bfa
66c4b4f
be96413
89d382a
e55c7bc
13a4e64
55b6750
06010e4
85a68fd
182a2c3
03125c0
027ccab
064981a
9be8768
fdc0c82
7813684
4865516
fc8ea3f
b6b6eb5
4e8448b
7003801
e693be7
1633a71
0a66cf4
6b357cf
c7b9771
6e736f4
1062ec2
f7d62b0
ebf9009
ebe3e46
94aea9e
9fa6fd7
f87f98b
1924e5e
dfe3f0a
56e2d57
ba1b27e
c54195f
05b01ef
8338159
79c47fa
4ec3fb3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
|
diaconuccalin marked this conversation as resolved.
diaconuccalin marked this conversation as resolved.
|
|
diaconuccalin marked this conversation as resolved.
diaconuccalin marked this conversation as resolved.
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,3 +31,5 @@ ignore: | |
| - "**/toolchain/" | ||
| # Ignore all files in .git | ||
| - "**/.git/**" | ||
| # Ignore all files in .venv | ||
| - "**/.venv/" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -34,6 +34,12 @@ def alignToContext(self, ctxt: NetworkContext, | |
| bufferIn.aliases.add(bufferOut.name) | ||
| bufferOut.aliases.add(bufferIn.name) | ||
|
|
||
| # Tiling still reads the legacy single-valued `_alias` attribute | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be better practice to do the fix in the tiler (prevent it from relying on _alias and instead use the new alises parameter)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This workload is too heavy, it would touch the deeploy core:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest to open a new pr to fix this |
||
| # (TilerExtension / MemoryScheduler). Set it here so platforms that | ||
| # rely on Reshape pointer-passthrough during tiling don't each need | ||
| # to carry the same workaround in a subclass. | ||
| bufferOut._alias = bufferIn.name | ||
|
|
||
| return ctxt, operatorRepresentation, [] | ||
|
|
||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,11 +12,26 @@ | |
| from Deeploy.TilingExtension.MemoryConstraints import NodeMemoryConstraint | ||
| from Deeploy.TilingExtension.TileConstraint import TileConstraint | ||
| from Deeploy.TilingExtension.TilerModel import TilerModel | ||
| from Deeploy.TilingExtension.TilingCodegen import AbsoluteHyperRectangle, TilingSchedule, VariableReplacementScheme | ||
| from Deeploy.TilingExtension.TilingCodegen import AbsoluteHyperRectangle, HyperRectangle, TilingSchedule, \ | ||
| VariableReplacementScheme | ||
|
|
||
|
|
||
| class BOPTileConstraint(TileConstraint): | ||
| """Tile constraint class for binary operators, i.e. operators that use two input tensors of equal dimensions | ||
| """Tile constraint class for binary operators, i.e. operators that have exactly 2 inputs and 1 output. | ||
|
|
||
| When the second input is a scalar (total size 1), it is kept full-size and only | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is a conflict between the 2 definitions in this documentation ("operators that use two input tensors of equal dimensions" and "when second input is a scalar")
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. According to official documentations however, an operation is considered binary even if it needs broadcasting, so let's change the definition on line 20 (to "i.e. operators that have exactly 2 inputs and 1 output")
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated the wording to your suggestion in dfe3f0a. |
||
| the first input and the output are tiled together. This supports ONNX | ||
| broadcasting in operators that have a corresponding scalar kernel. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe add a warning that this is only partial broadcasting support (second operator is scalar)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done in commit dfe3f0a |
||
|
|
||
| Warning: | ||
| Broadcasting support is partial -- only the case of a fully-scalar | ||
| second input (np.prod(input2.shape) == 1) is handled. Other ONNX | ||
| broadcasting patterns -- input1 scalar, partial broadcasting such | ||
| as (N, 1) + (1, M), single-dim broadcasting such as (N, M, K) + | ||
| (N, 1, K), or rank-mismatched shapes such as (N, M) + (M,) -- | ||
| fall through to the non-scalar branch, where the dim-equality | ||
| constraints will fail to satisfy. Operators that need full ONNX | ||
| broadcasting must use a different tile constraint. | ||
| """ | ||
|
|
||
| dataIn1Name = 'data_in_1' #: str: Name of the first input tensor as defined by the operator's parser | ||
|
|
@@ -34,14 +49,27 @@ def addGeometricalConstraint(cls, tilerModel: TilerModel, parseDict: Dict, ctxt: | |
| tilerModel.addTensorDimToModel(ctxt, bufferName) | ||
|
|
||
| input1Shape = ctxt.lookup(inputBuffer1Name).shape | ||
|
|
||
| for dim in range(len(input1Shape)): | ||
| inputDim1Var = tilerModel.getTensorDimVar(tensorName = inputBuffer1Name, dimIdx = dim) | ||
| inputDim2Var = tilerModel.getTensorDimVar(tensorName = inputBuffer2Name, dimIdx = dim) | ||
| outputDimVar = tilerModel.getTensorDimVar(tensorName = outputBufferName, dimIdx = dim) | ||
|
|
||
| tilerModel.addConstraint(inputDim1Var == inputDim2Var) | ||
| tilerModel.addConstraint(inputDim1Var == outputDimVar) | ||
| input2Shape = list(ctxt.lookup(inputBuffer2Name).shape) | ||
| is_scalar = (np.prod(input2Shape) == 1) | ||
|
|
||
| if is_scalar: | ||
| # Scalar broadcasting: tile input1 and output together; input2 stays full-size. | ||
| for dim in range(len(input1Shape)): | ||
| inputDim1Var = tilerModel.getTensorDimVar(tensorName = inputBuffer1Name, dimIdx = dim) | ||
| outputDimVar = tilerModel.getTensorDimVar(tensorName = outputBufferName, dimIdx = dim) | ||
| tilerModel.addConstraint(inputDim1Var == outputDimVar) | ||
| for dim in range(len(input2Shape)): | ||
| inputDim2Var = tilerModel.getTensorDimVar(tensorName = inputBuffer2Name, dimIdx = dim) | ||
| tilerModel.addConstraint(inputDim2Var == input2Shape[dim]) | ||
| else: | ||
| # Element-wise: all three tensors tiled identically. | ||
| for dim in range(len(input1Shape)): | ||
| inputDim1Var = tilerModel.getTensorDimVar(tensorName = inputBuffer1Name, dimIdx = dim) | ||
| inputDim2Var = tilerModel.getTensorDimVar(tensorName = inputBuffer2Name, dimIdx = dim) | ||
| outputDimVar = tilerModel.getTensorDimVar(tensorName = outputBufferName, dimIdx = dim) | ||
|
|
||
| tilerModel.addConstraint(inputDim1Var == inputDim2Var) | ||
| tilerModel.addConstraint(inputDim1Var == outputDimVar) | ||
|
|
||
| return tilerModel | ||
|
|
||
|
|
@@ -64,11 +92,18 @@ def serializeTilingSolution( | |
| newSize = np.prod(cube.dims) | ||
| replacements["size"].append(newSize) | ||
|
|
||
| input2Shape = list(ctxt.lookup(operatorRepresentation[cls.dataIn2Name]).shape) | ||
| is_scalar = (np.prod(input2Shape) == 1) | ||
|
|
||
| inputLoadSchedule = [] | ||
| outputLoadSchedule = [] | ||
|
|
||
| for cube in outputCubes: | ||
| inputLoadSchedule.append({cls.dataIn1Name: cube, cls.dataIn2Name: cube}) | ||
| if is_scalar: | ||
| in2Cube = HyperRectangle(tuple([0] * len(input2Shape)), tuple(input2Shape)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that this is correct. We don't want to replicate data, and I'm afraid that this tiling solution might cause exactly that (for a scalar input_2, we want to keep only one value in memory, and reuse it for the operation with each element, which will be done in the kernel, instead of transforming it from a scalar into a tensor of the same size as input_1, thus repeating the single element in input_2 many times)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or if data is not replicated, it might cause issues if it does set size of input_2 to the size of input_1, since the kernel might look for data that is not there (since input_2 should be a single scalar, it might access invalid memory). Be sure that you include thorugh tests in the PR with all binary operators that use this feature that cover all cases (multiple shapes for input_1, and both scalar and shape of input_1 for input_2)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The tiling doesn't replicate -- in2Cube.dims = input2Shape (np.prod == 1 for scalars), so the DMA transfers one float and the kernel reads input2[0] by value (no OOB read). Added scalar broadcast tests for Add / Div / Mul covering untiled + tiled (L1 = 2k/5k/10k) in c54195f. |
||
| inputLoadSchedule.append({cls.dataIn1Name: cube, cls.dataIn2Name: in2Cube}) | ||
| else: | ||
| inputLoadSchedule.append({cls.dataIn1Name: cube, cls.dataIn2Name: cube}) | ||
|
|
||
| for out in outputCubes: | ||
| outputLoadSchedule.append({cls.dataOutName: out}) | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.