Magia V3 #80
…er and one for main core)
… fixed small bugs
…ta_demux, aligned address map, modified tinyprintf to resolve relative addresses (WIP), added Spatz tests, implemented event_unit_utils helper functions
…t_mm and matmul_compare_spatz_test to work with the cluster
…tion) Cluster code is now compiled to a flat PIC binary and embedded as a .pulp_binary section in the single CV32 ELF, following the same pattern as .spatz_binary, so no separate binary load is needed at simulation time. - Makefile: auto-detect cluster tests via find; drop the cluster=0/1 flag. - Modified tile_csr to give fetch enable to the cluster cores in one-hot fashion. Every core has its own start boot address.
…ion of new cache wrapper
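The one-hot fetch enable described in the tile_csr commit above can be pictured with a small host-side C model: one enable bit and one boot address per cluster core. This is a sketch only, assuming that a core's boot address is written before its enable bit is raised; cluster_fetch_model_t and the cluster_core_* helpers are hypothetical names, and the real tile_csr register layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a per-core fetch-enable/boot-address block.
 * N_CLUSTER_CORES matches the PR; field and function names are illustrative. */
#define N_CLUSTER_CORES 8u

typedef struct {
    uint32_t fetch_enable;                /* bit i = fetch enable of cluster core i */
    uint32_t boot_addr[N_CLUSTER_CORES];  /* start address of each core */
} cluster_fetch_model_t;

/* Assumed ordering: program the core's boot address first, then raise only
 * its own one-hot bit so the core never starts from a stale address. */
static void cluster_core_start(cluster_fetch_model_t *m, unsigned core_id,
                               uint32_t boot_addr)
{
    assert(core_id < N_CLUSTER_CORES);
    m->boot_addr[core_id] = boot_addr;
    m->fetch_enable |= 1u << core_id;
}

/* Returns 1 if the given core is currently allowed to fetch instructions. */
static int cluster_core_enabled(const cluster_fetch_model_t *m, unsigned core_id)
{
    assert(core_id < N_CLUSTER_CORES);
    return (int)((m->fetch_enable >> core_id) & 1u);
}
```

Because each core owns a single bit, cores can be released individually (for example, core 0 and core 3 only) without touching the others.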
AlessandroNadalini left a comment
Since it's a huge PR, this is just the first round of review: I tried to highlight the major changes that are required, but there might be more to address.
Another point is the development/adaptation of the CI, which is currently failing due to some actual errors and mismatches with the newly implemented compilation/simulation flow. I'm taking care of updating the CI repo; we'll consider the verification status once the CI is working again.
In addition to the other comment I left, we should discuss the instance name of the tile's control core and align it across the testbenches and the simulation flow in general: we access some signals internal to the core through the hierarchy, so a name mismatch can cause errors during compilation or when loading the design in the simulation phase.
…der definition in the makefile, debug_req of the tile set to 0
MAGIA V3: PULP Cluster Integration
Hardware Changes
New RTL Modules
obi_slave_ctrl_cluster.sv: OBI slave controller for the PULP cluster. Exposes 8 memory-mapped registers at PULP_CTRL_BASE (0x1740): CLK_EN, DONE, READY, NB_CORES_TO_WAIT, TASKBIN, DATA, START, BINARY. Implements the quorum logic: counts PULP_DONE writes from cluster cores and fires done_o (EU bit 12) once nb_cores_to_wait cores have reported completion.
tile_csr.sv: Tile-level CSR block that aggregates obi_slave_ctrl_cluster and obi_slave_ctrl_spatz behind a single OBI slave port, simplifying the magia_tile top level.
Modified RTL
magia_tile.sv: Integrated 8 PULP cluster cores (RI5CY/CV32E40P), obi_slave_ctrl_cluster, tile_csr, and per-core EU direct links. Cluster cores share the existing HCI/OBI local interconnect and instruction cache.
magia_tile_pkg.sv: Added PULP cluster parameters (N_CLUSTER_CORES, N_BIT_CLUSTER_CORES, PULP_HARTID_BASE, PULP_CTRL_BASE) and the EU event bit assignment for cluster_done (bit 12).
magia_event_unit.sv: Extended the EU direct link from 1 port (CV32 only) to NB_CORES ports (CV32 + all cluster cores). Broadcasts cluster_done (EU bit 12), RedMulE, and iDMA events to every core's EU slice.
core_data_demux_eu_direct.sv: Replaced the previous single-bit registered state with a 2-entry destination FIFO. The old design was broken under CV32E40P's DEPTH=2 outstanding LSU: a second grant to a different path would overwrite the first destination, causing the first response to be silently routed to the wrong instruction. The FIFO tracks per-request destinations in issue order and includes per-path capture registers for out-of-order arrivals.
eu_direct_cut.sv: Updated to the per-core array interface matching the new magia_event_unit ports.
magia.sv / magia_pkg.sv: Updated mesh-level signal connections for the new tile ports.
hw/mesh/noc/: Minor regeneration of all FlooNoC topologies to align with the updated tile port widths.
Software Changes
PULP Cluster Runtime (sw/kernel_pulp/)
New sub-tree for the embedded PULP binary:
pulp_crt0.S: Bare-metal CRT0 for the 8 cluster cores. Computes the per-hart stack pointer, clears BSS (every hart does it; the race is benign), installs trap_handler in mtvec (256-byte aligned, as required by CV32E40P's hardwired mtvec[7:2]=0), enables MEIE/MIE, writes PULP_READY, then parks in a wfi dispatcher loop. On MEI it reads TASKBIN/DATA, ACKs START, calls the task, writes PULP_DONE, and returns with mret.
pulp_program.ld: PIC linker script (ORIGIN=0x0, LENGTH=64 KB). The binary is embedded into the CV32 ELF at .pulp_binary (256-byte aligned per the mtvec constraint).
Makefile: Builds the PULP ELF with -fPIC -mcmodel=medany -O2, converts it to a C header via bin2header.py, and links it into the CV32 build.
Modified CV32 Runtime
sw/kernel/link.ld: Added the .pulp_binary section (256-byte aligned in instrram) and updated the stack layout to accommodate 1 main + 8 PULP core slots.
sw/kernel/crt0.S: Minor updates for PULP binary embedding and stack layout alignment.
New Cluster Tests (sw/tests/cluster_tests/)
Each test follows the pattern: CV32 main prepares buffers via iDMA → boots cluster → dispatches task → waits on EU bit 12 → verifies results.
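The test pattern above (dispatch a task, wait on EU bit 12) and the quorum logic of obi_slave_ctrl_cluster can be summarised with a small host-side C model. It is a sketch under stated assumptions: the registers are plain struct fields, START is modelled as calling each core's handler in turn instead of raising an interrupt, and pulp_ctrl_model_t, pulp_core_on_irq, cluster_run_model and mark_core are hypothetical names, not the cluster_utils.h or magia_pulp_utils.h API.

```c
#include <assert.h>
#include <stdint.h>

#define N_CLUSTER_CORES     8u   /* cluster size used in the PR */
#define EU_CLUSTER_DONE_BIT 12u  /* EU event bit the PR assigns to cluster_done */

/* Task signature assumed for this model: core index plus an opaque argument. */
typedef void (*pulp_task_t)(unsigned core_id, void *data);

/* Hypothetical model of the obi_slave_ctrl_cluster state; register names
 * follow the PR, but layout and behaviour are simplified assumptions. */
typedef struct {
    uint32_t    nb_cores_to_wait; /* NB_CORES_TO_WAIT: quorum size */
    uint32_t    done_count;       /* DONE writes counted so far */
    pulp_task_t taskbin;          /* TASKBIN: task entry point */
    void       *data;             /* DATA: task argument */
    uint32_t    eu_events;        /* stand-in for the CV32 event-unit status */
} pulp_ctrl_model_t;

/* Cluster-core side, following the pulp_crt0.S description: read
 * TASKBIN/DATA, run the task, then write DONE. The controller raises
 * done_o (EU bit 12) when the quorum is reached. */
static void pulp_core_on_irq(pulp_ctrl_model_t *c, unsigned core_id)
{
    c->taskbin(core_id, c->data);
    if (++c->done_count == c->nb_cores_to_wait)
        c->eu_events |= 1u << EU_CLUSTER_DONE_BIT;
}

/* CV32 side: program the task, set the quorum, "start" n cores and report
 * whether the done event was raised. */
static int cluster_run_model(pulp_ctrl_model_t *c, pulp_task_t task,
                             void *data, unsigned n)
{
    assert(n >= 1u && n <= N_CLUSTER_CORES);
    c->taskbin = task;
    c->data = data;
    c->nb_cores_to_wait = n;
    c->done_count = 0;
    c->eu_events &= ~(1u << EU_CLUSTER_DONE_BIT);   /* arm: clear a stale done bit */
    for (unsigned i = 0; i < n; ++i)
        pulp_core_on_irq(c, i);
    /* Real CV32 code would sleep on the EU; the model just samples the bit. */
    return (int)((c->eu_events >> EU_CLUSTER_DONE_BIT) & 1u);
}

/* Example task loosely modelled on hello_pulp: each core writes a non-zero
 * marker (its index + 1) into its own slot. */
static void mark_core(unsigned core_id, void *data)
{
    uint32_t *markers = data;
    markers[core_id] = core_id + 1u;
}
```

Clearing the done bit before starting the cores matters even in the model: a bit left over from a previous run would otherwise make the wait succeed before the new quorum is reached.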
hello_pulp: Smoke test: each cluster core writes its hartid to an L2 marker slot; CV32 verifies that all 8 entries are present.
hello_redmule_pulp: All 8 cluster cores serialize on hwpe_acquire_job(); each runs a 1×64×64 FP16 GEMM into its own private Y slot (Y_BASE + id × 4 KB). CV32 verifies all 8 slots against the golden reference. Exercises HWPE acquire contention, HCI write-back ordering, and EU broadcast masking.
fpu_cluster_test: Each cluster core runs 3 FPU operations (fadd, fmul, fsub) and writes the results to per-core L2 slots. CV32 verifies all entries.
hello_spatz_pulp: PULP cluster cores cooperate with the Spatz vector coprocessor: each cluster core processes a private chunk of an FP16 vector, and Spatz executes the SIMD kernel.
New SW Utilities
sw/utils/cluster_utils.h (new): CV32 orchestrator API (cluster_boot, cluster_dispatch_task, cluster_arm_done_event, cluster_wait_done_eu) and cluster-core identity helpers (cluster_core_id, cluster_tile_id, cluster_chunk_*).
sw/utils/magia_pulp_utils.h (new): Low-level PULP CSR accessors (pulp_init, pulp_run_task, pulp_clk_dis, etc.) wrapping the obi_slave_ctrl_cluster register interface.
Updated SW Utilities
sw/utils/event_unit_utils.h: Added eu_cluster_done_init()/eu_cluster_done_wait() with an absolute mask write (fixes an EU broadcast race; see below) and EU_CLUSTER_DONE_MASK (bit 12).
sw/utils/magia_tile_utils.h: Added PULP_CTRL_BASE, PULP_CORE_COUNT (8), PULP_HARTID_BASE, and stack layout constants.
Note: This is a large PR touching many files with significant architectural changes.
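The core_data_demux_eu_direct.sv fix described under Modified RTL can be illustrated with a small C model of a 2-entry destination FIFO plus per-path capture registers. This is a sketch, not the RTL: it assumes two paths (regular memory and EU direct), at most one captured response per path at a time, and explicit calls in place of clock cycles; demux_model_t and the demux_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEST_FIFO_DEPTH 2u   /* matches the CV32E40P outstanding-LSU depth cited in the PR */

enum { PATH_MEM = 0, PATH_EU = 1, N_PATHS = 2 };

typedef struct {
    uint8_t  dest[DEST_FIFO_DEPTH];  /* destination of each outstanding request, in issue order */
    unsigned head, count;            /* FIFO read pointer and occupancy */
    bool     cap_valid[N_PATHS];     /* per-path capture register: response waiting */
    uint32_t cap_data[N_PATHS];
} demux_model_t;

/* A request is granted on `path`: record its destination.
 * Returns false when the FIFO is full, i.e. the core must stall. */
static bool demux_grant(demux_model_t *d, uint8_t path)
{
    if (d->count == DEST_FIFO_DEPTH)
        return false;
    d->dest[(d->head + d->count) % DEST_FIFO_DEPTH] = path;
    d->count++;
    return true;
}

/* `path` returns a response: capture it until it becomes the oldest.
 * The model assumes demux_deliver() is called before the same path responds again. */
static void demux_response(demux_model_t *d, uint8_t path, uint32_t rdata)
{
    assert(!d->cap_valid[path]);
    d->cap_valid[path] = true;
    d->cap_data[path]  = rdata;
}

/* Hand the oldest response to the core if it has arrived; true on delivery. */
static bool demux_deliver(demux_model_t *d, uint32_t *rdata_out)
{
    if (d->count == 0)
        return false;
    uint8_t path = d->dest[d->head];
    if (!d->cap_valid[path])
        return false;            /* oldest request still in flight: keep issue order */
    *rdata_out = d->cap_data[path];
    d->cap_valid[path] = false;
    d->head = (d->head + 1u) % DEST_FIFO_DEPTH;
    d->count--;
    return true;
}
```

In the scenario that broke the single-register design (a memory load followed by an EU-direct access whose response comes back first), the model holds the younger EU response until the memory response has been delivered, so both reach the core in issue order.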