Good First Issue: isolate the MLX submodule build (MLX backend)

## Problem

MLX is a git submodule (`backends/mlx/third-party/mlx`) pulled into ET's build
with `add_subdirectory`, which drops MLX's whole CMake project **into ET's own
target/option namespace**:

```cmake
# backends/mlx/CMakeLists.txt:239-242
add_subdirectory(${MLX_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR}/mlx)
```

Sharing the namespace causes a collision: MLX's upstream
`FetchContent_MakeAvailable(json)` clashes with the `nlohmann_json` target ET
already provides. We work around it by **patching MLX's own `CMakeLists.txt`**
at configure time (`backends/mlx/CMakeLists.txt:216-238` applies
`patches/mlx_json.patch`, which wraps the fetch in `if(NOT TARGET nlohmann_json)`).

**Why this is bad:** patching a submodule is fragile. The patch is pinned to
specific lines of MLX's `CMakeLists.txt` and **silently stops applying** when an
MLX bump touches that region (`git apply --check` just reports "not
applicable") — and the json collision returns with no clear signal. More
broadly, `add_subdirectory` gives MLX zero isolation: every shared dep MLX
fetches (today `json`; tomorrow maybe `fmt`, `gguf`, …) is a latent collision,
and MLX's `MLX_BUILD_*` options leak into ET's cache.

**Fix:** build MLX in its **own isolated CMake scope** (via `ExternalProject`)
and consume it as a prebuilt static lib + headers + metallib through an
**imported `mlx` target**. MLX then runs its `FetchContent` in its own
namespace, so the patch becomes unnecessary and is deleted.

## How the build works today

**In ET's build** (`backends/mlx/CMakeLists.txt`):

1. Guards: submodule present (`:113-122`); deployment target ≥ macOS 14 / iOS 17 (`:123-164`).
2. **MLX options** (`:165-214`): force-set `MLX_BUILD_*` (METAL ON; CPU/CUDA/python/tests/gguf/safetensors OFF; static lib; JIT). These work *only* because `add_subdirectory` shares the cache.
3. **Patch** (`:216-238`) → *deleted by this issue.*
4. **`add_subdirectory`** (`:239-242`) → *replaced by this issue.*
5. **`mlxdelegate`** (`:259-297`): links MLX as `$<BUILD_INTERFACE:mlx>`, so `mlx` is **not** re-exported.
6. **Install** (`:299-340`): installs `mlxdelegate`/`mlx_schema`/`mlx`, MLX headers, and `mlx.metallib` to `cmake-out/lib/`; caches `MLX_METALLIB_PATH`.

**Downstream consumers** then read those installed artifacts — none of them
build MLX themselves:

| Consumer | How it gets MLX | Source |
|---|---|---|
| qwen / gemma4 runners | `find_package(executorch)` → imported `mlx` target + `MLX_METALLIB_PATH` | `examples/models/{qwen3_5_moe,gemma4_31b}/CMakeLists.txt` |
| package config | recreates imported `mlx` (`find_library` + Metal/Foundation) and `MLX_METALLIB_PATH` (`find_file`) from `cmake-out/lib/` | `tools/cmake/executorch-config.cmake:124-165` |
| metallib copy helper | copies `${MLX_METALLIB_PATH}` next to a binary | `tools/cmake/Utils.cmake:195-213` |
| pybindings wheel | copies metallib from a **hardcoded build path** into the wheel | `setup.py:1080-1085` |
| ET delegate tests | link `mlx` directly | `backends/mlx/test/CMakeLists.txt` |

**Key takeaway for scoping:** the runners and package config only depend on the
**installed** `cmake-out/lib/{libmlx.a, mlx.metallib}` + the exported
`mlxdelegate`. As long as those keep landing in `lib/`, `ExternalProject` vs
`add_subdirectory` is invisible to them. The real work is confined to ET's
in-tree build (plus the one hardcoded `setup.py` path).
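The install-layout contract above can be spot-checked after an install. This is a minimal sketch, assuming the install prefix is `cmake-out` and both files land directly in `<prefix>/lib/`. `missing_mlx_artifacts` is a hypothetical helper, not existing ET tooling:

```python
from pathlib import Path

# Files the runners and package config expect under <prefix>/lib/
# (assumption: both install flat into lib/, as described above).
EXPECTED_MLX_ARTIFACTS = ("libmlx.a", "mlx.metallib")


def missing_mlx_artifacts(prefix):
    """Return the expected MLX artifacts that are absent from <prefix>/lib/.

    An empty list means the layout downstream consumers rely on is intact.
    """
    lib_dir = Path(prefix) / "lib"
    return [name for name in EXPECTED_MLX_ARTIFACTS if not (lib_dir / name).is_file()]
```

If the downstream consumers really are unaffected, `missing_mlx_artifacts("cmake-out")` should return an empty list both before and after the switch to `ExternalProject`.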

## Proposed design

### Part 1 — Build MLX as an `ExternalProject`

Replace the patch + `add_subdirectory` (`:216-242`) with an `ExternalProject_Add`
that configures MLX in its **own** binary dir / CMake invocation. Forward the
options force-set today as `-D` args, plus toolchain/deployment settings, so the
Metal-only static build is identical:

```cmake
include(ExternalProject)

set(_mlx_install_dir ${CMAKE_CURRENT_BINARY_DIR}/mlx-install)
set(_mlx_static_lib  ${_mlx_install_dir}/lib/libmlx.a)
set(_mlx_metallib    ${_mlx_install_dir}/lib/mlx.metallib)

ExternalProject_Add(
  mlx_external
  SOURCE_DIR  ${MLX_SOURCE_DIR}                # submodule, unmodified
  BINARY_DIR  ${CMAKE_CURRENT_BINARY_DIR}/mlx  # isolated FetchContent scope
  INSTALL_DIR ${_mlx_install_dir}
  CMAKE_ARGS
    -DCMAKE_INSTALL_PREFIX=${_mlx_install_dir}
    -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
    -DCMAKE_CXX_STANDARD=${CMAKE_CXX_STANDARD}
    -DCMAKE_OSX_DEPLOYMENT_TARGET=${CMAKE_OSX_DEPLOYMENT_TARGET}
    -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE}   # forward iOS toolchain if set
    -DPLATFORM=${PLATFORM}
    -DMLX_BUILD_METAL=ON -DMLX_BUILD_CPU=OFF -DMLX_BUILD_CUDA=OFF
    -DMLX_BUILD_SHARED_LIBS=OFF -DMLX_BUILD_PYTHON_BINDINGS=OFF
    -DMLX_BUILD_TESTS=OFF -DMLX_BUILD_EXAMPLES=OFF -DMLX_BUILD_BENCHMARKS=OFF
    -DMLX_BUILD_GGUF=OFF -DMLX_BUILD_SAFETENSORS=OFF -DMLX_METAL_JIT=ON
  BUILD_BYPRODUCTS ${_mlx_static_lib} ${_mlx_metallib}   # required for Ninja
)
```

This is what removes the patch: MLX configures in a separate CMake process where
`nlohmann_json` does not pre-exist, so the `if(NOT TARGET nlohmann_json)` guard
is moot.

### Part 2 — Expose an imported `mlx` target (ET in-tree)

`mlxdelegate` and the ET tests link `mlx`. With `ExternalProject` the library is
produced at build time, so define an imported target:

```cmake
add_library(mlx STATIC IMPORTED GLOBAL)
set_target_properties(mlx PROPERTIES IMPORTED_LOCATION ${_mlx_static_lib})
target_include_directories(mlx INTERFACE ${MLX_SOURCE_DIR})   # headers from source tree
# A static libmlx.a carries no transitive deps — re-add MLX's frameworks
# (mirrors third-party/mlx/CMakeLists.txt:209,253).
find_library(_metal Metal)
find_library(_foundation Foundation)
find_library(_quartz QuartzCore)
set_property(TARGET mlx PROPERTY INTERFACE_LINK_LIBRARIES
            ${_metal} ${_foundation} ${_quartz})   # + Accelerate iff MLX_BUILD_CPU
add_dependencies(mlx mlx_external)
```

`mlxdelegate` then links plain `mlx` (drop the `$<BUILD_INTERFACE:mlx>` wrapper
at `:289-291`). **Three things to get right:**

1. **Frameworks**: a static `libmlx.a` exports no transitive deps, so the
   imported target must re-add Metal/Foundation/QuartzCore, or `mlxdelegate`,
   `portable_lib`, and the tests fail with undefined-symbol errors. (ET's
   package config does the downstream half of this at
   `executorch-config.cmake:138-147`, Metal+Foundation only.)
2. **Compile order**: CMake (3.3+) follows dependencies added to an imported
   target transitively, so `add_dependencies(mlx mlx_external)` should already
   make targets that link `mlx` wait for the external build. Also add
   `add_dependencies(mlxdelegate mlx_external)`. It is cheap, makes the
   ordering explicit, and guarantees `libmlx.a` exists before `mlxdelegate` links.
3. **Headers**: point the include dir at the **submodule source tree**
   (`${MLX_SOURCE_DIR}`, always present), not the ExternalProject install dir
   (which only materializes after the external build). This keeps
   `mlxdelegate` compilation from racing the build. It also avoids a CMake
   error at generate time, because CMake rejects an imported target whose
   `INTERFACE_INCLUDE_DIRECTORIES` names a path that does not exist yet.

The runners need nothing here — they get `mlx` from the package config (Part 3),
not this in-tree target.

### Part 3 — Keep install rules + `MLX_METALLIB_PATH` working

Repoint the **in-tree** sources at the ExternalProject output; keep installing
to `CMAKE_INSTALL_LIBDIR` (= `lib`) so the downstream consumers keep finding
everything unchanged:

- `MLX_METALLIB_PATH` cache var (`:337-340`)
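- `install(... mlx.metallib ...)` (`:330-333`) and the `libmlx.a` / headers installs (`:306-314`). An imported target cannot go through `install(TARGETS ...)`, so install `libmlx.a` from the ExternalProject output with `install(FILES ...)`.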

The package config (`executorch-config.cmake:124-165`) and the copy helper
(`Utils.cmake:195-213`) then need **no edits**, since they only read the
installed `lib/` or the cache var.

**One downstream fix is required** — the pybindings wheel uses a hardcoded
build-tree path that the move breaks:

```python
# setup.py:1080-1085 — copies the metallib into the wheel
BuiltFile(
    src_dir="%CMAKE_CACHE_DIR%/backends/mlx/mlx/mlx/backend/metal/kernels/",  # add_subdirectory layout
    src_name="mlx.metallib",
    dst="executorch/extension/pybindings/",
    dependent_cmake_flags=["EXECUTORCH_BUILD_MLX"],
),
```

Repoint `src_dir` to the new metallib location. (This is a separate mechanism
from the in-CMake copy at `CMakeLists.txt:1161-1165`, which uses
`MLX_METALLIB_PATH` and is fine.)
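`MLX_METALLIB_PATH` is already a cache variable, so one way to avoid another hardcoded layout is to read the path from the configured build's `CMakeCache.txt`. This is a minimal sketch. The `NAME:TYPE=VALUE` entry format and the `#` / `//` comment lines come from CMake's cache format. The helper name is hypothetical, and it is an untested assumption that `setup.py`'s `BuiltFile` can consume a path resolved this way:

```python
from pathlib import Path


def read_cmake_cache_var(cache_file, name):
    """Return the value of cache entry `name` from a CMakeCache.txt, or None.

    Entries look like NAME:TYPE=VALUE; lines starting with '#' or '//' are
    comments. (Quoted names with special characters are not handled.)
    """
    for raw in Path(cache_file).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "//")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop the ":TYPE" suffix (FILEPATH, PATH, STRING, INTERNAL, ...).
        if key.split(":", 1)[0] == name:
            return value
    return None
```

For example, `read_cmake_cache_var("cmake-out/CMakeCache.txt", "MLX_METALLIB_PATH")` would return whatever path Part 3 ends up caching.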

> **Confirm the metallib location.** Today we copy from MLX's *build tree*
> (`mlx/mlx/backend/metal/kernels/mlx.metallib`). Check whether MLX's own
> `install()` emits the metallib: if so, point `MLX_METALLIB_PATH`,
> `BUILD_BYPRODUCTS`, and `setup.py` `src_dir` at the install dir; if not, at
> the build-tree path inside `BINARY_DIR`.
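The check in the callout can be scripted against a configured build. This is a minimal sketch, assuming the layout from the Part 1 sketch (`INSTALL_DIR` = `backends/mlx/mlx-install`, `BINARY_DIR` = `backends/mlx/mlx`) and assuming MLX's build tree keeps today's `mlx/backend/metal/kernels/` subpath. Both candidate paths and the helper name are hypothetical until confirmed:

```python
from pathlib import Path

# Hypothetical candidates, relative to the ET build dir. The install dir is
# checked first: it only exists if MLX's own install() emits the metallib.
METALLIB_CANDIDATES = (
    "backends/mlx/mlx-install/lib/mlx.metallib",
    "backends/mlx/mlx/mlx/backend/metal/kernels/mlx.metallib",
)


def find_mlx_metallib(build_dir):
    """Return the first candidate metallib that exists under build_dir, or None."""
    for rel in METALLIB_CANDIDATES:
        path = Path(build_dir) / rel
        if path.is_file():
            return path
    return None
```

Whichever path wins is then the one `MLX_METALLIB_PATH`, `BUILD_BYPRODUCTS`, and the `setup.py` `src_dir` should agree on.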

### Part 4 — Delete the patch machinery

Remove `backends/mlx/patches/mlx_json.patch`, the `_mlx_patches` loop
(`:216-238`), and the `patches/` dir if empty.

## Alternatives considered

- **FetchContent with `EXCLUDE_FROM_ALL` / scoped vars** — doesn't fix the json
  collision (targets still land in ET's namespace), so the patch would remain.
- **Pre-providing `nlohmann_json` to MLX** — effectively today's patch; keeps the
  coupling we want gone.
- **`ExternalProject` (recommended)** — the only option giving MLX a fully
  separate configure/build, deleting the patch and immunizing ET against future
  shared-dep collisions. Costs: imported-target plumbing (incl. frameworks),
  build-time (not configure-time) availability of `libmlx.a`, and weaker
  incremental rebuilds (won't rebuild MLX on source change without
  `BUILD_ALWAYS`) — fine for a pinned submodule contributors don't edit.

## Acceptance criteria

- [ ] `cmake --workflow --preset mlx-release` builds with the MLX submodule
      **pristine** (`git -C backends/mlx/third-party/mlx status` clean).
- [ ] `patches/mlx_json.patch` and the patch loop are deleted.
- [ ] `mlxdelegate` and `mlx` are valid, linkable targets.
- [ ] ET delegate tests build/pass (`-DEXECUTORCH_BUILD_TESTS=ON`):
      `op_test_runner`, `multi_thread_test_runner`, `mlx_mutable_state_test`,
      `strict_compile_test` (they link `mlx` directly).
- [ ] `mlx.metallib` lands next to `_portable_lib.so` and the qwen/gemma runners
      (CI check `.github/workflows/mlx.yml:185-186` passes).
- [ ] The pybindings **wheel** ships `mlx.metallib` next to `_portable_lib.so`;
      after `python install_executorch.py`, running an MLX `.pte` works.
- [ ] `find_package(executorch)` consumers still resolve `mlx` +
      `MLX_METALLIB_PATH` (i.e. both files install to `cmake-out/lib/`).
- [ ] `make qwen3_5_moe-mlx` and `make gemma4_31b-mlx` build/run unchanged.
- [ ] MLX CI (`.github/workflows/mlx.yml`) is green; bumping the submodule needs
      no patch.
- [ ] Delegate binary size is unchanged.
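Several of these criteria can be checked mechanically. This is a minimal sketch, assuming it runs from the repo root. The helper names and the 1% size tolerance are hypothetical choices, not existing ET tooling or an agreed threshold:

```python
import subprocess
from pathlib import Path


def porcelain_is_clean(porcelain_output):
    """`git status --porcelain` prints nothing for an unmodified tree."""
    return porcelain_output.strip() == ""


def submodule_is_pristine(submodule_dir="backends/mlx/third-party/mlx"):
    """True if the MLX submodule has no local modifications (e.g. no applied patch)."""
    out = subprocess.run(
        ["git", "-C", submodule_dir, "status", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
    return porcelain_is_clean(out)


def metallib_next_to(binary):
    """True if mlx.metallib sits in the same directory as `binary`."""
    return (Path(binary).parent / "mlx.metallib").is_file()


def size_within(before, after, rel_tol=0.01):
    """True if file `after` is within rel_tol (fraction) of file `before`'s size.

    rel_tol=0.01 is a hypothetical reading of "unchanged"; tighten as needed.
    """
    old, new = Path(before).stat().st_size, Path(after).stat().st_size
    return abs(new - old) <= rel_tol * old
```

For example, run `submodule_is_pristine()` after a full build, and run `metallib_next_to(...)` on the built `_portable_lib.so` and on each runner binary.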

## Out of scope (follow-ups)

- Caching/prebuilt MLX artifacts to speed clean builds.
- Documenting the new flow in `backends/mlx/README.md`.
- Applying the same isolation to other backends that vendor shared deps.

## Pointers

- MLX integration: `backends/mlx/CMakeLists.txt:113-340` — patch loop `:216-238`,
  `add_subdirectory` `:239-242`, options `:165-214`, `mlx` link `:289-291`,
  install/metallib `:299-340`
- Patch: `backends/mlx/patches/mlx_json.patch` · submodule: `.gitmodules`
- Root hookup: `CMakeLists.txt:724-727`, `:1052-1053`, `:1161-1165`
- Metallib helper: `tools/cmake/Utils.cmake:195-213`
- Package config (downstream `mlx` + `MLX_METALLIB_PATH`):
  `tools/cmake/executorch-config.cmake:124-165`
- Pybindings wheel metallib copy (must repoint): `setup.py:1080-1085`
- Presets: `CMakePresets.json:339-377`; `tools/cmake/preset/mlx.cmake`
- Tests: `backends/mlx/test/CMakeLists.txt`
- Runners (no change expected): `examples/models/{qwen3_5_moe,gemma4_31b,llama,voxtral,voxtral_realtime,parakeet}/CMakeLists.txt`
- CI: `.github/workflows/mlx.yml`