From 8cbc056b7e7e98e204407f14c2932ceec49582c5 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Tue, 2 Jun 2026 11:35:52 +0000
Subject: [PATCH 01/25] docs: plan for event-driven async + auto-present
 refactor

Adds the Phase 0 handoff plan: background WaitAny GPU thread to replace the
JS-thread polling loop, and a global Choreographer/CADisplayLink-driven
auto-present to remove the non-standard context.present().

https://claude.ai/code/session_01TY7QS4Kiqo1oGQoBodCH56
---
 docs/refactor-async-present-plan.md | 211 ++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)
 create mode 100644 docs/refactor-async-present-plan.md

diff --git a/docs/refactor-async-present-plan.md b/docs/refactor-async-present-plan.md
new file mode 100644
index 000000000..6c22cc98d
--- /dev/null
+++ b/docs/refactor-async-present-plan.md
@@ -0,0 +1,211 @@
+# Refactor: event-driven async + auto-present
+
+Status: **planning / Phase 0 (local spikes)**
+Branch: `claude/keen-darwin-xeywa`
+
+This document is the handoff for moving the async + present refactor forward. Phase 0
+(spikes) needs a real local machine: installed `node_modules`, a Dawn build, and the
+iOS/Android toolchains. The Phase 0 spikes and the "How to resume locally" steps are meant to
+be executed on your computer, not in the web container.
+
+---
+
+## Goals (locked)
+
+- **Async**: replace the JS-thread polling loop with a **background `WaitAny` GPU thread**
+  (Dawn `TimedWaitAny` is already enabled — `packages/webgpu/cpp/rnwgpu/api/GPU.cpp:17-23`).
+- **Present**: **remove `context.present()` entirely** (breaking) in favor of a **global
+  Choreographer / CADisplayLink-driven auto-present**.
+- **Scope**: first-class for **all runtimes** — main JS, the reanimated UI runtime, and
+  `createWorkletRuntime` dedicated runtimes.
+
+---
+
+## What exists today (the two problems)
+
+### Async (polling) — `packages/webgpu/cpp/rnwgpu/async/`
+- Every async op (`requestAdapter`, `requestDevice`, `mapAsync`, `onSubmittedWorkDone`,
+  `createRender/ComputePipelineAsync`, `popErrorScope`, `getCompilationInfo`) registers a Dawn
+  callback with `CallbackMode::AllowProcessEvents` and calls `AsyncRunner::postTask`.
+- `AsyncRunner::requestTick` (`async/AsyncRunner.cpp:89-177`) schedules `tick()` via
+  `setImmediate` / `setTimeout(4ms)` / `queueMicrotask`; `tick()` calls
+  `_instance.ProcessEvents()` and **re-schedules itself while any task is "pumping"**
+  (`AsyncRunner.cpp:189-191`). This is a busy reschedule loop: it wastes CPU when idle, adds
+  latency, and relies on `JSIMicrotaskDispatcher`, whose `queueMicrotask` dispatch is not
+  thread-safe (it must be called on the runtime's own thread).
+
+### Present (manual, non-standard)
+`api/GPUCanvasContext.cpp:56-65` → `SurfaceRegistry.h:116-121` → `wgpu::Surface::Present()`.
+The user must call `context.present()` after every `queue.submit` (**16 JS/TS call sites**).
+No CADisplayLink/Choreographer exists; RN's `requestAnimationFrame` is the only frame driver.
+On Apple, present also does a blocking `WaitForCommandsToBeScheduled` on the JS thread.
+
+---
+
+## Target architecture
+
+Three new pieces:
+
+### A. `RuntimeScheduler` — thread-safe "post to this runtime's JS thread"
+Replaces `AsyncDispatcher` / `JSIMicrotaskDispatcher` (which use non-thread-safe
+`queueMicrotask`).
+- Interface: `void scheduleOnJS(std::function<void(jsi::Runtime&)>)`, callable from any thread.
+- **Main runtime**: wraps `react::CallInvoker::invokeAsync` (already available —
+  `apple/WebGPUModule.mm:70`, `android/cpp/cpp-adapter.cpp:25-29`).
+- **Worklet runtimes**: wraps the worklet runtime's own thread executor from
+  `react-native-worklets` 0.8.3 (**see Phase 0 spike #1**).
+- Stored per-runtime in a `RuntimeContext` (the "per-JS-thread event loop"), created on first
+  WebGPU use, torn down via the existing `RuntimeLifecycleMonitor` / `RuntimeAwareCache`
+  (`cpp/jsi/RuntimeAwareCache.h`).
+
+### B. `GpuEventLoop` — background `WaitAny` thread (no polling)
+One per `wgpu::Instance` (effectively global).
+- All async sites switch `CallbackMode::AllowProcessEvents` → **`CallbackMode::WaitAnyOnly`**,
+  returning a `wgpu::Future`.
+- A **small bounded thread pool**; each pending future is waited via
+  `instance.WaitAny(future, /*timeout*/UINT64_MAX)` on a pool thread → genuinely event-driven,
+  **zero idle CPU**, resolves the instant GPU work completes. No wake/interrupt problem (each
+  thread owns one future). **See Phase 0 spike #2.**
+- On completion the worker marshals the result and calls the owning runtime's
+  `RuntimeScheduler.scheduleOnJS` to settle the JS Promise. `AsyncTaskHandle` / `Promise`
+  settle logic is reused; `AsyncRunner` + its tick loop are deleted.
+- Fallback (if concurrent `WaitAny` on one instance is unsafe): single worker thread waiting on
+  the batched future set with a condition-variable re-arm.
+
+### C. `FrameDriver` — global vsync source for auto-present
+One UI-thread singleton; removes the need for `present()`.
+- **iOS**: `CADisplayLink` on the main run loop. **Android**: NDK
+  `AChoreographer_postFrameCallback` from C++ (API 24+, avoids JNI). **See Phase 0 spike #3.**
+- Lifecycle: started when ≥1 surface is configured, stopped at 0.
+- **Auto-present semantics** (spec-aligned "update the rendering" after rAF):
+  1. `GPUCanvasContext::getCurrentTexture()` marks its `SurfaceInfo` dirty and registers a
+     present request with `FrameDriver`, tagged with the owning runtime.
+  2. Each vsync (UI thread), `FrameDriver` dispatches each dirty context's present onto its
+     **owning runtime's `RuntimeScheduler`** — so `Surface.Present()` + the Apple Metal
+     scheduling wait run on the same thread that did `getCurrentTexture` / `submit`, preserving
+     Dawn surface thread-affinity and guaranteeing present-after-submit ordering (FIFO on that
+     loop). Clear dirty after present.
+- Offscreen path (`SurfaceRegistry` `switchToOffscreen`, `src/Offscreen.ts`) has no surface →
+  present is a no-op; tests keep reading back the CPU texture.
+
+---
+
+## Phase 0 — Local spikes (DO THESE FIRST, on your machine)
+
+These de-risk the refactor before any large change. Run from repo root.
+
+```bash
+# 0. install deps (web container can't do this)
+yarn install
+```
+
+### Spike 1 — worklet-runtime scheduler (HIGHEST RISK)
+Goal: obtain a **thread-safe** "schedule this lambda on runtime R's thread" for an arbitrary
+worklet runtime (UI runtime + a `createWorkletRuntime` runtime) using
+`react-native-worklets@0.8.3`.
+
+```bash
+# inspect the worklets native API actually shipped at 0.8.3
+find node_modules/react-native-worklets -name "*.h" | grep -iE "Runtime|Scheduler|Invoker|Queue"
+# look for: WorkletRuntime, RuntimeManager / WorkletsModuleProxy, UIScheduler / JSScheduler,
+# and any per-runtime executor / async queue we can call from a background C++ thread.
+```
+Deliverable: a one-paragraph note on the exact symbol(s) to use (or "not exposed → needs JS
+shim / worklets PR"). This determines whether Phase 3 (first-class worklet runtimes) is cheap
+or needs a workaround.
+
+### Spike 2 — concurrent `WaitAny` on one Dawn instance
+Goal: confirm multiple threads can each call `instance.WaitAny(singleFuture, UINT64_MAX)`
+concurrently on the **same** instance safely. If not, switch `GpuEventLoop` to the
+single-worker + condition-variable fallback.
+- Search Dawn headers/docs in `externals/dawn` (or built `libs/`) for `WaitAny` threading
+  guarantees. A tiny throwaway C++ test against the built Dawn is ideal.
+
+### Spike 3 — Android frame callback
+Goal: confirm NDK `AChoreographer_postFrameCallback` is usable at the project `minSdk`
+(`packages/webgpu/android/build.gradle`). If `minSdk` is below 24, where that API is
+unavailable, plan the Java `Choreographer` + JNI bridge instead.
+
+---
+
+## Implementation phases (after Phase 0)
+
+**Phase 1 — Event-driven async** (no public API change; `present()` untouched)
+- Add `RuntimeScheduler` (+ main-runtime CallInvoker impl) and `GpuEventLoop`.
+- Switch all 8 async sites to `WaitAnyOnly` + `GpuEventLoop.addFuture(...)`:
+  `api/GPU.cpp`, `api/GPUAdapter.cpp`, `api/GPUDevice.cpp` (×3), `api/GPUBuffer.cpp`,
+  `api/GPUQueue.cpp`, `api/GPUShaderModule.cpp`.
+- Delete `async/AsyncRunner.*` polling + `async/JSIMicrotaskDispatcher.*`; keep
+  `AsyncTaskHandle` / `Promise` settle path on the new scheduler.
+
+**Phase 2 — Auto-present + remove `present()`**
+- Add `FrameDriver` (iOS `CADisplayLink`, Android `AChoreographer`); wire
+  `getCurrentTexture` → register; vsync → dispatch present to owning runtime.
+- Remove `GPUCanvasContext::present` (`api/GPUCanvasContext.h:50,58`, `.cpp:56-65`) and
+  `SurfaceInfo::present` (`SurfaceRegistry.h:116-121`).
+- JS: drop `present` from `RNCanvasContext` (`src/Canvas.tsx:22-24`, `src/types.ts`).
+- Migrate all 16 example / `useWebGPU` call sites + `README.md` + `packages/webgpu/README.md`.
+
+**Phase 3 — First-class worklet runtimes**
+- Worklet-runtime `RuntimeScheduler` impl (per Spike 1); verify auto-present dispatch on UI +
+  dedicated runtimes; update `apps/example/src/Reanimated/Reanimated.tsx` (drop `present()`,
+  keep its own rAF loop).
+
+**Phase 4 — Validation**
+```bash
+yarn tsc && yarn lint
+yarn workspace react-native-wgpu test         # offscreen readback + demo specs
+yarn build:ios        # or: yarn workspace example ios
+yarn build:android    # or: yarn workspace example android
+```
+Verify: no idle-CPU polling (logging), correct frame pacing, no present-ordering glitches,
+Reanimated UI/Dedicated examples render.
+
+---
+
+## 16 `present()` call sites to migrate (Phase 2)
+
+```
+apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
+apps/example/src/components/useWebGPU.ts
+apps/example/src/components/Texture.tsx
+apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
+apps/example/src/ThreeJS/Helmet.tsx
+apps/example/src/ComputeToys/engine/index.ts
+apps/example/src/CanvasAPI/CanvasAPI.tsx
+apps/example/src/ThreeJS/PostProcessing.tsx
+apps/example/src/ThreeJS/Cube.tsx
+apps/example/src/Triangle/HelloTriangle.tsx
+apps/example/src/Triangle/HelloTriangleMSAA.tsx
+apps/example/src/ThreeJS/InstancedMesh.tsx
+apps/example/src/ThreeJS/Retargeting.tsx
+apps/example/src/ThreeJS/components/FiberCanvas.tsx
+apps/example/src/Reanimated/Reanimated.tsx
+apps/example/src/ThreeJS/Backdrop.tsx
+```
+Plus `README.md` and `packages/webgpu/README.md`.
+
+---
+
+## Risks / open questions
+- **Worklet-runtime scheduler** access in worklets 0.8.3 (Spike 1 — highest risk).
+- **Concurrent `WaitAny`** semantics on one Dawn instance (Spike 2; single-worker fallback ready).
+- **Present timing**: vsync-dispatched-to-owning-loop must land after submit (FIFO on that loop)
+  and before the next `getCurrentTexture`.
+- **Breaking change**: `present()` removed — type, examples, README updated together.
+- **Apple Metal wait** moves into the frame-boundary present task, off the synchronous call path.
+
+---
+
+## How to resume locally
+
+```bash
+git fetch origin claude/keen-darwin-xeywa
+git checkout claude/keen-darwin-xeywa
+git pull origin claude/keen-darwin-xeywa
+# open this file and run Phase 0 spikes, then start Claude Code:
+#   claude
+# suggested kickoff prompt:
+#   "Read docs/refactor-async-present-plan.md. Run the Phase 0 spikes and report
+#    findings before implementing. Develop on this branch."
+```

From 234384d810f6e4874eefb57a3f8cd81f7bf44ef9 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Tue, 2 Jun 2026 14:17:23 +0200
Subject: [PATCH 02/25] :wrench:

---
 packages/webgpu/android/CMakeLists.txt        |   3 +-
 .../webgpu/cpp/rnwgpu/RNWebGPUManager.cpp     |   4 +-
 packages/webgpu/cpp/rnwgpu/api/GPU.cpp        |  21 ++-
 packages/webgpu/cpp/rnwgpu/api/GPU.h          |   7 +-
 packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp |  16 +-
 packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp  |  50 ++---
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp  |  84 +++++----
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.h    |   8 +-
 packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp   |   7 +-
 .../webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp |   7 +-
 .../webgpu/cpp/rnwgpu/async/AsyncDispatcher.h |  28 ---
 .../webgpu/cpp/rnwgpu/async/AsyncRunner.cpp   | 172 +++---------------
 .../webgpu/cpp/rnwgpu/async/AsyncRunner.h     |  48 ++---
 .../cpp/rnwgpu/async/AsyncTaskHandle.cpp      |  31 +---
 .../webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h |  12 +-
 .../cpp/rnwgpu/async/CallInvokerScheduler.cpp |  21 +++
 .../cpp/rnwgpu/async/CallInvokerScheduler.h   |  32 ++++
 .../webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp  | 102 +++++++++++
 .../webgpu/cpp/rnwgpu/async/GpuEventLoop.h    |  70 +++++++
 .../rnwgpu/async/JSIMicrotaskDispatcher.cpp   |  23 ---
 .../cpp/rnwgpu/async/JSIMicrotaskDispatcher.h |  22 ---
 .../cpp/rnwgpu/async/RuntimeScheduler.h       |  31 ++++
 22 files changed, 444 insertions(+), 355 deletions(-)
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/AsyncDispatcher.h
 create mode 100644 packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp
 create mode 100644 packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h
 create mode 100644 packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
 create mode 100644 packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.cpp
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.h
 create mode 100644 packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h

diff --git a/packages/webgpu/android/CMakeLists.txt b/packages/webgpu/android/CMakeLists.txt
index fcab0baa1..33704d56c 100644
--- a/packages/webgpu/android/CMakeLists.txt
+++ b/packages/webgpu/android/CMakeLists.txt
@@ -51,7 +51,8 @@ add_library(${PACKAGE_NAME} SHARED
     ../cpp/jsi/RuntimeAwareCache.cpp
     ../cpp/rnwgpu/async/AsyncRunner.cpp
     ../cpp/rnwgpu/async/AsyncTaskHandle.cpp
-    ../cpp/rnwgpu/async/JSIMicrotaskDispatcher.cpp
+    ../cpp/rnwgpu/async/CallInvokerScheduler.cpp
+    ../cpp/rnwgpu/async/GpuEventLoop.cpp
 )
 
 target_include_directories(
diff --git a/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp b/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
index 38f675d37..fe42b9020 100644
--- a/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
+++ b/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
@@ -31,8 +31,8 @@
 #include "GPURenderPassEncoder.h"
 #include "GPURenderPipeline.h"
 #include "GPUSampler.h"
-#include "GPUSharedTextureMemory.h"
 #include "GPUShaderModule.h"
+#include "GPUSharedTextureMemory.h"
 #include "GPUSupportedLimits.h"
 #include "GPUTexture.h"
 #include "GPUTextureView.h"
@@ -63,7 +63,7 @@ RNWebGPUManager::RNWebGPUManager(
   // Register main runtime for RuntimeAwareCache
   BaseRuntimeAwareCache::setMainJsRuntime(_jsRuntime);
 
-  auto gpu = std::make_shared<GPU>(*_jsRuntime);
+  auto gpu = std::make_shared<GPU>(*_jsRuntime, _jsCallInvoker);
   auto rnWebGPU =
       std::make_shared<RNWebGPU>(gpu, _platformContext, _jsCallInvoker);
   _gpu = gpu->get();
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
index 764a9aa32..7332ac394 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
@@ -9,11 +9,14 @@
 
 #include "Convertors.h"
 #include "JSIConverter.h"
-#include "rnwgpu/async/JSIMicrotaskDispatcher.h"
+#include "rnwgpu/async/CallInvokerScheduler.h"
+#include "rnwgpu/async/GpuEventLoop.h"
 
 namespace rnwgpu {
 
-GPU::GPU(jsi::Runtime &runtime) : NativeObject(CLASS_NAME) {
+GPU::GPU(jsi::Runtime &runtime,
+         std::shared_ptr<facebook::react::CallInvoker> callInvoker)
+    : NativeObject(CLASS_NAME) {
   static const auto kTimedWaitAny = wgpu::InstanceFeatureName::TimedWaitAny;
   wgpu::InstanceDescriptor instanceDesc{.requiredFeatureCount = 1,
                                         .requiredFeatures = &kTimedWaitAny};
@@ -22,8 +25,11 @@ GPU::GPU(jsi::Runtime &runtime) : NativeObject(CLASS_NAME) {
   instanceDesc.requiredLimits = &limits;
   _instance = wgpu::CreateInstance(&instanceDesc);
 
-  auto dispatcher = std::make_shared<async::JSIMicrotaskDispatcher>(runtime);
-  _async = async::AsyncRunner::getOrCreate(runtime, _instance, dispatcher);
+  auto scheduler =
+      std::make_shared<async::CallInvokerScheduler>(std::move(callInvoker));
+  auto eventLoop = std::make_shared<async::GpuEventLoop>(_instance);
+  _async = async::AsyncRunner::getOrCreate(runtime, std::move(scheduler),
+                                           std::move(eventLoop));
 }
 
 async::AsyncTaskHandle GPU::requestAdapter(
@@ -41,9 +47,10 @@ async::AsyncTaskHandle GPU::requestAdapter(
   aOptions.backendType = kDefaultBackendType;
   return _async->postTask(
       [this, aOptions](const async::AsyncTaskHandle::ResolveFunction &resolve,
-                       const async::AsyncTaskHandle::RejectFunction &reject) {
-        _instance.RequestAdapter(
-            &aOptions, wgpu::CallbackMode::AllowProcessEvents,
+                       const async::AsyncTaskHandle::RejectFunction &reject)
+          -> wgpu::Future {
+        return _instance.RequestAdapter(
+            &aOptions, wgpu::CallbackMode::WaitAnyOnly,
             [asyncRunner = _async, resolve,
              reject](wgpu::RequestAdapterStatus status, wgpu::Adapter adapter,
                      wgpu::StringView message) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.h b/packages/webgpu/cpp/rnwgpu/api/GPU.h
index f6bb4ede3..89f46526b 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.h
@@ -19,6 +19,10 @@
 
 #include <webgpu/webgpu.h>
 
+namespace facebook::react {
+class CallInvoker;
+} // namespace facebook::react
+
 namespace rnwgpu {
 
 namespace jsi = facebook::jsi;
@@ -27,7 +31,8 @@ class GPU : public NativeObject<GPU> {
 public:
   static constexpr const char *CLASS_NAME = "GPU";
 
-  explicit GPU(jsi::Runtime &runtime);
+  GPU(jsi::Runtime &runtime,
+      std::shared_ptr<facebook::react::CallInvoker> callInvoker);
 
 public:
   std::string getBrand() { return CLASS_NAME; }
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
index 57f77b625..0a35a39e8 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
@@ -92,14 +92,14 @@ async::AsyncTaskHandle GPUAdapter::requestDevice(
       [this, aDescriptor, descriptor, label = std::move(label),
        deviceLostBinding,
        creationRuntime](const async::AsyncTaskHandle::ResolveFunction &resolve,
-                        const async::AsyncTaskHandle::RejectFunction &reject) {
+                        const async::AsyncTaskHandle::RejectFunction &reject)
+          -> wgpu::Future {
         (void)descriptor;
-        _instance.RequestDevice(
-            &aDescriptor, wgpu::CallbackMode::AllowProcessEvents,
+        return _instance.RequestDevice(
+            &aDescriptor, wgpu::CallbackMode::WaitAnyOnly,
             [asyncRunner = _async, resolve, reject, label, creationRuntime,
              deviceLostBinding](wgpu::RequestDeviceStatus status,
-                                wgpu::Device device,
-                                wgpu::StringView message) {
+                                wgpu::Device device, wgpu::StringView message) {
               if (message.length) {
                 fprintf(stderr, "%s", message.data);
               }
@@ -123,14 +123,12 @@ async::AsyncTaskHandle GPUAdapter::requestDevice(
                     case wgpu::LoggingType::Warning:
                       logLevel = "Warning";
                       Logger::warnToJavascriptConsole(
-                          *creationRuntime,
-                          std::string(msg.data, msg.length));
+                          *creationRuntime, std::string(msg.data, msg.length));
                       break;
                     case wgpu::LoggingType::Error:
                       logLevel = "Error";
                       Logger::errorToJavascriptConsole(
-                          *creationRuntime,
-                          std::string(msg.data, msg.length));
+                          *creationRuntime, std::string(msg.data, msg.length));
                       break;
                     case wgpu::LoggingType::Verbose:
                       logLevel = "Verbose";
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
index 4d6012621..a53d97940 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
@@ -54,31 +54,31 @@ async::AsyncTaskHandle GPUBuffer::mapAsync(uint64_t modeIn,
   return _async->postTask(
       [bufferHandle, mode, resolvedOffset,
        rangeSize](const async::AsyncTaskHandle::ResolveFunction &resolve,
-                  const async::AsyncTaskHandle::RejectFunction &reject) {
-        bufferHandle.MapAsync(mode, resolvedOffset, rangeSize,
-                              wgpu::CallbackMode::AllowProcessEvents,
-                              [resolve, reject](wgpu::MapAsyncStatus status,
-                                                wgpu::StringView message) {
-                                switch (status) {
-                                case wgpu::MapAsyncStatus::Success:
-                                  resolve(nullptr);
-                                  break;
-                                case wgpu::MapAsyncStatus::CallbackCancelled:
-                                  reject("MapAsyncStatus::CallbackCancelled");
-                                  break;
-                                case wgpu::MapAsyncStatus::Error:
-                                  reject("MapAsyncStatus::Error");
-                                  break;
-                                case wgpu::MapAsyncStatus::Aborted:
-                                  reject("MapAsyncStatus::Aborted");
-                                  break;
-                                default:
-                                  reject(
-                                      "MapAsyncStatus: " +
-                                      std::to_string(static_cast<int>(status)));
-                                  break;
-                                }
-                              });
+                  const async::AsyncTaskHandle::RejectFunction &reject)
+          -> wgpu::Future {
+        return bufferHandle.MapAsync(
+            mode, resolvedOffset, rangeSize, wgpu::CallbackMode::WaitAnyOnly,
+            [resolve, reject](wgpu::MapAsyncStatus status,
+                              wgpu::StringView message) {
+              switch (status) {
+              case wgpu::MapAsyncStatus::Success:
+                resolve(nullptr);
+                break;
+              case wgpu::MapAsyncStatus::CallbackCancelled:
+                reject("MapAsyncStatus::CallbackCancelled");
+                break;
+              case wgpu::MapAsyncStatus::Error:
+                reject("MapAsyncStatus::Error");
+                break;
+              case wgpu::MapAsyncStatus::Aborted:
+                reject("MapAsyncStatus::Aborted");
+                break;
+              default:
+                reject("MapAsyncStatus: " +
+                       std::to_string(static_cast<int>(status)));
+                break;
+              }
+            });
       });
 }
 
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
index f80d7fadf..98d06b75a 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
@@ -19,23 +19,33 @@ namespace rnwgpu {
 
 void GPUDevice::notifyDeviceLost(wgpu::DeviceLostReason reason,
                                  std::string message) {
-  if (_lostSettled) {
-    return;
-  }
+  std::optional<async::AsyncTaskHandle::ResolveFunction> resolveToCall;
+  std::shared_ptr<GPUDeviceLostInfo> info;
+  {
+    std::lock_guard<std::mutex> lock(_lostMutex);
+    if (_lostSettled) {
+      return;
+    }
 
-  _lostSettled = true;
-  _lostInfo = std::make_shared<GPUDeviceLostInfo>(reason, std::move(message));
+    _lostSettled = true;
+    _lostInfo = std::make_shared<GPUDeviceLostInfo>(reason, std::move(message));
+    info = _lostInfo;
+
+    if (_lostResolve.has_value()) {
+      resolveToCall = std::move(*_lostResolve);
+      _lostResolve.reset();
+    }
 
-  if (_lostResolve.has_value()) {
-    auto resolve = std::move(*_lostResolve);
-    _lostResolve.reset();
-    resolve([info = _lostInfo](jsi::Runtime &runtime) mutable {
+    _lostHandle.reset();
+  }
+
+  // Settle outside the lock: resolve() only enqueues onto the JS thread.
+  if (resolveToCall.has_value()) {
+    (*resolveToCall)([info](jsi::Runtime &runtime) mutable {
       return JSIConverter<std::shared_ptr<GPUDeviceLostInfo>>::toJSI(runtime,
                                                                      info);
     });
   }
-
-  _lostHandle.reset();
 }
 
 void GPUDevice::forceLossForTesting() {
@@ -302,10 +312,10 @@ async::AsyncTaskHandle GPUDevice::createComputePipelineAsync(
                               const async::AsyncTaskHandle::ResolveFunction
                                   &resolve,
                               const async::AsyncTaskHandle::RejectFunction
-                                  &reject) {
+                                  &reject) -> wgpu::Future {
     (void)descriptor;
-    device.CreateComputePipelineAsync(
-        &desc, wgpu::CallbackMode::AllowProcessEvents,
+    return device.CreateComputePipelineAsync(
+        &desc, wgpu::CallbackMode::WaitAnyOnly,
         [pipelineHolder, resolve,
          reject](wgpu::CreatePipelineAsyncStatus status,
                  wgpu::ComputePipeline pipeline, wgpu::StringView msg) {
@@ -316,9 +326,9 @@ async::AsyncTaskHandle GPUDevice::createComputePipelineAsync(
                   runtime, pipelineHolder);
             });
           } else {
-            std::string error =
-                msg.length ? std::string(msg.data, msg.length)
-                           : "Failed to create compute pipeline";
+            std::string error = msg.length
+                                    ? std::string(msg.data, msg.length)
+                                    : "Failed to create compute pipeline";
             reject(std::move(error));
           }
         });
@@ -344,10 +354,10 @@ async::AsyncTaskHandle GPUDevice::createRenderPipelineAsync(
                               const async::AsyncTaskHandle::ResolveFunction
                                   &resolve,
                               const async::AsyncTaskHandle::RejectFunction
-                                  &reject) {
+                                  &reject) -> wgpu::Future {
     (void)descriptor;
-    device.CreateRenderPipelineAsync(
-        &desc, wgpu::CallbackMode::AllowProcessEvents,
+    return device.CreateRenderPipelineAsync(
+        &desc, wgpu::CallbackMode::WaitAnyOnly,
         [pipelineHolder, resolve,
          reject](wgpu::CreatePipelineAsyncStatus status,
                  wgpu::RenderPipeline pipeline, wgpu::StringView msg) {
@@ -358,9 +368,8 @@ async::AsyncTaskHandle GPUDevice::createRenderPipelineAsync(
                   runtime, pipelineHolder);
             });
           } else {
-            std::string error =
-                msg.length ? std::string(msg.data, msg.length)
-                           : "Failed to create render pipeline";
+            std::string error = msg.length ? std::string(msg.data, msg.length)
+                                           : "Failed to create render pipeline";
             reject(std::move(error));
           }
         });
@@ -377,9 +386,9 @@ async::AsyncTaskHandle GPUDevice::popErrorScope() {
   return _async->postTask([device](const async::AsyncTaskHandle::ResolveFunction
                                        &resolve,
                                    const async::AsyncTaskHandle::RejectFunction
-                                       &reject) {
-    device.PopErrorScope(
-        wgpu::CallbackMode::AllowProcessEvents,
+                                       &reject) -> wgpu::Future {
+    return device.PopErrorScope(
+        wgpu::CallbackMode::WaitAnyOnly,
         [resolve, reject](wgpu::PopErrorScopeStatus status,
                           wgpu::ErrorType type, wgpu::StringView message) {
           if (status == wgpu::PopErrorScopeStatus::Error ||
@@ -447,6 +456,11 @@ std::unordered_set<std::string> GPUDevice::getFeatures() {
 }
 
 async::AsyncTaskHandle GPUDevice::getLost() {
+  // Held across the whole body: the postTask callback below runs synchronously
+  // on this (JS) thread and touches the same _lost* fields, so it must not
+  // re-lock. notifyDeviceLost() takes the same lock from its (possibly worker)
+  // thread.
+  std::lock_guard<std::mutex> lock(_lostMutex);
   if (_lostHandle.has_value()) {
     return *_lostHandle;
   }
@@ -455,29 +469,33 @@ async::AsyncTaskHandle GPUDevice::getLost() {
     return _async->postTask(
         [info = _lostInfo](
             const async::AsyncTaskHandle::ResolveFunction &resolve,
-            const async::AsyncTaskHandle::RejectFunction & /*reject*/) {
+            const async::AsyncTaskHandle::RejectFunction & /*reject*/)
+            -> wgpu::Future {
           resolve([info](jsi::Runtime &runtime) mutable {
             return JSIConverter<std::shared_ptr<GPUDeviceLostInfo>>::toJSI(
                 runtime, info);
           });
-        },
-        false);
+          // No Dawn event to wait on: resolved synchronously.
+          return wgpu::Future{};
+        });
   }
 
   auto handle = _async->postTask(
       [this](const async::AsyncTaskHandle::ResolveFunction &resolve,
-             const async::AsyncTaskHandle::RejectFunction & /*reject*/) {
+             const async::AsyncTaskHandle::RejectFunction & /*reject*/)
+          -> wgpu::Future {
         if (_lostSettled && _lostInfo) {
           resolve([info = _lostInfo](jsi::Runtime &runtime) mutable {
             return JSIConverter<std::shared_ptr<GPUDeviceLostInfo>>::toJSI(
                 runtime, info);
           });
-          return;
+          return wgpu::Future{};
         }
 
+        // Resolved later from notifyDeviceLost(); no Dawn event to wait on.
         _lostResolve = resolve;
-      },
-      false);
+        return wgpu::Future{};
+      });
 
   _lostHandle = handle;
   return handle;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
index 2ab1ddd14..765a8d794 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
@@ -45,10 +45,10 @@
 #include "GPURenderPipelineDescriptor.h"
 #include "GPUSampler.h"
 #include "GPUSamplerDescriptor.h"
-#include "GPUSharedTextureMemory.h"
-#include "GPUSharedTextureMemoryDescriptor.h"
 #include "GPUShaderModule.h"
 #include "GPUShaderModuleDescriptor.h"
+#include "GPUSharedTextureMemory.h"
+#include "GPUSharedTextureMemoryDescriptor.h"
 #include "GPUSupportedLimits.h"
 #include "GPUTexture.h"
 #include "GPUTextureDescriptor.h"
@@ -251,6 +251,10 @@ class GPUDevice : public NativeObject<GPUDevice> {
   wgpu::Device _instance;
   std::shared_ptr<async::AsyncRunner> _async;
   std::string _label;
+  // Guards the device-lost state below. notifyDeviceLost() may run on a
+  // GpuEventLoop worker thread (the device-lost callback is Spontaneous), while
+  // getLost() runs on the JS thread, so these fields need synchronization.
+  std::mutex _lostMutex;
   std::optional<async::AsyncTaskHandle> _lostHandle;
   std::shared_ptr<GPUDeviceLostInfo> _lostInfo;
   bool _lostSettled = false;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
index d3c0d65af..9b3365d69 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
@@ -82,9 +82,10 @@ async::AsyncTaskHandle GPUQueue::onSubmittedWorkDone() {
   auto queue = _instance;
   return _async->postTask(
       [queue](const async::AsyncTaskHandle::ResolveFunction &resolve,
-              const async::AsyncTaskHandle::RejectFunction &reject) {
-        queue.OnSubmittedWorkDone(
-            wgpu::CallbackMode::AllowProcessEvents,
+              const async::AsyncTaskHandle::RejectFunction &reject)
+          -> wgpu::Future {
+        return queue.OnSubmittedWorkDone(
+            wgpu::CallbackMode::WaitAnyOnly,
             [resolve, reject](wgpu::QueueWorkDoneStatus status,
                               wgpu::StringView message) {
               if (status == wgpu::QueueWorkDoneStatus::Success) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
index 113dc407c..5ac6d3634 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
@@ -12,10 +12,11 @@ async::AsyncTaskHandle GPUShaderModule::getCompilationInfo() {
 
   return _async->postTask(
       [module](const async::AsyncTaskHandle::ResolveFunction &resolve,
-               const async::AsyncTaskHandle::RejectFunction &reject) {
+               const async::AsyncTaskHandle::RejectFunction &reject)
+          -> wgpu::Future {
         auto result = std::make_shared<GPUCompilationInfo>();
-        module.GetCompilationInfo(
-            wgpu::CallbackMode::AllowProcessEvents,
+        return module.GetCompilationInfo(
+            wgpu::CallbackMode::WaitAnyOnly,
             [result, resolve,
              reject](wgpu::CompilationInfoRequestStatus status,
                      const wgpu::CompilationInfo *compilationInfo) {
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncDispatcher.h b/packages/webgpu/cpp/rnwgpu/async/AsyncDispatcher.h
deleted file mode 100644
index 0ec176824..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncDispatcher.h
+++ /dev/null
@@ -1,28 +0,0 @@
-#pragma once
-
-#include <functional>
-#include <memory>
-
-#include <jsi/jsi.h>
-
-namespace rnwgpu::async {
-
-namespace jsi = facebook::jsi;
-
-/**
- * Abstract dispatcher used by the AsyncRunner to enqueue work back onto the
- * JavaScript thread.
- */
-class AsyncDispatcher {
-public:
-  using Work = std::function<void(jsi::Runtime &)>;
-
-  virtual ~AsyncDispatcher() = default;
-
-  /**
-   * Enqueue a unit of work that will be executed on the JavaScript thread.
-   */
-  virtual void post(Work work) = 0;
-};
-
-} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.cpp b/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.cpp
index 94bbae230..850e57e8a 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.cpp
@@ -1,6 +1,6 @@
 #include "AsyncRunner.h"
 
-#include <chrono>
+#include <memory>
 #include <stdexcept>
 #include <utility>
 
@@ -16,16 +16,17 @@ struct RuntimeData {
 constexpr const char *TAG = "AsyncRunner";
 } // namespace
 
-AsyncRunner::AsyncRunner(wgpu::Instance instance,
-                         std::shared_ptr<AsyncDispatcher> dispatcher)
-    : _instance(std::move(instance)), _dispatcher(std::move(dispatcher)),
-      _pendingTasks(0), _pumpTasks(0), _tickScheduled(false),
-      _lastTickTimeNs(0) {
-  if (!_dispatcher) {
-    throw std::runtime_error("AsyncRunner requires a valid dispatcher.");
+AsyncRunner::AsyncRunner(std::shared_ptr<RuntimeScheduler> scheduler,
+                         std::shared_ptr<GpuEventLoop> eventLoop)
+    : _scheduler(std::move(scheduler)), _eventLoop(std::move(eventLoop)) {
+  if (!_scheduler) {
+    throw std::runtime_error("AsyncRunner requires a valid RuntimeScheduler.");
   }
-  Logger::logToConsole("[%s] Created runner (dispatcher=%p)", TAG,
-                       _dispatcher.get());
+  if (!_eventLoop) {
+    throw std::runtime_error("AsyncRunner requires a valid GpuEventLoop.");
+  }
+  Logger::logToConsole("[%s] Created runner (scheduler=%p, eventLoop=%p)", TAG,
+                       _scheduler.get(), _eventLoop.get());
 }
 
 std::shared_ptr<AsyncRunner> AsyncRunner::get(jsi::Runtime &runtime) {
@@ -38,173 +39,48 @@ std::shared_ptr<AsyncRunner> AsyncRunner::get(jsi::Runtime &runtime) {
 }
 
 std::shared_ptr<AsyncRunner>
-AsyncRunner::getOrCreate(jsi::Runtime &runtime, wgpu::Instance instance,
-                         std::shared_ptr<AsyncDispatcher> dispatcher) {
+AsyncRunner::getOrCreate(jsi::Runtime &runtime,
+                         std::shared_ptr<RuntimeScheduler> scheduler,
+                         std::shared_ptr<GpuEventLoop> eventLoop) {
   auto existing = get(runtime);
   if (existing) {
     return existing;
   }
 
   auto runner =
-      std::make_shared<AsyncRunner>(std::move(instance), std::move(dispatcher));
+      std::make_shared<AsyncRunner>(std::move(scheduler), std::move(eventLoop));
   auto data = std::make_shared<RuntimeData>();
   data->runner = runner;
   runtime.setRuntimeData(runtimeDataUUID(), data);
   return runner;
 }
 
-AsyncTaskHandle AsyncRunner::postTask(const TaskCallback &callback,
-                                      bool keepPumping) {
-  auto handle = AsyncTaskHandle::create(shared_from_this(), keepPumping);
+AsyncTaskHandle AsyncRunner::postTask(const TaskCallback &callback) {
+  auto handle = AsyncTaskHandle::create(_scheduler);
   if (!handle.valid()) {
     throw std::runtime_error("Failed to create AsyncTaskHandle.");
   }
 
-  _pendingTasks.fetch_add(1, std::memory_order_acq_rel);
-  if (keepPumping) {
-    _pumpTasks.fetch_add(1, std::memory_order_acq_rel);
-  }
-  requestTick();
-
-  Logger::logToConsole(
-      "[%s] postTask (keepPumping=%s, pending=%zu, pumping=%zu)", TAG,
-      keepPumping ? "true" : "false",
-      _pendingTasks.load(std::memory_order_acquire),
-      _pumpTasks.load(std::memory_order_acquire));
-
   auto resolve = handle.createResolveFunction();
   auto reject = handle.createRejectFunction();
 
+  wgpu::Future future{};
   try {
-    callback(resolve, reject);
+    future = callback(resolve, reject);
   } catch (const std::exception &exception) {
     reject(exception.what());
+    return handle;
   } catch (...) {
     reject("Unknown native error in AsyncRunner::postTask.");
+    return handle;
   }
 
+  _eventLoop->addFuture(future);
   return handle;
 }
 
-void AsyncRunner::requestTick() {
-  bool expected = false;
-  if (!_tickScheduled.compare_exchange_strong(expected, true,
-                                              std::memory_order_acq_rel)) {
-    return;
-  }
-
-  auto self = shared_from_this();
-  _dispatcher->post([self](jsi::Runtime &runtime) {
-    auto tickCallback = jsi::Function::createFromHostFunction(
-        runtime, jsi::PropNameID::forAscii(runtime, "AsyncRunnerTick"), 0,
-        [self](jsi::Runtime &runtime, const jsi::Value & /*thisValue*/,
-               const jsi::Value * /*args*/, size_t /*count*/) -> jsi::Value {
-          self->tick(runtime);
-          return jsi::Value::undefined();
-        });
-
-#if defined(ANDROID) || defined(__ANDROID__)
-    auto global = runtime.global();
-    auto setImmediateValue = global.getProperty(runtime, "setImmediate");
-    constexpr auto kMinTickInterval = std::chrono::milliseconds(4);
-    const int64_t nowNs =
-        std::chrono::duration_cast<std::chrono::nanoseconds>(
-            std::chrono::steady_clock::now().time_since_epoch())
-            .count();
-    const int64_t lastNs =
-        self->_lastTickTimeNs.load(std::memory_order_acquire);
-    int delayMs = 0;
-    if (lastNs > 0) {
-      const int64_t elapsedNs = nowNs - lastNs;
-      const int64_t minIntervalNs = kMinTickInterval.count() * 1000000LL;
-      if (elapsedNs < minIntervalNs) {
-        const int64_t remainingNs = minIntervalNs - elapsedNs;
-        delayMs = static_cast<int>((remainingNs + 999999) / 1000000);
-      }
-    }
-
-    auto tryScheduleTimeout = [&](int ms) {
-      auto setTimeoutValue = global.getProperty(runtime, "setTimeout");
-      if (!setTimeoutValue.isObject()) {
-        return false;
-      }
-      auto setTimeoutObj = setTimeoutValue.asObject(runtime);
-      if (!setTimeoutObj.isFunction(runtime)) {
-        return false;
-      }
-      Logger::logToConsole("[%s] requestTick scheduled via setTimeout(%d)", TAG,
-                           ms);
-      auto setTimeoutFn = setTimeoutObj.asFunction(runtime);
-      jsi::Value callbackArg(runtime, tickCallback);
-      jsi::Value delayArg(static_cast<double>(ms));
-      setTimeoutFn.call(runtime, callbackArg, delayArg);
-      return true;
-    };
-
-    if (delayMs > 0) {
-      if (tryScheduleTimeout(delayMs)) {
-        return;
-      }
-      // If setTimeout unavailable fall through to immediate scheduling.
-    }
-
-    if (setImmediateValue.isObject()) {
-      auto setImmediateObj = setImmediateValue.asObject(runtime);
-      if (setImmediateObj.isFunction(runtime)) {
-        Logger::logToConsole("[%s] requestTick scheduled via setImmediate",
-                             TAG);
-        auto setImmediateFn = setImmediateObj.asFunction(runtime);
-        jsi::Value callbackArg(runtime, tickCallback);
-        setImmediateFn.call(runtime, callbackArg);
-        return;
-      }
-    }
-
-    int timeoutDelayMs = delayMs > 0 ? delayMs : 0;
-    if (tryScheduleTimeout(timeoutDelayMs)) {
-      return;
-    }
-
-    Logger::logToConsole("[%s] requestTick scheduled via microtask fallback",
-                         TAG);
-    runtime.queueMicrotask(std::move(tickCallback));
-#else
-    Logger::logToConsole("[%s] requestTick scheduled microtask (non-Android)",
-                         TAG);
-    runtime.queueMicrotask(std::move(tickCallback));
-#endif
-  });
-}
-
-void AsyncRunner::tick(jsi::Runtime & /*runtime*/) {
-  _tickScheduled.store(false, std::memory_order_release);
-  _instance.ProcessEvents();
-  const auto nowNs = std::chrono::duration_cast<std::chrono::nanoseconds>(
-                         std::chrono::steady_clock::now().time_since_epoch())
-                         .count();
-  _lastTickTimeNs.store(nowNs, std::memory_order_release);
-  Logger::logToConsole("[%s] tick processed events (pending=%zu, pumping=%zu)",
-                       TAG, _pendingTasks.load(std::memory_order_acquire),
-                       _pumpTasks.load(std::memory_order_acquire));
-  if (_pumpTasks.load(std::memory_order_acquire) > 0) {
-    requestTick();
-  }
-}
-
-void AsyncRunner::onTaskSettled(bool keepPumping) {
-  _pendingTasks.fetch_sub(1, std::memory_order_acq_rel);
-  if (keepPumping) {
-    _pumpTasks.fetch_sub(1, std::memory_order_acq_rel);
-  }
-  Logger::logToConsole(
-      "[%s] onTaskSettled (keepPumping=%s, pending=%zu, pumping=%zu)", TAG,
-      keepPumping ? "true" : "false",
-      _pendingTasks.load(std::memory_order_acquire),
-      _pumpTasks.load(std::memory_order_acquire));
-}
-
-std::shared_ptr<AsyncDispatcher> AsyncRunner::dispatcher() const {
-  return _dispatcher;
+std::shared_ptr<RuntimeScheduler> AsyncRunner::scheduler() const {
+  return _scheduler;
 }
 
 jsi::UUID AsyncRunner::runtimeDataUUID() {
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.h b/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.h
index f81101d10..7c01d0f69 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.h
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.h
@@ -1,14 +1,13 @@
 #pragma once
 
-#include <atomic>
-#include <cstdint>
 #include <functional>
 #include <memory>
 
 #include <jsi/jsi.h>
 
-#include "AsyncDispatcher.h"
 #include "AsyncTaskHandle.h"
+#include "GpuEventLoop.h"
+#include "RuntimeScheduler.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -16,38 +15,43 @@ namespace jsi = facebook::jsi;
 
 namespace rnwgpu::async {
 
+/**
+ * Per-runtime coordinator for asynchronous WebGPU operations.
+ *
+ * Bundles the runtime's RuntimeScheduler (how to settle Promises back on the
+ * owning JS thread) with the GpuEventLoop (how to wait on Dawn futures off the
+ * JS thread). This replaces the previous ProcessEvents polling design: there is
+ * no tick loop and no idle CPU usage.
+ *
+ * A task callback registers a Dawn async op with CallbackMode::WaitAnyOnly and
+ * returns the resulting wgpu::Future, which is handed to the GpuEventLoop. A
+ * returned future with id == 0 means "no event to wait on" (deferred/immediate
+ * resolution, e.g. GPUDevice::getLost).
+ */
 class AsyncRunner : public std::enable_shared_from_this<AsyncRunner> {
 public:
   using TaskCallback =
-      std::function<void(const AsyncTaskHandle::ResolveFunction &,
-                         const AsyncTaskHandle::RejectFunction &)>;
+      std::function<wgpu::Future(const AsyncTaskHandle::ResolveFunction &,
+                                 const AsyncTaskHandle::RejectFunction &)>;
 
-  AsyncRunner(wgpu::Instance instance,
-              std::shared_ptr<AsyncDispatcher> dispatcher);
+  AsyncRunner(std::shared_ptr<RuntimeScheduler> scheduler,
+              std::shared_ptr<GpuEventLoop> eventLoop);
 
   static std::shared_ptr<AsyncRunner> get(jsi::Runtime &runtime);
   static std::shared_ptr<AsyncRunner>
-  getOrCreate(jsi::Runtime &runtime, wgpu::Instance instance,
-              std::shared_ptr<AsyncDispatcher> dispatcher);
+  getOrCreate(jsi::Runtime &runtime,
+              std::shared_ptr<RuntimeScheduler> scheduler,
+              std::shared_ptr<GpuEventLoop> eventLoop);
 
-  AsyncTaskHandle postTask(const TaskCallback &callback,
-                           bool keepPumping = true);
+  AsyncTaskHandle postTask(const TaskCallback &callback);
 
-  void requestTick();
-  void tick(jsi::Runtime &runtime);
-  void onTaskSettled(bool keepPumping);
-
-  std::shared_ptr<AsyncDispatcher> dispatcher() const;
+  std::shared_ptr<RuntimeScheduler> scheduler() const;
 
 private:
   static jsi::UUID runtimeDataUUID();
 
-  wgpu::Instance _instance;
-  std::shared_ptr<AsyncDispatcher> _dispatcher;
-  std::atomic<size_t> _pendingTasks;
-  std::atomic<size_t> _pumpTasks;
-  std::atomic<bool> _tickScheduled;
-  std::atomic<int64_t> _lastTickTimeNs;
+  std::shared_ptr<RuntimeScheduler> _scheduler;
+  std::shared_ptr<GpuEventLoop> _eventLoop;
 };
 
 } // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
index 6b262005a..c0876c1e3 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
@@ -1,20 +1,19 @@
 #include "AsyncTaskHandle.h"
 
+#include <memory>
 #include <string>
 #include <utility>
 
 #include "Promise.h"
 
-#include "AsyncRunner.h"
-
 namespace rnwgpu::async {
 
 using Action = std::function<void(jsi::Runtime &, rnwgpu::Promise &)>;
 
 struct AsyncTaskHandle::State
     : public std::enable_shared_from_this<AsyncTaskHandle::State> {
-  State(std::shared_ptr<AsyncRunner> runner, bool keepPumping)
-      : runner(std::move(runner)), keepPumping(keepPumping) {}
+  explicit State(std::shared_ptr<RuntimeScheduler> scheduler)
+      : scheduler(std::move(scheduler)) {}
 
   void settle(Action action);
   void attachPromise(const std::shared_ptr<rnwgpu::Promise> &promise);
@@ -26,12 +25,11 @@ struct AsyncTaskHandle::State
   std::shared_ptr<rnwgpu::Promise> currentPromise();
 
   std::mutex mutex;
-  std::weak_ptr<AsyncRunner> runner;
+  std::shared_ptr<RuntimeScheduler> scheduler;
   std::shared_ptr<rnwgpu::Promise> promise;
   std::optional<Action> pendingAction;
   bool settled = false;
   std::shared_ptr<State> keepAlive;
-  bool keepPumping;
 };
 
 // MARK: - State helpers
@@ -77,26 +75,18 @@ void AsyncTaskHandle::State::attachPromise(
 }
 
 void AsyncTaskHandle::State::schedule(Action action) {
-  auto runnerRef = runner.lock();
-  if (!runnerRef) {
+  if (!scheduler) {
     return;
   }
 
   auto promiseRef = currentPromise();
   if (!promiseRef) {
-    runnerRef->onTaskSettled(keepPumping);
-    return;
-  }
-
-  auto dispatcherRef = runnerRef->dispatcher();
-  if (!dispatcherRef) {
-    runnerRef->onTaskSettled(keepPumping);
     return;
   }
 
-  dispatcherRef->post([self = shared_from_this(), action = std::move(action),
-                       runnerRef, promiseRef](jsi::Runtime &runtime) mutable {
-    runnerRef->onTaskSettled(self->keepPumping);
+  scheduler->scheduleOnJS([self = shared_from_this(),
+                           action = std::move(action),
+                           promiseRef](jsi::Runtime &runtime) mutable {
     action(runtime, *promiseRef);
     std::lock_guard<std::mutex> lock(self->mutex);
     self->keepAlive.reset();
@@ -149,9 +139,8 @@ AsyncTaskHandle::AsyncTaskHandle(std::shared_ptr<State> state)
 bool AsyncTaskHandle::valid() const { return _state != nullptr; }
 
 AsyncTaskHandle
-AsyncTaskHandle::create(const std::shared_ptr<AsyncRunner> &runner,
-                        bool keepPumping) {
-  auto state = std::make_shared<State>(runner, keepPumping);
+AsyncTaskHandle::create(const std::shared_ptr<RuntimeScheduler> &scheduler) {
+  auto state = std::make_shared<State>(scheduler);
   state->keepAlive = state;
   return AsyncTaskHandle(std::move(state));
 }
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
index cb6c7a2a4..2f910fd3f 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
@@ -8,7 +8,7 @@
 
 #include <jsi/jsi.h>
 
-#include "AsyncDispatcher.h"
+#include "RuntimeScheduler.h"
 
 namespace rnwgpu {
 class Promise;
@@ -16,11 +16,13 @@ class Promise;
 
 namespace rnwgpu::async {
 
-class AsyncRunner;
-
 /**
  * Represents a pending asynchronous WebGPU operation that can be converted into
  * a JavaScript Promise.
+ *
+ * The native callback (resolve/reject) may be invoked from any thread (e.g. a
+ * GpuEventLoop worker); the actual Promise settlement is marshalled onto the
+ * owning runtime's JS thread via a RuntimeScheduler.
  */
 class AsyncTaskHandle {
 public:
@@ -45,8 +47,8 @@ class AsyncTaskHandle {
 
   void attachPromise(const std::shared_ptr<rnwgpu::Promise> &promise) const;
 
-  static AsyncTaskHandle create(const std::shared_ptr<AsyncRunner> &runner,
-                                bool keepPumping);
+  static AsyncTaskHandle
+  create(const std::shared_ptr<RuntimeScheduler> &scheduler);
 
 private:
   std::shared_ptr<State> _state;
diff --git a/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp b/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp
new file mode 100644
index 000000000..2ef72f407
--- /dev/null
+++ b/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp
@@ -0,0 +1,21 @@
+#include "CallInvokerScheduler.h"
+
+#include <memory>
+#include <utility>
+
+namespace rnwgpu::async {
+
+CallInvokerScheduler::CallInvokerScheduler(
+    std::shared_ptr<react::CallInvoker> invoker)
+    : _invoker(std::move(invoker)) {}
+
+void CallInvokerScheduler::scheduleOnJS(
+    std::function<void(jsi::Runtime &)> job) {
+  if (!_invoker || !job) {
+    return;
+  }
+  _invoker->invokeAsync(
+      [job = std::move(job)](jsi::Runtime &runtime) { job(runtime); });
+}
+
+} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h b/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h
new file mode 100644
index 000000000..cbb6a9174
--- /dev/null
+++ b/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h
@@ -0,0 +1,32 @@
+#pragma once
+
+#include <functional>
+#include <memory>
+
+#include <ReactCommon/CallInvoker.h>
+#include <jsi/jsi.h>
+
+#include "RuntimeScheduler.h"
+
+namespace rnwgpu::async {
+
+namespace jsi = facebook::jsi;
+namespace react = facebook::react;
+
+/**
+ * RuntimeScheduler for the main React Native JS runtime, backed by
+ * react::CallInvoker::invokeAsync. invokeAsync is safe to call from any thread
+ * and delivers the work on the JS thread with the runtime, which is exactly the
+ * contract RuntimeScheduler requires.
+ */
+class CallInvokerScheduler final : public RuntimeScheduler {
+public:
+  explicit CallInvokerScheduler(std::shared_ptr<react::CallInvoker> invoker);
+
+  void scheduleOnJS(std::function<void(jsi::Runtime &)> job) override;
+
+private:
+  std::shared_ptr<react::CallInvoker> _invoker;
+};
+
+} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
new file mode 100644
index 000000000..91119dd96
--- /dev/null
+++ b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
@@ -0,0 +1,102 @@
+#include "GpuEventLoop.h"
+
+#include <algorithm>
+#include <cstdint>
+#include <thread>
+#include <utility>
+
+#include "WGPULogger.h"
+
+namespace rnwgpu::async {
+
+namespace {
+constexpr const char *TAG = "GpuEventLoop";
+
+std::size_t computeMaxWorkers() {
+  unsigned int hw = std::thread::hardware_concurrency();
+  if (hw == 0) {
+    hw = 4;
+  }
+  // A small bounded pool: enough to overlap the handful of async GPU ops that
+  // are realistically in flight at once, without spawning unbounded threads.
+  return std::max<std::size_t>(2, std::min<std::size_t>(8, hw));
+}
+} // namespace
+
+GpuEventLoop::GpuEventLoop(wgpu::Instance instance)
+    : _state(std::make_shared<State>(std::move(instance))) {
+  _state->maxWorkers = computeMaxWorkers();
+  Logger::logToConsole("[%s] Created (maxWorkers=%zu)", TAG,
+                       _state->maxWorkers);
+}
+
+GpuEventLoop::~GpuEventLoop() {
+  {
+    std::lock_guard<std::mutex> lock(_state->mutex);
+    _state->running.store(false, std::memory_order_release);
+  }
+  // Wake idle workers so they can observe !running and exit. Workers that are
+  // currently blocked in WaitAny keep the shared State (and its wgpu::Instance
+  // ref) alive until their future completes, then exit; we intentionally do not
+  // join here to avoid blocking teardown on in-flight GPU work.
+  _state->cv.notify_all();
+}
+
+void GpuEventLoop::addFuture(wgpu::Future future) {
+  if (future.id == 0) {
+    // No event to wait on (deferred/immediate resolution). The callback path
+    // settles the promise without involving the event loop.
+    return;
+  }
+
+  std::lock_guard<std::mutex> lock(_state->mutex);
+  if (!_state->running.load(std::memory_order_acquire)) {
+    return;
+  }
+
+  _state->queue.push(future);
+
+  // Grow the pool if every worker is busy and we are still under the cap;
+  // otherwise wake an idle worker. A freshly spawned worker picks the job up
+  // via the queue-non-empty predicate, so it needs no separate notify.
+  if (_state->idleWorkers == 0 && _state->totalWorkers < _state->maxWorkers) {
+    _state->totalWorkers++;
+    std::thread(&GpuEventLoop::worker, _state).detach();
+    Logger::logToConsole("[%s] grew pool to %zu worker(s)", TAG,
+                         _state->totalWorkers);
+  } else {
+    _state->cv.notify_one();
+  }
+}
+
+void GpuEventLoop::worker(std::shared_ptr<State> state) {
+  for (;;) {
+    wgpu::Future future{};
+    {
+      std::unique_lock<std::mutex> lock(state->mutex);
+      state->idleWorkers++;
+      state->cv.wait(lock, [&state] {
+        return !state->running.load(std::memory_order_acquire) ||
+               !state->queue.empty();
+      });
+      state->idleWorkers--;
+
+      if (state->queue.empty()) {
+        // Only happens when shutting down.
+        state->totalWorkers--;
+        return;
+      }
+
+      future = state->queue.front();
+      state->queue.pop();
+    }
+
+    // Single-future wait: always a legal single-source WaitAny. Blocks with no
+    // CPU cost until the GPU work completes, at which point Dawn invokes the
+    // future's callback on this thread (it then marshals back to the owning
+    // runtime via its RuntimeScheduler).
+    state->instance.WaitAny(future, UINT64_MAX);
+  }
+}
+
+} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h
new file mode 100644
index 000000000..07e90cd98
--- /dev/null
+++ b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h
@@ -0,0 +1,70 @@
+#pragma once
+
+#include <atomic>
+#include <condition_variable>
+#include <cstddef>
+#include <memory>
+#include <mutex>
+#include <queue>
+#include <utility>
+
+#include "webgpu/webgpu_cpp.h"
+
+namespace rnwgpu::async {
+
+/**
+ * Background, event-driven driver for Dawn async operations. Replaces the old
+ * JS-thread ProcessEvents polling loop.
+ *
+ * Each pending wgpu::Future (registered with CallbackMode::WaitAnyOnly) is
+ * handed to addFuture() and waited on by a worker thread via
+ * `instance.WaitAny(future, UINT64_MAX)`. The wait is genuinely event-driven
+ * (zero idle CPU) and resolves the instant the GPU work completes, at which
+ * point Dawn fires the future's callback on the worker thread. That callback is
+ * responsible for marshalling back to the owning runtime's JS thread (via a
+ * RuntimeScheduler) to settle the JS Promise.
+ *
+ * Threading model (validated in Phase 0, spike 2): each WaitAny call waits on a
+ * *single* future, which is always a legal single-source wait. Multiple workers
+ * may block in WaitAny on the same instance concurrently; Dawn's EventManager
+ * is designed for this.
+ *
+ * The worker pool grows lazily up to a small cap as concurrent work demands,
+ * and threads are reused. Shared state is held behind a shared_ptr so detached
+ * workers (and the wgpu::Instance ref they need) outlive this object safely.
+ */
+class GpuEventLoop {
+public:
+  explicit GpuEventLoop(wgpu::Instance instance);
+  ~GpuEventLoop();
+
+  GpuEventLoop(const GpuEventLoop &) = delete;
+  GpuEventLoop &operator=(const GpuEventLoop &) = delete;
+
+  /**
+   * Wait for `future` to complete on a background thread. A future with id == 0
+   * (no event to wait on, e.g. a deferred/immediate resolution) is ignored.
+   * Thread-safe.
+   */
+  void addFuture(wgpu::Future future);
+
+private:
+  struct State {
+    explicit State(wgpu::Instance instance) : instance(std::move(instance)) {}
+
+    wgpu::Instance instance;
+    std::mutex mutex;
+    std::condition_variable cv;
+    std::queue<wgpu::Future> queue;
+    std::atomic_bool running{true};
+    std::size_t idleWorkers = 0;
+    std::size_t totalWorkers = 0;
+    std::size_t maxWorkers = 1;
+  };
+
+  static void worker(std::shared_ptr<State> state);
+
+  std::shared_ptr<State> _state;
+};
+
+} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.cpp b/packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.cpp
deleted file mode 100644
index 6231a833c..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.cpp
+++ /dev/null
@@ -1,23 +0,0 @@
-#include "JSIMicrotaskDispatcher.h"
-
-#include <utility>
-
-namespace rnwgpu::async {
-
-JSIMicrotaskDispatcher::JSIMicrotaskDispatcher(jsi::Runtime &runtime)
-    : _runtime(runtime) {}
-
-void JSIMicrotaskDispatcher::post(Work work) {
-  auto microtask = jsi::Function::createFromHostFunction(
-      _runtime, jsi::PropNameID::forAscii(_runtime, "AsyncMicrotask"), 0,
-      [work = std::move(work)](
-          jsi::Runtime &runtime, const jsi::Value & /*thisValue*/,
-          const jsi::Value * /*args*/, size_t /*count*/) -> jsi::Value {
-        work(runtime);
-        return jsi::Value::undefined();
-      });
-
-  _runtime.queueMicrotask(std::move(microtask));
-}
-
-} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.h b/packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.h
deleted file mode 100644
index bae208c5d..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/JSIMicrotaskDispatcher.h
+++ /dev/null
@@ -1,22 +0,0 @@
-#pragma once
-
-#include "AsyncDispatcher.h"
-
-namespace rnwgpu::async {
-
-/**
- * Dispatcher implementation backed by `jsi::Runtime::queueMicrotask`.
- */
-class JSIMicrotaskDispatcher final
-    : public AsyncDispatcher,
-      public std::enable_shared_from_this<JSIMicrotaskDispatcher> {
-public:
-  explicit JSIMicrotaskDispatcher(jsi::Runtime &runtime);
-
-  void post(Work work) override;
-
-private:
-  jsi::Runtime &_runtime;
-};
-
-} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h b/packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h
new file mode 100644
index 000000000..926b494c3
--- /dev/null
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h
@@ -0,0 +1,31 @@
+#pragma once
+
+#include <functional>
+
+#include <jsi/jsi.h>
+
+namespace rnwgpu::async {
+
+namespace jsi = facebook::jsi;
+
+/**
+ * Thread-safe "post this job onto a specific runtime's JS thread".
+ *
+ * Replaces the old AsyncDispatcher / JSIMicrotaskDispatcher, whose
+ * queueMicrotask-based dispatch was only safe to call from the runtime's own
+ * thread. A RuntimeScheduler can be called from any thread (e.g. the
+ * GpuEventLoop background threads) and guarantees the job runs on the owning
+ * runtime's JS thread.
+ */
+class RuntimeScheduler {
+public:
+  virtual ~RuntimeScheduler() = default;
+
+  /**
+   * Schedule `job` to run on this runtime's JS thread. Callable from any
+   * thread. Jobs are delivered in FIFO order relative to one another.
+   */
+  virtual void scheduleOnJS(std::function<void(jsi::Runtime &)> job) = 0;
+};
+
+} // namespace rnwgpu::async

From e2acc776990a72dbf4ebcb8e21c805f01d2795fc Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Tue, 2 Jun 2026 14:38:50 +0200
Subject: [PATCH 03/25] :wrench:

---
 docs/refactor-async-present-plan.md           | 110 +++++++++++++++++-
 packages/webgpu/android/CMakeLists.txt        |   2 +-
 packages/webgpu/cpp/rnwgpu/api/GPU.cpp        |  10 +-
 packages/webgpu/cpp/rnwgpu/api/GPU.h          |   5 +-
 packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp |   4 +-
 packages/webgpu/cpp/rnwgpu/api/GPUAdapter.h   |   6 +-
 packages/webgpu/cpp/rnwgpu/api/GPUBuffer.h    |   6 +-
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.h    |   6 +-
 packages/webgpu/cpp/rnwgpu/api/GPUQueue.h     |   6 +-
 .../webgpu/cpp/rnwgpu/api/GPUShaderModule.h   |   6 +-
 .../webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h |   2 +-
 .../{AsyncRunner.cpp => RuntimeContext.cpp}   |  37 +++---
 .../async/{AsyncRunner.h => RuntimeContext.h} |  10 +-
 13 files changed, 158 insertions(+), 52 deletions(-)
 rename packages/webgpu/cpp/rnwgpu/async/{AsyncRunner.cpp => RuntimeContext.cpp} (56%)
 rename packages/webgpu/cpp/rnwgpu/async/{AsyncRunner.h => RuntimeContext.h} (83%)

diff --git a/docs/refactor-async-present-plan.md b/docs/refactor-async-present-plan.md
index 6c22cc98d..e69706534 100644
--- a/docs/refactor-async-present-plan.md
+++ b/docs/refactor-async-present-plan.md
@@ -1,6 +1,6 @@
 # Refactor: event-driven async + auto-present
 
-Status: **planning / Phase 0 (local spikes)**
+Status: **Phase 0 complete — all spikes GREEN, ready for Phase 1**
 Branch: `claude/keen-darwin-xeywa`
 
 This document is the handoff for moving the async + present refactor forward. Phase 0
@@ -128,9 +128,77 @@ Goal: confirm NDK `AChoreographer_postFrameCallback` is usable at the project `m
 
 ---
 
+## Phase 0 — Findings (completed 2026-06-02, branch `claude/keen-darwin-xeywa`)
+
+Environment verified: `node_modules` installed, `externals/dawn` present, RN **0.81.4**,
+`react-native-worklets` **0.8.3**, Android `minSdk` **26**, NDK 26/27 available.
+
+### Spike 1 — worklet-runtime scheduler → **GREEN (symbol exists, thread-safe)**
+`worklets/WorkletRuntime/WorkletRuntime.h` exposes exactly what we need:
+- `WorkletRuntime::schedule(std::function<void(jsi::Runtime &)> job)` — posts `job` onto the
+  runtime's own `AsyncQueue` (`WorkletRuntime.cpp:211-227`). It is **callable from any thread**
+  (the underlying `AsyncQueueImpl` is a mutex+condvar queue; `AsyncQueueUI` forwards to the
+  `UIScheduler`). The job runs on the runtime's event-loop thread, under `runtimeMutex_`, and
+  uses `weak_from_this()` so it is a **safe no-op if the runtime was torn down**. This is a
+  drop-in for `RuntimeScheduler::scheduleOnJS` for worklet runtimes.
+- `WorkletRuntime::getWeakRuntimeFromJSIRuntime(jsi::Runtime &rt)` (RN ≥ 0.81, we have 0.81.4)
+  maps a bare `jsi::Runtime&` → `weak_ptr<WorkletRuntime>`, so the per-runtime
+  `RuntimeContext` can recover the scheduler from any worklet runtime (UI + dedicated
+  `createWorkletRuntime`) with no JS shim.
+
+**Caveat (build wiring, not API):** webgpu does **not** currently link worklets natively
+(no worklets entry in `packages/webgpu/*.podspec` or `android/CMakeLists.txt`; only JS-level
+serialization helpers exist). Phase 3 must add the native dependency:
+- iOS: depend on `RNWorklets` pod (it ships public headers under `worklets/`,
+  `header_dir = "worklets"`).
+- Android: import the worklets **prefab** module `worklets` (`prefabPublishing` is on in
+  `react-native-worklets/android/build.gradle`).
+Worklets is already a `peerDependency`, so this adds no new install. Phase 3 stays cheap; no
+worklets PR or JS shim needed.
+
+### Spike 2 — concurrent `WaitAny` on one instance → **GREEN (designed for it)**
+Dawn's native `EventManager` (`externals/dawn/src/dawn/native/EventManager.{h,cpp}`) is built
+for multi-threaded waits:
+- State is `MutexProtected<EventState>`; `mNextFutureID` is atomic; a code comment
+  (`EventManager.h:78-82`) explicitly notes "another thread can race to complete the event …
+  via a WaitAny call".
+- Each `WaitAny` call with a non-zero timeout creates a **stack-local `Waiter`** with its **own**
+  `MutexCondVarProtected<bool>` (`EventManager.cpp:338`, `:106`), registers it per-FutureID in
+  the shared map, then blocks on its own condvar. `SetFutureReady` signals the registered
+  waiters. → **N threads can each block in `WaitAny` on the same instance concurrently, each
+  owning its own future.** This is exactly the plan's primary "one future per pool thread" model.
+
+**Hard constraint discovered (`EventManager.cpp:341-354`):** within a *single* `WaitAny` call
+with a non-zero timeout, you may **not** mix events from multiple queues, nor a queue event
+together with a non-queue event — it returns `WaitStatus::Error` ("Mixed source waits with
+timeouts are not currently supported"). Note `mapAsync`/`onSubmittedWorkDone` are *queue*
+events while `requestAdapter`/`requestDevice`/`createPipelineAsync`/`popErrorScope` are
+*non-queue* events.
+→ **Implication:** adopt the **per-future-per-thread** design (each pool thread waits on exactly
+one future) — it is single-source and always legal. The plan's stated fallback ("single worker
+waiting on the batched future set") is **not viable** as written, because batching mixed sources
+hits this restriction. If a bounded pool is undesirable, the correct fallback is one
+worker-thread *per future* (still single-source), not one worker for a batched set.
+
+### Spike 3 — Android frame callback → **GREEN (no JNI bridge needed)**
+In `android/choreographer.h`, `AChoreographer_getInstance()` and
+`AChoreographer_postFrameCallback()` are both `__INTRODUCED_IN(24)`; `minSdk` is **26**, so the
+pure-NDK path works with no Java `Choreographer`/JNI bridge.
+- `postFrameCallback` is `__DEPRECATED_IN(29)` in favor of `postFrameCallback64` (API 29) /
+  `postVsyncCallback` (API 33). Recommendation: call `postFrameCallback64` when
+  `android_get_device_api_level() >= 29`, else `postFrameCallback` (works on 26-28). Both are
+  acceptable; the 64-bit variant just avoids the deprecation warning and 32-bit time wrap.
+- `AChoreographer_getInstance()` must be called on a thread with a `Looper` (the main/UI
+  thread) — `FrameDriver` already lives on the UI thread, so this is satisfied.
+
+### Net go/no-go
+All three risks clear. Proceed to Phase 1. Two plan amendments: (1) Phase 3 must add the
+worklets native build dependency (podspec + prefab); (2) `GpuEventLoop` must use
+per-future-per-thread waits (drop the batched-future fallback).
+
 ## Implementation phases (after Phase 0)
 
-**Phase 1 — Event-driven async** (no public API change; `present()` untouched)
+**Phase 1 — Event-driven async** (no public API change; `present()` untouched) — **DONE**
 - Add `RuntimeScheduler` (+ main-runtime CallInvoker impl) and `GpuEventLoop`.
 - Switch all 7 async sites to `WaitAnyOnly` + `GpuEventLoop.addFuture(...)`:
   `api/GPU.cpp`, `api/GPUAdapter.cpp`, `api/GPUDevice.cpp` (×3), `api/GPUBuffer.cpp`,
@@ -138,6 +206,44 @@ Goal: confirm NDK `AChoreographer_postFrameCallback` is usable at the project `m
 - Delete `async/AsyncRunner.*` polling + `async/JSIMicrotaskDispatcher.*`; keep
   `AsyncTaskHandle` / `Promise` settle path on the new scheduler.
 
+### Phase 1 — what shipped (branch `claude/keen-darwin-xeywa`)
+New files (`cpp/rnwgpu/async/`):
+- `RuntimeScheduler.h` — interface `scheduleOnJS(std::function<void(jsi::Runtime&)>)`,
+  callable from any thread.
+- `CallInvokerScheduler.{h,cpp}` — main-runtime impl wrapping
+  `react::CallInvoker::invokeAsync(CallFunc&&)` (RN 0.81 delivers the job on the JS thread
+  with the runtime).
+- `GpuEventLoop.{h,cpp}` — background `WaitAny` driver. Lazily-grown bounded worker pool
+  (cap = `clamp(hardware_concurrency, 2, 8)`); each worker does a single-future
+  `instance.WaitAny(future, UINT64_MAX)` (always a legal single-source wait, per Phase 0
+  spike 2). Shared state held behind a `shared_ptr` so detached workers (and the
+  `wgpu::Instance` ref they need) outlive the object safely; teardown sets `running=false`
+  and notifies idle workers without joining in-flight GPU waits.
+
+Deviations from the original plan (intentional):
+1. **`AsyncRunner` was replaced by `RuntimeContext`** (`async/RuntimeContext.{h,cpp}`), the
+   per-runtime coordinator the plan's Target-architecture §A already named. It bundles
+   `{RuntimeScheduler, GpuEventLoop}` and exposes `postTask`; all polling internals
+   (`tick`/`requestTick`/`ProcessEvents`/pump counters) are gone. `AsyncTaskHandle` depends
+   only on `RuntimeScheduler`. The old `AsyncRunner` name/files no longer exist anywhere
+   (the 6 `api/*` classes now hold `std::shared_ptr<async::RuntimeContext> _async`); the dead
+   `GPU::getAsyncRunner()` accessor was deleted.
+2. **`postTask`'s callback now returns a `wgpu::Future`** (the value returned by the Dawn
+   `WaitAnyOnly` call), which `RuntimeContext` hands to `GpuEventLoop.addFuture`. A returned
+   future with `id == 0` means "no event to wait on" and is ignored — used by
+   `GPUDevice::getLost` (resolved synchronously or later via `notifyDeviceLost`). This
+   replaced the old `keepPumping` bool argument, which is gone.
+
+`GPU`'s constructor now takes the `CallInvoker` (threaded through from `RNWebGPUManager`,
+which already held it) to build the `CallInvokerScheduler`. `AsyncDispatcher.h` and
+`JSIMicrotaskDispatcher.{h,cpp}` deleted; `android/CMakeLists.txt` updated (iOS podspec
+globs `cpp/**` so it needs no change).
+
+Validation run locally: all changed + new TUs syntax-check under the Android NDK toolchain;
+the full `react-native-wgpu` native lib **compiles and links** for `arm64-v8a` (ninja);
+`cpplint` clean (project filters); `clang-format` (pinned 15.0.0) applied; `yarn tsc` passes
+(no TS changed). On-device runtime behaviour (frame pacing, zero idle CPU) is Phase 4.
+
 **Phase 2 — Auto-present + remove `present()`**
 - Add `FrameDriver` (iOS `CADisplayLink`, Android `AChoreographer`); wire
   `getCurrentTexture` → register; vsync → dispatch present to owning runtime.
diff --git a/packages/webgpu/android/CMakeLists.txt b/packages/webgpu/android/CMakeLists.txt
index 33704d56c..bb7c2bacb 100644
--- a/packages/webgpu/android/CMakeLists.txt
+++ b/packages/webgpu/android/CMakeLists.txt
@@ -49,7 +49,7 @@ add_library(${PACKAGE_NAME} SHARED
     ../cpp/jsi/Promise.cpp
     ../cpp/jsi/RuntimeLifecycleMonitor.cpp
     ../cpp/jsi/RuntimeAwareCache.cpp
-    ../cpp/rnwgpu/async/AsyncRunner.cpp
+    ../cpp/rnwgpu/async/RuntimeContext.cpp
     ../cpp/rnwgpu/async/AsyncTaskHandle.cpp
     ../cpp/rnwgpu/async/CallInvokerScheduler.cpp
     ../cpp/rnwgpu/async/GpuEventLoop.cpp
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
index 7332ac394..fcffd8a68 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
@@ -28,8 +28,8 @@ GPU::GPU(jsi::Runtime &runtime,
   auto scheduler =
       std::make_shared<async::CallInvokerScheduler>(std::move(callInvoker));
   auto eventLoop = std::make_shared<async::GpuEventLoop>(_instance);
-  _async = async::AsyncRunner::getOrCreate(runtime, std::move(scheduler),
-                                           std::move(eventLoop));
+  _async = async::RuntimeContext::getOrCreate(runtime, std::move(scheduler),
+                                              std::move(eventLoop));
 }
 
 async::AsyncTaskHandle GPU::requestAdapter(
@@ -51,7 +51,7 @@ async::AsyncTaskHandle GPU::requestAdapter(
           -> wgpu::Future {
         return _instance.RequestAdapter(
             &aOptions, wgpu::CallbackMode::WaitAnyOnly,
-            [asyncRunner = _async, resolve,
+            [context = _async, resolve,
              reject](wgpu::RequestAdapterStatus status, wgpu::Adapter adapter,
                      wgpu::StringView message) {
               if (message.length) {
@@ -59,8 +59,8 @@ async::AsyncTaskHandle GPU::requestAdapter(
               }
 
               if (status == wgpu::RequestAdapterStatus::Success && adapter) {
-                auto adapterHost = std::make_shared<GPUAdapter>(
-                    std::move(adapter), asyncRunner);
+                auto adapterHost =
+                    std::make_shared<GPUAdapter>(std::move(adapter), context);
                 auto result =
                     std::variant<std::nullptr_t, std::shared_ptr<GPUAdapter>>(
                         adapterHost);
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.h b/packages/webgpu/cpp/rnwgpu/api/GPU.h
index 89f46526b..e7dc15caf 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.h
@@ -9,8 +9,8 @@
 
 #include "NativeObject.h"
 
-#include "rnwgpu/async/AsyncRunner.h"
 #include "rnwgpu/async/AsyncTaskHandle.h"
+#include "rnwgpu/async/RuntimeContext.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -53,11 +53,10 @@ class GPU : public NativeObject<GPU> {
   }
 
   inline const wgpu::Instance get() { return _instance; }
-  inline std::shared_ptr<async::AsyncRunner> getAsyncRunner() { return _async; }
 
 private:
   wgpu::Instance _instance;
-  std::shared_ptr<async::AsyncRunner> _async;
+  std::shared_ptr<async::RuntimeContext> _async;
 };
 
 } // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
index 0a35a39e8..b34b12d96 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
@@ -97,7 +97,7 @@ async::AsyncTaskHandle GPUAdapter::requestDevice(
         (void)descriptor;
         return _instance.RequestDevice(
             &aDescriptor, wgpu::CallbackMode::WaitAnyOnly,
-            [asyncRunner = _async, resolve, reject, label, creationRuntime,
+            [context = _async, resolve, reject, label, creationRuntime,
              deviceLostBinding](wgpu::RequestDeviceStatus status,
                                 wgpu::Device device, wgpu::StringView message) {
               if (message.length) {
@@ -146,7 +146,7 @@ async::AsyncTaskHandle GPUAdapter::requestDevice(
                   creationRuntime);
 
               auto deviceHost = std::make_shared<GPUDevice>(std::move(device),
-                                                            asyncRunner, label);
+                                                            context, label);
               *deviceLostBinding = deviceHost;
 
               // Register the device in the static registry so the uncaptured
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.h b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.h
index 66acdc2f7..7f399f0a7 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.h
@@ -8,8 +8,8 @@
 
 #include "NativeObject.h"
 
-#include "rnwgpu/async/AsyncRunner.h"
 #include "rnwgpu/async/AsyncTaskHandle.h"
+#include "rnwgpu/async/RuntimeContext.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -27,7 +27,7 @@ class GPUAdapter : public NativeObject<GPUAdapter> {
   static constexpr const char *CLASS_NAME = "GPUAdapter";
 
   explicit GPUAdapter(wgpu::Adapter instance,
-                      std::shared_ptr<async::AsyncRunner> async)
+                      std::shared_ptr<async::RuntimeContext> async)
       : NativeObject(CLASS_NAME), _instance(instance), _async(async) {}
 
 public:
@@ -53,7 +53,7 @@ class GPUAdapter : public NativeObject<GPUAdapter> {
 
 private:
   wgpu::Adapter _instance;
-  std::shared_ptr<async::AsyncRunner> _async;
+  std::shared_ptr<async::RuntimeContext> _async;
 };
 
 } // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.h b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.h
index edfc8e41b..036b5af4b 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.h
@@ -9,8 +9,8 @@
 
 #include "NativeObject.h"
 
-#include "rnwgpu/async/AsyncRunner.h"
 #include "rnwgpu/async/AsyncTaskHandle.h"
+#include "rnwgpu/async/RuntimeContext.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -25,7 +25,7 @@ class GPUBuffer : public NativeObject<GPUBuffer> {
   static constexpr const char *CLASS_NAME = "GPUBuffer";
 
   explicit GPUBuffer(wgpu::Buffer instance,
-                     std::shared_ptr<async::AsyncRunner> async,
+                     std::shared_ptr<async::RuntimeContext> async,
                      std::string label)
       : NativeObject(CLASS_NAME), _instance(instance), _async(async),
         _label(label) {}
@@ -71,7 +71,7 @@ class GPUBuffer : public NativeObject<GPUBuffer> {
 
 private:
   wgpu::Buffer _instance;
-  std::shared_ptr<async::AsyncRunner> _async;
+  std::shared_ptr<async::RuntimeContext> _async;
   std::string _label;
   struct Mapping {
     uint64_t start;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
index 765a8d794..facbed161 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
@@ -15,8 +15,8 @@
 
 #include "NativeObject.h"
 
-#include "rnwgpu/async/AsyncRunner.h"
 #include "rnwgpu/async/AsyncTaskHandle.h"
+#include "rnwgpu/async/RuntimeContext.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -63,7 +63,7 @@ class GPUDevice : public NativeObject<GPUDevice> {
   static constexpr const char *CLASS_NAME = "GPUDevice";
 
   explicit GPUDevice(wgpu::Device instance,
-                     std::shared_ptr<async::AsyncRunner> async,
+                     std::shared_ptr<async::RuntimeContext> async,
                      std::string label)
       : NativeObject(CLASS_NAME), _instance(instance), _async(async),
         _label(label) {}
@@ -249,7 +249,7 @@ class GPUDevice : public NativeObject<GPUDevice> {
   friend class GPUAdapter;
 
   wgpu::Device _instance;
-  std::shared_ptr<async::AsyncRunner> _async;
+  std::shared_ptr<async::RuntimeContext> _async;
   std::string _label;
   // Guards the device-lost state below. notifyDeviceLost() may run on a
   // GpuEventLoop worker thread (the device-lost callback is Spontaneous), while
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.h b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.h
index be824e781..f322392b7 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.h
@@ -8,8 +8,8 @@
 
 #include "NativeObject.h"
 
-#include "rnwgpu/async/AsyncRunner.h"
 #include "rnwgpu/async/AsyncTaskHandle.h"
+#include "rnwgpu/async/RuntimeContext.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -28,7 +28,7 @@ class GPUQueue : public NativeObject<GPUQueue> {
   static constexpr const char *CLASS_NAME = "GPUQueue";
 
   explicit GPUQueue(wgpu::Queue instance,
-                    std::shared_ptr<async::AsyncRunner> async,
+                    std::shared_ptr<async::RuntimeContext> async,
                     std::string label)
       : NativeObject(CLASS_NAME), _instance(instance), _async(async),
         _label(label) {}
@@ -74,7 +74,7 @@ class GPUQueue : public NativeObject<GPUQueue> {
 
 private:
   wgpu::Queue _instance;
-  std::shared_ptr<async::AsyncRunner> _async;
+  std::shared_ptr<async::RuntimeContext> _async;
   std::string _label;
 };
 
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.h b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.h
index ab8561090..0e59edf01 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.h
@@ -7,8 +7,8 @@
 
 #include "NativeObject.h"
 
-#include "rnwgpu/async/AsyncRunner.h"
 #include "rnwgpu/async/AsyncTaskHandle.h"
+#include "rnwgpu/async/RuntimeContext.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -23,7 +23,7 @@ class GPUShaderModule : public NativeObject<GPUShaderModule> {
   static constexpr const char *CLASS_NAME = "GPUShaderModule";
 
   explicit GPUShaderModule(wgpu::ShaderModule instance,
-                           std::shared_ptr<async::AsyncRunner> async,
+                           std::shared_ptr<async::RuntimeContext> async,
                            std::string label)
       : NativeObject(CLASS_NAME), _instance(instance), _async(async),
         _label(label) {}
@@ -59,7 +59,7 @@ class GPUShaderModule : public NativeObject<GPUShaderModule> {
 
 private:
   wgpu::ShaderModule _instance;
-  std::shared_ptr<async::AsyncRunner> _async;
+  std::shared_ptr<async::RuntimeContext> _async;
   std::string _label;
 };
 
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
index 2f910fd3f..e3a224563 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
@@ -36,7 +36,7 @@ class AsyncTaskHandle {
   AsyncTaskHandle();
 
   /**
-   * Internal constructor used by AsyncRunner.
+   * Internal constructor used by RuntimeContext.
    */
   explicit AsyncTaskHandle(std::shared_ptr<State> state);
 
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.cpp b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
similarity index 56%
rename from packages/webgpu/cpp/rnwgpu/async/AsyncRunner.cpp
rename to packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
index 850e57e8a..f297ae6b0 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
@@ -1,4 +1,4 @@
-#include "AsyncRunner.h"
+#include "RuntimeContext.h"
 
 #include <memory>
 #include <stdexcept>
@@ -11,25 +11,26 @@ namespace rnwgpu::async {
 
 namespace {
 struct RuntimeData {
-  std::shared_ptr<AsyncRunner> runner;
+  std::shared_ptr<RuntimeContext> runner;
 };
-constexpr const char *TAG = "AsyncRunner";
+constexpr const char *TAG = "RuntimeContext";
 } // namespace
 
-AsyncRunner::AsyncRunner(std::shared_ptr<RuntimeScheduler> scheduler,
-                         std::shared_ptr<GpuEventLoop> eventLoop)
+RuntimeContext::RuntimeContext(std::shared_ptr<RuntimeScheduler> scheduler,
+                               std::shared_ptr<GpuEventLoop> eventLoop)
     : _scheduler(std::move(scheduler)), _eventLoop(std::move(eventLoop)) {
   if (!_scheduler) {
-    throw std::runtime_error("AsyncRunner requires a valid RuntimeScheduler.");
+    throw std::runtime_error(
+        "RuntimeContext requires a valid RuntimeScheduler.");
   }
   if (!_eventLoop) {
-    throw std::runtime_error("AsyncRunner requires a valid GpuEventLoop.");
+    throw std::runtime_error("RuntimeContext requires a valid GpuEventLoop.");
   }
   Logger::logToConsole("[%s] Created runner (scheduler=%p, eventLoop=%p)", TAG,
                        _scheduler.get(), _eventLoop.get());
 }
 
-std::shared_ptr<AsyncRunner> AsyncRunner::get(jsi::Runtime &runtime) {
+std::shared_ptr<RuntimeContext> RuntimeContext::get(jsi::Runtime &runtime) {
   auto data = runtime.getRuntimeData(runtimeDataUUID());
   if (!data) {
     return nullptr;
@@ -38,24 +39,24 @@ std::shared_ptr<AsyncRunner> AsyncRunner::get(jsi::Runtime &runtime) {
   return stored->runner;
 }
 
-std::shared_ptr<AsyncRunner>
-AsyncRunner::getOrCreate(jsi::Runtime &runtime,
-                         std::shared_ptr<RuntimeScheduler> scheduler,
-                         std::shared_ptr<GpuEventLoop> eventLoop) {
+std::shared_ptr<RuntimeContext>
+RuntimeContext::getOrCreate(jsi::Runtime &runtime,
+                            std::shared_ptr<RuntimeScheduler> scheduler,
+                            std::shared_ptr<GpuEventLoop> eventLoop) {
   auto existing = get(runtime);
   if (existing) {
     return existing;
   }
 
-  auto runner =
-      std::make_shared<AsyncRunner>(std::move(scheduler), std::move(eventLoop));
+  auto runner = std::make_shared<RuntimeContext>(std::move(scheduler),
+                                                 std::move(eventLoop));
   auto data = std::make_shared<RuntimeData>();
   data->runner = runner;
   runtime.setRuntimeData(runtimeDataUUID(), data);
   return runner;
 }
 
-AsyncTaskHandle AsyncRunner::postTask(const TaskCallback &callback) {
+AsyncTaskHandle RuntimeContext::postTask(const TaskCallback &callback) {
   auto handle = AsyncTaskHandle::create(_scheduler);
   if (!handle.valid()) {
     throw std::runtime_error("Failed to create AsyncTaskHandle.");
@@ -71,7 +72,7 @@ AsyncTaskHandle AsyncRunner::postTask(const TaskCallback &callback) {
     reject(exception.what());
     return handle;
   } catch (...) {
-    reject("Unknown native error in AsyncRunner::postTask.");
+    reject("Unknown native error in RuntimeContext::postTask.");
     return handle;
   }
 
@@ -79,11 +80,11 @@ AsyncTaskHandle AsyncRunner::postTask(const TaskCallback &callback) {
   return handle;
 }
 
-std::shared_ptr<RuntimeScheduler> AsyncRunner::scheduler() const {
+std::shared_ptr<RuntimeScheduler> RuntimeContext::scheduler() const {
   return _scheduler;
 }
 
-jsi::UUID AsyncRunner::runtimeDataUUID() {
+jsi::UUID RuntimeContext::runtimeDataUUID() {
   static const auto uuid = jsi::UUID();
   return uuid;
 }
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.h b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
similarity index 83%
rename from packages/webgpu/cpp/rnwgpu/async/AsyncRunner.h
rename to packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
index 7c01d0f69..a7a5d46f4 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncRunner.h
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
@@ -28,17 +28,17 @@ namespace rnwgpu::async {
  * returned future with id == 0 means "no event to wait on" (deferred/immediate
  * resolution, e.g. GPUDevice::getLost).
  */
-class AsyncRunner : public std::enable_shared_from_this<AsyncRunner> {
+class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
 public:
   using TaskCallback =
       std::function<wgpu::Future(const AsyncTaskHandle::ResolveFunction &,
                                  const AsyncTaskHandle::RejectFunction &)>;
 
-  AsyncRunner(std::shared_ptr<RuntimeScheduler> scheduler,
-              std::shared_ptr<GpuEventLoop> eventLoop);
+  RuntimeContext(std::shared_ptr<RuntimeScheduler> scheduler,
+                 std::shared_ptr<GpuEventLoop> eventLoop);
 
-  static std::shared_ptr<AsyncRunner> get(jsi::Runtime &runtime);
-  static std::shared_ptr<AsyncRunner>
+  static std::shared_ptr<RuntimeContext> get(jsi::Runtime &runtime);
+  static std::shared_ptr<RuntimeContext>
   getOrCreate(jsi::Runtime &runtime,
               std::shared_ptr<RuntimeScheduler> scheduler,
               std::shared_ptr<GpuEventLoop> eventLoop);

From e71fcffda9ec2b95cad99d8f866615cf33c4a386 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Tue, 2 Jun 2026 15:41:00 +0200
Subject: [PATCH 04/25] :wrench:

---
 packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp | 13 ++++++++++++-
 packages/webgpu/package.json                      |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)
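
Note on the status check added to `GpuEventLoop::worker`: a non-success result from the blocking wait means the future's callback was not invoked, so the only useful action is to make the hang visible in the log. The sketch below isolates that formatting step so it can be read on its own. `WaitStatus` is a hypothetical stand-in for `wgpu::WaitStatus` (its numeric values here are illustrative), and `describeWaitFailure` is an assumed helper name that does not exist in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for wgpu::WaitStatus; numeric values are illustrative.
enum class WaitStatus : uint32_t { Success = 1, TimedOut = 2, Error = 3 };

// Builds the log line for a non-success wait. Returns an empty string when the
// wait succeeded, because nothing needs to be logged in that case.
std::string describeWaitFailure(WaitStatus status, uint64_t futureId) {
  if (status == WaitStatus::Success) {
    return std::string();
  }
  char buffer[160];
  std::snprintf(buffer, sizeof(buffer),
                "WaitAny returned non-success status %u for future %llu; its "
                "Promise will not settle.",
                static_cast<unsigned int>(status),
                static_cast<unsigned long long>(futureId));
  return std::string(buffer);
}
```

The casts mirror the ones in the patch: they keep the `%u` and `%llu` format specifiers correct regardless of the platform's underlying integer widths.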

diff --git a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
index 91119dd96..2bd643b39 100644
--- a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
@@ -95,7 +95,18 @@ void GpuEventLoop::worker(std::shared_ptr<State> state) {
     // CPU cost until the GPU work completes, at which point Dawn invokes the
     // future's callback on this thread (it then marshals back to the owning
     // runtime via its RuntimeScheduler).
-    state->instance.WaitAny(future, UINT64_MAX);
+    auto status = state->instance.WaitAny(future, UINT64_MAX);
+    if (status != wgpu::WaitStatus::Success) {
+      // With an infinite timeout on a single future this is not expected. If it
+      // happens, Dawn did not invoke the future's callback, so the associated
+      // JS Promise will never settle. Log it so the otherwise-silent hang is at
+      // least observable.
+      Logger::logToConsole(
+          "[%s] WaitAny returned non-success status %u for future %llu; its "
+          "Promise will not settle.",
+          TAG, static_cast<unsigned int>(status),
+          static_cast<unsigned long long>(future.id));
+    }
   }
 }
 
diff --git a/packages/webgpu/package.json b/packages/webgpu/package.json
index 961528fb3..07a48b53e 100644
--- a/packages/webgpu/package.json
+++ b/packages/webgpu/package.json
@@ -1,6 +1,6 @@
 {
   "name": "react-native-wgpu",
-  "version": "0.5.12",
+  "version": "0.5.13",
   "description": "React Native WebGPU",
   "main": "lib/commonjs/index",
   "module": "lib/module/index",

From c32687360e26eec21f1f5ec0d4fc6f454109b9fd Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Tue, 2 Jun 2026 16:11:57 +0200
Subject: [PATCH 05/25] :wrench:

---
 README.md                                     | 10 +---
 apps/example/ios/Podfile.lock                 |  4 +-
 apps/example/src/CanvasAPI/CanvasAPI.tsx      |  2 -
 apps/example/src/ComputeToys/engine/index.ts  |  1 -
 .../ImportExternalTexture.tsx                 |  1 -
 apps/example/src/Reanimated/Reanimated.tsx    |  1 -
 .../SharedTextureMemory.tsx                   |  1 -
 .../StorageBufferVertices.tsx                 |  2 -
 apps/example/src/ThreeJS/Backdrop.tsx         |  1 -
 apps/example/src/ThreeJS/Cube.tsx             |  1 -
 apps/example/src/ThreeJS/Helmet.tsx           |  1 -
 apps/example/src/ThreeJS/InstancedMesh.tsx    |  1 -
 apps/example/src/ThreeJS/PostProcessing.tsx   |  1 -
 apps/example/src/ThreeJS/Retargeting.tsx      |  1 -
 .../src/ThreeJS/components/FiberCanvas.tsx    |  1 -
 apps/example/src/Triangle/HelloTriangle.tsx   |  2 -
 .../src/Triangle/HelloTriangleMSAA.tsx        |  1 -
 .../example/src/VisionCamera/VisionCamera.tsx |  1 -
 apps/example/src/components/Texture.tsx       |  1 -
 apps/example/src/components/useWebGPU.ts      |  1 -
 docs/refactor-async-present-plan.md           | 46 +++++++++++++++-
 packages/webgpu/README.md                     | 10 +---
 packages/webgpu/android/CMakeLists.txt        |  1 +
 packages/webgpu/android/cpp/cpp-adapter.cpp   | 54 +++++++++++++++++++
 packages/webgpu/apple/MetalView.mm            |  4 ++
 packages/webgpu/apple/WebGPUModule.mm         |  7 +++
 packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h  | 29 +++++++++-
 packages/webgpu/cpp/rnwgpu/api/GPU.h          |  1 +
 .../cpp/rnwgpu/api/GPUCanvasContext.cpp       | 35 ++++++------
 .../webgpu/cpp/rnwgpu/api/GPUCanvasContext.h  |  5 +-
 packages/webgpu/src/Canvas.tsx                |  6 +--
 packages/webgpu/src/Offscreen.ts              |  4 --
 packages/webgpu/src/WebPolyfillGPUModule.ts   |  5 +-
 packages/webgpu/src/types.ts                  |  6 +--
 34 files changed, 172 insertions(+), 76 deletions(-)

diff --git a/README.md b/README.md
index 8eeb1cba1..d7415053b 100644
--- a/README.md
+++ b/README.md
@@ -128,8 +128,6 @@ export function HelloTriangle() {
       passEncoder.end();
 
       device.queue.submit([commandEncoder.finish()]);
-
-      context.present();
     };
     helloTriangle();
   }, [ref]);
@@ -174,15 +172,13 @@ ctx.canvas.height = ctx.canvas.clientHeight * PixelRatio.get();
 
 ### Frame Scheduling
 
-In React Native, we want to keep frame presentation as a manual operation as we plan to provide more advanced rendering options that are React Native specific.  
-This means that when you are ready to present a frame, you need to call `present` on the context.
+Frame presentation is automatic. Once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). There is no `present()` call.
 
 ```tsx
 // draw
 // submit to the queue
 device.queue.submit([commandEncoder.finish()]);
-// This method is React Native only
-context.present();
+// The frame is presented automatically on the next vsync.
 ```
 
 ### Canvas Transparency
@@ -296,7 +292,6 @@ const render = () => {
 
   // Release the surface's access window right after the submit that sampled it.
   externalTexture.destroy();
-  context.present();
 };
 ```
 
@@ -328,7 +323,6 @@ const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   const commandEncoder = device.createCommandEncoder();
   // ... render ...
   device.queue.submit([commandEncoder.finish()]);
-  context.present();
 };
 
 // Initialize WebGPU on main thread, then run on UI thread
diff --git a/apps/example/ios/Podfile.lock b/apps/example/ios/Podfile.lock
index fd5ba968c..b4c5f158a 100644
--- a/apps/example/ios/Podfile.lock
+++ b/apps/example/ios/Podfile.lock
@@ -1924,7 +1924,7 @@ PODS:
     - ReactCommon/turbomodule/core
     - SocketRocket
     - Yoga
-  - react-native-wgpu (0.5.12):
+  - react-native-wgpu (0.5.13):
     - boost
     - DoubleConversion
     - fast_float
@@ -3074,7 +3074,7 @@ SPEC CHECKSUMS:
   React-microtasksnativemodule: 75b6604b667d297292345302cc5bfb6b6aeccc1b
   react-native-safe-area-context: c00143b4823773bba23f2f19f85663ae89ceb460
   react-native-skia: fc73e9bdc46ebb420a98c9c2be29fee80f565e79
-  react-native-wgpu: 274ffec11ee3a082260d9f3d1fb54030a5ca0873
+  react-native-wgpu: 0496e9efeb4c3939ab56371005ede4e1468591d1
   React-NativeModulesApple: 879fbdc5dcff7136abceb7880fe8a2022a1bd7c3
   React-oscompat: 93b5535ea7f7dff46aaee4f78309a70979bdde9d
   React-perflogger: 5536d2df3d18fe0920263466f7b46a56351c0510
diff --git a/apps/example/src/CanvasAPI/CanvasAPI.tsx b/apps/example/src/CanvasAPI/CanvasAPI.tsx
index a9f5c4928..a403c8388 100644
--- a/apps/example/src/CanvasAPI/CanvasAPI.tsx
+++ b/apps/example/src/CanvasAPI/CanvasAPI.tsx
@@ -89,8 +89,6 @@ export const CanvasAPI = () => {
             passEncoder.end();
 
             device.queue.submit([commandEncoder.finish()]);
-
-            context.present();
           })()
         }
         title="check surface"
diff --git a/apps/example/src/ComputeToys/engine/index.ts b/apps/example/src/ComputeToys/engine/index.ts
index f0fa08f07..8db2562ad 100644
--- a/apps/example/src/ComputeToys/engine/index.ts
+++ b/apps/example/src/ComputeToys/engine/index.ts
@@ -398,7 +398,6 @@ fn passSampleLevelBilinearRepeat(pass_index: int, uv: float2, lod: float) -> flo
 
       // Submit command buffer
       this.device.queue.submit([encoder.finish()]);
-      this.surface!.present();
 
       // Update frame counter
       this.bindings!.time.host.frame += 1;
diff --git a/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx b/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx
index f8399ee8a..7c973e03f 100644
--- a/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx
+++ b/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx
@@ -247,7 +247,6 @@ export const ImportExternalTexture = () => {
       // Now that the work sampling it has been submitted, end the external
       // texture's access window so the frame's surface is released promptly.
       externalTex?.destroy();
-      context.present();
       rafRef.current = requestAnimationFrame(render);
     };
     rafRef.current = requestAnimationFrame(render);
diff --git a/apps/example/src/Reanimated/Reanimated.tsx b/apps/example/src/Reanimated/Reanimated.tsx
index 505296565..2f8b5e5cb 100644
--- a/apps/example/src/Reanimated/Reanimated.tsx
+++ b/apps/example/src/Reanimated/Reanimated.tsx
@@ -79,7 +79,6 @@ export const webGPUDemo = (
 
     device.queue.submit([commandEncoder.finish()]);
 
-    context.present();
     if (runAnimation.value) {
       requestAnimationFrame(frame);
     }
diff --git a/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx b/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
index b5627cc43..197657460 100644
--- a/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
+++ b/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
@@ -268,7 +268,6 @@ export const SharedTextureMemory = () => {
       }
       pass.end();
       device.queue.submit([encoder.finish()]);
-      context.present();
       rafRef.current = requestAnimationFrame(render);
     };
     rafRef.current = requestAnimationFrame(render);
diff --git a/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx b/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
index 907264638..b1906cf74 100644
--- a/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
+++ b/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
@@ -185,8 +185,6 @@ export function StorageBufferVertices() {
 
     const commandBuffer = encoder.finish();
     device.queue.submit([commandBuffer]);
-    // eslint-disable-next-line @typescript-eslint/no-explicit-any
-    (context as any).present();
   });
 
   return (
diff --git a/apps/example/src/ThreeJS/Backdrop.tsx b/apps/example/src/ThreeJS/Backdrop.tsx
index 8ed2a8c91..113325b9d 100644
--- a/apps/example/src/ThreeJS/Backdrop.tsx
+++ b/apps/example/src/ThreeJS/Backdrop.tsx
@@ -150,7 +150,6 @@ export const Backdrop = () => {
       }
 
       renderer.render(scene, camera);
-      context!.present();
     }
     return () => {
       renderer.setAnimationLoop(null);
diff --git a/apps/example/src/ThreeJS/Cube.tsx b/apps/example/src/ThreeJS/Cube.tsx
index d3e9707b5..ea3fe0f23 100644
--- a/apps/example/src/ThreeJS/Cube.tsx
+++ b/apps/example/src/ThreeJS/Cube.tsx
@@ -31,7 +31,6 @@ export const Cube = () => {
       mesh.rotation.y = time / 1000;
 
       renderer.render(scene, camera);
-      context.present();
     }
     renderer.setAnimationLoop(animate);
     return () => {
diff --git a/apps/example/src/ThreeJS/Helmet.tsx b/apps/example/src/ThreeJS/Helmet.tsx
index be7cb626f..70720d360 100644
--- a/apps/example/src/ThreeJS/Helmet.tsx
+++ b/apps/example/src/ThreeJS/Helmet.tsx
@@ -49,7 +49,6 @@ export const Helmet = () => {
     function animate() {
       animateCamera();
       renderer.render(scene, camera);
-      context!.present();
     }
 
     return () => {
diff --git a/apps/example/src/ThreeJS/InstancedMesh.tsx b/apps/example/src/ThreeJS/InstancedMesh.tsx
index 3f60631de..5b7c7ca4d 100644
--- a/apps/example/src/ThreeJS/InstancedMesh.tsx
+++ b/apps/example/src/ThreeJS/InstancedMesh.tsx
@@ -59,7 +59,6 @@ export const InstancedMesh = () => {
 
     function animate() {
       render();
-      context!.present();
     }
 
     function render() {
diff --git a/apps/example/src/ThreeJS/PostProcessing.tsx b/apps/example/src/ThreeJS/PostProcessing.tsx
index d94ef1728..0c2980501 100644
--- a/apps/example/src/ThreeJS/PostProcessing.tsx
+++ b/apps/example/src/ThreeJS/PostProcessing.tsx
@@ -72,7 +72,6 @@ export const PostProcessing = () => {
         mixer.update(delta);
       }
       postProcessing.render();
-      context!.present();
     }
     return () => {
       renderer.setAnimationLoop(null);
diff --git a/apps/example/src/ThreeJS/Retargeting.tsx b/apps/example/src/ThreeJS/Retargeting.tsx
index c25601885..8b8dd9a29 100644
--- a/apps/example/src/ThreeJS/Retargeting.tsx
+++ b/apps/example/src/ThreeJS/Retargeting.tsx
@@ -302,7 +302,6 @@ export const Retargeting = () => {
       source.mixer.update(delta);
       mixer.update(delta);
       renderer.render(scene, camera);
-      context.present();
     });
 
     return () => {
diff --git a/apps/example/src/ThreeJS/components/FiberCanvas.tsx b/apps/example/src/ThreeJS/components/FiberCanvas.tsx
index 91b699553..92b928987 100644
--- a/apps/example/src/ThreeJS/components/FiberCanvas.tsx
+++ b/apps/example/src/ThreeJS/components/FiberCanvas.tsx
@@ -66,7 +66,6 @@ export const FiberCanvas = ({
         const renderFrame = state.gl.render.bind(state.gl);
         state.gl.render = (s: THREE.Scene, c: THREE.Camera) => {
           renderFrame(s, c);
-          context?.present();
         };
       },
     });
diff --git a/apps/example/src/Triangle/HelloTriangle.tsx b/apps/example/src/Triangle/HelloTriangle.tsx
index 3e28d6c12..caeb560b3 100644
--- a/apps/example/src/Triangle/HelloTriangle.tsx
+++ b/apps/example/src/Triangle/HelloTriangle.tsx
@@ -77,8 +77,6 @@ export function HelloTriangle() {
       passEncoder.end();
 
       device.queue.submit([commandEncoder.finish()]);
-
-      context.present();
     })();
   }, [ref]);
 
diff --git a/apps/example/src/Triangle/HelloTriangleMSAA.tsx b/apps/example/src/Triangle/HelloTriangleMSAA.tsx
index 5d66983d5..b9518fbe9 100644
--- a/apps/example/src/Triangle/HelloTriangleMSAA.tsx
+++ b/apps/example/src/Triangle/HelloTriangleMSAA.tsx
@@ -87,7 +87,6 @@ export function HelloTriangleMSAA() {
       }
 
       frame();
-      context.present();
     })();
   }, [ref]);
 
diff --git a/apps/example/src/VisionCamera/VisionCamera.tsx b/apps/example/src/VisionCamera/VisionCamera.tsx
index c4adcfaa0..cba2d2948 100644
--- a/apps/example/src/VisionCamera/VisionCamera.tsx
+++ b/apps/example/src/VisionCamera/VisionCamera.tsx
@@ -617,7 +617,6 @@ const CameraView = () => {
           // access window now to release the camera frame's surface promptly
           // (don't wait for GC, which would starve the frame buffer pool).
           externalTex.destroy();
-          context.present();
         } finally {
           videoFrame.release();
         }
diff --git a/apps/example/src/components/Texture.tsx b/apps/example/src/components/Texture.tsx
index d9e689b41..5bd82a911 100644
--- a/apps/example/src/components/Texture.tsx
+++ b/apps/example/src/components/Texture.tsx
@@ -145,7 +145,6 @@ export const Texture = ({ texture, style, device }: GPUTextureProps) => {
     renderPass.end();
 
     device.queue.submit([commandEncoder.finish()]);
-    context.present();
   }, [device, state, texture, ref]);
   return <Canvas ref={ref} style={style} />;
 };
diff --git a/apps/example/src/components/useWebGPU.ts b/apps/example/src/components/useWebGPU.ts
index ac8a631ac..1a399aafe 100644
--- a/apps/example/src/components/useWebGPU.ts
+++ b/apps/example/src/components/useWebGPU.ts
@@ -57,7 +57,6 @@ export const useWebGPU = (scene: Scene) => {
         const render = () => {
           const timestamp = Date.now();
           renderScene(timestamp);
-          context.present();
           animationFrameId.current = requestAnimationFrame(render);
         };
 
diff --git a/docs/refactor-async-present-plan.md b/docs/refactor-async-present-plan.md
index e69706534..e4d38b802 100644
--- a/docs/refactor-async-present-plan.md
+++ b/docs/refactor-async-present-plan.md
@@ -244,7 +244,7 @@ the full `react-native-wgpu` native lib **compiles and links** for `arm64-v8a` (
 `cpplint` clean (project filters); `clang-format` (pinned 15.0.0) applied; `yarn tsc` passes
 (no TS changed). On-device runtime behaviour (frame pacing, zero idle CPU) is Phase 4.
 
-**Phase 2 — Auto-present + remove `present()`**
+**Phase 2 — Auto-present + remove `present()`** — **DONE**
 - Add `FrameDriver` (iOS `CADisplayLink`, Android `AChoreographer`); wire
   `getCurrentTexture` → register; vsync → dispatch present to owning runtime.
 - Remove `GPUCanvasContext::present` (`api/GPUCanvasContext.h:50,58`, `.cpp:56-65`) and
@@ -252,6 +252,50 @@ the full `react-native-wgpu` native lib **compiles and links** for `arm64-v8a` (
 - JS: drop `present` from `RNCanvasContext` (`src/Canvas.tsx:22-24`, `src/types.ts`).
 - Migrate all 16 example / `useWebGPU` call sites + `README.md` + `packages/webgpu/README.md`.
 
+### Phase 2 — what shipped (branch `claude/keen-darwin-xeywa`)
+New files:
+- `cpp/rnwgpu/FrameDriver.{h,cpp}` — global vsync auto-present coordinator. `requestPresent`
+  (from `getCurrentTexture`, JS thread) coalesces per `contextId`; `onVSync` (UI thread)
+  dispatches each pending surface's present onto its owning runtime's `RuntimeScheduler`
+  (`surface->presentFrame()`). Request-driven: starts the platform vsync on first request,
+  stops after `kMaxIdleFrames` (3) idle frames → zero idle CPU.
+- `apple/WebGPUFrameDriver.{h,mm}` — iOS/tvOS `CADisplayLink` on the main run loop (`paused`
+  toggled by start/stop). macOS uses `-[NSScreen displayLinkWithTarget:selector:]` on 14+, else an
+  `NSTimer` fallback. Selector → `FrameDriver::onVSync()`.
+- `android/.../com/webgpu/WebGPUFrameDriver.java` — main-thread `Choreographer` driver;
+  `doFrame` → static `nativeOnVSync()` JNI → `FrameDriver::onVSync()`, reposts while running.
+
+Wiring:
+- `SurfaceInfo::present()` → `presentFrame()` (Apple `WaitForCommandsToBeScheduled` + Present,
+  no-op offscreen); added `SurfaceInfo::hasSurface()`. Metal extern moved to `SurfaceRegistry.h`.
+- `GPU::getContext()` re-exposes the per-runtime `RuntimeContext` (so the canvas can reach its
+  scheduler). `GPUCanvasContext` stores `_contextId`, registers the present in
+  `getCurrentTexture` (and now sets the canvas client size there), and dropped `present()` +
+  its JS binding.
+- iOS `WebGPUModule install` and Android `initializeNative` register `setPlatformVSync`. View
+  teardown (`MetalView dealloc`, Android `onSurfaceDestroy`) calls `FrameDriver::cancelPresent`.
+- JS: `RNCanvasContext` is now just `GPUCanvasContext` (`src/Canvas.tsx`, `src/types.ts`);
+  removed the no-op `present` from `Offscreen.ts` and `WebPolyfillGPUModule.ts`. 18 example
+  call sites (the plan's 16 + `VisionCamera`, `ImportExternalTexture`) and both READMEs migrated.
+
+Decisions / deviations:
+1. **Android vsync = Java `Choreographer` + JNI** (not pure NDK `AChoreographer`), chosen for
+   robustness — pure NDK needs a JNI hop to a Looper thread to bootstrap anyway. Confirmed with
+   the user.
+2. **`present()` hard-removed** (breaking), confirmed with the user.
+3. **Owning-runtime caveat (→ Phase 3):** `getCurrentTexture` currently dispatches present via
+   the **main** runtime's scheduler (`_gpu->getContext()`). Correct for main-JS rendering. The
+   Reanimated example renders on the **UI (worklet) runtime**, so its present is migrated (call
+   removed) but auto-present won't target the correct thread until Phase 3 tags the present with
+   the *calling* runtime and gives worklet runtimes their own `RuntimeScheduler`. Expect the
+   Reanimated/Dedicated examples to be visually broken between Phase 2 and Phase 3.
+
+Validation (local): `react-native-wgpu` native lib **compiles and links** for `arm64-v8a`
+(ninja, CMake picked up `FrameDriver.cpp`); `cpplint` clean; `clang-format` applied; `yarn tsc`
+and `yarn lint` pass for both `packages/webgpu` and `apps/example`. iOS `.mm` and the Java
+driver are not compiled locally (no iOS/gradle build run here) — review-only; needs a device
+build. On-device frame pacing / zero-idle-CPU verification is Phase 4.
+
 **Phase 3 — First-class worklet runtimes**
 - Worklet-runtime `RuntimeScheduler` impl (per Spike 1); verify auto-present dispatch on UI +
   dedicated runtimes; update `apps/example/src/Reanimated/Reanimated.tsx` (drop `present()`,
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index 8eeb1cba1..d7415053b 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -128,8 +128,6 @@ export function HelloTriangle() {
       passEncoder.end();
 
       device.queue.submit([commandEncoder.finish()]);
-
-      context.present();
     };
     helloTriangle();
   }, [ref]);
@@ -174,15 +172,13 @@ ctx.canvas.height = ctx.canvas.clientHeight * PixelRatio.get();
 
 ### Frame Scheduling
 
-In React Native, we want to keep frame presentation as a manual operation as we plan to provide more advanced rendering options that are React Native specific.  
-This means that when you are ready to present a frame, you need to call `present` on the context.
+Frame presentation is automatic. Once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). There is no `present()` call.
 
 ```tsx
 // draw
 // submit to the queue
 device.queue.submit([commandEncoder.finish()]);
-// This method is React Native only
-context.present();
+// The frame is presented automatically on the next vsync.
 ```
 
 ### Canvas Transparency
@@ -296,7 +292,6 @@ const render = () => {
 
   // Release the surface's access window right after the submit that sampled it.
   externalTexture.destroy();
-  context.present();
 };
 ```
 
@@ -328,7 +323,6 @@ const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   const commandEncoder = device.createCommandEncoder();
   // ... render ...
   device.queue.submit([commandEncoder.finish()]);
-  context.present();
 };
 
 // Initialize WebGPU on main thread, then run on UI thread
diff --git a/packages/webgpu/android/CMakeLists.txt b/packages/webgpu/android/CMakeLists.txt
index 50756e72e..51005acdc 100644
--- a/packages/webgpu/android/CMakeLists.txt
+++ b/packages/webgpu/android/CMakeLists.txt
@@ -47,6 +47,7 @@ add_library(${PACKAGE_NAME} SHARED
     ../cpp/rnwgpu/api/GPUComputePipeline.cpp
     ../cpp/rnwgpu/api/GPUCanvasContext.cpp
     ../cpp/rnwgpu/RNWebGPUManager.cpp
+    ../cpp/rnwgpu/FrameDriver.cpp
     ../cpp/jsi/Promise.cpp
     ../cpp/jsi/RuntimeLifecycleMonitor.cpp
     ../cpp/jsi/RuntimeAwareCache.cpp
diff --git a/packages/webgpu/android/cpp/cpp-adapter.cpp b/packages/webgpu/android/cpp/cpp-adapter.cpp
index 2a441c218..4f0ba61d3 100644
--- a/packages/webgpu/android/cpp/cpp-adapter.cpp
+++ b/packages/webgpu/android/cpp/cpp-adapter.cpp
@@ -10,6 +10,7 @@
 #include <webgpu/webgpu_cpp.h>
 
 #include "AndroidPlatformContext.h"
+#include "FrameDriver.h"
 #include "GPUCanvasContext.h"
 #include "RNWebGPUManager.h"
 
@@ -17,6 +18,37 @@
 
 std::shared_ptr<rnwgpu::RNWebGPUManager> manager;
 
+// JNI handles for driving the vsync source (com.webgpu.WebGPUFrameDriver),
+// cached on the JNI thread in initializeNative (which has the app classloader).
+static JavaVM *gJavaVM = nullptr;
+static jclass gFrameDriverClass = nullptr;
+static jmethodID gFrameDriverStart = nullptr;
+static jmethodID gFrameDriverStop = nullptr;
+
+static void callFrameDriver(jmethodID method) {
+  if (gJavaVM == nullptr || gFrameDriverClass == nullptr || method == nullptr) {
+    return;
+  }
+  JNIEnv *env = nullptr;
+  bool attached = false;
+  jint res = gJavaVM->GetEnv(reinterpret_cast<void **>(&env), JNI_VERSION_1_6);
+  if (res == JNI_EDETACHED) {
+    if (gJavaVM->AttachCurrentThread(&env, nullptr) != JNI_OK) {
+      return;
+    }
+    attached = true;
+  } else if (res != JNI_OK) {
+    return;
+  }
+  env->CallStaticVoidMethod(gFrameDriverClass, method);
+  if (env->ExceptionCheck()) {
+    env->ExceptionClear();
+  }
+  if (attached) {
+    gJavaVM->DetachCurrentThread();
+  }
+}
+
 extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUModule_initializeNative(
     JNIEnv *env, jobject /* this */, jlong jsRuntime,
     jobject jsCallInvokerHolder, jobject blobModule) {
@@ -31,6 +63,27 @@ extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUModule_initializeNative(
       std::make_shared<rnwgpu::AndroidPlatformContext>(globalBlobModule);
   manager = std::make_shared<rnwgpu::RNWebGPUManager>(runtime, jsCallInvoker,
                                                       platformContext);
+
+  // Cache JNI handles for the Choreographer-based vsync source and register it
+  // with the FrameDriver to drive auto-present (replaces context.present()).
+  env->GetJavaVM(&gJavaVM);
+  jclass localCls = env->FindClass("com/webgpu/WebGPUFrameDriver");
+  if (localCls != nullptr) {
+    gFrameDriverClass = reinterpret_cast<jclass>(env->NewGlobalRef(localCls));
+    gFrameDriverStart =
+        env->GetStaticMethodID(gFrameDriverClass, "start", "()V");
+    gFrameDriverStop = env->GetStaticMethodID(gFrameDriverClass, "stop", "()V");
+    env->DeleteLocalRef(localCls);
+  }
+  rnwgpu::FrameDriver::getInstance().setPlatformVSync(
+      [] { callFrameDriver(gFrameDriverStart); },
+      [] { callFrameDriver(gFrameDriverStop); });
+}
+
+extern "C" JNIEXPORT void JNICALL
+Java_com_webgpu_WebGPUFrameDriver_nativeOnVSync(JNIEnv * /*env*/,
+                                                jclass /*clazz*/) {
+  rnwgpu::FrameDriver::getInstance().onVSync();
 }
 
 extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUView_onSurfaceChanged(
@@ -66,6 +119,7 @@ Java_com_webgpu_WebGPUView_switchToOffscreenSurface(JNIEnv *env, jobject thiz,
 
 extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUView_onSurfaceDestroy(
     JNIEnv *env, jobject thiz, jint contextId) {
+  rnwgpu::FrameDriver::getInstance().cancelPresent(contextId);
   auto &registry = rnwgpu::SurfaceRegistry::getInstance();
   registry.removeSurfaceInfo(contextId);
 }
\ No newline at end of file
diff --git a/packages/webgpu/apple/MetalView.mm b/packages/webgpu/apple/MetalView.mm
index ccff1245c..e617da889 100644
--- a/packages/webgpu/apple/MetalView.mm
+++ b/packages/webgpu/apple/MetalView.mm
@@ -1,6 +1,8 @@
 #import "MetalView.h"
 #import "webgpu/webgpu_cpp.h"
 
+#include "FrameDriver.h"
+
 @implementation MetalView {
   BOOL _isConfigured;
 }
@@ -42,6 +44,8 @@ - (void)update {
 }
 
 - (void)dealloc {
+  // Stop any pending auto-present for this surface before it goes away.
+  rnwgpu::FrameDriver::getInstance().cancelPresent([_contextId intValue]);
   auto &registry = rnwgpu::SurfaceRegistry::getInstance();
   // Remove the surface info from the registry
   registry.removeSurfaceInfo([_contextId intValue]);
diff --git a/packages/webgpu/apple/WebGPUModule.mm b/packages/webgpu/apple/WebGPUModule.mm
index 99580aa14..c4c7224ad 100644
--- a/packages/webgpu/apple/WebGPUModule.mm
+++ b/packages/webgpu/apple/WebGPUModule.mm
@@ -1,6 +1,8 @@
 #import "WebGPUModule.h"
 #include "ApplePlatformContext.h"
+#include "FrameDriver.h"
 #import "GPUCanvasContext.h"
+#import "WebGPUFrameDriver.h"
 
 #import <React/RCTBridge+Private.h>
 #import <React/RCTCallInvoker.h>
@@ -78,6 +80,11 @@ - (void)invalidate {
       std::make_shared<rnwgpu::ApplePlatformContext>();
   webgpuManager = std::make_shared<rnwgpu::RNWebGPUManager>(runtime, jsInvoker,
                                                             platformContext);
+
+  // Drive auto-present from the display's vsync (replaces context.present()).
+  rnwgpu::FrameDriver::getInstance().setPlatformVSync(
+      [] { [WebGPUFrameDriver start]; }, [] { [WebGPUFrameDriver stop]; });
+
   return @true;
 }
 
diff --git a/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h b/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
index 110a45d44..ed098896a 100644
--- a/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
+++ b/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
@@ -7,6 +7,12 @@
 
 #include "webgpu/webgpu_cpp.h"
 
+#ifdef __APPLE__
+namespace dawn::native::metal {
+void WaitForCommandsToBeScheduled(WGPUDevice device);
+} // namespace dawn::native::metal
+#endif
+
 namespace rnwgpu {
 
 struct NativeInfo {
@@ -113,7 +119,22 @@ class SurfaceInfo {
     height = newHeight;
   }
 
-  void present() {
+  // Present the current surface texture. Called at the frame boundary from the
+  // owning runtime's JS thread (via FrameDriver), replacing the old manual
+  // present(). No-op when offscreen / unconfigured (no surface).
+  void presentFrame() {
+#ifdef __APPLE__
+    // Ensure command buffers are scheduled before presenting. Read the device
+    // under a shared lock, then wait without holding it (the wait can block).
+    wgpu::Device device;
+    {
+      std::shared_lock<std::shared_mutex> lock(_mutex);
+      device = config.device;
+    }
+    if (device) {
+      dawn::native::metal::WaitForCommandsToBeScheduled(device.Get());
+    }
+#endif
     std::unique_lock<std::shared_mutex> lock(_mutex);
     if (surface) {
       surface.Present();
@@ -131,6 +152,12 @@ class SurfaceInfo {
     }
   }
 
+  // True when an on-screen wgpu::Surface is attached (vs offscreen texture).
+  bool hasSurface() {
+    std::shared_lock<std::shared_mutex> lock(_mutex);
+    return surface != nullptr;
+  }
+
   NativeInfo getNativeInfo() {
     std::shared_lock<std::shared_mutex> lock(_mutex);
     return {.nativeSurface = nativeSurface, .width = width, .height = height};
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.h b/packages/webgpu/cpp/rnwgpu/api/GPU.h
index e7dc15caf..b2488d4c7 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.h
@@ -53,6 +53,7 @@ class GPU : public NativeObject<GPU> {
   }
 
   inline const wgpu::Instance get() { return _instance; }
+  inline std::shared_ptr<async::RuntimeContext> getContext() { return _async; }
 
 private:
   wgpu::Instance _instance;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
index d75eb7b0f..7a2c32886 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
@@ -1,16 +1,9 @@
 #include "GPUCanvasContext.h"
 #include "Convertors.h"
+#include "FrameDriver.h"
 #include "RNWebGPUManager.h"
 #include <memory>
 
-#ifdef __APPLE__
-namespace dawn::native::metal {
-
-void WaitForCommandsToBeScheduled(WGPUDevice device);
-
-}
-#endif
-
 namespace rnwgpu {
 
 void GPUCanvasContext::configure(
@@ -48,20 +41,26 @@ std::shared_ptr<GPUTexture> GPUCanvasContext::getCurrentTexture() {
     _surfaceInfo->reconfigure(width, height);
   }
   auto texture = _surfaceInfo->getCurrentTexture();
-  // Pass reportsMemoryPressure=false to avoid triggering spurious Hermes GC
-  // cycles every frame since the canvas texture doesn't own the buffer.
-  return std::make_shared<GPUTexture>(texture, "", false);
-}
 
-void GPUCanvasContext::present() {
-#ifdef __APPLE__
-  dawn::native::metal::WaitForCommandsToBeScheduled(
-      _surfaceInfo->getDevice().Get());
-#endif
+  // Auto-present: acquiring the current texture schedules a present for this
+  // surface at the next vsync (spec-aligned "update the rendering" after the
+  // frame). Replaces the old explicit context.present(). Offscreen surfaces
+  // have no wgpu::Surface, so skip them (their texture is read back directly).
   auto size = _surfaceInfo->getSize();
   _canvas->setClientWidth(size.width);
   _canvas->setClientHeight(size.height);
-  _surfaceInfo->present();
+  if (_surfaceInfo->hasSurface()) {
+    // Phase 2: dispatch the present on the main runtime (the only runtime that
+    // owns WebGPU rendering today). Phase 3 will tag this with the *calling*
+    // runtime so worklet-runtime rendering (e.g. the Reanimated example)
+    // presents on its own JS thread, preserving Dawn surface thread-affinity.
+    FrameDriver::getInstance().requestPresent(_contextId, _surfaceInfo,
+                                              _gpu->getContext()->scheduler());
+  }
+
+  // Pass reportsMemoryPressure=false to avoid triggering spurious Hermes GC
+  // cycles every frame since the canvas texture doesn't own the buffer.
+  return std::make_shared<GPUTexture>(texture, "", false);
 }
 
 } // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
index 4b97a7887..2ab5d69c2 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
@@ -26,7 +26,7 @@ class GPUCanvasContext : public NativeObject<GPUCanvasContext> {
 
   GPUCanvasContext(std::shared_ptr<GPU> gpu, int contextId, int width,
                    int height)
-      : NativeObject(CLASS_NAME), _gpu(std::move(gpu)) {
+      : NativeObject(CLASS_NAME), _contextId(contextId), _gpu(std::move(gpu)) {
     _canvas = std::make_shared<Canvas>(nullptr, width, height);
     auto &registry = rnwgpu::SurfaceRegistry::getInstance();
     _surfaceInfo =
@@ -47,7 +47,6 @@ class GPUCanvasContext : public NativeObject<GPUCanvasContext> {
                   &GPUCanvasContext::unconfigure);
     installMethod(runtime, prototype, "getCurrentTexture",
                   &GPUCanvasContext::getCurrentTexture);
-    installMethod(runtime, prototype, "present", &GPUCanvasContext::present);
   }
 
   // TODO: is this ok?
@@ -55,9 +54,9 @@ class GPUCanvasContext : public NativeObject<GPUCanvasContext> {
   void configure(std::shared_ptr<GPUCanvasConfiguration> configuration);
   void unconfigure();
   std::shared_ptr<GPUTexture> getCurrentTexture();
-  void present();
 
 private:
+  int _contextId;
   std::shared_ptr<Canvas> _canvas;
   std::shared_ptr<SurfaceInfo> _surfaceInfo;
   std::shared_ptr<GPU> _gpu;
diff --git a/packages/webgpu/src/Canvas.tsx b/packages/webgpu/src/Canvas.tsx
index 1030f3e38..7c2a47a6e 100644
--- a/packages/webgpu/src/Canvas.tsx
+++ b/packages/webgpu/src/Canvas.tsx
@@ -19,9 +19,9 @@ export interface NativeCanvas {
   clientHeight: number;
 }
 
-export type RNCanvasContext = GPUCanvasContext & {
-  present: () => void;
-};
+// Auto-present (a global vsync FrameDriver) replaces the old manual present();
+// the native context is now just a spec GPUCanvasContext.
+export type RNCanvasContext = GPUCanvasContext;
 
 export interface CanvasRef {
   getContextId: () => number;
diff --git a/packages/webgpu/src/Offscreen.ts b/packages/webgpu/src/Offscreen.ts
index c4e460bb2..6ce2f589c 100644
--- a/packages/webgpu/src/Offscreen.ts
+++ b/packages/webgpu/src/Offscreen.ts
@@ -64,10 +64,6 @@ class GPUOffscreenCanvasContext implements GPUCanvasContext {
     throw new Error("Method not implemented.");
   }
 
-  present() {
-    // Do nothing
-  }
-
   getDevice() {
     if (!this.device) {
       throw new Error("Device is not configured.");
diff --git a/packages/webgpu/src/WebPolyfillGPUModule.ts b/packages/webgpu/src/WebPolyfillGPUModule.ts
index 9dcc1f1c5..04229cd05 100644
--- a/packages/webgpu/src/WebPolyfillGPUModule.ts
+++ b/packages/webgpu/src/WebPolyfillGPUModule.ts
@@ -39,10 +39,7 @@ function makeWebGPUCanvasContext(
     canvas.setAttribute("height", pixelHeight);
   }
 
-  const context = canvas.getContext("webgpu")!;
-  return Object.assign(context, {
-    present: () => {},
-  });
+  return canvas.getContext("webgpu")!;
 }
 
 // @ts-expect-error - polyfill for RNWebGPU native module
diff --git a/packages/webgpu/src/types.ts b/packages/webgpu/src/types.ts
index c03f92b4b..0758c73f4 100644
--- a/packages/webgpu/src/types.ts
+++ b/packages/webgpu/src/types.ts
@@ -8,9 +8,9 @@ export interface NativeCanvas {
   clientHeight: number;
 }
 
-export type RNCanvasContext = GPUCanvasContext & {
-  present: () => void;
-};
+// Auto-present (a global vsync FrameDriver) replaces the old manual present();
+// the native context is now just a spec GPUCanvasContext.
+export type RNCanvasContext = GPUCanvasContext;
 
 export interface CanvasRef {
   getContextId: () => number;

From f5bc1c20b2287ff71c1290e27ada6ed0dc5e4e8b Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Tue, 2 Jun 2026 17:05:24 +0200
Subject: [PATCH 06/25] :wrench:

---
 docs/refactor-async-present-plan.md           | 74 +++++++++++++++-
 .../java/com/webgpu/WebGPUFrameDriver.java    | 66 ++++++++++++++
 packages/webgpu/apple/WebGPUFrameDriver.h     | 13 +++
 packages/webgpu/apple/WebGPUFrameDriver.mm    | 88 +++++++++++++++++++
 packages/webgpu/cpp/rnwgpu/FrameDriver.cpp    | 81 +++++++++++++++++
 packages/webgpu/cpp/rnwgpu/FrameDriver.h      | 83 +++++++++++++++++
 .../cpp/rnwgpu/api/GPUCanvasContext.cpp       | 51 ++++++++---
 .../webgpu/cpp/rnwgpu/api/GPUCanvasContext.h  | 10 ++-
 8 files changed, 450 insertions(+), 16 deletions(-)
 create mode 100644 packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java
 create mode 100644 packages/webgpu/apple/WebGPUFrameDriver.h
 create mode 100644 packages/webgpu/apple/WebGPUFrameDriver.mm
 create mode 100644 packages/webgpu/cpp/rnwgpu/FrameDriver.cpp
 create mode 100644 packages/webgpu/cpp/rnwgpu/FrameDriver.h

diff --git a/docs/refactor-async-present-plan.md b/docs/refactor-async-present-plan.md
index e4d38b802..65490af29 100644
--- a/docs/refactor-async-present-plan.md
+++ b/docs/refactor-async-present-plan.md
@@ -1,6 +1,6 @@
 # Refactor: event-driven async + auto-present
 
-Status: **Phase 0 complete — all spikes GREEN, ready for Phase 1**
+Status: **Phases 1–3 complete (local build/lint green). Phase 4 (SurfaceRegistry rework) proposed; Phase 5 = on-device validation.**
 Branch: `claude/keen-darwin-xeywa`
 
 This document is the handoff for moving the async + present refactor forward. Phase 0
@@ -296,12 +296,80 @@ and `yarn lint` pass for both `packages/webgpu` and `apps/example`. iOS `.mm` an
 driver are not compiled locally (no iOS/gradle build run here) — review-only; needs a device
 build. On-device frame pacing / zero-idle-CPU verification is Phase 4.
 
-**Phase 3 — First-class worklet runtimes**
+**Phase 3 — First-class worklet runtimes** — **DONE**
 - Worklet-runtime `RuntimeScheduler` impl (per Spike 1); verify auto-present dispatch on UI +
   dedicated runtimes; update `apps/example/src/Reanimated/Reanimated.tsx` (drop `present()`,
   keep its own rAF loop).
 
-**Phase 4 — Validation**
+### Phase 3 — what shipped (branch `claude/keen-darwin-xeywa`)
+Observed after Phase 2: the **UI-runtime** Reanimated example worked (the Reanimated UI runtime
+executes on the **main thread**, so dispatching its present to the main runtime's scheduler
+happened to land on the right thread), but the **dedicated `createWorkletRuntime`** example
+(`Reanimated/DedicatedThread.tsx`, `runOnRuntime`) crashed — its render thread is its own, so a
+main-thread present violated Dawn surface thread-affinity.
+
+**Decision (confirmed with the user): self-scheduled present, no native worklets dependency.**
+Rather than link `react-native-worklets` natively and have the FrameDriver dispatch via
+`WorkletRuntime::schedule` (the original plan / Spike 1 primary), worklet runtimes now schedule
+their own present on their own event loop. This avoids a new native build dependency entirely
+and is fully buildable/validatable locally (it is Spike 1's documented "JS-scheduling"
+contingency).
+
+Implementation (native only; no JS/build-system changes):
+- `GPUCanvasContext::getCurrentTexture` switched to the full-control HostFunction signature
+  (`jsi::Value(rt, thisVal, args, count)`, same pattern as `RNWebGPU::createImageBitmap`) so it
+  learns the **calling** runtime. New `schedulePresent(runtime)`:
+  - **Main runtime** (`RuntimeContext::get(runtime)` is non-null): unchanged — register with the
+    global vsync `FrameDriver` using that runtime's scheduler.
+  - **Any worklet runtime** (no `RuntimeContext` — Reanimated UI/dedicated, Vision Camera frame
+    processors, …): **present-on-next-acquire**. `getCurrentTexture` presents the *previous*
+    frame synchronously (inline, on the calling thread) just before acquiring the next texture;
+    by then the previous frame's submit has happened, and present runs on the same thread that
+    rendered it. This is the natural swapchain boundary and needs no scheduler.
+
+    Why not schedule onto the runtime's own loop: two earlier attempts failed. (1)
+    `queueMicrotask` is **disabled** on worklet runtimes (throws "microtasks are disabled in this
+    runtime"). (2) `setImmediate`/`setTimeout` exist but route through the runtime's `EventLoop`
+    `AsyncQueue`, which for **Vision Camera** is a custom `NativeThreadAsyncQueue` that hops back
+    through JNI (`fbjni Environment::current()`) and **crashes** when pushed from a
+    non-JVM-attached thread. Present-on-next-acquire avoids the runtime's task queue entirely.
+    Trade-off: one frame of latency, and a worklet that renders exactly once would not present
+    its single frame (continuous loops — rAF, camera frames — are unaffected; the main runtime's
+    one-shot case is covered by the FrameDriver).
+- `Reanimated.tsx` already had `present()` removed in Phase 2; `DedicatedThread.tsx` /
+  `UIThread.tsx` need no changes.
+
+Known limitation (out of scope, examples don't hit it): **async ops** (`mapAsync`,
+`onSubmittedWorkDone`, …) invoked *on a worklet runtime* still settle their Promise via the
+object's creation-runtime context (main), not the calling worklet runtime — the example worklets
+only do synchronous rendering + present (device/adapter are created on the main runtime). Routing
+async settlement to the calling runtime would need the same calling-runtime detection applied to
+the 7 async sites; deferred until a use case needs it.
+
+Validation (local): native lib **compiles + links** for `arm64-v8a`; `cpplint` clean;
+`clang-format` applied; `yarn tsc`/`yarn lint` unaffected (no JS changed). On-device
+verification of the dedicated-worklet example is for the maintainer.
+
+**Phase 4 — `SurfaceRegistry` / surface-model rework** (proposed)
+The `SurfaceInfo` / `SurfaceRegistry` model (`cpp/rnwgpu/SurfaceRegistry.h`) predates the
+event-driven + auto-present work and is now the rough edge. Candidate improvements to scope:
+- **Surface thread-affinity.** Surface lifecycle (`configure`/`switchToOnscreen`/
+  `switchToOffscreen`/`resize`) runs on the **UI thread** (native view callbacks) while
+  `getCurrentTexture`/`presentFrame` run on the **owning runtime's render thread**. A single
+  `shared_mutex` serializes them but they're still cross-thread against a Dawn surface that
+  prefers single-thread access. Consider routing all surface ops through the owning runtime
+  (e.g. via the `RuntimeScheduler`), making affinity structural rather than lock-guarded.
+- **State clarity.** The on-screen-`surface` vs offscreen-`texture` duality is encoded as
+  `if (surface) … else …` branches throughout; a small explicit state (Offscreen / Onscreen)
+  would remove the implicit coupling and the `switchToOnscreen` flush path's validation cost
+  (its existing `// TODO: faster way without validation?`).
+- **Dead / to-re-evaluate fields.** e.g. the stored `wgpu::Instance gpu` member appears unused;
+  audit members now that present/`hasSurface` were added.
+- **Lifetime vs `contextId`.** Registry keyed by a JS-side incrementing `int`; `FrameDriver`
+  now also keys pending presents by `contextId`. Confirm teardown ordering (view dealloc →
+  `cancelPresent` + `removeSurfaceInfo`) is race-free under the new threading.
+
+**Phase 5 — Validation**
 ```bash
 yarn tsc && yarn lint
 yarn workspace react-native-wgpu test         # offscreen readback + demo specs
diff --git a/packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java b/packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java
new file mode 100644
index 000000000..03a1d2c29
--- /dev/null
+++ b/packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java
@@ -0,0 +1,66 @@
+package com.webgpu;
+
+import android.os.Handler;
+import android.os.Looper;
+import android.view.Choreographer;
+
+/**
+ * Drives WebGPU auto-present from the main-thread {@link Choreographer},
+ * replacing the manual {@code context.present()} call.
+ *
+ * <p>{@link #start()} / {@link #stop()} are invoked from native code
+ * (rnwgpu::FrameDriver::setPlatformVSync) on arbitrary threads; both hop to the
+ * main thread. While running, {@link #doFrame(long)} calls back into native
+ * once per vsync, where pending surfaces are presented.
+ */
+public class WebGPUFrameDriver implements Choreographer.FrameCallback {
+  private static final WebGPUFrameDriver INSTANCE = new WebGPUFrameDriver();
+
+  private final Handler mainHandler = new Handler(Looper.getMainLooper());
+  private boolean running = false;
+
+  private WebGPUFrameDriver() {}
+
+  /** Called from native (any thread). */
+  public static void start() {
+    INSTANCE.startInternal();
+  }
+
+  /** Called from native (any thread). */
+  public static void stop() {
+    INSTANCE.stopInternal();
+  }
+
+  private void startInternal() {
+    mainHandler.post(
+        () -> {
+          if (running) {
+            return;
+          }
+          running = true;
+          Choreographer.getInstance().postFrameCallback(this);
+        });
+  }
+
+  private void stopInternal() {
+    mainHandler.post(
+        () -> {
+          if (!running) {
+            return;
+          }
+          running = false;
+          Choreographer.getInstance().removeFrameCallback(this);
+        });
+  }
+
+  @Override
+  public void doFrame(long frameTimeNanos) {
+    if (!running) {
+      return;
+    }
+    nativeOnVSync();
+    Choreographer.getInstance().postFrameCallback(this);
+  }
+
+  private static native void nativeOnVSync();
+}
diff --git a/packages/webgpu/apple/WebGPUFrameDriver.h b/packages/webgpu/apple/WebGPUFrameDriver.h
new file mode 100644
index 000000000..aacae84ee
--- /dev/null
+++ b/packages/webgpu/apple/WebGPUFrameDriver.h
@@ -0,0 +1,13 @@
+#pragma once
+
+#import <Foundation/Foundation.h>
+
+// Objective-C wrapper around the platform vsync source (CADisplayLink) that
+// drives rnwgpu::FrameDriver::onVSync() once per frame. start/stop are invoked
+// by the C++ FrameDriver via setPlatformVSync; both hop to the main thread.
+@interface WebGPUFrameDriver : NSObject
+
++ (void)start;
++ (void)stop;
+
+@end
diff --git a/packages/webgpu/apple/WebGPUFrameDriver.mm b/packages/webgpu/apple/WebGPUFrameDriver.mm
new file mode 100644
index 000000000..1d302e2fa
--- /dev/null
+++ b/packages/webgpu/apple/WebGPUFrameDriver.mm
@@ -0,0 +1,88 @@
+#import "WebGPUFrameDriver.h"
+
+#import "RNWGUIKit.h"
+#import <QuartzCore/QuartzCore.h>
+
+#include "FrameDriver.h"
+
+@implementation WebGPUFrameDriver
+
++ (void)onFrame {
+  rnwgpu::FrameDriver::getInstance().onVSync();
+}
+
+#if !TARGET_OS_OSX
+
+// iOS / tvOS: CADisplayLink on the main run loop, paused/resumed for
+// start/stop.
+static CADisplayLink *sDisplayLink = nil;
+
++ (void)tick:(CADisplayLink *)link {
+  [WebGPUFrameDriver onFrame];
+}
+
++ (void)start {
+  dispatch_async(dispatch_get_main_queue(), ^{
+    if (sDisplayLink == nil) {
+      sDisplayLink = [CADisplayLink displayLinkWithTarget:self
+                                                 selector:@selector(tick:)];
+      [sDisplayLink addToRunLoop:[NSRunLoop mainRunLoop]
+                         forMode:NSRunLoopCommonModes];
+    }
+    sDisplayLink.paused = NO;
+  });
+}
+
++ (void)stop {
+  dispatch_async(dispatch_get_main_queue(), ^{
+    sDisplayLink.paused = YES;
+  });
+}
+
+#else // TARGET_OS_OSX
+
+// macOS: CADisplayLink is available via NSScreen on 14.0+. On older systems we
+// fall back to an NSTimer at ~60Hz (not vsync-aligned, but keeps auto-present
+// working). FrameDriver self-idles cheaply when nothing is rendering.
+static id sDisplayLink = nil;
+
++ (void)tick:(id)sender {
+  [WebGPUFrameDriver onFrame];
+}
+
++ (void)start {
+  dispatch_async(dispatch_get_main_queue(), ^{
+    if (sDisplayLink == nil) {
+      if (@available(macOS 14.0, *)) {
+        CADisplayLink *link =
+            [NSScreen.mainScreen displayLinkWithTarget:self
+                                              selector:@selector(tick:)];
+        [link addToRunLoop:[NSRunLoop mainRunLoop]
+                   forMode:NSRunLoopCommonModes];
+        sDisplayLink = link;
+      } else {
+        sDisplayLink = [NSTimer scheduledTimerWithTimeInterval:1.0 / 60.0
+                                                        target:self
+                                                      selector:@selector(tick:)
+                                                      userInfo:nil
+                                                       repeats:YES];
+      }
+    }
+    if ([sDisplayLink isKindOfClass:[CADisplayLink class]]) {
+      ((CADisplayLink *)sDisplayLink).paused = NO;
+    }
+  });
+}
+
++ (void)stop {
+  dispatch_async(dispatch_get_main_queue(), ^{
+    if ([sDisplayLink isKindOfClass:[CADisplayLink class]]) {
+      ((CADisplayLink *)sDisplayLink).paused = YES;
+    }
+    // NSTimer fallback keeps firing; onVSync is a cheap no-op while idle.
+  });
+}
+
+#endif // TARGET_OS_OSX
+
+@end
diff --git a/packages/webgpu/cpp/rnwgpu/FrameDriver.cpp b/packages/webgpu/cpp/rnwgpu/FrameDriver.cpp
new file mode 100644
index 000000000..792940e5e
--- /dev/null
+++ b/packages/webgpu/cpp/rnwgpu/FrameDriver.cpp
@@ -0,0 +1,81 @@
+#include "FrameDriver.h"
+
+#include <memory>
+#include <utility>
+#include <vector>
+
+namespace jsi = facebook::jsi;
+
+namespace rnwgpu {
+
+FrameDriver &FrameDriver::getInstance() {
+  static FrameDriver instance;
+  return instance;
+}
+
+void FrameDriver::setPlatformVSync(std::function<void()> start,
+                                   std::function<void()> stop) {
+  std::lock_guard<std::mutex> lock(_mutex);
+  _start = std::move(start);
+  _stop = std::move(stop);
+}
+
+void FrameDriver::requestPresent(
+    int contextId, std::shared_ptr<SurfaceInfo> surface,
+    std::shared_ptr<async::RuntimeScheduler> scheduler) {
+  if (!surface || !scheduler) {
+    return;
+  }
+
+  std::function<void()> startToCall;
+  {
+    std::lock_guard<std::mutex> lock(_mutex);
+    _pending[contextId] = {std::move(surface), std::move(scheduler)};
+    _idleFrames = 0;
+    if (!_running && _start) {
+      _running = true;
+      startToCall = _start;
+    }
+  }
+
+  // Invoked outside the lock: the platform start hops to the UI thread.
+  if (startToCall) {
+    startToCall();
+  }
+}
+
+void FrameDriver::cancelPresent(int contextId) {
+  std::lock_guard<std::mutex> lock(_mutex);
+  _pending.erase(contextId);
+}
+
+void FrameDriver::onVSync() {
+  std::vector<Pending> toPresent;
+  std::function<void()> stopToCall;
+  {
+    std::lock_guard<std::mutex> lock(_mutex);
+    if (!_pending.empty()) {
+      toPresent.reserve(_pending.size());
+      for (auto &entry : _pending) {
+        toPresent.push_back(std::move(entry.second));
+      }
+      _pending.clear();
+      _idleFrames = 0;
+    } else if (_running && ++_idleFrames >= kMaxIdleFrames) {
+      _running = false;
+      stopToCall = _stop;
+    }
+  }
+
+  for (auto &pending : toPresent) {
+    auto surface = pending.surface;
+    pending.scheduler->scheduleOnJS(
+        [surface](jsi::Runtime & /*runtime*/) { surface->presentFrame(); });
+  }
+
+  if (stopToCall) {
+    stopToCall();
+  }
+}
+
+} // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/FrameDriver.h b/packages/webgpu/cpp/rnwgpu/FrameDriver.h
new file mode 100644
index 000000000..c16fedabf
--- /dev/null
+++ b/packages/webgpu/cpp/rnwgpu/FrameDriver.h
@@ -0,0 +1,83 @@
+#pragma once
+
+#include <functional>
+#include <memory>
+#include <mutex>
+#include <unordered_map>
+
+#include "SurfaceRegistry.h"
+#include "rnwgpu/async/RuntimeScheduler.h"
+
+namespace rnwgpu {
+
+/**
+ * Global vsync-driven auto-present coordinator. Replaces the manual
+ * `context.present()` call.
+ *
+ * Flow:
+ *   - `GPUCanvasContext::getCurrentTexture()` (JS thread) calls
+ *     `requestPresent` for its surface, tagged with the owning runtime's
+ *     RuntimeScheduler.
+ *   - A platform vsync source (iOS CADisplayLink / Android Choreographer) calls
+ *     `onVSync()` on the UI thread once per frame.
+ *   - On each vsync, every surface that requested a present has its present
+ *     dispatched onto its owning runtime's JS thread (so `Surface.Present()`
+ *     and the Apple Metal scheduling wait run on the same thread that did
+ *     getCurrentTexture / submit, preserving Dawn surface thread-affinity and
+ *     present-after-submit ordering via FIFO on that loop).
+ *
+ * The vsync source is request-driven: it is started when the first present is
+ * requested and stopped after a few idle frames, so an idle (non-rendering) app
+ * costs zero CPU.
+ */
+class FrameDriver {
+public:
+  static FrameDriver &getInstance();
+
+  /**
+   * Register how to start/stop the platform vsync source. `start`/`stop` are
+   * invoked when presents begin/cease; each implementation is responsible for
+   * hopping to the UI thread as needed. Called once per platform at init.
+   */
+  void setPlatformVSync(std::function<void()> start,
+                        std::function<void()> stop);
+
+  /**
+   * Request that `surface` be presented at the next vsync. Coalesced per
+   * contextId (at most one present per surface per frame). Thread-safe; called
+   * from a JS thread inside getCurrentTexture. Surfaces with no on-screen
+   * `wgpu::Surface` (offscreen) should not be registered.
+   */
+  void requestPresent(int contextId, std::shared_ptr<SurfaceInfo> surface,
+                      std::shared_ptr<async::RuntimeScheduler> scheduler);
+
+  /**
+   * Drop any pending present for a surface (e.g. when its view is torn down).
+   * Thread-safe.
+   */
+  void cancelPresent(int contextId);
+
+  /** Called by the platform vsync source on the UI thread, once per frame. */
+  void onVSync();
+
+private:
+  FrameDriver() = default;
+
+  struct Pending {
+    std::shared_ptr<SurfaceInfo> surface;
+    std::shared_ptr<async::RuntimeScheduler> scheduler;
+  };
+
+  // Number of consecutive empty frames before the vsync source is stopped.
+  // A small grace period avoids start/stop thrash during continuous rendering.
+  static constexpr int kMaxIdleFrames = 3;
+
+  std::mutex _mutex;
+  std::unordered_map<int, Pending> _pending;
+  std::function<void()> _start;
+  std::function<void()> _stop;
+  bool _running = false;
+  int _idleFrames = 0;
+};
+
+} // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
index 7a2c32886..2eb76c0b4 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
@@ -32,7 +32,15 @@ void GPUCanvasContext::configure(
 
 void GPUCanvasContext::unconfigure() {}
 
-std::shared_ptr<GPUTexture> GPUCanvasContext::getCurrentTexture() {
+jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
+                                               const jsi::Value & /*thisValue*/,
+                                               const jsi::Value * /*args*/,
+                                               size_t /*count*/) {
+  // Main JS runtime owns a RuntimeContext; worklet runtimes (Reanimated UI /
+  // dedicated, Vision Camera frame processors, …) do not.
+  auto runtimeContext = async::RuntimeContext::get(runtime);
+  const bool isMainRuntime = runtimeContext != nullptr;
+
   auto prevSize = _surfaceInfo->getConfig();
   auto width = _canvas->getWidth();
   auto height = _canvas->getHeight();
@@ -40,27 +48,46 @@ std::shared_ptr<GPUTexture> GPUCanvasContext::getCurrentTexture() {
   if (sizeHasChanged) {
     _surfaceInfo->reconfigure(width, height);
   }
+
+  // Worklet-runtime auto-present: present the PREVIOUS frame synchronously on
+  // this thread, just before acquiring the next texture. By now that frame's
+  // submit has already happened (during the previous frame's work), and this
+  // runs on the same thread that did getCurrentTexture/submit — preserving Dawn
+  // surface thread-affinity. We can't use the UI-thread FrameDriver here, and
+  // scheduling onto the worklet runtime's own task queue is unsafe in general
+  // (e.g. Vision Camera's queue hops through JNI and crashes off the JS
+  // thread), so we present inline at the natural swapchain boundary instead.
+  if (!isMainRuntime && _hasUnpresentedFrame && _surfaceInfo->hasSurface()) {
+    _surfaceInfo->presentFrame();
+    _hasUnpresentedFrame = false;
+  }
+
   auto texture = _surfaceInfo->getCurrentTexture();
 
-  // Auto-present: acquiring the current texture schedules a present for this
-  // surface at the next vsync (spec-aligned "update the rendering" after the
-  // frame). Replaces the old explicit context.present(). Offscreen surfaces
-  // have no wgpu::Surface, so skip them (their texture is read back directly).
   auto size = _surfaceInfo->getSize();
   _canvas->setClientWidth(size.width);
   _canvas->setClientHeight(size.height);
+
+  // Auto-present: acquiring the current texture arranges for this frame to be
+  // presented (spec-aligned "update the rendering" after the frame). Replaces
+  // the old explicit context.present(). Offscreen surfaces have no
+  // wgpu::Surface, so skip them (their texture is read back directly).
   if (_surfaceInfo->hasSurface()) {
-    // Phase 2: dispatch the present on the main runtime (the only runtime that
-    // owns WebGPU rendering today). Phase 3 will tag this with the *calling*
-    // runtime so worklet-runtime rendering (e.g. the Reanimated example)
-    // presents on its own JS thread, preserving Dawn surface thread-affinity.
-    FrameDriver::getInstance().requestPresent(_contextId, _surfaceInfo,
-                                              _gpu->getContext()->scheduler());
+    if (isMainRuntime) {
+      // Main runtime: drive present from the global vsync FrameDriver (handles
+      // one-shot renders too, since it presents the current frame at vsync).
+      FrameDriver::getInstance().requestPresent(_contextId, _surfaceInfo,
+                                                runtimeContext->scheduler());
+    } else {
+      // Worklet runtime: present at the next acquire (see above).
+      _hasUnpresentedFrame = true;
+    }
   }
 
   // Pass reportsMemoryPressure=false to avoid triggering spurious Hermes GC
   // cycles every frame since the canvas texture doesn't own the buffer.
-  return std::make_shared<GPUTexture>(texture, "", false);
+  auto gpuTexture = std::make_shared<GPUTexture>(texture, "", false);
+  return JSIConverter<std::shared_ptr<GPUTexture>>::toJSI(runtime, gpuTexture);
 }
 
 } // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
index 2ab5d69c2..bdf6bee8c 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
@@ -53,13 +53,21 @@ class GPUCanvasContext : public NativeObject<GPUCanvasContext> {
   inline const wgpu::Surface get() { return nullptr; }
   void configure(std::shared_ptr<GPUCanvasConfiguration> configuration);
   void unconfigure();
-  std::shared_ptr<GPUTexture> getCurrentTexture();
+  // Full-control signature so we can learn the *calling* runtime and route the
+  // auto-present onto its own thread (main runtime → FrameDriver vsync; worklet
+  // runtime → presented inline at the next getCurrentTexture).
+  jsi::Value getCurrentTexture(jsi::Runtime &runtime,
+                               const jsi::Value &thisValue,
+                               const jsi::Value *args, size_t count);
 
 private:
   int _contextId;
   std::shared_ptr<Canvas> _canvas;
   std::shared_ptr<SurfaceInfo> _surfaceInfo;
   std::shared_ptr<GPU> _gpu;
+  // For worklet-runtime auto-present: true when a frame was acquired on a
+  // worklet runtime and not yet presented (presented at the next acquire).
+  bool _hasUnpresentedFrame = false;
 };
 
 } // namespace rnwgpu

From ba9efe94ecf8942f9e834a75422447ee027505ad Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Tue, 2 Jun 2026 17:33:07 +0200
Subject: [PATCH 07/25] :wrench:

---
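Reviewer note (below the `---`, so not part of the commit message): the present
routing this patch introduces can be summarized by the standalone sketch below.
It is an illustration under stated assumptions, not the code in
`GPUCanvasContext.cpp`. `PresentPath`, `classifyPresent`, `presentIsNoOp`, and
`checkRoutingExamples` are hypothetical names, and the real implementation reads
`RuntimeContext::get(runtime)` and the `__RUNTIME_KIND` global from a live
`jsi::Runtime` instead of taking plain arguments. The numeric kinds
(`ReactNative=1`, `UI=2`, `Worker=3`) are the ones quoted in the plan document.

```cpp
#include <cassert>
#include <optional>

// Hypothetical names for illustration only. The numeric values come from the
// plan document (worklets::RuntimeKind: ReactNative=1, UI=2, Worker=3).
enum class PresentPath { Auto, Explicit };

constexpr int kRuntimeKindUI = 2;

// hasRuntimeContext: stands in for RuntimeContext::get(runtime) != nullptr.
// runtimeKind: stands in for a numeric globalThis.__RUNTIME_KIND, or nullopt
// when the global is missing or is not a number.
constexpr PresentPath classifyPresent(bool hasRuntimeContext,
                                      std::optional<int> runtimeKind) {
  if (hasRuntimeContext) {
    return PresentPath::Auto; // main JS runtime
  }
  if (runtimeKind.has_value() && *runtimeKind == kRuntimeKindUI) {
    return PresentPath::Auto; // Reanimated UI runtime
  }
  return PresentPath::Explicit; // Worker, unknown, or untagged runtime
}

// ctx.present() only does work on runtimes that are not auto-presented.
constexpr bool presentIsNoOp(bool hasRuntimeContext,
                             std::optional<int> runtimeKind) {
  return classifyPresent(hasRuntimeContext, runtimeKind) == PresentPath::Auto;
}

// The cases described in the plan document, written as runtime checks.
inline void checkRoutingExamples() {
  assert(classifyPresent(true, std::nullopt) == PresentPath::Auto);
  assert(classifyPresent(false, 2) == PresentPath::Auto);
  assert(classifyPresent(false, 3) == PresentPath::Explicit);
  assert(classifyPresent(false, std::nullopt) == PresentPath::Explicit);
}
```

The sketch treats every runtime that is neither the main runtime nor tagged as
UI as needing an explicit `present()`, which mirrors the conservative default
described in the plan: an untagged runtime such as a Vision Camera frame
processor is never auto-presented.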
 README.md                                     | 15 +++-
 apps/example/src/Reanimated/Reanimated.tsx    |  3 +
 .../example/src/VisionCamera/VisionCamera.tsx |  3 +
 docs/refactor-async-present-plan.md           | 65 +++++++++------
 packages/webgpu/README.md                     | 15 +++-
 .../cpp/rnwgpu/api/GPUCanvasContext.cpp       | 80 +++++++++++--------
 .../webgpu/cpp/rnwgpu/api/GPUCanvasContext.h  | 12 +--
 packages/webgpu/src/Canvas.tsx                | 15 +++-
 packages/webgpu/src/Offscreen.ts              |  4 +
 packages/webgpu/src/WebPolyfillGPUModule.ts   |  5 +-
 packages/webgpu/src/types.ts                  | 15 +++-
 11 files changed, 159 insertions(+), 73 deletions(-)

diff --git a/README.md b/README.md
index d7415053b..433d498fa 100644
--- a/README.md
+++ b/README.md
@@ -172,7 +172,7 @@ ctx.canvas.height = ctx.canvas.clientHeight * PixelRatio.get();
 
 ### Frame Scheduling
 
-Frame presentation is automatic. Once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). There is no `present()` call.
+On the **main JS runtime** and the **Reanimated UI runtime**, frame presentation is automatic: once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). No `present()` call is needed on these runtimes.
 
 ```tsx
 // draw
@@ -181,6 +181,19 @@ device.queue.submit([commandEncoder.finish()]);
 // The frame is presented automatically on the next vsync.
 ```
 
+When you render from a **dedicated worklet runtime** (e.g. `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor), your code runs on that runtime's own thread, where present can't be driven automatically. Call `context.present()` yourself after submitting:
+
+```tsx
+const onFrame = () => {
+  "worklet";
+  // draw on the dedicated runtime's thread
+  device.queue.submit([commandEncoder.finish()]);
+  context.present(); // required on dedicated worklet runtimes; a no-op on JS/UI
+};
+```
+
+`present()` is safe to call from a worklet that runs on either the UI runtime or a dedicated runtime: it presents on the dedicated runtime and does nothing on the JS/UI runtimes (which auto-present).
+
 ### Canvas Transparency
 
 On Android, the `alphaMode` property is ignored when configuring the canvas.
diff --git a/apps/example/src/Reanimated/Reanimated.tsx b/apps/example/src/Reanimated/Reanimated.tsx
index 2f8b5e5cb..3761c90f9 100644
--- a/apps/example/src/Reanimated/Reanimated.tsx
+++ b/apps/example/src/Reanimated/Reanimated.tsx
@@ -78,6 +78,9 @@ export const webGPUDemo = (
     passEncoder.end();
 
     device.queue.submit([commandEncoder.finish()]);
+    // Needed on a dedicated worklet runtime (DedicatedThread); a no-op on the
+    // UI runtime (UIThread), where present is automatic.
+    context.present();
 
     if (runAnimation.value) {
       requestAnimationFrame(frame);
diff --git a/apps/example/src/VisionCamera/VisionCamera.tsx b/apps/example/src/VisionCamera/VisionCamera.tsx
index cba2d2948..f6c6c95bd 100644
--- a/apps/example/src/VisionCamera/VisionCamera.tsx
+++ b/apps/example/src/VisionCamera/VisionCamera.tsx
@@ -613,6 +613,9 @@ const CameraView = () => {
           pass.draw(3);
           pass.end();
           device.queue.submit([encoder.finish()]);
+          // Vision Camera frame processors run on a dedicated worklet runtime,
+          // so present explicitly (auto-present only covers the JS/UI runtime).
+          context.present();
           // The work sampling it is submitted, so end the external texture's
           // access window now to release the camera frame's surface promptly
           // (don't wait for GC, which would starve the frame buffer pool).
diff --git a/docs/refactor-async-present-plan.md b/docs/refactor-async-present-plan.md
index 65490af29..82e0de054 100644
--- a/docs/refactor-async-present-plan.md
+++ b/docs/refactor-async-present-plan.md
@@ -308,36 +308,49 @@ happened to land on the right thread), but the **dedicated `createWorkletRuntime
 (`Reanimated/DedicatedThread.tsx`, `runOnRuntime`) crashed — its render thread is its own, so a
 main-thread present violated Dawn surface thread-affinity.
 
-**Decision (confirmed with the user): self-scheduled present, no native worklets dependency.**
-Rather than link `react-native-worklets` natively and have the FrameDriver dispatch via
-`WorkletRuntime::schedule` (the original plan / Spike 1 primary), worklet runtimes now schedule
-their own present on their own event loop. This avoids a new native build dependency entirely
-and is fully buildable/validatable locally (it is Spike 1's documented "JS-scheduling"
-contingency).
-
-Implementation (native only; no JS/build-system changes):
+**Decision (confirmed with the user): auto-present on the JS + UI runtimes, explicit
+`ctx.present()` on dedicated worklet runtimes. No native worklets dependency.** Rather than link
+`react-native-worklets` natively and dispatch via `WorkletRuntime::schedule` (the original plan /
+Spike 1 primary), the FrameDriver covers the JS and UI runtimes; dedicated runtimes — which run
+on their own thread with no safe scheduler/vsync hook — keep an explicit `present()`. (A
+scheduler-free auto path for dedicated runtimes was prototyped but rejected — see below — because
+it added one frame of latency and never presented a one-shot frame.) This needs no new native
+build dependency and is fully buildable/validatable locally.
+
+Implementation:
 - `GPUCanvasContext::getCurrentTexture` switched to the full-control HostFunction signature
   (`jsi::Value(rt, thisVal, args, count)`, same pattern as `RNWebGPU::createImageBitmap`) so it
-  learns the **calling** runtime. New `schedulePresent(runtime)`:
+  learns the **calling** runtime. Present routing:
   - **Main runtime** (`RuntimeContext::get(runtime)` is non-null): unchanged — register with the
     global vsync `FrameDriver` using that runtime's scheduler.
-  - **Any worklet runtime** (no `RuntimeContext` — Reanimated UI/dedicated, Vision Camera frame
-    processors, …): **present-on-next-acquire**. `getCurrentTexture` presents the *previous*
-    frame synchronously (inline, on the calling thread) just before acquiring the next texture;
-    by then the previous frame's submit has happened, and present runs on the same thread that
-    rendered it. This is the natural swapchain boundary and needs no scheduler.
-
-    Why not schedule onto the runtime's own loop: two earlier attempts failed. (1)
-    `queueMicrotask` is **disabled** on worklet runtimes (throws "microtasks are disabled in this
-    runtime"). (2) `setImmediate`/`setTimeout` exist but route through the runtime's `EventLoop`
-    `AsyncQueue`, which for **Vision Camera** is a custom `NativeThreadAsyncQueue` that hops back
-    through JNI (`fbjni Environment::current()`) and **crashes** when pushed from a
-    non-JVM-attached thread. Present-on-next-acquire avoids the runtime's task queue entirely.
-    Trade-off: one frame of latency, and a worklet that renders exactly once would not present
-    its single frame (continuous loops — rAF, camera frames — are unaffected; the main runtime's
-    one-shot case is covered by the FrameDriver).
-- `Reanimated.tsx` already had `present()` removed in Phase 2; `DedicatedThread.tsx` /
-  `UIThread.tsx` need no changes.
+  - **Reanimated UI runtime** (`globalThis.__RUNTIME_KIND === 2`, worklets' `RuntimeKind::UI`):
+    also auto-present via the FrameDriver + main scheduler. The UI runtime is reached correctly
+    by this path (Phase 2 confirmed it), so no `present()` is needed.
+  - **Dedicated worklet runtimes** (`RuntimeKind::Worker`, or any untagged/unknown worklet
+    runtime — e.g. Vision Camera frame processors): **explicit `ctx.present()`**, kept in the
+    public API for exactly this case. They run on their own thread with no safe scheduler/vsync
+    hook, so present is called synchronously by the author after `submit`, on that thread
+    (preserving Dawn surface thread-affinity).
+
+  `ctx.present()` is a **no-op on the JS / UI runtime** (they auto-present), which makes it safe
+  to call from a worklet shared between the UI and a dedicated runtime (the example's
+  `webGPUDemo`). Runtime classification uses `RuntimeContext::get(rt)` (main) and the stable
+  worklets global `__RUNTIME_KIND` (`ReactNative=1`, `UI=2`, `Worker=3`); no worklets headers
+  are linked.
+
+  Two scheduler-based approaches were tried and rejected before landing here: (1)
+  `queueMicrotask` is **disabled** on worklet runtimes (throws); (2) `setImmediate`/`setTimeout`
+  exist but route through the runtime's `EventLoop` `AsyncQueue`, which for **Vision Camera** is
+  a custom `NativeThreadAsyncQueue` that hops through JNI (`fbjni Environment::current()`) and
+  **crashes** when pushed from a non-JVM-attached thread. A scheduler-free
+  "present-on-next-acquire" fallback worked everywhere but added one frame of latency and never
+  presented a one-shot frame, so the explicit-`present()`-on-dedicated split was chosen instead.
+- JS surface: `present()` re-added to `RNCanvasContext` (`src/Canvas.tsx`, `src/types.ts`,
+  documented dedicated-only) and as a no-op on `Offscreen.ts` / `WebPolyfillGPUModule.ts`. Native
+  `GPUCanvasContext::present` re-added (full-control signature; no-op on auto-presented runtimes).
+- Examples: `present()` re-added to `Reanimated/Reanimated.tsx`'s shared `webGPUDemo` (no-op on
+  UIThread, real on DedicatedThread) and to `VisionCamera.tsx`'s frame processor. Both READMEs'
+  "Frame Scheduling" sections document the JS/UI-auto vs dedicated-manual split.
 
 Known limitation (out of scope, examples don't hit it): **async ops** (`mapAsync`,
 `onSubmittedWorkDone`, …) invoked *on a worklet runtime* still settle their Promise via the
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index d7415053b..433d498fa 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -172,7 +172,7 @@ ctx.canvas.height = ctx.canvas.clientHeight * PixelRatio.get();
 
 ### Frame Scheduling
 
-Frame presentation is automatic. Once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). There is no `present()` call.
+On the **main JS runtime** and the **Reanimated UI runtime**, frame presentation is automatic: once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). No `present()` call is needed on these runtimes.
 
 ```tsx
 // draw
@@ -181,6 +181,19 @@ device.queue.submit([commandEncoder.finish()]);
 // The frame is presented automatically on the next vsync.
 ```
 
+When you render from a **dedicated worklet runtime** (e.g. `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor), your code runs on that runtime's own thread, where present can't be driven automatically. Call `context.present()` yourself after submitting:
+
+```tsx
+const onFrame = () => {
+  "worklet";
+  // draw on the dedicated runtime's thread
+  device.queue.submit([commandEncoder.finish()]);
+  context.present(); // required on dedicated worklet runtimes; a no-op on JS/UI
+};
+```
+
+`present()` is safe to call from a worklet that runs on either the UI runtime or a dedicated runtime: it presents on the dedicated runtime and does nothing on the JS/UI runtimes (which auto-present).
+
 ### Canvas Transparency
 
 On Android, the `alphaMode` property is ignored when configuring the canvas.
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
index 2eb76c0b4..c4390ba6d 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
@@ -6,6 +6,29 @@
 
 namespace rnwgpu {
 
+namespace {
+// Runtimes whose present is automatic (no ctx.present() needed): the main JS
+// runtime and the Reanimated UI runtime. Both are reached correctly by the
+// global vsync FrameDriver dispatching through the main runtime's scheduler.
+// Dedicated worklet runtimes (createWorkletRuntime, Vision Camera frame
+// processors, …) run on their own thread with no safe scheduler hook, so they
+// present explicitly via ctx.present().
+bool isAutoPresentedRuntime(jsi::Runtime &runtime) {
+  if (async::RuntimeContext::get(runtime) != nullptr) {
+    return true; // main JS runtime
+  }
+  // Worklets tags every runtime with a numeric `__RUNTIME_KIND`
+  // (worklets::RuntimeKind: ReactNative=1, UI=2, Worker=3). Auto-present only
+  // the UI runtime; treat Worker / unknown / untagged as needing ctx.present().
+  auto kind = runtime.global().getProperty(runtime, "__RUNTIME_KIND");
+  if (kind.isNumber()) {
+    constexpr int kRuntimeKindUI = 2;
+    return static_cast<int>(kind.asNumber()) == kRuntimeKindUI;
+  }
+  return false;
+}
+} // namespace
+
 void GPUCanvasContext::configure(
     std::shared_ptr<GPUCanvasConfiguration> configuration) {
   Convertor conv;
@@ -36,11 +59,6 @@ jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
                                                const jsi::Value & /*thisValue*/,
                                                const jsi::Value * /*args*/,
                                                size_t /*count*/) {
-  // Main JS runtime owns a RuntimeContext; worklet runtimes (Reanimated UI /
-  // dedicated, Vision Camera frame processors, …) do not.
-  auto runtimeContext = async::RuntimeContext::get(runtime);
-  const bool isMainRuntime = runtimeContext != nullptr;
-
   auto prevSize = _surfaceInfo->getConfig();
   auto width = _canvas->getWidth();
   auto height = _canvas->getHeight();
@@ -49,39 +67,21 @@ jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
     _surfaceInfo->reconfigure(width, height);
   }
 
-  // Worklet-runtime auto-present: present the PREVIOUS frame synchronously on
-  // this thread, just before acquiring the next texture. By now that frame's
-  // submit has already happened (during the previous frame's work), and this
-  // runs on the same thread that did getCurrentTexture/submit — preserving Dawn
-  // surface thread-affinity. We can't use the UI-thread FrameDriver here, and
-  // scheduling onto the worklet runtime's own task queue is unsafe in general
-  // (e.g. Vision Camera's queue hops through JNI and crashes off the JS
-  // thread), so we present inline at the natural swapchain boundary instead.
-  if (!isMainRuntime && _hasUnpresentedFrame && _surfaceInfo->hasSurface()) {
-    _surfaceInfo->presentFrame();
-    _hasUnpresentedFrame = false;
-  }
-
   auto texture = _surfaceInfo->getCurrentTexture();
 
   auto size = _surfaceInfo->getSize();
   _canvas->setClientWidth(size.width);
   _canvas->setClientHeight(size.height);
 
-  // Auto-present: acquiring the current texture arranges for this frame to be
-  // presented (spec-aligned "update the rendering" after the frame). Replaces
-  // the old explicit context.present(). Offscreen surfaces have no
-  // wgpu::Surface, so skip them (their texture is read back directly).
-  if (_surfaceInfo->hasSurface()) {
-    if (isMainRuntime) {
-      // Main runtime: drive present from the global vsync FrameDriver (handles
-      // one-shot renders too, since it presents the current frame at vsync).
-      FrameDriver::getInstance().requestPresent(_contextId, _surfaceInfo,
-                                                runtimeContext->scheduler());
-    } else {
-      // Worklet runtime: present at the next acquire (see above).
-      _hasUnpresentedFrame = true;
-    }
+  // Auto-present on the JS / UI runtime: acquiring the current texture
+  // schedules a present for this surface at the next vsync (spec-aligned
+  // "update the rendering" after the frame), dispatched through the main
+  // runtime's scheduler. Dedicated worklet runtimes instead call ctx.present()
+  // explicitly on their own thread. Offscreen surfaces have no wgpu::Surface,
+  // so skip them (their texture is read back directly).
+  if (_surfaceInfo->hasSurface() && isAutoPresentedRuntime(runtime)) {
+    FrameDriver::getInstance().requestPresent(_contextId, _surfaceInfo,
+                                              _gpu->getContext()->scheduler());
   }
 
   // Pass reportsMemoryPressure=false to avoid triggering spurious Hermes GC
@@ -90,4 +90,20 @@ jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
   return JSIConverter<std::shared_ptr<GPUTexture>>::toJSI(runtime, gpuTexture);
 }
 
+jsi::Value GPUCanvasContext::present(jsi::Runtime &runtime,
+                                     const jsi::Value & /*thisValue*/,
+                                     const jsi::Value * /*args*/,
+                                     size_t /*count*/) {
+  // Only meaningful on a dedicated worklet runtime, where present can't be
+  // automated. On the JS / UI runtime present is automatic, so this is a no-op
+  // there — which makes it safe to call from a worklet shared between the UI
+  // runtime and a dedicated runtime. Presents synchronously on the calling
+  // thread (the one that did getCurrentTexture / submit), preserving Dawn
+  // surface thread-affinity.
+  if (!isAutoPresentedRuntime(runtime) && _surfaceInfo->hasSurface()) {
+    _surfaceInfo->presentFrame();
+  }
+  return jsi::Value::undefined();
+}
+
 } // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
index bdf6bee8c..a2e80b7cc 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
@@ -47,27 +47,27 @@ class GPUCanvasContext : public NativeObject<GPUCanvasContext> {
                   &GPUCanvasContext::unconfigure);
     installMethod(runtime, prototype, "getCurrentTexture",
                   &GPUCanvasContext::getCurrentTexture);
+    installMethod(runtime, prototype, "present", &GPUCanvasContext::present);
   }
 
   // TODO: is this ok?
   inline const wgpu::Surface get() { return nullptr; }
   void configure(std::shared_ptr<GPUCanvasConfiguration> configuration);
   void unconfigure();
-  // Full-control signature so we can learn the *calling* runtime and route the
-  // auto-present onto its own thread (main runtime → FrameDriver vsync; worklet
-  // runtime → presented inline at the next getCurrentTexture).
+  // Full-control signatures so we can learn the *calling* runtime and decide
+  // how this frame is presented (auto on the JS / UI runtime; explicit
+  // ctx.present() on a dedicated worklet runtime).
   jsi::Value getCurrentTexture(jsi::Runtime &runtime,
                                const jsi::Value &thisValue,
                                const jsi::Value *args, size_t count);
+  jsi::Value present(jsi::Runtime &runtime, const jsi::Value &thisValue,
+                     const jsi::Value *args, size_t count);
 
 private:
   int _contextId;
   std::shared_ptr<Canvas> _canvas;
   std::shared_ptr<SurfaceInfo> _surfaceInfo;
   std::shared_ptr<GPU> _gpu;
-  // For worklet-runtime auto-present: true when a frame was acquired on a
-  // worklet runtime and not yet presented (presented at the next acquire).
-  bool _hasUnpresentedFrame = false;
 };
 
 } // namespace rnwgpu
diff --git a/packages/webgpu/src/Canvas.tsx b/packages/webgpu/src/Canvas.tsx
index 7c2a47a6e..43c9621e7 100644
--- a/packages/webgpu/src/Canvas.tsx
+++ b/packages/webgpu/src/Canvas.tsx
@@ -19,9 +19,18 @@ export interface NativeCanvas {
   clientHeight: number;
 }
 
-// Auto-present (a global vsync FrameDriver) replaces the old manual present();
-// the native context is now just a spec GPUCanvasContext.
-export type RNCanvasContext = GPUCanvasContext;
+export type RNCanvasContext = GPUCanvasContext & {
+  /**
+   * Present the current frame.
+   *
+   * Only needed when rendering from a **dedicated worklet runtime** (e.g.
+   * `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame
+   * processor), which runs on its own thread. On the main JS runtime and the
+   * Reanimated UI runtime, present is automatic (driven by a global vsync), so
+   * calling this there is a no-op. Call it after `queue.submit()`.
+   */
+  present: () => void;
+};
 
 export interface CanvasRef {
   getContextId: () => number;
diff --git a/packages/webgpu/src/Offscreen.ts b/packages/webgpu/src/Offscreen.ts
index 6ce2f589c..4deab8a1c 100644
--- a/packages/webgpu/src/Offscreen.ts
+++ b/packages/webgpu/src/Offscreen.ts
@@ -64,6 +64,10 @@ class GPUOffscreenCanvasContext implements GPUCanvasContext {
     throw new Error("Method not implemented.");
   }
 
+  present() {
+    // Offscreen contexts have nothing to present; readback is via getImageData.
+  }
+
   getDevice() {
     if (!this.device) {
       throw new Error("Device is not configured.");
diff --git a/packages/webgpu/src/WebPolyfillGPUModule.ts b/packages/webgpu/src/WebPolyfillGPUModule.ts
index 04229cd05..8b629a0c9 100644
--- a/packages/webgpu/src/WebPolyfillGPUModule.ts
+++ b/packages/webgpu/src/WebPolyfillGPUModule.ts
@@ -39,7 +39,10 @@ function makeWebGPUCanvasContext(
     canvas.setAttribute("height", pixelHeight);
   }
 
-  return canvas.getContext("webgpu")!;
+  const context = canvas.getContext("webgpu")!;
+  // On web there is no manual present; expose a no-op so RNCanvasContext's
+  // present() (used on native dedicated worklet runtimes) is callable here too.
+  return Object.assign(context, { present: () => {} });
 }
 
 // @ts-expect-error - polyfill for RNWebGPU native module
diff --git a/packages/webgpu/src/types.ts b/packages/webgpu/src/types.ts
index 0758c73f4..1608a4ff0 100644
--- a/packages/webgpu/src/types.ts
+++ b/packages/webgpu/src/types.ts
@@ -8,9 +8,18 @@ export interface NativeCanvas {
   clientHeight: number;
 }
 
-// Auto-present (a global vsync FrameDriver) replaces the old manual present();
-// the native context is now just a spec GPUCanvasContext.
-export type RNCanvasContext = GPUCanvasContext;
+export type RNCanvasContext = GPUCanvasContext & {
+  /**
+   * Present the current frame.
+   *
+   * Only needed when rendering from a **dedicated worklet runtime** (e.g.
+   * `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame
+   * processor), which runs on its own thread. On the main JS runtime and the
+   * Reanimated UI runtime, present is automatic (driven by a global vsync), so
+   * calling this there is a no-op. Call it after `queue.submit()`.
+   */
+  present: () => void;
+};
 
 export interface CanvasRef {
   getContextId: () => number;
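
Reviewer note on the routing in this patch: the decision made natively by `isAutoPresentedRuntime` can be sketched in TypeScript as below. This is only an illustration of the classification described in the C++ comments, not code from the package; `presentModeFor`, `PresentMode`, and `RUNTIME_KIND_UI` are hypothetical names, and the `hasMainRuntimeContext` flag stands in for the native `RuntimeContext::get(runtime) != nullptr` check.

```typescript
// Illustrative mirror of the native routing; all names here are hypothetical.
// worklets::RuntimeKind values as quoted in the patch: ReactNative=1, UI=2, Worker=3.
const RUNTIME_KIND_UI = 2;

type PresentMode = "auto" | "explicit";

function presentModeFor(
  hasMainRuntimeContext: boolean,
  runtimeKind: unknown
): PresentMode {
  if (hasMainRuntimeContext) {
    return "auto"; // main JS runtime: presented by the vsync FrameDriver
  }
  // Only a numeric tag equal to UI auto-presents; Worker, unknown, and
  // untagged runtimes must call ctx.present() themselves.
  if (
    typeof runtimeKind === "number" &&
    Math.trunc(runtimeKind) === RUNTIME_KIND_UI
  ) {
    return "auto";
  }
  return "explicit";
}
```

Under this sketch, a worklet shared between the UI runtime and a dedicated runtime can call `present()` unconditionally: the call only has an effect where the mode is `"explicit"`.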

From 0696aaa049c3ad89bb1b3c51535e442ea52fe82c Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Tue, 2 Jun 2026 18:00:49 +0200
Subject: [PATCH 08/25] Delete docs/refactor-async-present-plan.md

---
 docs/refactor-async-present-plan.md | 442 ----------------------------
 1 file changed, 442 deletions(-)
 delete mode 100644 docs/refactor-async-present-plan.md

diff --git a/docs/refactor-async-present-plan.md b/docs/refactor-async-present-plan.md
deleted file mode 100644
index 82e0de054..000000000
--- a/docs/refactor-async-present-plan.md
+++ /dev/null
@@ -1,442 +0,0 @@
-# Refactor: event-driven async + auto-present
-
-Status: **Phases 1–3 complete (local build/lint green). Phase 4 (SurfaceRegistry rework) proposed; Phase 5 = on-device validation.**
-Branch: `claude/keen-darwin-xeywa`
-
-This document is the handoff for moving the async + present refactor forward. Phase 0
-(spikes) needs a real local machine: installed `node_modules`, a Dawn build, and the
-iOS/Android toolchains. Everything below the "How to resume locally" section is meant to
-be executed on your computer, not in the web container.
-
----
-
-## Goals (locked)
-
-- **Async**: replace the JS-thread polling loop with a **background `WaitAny` GPU thread**
-  (Dawn `TimedWaitAny` is already enabled — `packages/webgpu/cpp/rnwgpu/api/GPU.cpp:17-23`).
-- **Present**: **remove `context.present()` entirely** (breaking) in favor of a **global
-  Choreographer / CADisplayLink-driven auto-present**.
-- **Scope**: first-class for **all runtimes** — main JS, the reanimated UI runtime, and
-  `createWorkletRuntime` dedicated runtimes.
-
----
-
-## What exists today (the two problems)
-
-### Async (polling) — `packages/webgpu/cpp/rnwgpu/async/`
-- Every async op (`requestAdapter`, `requestDevice`, `mapAsync`, `onSubmittedWorkDone`,
-  `createRender/ComputePipelineAsync`, `popErrorScope`) registers a Dawn callback with
-  `CallbackMode::AllowProcessEvents` and calls `AsyncRunner::postTask`.
-- `AsyncRunner::requestTick` (`async/AsyncRunner.cpp:89-177`) schedules `tick()` via
-  `setImmediate` / `setTimeout(4ms)` / `queueMicrotask`; `tick()` calls
-  `_instance.ProcessEvents()` and **re-schedules itself while any task is "pumping"**
-  (`AsyncRunner.cpp:189-191`). This is a busy reschedule loop: wasted CPU when idle, added
-  latency, and `JSIMicrotaskDispatcher`'s `queueMicrotask` dispatch is only thread-safe when
-  called on the runtime's own thread.
-
-### Present (manual, non-standard)
-`api/GPUCanvasContext.cpp:56-65` → `SurfaceRegistry.h:116-121` → `wgpu::Surface::Present()`.
-The user must call `context.present()` after every `queue.submit` (**16 JS/TS call sites**).
-No CADisplayLink/Choreographer exists; RN's `requestAnimationFrame` is the only frame driver.
-On Apple, present also does a blocking `WaitForCommandsToBeScheduled` on the JS thread.
-
----
-
-## Target architecture
-
-Three new pieces:
-
-### A. `RuntimeScheduler` — thread-safe "post to this runtime's JS thread"
-Replaces `AsyncDispatcher` / `JSIMicrotaskDispatcher` (which use non-thread-safe
-`queueMicrotask`).
-- Interface: `void scheduleOnJS(std::function<void(jsi::Runtime&)>)`, callable from any thread.
-- **Main runtime**: wraps `react::CallInvoker::invokeAsync` (already available —
-  `apple/WebGPUModule.mm:70`, `android/cpp/cpp-adapter.cpp:25-29`).
-- **Worklet runtimes**: wraps the worklet runtime's own thread executor from
-  `react-native-worklets` 0.8.3 (**see Phase 0 spike #1**).
-- Stored per-runtime in a `RuntimeContext` (the "per-JS-thread event loop"), created on first
-  WebGPU use, torn down via the existing `RuntimeLifecycleMonitor` / `RuntimeAwareCache`
-  (`cpp/jsi/RuntimeAwareCache.h`).
-
-### B. `GpuEventLoop` — background `WaitAny` thread (no polling)
-One per `wgpu::Instance` (effectively global).
-- All async sites switch `CallbackMode::AllowProcessEvents` → **`CallbackMode::WaitAnyOnly`**,
-  returning a `wgpu::Future`.
-- A **small bounded thread pool**; each pending future is waited via
-  `instance.WaitAny(future, /*timeout*/UINT64_MAX)` on a pool thread → genuinely event-driven,
-  **zero idle CPU**, resolves the instant GPU work completes. No wake/interrupt problem (each
-  thread owns one future). **See Phase 0 spike #2.**
-- On completion the worker marshals the result and calls the owning runtime's
-  `RuntimeScheduler.scheduleOnJS` to settle the JS Promise. `AsyncTaskHandle` / `Promise`
-  settle logic is reused; `AsyncRunner` + its tick loop are deleted.
-- Fallback (if concurrent `WaitAny` on one instance is unsafe): single worker thread waiting on
-  the batched future set with a condition-variable re-arm.
-
-### C. `FrameDriver` — global vsync source for auto-present
-One UI-thread singleton; removes the need for `present()`.
-- **iOS**: `CADisplayLink` on the main run loop. **Android**: NDK
-  `AChoreographer_postFrameCallback` from C++ (API 24+, avoids JNI). **See Phase 0 spike #3.**
-- Lifecycle: started when ≥1 surface is configured, stopped at 0.
-- **Auto-present semantics** (spec-aligned "update the rendering" after rAF):
-  1. `GPUCanvasContext::getCurrentTexture()` marks its `SurfaceInfo` dirty and registers a
-     present request with `FrameDriver`, tagged with the owning runtime.
-  2. Each vsync (UI thread), `FrameDriver` dispatches each dirty context's present onto its
-     **owning runtime's `RuntimeScheduler`** — so `Surface.Present()` + the Apple Metal
-     scheduling wait run on the same thread that did `getCurrentTexture` / `submit`, preserving
-     Dawn surface thread-affinity and guaranteeing present-after-submit ordering (FIFO on that
-     loop). Clear dirty after present.
-- Offscreen path (`SurfaceRegistry` `switchToOffscreen`, `src/Offscreen.ts`) has no surface →
-  present is a no-op; tests keep reading back the CPU texture.
-
----
-
-## Phase 0 — Local spikes (DO THESE FIRST, on your machine)
-
-These de-risk the refactor before any large change. Run from repo root.
-
-```bash
-# 0. install deps (web container can't do this)
-yarn install
-```
-
-### Spike 1 — worklet-runtime scheduler (HIGHEST RISK)
-Goal: obtain a **thread-safe** "schedule this lambda on runtime R's thread" for an arbitrary
-worklet runtime (UI runtime + a `createWorkletRuntime` runtime) using
-`react-native-worklets@0.8.3`.
-
-```bash
-# inspect the worklets native API actually shipped at 0.8.3
-find node_modules/react-native-worklets -name "*.h" | grep -iE "Runtime|Scheduler|Invoker|Queue"
-# look for: WorkletRuntime, RuntimeManager / WorkletsModuleProxy, UIScheduler / JSScheduler,
-# and any per-runtime executor / async queue we can call from a background C++ thread.
-```
-Deliverable: a one-paragraph note on the exact symbol(s) to use (or "not exposed → needs JS
-shim / worklets PR"). This determines whether Phase 3 (first-class worklet runtimes) is cheap
-or needs a workaround.
-
-### Spike 2 — concurrent `WaitAny` on one Dawn instance
-Goal: confirm multiple threads can each call `instance.WaitAny(singleFuture, UINT64_MAX)`
-concurrently on the **same** instance safely. If not, switch `GpuEventLoop` to the
-single-worker + condition-variable fallback.
-- Search Dawn headers/docs in `externals/dawn` (or built `libs/`) for `WaitAny` threading
-  guarantees. A tiny throwaway C++ test against the built Dawn is ideal.
-
-### Spike 3 — Android frame callback
-Goal: confirm NDK `AChoreographer_postFrameCallback` is usable at the project `minSdk`
-(`packages/webgpu/android/build.gradle`). If `minSdk < 24` for that API, plan the Java
-`Choreographer` + JNI bridge instead.
-
----
-
-## Phase 0 — Findings (completed 2026-06-02, branch `claude/keen-darwin-xeywa`)
-
-Environment verified: `node_modules` installed, `externals/dawn` present, RN **0.81.4**,
-`react-native-worklets` **0.8.3**, Android `minSdk` **26**, NDK 26/27 available.
-
-### Spike 1 — worklet-runtime scheduler → **GREEN (symbol exists, thread-safe)**
-`worklets/WorkletRuntime/WorkletRuntime.h` exposes exactly what we need:
-- `WorkletRuntime::schedule(std::function<void(jsi::Runtime &)> job)` — posts `job` onto the
-  runtime's own `AsyncQueue` (`WorkletRuntime.cpp:211-227`). It is **callable from any thread**
-  (the underlying `AsyncQueueImpl` is a mutex+condvar queue; `AsyncQueueUI` forwards to the
-  `UIScheduler`). The job runs on the runtime's event-loop thread, under `runtimeMutex_`, and
-  uses `weak_from_this()` so it is a **safe no-op if the runtime was torn down**. This is a
-  drop-in for `RuntimeScheduler::scheduleOnJS` for worklet runtimes.
-- `WorkletRuntime::getWeakRuntimeFromJSIRuntime(jsi::Runtime &rt)` (RN ≥ 0.81, we have 0.81.4)
-  maps a bare `jsi::Runtime&` → `weak_ptr<WorkletRuntime>`, so the per-runtime
-  `RuntimeContext` can recover the scheduler from any worklet runtime (UI + dedicated
-  `createWorkletRuntime`) with no JS shim.
-
-**Caveat (build wiring, not API):** webgpu does **not** currently link worklets natively
-(no worklets entry in `packages/webgpu/*.podspec` or `android/CMakeLists.txt`; only JS-level
-serialization helpers exist). Phase 3 must add the native dependency:
-- iOS: depend on `RNWorklets` pod (it ships public headers under `worklets/`,
-  `header_dir = "worklets"`).
-- Android: import the worklets **prefab** module `worklets` (`prefabPublishing` is on in
-  `react-native-worklets/android/build.gradle`).
-Worklets is already a `peerDependency`, so this adds no new install. Phase 3 stays cheap; no
-worklets PR or JS shim needed.
-
-### Spike 2 — concurrent `WaitAny` on one instance → **GREEN (designed for it)**
-Dawn's native `EventManager` (`externals/dawn/src/dawn/native/EventManager.{h,cpp}`) is built
-for multi-threaded waits:
-- State is `MutexProtected<EventState>`; `mNextFutureID` is atomic; a code comment
-  (`EventManager.h:78-82`) explicitly notes "another thread can race to complete the event …
-  via a WaitAny call".
-- Each `WaitAny` call with a non-zero timeout creates a **stack-local `Waiter`** with its **own**
-  `MutexCondVarProtected<bool>` (`EventManager.cpp:338`, `:106`), registers it per-FutureID in
-  the shared map, then blocks on its own condvar. `SetFutureReady` signals the registered
-  waiters. → **N threads can each block in `WaitAny` on the same instance concurrently, each
-  owning its own future.** This is exactly the plan's primary "one future per pool thread" model.
-
-**Hard constraint discovered (`EventManager.cpp:341-354`):** within a *single* `WaitAny` call
-with a non-zero timeout, you may **not** mix events from multiple queues, nor a queue event
-together with a non-queue event — it returns `WaitStatus::Error` ("Mixed source waits with
-timeouts are not currently supported"). Note `mapAsync`/`onSubmittedWorkDone` are *queue*
-events while `requestAdapter`/`requestDevice`/`createPipelineAsync`/`popErrorScope` are
-*non-queue* events.
-→ **Implication:** adopt the **per-future-per-thread** design (each pool thread waits on exactly
-one future) — it is single-source and always legal. The plan's stated fallback ("single worker
-waiting on the batched future set") is **not viable** as written, because batching mixed sources
-hits this restriction. If a bounded pool is undesirable, the correct fallback is one
-worker-thread *per future* (still single-source), not one worker for a batched set.
-
-### Spike 3 — Android frame callback → **GREEN (no JNI bridge needed)**
-In `android/choreographer.h`, `AChoreographer_getInstance()` and
-`AChoreographer_postFrameCallback()` are both `__INTRODUCED_IN(24)`; `minSdk` is **26**, so the
-pure-NDK path works with no Java `Choreographer`/JNI bridge.
-- `postFrameCallback` is `__DEPRECATED_IN(29)` in favor of `postFrameCallback64` (API 29) /
-  `postVsyncCallback` (API 33). Recommendation: call `postFrameCallback64` when
-  `android_get_device_api_level() >= 29`, else `postFrameCallback` (works on 26-28). Both are
-  acceptable; the 64-bit variant just avoids the deprecation warning and 32-bit time wrap.
-- `AChoreographer_getInstance()` must be called on a thread with a `Looper` (the main/UI
-  thread) — `FrameDriver` already lives on the UI thread, so this is satisfied.
-
-### Net go/no-go
-All three risks clear. Proceed to Phase 1. Two plan amendments: (1) Phase 3 must add the
-worklets native build dependency (podspec + prefab); (2) `GpuEventLoop` must use
-per-future-per-thread waits (drop the batched-future fallback).
-
-## Implementation phases (after Phase 0)
-
-**Phase 1 — Event-driven async** (no public API change; `present()` untouched) — **DONE**
-- Add `RuntimeScheduler` (+ main-runtime CallInvoker impl) and `GpuEventLoop`.
-- Switch all 7 async sites to `WaitAnyOnly` + `GpuEventLoop.addFuture(...)`:
-  `api/GPU.cpp`, `api/GPUAdapter.cpp`, `api/GPUDevice.cpp` (×3), `api/GPUBuffer.cpp`,
-  `api/GPUQueue.cpp`, `api/GPUShaderModule.cpp`.
-- Delete `async/AsyncRunner.*` polling + `async/JSIMicrotaskDispatcher.*`; keep
-  `AsyncTaskHandle` / `Promise` settle path on the new scheduler.
-
-### Phase 1 — what shipped (branch `claude/keen-darwin-xeywa`)
-New files (`cpp/rnwgpu/async/`):
-- `RuntimeScheduler.h` — interface `scheduleOnJS(std::function<void(jsi::Runtime&)>)`,
-  callable from any thread.
-- `CallInvokerScheduler.{h,cpp}` — main-runtime impl wrapping
-  `react::CallInvoker::invokeAsync(CallFunc&&)` (RN 0.81 delivers the job on the JS thread
-  with the runtime).
-- `GpuEventLoop.{h,cpp}` — background `WaitAny` driver. Lazily-grown bounded worker pool
-  (cap = `clamp(hardware_concurrency, 2, 8)`); each worker does a single-future
-  `instance.WaitAny(future, UINT64_MAX)` (always a legal single-source wait, per Phase 0
-  spike 2). Shared state held behind a `shared_ptr` so detached workers (and the
-  `wgpu::Instance` ref they need) outlive the object safely; teardown sets `running=false`
-  and notifies idle workers without joining in-flight GPU waits.
-
-Deviations from the original plan (intentional):
-1. **`AsyncRunner` was replaced by `RuntimeContext`** (`async/RuntimeContext.{h,cpp}`), the
-   per-runtime coordinator the plan's Target-architecture §A already named. It bundles
-   `{RuntimeScheduler, GpuEventLoop}` and exposes `postTask`; all polling internals
-   (`tick`/`requestTick`/`ProcessEvents`/pump counters) are gone. `AsyncTaskHandle` depends
-   only on `RuntimeScheduler`. The old `AsyncRunner` name/files no longer exist anywhere
-   (the 6 `api/*` classes now hold `std::shared_ptr<async::RuntimeContext> _async`); the dead
-   `GPU::getAsyncRunner()` accessor was deleted.
-2. **`postTask`'s callback now returns a `wgpu::Future`** (the value returned by the Dawn
-   `WaitAnyOnly` call), which `AsyncRunner` hands to `GpuEventLoop.addFuture`. A returned
-   future with `id == 0` means "no event to wait on" and is ignored — used by
-   `GPUDevice::getLost` (resolved synchronously or later via `notifyDeviceLost`). This
-   replaced the old `keepPumping` bool argument, which is gone.
-
-`GPU`'s constructor now takes the `CallInvoker` (threaded through from `RNWebGPUManager`,
-which already held it) to build the `CallInvokerScheduler`. `AsyncDispatcher.h` and
-`JSIMicrotaskDispatcher.{h,cpp}` deleted; `android/CMakeLists.txt` updated (iOS podspec
-globs `cpp/**` so it needs no change).
-
-Validation run locally: all changed + new TUs syntax-check under the Android NDK toolchain;
-the full `react-native-wgpu` native lib **compiles and links** for `arm64-v8a` (ninja);
-`cpplint` clean (project filters); `clang-format` (pinned 15.0.0) applied; `yarn tsc` passes
-(no TS changed). On-device runtime behaviour (frame pacing, zero idle CPU) is Phase 4.
-
-**Phase 2 — Auto-present + remove `present()`** — **DONE**
-- Add `FrameDriver` (iOS `CADisplayLink`, Android `AChoreographer`); wire
-  `getCurrentTexture` → register; vsync → dispatch present to owning runtime.
-- Remove `GPUCanvasContext::present` (`api/GPUCanvasContext.h:50,58`, `.cpp:56-65`) and
-  `SurfaceInfo::present` (`SurfaceRegistry.h:116-121`).
-- JS: drop `present` from `RNCanvasContext` (`src/Canvas.tsx:22-24`, `src/types.ts`).
-- Migrate all 16 example / `useWebGPU` call sites + `README.md` + `packages/webgpu/README.md`.
-
-### Phase 2 — what shipped (branch `claude/keen-darwin-xeywa`)
-New files:
-- `cpp/rnwgpu/FrameDriver.{h,cpp}` — global vsync auto-present coordinator. `requestPresent`
-  (from `getCurrentTexture`, JS thread) coalesces per `contextId`; `onVSync` (UI thread)
-  dispatches each pending surface's present onto its owning runtime's `RuntimeScheduler`
-  (`surface->presentFrame()`). Request-driven: starts the platform vsync on first request,
-  stops after `kMaxIdleFrames` (3) idle frames → zero idle CPU.
-- `apple/WebGPUFrameDriver.{h,mm}` — iOS/tvOS `CADisplayLink` on the main run loop (paused
-  toggled by start/stop). macOS uses `NSScreen.displayLinkWithTarget:` on 14+, else an
-  `NSTimer` fallback. Selector → `FrameDriver::onVSync()`.
-- `android/.../com/webgpu/WebGPUFrameDriver.java` — main-thread `Choreographer` driver;
-  `doFrame` → static `nativeOnVSync()` JNI → `FrameDriver::onVSync()`, reposts while running.
-
-Wiring:
-- `SurfaceInfo::present()` → `presentFrame()` (Apple `WaitForCommandsToBeScheduled` + Present,
-  no-op offscreen); added `SurfaceInfo::hasSurface()`. Metal extern moved to `SurfaceRegistry.h`.
-- `GPU::getContext()` re-exposes the per-runtime `RuntimeContext` (so the canvas can reach its
-  scheduler). `GPUCanvasContext` stores `_contextId`, registers the present in
-  `getCurrentTexture` (and now sets the canvas client size there), and dropped `present()` +
-  its JS binding.
-- iOS `WebGPUModule install` and Android `initializeNative` register `setPlatformVSync`. View
-  teardown (`MetalView dealloc`, Android `onSurfaceDestroy`) calls `FrameDriver::cancelPresent`.
-- JS: `RNCanvasContext` is now just `GPUCanvasContext` (`src/Canvas.tsx`, `src/types.ts`);
-  removed the no-op `present` from `Offscreen.ts` and `WebPolyfillGPUModule.ts`. 18 example
-  call sites (the plan's 16 + `VisionCamera`, `ImportExternalTexture`) and both READMEs migrated.
-
-Decisions / deviations:
-1. **Android vsync = Java `Choreographer` + JNI** (not pure NDK `AChoreographer`), chosen for
-   robustness — pure NDK needs a JNI hop to a Looper thread to bootstrap anyway. Confirmed with
-   the user.
-2. **`present()` hard-removed** (breaking), confirmed with the user.
-3. **Owning-runtime caveat (→ Phase 3):** `getCurrentTexture` currently dispatches present via
-   the **main** runtime's scheduler (`_gpu->getContext()`). Correct for main-JS rendering. The
-   Reanimated example renders on the **UI (worklet) runtime**, so its present is migrated (call
-   removed) but auto-present won't target the correct thread until Phase 3 tags the present with
-   the *calling* runtime and gives worklet runtimes their own `RuntimeScheduler`. Expect the
-   Reanimated/Dedicated examples to be visually broken between Phase 2 and Phase 3.
-
-Validation (local): `react-native-wgpu` native lib **compiles and links** for `arm64-v8a`
-(ninja, CMake picked up `FrameDriver.cpp`); `cpplint` clean; `clang-format` applied; `yarn tsc`
-and `yarn lint` pass for both `packages/webgpu` and `apps/example`. iOS `.mm` and the Java
-driver are not compiled locally (no iOS/gradle build run here) — review-only; needs a device
-build. On-device frame pacing / zero-idle-CPU verification is Phase 4.
-
-**Phase 3 — First-class worklet runtimes** — **DONE**
-- Worklet-runtime `RuntimeScheduler` impl (per Spike 1); verify auto-present dispatch on UI +
-  dedicated runtimes; update `apps/example/src/Reanimated/Reanimated.tsx` (drop `present()`,
-  keep its own rAF loop).
-
-### Phase 3 — what shipped (branch `claude/keen-darwin-xeywa`)
-Observed after Phase 2: the **UI-runtime** Reanimated example worked (the Reanimated UI runtime
-executes on the **main thread**, so dispatching its present to the main runtime's scheduler
-happened to land on the right thread), but the **dedicated `createWorkletRuntime`** example
-(`Reanimated/DedicatedThread.tsx`, `runOnRuntime`) crashed — its render thread is its own, so a
-main-thread present violated Dawn surface thread-affinity.
-
-**Decision (confirmed with the user): auto-present on the JS + UI runtimes, explicit
-`ctx.present()` on dedicated worklet runtimes. No native worklets dependency.** Rather than link
-`react-native-worklets` natively and dispatch via `WorkletRuntime::schedule` (the original plan /
-Spike 1 primary), the FrameDriver covers the JS and UI runtimes; dedicated runtimes — which run
-on their own thread with no safe scheduler/vsync hook — keep an explicit `present()`. (A
-scheduler-free auto path for dedicated runtimes was prototyped but rejected — see below — because
-it added one frame of latency and never presented a one-shot frame.) This needs no new native
-build dependency and is fully buildable/validatable locally.
-
-Implementation:
-- `GPUCanvasContext::getCurrentTexture` switched to the full-control HostFunction signature
-  (`jsi::Value(rt, thisVal, args, count)`, same pattern as `RNWebGPU::createImageBitmap`) so it
-  learns the **calling** runtime. Present routing:
-  - **Main runtime** (`RuntimeContext::get(runtime)` is non-null): unchanged — register with the
-    global vsync `FrameDriver` using that runtime's scheduler.
-  - **Reanimated UI runtime** (`globalThis.__RUNTIME_KIND === 2`, worklets' `RuntimeKind::UI`):
-    also auto-present via the FrameDriver + main scheduler. The UI runtime is reached correctly
-    by this path (Phase 2 confirmed it), so no `present()` is needed.
-  - **Dedicated worklet runtimes** (`RuntimeKind::Worker`, or any untagged/unknown worklet
-    runtime — e.g. Vision Camera frame processors): **explicit `ctx.present()`**, kept in the
-    public API for exactly this case. They run on their own thread with no safe scheduler/vsync
-    hook, so present is called synchronously by the author after `submit`, on that thread
-    (preserving Dawn surface thread-affinity).
-
-  `ctx.present()` is a **no-op on the JS / UI runtime** (they auto-present), which makes it safe
-  to call from a worklet shared between the UI and a dedicated runtime (the example's
-  `webGPUDemo`). Runtime classification uses `RuntimeContext::get(rt)` (main) and the stable
-  worklets global `__RUNTIME_KIND` (`ReactNative=1`, `UI=2`, `Worker=3`); no worklets headers
-  are linked.
-
-  Two scheduler-based approaches were tried and rejected before landing here: (1)
-  `queueMicrotask` is **disabled** on worklet runtimes (throws); (2) `setImmediate`/`setTimeout`
-  exist but route through the runtime's `EventLoop` `AsyncQueue`, which for **Vision Camera** is
-  a custom `NativeThreadAsyncQueue` that hops through JNI (`fbjni Environment::current()`) and
-  **crashes** when pushed from a non-JVM-attached thread. A scheduler-free
-  "present-on-next-acquire" fallback worked everywhere but added one frame of latency and never
-  presented a one-shot frame, so the explicit-`present()`-on-dedicated split was chosen instead.
-- JS surface: `present()` re-added to `RNCanvasContext` (`src/Canvas.tsx`, `src/types.ts`,
-  documented dedicated-only) and as a no-op on `Offscreen.ts` / `WebPolyfillGPUModule.ts`. Native
-  `GPUCanvasContext::present` re-added (full-control signature; no-op on auto-presented runtimes).
-- Examples: `present()` re-added to `Reanimated/Reanimated.tsx`'s shared `webGPUDemo` (no-op on
-  UIThread, real on DedicatedThread) and to `VisionCamera.tsx`'s frame processor. Both READMEs'
-  "Frame Scheduling" sections document the JS/UI-auto vs dedicated-manual split.
-
-Known limitation (out of scope, examples don't hit it): **async ops** (`mapAsync`,
-`onSubmittedWorkDone`, …) invoked *on a worklet runtime* still settle their Promise via the
-object's creation-runtime context (main), not the calling worklet runtime — the example worklets
-only do synchronous rendering + present (device/adapter are created on the main runtime). Routing
-async settlement to the calling runtime would need the same calling-runtime detection applied to
-the 7 async sites; deferred until a use case needs it.
-
-Validation (local): native lib **compiles + links** for `arm64-v8a`; `cpplint` clean;
-`clang-format` applied; `yarn tsc`/`yarn lint` unaffected (no JS changed). On-device
-verification of the dedicated-worklet example is for the maintainer.
-
-**Phase 4 — `SurfaceRegistry` / surface-model rework** (proposed)
-The `SurfaceInfo` / `SurfaceRegistry` model (`cpp/rnwgpu/SurfaceRegistry.h`) predates the
-event-driven + auto-present work and is now the rough edge. Candidate improvements to scope:
-- **Surface thread-affinity.** Surface lifecycle (`configure`/`switchToOnscreen`/
-  `switchToOffscreen`/`resize`) runs on the **UI thread** (native view callbacks) while
-  `getCurrentTexture`/`presentFrame` run on the **owning runtime's render thread**. A single
-  `shared_mutex` serializes them but they're still cross-thread against a Dawn surface that
-  prefers single-thread access. Consider routing all surface ops through the owning runtime
-  (e.g. via the `RuntimeScheduler`), making affinity structural rather than lock-guarded.
-- **State clarity.** The on-screen-`surface` vs offscreen-`texture` duality is encoded as
-  `if (surface) … else …` branches throughout; a small explicit state (Offscreen / Onscreen)
-  would remove the implicit coupling and the `switchToOnscreen` flush path's validation cost
-  (its existing `// TODO: faster way without validation?`).
-- **Dead/again-evaluated fields.** e.g. the stored `wgpu::Instance gpu` member appears unused;
-  audit members now that present/`hasSurface` were added.
-- **Lifetime vs `contextId`.** Registry keyed by a JS-side incrementing `int`; `FrameDriver`
-  now also keys pending presents by `contextId`. Confirm teardown ordering (view dealloc →
-  `cancelPresent` + `removeSurfaceInfo`) is race-free under the new threading.
-
-**Phase 5 — Validation**
-```bash
-yarn tsc && yarn lint
-yarn workspace react-native-wgpu test         # offscreen readback + demo specs
-yarn build:ios        # or: yarn workspace example ios
-yarn build:android    # or: yarn workspace example android
-```
-Verify: no idle-CPU polling (logging), correct frame pacing, no present-ordering glitches,
-Reanimated UI/Dedicated examples render.
-
----
-
-## 16 `present()` call sites to migrate (Phase 2)
-
-```
-apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
-apps/example/src/components/useWebGPU.ts
-apps/example/src/components/Texture.tsx
-apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
-apps/example/src/ThreeJS/Helmet.tsx
-apps/example/src/ComputeToys/engine/index.ts
-apps/example/src/CanvasAPI/CanvasAPI.tsx
-apps/example/src/ThreeJS/PostProcessing.tsx
-apps/example/src/ThreeJS/Cube.tsx
-apps/example/src/Triangle/HelloTriangle.tsx
-apps/example/src/Triangle/HelloTriangleMSAA.tsx
-apps/example/src/ThreeJS/InstancedMesh.tsx
-apps/example/src/ThreeJS/Retargeting.tsx
-apps/example/src/ThreeJS/components/FiberCanvas.tsx
-apps/example/src/Reanimated/Reanimated.tsx
-apps/example/src/ThreeJS/Backdrop.tsx
-```
-Plus `README.md` and `packages/webgpu/README.md`.
-
----
-
-## Risks / open questions
-- **Worklet-runtime scheduler** access in worklets 0.8.3 (Spike 1 — highest risk).
-- **Concurrent `WaitAny`** semantics on one Dawn instance (Spike 2; single-worker fallback ready).
-- **Present timing**: vsync-dispatched-to-owning-loop must land after submit (FIFO on that loop)
-  and before the next `getCurrentTexture`.
-- **Breaking change**: `present()` removed — type, examples, README updated together.
-- **Apple Metal wait** moves into the frame-boundary present task, off the synchronous call path.
-
----
-
-## How to resume locally
-
-```bash
-git fetch origin claude/keen-darwin-xeywa
-git checkout claude/keen-darwin-xeywa
-git pull origin claude/keen-darwin-xeywa
-# open this file and run Phase 0 spikes, then start Claude Code:
-#   claude
-# suggested kickoff prompt:
-#   "Read docs/refactor-async-present-plan.md. Run the Phase 0 spikes and report
-#    findings before implementing. Develop on this branch."
-```

From f9833e39737a7782e4a45df3461f38b391aa446f Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Thu, 4 Jun 2026 17:07:53 +0200
Subject: [PATCH 09/25] :wrench:

---
 apps/example/src/CanvasAPI/CanvasAPI.tsx      |  1 +
 apps/example/src/ComputeToys/engine/index.ts  |  1 +
 .../ImportExternalTexture.tsx                 |  1 +
 .../SharedTextureMemory.tsx                   |  1 +
 apps/example/src/ThreeJS/Backdrop.tsx         |  1 +
 apps/example/src/ThreeJS/Cube.tsx             |  1 +
 apps/example/src/ThreeJS/Helmet.tsx           |  1 +
 apps/example/src/ThreeJS/InstancedMesh.tsx    |  1 +
 apps/example/src/ThreeJS/PostProcessing.tsx   |  1 +
 apps/example/src/ThreeJS/Retargeting.tsx      |  1 +
 .../src/ThreeJS/components/FiberCanvas.tsx    |  1 +
 apps/example/src/Triangle/HelloTriangle.tsx   |  1 +
 .../src/Triangle/HelloTriangleMSAA.tsx        |  1 +
 apps/example/src/components/Texture.tsx       |  1 +
 apps/example/src/components/useWebGPU.ts      |  1 +
 .../cpp/rnwgpu/api/GPUCanvasContext.cpp       | 52 ++++---------------
 16 files changed, 25 insertions(+), 42 deletions(-)
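
The call-site changes below all follow one pattern: submit the frame's command buffers, then call `present()` from the same thread. A minimal TypeScript sketch of that ordering, assuming hypothetical `QueueLike` and `PresentableContext` shapes and a hypothetical `renderFrame` helper (none of these names exist in the repository; they only stand in for `device.queue` and the canvas context):

```typescript
// Hypothetical minimal shapes, for illustration only. In the examples the
// real objects are `device.queue` and the canvas context.
interface QueueLike {
  submit(commandBuffers: unknown[]): void;
}

interface PresentableContext {
  present(): void;
}

// Submits one frame's commands, then presents on the calling thread.
// `encodeFrame` is an assumed callback standing in for building and
// finishing a command encoder.
function renderFrame(
  queue: QueueLike,
  context: PresentableContext,
  encodeFrame: () => unknown,
): void {
  queue.submit([encodeFrame()]);
  // Present only after submit, matching every call site in this patch.
  context.present();
}
```

Keeping both calls in one function makes the ordering hard to get wrong, though the patch itself adds `context.present()` inline at each call site rather than introducing a helper.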

diff --git a/apps/example/src/CanvasAPI/CanvasAPI.tsx b/apps/example/src/CanvasAPI/CanvasAPI.tsx
index a403c8388..2c150e488 100644
--- a/apps/example/src/CanvasAPI/CanvasAPI.tsx
+++ b/apps/example/src/CanvasAPI/CanvasAPI.tsx
@@ -89,6 +89,7 @@ export const CanvasAPI = () => {
             passEncoder.end();
 
             device.queue.submit([commandEncoder.finish()]);
+            context.present();
           })()
         }
         title="check surface"
diff --git a/apps/example/src/ComputeToys/engine/index.ts b/apps/example/src/ComputeToys/engine/index.ts
index 8db2562ad..f0fa08f07 100644
--- a/apps/example/src/ComputeToys/engine/index.ts
+++ b/apps/example/src/ComputeToys/engine/index.ts
@@ -398,6 +398,7 @@ fn passSampleLevelBilinearRepeat(pass_index: int, uv: float2, lod: float) -> flo
 
       // Submit command buffer
       this.device.queue.submit([encoder.finish()]);
+      this.surface!.present();
 
       // Update frame counter
       this.bindings!.time.host.frame += 1;
diff --git a/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx b/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx
index 7c973e03f..36bbaed15 100644
--- a/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx
+++ b/apps/example/src/ImportExternalTexture/ImportExternalTexture.tsx
@@ -244,6 +244,7 @@ export const ImportExternalTexture = () => {
 
       pass.end();
       device.queue.submit([encoder.finish()]);
+      context.present();
       // Now that the work sampling it has been submitted, end the external
       // texture's access window so the frame's surface is released promptly.
       externalTex?.destroy();
diff --git a/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx b/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
index 197657460..b5627cc43 100644
--- a/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
+++ b/apps/example/src/SharedTextureMemory/SharedTextureMemory.tsx
@@ -268,6 +268,7 @@ export const SharedTextureMemory = () => {
       }
       pass.end();
       device.queue.submit([encoder.finish()]);
+      context.present();
       rafRef.current = requestAnimationFrame(render);
     };
     rafRef.current = requestAnimationFrame(render);
diff --git a/apps/example/src/ThreeJS/Backdrop.tsx b/apps/example/src/ThreeJS/Backdrop.tsx
index 113325b9d..1af5f6cf1 100644
--- a/apps/example/src/ThreeJS/Backdrop.tsx
+++ b/apps/example/src/ThreeJS/Backdrop.tsx
@@ -150,6 +150,7 @@ export const Backdrop = () => {
       }
 
       renderer.render(scene, camera);
+      context.present();
     }
     return () => {
       renderer.setAnimationLoop(null);
diff --git a/apps/example/src/ThreeJS/Cube.tsx b/apps/example/src/ThreeJS/Cube.tsx
index ea3fe0f23..d3e9707b5 100644
--- a/apps/example/src/ThreeJS/Cube.tsx
+++ b/apps/example/src/ThreeJS/Cube.tsx
@@ -31,6 +31,7 @@ export const Cube = () => {
       mesh.rotation.y = time / 1000;
 
       renderer.render(scene, camera);
+      context.present();
     }
     renderer.setAnimationLoop(animate);
     return () => {
diff --git a/apps/example/src/ThreeJS/Helmet.tsx b/apps/example/src/ThreeJS/Helmet.tsx
index 70720d360..fd56b8e0c 100644
--- a/apps/example/src/ThreeJS/Helmet.tsx
+++ b/apps/example/src/ThreeJS/Helmet.tsx
@@ -49,6 +49,7 @@ export const Helmet = () => {
     function animate() {
       animateCamera();
       renderer.render(scene, camera);
+      context.present();
     }
 
     return () => {
diff --git a/apps/example/src/ThreeJS/InstancedMesh.tsx b/apps/example/src/ThreeJS/InstancedMesh.tsx
index 5b7c7ca4d..0ab040d60 100644
--- a/apps/example/src/ThreeJS/InstancedMesh.tsx
+++ b/apps/example/src/ThreeJS/InstancedMesh.tsx
@@ -87,6 +87,7 @@ export const InstancedMesh = () => {
       }
 
       renderer.render(scene, camera);
+      context.present();
     }
     return () => {
       renderer.setAnimationLoop(null);
diff --git a/apps/example/src/ThreeJS/PostProcessing.tsx b/apps/example/src/ThreeJS/PostProcessing.tsx
index 0c2980501..116c46876 100644
--- a/apps/example/src/ThreeJS/PostProcessing.tsx
+++ b/apps/example/src/ThreeJS/PostProcessing.tsx
@@ -72,6 +72,7 @@ export const PostProcessing = () => {
         mixer.update(delta);
       }
       postProcessing.render();
+      context.present();
     }
     return () => {
       renderer.setAnimationLoop(null);
diff --git a/apps/example/src/ThreeJS/Retargeting.tsx b/apps/example/src/ThreeJS/Retargeting.tsx
index 8b8dd9a29..c25601885 100644
--- a/apps/example/src/ThreeJS/Retargeting.tsx
+++ b/apps/example/src/ThreeJS/Retargeting.tsx
@@ -302,6 +302,7 @@ export const Retargeting = () => {
       source.mixer.update(delta);
       mixer.update(delta);
       renderer.render(scene, camera);
+      context.present();
     });
 
     return () => {
diff --git a/apps/example/src/ThreeJS/components/FiberCanvas.tsx b/apps/example/src/ThreeJS/components/FiberCanvas.tsx
index 92b928987..7eb14fafd 100644
--- a/apps/example/src/ThreeJS/components/FiberCanvas.tsx
+++ b/apps/example/src/ThreeJS/components/FiberCanvas.tsx
@@ -66,6 +66,7 @@ export const FiberCanvas = ({
         const renderFrame = state.gl.render.bind(state.gl);
         state.gl.render = (s: THREE.Scene, c: THREE.Camera) => {
           renderFrame(s, c);
+          context.present();
         };
       },
     });
diff --git a/apps/example/src/Triangle/HelloTriangle.tsx b/apps/example/src/Triangle/HelloTriangle.tsx
index caeb560b3..968c231da 100644
--- a/apps/example/src/Triangle/HelloTriangle.tsx
+++ b/apps/example/src/Triangle/HelloTriangle.tsx
@@ -77,6 +77,7 @@ export function HelloTriangle() {
       passEncoder.end();
 
       device.queue.submit([commandEncoder.finish()]);
+      context.present();
     })();
   }, [ref]);
 
diff --git a/apps/example/src/Triangle/HelloTriangleMSAA.tsx b/apps/example/src/Triangle/HelloTriangleMSAA.tsx
index b9518fbe9..affac3352 100644
--- a/apps/example/src/Triangle/HelloTriangleMSAA.tsx
+++ b/apps/example/src/Triangle/HelloTriangleMSAA.tsx
@@ -84,6 +84,7 @@ export function HelloTriangleMSAA() {
         passEncoder.end();
 
         device.queue.submit([commandEncoder.finish()]);
+        context.present();
       }
 
       frame();
diff --git a/apps/example/src/components/Texture.tsx b/apps/example/src/components/Texture.tsx
index 5bd82a911..d9e689b41 100644
--- a/apps/example/src/components/Texture.tsx
+++ b/apps/example/src/components/Texture.tsx
@@ -145,6 +145,7 @@ export const Texture = ({ texture, style, device }: GPUTextureProps) => {
     renderPass.end();
 
     device.queue.submit([commandEncoder.finish()]);
+    context.present();
   }, [device, state, texture, ref]);
   return <Canvas ref={ref} style={style} />;
 };
diff --git a/apps/example/src/components/useWebGPU.ts b/apps/example/src/components/useWebGPU.ts
index 1a399aafe..ac8a631ac 100644
--- a/apps/example/src/components/useWebGPU.ts
+++ b/apps/example/src/components/useWebGPU.ts
@@ -57,6 +57,7 @@ export const useWebGPU = (scene: Scene) => {
         const render = () => {
           const timestamp = Date.now();
           renderScene(timestamp);
+          context.present();
           animationFrameId.current = requestAnimationFrame(render);
         };
 
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
index c4390ba6d..fb7a6efd3 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
@@ -1,34 +1,10 @@
 #include "GPUCanvasContext.h"
 #include "Convertors.h"
-#include "FrameDriver.h"
 #include "RNWebGPUManager.h"
 #include <memory>
 
 namespace rnwgpu {
 
-namespace {
-// Runtimes whose present is automatic (no ctx.present() needed): the main JS
-// runtime and the Reanimated UI runtime. Both are reached correctly by the
-// global vsync FrameDriver dispatching through the main runtime's scheduler.
-// Dedicated worklet runtimes (createWorkletRuntime, Vision Camera frame
-// processors, …) run on their own thread with no safe scheduler hook, so they
-// present explicitly via ctx.present().
-bool isAutoPresentedRuntime(jsi::Runtime &runtime) {
-  if (async::RuntimeContext::get(runtime) != nullptr) {
-    return true; // main JS runtime
-  }
-  // Worklets tags every runtime with a numeric `__RUNTIME_KIND`
-  // (worklets::RuntimeKind: ReactNative=1, UI=2, Worker=3). Auto-present only
-  // the UI runtime; treat Worker / unknown / untagged as needing ctx.present().
-  auto kind = runtime.global().getProperty(runtime, "__RUNTIME_KIND");
-  if (kind.isNumber()) {
-    constexpr int kRuntimeKindUI = 2;
-    return static_cast<int>(kind.asNumber()) == kRuntimeKindUI;
-  }
-  return false;
-}
-} // namespace
-
 void GPUCanvasContext::configure(
     std::shared_ptr<GPUCanvasConfiguration> configuration) {
   Convertor conv;
@@ -73,16 +49,10 @@ jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
   _canvas->setClientWidth(size.width);
   _canvas->setClientHeight(size.height);
 
-  // Auto-present on the JS / UI runtime: acquiring the current texture
-  // schedules a present for this surface at the next vsync (spec-aligned
-  // "update the rendering" after the frame), dispatched through the main
-  // runtime's scheduler. Dedicated worklet runtimes instead call ctx.present()
-  // explicitly on their own thread. Offscreen surfaces have no wgpu::Surface,
-  // so skip them (their texture is read back directly).
-  if (_surfaceInfo->hasSurface() && isAutoPresentedRuntime(runtime)) {
-    FrameDriver::getInstance().requestPresent(_contextId, _surfaceInfo,
-                                              _gpu->getContext()->scheduler());
-  }
+  // getCurrentTexture has no side effects: acquiring the texture must not
+  // schedule a present (that would be a surprising, spec-violating coupling).
+  // Callers present explicitly via ctx.present() after submit, on whichever
+  // thread did the rendering.
 
   // Pass reportsMemoryPressure=false to avoid triggering spurious Hermes GC
   // cycles every frame since the canvas texture doesn't own the buffer.
@@ -90,17 +60,15 @@ jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
   return JSIConverter<std::shared_ptr<GPUTexture>>::toJSI(runtime, gpuTexture);
 }
 
-jsi::Value GPUCanvasContext::present(jsi::Runtime &runtime,
+jsi::Value GPUCanvasContext::present(jsi::Runtime & /*runtime*/,
                                      const jsi::Value & /*thisValue*/,
                                      const jsi::Value * /*args*/,
                                      size_t /*count*/) {
-  // Only meaningful on a dedicated worklet runtime, where present can't be
-  // automated. On the JS / UI runtime present is automatic, so this is a no-op
-  // there — which makes it safe to call from a worklet shared between the UI
-  // runtime and a dedicated runtime. Presents synchronously on the calling
-  // thread (the one that did getCurrentTexture / submit), preserving Dawn
-  // surface thread-affinity.
-  if (!isAutoPresentedRuntime(runtime) && _surfaceInfo->hasSurface()) {
+  // Present is always explicit. It runs synchronously on the calling thread
+  // (the one that did getCurrentTexture / submit), preserving Dawn surface
+  // thread-affinity. Required on every runtime (main JS, UI, dedicated
+  // worklet); offscreen surfaces have no wgpu::Surface so they no-op.
+  if (_surfaceInfo->hasSurface()) {
     _surfaceInfo->presentFrame();
   }
   return jsi::Value::undefined();

From 7f83d8bbe8e61f9e18c36a09c10f02af22414052 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Thu, 4 Jun 2026 20:12:21 +0200
Subject: [PATCH 10/25] :wrench:

---
 README.md                                     | 22 ++---
 packages/webgpu/README.md                     | 22 ++---
 packages/webgpu/android/CMakeLists.txt        |  1 -
 packages/webgpu/android/cpp/cpp-adapter.cpp   | 54 ------------
 .../java/com/webgpu/WebGPUFrameDriver.java    | 66 --------------
 packages/webgpu/apple/MetalView.mm            |  4 -
 packages/webgpu/apple/WebGPUFrameDriver.h     | 13 ---
 packages/webgpu/apple/WebGPUFrameDriver.mm    | 88 -------------------
 packages/webgpu/apple/WebGPUModule.mm         |  6 --
 packages/webgpu/cpp/rnwgpu/FrameDriver.cpp    | 81 -----------------
 packages/webgpu/cpp/rnwgpu/FrameDriver.h      | 83 -----------------
 packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h  |  7 +-
 packages/webgpu/cpp/rnwgpu/api/GPU.h          |  1 -
 .../cpp/rnwgpu/api/GPUCanvasContext.cpp       | 27 ++----
 .../webgpu/cpp/rnwgpu/api/GPUCanvasContext.h  | 16 ++--
 packages/webgpu/react-native-wgpu.podspec     |  4 +
 16 files changed, 35 insertions(+), 460 deletions(-)
 delete mode 100644 packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java
 delete mode 100644 packages/webgpu/apple/WebGPUFrameDriver.h
 delete mode 100644 packages/webgpu/apple/WebGPUFrameDriver.mm
 delete mode 100644 packages/webgpu/cpp/rnwgpu/FrameDriver.cpp
 delete mode 100644 packages/webgpu/cpp/rnwgpu/FrameDriver.h

diff --git a/README.md b/README.md
index 433d498fa..85826903b 100644
--- a/README.md
+++ b/README.md
@@ -128,6 +128,8 @@ export function HelloTriangle() {
       passEncoder.end();
 
       device.queue.submit([commandEncoder.finish()]);
+
+      context.present();
     };
     helloTriangle();
   }, [ref]);
@@ -172,28 +174,16 @@ ctx.canvas.height = ctx.canvas.clientHeight * PixelRatio.get();
 
 ### Frame Scheduling
 
-On the **main JS runtime** and the **Reanimated UI runtime**, frame presentation is automatic: once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). There is no `present()` call.
+In React Native, frame presentation is a manual operation: when you are ready to present a frame, call `present()` on the context after submitting your commands to the queue. This works the same on every runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor). `present()` runs synchronously on the calling thread, so the frame is presented from whichever thread did the rendering.
 
 ```tsx
 // draw
 // submit to the queue
 device.queue.submit([commandEncoder.finish()]);
-// The frame is presented automatically on the next vsync.
+// This method is React Native only
+context.present();
 ```
 
-When you render from a **dedicated worklet runtime** (e.g. `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor), it runs on its own thread where present can't be driven automatically. Call `context.present()` yourself after submitting:
-
-```tsx
-const onFrame = () => {
-  "worklet";
-  // draw on the dedicated runtime's thread
-  device.queue.submit([commandEncoder.finish()]);
-  context.present(); // required on dedicated worklet runtimes; a no-op on JS/UI
-};
-```
-
-`present()` is safe to call from a worklet that runs on either the UI runtime or a dedicated runtime: it presents on the dedicated runtime and does nothing on the JS/UI runtime (which auto-present).
-
 ### Canvas Transparency
 
 On Android, the `alphaMode` property is ignored when configuring the canvas.
@@ -302,6 +292,7 @@ const render = () => {
 
   // ... encode a pass that samples `externalTexture`, then:
   device.queue.submit([encoder.finish()]);
+  context.present();
 
   // Release the surface's access window right after the submit that sampled it.
   externalTexture.destroy();
@@ -336,6 +327,7 @@ const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   const commandEncoder = device.createCommandEncoder();
   // ... render ...
   device.queue.submit([commandEncoder.finish()]);
+  context.present();
 };
 
 // Initialize WebGPU on main thread, then run on UI thread
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index 433d498fa..85826903b 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -128,6 +128,8 @@ export function HelloTriangle() {
       passEncoder.end();
 
       device.queue.submit([commandEncoder.finish()]);
+
+      context.present();
     };
     helloTriangle();
   }, [ref]);
@@ -172,28 +174,16 @@ ctx.canvas.height = ctx.canvas.clientHeight * PixelRatio.get();
 
 ### Frame Scheduling
 
-On the **main JS runtime** and the **Reanimated UI runtime**, frame presentation is automatic: once you acquire the frame's texture with `context.getCurrentTexture()` and submit your commands, the frame is presented on the next display refresh (driven by a global vsync source: `CADisplayLink` on iOS, `Choreographer` on Android). There is no `present()` call.
+In React Native, frame presentation is a manual operation: when you are ready to present a frame, call `present()` on the context after submitting your commands to the queue. This works the same on every runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor). `present()` runs synchronously on the calling thread, so the frame is presented from whichever thread did the rendering.
 
 ```tsx
 // draw
 // submit to the queue
 device.queue.submit([commandEncoder.finish()]);
-// The frame is presented automatically on the next vsync.
+// This method is React Native only
+context.present();
 ```
 
-When you render from a **dedicated worklet runtime** (e.g. `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor), it runs on its own thread where present can't be driven automatically. Call `context.present()` yourself after submitting:
-
-```tsx
-const onFrame = () => {
-  "worklet";
-  // draw on the dedicated runtime's thread
-  device.queue.submit([commandEncoder.finish()]);
-  context.present(); // required on dedicated worklet runtimes; a no-op on JS/UI
-};
-```
-
-`present()` is safe to call from a worklet that runs on either the UI runtime or a dedicated runtime: it presents on the dedicated runtime and does nothing on the JS/UI runtime (which auto-present).
-
 ### Canvas Transparency
 
 On Android, the `alphaMode` property is ignored when configuring the canvas.
@@ -302,6 +292,7 @@ const render = () => {
 
   // ... encode a pass that samples `externalTexture`, then:
   device.queue.submit([encoder.finish()]);
+  context.present();
 
   // Release the surface's access window right after the submit that sampled it.
   externalTexture.destroy();
@@ -336,6 +327,7 @@ const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   const commandEncoder = device.createCommandEncoder();
   // ... render ...
   device.queue.submit([commandEncoder.finish()]);
+  context.present();
 };
 
 // Initialize WebGPU on main thread, then run on UI thread
diff --git a/packages/webgpu/android/CMakeLists.txt b/packages/webgpu/android/CMakeLists.txt
index 51005acdc..50756e72e 100644
--- a/packages/webgpu/android/CMakeLists.txt
+++ b/packages/webgpu/android/CMakeLists.txt
@@ -47,7 +47,6 @@ add_library(${PACKAGE_NAME} SHARED
     ../cpp/rnwgpu/api/GPUComputePipeline.cpp
     ../cpp/rnwgpu/api/GPUCanvasContext.cpp
     ../cpp/rnwgpu/RNWebGPUManager.cpp
-    ../cpp/rnwgpu/FrameDriver.cpp
     ../cpp/jsi/Promise.cpp
     ../cpp/jsi/RuntimeLifecycleMonitor.cpp
     ../cpp/jsi/RuntimeAwareCache.cpp
diff --git a/packages/webgpu/android/cpp/cpp-adapter.cpp b/packages/webgpu/android/cpp/cpp-adapter.cpp
index 4f0ba61d3..2a441c218 100644
--- a/packages/webgpu/android/cpp/cpp-adapter.cpp
+++ b/packages/webgpu/android/cpp/cpp-adapter.cpp
@@ -10,7 +10,6 @@
 #include <webgpu/webgpu_cpp.h>
 
 #include "AndroidPlatformContext.h"
-#include "FrameDriver.h"
 #include "GPUCanvasContext.h"
 #include "RNWebGPUManager.h"
 
@@ -18,37 +17,6 @@
 
 std::shared_ptr<rnwgpu::RNWebGPUManager> manager;
 
-// JNI handles for driving the vsync source (com.webgpu.WebGPUFrameDriver),
-// cached on the JNI thread in initializeNative (which has the app classloader).
-static JavaVM *gJavaVM = nullptr;
-static jclass gFrameDriverClass = nullptr;
-static jmethodID gFrameDriverStart = nullptr;
-static jmethodID gFrameDriverStop = nullptr;
-
-static void callFrameDriver(jmethodID method) {
-  if (gJavaVM == nullptr || gFrameDriverClass == nullptr || method == nullptr) {
-    return;
-  }
-  JNIEnv *env = nullptr;
-  bool attached = false;
-  jint res = gJavaVM->GetEnv(reinterpret_cast<void **>(&env), JNI_VERSION_1_6);
-  if (res == JNI_EDETACHED) {
-    if (gJavaVM->AttachCurrentThread(&env, nullptr) != JNI_OK) {
-      return;
-    }
-    attached = true;
-  } else if (res != JNI_OK) {
-    return;
-  }
-  env->CallStaticVoidMethod(gFrameDriverClass, method);
-  if (env->ExceptionCheck()) {
-    env->ExceptionClear();
-  }
-  if (attached) {
-    gJavaVM->DetachCurrentThread();
-  }
-}
-
 extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUModule_initializeNative(
     JNIEnv *env, jobject /* this */, jlong jsRuntime,
     jobject jsCallInvokerHolder, jobject blobModule) {
@@ -63,27 +31,6 @@ extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUModule_initializeNative(
       std::make_shared<rnwgpu::AndroidPlatformContext>(globalBlobModule);
   manager = std::make_shared<rnwgpu::RNWebGPUManager>(runtime, jsCallInvoker,
                                                       platformContext);
-
-  // Cache JNI handles for the Choreographer-based vsync source and register it
-  // with the FrameDriver to drive auto-present (replaces context.present()).
-  env->GetJavaVM(&gJavaVM);
-  jclass localCls = env->FindClass("com/webgpu/WebGPUFrameDriver");
-  if (localCls != nullptr) {
-    gFrameDriverClass = reinterpret_cast<jclass>(env->NewGlobalRef(localCls));
-    gFrameDriverStart =
-        env->GetStaticMethodID(gFrameDriverClass, "start", "()V");
-    gFrameDriverStop = env->GetStaticMethodID(gFrameDriverClass, "stop", "()V");
-    env->DeleteLocalRef(localCls);
-  }
-  rnwgpu::FrameDriver::getInstance().setPlatformVSync(
-      [] { callFrameDriver(gFrameDriverStart); },
-      [] { callFrameDriver(gFrameDriverStop); });
-}
-
-extern "C" JNIEXPORT void JNICALL
-Java_com_webgpu_WebGPUFrameDriver_nativeOnVSync(JNIEnv * /*env*/,
-                                                jclass /*clazz*/) {
-  rnwgpu::FrameDriver::getInstance().onVSync();
 }
 
 extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUView_onSurfaceChanged(
@@ -119,7 +66,6 @@ Java_com_webgpu_WebGPUView_switchToOffscreenSurface(JNIEnv *env, jobject thiz,
 
 extern "C" JNIEXPORT void JNICALL Java_com_webgpu_WebGPUView_onSurfaceDestroy(
     JNIEnv *env, jobject thiz, jint contextId) {
-  rnwgpu::FrameDriver::getInstance().cancelPresent(contextId);
   auto &registry = rnwgpu::SurfaceRegistry::getInstance();
   registry.removeSurfaceInfo(contextId);
 }
\ No newline at end of file
diff --git a/packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java b/packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java
deleted file mode 100644
index 03a1d2c29..000000000
--- a/packages/webgpu/android/src/main/java/com/webgpu/WebGPUFrameDriver.java
+++ /dev/null
@@ -1,66 +0,0 @@
-package com.webgpu;
-
-import android.os.Handler;
-import android.os.Looper;
-import android.view.Choreographer;
-
-/**
- * Drives WebGPU auto-present from the main-thread {@link Choreographer},
- * replacing the manual {@code context.present()} call.
- *
- * <p>{@link #start()} / {@link #stop()} are invoked from native code
- * (rnwgpu::FrameDriver::setPlatformVSync) on arbitrary threads; both hop to the
- * main thread. While running, {@link #doFrame(long)} calls back into native
- * once per vsync, where pending surfaces are presented.
- */
-public class WebGPUFrameDriver implements Choreographer.FrameCallback {
-  private static final WebGPUFrameDriver INSTANCE = new WebGPUFrameDriver();
-
-  private final Handler mainHandler = new Handler(Looper.getMainLooper());
-  private boolean running = false;
-
-  private WebGPUFrameDriver() {}
-
-  /** Called from native (any thread). */
-  public static void start() {
-    INSTANCE.startInternal();
-  }
-
-  /** Called from native (any thread). */
-  public static void stop() {
-    INSTANCE.stopInternal();
-  }
-
-  private void startInternal() {
-    mainHandler.post(
-        () -> {
-          if (running) {
-            return;
-          }
-          running = true;
-          Choreographer.getInstance().postFrameCallback(this);
-        });
-  }
-
-  private void stopInternal() {
-    mainHandler.post(
-        () -> {
-          if (!running) {
-            return;
-          }
-          running = false;
-          Choreographer.getInstance().removeFrameCallback(this);
-        });
-  }
-
-  @Override
-  public void doFrame(long frameTimeNanos) {
-    if (!running) {
-      return;
-    }
-    nativeOnVSync();
-    Choreographer.getInstance().postFrameCallback(this);
-  }
-
-  private static native void nativeOnVSync();
-}
diff --git a/packages/webgpu/apple/MetalView.mm b/packages/webgpu/apple/MetalView.mm
index e617da889..ccff1245c 100644
--- a/packages/webgpu/apple/MetalView.mm
+++ b/packages/webgpu/apple/MetalView.mm
@@ -1,8 +1,6 @@
 #import "MetalView.h"
 #import "webgpu/webgpu_cpp.h"
 
-#include "FrameDriver.h"
-
 @implementation MetalView {
   BOOL _isConfigured;
 }
@@ -44,8 +42,6 @@ - (void)update {
 }
 
 - (void)dealloc {
-  // Stop any pending auto-present for this surface before it goes away.
-  rnwgpu::FrameDriver::getInstance().cancelPresent([_contextId intValue]);
   auto &registry = rnwgpu::SurfaceRegistry::getInstance();
   // Remove the surface info from the registry
   registry.removeSurfaceInfo([_contextId intValue]);
diff --git a/packages/webgpu/apple/WebGPUFrameDriver.h b/packages/webgpu/apple/WebGPUFrameDriver.h
deleted file mode 100644
index aacae84ee..000000000
--- a/packages/webgpu/apple/WebGPUFrameDriver.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#pragma once
-
-#import <Foundation/Foundation.h>
-
-// Objective-C wrapper around the platform vsync source (CADisplayLink) that
-// drives rnwgpu::FrameDriver::onVSync() once per frame. start/stop are invoked
-// by the C++ FrameDriver via setPlatformVSync; both hop to the main thread.
-@interface WebGPUFrameDriver : NSObject
-
-+ (void)start;
-+ (void)stop;
-
-@end
diff --git a/packages/webgpu/apple/WebGPUFrameDriver.mm b/packages/webgpu/apple/WebGPUFrameDriver.mm
deleted file mode 100644
index 1d302e2fa..000000000
--- a/packages/webgpu/apple/WebGPUFrameDriver.mm
+++ /dev/null
@@ -1,88 +0,0 @@
-#import "WebGPUFrameDriver.h"
-
-#import "RNWGUIKit.h"
-#import <QuartzCore/QuartzCore.h>
-
-#include "FrameDriver.h"
-
-@implementation WebGPUFrameDriver
-
-+ (void)onFrame {
-  rnwgpu::FrameDriver::getInstance().onVSync();
-}
-
-#if !TARGET_OS_OSX
-
-// iOS / tvOS: CADisplayLink on the main run loop, paused/resumed for
-// start/stop.
-static CADisplayLink *sDisplayLink = nil;
-
-+ (void)tick:(CADisplayLink *)link {
-  [WebGPUFrameDriver onFrame];
-}
-
-+ (void)start {
-  dispatch_async(dispatch_get_main_queue(), ^{
-    if (sDisplayLink == nil) {
-      sDisplayLink = [CADisplayLink displayLinkWithTarget:self
-                                                 selector:@selector(tick:)];
-      [sDisplayLink addToRunLoop:[NSRunLoop mainRunLoop]
-                         forMode:NSRunLoopCommonModes];
-    }
-    sDisplayLink.paused = NO;
-  });
-}
-
-+ (void)stop {
-  dispatch_async(dispatch_get_main_queue(), ^{
-    sDisplayLink.paused = YES;
-  });
-}
-
-#else // TARGET_OS_OSX
-
-// macOS: CADisplayLink is available via NSScreen on 14.0+. On older systems we
-// fall back to an NSTimer at ~60Hz (not vsync-aligned, but keeps auto-present
-// working). FrameDriver self-idles cheaply when nothing is rendering.
-static id sDisplayLink = nil;
-
-+ (void)tick:(id)sender {
-  [WebGPUFrameDriver onFrame];
-}
-
-+ (void)start {
-  dispatch_async(dispatch_get_main_queue(), ^{
-    if (sDisplayLink == nil) {
-      if (@available(macOS 14.0, *)) {
-        CADisplayLink *link =
-            [NSScreen.mainScreen displayLinkWithTarget:self
-                                              selector:@selector(tick:)];
-        [link addToRunLoop:[NSRunLoop mainRunLoop]
-                   forMode:NSRunLoopCommonModes];
-        sDisplayLink = link;
-      } else {
-        sDisplayLink = [NSTimer scheduledTimerWithTimeInterval:1.0 / 60.0
-                                                        target:self
-                                                      selector:@selector(tick:)
-                                                      userInfo:nil
-                                                       repeats:YES];
-      }
-    }
-    if ([sDisplayLink isKindOfClass:[CADisplayLink class]]) {
-      ((CADisplayLink *)sDisplayLink).paused = NO;
-    }
-  });
-}
-
-+ (void)stop {
-  dispatch_async(dispatch_get_main_queue(), ^{
-    if ([sDisplayLink isKindOfClass:[CADisplayLink class]]) {
-      ((CADisplayLink *)sDisplayLink).paused = YES;
-    }
-    // NSTimer fallback keeps firing; onVSync is a cheap no-op while idle.
-  });
-}
-
-#endif // TARGET_OS_OSX
-
-@end
diff --git a/packages/webgpu/apple/WebGPUModule.mm b/packages/webgpu/apple/WebGPUModule.mm
index c4c7224ad..ad44a4c79 100644
--- a/packages/webgpu/apple/WebGPUModule.mm
+++ b/packages/webgpu/apple/WebGPUModule.mm
@@ -1,8 +1,6 @@
 #import "WebGPUModule.h"
 #include "ApplePlatformContext.h"
-#include "FrameDriver.h"
 #import "GPUCanvasContext.h"
-#import "WebGPUFrameDriver.h"
 
 #import <React/RCTBridge+Private.h>
 #import <React/RCTCallInvoker.h>
@@ -81,10 +79,6 @@ - (void)invalidate {
   webgpuManager = std::make_shared<rnwgpu::RNWebGPUManager>(runtime, jsInvoker,
                                                             platformContext);
 
-  // Drive auto-present from the display's vsync (replaces context.present()).
-  rnwgpu::FrameDriver::getInstance().setPlatformVSync(
-      [] { [WebGPUFrameDriver start]; }, [] { [WebGPUFrameDriver stop]; });
-
   return @true;
 }
 
diff --git a/packages/webgpu/cpp/rnwgpu/FrameDriver.cpp b/packages/webgpu/cpp/rnwgpu/FrameDriver.cpp
deleted file mode 100644
index 792940e5e..000000000
--- a/packages/webgpu/cpp/rnwgpu/FrameDriver.cpp
+++ /dev/null
@@ -1,81 +0,0 @@
-#include "FrameDriver.h"
-
-#include <memory>
-#include <utility>
-#include <vector>
-
-namespace jsi = facebook::jsi;
-
-namespace rnwgpu {
-
-FrameDriver &FrameDriver::getInstance() {
-  static FrameDriver instance;
-  return instance;
-}
-
-void FrameDriver::setPlatformVSync(std::function<void()> start,
-                                   std::function<void()> stop) {
-  std::lock_guard<std::mutex> lock(_mutex);
-  _start = std::move(start);
-  _stop = std::move(stop);
-}
-
-void FrameDriver::requestPresent(
-    int contextId, std::shared_ptr<SurfaceInfo> surface,
-    std::shared_ptr<async::RuntimeScheduler> scheduler) {
-  if (!surface || !scheduler) {
-    return;
-  }
-
-  std::function<void()> startToCall;
-  {
-    std::lock_guard<std::mutex> lock(_mutex);
-    _pending[contextId] = {std::move(surface), std::move(scheduler)};
-    _idleFrames = 0;
-    if (!_running && _start) {
-      _running = true;
-      startToCall = _start;
-    }
-  }
-
-  // Invoked outside the lock: the platform start hops to the UI thread.
-  if (startToCall) {
-    startToCall();
-  }
-}
-
-void FrameDriver::cancelPresent(int contextId) {
-  std::lock_guard<std::mutex> lock(_mutex);
-  _pending.erase(contextId);
-}
-
-void FrameDriver::onVSync() {
-  std::vector<Pending> toPresent;
-  std::function<void()> stopToCall;
-  {
-    std::lock_guard<std::mutex> lock(_mutex);
-    if (!_pending.empty()) {
-      toPresent.reserve(_pending.size());
-      for (auto &entry : _pending) {
-        toPresent.push_back(std::move(entry.second));
-      }
-      _pending.clear();
-      _idleFrames = 0;
-    } else if (_running && ++_idleFrames >= kMaxIdleFrames) {
-      _running = false;
-      stopToCall = _stop;
-    }
-  }
-
-  for (auto &pending : toPresent) {
-    auto surface = pending.surface;
-    pending.scheduler->scheduleOnJS(
-        [surface](jsi::Runtime & /*runtime*/) { surface->presentFrame(); });
-  }
-
-  if (stopToCall) {
-    stopToCall();
-  }
-}
-
-} // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/FrameDriver.h b/packages/webgpu/cpp/rnwgpu/FrameDriver.h
deleted file mode 100644
index c16fedabf..000000000
--- a/packages/webgpu/cpp/rnwgpu/FrameDriver.h
+++ /dev/null
@@ -1,83 +0,0 @@
-#pragma once
-
-#include <functional>
-#include <memory>
-#include <mutex>
-#include <unordered_map>
-
-#include "SurfaceRegistry.h"
-#include "rnwgpu/async/RuntimeScheduler.h"
-
-namespace rnwgpu {
-
-/**
- * Global vsync-driven auto-present coordinator. Replaces the manual
- * `context.present()` call.
- *
- * Flow:
- *   - `GPUCanvasContext::getCurrentTexture()` (JS thread) calls
- * `requestPresent` for its surface, tagged with the owning runtime's
- * RuntimeScheduler.
- *   - A platform vsync source (iOS CADisplayLink / Android Choreographer) calls
- *     `onVSync()` on the UI thread once per frame.
- *   - On each vsync, every surface that requested a present has its present
- *     dispatched onto its owning runtime's JS thread (so `Surface.Present()`
- * and the Apple Metal scheduling wait run on the same thread that did
- *     getCurrentTexture / submit, preserving Dawn surface thread-affinity and
- *     present-after-submit ordering via FIFO on that loop).
- *
- * The vsync source is request-driven: it is started when the first present is
- * requested and stopped after a few idle frames, so an idle (non-rendering) app
- * costs zero CPU.
- */
-class FrameDriver {
-public:
-  static FrameDriver &getInstance();
-
-  /**
-   * Register how to start/stop the platform vsync source. `start`/`stop` are
-   * invoked when presents begin/cease; each implementation is responsible for
-   * hopping to the UI thread as needed. Called once per platform at init.
-   */
-  void setPlatformVSync(std::function<void()> start,
-                        std::function<void()> stop);
-
-  /**
-   * Request that `surface` be presented at the next vsync. Coalesced per
-   * contextId (at most one present per surface per frame). Thread-safe; called
-   * from a JS thread inside getCurrentTexture. Surfaces with no on-screen
-   * `wgpu::Surface` (offscreen) should not be registered.
-   */
-  void requestPresent(int contextId, std::shared_ptr<SurfaceInfo> surface,
-                      std::shared_ptr<async::RuntimeScheduler> scheduler);
-
-  /**
-   * Drop any pending present for a surface (e.g. when its view is torn down).
-   * Thread-safe.
-   */
-  void cancelPresent(int contextId);
-
-  /** Called by the platform vsync source on the UI thread, once per frame. */
-  void onVSync();
-
-private:
-  FrameDriver() = default;
-
-  struct Pending {
-    std::shared_ptr<SurfaceInfo> surface;
-    std::shared_ptr<async::RuntimeScheduler> scheduler;
-  };
-
-  // Number of consecutive empty frames before the vsync source is stopped.
-  // A small grace period avoids start/stop thrash during continuous rendering.
-  static constexpr int kMaxIdleFrames = 3;
-
-  std::mutex _mutex;
-  std::unordered_map<int, Pending> _pending;
-  std::function<void()> _start;
-  std::function<void()> _stop;
-  bool _running = false;
-  int _idleFrames = 0;
-};
-
-} // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h b/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
index ed098896a..db18d7af1 100644
--- a/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
+++ b/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
@@ -119,9 +119,10 @@ class SurfaceInfo {
     height = newHeight;
   }
 
-  // Present the current surface texture. Called at the frame boundary from the
-  // owning runtime's JS thread (via FrameDriver), replacing the old manual
-  // present(). No-op when offscreen / unconfigured (no surface).
+  // Present the current surface texture. Called synchronously from the thread
+  // that did getCurrentTexture / submit (via GPUCanvasContext::present), so it
+  // preserves Dawn surface thread-affinity. No-op when offscreen / unconfigured
+  // (no surface).
   void presentFrame() {
 #ifdef __APPLE__
     // Ensure command buffers are scheduled before presenting. Read the device
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.h b/packages/webgpu/cpp/rnwgpu/api/GPU.h
index b2488d4c7..e7dc15caf 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.h
@@ -53,7 +53,6 @@ class GPU : public NativeObject<GPU> {
   }
 
   inline const wgpu::Instance get() { return _instance; }
-  inline std::shared_ptr<async::RuntimeContext> getContext() { return _async; }
 
 private:
   wgpu::Instance _instance;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
index fb7a6efd3..4da91d441 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.cpp
@@ -31,10 +31,7 @@ void GPUCanvasContext::configure(
 
 void GPUCanvasContext::unconfigure() {}
 
-jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
-                                               const jsi::Value & /*thisValue*/,
-                                               const jsi::Value * /*args*/,
-                                               size_t /*count*/) {
+std::shared_ptr<GPUTexture> GPUCanvasContext::getCurrentTexture() {
   auto prevSize = _surfaceInfo->getConfig();
   auto width = _canvas->getWidth();
   auto height = _canvas->getHeight();
@@ -49,29 +46,19 @@ jsi::Value GPUCanvasContext::getCurrentTexture(jsi::Runtime &runtime,
   _canvas->setClientWidth(size.width);
   _canvas->setClientHeight(size.height);
 
-  // getCurrentTexture has no side effects: acquiring the texture must not
-  // schedule a present (that would be a surprising, spec-violating coupling).
-  // Callers present explicitly via ctx.present() after submit, on whichever
-  // thread did the rendering.
-
   // Pass reportsMemoryPressure=false to avoid triggering spurious Hermes GC
   // cycles every frame since the canvas texture doesn't own the buffer.
-  auto gpuTexture = std::make_shared<GPUTexture>(texture, "", false);
-  return JSIConverter<std::shared_ptr<GPUTexture>>::toJSI(runtime, gpuTexture);
+  return std::make_shared<GPUTexture>(texture, "", false);
 }
 
-jsi::Value GPUCanvasContext::present(jsi::Runtime & /*runtime*/,
-                                     const jsi::Value & /*thisValue*/,
-                                     const jsi::Value * /*args*/,
-                                     size_t /*count*/) {
-  // Present is always explicit. It runs synchronously on the calling thread
-  // (the one that did getCurrentTexture / submit), preserving Dawn surface
-  // thread-affinity. Required on every runtime (main JS, UI, dedicated
-  // worklet); offscreen surfaces have no wgpu::Surface so they no-op.
+void GPUCanvasContext::present() {
+  // Present runs synchronously on the calling thread (the one that did
+  // getCurrentTexture / submit), preserving Dawn surface thread-affinity.
+  // Required on every runtime (main JS, Reanimated UI, dedicated worklet);
+  // offscreen surfaces have no wgpu::Surface so they no-op.
   if (_surfaceInfo->hasSurface()) {
     _surfaceInfo->presentFrame();
   }
-  return jsi::Value::undefined();
 }
 
 } // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
index a2e80b7cc..a5efc3c6a 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUCanvasContext.h
@@ -26,7 +26,7 @@ class GPUCanvasContext : public NativeObject<GPUCanvasContext> {
 
   GPUCanvasContext(std::shared_ptr<GPU> gpu, int contextId, int width,
                    int height)
-      : NativeObject(CLASS_NAME), _contextId(contextId), _gpu(std::move(gpu)) {
+      : NativeObject(CLASS_NAME), _gpu(std::move(gpu)) {
     _canvas = std::make_shared<Canvas>(nullptr, width, height);
     auto &registry = rnwgpu::SurfaceRegistry::getInstance();
     _surfaceInfo =
@@ -54,17 +54,13 @@ class GPUCanvasContext : public NativeObject<GPUCanvasContext> {
   inline const wgpu::Surface get() { return nullptr; }
   void configure(std::shared_ptr<GPUCanvasConfiguration> configuration);
   void unconfigure();
-  // Full-control signatures so we can learn the *calling* runtime and decide
-  // how this frame is presented (auto on the JS / UI runtime; explicit
-  // ctx.present() on a dedicated worklet runtime).
-  jsi::Value getCurrentTexture(jsi::Runtime &runtime,
-                               const jsi::Value &thisValue,
-                               const jsi::Value *args, size_t count);
-  jsi::Value present(jsi::Runtime &runtime, const jsi::Value &thisValue,
-                     const jsi::Value *args, size_t count);
+  std::shared_ptr<GPUTexture> getCurrentTexture();
+  // Present is explicit on every runtime (main JS, Reanimated UI, and dedicated
+  // worklet runtimes). It runs synchronously on the calling thread, preserving
+  // Dawn surface thread-affinity; offscreen surfaces no-op.
+  void present();
 
 private:
-  int _contextId;
   std::shared_ptr<Canvas> _canvas;
   std::shared_ptr<SurfaceInfo> _surfaceInfo;
   std::shared_ptr<GPU> _gpu;
diff --git a/packages/webgpu/react-native-wgpu.podspec b/packages/webgpu/react-native-wgpu.podspec
index ac01a3b66..36bfb0b8e 100644
--- a/packages/webgpu/react-native-wgpu.podspec
+++ b/packages/webgpu/react-native-wgpu.podspec
@@ -21,6 +21,10 @@ Pod::Spec.new do |s|
 
   s.vendored_frameworks = 'libs/apple/libwebgpu_dawn.xcframework'
 
+  # The VideoPlayer API uses AVFoundation / CoreMedia, and shared-texture
+  # surfaces use CoreVideo (CVPixelBuffer). Link them so their symbols resolve.
+  s.frameworks = "AVFoundation", "CoreMedia", "CoreVideo"
+
   s.pod_target_xcconfig = {
     'HEADER_SEARCH_PATHS' => '$(PODS_TARGET_SRCROOT)/cpp',
   }

From 9388e92360405587dfa76fc9eabb4f1305223ec2 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Thu, 4 Jun 2026 20:33:11 +0200
Subject: [PATCH 11/25] :wrench:

---
 apps/example/ios/Podfile.lock     | 10 +++++-----
 packages/webgpu-shim/package.json |  2 +-
 packages/webgpu-shim/src/index.ts |  8 ++++++++
 packages/webgpu/package.json      |  2 +-
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/apps/example/ios/Podfile.lock b/apps/example/ios/Podfile.lock
index b4c5f158a..8559c8c27 100644
--- a/apps/example/ios/Podfile.lock
+++ b/apps/example/ios/Podfile.lock
@@ -1924,7 +1924,7 @@ PODS:
     - ReactCommon/turbomodule/core
     - SocketRocket
     - Yoga
-  - react-native-wgpu (0.5.13):
+  - react-native-webgpu (0.5.14):
     - boost
     - DoubleConversion
     - fast_float
@@ -2812,7 +2812,7 @@ DEPENDENCIES:
   - React-microtasksnativemodule (from `../../../node_modules/react-native/ReactCommon/react/nativemodule/microtasks`)
   - react-native-safe-area-context (from `../../../node_modules/react-native-safe-area-context`)
   - "react-native-skia (from `../../../node_modules/@shopify/react-native-skia`)"
-  - react-native-wgpu (from `../../../node_modules/react-native-wgpu`)
+  - react-native-webgpu (from `../../../node_modules/react-native-webgpu`)
   - React-NativeModulesApple (from `../../../node_modules/react-native/ReactCommon/react/nativemodule/core/platform/ios`)
   - React-oscompat (from `../../../node_modules/react-native/ReactCommon/oscompat`)
   - React-perflogger (from `../../../node_modules/react-native/ReactCommon/reactperflogger`)
@@ -2948,8 +2948,8 @@ EXTERNAL SOURCES:
     :path: "../../../node_modules/react-native-safe-area-context"
   react-native-skia:
     :path: "../../../node_modules/@shopify/react-native-skia"
-  react-native-wgpu:
-    :path: "../../../node_modules/react-native-wgpu"
+  react-native-webgpu:
+    :path: "../../../node_modules/react-native-webgpu"
   React-NativeModulesApple:
     :path: "../../../node_modules/react-native/ReactCommon/react/nativemodule/core/platform/ios"
   React-oscompat:
@@ -3074,7 +3074,7 @@ SPEC CHECKSUMS:
   React-microtasksnativemodule: 75b6604b667d297292345302cc5bfb6b6aeccc1b
   react-native-safe-area-context: c00143b4823773bba23f2f19f85663ae89ceb460
   react-native-skia: fc73e9bdc46ebb420a98c9c2be29fee80f565e79
-  react-native-wgpu: 0496e9efeb4c3939ab56371005ede4e1468591d1
+  react-native-webgpu: ea7239ee381b4937d8e971f648cdcf6b9ff4de7e
   React-NativeModulesApple: 879fbdc5dcff7136abceb7880fe8a2022a1bd7c3
   React-oscompat: 93b5535ea7f7dff46aaee4f78309a70979bdde9d
   React-perflogger: 5536d2df3d18fe0920263466f7b46a56351c0510
diff --git a/packages/webgpu-shim/package.json b/packages/webgpu-shim/package.json
index ff69270ce..59d414c1e 100644
--- a/packages/webgpu-shim/package.json
+++ b/packages/webgpu-shim/package.json
@@ -1,6 +1,6 @@
 {
   "name": "react-native-wgpu",
-  "version": "0.5.11",
+  "version": "0.5.14",
   "description": "Shim that re-exports react-native-webgpu under its previous package name",
   "main": "lib/commonjs/index",
   "module": "lib/module/index",
diff --git a/packages/webgpu-shim/src/index.ts b/packages/webgpu-shim/src/index.ts
index 409910e79..7b0014339 100644
--- a/packages/webgpu-shim/src/index.ts
+++ b/packages/webgpu-shim/src/index.ts
@@ -1 +1,9 @@
+if (typeof __DEV__ !== "undefined" && __DEV__) {
+  console.warn(
+    "[react-native-wgpu] This package has been renamed to 'react-native-webgpu'. " +
+      "The 'react-native-wgpu' shim is deprecated and will be removed in a future release. " +
+      "Please install 'react-native-webgpu' and update your imports.",
+  );
+}
+
 export * from "react-native-webgpu";
diff --git a/packages/webgpu/package.json b/packages/webgpu/package.json
index 1e08e05fa..b69932f21 100644
--- a/packages/webgpu/package.json
+++ b/packages/webgpu/package.json
@@ -1,6 +1,6 @@
 {
   "name": "react-native-webgpu",
-  "version": "0.5.13",
+  "version": "0.5.14",
   "description": "React Native WebGPU",
   "main": "lib/commonjs/index",
   "module": "lib/module/index",

From 85341defc6f17db7381da50c99fa7b8d7b7c0f26 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sat, 6 Jun 2026 18:50:48 +0200
Subject: [PATCH 12/25] :wrench:

---
 .../cpp/rnwgpu/async/AsyncTaskHandle.cpp      | 26 ++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)
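
Note on the change below: a settle callback fired on a background worker
thread must only use the thread-safe scheduler. The settlement itself is
queued as a microtask once execution is back on the owning runtime's thread.
The sketch that follows models that two-step hop with a toy run loop. It is an
illustration under stated assumptions, not the library's code: `ToyLoop`,
`post`, `queueMicrotask`, and `settleFromWorker` are hypothetical names that
stand in for RuntimeScheduler::scheduleOnJS and the runtime's queueMicrotask.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single-threaded JS runtime loop.
class ToyLoop {
public:
  // Thread-safe: may be called from any thread (models scheduleOnJS).
  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

  // Loop-thread only (models queueMicrotask): deliberately takes no lock.
  void queueMicrotask(std::function<void()> fn) {
    assert(std::this_thread::get_id() == owner_ && "loop-thread only");
    microtasks_.push_back(std::move(fn));
  }

  // Runs `count` tasks on the calling thread and drains the microtask queue
  // after each task.
  void run(int count) {
    owner_ = std::this_thread::get_id();
    for (int i = 0; i < count; ++i) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
      while (!microtasks_.empty()) {
        std::function<void()> fn = std::move(microtasks_.front());
        microtasks_.pop_front();
        fn();
      }
    }
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  std::deque<std::function<void()>> microtasks_; // touched by owner_ only
  std::thread::id owner_;
};

// Settles from a worker thread and records where each step ran.
std::vector<std::string> settleFromWorker() {
  ToyLoop loop;
  std::vector<std::string> log; // written on the loop thread only
  const std::thread::id loopThread = std::this_thread::get_id();

  std::thread worker([&loop, &log, loopThread] {
    // Step 1: off-thread, so only the thread-safe post() is used.
    loop.post([&loop, &log, loopThread] {
      // Step 2: now on the loop thread, queue the settlement as a microtask.
      log.push_back("task begin");
      loop.queueMicrotask([&log, loopThread] {
        log.push_back(std::this_thread::get_id() == loopThread
                          ? "settled on loop thread"
                          : "settled off-thread");
      });
      log.push_back("task end");
    });
  });

  loop.run(1);
  worker.join();
  return log;
}
```

The worker never touches the microtask queue, so that queue needs no lock. The
settlement also runs only after the task that queued it has finished, which
matches microtask ordering.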

diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
index c0876c1e3..5d5ea195d 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
@@ -84,12 +84,32 @@ void AsyncTaskHandle::State::schedule(Action action) {
     return;
   }
 
+  // The settle callback fires on a GpuEventLoop background worker thread. We
+  // must NOT touch the JS runtime from there: Hermes is single-threaded, so
+  // calling queueMicrotask (or any other JSI op, including Promise.resolve)
+  // off-thread corrupts the heap (crash in GCScope::_newChunkAndPHV). So we
+  // first hop onto the owning runtime's JS thread via the thread-safe
+  // RuntimeScheduler, and only THEN, now safely on that thread, queue the
+  // settlement as a microtask via that runtime's queueMicrotask.
   scheduler->scheduleOnJS([self = shared_from_this(),
                            action = std::move(action),
                            promiseRef](jsi::Runtime &runtime) mutable {
-    action(runtime, *promiseRef);
-    std::lock_guard<std::mutex> lock(self->mutex);
-    self->keepAlive.reset();
+    auto microtask = jsi::Function::createFromHostFunction(
+        runtime, jsi::PropNameID::forUtf8(runtime, "__rnwgpuSettleMicrotask"),
+        0,
+        [self = std::move(self), action = std::move(action), promiseRef](
+            jsi::Runtime &rt, const jsi::Value & /*thisVal*/,
+            const jsi::Value * /*args*/, size_t /*count*/) mutable
+        -> jsi::Value {
+          action(rt, *promiseRef);
+          std::lock_guard<std::mutex> lock(self->mutex);
+          self->keepAlive.reset();
+          return jsi::Value::undefined();
+        });
+
+    runtime.global()
+        .getPropertyAsFunction(runtime, "queueMicrotask")
+        .call(runtime, std::move(microtask));
   });
 }
 

From b298cd175bb155c9d84e12e9a37f9249109c338b Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sat, 6 Jun 2026 18:51:54 +0200
Subject: [PATCH 13/25] :arrow_up:

---
 packages/webgpu-shim/package.json | 2 +-
 packages/webgpu/package.json      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/packages/webgpu-shim/package.json b/packages/webgpu-shim/package.json
index f1b29c1c0..f9318ed6d 100644
--- a/packages/webgpu-shim/package.json
+++ b/packages/webgpu-shim/package.json
@@ -1,6 +1,6 @@
 {
   "name": "react-native-wgpu",
-  "version": "0.5.14",
+  "version": "0.5.15",
   "description": "Shim that re-exports react-native-webgpu under its previous package name",
   "main": "lib/commonjs/index",
   "module": "lib/module/index",
diff --git a/packages/webgpu/package.json b/packages/webgpu/package.json
index b69932f21..ce48f8605 100644
--- a/packages/webgpu/package.json
+++ b/packages/webgpu/package.json
@@ -1,6 +1,6 @@
 {
   "name": "react-native-webgpu",
-  "version": "0.5.14",
+  "version": "0.5.15",
   "description": "React Native WebGPU",
   "main": "lib/commonjs/index",
   "module": "lib/module/index",

From 32907892368d781f3be1fcb8a7b5dbbdda179705 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sat, 6 Jun 2026 19:15:07 +0200
Subject: [PATCH 14/25] :wrench:

---
 apps/example/ios/Podfile.lock                 |   4 +-
 apps/example/src/Reanimated/AsyncBuffer.tsx   | 196 ++++++++++++++++++
 .../Reanimated/AsyncBufferDedicatedThread.tsx |  14 ++
 .../src/Reanimated/AsyncBufferUIThread.tsx    |   8 +
 apps/example/src/Reanimated/List.tsx          |   8 +
 apps/example/src/Reanimated/Routes.ts         |   2 +
 apps/example/src/Reanimated/index.tsx         |  16 ++
 7 files changed, 246 insertions(+), 2 deletions(-)
 create mode 100644 apps/example/src/Reanimated/AsyncBuffer.tsx
 create mode 100644 apps/example/src/Reanimated/AsyncBufferDedicatedThread.tsx
 create mode 100644 apps/example/src/Reanimated/AsyncBufferUIThread.tsx

diff --git a/apps/example/ios/Podfile.lock b/apps/example/ios/Podfile.lock
index 8559c8c27..560141dff 100644
--- a/apps/example/ios/Podfile.lock
+++ b/apps/example/ios/Podfile.lock
@@ -1924,7 +1924,7 @@ PODS:
     - ReactCommon/turbomodule/core
     - SocketRocket
     - Yoga
-  - react-native-webgpu (0.5.14):
+  - react-native-webgpu (0.5.15):
     - boost
     - DoubleConversion
     - fast_float
@@ -3074,7 +3074,7 @@ SPEC CHECKSUMS:
   React-microtasksnativemodule: 75b6604b667d297292345302cc5bfb6b6aeccc1b
   react-native-safe-area-context: c00143b4823773bba23f2f19f85663ae89ceb460
   react-native-skia: fc73e9bdc46ebb420a98c9c2be29fee80f565e79
-  react-native-webgpu: ea7239ee381b4937d8e971f648cdcf6b9ff4de7e
+  react-native-webgpu: 02d51c1d86e4d653de06bdc954d2f693dcead7a5
   React-NativeModulesApple: 879fbdc5dcff7136abceb7880fe8a2022a1bd7c3
   React-oscompat: 93b5535ea7f7dff46aaee4f78309a70979bdde9d
   React-perflogger: 5536d2df3d18fe0920263466f7b46a56351c0510
diff --git a/apps/example/src/Reanimated/AsyncBuffer.tsx b/apps/example/src/Reanimated/AsyncBuffer.tsx
new file mode 100644
index 000000000..18966b581
--- /dev/null
+++ b/apps/example/src/Reanimated/AsyncBuffer.tsx
@@ -0,0 +1,196 @@
+import React, { useEffect, useRef } from "react";
+import { StyleSheet, View } from "react-native";
+import type { CanvasRef, RNCanvasContext } from "react-native-webgpu";
+import { Canvas } from "react-native-webgpu";
+import type { SharedValue } from "react-native-reanimated";
+import { useSharedValue } from "react-native-reanimated";
+
+import { redFragWGSL, triangleVertWGSL } from "../Triangle/triangle";
+
+// The GPU usage / map-mode constants are plain numbers. We resolve them on the
+// JS thread (where the constants are guaranteed to be installed) and pass them
+// into the worklet, so the worklet does not depend on those globals being
+// present on the UI / dedicated runtime.
+interface GPUFlags {
+  COPY_SRC: number;
+  COPY_DST: number;
+  MAP_READ: number;
+  MAP_WRITE: number;
+  MAP_MODE_READ: number;
+}
+
+// A triangle demo that ALSO performs an async GPU readback (buffer.mapAsync)
+// every frame, then presents only after the readback resolves. This makes the
+// behaviour of async WebGPU ops on the runtime visible: if the mapAsync Promise
+// never settles on this runtime, the animation freezes after the first frame.
+export const webGPUAsyncDemo = (
+  runAnimation: SharedValue<boolean>,
+  device: GPUDevice,
+  context: RNCanvasContext,
+  presentationFormat: GPUTextureFormat,
+  flags: GPUFlags,
+) => {
+  "worklet";
+  if (!context) {
+    throw new Error("No context");
+  }
+
+  context.configure({
+    device,
+    format: presentationFormat,
+    alphaMode: "premultiplied",
+  });
+
+  const pipeline = device.createRenderPipeline({
+    layout: "auto",
+    vertex: {
+      module: device.createShaderModule({ code: triangleVertWGSL }),
+      entryPoint: "main",
+    },
+    fragment: {
+      module: device.createShaderModule({ code: redFragWGSL }),
+      entryPoint: "main",
+      targets: [{ format: presentationFormat }],
+    },
+    primitive: { topology: "triangle-list" },
+  });
+
+  const SIZE = 16; // 4 x f32
+  // Reused across frames: we copy 4 floats into this buffer and read them back.
+  const readback = device.createBuffer({
+    size: SIZE,
+    usage: flags.COPY_DST | flags.MAP_READ,
+  });
+
+  let frameId = 0;
+
+  const frame = async () => {
+    frameId += 1;
+    const commandEncoder = device.createCommandEncoder();
+    const textureView = context.getCurrentTexture().createView();
+
+    const time = Date.now() / 1000;
+    const r = (Math.sin(time * 2) + 1) / 2;
+    const g = (Math.sin(time * 1.5 + Math.PI / 3) + 1) / 2;
+    const b = (Math.sin(time + Math.PI / 2) + 1) / 2;
+
+    const passEncoder = commandEncoder.beginRenderPass({
+      colorAttachments: [
+        {
+          view: textureView,
+          clearValue: [r, g, b, 1],
+          loadOp: "clear",
+          storeOp: "store",
+        },
+      ],
+    });
+    passEncoder.setPipeline(pipeline);
+    passEncoder.draw(3);
+    passEncoder.end();
+
+    // Put real data on the GPU so the readback below has something to wait on.
+    const src = device.createBuffer({
+      size: SIZE,
+      usage: flags.COPY_SRC | flags.MAP_WRITE,
+      mappedAtCreation: true,
+    });
+    new Float32Array(src.getMappedRange()).set([frameId, r, g, b]);
+    src.unmap();
+    commandEncoder.copyBufferToBuffer(src, 0, readback, 0, SIZE);
+
+    device.queue.submit([commandEncoder.finish()]);
+
+    // THE ASYNC OP. This Promise must settle on the runtime that is running
+    // this worklet. On the JS thread it does. On the UI / dedicated runtime the
+    // settlement currently routes through the main JS CallInvoker, so this
+    // await may resolve on the wrong runtime (or never here), freezing the loop.
+    console.log(`[asyncBuffer] frame ${frameId}: awaiting mapAsync...`);
+    await readback.mapAsync(flags.MAP_MODE_READ);
+    const data = Array.from(new Float32Array(readback.getMappedRange()));
+    readback.unmap();
+    src.destroy();
+    console.log(`[asyncBuffer] frame ${frameId}: resolved ->`, data);
+
+    // Present only AFTER the async readback resolves, so a stuck await visibly
+    // freezes the animation instead of silently dropping the readback.
+    context.present();
+
+    if (runAnimation.value) {
+      requestAnimationFrame(frame);
+    }
+  };
+  frame();
+};
+
+interface AsyncBufferExampleProps {
+  // Schedules the worklet on a given runtime (e.g. runOnUI for the UI thread,
+  // or runOnRuntime(runtime, ...) for a dedicated worklet runtime).
+  run: (
+    worklet: typeof webGPUAsyncDemo,
+  ) => (
+    runAnimation: SharedValue<boolean>,
+    device: GPUDevice,
+    context: RNCanvasContext,
+    presentationFormat: GPUTextureFormat,
+    flags: GPUFlags,
+  ) => void;
+}
+
+export function AsyncBufferExample({ run }: AsyncBufferExampleProps) {
+  const runAnimation = useSharedValue(true);
+  const ref = useRef<CanvasRef>(null);
+  useEffect(() => {
+    const initWebGPU = async () => {
+      const adapter = await navigator.gpu.requestAdapter();
+      if (!adapter) {
+        console.error("Failed to get GPU adapter");
+        return;
+      }
+      const device = await adapter.requestDevice();
+      if (!device) {
+        console.error("Failed to get GPU device");
+        return;
+      }
+      const ctx = ref.current!.getContext("webgpu");
+      if (!ctx) {
+        console.error("Failed to get GPU canvas context");
+        return;
+      }
+      const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
+      const flags: GPUFlags = {
+        COPY_SRC: GPUBufferUsage.COPY_SRC,
+        COPY_DST: GPUBufferUsage.COPY_DST,
+        MAP_READ: GPUBufferUsage.MAP_READ,
+        MAP_WRITE: GPUBufferUsage.MAP_WRITE,
+        MAP_MODE_READ: GPUMapMode.READ,
+      };
+      // TODO: stop the animation on unmount
+      run(webGPUAsyncDemo)(
+        runAnimation,
+        device,
+        ctx,
+        presentationFormat,
+        flags,
+      );
+    };
+    initWebGPU();
+    return () => {
+      runAnimation.value = false;
+    };
+  });
+  return (
+    <View style={style.container}>
+      <Canvas ref={ref} style={style.webgpu} />
+    </View>
+  );
+}
+
+const style = StyleSheet.create({
+  container: {
+    flex: 1,
+    backgroundColor: "rgb(90, 180, 255)",
+  },
+  webgpu: {
+    flex: 1,
+  },
+});
diff --git a/apps/example/src/Reanimated/AsyncBufferDedicatedThread.tsx b/apps/example/src/Reanimated/AsyncBufferDedicatedThread.tsx
new file mode 100644
index 000000000..78d9efe10
--- /dev/null
+++ b/apps/example/src/Reanimated/AsyncBufferDedicatedThread.tsx
@@ -0,0 +1,14 @@
+import React, { useMemo } from "react";
+import { createWorkletRuntime, runOnRuntime } from "react-native-worklets";
+
+import { AsyncBufferExample } from "./AsyncBuffer";
+
+export const AsyncBufferDedicatedThread = () => {
+  const runtime = useMemo(
+    () => createWorkletRuntime({ name: "WebGPUAsyncBufferRuntime" }),
+    [],
+  );
+  return (
+    <AsyncBufferExample run={(worklet) => runOnRuntime(runtime, worklet)} />
+  );
+};
diff --git a/apps/example/src/Reanimated/AsyncBufferUIThread.tsx b/apps/example/src/Reanimated/AsyncBufferUIThread.tsx
new file mode 100644
index 000000000..c310e07a1
--- /dev/null
+++ b/apps/example/src/Reanimated/AsyncBufferUIThread.tsx
@@ -0,0 +1,8 @@
+import React from "react";
+import { runOnUI } from "react-native-reanimated";
+
+import { AsyncBufferExample } from "./AsyncBuffer";
+
+export const AsyncBufferUIThread = () => {
+  return <AsyncBufferExample run={runOnUI} />;
+};
diff --git a/apps/example/src/Reanimated/List.tsx b/apps/example/src/Reanimated/List.tsx
index 6531786aa..71446fd1d 100644
--- a/apps/example/src/Reanimated/List.tsx
+++ b/apps/example/src/Reanimated/List.tsx
@@ -19,6 +19,14 @@ export const examples = [
     screen: "FrameProcessor",
     title: "📷 Frame Processor",
   },
+  {
+    screen: "AsyncBufferUIThread",
+    title: "🧵 Async Buffer (UI)",
+  },
+  {
+    screen: "AsyncBufferDedicatedThread",
+    title: "🔀 Async Buffer (Dedicated)",
+  },
 ] as const;
 
 const styles = StyleSheet.create({
diff --git a/apps/example/src/Reanimated/Routes.ts b/apps/example/src/Reanimated/Routes.ts
index d39029d66..51fedd064 100644
--- a/apps/example/src/Reanimated/Routes.ts
+++ b/apps/example/src/Reanimated/Routes.ts
@@ -3,4 +3,6 @@ export type Routes = {
   UIThread: undefined;
   DedicatedThread: undefined;
   FrameProcessor: undefined;
+  AsyncBufferUIThread: undefined;
+  AsyncBufferDedicatedThread: undefined;
 };
diff --git a/apps/example/src/Reanimated/index.tsx b/apps/example/src/Reanimated/index.tsx
index 7200678e2..1f2310317 100644
--- a/apps/example/src/Reanimated/index.tsx
+++ b/apps/example/src/Reanimated/index.tsx
@@ -6,6 +6,8 @@ import { List } from "./List";
 import { UIThread } from "./UIThread";
 import { DedicatedThread } from "./DedicatedThread";
 import { FrameProcessor } from "./FrameProcessor";
+import { AsyncBufferUIThread } from "./AsyncBufferUIThread";
+import { AsyncBufferDedicatedThread } from "./AsyncBufferDedicatedThread";
 
 const Stack = createStackNavigator<Routes>();
 export const Reanimated = () => {
@@ -40,6 +42,20 @@ export const Reanimated = () => {
           title: "📷 Frame Processor",
         }}
       />
+      <Stack.Screen
+        name="AsyncBufferUIThread"
+        component={AsyncBufferUIThread}
+        options={{
+          title: "🧵 Async Buffer (UI)",
+        }}
+      />
+      <Stack.Screen
+        name="AsyncBufferDedicatedThread"
+        component={AsyncBufferDedicatedThread}
+        options={{
+          title: "🔀 Async Buffer (Dedicated)",
+        }}
+      />
     </Stack.Navigator>
   );
 };

From 5c1d3528a1e72502531f35120e832620d2a5411d Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sat, 6 Jun 2026 20:53:47 +0200
Subject: [PATCH 15/25] :wrench:

---
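Note for reviewers: the sketch below is a toy TypeScript model of the
per-runtime pump idea this patch moves to, where an async op posted on a
runtime is settled only when that same runtime pumps its events. It is not the
C++ `RuntimeContext` implementation; `ToyRuntimeContext`, `postTask`, `pump`,
and `settledCount` are hypothetical names chosen for illustration, and the
"keep pumping while work is pending" return value is an assumption about how
such a pump might be driven.

```typescript
// Toy model only. Every name here is hypothetical and does not mirror the
// C++ RuntimeContext API touched by this patch.
type Complete<T> = (value: T) => void;

class ToyRuntimeContext {
  // Settlements that have completed on the "GPU" side but not yet been
  // delivered to the owning runtime.
  private ready: Array<() => void> = [];
  private pending = 0;
  // Synchronously observable count of delivered settlements.
  settledCount = 0;

  // Register an async op. `start` receives a `complete` callback that the
  // producer calls when the work is done. The Promise is only resolved later,
  // from pump(), i.e. by whoever owns this context.
  postTask<T>(start: (complete: Complete<T>) => void): Promise<T> {
    this.pending += 1;
    return new Promise<T>((resolve) => {
      start((value) => {
        this.ready.push(() => {
          this.pending -= 1;
          this.settledCount += 1;
          resolve(value);
        });
      });
    });
  }

  // Stand-in for one ProcessEvents tick: deliver everything that completed.
  // Returns true while some posted work is still outstanding.
  pump(): boolean {
    const batch = this.ready;
    this.ready = [];
    for (const settle of batch) {
      settle();
    }
    return this.pending > 0;
  }
}
```

In this model, pumping a different context never settles the op: only the
context that received `postTask` delivers it, which is the property the
"Make JS busy" button in the example is meant to demonstrate.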
 apps/example/src/Reanimated/AsyncBuffer.tsx   | 295 +++++++++++-------
 packages/webgpu/android/CMakeLists.txt        |   2 -
 packages/webgpu/cpp/jsi/NativeObject.h        |  39 +++
 .../webgpu/cpp/rnwgpu/RNWebGPUManager.cpp     |   2 +-
 packages/webgpu/cpp/rnwgpu/api/GPU.cpp        |  29 +-
 packages/webgpu/cpp/rnwgpu/api/GPU.h          |  14 +-
 packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp |   2 +-
 packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp  |   2 +-
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp  |  12 +-
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.h    |   7 +-
 packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp   |   2 +-
 .../webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp |   2 +-
 .../cpp/rnwgpu/async/AsyncTaskHandle.cpp      |  57 ++--
 .../webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h |  15 +-
 .../cpp/rnwgpu/async/CallInvokerScheduler.cpp |  21 --
 .../cpp/rnwgpu/async/CallInvokerScheduler.h   |  32 --
 .../webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp  | 113 -------
 .../webgpu/cpp/rnwgpu/async/GpuEventLoop.h    |  70 -----
 .../cpp/rnwgpu/async/RuntimeContext.cpp       | 110 ++++---
 .../webgpu/cpp/rnwgpu/async/RuntimeContext.h  |  58 ++--
 .../cpp/rnwgpu/async/RuntimeScheduler.h       |  31 --
 21 files changed, 389 insertions(+), 526 deletions(-)
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h
 delete mode 100644 packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h

diff --git a/apps/example/src/Reanimated/AsyncBuffer.tsx b/apps/example/src/Reanimated/AsyncBuffer.tsx
index 18966b581..0b2adf4bf 100644
--- a/apps/example/src/Reanimated/AsyncBuffer.tsx
+++ b/apps/example/src/Reanimated/AsyncBuffer.tsx
@@ -1,5 +1,5 @@
-import React, { useEffect, useRef } from "react";
-import { StyleSheet, View } from "react-native";
+import React, { useEffect, useRef, useState } from "react";
+import { Pressable, StyleSheet, Text, View } from "react-native";
 import type { CanvasRef, RNCanvasContext } from "react-native-webgpu";
 import { Canvas } from "react-native-webgpu";
 import type { SharedValue } from "react-native-reanimated";
@@ -19,14 +19,16 @@ interface GPUFlags {
   MAP_MODE_READ: number;
 }
 
-// A triangle demo that ALSO performs an async GPU readback (buffer.mapAsync)
-// every frame, then presents only after the readback resolves. This makes the
-// behaviour of async WebGPU ops on the runtime visible: if the mapAsync Promise
-// never settles on this runtime, the animation freezes after the first frame.
+// A triangle demo that creates its adapter/device AND performs an async GPU
+// readback (buffer.mapAsync) every frame, all on the runtime this worklet runs
+// on. With the ProcessEvents async model the device must be created and used on
+// the same runtime, so requestAdapter/requestDevice happen here in the worklet
+// (the GPU object is passed in). The point: with the JS thread busy, the readback
+// keeps resolving on this runtime's own thread and the triangle keeps animating.
 export const webGPUAsyncDemo = (
   runAnimation: SharedValue<boolean>,
-  device: GPUDevice,
   context: RNCanvasContext,
+  gpu: GPU,
   presentationFormat: GPUTextureFormat,
   flags: GPUFlags,
 ) => {
@@ -35,91 +37,117 @@ export const webGPUAsyncDemo = (
     throw new Error("No context");
   }
 
-  context.configure({
-    device,
-    format: presentationFormat,
-    alphaMode: "premultiplied",
-  });
-
-  const pipeline = device.createRenderPipeline({
-    layout: "auto",
-    vertex: {
-      module: device.createShaderModule({ code: triangleVertWGSL }),
-      entryPoint: "main",
-    },
-    fragment: {
-      module: device.createShaderModule({ code: redFragWGSL }),
-      entryPoint: "main",
-      targets: [{ format: presentationFormat }],
-    },
-    primitive: { topology: "triangle-list" },
-  });
-
-  const SIZE = 16; // 4 x f32
-  // Reused across frames: we copy 4 floats into this buffer and read them back.
-  const readback = device.createBuffer({
-    size: SIZE,
-    usage: flags.COPY_DST | flags.MAP_READ,
-  });
-
-  let frameId = 0;
-
-  const frame = async () => {
-    frameId += 1;
-    const commandEncoder = device.createCommandEncoder();
-    const textureView = context.getCurrentTexture().createView();
-
-    const time = Date.now() / 1000;
-    const r = (Math.sin(time * 2) + 1) / 2;
-    const g = (Math.sin(time * 1.5 + Math.PI / 3) + 1) / 2;
-    const b = (Math.sin(time + Math.PI / 2) + 1) / 2;
-
-    const passEncoder = commandEncoder.beginRenderPass({
-      colorAttachments: [
-        {
-          view: textureView,
-          clearValue: [r, g, b, 1],
-          loadOp: "clear",
-          storeOp: "store",
-        },
-      ],
+  // Errors thrown on a worklet are forwarded to the JS thread by the worklets
+  // runtime; if the error object transitively references WebGPU host objects,
+  // JSON.stringify of it on the JS side can crash. So we catch everything here
+  // and forward only a plain string.
+  const logError = (where: string, e: unknown) => {
+    console.error(
+      `[asyncBuffer] ${where}: ` +
+        String((e as { message?: string })?.message ?? e),
+    );
+  };
+
+  const run = async () => {
+    const adapter = await gpu.requestAdapter();
+    if (!adapter) {
+      console.error("[asyncBuffer] failed to get adapter on worklet runtime");
+      return;
+    }
+    const device = await adapter.requestDevice();
+    if (!device) {
+      console.error("[asyncBuffer] failed to get device on worklet runtime");
+      return;
+    }
+    console.log("[asyncBuffer] device created on worklet runtime");
+
+    context.configure({
+      device,
+      format: presentationFormat,
+      alphaMode: "premultiplied",
+    });
+
+    const pipeline = device.createRenderPipeline({
+      layout: "auto",
+      vertex: {
+        module: device.createShaderModule({ code: triangleVertWGSL }),
+        entryPoint: "main",
+      },
+      fragment: {
+        module: device.createShaderModule({ code: redFragWGSL }),
+        entryPoint: "main",
+        targets: [{ format: presentationFormat }],
+      },
+      primitive: { topology: "triangle-list" },
     });
-    passEncoder.setPipeline(pipeline);
-    passEncoder.draw(3);
-    passEncoder.end();
 
-    // Put real data on the GPU so the readback below has something to wait on.
-    const src = device.createBuffer({
+    const SIZE = 16; // 4 x f32
+    const readback = device.createBuffer({
       size: SIZE,
-      usage: flags.COPY_SRC | flags.MAP_WRITE,
-      mappedAtCreation: true,
+      usage: flags.COPY_DST | flags.MAP_READ,
     });
-    new Float32Array(src.getMappedRange()).set([frameId, r, g, b]);
-    src.unmap();
-    commandEncoder.copyBufferToBuffer(src, 0, readback, 0, SIZE);
-
-    device.queue.submit([commandEncoder.finish()]);
-
-    // THE ASYNC OP. This Promise must settle on the runtime that is running
-    // this worklet. On the JS thread it does. On the UI / dedicated runtime the
-    // settlement currently routes through the main JS CallInvoker, so this
-    // await may resolve on the wrong runtime (or never here), freezing the loop.
-    console.log(`[asyncBuffer] frame ${frameId}: awaiting mapAsync...`);
-    await readback.mapAsync(flags.MAP_MODE_READ);
-    const data = Array.from(new Float32Array(readback.getMappedRange()));
-    readback.unmap();
-    src.destroy();
-    console.log(`[asyncBuffer] frame ${frameId}: resolved ->`, data);
-
-    // Present only AFTER the async readback resolves, so a stuck await visibly
-    // freezes the animation instead of silently dropping the readback.
-    context.present();
-
-    if (runAnimation.value) {
-      requestAnimationFrame(frame);
-    }
+
+    let frameId = 0;
+
+    const frame = async () => {
+      try {
+        frameId += 1;
+        const commandEncoder = device.createCommandEncoder();
+        const textureView = context.getCurrentTexture().createView();
+
+        const time = Date.now() / 1000;
+        const r = (Math.sin(time * 2) + 1) / 2;
+        const g = (Math.sin(time * 1.5 + Math.PI / 3) + 1) / 2;
+        const b = (Math.sin(time + Math.PI / 2) + 1) / 2;
+
+        const passEncoder = commandEncoder.beginRenderPass({
+          colorAttachments: [
+            {
+              view: textureView,
+              clearValue: [r, g, b, 1],
+              loadOp: "clear",
+              storeOp: "store",
+            },
+          ],
+        });
+        passEncoder.setPipeline(pipeline);
+        passEncoder.draw(3);
+        passEncoder.end();
+
+        const src = device.createBuffer({
+          size: SIZE,
+          usage: flags.COPY_SRC | flags.MAP_WRITE,
+          mappedAtCreation: true,
+        });
+        new Float32Array(src.getMappedRange()).set([frameId, r, g, b]);
+        src.unmap();
+        commandEncoder.copyBufferToBuffer(src, 0, readback, 0, SIZE);
+
+        device.queue.submit([commandEncoder.finish()]);
+
+        // THE ASYNC OP. With the ProcessEvents model this Promise is pumped and
+        // settled on THIS runtime's own thread, so it resolves even while the
+        // JS thread is busy. Watch the logs against the "Make JS busy" button.
+        await readback.mapAsync(flags.MAP_MODE_READ);
+        const data = Array.from(new Float32Array(readback.getMappedRange()));
+        readback.unmap();
+        src.destroy();
+        if (frameId % 30 === 0) {
+          console.log(`[asyncBuffer] frame ${frameId} resolved ->`, data);
+        }
+
+        context.present();
+
+        if (runAnimation.value) {
+          requestAnimationFrame(frame);
+        }
+      } catch (e) {
+        logError("frame", e);
+      }
+    };
+    frame();
   };
-  frame();
+  run().catch((e) => logError("run", e));
 };
 
 interface AsyncBufferExampleProps {
@@ -129,8 +157,8 @@ interface AsyncBufferExampleProps {
     worklet: typeof webGPUAsyncDemo,
   ) => (
     runAnimation: SharedValue<boolean>,
-    device: GPUDevice,
     context: RNCanvasContext,
+    gpu: GPU,
     presentationFormat: GPUTextureFormat,
     flags: GPUFlags,
   ) => void;
@@ -139,48 +167,59 @@ interface AsyncBufferExampleProps {
 export function AsyncBufferExample({ run }: AsyncBufferExampleProps) {
   const runAnimation = useSharedValue(true);
   const ref = useRef<CanvasRef>(null);
+  const [busy, setBusy] = useState(false);
+
+  // Hammer the JS thread to prove the worklet's async readback + rendering are
+  // independent of it. Each tick blocks the JS thread for 250ms.
   useEffect(() => {
-    const initWebGPU = async () => {
-      const adapter = await navigator.gpu.requestAdapter();
-      if (!adapter) {
-        console.error("Failed to get GPU adapter");
-        return;
-      }
-      const device = await adapter.requestDevice();
-      if (!device) {
-        console.error("Failed to get GPU device");
-        return;
-      }
-      const ctx = ref.current!.getContext("webgpu");
-      if (!ctx) {
-        console.error("Failed to get GPU canvas context");
-        return;
+    if (!busy) {
+      return;
+    }
+    let job = requestAnimationFrame(function work() {
+      const start = Date.now();
+      while (Date.now() - start < 250) {
+        // Busy-wait, blocking the JS thread.
       }
-      const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
-      const flags: GPUFlags = {
-        COPY_SRC: GPUBufferUsage.COPY_SRC,
-        COPY_DST: GPUBufferUsage.COPY_DST,
-        MAP_READ: GPUBufferUsage.MAP_READ,
-        MAP_WRITE: GPUBufferUsage.MAP_WRITE,
-        MAP_MODE_READ: GPUMapMode.READ,
-      };
-      // TODO: stop the animation on unmount
-      run(webGPUAsyncDemo)(
-        runAnimation,
-        device,
-        ctx,
-        presentationFormat,
-        flags,
-      );
+      job = requestAnimationFrame(work);
+    });
+    return () => cancelAnimationFrame(job);
+  }, [busy]);
+
+  useEffect(() => {
+    const ctx = ref.current!.getContext("webgpu");
+    if (!ctx) {
+      console.error("Failed to get GPU canvas context");
+      return;
+    }
+    // The GPU object is created on the main runtime; we hand it to the worklet,
+    // which calls requestAdapter/requestDevice on its OWN runtime.
+    const gpu = navigator.gpu;
+    const presentationFormat = gpu.getPreferredCanvasFormat();
+    const flags: GPUFlags = {
+      COPY_SRC: GPUBufferUsage.COPY_SRC,
+      COPY_DST: GPUBufferUsage.COPY_DST,
+      MAP_READ: GPUBufferUsage.MAP_READ,
+      MAP_WRITE: GPUBufferUsage.MAP_WRITE,
+      MAP_MODE_READ: GPUMapMode.READ,
     };
-    initWebGPU();
+    run(webGPUAsyncDemo)(runAnimation, ctx, gpu, presentationFormat, flags);
     return () => {
       runAnimation.value = false;
     };
-  });
+    // Init the GPU pipeline once on mount. Toggling `busy` must NOT re-run this
+    // (a second device + render loop would fight over the same surface and
+    // trigger a device-mismatch validation error).
+    // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, []);
+
   return (
     <View style={style.container}>
       <Canvas ref={ref} style={style.webgpu} />
+      <Pressable style={style.button} onPress={() => setBusy((b) => !b)}>
+        <Text style={style.buttonText}>
+          {busy ? "Stop busy JS" : "Make JS busy"}
+        </Text>
+      </Pressable>
     </View>
   );
 }
@@ -193,4 +232,18 @@ const style = StyleSheet.create({
   webgpu: {
     flex: 1,
   },
+  button: {
+    position: "absolute",
+    bottom: 32,
+    alignSelf: "center",
+    backgroundColor: "rgba(0,0,0,0.6)",
+    paddingHorizontal: 20,
+    paddingVertical: 12,
+    borderRadius: 24,
+  },
+  buttonText: {
+    color: "white",
+    fontSize: 16,
+    fontWeight: "600",
+  },
 });
diff --git a/packages/webgpu/android/CMakeLists.txt b/packages/webgpu/android/CMakeLists.txt
index 35fc9b50f..8f7321b7f 100644
--- a/packages/webgpu/android/CMakeLists.txt
+++ b/packages/webgpu/android/CMakeLists.txt
@@ -53,8 +53,6 @@ add_library(${PACKAGE_NAME} SHARED
     ../cpp/jsi/RuntimeAwareCache.cpp
     ../cpp/rnwgpu/async/RuntimeContext.cpp
     ../cpp/rnwgpu/async/AsyncTaskHandle.cpp
-    ../cpp/rnwgpu/async/CallInvokerScheduler.cpp
-    ../cpp/rnwgpu/async/GpuEventLoop.cpp
 )
 
 target_include_directories(
diff --git a/packages/webgpu/cpp/jsi/NativeObject.h b/packages/webgpu/cpp/jsi/NativeObject.h
index a90927721..d3e09ed5c 100644
--- a/packages/webgpu/cpp/jsi/NativeObject.h
+++ b/packages/webgpu/cpp/jsi/NativeObject.h
@@ -439,6 +439,29 @@ class NativeObject : public jsi::NativeState,
     prototype.setProperty(runtime, name, func);
   }
 
+  /**
+   * Install a method whose native implementation needs the calling jsi::Runtime
+   * as its first parameter. Used by entry points that must act per-runtime
+   * (e.g. GPU::requestAdapter, which creates a per-runtime RuntimeContext).
+   */
+  template <typename ReturnType, typename... Args>
+  static void
+  installMethodWithRuntime(jsi::Runtime &runtime, jsi::Object &prototype,
+                           const char *name,
+                           ReturnType (Derived::*method)(jsi::Runtime &,
+                                                         Args...)) {
+    auto func = jsi::Function::createFromHostFunction(
+        runtime, jsi::PropNameID::forUtf8(runtime, name), sizeof...(Args),
+        [method](jsi::Runtime &rt, const jsi::Value &thisVal,
+                 const jsi::Value *args, size_t count) -> jsi::Value {
+          auto native = Derived::fromValue(rt, thisVal);
+          return callMethodWithRuntime(native.get(), method, rt, args,
+                                       std::index_sequence_for<Args...>{},
+                                       count);
+        });
+    prototype.setProperty(runtime, name, func);
+  }
+
   /**
    * Install a getter on the prototype.
    */
@@ -574,6 +597,22 @@ class NativeObject : public jsi::NativeState,
   }
 
 private:
+  // Helper to call a method that takes the calling jsi::Runtime as its first
+  // parameter, with JSI argument conversion for the rest and JSI conversion of
+  // the result.
+  template <typename ReturnType, typename... Args, size_t... Is>
+  static jsi::Value
+  callMethodWithRuntime(Derived *obj,
+                        ReturnType (Derived::*method)(jsi::Runtime &, Args...),
+                        jsi::Runtime &runtime, const jsi::Value *args,
+                        std::index_sequence<Is...>, size_t count) {
+    ReturnType result = (obj->*method)(
+        runtime, rnwgpu::JSIConverter<std::decay_t<Args>>::fromJSI(
+                     runtime, args[Is], Is >= count)...);
+    return rnwgpu::JSIConverter<std::decay_t<ReturnType>>::toJSI(
+        runtime, std::move(result));
+  }
+
   // Helper to call a method with JSI argument conversion
   template <typename ReturnType, typename... Args, size_t... Is>
   static jsi::Value callMethod(Derived *obj,
diff --git a/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp b/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
index 8868f3d8a..a9f2c2fb7 100644
--- a/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
+++ b/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
@@ -65,7 +65,7 @@ RNWebGPUManager::RNWebGPUManager(
   // Register main runtime for RuntimeAwareCache
   BaseRuntimeAwareCache::setMainJsRuntime(_jsRuntime);
 
-  auto gpu = std::make_shared<GPU>(*_jsRuntime, _jsCallInvoker);
+  auto gpu = std::make_shared<GPU>(*_jsRuntime);
   auto rnWebGPU =
       std::make_shared<RNWebGPU>(gpu, _platformContext, _jsCallInvoker);
   _gpu = gpu->get();
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
index 36434e0ee..902ce141e 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
@@ -9,14 +9,11 @@
 
 #include "Convertors.h"
 #include "JSIConverter.h"
-#include "rnwgpu/async/CallInvokerScheduler.h"
-#include "rnwgpu/async/GpuEventLoop.h"
+#include "rnwgpu/async/RuntimeContext.h"
 
 namespace rnwgpu {
 
-GPU::GPU(jsi::Runtime &runtime,
-         std::shared_ptr<facebook::react::CallInvoker> callInvoker)
-    : NativeObject(CLASS_NAME) {
+GPU::GPU(jsi::Runtime & /*runtime*/) : NativeObject(CLASS_NAME) {
   static const auto kTimedWaitAny = wgpu::InstanceFeatureName::TimedWaitAny;
   wgpu::InstanceDescriptor instanceDesc{.requiredFeatureCount = 1,
                                         .requiredFeatures = &kTimedWaitAny};
@@ -51,15 +48,10 @@ GPU::GPU(jsi::Runtime &runtime,
   instanceDesc.nextInChain = &toggles;
 
   _instance = wgpu::CreateInstance(&instanceDesc);
-
-  auto scheduler =
-      std::make_shared<async::CallInvokerScheduler>(std::move(callInvoker));
-  auto eventLoop = std::make_shared<async::GpuEventLoop>(_instance);
-  _async = async::RuntimeContext::getOrCreate(runtime, std::move(scheduler),
-                                              std::move(eventLoop));
 }
 
 async::AsyncTaskHandle GPU::requestAdapter(
+    jsi::Runtime &runtime,
     std::optional<std::shared_ptr<GPURequestAdapterOptions>> options) {
   wgpu::RequestAdapterOptions aOptions;
   Convertor conv;
@@ -72,13 +64,18 @@ async::AsyncTaskHandle GPU::requestAdapter(
   constexpr auto kDefaultBackendType = wgpu::BackendType::Vulkan;
 #endif
   aOptions.backendType = kDefaultBackendType;
-  return _async->postTask(
-      [this, aOptions](const async::AsyncTaskHandle::ResolveFunction &resolve,
-                       const async::AsyncTaskHandle::RejectFunction &reject)
+
+  // Per-runtime context: async ops requested on this runtime resolve on this
+  // runtime's own thread (via its ProcessEvents pump).
+  auto context = async::RuntimeContext::getOrCreate(runtime, _instance);
+  return context->postTask(
+      [this, aOptions,
+       context](const async::AsyncTaskHandle::ResolveFunction &resolve,
+                const async::AsyncTaskHandle::RejectFunction &reject)
           -> wgpu::Future {
         return _instance.RequestAdapter(
-            &aOptions, wgpu::CallbackMode::WaitAnyOnly,
-            [context = _async, resolve,
+            &aOptions, wgpu::CallbackMode::AllowProcessEvents,
+            [context, resolve,
              reject](wgpu::RequestAdapterStatus status, wgpu::Adapter adapter,
                      wgpu::StringView message) {
               if (message.length) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.h b/packages/webgpu/cpp/rnwgpu/api/GPU.h
index e7dc15caf..f42589fc7 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.h
@@ -19,10 +19,6 @@
 
 #include <webgpu/webgpu.h>
 
-namespace facebook::react {
-class CallInvoker;
-} // namespace facebook::react
-
 namespace rnwgpu {
 
 namespace jsi = facebook::jsi;
@@ -31,13 +27,15 @@ class GPU : public NativeObject<GPU> {
 public:
   static constexpr const char *CLASS_NAME = "GPU";
 
-  GPU(jsi::Runtime &runtime,
-      std::shared_ptr<facebook::react::CallInvoker> callInvoker);
+  explicit GPU(jsi::Runtime &runtime);
 
 public:
   std::string getBrand() { return CLASS_NAME; }
 
+  // requestAdapter needs the calling runtime so each runtime gets its own
+  // RuntimeContext (and ProcessEvents pump on its own thread).
   async::AsyncTaskHandle requestAdapter(
+      jsi::Runtime &runtime,
       std::optional<std::shared_ptr<GPURequestAdapterOptions>> options);
   wgpu::TextureFormat getPreferredCanvasFormat();
 
@@ -45,7 +43,8 @@ class GPU : public NativeObject<GPU> {
 
   static void definePrototype(jsi::Runtime &runtime, jsi::Object &prototype) {
     installGetter(runtime, prototype, "__brand", &GPU::getBrand);
-    installMethod(runtime, prototype, "requestAdapter", &GPU::requestAdapter);
+    installMethodWithRuntime(runtime, prototype, "requestAdapter",
+                             &GPU::requestAdapter);
     installMethod(runtime, prototype, "getPreferredCanvasFormat",
                   &GPU::getPreferredCanvasFormat);
     installGetter(runtime, prototype, "wgslLanguageFeatures",
@@ -56,7 +55,6 @@ class GPU : public NativeObject<GPU> {
 
 private:
   wgpu::Instance _instance;
-  std::shared_ptr<async::RuntimeContext> _async;
 };
 
 } // namespace rnwgpu
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
index 130b00622..ebe84690b 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
@@ -164,7 +164,7 @@ async::AsyncTaskHandle GPUAdapter::requestDevice(
           deviceDesc.nextInChain = &toggles;
         }
         return _instance.RequestDevice(
-            &deviceDesc, wgpu::CallbackMode::WaitAnyOnly,
+            &deviceDesc, wgpu::CallbackMode::AllowProcessEvents,
             [context = _async, resolve, reject, label, creationRuntime,
              deviceLostBinding](wgpu::RequestDeviceStatus status,
                                 wgpu::Device device, wgpu::StringView message) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
index a53d97940..1938b72d7 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
@@ -57,7 +57,7 @@ async::AsyncTaskHandle GPUBuffer::mapAsync(uint64_t modeIn,
                   const async::AsyncTaskHandle::RejectFunction &reject)
           -> wgpu::Future {
         return bufferHandle.MapAsync(
-            mode, resolvedOffset, rangeSize, wgpu::CallbackMode::WaitAnyOnly,
+            mode, resolvedOffset, rangeSize, wgpu::CallbackMode::AllowProcessEvents,
             [resolve, reject](wgpu::MapAsyncStatus status,
                               wgpu::StringView message) {
               switch (status) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
index ae01b0eab..346d342c0 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
@@ -366,7 +366,7 @@ async::AsyncTaskHandle GPUDevice::createComputePipelineAsync(
                                   &reject) -> wgpu::Future {
     (void)descriptor;
     return device.CreateComputePipelineAsync(
-        &desc, wgpu::CallbackMode::WaitAnyOnly,
+        &desc, wgpu::CallbackMode::AllowProcessEvents,
         [pipelineHolder, resolve,
          reject](wgpu::CreatePipelineAsyncStatus status,
                  wgpu::ComputePipeline pipeline, wgpu::StringView msg) {
@@ -408,7 +408,7 @@ async::AsyncTaskHandle GPUDevice::createRenderPipelineAsync(
                                   &reject) -> wgpu::Future {
     (void)descriptor;
     return device.CreateRenderPipelineAsync(
-        &desc, wgpu::CallbackMode::WaitAnyOnly,
+        &desc, wgpu::CallbackMode::AllowProcessEvents,
         [pipelineHolder, resolve,
          reject](wgpu::CreatePipelineAsyncStatus status,
                  wgpu::RenderPipeline pipeline, wgpu::StringView msg) {
@@ -439,7 +439,7 @@ async::AsyncTaskHandle GPUDevice::popErrorScope() {
                                    const async::AsyncTaskHandle::RejectFunction
                                        &reject) -> wgpu::Future {
     return device.PopErrorScope(
-        wgpu::CallbackMode::WaitAnyOnly,
+        wgpu::CallbackMode::AllowProcessEvents,
         [resolve, reject](wgpu::PopErrorScopeStatus status,
                           wgpu::ErrorType type, wgpu::StringView message) {
           if (status == wgpu::PopErrorScopeStatus::Error ||
@@ -528,7 +528,8 @@ async::AsyncTaskHandle GPUDevice::getLost() {
           });
           // No Dawn event to wait on: resolved synchronously.
           return wgpu::Future{};
-        });
+        },
+        /*keepPumping=*/false);
   }
 
   auto handle = _async->postTask(
@@ -546,7 +547,8 @@ async::AsyncTaskHandle GPUDevice::getLost() {
         // Resolved later from notifyDeviceLost(); no Dawn event to wait on.
         _lostResolve = resolve;
         return wgpu::Future{};
-      });
+      },
+      /*keepPumping=*/false);
 
   _lostHandle = handle;
   return handle;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
index 80b02bfd8..c4237a653 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
@@ -258,9 +258,10 @@ class GPUDevice : public NativeObject<GPUDevice> {
   wgpu::Device _instance;
   std::shared_ptr<async::RuntimeContext> _async;
   std::string _label;
-  // Guards the device-lost state below. notifyDeviceLost() may run on a
-  // GpuEventLoop worker thread (the device-lost callback is Spontaneous), while
-  // getLost() runs on the JS thread, so these fields need synchronization.
+  // Guards the device-lost state below. In the ProcessEvents model both
+  // notifyDeviceLost() (fired by Dawn during ProcessEvents) and getLost() run on
+  // the owning runtime's own thread, but device destruction can also trigger
+  // notifyDeviceLost() synchronously, so the mutex keeps these fields safe.
   std::mutex _lostMutex;
   std::optional<async::AsyncTaskHandle> _lostHandle;
   std::shared_ptr<GPUDeviceLostInfo> _lostInfo;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
index 9b3365d69..cac79ca5b 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
@@ -85,7 +85,7 @@ async::AsyncTaskHandle GPUQueue::onSubmittedWorkDone() {
               const async::AsyncTaskHandle::RejectFunction &reject)
           -> wgpu::Future {
         return queue.OnSubmittedWorkDone(
-            wgpu::CallbackMode::WaitAnyOnly,
+            wgpu::CallbackMode::AllowProcessEvents,
             [resolve, reject](wgpu::QueueWorkDoneStatus status,
                               wgpu::StringView message) {
               if (status == wgpu::QueueWorkDoneStatus::Success) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
index 5ac6d3634..de6d73e6f 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
@@ -16,7 +16,7 @@ async::AsyncTaskHandle GPUShaderModule::getCompilationInfo() {
           -> wgpu::Future {
         auto result = std::make_shared<GPUCompilationInfo>();
         return module.GetCompilationInfo(
-            wgpu::CallbackMode::WaitAnyOnly,
+            wgpu::CallbackMode::AllowProcessEvents,
             [result, resolve,
              reject](wgpu::CompilationInfoRequestStatus status,
                      const wgpu::CompilationInfo *compilationInfo) {
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
index 5d5ea195d..99b928a33 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
@@ -5,6 +5,7 @@
 #include <utility>
 
 #include "Promise.h"
+#include "RuntimeContext.h"
 
 namespace rnwgpu::async {
 
@@ -12,8 +13,8 @@ using Action = std::function<void(jsi::Runtime &, rnwgpu::Promise &)>;
 
 struct AsyncTaskHandle::State
     : public std::enable_shared_from_this<AsyncTaskHandle::State> {
-  explicit State(std::shared_ptr<RuntimeScheduler> scheduler)
-      : scheduler(std::move(scheduler)) {}
+  State(std::shared_ptr<RuntimeContext> context, bool keepPumping)
+      : context(std::move(context)), keepPumping(keepPumping) {}
 
   void settle(Action action);
   void attachPromise(const std::shared_ptr<rnwgpu::Promise> &promise);
@@ -25,7 +26,8 @@ struct AsyncTaskHandle::State
   std::shared_ptr<rnwgpu::Promise> currentPromise();
 
   std::mutex mutex;
-  std::shared_ptr<RuntimeScheduler> scheduler;
+  std::shared_ptr<RuntimeContext> context;
+  bool keepPumping;
   std::shared_ptr<rnwgpu::Promise> promise;
   std::optional<Action> pendingAction;
   bool settled = false;
@@ -75,42 +77,24 @@ void AsyncTaskHandle::State::attachPromise(
 }
 
 void AsyncTaskHandle::State::schedule(Action action) {
-  if (!scheduler) {
-    return;
-  }
-
   auto promiseRef = currentPromise();
   if (!promiseRef) {
     return;
   }
 
-  // The settle callback fires on a GpuEventLoop background worker thread. We
-  // must NOT touch the JS runtime from there: Hermes is single-threaded, so
-  // calling queueMicrotask (or any other JSI op, including Promise.resolve)
-  // off-thread corrupts the heap (crash in GCScope::_newChunkAndPHV). So we
-  // first hop onto the owning runtime's JS thread via the thread-safe
-  // RuntimeScheduler, and only THEN, now safely on that thread, queue the
-  // settlement as a microtask via that runtime's queueMicrotask.
-  scheduler->scheduleOnJS([self = shared_from_this(),
-                           action = std::move(action),
-                           promiseRef](jsi::Runtime &runtime) mutable {
-    auto microtask = jsi::Function::createFromHostFunction(
-        runtime, jsi::PropNameID::forUtf8(runtime, "__rnwgpuSettleMicrotask"),
-        0,
-        [self = std::move(self), action = std::move(action), promiseRef](
-            jsi::Runtime &rt, const jsi::Value & /*thisVal*/,
-            const jsi::Value * /*args*/, size_t /*count*/) mutable
-        -> jsi::Value {
-          action(rt, *promiseRef);
-          std::lock_guard<std::mutex> lock(self->mutex);
-          self->keepAlive.reset();
-          return jsi::Value::undefined();
-        });
-
-    runtime.global()
-        .getPropertyAsFunction(runtime, "queueMicrotask")
-        .call(runtime, std::move(microtask));
-  });
+  // The resolve/reject callback is invoked on the owning runtime's own thread:
+  // either synchronously from instance.ProcessEvents() during the
+  // RuntimeContext tick, or synchronously from postTask (immediate
+  // resolution / exception path). So we settle the Promise directly here, with
+  // no cross-thread hop and no microtask trampoline.
+  action(promiseRef->runtime, *promiseRef);
+
+  if (context) {
+    context->onTaskSettled(keepPumping);
+  }
+
+  std::lock_guard<std::mutex> lock(mutex);
+  keepAlive.reset();
 }
 
 AsyncTaskHandle::ResolveFunction
@@ -159,8 +143,9 @@ AsyncTaskHandle::AsyncTaskHandle(std::shared_ptr<State> state)
 bool AsyncTaskHandle::valid() const { return _state != nullptr; }
 
 AsyncTaskHandle
-AsyncTaskHandle::create(const std::shared_ptr<RuntimeScheduler> &scheduler) {
-  auto state = std::make_shared<State>(scheduler);
+AsyncTaskHandle::create(const std::shared_ptr<RuntimeContext> &context,
+                        bool keepPumping) {
+  auto state = std::make_shared<State>(context, keepPumping);
   state->keepAlive = state;
   return AsyncTaskHandle(std::move(state));
 }
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
index e3a224563..fea16c0f6 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.h
@@ -8,21 +8,22 @@
 
 #include <jsi/jsi.h>
 
-#include "RuntimeScheduler.h"
-
 namespace rnwgpu {
 class Promise;
 }
 
 namespace rnwgpu::async {
 
+class RuntimeContext;
+
 /**
  * Represents a pending asynchronous WebGPU operation that can be converted into
  * a JavaScript Promise.
  *
- * The native callback (resolve/reject) may be invoked from any thread (e.g. a
- * GpuEventLoop worker); the actual Promise settlement is marshalled onto the
- * owning runtime's JS thread via a RuntimeScheduler.
+ * In the ProcessEvents model the resolve/reject callbacks are invoked on the
+ * owning runtime's own thread (synchronously from instance.ProcessEvents()
+ * during the RuntimeContext tick, or synchronously from postTask), so the
+ * Promise is settled directly without any thread marshalling.
  */
 class AsyncTaskHandle {
 public:
@@ -47,8 +48,8 @@ class AsyncTaskHandle {
 
   void attachPromise(const std::shared_ptr<rnwgpu::Promise> &promise) const;
 
-  static AsyncTaskHandle
-  create(const std::shared_ptr<RuntimeScheduler> &scheduler);
+  static AsyncTaskHandle create(const std::shared_ptr<RuntimeContext> &context,
+                                bool keepPumping);
 
 private:
   std::shared_ptr<State> _state;
diff --git a/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp b/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp
deleted file mode 100644
index 2ef72f407..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.cpp
+++ /dev/null
@@ -1,21 +0,0 @@
-#include "CallInvokerScheduler.h"
-
-#include <memory>
-#include <utility>
-
-namespace rnwgpu::async {
-
-CallInvokerScheduler::CallInvokerScheduler(
-    std::shared_ptr<react::CallInvoker> invoker)
-    : _invoker(std::move(invoker)) {}
-
-void CallInvokerScheduler::scheduleOnJS(
-    std::function<void(jsi::Runtime &)> job) {
-  if (!_invoker || !job) {
-    return;
-  }
-  _invoker->invokeAsync(
-      [job = std::move(job)](jsi::Runtime &runtime) { job(runtime); });
-}
-
-} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h b/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h
deleted file mode 100644
index cbb6a9174..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/CallInvokerScheduler.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#pragma once
-
-#include <functional>
-#include <memory>
-
-#include <ReactCommon/CallInvoker.h>
-#include <jsi/jsi.h>
-
-#include "RuntimeScheduler.h"
-
-namespace rnwgpu::async {
-
-namespace jsi = facebook::jsi;
-namespace react = facebook::react;
-
-/**
- * RuntimeScheduler for the main React Native JS runtime, backed by
- * react::CallInvoker::invokeAsync. invokeAsync is safe to call from any thread
- * and delivers the work on the JS thread with the runtime, which is exactly the
- * contract RuntimeScheduler requires.
- */
-class CallInvokerScheduler final : public RuntimeScheduler {
-public:
-  explicit CallInvokerScheduler(std::shared_ptr<react::CallInvoker> invoker);
-
-  void scheduleOnJS(std::function<void(jsi::Runtime &)> job) override;
-
-private:
-  std::shared_ptr<react::CallInvoker> _invoker;
-};
-
-} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
deleted file mode 100644
index 2bd643b39..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.cpp
+++ /dev/null
@@ -1,113 +0,0 @@
-#include "GpuEventLoop.h"
-
-#include <algorithm>
-#include <cstdint>
-#include <thread>
-#include <utility>
-
-#include "WGPULogger.h"
-
-namespace rnwgpu::async {
-
-namespace {
-constexpr const char *TAG = "GpuEventLoop";
-
-std::size_t computeMaxWorkers() {
-  unsigned int hw = std::thread::hardware_concurrency();
-  if (hw == 0) {
-    hw = 4;
-  }
-  // A small bounded pool: enough to overlap the handful of async GPU ops that
-  // are realistically in flight at once, without spawning unbounded threads.
-  return std::max<std::size_t>(2, std::min<std::size_t>(8, hw));
-}
-} // namespace
-
-GpuEventLoop::GpuEventLoop(wgpu::Instance instance)
-    : _state(std::make_shared<State>(std::move(instance))) {
-  _state->maxWorkers = computeMaxWorkers();
-  Logger::logToConsole("[%s] Created (maxWorkers=%zu)", TAG,
-                       _state->maxWorkers);
-}
-
-GpuEventLoop::~GpuEventLoop() {
-  {
-    std::lock_guard<std::mutex> lock(_state->mutex);
-    _state->running.store(false, std::memory_order_release);
-  }
-  // Wake idle workers so they can observe !running and exit. Workers that are
-  // currently blocked in WaitAny keep the shared State (and its wgpu::Instance
-  // ref) alive until their future completes, then exit; we intentionally do not
-  // join here to avoid blocking teardown on in-flight GPU work.
-  _state->cv.notify_all();
-}
-
-void GpuEventLoop::addFuture(wgpu::Future future) {
-  if (future.id == 0) {
-    // No event to wait on (deferred/immediate resolution). The callback path
-    // settles the promise without involving the event loop.
-    return;
-  }
-
-  std::lock_guard<std::mutex> lock(_state->mutex);
-  if (!_state->running.load(std::memory_order_acquire)) {
-    return;
-  }
-
-  _state->queue.push(future);
-
-  // Grow the pool if every worker is busy and we are still under the cap;
-  // otherwise wake an idle worker. A freshly spawned worker picks the job up
-  // via the queue-non-empty predicate, so it needs no separate notify.
-  if (_state->idleWorkers == 0 && _state->totalWorkers < _state->maxWorkers) {
-    _state->totalWorkers++;
-    std::thread(&GpuEventLoop::worker, _state).detach();
-    Logger::logToConsole("[%s] grew pool to %zu worker(s)", TAG,
-                         _state->totalWorkers);
-  } else {
-    _state->cv.notify_one();
-  }
-}
-
-void GpuEventLoop::worker(std::shared_ptr<State> state) {
-  for (;;) {
-    wgpu::Future future{};
-    {
-      std::unique_lock<std::mutex> lock(state->mutex);
-      state->idleWorkers++;
-      state->cv.wait(lock, [&state] {
-        return !state->running.load(std::memory_order_acquire) ||
-               !state->queue.empty();
-      });
-      state->idleWorkers--;
-
-      if (state->queue.empty()) {
-        // Only happens when shutting down.
-        state->totalWorkers--;
-        return;
-      }
-
-      future = state->queue.front();
-      state->queue.pop();
-    }
-
-    // Single-future wait: always a legal single-source WaitAny. Blocks with no
-    // CPU cost until the GPU work completes, at which point Dawn invokes the
-    // future's callback on this thread (it then marshals back to the owning
-    // runtime via its RuntimeScheduler).
-    auto status = state->instance.WaitAny(future, UINT64_MAX);
-    if (status != wgpu::WaitStatus::Success) {
-      // With an infinite timeout on a single future this is not expected. If it
-      // happens, Dawn did not invoke the future's callback, so the associated
-      // JS Promise will never settle. Log it so the otherwise-silent hang is at
-      // least observable.
-      Logger::logToConsole(
-          "[%s] WaitAny returned non-success status %u for future %llu; its "
-          "Promise will not settle.",
-          TAG, static_cast<unsigned int>(status),
-          static_cast<unsigned long long>(future.id));
-    }
-  }
-}
-
-} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h b/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h
deleted file mode 100644
index 07e90cd98..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/GpuEventLoop.h
+++ /dev/null
@@ -1,70 +0,0 @@
-#pragma once
-
-#include <atomic>
-#include <condition_variable>
-#include <cstddef>
-#include <memory>
-#include <mutex>
-#include <queue>
-#include <utility>
-
-#include "webgpu/webgpu_cpp.h"
-
-namespace rnwgpu::async {
-
-/**
- * Background, event-driven driver for Dawn async operations. Replaces the old
- * JS-thread ProcessEvents polling loop.
- *
- * Each pending wgpu::Future (registered with CallbackMode::WaitAnyOnly) is
- * handed to addFuture() and waited on by a worker thread via
- * `instance.WaitAny(future, UINT64_MAX)`. The wait is genuinely event-driven
- * (zero idle CPU) and resolves the instant the GPU work completes, at which
- * point Dawn fires the future's callback on the worker thread. That callback is
- * responsible for marshalling back to the owning runtime's JS thread (via a
- * RuntimeScheduler) to settle the JS Promise.
- *
- * Threading model (validated in Phase 0, spike 2): each WaitAny call waits on a
- * *single* future, which is always a legal single-source wait. Multiple workers
- * may block in WaitAny on the same instance concurrently; Dawn's EventManager
- * is designed for this.
- *
- * The worker pool grows lazily up to a small cap as concurrent work demands,
- * and threads are reused. Shared state is held behind a shared_ptr so detached
- * workers (and the wgpu::Instance ref they need) outlive this object safely.
- */
-class GpuEventLoop {
-public:
-  explicit GpuEventLoop(wgpu::Instance instance);
-  ~GpuEventLoop();
-
-  GpuEventLoop(const GpuEventLoop &) = delete;
-  GpuEventLoop &operator=(const GpuEventLoop &) = delete;
-
-  /**
-   * Wait for `future` to complete on a background thread. A future with id == 0
-   * (no event to wait on, e.g. a deferred/immediate resolution) is ignored.
-   * Thread-safe.
-   */
-  void addFuture(wgpu::Future future);
-
-private:
-  struct State {
-    explicit State(wgpu::Instance instance) : instance(std::move(instance)) {}
-
-    wgpu::Instance instance;
-    std::mutex mutex;
-    std::condition_variable cv;
-    std::queue<wgpu::Future> queue;
-    std::atomic_bool running{true};
-    std::size_t idleWorkers = 0;
-    std::size_t totalWorkers = 0;
-    std::size_t maxWorkers = 1;
-  };
-
-  static void worker(std::shared_ptr<State> state);
-
-  std::shared_ptr<State> _state;
-};
-
-} // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
index f297ae6b0..1ed3afb68 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
@@ -11,23 +11,14 @@ namespace rnwgpu::async {
 
 namespace {
 struct RuntimeData {
-  std::shared_ptr<RuntimeContext> runner;
+  std::shared_ptr<RuntimeContext> context;
 };
 constexpr const char *TAG = "RuntimeContext";
 } // namespace
 
-RuntimeContext::RuntimeContext(std::shared_ptr<RuntimeScheduler> scheduler,
-                               std::shared_ptr<GpuEventLoop> eventLoop)
-    : _scheduler(std::move(scheduler)), _eventLoop(std::move(eventLoop)) {
-  if (!_scheduler) {
-    throw std::runtime_error(
-        "RuntimeContext requires a valid RuntimeScheduler.");
-  }
-  if (!_eventLoop) {
-    throw std::runtime_error("RuntimeContext requires a valid GpuEventLoop.");
-  }
-  Logger::logToConsole("[%s] Created runner (scheduler=%p, eventLoop=%p)", TAG,
-                       _scheduler.get(), _eventLoop.get());
+RuntimeContext::RuntimeContext(jsi::Runtime &runtime, wgpu::Instance instance)
+    : _runtime(runtime), _instance(std::move(instance)) {
+  Logger::logToConsole("[%s] Created (runtime=%p)", TAG, &runtime);
 }
 
 std::shared_ptr<RuntimeContext> RuntimeContext::get(jsi::Runtime &runtime) {
@@ -35,53 +26,100 @@ std::shared_ptr<RuntimeContext> RuntimeContext::get(jsi::Runtime &runtime) {
   if (!data) {
     return nullptr;
   }
-  auto stored = std::static_pointer_cast<RuntimeData>(data);
-  return stored->runner;
+  return std::static_pointer_cast<RuntimeData>(data)->context;
 }
 
 std::shared_ptr<RuntimeContext>
-RuntimeContext::getOrCreate(jsi::Runtime &runtime,
-                            std::shared_ptr<RuntimeScheduler> scheduler,
-                            std::shared_ptr<GpuEventLoop> eventLoop) {
-  auto existing = get(runtime);
-  if (existing) {
+RuntimeContext::getOrCreate(jsi::Runtime &runtime, wgpu::Instance instance) {
+  if (auto existing = get(runtime)) {
     return existing;
   }
-
-  auto runner = std::make_shared<RuntimeContext>(std::move(scheduler),
-                                                 std::move(eventLoop));
+  auto context = std::make_shared<RuntimeContext>(runtime, std::move(instance));
   auto data = std::make_shared<RuntimeData>();
-  data->runner = runner;
+  data->context = context;
   runtime.setRuntimeData(runtimeDataUUID(), data);
-  return runner;
+  return context;
 }
 
-AsyncTaskHandle RuntimeContext::postTask(const TaskCallback &callback) {
-  auto handle = AsyncTaskHandle::create(_scheduler);
+AsyncTaskHandle RuntimeContext::postTask(const TaskCallback &callback,
+                                         bool keepPumping) {
+  auto handle = AsyncTaskHandle::create(shared_from_this(), keepPumping);
   if (!handle.valid()) {
     throw std::runtime_error("Failed to create AsyncTaskHandle.");
   }
 
+  _pendingTasks.fetch_add(1, std::memory_order_acq_rel);
+  if (keepPumping) {
+    _pumpTasks.fetch_add(1, std::memory_order_acq_rel);
+  }
+  requestTick();
+
   auto resolve = handle.createResolveFunction();
   auto reject = handle.createRejectFunction();
-
-  wgpu::Future future{};
   try {
-    future = callback(resolve, reject);
+    callback(resolve, reject);
   } catch (const std::exception &exception) {
     reject(exception.what());
-    return handle;
   } catch (...) {
     reject("Unknown native error in RuntimeContext::postTask.");
-    return handle;
   }
-
-  _eventLoop->addFuture(future);
   return handle;
 }
 
-std::shared_ptr<RuntimeScheduler> RuntimeContext::scheduler() const {
-  return _scheduler;
+void RuntimeContext::onTaskSettled(bool keepPumping) {
+  _pendingTasks.fetch_sub(1, std::memory_order_acq_rel);
+  if (keepPumping) {
+    _pumpTasks.fetch_sub(1, std::memory_order_acq_rel);
+  }
+}
+
+void RuntimeContext::requestTick() {
+  bool expected = false;
+  if (!_tickScheduled.compare_exchange_strong(expected, true,
+                                              std::memory_order_acq_rel)) {
+    return;
+  }
+
+  // postTask and tick both run on the owning runtime's thread, so we can
+  // schedule the next tick directly via that runtime's own timer. setTimeout is
+  // available on the main RN runtime and on worklet runtimes (backed by the
+  // worklets EventLoop); setImmediate / queueMicrotask are fallbacks. We do NOT
+  // use queueMicrotask as the primary mechanism: a self-rescheduling microtask
+  // never yields the microtask checkpoint, starving the runtime's task loop.
+  auto self = shared_from_this();
+  jsi::Runtime &rt = _runtime;
+  auto tickCallback = jsi::Function::createFromHostFunction(
+      rt, jsi::PropNameID::forAscii(rt, "RNWGPUAsyncTick"), 0,
+      [self](jsi::Runtime & /*runtime*/, const jsi::Value & /*thisVal*/,
+             const jsi::Value * /*args*/, size_t /*count*/) -> jsi::Value {
+        self->tick();
+        return jsi::Value::undefined();
+      });
+
+  auto global = rt.global();
+  auto setTimeoutValue = global.getProperty(rt, "setTimeout");
+  if (setTimeoutValue.isObject() &&
+      setTimeoutValue.asObject(rt).isFunction(rt)) {
+    setTimeoutValue.asObject(rt).asFunction(rt).call(
+        rt, jsi::Value(rt, tickCallback), jsi::Value(0));
+    return;
+  }
+  auto setImmediateValue = global.getProperty(rt, "setImmediate");
+  if (setImmediateValue.isObject() &&
+      setImmediateValue.asObject(rt).isFunction(rt)) {
+    setImmediateValue.asObject(rt).asFunction(rt).call(
+        rt, jsi::Value(rt, tickCallback));
+    return;
+  }
+  rt.queueMicrotask(std::move(tickCallback));
+}
+
+void RuntimeContext::tick() {
+  _tickScheduled.store(false, std::memory_order_release);
+  _instance.ProcessEvents();
+  if (_pumpTasks.load(std::memory_order_acquire) > 0) {
+    requestTick();
+  }
 }
 
 jsi::UUID RuntimeContext::runtimeDataUUID() {
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
index a7a5d46f4..93a1a39c9 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
@@ -1,13 +1,13 @@
 #pragma once
 
+#include <atomic>
+#include <cstddef>
 #include <functional>
 #include <memory>
 
 #include <jsi/jsi.h>
 
 #include "AsyncTaskHandle.h"
-#include "GpuEventLoop.h"
-#include "RuntimeScheduler.h"
 
 #include "webgpu/webgpu_cpp.h"
 
@@ -18,15 +18,24 @@ namespace rnwgpu::async {
 /**
  * Per-runtime coordinator for asynchronous WebGPU operations.
  *
- * Bundles the runtime's RuntimeScheduler (how to settle Promises back on the
- * owning JS thread) with the GpuEventLoop (how to wait on Dawn futures off the
- * JS thread). This replaces the previous ProcessEvents polling design: there is
- * no tick loop and no idle CPU usage.
+ * Each JS runtime that uses WebGPU gets its own RuntimeContext, stored in the
+ * runtime's runtimeData. Async Dawn operations are registered with
+ * CallbackMode::AllowProcessEvents and driven to completion by pumping
+ * `instance.ProcessEvents()` on the runtime's OWN thread via a self-
+ * rescheduling tick (scheduled through that runtime's setTimeout). Because
+ * ProcessEvents invokes the Dawn callbacks synchronously on the pumping thread,
+ * the JS Promise is settled directly on the owning runtime, with no background
+ * thread and no cross-thread hop.
  *
- * A task callback registers a Dawn async op with CallbackMode::WaitAnyOnly and
- * returns the resulting wgpu::Future, which is handed to the GpuEventLoop. A
- * returned future with id == 0 means "no event to wait on" (deferred/immediate
- * resolution, e.g. GPUDevice::getLost).
+ * The pump only runs while at least one "pumping" task is outstanding, so it
+ * costs nothing when idle and stops cleanly. Tasks that may never settle on
+ * their own (e.g. GPUDevice::getLost) are posted with keepPumping = false so
+ * they do not keep the pump spinning forever.
+ *
+ * Threading contract: a RuntimeContext must only be used from the runtime it
+ * was created for (postTask is expected to run on that runtime's thread). Create
+ * and use a GPUDevice (and the buffers/queues derived from it) on the same
+ * runtime that requested the adapter.
  */
 class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
 public:
@@ -34,24 +43,33 @@ class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
       std::function<wgpu::Future(const AsyncTaskHandle::ResolveFunction &,
                                  const AsyncTaskHandle::RejectFunction &)>;
 
-  RuntimeContext(std::shared_ptr<RuntimeScheduler> scheduler,
-                 std::shared_ptr<GpuEventLoop> eventLoop);
+  RuntimeContext(jsi::Runtime &runtime, wgpu::Instance instance);
 
   static std::shared_ptr<RuntimeContext> get(jsi::Runtime &runtime);
-  static std::shared_ptr<RuntimeContext>
-  getOrCreate(jsi::Runtime &runtime,
-              std::shared_ptr<RuntimeScheduler> scheduler,
-              std::shared_ptr<GpuEventLoop> eventLoop);
+  static std::shared_ptr<RuntimeContext> getOrCreate(jsi::Runtime &runtime,
+                                                     wgpu::Instance instance);
+
+  // The wgpu::Instance bound to this runtime.
+  wgpu::Instance instance() const { return _instance; }
 
-  AsyncTaskHandle postTask(const TaskCallback &callback);
+  AsyncTaskHandle postTask(const TaskCallback &callback,
+                           bool keepPumping = true);
 
-  std::shared_ptr<RuntimeScheduler> scheduler() const;
+  // Invoked by AsyncTaskHandle when a task settles. Runs on the owning runtime's
+  // thread.
+  void onTaskSettled(bool keepPumping);
 
 private:
   static jsi::UUID runtimeDataUUID();
 
-  std::shared_ptr<RuntimeScheduler> _scheduler;
-  std::shared_ptr<GpuEventLoop> _eventLoop;
+  void requestTick();
+  void tick();
+
+  jsi::Runtime &_runtime;
+  wgpu::Instance _instance;
+  std::atomic<std::size_t> _pendingTasks{0};
+  std::atomic<std::size_t> _pumpTasks{0};
+  std::atomic<bool> _tickScheduled{false};
 };
 
 } // namespace rnwgpu::async
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h b/packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h
deleted file mode 100644
index 926b494c3..000000000
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeScheduler.h
+++ /dev/null
@@ -1,31 +0,0 @@
-#pragma once
-
-#include <functional>
-
-#include <jsi/jsi.h>
-
-namespace rnwgpu::async {
-
-namespace jsi = facebook::jsi;
-
-/**
- * Thread-safe "post this job onto a specific runtime's JS thread".
- *
- * Replaces the old AsyncDispatcher / JSIMicrotaskDispatcher, whose
- * queueMicrotask-based dispatch was only safe to call from the runtime's own
- * thread. A RuntimeScheduler can be called from any thread (e.g. the
- * GpuEventLoop background threads) and guarantees the job runs on the owning
- * runtime's JS thread.
- */
-class RuntimeScheduler {
-public:
-  virtual ~RuntimeScheduler() = default;
-
-  /**
-   * Schedule `job` to run on this runtime's JS thread. Callable from any
-   * thread. Jobs are delivered in FIFO order relative to one another.
-   */
-  virtual void scheduleOnJS(std::function<void(jsi::Runtime &)> job) = 0;
-};
-
-} // namespace rnwgpu::async
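
Note on the pump added to RuntimeContext.cpp in this patch: requestTick() uses
an atomic flag so that any number of postTask() calls queue at most one timer
callback, and tick() reschedules itself only while pumping tasks remain. The
sketch below is a minimal single-threaded model of that scheme, not the patch's
code. FakeTimerQueue stands in for the runtime's setTimeout(fn, 0),
CoalescedPump and its members are hypothetical names, and "settle one task per
tick" is an assumption standing in for whatever instance.ProcessEvents() would
complete.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a runtime's timer queue (the role setTimeout(fn, 0)
// plays in the patch). Single-threaded on purpose.
struct FakeTimerQueue {
  std::deque<std::function<void()>> timers;

  void setTimeout0(std::function<void()> fn) {
    timers.push_back(std::move(fn));
  }

  // Runs the oldest queued timer; returns false when nothing is queued.
  bool runOne() {
    if (timers.empty()) {
      return false;
    }
    auto fn = std::move(timers.front());
    timers.pop_front();
    fn(); // may queue a new timer; the deque is safe to push to here
    return true;
  }
};

// Hypothetical model of the requestTick()/tick() pair. The caller must keep
// the pump alive while timers are queued (the patch captures shared_from_this
// for the same reason).
class CoalescedPump {
public:
  explicit CoalescedPump(FakeTimerQueue &queue) : queue_(queue) {}

  void postTask() {
    pumpTasks_.fetch_add(1, std::memory_order_acq_rel);
    requestTick();
  }

  std::size_t ticks() const { return ticks_; }

private:
  void requestTick() {
    bool expected = false;
    // Only the caller that flips false -> true schedules a timer.
    if (!tickScheduled_.compare_exchange_strong(expected, true,
                                                std::memory_order_acq_rel)) {
      return;
    }
    queue_.setTimeout0([this] { tick(); });
  }

  void tick() {
    tickScheduled_.store(false, std::memory_order_release);
    ++ticks_;
    // Assumption: each tick completes exactly one outstanding task.
    if (pumpTasks_.load(std::memory_order_acquire) > 0) {
      pumpTasks_.fetch_sub(1, std::memory_order_acq_rel);
    }
    // Keep pumping only while work remains, so the pump stops when idle.
    if (pumpTasks_.load(std::memory_order_acquire) > 0) {
      requestTick();
    }
  }

  FakeTimerQueue &queue_;
  std::atomic<std::size_t> pumpTasks_{0};
  std::atomic<bool> tickScheduled_{false};
  std::size_t ticks_ = 0;
};
```

Posting three tasks back to back leaves a single timer in the queue. Running
the queue to completion then takes three ticks under the one-task-per-tick
assumption, after which nothing is rescheduled.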

From b9275a03170fb4bc6f6224d669c153b49f18cdac Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sat, 6 Jun 2026 21:11:52 +0200
Subject: [PATCH 16/25] :wrench:

---
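
Note on the mailbox this patch introduces: a Dawn callback may fire on whichever
thread happens to call ProcessEvents(), so the callback only deposits a plain
C++ closure, and the owning context runs it later on its own thread. The sketch
below is a minimal standalone model of that hand-off, assuming nothing about
JSI or Dawn. SettleMailbox and demoJobsRunOnOwner are hypothetical names, and
the std::thread merely plays the part of another runtime's pump.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-context mailbox; not the patch's class.
class SettleMailbox {
public:
  // Callable from any thread: stores the job and runs nothing.
  void post(std::function<void()> job) {
    if (!job) {
      return;
    }
    std::lock_guard<std::mutex> lock(mutex_);
    jobs_.push_back(std::move(job));
  }

  // Call on the owning thread only. The queue is swapped out under the lock
  // and the jobs run after the lock is released, so a job may call post()
  // again without deadlocking. Returns the number of jobs that ran.
  std::size_t drain() {
    std::vector<std::function<void()>> ready;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ready.swap(jobs_);
    }
    for (auto &job : ready) {
      job();
    }
    return ready.size();
  }

private:
  std::mutex mutex_;
  std::vector<std::function<void()>> jobs_;
};

// A foreign thread deposits three jobs; the calling (owning) thread drains
// them. Returns how many jobs observed that they ran on the owning thread.
std::size_t demoJobsRunOnOwner() {
  SettleMailbox mailbox;
  const std::thread::id owner = std::this_thread::get_id();
  std::size_t ranOnOwner = 0;

  std::thread foreign([&] {
    for (int i = 0; i < 3; ++i) {
      mailbox.post([&] {
        if (std::this_thread::get_id() == owner) {
          ++ranOnOwner;
        }
      });
    }
  });
  foreign.join(); // all deposits are visible after join

  const std::size_t drained = mailbox.drain();
  assert(drained == 3);
  return ranOnOwner;
}
```

Swapping the vector out before running the jobs is the same choice
drainMailbox() makes in the diff below: the lock covers only the queue, never
the settle work itself.
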
 .../cpp/rnwgpu/async/AsyncTaskHandle.cpp      | 36 ++++++++++++------
 .../cpp/rnwgpu/async/RuntimeContext.cpp       | 38 ++++++++++++++++++-
 .../webgpu/cpp/rnwgpu/async/RuntimeContext.h  | 33 +++++++++++++---
 3 files changed, 89 insertions(+), 18 deletions(-)

diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
index 99b928a33..5c117cc7e 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
@@ -82,19 +82,33 @@ void AsyncTaskHandle::State::schedule(Action action) {
     return;
   }
 
-  // The resolve/reject callback is invoked on the owning runtime's own thread:
-  // either synchronously from instance.ProcessEvents() during the
-  // RuntimeContext tick, or synchronously from postTask (immediate
-  // resolution / exception path). So we settle the Promise directly here, with
-  // no cross-thread hop and no microtask trampoline.
-  action(promiseRef->runtime, *promiseRef);
-
-  if (context) {
-    context->onTaskSettled(keepPumping);
+  // The resolve/reject callback may fire on a thread that is NOT the owning
+  // runtime's thread: with a shared wgpu::Instance, another runtime's
+  // ProcessEvents() pump can consume this Dawn event. Touching the Promise's
+  // runtime off-thread would corrupt Hermes. So we deposit the actual settle
+  // (the only JSI-touching work) into the owning context's mailbox; the context
+  // drains it on its own thread during its next tick. The deposited closure
+  // captures only C++ state and runs no JSI until drained, so depositing from
+  // any thread is safe.
+  if (!context) {
+    // No context (shouldn't happen): best-effort inline settle.
+    action(promiseRef->runtime, *promiseRef);
+    std::lock_guard<std::mutex> lock(mutex);
+    keepAlive.reset();
+    return;
   }
 
-  std::lock_guard<std::mutex> lock(mutex);
-  keepAlive.reset();
+  auto self = shared_from_this();
+  const bool keep = keepPumping;
+  context->postSettle(
+      [self, action = std::move(action), promiseRef, keep]() mutable {
+        action(promiseRef->runtime, *promiseRef);
+        if (self->context) {
+          self->context->onTaskSettled(keep);
+        }
+        std::lock_guard<std::mutex> lock(self->mutex);
+        self->keepAlive.reset();
+      });
 }
 
 AsyncTaskHandle::ResolveFunction
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
index 1ed3afb68..c9e12b847 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
@@ -14,6 +14,14 @@ struct RuntimeData {
   std::shared_ptr<RuntimeContext> context;
 };
 constexpr const char *TAG = "RuntimeContext";
+
+// Serializes ProcessEvents() across all runtimes that share a wgpu::Instance.
+// Held only across the ProcessEvents call itself, never while running JS / mailbox
+// settle-actions, so it cannot deadlock against the per-context mailbox mutex.
+std::mutex &processEventsMutex() {
+  static std::mutex mutex;
+  return mutex;
+}
 } // namespace
 
 RuntimeContext::RuntimeContext(jsi::Runtime &runtime, wgpu::Instance instance)
@@ -73,6 +81,27 @@ void RuntimeContext::onTaskSettled(bool keepPumping) {
   }
 }
 
+void RuntimeContext::postSettle(std::function<void()> job) {
+  if (!job) {
+    return;
+  }
+  std::lock_guard<std::mutex> lock(_mailboxMutex);
+  _mailbox.push_back(std::move(job));
+}
+
+void RuntimeContext::drainMailbox() {
+  std::vector<std::function<void()>> jobs;
+  {
+    std::lock_guard<std::mutex> lock(_mailboxMutex);
+    jobs.swap(_mailbox);
+  }
+  // Run settle-actions on this (the owning) thread, NOT under the ProcessEvents
+  // mutex, so JS continuations never execute while the pump lock is held.
+  for (auto &job : jobs) {
+    job();
+  }
+}
+
 void RuntimeContext::requestTick() {
   bool expected = false;
   if (!_tickScheduled.compare_exchange_strong(expected, true,
@@ -116,7 +145,14 @@ void RuntimeContext::requestTick() {
 
 void RuntimeContext::tick() {
   _tickScheduled.store(false, std::memory_order_release);
-  _instance.ProcessEvents();
+  {
+    // Serialize ProcessEvents across runtimes sharing this instance. Callbacks
+    // fired here only deposit into mailboxes (postSettle); they do not run JS.
+    std::lock_guard<std::mutex> lock(processEventsMutex());
+    _instance.ProcessEvents();
+  }
+  // Settle this runtime's ready promises on this thread, outside the pump lock.
+  drainMailbox();
   if (_pumpTasks.load(std::memory_order_acquire) > 0) {
     requestTick();
   }
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
index 93a1a39c9..7a43a33c8 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
@@ -4,6 +4,8 @@
 #include <cstddef>
 #include <functional>
 #include <memory>
+#include <mutex>
+#include <vector>
 
 #include <jsi/jsi.h>
 
@@ -32,10 +34,19 @@ namespace rnwgpu::async {
  * their own (e.g. GPUDevice::getLost) are posted with keepPumping = false so
  * they do not keep the pump spinning forever.
  *
- * Threading contract: a RuntimeContext must only be used from the runtime it
- * was created for (postTask is expected to run on that runtime's thread). Create
- * and use a GPUDevice (and the buffers/queues derived from it) on the same
- * runtime that requested the adapter.
+ * Shared-instance safety (mailbox): multiple runtimes may share one
+ * wgpu::Instance. ProcessEvents() drains the whole instance queue and fires
+ * callbacks on the calling thread, which may NOT be the owning runtime's thread
+ * for a given promise. So a settled callback never touches JSI inline; it
+ * deposits a settle-action (a plain C++ closure, no JSI) into the OWNING
+ * context's thread-safe mailbox via postSettle(), and each context drains its
+ * own mailbox on its own thread during tick(). ProcessEvents() itself is
+ * serialized across runtimes by a process-wide mutex, since concurrent
+ * ProcessEvents calls on one instance are not guaranteed to be thread-safe.
+ *
+ * Threading contract: a RuntimeContext must only be pumped from the runtime it
+ * was created for. Create and use a GPUDevice (and the buffers/queues derived
+ * from it) on the same runtime that requested the adapter.
  */
 class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
 public:
@@ -55,8 +66,14 @@ class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
   AsyncTaskHandle postTask(const TaskCallback &callback,
                            bool keepPumping = true);
 
-  // Invoked by AsyncTaskHandle when a task settles. Runs on the owning runtime's
-  // thread.
+  // Deposit a settle-action to run on THIS context's runtime thread. Thread-safe
+  // (callable from any thread, e.g. another runtime that pumped ProcessEvents).
+  // The job must not touch JSI until it runs (it runs during drainMailbox on the
+  // owning thread).
+  void postSettle(std::function<void()> job);
+
+  // Invoked by a drained settle-action when its task settles. Runs on the owning
+  // runtime's thread.
   void onTaskSettled(bool keepPumping);
 
 private:
@@ -64,12 +81,16 @@ class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
 
   void requestTick();
   void tick();
+  void drainMailbox();
 
   jsi::Runtime &_runtime;
   wgpu::Instance _instance;
   std::atomic<std::size_t> _pendingTasks{0};
   std::atomic<std::size_t> _pumpTasks{0};
   std::atomic<bool> _tickScheduled{false};
+
+  std::mutex _mailboxMutex;
+  std::vector<std::function<void()>> _mailbox;
 };
 
 } // namespace rnwgpu::async

From 5b97f33045faa08a205669ae061004f88c39bdaf Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sat, 6 Jun 2026 21:19:37 +0200
Subject: [PATCH 17/25] :wrench:

---
 .../webgpu/cpp/rnwgpu/async/RuntimeContext.cpp  | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
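
Note on the scheduling policy in this patch: a minimal sketch of the decision that requestTick() and tick() make together, assuming the behaviour visible in the hunks below. The helper name nextTickDelayMs is hypothetical and does not exist in the code base; only kSlowTickMs and the two counters come from the patch.

```cpp
#include <cstddef>
#include <optional>

// Value taken from the patch; everything else here is illustrative.
constexpr double kSlowTickMs = 100.0;

// Hypothetical helper: returns the delay for the next tick, or std::nullopt
// when nothing is outstanding and the pump may go idle.
inline std::optional<double> nextTickDelayMs(std::size_t pendingTasks,
                                             std::size_t pumpTasks) {
  if (pendingTasks == 0) {
    return std::nullopt; // no outstanding task: do not reschedule
  }
  // Active async work re-ticks immediately; lingering watchers such as
  // device.lost only need the slow heartbeat.
  return pumpTasks > 0 ? 0.0 : kSlowTickMs;
}
```

Under this reading, a runtime whose only outstanding task is a device.lost watcher wakes roughly ten times per second instead of on every timer turn.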

diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
index c9e12b847..518a0c741 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
@@ -15,6 +15,11 @@ struct RuntimeData {
 };
 constexpr const char *TAG = "RuntimeContext";
 
+// Heartbeat interval (ms) used when there are pending tasks but none that demand
+// a fast pump (e.g. a long-lived device.lost watcher). Keeps delivering Dawn's
+// spontaneous callbacks without burning CPU at frame rate when idle.
+constexpr double kSlowTickMs = 100.0;
+
 // Serializes ProcessEvents() across all runtimes that share a wgpu::Instance.
 // Held only across the ProcessEvents call itself, never while running JS / mailbox
 // settle-actions, so it cannot deadlock against the per-context mailbox mutex.
@@ -109,6 +114,12 @@ void RuntimeContext::requestTick() {
     return;
   }
 
+  // Fast pump (delay 0) while there is active async work; otherwise a slow
+  // heartbeat so long-lived spontaneous callbacks (e.g. device.lost, uncaptured
+  // errors) are still delivered without burning CPU at frame rate when idle.
+  const double delayMs =
+      _pumpTasks.load(std::memory_order_acquire) > 0 ? 0.0 : kSlowTickMs;
+
   // postTask and tick both run on the owning runtime's thread, so we can
   // schedule the next tick directly via that runtime's own timer. setTimeout is
   // available on the main RN runtime and on worklet runtimes (backed by the
@@ -130,7 +141,7 @@ void RuntimeContext::requestTick() {
   if (setTimeoutValue.isObject() &&
       setTimeoutValue.asObject(rt).isFunction(rt)) {
     setTimeoutValue.asObject(rt).asFunction(rt).call(
-        rt, jsi::Value(rt, tickCallback), jsi::Value(0));
+        rt, jsi::Value(rt, tickCallback), jsi::Value(delayMs));
     return;
   }
   auto setImmediateValue = global.getProperty(rt, "setImmediate");
@@ -153,7 +164,9 @@ void RuntimeContext::tick() {
   }
   // Settle this runtime's ready promises on this thread, outside the pump lock.
   drainMailbox();
-  if (_pumpTasks.load(std::memory_order_acquire) > 0) {
+  // Keep pumping while any task is outstanding: fast for active work, slow
+  // heartbeat for lingering watchers like device.lost (see requestTick).
+  if (_pendingTasks.load(std::memory_order_acquire) > 0) {
     requestTick();
   }
 }

From 88de5bb059d4ea1f1353f02cb82c26103b1fac7c Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 07:58:43 +0200
Subject: [PATCH 18/25] :wrench:

---
 apps/example/src/Reanimated/Reanimated.tsx    |  5 ++--
 .../StorageBufferVertices.tsx                 |  1 +
 .../example/src/VisionCamera/VisionCamera.tsx |  4 +--
 apps/example/src/components/useWebGPU.ts      |  3 ++-
 packages/webgpu/cpp/rnwgpu/api/GPU.cpp        |  5 ++--
 packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp |  5 ++--
 packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp  |  5 ++--
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp  | 25 ++++++++-----------
 packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp   |  5 ++--
 .../webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp |  5 ++--
 .../webgpu/cpp/rnwgpu/async/RuntimeContext.h  |  4 +--
 packages/webgpu/src/Canvas.tsx                |  8 +++---
 packages/webgpu/src/WebPolyfillGPUModule.ts   |  2 +-
 packages/webgpu/src/types.ts                  |  8 +++---
 14 files changed, 39 insertions(+), 46 deletions(-)
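
Note on the TaskCallback change: a minimal sketch of the callback shape after this patch, where a task returns nothing and reports its result only through resolve/reject. Resolve, Reject, Outcome and runTask are simplified stand-ins invented for illustration (the real ResolveFunction/RejectFunction carry JSI values and settle asynchronously), and the settle-once guard is an assumption rather than something shown in this diff.

```cpp
#include <functional>
#include <string>
#include <utility>

// Simplified stand-ins for AsyncTaskHandle::ResolveFunction / RejectFunction.
using Resolve = std::function<void(int)>;
using Reject = std::function<void(std::string)>;
// Shape after the patch: the callback returns void instead of wgpu::Future.
using TaskCallback = std::function<void(const Resolve &, const Reject &)>;

struct Outcome {
  bool settled = false;
  bool ok = false;
  int value = 0;
  std::string error;
};

// Hypothetical driver: runs a task and records how it settled.
inline Outcome runTask(const TaskCallback &task) {
  Outcome out;
  Resolve resolve = [&out](int v) {
    if (out.settled) return; // assumed settle-once semantics
    out.settled = true;
    out.ok = true;
    out.value = v;
  };
  Reject reject = [&out](std::string message) {
    if (out.settled) return;
    out.settled = true;
    out.error = std::move(message);
  };
  task(resolve, reject);
  return out;
}
```

With a void callback, a task that can settle synchronously (as getLost() does when the device is already lost) simply calls resolve and returns, with no placeholder future to hand back.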

diff --git a/apps/example/src/Reanimated/Reanimated.tsx b/apps/example/src/Reanimated/Reanimated.tsx
index f0af0d59b..f48266d05 100644
--- a/apps/example/src/Reanimated/Reanimated.tsx
+++ b/apps/example/src/Reanimated/Reanimated.tsx
@@ -78,8 +78,9 @@ export const webGPUDemo = (
     passEncoder.end();
 
     device.queue.submit([commandEncoder.finish()]);
-    // Needed on a dedicated worklet runtime (DedicatedThread); a no-op on the
-    // UI runtime (UIThread), where present is automatic.
+    // Present runs on the calling thread, so it works the same whether this
+    // renders on the UI runtime (UIThread) or a dedicated worklet runtime
+    // (DedicatedThread).
     context.present();
 
     if (runAnimation.value) {
diff --git a/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx b/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
index 371fad7c0..071bfb92e 100644
--- a/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
+++ b/apps/example/src/StorageBufferVertices/StorageBufferVertices.tsx
@@ -185,6 +185,7 @@ export function StorageBufferVertices() {
 
     const commandBuffer = encoder.finish();
     device.queue.submit([commandBuffer]);
+    context.present();
   });
 
   return (
diff --git a/apps/example/src/VisionCamera/VisionCamera.tsx b/apps/example/src/VisionCamera/VisionCamera.tsx
index 25ad6ac39..c2571c4f8 100644
--- a/apps/example/src/VisionCamera/VisionCamera.tsx
+++ b/apps/example/src/VisionCamera/VisionCamera.tsx
@@ -613,8 +613,8 @@ const CameraView = () => {
           pass.draw(3);
           pass.end();
           device.queue.submit([encoder.finish()]);
-          // Vision Camera frame processors run on a dedicated worklet runtime,
-          // so present explicitly (auto-present only covers the JS/UI runtime).
+          // Vision Camera frame processors run on a dedicated worklet runtime;
+          // present runs on that thread, presenting the frame we just rendered.
           context.present();
           // The work sampling it is submitted, so end the external texture's
           // access window now to release the camera frame's surface promptly
diff --git a/apps/example/src/components/useWebGPU.ts b/apps/example/src/components/useWebGPU.ts
index 196a39c26..68cce550f 100644
--- a/apps/example/src/components/useWebGPU.ts
+++ b/apps/example/src/components/useWebGPU.ts
@@ -4,10 +4,11 @@ import {
   useDevice,
   type CanvasRef,
   type NativeCanvas,
+  type RNCanvasContext,
 } from "react-native-webgpu";
 
 interface SceneProps {
-  context: GPUCanvasContext;
+  context: RNCanvasContext;
   device: GPUDevice;
   gpu: GPU;
   presentationFormat: GPUTextureFormat;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
index 902ce141e..92939b28c 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPU.cpp
@@ -71,9 +71,8 @@ async::AsyncTaskHandle GPU::requestAdapter(
   return context->postTask(
       [this, aOptions,
        context](const async::AsyncTaskHandle::ResolveFunction &resolve,
-                const async::AsyncTaskHandle::RejectFunction &reject)
-          -> wgpu::Future {
-        return _instance.RequestAdapter(
+                const async::AsyncTaskHandle::RejectFunction &reject) {
+        _instance.RequestAdapter(
             &aOptions, wgpu::CallbackMode::AllowProcessEvents,
             [context, resolve,
              reject](wgpu::RequestAdapterStatus status, wgpu::Adapter adapter,
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
index ebe84690b..04de74ed1 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUAdapter.cpp
@@ -134,8 +134,7 @@ async::AsyncTaskHandle GPUAdapter::requestDevice(
       [this, aDescriptor, descriptor, label = std::move(label),
        deviceLostBinding,
        creationRuntime](const async::AsyncTaskHandle::ResolveFunction &resolve,
-                        const async::AsyncTaskHandle::RejectFunction &reject)
-          -> wgpu::Future {
+                        const async::AsyncTaskHandle::RejectFunction &reject) {
         // Build a local mutable copy so we can chain Dawn's device toggles.
         // The toggle name strings are owned by `descriptor` (captured above),
         // and the const char* / DawnTogglesDescriptor locals live for the
@@ -163,7 +162,7 @@ async::AsyncTaskHandle GPUAdapter::requestDevice(
           }
           deviceDesc.nextInChain = &toggles;
         }
-        return _instance.RequestDevice(
+        _instance.RequestDevice(
             &deviceDesc, wgpu::CallbackMode::AllowProcessEvents,
             [context = _async, resolve, reject, label, creationRuntime,
              deviceLostBinding](wgpu::RequestDeviceStatus status,
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
index 1938b72d7..6ab4b5927 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUBuffer.cpp
@@ -54,9 +54,8 @@ async::AsyncTaskHandle GPUBuffer::mapAsync(uint64_t modeIn,
   return _async->postTask(
       [bufferHandle, mode, resolvedOffset,
        rangeSize](const async::AsyncTaskHandle::ResolveFunction &resolve,
-                  const async::AsyncTaskHandle::RejectFunction &reject)
-          -> wgpu::Future {
-        return bufferHandle.MapAsync(
+                  const async::AsyncTaskHandle::RejectFunction &reject) {
+        bufferHandle.MapAsync(
             mode, resolvedOffset, rangeSize, wgpu::CallbackMode::AllowProcessEvents,
             [resolve, reject](wgpu::MapAsyncStatus status,
                               wgpu::StringView message) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
index 346d342c0..e132378b5 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
@@ -363,9 +363,9 @@ async::AsyncTaskHandle GPUDevice::createComputePipelineAsync(
                               const async::AsyncTaskHandle::ResolveFunction
                                   &resolve,
                               const async::AsyncTaskHandle::RejectFunction
-                                  &reject) -> wgpu::Future {
+                                  &reject) {
     (void)descriptor;
-    return device.CreateComputePipelineAsync(
+    device.CreateComputePipelineAsync(
         &desc, wgpu::CallbackMode::AllowProcessEvents,
         [pipelineHolder, resolve,
          reject](wgpu::CreatePipelineAsyncStatus status,
@@ -405,9 +405,9 @@ async::AsyncTaskHandle GPUDevice::createRenderPipelineAsync(
                               const async::AsyncTaskHandle::ResolveFunction
                                   &resolve,
                               const async::AsyncTaskHandle::RejectFunction
-                                  &reject) -> wgpu::Future {
+                                  &reject) {
     (void)descriptor;
-    return device.CreateRenderPipelineAsync(
+    device.CreateRenderPipelineAsync(
         &desc, wgpu::CallbackMode::AllowProcessEvents,
         [pipelineHolder, resolve,
          reject](wgpu::CreatePipelineAsyncStatus status,
@@ -437,8 +437,8 @@ async::AsyncTaskHandle GPUDevice::popErrorScope() {
   return _async->postTask([device](const async::AsyncTaskHandle::ResolveFunction
                                        &resolve,
                                    const async::AsyncTaskHandle::RejectFunction
-                                       &reject) -> wgpu::Future {
-    return device.PopErrorScope(
+                                       &reject) {
+    device.PopErrorScope(
         wgpu::CallbackMode::AllowProcessEvents,
         [resolve, reject](wgpu::PopErrorScopeStatus status,
                           wgpu::ErrorType type, wgpu::StringView message) {
@@ -520,33 +520,28 @@ async::AsyncTaskHandle GPUDevice::getLost() {
     return _async->postTask(
         [info = _lostInfo](
             const async::AsyncTaskHandle::ResolveFunction &resolve,
-            const async::AsyncTaskHandle::RejectFunction & /*reject*/)
-            -> wgpu::Future {
+            const async::AsyncTaskHandle::RejectFunction & /*reject*/) {
           resolve([info](jsi::Runtime &runtime) mutable {
             return JSIConverter<std::shared_ptr<GPUDeviceLostInfo>>::toJSI(
                 runtime, info);
           });
-          // No Dawn event to wait on: resolved synchronously.
-          return wgpu::Future{};
         },
         /*keepPumping=*/false);
   }
 
   auto handle = _async->postTask(
       [this](const async::AsyncTaskHandle::ResolveFunction &resolve,
-             const async::AsyncTaskHandle::RejectFunction & /*reject*/)
-          -> wgpu::Future {
+             const async::AsyncTaskHandle::RejectFunction & /*reject*/) {
         if (_lostSettled && _lostInfo) {
           resolve([info = _lostInfo](jsi::Runtime &runtime) mutable {
             return JSIConverter<std::shared_ptr<GPUDeviceLostInfo>>::toJSI(
                 runtime, info);
           });
-          return wgpu::Future{};
+          return;
         }
 
-        // Resolved later from notifyDeviceLost(); no Dawn event to wait on.
+        // Resolved later from notifyDeviceLost().
         _lostResolve = resolve;
-        return wgpu::Future{};
       },
       /*keepPumping=*/false);
 
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
index cac79ca5b..d3c0d65af 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUQueue.cpp
@@ -82,9 +82,8 @@ async::AsyncTaskHandle GPUQueue::onSubmittedWorkDone() {
   auto queue = _instance;
   return _async->postTask(
       [queue](const async::AsyncTaskHandle::ResolveFunction &resolve,
-              const async::AsyncTaskHandle::RejectFunction &reject)
-          -> wgpu::Future {
-        return queue.OnSubmittedWorkDone(
+              const async::AsyncTaskHandle::RejectFunction &reject) {
+        queue.OnSubmittedWorkDone(
             wgpu::CallbackMode::AllowProcessEvents,
             [resolve, reject](wgpu::QueueWorkDoneStatus status,
                               wgpu::StringView message) {
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
index de6d73e6f..113dc407c 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUShaderModule.cpp
@@ -12,10 +12,9 @@ async::AsyncTaskHandle GPUShaderModule::getCompilationInfo() {
 
   return _async->postTask(
       [module](const async::AsyncTaskHandle::ResolveFunction &resolve,
-               const async::AsyncTaskHandle::RejectFunction &reject)
-          -> wgpu::Future {
+               const async::AsyncTaskHandle::RejectFunction &reject) {
         auto result = std::make_shared<GPUCompilationInfo>();
-        return module.GetCompilationInfo(
+        module.GetCompilationInfo(
             wgpu::CallbackMode::AllowProcessEvents,
             [result, resolve,
              reject](wgpu::CompilationInfoRequestStatus status,
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
index 7a43a33c8..ff6b45d6d 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
@@ -51,8 +51,8 @@ namespace rnwgpu::async {
 class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
 public:
   using TaskCallback =
-      std::function<wgpu::Future(const AsyncTaskHandle::ResolveFunction &,
-                                 const AsyncTaskHandle::RejectFunction &)>;
+      std::function<void(const AsyncTaskHandle::ResolveFunction &,
+                         const AsyncTaskHandle::RejectFunction &)>;
 
   RuntimeContext(jsi::Runtime &runtime, wgpu::Instance instance);
 
diff --git a/packages/webgpu/src/Canvas.tsx b/packages/webgpu/src/Canvas.tsx
index 43c9621e7..d5bca183d 100644
--- a/packages/webgpu/src/Canvas.tsx
+++ b/packages/webgpu/src/Canvas.tsx
@@ -23,11 +23,11 @@ export type RNCanvasContext = GPUCanvasContext & {
   /**
    * Present the current frame.
    *
-   * Only needed when rendering from a **dedicated worklet runtime** (e.g.
+   * Call this after `queue.submit()` on every runtime: the main JS runtime, the
+   * Reanimated UI runtime, and dedicated worklet runtimes (e.g.
    * `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame
-   * processor), which runs on its own thread. On the main JS runtime and the
-   * Reanimated UI runtime present is automatic (driven by a global vsync), so
-   * calling this there is a no-op. Call it after `queue.submit()`.
+   * processor). It runs synchronously on the calling thread, so the frame is
+   * presented from whichever thread did the rendering.
    */
   present: () => void;
 };
diff --git a/packages/webgpu/src/WebPolyfillGPUModule.ts b/packages/webgpu/src/WebPolyfillGPUModule.ts
index 8b629a0c9..3851733dd 100644
--- a/packages/webgpu/src/WebPolyfillGPUModule.ts
+++ b/packages/webgpu/src/WebPolyfillGPUModule.ts
@@ -41,7 +41,7 @@ function makeWebGPUCanvasContext(
 
   const context = canvas.getContext("webgpu")!;
   // On web there is no manual present; expose a no-op so RNCanvasContext's
-  // present() (used on native dedicated worklet runtimes) is callable here too.
+  // present() (called after queue.submit() on native) is callable here too.
   return Object.assign(context, { present: () => {} });
 }
 
diff --git a/packages/webgpu/src/types.ts b/packages/webgpu/src/types.ts
index df3443157..cd94faa10 100644
--- a/packages/webgpu/src/types.ts
+++ b/packages/webgpu/src/types.ts
@@ -12,11 +12,11 @@ export type RNCanvasContext = GPUCanvasContext & {
   /**
    * Present the current frame.
    *
-   * Only needed when rendering from a **dedicated worklet runtime** (e.g.
+   * Call this after `queue.submit()` on every runtime: the main JS runtime, the
+   * Reanimated UI runtime, and dedicated worklet runtimes (e.g.
    * `createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame
-   * processor), which runs on its own thread. On the main JS runtime and the
-   * Reanimated UI runtime present is automatic (driven by a global vsync), so
-   * calling this there is a no-op. Call it after `queue.submit()`.
+   * processor). It runs synchronously on the calling thread, so the frame is
+   * presented from whichever thread did the rendering.
    */
   present: () => void;
 };

From a3bb58d10e4242350dd4241fa4674e3dacefed09 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 09:11:00 +0200
Subject: [PATCH 19/25] :wrench:

---
 README.md                                     |   9 ++
 apps/example/src/Reanimated/AsyncBuffer.tsx   | 102 +++++++++---------
 packages/webgpu/README.md                     |   9 ++
 packages/webgpu/apple/WebGPUModule.mm         |   1 -
 .../webgpu/cpp/rnwgpu/RNWebGPUManager.cpp     |   7 +-
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp  |  20 ++++
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.h    |   9 +-
 .../cpp/rnwgpu/async/AsyncTaskHandle.cpp      |  46 ++++++--
 .../cpp/rnwgpu/async/RuntimeContext.cpp       |  60 +++++++----
 .../webgpu/cpp/rnwgpu/async/RuntimeContext.h  |  34 +++++-
 10 files changed, 204 insertions(+), 93 deletions(-)
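
Note on spontaneous-event delivery: a minimal sketch of the best-effort rule described in the README and in the notifyUncapturedError comment, assuming an event is marshalled through an invoker when one is registered and dropped otherwise. FakeInvoker and deliverSpontaneousEvent are hypothetical names used only for illustration; the real code uses the React Native CallInvoker, which is not reproduced here.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for a CallInvoker: queues work for the owning JS thread.
struct FakeInvoker {
  std::queue<std::function<void()>> jobs;

  void invokeAsync(std::function<void()> job) { jobs.push(std::move(job)); }

  // Simulates the JS thread running everything queued so far.
  void runAll() {
    while (!jobs.empty()) {
      auto job = std::move(jobs.front());
      jobs.pop();
      job();
    }
  }
};

// Returns true when delivery was scheduled. Returns false when there is no
// invoker (the worklet-runtime case in the README), so the event is dropped.
inline bool deliverSpontaneousEvent(const std::shared_ptr<FakeInvoker> &invoker,
                                    std::function<void()> notifyJs) {
  if (!invoker) {
    return false;
  }
  invoker->invokeAsync(std::move(notifyJs));
  return true;
}
```

The callback never runs inline on the thread that observed the event; it runs only once the owning thread processes its queue, which is why JSI is safe to touch inside it.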

diff --git a/README.md b/README.md
index 3224f858b..aef470c0f 100644
--- a/README.md
+++ b/README.md
@@ -184,6 +184,15 @@ device.queue.submit([commandEncoder.finish()]);
 context.present();
 ```
 
+### Threading model
+
+react-native-webgpu can drive WebGPU from more than one JavaScript runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor). A few rules follow from how JSI and Dawn work:
+
+- **A device belongs to the runtime that created it.** Call `requestAdapter` / `requestDevice` on the runtime where you intend to render, and use the device (and the buffers, textures, pipelines, queue and canvas context derived from it) only on that runtime. JSI objects are not shared across runtimes.
+- **Async results resolve on the runtime that issued the call.** Every async method (`requestAdapter`, `requestDevice`, `mapAsync`, `queue.onSubmittedWorkDone`, `create*PipelineAsync`, `popErrorScope`, `getCompilationInfo`) settles its Promise on the runtime that called it, on that runtime's own thread. This holds even on a worklet runtime while the main JS thread is busy, so a per-frame `mapAsync` readback keeps resolving on the worklet.
+- **Synchronous frame ops run on the calling thread.** `getCurrentTexture`, `queue.submit`, `queue.writeBuffer` and `present()` execute synchronously on whichever thread calls them.
+- **Spontaneous device events are delivered on the main JS runtime only.** `device.lost`, `uncapturederror` and the device logging callback are only guaranteed for a device created on the main JS runtime. A device created on a worklet runtime renders and performs async readbacks normally, but its `device.lost` / `uncapturederror` may not fire. If you need reliable device-loss handling, create that device on the main JS runtime.
+
 ### Canvas Transparency
 
 On Android, the `alphaMode` property is ignored when configuring the canvas.
diff --git a/apps/example/src/Reanimated/AsyncBuffer.tsx b/apps/example/src/Reanimated/AsyncBuffer.tsx
index 0b2adf4bf..04355ab42 100644
--- a/apps/example/src/Reanimated/AsyncBuffer.tsx
+++ b/apps/example/src/Reanimated/AsyncBuffer.tsx
@@ -91,56 +91,56 @@ export const webGPUAsyncDemo = (
 
     const frame = async () => {
       try {
-      frameId += 1;
-      const commandEncoder = device.createCommandEncoder();
-      const textureView = context.getCurrentTexture().createView();
-
-      const time = Date.now() / 1000;
-      const r = (Math.sin(time * 2) + 1) / 2;
-      const g = (Math.sin(time * 1.5 + Math.PI / 3) + 1) / 2;
-      const b = (Math.sin(time + Math.PI / 2) + 1) / 2;
-
-      const passEncoder = commandEncoder.beginRenderPass({
-        colorAttachments: [
-          {
-            view: textureView,
-            clearValue: [r, g, b, 1],
-            loadOp: "clear",
-            storeOp: "store",
-          },
-        ],
-      });
-      passEncoder.setPipeline(pipeline);
-      passEncoder.draw(3);
-      passEncoder.end();
-
-      const src = device.createBuffer({
-        size: SIZE,
-        usage: flags.COPY_SRC | flags.MAP_WRITE,
-        mappedAtCreation: true,
-      });
-      new Float32Array(src.getMappedRange()).set([frameId, r, g, b]);
-      src.unmap();
-      commandEncoder.copyBufferToBuffer(src, 0, readback, 0, SIZE);
-
-      device.queue.submit([commandEncoder.finish()]);
-
-      // THE ASYNC OP. With the ProcessEvents model this Promise is pumped and
-      // settled on THIS runtime's own thread, so it resolves even while the JS
-      // thread is busy. Watch the logs against the "Make JS busy" button.
-      await readback.mapAsync(flags.MAP_MODE_READ);
-      const data = Array.from(new Float32Array(readback.getMappedRange()));
-      readback.unmap();
-      src.destroy();
-      if (frameId % 30 === 0) {
-        console.log(`[asyncBuffer] frame ${frameId} resolved ->`, data);
-      }
-
-      context.present();
-
-      if (runAnimation.value) {
-        requestAnimationFrame(frame);
-      }
+        frameId += 1;
+        const commandEncoder = device.createCommandEncoder();
+        const textureView = context.getCurrentTexture().createView();
+
+        const time = Date.now() / 1000;
+        const r = (Math.sin(time * 2) + 1) / 2;
+        const g = (Math.sin(time * 1.5 + Math.PI / 3) + 1) / 2;
+        const b = (Math.sin(time + Math.PI / 2) + 1) / 2;
+
+        const passEncoder = commandEncoder.beginRenderPass({
+          colorAttachments: [
+            {
+              view: textureView,
+              clearValue: [r, g, b, 1],
+              loadOp: "clear",
+              storeOp: "store",
+            },
+          ],
+        });
+        passEncoder.setPipeline(pipeline);
+        passEncoder.draw(3);
+        passEncoder.end();
+
+        const src = device.createBuffer({
+          size: SIZE,
+          usage: flags.COPY_SRC | flags.MAP_WRITE,
+          mappedAtCreation: true,
+        });
+        new Float32Array(src.getMappedRange()).set([frameId, r, g, b]);
+        src.unmap();
+        commandEncoder.copyBufferToBuffer(src, 0, readback, 0, SIZE);
+
+        device.queue.submit([commandEncoder.finish()]);
+
+        // THE ASYNC OP. With the ProcessEvents model this Promise is pumped and
+        // settled on THIS runtime's own thread, so it resolves even while the JS
+        // thread is busy. Watch the logs against the "Make JS busy" button.
+        await readback.mapAsync(flags.MAP_MODE_READ);
+        const data = Array.from(new Float32Array(readback.getMappedRange()));
+        readback.unmap();
+        src.destroy();
+        if (frameId % 30 === 0) {
+          console.log(`[asyncBuffer] frame ${frameId} resolved ->`, data);
+        }
+
+        context.present();
+
+        if (runAnimation.value) {
+          requestAnimationFrame(frame);
+        }
       } catch (e) {
         logError("frame", e);
       }
@@ -193,7 +193,7 @@ export function AsyncBufferExample({ run }: AsyncBufferExampleProps) {
     }
     // The GPU object is created on the main runtime; we hand it to the worklet,
     // which calls requestAdapter/requestDevice on its OWN runtime.
-    const gpu = navigator.gpu;
+    const { gpu } = navigator;
     const presentationFormat = gpu.getPreferredCanvasFormat();
     const flags: GPUFlags = {
       COPY_SRC: GPUBufferUsage.COPY_SRC,
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index 3224f858b..aef470c0f 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -184,6 +184,15 @@ device.queue.submit([commandEncoder.finish()]);
 context.present();
 ```
 
+### Threading model
+
+react-native-webgpu can drive WebGPU from more than one JavaScript runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor). A few rules follow from how JSI and Dawn work:
+
+- **A device belongs to the runtime that created it.** Call `requestAdapter` / `requestDevice` on the runtime where you intend to render, and use the device (and the buffers, textures, pipelines, queue and canvas context derived from it) only on that runtime. JSI objects are not shared across runtimes.
+- **Async results resolve on the runtime that issued the call.** Every async method (`requestAdapter`, `requestDevice`, `mapAsync`, `queue.onSubmittedWorkDone`, `create*PipelineAsync`, `popErrorScope`, `getCompilationInfo`) settles its Promise on the runtime that called it, on that runtime's own thread. This holds even on a worklet runtime while the main JS thread is busy, so a per-frame `mapAsync` readback keeps resolving on the worklet.
+- **Synchronous frame ops run on the calling thread.** `getCurrentTexture`, `queue.submit`, `queue.writeBuffer` and `present()` execute synchronously on whichever thread calls them.
+- **Spontaneous device events are delivered on the main JS runtime only.** `device.lost`, `uncapturederror` and the device logging callback are only guaranteed for a device created on the main JS runtime. A device created on a worklet runtime renders and performs async readbacks normally, but its `device.lost` / `uncapturederror` may not fire. If you need reliable device-loss handling, create that device on the main JS runtime.
+
 ### Canvas Transparency
 
 On Android, the `alphaMode` property is ignored when configuring the canvas.
diff --git a/packages/webgpu/apple/WebGPUModule.mm b/packages/webgpu/apple/WebGPUModule.mm
index 5d710dd91..e637633b0 100644
--- a/packages/webgpu/apple/WebGPUModule.mm
+++ b/packages/webgpu/apple/WebGPUModule.mm
@@ -78,7 +78,6 @@ - (void)invalidate {
       std::make_shared<rnwgpu::ApplePlatformContext>();
   webgpuManager = std::make_shared<rnwgpu::RNWebGPUManager>(runtime, jsInvoker,
                                                             platformContext);
-
   return @true;
 }
 
diff --git a/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp b/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
index a9f2c2fb7..9db8ce387 100644
--- a/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
+++ b/packages/webgpu/cpp/rnwgpu/RNWebGPUManager.cpp
@@ -34,7 +34,6 @@
 #include "GPUSharedFence.h"
 #include "GPUSharedTextureMemory.h"
 #include "GPUShaderModule.h"
-#include "GPUSharedTextureMemory.h"
 #include "GPUSupportedLimits.h"
 #include "GPUTexture.h"
 #include "GPUTextureView.h"
@@ -65,6 +64,12 @@ RNWebGPUManager::RNWebGPUManager(
   // Register main runtime for RuntimeAwareCache
   BaseRuntimeAwareCache::setMainJsRuntime(_jsRuntime);
 
+  // Register the main runtime + its CallInvoker so spontaneous events
+  // (device.lost / uncapturederror) on main-runtime devices can be delivered to
+  // the JS thread without the ProcessEvents pump. Worklet-runtime devices have
+  // no invoker (best-effort; see README "Threading model").
+  async::RuntimeContext::registerMainRuntime(_jsRuntime, _jsCallInvoker);
+
   auto gpu = std::make_shared<GPU>(*_jsRuntime);
   auto rnWebGPU =
       std::make_shared<RNWebGPU>(gpu, _platformContext, _jsCallInvoker);
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
index e132378b5..624068fe6 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.cpp
@@ -6,6 +6,8 @@
 #include <utility>
 #include <vector>
 
+#include <ReactCommon/CallInvoker.h>
+
 #include "Convertors.h"
 #include "JSIConverter.h"
 
@@ -563,6 +565,24 @@ void GPUDevice::removeEventListener(std::string type, jsi::Function callback) {
 
 void GPUDevice::notifyUncapturedError(wgpu::ErrorType type,
                                       std::string message) {
+  // Dawn can surface an uncaptured error from any ProcessEvents pump (a worklet
+  // runtime sharing this instance may pump it on the wrong thread). Marshal to
+  // the owning runtime's JS thread via its CallInvoker before touching JSI. The
+  // invoker is wired only for the main JS runtime, so a device created on a
+  // worklet runtime does not deliver uncaptured errors to JS (best-effort; see
+  // README "Threading model").
+  auto invoker = _async ? _async->callInvoker() : nullptr;
+  if (!invoker) {
+    return;
+  }
+  auto self = shared_from_this();
+  invoker->invokeAsync([self, type, message = std::move(message)]() mutable {
+    self->deliverUncapturedError(type, std::move(message));
+  });
+}
+
+void GPUDevice::deliverUncapturedError(wgpu::ErrorType type,
+                                       std::string message) {
   auto it = _eventListeners.find("uncapturederror");
   if (it == _eventListeners.end() || it->second.empty()) {
     return;
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
index c4237a653..09bb9adfd 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
@@ -50,8 +50,6 @@
 #include "GPUSharedTextureMemoryDescriptor.h"
 #include "GPUShaderModule.h"
 #include "GPUShaderModuleDescriptor.h"
-#include "GPUSharedTextureMemory.h"
-#include "GPUSharedTextureMemoryDescriptor.h"
 #include "GPUSupportedLimits.h"
 #include "GPUTexture.h"
 #include "GPUTextureDescriptor.h"
@@ -158,6 +156,13 @@ class GPUDevice : public NativeObject<GPUDevice> {
   void notifyUncapturedError(wgpu::ErrorType type, std::string message);
   void forceLossForTesting();
 
+private:
+  // Runs the uncapturederror listeners on the creation runtime's JS thread.
+  // Invoked from notifyUncapturedError via the main CallInvoker.
+  void deliverUncapturedError(wgpu::ErrorType type, std::string message);
+
+public:
+
   // EventTarget methods
   void addEventListener(std::string type, jsi::Function callback);
   void removeEventListener(std::string type, jsi::Function callback);
diff --git a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
index 5c117cc7e..e6ca59285 100644
--- a/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/AsyncTaskHandle.cpp
@@ -4,6 +4,8 @@
 #include <string>
 #include <utility>
 
+#include <ReactCommon/CallInvoker.h>
+
 #include "Promise.h"
 #include "RuntimeContext.h"
 
@@ -82,14 +84,6 @@ void AsyncTaskHandle::State::schedule(Action action) {
     return;
   }
 
-  // The resolve/reject callback may fire on a thread that is NOT the owning
-  // runtime's thread: with a shared wgpu::Instance, another runtime's
-  // ProcessEvents() pump can consume this Dawn event. Touching the Promise's
-  // runtime off-thread would corrupt Hermes. So we deposit the actual settle
-  // (the only JSI-touching work) into the owning context's mailbox; the context
-  // drains it on its own thread during its next tick. The deposited closure
-  // captures only C++ state and runs no JSI until drained, so depositing from
-  // any thread is safe.
   if (!context) {
     // No context (shouldn't happen): best-effort inline settle.
     action(promiseRef->runtime, *promiseRef);
@@ -99,12 +93,42 @@ void AsyncTaskHandle::State::schedule(Action action) {
   }
 
   auto self = shared_from_this();
-  const bool keep = keepPumping;
+
+  if (!keepPumping) {
+    // Spontaneous task (e.g. device.lost): not driven by the ProcessEvents pump.
+    // Settle on the owning runtime's JS thread via its CallInvoker, which is
+    // wired only for the main JS runtime. A device created on a worklet runtime
+    // has no invoker, so its device.lost is dropped (best-effort; see the README
+    // "Threading model"). invokeAsync runs the closure on the main JS thread,
+    // where promiseRef->runtime lives for a main-runtime device.
+    auto invoker = context->callInvoker();
+    if (invoker) {
+      invoker->invokeAsync(
+          [self, action = std::move(action), promiseRef]() mutable {
+            action(promiseRef->runtime, *promiseRef);
+            std::lock_guard<std::mutex> lock(self->mutex);
+            self->keepAlive.reset();
+          });
+    } else {
+      std::lock_guard<std::mutex> lock(mutex);
+      keepAlive.reset();
+    }
+    return;
+  }
+
+  // Pumping task (request/response op). The resolve/reject callback may fire on
+  // a thread that is NOT the owning runtime's thread: with a shared
+  // wgpu::Instance, another runtime's ProcessEvents() pump can consume this Dawn
+  // event. Touching the Promise's runtime off-thread would corrupt Hermes. So we
+  // deposit the actual settle (the only JSI-touching work) into the owning
+  // context's mailbox; the context drains it on its own thread during its next
+  // tick. The deposited closure captures only C++ state and runs no JSI until
+  // drained, so depositing from any thread is safe.
   context->postSettle(
-      [self, action = std::move(action), promiseRef, keep]() mutable {
+      [self, action = std::move(action), promiseRef]() mutable {
         action(promiseRef->runtime, *promiseRef);
         if (self->context) {
-          self->context->onTaskSettled(keep);
+          self->context->onTaskSettled(/*keepPumping=*/true);
         }
         std::lock_guard<std::mutex> lock(self->mutex);
         self->keepAlive.reset();
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
index 518a0c741..46754a40c 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.cpp
@@ -4,6 +4,8 @@
 #include <stdexcept>
 #include <utility>
 
+#include <ReactCommon/CallInvoker.h>
+
 #include "AsyncTaskHandle.h"
 #include "WGPULogger.h"
 
@@ -15,10 +17,12 @@ struct RuntimeData {
 };
 constexpr const char *TAG = "RuntimeContext";
 
-// Heartbeat interval (ms) used when there are pending tasks but none that demand
-// a fast pump (e.g. a long-lived device.lost watcher). Keeps delivering Dawn's
-// spontaneous callbacks without burning CPU at frame rate when idle.
-constexpr double kSlowTickMs = 100.0;
+// The main JS runtime and its CallInvoker, registered once on install. The
+// context created for sMainRuntime gets sMainInvoker; spontaneous events
+// (device.lost) on a main-runtime device are delivered through it without the
+// pump. Worklet runtimes have no invoker (best-effort, see the header doc).
+jsi::Runtime *sMainRuntime = nullptr;
+std::shared_ptr<facebook::react::CallInvoker> sMainInvoker;
 
 // Serializes ProcessEvents() across all runtimes that share a wgpu::Instance.
 // Held only across the ProcessEvents call itself, never while running JS / mailbox
@@ -29,6 +33,13 @@ std::mutex &processEventsMutex() {
 }
 } // namespace
 
+void RuntimeContext::registerMainRuntime(
+    jsi::Runtime *runtime,
+    std::shared_ptr<facebook::react::CallInvoker> invoker) {
+  sMainRuntime = runtime;
+  sMainInvoker = std::move(invoker);
+}
+
 RuntimeContext::RuntimeContext(jsi::Runtime &runtime, wgpu::Instance instance)
     : _runtime(runtime), _instance(std::move(instance)) {
   Logger::logToConsole("[%s] Created (runtime=%p)", TAG, &runtime);
@@ -48,6 +59,11 @@ RuntimeContext::getOrCreate(jsi::Runtime &runtime, wgpu::Instance instance) {
     return existing;
   }
   auto context = std::make_shared<RuntimeContext>(runtime, std::move(instance));
+  // Only the main JS runtime's context carries the CallInvoker; it is used to
+  // deliver spontaneous events (device.lost) without the pump.
+  if (&runtime == sMainRuntime) {
+    context->_callInvoker = sMainInvoker;
+  }
   auto data = std::make_shared<RuntimeData>();
   data->context = context;
   runtime.setRuntimeData(runtimeDataUUID(), data);
@@ -61,11 +77,13 @@ AsyncTaskHandle RuntimeContext::postTask(const TaskCallback &callback,
     throw std::runtime_error("Failed to create AsyncTaskHandle.");
   }
 
-  _pendingTasks.fetch_add(1, std::memory_order_acq_rel);
+  // Only pumping tasks (request/response ops) drive the ProcessEvents pump.
+  // Spontaneous tasks (keepPumping == false, e.g. device.lost) never touch the
+  // pump: they settle via the CallInvoker (see AsyncTaskHandle::State::schedule).
   if (keepPumping) {
     _pumpTasks.fetch_add(1, std::memory_order_acq_rel);
+    requestTick();
   }
-  requestTick();
 
   auto resolve = handle.createResolveFunction();
   auto reject = handle.createRejectFunction();
@@ -80,7 +98,6 @@ AsyncTaskHandle RuntimeContext::postTask(const TaskCallback &callback,
 }
 
 void RuntimeContext::onTaskSettled(bool keepPumping) {
-  _pendingTasks.fetch_sub(1, std::memory_order_acq_rel);
   if (keepPumping) {
     _pumpTasks.fetch_sub(1, std::memory_order_acq_rel);
   }
@@ -114,18 +131,14 @@ void RuntimeContext::requestTick() {
     return;
   }
 
-  // Fast pump (delay 0) while there is active async work; otherwise a slow
-  // heartbeat so long-lived spontaneous callbacks (e.g. device.lost, uncaptured
-  // errors) are still delivered without burning CPU at frame rate when idle.
-  const double delayMs =
-      _pumpTasks.load(std::memory_order_acquire) > 0 ? 0.0 : kSlowTickMs;
-
-  // postTask and tick both run on the owning runtime's thread, so we can
-  // schedule the next tick directly via that runtime's own timer. setTimeout is
-  // available on the main RN runtime and on worklet runtimes (backed by the
-  // worklets EventLoop); setImmediate / queueMicrotask are fallbacks. We do NOT
-  // use queueMicrotask as the primary mechanism: a self-rescheduling microtask
-  // never yields the microtask checkpoint, starving the runtime's task loop.
+  // The pump only ever runs while a request/response op is outstanding, so it
+  // always schedules as soon as possible (delay 0). postTask and tick both run
+  // on the owning runtime's thread, so we schedule the next tick directly via
+  // that runtime's own timer. setTimeout is available on the main RN runtime and
+  // on worklet runtimes (backed by the worklets EventLoop); setImmediate /
+  // queueMicrotask are fallbacks. We do NOT use queueMicrotask as the primary
+  // mechanism: a self-rescheduling microtask never yields the microtask
+  // checkpoint, starving the runtime's task loop.
   auto self = shared_from_this();
   jsi::Runtime &rt = _runtime;
   auto tickCallback = jsi::Function::createFromHostFunction(
@@ -141,7 +154,7 @@ void RuntimeContext::requestTick() {
   if (setTimeoutValue.isObject() &&
       setTimeoutValue.asObject(rt).isFunction(rt)) {
     setTimeoutValue.asObject(rt).asFunction(rt).call(
-        rt, jsi::Value(rt, tickCallback), jsi::Value(delayMs));
+        rt, jsi::Value(rt, tickCallback), jsi::Value(0.0));
     return;
   }
   auto setImmediateValue = global.getProperty(rt, "setImmediate");
@@ -164,9 +177,10 @@ void RuntimeContext::tick() {
   }
   // Settle this runtime's ready promises on this thread, outside the pump lock.
   drainMailbox();
-  // Keep pumping while any task is outstanding: fast for active work, slow
-  // heartbeat for lingering watchers like device.lost (see requestTick).
-  if (_pendingTasks.load(std::memory_order_acquire) > 0) {
+  // Keep pumping only while a "pumping" task (active async work) is outstanding.
+  // Non-pumping tasks (e.g. device.lost) never keep the pump alive: they settle
+  // via the CallInvoker instead (see AsyncTaskHandle::State::schedule).
+  if (_pumpTasks.load(std::memory_order_acquire) > 0) {
     requestTick();
   }
 }
diff --git a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
index ff6b45d6d..cb0024d6d 100644
--- a/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
+++ b/packages/webgpu/cpp/rnwgpu/async/RuntimeContext.h
@@ -15,6 +15,10 @@
 
 namespace jsi = facebook::jsi;
 
+namespace facebook::react {
+class CallInvoker;
+} // namespace facebook::react
+
 namespace rnwgpu::async {
 
 /**
@@ -30,9 +34,16 @@ namespace rnwgpu::async {
  * thread and no cross-thread hop.
  *
  * The pump only runs while at least one "pumping" task is outstanding, so it
- * costs nothing when idle and stops cleanly. Tasks that may never settle on
- * their own (e.g. GPUDevice::getLost) are posted with keepPumping = false so
- * they do not keep the pump spinning forever.
+ * costs nothing when idle and stops cleanly.
+ *
+ * Spontaneous events (keepPumping = false): events that may fire at any time,
+ * independent of any request/response op (today only GPUDevice::getLost, whose
+ * Dawn callback is registered AllowSpontaneous). These are NOT driven by the
+ * pump. Instead their settle is marshalled onto the owning runtime's JS thread
+ * via that runtime's CallInvoker, which is wired only for the MAIN JS runtime
+ * (callInvoker()). A device created on a worklet runtime has no invoker, so its
+ * device.lost is best-effort and may never fire. See the README "Threading
+ * model" section.
  *
  * Shared-instance safety (mailbox): multiple runtimes may share one
  * wgpu::Instance. ProcessEvents() drains the whole instance queue and fires
@@ -60,6 +71,20 @@ class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
   static std::shared_ptr<RuntimeContext> getOrCreate(jsi::Runtime &runtime,
                                                      wgpu::Instance instance);
 
+  // Register the main JS runtime and its CallInvoker. The RuntimeContext created
+  // for this runtime gets the invoker (callInvoker() returns it); every other
+  // runtime's context returns null. Called once from RNWebGPUManager on install.
+  static void
+  registerMainRuntime(jsi::Runtime *runtime,
+                      std::shared_ptr<facebook::react::CallInvoker> invoker);
+
+  // CallInvoker for this runtime's JS thread, or null. Non-null only for the
+  // main JS runtime; used to deliver spontaneous events (device.lost) without
+  // the pump. See the class doc.
+  const std::shared_ptr<facebook::react::CallInvoker> &callInvoker() const {
+    return _callInvoker;
+  }
+
   // The wgpu::Instance bound to this runtime.
   wgpu::Instance instance() const { return _instance; }
 
@@ -85,7 +110,8 @@ class RuntimeContext : public std::enable_shared_from_this<RuntimeContext> {
 
   jsi::Runtime &_runtime;
   wgpu::Instance _instance;
-  std::atomic<std::size_t> _pendingTasks{0};
+  // Non-null only for the main JS runtime's context (see registerMainRuntime).
+  std::shared_ptr<facebook::react::CallInvoker> _callInvoker;
   std::atomic<std::size_t> _pumpTasks{0};
   std::atomic<bool> _tickScheduled{false};
 

From 19da770c04dde1807cdfbb14424a2aeaf52f12f9 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 09:26:20 +0200
Subject: [PATCH 20/25] :wrench:

---
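Note: the `device.lost` / `uncapturederror` caveat documented in the README
below follows from how settles are routed in
`AsyncTaskHandle::State::schedule`. The TypeScript sketch here is only a model
of that routing, not the C++ implementation: `RuntimeContextModel`, `Invoker`,
`Route` and the queue names are hypothetical and exist purely for illustration.

```typescript
// Hypothetical model of the settle routing; all names are illustrative only.
type Settle = () => void;
type Route = "invoker" | "mailbox" | "dropped";

interface Invoker {
  invokeAsync(fn: Settle): void;
}

class RuntimeContextModel {
  private mailbox: Settle[] = [];
  private invoker: Invoker | null;

  constructor(invoker: Invoker | null) {
    this.invoker = invoker;
  }

  // Mirrors the branch on keepPumping: spontaneous tasks go through the
  // invoker (or are dropped when there is none), pumping tasks go to the
  // mailbox and settle when the owning runtime drains it.
  schedule(settle: Settle, keepPumping: boolean): Route {
    if (!keepPumping) {
      if (this.invoker === null) {
        return "dropped";
      }
      this.invoker.invokeAsync(settle);
      return "invoker";
    }
    this.mailbox.push(settle);
    return "mailbox";
  }

  drainMailbox(): void {
    const pending = this.mailbox;
    this.mailbox = [];
    for (const settle of pending) {
      settle();
    }
  }
}

const delivered: string[] = [];
const mainThreadQueue: Settle[] = [];
const mainInvoker: Invoker = {
  invokeAsync: (fn) => {
    mainThreadQueue.push(fn);
  },
};

// Only the main runtime's context carries an invoker.
const mainContext = new RuntimeContextModel(mainInvoker);
const workletContext = new RuntimeContextModel(null);

const mainLost = mainContext.schedule(() => {
  delivered.push("main device.lost");
}, false);
const workletLost = workletContext.schedule(() => {
  delivered.push("worklet device.lost");
}, false);
const workletMap = workletContext.schedule(() => {
  delivered.push("worklet mapAsync");
}, true);

workletContext.drainMailbox(); // worklet thread tick
for (const fn of mainThreadQueue) fn(); // main JS thread runs queued closures
```

In this model the worklet's request/response op still settles, while its
spontaneous event is silently dropped, which is the behaviour the README
caveat describes.
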
 README.md                      | 10 ++++------
 packages/webgpu-shim/README.md | 27 +++------------------------
 packages/webgpu/README.md      | 10 ++++------
 3 files changed, 11 insertions(+), 36 deletions(-)

diff --git a/README.md b/README.md
index aef470c0f..934dc3d9d 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ npm install react-native-webgpu
 ## With Expo
 
 Expo provides a React Native WebGPU template that works with React Three Fiber.
-The works on iOS, Android, and Web.
+This works on iOS, Android, and Web.
 
 ```
 npx create-expo-app@latest -e with-webgpu
@@ -186,12 +186,10 @@ context.present();
 
 ### Threading model
 
-react-native-webgpu can drive WebGPU from more than one JavaScript runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor). A few rules follow from how JSI and Dawn work:
+react-native-webgpu can drive WebGPU from more than one JavaScript runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor).
+This module also works well with [Bundle Mode](https://docs.swmansion.com/react-native-worklets/docs/bundleMode/) and lets you run complex Three.js scenes on the UI thread or dedicated worklet threads.
 
-- **A device belongs to the runtime that created it.** Call `requestAdapter` / `requestDevice` on the runtime where you intend to render, and use the device (and the buffers, textures, pipelines, queue and canvas context derived from it) only on that runtime. JSI objects are not shared across runtimes.
-- **Async results resolve on the runtime that issued the call.** Every async method (`requestAdapter`, `requestDevice`, `mapAsync`, `queue.onSubmittedWorkDone`, `create*PipelineAsync`, `popErrorScope`, `getCompilationInfo`) settles its Promise on the runtime that called it, on that runtime's own thread. This holds even on a worklet runtime while the main JS thread is busy, so a per-frame `mapAsync` readback keeps resolving on the worklet.
-- **Synchronous frame ops run on the calling thread.** `getCurrentTexture`, `queue.submit`, `queue.writeBuffer` and `present()` execute synchronously on whichever thread calls them.
-- **Spontaneous device events are delivered on the main JS runtime only.** `device.lost`, `uncapturederror` and the device logging callback are only guaranteed for a device created on the main JS runtime. A device created on a worklet runtime renders and performs async readbacks normally, but its `device.lost` / `uncapturederror` may not fire. If you need reliable device-loss handling, create that device on the main JS runtime.
+One caveat: `device.lost` and `uncapturederror` are only delivered for devices created on the main JS runtime. This is usually fine because the GPU device is typically created on the main JS thread and then sent to the UI or a dedicated worklet thread. If you create the device on any other runtime, its `device.lost` and `uncapturederror` will not fire.
 
 ### Canvas Transparency
 
diff --git a/packages/webgpu-shim/README.md b/packages/webgpu-shim/README.md
index f23e4f6e7..8ae240b65 100644
--- a/packages/webgpu-shim/README.md
+++ b/packages/webgpu-shim/README.md
@@ -1,30 +1,9 @@
 # react-native-wgpu
 
-This package is a thin shim that re-exports [`react-native-webgpu`](https://www.npmjs.com/package/react-native-webgpu) under its previous npm name.
+This package has been renamed to [`react-native-webgpu`](https://www.npmjs.com/package/react-native-webgpu).
 
-It exists so that projects that depended on the older `react-native-wgpu` name keep working without an immediate code change. New projects should depend on `react-native-webgpu` directly.
-
-## Installation
+Please use `react-native-webgpu` instead.
 
 ```
-npm install react-native-wgpu
+npm install react-native-webgpu
 ```
-
-This installs `react-native-webgpu` as a dependency. All imports are forwarded:
-
-```ts
-import { Canvas } from "react-native-wgpu";
-// equivalent to
-import { Canvas } from "react-native-webgpu";
-```
-
-## Migrating
-
-Replace the dependency in your `package.json`:
-
-```diff
--  "react-native-wgpu": "^0.5.11"
-+  "react-native-webgpu": "^0.5.11"
-```
-
-and update your imports from `"react-native-wgpu"` to `"react-native-webgpu"`.
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index aef470c0f..934dc3d9d 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -15,7 +15,7 @@ npm install react-native-webgpu
 ## With Expo
 
 Expo provides a React Native WebGPU template that works with React Three Fiber.
-The works on iOS, Android, and Web.
+This works on iOS, Android, and Web.
 
 ```
 npx create-expo-app@latest -e with-webgpu
@@ -186,12 +186,10 @@ context.present();
 
 ### Threading model
 
-react-native-webgpu can drive WebGPU from more than one JavaScript runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor). A few rules follow from how JSI and Dawn work:
+react-native-webgpu can drive WebGPU from more than one JavaScript runtime: the main JS runtime, the Reanimated UI runtime, and dedicated worklet runtimes (`createWorkletRuntime` / `runOnRuntime`, or a Vision Camera frame processor).
+This module also works well with [Bundle Mode](https://docs.swmansion.com/react-native-worklets/docs/bundleMode/) and lets you run complex Three.js scenes on the UI thread or dedicated worklet threads.
 
-- **A device belongs to the runtime that created it.** Call `requestAdapter` / `requestDevice` on the runtime where you intend to render, and use the device (and the buffers, textures, pipelines, queue and canvas context derived from it) only on that runtime. JSI objects are not shared across runtimes.
-- **Async results resolve on the runtime that issued the call.** Every async method (`requestAdapter`, `requestDevice`, `mapAsync`, `queue.onSubmittedWorkDone`, `create*PipelineAsync`, `popErrorScope`, `getCompilationInfo`) settles its Promise on the runtime that called it, on that runtime's own thread. This holds even on a worklet runtime while the main JS thread is busy, so a per-frame `mapAsync` readback keeps resolving on the worklet.
-- **Synchronous frame ops run on the calling thread.** `getCurrentTexture`, `queue.submit`, `queue.writeBuffer` and `present()` execute synchronously on whichever thread calls them.
-- **Spontaneous device events are delivered on the main JS runtime only.** `device.lost`, `uncapturederror` and the device logging callback are only guaranteed for a device created on the main JS runtime. A device created on a worklet runtime renders and performs async readbacks normally, but its `device.lost` / `uncapturederror` may not fire. If you need reliable device-loss handling, create that device on the main JS runtime.
+One caveat: `device.lost` and `uncapturederror` are only delivered for devices created on the main JS runtime. This is usually fine because the GPU device is typically created on the main JS thread and then sent to the UI or a dedicated worklet thread. If you create the device on any other runtime, its `device.lost` and `uncapturederror` will not fire.
 
 ### Canvas Transparency
 

From 35deac50708f46da6aae094f984283efa1be19ba Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 09:44:40 +0200
Subject: [PATCH 21/25] :wrench:

---
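Note: the constants added in `packages/webgpu/src/constants.ts` are plain
bit flags, so they combine with `|` and can be tested with `&`. The sketch
below uses a local copy of four of the values from this patch; `bufferUsage`
and the `hasUsage` helper are hypothetical names and are not part of
react-native-webgpu.

```typescript
// Local copy of a few values from constants.ts in this patch (illustrative).
const bufferUsage = {
  MAP_READ: 0x0001,
  MAP_WRITE: 0x0002,
  COPY_SRC: 0x0004,
  COPY_DST: 0x0008,
};

// Hypothetical helper: true when every bit of `flag` is set in `usage`.
function hasUsage(usage: number, flag: number): boolean {
  return (usage & flag) === flag;
}

// Same combination the AsyncBuffer example uses for its readback buffer.
const readbackUsage = bufferUsage.COPY_DST | bufferUsage.MAP_READ;

const canMapRead = hasUsage(readbackUsage, bufferUsage.MAP_READ);
const canCopySrc = hasUsage(readbackUsage, bufferUsage.COPY_SRC);
```

Because the values are ordinary numbers on a plain object, they can be
captured by a worklet closure like any other module-level value.
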
 apps/example/src/Reanimated/AsyncBuffer.tsx | 35 ++++--------
 packages/webgpu/src/constants.ts            | 59 +++++++++++++++++++++
 packages/webgpu/src/index.tsx               |  1 +
 3 files changed, 69 insertions(+), 26 deletions(-)
 create mode 100644 packages/webgpu/src/constants.ts

diff --git a/apps/example/src/Reanimated/AsyncBuffer.tsx b/apps/example/src/Reanimated/AsyncBuffer.tsx
index 04355ab42..9847b612b 100644
--- a/apps/example/src/Reanimated/AsyncBuffer.tsx
+++ b/apps/example/src/Reanimated/AsyncBuffer.tsx
@@ -1,36 +1,27 @@
 import React, { useEffect, useRef, useState } from "react";
 import { Pressable, StyleSheet, Text, View } from "react-native";
 import type { CanvasRef, RNCanvasContext } from "react-native-webgpu";
-import { Canvas } from "react-native-webgpu";
+import { Canvas, GPUBufferUsage, GPUMapMode } from "react-native-webgpu";
 import type { SharedValue } from "react-native-reanimated";
 import { useSharedValue } from "react-native-reanimated";
 
 import { redFragWGSL, triangleVertWGSL } from "../Triangle/triangle";
 
-// The GPU usage / map-mode constants are plain numbers. We resolve them on the
-// JS thread (where the constants are guaranteed to be installed) and pass them
-// into the worklet, so the worklet does not depend on those globals being
-// present on the UI / dedicated runtime.
-interface GPUFlags {
-  COPY_SRC: number;
-  COPY_DST: number;
-  MAP_READ: number;
-  MAP_WRITE: number;
-  MAP_MODE_READ: number;
-}
-
 // A triangle demo that creates its adapter/device AND performs an async GPU
 // readback (buffer.mapAsync) every frame, all on the runtime this worklet runs
 // on. With the ProcessEvents async model the device must be created and used on
 // the same runtime, so requestAdapter/requestDevice happen here in the worklet
 // (the GPU object is passed in). The point: with the JS thread busy, the readback
 // keeps resolving on this runtime's own thread and the triangle keeps animating.
+//
+// GPUBufferUsage / GPUMapMode are imported from react-native-webgpu: the bare
+// globals are only installed on the main JS runtime, but importing them lets the
+// Worklets serializer capture them by closure, so they work on this runtime too.
 export const webGPUAsyncDemo = (
   runAnimation: SharedValue<boolean>,
   context: RNCanvasContext,
   gpu: GPU,
   presentationFormat: GPUTextureFormat,
-  flags: GPUFlags,
 ) => {
   "worklet";
   if (!context) {
@@ -84,7 +75,7 @@ export const webGPUAsyncDemo = (
     const SIZE = 16; // 4 x f32
     const readback = device.createBuffer({
       size: SIZE,
-      usage: flags.COPY_DST | flags.MAP_READ,
+      usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
     });
 
     let frameId = 0;
@@ -116,7 +107,7 @@ export const webGPUAsyncDemo = (
 
         const src = device.createBuffer({
           size: SIZE,
-          usage: flags.COPY_SRC | flags.MAP_WRITE,
+          usage: GPUBufferUsage.COPY_SRC | GPUBufferUsage.MAP_WRITE,
           mappedAtCreation: true,
         });
         new Float32Array(src.getMappedRange()).set([frameId, r, g, b]);
@@ -128,7 +119,7 @@ export const webGPUAsyncDemo = (
         // THE ASYNC OP. With the ProcessEvents model this Promise is pumped and
         // settled on THIS runtime's own thread, so it resolves even while the JS
         // thread is busy. Watch the logs against the "Make JS busy" button.
-        await readback.mapAsync(flags.MAP_MODE_READ);
+        await readback.mapAsync(GPUMapMode.READ);
         const data = Array.from(new Float32Array(readback.getMappedRange()));
         readback.unmap();
         src.destroy();
@@ -160,7 +151,6 @@ interface AsyncBufferExampleProps {
     context: RNCanvasContext,
     gpu: GPU,
     presentationFormat: GPUTextureFormat,
-    flags: GPUFlags,
   ) => void;
 }
 
@@ -195,14 +185,7 @@ export function AsyncBufferExample({ run }: AsyncBufferExampleProps) {
     // which calls requestAdapter/requestDevice on its OWN runtime.
     const { gpu } = navigator;
     const presentationFormat = gpu.getPreferredCanvasFormat();
-    const flags: GPUFlags = {
-      COPY_SRC: GPUBufferUsage.COPY_SRC,
-      COPY_DST: GPUBufferUsage.COPY_DST,
-      MAP_READ: GPUBufferUsage.MAP_READ,
-      MAP_WRITE: GPUBufferUsage.MAP_WRITE,
-      MAP_MODE_READ: GPUMapMode.READ,
-    };
-    run(webGPUAsyncDemo)(runAnimation, ctx, gpu, presentationFormat, flags);
+    run(webGPUAsyncDemo)(runAnimation, ctx, gpu, presentationFormat);
     return () => {
       runAnimation.value = false;
     };
diff --git a/packages/webgpu/src/constants.ts b/packages/webgpu/src/constants.ts
new file mode 100644
index 000000000..c823ae705
--- /dev/null
+++ b/packages/webgpu/src/constants.ts
@@ -0,0 +1,59 @@
+// WebGPU flag constants as importable JS values.
+//
+// The native module installs `GPUBufferUsage`, `GPUTextureUsage`,
+// `GPUShaderStage`, `GPUColorWrite` and `GPUMapMode` as globals, but only on the
+// main JS runtime. Worklet runtimes (Reanimated UI, dedicated worklet runtimes,
+// Vision Camera frame processors) do not get those globals, so referencing the
+// bare global inside a worklet yields `undefined`.
+//
+// These are fixed WebGPU spec values (matching the native `wgpu::*Usage` enums),
+// so we also expose them as plain JS objects. Importing them into a worklet lets
+// the Worklets serializer capture them by closure (the same way module-level
+// shader strings are captured), making them available on every runtime without
+// passing them in by hand:
+//
+//   import { GPUBufferUsage } from "react-native-webgpu";
+//   const work = () => {
+//     "worklet";
+//     device.createBuffer({ usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ });
+//   };
+
+export const GPUBufferUsage = {
+  MAP_READ: 0x0001,
+  MAP_WRITE: 0x0002,
+  COPY_SRC: 0x0004,
+  COPY_DST: 0x0008,
+  INDEX: 0x0010,
+  VERTEX: 0x0020,
+  UNIFORM: 0x0040,
+  STORAGE: 0x0080,
+  INDIRECT: 0x0100,
+  QUERY_RESOLVE: 0x0200,
+};
+
+export const GPUTextureUsage = {
+  COPY_SRC: 0x01,
+  COPY_DST: 0x02,
+  TEXTURE_BINDING: 0x04,
+  STORAGE_BINDING: 0x08,
+  RENDER_ATTACHMENT: 0x10,
+};
+
+export const GPUShaderStage = {
+  VERTEX: 0x1,
+  FRAGMENT: 0x2,
+  COMPUTE: 0x4,
+};
+
+export const GPUColorWrite = {
+  RED: 0x1,
+  GREEN: 0x2,
+  BLUE: 0x4,
+  ALPHA: 0x8,
+  ALL: 0xf,
+};
+
+export const GPUMapMode = {
+  READ: 0x1,
+  WRITE: 0x2,
+};
diff --git a/packages/webgpu/src/index.tsx b/packages/webgpu/src/index.tsx
index 5bb19fd3a..7601e8953 100644
--- a/packages/webgpu/src/index.tsx
+++ b/packages/webgpu/src/index.tsx
@@ -13,6 +13,7 @@ import type {
 } from "./types";
 
 export * from "./main";
+export * from "./constants";
 export type {
   NativeVideoFrame,
   VideoPlayer,

From 39304ad24271055a4c94256d9b9e34b27496e578 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 10:05:44 +0200
Subject: [PATCH 22/25] :wrench:

---
 README.md                 | 18 ++++++++++++++++++
 packages/webgpu/README.md | 18 ++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/README.md b/README.md
index 934dc3d9d..02715155a 100644
--- a/README.md
+++ b/README.md
@@ -343,6 +343,24 @@ const context = canvasRef.current.getContext("webgpu");
 runOnUI(renderFrame)(device, context);
 ```
 
+#### WebGPU constants inside worklets
+
+The flag constants (`GPUBufferUsage`, `GPUTextureUsage`, `GPUShaderStage`, `GPUColorWrite`, `GPUMapMode`) are installed as globals only on the main JS runtime, so the bare global is `undefined` inside a worklet. Import them from `react-native-webgpu` instead: the values are then captured into the worklet by closure (the same way a shader string is), so they work on the UI runtime, dedicated worklet runtimes, and Vision Camera frame processors.
+
+```tsx
+import { GPUBufferUsage, GPUMapMode } from "react-native-webgpu";
+
+const work = (device: GPUDevice) => {
+  "worklet";
+  const buffer = device.createBuffer({
+    size,
+    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
+  });
+  // ...
+  await buffer.mapAsync(GPUMapMode.READ);
+};
+```
+
 ## Troubleshooting
 
 ### iOS
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index 934dc3d9d..02715155a 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -343,6 +343,24 @@ const context = canvasRef.current.getContext("webgpu");
 runOnUI(renderFrame)(device, context);
 ```
 
+#### WebGPU constants inside worklets
+
+The flag constants (`GPUBufferUsage`, `GPUTextureUsage`, `GPUShaderStage`, `GPUColorWrite`, `GPUMapMode`) are installed as globals only on the main JS runtime, so the bare global is `undefined` inside a worklet. Import them from `react-native-webgpu` instead: the values are then captured into the worklet by closure (the same way a shader string is), so they work on the UI runtime, dedicated worklet runtimes, and Vision Camera frame processors.
+
+```tsx
+import { GPUBufferUsage, GPUMapMode } from "react-native-webgpu";
+
+const work = (device: GPUDevice) => {
+  "worklet";
+  const buffer = device.createBuffer({
+    size,
+    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
+  });
+  // ...
+  await buffer.mapAsync(GPUMapMode.READ);
+};
+```
+
 ## Troubleshooting
 
 ### iOS

From 7aaf206b835d99217b8f26d43df1e7fc7238ca72 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 16:17:07 +0200
Subject: [PATCH 23/25] :wrench:

---
 README.md                      | 29 +++++-----------
 packages/webgpu/README.md      | 29 +++++-----------
 packages/webgpu/src/index.tsx  |  1 +
 packages/webgpu/src/install.ts | 60 ++++++++++++++++++++++++++++++++++
 4 files changed, 79 insertions(+), 40 deletions(-)
 create mode 100644 packages/webgpu/src/install.ts

diff --git a/README.md b/README.md
index 02715155a..3c37bdd33 100644
--- a/README.md
+++ b/README.md
@@ -322,14 +322,21 @@ First, install the optional peer dependencies:
 npm install react-native-reanimated react-native-worklets
 ```
 
-WebGPU objects are automatically registered for Worklets serialization when the module loads. You can pass WebGPU objects like `GPUDevice` and `GPUCanvasContext` directly to worklets:
+WebGPU objects are automatically registered for Worklets serialization when the module loads. You can pass WebGPU objects like `GPUDevice` and `GPUCanvasContext` directly to worklets.
+Call `installWebGPU()` once at the top of the worklet to install the flag constants (`GPUBufferUsage`, `GPUTextureUsage`, `GPUShaderStage`, `GPUColorWrite`, `GPUMapMode`) as globals on that runtime.
 
 ```tsx
-import { Canvas } from "react-native-webgpu";
+import { Canvas, installWebGPU } from "react-native-webgpu";
 import { runOnUI } from "react-native-reanimated";
 
 const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   "worklet";
+  installWebGPU();
+  // WebGPU constants are no available on this worklet thread
+  const buffer = device.createBuffer({
+    size,
+    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
+  });
   // WebGPU rendering code runs on the UI thread
   const commandEncoder = device.createCommandEncoder();
   // ... render ...
@@ -343,24 +350,6 @@ const context = canvasRef.current.getContext("webgpu");
 runOnUI(renderFrame)(device, context);
 ```
 
-#### WebGPU constants inside worklets
-
-The flag constants (`GPUBufferUsage`, `GPUTextureUsage`, `GPUShaderStage`, `GPUColorWrite`, `GPUMapMode`) are installed as globals only on the main JS runtime, so the bare global is `undefined` inside a worklet. Import them from `react-native-webgpu` instead: the values are then captured into the worklet by closure (the same way a shader string is), so they work on the UI runtime, dedicated worklet runtimes, and Vision Camera frame processors.
-
-```tsx
-import { GPUBufferUsage, GPUMapMode } from "react-native-webgpu";
-
-const work = (device: GPUDevice) => {
-  "worklet";
-  const buffer = device.createBuffer({
-    size,
-    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
-  });
-  // ...
-  await buffer.mapAsync(GPUMapMode.READ);
-};
-```
-
 ## Troubleshooting
 
 ### iOS
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index 02715155a..3c37bdd33 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -322,14 +322,21 @@ First, install the optional peer dependencies:
 npm install react-native-reanimated react-native-worklets
 ```
 
-WebGPU objects are automatically registered for Worklets serialization when the module loads. You can pass WebGPU objects like `GPUDevice` and `GPUCanvasContext` directly to worklets:
+WebGPU objects are automatically registered for Worklets serialization when the module loads. You can pass WebGPU objects like `GPUDevice` and `GPUCanvasContext` directly to worklets.
+Call `installWebGPU()` once at the top of the worklet to install the flag constants (`GPUBufferUsage`, `GPUTextureUsage`, `GPUShaderStage`, `GPUColorWrite`, `GPUMapMode`) as globals on that runtime.
 
 ```tsx
-import { Canvas } from "react-native-webgpu";
+import { Canvas, installWebGPU } from "react-native-webgpu";
 import { runOnUI } from "react-native-reanimated";
 
 const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   "worklet";
+  installWebGPU();
+  // WebGPU constants are no available on this worklet thread
+  const buffer = device.createBuffer({
+    size,
+    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
+  });
   // WebGPU rendering code runs on the UI thread
   const commandEncoder = device.createCommandEncoder();
   // ... render ...
@@ -343,24 +350,6 @@ const context = canvasRef.current.getContext("webgpu");
 runOnUI(renderFrame)(device, context);
 ```
 
-#### WebGPU constants inside worklets
-
-The flag constants (`GPUBufferUsage`, `GPUTextureUsage`, `GPUShaderStage`, `GPUColorWrite`, `GPUMapMode`) are installed as globals only on the main JS runtime, so the bare global is `undefined` inside a worklet. Import them from `react-native-webgpu` instead: the values are then captured into the worklet by closure (the same way a shader string is), so they work on the UI runtime, dedicated worklet runtimes, and Vision Camera frame processors.
-
-```tsx
-import { GPUBufferUsage, GPUMapMode } from "react-native-webgpu";
-
-const work = (device: GPUDevice) => {
-  "worklet";
-  const buffer = device.createBuffer({
-    size,
-    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
-  });
-  // ...
-  await buffer.mapAsync(GPUMapMode.READ);
-};
-```
-
 ## Troubleshooting
 
 ### iOS
diff --git a/packages/webgpu/src/index.tsx b/packages/webgpu/src/index.tsx
index 7601e8953..58728ad32 100644
--- a/packages/webgpu/src/index.tsx
+++ b/packages/webgpu/src/index.tsx
@@ -14,6 +14,7 @@ import type {
 
 export * from "./main";
 export * from "./constants";
+export * from "./install";
 export type {
   NativeVideoFrame,
   VideoPlayer,
diff --git a/packages/webgpu/src/install.ts b/packages/webgpu/src/install.ts
new file mode 100644
index 000000000..bc6016b7e
--- /dev/null
+++ b/packages/webgpu/src/install.ts
@@ -0,0 +1,60 @@
+import {
+  GPUBufferUsage,
+  GPUColorWrite,
+  GPUMapMode,
+  GPUShaderStage,
+  GPUTextureUsage,
+} from "./constants";
+
+// Globals that this function installs on the calling runtime. These are fixed
+// WebGPU spec values (matching the native `wgpu::*Usage` enums), so they are
+// safe to set on any runtime.
+const constants = {
+  GPUBufferUsage,
+  GPUTextureUsage,
+  GPUShaderStage,
+  GPUColorWrite,
+  GPUMapMode,
+};
+
+/**
+ * Install WebGPU on the runtime that calls it.
+ *
+ * The native module installs the WebGPU flag constants (`GPUBufferUsage`,
+ * `GPUTextureUsage`, `GPUShaderStage`, `GPUColorWrite`, `GPUMapMode`) as globals
+ * on the main JS runtime, but worklet runtimes (Reanimated UI, dedicated worklet
+ * runtimes, Vision Camera frame processors) start without them, so referencing
+ * the bare global inside a worklet yields `undefined`.
+ *
+ * Call `installWebGPU()` once at the top of a worklet to make those globals
+ * available there, instead of importing each constant by hand:
+ *
+ * ```tsx
+ * import { installWebGPU } from "react-native-webgpu";
+ *
+ * const work = (device: GPUDevice) => {
+ *   "worklet";
+ *   installWebGPU();
+ *   device.createBuffer({
+ *     usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
+ *   });
+ * };
+ * ```
+ *
+ * The constants are captured into the worklet by closure (the same way a shader
+ * string is), so they work on every runtime. Calling it on a runtime that
+ * already has the globals (e.g. the main JS runtime) is a safe no-op.
+ *
+ * This is the explicit entry point for runtime setup; for now it only installs
+ * the flag constants, but it is the place where other per-runtime WebGPU setup
+ * (e.g. `navigator.gpu`) can be wired in later.
+ */
+export const installWebGPU = () => {
+  "worklet";
+  const g = globalThis as unknown as Record<string, unknown>;
+  for (const [key, value] of Object.entries(constants)) {
+    if (g[key] === undefined) {
+      g[key] = value;
+    }
+  }
+};

From b27167b811badaae6436b9eb5fc3bc5a64882d05 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 16:32:48 +0200
Subject: [PATCH 24/25] :wrench:

---
 README.md                        |  2 +-
 packages/webgpu/README.md        |  2 +-
 packages/webgpu/src/constants.ts | 58 ++++++++++----------------------
 packages/webgpu/src/install.ts   |  7 ++--
 4 files changed, 24 insertions(+), 45 deletions(-)

diff --git a/README.md b/README.md
index 3c37bdd33..323865194 100644
--- a/README.md
+++ b/README.md
@@ -332,7 +332,7 @@ import { runOnUI } from "react-native-reanimated";
 const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   "worklet";
   installWebGPU();
-  // WebGPU constants are no available on this worklet thread
+  // WebGPU constants are now available on this worklet thread
   const buffer = device.createBuffer({
     size,
     usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
diff --git a/packages/webgpu/README.md b/packages/webgpu/README.md
index 3c37bdd33..323865194 100644
--- a/packages/webgpu/README.md
+++ b/packages/webgpu/README.md
@@ -332,7 +332,7 @@ import { runOnUI } from "react-native-reanimated";
 const renderFrame = (device: GPUDevice, context: GPUCanvasContext) => {
   "worklet";
   installWebGPU();
-  // WebGPU constants are no available on this worklet thread
+  // WebGPU constants are now available on this worklet thread
   const buffer = device.createBuffer({
     size,
     usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
diff --git a/packages/webgpu/src/constants.ts b/packages/webgpu/src/constants.ts
index c823ae705..c96970f46 100644
--- a/packages/webgpu/src/constants.ts
+++ b/packages/webgpu/src/constants.ts
@@ -1,3 +1,5 @@
+/// <reference types="@webgpu/types" />
+
 // WebGPU flag constants as importable JS values.
 //
 // The native module installs `GPUBufferUsage`, `GPUTextureUsage`,
@@ -6,54 +8,30 @@
 // Vision Camera frame processors) do not get those globals, so referencing the
 // bare global inside a worklet yields `undefined`.
 //
-// These are fixed WebGPU spec values (matching the native `wgpu::*Usage` enums),
-// so we also expose them as plain JS objects. Importing them into a worklet lets
-// the Worklets serializer capture them by closure (the same way module-level
-// shader strings are captured), making them available on every runtime without
-// passing them in by hand:
+// Rather than hardcode the bit values here (which could drift from the native
+// `wgpu::*Usage` enums), we re-export the globals the native module already
+// installed (see `GPUBufferUsage.h` and friends, which derive their values from
+// the Dawn enums with `static_assert`s). This keeps a single source of truth.
+// Importing them into a worklet lets the Worklets serializer capture them by
+// closure (the same way module-level shader strings are captured), making them
+// available on every runtime without passing them in by hand:
 //
 //   import { GPUBufferUsage } from "react-native-webgpu";
 //   const work = () => {
 //     "worklet";
 //     device.createBuffer({ usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ });
 //   };
+//
+// These are read at module evaluation time. The package entry (`index.tsx`)
+// re-exports `./main` before `./constants`, and `./main` installs the native
+// module synchronously, so the globals always exist by the time this runs.
 
-export const GPUBufferUsage = {
-  MAP_READ: 0x0001,
-  MAP_WRITE: 0x0002,
-  COPY_SRC: 0x0004,
-  COPY_DST: 0x0008,
-  INDEX: 0x0010,
-  VERTEX: 0x0020,
-  UNIFORM: 0x0040,
-  STORAGE: 0x0080,
-  INDIRECT: 0x0100,
-  QUERY_RESOLVE: 0x0200,
-};
+export const GPUBufferUsage = globalThis.GPUBufferUsage;
 
-export const GPUTextureUsage = {
-  COPY_SRC: 0x01,
-  COPY_DST: 0x02,
-  TEXTURE_BINDING: 0x04,
-  STORAGE_BINDING: 0x08,
-  RENDER_ATTACHMENT: 0x10,
-};
+export const GPUTextureUsage = globalThis.GPUTextureUsage;
 
-export const GPUShaderStage = {
-  VERTEX: 0x1,
-  FRAGMENT: 0x2,
-  COMPUTE: 0x4,
-};
+export const GPUShaderStage = globalThis.GPUShaderStage;
 
-export const GPUColorWrite = {
-  RED: 0x1,
-  GREEN: 0x2,
-  BLUE: 0x4,
-  ALPHA: 0x8,
-  ALL: 0xf,
-};
+export const GPUColorWrite = globalThis.GPUColorWrite;
 
-export const GPUMapMode = {
-  READ: 0x1,
-  WRITE: 0x2,
-};
+export const GPUMapMode = globalThis.GPUMapMode;
diff --git a/packages/webgpu/src/install.ts b/packages/webgpu/src/install.ts
index bc6016b7e..3483a0e3d 100644
--- a/packages/webgpu/src/install.ts
+++ b/packages/webgpu/src/install.ts
@@ -6,9 +6,10 @@ import {
   GPUTextureUsage,
 } from "./constants";
 
-// Globals that this function installs on the calling runtime. These are fixed
-// WebGPU spec values (matching the native `wgpu::*Usage` enums), so they are
-// safe to set on any runtime.
+// Globals that this function installs on the calling runtime. These are the
+// native-derived flag constants re-exported from `./constants` (a single source
+// of truth, matching the native `wgpu::*Usage` enums), so they are safe to set
+// on any runtime.
 const constants = {
   GPUBufferUsage,
   GPUTextureUsage,

From 228edf6412266cb0a331da9d630a3fc9139dac27 Mon Sep 17 00:00:00 2001
From: William Candillon <wcandillon@gmail.com>
Date: Sun, 7 Jun 2026 16:37:49 +0200
Subject: [PATCH 25/25] :wrench:

---
 packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h |  4 ++++
 packages/webgpu/cpp/rnwgpu/api/GPUDevice.h   | 11 ++++-------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h b/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
index db18d7af1..5e96ee480 100644
--- a/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
+++ b/packages/webgpu/cpp/rnwgpu/SurfaceRegistry.h
@@ -127,6 +127,10 @@ class SurfaceInfo {
 #ifdef __APPLE__
     // Ensure command buffers are scheduled before presenting. Read the device
     // under a shared lock, then wait without holding it (the wait can block).
+    // The device may be reconfigured between the two locks; that is safe because
+    // present() is called on the rendering thread right after submit(), the wait
+    // just flushes that thread's already-submitted work, and the Present() below
+    // re-checks `surface` under the unique lock before touching it.
     wgpu::Device device;
     {
       std::shared_lock<std::shared_mutex> lock(_mutex);
diff --git a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
index 09bb9adfd..8df6909a2 100644
--- a/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
+++ b/packages/webgpu/cpp/rnwgpu/api/GPUDevice.h
@@ -156,13 +156,6 @@ class GPUDevice : public NativeObject<GPUDevice> {
   void notifyUncapturedError(wgpu::ErrorType type, std::string message);
   void forceLossForTesting();
 
-private:
-  // Runs the uncapturederror listeners on the creation runtime's JS thread.
-  // Invoked from notifyUncapturedError via the main CallInvoker.
-  void deliverUncapturedError(wgpu::ErrorType type, std::string message);
-
-public:
-
   // EventTarget methods
   void addEventListener(std::string type, jsi::Function callback);
   void removeEventListener(std::string type, jsi::Function callback);
@@ -260,6 +253,10 @@ class GPUDevice : public NativeObject<GPUDevice> {
 private:
   friend class GPUAdapter;
 
+  // Runs the uncapturederror listeners on the creation runtime's JS thread.
+  // Invoked from notifyUncapturedError via the main CallInvoker.
+  void deliverUncapturedError(wgpu::ErrorType type, std::string message);
+
   wgpu::Device _instance;
   std::shared_ptr<async::RuntimeContext> _async;
   std::string _label;