
Trim import tractor 0.42s -> 0.15s (gh #470) #478

Open
goodboy wants to merge 5 commits into main from wkt/boot_latency_470


goodboy (Owner) commented Jul 2, 2026


Motivation

import tractor measured ~0.42s cold and, on the default trio
spawn backend, that import accounts for nearly all per-actor spawn
latency: a pure start_actor() (spawn + boot + register) came in at
~0.40-0.44s/actor, so spawning one subactor per core sequentially
costs N × ~0.4s. This surfaced while reworking the landing-page
example in #460; issue #470 proposed lazy-importing the
heavy/optional deps. The practical payoff: cold boots cheap enough
that a tree can spawn subactors serially (inline start_actor())
without needing the bg-trio.Task import-overlap trick just to stay
responsive.

Profiling (-X importtime + cProfile) showed the heavy/optional
dep-list accounted for only ~20ms of the total. The dominant ~244ms
came from log.get_logger()'s nested get_caller_mod() calling
inspect.stack(), triggered by module-level get_logger() calls in ~39
tractor modules: inspect.stack() builds src-file info for EVERY frame
of the (deep, import-time) stack and scans all of sys.modules per
frame via inspect.getmodule().
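To reproduce a split like this, one option is to run a fresh interpreter under `-X importtime` and rank modules by self time: module-level work such as a `get_logger()` call executed at import is charged to the importing module's self time. A minimal sketch using only the stdlib; `parse_importtime()` and `top_self_time()` are hypothetical helper names, not part of tractor:

```python
import re
import subprocess
import sys

# `-X importtime` writes one stderr line per imported module, e.g.
#   import time:       183 |        183 |     _io
# (self-us | cumulative-us | indented module name); the header line has
# text instead of digits, so it never matches this pattern.
_LINE = re.compile(r'^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s*(\S.*)$')


def parse_importtime(stderr: str) -> list[tuple[str, int, int]]:
    '''Return (module, self_us, cumulative_us) for each importtime line.'''
    rows: list[tuple[str, int, int]] = []
    for line in stderr.splitlines():
        m = _LINE.match(line)
        if m:
            self_us, cum_us, name = m.groups()
            rows.append((name.strip(), int(self_us), int(cum_us)))
    return rows


def top_self_time(module: str, n: int = 10) -> list[tuple[str, int, int]]:
    '''Import `module` in a fresh interpreter; return the n slowest by self time.'''
    proc = subprocess.run(
        [sys.executable, '-X', 'importtime', '-c', f'import {module}'],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = parse_importtime(proc.stderr)
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

In an environment with tractor installed, `top_self_time('tractor')` would list the modules with the most self time first; cProfile is still needed to attribute that time to a specific call such as `inspect.stack()`.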

Summary of changes

  • fix the real hot-path: get_caller_mod() now resolves the caller
    frame via sys._getframe(frames_up) + an f_globals['__name__'] ->
    sys.modules lookup instead of the O(stack × sys.modules)
    inspect.stack() walk.
  • lazy-import the optional 3rd-party deps per the #470 checklist
    ("Trim import tractor cost to cut actor spawn latency"):
    - bidict, multiaddr -> TYPE_CHECKING/fn-local
      (discovery._addr/._multiaddr, ipc._tcp/._uds)
    - colorlog -> fn-local in log.get_console_log()
    - pdbp + wrapt -> fn-local in devx._frame_stack
    - platformdirs -> fn-local in runtime._state.get_rt_dir()
  • defer asyncio entirely for trio-only apps: fn-local the
    .to_asyncio imports in devx.debug._trace/._tty_lock +
    spawn._entry's infected-aio branches, with a PEP-562
    __getattr__ in tractor/__init__.py keeping the public
    tractor.to_asyncio.<attr> access working unchanged.
  • add prompt-io provenance entries for the AI-assisted session per
    the NLNet generative-AI policy.
  • demo the payoff in the primary we_are_processes example: a
    main() spawn_subs_in_bg_tasks toggle so the serial spawn path
    (inline start_actor() per sub) can be shown alongside the
    bg-trio.Task overlap path. Now that a cold child boots in ~0.18s,
    the overlap trick is no longer needed to keep tree spawn-time sane.
    Factors an open_ep() helper out of spawn_and_open_ep() (which
    grows a maybe_ptl param) so the spawn step is caller-optional.

Results: import tractor 0.42s -> ~0.145s (-65%); sequential
start_actor() latency ~0.42 -> ~0.18s/actor. Full suite green
under both --tpt-proto backends (tcp 403 passed, uds 401
passed, 0 failures each).

Future follow up

The remaining follow-ups stay open in issue #470: deferring pdbp
(~10ms, needs a devx.debug._repl restructure) and platformdirs
(~1.5ms, needs an Address-proto rework of
UDSAddress.def_bindspace).

Links

(this pr content was generated in some part by claude-code)

goodboy added 4 commits July 2, 2026 12:17
`get_caller_mod()` (nested in `get_logger()`) walks the WHOLE
call-stack via `inspect.stack()`, which also resolves src-file
info for every frame and scans all of `sys.modules` per frame
via `inspect.getmodule()`. During nested imports (deep
importlib stacks) each module-level `get_logger()` call costs
~5-10ms, making the ~39 such calls dominate `import tractor`
wall-time: ~244ms of the ~420ms total (see gh #470).

Deats,
- resolve the caller frame with `sys._getframe(frames_up)` and
  map its `f_globals['__name__']` through `sys.modules`: O(1)
  vs. O(stack x sys.modules).
- guard `ValueError` (stack too shallow) -> `None`, matching
  the existing null-caller handling at all use-sites.
- drop the now-unused `inspect` imports; pull `FrameType` from
  `types` instead.
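The frame lookup described above can be sketched as follows. This is a simplified, hypothetical version: tractor's real `get_caller_mod()` is nested inside `get_logger()` and its exact signature may differ, and the `get_logger()` wrapper here is illustrative only.

```python
from __future__ import annotations

import logging
import sys
from types import FrameType, ModuleType


def get_caller_mod(frames_up: int = 2) -> ModuleType | None:
    '''
    Return the module whose code is running `frames_up` frames above
    this function (0 = this frame, 1 = its direct caller, ...).
    '''
    try:
        frame: FrameType = sys._getframe(frames_up)
    except ValueError:
        # stack shallower than `frames_up`: treat like "no caller"
        return None

    # two dict lookups, independent of stack depth and len(sys.modules)
    name = frame.f_globals.get('__name__')
    return sys.modules.get(name) if name else None


def get_logger(name: str | None = None) -> logging.Logger:
    # hypothetical simplified wrapper, NOT tractor's real signature;
    # frames: 0 = get_caller_mod, 1 = get_logger, 2 = our caller
    if name is None:
        mod = get_caller_mod(frames_up=2)
        name = mod.__name__ if mod is not None else None
    return logging.getLogger(name)
```

A module-level `log = get_logger()` then yields a logger named after the calling module, without building any per-frame source info.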

Results: `import tractor` drops 0.42s -> ~0.155s; sequential
`.start_actor()` spawn latency ~0.42 -> ~0.18s/actor.

Prompt-IO: ai/prompt-io/claude/20260702T155626Z_65bf9df5_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Move every import-time-only-by-accident dep off the eager
`import tractor` path so cold child-actor boots only pay for
what they actually use:

- `bidict` -> `TYPE_CHECKING` in `discovery._addr`
  (annotation-only; `_address_types` is a plain `dict`
  literal).
- `multiaddr` -> `TYPE_CHECKING` + fn-local imports in
  `discovery._multiaddr.mk_maddr()`/`parse_maddr()`; also
  `TYPE_CHECKING` the `Multiaddr` annots in `ipc._tcp`/`._uds`
  (adds future-annots to `._multiaddr`).
- `colorlog` -> fn-local in `log.get_console_log()`.
- `pdbp` + `wrapt` -> fn-local in
  `devx._frame_stack.hide_runtime_frames()`/`api_frame()`.
- `platformdirs` -> fn-local in `runtime._state.get_rt_dir()`.
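The `TYPE_CHECKING` + fn-local pattern applied to these deps looks roughly like this. A minimal sketch that uses the stdlib `fractions` module as a hypothetical stand-in for a heavy dep such as `multiaddr`; `mk_ratio()` is made up for illustration:

```python
from __future__ import annotations  # annotations stay unevaluated strings

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # seen by type-checkers only; never executed at runtime
    from fractions import Fraction


def mk_ratio(num: int, den: int) -> Fraction:
    # fn-local import: the real import cost is paid on the first call,
    # later calls just find the module already in sys.modules
    from fractions import Fraction
    return Fraction(num, den)
```

With postponed annotations the return annotation remains the string `'Fraction'`, so nothing outside the function body needs the dep at import time.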

Still eager (documented follow-ups),
- `pdbp` via `devx.debug._repl` class-bases
  (`PdbREPL(pdbp.Pdb)`) + the module-lvl `@pdbp.hideframe` in
  `._tty_lock`; needs a `._repl` restructure.
- `platformdirs` via the `UDSAddress.def_bindspace: ClassVar`
  class-body eval of `get_rt_dir()`; needs an `Address`-proto
  rework.
- `stackscope` is already fn-local; `setproctitle` is not
  imported anywhere.
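Why a `ClassVar` default keeps `platformdirs` eager: class-body expressions execute when the `class` statement runs, i.e. at module import. A minimal sketch where `get_rt_dir()` is a call-recording stand-in (not the real helper) and `EagerAddr`/`LazyAddr` are hypothetical classes; the lazy variant is just one possible shape, not the planned `Address`-proto rework:

```python
import functools
from typing import ClassVar

calls: list[str] = []


def get_rt_dir() -> str:
    # stand-in: records each call so we can see *when* it runs;
    # the returned path is made up
    calls.append('get_rt_dir')
    return '/tmp/rt'


class EagerAddr:
    # evaluated right here, when the class statement executes at
    # import time, whether or not anyone ever reads the attribute
    def_bindspace: ClassVar[str] = get_rt_dir()


class LazyAddr:
    # one possible lazy variant: computed on first call, then cached
    # per class by functools.cache (keyed on `cls`)
    @classmethod
    @functools.cache
    def def_bindspace(cls) -> str:
        return get_rt_dir()
```

The lazy form turns an attribute into a method call, changing the interface callers see, which is consistent with the note above that this deferral needs a protocol rework rather than a one-line import move.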

Prompt-IO: ai/prompt-io/claude/20260702T155626Z_65bf9df5_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`asyncio` (~5ms) only matters for infected-aio actors yet gets
imported by every cold `import tractor` via module-lvl
`.to_asyncio` imports in the debug-REPL + spawn-entry mods.
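One way to guard this deferral against regressions is to import the target in a fresh interpreter and check whether `asyncio` landed in `sys.modules`. A minimal sketch; `import_pulls_in()` is a hypothetical helper, and expecting `import_pulls_in('tractor', 'asyncio')` to return `False` assumes no other eager dependency imports `asyncio` transitively:

```python
import subprocess
import sys


def import_pulls_in(target: str, probe: str) -> bool:
    '''
    In a fresh interpreter, `import target` and report whether module
    `probe` ended up in `sys.modules` (directly or transitively).
    '''
    code = f'import sys, {target}; print({probe!r} in sys.modules)'
    proc = subprocess.run(
        [sys.executable, '-c', code],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip() == 'True'
```

A subprocess is used because the current test process has usually imported `asyncio` already, which would mask the result.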

Deats,
- `devx.debug._trace`/`._tty_lock`: mv `import asyncio` under
  `TYPE_CHECKING` + fn-local it at the two
  `asyncio.current_task()` call-sites; fn-local the
  `run_trio_task_in_future` imports in the infected-aio-only
  branches.
- `spawn._entry`: fn-local `run_as_asyncio_guest` inside the
  `infect_asyncio=True` branches of `_mp_main()`/
  `_trio_main()`.
- `tractor/__init__.py`: add a PEP-562 module `__getattr__`
  lazy-loading `.to_asyncio` on first attr-access so the
  public `tractor.to_asyncio.<attr>` API (e.g.
  `LinkedTaskChannel` annots in
  `test_child_manages_service_nursery.py` + downstream users)
  keeps working unchanged.
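The PEP 562 lazy-submodule pattern can be exercised end to end with a throwaway package. A minimal sketch, assuming a hypothetical package `lazypkg` whose `heavy` submodule plays the role of `tractor.to_asyncio`; tractor's actual `__getattr__` may differ in detail:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# package __init__ with a PEP 562 module-level __getattr__ that defers
# importing its `heavy` submodule until first attribute access
INIT_SRC = '''\
import importlib

def __getattr__(name: str):
    # only called when normal module attribute lookup fails
    if name == 'heavy':
        mod = importlib.import_module('.heavy', __name__)
        globals()[name] = mod  # explicit cache for later lookups
        return mod
    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')
'''

HEAVY_SRC = 'LOADED = True\n'


def demo(pkg_name: str = 'lazypkg') -> tuple[bool, bool, bool]:
    '''Return (loaded-before-access, attr-value, loaded-after-access).'''
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / pkg_name
        pkg_dir.mkdir()
        (pkg_dir / '__init__.py').write_text(INIT_SRC)
        (pkg_dir / 'heavy.py').write_text(HEAVY_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(pkg_name)
            before = f'{pkg_name}.heavy' in sys.modules
            loaded = pkg.heavy.LOADED  # triggers __getattr__ -> import
            after = f'{pkg_name}.heavy' in sys.modules
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(f'{pkg_name}.heavy', None)
            sys.modules.pop(pkg_name, None)
    return before, loaded, after
```

Since `__getattr__` only runs when normal lookup fails, the submodule's import cost is paid once on first access. The import system also binds a loaded submodule on its parent package, so the explicit `globals()` cache mostly documents intent.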

Prompt-IO: ai/prompt-io/claude/20260702T155626Z_65bf9df5_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Log the AI-assisted session per the NLNet generative-AI
policy: prompt, profiling findings, per-file diff pointers,
measured results and the unimplemented `pdbp`/`platformdirs`
deferral follow-ups.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Copilot AI review requested due to automatic review settings July 2, 2026 16:38
goodboy changed the title from "Wkt/boot latency 470" to "Trim import tractor 0.42s -> 0.15s (gh #470)" Jul 2, 2026

Copilot AI left a comment


Pull request overview

This PR reduces import tractor wall time (and downstream actor boot latency) by removing expensive eager imports and optimizing logger caller-module detection.

Changes:

  • Reworks tractor.log.get_logger() caller-module resolution to avoid inspect.stack()/inspect.getmodule() overhead; moves colorlog to a function-local import.
  • Introduces PEP 562 lazy submodule loading for tractor.to_asyncio to keep asyncio off the eager import tractor path.
  • Converts several imports (platformdirs, multiaddr, bidict, wrapt, pdbp, and some asyncio / tractor.to_asyncio call sites) to TYPE_CHECKING or local imports to reduce import-time cost.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Summary per file

| File | Description |
|---|---|
| `tractor/spawn/_entry.py` | Defers `.to_asyncio` import to `infect_asyncio=True` branches to avoid eager asyncio cost. |
| `tractor/runtime/_state.py` | Lazy-imports platformdirs inside `get_rt_dir()` to reduce eager imports. |
| `tractor/log.py` | Replaces `inspect.stack()`-based caller detection with `sys._getframe()`; lazy-imports colorlog in console handler setup. |
| `tractor/ipc/_uds.py` | Moves `multiaddr.Multiaddr` import to `TYPE_CHECKING` to avoid eager import cost. |
| `tractor/ipc/_tcp.py` | Same as `_uds.py`: `Multiaddr` import moved under `TYPE_CHECKING`. |
| `tractor/discovery/_multiaddr.py` | Adds postponed annotations + local multiaddr imports in parsing/formatting helpers. |
| `tractor/discovery/_addr.py` | Moves bidict import under `TYPE_CHECKING` (annotation-only). |
| `tractor/devx/debug/_tty_lock.py` | Defers asyncio / `.to_asyncio` imports to runtime branches that require them. |
| `tractor/devx/debug/_trace.py` | Defers asyncio / `.to_asyncio` imports to runtime branches that require them. |
| `tractor/devx/_frame_stack.py` | Defers pdbp and wrapt imports to their use sites to reduce eager import cost. |
| `tractor/__init__.py` | Adds `__getattr__` to lazily load `tractor.to_asyncio` on first access. |
| `ai/prompt-io/claude/20260702T155626Z_65bf9df5_prompt_io.raw.md` | Adds raw prompt/output capture related to the work on gh #470. |
| `ai/prompt-io/claude/20260702T155626Z_65bf9df5_prompt_io.md` | Adds summarized prompt/output capture related to the work on gh #470. |


The #470 boot-latency example hard-coded spawning each `worker_<i>`
subactor concurrently from a bg `trio.Task` (so each child's cold
`import tractor` overlaps). Add a `main()` `spawn_subs_in_bg_tasks`
flag so the serial-spawn path can be demo'd/compared too: flip it
`False` to `start_actor()` each sub inline in the loop before
handing the ready `Portal` to the bg task.

Deats,
- factor an `open_ep(ptl, i)` helper out of `spawn_and_open_ep()` -
  just the `Portal.open_context()` + `wait_for_result()` half, now
  that the spawn step is caller-optional.
- `spawn_and_open_ep()` grows a `maybe_ptl: Portal|None = None`
  param: spawn the subactor itself when unset (bg-task path), OW
  reuse the pre-spawned one (serial path).
- move the "overlap cold imports" rationale comment onto the new
  `main()` param where the toggle now lives.
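The toggle's control flow can be modeled without tractor. A sketch that swaps in `asyncio` for trio; `FakePortal`, `spawn()` and the simulated `boot_s` delay are hypothetical stand-ins, not tractor's `Portal`/`start_actor()` API:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class FakePortal:
    # hypothetical stand-in for a `Portal`: just remembers its sub's name
    name: str


async def spawn(i: int, boot_s: float) -> FakePortal:
    # stand-in for `start_actor()`: simulate a child's cold-boot cost
    await asyncio.sleep(boot_s)
    return FakePortal(f'worker_{i}')


async def open_ep(ptl: FakePortal, i: int) -> str:
    # stand-in for the `open_context()` + `wait_for_result()` half
    await asyncio.sleep(0)
    return f'{ptl.name}:ep{i}'


async def spawn_and_open_ep(
    i: int,
    boot_s: float,
    maybe_ptl: FakePortal | None = None,
) -> str:
    # bg-task path spawns here; serial path passes a pre-spawned portal
    ptl = maybe_ptl if maybe_ptl is not None else await spawn(i, boot_s)
    return await open_ep(ptl, i)


async def main(
    n_subs: int = 4,
    boot_s: float = 0.01,
    spawn_subs_in_bg_tasks: bool = True,
) -> list[str]:
    tasks = []
    for i in range(n_subs):
        if spawn_subs_in_bg_tasks:
            # each child's boot overlaps the others'
            tasks.append(asyncio.create_task(spawn_and_open_ep(i, boot_s)))
        else:
            # serial: boot inline, then hand the ready portal to a bg task
            ptl = await spawn(i, boot_s)
            tasks.append(
                asyncio.create_task(spawn_and_open_ep(i, boot_s, maybe_ptl=ptl))
            )
    return list(await asyncio.gather(*tasks))
```

Serial mode spends roughly `n_subs × boot_s` of wall time on boots, while the bg-task mode overlaps them. At the ~0.18s/actor measured in this PR, 8 serial spawns come to about 1.44s, versus about 3.36s at the previous ~0.42s/actor.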

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
