vfs: integrate with CJS and ESM module loaders#63653
Conversation
|
Review requested:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #63653 +/- ##
==========================================
- Coverage 91.96% 90.32% -1.64%
==========================================
Files 379 732 +353
Lines 166638 237271 +70633
Branches 25497 44727 +19230
==========================================
+ Hits 153242 214325 +61083
- Misses 13099 14656 +1557
- Partials 297 8290 +7993
🚀 New features to boost your workflow:
|
|
@joyeecheung take a look, should be easier to review. |
Restore the "DO NOT depend on the patchability" warnings in esm/load.js and esm/resolve.js that were dropped along with the fs imports. The warning still applies; it now also points at node:vfs as one of the formal hook mechanisms callers should reach for instead. Addresses review feedback from @jsumners-nr in nodejs#63653
6321e08 to
51b033a
Compare
joyeecheung
left a comment
There was a problem hiding this comment.
A design question recently occurred to me: have we explored the versioning of the mounting?
| const { clearPackageJSONCache } = require('internal/modules/package_json_reader'); | ||
| clearPackageJSONCache(); | ||
| // The ESM cascaded loader's loadCache is intentionally NOT cleared here: | ||
| // clearing it mid-flight (while another import() is awaiting nested |
There was a problem hiding this comment.
Isn't it also the case for require() anyway? (You can unmount it at the top level of a CJS module while the graph is loading and things can be broken there, it's always up to the user to ensure they don't unmount from an unsafe place).
An alternative approach would be to add a cache-busting search params to avoid the stale caches, which solves the identity problem. This is also what the test mock module does and how some user-land HMR works, see #61767 (comment)
There was a problem hiding this comment.
I wanted to avoid search params because in my experience they are unreliable (in a graph) and I wanted to minimize memory leaks. Not sure if it's totally possible.
There was a problem hiding this comment.
Doesn't this now leak all the time whenever new ESM are loaded through the VFS? Unless the identifiers are reused by subsequent mounting (i.e. graph 1, 2, 3 overlaps), the modules stay stale in the cache. With a search param in the identifier I think it will be possible walk through the caches and purge everything identified by an umounted VFS version in the search params during unmounting. Not that it eliminates the leaks (#63186 will continue to be an issue), but at least it reduces the leaks when the graphs don't overlap?
what do you mean? You mean multiple vfs layers on top of each other? |
For the stacks to have some kind of version number/ID to identify the current status? BTW I just noticed that there's no mention of |
I did purge them when doing the splitting; I forgot to bring them back. I'll add them to this PR. |
No but we totally should. |
Each VirtualFileSystem now exposes a per-process monotonically increasing `layerId`, assigned at construction. The id is stable across mount/unmount cycles for the lifetime of the instance and surfaces in: - debug() output for register / deregister so the layer stack is visible when NODE_DEBUG=vfs is enabled; - the overlap ERR_INVALID_STATE message, which now names the layer ids of the conflicting mounts. The id is the building block for tagging cache entries with the owning VFS, which a follow-up will use to replace the global loader-cache flush in deregisterVFS with a scoped purge. Refs: nodejs#63653 Signed-off-by: Matteo Collina <hello@matteocollina.com>
51b033a to
294a19c
Compare
Replace the global loader-cache flush in deregisterVFS with a scope-purge that only drops entries owned by the unmounting VFS. Per-layer ownership is determined two ways: - For CJS-style filename-keyed caches (Module._cache, Module._pathCache, the CJS stat cache, the helpers.js realpath cache, and the package.json caches) entries are filtered with `vfs.shouldHandle(filename)`. __filename stays a clean absolute path so user code that does `path.dirname(__filename)` or similar is unaffected. - For the ESM cascaded loader's loadCache, entries are tagged at resolve time: when finalizeResolution() detects the resolved path is VFS-owned (via the new loaderGetLayerForPath hook), it appends `?vfs-layer=<id>` to the URL. The tag surfaces in `import.meta.url`, matching the cache-busting pattern used by HMR tooling. On deregister, entries whose URL carries the tag for the unmounting layer are deleted. Multi-mount setups no longer pay the cross-VFS cache-warmup penalty when a single VFS unmounts, and ESM modules loaded from a VFS become reachable for purge instead of leaking forever in the cascaded loader. New helpers exposed for VFS: - cjs/loader.js: clearStatCacheForVFS - helpers.js: purgeRealpathCacheForVFS, loaderGetLayerForPath - package_json_reader.js: purgePackageJSONCacheForVFS Adds test-vfs-scoped-cache-purge covering both the multi-mount isolation and the import.meta.url tag visibility. Refs: nodejs#63653 Signed-off-by: Matteo Collina <hello@matteocollina.com>
|
@joyeecheung PTAL |
75bb5c7 to
99a5a5c
Compare
Each VirtualFileSystem now exposes a per-process monotonically increasing `layerId`, assigned at construction. The id is stable across mount/unmount cycles for the lifetime of the instance and surfaces in: - debug() output for register / deregister so the layer stack is visible when NODE_DEBUG=vfs is enabled; - the overlap ERR_INVALID_STATE message, which now names the layer ids of the conflicting mounts. The id is the building block for tagging cache entries with the owning VFS, which a follow-up will use to replace the global loader-cache flush in deregisterVFS with a scoped purge. Refs: nodejs#63653 Signed-off-by: Matteo Collina <hello@matteocollina.com>
Replace the global loader-cache flush in deregisterVFS with a scope-purge that only drops entries owned by the unmounting VFS. Per-layer ownership is determined two ways: - For CJS-style filename-keyed caches (Module._cache, Module._pathCache, the CJS stat cache, the helpers.js realpath cache, and the package.json caches) entries are filtered with `vfs.shouldHandle(filename)`. __filename stays a clean absolute path so user code that does `path.dirname(__filename)` or similar is unaffected. - For the ESM cascaded loader's loadCache, entries are tagged at resolve time: when finalizeResolution() detects the resolved path is VFS-owned (via the new loaderGetLayerForPath hook), it appends `?vfs-layer=<id>` to the URL. The tag surfaces in `import.meta.url`, matching the cache-busting pattern used by HMR tooling. On deregister, entries whose URL carries the tag for the unmounting layer are deleted. Multi-mount setups no longer pay the cross-VFS cache-warmup penalty when a single VFS unmounts, and ESM modules loaded from a VFS become reachable for purge instead of leaking forever in the cascaded loader. New helpers exposed for VFS: - cjs/loader.js: clearStatCacheForVFS - helpers.js: purgeRealpathCacheForVFS, loaderGetLayerForPath - package_json_reader.js: purgePackageJSONCacheForVFS Adds test-vfs-scoped-cache-purge covering both the multi-mount isolation and the import.meta.url tag visibility. Refs: nodejs#63653 Signed-off-by: Matteo Collina <hello@matteocollina.com>
| let _loaderRealpath = null; | ||
| let _loaderLegacyMainResolve = null; | ||
| let _loaderGetFormatOfExtensionlessFile = null; | ||
| let _loaderGetLayerForPath = null; |
5d002a9 to
6fd7334
Compare
| const moduleCache = cjsLoader.Module._cache; | ||
| for (const filename of ObjectKeys(moduleCache)) { | ||
| if (isOwnedByVFS(vfs, filename)) { | ||
| delete moduleCache[filename]; |
There was a problem hiding this comment.
Can this be made to use a Map instead. Using the delete is not ideal.
| const cjsLoader = require('internal/modules/cjs/loader'); | ||
| const moduleCache = cjsLoader.Module._cache; | ||
| for (const filename of ObjectKeys(moduleCache)) { | ||
| if (isOwnedByVFS(vfs, filename)) { |
There was a problem hiding this comment.
Nit: isOwnedByVFS includes a try/catch that swallows possible errors. If that is throwing for whatever reason independent of the specific filename, then it's possible that every filename iteration here could trigger the catch case in the worst case, leading to a new wasted error and stack capture on every iteration.
There was a problem hiding this comment.
The code is doing a suspicious amount of path.resolve() everywhere (which would resolve the paths to process.cwd(), not against VFS, and the result would be observable in e.g. error messages), and the same filename can go through that multiple times repeatedly in various places, which seems rather wasteful..
4dc22f7 to
4bb502d
Compare
4bb502d to
10412dc
Compare
|
@joyeecheung PTAL |
Co-authored-by: James M Snell <jasnell@gmail.com> Signed-off-by: Matteo Collina <hello@matteocollina.com>
10412dc to
572c4bc
Compare
Makes `require()` and `import` resolve files served by `node:vfs`. Before this PR, mounted VFS files were only visible through `fs.*`; the loaders went straight to the real filesystem.
Mechanism: toggleable wrappers in the loaders. Null fast-path when no VFS is mounted; otherwise the VFS answers stat / read / realpath / package.json.
User-visible side effects:
Review guide
Suggested reading order:
Tests (all gated by `--experimental-vfs`): `test-vfs-require`, `test-vfs-import`, `test-vfs-module-hooks`, `test-vfs-package-json`, `test-vfs-package-json-cache`, `test-vfs-invalid-package-json`, `test-vfs-module-hooks-cleanup`, `test-vfs-layer-id`, `test-vfs-scoped-cache-purge`.
Out of scope
SEA + VFS, migrating the C++ `package_configs_` cache, broader permission-model integration.