Add remote-read support to read_nwb#2190

Open
h-mayorquin wants to merge 2 commits into dev from add_remote_reading_support_to_read_nwb

@h-mayorquin h-mayorquin commented May 14, 2026

Contributor

Closes #2150.

The main contribution of this PR is enabling pynwb.read_nwb(path) to read NWB files in remote public locations, such as public DANDI Archive assets served over HTTPS (e.g. https://dandiarchive.s3.amazonaws.com/...) or any publicly readable S3, GCS, or HTTPS object store.

To move this forward, I decided not to use the can_read methods for dispatch, as I did in #1994 for local file support. Instead, dispatch here is simple URL pattern matching: routing based on conventions like the .zarr suffix and DANDI's /zarr/<uuid>/ URL layout, without opening the file. This avoids the performance cost of reading the file twice and does not overpromise (can_read implies a strong contract that is hard to deliver given the complexity of remote files).
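As a rough illustration of this kind of URL-shape routing, here is a minimal sketch. The helper name `guess_backend`, the UUID regex, and the example URLs are hypothetical and are not taken from the PR's diff; the actual matching rules in pynwb may differ.

```python
import re
from urllib.parse import urlsplit

# Assumption: a DANDI Zarr asset has a "/zarr/<uuid>" path segment, where
# <uuid> is a 36-character hyphenated hex string. Illustrative only.
_DANDI_ZARR_SEGMENT = re.compile(r"/zarr/[0-9a-fA-F-]{36}(/|$)")


def guess_backend(url: str) -> str:
    """Return "zarr" or "hdf5" from the URL shape alone, without opening the file."""
    # Ignore query strings and fragments; drop a trailing slash so that
    # "store.zarr/" and "store.zarr" are treated the same way.
    path = urlsplit(url).path.rstrip("/")
    if path.endswith(".zarr"):
        return "zarr"
    if _DANDI_ZARR_SEGMENT.search(path):
        return "zarr"
    # Everything else is assumed to be HDF5.
    return "hdf5"
```

Because nothing is fetched, the decision costs no network round trip, but it is only as reliable as the naming conventions it relies on.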

Private files can still be accessed in many cases. For example, a private S3 bucket read with s3:// when AWS credentials are already configured in the environment (AWS_PROFILE or ~/.aws/credentials), a gs:// object when GOOGLE_APPLICATION_CREDENTIALS is set, or an abfs:// path via Azure managed identity; fsspec picks up each backend's default credential chain automatically.

That said, I did not want to complicate the signature by adding login configuration parameters, as that would go against the original spirit of this function: the read_nwb design from #1974 was as simple as possible, with no configuration. Power-user scenarios (e.g. forcing the h5py ROS3 driver or using custom S3-compatible endpoints) still require dropping to NWBHDF5IO or NWBZarrIO directly, and I think that is where private access should be handled.

The remote-Zarr test depends on the resolve_ref self-reference fix for fsspec stores from hdmf-dev/hdmf-zarr#348, which I opened upstream. That fix is now released in hdmf-zarr 0.13.0, so no special pin is needed and the test runs against a normal hdmf-zarr install.

I am also fixing a pre-existing scheme bug in the streaming branch of NWBHDF5IO.read_nwb: the fsspec filesystem was hard-coded to "http" regardless of the URL scheme, so s3://, gs://, and abfs:// paths silently failed.
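The idea behind the fix can be sketched as deriving the protocol from the URL itself rather than assuming "http". The helper name `protocol_for` is hypothetical and this is not the PR's actual code; it only shows the scheme extraction, using the standard library.

```python
from urllib.parse import urlsplit


def protocol_for(path: str) -> str:
    """Return the fsspec-style protocol implied by a path or URL."""
    scheme = urlsplit(path).scheme.lower()
    # A one-letter "scheme" is most likely a Windows drive letter (C:\...),
    # and an empty one is a plain local path; treat both as local files.
    return scheme if len(scheme) > 1 else "file"


# The result would then be passed to fsspec, e.g. fsspec.filesystem(protocol),
# in place of the previously hard-coded fsspec.filesystem("http").
```

With this approach an s3:// URL selects the S3 backend and a gs:// URL the GCS backend, provided the corresponding fsspec implementation is installed.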

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running ruff check . && codespell from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.00%. Comparing base (e3127b2) to head (450b793).

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #2190      +/-   ##
==========================================
+ Coverage   95.99%   96.00%   +0.01%     
==========================================
  Files          30       30              
  Lines        2970     2982      +12     
  Branches      431      433       +2     
==========================================
+ Hits         2851     2863      +12     
  Misses         67       67              
  Partials       52       52              
| Flag | Coverage Δ | |
|---|---|---|
| integration | 74.44% <100.00%> (+0.17%) | ⬆️ |
| unit | 86.38% <7.14%> (-0.32%) | ⬇️ |

@h-mayorquin h-mayorquin changed the title Add remote-read support to pynwb.read_nwb Add remote-read support to read_nwb May 14, 2026
@h-mayorquin h-mayorquin marked this pull request as ready for review May 14, 2026 03:41
read_nwb now accepts remote URLs (s3://, gs://, abfs://, https://, etc.) and
dispatches by URL shape: .zarr suffixes and DANDI Zarr assets under /zarr/ go
to NWBZarrIO, everything else to NWBHDF5IO. Remote files are opened through
fsspec using the URL's actual scheme, replacing the hardcoded
fsspec.filesystem("http") that mishandled non-HTTP schemes.

Adds integration tests covering local HDF5/Zarr reads and anonymous public
remote reads over HTTPS for both backends.
@h-mayorquin h-mayorquin force-pushed the add_remote_reading_support_to_read_nwb branch from 0ccff58 to 6536cda Compare July 1, 2026 19:45
@h-mayorquin h-mayorquin requested a review from rly July 1, 2026 19:54
Comment thread tests/integration/io/test_read.py
oruebel previously approved these changes Jul 1, 2026
Comment thread CHANGELOG.md Outdated
oruebel commented Jul 1, 2026

Contributor

Other than moving the changelog entry, this looks good to me. I'll let @rly handle merge.

Development

Successfully merging this pull request may close these issues.

[Feature]: Support for remote paths in pynwb.read_nwb

2 participants