LC-NE BARseq MATLAB-to-RDS Conversion

Data preparation capsule that converts BARseq gene-expression data from MATLAB format to R SingleCellExperiment (SCE) objects for downstream analysis, as part of:

Su, Kosillo, Jung, Chen et al. (2026). Topographic structure and function of locus coeruleus norepinephrine neurons. bioRxiv 2026.04.10.717727

This capsule does not produce manuscript figures. Its outputs are saved as two derived data assets, one per subject. The downstream analysis capsule LC-NE_BARseq_MAPseq_analyses (Code Ocean) consumes these assets to generate Figure S5 of the manuscript.

GitHub: https://github.com/AllenNeuralDynamics/LC-NE_BARseq_MAT-RDS_conversion
Code Ocean: https://codeocean.allenneuraldynamics.org/capsule/3953531/tree
Full collection: https://codeocean.allenneuraldynamics.org/collections/9cf044ce-93c7-4c7e-bfa1-5d8c37aa42ec

Code

| File | Description |
| --- | --- |
| 00_env_library_loading.R | Loads the r4-base conda environment and core libraries (hdf5r, Matrix, SingleCellExperiment). Provided as a reference for interactive use; not called directly by the run script. |
| 00_conversion_lib.R | Shared library holding the conversion logic. Defines convert_v7_filtneurons() (reads a v7.3 BARseq .mat into a SingleCellExperiment) and convert_subject() (end-to-end per-subject pipeline: read input, save initial SCE, clean, save cleaned SCE, Dbh-filter, save filtered SCE). Sourced by the per-subject scripts. |
| 01_BarSeq_RDSconvert_brain3_v2.R | Per-subject driver for specimen 780345 (brain 3). Sources 00_conversion_lib.R and calls convert_subject(). |
| 01_BarSeq_RDSconvert_brain4_v2.R | Per-subject driver for specimen 780346 (brain 4). Sources 00_conversion_lib.R and calls convert_subject(). |
| 02_update_metadata.py | Generates AIND-compliant data_description.json and processing.json for each output folder, and copies peer metadata (acquisition.json, procedures.json, subject.json) from the input asset. Uses aind-data-schema Pydantic models for validation. |
| run | Bash entry point for Reproducible Run. Renders each conversion script to an HTML report via knitr::spin, then runs the metadata-generation script. |

convert_subject() performs the following steps for each subject:

  1. Opens the BARseq MATLAB file (.mat, HDF5 v7.3 format) using hdf5r::H5File and reads the filt_neurons group.
  2. Reconstructs the sparse gene-by-cell count matrix from stored CSC components using Matrix::sparseMatrix.
  3. Extracts per-cell metadata: slice, position, FOV coordinates, angle, depth, barcode status, batch number, CCF coordinates, and CCF annotation.
  4. Constructs a unique cell identifier (uid) from batch, slice, and cell ID.
  5. Assembles the count matrix and metadata into a SingleCellExperiment object and saves it as combined_neurons_clust_CCFv2.rds.
  6. Validates uid uniqueness, renames columns by uid, and removes placeholder genes (unused-*) and duplicate hybridization-cycle genes.
  7. Saves the cleaned SCE as combined_neurons_clust_CCFv2_uid.rds. This is the file consumed by the downstream analysis capsule.
  8. Filters to putative LC-NE neurons (Dbh expression > 2) and saves as DBHfilteredneurons_clust_CCFv2_uid.rds.
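
Step 2 above can be illustrated with a small sketch of how compressed sparse column (CSC) components expand into a matrix. The capsule itself does this in R with Matrix::sparseMatrix; the Python below only illustrates the layout and is not the repository's code. The names data, ir, and jc follow the usual MATLAB v7.3 convention for sparse arrays (values, 0-based row indices, column pointers) and are an assumption about this particular file, as are the toy values.

```python
def csc_to_triplets(data, ir, jc):
    """Expand CSC components into (row, col, value) triplets (0-based indices).

    data: non-zero values, stored column by column
    ir:   row index of each value in `data`
    jc:   column pointers; column c owns data[jc[c]:jc[c + 1]]
    """
    if jc[0] != 0 or jc[-1] != len(data) or len(ir) != len(data):
        raise ValueError("inconsistent CSC components")
    triplets = []
    for col in range(len(jc) - 1):
        for k in range(jc[col], jc[col + 1]):
            triplets.append((ir[k], col, data[k]))
    return triplets


def triplets_to_dense(triplets, n_rows, n_cols):
    """Materialise a small dense matrix, for inspection only."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in triplets:
        dense[row][col] = value
    return dense


# Toy example (assumed orientation: 3 genes as rows, 4 cells as columns).
data = [5, 1, 2, 7]
ir = [0, 2, 1, 0]
jc = [0, 2, 2, 3, 4]  # the second cell has no counts

triplets = csc_to_triplets(data, ir, jc)
dense = triplets_to_dense(triplets, n_rows=3, n_cols=4)
for row in dense:
    print(row)
```

In R, Matrix::sparseMatrix accepts row indices, column pointers, and values directly through its i, p, and x arguments, so the explicit loop shown here would normally not be needed there.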

Data assets

| Asset | is_public | Description |
| --- | --- | --- |
| barseq_780345_2025-02-24_12-00-00 | true | BARseq data for specimen 780345 (brain 3). Contains BARseq/combined_neurons_clust_CCFv2.mat. Bucket: aind-open-data. |
| barseq_780346_2025-06-13_12-00-00 | true | BARseq data for specimen 780346 (brain 4). Contains BARseq/combined_neurons_clust_CCFv2.mat. Bucket: aind-open-data. |

Outputs

Each conversion script writes to a per-subject output folder under /results/, named:

/results/<input_asset_name>_processed_MAT2RDS_<timestamp>/

For example, a run on May 6, 2026 might produce:

  • /results/barseq_780345_2025-02-24_12-00-00_processed_MAT2RDS_2026-05-06_17-30-00/
  • /results/barseq_780346_2025-06-13_12-00-00_processed_MAT2RDS_2026-05-06_17-30-00/
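
The naming pattern above can be reproduced with a short sketch. The function name output_folder and its parameters are hypothetical and not taken from the repository; the timestamp format is inferred from the example folder names.

```python
from datetime import datetime
from pathlib import PurePosixPath


def output_folder(input_asset_name, run_time, results_root="/results"):
    """Build /results/<input_asset_name>_processed_MAT2RDS_<timestamp>."""
    # Timestamp layout inferred from the examples: YYYY-MM-DD_HH-MM-SS
    stamp = run_time.strftime("%Y-%m-%d_%H-%M-%S")
    return PurePosixPath(results_root) / f"{input_asset_name}_processed_MAT2RDS_{stamp}"


example = output_folder(
    "barseq_780345_2025-02-24_12-00-00",
    datetime(2026, 5, 6, 17, 30, 0),
)
print(example)
```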

Each output folder contains three .rds files:

| File | Description |
| --- | --- |
| combined_neurons_clust_CCFv2.rds | Initial SingleCellExperiment object before duplicate-gene cleanup |
| combined_neurons_clust_CCFv2_uid.rds | Cleaned SCE with unique cell IDs and unused-* / duplicate hybridization-cycle genes removed |
| DBHfilteredneurons_clust_CCFv2_uid.rds | Same as above but filtered to putative LC-NE neurons (Dbh expression > 2) |
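
The differences between the three files can be sketched on plain Python lists. This is an illustration only, not the R code in 00_conversion_lib.R. The uid format (underscore-joined batch, slice, cell ID), the helper names, and the toy values are all hypothetical. Removal of duplicate hybridization-cycle genes is omitted because the source does not describe how those genes are named.

```python
from collections import Counter


def make_uid(batch, slice_id, cell_id):
    """Hypothetical uid format; the real separator and order may differ."""
    return f"{batch}_{slice_id}_{cell_id}"


def check_unique(uids):
    """Raise if any uid occurs more than once."""
    dupes = [uid for uid, n in Counter(uids).items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate uids: {dupes[:5]}")


def drop_placeholder_genes(genes):
    """Remove placeholder genes named unused-*."""
    return [g for g in genes if not g.startswith("unused-")]


def dbh_positive(uids, dbh_counts, threshold=2):
    """Keep cells whose Dbh count is strictly greater than the threshold."""
    return [uid for uid, count in zip(uids, dbh_counts) if count > threshold]


# Toy data: (batch, slice, cell ID) per cell, plus a Dbh count per cell.
cells = [(1, 3, 101), (1, 3, 102), (2, 3, 101)]
uids = [make_uid(*cell) for cell in cells]
check_unique(uids)

genes = ["Dbh", "unused-1", "Th", "unused-2"]
kept_genes = drop_placeholder_genes(genes)

dbh_counts = [0, 2, 5]
lc_ne_uids = dbh_positive(uids, dbh_counts)

print(kept_genes)
print(lc_ne_uids)
```

Note that a cell with a Dbh count of exactly 2 is excluded, since the documented criterion is a strict "greater than 2".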

After a reproducible run from the released capsule, the two output folders are saved as separate AIND-metadata-tagged data assets in aind-open-data, each with a processing.json that points back to this capsule. Those published assets are what the downstream analysis capsule mounts.

How to run in Code Ocean

Click Reproducible Run in Code Ocean. The run script processes both brains sequentially. Runtime is approximately 10 minutes on a large instance.

Before launching the run, attach the Code Ocean API Credentials Secret to the capsule (Capsule Settings → Credentials). The metadata-generation step queries the Code Ocean API at runtime to record the capsule's release version in each output folder's processing.json. Without the Secret, the conversion still runs end-to-end and produces the RDS files plus data_description.json, subject.json, acquisition.json, and procedures.json; only processing.json is skipped, with a warning. To produce the canonical published derived assets, attach the Secret so that provenance is recorded.
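
The fallback behavior described above can be sketched as follows. This is not the logic of 02_update_metadata.py; the function name, the environment-variable name EXAMPLE_CODEOCEAN_SECRET, and the warning text are hypothetical placeholders. The sketch only models which metadata files would be written with and without the Secret.

```python
import warnings

PEER_METADATA = ["subject.json", "acquisition.json", "procedures.json"]


def planned_metadata_files(env, secret_var="EXAMPLE_CODEOCEAN_SECRET"):
    """Return the metadata files to write, given a mapping of environment variables.

    `secret_var` is a placeholder name, not the variable Code Ocean actually sets.
    """
    files = ["data_description.json"] + PEER_METADATA
    if env.get(secret_var):
        # Credentials available: provenance can be recorded.
        files.append("processing.json")
    else:
        warnings.warn(
            "API credentials not found; skipping processing.json",
            stacklevel=2,
        )
    return files


with_secret = planned_metadata_files({"EXAMPLE_CODEOCEAN_SECRET": "token"})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    without_secret = planned_metadata_files({})

print(with_secret)
print(without_secret, len(caught))
```

In a real script the mapping would typically be os.environ; a plain dict is used here so both cases can be shown side by side.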

Environment

R 4.2.3 in a conda environment (r4-base) with hdf5r, Matrix, and SingleCellExperiment as core dependencies. The full environment is defined in environment/r4-base.yml.

License

This project is licensed under the MIT License. See LICENSE for details.
