An fMRI study lives as thousands of files: NIfTI images, events.tsv files, confound regressors, JSON sidecars. The BIDS H5 archive captures an entire study in one compressed HDF5 file that is queryable by subject, task, session, and run. You download one file and start analyzing immediately.
The archive supports two modes. Pick the one that matches your analysis:
| Mode | What it stores | Reconstruction | Best for |
|---|---|---|---|
| parcellated | Cluster averages [T, K] | Parcel-level only | ROI analyses, connectivity |
| latent | Basis [T, K] + loadings [V, K] | Full voxel resolution | Searchlight, fine-grained spatial |
Both modes store events, confounds, censor vectors, and BIDS metadata
alongside the data. The standard fmridataset API works on
either.
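To make the two storage layouts concrete, here is a minimal base-R sketch on synthetic data. It does not call the package: the matrix sizes, the cluster labels, and the use of a truncated SVD as the low-rank encoder are all assumptions chosen for illustration.

```r
set.seed(1)
n_time <- 50    # T: timepoints
n_vox <- 200    # V: voxels inside the mask
k <- 5          # K: parcels or latent components

# Synthetic data stored as [T, V]
X <- matrix(rnorm(n_time * n_vox), nrow = n_time, ncol = n_vox)

# Parcellated layout: average the voxels of each cluster -> [T, K]
labels <- rep_len(seq_len(k), n_vox)  # hypothetical cluster label per voxel
parcel_means <- sapply(seq_len(k), function(j) {
  rowMeans(X[, labels == j, drop = FALSE])
})

# Latent layout: low-rank factors, basis [T, K] and loadings [V, K]
# (truncated SVD is a stand-in encoder, not the package's method)
s <- svd(X, nu = k, nv = k)
basis <- s$u %*% diag(s$d[seq_len(k)])   # [T, K]
loadings <- s$v                          # [V, K]
X_hat <- basis %*% t(loadings)           # approximation at every voxel, [T, V]

dim(parcel_means)
dim(basis)
dim(loadings)
dim(X_hat)
```

The parcel averages cannot be mapped back to individual voxels, whereas the basis and loadings pair yields an approximation at every voxel. That difference is what the Reconstruction column in the table describes.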
compress_bids_study() reads a BIDS directory and streams
scans one at a time into the archive. Only one NIfTI is held in memory
at once.
library(bidser)
# Index the BIDS directory
bids <- bids_project("/path/to/my_study")
# Atlas volume that defines the parcels
atlas <- neuroim2::read_vol("schaefer_400.nii.gz")
# Stream every scan into a single parcellated archive
study <- compress_bids_study(
  bids,
  file = "my_study.h5",
  mode = "parcellated",
  clusters = atlas
)
The result is a bids_h5_study_dataset you can use immediately.
Latent mode uses fmrilatent::encode() to compress each
scan into a low-rank basis + loadings representation that can be
reconstructed back to voxel resolution.
library(fmrilatent)
# brain_mask is assumed to be a brain mask volume loaded beforehand
study <- compress_bids_study(
  bids,
  file = "my_study_latent.h5",
  mode = "latent",
  encoding = spec_time_dct(k = 30),
  mask = brain_mask
)
For multi-subject studies that share a spatial atlas, a shared template stores the loadings once and keeps only per-scan coefficients, which significantly reduces file size.
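As a back-of-the-envelope check on why sharing the loadings helps, the sketch below counts stored values under both layouts. It is plain arithmetic, not a package call, and every size in it is hypothetical. It also ignores compression, so real file sizes will differ.

```r
# All sizes below are hypothetical, chosen only to show the scaling
n_scans <- 80      # scans in the study
n_time <- 300      # T: timepoints per scan
n_vox <- 100000    # V: voxels in the mask
k <- 30            # K: components

# Per-scan storage: every scan keeps its own [T, K] and [V, K] matrices
per_scan <- n_scans * (n_time * k + n_vox * k)

# Shared template: one [V, K] loadings matrix plus [T, K] coefficients per scan
shared <- n_vox * k + n_scans * n_time * k

per_scan
shared
per_scan / shared
```

With these numbers the shared layout stores roughly 65 times fewer values, because the [V, K] loadings dominate the cost whenever V is much larger than T.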
One function, one file:
#> <bids_h5_study_dataset>
#> File: my_study.h5
#> Mode: parcellated (400 parcels)
#> Subjects: 20 | Tasks: nback, rest | Sessions: pre, post
#> Scans: 80 | Total timepoints: 24000 | TR: 2s
BIDS metadata is directly accessible:
scan_manifest() returns a tibble with one row per
scan:
#> scan_name subject task session run n_time
#> sub-01_ses-pre_task-nback_run-01 01 nback pre 01 300
#> sub-01_ses-pre_task-rest_run-01 01 rest pre 01 200
#> ...
Use subset_bids_h5() to carve out the data you need.
This returns a new bids_h5_study_dataset backed by the same
file:
nback <- subset_bids_h5(study, task = "nback")
sub01 <- subset_bids_h5(study, subject = "01")
pre_nback <- subset_bids_h5(study, task = "nback", session = "pre")
Subsetting is cheap: it selects scan backends from the shared H5 connection, and no data is copied.
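One way to picture why this is cheap: a subset only needs to record which scans it covers. The base-R sketch below filters a small manifest and returns row indices. The manifest contents and the select_scans() helper are hypothetical and are not part of the package, whose internals may differ.

```r
# Hypothetical scan manifest; column names mirror the scan_manifest() output
manifest <- data.frame(
  scan_name = c("sub-01_ses-pre_task-nback_run-01",
                "sub-01_ses-pre_task-rest_run-01",
                "sub-02_ses-pre_task-nback_run-01",
                "sub-02_ses-post_task-nback_run-01"),
  subject = c("01", "01", "02", "02"),
  task = c("nback", "rest", "nback", "nback"),
  session = c("pre", "pre", "pre", "post"),
  stringsAsFactors = FALSE
)

# select_scans() is an illustrative helper, not a package function.
# It returns the row indices of the scans that match every given filter.
select_scans <- function(manifest, subject = NULL, task = NULL, session = NULL) {
  keep <- rep(TRUE, nrow(manifest))
  if (!is.null(subject)) keep <- keep & manifest$subject %in% subject
  if (!is.null(task)) keep <- keep & manifest$task %in% task
  if (!is.null(session)) keep <- keep & manifest$session %in% session
  which(keep)
}

idx_nback <- select_scans(manifest, task = "nback")
idx_pre_nback <- select_scans(manifest, task = "nback", session = "pre")
idx_nback       # 1 3 4
idx_pre_nback   # 1 3
```

Only a few integers are produced per subset, so the underlying arrays stay where they are until something actually reads them.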
The standard fmridataset API works on the result:
#> [1] 6000 400
That is [total_timepoints, K] where K is the number of
parcels (or components in latent mode). Per-subject data:
#> [1] 600 400
The event table combines all scans with task,
session, subject_id, and both the BIDS
run label and an internal run_id:
#> onset duration trial_type run run_id subject_id task session
#> 0.0 2.0 face 01 1 01 nback pre
#> 4.0 2.0 house 01 1 01 nback pre
#> ...
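The shape of this combined table can be sketched in base R by stacking per-scan event tables and tagging each with its scan metadata. The input data and the rule for assigning run_id (a simple counter over scans) are assumptions for illustration; the package may number runs differently.

```r
# Hypothetical per-scan event tables keyed by scan name
events_by_scan <- list(
  "sub-01_ses-pre_task-nback_run-01" = data.frame(
    onset = c(0, 4), duration = c(2, 2), trial_type = c("face", "house"),
    stringsAsFactors = FALSE
  ),
  "sub-02_ses-pre_task-nback_run-01" = data.frame(
    onset = c(0, 6), duration = c(2, 2), trial_type = c("house", "face"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical scan metadata, one row per scan in the same order
meta <- data.frame(
  subject_id = c("01", "02"), task = "nback", session = "pre", run = "01",
  stringsAsFactors = FALSE
)

combined <- do.call(rbind, lapply(seq_along(events_by_scan), function(i) {
  ev <- events_by_scan[[i]]
  ev$run <- meta$run[i]                # BIDS run label, repeats across subjects
  ev$run_id <- i                       # assumed: counter that is unique per scan
  ev$subject_id <- meta$subject_id[i]
  ev$task <- meta$task[i]
  ev$session <- meta$session[i]
  ev
}))
rownames(combined) <- NULL
combined
```

Both scans carry the BIDS run label "01", so the label alone cannot tell them apart. A study-wide run_id removes that ambiguity when events are grouped by run.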
Memory-efficient processing via data_chunks():
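As a generic illustration of what chunked processing buys you, the base-R sketch below computes column means by visiting one block of rows at a time. It does not use data_chunks(); the in-memory matrix and the block size are stand-ins, and the real iterator interface may look different.

```r
set.seed(7)
# Stand-in for a [timepoints, K] data matrix
X <- matrix(rnorm(1000 * 8), nrow = 1000, ncol = 8)
chunk_size <- 128  # hypothetical block size

starts <- seq(1, nrow(X), by = chunk_size)
col_sums <- numeric(ncol(X))
for (s in starts) {
  # Row range of the current block; the last block may be shorter
  rows <- s:min(s + chunk_size - 1, nrow(X))
  col_sums <- col_sums + colSums(X[rows, , drop = FALSE])
}
chunk_means <- col_sums / nrow(X)

all.equal(chunk_means, colMeans(X))
```

Any statistic that can be accumulated block by block (sums, cross-products, counts) fits this pattern, so peak memory is set by the block size rather than by the study size.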
When working with a latent-mode archive, three additional accessors are available:
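# Summary of the encoding stored in the archive
info <- encoding_info(study)
info$encoding_family
info$n_components
info$has_shared_template
# Loadings for a single scan
loadings <- get_loadings(study, scan_name = "sub-01_task-nback_run-01")
dim(loadings)
# Reconstruct only the requested timepoints and voxels
# (roi_indices is assumed to be a vector of voxel indices defined beforehand)
recon <- reconstruct_voxels(study,
  scan_name = "sub-01_task-nback_run-01",
  rows = 1:10,
  voxels = roi_indices
)
dim(recon)
reconstruct_voxels() computes basis %*% t(loadings) + offset on the fly, so you only materialize the slice you need.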
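The saving comes from slicing the factors before multiplying. The base-R sketch below checks this on synthetic matrices: the sizes, the random factors, and the treatment of offset as one value per voxel are assumptions, and nothing here calls the package.

```r
set.seed(42)
n_time <- 20
n_vox <- 100
k <- 4
basis <- matrix(rnorm(n_time * k), n_time, k)     # [T, K]
loadings <- matrix(rnorm(n_vox * k), n_vox, k)    # [V, K]
offset <- rnorm(n_vox)                            # assumed: one offset per voxel

rows <- 1:10
voxels <- c(3, 17, 58)

# Slice first, then multiply: only a [10, 3] matrix is created
recon_slice <- sweep(
  basis[rows, , drop = FALSE] %*% t(loadings[voxels, , drop = FALSE]),
  2, offset[voxels], "+"
)

# Reference: full [T, V] reconstruction, then slice
recon_full <- sweep(basis %*% t(loadings), 2, offset, "+")

dim(recon_slice)
all.equal(recon_slice, recon_full[rows, voxels])
```

Note that sweep() is used to add the offset column by column. A bare `+ offset` on a matrix would recycle the vector down the rows instead, which is not what a per-voxel offset needs.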
For parcellated archives, parcellation_info() gives you
the cluster mapping:
This returns NULL for latent-mode archives.
- vignette("fmridataset-intro") for the core dataset API
- vignette("study-level-analysis") for multi-subject workflows without BIDS
- vignette("backend-development-basics") if you want to write a custom backend
- ?compress_bids_study, ?bids_h5_dataset, ?subset_bids_h5 for full parameter docs