---
title: 'BIDS H5 Archive: Compressing a Study into a Single File'
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{BIDS H5 Archive: Compressing a Study into a Single File}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  family: red
  preset: homage
css: albers.css
resource_files:
  - albers.css
  - albers.js
includes:
  in_header: |-
---

```{r setup, include = FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("albersdown", quietly = TRUE)) {
  ggplot2::theme_set(
    albersdown::theme_albers(family = params$family, preset = params$preset)
  )
}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  eval = FALSE
)
```

An fMRI study lives as thousands of files: NIfTI images, events.tsv files, confound regressors, JSON sidecars. The BIDS H5 archive captures an entire study in **one compressed HDF5 file** that is queryable by subject, task, session, and run. You download one file and start analyzing immediately.

```{r load-package, eval = TRUE}
library(fmridataset)
```

# Which compression mode should you use?

The archive supports two modes. Pick the one that matches your analysis:

| Mode | What it stores | Reconstruction | Best for |
|:-----|:---------------|:---------------|:---------|
| `parcellated` | Cluster averages `[T, K]` | Parcel-level only | ROI analyses, connectivity |
| `latent` | Basis `[T, K]` + loadings `[V, K]` | Full voxel resolution | Searchlight, fine-grained spatial analyses |

Both modes store events, confounds, censor vectors, and BIDS metadata alongside the data. The standard `fmridataset` API works on either.

# How do you create an archive?

`compress_bids_study()` reads a BIDS directory and streams scans one at a time into the archive. Only one NIfTI is held in memory at once.

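
To make the two storage layouts from the mode table concrete, here is a minimal base-R sketch on a toy matrix. It does not call `compress_bids_study()` or `fmrilatent`; the names `X`, `labels`, `parcel_means`, `basis`, and `loadings` are hypothetical, and the truncated SVD is only a stand-in for a low-rank encoder, not the encoding the package applies.

```{r toy-storage-layouts, eval = TRUE}
set.seed(1)
n_time <- 6   # T: timepoints
n_vox  <- 8   # V: voxels inside the mask
k      <- 2   # K: parcels (parcellated) or components (latent)

# Toy scan as a [T, V] matrix and a hypothetical parcel label per voxel
X <- matrix(rnorm(n_time * n_vox), nrow = n_time, ncol = n_vox)
labels <- rep(1:2, each = 4)

# Parcellated layout: average the voxels in each parcel -> [T, K]
parcel_means <- sapply(sort(unique(labels)), function(p) {
  rowMeans(X[, labels == p, drop = FALSE])
})

# Latent layout: basis [T, K] plus loadings [V, K]
# (truncated SVD used purely for illustration)
sv <- svd(X, nu = k, nv = k)
basis    <- sv$u %*% diag(sv$d[1:k], k)
loadings <- sv$v

# Low-rank approximation back at voxel resolution -> [T, V]
recon <- basis %*% t(loadings)

dim(parcel_means)
dim(basis)
dim(loadings)
dim(recon)
```

The parcellated layout cannot be mapped back to individual voxels, whereas the basis and loadings together yield an approximation of every voxel's time series.
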
## Parcellated mode

```{r write-parcellated}
library(bidser)

bids  <- bids_project("/path/to/my_study")
atlas <- neuroim2::read_vol("schaefer_400.nii.gz")

study <- compress_bids_study(
  bids,
  file     = "my_study.h5",
  mode     = "parcellated",
  clusters = atlas
)
```

The result is a `bids_h5_study_dataset` you can use immediately.

## Latent mode

Latent mode uses `fmrilatent::encode()` to compress each scan into a low-rank representation (a basis plus loadings) that can be reconstructed back to voxel resolution.

```{r write-latent}
library(fmrilatent)

# `brain_mask` is assumed to be a brain mask volume created earlier in your session
study <- compress_bids_study(
  bids,
  file     = "my_study_latent.h5",
  mode     = "latent",
  encoding = spec_time_dct(k = 30),
  mask     = brain_mask
)
```

For multi-subject studies that share a spatial atlas, a **shared template** stores the loadings once and keeps only per-scan coefficients. Because the loadings are not repeated for every scan, the file is smaller:

```{r write-template}
# `parcellation` is assumed to be a parcellation object created earlier in your session
tpl <- parcel_basis_template(parcellation, basis_spec = basis_slepian(k = 10))

study <- compress_bids_study(
  bids,
  file     = "my_study_template.h5",
  mode     = "latent",
  template = tpl,
  mask     = brain_mask
)
```

# How do you open an archive?

One function, one file:

```{r read-archive}
study <- bids_h5_dataset("my_study.h5")
study
```

```
#>
#>   File: my_study.h5
#>   Mode: parcellated (400 parcels)
#>   Subjects: 20 | Tasks: nback, rest | Sessions: pre, post
#>   Scans: 80 | Total timepoints: 24000 | TR: 2s
```

# Exploring the study

BIDS metadata is directly accessible:

```{r explore-metadata}
participants(study)
tasks(study)
sessions(study)
scan_manifest(study)
```

`scan_manifest()` returns a tibble with one row per scan:

```
#>   scan_name                          subject task  session run n_time
#>   sub-01_ses-pre_task-nback_run-01   01      nback pre     01  300
#>   sub-01_ses-pre_task-rest_run-01    01      rest  pre     01  200
#>   ...
```

# Subsetting by task, subject, or session

Use `subset_bids_h5()` to carve out the data you need.
This returns a new `bids_h5_study_dataset` backed by the same file:

```{r subset-study}
nback     <- subset_bids_h5(study, task = "nback")
sub01     <- subset_bids_h5(study, subject = "01")
pre_nback <- subset_bids_h5(study, task = "nback", session = "pre")
```

Subsetting is cheap: it selects scan backends from the shared H5 connection, and no data is copied.

# Accessing data

The standard `fmridataset` API works on the result:

```{r access-data}
mat <- get_data_matrix(nback)
dim(mat)
```

```
#> [1] 6000  400
```

That is `[total_timepoints, K]`, where K is the number of parcels (or components in latent mode).

Per-subject data:

```{r per-subject}
mat01 <- get_data_matrix(nback, subject_id = "01")
dim(mat01)
```

```
#> [1] 600 400
```

## Events

The event table combines all scans and includes `task`, `session`, and `subject_id` columns, along with both the BIDS `run` label and an internal `run_id`:

```{r events}
head(nback$event_table)
```

```
#>   onset duration trial_type run run_id subject_id task  session
#>   0.0   2.0      face       01  1      01         nback pre
#>   4.0   2.0      house      01  1      01         nback pre
#>   ...
```

## Confounds

```{r confounds}
conf <- get_confounds(study, scan_name = "sub-01_ses-pre_task-nback_run-01")
head(conf)
```

## Chunked iteration

Process the data in memory-efficient pieces with `data_chunks()`:

```{r chunks}
chunks <- data_chunks(nback, nchunks = 10)
while (!is.null(chunk <- iterators::nextElem(chunks))) {
  # Process chunk$data [T, K_chunk]
}
```

## Group operations

Convert to `fmri_group` for per-subject analyses:

```{r group}
group <- study_to_group(nback)
```

# Latent-mode extras

When working with a latent-mode archive, three additional accessors are available:

```{r latent-extras}
# Encoding metadata
info <- encoding_info(study)
info$encoding_family
info$n_components
info$has_shared_template

# Spatial loadings for one scan
loadings <- get_loadings(study, scan_name = "sub-01_task-nback_run-01")
dim(loadings)

# Reconstruct a slice at voxel resolution
# (`roi_indices` is assumed to be a vector of voxel indices defined elsewhere)
recon <- reconstruct_voxels(study,
  scan_name = "sub-01_task-nback_run-01",
  rows      = 1:10,
  voxels    = roi_indices
)
dim(recon)
```

`reconstruct_voxels()` computes `basis %*% t(loadings) + offset` on the fly, so you only materialize the slice you need.

# Parcellation metadata

For parcellated archives, `parcellation_info()` gives you the cluster mapping:

```{r parcellation}
pinfo <- parcellation_info(study)
pinfo$n_parcels
pinfo$cluster_ids
```

This returns `NULL` for latent-mode archives.

# Next steps

- `vignette("fmridataset-intro")` for the core dataset API
- `vignette("study-level-analysis")` for multi-subject workflows without BIDS
- `vignette("backend-development-basics")` if you want to write a custom backend
- `?compress_bids_study`, `?bids_h5_dataset`, `?subset_bids_h5` for full parameter docs