fmridataset separates how data is stored from how analyses consume it. This document gives you a mental model of the three-layer stack so you can predict how components interact, choose the right class for a given task, and extend the package without modifying core code.
```
┌──────────────────────────────────────────────────────┐
│  User layer                                          │
│  fmri_group        fmri_study_dataset                │
│  (multi-study)     (multi-subject)                   │
├──────────────────────────────────────────────────────┤
│  Dataset layer                                       │
│  matrix_dataset    fmri_file_dataset  latent_dataset │
│  (in-memory)       (NIfTI / HDF5)     (embedding)    │
├──────────────────────────────────────────────────────┤
│  Backend layer                                       │
│  matrix_backend    nifti_backend      h5_backend     │
│  zarr_backend      study_backend                     │
└──────────────────────────────────────────────────────┘
  ↑ all backends implement the same five-method contract
```
Each layer only knows about the layer directly beneath it. Analysis
code that calls get_data_matrix() or
data_chunks() works identically whether the data lives in
RAM, on disk, or in a cloud-hosted Zarr store.
Every backend must implement five generic functions:

```r
backend_open(backend)       # open file handles / allocate resources
backend_close(backend)      # release resources
backend_get_dims(backend)   # return list(spatial = c(x, y, z), time = n)
backend_get_data(backend,   # return matrix [time x voxels]
                 rows = NULL,
                 cols = NULL)
validate_backend(backend)   # stop() on contract violations
```

The rows argument selects timepoints and cols selects
voxels; both default to “all”. Backends may add caching, memory-mapping,
or chunked I/O behind those five calls without any dataset-layer
changes. For a full walkthrough of writing and registering a backend see
the backend-development-basics vignette.
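
The rows/cols convention can be tried out on a plain matrix. The base-R sketch below does not call the package; `get_block()` is a hypothetical helper that only mimics the argument semantics described above, including the point that a single timepoint or a single voxel should still come back as a matrix.

```r
# Hypothetical helper that mimics the rows/cols arguments on a plain matrix
get_block <- function(mat, rows = NULL, cols = NULL) {
  if (is.null(rows)) rows <- seq_len(nrow(mat))  # NULL: all timepoints
  if (is.null(cols)) cols <- seq_len(ncol(mat))  # NULL: all voxels
  mat[rows, cols, drop = FALSE]                  # keep matrix shape for 1 row or 1 column
}

mat <- matrix(seq_len(5 * 3), nrow = 5)  # 5 timepoints x 3 voxels

all_data  <- get_block(mat)               # 5 x 3
first_two <- get_block(mat, rows = 1:2)   # 2 x 3
one_voxel <- get_block(mat, cols = 2)     # 5 x 1, still a matrix
```

Using `drop = FALSE` is the design choice worth copying: downstream code can rely on `nrow()` and `ncol()` regardless of how small the request was.
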

matrix_dataset wraps an in-memory [time x voxels] matrix. Use it for
simulated data, preprocessed ROI time series, or any situation where the
full dataset fits comfortably in RAM.

fmri_file_dataset points at NIfTI or HDF5 files without loading them. The underlying
nifti_backend or h5_backend loads voxel blocks
on demand, so construction is near-instantaneous even for large scan
collections.

latent_dataset stores data in a lower-dimensional embedding space (e.g., ICA components, PCA scores) rather than voxel space. The interface is identical to the other dataset classes; only the column semantics differ.
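
As a rough illustration of what "embedding space" means here, the following base-R sketch reduces a simulated [time x voxels] matrix to PCA scores with `stats::prcomp()`. It does not call latent_dataset(), whose constructor arguments are not shown in this document; it only shows the shape of the data such a dataset would hold. The matrix sizes and the number of components `k` are arbitrary assumptions.

```r
set.seed(1)
voxel_data <- matrix(rnorm(100 * 500), nrow = 100)  # simulated: 100 timepoints x 500 voxels

pca <- prcomp(voxel_data, center = TRUE, scale. = FALSE)
k <- 10                                              # arbitrary number of components to keep
scores   <- pca$x[, seq_len(k), drop = FALSE]        # [time x components]
loadings <- pca$rotation[, seq_len(k), drop = FALSE] # [voxels x components]

dim(scores)    # rows are still timepoints; columns are now components
dim(loadings)

# Rank-k approximation mapped back to voxel space
approx_voxels <- sweep(scores %*% t(loadings), 2, pca$center, "+")
dim(approx_voxels)
```

The rows keep their meaning (timepoints) while the columns change from voxels to components, which is the "column semantics" difference mentioned above.
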

fmri_study_dataset aggregates multiple single-subject datasets under one object.
Subject-level data stays lazy; you iterate over subjects via
data_chunks() with runwise = TRUE or pull one
subject at a time.
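
The idea of keeping subject-level data lazy can be sketched with closures in base R. This is not how fmri_study_dataset is implemented; `make_lazy_subject()` is a hypothetical helper, and the loader simulates data instead of reading a file.

```r
# Hypothetical helper: wrap a loader so the data is read only on first access
make_lazy_subject <- function(id, loader) {
  cache <- NULL
  list(
    id = id,
    is_loaded = function() !is.null(cache),
    get = function() {
      if (is.null(cache)) cache <<- loader()  # load once, then reuse
      cache
    }
  )
}

set.seed(42)
subjects <- lapply(c("sub-01", "sub-02", "sub-03"), function(id) {
  # Stand-in for reading a file: simulate a 20 x 50 [time x voxels] matrix
  make_lazy_subject(id, function() matrix(rnorm(20 * 50), nrow = 20))
})

# Nothing has been loaded at construction time
loaded_before <- vapply(subjects, function(s) s$is_loaded(), logical(1))

# Pull one subject at a time and keep only a small summary
subject_means <- vapply(subjects, function(s) mean(s$get()), numeric(1))

loaded_after <- vapply(subjects, function(s) s$is_loaded(), logical(1))
```

Constructing the list costs almost nothing; the cost of loading is paid only when a subject is actually requested.
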
Every dataset carries a sampling_frame that models the
acquisition timeline.

```r
# Constructed automatically, or explicitly:
sf <- sampling_frame(
  blocklens = c(150, 150),  # run lengths in TRs
  TR = 2.0
)
get_TR(sf)              # 2.0
get_run_lengths(sf)     # 150 and 150
get_total_duration(sf)  # 600 seconds
```

The sampling frame handles all timepoint-to-second and TR-index
conversions, so the rest of the codebase never does raw arithmetic on
timing. An event_table can be attached to any dataset;
onset times are validated against the sampling frame at assignment.
Run lengths also drive the runwise chunking mode: when
you request data_chunks(ds, runwise = TRUE) the iterator
yields one [time x voxels] block per run, boundaries
already aligned to the sampling frame.
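
The bookkeeping a sampling frame takes care of is plain index arithmetic. The base-R sketch below reproduces it for the two-run example above without calling the package; all variable names are illustrative, and the convention that each run's clock restarts at 0 is an assumption.

```r
blocklens <- c(150, 150)  # run lengths in TRs
TR <- 2.0                 # seconds per volume

run_id    <- rep(seq_along(blocklens), times = blocklens)  # run index for every timepoint
run_end   <- cumsum(blocklens)                             # last row of each run
run_start <- run_end - blocklens + 1                       # first row of each run
total_duration <- sum(blocklens) * TR                      # total seconds across runs

# Acquisition time of each volume, restarting at 0 in every run
# (assumption: volume onsets; other conventions shift this by a constant)
time_in_run <- (sequence(blocklens) - 1) * TR

# Splitting a [time x voxels] matrix into one block per run
set.seed(1)
mat <- matrix(rnorm(sum(blocklens) * 4), ncol = 4)
rows_by_run <- split(seq_len(nrow(mat)), run_id)
run_blocks  <- lapply(rows_by_run, function(rows) mat[rows, , drop = FALSE])
vapply(run_blocks, nrow, integer(1))
```
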
Implement the five contract functions for your new class, then register it:

```r
# Constructor: store the source and any extra options on the object
my_backend <- function(source, ...) {
  structure(list(source = source, ...), class = c("my_backend", "storage_backend"))
}

# Method arguments are named `backend` to match the generics
backend_open.my_backend <- function(backend) {
  # ... open the underlying source
}
backend_close.my_backend <- function(backend) {
  # ... release resources
}
backend_get_dims.my_backend <- function(backend) {
  # ... named list with spatial and time entries
}
backend_get_data.my_backend <- function(backend, rows = NULL, cols = NULL) {
  # ... return a [time x voxels] matrix
}
validate_backend.my_backend <- function(backend) {
  # ... check invariants
}
```

Once those five methods exist, your backend can be used inside any
fmri_file_dataset by passing an instance via the
backend argument, or inside a custom dataset subclass.
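
For a filled-in version of the skeleton, here is a minimal sketch of a toy backend that reads a matrix saved with `saveRDS()`. Everything in it is an assumption made for illustration: the class name `rds_backend`, its fields, the choice to have `backend_open()` and `backend_close()` return the updated object, and the locally declared generics, which stand in for the ones fmridataset provides so that the block runs on its own.

```r
# Stand-in generics; with fmridataset loaded you would define only the methods
backend_open     <- function(backend) UseMethod("backend_open")
backend_close    <- function(backend) UseMethod("backend_close")
backend_get_dims <- function(backend) UseMethod("backend_get_dims")
backend_get_data <- function(backend, rows = NULL, cols = NULL) UseMethod("backend_get_data")
validate_backend <- function(backend) UseMethod("validate_backend")

# Hypothetical backend: a [time x voxels] matrix stored in an .rds file
rds_backend <- function(source, spatial) {
  structure(list(source = source, spatial = spatial, data = NULL),
            class = c("rds_backend", "storage_backend"))
}

backend_open.rds_backend <- function(backend) {
  backend$data <- readRDS(backend$source)  # the file is touched only here
  backend
}
backend_close.rds_backend <- function(backend) {
  backend$data <- NULL                     # drop the in-memory copy
  backend
}
backend_get_dims.rds_backend <- function(backend) {
  list(spatial = backend$spatial, time = nrow(backend$data))
}
backend_get_data.rds_backend <- function(backend, rows = NULL, cols = NULL) {
  if (is.null(backend$data)) stop("backend is not open")
  if (is.null(rows)) rows <- seq_len(nrow(backend$data))
  if (is.null(cols)) cols <- seq_len(ncol(backend$data))
  backend$data[rows, cols, drop = FALSE]
}
validate_backend.rds_backend <- function(backend) {
  if (!file.exists(backend$source)) stop("source file not found: ", backend$source)
  invisible(TRUE)
}

# Round trip on a temporary file
path <- tempfile(fileext = ".rds")
saveRDS(matrix(seq_len(6 * 8), nrow = 6), path)  # 6 timepoints x 8 voxels

b <- rds_backend(path, spatial = c(2, 2, 2))
validate_backend(b)
b <- backend_open(b)
dims  <- backend_get_dims(b)
first <- backend_get_data(b, rows = 1:2)
b <- backend_close(b)
unlink(path)
```

Because R lists have copy semantics, this sketch reassigns `b` after opening and closing; a backend that holds real file handles might keep its state in an environment instead.
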
fmri_file_dataset with h5_backend is the
recommended pattern for preprocessed BIDS derivatives stored in HDF5.
The h5_backend uses chunk-aware reads aligned to the HDF5
chunk lattice, so iterating through a BIDS cohort with
data_chunks() achieves near-optimal I/O without any changes
to analysis code.
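
Chunk alignment comes down to integer arithmetic on row indices. The sketch below is not h5_backend's code; it only shows, in base R, how a requested row range could be expanded to whole chunks along the time axis. The helper name `align_to_chunks()` and the chunk size of 64 rows are hypothetical.

```r
# Hypothetical helper: expand a 1-based row range to whole chunks of `chunk_rows` rows
align_to_chunks <- function(first, last, chunk_rows, n_rows) {
  first_chunk <- (first - 1) %/% chunk_rows        # 0-based index of first chunk touched
  last_chunk  <- (last - 1) %/% chunk_rows         # 0-based index of last chunk touched
  chunks <- first_chunk:last_chunk
  starts <- chunks * chunk_rows + 1
  ends   <- pmin(starts + chunk_rows - 1, n_rows)  # the final chunk may be short
  data.frame(chunk = chunks, start = starts, end = ends)
}

# Rows 90..210 of a 300-row dataset stored in 64-row chunks
plan <- align_to_chunks(first = 90, last = 210, chunk_rows = 64, n_rows = 300)
plan

# A request that reaches the short final chunk
tail_plan <- align_to_chunks(first = 257, last = 300, chunk_rows = 64, n_rows = 300)
```

Reading whole chunks means each stored chunk is decompressed at most once per request, which is the usual motivation for aligning reads to the chunk lattice.
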

| Class | Constructor | Purpose |
|---|---|---|
| matrix_dataset | matrix_dataset() | In-memory matrix, full random access |
| fmri_file_dataset | fmri_file_dataset() | Lazy NIfTI / HDF5 file access |
| latent_dataset | latent_dataset() | Embedding / component space data |
| fmri_study_dataset | fmri_study_dataset() | Multi-subject container |
| sampling_frame | sampling_frame() | Temporal structure for one session |
| matrix_backend | matrix_backend() | In-memory backend (internal) |
| nifti_backend | nifti_backend() | NIfTI file backend |
| h5_backend | h5_backend() | HDF5 backend with chunk-aware I/O |
| zarr_backend | zarr_backend() | Zarr backend for cloud-native arrays |
| study_backend | study_backend() | Multi-subject lazy backend |