---
title: Architecture Overview
author: fmridataset Team
date: '`r Sys.Date()`'
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Architecture Overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  family: red
  preset: homage
css: albers.css
resource_files:
- albers.css
- albers.js
includes:
  in_header: |-
---
```{r setup, include=FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) {
  ggplot2::theme_set(
    albersdown::theme_albers(family = params$family, preset = params$preset)
  )
}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```
fmridataset separates *how data is stored* from *how analyses consume it*.
This document gives you a mental model of the three-layer stack so you can
predict how components interact, choose the right class for a given task, and
extend the package without modifying core code.
# Layer Diagram
```
┌──────────────────────────────────────────────────────┐
│ User layer                                           │
│   fmri_group       fmri_study_dataset                │
│   (multi-study)    (multi-subject)                   │
├──────────────────────────────────────────────────────┤
│ Dataset layer                                        │
│   matrix_dataset   fmri_file_dataset  latent_dataset │
│   (in-memory)      (NIfTI / HDF5)     (embedding)    │
├──────────────────────────────────────────────────────┤
│ Backend layer                                        │
│   matrix_backend   nifti_backend      h5_backend     │
│   zarr_backend     study_backend                     │
└──────────────────────────────────────────────────────┘
     ↑ all backends implement the same five-method contract
```
Each layer only knows about the layer directly beneath it.
Analysis code that calls `get_data_matrix()` or `data_chunks()` works
identically whether the data lives in RAM, on disk, or in a cloud-hosted
Zarr store.
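The layering can be illustrated with plain S3 dispatch in base R. This is not fmridataset code: `toy_get_data()`, `toy_memory()`, `toy_lazy()` and `mean_timecourse()` are hypothetical names invented for this sketch. Two storage classes answer the same generic, and the analysis function only ever calls that generic, so it cannot tell them apart.

```{r layering-sketch}
# Hypothetical generic standing in for a backend's data accessor.
toy_get_data <- function(store, cols = NULL) UseMethod("toy_get_data")

# Storage 1: the whole [time x voxels] matrix is already in RAM.
toy_memory <- function(mat) structure(list(mat = mat), class = "toy_memory")
toy_get_data.toy_memory <- function(store, cols = NULL) {
  if (is.null(cols)) store$mat else store$mat[, cols, drop = FALSE]
}

# Storage 2: columns are produced on request by a loader function
# (a stand-in for reading from disk).
toy_lazy <- function(loader, n_vox) {
  structure(list(loader = loader, n_vox = n_vox), class = "toy_lazy")
}
toy_get_data.toy_lazy <- function(store, cols = NULL) {
  if (is.null(cols)) cols <- seq_len(store$n_vox)
  store$loader(cols)
}

# Analysis code: knows nothing about where the data lives.
mean_timecourse <- function(store) rowMeans(toy_get_data(store))

set.seed(1)
full <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
in_ram <- toy_memory(full)
on_demand <- toy_lazy(function(cols) full[, cols, drop = FALSE], n_vox = ncol(full))

all.equal(mean_timecourse(in_ram), mean_timecourse(on_demand))  # TRUE
```

Swapping `in_ram` for `on_demand` changes nothing inside `mean_timecourse()`, which is the property the dataset layer relies on when it delegates to a backend.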
# The Backend Contract
Every backend must implement five generic functions:
```{r backend-contract}
backend_open(backend)                # open file handles / allocate resources
backend_close(backend)               # release resources
backend_get_dims(backend)            # return list(spatial = c(x, y, z), time = n)
backend_get_data(backend,            # return a [time x voxels] matrix
                 rows = NULL,
                 cols = NULL)
validate_backend(backend)            # stop() on contract violations
```
`rows` selects timepoints and `cols` selects voxels; both default to `NULL`, meaning "all".
Backends may add caching, memory-mapping, or chunked I/O behind those five
calls without any dataset-layer changes.
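For an in-memory matrix, the `rows`/`cols` convention reduces to ordinary subsetting in which `NULL` means "everything". The helper below, `select_block()`, is a hypothetical stand-in for what a matrix-backed `backend_get_data()` has to do; it is a minimal sketch, not the package's implementation.

```{r rows-cols-sketch}
# Hypothetical helper: NULL selects all timepoints (rows) or all voxels (cols).
select_block <- function(mat, rows = NULL, cols = NULL) {
  if (is.null(rows)) rows <- seq_len(nrow(mat))
  if (is.null(cols)) cols <- seq_len(ncol(mat))
  # drop = FALSE keeps a [time x voxels] matrix even for a single row or column
  mat[rows, cols, drop = FALSE]
}

m <- matrix(1:12, nrow = 4, ncol = 3)  # 4 timepoints x 3 voxels
dim(select_block(m))                   # 4 3: everything
dim(select_block(m, rows = 1:2))       # 2 3: first two timepoints, all voxels
dim(select_block(m, cols = 3))         # 4 1: one voxel, still a matrix
```

Without `drop = FALSE`, a single-voxel request would come back as a plain vector, and every caller would need to special-case it.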
For a full walkthrough of writing and registering a backend, see the
`backend-development-basics` vignette.
# Dataset Classes
## `matrix_dataset`
Wraps an in-memory `[time x voxels]` matrix.
Use it for simulated data, preprocessed ROI time series, or any situation
where the full dataset fits comfortably in RAM.
```{r matrix-dataset-example}
ds <- matrix_dataset(
datamat = matrix(rnorm(100 * 500), nrow = 100, ncol = 500),
TR = 2.0,
run_length = c(50, 50)
)
```
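Whether a dataset "fits comfortably in RAM" is simple arithmetic: a dense matrix of R doubles costs 8 bytes per entry. The sketch below uses a hypothetical helper, `dense_matrix_bytes()`, and illustrative sizes (not measurements of any real scan) to make the estimate concrete, then cross-checks it against R's own `object.size()`.

```{r memory-estimate-sketch}
# Hypothetical helper: bytes for a dense [time x voxels] matrix of R doubles
# (8 bytes each), ignoring R's small per-object header.
dense_matrix_bytes <- function(n_time, n_vox) n_time * n_vox * 8

dense_matrix_bytes(100, 500)           # the example above: 4e+05 bytes
dense_matrix_bytes(400, 200000) / 1e6  # e.g. 2 x 200 TRs, 200,000 voxels: 640 MB

# Cross-check the formula against R's accounting for the small case.
x <- matrix(rnorm(100 * 500), nrow = 100, ncol = 500)
as.numeric(object.size(x)) - dense_matrix_bytes(100, 500)  # header bytes only
```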
## `fmri_file_dataset`
Points at NIfTI or HDF5 files without loading them.
The underlying `nifti_backend` or `h5_backend` loads voxel blocks on demand,
so construction is near-instantaneous even for large scan collections.
```{r file-dataset-example}
ds <- fmri_file_dataset(
scans = c("run1.nii.gz", "run2.nii.gz"),
mask = "brain_mask.nii.gz",
TR = 2.0,
run_length = c(200, 200)
)
```
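The on-demand behaviour can be sketched with a closure: construction stores *how* to read, and nothing is read until a block is requested. This is an illustration under stated assumptions, not how `nifti_backend` is written internally; `make_lazy_store()` and the `read_cols` function (which here slices an in-memory matrix in place of a file) are hypothetical.

```{r lazy-loading-sketch}
# Hypothetical lazy store: holds a reader function, not the data itself.
make_lazy_store <- function(read_cols) {
  n_reads <- 0L
  list(
    get = function(cols) {
      n_reads <<- n_reads + 1L  # count actual read requests
      read_cols(cols)
    },
    reads = function() n_reads
  )
}

on_disk <- matrix(seq_len(200 * 1000), nrow = 200, ncol = 1000)  # pretend file
store <- make_lazy_store(function(cols) on_disk[, cols, drop = FALSE])

store$reads()              # 0: building the store read nothing
block <- store$get(1:100)  # the first voxel block is read only now
store$reads()              # 1
dim(block)                 # 200 100
```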
## `latent_dataset`
Stores data in a lower-dimensional embedding space (e.g., ICA components,
PCA scores) rather than voxel space.
The interface is identical to the other dataset classes; only the column
semantics differ.
```{r latent-dataset-example}
ds <- latent_dataset(
loadings = component_matrix, # voxels x components
scores = score_matrix, # time x components
TR = 2.0,
run_length = c(100, 100)
)
```
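With loadings stored as voxels x components and scores as time x components, the voxel-space `[time x voxels]` view is the product `scores %*% t(loadings)` (exact only if the embedding kept all the variance). The base-R sketch below uses small random matrices in place of `component_matrix` and `score_matrix`; it shows the shape bookkeeping and storage savings only, and does not claim that `latent_dataset` ever materializes this product.

```{r latent-reconstruction-sketch}
set.seed(42)
n_time <- 200; n_vox <- 1000; k <- 10

loadings <- matrix(rnorm(n_vox * k), nrow = n_vox, ncol = k)   # voxels x components
scores   <- matrix(rnorm(n_time * k), nrow = n_time, ncol = k) # time x components

# In latent space the columns are components: [time x k], not [time x voxels].
dim(scores)   # 200 10

# Back to voxel space: (time x k) %*% (k x voxels) = time x voxels.
# tcrossprod(scores, loadings) computes the same product.
recon <- scores %*% t(loadings)
dim(recon)    # 200 1000

# Stored values relative to the dense voxel-space matrix.
(length(loadings) + length(scores)) / (n_time * n_vox)  # 0.06
```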
## `fmri_study_dataset`
Aggregates multiple single-subject datasets under one object.
Subject-level data stays lazy: you can iterate with `data_chunks()` (with
`runwise = TRUE` it yields one block per run) or pull one subject at a time.
```{r study-dataset-example}
study <- fmri_study_dataset(
datasets = list(sub01_ds, sub02_ds, sub03_ds),
subject_ids = c("sub-01", "sub-02", "sub-03")
)
```
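The one-subject-at-a-time pattern keeps memory bounded by the largest single subject rather than the whole cohort. The sketch below imitates it in base R; the per-subject loader functions are hypothetical stand-ins for lazy single-subject datasets such as `sub01_ds`, and none of this is fmridataset's API.

```{r study-iteration-sketch}
set.seed(7)
subject_ids <- c("sub-01", "sub-02", "sub-03")

# Hypothetical loaders: each returns one subject's [time x voxels] matrix,
# and only when called.
loaders <- lapply(c(100, 120, 100), function(n_time) {
  force(n_time)
  function() matrix(rnorm(n_time * 50), nrow = n_time, ncol = 50)
})
names(loaders) <- subject_ids

# Load, reduce, and discard one subject per iteration; `x` is local to the
# anonymous function, so only one subject's matrix is live at a time.
per_subject <- vapply(subject_ids, function(id) {
  x <- loaders[[id]]()
  mean(apply(x, 2, sd))  # e.g. mean temporal SD across voxels
}, numeric(1))

per_subject
```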
# Temporal Structure
Every dataset carries a `sampling_frame` that models the acquisition
timeline.
```{r sampling-frame-example}
# Constructed automatically, or explicitly:
sf <- sampling_frame(
blocklens = c(150, 150), # run lengths in TRs
TR = 2.0
)
get_TR(sf) # 2.0
get_run_lengths(sf)     # c(150, 150)
get_total_duration(sf) # 600 seconds
```
The sampling frame handles all timepoint-to-second and TR-index conversions,
so the rest of the codebase never does raw arithmetic on timing.
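The conversions a sampling frame centralizes are arithmetic on `blocklens` and `TR`. The sketch below reproduces them in base R for the `c(150, 150)`, `TR = 2.0` example. It assumes each sample is stamped at the start of its TR with no extra offset; fmridataset's own `sampling_frame` may place samples at a different point within the TR, and the helper names `run_onsets()`, `sample_times()` and `global_to_run()` are hypothetical.

```{r timing-arithmetic-sketch}
blocklens <- c(150, 150)  # run lengths in TRs
TR <- 2.0                 # seconds per TR

# Start of each run on one continuous clock (s).
run_onsets <- function(blocklens, TR) {
  c(0, cumsum(blocklens)[-length(blocklens)]) * TR
}

# Time stamp of every sample (s), assuming samples sit at the start of each TR.
sample_times <- function(blocklens, TR) (seq_len(sum(blocklens)) - 1) * TR

# Map a global 1-based TR index to its run and its 1-based index within the run.
global_to_run <- function(i, blocklens) {
  starts <- cumsum(c(0, blocklens))[seq_along(blocklens)]  # TRs before each run
  run <- findInterval(i - 1, starts)
  data.frame(global = i, run = run, local = i - starts[run])
}

run_onsets(blocklens, TR)           # 0 300
range(sample_times(blocklens, TR))  # 0 598
sum(blocklens) * TR                 # 600, the total duration
global_to_run(c(1, 150, 151, 300), blocklens)
```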
An `event_table` can be attached to any dataset; onset times are validated
against the sampling frame at assignment.
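One plausible form of that validation is a range check per run. The sketch below assumes onsets are given in seconds relative to the start of their own run; it is a hypothetical helper (`onsets_in_range()`), not necessarily the exact rule fmridataset applies.

```{r onset-check-sketch}
# Hypothetical check: an onset is valid if it lies inside its run,
# i.e. 0 <= onset < blocklens[run] * TR.
onsets_in_range <- function(onset, run, blocklens, TR) {
  run_dur <- blocklens[run] * TR
  onset >= 0 & onset < run_dur
}

events <- data.frame(onset = c(10, 250, 310, 5), run = c(1, 1, 1, 2))
ok <- onsets_in_range(events$onset, events$run, blocklens = c(150, 150), TR = 2.0)
events[!ok, ]  # the onset at 310 s lies beyond run 1's 300 s
```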
Run lengths also drive the `runwise` chunking mode: when you request
`data_chunks(ds, runwise = TRUE)` the iterator yields one `[time x voxels]`
block per run, boundaries already aligned to the sampling frame.
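Run lengths alone are enough to cut a `[time x voxels]` matrix into per-run blocks. The sketch below is a hypothetical stand-in for runwise chunking (`runwise_blocks()` is not a package function); it shows the index bookkeeping, not how `data_chunks()` streams data from a backend.

```{r runwise-sketch}
# Hypothetical runwise splitter: one [time x voxels] block per run.
runwise_blocks <- function(mat, run_length) {
  stopifnot(sum(run_length) == nrow(mat))  # runs must cover every timepoint
  run_id <- rep(seq_along(run_length), times = run_length)
  lapply(split(seq_len(nrow(mat)), run_id),
         function(rows) mat[rows, , drop = FALSE])
}

mat <- matrix(rnorm(300 * 20), nrow = 300, ncol = 20)
blocks <- runwise_blocks(mat, c(150, 150))

length(blocks)        # 2
sapply(blocks, nrow)  # 150 rows in each run
```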
# Extension Points
## Adding a New Backend
Implement the five contract functions as S3 methods for your new class
(registration is covered in the `backend-development-basics` vignette):
```{r custom-backend-skeleton}
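my_backend <- function(source, ...) {
  # Store the source plus any options; "my_backend" comes first in the class
  # vector so S3 dispatch finds the methods below.
  structure(
    list(source = source, ...),
    class = c("my_backend", "storage_backend")
  )
}
backend_open.my_backend <- function(backend) {
  # Open the underlying source (file handle, connection, ...) and return the
  # backend so any state added here travels with the object.
  backend
}
backend_close.my_backend <- function(backend) {
  # Release whatever backend_open() acquired.
  invisible(NULL)
}
backend_get_dims.my_backend <- function(backend) {
  # Must return list(spatial = c(x, y, z), time = <number of timepoints>).
  stop("backend_get_dims() is not implemented for my_backend yet")
}
backend_get_data.my_backend <- function(backend, rows = NULL, cols = NULL) {
  # Must return a [time x voxels] matrix; NULL rows/cols mean "all".
  stop("backend_get_data() is not implemented for my_backend yet")
}
validate_backend.my_backend <- function(backend) {
  # Check invariants and stop() on violations.
  invisible(TRUE)
}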
```
Once those five methods exist, your backend can be used inside any
`fmri_file_dataset` by passing an instance via the `backend` argument, or
inside a custom dataset subclass.
## BIDS + HDF5 as a Concrete Example
`fmri_file_dataset` with `h5_backend` is the recommended pattern for
preprocessed BIDS derivatives stored in HDF5.
The `h5_backend` uses chunk-aware reads aligned to the HDF5 chunk lattice,
so iterating through a BIDS cohort with `data_chunks()` avoids re-reading
chunks that straddle block boundaries, with no changes to analysis code.
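Chunk alignment means choosing voxel ranges whose edges coincide with chunk boundaries, so each stored chunk is fetched by exactly one block. The base-R sketch below counts chunk fetches for aligned and misaligned blocks along a single voxel axis. It assumes each block fetches every chunk it overlaps and that there is no chunk cache; `chunk_aligned_ranges()` and `chunk_reads()` are hypothetical names, and how `h5_backend` actually plans its reads is not shown here.

```{r chunk-alignment-sketch}
# Hypothetical helper: voxel ranges whose edges fall on multiples of `chunk`,
# each block spanning `chunks_per_block` chunks.
chunk_aligned_ranges <- function(n_vox, chunk, chunks_per_block = 1) {
  width  <- chunk * chunks_per_block
  starts <- seq(1, n_vox, by = width)
  data.frame(start = starts, end = pmin(starts + width - 1, n_vox))
}

# Total chunk fetches if every block reads each chunk it overlaps.
chunk_reads <- function(ranges, chunk) {
  first <- (ranges$start - 1) %/% chunk
  last  <- (ranges$end - 1) %/% chunk
  sum(last - first + 1)
}

aligned <- chunk_aligned_ranges(n_vox = 10000, chunk = 1024, chunks_per_block = 4)
naive_starts <- seq(1, 10000, by = 4000)
naive <- data.frame(start = naive_starts, end = pmin(naive_starts + 3999, 10000))

chunk_reads(aligned, 1024)  # 10: every chunk fetched once
chunk_reads(naive, 1024)    # 12: two chunks straddle block edges
```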
# Object Zoo
| Class | Constructor | Purpose |
|---|---|---|
| `matrix_dataset` | `matrix_dataset()` | In-memory matrix, full random access |
| `fmri_file_dataset` | `fmri_file_dataset()` | Lazy NIfTI / HDF5 file access |
| `latent_dataset` | `latent_dataset()` | Embedding / component space data |
| `fmri_study_dataset` | `fmri_study_dataset()` | Multi-subject container |
| `sampling_frame` | `sampling_frame()` | Temporal structure for one session |
| `matrix_backend` | `matrix_backend()` | In-memory backend (internal) |
| `nifti_backend` | `nifti_backend()` | NIfTI file backend |
| `h5_backend` | `h5_backend()` | HDF5 backend with chunk-aware I/O |
| `zarr_backend` | `zarr_backend()` | Zarr backend for cloud-native arrays |
| `study_backend` | `study_backend()` | Multi-subject lazy backend |
# Related Vignettes
- `fmridataset-intro` — hands-on introduction to constructors and data access
- `backend-development-basics` — step-by-step guide to writing a new backend
- `study-level-analysis` — multi-subject iteration patterns with `fmri_study_dataset`