---
title: Architecture Overview
author: fmridataset Team
date: '`r Sys.Date()`'
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Architecture Overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  family: red
  preset: homage
css: albers.css
resource_files:
  - albers.css
  - albers.js
includes:
  in_header: |-
---

```{r setup, include=FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("albersdown", quietly = TRUE)) {
  ggplot2::theme_set(
    albersdown::theme_albers(family = params$family, preset = params$preset)
  )
}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

fmridataset separates *how data is stored* from *how analyses consume it*.
This document gives you a mental model of the three-layer stack so you can
predict how components interact, choose the right class for a given task, and
extend the package without modifying core code.

# Layer Diagram

```
┌──────────────────────────────────────────────────────┐
│                      User layer                      │
│   fmri_group            fmri_study_dataset           │
│   (multi-study)         (multi-subject)              │
├──────────────────────────────────────────────────────┤
│                    Dataset layer                     │
│   matrix_dataset  fmri_file_dataset  latent_dataset  │
│   (in-memory)     (NIfTI / HDF5)     (embedding)     │
├──────────────────────────────────────────────────────┤
│                    Backend layer                     │
│   matrix_backend   nifti_backend   h5_backend        │
│   zarr_backend     study_backend                     │
└──────────────────────────────────────────────────────┘
        ↑ all backends implement the same five-method contract
```

Each layer only knows about the layer directly beneath it. Analysis code that
calls `get_data_matrix()` or `data_chunks()` works identically whether the data
lives in RAM, on disk, or in a cloud-hosted Zarr store.

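
The layering idea can be illustrated with a toy model in base R. This sketch
does not use fmridataset: `toy_get_data()`, `toy_mem_backend()`,
`toy_file_backend()`, `subset_matrix()`, and `voxel_means()` are hypothetical
names invented for illustration. It only shows the general S3 pattern in which
analysis code calls one generic and never inspects where the data lives.

```{r layering-sketch}
# Hypothetical stand-in for a backend generic; not part of fmridataset.
toy_get_data <- function(backend, rows = NULL, cols = NULL) {
  UseMethod("toy_get_data")
}

# Shared helper: NULL means "all rows" / "all columns".
subset_matrix <- function(m, rows, cols) {
  if (is.null(rows)) rows <- seq_len(nrow(m))
  if (is.null(cols)) cols <- seq_len(ncol(m))
  m[rows, cols, drop = FALSE]
}

# Backend 1: matrix held in RAM.
toy_mem_backend <- function(m) {
  structure(list(m = m), class = "toy_mem_backend")
}
toy_get_data.toy_mem_backend <- function(backend, rows = NULL, cols = NULL) {
  subset_matrix(backend$m, rows, cols)
}

# Backend 2: matrix stored on disk and read only when requested.
toy_file_backend <- function(path) {
  structure(list(path = path), class = "toy_file_backend")
}
toy_get_data.toy_file_backend <- function(backend, rows = NULL, cols = NULL) {
  subset_matrix(readRDS(backend$path), rows, cols)
}

# "Analysis" code: sees only the generic, never the storage details.
voxel_means <- function(backend) {
  colMeans(toy_get_data(backend))
}

set.seed(1)
m <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)  # [time x voxels]
path <- tempfile(fileext = ".rds")
saveRDS(m, path)

in_ram  <- voxel_means(toy_mem_backend(m))
on_disk <- voxel_means(toy_file_backend(path))
all.equal(in_ram, on_disk)
```

`voxel_means()` is written once and works for both storage classes, which is
the property the dataset layer relies on when it delegates to a backend.
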
# The Backend Contract

Every backend must implement five generic functions:

```{r backend-contract}
backend_open(backend)       # open file handles / allocate resources
backend_close(backend)      # release resources
backend_get_dims(backend)   # return list(spatial = c(x, y, z), time = n)
backend_get_data(backend,   # return matrix [time x voxels]
                 rows = NULL, cols = NULL)
validate_backend(backend)   # stop() on contract violations
```

`rows` selects timepoints and `cols` selects voxels; both default to "all".
Backends may add caching, memory-mapping, or chunked I/O behind those five
calls without any dataset-layer changes.

For a full walkthrough of writing and registering a backend see the
`backend-development-basics` vignette.

# Dataset Classes

## `matrix_dataset`

Wraps an in-memory `[time x voxels]` matrix. Use it for simulated data,
preprocessed ROI time series, or any situation where the full dataset fits
comfortably in RAM.

```{r matrix-dataset-example}
ds <- matrix_dataset(
  datamat    = matrix(rnorm(100 * 500), nrow = 100, ncol = 500),
  TR         = 2.0,
  run_length = c(50, 50)
)
```

## `fmri_file_dataset`

Points at NIfTI or HDF5 files without loading them. The underlying
`nifti_backend` or `h5_backend` loads voxel blocks on demand, so construction
is near-instantaneous even for large scan collections.

```{r file-dataset-example}
ds <- fmri_file_dataset(
  scans      = c("run1.nii.gz", "run2.nii.gz"),
  mask       = "brain_mask.nii.gz",
  TR         = 2.0,
  run_length = c(200, 200)
)
```

## `latent_dataset`

Stores data in a lower-dimensional embedding space (e.g., ICA components, PCA
scores) rather than voxel space. The interface is identical to the other
dataset classes; only the column semantics differ.

```{r latent-dataset-example}
ds <- latent_dataset(
  loadings   = component_matrix,  # voxels x components
  scores     = score_matrix,      # time x components
  TR         = 2.0,
  run_length = c(100, 100)
)
```

## `fmri_study_dataset`

Aggregates multiple single-subject datasets under one object.
Subject-level data stays lazy; you iterate over subjects via `data_chunks()`
with `runwise = TRUE` or pull one subject at a time.

```{r study-dataset-example}
study <- fmri_study_dataset(
  datasets    = list(sub01_ds, sub02_ds, sub03_ds),
  subject_ids = c("sub-01", "sub-02", "sub-03")
)
```

# Temporal Structure

Every dataset carries a `sampling_frame` that models the acquisition timeline.

```{r sampling-frame-example}
# Constructed automatically, or explicitly:
sf <- sampling_frame(
  blocklens = c(150, 150),  # run lengths in TRs
  TR        = 2.0
)

get_TR(sf)              # 2.0
get_run_lengths(sf)     # 150 and 150
get_total_duration(sf)  # 600 seconds
```

The sampling frame handles all timepoint-to-second and TR-index conversions,
so the rest of the codebase never does raw arithmetic on timing. An
`event_table` can be attached to any dataset; onset times are validated
against the sampling frame at assignment.

Run lengths also drive the `runwise` chunking mode: when you request
`data_chunks(ds, runwise = TRUE)` the iterator yields one `[time x voxels]`
block per run, boundaries already aligned to the sampling frame.

# Extension Points

## Adding a New Backend

Implement the five contract functions for your new class, then register it:

```{r custom-backend-skeleton}
my_backend <- function(source, ...) {
  structure(list(source = source, ...),
            class = c("my_backend", "storage_backend"))
}

backend_open.my_backend <- function(b) {
  ...  # open the underlying source
}

backend_close.my_backend <- function(b) {
  ...  # release resources
}

backend_get_dims.my_backend <- function(b) {
  ...  # named list with spatial and time entries
}

backend_get_data.my_backend <- function(b, rows = NULL, cols = NULL) {
  ...  # return a [time x voxels] matrix
}

validate_backend.my_backend <- function(b) {
  ...  # check invariants
}
```

Once those five methods exist, your backend can be used inside any
`fmri_file_dataset` by passing an instance via the `backend` argument, or
inside a custom dataset subclass.

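
To make the five-method contract concrete, here is a minimal runnable sketch of
a toy backend in base R. Several things in it are assumptions rather than
package behaviour: `csv_backend()` is a hypothetical class, the five generics
are redefined locally as stand-ins so the block runs without fmridataset
attached, `backend_open()` and `backend_close()` are assumed to return the
(modified) backend object, and the `spatial` entry is a placeholder because a
CSV file carries no volumetric geometry.

```{r toy-backend-sketch}
# Stand-in generics so the sketch runs without fmridataset attached.
# With the package loaded, drop these five definitions and keep the methods.
backend_open     <- function(backend) UseMethod("backend_open")
backend_close    <- function(backend) UseMethod("backend_close")
backend_get_dims <- function(backend) UseMethod("backend_get_dims")
backend_get_data <- function(backend, rows = NULL, cols = NULL) {
  UseMethod("backend_get_data")
}
validate_backend <- function(backend) UseMethod("validate_backend")

# Hypothetical backend: one CSV file per run, each laid out [time x voxels].
csv_backend <- function(paths) {
  structure(list(paths = paths, data = NULL),
            class = c("csv_backend", "storage_backend"))
}

# Assumption: open/close return the modified backend object.
backend_open.csv_backend <- function(backend) {
  runs <- lapply(backend$paths, function(p) {
    as.matrix(read.csv(p, header = FALSE))
  })
  backend$data <- unname(do.call(rbind, runs))  # concatenate runs over time
  backend
}

backend_close.csv_backend <- function(backend) {
  backend$data <- NULL  # drop the cached matrix
  backend
}

backend_get_dims.csv_backend <- function(backend) {
  stopifnot(!is.null(backend$data))  # must be opened first
  # Placeholder geometry: all voxels laid out along x.
  list(spatial = c(ncol(backend$data), 1L, 1L), time = nrow(backend$data))
}

backend_get_data.csv_backend <- function(backend, rows = NULL, cols = NULL) {
  stopifnot(!is.null(backend$data))
  if (is.null(rows)) rows <- seq_len(nrow(backend$data))
  if (is.null(cols)) cols <- seq_len(ncol(backend$data))
  backend$data[rows, cols, drop = FALSE]
}

validate_backend.csv_backend <- function(backend) {
  missing <- backend$paths[!file.exists(backend$paths)]
  if (length(missing) > 0) {
    stop("missing files: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage: two runs of 4 and 6 timepoints over 3 voxels.
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.table(matrix(1:12, nrow = 4), paths[1], sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(matrix(101:118, nrow = 6), paths[2], sep = ",",
            row.names = FALSE, col.names = FALSE)

b <- csv_backend(paths)
validate_backend(b)
b <- backend_open(b)
dims  <- backend_get_dims(b)
block <- backend_get_data(b, rows = 1:2, cols = c(1, 3))
b <- backend_close(b)
```

The sketch reads everything into memory on open for brevity; a real backend
would more likely keep a file handle and read only the requested rows and
columns inside `backend_get_data()`.
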
## BIDS + HDF5 as a Concrete Example

`fmri_file_dataset` with `h5_backend` is the recommended pattern for
preprocessed BIDS derivatives stored in HDF5. The `h5_backend` uses
chunk-aware reads aligned to the HDF5 chunk lattice, so iterating through a
BIDS cohort with `data_chunks()` achieves near-optimal I/O without any changes
to analysis code.

# Object Zoo

| Class | Constructor | Purpose |
|---|---|---|
| `matrix_dataset` | `matrix_dataset()` | In-memory matrix, full random access |
| `fmri_file_dataset` | `fmri_file_dataset()` | Lazy NIfTI / HDF5 file access |
| `latent_dataset` | `latent_dataset()` | Embedding / component space data |
| `fmri_study_dataset` | `fmri_study_dataset()` | Multi-subject container |
| `sampling_frame` | `sampling_frame()` | Temporal structure for one session |
| `matrix_backend` | `matrix_backend()` | In-memory backend (internal) |
| `nifti_backend` | `nifti_backend()` | NIfTI file backend |
| `h5_backend` | `h5_backend()` | HDF5 backend with chunk-aware I/O |
| `zarr_backend` | `zarr_backend()` | Zarr backend for cloud-native arrays |
| `study_backend` | `study_backend()` | Multi-subject lazy backend |

# Related Vignettes

- `fmridataset-intro`: hands-on introduction to constructors and data access
- `backend-development-basics`: step-by-step guide to writing a new backend
- `study-level-analysis`: multi-subject iteration patterns with `fmri_study_dataset`