---
title: Getting Started with fmridataset
author: fmridataset Team
date: '`r Sys.Date()`'
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 2
    number_sections: yes
    css: albers.css
    includes:
      in_header: |-
vignette: >
  %\VignetteIndexEntry{Getting Started with fmridataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  family: red
  preset: homage
resource_files:
  - albers.css
  - albers.js
---

```{r setup, include=FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) {
  ggplot2::theme_set(
    albersdown::theme_albers(family = params$family, preset = params$preset)
  )
}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  warning = FALSE,
  message = FALSE
)
library(fmridataset)
```

# Motivation

fMRI analyses draw on data from heterogeneous sources: NIfTI files, BIDS datasets, preprocessed matrices, and HDF5 archives. fmridataset provides a unified interface that abstracts away these format differences, so the same functions work regardless of how the data are stored.

# Creating a Dataset

The simplest starting point is a matrix dataset, which wraps an in-memory timepoints-by-voxels matrix together with its temporal metadata (TR and run lengths).
```{r create-dataset}
set.seed(42)
mat <- matrix(rnorm(150 * 1000), nrow = 150, ncol = 1000)
ds <- matrix_dataset(
  datamat = mat,          # 150 timepoints x 1000 voxels
  TR = 2.0,               # repetition time in seconds
  run_length = c(75, 75)  # timepoints in each of the two runs
)
print(ds)
```
The dataset holds 150 timepoints split into two runs of 75, with a 2-second TR.
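
Every row of the matrix should belong to exactly one run, so the run lengths must add up to the number of timepoints. A quick base R check before calling `matrix_dataset()` makes that requirement explicit. It also shows the arithmetic behind the session length: 150 timepoints × 2 s gives 300 seconds, which is the value we would expect `get_total_duration()` to report for this design. The variable names below are illustrative and not part of fmridataset.

```{r run-length-check}
# Illustrative values matching the dataset above
n_timepoints <- 150
run_lengths <- c(75, 75)
tr_sec <- 2.0

# Each timepoint (row) should be assigned to exactly one run
stopifnot(sum(run_lengths) == n_timepoints)

# Duration of each run and of the whole session, in seconds
run_durations <- run_lengths * tr_sec
run_durations
sum(run_durations)
```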

# Accessing Data

`get_data_matrix()` returns a standard timepoints-by-voxels matrix. You can retrieve all runs or a single run by index.
```{r get-data}
all_data <- get_data_matrix(ds)
cat("Full dimensions:", dim(all_data), "\n")
run1 <- get_data_matrix(ds, run_id = 1)
cat("Run 1 dimensions:", dim(run1), "\n")
```
Both calls return timepoints as rows and voxels as columns. The single-run matrix simply has fewer rows (75 rather than 150).
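
To make the orientation and the run split concrete, the sketch below builds a tiny dataset whose values are the numbers 1 to 24, so every cell is easy to trace. Like the chunk above, it assumes that `get_data_matrix()` accepts `run_id`. It also assumes that run 2 corresponds to the last four rows of the full matrix. The names `toy`, `toy_all`, and `toy_run2` are illustrative.

```{r orientation-check}
library(fmridataset)

# 8 timepoints x 3 voxels with traceable values 1..24 (filled column-wise)
toy <- matrix_dataset(
  datamat = matrix(as.numeric(1:24), nrow = 8, ncol = 3),
  TR = 1.5,
  run_length = c(4, 4)
)

toy_all <- as.matrix(get_data_matrix(toy))
toy_run2 <- as.matrix(get_data_matrix(toy, run_id = 2))

dim(toy_all)   # rows = timepoints, columns = voxels
dim(toy_run2)  # only the 4 timepoints of run 2

# Run 2 is expected to match rows 5 to 8 of the full matrix
all.equal(unname(toy_run2), unname(toy_all[5:8, ]))
```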

# Temporal Structure

Every dataset contains a `sampling_frame` that records run boundaries, TR, and duration.
```{r sampling-frame}
sf <- ds$sampling_frame
cat("TR:", get_TR(sf), "seconds\n")
cat("Runs:", n_runs(sf), "\n")
cat("Run lengths:", get_run_lengths(sf), "timepoints\n")
cat("Total duration:", get_total_duration(sf), "seconds\n")
# First six acquisition times (seconds)
head(samples(sf))
```
`blockids()` maps each timepoint to its run index, which is useful for run-specific indexing.
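
For example, the run index can select one run's rows from the full data matrix. The sketch below uses a small standalone dataset with unequal runs (6 and 4 timepoints) so the counts are easy to check. It assumes that `blockids()` accepts the dataset's `sampling_frame`, as the other accessors above do, and that it returns one integer run label per timepoint. Conceptually, that is the vector base R's `rep(seq_along(run_length), run_length)` would produce. The names `uneven`, `run_index`, and `run2_rows` are illustrative.

```{r blockids-indexing}
library(fmridataset)

# Standalone dataset: run 1 has 6 timepoints, run 2 has 4; 3 voxels
uneven <- matrix_dataset(
  datamat = matrix(as.numeric(1:30), nrow = 10, ncol = 3),
  TR = 2.0,
  run_length = c(6, 4)
)

run_index <- blockids(uneven$sampling_frame)
table(run_index)  # timepoints per run

# Select run 2's rows from the full timepoints-by-voxels matrix
uneven_mat <- as.matrix(get_data_matrix(uneven))
run2_rows <- uneven_mat[run_index == 2, , drop = FALSE]
dim(run2_rows)
```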

# Event Tables

Experimental design information is attached to the dataset as a data frame stored in `$event_table`.
```{r event-table}
events <- data.frame(
  onset = c(10, 30, 50, 70, 10, 30, 50, 70),  # seconds from the start of each run
  duration = 2,
  trial_type = rep(c("faces", "houses"), 4),
  run = c(1, 1, 1, 1, 2, 2, 2, 2)
)
ds$event_table <- events
head(ds$event_table)
```

Event onsets are in seconds and are measured here from the start of each run. Every onset must therefore fall within its run's duration of 150 seconds (75 timepoints × 2 s).
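
If onsets are recorded relative to the start of each run, a later step may need them on the concatenated time axis, or may need the volume in which each event begins. Both conversions are simple arithmetic. The base R sketch below offsets each run-relative onset by the start time of its run. It assumes that volume k covers the interval [(k - 1) × TR, k × TR) within its run; that timing convention is an illustrative assumption, not something fmridataset defines. The names `ev`, `run_start`, `global_onset`, and `scan_in_run` are hypothetical.

```{r onset-conversion}
tr_sec <- 2.0
run_lengths <- c(75, 75)
ev <- data.frame(
  onset = c(10, 30, 50, 70, 10, 30, 50, 70),  # seconds from run start
  run = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Start time of each run on the concatenated axis: 0 s and 150 s here
run_start <- c(0, cumsum(run_lengths * tr_sec))[seq_along(run_lengths)]

# Shift run-relative onsets onto the concatenated time axis
ev$global_onset <- ev$onset + run_start[ev$run]

# 1-based volume within the run that contains each onset (assumed timing)
ev$scan_in_run <- floor(ev$onset / tr_sec) + 1
ev
```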

# Chunked Processing

For large datasets, `data_chunks()` partitions the voxels into memory-manageable pieces, so a computation can proceed chunk by chunk without loading the entire matrix at once.
```{r chunked-processing}
chunks <- data_chunks(ds, nchunks = 4)
results <- lapply(chunks, function(chunk) {
  colMeans(chunk$data)  # mean over timepoints for each voxel in this chunk
})
voxel_means <- do.call(c, results)
cat("Computed means for", length(voxel_means), "voxels\n")
```
Each element of `results` corresponds to one chunk; `do.call(c, ...)` reassembles them in voxel order.
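
Reassembling with `c()` yields voxel order only if each chunk is a contiguous block of columns and the chunks are visited in order. The chunk above relies on that assumption. The base R sketch below reproduces the partitioning idea on a plain matrix and checks that the concatenated per-chunk means agree with `colMeans()` on the whole matrix. `toy_mat`, `voxel_sets`, and the other names are hypothetical. The sketch illustrates the logic only and is not fmridataset's internal implementation.

```{r chunk-sketch}
set.seed(1)
toy_mat <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)  # 20 timepoints x 10 voxels
nchunks <- 3

# Assign each column (voxel) to one of nchunks contiguous groups
voxel_group <- cut(seq_len(ncol(toy_mat)), breaks = nchunks, labels = FALSE)
voxel_sets <- split(seq_len(ncol(toy_mat)), voxel_group)

# Process each chunk independently, then concatenate in chunk order
chunk_means <- lapply(voxel_sets, function(idx) colMeans(toy_mat[, idx, drop = FALSE]))
combined <- unlist(chunk_means, use.names = FALSE)

lengths(voxel_sets)                     # voxels per chunk
all.equal(combined, colMeans(toy_mat))  # TRUE when order is preserved
```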

# Run-wise Processing

Set `runwise = TRUE` to get one chunk per run. This suits analyses that must respect run boundaries, such as detrending or temporal filtering.
```{r runwise-processing}
run_chunks <- data_chunks(ds, runwise = TRUE)
run_stats <- lapply(run_chunks, function(chunk) {
  cat("Run ", chunk$chunk_num, ": ", nrow(chunk$data), " timepoints\n", sep = "")
  rowMeans(chunk$data)  # mean signal across voxels at each timepoint of this run
})
cat("Processed", length(run_stats), "runs\n")
```
Each chunk's `$data` contains only that run's timepoints.
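
Detrending shows why run boundaries matter. A single linear trend fitted across the concatenated series would treat the jump between runs as part of the drift. The base R sketch below instead removes a separate intercept and slope within each run for one synthetic voxel. In practice, a step like this could go inside the `lapply()` over `run_chunks`. The simulated drift values and the names `demo_runs`, `demo_index`, and `detrended` are made up for illustration.

```{r runwise-detrend-sketch}
set.seed(7)
demo_runs <- c(6, 4)
demo_index <- rep(seq_along(demo_runs), demo_runs)

# Synthetic voxel: upward drift in run 1, offset and downward drift in run 2
y <- c(0.5 * (1:6), 10 - (1:4)) + rnorm(sum(demo_runs), sd = 0.1)

detrended <- numeric(length(y))
for (r in unique(demo_index)) {
  idx <- which(demo_index == r)
  t_in_run <- seq_along(idx)
  # Fit intercept + slope to this run only and keep the residuals
  detrended[idx] <- residuals(lm(y[idx] ~ t_in_run))
}

# Residuals of a model with an intercept average to zero within each run
round(tapply(detrended, demo_index, mean), 10)
```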

# File-based Datasets

When working with NIfTI files, use `fmri_file_dataset()`. The data remain on disk until they are explicitly accessed.
```{r file-dataset, eval=FALSE}
ds_files <- fmri_file_dataset(
  scans = c("/path/to/run1.nii.gz", "/path/to/run2.nii.gz"),
  mask = "/path/to/mask.nii.gz",
  TR = 2.0,
  run_length = c(180, 180)
)
# Inspect metadata without loading data
print(ds_files)
# Load one run only
run1 <- get_data_matrix(ds_files, run_id = 1)
```
Use `run_id` to load only the run you need, which keeps peak memory usage low.
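
A back-of-the-envelope estimate shows what `run_id` saves. R stores numeric values as 8-byte doubles, so a timepoints-by-voxels matrix needs roughly timepoints × voxels × 8 bytes. The voxel count below is a hypothetical mask size, not a property of any real dataset; substitute the number of in-mask voxels for your own data. With these numbers, one run needs about 69 MiB and both runs about 137 MiB.

```{r memory-estimate}
n_mask_voxels <- 50000          # hypothetical number of in-mask voxels
scan_lengths <- c(180, 180)     # timepoints per run, as in the example above
bytes_per_value <- 8            # R numeric (double) values use 8 bytes

# Approximate size in MiB of a timepoints-by-voxels double matrix
to_mib <- function(n_timepoints) n_timepoints * n_mask_voxels * bytes_per_value / 1024^2

round(c(one_run = to_mib(scan_lengths[1]), all_runs = to_mib(sum(scan_lengths))), 1)
```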

# See Also

- `vignette("architecture-overview")` - Design principles and backend extensibility
- `vignette("h5-backend-usage")` - Efficient HDF5 storage for large datasets
- `vignette("study-level-analysis")` - Multi-subject studies and group analyses

# Session Information
```{r session-info}
sessionInfo()
```