---
title: Getting Started with fmridataset
author: fmridataset Team
date: '`r Sys.Date()`'
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 2
    number_sections: yes
vignette: >
  %\VignetteIndexEntry{Getting Started with fmridataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
params:
  family: red
  preset: homage
css: albers.css
resource_files:
- albers.css
- albers.js
includes:
  in_header: |-
---

```{r setup, include=FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) {
  ggplot2::theme_set(
    albersdown::theme_albers(family = params$family, preset = params$preset)
  )
}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  warning = FALSE,
  message = FALSE
)
library(fmridataset)
```

# Motivation

fMRI analyses involve heterogeneous sources: NIfTI files, BIDS datasets, preprocessed matrices, and HDF5 archives. fmridataset provides a unified interface that abstracts format differences so the same functions work regardless of how data is stored.

# Creating a Dataset

The simplest starting point is a matrix dataset, which wraps an in-memory matrix with temporal metadata.

```{r create-dataset}
set.seed(42)
mat <- matrix(rnorm(150 * 1000), nrow = 150, ncol = 1000)

ds <- matrix_dataset(
  datamat = mat,
  TR = 2.0,
  run_length = c(75, 75)
)

print(ds)
```

The dataset holds 150 timepoints split into two runs of 75, with a 2-second TR.

# Accessing Data

`get_data_matrix()` returns a standard timepoints-by-voxels matrix. You can retrieve all runs or a single run by index.

```{r get-data}
all_data <- get_data_matrix(ds)
cat("Full dimensions:", dim(all_data), "\n")

run1 <- get_data_matrix(ds, run_id = 1)
cat("Run 1 dimensions:", dim(run1), "\n")
```

The returned matrix is always in timepoints x voxels orientation.

# Temporal Structure

Every dataset contains a `sampling_frame` that records run boundaries, TR, and duration.

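
To make these quantities concrete, the base-R sketch below derives run boundaries and a total duration by hand for the two-run layout created earlier. It does not call fmridataset. The variable names (`tr_sec`, `run_len`, and so on) are illustrative only, and the sketch assumes that total duration is simply the number of timepoints multiplied by the TR.

```{r run-boundaries-sketch}
# Hypothetical illustration in base R; none of these names belong to fmridataset
tr_sec  <- 2.0
run_len <- c(75, 75)

# Run index for every timepoint: 75 ones followed by 75 twos
run_index <- rep(seq_along(run_len), times = run_len)

# First and last row of each run in the stacked timepoints-by-voxels matrix
run_start <- cumsum(c(1, head(run_len, -1)))
run_end   <- cumsum(run_len)

# Assumed definition of duration: timepoints multiplied by TR
total_duration <- sum(run_len) * tr_sec

cat("Run 1 rows:", run_start[1], "to", run_end[1], "\n")
cat("Run 2 rows:", run_start[2], "to", run_end[2], "\n")
cat("Total duration:", total_duration, "seconds\n")
```

In practice you would not compute these by hand; the package's accessors read the corresponding values from `ds$sampling_frame`.
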
```{r sampling-frame}
sf <- ds$sampling_frame

cat("TR:", get_TR(sf), "seconds\n")
cat("Runs:", n_runs(sf), "\n")
cat("Run lengths:", get_run_lengths(sf), "timepoints\n")
cat("Total duration:", get_total_duration(sf), "seconds\n")

# First six acquisition times (seconds)
head(samples(sf))
```

`blockids()` maps each timepoint to its run index, which is useful for run-specific indexing.

# Event Tables

Experimental design attaches to the dataset as a data frame in `$event_table`.

```{r event-table}
events <- data.frame(
  onset = c(10, 30, 50, 70, 110, 130, 150, 170),
  duration = 2,
  trial_type = rep(c("faces", "houses"), 4),
  run = c(1, 1, 1, 1, 2, 2, 2, 2)
)

ds$event_table <- events
head(ds$event_table)
```

Event onsets are in seconds and align with the sampling frame's time axis.

# Chunked Processing

For large datasets, `data_chunks()` partitions voxels into memory-manageable pieces without loading the entire matrix at once.

```{r chunked-processing}
chunks <- data_chunks(ds, nchunks = 4)

results <- lapply(chunks, function(chunk) {
  colMeans(chunk$data)
})

voxel_means <- do.call(c, results)
cat("Computed means for", length(voxel_means), "voxels\n")
```

Each element of `results` corresponds to one chunk; `do.call(c, ...)` reassembles them in voxel order.

# Run-wise Processing

Set `runwise = TRUE` to get one chunk per run. This is appropriate for analyses that must respect run boundaries such as detrending or temporal filtering.

```{r runwise-processing}
run_chunks <- data_chunks(ds, runwise = TRUE)

run_stats <- lapply(run_chunks, function(chunk) {
  cat("Run", chunk$chunk_num, ":", nrow(chunk$data), "timepoints\n")
  rowMeans(chunk$data)
})

cat("Processed", length(run_stats), "runs\n")
```

Each chunk's `$data` contains only that run's timepoints.

# File-based Datasets

When working with NIfTI files, use `fmri_file_dataset()`. Data remains on disk until explicitly accessed.

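
Rough arithmetic shows why deferring the load matters. The sketch below uses hypothetical sizes: 180 timepoints per run and 100,000 in-mask voxels, stored as double-precision values at 8 bytes each (the storage R uses for numeric matrices). The real voxel count depends on your mask, and the sketch assumes the loaded data arrives as an ordinary numeric matrix.

```{r memory-estimate}
# Hypothetical sizes for illustration; substitute your own run length and mask size
n_timepoints    <- 180
n_voxels        <- 1e5
bytes_per_value <- 8  # R stores numeric (double) values in 8 bytes

# Approximate in-memory size of the timepoints-by-voxels matrix, in megabytes
one_run_mb  <- n_timepoints * n_voxels * bytes_per_value / 1e6
two_runs_mb <- 2 * one_run_mb

cat("One run:", one_run_mb, "MB\n")
cat("Two runs:", two_runs_mb, "MB\n")
```

Under these assumptions each loaded run costs roughly 144 MB, so reading a single run halves the footprint compared with reading both.
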
```{r file-dataset, eval=FALSE}
ds_files <- fmri_file_dataset(
  scans = c("/path/to/run1.nii.gz", "/path/to/run2.nii.gz"),
  mask = "/path/to/mask.nii.gz",
  TR = 2.0,
  run_length = c(180, 180)
)

# Inspect metadata without loading data
print(ds_files)

# Load one run only
run1 <- get_data_matrix(ds_files, run_id = 1)
```

Use `run_id` to load only the run you need, which keeps peak memory usage low.

# See Also

- `vignette("architecture-overview")` - Design principles and backend extensibility
- `vignette("h5-backend-usage")` - Efficient HDF5 storage for large datasets
- `vignette("study-level-analysis")` - Multi-subject studies and group analyses

# Session Information

```{r session-info}
sessionInfo()
```