---
title: 'fmristore: Efficient Storage for Neuroimaging Data'
author: fmristore Package
date: '`r Sys.Date()`'
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{fmristore: Efficient Storage for Neuroimaging Data}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
params:
family: red
css: albers.css
resource_files:
- albers.css
- albers.js
includes:
in_header: |-
---
```{r setup, include = FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) ggplot2::theme_set(albersdown::theme_albers(params$family))
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
message = FALSE,
warning = FALSE,
fig.width = 7,
fig.height = 5
)
library(fmristore)
library(neuroim2)
```
## Why fmristore?
A typical fMRI session produces 4D arrays with dimensions like 64 x 64 x 40 x 200 — over 32 million values per run. Load a few runs into R and you quickly exhaust available memory. `fmristore` solves this by storing neuroimaging data in HDF5 format: your data lives on disk and is loaded on demand. The package also supports clustered (parcellated) storage for ROI analyses, multi-run containers, and latent decompositions for compression.
This vignette walks you through the main storage formats and when to use each one.
## Quick Start: Choose Your Storage Format
```
Your Data
/ | \
/ | \
Regular Clustered Multiple
4D fMRI Data Brain Maps
| | |
v v v
H5NeuroVec H5Parcellated* LabeledVolumeSet
```
## Data Conversion Workflows
### 1. Converting neuroim2 ClusteredNeuroVec
The `ClusteredNeuroVec` class from neuroim2 stores cluster-averaged time series data. Here's the complete workflow:
```{r clustered-neurovec-workflow}
# Create sample data
brain_dims <- c(10, 10, 5)
mask_data <- array(FALSE, brain_dims)
mask_data[3:7, 3:7, 2:4] <- TRUE
mask <- LogicalNeuroVol(mask_data, NeuroSpace(brain_dims))
# Define clusters
n_voxels <- sum(mask)
cluster_ids <- sample(1:3, n_voxels, replace = TRUE)
cvol <- ClusteredNeuroVol(mask, cluster_ids)
# Create time series (T x K matrix)
n_time <- 100
n_clusters <- 3
ts_data <- matrix(rnorm(n_time * n_clusters), nrow = n_time, ncol = n_clusters)
# Create ClusteredNeuroVec
cnvec <- ClusteredNeuroVec(x = ts_data, cvol = cvol)
# Convert to HDF5 (single scan by default)
h5_file <- tempfile(fileext = ".h5")
h5_scan <- as_h5(cnvec, file = h5_file, scan_name = "my_scan")
cat("Conversion result:\n")
cat(" Input:", class(cnvec), "with", nrow(ts_data), "timepoints\n")
cat(" Output:", class(h5_scan), "\n")
cat(" Storage:", round(file.size(h5_file)/1024, 1), "KB\n")
# Access the data
summary_matrix <- as.matrix(h5_scan)
cat(" Retrieved:", dim(summary_matrix)[1], "x", dim(summary_matrix)[2], "matrix\n")
close(h5_scan)
unlink(h5_file)
```
### 2. Converting Regular NeuroVec to HDF5
For standard 4D fMRI data:
```{r neurovec-workflow}
# Create 4D fMRI data
fmri_dims <- c(10, 10, 5, 50) # x, y, z, time
fmri_data <- array(rnorm(prod(fmri_dims)), dim = fmri_dims)
nvec <- NeuroVec(fmri_data, NeuroSpace(fmri_dims))
# Convert to HDF5-backed storage
h5_file <- tempfile(fileext = ".h5")
h5_nvec <- as_h5(nvec, file = h5_file)
cat("Regular NeuroVec conversion:\n")
cat(" Memory usage before:", format(object.size(nvec), units = "KB"), "\n")
cat(" File size:", round(file.size(h5_file)/1024, 1), "KB\n")
cat(" Can access slices:", dim(h5_nvec[,,, 1:10]), "\n")
close(h5_nvec)
unlink(h5_file)
```
### 3. Creating Multi-Scan Experiments
When you have multiple runs with the same clustering:
```{r multi-scan-workflow}
# Create multiple ClusteredNeuroVec objects (e.g., multiple runs)
runs <- list()
for (i in 1:3) {
ts <- matrix(rnorm(n_time * n_clusters), nrow = n_time, ncol = n_clusters)
runs[[i]] <- ClusteredNeuroVec(x = ts, cvol = cvol)
}
# Option 1: Convert first run to multi-scan container
h5_file <- tempfile(fileext = ".h5")
h5_multi <- as_h5(runs[[1]],
file = h5_file,
scan_name = "run1",
as_multiscan = TRUE)
cat("Multi-scan container created:\n")
cat(" Type:", class(h5_multi), "\n")
cat(" Scans:", length(h5_multi@runs), "\n")
# Note: Currently, adding additional scans requires using write_dataset
# with the appropriate file management strategy
close(h5_multi)
unlink(h5_file)
```
## Performance Comparison
Let's compare different storage strategies:
```{r performance-comparison}
# Create test data
dims <- c(64, 64, 40, 200) # Realistic fMRI dimensions
n_voxels_total <- prod(dims[1:3])
n_voxels_brain <- round(n_voxels_total * 0.3) # ~30% brain voxels
# Memory usage estimates
memory_full <- n_voxels_total * dims[4] * 8 / (1024^2) # MB
memory_masked <- n_voxels_brain * dims[4] * 8 / (1024^2) # MB
memory_clustered <- 100 * dims[4] * 8 / (1024^2) # 100 clusters
sizes <- c(memory_full, memory_masked, memory_clustered)
names(sizes) <- c("Full volume", "Masked (30%)", "Clustered (100)")
cat("Compression ratio (full vs clustered):", round(memory_full / memory_clustered, 0), ":1\n")
```
```{r performance-plot, echo=FALSE, fig.cap="Memory requirements shrink dramatically with masking and clustering.", fig.height=3.5}
barplot(sizes,
ylab = "Memory (MB)", col = c("#E8A0A0", "#A0C4E8", "#A0E8A0"),
main = paste("Storage for", paste(dims, collapse = "x"), "fMRI Data"))
```
## Best Practices
### 1. Memory Management
Always close HDF5 objects when you're done:
```{r memory-management, eval=FALSE}
h5_obj <- as_h5(data, file = "output.h5")
# ... work with data ...
close(h5_obj)
```
### 2. Choosing Between Single and Multi-Scan
```{r single-vs-multi, eval=FALSE}
# Single scan (default) - simpler, more efficient for one run
h5_single <- as_h5(cnvec, file = "single.h5")
# Multi-scan - when you need a container for multiple runs
h5_multi <- as_h5(cnvec, file = "multi.h5", as_multiscan = TRUE)
```
### 3. Data Access Patterns
```{r access-patterns, eval=FALSE}
# Efficient: Access contiguous slices from an H5NeuroVec
time_slice <- h5_nvec[, , , 1:10] # Good
# Less efficient: Random access in time
random_times <- h5_nvec[, , , c(1, 50, 25, 75)] # Slower
# For clustered data (H5ParcellatedScanSummary): get summary matrix once
summary_data <- as.matrix(h5_scan) # Efficient
```
## Common Use Cases
### Use Case 1: Single Subject, Multiple Runs
```{r use-case-1}
# Typical fMRI experiment with multiple runs
n_runs <- 4
n_time_per_run <- 150
# Create clustering once (same for all runs)
mask <- LogicalNeuroVol(mask_data, NeuroSpace(brain_dims))
clusters <- ClusteredNeuroVol(mask, cluster_ids)
# Generate and save each run
for (run in 1:n_runs) {
# Simulate run data
ts <- matrix(rnorm(n_time_per_run * n_clusters),
nrow = n_time_per_run, ncol = n_clusters)
cnvec <- ClusteredNeuroVec(x = ts, cvol = cvol)
# Save each run separately (simple approach)
run_file <- tempfile(pattern = paste0("run", run, "_"), fileext = ".h5")
h5_run <- as_h5(cnvec, file = run_file, scan_name = paste0("run", run))
cat("Saved run", run, "as", class(h5_run), "\n")
close(h5_run)
unlink(run_file) # Clean up for example
}
```
### Use Case 2: Storing Subject Maps with LabeledVolumeSet
```{r use-case-2}
# Store activation maps from multiple subjects in a single HDF5 file
n_subjects <- 5
subject_data <- array(rnorm(prod(brain_dims) * n_subjects),
dim = c(brain_dims, n_subjects))
subject_vec <- NeuroVec(subject_data, NeuroSpace(c(brain_dims, n_subjects)))
subject_labels <- paste0("Subject_", sprintf("%02d", 1:n_subjects))
# Write to HDF5 with meaningful labels
h5_file <- tempfile(fileext = ".h5")
h5 <- write_labeled_vec(subject_vec, mask, subject_labels, file = h5_file)
h5$close_all()
# Load and access individual maps by name
lvs <- read_labeled_vec(h5_file)
cat("Stored", length(labels(lvs)), "subject maps:",
paste(labels(lvs), collapse = ", "), "\n")
subj1 <- lvs[["Subject_01"]]
cat("Subject_01 map dimensions:", paste(dim(subj1), collapse = "x"), "\n")
close(lvs)
unlink(h5_file)
```
### Use Case 3: Resting State Connectivity from Parcellated Data
```{r use-case-3}
# Store clustered resting-state time series and compute connectivity
rest_ts <- matrix(rnorm(n_time * n_clusters), nrow = n_time, ncol = n_clusters)
# Add shared signal to clusters 1 and 2 (simulating a network)
shared_signal <- rnorm(n_time, sd = 0.8)
rest_ts[, 1] <- rest_ts[, 1] + shared_signal
rest_ts[, 2] <- rest_ts[, 2] + shared_signal
# Save as parcellated scan via ClusteredNeuroVec
cnvec_rest <- ClusteredNeuroVec(x = rest_ts, cvol = cvol)
h5_file <- tempfile(fileext = ".h5")
h5_rest <- as_h5(cnvec_rest, file = h5_file, scan_name = "rest")
# Retrieve the cluster-average matrix and compute connectivity
rest_matrix <- as.matrix(h5_rest)
connectivity <- cor(rest_matrix)
```
```{r use-case-3-plot, echo=FALSE, fig.cap="Inter-cluster connectivity from parcellated resting-state data.", fig.width=5, fig.height=4}
# Visualize the connectivity matrix
image(1:ncol(connectivity), 1:nrow(connectivity),
connectivity,
main = "Cluster Connectivity",
xlab = "Cluster", ylab = "Cluster",
col = hcl.colors(20, "RdBu", rev = TRUE),
zlim = c(-1, 1))
```
Clusters 1 and 2 show elevated correlation because they share an underlying signal — exactly the kind of structure you would look for in resting-state connectivity analyses.
```{r use-case-3-cleanup, include=FALSE}
close(h5_rest)
unlink(h5_file)
```
## Advanced Features
### Latent Representations
For data compression using basis decompositions:
```{r latent-demo, eval=FALSE}
n_vox <- 5000
n_tp <- 200
n_comp <- 20
basis <- matrix(rnorm(n_tp * n_comp), nrow = n_tp, ncol = n_comp)
loadings <- Matrix::Matrix(rnorm(n_vox * n_comp), nrow = n_vox, ncol = n_comp, sparse = TRUE)
lnv <- LatentNeuroVec(basis = basis,
loadings = loadings,
mask = mask,
space = NeuroSpace(c(brain_dims, n_tp)))
compression_ratio <- (n_vox * n_tp) / (n_tp * n_comp + n_vox * n_comp)
cat("Compression ratio:", round(compression_ratio, 1), ":1\n")
```
### Working with Atlas Labels
For storing collections of named brain maps, see `vignette("LabeledVolumeSet")`:
```{r atlas-demo, eval=FALSE}
atlas_labels <- c("Visual", "Motor", "Default", "Attention", "Limbic")
n_labels <- length(atlas_labels)
atlas_data <- array(rnorm(prod(brain_dims) * n_labels),
dim = c(brain_dims, n_labels))
atlas_vec <- NeuroVec(atlas_data, NeuroSpace(c(brain_dims, n_labels)))
lvs_file <- tempfile(fileext = ".h5")
h5 <- write_labeled_vec(atlas_vec, mask, atlas_labels, file = lvs_file)
h5$close_all()
lvs <- read_labeled_vec(lvs_file)
labels(lvs)
close(lvs)
```
## Troubleshooting
### Common Issues and Solutions
1. **Memory errors with large datasets**
- Use HDF5-backed storage (`as_h5()`)
- Process data in chunks
- Use summary data instead of full voxel data
2. **Slow data access**
- Access contiguous slices when possible
- Use clustered/parcellated format for ROI analyses
- Consider caching frequently accessed data
3. **File size concerns**
- Enable compression: `compression = 4` in write functions
- Use summary data (`H5ParcellatedScanSummary`)
- Consider latent representations for compression
## Next Steps
- **Basic HDF5 storage**: See `vignette("H5Neuro")`
- **Clustered data details**: See `vignette("H5ParcellatedScan")`
- **Multi-scan containers**: See `vignette("H5ParcellatedMultiScan")`
- **Labeled volumes**: See `vignette("LabeledVolumeSet")`
## Format Chooser
| Your data | Best format | Key function |
|:----------|:------------|:-------------|
| Standard 4D fMRI | `H5NeuroVec` | `as_h5()` |
| Clustered single scan | `H5ParcellatedScanSummary` | `as_h5()` or `write_dataset()` |
| Multi-run clustered | `H5ParcellatedMultiScan` | `write_dataset()` |
| Named brain maps | `LabeledVolumeSet` | `write_labeled_vec()` |
| Compressed via PCA/ICA | `LatentNeuroVec` | `LatentNeuroVec()` |