---
title: 'Working with Individual Scans: H5ParcellatedScan and H5ParcellatedScanSummary'
author: fmristore Package
date: '`r Sys.Date()`'
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Working with Individual Scans: H5ParcellatedScan and H5ParcellatedScanSummary}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
params:
family: red
css: albers.css
resource_files:
- albers.css
- albers.js
includes:
in_header: |-
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
message = FALSE,
warning = FALSE,
fig.width = 7,
fig.height = 5
)
if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) ggplot2::theme_set(albersdown::theme_albers(params$family))
```
```{r load-packages}
library(fmristore)
library(neuroim2)
```
## How are individual scans organized?
In the fmristore parcellated data system, **H5ParcellatedMultiScan** serves as a container that holds multiple individual scan objects. Each scan within the container is either:
1. **H5ParcellatedScan**: Full voxel-level time series data
2. **H5ParcellatedScanSummary**: Cluster-averaged time series data
This vignette focuses on working with these **individual scan objects** - how to create them (including from neuroim2's ClusteredNeuroVec objects), access them from the parent H5ParcellatedMultiScan container, and what operations you can perform on each scan type.
### The Data Hierarchy
```
H5ParcellatedMultiScan (Container)
├── Run1 → H5ParcellatedScan (Full voxel data)
├── Run2 → H5ParcellatedScan (Full voxel data)
├── Run3 → H5ParcellatedScanSummary (Cluster averages)
└── Run4 → H5ParcellatedScanSummary (Cluster averages)
```
## Step 1: Create Sample Data Using the New Interface
First, let's create a sample dataset using the modern `write_dataset()` and `read_dataset()` functions:
```{r create-sample-data}
# Create a small brain with realistic structure
brain_dims <- c(10, 10, 5)
brain_space <- NeuroSpace(brain_dims, spacing = c(2, 2, 2))
# Define brain mask (central region)
mask_data <- array(FALSE, brain_dims)
mask_data[4:7, 4:7, 2:4] <- TRUE # Small brain region
mask <- LogicalNeuroVol(mask_data, brain_space)
# Divide brain into 3 clusters (regions)
n_voxels <- sum(mask)
cluster_assignments <- rep(1:3, length.out = n_voxels)
clusters <- ClusteredNeuroVol(mask, cluster_assignments)
cat("Created brain with", n_voxels, "voxels in", max(cluster_assignments), "clusters\n")
print(table(clusters@clusters))
```
```
Brain voxels grouped into spatial clusters:
+-----------+ +-----------+ +-----------+
| Cluster 1 | | Cluster 2 | | Cluster 3 |
| (Visual) | | (Motor) | | (Default) |
| 16 voxels | | 16 voxels | | 16 voxels |
+-----------+ +-----------+ +-----------+
```
### Create Multiple fMRI Runs with Different Characteristics
```{r create-multiple-runs}
# Create 3 different fMRI runs
run_names <- c("task_full", "rest_full", "task_summary")
runs_data <- list()
# Run 1: Task run (full data) - 60 timepoints
n_time1 <- 60
fmri_dims1 <- c(brain_dims, n_time1)
fmri_data1 <- array(rnorm(prod(fmri_dims1)), dim = fmri_dims1)
# Add task activation to cluster 2 (motor region)
cluster_2_voxels <- which(clusters@clusters == 2)
mask_indices <- which(mask, arr.ind = TRUE)
for (v in cluster_2_voxels) {
x <- mask_indices[v, 1]
y <- mask_indices[v, 2]
z <- mask_indices[v, 3]
# Add sinusoidal activation
fmri_data1[x, y, z, ] <- fmri_data1[x, y, z, ] + 0.8 * sin(seq(0, 4 * pi, length.out = n_time1))
}
runs_data[[1]] <- NeuroVec(fmri_data1, NeuroSpace(fmri_dims1))
# Run 2: Rest run (full data) - 50 timepoints
n_time2 <- 50
fmri_dims2 <- c(brain_dims, n_time2)
fmri_data2 <- array(rnorm(prod(fmri_dims2)), dim = fmri_dims2)
# Add resting state connectivity pattern
for (i in 1:3) {
cluster_voxels <- which(clusters@clusters == i)
shared_signal <- rnorm(n_time2, sd = 0.3)
for (v in cluster_voxels) {
x <- mask_indices[v, 1]
y <- mask_indices[v, 2]
z <- mask_indices[v, 3]
fmri_data2[x, y, z, ] <- fmri_data2[x, y, z, ] + shared_signal
}
}
runs_data[[2]] <- NeuroVec(fmri_data2, NeuroSpace(fmri_dims2))
# Run 3: Will be saved as summary data (we'll use run 1's data)
runs_data[[3]] <- runs_data[[1]]
cat(
"Created", length(runs_data), "runs with timepoints:",
paste(sapply(runs_data, function(x) dim(x)[4]), collapse = ", "), "\n"
)
```
### Save Mixed Full and Summary Data
```{r save-mixed-data}
# Save to HDF5 file - mix of full and summary scans
h5_file <- tempfile(fileext = ".h5")
# First, save the two full runs
write_dataset(list(runs_data[[1]], runs_data[[2]]),
file = h5_file,
clusters = clusters,
scan_names = c("task_full", "rest_full"),
summary = FALSE
) # Full voxel data
# Then add a summary run (saved here to a separate file for clarity)
# Note: Appending additional scans to an existing file currently requires
# lower-level helpers such as write_parcellated_experiment_h5().
summary_file <- tempfile(fileext = ".h5")
write_dataset(runs_data[[3]],
file = summary_file,
clusters = clusters,
summary = TRUE,
scan_name = "task_summary"
) # Cluster averages only
cat("Created files:\n")
cat(" Full data file:", round(file.size(h5_file) / 1024, 1), "KB\n")
cat(" Summary data file:", round(file.size(summary_file) / 1024, 1), "KB\n")
```
## Converting neuroim2 ClusteredNeuroVec Objects
The neuroim2 package provides `ClusteredNeuroVec`, a class that stores time series data organized by clusters. This is a perfect match for fmristore's parcellated storage format. Here's how to convert ClusteredNeuroVec objects to HDF5 format:
### Creating a ClusteredNeuroVec
```{r create-clustered-neurovec}
# ClusteredNeuroVec stores cluster-averaged time series
# First, create a ClusteredNeuroVol (spatial clustering)
cvol <- neuroim2::ClusteredNeuroVol(mask = mask, clusters = cluster_assignments)
# Create time series data (time x clusters matrix)
n_time_cnv <- 40
n_clusters_cnv <- max(cluster_assignments)
ts_data <- matrix(rnorm(n_time_cnv * n_clusters_cnv),
nrow = n_time_cnv,
ncol = n_clusters_cnv)
# Add some structure to make clusters distinguishable
for (i in 1:n_clusters_cnv) {
ts_data[, i] <- ts_data[, i] + sin(seq(0, 2*pi, length.out = n_time_cnv) * i)
}
# Create ClusteredNeuroVec
cnvec <- neuroim2::ClusteredNeuroVec(x = ts_data, cvol = cvol)
cat("Created ClusteredNeuroVec:\n")
cat(" Time points:", nrow(cnvec@ts), "\n")
cat(" Clusters:", ncol(cnvec@ts), "\n")
cat(" Data stored as:", class(cnvec@ts), "\n")
```
### Converting to HDF5: Single Scan (Default)
By default, `ClusteredNeuroVec` is treated as a single scan since it doesn't contain run information:
```{r convert-cnvec-single}
# Method 1: Using as_h5() - creates a single scan by default
h5_single <- as_h5(cnvec, file = tempfile(fileext = ".h5"), scan_name = "my_scan")
cat("Created:", class(h5_single), "\n")
# Method 2: Using write_dataset() - explicit single scan
cnvec_file <- tempfile(fileext = ".h5")
h5_single2 <- write_dataset(cnvec,
file = cnvec_file,
scan_name = "task_scan",
as_multiscan = FALSE) # Default is FALSE
cat("Created:", class(h5_single2), "\n")
# The result is an H5ParcellatedScanSummary
summary_mat <- as.matrix(h5_single)
cat("Summary data dimensions:", dim(summary_mat), "\n")
# Clean up
close(h5_single)
close(h5_single2)
```
### Converting to HDF5: Multi-Scan Container
If you want to prepare for adding more scans later, you can create a multi-scan container:
```{r convert-cnvec-multi}
# Create as multi-scan container with one scan
h5_multi <- as_h5(cnvec,
file = tempfile(fileext = ".h5"),
scan_name = "scan_001",
as_multiscan = TRUE) # Explicitly request multi-scan
cat("Created:", class(h5_multi), "\n")
cat("Number of scans:", length(h5_multi@runs), "\n")
cat("First scan type:", class(h5_multi@runs[[1]]), "\n")
# You can add more scans to this container later
# (See H5ParcellatedMultiScan vignette for details)
close(h5_multi)
```
For combining multiple `ClusteredNeuroVec` objects into a multi-scan container, see `vignette("H5ParcellatedMultiScan")`.
## Step 2: Load the Data with read_dataset()
```{r load-data}
# Load the datasets using the new read_dataset() function
full_experiment <- read_dataset(h5_file)
summary_experiment <- read_dataset(summary_file)
cat("Full data experiment:\n")
print(full_experiment)
cat("\nSummary data experiment:\n")
print(summary_experiment)
```
## Step 3: Accessing Individual Scans from the Container
This is the key concept: **H5ParcellatedScan** and **H5ParcellatedScanSummary** objects are accessed from their parent **H5ParcellatedMultiScan** container.
```{r access-individual-scans}
# Method 1: Access by position in @runs slot
task_scan <- full_experiment@runs[[1]] # First scan
rest_scan <- full_experiment@runs[[2]] # Second scan
# Method 2: Access by name if names are available
scan_names_full <- scan_names(full_experiment)
cat("Available scan names in full experiment:", paste(scan_names_full, collapse = ", "), "\n")
if (length(scan_names_full) > 0) {
task_scan_by_name <- full_experiment@runs[[scan_names_full[1]]]
cat("Accessed scan by name:", scan_names_full[1], "\n")
}
# Show what type of scan objects we got
cat("Task scan class:", class(task_scan), "\n")
cat("Rest scan class:", class(rest_scan), "\n")
# Access the summary scan
summary_scan <- summary_experiment@runs[[1]]
cat("Summary scan class:", class(summary_scan), "\n")
```
### Understanding Individual Scan Properties
```{r scan-properties}
# Each individual scan has its own properties
cat("=== Task Scan (H5ParcellatedScan) ===\n")
cat("Dimensions:", paste(dim(task_scan), collapse = " × "), "\n")
cat("Class:", class(task_scan), "\n")
cat("Number of voxels:", dim(task_scan)[1], "\n")
cat("Number of timepoints:", dim(task_scan)[2], "\n\n")
cat("=== Summary Scan (H5ParcellatedScanSummary) ===\n")
summary_matrix <- as.matrix(summary_scan)
cat("Summary dimensions:", paste(dim(summary_matrix), collapse = " × "), "\n")
cat("Class:", class(summary_scan), "\n")
cat("Number of clusters:", ncol(summary_matrix), "\n")
cat("Number of timepoints:", nrow(summary_matrix), "\n\n")
cat("=== Storage Efficiency ===\n")
n_voxels_full <- dim(task_scan)[1] * dim(task_scan)[2]
n_elements_summary <- prod(dim(summary_matrix))
cat("Full scan elements:", n_voxels_full, "\n")
cat("Summary scan elements:", n_elements_summary, "\n")
cat("Compression ratio:", round(n_voxels_full / n_elements_summary, 1), ":1\n")
```
## H5ParcellatedScan: Operations on Individual Full Scans
When working with **H5ParcellatedScan** objects (full voxel data), you can extract time series for specific voxels and perform voxel-level analyses.
### Extract Time Series from Individual Voxels
```{r extract-voxel-series}
# Extract time series for specific voxel indices
voxel_indices <- c(1, 5, 10, 15)
task_voxel_ts <- series(task_scan, i = voxel_indices)
# Show what we extracted
cat("Extracted time series dimensions:", paste(dim(task_voxel_ts), collapse = " × "), "\n")
cat("(rows = timepoints, columns = voxels)\n\n")
# Plot individual voxel time series
matplot(task_voxel_ts[, 1:3],
type = "l", lty = 1, lwd = 2,
main = "Task Scan: Individual Voxel Time Series",
xlab = "Time Point", ylab = "Signal",
col = c("red", "blue", "darkgreen")
)
legend("topright",
legend = paste("Voxel", voxel_indices[1:3]),
col = c("red", "blue", "darkgreen"), lty = 1, lwd = 2
)
```
### Extract Time Series for All Voxels in a Cluster
```{r extract-cluster-voxels}
# Find all voxels in cluster 2 (the motor region with task activation)
cluster_2_voxels <- which(clusters@clusters == 2)
cat("Cluster 2 contains", length(cluster_2_voxels), "voxels\n")
# Extract time series for all voxels in cluster 2
cluster_2_ts <- series(task_scan, i = cluster_2_voxels)
cat("Cluster 2 time series dimensions:", paste(dim(cluster_2_ts), collapse = " × "), "\n")
# Plot the first few voxels from cluster 2 to see the task activation
matplot(cluster_2_ts[, 1:min(4, ncol(cluster_2_ts))],
type = "l", lty = 1,
main = "Task Activation in Cluster 2 (Motor) Voxels",
xlab = "Time Point", ylab = "Signal",
col = rainbow(min(4, ncol(cluster_2_ts)))
)
legend("topright",
legend = paste("Voxel", cluster_2_voxels[1:min(4, ncol(cluster_2_ts))]),
col = rainbow(min(4, ncol(cluster_2_ts))), lty = 1
)
```
### Compare Task vs Rest Scans (Individual Scan Analysis)
```{r compare-scans}
# Extract the same voxel indices from both scans
task_sample <- series(task_scan, i = 1:6)
rest_sample <- series(rest_scan, i = 1:6)
# Calculate variance for each voxel in each scan
task_vars <- apply(task_sample, 2, var)
rest_vars <- apply(rest_sample, 2, var)
# Compare
comparison_df <- data.frame(
Voxel = 1:6,
Task_Variance = task_vars,
Rest_Variance = rest_vars,
Ratio = task_vars / rest_vars
)
print("Variance comparison between task and rest scans:")
print(round(comparison_df, 3))
# Plot comparison
par(mfrow = c(1, 2))
plot(task_vars, rest_vars,
xlab = "Task Variance", ylab = "Rest Variance",
main = "Variance Comparison", pch = 19, col = "blue"
)
abline(0, 1, col = "red", lty = 2)
plot(1:6, comparison_df$Ratio,
type = "b", pch = 19, col = "darkgreen",
xlab = "Voxel Index", ylab = "Task/Rest Variance Ratio",
main = "Variance Ratio (Task/Rest)"
)
abline(h = 1, col = "red", lty = 2)
par(mfrow = c(1, 1))
```
## H5ParcellatedScanSummary: Operations on Individual Summary Scans
When working with **H5ParcellatedScanSummary** objects, you work with cluster-averaged data that's much more compact.
### Extract and Analyze Summary Data
```{r extract-summary-data}
# Get the summary matrix (time × clusters)
summary_matrix <- as.matrix(summary_scan)
cat("Summary matrix dimensions:", paste(dim(summary_matrix), collapse = " × "), "\n")
cat("Timepoints:", nrow(summary_matrix), ", Clusters:", ncol(summary_matrix), "\n\n")
# Show column names if available
if (!is.null(colnames(summary_matrix))) {
cat("Cluster names:", paste(colnames(summary_matrix), collapse = ", "), "\n")
} else {
cat("Using default cluster numbering: 1 to", ncol(summary_matrix), "\n")
}
# Plot cluster averages over time
matplot(summary_matrix,
type = "l", lty = 1, lwd = 2,
main = "Summary Scan: Cluster Average Time Series",
xlab = "Time Point", ylab = "Average Signal",
col = c("red", "blue", "darkgreen")
)
legend("topright",
legend = paste("Cluster", 1:ncol(summary_matrix)),
col = c("red", "blue", "darkgreen"), lty = 1, lwd = 2
)
```
### Compute Cluster-to-Cluster Correlations
```{r cluster-correlations}
# Calculate correlation matrix between clusters
cluster_correlations <- cor(summary_matrix)
cat("Cluster correlation matrix:\n")
print(round(cluster_correlations, 3))
# Visualize correlation matrix
if (requireNamespace("corrplot", quietly = TRUE)) {
corrplot::corrplot(cluster_correlations,
method = "color",
title = "Inter-Cluster Correlations",
mar = c(0, 0, 2, 0)
)
} else {
# Simple heatmap alternative
image(1:ncol(cluster_correlations), 1:nrow(cluster_correlations),
as.matrix(cluster_correlations),
main = "Inter-Cluster Correlations",
xlab = "Cluster", ylab = "Cluster",
col = heat.colors(20)
)
}
```
## Practical Usage Guidelines
### Choose H5ParcellatedScan when you need:
**Individual voxel analysis within clusters:**
```{r practical-full-example, eval=FALSE}
# Example: Find the most variable voxel in each cluster
most_variable_voxels <- sapply(1:3, function(cluster_id) {
cluster_voxels <- which(clusters@clusters == cluster_id)
cluster_ts <- series(task_scan, i = cluster_voxels)
variances <- apply(cluster_ts, 2, var)
cluster_voxels[which.max(variances)]
})
```
**Voxel-wise statistical analyses:**
```{r practical-stats, eval=FALSE}
# Example: Correlate each voxel with a task regressor
task_regressor <- sin(seq(0, 4 * pi, length.out = dim(task_scan)[2]))
all_voxel_ts <- series(task_scan, i = 1:dim(task_scan)[1])
voxel_correlations <- cor(all_voxel_ts, task_regressor)
```
**Spatial pattern analysis within regions:**
```{r practical-spatial, eval=FALSE}
# Example: Identify activation hotspots within a cluster
cluster_1_voxels <- which(clusters@clusters == 1)
cluster_1_ts <- series(task_scan, i = cluster_1_voxels)
cluster_1_activation <- rowMeans(abs(cluster_1_ts)) # Activation magnitude
```
### Choose H5ParcellatedScanSummary when you need:
**Network connectivity analysis:**
```{r practical-connectivity}
# Already computed above - cluster correlation matrix
print("Inter-cluster connectivity strengths:")
diag(cluster_correlations) <- NA # Remove self-correlations
print(round(cluster_correlations, 3))
```
**Efficient group comparisons:**
```{r practical-group-compare}
# Example: Compare average activation levels between conditions
# If you had multiple summary scans, you could do:
cluster_means <- colMeans(summary_matrix)
names(cluster_means) <- paste("Cluster", 1:length(cluster_means))
print("Average activation per cluster:")
print(round(cluster_means, 3))
```
**Memory-efficient temporal analysis:**
```{r practical-temporal}
# Example: Analyze temporal dynamics at cluster level
cluster_peak_times <- apply(summary_matrix, 2, which.max)
names(cluster_peak_times) <- paste("Cluster", 1:length(cluster_peak_times))
print("Peak activation timepoints per cluster:")
print(cluster_peak_times)
```
## Key Individual Scan Operations Summary
| Operation | H5ParcellatedScan | H5ParcellatedScanSummary |
|-----------|-------------------|--------------------------|
| `series(scan, i = indices)` | ✅ Extract voxel time series | ❌ Not available |
| `as.matrix(scan)` | ❌ Not available | ✅ Get cluster × time matrix |
| `dim(scan)` | ✅ [voxels, timepoints] | ❌ Use `dim(as.matrix(scan))` |
| `scan[i, j, k, t]` | ✅ 4D indexing | ❌ Not available |
| Memory efficiency | Lower (full voxel data) | Higher (cluster averages only) |
## Memory and Performance Considerations
```{r final-memory-comparison}
# Compare memory footprint of different scan types
cat("Memory usage comparison:\n")
cat(" Task scan object size:", format(object.size(task_scan), units = "KB"), "\n")
cat(" Summary scan object size:", format(object.size(summary_scan), units = "KB"), "\n")
cat(" Summary matrix size:", format(object.size(summary_matrix), units = "KB"), "\n\n")
# File size comparison
cat("File size comparison:\n")
cat(" Full data file:", round(file.size(h5_file) / 1024, 1), "KB\n")
cat(" Summary data file:", round(file.size(summary_file) / 1024, 1), "KB\n")
```
## Clean Up
```{r cleanup}
# Always close HDF5 connections when done
close(full_experiment)
close(summary_experiment)
# Clean up temporary files
unlink(c(h5_file, summary_file))
```
## Key Takeaways
1. **Individual scan objects** (`H5ParcellatedScan` and `H5ParcellatedScanSummary`) are accessed from **H5ParcellatedMultiScan** containers via `@runs[[index]]` or `@runs[["name"]]`
2. **ClusteredNeuroVec conversion** is now seamlessly supported:
- Use `as_h5()` for quick conversion (defaults to single scan)
- Use `write_dataset()` with `as_multiscan` parameter for more control
- ClusteredNeuroVec naturally maps to H5ParcellatedScanSummary format
3. **H5ParcellatedScan** provides voxel-level access within clusters:
- Use `series(scan, i = indices)` for individual voxel time series
- Supports 4D indexing: `scan[i, j, k, t]`
- Essential for spatial pattern analysis and voxel-wise statistics
4. **H5ParcellatedScanSummary** provides efficient cluster-level analysis:
- Use `as.matrix(scan)` to get time × cluster matrix
- Perfect for connectivity analysis and group comparisons
- Much smaller memory footprint
- Natural output format for ClusteredNeuroVec conversion
5. **Modern workflow** uses `read_dataset()` and `write_dataset()` functions for simplified data I/O with automatic type detection
6. **Choose the right scan type** based on your analysis needs:
- Full scans for detailed within-cluster analysis
- Summary scans for between-cluster connectivity and efficiency
- ClusteredNeuroVec for when you already have cluster-averaged data
## Next Steps
- **For container management**: See `vignette("H5ParcellatedMultiScan")` for working with multiple runs
- **For data creation**: Learn about the new `write_dataset()` interface for saving parcellated data
- **For advanced analysis**: Explore mixed designs with both full and summary scans in the same container