---
title: Encoding workflows
output:
rmarkdown::html_vignette:
toc: yes
toc_depth: 2.0
css: albers.css
includes:
in_header: albers-header.html
resource_files:
- albers.css
- albers.js
- albers-header.html
params:
family: red
preset: homage
vignette: |
%\VignetteIndexEntry{Encoding workflows}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) ggplot2::theme_set(albersdown::theme_albers(family = params$family, preset = params$preset))
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
message = FALSE,
warning = FALSE
)
```
```{r albers-classes, echo=FALSE, results='asis'}
cat(sprintf(
paste0(
''
),
params$family,
params$preset
))
```
```{r library}
library(fmrilatent)
```
This vignette walks through a small executable path for temporal, spatial,
and spatiotemporal encoding, then gives optional recipes for the heavier
graph-based families. For a quick introduction, see `vignette("fmrilatent")`.
We use the same small simulated dataset throughout:
```{r shared-data}
mask <- array(TRUE, dim = c(4, 4, 4))
mask_vol <- neuroim2::LogicalNeuroVol(mask, neuroim2::NeuroSpace(dim(mask)))
set.seed(42)
n_time <- 20
n_vox <- sum(mask)
X <- matrix(rnorm(n_time * n_vox), nrow = n_time)
parcel_map <- as.integer(rep_len(1:4, n_vox))
red <- make_cluster_reduction(mask_vol, parcel_map)
```
## Compressing the time dimension
Temporal specs project each voxel's time series onto a shared set of basis
functions. The data matrix X (time x voxels) becomes a basis B (time x k)
and loadings L (voxels x k). When B is orthonormal, the loadings simplify to
L = t(X) %*% B; otherwise `encode()` uses the least-squares projection implied
by B, equivalent to solving with B'B. You choose `k` to control the compression
ratio.
### Slepian/DPSS sequences
Discrete prolate spheroidal sequences concentrate energy in a frequency band
while minimizing spectral leakage. Use them for resting-state data or any
analysis focused on a specific frequency range.
```{r slepian-time}
spec_slep <- spec_time_slepian(tr = 2, bandwidth = 0.08, k = 4)
lat_slep <- encode(X, spec_slep, mask = mask_vol, materialize = "matrix")
# 4 basis vectors spanning the 0-0.08 Hz band
dim(basis(lat_slep))
# Reconstruction error
sqrt(mean((as.matrix(lat_slep) - X)^2))
```
The `bandwidth` parameter (in Hz) and `tr` (in seconds) together determine
the normalized half-bandwidth W = bandwidth * tr. The number of
well-concentrated tapers is approximately `2 * n_time * W - 1`; setting `k`
lower than this gives heavier compression.
### DCT
The discrete cosine transform provides orthogonal frequency components. It
is fast and parameter-free beyond choosing `k` — a good default when you
have no strong prior about the signal bandwidth.
```{r dct}
lat_dct <- encode(X, spec_time_dct(k = 8), mask = mask_vol, materialize = "matrix")
dim(basis(lat_dct))
sqrt(mean((as.matrix(lat_dct) - X)^2))
```
### B-splines
B-splines produce smooth, localized basis functions with controllable
flexibility. They are useful for modeling slow drifts or the hemodynamic
response, where you want smooth temporal structure without frequency-domain
assumptions.
```{r bspline-time}
lat_bs <- encode(X, spec_time_bspline(k = 6, degree = 3), mask = mask_vol,
materialize = "matrix")
dim(basis(lat_bs))
```
The `degree` parameter controls smoothness (3 = cubic, the default).
Set `orthonormalize = TRUE` (the default) for orthonormal columns, which
simplifies downstream linear algebra.
### Comparing temporal families
All three temporal encodings follow the same pattern. The trade-off is
between spectral precision (Slepian), simplicity (DCT), and localization
(B-spline):
```{r compare-temporal}
rmse <- function(lat) sqrt(mean((as.matrix(lat) - X)^2))
data.frame(
family = c("Slepian (k=4)", "DCT (k=8)", "B-spline (k=6)"),
k = c(ncol(basis(lat_slep)), ncol(basis(lat_dct)), ncol(basis(lat_bs))),
rmse = round(c(rmse(lat_slep), rmse(lat_dct), rmse(lat_bs)), 4)
)
```
## Compressing the voxel dimension
Spatial specs reduce the number of spatial locations by projecting voxels
onto a set of spatial atoms. The roles reverse: the loadings L (voxels x k)
define the spatial dictionary, and the temporal coefficients become
C. With an orthonormal spatial dictionary C = X %*% L; for non-orthonormal
dictionaries the encoder uses the corresponding Gram solve.
### Graph Slepian
Graph Slepian bases are eigenvectors of the graph Laplacian, spectrally
concentrated on spatial clusters. They are the spatial analog of temporal
DPSS sequences.
```{r slepian-space, eval = FALSE}
lat_sp <- encode(X, spec_space_slepian(k = 3, k_neighbors = 6),
mask = mask_vol)
```
The `k` parameter sets the number of components per cluster;
`k_neighbors` controls the k-NN graph used for spatial adjacency.
### Heat wavelet
Heat wavelets approximate diffusion on the voxel graph at multiple time
scales, capturing both local and global spatial structure.
```{r heat-space, eval = FALSE}
lat_heat <- encode(X,
spec_space_heat(scales = c(1, 2, 4, 8), order = 30, threshold = 1e-6),
mask = mask_vol
)
```
The `scales` parameter controls diffusion time scales (larger = smoother);
`order` sets the Chebyshev polynomial approximation order; `threshold`
truncates small coefficients for sparsity.
### HRBF (hierarchical radial basis functions)
HRBF places smooth Gaussian atoms at data-driven centers. This family gives
you explicit, interpretable spatial basis functions.
```{r hrbf, eval = FALSE}
lat_hrbf <- encode(X,
spec_space_hrbf(params = list(
sigma0 = 2, levels = 1,
radius_factor = 2.5, kernel_type = "gaussian"
)),
mask = mask_vol
)
```
The `sigma0` parameter controls atom width; `levels` controls hierarchy
depth (more levels = finer resolution); `radius_factor` sets atom spacing
relative to sigma.
### Wavelet active (CDF 5/3 lifting)
The wavelet active encoder applies CDF 5/3 lifting wavelets along each
spatial axis, touching only in-mask voxels. It is exact (biorthogonal) and
very fast — the best choice for speed-critical pipelines.
```{r wavelet-active, eval = FALSE}
lat_wav <- encode(X,
spec_space_wavelet_active(levels_space = 2, levels_time = 0, threshold = 0),
mask = mask_vol
)
# Wavelet active returns an ImplicitLatent; reconstruct with predict()
recon_wav <- predict(lat_wav)
dim(recon_wav)
```
Setting `threshold > 0` zeroes out small wavelet coefficients for lossy
compression. With `threshold = 0` the transform is exact.
### PCA (cluster-local)
Parcel-local PCA computes per-cluster SVD to build a block-sparse spatial
dictionary. Each parcel contributes `k` components. This is a data-driven
approach — the loadings are subject-specific.
```{r pca-space}
lat_pca <- encode(X, spec_space_pca(k = 3, center = TRUE), mask = mask_vol,
reduction = red)
# 4 parcels x 3 components = 12 total spatial atoms
dim(loadings(lat_pca))
```
The `center` parameter subtracts voxel means before PCA (stored in
`offset()`); `whiten = TRUE` rescales scores to unit variance.
### Shared parcel templates
When you have a parcellation (atlas) and want the **same** spatial
dictionary across subjects, use `parcel_basis_template()`. The default
basis is Laplacian eigenvectors — smooth functions from the voxel adjacency
graph within each parcel. No training data is needed.
```{r parcel-template}
# Build template once from the shared reduction.
tmpl <- parcel_basis_template(red, basis_pca(k = 1), data = X, center = TRUE)
tmpl
# Project each subject onto the shared dictionary
lvec_s1 <- encode(X, spec_space_parcel(tmpl), mask = mask_vol)
# Loadings are identical across subjects; scores and offset vary
dim(basis(lvec_s1)) # time x atoms
dim(loadings(lvec_s1)) # voxels x atoms
```
The example uses one PCA component per parcel so it is fast and fully
reproducible. Geometric templates use the same encoding surface:
```{r parcel-pca-shared, eval = FALSE}
atlas <- neuroim2::ClusteredNeuroVol(mask_vol, parcel_map)
tmpl_slepian <- parcel_basis_template(atlas, basis_slepian(k = 5))
lvec <- encode(X, spec_space_parcel(tmpl_slepian), mask = mask_vol)
```
Any `lift()`-compatible basis spec works: `basis_slepian()` (default),
`basis_pca()`, or `basis_heat_wavelet()`.
## Compressing both dimensions
`spec_st()` combines a temporal and a spatial spec into a separable
representation. Instead of storing the full reconstructed matrix, it stores
a small core tensor and decodes on demand.
```{r separable}
lat_st <- encode(X,
spec_st(
time = spec_time_bspline(k = 4, degree = 3),
space = spec_space_hrbf(params = list(
sigma0 = 2, levels = 0, radius_factor = 2.5, seed = 42L
))
),
mask = mask_vol
)
# Full reconstruction
full <- predict(lat_st)
dim(full)
# Partial: only time points 1-5
partial <- predict(lat_st, time_idx = 1:5)
dim(partial)
# Partial: only voxels in an ROI
roi <- array(FALSE, dim = c(4, 4, 4))
roi[1:2, 1:2, 1:2] <- TRUE
roi_data <- predict(lat_st, roi_mask = roi)
dim(roi_data)
```
Partial decoding is the main advantage of separable encoding. You
reconstruct a single time point or a region of interest without paying the
cost of full reconstruction. The core tensor is typically tiny (k_time x
k_space), so storage is minimal regardless of the original data size.
## The convenience factory
`latent_factory()` builds the spec and calls `encode()` in one step. It is
handy for quick experiments:
```{r factory}
lat_f <- latent_factory("time_dct", x = X, mask = mask_vol, k = 8,
materialize = "matrix")
dim(basis(lat_f))
```
Family strings follow an `_` convention. Available strings and
their underlying specs:
| String | Spec |
|--------|------|
| `"time_slepian"` | `spec_time_slepian()` |
| `"time_dct"` | `spec_time_dct()` |
| `"space_slepian"` | `spec_space_slepian()` |
| `"space_pca"` | `spec_space_pca()` |
| `"space_parcel"` | `spec_space_parcel()` (pass `template =`) |
| `"space_heat"` | `spec_space_heat()` |
| `"space_wavelet_active"` | `spec_space_wavelet_active()` |
| `"st_slepian"` | `spec_st()` with Slepian time + space |
| `"st_bspline_hrbf"` | `spec_st()` with B-spline time + HRBF space |
| `"hierarchical"` | `spec_hierarchical_template()` |
The older `_` names (`"dct_time"`, `"slepian_space"`, …) still
work as aliases, so existing code does not break.
Extra arguments in `...` are forwarded to the spec constructor. The two-step
pattern (spec then `encode()`) gives you more control; the factory is a
shortcut when defaults suffice. Note that B-spline temporal encoding is not
in the factory — use `spec_time_bspline()` with `encode()` directly.
## Handle vs. matrix materialization
The `materialize` argument to `encode()` controls how bases are stored
internally:
- **`"handle"`** (default): a lazy object that generates basis columns on
demand. Saves memory when the basis is large or you only need partial
reconstruction.
- **`"matrix"`**: an explicit dense matrix. Faster for repeated full
reconstructions since the basis is precomputed.
```{r materialize}
# Eager matrix
lat_eager <- encode(X, spec_time_dct(k = 8), mask = mask_vol,
materialize = "matrix")
# Lazy handle (default)
lat_lazy <- encode(X, spec_time_dct(k = 8), mask = mask_vol,
materialize = "handle")
# Same numerical result
all.equal(as.matrix(lat_eager), as.matrix(lat_lazy))
```
Both produce identical numerical results. Choose `"handle"` when memory
matters; choose `"matrix"` when you will call `as.matrix()` or `series()`
repeatedly.
## Benchmarking
`benchmark_roundtrip()` times the encode-decode cycle and reports
reconstruction error across spatial families. It requires the `bench`
package (listed in Suggests).
```{r benchmark, eval = FALSE}
res <- benchmark_roundtrip(
mask_dims = c(8, 8, 4),
n_time = 50,
methods = c("slepian_space", "hrbf", "wavelet_active", "bspline_hrbf_st")
)
res
# Visualize (requires ggplot2)
plot_benchmark_roundtrip(res)
```
The output includes median timing (ms) and reconstruction RMSE for each
family. Use this to find the best trade-off for your data size and accuracy
requirements.
## Quick reference
| Family | Stored | Accuracy | Speed |
|--------|--------|----------|-------|
| Slepian (space) | basis + loadings | exact on span | fast, sparse loadings |
| PCA (local) | basis + loadings | optimal k-rank per parcel | fast, block-sparse |
| Parcel template (shared) | basis + loadings | projection onto fixed dict | fast, reusable across subjects |
| HRBF | basis + loadings | improves with atom count | moderate, dense atoms |
| Wavelet active | coefficients | exact (biorthogonal 5/3) | very fast, mask-aware |
| Separable (B-spline + HRBF) | core tensor + bases | approximate | fast, separable decode |
| Separable (Slepian) | core tensor + bases | exact on span | fast, sparse spatial |
| Heat wavelet | basis + loadings | approximate (thresholded) | graph build cost, sparse |