--- title: Encoding workflows output: rmarkdown::html_vignette: toc: yes toc_depth: 2.0 css: albers.css includes: in_header: albers-header.html resource_files: - albers.css - albers.js - albers-header.html params: family: red preset: homage vignette: | %\VignetteIndexEntry{Encoding workflows} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) ggplot2::theme_set(albersdown::theme_albers(family = params$family, preset = params$preset)) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE ) ``` ```{r albers-classes, echo=FALSE, results='asis'} cat(sprintf( paste0( '' ), params$family, params$preset )) ``` ```{r library} library(fmrilatent) ``` This vignette walks through a small executable path for temporal, spatial, and spatiotemporal encoding, then gives optional recipes for the heavier graph-based families. For a quick introduction, see `vignette("fmrilatent")`. We use the same small simulated dataset throughout: ```{r shared-data} mask <- array(TRUE, dim = c(4, 4, 4)) mask_vol <- neuroim2::LogicalNeuroVol(mask, neuroim2::NeuroSpace(dim(mask))) set.seed(42) n_time <- 20 n_vox <- sum(mask) X <- matrix(rnorm(n_time * n_vox), nrow = n_time) parcel_map <- as.integer(rep_len(1:4, n_vox)) red <- make_cluster_reduction(mask_vol, parcel_map) ``` ## Compressing the time dimension Temporal specs project each voxel's time series onto a shared set of basis functions. The data matrix X (time x voxels) becomes a basis B (time x k) and loadings L (voxels x k). When B is orthonormal, the loadings simplify to L = t(X) %*% B; otherwise `encode()` uses the least-squares projection implied by B, equivalent to solving with B'B. You choose `k` to control the compression ratio. ### Slepian/DPSS sequences Discrete prolate spheroidal sequences concentrate energy in a frequency band while minimizing spectral leakage. Use them for resting-state data or any analysis focused on a specific frequency range. ```{r slepian-time} spec_slep <- spec_time_slepian(tr = 2, bandwidth = 0.08, k = 4) lat_slep <- encode(X, spec_slep, mask = mask_vol, materialize = "matrix") # 4 basis vectors spanning the 0-0.08 Hz band dim(basis(lat_slep)) # Reconstruction error sqrt(mean((as.matrix(lat_slep) - X)^2)) ``` The `bandwidth` parameter (in Hz) and `tr` (in seconds) together determine the normalized half-bandwidth W = bandwidth * tr. The number of well-concentrated tapers is approximately `2 * n_time * W - 1`; setting `k` lower than this gives heavier compression. ### DCT The discrete cosine transform provides orthogonal frequency components. It is fast and parameter-free beyond choosing `k` — a good default when you have no strong prior about the signal bandwidth. ```{r dct} lat_dct <- encode(X, spec_time_dct(k = 8), mask = mask_vol, materialize = "matrix") dim(basis(lat_dct)) sqrt(mean((as.matrix(lat_dct) - X)^2)) ``` ### B-splines B-splines produce smooth, localized basis functions with controllable flexibility. They are useful for modeling slow drifts or the hemodynamic response, where you want smooth temporal structure without frequency-domain assumptions. ```{r bspline-time} lat_bs <- encode(X, spec_time_bspline(k = 6, degree = 3), mask = mask_vol, materialize = "matrix") dim(basis(lat_bs)) ``` The `degree` parameter controls smoothness (3 = cubic, the default). Set `orthonormalize = TRUE` (the default) for orthonormal columns, which simplifies downstream linear algebra. ### Comparing temporal families All three temporal encodings follow the same pattern. The trade-off is between spectral precision (Slepian), simplicity (DCT), and localization (B-spline): ```{r compare-temporal} rmse <- function(lat) sqrt(mean((as.matrix(lat) - X)^2)) data.frame( family = c("Slepian (k=4)", "DCT (k=8)", "B-spline (k=6)"), k = c(ncol(basis(lat_slep)), ncol(basis(lat_dct)), ncol(basis(lat_bs))), rmse = round(c(rmse(lat_slep), rmse(lat_dct), rmse(lat_bs)), 4) ) ``` ## Compressing the voxel dimension Spatial specs reduce the number of spatial locations by projecting voxels onto a set of spatial atoms. The roles reverse: the loadings L (voxels x k) define the spatial dictionary, and the temporal coefficients become C. With an orthonormal spatial dictionary C = X %*% L; for non-orthonormal dictionaries the encoder uses the corresponding Gram solve. ### Graph Slepian Graph Slepian bases are eigenvectors of the graph Laplacian, spectrally concentrated on spatial clusters. They are the spatial analog of temporal DPSS sequences. ```{r slepian-space, eval = FALSE} lat_sp <- encode(X, spec_space_slepian(k = 3, k_neighbors = 6), mask = mask_vol) ``` The `k` parameter sets the number of components per cluster; `k_neighbors` controls the k-NN graph used for spatial adjacency. ### Heat wavelet Heat wavelets approximate diffusion on the voxel graph at multiple time scales, capturing both local and global spatial structure. ```{r heat-space, eval = FALSE} lat_heat <- encode(X, spec_space_heat(scales = c(1, 2, 4, 8), order = 30, threshold = 1e-6), mask = mask_vol ) ``` The `scales` parameter controls diffusion time scales (larger = smoother); `order` sets the Chebyshev polynomial approximation order; `threshold` truncates small coefficients for sparsity. ### HRBF (hierarchical radial basis functions) HRBF places smooth Gaussian atoms at data-driven centers. This family gives you explicit, interpretable spatial basis functions. ```{r hrbf, eval = FALSE} lat_hrbf <- encode(X, spec_space_hrbf(params = list( sigma0 = 2, levels = 1, radius_factor = 2.5, kernel_type = "gaussian" )), mask = mask_vol ) ``` The `sigma0` parameter controls atom width; `levels` controls hierarchy depth (more levels = finer resolution); `radius_factor` sets atom spacing relative to sigma. ### Wavelet active (CDF 5/3 lifting) The wavelet active encoder applies CDF 5/3 lifting wavelets along each spatial axis, touching only in-mask voxels. It is exact (biorthogonal) and very fast — the best choice for speed-critical pipelines. ```{r wavelet-active, eval = FALSE} lat_wav <- encode(X, spec_space_wavelet_active(levels_space = 2, levels_time = 0, threshold = 0), mask = mask_vol ) # Wavelet active returns an ImplicitLatent; reconstruct with predict() recon_wav <- predict(lat_wav) dim(recon_wav) ``` Setting `threshold > 0` zeroes out small wavelet coefficients for lossy compression. With `threshold = 0` the transform is exact. ### PCA (cluster-local) Parcel-local PCA computes per-cluster SVD to build a block-sparse spatial dictionary. Each parcel contributes `k` components. This is a data-driven approach — the loadings are subject-specific. ```{r pca-space} lat_pca <- encode(X, spec_space_pca(k = 3, center = TRUE), mask = mask_vol, reduction = red) # 4 parcels x 3 components = 12 total spatial atoms dim(loadings(lat_pca)) ``` The `center` parameter subtracts voxel means before PCA (stored in `offset()`); `whiten = TRUE` rescales scores to unit variance. ### Shared parcel templates When you have a parcellation (atlas) and want the **same** spatial dictionary across subjects, use `parcel_basis_template()`. The default basis is Laplacian eigenvectors — smooth functions from the voxel adjacency graph within each parcel. No training data is needed. ```{r parcel-template} # Build template once from the shared reduction. tmpl <- parcel_basis_template(red, basis_pca(k = 1), data = X, center = TRUE) tmpl # Project each subject onto the shared dictionary lvec_s1 <- encode(X, spec_space_parcel(tmpl), mask = mask_vol) # Loadings are identical across subjects; scores and offset vary dim(basis(lvec_s1)) # time x atoms dim(loadings(lvec_s1)) # voxels x atoms ``` The example uses one PCA component per parcel so it is fast and fully reproducible. Geometric templates use the same encoding surface: ```{r parcel-pca-shared, eval = FALSE} atlas <- neuroim2::ClusteredNeuroVol(mask_vol, parcel_map) tmpl_slepian <- parcel_basis_template(atlas, basis_slepian(k = 5)) lvec <- encode(X, spec_space_parcel(tmpl_slepian), mask = mask_vol) ``` Any `lift()`-compatible basis spec works: `basis_slepian()` (default), `basis_pca()`, or `basis_heat_wavelet()`. ## Compressing both dimensions `spec_st()` combines a temporal and a spatial spec into a separable representation. Instead of storing the full reconstructed matrix, it stores a small core tensor and decodes on demand. ```{r separable} lat_st <- encode(X, spec_st( time = spec_time_bspline(k = 4, degree = 3), space = spec_space_hrbf(params = list( sigma0 = 2, levels = 0, radius_factor = 2.5, seed = 42L )) ), mask = mask_vol ) # Full reconstruction full <- predict(lat_st) dim(full) # Partial: only time points 1-5 partial <- predict(lat_st, time_idx = 1:5) dim(partial) # Partial: only voxels in an ROI roi <- array(FALSE, dim = c(4, 4, 4)) roi[1:2, 1:2, 1:2] <- TRUE roi_data <- predict(lat_st, roi_mask = roi) dim(roi_data) ``` Partial decoding is the main advantage of separable encoding. You reconstruct a single time point or a region of interest without paying the cost of full reconstruction. The core tensor is typically tiny (k_time x k_space), so storage is minimal regardless of the original data size. ## The convenience factory `latent_factory()` builds the spec and calls `encode()` in one step. It is handy for quick experiments: ```{r factory} lat_f <- latent_factory("time_dct", x = X, mask = mask_vol, k = 8, materialize = "matrix") dim(basis(lat_f)) ``` Family strings follow an `_` convention. Available strings and their underlying specs: | String | Spec | |--------|------| | `"time_slepian"` | `spec_time_slepian()` | | `"time_dct"` | `spec_time_dct()` | | `"space_slepian"` | `spec_space_slepian()` | | `"space_pca"` | `spec_space_pca()` | | `"space_parcel"` | `spec_space_parcel()` (pass `template =`) | | `"space_heat"` | `spec_space_heat()` | | `"space_wavelet_active"` | `spec_space_wavelet_active()` | | `"st_slepian"` | `spec_st()` with Slepian time + space | | `"st_bspline_hrbf"` | `spec_st()` with B-spline time + HRBF space | | `"hierarchical"` | `spec_hierarchical_template()` | The older `_` names (`"dct_time"`, `"slepian_space"`, …) still work as aliases, so existing code does not break. Extra arguments in `...` are forwarded to the spec constructor. The two-step pattern (spec then `encode()`) gives you more control; the factory is a shortcut when defaults suffice. Note that B-spline temporal encoding is not in the factory — use `spec_time_bspline()` with `encode()` directly. ## Handle vs. matrix materialization The `materialize` argument to `encode()` controls how bases are stored internally: - **`"handle"`** (default): a lazy object that generates basis columns on demand. Saves memory when the basis is large or you only need partial reconstruction. - **`"matrix"`**: an explicit dense matrix. Faster for repeated full reconstructions since the basis is precomputed. ```{r materialize} # Eager matrix lat_eager <- encode(X, spec_time_dct(k = 8), mask = mask_vol, materialize = "matrix") # Lazy handle (default) lat_lazy <- encode(X, spec_time_dct(k = 8), mask = mask_vol, materialize = "handle") # Same numerical result all.equal(as.matrix(lat_eager), as.matrix(lat_lazy)) ``` Both produce identical numerical results. Choose `"handle"` when memory matters; choose `"matrix"` when you will call `as.matrix()` or `series()` repeatedly. ## Benchmarking `benchmark_roundtrip()` times the encode-decode cycle and reports reconstruction error across spatial families. It requires the `bench` package (listed in Suggests). ```{r benchmark, eval = FALSE} res <- benchmark_roundtrip( mask_dims = c(8, 8, 4), n_time = 50, methods = c("slepian_space", "hrbf", "wavelet_active", "bspline_hrbf_st") ) res # Visualize (requires ggplot2) plot_benchmark_roundtrip(res) ``` The output includes median timing (ms) and reconstruction RMSE for each family. Use this to find the best trade-off for your data size and accuracy requirements. ## Quick reference | Family | Stored | Accuracy | Speed | |--------|--------|----------|-------| | Slepian (space) | basis + loadings | exact on span | fast, sparse loadings | | PCA (local) | basis + loadings | optimal k-rank per parcel | fast, block-sparse | | Parcel template (shared) | basis + loadings | projection onto fixed dict | fast, reusable across subjects | | HRBF | basis + loadings | improves with atom count | moderate, dense atoms | | Wavelet active | coefficients | exact (biorthogonal 5/3) | very fast, mask-aware | | Separable (B-spline + HRBF) | core tensor + bases | approximate | fast, separable decode | | Separable (Slepian) | core tensor + bases | exact on span | fast, sparse spatial | | Heat wavelet | basis + loadings | approximate (thresholded) | graph build cost, sparse |