--- title: Getting started with fmrilatent output: rmarkdown::html_vignette: toc: yes toc_depth: 2.0 css: albers.css includes: in_header: albers-header.html resource_files: - albers.css - albers.js - albers-header.html params: family: red preset: homage vignette: | %\VignetteIndexEntry{Getting started with fmrilatent} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("albersdown", quietly = TRUE)) ggplot2::theme_set(albersdown::theme_albers(family = params$family, preset = params$preset)) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE ) ``` ```{r albers-classes, echo=FALSE, results='asis'} cat(sprintf( paste0( '' ), params$family, params$preset )) ``` ```{r library} library(fmrilatent) ``` This is the on-ramp. It shows the minimal `mask` → `encode()` → reconstruct loop and points to deeper vignettes for everything else. | If you want to … | Read | |---|---| | Understand the two latent object types | `vignette("explicit-vs-implicit-latents")` | | Pick a basis family for your data | `vignette("choosing-basis-family")` | | Drive the spec/encoder pipeline programmatically | `vignette("encode-factory")` | | Work with `LatentNeuroVec` objects in detail | `vignette("working-with-latentneurovec")` | | Share a spatial dictionary across subjects | `vignette("shared-spatial-dictionaries")` | | Use the transport / decoder-backed path | `vignette("transport-aware-encoding")` | | Compress a voxel-by-time matrix with BOLDZip-SR | `vignette("boldzip")` | | Use a standalone matrix codec | `vignette("standalone-codecs")` | | Inspect shared structure for BOLDZip-SR | `vignette("shared-structure-boldzip")` | | Benchmark compression vs. fidelity | `vignette("compression-diagnostics")` | ## Why fmrilatent? An fMRI dataset is a 4D array — three spatial dimensions and time. Most of that volume is empty: air, skull, tissue outside the brain. Even within the brain mask, the signal lives in a much lower-dimensional subspace than the raw voxel count suggests. fmrilatent exploits this structure. You choose a basis family, call `encode()`, and get a `LatentNeuroVec` that stores your data as a compact matrix factorization. The original data can be reconstructed on demand — partially or fully — without ever materializing the full dense array. ## A first encoding Compress a small simulated fMRI dataset using a temporal DCT basis: ```{r first-encode} # Build a small brain mask (4 x 4 x 4, all voxels in-mask) mask <- array(TRUE, dim = c(4, 4, 4)) mask_vol <- neuroim2::LogicalNeuroVol(mask, neuroim2::NeuroSpace(dim(mask))) # Simulated data: 20 time points, 64 voxels set.seed(42) X <- matrix(rnorm(20 * sum(mask)), nrow = 20) # Encode with 8 DCT components lat <- encode(X, spec_time_dct(k = 8), mask = mask_vol, materialize = "matrix") lat ``` The `LatentNeuroVec` stores a 20 × 8 temporal basis and 64 × 8 spatial loadings — much smaller than the original 20 × 64 matrix. The savings grow dramatically with real data: a 300-timepoint run over 100,000 voxels shrinks from 30 million values to a basis and loadings totaling a fraction of that. ## Getting data back Reconstruction is straightforward: ```{r reconstruct} # Full matrix (time x voxels) recon <- as.matrix(lat) dim(recon) # How close is the approximation? sqrt(mean((recon - X)^2)) ``` With 8 of 20 possible DCT components, the approximation is lossy. More components capture more variance; fewer give heavier compression. ```{r reconstruct-plot, echo = FALSE, fig.width = 7, fig.height = 3.2, fig.alt = "One voxel time course: original versus 8-component DCT reconstruction."} op <- par(mar = c(4, 4, 1, 1)) plot(X[, 1], type = "l", col = "grey60", lwd = 2, xlab = "time point", ylab = "signal", main = "") lines(recon[, 1], col = "firebrick", lwd = 2) legend("topright", c("original", "reconstructed"), col = c("grey60", "firebrick"), lwd = 2, bty = "n") par(op) ``` The reconstruction (red) tracks the broad shape of the original signal (grey) while smoothing away the high-frequency detail the discarded components carried — the visible gap *is* the compression loss. You can also access data selectively: ```{r access} # Time series at voxel 1 (linear index) ts1 <- series(lat, 1L) head(ts1) # Inspect the latent factors directly dim(basis(lat)) # temporal basis: 20 x 8 dim(loadings(lat)) # spatial loadings: 64 x 8 ``` ## Choosing a basis Every basis family follows the same two-step pattern: build a spec, then call `encode()`. Common choices: ```{r slepian-temporal} # Slepian/DPSS: bandlimited to 0.08 Hz, TR = 2s lat_slep <- encode(X, spec_time_slepian(tr = 2, bandwidth = 0.08, k = 4), mask = mask_vol, materialize = "matrix") dim(basis(lat_slep)) ``` ```{r bspline} # B-spline: 6 smooth cubic functions lat_bs <- encode(X, spec_time_bspline(k = 6, degree = 3), mask = mask_vol, materialize = "matrix") dim(basis(lat_bs)) ``` Spatial bases compress the voxel dimension; spatiotemporal bases (`spec_st`) compress both. See `vignette("choosing-basis-family")` for guidance on picking among them. ```{r spatial, eval = FALSE} # Separable spatiotemporal: B-spline (time) + HRBF (space) lat_st <- encode(X, spec_st( time = spec_time_bspline(k = 4), space = spec_space_hrbf(params = list(sigma0 = 2, levels = 0, radius_factor = 2.5)) ), mask = mask_vol ) # Partial decode: only time points 1-5 partial <- predict(lat_st, time_idx = 1:5) dim(partial) ``` `spec_st` returns an `ImplicitLatent` (decoder-backed) rather than an explicit `LatentNeuroVec`; its `predict()` method accepts `time_idx` and `roi_mask` arguments so you reconstruct only what you need. ### Quick reference | Family | Dimension | Strengths | |--------|-----------|-----------| | Slepian/DPSS | time | Optimal bandlimiting, minimal spectral leakage | | DCT | time | Fast, interpretable frequency components | | B-spline | time | Smooth, localized, adjustable flexibility | | Slepian | space | Graph-spectral concentration on regions | | PCA | space | Data-driven per-parcel components | | Parcel template | space | Shared dictionary across subjects, atlas-based | | Heat wavelet | space | Multiscale graph diffusion | | HRBF | space | Smooth radial basis functions | | Wavelet active | space | Mask-aware CDF 5/3 lifting, very fast | | Separable | time + space | Compact core tensor, partial decoding | ## Two latent representations fmrilatent ships two related but distinct object types: - **`LatentNeuroVec`** — explicit `basis × loadings + offset` factorization; the basis and loadings are matrices (or lazy `BasisHandle` / `LoadingsHandle` references). Most encoders return this. - **`ImplicitLatent`** — decoder-backed; stores coefficients plus a `decoder()` closure. Returned by `spec_st`, `encode_transport`, `encode_awpt`, and the haar-wavelet path. Use `predict()` to materialize. Both share `predict()`, `as.matrix()`, and meta-access generics. The operator-aware transport workflows live entirely on the implicit-latent side; see `vignette("transport-aware-encoding")` for the shared-asset + field-operator pipeline that bridges fmrilatent and downstream modeling. `vignette("explicit-vs-implicit-latents")` is the dedicated walk-through of this distinction — read it next. ## Next steps - `vignette("choosing-basis-family")` — compares basis choices on a small worked example. - `vignette("encode-factory")` — programmatic spec construction and encoder discovery via `register_encoder()` / `list_encoders()`. - `vignette("working-with-latentneurovec")` — object access, slicing, and memory tradeoffs (handles vs. materialized matrices). - `vignette("shared-spatial-dictionaries")` — building a parcel- or atlas-based template once and projecting many subjects onto it. - `vignette("transport-aware-encoding")` — the shared-basis + subject field-operator pipeline that integrates with downstream modeling packages. - `?LatentNeuroVec` documents the class and its methods (`basis()`, `loadings()`, `series()`, `as.matrix()`, `as.array()`). - `?benchmark_roundtrip` compares encoding families on your own data and hardware.