---
title: "Explicit vs decoder-backed latents"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Explicit vs decoder-backed latents}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r library}
library(fmrilatent)
```

fmrilatent ships **two latent object types** that share a common interface but store data very differently. Knowing which one you have, and which one a given encoder returns, is the single most useful piece of mental orientation when reading the docs.

## The two tiers

```
┌─────────────────────────────────────────┐
│ Explicit: basis × loadings + offset     │  LatentNeuroVec (S4)
│ matrices, on disk                       │
├─────────────────────────────────────────┤
│ Decoder-backed: coeff + decoder()       │  ImplicitLatent (S3)
│ closure that materializes               │
│ on demand                               │
└─────────────────────────────────────────┘
```

| Property | Explicit (`LatentNeuroVec`) | Decoder-backed (`ImplicitLatent`) |
|---|---|---|
| Class system | S4, inherits `neuroim2::NeuroVec` | S3, plain list |
| Storage | `@basis`, `@loadings`, `@offset` matrices (or lazy handles) | `$coeff`, `$decoder`, `$meta`, `$mask` |
| Reconstruction | `as.matrix(x)`, `series(x, …)` | `predict(x, time_idx, roi_mask)` |
| Latent factors | `basis(x)`, `loadings(x)` | `x$coeff` (heterogeneous) |
| Saved to disk | Matrix bytes | Closure (captures its environment) |
| Typical use | Compact storage of pre-computed factorization | External solver / non-separable codec |

## Which encoders return which?


| Encoder family | Returns |
|---|---|
| `spec_time_dct` / `spec_time_slepian` / `spec_time_bspline` | Explicit `LatentNeuroVec` |
| `spec_space_slepian` / `spec_space_pca` / `spec_space_heat` / `spec_space_hrbf` / `spec_space_wavelet_active` | Explicit `LatentNeuroVec` |
| `spec_space_parcel` (with `parcel_basis_template`) | Explicit `LatentNeuroVec` |
| `spec_st(time = …, space = …)` (separable spatiotemporal) | Decoder-backed `ImplicitLatent` |
| `spec_hierarchical_template` | Explicit `LatentNeuroVec` |
| `encode_transport(...)` | Decoder-backed `ImplicitLatent` |
| `encode_awpt(...)` | Decoder-backed `ImplicitLatent` |
| `encode_operator(...)` | Decoder-backed `ImplicitLatent` |
| `haar_latent(...)` | Decoder-backed `ImplicitLatent` (subclass `HaarLatent`) |

The rule of thumb: if the encoding can be written down as an explicit matrix factorization with fewer components than time points (or than voxels), the encoder produces an explicit object. If the underlying contract requires a non-trivial decoder (separable Kronecker structure, operator transport, lifted wavelets, learned codecs), the encoder produces a decoder-backed object.

## Working with explicit latents

```{r explicit-example}
mask <- array(TRUE, dim = c(4, 4, 4))
mask_vol <- neuroim2::LogicalNeuroVol(mask, neuroim2::NeuroSpace(dim(mask)))

set.seed(7)
X <- matrix(rnorm(20 * sum(mask)), nrow = 20)

lvec <- encode(X, spec_time_dct(k = 6), mask = mask_vol, materialize = "matrix")
class(lvec)
isS4(lvec)

# Direct factor access:
dim(basis(lvec))     # 20 x 6
dim(loadings(lvec))  # 64 x 6

# Reconstruction:
recon <- as.matrix(lvec)
dim(recon)

# Slicing works the same as for a NeuroVec:
ts1 <- series(lvec, 1L)
length(ts1)
```

`LatentNeuroVec` is a subclass of `neuroim2::NeuroVec`, so the standard neuroim2 operations work: `dim()`, `series()`, `as.array()`, `[`, `[[`. `basis()` and `loadings()` give you the latent matrices directly.

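
To make the "basis × loadings + offset" picture concrete, here is a minimal base-R sketch of that factorization. It does not call fmrilatent and is not the package's implementation: the helper `dct_basis()` is hypothetical, and using the per-voxel mean as the offset is an assumption made for illustration. The dimensions match the example above (20 time points, 64 voxels, 6 components).

```r
# Base-R sketch of an explicit factorization; not fmrilatent's own code.
set.seed(7)
n_time <- 20
n_vox <- 64
k <- 6
X <- matrix(rnorm(n_time * n_vox), nrow = n_time)

# Hypothetical helper: orthonormal DCT-II basis, n rows by k columns.
dct_basis <- function(n, k) {
  t_mid <- seq_len(n) - 0.5
  B <- outer(t_mid, 0:(k - 1), function(t, j) cos(pi * j * t / n))
  col_scale <- c(sqrt(1 / n), rep(sqrt(2 / n), k - 1))
  sweep(B, 2, col_scale, "*")
}

offset <- colMeans(X)          # assumed offset: per-voxel mean
Xc <- sweep(X, 2, offset)      # centered data, 20 x 64
B <- dct_basis(n_time, k)      # temporal basis, 20 x 6
L <- crossprod(Xc, B)          # loadings, 64 x 6 (valid because B is orthonormal)

# Reconstruction: basis %*% t(loadings), plus the offset added back per voxel.
recon <- sweep(B %*% t(L), 2, offset, "+")
dim(recon)
```

The whole object is recoverable from three plain arrays (`B`, `L`, `offset`), which is why this tier is cheap to inspect and to store.
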

## Working with decoder-backed latents

```{r implicit-example}
spec_separable <- spec_st(
  time = spec_time_dct(k = 4),
  space = spec_space_hrbf(params = list(sigma0 = 2, levels = 0, radius_factor = 2.5))
)
ilat <- encode(X, spec_separable, mask = mask_vol)
class(ilat)
isS4(ilat)

# Coefficients + decoder, not basis × loadings:
names(ilat)
str(ilat$coeff, max.level = 1)
ilat$meta$family

# Reconstruction goes through predict():
recon_full <- predict(ilat)
dim(recon_full)  # n_time x n_voxels

# Partial decode: only the first 5 time points
recon_part <- predict(ilat, time_idx = 1:5)
dim(recon_part)
```

`predict()` is the universal decoder API for the implicit tier. It accepts `time_idx`, `roi_mask`, and family-specific arguments (`levels_keep` for haar, etc.), and only materializes the slice you ask for.

## Serialization implications

This is the most common gotcha. Both tiers can be saved with `saveRDS()`, but the cost and reproducibility characteristics differ.

```{r serialize-explicit, eval = FALSE}
# Explicit: matrices serialize natively. With handle-backed slots
# (e.g. dct_basis_handle), the @id + @spec are saved and the basis is
# rematerialized on first access in the new session.
saveRDS(lvec, "lvec.rds")
lvec2 <- readRDS("lvec.rds")
identical(as.matrix(basis(lvec)), as.matrix(basis(lvec2)))  # TRUE
```

```{r serialize-implicit, eval = FALSE}
# Decoder-backed: $decoder is a closure. saveRDS captures its
# environment, including any data the closure references. This means:
#   - Self-contained decoders (haar, st-separable) round-trip cleanly.
#   - Decoders that reference large external assets (subject field
#     operators) save a copy of the asset by default.
saveRDS(ilat, "ilat.rds")
ilat2 <- readRDS("ilat.rds")
identical(predict(ilat), predict(ilat2))  # TRUE
```

When in doubt: round-trip through `tempfile()` and check that `predict()` (or `as.matrix()`) returns the same numbers.
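
The closure behaviour can be seen without the package. The sketch below builds a toy stand-in for a decoder-backed object (a plain list with coefficients and a closure) and round-trips it through `tempfile()`. The constructor `make_toy_latent()` and its `dictionary` argument are hypothetical and are not the `ImplicitLatent` API; they only illustrate how `saveRDS()` carries a closure's environment along. The `force()` calls matter: without them the arguments could be stored as unevaluated promises and re-evaluated after loading.

```r
# Toy stand-in for a decoder-backed latent; not the ImplicitLatent constructor.
make_toy_latent <- function(coeff, dictionary) {
  # Evaluate the arguments now so the closure captures values, not promises.
  force(coeff)
  force(dictionary)
  decoder <- function(time_idx = NULL) {
    out <- coeff %*% dictionary
    if (!is.null(time_idx)) out <- out[time_idx, , drop = FALSE]
    out
  }
  list(coeff = coeff, decoder = decoder)
}

set.seed(1)
toy <- make_toy_latent(
  coeff = matrix(rnorm(20 * 4), nrow = 20),
  dictionary = matrix(rnorm(4 * 64), nrow = 4)
)

# Round-trip through a temporary file.
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)
toy2 <- readRDS(path)

# The restored closure decodes to the same numbers ...
same_output <- identical(toy$decoder(1:5), toy2$decoder(1:5))
# ... because its environment, including the dictionary, was saved with it.
captured <- ls(environment(toy2$decoder))
file_bytes <- file.size(path)
unlink(path)
```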
The package test suite has dedicated coverage for this on the explicit side (`test-latent_serialization.R`), and the implicit decoders are exercised indirectly through their family-specific tests.

## Shared structure is orthogonal

Both tiers can participate in the **shared structure protocol** (`R/shared_structure.R`), which lets multiple objects reference the same heavy data (a template basis, a parcel atlas, a precomputed graph) instead of each carrying its own copy. The protocol works through dictionary handles (`BasisHandle` / `LoadingsHandle` for the explicit tier, decoder-side asset references for the implicit tier) plus an in-session shared-reference registry.

Use shared structures when:

- You're encoding many subjects against a common template.
- Multiple `LatentNeuroVec`s in the same session would otherwise duplicate the same dictionary in memory.
- You're building a benchmark and want strict equality of the basis across runs.

See `vignette("shared-spatial-dictionaries")` for the parcel-template walkthrough, and `?fmrilatent_registry_enable` for the in-session cache controls.

## Choosing between tiers

| Choose explicit if … | Choose decoder-backed if … |
|---|---|
| You want fast, predictable matrix access | You need partial decoding or operator transport |
| You'll be slicing voxels or time often | The basis is non-separable or learned |
| You want to inspect / plot the basis directly | The decoder captures domain knowledge (Haar lifting, AWPT) |
| You're storing many objects to disk and want straightforward bytes | You're in a coefficient-space modeling pipeline |

In practice, most users start with explicit (DCT or B-spline temporal encoding) and only reach for the implicit tier when they hit `spec_st`, the transport pipeline, or a wavelet codec.

## Further reading

- `?LatentNeuroVec`, `?ImplicitLatent` for the class contracts.
- `vignette("transport-aware-encoding")`: the implicit tier in depth, including the shared-asset + subject field-operator pipeline.
- `vignette("shared-spatial-dictionaries")`: the shared-structure protocol applied to atlas-based encoders.
- `vignette("compression-diagnostics")`: comparing tiers on the same data for compression vs. fidelity tradeoffs.