--- title: "Getting started with dipca" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started with dipca} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4, warning = FALSE, message = FALSE ) ``` ## Why dipca? Standard PCA finds directions of maximum variance, but it ignores temporal structure. When your data are multivariate time series, the most *interesting* directions are often those that are **predictable from their own past** -- not merely high-variance. The dipca package provides three methods that extract such temporally structured latent components: | Method | Function | Objective | |--------|----------|-----------| | DiPCA | `dipca()` | Maximize autocovariance of latent scores | | DiCCA | `dicca()` | Maximize canonical correlation between scores and their AR prediction | | DiPLS | `dipls()` | Maximize dynamic cross-covariance between an input block X and output block Y | All three return objects that integrate with the [multivarious](https://bbuchsbaum.github.io/multivarious/) framework, so you get `scores()`, `project()`, `reconstruct()`, `predict()`, and `residuals()` out of the box. ```{r load-packages} library(dipca) library(multivarious) ``` ## Quick example Simulate a multivariate time series with two hidden dynamic components, fit DiPCA, and recover them in a few lines: ```{r quick-example} set.seed(42) N <- 300; m <- 6 # Two latent AR(1) processes with different dynamics t1 <- arima.sim(list(ar = 0.8), n = N) t2 <- arima.sim(list(ar = 0.4), n = N) # Mix into 6 observed variables + noise P <- qr.Q(qr(matrix(rnorm(m * 2), m, 2))) X <- cbind(t1, t2) %*% t(P) + matrix(rnorm(N * m, sd = 0.3), N, m) # Fit DiPCA: extract 2 components with lag order 1 fit <- dipca(X, s = 1, l = 2, n_init = 3) # Recovered scores head(scores(fit)) ``` The fitted object is a `bi_projector` from multivarious. You can project new data, reconstruct, or forecast with familiar generic functions. ## DiPCA: unsupervised dynamic components `dipca()` extracts components that maximize autocovariance -- a balance between capturing variance and being temporally predictable. ```{r dipca-fit} fit_pca <- dipca(X, s = 1, l = 2, n_init = 3, max_iter = 500) ``` **Scores** are the latent time series: ```{r dipca-scores, fig.height=5} T_hat <- scores(fit_pca) par(mfrow = c(2, 1), mar = c(4, 4, 2, 1)) plot(T_hat[, 1], type = "l", col = "steelblue", ylab = "Score", main = "DiPCA Component 1") plot(T_hat[, 2], type = "l", col = "firebrick", ylab = "Score", main = "DiPCA Component 2") ``` **Prediction** uses the fitted VAR model on the latent scores: ```{r dipca-predict} pred <- predict(fit_pca, X) # R-squared for each component r2 <- sapply(1:2, function(j) { ok <- !is.na(pred$scores_hat[, j]) cor(pred$scores[ok, j], pred$scores_hat[ok, j])^2 }) cat("Component R-squared:", round(r2, 3), "\n") ``` **Reconstruction** maps scores back to the original variable space: ```{r dipca-recon} X_hat <- reconstruct(fit_pca) cat("Reconstruction RMSE:", round(sqrt(mean((X - X_hat)^2)), 4), "\n") ``` **Residuals** provide a dynamic whitening filter -- the temporal structure is removed: ```{r dipca-resid} res <- stats::residuals(fit_pca, X) # Compare autocorrelation before and after whitening acf_before <- mean(sapply(1:m, function(j) acf(X[, j], lag.max = 1, plot = FALSE)$acf[2])) acf_after <- mean(sapply(1:m, function(j) { ok <- is.finite(res$e_hat[, j]) if (sum(ok) > 10) acf(res$e_hat[ok, j], lag.max = 1, plot = FALSE)$acf[2] else NA }), na.rm = TRUE) cat("Mean lag-1 ACF -- original:", round(acf_before, 3), " residuals:", round(acf_after, 3), "\n") ``` ## DiCCA: components ordered by predictability `dicca()` extracts components in descending order of predictability (R-squared). This is useful when you want the *most dynamic* features first. ```{r dicca-fit} fit_cca <- dicca(X, s = 1, l = 2, inner = "classic", n_init = 3) cat("DiCCA R-squared per component:", round(fit_cca$R2, 3), "\n") ``` DiCCA also supports compact inner models (`inner = "ar"` or `"arma"`) that fit AR/ARMA models per component instead of a joint VAR: ```{r dicca-ar} fit_ar <- dicca(X, s = 1, l = 2, inner = "ar", n_init = 2) ``` The same `predict()`, `residuals()`, and `reconstruct()` methods work for both classic and compact variants. ## DiPLS: supervised dynamic components When you have both input (X) and output (Y) blocks, `dipls()` finds dynamic latent components that maximize the cross-covariance between X-scores and Y-scores through a lagged inner model. ```{r dipls-fit} # Simulate an output block driven by the first latent component beta_true <- c(0.7, -0.3) u <- stats::filter(t1, beta_true, sides = 1) Y <- cbind(u, 0.5 * u + rnorm(N, sd = 0.2)) Y[is.na(Y)] <- 0 fit_pls <- dipls(X, Y, n_comp = 1, s = 1, mode = "fir", verbose = 0) # Predict Y from X Y_hat <- predict(fit_pls, X) ok <- !is.na(Y_hat[, 1]) cat("Y prediction R-squared:", round(cor(Y[ok, 1], Y_hat[ok, 1])^2, 3), "\n") ``` ## Projecting new data All fitted objects support `project()` to transform new observations into the latent space: ```{r project-new} X_new <- matrix(rnorm(50 * m), 50, m) # DiPCA / DiCCA scores_new <- project(fit_pca, X_new) dim(scores_new) ``` ## Next steps - `vignette("extracting-dynamic-components")` -- a deeper walkthrough with simulated ground-truth data, subspace recovery metrics, and method comparisons. - `?dipca`, `?dicca`, `?dipls` -- full parameter documentation.