---
title: "Getting started with phynd"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with phynd}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(phynd)
```

## Why phynd?

Functional MRI signals are contaminated by physiological noise -- cardiac
pulsation, respiration, and other non-neuronal sources that inflate variance
and reduce sensitivity to true neural effects. Traditional approaches require
external recordings (pulse oximeter, respiratory belt) or explicit nuisance
models, which are not always available.

**phynd** provides fast, *design-free* physiological denoising inspired by
the PHYCAA+ framework (Churchill et al., 2012, 2013). It automatically
separates neuronal from non-neuronal voxels, extracts dynamic noise
components from the non-neuronal subspace, and projects them out -- all
without external physiological recordings or a task design matrix.

The core pipeline runs in seconds on typical fMRI datasets and is
implemented with Rcpp/RcppArmadillo for the performance-critical inner loops.

## Quick example

Generate a toy dataset with known physiological and neuronal sources, denoise
it in one call, and inspect the result:

```{r quick-example}
set.seed(42)

# Simulate: 500 voxels, 200 time points, 20% non-neuronal
X <- phynd:::simulate_phy_data(n_vox = 500, n_time = 200, nn_frac = 0.20)

# Denoise
result <- fast_phy_denoise(
  X, tr = 2.0,
  stability = "none",
  max_passes = 1
)

# What came back?
names(result)
```
```{r quick-dims}
dim(result$x_clean)
n_selected <- ncol(result$components_selected)
n_selected
```

`x_clean` is the denoised data matrix in the same orientation as the input.
`components_selected` holds the nuisance regressors that were identified and
removed. On this simulated example you should typically see a nonzero
selection count. If you see zero selected components on your own data, try
`ratio_thresh = 0.9` or set `stability = "none"` during initial exploration.

## How the pipeline works

`fast_phy_denoise()` proceeds through four stages:

1. **Build a non-neuronal weight map (wNN)** -- each voxel gets a weight
   between 0 (likely neuronal) and 1 (likely non-neuronal), based on the
   energy profile of its temporal differences.

2. **Low-rank SVD** -- the wNN-weighted data is reduced to a compact
   subspace via truncated SVD.

3. **Dynamic component extraction** -- temporal components with high
   lag-one predictability are extracted. By default this uses a fast
   canonical autocorrelation (CAA) method; optionally DiCCA or DiPCA
   can be used instead.

4. **Score and project out** -- each candidate component is scored by how
   much variance it explains in non-neuronal versus neuronal voxels.
   Components with a high NN/NT ratio are identified as physiological
   noise and regressed out.

Let's walk through these stages.

### Stage 1: The wNN map

The wNN map is also available standalone via `compute_wNN_diff_energy()`:

```{r wnn-map}
wnn_result <- compute_wNN_diff_energy(X)
hist(wnn_result$wNN, breaks = 30, main = "wNN distribution",
     xlab = "wNN (0 = non-neuronal, 1 = neuronal)", col = "steelblue")
```

Voxels near 0 are flagged as non-neuronal (high temporal derivative energy);
voxels near 1 are likely neuronal. The ramp between `delta_nt` and `delta_nn`
provides a smooth transition.

### Stage 2-3: Component extraction

The `extractor` argument controls which method finds dynamic components:

- **`"caa"`** (default) -- canonical autocorrelation, fast and
  dependency-free.
- **`"dicca"`** or **`"dipca"`** -- uses the `dipca` package for
  more sophisticated dynamic component analysis. Requires installing
  `dipca`.

```{r component-table}
result$component_table[, c("component", "ratio_nn_nt", "predictability", "selected")]
```

Components with `ratio_nn_nt` above the threshold (default 1.0) are
selected as nuisance regressors.

### Stage 4: Cleaned output

You can verify the denoising effect with `compute_design_free_qc()`:

```{r qc}
qc <- compute_design_free_qc(
  x_raw   = X,
  x_clean = result$x_clean,
  wNN     = result$wNN,
  component_table = result$component_table
)

fmt_qc_num <- function(x) if (is.na(x)) "NA" else if (abs(x) < 1e-3) sprintf("%.3e", x) else sprintf("%.4f", x)

cat(sprintf("tSNR in neuronal voxels:  %s -> %s\n",
            fmt_qc_num(qc$tsnr_nt_before), fmt_qc_num(qc$tsnr_nt_after)))
cat(sprintf("NT median tSNR delta:     %+0.3e\n", qc$tsnr_nt_delta))
cat(sprintf("NN variance reduction:    %.0f%%\n",
            qc$nn_variance_reduction_frac * 100))
```

Good denoising preserves (or improves) tSNR in neuronal voxels while
reducing variance in non-neuronal voxels.

## Tuning parameters

The most important knobs:

| Parameter | Default | Effect |
|---|---|---|
| `pca_rank` | 100 | Dimension of the low-rank subspace. Larger values capture more variance but cost more compute. |
| `ratio_thresh` | 1.0 | NN/NT ratio cutoff. Raise to be more conservative (fewer components removed). |
| `extractor` | `"caa"` | Component extraction method. `"dicca"` and `"dipca"` may be more sensitive but require the `dipca` package. |
| `stability` | `"none"` | Set to `"half"` to enable split-half stability filtering, which discards components that are not reproducible across data halves. |
| `max_passes` | 1 | Number of iterative denoising passes. A second pass can catch residual noise. |
| `svd_engine` | `"auto"` | Uses randomized SVD (`rsvd`) when rank-truncated decomposition is needed; falls back to base `svd` for full-rank decomposition. |

```{r tuning-example}
result_strict <- fast_phy_denoise(
  X, tr = 2.0,
  ratio_thresh = 1.5,
  stability = "half",
  stability_thresh = 0.3
)

cat(sprintf("Default: %d components removed\n",
            ncol(result$components_selected)))
cat(sprintf("Strict:  %d components removed\n",
            ncol(result_strict$components_selected)))
```

## CompCor baseline

phynd also provides `compcor_denoise()`, implementing both aCompCor and
tCompCor as comparison baselines:

```{r compcor}
cc <- compcor_denoise(X, tr = 2.0, mode = "tcompcorr", n_comp = 5)

qc_cc <- compute_design_free_qc(
  x_raw   = X,
  x_clean = cc$x_clean,
  wNN     = result$wNN
)

cat(sprintf("tCompCor tSNR (NT): %.2f -> %.2f\n",
            qc_cc$tsnr_nt_before, qc_cc$tsnr_nt_after))
```

**Compatibility note:** this is a CompCor-style baseline for internal
comparison, not a drop-in reproduction of fMRIPrep confounds. Differences can
include preprocessing before CompCor (for example high-pass/censoring),
anatomical mask construction details, and component retention policy
(fixed `n_comp` here).

If you want a closer preprocessing analogue to common fMRIPrep-style usage,
enable DCT pre-highpass with `pre_highpass = "dct"` and tune `highpass_hz`.

This lets you compare phynd's adaptive approach against fixed-rank CompCor
on the same data.

## Diagnostics and timing

When `return_diagnostics = TRUE` (the default), the result includes detailed
timing information:

```{r timing}
result$diagnostics$timings$total_seconds
```

For systematic benchmarking across data sizes and SVD engines, see
`benchmark_fast_phy_denoise()`.

## Saving QC artifacts

`write_qc_artifact()` exports QC summaries for downstream reporting:

```{r write-qc, eval = FALSE}
write_qc_artifact(qc, "qc_summary.json")
write_qc_artifact(qc, "qc_summary.csv")
write_qc_artifact(qc, "qc_summary.rds")
```

## Next steps

- `?fast_phy_denoise` -- full parameter documentation
- `?compcor_denoise` -- CompCor baseline details
- `?compute_wNN_diff_energy` -- standalone wNN map construction
- `?compute_design_free_qc` -- QC metric definitions

## References

Churchill, N. W., Yourganov, G., Spring, R., Rasmussen, P. M., Lee, W.,
Ween, J. E., & Strother, S. C. (2012). PHYCAA: Data-driven measurement and
removal of physiological noise in BOLD fMRI. *NeuroImage*, 59(2), 1299--1314.
doi: [10.1016/j.neuroimage.2011.08.021](https://doi.org/10.1016/j.neuroimage.2011.08.021)

Churchill, N. W., & Strother, S. C. (2013). PHYCAA+: An optimized, adaptive
procedure for measuring and controlling physiological noise in BOLD fMRI.
*NeuroImage*, 82, 306--325.
doi: [10.1016/j.neuroimage.2013.05.102](https://doi.org/10.1016/j.neuroimage.2013.05.102)