After loading a BIDS dataset and finding your scans, the next step in
most fMRI workflows is extracting confound regressors
and event tables for modeling. bidser gives you tidy
tibbles at every level – per-run, per-subject, or dataset-wide – so you
can go straight from BIDS into lm(), lme4, or
whatever modeling framework you prefer.
We’ll create a small mock dataset with two subjects, two tasks, and realistic fMRIPrep-style confound files so that every code chunk in this vignette runs without network access.
proj <- bids_project(temp_dir, fmriprep = TRUE)
proj
#> BIDS Project Summary
#> Project Name: bids_confounds_vignette_15f97c9c33fc
#> Participants (n): 2
#> Participants Source: file
#> Tasks: nback, rest
#> fMRIPrep Derivatives: derivatives/fmriprep
#> Derivative Pipelines: fmriprep
#> Index: enabled
#> Image Types: func, funcprep
#> Modalities: (none)
#> Keys: folder, kind, relative_path, run, subid, suffix, task, type, desc, spaceOur demo has 2 subjects, each with a resting-state and an n-back task run, plus fMRIPrep derivatives containing confound timeseries.
Before reading any files, it helps to know what confound sets bidser ships with. These are curated variable lists that match common denoising strategies from the fMRI literature.
list_confound_sets()
#> set
#> 1 motion6
#> 2 motion12
#> 3 motion24
#> 4 global3
#> 5 9p
#> 6 36p
#> 7 acompcor
#> 8 tcompcor
#> 9 compcor
#> 10 cosine
#> 11 outliers
#> 12 dvars
#> 13 std_dvars
#> 14 raw_dvars
#> 15 non_std_dvars
#> 16 vx_wisestd_dvars
#> 17 fd
#> 18 legacy_default
#> description
#> 1 Rigid-body motion (6 params)
#> 2 Motion + first derivatives (12)
#> 3 Friston 24-parameter motion model (24)
#> 4 CSF, WM, and global signals (3)
#> 5 9-parameter model: motion6 + CSF + WM + GlobalSignal (9)
#> 6 36-parameter model: motion24 + globals with derivs/quadratics (36)
#> 7 Anatomical CompCor components (use n to limit)
#> 8 Temporal CompCor components (use n to limit)
#> 9 Both anatomical and temporal CompCor (use n to limit)
#> 10 Discrete cosine basis regressors
#> 11 FD/RMSD, motion spike regressors, and nonsteady-state outliers
#> 12 Standardized DVARS only (std_dvars)
#> 13 Standardized DVARS only (std_dvars)
#> 14 Raw/non-standardized DVARS (dvars)
#> 15 Explicit non-standardized DVARS (non_std_dvars)
#> 16 Voxel-wise standardized DVARS (vx_wisestd_dvars)
#> 17 Framewise displacement only
#> 18 Legacy read_confounds() default = former DEFAULT_CVARS2 (26 canonical names)Each set is a named collection of confound variable names. You can inspect what a set contains:
# 6 rigid-body motion parameters
confound_set("motion6")
#> [1] "trans_x" "trans_y" "trans_z" "rot_x" "rot_y" "rot_z"
# Friston 24-parameter expansion (motion + derivatives + squares)
length(confound_set("motion24"))
#> [1] 24The sets compose naturally – "36p" is
motion24 plus tissue signals and their expansions, while
"9p" is motion6 plus the three global
signals.
read_confounds() reads fMRIPrep confound TSV files and
returns tidy tibbles. Pass a confound set name to select variables:
conf_nested <- read_confounds(proj, cvars = confound_set("motion6"))
conf_nested
#> # A tibble: 4 × 5
#> # Groups: participant_id, task, run, session [4]
#> participant_id task run session data
#> <chr> <chr> <chr> <chr> <list>
#> 1 01 nback 01 1 <tibble [100 × 6]>
#> 2 01 rest 01 1 <tibble [100 × 6]>
#> 3 02 nback 01 1 <tibble [100 × 6]>
#> 4 02 rest 01 1 <tibble [100 × 6]>The result is a nested tibble – one row per run, with the actual
timeseries tucked inside the data list column. This
structure makes it easy to iterate over runs in a modeling pipeline:
conf_nested |>
unnest(data) |>
select(participant_id, task, run, trans_x, rot_x) |>
head()
#> # A tibble: 6 × 6
#> # Groups: participant_id, task, run, session [1]
#> session participant_id task run trans_x rot_x
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 1 01 nback 01 0.0133 0.00121
#> 2 1 01 nback 01 -0.0172 0.00366
#> 3 1 01 nback 01 -0.0146 0.00413
#> 4 1 01 nback 01 -0.0373 0.00403
#> 5 1 01 nback 01 -0.0468 0.00313
#> 6 1 01 nback 01 -0.0688 0.00309You can also pass a character vector of specific variable names, or use wildcards to match patterns:
# All CompCor components
compcor_conf <- read_confounds(
proj,
subid = "01", task = "rest",
cvars = c("a_comp_cor_*", "t_comp_cor_*")
)
names(compcor_conf$data[[1]])
#> [1] "a_comp_cor_00" "a_comp_cor_01" "a_comp_cor_02" "a_comp_cor_03"
#> [5] "a_comp_cor_04" "a_comp_cor_05" "t_comp_cor_00" "t_comp_cor_01"
#> [9] "t_comp_cor_02"When you just want a single table for modeling, use
nest = FALSE:
conf_flat <- read_confounds(
proj,
cvars = confound_set("motion6"),
nest = FALSE
)
dim(conf_flat)
#> [1] 400 10
head(conf_flat)
#> # A tibble: 6 × 10
#> trans_x trans_y trans_z rot_x rot_y rot_z participant_id task run
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 0.0133 0.00456 -0.0187 0.00121 -5.84e-6 2.30e-4 01 nback 01
#> 2 -0.0172 -0.00769 0.0118 0.00366 -1.26e-3 1.62e-4 01 nback 01
#> 3 -0.0146 0.00699 0.0117 0.00413 -1.05e-3 -8.40e-4 01 nback 01
#> 4 -0.0373 -0.0300 -0.0219 0.00403 -1.60e-3 -1.40e-3 01 nback 01
#> 5 -0.0468 -0.0494 -0.0229 0.00313 -2.23e-3 -1.16e-3 01 nback 01
#> 6 -0.0688 -0.0368 -0.0220 0.00309 -3.83e-3 5.46e-5 01 nback 01
#> # ℹ 1 more variable: session <chr>Each row is one volume, with subject/task/run identifiers alongside the confound values – ready to join with your design matrix.
fMRIPrep writes leading NA values for derivative-type
confounds such as framewise_displacement and
*_derivative1. The default preserves those values for
backward compatibility. For raw nuisance regressors that need to go
directly into a GLM design matrix, set na_action = "zero"
to use the common fMRIPrep convention of replacing those leading values
with zero.
For high-dimensional confound sets, you often want to reduce the
dimensionality rather than include all regressors directly. A
confound_strategy() splits variables into two groups: those
reduced via PCA and those kept as-is.
list_confound_strategies()
#> strategy description
#> 1 pcabasic80 PCA(motion24 + aCompCor + tCompCor + CSF + WM, 80% var) + cosineThe built-in "pcabasic80" strategy applies PCA to motion
and CompCor variables (retaining 80% of variance) while keeping cosine
regressors unchanged:
strat <- confound_strategy("pcabasic80")
conf_pca <- read_confounds(proj, subid = "01", task = "rest", cvars = strat)
names(conf_pca$data[[1]])
#> [1] "PC1" "PC2" "PC3" "PC4" "PC5" "PC6"
#> [7] "PC7" "PC8" "PC9" "PC10" "PC11" "PC12"
#> [13] "PC13" "PC14" "PC15" "PC16" "cosine_00" "cosine_01"
#> [19] "cosine_02"The motion and CompCor variables have been replaced by a handful of principal components, while the cosine regressors pass through untouched.
You can build your own strategy by specifying which variables get PCA-reduced and which are kept raw:
my_strat <- confound_strategy(
pca_vars = c(confound_set("motion24"), confound_set("acompcor")),
raw_vars = c("framewise_displacement", confound_set("cosine")),
npcs = 5
)
conf_custom <- read_confounds(proj, subid = "01", task = "nback", cvars = my_strat)
names(conf_custom$data[[1]])
#> [1] "PC1" "PC2" "PC3"
#> [4] "PC4" "PC5" "framewise_displacement"
#> [7] "cosine_00" "cosine_01" "cosine_02"Five PCs from the motion + aCompCor space, plus FD and cosine regressors kept in their original form.
Event files describe the experimental design – trial onsets,
durations, and conditions. read_events() returns them as
nested tibbles:
events <- read_events(proj, task = "nback")
events
#> # A tibble: 2 × 9
#> # Groups: .task, .session, .run, .subid [2]
#> .task .session .run .subid task session run participant_id data
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 nback <NA> 01 01 nback <NA> 01 01 <tibble>
#> 2 nback <NA> 01 02 nback <NA> 01 02 <tibble>Unnest to get a flat trial table:
trials <- events |>
unnest(data) |>
select(.subid, .task, .run, onset, duration, trial_type, response_time)
head(trials)
#> # A tibble: 6 × 8
#> # Groups: .task, .session, .run, .subid [1]
#> .session .subid .task .run onset duration trial_type response_time
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 <NA> 01 nback 01 6.87 2 0back 0.806
#> 2 <NA> 01 nback 01 7.11 2 0back 0.705
#> 3 <NA> 01 nback 01 7.93 2 0back 0.805
#> 4 <NA> 01 nback 01 8.50 2 2back 0.781
#> 5 <NA> 01 nback 01 10.8 2 0back 0.624
#> 6 <NA> 01 nback 01 17.3 2 2back 0.529trials |>
group_by(.subid, trial_type) |>
summarise(
n_trials = n(),
mean_rt = mean(response_time, na.rm = TRUE),
.groups = "drop"
)
#> # A tibble: 4 × 4
#> .subid trial_type n_trials mean_rt
#> <chr> <chr> <int> <dbl>
#> 1 01 0back 19 0.772
#> 2 01 2back 21 0.809
#> 3 02 0back 18 0.778
#> 4 02 2back 22 0.777If you want everything in one flat table directly, use
load_all_events():
variables_table() pulls together scans, events, and
confounds into a single run-level tibble. This is the tidy bridge
between BIDS and your analysis code:
vars <- variables_table(proj)
vars |> select(.subid, .task, .run, any_of(c("n_scans", "n_events", "n_confound_rows")))
#> # A tibble: 4 × 5
#> .subid .task .run n_scans n_confound_rows
#> <chr> <chr> <chr> <int> <int>
#> 1 01 nback 01 2 100
#> 2 01 rest 01 2 100
#> 3 02 nback 01 2 100
#> 4 02 rest 01 2 100Each row is one run. The scans, events, and
confounds list columns hold the nested data. You can
selectively include only events or only confounds:
vars_events <- variables_table(proj, task = "nback", include = "events")
vars_events |> select(.subid, .task, .run, n_events)
#> # A tibble: 2 × 4
#> .subid .task .run n_events
#> <chr> <chr> <chr> <int>
#> 1 01 nback 01 40
#> 2 02 nback 01 40This structure makes it straightforward to write a per-run analysis loop:
# Use the nback-only table which has both events and confounds columns
vars_nback <- variables_table(proj, task = "nback")
vars_nback |>
filter(n_events > 0, n_confound_rows > 0) |>
rowwise() |>
mutate(
n_conditions = length(unique(events$trial_type)),
mean_fd = mean(confounds$framewise_displacement, na.rm = TRUE)
) |>
ungroup() |>
select(.subid, .task, n_conditions, mean_fd)
#> # A tibble: 2 × 4
#> .subid .task n_conditions mean_fd
#> <chr> <chr> <int> <dbl>
#> 1 01 nback 2 0.192
#> 2 02 nback 2 0.196bids_report() assembles a lightweight QA summary
covering project metadata, compliance checks, pipeline discovery, and
run-level coverage:
report <- bids_report(proj)
report
#> BIDS Report
#> Project: bids_confounds_vignette_15f97c9c33fc
#> Participants source: file
#> Subjects: 2
#> Sessions: 0
#> Tasks: nback, rest
#> Total runs: 2
#> Compliance: passed
#> Index: available
#> Pipelines: fmriprep
#> Indexed runs: 4The underlying data is fully accessible for custom reporting:
rdata <- bids_report_data(proj)
rdata$summary
#> $n_subjects
#> [1] 2
#>
#> $n_sessions
#> NULL
#>
#> $tasks
#> # A tibble: 2 × 2
#> task n_runs
#> <chr> <int>
#> 1 nback 1
#> 2 rest 1
#>
#> $total_runs
#> [1] 2
rdata$run_coverage
#> # A tibble: 4 × 7
#> .subid .session .task .run n_scans n_events n_confound_rows
#> <chr> <chr> <chr> <chr> <int> <int> <int>
#> 1 01 "" nback 01 2 0 100
#> 2 01 "" rest 01 2 0 100
#> 3 02 "" nback 01 2 0 100
#> 4 02 "" rest 01 2 0 100You can use the coverage table to spot missing data at a glance:
rdata$run_coverage |>
mutate(
has_scans = n_scans > 0,
has_events = if ("n_events" %in% names(rdata$run_coverage)) n_events > 0 else FALSE,
has_confounds = n_confound_rows > 0
)
#> # A tibble: 4 × 10
#> .subid .session .task .run n_scans n_events n_confound_rows has_scans
#> <chr> <chr> <chr> <chr> <int> <int> <int> <lgl>
#> 1 01 "" nback 01 2 0 100 TRUE
#> 2 01 "" rest 01 2 0 100 TRUE
#> 3 02 "" nback 01 2 0 100 TRUE
#> 4 02 "" rest 01 2 0 100 TRUE
#> # ℹ 2 more variables: has_events <lgl>, has_confounds <lgl>A typical analysis script combines these pieces into a pipeline. Here is a sketch that extracts design-ready data for each run:
analysis_data <- variables_table(proj, task = "nback") |>
filter(n_events > 0, n_confound_rows > 0) |>
rowwise() |>
mutate(
n_trials = nrow(events),
n_vols = nrow(confounds)
) |>
ungroup() |>
select(.subid, .task, .run, n_trials, n_vols)
analysis_data
#> # A tibble: 2 × 5
#> .subid .task .run n_trials n_vols
#> <chr> <chr> <chr> <int> <int>
#> 1 01 nback 01 40 100
#> 2 02 nback 01 40 100From here you would unnest the events and confounds columns, build
your design matrix with model.matrix(), and fit your model
– all in native R tibbles with no intermediate file-path
bookkeeping.