---
title: Getting Started with bidser
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 2
    css: albers.css
header-includes:
  - ''
params:
  family: red
  preset: homage
resource_files:
  - albers.css
  - albers.js
vignette: |
  %\VignetteIndexEntry{Getting Started with bidser}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```
```{r theme-setup, include = FALSE}
albers_pkg <- "albersdown"
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace(albers_pkg, quietly = TRUE)) {
  # Look up the theme function dynamically so albersdown remains optional
  theme_fn <- get("theme_albers", envir = asNamespace(albers_pkg))
  ggplot2::theme_set(theme_fn(family = params$family, preset = params$preset))
}
suppressPackageStartupMessages({
  library(bidser)
  library(tibble)
  library(dplyr)
  library(tidyr)
  library(gluedown)
})
```
```{r albers-classes, echo=FALSE, results='asis'}
cat(sprintf(
paste0(
''
),
params$family,
params$preset
))
```
## Introduction to bidser
`bidser` is an R package designed for working with neuroimaging data organized according to the [Brain Imaging Data Structure (BIDS)](https://bids.neuroimaging.io/) standard. BIDS is a specification that describes how to organize and name neuroimaging and behavioral data, making datasets more accessible, shareable, and easier to analyze.
### What is BIDS?
BIDS organizes data into a hierarchical folder structure with standardized naming conventions:
- **Subjects** are identified by folders named `sub-XX`
- **Sessions** (optional) are identified by folders named `ses-XX`
- **Data types** are organized into modality-specific folders (`anat`, `func`, `dwi`, etc.)
- **Files** follow specific naming patterns that encode metadata (subject, session, task, run, etc.)
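Here is a minimal base-R sketch that makes the naming pattern concrete. It splits a BIDS file name into its key-value entities, its suffix, and its extension. `split_bids_name()` is a hypothetical helper written only for this illustration and is not a bidser function. It assumes a well-formed name of the form `key-value_key-value_suffix.ext`.

```{r}
# Hypothetical helper (not part of bidser): split a BIDS file name into its
# key-value entities, the suffix, and the extension.
split_bids_name <- function(path) {
  fname <- basename(path)
  # Extension: everything from the first "." onward (keeps ".nii.gz" whole)
  extension <- sub("^[^.]*", "", fname)
  stem <- sub("\\..*$", "", fname)
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # The final underscore-separated part is the suffix (e.g. "bold", "T1w")
  suffix <- parts[length(parts)]
  pairs <- parts[-length(parts)]
  keys <- sub("-.*$", "", pairs)
  values <- sub("^[^-]*-", "", pairs)
  c(stats::setNames(values, keys), suffix = suffix, extension = extension)
}

split_bids_name("sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz")
```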
### What does bidser do?
`bidser` provides tools to:
- **Query and filter** files based on BIDS metadata (subject, task, run, etc.)
- **Read event files** that describe experimental paradigms
- **Work with fMRIPrep derivatives** for preprocessed data
- **Navigate complex BIDS hierarchies** without manually constructing file paths
Let's explore these capabilities using a real BIDS dataset.
## Loading a BIDS Dataset
We'll use the `ds001` dataset from the BIDS examples, which contains data from a "Balloon Analog Risk Task" experiment with 16 subjects.
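```{r setup, include = FALSE}
# Fetch the cached example dataset; stop knitting if it cannot be obtained.
ds001_path <- tryCatch(
  get_example_bids_dataset("ds001"),
  error = function(e) NULL
)
if (is.null(ds001_path)) {
  knitr::knit_exit("Example dataset not available.")
}
```
```{r}
# ds001_path points to a local copy of ds001, e.g. from
# get_example_bids_dataset("ds001")
proj <- bids_project(ds001_path)
proj
```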
The `bids_project` object provides a high-level interface to the dataset. We can see it contains 16 subjects with both anatomical and functional data.
## Basic Dataset Queries
### Dataset Structure
Let's explore the basic structure of this dataset:
```{r}
# Check if the dataset has multiple sessions per subject
sessions(proj)
# Get all participant IDs
participants(proj)
# What tasks are included?
tasks(proj)
# Get a summary of the dataset
bids_summary(proj)
```
### Finding Files by Type
Let's find the most common neuroimaging file types:
```{r}
# Find all anatomical T1-weighted images
t1w_files <- query_files(proj, regex = "T1w\\.nii", full_path = FALSE)
head(t1w_files)
# Find all functional BOLD scans
bold_files <- func_scans(proj, full_path = FALSE)
head(bold_files)
```
### Filtering by Subject and Task
One of bidser's key strengths is filtering data by BIDS metadata:
```{r}
# Get functional scans for specific subjects
sub01_scans <- func_scans(proj, subid = "01")
sub02_scans <- func_scans(proj, subid = "02")
cat("Subject 01:", length(sub01_scans), "scans\n")
cat("Subject 02:", length(sub02_scans), "scans\n")
# Filter by task (ds001 only has one task, but this shows the syntax)
task_scans <- func_scans(proj, task = "balloonanalogrisktask")
cat("Balloon task:", length(task_scans), "scans total\n")
# Combine filters: specific subject AND task
sub01_task_scans <- func_scans(proj, subid = "01", task = "balloonanalogrisktask")
cat("Subject 01, balloon task:", length(sub01_task_scans), "scans\n")
```
### Working with Multiple Subjects
You can use regular expressions to select multiple subjects at once:
```{r}
# Get scans for subjects 01, 02, and 03
first_three_scans <- func_scans(proj, subid = "0[123]")
cat("First 3 subjects:", length(first_three_scans), "scans total\n")
# Get scans for all subjects (equivalent to default)
all_scans <- func_scans(proj, subid = ".*")
cat("All subjects:", length(all_scans), "scans total\n")
```
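The `subid` argument above is interpreted as a regular expression. The base-R sketch below shows how such patterns behave on the 16 zero-padded IDs in ds001. It illustrates regex semantics only and does not show whether `func_scans()` anchors patterns internally.

```{r}
# The 16 zero-padded subject IDs used in ds001
example_ids <- sprintf("%02d", 1:16)
# "0[123]" selects 01, 02, and 03
example_ids[grepl("0[123]", example_ids)]
# An unanchored pattern matches anywhere in the ID: "1" hits 01 and 10-16
example_ids[grepl("1", example_ids)]
# Anchors (^ and $) force the pattern to cover the whole ID
example_ids[grepl("^1[0-2]$", example_ids)]
```

If you mean one specific subject, an anchored pattern is the safer choice.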
## Working with Event Files
Event files describe the experimental paradigm: when stimuli were presented, what responses occurred, and so on. They are essential for task-based fMRI analysis.
```{r}
# Find all event files
event_file_paths <- event_files(proj)
cat("Found", length(event_file_paths), "event files\n")
# Read event data into a nested data frame
events_data <- read_events(proj)
events_data
```
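Each events file is a tab-separated table. Its required `onset` and `duration` columns, both in seconds, place each event in time. The toy file below is invented for illustration. The sketch reads it with base R's `read.delim()` instead of bidser and treats the BIDS `n/a` marker as missing.

```{r}
# A toy events.tsv with invented values, written to a temporary file
toy_events_path <- tempfile(fileext = "_events.tsv")
writeLines(c(
  "onset\tduration\ttrial_type",
  "0.0\t2.0\tpump",
  "4.5\t2.0\tpump",
  "9.0\t1.5\tcash"
), toy_events_path)
# BIDS events files are tab-separated, with "n/a" marking missing values
toy_events <- utils::read.delim(toy_events_path, na.strings = "n/a")
str(toy_events)
# Count trials per condition
table(toy_events$trial_type)
```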
Let's explore the event data structure:
```{r}
# Unnest events for subject 01
first_subject_events <- events_data %>%
  filter(.subid == "01") %>%
  unnest(cols = c(data))
head(first_subject_events)
names(first_subject_events)
```
### Analyzing Event Data
Let's do some basic exploration of the experimental design:
```{r}
# How many trials per subject?
trial_counts <- events_data %>%
  unnest(cols = c(data)) %>%
  group_by(.subid) %>%
  summarise(n_trials = n(), .groups = "drop")
trial_counts
```
## Working with Metadata Sidecars
BIDS stores acquisition metadata in JSON sidecars. `bidser` supports both
direct sidecar reads and inheritance-aware resolution that follows the BIDS
inheritance principle.
```{r}
# Read sidecar rows directly
direct_sidecars <- read_sidecar(
  proj,
  subid = "01",
  task = "balloonanalogrisktask",
  inherit = FALSE
)
nrow(direct_sidecars)
names(direct_sidecars)
```
If you want the effective metadata for a scan after applying inherited
sidecars from parent locations, use `get_metadata()` or set `inherit = TRUE`
in `read_sidecar()`:
```{r}
# Resolve metadata for a specific BOLD file with inheritance
resolved_meta <- get_metadata(proj, bold_files[[1]], inherit = TRUE)
# head() avoids NA padding if fewer than 8 fields are present
head(sort(names(resolved_meta)), 8)
resolved_meta$RepetitionTime
# Inheritance-aware sidecar table
inherited_sidecars <- read_sidecar(
  proj,
  subid = "01",
  task = "balloonanalogrisktask",
  inherit = TRUE
)
if (nrow(inherited_sidecars) > 0) {
  inherited_sidecars %>%
    select(any_of(c("file", "RepetitionTime")))
} else {
  inherited_sidecars
}
```
This is useful when the metadata you need lives in a task- or dataset-level
JSON sidecar instead of the most specific file-level sidecar.
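The inheritance rule can be sketched with plain R lists. Sidecars are applied from the least specific level to the most specific one, and a key defined closer to the data file overrides the same key from higher up. The sidecar contents below are invented. `utils::modifyList()` only mimics the rule here and does not reflect how `get_metadata()` is implemented.

```{r}
# Invented sidecar contents, ordered from least to most specific level
dataset_sidecar <- list(RepetitionTime = 2, TaskName = "balloon analog risk task")
subject_sidecar <- list(SliceTiming = c(0, 1))
file_sidecar    <- list(RepetitionTime = 2.5)
# Merge in order: keys from more specific sidecars overwrite earlier ones,
# while keys they do not define are carried over unchanged
toy_resolved <- Reduce(utils::modifyList,
                       list(dataset_sidecar, subject_sidecar, file_sidecar))
str(toy_resolved)
```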
## Working with Individual Subjects
The `bids_subject()` function provides a convenient interface for working with data from a single subject. It returns a lightweight object with helper functions that automatically filter data for that subject.
```{r}
# Create a subject-specific interface for subject 01
subject_01 <- bids_subject(proj, "01")
# Get all functional scans for this subject
sub01_scans <- subject_01$scans()
cat("Subject 01:", length(sub01_scans), "functional scans\n")
# Read this subject's events; like read_events(), this returns a nested tibble,
# so count its rows rather than its length (which would count columns)
sub01_events <- subject_01$events()
cat("Subject 01:", nrow(sub01_events), "event files\n")
sub01_events
```
This approach is particularly useful when you're doing subject-level analyses:
```{r}
subjects_to_analyze <- c("01", "02", "03")
for (subj_id in subjects_to_analyze) {
  subj <- bids_subject(proj, subj_id)
  scans <- subj$scans()
  events <- subj$events()
  # events() returns a nested tibble, so count rows rather than columns
  cat(sprintf("Subject %s: %d scans, %d event files\n",
              subj_id, length(scans), nrow(events)))
}
```
The subject interface makes it easy to write analysis pipelines that iterate over subjects without manually constructing filters:
```{r}
subject_trial_summary <- lapply(participants(proj)[1:3], function(subj_id) {
  subj <- bids_subject(proj, subj_id)
  event_data <- subj$events()
  n_trials <- if (nrow(event_data) > 0) {
    event_data %>% unnest(cols = c(data)) %>% nrow()
  } else {
    0
  }
  tibble(subject = subj_id, n_trials = n_trials, n_scans = length(subj$scans()))
}) %>% bind_rows()
subject_trial_summary
```
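You can think of the object returned by `bids_subject()` as a list of closures. Each closure remembers the subject ID and filters a shared file list. The sketch below uses a hypothetical `make_subject_view()` helper and toy file names to illustrate that pattern. It is not bidser's implementation.

```{r}
# Hypothetical sketch (not bidser's code): closures capture `files` and
# `subid`, so each helper filters to this subject without extra arguments.
make_subject_view <- function(files, subid) {
  # Anchor on "sub-<id>_" so that "01" cannot match "sub-010"
  pattern <- paste0("^sub-", subid, "_")
  mine <- function() files[grepl(pattern, basename(files))]
  list(
    scans  = function() grep("_bold\\.nii\\.gz$", mine(), value = TRUE),
    events = function() grep("_events\\.tsv$", mine(), value = TRUE)
  )
}

toy_files <- c(
  "sub-01/func/sub-01_task-x_run-01_bold.nii.gz",
  "sub-01/func/sub-01_task-x_run-01_events.tsv",
  "sub-02/func/sub-02_task-x_run-01_bold.nii.gz"
)
view01 <- make_subject_view(toy_files, "01")
view01$scans()
view01$events()
```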
## Advanced Querying with `query_files()`
`query_files()` is the primary file-finding API in bidser. It supports
exact, regex, and glob matching modes, scoped searches across raw data and
derivatives, and can return either paths or a tibble with parsed entities.
### Match Modes
```{r}
# Exact entity matching -- reproducible, no regex surprises
exact_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "01",
  task = "balloonanalogrisktask",
  match_mode = "exact"
)
cat("Exact-match BOLD files:", length(exact_bold), "\n")
# Regex entity matching: select multiple values with patterns
regex_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0[1-3]",
  task = "balloon.*",
  match_mode = "regex"
)
cat("Regex-match BOLD files:", length(regex_bold), "\n")
# Glob matching: shell-style wildcards
glob_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0*",
  match_mode = "glob"
)
cat("Glob-match BOLD files:", length(glob_bold), "\n")
```
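Glob patterns are shorthand for regular expressions. `*` matches any run of characters, and `?` matches exactly one character. Base R's `utils::glob2rx()` performs this translation, so the sketch below shows which zero-padded IDs each glob would select. It only demonstrates the translation and makes no claim about how `query_files()` implements `match_mode = "glob"`.

```{r}
subject_ids <- sprintf("%02d", 1:16)
# Translate glob patterns into regular expressions
utils::glob2rx("0*")
utils::glob2rx("1?")
# "0*" keeps IDs starting with 0; "1?" keeps two-character IDs starting with 1
subject_ids[grepl(utils::glob2rx("0*"), subject_ids)]
subject_ids[grepl(utils::glob2rx("1?"), subject_ids)]
```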
### Entity Presence, Extension, and Datatype Filters
```{r}
# Require the queried entity to actually exist on returned files
task_annotated <- query_files(
  proj,
  regex = "\\.nii\\.gz$",
  task = ".*",
  require_entity = TRUE,
  scope = "raw"
)
cat("Files with an explicit task entity:", length(task_annotated), "\n")
# Filter by extension and datatype directly
json_files <- query_files(proj, extension = "\\.json$")
cat("JSON files:", length(json_files), "\n")
func_niftis <- query_files(proj, datatype = "func", extension = "\\.nii\\.gz$")
cat("Functional NIfTIs:", length(func_niftis), "\n")
```
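A catch-all pattern such as `.*` also matches an empty string. A file with no `task` entity, such as a T1w image, could therefore satisfy it. The base-R sketch below uses invented file names to contrast pattern matching alone with pattern matching plus a presence check. How bidser treats absent entities when `require_entity` is not set is an assumption here.

```{r}
# Invented file names: one without a task entity, one with task "x"
toy_names <- c("sub-01_T1w.nii.gz", "sub-01_task-x_run-01_bold.nii.gz")
has_task <- grepl("_task-", toy_names)
# Extract the task label, using "" when the entity is absent
task_value <- ifelse(has_task,
                     sub("^.*_task-([^_]+)_.*$", "\\1", toy_names),
                     "")
# The pattern alone accepts both files, because ".*" matches ""
grepl("^.*$", task_value)
# Adding a presence check keeps only the file that actually has a task
grepl("^.*$", task_value) & has_task
```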
### Tibble Output with Parsed Entities
```{r}
# Return a tibble instead of paths; it includes all parsed BIDS entities
bold_tbl <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0[1-3]",
  return = "tibble"
)
bold_tbl %>% select(path, subid, task, run)
```
The tibble is sorted deterministically by subject, session, task, run, and path.
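Base R's `order()` can reproduce that ordering. It breaks ties in each key using the next key and places missing values, such as an absent session, last. The table below is a toy example, and the sketch follows the stated sort keys rather than bidser's own code.

```{r}
# Toy file table with invented entities and no sessions
toy_tbl <- data.frame(
  subid   = c("02", "01", "01"),
  session = NA_character_,
  task    = "x",
  run     = c("01", "02", "01"),
  path    = c("sub-02/func/sub-02_task-x_run-01_bold.nii.gz",
              "sub-01/func/sub-01_task-x_run-02_bold.nii.gz",
              "sub-01/func/sub-01_task-x_run-01_bold.nii.gz")
)
# Sort keys in priority order; missing values sort last within each key
sorted_tbl <- toy_tbl[with(toy_tbl, order(subid, session, task, run, path)), ]
sorted_tbl$path
```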
### Scoped Queries and Derivatives
When derivatives are present, `scope` controls where to search and `pipeline`
selects specific derivative pipelines:
```{r derivatives-query, eval = FALSE}
deriv_path <- get_example_bids_dataset("ds000001-fmriprep")
proj_deriv <- bids_project(deriv_path)
# Search only derivatives from a specific pipeline
prep_bold <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  desc = "preproc",
  scope = "derivatives",
  pipeline = "fmriprep",
  match_mode = "exact"
)
# Or use the convenience wrapper
deriv_bold <- derivative_files(proj_deriv, pipeline = "fmriprep",
                               regex = "bold\\.nii\\.gz$")
# Search everywhere and get a tibble with scope/pipeline columns
all_bold <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  scope = "all",
  return = "tibble"
)
```
### Permissive Loading
`bids_project()` can also load real-world datasets that lack `participants.tsv`.
With `strict_participants = FALSE`, subject IDs are inferred from the directory tree:
```{r permissive-project, eval = FALSE}
proj_relaxed <- bids_project(
  "/path/to/bids",
  strict_participants = FALSE
)
# Check where participant IDs came from
participants(proj_relaxed, as_tibble = TRUE)
# See which derivative pipelines were discovered
derivative_pipelines(proj_relaxed)
```
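Inferring subjects from the directory tree amounts to listing the top-level `sub-<label>` folders. The sketch below builds a throwaway tree in `tempdir()` and recovers the labels with base R. It assumes subjects live directly under the dataset root, and it is not bidser's implementation.

```{r}
# Build a throwaway BIDS-like tree that has no participants.tsv
toy_root <- file.path(tempdir(), "toy_bids")
dir.create(file.path(toy_root, "sub-01", "anat"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(toy_root, "sub-02", "func"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(toy_root, "derivatives"), showWarnings = FALSE)
# Keep only top-level folders named "sub-<label>" and strip the prefix
top_dirs <- list.dirs(toy_root, full.names = FALSE, recursive = FALSE)
inferred_ids <- sort(sub("^sub-", "", grep("^sub-", top_dirs, value = TRUE)))
inferred_ids
```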
### Run-Level Variables and Report Data
`variables_table()` returns a run-level tibble that nests the scan inventory,
events, and confounds for each run, ready for downstream R workflows:
```{r variables-report, eval = FALSE}
vars <- variables_table(
  proj_deriv,
  scope = "all",
  pipeline = "fmriprep"
)
vars[, c(".subid", ".task", ".run", "n_scans", "n_events", "n_confound_rows")]
report <- bids_report(proj_deriv, scope = "all", pipeline = "fmriprep")
report
```
### Full File Paths
When you need absolute paths for analysis tools:
```{r}
full_paths <- func_scans(proj, subid = "01", full_path = TRUE)
full_paths
all(file.exists(full_paths))
```
## Working with fMRIPrep Derivatives
`bidser` automatically discovers derivative pipelines under `derivatives/`.
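You can query preprocessed scans, confounds, and masks through `query_files()`.
The example below retrieves preprocessed BOLD scans and reads confound regressors
with `read_confounds()`: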
```{r derivatives, eval = FALSE}
deriv_path <- get_example_bids_dataset("ds000001-fmriprep")
proj_deriv <- bids_project(deriv_path)
# See which pipelines were discovered
derivative_pipelines(proj_deriv)
# Query preprocessed BOLD scans
preproc <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  desc = "preproc",
  scope = "derivatives",
  pipeline = "fmriprep",
  return = "tibble"
)
head(preproc$path)
# Read confound regressors
conf <- read_confounds(proj_deriv, subid = "01")
```
```{r cleanup, include=FALSE}
# Example datasets are cached by get_example_bids_dataset(); leave them in place.
```