bidser is an R package designed for working with
neuroimaging data organized according to the Brain Imaging Data Structure
(BIDS) standard. BIDS is a specification that describes how to
organize and name neuroimaging and behavioral data, making datasets more
accessible, shareable, and easier to analyze.
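BIDS organizes data into a hierarchical folder structure with standardized naming conventions: subject directories (sub-XX), optional session directories (ses-XX), and data-type folders (anat, func, dwi, etc.).
bidser provides tools to query and read datasets organized this way.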
Let’s explore these capabilities using a real BIDS dataset.
We’ll use the ds001 dataset from the BIDS examples,
which contains data from a “Balloon Analog Risk Task” experiment with 16
subjects.
proj
#> BIDS Project Summary
#> Project Name: bids_example_ds001
#> Participants (n): 16
#> Participants Source: file
#> Tasks: balloonanalogrisktask
#> Index: enabled
#> Image Types: func, anat
#> Modalities: (none)
#> Keys: folder, kind, relative_path, run, subid, suffix, task, type
The bids_project object provides a high-level interface
to the dataset. We can see it contains 16 subjects with both anatomical
and functional data.
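The keys listed in the summary correspond to the entities encoded in BIDS file names. As a rough illustration of that naming convention, the base R sketch below splits a file name into key-value entities, a suffix, and an extension. The helper `parse_bids_name()` is hypothetical and is not bidser's parser; it also keeps the raw BIDS key `sub`, whereas bidser reports the subject as `subid`.

```r
# Hypothetical helper (not part of bidser): split a BIDS file name into
# its key-value entities, suffix, and extension using base R only.
parse_bids_name <- function(path) {
  fname <- basename(path)
  # The extension is everything from the first dot onward
  ext <- sub("^[^.]*", "", fname)
  stem <- sub("\\..*$", "", fname)
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # The last underscore-separated chunk is the suffix (e.g. "bold", "T1w")
  suffix <- parts[length(parts)]
  kv <- parts[-length(parts)]
  keys <- sub("-.*$", "", kv)
  vals <- sub("^[^-]*-", "", kv)
  c(as.list(setNames(vals, keys)), list(suffix = suffix, extension = ext))
}

ent <- parse_bids_name(
  "sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz"
)
str(ent)
```

Splitting on underscores and hyphens is enough for well-formed names like the ones in ds001; a production parser would also validate entity order and allowed suffixes.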
Let’s explore the basic structure of this dataset:
# Check if the dataset has multiple sessions per subject
sessions(proj)
#> NULL
# Get all participant IDs
participants(proj)
#> [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12" "13" "14" "15"
#> [16] "16"
# What tasks are included?
tasks(proj)
#> [1] "balloonanalogrisktask"
# Get a summary of the dataset
bids_summary(proj)
#> $n_subjects
#> [1] 16
#>
#> $n_sessions
#> NULL
#>
#> $tasks
#> # A tibble: 1 × 2
#> task n_runs
#> <chr> <int>
#> 1 balloonanalogrisktask 3
#>
#> $total_runs
#> [1] 3
Let’s find the most common neuroimaging file types:
# Find all anatomical T1-weighted images
t1w_files <- query_files(proj, regex = "T1w\\.nii", full_path = FALSE)
head(t1w_files)
#> [1] "sub-01/anat/sub-01_T1w.nii.gz" "sub-02/anat/sub-02_T1w.nii.gz"
#> [3] "sub-03/anat/sub-03_T1w.nii.gz" "sub-04/anat/sub-04_T1w.nii.gz"
#> [5] "sub-05/anat/sub-05_T1w.nii.gz" "sub-06/anat/sub-06_T1w.nii.gz"
# Find all functional BOLD scans
bold_files <- func_scans(proj, full_path = FALSE)
head(bold_files)
#> [1] "sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz"
#> [2] "sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz"
#> [3] "sub-01/func/sub-01_task-balloonanalogrisktask_run-03_bold.nii.gz"
#> [4] "sub-02/func/sub-02_task-balloonanalogrisktask_run-01_bold.nii.gz"
#> [5] "sub-02/func/sub-02_task-balloonanalogrisktask_run-02_bold.nii.gz"
#> [6] "sub-02/func/sub-02_task-balloonanalogrisktask_run-03_bold.nii.gz"
One of bidser’s key strengths is filtering data by BIDS metadata:
# Get functional scans for specific subjects
sub01_scans <- func_scans(proj, subid = "01")
sub02_scans <- func_scans(proj, subid = "02")
cat("Subject 01:", length(sub01_scans), "scans\n")
#> Subject 01: 3 scans
cat("Subject 02:", length(sub02_scans), "scans\n")
#> Subject 02: 3 scans
# Filter by task (ds001 only has one task, but this shows the syntax)
task_scans <- func_scans(proj, task = "balloonanalogrisktask")
cat("Balloon task:", length(task_scans), "scans total\n")
#> Balloon task: 48 scans total
# Combine filters: specific subject AND task
sub01_task_scans <- func_scans(proj, subid = "01", task = "balloonanalogrisktask")
cat("Subject 01, balloon task:", length(sub01_task_scans), "scans\n")
#> Subject 01, balloon task: 3 scans
You can use regular expressions to select multiple subjects at once:
# Get scans for subjects 01, 02, and 03
first_three_scans <- func_scans(proj, subid = "0[123]")
cat("First 3 subjects:", length(first_three_scans), "scans total\n")
#> First 3 subjects: 9 scans total
# Get scans for all subjects (equivalent to default)
all_scans <- func_scans(proj, subid = ".*")
cat("All subjects:", length(all_scans), "scans total\n")
#> All subjects: 48 scans total
Event files describe the experimental paradigm: when stimuli were presented, what responses occurred, etc. This is crucial for task-based fMRI analysis.
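To make the layout of an event file concrete, here is a minimal base R sketch that reads a small tab-separated table with the `onset`, `duration`, and `trial_type` columns used in BIDS events files. The values and trial labels are made up for illustration and are not taken from ds001.

```r
# Toy events table with made-up values (not taken from ds001).
tsv <- paste(
  "onset\tduration\ttrial_type",
  "0.5\t1.0\tpumps",
  "4.0\t1.0\tpumps",
  "8.5\t1.0\tcash",
  sep = "\n"
)

# BIDS events files are tab-separated, with "n/a" marking missing values
events <- read.delim(text = tsv, na.strings = "n/a", stringsAsFactors = FALSE)

# Event end times follow from onset + duration (both in seconds)
events$offset <- events$onset + events$duration
events

# Count events per trial type
table(events$trial_type)
```

This is the kind of table that ends up in each nested `data` cell once the event files have been read.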
# Find all event files
event_file_paths <- event_files(proj)
cat("Found", length(event_file_paths), "event files\n")
#> Found 48 event files
# Read event data into a nested data frame
events_data <- read_events(proj)
events_data
#> # A tibble: 48 × 9
#> # Groups: .task, .session, .run, .subid [48]
#> .task .session .run .subid task session run participant_id data
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 balloonana… <NA> 01 01 ball… <NA> 01 01 <tibble>
#> 2 balloonana… <NA> 02 01 ball… <NA> 02 01 <tibble>
#> 3 balloonana… <NA> 03 01 ball… <NA> 03 01 <tibble>
#> 4 balloonana… <NA> 01 02 ball… <NA> 01 02 <tibble>
#> 5 balloonana… <NA> 02 02 ball… <NA> 02 02 <tibble>
#> 6 balloonana… <NA> 03 02 ball… <NA> 03 02 <tibble>
#> 7 balloonana… <NA> 01 03 ball… <NA> 01 03 <tibble>
#> 8 balloonana… <NA> 02 03 ball… <NA> 02 03 <tibble>
#> 9 balloonana… <NA> 03 03 ball… <NA> 03 03 <tibble>
#> 10 balloonana… <NA> 01 04 ball… <NA> 01 04 <tibble>
#> # ℹ 38 more rows
Let’s explore the event data structure:
# Unnest events for subject 01 (filter() is from dplyr, unnest() from tidyr)
library(dplyr)
library(tidyr)
first_subject_events <- events_data %>%
  filter(.subid == "01") %>%
  unnest(cols = c(data))
head(first_subject_events)
#> # A tibble: 6 × 17
#> # Groups: .task, .session, .run, .subid [1]
#> .task .session .run .subid task session run participant_id onset duration
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 ball… <NA> 01 01 ball… <NA> 01 01 0.061 0.772
#> 2 ball… <NA> 01 01 ball… <NA> 01 01 4.96 0.772
#> 3 ball… <NA> 01 01 ball… <NA> 01 01 7.18 0.772
#> 4 ball… <NA> 01 01 ball… <NA> 01 01 10.4 0.772
#> 5 ball… <NA> 01 01 ball… <NA> 01 01 13.4 0.772
#> 6 ball… <NA> 01 01 ball… <NA> 01 01 16.8 0.772
#> # ℹ 7 more variables: trial_type <chr>, cash_demean <dbl>,
#> # control_pumps_demean <dbl>, explode_demean <dbl>, pumps_demean <dbl>,
#> # response_time <dbl>, .file <chr>
names(first_subject_events)
#> [1] ".task" ".session" ".run"
#> [4] ".subid" "task" "session"
#> [7] "run" "participant_id" "onset"
#> [10] "duration" "trial_type" "cash_demean"
#> [13] "control_pumps_demean" "explode_demean" "pumps_demean"
#> [16] "response_time" ".file"
Let’s do some basic exploration of the experimental design:
# How many trials per subject?
trial_counts <- events_data %>%
  unnest(cols = c(data)) %>%
  group_by(.subid) %>%
  summarise(n_trials = n(), .groups = "drop")
trial_counts
#> # A tibble: 16 × 2
#> .subid n_trials
#> <chr> <int>
#> 1 01 463
#> 2 02 555
#> 3 03 494
#> 4 04 510
#> 5 05 419
#> 6 06 536
#> 7 07 492
#> 8 08 494
#> 9 09 497
#> 10 10 521
#> 11 11 471
#> 12 12 453
#> 13 13 485
#> 14 14 503
#> 15 15 411
#> 16 16 419
BIDS stores acquisition metadata in JSON sidecars.
bidser now supports both direct sidecar reads and
inheritance-aware resolution following the BIDS inheritance
principle.
# Read sidecar rows directly
direct_sidecars <- read_sidecar(
  proj,
  subid = "01",
  task = "balloonanalogrisktask",
  inherit = FALSE
)
nrow(direct_sidecars)
#> [1] 0
names(direct_sidecars)
#> character(0)
If you want the effective metadata for a scan after applying
inherited sidecars from parent locations, use
get_metadata() or set inherit = TRUE in
read_sidecar():
# Resolve metadata for a specific BOLD file with inheritance
resolved_meta <- get_metadata(proj, bold_files[[1]], inherit = TRUE)
# head() avoids the NA padding that [1:8] produces when fewer than 8 fields exist
head(sort(names(resolved_meta)), 8)
#> [1] "RepetitionTime" "TaskName"
resolved_meta$RepetitionTime
#> [1] 2
# Inheritance-aware sidecar table
inherited_sidecars <- read_sidecar(
  proj,
  subid = "01",
  task = "balloonanalogrisktask",
  inherit = TRUE
)
if (nrow(inherited_sidecars) > 0) {
  inherited_sidecars %>%
    select(any_of(c("file", "RepetitionTime")))
} else {
  inherited_sidecars
}
#> # A tibble: 0 × 0
This is useful when the metadata you need lives in a task- or dataset-level JSON sidecar instead of the most specific file-level sidecar.
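The idea behind inheritance can be sketched in a few lines of base R: sidecars are merged from least to most specific, and a more specific sidecar overrides keys it shares with a more general one. The sidecar contents and the helper `resolve_metadata()` below are hypothetical; the sketch says nothing about how bidser locates sidecars on disk.

```r
# Made-up sidecar contents, ordered from least to most specific.
dataset_level <- list(TaskName = "example task", RepetitionTime = 2)
subject_level <- list(RepetitionTime = 2.5)  # overrides the dataset value
file_level    <- list(EchoTime = 0.03)       # adds a new key

# Hypothetical helper: later (more specific) sidecars override earlier
# ones key by key.
resolve_metadata <- function(...) {
  Reduce(modifyList, list(...))
}

meta <- resolve_metadata(dataset_level, subject_level, file_level)
meta$RepetitionTime  # value from the subject-level sidecar
meta$TaskName        # inherited unchanged from the dataset level
meta$EchoTime        # only defined at the file level
```

Reading only the file-level sidecar would return `EchoTime` alone, which is why a direct read can come back empty while an inheritance-aware lookup still finds `RepetitionTime`.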
The bids_subject() function provides a convenient
interface for working with data from a single subject. It returns a
lightweight object with helper functions that automatically filter data
for that subject.
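One way to picture such an object is as a list of closures that all share the same pre-filtered file set. The base R sketch below is a hypothetical stand-in, not bidser's implementation: `make_subject()` and the file names are invented, and its `events()` returns paths rather than the nested tibble that bidser returns.

```r
# Hypothetical stand-in for a subject-level interface, NOT bidser's code.
make_subject <- function(files, subid) {
  prefix <- paste0("sub-", subid, "/")
  # Filter once; every helper below sees only this subject's files
  mine <- files[startsWith(files, prefix)]
  list(
    scans  = function() mine[grepl("_bold\\.nii\\.gz$", mine)],
    events = function() mine[grepl("_events\\.tsv$", mine)]
  )
}

files <- c(
  "sub-01/func/sub-01_task-demo_run-01_bold.nii.gz",
  "sub-01/func/sub-01_task-demo_run-01_events.tsv",
  "sub-02/func/sub-02_task-demo_run-01_bold.nii.gz"
)

s01 <- make_subject(files, "01")
s01$scans()
s01$events()
```

Because the subject filter is captured when the object is created, callers never repeat `subid = "01"` in later calls.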
# Create a subject-specific interface for subject 01
subject_01 <- bids_subject(proj, "01")
# Get all functional scans for this subject
sub01_scans <- subject_01$scans()
cat("Subject 01:", length(sub01_scans), "functional scans\n")
#> Subject 01: 3 functional scans
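# Get event data for this subject: a nested tibble with one row per event file
sub01_events <- subject_01$events()
# Use nrow(), not length(): length() of a tibble counts its columns
cat("Subject 01:", nrow(sub01_events), "event files\n")
#> Subject 01: 3 event files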
# Read event data for this subject
sub01_event_data <- subject_01$events()
sub01_event_data
#> # A tibble: 3 × 9
#> # Groups: .task, .session, .run, .subid [3]
#> .task .session .run .subid task session run participant_id data
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 balloonanal… <NA> 01 01 ball… <NA> 01 01 <tibble>
#> 2 balloonanal… <NA> 02 01 ball… <NA> 02 01 <tibble>
#> 3 balloonanal… <NA> 03 01 ball… <NA> 03 01 <tibble>
This approach is particularly useful when you’re doing subject-level analyses:
subjects_to_analyze <- c("01", "02", "03")
for (subj_id in subjects_to_analyze) {
  subj <- bids_subject(proj, subj_id)
  scans <- subj$scans()
  events <- subj$events()
  # events is a tibble with one row per event file, so count rows
  cat(sprintf("Subject %s: %d scans, %d event files\n",
              subj_id, length(scans), nrow(events)))
}
#> Subject 01: 3 scans, 3 event files
#> Subject 02: 3 scans, 3 event files
#> Subject 03: 3 scans, 3 event files
The subject interface makes it easy to write analysis pipelines that iterate over subjects without manually constructing filters:
subject_trial_summary <- lapply(participants(proj)[1:3], function(subj_id) {
  subj <- bids_subject(proj, subj_id)
  event_data <- subj$events()
  n_trials <- if (nrow(event_data) > 0) {
    event_data %>% unnest(cols = c(data)) %>% nrow()
  } else {
    0
  }
  tibble(subject = subj_id, n_trials = n_trials, n_scans = length(subj$scans()))
}) %>% bind_rows()
subject_trial_summary
#> # A tibble: 3 × 3
#> subject n_trials n_scans
#> <chr> <int> <int>
#> 1 01 463 3
#> 2 02 555 3
#> 3 03 494 3
query_files() is the primary file-finding API in bidser.
It supports exact, regex, and glob matching modes, scoped searches
across raw data and derivatives, and can return either paths or a tibble
with parsed entities.
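The practical difference between exact and regex matching on an entity value can be shown with plain character vectors. The sketch below uses only base R and the sixteen subject IDs from ds001; it illustrates the general behaviour of literal versus pattern matching and makes no claim about whether bidser anchors patterns internally.

```r
ids <- sprintf("%02d", 1:16)  # "01" ... "16", as in ds001

# Exact matching: only the literal value is selected
exact_hit <- ids[ids == "01"]

# Unanchored regex: "1" matches every ID that contains a 1
loose_hit <- ids[grepl("1", ids)]

# Anchoring the pattern restores exact-like behaviour
anchored_hit <- ids[grepl("^01$", ids)]

# A character class selects several subjects at once
range_hit <- ids[grepl("^0[1-3]$", ids)]

exact_hit
loose_hit
range_hit
```

The unanchored pattern picks up subject 01 together with subjects 10 through 16, which is the kind of surprise that exact matching avoids.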
# Exact entity matching -- reproducible, no regex surprises
exact_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "01",
  task = "balloonanalogrisktask",
  match_mode = "exact"
)
cat("Exact-match BOLD files:", length(exact_bold), "\n")
#> Exact-match BOLD files: 3
# Regex entity matching -- select multiple values with patterns
regex_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0[1-3]",
  task = "balloon.*",
  match_mode = "regex"
)
cat("Regex-match BOLD files:", length(regex_bold), "\n")
#> Regex-match BOLD files: 9
# Glob matching -- shell-style wildcards
glob_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0*",
  match_mode = "glob"
)
cat("Glob-match BOLD files:", length(glob_bold), "\n")
#> Glob-match BOLD files: 30
# Require the queried entity to actually exist on returned files
task_annotated <- query_files(
  proj,
  regex = "\\.nii\\.gz$",
  task = ".*",
  require_entity = TRUE,
  scope = "raw"
)
cat("Files with an explicit task entity:", length(task_annotated), "\n")
#> Files with an explicit task entity: 48
# Filter by extension and datatype directly
json_files <- query_files(proj, extension = "\\.json$")
cat("JSON files:", length(json_files), "\n")
#> JSON files: 0
func_niftis <- query_files(proj, datatype = "func", extension = "\\.nii\\.gz$")
cat("Functional NIfTIs:", length(func_niftis), "\n")
#> Functional NIfTIs: 48
# Return a tibble instead of paths -- includes all parsed BIDS entities
bold_tbl <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0[1-3]",
  return = "tibble"
)
bold_tbl |> select(path, subid, task, run)
#> # A tibble: 9 × 4
#> path subid task run
#> <chr> <chr> <chr> <chr>
#> 1 sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.n… 01 ball… 01
#> 2 sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.n… 01 ball… 02
#> 3 sub-01/func/sub-01_task-balloonanalogrisktask_run-03_bold.n… 01 ball… 03
#> 4 sub-02/func/sub-02_task-balloonanalogrisktask_run-01_bold.n… 02 ball… 01
#> 5 sub-02/func/sub-02_task-balloonanalogrisktask_run-02_bold.n… 02 ball… 02
#> 6 sub-02/func/sub-02_task-balloonanalogrisktask_run-03_bold.n… 02 ball… 03
#> 7 sub-03/func/sub-03_task-balloonanalogrisktask_run-01_bold.n… 03 ball… 01
#> 8 sub-03/func/sub-03_task-balloonanalogrisktask_run-02_bold.n… 03 ball… 02
#> 9 sub-03/func/sub-03_task-balloonanalogrisktask_run-03_bold.n… 03 ball… 03
The tibble is sorted deterministically by subject, session, task, run, and path.
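A deterministic multi-key sort of this kind can be sketched in base R with `order()`. The table below is made up, and the session key is left out because the toy data has no sessions; this is an illustration of the idea, not the code bidser runs.

```r
# Made-up file table in arbitrary order
tbl <- data.frame(
  subid = c("02", "01", "01", "02"),
  task  = c("demo", "demo", "demo", "demo"),
  run   = c("01", "02", "01", "02"),
  stringsAsFactors = FALSE
)
tbl$path <- sprintf("sub-%s/func/sub-%s_task-%s_run-%s_bold.nii.gz",
                    tbl$subid, tbl$subid, tbl$task, tbl$run)

# method = "radix" sorts character keys in the C locale, so the result
# does not depend on the user's locale settings
idx <- order(tbl$subid, tbl$task, tbl$run, tbl$path, method = "radix")
sorted <- tbl[idx, ]
rownames(sorted) <- NULL
sorted
```

Including the path as the final key breaks any remaining ties, so repeated queries return rows in the same order.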
When derivatives are present, scope controls where to
search and pipeline selects specific derivative
pipelines:
deriv_path <- get_example_bids_dataset("ds000001-fmriprep")
proj_deriv <- bids_project(deriv_path)
# Search only derivatives from a specific pipeline
prep_bold <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  desc = "preproc",
  scope = "derivatives",
  pipeline = "fmriprep",
  match_mode = "exact"
)
# Or use the convenience wrapper
deriv_bold <- derivative_files(proj_deriv, pipeline = "fmriprep",
                               regex = "bold\\.nii\\.gz$")
# Search everywhere and get a tibble with scope/pipeline columns
all_bold <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  scope = "all",
  return = "tibble"
)
bids_project() can handle real-world datasets that lack
participants.tsv: subjects are then inferred from the directory
tree.
variables_table() gives you a run-level tibble that
nests scan inventory, events, and confounds, ready for downstream R
workflows.
When you need absolute paths for analysis tools:
full_paths <- func_scans(proj, subid = "01", full_path = TRUE)
full_paths
#> [1] "/tmp/RtmpdnaWTD/bids_example_ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz"
#> [2] "/tmp/RtmpdnaWTD/bids_example_ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz"
#> [3] "/tmp/RtmpdnaWTD/bids_example_ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-03_bold.nii.gz"
all(file.exists(full_paths))
#> [1] TRUE
bidser automatically discovers derivative pipelines
under derivatives/. You can query preprocessed scans,
confounds, and masks through query_files():
deriv_path <- get_example_bids_dataset("ds000001-fmriprep")
proj_deriv <- bids_project(deriv_path)
# See which pipelines were discovered
derivative_pipelines(proj_deriv)
# Query preprocessed BOLD scans
preproc <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  desc = "preproc",
  scope = "derivatives",
  pipeline = "fmriprep",
  return = "tibble"
)
head(preproc$path)
# Read confound regressors
conf <- read_confounds(proj_deriv, subid = "01")
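Since derivative pipelines live under derivatives/, the scope and pipeline of a file can in principle be read off its relative path. The base R sketch below illustrates that idea with two made-up paths; it is an assumption about the folder layout only and not a description of bidser's discovery logic.

```r
# Made-up relative paths: one raw file, one fmriprep derivative
paths <- c(
  "sub-01/func/sub-01_task-demo_bold.nii.gz",
  "derivatives/fmriprep/sub-01/func/sub-01_task-demo_desc-preproc_bold.nii.gz"
)

is_deriv <- startsWith(paths, "derivatives/")
scope <- ifelse(is_deriv, "derivatives", "raw")

# The pipeline name is the first folder below derivatives/
pipeline <- ifelse(
  is_deriv,
  sub("^derivatives/([^/]+)/.*$", "\\1", paths),
  NA_character_
)

data.frame(scope = scope, pipeline = pipeline)
```

A table with these two columns is what makes it possible to search raw data and derivatives together and still tell the results apart.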