---
title: "Getting Started with bidser"
output:
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 2
    css: albers.css
header-includes:
  - ''
params:
  family: red
  preset: homage
resource_files:
  - albers.css
  - albers.js
vignette: |
  %\VignetteIndexEntry{Getting Started with bidser}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

```{r theme-setup, include = FALSE}
# Apply the albersdown ggplot2 theme once, if both packages are available
albers_pkg <- "albersdown"
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace(albers_pkg, quietly = TRUE)) {
  theme_fn <- get("theme_albers", envir = asNamespace(albers_pkg))
  ggplot2::theme_set(theme_fn(family = params$family, preset = params$preset))
}

suppressPackageStartupMessages({
  library(bidser)
  library(tibble)
  library(dplyr)
  library(tidyr)
})
```

```{r albers-classes, echo=FALSE, results='asis'}
cat(sprintf(
  paste0(
    ''
  ),
  params$family, params$preset
))
```

## Introduction to bidser

`bidser` is an R package for working with neuroimaging data organized according to the [Brain Imaging Data Structure (BIDS)](https://bids.neuroimaging.io/) standard. BIDS is a specification that describes how to organize and name neuroimaging and behavioral data, making datasets more accessible, shareable, and easier to analyze.

### What is BIDS?

BIDS organizes data into a hierarchical folder structure with standardized naming conventions:

- **Subjects** are identified by folders named `sub-XX`
- **Sessions** (optional) are identified by folders named `ses-XX`
- **Data types** are organized into modality-specific folders (`anat`, `func`, `dwi`, etc.)
- **Files** follow specific naming patterns that encode metadata (subject, session, task, run, etc.)

### What does bidser do?
`bidser` provides tools to:

- **Query and filter** files based on BIDS metadata (subject, task, run, etc.)
- **Read event files** that describe experimental paradigms
- **Work with fMRIPrep derivatives** for preprocessed data
- **Navigate complex BIDS hierarchies** without manually constructing file paths

Let's explore these capabilities using a real BIDS dataset.

## Loading a BIDS Dataset

We'll use the `ds001` dataset from the BIDS examples, which contains data from a "Balloon Analog Risk Task" experiment with 16 subjects.

```{r setup, include = FALSE}
ds001_path <- tryCatch(
  get_example_bids_dataset("ds001"),
  error = function(e) NULL
)

if (is.null(ds001_path)) {
  knitr::knit_exit("Example dataset not available.")
}

proj <- bids_project(ds001_path)
```

```{r}
proj
```

The `bids_project` object provides a high-level interface to the dataset. The printed summary shows that it contains 16 subjects with both anatomical and functional data.

## Basic Dataset Queries

### Dataset Structure

Let's explore the basic structure of this dataset:

```{r}
# Check whether the dataset has sessions
sessions(proj)

# Get all participant IDs
participants(proj)

# What tasks are included?
tasks(proj)

# Get a summary of the dataset
bids_summary(proj)
```

### Finding Files by Type

Let's find the most common neuroimaging file types:

```{r}
# Find all anatomical T1-weighted images
t1w_files <- query_files(proj, regex = "T1w\\.nii", full_path = FALSE)
head(t1w_files)

# Find all functional BOLD scans
bold_files <- func_scans(proj, full_path = FALSE)
head(bold_files)
```

### Filtering by Subject and Task

One of bidser's key strengths is filtering data by BIDS metadata:

```{r}
# Get functional scans for specific subjects
sub01_scans <- func_scans(proj, subid = "01")
sub02_scans <- func_scans(proj, subid = "02")

cat("Subject 01:", length(sub01_scans), "scans\n")
cat("Subject 02:", length(sub02_scans), "scans\n")

# Filter by task (ds001 only has one task, but this shows the syntax)
task_scans <- func_scans(proj, task = "balloonanalogrisktask")
cat("Balloon task:", length(task_scans), "scans total\n")

# Combine filters: specific subject AND task
sub01_task_scans <- func_scans(proj, subid = "01", task = "balloonanalogrisktask")
cat("Subject 01, balloon task:", length(sub01_task_scans), "scans\n")
```

### Working with Multiple Subjects

You can use regular expressions to select multiple subjects at once:

```{r}
# Get scans for subjects 01, 02, and 03
first_three_scans <- func_scans(proj, subid = "0[123]")
cat("First 3 subjects:", length(first_three_scans), "scans total\n")

# Get scans for all subjects (equivalent to the default)
all_scans <- func_scans(proj, subid = ".*")
cat("All subjects:", length(all_scans), "scans total\n")
```

## Working with Event Files

Event files describe the experimental paradigm: when stimuli were presented, what responses occurred, etc. This is crucial for task-based fMRI analysis.
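At the file level, a BIDS events file is a tab-separated table whose required columns are `onset` and `duration` (both in seconds), often accompanied by an optional `trial_type` column, with missing values written as `n/a`. The base R sketch below builds a small made-up events table (the trial types and timings are invented for illustration, not taken from ds001) and reads it like any TSV, to show the kind of per-file table that ends up in the `data` list-column used in the chunks that follow.

```{r}
# A made-up events file in BIDS format (values are illustrative only).
# BIDS requires `onset` and `duration` in seconds; `trial_type` is optional,
# and missing values are written as "n/a".
toy_events_tsv <- paste(
  "onset\tduration\ttrial_type\tresponse_time",
  "0.0\t2.0\tpump\t0.8",
  "4.5\t2.0\tpump\t0.6",
  "9.0\t2.0\tcash\tn/a",
  sep = "\n"
)

# Parse the TSV text, mapping "n/a" to NA
toy_events <- utils::read.delim(
  text = toy_events_tsv,
  na.strings = "n/a",
  stringsAsFactors = FALSE
)
str(toy_events)

# Derive each event's offset and count events per condition
toy_events$offset <- toy_events$onset + toy_events$duration
table(toy_events$trial_type)
```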
```{r}
# Find all event files
event_file_paths <- event_files(proj)
cat("Found", length(event_file_paths), "event files\n")

# Read event data into a nested data frame
events_data <- read_events(proj)
events_data
```

Let's explore the event data structure:

```{r}
# Unnest the events for subject 01
first_subject_events <- events_data %>%
  filter(.subid == "01") %>%
  unnest(cols = c(data))

head(first_subject_events)
names(first_subject_events)
```

### Analyzing Event Data

Let's do some basic exploration of the experimental design:

```{r}
# How many trials per subject?
trial_counts <- events_data %>%
  unnest(cols = c(data)) %>%
  group_by(.subid) %>%
  summarise(n_trials = n(), .groups = "drop")

trial_counts
```

## Working with Metadata Sidecars

BIDS stores acquisition metadata in JSON sidecars. `bidser` supports both direct sidecar reads and inheritance-aware resolution following the BIDS inheritance principle.

```{r}
# Read sidecar rows directly, without inheritance
direct_sidecars <- read_sidecar(
  proj,
  subid = "01",
  task = "balloonanalogrisktask",
  inherit = FALSE
)

nrow(direct_sidecars)
names(direct_sidecars)
```

If you want the effective metadata for a scan after applying inherited sidecars from parent locations, use `get_metadata()` or set `inherit = TRUE` in `read_sidecar()`:

```{r}
# Resolve metadata for a specific BOLD file with inheritance
resolved_meta <- get_metadata(proj, bold_files[[1]], inherit = TRUE)
head(sort(names(resolved_meta)), 8)
resolved_meta$RepetitionTime

# Inheritance-aware sidecar table
inherited_sidecars <- read_sidecar(
  proj,
  subid = "01",
  task = "balloonanalogrisktask",
  inherit = TRUE
)

if (nrow(inherited_sidecars) > 0) {
  inherited_sidecars %>%
    select(any_of(c("file", "RepetitionTime")))
} else {
  inherited_sidecars
}
```

This is useful when the metadata you need lives in a task- or dataset-level JSON sidecar rather than in the most specific, file-level sidecar.
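Conceptually, inheritance-aware resolution merges every sidecar that applies to a data file, from the least specific (dataset level) to the most specific (file level), and a key defined closer to the data file overrides the same key inherited from higher up. The base R sketch below illustrates only that merge rule, folding hand-written lists with `utils::modifyList()`; the field values are invented, and this is not how `get_metadata()` is implemented internally.

```{r}
# Hypothetical sidecars that all apply to one BOLD run, ordered from least
# to most specific (dataset level -> subject level -> file level).
# Values are made up for illustration.
sidecar_chain <- list(
  dataset = list(RepetitionTime = 2, TaskName = "balloon analog risk task"),
  subject = list(SliceTiming = c(0, 0.5, 1, 1.5)),
  file    = list(RepetitionTime = 2.5)
)

# Fold the chain left to right: keys in more specific sidecars override
# the same keys inherited from less specific ones.
effective_meta <- Reduce(utils::modifyList, sidecar_chain, list())

effective_meta$RepetitionTime  # the file-level value wins
sort(names(effective_meta))
```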
## Working with Individual Subjects

The `bids_subject()` function provides a convenient interface for working with a single subject's data. It returns a lightweight object whose helper functions automatically filter data for that subject.

```{r}
# Create a subject-specific interface for subject 01
subject_01 <- bids_subject(proj, "01")

# Get all functional scans for this subject
sub01_scans <- subject_01$scans()
cat("Subject 01:", length(sub01_scans), "functional scans\n")

# Read this subject's events. events() returns a nested tibble of event
# tables, so count its rows; length() would count its columns instead.
sub01_event_data <- subject_01$events()
cat("Subject 01:", nrow(sub01_event_data), "event tables\n")
sub01_event_data
```

This approach is particularly useful for subject-level analyses:

```{r}
subjects_to_analyze <- c("01", "02", "03")

for (subj_id in subjects_to_analyze) {
  subj <- bids_subject(proj, subj_id)
  scans <- subj$scans()
  events <- subj$events()

  # events is a nested tibble, so nrow() gives the number of event tables
  cat(sprintf("Subject %s: %d scans, %d event tables\n",
              subj_id, length(scans), nrow(events)))
}
```

The subject interface makes it easy to write analysis pipelines that iterate over subjects without manually constructing filters:

```{r}
subject_trial_summary <- lapply(participants(proj)[1:3], function(subj_id) {
  subj <- bids_subject(proj, subj_id)
  event_data <- subj$events()

  n_trials <- if (nrow(event_data) > 0) {
    event_data %>%
      unnest(cols = c(data)) %>%
      nrow()
  } else {
    0
  }

  tibble(subject = subj_id, n_trials = n_trials, n_scans = length(subj$scans()))
}) %>%
  bind_rows()

subject_trial_summary
```

## Advanced Querying with `query_files()`

`query_files()` is the primary file-finding API in bidser. It supports exact, regex, and glob matching modes, scoped searches across raw data and derivatives, and can return either paths or a tibble with parsed entities.
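The glob and regex modes differ mainly in pattern syntax. As a rough mental model (an assumption for illustration, not a description of bidser's internals), a shell-style glob can be translated into an anchored regular expression with base R's `utils::glob2rx()` and matched with `grepl()`; the subject IDs below are made up.

```{r}
# Illustrative subject IDs (made up, not read from ds001)
toy_ids <- c("01", "02", "03", "04", "10", "16")

# Translate the shell-style glob "0*" ("starts with 0") into a regex
glob_pattern <- utils::glob2rx("0*")
glob_pattern
grepl(glob_pattern, toy_ids)

# A regex can express narrower selections, such as exactly 01 through 03
grepl("^0[1-3]$", toy_ids)
```

The glob is shorter to write for prefix-style selections, while a regex can state exact ranges or alternatives.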
### Match Modes

```{r}
# Exact entity matching: reproducible, no regex surprises
exact_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "01",
  task = "balloonanalogrisktask",
  match_mode = "exact"
)
cat("Exact-match BOLD files:", length(exact_bold), "\n")

# Regex entity matching: select multiple values with patterns
regex_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0[1-3]",
  task = "balloon.*",
  match_mode = "regex"
)
cat("Regex-match BOLD files:", length(regex_bold), "\n")

# Glob matching: shell-style wildcards
glob_bold <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0*",
  match_mode = "glob"
)
cat("Glob-match BOLD files:", length(glob_bold), "\n")
```

### Entity Presence, Extension, and Datatype Filters

```{r}
# Require the queried entity to actually exist on returned files
task_annotated <- query_files(
  proj,
  regex = "\\.nii\\.gz$",
  task = ".*",
  require_entity = TRUE,
  scope = "raw"
)
cat("Files with an explicit task entity:", length(task_annotated), "\n")

# Filter by extension and datatype directly
json_files <- query_files(proj, extension = "\\.json$")
cat("JSON files:", length(json_files), "\n")

func_niftis <- query_files(proj, datatype = "func", extension = "\\.nii\\.gz$")
cat("Functional NIfTIs:", length(func_niftis), "\n")
```

### Tibble Output with Parsed Entities

```{r}
# Return a tibble instead of paths; it includes all parsed BIDS entities
bold_tbl <- query_files(
  proj,
  regex = "bold\\.nii\\.gz$",
  subid = "0[1-3]",
  return = "tibble"
)

bold_tbl %>%
  select(path, subid, task, run)
```

The tibble is sorted deterministically by subject, session, task, run, and path.
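The entity columns reflect the BIDS filename itself: each underscore-separated `key-value` pair is an entity, and the final segment holds the suffix and extension. The base R sketch below shows that parsing and a deterministic sort on three made-up paths (which have no session entity). `parse_bids_entities()` is a hypothetical helper written for this illustration, not a bidser function, and it ignores edge cases a real parser must handle.

```{r}
# Hypothetical helper (not part of bidser): split a BIDS filename into its
# key-value entities, suffix, and extension.
parse_bids_entities <- function(path) {
  fname <- basename(path)
  # Extension: everything from the first "." onward (e.g. ".nii.gz")
  ext <- sub("^[^.]*", "", fname)
  stem <- sub("\\..*$", "", fname)
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # The last segment is the suffix (e.g. "bold"); the rest are key-value pairs
  kv <- parts[-length(parts)]
  vals <- sub("^[^-]*-", "", kv)
  names(vals) <- sub("-.*$", "", kv)
  c(as.list(vals), suffix = parts[length(parts)], extension = ext)
}

toy_paths <- c(
  "sub-02/func/sub-02_task-balloonanalogrisktask_run-01_bold.nii.gz",
  "sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz",
  "sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz"
)

# One row per path with the parsed entities as columns
parsed <- do.call(rbind, lapply(toy_paths, function(p) {
  e <- parse_bids_entities(p)
  data.frame(path = p, subid = e$sub, task = e$task, run = e$run,
             suffix = e$suffix, stringsAsFactors = FALSE)
}))

# Deterministic ordering by subject, task, run, then path
parsed <- parsed[order(parsed$subid, parsed$task, parsed$run, parsed$path), ]
parsed
```

Sorting on zero-padded entity strings keeps the order stable across runs; unpadded labels (for example `run-2` vs `run-10`) would sort lexically rather than numerically.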
### Scoped Queries and Derivatives

When derivatives are present, `scope` controls where to search and `pipeline` selects specific derivative pipelines:

```{r derivatives-query, eval = FALSE}
deriv_path <- get_example_bids_dataset("ds000001-fmriprep")
proj_deriv <- bids_project(deriv_path)

# Search only derivatives from a specific pipeline
prep_bold <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  desc = "preproc",
  scope = "derivatives",
  pipeline = "fmriprep",
  match_mode = "exact"
)

# Or use the convenience wrapper
deriv_bold <- derivative_files(proj_deriv, pipeline = "fmriprep",
                               regex = "bold\\.nii\\.gz$")

# Search everywhere and get a tibble with scope/pipeline columns
all_bold <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  scope = "all",
  return = "tibble"
)
```

### Permissive Loading

`bids_project()` can handle real-world datasets that lack `participants.tsv`; in that case, subjects are inferred from the directory tree:

```{r permissive-project, eval = FALSE}
proj_relaxed <- bids_project(
  "/path/to/bids",
  strict_participants = FALSE
)

# Check where participant IDs came from
participants(proj_relaxed, as_tibble = TRUE)

# See which derivative pipelines were discovered
derivative_pipelines(proj_relaxed)
```

### Run-Level Variables and Report Data

`variables_table()` gives you a run-level tibble that nests the scan inventory, events, and confounds, ready for downstream R workflows:

```{r variables-report, eval = FALSE}
vars <- variables_table(
  proj_deriv,
  scope = "all",
  pipeline = "fmriprep"
)

vars[, c(".subid", ".task", ".run", "n_scans", "n_events", "n_confound_rows")]

report <- bids_report(proj_deriv, scope = "all", pipeline = "fmriprep")
report
```

### Full File Paths

When you need absolute paths for analysis tools:

```{r}
full_paths <- func_scans(proj, subid = "01", full_path = TRUE)
full_paths
all(file.exists(full_paths))
```

## Working with fMRIPrep Derivatives

`bidser` automatically discovers derivative pipelines under `derivatives/`.
You can query preprocessed scans and masks through `query_files()` and read confound regressors with `read_confounds()`:

```{r derivatives, eval = FALSE}
deriv_path <- get_example_bids_dataset("ds000001-fmriprep")
proj_deriv <- bids_project(deriv_path)

# See which pipelines were discovered
derivative_pipelines(proj_deriv)

# Query preprocessed BOLD scans
preproc <- query_files(
  proj_deriv,
  regex = "bold\\.nii\\.gz$",
  desc = "preproc",
  scope = "derivatives",
  pipeline = "fmriprep",
  return = "tibble"
)
head(preproc$path)

# Read confound regressors
conf <- read_confounds(proj_deriv, subid = "01")
```

```{r cleanup, include=FALSE}
# Example datasets are cached by get_example_bids_dataset(); leave them in place.
```
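A common next step after loading confounds is to keep only a subset of regressors for a first-level model. The sketch below assumes the confounds are available as a data frame with fMRIPrep-style column names (`trans_x` through `rot_z`, `framewise_displacement`); it uses a small made-up table rather than the actual return value of `read_confounds()`, whose structure may differ, and the 0.3 mm censoring threshold is an arbitrary illustrative choice.

```{r}
# Made-up confounds table with fMRIPrep-style column names (values invented)
toy_confounds <- data.frame(
  trans_x = c(0.00, 0.01, 0.02),
  trans_y = c(0.00, -0.01, 0.00),
  trans_z = c(0.00, 0.02, 0.01),
  rot_x = c(0, 1e-4, 2e-4),
  rot_y = c(0, 0, 1e-4),
  rot_z = c(0, -1e-4, 0),
  framewise_displacement = c(NA, 0.12, 0.35),
  global_signal = c(100, 101, 99)
)

# Keep the six rigid-body motion parameters
motion_cols <- c("trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z")
nuisance <- toy_confounds[, motion_cols]

# Framewise displacement is undefined for the first volume; fill it with 0
fd <- toy_confounds$framewise_displacement
fd[is.na(fd)] <- 0
nuisance$fd <- fd

# Flag volumes above the illustrative 0.3 mm threshold for censoring
high_motion <- fd > 0.3
nuisance
which(high_motion)
```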