| Title: | Programmatic Access to OpenNeuro Datasets |
|---|---|
| Description: | Search, explore, and download datasets from 'OpenNeuro' <https://openneuro.org>, the largest open neuroimaging data repository. Queries the 'OpenNeuro' GraphQL API to discover datasets by modality, diagnosis, or keyword; inspect snapshots, files, and subject lists; and download full datasets or selected subsets via HTTPS, Amazon S3, or 'DataLad'. Downloaded data are cached locally so subsequent requests skip already-fetched files. |
| Authors: | Bradley Buchsbaum [aut, cre, cph] |
| Maintainer: | Bradley Buchsbaum <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-27 00:09:47 UTC |
| Source: | https://github.com/bbuchsbaum/openneuroR |
Converts a fetched OpenNeuro dataset handle into a bidser bids_project
object, enabling BIDS-aware data access to subjects, sessions, files,
and derivatives.
on_bids(handle, fmriprep = FALSE, prep_dir = "derivatives/fmriprep")
handle |
An |
fmriprep |
Logical. If |
prep_dir |
Character. Path to derivatives directory relative to
the dataset root. If specified, takes precedence over |
This function provides a bridge between OpenNeuro's download system
and bidser's BIDS-aware data structures. The resulting bids_project
object exposes:
Subject and session information
BIDS file listings by modality
Derivatives access (if available)
on_bids() requires the bidser package, which is listed only as an optional dependency (Suggests). If bidser is not installed, the function stops with a message that explains how to install it.
A bids_project object from the bidser package.
When fmriprep = TRUE, the function looks for derivatives at
derivatives/fmriprep within the dataset. You can specify a custom
derivatives path with prep_dir.
If prep_dir is set to a non-default value, it takes precedence over
fmriprep = TRUE. A warning is issued if the requested derivatives
path does not exist.
on_handle() to create a handle, on_fetch() to download data.
## Not run:
# Basic usage
handle <- on_handle("ds000001")
handle <- on_fetch(handle)
bids <- on_bids(handle)
# Auto-fetch if needed
handle <- on_handle("ds000002")
bids <- on_bids(handle)  # Fetches automatically
# Include fMRIPrep derivatives
bids <- on_bids(handle, fmriprep = TRUE)
# Custom derivatives path
bids <- on_bids(handle, prep_dir = "derivatives/custom-pipeline")
## End(Not run)
Removes cached datasets. Can clear a specific dataset or all cached data.
on_cache_clear(dataset_id = NULL, confirm = interactive())
dataset_id |
Dataset identifier to clear (e.g., "ds000001"), or NULL to clear all cached datasets. |
confirm |
If TRUE (default in interactive sessions), asks for confirmation before clearing. Set FALSE to skip confirmation. |
Invisibly returns the number of datasets cleared.
## Not run:
# Clear specific dataset (with confirmation)
on_cache_clear("ds000001")
# Clear specific dataset without confirmation
on_cache_clear("ds000001", confirm = FALSE)
# Clear all cached datasets
on_cache_clear()
## End(Not run)
Returns information about the openneuroR cache location and total size.
on_cache_info()
A list with:
Path to cache directory
Number of cached datasets
Total size in bytes
Human-readable total size (e.g., "5.3 GB")
## Not run:
# Get cache info
info <- on_cache_info()
info$cache_path      # Where cache is stored
info$n_datasets      # How many datasets
info$size_formatted  # Human-readable size
## End(Not run)
Returns a tibble of all datasets currently in the openneuroR cache.
on_cache_list()
A tibble with columns:
Dataset identifier (e.g., "ds000001")
Cached snapshot version (may be NA if unknown)
Number of cached files
Total size in bytes
Human-readable size (e.g., "1.2 GB")
When first cached (ISO 8601 timestamp)
Type of cached data: "raw" for raw dataset files, "derivative" for fMRIPrep/MRIQC outputs, or "raw+derivative" if both are cached
## Not run:
# List all cached datasets
on_cache_list()
# Check total cache usage
cached <- on_cache_list()
sum(cached$total_size)  # total bytes
# Filter to only derivatives
cached[grepl("derivative", cached$type), ]
## End(Not run)
Creates a client object for accessing the OpenNeuro GraphQL API. The client stores configuration including the API endpoint URL and optional authentication token.
on_client(url = "https://openneuro.org/crn/graphql", token = NULL)
url |
API endpoint URL. Defaults to the OpenNeuro GraphQL endpoint. |
token |
API token for authentication. Defaults to the value of the
|
An openneuro_client object (S3 class) containing:
The API endpoint URL
The authentication token (or NULL)
on_request() for executing queries with the client
# Create client with default settings
client <- on_client()
print(client)
# Create client with custom endpoint
client <- on_client(url = "https://staging.openneuro.org/crn/graphql")
Retrieves detailed metadata for a single OpenNeuro dataset.
on_dataset(id, client = NULL)
id |
Dataset identifier (e.g., "ds000001"). |
client |
An |
A tibble with one row containing:
Dataset identifier
Dataset title
Timestamp when dataset was created (POSIXct)
Whether the dataset is publicly accessible (logical)
Tag of the most recent snapshot (if any)
on_search() to find datasets, on_snapshots() for version history
## Not run:
# Get metadata for a specific dataset
ds <- on_dataset("ds000001")
print(ds)
# Access fields
ds$name
ds$created
## End(Not run)
Finds derivative datasets (fMRIPrep, MRIQC, etc.) available for an OpenNeuro dataset. Searches both embedded derivatives within the dataset and external derivatives from the OpenNeuroDerivatives GitHub organization.
on_derivatives(
  dataset_id,
  sources = c("embedded", "openneuro-derivatives"),
  refresh = FALSE,
  client = NULL
)
dataset_id |
Dataset identifier (e.g., "ds000102"). |
sources |
Character vector specifying which sources to check.
Default is |
refresh |
If |
client |
An |
Embedded derivatives are stored directly within the dataset's BIDS
structure in a derivatives/ subdirectory. These are typically provided
by the dataset authors.
OpenNeuroDerivatives are externally processed derivatives maintained by the OpenNeuro team, available from the OpenNeuroDerivatives GitHub organization. These are stored on S3 and can be downloaded separately.
When the same pipeline exists in both sources, embedded derivatives are preferred and the OpenNeuroDerivatives entry is removed from results. This follows the principle that author-provided derivatives should take precedence.
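The preference for embedded derivatives can be illustrated with a small base R sketch. It is not the package's implementation: the `prefer_embedded()` helper and the toy data frame are hypothetical, and only the `pipeline` and `source` values come from the description above.

```r
# Toy listing: fmriprep is available from both sources, mriqc from one
derivs <- data.frame(
  pipeline = c("fmriprep", "fmriprep", "mriqc"),
  source = c("openneuro-derivatives", "embedded", "openneuro-derivatives"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per pipeline, preferring "embedded"
prefer_embedded <- function(d) {
  # order() is stable, so embedded rows move to the front of each tie
  d <- d[order(d$source != "embedded"), , drop = FALSE]
  # Keep only the first row seen for each pipeline
  d <- d[!duplicated(d$pipeline), , drop = FALSE]
  rownames(d) <- NULL
  d
}

deduped <- prefer_embedded(derivs)
print(deduped)
```

In this toy input the fmriprep row from "openneuro-derivatives" is dropped, while the mriqc row survives because no embedded copy competes with it.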
Results are cached per-session to minimize API calls. Use refresh = TRUE
to bypass the cache and fetch fresh data.
A tibble with one row per available derivative, containing:
The dataset identifier
Pipeline name (e.g., "fmriprep", "mriqc")
Where the derivative is from: "embedded" or "openneuro-derivatives"
Pipeline version (NA if not available)
Number of subjects processed (NA if not available)
Number of derivative files (NA if not available)
Human-readable size (e.g., "2.3 GB", NA if not available)
Last modification time (POSIXct, NA if not available)
S3 URL for OpenNeuroDerivatives sources (NA for embedded)
Returns an empty tibble with the same structure if no derivatives are found.
on_files() for listing files within datasets
## Not run:
# Find all derivatives for a dataset
derivs <- on_derivatives("ds000102")
print(derivs)
# Check only OpenNeuroDerivatives (GitHub)
github_derivs <- on_derivatives("ds000102", sources = "openneuro-derivatives")
# Check only embedded derivatives
embedded_derivs <- on_derivatives("ds000102", sources = "embedded")
# Force refresh of cached data
fresh_derivs <- on_derivatives("ds000102", refresh = TRUE)
# Filter for fMRIPrep derivatives
fmriprep <- derivs[derivs$pipeline == "fmriprep", ]
## End(Not run)
Reports the status of all available download backends, showing which are installed, their versions, and readiness for use.
on_doctor()
Invisibly returns an object of class openneuro_doctor containing:
List with available (always TRUE), version (NA)
List with available (logical), version (character or NA)
List with available (logical), version (character or NA)
on_doctor()
Downloads files from an OpenNeuro dataset to local disk. Supports downloading the full dataset, specific files, files matching a regex pattern, or specific subjects.
on_download(
  id,
  tag = NULL,
  files = NULL,
  subjects = NULL,
  include_derivatives = TRUE,
  dest_dir = NULL,
  use_cache = TRUE,
  quiet = FALSE,
  verbose = FALSE,
  force = FALSE,
  backend = NULL,
  client = NULL
)
id |
Dataset identifier (e.g., "ds000001"). |
tag |
Snapshot version tag. If NULL (default), uses latest snapshot. |
files |
Character vector of specific files to download, or a single regex pattern (detected by presence of regex metacharacters). If NULL (default), downloads all files. |
subjects |
Character vector of subject IDs (e.g., |
include_derivatives |
If TRUE (default) and |
dest_dir |
Destination directory. If NULL (default) and |
use_cache |
If TRUE (default) and dest_dir is NULL, downloads to CRAN-compliant cache location. Set FALSE to use current working directory. Ignored when dest_dir is explicitly provided. |
quiet |
If TRUE, suppress all progress output. Default FALSE. |
verbose |
If TRUE, show per-file progress in addition to overall progress. Default FALSE. |
force |
If TRUE, re-download files even if they exist with correct size. Default FALSE. |
backend |
Backend to use for downloading: "datalad", "s3", or "https". If NULL (default), auto-selects best available backend with priority: DataLad > S3 > HTTPS. DataLad provides git-annex integrity verification, S3 uses AWS CLI for fast parallel sync, HTTPS is the universal fallback. |
client |
An openneuro_client object. If NULL, creates default client. |
By default, files are downloaded to a CRAN-compliant cache location (platform-specific, see Details). Repeat downloads of the same files are skipped automatically based on manifest tracking.
Cache locations by platform:
Mac: ~/Library/Caches/R/openneuroR
Linux: ~/.cache/R/openneuroR
Windows: ~/AppData/Local/R/cache/openneuroR
Each dataset is stored in a subdirectory by dataset ID. A manifest.json file tracks downloaded files, enabling automatic skip of already-cached files on repeat downloads.
Backend selection:
DataLad: Clones from OpenNeuroDatasets GitHub with git-annex.
Provides cryptographic integrity verification. Requires datalad and
git-annex CLI tools.
S3: Uses AWS CLI s3 sync for fast parallel downloads.
Requires aws CLI tool.
HTTPS: Direct file downloads via httr2. Always available, no external dependencies.
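The DataLad > S3 > HTTPS priority can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's detection code: `select_backend()` is a hypothetical helper, and probing the PATH with `Sys.which()` for `datalad`, `git-annex`, and `aws` is only one plausible way to test for the CLI tools named above.

```r
# Hypothetical helper: pick the first available backend in priority order.
# `available` may be supplied as a named logical vector (useful for testing);
# otherwise the PATH is probed for the required command-line tools.
select_backend <- function(available = NULL) {
  if (is.null(available)) {
    has <- function(cmd) unname(nzchar(Sys.which(cmd)))
    available <- c(
      datalad = has("datalad") && has("git-annex"),  # needs both tools
      s3 = has("aws"),                               # needs the AWS CLI
      https = TRUE                                   # no external tools
    )
  }
  priority <- c("datalad", "s3", "https")
  # Keep the backends flagged TRUE, in priority order, and take the first
  priority[available[priority]][1]
}

# Simulated machine with only the AWS CLI installed
chosen <- select_backend(c(datalad = FALSE, s3 = TRUE, https = TRUE))
print(chosen)

# Result on the current machine
print(select_backend())
```

Because HTTPS has no external requirements, the helper always returns some backend. For the actual status of each backend on a given machine, on_doctor() is the documented entry point.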
Subject filtering:
When subjects is specified, only files belonging to those subjects are
downloaded, plus root-level files (e.g., dataset_description.json,
participants.tsv). Subject IDs can be provided with or without the
"sub-" prefix - both "01" and "sub-01" work.
For pattern matching, wrap the pattern in regex(). Patterns are
auto-anchored for full subject ID matching, so regex("sub-01") will
match "sub-01" but not "sub-010".
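The prefix handling and auto-anchoring described above can be reproduced with a few lines of base R. The helpers `normalize_subject()` and `match_subjects()` are hypothetical stand-ins written for this illustration. They are not the package's internals, and the subject IDs are made up.

```r
# Hypothetical helper: add the "sub-" prefix only where it is missing
normalize_subject <- function(ids) {
  ifelse(startsWith(ids, "sub-"), ids, paste0("sub-", ids))
}

# Hypothetical helper: anchor the pattern so it must match the whole ID
match_subjects <- function(pattern, available) {
  anchored <- paste0("^(?:", pattern, ")$")
  available[grepl(anchored, available, perl = TRUE)]
}

available <- c("sub-01", "sub-010", "sub-02", "sub-11")

# Both spellings resolve to the same subject IDs
print(normalize_subject(c("01", "sub-02")))

# Anchoring keeps "sub-01" from also selecting "sub-010"
print(match_subjects("sub-01", available))

# A character class selects a range of subjects
print(match_subjects("sub-0[1-5]", available))
```

Without the anchors, an unanchored `grepl("sub-01", available)` would also select "sub-010", which is the case the auto-anchoring is meant to prevent.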
Invisibly returns a list with:
Number of files downloaded
Number of files skipped (already cached or existed)
Character vector of failed file names
Total bytes downloaded
Path to destination directory
Backend used for download (if S3 or DataLad)
## Not run:
# Download to cache (default - auto-selects best backend)
on_download("ds000001", files = "participants.tsv")
# Repeat download skips cached files
result <- on_download("ds000001", files = "participants.tsv")
result$skipped  # >= 1 (files already in cache)
# Download to specific directory (bypasses cache)
on_download("ds000001", dest_dir = "~/data/openneuro")
# Download to current working directory
on_download("ds000001", use_cache = FALSE)
# Force re-download of cached files
on_download("ds000001", force = TRUE)
# Use specific backend
on_download("ds000001", backend = "s3")
on_download("ds000001", backend = "https")  # Force HTTPS
# Download specific subjects
on_download("ds000001", subjects = c("sub-01", "sub-02"))
# Download subjects matching pattern
on_download("ds000001", subjects = regex("sub-0[1-5]"))
# Download subjects without derivatives
on_download("ds000001", subjects = c("01", "02"), include_derivatives = FALSE)
## End(Not run)
Downloads fMRIPrep, MRIQC, or other derivative outputs for an OpenNeuro dataset. Supports filtering by subject, output space, and BIDS suffix. Uses the S3 backend (the openneuro-derivatives bucket) and falls back to HTTPS when S3 is unavailable.
on_download_derivatives(
  dataset_id,
  pipeline,
  subjects = NULL,
  space = NULL,
  suffix = NULL,
  dry_run = FALSE,
  dest_dir = NULL,
  use_cache = TRUE,
  quiet = FALSE,
  verbose = FALSE,
  force = FALSE,
  backend = NULL,
  client = NULL
)
dataset_id |
Dataset identifier (e.g., "ds000001"). |
pipeline |
Pipeline name (e.g., "fmriprep", "mriqc"). |
subjects |
Character vector of subject IDs (e.g., |
space |
Character string: output space to filter by (e.g.,
"MNI152NLin2009cAsym", "fsaverage", "T1w"). If |
suffix |
Character vector of BIDS suffixes to filter by (e.g.,
|
dry_run |
If |
dest_dir |
Destination directory. If |
use_cache |
If |
quiet |
If |
verbose |
If |
force |
If |
backend |
Backend to use for downloading: "s3" or "https".
If |
client |
An |
All filters combine with AND logic - a file must match ALL specified
filters to be included. For example, subjects = "sub-01", space = "MNI152NLin2009cAsym"
downloads only sub-01's MNI-space files.
Derivatives are cached in BIDS-compliant structure:
{cache_root}/{dataset_id}/derivatives/{pipeline}/
This keeps derivatives organized alongside raw data while maintaining clear separation by pipeline.
S3 backend is preferred for the openneuro-derivatives bucket as it provides fast parallel sync. HTTPS fallback is used if S3 is unavailable.
Space matching is exact - specify the full space name (e.g.,
"MNI152NLin2009cAsym", not "MNI"). Files without a _space- entity
(native/T1w space per BIDS convention) are always included when
filtering by space.
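The filter rules above (AND combination, exact space matching, and files without a `_space-` entity passing the space filter) can be made concrete with a base R sketch. `filter_derivative_files()` is a hypothetical helper written for this illustration, the file paths are made up, and the way the package parses BIDS entities internally is not documented here.

```r
# Hypothetical helper applying the documented filter rules to file paths
filter_derivative_files <- function(paths, subjects = NULL, space = NULL,
                                    suffix = NULL) {
  name <- basename(paths)
  keep <- rep(TRUE, length(paths))
  if (!is.null(subjects)) {
    # Subject ID is the leading "sub-<label>" entity of the filename
    sub_id <- sub("^(sub-[A-Za-z0-9]+)_.*$", "\\1", name)
    keep <- keep & sub_id %in% subjects
  }
  if (!is.null(space)) {
    has_space <- grepl("_space-", name, fixed = TRUE)
    # Exact match: the label must end at the next "_" or "."
    exact <- grepl(paste0("_space-", space, "(_|\\.)"), name)
    # Files with no space entity are always kept
    keep <- keep & (!has_space | exact)
  }
  if (!is.null(suffix)) {
    # BIDS suffix: last underscore-separated token before the extension
    stem <- sub("\\..*$", "", name)
    keep <- keep & sub("^.*_", "", stem) %in% suffix
  }
  paths[keep]
}

paths <- c(
  "sub-01/func/sub-01_task-rest_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz",
  "sub-01/func/sub-01_task-rest_space-T1w_desc-preproc_bold.nii.gz",
  "sub-01/anat/sub-01_desc-preproc_T1w.nii.gz",
  "sub-02/func/sub-02_task-rest_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz"
)

# Subject AND space: the MNI-space file plus the file with no space entity
two_filters <- filter_derivative_files(
  paths, subjects = "sub-01", space = "MNI152NLin2009cAsym"
)
print(two_filters)

# A partial space name matches no space entity, so only the file without
# a space entity remains
partial <- filter_derivative_files(paths, subjects = "sub-01", space = "MNI")
print(partial)
```

Adding `suffix = "bold"` to the first call would also drop the anatomical file, since every specified filter has to match.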
If dry_run = TRUE, returns a tibble with columns:
Relative path within derivative
File size in bytes
Human-readable size (e.g., "1.2 GB")
Full destination path where file would be downloaded
If dry_run = FALSE, invisibly returns a list with:
Number of files downloaded
Number of files skipped (already cached)
Character vector of failed file names
Total bytes downloaded
Path to destination directory
Backend used for download
on_derivatives() to discover available derivatives,
on_spaces() to discover available output spaces,
on_download() to download raw datasets
## Not run:
# Download all fMRIPrep derivatives for a dataset
on_download_derivatives("ds000001", "fmriprep")
# Download specific subjects
on_download_derivatives("ds000001", "fmriprep", subjects = c("sub-01", "sub-02"))
# Download only MNI-space outputs
on_download_derivatives("ds000001", "fmriprep", space = "MNI152NLin2009cAsym")
# Download only BOLD and mask files
on_download_derivatives("ds000001", "fmriprep", suffix = c("bold", "mask"))
# Preview files without downloading
files <- on_download_derivatives("ds000001", "fmriprep",
  subjects = "sub-01",
  space = "MNI152NLin2009cAsym",
  dry_run = TRUE)
print(files)
# Combine all filters
on_download_derivatives("ds000001", "fmriprep",
  subjects = regex("sub-0[1-5]"),
  space = "MNI152NLin2009cAsym",
  suffix = c("bold", "T1w"))
## End(Not run)
Materializes a lazy handle by downloading the referenced dataset.
If the handle is already in "ready" state, returns it unchanged
unless force = TRUE.
on_fetch(handle, ...)
## S3 method for class 'openneuro_handle'
on_fetch(handle, quiet = FALSE, force = FALSE, ...)
handle |
An object to fetch. For |
... |
Additional arguments passed to methods. |
quiet |
If TRUE, suppress progress output during download. |
force |
If TRUE, re-download even if handle is already "ready". |
The handle with updated state. For openneuro_handle,
returns the handle with state = "ready", path set to the
download location, and fetch_time set to current time.
You must capture the return value! S3 objects have copy semantics:
# CORRECT
handle <- on_fetch(handle)
# WRONG - changes are lost
on_fetch(handle)
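The same behavior can be observed offline with a toy S3 object. The `toy_handle` class and `toy_fetch()` function below are hypothetical and exist only to demonstrate R's copy semantics. They do not reproduce the structure of a real openneuro_handle.

```r
# Toy constructor: a list with a state field and an S3 class attribute
new_toy_handle <- function() {
  structure(list(state = "pending"), class = "toy_handle")
}

# Toy "fetch": modifies its local copy and returns it
toy_fetch <- function(h) {
  h$state <- "ready"
  h
}

h <- new_toy_handle()

# Result discarded: the caller's object is untouched
invisible(toy_fetch(h))
state_without_capture <- h$state

# Result captured: the caller now holds the updated copy
h <- toy_fetch(h)
state_with_capture <- h$state

print(c(without = state_without_capture, with = state_with_capture))
```

The function only ever changes its own copy of the list, so the update reaches the caller solely through the return value.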
on_handle() to create a handle, on_path() to get path.
## Not run:
handle <- on_handle("ds000001", files = "participants.tsv")
handle <- on_fetch(handle)  # Downloads now
handle$state  # "ready"
## End(Not run)
Lists all files in a dataset snapshot. Can list the root directory or
drill into subdirectories using the tree parameter.
on_files(id, tag = NULL, tree = NULL, client = NULL)
id |
Dataset identifier (e.g., "ds000001"). |
tag |
Snapshot version tag (e.g., "1.0.0"). If |
tree |
Subdirectory token for listing nested files. Use the |
client |
An |
OpenNeuro stores datasets using git-annex, where large files are stored
separately from the git repository. The annexed column indicates which
files use this storage method.
To explore a directory structure:
Call on_files() to get the root listing
Filter for directory == TRUE entries
Use the id from a directory to call on_files(tree = id)
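The three steps above generalize to a recursive walk. The sketch below runs offline against a stand-in listing function with made-up entries. `walk_tree()` and `mock_files()` are hypothetical. The `directory` and `id` column names follow the examples in this manual, and the `filename` column name is an assumption.

```r
# Recursively collect file paths. `list_fn` takes a tree token (NULL for
# the root) and returns a data frame with `filename`, `directory`, `id`.
walk_tree <- function(list_fn, tree = NULL, prefix = "") {
  listing <- list_fn(tree)
  paths <- character(0)
  for (i in seq_len(nrow(listing))) {
    path <- paste0(prefix, listing$filename[i])
    if (listing$directory[i]) {
      # Descend, passing the directory's id as the next tree token
      paths <- c(paths, walk_tree(list_fn, listing$id[i], paste0(path, "/")))
    } else {
      paths <- c(paths, path)
    }
  }
  paths
}

# Stand-in for the API: a fixed three-level directory structure
mock_files <- function(tree = NULL) {
  key <- if (is.null(tree)) "root" else tree
  switch(key,
    root = data.frame(filename = c("participants.tsv", "sub-01"),
                      directory = c(FALSE, TRUE),
                      id = c("f1", "t1"), stringsAsFactors = FALSE),
    t1 = data.frame(filename = "anat", directory = TRUE,
                    id = "t2", stringsAsFactors = FALSE),
    t2 = data.frame(filename = "sub-01_T1w.nii.gz", directory = FALSE,
                    id = "f2", stringsAsFactors = FALSE)
  )
}

all_paths <- walk_tree(mock_files)
print(all_paths)
```

With the package installed, a call such as `walk_tree(function(tree) on_files("ds000001", tree = tree))` would be the analogous usage, provided the name column matches the assumption. Each directory costs one API request, so a full walk of a large dataset may be slow.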
A tibble with columns:
Name of the file or directory
File size in bytes (numeric), may be NA for directories
TRUE if this entry is a directory (logical)
TRUE if file is stored in git-annex (logical). Annexed files are typically larger and require special download handling.
Unique identifier for this entry. Pass it as the tree
argument to explore a subdirectory.
List column of direct HTTPS download URLs for the entry (character vector, empty for directories).
Backward-compatible alias of id (the directory tree token).
Returns an empty tibble with the same column structure if the snapshot has no files.
on_snapshots() to list available snapshots
## Not run:
# List root files using latest snapshot
files <- on_files("ds000001")
print(files)
# List files in a specific snapshot
files <- on_files("ds000001", tag = "1.0.0")
# Explore a subdirectory
dirs <- files[files$directory, ]
if (nrow(dirs) > 0) {
  subfiles <- on_files("ds000001", tree = dirs$id[1])
  print(subfiles)
}
# Find all annexed (large) files
annexed_files <- files[files$annexed & !files$directory, ]
## End(Not run)
Creates a lazy handle that references an OpenNeuro dataset without triggering an immediate download. The handle can be fetched later when the data is actually needed.
on_handle(dataset_id, tag = NULL, files = NULL, backend = NULL)
dataset_id |
Dataset identifier (e.g., "ds000001"). |
tag |
Snapshot version tag. If NULL, uses latest snapshot when fetched. |
files |
Character vector of specific files to download when fetched, or a regex pattern. If NULL, downloads all files when fetched. |
backend |
Backend to use when fetching: "datalad", "s3", or "https". If NULL, auto-selects best available backend. |
Handles support a lazy evaluation pattern:
Create handle with on_handle() - no download occurs
Fetch data with on_fetch() - download happens here
Get path with on_path() - returns filesystem path
This is useful for pipelines where dataset references need to be defined early but data should only be downloaded when needed.
An S3 object of class openneuro_handle with state "pending".
S3 objects have copy semantics. You must capture the return value
of on_fetch():
# WRONG - handle not updated
on_fetch(handle)
handle$state  # Still "pending"!
# CORRECT - capture returned handle
handle <- on_fetch(handle)
handle$state  # Now "ready"
on_fetch() to materialize the download, on_path() to get path.
## Not run:
# Create lazy handle - no download yet
handle <- on_handle("ds000001", files = "participants.tsv")
print(handle)  # Shows state: pending
# Fetch when data is needed
handle <- on_fetch(handle)
print(handle)  # Shows state: ready
# Get filesystem path
path <- on_path(handle)
## End(Not run)
Returns the filesystem path for a fetched handle. Raises an error if the handle has not been fetched yet.
on_path(handle)
## S3 method for class 'openneuro_handle'
on_path(handle)
handle |
An object to get the path from. For |
Character string with the filesystem path.
on_handle() to create a handle, on_fetch() to materialize.
## Not run:
handle <- on_handle("ds000001")
handle <- on_fetch(handle)
path <- on_path(handle)
list.files(path)
## End(Not run)
Executes a GraphQL query against the OpenNeuro API. Handles authentication, retry logic, rate limiting, and error handling.
on_request(query, variables = NULL, client = NULL)
query |
A GraphQL query string. |
variables |
A named list of variables to pass to the query. |
client |
An |
The function implements several reliability features:
Automatic retry on transient errors (429, 500, 502, 503)
Rate limiting (10 requests per minute)
User-Agent header for API identification
Bearer token authentication when available
GraphQL errors (returned with HTTP 200 status) are detected and raised
as R errors with class openneuro_api_error.
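Because GraphQL reports failures inside a successful HTTP response, the check has to inspect the parsed body. The sketch below shows one way such a check could raise a classed condition that callers can catch selectively. `check_graphql()` is a hypothetical helper. Only the class name openneuro_api_error comes from the text above, and the `errors`/`message` fields follow the standard GraphQL response shape.

```r
# Hypothetical helper: return `data`, or raise a classed error when the
# parsed response body carries a non-empty `errors` list
check_graphql <- function(body) {
  if (length(body$errors) > 0) {
    msgs <- vapply(body$errors, function(e) e$message, character(1))
    stop(structure(
      class = c("openneuro_api_error", "error", "condition"),
      list(message = paste(msgs, collapse = "; "), call = NULL)
    ))
  }
  body$data
}

# A successful response passes through unchanged
ok <- check_graphql(list(data = list(id = "ds000001")))
print(ok)

# A failed response (made-up message) is caught by its class
caught <- tryCatch(
  check_graphql(list(errors = list(list(message = "Dataset not found")))),
  openneuro_api_error = function(e) conditionMessage(e)
)
print(caught)
```

Catching by class lets calling code separate API-level failures from other errors, for example to skip a missing dataset inside a loop while still stopping on unrelated problems.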
The data field from the GraphQL response.
on_client() for creating client objects
## Not run:
# Execute a simple query
query <- "query { datasets(first: 1) { edges { node { id } } } }"
result <- on_request(query)
## End(Not run)
Searches the OpenNeuro database for datasets. When a text query is provided, uses the search endpoint if available. Otherwise lists datasets with optional filtering.
on_search(
  query = NULL,
  modality = NULL,
  limit = 50,
  all = FALSE,
  client = NULL
)
query |
Text query to search for. Note: The OpenNeuro search API may
have limited availability. If search returns no results, consider using
|
modality |
Filter by modality (e.g., "MRI", "EEG", "MEG", "iEEG", "PET"). Case-insensitive matching is attempted. |
limit |
Maximum number of results to return per page (default 50). |
all |
If |
client |
An |
A tibble with columns:
Dataset identifier (e.g., "ds000001")
Dataset title
Timestamp when dataset was created (POSIXct)
Whether the dataset is publicly accessible (logical)
List of modalities in the dataset
Number of subjects in the dataset
List of tasks in the dataset
Returns an empty tibble with the same column structure if no matches found.
on_dataset() for detailed metadata on a single dataset
## Not run:
# List datasets (most reliable)
results <- on_search(limit = 10)
# Filter by modality
mri_datasets <- on_search(modality = "MRI", limit = 25)
eeg_datasets <- on_search(modality = "EEG", limit = 25)
# Text search (may have limited availability)
results <- on_search("visual cortex", limit = 10)
# Get all datasets (may be slow)
all_datasets <- on_search(all = TRUE)
## End(Not run)
Retrieves all snapshots (versioned releases) for a dataset. Snapshots are immutable versions of the dataset that can be referenced by tag.
on_snapshots(id, client = NULL)
id |
Dataset identifier (e.g., "ds000001"). |
client |
An |
A tibble with columns:
Snapshot version tag (e.g., "1.0.0")
Timestamp when snapshot was created (POSIXct)
Total size of the snapshot in bytes (numeric)
Rows are ordered with most recent snapshot first. Returns an empty tibble with the same column structure if the dataset has no snapshots.
on_files() to list files in a snapshot, on_dataset() for metadata
## Not run:
# List all snapshots for a dataset
snaps <- on_snapshots("ds000001")
print(snaps)
# Get the latest snapshot tag
latest_tag <- snaps$tag[1]
# Calculate total size in GB
snaps$size_gb <- snaps$size / (1024^3)
## End(Not run)
Discovers the available output spaces (MNI152NLin2009cAsym, fsaverage, etc.)
for a derivative dataset. Parses BIDS _space- entity from filenames.
on_spaces(derivative, refresh = FALSE, client = NULL)
derivative |
A single-row tibble from |
refresh |
If |
client |
An |
This function samples derivative files and extracts the _space-<label>
entity from BIDS-formatted filenames. It does NOT infer T1w from files
without a space entity (per BIDS convention, native space files may
omit the space entity).
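Extracting the `_space-<label>` entity is a small regular-expression task. The base R sketch below uses a hypothetical `extract_spaces()` helper and made-up filenames, and it assumes labels are alphanumeric. The package's own parsing and sampling strategy may differ.

```r
# Hypothetical helper: collect the distinct space labels found in filenames
extract_spaces <- function(filenames) {
  # regmatches() returns only the matched pieces, so files without a
  # space entity contribute nothing (no T1w is inferred for them)
  hits <- regmatches(filenames, regexpr("_space-[A-Za-z0-9]+", filenames))
  sort(unique(sub("^_space-", "", hits)))
}

files <- c(
  "sub-01_task-rest_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz",
  "sub-01_task-rest_space-fsaverage_hemi-L_bold.func.gii",
  "sub-01_desc-preproc_T1w.nii.gz",
  "sub-02_task-rest_space-MNI152NLin2009cAsym_desc-brain_mask.nii.gz"
)

spaces <- extract_spaces(files)
print(spaces)
```

The third filename has no space entity and is skipped, so the result holds only the two labels that appear explicitly. The order of the labels depends on the collation rules of the current locale.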
embedded: Uses the OpenNeuro API to list files in the
derivatives/{pipeline}/ directory.
openneuro-derivatives: Uses AWS CLI to list files from the
s3://openneuro-derivatives/ bucket.
Results are cached per-session to minimize API/S3 calls. Use
refresh = TRUE to bypass the cache.
A character vector of space names, sorted alphabetically. Common spaces include:
Volumetric: MNI152NLin2009cAsym, MNI152NLin6Asym, T1w
Surface: fsaverage, fsaverage5, fsaverage6, fsnative
Returns character(0) with a warning if no spaces are found.
on_derivatives() to discover available derivative datasets
## Not run:
# First, get available derivatives for a dataset
derivs <- on_derivatives("ds000102")
print(derivs)
# Then get spaces for the first derivative
spaces <- on_spaces(derivs[1, ])
print(spaces)
# Example output: c("MNI152NLin2009cAsym", "fsaverage")
# Force refresh of cached spaces
spaces <- on_spaces(derivs[1, ], refresh = TRUE)
## End(Not run)
Returns the subject IDs present in a dataset snapshot without downloading any data. This is a metadata-only query using the OpenNeuro GraphQL API.
on_subjects(id, tag = NULL, client = NULL)
id |
Dataset identifier (e.g., "ds000001"). |
tag |
Snapshot version tag (e.g., "1.0.0"). If |
client |
An |
Subject IDs are returned in natural sort order, so "sub-10" comes after "sub-9" rather than after "sub-1".
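Natural ordering can be sketched in base R by sorting on the numeric part of each ID. `natural_sort_subjects()` is a hypothetical helper for illustration, not the package's implementation, and it assumes every ID contains at least one run of digits.

```r
# Hypothetical helper: order subject IDs by their first run of digits
natural_sort_subjects <- function(ids) {
  # Extract the digits ("sub-10" -> "10") and compare them as numbers.
  # IDs without digits would yield NA here; this sketch assumes none exist.
  num <- as.numeric(sub("^\\D*(\\d+).*$", "\\1", ids, perl = TRUE))
  # Ties on the number fall back to the ID string itself
  ids[order(num, ids)]
}

ids <- c("sub-10", "sub-1", "sub-9", "sub-2")
sorted_ids <- natural_sort_subjects(ids)
print(sorted_ids)
```

A plain `sort(ids)` compares the strings character by character, which is why it places "sub-10" directly after "sub-1".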
The n_sessions and n_files columns provide dataset-level context. Per-subject session and file counts are not available from the OpenNeuro API.
A tibble with columns:
The dataset identifier
Subject identifier (e.g., "sub-01")
Number of sessions in the dataset (same for all rows)
Estimated files per subject (same for all rows)
Returns an empty tibble with the same column structure if the dataset has no BIDS subjects (e.g., non-BIDS datasets).
on_files() to list files, on_download() to download data
## Not run:
# List subjects in a dataset
subjects <- on_subjects("ds000001")
print(subjects)
# List subjects in a specific snapshot
subjects <- on_subjects("ds000001", tag = "1.0.0")
# Get subject count
nrow(subjects)
## End(Not run)
Displays styled CLI output showing backend availability and versions.
## S3 method for class 'openneuro_doctor'
print(x, ...)
x |
An |
... |
Additional arguments (ignored). |
x invisibly.
Print Method for OpenNeuro Handle
## S3 method for class 'openneuro_handle'
print(x, ...)
x |
An |
... |
Additional arguments (ignored). |
x invisibly.
Creates a regex pattern object for use with the subjects parameter in
on_download(). Patterns are auto-anchored to match complete subject IDs.
regex(pattern)
pattern |
A single non-empty character string containing a regex pattern. |
A character vector with class c("on_regex", "character").
on_download() for downloading with subject filters
# Match subjects sub-01 through sub-05
regex("sub-0[1-5]")
# Match any subject starting with sub-1
regex("sub-1.*")
## Not run:
# Use in on_download()
on_download("ds000001", subjects = regex("sub-0[1-5]"))
## End(Not run)