| Title: | Lazy Delayed Arrays with Fused Execution |
|---|---|
| Description: | Provides a lightweight delayed array abstraction with tidy-friendly verbs, expression fusion, and pluggable storage backends. |
| Authors: | Bradley Buchsbaum [aut, cre] |
| Maintainer: | Bradley Buchsbaum |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9000 |
| Built: | 2026-05-15 08:59:31 UTC |
| Source: | https://github.com/bbuchsbaum/delarr |
Performs array-style slicing lazily, capturing the indices in the DAG.
For 2D arrays, standard x[i, j] syntax works. For N-d arrays, provide
one index expression per dimension: x[i, j, k, ...].
```r
## S3 method for class 'delarr'
x[..., drop = FALSE]
```

| Argument | Description |
|---|---|
| x | A |
| ... | Index expressions, one per dimension. Missing indices select all. |
| drop | Logical indicating whether to drop dimensions (ignored lazily). |
A delarr containing the slice operation.
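As a rough illustration of "capturing the indices in the DAG", the base-R sketch below records row and column indices instead of copying data. It also shows how slicing an existing slice can be mapped back onto the source. The helpers `lazy_slice()`, `reslice()` and `realise_slice()` are hypothetical and are not part of delarr, whose internal node format may differ.

```r
# Hypothetical lazy-slice node: a sketch, not delarr's internal DAG format.
idx_or_all <- function(idx, n) if (is.null(idx)) seq_len(n) else idx

lazy_slice <- function(source, i = NULL, j = NULL) {
  # Record the requested indices; no data is copied yet.
  list(source = source,
       i = idx_or_all(i, nrow(source)),
       j = idx_or_all(j, ncol(source)))
}

reslice <- function(node, i = NULL, j = NULL) {
  # Slicing a slice maps the new (relative) indices back onto the source.
  node$i <- node$i[idx_or_all(i, length(node$i))]
  node$j <- node$j[idx_or_all(j, length(node$j))]
  node
}

# Only here is the source actually indexed.
realise_slice <- function(node) node$source[node$i, node$j, drop = FALSE]

mat <- matrix(1:20, nrow = 4, ncol = 5)
node <- reslice(lazy_slice(mat, i = 2:4, j = c(1, 3, 5)), i = 1:2, j = 3)
realise_slice(node)
```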
Materialise a delayed matrix as a base matrix
```r
## S3 method for class 'delarr'
as.matrix(x, ...)
```

| Argument | Description |
|---|---|
| x | A |
| ... | Passed to |
A base matrix containing the realised data.
Evaluates a delarr slice-by-slice, materialising manageable chunks for
further processing without realising the full matrix.
```r
block_apply(
  x,
  margin = c("cols", "rows"),
  size = 16384L,
  fn,
  parallel = FALSE,
  workers = NULL
)
```

| Argument | Description |
|---|---|
| x | A |
| margin | Dimension along which to chunk ("cols" or "rows"). |
| size | Approximate chunk size. |
| fn | Function applied to each materialised chunk. |
| parallel | Logical; process chunks in parallel when possible. |
| workers | Number of worker processes for parallel execution. |
A list of results returned by fn.
```r
mat <- matrix(1:20, nrow = 4, ncol = 5)
darr <- delarr(mat)

# Apply function to column chunks
col_maxes <- block_apply(darr, margin = "cols", size = 2L, fn = function(block) {
  apply(block, 2, max)
})
unlist(col_maxes)

# Apply function to row chunks
row_means <- block_apply(darr, margin = "rows", size = 2L, fn = function(block) {
  rowMeans(block)
})
unlist(row_means)
```
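The chunking idea behind block_apply() can be sketched in base R as follows. The sketch assumes that `size` counts columns per chunk when `margin = "cols"`, since the documentation only says "approximate chunk size". `block_apply_cols()` is a hypothetical helper, not the package implementation.

```r
# Sketch of column-chunked evaluation in base R (not delarr's implementation).
chunk_ranges <- function(n, size) {
  # Consecutive index groups of at most `size` elements each.
  unname(split(seq_len(n), ceiling(seq_len(n) / size)))
}

block_apply_cols <- function(mat, size, fn) {
  lapply(chunk_ranges(ncol(mat), size), function(cols) {
    fn(mat[, cols, drop = FALSE])  # only this block is materialised
  })
}

mat <- matrix(1:20, nrow = 4, ncol = 5)
col_maxes <- block_apply_cols(mat, size = 2L, fn = function(block) apply(block, 2, max))
unlist(col_maxes)
```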
Streams column chunks from the backing seed, applying deferred operations
and optional reductions on the fly. By default the result is returned as a
base matrix or vector; alternatively, supply a writer via into to stream
the output elsewhere (e.g., hdf5_writer()).
```r
collect(
  x,
  into = NULL,
  chunk_size = NULL,
  chunk_margin = c("cols", "rows"),
  target_bytes = NULL,
  parallel = FALSE,
  workers = NULL,
  optimize = TRUE
)
```

| Argument | Description |
|---|---|
| x | A |
| into | Optional writer or callback used to receive streamed chunks. |
| chunk_size | Optional chunk size along |
| chunk_margin | Chunking axis for non-reduction collection. |
| target_bytes | Optional memory budget (bytes) used to adapt chunk size. |
| parallel | Logical; attempt parallel chunk execution when safe. |
| workers | Number of worker processes when |
| optimize | Logical; run lightweight DAG optimizations before evaluation. |
A realised matrix/vector, or NULL invisibly when writing to
into.
```r
# Basic materialization
mat <- matrix(1:12, nrow = 3, ncol = 4)
darr <- delarr(mat)
collect(darr)

# Collect after lazy operations
result <- darr |> d_map(~ .x^2) |> collect()
result
```
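A minimal sketch of chunked, fused evaluation follows. It assumes doubles take 8 bytes and that a memory budget such as `target_bytes` is converted into a column-chunk width. `chunk_width()` and `collect_fused()` are hypothetical helpers, and delarr's real heuristic is not documented here. Each queued operation runs on a block before the next block is read, so only one block of intermediate results exists at a time.

```r
# Sketch: derive a column-chunk width from a byte budget, then run a fused
# pipeline on each chunk. The 8-bytes-per-double sizing is an assumption,
# not delarr's documented heuristic.
chunk_width <- function(nrow, ncol, target_bytes) {
  per_col <- nrow * 8  # bytes for one column of doubles
  as.integer(max(1, min(ncol, floor(target_bytes / per_col))))
}

collect_fused <- function(mat, ops, target_bytes) {
  width <- chunk_width(nrow(mat), ncol(mat), target_bytes)
  out <- matrix(NA_real_, nrow(mat), ncol(mat))
  for (s in seq(1L, ncol(mat), by = width)) {
    cols <- s:min(s + width - 1L, ncol(mat))
    block <- mat[, cols, drop = FALSE]
    # Fusion: every queued op runs on this block before the next is read.
    for (op in ops) block <- op(block)
    out[, cols] <- block
  }
  out
}

mat <- matrix(as.numeric(1:12), nrow = 3, ncol = 4)
ops <- list(function(x) x^2, function(x) x + 1)
collect_fused(mat, ops, target_bytes = 48)
```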
Evaluates a delarr pipeline in parallel using shard::shard_map(). This
gives proper multi-process parallelism with shared-memory I/O, including
parallel reductions.
```r
collect_shard(x, workers = NULL, chunk_size = NULL, optimize = TRUE)
```

| Argument | Description |
|---|---|
| x | A |
| workers | Number of worker processes. Defaults to |
| chunk_size | Column chunk size for sharding. If |
| optimize | Logical; run DAG optimizations before evaluation. |
Pipelines that require full-matrix evaluation (row-wise center/scale/zscore/
detrend), paired RHS delarrs (d_map2 with two delarrs), or generic
(user-supplied) reductions automatically fall back to sequential collect().
A materialised matrix or vector.
```r
if (requireNamespace("shard", quietly = TRUE)) {
  old_conn <- getAllConnections()
  mat <- matrix(rnorm(100), 10, 10)
  darr <- delarr_shard(mat)
  result <- collect_shard(darr |> d_map(~ .x^2), workers = 2)
  all.equal(result, mat^2)
  # Close any connections opened during the parallel run
  new_conn <- setdiff(getAllConnections(), old_conn)
  for (con in new_conn) try(close(getConnection(con)), silent = TRUE)
}
```
Generic counterpart to matrixStats::colMeans2(). Methods are provided for
delarr objects, but packages can extend the generic for their own delayed
types.
```r
colMeans2(x, ...)
```

| Argument | Description |
|---|---|
| x | An object for which column means should be computed. |
| ... | Additional arguments passed to methods. |
Typically a numeric vector of column means.
```r
mat <- matrix(1:12, nrow = 3, ncol = 4)
darr <- delarr(mat)

# Compute column means lazily
colMeans2(darr)

# Compare with base R
colMeans(mat)
```
Computes column means lazily via d_reduce(); acts as a drop-in replacement
for matrixStats::colMeans2().
```r
## S3 method for class 'delarr'
colMeans2(x, ..., na.rm = FALSE)
```

| Argument | Description |
|---|---|
| x | A |
| ... | Unused. |
| na.rm | Logical; remove missing values before averaging. |
A numeric vector of column means.
Permute dimensions of a delayed array
```r
d_aperm(x, perm = rev(seq_along(dim(x))), chunk_size = NULL)
```

| Argument | Description |
|---|---|
| x | A |
| perm | A permutation of |
| chunk_size | Optional chunk size used for internal pulls. |
A permuted delarr.
Center a delayed matrix along rows or columns
```r
d_center(x, dim = c("rows", "cols"), axis = NULL, na.rm = FALSE)
```

| Argument | Description |
|---|---|
| x | A |
| dim | Dimension along which to subtract the mean. |
| axis | Integer axis for N-d arrays (alternative to dim). |
| na.rm | Logical; remove missing values when computing the centre. |
A delarr with a deferred centering operation.
```r
mat <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 2, ncol = 3)
darr <- delarr(mat)

# Center rows (subtract row means)
centered_rows <- darr |> d_center(dim = "rows") |> collect()
centered_rows
rowMeans(centered_rows)  # Should be ~0

# Center columns (subtract column means)
centered_cols <- darr |> d_center(dim = "cols") |> collect()
colMeans(centered_cols)  # Should be ~0
```
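Column centering needs only each column's own mean, so column chunks can be centered independently. Row centering needs means that span every column. The base-R sketch below uses a hypothetical `center_cols_chunked()` (not delarr code) to check that chunked column centering matches centering the whole matrix. This is consistent with the note under collect_shard() that row-wise centering falls back to full-matrix evaluation.

```r
# Sketch: each column chunk is centered using only its own column means,
# so no statistics need to be shared between chunks.
center_cols_chunked <- function(mat, width) {
  out <- mat
  for (s in seq(1L, ncol(mat), by = width)) {
    cols <- s:min(s + width - 1L, ncol(mat))
    block <- mat[, cols, drop = FALSE]
    out[, cols] <- sweep(block, 2, colMeans(block))  # subtract column means
  }
  out
}

mat <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 2, ncol = 3)
centered <- center_cols_chunked(mat, width = 2L)
colMeans(centered)
```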
Removes a polynomial trend of the specified degree along the chosen dimension.
```r
d_detrend(x, dim = c("rows", "cols"), axis = NULL, degree = 1L)
```

| Argument | Description |
|---|---|
| x | A |
| dim | Dimension along which to fit the trend. |
| axis | Integer axis for N-d arrays (alternative to dim). |
| degree | Polynomial degree (default 1). |
A delarr with the detrend operation queued.
```r
# Create matrix with linear trend in rows
mat <- matrix(1:12 + rep(1:4, each = 3), nrow = 3, ncol = 4)
darr <- delarr(mat)

# Remove linear trend along rows
detrended <- darr |> d_detrend(dim = "rows", degree = 1L) |> collect()
detrended

# Remove quadratic trend
quad_detrend <- darr |> d_detrend(dim = "rows", degree = 2L) |> collect()
quad_detrend
```
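One standard way to remove a polynomial trend is to regress each row on an intercept and powers of the column index, then keep the residuals. The sketch below does this with `qr()` and `qr.resid()`. `detrend_rows()` is a hypothetical helper, and it is only an assumption that d_detrend() uses the same least-squares formulation. Raw powers become ill-conditioned at high degrees, where an orthogonal basis such as `poly()` would be safer.

```r
# Sketch of polynomial detrending along rows via a QR least-squares fit.
detrend_rows <- function(mat, degree = 1L) {
  t_idx <- seq_len(ncol(mat))
  # Design matrix: intercept plus t, t^2, ..., t^degree.
  X <- cbind(1, outer(t_idx, seq_len(degree), `^`))
  qx <- qr(X)  # factor once, reuse for every row
  # Residuals of each row's fit are the detrended values.
  t(apply(mat, 1, function(y) qr.resid(qx, y)))
}

mat <- matrix(1:12 + rep(1:4, each = 3), nrow = 3, ncol = 4)
detrend_rows(mat, degree = 1L)  # each row is exactly linear, so ~0
```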
Apply an elementwise transformation lazily
```r
d_map(x, f)
```

| Argument | Description |
|---|---|
| x | A |
| f | A function or formula suitable for |
A delarr representing the transformation.
```r
mat <- matrix(1:12, nrow = 3, ncol = 4)
darr <- delarr(mat)

# Apply elementwise transformation with formula
squared <- darr |> d_map(~ .x^2) |> collect()
squared

# Apply with function
logged <- darr |> d_map(log1p) |> collect()
logged
```
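Formulas such as `~ .x^2` are typically turned into functions of `.x`. Consecutive maps can also be fused into a single function so the data are traversed once. The base-R helpers `formula_to_fn()` and `compose_maps()` below are hypothetical. `rlang::as_function()` provides a more complete formula conversion, and delarr may convert formulas differently.

```r
# Hypothetical base-R helper: one-sided formula -> function of .x.
formula_to_fn <- function(f) {
  if (is.function(f)) return(f)
  body_expr <- f[[2]]   # right-hand side of the one-sided formula
  env <- environment(f) # evaluate in the formula's environment
  function(.x) eval(body_expr, list(.x = .x), env)
}

# Fuse several maps into one function applied in a single pass.
compose_maps <- function(...) {
  fns <- lapply(list(...), formula_to_fn)
  function(x) Reduce(function(acc, fn) fn(acc), fns, x)
}

mat <- matrix(1:12, nrow = 3, ncol = 4)
fused <- compose_maps(~ .x^2, log1p)
fused(mat)
```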
Apply a binary elementwise transformation lazily
```r
d_map2(x, y, f)
```

| Argument | Description |
|---|---|
| x | A |
| y | Another |
| f | A function or formula combining two arguments. |
A delarr representing the fused binary operation.
```r
mat1 <- matrix(1:12, nrow = 3, ncol = 4)
mat2 <- matrix(12:1, nrow = 3, ncol = 4)
darr1 <- delarr(mat1)
darr2 <- delarr(mat2)

# Binary operation between two delayed matrices
added <- d_map2(darr1, darr2, ~ .x + .y) |> collect()
added

# Binary operation with scalar
scaled <- d_map2(darr1, 10, ~ .x * .y) |> collect()
scaled
```
Delayed matrix multiplication
```r
d_matmul(x, y, chunk_size = NULL)
```

| Argument | Description |
|---|---|
| x | A |
| y | A |
| chunk_size | Optional chunk size used during block pulls. |

A delarr representing the matrix product x %*% y.
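Each column block of `x %*% y` depends only on `x` and the matching columns of `y`, so the product can be built block by block. The base-R sketch below uses a hypothetical `blocked_matmul()` to show the idea. Treating `chunk_size` as the block width is an assumption.

```r
# Sketch: compute x %*% y one column block of y at a time.
blocked_matmul <- function(x, y, width) {
  stopifnot(ncol(x) == nrow(y))
  out <- matrix(0, nrow(x), ncol(y))
  for (s in seq(1L, ncol(y), by = width)) {
    cols <- s:min(s + width - 1L, ncol(y))
    # Only this block of y is needed for these output columns.
    out[, cols] <- x %*% y[, cols, drop = FALSE]
  }
  out
}

set.seed(1)
x <- matrix(rnorm(12), 3, 4)
y <- matrix(rnorm(20), 4, 5)
blocked_matmul(x, y, width = 2L)
```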
For 2D arrays use dim = "rows" or "cols". For N-d arrays you can
also supply a numeric axis indicating which dimension to collapse.
```r
d_reduce(x, f = base::sum, dim = c("rows", "cols"), axis = NULL, na.rm = FALSE)
```

| Argument | Description |
|---|---|
| x | A |
| f | A reduction function (defaults to base::sum). |
| dim | Dimension to reduce: "rows" or "cols". |
| axis | Integer axis to collapse (alternative to dim). |
| na.rm | Logical; remove missing values while reducing. |
A delarr capturing the reduction.
```r
mat <- matrix(1:12, nrow = 3, ncol = 4)
darr <- delarr(mat)

row_sums <- darr |> d_reduce(sum, dim = "rows") |> collect()
row_sums

col_means <- darr |> d_reduce(mean, dim = "cols") |> collect()
col_means
```
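Some reductions, such as sums, can be assembled from partial results on column chunks; many others cannot. The sketch below splits a single row into three column chunks. Partial sums combine exactly, while a median of per-chunk medians does not. This is one plausible reason, not a documented one, why the collect_shard() notes single out generic (user-supplied) reductions.

```r
# One row of data, streamed as three column chunks.
x <- c(1, 2, 3, 4, 100)
col_chunks <- list(1:2, 3:4, 5)

# Sums are decomposable: summing partial sums gives the full sum.
partial_sums <- vapply(col_chunks, function(cols) sum(x[cols]), numeric(1))
combined_sum <- sum(partial_sums)

# Medians are not: the median of chunk medians differs from the true median.
partial_medians <- vapply(col_chunks, function(cols) median(x[cols]), numeric(1))
median_of_medians <- median(partial_medians)
true_median <- median(x)

c(combined_sum = combined_sum,
  median_of_medians = median_of_medians,
  true_median = true_median)
```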
Run multiple reductions and collect results
```r
d_reduce_many(
  x,
  fns,
  dim = c("rows", "cols"),
  na.rm = FALSE,
  chunk_size = NULL,
  simplify = TRUE
)
```

| Argument | Description |
|---|---|
| x | A |
| fns | A named list of reduction functions. |
| dim | Reduction dimension ("rows" or "cols"). |
| na.rm | Logical; remove missing values in each reducer. |
| chunk_size | Optional chunk size passed to |
| simplify | Logical; combine equal-length outputs into a matrix. |
A named list (or matrix when simplify = TRUE) of reductions.
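The `simplify` behaviour can be pictured with base R: equal-length results from several reducers are bound into one matrix with a named column per reducer. The orientation used here (one column per function) is an assumption, and delarr's simplified output may be laid out differently.

```r
# Sketch of the simplify idea: one row reduction per named function,
# then bind the equal-length results column-wise.
mat <- matrix(c(1, 5, 2, 8, 3, 9), nrow = 2)
fns <- list(sum = sum, mean = mean, max = max)

as_list <- lapply(fns, function(f) apply(mat, 1, f))  # simplify = FALSE
as_matrix <- do.call(cbind, as_list)                   # simplify = TRUE
as_matrix
```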
Scale a delayed matrix along rows or columns
```r
d_scale(
  x,
  dim = c("rows", "cols"),
  axis = NULL,
  center = TRUE,
  scale = TRUE,
  na.rm = FALSE
)
```

| Argument | Description |
|---|---|
| x | A |
| dim | Dimension to scale. |
| axis | Integer axis for N-d arrays (alternative to dim). |
| center | Logical; subtract the mean before scaling. |
| scale | Logical; divide by the standard deviation. |
| na.rm | Logical; remove missing values when computing statistics. |
A delarr with a deferred scaling operation.
```r
mat <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 2, ncol = 3)
darr <- delarr(mat)

# Scale rows (center and divide by SD)
scaled <- darr |> d_scale(dim = "rows") |> collect()
scaled

# Scale without centering
scaled_only <- darr |> d_scale(dim = "rows", center = FALSE) |> collect()
scaled_only
```
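The sketch below follows the argument descriptions for d_scale(): `center` subtracts the row mean, and `scale` divides by the row standard deviation. `scale_rows()` is a hypothetical base-R helper. With `center = FALSE`, `base::scale()` divides by a root-mean-square rather than the standard deviation, so the two agree only when centering is on.

```r
# Sketch of row scaling per the documented center/scale arguments.
scale_rows <- function(mat, center = TRUE, scale = TRUE) {
  out <- mat
  if (center) out <- sweep(out, 1, rowMeans(mat))                # subtract row means
  if (scale) out <- sweep(out, 1, apply(mat, 1, sd), `/`)         # divide by row SDs
  out
}

mat <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 2, ncol = 3)
scale_rows(mat)
```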
Transpose a delayed matrix
```r
d_transpose(x, chunk_size = NULL)
```

| Argument | Description |
|---|---|
| x | A |
| chunk_size | Optional chunk size used for internal pulls. |
A transposed delarr.
Elements failing the predicate are replaced with fill at materialisation
time.
```r
d_where(x, predicate, fill = 0)
```

| Argument | Description |
|---|---|
| x | A |
| predicate | A function or formula returning a logical matrix. |
| fill | Replacement value for elements where the predicate is FALSE. |
A delarr including the mask.
```r
mat <- matrix(c(-1, 2, -3, 4, -5, 6), nrow = 2, ncol = 3)
darr <- delarr(mat)

# Replace negative values with 0
masked <- darr |> d_where(~ .x >= 0, fill = 0) |> collect()
masked

# Replace values below threshold with NA
filtered <- darr |> d_where(~ .x > 1, fill = NA) |> collect()
filtered
```
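Masked replacement can be sketched in base R. The predicate is evaluated to a logical matrix, `fill` is assigned where it is FALSE, and the shape of the input is preserved. `where_fill()` is hypothetical and takes a plain function rather than a formula.

```r
# Sketch of masked replacement that keeps the matrix shape.
where_fill <- function(mat, predicate, fill = 0) {
  keep <- predicate(mat)  # logical matrix, same shape as mat
  out <- mat
  out[!keep] <- fill      # replace elements failing the predicate
  out
}

mat <- matrix(c(-1, 2, -3, 4, -5, 6), nrow = 2, ncol = 3)
where_fill(mat, function(x) x >= 0, fill = 0)
```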
Equivalent to centering and then scaling to unit variance.
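For the row-wise case (dim = "rows") on a matrix with n columns, the transformation can be written as below. The use of the sample standard deviation with an n - 1 denominator, as in R's sd(), is an assumption about delarr's implementation.

$$
z_{ij} = \frac{x_{ij} - \bar{x}_{i}}{s_{i}}, \qquad
\bar{x}_{i} = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad
s_{i} = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij} - \bar{x}_{i}\right)^{2}}
$$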
```r
d_zscore(x, dim = c("rows", "cols"), axis = NULL, na.rm = FALSE)
```

| Argument | Description |
|---|---|
| x | A |
| dim | Dimension over which to compute the z-score. |
| axis | Integer axis for N-d arrays (alternative to dim). |
| na.rm | Logical; remove missing values when computing statistics. |
A delarr with the z-score applied lazily.
```r
mat <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 2, ncol = 3)
darr <- delarr(mat)

# Z-score normalize rows
zscored <- darr |> d_zscore(dim = "rows") |> collect()
zscored

# Row means should be ~0, row SDs should be ~1
rowMeans(zscored)
apply(zscored, 1, sd)
```
Wraps an existing matrix or delarr_seed in the lightweight delayed
pipeline. Matrix inputs are wrapped in a seed that simply slices the source
object, while delarr inputs are returned unchanged.
```r
delarr(x, ...)
```

| Argument | Description |
|---|---|
| x | A base matrix or a |
| ... | Future extensions; currently ignored. |
A delarr object representing the delayed matrix.
```r
# Create a delayed matrix from a regular matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)
darr <- delarr(mat)
darr

# Operations are queued lazily
result <- darr * 2
result

# Materialize with collect()
collect(result)
```
Provides a convenience helper that turns a user-supplied slice function into
a ready-to-use delarr object.
```r
delarr_backend(
  nrow,
  ncol,
  pull,
  chunk_hint = NULL,
  dimnames = NULL,
  begin = NULL,
  end = NULL
)
```

| Argument | Description |
|---|---|
| nrow, ncol | Dimensions of the logical matrix. |
| pull | Function of |
| chunk_hint | Optional preferred chunking metadata. |
| dimnames | Optional dimnames to expose lazily. |
| begin | Optional function invoked before streaming. |
| end | Optional function invoked after streaming. |
A delarr backed by the provided pull function.
```r
# Create a custom backend from a pull function
data <- matrix(1:20, nrow = 4, ncol = 5)
darr <- delarr_backend(
  nrow = 4,
  ncol = 5,
  pull = function(rows = NULL, cols = NULL) {
    rows <- rows %||% seq_len(4)
    cols <- cols %||% seq_len(5)
    data[rows, cols, drop = FALSE]
  }
)
darr

# Use like any delarr
result <- darr |> d_map(~ .x * 2) |> collect()
result
```
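A pull function does not have to read stored data. The sketch below computes values from the requested indices on demand and treats NULL as "all rows" or "all columns", as in the delarr_backend() example. `pull_virtual()` is hypothetical. Passing it as `pull =` to `delarr_backend(nrow = 4, ncol = 5, ...)` assumes the same `pull(rows, cols)` contract that example uses.

```r
# Sketch of a storage-free pull function: values are computed on demand.
n_row <- 4L
n_col <- 5L
pull_virtual <- function(rows = NULL, cols = NULL) {
  if (is.null(rows)) rows <- seq_len(n_row)  # NULL means all rows
  if (is.null(cols)) cols <- seq_len(n_col)  # NULL means all columns
  # Element (i, j) is defined as 10 * i + j; nothing is stored up front.
  outer(rows, cols, function(i, j) 10 * i + j)
}

pull_virtual(rows = 2:3, cols = c(1L, 5L))
```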
Uses hdf5r to lazily read slices from disk on demand.
```r
delarr_hdf5(path, dataset)
```

| Argument | Description |
|---|---|
| path | Path to the HDF5 file. |
| dataset | Name of the dataset within the file. |
A delarr that streams data from the HDF5 dataset.
```r
# Create a temporary HDF5 file
tf <- tempfile(fileext = ".h5")
data <- matrix(1:20, nrow = 4, ncol = 5)

# Write test data
f <- hdf5r::H5File$new(tf, mode = "w")
f$create_dataset("X", robj = data)
f$close_all()

# Load as delayed array
darr <- delarr_hdf5(tf, "X")
darr

# Apply operations and collect
result <- darr |> d_map(~ .x * 2) |> collect()
result

# Clean up
unlink(tf)
```
Create a delayed matrix from an in-memory matrix
```r
delarr_mem(x)
```

| Argument | Description |
|---|---|
| x | A numeric or logical matrix, or an array with at least 2 dimensions. |
A delarr referencing the original object.
```r
# Wrap an in-memory matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)
darr <- delarr_mem(mat)
darr

# Apply operations lazily
result <- darr |> d_center(dim = "rows") |> collect()
result
```
Uses the mmap package to lazily read slices from a binary file on demand. The file must contain raw numeric data in column-major order (R's default).
```r
delarr_mmap(path, nrow, ncol, mode = NULL)
```

| Argument | Description |
|---|---|
| path | Path to the binary file containing matrix data. |
| nrow | Number of rows in the matrix. |
| ncol | Number of columns in the matrix. |
| mode | mmap mode object specifying data type. Default is double(). |
A delarr that streams data from the memory-mapped file.
This backend supports 2D matrices only. For N-d arrays, use
delarr_hdf5() or wrap an in-memory array with delarr().
```r
# Create a binary file with matrix data
mat <- matrix(1:20, nrow = 4, ncol = 5)
tf <- tempfile()
writeBin(as.double(mat), tf)

# Load as delayed array
darr <- delarr_mmap(tf, nrow = 4, ncol = 5)
darr

# Apply operations and collect
result <- darr |> d_map(~ .x * 2) |> collect()
result

# Clean up
unlink(tf)
```
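In column-major storage of doubles (8 bytes each), element (i, j) of an nrow-row matrix begins at byte offset ((j - 1) * nrow + (i - 1)) * 8, so each column is one contiguous run. The base-R sketch below reads a single column with `seek()` and `readBin()` to illustrate the layout. `read_column()` is hypothetical and does not use the mmap package.

```r
# Sketch: read column j of a column-major file of doubles.
read_column <- function(path, nrow, j) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  # Column j starts after (j - 1) full columns of 8-byte doubles.
  seek(con, where = (j - 1) * nrow * 8, origin = "start")
  readBin(con, what = "double", n = nrow)
}

mat <- matrix(1:20, nrow = 4, ncol = 5)
tf <- tempfile()
writeBin(as.double(mat), tf)
col3 <- read_column(tf, nrow = 4, j = 3)
unlink(tf)
col3
```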
Seeds encapsulate storage access for delayed matrices. They define matrix
dimensions and a pull() function that returns materialised slices.
```r
delarr_seed(
  nrow,
  ncol,
  pull,
  chunk_hint = NULL,
  dimnames = NULL,
  begin = NULL,
  end = NULL
)
```

| Argument | Description |
|---|---|
| nrow, ncol | Number of rows and columns. |
| pull | A function accepting |
| chunk_hint | Optional list describing preferred chunk sizes (e.g. |
| dimnames | Optional list of dimnames to expose lazily. |
| begin | Optional function invoked before streaming begins. |
| end | Optional function invoked after streaming completes. |
An object of class delarr_seed.
```r
# Create a custom seed with a pull function
data <- matrix(1:12, nrow = 3, ncol = 4)
seed <- delarr_seed(
  nrow = 3,
  ncol = 4,
  pull = function(rows = NULL, cols = NULL) {
    rows <- rows %||% seq_len(3)
    cols <- cols %||% seq_len(4)
    data[rows, cols, drop = FALSE]
  }
)
seed

# Wrap in delarr() to use with lazy operations
darr <- delarr(seed)
result <- darr |> d_map(~ .x * 2) |> collect()
result
```
Creates a seed for arrays with 2 or more dimensions. The pull function receives a list of per-dimension index vectors and returns the corresponding sub-array.
```r
delarr_seed_nd(
  dims,
  pull,
  chunk_hint = NULL,
  dimnames = NULL,
  begin = NULL,
  end = NULL
)
```

| Argument | Description |
|---|---|
| dims | Integer vector of dimension extents (length >= 2). |
| pull | A function accepting a single argument |
| chunk_hint | Optional list describing preferred chunk sizes. |
| dimnames | Optional list of dimnames (one element per dimension). |
| begin | Optional function invoked before streaming begins. |
| end | Optional function invoked after streaming completes. |
An object of class delarr_seed.
```r
arr <- array(seq_len(24), dim = c(3, 4, 2))
seed <- delarr_seed_nd(
  dims = c(3, 4, 2),
  pull = function(indices) {
    # NULL entries select the whole extent of that dimension
    idx <- lapply(seq_along(dim(arr)), function(k) indices[[k]] %||% seq_len(dim(arr)[k]))
    do.call("[", c(list(arr), idx, list(drop = FALSE)))
  }
)
dim(seed)
```
Wraps a numeric matrix or array into shard's shared memory, returning a
delarr.
The shared ALTREP vector is stored on the seed so that collect_shard()
can reuse it without re-sharing (zero-copy).
```r
delarr_shard(x, backing = "auto")
```

| Argument | Description |
|---|---|
| x | A numeric matrix or array. |
| backing | Backing type passed to |
A delarr backed by shared memory.
```r
if (requireNamespace("shard", quietly = TRUE)) {
  mat <- matrix(rnorm(20), 4, 5)
  darr <- delarr_shard(mat)
  collect(darr)
}
```
Computes the realised dimensions after taking queued slice and reduce operations into account.
```r
## S3 method for class 'delarr'
dim(x)
```

| Argument | Description |
|---|---|
| x | A |
An integer vector of dimension extents.
Dimensions for a delarr_seed
```r
## S3 method for class 'delarr_seed'
dim(x)
```

| Argument | Description |
|---|---|
| x | A |
An integer vector of dimension extents.
Dimension names for a delayed array
```r
## S3 method for class 'delarr'
dimnames(x)
```

| Argument | Description |
|---|---|
| x | A |
A list of per-dimension names or NULL placeholders.
Explain a delayed execution plan
```r
explain(
  x,
  chunk_size = NULL,
  chunk_margin = c("cols", "rows"),
  target_bytes = NULL,
  optimize = TRUE
)
```

| Argument | Description |
|---|---|
| x | A |
| chunk_size | Optional chunk size hint. |
| chunk_margin | Chunking axis for non-reduction materialization. |
| target_bytes | Optional memory budget used for adaptive chunking. |
| optimize | Logical; whether to explain the optimized DAG. |
An object of class delarr_explain.
Creates or extends an HDF5 dataset so that collect(x, into = writer) can
stream column blocks directly to disk without materialising the full matrix
in memory.
```r
hdf5_writer(path, dataset, ncol, chunk = c(128L, 4096L), compression = 4L)
```

| Argument | Description |
|---|---|
| path | Path to the HDF5 file. The file is created if it does not exist. |
| dataset | Name of the dataset to create or update. |
| ncol | Total number of columns that will be written. The writer uses this to size the target dataset up-front. |
| chunk | Integer vector of length two giving the chunk size |
| compression | Gzip compression level (0-9). Use 0 for no compression, higher values for better compression at cost of speed. Default is 4. Use NULL to disable compression entirely. |
A writer object with $write() and $finalize() methods understood
by collect().
```r
# Create source data in a temp HDF5 file
tf_in <- tempfile(fileext = ".h5")
data <- matrix(1:20, nrow = 4, ncol = 5)
f <- hdf5r::H5File$new(tf_in, mode = "w")
f$create_dataset("X", robj = data)
f$close_all()

# Load, transform, and stream to output file
darr <- delarr_hdf5(tf_in, "X")
transformed <- darr |> d_center(dim = "cols")
tf_out <- tempfile(fileext = ".h5")
writer <- hdf5_writer(tf_out, "result", ncol = ncol(transformed), compression = 4L)
collect(transformed, into = writer)

# Verify output
g <- hdf5r::H5File$new(tf_out, mode = "r")
result <- g[["result"]]$read()
g$close_all()
result

# Clean up
unlink(c(tf_in, tf_out))
```
Supports elementwise operations between delayed matrices or between a delayed matrix and scalars/matrices.
```r
## S3 method for class 'delarr'
Ops(e1, e2)
```

| Argument | Description |
|---|---|
| e1, e2 | Operands supplied by R's Ops group generic (arithmetic, comparison, and logical operators). |
A delarr representing the fused operation.
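Ops is R's group generic for arithmetic, comparison, and logical operators. Inside a method, `.Generic` holds the operator name, which lets a method queue operations instead of running them. The class `lazy_sketch` and its helpers below are hypothetical. Unlike delarr's method, they only handle a lazy object on the left with a plain value on the right.

```r
# Minimal lazy wrapper used only for this illustration (hypothetical class).
lazy_vec <- function(x) structure(list(x = x, ops = list()), class = "lazy_sketch")

# Group method: .Generic holds the operator name, e.g. "*" or "+".
Ops.lazy_sketch <- function(e1, e2) {
  if (missing(e2) || !inherits(e1, "lazy_sketch") || inherits(e2, "lazy_sketch")) {
    stop("this sketch only supports `lazy <op> plain value`")
  }
  fn <- get(.Generic, envir = baseenv(), mode = "function")
  force(e2)
  # Queue the operation; nothing is computed yet.
  e1$ops <- c(e1$ops, list(function(v) fn(v, e2)))
  e1
}

# Run the queued operations in order.
realise_lazy <- function(z) Reduce(function(v, f) f(v), z$ops, z$x)

z <- lazy_vec(1:3) * 2 + 1
length(z$ops)
realise_lazy(z)
```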
Applies lightweight algebraic simplifications to reduce unnecessary work
during collect().
```r
optimize_delarr(x)
```

| Argument | Description |
|---|---|
| x | A |
A delarr with an optimized operation list.
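One example of a lightweight algebraic simplification is to fold consecutive scalar multiplications into one and then drop any multiply-by-one. The op-list format and `fold_scalar_mults()` below are hypothetical. The rewrites that optimize_delarr() actually performs are not listed in this documentation.

```r
# Sketch: fold consecutive scalar multiplications, then drop "* 1".
# The op-list format is hypothetical, not delarr's internal DAG.
fold_scalar_mults <- function(ops) {
  out <- list()
  for (op in ops) {
    prev <- if (length(out)) out[[length(out)]] else NULL
    if (op$type == "mul" && !is.null(prev) && prev$type == "mul") {
      out[[length(out)]]$value <- prev$value * op$value  # merge into previous
    } else {
      out[[length(out) + 1L]] <- op
    }
  }
  Filter(function(op) !(op$type == "mul" && op$value == 1), out)
}

ops <- list(list(type = "mul", value = 2),
            list(type = "mul", value = 0.5),
            list(type = "add", value = 3),
            list(type = "mul", value = 4))
fold_scalar_mults(ops)
```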
Pretty-print a delayed matrix
```r
## S3 method for class 'delarr'
print(x, ...)
```

| Argument | Description |
|---|---|
| x | A |
| ... | Unused. |
The original object, invisibly.
Profile collect() runtime

```r
profile_collect(x, reps = 3L, ...)
```

| Argument | Description |
|---|---|
| x | A |
| reps | Number of repetitions. |
| ... | Additional arguments forwarded to |
An object of class delarr_profile.
Simple convenience function to read a matrix from an HDF5 dataset. For
lazy/streaming access, use delarr_hdf5() instead.
```r
read_hdf5(path, dataset)
```

| Argument | Description |
|---|---|
| path | Path to the HDF5 file. |
| dataset | Name of the dataset to read. |
The matrix stored in the dataset.
```r
# Write and read back
mat <- matrix(1:20, nrow = 4, ncol = 5)
tf <- tempfile(fileext = ".h5")
write_hdf5(mat, tf, "X")
read_hdf5(tf, "X")

# Clean up
unlink(tf)
```
Generic counterpart to matrixStats::rowMeans2(). Methods are provided for
delarr objects, but packages can extend the generic for their own delayed
types.
```r
rowMeans2(x, ...)
```

| Argument | Description |
|---|---|
| x | An object for which row means should be computed. |
| ... | Additional arguments passed to methods. |
Typically a numeric vector of row means.
```r
mat <- matrix(1:12, nrow = 3, ncol = 4)
darr <- delarr(mat)

# Compute row means lazily
rowMeans2(darr)

# Compare with base R
rowMeans(mat)
```
Computes row means lazily via d_reduce(); acts as a drop-in replacement for
matrixStats::rowMeans2().
```r
## S3 method for class 'delarr'
rowMeans2(x, ..., na.rm = FALSE)
```

| Argument | Description |
|---|---|
| x | A |
| ... | Unused. |
| na.rm | Logical; remove missing values before averaging. |
A numeric vector of row means.
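Row means can be accumulated while streaming column chunks by keeping running row sums and counts of non-missing values. This lets `na.rm = TRUE` work without holding the full matrix. `row_means_streamed()` is a hypothetical base-R sketch. rowMeans2() is documented as going through d_reduce(), whose internals may differ.

```r
# Sketch: row means from column chunks via running sums and counts.
row_means_streamed <- function(mat, width, na.rm = FALSE) {
  sums <- numeric(nrow(mat))
  counts <- numeric(nrow(mat))
  for (s in seq(1L, ncol(mat), by = width)) {
    block <- mat[, s:min(s + width - 1L, ncol(mat)), drop = FALSE]
    sums <- sums + rowSums(block, na.rm = na.rm)
    # Count only observed values when NAs are being removed.
    counts <- counts + if (na.rm) rowSums(!is.na(block)) else ncol(block)
  }
  sums / counts
}

mat <- matrix(c(1, 2, NA, 4, 5, 6, 7, NA, 9, 10, 11, 12), nrow = 3)
row_means_streamed(mat, width = 2L, na.rm = TRUE)
```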
Creates a writer object backed by shard::buffer() conforming to delarr's
writer protocol. Pass to collect(x, into = shard_writer(...)) to stream
results into shared memory.
```r
shard_writer(nrow, ncol, backing = "auto")
```

| Argument | Description |
|---|---|
| nrow, ncol | Dimensions of the output matrix. |
| backing | Backing type passed to |
A writer list with $write(), $finalize(), $result(), and
$close() methods.
This writer supports 2D matrices only. N-d array collection does not
currently support writer-style into targets.
```r
if (requireNamespace("shard", quietly = TRUE)) {
  mat <- matrix(rnorm(20), 4, 5)
  darr <- delarr(mat)
  w <- shard_writer(4, 5)
  collect(darr |> d_map(~ .x * 2), into = w)
  w$result()
  w$close()
}
```
Simple convenience function to write a matrix to an HDF5 dataset. For
streaming writes during collect(), use hdf5_writer() instead.
```r
write_hdf5(x, path, dataset, compression = 4L)
```

| Argument | Description |
|---|---|
| x | A matrix to write. |
| path | Path to the HDF5 file. Created if it doesn't exist. |
| dataset | Name of the dataset to create. |
| compression | Gzip compression level (0-9), or NULL for no compression. |
The path to the HDF5 file (invisibly).
```r
# Write a matrix to HDF5
mat <- matrix(1:20, nrow = 4, ncol = 5)
tf <- tempfile(fileext = ".h5")
write_hdf5(mat, tf, "X")

# Read it back as a delarr
darr <- delarr_hdf5(tf, "X")
collect(darr)

# Clean up
unlink(tf)
```