Title: | Extensible Data Structures for Multivariate Analysis |
---|---|
Description: | Provides a set of basic and extensible data structures and functions for multivariate analysis, including dimensionality reduction techniques, projection methods, and preprocessing functions. The aim of this package is to offer a flexible and user-friendly framework for multivariate analysis that can be easily extended for custom requirements and specific data analysis tasks. |
Authors: | Bradley Buchsbaum [aut, cre] |
Maintainer: | Bradley Buchsbaum <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2024-11-10 03:44:39 UTC |
Source: | https://github.com/bbuchsbaum/multivarious |
add a pre-processing stage
add_node(x, step, ...)
x |
the processing pipeline |
step |
the pre-processing step to add |
... |
extra args |
a new pre-processing pipeline with the added step
Apply a specified rotation to the fitted model
apply_rotation(x, rotation_matrix, ...)
x |
A model object, possibly created using the |
rotation_matrix |
|
... |
extra args |
A modified object with updated components and scores after applying the specified rotation.
apply a pre-processing transform
apply_transform(x, X, colind, ...)
x |
the pre_processor |
X |
the data matrix |
colind |
column indices |
... |
extra args |
the transformed data
A bi_projector offers a two-way mapping: from samples (rows) to scores, and from variables (columns) to components. One can therefore project a sample (project) from the D-dimensional input space to the d-dimensional subspace, and project a variable (project_vars) from the n-dimensional variable space to the d-dimensional component space. The singular value decomposition is a canonical example of such a two-way mapping.
bi_projector(v, s, sdev, preproc = prep(pass()), classes = NULL, ...)
v |
A matrix of coefficients with dimensions |
s |
The score matrix |
sdev |
The standard deviations of the score matrix |
preproc |
(optional) A pre-processing pipeline, default is prep(pass()) |
classes |
(optional) A character vector specifying the class attributes of the object, default is NULL |
... |
Extra arguments to be stored in the |
A bi_projector object
X <- matrix(rnorm(200), 10, 20)
svdfit <- svd(X)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev = svdfit$d)
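The two-way mapping can be illustrated with base R alone. The sketch below does not call bi_projector; it assumes an uncentered SVD and uses illustrative variable names (row_scores, col_scores) to show how samples map to scores and how variables map into the component space.

```r
set.seed(1)
X <- matrix(rnorm(200), 10, 20)   # 10 samples (rows), 20 variables (columns)
sv <- svd(X)                      # X = U D V'

# Row (sample) mapping: project each sample onto the components V.
row_scores <- X %*% sv$v          # 10 x 10, equals U D

# Column (variable) mapping: project each variable onto the left singular vectors U.
col_scores <- t(X) %*% sv$u       # 20 x 10, equals V D
```

Both mappings land in the same d-dimensional space, which is what makes the SVD a natural instance of a two-way projector.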
This function combines a set of bi_projector fits into a single bi_projector instance. The new instance's weights and associated scores are obtained by concatenating the weights and scores of the input fits.
bi_projector_union(fits, outer_block_indices = NULL)
fits |
A list of |
outer_block_indices |
An optional list of indices for the outer blocks. If not provided, the function will compute the indices based on the dimensions of the input fits. |
A new bi_projector instance with concatenated weights, scores, and other properties from the input bi_projector instances.
X1 <- matrix(rnorm(5*5), 5, 5)
X2 <- matrix(rnorm(5*5), 5, 5)
bpu <- bi_projector_union(list(pca(X1), pca(X2)))
Extract the list of indices associated with each block in a multiblock object.
block_indices(x, ...)
x |
the object |
... |
extra args |
a list of block indices
extract the lengths of each block in a multiblock object
block_lengths(x)
x |
the object |
the block lengths
Perform bootstrap resampling on a multivariate model to estimate the variability of components and scores.
bootstrap(x, nboot, ...)
x |
A fitted model object, such as a |
nboot |
An integer specifying the number of bootstrap resamples to perform. |
... |
Additional arguments to be passed to the specific model implementation of |
A list containing the bootstrap resampled components and scores for the model.
Perform bootstrap resampling for Principal Component Analysis (PCA) to estimate component and score variability.
## S3 method for class 'pca'
bootstrap(x, nboot = 100, k = ncomp(x), ...)
x |
A fitted PCA model object. |
nboot |
The number of bootstrap resamples (default: 100). |
k |
The number of components to bootstrap (default: all components in the fitted PCA model). |
... |
Additional arguments to be passed to the specific model implementation of |
A list containing bootstrap z-scores for the loadings (zboot_loadings) and scores (zboot_scores).
Fisher, Aaron, Brian Caffo, Brian Schwartz, and Vadim Zipunnikov. 2016. "Fast, Exact Bootstrap Principal Component Analysis for P > 1 Million." Journal of the American Statistical Association 111 (514): 846-60.
X <- matrix(rnorm(10*100), 10, 100)
x <- pca(X, ncomp=9)
bootstrap_results <- bootstrap(x)
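To make the idea of bootstrap z-scores concrete, here is a naive base R sketch. It is not the fast algorithm of Fisher et al. and not the package's implementation. It assumes that a z-score is the bootstrap mean of a loading divided by its bootstrap standard deviation, and it only resolves sign flips (not component swaps). The helper pca_loadings is hypothetical.

```r
set.seed(1)
X <- matrix(rnorm(30 * 6), 30, 6)
k <- 2        # number of components to bootstrap
nboot <- 50   # number of bootstrap resamples

# Hypothetical helper: loadings of a centred PCA computed via the SVD.
pca_loadings <- function(M, k) svd(scale(M, scale = FALSE))$v[, 1:k, drop = FALSE]
V0 <- pca_loadings(X, k)

boot <- array(NA_real_, dim = c(ncol(X), k, nboot))
for (b in seq_len(nboot)) {
  idx <- sample(nrow(X), replace = TRUE)        # resample rows with replacement
  Vb <- pca_loadings(X[idx, , drop = FALSE], k)
  # Singular vectors are only defined up to sign: align with the original fit.
  flip <- sign(colSums(Vb * V0))
  flip[flip == 0] <- 1
  boot[, , b] <- sweep(Vb, 2, flip, "*")
}

# Assumed definition of a z-score: bootstrap mean over bootstrap standard deviation.
z_loadings <- apply(boot, c(1, 2), mean) / apply(boot, c(1, 2), sd)
```

Loadings with a large absolute z-score are stable across resamples, while values near zero indicate loadings that the data do not pin down.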
remove mean of all columns in matrix
center(preproc = prepper(), cmeans = NULL)
preproc |
the pre-processing pipeline |
cmeans |
optional vector of precomputed column means |
a prepper list
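What a centering step has to remember can be sketched in base R. This does not use the package's prepper machinery. It only assumes that column means are learned once (or supplied, as with cmeans) and then reused for any new data.

```r
set.seed(1)
X_train <- matrix(rnorm(50, mean = 5), 10, 5)
X_new   <- matrix(rnorm(15, mean = 5), 3, 5)

cmeans <- colMeans(X_train)                   # learned once from the training data

X_train_c <- sweep(X_train, 2, cmeans, "-")   # centred training data
X_new_c   <- sweep(X_new,   2, cmeans, "-")   # new data centred with the SAME means

# Reversing the transform adds the stored means back.
X_new_back <- sweep(X_new_c, 2, cmeans, "+")
```

Reusing the training means matters because centering new data with its own means would place it in a different coordinate system from the fitted model.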
Create a classifier from a given model object (e.g., projector). This classifier can generate predictions for new data points.
classifier(x, colind, ...)
x |
A model object, such as a |
colind |
Optional vector of column indices used for prediction. If not provided, all columns will be used. |
... |
Additional arguments to be passed to the specific model implementation of |
A classifier function that can be used to make predictions on new data points.
Constructs a k-NN classifier for a discriminant projector, with an option to use a subset of the components.
## S3 method for class 'discriminant_projector'
classifier(x, colind = NULL, knn = 1, ...)
x |
the discriminant projector object |
colind |
an optional vector specifying the column indices of the components to use for prediction (NULL by default) |
knn |
the number of nearest neighbors to consider in the k-NN classifier (default is 1) |
... |
extra arguments |
a classifier object
Constructs a classifier for a multiblock bi-projector model that can generate predictions for new data points.
## S3 method for class 'multiblock_biprojector'
classifier(x, colind = NULL, labels, new_data = NULL, block = NULL, knn = 1, ...)
x |
A fitted multiblock bi-projector model object. |
colind |
An optional vector of column indices used for prediction (default: NULL). |
labels |
A factor or vector of class labels for the training data. |
new_data |
An optional data matrix for which to generate predictions (default: NULL). |
block |
An optional block index for prediction (default: NULL). |
knn |
The number of nearest neighbors to consider in the classifier (default: 1). |
... |
Additional arguments to be passed to the specific model implementation of |
A multiblock classifier object.
Other classifier: classifier.multiblock_biprojector is documented above; see also classifier.projector()
Create a classifier from a projector.
## S3 method for class 'projector'
classifier(x, colind = NULL, labels, new_data, knn = 1, ...)
x |
A model object, such as a |
colind |
Optional vector of column indices used for prediction. If not provided, all columns will be used. |
labels |
the labels associated with the rows of the projected data (see |
new_data |
reference data associated with |
knn |
the number of nearest neighbors to use when classifying a new point. |
... |
Additional arguments to be passed to the specific model implementation of |
a classifier object
Other classifier: classifier.multiblock_biprojector()
data(iris)
X <- iris[,1:4]
pcres <- pca(as.matrix(X),2)
cfier <- classifier(pcres, labels=iris[,5], new_data=as.matrix(iris[,1:4]))
p <- predict(cfier, as.matrix(iris[,1:4]))
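The logic of a projection-based nearest-neighbour classifier can be sketched in base R. The code below is an illustration, not the package's implementation. It assumes a centred two-component PCA, Euclidean distance, and knn = 1. The helper predict_1nn is hypothetical.

```r
data(iris)
X <- as.matrix(iris[, 1:4])
labels <- iris[, 5]

mu <- colMeans(X)
V <- svd(sweep(X, 2, mu))$v[, 1:2]       # two principal directions
ref_scores <- sweep(X, 2, mu) %*% V      # reference scores with known labels

# Hypothetical 1-nearest-neighbour predictor in score space.
predict_1nn <- function(new_data) {
  new_scores <- sweep(new_data, 2, mu) %*% V
  nearest <- apply(new_scores, 1, function(s) {
    # Squared Euclidean distance from s to every reference score.
    which.min(colSums((t(ref_scores) - s)^2))
  })
  labels[nearest]
}

pred <- predict_1nn(X)
accuracy <- mean(pred == labels)
```

New points are first mapped into the same low-dimensional space as the reference data, and only then compared with the labelled reference scores.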
Extract coefficients from a cross_projector object
## S3 method for class 'cross_projector'
coef(object, source = c("X", "Y"), ...)
object |
the model fit |
source |
the source of the data (X or Y block), either "X" or "Y" |
... |
extra args |
the coefficients
normalize each column by a scale factor.
colscale(preproc = prepper(), type = c("unit", "z", "weights"), weights = NULL)
preproc |
the pre-processing pipeline |
type |
the kind of scaling, |
weights |
optional precomputed weights |
a prepper list
Extract the component matrix of a fit.
components(x, ...)
x |
the model fit |
... |
extra args |
the component matrix
Combine two projector models into a single projector by sequentially applying the first projector and then the second projector.
compose_projector(x, y, ...)
x |
A fitted model object (e.g., |
y |
A second fitted model object (e.g., |
... |
Additional arguments to be passed to the specific model implementation of |
A new projector object representing the composed projector, which can be used to project data onto the combined subspace.
Compose a sequence of projector objects in forward order. This function allows the composition of multiple projectors, applying them sequentially to the input data.
compose_projectors(...)
... |
The sequence of |
A composed_projector object that extends the function class, allowing the composed projectors to be applied to input data.
# Create two PCA projectors and compose them
X <- matrix(rnorm(20*20), 20, 20)
pca1 <- pca(X, ncomp=10)
X2 <- scores(pca1)
pca2 <- pca(X2, ncomp=4)
# Compose the PCA projectors
cproj <- compose_projectors(pca1, pca2)
# Ensure the output of the composed projectors has the expected dimensions
stopifnot(ncol(cproj(X)) == 4)
# Check that the composed projectors work as expected
all.equal(project(cproj, X), cproj(X))
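Sequential composition amounts to function composition of the individual projections. The base R sketch below mirrors the example above without using the package. It assumes each stage is a centred PCA, and the names project1, project2 and composed are illustrative.

```r
set.seed(1)
X <- matrix(rnorm(20 * 20), 20, 20)

# Stage 1: centre, then keep 10 principal directions.
m1 <- colMeans(X)
V1 <- svd(sweep(X, 2, m1))$v[, 1:10]
S1 <- sweep(X, 2, m1) %*% V1

# Stage 2: centre the stage-1 scores, then keep 4 directions.
m2 <- colMeans(S1)
V2 <- svd(sweep(S1, 2, m2))$v[, 1:4]

project1 <- function(Z) sweep(Z, 2, m1) %*% V1
project2 <- function(S) sweep(S, 2, m2) %*% V2
composed <- function(Z) project2(project1(Z))   # apply stage 1, then stage 2

out <- composed(X)
```

Because the stage-1 scores already have zero column means, the composed map here collapses to a single 20 x 4 matrix, V1 %*% V2, applied to the centred input.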
concatenate a sequence of pre-processors, each previously applied to a block of data.
concat_pre_processors(preprocs, block_indices)
preprocs |
a list of initialized |
block_indices |
a list of block indices where each vector in the list contains the global indices of the variables. |
a new prepper object
p1 <- center() |> prep()
p2 <- center() |> prep()
x1 <- rbind(1:10, 2:11)
x2 <- rbind(1:10, 2:11)
p1a <- init_transform(p1,x1)
p2a <- init_transform(p2,x2)
clist <- concat_pre_processors(list(p1,p2), list(1:10, 11:20))
t1 <- apply_transform(clist, cbind(x1,x2))
t2 <- apply_transform(clist, cbind(x1,x2[,1:5]), colind=1:15)
Convert between data representations in a multiblock decomposition/alignment by projecting the input data onto a common latent space and then reconstructing it in the target domain.
convert_domain(x, new_data, i, j, comp, rowind, colind, ...)
x |
The model fit, typically an object of a class that implements a |
new_data |
The data to transfer, with the same number of rows as the source data block |
i |
The index of the source data block |
j |
The index of the destination data block |
comp |
A vector of component indices to use in the reconstruction |
rowind |
Optional set of row indices to transfer (default: all rows) |
colind |
Optional set of column indices to transfer (default: all columns) |
... |
Additional arguments passed to the underlying |
A matrix or data frame representing the transferred data in the target domain
project_block for projecting a single block of data onto the subspace
A projector that reduces two blocks of data, X and Y, yielding a pair of weights for each component. This structure can be used, for example, to store weights derived from canonical correlation analysis.
cross_projector(vx, vy, preproc_x = prep(pass()), preproc_y = prep(pass()), ..., classes = NULL)
vx |
the X coefficients |
vy |
the Y coefficients |
preproc_x |
the X pre-processor |
preproc_y |
the Y pre-processor |
... |
extra parameters or results to store |
classes |
additional class names |
This class extends projector, and therefore basic operations such as project, shape, reprocess, and coef work, but by default it is assumed that the X block is primary. To access Y block operations, an additional argument source must be supplied to the relevant functions, e.g., coef(fit, source = "Y").
a cross_projector object
# Create two scaled matrices X and Y
X <- scale(matrix(rnorm(10 * 5), 10, 5))
Y <- scale(matrix(rnorm(10 * 5), 10, 5))
# Perform canonical correlation analysis on X and Y
cres <- cancor(X, Y)
sx <- X %*% cres$xcoef
sy <- Y %*% cres$ycoef
# Create a cross_projector object using the canonical correlation analysis results
canfit <- cross_projector(cres$xcoef, cres$ycoef, cor = cres$cor, sx = sx, sy = sy, classes = "cancor")
A discriminant_projector is an instance that extends bi_projector with a projection that maximizes class separation. This can be useful for dimensionality reduction techniques that take class labels into account, such as Linear Discriminant Analysis (LDA).
discriminant_projector(v, s, sdev, preproc = prep(pass()), labels, classes = NULL, ...)
v |
A matrix of coefficients with dimensions |
s |
The score matrix |
sdev |
The standard deviations of the score matrix |
preproc |
(optional) A pre-processing pipeline, default is prep(pass()) |
labels |
A factor or character vector of class labels corresponding to the rows of the score matrix |
classes |
(optional) A character vector specifying the class attributes of the object, default is NULL |
... |
Extra arguments to be stored in the |
A discriminant_projector object.
bi_projector
# Simulate data and labels
set.seed(123)
X <- matrix(rnorm(100 * 10), 100, 10)
labels <- factor(rep(1:2, each = 50))
# Perform LDA and create a discriminant projector
lda_fit <- MASS::lda(X, labels)
dp <- discriminant_projector(lda_fit$scaling, X %*% lda_fit$scaling, sdev = lda_fit$svd, labels = labels)
Get a fresh pre-processing node cleared of any cached data
fresh(x, ...)
x |
the processing pipeline |
... |
extra args |
a fresh pre-processing pipeline
This function computes group means for each factor level of Y in the provided data matrix X.
group_means(Y, X)
Y |
a vector of labels to compute means over disjoint sets |
X |
a data matrix from which to compute means |
a matrix with row names corresponding to factor levels of Y and column-wise means for each factor level
# Example data
X <- matrix(rnorm(50), 10, 5)
Y <- factor(rep(1:2, each = 5))
# Compute group means
gm <- group_means(Y, X)
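For reference, the same quantity can be computed with base R only. This is a sketch of what "group means" denotes here, not necessarily how group_means is implemented. The name gm_base is illustrative.

```r
set.seed(1)
X <- matrix(rnorm(50), 10, 5)
Y <- factor(rep(1:2, each = 5))

# Sum the rows within each level of Y, then divide by the level counts.
# The counts vector has one entry per row of the result, so recycling
# divides row i by the size of group i.
gm_base <- rowsum(X, Y) / as.vector(table(Y))
```

The result has one row per factor level and one column per variable, matching the return value described above.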
Return the inverse projection matrix, which can be used to map back to data space. If the component matrix is orthogonal, then the inverse projection is the transpose of the component matrix.
inverse_projection(x, ...)
x |
The model fit. |
... |
Extra arguments. |
The inverse projection matrix.
project for projecting data onto the subspace.
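The claim about orthogonal component matrices can be checked in base R. The sketch below does not use the package. It assumes centred data and, for the non-orthogonal case, uses the least-squares pseudo-inverse (W'W)^{-1} W' as the inverse projection. The mixing matrix is an arbitrary illustrative choice.

```r
set.seed(1)
X <- scale(matrix(rnorm(10 * 6), 10, 6), scale = FALSE)   # centred data
V <- svd(X)$v[, 1:3]                # orthonormal components (6 x 3)

scores   <- X %*% V                 # forward projection
inv_orth <- t(V)                    # inverse projection when V is orthonormal

# A non-orthogonal component matrix spanning the same subspace.
W <- V %*% matrix(c(2, 0, 0, 1, 1, 0, 0, 0, 3), 3, 3)   # invertible mixing
inv_general <- solve(crossprod(W), t(W))                # (W'W)^{-1} W'

X_hat  <- scores %*% inv_orth              # rank-3 reconstruction via transpose
X_hat2 <- (X %*% W) %*% inv_general        # same reconstruction via pseudo-inverse
```

Both routes reconstruct the same rank-3 approximation of X, because V and W span the same subspace.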
test whether components are orthogonal
is_orthogonal(x)
x |
the object |
a logical value indicating whether the transformation is orthogonal
Constructs a multiblock bi-projector using the given component matrix (v), score matrix (s), singular values (sdev), a preprocessing function, and a list of block indices. This allows for the projection of multiblock data, where each block represents a different set of variables or features, with two-way mapping from samples to scores and from variables to components.
multiblock_biprojector(v, s, sdev, preproc = prep(pass()), ..., block_indices, classes = NULL)
v |
A matrix of components with dimensions |
s |
A matrix of scores. |
sdev |
A numeric vector of singular values. |
preproc |
A pre-processing function for the data (default is a pass-through with |
... |
Extra arguments. |
block_indices |
A list of numeric vectors specifying the indices of each data block. |
classes |
(optional) A character vector specifying the class attributes of the object, default is NULL. |
A multiblock_biprojector object.
bi_projector, multiblock_projector
Constructs a multiblock projector using the given component matrix (v), a preprocessing function, and a list of block indices. This allows for the projection of multiblock data, where each block represents a different set of variables or features.
multiblock_projector(v, preproc = prep(pass()), ..., block_indices, classes = NULL)
v |
A matrix of components with dimensions |
preproc |
A pre-processing function for the data (default is a pass-through with |
... |
Extra arguments. |
block_indices |
A list of numeric vectors specifying the indices of each data block. |
classes |
(optional) A character vector specifying the class attributes of the object, default is NULL. |
A multiblock_projector object.
projector
# Generate some example data
X1 <- matrix(rnorm(10 * 5), 10, 5)
X2 <- matrix(rnorm(10 * 5), 10, 5)
X <- cbind(X1, X2)
# Compute PCA on the combined data
pc <- pca(X, ncomp = 8)
# Create a multiblock projector using PCA components and block indices
mb_proj <- multiblock_projector(pc$v, block_indices = list(1:5, 6:10))
# Project the multiblock data using the multiblock projector
mb_scores <- project(mb_proj, X)
The number of data blocks in a multiblock element
nblocks(x)
x |
the object |
the number of blocks
This function returns the total number of components in the fitted model.
ncomp(x)
x |
A fitted model object. |
The number of components in the fitted model.
# Example using the svd_wrapper function
data(iris)
X <- iris[, 1:4]
fit <- svd_wrapper(X, ncomp = 3, preproc = center(), method = "base")
ncomp(fit) # Should return 3
Approximate the embedding of a new data point using the Nystrom method, which is particularly useful for large datasets and data-dependent embedding spaces, such as multidimensional scaling (MDS).
nystrom_embedding(new_data, landmark_data, kernel_function, eigenvectors, eigenvalues, ...)
new_data |
A matrix or data frame containing the new data points to be projected. |
landmark_data |
A matrix or data frame containing the landmark data points used for approximation. |
kernel_function |
A function used to compute the kernel matrix (e.g., a distance function for MDS). |
eigenvectors |
A matrix containing the eigenvectors obtained from the eigendecomposition of the kernel matrix between the landmark points. |
eigenvalues |
A vector containing the eigenvalues obtained from the eigendecomposition of the kernel matrix between the landmark points. |
... |
Additional arguments passed to the kernel_function. |
A matrix containing the approximate embedding of the new_data in the data-dependent space.
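A generic Nystrom extension can be sketched in base R. This is not the package's implementation, and its scaling conventions may differ. The sketch assumes a Gaussian RBF kernel (the helper rbf_kernel is hypothetical) and the usual formula k(new, landmarks) U diag(1 / sqrt(lambda)).

```r
set.seed(1)
landmark_data <- matrix(rnorm(30 * 3), 30, 3)
new_data      <- matrix(rnorm(5 * 3), 5, 3)

# Hypothetical kernel: Gaussian RBF between the rows of A and the rows of B.
rbf_kernel <- function(A, B, sigma = 1) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-pmax(d2, 0) / (2 * sigma^2))
}

# Eigendecomposition of the kernel matrix between the landmark points.
K   <- rbf_kernel(landmark_data, landmark_data)
eig <- eigen(K, symmetric = TRUE)
d   <- 2
U   <- eig$vectors[, 1:d]
lam <- eig$values[1:d]

# Nystrom extension: kernel against the landmarks, mapped through U / sqrt(lam).
nystrom_sketch <- function(Z) {
  rbf_kernel(Z, landmark_data) %*% U %*% diag(1 / sqrt(lam), d)
}

emb_new       <- nystrom_sketch(new_data)        # 5 x 2 approximate embedding
emb_landmarks <- nystrom_sketch(landmark_data)   # reproduces U %*% diag(sqrt(lam))
```

Applied to the landmarks themselves, the extension reproduces their original embedding. That property is what justifies using it for points that were not part of the eigendecomposition.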
Compute the inverse projection of a columnwise subset of the component matrix (e.g., a sub-block). Even when the full component matrix is orthogonal, there is no guarantee that the partial component matrix is orthogonal.
partial_inverse_projection(x, colind, ...)
x |
A fitted model object, such as a |
colind |
A numeric vector specifying the column indices of the component matrix to consider for the partial inverse projection. |
... |
Additional arguments to be passed to the specific model implementation of |
A matrix representing the partial inverse projection.
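The loss of orthogonality in a sub-block is easy to demonstrate in base R. The sketch below does not call the package. It assumes the partial inverse is taken as the Moore-Penrose pseudo-inverse of the selected rows of the component matrix, which may differ from the package's convention.

```r
set.seed(1)
X <- matrix(rnorm(20 * 8), 20, 8)
V <- svd(X)$v[, 1:3]                 # full component matrix, orthonormal columns
colind <- 1:4                        # sub-block of variables
V_sub <- V[colind, , drop = FALSE]   # 4 x 3 partial component matrix

full_gram <- crossprod(V)            # identity: the full matrix is orthonormal
sub_gram  <- crossprod(V_sub)        # generally NOT the identity

# Moore-Penrose pseudo-inverse of the sub-block, computed from its SVD.
sv <- svd(V_sub)
V_sub_pinv <- sv$v %*% diag(1 / sv$d, length(sv$d)) %*% t(sv$u)   # 3 x 4
```

Because sub_gram is not the identity, the transpose of V_sub is not a valid inverse, so a pseudo-inverse (or an equivalent least-squares solve) is needed instead.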
Project a selected subset of column indices onto the subspace. This function allows for the projection of new data onto a lower-dimensional space using only a subset of the variables, as specified by the column indices.
partial_project(x, new_data, colind)
x |
The model fit, typically an object of class |
new_data |
A matrix or vector of new observations with a subset of columns equal to length of |
colind |
A numeric vector of column indices to select in the projection matrix. These indices correspond to the variables used for the partial projection |
A matrix or vector of the partially projected observations, where rows represent observations and columns represent the lower-dimensional space
bi_projector for an example of a class that implements a partial_project method
# Example with the bi_projector class
X <- matrix(rnorm(10*20), 10, 20)
svdfit <- svd(X)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev=svdfit$d)
# Partially project new_data onto the same subspace as the original data
# using only the first 10 variables
new_data <- matrix(rnorm(5*20), 5, 20)
colind <- 1:10
partially_projected_data <- partial_project(p, new_data[,colind], colind)
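In its simplest form, a partial projection multiplies the selected columns of the data by the matching rows of the component matrix. The base R sketch below shows only that simplest form. Whether the package additionally rescales the result or solves a least-squares problem is not stated here, so this should be read as an assumption rather than as the package's behaviour.

```r
set.seed(1)
X <- matrix(rnorm(10 * 20), 10, 20)
V <- svd(X)$v                          # 20 x 10 component matrix
new_data <- matrix(rnorm(5 * 20), 5, 20)
colind <- 1:10

# Full projection uses every variable.
full_scores <- new_data %*% V

# Partial projection: selected columns times the matching rows of V.
partial_scores <- new_data[, colind] %*% V[colind, , drop = FALSE]

# Contribution of the variables that were left out.
rest_scores <- new_data[, -colind] %*% V[-colind, , drop = FALSE]
```

Under this convention the full scores split additively into the contributions of the selected and the omitted variables.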
Create a new projector instance restricted to a subset of input columns. This function allows for the generation of a new projection object that focuses only on the specified columns, enabling the projection of data using a limited set of variables.
partial_projector(x, colind, ...)
x |
The original |
colind |
A numeric vector of column indices to select in the projection matrix. These indices correspond to the variables used for the partial projector |
... |
Additional arguments passed to the underlying |
A new projector instance, with the same class as the original object, that is restricted to the specified subset of input columns
bi_projector for an example of a class that implements a partial_projector method
# Example with the bi_projector class
X <- matrix(rnorm(10*20), 10, 20)
svdfit <- svd(X)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev=svdfit$d)
# Create a partial projector using only the first 10 variables
colind <- 1:10
partial_p <- partial_projector(p, colind)
Construct a partial_projector from a projector instance.
## S3 method for class 'projector'
partial_projector(x, colind, ...)
x |
The original |
colind |
A numeric vector of column indices to select in the projection matrix. These indices correspond to the variables used for the partial projector |
... |
Additional arguments passed to the underlying |
A partial_projector instance
X <- matrix(rnorm(10*10), 10, 10)
pfit <- pca(X, ncomp=9)
proj <- project(pfit, X)
pp <- partial_projector(pfit, 1:5)
pass simply passes its data through the chain
pass(preproc = prepper())
preproc |
the pre-processing pipeline |
a prepper list
Compute the directions of maximal variance in a data matrix using the Singular Value Decomposition (SVD).
pca(X, ncomp = min(dim(X)), preproc = center(), method = c("fast", "base", "irlba", "propack", "rsvd", "svds"), ...)
X |
The data matrix. |
ncomp |
The number of requested components to estimate (default is the minimum dimension of the data matrix). |
preproc |
The pre-processing function to apply to the data matrix (default is centering). |
method |
The SVD method to use, passed to |
... |
Extra arguments to send to |
A bi_projector object containing the PCA results.
svd_wrapper for details on SVD methods.
data(iris)
X <- as.matrix(iris[, 1:4])
res <- pca(X, ncomp = 4)
tres <- truncate(res, 3)
Estimate confidence intervals for model parameters using permutation testing.
perm_ci(x, X, nperm, ...)
x |
A model fit object. |
X |
The original data matrix used to fit the model. |
nperm |
The number of permutations to perform for the confidence interval estimation. |
... |
Additional arguments to be passed to the specific model implementation of |
A list containing the estimated lower and upper bounds of the confidence intervals for model parameters.
predict with a classifier object
## S3 method for class 'classifier'
predict(object, new_data, ncomp = NULL, colind = NULL, metric = c("cosine", "euclidean"), ...)
object |
the model fit |
new_data |
new data to predict on |
ncomp |
the number of components to use |
colind |
the column indices to select in the projection matrix |
metric |
the similarity metric ("euclidean" or "cosine") |
... |
additional arguments to projection function |
a list with the predicted class and probabilities
prepare a dataset by applying a pre-processing pipeline
prep(x, ...)
x |
the pipeline |
... |
extra args |
the pre-processed data
This function calculates the principal angles between subspaces derived from a list of bi_projector instances.
prinang(fits)
fits |
a list of |
a numeric vector of principal angles with length equal to the minimum dimension of input subspaces
data(iris)
X <- as.matrix(iris[, 1:4])
res <- pca(X, ncomp = 4)
fits_list <- list(res,res,res)
principal_angles <- prinang(fits_list)
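For two subspaces, principal angles can be computed in base R from orthonormal bases and an SVD. The sketch below covers only this two-subspace case. How prinang generalizes to a list of several fits is not shown here, and the helper name principal_angles_sketch is hypothetical.

```r
set.seed(1)
A <- matrix(rnorm(10 * 3), 10, 3)   # basis of a 3-dimensional subspace of R^10
B <- matrix(rnorm(10 * 2), 10, 2)   # basis of a 2-dimensional subspace of R^10

principal_angles_sketch <- function(A, B) {
  Qa <- qr.Q(qr(A))                    # orthonormal basis for span(A)
  Qb <- qr.Q(qr(B))                    # orthonormal basis for span(B)
  s <- svd(crossprod(Qa, Qb))$d        # cosines of the principal angles
  acos(pmin(pmax(s, 0), 1))            # clip for numerical safety, then invert
}

ang_AB <- principal_angles_sketch(A, B)   # one angle per dimension of the smaller subspace
ang_AA <- principal_angles_sketch(A, A)   # identical subspaces: angles numerically zero
```

The number of angles equals the smaller of the two subspace dimensions, in line with the return value described above.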
Pretty Print S3 Method for bi_projector Class
## S3 method for class 'bi_projector'
print(x, ...)
x |
A |
... |
Additional arguments passed to the print function |
Invisible bi_projector object
Pretty Print S3 Method for bi_projector_union Class
## S3 method for class 'bi_projector_union'
print(x, ...)
x |
A |
... |
Additional arguments passed to the print function |
Invisible bi_projector_union object
Display a human-readable summary of a classifier object, including information about the k-NN classifier, the model fit, and the dimensions of the scores matrix.
## S3 method for class 'classifier'
print(x, ...)
x |
A |
... |
Additional arguments passed to |
classifier object.
Display a human-readable summary of a composed_projector object, including information about the number and order of projectors.
## S3 method for class 'composed_projector'
print(x, ...)
x |
A |
... |
Additional arguments passed to |
The composed_projector object.
# Create two PCA projectors and compose them
X <- matrix(rnorm(20*20), 20, 20)
pca1 <- pca(X, ncomp=10)
X2 <- scores(pca1)
pca2 <- pca(X2, ncomp=4)
cproj <- compose_projectors(pca1, pca2)
Display a human-readable summary of a multiblock_biprojector object, including information about the dimensions of the projection matrix, the pre-processing pipeline, and block indices.
## S3 method for class 'multiblock_biprojector'
print(x, ...)
x |
A |
... |
Additional arguments passed to |
Invisible multiblock_biprojector
object.
# Generate some example data X1 <- matrix(rnorm(10 * 5), 10, 5) X2 <- matrix(rnorm(10 * 5), 10, 5) X <- cbind(X1, X2) # Compute PCA on the combined data pc <- pca(X, ncomp = 8) # Create a multiblock bi-projector using PCA components and block indices mb_biproj <- multiblock_biprojector(pc$v, s = pc$u %*% diag(sdev(pc)), sdev = sdev(pc), block_indices = list(1:5, 6:10)) # Pretty print the multiblock bi-projector object print(mb_biproj)
# Generate some example data X1 <- matrix(rnorm(10 * 5), 10, 5) X2 <- matrix(rnorm(10 * 5), 10, 5) X <- cbind(X1, X2) # Compute PCA on the combined data pc <- pca(X, ncomp = 8) # Create a multiblock bi-projector using PCA components and block indices mb_biproj <- multiblock_biprojector(pc$v, s = pc$u %*% diag(sdev(pc)), sdev = sdev(pc), block_indices = list(1:5, 6:10)) # Pretty print the multiblock bi-projector object print(mb_biproj)
projector Objects
Display a human-readable summary of a projector object, including information about the dimensions of the projection matrix and the pre-processing pipeline.
## S3 method for class 'projector'
print(x, ...)
x |
A |
... |
Additional arguments passed to |
The projector object.
X <- matrix(rnorm(10*10), 10, 10)
svdfit <- svd(X)
p <- projector(svdfit$v)
print(p)
Project one or more samples onto a subspace. This function takes a model fit and new observations, and projects them onto the subspace defined by the model. This allows for the transformation of new data into the same lower-dimensional space as the original data.
project(x, new_data, ...)
x |
The model fit, typically an object of class bi_projector or any other class that implements a project method |
new_data |
A matrix or vector of new observations with the same number of columns as the original data. Rows represent observations and columns represent variables |
... |
Extra arguments to be passed to the specific project method for the object's class |
A matrix or vector of the projected observations, where rows represent observations and columns represent the lower-dimensional space
bi_projector for an example of a class that implements a project method
Other project: project.cross_projector(), project_block(), project_vars()
# Example with the bi_projector class
X <- matrix(rnorm(10*20), 10, 20)
svdfit <- svd(X)
# Scores are U %*% D (matrix product, not the modulo operator %%)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev=svdfit$d)
# Project new_data onto the same subspace as the original data
new_data <- matrix(rnorm(5*20), 5, 20)
projected_data <- project(p, new_data)
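The projection described above can be illustrated with base R alone. The following is a minimal sketch, assuming a centred SVD fit with three retained components; it does not call the package, and the names `mu`, `V`, `projected`, and `train_scores` are chosen only for illustration.

```r
# Base-R sketch of sample projection (not the package's implementation).
set.seed(1)
X  <- matrix(rnorm(10 * 20), 10, 20)
mu <- colMeans(X)                         # training column means
Xc <- sweep(X, 2, mu)                     # centre the training data
sv <- svd(Xc)
V  <- sv$v[, 1:3, drop = FALSE]           # 20 x 3 loadings of a 3-d subspace

# New observations must have the same 20 columns as the training data.
new_data  <- matrix(rnorm(5 * 20), 5, 20)
projected <- sweep(new_data, 2, mu) %*% V # 5 x 3 matrix of projected samples

# Projecting the training data reproduces its own scores U %*% D.
train_scores <- Xc %*% V
```

New data is centred with the training means rather than its own, so that it lands in the same coordinate system as the original scores.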
When observations are concatenated into "blocks", it may be useful to project one block from the set. This function facilitates the projection of a specific block of data onto a subspace. It is a convenience method for multi-block fits and is equivalent to a "partial projection" where the column indices are associated with a given block.
project_block(x, new_data, block, ...)
x |
The model fit, typically an object of a class that implements a |
new_data |
A matrix or vector of new observation(s) with the same number of columns as the original data |
block |
An integer representing the block ID to select in the block projection matrix. This ID corresponds to the specific block of data to be projected |
... |
Additional arguments passed to the underlying |
A matrix or vector of the projected data for the specified block
project for the generic projection function
Other project: project(), project.cross_projector(), project_vars()
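One way to read "partial projection" is to multiply only the columns of a block by the matching rows of the loading matrix. The sketch below is written in base R under that assumption, with `new_data` carrying all original columns as the argument description states. The helper `project_block_sketch` and the two-block layout are hypothetical, and the package may additionally rescale or preprocess the result.

```r
set.seed(2)
X <- matrix(rnorm(10 * 10), 10, 10)
V <- svd(X)$v[, 1:4, drop = FALSE]          # 10 x 4 loadings
block_indices <- list(1:5, 6:10)            # hypothetical two-block layout

# Use only the columns (and loading rows) that belong to the chosen block.
project_block_sketch <- function(new_data, V, block_indices, block) {
  ind <- block_indices[[block]]
  new_data[, ind, drop = FALSE] %*% V[ind, , drop = FALSE]
}

new_data <- matrix(rnorm(3 * 10), 3, 10)
p1 <- project_block_sketch(new_data, V, block_indices, 1)
p2 <- project_block_sketch(new_data, V, block_indices, 2)
```

In this unscaled form the block projections add up to the full projection `new_data %*% V`, which is a convenient sanity check.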
This function projects one or more variables onto a subspace. It is often called supplementary variable projection and can be computed for a biorthogonal decomposition, such as Singular Value Decomposition (SVD).
project_vars(x, new_data, ...)
x |
The model fit, typically an object of a class that implements a |
new_data |
A matrix or vector of new observation(s) with the same number of rows as the original data |
... |
Additional arguments passed to the underlying |
A matrix or vector of the projected variables in the subspace
project for the generic projection function for samples
Other project: project(), project.cross_projector(), project_block()
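For an SVD, X = U D Vᵀ, the loadings satisfy V = Xᵀ U D⁻¹, so a supplementary variable can be placed in component space with the same formula. The base-R sketch below assumes exactly that relationship; `project_vars_sketch` is a hypothetical helper, not the package function.

```r
set.seed(3)
X  <- matrix(rnorm(10 * 6), 10, 6)
sv <- svd(X)
k  <- 3
U  <- sv$u[, 1:k, drop = FALSE]
d  <- sv$d[1:k]

# New variables: same number of rows (10) as X, any number of columns.
new_vars <- matrix(rnorm(10 * 2), 10, 2)

# Supplementary loadings t(new_vars) %*% U %*% D^{-1}: one row per new variable.
project_vars_sketch <- function(new_vars, U, d) {
  crossprod(new_vars, U) %*% diag(1 / d, nrow = length(d))
}

sup <- project_vars_sketch(new_vars, U, d)  # 2 x 3
own <- project_vars_sketch(X, U, d)         # recovers the first k columns of V
```

Feeding the original variables back in returns the fitted loadings, which is why the operation is described as projecting variables rather than samples.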
project a cross_projector instance
## S3 method for class 'cross_projector'
project(x, new_data, source = c("X", "Y"), ...)
x |
The model fit, typically an object of class bi_projector or any other class that implements a project method |
new_data |
A matrix or vector of new observations with the same number of columns as the original data. Rows represent observations and columns represent variables |
source |
the source of the data (X or Y block) |
... |
Extra arguments to be passed to the specific project method for the object's class |
the projected data
Other project: project(), project_block(), project_vars()
projector instance
A projector maps a matrix from an N-dimensional space to a d-dimensional space, where d may be less than N. The projection matrix, v, is not necessarily orthogonal. This function constructs a projector instance which can be used for various dimensionality reduction techniques like PCA, LDA, etc.
projector(v, preproc = prep(pass()), ..., classes = NULL)
v |
A matrix of coefficients with dimensions |
preproc |
A prepped pre-processing object. Default is the no-processing |
... |
Extra arguments to be stored in the |
classes |
Additional class information used for creating subtypes of |
An instance of type projector.
X <- matrix(rnorm(10*10), 10, 10)
svdfit <- svd(X)
p <- projector(svdfit$v)
proj <- project(p, X)
Reconstruct a data set from its (possibly) low-rank representation. This can be useful when analyzing the impact of dimensionality reduction or when visualizing approximations of the original data.
reconstruct(x, comp, rowind, colind, ...)
x |
The model fit, typically an object of a class that implements a |
comp |
A vector of component indices to use in the reconstruction |
rowind |
The row indices to reconstruct (optional). If not provided, all rows are used. |
colind |
The column indices to reconstruct (optional). If not provided, all columns are used. |
... |
Additional arguments passed to the underlying |
A reconstructed data set based on the selected components, rows, and columns
bi_projector for an example of a two-way mapping model that can be reconstructed
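A low-rank reconstruction from an SVD can be written directly in base R. The sketch below assumes an uncentred SVD fit and mirrors the `comp`, `rowind`, and `colind` arguments; `reconstruct_sketch` is a hypothetical helper and omits any reversal of pre-processing that the package may perform.

```r
set.seed(4)
X  <- matrix(rnorm(8 * 5), 8, 5)
sv <- svd(X)

reconstruct_sketch <- function(sv, comp = seq_along(sv$d),
                               rowind = seq_len(nrow(sv$u)),
                               colind = seq_len(nrow(sv$v))) {
  U <- sv$u[rowind, comp, drop = FALSE]     # selected rows of the sample basis
  V <- sv$v[colind, comp, drop = FALSE]     # selected rows of the loadings
  U %*% diag(sv$d[comp], nrow = length(comp)) %*% t(V)
}

full  <- reconstruct_sketch(sv)                       # all components: X itself
rank2 <- reconstruct_sketch(sv, comp = 1:2)           # rank-2 approximation
patch <- reconstruct_sketch(sv, comp = 1:2,
                            rowind = 1:3, colind = c(2, 5))
```

Restricting rows and columns before multiplying avoids forming the full matrix when only a small patch of the approximation is needed.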
refit a model given new data or new parameter(s)
refit(x, new_data, ...)
x |
the original model fit object |
new_data |
the new data to process |
... |
extra args |
a refit model object
Fit a multivariate regression model for a matrix of basis functions, X, and a response matrix Y. The goal is to find a projection matrix that can be used for mapping and reconstruction.
regress(
  X,
  Y,
  preproc = NULL,
  method = c("lm", "enet", "mridge", "pls"),
  intercept = FALSE,
  lambda = 0.001,
  alpha = 0,
  ncomp = ceiling(ncol(X)/2),
  ...
)
X |
the set of independent (basis) variables |
Y |
the response matrix |
preproc |
the pre-processor (currently unused) |
method |
the regression method: |
intercept |
whether to include an intercept term |
lambda |
ridge shrinkage parameter (for methods |
alpha |
the elastic net mixing parameter if method is |
ncomp |
number of PLS components if method is |
... |
extra arguments sent to the underlying fitting function |
a bi-projector of type regress
# Generate synthetic data
Y <- matrix(rnorm(100 * 10), 10, 100)
X <- matrix(rnorm(10 * 9), 10, 9)
# Fit regression models and reconstruct the response matrix
r_lm <- regress(X, Y, intercept = FALSE, method = "lm")
recon_lm <- reconstruct(r_lm)
r_mridge <- regress(X, Y, intercept = TRUE, method = "mridge", lambda = 0.001)
recon_mridge <- reconstruct(r_mridge)
r_enet <- regress(X, Y, intercept = TRUE, method = "enet", lambda = 0.001, alpha = 0.5)
recon_enet <- reconstruct(r_enet)
r_pls <- regress(X, Y, intercept = TRUE, method = "pls", ncomp = 5)
recon_pls <- reconstruct(r_pls)
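The idea of a regression coefficient matrix that maps a basis onto many responses at once can be sketched with the normal equations. This is a base-R illustration only, assuming no intercept. The ridge variant uses the textbook penalty λI, which may be scaled differently from the `lambda` used by the package's back ends.

```r
set.seed(5)
X <- matrix(rnorm(10 * 4), 10, 4)      # basis functions
Y <- matrix(rnorm(10 * 6), 10, 6)      # responses, one column per variable
lambda <- 0.001

# Ordinary least squares without intercept: B = (X'X)^{-1} X'Y
B_lm <- solve(crossprod(X), crossprod(X, Y))

# Ridge regression: B = (X'X + lambda * I)^{-1} X'Y
B_ridge <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, Y))

# Fitted values (the "reconstruction" of Y from the basis)
Y_hat <- X %*% B_lm
```

All response columns share the same left-hand side, so a single solve handles the whole response matrix.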
Given a new dataset, process it in the same way the original data was processed (e.g. centering, scaling, etc.)
reprocess(x, new_data, colind, ...)
x |
the model fit object |
new_data |
the new data to process |
colind |
the column indices of the new data |
... |
extra args |
the reprocessed data
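Reprocessing amounts to reusing statistics learned at fit time instead of recomputing them on the new data. The base-R sketch below assumes centring is the only pre-processing step. `reprocess_sketch` is a hypothetical helper, and `colind` selects which stored means apply when the new data covers only some of the original columns.

```r
set.seed(6)
X  <- matrix(rnorm(20 * 5, mean = 3), 20, 5)
mu <- colMeans(X)                           # learned from the training data

reprocess_sketch <- function(new_data, mu, colind = seq_along(mu)) {
  stopifnot(ncol(new_data) == length(colind))
  sweep(new_data, 2, mu[colind], "-")       # subtract the training means
}

new_data <- matrix(rnorm(3 * 5, mean = 3), 3, 5)
z_full <- reprocess_sketch(new_data, mu)
z_part <- reprocess_sketch(new_data[, c(2, 4)], mu, colind = c(2, 4))
```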
reprocess a cross_projector instance
## S3 method for class 'cross_projector'
reprocess(x, new_data, colind = NULL, source = c("X", "Y"), ...)
x |
the model fit object |
new_data |
the new data to process |
colind |
the column indices of the new data |
source |
the source of the data (X or Y block) |
... |
extra args |
the re(pre-)processed data
Compute a regression model for each column in a matrix and return residual matrix
residualize(form, X, design, intercept = FALSE)
form |
the formula defining the model to fit for residuals |
X |
the response matrix |
design |
the |
intercept |
add an intercept term (default is FALSE) |
a matrix of residuals
X <- matrix(rnorm(20*10), 20, 10)
des <- data.frame(a=rep(letters[1:4], 5), b=factor(rep(1:5, each=4)))
xresid <- residualize(~ a+b, X, design=des)
## design is saturated, residuals should be zero (up to floating-point error)
xresid2 <- residualize(~ a*b, X, design=des)
max(abs(xresid2)) < 1e-8
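Residualizing every column against the same design can be done with a single QR decomposition. The sketch below is a base-R approximation of the idea, assuming the design matrix is built by `model.matrix` (which adds an intercept column for these formulas). `residualize_sketch` is a hypothetical helper and does not reproduce the `intercept` argument.

```r
set.seed(7)
X   <- matrix(rnorm(20 * 10), 20, 10)
des <- data.frame(a = factor(rep(letters[1:4], 5)),
                  b = factor(rep(1:5, each = 4)))

residualize_sketch <- function(form, X, design) {
  M <- model.matrix(form, data = design)   # design matrix from the formula
  qr.resid(qr(M), X)                       # residuals for all columns of X at once
}

r_add <- residualize_sketch(~ a + b, X, des)   # additive model
r_sat <- residualize_sketch(~ a * b, X, des)   # saturated: one cell per row
```

With one observation per cell the saturated model fits exactly, so `r_sat` is zero only up to rounding, which is why a tolerance is preferable to an exact comparison.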
Calculate the residuals of a model after removing the effect of the first ncomp components. This function is useful to assess the quality of the fit or to identify patterns that are not captured by the model.
residuals(x, ncomp, xorig, ...)
x |
The model fit object. |
ncomp |
The number of components to factor out before calculating residuals. |
xorig |
The original data matrix (X) used to fit the model. |
... |
Additional arguments passed to the method. |
A matrix of residuals, with the same dimensions as the original data matrix.
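For an orthonormal loading matrix, removing the first `ncomp` components means subtracting the part of the data that lies in their span. The sketch below assumes an uncentred SVD fit in base R; the package method may additionally apply the model's pre-processing.

```r
set.seed(8)
xorig <- matrix(rnorm(12 * 6), 12, 6)
sv    <- svd(xorig)
ncomp <- 2
Vk    <- sv$v[, 1:ncomp, drop = FALSE]     # first ncomp loadings

# Subtract the projection of xorig onto the span of the first ncomp components.
res <- xorig - xorig %*% Vk %*% t(Vk)

# The remaining sum of squares equals the discarded squared singular values.
unexplained <- sum(res^2)
```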
reverse a pre-processing transform
reverse_transform(x, X, colind, ...)
x |
the pre_processor |
X |
the data matrix |
colind |
column indices |
... |
extra args |
the reverse-transformed data
Given a model object (e.g. a projector), construct a random forest classifier that can generate predictions for new data points.
rf_classifier(x, colind, ...)
x |
the model object |
colind |
the (optional) column indices used for prediction |
... |
extra arguments to |
a random forest classifier
create a random forest classifier
## S3 method for class 'projector'
rf_classifier(x, colind = NULL, labels, scores, ...)
x |
the model object |
colind |
the (optional) column indices used for prediction |
labels |
A factor or vector of class labels for the training data. |
scores |
a matrix of reference scores used for classification |
... |
extra arguments to |
an rf_classifier object
data(iris)
X <- iris[,1:4]
pcres <- pca(as.matrix(X),2)
cfier <- rf_classifier(pcres, labels=iris[,5], scores=scores(pcres))
p <- predict(cfier, new_data=as.matrix(iris[,1:4]))
Perform a rotation of the component loadings to improve interpretability.
rotate(x, ncomp, type)
x |
The model fit, typically a result from a dimensionality reduction method like PCA. |
ncomp |
The number of components to rotate. |
type |
The type of rotation to apply (e.g., "varimax", "quartimax", "promax"). |
A modified model fit with the rotated components.
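A varimax rotation of the leading loadings can be illustrated with `stats::varimax` from base R. This is a sketch of the general technique, assuming an SVD-based fit on centred data; how the package stores the rotated components is not shown here.

```r
set.seed(10)
X  <- scale(matrix(rnorm(50 * 6), 50, 6), center = TRUE, scale = FALSE)
sv <- svd(X)
ncomp <- 3
V  <- sv$v[, 1:ncomp, drop = FALSE]        # unrotated loadings

vm    <- stats::varimax(V, normalize = FALSE)
R     <- vm$rotmat                          # ncomp x ncomp orthogonal rotation
V_rot <- V %*% R                            # rotated loadings
S_rot <- (X %*% V) %*% R                    # scores rotated by the same matrix
```

Because the rotation matrix is orthogonal, the rotated loadings span the same subspace as the original ones; only the axes within that subspace change.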
Extract the factor score matrix from a fitted model. The factor scores represent the projections of the data onto the components, which can be used for further analysis or visualization.
scores(x, ...)
x |
The model fit object. |
... |
Additional arguments passed to the method. |
A matrix of factor scores, with rows corresponding to samples and columns to components.
project for projecting new data onto the components.
The standard deviations of the projected data matrix
sdev(x)
x |
the model fit |
the standard deviations
Get the input/output shape of the projector.
shape(x, ...)
x |
The model fit. |
... |
Extra arguments. |
This function retrieves the dimensions of the sample loadings matrix v in the form of a vector with two elements. The first element is the number of rows in the v matrix, and the second element is the number of columns.
A vector containing the dimensions of the sample loadings matrix v (number of rows and columns).
shape of a cross_projector instance
## S3 method for class 'cross_projector'
shape(x, source = c("X", "Y"), ...)
x |
The model fit. |
source |
the source of the data (X or Y block) |
... |
Extra arguments. |
the shape of the data
center and scale each vector of a matrix
standardize(preproc = prepper(), cmeans = NULL, sds = NULL)
preproc |
the pre-processing pipeline |
cmeans |
an optional vector of column means |
sds |
an optional vector of sds |
a prepper list
Calculate standardized factor scores from a fitted model. Standardized scores are useful for comparing the contributions of different components on the same scale, which can help in interpreting the results.
std_scores(x, ...)
x |
The model fit object. |
... |
Additional arguments passed to the method. |
A matrix of standardized factor scores, with rows corresponding to samples and columns to components.
scores for retrieving the original component scores.
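One common definition of standardized scores divides each score column by its standard deviation so that every component has unit variance. The base-R sketch below assumes that definition and a centred SVD fit; whether the package uses exactly this scaling is an assumption.

```r
set.seed(11)
X  <- scale(matrix(rnorm(30 * 5), 30, 5), center = TRUE, scale = FALSE)
sv <- svd(X)
n  <- nrow(X)

raw_scores <- sv$u %*% diag(sv$d)               # component scores U %*% D
sdev       <- sv$d / sqrt(n - 1)                # standard deviation per component
std        <- sweep(raw_scores, 2, sdev, "/")   # equals sqrt(n - 1) * U
```

After scaling, the columns differ only in shape and no longer in spread, which makes components with small singular values directly comparable to the leading ones.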
Computes the singular value decomposition of a matrix using one of the specified methods. It is designed to be an easy-to-use wrapper for various SVD methods available in R.
svd_wrapper(
  X,
  ncomp = min(dim(X)),
  preproc = pass(),
  method = c("fast", "base", "irlba", "propack", "rsvd", "svds"),
  q = 2,
  p = 10,
  tol = .Machine$double.eps,
  ...
)
X |
the input matrix |
ncomp |
the number of components to estimate (default: min(dim(X))) |
preproc |
the pre-processor to apply on the input matrix (e.g., |
method |
the SVD method to use: 'base', 'fast', 'irlba', 'propack', 'rsvd', or 'svds' |
q |
parameter passed to method |
p |
parameter passed to method |
tol |
minimum eigenvalue magnitude, otherwise component is dropped (default: .Machine$double.eps) |
... |
extra arguments passed to the selected SVD function |
an SVD object that extends projector
# Load iris dataset and select the first four columns as a numeric matrix
data(iris)
X <- as.matrix(iris[, 1:4])
# Compute SVD using the base method and 3 components
fit <- svd_wrapper(X, ncomp = 3, preproc = center(), method = "base")
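The combination of pre-processing, tolerance filtering, and truncation can be sketched around base `svd`. The helper `svd_sketch` below is hypothetical. It assumes centring as the pre-processing step and treats the squared singular values as the eigenvalues compared against `tol`.

```r
svd_sketch <- function(X, ncomp = min(dim(X)), tol = .Machine$double.eps) {
  X    <- as.matrix(X)
  mu   <- colMeans(X)
  sv   <- svd(sweep(X, 2, mu))                      # SVD of the centred data
  keep <- which(sv$d^2 > tol)                       # drop numerically null components
  keep <- keep[seq_len(min(ncomp, length(keep)))]   # then truncate to ncomp
  list(u = sv$u[, keep, drop = FALSE],
       d = sv$d[keep],
       v = sv$v[, keep, drop = FALSE],
       center = mu)
}

fit <- svd_sketch(iris[, 1:4], ncomp = 3)
```

Filtering before truncating means a request for more components than the data supports quietly returns only the meaningful ones.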
This function transposes a model by switching coefficients and scores. It is useful when you want to reverse the roles of samples and variables in a model, especially in the context of dimensionality reduction methods.
transpose(x, ...)
x |
The model fit, typically an object of a class that implements a |
... |
Additional arguments passed to the underlying |
A transposed model with coefficients and scores switched
bi_projector for an example of a two-way mapping model that can be transposed
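For an SVD the swap of coefficients and scores has a simple reading: the transposed model is a decomposition of t(X). The base-R sketch below assumes that interpretation; the variable names are illustrative, and any rescaling the package applies is not shown.

```r
set.seed(12)
X  <- matrix(rnorm(7 * 4), 7, 4)
sv <- svd(X)

# Original orientation: loadings V (variables), scores U %*% D (samples).
v <- sv$v
s <- sv$u %*% diag(sv$d)

# Transposed orientation: the sample vectors U act as loadings and the
# variables receive scores V %*% D.
v_t <- sv$u
s_t <- sv$v %*% diag(sv$d)
```

Projecting t(X) onto `v_t` yields `s_t`, just as projecting X onto `v` yields `s`.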
take the first n components of a decomposition
truncate(x, ncomp)
x |
the object to truncate |
ncomp |
number of components to retain |
a truncated object (e.g. PCA with 'ncomp' components)