Title: | Extensible Data Structures for Multivariate Analysis |
---|---|
Description: | Provides a set of basic and extensible data structures and functions for multivariate analysis, including dimensionality reduction techniques, projection methods, and preprocessing functions. The aim of this package is to offer a flexible and user-friendly framework for multivariate analysis that can be easily extended for custom requirements and specific data analysis tasks. |
Authors: | Bradley Buchsbaum [aut, cre] |
Maintainer: | Bradley Buchsbaum <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2025-02-16 05:47:37 UTC |
Source: | https://github.com/bbuchsbaum/multivarious |
add a pre-processing stage
add_node(x, step, ...)
x |
the processing pipeline |
step |
the pre-processing step to add |
... |
extra args |
a new pre-processing pipeline with the added step
Add a pre-processing node to a pipeline
## S3 method for class 'prepper'
add_node(x, step, ...)
x |
A |
step |
The pre-processing step to add |
... |
Additional arguments |
Apply a specified rotation to the fitted model
apply_rotation(x, rotation_matrix, ...)
x |
A model object, possibly created using the |
rotation_matrix |
|
... |
extra args |
A modified object with updated components and scores after applying the specified rotation.
apply a pre-processing transform
apply_transform(x, X, colind, ...)
x |
the pre_processor |
X |
the data matrix |
colind |
column indices |
... |
extra args |
the transformed data
A bi_projector offers a two-way mapping from samples (rows) to scores and from variables (columns) to components. Thus, one can project samples from the D-dimensional input space to a d-dimensional subspace, and one can project variables (project_vars) from the n-dimensional variable space to the d-dimensional component space. The singular value decomposition is a canonical example of such a two-way mapping.
bi_projector(v, s, sdev, preproc = prep(pass()), classes = NULL, ...)
v |
A matrix of coefficients with dimensions |
s |
The score matrix |
sdev |
The standard deviations of the score matrix |
preproc |
(optional) A pre-processing pipeline, default is prep(pass()) |
classes |
(optional) A character vector specifying the class attributes of the object, default is NULL |
... |
Extra arguments to be stored in the |
A bi_projector object
X <- matrix(rnorm(200), 10, 20)
svdfit <- svd(X)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev = svdfit$d)
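The two-way mapping can be illustrated in base R without the package. The following is a minimal sketch, assuming that projecting rows means multiplying by the component matrix and that projecting variables means multiplying the transposed data by the scores and rescaling by the squared singular values. The names row_scores and var_coords are illustrative only, and the package's own project_vars may differ in detail.

```r
set.seed(1)
X <- matrix(rnorm(200), 10, 20)   # 10 samples (rows), 20 variables (columns)
sv <- svd(X)
v <- sv$v                         # 20 x 10 component matrix
s <- sv$u %*% diag(sv$d)          # 10 x 10 score matrix

# Rows -> scores: projecting the original data reproduces the stored scores
row_scores <- X %*% v

# Columns -> components: map variables through the scores,
# then undo the squared singular-value scaling
var_coords <- t(X) %*% s %*% diag(1 / sv$d^2)
```

For the SVD both directions are exact: row_scores matches s and var_coords matches v up to floating-point error.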
bi_projector Fits
This function combines a set of bi_projector fits into a single bi_projector instance. The new instance's weights and associated scores are obtained by concatenating the weights and scores of the input fits.
bi_projector_union(fits, outer_block_indices = NULL)
fits |
A list of |
outer_block_indices |
An optional list of indices for the outer blocks. If not provided, the function will compute the indices based on the dimensions of the input fits. |
A new bi_projector instance with concatenated weights, scores, and other properties from the input bi_projector instances.
X1 <- matrix(rnorm(5*5), 5, 5)
X2 <- matrix(rnorm(5*5), 5, 5)
bpu <- bi_projector_union(list(pca(X1), pca(X2)))
Extract the list of indices associated with each block in a multiblock object.
block_indices(x, ...)
x |
the object |
... |
extra args |
a list of block indices
Extract the Block Indices from a Multiblock Projector
## S3 method for class 'multiblock_projector'
block_indices(x, i, ...)
x |
A |
i |
Ignored. |
... |
Ignored. |
The list of block indices.
extract the lengths of each block in a multiblock object
block_lengths(x)
x |
the object |
the block lengths
Perform bootstrap resampling on a multivariate model to estimate the variability of components and scores.
bootstrap(x, nboot, ...)
x |
A fitted model object, such as a |
nboot |
An integer specifying the number of bootstrap resamples to perform. |
... |
Additional arguments to be passed to the specific model implementation of |
A list containing the bootstrap resampled components and scores for the model.
Perform bootstrap resampling for Principal Component Analysis (PCA) to estimate component and score variability.
## S3 method for class 'pca'
bootstrap(x, nboot = 100, k = ncomp(x), ...)
x |
A fitted PCA model object. |
nboot |
The number of bootstrap resamples (default: 100). |
k |
The number of components to bootstrap (default: all components in the fitted PCA model). |
... |
Additional arguments to be passed to the specific model implementation of |
A list containing bootstrap z-scores for the loadings (zboot_loadings) and scores (zboot_scores).
Fisher, Aaron, Brian Caffo, Brian Schwartz, and Vadim Zipunnikov. 2016. "Fast, Exact Bootstrap Principal Component Analysis for P > 1 Million." Journal of the American Statistical Association 111 (514): 846-60.
X <- matrix(rnorm(10*100), 10, 100)
x <- pca(X, ncomp = 9)
bootstrap_results <- bootstrap(x)
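To make the idea of bootstrap z-scores concrete, here is a naive base-R sketch that resamples rows, refits the first principal axis each time, and divides the bootstrap mean of each loading by its bootstrap standard deviation. This is not the fast method of Fisher et al. cited above. The definition of the z-score as mean over standard deviation, and all variable names, are assumptions made for illustration.

```r
set.seed(42)
X <- matrix(rnorm(50 * 6), 50, 6)
X[, 1] <- X[, 1] * 3                      # give the first variable dominant variance
Xc <- scale(X, center = TRUE, scale = FALSE)
ref <- svd(Xc)$v[, 1]                     # reference first loading vector

nboot <- 200
boot_load <- replicate(nboot, {
  idx <- sample(nrow(X), replace = TRUE)  # resample rows with replacement
  Xb <- scale(X[idx, ], center = TRUE, scale = FALSE)
  vb <- svd(Xb)$v[, 1]
  if (sum(vb * ref) < 0) -vb else vb      # resolve the arbitrary sign of the axis
})

# One z-score per variable: bootstrap mean divided by bootstrap standard deviation
zboot <- rowMeans(boot_load) / apply(boot_load, 1, sd)
```

Aligning the sign of each resampled axis with the reference is needed because singular vectors are only defined up to sign; without it the bootstrap mean would be pulled toward zero.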
remove mean of all columns in matrix
center(preproc = prepper(), cmeans = NULL)
preproc |
the pre-processing pipeline |
cmeans |
optional vector of precomputed column means |
a prepper list
Create a classifier from a given model object (e.g., projector). This classifier can generate predictions for new data points.
classifier(x, colind, ...)
x |
A model object, such as a |
colind |
Optional vector of column indices used for prediction. If not provided, all columns will be used. |
... |
Additional arguments to be passed to the specific model implementation of |
A classifier function that can be used to make predictions on new data points.
Create a k-NN classifier for a discriminant projector
## S3 method for class 'discriminant_projector'
classifier(x, colind = NULL, knn = 1, ...)
x |
the discriminant projector object |
colind |
an optional vector specifying the column indices of the components |
knn |
the number of nearest neighbors (default=1) |
... |
extra arguments |
a classifier object
Constructs a classifier for a multiblock bi-projector model object.
Either global or partial scores can be used. If colind or block are provided and global_scores=FALSE, partial projection is performed. Otherwise, global projection is used.
## S3 method for class 'multiblock_biprojector'
classifier(
  x,
  colind = NULL,
  labels,
  new_data = NULL,
  block = NULL,
  global_scores = TRUE,
  knn = 1,
  ...
)
x |
A fitted multiblock bi-projector model object. |
colind |
An optional vector of column indices used for prediction (default: NULL). |
labels |
A factor or vector of class labels for the training data. |
new_data |
An optional data matrix for which to generate predictions (default: NULL). |
block |
An optional block index for prediction (default: NULL). |
global_scores |
Whether to use the global scores or the partial scores for reference space (default: TRUE). |
knn |
The number of nearest neighbors to consider in the classifier (default: 1). |
... |
Additional arguments. |
A multiblock classifier object.
create classifier from a projector
## S3 method for class 'projector'
classifier(
  x,
  colind = NULL,
  labels,
  new_data,
  knn = 1,
  global_scores = TRUE,
  ...
)
x |
projector |
colind |
... |
labels |
... |
new_data |
... |
knn |
... |
global_scores |
... |
... |
extra args |
Extract coefficients from a cross_projector object
## S3 method for class 'cross_projector'
coef(object, source = c("X", "Y"), ...)
object |
the model fit |
source |
the source of the data (X or Y block), either "X" or "Y" |
... |
extra args |
the coefficients
Extracts the components (loadings) for a given block or the entire projector.
## S3 method for class 'multiblock_projector'
coef(object, block, ...)
object |
A |
block |
Optional block index. If missing, returns loadings for all variables. |
... |
Additional arguments. |
A matrix of loadings.
normalize each column by a scale factor.
colscale(preproc = prepper(), type = c("unit", "z", "weights"), weights = NULL)
preproc |
the pre-processing pipeline |
type |
the kind of scaling, |
weights |
optional precomputed weights |
a prepper list
Extract the component matrix of a fit.
components(x, ...)
x |
the model fit |
... |
extra args |
the component matrix
Creates a composed_partial_projector object that applies partial projections sequentially. If multiple projectors are composed, the column indices (colind) used at each stage must be considered.
compose_partial_projector(...)
... |
A sequence of projectors that implement |
A composed_partial_projector object.
# Suppose pca1 and pca2 support partial_project().
# cpartial <- compose_partial_projector(pca1, pca2)
# partial_project(cpartial, new_data, colind=1:5)
Combine two projector models into a single projector by sequentially applying the first projector and then the second projector.
compose_projector(x, y, ...)
x |
A fitted model object (e.g., |
y |
A second fitted model object (e.g., |
... |
Additional arguments to be passed to the specific model implementation of |
A new projector object representing the composed projector, which can be used to project data onto the combined subspace.
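For purely linear projectors, sequential application amounts to multiplying the two weight matrices. The base-R sketch below illustrates this under the assumption that neither stage applies any pre-processing; V1, V2, and V12 are illustrative names, not package fields.

```r
set.seed(7)
X <- matrix(rnorm(30 * 8), 30, 8)

V1 <- svd(X)$v[, 1:5]             # first stage: 8 variables -> 5 components
S1 <- X %*% V1
V2 <- svd(S1)$v[, 1:2]            # second stage: 5 components -> 2 components

sequential <- (X %*% V1) %*% V2   # apply stage one, then stage two
V12 <- V1 %*% V2                  # combined 8 x 2 weight matrix
composed <- X %*% V12             # single projection with the combined weights
```

Both routes give the same scores, so a composed projector can store one combined matrix instead of two when no intermediate pre-processing is involved.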
concatenate a sequence of pre-processors, each applied to a block of data.
concat_pre_processors(preprocs, block_indices)
preprocs |
a list of initialized |
block_indices |
a list of integer vectors specifying the global column indices for each block |
a new pre_processor object that applies the correct transformations blockwise
p1 <- center() |> prep()
p2 <- center() |> prep()
x1 <- rbind(1:10, 2:11)
x2 <- rbind(1:10, 2:11)
p1a <- init_transform(p1, x1)
p2a <- init_transform(p2, x2)
clist <- concat_pre_processors(list(p1, p2), list(1:10, 11:20))
t1 <- apply_transform(clist, cbind(x1, x2))
t2 <- apply_transform(clist, cbind(x1, x2[, 1:5]), colind = 1:15)
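What blockwise application with colind means can be sketched in base R for the special case of centering. This is an illustration only, assuming each block's pre-processor stores its own column means and that colind refers to global column positions. The helper apply_blockwise is hypothetical and not part of the package.

```r
x1 <- rbind(1:10, 2:11)
x2 <- rbind(11:20, 13:22)
X <- cbind(x1, x2)
block_indices <- list(1:10, 11:20)

# "Fit" one centering step per block and store the means at their global positions
means_all <- numeric(ncol(X))
for (ind in block_indices) {
  means_all[ind] <- colMeans(X[, ind, drop = FALSE])
}

# Apply the stored means; colind says which global columns the new data contains
apply_blockwise <- function(X_new, colind = seq_along(means_all)) {
  sweep(X_new, 2, means_all[colind], "-")
}

t1 <- apply_blockwise(X)                          # all 20 columns
t2 <- apply_blockwise(X[, 1:15], colind = 1:15)   # block 1 plus half of block 2
```

Because the means are indexed by global column position, a partial request that spans a block boundary still picks up the right parameters for each column.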
Convert between data representations in a multiblock decomposition/alignment by projecting the input data onto a common latent space and then reconstructing it in the target domain.
convert_domain(x, new_data, i, j, comp, rowind, colind, ...)
x |
The model fit, typically an object of a class that implements a |
new_data |
The data to transfer, with the same number of rows as the source data block |
i |
The index of the source data block |
j |
The index of the destination data block |
comp |
A vector of component indices to use in the reconstruction |
rowind |
Optional set of row indices to transfer (default: all rows) |
colind |
Optional set of column indices to transfer (default: all columns) |
... |
Additional arguments passed to the underlying |
A matrix or data frame representing the transferred data in the target domain
project_block for projecting a single block of data onto the subspace
Contrastive PCA (cPCA) finds directions that capture the variation in a "foreground" dataset that is not present (or less present) in a "background" dataset. This function adaptively chooses how to solve the generalized eigenvalue problem based on the dataset sizes and the chosen method:
cPCA(
  X_f,
  X_b,
  ncomp = min(dim(X_f)[2]),
  preproc = center(),
  lambda = 0,
  method = c("geigen", "primme", "sdiag", "corpcor"),
  allow_transpose = TRUE,
  ...
)
X_f |
A numeric matrix representing the foreground dataset, with dimensions (samples x features). |
X_b |
A numeric matrix representing the background dataset, with dimensions (samples x features). |
ncomp |
Number of components to estimate. Defaults to |
preproc |
A pre-processing function (default: |
lambda |
Shrinkage parameter for covariance estimation. Defaults to 0. Used by |
method |
A character string specifying the computation method. One of:
|
... |
Additional arguments passed to underlying functions such as |
method = "corpcor": Uses a corpcor-based whitening approach (crossprod.powcor.shrink) to transform the data, then performs a standard PCA on the transformed foreground data.
method in {"geigen", "primme", "sdiag"} and a moderate number of features (D): Directly forms covariance matrices and uses geneig to solve the generalized eigenvalue problem.
method in {"geigen", "primme", "sdiag"} and a large number of features (D >> N): Uses an SVD-based reduction on the background data to avoid forming large matrices. This reduces the problem to a lower-dimensional space.
Adaptive Strategy:
If method = "corpcor", no large covariance matrices are formed. Instead, the background data is used to "whiten" the foreground, followed by a simple PCA.
If method != "corpcor" and the number of features D is manageable (e.g., D <= max(N_f, N_b)), the function forms covariance matrices and directly solves the generalized eigenproblem.
If method != "corpcor" and D is large (e.g., tens of thousands, D > max(N_f, N_b)), it computes the SVD of the background data X_b to derive a smaller N x N eigenproblem, thereby avoiding the costly computation of covariance matrices.
Note: If lambda != 0 and D is very large, the current implementation does not fully integrate shrinkage into the large-D SVD-based approach and will issue a warning.
A bi_projector object containing:
A (features x ncomp) matrix of eigenvectors (loadings).
A (samples x ncomp) matrix of scores, i.e., projections of X_f onto the eigenvectors.
A vector of length ncomp giving the square-root of the eigenvalues.
The pre-processing object used.
set.seed(123)
X_f <- matrix(rnorm(2000), nrow=100, ncol=20)  # Foreground: 100 samples, 20 features
X_b <- matrix(rnorm(2000), nrow=100, ncol=20)  # Background: same size
# Default method (geigen), small dimension scenario
res <- cPCA(X_f, X_b, ncomp=5)
plot(res$s[,1], res$s[,2], main="cPCA scores (component 1 vs 2)")
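The "whiten, then PCA" idea described for method = "corpcor" can be sketched in base R. Note the substitution: this sketch uses a plain inverse square root of the background covariance in place of the corpcor shrinkage estimator, so it only illustrates the structure of the computation and is not the package's implementation. All variable names are illustrative.

```r
set.seed(123)
X_f <- matrix(rnorm(2000), nrow = 100, ncol = 20)   # foreground
X_b <- matrix(rnorm(2000), nrow = 100, ncol = 20)   # background

# Center both datasets column-wise
Xf <- sweep(X_f, 2, colMeans(X_f))
Xb <- sweep(X_b, 2, colMeans(X_b))

# Inverse square root of the background covariance (no shrinkage: an assumption)
C_b <- crossprod(Xb) / (nrow(Xb) - 1)
eb <- eigen(C_b, symmetric = TRUE)
W <- eb$vectors %*% diag(1 / sqrt(eb$values)) %*% t(eb$vectors)

# Whiten the foreground with the background, then run an ordinary PCA via SVD
Xf_w <- Xf %*% W
sv <- svd(Xf_w)
ncomp <- 5
scores <- sv$u[, 1:ncomp] %*% diag(sv$d[1:ncomp])
```

After whitening, the background has identity covariance, so directions of large variance in the whitened foreground are those where the foreground varies more than the background.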
A projector that reduces two blocks of data, X and Y, yielding a pair of weights for each component. This structure can be used, for example, to store weights derived from canonical correlation analysis.
cross_projector(
  vx,
  vy,
  preproc_x = prep(pass()),
  preproc_y = prep(pass()),
  ...,
  classes = NULL
)
vx |
the X coefficients |
vy |
the Y coefficients |
preproc_x |
the X pre-processor |
preproc_y |
the Y pre-processor |
... |
extra parameters or results to store |
classes |
additional class names |
This class extends projector and therefore basic operations such as project, shape, reprocess, and coef work, but by default, it is assumed that the X block is primary. To access Y block operations, an additional argument source must be supplied to the relevant functions, e.g., coef(fit, source = "Y").
a cross_projector object
# Create two scaled matrices X and Y
X <- scale(matrix(rnorm(10 * 5), 10, 5))
Y <- scale(matrix(rnorm(10 * 5), 10, 5))
# Perform canonical correlation analysis on X and Y
cres <- cancor(X, Y)
sx <- X %*% cres$xcoef
sy <- Y %*% cres$ycoef
# Create a cross_projector object using the canonical correlation analysis results
canfit <- cross_projector(cres$xcoef, cres$ycoef, cor = cres$cor,
                          sx = sx, sy = sy, classes = "cancor")
A discriminant_projector is an instance that extends bi_projector with a projection that maximizes class separation. This can be useful for dimensionality reduction techniques that take class labels into account, such as Linear Discriminant Analysis (LDA).
discriminant_projector(
  v,
  s,
  sdev,
  preproc = prep(pass()),
  labels,
  classes = NULL,
  ...
)
v |
A matrix of coefficients with dimensions |
s |
The score matrix |
sdev |
The standard deviations of the score matrix |
preproc |
(optional) A pre-processing pipeline, default is prep(pass()) |
labels |
A factor or character vector of class labels corresponding to the rows of the score matrix |
classes |
(optional) A character vector specifying the class attributes of the object, default is NULL |
... |
Extra arguments to be stored in the |
A discriminant_projector object.
bi_projector
# Simulate data and labels
set.seed(123)
X <- matrix(rnorm(100 * 10), 100, 10)
labels <- factor(rep(1:2, each = 50))
# Perform LDA and create a discriminant projector
lda_fit <- MASS::lda(X, labels)
dp <- discriminant_projector(lda_fit$scaling, X %*% lda_fit$scaling,
                             sdev = lda_fit$svd, labels = labels)
Calculate the importance of features in a model
feature_importance(x, ...)
x |
the model fit |
... |
extra args |
the feature importance scores
Uses "marginal" or "standalone" approaches:
marginal: remove block and see change in accuracy
standalone: use only that block and measure accuracy
## S3 method for class 'classifier'
feature_importance(
  x,
  new_data,
  ncomp = NULL,
  blocks = NULL,
  metric = c("cosine", "euclidean", "ejaccard"),
  fun = rank_score,
  normalize_probs = FALSE,
  approach = c("marginal", "standalone"),
  ...
)
x |
classifier |
new_data |
new data |
ncomp |
... |
blocks |
a list of feature indices |
metric |
... |
fun |
a function to compute accuracy (default rank_score) |
normalize_probs |
logical |
approach |
"marginal" or "standalone" |
... |
args to projection |
a data.frame with block and importance
Get a fresh pre-processing node cleared of any cached data
fresh(x, ...)
x |
the processing pipeline |
... |
extra args |
a fresh pre-processing pipeline
Recreates the pipeline structure without any learned parameters.
## S3 method for class 'prepper'
fresh(x, ...)
Computes the generalized eigenvalues and eigenvectors for the problem: A x = λ B x. Various methods are available and differ in their assumptions about A and B.
geneig(A, B, ncomp, method = c("robust", "sdiag", "geigen", "primme"), ...)
A |
The left-hand side square matrix. |
B |
The right-hand side square matrix, same dimension as A. |
ncomp |
Number of eigenpairs to return. |
method |
Method to compute the eigenvalues and eigenvectors:
|
... |
Additional arguments passed to the underlying methods. |
An object of class projector with eigenvalues stored in values and standard deviations in sdev = sqrt(values).
if (requireNamespace("geigen", quietly = TRUE)) {
  A <- matrix(c(14, 10, 12, 10, 12, 13, 12, 13, 14), nrow=3, byrow=TRUE)
  B <- matrix(c(48, 17, 26, 17, 33, 32, 26, 32, 34), nrow=3, byrow=TRUE)
  res <- geneig(A, B, ncomp=3, method="geigen")
  # res$values and coefficients(res)
}
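For a symmetric A and a positive definite B, the problem A x = λ B x can be reduced to an ordinary symmetric eigenproblem with a Cholesky factor of B. The base-R sketch below reuses the matrices from the example above. It shows one standard reduction and is not a description of what any particular geneig method does internally.

```r
A <- matrix(c(14, 10, 12, 10, 12, 13, 12, 13, 14), nrow = 3, byrow = TRUE)
B <- matrix(c(48, 17, 26, 17, 33, 32, 26, 32, 34), nrow = 3, byrow = TRUE)

R <- chol(B)                          # B = t(R) %*% R; requires B positive definite
Rinv <- backsolve(R, diag(nrow(B)))   # inverse of the upper-triangular factor

# With w = R x, the problem becomes M w = lambda w with a symmetric M
M <- t(Rinv) %*% A %*% Rinv
e <- eigen((M + t(M)) / 2, symmetric = TRUE)

values <- e$values
vectors <- Rinv %*% e$vectors         # map back: x = R^{-1} w

# Residual of the generalized eigen-equation, column by column
residual <- A %*% vectors - B %*% vectors %*% diag(values)
```

The eigenvectors obtained this way are B-orthonormal, that is, t(vectors) %*% B %*% vectors is the identity.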
This function computes group means for each factor level of Y in the provided data matrix X.
group_means(Y, X)
Y |
a vector of labels to compute means over disjoint sets |
X |
a data matrix from which to compute means |
a matrix with row names corresponding to factor levels of Y and column-wise means for each factor level
# Example data
X <- matrix(rnorm(50), 10, 5)
Y <- factor(rep(1:2, each = 5))
# Compute group means
gm <- group_means(Y, X)
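The same quantity can be computed in base R, which is handy for checking results. This is a sketch of an equivalent calculation using rowsum, not the package's implementation; gm_base is an illustrative name.

```r
set.seed(1)
X <- matrix(rnorm(50), 10, 5)
Y <- factor(rep(1:2, each = 5))

# Sum the rows of X within each level of Y, then divide by the group sizes
gm_base <- rowsum(X, Y) / as.vector(table(Y))
```

The result has one row per factor level, with row names taken from the levels of Y.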
Return the inverse projection matrix, which can be used to map back to data space. If the component matrix is orthogonal, then the inverse projection is the transpose of the component matrix.
inverse_projection(x, ...)
x |
The model fit. |
... |
Extra arguments. |
The inverse projection matrix.
project for projecting data onto the subspace.
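The orthogonal case can be checked directly in base R. The sketch below, which ignores pre-processing and uses illustrative names, takes an orthonormal component matrix from an SVD, uses its transpose as the inverse projection, and maps scores back to data space.

```r
set.seed(3)
X <- matrix(rnorm(10 * 6), 10, 6)
V <- svd(X)$v[, 1:3]                  # 6 x 3 component matrix, orthonormal columns

# Orthogonality check: t(V) %*% V should be the identity
orthogonal <- isTRUE(all.equal(crossprod(V), diag(3)))

inv_proj <- t(V)                      # 3 x 6 inverse projection
scores <- X %*% V                     # project to the 3-dimensional subspace
X_hat <- scores %*% inv_proj          # rank-3 reconstruction in data space
```

Projecting the reconstruction again returns the same scores, which is what makes the transpose a valid inverse on the subspace.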
test whether components are orthogonal
is_orthogonal(x)
x |
the object |
a logical value indicating whether the transformation is orthogonal
Constructs a multiblock bi-projector using the given component matrix (v), score matrix (s), singular values (sdev), a preprocessing function, and a list of block indices. This allows for two-way mapping with multiblock data.
multiblock_biprojector(
  v,
  s,
  sdev,
  preproc = prep(pass()),
  ...,
  block_indices,
  classes = NULL
)
v |
A matrix of components (nrow = number of variables, ncol = number of components). |
s |
A matrix of scores (nrow = samples, ncol = components). |
sdev |
A numeric vector of singular values or standard deviations. |
preproc |
A pre-processing object (default: |
... |
Extra arguments. |
block_indices |
A list of numeric vectors specifying data block variable indices. |
classes |
Additional class attributes (default NULL). |
A multiblock_biprojector object.
bi_projector, multiblock_projector
Constructs a multiblock projector using the given component matrix (v), a preprocessing function, and a list of block indices. This allows for the projection of multiblock data, where each block represents a different set of variables or features.
multiblock_projector(
  v,
  preproc = prep(pass()),
  ...,
  block_indices,
  classes = NULL
)
v |
A matrix of components with dimensions |
preproc |
A pre-processing function for the data (default: |
... |
Extra arguments. |
block_indices |
A list of numeric vectors specifying the indices of each data block. |
classes |
(optional) A character vector specifying additional class attributes of the object, default is NULL. |
A multiblock_projector object.
projector
# Generate some example data
X1 <- matrix(rnorm(10 * 5), 10, 5)
X2 <- matrix(rnorm(10 * 5), 10, 5)
X <- cbind(X1, X2)
# Compute PCA on the combined data
pc <- pca(X, ncomp = 8)
# Create a multiblock projector using PCA components and block indices
mb_proj <- multiblock_projector(pc$v, block_indices = list(1:5, 6:10))
# Project multiblock data using the multiblock projector
mb_scores <- project(mb_proj, X)
The number of data blocks in a multiblock element
nblocks(x)
x |
the object |
the number of blocks
This function returns the total number of components in the fitted model.
ncomp(x)
x |
A fitted model object. |
The number of components in the fitted model.
# Example using the svd_wrapper function
data(iris)
X <- iris[, 1:4]
fit <- svd_wrapper(X, ncomp = 3, preproc = center(), method = "base")
ncomp(fit)  # Should return 3
Approximate the eigen-decomposition of a large kernel matrix using either the standard Nyström method or the Double Nyström method.
nystrom_approx(
  X,
  kernel_func = NULL,
  ncomp = min(dim(X)),
  landmarks = NULL,
  nlandmarks = 10,
  preproc = pass(),
  method = c("standard", "double"),
  l = NULL,
  use_RSpectra = TRUE,
  ...
)
X |
A numeric matrix or data frame of size (N x D), where N is number of samples. |
kernel_func |
A kernel function with signature |
ncomp |
Number of components (eigenvectors/eigenvalues) to return. |
landmarks |
A vector of row indices (of X) specifying the landmark points.
If NULL, |
nlandmarks |
The number of landmark points to sample if |
preproc |
A pre-processing pipeline (default |
method |
Either "standard" (the classic single-stage Nyström) or "double" (the two-stage Double Nyström method). |
l |
Intermediate rank for the double Nyström method. Ignored if |
use_RSpectra |
Logical. If TRUE, use |
... |
Additional arguments passed to |
The Double Nyström method introduces an intermediate step that reduces the size of the decomposition problem, potentially improving efficiency and scalability.
A bi_projector object with fields:
v: The eigenvectors (N x ncomp) approximating the kernel eigenbasis.
s: The scores (N x ncomp) = v * diag(sdev), analogous to principal component scores.
sdev: The square roots of the eigenvalues.
preproc: The pre-processing pipeline used.
set.seed(123)
X <- matrix(rnorm(1000*1000), 1000, 1000)
# Standard Nyström
res_std <- nystrom_approx(X, ncomp=5, nlandmarks=20, method="standard")
# Double Nyström
res_db <- nystrom_approx(X, ncomp=5, nlandmarks=20, method="double", l=10)
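The standard (single-stage) Nyström construction can be written out in a few lines of base R. The sketch below assumes a Gaussian (RBF) kernel with a hand-picked bandwidth; the helper rbf and all variable names are illustrative and are not the package's defaults.

```r
set.seed(123)
X <- matrix(rnorm(200 * 5), 200, 5)

# Gaussian kernel between the rows of A and the rows of B (bandwidth is an assumption)
rbf <- function(A, B, sigma = 2) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-pmax(d2, 0) / (2 * sigma^2))
}

n <- nrow(X); m <- 20; ncomp <- 5
landmarks <- sample(n, m)                     # landmark row indices
K_mm <- rbf(X[landmarks, ], X[landmarks, ])   # m x m landmark kernel
K_nm <- rbf(X, X[landmarks, ])                # n x m cross kernel

# Eigen-decompose the small landmark kernel only
e <- eigen(K_mm, symmetric = TRUE)
U_m <- e$vectors[, 1:ncomp]
lam_m <- e$values[1:ncomp]

# Extend to all n points and rescale
values <- (n / m) * lam_m
vectors <- sqrt(m / n) * K_nm %*% U_m %*% diag(1 / lam_m)

# Implied low-rank approximation of the full n x n kernel matrix
K_approx <- vectors %*% diag(values) %*% t(vectors)
```

Only an m x m eigenproblem is solved, which is the source of the savings when n is large. The extended eigenvectors are approximate and are not exactly orthonormal.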
Compute the inverse projection of a columnwise subset of the component matrix (e.g., a sub-block). Even when the full component matrix is orthogonal, there is no guarantee that the partial component matrix is orthogonal.
partial_inverse_projection(x, colind, ...)
x |
A fitted model object, such as a |
colind |
A numeric vector specifying the column indices of the component matrix to consider for the partial inverse projection. |
... |
Additional arguments to be passed to the specific model implementation of |
A matrix representing the partial inverse projection.
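The loss of orthogonality in a sub-block is easy to see numerically. The base-R sketch below takes a subset of rows (variables) of an orthonormal component matrix, shows that its Gram matrix is no longer the identity, and falls back to a pseudo-inverse. Using the Moore-Penrose pseudo-inverse here is an assumption made for illustration; the package may compute the partial inverse differently.

```r
set.seed(11)
X <- matrix(rnorm(10 * 20), 10, 20)
V <- svd(X)$v[, 1:4]                  # 20 x 4, orthonormal columns

colind <- 1:10
V_sub <- V[colind, , drop = FALSE]    # 10 x 4 sub-block of the component matrix

full_gram <- crossprod(V)             # identity for the full matrix
sub_gram <- crossprod(V_sub)          # generally not the identity

# Pseudo-inverse of the sub-block (full column rank assumed)
pinv_sub <- solve(sub_gram, t(V_sub)) # 4 x 10
```

Because sub_gram differs from the identity, t(V_sub) alone would not invert the partial projection, whereas pinv_sub %*% V_sub recovers the identity.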
Project a selected subset of column indices onto the subspace. This function allows for the projection of new data onto a lower-dimensional space using only a subset of the variables, as specified by the column indices.
partial_project(x, new_data, colind)
x |
The model fit, typically an object of class |
new_data |
A matrix or vector of new observations with a subset of columns equal to length of |
colind |
A numeric vector of column indices to select in the projection matrix. These indices correspond to the variables used for the partial projection |
A matrix or vector of the partially projected observations, where rows represent observations and columns represent the lower-dimensional space
bi_projector for an example of a class that implements a partial_project method
# Example with the bi_projector class
X <- matrix(rnorm(10*20), 10, 20)
svdfit <- svd(X)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev = svdfit$d)
# Partially project new_data onto the same subspace as the original data
# using only the first 10 variables
new_data <- matrix(rnorm(5*20), 5, 20)
colind <- 1:10
partially_projected_data <- partial_project(p, new_data[, colind], colind)
Applies partial_project() through each projector in the composition. If colind is a single vector, it applies to the first projector only. Subsequent projectors apply full columns. If colind is a list, each element specifies the colind for the corresponding projector in the chain.
## S3 method for class 'composed_partial_projector'
partial_project(x, new_data, colind, ...)
x |
A |
new_data |
The input data matrix or vector. |
colind |
A numeric vector or a list of numeric vectors. If a single vector, applies to the first projector.
If a list, its length must match the number of projectors in |
... |
Additional arguments passed to |
The partially projected data after all projectors are applied.
Create a new projector instance restricted to a subset of input columns. This function allows for the generation of a new projection object that focuses only on the specified columns, enabling the projection of data using a limited set of variables.
partial_projector(x, colind, ...)
x |
The original |
colind |
A numeric vector of column indices to select in the projection matrix. These indices correspond to the variables used for the partial projector |
... |
Additional arguments passed to the underlying |
A new projector instance, with the same class as the original object, that is restricted to the specified subset of input columns
bi_projector for an example of a class that implements a partial_projector method
# Example with the bi_projector class
X <- matrix(rnorm(10*20), 10, 20)
svdfit <- svd(X)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev=svdfit$d)
# Create a partial projector using only the first 10 variables
colind <- 1:10
partial_p <- partial_projector(p, colind)
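To make the idea of a column-restricted projection concrete, here is a minimal base-R sketch. It assumes, purely for illustration, that scores are recovered by least squares against the retained rows of the loading matrix; this is not necessarily what partial_project() or partial_projector() do internally. The names V_sub, scores_full, and scores_partial are hypothetical.

```r
set.seed(1)
X <- matrix(rnorm(10 * 20), 10, 20)
sv <- svd(X)
V <- sv$v[, 1:3]                       # 20 x 3 loading matrix (first 3 components)

# New observations that lie exactly in the 3-dimensional subspace
scores_full <- matrix(rnorm(5 * 3), 5, 3)
new_data <- scores_full %*% t(V)       # 5 x 20

# Keep only the first 10 variables and the matching rows of V
colind <- 1:10
V_sub <- V[colind, , drop = FALSE]

# Least-squares estimate of the scores from the partial data (assumed approach)
scores_partial <- new_data[, colind] %*% V_sub %*% solve(crossprod(V_sub))
```

Because new_data was built to lie in the subspace, the partial estimate agrees with the full scores here; with noisy data the two would generally differ.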
Construct a partial_projector from a projector instance
## S3 method for class 'projector'
partial_projector(x, colind, ...)
x |
The original |
colind |
A numeric vector of column indices to select in the projection matrix. These indices correspond to the variables used for the partial projector |
... |
Additional arguments passed to the underlying |
A partial_projector instance
# Assuming pfit is a projector with many components:
# pp <- partial_projector(pfit, 1:5)
pass simply passes its data through the chain
pass(preproc = prepper())
preproc |
the pre-processing pipeline |
a prepper list
Compute the directions of maximal variance in a data matrix using the Singular Value Decomposition (SVD).
pca(
  X,
  ncomp = min(dim(X)),
  preproc = center(),
  method = c("fast", "base", "irlba", "propack", "rsvd", "svds"),
  ...
)
X |
The data matrix. |
ncomp |
The number of requested components to estimate (default is the minimum dimension of the data matrix). |
preproc |
The pre-processing function to apply to the data matrix (default is centering). |
method |
The SVD method to use, passed to |
... |
Extra arguments to send to |
A bi_projector object containing the PCA results.
svd_wrapper for details on SVD methods.
data(iris)
X <- as.matrix(iris[, 1:4])
res <- pca(X, ncomp = 4)
tres <- truncate(res, 3)
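For readers who want to see what "directions of maximal variance via the SVD" amounts to, the following base-R sketch computes a PCA by hand. It is an illustration under the assumption of simple column centering, not the package's implementation; the variable names are hypothetical.

```r
X <- as.matrix(iris[, 1:4])

# Column-centre the data (the role played by the default centering step)
Xc <- scale(X, center = TRUE, scale = FALSE)

sv <- svd(Xc)
loadings <- sv$v                         # directions of maximal variance
pc_scores <- sv$u %*% diag(sv$d)         # sample scores
sdev <- sv$d / sqrt(nrow(X) - 1)         # component standard deviations
var_explained <- sv$d^2 / sum(sv$d^2)    # fraction of variance per component
```

The standard deviations computed this way match those returned by stats::prcomp(X), which uses the same centering and divisor.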
Estimate confidence intervals for model parameters using permutation testing.
perm_ci(x, X, nperm, ...)
x |
A model fit object. |
X |
The original data matrix used to fit the model. |
nperm |
The number of permutations to perform for the confidence interval estimation. |
... |
Additional arguments to be passed to the specific model implementation of |
A list containing the estimated lower and upper bounds of the confidence intervals for model parameters.
Perform a permutation test to assess the significance of variance explained by PCA components.
## S3 method for class 'pca'
perm_ci(x, X, nperm = 100, k = 4, distr = "gamma", parallel = FALSE, ...)
x |
A PCA object from |
X |
The original data matrix used for PCA. |
nperm |
Number of permutations. |
k |
Number of components (beyond the first) to test. Default tests up to |
distr |
Distribution to fit to the permutation results ("gamma", "norm", or "empirical"). |
parallel |
Logical, whether to use parallel processing for permutations. |
... |
Additional arguments passed to |
The function computes a statistic F_a for each component a, representing the fraction of variance explained relative to the remaining components. It then uses permutations of the preprocessed data to generate a null distribution. The first component uses the full data; subsequent components are tested by partialing out previously identified components and permuting the residuals.
By default, a gamma distribution is fit to the permuted values to derive CIs and p-values. If distr="empirical", it uses empirical quantiles instead.
A list containing:
The observed F_a values for tested components.
A matrix of permuted F-values. Each column corresponds to a component.
A list of fit objects or NULL if empirical chosen.
Computed confidence intervals for each component.
p-values for each component.
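The statistic and null distribution described above can be sketched in base R. This is a simplified illustration, not the package's code: it assumes F_a is the a-th eigenvalue divided by the sum of eigenvalues a and beyond, it permutes each column independently (an assumption about the permutation scheme), and it derives an empirical p-value for the first component only, skipping the partialing-out step used for later components.

```r
set.seed(1)
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

# F_a = ev_a / (ev_a + ev_{a+1} + ...), computed for every component
f_stat <- function(M) {
  ev <- svd(M)$d^2
  ev / rev(cumsum(rev(ev)))
}

f_obs <- f_stat(X)

# Null distribution: shuffle each column independently, then recompute F
nperm <- 50
f_perm <- t(replicate(nperm, f_stat(apply(X, 2, sample))))  # nperm x 4

# Empirical p-value for the first component
p_first <- (sum(f_perm[, 1] >= f_obs[1]) + 1) / (nperm + 1)
```

Fitting a gamma distribution to the columns of f_perm, as the default distr setting describes, would replace the empirical count in the last line.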
predict with a classifier object
## S3 method for class 'classifier'
predict(
  object,
  new_data,
  ncomp = NULL,
  colind = NULL,
  metric = c("euclidean", "cosine", "ejaccard"),
  normalize_probs = FALSE,
  ...
)
object |
classifier |
new_data |
new data |
ncomp |
number of components |
colind |
column indices |
metric |
similarity metric |
normalize_probs |
logical |
... |
extra args |
list with class and prob
prepare a dataset by applying a pre-processing pipeline
prep(x, ...)
x |
the pipeline |
... |
extra args |
the pre-processed data
Prepares a pre-processing pipeline for application by creating init, transform, and reverse_transform functions.
## S3 method for class 'prepper'
prep(x, ...)
This function calculates the principal angles between subspaces derived from a list of bi_projector instances.
prinang(fits)
fits |
a list of |
a numeric vector of principal angles with length equal to the minimum dimension of input subspaces
data(iris)
X <- as.matrix(iris[, 1:4])
res <- pca(X, ncomp = 4)
fits_list <- list(res,res,res)
principal_angles <- prinang(fits_list)
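As background, principal angles between two subspaces can be computed with a QR step followed by an SVD. The sketch below covers the two-subspace case in base R; the helper name principal_angles_sketch is hypothetical, and how prinang() combines more than two fits is not shown here.

```r
# Principal angles (in radians) between the column spaces of A and B
principal_angles_sketch <- function(A, B) {
  QA <- qr.Q(qr(A))                 # orthonormal basis for span(A)
  QB <- qr.Q(qr(B))                 # orthonormal basis for span(B)
  s <- svd(crossprod(QA, QB))$d     # cosines of the principal angles
  acos(pmin(pmax(s, 0), 1))         # clamp to [0, 1] against round-off
}

I4 <- diag(4)
same <- principal_angles_sketch(I4[, 1:2], I4[, 1:2])   # identical subspaces
orth <- principal_angles_sketch(I4[, 1:2], I4[, 3:4])   # orthogonal subspaces
```

Identical subspaces give angles of zero and orthogonal subspaces give angles of pi/2, which is a quick sanity check for any implementation.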
Pretty Print S3 Method for bi_projector Class
## S3 method for class 'bi_projector'
print(x, ...)
x |
A |
... |
Additional arguments passed to the print function |
Invisible bi_projector object
Pretty Print S3 Method for bi_projector_union Class
## S3 method for class 'bi_projector_union'
print(x, ...)
x |
A |
... |
Additional arguments passed to the print function |
Invisible bi_projector_union object
Print method for classifier objects
Display a human-readable summary of a classifier object.
## S3 method for class 'classifier'
print(x, ...)
x |
A |
... |
Additional arguments. |
classifier object.
Print a concat_pre_processor object
## S3 method for class 'concat_pre_processor'
print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Print method for multiblock_biprojector objects
Display a summary of a multiblock_biprojector object.
## S3 method for class 'multiblock_biprojector'
print(x, ...)
x |
A |
... |
Additional arguments passed to |
Invisible multiblock_biprojector object.
Display information about a pre_processor using crayon-based formatting.
## S3 method for class 'pre_processor'
print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Uses crayon to produce a colorful and readable representation of the pipeline steps.
## S3 method for class 'prepper'
print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Print method for projector objects
Display a human-readable summary of a projector object using crayon formatting, including information about the dimensions of the projection matrix and the pre-processing pipeline.
## S3 method for class 'projector'
print(x, ...)
x |
A |
... |
Additional arguments passed to |
X <- matrix(rnorm(10*10), 10, 10)
svdfit <- svd(X)
p <- projector(svdfit$v)
print(p)
Print method for regress objects
Display a human-readable summary of a regress object using crayon formatting, including information about the method and dimensions.
## S3 method for class 'regress'
print(x, ...)
x |
A |
... |
Additional arguments passed to |
Project one or more samples onto a subspace. This function takes a model fit and new observations, and projects them onto the subspace defined by the model. This allows for the transformation of new data into the same lower-dimensional space as the original data.
project(x, new_data, ...)
x |
The model fit, typically an object of class bi_projector or any other class that implements a project method |
new_data |
A matrix or vector of new observations with the same number of columns as the original data. Rows represent observations and columns represent variables |
... |
Extra arguments to be passed to the specific project method for the object's class |
A matrix or vector of the projected observations, where rows represent observations and columns represent the lower-dimensional space
bi_projector for an example of a class that implements a project method
Other project: project.cross_projector(), project_block(), project_vars()
# Example with the bi_projector class
X <- matrix(rnorm(10*20), 10, 20)
svdfit <- svd(X)
# Scores are U %*% D (matrix product, not the modulo operator %%)
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev=svdfit$d)
# Project new_data onto the same subspace as the original data
new_data <- matrix(rnorm(5*20), 5, 20)
projected_data <- project(p, new_data)
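The linear algebra behind this example can be checked directly in base R. The sketch assumes no pre-processing (the equivalent of a pass-through pipeline) and treats projection as multiplication by the loading matrix; it is an illustration rather than the package's project() method.

```r
set.seed(1)
X <- matrix(rnorm(10 * 20), 10, 20)
sv <- svd(X)
V <- sv$v                                  # 20 x 10 loading matrix
train_scores <- sv$u %*% diag(sv$d)        # scores of the training rows

# Projecting the training data onto V reproduces its scores, since X V = U D
reproj <- X %*% V

# New observations are mapped into the same 10-dimensional space
new_data <- matrix(rnorm(5 * 20), 5, 20)
new_scores <- new_data %*% V               # 5 x 10
```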
When observations are concatenated into "blocks", it may be useful to project one block from the set. This function facilitates the projection of a specific block of data onto a subspace. It is a convenience method for multi-block fits and is equivalent to a "partial projection" where the column indices are associated with a given block.
project_block(x, new_data, block, ...)
x |
The model fit, typically an object of a class that implements a |
new_data |
A matrix or vector of new observation(s) with the same number of columns as the original data |
block |
An integer representing the block ID to select in the block projection matrix. This ID corresponds to the specific block of data to be projected |
... |
Additional arguments passed to the underlying |
A matrix or vector of the projected data for the specified block
project for the generic projection function
Other project: project(), project.cross_projector(), project_vars()
Projects the new data onto the subspace defined by a specific block of variables.
## S3 method for class 'multiblock_projector'
project_block(x, new_data, block, ...)
x |
A |
new_data |
The new data to be projected. |
block |
The block index (1-based) to project onto. |
... |
Additional arguments passed to |
The projected scores for the specified block.
This function projects one or more variables onto a subspace. It is often called supplementary variable projection and can be computed for a biorthogonal decomposition, such as Singular Value Decomposition (SVD).
project_vars(x, new_data, ...)
x |
The model fit, typically an object of a class that implements a |
new_data |
A matrix or vector of new observation(s) with the same number of rows as the original data |
... |
Additional arguments passed to the underlying |
A matrix or vector of the projected variables in the subspace
project for the generic projection function for samples
Other project: project(), project.cross_projector(), project_block()
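For an SVD, supplementary variable projection has a simple closed form, sketched below in base R. The formula used (transpose of the new variables times U times the inverse singular values) is a standard identity for X = U D V' and is offered as an assumption about what such a projection computes, not as the package's exact code.

```r
set.seed(1)
X <- matrix(rnorm(10 * 20), 10, 20)
sv <- svd(X)
U <- sv$u
d <- sv$d

# Projecting the original variables recovers the loadings: X' U D^-1 = V
vars_proj <- t(X) %*% U %*% diag(1 / d)

# Supplementary variables: same number of rows (samples) as X
new_vars <- matrix(rnorm(10 * 3), 10, 3)
supp <- t(new_vars) %*% U %*% diag(1 / d)   # 3 x 10, one row per new variable
```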
project a cross_projector instance
## S3 method for class 'cross_projector'
project(x, new_data, source = c("X", "Y"), ...)
x |
The model fit, typically an object of class bi_projector or any other class that implements a project method |
new_data |
A matrix or vector of new observations with the same number of columns as the original data. Rows represent observations and columns represent variables |
source |
the source of the data (X or Y block) |
... |
Extra arguments to be passed to the specific project method for the object's class |
the projected data
Other project: project(), project_block(), project_vars()
Construct a projector instance
A projector maps a matrix from an N-dimensional space to a d-dimensional space, where d may be less than N. The projection matrix, v, is not necessarily orthogonal. This function constructs a projector instance which can be used for various dimensionality reduction techniques like PCA, LDA, etc.
projector(v, preproc = prep(pass()), ..., classes = NULL)
v |
A matrix of coefficients with dimensions |
preproc |
A prepped pre-processing object. Default is the no-processing |
... |
Extra arguments to be stored in the |
classes |
Additional class information used for creating subtypes of |
An instance of type projector.
X <- matrix(rnorm(10*10), 10, 10)
svdfit <- svd(X)
p <- projector(svdfit$v)
proj <- project(p, X)
Calculate Rank Score for Predictions
rank_score(prob, observed)
prob |
matrix of predicted probabilities (observations x classes) |
observed |
vector of observed class labels |
data.frame with prank and observed
Reconstruct a data set from its (possibly) low-rank representation. This can be useful when analyzing the impact of dimensionality reduction or when visualizing approximations of the original data.
reconstruct(x, comp, rowind, colind, ...)
x |
The model fit, typically an object of a class that implements a |
comp |
A vector of component indices to use in the reconstruction |
rowind |
The row indices to reconstruct (optional). If not provided, all rows are used. |
colind |
The column indices to reconstruct (optional). If not provided, all columns are used. |
... |
Additional arguments passed to the underlying |
A reconstructed data set based on the selected components, rows, and columns
bi_projector for an example of a two-way mapping model that can be reconstructed
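A low-rank reconstruction can be illustrated in base R with a centred SVD. The helper reconstruct_sketch below is hypothetical and only mirrors the comp argument; it assumes the reconstruction is scores times transposed loadings with the column means added back.

```r
X <- as.matrix(iris[, 1:4])
mu <- colMeans(X)
Xc <- sweep(X, 2, mu)                 # centred data

sv <- svd(Xc)
S <- sv$u %*% diag(sv$d)              # scores
V <- sv$v                             # loadings

# Rebuild the data from a chosen set of components, then undo the centering
reconstruct_sketch <- function(comp) {
  approx <- S[, comp, drop = FALSE] %*% t(V[, comp, drop = FALSE])
  sweep(approx, 2, mu, "+")
}

full <- reconstruct_sketch(1:4)       # all components: recovers X
low <- reconstruct_sketch(1:2)        # rank-2 approximation
err <- sqrt(sum((X - low)^2))         # Frobenius error of the approximation
```

The rank-2 error equals the root sum of squares of the discarded singular values, which gives a direct measure of what the omitted components carried.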
refit a model given new data or new parameter(s)
refit(x, new_data, ...)
x |
the original model fit object |
new_data |
the new data to process |
... |
extra args |
a refit model object
Fit a multivariate regression model for a matrix of basis functions, X, and a response matrix Y. The goal is to find a projection matrix that can be used for mapping and reconstruction.
regress(
  X,
  Y,
  preproc = NULL,
  method = c("lm", "enet", "mridge", "pls"),
  intercept = FALSE,
  lambda = 0.001,
  alpha = 0,
  ncomp = ceiling(ncol(X)/2),
  ...
)
X |
the set of independent (basis) variables |
Y |
the response matrix |
preproc |
the pre-processor (currently unused) |
method |
the regression method: |
intercept |
whether to include an intercept term |
lambda |
ridge shrinkage parameter (for methods |
alpha |
the elastic net mixing parameter if method is |
ncomp |
number of PLS components if method is |
... |
extra arguments sent to the underlying fitting function |
a bi-projector of type regress
# Generate synthetic data
Y <- matrix(rnorm(100 * 10), 10, 100)
X <- matrix(rnorm(10 * 9), 10, 9)
# Fit regression models and reconstruct the response matrix
r_lm <- regress(X, Y, intercept = FALSE, method = "lm")
recon_lm <- reconstruct(r_lm)
r_mridge <- regress(X, Y, intercept = TRUE, method = "mridge", lambda = 0.001)
recon_mridge <- reconstruct(r_mridge)
r_enet <- regress(X, Y, intercept = TRUE, method = "enet", lambda = 0.001, alpha = 0.5)
recon_enet <- reconstruct(r_enet)
r_pls <- regress(X, Y, intercept = TRUE, method = "pls", ncomp = 5)
recon_pls <- reconstruct(r_pls)
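As a point of reference for the "lm" and ridge-style methods, the coefficient matrices can be written out in base R. This sketch uses the normal equations without an intercept and is only meant to show the shapes and the effect of the shrinkage parameter; it is not the fitting code that regress() uses.

```r
set.seed(1)
Y <- matrix(rnorm(10 * 100), 10, 100)   # 10 samples x 100 responses
X <- matrix(rnorm(10 * 9), 10, 9)       # 10 samples x 9 basis variables

# Ordinary least squares for all response columns at once (9 x 100)
B_ols <- solve(crossprod(X), crossprod(X, Y))

# Ridge version: add lambda to the diagonal of X'X before solving
lambda <- 0.001
B_ridge <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, Y))

# Fitted (reconstructed) responses from the OLS coefficients
Y_hat <- X %*% B_ols
```

The ridge coefficients never have a larger overall norm than the OLS ones, and the gap grows as lambda increases.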
Perform a relative eigenanalysis between two groups, fully integrated with the pre-processing and projector ecosystem. The function computes the directions that maximize the variance ratio between two groups and returns a bi_projector object.
relative_eigen(
  XA,
  XB,
  ncomp = NULL,
  preproc = center(),
  reg_param = 1e-05,
  threshold = 2000,
  ...
)
XA |
A numeric matrix or data frame of observations for group A (n_A x p). |
XB |
A numeric matrix or data frame of observations for group B (n_B x p). |
ncomp |
The number of components to compute. If NULL (default), computes up to |
preproc |
A pre-processing pipeline created with |
reg_param |
A small regularization parameter to ensure numerical stability. Defaults to 1e-5. |
threshold |
An integer specifying the dimension threshold to switch between direct and iterative solvers. Defaults to 2000. |
... |
Additional arguments passed to lower-level functions. |
This function computes the leading eigenvalues and eigenvectors of the generalized eigenvalue problem $\Sigma_A v = \lambda \Sigma_B v$ (with $\Sigma_A$ and $\Sigma_B$ the covariance matrices of groups A and B), fully integrated with the pre-processing ecosystem. It uses a direct solver when the number of variables $p$ is less than or equal to threshold, and switches to an iterative method when $p$ is greater than threshold.
A bi_projector object containing the components, scores, and other relevant information.
# Simulate data for two groups
set.seed(123)
n_A <- 100
n_B <- 80
p <- 500 # Number of variables
XA <- matrix(rnorm(n_A * p), nrow = n_A, ncol = p)
XB <- matrix(rnorm(n_B * p), nrow = n_B, ncol = p)
# Perform relative eigenanalysis
res <- relative_eigen(XA, XB, ncomp = 5)
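A direct solver for a small problem of this kind can be sketched in base R. The sketch assumes the generalized eigenproblem has the form Sigma_A v = lambda Sigma_B v, with a small ridge added to Sigma_B in the spirit of reg_param; it uses a Cholesky reduction to an ordinary symmetric eigenproblem, which is one common technique and not necessarily the one used by relative_eigen().

```r
set.seed(123)
p <- 6
XA <- matrix(rnorm(100 * p), 100, p)
XB <- matrix(rnorm(80 * p), 80, p)

SA <- cov(XA)
SB <- cov(XB) + 1e-5 * diag(p)        # regularised covariance of group B

# Cholesky factor: SB = t(R) %*% R, with R upper triangular
R <- chol(SB)
Rinv <- backsolve(R, diag(p))         # R^-1

# Reduce to a symmetric eigenproblem: C w = lambda w
C <- t(Rinv) %*% SA %*% Rinv
C <- (C + t(C)) / 2                   # remove round-off asymmetry
eg <- eigen(C, symmetric = TRUE)

lambda <- eg$values                   # variance ratios, largest first
V <- Rinv %*% eg$vectors              # generalized eigenvectors
```

Each column of V satisfies SA v = lambda SB v, so the leading columns are the directions along which group A varies most relative to group B.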
Given a new dataset, process it in the same way the original data was processed (e.g. centering, scaling, etc.)
reprocess(x, new_data, colind, ...)
x |
the model fit object |
new_data |
the new data to process |
colind |
the column indices of the new data |
... |
extra args |
the reprocessed data
reprocess a cross_projector instance
## S3 method for class 'cross_projector'
reprocess(x, new_data, colind = NULL, source = c("X", "Y"), ...)
x |
the model fit object |
new_data |
the new data to process |
colind |
the column indices of the new data |
source |
the source of the data (X or Y block) |
... |
extra args |
the re(pre-)processed data
Compute a regression model for each column in a matrix and return residual matrix
residualize(form, X, design, intercept = FALSE)
form |
the formula defining the model to fit for residuals |
X |
the response matrix |
design |
the |
intercept |
add an intercept term (default is FALSE) |
a matrix of residuals
X <- matrix(rnorm(20*10), 20, 10)
des <- data.frame(a=rep(letters[1:4], 5), b=factor(rep(1:5, each=4)))
xresid <- residualize(~ a+b, X, design=des)
## design is saturated, residuals should be zero (up to floating-point error)
xresid2 <- residualize(~ a*b, X, design=des)
isTRUE(all.equal(sum(xresid2), 0))
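The same kind of column-wise residualization can be sketched with base R alone, using model.matrix() and a QR decomposition. This is an illustration of the idea rather than the body of residualize(); in particular it keeps the default intercept column that model.matrix() adds.

```r
set.seed(1)
X <- matrix(rnorm(20 * 10), 20, 10)
des <- data.frame(a = rep(letters[1:4], 5), b = factor(rep(1:5, each = 4)))

# Additive design: residuals are what the design cannot explain
M <- model.matrix(~ a + b, data = des)
res <- qr.resid(qr(M), X)             # one regression per column of X

# Saturated design (one observation per a-by-b cell): nothing is left over
M_sat <- model.matrix(~ a * b, data = des)
res_sat <- qr.resid(qr(M_sat), X)
```

The residuals are orthogonal to every column of the design matrix, and for the saturated design they vanish up to floating-point error, which is why a tolerance-based comparison is safer than testing for exact zero.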
Calculate the residuals of a model after removing the effect of the first ncomp components. This function is useful to assess the quality of the fit or to identify patterns that are not captured by the model.
residuals(x, ncomp, xorig, ...)
x |
The model fit object. |
ncomp |
The number of components to factor out before calculating residuals. |
xorig |
The original data matrix (X) used to fit the model. |
... |
Additional arguments passed to the method. |
A matrix of residuals, with the same dimensions as the original data matrix.
reverse a pre-processing transform
reverse_transform(x, X, colind, ...)
x |
the pre_processor |
X |
the data matrix |
colind |
column indices |
... |
extra args |
the reverse-transformed data
Given a model object (e.g. projector), construct a random forest classifier that can generate predictions for new data points.
rf_classifier(x, colind, ...)
x |
the model object |
colind |
the (optional) column indices used for prediction |
... |
extra arguments to |
a random forest classifier
Uses randomForest to train a random forest on the provided scores and labels.
## S3 method for class 'projector'
rf_classifier(x, colind = NULL, labels, scores, ...)
x |
a projector object |
colind |
optional col indices |
labels |
class labels |
scores |
reference scores |
... |
passed to |
an rf_classifier object with rfres (rf model), labels, scores
Perform a rotation of the component loadings to improve interpretability.
rotate(x, ncomp, type)
x |
The model fit, typically a result from a dimensionality reduction method like PCA. |
ncomp |
The number of components to rotate. |
type |
The type of rotation to apply (e.g., "varimax", "quartimax", "promax"). |
A modified model fit with the rotated components.
Apply a specified rotation to the component loadings of a PCA model. This function leverages the GPArotation package to apply orthogonal or oblique rotations.
## S3 method for class 'pca'
rotate(
  x,
  ncomp,
  type = c("varimax", "quartimax", "promax"),
  loadings_type = c("pattern", "structure"),
  score_method = c("auto", "recompute", "original"),
  ...
)
x |
A PCA model object, typically created using the |
ncomp |
The number of components to rotate. Must be <= ncomp(x). |
type |
The type of rotation to apply. Supported rotation types:
|
... |
Additional arguments passed to GPArotation functions. |
A modified PCA object with class rotated_pca and additional fields:
v |
Rotated loadings |
s |
Rotated scores |
sdev |
Updated standard deviations of rotated components |
explained_variance |
Proportion of explained variance for each rotated component |
rotation |
A list with rotation details: type, R (orth) or Phi (oblique), and loadings_type |
# Perform PCA on iris dataset
data(iris)
X <- as.matrix(iris[,1:4])
res <- pca(X, ncomp=4)
# Apply varimax rotation to the first 3 components
rotated_res <- rotate(res, ncomp=3, type="varimax")
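To see what an orthogonal rotation does to a set of loadings, the following sketch uses stats::varimax() from base R in place of the GPArotation routines mentioned above; that substitution is deliberate so the example has no extra dependencies. Scaling the loadings by the component standard deviations is an assumption made for this illustration.

```r
X <- as.matrix(iris[, 1:4])
pc <- prcomp(X)

# Loadings for the first 3 components, scaled by their standard deviations
L <- pc$rotation[, 1:3] %*% diag(pc$sdev[1:3])

vm <- varimax(L)
rotated <- unclass(vm$loadings)   # rotated loading matrix (4 x 3)
R <- vm$rotmat                    # 3 x 3 orthogonal rotation matrix
```

Because R is orthogonal, the rotated loadings equal L %*% R and span the same subspace as the original three components; only the axes within that subspace change.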
Extract the factor score matrix from a fitted model. The factor scores represent the projections of the data onto the components, which can be used for further analysis or visualization.
scores(x, ...)
x |
The model fit object. |
... |
Additional arguments passed to the method. |
A matrix of factor scores, with rows corresponding to samples and columns to components.
project for projecting new data onto the components.
The standard deviations of the projected data matrix
sdev(x)
x |
the model fit |
the standard deviations
Get the input/output shape of the projector.
shape(x, ...)
x |
The model fit. |
... |
Extra arguments. |
This function retrieves the dimensions of the sample loadings matrix v in the form of a vector with two elements. The first element is the number of rows in the v matrix, and the second element is the number of columns.
A vector containing the dimensions of the sample loadings matrix v (number of rows and columns).
shape of a cross_projector instance
## S3 method for class 'cross_projector'
shape(x, source = c("X", "Y"), ...)
x |
The model fit. |
source |
the source of the data (X or Y block) |
... |
Extra arguments. |
the shape of the data
center and scale each vector of a matrix
standardize(preproc = prepper(), cmeans = NULL, sds = NULL)
preproc |
the pre-processing pipeline |
cmeans |
an optional vector of column means |
sds |
an optional vector of sds |
a prepper list
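The transform that a centering-and-scaling step represents, together with its inverse, can be written in a few lines of base R. This is a sketch of the arithmetic only; it does not use the prepper pipeline, and the names Z and X_back are hypothetical.

```r
X <- as.matrix(iris[, 1:4])

cmeans <- colMeans(X)            # column means (cf. the cmeans argument)
sds <- apply(X, 2, sd)           # column standard deviations (cf. sds)

# Forward transform: subtract the means, divide by the standard deviations
Z <- sweep(sweep(X, 2, cmeans, "-"), 2, sds, "/")

# Reverse transform: multiply by the standard deviations, add the means back
X_back <- sweep(sweep(Z, 2, sds, "*"), 2, cmeans, "+")
```

Supplying cmeans and sds explicitly, as the arguments allow, would correspond to skipping the first two computations and using known values instead.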
Calculate standardized factor scores from a fitted model. Standardized scores are useful for comparing the contributions of different components on the same scale, which can help in interpreting the results.
std_scores(x, ...)
x |
The model fit object. |
... |
Additional arguments passed to the method. |
A matrix of standardized factor scores, with rows corresponding to samples and columns to components.
scores for retrieving the original component scores.
Computes the singular value decomposition of a matrix using one of the specified methods. It is designed to be an easy-to-use wrapper for various SVD methods available in R.
svd_wrapper(
  X,
  ncomp = min(dim(X)),
  preproc = pass(),
  method = c("fast", "base", "irlba", "propack", "rsvd", "svds"),
  q = 2,
  p = 10,
  tol = .Machine$double.eps,
  ...
)
X |
the input matrix |
ncomp |
the number of components to estimate (default: min(dim(X))) |
preproc |
the pre-processor to apply on the input matrix (e.g., |
method |
the SVD method to use: 'base', 'fast', 'irlba', 'propack', 'rsvd', or 'svds' |
q |
parameter passed to method |
p |
parameter passed to method |
tol |
minimum eigenvalue magnitude, otherwise component is dropped (default: .Machine$double.eps) |
... |
extra arguments passed to the selected SVD function |
an SVD object that extends projector
# Load iris dataset and select the first four columns
data(iris)
X <- iris[, 1:4]
# Compute SVD using the base method and 3 components
fit <- svd_wrapper(X, ncomp = 3, preproc = center(), method = "base")
This function transposes a model by switching coefficients and scores. It is useful when you want to reverse the roles of samples and variables in a model, especially in the context of dimensionality reduction methods.
transpose(x, ...)
x |
The model fit, typically an object of a class that implements a |
... |
Additional arguments passed to the underlying |
A transposed model with coefficients and scores switched
bi_projector for an example of a two-way mapping model that can be transposed
take the first n components of a decomposition
truncate(x, ncomp)
x |
the object to truncate |
ncomp |
number of components to retain |
a truncated object (e.g. PCA with 'ncomp' components)