| Title: | Multivariate Regression Association Measure |
|---|---|
| Description: | Implementations of an estimator for the multivariate regression association measure (MRAM) proposed in Shih and Chen (2026) <doi:10.1016/j.csda.2025.108288> and its associated variable selection algorithm. The MRAM quantifies the predictability of a random vector Y from a random vector X given a random vector Z. It takes the maximum value 1 if and only if Y is almost surely a measurable function of X and Z, and the minimum value 0 if Y is conditionally independent of X given Z. The MRAM generalizes the Kendall's tau copula correlation ratio proposed in Shih and Emura (2021) <doi:10.1016/j.jmva.2020.104708> by employing the spatial sign function. The estimator is based on the nearest neighbor method, and the associated variable selection algorithm is adapted from the feature ordering by conditional independence (FOCI) algorithm of Azadkia and Chatterjee (2021) <doi:10.1214/21-AOS2073>. For further details, see Shih and Chen (2026) <doi:10.1016/j.csda.2025.108288>. |
| Authors: | Jia-Han Shih [aut, cre], Yi-Hau Chen [aut] |
| Maintainer: | Jia-Han Shih <[email protected]> |
| License: | GPL-2 |
| Version: | 1.0.1 |
| Built: | 2026-05-25 10:35:52 UTC |
| Source: | https://github.com/cran/MRAM |
Compute the estimate of the multivariate regression association measure using the nearest neighbor method, and its standard error estimates using the m-out-of-n bootstrap.
```r
mram(y_data, x_data, z_data = NULL, bootstrap = FALSE, B = 1000, g_vec = seq(0.4, 0.9, by = 0.05))
```
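| Argument | Description |
|---|---|
| y_data | A vector or matrix of observations of the response Y, with one row per observation. |
| x_data | A vector or matrix of observations of X, with the same number of rows as y_data. |
| z_data | A vector or matrix of observations of the conditioning variable Z, or NULL (the default) for the unconditional measure. |
| bootstrap | Logical; if TRUE, perform the m-out-of-n bootstrap to obtain standard error estimates. The default value is FALSE. |
| B | Number of bootstrap replications. The default value is 1000. |
| g_vec | A vector of candidate values for g, which determines the resample size m of the m-out-of-n bootstrap (see Details). The default value is seq(0.4, 0.9, by = 0.05). |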
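Let $(Y_1, X_1, Z_1), \ldots, (Y_n, X_n, Z_n)$ be independent and identically distributed data from the population $(Y, X, Z)$. The estimate for the unconditional measure (z_data = NULL) is given as
where $\langle \cdot, \cdot \rangle$ is the dot product, $S(\cdot)$ is the spatial sign function, defined by $S(x) = x / \lVert x \rVert$ for $x \neq 0$ and $S(0) = 0$, and $N(i)$ is the index such that $X_{N(i)}$ is the nearest neighbor of $X_i$ according to the Euclidean distance. The estimate for the conditional measure is given as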
See the paper Shih and Chen (2026) for more details.
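The two building blocks named above, the spatial sign function and the nearest-neighbor index, can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's internal code: the helper names spatial_sign, nn_index and sign_dot are hypothetical, and the way these pieces are combined into the estimate is given in Shih and Chen (2026), not here.

```r
# Spatial sign function: S(x) = x / ||x|| for x != 0, and S(0) = 0
spatial_sign <- function(x) {
  nrm <- sqrt(sum(x^2))
  if (nrm == 0) x * 0 else x / nrm
}

# Nearest-neighbor index N(i): for each row i of x, the index of the closest
# other row in Euclidean distance (ties go to the smallest index via which.min)
nn_index <- function(x) {
  d <- as.matrix(dist(as.matrix(x)))
  diag(d) <- Inf  # a point is not its own neighbor
  unname(apply(d, 1, which.min))
}

# Dot product of two spatial signs; it always lies in [-1, 1]
sign_dot <- function(a, b) sum(spatial_sign(a) * spatial_sign(b))

# Usage on data shaped like the mram() examples (hypothetical helpers)
set.seed(1)
x <- matrix(rnorm(20), 10, 2)
nn <- nn_index(x)
sign_dot(c(1, 0), c(0, 2))  # orthogonal directions give 0
```

For the conditional measure, the same nn_index helper could be applied to the combined data cbind(x, z), but how the package forms its neighbors in that case is an assumption left to the paper.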
For the m-out-of-n bootstrap, the resample size is set to $m = \lfloor n^{g} \rfloor$, where $\lfloor x \rfloor$ denotes the largest integer that is smaller than or equal to $x$ and $g$ takes values from the vector g_vec. It is recommended to use T_se_cluster, the standard error estimate obtained from the cluster rule. See Dette and Kroll (2024) for more details.
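The resample-size rule and the general shape of an m-out-of-n bootstrap standard error can be sketched as below. This is a generic illustration, not the package's implementation: resample_sizes and moon_se are hypothetical helpers, resampling with replacement and the sqrt(m / n) rescaling (which presumes a root-n rate) are assumptions, the sample mean stands in for the MRAM estimate, and the cluster rule used to choose among the candidate m values is not reproduced.

```r
# Candidate resample sizes m = floor(n^g), one per g (default mirrors mram's g_vec)
resample_sizes <- function(n, g_vec = seq(0.4, 0.9, by = 0.05)) {
  floor(n^g_vec)
}

# Generic m-out-of-n bootstrap standard error (hypothetical helper, not MRAM code).
# Draws B resamples of size m with replacement, recomputes `stat`, and rescales the
# spread by sqrt(m / n), which assumes the statistic converges at the root-n rate.
moon_se <- function(x, stat, m, B = 1000) {
  x <- as.matrix(x)
  n <- nrow(x)
  reps <- replicate(B, stat(x[sample.int(n, m, replace = TRUE), , drop = FALSE]))
  sqrt(m / n) * sd(reps)
}

set.seed(1)
x <- rnorm(400)
resample_sizes(length(x))
moon_se(x, mean, m = 40, B = 500)  # close to 1 / sqrt(400) = 0.05
```

Computing one standard error per entry of resample_sizes() gives a vector analogous in spirit to T_se_vec; selecting a single value from it is the job of the cluster rule described by Dette and Kroll (2024).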
The mram function is used by the vs_mram function for variable selection.
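| Component | Description |
|---|---|
| T_est | The estimate of the multivariate regression association measure. The value returned by … |
| T_se_cluster | The standard error estimate based on the cluster rule. |
| m_vec | The vector of resample sizes m, one for each value in g_vec. |
| T_se_vec | The vector of standard error estimates obtained from the m-out-of-n bootstrap, one for each resample size in m_vec. |
| J_cluster | The index of the best m in m_vec selected by the cluster rule. |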
Dette and Kroll (2024) A Simple Bootstrap for Chatterjee’s Rank Correlation, Biometrika, asae045.
Shih and Chen (2026) Measuring multivariate regression association via spatial sign, Computational Statistics & Data Analysis, 215, 108288.
```r
library(MRAM)
n = 100
set.seed(1)
x_data = matrix(rnorm(n*2),n,2)
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+x_data[,1]+rnorm(n)
y_data[,2] = x_data[,1]*x_data[,2]-x_data[,1]+rnorm(n)
# conditional measures: X1 given X2, then X2 given X1
mram(y_data,x_data[,1],x_data[,2])
mram(y_data,x_data[,2],x_data[,1])
# unconditional measures
mram(y_data,x_data[,1])
mram(y_data,x_data[,2])
## Not run:
# perform the m-out-of-n bootstrap
mram(y_data,x_data[,1],x_data[,2],bootstrap = TRUE)
mram(y_data,x_data[,2],x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,2],bootstrap = TRUE)
## End(Not run)
```
Select a subset of the predictors X that can be used to predict the response Y, based on the multivariate regression association measure.
```r
vs_mram(y_data, x_data)
```
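| Argument | Description |
|---|---|
| y_data | A vector or matrix of observations of the response Y, with one row per observation. |
| x_data | A matrix of observations of the candidate predictors, with one column per predictor and the same number of rows as y_data. |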
vs_mram performs forward stepwise variable selection based on the multivariate regression association measure proposed in Shih and Chen (2026). At each step, it selects the predictor with the highest conditional predictability for the response given the previously selected predictors. The algorithm is adapted from the FOCI algorithm of Azadkia and Chatterjee (2021).
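The greedy forward step described above can be sketched generically in base R. This is not vs_mram itself: forward_select and adj_r2_gain are hypothetical helpers, the gain in adjusted R-squared from a linear model is swapped in purely as a stand-in for the conditional MRAM estimate, and stopping once no candidate has a positive score is an assumption modeled on the FOCI idea rather than a confirmed detail of the package.

```r
# Greedy forward selection in the spirit of FOCI (a sketch, not vs_mram itself).
# cond_score(y, x, j, selected) is a placeholder for the conditional predictability
# of y from column j of x given the already selected columns; selection stops
# when no remaining candidate has a positive score.
forward_select <- function(y, x, cond_score) {
  selected <- integer(0)
  remaining <- seq_len(ncol(x))
  while (length(remaining) > 0) {
    scores <- vapply(remaining, function(j) cond_score(y, x, j, selected), numeric(1))
    if (max(scores) <= 0) break
    best <- remaining[which.max(scores)]
    selected <- c(selected, best)
    remaining <- setdiff(remaining, best)
  }
  selected
}

# Stand-in score for illustration only: gain in adjusted R^2 of a linear model
adj_r2_gain <- function(y, x, j, selected) {
  adj <- function(cols) {
    if (length(cols) == 0) return(0)
    summary(lm(y ~ x[, cols, drop = FALSE]))$adj.r.squared
  }
  adj(c(selected, j)) - adj(selected)
}

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), n, 5)
y <- 2 * x[, 1] - 2 * x[, 3] + rnorm(n)
sel <- forward_select(y, x, adj_r2_gain)
sel  # columns 1 and 3 are picked first
```

Passing a scoring function keeps the search loop separate from the dependence measure, so a conditional MRAM estimate could in principle be plugged in without changing the loop.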
The vector containing the indices of the selected predictors in the order they were chosen.
Azadkia and Chatterjee (2021) A simple measure of conditional dependence, Annals of Statistics, 49(6): 3070-3102.
Shih and Chen (2026) Measuring multivariate regression association via spatial sign, Computational Statistics & Data Analysis, 215, 108288.
```r
library(MRAM)
n = 200
p = 10
set.seed(1)
x_data = matrix(rnorm(p*n),n,p)
colnames(x_data) = paste0(rep("X",p),seq(1,p))
y_data = x_data[,1]*x_data[,2]+x_data[,1]-x_data[,3]+rnorm(n)
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables
## Not run:
n = 500
p = 10
set.seed(1)
x_data = matrix(rnorm(p*n),n,p)
colnames(x_data) = paste0(rep("X",p),seq(1,p))
# Linear
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+x_data[,1]-x_data[,3]+rnorm(n)
y_data[,2] = x_data[,2]*x_data[,4]+x_data[,2]-x_data[,5]+rnorm(n)
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables
# Nonlinear
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+sin(x_data[,1]*x_data[,3])+0.3*rnorm(n)
y_data[,2] = cos(x_data[,2]*x_data[,4])+x_data[,3]-x_data[,4]+0.3*rnorm(n)
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables
# Non-additive error
y_data = matrix(0,n,2)
y_data[,1] = abs(x_data[,1]+runif(n))^(sin(x_data[,2])-cos(x_data[,3]))
y_data[,2] = abs(x_data[,2]-runif(n))^(sin(x_data[,3])-cos(x_data[,4]))
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables
## End(Not run)
```