Package 'MRAM'

Title: Multivariate Regression Association Measure
Description: Implementations of an estimator for the multivariate regression association measure (MRAM) proposed in Shih and Chen (2026) <doi:10.1016/j.csda.2025.108288> and its associated variable selection algorithm. The MRAM quantifies the predictability of a random vector Y from a random vector X given a random vector Z. It takes the maximum value 1 if and only if Y is almost surely a measurable function of X and Z, and the minimum value of 0 if Y is conditionally independent of X given Z. The MRAM generalizes the Kendall's tau copula correlation ratio proposed in Shih and Emura (2021) <doi:10.1016/j.jmva.2020.104708> by employing the spatial sign function. The estimator is based on the nearest neighbor method, and the associated variable selection algorithm is adapted from the feature ordering by conditional independence (FOCI) algorithm of Azadkia and Chatterjee (2021) <doi:10.1214/21-AOS2073>. For further details, see the paper Shih and Chen (2026) <doi:10.1016/j.csda.2025.108288>.
Authors: Jia-Han Shih [aut, cre], Yi-Hau Chen [aut]
Maintainer: Jia-Han Shih <[email protected]>
License: GPL-2
Version: 1.0.1
Built: 2026-05-25 10:35:52 UTC
Source: https://github.com/cran/MRAM

Estimate the Multivariate Regression Association Measure

Description

Compute $T_n$ and its standard error estimates using the nearest neighbor method and the $m$-out-of-$n$ bootstrap.

Usage

mram(
  y_data,
  x_data,
  z_data = NULL,
  bootstrap = FALSE,
  B = 1000,
  g_vec = seq(0.4, 0.9, by = 0.05)
)

Arguments

y_data

An $n \times d$ matrix of responses, where $n$ is the sample size.

x_data

An $n \times p$ matrix of predictors.

z_data

An $n \times q$ matrix of conditional predictors. The default value is NULL.

bootstrap

Perform the $m$-out-of-$n$ bootstrap if TRUE. The default value is FALSE.

B

Number of bootstrap replications. The default value is 1000.

g_vec

A vector of candidate values for $\gamma$ between 0 and 1, used to generate a collection of rules for the $m$-out-of-$n$ bootstrap. The default value is seq(0.4, 0.9, by = 0.05).

Details

Let $\{(\mathbf{X}_i, \mathbf{Y}_i, \mathbf{Z}_i)\}_{i = 1}^n$ be independent and identically distributed data from the population $(\mathbf{X}, \mathbf{Y}, \mathbf{Z})$. The estimate $T_n(\mathbf{X}, \mathbf{Y})$ for the unconditional measure (z_data = NULL) is given as

$$T_n(\mathbf{X}, \mathbf{Y}) = \binom{n}{2}^{-1} \sum_{i < j} \langle S(\mathbf{Y}_i - \mathbf{Y}_j), S(\mathbf{Y}_{N(i)} - \mathbf{Y}_{N(j)}) \rangle,$$

where $\langle \cdot, \cdot \rangle$ is the dot product, $S(\cdot)$ is the spatial sign function, and $N(i)$ is the index $j$ such that $\mathbf{X}_j$ is the nearest neighbor of $\mathbf{X}_i$ according to the Euclidean distance. The estimate $T_n(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$ for the conditional measure is given as

$$T_n(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z}) = \frac{T_n((\mathbf{X}, \mathbf{Z}), \mathbf{Y}) - T_n(\mathbf{Z}, \mathbf{Y})}{1 - T_n(\mathbf{Z}, \mathbf{Y})}.$$

See the paper Shih and Chen (2026) for more details.
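The two formulas above can be illustrated with a brute-force sketch in base R. This is not the package's implementation: the helper names (spatial_sign, nn_index, tn_uncond, tn_cond) are hypothetical, the spatial sign is assumed to be the vector divided by its Euclidean norm (and zero at the origin), and ties among nearest neighbors are broken by taking the first index, which may differ from what mram does.

```r
# Spatial sign: v / ||v||, with the zero vector mapped to zero (assumed convention).
spatial_sign <- function(v) {
  nrm <- sqrt(sum(v^2))
  if (nrm == 0) v * 0 else v / nrm
}

# N(i): index of the Euclidean nearest neighbor of row i, excluding i itself.
# Ties are broken by the first index (an assumption of this sketch).
nn_index <- function(x) {
  d <- as.matrix(dist(as.matrix(x)))
  diag(d) <- Inf
  unname(apply(d, 1, which.min))
}

# Unconditional estimate T_n(X, Y): average over pairs i < j of the dot product
# between S(Y_i - Y_j) and S(Y_N(i) - Y_N(j)).
tn_uncond <- function(x, y) {
  y <- as.matrix(y)
  n <- nrow(y)
  nn <- nn_index(x)
  total <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      s1 <- spatial_sign(y[i, ] - y[j, ])
      s2 <- spatial_sign(y[nn[i], ] - y[nn[j], ])
      total <- total + sum(s1 * s2)
    }
  }
  total / choose(n, 2)
}

# Conditional estimate T_n(X, Y | Z) built from two unconditional estimates.
tn_cond <- function(x, y, z) {
  t_xz <- tn_uncond(cbind(x, z), y)
  t_z <- tn_uncond(z, y)
  (t_xz - t_z) / (1 - t_z)
}

# Toy check with four univariate points, where the sum can be done by hand.
x_toy <- c(1, 2, 4, 8)
y_toy <- c(1, 2, 4, 8)
tn_uncond(x_toy, y_toy)  # 0.5 for this toy data
```

The double loop costs $O(n^2)$ dot products plus a full distance matrix, so the sketch is only meant for reading the definition on small samples.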

For the $m$-out-of-$n$ bootstrap, the rule (resample size) is set to $m = \lfloor n^\gamma \rfloor$, where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$ and $0 < \gamma < 1$ takes values from the vector g_vec. It is recommended to use T_se_cluster, the standard error estimate based on the cluster rule. See Dette and Kroll (2024) for more details.
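As a small worked illustration of the resample-size rule, the snippet below evaluates $m = \lfloor n^\gamma \rfloor$ for the default g_vec. The sample size n = 100 is chosen only to match the example data; the snippet does not reproduce the bootstrap itself or the cluster rule.

```r
n <- 100                             # sample size used for illustration
g_vec <- seq(0.4, 0.9, by = 0.05)    # default candidate values of gamma
m_vec <- floor(n^g_vec)              # resample sizes m = floor(n^gamma)

# One row per candidate rule: larger gamma gives a larger resample size.
data.frame(gamma = g_vec, m = m_vec)
```

Each entry of m_vec is one candidate resample size, so the default g_vec yields 11 candidate rules.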

The mram function is used by the vs_mram function for variable selection.

Value

T_est

The estimate of the multivariate regression association measure. The value returned by T_est is between $-1$ and $1$. However, it is between $0$ and $1$ asymptotically. A small value indicates that x_data has low predictability for y_data conditional on z_data in the sense of the considered measure. On the other hand, a large value indicates that x_data has high predictability for y_data conditional on z_data. If z_data = NULL, the returned value indicates the unconditional predictability.

T_se_cluster

The standard error estimate based on the cluster rule.

m_vec

The vector of $m$ generated by g_vec.

T_se_vec

The vector of standard error estimates obtained from the $m$-out-of-$n$ bootstrap, where $m$ takes the values in m_vec.

J_cluster

The index of the element of m_vec chosen by the cluster rule.
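The returned components might be inspected as in the sketch below. It assumes that mram returns a list whose elements carry the names listed above and can be accessed with `$`; that detail is not stated explicitly here, so treat it as an assumption. The value B = 200 is chosen only to shorten the run, and the call is guarded so that the snippet does nothing when the package is not installed.

```r
if (requireNamespace("MRAM", quietly = TRUE)) {
  set.seed(1)
  n <- 100
  x_data <- matrix(rnorm(n * 2), n, 2)
  y_data <- cbind(x_data[, 1] + rnorm(n), x_data[, 2] + rnorm(n))

  # Unconditional measure with bootstrap standard errors (fewer replications
  # than the default to keep the example quick).
  fit <- MRAM::mram(y_data, x_data, bootstrap = TRUE, B = 200)

  fit$T_est                  # point estimate of the measure
  fit$T_se_cluster           # standard error based on the cluster rule
  fit$m_vec[fit$J_cluster]   # resample size picked by the cluster rule
}
```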

References

Dette and Kroll (2024) A Simple Bootstrap for Chatterjee’s Rank Correlation, Biometrika, asae045.

Shih and Chen (2026) Measuring multivariate regression association via spatial sign, Computational Statistics & Data Analysis, 215, 108288.

See Also

vs_mram

Examples

library(MRAM)

n = 100

set.seed(1)
x_data = matrix(rnorm(n*2),n,2)
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+x_data[,1]+rnorm(n)
y_data[,2] = x_data[,1]*x_data[,2]-x_data[,1]+rnorm(n)

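# conditional measures: predictability of Y from X1 given X2, then from X2 given X1
mram(y_data,x_data[,1],x_data[,2])
mram(y_data,x_data[,2],x_data[,1])
# unconditional measures (z_data = NULL)
mram(y_data,x_data[,1])
mram(y_data,x_data[,2])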

## Not run: 

# perform the m-out-of-n bootstrap
mram(y_data,x_data[,1],x_data[,2],bootstrap = TRUE)
mram(y_data,x_data[,2],x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,1],bootstrap = TRUE)
mram(y_data,x_data[,2],bootstrap = TRUE)

## End(Not run)

Variable Selection via the Multivariate Regression Association Measure

Description

Select a subset of $\mathbf{X}$ that can be used to predict $\mathbf{Y}$ based on $T_n$.

Usage

vs_mram(y_data, x_data)

Arguments

y_data

An $n \times d$ matrix of responses, where $n$ is the sample size.

x_data

An $n \times p$ matrix of predictors.

Details

vs_mram performs forward stepwise variable selection based on the multivariate regression association measure proposed in Shih and Chen (2026). At each step, it selects the predictor with the highest conditional predictability for the response given the previously selected predictors. The algorithm is adapted from the FOCI algorithm of Azadkia and Chatterjee (2021).
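The forward stepwise idea described above can be sketched generically in R. The helper forward_select below is hypothetical and is not the package's code: it accepts any scoring function score(y, x_new, z) and greedily adds the best remaining column. The stopping rule (stop once no candidate has a positive score) is borrowed from FOCI and is an assumption here, as is the use of mram(...)$T_est as the score in the guarded usage example.

```r
# Hypothetical helper: greedy forward selection driven by a user-supplied score.
# score(y, x_new, z) should return one number measuring how well the column
# x_new predicts y given the already selected columns z (NULL at the first step).
forward_select <- function(y, x, score) {
  x <- as.matrix(x)
  remaining <- seq_len(ncol(x))
  selected <- integer(0)
  while (length(remaining) > 0) {
    z <- if (length(selected) > 0) x[, selected, drop = FALSE] else NULL
    scores <- vapply(
      remaining,
      function(k) score(y, x[, k, drop = FALSE], z),
      numeric(1)
    )
    # Assumed FOCI-style stopping rule: stop when nothing adds predictability.
    if (max(scores) <= 0) break
    best <- remaining[which.max(scores)]
    selected <- c(selected, best)
    remaining <- setdiff(remaining, best)
  }
  selected
}

# Usage with the package's estimator as the score, run only if MRAM is installed.
# Accessing the estimate as $T_est is an assumption based on the Value section of mram.
if (requireNamespace("MRAM", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  p <- 10
  x_data <- matrix(rnorm(p * n), n, p)
  y_data <- x_data[, 1] * x_data[, 2] + x_data[, 1] - x_data[, 3] + rnorm(n)
  mram_score <- function(y, x_new, z) MRAM::mram(y, x_new, z)$T_est
  forward_select(y_data, x_data, mram_score)
}
```

Passing the score as a function keeps the selection loop separate from the estimator, so the same loop can be tried with other conditional dependence measures. The result need not match vs_mram exactly, since the package's stopping rule and tie handling may differ.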

Value

The vector containing the indices of the selected predictors in the order they were chosen.

References

Azadkia and Chatterjee (2021) A simple measure of conditional dependence, Annals of Statistics, 49(6): 3070-3102.

Shih and Chen (2026) Measuring multivariate regression association via spatial sign, Computational Statistics & Data Analysis, 215, 108288.

See Also

mram

Examples

library(MRAM)

n = 200
p = 10

set.seed(1)
x_data = matrix(rnorm(p*n),n,p)
colnames(x_data) = paste0("X",seq_len(p))

y_data = x_data[,1]*x_data[,2]+x_data[,1]-x_data[,3]+rnorm(n)
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables

## Not run: 

n = 500
p = 10

set.seed(1)
x_data = matrix(rnorm(p*n),n,p)
colnames(x_data) = paste0("X",seq_len(p))

# Linear
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+x_data[,1]-x_data[,3]+rnorm(n)
y_data[,2] = x_data[,2]*x_data[,4]+x_data[,2]-x_data[,5]+rnorm(n)
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables

# Nonlinear
y_data = matrix(0,n,2)
y_data[,1] = x_data[,1]*x_data[,2]+sin(x_data[,1]*x_data[,3])+0.3*rnorm(n)
y_data[,2] = cos(x_data[,2]*x_data[,4])+x_data[,3]-x_data[,4]+0.3*rnorm(n)
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables

# Non-additive error
y_data = matrix(0,n,2)
y_data[,1] = abs(x_data[,1]+runif(n))^(sin(x_data[,2])-cos(x_data[,3]))
y_data[,2] = abs(x_data[,2]-runif(n))^(sin(x_data[,3])-cos(x_data[,4]))
colnames(x_data)[vs_mram(y_data,x_data)] # selected variables

## End(Not run)