Factor Augmented Sparse Throughput Deep ReLU Neural Networks
The software implements factor augmented sparse throughput deep ReLU neural networks, which select important variables in neural networks with or without factor structure in the high-dimensional inputs. It encompasses both nonparametric sparse regression and sparse linear models with or without model structure. It also includes the nonparametric factor regression model and the principal component regression model as special cases.
Fan, J. and Gu, Y. (2022).
Factor Augmented Sparse Throughput Deep ReLU Neural Networks for High Dimensional Regression.
Manuscript.
Iteratively Projected SVD for Tensor Factor Analysis
The software computes low-rank tensor decomposition with auxiliary covariates. It iteratively projects tensor data onto the linear space spanned by the basis functions of covariates and applies SVD on matricized tensors over each mode.
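The projection-then-SVD step described above can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the software's own code: it performs a single pass on one mode, and the names `unfold`, `projected_svd_loading`, `Phi` and the chosen sizes are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along `mode`: rows index that mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def projected_svd_loading(T, Phi, mode, rank):
    """Project the mode unfolding onto span(Phi), then keep the top left singular vectors."""
    # Orthogonal projector onto the column space of the basis matrix Phi
    P = Phi @ np.linalg.pinv(Phi)
    M = P @ unfold(T, mode)
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank]

# Synthetic example (all quantities are assumptions made for illustration)
rng = np.random.default_rng(0)
d1, d2, d3, r = 20, 15, 10, 2
Phi = rng.standard_normal((d1, 4))        # hypothetical covariate basis for mode 0
A = Phi @ rng.standard_normal((4, r))     # loadings lying in span(Phi)
core = rng.standard_normal((r, d2 * d3))
T = (A @ core).reshape(d1, d2, d3) + 0.01 * rng.standard_normal((d1, d2, d3))
A_hat = projected_svd_loading(T, Phi, mode=0, rank=r)
```

A full procedure would repeat this for every mode and iterate, projecting the data onto the current loading estimates of the other modes before each SVD.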
Semiparametric Tensor Factor Analysis by Iteratively Projected SVD
Estimating the number of factors by adjusted eigenvalues thresholding
Under some conditions, the number of factors equals the number of eigenvalues of the population correlation matrix that exceed 1.
In practice, it is estimated by the number of bias-corrected eigenvalues of the sample correlation matrix that exceed 1 + C sqrt(p/(n-1)), where p is the dimensionality, n is the sample size, and C is a tuning parameter. The default is C = 1.
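The thresholding rule can be illustrated with a short Python sketch. It is a simplification and not the software's implementation: it counts the raw eigenvalues of the sample correlation matrix and omits the bias-correction step, whose details are given in the paper. The function name and the synthetic data are assumptions.

```python
import numpy as np

def count_factors_threshold(X, C=1.0):
    """Count sample-correlation eigenvalues above 1 + C*sqrt(p/(n-1)).

    Simplification: raw sample eigenvalues are used, without bias correction.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)
    threshold = 1.0 + C * np.sqrt(p / (n - 1))
    return int(np.sum(eigvals > threshold))

# Synthetic data with three strong, mutually orthogonal loading vectors
rng = np.random.default_rng(1)
n, p, K = 500, 48, 3
j = np.arange(p)
B = 2.0 * np.column_stack([np.ones(p), (-1.0) ** j, (-1.0) ** (j // 2)])
F = rng.standard_normal((n, K))
X = F @ B.T + rng.standard_normal((n, p))
k_hat = count_factors_threshold(X)
```

With factors this strong the planted eigenvalues sit far above the threshold, so the uncorrected count is enough here; the bias correction matters when the factor eigenvalues are closer to the threshold.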
Fan, J., Guo, J., and Zheng, S. (2022).
Estimating number of factors by adjusted eigenvalues thresholding.
Journal of American Statistical Association, 117, 852-861.
FarmTest: Factor Adjusted Robust Multiple Testing
Performs robust multiple testing for means in the presence of known and unknown latent factors. It implements a robust procedure that estimates distribution parameters using Huber's loss function and accounts for strong dependence among coordinates via an approximate factor model.
Main functions:
farm.test(X,...):
one-sample multiple tests;
farm.test(X,Y,...):
two-sample multiple tests.
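As an illustration of the Huber-loss ingredient mentioned above, the following Python sketch computes a Huber estimate of a mean by iteratively reweighted averaging. It is not the package's code: the function name is hypothetical, and the threshold `tau` is fixed by hand here, whereas a practical procedure would calibrate it from the data.

```python
def huber_mean(x, tau, tol=1e-10, max_iter=500):
    """Location estimate minimizing the sum of Huber losses with threshold tau."""
    mu = sorted(x)[len(x) // 2]  # start from the (upper) median
    for _ in range(max_iter):
        # Weight 1 for residuals within tau, tau/|residual| beyond it
        weights = [1.0 if abs(v - mu) <= tau else tau / abs(v - mu) for v in x]
        new_mu = sum(w * v for w, v in zip(weights, x)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

data = [0.1, -0.3, 0.2, 0.0, -0.1, 0.4, -0.2, 50.0]  # one gross outlier
robust = huber_mean(data, tau=1.0)
plain = sum(data) / len(data)
```

The sample mean is dragged toward the outlier, while the Huber estimate caps the influence of any single observation at `tau`.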
Reference:
- Bose, K., Fan, J., Ke, Y., Pan, X. and Zhou, W.-X. (2020).
FarmTest: An R Package for Factor-Adjusted Robust Multiple Testing.
The R Journal, 12, 388-401.
- Fan, J., Ke, Y., Sun, Q., and Zhou, W.-X. (2019).
FarmTest: Factor-adjusted robust multiple testing with false discovery control.
Journal of American Statistical Association, 114, 1880-1893.
- Zhou, W.-X., Bose, K., Fan, J. and Liu, H. (2018).
A new perspective on robust M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing.
Annals of Statistics, 46, 1904-1931.
Manuscript
FarmSelect: Factor Adjusted Robust Model Selection
Implements a consistent model selection strategy for high dimensional sparse regression when the covariate dependence can be reduced through factor models. By separating the latent factors from idiosyncratic components, the problem is transformed from model selection with highly correlated covariates to that with weakly correlated variables.
Usage: farm.res(X, K.factors = NULL, robust = FALSE)
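The decorrelation idea can be sketched in Python as follows. This is an assumption-laden illustration rather than the package's implementation: it uses plain PCA with a known number of factors, and the names `decorrelate` and `mean_abs_offdiag_corr` are hypothetical.

```python
import numpy as np

def decorrelate(X, K):
    """Split centered X into a K-factor part and idiosyncratic residuals via PCA."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F_hat = U[:, :K] * s[:K]      # estimated factors (n x K), up to rotation and scale
    B_hat = Vt[:K].T              # estimated loadings (p x K)
    U_hat = Xc - F_hat @ B_hat.T  # estimated idiosyncratic components
    return F_hat, U_hat

def mean_abs_offdiag_corr(M):
    """Average absolute pairwise correlation between columns of M."""
    R = np.corrcoef(M, rowvar=False)
    return np.abs(R[~np.eye(R.shape[0], dtype=bool)]).mean()

# Synthetic covariates driven by two latent factors
rng = np.random.default_rng(2)
n, p, K = 200, 30, 2
F = rng.standard_normal((n, K))
B = 2.0 * rng.standard_normal((p, K))
X = F @ B.T + rng.standard_normal((n, p))
F_hat, U_hat = decorrelate(X, K)
```

Model selection would then be run on the weakly correlated columns of `U_hat` together with `F_hat`, instead of on the highly correlated columns of `X`.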
Reference:
- Fan, J., Ke, Y., Wang, K. (2017).
Decorrelation of Covariates for High Dimensional Sparse Regression
Manuscript.
Matlab codes for Adaptive Huber estimation
These are the MATLAB codes used for the simulations and real data analysis in the paper below. They compute robust mean regression for high-dimensional feature spaces with variable selection.
Reference:
- Fan, J., Li, Q., and Wang, Y. (2017).
Estimation of high-dimensional mean regression in absence of symmetry and light-tail assumptions. Journal of Royal Statistical Society B, 79, 247-265.
pfa: an R package for "Estimates False Discovery Proportion Under Arbitrary Covariance Dependence"
by Jianqing Fan, Tracy Ke, Sydney Li and Lucy Xia
This package contains functions for performing multiple testing and estimating the false discovery proportion (FDP) under dependence.
Main functions:
pfa.test(X,...):
one-sample multiple tests;
pfa.test(X,Y,...):
two-sample multiple tests;
pfa.gwas(X,Y,...):
multiple testing in genome-wide association studies (GWAS).
See Manual
Reference:
- Fan, J., Feng, Y. and Song, R. (2011)
Nonparametric independence screening in sparse ultra-high dimensional additive models.
Journal of American Statistical Association, 106, 544-557.
POET: an R package for estimating large covariance matrices by thresholding principal orthogonal complements.
by Fan, J., Liao, Y., and Mincheva, M. (2012)
Main function: POET performs PCA, estimates the factor loadings and realized factors, estimates the sparse residual covariance matrix by adaptive thresholding, and combines these into an estimate of the covariance matrix; See Manual
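The two-part structure of the estimator (principal components plus a thresholded residual) can be sketched in Python. This is a simplified illustration, not the package's algorithm: it applies a single constant soft threshold to the off-diagonal residual entries instead of adaptive entry-wise thresholds, and the function name and arguments are hypothetical.

```python
import numpy as np

def poet_sketch(X, K, thresh):
    """Top-K principal-component part plus soft-thresholded residual covariance."""
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    S = Xc.T @ Xc / (n - 1)                       # sample covariance
    vals, vecs = np.linalg.eigh(S)                # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]        # reorder to descending
    low_rank = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T
    resid = S - low_rank                          # principal orthogonal complement
    # Soft-threshold the off-diagonal entries; keep the diagonal untouched
    sparse = np.sign(resid) * np.maximum(np.abs(resid) - thresh, 0.0)
    np.fill_diagonal(sparse, np.diag(resid))
    return low_rank + sparse, sparse

rng = np.random.default_rng(4)
n, p, K = 300, 20, 2
X = rng.standard_normal((n, K)) @ rng.standard_normal((K, p)) + rng.standard_normal((n, p))
Sigma_hat, resid_hat = poet_sketch(X, K, thresh=0.1)
```

The diagonal of the estimate always matches the sample variances; only the off-diagonal residual entries are shrunk toward zero.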
Reference:
- Fan, J., Liao, Y. and Mincheva, M. (2013).
Large Covariance Estimation by Thresholding Principal Orthogonal Complements. (with discussion)
Journal of Royal Statistical Society B, to appear.
SIS: an R package for (Iterative) Sure Independence Screening for generalized linear models and Cox's proportional hazards models.
by Fan, J., Feng, Y., Samworth, R. J. and Wu, Y. (2010)
Main function: SIS performs variable selection using iterative two-scale methods (large-scale screening followed by moderate-scale selection). It automatically calls the function GLMvanISISscad and its variant GLMvarISISscad for generalized linear models, and the function COXvanISISscad and its variant COXvarISISscad for Cox's proportional hazards models. Many other functions are available and can be called directly or by SIS using non-default options. Examples are scadglm (a one-step method) and fullscadglm, and scadcox (a one-step method) and fullscadcox; See Manual.
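The large-scale screening step can be illustrated for a linear model with a short Python sketch. It is not the package's code: it ranks predictors by absolute marginal correlation with the response and keeps the top d, with d = n/log(n) used here only as one commonly suggested choice. The function name `sis_screen` and the synthetic data are assumptions.

```python
import numpy as np

def sis_screen(X, y, d):
    """Indices of the d predictors with largest absolute marginal correlation with y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / len(y)   # absolute marginal correlations
    return np.argsort(omega)[::-1][:d]

# Synthetic data: p much larger than n, two truly active predictors
rng = np.random.default_rng(3)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 5] + rng.standard_normal(n)
keep = sis_screen(X, y, d=int(n / np.log(n)))
```

A moderate-scale selection method, such as a SCAD-penalized fit, would then be applied to the columns in `keep`; the iterative variants repeat screening on the residuals.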
Related papers whose procedures can be computed by the package:
- Fan, J., Feng, Y. and Song, R. (2011)
Nonparametric independence screening in sparse ultra-high dimensional additive models.
Journal of American Statistical Association, 106, 544-557.
- Fan, J., Samworth, R. and Wu, Y. (2009)
Ultrahigh dimensional variable selection: beyond the linear model.
Journal of Machine Learning Research, 10, 1829-1853.
- Fan, J. and Lv, J. (2008)
Sure independence screening for ultra-high dimensional feature space.
(with discussion) Journal of Royal Statistical Society B, 70, 849-911.
- Fan, J. and Li, R. (2001)
Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of American Statistical Association, 1348-1360.
- Fan, J. and Niu, Y. (2007)
Selection and validation of normalization methods for c-DNA microarrays using within-array replications.
Bioinformatics, 23, 2391-2398.
- C-Code for bandwidth selection of the conditional mean and variance functions
To compile, type "cc autovar.c -lm", then run a.out and follow the instructions on the screen.
- Fan, J. and Yao, Q. (1998)
Efficient estimation of conditional variance functions in stochastic regression.
Biometrika, 85, 645-660.
- Splus-Code for computing the Adaptive Neyman statistic: aneyman.s
- Splus-Code for computing the Hanova statistic: hanava.s
- C-Code for computing the P-value of the adaptive Neyman statistic.
To compile, type: cc neyman.c -lm. To run, type a.out and then follow the instructions on the screen.