Inference after Variable Selection Using Restricted Permutation Methods
Rui Wang, Stephen W. Lagakos
Biostatistics, Harvard School of Public Health, Boston, MA, United States

Consider the scenario in which a variable selection procedure is applied to covariate-response data to identify a subset of covariates that appear to be related to the response, and we wish to make inferences about parameters in a model for the marginal association between the selected covariates and the response. If an independent data set were available, we could fit the postulated marginal model to it and estimate the parameters of interest using standard inference methods. However, an independent data set is often unavailable in practice, and standard methods applied to the same data set used for variable selection can lead to distorted inferences. We develop exact and approximate testing and interval estimation methods for parameters reflecting the marginal association between the selected covariates and the response variable, based on the same data set used for variable selection. Depending on the specific hypothesis of interest and the assumptions about the joint distribution of the covariates and response, the covariate-response data matrix is transformed to a matrix that can be partitioned into two components that are independent under the null hypothesis. From this, new matrices are formed by permuting the rows of one component while holding the other component fixed, and inferences are based on a permutation distribution formed by a restricted set of the resulting matrices. We provide theoretical justification for the proposed methods, present results to guide their implementation, and use simulations to assess their performance and compare it with that of a sample-splitting approach, which undertakes variable selection on one subset of the observations and inference on the other.
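As an illustration only, the permutation step described above might be sketched as follows for the simplest case, a test of the global null hypothesis that the response is independent of all covariates. The abstract does not specify the selector, the test statistic, or the restriction rule, so everything here is an assumption: the hypothetical selector `select_covariate` picks the single covariate with the largest absolute correlation with the response, the statistic is that absolute correlation, and the restricted set is taken to be the permutations under which the selector returns the same covariate as it did on the observed data. The function names and the Monte Carlo sampling of permutations are likewise illustrative, not the authors' implementation.

```python
import numpy as np


def select_covariate(X, y):
    """Hypothetical selector: index of the covariate with the largest
    absolute sample correlation with the response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return int(np.argmax(np.abs(corr)))


def abs_corr(x, y):
    """Illustrative test statistic: absolute sample correlation."""
    return abs(np.corrcoef(x, y)[0, 1])


def restricted_permutation_pvalue(X, y, n_perm=2000, seed=0):
    """Monte Carlo p-value from a restricted permutation distribution.

    The covariate matrix X is held fixed while the response y is permuted.
    Only permutations that lead the selector to the same covariate as the
    observed data are kept (an assumed form of the restriction).
    """
    rng = np.random.default_rng(seed)
    j_obs = select_covariate(X, y)
    t_obs = abs_corr(X[:, j_obs], y)

    kept = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        if select_covariate(X, y_perm) == j_obs:
            kept.append(abs_corr(X[:, j_obs], y_perm))
    kept = np.asarray(kept)

    # Count the observed arrangement as one member of the reference set.
    p_value = (1 + np.sum(kept >= t_obs)) / (1 + kept.size)
    return j_obs, p_value, kept.size


# Usage on simulated data in which only the first covariate matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = 2.0 * X[:, 0] + rng.normal(size=50)
selected, p_value, n_kept = restricted_permutation_pvalue(X, y)
print(selected, p_value, n_kept)
```

Conditioning on the selection event is what distinguishes this sketch from an ordinary permutation test: the reference distribution contains only arrangements that would have led to the same selected covariate, so the observed statistic is compared with statistics that went through the same selection.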

Keywords: Variable selector; Regression; Covariates

Biography: Dr. Rui Wang is an Instructor of Medicine at Harvard Medical School and a research associate at Harvard School of Public Health. Her research interests include statistical methodology development for design and analysis of clinical trials, and for HIV prevention strategies.