Inference for the Coefficient Parameters of the Logistic Regression from a Small Sample
Toshinari Kamakura
Industrial & Systems Engineering, Chuo University, Tokyo, Japan

The logistic model is widely used in medical and industrial applications to evaluate risk factors, adjusted for confounding factors, from event occurrence data and covariates. According to Ryan (2000), more than two thousand articles on logistic regression were published in 1999, yet far fewer than 1% of them appeared in statistics journals, even though many research issues remain unsolved.

We investigate the Wald test of a regression parameter in the small-sample case. Hauck and Donner (1977) pointed out that the behavior of the Wald test and that of the likelihood ratio test do not coincide. Furthermore, when the event probability is high or low, separation (monotone likelihood) is known to occur when fitting a logistic model, and the Wald test sometimes gives very conservative results. In our simulation studies the likelihood ratio test is also conservative in this case. Separation primarily occurs in small samples with several unbalanced and highly predictive risk factors (Albert and Anderson, 1984; Heinze and Schemper, 2002). We examine separation and quasi-separation in a dataset, calculate the probability of separation or quasi-separation as a function of the event occurrence probability, and study the properties of the p-values of the tests.
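To illustrate how the probability of separation depends on the event occurrence probability, here is a minimal sketch in Python. It rests on assumptions that are not taken from the abstract: a single binary covariate, two groups of equal size, and the same event probability p in both groups. In that setting no finite maximum likelihood estimate exists whenever the 2x2 table of covariate by outcome has an empty cell (separation, quasi-separation, or a sample with all events or no events). The function names are hypothetical.

```python
import random


def has_zero_cell(x, y):
    """Return True if the 2x2 table of binary x by binary y has an empty cell."""
    cells = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for xi, yi in zip(x, y):
        cells[(xi, yi)] += 1
    return min(cells.values()) == 0


def separation_probability(n_per_group, p):
    """Exact P(at least one empty cell) for two groups sharing event probability p.

    Assumption: balanced groups and independent Bernoulli(p) outcomes.
    """
    # Probability that one group contains both events and non-events.
    p_mixed = 1.0 - p ** n_per_group - (1.0 - p) ** n_per_group
    # Both groups must be mixed for every cell to be non-empty.
    return 1.0 - p_mixed ** 2


def simulate_separation(n_per_group, p, n_rep=20000, seed=0):
    """Monte Carlo estimate of the same probability, as a cross-check."""
    rng = random.Random(seed)
    x = [0] * n_per_group + [1] * n_per_group
    hits = 0
    for _ in range(n_rep):
        y = [1 if rng.random() < p else 0 for _ in x]
        hits += has_zero_cell(x, y)
    return hits / n_rep


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9, 0.95):
        exact = separation_probability(10, p)
        approx = simulate_separation(10, p)
        print(f"p={p:.2f}  exact={exact:.4f}  simulated={approx:.4f}")
```

Under these assumptions the probability grows quickly as p moves away from 0.5, which is consistent with the observation above that separation is mainly a problem for high or low event probabilities in small samples.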

We propose a new method based on bootstrap resampling and compare the true p-values of the likelihood ratio test, the Wald test, and other testing methods (Ohkura and Kamakura, 2011). Simulation studies show that the new method maintains nominal p-values even for high or low event probabilities. The Firth (1993) method has the desirable property of reducing the bias of the maximum likelihood estimates, and it yields stable estimates even in nearly quasi-separated cases. Combining the bootstrap method with the Firth method will give good performance even for moderately high event occurrence probabilities.
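The abstract does not spell out the bootstrap procedure, so the following is only a generic parametric bootstrap sketch in Python, not the method of Ohkura and Kamakura (2011). It assumes a single binary covariate, for which the logistic model with intercept and slope reduces to two group proportions and the likelihood ratio statistic has a closed form that stays finite even under separation. All function names are hypothetical.

```python
import math
import random


def lr_statistic(x, y):
    """Likelihood ratio statistic for H0: slope = 0 with a binary covariate x."""

    def loglik(events, n):
        # Binomial log-likelihood at the MLE events/n, taking 0*log(0) as 0.
        ll = 0.0
        if events > 0:
            ll += events * math.log(events / n)
        if events < n:
            ll += (n - events) * math.log(1.0 - events / n)
        return ll

    n1 = sum(x)
    n0 = len(x) - n1
    e1 = sum(yi for xi, yi in zip(x, y) if xi == 1)
    e0 = sum(y) - e1
    full = loglik(e0, n0) + loglik(e1, n1)  # separate proportion per group
    null = loglik(e0 + e1, n0 + n1)         # one pooled proportion
    return max(0.0, 2.0 * (full - null))


def bootstrap_lr_pvalue(x, y, n_boot=2000, seed=0):
    """Parametric bootstrap p-value: resample y under the fitted null model."""
    rng = random.Random(seed)
    observed = lr_statistic(x, y)
    p_null = sum(y) / len(y)  # pooled event probability under H0
    exceed = 0
    for _ in range(n_boot):
        y_star = [1 if rng.random() < p_null else 0 for _ in x]
        # Small tolerance so ties are counted despite rounding error.
        if lr_statistic(x, y_star) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    x = [0] * 10 + [1] * 10
    y = [0] * 9 + [1] + [1] * 8 + [0] * 2
    print("bootstrap p-value:", bootstrap_lr_pvalue(x, y))
```

The reference distribution here is simulated at the observed sample size and estimated event probability instead of being taken from the asymptotic chi-square approximation, which is the general motivation for bootstrap calibration in small samples. A Firth-type penalized fit could replace the plain maximum likelihood fit inside the loop, but that combination is not shown.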


[1] Albert, A. and Anderson, J. A. (1984). On the existence of maximum likelihood estimates in logistic regression models, Biometrika, 71, 1-10.

[2] Firth, D. (1993). Bias reduction of maximum likelihood estimates, Biometrika, 80, 27-38.

[3] Hauck, Jr., W. W. and Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis, JASA, 72, 851-853.

[4] Ohkura, M. and Kamakura, T. (2011). Test for a regression parameter in a logistic regression model under the small sample size and the high event occurrence probability, Japanese Applied Statistics (in Japanese), to appear.

[5] Ryan, T. P. (2000). Some issues in logistic regression, 29 (9&10), 2019-2032.

[6] Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression, Statist. Med., 21, 2409-2419.

Keywords: logistic regression; Wald test; bootstrap method; small sample

Biography: I am a professor at Chuo University in Japan. My research interests include survival analysis in the medical and engineering fields, statistical risk analysis, functional data analysis of human behavior, and data mining of massive datasets in sports science. Recently I have also been studying statistical production process control for highly reliable systems.