The Optimal Cut-Off Point for Predictive Classification of Ungrouped Data with Binary Logistic Regression Model
Supol Durongwatana
Department of Statistics, Chulalongkorn University, Bangkok, Thailand

The empirical average value of the cut-off point [1] can be interpolated from an estimated multiple linear regression model fitted with the average cut-off point as the dependent variable. The four independent variables are the sample size, the number of independent variables, the failure rate of the data, and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. All observations are generated by the Monte Carlo technique, with the simulated data designed around these four factors. For each simulated situation, the optimal cut-off point is located using Hadjicostas's theory [1]; this cut-off point yields the maximum proportion of correct classifications. The average of the located cut-off points is then computed, and finally the multiple linear regression model is fitted to the data from all simulated situations. The results are as follows:
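The core step above, locating the cut-off point that maximizes the proportion of correct classifications for a fitted binary logistic regression, can be sketched as follows. This is an illustrative grid search on simulated data, not the author's implementation; the data-generating coefficients and grid resolution are assumptions.

```python
# Sketch: locate the cut-off that maximizes the proportion of correct
# classifications, in the spirit of Hadjicostas (2006). All simulated
# data and parameter values below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate ungrouped data: n observations, k continuous predictors.
n, k = 500, 3
X = rng.standard_normal((n, k))
beta = np.array([0.8, -0.5, 0.3])       # assumed true coefficients
p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logistic success probabilities
y = rng.binomial(1, p)

# Fit the binary logistic regression model.
model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]

# Grid-search the cut-off c that maximizes the proportion of
# correct classifications.
cutoffs = np.linspace(0.01, 0.99, 99)
correct = [np.mean((probs >= c).astype(int) == y) for c in cutoffs]
c_opt = cutoffs[int(np.argmax(correct))]
print(f"optimal cut-off: {c_opt:.2f}, "
      f"proportion correct: {max(correct):.3f}")
```

In a simulation study, this search would be repeated across many generated data sets for each design cell, and the located cut-offs averaged, before the meta-level regression is fitted.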

When the number of independent variables increases, the average cut-off point decreases, holding the other factors constant. However, when the sample size, the failure rate of the data, and the KMO measure of sampling adequacy change, the average cut-off point changes and converges to a neighborhood of 0.5, provided the sample size is sufficiently large.

When the sample size increases, the average cut-off point also decreases, holding the other factors constant. When the number of independent variables, the failure rate of the data, and the KMO measure of sampling adequacy change, the average cut-off point changes and converges to a neighborhood of 0.5, again provided the sample size itself is sufficiently large.

When the failure rate of the data set increases, the average cut-off point decreases, holding the other factors constant. When the number of independent variables, the sample size, and the KMO measure of sampling adequacy change, the average cut-off point changes and converges to a neighborhood of 0.5, as in the first two situations.

When the KMO measure of sampling adequacy increases, the average cut-off point decreases, holding the other factors constant. When the number of independent variables, the failure rate of the data, and the sample size change, the average cut-off point changes and converges to a neighborhood of 0.5 as well.
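The interpolation model underlying these results, a multiple linear regression of the average cut-off point on the four design factors, can be sketched as below. The design points and response values are entirely hypothetical; only the structure (four regressors, ordinary least squares fit, interpolation for a new situation) mirrors the described procedure.

```python
# Sketch: fit the meta-level multiple linear regression of the average
# cut-off point on the four simulation factors. All numbers here are
# illustrative placeholders, not results from the actual study.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design matrix: each row is one simulated situation with
# (sample size, number of predictors, failure rate, KMO measure).
m = 40
design = np.column_stack([
    rng.integers(50, 1000, m).astype(float),  # sample size
    rng.integers(2, 10, m).astype(float),     # number of independent variables
    rng.uniform(0.1, 0.5, m),                 # failure rate of the data
    rng.uniform(0.5, 0.95, m),                # KMO measure
])
# Hypothetical average cut-off points observed in each situation.
avg_cutoff = 0.5 + rng.normal(0.0, 0.05, m)

# Fit the multiple linear regression by ordinary least squares
# (intercept column prepended to the design matrix).
A = np.column_stack([np.ones(m), design])
coef, *_ = np.linalg.lstsq(A, avg_cutoff, rcond=None)

# Interpolate the expected cut-off point for a new situation:
# intercept term, then (sample size, predictors, failure rate, KMO).
new_case = np.array([1.0, 200.0, 4.0, 0.3, 0.8])
print(f"interpolated cut-off: {new_case @ coef:.3f}")
```

The fitted coefficients play the role of the study's interpolation formula: given a new combination of the four factors, the model predicts the average cut-off point without rerunning the Monte Carlo search.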

References:

[1] Hadjicostas, P. (2006). Maximum proportion of correct classifications in binary logistic regression. Journal of Applied Statistics, 33(6), 629-640.

Keywords: Cut-off point; Predictive classification; Maximum proportion of correct classification; Binary logistic regression model

Biography: Education: Ph.D. in Statistics, Oklahoma State University, USA

Working Place: Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn University, Bangkok 10330, Thailand

Academic Interests: Regression Models and Analysis for Business Applications, Statistical Models for Business Applications, and Design of Experiments in Business Applications.