Claudio G.E. Borroni, Paolo Radaelli, Mariangela Zenga

CART (Classification and Regression Trees) is a non-parametric, tree-structured recursive partitioning method introduced by Breiman et al. (1984) that predicts a response variable *Y* on the basis of *p* predictors observed on a learning sample of *N* units.

In this paper we consider the case in which the response *Y* is an ordered categorical variable with *k* levels. The aim of the classification tree is thus to predict the level of *Y* on the basis of the vector of the *p* explanatory variables. The tree is grown on a training set of *N* cases for which measurements of both the response and the predictors are available. The derived classification rule is then applied to predict the level of the response for a new unit whose explanatory variables have been observed. Predicting ordinal classes can be thought of as intermediate between classification and regression; however, while classification trees for nominal (unordered) categorical variables and regression trees have been widely studied, the use of decision trees in ordinal regression is largely unexplored. Standard classification algorithms for nominal classes can be applied by ignoring the ordering information in the class attribute, but some information is lost, and this harms the predictive performance of the classification rule because, besides accuracy, the severity of the error should be taken into account.
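The point about error severity can be illustrated with a small sketch. The data, the integer coding of the levels as 1 to 4, and the use of the mean absolute difference between level indices as a severity measure are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical ordinal response with k = 4 levels, coded 1..4 (illustrative data).
y_true = np.array([1, 1, 2, 3, 4, 4])
pred_a = np.array([1, 2, 2, 3, 3, 4])  # both errors fall on an adjacent level
pred_b = np.array([1, 4, 2, 3, 1, 4])  # both errors fall three levels away


def accuracy(y, p):
    """Share of units whose level is predicted exactly."""
    return float(np.mean(y == p))


def mean_abs_level_error(y, p):
    """Average distance, in level indices, between true and predicted level."""
    return float(np.mean(np.abs(y - p)))


for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, accuracy(y_true, pred), mean_abs_level_error(y_true, pred))
```

Both hypothetical rules misclassify the same number of units, so a criterion based only on accuracy cannot tell them apart, while a measure that uses the ordering separates them.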

In this work we propose to use Gini's concentration ratio and its decomposition by subgroups (Dagum, 1997) as a splitting criterion. Some applications and simulations will be given.
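As a rough sketch of the quantities involved, the code below computes Gini's concentration ratio from pairwise absolute differences and separates it, for a candidate binary split, into a within-subgroup part and a gross between-subgroup part, in the spirit of Dagum (1997). Several things here are assumptions and not the authors' stated procedure: the levels of *Y* are scored as the integers 1, ..., *k*, the function names are hypothetical, and the abstract does not say how the components are turned into a splitting rule. Dagum's further separation of the between part into a net contribution and a transvariation term is not computed.

```python
import numpy as np


def pairwise_abs_sum(a, b):
    """Sum of |a_i - b_j| over all ordered pairs (i, j)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a[:, None] - b[None, :]).sum())


def gini_ratio(y):
    """Gini's concentration ratio: mean difference divided by twice the mean."""
    y = np.asarray(y, dtype=float)
    return pairwise_abs_sum(y, y) / (2 * y.size**2 * y.mean())


def split_components(y, go_left):
    """Within and gross-between parts of the Gini ratio for a binary split.

    `go_left` is a boolean mask sending each unit to the left child node.
    The two parts add up to gini_ratio(y).
    """
    y = np.asarray(y, dtype=float)
    go_left = np.asarray(go_left, dtype=bool)
    left, right = y[go_left], y[~go_left]
    denom = 2 * y.size**2 * y.mean()
    within = (pairwise_abs_sum(left, left) + pairwise_abs_sum(right, right)) / denom
    between = 2 * pairwise_abs_sum(left, right) / denom
    return within, between


# Hypothetical node: ordinal levels scored 1..3 and one numeric predictor.
y = np.array([1, 1, 2, 2, 3, 3])
x = np.array([0.1, 0.4, 0.5, 0.9, 1.2, 1.5])

for threshold in (0.45, 1.0):
    within, between = split_components(y, x <= threshold)
    print(threshold, within, between, within + between, gini_ratio(y))
```

One plausible reading is to prefer the candidate split that leaves the smallest within-subgroup part, so that the child nodes are as homogeneous as possible with respect to the ordered response; this selection rule is an assumption of the sketch.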

**References:**

L. Breiman, J. Friedman, R. Olshen and C. Stone (1984), *Classification and Regression Trees*, Wadsworth and Brooks.

C. Dagum (1997), *A New Approach to the Decomposition of the Gini Income Inequality Ratio*, Empirical Economics, vol. 22, n. 4, pp. 515-531.

**Keywords:** Classification; Decision trees; Ordinal categorical variable; Gini

**Biography:** Associate professor in Statistics. Main research interests: Bayesian statistics, reliability analysis, non-parametric inference.