Disclosing Diversity in Populations with Large Number of Subclasses
Kalev Pärna, Mihhail Juhkam
Institute of Mathematical Statistics, University of Tartu, Tartu, Estonia

We consider populations that are divided into a large number of mutually exclusive classes. Typical examples come from biology: fauna with its numerous species, the human population with its genotypes, etc. We may wish to draw a sample that contains at least one object from each class; however, this can be practically impossible when the probabilities of rare classes are very small. Increasing the sample size and identifying the class membership of objects is often costly or time-consuming. Hence, we limit ourselves to discovering the classes that represent a dominating part (e.g. 99 per cent) of the whole population (in such a case the sample is said to have coverage 0.99).

Naturally, the question arises of what sample size is required to ensure a given coverage.

Estimation of the sample coverage was first discussed by Good (1953), who proposed a nonparametric estimator of the sample coverage. Among the numerous papers in this area we mention the work by Mao and Lindsay (2002), who describe a practical genomic application of a Poisson mixture model. The current paper is a continuation of our previous work, Juhkam and Pärna (2008).
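As a small illustration of the nonparametric estimator attributed to Good (1953), the sketch below uses its commonly quoted form, 1 - f1/n, where n is the sample size and f1 is the number of classes observed exactly once. The function name and the toy sample are hypothetical and serve only as an example; this is not the authors' code.

```python
from collections import Counter


def good_coverage_estimate(labels):
    """Estimate sample coverage as 1 - f1/n (form commonly attributed to Good, 1953).

    labels: class label of each sampled object (hypothetical input format).
    f1 is the number of classes seen exactly once; n is the sample size.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("sample must be non-empty")
    class_counts = Counter(labels)
    singletons = sum(1 for count in class_counts.values() if count == 1)
    return 1.0 - singletons / n


# Toy sample (made up): classes "d" and "e" each appear once among 10 objects.
sample = ["a", "a", "b", "c", "c", "c", "d", "a", "b", "e"]
estimate = good_coverage_estimate(sample)
print(estimate)
```

In the toy sample two of the ten objects belong to classes seen only once, so the estimate is 1 - 2/10. Many singletons suggest that a substantial part of the population still lies in undiscovered classes.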

We address the following questions: (1) Is the coverage of a given sample large enough? (2) If not, how many additional objects do we have to draw into the sample in order to achieve the given coverage? (3) What is the average sample size necessary for obtaining the given coverage? We discuss these questions under multinomial and Poisson sampling schemes, assuming various models of class probabilities.
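To make the third question concrete, the following sketch approximates the average required sample size by Monte Carlo simulation under multinomial sampling. It assumes the class probabilities are known, which is not the case in practice, and it uses a made-up geometric-type probability model with 30 classes; all names and parameter values are hypothetical, and the simulation is not the method developed in the paper.

```python
import random
from itertools import accumulate


def sample_size_for_coverage(probs, target, rng):
    """Draw objects one at a time until the discovered classes cover `target`.

    probs: known class probabilities (an assumption made for illustration).
    Returns the number of draws needed in this single simulated run.
    """
    cum_weights = list(accumulate(probs))
    classes = range(len(probs))
    seen = set()
    coverage = 0.0
    draws = 0
    # Stop when the target is reached or every class has been discovered.
    while coverage < target and len(seen) < len(probs):
        k = rng.choices(classes, cum_weights=cum_weights)[0]
        draws += 1
        if k not in seen:
            seen.add(k)
            coverage += probs[k]
    return draws


# Hypothetical model: 30 classes with probabilities proportional to 0.8**i.
weights = [0.8 ** i for i in range(30)]
total = sum(weights)
probs = [w / total for w in weights]

rng = random.Random(2008)  # fixed seed so the run is reproducible
sizes = [sample_size_for_coverage(probs, 0.99, rng) for _ in range(200)]
mean_size = sum(sizes) / len(sizes)
print(f"average sample size for coverage 0.99: {mean_size:.1f}")
```

Replacing the drawing step with independent Poisson counts per class would give a corresponding sketch for the Poisson sampling scheme.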


Good, I. J. (1953) The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika, v. 40, no. 3, 237-264.

Mao, C. X. and Lindsay, B. G. (2002) A Poisson model for the coverage problem with a genomic application. Biometrika, v. 89, no. 3, 669-681.

Juhkam, M. and Pärna, K. (2008) Estimation of the Sample Size Required for Obtaining Given Sample Coverage. Acta et Commentationes Universitatis Tartuensis de Mathematica, 12, 89-99.

Keywords: Sample coverage; Sample size estimation; Species richness; Diversity estimation

Biography: Kalev Pärna is Professor of Probability at the Institute of Mathematical Statistics, University of Tartu, Estonia. He has also served as the President of the Estonian Statistical Society. His research interests include statistical and probabilistic models applied in life sciences, social sciences, business and finance.