Individual bacteria from the same population often carry different genes. Existing approaches to modelling this genomic diversity are based on the total number of genes (the pangenome) and the gene frequency spectrum (the supragenome). We extend these ideas by taking the sample genealogy into account. In population genetics, the well-known Infinitely Many Sites Model is used to describe variation in a DNA sequence. Analogously, we assume that new genes are taken up from the environment and may be lost again along the ancestral lines. The resulting Infinitely Many Genes Model fits data from cyanobacteria well and provides statistical tests of neutrality.
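The gain-and-loss mechanism along a genealogy can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the implementation behind the results: the genealogy is drawn from Kingman's coalescent, each lineage gains new genes at rate theta/2 and loses each gene it carries at rate rho/2 (time in coalescent units), and the root starts at the stationary gene count. The parameter names `theta` and `rho`, the rate scaling, and all function names are hypothetical choices made for this illustration.

```python
import itertools
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_coalescent(n, rng):
    """Kingman coalescent for n samples; the last node in the list is the root."""
    nodes = [{"time": 0.0, "children": []} for _ in range(n)]
    active = list(range(n))
    t = 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)  # each pair merges at rate 1
        a, b = rng.sample(active, 2)
        nodes.append({"time": t, "children": [a, b]})
        active = [i for i in active if i not in (a, b)] + [len(nodes) - 1]
    return nodes


def simulate_genomes(n, theta, rho, seed=None):
    """Return n gene sets; assumed gain rate theta/2, per-gene loss rate rho/2."""
    rng = random.Random(seed)
    nodes = simulate_coalescent(n, rng)
    new_gene = itertools.count()  # every gained gene gets a fresh label
    root = len(nodes) - 1
    # Assumption: the root genome is at stationarity, Poisson(theta / rho).
    genomes = {root: {next(new_gene) for _ in range(poisson(rng, theta / rho))}}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in nodes[parent]["children"]:
            t = nodes[parent]["time"] - nodes[child]["time"]
            keep = math.exp(-rho * t / 2)  # chance a gene survives the branch
            inherited = {g for g in genomes[parent] if rng.random() < keep}
            # Genes gained on the branch and still present at its lower end.
            n_gained = poisson(rng, theta / rho * (1 - keep))
            genomes[child] = inherited | {next(new_gene) for _ in range(n_gained)}
            stack.append(child)
    return [genomes[i] for i in range(n)]


def gene_frequency_spectrum(genomes):
    """Map k to the number of genes carried by exactly k sampled individuals."""
    carriers = {}
    for genome in genomes:
        for gene in genome:
            carriers[gene] = carriers.get(gene, 0) + 1
    spectrum = {}
    for k in carriers.values():
        spectrum[k] = spectrum.get(k, 0) + 1
    return spectrum
```

Calling `simulate_genomes(10, theta=20.0, rho=2.0, seed=1)` yields ten gene sets. The size of their union corresponds to the pangenome size of the sample, and `gene_frequency_spectrum` gives the corresponding gene frequency spectrum.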
Keywords: Population genetics; Pangenomes; Infinitely many genes model; Kingman's Coalescent
Biography: Franz Baumdicker studied mathematics at the University of Freiburg in Germany. He is now a PhD student in the group of Prof. Pfaffelhuber at the Department of Stochastics in Freiburg. His main interests are probabilistic models and questions arising in biology.