GULLIVER Seminar: E. Aurell (KTH Royal Institute of Technology, Sweden)

10 September 2018, 11:30 to 12:30, Bibliothèque PCT - F3.04

Genome-scale DCA

Direct Coupling Analysis (DCA) has become a powerful tool for finding pairwise dependencies in data. It amounts to inferring the coefficients of an Ising or Potts model and then using the largest inferred coefficients as predictors of the dependencies of interest. A major success has been the prediction of residue-residue contacts in protein structures from tables of similar protein sequences.
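
As a rough illustration of the inference step described above, a Potts model over L positions and a pair score can be written as follows. The symbols (\sigma_i for the state at position i, h_i for fields, J_{ij} for couplings, S_{ij} for the score) are notation assumed here, not taken from the abstract, and the Frobenius norm is only one common scoring choice; the talk may use a different one.

```latex
P(\sigma_1,\dots,\sigma_L)
  = \frac{1}{Z}\exp\Big(\sum_{i} h_i(\sigma_i)
  + \sum_{i<j} J_{ij}(\sigma_i,\sigma_j)\Big),
\qquad
S_{ij} = \Big(\sum_{a,b} J_{ij}(a,b)^2\Big)^{1/2}
```

Pairs (i, j) with the largest S_{ij} are then taken as the predicted dependencies.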

In a larger context, DCA is a way to detect epistasis, meaning that the effect on fitness of one gene (locus) depends on the variants (alleles) present in other genes (loci). Applying DCA to contacts in protein structures thus means detecting epistasis within one locus. I will discuss an application to whole-genome DNA sequences of the human pathogen S. pneumoniae and show some of the predictions obtained between loci that lie far apart on the pneumococcal genome. I will also discuss the new computational challenges that arise when extending DCA to data of this size (about 100,000 loci and about 3,000 samples). Finally, I will discuss how and when theories of population genetics provide a conceptual basis for DCA applied to such data, and why such a perspective is useful.
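
To give a sense of the scale mentioned above, the number of locus pairs can be estimated from the quoted figures. This is a back-of-the-envelope count, taking L = 10^5 as a round value for "about 100,000 loci"; the symbols L and N are assumed notation.

```latex
\binom{L}{2} = \frac{L(L-1)}{2}
  = \frac{10^{5}\times 99\,999}{2}
  \approx 5\times 10^{9}
  \quad\text{pairs, against } N \approx 3\times 10^{3} \text{ samples.}
```

The number of candidate couplings therefore exceeds the number of samples by roughly six orders of magnitude.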
