Comparative Metagenomics for Measuring Biodiversity. Application to Ocean Life Studies

Information

Funding country: France
Acronym: HydroGen
URL: -
Start date: 10/1/2014
End date: -
Budget: 399,246 EUR

Funding

Name	Role	Start	End	Amount
AAPG - Generic call for proposals [Appel à projets générique] 2014	Grant	10/1/2014	-	399,246 EUR

Abstract

The HydroGen project aims to design new statistical and computational tools to measure and analyze biodiversity through comparative metagenomic approaches. The supporting application is the study of ocean biodiversity, based on the analysis of seawater samples collected by the Tara Oceans expedition. Comparative metagenomics is a new field that aims to provide high-level information from DNA material extracted and sequenced from different environments. The goal is not to taxonomically identify the living organisms present in each environment, but mainly to estimate the proximity between two or more environmental sites at the genomic level. One way to estimate similarity is to count the number of similar DNA fragments. Sequencing a single environment generates a dataset of 10^8 to 10^9 short DNA sequences (called reads), typically 100 to 150 base pairs long. From a computational point of view, the problem is thus to compute the intersections between datasets of reads. The traditional way to evaluate this similarity is to compute a score attached to an alignment between two reads. The main drawback of this technique is that the number of alignments to compute is excessive (10^16 to 10^18 between two samples). Furthermore, if several hundred metagenomic samples are involved, this approach is not achievable with current alignment techniques. The main challenge of the HydroGen project is to propose alternative methodologies to efficiently compare such volumes of metagenomic samples. The methodologies will be validated, and the algorithmic and statistical tools developed during the project will be scaled up, on environmental questions linked to the study of ocean biodiversity. The Tara Oceans expedition has collected hundreds of seawater samples that are currently being sequenced. Hundreds of metagenomic datasets are thus available to the scientific community. This mass of data will be used as the primary material of the HydroGen project. The project gathers three research teams with complementary expertise in algorithmics, statistics, and genomics: INRIA-GenScale, INRA (MIG+AgroParisTech) and CEA-CNS-LABIS.
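
To make the scale of the problem concrete, the following minimal Python sketch first reproduces the alignment count quoted in the abstract, then illustrates one possible alignment-free way to estimate how many reads two samples share. The k-mer based comparison is an assumption chosen for illustration only; the abstract does not state which alternative methodology the project adopts. All function names and parameters (k, min_shared) are hypothetical.

```python
def pairwise_alignment_count(n_reads_a, n_reads_b):
    """Number of read-versus-read alignments needed to compare two samples exhaustively."""
    return n_reads_a * n_reads_b


def kmers(read, k):
    """Return the set of all length-k substrings (k-mers) of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def index_kmers(reads, k):
    """Collect every k-mer that occurs in at least one read of a sample."""
    index = set()
    for read in reads:
        index |= kmers(read, k)
    return index


def shared_read_fraction(sample_a, sample_b, k=8, min_shared=2):
    """Fraction of reads in sample_a sharing at least min_shared distinct k-mers
    with sample_b (illustrative similarity proxy, not the project's method)."""
    if not sample_a:
        return 0.0
    index_b = index_kmers(sample_b, k)
    hits = sum(1 for read in sample_a
               if len(kmers(read, k) & index_b) >= min_shared)
    return hits / len(sample_a)


if __name__ == "__main__":
    # 10^8 reads against 10^8 reads requires 10^16 alignments.
    print(pairwise_alignment_count(10**8, 10**8))

    # Toy samples: real reads are 100 to 150 base pairs long.
    sample_a = ["ACGTACGTAC", "TTTTTTTTTT"]
    sample_b = ["GTACGTACGG"]
    print(shared_read_fraction(sample_a, sample_b, k=4, min_shared=2))
```

In this sketch, each read of the first sample is checked once against a k-mer index of the second sample, so the work grows with the total number of reads rather than with the product of the two dataset sizes. The measure is asymmetric: swapping the two samples can give a different fraction.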
