Abstract
New genetic techniques have massively increased the volume of data across the biological sciences, including data on the biology of organisms in the natural environment. Typically, the activity levels of thousands of genes are measured in multiple replicates across multiple experiments. The problem is to turn this huge volume of data into biological understanding. For organisms that are relevant to understanding the natural environment, this problem is compounded because they are not as well resourced as typical model organisms such as the fruit fly or the laboratory mouse, for which well-resourced research communities have generated a detailed understanding of the function of thousands of genes. If we want to study the genetics of organisms in the natural environment, we need to summarise the sheer volume of data in a biologically meaningful way, and we need to relate our organisms of environmental interest to their better-studied model-organism relatives. Within the last few years, the tools to do these two things have become available, and we are now in a position to wrap these tools into a pipeline that will allow us to rapidly analyse the large amounts of data generated by three NERC-funded projects. Combining the data analyses from these separate projects is an efficient use of resources that will lead to additional peer-reviewed papers from these projects. The methods that we develop will also help other researchers within the NERC community.