How many viruses are there in a pig: new inferential statistics for metagenomic data

Presenter: James Foster, UI

Abstract:

It is now possible to collect DNA from every DNA-bearing entity in an environmental sample. Clustering and string-processing algorithms analyze millions to billions of short DNA sequences to determine how many different "species" were present in the original sample, and in what abundances. But the data have two confounding problems: the number of sequences is too small (!) for clustering to be completely reliable; and current statistical techniques are purely descriptive, while the sampling power is so weak (!) that descriptions of a sample do not fully reflect the structure of the populations in the original environment. In this talk, I present the results of a study we did to determine how bacteria-eating viruses (bacteriophages) in pig guts respond to antibiotic treatment. This requires a clustering analysis of large shotgun metagenomics datasets and new statistical techniques to interpret those data, and I promise to explain what all that means.
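The abstract does not say which new statistics the talk introduces. As a point of reference only, the classic Chao1 richness estimator (a standard ecology technique, not necessarily the speaker's method) illustrates why a raw count of observed clusters understates how many "species" the original environment holds: clusters seen only once or twice signal that more remain unseen. The sketch below assumes a hypothetical input of one cluster label per sequence read and uses a common bias-corrected form of the estimator.

```python
from collections import Counter


def chao1(cluster_labels):
    """Bias-corrected Chao1 estimate of richness from per-read cluster labels.

    Assumption (hypothetical input format): each element is the label of the
    cluster that one sequence read was assigned to.
    """
    abundances = Counter(cluster_labels)  # number of reads in each cluster
    s_obs = len(abundances)  # clusters actually observed in the sample
    f1 = sum(1 for c in abundances.values() if c == 1)  # singleton clusters
    f2 = sum(1 for c in abundances.values() if c == 2)  # doubleton clusters
    # Add an estimate of unseen clusters, driven by the rare (1- and 2-read) ones.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    reads = ["A"] * 50 + ["B"] * 20 + ["C", "C"] + ["D", "E", "F", "G"]
    print("observed:", len(set(reads)), "estimated:", chao1(reads))
```

Here 7 clusters are observed, but four singletons and one doubleton push the estimate to 10. When the sampling is shallow and rare clusters dominate, the gap between the descriptive count and the inferred richness grows.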