Metagenomes and Their Utility
Recall that in article one of this series we wrote that there are two main techniques to obtain a microbiome: a targeted approach (e.g., bacteria or fungi) or a metagenome (in which all DNA in a sample is sequenced, not just specific targets like bacteria or fungi). In this column we explore metagenomes and some of their applications to food safety and quality.
We have invited Dr. Nur Hasan of CosmosID, Inc., an expert in the field of microbial metagenomics, to share his deep knowledge of metagenomics. Our format will be an interview style.
Safe food production and preservation is a balancing act between food enzymes and microbes. We will start with some general questions about the microbial world, and then proceed deeper into why and how tools such as metagenomics are advancing our ability to explore this universe. Finally, we will ask Dr. Hasan how he sees all of this applying to food microbiology and safe food production.
Greg Siragusa/Doug Marshall: Thank you for joining us. Dr. Hasan, please give us a brief statement of your background and current position.
Nur Hasan: Thanks for having me. I am a molecular biologist by training. I did my bachelor's and master's in microbiology, an M.B.A. in marketing, and a Ph.D. in molecular biology. My current position is vice president and head of research and development at CosmosID, Inc., where I am leading the effort to develop the world’s largest curated genome databases and ultra-rapid bioinformatics tools to build the most comprehensive, actionable and user-friendly metagenomic analysis platform for both pathogen detection and microbiome characterization.
Siragusa/Marshall: The slogan for CosmosID is “Exploring the Universe of Microbes”. What is your estimate of the numbers of bacterial genera and species that have not yet been cultured in the lab?
Hasan: Estimating the number of uncultured bacteria on earth is an ongoing challenge in biology. The widely accepted notion is that more than 99% of bacteria from environmental samples remain ‘unculturable’ in the laboratory; however, with improvements in media design, adjustment of nutrient compositions and optimization of growth conditions based on the ecosystem these bacteria naturally inhabit, scientists are now able to grow more bacteria in the lab than we anticipated. Yet our understanding of culturable species diversity across diverse ecosystems on earth is very scant. With more investigators using metagenomics tools, many ecosystems are being repeatedly sampled, with ever more microbial diversity revealed. Other ecosystems remain ignored, so we only have a skewed understanding of species diversity and what portion of such diversity is actually culturable. A report from Schloss & Handelsman highlighted the limitations of sampling and the fact that it is not possible to estimate the total number of bacterial species on Earth.1 Despite the limitation, they took a stab at the question and predicted minimum bacterial species richness to be 35,498. A more recent report by Hugenholtz estimated that there are currently 61 distinct bacterial phyla, of which 31 have no cultivable representatives.2 Currently NCBI has about 16,757 bacterial species listed, which represent less than 50% of the minimum species richness predicted by Schloss & Handelsman and only a fraction of the global species richness of about 10^7 to 10^9 estimated by Curtis and Dykhuizen.3,4
Siragusa/Marshall: In generic terms what exactly is a metagenome? Also, please explain the meaning of the terms “shotgun sequencing”, “shotgun metagenomes”, and “metagenomes”. How are they equivalent, similar or different?
Hasan: Metagenome is actually an umbrella term. It refers to the collection of genetic content of all organisms present in a given sample. It is studied by a method called metagenomics, which involves direct sequencing of a heterogeneous population of DNA molecules from a biological sample all at once. Although in most applications metagenome is used to refer to the microbial metagenome (the genes and genomes of the microbial communities of a given sample), in a broader sense it represents the total genetic makeup of a sample, including genomes and gene sequences of other materials in the sample, such as nucleic acids contributed by other food ingredients of plant and animal origin. The metagenome provides an in-depth understanding of the composition, structure, and functional and metabolic activities of food, agricultural and human communities.
Shotgun sequencing is a method where long strands of DNA (such as the entire genome of a bacterium) are randomly shredded (“shotgunning”) into smaller DNA fragments so that they can be sequenced individually. Once sequenced, these small fragments are assembled into contigs by computer programs that find overlaps in the genetic code, and the complete sequence of the bacterial genome is generated. Now, if instead of one genome you directly sequence the entire assemblage of genomes in a metagenome using such a shotgun approach, it’s called shotgun metagenomics, and the resulting output is termed a shotgun metagenome. By this method, you are literally sequencing thousands of genomes simultaneously from a given metagenome in one assay, and you get the opportunity to reconstruct individual genomes or genome fragments for investigation and comparison of the genetic consortia and taxonomic composition of complete communities and their predicted functions. Whereas targeted 16S rRNA or targeted 16S amplicon sequencing relies on amplification and sequencing of one target region, the 16S gene region, shotgun metagenomics is target free: it aims to sequence the entire genome of every organism present in a sample and gives a more accurate and unbiased biological representation of a sample. As an analogy for shotgun metagenomics, let’s think about your library, where you may have multiple books (like the different organisms present in a metagenome). You can imagine shotgun metagenomics as a process whereby all the books from your library are shredded and mixed up, and you then find text overlaps among the small shredded pieces and piece them back together to reassemble each of your favorite books. Shotgun metagenomics approximates this analogy.
Metagenome and metagenomics are often used interchangeably. Whereas a metagenome is the total collection of all genetic material from a given sample, metagenomics is the method used to obtain a metagenome, which utilizes a shotgun sequencing approach to sequence all of this genetic material at once.
Shotgun sequencing and shotgun metagenomics are also used interchangeably. Shotgun sequencing is a technique where you fragment large DNA strands into small pieces and sequence all of the small fragments. Now, if you apply this technique to sequence a metagenome, then we call it shotgun metagenomics.
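(Illustration: the shred-and-reassemble idea described above can be sketched in a few lines of Python. This is a toy greedy overlap merger on a made-up 15-letter sequence, not the assembler used by any real pipeline; the function names and the `min_len` threshold are assumptions chosen for the example. Real assemblers must also cope with sequencing errors, repeats, reverse complements and millions of reads.)

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap.

    Returns the remaining contigs once no pair overlaps by at least
    `min_len` characters.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up "genome" and three overlapping shotgun fragments cut from it.
toy_genome = "ATGGCGTACGTTAGC"
toy_fragments = [toy_genome[7:15], toy_genome[0:8], toy_genome[3:11]]
print(greedy_assemble(toy_fragments))
```

In a metagenome the fragment pool comes from many genomes at once, so the same procedure yields many contigs rather than one, which is the "library of shredded books" situation in the analogy.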
Siragusa/Marshall: What can metagenomics data tell us that a Targeted 16S (Bacterial) or ITS (Fungal) Microbiome does not?
Hasan: Actually, the differences are quite profound! With 16S or ITS, you typically only get taxonomic composition of bacterial and fungal community at a genus level. In contrast, metagenomics gives you more precise taxonomic composition of both bacteria and fungi at species, sub-species or strain-level accuracy. Shotgun metagenomics also captures viruses and eukaryotes, and provides information on antimicrobial resistance and virulence genes in parallel, which is not possible with targeted 16S, ITS, etc. Because shotgun metagenomics sequences the entire genetic content of a sample, it offers the opportunity to characterize functional and metabolic potential of a given community. For example, new species, enzymes, or pathways can be discovered. Furthermore, genomic linkages between function and phylogeny for uncultured organisms can be elucidated. Additionally, shotgun metagenomics has high specificity of detection and unbiased measurement of organism abundance. It is more “representative” of the natural community than targeted 16S or ITS sequencing, which can disrupt the natural composition due to PCR amplification bias.
Marshall/Siragusa: Your company’s innovative science and technology uses metagenomics to determine microbiomes of a host of samples and niches. How has the technology of CosmosID changed the whole field of metagenomics?
Hasan: In short, we offer a solution to the data analysis bottleneck created by shotgun metagenomics. Recent advances in next-generation sequencing (NGS) technologies (i.e., significantly increased throughput and rapid turnaround time) have made it possible to sequence complex biological samples deeply enough that metagenomics can be used to identify and characterize all pathogens and commensals in any given sample, thereby establishing community profiling for decision-making. While metagenomics offers many advantages, the time and expense of the bioinformatics needed to piece together complex unassembled reads, and the lack of comprehensive databases containing curated and validated genomes of all possible organisms of interest, were a significant bottleneck for its application in many critical areas, including food quality and safety. Recognizing this critical need, CosmosID has developed an integrated platform with novel bioinformatics tools and expertly curated genome databases to facilitate ultra-fast metagenomic analysis with the high specificity required for applications like clinical diagnostics and food safety. We converted complex bioinformatics data analysis processes into a simple, user-friendly interface so that microbiologists, even without any bioinformatics skill, can do their own analysis and obtain rapid, reliable, high-resolution identification and accurate quantitation of microbial populations. Furthermore, characterization of microbial population attributes, such as antibiotic resistance and virulence, is possible with this tool. To our knowledge, the CosmosID platform is the only metagenomics solution that can profile pathogens at the sub-species and strain level with high accuracy. The CosmosID platform even provides comparative statistical analysis across many different datasets based on distinct cohorts and/or other associated metadata.
With this technology scientists can now better explore biodiversity, understand microbial interactions and community interplay in various biological processes, and monitor pathogen and antibiotic resistance transmission. When this pool of data is combined, source trace back and outbreak control activities become easier and quicker to manage.
Siragusa/Marshall: Of course the microbial world is not only composed of bacteria. Metagenomics captures sequences of all DNA-based life forms in a sample. Could this tool also be used for detecting foodborne viruses and protozoal pathogens?
Hasan: Oh, yes! The beauty of metagenomics is that it’s a one-time universal assay; therefore, your detection capability is not restricted to bacteria and fungi only. Metagenomics can also detect both viruses and other non-microbial eukaryotes simultaneously in the same assay.
Siragusa/Marshall: What if a virus in a sample is an RNA virus?
Hasan: That’s a very good question. To capture the RNA viruses, you need to take a slightly different approach, called microbial transcriptomics, where you isolate RNA from a given sample and do RNA sequencing. Interestingly, this approach not only ensures detection of RNA viruses, but also detects all DNA-based life forms. However, this method is relatively more expensive compared to DNA sequencing-based methods.
Siragusa/Marshall: What if a fermented food producer is interested in bacteriophage that might impact their process. Would this technique help them?
Hasan: Yes. Using metagenomics, we can readily identify, detect and track bacteriophages of interest from fermented food products and manufacturing environments. In fact, our curated virus database already includes a large number of bacteriophages, in addition to DNA and RNA viruses. Furthermore, if a fermented food producer encounters bacteriophages that are not listed in our database, we can easily update our database to include those bacteriophage genomes to enable tracking of these specific bacteriophages in the fermented food manufacturing environment. Our ability to confidently detect phages is another key differentiator of our solution, which offers invaluable information particularly when a phage-specific organism appears in a food or environmental sample at low abundance.
Siragusa/Marshall: You indicated that the length of a genome obtained in shotgun sequencing is significantly longer than that obtained by a targeted amplicon sequencing method such as obtained in a Targeted 16S microbiome. Why is that important?
Hasan: Yes, that’s true. Typically bacterial genomes comprise about 4,000 genes; but when you are using targeted sequencing like 16S, you are essentially sequencing only a single gene, such as the 16S gene, or even part of a gene (i.e., variable regions). You are then using the differences you observed in those sequences, which represent only about 0.025% of a genome, to infer identity of a bacterium. Extrapolation of a single gene polymorphism may not be accurate and often loses resolution in detecting closely related organisms. In shotgun sequencing, instead of a single gene, you are sequencing the entire genome and leveraging the sequence information of all or many of these 4,000 genes to investigate the complete genetic make-up of an organism. Shotgun sequencing offers high precision and accuracy in detection and provides information related to other genes, including antibiotic resistance and virulence, and can even reconstruct an organism’s full-length genome to understand evolution, pathogenesis and clonal transmission.
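(Illustration: the 0.025% figure above follows from simple arithmetic on the gene counts quoted in the answer. The short sketch below reproduces it; the function name is hypothetical, and the calculation counts genes rather than base pairs, so it is only as precise as the round number of 4,000 genes.)

```python
def percent_of_genes_sampled(targeted_genes: int, genes_per_genome: int) -> float:
    """Percentage of a genome's genes covered by a targeted assay."""
    if genes_per_genome <= 0:
        raise ValueError("genes_per_genome must be positive")
    return 100 * targeted_genes / genes_per_genome


# One 16S gene out of roughly 4,000 genes in a typical bacterial genome.
print(f"{percent_of_genes_sampled(1, 4000):.3f}% of genes sampled")
```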
Siragusa/Marshall: Do you envision a time when metagenomics sequencing will become a solution or a “one-stop shop” for microbial diagnostics in food?
Hasan: Yes! When you use metagenomics, you are using an unbiased, culture-free, universal method that gives you the opportunity to investigate the complete genetic make-up of a food sample, offering insights into the associated microbial community: its diversity, composition, and functional, metabolic and virulence potential. You can also focus on your pathogen of interest to understand its source in a food production environment. Because the food genome is also sequenced, insights on authenticity and economic fraud can also be gained. Thus, metagenomics offers a tantalizing new tool for supplier verification activities.
Siragusa/Marshall: Would metagenomics analysis be useful in the monitoring of antimicrobial resistance genes in food production environments or even from bacteria on food itself?
Hasan: Yes, using metagenomics you can readily detect and track antibiotic resistance genes in food production and monitor their transmission. CosmosID has developed a comprehensive antibiotic resistance database that contains resistance gene sequences from all major classes of antibiotics; therefore, you can probe metagenomic or individual bacterial sequence data against that database to profile the antibiotic resistance genes carried by the community as a whole (community resistome) or by the individual bacteria.
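(Illustration: probing reads against a resistance gene database can be pictured as a lookup of short subsequences (k-mers). The sketch below is a generic toy and is not CosmosID's algorithm or database; the gene names and sequences are invented placeholders, and the parameters `k` and `min_shared` are assumptions for the example. It ignores reverse complements, sequencing errors and partial matches.)

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads, reference_genes, k=8, min_shared=2):
    """Count reads that share at least `min_shared` k-mers with each gene."""
    # Index every reference k-mer by the gene(s) it occurs in.
    index = {}
    for name, seq in reference_genes.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)

    hits = Counter()
    for read in reads:
        shared = Counter()
        for km in kmers(read, k):
            for name in index.get(km, ()):
                shared[name] += 1
        for name, n in shared.items():
            if n >= min_shared:
                hits[name] += 1
    return hits


# Invented placeholder sequences, not real resistance genes.
gene_a = "ATGAAACGTCTGGCTAACGGTATTCCGGAA"
gene_b = "ATGTCTGATCCGAAAGTTCTGCGTTGGCAT"
toy_database = {"hypothetical_gene_A": gene_a, "hypothetical_gene_B": gene_b}

# Two reads cut from gene A, one from gene B, and one unrelated read.
toy_reads = [gene_a[5:20], gene_a[12:27], gene_b[10:25], "GGGCCCAAATTTGGGCCC"]

resistome = screen_reads(toy_reads, toy_database)
for gene, count in sorted(resistome.items()):
    print(gene, count)
```

Run on all reads from a sample, a tally like this approximates a community resistome; run on reads from a single isolate, it profiles that one organism.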
Siragusa/Marshall: CosmosID’s technology is intimately tied to curated databases. What is a curated database and what if a specific life-form is not included in that database? For example, you mentioned this earlier in a statement about bacteriophage.
Hasan: Yes, the solution is tied to curated databases. By curated database we mean a database whose content is expertly checked and screened for a variety of common sequence errors, contaminations, misassemblies and taxa misclassifications that may otherwise affect the accuracy of metagenomic detection. Examples of errors include nematode sequences being submitted to NCBI as bacterial sequences and human sequence reads assembled into microbial genomes, including those of foodborne pathogens. We do extensive cleaning of genomes before they are incorporated into our databases; however, to get high-resolution and accurate results, it takes more than just “clean genomic sequences”. For example, we have built whole-genome relatedness trees, and thanks to this phylogenetic organization of our database, we are able to identify an organism down to the sub-species or strain level. Additionally, if a specific life form is not represented in the database, we can identify the organism to its nearest phylogenetic neighbor, and the statistics of such an identification will indicate that it is a novel organism. Furthermore, our database is incredibly flexible. We can readily incorporate new genomes and gene sequences into the existing databases and utilize the modified databases to probe metagenome datasets. This allows us to develop custom databases for specific clients by incorporating any novel and/or proprietary genomes they may have.
Siragusa/Marshall: Food fraud is rapidly becoming a topic of keen interest in the food production world. How would this technology address the problem?
Hasan: As I mentioned earlier, metagenomics is a universal assay; therefore, as long as you have a qualified database representing the plant and animal species of interest and the sequencing depth is tailored appropriately, you can also detect plant and animal species using metagenomics. In fact, one of our current databases includes about 1,500 plant and animal reference genomes, and using this database we can detect those plant and animal species readily. What we may not be able to address is whether identified species were intentionally added or are unintentionally present due to incidental agricultural or transport carryover.
Siragusa/Marshall: Some practical questions: First, how many target organisms must be in a sample to be detected in a metagenome?
Hasan: Very interesting question. Basically, it is possible to detect as few as 1 to 10 cells of an organism in a specimen (per gram or mL) by metagenomics as long as you sequence your sample deeply enough. In general, when using targeted 16S sequencing at a sequencing depth of ~20,000, for a taxonomic unit to show up as a discrete grouping (e.g., genus or species) and appear as a piece of a pie chart diagram requires approximately 10^3–10^4 gene copies or roughly 10^3–10^4 cells. For metagenomic shotgun sequencing, how deep* a sample needs to be sequenced to attain such a low level of detection depends on the sample type, the sample size, and the diversity and richness of the underlying microbes associated with it. (*Read depth, or sequencing depth, is the number of times a sequence is determined or read for a single sample, or the average number of times that a particular nucleotide is represented in a collection of random raw sequences. A single read can have errors, so multiple reads are desired for data quality. Sequencing depth can range from thousands to millions.) For example, soil is a very complex sample type with enormous diversity and richness of both microbes and non-microbial eukaryotes, whereas a fermented food product may be considered a relatively simple sample type, as its expected microbial diversity and richness would be much lower. Therefore, to detect a particularly low-abundance organism at a level of only 1 to 10 cells amongst a large background of other bacteria, one will need to sequence much more deeply for highly diverse samples (e.g., soil) than for a low-diversity sample (e.g., a fermented food product). Additionally, use of pathogen-specific prior DNA isolation and/or use of a larger sample size should be considered to increase detection sensitivity.5 (Authors' note: We will explore sampling and sample sizes further in a future installment of Food Genomics.)
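(Illustration: the link between sequencing depth and detection limit can be approximated with a simple sampling model. The sketch below assumes every read is drawn independently and that the target organism contributes a fixed fraction of all sequenced DNA; it ignores genome size differences, extraction bias and the food's own DNA, and the two example fractions are hypothetical rather than measured values. The function names are ours.)

```python
import math


def detection_probability(target_fraction: float, depth: int) -> float:
    """Chance that at least one of `depth` random reads comes from the target."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be between 0 and 1")
    # 1 - (1 - f)^depth, computed in a numerically stable way for tiny f.
    return -math.expm1(depth * math.log1p(-target_fraction))


def reads_needed(target_fraction: float, confidence: float = 0.95) -> int:
    """Smallest depth giving at least `confidence` chance of one target read."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log1p(-target_fraction))


# Hypothetical target fractions: a rarer target stands in for a sample with
# a much larger and more diverse background (e.g., soil).
for label, fraction in [("low-diversity sample", 1e-5), ("high-diversity sample", 1e-8)]:
    print(label, reads_needed(fraction))
```

Under these assumptions the required depth scales roughly with the inverse of the target fraction, which is why a rare organism in a rich background calls for far deeper sequencing than the same organism in a simple one.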
Siragusa/Marshall: If DNA from dead or lysed bacteria or other organisms is detected, what does that mean in terms of food safety or quality?
Hasan: This is a critical point from a food safety or quality point of view. The method is so sensitive that it is likely to detect DNA from dead or lysed organisms, and if detected, we expect their abundance and coverage to be very low as well. However, we are mindful that even a low-level presence of some food pathogens can have important food safety or quality implications; therefore, our recommendation is to use microbiological culture to confirm viability when food pathogens are detected, especially at low abundance or coverage. Use of RNA isolation and sequencing instead of DNA, pretreatment of samples with a membrane-impermeable reagent like propidium monoazide (PMA), which enables selective amplification of DNA from viable cells, or carrying out multiple metagenomic analyses to measure quantitative changes over time could be used to circumvent this problem.
Siragusa/Marshall: Will a shotgun approach go beyond the level of the genus or species taxonomic classification?
Hasan: Oh, yes. The shotgun metagenomics approach not only detects organisms at the genus or species level, but also readily detects organisms at the sub-species and strain level. In many cases it can also detect multiple strains of a species if they are present in a given sample. We frequently observe multiple strains of a species in fermented food or probiotic samples. In our recent studies on Listeria monocytogenes in the Blue Bell ice cream outbreak and Aeromonas hydrophila in necrotizing fasciitis, we have demonstrated such capability.5,6
Siragusa/Marshall: For someone to use this technology, what type of sample would they have to obtain? How much sample is needed? Could it be preserved in ethanol or should it be frozen? If so, at what temperature do you recommend? What is the average TAT (turnaround time) for metagenomics analysis?
Hasan: We always prefer to use fresh food samples for nucleic acid extraction. Typically 100–250 mg of a solid or 1–2 mL of a liquid food sample yields adequate nucleic acid for next-generation sequencing. However, when it is not possible to use fresh samples, freezing and storage at −80 °C should be considered. Ethanol preservation can also be used, and in that case we suggest using >95% ethanol to allow for more rapid penetration of cellular membranes and deactivation of DNases. However, it is important to remember that ethanol-preserved samples often give lower DNA yields. While we have expedited the turnaround time for sequence data analysis to minutes, the current turnaround time for the entire workflow (sample to report) is 1 to 2 weeks. However, depending on the batch size of the samples and use of an integrated workflow, turnaround time is rapidly approaching 48–96 hours.
Siragusa/Marshall: Analytical costs are an obvious consideration for the food industry. Could you share some general ranges of costs for this technology? Do you see prices for metagenomics analysis dropping as sequencing and processor costs fall?
Hasan: Pricing is dependent on a few factors, such as type of sample, batch size, sample prep difficulty, genome complexity and the data analysis package desired. Generally, it ranges from $300 to $1,000 per sample. We expect the cost of sequencing to drop significantly in the next few years. Coupled with workflow automation and the automated, user-friendly bioinformatics that we have already developed, I expect this method to be more cost competitive in the coming years.
Siragusa/Marshall: Finally, would you give us your view on how metagenomics will impact food production as a routine tool in the food microbiologist’s toolkit?
Hasan: Long term, as costs come down and the sequencing and analytics capabilities become more portable and cost effective, I think you’ll see metagenomics become the main diagnostic tool for routine monitoring through the entire food supply chain as well as for food fraud, outbreak trace back and spoilage investigations. In terms of timing, I think we are a few years out from mass adoption, but it will still be less than a decade, and the technology is really coming up quickly! Even today, as you can see from the [previously mentioned] Listeria study, it is possible to fast track Listeria outbreak investigations from weeks to a few days using metagenomics. As you can imagine, this sort of technology can and will revolutionize the microbiology of food and food fermentation with a lot of new insights, enabling critical improvements in manufacturing efficiency, reducing food recalls and wastage, improving shelf life and ultimately providing safer foods, while reducing total costs across the entire food supply chain.
1. Schloss, P. D., and Handelsman, J. (2004). "Status of the microbial census." Microbiology and Molecular Biology Reviews 68(4): 686-691.
2. Hugenholtz, P., Hooper, S. D., and Kyrpides, N. C. (2009). "Focus: Synergistetes." Environmental Microbiology 11(6): 1327-1329.
3. Curtis, T. P., Sloan, W. T., and Scannell, J. W. (2002). "Estimating prokaryotic diversity and its limits." Proceedings of the National Academy of Sciences USA 99: 10494-10499.
4. Dykhuizen, D. E. (1998). "Santa Rosalia revisited: Why are there so many species of bacteria?" Antonie van Leeuwenhoek 73: 25-33.
5. Ottesen, A., et al. (2016). "Enrichment dynamics of Listeria monocytogenes and the associated microbiome from naturally contaminated ice cream linked to a listeriosis outbreak." BMC Microbiology 16(1): 275.
6. Ponnusamy, D., et al. (2016). "Cross-talk among flesh-eating Aeromonas hydrophila strains in mixed infection leading to necrotizing fasciitis." Proceedings of the National Academy of Sciences 113(3): 722-727.