Implications of exponential growth of global whole genome sequencing capacity

Illumina’s HiSeq 2000 running at capacity can sequence two whole human genomes per week at 30x coverage – enough for a full-blown whole genome analysis. One instrument produces 104 human genomes per year.

Beijing Genomics Institute alone has purchased 128 of these instruments. The Broad Institute has 51. And based on Illumina’s 10-Q filing for the first quarter of 2010, the company has a backlog that represents maybe another 200 machines. So by 2011, there may be some 500 of these machines running. Not to mention the GA-IIs, the SOLiD machines, the 454 machines, Helicos, PacBio, Ion Torrent, Complete Genomics, and all of the next-next-generation single-molecule sequencing companies making big promises.

The Fact Is…

…it’s easy to lose track of what this means. It’s easy to get stuck in today’s problems.

In 2010, we may have something like 1,000 publicly available human genomes at a wide variety of coverage. That’s giving us as a society the benefit of the doubt.

In 2011, the worldwide capacity for whole human genome sequencing will easily reach 50,000 genomes a year – a figure based on orders that have already been placed, not on speculation.
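The 50,000 figure follows from the instrument numbers above. Here is a minimal back-of-envelope sketch in Python. It assumes every HiSeq 2000 runs at the quoted two genomes per week for all 52 weeks of the year, which makes it an optimistic upper bound, and it treats the 200-machine backlog and the 500-machine total as the rough estimates they are. `annual_capacity` is a hypothetical helper, not part of any library.

```python
# Back-of-envelope HiSeq 2000 capacity using the figures quoted in this post.
# Assumption: every instrument runs at full capacity, 52 weeks a year.
GENOMES_PER_WEEK = 2   # 30x whole human genomes per instrument per week
WEEKS_PER_YEAR = 52


def annual_capacity(instruments: int, genomes_per_week: int = GENOMES_PER_WEEK) -> int:
    """Whole genomes per year for a fleet running flat out (hypothetical helper)."""
    return instruments * genomes_per_week * WEEKS_PER_YEAR


# Instrument counts cited above; the backlog number is an estimate.
known_fleet = {"BGI": 128, "Broad": 51, "Illumina backlog (est.)": 200}
known = sum(known_fleet.values())

print(annual_capacity(1))               # one instrument
print(known, annual_capacity(known))    # the named buyers plus backlog
print(annual_capacity(500))             # the rough 500-machine estimate
```

With these inputs, one instrument gives 104 genomes a year, and 500 instruments give 52,000. The named buyers plus the backlog (379 machines) account for about 39,000 of those, so the rest of the estimate has to come from other sites.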

Do we believe this is going to slow down? What incentives does the industry have to dial this down? None that I can think of.

If it’s 50,000 genomes in 2011 (a 50x increase over 2010), it’s totally reasonable to believe that capacity will grow to 250,000 genomes by 2012 – only a 5x increase over the previous year. Call 2013 a 4x increase over 2012, and that’s the capacity to sequence 1 million genomes a year, just three years from now.
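The projection is just a chain of year-over-year multipliers applied to the 2010 baseline of about 1,000 genomes. Here is a minimal sketch that takes the 50x, 5x, and 4x multipliers from the paragraph above. `project` is a hypothetical helper written for illustration.

```python
def project(base_year: int, base_capacity: int, multipliers: list[int]) -> dict[int, int]:
    """Apply successive year-over-year growth multipliers to a starting capacity."""
    capacity = {base_year: base_capacity}
    year, value = base_year, base_capacity
    for multiplier in multipliers:
        year += 1
        value *= multiplier
        capacity[year] = value
    return capacity


print(project(2010, 1_000, [50, 5, 4]))
# {2010: 1000, 2011: 50000, 2012: 250000, 2013: 1000000}
```

You can extend the list with 5x steps for 2014 and 2015. That gives the 5 million and 25 million milestones used later in this post.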

The only thing standing in the way of this explosive growth is our ability to absorb the new capacity – and that comes down directly to the tools that analyze the data. As the number of genomes increases exponentially, the types of questions we’ll ask of this data will change dramatically. We’re in the middle of an incredible revolution that will move more quickly than many of us appreciate. Let me propose one vision.

2001-2009: A Human Genome

The 10 or so years after the Human Genome Project, through roughly 2009, were characterized by large-scale research operations to understand the basic biology behind genomics: gene and target discovery, pathway modeling, disease models, GWAS, expression analysis. The consumers of the Human Genome Project were academic, pharmaceutical, and biotech researchers. The genome had been sequenced, and sequencing was thought to be yesterday’s job.

2010: 1,000 Genomes – Learning the Ropes

In 2010, with the nascent adoption of NGS (if you think it’s widespread today, just wait), new applications have exploded onto the scene: larger-scale resequencing of exomes and whole genomes, RNA sequencing, ChIP-seq, metagenomic sequencing, and a renaissance in the agricultural sciences, where researchers can finally run their own versions of the Human Genome Project. The consumers of this early-stage adoption of NGS remain academic researchers, pharma and biotech researchers, and ag companies. We’re finding new variation across different ethnicities, identifying novel transcripts in previously well-understood genes, and developing exciting new insights in epigenetics. But it’s still basic research. And the bioinformatics community is still arguing about basic approaches to alignment, variant calling, and normalization across experiments.

2011: 50,000 Genomes – Clinical Flirtation

How do things change when we have the capacity to sequence and analyze 50,000 genomes? Catalogues of human variation will become large-scale for the first time. We’ll build strong correlations between phenotype, genotype, and treatments. Early-stage sequence-based diagnostics will find their way into leading-edge labs and hospitals. Pharma will take real steps towards the design and optimization of genotype-centric clinical trials. The FDA will provide better guidance on developing drugs and diagnostics that employ sequencing. We’ll start talking about “Genomicists” in the same way we currently describe Pathologists or Radiologists, although there will be very few of them. (Indeed, some Pathologists already believe that genomics will fall within their domain.)

2012: 250,000 Genomes – Clinical Early Adoption

With 250,000 genomes, the clinical adoption of sequence data will begin in earnest. Genomics-based diagnostics will be a real business: comments from a recent J.P. Morgan report indicate that lab managers believe this switch will occur in the next 5 years, particularly in cancer detection and classification. The FDA will broadly support pharmacogenomics-based clinical trials. Population studies will continue to drive massive insights into human variation. Leading-edge hospitals will store whole genome data for patients as part of their medical records. The consumers of NGS will shift from academic and commercial researchers to Pathologists, Genomicists, VPs of Clinical Development in pharma, and young doctors everywhere.

2013: 1 Million Genomes – Consumer Awareness

When the planet has the capacity to sequence 1 million genomes per year, many 1st-world health-care consumers will know enough to seek out providers who offer these services. Savvy patients, already practiced in researching their own conditions on the Internet before a doctor’s visit, will begin to push back on doctors’ recommendations, saying, “before we make a decision on that cancer treatment, I want my genome sequenced to see whether it’ll be effective.” Health and life insurance companies will get into the game and, barring significant ethical battles, will use genomic information to guide treatments, suggest specialists, and even set premiums. Diagnostics for personalized care will double from the previous year. The personal genome will be within reach of many individuals, and the FDA will struggle to keep up with regulations restricting the use of personal genomes in unapproved diagnostics. It is not at all clear to this author whether the FDA is sufficiently staffed to keep pace with the innovation that will explode from this level of sequencing capacity.

2014: 5 Million Genomes – Consumer Reality

Many cancers in the 1st world will be sequenced as a regular component of a biopsy. Patterns of drug efficacy across different genotypes will be published and made widely available. Oncologists will work with statisticians to develop treatment programs. Hospitals will offer whole genome sequencing to newborns. Chronic pain will be managed on a genotype-by-genotype basis. Medical schools will redesign their curricula to produce physicians and researchers who can lead medicine into the Genomics Age, and to provide advanced training for the Genomicist specialty.

2015-2020: 25 Million Genomes And Beyond – A Brave New World

The ability to sequence 25 million genomes just five years from now seems well within the industry’s grasp, barring significant issues with uptake and absorption of the data. And applying just a doubling of capacity each year between 2015 and 2020, we would have the capacity to sequence roughly 800 million genomes a year by 2020, closing in on 1 billion. This will have drastic impacts on society.
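The 2015 to 2020 figure is simple compounding: five annual doublings starting from 25 million. Here is a minimal sketch, assuming capacity exactly doubles every year. The constant and function names are illustrative only.

```python
START_YEAR = 2015
START_CAPACITY = 25_000_000  # genomes per year in 2015


def doubled_capacity(year: int) -> int:
    """Capacity in a given year if it doubles every year after 2015."""
    return START_CAPACITY * 2 ** (year - START_YEAR)


for year in range(START_YEAR, 2021):
    print(year, f"{doubled_capacity(year):,}")
```

Five doublings multiply capacity by 2^5 = 32, so 25 million becomes 800 million by 2020. One more doubling, in 2021, would pass 1 billion.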

While the health-care industry will continue to adopt sequencing for broader and broader applications, the insurers will do everything in their power to get access to this information, both for the microeconomic management of individuals and for the macroeconomic indicators of ethnic and regional health that will surely increase their profit margins.

Consumer applications for genomics will flower: want to see whether you are genetically compatible with your new girlfriend? There’s an app for that. DNA sequencing on your iPhone? Believe it. Personalized genomic massage, anyone? This is already happening today: labs are testing for allele 334 of the AVPR1a gene to tell you whether your new mate has the “cheating gene.” Then imagine the market for consumer applications and gimmicks when your entire genome is already on a USB drive.

Genetic discrimination may need to be addressed in the highest regulatory bodies: do you really want to elect a President whose genome suggests cardiomyopathy? Think this won’t happen? Just imagine the first candidate to release his healthy genome just like his last two years of tax returns, challenging his opponents to do the same. What will the world’s reaction be?

Will LinkedIn and Facebook suggest people you may be related to? Sure, they probably won’t have your genome, but your genome will be somewhere in de-identified form, sitting right next to other de-identified genomes. It’s easy to envision software that mines this data to find your relatives and common ancestors. It may start as a medical application, but it won’t be able to stay that way. Just wait until that software platform tells you it has found the genome of someone who looks like a third cousin and offers a way to reach out anonymously. Welcome to ChromosomallyLinkedIn.

Back to Reality

I’m no futurist – most weeks I can barely tell you what my schedule is the following week. So while it’s fun to dream up the next decade, there are too many variables to get it all right and this thought experiment may be off a few years in any direction. We’re squarely in 2010, the year of the 1,000 genomes. The deeper we allow ourselves to look into the future, the less clear it becomes.

But one thing is certain – sequencing capacity world-wide will continue to grow exponentially for at least the next 10 years. This is going to happen. That means sample preparation will get vastly easier, throughput will continue to increase at a dizzying rate, sequencing costs will plummet, and the applications of sequencing will become more mass-market.

And most of all, it means that the software that we use to analyze sequence will need to become a lot simpler to use, and more purpose-built for specific applications. General bioinformatics frameworks are dinosaurs awaiting the impact of the meteor. In the (near) future, no one will be arguing about gapped vs. ungapped alignments. No one will be talking about Phred-like quality scores. No one will be talking about reads, even – they’ll seem like antiquated tiny puzzle pieces from a past when sequencing technology was like a nuclear bomb rather than a precision scalpel.

As I look ahead to develop the long-term vision for the GenomeQuest product roadmap, it’s obvious to me that our immediate focus must be on simple, easy-to-use whole-genome and multi-genome analysis. With the coming of 50,000 genomes next year, our immediate problem is supporting the absorption of this new knowledge. That means continuing to enable the processing of data as quickly as it comes off the sequencers, and presenting it to end users in a way they can understand, interact with, and explore. What are all of the proteins affected by this individual’s variants, and what types of modifications do we see? How do they impact disease pathways? How is this individual similar to others for whom we have treatment and outcome data?

Today’s consumer of genome sequencing is the researcher or clinician doing basic discovery with thousands or hundreds of thousands of genomes. But the longer-term audience is the clinic itself.

And I for one don’t think we have that long to wait.

Calling all clinicians.

Rants welcomed.
-Richard