“Watson meets Moore”

March 5th, 2010 by Ron Ranauro

The title of Kevin Davies’s Bio-IT World article on the new $50,000 Ion Torrent machine says it all: “Watson meets Moore” as Ion Torrent Introduces Semiconductor Sequencing.

Xconomy covers GQ

February 26th, 2010 by Ron Ranauro

Ryan McBride from Xconomy posted this nice article on GQ yesterday. After our discussion, he asked me whether the “Google of Genomics” metaphor applied. I did not deny it, hence the title: “GenomeQuest Wants to Be the Google of DNA Data Searches”.

Gene patents on trial

February 8th, 2010 by Ron Ranauro

GenomeWeb reports on the HHS advisory group’s proposal to “limit the ability of holders of gene patents to keep others from using those genes for diagnostic and research purposes.” The GLR recaps the evolution of the debate and sponsors an interesting dialog in a recent post, Up Next in Gene Patents: Waiting for a Ruling (Again) and SACGHS Meets (Again).

Gene patents were also a hot topic at the Molecular Medicine Tri-Conference last week in San Francisco. One of the talks was “Gene Patents in Molecular Diagnostics: Valuable Assets or Impediments?” The speaker, Frances Toneguzzo, Ph.D., Director of Corporate Research and Licensing at MGH, brought up an interesting perspective: limiting patents on genes is a slippery slope, since other forms of biomarker patents, such as those on “image biomarkers,” could also lose patent protection.

Programming the cloud

January 31st, 2010 by Ron Ranauro

If you are a developer or a technical type, this one is for you.

Over at Depth-First there is a blog post about an application in the cheminformatics field: PubCouch: Streams aren’t just for Pipeline Pilot. The author illustrates how a well-abstracted Web service avoids the costly database Extract-Transform-Load (ETL) operations so familiar in life science development. In the example, the author streams the entire contents of the PubChem FTP server to PubCouch, a Web service built on CouchDB, a NoSQL, document-oriented database. CouchDB doesn’t rely on a relational schema; instead it computes the PubChem relationships “on-the-fly” using an approach based on MapReduce.
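For readers who haven’t met MapReduce, here is a minimal in-memory sketch of the idea. It is not CouchDB’s API (CouchDB views are normally written as JavaScript map/reduce functions and maintained by the server), and the compound documents and field names are hypothetical. A map function emits key/value pairs from each document, the pairs are grouped by key, and a reduce function collapses each group; here the “relationship” computed is which compounds share a molecular formula.

```python
from collections import defaultdict

# Hypothetical compound documents, loosely shaped like PubChem records.
# The field names are assumptions for illustration only.
docs = [
    {"cid": 1, "formula": "C2H6O", "name": "ethanol"},
    {"cid": 2, "formula": "C2H6O", "name": "dimethyl ether"},
    {"cid": 3, "formula": "CH4O", "name": "methanol"},
]

def map_fn(doc):
    """Emit (key, value) pairs: one pair per compound, keyed by formula."""
    yield doc["formula"], doc["cid"]

def reduce_fn(key, values):
    """Combine all values emitted under one key into a sorted list of CIDs."""
    return sorted(values)

def map_reduce(documents, mapper, reducer):
    # Shuffle phase: group every emitted value by its key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    # Reduce phase: collapse each group into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

isomers = map_reduce(docs, map_fn, reduce_fn)
print(isomers)  # {'C2H6O': [1, 2], 'CH4O': [3]}
```

Because the relationship is derived from the documents each time the view is computed, no separate ETL step has to maintain a precomputed join table.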

So what, you say?

The vision is this: since modern Web-based programming (a.k.a. RESTful architecture) hides the details of massive data and computing resources, programmers can focus on “what to do” rather than “how to do it,” and that increases productivity.

GenomeQuest’s developers have thought deeply about what a scalable computational biology engine should look like in the cloud-based, MapReduce paradigm. If you want to read a primer on the GQ Engine, feel free to check it out.

Soon, we’ll publish the full-blown URL API so that large-scale biological data and computation can be assembled from any Internet-connected desktop, using the language of the Web. A command-line interface to our Web services can be found here.
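To make “using the language of the Web” concrete, here is a sketch of how a client might express a computation as a URL. The host, the `runs` resource, and the `genome`/`workflow` parameters are all hypothetical placeholders, not the GQ URL API (which had not been published). The point is only that the request states *what* to compute, leaving *how* to the service.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical host: a placeholder, not a real GenomeQuest endpoint.
HOST = "api.example.org"

def build_query_url(resource, **params):
    """Build a RESTful GET URL describing what to compute, not how."""
    query = urlencode(sorted(params.items()))  # sorted for a stable, cacheable URL
    return urlunsplit(("https", HOST, f"/{resource}", query, ""))

# Hypothetical resource and parameter names, for illustration only.
url = build_query_url("runs", workflow="chipseq", genome="hg19")
print(url)  # https://api.example.org/runs?genome=hg19&workflow=chipseq
```

Any HTTP client (a browser, `curl`, or a script) could then issue the same request, which is what makes this style of API easy to assemble into larger workflows.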

A final remark: Deepak Singh of business|bytes|genes|molecules wonders aloud what role Pipeline Pilot plays in this new programming paradigm. My guess is that within a single domain its value proposition might be limited, but across domains these tools will continue to solve even bigger problems by leveraging better-designed Web services.

So, what’s the argument for cloud computing?

January 25th, 2010 by Ron Ranauro

A plot of the Evolution of Computer Capacity and Costs projects that compute power will be 1,000X cheaper in 10 years. How much lower can it go? As this happens, the relative cost of managing another computer approaches zero, regardless of whether it’s hosted internally or externally. I don’t think there is an economic argument, based on hardware and system administration cost alone, that everyone belongs on the cloud.
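To put that projection in annual terms: assuming the improvement is a steady exponential (an assumption, since the plot only gives the 10-year endpoint), a 1,000X drop over 10 years means compute cost roughly halves every year.

```python
import math

# Projection from the plot: compute roughly 1,000x cheaper over 10 years.
total_factor = 1000.0
years = 10

# A constant annual improvement r satisfies r ** years == total_factor.
annual_factor = total_factor ** (1 / years)

# Years for cost to halve at that rate: log(2) / log(r).
halving_time = math.log(2) / math.log(annual_factor)

print(f"{annual_factor:.3f}x cheaper per year")      # 1.995x cheaper per year
print(f"cost halves every {halving_time:.2f} years")  # cost halves every 1.00 years
```

That follows because 1,000 is almost exactly 2^10 (1,024).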

Dave Dooling at PolITiGenomics finds two good reasons for considering cloud options: when organizations have peak demands for compute power and when limitations on space/power/cooling preclude building a system in-house. These are two good reasons, but hardly enough to justify all the cloud computing hype.

So, what’s the argument for cloud computing?

Unlike computing, which gets cheaper every year, people cost more every year. So it makes sense to evaluate the annual software development and maintenance costs; the cost of managing the reference databases; the cost of integrating and maintaining new applications; the productivity of the end users; and how to change the ratio of end users to support programmers from 2-to-1 to 10-to-1 or 20-to-1. Cloud computing defined as “Infrastructure” (computers, networks, and storage) doesn’t alleviate these costs.
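The ratio is where the leverage is. For a hypothetical group of 100 end users (the headcount is an assumption chosen for illustration), moving from 2-to-1 to 20-to-1 cuts the support staff tenfold:

```python
import math

def support_programmers_needed(end_users, users_per_programmer):
    """Programmers required to support a user base at a given ratio (rounded up)."""
    return math.ceil(end_users / users_per_programmer)

# Hypothetical organization size; not a figure from any real deployment.
END_USERS = 100
for ratio in (2, 10, 20):
    print(f"{ratio}-to-1: {support_programmers_needed(END_USERS, ratio)} programmers")
# 2-to-1: 50 programmers
# 10-to-1: 10 programmers
# 20-to-1: 5 programmers
```

Since people cost more every year, that headcount difference compounds, while hardware savings shrink toward zero.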

Challenges in 1000 Genomes Data

January 18th, 2010 by Ron Ranauro

Variant reports are not the right deliverable for a re-sequencing study.

MassGenomics, a well-written technical blog by Dan Koboldt, illustrates why. Dan says: “What’s more, with the advent of next-generation sequencing, I hate to tell you, but people are going to be reporting a lot of false positives. I guarantee it. So when you filter all of the variants, you might actually remove the ones you’re looking for.”

It’s easy to see why researchers are not enthusiastic about tabular reports. They want to get into the data on their own, without intermediaries, and they want software that facilitates that rather than getting in the way.
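A toy illustration of Dan’s point, with made-up calls and a made-up “known variants” catalogue: if someone else reported a false positive at the very position of your true variant, a report that discards everything already catalogued silently removes it. Keeping every call and flagging catalogue membership instead leaves the decision with the researcher.

```python
# Hypothetical calls from a re-sequencing study: (chrom, pos, ref, alt).
study_calls = [
    ("chr1", 1_000, "A", "G"),
    ("chr1", 2_500, "C", "T"),   # the variant the researcher is actually after
    ("chr2", 7_300, "G", "A"),
]

# Hypothetical "known variant" catalogue. The chr1:2500 entry is assumed to be
# someone else's false positive, but a filter has no way to tell.
known_variants = {("chr1", 1_000, "A", "G"), ("chr1", 2_500, "C", "T")}

def novel_only(calls, catalogue):
    """Typical report step: discard anything already catalogued."""
    return [c for c in calls if c not in catalogue]

def annotate(calls, catalogue):
    """Alternative: keep every call and flag catalogue membership instead."""
    return [(*c, c in catalogue) for c in calls]

print(novel_only(study_calls, known_variants))  # [('chr2', 7300, 'G', 'A')]
for row in annotate(study_calls, known_variants):
    print(row)
```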

No Need for Data Pipelining

December 8th, 2009 by Ron Ranauro

Our “Sequence Data Management” (SDM) concept doesn’t fit the primary/secondary/tertiary analysis informatics categories. Why? Because we’ve coupled the alignment step with the analysis step in one shot. Why is that better? Biologists can run the computation on their own and mine the data in an easy-to-use web application. They are able to “finish the pipeline on their own.” Hopefully, this will lead to more interesting biological conclusions and more enthusiastic end users of NGS technology.
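As a deliberately tiny sketch of what “coupling alignment with analysis” means (not how the GQ Engine does it), the function below places each read and immediately updates per-position coverage in the same pass, so no intermediate alignment file has to be handed to a separate tool. The exact-match “aligner” is a stand-in assumption; real aligners tolerate mismatches and gaps.

```python
from collections import Counter

def align_and_count(reads, reference):
    """Single pass: place each read, then update coverage immediately.

    Toy stand-in for a real aligner: exact matches only, first occurrence only;
    reads with no exact match are counted as unmapped.
    """
    coverage = Counter()
    unmapped = 0
    for read in reads:
        start = reference.find(read)  # "alignment" step
        if start < 0:
            unmapped += 1
            continue
        for offset in range(len(read)):  # "analysis" step, in the same pass
            coverage[start + offset] += 1
    return coverage, unmapped

reference = "ACGTACGGTCA"
reads = ["ACGT", "GTAC", "GGTC", "TTTT"]  # hypothetical reads
coverage, unmapped = align_and_count(reads, reference)
print(sorted(coverage.items()))
print(unmapped)  # 1
```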

Announcing ChIP-Seq Support

November 9th, 2009 by Richard Resnick

We’ve released our ChIP-Seq workflow this week, available to anyone with a Free Basic Account inside GenomeQuest. Like all of our NGS workflows, it runs in two basic steps: a mapping step and a downstream analysis step. In this case, of course, the downstream analysis is a peak-finding algorithm. We chose the MACS software for peak modeling. (You can see the entire workflow’s documentation here.) Integrated into the GenomeQuest Sequence Data Management platform, it outputs a heavily annotated sequence database, which can then be interactively filtered, grouped, sorted, and mined for peaks of interest. And this can all be connected to your RNA-Seq and resequencing data to get the global picture.
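For intuition about what a peak-finder does, here is a toy window-based test in the spirit of the Poisson background model that MACS is known for. It is not MACS: it omits MACS’s fragment-size modeling, local lambda, and control-sample handling, and the window size, cutoff, and read positions below are hypothetical. Each window’s read count is compared with the count expected if reads were spread uniformly across the genome.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term to avoid overflow."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term             # accumulate P(X = i)
        term *= lam / (i + 1)   # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)  # clamp tiny negative rounding error

def call_peaks(read_starts, genome_length, window=100, p_cutoff=1e-5):
    """Return (window_start, count, p) for windows enriched over a uniform background."""
    lam = len(read_starts) * window / genome_length  # expected reads per window
    counts = [0] * math.ceil(genome_length / window)
    for pos in read_starts:
        counts[pos // window] += 1
    peaks = []
    for i, count in enumerate(counts):
        p = poisson_sf(count, lam)
        if p < p_cutoff:
            peaks.append((i * window, count, p))
    return peaks

# Hypothetical data: one background read per window plus 30 reads piled at 5,000.
background = [i * 100 + 50 for i in range(100)]
enriched = [5_000 + j for j in range(30)]
peaks = call_peaks(background + enriched, genome_length=10_000)
print([(start, count) for start, count, _ in peaks])  # [(5000, 31)]
```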

So now researchers can go from their ChIP-Seq NGS runs directly to gene-based annotation of the peaks found by their biology. Select regions of interest, or genes of interest, or peaks of a certain class, and drill down to see the actual evidence that backs up the call.
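One common way to attach gene-based annotation to a peak is to assign it to the nearest transcription start site (TSS). The sketch below assumes a single chromosome and a hypothetical, pre-sorted gene list; it is a generic nearest-neighbor lookup, not GenomeQuest’s annotation logic.

```python
import bisect

# Hypothetical TSS positions on one chromosome, sorted by position.
genes = [("GENE_A", 1_000), ("GENE_B", 8_000), ("GENE_C", 20_000)]
tss_positions = [pos for _, pos in genes]

def nearest_gene(peak_center):
    """Return (gene, signed distance from its TSS) for the TSS closest to a peak."""
    i = bisect.bisect_left(tss_positions, peak_center)
    # Only the TSS just before and just after the peak can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(genes)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - peak_center))
    name, tss = genes[best]
    return name, peak_center - tss

for center in (1_200, 7_500, 30_000):
    print(center, nearest_gene(center))
```

The binary search keeps each lookup logarithmic in the number of genes, which matters once thousands of peaks are annotated against a whole genome.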

We’re giving away free ChIP-Seq runs to the first 100 people to sign up.

As always, feel free to leave a comment – we read every one.

Richard J. Resnick
VP Software and Services

Cloud now?

October 19th, 2009 by Ron Ranauro

At the CHI NGS conference, I chaired a roundtable of key managers and influencers discussing the opportunities and challenges of adopting “cloud computing” for NGS applications. As a first observation, the session was well attended and people are thinking deeply about cloud issues. About 16 people participated, including representatives from major pharmaceutical companies, agroscience, major medical research core labs, and the NIH.

Here is a transcript of my notes from the roundtable:

  1. Some felt end-users increasingly accept the privacy of their data in the hands of a secure cloud provider. Others remarked it remains uncomfortable for some end users who worry when they “don’t know where their data lives”. The roundtable agreed that more end-user education is needed.
  2. Data transfer from the location of data generation to where it is processed remains a bottleneck. However, the problem is more on the upload side, since cloud providers tend to have effectively unlimited bandwidth. Corporate and institution-wide networks will have to improve to remedy this bottleneck.
  3. Software application providers will have to develop metering metrics for licensing their applications on cloud resources that can be de-commissioned.
  4. Cloud resources should allow for moving data between desktop applications and the centralized resources.
  5. Commissioning and de-commissioning fixed resources such as databases can be an issue.
  6. From a clinical applicability perspective, cloud providers (and those who run applications on cloud resources) will have to consider how to make their solution suitable for regulatory approval and auditing.
  7. Finally, if an application provider such as GenomeQuest uses a commercial cloud provider such as Amazon EC2, the participants agreed that the application provider, not Amazon, is accountable for the security, privacy, and overall robustness of the IT.
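The upload bottleneck in item 2 is easy to quantify. With a hypothetical 100 GB run (the size and link speeds are assumptions for illustration), the time to move it scales inversely with uplink speed:

```python
def transfer_hours(gigabytes, megabits_per_second, efficiency=1.0):
    """Hours to move a dataset over a link; efficiency < 1 models protocol overhead."""
    megabits = gigabytes * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (megabits_per_second * efficiency)
    return seconds / 3600

# Hypothetical run size and uplink speeds.
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbit/s: {transfer_hours(100, mbps):.1f} h for 100 GB")
```

At 10 Mbit/s that is most of a day (about 22 hours); at 1 Gbit/s it drops to roughly 13 minutes, before any real-world overhead.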

My takeaways? Cloud computing is becoming viable in the minds of the industry. A few solvable roadblocks remain. With infinite computing and infinite data, managing the data and turning it into insight remains the challenge and the opportunity.

APIs + Sequence Data Management = Haplotype Tables

September 30th, 2009 by Richard Resnick

We’ve heard lots of requests from customers not only to provide them with powerful methods for detecting variants across multiple experiments (or phenotypes, or organisms, or lines), but also to unify all of this data to find knowledge that spans these experiments.

Of course we have our variant calling workflow, just as we integrate with other variant calling workflows. All of these produce GenomeQuest-native browsable, mineable, and queryable databases. And because of the GQ Engine, we can easily combine sets of 10s or 100s or even 1,000s of these variant databases into a single queryable entity with “web-speed query performance.”
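The GQ Engine’s internals aren’t described here, but one generic way to present many sorted per-experiment variant sets as a single queryable stream is a k-way merge. In this sketch the three “databases” are hypothetical in-memory lists already sorted by (chromosome, position); chromosome names are compared as plain strings, which is good enough for the illustration.

```python
import heapq

# Hypothetical per-experiment variant records, each list sorted by (chrom, pos).
exp1 = [("chr1", 100, "A"), ("chr1", 500, "G")]
exp2 = [("chr1", 100, "A"), ("chr2", 50, "T")]
exp3 = [("chr1", 300, "C")]

def merged_view(*databases):
    """Lazily stream all records in global (chrom, pos) order without copying them."""
    return heapq.merge(*databases, key=lambda rec: (rec[0], rec[1]))

for record in merged_view(exp1, exp2, exp3):
    print(record)
```

Because `heapq.merge` only holds one pending record per input, the cost stays modest even when hundreds of sets are combined.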

Nevertheless, while our customers get the benefit of the combined data, they often ask for more. So today I jumped into the APIs of GenomeQuest and tried to address the simple problem of building a table of SNPs that span a series of experiments. Each SNP should have the specific allele called for each experiment in which it was found. It is a simple little table designed to be the input into any of a number of linkage disequilibrium mapping packages. I made a GQ Plug-in: 5 lines of code to make it accessible in the user interface, and another 100 lines of code (I’m wordy) on the back end to build the table and present it. And so, the multi-experiment haplotype table is born. I might even convince the development team to include it in our next live push.
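The plug-in code itself isn’t shown, so here is a plain-Python sketch of the table it describes, using hypothetical per-experiment SNP calls: one row per SNP, one column per experiment, and a placeholder where an experiment has no call. It doesn’t target any particular LD package’s input format, and it doesn’t use the GenomeQuest APIs.

```python
# Hypothetical per-experiment SNP calls: experiment -> {(chrom, pos): allele}.
calls = {
    "exp1": {("chr1", 100): "A", ("chr1", 500): "G"},
    "exp2": {("chr1", 100): "T"},
    "exp3": {("chr1", 500): "G", ("chr2", 50): "C"},
}

def haplotype_table(calls_by_experiment, missing="."):
    """One row per SNP, one column per experiment; `missing` marks no call."""
    experiments = sorted(calls_by_experiment)
    snps = sorted({snp for snp_calls in calls_by_experiment.values() for snp in snp_calls})
    header = ["chrom", "pos", *experiments]
    rows = [
        [chrom, pos, *(calls_by_experiment[e].get((chrom, pos), missing) for e in experiments)]
        for chrom, pos in snps
    ]
    return header, rows

header, rows = haplotype_table(calls)
print("\t".join(header))
for row in rows:
    print("\t".join(map(str, row)))
```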

If you want to hear more or check out the code, drop me a line.

Richard J. Resnick
VP Software and Services