Programming the cloud

If you are a developer or a technical type, this one is for you.

Over at Depth-First there is a blog post about an application in the cheminformatics field: PubCouch: Streams aren’t just for Pipeline Pilot. The author illustrates how a well-abstracted Web service avoids the costly database Extract-Transform-Load (ETL) operations so familiar in life science development. In the example, the author streams the entire contents of the PubChem FTP server to PubCouch, a Web service based on CouchDB, a NoSQL-style document-oriented database. CouchDB doesn’t rely on a predefined relational schema; instead, it computes the PubChem relationships “on-the-fly” using an approach based on MapReduce.
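To make the MapReduce idea concrete, here is a minimal Python sketch of how a relationship can be computed from a stream of documents without a load step into fixed tables. It is not PubCouch's code, and CouchDB views are normally written as JavaScript functions stored in the database. The document fields ("_id", "formula"), the IDs, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

def map_formula(doc):
    """Map step: emit (key, value) pairs for a single document."""
    formula = doc.get("formula")  # hypothetical field name
    if formula is not None:
        yield formula, 1

def reduce_count(values):
    """Reduce step: combine all values emitted under one key."""
    return sum(values)

def run_view(docs, map_fn, reduce_fn):
    """Apply map_fn to each streamed document, group by key, then reduce."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

def stream_docs():
    """Stand-in for a document stream; records are illustrative only."""
    yield {"_id": "doc-1", "formula": "C6H6"}
    yield {"_id": "doc-2", "formula": "C2H6O"}
    yield {"_id": "doc-3", "formula": "C2H6O"}
    yield {"_id": "doc-4"}  # no formula field, so the map step skips it

counts = run_view(stream_docs(), map_formula, reduce_count)
print(counts)
```

The point of the sketch is that the "relationship" (here, how many documents share a molecular formula) lives in the map and reduce functions, not in a schema, so documents can be streamed in as they are and a new question only needs a new pair of functions.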

So what, you say?

The vision is this: modern Web-based programming (aka RESTful architecture) hides the details of massive data and computing resources, so programmers can focus on “what to do” rather than “how to do it,” and that increases productivity.

GenomeQuest’s developers have thought deeply about what a scalable computational biology engine should look like in the cloud-based, MapReduce paradigm. If you want to read a primer on the GQ Engine, feel free to check it out.

Soon, we’ll publish the full-blown URL API so that large-scale biological data and computation can be assembled from any Internet-connected desktop, using the language of the Web. A command-line interface to our Web services can be found here.

A final remark: Deepak Singh from business|bytes|genes|molecules wonders aloud what role Pipeline Pilot has in this new programming paradigm. I’m guessing that within a single domain the value proposition might be limited, but across domains these tools will continue to solve even bigger problems by leveraging better-designed Web services.
