<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GenomeQuest Industry &#187; Cloud Computing</title>
	<atom:link href="http://blog.genomequest.com/category/cloud-computing/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.genomequest.com</link>
	<description>Conversations on the convergence of SDM, cloud computing, and applications to personalized medicine</description>
	<lastBuildDate>Thu, 12 Jan 2012 23:33:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How fast is your read mapping algorithm?</title>
		<link>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/</link>
		<comments>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 07:25:38 +0000</pubDate>
		<dc:creator>Henk Heus</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Bowtie]]></category>
		<category><![CDATA[BWA]]></category>
		<category><![CDATA[GASSST]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[GenomeQuest Engine]]></category>
		<category><![CDATA[GQ-Engine]]></category>
		<category><![CDATA[Next Generation Sequencing]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=379</guid>
		<description><![CDATA[
This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.
The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;
Which [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.</p>
<p>The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;</p>
<p>Which nicely transitions me into the second phase of the answer. This phase is much more rigorous and usually starts with: &#8220;Well, it depends. Let me try to explain.&#8221;:</p>
<ul>
<li>Any good computer scientist can write a mapping algorithm that is really fast. However, that doesn&#8217;t mean anything unless it produces the kind of results that are needed. The NGS application you&#8217;re working with matters here. For example, finding genetic variation in human disease requires very accurate mapping with mismatches and indels. In contrast, with digital gene expression in maize you can cut a lot of corners. You just need to have a rough idea of the number of alignments on a transcript.</li>
<li>Then there is the matter of the data and the technology that produced it. Do you have long reads (more than 120 bp)? Will your mapper handle them? Will it also increase the number of mismatches / indels it can use to align a read? Will it significantly slow down execution, or eat up all your memory when reads get longer? Does this mapper also support local alignments when you need them? Will it align in colorspace? In paired end mode? I could go on.</li>
<li>Next there is the matter of connecting results to the following step in your pipeline. How long does it take to de-duplicate a 1TB alignment file in SAM/BAM format? Or to find those alignments whose positions overlap with your exome capture experiment, or with all dbSNP entries? At GenomeQuest we have a very efficient way of storing and handling alignments (including sequences/annotation). This saves real time, especially when compared to the alignment step itself (it saves a lot of disk space as well, by the way).</li>
</ul>
<p>By the time we get to the third phase of the answer I&#8217;m usually much more confident: &#8220;Well, how fast do you need our mapping algorithm to be?&#8221;.</p>
<ul>
<li>Does it really matter how fast the read mapper is, as long as it&#8217;s comparable in performance to other algorithms for most common use cases? Does it matter if you have the alignments in 2.5 hours instead of 3? Maybe if you analyze thousands of samples per week it matters, but then other things like reliability and professional software support should matter as well.</li>
<li>Do these other algorithms scale with the hardware you throw at them? How easy is it to run a read mapping on 64 compute nodes, each with 2 CPUs and 8 cores per CPU? What if you double the amount of hardware? Will you go twice as fast? With the GQ-Engine you will. Want to run on 1024 nodes? That&#8217;s possible.</li>
<li>Are you asking about speed for a single run, or the throughput for a bunch of runs? Last weekend I ran 2000 NGS read databases through our read mapping workflow (low coverage genome sequencing, about 80M reads per database). I started them on Friday afternoon, went for drinks with my friends that evening, had a nice family dinner on Saturday afternoon, and watched a movie with my kid afterwards. The runs were finished before I woke up on Sunday. No hiccups, no failed runs, no logs to monitor, and &#8211; best of all &#8211; no &#8220;one million-file&#8221; directories to organize. There were a lot of other customers on the system that weekend, doing their NGS analysis as well.</li>
</ul>
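<p>To put the hardware and throughput numbers above in perspective, here is a minimal Python sketch of the arithmetic. It assumes ideal linear scaling (twice the nodes, half the runtime), and the 3-hour baseline on 64 nodes is a hypothetical figure chosen for illustration, not a benchmark. The function names are made up for this example.</p>

```python
def total_cores(nodes, cpus_per_node=2, cores_per_cpu=8):
    """Cores available on a cluster of identical nodes."""
    return nodes * cpus_per_node * cores_per_cpu


def ideal_runtime_hours(baseline_hours, baseline_nodes, nodes):
    """Runtime under ideal linear scaling (an assumption, not a measurement)."""
    return baseline_hours * baseline_nodes / nodes


cores_on_64_nodes = total_cores(64)
# Hypothetical baseline: 3 hours on 64 nodes, then double the hardware.
runtime_on_128_nodes = ideal_runtime_hours(3.0, 64, 128)
# Weekend batch from the text: 2000 databases of about 80M reads each.
weekend_reads = 2000 * 80_000_000

print("cores on 64 nodes:", cores_on_64_nodes)
print("ideal runtime on 128 nodes (hours):", runtime_on_128_nodes)
print("reads in the weekend batch:", weekend_reads)
```

<p>Real mappers rarely scale perfectly, so treat the ideal figure as an upper bound on what extra hardware can buy.</p>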
<p>If we ever meet for a demo, please ask me this question. I love to talk about it.</p>
<p>Henk Heus, Ph.D.<br />
VP Product Management &amp; Services<br />
GenomeQuest Inc.</p>
<p>At GenomeQuest we use an extended version of the GASSST read mapping algorithm (among others). Read about it in Bioinformatics here: <a title="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" href="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" target="_self">http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Good Crowd at GQ Harvard Seminar</title>
		<link>http://blog.genomequest.com/2010/06/good-crowd-at-gq-harvard-seminar/</link>
		<comments>http://blog.genomequest.com/2010/06/good-crowd-at-gq-harvard-seminar/#comments</comments>
		<pubDate>Wed, 23 Jun 2010 15:43:25 +0000</pubDate>
		<dc:creator>Tony Flynn</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[GenomeQuest]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=241</guid>
		<description><![CDATA[Last week, GenomeQuest held our &#8220;The Next Generation of Sequence Analysis&#8221; seminar for Harvard-based researchers. It was sponsored by Bob Steen, manager of the <a title="a" href="http://genome.med.harvard.edu/" target="_blank">Harvard Biopolymers Facility</a>.
According to Bob, it was the 2nd largest crowd ever for his seminars and the largest ever for a software topic &#8212; an indicator that researchers [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, GenomeQuest held our &#8220;The Next Generation of Sequence Analysis&#8221; seminar for Harvard-based researchers. It was sponsored by Bob Steen, manager of the <a title="a" href="http://genome.med.harvard.edu/" target="_blank">Harvard Biopolymers Facility</a>.</p>
<p>According to Bob, it was the 2nd largest crowd ever for his seminars and the largest ever for a software topic &#8212; an indicator that <strong>researchers are indeed planning for NGS and eager for answers to their &#8220;information bottleneck&#8221;</strong>.</p>
<p>Over 80 principal investigators, postdocs, and MDs attended from Harvard hospitals including Beth Israel Deaconess, Children&#8217;s Hospital, Dana-Farber, Brigham and Women&#8217;s, and Mass General, as well as Harvard Medical School.</p>
<p>Based on the questions, most researchers were interested in the RNA-Seq and Variant Detection applications. Richard Resnick stressed GQ&#8217;s cloud-based storage/analysis of results, the GQ Browser for whole-genome analysis, as well as cloud-sharing of results.</p>
<p>Harvard bioinformaticians and computational biologists are welcome to a follow-up seminar or training on the GQ API.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/06/good-crowd-at-gq-harvard-seminar/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OK to move the data 1 time</title>
		<link>http://blog.genomequest.com/2010/05/ok-to-move-the-data-1-time/</link>
		<comments>http://blog.genomequest.com/2010/05/ok-to-move-the-data-1-time/#comments</comments>
		<pubDate>Mon, 17 May 2010 19:57:53 +0000</pubDate>
		<dc:creator>Richard Resnick</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Informatics Industry]]></category>
		<category><![CDATA[SDM]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=211</guid>
		<description><![CDATA[<a href="http://www.oicr.on.ca/research/stein.htm">Lincoln Stein</a> lays out &#8220;<a href="http://genomebiology.com/2010/11/5/207">The case for cloud computing in genome informatics</a>&#8221; pretty nicely. The article describes the inflection point of sequencing technology. That is, from 1990 to 2004 &#8216;base-pair/$&#8217; doubled every 19 months, versus every 5 months from 2004 to the present. There is no end in sight.
Moving data to the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.oicr.on.ca/research/stein.htm">Lincoln Stein</a> lays out &#8220;<a href="http://genomebiology.com/2010/11/5/207">The case for cloud computing in genome informatics</a>&#8221; pretty nicely. The article describes the inflection point of sequencing technology. That is, from 1990 to 2004 &#8216;base-pair/$&#8217; doubled every 19 months, versus every 5 months from 2004 to the present. There is no end in sight.</p>
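<p>The two doubling times are easier to compare as yearly growth factors. The short Python sketch below does that conversion; it assumes smooth exponential growth, and the function name is made up for this example.</p>

```python
def growth_factor(months, doubling_months):
    """Growth over `months` for a quantity that doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


# Yearly growth in base pairs per dollar under each doubling time from the article.
before_2004 = growth_factor(12, 19)
after_2004 = growth_factor(12, 5)

print(f"19-month doubling: {before_2004:.2f}x per year")
print(f"5-month doubling: {after_2004:.2f}x per year")
```

<p>A 5-month doubling time works out to a bit more than a fivefold improvement per year, against roughly 1.5-fold per year before 2004.</p>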
<p>Moving data to the cloud remains the biggest obstacle to cloud adoption. The article makes the <a href="http://www.genomeweb.com/node/940668/?hq_e=el&amp;hq_m=717858&amp;hq_l=13&amp;hq_v=c092034955">case for moving the computation to the data</a> instead of vice versa. This presupposes, however, that people will be willing to move their data at least 1 time. Otherwise, &#8220;moving the computation to the data&#8221; is an argument for building and maintaining a local compute cluster near the sequencing instrument.</p>
<p>At GQ we realize there is no &#8220;one size fits all&#8221;. We support a hosted, cloud-based solution for customers with limited IT expertise or inclination. Or, for lab operations running multiple sequencing instruments, we can install the cloud locally so you benefit from the bandwidth of the Local Area Network connecting the instrument to computing. Either way, data moves only 1 time and GQ solves for scalability with its algorithms and data model.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/05/ok-to-move-the-data-1-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perhaps the Biggest (Unintended) Consequence to the Health Care Bill</title>
		<link>http://blog.genomequest.com/2010/04/perhaps-the-biggest-unintended-consequence-to-the-health-care-bill/</link>
		<comments>http://blog.genomequest.com/2010/04/perhaps-the-biggest-unintended-consequence-to-the-health-care-bill/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 23:00:45 +0000</pubDate>
		<dc:creator>Tony Flynn</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Implications for Society]]></category>
		<category><![CDATA[Personalized Medicine]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=188</guid>
		<description><![CDATA[The most thoughtful folks in the health care industry acknowledge that the future will be defined by molecular (aka personalized) medicine.  Without being infinitely tedious, it will be a matter of measuring your body&#8217;s instructions (DNA) and your present state (RNA/proteins) and prescribing a course of treatment with the most likely positive outcome and [...]]]></description>
			<content:encoded><![CDATA[<p>The most thoughtful folks in the health care industry acknowledge that the future will be defined by molecular (aka personalized) medicine.  Without being infinitely tedious, it will be a matter of measuring your body&#8217;s instructions (DNA) and your present state (RNA/proteins) and prescribing a course of treatment with the most likely positive outcome and least likely negative outcome(s).</p>
<p>Pretty simple, right?</p>
<p>Well, the main challenge is that personalized medicine (PM) is all about comparing you to the history of human results.  That&#8217;s right, molecular biologists know much about the concepts of DNA/RNA/proteins but they are now in deep learning mode about what causes what at a molecular level.  And they are learning mostly by observation &#8212; that is, what happens to a person of type X when we do Y.</p>
<p>So the world of health care is collectively building a PM knowledge base and we wish for doctors to act upon it.</p>
<p>And while today&#8217;s knowledge base is small, if people are willing to contribute, it could swell to something very meaningful in but a few years.  It&#8217;s a massively exciting time for medicine (what with the cost of genome sequencing crashing by 5X every year and cloud computing enabling global sharing of all this).</p>
<p>So a major impediment to this nirvana is that folks today are reluctant to share: &#8220;but what if something worse is exposed&#8221;.  With worries that &#8220;existing conditions&#8221; (EC) will result in a lifelong insurance ban, it’s a reasonable objection.</p>
<p>Well, with a nation-wide health care plan with EC worries removed, people will be far more open to sharing their data &#8212; which will materially accelerate our inexorable move to PM and its considerable social and economic rewards.</p>
<p>I offer this perspective as an (unintended?) consequence of our national health care program and progress that we all can applaud.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/04/perhaps-the-biggest-unintended-consequence-to-the-health-care-bill/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Industry confusing &#8220;Cloud&#8221; with &#8220;Infrastructure&#8221;</title>
		<link>http://blog.genomequest.com/2010/04/industry-confusing-cloud-with-infrastructure/</link>
		<comments>http://blog.genomequest.com/2010/04/industry-confusing-cloud-with-infrastructure/#comments</comments>
		<pubDate>Sun, 04 Apr 2010 23:42:20 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[SDM]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=181</guid>
		<description><![CDATA[Earlier I <a href="http://blog.genomequest.com/?p=154">blogged</a> on the distinctions between Infrastructure, Platform, and Software-as-a-Service offerings. The message was that &#8220;cloud&#8221; is an overloaded word and takes many forms and has different customer value propositions.
A recent commentary in GenomeWeb &#8220;<a href="http://www.genomeweb.com/blog/considering-cloud-cost-isnt-everything">Considering a Cloud? Cost isn&#8217;t everything&#8230;</a>&#8221; citing the paper &#8220;<a href="http://www.genomeweb.com/sites/default/files/walker.pdf">The Real  Cost of a CPU Hour</a>&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier I <a href="http://blog.genomequest.com/?p=154">blogged</a> on the distinctions between Infrastructure, Platform, and Software-as-a-Service offerings. The message was that &#8220;cloud&#8221; is an overloaded word and takes many forms and has different customer value propositions.</p>
<p>A recent commentary in GenomeWeb &#8220;<a href="http://www.genomeweb.com/blog/considering-cloud-cost-isnt-everything">Considering a Cloud? Cost isn&#8217;t everything&#8230;</a>&#8221; citing the paper &#8220;<a href="http://www.genomeweb.com/sites/default/files/walker.pdf">The Real Cost of a CPU Hour</a>&#8221; illustrates the confusion. The paper benchmarks <a href="http://en.wikipedia.org/wiki/High-performance_computing">HPC</a> on a dedicated cluster versus bare metal resources on Amazon EC2. The conclusion is foregone since the commodity EC2 network can&#8217;t keep up with tuned applications running on a compute cluster with a high-speed network.</p>
<p>A more informative title for the blogger and the article would be: &#8220;Considering Infrastructure-As-A-Service? Beware if your application requires Message Passing Interface and therefore High-Speed Network&#8221;.</p>
<p>Fortunately, for large-scale Sequence Data Management the predominant application mode is &#8220;embarrassingly parallel&#8221; and therefore HPC resources are not needed, except maybe on the Web Server where response time is critical. The statistics in the article show that Embarrassingly Parallel applications run acceptably well on Amazon EC2, adding less than 5% overhead to the computations.</p>
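<p>For readers unfamiliar with the term, the Python sketch below shows what &#8220;embarrassingly parallel&#8221; means in practice: each read is processed independently, so workers never need to exchange messages. It is a toy exact-match lookup against a made-up reference string, not the GQ Engine or any real read mapper, and all names in it are hypothetical.</p>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy reference sequence, for illustration only.
REFERENCE = "ACGTACGTTAGCCGATAGCTAGGCTA"


def map_read(read):
    """Return (read, position of first exact match), or -1 when there is none."""
    return read, REFERENCE.find(read)


def map_reads(reads, workers=4):
    """Map every read independently; no worker depends on another's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input reads.
        return list(pool.map(map_read, reads))


results = map_reads(["ACGT", "TAGC", "GGGG"])
print(results)
```

<p>Because no task waits on another, the same pattern spreads across machines without a high-speed interconnect, which is why commodity networking is good enough for this kind of workload.</p>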
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/04/industry-confusing-cloud-with-infrastructure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Developers Wanted</title>
		<link>http://blog.genomequest.com/2010/03/bioinformatics-and-compuational-biologists-wanted/</link>
		<comments>http://blog.genomequest.com/2010/03/bioinformatics-and-compuational-biologists-wanted/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 23:19:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[SDM]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=159</guid>
		<description><![CDATA[Today, we launched our <a href="http://www.genomequest.com/GenomeQuest-opens-API-for-SDM.xhtml">APIs for Sequence Data Management</a> on the cloud.
So what?
GenomeQuest is now for bioinformaticians and computational biologists (we call them developers for short). These are people who prefer to write code in Unix, and prefer awk, perl, and sed to Firefox, Internet Explorer, Safari, or Chrome.
So why is that important?
With no [...]]]></description>
			<content:encoded><![CDATA[<p>Today, we launched our <a href="http://www.genomequest.com/GenomeQuest-opens-API-for-SDM.xhtml">APIs for Sequence Data Management</a> on the cloud.</p>
<p>So what?</p>
<p>GenomeQuest is now for bioinformaticians and computational biologists (we call them <em>developers</em> for short). These are people who prefer to write code in Unix, and prefer awk, perl, and sed to Firefox, Internet Explorer, Safari, or Chrome.</p>
<p>So why is that important?</p>
<p>With no up-front investment, developers can use the <a href="http://wiki.genomequest.com/index.php/DeveloperAPIOverview">GQ API</a> to write large-scale sequence comparison applications for the cloud without regard for the details of the computing and the reference data. And, they can publish their applications to the <a href="http://www.genomequest.com/basic-registration/">GenomeQuest Web Application</a>, so research biologist customers can use and reuse them.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/03/bioinformatics-and-compuational-biologists-wanted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An SDM Cloud?</title>
		<link>http://blog.genomequest.com/2010/03/an-sdm-cloud/</link>
		<comments>http://blog.genomequest.com/2010/03/an-sdm-cloud/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 23:05:33 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Informatics Industry]]></category>
		<category><![CDATA[SDM]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=154</guid>
		<description><![CDATA[Executives in the industry sometimes ask me if we are moving our software to the “cloud”. When I say we are already a cloud, then they wonder: “<a href="http://aws.amazon.com/ec2/">then what is Amazon offering</a>?”
It&#8217;s helpful to think of the cloud as a layered architecture. The VC firm <a href="http://www.bvp.com/Default.aspx">Bessemer</a> provides a nice definition of this layering.


Infrastructure-as-a-Service: Web-services [...]]]></description>
			<content:encoded><![CDATA[<p>Executives in the industry sometimes ask me if we are moving our software to the “cloud”. When I say we are already a cloud, then they wonder: “<a href="http://aws.amazon.com/ec2/">then what is Amazon offering</a>?”</p>
<p>It&#8217;s helpful to think of the cloud as a layered architecture. The VC firm <a href="http://www.bvp.com/Default.aspx">Bessemer</a> provides a nice definition of this layering.</p>
<p><img class="aligncenter" title="Bessemer definition of &quot;cloud&quot;" src="http://www.bvp.com/uploadedImages/About/Investment_Practice/Cloud%20Computing%20Ecosystem(1).gif" alt="" width="347" height="249" /></p>
<ul>
<li>Infrastructure-as-a-Service: Web-services application framework for provisioning computers, networks, and storage resources without regard to the actual hardware topology. Examples of IaaS are Amazon EC2 and <a href="http://www.eweek.com/c/a/IT-Infrastructure/SGI-Offers-Cyclone-Cloud-Computing-for-HPC-810779/">SGI</a>.</li>
</ul>
<ul>
<li>Platform-as-a-Service:  Web-services application framework for provisioning data, algorithm, and analysis services and developing new applications. Leveraging all the benefits of the IaaS layer, the <a href="http://wiki.genomequest.com/index.php/DeveloperAPIOverview">GenomeQuest Developer API Framework</a> provides services for managing, comparing, and mining sequence databases without regard to the underlying computers and storage.</li>
</ul>
<ul>
<li>Software-as-a-Service: A finished, end-to-end Web application for a given end-user task. Examples of this in the Life Science space include the <a href="http://www.ingenuity.com/">Ingenuity</a> IPA product and the GenomeQuest 6.3 SDM platform.</li>
</ul>
<p>To my knowledge, we have the industry&#8217;s only cloud-based &#8220;PaaS&#8221;. With a few commands from a remote desktop, developers can upload data and apply algorithms on hundreds of cores while accessing terabytes of well-managed sequence data. In the future, we intend to leverage the extraordinary economies of scale offered by IaaS providers such as Amazon.</p>
<p>If you&#8217;re curious about the APIs, take a look <a href="http://wiki.genomequest.com/index.php/GenomeQuest_Documentation#Developer_API">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/03/an-sdm-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming the cloud</title>
		<link>http://blog.genomequest.com/2010/01/programming-the-cloud/</link>
		<comments>http://blog.genomequest.com/2010/01/programming-the-cloud/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 20:23:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[Informatics Industry]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=131</guid>
		<description><![CDATA[If you are a developer or a technical type, this one is for you.
Over at <a href="http://depth-first.com/">Depth-First</a> there is a blog post about an application in the cheminformatics field: <a href="http://depth-first.com/articles/2010/01/29/pubcouch-streams-arent-just-for-pipeline-pilot">PubCouch: Streams aren&#8217;t just for Pipeline Pilot.</a> The author illustrates how a well abstracted Web service avoids the costly database <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load">Extract-Transform-Load</a> operations so familiar [...]]]></description>
			<content:encoded><![CDATA[<p>If you are a developer or a technical type, this one is for you.</p>
<p>Over at <a href="http://depth-first.com/">Depth-First</a> there is a blog post about an application in the cheminformatics field: <a href="http://depth-first.com/articles/2010/01/29/pubcouch-streams-arent-just-for-pipeline-pilot">PubCouch: Streams aren&#8217;t just for Pipeline Pilot.</a> The author illustrates how a well abstracted Web service avoids the costly database <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load">Extract-Transform-Load</a> operations so familiar to most life science development. In the example, the author streams the entire contents of the PubChem FTP server to PubCouch, a web-service based on the <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL</a>-style document-oriented database CouchDB. CouchDB doesn&#8217;t rely on a relational database; instead it computes the PubChem relationships &#8220;on-the-fly&#8221; using an approach based on <a href="http://en.wikipedia.org/wiki/Mapreduce">MapReduce</a>.</p>
<p>So what, you say?</p>
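<p>If MapReduce is new to you, the plain-Python sketch below shows the general idea on three made-up documents: a map step emits key/value pairs per document, and a reduce step combines the values for each key. This is only an illustration of the pattern. It is not CouchDB&#8217;s view API (CouchDB views are typically written in JavaScript), and the documents and function names are hypothetical.</p>

```python
from collections import defaultdict

# Hypothetical documents, standing in for records in a document store.
documents = [
    {"id": "doc1", "tags": ["kinase", "inhibitor"]},
    {"id": "doc2", "tags": ["kinase"]},
    {"id": "doc3", "tags": ["inhibitor", "antibiotic"]},
]


def map_doc(doc):
    """Map step: emit one (tag, 1) pair for every tag on a document."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = [pair for doc in documents for pair in map_doc(doc)]
tag_counts = reduce_counts(pairs)
print(tag_counts)
```

<p>Since the map step looks at one document at a time, it can run over many documents in parallel before the results are combined.</p>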
<p>The vision is this: Since modern Web-based programming (aka <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">RESTful</a> architecture) hides the details of massive data and computing resources, programmers can focus on &#8220;what to do&#8221; and not &#8220;how to do it&#8221; and that increases productivity.</p>
<p>GenomeQuest&#8217;s developers have thought deeply about what a scalable computational biology engine should look like in the cloud-based, MapReduce paradigm. If you want to read a primer on the GQ Engine, <a href="http://wiki.genomequest.com/index.php/GQEnginePrimer">feel free to check it out</a>.</p>
<p>Soon, we&#8217;ll publish the full-blown <a href="http://wiki.genomequest.com/index.php/URL_API">URL API</a> so that large-scale biological data and computation can be assembled from any Internet connected desktop, using the language of the Web. A command line interface to our Web-services can be found <a href="http://wiki.genomequest.com/index.php/HTxReferenceManual">here</a>.</p>
<p>A final remark: Deepak Singh from <a href="http://mndoci.com/">business|bytes|genes|molecules</a> wonders aloud <a href="http://mndoci.com/2010/01/29/pubchem-couchdb-and-data-pipelines/">what is the role of Pipeline Pilot</a> in this new programming paradigm? I&#8217;m guessing within a domain, the value proposition might be limited, but across domains these tools will continue to be able to solve even bigger problems by leveraging better designed Web-services.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/01/programming-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>So, what&#8217;s the argument for cloud computing?</title>
		<link>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/</link>
		<comments>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 20:48:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Informatics Industry]]></category>
		<category><![CDATA[SDM]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[NGS software]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=122</guid>
		<description><![CDATA[A plot of the <a href="http://images.google.com/imgres?imgurl=http://www.mocom2020.com/data/2009/05/computer-power-future.gif&#38;imgrefurl=http://www.mocom2020.com/2009/05/evolution-of-computer-capacity-and-costs/&#38;usg=__nNgm1nlSJX4QpgaOrsZQnHi0TjM=&#38;h=768&#38;w=1205&#38;sz=116&#38;hl=en&#38;start=1&#38;um=1&#38;tbnid=HchqHMRDWWHr9M:&#38;tbnh=96&#38;tbnw=150&#38;prev=/images%3Fq%3Dcost%2Bof%2Bcomputing%2Bpower%26hl%3Den%26sa%3DN%26um%3D1">Evolution of Computer Capacity and Costs</a> shows that compute power will be 1,000X cheaper in 10 years. How much lower can it go? As this happens, the relative cost of managing another computer approaches zero, regardless of whether it&#8217;s hosted internally or externally. I don&#8217;t think there is [...]]]></description>
			<content:encoded><![CDATA[<p>A plot of the <a href="http://images.google.com/imgres?imgurl=http://www.mocom2020.com/data/2009/05/computer-power-future.gif&amp;imgrefurl=http://www.mocom2020.com/2009/05/evolution-of-computer-capacity-and-costs/&amp;usg=__nNgm1nlSJX4QpgaOrsZQnHi0TjM=&amp;h=768&amp;w=1205&amp;sz=116&amp;hl=en&amp;start=1&amp;um=1&amp;tbnid=HchqHMRDWWHr9M:&amp;tbnh=96&amp;tbnw=150&amp;prev=/images%3Fq%3Dcost%2Bof%2Bcomputing%2Bpower%26hl%3Den%26sa%3DN%26um%3D1">Evolution of Computer Capacity and Costs</a> shows that compute power will be 1,000X cheaper in 10 years. How much lower can it go? As this happens, the relative cost of managing another computer approaches zero, regardless of whether it&#8217;s hosted internally or externally. I don&#8217;t think there is an economic argument that shows everyone belongs on the cloud based just on hardware and system administration cost.</p>
<p>Dave Dooling at <a href="http://www.politigenomics.com/2010/01/cloudy-with-a-chance-of-sunshine.html?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Politigenomics+%28PolITiGenomics%29">PolITiGenomics</a> finds two good reasons for considering cloud options: when organizations have peak demands for compute power and when limitations on space/power/cooling preclude building a system in-house. These are two good reasons, but hardly enough to justify all the cloud computing hype.</p>
<p>So, what&#8217;s the argument for cloud computing?</p>
<p>Unlike computing, which gets cheaper every year, people cost more every year. So it makes sense to evaluate the annual software development and maintenance costs, the cost of managing the reference databases, the cost of integrating and maintaining new applications, the productivity of the end-users, and how to change the ratio of end-users to support programmers from 2-to-1 to 10-to-1 or 20-to-1. Cloud computing defined as &#8220;Infrastructure&#8221; (computers, networks, and storage) doesn&#8217;t alleviate these costs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloud now?</title>
		<link>http://blog.genomequest.com/2009/10/cloud-now/</link>
		<comments>http://blog.genomequest.com/2009/10/cloud-now/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:38:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=71</guid>
		<description><![CDATA[Cloud computing is becoming viable in the minds of the industry. A few solvable roadblocks remain. With infinite computing and infinite data, managing the data and turning it into insight remains the challenge and the opportunity.]]></description>
			<content:encoded><![CDATA[<p>At the <a href="http://www.healthtech.com/sda">CHI NGS conference</a>, I chaired a roundtable of key managers and influencers discussing the opportunities and challenges in adopting &#8220;cloud computing&#8221; for NGS applications. As a first observation, the session was well attended and people are thinking deeply about cloud issues. About 16 people participated, including representatives from major pharmaceuticals, agroscience, major medical research core labs, and the NIH.</p>
<p>Here is a transcript of my notes from the roundtable:</p>
<ol>
<li>Some felt end-users increasingly accept the privacy of their data in the hands of a secure cloud provider. Others remarked it remains uncomfortable for some end users who worry when they &#8220;don&#8217;t know where their data lives&#8221;. The roundtable agreed that more end-user education is needed.</li>
<li>Data transfer from the location of data generation to where it is processed remains a bottleneck. However, the problem is more on the upload side since cloud providers tend to have unlimited bandwidth. Corporate and institution-wide networks will have to improve to remedy this bottleneck.</li>
<li>Software application providers will have to develop metering metrics for licensing their applications on cloud resources that can be de-commissioned.</li>
<li>Cloud resources should allow for moving data between desktop applications and the centralized resources.</li>
<li>Commissioning and de-commissioning fixed resources such as databases can be an issue.</li>
<li>From a clinical applicability perspective, cloud providers (and those who run applications on cloud resources) will have to consider how to make their solution suitable for regulatory approval and auditing.</li>
<li>Finally, if an application provider such as GenomeQuest uses a commercial cloud provider such as Amazon EC2, the participants agreed that the application provider and not Amazon is accountable for the security, privacy, and overall robustness of the IT.</li>
</ol>
<p>My takeaways? Cloud computing is becoming viable in the minds of the industry. A few solvable roadblocks remain. With infinite computing and infinite data, managing the data and turning it into insight remains the challenge and the opportunity.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/10/cloud-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

