<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GenomeQuest Industry &#187; Cloud Computing</title>
	<atom:link href="http://blog.genomequest.com/tag/cloud-computing/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.genomequest.com</link>
	<description>Conversations on the convergence of SDM, cloud computing, and applications to personalized medicine</description>
	<lastBuildDate>Thu, 12 Jan 2012 23:33:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How fast is your read mapping algorithm?</title>
		<link>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/</link>
		<comments>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 07:25:38 +0000</pubDate>
		<dc:creator>Henk Heus</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Bowtie]]></category>
		<category><![CDATA[BWA]]></category>
		<category><![CDATA[GASSST]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[GenomeQuest Engine]]></category>
		<category><![CDATA[GQ-Engine]]></category>
		<category><![CDATA[Next Generation Sequencing]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=379</guid>
		<description><![CDATA[
This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.
The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;
Which [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.</p>
<p>The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;</p>
<p>Which nicely transitions me into the second phase of the answer. This phase is much more rigorous and usually starts with: &#8220;Well, it depends. Let me try to explain.&#8221;:</p>
<ul>
<li>Any good computer scientist can write a mapping algorithm that is really fast. However, that doesn&#8217;t mean anything unless it produces the kind of results that are needed. The NGS application you&#8217;re working with matters here. For example, finding genetic variation in human disease requires very accurate mapping with mismatches and indels. In contrast, with digital gene expression in maize you can cut a lot of corners. You just need to have a rough idea of the number of alignments on a transcript.</li>
<li>Then there is the matter of the data and the technology that produced it. Do you have long reads (more than 120 bp)? Will your mapper handle them? Will it also increase the number of mismatches / indels it can use to align a read? Will it significantly slow down execution, or eat up all your memory when reads get longer? Does this mapper also support local alignments when you need them? Will it align in colorspace? In paired end mode? I could go on.</li>
<li>Next there is the matter of connecting results to the following step in your pipeline. How long does it take to de-duplicate a 1 TB alignment file in SAM/BAM format? Or to find those alignments whose positions overlap with your exome capture targets, or with all dbSNP entries? At GenomeQuest we have a very efficient way of storing and handling alignments (including sequences/annotation). This saves real time, especially when compared to the alignment step itself (it saves a lot of disk space as well, by the way).</li>
</ul>
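<p>To make the overlap question in the last bullet concrete, here is a minimal Python sketch. It assumes half-open, 0-based intervals on a single chromosome and a sorted, non-overlapping list of capture targets. The names <code>targets</code> and <code>overlaps_any</code> and the coordinates are hypothetical, chosen only for illustration; this is not a description of how the GenomeQuest platform stores or queries alignments.</p>
<pre><code>from bisect import bisect_right

# Hypothetical capture targets on one chromosome: sorted, non-overlapping,
# half-open [start, end) intervals in 0-based coordinates.
targets = [(100, 200), (500, 650), (900, 1000)]
starts = [start for start, _ in targets]


def overlaps_any(aln_start, aln_end):
    """Return True if the alignment [aln_start, aln_end) overlaps a target."""
    # Index of the last target that starts before the alignment ends.
    i = bisect_right(starts, aln_end - 1) - 1
    if i == -1:
        return False  # every target starts at or after the alignment end
    # Targets are sorted and disjoint, so this one has the largest end
    # among all targets that start before the alignment ends.
    return targets[i][1] > aln_start
</code></pre>
<p>Each lookup is a binary search, so the cost per alignment grows with the logarithm of the number of targets. On a file with billions of alignments, the time spent reading and decoding the records would likely dominate, which is why the storage format matters.</p>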
<p>By the time we get to the third phase of the answer I&#8217;m usually much more confident: &#8220;Well, how fast do you need our mapping algorithm to be?&#8221;.</p>
<ul>
<li>Does it really matter how fast the read mapper is, as long as it&#8217;s comparable in performance to other algorithms for most common use cases? Does it matter if you have the alignments in 2.5 hours instead of 3? Maybe if you analyze thousands of samples per week it matters, but then other things like reliability and professional software support should matter as well.</li>
<li>Do these other algorithms scale with the hardware you throw at them? How easy is it to run a read mapping job on 64 compute nodes, each with 2 CPUs and 8 cores per CPU? What if you double the amount of hardware? Will you go twice as fast? With the GQ-Engine you will. Want to run on 1024 nodes? That&#8217;s possible.</li>
<li>Are you asking about speed for a single run, or the throughput for a bunch of runs? Last weekend I ran 2000 NGS read databases through our read mapping workflow (low coverage genome sequencing, about 80M reads per database). I started them on Friday afternoon, went for drinks with my friends that evening, had a nice family dinner on Saturday afternoon, and watched a movie with my kid afterwards. The runs were finished before I woke up on Sunday. No hiccups, no failed runs, no logs to monitor, and &#8211; best of all &#8211; no &#8220;one million-file&#8221; directories to organize. There were a lot of other customers on the system that weekend, doing their NGS analysis as well.</li>
</ul>
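<p>The numbers in the list above can be checked on the back of an envelope. The Python sketch below is only that: it assumes perfectly linear scaling, which is an idealisation, and it leaves the elapsed wall-clock time as a parameter because the exact start and finish times of the weekend run are not given. The function names are hypothetical.</p>
<pre><code># Hardware from the example: 64 nodes, 2 CPUs per node, 8 cores per CPU.
nodes, cpus_per_node, cores_per_cpu = 64, 2, 8
total_cores = nodes * cpus_per_node * cores_per_cpu  # 1024 cores


def ideal_runtime_hours(baseline_hours, baseline_nodes, new_nodes):
    """Runtime after changing the node count, assuming linear scaling."""
    return baseline_hours * baseline_nodes / new_nodes


# Weekend batch: 2000 read databases of about 80M reads each.
databases = 2000
reads_per_database = 80_000_000
total_reads = databases * reads_per_database  # 160 billion reads


def reads_per_hour(reads, elapsed_hours):
    """Average throughput over the whole batch."""
    return reads / elapsed_hours
</code></pre>
<p>Under these assumptions, a job that takes 3 hours on 64 nodes would take <code>ideal_runtime_hours(3.0, 64, 128)</code>, or 1.5 hours, on 128 nodes. Real workloads usually fall short of this ideal to some degree because of I/O and scheduling overhead.</p>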
<p>If we ever meet for a demo, please ask me this question. I love to talk about it.</p>
<p>Henk Heus, Ph.D.<br />
VP Product Management &amp; Services<br />
GenomeQuest Inc.</p>
<p>At GenomeQuest we use an extended version of the GASSST read mapping algorithm (among others). Read about it in Bioinformatics here: <a title="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" href="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" target="_self">http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>So, what&#8217;s the argument for cloud computing?</title>
		<link>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/</link>
		<comments>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 20:48:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Informatics Industry]]></category>
		<category><![CDATA[SDM]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[NGS software]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=122</guid>
		<description><![CDATA[A plot of the <a href="http://images.google.com/imgres?imgurl=http://www.mocom2020.com/data/2009/05/computer-power-future.gif&#38;imgrefurl=http://www.mocom2020.com/2009/05/evolution-of-computer-capacity-and-costs/&#38;usg=__nNgm1nlSJX4QpgaOrsZQnHi0TjM=&#38;h=768&#38;w=1205&#38;sz=116&#38;hl=en&#38;start=1&#38;um=1&#38;tbnid=HchqHMRDWWHr9M:&#38;tbnh=96&#38;tbnw=150&#38;prev=/images%3Fq%3Dcost%2Bof%2Bcomputing%2Bpower%26hl%3Den%26sa%3DN%26um%3D1">Evolution of Computer Capacity and Costs</a> shows that compute power will be 1,000X cheaper in 10 years. How much lower can it go? As this happens, the relative cost of managing another computer approaches zero, regardless of whether it&#8217;s hosted internally or externally. I don&#8217;t think there is [...]]]></description>
			<content:encoded><![CDATA[<p>A plot of the <a href="http://images.google.com/imgres?imgurl=http://www.mocom2020.com/data/2009/05/computer-power-future.gif&amp;imgrefurl=http://www.mocom2020.com/2009/05/evolution-of-computer-capacity-and-costs/&amp;usg=__nNgm1nlSJX4QpgaOrsZQnHi0TjM=&amp;h=768&amp;w=1205&amp;sz=116&amp;hl=en&amp;start=1&amp;um=1&amp;tbnid=HchqHMRDWWHr9M:&amp;tbnh=96&amp;tbnw=150&amp;prev=/images%3Fq%3Dcost%2Bof%2Bcomputing%2Bpower%26hl%3Den%26sa%3DN%26um%3D1">Evolution of Computer Capacity and Costs</a> shows that compute power will be 1,000X cheaper in 10 years. How much lower can it go? As this happens, the relative cost of managing another computer approaches zero, regardless of whether it&#8217;s hosted internally or externally. I don&#8217;t think there is an economic argument that shows everyone belongs on the cloud based just on hardware and system administration cost.</p>
<p>Dave Dooling at <a href="http://www.politigenomics.com/2010/01/cloudy-with-a-chance-of-sunshine.html?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Politigenomics+%28PolITiGenomics%29">PolITiGenomics</a> finds two good reasons for considering cloud options: when organizations have peak demands for compute power and when limitations on space/power/cooling preclude building a system in-house. These are two good reasons, but hardly enough to justify all the cloud computing hype.</p>
<p>So, what&#8217;s the argument for cloud computing?</p>
<p>Unlike computing, which gets cheaper every year, people cost more every year. So it makes sense to evaluate the annual software development and maintenance costs, the cost of managing the reference databases, the cost of integrating and maintaining new applications, the productivity of the end users, and how to change the ratio of end users to support programmers from 2-to-1 to 10-to-1 or 20-to-1. Cloud computing defined as &#8220;Infrastructure&#8221; (computers, networks, and storage) doesn&#8217;t alleviate these costs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloud now?</title>
		<link>http://blog.genomequest.com/2009/10/cloud-now/</link>
		<comments>http://blog.genomequest.com/2009/10/cloud-now/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:38:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=71</guid>
		<description><![CDATA[Cloud computing is becoming viable in the minds of the industry. A few solvable roadblocks remain. With infinite computing and infinite data, managing the data and turning it into insight remains the challenge and the opportunity.]]></description>
			<content:encoded><![CDATA[<p>At the <a href="http://www.healthtech.com/sda">CHI NGS conference</a>, I chaired a roundtable of key managers and influencers discussing the opportunities and challenges in adopting &#8220;cloud computing&#8221; for NGS applications. As a first observation, the session was well attended and people are thinking deeply about cloud issues. About 16 people participated, including representatives from major pharmaceutical companies, agroscience, major medical research core labs, and the NIH.</p>
<p>Here is a transcript of my notes from the roundtable:</p>
<ol>
<li>Some felt end-users increasingly accept the privacy of their data in the hands of a secure cloud provider. Others remarked it remains uncomfortable for some end users who worry when they &#8220;don&#8217;t know where their data lives&#8221;. The roundtable agreed that more end-user education is needed.</li>
<li>Data transfer from the location of data generation to where it is processed remains a bottleneck. However, the problem is more on the upload side since cloud providers tend to have unlimited bandwidth. Corporate and institution-wide networks will have to improve to remedy this bottleneck.</li>
<li>Software application providers will have to develop metering metrics for licensing their applications on cloud resources that can be de-commissioned.</li>
<li>Cloud resources should allow for moving data between desktop applications and the centralized resources.</li>
<li>Commissioning and de-commissioning fixed resources such as databases can be an issue.</li>
<li>From a clinical applicability perspective, cloud providers (and those who run applications on cloud resources) will have to consider how to make their solution suitable for regulatory approval and auditing.</li>
<li>Finally, if an application provider such as GenomeQuest uses a commercial cloud provider such as Amazon EC2, the participants agreed that the application provider, and not Amazon, is accountable for the security, privacy, and overall robustness of the IT.</li>
</ol>
<p>My takeaways? Cloud computing is becoming viable in the minds of the industry. A few solvable roadblocks remain. With infinite computing and infinite data, managing the data and turning it into insight remains the challenge and the opportunity.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/10/cloud-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

