<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GenomeQuest Industry &#187; Sequence Data Management</title>
	<atom:link href="http://blog.genomequest.com/tag/sequence-data-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.genomequest.com</link>
	<description>Conversations on the convergence of SDM, cloud computing, and applications to personalized medicine</description>
	<lastBuildDate>Thu, 12 Jan 2012 23:33:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How fast is your read mapping algorithm?</title>
		<link>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/</link>
		<comments>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 07:25:38 +0000</pubDate>
		<dc:creator>Henk Heus</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Bowtie]]></category>
		<category><![CDATA[BWA]]></category>
		<category><![CDATA[GASSST]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[GenomeQuest Engine]]></category>
		<category><![CDATA[GQ-Engine]]></category>
		<category><![CDATA[Next Generation Sequencing]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=379</guid>
		<description><![CDATA[
This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.
The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;
Which [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.</p>
<p>The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;</p>
<p>That question leads nicely into the second phase of the answer. This phase is much more rigorous and usually starts with: &#8220;Well, it depends. Let me try to explain.&#8221;</p>
<ul>
<li>Any good computer scientist can write a mapping algorithm that is really fast. However, that doesn&#8217;t mean anything unless it produces the kind of results that are needed. The NGS application you&#8217;re working with matters here. For example, finding genetic variation in human disease requires very accurate mapping with mismatches and indels. In contrast, with digital gene expression in maize you can cut a lot of corners. You just need to have a rough idea of the number of alignments on a transcript.</li>
<li>Then there is the matter of the data and the technology that produced it. Do you have long reads (more than 120 bp)? Will your mapper handle them? Will it also raise the number of mismatches and indels it tolerates when aligning a longer read? Will it significantly slow down execution, or eat up all your memory when reads get longer? Does this mapper also support local alignments when you need them? Will it align in colorspace? In paired-end mode? I could go on.</li>
<li>Next there is the matter of connecting results to the following step in your pipeline. How long does it take to de-duplicate a 1TB alignment file in SAM/BAM format? Or to find those alignments whose positions overlap with the targets of your exome capture experiment, or with all dbSNP entries? At GenomeQuest we have a very efficient way of storing and handling alignments (including sequences/annotation). This saves real time, especially when compared to the alignment step itself (and, by the way, it saves a lot of disk space as well).</li>
</ul>
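<p>To make the last point concrete, here is a minimal Python sketch of the two downstream steps mentioned above: dropping duplicate alignments and keeping only those that overlap a set of capture targets. The record layout, the function names, and the duplicate rule (same chromosome, start, and strand) are illustrative assumptions, and the sketch says nothing about how the GenomeQuest platform stores or filters alignments.</p>

```python
from collections import namedtuple

# Hypothetical, simplified alignment record; coordinates are 0-based, half-open.
Alignment = namedtuple("Alignment", "read_id chrom start end strand")


def deduplicate(alignments):
    """Keep the first alignment seen for each (chrom, start, strand) key."""
    seen = set()
    unique = []
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.strand)
        if key not in seen:
            seen.add(key)
            unique.append(aln)
    return unique


def overlaps_any(aln, targets):
    """True if the alignment overlaps any (chrom, start, end) target interval."""
    return any(
        chrom == aln.chrom and aln.start < end and start < aln.end
        for chrom, start, end in targets
    )


def on_target(alignments, targets):
    """Linear scan over targets; fine for a sketch, too slow for real data."""
    return [aln for aln in alignments if overlaps_any(aln, targets)]


alignments = [
    Alignment("r1", "chr1", 100, 150, "+"),
    Alignment("r2", "chr1", 100, 150, "+"),  # duplicate of r1 under the assumed rule
    Alignment("r3", "chr1", 400, 450, "-"),
    Alignment("r4", "chr2", 120, 170, "+"),
]
targets = [("chr1", 140, 200), ("chr2", 500, 600)]

kept = on_target(deduplicate(alignments), targets)
print([aln.read_id for aln in kept])  # ['r1']
```

<p>On a 1TB file the linear scan over targets would have to give way to a sorted or indexed interval structure, which is exactly where the cost of this step starts to rival the alignment itself.</p>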
<p>By the time we get to the third phase of the answer I&#8217;m usually much more confident: &#8220;Well, how fast do you need our mapping algorithm to be?&#8221;.</p>
<ul>
<li>Does it really matter how fast the read mapper is, as long as it&#8217;s comparable in performance to other algorithms for most common use cases? Does it matter if you have the alignments in 2.5 hours instead of 3? Maybe if you analyze thousands of samples per week it matters, but then other things like reliability and professional software support should matter as well.</li>
<li>Do these other algorithms scale with the hardware you throw at them? How easy is it to run a read mapping on 64 compute nodes, each with 2 CPUs and 8 cores per CPU? What if you double the amount of hardware? Will you go twice as fast? With the GQ-Engine you will. Want to run on 1024 nodes? That&#8217;s possible.</li>
<li>Are you asking about speed for a single run, or the throughput for a bunch of runs? Last weekend I ran 2000 NGS read databases through our read mapping workflow (low-coverage genome sequencing, about 80M reads per database). I started them on Friday afternoon, went for drinks with my friends that evening, had a nice family dinner on Saturday afternoon, and watched a movie with my kid afterwards. The runs were finished before I woke up on Sunday. No hiccups, no failed runs, no logs to monitor, and &#8211; best of all &#8211; no &#8220;one million-file&#8221; directories to organize. There were a lot of other customers on the system that weekend, doing their NGS analysis as well.</li>
</ul>
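<p>The numbers in the last two points are easy to put side by side. The Python sketch below works through them: the core count for the node layout described, the runtime under perfectly linear scaling, and the total read volume of the weekend run. The 3-hour baseline is just the illustrative figure from the first point, and the elapsed time of roughly 40 hours is an assumption, since only &#8220;Friday afternoon&#8221; and &#8220;before I woke up on Sunday&#8221; are known.</p>

```python
def total_cores(nodes, cpus_per_node=2, cores_per_cpu=8):
    """Core count for the node layout described in the post."""
    return nodes * cpus_per_node * cores_per_cpu


def ideal_runtime_hours(base_hours, base_nodes, nodes):
    """Runtime under perfectly linear scaling: doubling the nodes halves the time."""
    return base_hours * base_nodes / nodes


print(total_cores(64))                     # 1024
print(ideal_runtime_hours(3.0, 64, 128))   # 1.5

# Weekend run: 2000 read databases of about 80M reads each.
databases = 2000
reads_per_database = 80_000_000
total_reads = databases * reads_per_database
print(total_reads)                         # 160000000000

# ASSUMPTION: about 40 hours elapsed; the post gives no exact duration.
assumed_hours = 40
reads_per_hour = total_reads // assumed_hours
print(reads_per_hour)                      # 4000000000
```

<p>Under that assumed duration the workflow handled on the order of a few billion reads per hour, on a system shared with other customers.</p>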
<p>If we ever meet for a demo, please ask me this question. I love to talk about it.</p>
<p>Henk Heus, Ph.D.<br />
VP Product Management &amp; Services<br />
GenomeQuest Inc.</p>
<p>At GenomeQuest we use an extended version of the GASSST read mapping algorithm (among others). Read about it in Bioinformatics here: <a title="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" href="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" target="_self">http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>So, what&#8217;s the argument for cloud computing?</title>
		<link>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/</link>
		<comments>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 20:48:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Informatics Industry]]></category>
		<category><![CDATA[SDM]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[NGS software]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=122</guid>
		<description><![CDATA[A plot of the <a href="http://images.google.com/imgres?imgurl=http://www.mocom2020.com/data/2009/05/computer-power-future.gif&#38;imgrefurl=http://www.mocom2020.com/2009/05/evolution-of-computer-capacity-and-costs/&#38;usg=__nNgm1nlSJX4QpgaOrsZQnHi0TjM=&#38;h=768&#38;w=1205&#38;sz=116&#38;hl=en&#38;start=1&#38;um=1&#38;tbnid=HchqHMRDWWHr9M:&#38;tbnh=96&#38;tbnw=150&#38;prev=/images%3Fq%3Dcost%2Bof%2Bcomputing%2Bpower%26hl%3Den%26sa%3DN%26um%3D1">Evolution of Computer Capacity and Costs</a> shows that compute power will be 1,000X cheaper in 10 years. How much lower can it go? As this happens the relative cost of managing another computer approaches zero asymptotically, regardless of whether it&#8217;s hosted internally or externally. I don&#8217;t think there is [...]]]></description>
			<content:encoded><![CDATA[<p>A plot of the <a href="http://images.google.com/imgres?imgurl=http://www.mocom2020.com/data/2009/05/computer-power-future.gif&amp;imgrefurl=http://www.mocom2020.com/2009/05/evolution-of-computer-capacity-and-costs/&amp;usg=__nNgm1nlSJX4QpgaOrsZQnHi0TjM=&amp;h=768&amp;w=1205&amp;sz=116&amp;hl=en&amp;start=1&amp;um=1&amp;tbnid=HchqHMRDWWHr9M:&amp;tbnh=96&amp;tbnw=150&amp;prev=/images%3Fq%3Dcost%2Bof%2Bcomputing%2Bpower%26hl%3Den%26sa%3DN%26um%3D1">Evolution of Computer Capacity and Costs</a> shows that compute power will be 1,000X cheaper in 10 years. How much lower can it go? As this happens the relative cost of managing another computer approaches zero asymptotically, regardless of whether it&#8217;s hosted internally or externally. I don&#8217;t think there is an economic argument that shows everyone belongs on the cloud based just on hardware and system administration cost.</p>
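<p>As a rough check on what that rate implies: a 1,000-fold drop over 10 years, if spread evenly, means the cost of compute halves about once a year. A short Python sketch of the arithmetic, assuming a constant exponential decline (the plot itself may not be that smooth):</p>

```python
import math

fold_reduction = 1000  # from the plot: 1,000X cheaper
years = 10

# Constant yearly factor f such that f ** years == fold_reduction.
yearly_factor = fold_reduction ** (1 / years)

# Time for the cost to halve under the same constant-rate assumption.
halving_time_years = years / math.log2(fold_reduction)

print(round(yearly_factor, 2))       # 2.0
print(round(halving_time_years, 2))  # 1.0
```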
<p>Dave Dooling at <a href="http://www.politigenomics.com/2010/01/cloudy-with-a-chance-of-sunshine.html?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Politigenomics+%28PolITiGenomics%29">PolITiGenomics</a> finds two good reasons for considering cloud options: when organizations have peak demands for compute power and when limitations on space/power/cooling preclude building a system in-house. These are two good reasons, but hardly enough to justify all the cloud computing hype.</p>
<p>So, what&#8217;s the argument for cloud computing?</p>
<p>Unlike computing, which gets cheaper every year, people cost more every year. So it makes sense to evaluate the annual software development and maintenance costs, the cost of managing the reference databases, the cost of integrating and maintaining new applications, the productivity of the end users, and how to change the ratio of end users to support programmers from 2-to-1 to 10-to-1 or 20-to-1. Cloud computing defined as &#8220;Infrastructure&#8221; (computers, networks, and storage) doesn&#8217;t alleviate these costs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2010/01/so-whats-the-argument-for-cloud-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>No Need for Data Pipelining</title>
		<link>http://blog.genomequest.com/2009/12/no-need-for-data-pipelining/</link>
		<comments>http://blog.genomequest.com/2009/12/no-need-for-data-pipelining/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 18:11:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[SDM]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=102</guid>
		<description><![CDATA[Our concept “Sequence Data Management” (SDM) doesn’t fit the primary/secondary/tertiary analysis informatics categories. Why? Because we&#8217;ve coupled the alignment step with the analysis step in one shot. Why is that better? Biologists can compute the data on their own and mine the data in an easy-to-use web application. They are able to “finish [...]]]></description>
			<content:encoded><![CDATA[<p>Our concept “Sequence Data Management” (SDM) doesn’t fit the primary/secondary/tertiary analysis informatics categories. Why? Because we&#8217;ve coupled the alignment step with the analysis step in one shot. Why is that better? Biologists can compute the data on their own and mine the data in an easy-to-use web application. They are able to “finish the pipeline on their own”. Hopefully, this will lead to more interesting biological conclusions and more enthusiastic end users of NGS technology.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/12/no-need-for-data-pipelining/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>APIs + Sequence Data Management = Haplotype Tables</title>
		<link>http://blog.genomequest.com/2009/09/apis-sequence-data-management-haplotype-tables/</link>
		<comments>http://blog.genomequest.com/2009/09/apis-sequence-data-management-haplotype-tables/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 13:30:52 +0000</pubDate>
		<dc:creator>GenomeQuest</dc:creator>
				<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[detection variants]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[haplotype tables]]></category>
		<category><![CDATA[Sequence Data Management]]></category>
		<category><![CDATA[SNP]]></category>
		<category><![CDATA[variant calling]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=68</guid>
		<description><![CDATA[We&#8217;ve heard lots of requests from customers not only to provide them with powerful methods for detecting variants across multiple experiments (or phenotypes, or organisms, or lines), but also to unify all of this data to find knowledge that spans these experiments.
Of course we have our variant calling workflow, just as we integrate with other variant [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve heard lots of requests from customers not only to provide them with powerful methods for detecting variants across multiple experiments (or phenotypes, or organisms, or lines), but also to unify all of this data to find knowledge that spans these experiments.</p>
<p>Of course we have our variant calling workflow, just as we integrate with other variant calling workflows. All of these produce GenomeQuest-native browsable, mineable, and queryable databases. And because of the GQ Engine, we can easily combine sets of 10s or 100s or even 1,000s of these variant databases into a single queryable entity with &#8220;web-speed query performance.&#8221;</p>
<p>Nevertheless, while our customers get the benefit of the combined data, they often ask for more. So today I jumped into the APIs of GenomeQuest and tried to address the simple problem of building a table of SNPs that span a series of experiments. Each SNP should have the specific allele called for each experiment in which it was found. A simple little table designed to be the input into any of a number of linkage disequilibrium mapping packages. I made a GQ Plug-in: 5 lines of code to make it accessible in the user interface, and another 100 lines of code (I&#8217;m wordy) on the back-end to build the table and present it. And so, the multi-experiment haplotype table is born. I might even convince the development team to include it in our next live push.</p>
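<p>For readers who want a feel for the shape of such a table, here is a minimal standalone Python sketch. It is not the plug-in code and does not use the GenomeQuest APIs; the function name, the input layout (one dictionary of SNP calls per experiment), and the &#8220;.&#8221; marker for experiments in which a SNP was not found are all assumptions made for illustration.</p>

```python
def build_haplotype_table(calls_by_experiment, missing="."):
    """Return (header, rows): one row per SNP, one allele column per experiment.

    calls_by_experiment maps an experiment name to {(chrom, pos): allele}.
    """
    experiments = sorted(calls_by_experiment)
    # Union of every SNP position seen in any experiment.
    snps = sorted({snp for calls in calls_by_experiment.values() for snp in calls})
    header = ["chrom", "pos"] + experiments
    rows = []
    for chrom, pos in snps:
        alleles = [calls_by_experiment[exp].get((chrom, pos), missing) for exp in experiments]
        rows.append([chrom, pos] + alleles)
    return header, rows


# Hypothetical SNP calls for three experiments.
calls = {
    "exp_A": {("chr1", 1042): "A", ("chr2", 88): "T"},
    "exp_B": {("chr1", 1042): "G"},
    "exp_C": {("chr2", 88): "T", ("chr3", 5): "C"},
}

header, rows = build_haplotype_table(calls)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

<p>The output is plain tab-separated text, one SNP per line, which is the kind of flat input that downstream mapping packages tend to accept.</p>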
<p>If you want to hear more or check out the code, drop me a line.</p>
<p>Richard J. Resnick<br />
VP Software and Services</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/09/apis-sequence-data-management-haplotype-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Science Advisory Board</title>
		<link>http://blog.genomequest.com/2009/08/science-advisory-board/</link>
		<comments>http://blog.genomequest.com/2009/08/science-advisory-board/#comments</comments>
		<pubDate>Thu, 27 Aug 2009 12:00:20 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[Next Generation Sequencing]]></category>
		<category><![CDATA[NGS]]></category>
		<category><![CDATA[Science Advisory Board]]></category>
		<category><![CDATA[SDM]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=33</guid>
		<description><![CDATA[Dr. Mark Boguski appointed to Science Advisory Board.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m excited to start our advisory board, with the appointment of Dr. Mark Boguski. Since our initial financing in 2005, our investors have urged us to build the science advisory board.</p>
<p>So what took so long? Until now, it wasn&#8217;t necessary. We provided clear business value to pharma for a well-defined use case. An advisory board might have even been a distraction.</p>
<p>So what&#8217;s changed? As we build out the sequence data management (SDM) platform, we want to see beyond this year’s application of next generation sequencing (NGS), and make sure we understand where the industry is going.</p>
<div id="attachment_44" class="wp-caption alignright" style="width: 107px"><img class="size-full wp-image-44 " title="Mark Boguski" src="http://blog.genomequest.com/wp-content/uploads/2009/08/Mark-Boguski_3.jpg" alt="Mark Boguski" width="97" height="118" /><p class="wp-caption-text">Dr. Mark Boguski</p></div>
<p>Mark is a perfect advisor for this initiative. His practical experience at NCBI, Rosetta, and Novartis, and his current vantage point at Harvard Medical School and Beth Israel, give him a clear view of the future uses of sequence data in a clinical setting, along with a firm grounding in the practical applications of sequence data over the past 20 years.</p>
<p>It’s an honor to have Mark join us as a scientific advisor. We hope to build a diverse advisory team to complement him with skills and experiences that reflect the diversity of talents converging on the digital revolution in biology.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/08/science-advisory-board/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

