<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GenomeQuest Industry &#187; GenomeQuest</title>
	<atom:link href="http://blog.genomequest.com/tag/genomequest/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.genomequest.com</link>
	<description>Conversations on the convergence of SDM, cloud computing, and applications to personalized medicine</description>
	<lastBuildDate>Thu, 12 Jan 2012 23:33:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How fast is your read mapping algorithm?</title>
		<link>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/</link>
		<comments>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 07:25:38 +0000</pubDate>
		<dc:creator>Henk Heus</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Bowtie]]></category>
		<category><![CDATA[BWA]]></category>
		<category><![CDATA[GASSST]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[GenomeQuest Engine]]></category>
		<category><![CDATA[GQ-Engine]]></category>
		<category><![CDATA[Next Generation Sequencing]]></category>
		<category><![CDATA[Sequence Data Management]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=379</guid>
		<description><![CDATA[
This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.
The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;
Which [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>This is a question that is often asked when I demo the GenomeQuest platform to potential customers. I always answer that question in three phases.</p>
<p>The first phase goes like this: &#8220;It&#8217;s really fast, it&#8217;s certainly not any slower than BWA/Bowtie or anything else out there.&#8221; The next question is always: &#8220;Well, do you have any benchmarks?&#8221;</p>
<p>Which nicely transitions me into the second phase of the answer. This phase is much more rigorous and usually starts with: &#8220;Well, it depends. Let me try to explain.&#8221;:</p>
<ul>
<li>Any good computer scientist can write a mapping algorithm that is really fast. However, that doesn&#8217;t mean anything unless it produces the kind of results that are needed. The NGS application you&#8217;re working with matters here. For example, finding genetic variation in human disease requires very accurate mapping with mismatches and indels. In contrast, with digital gene expression in maize you can cut a lot of corners. You just need to have a rough idea of the number of alignments on a transcript.</li>
<li>Then there is the matter of the data and the technology that produced it. Do you have long reads (more than 120 bp)? Will your mapper handle them? Will it also increase the number of mismatches / indels it can use to align a read? Will it significantly slow down execution, or eat up all your memory when reads get longer? Does this mapper also support local alignments when you need them? Will it align in colorspace? In paired end mode? I could go on.</li>
<li>Next there is the matter of connecting results to the following step in your pipeline. How long does it take to de-duplicate a 1TB alignment file in SAM/BAM format? Or to find those alignments whose positions overlap with your exome capture experiment, or with all dbSNP entries? At GenomeQuest we have a very efficient way of storing and handling alignments (including sequences/annotation). This saves real time, especially when compared to the alignment step itself (it saves a lot of disk space as well, by the way).</li>
</ul>
<p>By the time we get to the third phase of the answer I&#8217;m usually much more confident: &#8220;Well, how fast do you need our mapping algorithm to be?&#8221;.</p>
<ul>
<li>Does it really matter how fast the read mapper is, as long as it&#8217;s comparable in performance to other algorithms for most common use cases? Does it matter if you have the alignments in 2.5 hours instead of 3? Maybe if you analyze thousands of samples per week it matters, but then other things like reliability and professional software support should matter as well.</li>
<li>Do these other algorithms scale with the hardware you throw at them? How easy is it to run a read mapping on 64 compute nodes, with 2 CPUs per node and 8 cores per CPU? What if you double the amount of hardware? Will you go twice as fast? With the GQ-Engine you will. Want to run on 1024 nodes? That&#8217;s possible.</li>
<li>Are you asking about speed for a single run, or the throughput for a bunch of runs? Last weekend I ran 2000 NGS read databases through our read mapping workflow (low coverage genome sequencing, about 80M reads per database). I started them on Friday afternoon, went for drinks with my friends that evening, had a nice family dinner on Saturday afternoon, and watched a movie with my kid afterwards. The runs were finished before I woke up on Sunday. No hiccups, no failed runs, no logs to monitor, and &#8211; best of all &#8211; no &#8220;one million-file&#8221; directories to organize. There were a lot of other customers on the system that weekend, doing their NGS analysis as well.</li>
</ul>
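<p>A quick back-of-the-envelope check of the numbers above, as a Python sketch. The node, CPU, core, database and read counts are taken from the text; the 3-hour baseline on 64 nodes and the perfectly linear scaling are illustrative assumptions, not measured benchmarks.</p>

```python
# Back-of-the-envelope arithmetic for the cluster and workload described above.
# Perfect linear scaling is an idealised assumption made for illustration only.

def total_cores(nodes, cpus_per_node=2, cores_per_cpu=8):
    """Number of cores in a cluster of identical nodes."""
    return nodes * cpus_per_node * cores_per_cpu


def ideal_runtime_hours(baseline_hours, baseline_nodes, nodes):
    """Runtime under perfect linear scaling: twice the nodes, half the time."""
    return baseline_hours * baseline_nodes / nodes


def total_reads(databases, reads_per_database):
    """Total number of reads in a batch of equally sized read databases."""
    return databases * reads_per_database


cores_64 = total_cores(64)                     # 64 nodes * 2 CPUs * 8 cores
halved = ideal_runtime_hours(3.0, 64, 128)     # hypothetical 3 h run, hardware doubled
weekend_reads = total_reads(2000, 80_000_000)  # 2000 databases of about 80M reads each

print(cores_64, "cores;", halved, "hours;", weekend_reads, "reads")
```

<p>Under these assumptions, the 64-node example amounts to 1024 cores, and the weekend batch to roughly 160 billion reads.</p>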
<p>If we ever meet for a demo, please ask me this question. I love to talk about it.</p>
<p>Henk Heus, Ph.D.<br />
VP Product Management &amp; Services<br />
GenomeQuest Inc.</p>
<p>At GenomeQuest we use an extended version of the GASSST read mapping algorithm (among others). Read about it in Bioinformatics here: <a title="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" href="http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract" target="_self">http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2011/06/how-fast-is-your-read-mapping-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Upcoming Improvements to the GenomeQuest Engine</title>
		<link>http://blog.genomequest.com/2011/06/upcoming-improvements-to-the-genomequest-engine/</link>
		<comments>http://blog.genomequest.com/2011/06/upcoming-improvements-to-the-genomequest-engine/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 21:27:51 +0000</pubDate>
		<dc:creator>Henk Heus</dc:creator>
				<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[GenomeQuest 7.1]]></category>
		<category><![CDATA[GenomeQuest Engine]]></category>
		<category><![CDATA[global alignment]]></category>
		<category><![CDATA[GQ-Engine]]></category>
		<category><![CDATA[interval indexing]]></category>
		<category><![CDATA[local alignment]]></category>
		<category><![CDATA[paired end reads]]></category>
		<category><![CDATA[Product update]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=365</guid>
		<description><![CDATA[As the product manager at GenomeQuest, I&#8217;m very excited to tell you about a couple of really great new features in the GQ-Engine. Features that add to the growing library of high quality NGS components available to GQ platform developers and end users.
Fast local alignments of NGS reads
NGS read mappers typically align reads by trying to [...]]]></description>
			<content:encoded><![CDATA[<p>As the product manager at GenomeQuest, I&#8217;m very excited to tell you about a couple of really great new features in the GQ-Engine. Features that add to the growing library of high quality NGS components available to GQ platform developers and end users.</p>
<p><strong>Fast local alignments of NGS reads</strong></p>
<p>NGS read mappers typically align reads by trying to fit the entire read into the reference sequence. This is referred to as a global alignment, or best fit, strategy. While this works great for short genomic reads, it is not always the best possible solution for longer reads. When a read gets longer the chance of it matching the reference sequence over its entire length decreases.</p>
<p>The shortcomings of global alignment algorithms become readily apparent in RNA-seq studies where a single read can span multiple exons. These exons can be right next to each other in the mRNA, but separated by megabases of intronic sequence on the genome. The only way to align such reads is to use a local alignment strategy that can map different parts of a read to different positions on the reference sequence.</p>
<p>We have added local alignment capabilities to our GASSST read mapper while keeping the existing speed and scaling. This allows us to analyze NGS-sized data sets regardless of the read length or sample source and gets us ready for PacBio and Ion Torrent. It also supports our new RNA-seq workflow that maps the transcriptome directly to the genomic reference sequence.</p>
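<p>To make the difference concrete, here is a minimal sketch of local alignment using the textbook Smith-Waterman recurrence in Python. This is not the GASSST implementation; the scoring values, the toy sequences and the function name are assumptions chosen for illustration. The toy reference holds two exon fragments separated by a poly-T stretch standing in for a long intron, and the read joins the two fragments directly, as a spliced mRNA would.</p>

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment of read against ref (textbook Smith-Waterman).

    Returns (score, read_start, read_end, ref_start, ref_end),
    with 0-based, end-exclusive coordinates.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if read[i - 1] == ref[j - 1] else mismatch
            cell = max(0,
                       score[i - 1][j - 1] + step,  # align two bases
                       score[i - 1][j] + gap,       # gap in the reference
                       score[i][j - 1] + gap)       # gap in the read
            score[i][j] = cell
            if cell > best:
                best, best_i, best_j = cell, i, j
    # Walk back from the best cell until the score drops to zero.
    i, j = best_i, best_j
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if read[i - 1] == ref[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_i, j, best_j


# Toy reference: two exon fragments separated by a poly-T stand-in for an intron.
exon1_end = "GACCAGCA"
exon2_start = "CGGACAAG"
genome = exon1_end + "T" * 30 + exon2_start

# A spliced read: the end of exon 1 joined directly to the start of exon 2.
read = exon1_end + exon2_start

first = smith_waterman(read, genome)
# Align whatever part of the read the first hit left uncovered.
# Read coordinates in `second` are relative to `rest`, not to the full read.
rest = read[first[2]:]
second = smith_waterman(rest, genome)

print("part 1:", first)
print("part 2:", second)
```

<p>The first half of the read lands on the first exon fragment and the remainder lands 38 bases downstream, past the stand-in intron. An end-to-end alignment of the same read would have to pay for the whole intron as one long gap.</p>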
<p><strong>Improved support for Paired End (PE) read handling</strong></p>
<p>We have added exciting new possibilities to work with PE reads to the GQ-Engine. By examining all possible alignment combinations for a read pair we can keep the most likely alignment pairs for further analysis. This strategy to find &#8220;happy pairs&#8221; can be parameterized on the command line and takes into account the expected distance between the reads, the orientation of the alignment (fwd/rev strands), and the number of mismatches and indels that are needed to align the reads at those positions. All of this happens in memory while computing the alignments, and is much more efficient and exhaustive than the post-alignment processing strategies typically implemented by other read mappers.</p>
<p>Because the PE mapping strategy is fully integrated into the GQ-Engine, we have complete flexibility working with the results. We can, for example, decide to also keep the single end reads that are mapped with high confidence (we will). We can also dump all non-happy pairs into a separate alignment database to look for interesting things like copy number variation or structural variation.</p>
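<p>The idea of picking a &#8220;happy pair&#8221; can be sketched in a few lines of Python. This is an illustration only, not the GQ-Engine code: the <code>Alignment</code> fields, the penalty formula, the default insert size and tolerance, and the assumption that mates face each other (left mate on the forward strand, right mate on the reverse strand) are all hypothetical.</p>

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Alignment:
    chrom: str
    start: int    # 0-based leftmost reference position
    end: int      # end-exclusive
    strand: str   # "+" or "-"
    edits: int    # mismatches plus indels needed for this alignment


def pair_penalty(a, b, expected_insert, tolerance):
    """Penalty for a candidate pair, or None if the pair is not 'happy'."""
    if a.chrom != b.chrom:
        return None
    left, right = (a, b) if a.start <= b.start else (b, a)
    # Assumed library layout: mates face each other (left on +, right on -).
    if left.strand != "+" or right.strand != "-":
        return None
    deviation = abs((right.end - left.start) - expected_insert)
    if deviation > tolerance:
        return None
    # Fewer edits and an insert size closer to the expectation are better.
    return a.edits + b.edits + deviation / max(tolerance, 1)


def best_happy_pair(mate1_hits, mate2_hits, expected_insert=300, tolerance=50):
    """Try every combination of candidate alignments and keep the best happy pair."""
    scored = []
    for a, b in product(mate1_hits, mate2_hits):
        penalty = pair_penalty(a, b, expected_insert, tolerance)
        if penalty is not None:
            scored.append((penalty, a, b))
    if not scored:
        return None  # a caller could route such pairs to a separate database
    _, a, b = min(scored, key=lambda item: item[0])
    return a, b


mate1_hits = [Alignment("chr1", 1000, 1050, "+", 0),
              Alignment("chr2", 5000, 5050, "+", 1)]
mate2_hits = [Alignment("chr1", 1260, 1310, "-", 1),
              Alignment("chr1", 9000, 9050, "-", 0)]

best = best_happy_pair(mate1_hits, mate2_hits)
print(best)
```

<p>In this toy input only the chr1 pair at positions 1000 and 1260 has the right orientation and an insert size within the tolerance, so it is the one that is kept, even though another candidate for the second mate has fewer edits.</p>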
<p>For our web interface users, using the PE read mapping strategy will be completely transparent. When you map a PE read database, we will ask you to confirm the expected insert size and read orientation. That&#8217;s all.</p>
<p><strong>Interval Indexing and Positional Based Annotation</strong></p>
<p>Interval indexing within the GQ-Engine adds the ability to very quickly find the overlap between different sets of intervals. Examples of use cases are: &#8220;find all alignments overlapping with exons of known genes&#8221;, or &#8220;find all SNPs in my data set that are already known in dbSNP&#8221;. This technology will support many use cases in the GenomeQuest platform. To start, it will speed up the existing variant annotation workflow and drive the RNA-seq workflow. More applications will follow soon.</p>
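<p>As an illustration of the kind of query involved, here is a small Python sketch that finds all overlaps between two sets of intervals on a single chromosome. It uses a plain sweep over sorted lists, which is an assumption made for this example and not a description of the GQ-Engine index; the function name and the half-open <code>(start, end)</code> convention are likewise hypothetical.</p>

```python
def find_overlaps(queries, features):
    """Return (query, feature) pairs of overlapping half-open (start, end) intervals.

    Both inputs are lists of (start, end) tuples on the same chromosome.
    """
    features = sorted(features)
    hits = []
    active = []  # features already reached by the sweep that may still overlap
    k = 0
    for q_start, q_end in sorted(queries):
        # Pull in every feature that starts before this query ends.
        while k < len(features) and features[k][0] < q_end:
            active.append(features[k])
            k += 1
        # Features ending at or before this query's start cannot overlap it,
        # nor any later query, because queries are visited by increasing start.
        active = [f for f in active if f[1] > q_start]
        for f_start, f_end in active:
            if f_start < q_end:
                hits.append(((q_start, q_end), (f_start, f_end)))
    return hits


alignments = [(100, 150), (400, 450), (120, 130), (900, 950)]
exons = [(140, 300), (440, 500), (600, 700)]

hits = find_overlaps(alignments, exons)
for alignment, exon in hits:
    print(alignment, "overlaps", exon)
```

<p>Sorting once and sweeping avoids comparing every alignment with every exon, which is the point of an interval index when both sets are large.</p>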
<p>The GenomeQuest 7.1 release is planned for Friday the 8th of July 2011. I hope to see you there.</p>
<p>Henk Heus, Ph.D.<br />
VP Product Management &amp; Services<br />
GenomeQuest Inc</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2011/06/upcoming-improvements-to-the-genomequest-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcing ChIP-Seq Support</title>
		<link>http://blog.genomequest.com/2009/11/announcing-chip-seq-support/</link>
		<comments>http://blog.genomequest.com/2009/11/announcing-chip-seq-support/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 13:30:19 +0000</pubDate>
		<dc:creator>GenomeQuest</dc:creator>
				<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[ChIP-Seq]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[GenomeQuest 6.0Beta]]></category>
		<category><![CDATA[Next Generation Sequencing]]></category>
		<category><![CDATA[NGS]]></category>
		<category><![CDATA[RNA-Seq]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=75</guid>
		<description><![CDATA[GenomeQuest released its ChIP-Seq workflow this week, available to anyone with a Free Basic Account inside of GenomeQuest. ]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve released our ChIP-Seq workflow this week, available to anyone with a Free Basic Account inside of GenomeQuest. Like all of our NGS workflows, it runs in two basic steps: a mapping step and a downstream analysis step. In this case, of course, the downstream analysis is a peak-finding algorithm. We chose the MACS modeling software for peak modeling. (You can see the entire workflow&#8217;s documentation <a href="http://wiki.genomequest.com/index.php/ChipSeq_Workflow">here</a>.) Integrated into the GenomeQuest Sequence Data Management platform, it outputs a heavily annotated <strong>sequence database</strong>, which can then be interactively filtered, grouped, sorted, and mined for peaks of interest. And this can all be connected to your RNA-Seq and resequencing data to get the global picture.</p>
<p>So now researchers can go from their ChIP-Seq NGS runs directly to gene-based annotation of the peaks found by their biology. Select regions of interest, or genes of interest, or peaks of a certain class, and drill down to see the actual evidence that backs up the call.</p>
<p>We&#8217;re giving away free ChIP-Seq runs to the first 100 people to <a href="http://www.genomequest.com/basic-registration/">sign up</a>.</p>
<p>As always, feel free to leave a comment &#8211; we read every one.</p>
<p>Richard J. Resnick<br />
VP Software and Services</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/11/announcing-chip-seq-support/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloud now?</title>
		<link>http://blog.genomequest.com/2009/10/cloud-now/</link>
		<comments>http://blog.genomequest.com/2009/10/cloud-now/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:38:55 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=71</guid>
		<description><![CDATA[Cloud computing is becoming viable in the minds of the industry. A few solvable roadblocks remain. With infinite computing and infinite data, managing the data and turning it into insight remains the challenge and the opportunity.]]></description>
			<content:encoded><![CDATA[<p>At the <a href="http://www.healthtech.com/sda">CHI NGS conference</a>, I chaired a roundtable of key managers and influencers discussing the opportunities for and challenges to adoption of &#8220;cloud computing&#8221; for NGS applications. As a first observation, the session was well attended and people are thinking deeply about cloud issues. About 16 people participated, including representatives from major pharmaceuticals, agroscience, major medical research core labs, and the NIH.</p>
<p>Here is a transcript of my notes from the roundtable:</p>
<ol>
<li>Some felt end-users increasingly accept the privacy of their data in the hands of a secure cloud provider. Others remarked it remains uncomfortable for some end users who worry when they &#8220;don&#8217;t know where their data lives&#8221;. The roundtable agreed that more end-user education is needed.</li>
<li>Data transfer from the location of data generation to where it is processed remains a bottleneck. However, the problem is more on the upload side since cloud providers tend to have unlimited bandwidth. Corporate and institution-wide networks will have to improve to remedy this bottleneck.</li>
<li>Software application providers will have to develop metering metrics for licensing their applications on cloud resources that can be de-commissioned.</li>
<li>Cloud resources should allow for moving data between desktop applications and the centralized resources.</li>
<li>Commissioning and de-commissioning fixed resources such as databases can be an issue.</li>
<li>From a clinical applicability perspective, cloud providers (and those who run applications on cloud resources) will have to consider how to make their solution suitable for regulatory approval and auditing.</li>
<li>Finally, if an application provider such as GenomeQuest uses a commercial cloud provider such as Amazon EC2, the participants agreed that the application provider and not Amazon is accountable for the security, privacy, and overall robustness of the IT.</li>
</ol>
<p>My takeaways? Cloud computing is becoming viable in the minds of the industry. A few solvable roadblocks remain. With infinite computing and infinite data, managing the data and turning it into insight remains the challenge and the opportunity.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/10/cloud-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>APIs + Sequence Data Management = Haplotype Tables</title>
		<link>http://blog.genomequest.com/2009/09/apis-sequence-data-management-haplotype-tables/</link>
		<comments>http://blog.genomequest.com/2009/09/apis-sequence-data-management-haplotype-tables/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 13:30:52 +0000</pubDate>
		<dc:creator>GenomeQuest</dc:creator>
				<category><![CDATA[Message from Technology Team]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[detection variants]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[haplotype tables]]></category>
		<category><![CDATA[Sequence Data Management]]></category>
		<category><![CDATA[SNP]]></category>
		<category><![CDATA[variant calling]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=68</guid>
		<description><![CDATA[We&#8217;ve heard lots of requests from customers not only to provide them with powerful methods for detecting variants across multiple experiments (or phenotypes, or organisms, or lines), but also to unify all of this data to find knowledge that spans these experiments.
Of course we have our variant calling workflow, just as we integrate with other variant [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve heard lots of requests from customers not only to provide them with powerful methods for detecting variants across multiple experiments (or phenotypes, or organisms, or lines), but also to unify all of this data to find knowledge that spans these experiments.</p>
<p>Of course we have our variant calling workflow, just as we integrate with other variant calling workflows. All of these produce GenomeQuest-native browsable, mineable, and queryable databases. And because of the GQ Engine, we can easily combine sets of 10s or 100s or even 1,000s of these variant databases into a single queryable entity with &#8220;web-speed query performance.&#8221;</p>
<p>Nevertheless, while our customers get the benefit of the combined data, they often ask for more. So today I jumped into the APIs of GenomeQuest and tried to address the simple problem of building a table of SNPs that span a series of experiments. Each SNP should have the specific allele called for each experiment in which it was found. A simple little table designed to be the input into any of a number of linkage disequilibrium mapping packages. I made a GQ Plug-in: 5 lines of code to make it accessible in the user interface, and another 100 lines of code (I&#8217;m wordy) on the back-end to build the table and present it. And so, the multi-experiment haplotype table is born. I might even convince the development team to include it in our next live push.</p>
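<p>For readers who want a feel for the shape of such a table, here is a minimal stand-alone Python sketch. It is not the plug-in described above and does not use the GenomeQuest APIs; the input layout (one dictionary of SNP calls per experiment), the function name and the <code>-</code> placeholder for &#8220;no call&#8221; are assumptions made for illustration.</p>

```python
def build_haplotype_table(calls_by_experiment, missing="-"):
    """Build a SNP-by-experiment table of called alleles.

    calls_by_experiment maps an experiment name to {(chrom, pos): allele}.
    Returns (header, rows); each row is [chrom, pos, allele_1, allele_2, ...].
    """
    experiments = sorted(calls_by_experiment)
    # Every SNP seen in at least one experiment gets a row.
    snps = sorted({snp for calls in calls_by_experiment.values() for snp in calls})
    header = ["chrom", "pos"] + experiments
    rows = []
    for chrom, pos in snps:
        alleles = [calls_by_experiment[name].get((chrom, pos), missing)
                   for name in experiments]
        rows.append([chrom, pos] + alleles)
    return header, rows


# Hypothetical calls from three experiments.
calls = {
    "exp_A": {("chr1", 1050): "A", ("chr1", 2200): "G"},
    "exp_B": {("chr1", 1050): "T", ("chr2", 300): "C"},
    "exp_C": {("chr1", 2200): "G"},
}

header, rows = build_haplotype_table(calls)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

<p>Each row is one SNP and each column one experiment, so the tab-separated output can be reshaped into whatever input format a downstream mapping package expects.</p>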
<p>If you want to hear more or check out the code, drop me a line.</p>
<p>Richard J. Resnick<br />
VP Software and Services</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/09/apis-sequence-data-management-haplotype-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Do we have to fear the &#8220;Public Option?&#8221;</title>
		<link>http://blog.genomequest.com/2009/09/do-we-have-to-fear-the-public-option/</link>
		<comments>http://blog.genomequest.com/2009/09/do-we-have-to-fear-the-public-option/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 12:00:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Informatics Industry]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[GenomeQuest]]></category>
		<category><![CDATA[Pharma]]></category>

		<guid isPermaLink="false">http://blog.genomequest.com/?p=61</guid>
		<description><![CDATA[The recent article "Tear Down This Firewall: Pharma Scientists Call for a Pre-competitive Approach to Bioinformatics" signals a watershed event in the evolution of the commercial bioinformatics industry.]]></description>
			<content:encoded><![CDATA[<p>The recent article &#8220;<a title="Tear Down This Firewall" href="http://www.genomeweb.com//node/923345?emc=el&amp;m=484532&amp;l=1&amp;v=5f9ac6187c">Tear Down This Firewall: Pharma Scientists Call for a Pre-competitive Approach to Bioinformatics</a>&#8221; signals a watershed event in the evolution of the commercial bioinformatics industry.</p>
<p>Faced with the dual forces of budget pressures and the need to invest or die, pharma is giving itself &#8220;permission&#8221; to consider economic alternatives to in-house management of bioinformatics data and applications. This could signal that the time is right for GenomeQuest and its competitors to show pharma that we have the goods to do it &#8220;better, cheaper, and faster&#8221; and with more innovation over time than they can do internally.</p>
<p>Of some concern is the idea of public-private funding models in which pharma collaborates with publicly funded bioinformatics agencies to solve common problems. I hope these monies go into projects that create opportunities for, and do not compete with, privately funded enterprises. A vibrant ecosystem of commercial informatics tools and service providers will, over time, innovate more and lower the costs of discovery. Pharma industry execs should make sure they have surveyed the commercial landscape before committing their budgets to publicly funded agencies.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.genomequest.com/2009/09/do-we-have-to-fear-the-public-option/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

