Earlier I blogged on the distinctions between Infrastructure, Platform, and Software-as-a-Service offerings. The message was that “cloud” is an overloaded word: it takes many forms, each with a different customer value proposition.
A recent commentary in GenomeWeb, “Considering a Cloud? Cost isn’t everything…”, citing the paper “The Real Cost of a CPU Hour”, illustrates the confusion. The paper benchmarks HPC on a dedicated cluster versus bare metal resources on Amazon EC2. The conclusion is foregone, since the commodity EC2 network can’t keep up with tuned applications running on a compute cluster with a high-speed network.
A more informative title for the blogger and the article would be: “Considering Infrastructure-as-a-Service? Beware if your application requires the Message Passing Interface and therefore a High-Speed Network”.
Fortunately, for large-scale Sequence Data Management the predominant application mode is “embarrassingly parallel”, and therefore HPC resources are not needed, except maybe on the web server, where response time is critical. The statistics in the article show that embarrassingly parallel applications run acceptably well on Amazon EC2, with less than 5% overhead added to the computations.
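To make “embarrassingly parallel” concrete, here is a minimal Python sketch. It assumes a hypothetical workload (computing the GC fraction of each sequence) that is not taken from the article; the function names `gc_fraction` and `gc_fractions` are my own illustrations. The point is only that each task reads its own input and never talks to another task, so no MPI and no high-speed interconnect is involved.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gc_fraction(sequence: str) -> float:
    """Fraction of G/C bases in one sequence (illustrative task).

    The result depends only on this one sequence, so the task
    needs no data from any other task.
    """
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_fractions(sequences, workers=4, pool_cls=ProcessPoolExecutor):
    """Apply gc_fraction to every sequence using a pool of workers.

    Workers never exchange messages with each other; the only
    communication is handing out inputs and collecting results.
    Pass pool_cls=ThreadPoolExecutor for a quick in-process check.
    """
    with pool_cls(max_workers=workers) as pool:
        # pool.map preserves input order in the returned results.
        return list(pool.map(gc_fraction, sequences))
```

Because the workers are independent, the same pattern should carry over from cores on one machine to separate cloud instances with little change, which is why a commodity network is less of a bottleneck for this kind of job than for tightly coupled MPI codes.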