Just a few months ago, I couldn’t see any value in this fad known as cloud computing. But then, it was in the comments for that blog post that Tomer Gabel introduced me to the notion of using cloud computing for, well, computing. Prior to that, I had only heard of cloud computing as it pertained to data storage. A prime example of cloud computing resources is Amazon’s Elastic Compute Cloud (EC2).
After reading up on EC2, my first thought vis-à-vis FATE was to migrate my 32- and 64-bit x86 Linux compilation duties to a beefy instance in the cloud. However, the smallest instance currently costs $73/month to leave running continuously. That doesn’t seem cost effective.
My next idea was to spin up an instance for each 32- or 64-bit x86 Linux build on demand when there was new FFmpeg code in need of compilation. That would mean 20 separate instances right now, instances that each wouldn’t have to run very long. This still doesn’t seem like a very good idea since instance computing time is billed by the hour and it’s rounded up. Thus, even bringing up an instance for 8 minutes of build/test time incurs a full billable hour. I’m unclear on whether bring-up/tear-down cycles of, say, 1 minute each, are each billed as separate compute hours, but it still doesn’t sound like a worthwhile solution no matter how I slice it.
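To put a number on that rounding penalty, here’s a back-of-the-envelope sketch in Python; the $.10/hour rate is simply what the $73/month figure above implies, not an official price:

```python
import math

HOURLY_RATE = 0.10      # USD/hour, implied by ~$73/month for the smallest instance
CONFIGS = 20            # 32- and 64-bit x86 Linux build/test configurations
MINUTES_PER_CONFIG = 8  # optimistic build/test time per configuration

# One instance per configuration: each short run is still billed as a full hour.
billable_hours = CONFIGS * int(math.ceil(MINUTES_PER_CONFIG / 60.0))
print("%d billable hours, ~$%.2f per commit" %
      (billable_hours, billable_hours * HOURLY_RATE))
```

Eight minutes of real work per config still means 20 billable hours, or roughly $2 per commit.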
A different architecture occurred to me recently: For each new code revision, spin up a new, single core compute instance and run all 20 build/test configurations serially. This would at long last guarantee that each revision is being built, at least for 32- and 64-bit x86 Linux. It’s interesting to note that I could easily quantify the financial cost of an SVN commit: the $73/month figure works out to roughly $.10 per instance-hour, so if it took, say, 2-3 hours to build and test the 20 configs, that would amount to about $.30 per code commit.
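Mechanically, this is easy to automate. Here’s a hypothetical sketch using the boto library; the AMI ID is a placeholder and the user-data script is only described in comments, since no such code exists in FATE today:

```python
# Hypothetical sketch of launching one build instance per new SVN revision
# with the boto library; the AMI ID is a placeholder, and the user-data
# script below only describes what the real one would do.
import boto

startup_script = """#!/bin/sh
# check out the new revision, run all 20 build/test configs serially,
# report results back to the FATE server, then shut the machine down
# so it stops accruing charges
"""

conn = boto.connect_ec2()   # credentials come from the environment/boto config
reservation = conn.run_instances('ami-12345678',          # placeholder AMI
                                 instance_type='m1.small',
                                 user_data=startup_script)
print("launched %s for this revision" % reservation.instances[0].id)
```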
Those aren’t the only costs, though. There are additional costs for bandwidth, both in and out (at different rates depending on direction). Fortunately, I designed FATE to minimize bandwidth burden, so I wouldn’t be worried about that cost. Understand, though, that data on these compute instances is not persistent. If you need persistent storage, that’s a separate service called Elastic Block Store. I can imagine using this for ccache output. Pricing for this service is fascinating: Amazon charges for capacity used on a monthly basis, naturally, but also $.10 per million I/O requests.
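For the sake of argument, here’s what a ccache volume might cost under that pricing model; the per-GB monthly rate and the usage numbers are guesses on my part, since the only figure I have in front of me is the $.10 per million I/O requests:

```python
GB_MONTH_RATE = 0.10                 # assumed USD per GB-month of allocated EBS storage
IO_REQUEST_RATE = 0.10 / 1000000.0   # USD per I/O request, from the $.10/million figure

ccache_gb = 5            # hypothetical size of a ccache directory for 20 configs
io_per_month = 2000000   # hypothetical I/O requests generated by a month of builds

monthly = ccache_gb * GB_MONTH_RATE + io_per_month * IO_REQUEST_RATE
print("estimated EBS cost: ~$%.2f/month" % monthly)
```

Under those assumptions the ccache volume would be pocket change compared to the compute time.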
This is filed under “Outlandish Brainstorms” for the time being, mostly because of the uncertain and possibly intractable costs (out of my own pocket). But the whole cloud thing is worth keeping an eye on, since it’s a decent wager that prices will only decline from this point on and make these ideas and others workable. How about a dedicated instance loaded with the mphq samples archive, iterating through it periodically and looking for crashers? How about loading up a small compute instance’s 160 GB of space with a random corpus of multimedia samples found all over the web (I think I know people who could hook us up), then periodically processing random samples from the corpus while tracking and logging performance statistics?
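Just to show how simple the crasher hunt could be, here’s a rough sketch; the paths are hypothetical, and the ffmpeg invocation just decodes each sample to a null target and flags anything that dies on a signal:

```python
#!/usr/bin/env python
# Hypothetical crasher-hunt sketch: walk a samples archive, feed each file
# to ffmpeg, and log anything that gets killed by a signal.
import os
import subprocess

SAMPLES_ROOT = "/mnt/samples"       # e.g. an EBS volume holding the archive
FFMPEG = "/usr/local/bin/ffmpeg"    # hypothetical install location

devnull = open(os.devnull, "w")
for dirpath, dirnames, filenames in os.walk(SAMPLES_ROOT):
    for name in filenames:
        sample = os.path.join(dirpath, name)
        # Decode only; discard the output. A negative return code means the
        # process was terminated by a signal, i.e. a crasher worth reporting.
        ret = subprocess.call([FFMPEG, "-i", sample, "-f", "null", "-"],
                              stdout=devnull, stderr=subprocess.STDOUT)
        if ret < 0:
            print("CRASH (signal %d): %s" % (-ret, sample))
```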