Golden Image or Foil Ball?
The essential basis of running services in “the cloud” is that they run in virtual machines, which come with their own idioms and practices for managing them. One of the mainstays of managing virtual machines (‘VMs’) is what’s called the ‘golden image’. These are images built complete with the services they’re supposed to run, so that you can just start them and they’ll immediately join the network and do whatever it is they’re supposed to do.
As the post title suggests, though, I think ‘foil ball’ is a more appropriate term.
You run into three significant problems when you rely on golden images: image sprawl, the cost of updating your images, and drift between image state and running state.
Image sprawl is what you get when the number of images (as opposed to running virtual machines) grows to an essentially unmaintainable figure. Let’s start with a simple LAMP stack: At the least, you’ll have a separate image for your web, database, and application servers. Oh, except you probably need a load balancer image. If you have any support services like DNS, you need an image for those. And so on. You soon find that you have a separate image for every service you provide.
Now that you’ve got this image sprawl, you run into the next issue: Updating these images is relatively expensive, and nearly always results in redundancy. It’s expensive because even trivial changes require a full image rebuild, which is itself a bit complicated. The redundancy comes because you *still* have to do some work on the image once it’s booted as a server, even if it’s minimal. So now you’ve got this complicated image generation process that has some kind of overlap with a simple on-server management process. Another kind of redundancy arrives when you make a change that affects multiple images (e.g., upgrading the same package, or performing the same configuration change): you have to make this change to each of these images separately.
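To make that second kind of redundancy concrete, here is a small Python sketch. The image names and package lists are invented for illustration; the only point is that one shared package upgrade turns into one rebuild for every image that bakes the package in.

```python
# Hypothetical inventory: each golden image and the packages baked into it.
IMAGES = {
    "web": {"openssl", "apache"},
    "app": {"openssl", "php"},
    "database": {"openssl", "mysql"},
    "load-balancer": {"openssl", "haproxy"},
    "dns": {"openssl", "bind"},
}


def images_to_rebuild(images, package):
    """Return the sorted names of images that bake in the given package."""
    return sorted(name for name, packages in images.items() if package in packages)


if __name__ == "__main__":
    affected = images_to_rebuild(IMAGES, "openssl")
    print(f"Upgrading openssl means rebuilding {len(affected)} images: {affected}")
```

With five images sharing one library, a single upgrade is already five separate rebuilds, and the count grows with every service you add.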
Oh, and by the way - this updating process is usually completely unrelated to the process you use to update your non-image machines. Because hey, if a little bit of redundancy is good, then redundant redundancy is especially awesome.
Say you managed all of that, though, and all of your images are correctly updated all of the time. Great, now you just have to reboot every machine on your network to take advantage of the new changes. Of course, this isn’t exactly feasible for every machine all the time, which means you’ve got drift between the desired and actual configuration state.
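That drift is easy to picture with a toy model. The sketch below assumes a made-up fleet where each machine records the version of the image it was booted from, and it simplifies things to a single image lineage; anything booted from an older version is running a configuration you no longer want.

```python
# Hypothetical fleet: the image version each running machine was booted from.
RUNNING = {
    "web01": 7,
    "web02": 5,
    "db01": 7,
    "db02": 4,
}

LATEST_IMAGE_VERSION = 7  # assumed version of the freshly rebuilt image


def drifted(running, latest):
    """Return machines still running an image older than the latest one."""
    return {host: version for host, version in running.items() if version < latest}


if __name__ == "__main__":
    for host, version in sorted(drifted(RUNNING, LATEST_IMAGE_VERSION).items()):
        print(f"{host}: booted from image v{version}, latest is v{LATEST_IMAGE_VERSION}")
```

Every host this reports stays out of date until someone reboots it onto the new image.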
This is why I think maintaining these images is more like managing a foil ball: It’s difficult to pull apart, difficult to press back together, and if you get too many of them they just get in the way.
If, instead, you use a single base image for all of your work (I call these stem cell images, for what are hopefully obvious reasons) and then use a tool like Puppet to configure them once they’re running, you avoid all of the above problems: You have one image to maintain and it’s necessarily simple, you use the same tool and the same configuration base across all machines, and Puppet, with its default 30-minute run interval, keeps your machines updated within 30 minutes of any central change.
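The stem cell idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Puppet’s language or API: the roles, resource names, and the `converge` function are all hypothetical. Every machine starts from the same empty base, and a role-keyed table of desired state is applied repeatedly until nothing is left to change.

```python
# Hypothetical desired state, keyed by role. Every machine boots from the
# same base image; only this table differs between a web server and a database.
DESIRED = {
    "web": {"apache": "installed", "ntp": "installed"},
    "database": {"mysql": "installed", "ntp": "installed"},
}


def converge(machine_state, role, desired=DESIRED):
    """Bring machine_state in line with its role; return the changes made.

    Applying this twice in a row makes no changes the second time, which is
    what lets the same run be repeated on a schedule.
    """
    changes = []
    for resource, wanted in sorted(desired[role].items()):
        if machine_state.get(resource) != wanted:
            machine_state[resource] = wanted
            changes.append(resource)
    return changes


if __name__ == "__main__":
    fresh = {}  # a machine just booted from the base image
    print(converge(fresh, "web"))  # first run applies everything for the role
    print(converge(fresh, "web"))  # second run finds nothing to change
```

A central change is then an edit to one table rather than a rebuild of several images, and running machines pick it up on their next scheduled run without a reboot.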
So, if someone tries to sell you a golden image, don’t buy it - instead choose a tool you can use for every machine in your organization, and push every configuration operation possible into that tool, rather than spreading tasks around to your provisioning, image management, and configuration management tools. This is just as true for tools like Jumpstart and Kickstart - they should do as little as possible, and hand off immediately to a tool like Puppet; well, really, just Puppet.