Just a few years ago, it was common wisdom among tech startups that the cloud was so expensive that at some reasonable scale you’d migrate to your own hardware in a colocation center. Recently, teams have grown wiser about total cost of ownership while cloud costs have fallen, so companies are ditching their hardware left and right to migrate into the cloud. For startups flush with investor money and focused on product, spend maybe doesn’t matter too much at first, but when you’re blowing hundreds of thousands of dollars a month on AWS, you might wonder, “Am I getting my money’s worth?” I’ll bet you’re not.
One of the best features of the cloud is how easy it is to provision resources. This lets your teams move quickly and experiment at unprecedented speed, but sometimes temporary resources get forgotten. A micro instance here, some orphaned disks there, a database cluster, a Lambda pilot, and maybe some large temporary files in blob storage. Each one is often only a few dollars a month, but they sure add up. I once accidentally left a Bigtable cluster running for a month after I developed some deployment automation for it and forgot to clean up after myself!
If you’re not tagging every resource at creation time and asking teams to monitor their spend, you probably have a few forgotten resources lying around. You should require everyone to tag their resources (even temporary ones), integrate tagging in your pipeline, follow a consistent scheme, and have your operations team regularly review and terminate untagged resources. Even better, automate the entire process and set up detection of anomalies in your spending patterns. This also sets you up for more sophisticated accounting as you grow.
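As a minimal sketch of the cleanup-review step, here’s what flagging untagged resources might look like. The required tag keys and the resource inventory below are hypothetical; in practice you’d pull live inventory from your cloud provider’s API and feed the flagged list into your review (or automated termination) process.

```python
from dataclasses import dataclass, field

# Hypothetical tagging scheme -- adapt to your own convention.
REQUIRED_TAGS = {"owner", "team", "expires"}

@dataclass
class Resource:
    resource_id: str
    tags: dict = field(default_factory=dict)

def find_untagged(resources, required=REQUIRED_TAGS):
    """Return resources missing one or more of the required tag keys."""
    return [r for r in resources if not required <= r.tags.keys()]

# Example inventory (made up for illustration).
inventory = [
    Resource("i-0abc", {"owner": "ana", "team": "data", "expires": "2024-07-01"}),
    Resource("i-0def", {"owner": "ben"}),  # missing team and expires
    Resource("vol-123", {}),               # completely untagged
]

flagged = find_untagged(inventory)
print([r.resource_id for r in flagged])  # → ['i-0def', 'vol-123']
```

The same inventory, once consistently tagged, also gives you the per-team cost breakdown you’ll want for more sophisticated accounting later.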
Computing workloads usually require a particular shape across the primary dimensions of CPU, RAM, network throughput, storage throughput, and storage capacity. Precisely sizing instances for their job can be tedious, and many of us just pick something we know will be big enough and don’t worry about it. Does this Spark job run faster on compute-optimized instances? Great, give me a dozen. Oh wait, now it’s blocked on I/O operations to disk. Whatever, on to the next project!
Luckily you get some free help from the major platforms and third-party vendors in the form of Trusted Advisor, Rightsizing Recommendations, and Cost Management. These tools will tell you when your instances are vastly underutilized, but you’ll need more sophisticated analytics-driven tooling combined with human insight to really dial these in. Often there are subtle tradeoffs. For example, reducing the RAM:CPU ratio could shrink the disk cache, causing pressure on I/O throughput. To really get these things right, you’ll need effective load testing or some way to test and measure system performance under real-world load, then dedicate the time and effort to analyze your most expensive systems for footprint fit.
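The core of that analysis is comparing measured peak utilization against provisioned capacity, dimension by dimension. Here’s a toy sketch with made-up utilization numbers and arbitrary thresholds; the point is to find dimensions with lots of slack while protecting the ones that are near their limit (like the RAM/disk-cache tradeoff above).

```python
# Hypothetical peak utilization (fraction of provisioned capacity) per
# dimension, gathered from monitoring over a representative load period.
peak_util = {"cpu": 0.22, "ram": 0.85, "disk_iops": 0.40, "network": 0.15}

# Arbitrary example thresholds -- tune these for your own risk tolerance.
SLACK_THRESHOLD = 0.50     # under 50% at peak = over-provisioned
PRESSURE_THRESHOLD = 0.80  # over 80% at peak = don't shrink this dimension

overprovisioned = sorted(d for d, u in peak_util.items() if u < SLACK_THRESHOLD)
under_pressure = sorted(d for d, u in peak_util.items() if u > PRESSURE_THRESHOLD)

print("shrink candidates:", overprovisioned)  # → ['cpu', 'disk_iops', 'network']
print("keep headroom on:", under_pressure)    # → ['ram']
```

In this (fictional) case you’d look for a memory-heavy instance shape rather than simply downsizing, since RAM is the dimension actually under pressure.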
If you buy your own hardware, it’s depreciating 24/7 whether or not you’re actually using it, so you might as well keep it all running all day. In the cloud, though, you pay for most resources by the hour, by the second, or even per-operation. This gives you the freedom to scale your systems up and down minute-to-minute based on actual load, paying only for the resources you actually need to serve your customers. Most businesses see a diurnal pattern to their traffic, with peaks while their largest market is awake and troughs when those people are asleep. Many have another spike at midnight for batch data processing jobs.
By matching your supply (cloud resources) to your demand (user requests) in real time, you can avoid paying for resources you’re not actually using without really sacrificing anything. If your systems are stateless and boot quickly, this is fairly easy to implement. If your load is unpredictable and your systems take a while to start (large Java application, cache warming, etc.), then you’ll need to be a bit more clever about leaving headroom and scaling proactively. Either way, you can cut your bill by 20% or more by implementing diurnal scaling.
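A minimal sketch of the headroom idea: compute desired capacity from the current request rate, pad it for spikes and slow boot times, and never drop below a redundant minimum. The per-instance capacity, headroom fraction, and floor below are all hypothetical; you’d derive them from your own load testing.

```python
import math

PER_INSTANCE_RPS = 200  # assumed measured capacity of one instance
HEADROOM = 0.30         # keep 30% spare for spikes and slow-booting instances
MIN_INSTANCES = 2       # never scale below a redundant pair

def desired_instances(current_rps: float) -> int:
    """Instances needed to serve current_rps with headroom to spare."""
    needed = current_rps / PER_INSTANCE_RPS
    return max(MIN_INSTANCES, math.ceil(needed * (1 + HEADROOM)))

# Diurnal pattern: overnight trough vs. daytime peak.
print(desired_instances(300))   # → 2
print(desired_instances(5000))  # → 33
```

In production you’d wire a policy like this into your platform’s autoscaler (target-tracking or scheduled actions) rather than computing it by hand, but the arithmetic is the same: the trough runs a fraction of the fleet the peak does, and you only pay for the difference while you need it.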
Python, Ruby, and PHP are incredibly popular languages for building web services. Alas, the very flexibility that makes them great for developer productivity also makes them slow to execute. At scale in production, you want the best bang for your buck. Luckily we have high-performance production runtimes like PyPy (for Python), JRuby (for Ruby), and HHVM (for PHP). Being alternative runtimes, they don’t always work for every application out of the box, but all three are used by many companies at scale in production. If you’re running lots of web serving processes, you should at least experiment with one of these high-performance runtimes. You could see a 5-10x reduction in CPU usage, which translates directly into cloud cost savings.
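Before committing, measure with your own workload. As a starting point, here’s a tiny, self-contained benchmark of an interpreter-bound loop (dict and arithmetic churn, invented purely for illustration) that you can run unchanged under both CPython and PyPy and compare wall-clock times. It’s a rough proxy, not a substitute for profiling your actual service.

```python
import time

def churn(n: int) -> int:
    """A deliberately interpreter-bound workload: dict and arithmetic churn."""
    counts = {}
    total = 0
    for i in range(n):
        key = i % 97
        counts[key] = counts.get(key, 0) + 1
        total += key * counts[key]
    return total

start = time.perf_counter()
result = churn(1_000_000)
elapsed = time.perf_counter() - start
print(f"result={result} elapsed={elapsed:.3f}s")
```

Run it as `python bench.py` and then `pypy bench.py`; on JIT-friendly loops like this, the gap is usually dramatic, though real applications with C-extension dependencies will see less benefit (and are exactly where out-of-the-box compatibility issues show up).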
Big internet companies like Google, Facebook and Amazon have been optimizing their production systems for years. With cloud platforms, every company now has access to the same sort of flexible computing resources that give you unprecedented freedom to pay for only what you need. Actually leveraging that freedom takes a bit of work, but saving 20%, 40%, or even 60% off your cloud computing bill can easily pay back in months.
If you think you might be wasting money in the cloud, we’d love to talk!