Cost planning is an important phase in your design, and it starts with capacity planning. Capacity planning ensures that you’re matching what you need with what you have before your project kicks off. It helps you deliver work on time, within budget, and within scope.
Capacity planning
I recommend that you treat capacity planning not as a one-off task, but as a continuous, iterative cycle.
Start with a forecast that estimates the capacity needed. Monitor and review this forecast. Then allocate by determining the resources required to meet the forecasted capacity. This allows you to estimate costs and balance them against risks and rewards. Once the design and cost are approved, deploy your design and monitor it to see how accurate your forecasts were. This feeds into the next forecast as the process repeats.
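The forecast and allocate steps can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function names, the requests-per-second capacity model, and the 25% headroom figure are all assumptions made for the example.

```python
import math


def instances_needed(forecast_peak_rps: float,
                     rps_per_instance: float,
                     headroom: float = 0.25) -> int:
    """Allocate: convert a forecast peak load into an instance count.

    All parameters are hypothetical. headroom is spare capacity kept
    above the forecast (0.25 means 25% extra).
    """
    if forecast_peak_rps < 0 or rps_per_instance <= 0:
        raise ValueError("load must be >= 0 and per-instance capacity > 0")
    required = forecast_peak_rps * (1 + headroom) / rps_per_instance
    return max(1, math.ceil(required))


def forecast_error(forecast: float, observed: float) -> float:
    """Monitor: relative error of a forecast against the observed value."""
    if forecast <= 0:
        raise ValueError("forecast must be > 0")
    return (observed - forecast) / forecast


# Hypothetical numbers: 1,000 requests/s forecast, 100 requests/s per VM.
print(instances_needed(1000, 100))
# Hypothetical observation: the real peak was 1,100 requests/s.
print(f"forecast error: {forecast_error(1000, 1100):.0%}")
```

A positive error means the forecast was too low. Feeding that figure into the next forecast is what closes the loop.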
Optimizing cost of compute
A good starting point for anybody working on cost optimization is to become familiar with VM instance pricing. It is often beneficial to start with a couple of small machines that can scale out through autoscaling as demand grows. To optimize the cost of your virtual machines, consider using committed use discounts, as these can be significant. If your workloads can tolerate preemptible instances, you can save up to 80%; use autohealing to recreate instances when they are preempted.
Compute Engine also provides sizing recommendations for your VM instances. This useful feature can help you select the right size of VM for your workloads and optimize costs.
Tips
- Start with small VMs, and test to see whether they work.
- Consider using more small machines with autoscaling turned on.
- Consider committed use discounts.
- Consider at least some preemptible/spot instances.
- Use autohealing to recreate VMs when they are preempted.
- Use the insights and AI tools available: rightsizing recommendations alert you when VMs are underutilized.
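To get a feel for how these options compare, here is a rough sketch that applies a discount to a monthly VM bill. The hourly price and the committed use discount are placeholder values invented for the example, not real prices. The 80% figure is the upper bound quoted above for preemptible instances, so actual savings may be lower.

```python
HOURS_PER_MONTH = 730  # common approximation of the hours in a month


def monthly_vm_cost(hourly_price: float, vm_count: int,
                    discount: float = 0.0) -> float:
    """Monthly cost of vm_count identical VMs running all month."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return hourly_price * vm_count * HOURS_PER_MONTH * (1.0 - discount)


# Placeholder figures for illustration only, not real prices.
HOURLY_PRICE = 0.10          # hypothetical on-demand price per VM-hour
COMMITTED_DISCOUNT = 0.30    # hypothetical committed use discount
PREEMPTIBLE_DISCOUNT = 0.80  # "up to 80%" upper bound from the text

scenarios = {
    "on-demand": monthly_vm_cost(HOURLY_PRICE, 4),
    "committed use": monthly_vm_cost(HOURLY_PRICE, 4, COMMITTED_DISCOUNT),
    "preemptible": monthly_vm_cost(HOURLY_PRICE, 4, PREEMPTIBLE_DISCOUNT),
}
for name, cost in scenarios.items():
    print(f"{name:>14}: {cost:8.2f} per month")
```

Replace the placeholders with figures from the current pricing page before drawing any conclusions.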
Optimizing disk cost
A common mistake is to over-allocate disk space. This is not cost-efficient, but selecting a disk is not just about size. It is important to determine the performance characteristics your applications need. What are the I/O patterns? Are reads large and writes small, or vice versa? Is the data mainly read-only? This information will help you select the correct type of disk. SSD persistent disks are significantly more expensive than standard persistent disks, so understanding your I/O patterns can lead to significant savings.
Tips
- Don’t over-allocate disk space.
- Determine what performance characteristics your applications require:
  - I/O pattern: small reads and writes, or large reads and writes
- Configure your instances to optimize storage performance.
- Depending on I/O requirements, consider standard (HDD) persistent disks over SSD persistent disks.
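The effect of both disk size and disk type on cost can be illustrated with a small calculation. The per-GB prices below are placeholders chosen only to reflect that SSD costs more than standard disk, and the disk sizes are made up.

```python
def disk_monthly_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly cost of a provisioned disk (you pay for allocated size)."""
    if size_gb < 0 or price_per_gb_month < 0:
        raise ValueError("size and price must be non-negative")
    return size_gb * price_per_gb_month


# Placeholder prices per GB per month, for illustration only.
STANDARD_PRICE = 0.04
SSD_PRICE = 0.17

# Hypothetical workload: 200 GB of mostly sequential, read-heavy data.
over_allocated_ssd = disk_monthly_cost(1000, SSD_PRICE)
right_sized_ssd = disk_monthly_cost(200, SSD_PRICE)
right_sized_standard = disk_monthly_cost(200, STANDARD_PRICE)

print(f"1000 GB SSD:      {over_allocated_ssd:7.2f}")
print(f" 200 GB SSD:      {right_sized_ssd:7.2f}")
print(f" 200 GB standard: {right_sized_standard:7.2f}")
```

Whether the standard disk is acceptable depends on the measured I/O pattern, which this sketch does not model.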
Optimizing network costs
To optimize network costs, it is best practice to keep machines as close as possible to the data they need to access. Egress falls into several types: within the same zone, between zones in the same region, intercontinental egress, and internet egress.
It is important to be aware of egress charges, which are not all straightforward.
Tips
- Egress in the same zone is free.
- Egress to other Google Cloud services within the same region is generally free.
- Egress within the same region using an external IP address is charged.
- Egress between zones in the same region is charged.
- All internet egress is charged.
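A simple way to see why placement matters is to total egress cost per traffic path. In the sketch below, the path names and per-GB rates are hypothetical placeholders (only the zero rate for same-zone traffic comes from the tips above), and the traffic volumes are invented.

```python
from typing import Mapping


def egress_cost(gb_by_path: Mapping[str, float],
                rate_per_gb: Mapping[str, float]) -> float:
    """Total egress cost given GB transferred and a rate for each path."""
    unknown = set(gb_by_path) - set(rate_per_gb)
    if unknown:
        raise KeyError(f"no rate defined for: {sorted(unknown)}")
    return sum(gb * rate_per_gb[path] for path, gb in gb_by_path.items())


# Placeholder rates per GB, for illustration only.
RATES = {
    "same_zone": 0.00,               # free, per the tips above
    "inter_zone_same_region": 0.01,  # hypothetical
    "intercontinental": 0.08,        # hypothetical
    "internet": 0.12,                # hypothetical
}

# Hypothetical monthly traffic in GB.
traffic = {"same_zone": 5000, "inter_zone_same_region": 2000, "internet": 500}
print(f"current layout: {egress_cost(traffic, RATES):.2f}")

# Same traffic after moving the inter-zone workload next to its data.
colocated = {"same_zone": 7000, "internet": 500}
print(f"co-located:     {egress_cost(colocated, RATES):.2f}")
```

With these made-up numbers, co-locating the workload removes the inter-zone charge entirely, while the internet egress charge is unchanged.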
Best practices
- Prevent over-provisioning your Kubernetes clusters.
- Always compare the costs of different storage alternatives before deciding which one to use.
- Consider alternative services, such as a CDN, caching, messaging, or queueing, to save cost rather than allocating more resources.
- Use the Cloud pricing calculator to estimate costs.
- Create regular billing reports that provide detailed cost breakdowns.
- Visualize spend with Power BI or cloud native tools.
- Set budgets, labels and alerts to keep teams aware of how much they are spending.
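The logic behind a budget alert is simple enough to sketch. The example below shows only the threshold check. It does not call any billing API, and the function name and the 50%, 90%, and 100% thresholds are assumptions chosen for illustration.

```python
from typing import List, Sequence


def crossed_thresholds(spend: float, budget: float,
                       thresholds: Sequence[float] = (0.5, 0.9, 1.0)
                       ) -> List[float]:
    """Return the budget fractions that current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be > 0")
    used = spend / budget
    return [t for t in sorted(thresholds) if used >= t]


# Hypothetical team budget of 1,000 with 950 spent so far this month.
for threshold in crossed_thresholds(950, 1000):
    print(f"alert: spend has reached {threshold:.0%} of budget")
```

In practice, the spend figure would come from billing data filtered by the labels each team applies to its resources.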
Please visit our SRE section or #CyberTechTalk WIKI pages for much more information about designing reliable systems, monitoring and information security.