Cloud Bursting

What is cloud bursting?

Cloud bursting is a way of using a hybrid cloud infrastructure to spread computing load. With a hybrid cloud, an organization makes use of both private cloud infrastructure (either owned and managed by the organization, or provided for their exclusive use) and public cloud computing resources, such as AWS, Google Cloud Platform or Microsoft Azure. While a hybrid cloud deployment can simply refer to a situation where unrelated applications are hosted in different datacenters in a relatively static manner, cloud bursting is a dynamic deployment model that allows organizations to leverage the elasticity of a public cloud when demand exceeds the capacity of their private cloud. With cloud bursting, the private cloud is the primary deployment, with public cloud resources being used to accommodate an increase in traffic. Once the load returns to normal levels, the public cloud resources are decommissioned.

When and why to consider cloud bursting

Extremely large spikes in traffic are not unheard of for consumer websites: advertisements during halftime of the Super Bowl or World Cup final, Black Friday deals and the publication of Stephen Hawking’s PhD thesis have all caused websites to crash due to huge peaks in traffic. In software development, spinning up a staging environment for a complex application in order to conduct pre-release testing can consume a significant amount of capacity, degrading performance of other business applications for weeks at a time. For many industries, data collected from social media, IoT devices, online transactions and logs presents huge potential, but the drain on computing resources when modeling and analyzing these huge data sets can have a significant impact on business-critical tasks.

Cloud bursting is a useful tool for organizations that do not want to move all their computing to a public cloud, perhaps due to concerns over data security and governance, or to avoid vendor lock-in. While most public cloud providers offer scalable solutions, where compute and storage resources can expand as needed, the same is not true for a privately hosted cloud. An organization may provision their private cloud with some extra capacity beyond typical load, but large spikes in demand can push the infrastructure beyond its capacity, slowing everything down. By offloading some of the demand to a public cloud, organizations can accommodate peaks in demand without the capital expenditure involved in purchasing additional servers that sit idle most of the time.

Advantages of cloud bursting

With cloud bursting, enterprises can add an extra layer of flexibility and responsiveness to their IT infrastructure, while avoiding the capital expenditure required to provision their private cloud with sufficient resources to handle spikes in usage. Using a public cloud only for occasional peaks in demand keeps costs to a minimum: the public cloud resources are decommissioned once they are no longer needed, and you only pay for what you use. Public cloud providers typically offer a range of tiers, so you can choose the level of performance required, with a corresponding variation in cost. Contrast this with the costs of acquiring and maintaining hardware for occasional use, and cloud bursting can provide significant savings.

Ways to implement cloud bursting

When considering cloud bursting, one of the main obstacles to overcome is ensuring that the applications you want to offload to a public cloud are compatible with the new environment. For an application to be able to scale seamlessly across private and public cloud infrastructure, considerable advance planning is required in order to create homogeneous environments, manage permissions, and balance the load. A cloud provider that offers both managed private clouds and a public cloud may provide cloud bursting as part of the package. Nevertheless, the performance of an application scaled across a hybrid cloud will be limited by the available network bandwidth, and this option carries the risk of vendor lock-in.
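Whatever the implementation, the bursting decision itself reduces to a simple control loop: monitor utilization of the private cloud, provision public cloud capacity when demand crosses a threshold, and release that capacity once load subsides. The Python sketch below illustrates this loop; the `BurstController` name, the threshold values, and the one-instance-at-a-time scaling are illustrative assumptions, not a production autoscaler:

```python
from dataclasses import dataclass

# Illustrative thresholds; real values depend on your workload and SLAs.
BURST_THRESHOLD = 0.85    # add public capacity above 85% private utilization
RELEASE_THRESHOLD = 0.60  # release public capacity once load falls below 60%

@dataclass
class BurstController:
    """Toy controller deciding when to add or release public cloud capacity."""
    public_instances: int = 0

    def reconcile(self, private_utilization: float) -> str:
        if private_utilization > BURST_THRESHOLD:
            self.public_instances += 1    # provision one more public instance
            return "scale-out"
        if private_utilization < RELEASE_THRESHOLD and self.public_instances > 0:
            self.public_instances -= 1    # decommission unused public capacity
            return "scale-in"
        return "steady"

controller = BurstController()
for load in [0.70, 0.90, 0.95, 0.75, 0.50, 0.40]:
    controller.reconcile(load)

print(controller.public_instances)  # two burst events, two releases -> 0
```

In practice this reconciliation would be driven by real metrics (CPU, queue depth, request latency) and would call a cloud provider's provisioning API rather than incrementing a counter, but the shape of the logic is the same.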

An alternative approach is to design your applications for a hybrid deployment. In this case, it’s important to consider which elements need to be hosted in your private cloud, for example, because of data security or governance concerns, and which would benefit from the scalability of a public cloud. A simple split between back-end services (hosted in a private cloud to maintain control of sensitive data) and front-end (hosted in a public cloud for greater scalability) often makes sense, but the impact of limited network bandwidth on performance still needs to be considered. Some degree of refactoring may be needed to provide locally cached data for the front-end and asynchronous data processing in the back-end. Once you’ve invested in an architecture designed for a hybrid deployment, you can take advantage of cloud services, such as content delivery networks, in order to maintain performance of your application at times of high demand.
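As a rough illustration of the refactoring described above, the sketch below caches back-end reads locally on the front-end and queues writes for asynchronous processing, so repeat requests avoid the slow cross-cloud link. The names (`fetch_from_private_cloud`, `enqueue_write`) are hypothetical stand-ins for real service calls:

```python
import queue
from functools import lru_cache

# Hypothetical back-end lookup: in a real deployment this is a network call
# into the private cloud, so every cache miss pays cross-cloud latency.
backend_calls = 0

def fetch_from_private_cloud(key: str) -> str:
    global backend_calls
    backend_calls += 1
    return f"value-for-{key}"

@lru_cache(maxsize=1024)
def cached_read(key: str) -> str:
    """Front-end read path: serve repeat requests from a local cache."""
    return fetch_from_private_cloud(key)

# Writes are queued and drained asynchronously by the back-end, so the
# front-end never blocks on the slow cross-cloud link.
write_queue: "queue.Queue[tuple[str, str]]" = queue.Queue()

def enqueue_write(key: str, value: str) -> None:
    write_queue.put((key, value))

for _ in range(3):
    cached_read("user:42")       # only the first call crosses the network

enqueue_write("user:42", "new-value")
print(backend_calls, write_queue.qsize())  # 1 1
```

A real system would add cache invalidation when queued writes are applied; the point here is simply that local caching and asynchronous writes decouple front-end responsiveness from cross-cloud bandwidth.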

A further option is to identify applications that can be moved in their entirety to a public cloud, freeing up capacity on the private cloud during peaks in traffic, then moved back when the load returns to normal. However, the list of suitable candidates may be limited by the drivers for choosing a private cloud in the first place, such as data security and governance. This option works best for non-sensitive data and applications that can be optimized for the scalability of a public cloud, such as a cluster of virtual machines created to test an application in development.

Cloud bursting for data analytics

For enterprises with very large – and continuously growing – data sets stored in a private cloud, running analytics workloads can have a significant impact on compute resources, degrading performance for other business-critical applications also running in the private cloud. As new analytics frameworks are added over time, the available resources are put under more and more strain. Given that these data processing workloads are typically run intermittently, they should be an ideal candidate for bursting to a public cloud, leaving capacity for business-critical applications running in the private cloud. In the past, however, the latency introduced by reading data from the private cloud has seriously hampered this approach.

One workaround to the latency problem is to copy the data to the public cloud in order to run the analytics job there. This has several disadvantages. Not only is copying vast quantities of data over a network time- and resource-intensive, but keeping the copy in sync with the live data is a challenge. Even if copying the data is acceptable from a security perspective, you’re effectively restricted to read-only analytics on data that is increasingly stale.

Alluxio’s data orchestration platform makes it possible to run analytics workloads in a public cloud against live data stored in a private cloud, without the excessive latency usually associated with reading data across a hybrid deployment. Alluxio can connect to multiple different data stores, both within your private cloud and in a public cloud, and provides APIs for all major analytics frameworks, so you can continue to scale as you add new data sources and processing workloads. The highly distributed, multi-tier cache minimizes network latency, improving query performance and reducing public cloud costs. With Alluxio, cached data is encrypted and no data is persisted in the public cloud once the processing is complete. As a comprehensive solution for big data analytics, Alluxio can also be used to run analytics workloads in your on-premises data center against data stored in a public cloud.
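To see why a multi-tier cache cuts down cross-cloud reads, consider the simplified read-through cache below, with a hot (memory) tier backed by a warm (disk) tier and a slow remote fetch on a miss. This is a toy illustration of the general technique, not Alluxio’s actual implementation; the `TieredCache` class and its methods are invented for the example:

```python
class TieredCache:
    """Sketch of a two-tier read-through cache: a small fast tier backed by
    a larger warm tier, with misses falling through to remote storage."""

    def __init__(self, fetch_remote):
        self.memory: dict[str, bytes] = {}   # hot tier (e.g. RAM)
        self.disk: dict[str, bytes] = {}     # warm tier (e.g. local SSD)
        self.fetch_remote = fetch_remote     # slow path across the WAN
        self.remote_reads = 0                # count of cross-cloud reads

    def read(self, key: str) -> bytes:
        if key in self.memory:
            return self.memory[key]
        if key in self.disk:
            self.memory[key] = self.disk[key]  # promote to the hot tier
            return self.memory[key]
        self.remote_reads += 1                 # miss: pay cross-cloud latency once
        data = self.fetch_remote(key)
        self.memory[key] = data
        self.disk[key] = data
        return data

# Hypothetical remote fetch standing in for a read from the private cloud.
cache = TieredCache(fetch_remote=lambda key: f"rows-of-{key}".encode())
for _ in range(5):
    cache.read("sales-2023.parquet")
print(cache.remote_reads)  # 1: only the first read crosses to the private cloud
```

Analytics workloads tend to re-read the same working set repeatedly, which is why this pattern lets compute run in the public cloud while most reads are served locally.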

Additional Resources

To learn more about cloud bursting your analytics workload with Alluxio, read our whitepaper.
