If you have a hybrid cloud architecture, using either a VPN or a dedicated high-speed circuit, does the network speed become a bottleneck in the hybrid data use case?

With network speed, users typically think in terms of latency or throughput, depending on the workload. Latency is the larger factor for high-request-volume use cases, such as OLTP workloads, where every request pays for a network round trip. For lower-request-volume OLAP workloads, latency is less of an issue and bandwidth matters more. So yes, throughput can become a bottleneck in the hybrid data use case if it is not addressed appropriately.
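To make the latency versus throughput distinction concrete, here is a rough back-of-the-envelope sketch. It assumes sequential requests, a 20 ms round trip, and a 1 Gbit/s link. The workload sizes and the `transfer_seconds` helper are illustrative assumptions, not measurements from any real deployment.

```python
def transfer_seconds(num_requests, bytes_per_request, rtt_seconds, bandwidth_bytes_per_sec):
    """Split the estimated time of sequential requests into its two parts.

    Returns (latency_part, throughput_part) in seconds:
    one round trip per request, plus payload size divided by bandwidth.
    """
    latency_part = num_requests * rtt_seconds
    throughput_part = num_requests * bytes_per_request / bandwidth_bytes_per_sec
    return latency_part, throughput_part


# Assumed link characteristics (hypothetical values for illustration).
RTT_SECONDS = 0.020                  # 20 ms round trip
BANDWIDTH_BYTES_PER_SEC = 125_000_000  # 1 Gbit/s expressed in bytes per second

# OLTP-like: many small requests (100,000 requests of 1 KB each).
oltp_latency, oltp_throughput = transfer_seconds(
    100_000, 1_000, RTT_SECONDS, BANDWIDTH_BYTES_PER_SEC
)

# OLAP-like: few large scans (100 requests of 1 GB each).
olap_latency, olap_throughput = transfer_seconds(
    100, 1_000_000_000, RTT_SECONDS, BANDWIDTH_BYTES_PER_SEC
)

print(f"OLTP-like: latency {oltp_latency:.1f}s, transfer {oltp_throughput:.1f}s")
print(f"OLAP-like: latency {olap_latency:.1f}s, transfer {olap_throughput:.1f}s")
```

Under these assumptions the OLTP-like workload spends nearly all of its time waiting on round trips, while the OLAP-like workload is dominated by moving bytes across the link.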

While adding a higher-bandwidth dedicated circuit will help, Alluxio data orchestration addresses the hybrid problem by making the data local to the compute nodes:

Hybrid cloud/cross-datacenter analytics

Why?
Alluxio workers are colocated with the compute frameworks, so Alluxio can drastically reduce throughput and latency bottlenecks. The relevant parts of the dataset are requested by the applications and made local to the compute by Alluxio. This helps in two meaningful ways:

1) Reduces or eliminates round trips over the network

2) Reduces the amount of data that must be transferred back and forth
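Both effects can be illustrated with a toy simulation. This is a minimal sketch of a generic read-through cache placed next to the compute, not Alluxio's actual API or implementation. The `RemoteStore` and `LocalCache` classes, the block names, and the access pattern are all hypothetical.

```python
class RemoteStore:
    """Stand-in for storage on the far side of the network link.

    Counts every read as one round trip and tallies the bytes sent.
    """

    def __init__(self, blocks):
        self.blocks = blocks
        self.round_trips = 0
        self.bytes_sent = 0

    def read(self, block_id):
        self.round_trips += 1
        data = self.blocks[block_id]
        self.bytes_sent += len(data)
        return data


class LocalCache:
    """Read-through cache colocated with compute (hypothetical model)."""

    def __init__(self, remote):
        self.remote = remote
        self.cached = {}

    def read(self, block_id):
        # Only go over the network the first time a block is requested.
        if block_id not in self.cached:
            self.cached[block_id] = self.remote.read(block_id)
        return self.cached[block_id]


def make_remote():
    # Three 1,000-byte blocks of dummy data.
    return RemoteStore({name: b"x" * 1_000 for name in ("a", "b", "c")})


# The application rereads the same few blocks many times (60 reads total).
access_pattern = ["a", "b", "a", "c", "a", "b"] * 10

direct = make_remote()
for block_id in access_pattern:
    direct.read(block_id)

cached_remote = make_remote()
cache = LocalCache(cached_remote)
for block_id in access_pattern:
    cache.read(block_id)

print("direct:", direct.round_trips, "round trips,", direct.bytes_sent, "bytes")
print("cached:", cached_remote.round_trips, "round trips,", cached_remote.bytes_sent, "bytes")
```

In this model the savings depend entirely on how often the workload rereads the same data. A workload that touches each block only once would see no reduction from caching alone.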