Tech Talk: From limited Hadoop compute capacity to increased data scientist efficiency

Want to leverage your existing investments in Hadoop with your data on-premises and still benefit from the elasticity of the cloud?

Like other Hadoop users, you most likely run very large and busy Hadoop clusters that are constrained on compute capacity in particular. Bursting HDFS data to the cloud can bring challenges: network latency impacts performance, copying data via DistCp means maintaining duplicate data, and you may have to make application changes to accommodate the use of S3.

“Zero-copy” hybrid bursting with Alluxio keeps your data on-prem and syncs data to compute in the cloud so you can expand compute capacity, particularly for ephemeral Spark jobs. 


Online Meetup: Cybersecurity and fraud detection at ING Bank using Presto & Alluxio on S3

In this online presentation, we present how ING is leveraging Presto (interactive query), Alluxio (data orchestration & acceleration), S3 (massive storage), and DC/OS (container orchestration) to build and operate our modern Security Analytics & Machine Learning platform. We will share the challenges we encountered and how we solved them.


Tech Talk: Accelerating analytics with EMR on your S3 data lake

EMR has become a widely used service for running big data analytics in the public cloud. But slow or inconsistent EMR performance when working against S3 data lakes creates challenges for organizations.

Alluxio is a data orchestration layer for the cloud that increases performance of analytic workloads running on AWS EMR using S3 as the storage. 

Join us for this webinar where we will show you how to set up EMR Spark and Hive with Alluxio so jobs can seamlessly read from and write to your S3 data lake. You’ll see the performance gains with Alluxio in your EMR/S3 stack.


Tech Talk: Accelerating Spark with Kubernetes

Kubernetes is widely used across enterprises to orchestrate computation. And while Kubernetes helps improve flexibility and portability for computation in public/hybrid cloud environments across infrastructure providers, running data-intensive workloads can be challenging.

Moving data efficiently closer to frameworks such as Spark or Presto, co-locating data with those frameworks, and accessing data from multiple or remote clouds are all hard to do. That’s where Alluxio, an open source data orchestration platform, can help.

Alluxio enables data locality with your Spark and Presto workloads for faster performance and better data accessibility in Kubernetes. It also provides portability across storage providers.

In this on-demand tech talk, we’ll give a quick overview of Alluxio and the use cases it powers for Spark/Presto in Kubernetes. We’ll also show you how to set up Alluxio and Spark/Presto to run in Kubernetes.
