Effective Data Engineering in the Cloud World

Cloud has changed the dynamics of data engineering as well as the behavior of data engineers in many ways. This is primarily because a data engineer on premise only dealt with databases and some parts of the hadoop stack.
In the cloud, things are a bit different. Data engineers suddenly need to think different and broader. Instead of being purely focused on data infrastructure, you are now almost a full stack engineer (leaving out the final end application perhaps). Compute, containers, storage, data movement, performance, network — skills are increasing needed across the broader stack. Here are some design concept and data stack elements to keep in mind.

Embracing Data Silos — the journey through a fragmented data world

Over the years of working in the big data and machine learning space, we frequently hear from data engineers that the biggest obstacle to extracting value from data is being able to access the data efficiently. Data silos, isolated islands of data, are often viewed by data engineers as the key culprit or public enemy №1. There have been many attempts to do away with data silos, but those attempts themselves have resulted in yet another data silo, with data lakes being one such example. Rather than attempting to eliminate data silos, we believe the right approach is to embrace them.

Recap: Spark+AI Summit 2019

Alluxio is a proud sponsor and exhibitor of Spark+AI Summit in San Francisco.
What’s Spark+AI Summit? It’s the world’s largest conference that is focused on Apache Spark – Alluxio’s older cousin open source project from the same lab (UC Berkeley’s AMPLab – now RISElab).

Fast Big Data Analytics and Machine Learning Using Alluxio and Spark in Baidu

Strata+Hadoop World 2016 – Baidu deployed Alluxio to accelerate its big data analytics workload. Bin Fan and Haojun Wang explain why Baidu chose Alluxio, as well as the details of how they achieved a 30x speedup with Alluxio in their production environment with hundreds of machines. Based on the success of the big data analytics engine, Baidu is currently expanding the Alluxio and Spark infrastructure to accelerate other applications, such as machine learning.

Tags: , , , , , , , ,