Alluxio learning center

Beginner to advanced topics on analytics, AI/ML, storage, and cloud concepts

Hybrid Cloud

Cloud bursting spreads computing load across the private and public infrastructure of a hybrid cloud. Find out how it works and more with Alluxio.

Presto

Introduction to Presto and commonly asked questions

Presto was originally designed at Facebook to run fast, interactive queries against large Hadoop data warehouses storing petabytes of data.

An Introduction to Presto architecture

A typical Presto deployment will include one Presto Coordinator and any number of Presto Workers. In practice, you might deploy Presto in the cloud or on-prem.

Presto and Hadoop

What is a query engine?

What is a query engine, and more specifically, a SQL query engine? Learn about the benefits of using one, along with examples.

EMR

Introduction to Amazon EMR and MapReduce

Amazon Elastic MapReduce (EMR) is a tool for processing and analyzing big data quickly, using frameworks like Spark, Hive, HBase, and Presto along with storage (like S3) and compute capacity (like EC2).

FAQ on Amazon EMR and EC2

The key differences between Amazon EMR and EC2, and how EMR works.

How to Use Presto on Amazon EMR

Amazon EMR provides scalable compute in the cloud, including interactive queries with Presto, for big data in S3 storage.

AI/GPU

What is GPU acceleration? A Data Science Powerhouse

GPU acceleration, or graphics processing unit acceleration, is a computing technique that uses graphics processing units (GPUs) alongside central processing units (CPUs) to accelerate the performance of data-intensive applications.

Computer Vision 101: All you need to know for Computer Vision Model Training

Computer vision is the ability of computers to recognize, analyze, and process visual content the way humans do. With AI technologies and algorithms, computers can learn to understand the patterns and traits of visual data.

Spark

Introduction to Apache Spark and commonly asked questions

Apache Spark is an open-source analytics framework for big data, AI, and machine learning, best used for large-scale data processing.

An Introduction to the Apache Spark architecture

Apache Spark includes Spark Core and four libraries: Spark SQL, MLlib, GraphX, and Spark Streaming. Individual applications will typically require Spark Core and at least one of these libraries.

HDFS

Introduction to Hadoop Distributed File System (HDFS)

Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It is a distributed file system that provides high-throughput access to application data.

Basic HDFS File Operations Commands

Learn basic HDFS commands in Linux that let you create and list directories, and move, delete, and read files, and more.

Sign up for a Live Demo or Book a Meeting with a Solutions Engineer

