Getting Started with Alluxio + Apache Spark + AWS S3

ALLUXIO BAY AREA MEETUP AT INTEL 2016

Intel and Big Data Ecosystem

The big data ecosystem is moving with massive energy, and customers in healthcare, retail, transportation, and other fields are benefiting significantly from the business insights derived. As data growth continues, storage technologies and distributed memory systems are becoming even more important for real-time decision making and insight discovery. Intel is excited to work with developer communities on Alluxio and to optimize Alluxio solutions on Intel platforms. In this talk, Ziya will discuss Intel’s optimization work in this area, open source contributions, and industry use cases.

Getting Started with Alluxio + Spark + S3

Enabling and improving the integrations between different systems in the Big Data ecosystem is a main focus of Alluxio. The most common stack places compute frameworks on top of Alluxio and under storage systems below it. This architecture brings significant benefits to both sides of Alluxio: compute frameworks gain a level of abstraction and a performance boost, allowing fast data access regardless of the storage type, while under storage systems can be easily integrated with Big Data applications and treated as fast storage. This presentation will outline the benefits Alluxio brings to the stack and give a demo of some of these advantages with Spark, Alluxio, and S3.
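The layering described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Alluxio's API: the class names `UnderStore` and `CachingLayer`, the path `/data/events.csv`, and the sample data are all hypothetical. It assumes a simple read-through policy in which the middle layer fetches from the under storage on the first read and serves later reads from memory.

```python
class UnderStore:
    """Hypothetical stand-in for a slow under storage system such as S3."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.reads = 0  # counts how often the slow store is actually hit

    def read(self, path):
        self.reads += 1
        return self._objects[path]


class CachingLayer:
    """Hypothetical in-memory layer sitting between compute and under storage."""

    def __init__(self, under_store):
        self._under = under_store
        self._memory = {}

    def read(self, path):
        # Read-through: fetch from the under store only on a cache miss.
        if path not in self._memory:
            self._memory[path] = self._under.read(path)
        return self._memory[path]


def total_value(storage, path):
    """A 'compute job' that only needs a read(path) method, not a storage type."""
    rows = storage.read(path).decode().strip().splitlines()[1:]  # skip header
    return sum(int(row.split(",")[1]) for row in rows)


s3 = UnderStore({"/data/events.csv": b"id,value\n1,10\n2,32\n"})
layer = CachingLayer(s3)

first = total_value(layer, "/data/events.csv")   # miss: goes to the under store
second = total_value(layer, "/data/events.csv")  # hit: served from memory
print(first, second, s3.reads)
```

In this sketch the job runs twice but the under store is read only once, which is the effect the abstract attributes to placing a fast layer between compute and storage.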

Getting Started with Alluxio + Spark + S3 from Alluxio, Inc.

Accessing Data Anywhere with Unified Namespace

Alluxio’s unified namespace is an abstraction that makes it possible to access multiple independent storage systems through the same namespace and interface. Leveraging Alluxio’s unified namespace provides the following benefits:

Future-proofing your applications: applications can communicate with different storage systems, both existing and new, using the same namespace and interface; seamless integration between applications and new storage systems enables faster innovation
Enabling new workloads: integrating an application or a storage system with Alluxio is a one-time effort which enables the application to access many different types of storage systems and the storage system to be accessed by many different types of applications
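One way to picture a unified namespace is as a mount table that maps paths in a single namespace onto different storage systems. The sketch below is a simplified model under that assumption, not Alluxio's implementation: the `UnifiedNamespace` class, its methods, and the bucket and host names are hypothetical, and resolution is assumed to use the longest matching mount point.

```python
class UnifiedNamespace:
    """Hypothetical mount table mapping namespace paths to storage URIs."""

    def __init__(self):
        self._mounts = {}

    def mount(self, namespace_path, storage_uri):
        # Normalize trailing slashes so prefix matching is unambiguous.
        self._mounts[namespace_path.rstrip("/")] = storage_uri.rstrip("/")

    def resolve(self, path):
        # Collect every mount point that is a prefix of the requested path.
        candidates = [
            m for m in self._mounts
            if path == m or path.startswith(m + "/")
        ]
        if not candidates:
            raise KeyError(f"no storage system mounted for {path}")
        # Assumption: the longest (most specific) mount point wins.
        mount_point = max(candidates, key=len)
        remainder = path[len(mount_point):].lstrip("/")
        base = self._mounts[mount_point]
        return f"{base}/{remainder}" if remainder else base


ns = UnifiedNamespace()
ns.mount("/s3", "s3://example-bucket/logs")           # hypothetical bucket
ns.mount("/hdfs", "hdfs://namenode:9000/warehouse")   # hypothetical cluster

s3_uri = ns.resolve("/s3/2016/05/events.csv")
hdfs_uri = ns.resolve("/hdfs/sales")
print(s3_uri)
print(hdfs_uri)
```

An application written against `resolve` never names a storage system directly, so mounting a new system under a new path requires no change to the application code.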

Accessing Data Anywhere with Unified Namespace from Alluxio, Inc.

Open Source Memory Speed Virtual Distributed Storage

Alluxio is a memory-speed virtual distributed storage system. It originated at AMPLab, UC Berkeley in 2012, the same lab that produced Apache Mesos and Apache Spark. Soon after, it became an open source project and was deployed at many companies. Since then, Alluxio has attracted more than 250 contributors from over 50 institutions. In 2015, Alluxio's creators and top committers founded a company to further accelerate the development of Alluxio. As this is the first meetup after the rebranding from Tachyon to Alluxio, we will first present exciting updates and new developments in the community, followed by the many new features and improvements in the Alluxio 1.0 and 1.1 releases. These features include new Alluxio APIs, improved storage system integrations, usability features, and performance improvements.

Open Source Memory Speed Virtual Distributed Storage from Alluxio, Inc.

New Features and Improvements in Alluxio 1.0 and 1.1
