Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write to allow users to complete writes to Alluxio file system and return quickly with high application performance, while still providing users with peace of mind that data will be persisted to the chosen under storage like S3 in the background.
From time to time, a question pops up on the user mailing list referencing job failures with the error message “java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found”. This post explains the reason for the failure and the solution to the issue when it occurs.
This error indicates the Alluxio client is not available at runtime. This causes an exception when the job tries to access the Alluxio filesystem but fails to find the implementation of Alluxio client to connect to the service.
TalkingData leverages Alluxio as a single platform to manage all the data across disparate data sources on-premise and in the cloud. Alluxio removes the complexity of our environment by abstracting the different data sources and providing a unified interface. Applications simply interact with Alluxio, and Alluxio manages data access to different storage systems on behalf of the applications. Alluxio effectively democratizes data access, allowing data scientists and analysts in various business units to accomplish their goals without needing to consider where the data is located or having to go to central IT or the engineering team to transfer or prepare the data.
Tachyon: A reliable memory-centric distributed storage system presentation by founder Haoyuan Li.
We introduce Tachyon, a memory centric fault-tolerant distributed file system, which enables reliable file sharing at memory-speed across cluster frameworks, such as Spark and MapReduce.