Developer Tip: Why Did My Job Fail with Error Message "Class alluxio.hadoop.FileSystem not found"?

October 30, 2018

Bin Fan

From time to time, a question pops up on the user mailing list referencing job failures with the error message "java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found". This post explains the reason for the failure and the solution to the issue when it occurs.

Why does this happen?

This error indicates that the Alluxio client is not available at runtime. The job throws the exception when it tries to access the Alluxio filesystem but cannot find the Alluxio client implementation it needs to connect to the service.

The Alluxio client is a Java library that defines the class alluxio.hadoop.FileSystem, which invokes Alluxio services on behalf of user requests (such as creating a file or listing a directory). It is typically pre-compiled into a jar file named alluxio-1.8.1-client.jar (for v1.8.1) and distributed with the Alluxio tarball. To work with an application, this file must be on the JVM classpath so that it can be discovered and loaded into the JVM process. If the application JVM cannot find this file on the classpath, it has no implementation of the class alluxio.hadoop.FileSystem and therefore throws the exception.
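
Before changing any configuration, it can help to confirm whether the client jar appears on the classpath at all. The following is a minimal sketch, not something shipped with Alluxio: list_alluxio_client_jars is a hypothetical helper that splits a colon-separated classpath string and prints the entries whose file name looks like an Alluxio client jar. It assumes the jar keeps its default name (alluxio-<version>-client.jar) and that the classpath lists jars explicitly; wildcard entries such as lib/* are not expanded.

```shell
# Hypothetical helper (not part of Alluxio or Hadoop).
# $1: a colon-separated classpath string.
# Prints each entry whose file name looks like an Alluxio client jar.
# Returns non-zero when no entry matches.
list_alluxio_client_jars() {
  printf '%s\n' "$1" | tr ':' '\n' \
    | grep -E '(^|/)alluxio-[^/]*client[^/]*\.jar$'
}

# Example usage, assuming the hadoop command is on the PATH:
#   list_alluxio_client_jars "$(hadoop classpath)" \
#     || echo "Alluxio client jar not found on the Hadoop classpath"
```

An empty result suggests that the jar still needs to be added, or that it is only reachable through a wildcard entry that this check does not expand.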

How to address this problem

The solution is to ensure the Alluxio client jar is on the classpath of every application that accesses Alluxio. Several factors should be considered when troubleshooting.

If the application is distributed across multiple nodes, this jar should be distributed to all these nodes. Depending on the compute framework, this configuration can be very different:

For MapReduce or YARN applications, append the path to the Alluxio client jar to mapreduce.application.classpath or yarn.application.classpath so that each task can find it. Alternatively, supply the path as the argument of -libjars, for example:

$ bin/hadoop jar \
libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount \
-libjars /<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar \
<INPUT FILES> <OUTPUT DIRECTORY>
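
A wrong path passed to -libjars tends to surface only after the job has been submitted. As a rough sketch, a small pre-flight check can catch it earlier. check_libjars is a hypothetical helper, not a Hadoop command; it assumes the jars are local files and that the list is comma-separated, as -libjars expects.

```shell
# Hypothetical helper (not part of Hadoop).
# $1: a comma-separated list of local jar paths, as passed to -libjars.
# Returns 0 only if the list is non-empty and every entry is a readable file.
check_libjars() {
  [ -n "$1" ] || { echo "empty jar list" >&2; return 1; }
  old_ifs=$IFS
  IFS=,
  for jar in $1; do
    if [ ! -f "$jar" ] || [ ! -r "$jar" ]; then
      IFS=$old_ifs
      echo "missing or unreadable jar: $jar" >&2
      return 1
    fi
  done
  IFS=$old_ifs
}

# Example usage, with the same placeholder path as above:
#   check_libjars "/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar" || exit 1
```

This only verifies the node where the job is launched; it says nothing about whether other nodes can see the jar.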

Depending on the Hadoop distribution, it may also help to set $HADOOP_CLASSPATH:

export HADOOP_CLASSPATH=/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar:${HADOOP_CLASSPATH}

For Spark applications, set the following in spark/conf/spark-defaults.conf on every node running Spark, then restart the long-running Spark server processes:

spark.driver.extraClassPath /<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar
spark.executor.extraClassPath /<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar

For Hive, set environment variable HIVE_AUX_JARS_PATH in conf/hive-env.sh:

export HIVE_AUX_JARS_PATH=/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar:${HIVE_AUX_JARS_PATH}

In some cases, one compute framework relies on another. For example, a Hive service can use MapReduce as the engine for distributed queries. In this case, the classpath must be configured correctly for both Hive and MapReduce.
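
When the same jar has to be added in several places, export lines of the form JAR:${VAR} are easy to get slightly wrong: sourcing the file twice duplicates the entry, and an unset variable leaves a trailing colon, that is, an empty classpath entry, which a JVM may treat as the current directory. The sketch below shows one way to avoid both. prepend_to_classpath is a hypothetical helper, and it assumes a colon-separated value such as HADOOP_CLASSPATH.

```shell
# Hypothetical helper (not part of Alluxio, Hadoop, or Hive).
# $1: jar path to add, $2: current colon-separated value (may be empty).
# Prints the new value: unchanged if the jar is already listed,
# the jar alone if the value was empty, otherwise jar:value.
prepend_to_classpath() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;      # already present, keep as-is
    "::")     printf '%s\n' "$1" ;;      # empty value, no trailing colon
    *)        printf '%s\n' "$1:$2" ;;   # prepend to existing entries
  esac
}

# Example usage, with the same placeholder path as above:
#   export HADOOP_CLASSPATH="$(prepend_to_classpath \
#     "/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar" "$HADOOP_CLASSPATH")"
```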

Summary

For applications to work with Alluxio, the Alluxio client jar file must be on their classpath.
How to add the Alluxio client jar file to the classpath depends on the compute framework.
