Many companies we talk to have on premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, running analytics such as Hive, Spark, Presto and machine learning are experiencing sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
Proven at global web scale in production for modern data services, Alluxio is the developer of open source data orchestration software for the cloud. Alluxio moves data closer to big data and machine learning compute frameworks in any cloud across clusters, regions, clouds and countries, providing memory-speed data access to files and objects.
Alluxio was created because we saw a need for innovation at the data layer rising from the growing complexity of connecting multiple compute frameworks to an ever-expanding mix of storage systems and formats. Our approach uses a memory-centric architecture that abstracts files and objects in underlying persistent storage systems and provides a shared data access layer for compute applications.
Alluxio is not a persistent storage system. Instead, Alluxio serves as a data access layer, residing between any persistent storage system (such as Amazon S3, Microsoft Azure Object Store, Apache HDFS or OpenStack Swift) and computation frameworks (such as Apache Spark, Presto or Hadoop MapReduce). This whitepaper provides a technical overview of the Alluxio architecture and describes the data flow for common read and write scenarios.
Learn more about data unification for the digital economy and how Alluxio’s data orchestration brings your data to your compute, wherever it’s located.
Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage.
The world is entering the data revolution era. Along with the latest advancements of the Internet, Artificial Intelligence (AI), mobile devices, autonomous driving, and Internet of Things (IoT), the amount of data we are generating, collecting, storing, managing, and analyzing is growing exponentially. To store and process these data has exposed tremendous challenges and opportunities. … Continued
In the past year, the Alluxio project experienced significant improvement in performance and scalability and was extended with key new features including tiered storage, transparent naming, and unified namespace.