Community feedback invited on biggest Alluxio release ever, including support for more than one billion files and new Alluxio POSIX API for AI applications
SAN MATEO, Calif., March 07, 2019 (GLOBE NEWSWIRE) -- Alluxio, developer of the world’s first software system that unifies data at memory speed, today announced the preview release of Alluxio 2.0, the most ambitious platform upgrade since the inception of Alluxio. The preview, now available for free download, is the project’s largest open source release to date, adding more new features than any previous release. It is designed to let the community experiment with new capabilities and explore Alluxio for new use cases, such as simplifying data engineering and data access for AI model training.
“Today, our users already deploy Alluxio at very large scale with many thousand node single cluster production deployments across telecommunications, retail and internet companies,” said Haoyuan Li, CEO and co-founder of Alluxio. “This release allows our users to take Alluxio deployments to the next level of scale with support for extreme data requirements. Our users as well as the data engineering community will find a much more intuitive interface with greatly expanded capabilities to help them run analytics and AI workloads on private, public or hybrid cloud infrastructures leveraging valuable data wherever it might be stored.”
“At China Unicom, we use Alluxio at scale as a core component of our modern data stack along with Apache Spark, HDFS, Hive and Apache Kafka. We are excited about Alluxio 2.0, particularly the new metadata management and scale out capabilities that will allow us to continue elastically scaling our deployment for the explosive data growth we see coming,” said Ce Zhang, Senior Software Engineer at China Unicom Software Research Institute.
“AVA — our cloud-native deep learning platform — is built on Tensorflow, Caffe, Alluxio and KODO (a customized object store and CEPH). Alluxio orchestrates data movement from storage systems to data science environments, eliminating complex data engineering tasks and speeding up model training. Alluxio 2.0’s improved file system API to access data stored in any storage system will allow for accelerating machine learning training even further for faster innovation,” said Chaoguang Li, Technical Director of Atlab at Qiniu Cloud.
The Alluxio 2.0 preview release provides new features across several key areas:
Support for hyperscale data workloads:
- Support for more than 1 billion files – A new tiered metadata storage option for files and objects lets the unified namespace scale beyond a billion files. Metadata for hot data is kept in process memory, while the rest is managed by Alluxio outside process memory.
- Highly distributed data services – Alluxio 2.0 introduces the Alluxio Job Service, a distributed, clustered service that now runs data operations such as replication, persistence, cross-storage moves and distributed loads, enabling high performance at massive scale.
- Adaptive replication for increased data locality – A new feature lets users configure a range for the number of copies of data stored in Alluxio. Alluxio then manages the copies automatically within that range.
- High availability with embedded journal – A new fault-tolerance and high-availability mode for file and object metadata, called the embedded journal, uses the Raft consensus algorithm and does not depend on any external storage system. This is particularly helpful for abstracting object storage.
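The tiered metadata idea in the first item above can be illustrated with a small conceptual sketch. It is not Alluxio code: the class name `TieredMetadataStore`, the `hot_capacity` parameter, the least-recently-used eviction policy and the plain dictionary standing in for storage outside process memory are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredMetadataStore:
    """Toy two-tier metadata store (hypothetical, not Alluxio's implementation)."""

    def __init__(self, hot_capacity):
        if hot_capacity < 1:
            raise ValueError("hot_capacity must be at least 1")
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # recently used entries, kept in process memory
        self.cold = {}            # stand-in for storage outside process memory

    def put(self, path, metadata):
        # New or updated entries always land in the hot tier.
        self.cold.pop(path, None)
        self.hot[path] = metadata
        self.hot.move_to_end(path)
        self._evict()

    def get(self, path):
        if path in self.hot:
            self.hot.move_to_end(path)  # mark as most recently used
            return self.hot[path]
        if path in self.cold:
            # Promote a cold entry back into the hot tier on access.
            metadata = self.cold.pop(path)
            self.hot[path] = metadata
            self._evict()
            return metadata
        raise KeyError(path)

    def _evict(self):
        # Demote least recently used entries until the hot tier fits.
        while len(self.hot) > self.hot_capacity:
            old_path, old_metadata = self.hot.popitem(last=False)
            self.cold[old_path] = old_metadata


store = TieredMetadataStore(hot_capacity=2)
store.put("/data/a", {"size": 1})
store.put("/data/b", {"size": 2})
store.put("/data/c", {"size": 3})  # "/data/a" is demoted to the cold tier
```

In this sketch the namespace can grow past what the hot tier holds, because only the most recently used entries occupy memory while every entry remains reachable through `get`.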
Enabling machine learning and deep learning workloads on any storage:
Machine learning and deep learning frameworks need to extract data from Hadoop and object stores, typically a very manual and time-consuming process.
- Alluxio POSIX API – Alluxio’s FUSE feature provides a POSIX-compatible API so that frameworks such as TensorFlow and Caffe, as well as other Python-based models, can directly access data from any storage system via Alluxio using traditional file system access.
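To illustrate what "traditional file system access" means in practice, the sketch below reads training data with nothing but Python's standard file calls. It assumes an Alluxio namespace has already been mounted through FUSE at some local directory; the helper names and the sample file are hypothetical, and a temporary directory stands in for the mount point so the example runs anywhere.

```python
import os
import tempfile


def list_training_files(mount_point, suffix=".csv"):
    """Return sorted paths under mount_point that end in suffix."""
    matches = []
    for root, _dirs, files in os.walk(mount_point):
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return sorted(matches)


def read_records(path):
    """Read non-empty lines from a file using ordinary file I/O."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


# Assumption: a temporary directory plays the role of the FUSE mount point.
# With a real mount, mount_point would be the directory chosen at mount time.
with tempfile.TemporaryDirectory() as mount_point:
    sample = os.path.join(mount_point, "train.csv")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("1,cat\n2,dog\n")

    files = list_training_files(mount_point)
    records = read_records(files[0])
```

The point of the sketch is that the reading code contains no storage-specific client: any framework that can open a local path could consume the same files.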
Better storage abstraction for completely independent and elastic compute:
- Support for HDFS clusters across different versions – Explosive growth of data has led enterprises to have many data silos, including multiple Hadoop clusters running many different versions. Unified access across these clusters is currently very difficult. With Alluxio 2.0, users can connect multiple HDFS clusters of any version to Alluxio and unify data access across them.
- Active sync with Hadoop – A new capability integrates with HDFS iNotify to pick up data and metadata changes to files stored in Hadoop, so that applications accessing data via Alluxio proactively receive the latest updates.
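The event-driven approach behind active sync can be sketched conceptually as follows. This is not the HDFS or Alluxio API: the event kinds (`create`, `update`, `rename`, `delete`), the tuple layout and the function name `apply_events` are hypothetical, chosen only to show how a cached view can follow change notifications instead of rescanning the underlying storage.

```python
def apply_events(cache, events):
    """Apply (kind, path, payload) change events to a dict of cached metadata.

    Hypothetical event model for illustration only.
    """
    for kind, path, payload in events:
        if kind in ("create", "update"):
            cache[path] = payload
        elif kind == "delete":
            cache.pop(path, None)
        elif kind == "rename":
            # For renames, payload carries the new path.
            if path in cache:
                cache[payload] = cache.pop(path)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return cache


cache = {"/logs/a": {"size": 10}}
events = [
    ("update", "/logs/a", {"size": 20}),
    ("create", "/logs/b", {"size": 5}),
    ("rename", "/logs/b", "/logs/c"),
    ("delete", "/logs/a", None),
]
apply_events(cache, events)
```

After the four events are applied, the cached view holds only the renamed file, matching the state of the hypothetical underlying store without a full listing.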
For more information on the biggest advance in Alluxio capabilities ever, register here for the free San Francisco Bay Area 2.0 Preview Meetup on March 14, 2019 and the free 2.0 Preview webinar on March 28, 2019.
- Announcing Alluxio 2.0 Preview – Release overview blog
- 2.0 Preview Release – Documentation
- Download Alluxio 2.0 Preview
- Alluxio Community Slack Channel
Proven at global web scale in production for modern data services, Alluxio is the world’s first system that unifies data at memory speed. Named a Top 10 Storage Startup by CRN in 2018, Alluxio provides a single source virtual data layer connecting data analytics and machine learning frameworks to data running on premises, in public clouds or in multi / hybrid cloud environments. Intelligent data tiering and data management deliver consistent high performance to customers in financial services, high tech, retail and telecommunications. Venture-backed by Andreessen Horowitz and Seven Seas Partners, Alluxio was founded at UC Berkeley’s AMPLab by the creators of the Tachyon open source project. For more information, contact email@example.com or follow us on LinkedIn or Twitter.
Lonn Johnston for Alluxio