Accelerate Auto Data Tagging with Alluxio and Spark in Hybrid Cloud – A Practice in WeRide
This blog shares the practice of using Alluxio and Spark to accelerate the auto data tagging system in WeRide, an autonomous driving technology company.
Alluxio started as a virtual distributed file system, a research project out of the AMPLab at U.C. Berkeley. Alluxio foresaw the need for agility when accessing large data stores separated from compute engines like Hadoop or Spark.
Several years and more than a thousand committers later, Alluxio has blossomed into the industry’s leading data orchestration platform for analytics and AI/ML. But as with any new type of technology, the best way to use it depends on your data environment, computational workloads, issues, and goals.
Tags: cloud, data orchestration, datacenter, hybrid cloud, on-prem object storage, overview, storage, use cases
Alluxio’s distributed systems experts explore today’s data access challenges and open source data orchestration solutions for modernizing your data platform.
Tags: ai, analytics, data orchestration, data platform, hybrid cloud, webinar
Join Alluxio’s distributed systems experts as they explore today’s data access challenges and open source data orchestration solutions for modernizing your data platform.
Alluxio 2.5 focuses on improving interface support to broaden the set of data-driven applications that can benefit from data orchestration. The POSIX and S3 client interfaces have greatly improved in performance and functionality, driven by widespread usage and demand from AI/ML workloads and system administration needs. Alluxio is rapidly evolving to meet the needs of enterprises that are deploying it as a key component of their AI/ML stacks.
Tags: alluxio engineering, data orchestration, hybrid cloud, office hour, release
Many companies we talk to have on-premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, analytics engines such as Hive, Spark, and Presto, along with machine learning workloads, experience sluggish response times because data and compute sit in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
Tags: analytics, data orchestration, hybrid cloud, machine learning, webinar
Many companies we talk to have on-premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, analytics engines such as Hive, Spark, and Presto, along with machine learning workloads, experience sluggish response times because data and compute sit in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
Tags: analytics, data orchestration, hybrid cloud, machine learning, overview, webinar
In this talk, we will explain Alluxio’s Data Orchestration for the hybrid cloud era and how it solves the performance and data management challenges we see.