TalkingData’s largest data broker, provides data intelligence solutions and processes over 20 terabytes of data and more than one billion session requests per day. TalkingData deployed Alluxio to unify disparate cloud, on-premise, and hybrid data sources for a range of analytics applications. The architecture provides self-service data access for data scientists and engineers, eliminating the need for ETL or manual IT assistance.
Tag: compute storage separation
The world is entering the data revolution era. Along with the latest advancements of the Internet, Artificial Intelligence (AI), mobile devices, autonomous driving, and Internet of Things (IoT), the amount of data we are generating, collecting, storing, managing, and analyzing is growing exponentially. To store and process these data has exposed tremendous challenges and opportunities. … Continued
The primary appeal of a coupled compute-storage architecture, an architecture where the computation is happening on the machines where the data resides, is the performance possible by bringing the compute engine to the data it requires; however, the costs of maintaining such tight-knit architectures are gradually overtaking the performance benefits. Especially with the popularity of cloud resources, being able to independently scale compute and storage results in large cost savings and cheaper maintenance. In addition, data has become the new oil, and all modern organizations are looking to capture as much data as possible.
Strata Singapore 2017 – Hear about how to decouple compute and storage with Alluxio, exploring the decision factors and considerations, along with production best practices and solutions.
In this talk, we briefly introduce Alluxio, present several ways how Alluxio can help Spark be more effective, show benchmark results with Spark RDDs & DataFrames, and describe production deployments with both Alluxio and Spark working together.
Strata+Hadoop 2016 – In the past year, the Alluxio project experienced a tremendous improvement in performance and scalability and was extended with key new features including tiered storage, transparent naming, and unified namespace. At the same time, the Alluxio ecosystem has expanded to include support for more under storage systems and computation frameworks.
Southern California Data Science Conference 2016 – In the past year, the Alluxio project experienced a tremendous improvement in performance and scalability and was extended with key new features including tiered storage, transparent naming, and unified namespace. At the same time, the Alluxio ecosystem has expanded to include support for more under storage systems and computation frameworks.
A demo video of how Ceph could be integrated with Big Data solution through Alluxio.
This is an excerpt from the Accelerating On-Demand Data Analytics with Alluxio whitepaper, which includes a detailed implementation guide in addition to this high level overview.
In the Big Data world, it is often the case that only a subset of the total data is relevant for answering the question at hand. As a result, the total cost of ownership for long running clusters for analytics is high while utilization is low, especially when adopting an architecture of co-locating compute and storage.