Grafana, a comprehensive metrics visualization tool, fits into this process by pulling the metrics that systems like Alluxio export through a sink and visualizing them in a more useful form. This guide covers how to set up Grafana and Graphite, a sink supported by Alluxio that stores metrics in a time-series database, and explores some of the possibilities the combination offers.
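To make the sink idea concrete, here is a minimal Python sketch of what "putting metrics into Graphite" means on the wire. Graphite's Carbon daemon accepts a plaintext protocol in which each data point is one line of the form "<metric.path> <value> <timestamp>", sent over TCP (port 2003 by default). The function names, metric paths, and host below are hypothetical and for illustration only. Alluxio ships its own Graphite sink, enabled through its metrics configuration, so check the Alluxio metrics documentation for the actual settings rather than writing a sender like this yourself.

```python
import socket
import time
from typing import Mapping, Optional


def format_graphite_lines(metrics: Mapping[str, float], timestamp: Optional[int] = None) -> str:
    """Render metrics in Graphite's plaintext protocol, one '<path> <value> <timestamp>' line each."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    lines = []
    for path, value in metrics.items():
        # Spaces are the field separator in the protocol, so they cannot appear in a path.
        if " " in path:
            raise ValueError(f"metric path must not contain spaces: {path!r}")
        lines.append(f"{path} {value} {ts}\n")
    return "".join(lines)


def send_to_graphite(metrics: Mapping[str, float], host: str = "localhost",
                     port: int = 2003, timeout: float = 5.0) -> None:
    """Push one batch of metrics to a Carbon plaintext listener (assumed at host:port)."""
    payload = format_graphite_lines(metrics).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

For example, format_graphite_lines({"alluxio.example.BytesRead": 1024}, timestamp=1700000000) yields the single line "alluxio.example.BytesRead 1024 1700000000". Grafana then queries Graphite for these series by path, which is why a consistent naming prefix matters.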
The Alluxio-Presto sandbox is a Docker application featuring installations of MySQL, Hadoop, Hive, Presto, and Alluxio. The sandbox lets you dive into an interactive environment where you can explore Alluxio, run queries with Presto, and see the performance benefits of using Alluxio in a big data software stack.
What is Apache Hadoop?
If you’re new to building big data applications: Apache Hadoop is a distributed framework that manages data processing and storage for applications running on clustered systems. It consists of five modules: a distributed file system (HDFS, the Hadoop Distributed File System), MapReduce for parallel processing of datasets, …
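As a rough illustration of the MapReduce model mentioned above, the classic word-count example can be sketched in plain Python. This is a single-process toy, not Hadoop's Java API: in Hadoop, many map and reduce tasks run in parallel across the cluster, and the shuffle step moves intermediate pairs between nodes. The function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record.
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be distributed independently; only the shuffle needs to see data from everywhere.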
How can we access AWS S3 data when running Presto in an on-premises environment, and how can we do it efficiently enough to reduce both egress costs and query runtimes?
Alluxio as a local cache for Presto queries against remote AWS S3 data sources
As we move toward more and more decoupled environments, one of the things …
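The reason a local cache helps with both egress cost and runtime can be shown with a toy read-through LRU cache: repeated reads are served locally, and only misses go to the remote store. This Python sketch illustrates the general caching idea only and is not how Alluxio is implemented; the fetch_remote callable is a hypothetical stand-in for an S3 GET, and remote_bytes is a rough proxy for egress.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy LRU read-through cache: serve hits locally, fetch misses remotely and keep them."""

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity_bytes: int):
        self._fetch_remote = fetch_remote  # hypothetical stand-in for a remote object read
        self._capacity = capacity_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0
        self.remote_bytes = 0  # bytes pulled from the remote store (proxy for egress)

    def read(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch_remote(key)
        self.remote_bytes += len(data)
        if len(data) <= self._capacity:  # objects larger than the cache are passed through
            self._entries[key] = data
            self._used += len(data)
            while self._used > self._capacity:
                _, evicted = self._entries.popitem(last=False)  # evict least recently used
                self._used -= len(evicted)
        return data
```

With a working set that fits in the cache, every query after the first reads locally, so both remote traffic and latency fall; once the working set exceeds capacity, eviction brings misses back.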
As the data ecosystem within enterprises grows larger and larger, we see an increase not only in total data volume but also in the number of disparate storage systems in which that data is housed. The challenge then becomes how different applications and teams can access data efficiently …
Problem
It is increasingly popular among data scientists to train models with frameworks like TensorFlow on a local server or cluster while storing massive amounts of input data in remote shared storage such as S3 or Google Cloud Storage. This stack provides high flexibility and cost efficiency, and in particular requires no dev-ops …
Introducing S3 and Spark
S3 has become the de facto standard API for digital business applications to store unstructured data chunks. To this end, several vendors offer S3-API-compatible products that allow app developers to standardize on the S3 API on-premises and port these apps to other platforms when ready. So, what is S3 and …
Alluxio is the developer of open source data orchestration software for the cloud, proven at global web scale in production for modern data services. Alluxio moves data closer to big data and machine learning compute frameworks in any cloud, across clusters, regions, clouds, and countries, providing memory-speed access to files and objects.
Apache Spark has brought significant innovation to big data computing, but its results are even more extraordinary when paired with Alluxio. Alluxio provides Spark with a reliable data sharing layer, letting Spark excel at application logic while Alluxio handles storage. Bazaarvoice uses the combination of Spark and Alluxio to provide a real-time big data platform that can not only handle the intake of 1.5 billion page views during peak events like Black Friday but also run real-time analytics against that data. At this scale, the gain in speed is an enabler for new workloads. We’ve established a clean and simple way to integrate Alluxio and Spark.