Controlling Cloud Egress Fees With Hybrid Cloud Data Access | Alluxio


Storing data in the cloud is cheap. Cloud providers price storage low to encourage enterprises to move all their data to the cloud and use the compute services they offer. However, every time data moves across cloud regions or out of the cloud (when accessed from on-premises data centers or from a different cloud), cloud providers charge an egress fee based on the amount of traffic that crosses the network.

DBTA 100 2023: The Companies That Matter Most in Data

Database Trends & Applications

The need to balance data safety with new data initiatives, deliver business value, and change company culture around data tops this year’s list of data and analytics management challenges.

How to Orchestrate Data for Machine Learning Pipelines


Machine learning (ML) workloads require efficient infrastructure to yield rapid results. Model training relies heavily on large data sets. Funneling this data from storage to the training cluster is the first step of any ML workflow, and it significantly affects the efficiency of model training. This article discusses a new approach to orchestrating data for end-to-end machine learning pipelines. I will outline common challenges and pitfalls, then propose a new technique, data orchestration, to optimize the data pipeline for machine learning.

Heard on the Street – 5/15/2023


Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!

Storage news ticker – May 12

Blocks & Files

Alluxio has published a downloadable Presto Optimization Handbook; Presto is a distributed query engine for data analytics. Customers using Trino (formerly PrestoSQL) can check out The Trino Optimization Handbook.

Emerging Startups 2023: Top Open Source Startups


The Open Source sector has over 2.3K startups, comprising companies that build tools whose source code is made available under a license in which the copyright holder grants anyone the rights to study, change, and distribute the software for any purpose. These companies include makers of software products for enterprise apps, infrastructure, software development, hardware, and industry verticals such as finance, education, agriculture, advertising, health, energy, gaming, technology, logistics, and retail, as well as other miscellaneous open source software.