Many companies are building AI platforms on development-stage architectures but worry about efficiency at scale as data volumes grow. They typically use a centralized cloud data lake, such as S3, to store training data. GPU shortages add further complications: training often has to run wherever GPUs are available, so storage and compute can be separated, or even remote from each other, making data loading slow and expensive:
- Working around a development-stage setup often means manually copying data, which is slow and error-prone
- Transferring data directly across regions, or between the cloud and on-premises environments, can incur expensive egress fees (the sketch below illustrates this direct-read baseline)
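To make that baseline concrete, here is a minimal sketch, not from the webinar, of a PyTorch dataset that reads every sample directly from S3; the bucket name, key list, and raw-bytes sample format are assumptions. Each epoch re-fetches the data over the network to wherever the GPUs happen to be.

```python
# Minimal sketch of the direct-from-S3 baseline (bucket, keys, and sample
# format are illustrative assumptions, not from the webinar).
import io

import boto3
from torch.utils.data import Dataset


class S3DirectDataset(Dataset):
    """Reads each training sample straight from S3 on every access."""

    def __init__(self, bucket: str, keys: list[str]):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.keys = keys

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> bytes:
        # One GET request per sample per epoch: when this traffic crosses
        # regions or goes cloud-to-on-prem, it is slow and drives egress fees.
        obj = self.s3.get_object(Bucket=self.bucket, Key=self.keys[idx])
        return obj["Body"].read()
```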
This webinar covers solutions that improve data loading for model training. You will learn:
- The data loading challenges with distributed infrastructure
- Typical solutions, including NFS/NAS on object storage, and why they are not the best options
- Common architectures that can improve data loading and cost efficiency
- Using Alluxio to accelerate model training and reduce costs (see the sketch after this list)
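For contrast, here is a minimal sketch of the caching approach the last bullet refers to, assuming Alluxio's FUSE/POSIX interface is mounted at /mnt/alluxio with the training bucket mapped underneath; the mount path and file layout are illustrative, not prescribed by the webinar. Training code reads ordinary local files while Alluxio keeps hot data cached close to the GPUs.

```python
# Minimal sketch assuming an Alluxio FUSE mount at /mnt/alluxio (path and
# *.bin file layout are illustrative assumptions).
from pathlib import Path

from torch.utils.data import Dataset


class AlluxioFuseDataset(Dataset):
    """Reads training samples through an Alluxio FUSE mount."""

    def __init__(self, root: str = "/mnt/alluxio/training-data"):
        self.files = sorted(Path(root).glob("**/*.bin"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> bytes:
        # A plain POSIX read; repeat reads can be served from Alluxio's cache
        # rather than crossing regions back to the object store.
        return self.files[idx].read_bytes()
```

Because the dataset only sees a POSIX path, the caching layer can be introduced without changing the training loop itself.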
Video:
Presentation slides:
Speakers:
Dr. Beinan Wang is the Tech Lead at Alluxio and a committer of PrestoDB. Prior to Alluxio, he was the Tech Lead of the Presto team at Twitter, where he built large-scale distributed SQL systems for Twitter’s data platform. He has twelve years of experience working on performance optimization, distributed caching, and large-volume data processing. He received his Ph.D. in computer engineering from Syracuse University, where his research focused on symbolic model checking and runtime verification of distributed systems.
Tarik Bennett is a Senior Solutions Engineer at Alluxio with 10 years of experience in the technology space. Born and raised in San Diego, Tarik received a BS from the University of California, Davis. He has experience in big data, search, distributed systems, and cloud environments. He is an advocate for open-source technologies and has sold and supported enterprise software.