Increasingly powerful compute accelerators and large training datasets have made the storage layer a potential bottleneck in deep learning training and inference. An offline inference job usually consumes and produces tens of terabytes of data and runs for more than 10 hours. At large scale, such a job usually causes high I/O pressure, increases the job failure rate, and poses many challenges for system stability. We adopt Alluxio, which acts as an intermediate storage tier between the compute tier and cloud storage, to optimize the I/O throughput of deep learning inference jobs. For the production workload, performance improves by 18% and we seldom see job failures caused by storage issues.