Does Alluxio handle query pushdown (predicate pushdown)?

Predicate pushdown lets certain parts of a SQL query (the predicates, such as the conditions in a WHERE clause) be “pushed” down to where the data lives. Filtering data this much earlier can reduce query processing time. Depending on the compute framework, predicate pushdown can filter data before it is transferred over the network or loaded into memory, or even skip reading entire files or chunks of files.
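
To make the "skip chunks of files" case concrete, here is a minimal Python sketch. It assumes each chunk of a file carries min/max statistics for one column, similar in spirit to the per-row-group statistics Parquet keeps; the `RowGroup` class and both scan functions are hypothetical illustrations, not any engine's API. Both scans return the same rows for the predicate `12 <= value <= 14`, but the pushdown version reads only the one chunk whose value range can overlap it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    # Hypothetical per-chunk statistics, loosely modeled on the
    # min/max values a columnar format can store for each chunk.
    min_value: int
    max_value: int
    rows: List[int]


def scan_without_pushdown(groups: List[RowGroup], lo: int, hi: int) -> Tuple[List[int], int]:
    """Read every chunk, then filter afterwards in the compute engine."""
    rows_read, result = 0, []
    for g in groups:
        rows_read += len(g.rows)
        result.extend(v for v in g.rows if lo <= v <= hi)
    return result, rows_read


def scan_with_pushdown(groups: List[RowGroup], lo: int, hi: int) -> Tuple[List[int], int]:
    """Use chunk statistics to skip chunks that cannot satisfy the predicate."""
    rows_read, result = 0, []
    for g in groups:
        if g.max_value < lo or g.min_value > hi:
            continue  # no row in this chunk can match: skip reading it entirely
        rows_read += len(g.rows)
        result.extend(v for v in g.rows if lo <= v <= hi)
    return result, rows_read


groups = [
    RowGroup(0, 9, list(range(0, 10))),
    RowGroup(10, 19, list(range(10, 20))),
    RowGroup(20, 29, list(range(20, 30))),
]

print(scan_without_pushdown(groups, 12, 14))  # ([12, 13, 14], 30)
print(scan_with_pushdown(groups, 12, 14))     # ([12, 13, 14], 10)
```

The result is identical either way; what changes is how much data has to be read to produce it.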

Alluxio itself does not perform any predicate pushdown. Alluxio is unaware of which table or query is involved; it only sees the read requests issued by the compute framework and its file-format reader. For example, if you are running Spark on Parquet, Spark reads the Parquet metadata (stored in the file footer) to decide whether the file, or a chunk of it, is relevant to the query. Alluxio does help in one way: it can keep those metadata chunks in memory, because those blocks are read far more frequently than the rest of the file.
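
The division of labor described above can be sketched as a toy model in Python. `QueryUnawareStore` stands in for a storage or caching layer that only serves byte ranges and counts which blocks are read; the file layout, block size, and every name here are hypothetical assumptions, not Alluxio's API or its actual caching policy. All query logic, including the decision to skip chunks, lives in the reader, yet because every query starts by reading the footer, the footer blocks end up as the most frequently accessed ones.

```python
from collections import Counter

BLOCK_SIZE = 4   # hypothetical, tiny block size in bytes
CHUNK_LEN = 8    # hypothetical data chunk size in bytes
NUM_CHUNKS = 3


class QueryUnawareStore:
    """Toy storage layer: serves byte ranges, knows nothing about tables or SQL.
    The only thing it can observe is which blocks are read, and how often."""

    def __init__(self, data: bytes):
        self.data = data
        self.block_reads = Counter()

    def read(self, offset: int, length: int) -> bytes:
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            self.block_reads[b] += 1
        return self.data[offset:offset + length]


def build_file() -> bytes:
    # Data chunks first, then a footer holding (min, max) for each chunk.
    chunks = [bytes(range(10 * i, 10 * i + CHUNK_LEN)) for i in range(NUM_CHUNKS)]
    footer = bytes(v for c in chunks for v in (min(c), max(c)))
    return b"".join(chunks) + footer


def query(store: QueryUnawareStore, lo: int, hi: int) -> list:
    # The reader (compute side) fetches the footer metadata first...
    footer = store.read(NUM_CHUNKS * CHUNK_LEN, 2 * NUM_CHUNKS)
    matches = []
    for i in range(NUM_CHUNKS):
        cmin, cmax = footer[2 * i], footer[2 * i + 1]
        if cmax < lo or cmin > hi:
            continue  # ...and it, not the store, decides to skip this chunk
        chunk = store.read(i * CHUNK_LEN, CHUNK_LEN)
        matches.extend(v for v in chunk if lo <= v <= hi)
    return matches


store = QueryUnawareStore(build_file())
print(query(store, 12, 14))  # [12, 13, 14]
print(query(store, 0, 3))    # [0, 1, 2, 3]
print(query(store, 21, 30))  # [21, 22, 23, 24, 25, 26, 27]
# Footer blocks (6 and 7) were read by every query; each data block only once.
print(store.block_reads[6], store.block_reads[7])  # 3 3
```

In this model a cache that simply keeps the most frequently read blocks in memory would naturally hold the footer, without ever needing to understand the query.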
