How does replication in Alluxio happen across worker nodes? Is the unit of replication a file or a block?

Alluxio is a data orchestration system that provides data locality through intelligent multi-tiering. Replication parameters are easy to configure, and once they are set, Alluxio handles replication transparently to the requesting compute framework. As always, no changes are required from the end user.

In the above diagram, data is stored in RAM, SSD, or HDD. Its placement across these tiers is determined by LRU (the policy is configurable). Data can also be pinned in the hot tier.

When replication is configured, Alluxio creates replicas of the data on other worker nodes. As the data gets colder, those replicas also move down the tiers. When a worker fails, the Alluxio master directs accesses to another node that holds a replica and creates new replicas per the configuration. The unit of replication is a block, not a file.
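
The failover behavior described above can be illustrated with a small toy model. This is only a sketch and not Alluxio's actual implementation or API: the `ToyMaster` class, its `min_replicas` setting, and the worker and block names are all hypothetical, and the replica placement rule (first workers in alphabetical order) is a simplification chosen to keep the example deterministic. It shows a master tracking replicas per block, redirecting reads after a worker failure, and creating new replicas to get back to the configured count.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model of a master tracking block replicas (not Alluxio code)."""

    workers: set        # names of live workers
    min_replicas: int   # hypothetical setting: target replica count per block
    locations: dict = field(default_factory=dict)  # block id -> set of workers

    def add_block(self, block_id):
        """Register a block and place its initial replicas."""
        self.locations[block_id] = set()
        self._repair(block_id)

    def _repair(self, block_id):
        """Copy the block to more live workers until the target is met."""
        holders = self.locations[block_id]
        for worker in sorted(self.workers - holders):
            if len(holders) >= self.min_replicas:
                break
            holders.add(worker)

    def read_from(self, block_id):
        """Return a live worker holding the block, or None if none is left."""
        holders = self.locations[block_id]
        return min(holders) if holders else None

    def fail_worker(self, worker):
        """Drop a failed worker, then re-replicate every block it held."""
        self.workers.discard(worker)
        for block_id, holders in self.locations.items():
            holders.discard(worker)
            self._repair(block_id)


# Replication is tracked per block, so one file maps to several entries.
master = ToyMaster(workers={"w1", "w2", "w3"}, min_replicas=2)
for block in ("file-a/block-0", "file-a/block-1"):
    master.add_block(block)

master.fail_worker("w1")
print(master.read_from("file-a/block-0"), master.locations)
```

In this sketch each block of the file is tracked and repaired independently, which is the practical consequence of the block being the unit of replication.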

More details of how replication works can be found in this doc: