Is it possible to keep data only in Alluxio, or does the data have to live in a storage system like S3 or HDFS?

The purpose of Alluxio is to be an abstraction layer over the storage systems underneath it. Alluxio is designed on the assumption that there is a storage layer below it, so treating it as just another storage system does not address the problem of co-locating storage and compute.

Alluxio allows you to have long-running data clusters that bring data together from many different systems and make it available to compute. So in one way, it is storing data, but the purpose is to make it more accessible to compute as opposed to being the storage system itself.

Storage systems solve for different concerns: persistence, durability, and cost efficiency, as S3 does. We don’t recommend keeping data only in Alluxio in a distributed deployment.

That said, there may be a basic use case where you need a temporary buffer for data that does not have to live on permanent storage. In this case, you can use Alluxio as the storage layer for that temporary data.
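
To make the distinction concrete, here is a minimal toy model in Python. It is not the Alluxio API: the class ToyCacheLayer and its methods are hypothetical, and the two mode names are only borrowed from Alluxio's MUST_CACHE and CACHE_THROUGH write types as a rough analogy. It sketches why data held only in the caching layer suits temporary buffers, while anything that must survive should also exist in the storage underneath.

```python
from enum import Enum


class WriteType(Enum):
    # Names echo Alluxio's write types; the behavior below is a simplified assumption.
    MUST_CACHE = "must_cache"        # kept in the cache layer only
    CACHE_THROUGH = "cache_through"  # kept in the cache and in the under store


class ToyCacheLayer:
    """Hypothetical stand-in for a caching layer in front of durable storage."""

    def __init__(self):
        self.cache = {}        # volatile: contents are lost on restart
        self.under_store = {}  # stands in for S3 or HDFS

    def write(self, path, data, write_type):
        self.cache[path] = data
        if write_type is WriteType.CACHE_THROUGH:
            self.under_store[path] = data

    def read(self, path):
        if path in self.cache:
            return self.cache[path]
        if path in self.under_store:
            # Cache miss: reload from the under store and keep a cached copy.
            data = self.under_store[path]
            self.cache[path] = data
            return data
        return None

    def restart(self):
        # Simulate losing the cache layer; the under store is untouched.
        self.cache.clear()


layer = ToyCacheLayer()
layer.write("/tmp/shuffle-0", b"scratch", WriteType.MUST_CACHE)
layer.write("/warehouse/orders", b"orders", WriteType.CACHE_THROUGH)
layer.restart()

temp_after_restart = layer.read("/tmp/shuffle-0")        # buffer data is gone
durable_after_restart = layer.read("/warehouse/orders")  # reloaded from the under store
print(temp_after_restart, durable_after_restart)
```

In this sketch, losing the temporary buffer is acceptable by design, which is the only situation where keeping data solely in the caching layer makes sense.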