Apache Spark 

Apache Spark is an advanced distributed execution engine for large-scale data processing that differs from Hadoop by favoring in-memory computation and by further enforcing the decoupling of compute from storage.

I was an early adopter and spent far too long messing about with it on low-powered machines, and am rather partial to the Databricks hosted solution.

Resources

Category  Date  Link              Notes
Add-ons   2024  DataFusion Comet  a modern query accelerator