Apache Spark is a distributed execution engine for large-scale data processing. It differs from Hadoop MapReduce by favouring in-memory computation and by further decoupling compute from storage.
I’ve been an early adopter, have spent far too long messing about with it on low-powered machines, and am rather partial to the Databricks hosted solution.
Resources
Category | Date | Link | Notes |
---|---|---|---|
Add-ons | 2024 | Apache DataFusion Comet | a native query accelerator for Spark, built on Apache DataFusion |