Apache Spark is a distributed, in-memory compute framework. It provides a platform for ingesting, analyzing, and querying data at scale. In addition to high-level APIs in Java, Scala, Python, and R, Spark includes a broad ecosystem of libraries: Spark SQL (structured data), MLlib (machine learning), GraphX (graph processing), and Spark Streaming (micro-batch stream processing).
Developed at UC Berkeley starting in 2009, Spark is well suited to interactive querying and analysis of extremely large datasets. It is available in the cloud through managed offerings such as Amazon EMR (Elastic MapReduce) and Databricks, or it can be deployed on-premises, and it scales to thousands of machines handling petabytes of data.