What is an MPP Database?
Analytical Massively Parallel Processing (MPP) databases are optimized for analytical workloads: aggregating and processing large datasets. MPP databases tend to be columnar, so rather than storing each row in a table as an object (a feature of transactional databases), they generally store each column as an object. This architecture allows complex analytical queries to be processed much more quickly and efficiently.
These analytic databases distribute their datasets across many machines, or nodes, to process large volumes of data (hence the name). These nodes all contain their own storage and compute capabilities, enabling each to execute a portion of the query.
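To make the idea of each node executing a portion of the query concrete, here is a minimal sketch in Python. It is a toy model rather than how any particular MPP database is implemented: the function names (`node_for`, `distribute`, `local_sum`, `gather`), the sample `sales` rows, and the choice of a CRC32 hash on `order_id` as the distribution key are all assumptions made for illustration.

```python
import zlib
from collections import Counter

def node_for(key_value, num_nodes):
    # Deterministic hash of the distribution key picks the owning node.
    return zlib.crc32(str(key_value).encode("utf-8")) % num_nodes

def distribute(rows, num_nodes, key):
    # Each node gets its own slice of the table (its local storage).
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key], num_nodes)].append(row)
    return nodes

def local_sum(node_rows, group_col, value_col):
    # Work done independently on one node: a partial GROUP BY / SUM.
    partial = Counter()
    for row in node_rows:
        partial[row[group_col]] += row[value_col]
    return partial

def gather(partials):
    # A coordinator merges the partial results from every node.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

# Hypothetical sample data.
sales = [
    {"order_id": 1, "region": "east", "amount": 100},
    {"order_id": 2, "region": "west", "amount": 250},
    {"order_id": 3, "region": "east", "amount": 50},
    {"order_id": 4, "region": "west", "amount": 75},
    {"order_id": 5, "region": "east", "amount": 25},
]

nodes = distribute(sales, num_nodes=3, key="order_id")
partials = [local_sum(node_rows, "region", "amount") for node_rows in nodes]
totals = gather(partials)
print(totals)
```

The final totals do not depend on which node holds which row, because the per-node sums are simply added together in the gather step.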
The proliferation and drop in cost of analytical MPP databases in the last decade have created a huge opportunity for data-driven organizations to operationalize and analyze larger datasets than ever before. These databases have been a wonderful addition to the growing toolkit for analysts, but they also introduce additional complexity into architectures.
What are Data Warehouses really great for?
Typical analytical workloads
MPP databases are very good at the most common analytical workloads, which are generally characterized by queries on a subset of columns with aggregations over broad ranges of rows. This is due to their columnar architecture, which allows them to only access the fields needed to complete a query (as opposed to transactional databases, which must access all fields in a row).
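The difference in how much data each layout touches can be sketched with a small Python comparison. This is a simplified in-memory model, not a real storage engine: the sample `rows`, the helper names `sum_row_store` and `sum_column_store`, and the "values read" counter are illustrative assumptions.

```python
# Hypothetical table with four fields per row.
rows = [
    {"order_id": 1, "region": "east", "amount": 100, "customer": "a"},
    {"order_id": 2, "region": "west", "amount": 250, "customer": "b"},
    {"order_id": 3, "region": "east", "amount": 50, "customer": "c"},
    {"order_id": 4, "region": "west", "amount": 75, "customer": "d"},
]

# Columnar layout: one list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def sum_row_store(rows, col):
    # A row store fetches the whole record to reach a single field.
    values_read = 0
    total = 0
    for record in rows:
        values_read += len(record)
        total += record[col]
    return total, values_read

def sum_column_store(columns, col):
    # A column store reads only the column the query asks for.
    values = columns[col]
    return sum(values), len(values)

row_total, row_values_read = sum_row_store(rows, "amount")
col_total, col_values_read = sum_column_store(columns, "amount")
print(row_total, row_values_read)
print(col_total, col_values_read)
```

Both layouts return the same sum, but in this model the row layout reads every field of every row while the columnar layout reads one value per row. The gap widens as tables get wider.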
A columnar architecture also gives MPP databases additional features that are useful for analytic workloads. These vary by database, but often include the ability to compress like data values, efficiently index very large tables, and handle wide, denormalized tables.
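As one illustration of compressing like data values, the sketch below applies run-length encoding to a single column in Python. Run-length encoding is used here only as a simple, well-known example; the encodings a given database actually supports vary, and the names `rle_encode`, `rle_decode`, and `region_column` are hypothetical.

```python
from itertools import groupby

def rle_encode(values):
    # Collapse each run of identical adjacent values into (value, count).
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

def rle_decode(runs):
    # Expand (value, count) pairs back into the original column.
    return [value for value, count in runs for _ in range(count)]

# Values in one column tend to repeat, especially when the data is sorted.
region_column = ["east"] * 4 + ["west"] * 3 + ["east"] * 2

encoded = rle_encode(region_column)
print(encoded)
print(len(region_column), "values stored as", len(encoded), "runs")
```

Because a column holds values of a single type that often repeat, it usually compresses better than a row that mixes IDs, text, and numbers.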
Organizations typically use analytical MPP databases as data warehouses, or centralized repositories that house all data generated within their organization, such as transactional sales data, web tracking data, marketing data, customer service data, inventory/logistical data, HR/recruiting data, and system log data. Because analytical MPP databases can handle huge data volumes, an organization can comfortably rely on these databases to not only store data, but also support analytical workloads from these various business functions.
Analytical MPP databases can easily scale their compute and storage capabilities linearly by adding more servers to the system, an approach known as scaling out. This is the opposite of vertically scaling compute and storage capabilities, which involves upgrading to larger and more powerful individual servers and generally hits a wall at scale. Analytical MPP databases are able to scale out so quickly, easily, and efficiently that on-demand database vendors have automated the process, scaling the system up or down depending on the size of the query.