Overview

MemSQL is a highly scalable, distributed database that pairs an in-memory rowstore with an analytics-oriented columnstore on disk. This ability to act as both a row store and a column store concurrently, combined with lock-free data structures, multiversion concurrency control, and lock-free skiplists, allows MemSQL to support real-time analytics, eliminating the need for an additional data warehouse and ETL processes.

Why and When to Use MemSQL

Both a Transactional Database and Analytical Data Warehouse

MemSQL is designed to act as both a transactional (rowstore) database, and an analytical (columnstore) data warehouse. This unique technology eliminates the need for an organization to spin up a transactional database for row-level writes, and a separate analytical datawarehouse for historical analysis. Organizations can concurrently write and read from the same distributed system using ANSI SQL.

Real-Time Analytics

The ability to stream live data into relational format while concurrently reading data means MemSQL can discover insights on the latest events or compare those with historical data. The integrated architecture can replace more complex systems that require movement of data from a transactional database or data lake into a dedicated data warehouse for analysis. Most relational database management systems (RDBMSs) cannot write and read data at the same time. This unique architecture makes MemSQL an attractive technology for organizations that want to analyze their business continuously while providing a historical reference store for predicting events or for making strategic decisions.

Scalability

MemSQL scales similar to a Hadoop architecture in that MemSQL doesn’t require exotic hardware; it scales linearly on commodity hardware. This allows MemSQL to run on-premises, in the cloud, or as a managed service. It’s also possible to scale MemSQL instances without taking the cluster offline, which means MemSQL can be easily scaled “just-in-time” when new performance or storage is necessary.

What use cases is MemSQL really great for?

Use Cases that Require Real-Time Analytics

MemSQL supports real-time analytics. It can natively subscribe to Kafka or Spark data streams and process at a very large scale. Once data is in MemSQL, you can use full SQL to analyze it. This makes MemSQL a great technology for use cases where a high business value is placed on minimized latency, such as fraud detection, system monitoring, e-commerce, and social networks.

How Much Does MemSQL Cost?

Enterprise Edition

MemSQL is priced per GB of RAM in the cluster on an hourly or annual subscription basis. For more details, contact MemSQL’s team.

Community Edition

MemSQL has a free Community Edition for development only purposes with no limits on scale or capacity.

How Does MemSQL Work?

MemSQL’s architecture has three distinct components that contribute to its speed and dual functionality. These three distinct components are:

  • Distributed, shared-nothing architecture on commodity hardware
  • Hybrid OLTP & OLAP Architecture
  • Lock-free Data Structures

Distributed Shared Nothing Architecture

MemSQL’s architecture is distributed across many individual machines, with each machine possessing its own CPU, memory, and disk; this is commonly known as a shared-nothing architecture, because storage and compute resources aren’t shared across the system.

MemSQL is also designed to run on commodity hardware, which means that the hardware needed to run a MemSQL system is not proprietary and therefore is relatively cheap. This also allows MemSQL administrators to scale the instance cheaply and easily.

Each node within MemSQL will fall under one of two types:

  • Aggregator nodes store metadata such as indexes across the entire system, and are responsible for querying the storage nodes (known as leaves), aggregating the results, and sending them back to the client.
  • Leaf nodes each contain a portion of the larger table. They’re responsible for storing and computing the data located on the node.



Application workloads can be divided amongst aggregators, where one aggregator serves application A, and another serves application B.

Hybrid OLTP & OLAP Architecture

MemSQL contains both an in-memory rowstore for fast, high availability queries and an on-disk columnstore for analytical workloads that involve aggregating data, or where the size of table exceeds memory constraints.

MemSQL is able to join between its rowstore and columnstore, which means that a user querying the MemSQL database will be able surface both real-time and historical data with one query.

Lock-Free Data Structures

Many relational databases impose a “lock” on stored tables when writing data. This is because if a read occurs at the same time data is being written to the table, the data can be output inconsistently. For example if an UPDATE statement is issued on the table at the same time as a SELECT statement, you could potentially read out half written data.

Table locks increase latency for database systems because they necessitate a dedicated database or data warehouse specifically for reading data or performing analytical queries. Data will then be moved in batches into that database at dedicated ETL intervals.

MemSQL implements lock-free data structures, skiplists, and Multiversion Concurrency Control (MVCC) to eliminate the need to lock the database when updating tables, which means that writes can happen at very high throughput while also maintaining a high level of concurrent reads.

Types of data

MemSQL stores data in either row or columnar format, depending on the table DDL*. It can ingest both tabular and nested data structures originating from text (delimited) files, JSON, and geospatial data.

Range of row (quantity)

MemSQL is capable of billion-row, petabyte-scale tables given enough compute and storage. Keep in mind that costs will increase if you need substantial space in the in-memory rowstore portion of MemSQL.

How to Use MemSQL

Implementing MemSQL

MemSQL is flexible enough to be implemented on-premise, and across a variety of cloud platforms. A good resource to get an idea for the implementation process for MemSQL is MemSQL’s developer documentation.

MemSQL is an ODBC database and wire-compatible with MySQL, so any software that is compatible with MySQL can also be connected to MemSQL. This makes it very easy to move from MySQL to MemSQL. Once you move your data out of MySQL and into MemSQL, all you’ll need to do is point your existing applications at the MemSQL database.

Process for new data

MemSQL supports the SQL standard LOAD DATA command to bring data into the environment. Because of MemSQL’s MySQL heritage, you can also leverage MySQL’s CLI commands or an open source tool like mysqldump to get data into MemSQL. Additionally, MemSQL has a GUI-based loading tool called Pipelines that can ingest data from different sources, including Amazon S3, Kafka, and other sources.

MemSQL uses sharding to distribute data uniformly across the leaf nodes. Sharding is the process of horizontally splitting a database table into uniform chunks, and distributing those chunks evenly across multiple nodes. With other databases, sharding must be configured manually, which prompts occasionally rebalancing the database. MemSQL on the other hand, partitions the tables automatically.

Maintenance

MemSQL can be scaled linearly, with “just-in-time” scaling. You don’t need to take the cluster offline. Increasing leaf nodes allows for faster query return time, and higher concurrency.

Things to look out for

MemSQL is a relatively new database, and is not yet fully ANSI SQL compatible (as of this writing it’s only ANSI SQL-92 compliant, compared to MySQL or Postgres which are at least SQL:2003 compliant).

While MemSQL has worked hard to develop the database optimizer, the technology still has shortcomings that are natural with a newer database technology. You may need to rewrite some queries in order to get the performance you’re looking for. Their support team will assist you with this process.

MemSQL Facts