MemSQL is a highly scalable, distributed SQL database that pairs an in-memory rowstore with an analytics-oriented columnstore on disk. This ability to act as both a row store and a column store concurrently, combined with lock-free data structures, multiversion concurrency control, and lock-free skiplists, allows MemSQL to support traditional and real-time analytics, eliminating the need for an additional data warehouse and ETL processes.
Why and When to Use MemSQL
Both a Transactional Database and Analytical Data Warehouse
MemSQL is designed to act as both a transactional (rowstore) database, and an analytical (columnstore) data warehouse. This unique technology eliminates the need for an organization to spin up a transactional database for row-level writes, and a separate analytical datawarehouse for historical analysis. Organizations can concurrently write and read from the same distributed system using ANSI SQL.
The ability to stream operational data into relational format while concurrently reading data means MemSQL can discover insights on the latest events or compare those with historical data. This approach enables real-time financial reporting or insights on live customer interactions with a single unified database. The integrated architecture can replace more complex systems that require copying of data from a transactional database or data lake into a dedicated data warehouse for analysis. Most relational database management systems (RDBMSs) cannot write and read data at the same time. This unique architecture makes MemSQL an attractive technology for organizations that want to analyze their business continuously while providing a historical reference store for predicting events or for making strategic decisions.
MemSQL scales similar to a Hadoop architecture in that MemSQL doesn’t require exotic hardware; it scales linearly on commodity hardware. This allows MemSQL to run on-premises, in the cloud, or as a managed service. It’s also possible to scale MemSQL instances without taking the cluster offline, which means MemSQL can be easily scaled “just-in-time” when new performance or storage is necessary.
What use cases is MemSQL really great for?
Use Cases that Require Operational or Real-Time Analytics
MemSQL supports real-time analytics. It can natively subscribe to Kafka, Spark, AWS S3, or HDFS data pipelines and process at a very large scale. Once data is in MemSQL, you can use standard SQL to analyze it. This makes MemSQL a great technology for use cases where a high business value is placed on minimized latency, such as fraud detection, system monitoring, e-commerce, and social networks.
How Much Does MemSQL Cost?
MemSQL offers a free tier up to 128GB of RAM and unlimited disk or as an Enterprise subscription priced per GB of RAM in the cluster. The subscription provides professional support. For more details, contact MemSQL’s team.
MemSQL has a free tier that can be used in production and is available up to 128GB of RAM with unlimited disk.
How Does MemSQL Work?
MemSQL’s architecture has three distinct components that contribute to its speed and dual functionality. These three distinct components are:
- Distributed, shared-nothing architecture on commodity hardware
- Hybrid OLTP & OLAP Architecture
- Lock-free Data Structures
Distributed Shared Nothing Architecture
MemSQL’s architecture is distributed across many individual machines, with each machine possessing its own CPU, memory, and disk; this is commonly known as a shared-nothing architecture, because storage and compute resources aren’t shared across the system.
MemSQL is also designed to run on commodity hardware, which means that the hardware needed to run a MemSQL system is not proprietary and therefore is relatively cheap. This also allows MemSQL administrators to scale the instance cheaply and easily.
Each node within MemSQL will fall under one of two types:
- Aggregator nodes store metadata such as indexes across the entire system, and are responsible for querying the storage nodes (known as leaves), aggregating the results, and sending them back to the client.
- Leaf nodes each contain a portion of the larger table. They’re responsible for storing and computing the data located on the node.
Application workloads can be divided amongst aggregators, where one aggregator serves application A, and another serves application B.
Hybrid OLTP & OLAP Architecture
MemSQL contains both an in-memory rowstore for fast, high availability queries and an on-disk columnstore for analytical workloads that involve aggregating data, or where the size of table exceeds memory constraints.
MemSQL is able to join between its rowstore and columnstore, which means that a user querying the MemSQL database will be able surface both real-time and historical data with one query.
Lock-Free Data Structures
Many relational databases impose a “lock” on stored tables when writing data. This is because if a read occurs at the same time data is being written to the table, the data can be output inconsistently. For example if an
UPDATE statement is issued on the table at the same time as a
SELECT statement, you could potentially read out half written data.
Table locks increase latency for database systems because they necessitate a dedicated database or data warehouse specifically for reading data or performing analytical queries. Data will then be moved in batches into that database at dedicated ETL intervals.
MemSQL implements lock-free data structures, skiplists, and Multiversion Concurrency Control (MVCC) to eliminate the need to lock the database when updating tables, which means that writes can happen at very high throughput while also maintaining a high level of concurrent reads.
Types of data
MemSQL stores data in either row or columnar format, depending on the table DDL*. It can ingest both tabular and nested data structures originating from text (delimited) files, JSON, Avro, and geospatial data.
Range of row (quantity)
MemSQL is capable of billion-row, petabyte-scale tables given enough compute and storage. Keep in mind that costs will increase if you need substantial space in the in-memory rowstore portion of MemSQL.
How to Use MemSQL
MemSQL is flexible enough to be implemented on-premise, and across a variety of cloud platforms. A good resource to get an idea for the implementation process for MemSQL is MemSQL’s developer documentation.
MemSQL is an ODBC database and wire-compatible with MySQL, so any software that is compatible with MySQL can also be connected to MemSQL. This makes it very easy to move from MySQL to MemSQL. Once you move your data out of MySQL and into MemSQL, all you’ll need to do is point your existing applications at the MemSQL database.
Process for new data
MemSQL supports the SQL standard
LOAD DATA command to bring data into the environment. Because of MemSQL’s MySQL heritage, you can also leverage MySQL’s
CLI commands or an open source tool like
mysqldump to get data into MemSQL. Additionally, MemSQL has a GUI-based loading tool called Pipelines that can ingest data from different sources, including Amazon S3, Kafka, and other sources.
MemSQL uses sharding to distribute data uniformly across the leaf nodes. Sharding is the process of horizontally splitting a database table into uniform chunks, and distributing those chunks evenly across multiple nodes. With other databases, sharding must be configured manually, which prompts occasionally rebalancing the database. MemSQL on the other hand, partitions the tables automatically.
MemSQL can be scaled linearly, with “just-in-time” scaling. You don’t need to take the cluster offline. Increasing leaf nodes allows for faster query return time, and higher concurrency.
Things to look out for
MemSQL is a relatively new database, and is not yet fully ANSI SQL compatible (as of this writing it’s only ANSI SQL-92 compliant, compared to MySQL or Postgres which are at least SQL:2003 compliant).
While MemSQL has worked hard to develop the database optimizer, the technology still has shortcomings that are natural with a newer database technology. You may need to rewrite some queries in order to get the performance you’re looking for. Their support team will assist you with this process.
- Some of MemSQL’s more popular customers include rideshareing company Uber, content directory network provider Akamai, food manufacturing company Kellogg’s, and media giant Comcast.
- MemSQL’s founding team was initially out of Facebook, and went through Y-Combinator in 2011.
- MemSQL’s first customer was fellow Y-Combinator alumni LikeALittle.
- Sam Altman (Y-Combinator) and Ashton Kutcher (A-Grade Investments) are some of MemSQL’s more famous angel investors.
- In 2016, MemSQL had the top score in “Operational Data Warehouse Use Cases” in Gartner’s 2016 report titled “Analytical Capabilities for Data Warehouse and Data Management Solutions for Analytics”