Oh boy, 20 years of software engineering definitely tunes your BS radar when reading claims about how awesome and mature a relatively new technology is. Every new project claims to solve all the problems of your current itch. Do you know that feeling? You spend a week reading the documentation, prepare the test deployment, find out that the documentation doesn’t match reality, look at GitHub and find dozens of open issues. Finally, you throw it all out in anger and curse at yourself because you just wasted a week of your time.
Recently, I’ve been on a hunt for a PostgreSQL-compatible (optionally distributed) database for one of my work tasks. It was an awesome task because it forced me to look deep into the current database ecosystem. Other than the occasional Postgres Docker image pull, I hadn’t touched a traditional SQL database since 2012. This exercise was a good opportunity to figure out what’s new in the relational database world since my last experience with Microsoft SQL Server 2008, which I enjoyed very much!
Looking at PostgreSQL alone was a lot of fun. PostgreSQL is simply amazing. Fast, just works, JSON operations included, and the grant system is outstanding. Permissions can be managed at the database, tablespace, schema, column and row level. With tablespaces, I can tell Postgres where a particular bit of data resides on disk. It’s a solid, ubiquitous database available in every cloud and very easy to deploy on premises. The tooling is mature, the operations are pretty well understood. Some claim that setting up replication and automatic failover can be a PITA. I don’t know, I haven’t had a chance to go that far yet. However, when I was contracting for Saywell International back in 2010, we were running an active-active MSSQL setup with log shipping over Dropbox. Crazy idea, but it worked and was a lot of fun, ha! 2021 Postgres cannot be more difficult than that.
PostgreSQL is an ACID (atomic, consistent, isolated, durable) RDBMS (relational database management system). There’s a catch with an RDBMS: the data must fit on a single server. If the data outgrows the server, it can be manually sharded (partitioned) or moved to a bigger server. Well, we can get pretty big servers today, for sure. 256GB RAM? 44 cores? 12TB disks? Sure. But when that bad boy has a hiccup, things go bad fast. The cost of these servers, even in the cloud, is mind-boggling. Triple that when you’re in need of replicas.
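To see why manual sharding is such a chore, here’s a minimal, database-agnostic sketch (the key and shard names are made up for illustration) of the naive approach: route each key with hash(key) % N. The catch shows up the moment you grow the cluster, because changing N remaps almost every key.

```python
# Illustrative sketch, not tied to any particular database: naive mod-N
# sharding. Growing the cluster from 4 to 5 shards remaps the vast
# majority of keys, which is why resharding an RDBMS by hand hurts.
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user:{i}" for i in range(10_000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys move when going from 4 to 5 shards")
```

With mod-N placement roughly (N-1)/N of the keys change shards on every resize, and every moved key means data physically copied between servers.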
A database article without mentioning the CAP theorem is not a database article. The CAP theorem describes the three major properties of a distributed database system: consistency, availability and partition tolerance. The CAP theorem also says that it is not possible to have all three properties fulfilled by one system: choose any two, but all three at once is not possible. An RDBMS satisfies the C and A properties: data is consistent and available, but not partition tolerant.
The last few years, and especially the last couple of years, have seen some really interesting new developments in the database space. New SQL databases are popping up left and right, and many of them attempt to crack the hard problems of distributed querying, distributed transactions and data replication. This is an interesting development, most likely rooted in NoSQL (not only SQL). Databases like Cassandra or Riak brought the concepts of Amazon’s Dynamo to the masses. These new databases took advantage of and introduced many new ideas, consistent hashing and distributed hash tables (DHTs) for data storage probably being the most crucial ones.
DHTs made automatic partitioning possible. Consistent hashing makes it possible to identify which node within the cluster holds the partition a piece of data belongs to in near-constant time. These new databases replicate the data within the cluster. Data is available; the loss of a node does not imply a service interruption. These databases fulfill the A and P properties of CAP.
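The idea can be shown in a few lines. Below is a toy consistent-hashing ring with virtual nodes (the general technique, not Cassandra’s or Riak’s actual implementation; node names are invented): keys and nodes hash onto the same ring, a key lives on the first node clockwise from its position, and adding a node only steals the keys that fall into the new node’s slices.

```python
# Toy consistent-hashing ring with virtual nodes. Unlike mod-N sharding,
# adding a node moves only a small fraction of keys (roughly 1/N).
import bisect
import hashlib


def _h(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each physical node gets many points on the ring for balance.
        self._points = sorted(
            (_h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # First point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self._hashes, _h(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"order:{i}" for i in range(10_000)]
old = Ring(["n1", "n2", "n3"])
new = Ring(["n1", "n2", "n3", "n4"])
moved = sum(1 for k in keys if old.node_for(k) != new.node_for(k))
print(f"{moved / len(keys):.0%} of keys move when adding a 4th node")
```

Compare that with the mod-N sketch earlier: here growing the cluster touches roughly a quarter of the keys instead of nearly all of them.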
On a hand-wavy, high conceptual level, the storage mechanism isn’t that different from RDBMS sharding with manually configured replication. However, with a regular RDBMS, it isn’t possible to treat a sharded table as a single logical one. Querying across shards is not straightforward, and transactions across shards are not possible.
Solving one problem often implies introducing another. These new databases are in fact key/value stores with an SQL-like layer on top. There are no transactions, there are no joins, distributed consistent counters are difficult. Due to the nature of replication, the replicas are eventually consistent; it takes time for a replica to catch up with the main partition (hence no C property). Using databases like Cassandra or Riak implied plenty of write overhead. For performance reasons, it’s often better to store the data in the format intended for reading. It’s a common pattern to write the same data multiple times to different tables, depending on the context in which it will be queried. Capacity planning is difficult, and what sits where in the database is hard to track by just looking at the schema. There are no foreign keys, constraints are not always possible. Plenty of the database logic is shifted to the application layer.
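This “write it the way you’ll read it” pattern is easy to sketch. The snippet below models the denormalized tables as plain dicts (the table and field names are invented for illustration): one logical write fans out into several read-optimized copies, and keeping them consistent is now the application’s job.

```python
# Sketch of the denormalization pattern Cassandra-style stores push on
# you: with no joins, the application writes the same record into several
# read-optimized "tables" (plain dicts here) at write time.
from collections import defaultdict

orders_by_id = {}
orders_by_customer = defaultdict(list)
orders_by_day = defaultdict(list)


def record_order(order_id, customer, day, total):
    row = {"id": order_id, "customer": customer, "day": day, "total": total}
    # One logical write becomes three physical writes -- and the
    # application, not the database, keeps them consistent.
    orders_by_id[order_id] = row
    orders_by_customer[customer].append(row)
    orders_by_day[day].append(row)


record_order("o-1", "alice", "2021-06-01", 42.0)
record_order("o-2", "alice", "2021-06-02", 13.5)

# "Queries" are now straight key lookups; no joins needed.
print([r["id"] for r in orders_by_customer["alice"]])  # ['o-1', 'o-2']
```

The read path is fast and join-free, but the schema no longer tells you where the data lives, and a partially failed fan-out leaves the copies out of sync.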
As the world was figuring out NoSQL, on the other side of the spectrum the smart folks at Google and Amazon had a crack at the number one RDBMS problem: how to make a global, scalable, distributed, highly available ACID database a reality. The results of that work are, respectively, Google Spanner and Amazon Aurora. Both are available, consistent, isolated and durable global-scale, distributed SQL databases featuring distributed transactions, joins and the majority of the niceties of an RDBMS. Both can be used with existing RDBMS tools and are often a drop-in replacement. Things don’t always work exactly like in the original because of the design trade-offs of the distributed implementation, but they are often close enough. The problem with those? They’re proprietary and prohibitively expensive when used from outside the respective cloud provider.
As it usually happens with those clever whitepapers from the big players, other smart people pick them up and go on to build interesting products. It’s no different with the Spanner/Aurora work. The two most interesting products currently available are CockroachDB and YugabyteDB. They are roughly similar in the problems they tackle. Both are CP databases, but both achieve high availability. Both are ACID and both are PostgreSQL compatible. Both have a roughly similar architecture. CockroachDB is written in Go and reimplements the Postgres wire protocol from scratch; YugabyteDB is C++ and lifts the complete original PostgreSQL engine.
It’s like PostgreSQL with the storage ripped out and a new Raft-based storage layer added.
I briefly looked at CockroachDB but rejected it almost immediately due to the licensing constraints. It’s not really clear what becomes Apache 2 licensed and when, and the license explicitly disallows offering it as a service.
That left YugabyteDB. Here’s the description:
YugabyteDB is a free and open-source, distributed, relational, NewSQL database management system designed to handle large amounts of data spanning across multiple availability zones and geographic regions while providing single-digit latency, high availability, and no single point of failure.
Basically, magic pixie dust. I was very reluctant to give it a try. My BS radar was fully tuned.
This time was different from what I anxiously anticipated! It delivers everything it claims to.
Databases, schemas, tables, views, triggers, foreign data wrappers, tablespaces: it’s all in there. Permission management comes straight from PostgreSQL and works exactly as expected. Roles can be restricted to databases, schemas, tablespaces. Access to tables, views, functions, triggers, even individual columns and rows, can be controlled exactly like in Postgres. It’s all in there and working.
There’s more. YugabyteDB is a distributed system for a reason. The underlying DocDB layer automatically replicates and shards the data for high availability while leaving full control over those aspects to the operator. This is great because replication is the first layer of protection: the loss of a node does not mean data loss. Of course, still take your backups…
YugabyteDB uses Raft replication per table. Different tables can have different replication factors. Different tables can be tied to different regions. Wow, even specific rows of specific tables can be placed in exact geographical locations, for example for legal compliance. There are distributed queries, joins, triggers, transactions, everything.
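The availability math behind per-table Raft replication is simple enough to sketch. This is the general majority-quorum idea, not YugabyteDB’s actual code: a write commits once a majority of the replicas acknowledge it, so a group with replication factor N tolerates floor((N - 1) / 2) lost replicas.

```python
# Hedged sketch of majority-quorum commit, the core idea behind
# Raft-style replication (illustrative, not YugabyteDB's implementation).
def majority(replication_factor: int) -> int:
    # Smallest number of replicas that forms a strict majority.
    return replication_factor // 2 + 1


def write_commits(replication_factor: int, acks: int) -> bool:
    # A write is durable once a majority has acknowledged it.
    return acks >= majority(replication_factor)


# RF=3: one replica down, the remaining two still form a majority.
print(write_commits(3, 2))  # True
# RF=3 with two replicas down: the lone survivor cannot commit.
print(write_commits(3, 1))  # False
```

This is why a loss of a node in an RF=3 table is a non-event, while raising a critical table to RF=5 buys tolerance for two simultaneous failures at the cost of extra storage and write fan-out.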
My first Postgres compatibility tests used a single YB-Master and YB-TServer setup. The next day, I had a 30-node cluster with latency on par with PostgreSQL.
Maybe I’m still in the honeymoon period with YugabyteDB. Whatever, so far I love the product and the execution. It behaves like PostgreSQL. To the tooling, it’s basically PostgreSQL with a lot of muscle behind it, like the new tablespaces implementation taking the Postgres concept one step further to allow geo-aware placement.
While the Postgres layer is amazing in itself, YugabyteDB is not only about Postgres. There’s also the YCQL layer, which is basically Cassandra on steroids, and there’s the YEDIS layer, Redis on steroids. There’s change data capture, point-in-time restore, Kafka ingest integration and many more features adding up to a very solid all-round database solution ready for the cloud-native 21st century.
It’s clear that the team behind the product understands the problem space and is focused on solving their clients’ problems in open source rather than chasing an ever longer feature list. There does not seem to be a secondary agenda. There are two commercial products next to the open-source version: a fully managed enterprise solution and a self-managed licensed solution. The company behind it clearly knows what it’s building, how, and for whom.
If you enjoy using PostgreSQL, you’ll love YugabyteDB.