YugabyteDB change data capture

19th of March, 2022:
Interested in YugabyteDB CDC? YugabyteDB 2.13 comes with an all-new beta of a CDC SDK: YugabyteDB CDC SDK beta, a high-level overview.

Today I’m going to look at change data capture (CDC) in YugabyteDB. Change data capture is a data integration method based on the identification, capture, and delivery of data changes. Enterprises can choose this method to identify which data changes in the database and to act on those changes in real time. What are the use cases for CDC? For example:

Identifying Postgres features unsupported in YugabyteDB

With every version, the number of PostgreSQL features supported by YugabyteDB inevitably increases. This is good! However, it is pretty difficult to come by a comprehensive list of unsupported features. There’s a roadmap on GitHub, but the roadmap lists top-level features and misses the exact list of statements and keywords. There are also GitHub issues, but there are over 11k of them, so it is hard to filter out what’s really unsupported.

YugabyteDB Go RPC client

The YugabyteDB RPC API isn’t an official API, and that’s what makes it interesting. The whole distributed aspect of YugabyteDB is based on that API. Digging through it is a perfect way to understand the internals of the database. The RPC API can also be used to automate various aspects of the database. I’ll come back to this subject in the near future.

A brief look at YugabyteDB RPC API

YugabyteDB is a horizontally scalable, 100% open source distributed SQL database providing Cassandra (YCQL) and PostgreSQL (YSQL) compatible APIs. For the PostgreSQL API, YugabyteDB uses the actual PostgreSQL 11.2 engine and builds on top of it with a Raft consensus layer.

There are two components in the YugabyteDB architecture: master servers and tablet servers. Master servers know everything about the state of the cluster; they are like an address book of all objects and data stored in the database. Tablet servers store the data. Each tablet server runs an instance of PostgreSQL.

YugabyteDB: the book

I’ve been deep in YugabyteDB trenches for the last 6 months. At Klarrio, we are building a Database as a Service solution for the Data Services Hub. This is a self-managed, multi-tenant solution designed to run on top of Apache Mesos. We have a small team doing all kinds of integration work. I am focusing on the core database rollout.

YugabyteDB: Postgres foreign data wrapper

Mmmm, nearly missed it.

YugabyteDB 2.9.1.0 was released on the 29th of October.

So here’s the thing. Back in August 2021, I contributed foreign data wrapper support to YugabyteDB, and 2.9.1.0 is the first beta release with this feature included. What I’m trying to say: the postgres_fdw extension can be used in YugabyteDB starting with version 2.9.1.0.

Postgres in Docker with persistent storage

Yes, it’s perfectly fine to run databases in containers. The only challenge is to make sure that the data stored by the database does not reside within the file system of the container. Otherwise, after removing the container, the data will be gone, too.
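As a rough illustration of that idea, here is a minimal Python sketch that assembles a `docker run` command with the data directory mounted on a named volume. The volume name, container name, password, and the `postgres:14` tag are placeholders chosen for this example. The mount target `/var/lib/postgresql/data` is the data directory I'm assuming for that image tag, so check the documentation of the image you actually use.

```python
import shlex


def postgres_run_command(volume="pgdata", image="postgres:14",
                         password="change-me", name="pg"):
    """Build a `docker run` argv that keeps the data directory on a named volume.

    All defaults are illustrative placeholders, not recommended values.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        # The official postgres image expects a superuser password at first start.
        "--env", f"POSTGRES_PASSWORD={password}",
        # Named volume: the data lives outside the container's own file system,
        # so removing the container does not remove the data.
        "--volume", f"{volume}:/var/lib/postgresql/data",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(postgres_run_command()))
```

Removing the container with `docker rm` leaves the named volume in place, and a new container started with the same `--volume` argument picks the data up again.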

YugabyteDB Docker image

The default YugabyteDB Docker image from Docker Hub runs the database as the root user.

I need to run it as a non-root user, and the Dockerfile for the release image is not available in the YugabyteDB repositories.

So I’ve created my own and here it is.