Gunnar Morling

Random Musings on All Things Software Engineering

Recent posts

Nov 21, 2023

"Change Data Capture Breaks Encapsulation". Does it, though?

This post originally appeared on the Decodable blog. All rights reserved. Having worked on Debezium, an open-source platform for Change Data Capture (CDC), for several years, one concern I’ve heard repeatedly is this: aren’t you breaking the encapsulation of your application when you expose change event feeds directly from your database? After all, CDC exposes your internal persistent data model to the outside world, which may have unintended consequences, e.g. in terms of data exposure, but also when changes to the schema of your data break downstream consumers.

Nov 14, 2023

Can Debezium Lose Events?

This question came up on the Data Engineering sub-reddit the other day: Can Debezium lose any events? I.e. can there be a situation where a record in a database gets inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?

Nov 2, 2023

CDC Use Cases: 7 Ways to Put CDC to Work

This post originally appeared on the Decodable blog. All rights reserved. Change Data Capture (CDC) is a powerful tool in data engineering and has seen a tremendous uptake in organizations of all kinds over the last few years. This is because it enables the tight integration of transactional databases into many other systems in your business at a very low latency.

Feb 28, 2023

Finding Java Thread Leaks With JDK Flight Recorder and a Bit Of SQL

The other day at work, we had a situation where we suspected a thread leak in one particular service, i.e. code which continuously starts new threads without ever stopping them again. Each thread requires a bit of memory for its stack space, so starting an unbounded number of threads can be considered a form of memory leak, eventually causing your application to run out of memory. In addition, the more threads there are, the more overhead the operating system incurs for scheduling them, until the scheduler itself consumes most of the available CPU resources. Thus it’s vital to detect and fix this kind of problem early on.
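
To make the notion of a thread leak concrete, here is a minimal sketch in Java. It is not taken from the post itself: the class `ThreadLeakSketch`, its methods, and the `leaky-worker-` thread names are hypothetical and exist only for illustration. The sketch starts threads which park on a latch and never finish on their own, which is roughly what leaked threads look like from the outside.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of a thread leak; names are assumptions, not from the post.
class ThreadLeakSketch {

    // Starts `count` threads which wait on the latch and never finish by themselves.
    static List<Thread> startParkedThreads(int count, CountDownLatch release) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    release.await(); // parks until someone counts the latch down
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "leaky-worker-" + i);
            t.setDaemon(true); // so this sketch cannot keep the JVM alive
            t.start();
            threads.add(t);
        }
        return threads;
    }

    // Counts how many of the given threads have been started and not yet terminated.
    static long countAlive(List<Thread> threads) {
        return threads.stream().filter(Thread::isAlive).count();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = startParkedThreads(5, release);
        System.out.println("alive while parked: " + countAlive(threads));

        // The "fix": give the threads a way to stop, then wait for them.
        release.countDown();
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("alive after release: " + countAlive(threads));
    }
}
```

In a real service nothing ever releases such threads, so the number of live threads only grows. Giving each thread a recognizable name, as done here, tends to make that kind of growth easier to spot in a thread dump or recording.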

Jan 15, 2023

Getting Started With Java Development in 2023 — An Opinionated Guide

27 years of age, and alive and kicking: the Java platform regularly comes out amongst the top contenders in rankings like the TIOBE index. In my opinion, rightly so. The language is very actively maintained and constantly improved; its underlying runtime, the Java Virtual Machine (JVM), is one of, if not the most, advanced runtime environments for managed programming languages. There is a massive ecosystem of Java libraries which make it a great tool for a large number of use cases, ranging from command-line and desktop applications, through web apps and backend web services, to datastores and stream processing platforms. With upcoming features like support for vectorized computations (SIMD), light-weight virtual threads, improved integration with native code, value objects and user-defined primitives, and others, Java is becoming an excellent tool for solving a larger number of software development tasks than ever before.

Jan 5, 2023

Oh... This is Prod?!

I strongly believe that you should avoid connecting to production environments from local developer machines as much as possible. But sometimes, e.g. in order to analyse some specific kinds of failures, doing so can be inevitable. Now, if this is the case, I really, really want to be sure that I’m aware of the environment I am working in. I absolutely want to avoid a situation as in the catchy title of this post, when for instance you realize that you just ran some integration test against a production environment. In the context of working with the AWS CLI tool this means I’d like to be aware of the currently active profile by means of coloring my shell accordingly. Here’s how I’ve set this up using iTerm2 and zsh.

Jan 3, 2023

Is your Blocking Queue... Blocking?

Java’s BlockingQueue hierarchy is widely used for coordinating work between different producer and consumer threads. When set up with a maximum capacity (i.e. a bounded queue), no more elements can be added by producers to the queue once it is full, until a consumer has taken at least one element. For scenarios where new work may arrive more quickly than it can be consumed, this provides a means of back-pressure, ensuring the application doesn’t eventually run out of memory while enqueuing more and more work items.
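
The behavior of a bounded queue can be sketched in a few lines of Java. This is an illustration under assumptions rather than code from the post: the class `BoundedQueueSketch` and its method are hypothetical, and `ArrayBlockingQueue` is just one of several bounded implementations. The non-blocking `offer()` is used so that the sketch terminates; `put()` would instead block the producer until space becomes available.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of a bounded queue; names are assumptions, not from the post.
class BoundedQueueSketch {

    // Tries to enqueue `attempts` items into a queue with the given capacity
    // while no consumer is running; returns how many items were accepted.
    static int fillWithoutConsumer(int capacity, int attempts) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false right away when the queue is full;
            // put() would block here until a consumer takes an element.
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Capacity 2, five attempts, no consumer: only two items fit.
        System.out.println(fillWithoutConsumer(2, 5)); // prints 2
    }
}
```

With a blocking `put()` in place of `offer()`, the producer thread would simply wait on the third element, which is the back-pressure effect described above.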

Dec 18, 2022

Maven, What Are You Waiting For?!

As part of my new job at Decodable, I am also planning to contribute to the Apache Flink project (as Decodable’s fully-managed stream processing platform is based on Flink). Right now, I am in the process of familiarizing myself with the Flink code base, and as such I am of course building the project from source, too.

Dec 15, 2022

Postgres 15: Logical Decoding Row Filters With Debezium

This post originally appeared on the Decodable blog. All rights reserved. Since logical decoding was added to Postgres in version 9.4, this powerful feature for capturing changes from the write-ahead log of the database has been continuously improved. Postgres 15, released in October this year, added support for fine-grained control over which columns (by means of column lists) and rows (via row filters) should be exported from captured tables. This means, in relational terminology, projections and filters are now natively supported by Postgres change event publications.

Nov 30, 2022

The Insatiable Postgres Replication Slot

While working on a demo for processing change events from Postgres with Apache Flink, I noticed an interesting phenomenon: a Postgres database which I had set up for that demo on Amazon RDS ran out of disk space. The machine had a disk size of 200 GiB, which was fully used up in the course of less than two weeks. Now, a common cause for this kind of issue is a replication slot which is not advanced: in that case, Postgres will hold on to all WAL segments after the latest log sequence number (LSN) which was confirmed for that slot. Indeed, I had set up a replication slot (via the Decodable CDC source connector for Postgres, which is based on Debezium). I then had stopped that connector, causing the slot to become inactive. The problem, though, was that I was really sure that there was no traffic in that database whatsoever! What could cause a WAL growth of ~18 GB/day then?
