Gunnar Morling

Random Musings on All Things Software Engineering

Recent posts

Feb 10, 2024

Filtering Process Output With tee

Recently I ran into a situation where it was necessary to capture the output of a Java process on the stdout stream, and at the same time a filtered subset of the output in a log file. The former was needed so that the output gets picked up by the Kubernetes logging infrastructure. The latter was for further processing on our end: we were looking to detect when the JVM stops due to an OutOfMemoryError, passing on that information to some error classifier.

Read More...

Feb 4, 2024

1BRC—The Results Are In!

Oh what a wild ride the last few weeks have been. The One Billion Row Challenge (1BRC for short), something I had expected to be interesting to a dozen folks or so at best, has gone kinda viral, with hundreds of people competing and engaging. In Java, as intended, but also beyond: folks implemented the challenge in languages such as Go, Rust, C/C++, C#, Fortran, or Erlang, as well as databases (Postgres, Oracle, Snowflake, etc.), and tools like awk. It’s really incredible how far people have pushed the limits here. Pull request by pull request, the execution times for solving the problem laid out in the challenge (aggregating random temperature values from a file with 1,000,000,000 rows) improved by two orders of magnitude in comparison to the initial baseline implementation. Today I am happy to share the final results, as the challenge closed for new entries after exactly one month on Jan 31 and all submissions have been reviewed.

Read More...

Jan 1, 2024

The One Billion Row Challenge

Update Jan 4: Wow, this thing really took off! 1BRC is being discussed in a couple of places on the internet, including Hacker News, lobste.rs, and Reddit. For folks who want to showcase non-Java solutions, there is now a "Show & Tell"; check it out for 1BRC implementations in Rust, Go, C++, and others. Some interesting related write-ups include 1BRC in SQL with DuckDB by Robin Moffatt and 1 billion rows challenge in PostgreSQL and ClickHouse by Francesco Tisiot. Thanks a lot for all the submissions, this is going way beyond what I’d have expected! I am a bit behind with evaluations due to the sheer number of entries; I will work through them bit by bit. I have also made a few clarifications to the rules of the challenge; please make sure to read them before submitting any entries.

Let’s kick off 2024 true coder style: I’m excited to announce the One Billion Row Challenge (1BRC), running from Jan 1 until Jan 31. Your mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!
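To make the task concrete, here is a minimal, unoptimized sketch of the aggregation described above. It is not the challenge's baseline or any submitted entry. The class name `OneBrcSketch`, the `Stats` helper, and the `<station>;<temperature>` line format are assumptions made for illustration, since the excerpt does not spell out the file layout.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OneBrcSketch {

    // Running aggregate for a single weather station (hypothetical helper).
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }
    }

    // Aggregates lines of the ASSUMED form "<station>;<temperature>".
    // A TreeMap keeps the stations sorted by name for readable output.
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> result = new TreeMap<>();
        for (String line : lines) {
            int sep = line.lastIndexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample standing in for the measurements file.
        List<String> sample = List.of("Hamburg;12.0", "Bulawayo;8.9", "Hamburg;8.0");
        aggregate(sample).forEach((station, s) ->
                System.out.printf("%s=%.1f/%.1f/%.1f%n", station, s.min, s.mean(), s.max));
    }
}
```

A single-threaded loop like this is fine for a handful of rows; with a billion of them, parsing and hashing costs dominate, which is where the interesting optimization work of the challenge begins.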

Read More...

Dec 19, 2023

Logical Replication From Postgres 16 Stand-By Servers—Debezium and Failover Slots

This post originally appeared on the Decodable blog. All rights reserved. Welcome back to this series about logical replication from Postgres 16 stand-by servers, in which we’ll discuss how to use this feature with Debezium—a popular open-source platform for Change Data Capture (CDC) for a wide range of databases—as well as how to manage logical replication in case of failover scenarios, i.e. a situation where your primary Postgres server becomes unavailable and a stand-by server needs to take over.

Read More...

Dec 19, 2023

Using Stand-by Servers for Postgres Logical Replication

This post originally appeared on the Decodable blog. All rights reserved. For users of Change Data Capture (CDC), one of the most exciting features in Postgres version 16 (released in September this year) is the support for logical replication from stand-by servers. Instead of connecting to your primary server, you can now point CDC tools such as Debezium to a replica server, which is very interesting for instance from a load distribution perspective. I am going to take a closer look at this new feature in this two-part blog series:

Read More...

Dec 17, 2023

Tracking Java Native Memory With JDK Flight Recorder

As regular readers of this blog will know, JDK Flight Recorder (JFR) is one of my favorite tools of the Java platform. This low-overhead event recording engine built into the JVM is invaluable for observing the runtime characteristics of Java applications and identifying any potential performance issues. JFR continues to become better and better with every new release, with one recent addition being support for native memory tracking (NMT).

Read More...

Dec 7, 2023

Getting Started With PyFlink on Kubernetes

This post originally appeared on the Decodable blog. All rights reserved. The other day, I wanted to get my feet wet with PyFlink. While there is a fair amount of related information out there, I couldn’t find really up-to-date documentation on using current versions of PyFlink with Flink on Kubernetes.

Read More...

Nov 21, 2023

"Change Data Capture Breaks Encapsulation". Does it, though?

This post originally appeared on the Decodable blog. All rights reserved. Having worked on Debezium—​an open-source platform for Change Data Capture (CDC)—​for several years, one concern I’ve heard repeatedly is this: aren’t you breaking the encapsulation of your application when you expose change event feeds directly from your database? After all, CDC exposes your internal persistent data model to the outside world, which may have unintended consequences, e.g. in terms of data exposure but also when it comes to changes to the schema of your data, which may break downstream consumers.

Read More...

Nov 14, 2023

Can Debezium Lose Events?

This question came up on the Data Engineering sub-reddit the other day: Can Debezium lose any events? I.e. can there be a situation where a record in a database gets inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?

Read More...

Nov 2, 2023

CDC Use Cases: 7 Ways to Put CDC to Work

This post originally appeared on the Decodable blog. All rights reserved. Change Data Capture (CDC) is a powerful tool in data engineering and has seen a tremendous uptake in organizations of all kinds over the last few years. This is because it enables the tight integration of transactional databases into many other systems in your business at a very low latency.

Read More...