Gunnar Morling

Gunnar Morling

Random Musings on All Things Software Engineering

Gunnar Morling

Gunnar Morling

Random Musings on All Things Software Engineering

Recent posts

Feb 4, 2024

1BRC—The Results Are In!

Oh what a wild ride the last few weeks have been. The One Billion Row Challenge (1BRC for short), something I had expected to be interesting to a dozen folks or so at best, has gone kinda viral, with hundreds of people competing and engaging. In Java, as intended, but also beyond: folks implemented the challenge in languages such as Go, Rust, C/C++, C#, Fortran, or Erlang, as well databases (Postgres, Oracle, Snowflake, etc.), and tools like awk. It’s really incredible how far people have pushed the limits here. Pull request by pull request, the execution times for solving the problem layed out in the challenge — aggregating random temperature values from a file with 1,000,000,000 rows — improved by two orders of magnitudes in comparison to the initial baseline implementation. Today I am happy to share the final results, as the challenge closed for new entries after exactly one month on Jan 31 and all submissions have been reviewed.

Read More...

Jan 1, 2024

The One Billion Row Challenge

Update Jan 4: Wow, this thing really took off! 1BRC is discussed at a couple of places on the internet, including Hacker News, lobste.rs, and Reddit. For folks to show-case non-Java solutions, there is a "Show & Tell" now, check that one out for 1BRC implementations in Rust, Go, C++, and others. Some interesting related write-ups include 1BRC in SQL with DuckDB by Robin Moffatt and 1 billion rows challenge in PostgreSQL and ClickHouse by Francesco Tisiot. Thanks a lot for all the submissions, this is going way beyond what I’d have expected! I am behind a bit with evalutions due to the sheer amount of entries, I will work through them bit by bit. I have also made a few clarifications to the rules of the challenge; please make sure to read them before submitting any entries. Let’s kick off 2024 true coder style—​I’m excited to announce the One Billion Row Challenge (1BRC), running from Jan 1 until Jan 31. Your mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!

Read More...

Dec 17, 2023

Tracking Java Native Memory With JDK Flight Recorder

Update Dec 18: This post is discussed on Hacker News 🍊 As regular readers of this blog will now, JDK Flight Recorder (JFR) is one of my favorite tools of the Java platform. This low-overhead event recording engine built into the JVM is invaluable for observing the runtime characteristics of Java applications and identifying any potential performance issues. JFR continues to become better and better with every new release, with one recent addition being support for native memory tracking (NMT).

Read More...

Nov 14, 2023

Can Debezium Lose Events?

This question came up on the Data Engineering sub-reddit the other day: Can Debezium lose any events? I.e. can there be a situation where a record in a database get inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?

Read More...

Feb 28, 2023

Finding Java Thread Leaks With JDK Flight Recorder and a Bit Of SQL

The other day at work, we had a situation where we suspected a thread leak in one particular service, i.e. code which continuously starts new threads, without taking care of ever stopping them again. Each thread requires a bit of memory for its stack space, so starting an unbounded number of threads can be considered as a form of memory leak, causing your application to run out of memory eventually. In addition, the more threads there are, the more overhead the operating system incurs for scheduling them, until the scheduler itself will consume most of the available CPU resources. Thus it’s vital to detect and fix this kind of problem early on.

Read More...

Jan 15, 2023

Getting Started With Java Development in 2023 — An Opinionated Guide

27 years of age, and alive and kicking — The Java platform regularly comes out amongst the top contenders in rankings like the TIOBE index. In my opinion, rightly so. The language is very actively maintained and constantly improved; its underlying runtime, the Java Virtual Machine (JVM), is one of, if not the most, advanced runtime environments for managed programming languages. There is a massive eco-system of Java libraries which make it a great tool for a large number of use cases, ranging from command-line and desktop applications, over web apps and backend web services, to datastores and stream processing platforms. With upcoming features like support for vectorized computations (SIMD), light-weight virtual threads, improved integration with native code, value objects and user-defined primitives, and others, Java is becoming an excellent tool for solving a larger number of software development tasks than ever before.

Read More...

Jan 5, 2023

Oh... This is Prod?!

I strongly believe that you should avoid connecting to production environments from local developer machines as much as possible. But sometimes, e.g. in order to analyse some specific kinds of failures, doing so can be inevitable. Now, if this is the case, I really, really want to be sure that I’m aware of the environment I am working in. I absolutely want to avoid a situation as in the catchy title of this post, when for instance you realize that you just ran some integration test against a production environment. In the context of working with the AWS CLI tool this means I’d like to be aware of the currently active profile by means of coloring my shell accordingly. Here’s how I’ve set this up using iTerm2 and zsh.

Read More...

Jan 3, 2023

Is your Blocking Queue... Blocking?

Java’s BlockingQueue hierarchy is widely used for coordinating work between different producer and consumer threads. When set up with a maximum capacity (i.e. a bounded queue), no more elements can be added by producers to the queue once it is full, until a consumer has taken at least one element. For scenarios where new work may arrive more quickly than it can be consumed, this applies means of back-pressure, ensuring the application doesn’t run out of memory eventually, while enqueuing more and more work items.

Read More...

Dec 18, 2022

Maven, What Are You Waiting For?!

As part of my new job at Decodable, I am also planning to contribute to the Apache Flink project (as Decodable’s fully-managed stream processing platform is based on Flink). Right now, I am in the process of familiarizing myself with the Flink code base, and as such I am of course building the project from source, too.

Read More...

Nov 30, 2022

The Insatiable Postgres Replication Slot

While working on a demo for processing change events from Postgres with Apache Flink, I noticed an interesting phenomenon: A Postgres database which I had set up for that demo on Amazon RDS, ran out of disk space. The machine had a disk size of 200 GiB which was fully used up in the course of less than two weeks. Now a common cause for this kind of issue are replication slots which are not advanced: in that case, Postgres will hold on to all WAL segments after the latest log sequence number (LSN) which was confirmed for that slot. Indeed I had set up a replication slot (via the Decodable CDC source connector for Postgres, which is based on Debezium). I then had stopped that connector, causing the slot to become inactive. The problem was though that I was really sure that there was no traffic in that database whatsoever! What could cause a WAL growth of ~18 GB/day then?

Read More...