The Insatiable Postgres Replication Slot
While working on a demo for processing change events from Postgres with Apache Flink, I noticed an interesting phenomenon: a Postgres database which I had set up for that demo on Amazon RDS ran out of disk space. The machine had a disk size of 200 GiB, which was fully used up in less than two weeks.
Now, a common cause for this kind of issue is a replication slot which is not advanced: in that case, Postgres will hold on to all WAL segments after the latest log sequence number (LSN) which was confirmed for that slot. Indeed, I had set up a replication slot (via the Decodable CDC source connector for Postgres, which is based on Debezium). I had then stopped that connector, causing the slot to become inactive. The problem, though, was that I was really sure that there was no traffic in that database whatsoever! What could cause a WAL growth of ~18 GB/day then?
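To get a feeling for how much WAL an inactive slot pins, it helps to know that an LSN is a 64-bit byte position, printed as two hexadecimal numbers separated by a slash. The following is a minimal sketch of that arithmetic in Python; the helper names and the two LSN values are made up for illustration, and in practice the values would typically come from `pg_current_wal_lsn()` and the `pg_replication_slots` view.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a Postgres LSN such as '16/B374D848' into a byte position."""
    high, low = lsn.split("/")
    # The part before the slash holds the upper 32 bits, the part after it the lower 32 bits
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, slot_lsn: str) -> int:
    """Bytes of WAL between the slot's position and the current WAL position."""
    return parse_lsn(current_lsn) - parse_lsn(slot_lsn)


# Hypothetical values, chosen for illustration only
current = "2F/40000000"  # current WAL write position
slot = "2A/C0000000"     # position last confirmed for the inactive slot

retained = retained_wal_bytes(current, slot)
print(f"{retained / 2**30:.1f} GiB retained")  # prints "18.0 GiB retained"
```

A slot whose position does not move while the current LSN keeps growing shows up as a steadily increasing difference here, which is exactly the disk usage pattern described above.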
Running a Quarkus Native Application on Render
This is a quick rundown of the steps required for running JVM applications, built using Quarkus and GraalVM, on Render.
Render is a cloud platform for running websites and applications. Like most other comparable services such as fly.io, it offers a decent free tier, which lets you try out the service without any financial commitment. Unlike most other services, Render doesn’t require you to provide a credit card in order to use the free tier. This means there’s no risk of surprise bills, as is often the case with pay-per-use models, where a malicious actor could DDoS your service and drive up the cost for consumed CPU resources or egress bandwidth indefinitely.
Why I Joined Decodable
It’s my first week as a software engineer at Decodable, a start-up building a serverless real-time data platform! When I shared this news on social media yesterday, folks were not only super supportive and excited for me (thank you so much for all the nice words and wishes!), but some also asked about the reasons behind my decision to switch jobs and join a start-up, after having worked for Red Hat for the last few years.
An Ideation for Kubernetes-native Kafka Connect
Kafka Connect, part of the Apache Kafka project, is a development framework and runtime for connectors which either ingest data into Kafka clusters (source connectors) or propagate data from Kafka into external systems (sink connectors). A diverse ecosystem of ready-made connectors has come to life on top of Kafka Connect, which lets you connect all kinds of data stores, APIs, and other systems to Kafka in a no-code approach.
With the continued move towards running software in the cloud, and on Kubernetes in particular, it’s only natural that many folks also try to run Kafka Connect on Kubernetes.
Testing Kafka Connectors
Kafka Connect is a key factor for the widespread adoption of Apache Kafka: a framework and runtime environment for connectors, it makes the task of getting data either into Kafka or out of Kafka solely a matter of configuration, rather than a bespoke programming job. There are dozens, if not hundreds, of ready-made source and sink connectors, allowing you to create no-code data pipelines between all kinds of databases, APIs, and other systems.
There may be situations though where there is no existing connector matching your requirements, in which case you can implement your own custom connector using the Kafka Connect framework. Naturally, this raises the question of how to test such a Kafka connector, making sure it propagates the data between the connected external system and Kafka correctly and completely. In this blog post I’d like to focus on testing approaches for Kafka Connect source connectors, i.e. connectors like Debezium, which ingest data from an external system into Kafka. Very similar strategies can be employed for testing sink connectors, though.
Ten Tips to Make Conference Talks Suck Less
Every so often, I come across a conference talk which is highly interesting in terms of its actual contents, but which unfortunately is presented in a less than ideal way. I’m thinking of basic mistakes here, such as the presenter primarily looking at their slides rather than at the audience. I always feel a bit sorry when this happens, as I firmly believe that everyone can give good and even great talks, just by being aware of (and thus avoiding) a few common mistakes, and sticking to some simple principles.
Loom and Thread Fairness
Update Jun 3: This post is discussed on Reddit and Hacker News
Project Loom (JEP 425) is probably amongst the most awaited feature additions to Java ever; its implementation of virtual threads (or "green threads") promises developers the ability to create highly concurrent applications, for instance with hundreds of thousands of open HTTP connections, sticking to the well-known thread-per-request programming model, without having to resort to less familiar and often more complex to use reactive approaches.
Having been in the works for several years, Loom got merged into the mainline of OpenJDK just recently and is available as a preview feature in the latest Java 19 early access builds. So it’s the perfect time to get your hands on virtual threads and explore the new feature. In this post I’m going to share an interesting aspect I learned about thread scheduling fairness for CPU-bound workloads running on Loom.
Running JDK Mission Control on Apple M1
JDK Mission Control (JMC) is invaluable for analysing performance data recorded using JDK Flight Recorder (JFR). The other day, I ran into a problem when trying to run JMC on my Mac Mini M1. Mostly for my own reference, here’s what I did to overcome it.
The Code Review Pyramid
When it comes to code reviews, it’s a common phenomenon that much of the focus and many long-winded discussions revolve around mundane aspects like code formatting and style, whereas important aspects (does the code change do what it is supposed to do, is it performant, is it backwards-compatible for existing clients, and many others) tend to get less attention.
To raise awareness of the issue and provide some guidance on aspects to focus on, I shared a small visual on Twitter the other day, which I called the "
The JDK Flight Recorder File Format
The JDK Flight Recorder (JFR) is one of Java’s secret weapons; deeply integrated into the HotSpot VM, it’s a high-performance event collection framework, which lets you collect metrics on runtime aspects like object allocation and garbage collection, class loading, file and network I/O, and lock contention, do method profiling, and much more.
JFR data is persisted in recording files (since Java 14, "realtime" event streaming is supported, too), which can be loaded for analysis into tools like JDK Mission Control (JMC), or the jfr utility coming with OpenJDK itself.