Three Plus Some Lovely Kafka Trends
Over the course of the last few months, I’ve had the pleasure to serve on the Kafka Summit program committee and review several hundred session abstracts for the three Summits happening this year (Europe, APAC, Americas). That’s not only a big honour, but also a unique opportunity to learn what excites people currently in the Kafka eco-system (and yes, it’s a fair amount of work, too ;).
While voting on the proposals, and also generally aspiring to stay informed of what’s going on in the Kafka community at large, I noticed a few repeating themes and topics which I thought would be interesting to share (without touching on any specific talks of course). At first I meant to put this out via a Twitter thread, but then it became a bit too long for that, so I decided to write this quick blog post instead. Here it goes!
Exploring ZooKeeper-less Kafka
Sometimes, less is more. One case where that’s certainly true is dependencies. And so it shouldn’t come at a surprise that the Apache Kafka community is eagerly awaiting the removal of the dependency to the ZooKeeper service, which currently is used for storing Kafka metadata (e.g. about topics and partitions) as well as for the purposes of leader election in the cluster.
The Kafka improvement proposal KIP-500 ("Replace ZooKeeper with a Self-Managed Metadata Quorum") promises to make life better for users in many regards:
Better getting started and operational experience by requiring to run only one system, Kafka, instead of two
Removing potential for discrepancies of metadata state between ZooKeeper and the Kafka controller
Simplifying configuration, for instance when it comes to security
Better scalability, e.g. in terms of number of partitions; faster execution of operations like topic creation
The Anatomy of ct.sym — How javac Ensures Backwards Compatibility
One of the ultimate strengths of Java is its strong notion of backwards compatibility: Java applications and libraries built many years ago oftentimes run without problems on current JVMs, and the compiler of current JDKs can produce byte code, that is executable with earlier Java versions.
For instance, JDK 16 supports byte code levels going back as far as to Java 1.7; But: hic sunt dracones. The emitted byte code level is just one part of the story. It’s equally important to consider which APIs of the JDK are used by the compiled code, and whether they are available in the targeted Java runtime version. As an example, let’s consider this simple "Hello World" program:
FizzBuzz – SIMD Style!
Java 16 is around the corner, so there’s no better time than now for learning more about the features which the new version will bring. After exploring the support for Unix domain sockets a while ago, I’ve lately been really curious about the incubating Vector API, as defined by JEP 338, developed under the umbrella of Project Panama, which aims at "interconnecting JVM and native code".
Vectors?!? Of course this is not about renewing the ancient Java collection types like java.util.Vector (<insert some pun about this here>), but rather about an API which lets Java developers take advantage of the vector calculation capabilities you can find in most CPUs these days. Now I’m by no means an expert on low-level programming leveraging specific CPU instructions, but exactly that’s why I hope to make the case with this post that the new Vector API makes these capabilities approachable to a wide audience of Java programmers.
Talking to Postgres Through Java 16 Unix-Domain Socket Channels
Update Feb 5: This post is discussed on Hacker News
Reading a blog post about what’s coming up in JDK 16 recently, I learned that one of the new features is support for Unix domain sockets (JEP 380). Before Java 16, you’d have to resort to 3rd party libraries like jnr-unixsocket in order to use them. If you haven’t heard about Unix domain sockets before, they are "data communications [endpoints] for exchanging data between processes executing on the same host operating system". Don’t be put off by the name btw.; Unix domain sockets are also supported by macOS and even Windows since version 10.
jlink's Missing Link: API Signature Validation
Discussions around Java’s jlink tool typically center around savings in terms of (disk) space. Instead of shipping an entire JDK, a custom runtime image created with jlink contains only those JDK modules which an application actually requires, resulting in smaller distributables and container images.
But the contribution of jlink — as a part of the Java module system at large — to the development of Java application’s is bigger than that: with the notion of link time it defines an optional complement to the well known phases compile time and application run-time:
Link time is an opportunity to do whole-world optimizations that are otherwise difficult at compile time or costly at run-time. An example would be to optimize a computation when all its inputs become constant (i.e., not unknown). A follow-up optimization would be to remove code that is no longer reachable.
ByteBuffer and the Dreaded NoSuchMethodError
The other day, a user in the Debezium community reported an interesting issue; They were using Debezium with Java 1.8 and got an odd NoSuchMethodError:
Towards Continuous Performance Regression Testing
Functional unit and integration tests are a standard tool of any software development organization, helping not only to ensure correctness of newly implemented code, but also to identify regressions — bugs in existing functionality introduced by a code change. The situation looks different though when it comes to regressions related to non-functional requirements, in particular performance-related ones: How to detect increased response times in a web application? How to identify decreased throughput?
These aspects are typically hard to test in an automated and reliable way in the development workflow, as they are dependent on the underlying hardware and the workload of an application. For instance assertions on the duration of specific requests of a web application typically cannot be run in a meaningful way on a developer laptop, which differs from the actual production hardware (ironically, nowadays both is an option, the developer laptop being less or more powerful than the actual production environment). When run in a virtualized or containerized CI environment, such tests are prone to severe measurement distortions due to concurrent load of other applications and jobs.
This post introduces the JfrUnit open-source project, which offers a fresh angle to this topic by supporting assertions not on metrics like latency/throughput themselves, but on indirect metrics which may impact those. JfrUnit allows you define expected values for metrics such as memory allocation, database I/O, or number of executed SQL statements, for a given workload and asserts the actual metrics values — which are obtained from JDK Flight Recorder events — against these expected values. Starting off from a defined base line, future failures of such assertions are an indicator for potential performance regressions in an application, as a code change may have introduced higher GC pressure, the retrieval of unneccessary data from the database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.
Smaller, Faster-starting Container Images With jlink and AppCDS
A few months ago I wrote about how you could speed up your Java application’s start-up times using application class data sharing (AppCDS), based on the example of a simple Quarkus application. Since then, quite some progress has been made in this area: Quarkus 1.6 brought built-in support for AppCDS, so that now you just need to provide the -Dquarkus.package.create-appcds=true option when building your project, and you’ll find an AppCDS file in the target folder.
Things get more challenging though when combining AppCDS with custom Java runtime images, as produced using the jlink tool added in Java 9. Combining custom runtime images with AppCDS is very attractive, in particular when looking at the deployment of Java applications via Linux containers. Instead of putting the full Java runtime into the container image, you only add those JDK modules which your application actually requires. (Parts of) what you save in image size by doing so, can be used for adding an AppCDS archive to your container image. The result will be a container image which still is smaller than before — and thus is faster to push to a container registry, distribute to worker nodes in a Kubernetes cluster, etc. — and which starts up significantly faster.
Quarkus and Testcontainers
The Testcontainers project is invaluable for spinning up containerized resources during your (JUnit) tests, e.g. databases or Kafka clusters.
For users of JUnit 5, the project provides the @Testcontainers extension, which controls the lifecycle of containers used by a test. When testing a Quarkus application though, this is at odds with Quarkus' own @QuarkusTest extension; it’s a recommended best practice to avoid fixed ports for any containers started by Testcontainers. Instead, you should rely on Docker to automatically allocate random free ports. This avoids conflicts between concurrently running tests, e.g. amongst multiple Postgres containers, started up by several parallel job runs in a CI environment, all trying to allocate Postgres' default port 5432. Obtaining the randomly assigned port and passing it into the Quarkus bootstrap process isn’t possible though when combining the two JUnit extensions.