Smaller, Faster-starting Container Images With jlink and AppCDS

Posted at Dec 13, 2020

A few months ago I wrote about how you could speed up your Java application’s start-up times using application class data sharing (AppCDS), based on the example of a simple Quarkus application. Since then, quite some progress has been made in this area: Quarkus 1.6 brought built-in support for AppCDS, so that now you just need to provide the -Dquarkus.package.create-appcds=true option when building your project, and you’ll find an AppCDS file in the target folder.

Things get more challenging though when combining AppCDS with custom Java runtime images, as produced using the jlink tool added in Java 9. Combining custom runtime images with AppCDS is very attractive, in particular when looking at the deployment of Java applications via Linux containers. Instead of putting the full Java runtime into the container image, you only add those JDK modules which your application actually requires. (Parts of) what you save in image size by doing so, can be used for adding an AppCDS archive to your container image. The result will be a container image which still is smaller than before — and thus is faster to push to a container registry, distribute to worker nodes in a Kubernetes cluster, etc. — and which starts up significantly faster.

A challenge though is that AppCDS archives must be created with exactly same Java runtime which later on is used to run the application. In the case of jlink this means the custom runtime image itself must be used to produce the AppCDS archive. In other words, the default archive produced by the Quarkus build unfortunately cannot be used with jlink images. The goal for this post is to explore

the steps required to create a custom runtime image for a simple Java CRUD application based on Quarkus,
how to build a Linux container image with this custom runtime image and the application itself,
how this approach compares to container images with the full Java runtime in terms of size and start-up time.

Creating a Modular Runtime Image for a Quarkus Application

It’s a common misbelief that only Java applications which have been fully ported to the Java module system (JPMS) would be able to benefit from jlink. But as explained by Simon Ritter in this blog post, this is not true actually; you don’t need to fully modularize an application in order to run it via a custom runtime image.

While indeed the creation of a runtime image is a bit easier when it only is comprised of proper Java modules, it also is possible to create a runtime image by explicitly stating which JDK (or other) modules it should contain. The application can then be run via the traditional classpath, just as you’d do it with a full Java runtime. Which JDK modules to add though? To answer this question, the jdeps tool comes in handy. Via its --print-module-deps option it can determine for a given set of JARs which (JDK) modules they depend on, and which thus are the ones that need to go into the custom runtime image.

Having built the example application from the previous blog post via mvn clean verify, let’s try and invoke jdeps like so:

jdeps --print-module-deps \
    --class-path target/lib/* \
    target/todo-manager-1.0.0-SNAPSHOT-runner.jar

This results in an error though:

Error: com.sun.istack.istack-commons-runtime-3.0.10.jar is a multi-release jar file but --multi-release option is not set

Ok, we need to tell which code version to analyse for multi-release JARs; no problem:

jdeps --print-module-deps \
    --multi-release 15 \
    --class-path target/lib/* \
    target/todo-manager-1.0.0-SNAPSHOT-runner.jar

Hum, some progress, but still an issue:

Exception in thread "main" java.lang.module.FindException: Module java.xml.bind not found, required by java.ws.rs

This one is a bit odd; the file org.jboss.spec.javax.ws.rs.jboss-jaxrs-api_2.1_spec-2.0.1.Final.jar is an explicit module with a module-info.class descriptor, which references the module java.xml.bind, and this one is not found on the module path. It’s not quite clear to me why this is flagged here, given that the JAX-RS API JAR is part of the class path and not the module path. But it’s not a big problem, we simply can add the JAXB API (which also is provided on the class path) on the module path, too.

The same issue arises for some other dependencies which are explicit modules already, so we end up with the following configuration:

jdeps --print-module-deps \
    --multi-release 15 \
    --module-path target/lib/jakarta.activation.jakarta.activation-api-1.2.1.jar:target/lib/org.reactivestreams.reactive-streams-1.0.3.jar:target/lib/org.jboss.spec.javax.xml.bind.jboss-jaxb-api_2.3_spec-2.0.0.Final.jar \
    --class-path target/lib/* \
    target/todo-manager-1.0.0-SNAPSHOT-runner.jar

And another issue, now about some missing dependencies:

...
org.postgresql.util.internal.Nullness              -> org.checkerframework.dataflow.qual.Pure            not found
org.wildfly.common.wildfly-common-1.5.4.Final-format-001.jar
   org.wildfly.common.Substitutions$Target_Branch     -> com.oracle.svm.core.annotate.AlwaysInline          not found
...

After taking a closer look, these are either compile-time only dependencies (like annotations from the Checker framework), or dependencies of optional features which are not relevant for our case. These can be safely ignored using the --ignore-missing-deps switch, which leaves us with this jdeps invocation:

jdeps --print-module-deps \
    --ignore-missing-deps \
    --multi-release 15 \
    --module-path target/lib/jakarta.activation.jakarta.activation-api-1.2.1.jar:target/lib/org.reactivestreams.reactive-streams-1.0.3.jar:target/lib/org.jboss.spec.javax.xml.bind.jboss-jaxb-api_2.3_spec-2.0.0.Final.jar \
    --class-path target/lib/* \
    target/todo-manager-1.0.0-SNAPSHOT-runner.jar

The required JDK modules are printed out finally:

java.base,java.compiler,java.instrument,java.naming,java.rmi,java.security.jgss,java.security.sasl,java.sql,jdk.jconsole,jdk.unsupported

I.e. out of the nearly 60 modules which make up OpenJDK 15, only ten are required by this particular application. Building a custom runtime image containing only these modules should result in quite some space saving.

Why is a Particular Module Required?

When looking at the module list, you might wonder why certain modules actually are needed. What is this application doing with jdk.jconsole for instance? To gain insight into this, jdeps can help, too. Run it again without the --print-module-deps switch, and you can grep for interesting module references:

jdeps <...> | grep jconsole

org.jboss.narayana.jta.narayana-jta-5.10.6.Final.jar -> jdk.jconsole
  com.arjuna.ats.arjuna.tools.stats ->
    com.sun.tools.jconsole jdk.jconsole

In this case, there’s a single dependency to jconsole, from the Narayana transaction manager. Depending on the details, it might be an opportunity to reach out to the maintainers of such library and discuss, whether this dependency really is needed or whether it could be avoided (e.g. by moving the code in question to a separate module), resulting in a further decreased size of custom runtime images.

With the list of required modules, creating the actual runtime image is rather simple:

$JAVA_HOME/bin/jlink \
  --add-modules java.base,java.compiler,java.instrument,java.naming,java.rmi,java.security.jgss,java.security.sasl,java.sql,jdk.jconsole,jdk.unsupported \
  --compress 2 --no-header-files --no-man-pages \(1)
  --output target/runtime-image (2)

1	Compressing the runtime image as well as omitting header files and man pages helps to further reduce the size of the runtime image
2	Output location for creating the runtime image

In order to create a dynamic AppCDS archive for our application classes later on, we now need to add the class data archive for all of the classes of the image itself. Failing to do so results in this error message:

Error occurred during initialization of VM
DynamicDumpSharedSpaces is unsupported when base CDS archive is not loaded

This step isn’t very well documented, and at this point I was somewhat stuck. But you always can count on the OpenJDK community: after asking about this on Twitter, Claes Redestad pointed me into the right direction:

./target/runtime-image/bin/java -Xshare:dump

Thanks, Claes! This creates the base class data archive under target/runtime-image/lib/server/classes.jsa, adding ~12 MB to the runtime image, which now has a size of ~63 MB; not too bad.

Adding an AppCDS Archive to a Custom Runtime Image

Having created the custom Java runtime image, let’s now add the AppCDS archive to it. Since the introduction of dynamic AppCDS archives in JDK 13, this is one simple step which only requires to run the application with the -XX:ArchiveClassesAtExit option:

cd target (1)

mkdir runtime-image/cds (2)

(3)
runtime-image/bin/java \
  -XX:ArchiveClassesAtExit=runtime-image/cds/app-cds.jsa \
  -jar todo-manager-1.0.0-SNAPSHOT-runner.jar

cd ..

1	The class path used when running the application later on must be the same as (or rather a prefix of, to be precise) the class path used for building the AppCDS archive; hence changing to the target directory, so to run with -jar -runner.jar, instead of with -jar target/-runner.jar
2	Creating a folder for storing the AppCDS archive
3	Using the java binary of the runtime image to launch the application and create the AppCDS archive when exiting

This will create the CDS archive under target/runtime-image/cds/app-cds.jsa. In the next step this can be added to a Linux container image, built e.g. using Docker or podman. Note that while jlink supports cross-platform builds (so for instance you could build a custom runtime image for a Linux container on macOS), the same isn’t the case for AppCDS. This means an AppCDS archive to be used by a containerized application needs to be built on Linux. When not running on Linux yourself, but on Windows or macOS, you could put the entire build process into a container for this purpose.

Creating a Linux Container Image

At this point we have built our actual application, a custom Java runtime image with the required JDK modules, and an AppCDS archive for the application’s classes. The final step is to put everything into a Linux container image, which is quickly done via a small Dockerfile:

FROM registry.fedoraproject.org/fedora-minimal:33

COPY target/runtime-image /opt/todo-manager/jdk
COPY target/lib/* /opt/todo-manager/lib/
COPY target/todo-manager-1.0.0-SNAPSHOT-runner.jar /opt/todo-manager
COPY todo-manager.sh /opt/todo-manager

ENTRYPOINT [ "/opt/todo-manager/todo-manager.sh" ]

This uses the Fedora minimal base image, which is a great foundation for container images. With a size of ~120 MB, it’s small enough to be distributed efficiently, while still providing the flexibility of a complete Linux distribution, e.g. allowing for the installation of additional tools if needed.

Even Smaller Container Images

If you wanted to shrink the image size further and felt adventureous, you could look into using Alpine Linux as a base image; the issue there though is that Alpine comes with musl instead of glibc (as used by the JDK) as its implementation of the ISO C and POSIX standard APIs. The OpenJDK Portola project aims at providing a port to Alpine and musl. But as of JDK 15, no GA build of this port exists yet. For JDK 16, an early access build of the Alpine/musl port is available.

Another option for smaller images is to use jib, which also is supported by Quarkus out of the box. I haven’t tried out yet though whether/how jib would work with custom runtime images and AppCDS.

It’s also worth pointing out that the size of base images doesn’t matter too much in practice, as container images use a layered file system, which means that typically rather stable base image layers don’t need to be redistributed too often when pushing or pulling a container image.

The container’s entry point, todo-manager.sh, is a basic shell script, which starts the actual Java application via the Java runtime image:

#!/bin/bash

export PATH="/opt/todo-manager/jdk/bin:${PATH}"

cd /opt/todo-manager && \ (1)
  exec java -Xshare:on -XX:SharedArchiveFile=jdk/cds/app-cds.jsa -jar \ (2)
  todo-manager-1.0.0-SNAPSHOT-runner.jar

1	Changing into the todo-manager directory, so to make sure the same JAR path is passed as when creating the CDS archive
2	Specifying the archive name; the -Xshare:on isn’t strictly needed, it’s used here though to ensure the process will fail if something is wrong with the CDS archive, instead of silently not using it

Let’s See Some Numbers!

Finally, let’s compare some numbers: container image size, and start-up time for different ways of containerizing the todo manager application. I’ve tried out four different aproaches:

OpenJDK 11 on the RHEL UBI 8.3 image (universal base image), as per the default Dockerfile created for new Quarkus applications
A full OpenJDK 15 on Fedora 33 (as there’s no OpenJDK 15 package for the RHEL base image yet)
A custom runtime image for OpenJDK 15 on Fedora 33
A custome runtime image with AppCDS on Fedora 33

Here are the results, running on a Hetzner Cloud CX4 instance (4 vCPUs, 16 GB RAM), using Fedora 33 as the host OS:

As we can see, the container image size is significantly lower when adding a custom Java runtime image instead of the full JDK. In particular when comparing to the OpenJDK package of Fedora 33 which is a fair bit larger than the OpenJDK 11 package of the RHEL UBI 8.3 image, the difference is striking.

The start-up times are as displayed by Quarkus, averaged over five runs. Numbers have improved by about 10% by going from OpenJDK 11 to 15, which is explained by multiple improvements in this area, most notably the introduction of default CDS archives for the JDK’s own classes in JDK 12 (JEP 341). Using a custom runtime image by itself doesn’t have any measurable impact on start-up time. The AppCDS archive improves the start-up time by a whopping 54%. Unless pure image size is the key factor for you (in which case you should look for alternative approaches anyways, see note "Even Smaller Container Images" above), I would say that the additional 40 MB for the AppCDS archive are more than worth it. In particular as the resulting container image still is way smaller than when adding the full JDK, be it with the Fedora base image or the RHEL UBI one.

Based on those numbers, I think it’s fair to say that custom Java runtime images created via jlink, combined with AppCDS archives are a great foundation for containerized Java applications. Adding a custom runtime image containing only those JDK modules actually needed by an application help to cut down image size signficantly. Parts of that saved space can be invested into adding an AppCDS archive, so you end up with a container image that’s smaller and starts up faster. I.e. you can have this cake, and eat it, too!

The one downside is the increased complexity of the build process for producing the runtime image as well as the AppCDS archive. This should be manageable though by means of scripting and automation; also I’d expect tooling like the Quarkus Maven plug-in and others to further improve on this front. One tricky aspect is that you must not forget to rebuild the custom runtime image, in case you have added dependencies to your application which affect the set of required JDK modules. Automated tests of the application running via the runtime image should help to identify this situation.

If you’d like to give it a try yourself, or obtain numbers for the different deployment approaches on your own hardware, you can find all the required code and information in this GitHub repository.

Gunnar Morling

Random Musings on All Things Software Engineering