Class Unloading in Layered Java Applications

Posted at Oct 14, 2020

Layers are sort of the secret sauce of the Java platform module system (JPMS): by providing fine-grained control over how individual JPMS modules and their classes are loaded by the JVM, they enable advanced usages like loading multiple versions of a given module, or dynamically adding and removing modules at application runtime.

The Layrry API and launcher provides a small plug-in API based on top of layers, which for instance can be used to dynamically add plug-ins contributing new views and widgets to a running JavaFX application. If such plug-in gets removed from the application again, all its classes need to be unloaded by the JVM, avoiding an ever-increasing memory consumption if for instance a plug-in gets updated multiple times.

In this blog post I’m going to explore how to ensure classes from removed plug-in layers are unloaded in a timely manner, and how to find the culprit in case some class fails to be unloaded.

Do We Really Need Plug-ins?

Before diving into the details of class unloading, let’s spend some time to think about the use cases for dynamic plug-ins in Java applications to begin with. I would argue that for typical backend applications this need mostly has diminished. At large, the industry is moving away from application servers and their model around "deploying" applications (which you could consider as some kind of "plug-in") into a running server process. Instead, there’s a strong trend towards immutable application packages, based on stacks like Quarkus or Spring Boot, embedding the web server, the application as well as its dependencies, often-times deployed as container images.

The advantages of this approach centered around immutable images manifold, e.g. in terms of security (no interface for deploying applications is needed) and governance (it’s always exactly clear which version of the application is running). Updates — i.e. the deployment of a new revision of the container image — can be put in place e.g. with help of a proxy in front of a cluster of application nodes, which are updated in a rolling manner. That way, there’s no downtime of the service that’ll impact the user. Also techniques like canary releases and A/B testing, as well as rolling back to specific earlier versions of an application become a breeze that way.

The situation is different though when it comes to client applications. When thinking of your favourite editor, IDE or web browser for instance, requiring a restart when installing or updating a plug-in is not desirable. Instead, it should be possible to add plug-ins (or new plug-in versions) to a running application instance and be usable immediately, without interrupting the flow of the user. The same applies for many IoT scenarios, where e.g. an application consuming sensor measurements should be updateable without any downtime.

Plug-ins in Layered Java Applications

JPMS addresses this requirement via the notion of module layers:

A layer is created from a graph of modules in a Configuration and a function that maps each module to a ClassLoader. Creating a layer informs the Java virtual machine about the classes that may be loaded from the modules so that the Java virtual machine knows which module that each class is a member of.

Layers are the perfect means of adding new code into a running Java application: they can be added and removed dynamically, and code in an already running layer can invoke functionality from a dynamically added layer in different ways, e.g. via reflection or by using the service loader API. Layrry exposes this functionality via a very basic plug-in API:

public interface PluginLifecycleListener {

    void pluginAdded(PluginDescriptor plugin);

    void pluginRemoved(PluginDescriptor plugin);
}

public class PluginDescriptor {

    public String getName() { ... }

    public ModuleLayer getModuleLayer() { ... }
}

A plug-in in this context is a JPMS layer containing one or more modules (either explicit or automatic) which all are loaded via a single class loader. A Layrry-based application can implement the PluginLifecycleListener service contract in order to be notified whenever a plug-in is added or removed. Plug-ins are loaded from configured directories in the file system which are monitored by Layrry (other means of (un-)installing plug-ins may be added in future versions of Layrry).

Installing a plug-in is as easy as copying its JAR(s) into a sub-folder of such monitored directory. Layrry will copy the plug-in contents to a temporary directory, create a layer with all the plug-ins JARs, and notify any registered plug-in listeners about the new layer. These will typically use the service loader API then to interact with application-specific services which model its extension points, e.g. to contribute visual UI components in case of a desktop application.

The reverse process happens when a plug-in gets un-installed: the user removes a plug-in’s directory, and all listeners will be notified by the Layrry about the removal. They should release all references to any classes from the removed plug-in, rendering it avaible for garbage collection.

Class Unloading in Practice

There is no API in the Java platform for explicitly unloading a given class. Instead, "a class or interface may be unloaded if and only if its defining class loader may be reclaimed by the garbage collector" (JLS, chapter 12.7). This means in a layered Java application any classes in a layer that got removed can be unloaded as soon as the layer’s class loader is subject to GC. Most importantly, no class in a still running layer must keep a (strong) reference to any class of the removed layer; otherwise this class would hinder collecting the removed layer’s loader and its classes.

As an example, let’s look at the modular-tiles demo, a JavaFX application which uses the Layrry plug-in API for dynamically adding and removing tiles with different widgets like clocks and gauges to its graphical UI. The tiles themselves are implemented using the fabulous TilesFX project by Gerrit Grundwald.

If you want to follow along, check out the source code of the demo and build it as per the instructions in the README file. Then run the Layrry launcher with the -Xlog:class+unload=info option, so to be notified about any unloaded classes in the system output:

java -Xlog:class+unload=info \
  -jar path/to/layrry-launcher-1.0-SNAPSHOT-all.jar \
  --layers-config staging/layers.toml \
  --properties staging/versions.properties

Now add and remove some tiles plug-ins a few times:

cp -r staging/plugins-prepared/* staging/plugins
rm -rf staging/plugins/*

The widgets will show up and disappear in the JavaFX UI, but what about class unloading in the logs? In all likelyhood, nothing! This is because without any further configuration, the G1 garbage collector (which is used by the JDK by default since Java 9) will unload classes only during a full garbage collection, which may only run after a long time (if at all), if there’s no substantial object allocation happening.

JEP 158: Unified JVM Logging

The -Xlog option has been defined by JEP 158, added to the JDK with Java 9, which provides a "common logging system for all components of the JVM". The new unified options should be preferred over the legacy options like -XX:+TraceClassLoading and -XX:+TraceClassUnloading. Usage of -Xlog is described in detail in the java man page; also Nicolai Parlog discusses JEP 158 in great depth in this blog post.

So at this point you could trigger a GC explicitly, e.g. via jcmd:

jcmd <pid> GC.run

But of course that’s not too desirable when running things in production. Instead, if you’re on JDK 12 or later, you can use the new G1PeriodicGCInterval option for triggering a periodic GC:

java -Xlog:class+unload=info \
  -XX:G1PeriodicGCInterval=5000 \
  -jar path/to/layrry-launcher-1.0-SNAPSHOT-all.jar \
  --layers-config staging/layers.toml \
  --properties staging/versions.properties

Introduced via JEP 346 ("Promptly Return Unused Committed Memory from G1"), this will periodically initiate a concurrent GC cycle (or optionally even a full GC). Add and remove some plug-ins again, and after some time you should see messages about the unloaded classes in the log:

...
[138.912s][info][class,unload] unloading class org.kordamp.tiles.sparkline.SparklineTilePlugin 0x0000000800de1840
[138.912s][info][class,unload] unloading class org.kordamp.tiles.gauge.GaugeTilePlugin 0x0000000800de2040
[138.913s][info][class,unload] unloading class org.kordamp.tiles.clock.ClockTilePlugin 0x0000000800de2840
...

From what I observed, class unloading doesn’t happen on every concurrent GC cycle; it might take a few cycles after a plug-in has been removed until its classes are unloaded. If you’re not using G1, but the new low-pause concurrent collectors Shenandoah or ZGC, they’ll be able to concurrently unload classes without any special configuration needed. Note that class unloading is not a mandatory operation which would have to be provided by every GC implementation. E.g. initial ZGC releases did not support class unloading, which would have rendered them unsuitable for this use case.

JEP 371: Hidden Classes

As mentioned above, regular classes can only be unloaded if their defining class loader become subject to garbage collection. This can be an issue for frameworks and libraries which generate lots of classes dynamically at runtime, e.g. script language implementations or solutions like Presto, which generates a class for each query.

The traditional workaround is to generate each class using its own dedicated class loader, which then can be discarded specifically. This solves the GC issue, but it isn’t ideal in terms of overall memory consumption and speed of class generation. Hence, JDK 15 defines a notion of Hidden Classes (JEP 371), which are not created by class loaders and thus can be unloaded eagerly: "when all instances of the hidden class are reclaimed and the hidden class is no longer reachable, it may be unloaded even though its notional defining loader is still reachable".

You can find some more information on hidden classes in this tweet thread and this code example on GitHub.

But who wants to stare at logs in the system output, that’s so 2010! So let’s fire up JDK Mission Control and trigger a recording via the JDK Flight Recorder (JFR) to observe what’s going on in more depth.

JFR can capture class unloading events, you need to make sure though to enable this event type, which is not the case by default. In order to do so, start a recording, then go to the Template Manager, edit or create a flight recording template and check the Enabled box for the events under Java Virtual Machine → Class Loading. With the recorder running, add and remove some tiles plug-ins to the running application.

Once the recording is finished, you should see class unloading events under JVM Internals → Class Loading:

JFR class unloading events in JDK Mission Control

In this case, the classes from a set of plug-ins were unloaded at 16:48:11, which correlates to the periodic GC cycle running at that time and spending a slightly increased time for cleaning up class loader data:

As a good Java citizen, Layrry itself also emits JFR events whenever a plug-in layer is added or removed, which helps to track the need for classes to be unloaded:

JFR Layrry layer removal events in JDK Mission Control

If Things Go Wrong

Now let’s look at the situation where some class failed to unload after its plug-in layer was removed. Common reasons for that include remaining references from classes in a still running layer to classes in the removed layer, threads started by a class in the removed layer which were not stopped, and JVM shutdown hooks registered by code in the removed layer.

This is known as a class loader leak and is problematic as it means more and more memory will be consumed and cannot be freed as plug-ins are added and removed, which eventually may lead to an OutOfMemoryError. So how could you detect and analyse this situation? An OutOfMemoryError in production would surely be an indicator that there must be a memory or class loader leak somewhere. It’s also a good idea to regularly examine JFR recording files (e.g. in your testing or staging environment): the absence of any class unloading event despite the removal of plug-ins should trigger an investigation.

As far as analysing the situation is concerned, examining a heap dump of the application will typically yield insight into the cause rather quickly. Take a heap dump using jcmd as shown above, then load the dump into a tool such as Eclipse MAT. In Eclipse MAT, the "Duplicate Classes" action is a great starting point. If one class has been loaded by multiple class loaders, but failed to unload, it’s a pretty strong indicator that something is wrong:

The next step is to analyse the shortest path from the involved class loaders to a GC root:

Analyzing shortest paths to GC roots in Eclipse MAT

Some object on that path must hold on to a reference to a class or the class loader of the removed plug-in, preventing the loader to be GC-ed. In the case at hand, it’s the leakingPlugins field in the PluginRegistry class, to which each plug-in is added upon addition of the layer, but then apparently its coffee-deprived author forgot to remove the plug-in from that collection within the pluginRemoved() event handler ;)

As a quick side note, there’s a really cool plug-in for Eclipse MAT written by Vladimir Sitnikov, which allows you to query heap dumps using SQL. It maps each class to its own "table", so that e.g. classes loaded more than once could be selected using the following SQL query on the java.lang.Class class:

select
  c.name,
  listagg(toString(c."@classLoader")) as 'loaders',
  count(*) as 'count'
from
  "java.lang.Class" c
where
  c.name <> ''
group by
  c.name
having
  count(*) > 1

Resulting in the same list of classes as above:

Analyzing heap dumps in Eclipse MAT using SQL

This could come in very handy for more advanced heap dump analyses, which cannot be done using Eclipse MAT’s built-in query capabilities.

Learning More

Via module layers, JPMS provides the foundation for dynamic plug-in architectures, as demonstrated by Layrry. Removing layers at runtime requires some care and consideration, so to avoid class loader leaks which eventually may lead to OutOfMemoryErrors. As so often, JDK Mission Control, JFR, and Eclipse MAT prove to be invaluable tools in the box of every Java developer, helping to ensure class unloading in your layered applications is done correctly, and if it is not, helping to understand and fix the underlying issue.

Here are some more resources about class unloading and analysing class loader leaks:

Shenandoah GC in JDK 14, Part 2: Concurrent roots and class unloading: A blog post touching on class unloading in Shenandoah by Roman Kennke
ZGC Concurrent Class Unloading: A conference talk by Erik Österlund
class loader leaks: A series of blog posts by Mattias Jiderhamn
ClassLoader & memory leaks: a Java love story: A post about heap dump analysis by Aloïs Micard

Lastly, if you’d like to explore the dynamic addition and removal of JPMS layers to a running application yourself, the modular-tiles demo app is a great starting point. Its source code can be found on GitHub.

Gunnar Morling

Random Musings on All Things Software Engineering