[{"content":"","id":0,"publicationdate":"Apr 24, 2025","section":"blog","summary":"","tags":null,"title":"Blogs","uri":"https://www.morling.dev/blog/"},{"content":"","id":1,"publicationdate":"Apr 24, 2025","section":"","summary":"","tags":null,"title":"Gunnar Morling","uri":"https://www.morling.dev/"},{"content":" Update April 25: This post is being discussed on Hacker News, lobste.rs, and /r/apachekafka\nThe last few days I spent some time digging into the recently announced KIP-1150 (\u0026#34;Diskless Kafka\u0026#34;), as well as AutoMQ’s Kafka fork, tightly integrating Apache Kafka and object storage, such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.\nThis got me thinking, if we were to start all over and develop a durable cloud-native event log from scratch—​Kafka.next if you will—​which traits and characteristics would be desirable for this to have? Separating storage and compute and object store support would be table stakes, but what else should be there? Having used Kafka for many years for building event-driven applications as well as for running realtime ETL and change data capture pipelines, here’s my personal wishlist:\nDo away with partitions: topic partitions were crucial for scaling purposes when data was stored on node-local disks, but they are not required when storing data on effectively infinitely large object storage in the cloud. While partitions also provide ordering guarantees, this never struck me as overly useful from a client perspective. You either want to have global ordering of all messages on a given topic, or (more commonly) ordering of all messages with the same key. 
In contrast, defined ordering of otherwise unrelated messages whose key happens to yield the same partition after hashing isn’t that valuable, so there’s not much point in exposing partitions as a concept to users.\nKey-centric access: instead of partition-based access, efficient access and replay of all the messages with one and the same key would be desirable. Rather than coarse-grained scanning of all the records on a given topic or partition, let’s have millions of entity-level streams! Not only would this provide access exactly to the subset of data you need, it would also let you increase and decrease the number of consumers dynamically based on demand, not hitting the limits of a pre-defined partition count. Key-level streams (with guaranteed ordering) would be a perfect foundation for Event Sourcing architectures as well as actor-based and agentic systems. In addition, this approach largely solves the problem of head-of-line blocking found in partition-based systems with cumulative acknowledgements: if a consumer can’t process a particular message, this will only block other messages with the same key (which oftentimes is exactly what you’d want), while all other messages are not affected. Rather than coarse-grained partitions, individual message keys become the failure domain.\nTopic hierarchies: available in systems like Solace, topic hierarchies promote parts of the message payload into structured path-like topic identifiers, allowing clients to subscribe to arbitrary subsets of all the available streams based on patterns in an efficient way, without requiring brokers to deserialize and parse entire messages.\nMeans of concurrency control: As is, using Kafka as a system of record can be problematic as you can’t prevent writing messages which are based on an outdated view of the stored data. Concurrency control, for instance via optimistic locking of message keys, would help to detect and fence off concurrent conflicting writes. 
That way, when a message gets acknowledged successfully, it is guaranteed that it has been produced seeing the latest state of that key, avoiding lost updates.\nBroker-side schema support: Kafka treats messages as opaque byte arrays with arbitrary content, requiring out-of-band propagation of message schemas to consumers. This can be especially problematic when erroneous (or malicious) producers send non-conformant data. Also, without additional tooling, the current architecture prevents Kafka data from being written to open table formats such as Apache Iceberg. For all these reasons, Kafka is used with a schema registry most of the time, but making schema support a first-class concept would allow for better user ergonomics—​for instance, Kafka could expose AsyncAPI-compatible metadata out of the box—​and also open the door for storing data in different ways, for instance in a columnar representation.\nExtensibility and pluggability: a common trait of many successful open-source projects like Postgres or Kubernetes is their extensibility. Users and integrators can customize the behavior of the system by providing implementations of well-defined extension points and plug-in contracts, rather than by modifying the system’s core itself (following the Open-closed principle). This would enable, for instance, custom broker-side message filters and transformations (addressing many scenarios currently requiring a protocol-aware proxy such as Kroxylicious), storage formats (e.g. columnar), and more. Functionality such as rate limiting, topic encryption, or backing a topic via an Iceberg table should be possible to implement solely via extensions to the system.\nSynchronous commit callbacks: End-to-end Kafka pipelines ensure eventual consistency. 
When producing a record to a topic and then using that record for materializing some derived data view on some downstream data store, there’s no way for the producer to know when it will be able to \u0026#34;see\u0026#34; that downstream update. For certain use cases it would be helpful to be able to guarantee that derived data views have been updated when a produce request gets acknowledged, allowing Kafka to act as a log for a true database with strong read-your-own-writes semantics.\nSnapshotting: Currently, Kafka supports topic compaction, which will only retain the last record for a given key. This works well if records contain the full state of the entity they represent (a customer, purchase order etc.). It doesn’t work though for partial or delta events, which describe changes to an entity and which need to be applied one after another to fully restore the state of the entity. Assuming there was support for efficient key-based message replay (see above), this would take longer and longer, as the number of records for a key increases. Built-in snapshot support could allow for \u0026#34;logical compaction\u0026#34;, passing all events for a key to some event handler which condenses them into a snapshot. This would then serve as the foundation for subsequent update events, while all previous records for that key could be removed during compaction.\nMulti-tenancy: Any modern data system should be built with multi-tenancy in mind from the ground up. Spinning up a new customer-specific environment should be a very cheap operation, happening instantaneously; the workloads of individual tenants should be strictly isolated, not interfering with each other with regard to access control and security, resource utilization, metering etc.\nSome of these features are supported in other systems already—​for instance, high cardinality streams in S2, optimistic locking in Waltz, or multi-tenancy in Apache Pulsar. 
But others are not, and I am not aware of a single system, let alone open-source, which would combine all these traits.\nNow, this describes my personal (which is to say that this post should in no way be understood as speaking for my employer, Confluent, in any official capacity) wishlist for what a Kafka.next could be and the semantics it could provide, driven by the use cases and applications I’ve seen people wanting to employ Kafka for. But I am sure everyone who has worked with Kafka or comparable platforms for some time will have their own thoughts around this, and I’d love to learn about yours in the comments!\nFinally, an important question of course is how such a system would actually be architected. While I’ll have to leave the answer to that for another time, it’s safe to say that building that system on top of a log-structured merge (LSM) tree would be a likely choice.\n","id":2,"publicationdate":"Apr 24, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate April 25: This post is being discussed on \u003ca href=\"https://news.ycombinator.com/item?id=43790420\"\u003eHacker News\u003c/a\u003e, \u003ca href=\"https://lobste.rs/s/8s6cxz/what_if_we_could_rebuild_kafka_from\"\u003elobste.rs\u003c/a\u003e, and \u003ca href=\"https://old.reddit.com/r/apachekafka/comments/1k6u6jw/what_if_we_could_rebuild_kafka_from_scratch/\"\u003e/r/apachekafka\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe last few days I spent some time digging into the recently announced \u003ca href=\"https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics\"\u003eKIP-1150\u003c/a\u003e (\u0026#34;Diskless Kafka\u0026#34;), as well as \u003ca href=\"https://github.com/AutoMQ/automq\"\u003eAutoMQ’s Kafka fork\u003c/a\u003e, tightly integrating Apache Kafka and object storage, such as S3. 
Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis got me thinking, if we were to start all over and develop a durable cloud-native event log from scratch—​Kafka.next if you will—​which traits and characteristics would be desirable for this to have? Separating storage and compute and object store support would be table stakes, but what else should be there? Having used Kafka for many years for building event-driven applications as well as for running realtime ETL and change data capture pipelines, here’s my personal wishlist:\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"What If We Could Rebuild Kafka From Scratch?","uri":"https://www.morling.dev/blog/what-if-we-could-rebuild-kafka-from-scratch/"},{"content":" Over the years, I’ve spoken quite a bit about the use cases for processing Debezium data change events with Apache Flink, such as metadata enrichment, building denormalized data views, and creating data contracts for your CDC streams. One detail I haven’t covered in depth so far is how to actually ingest Debezium change events from a Kafka topic into Flink, in particular via Flink SQL. Several connectors and data formats exist for this, which can make things somewhat confusing at first. 
So let’s dive into the different options and the considerations around them!\nFlink SQL Connectors for Apache Kafka For processing events from a Kafka topic using Flink SQL (or the Flink Table API, which essentially offers a programmatic counterpart to SQL), there are two connectors provided by the Apache Flink project: The Apache Kafka SQL connector and the Upsert Kafka SQL Connector.\nBoth connectors can be used as a source connector—​reading data from a Kafka topic—​and as a sink connector, for writing data to a Kafka topic. There’s support for different data formats such as JSON and Apache Avro, the latter with a schema registry such as the Confluent schema registry, or API-compatible implementations like Apicurio. The Apache Kafka SQL Connector also supports Debezium-specific JSON and Avro formats.\nThe combination of connector and format defines the exact semantics, in particular whether the ingested Debezium events are processed as an append-only stream, or as a changelog stream, building and incrementally updating materialized views of the source tables based on the incoming INSERT, UPDATE, and DELETE events (Dynamic Tables in Flink SQL terminology).\nThe Apache Kafka SQL Connector in Append-Only Mode When using the Apache Kafka SQL Connector with the JSON format, no Debezium-specific semantics are applied: The Kafka topic with the Debezium events is interpreted as an append-only log of independent events. 
The same is the case when using the Confluent Avro format instead of JSON.\nThe schema of the table must be exactly modeled after Debezium’s data event structure, including all the fields of both message key (representing the record’s primary key) and message value (the change event):\nCREATE TABLE authors_append_only_source ( id BIGINT NOT NULL, (1) before ROW( (2) id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT ), after ROW( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT ), source ROW( version STRING, connector STRING, name STRING, ts_ms BIGINT, snapshot BOOLEAN, db STRING, sequence STRING, table STRING, txid BIGINT, lsn BIGINT, xmin BIGINT ), op STRING, ts_ms BIGINT ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;dbserver1.inventory.authors\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;scan.startup.mode\u0026#39; = \u0026#39;earliest-offset\u0026#39;, (3) \u0026#39;key.format\u0026#39; = \u0026#39;json\u0026#39;, (4) \u0026#39;key.fields\u0026#39; = \u0026#39;id\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;json\u0026#39;, (5) \u0026#39;value.fields-include\u0026#39; = \u0026#39;EXCEPT_KEY\u0026#39; ); 1 The id field maps to the key of incoming Kafka messages 2 The before, after, source, op, and ts_ms fields map to the value of incoming Kafka messages 3 Start reading from the earliest offset of the topic 4 Use JSON as the format for Kafka keys, with the id field being part of the key 5 Use JSON as the format for Kafka values, excluding the key fields (id in this case) When taking a look at the type of the events in the Flink source table—​for instance by setting the result mode to changelog when querying the table in the Flink SQL client—​you’ll see 
that all the events are insertions (first op column in the listing below), no matter what their change event type is from a Debezium perspective (second op column):\n| op | id | before | after | source | op | ts_ms | +----+------+--------------------------------+--------------------------------+--------------------------------+ ---+---------------+ | +I | 1001 | \u0026lt;NULL\u0026gt; | (1001, John, Stenton, ZbJa0... | (3.1.0.Final, postgresql, d... | r | 1744296502685 | | +I | 1008 | \u0026lt;NULL\u0026gt; | (1009, John, Thomas, ZbJ0du... | (3.1.0.Final, postgresql, d... | c | 1744360987874 | | +I | 1009 | (1009, John, Thomas, ZbJ0du... | (1009, John, Beck, ZbJ0duaf... | (3.1.0.Final, postgresql, d... | u | 1744626041413 | | +I | 1008 | (1009, John, Beck, ZbJ0duaf... | \u0026lt;NULL\u0026gt; | (3.1.0.Final, postgresql, d... | d | 1744627927160 | For writing (potentially processed) change events back into an output topic, another table can be created with exactly the same schema and configuration, only that you’d adjust the topic name accordingly and omit the scan.startup.mode option. The mapping of the key is required for both source and sink table in order to ensure that the partitioning, and thus the ordering, of the Debezium events on the output topic is the same as on the input topic.\nWhen to use it: The Apache Kafka SQL Connector in append-only mode is a great choice when you want to operate on a \u0026#34;raw\u0026#34; stream of Debezium data change events, without applying any changelog or upsert semantics. It comes in handy for applying transformations such as adjusting date formats or filtering events based on specific field values. 
In that sense, this is similar to using the Flink DataStream API on a change event stream, only that you are using SQL rather than Java for your processing logic.\nThe Apache Kafka SQL Connector As a Changelog Source Besides the append-only mode, the Apache Kafka SQL Connector also supports changelog semantics via the Debezium data format. Both JSON (by specifying debezium-json as the value format of your table) and Avro with a registry (via debezium-avro-confluent) are supported. The INSERT, UPDATE, and DELETE events ingested from the Kafka topic are used by the Flink SQL engine to incrementally re-compute the corresponding dynamic table, as well as any continuous queries you are running against it. If you query a changelog-based source table, the result set always represents the current state of that table, updated in realtime whenever a new Debezium event comes in.\nThe table schema looks quite a bit different than before. Instead of modeling the entire Debezium envelope structure, only the actual table schema (i.e. 
the contents of the before and after sections) needs to be specified:\nCREATE TABLE authors_changelog_source ( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT, PRIMARY KEY (id) NOT ENFORCED (1) ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;dbserver1.inventory.authors\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;scan.startup.mode\u0026#39; = \u0026#39;earliest-offset\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;debezium-json\u0026#39; (2) ); 1 While not strictly needed here, a primary key definition—in conjunction with setting the job-level configuration table.exec.source.cdc-events-duplicate to true—ensures that duplicates are discarded in case Debezium events are ingested a second time, for instance after a connector crash 2 Using debezium-json as the value format enables changelog semantics for this table When querying this table in the Flink SQL client, the operation type reflects the kind of the incoming Debezium event. Note how update events are broken up into an update-before event (-U, representing the retraction of the old row) and an update-after event (+U, the insertion of the new row) internally by the Flink SQL engine:\n+----+------+------------+-----------+-----------+------------------+ | op | id | first_name | last_name | biography | registered | +----+------+------------+-----------+-----------+------------------+ | +I | 1010 | John | Thomas | ZbJ0duDvW | 1741642600000000 | | -U | 1010 | John | Thomas | ZbJ0duDvW | 1741642600000000 | | +U | 1010 | John | Stenton | ZbJ0duDvW | 1741642600000000 | | -D | 1010 | John | Stenton | ZbJ0duDvW | 1741642600000000 | For a source table it is typically not required to map the Kafka message key field(s) to the table schema when using the Debezium data format. 
Instead, they are part of the change event value. For situations where that’s not the case, key fields can be mapped via the key.fields configuration option; the value.fields-include option must then also be set to EXCEPT_KEY. Optionally, additional Debezium metadata fields such as the event timestamp or the name of the source table and schema can be mapped as virtual columns:\nCREATE TABLE authors_changelog_source ( ts_ms TIMESTAMP_LTZ METADATA FROM \u0026#39;value.ingestion-timestamp\u0026#39; VIRTUAL, (1) source_table STRING METADATA FROM \u0026#39;value.source.table\u0026#39; VIRTUAL, (2) source_properties MAP\u0026lt;STRING, STRING\u0026gt; METADATA FROM \u0026#39;value.source.properties\u0026#39; VIRTUAL, (3) id BIGINT, ... ) WITH ( ... ); 1 Maps the ts_ms field of the change events (the time at which Debezium processed the event; the time at which the change occurred in the source database is exposed separately as value.source.timestamp) 2 Maps the source.table field of the change events 3 Maps all the source metadata of the change events Flink’s Debezium data format requires change events to have not only the after section, but also the before part which describes the previous state of a row which got updated or deleted. This old row image is required by Flink for retracting previous values when incrementally re-computing derived data views. Unfortunately, this means that Postgres users can leverage this format only for tables which have a replica identity of FULL. Otherwise, the old row image isn’t captured in the Postgres WAL and thus not exposed via logical replication. An exception is raised in this case:\njava.lang.IllegalStateException: The \u0026#34;before\u0026#34; field of UPDATE message is null, if you are using Debezium Postgres Connector, please check the Postgres table has been set REPLICA IDENTITY to FULL level. at org.apache.flink.formats.json.debezium.DebeziumJsonDeserializationSchema.deserialize(DebeziumJsonDeserializationSchema.java:159) ... 
While Flink’s ChangelogNormalize operator can materialize the retract events (at the cost of persisting all the required data in its own state store), this currently is not supported when using the Apache Kafka SQL Connector as a changelog source with the Debezium change event format. I don’t think there’s a fundamental issue which would prevent this from being possible; it just currently isn’t implemented.\nIn order to propagate change events to another Kafka topic, you’ll need to set up a sink connector, also using debezium-json as the value format. You can define which field(s) should go into the Kafka message key via the key.fields property. Make sure to use json (not debezium-json!) as the key format:\nCREATE TABLE authors_changelog_sink ( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;authors_processed\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;key.format\u0026#39; = \u0026#39;json\u0026#39;, \u0026#39;key.fields\u0026#39; = \u0026#39;id\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;debezium-json\u0026#39; ); While the events on the downstream Kafka topic adhere to Debezium’s event envelope schema, they are produced by Flink, not Debezium. In particular, they are lacking all the metadata you’d usually find in the source block. Also, updates are reflected by two events, rather than a single event as Debezium would emit it: a deletion event with the old row state, followed by an insert event with the new row state.\nWhen to use it: The Apache Kafka SQL connector as a changelog source (and sink) is great when you want to implement streaming queries against incoming data change events, for instance in order to create denormalized views or to enable real-time analytics of the data in an OLTP datastore. 
It is not the best choice for ETL pipelines which don’t require stateful processing due to the removal of all the Debezium metadata. Also, splitting updates into a delete and insert event causes write amplification in downstream systems, which otherwise might support in-place updates to existing rows.\nThe Upsert Kafka SQL Connector Last, let’s take a look at the Upsert Kafka SQL Connector. It consumes/produces a changelog stream applying \u0026#34;upsert\u0026#34; semantics. As a source connector, the first event for a given key is considered an INSERT, all subsequent events for that key with a non-null value are considered UPDATEs to the same. Tombstone records on the Kafka topic (i.e. records with a key and a null value) are interpreted as DELETE events for that key.\nTombstone records are used by Kafka to remove records during log compaction. You therefore need to configure a value for the topic’s delete.retention.ms setting which is long enough to make sure Flink gets to ingest all tombstones, also considering there may be downtimes of your processing job.\nAs a sink connector, any insert or update for a key yields an event with the current state as the value, and the deletion of a key yields a tombstone record.\nIn order for Debezium to emit such a \u0026#34;flat\u0026#34; event structure with just the current state of a row—​instead of the full Debezium change event envelope—​the new record state transformation (a Kafka Connect single message transform, SMT) needs to be applied when configuring the connector:\n{ \u0026#34;name\u0026#34;: \u0026#34;inventory-connector\u0026#34;, \u0026#34;config\u0026#34;: { \u0026#34;connector.class\u0026#34;: \u0026#34;io.debezium.connector.postgresql.PostgresConnector\u0026#34;, \u0026#34;tasks.max\u0026#34;: \u0026#34;1\u0026#34;, \u0026#34;database.hostname\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.port\u0026#34;: \u0026#34;5432\u0026#34;, 
\u0026#34;database.user\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.password\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.dbname\u0026#34; : \u0026#34;postgres\u0026#34;, \u0026#34;topic.prefix\u0026#34;: \u0026#34;dbserver1\u0026#34;, \u0026#34;schema.include.list\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;slot.name\u0026#34; : \u0026#34;dbserver1\u0026#34;, \u0026#34;plugin.name\u0026#34; : \u0026#34;pgoutput\u0026#34;, \u0026#34;transforms\u0026#34; : \u0026#34;unwrap\u0026#34;, (1) \u0026#34;transforms.unwrap.type\u0026#34; : \u0026#34;io.debezium.transforms.ExtractNewRecordState\u0026#34;, \u0026#34;transforms.unwrap.drop.tombstones\u0026#34; : \u0026#34;false\u0026#34; (2) } } 1 Apply the ExtractNewRecordState transform before sending the events to Kafka 2 As some Kafka Connect sink connectors can’t handle tombstone records, the transform supports dropping them. Setting this option to false keeps tombstone records, so that delete events are propagated to Flink With this SMT in place, the contents of the after section of INSERT and UPDATE events will be extracted and propagated as the sole change event value, i.e. the new row state. DELETE events will be propagated as Kafka tombstones, as expected by the upsert connector. 
Note that the ExtractNewRecordState SMT is highly configurable; for instance, you could opt into exporting specific source metadata properties as fields in the change event value, or as header properties of the emitted Kafka records.\nThe configuration of a source table for the upsert connector is pretty similar to the previous changelog source, only that the connector type is upsert-kafka:\nCREATE TABLE authors_upsert_source ( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT, PRIMARY KEY (id) NOT ENFORCED (1) ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;upsert-kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;dbserver1.inventory.authors\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;key.format\u0026#39; = \u0026#39;json\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;json\u0026#39; ); 1 A primary key definition is mandatory when using the upsert connector; it determines which field(s) are part of the Kafka message key and thus form the upsert key The same goes for defining sink tables. Now, is it also possible to ingest full Debezium change events, i.e. with the envelope, but emit upsert-style events? Indeed it is, as you can mix and match the Kafka SQL connector as a source using the debezium-json format with the Upsert Kafka SQL connector as a sink using the json format. This comes in handy for instance for writing updates to an incrementally recomputed materialized view to an OLAP store for serving purposes, without incurring the overhead of the delete + insert event pair emitted by the non-upsert connector.\nWhen to use it: Use the Upsert Kafka SQL Connector for processing \u0026#34;flat\u0026#34; data change events, without the Debezium event envelope. Similar to the Kafka SQL Connector as a changelog source, the upsert connector lets you implement streaming queries on change event feeds. 
Unlike the Kafka SQL Connector, updates are emitted as a single event, which results in less write overhead on downstream systems, in particular if partial updates (rather than full row rewrites) are supported.\nSummary When venturing into the world of processing Debezium data change events in realtime with Apache Flink and Flink SQL, the combination of available connectors and data formats for doing so can be somewhat overwhelming. The table below gives an overview of the different options, their characteristics, and use cases:\nConnector\nKafka SQL Connector\nKafka SQL Connector as changelog source\nUpsert Kafka SQL Connector\nStream type\nAppend-only\nChangelog\nChangelog\nChange event format\njson, avro-confluent\ndebezium-json, debezium-avro-confluent\njson, avro-confluent\nInput event type\nDebezium change event envelope\nDebezium change event envelope\nFlat events with current state; tombstone records\nOutput event type\nDebezium change event envelope\nSynthetic Debezium change event envelope; updates broken up into delete + insert event\nFlat events with current state; tombstone records\nMetadata\nIn change event envelope\nMapped to table schema\nMapped to table schema, must be part of row state\nStart reading position\nConfigurable\nConfigurable\nEarliest offset\nWhen to use\nProcessing of change events themselves, e.g. transformation, enrichment, routing\nRealtime queries on changelog streams of full Debezium events, e.g. to create materialized views and enable realtime analytics\nRealtime queries on changelog streams of \u0026#34;flat\u0026#34; data change events, e.g. to create materialized views and enable realtime analytics\nInterestingly, whereas the Apache Flink project itself provides two separate Kafka connectors for upsert and non-upsert use cases, managed Flink SQL offerings in the cloud tend to provide a more unified experience centered around one single higher-level connector. 
As an example, the connector for integrating Flink with Kafka topics on Confluent Cloud exposes a setting changelog.mode, which defaults to append when deriving a Flink table from an uncompacted Kafka topic and to upsert for compacted topics. Similar abstractions exist on other services too, with the general aim being to shield users from some of the intricacies here.\nOne more thing you might wonder at this point is: how does Flink CDC fit into all this? Also hosted by the Apache Software Foundation, this project integrates Debezium as a native connector into Flink, instead of channeling data change events through Apache Kafka. The Flink CDC connectors also emit changelog streams with retraction events as shown above, only the Postgres connector optionally supports upsert semantics via its changelog-mode setting.\nThere are pros and cons for both ways of integrating Debezium and Flink, for instance in regards to the replayability of events. This warrants a separate blog post just dedicated to comparing both approaches at some point, though.\nIf you’d like to experiment with the different connectors and data formats for ingesting Debezium data change events from Kafka into Flink SQL by yourself, check out this project in my stream-examples repository which contains Flink jobs for all the different configurations.\n","id":3,"publicationdate":"Apr 16, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOver the years, I’ve spoken quite a bit about the use cases for processing \u003ca href=\"https://2023.javazone.no/program/355869fa-5aa0-43a7-abd2-7c5250e10bcd\"\u003eDebezium data change events with Apache Flink\u003c/a\u003e,\nsuch as metadata enrichment, building denormalized data views, and creating data contracts for your CDC streams.\nOne detail I haven’t covered in depth so far is how to actually ingest Debezium change events from a Kafka topic into Flink,\nin particular via Flink SQL.\nSeveral connectors and data formats exist 
for this, which can make things somewhat confusing at first.\nSo let’s dive into the different options and the considerations around them!\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"A Deep Dive Into Ingesting Debezium Events From Kafka With Flink SQL","uri":"https://www.morling.dev/blog/ingesting-debezium-events-from-kafka-with-flink-sql/"},{"content":" With the help of the GraalVM configuration developed for KIP-974 (Docker Image for GraalVM based Native Kafka Broker), you can easily build a self-contained native binary for Apache Kafka. Read on to learn how you can build a native Kafka executable yourself, starting in milliseconds, making it a perfect fit for development and testing purposes.\nWhen I wrote about ahead-of-time class loading and linking in Java 24 recently, I also published the start-up time for Apache Kafka as a native binary for comparison. This was done via Docker, as there’s no pre-built native binary of Kafka available for the operating system I’m running on, macOS. But there is a native Kafka container image, so this is what I chose for the sake of convenience.\nNow, running in a container adds a little bit of overhead of course, so it wasn’t a surprise when Thomas Würthinger, lead of the GraalVM project at Oracle, brought up the question of what the value would be when running Kafka natively on macOS. Needless to say, I can’t let this kind of nice nerd snipe pass, so I set out to learn how to build a native Kafka binary on macOS, using GraalVM.\nKIP-974: Docker Image for GraalVM based Native Kafka Broker The container image for Kafka as a native binary based on GraalVM was added via KIP-974, available since Kafka 3.8.0. And while the container image, available on DockerHub, is the only official release artifact for a native Kafka binary, the tooling and infrastructure for creating that image can be used for producing a native binary for macOS as well. 
You can find it in the docker/native sub-directory of the Kafka source tree.\nFor creating a native binary, you’ll need to have GraalVM installed first of all. The simplest way for doing so is via SDKMan:\n1 sdk install java 21.0.6-graal This will also install GraalVM’s native-image tool, which is needed for creating native application binaries. The build requires all the Kafka libraries (JARs) as an input. Either download the latest Kafka distribution, or just build it yourself from source:\n1 2 3 4 git clone git@github.com:apache/kafka.git cd kafka ./gradlew releaseTarGz tar xvf core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT.tgz -C core/build/distributions This will give you a Kafka distribution directory under core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT. GraalVM binary image builds require a fair bit of configuration, for instance to specify which classes should be subject to reflection, which interfaces should be available for the creation of dynamic proxies, and more. All the required configuration files are provided under docker/native/native-image-configs. 
Using those configuration files and the JARs from the Kafka distribution, you can build a Kafka native binary like so (there’s a ready-made script native_command.sh wrapping this invocation):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 native-image --no-fallback \\ --enable-http \\ --enable-https \\ --report-unsupported-elements-at-runtime \\ --install-exit-handlers \\ --enable-monitoring=jmxserver,jmxclient,heapdump,jvmstat \\ -H:+ReportExceptionStackTraces \\ -H:+EnableAllSecurityServices \\ -H:EnableURLProtocols=http,https \\ -H:AdditionalSecurityProviders=sun.security.jgss.SunProvider \\ -H:ReflectionConfigurationFiles=docker/native/native-image-configs/reflect-config.json \\ -H:JNIConfigurationFiles=docker/native/native-image-configs/jni-config.json \\ -H:ResourceConfigurationFiles=docker/native/native-image-configs/resource-config.json \\ -H:SerializationConfigurationFiles=docker/native/native-image-configs/serialization-config.json \\ -H:PredefinedClassesConfigurationFiles=docker/native/native-image-configs/predefined-classes-config.json \\ -H:DynamicProxyConfigurationFiles=docker/native/native-image-configs/proxy-config.json \\ --verbose \\ -march=compatibility \\ -cp \u0026#34;core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT/libs/*\u0026#34; kafka.docker.KafkaDockerWrapper \\ -o \u0026#34;native-kafka\u0026#34;; say \u0026#34;Enjoy native Kafka\u0026#34; This takes about 1m 36s on my machine (2023 MacBook Pro M3 Max with 48 GB of shared RAM), after which there is a fully self-contained macOS/AArch64 binary native-kafka. To see how this one is used, refer to the launch script.\nThe binary supports two modes, setup and start. The former formats a Kafka log directory. As the primary use case is in containers, the set-up mode supports the overlay of a set of default configuration files with user-provided configuration provided via a volume mount, which is merged and then written out to another directory. 
For a quick test run we can overlay the default configuration from the Kafka distribution with the one from the container image for setting up a single node Kafka cluster and write the result to a new directory:\n1 2 3 4 5 6 7 8 9 mkdir native-conf export CLUSTER_ID=\u0026#34;5L6g3nShT-eMCtK--X86sw\u0026#34; # Obtain a unique id via \u0026#34;$(bin/kafka-storage.sh random-uuid)\u0026#34; ./native-kafka setup \\ --default-configs-dir core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT/config \\ --mounted-configs-dir docker \\ --final-configs-dir native-conf Formatting metadata directory /tmp/kraft-combined-logs with metadata.version 4.0-IV3 With the log directory being formatted, the actual Kafka broker can be run using the start mode like so:\n1 2 3 ./native-kafka start \\ --config docker/server.properties \\ -Dlog4j2.configurationFile=native-conf/log4j2.yaml Now, interestingly, this actually takes a fair bit longer to start than when run via Docker as in the previous post: about 220 ms from the first log message emitted by Kafka to the \u0026#34;Kafka Server started\u0026#34; message, vs the 120 ms I had observed via Docker. Which is kinda puzzling, considering that Linux containers are running in a virtual machine on macOS. 
It would be very interesting to learn why that’s the case, perhaps some more efficient library implementation in Linux when running in a container?\nThat being said, starting up the container itself takes about 340 ms on my machine (time from starting Docker up to the first Kafka log message), so running the native executable directly on macOS is still the fastest way to launch a Kafka broker.\n","id":4,"publicationdate":"Apr 7, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eWith the help of the GraalVM configuration developed for KIP-974 (Docker Image for GraalVM based Native Kafka Broker),\nyou can easily build a self-contained native binary for Apache Kafka.\nRead on to learn how you can build a native Kafka executable yourself,\nstarting in milliseconds, making it a perfect fit for development and testing purposes\u003c/em\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhen I wrote about \u003ca href=\"/blog/jep-483-aot-class-loading-linking/\"\u003eahead-of-time class loading and linking in Java 24\u003c/a\u003e recently,\nI also published the start-up time for Apache Kafka as a native binary for comparison.\nThis was done via Docker, as there’s no pre-built native binary of Kafka available for the operating system I’m running on, macOS.\nBut there is a native Kafka container image, so this is what I chose for the sake of convenience.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eNow, running in a container adds a little bit of overhead of course,\nso it wasn’t a surprise when Thomas Würthinger, lead of the GraalVM project at Oracle,\n\u003ca href=\"https://bsky.app/profile/thomaswue.dev/post/3lloypreatk2s\"\u003ebrought up the question\u003c/a\u003e of what the value would be when running Kafka natively on macOS.\nNeedless to say, I can’t let this kind of nice nerd snipe pass,\nso I set out to learn how to build a native Kafka binary 
on macOS, using GraalVM.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Building a Native Binary for Apache Kafka on macOS","uri":"https://www.morling.dev/blog/building-native-binary-for-apache-kafka-macos/"},{"content":" In the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am exploring interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, relevant improvements to Java and the JVM, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. If you think there’s a specific subject I should take a look at, let me know in the comments below.\nUpdate March 28: This post is being discussed on Hacker News 🍊\nJava 24 got released last week, and what a meaty release it is: more than twenty JDK Enhancement Proposals (JEPs) have been shipped, including highlights such as compact object headers (JEP 450, I hope to spend some time diving into that one some time soon), a new class-file API (JEP 484), and more flexible constructor bodies (JEP 492, third preview). One other JEP which might fly a bit under the radar is JEP 483 (\u0026#34;Ahead-of-Time Class Loading \u0026amp; Linking\u0026#34;). It promises to reduce the start-up time of Java applications without requiring any modifications to the application itself; what’s not to like about that? Let’s take a closer look!\nJEP 483 is part of a broader OpenJDK initiative called Project Leyden, whose objective is to reduce the overall footprint of Java programs, including startup time and time to peak performance. Eventually, its goal is to enable ahead-of-time compilation of Java applications, as such providing an alternative to GraalVM and its support for AOT native image compilation, which has seen tremendous success and uptake recently. AOT class loading and linking is the first step towards this goal within Project Leyden. 
It builds upon the Application Class Data Sharing (AppCDS) feature available in earlier Java versions. While AppCDS only reads and parses the class files referenced by an application and dumps them into an archive file, JEP 483 also loads and links the classes and caches that data. I.e. even more work is moved from application runtime to build time, thus resulting in further reduced start-up times.\nAs is the case with AppCDS, a training run is required for creating the AOT cache file. During that run, you should make sure that the right set of classes gets loaded: when not loading all the classes required by an application, the AOT cache is not utilized to the fullest extent and the JVM will fall back to loading them on demand at runtime. On the other hand, when loading classes actually not used by an application at runtime (for instance classes of a testing framework), the size of the cache file gets bloated without any benefit. The classpath must be consistent between training run and actual application run: the same JAR files must be present, in the same order. The runtime classpath may be amended with additional JARs though, which naturally will not feed into the AOT cache.\nLet’s put AOT class loading and linking into action using Apache Kafka as an example. While the start-up overhead of a long-running component like a Kafka broker typically may not be that relevant, it absolutely can make a difference when for instance frequently starting and stopping brokers during development and testing.\nBuilding an AOT Cache for Apache Kafka Coincidentally, Apache Kafka 4.0 was released last week, too. So let’s download it and use it for our experiments. Unpack the distribution and format a directory for the Kafka files:\ntar xvf kafka_2.13-4.0.0.tgz KAFKA_CLUSTER_ID=\u0026#34;$(bin/kafka-storage.sh random-uuid)\u0026#34; bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties Building an AOT cache is a two-step process. 
First, a list of all the classes which should go into the archive needs to be generated. This list is then used for creating the archive itself. This feels a bit more convoluted than it should be, and indeed the JEP mentions that simplifying this is on the roadmap.\nCreate the class list like so:\n1 2 export EXTRA_ARGS=\u0026#34;-XX:AOTMode=record -XX:AOTConfiguration=kafka.aotconf\u0026#34; (1) bin/kafka-server-start.sh config/server.properties 1 The EXTRA_ARGS variable can be used to pass any additional arguments to the JVM when launching Kafka, in this case to specify that the list of classes for the AOT cache should be recorded in the file kafka.aotconf As an aside, Kafka has completely parted ways with ZooKeeper as of the 4.0 release and exclusively supports KRaft for cluster coordination. By using the server.properties file, our single broker runs in the so-called \u0026#34;combined\u0026#34; mode, so it has both the \u0026#34;broker\u0026#34; and \u0026#34;controller\u0026#34; roles. Very nice to see how simple things have become here over the years!\nOnce Kafka has started, open a separate shell window. Create a topic in Kafka, then produce and consume a couple of messages like so:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 bin/kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 Created topic my-topic. bin/kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092 \u0026gt;hello \u0026gt;world \u0026lt;Ctrl + C\u0026gt; bin/kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092 hello world \u0026lt;Ctrl + C\u0026gt; Processed a total of 2 messages This shows the trade-off involved when creating AOT cache files: we don’t have to produce and consume messages here, but in all likelihood this will trigger the loading of classes which otherwise would be loaded and linked at runtime only. 
It may be a good idea to monitor which classes get loaded via JDK Flight Recorder, thus making sure you are indeed capturing the relevant set when creating the AOT cache file.\nStop the broker by hitting \u0026lt;Ctrl + C\u0026gt; in the session where you started it. If you take a look at the kafka.aotconf file, you’ll see that it essentially is a long list of classes to be cached, as well as other class-related metadata. The comment at the top still hints at the history of Leyden’s AOT support being built on top of CDS:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 # NOTE: Do not modify this file. # # This file is generated via the -XX:DumpLoadedClassList=\u0026lt;class_list_file\u0026gt; option # and is used at CDS archive dump time (see -Xshare:dump). # java/lang/Object id: 0 java/io/Serializable id: 1 java/lang/Comparable id: 2 java/lang/CharSequence id: 3 java/lang/constant/Constable id: 4 java/lang/constant/ConstantDesc id: 5 java/lang/String id: 6 java/lang/reflect/AnnotatedElement id: 7 java/lang/reflect/GenericDeclaration id: 8 java/lang/reflect/Type id: 9 java/lang/invoke/TypeDescriptor id: 10 ... Next, let’s try and create the actual AOT cache file. To do so, specify the -XX:AOTMode=create option. 
Note that the application is not actually executed during this process, instead the JVM will only create the AOT cache file and exit again:\n1 2 export EXTRA_ARGS=\u0026#34;-XX:AOTMode=create -XX:AOTConfiguration=kafka.aotconf -XX:AOTCache=kafka.aot\u0026#34; (1) bin/kafka-server-start.sh config/server.properties 1 Create the AOT cache using the previously created configuration file Uh, oh, something isn’t quite working as expected:\n1 2 3 4 5 java.lang.IllegalArgumentException: javax.management.NotCompliantMBeanException: com.sun.management.UnixOperatingSystemMXBean: During -Xshare:dump, module system cannot be modified after it\u0026#39;s initialized at java.management/javax.management.StandardMBean.\u0026lt;init\u0026gt;(StandardMBean.java:270) at java.management/java.lang.management.ManagementFactory.addMXBean(ManagementFactory.java:882) at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$1(ManagementFactory.java:474) ... This message was a bit confusing to me—​I don’t think I’m interacting with the Java module system in any way? So I sent a message to the leyden-dev mailing list, where I learned that this may be triggered by starting the JMX agent of the JVM. While I was not actively doing that, indeed this is the case by default as per the run-class.sh launcher script coming with the Kafka distribution. 
So let’s disable JMX diagnostics and try again:\n1 2 export KAFKA_JMX_OPTS=\u0026#34; \u0026#34; bin/kafka-server-start.sh config/server.properties Some of the classes are skipped for different reasons, but overall, things look much better this time:\n1 2 3 4 5 6 7 8 [0.908s][warning][cds] Preload Warning: Verification failed for org.apache.logging.log4j.core.async.AsyncLoggerContext [2.307s][warning][cds] Skipping org/slf4j/Logger: Old class has been linked [2.307s][warning][cds,resolve] Cannot aot-resolve Lambda proxy because org.slf4j.Logger is excluded [2.613s][warning][cds ] Skipping jdk/internal/event/Event: JFR event class [2.615s][warning][cds ] Skipping org/apache/logging/slf4j/Log4jLogger: Unlinked class not supported by AOTClassLinking [2.615s][warning][cds ] Skipping org/apache/logging/slf4j/Log4jLoggerFactory: Unlinked class not supported by AOTClassLinking ... AOTCache creation is complete: kafka.aot A tad concerning that Log4j’s AsyncLoggerContext class fails verification, but we’ll leave analysis of that for another time. The AOT cache file has a size of 66 MB in this case. It is considered an implementation detail and as such is subject to change between Java versions. Now let’s see what’s the impact of using the AOT cache on Kafka’s start-up time. To do so, simply specify the name of the cache file when running the application:\n1 2 export EXTRA_ARGS=\u0026#34;-XX:AOTCache=kafka.aot\u0026#34; bin/kafka-server-start.sh config/server.properties I’ve measured the start-up time by comparing the timestamp of the very first log message emitted by Kafka to the timestamp of the message saying \u0026#34;Kafka Server started\u0026#34;, always starting from a freshly formatted Kafka logs directory and flushing the page cache in between runs. Averaged over five runs, this took 285 ms on my machine (a 2023 MacBook Pro with M3 Max processor and 48 GB shared memory). In comparison, Kafka took 690 ms to start without the archive, i.e. 
the AOT cache makes for a whopping 59% reduction of start-up time in this scenario.\nWhen building the AOT cache, you can also disable AOT class loading and linking by specifying the -XX:-AOTClassLinking option, effectively resulting in the same behavior you’d get when using AppCDS on earlier Java versions. This would result in a Kafka start-up time of 327 ms on my laptop, i.e. the lion’s share of the improvement in the case at hand indeed originates from reading and parsing the class files ahead of time, with AOT loading and linking them only yielding a relatively small improvement in addition. Finally, I’ve also measured how long it takes to start the Kafka native binary in a Docker container (see KIP-974), which took 118 ms, i.e. less than half of the time it took with the AOT cache. Keep in mind though that this image is considered experimental and not ready for production, whereas there shouldn’t be any concern of that kind when running Kafka with the AOT cache on the JVM.\nAOT Caching With Apache Flink As mentioned before, apart from testing scenarios, Kafka typically is a long-running workload, and as such, start-up times don’t matter that much in the grand scheme of things. To add another data point, I’ve also tested how beneficial AOT class loading and linking is for a simple Apache Flink job.\nNow, Flink jobs usually are deployed by uploading them as a JAR to a Flink cluster, after which their code is loaded with a custom classloader. As of today, JEP 483 doesn’t support AOT class loading and linking with user-defined class loaders, though (the JEP suggests that this limitation may be lifted in a future Java version). This means that only Flink’s built-in classes would benefit from AOT, while any classes of a Flink job and its dependencies would be excluded. 
For my experimentation I’ve therefore decided to go with Flink’s mini-cluster deployment, a simplified mode of using Flink in a non-distributed manner, just by running the job’s main class.\nThe test job uses the Flink connector for Apache Kafka to read a message from a Kafka topic. I measured the time-to-first-message after starting the job: without the AOT cache (again averaged over five runs), this took 1.875 seconds on my machine, vs. 0.913 seconds with the AOT cache. A 51% reduction of time-to-first-message in this scenario, very nice! Using the AOT cache without loading and linking classes yielded a 40% improvement over the default behavior (1.118 seconds). I couldn’t test Flink as a GraalVM native binary; if you are aware of any work towards making that a reality, I’d love to hear from you!\nSummary AOT class loading and linking is a very welcome addition to Java. Built upon the previously existing concepts of CDS and AppCDS, it helps to further cut down the start-up time of JVM-based applications, by moving the process of loading and linking classes ahead to build time. The actual impact will vary between specific applications; for Kafka and a basic Flink job I could observe a reduction of 59% and 51% of start-up time, respectively.\nWhile start-up times don’t matter that much for long-running workloads, they can make a huge difference in cloud-native scenarios where applications are dynamically scaled out, spinning up new instances on demand as the load of incoming requests increases. 
Also think of scale-to-zero deployments, preview jobs for real-time queries in a cloud-based stream processing solution, CLI utilities, starting up resources such as Kafka for integration tests, and many more: whenever a human is waiting for a process to come up and provide a response, every bit of time you can save will result in a better user experience immediately.\nThe great thing about the AOT machinery provided by Project Leyden and JEP 483 is that it requires no modifications whatsoever to your application code. It can be used with any Java application, providing potentially significant reductions to start-up times essentially for free. The required training run feels a bit cumbersome in its current form, but the JEP suggests that improvements in that area will be made in future revisions. In fact, there’s a draft JEP already which provides some more details on what this might look like. In general, the requirement of a training run can be challenging from a software development lifecycle perspective, in particular when considering (immutable) container images, for instance when deploying to Kubernetes. The application will have to be executed at image build time, also performing some work to trigger loading and linking all relevant classes, potentially requiring remote resources such as a database, too. This may not always be trivial to do.\nThe big elephant in the room is how Project Leyden compares to GraalVM, the other Java AOT technology developed by Oracle. As far as I can tell, there’s quite a bit of overlap between the goals of the two projects. At this point, GraalVM is much more advanced than Leyden, with full support for AOT compilation, not only providing even more impressive improvements to start-up times (a Java application can start in a few milliseconds when compiled into a native binary using GraalVM) but also yielding a significant reduction of memory usage. 
On the downside, applications and their dependencies typically need adjustment and more or less complex configuration in order to make use of GraalVM’s AOT compilation (frameworks like Quarkus can help with this task). Furthermore, the closed-world assumption underlying GraalVM prevents the dynamism the JVM is known for, such as loading classes at application runtime for plug-in use cases, modifying or even generating classes on the fly, etc.\nIn that regard it will be interesting to see what Project Leyden will come up with in this space. It also seeks to support AOT compilation eventually, but is exploring a middle ground between a highly constrained closed-world assumption and full dynamism, for instance by providing means to developers for specifying which modules of their application may be subject to class redefinitions and which ones are not. Besides faster start-up times, another goal here is faster warm-up, i.e. a faster time to peak performance.\nHaving been kicked off in 2020, Leyden went quiet for quite some time, but it has picked up steam again more recently, with JEP 483 being one of the first actual deliverables. It’ll definitely be worth keeping your eyes open for the other Leyden JEPs, AOT code compilation and AOT method profiling. Currently in draft state, there’s no target Java version known for those, but early access builds can already be obtained from the OpenJDK website.\n","id":5,"publicationdate":"Mar 27, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eIn the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am exploring interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, relevant improvements to Java and the JVM, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. 
If you think there’s a specific subject I should take a look at, let me know in the comments below.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate March 28: This post is \u003ca href=\"https://news.ycombinator.com/item?id=43503960\"\u003ebeing discussed on Hacker News\u003c/a\u003e\u003c/em\u003e 🍊\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://openjdk.org/projects/jdk/24/\"\u003eJava 24\u003c/a\u003e got released last week,\nand what a meaty release it is:\nmore than twenty JDK Enhancement Proposals (JEPs) have been shipped,\nincluding highlights such as compact object headers (\u003ca href=\"https://openjdk.org/jeps/450\"\u003eJEP 450\u003c/a\u003e, I hope to spend some time diving into that one some time soon),\na new class-file API (\u003ca href=\"https://openjdk.org/jeps/484\"\u003eJEP 484\u003c/a\u003e),\nand more flexible constructor bodies (\u003ca href=\"https://openjdk.org/jeps/492\"\u003eJEP 492\u003c/a\u003e, third preview).\nOne other JEP which might fly a bit under the radar is \u003ca href=\"https://openjdk.org/jeps/483\"\u003eJEP 483\u003c/a\u003e (\u0026#34;Ahead-of-Time Class Loading \u0026amp; Linking\u0026#34;).\nIt promises to reduce the start-up time of Java applications without requiring any modifications to the application itself;\nwhat’s not to like about that?\nLet’s take a closer look!\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Let's Take a Look at... 
JEP 483: Ahead-of-Time Class Loading \u0026 Linking!","uri":"https://www.morling.dev/blog/jep-483-aot-class-loading-linking/"},{"content":" Update March 27: This post is being discussed on Hacker News\nFor building a system of distributed services, one concept I think is very valuable to keep in mind is what I call the synchrony budget: as much as possible, a service should minimize the number of synchronous requests which it makes to other services.\nThe reasoning behind this is two-fold: synchronous calls are costly. The more synchronous requests you are doing, the longer it will take to process inbound requests to your own service; users don’t like to wait and might decide to take their business elsewhere if things take too long. Secondly, synchronous requests impact the availability of your service, because all the invoked services must be up and running in order for your service to work. The more services you rely on in a synchronous manner, the lower the availability of your service will be.\nSynchronous calls are tools that can help assure consistency, but by design they block progression until complete. In that sense, the idea of the synchrony budget is not about a literal budget which you can spend, but rather about being mindful how you implement communication flows between services: as asynchronous as possible, as synchronous as necessary.\nLet’s make things a bit more tangible by looking at an example. Consider an e-commerce website where users can place purchase orders. When an order comes in, the order entry service needs to interact with a couple of other services in order to process that order:\na payment service for processing the payment of the customer\nan inventory service for allocating stock of the purchased item\na shipment service for triggering the fulfillment of the order\nLet’s start with the last one, the shipment service. Does it matter to the customer who is placing an order when exactly the shipment service receives that notification? 
Not at all. Hence, notifying the shipment service synchronously from within the order entry request handler would be a waste of our synchrony budget. Not only would it cause that inbound request to take longer than it has to, it would also cause the order entry request to fail when the shipment service isn’t available, for instance due to maintenance, a network split, or some other kind of failure. Also, we don’t need to report any response from the shipment service back to the client making the inbound order placement request. This makes this call a perfect candidate for asynchronous execution, for instance by having the order service send a message to a Kafka topic, which then gets consumed by the shipment service. That way, the order service request isn’t slowed down by awaiting a response from the shipment service, also a downtime of the shipment service won’t affect the order service’s availability. It will just process any pending messages from the Kafka topic when it is up again. In general, whenever one service solely needs to notify another service about something that happened, defaulting to asynchronous communication makes a lot of sense.\nIn a similar spirit, if any changed data should be propagated from an OLTP data store to an OLAP system, this should be done asynchronously. By definition, analytical queries issued against the latter don’t require instantaneous visibility into each single data change as it is occurring in the OLTP system. So sending synchronous requests to an OLAP store would be another good example for unnecessarily spending your synchrony budget.\nNow, what if our messaging infrastructure, such as Kafka, can’t be reached? Aren’t we back to square one? We might envision some means of buffering for that case, such as storing the messages to be sent in some local state store and sending them out once connectivity to Kafka has been restored. 
Luckily, we don’t have to reinvent the wheel here: the outbox pattern is a well-established approach for channeling outgoing messages through a service’s data store, transactionally consistent with any other data changes that need to be done at the same time. Tools for log-based change data capture (CDC), such as Debezium, can be used for extracting the messages from an outbox table with low overhead and high performance. That way, the only stateful resource which is required by a service to process incoming requests is its own database.\nLet’s look at the communication with the inventory service next. When the order service processes an incoming request, it will require the information whether the specified item is available in the desired quantity. This differs from the notification semantics used for communicating with the shipment service, as we do need data from the inventory service in order to process the inbound request. So should we make a synchronous call in this case? It certainly could be an option, but again it would eat into our synchrony budget: there’d be an impact on our response times, and what should we do in case the inventory service isn’t available? Should the incoming request be failed? But not accepting customer requests because of some internal technical hiccup doesn’t sound that desirable.\nReversing the communication flow can be a way out: the inventory service could publish a feed of inventory changes, pushing a message to Kafka whenever there’s an inventory update. The order service could subscribe to this feed and materialize a view of this data in its own local data store. That way, no synchronous calls between services are required when processing an order request; this can be done solely by querying the order service’s database. 
The change feed of the inventory service could again be implemented via the outbox pattern; another option would be to use CDC for capturing changes in the actual business tables in the inventory database and then leverage stream processing, for instance with Apache Flink, to establish a stable data contract for that data stream. That way, consumers like the order service are shielded from any potentially disruptive changes to the inventory service’s data model and the stream processor can handle denormalizing relational tables to provide consumers with fully contextualized events.\nOf course, there is a trade-off here: as updates to the order service’s view of the inventory data happen asynchronously, we might run into a situation where that view is outdated and a request for an item gets accepted, while it actually is not in stock any more. In practice, Debezium and Kafka can propagate data changes with sub-second latency end-to-end, so the time window for errors will be very small during normal operation. But it also helps to take a step back and look at things from a business perspective: reality isn’t transactional to begin with. I remember a birthday party a few years back where one of my friends was on call and had to patch the inventory table of an e-commerce application after a rack of flowers had been knocked over in the warehouse. In other words, a business needs to have means of dealing with situations like this in any case. In all likelihood, we’ll be better off sending a customer a $10 voucher as an apology in the rare case of accepting an order for an item without inventory, instead of spending our synchrony budget and establishing a synchronous call flow for this process.\nNow, let’s look at the communication with the payment service. Depending on the specifics, this one actually may be a case where a synchronous call is justified.
When for instance building a flight booking system, you really want to be 100% sure that the credit card of the customer can be charged successfully, before acknowledging a booking request. Replicating the data of all credit cards and bank accounts in the world obviously isn’t possible, so the call flow can’t be reversed either. It’s for a reason that payment processor APIs are built with extremely high availability in mind. And this is what the notion of the synchrony budget is about: implement inter-service calls asynchronously whenever it’s possible, so you have the room to make synchronous calls if and when it’s absolutely required. That being said, for an e-commerce application it may be actually feasible to make synchronous calls to the payment service by default, but fall back to asynchronous processing in case of failures. As the contract to sell typically only gets accepted when an item gets shipped, you still have the room to cancel an order if a payment falls through on the asynchronous processing path.\nFinally, here’s how our overall solution of the data flows relevant to the order service could look like, applying the mental model of a synchrony budget:\n","id":6,"publicationdate":"Mar 18, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate March 27: This post is being  \u003ca href=\"https://news.ycombinator.com/item?id=43452793\"\u003ediscussed on Hacker News\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor building a system of distributed services, one concept I think is very valuable to keep in mind is what I call the \u003cem\u003esynchrony budget\u003c/em\u003e:\nas much as possible, a service should minimize the number of synchronous requests which it makes to other services.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"The Synchrony Budget","uri":"https://www.morling.dev/blog/the-synchrony-budget/"},{"content":" In 
the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am going to explore interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. If you think there’s a specific subject I should take a look at, let me know in the comments below!\nThat guy above? Yep, that’s me, whenever someone says \u0026#34;Kafka queue\u0026#34;. Because, that’s not what Apache Kafka is. At its core, Kafka is a distributed durable event log. Producers write events to a topic, organized in partitions which are distributed amongst the brokers of a Kafka cluster. Consumers, organized in groups, divide the partitions they process amongst themselves, so that each partition of a topic is read by exactly one consumer in the group.\nThis partition-based design defines two of Kafka’s key characteristics:\nThe maximum degree of consumer parallelism: Each partition is processed by not more than one consumer; in order to increase the number of consumers processing a topic, it needs to be split up into more partitions, which implies a potentially costly repartitioning operation for existing topics with a large amount of data.\nOrdered processing of messages: All messages with the same partitioning key will be sent to the same partition which is processed by a single consumer.\nThese semantics make Kafka a great foundation for a large variety of high volume data streaming use cases such as click stream processing, metrics and log ingestion, real-time ETL and analytics, microservices data exchange, fraud detection, and many more. On the flip side, Kafka, as is, is not a good fit for use cases requiring queuing semantics, where you’d like to process messages one by one, potentially scaling out consumers way beyond the number of partitions in a topic. 
In particular, consumers as of today commit the progress they’ve made within a partition by means of persisting the offset of the last message they’ve processed. It is not possible to acknowledge or reject individual messages. This leads to a problem known as \u0026#34;head-of-line blocking\u0026#34;: if a given message can’t be consumed for whatever reason, or if it just takes very long to do so, that consumer can’t easily move beyond that message.\nIn Kafka terminology, the elements of a topic are referred to as \u0026#34;records\u0026#34;, with \u0026#34;message\u0026#34; oftentimes being used interchangeably. Personally, I am using the former when referring to the technical concept of an entry of a log, whereas I’m using \u0026#34;message\u0026#34; (or \u0026#34;event\u0026#34;, depending on the specific use case) when discussing the semantic entity which is represented by a record.\nOne common example of a queuing use case is job queueing: you’d like to submit unrelated work items to a queue, from where they are picked up and processed as quickly as possible by a set of independent workers. Each item should be processed in isolation, i.e. while one worker is consuming an item from the queue, another worker should be able to pick up the next one in parallel, without having to await successful handling of the first one. If there are many work items, or if they take a longer time to process, it should be possible to add more workers to ensure a reasonable overall throughput of the system. A work item which cannot be processed for some reason should not hold up the processing of subsequent items.\nWhile some efforts were made to support this kind of use case when using Kafka, for instance in the form of Confluent’s parallel consumer, actual queue implementations such as ActiveMQ Artemis or RabbitMQ were traditionally better suited for this.
To learn more about the fundamental differences between event logs and queues, and why it can be interesting to implement the latter on top of the former, refer to this excellent blog post by Jack Vanlightly.\nAs of Kafka 4.0—due in a couple of weeks—things will change though: after two years of work, an Early Access of KIP-932: Queues for Kafka is part of this release. It promises to add queue-like semantics to Kafka. Let’s take a look!\nTowards Queue Support in Kafka—Introducing Share Groups\nAt the core of KIP-932 are so-called share groups: expanding the existing notion of Kafka consumer groups, a share group is a set of cooperative consumers processing the messages from a topic. Unlike consumer groups though, multiple members of a share group can process the messages on one and the same partition. This means that there can be more (active) members in a share group than there are partitions, and a high degree of consumer parallelism can be achieved even with just a few partitions or only a single one. Membership in a share group is coordinated using the new consumer rebalance protocol introduced in Kafka 4.0 via KIP-848. A partition consumed by a share group is called a share partition.\nMessages can be acknowledged individually, allowing for much more flexibility than the offset-based approach of consumer groups. A broker-side component called the share-partition leader manages the state of in-flight messages, distributing them to the members of the share group. The share-partition leader is co-located with the leader of the partition, i.e.
it’s currently not supported to use share groups and thus Kafka queues when reading from a follower node in the Kafka cluster.\nThe messages in a share-partition go through a life cycle of distinct states as shown below:\nThe share-partition leader processes messages which are eligible for consumption on a share-partition via a sliding window, demarcated by a lower offset called the share-partition start offset (SPSO) and a higher offset called the share-partition end offset (SPEO). All messages before the SPSO are in the Archived state, all messages after the SPEO are in the Available state. The messages within the window are called in-flight messages. When a consumer fetches messages, the leader will search for available messages in the in-flight window, mark them as acquired, and return them in a batch to the consumer. To limit memory consumption on the broker, the maximum number of messages in the Acquired state can be controlled via the group.share.partition.max.record.locks configuration setting. When processing a message, a consumer may\nacknowledge it as successfully consumed, transitioning it to the Acknowledged state,\nrelease it, transitioning it back to the Available state and thus making it available for redelivery, or\nreject it, transitioning it to the Archived state, marking it as unprocessable.\nEvery message has a delivery counter, which gets increased each time it gets acquired. The maximum number of deliveries is limited using the group.share.delivery.attempt.limit broker option, preventing an infinite retry loop of consuming some unprocessable message (\u0026#34;poison pill\u0026#34;).\nOne key aspect to understand is that the specific message states exist exclusively within the scope of a specific share group; this means that for instance a message may be rejected by one share group but be processed successfully by another. A share group may also be reset, allowing it to reprocess all the messages of a topic, or all the messages after a given timestamp.
The Kafka distribution provides a new script, bin/kafka-share-groups.sh, for this purpose.\nAs the available messages on a share-partition are distributed amongst the members of the share group, there’s no guarantee with regard to the order of processing. Depending on specific timing behaviors, potential retries, etc., messages with higher offsets may be consumed before messages with lower offsets in the same partition. This is in stark contrast to how traditional Kafka consumer groups work, where the messages in one partition are always consumed in order of increasing offset. The KIP mentions that ordering of messages within a single batch is guaranteed to be in increasing offset order, but I’m not sure how useful this is going to be in practice, given consumers lack control over which messages end up in a given batch.\nOn the other hand it could be very useful for certain use cases to have guaranteed ordering for the messages with one and the same key. Consider for instance an ETL use case consuming data change events produced by a CDC tool such as Debezium. The source record’s primary key is used as the Kafka message key in this scenario, ensuring all change events for a given record are written to the same partition of the corresponding Kafka topic. With regular consumer groups, ordering of events for the same key is ensured, which is vital to make sure that the destination of such a pipeline receives the change events in the correct order, for instance when considering two subsequent updates to a record.\nBut arguably, the partition-based ordering is too coarse-grained in this scenario, as the order of events across keys typically doesn’t matter (and where it does matter, it would have to be global for the entire topic, not just a single partition). This comes at the price of reduced flexibility to parallelize and scale out the consumer, as described above.
In contrast, share groups essentially don’t provide strong ordering guarantees, making them unsuitable for this use case. If there was support for strong key-based ordering, that’d be a very useful middle ground between scalability and the provided semantics. It would be great to see this in a future version of queue support for Apache Kafka.\nShare Groups in Action\nLet’s shift gears a bit and take a look at how share groups can be used from within a Java application. At the time of writing, there’s no preview build of Apache Kafka 4.0 available yet, so I’ve built Kafka and its client libraries from source, which luckily is as straightforward as running the following:\n./gradlew releaseTarGz publishToMavenLocal\nThis will yield a Kafka distribution archive under core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT and install the client libraries into the local Maven repository.\nAs of the Kafka 4.0 release, share groups are an early access feature, not meant for production usage yet. As such, the feature needs to be enabled explicitly. To do so, add the following settings to your broker configuration file (for more details, see the release notes as well as the KIP, which provides a list of all new configuration options added for share group support):\nunstable.api.versions.enable=true\ngroup.coordinator.rebalance.protocols=classic,consumer,share\nThe Kafka client library contains a new API, KafkaShareConsumer, which exposes the new queue and share group semantics. Its overall programming model is very similar to the existing KafkaConsumer API, simplifying the transition from one to the other. For console-based access, the Kafka distribution contains a new shell script, kafka-console-share-consumer.sh, similar to kafka-console-consumer.sh known from previous Kafka versions.\nThe share consumer supports two working modes: implicit and explicit acknowledgement of messages.
When using implicit mode, message acknowledgements will be committed automatically for the entire batch of messages processed by the consumer. In the simplest case, this happens for the previous batch when calling poll() again:\nProperties props = new Properties(); props.setProperty(\u0026#34;bootstrap.servers\u0026#34;, \u0026#34;localhost:9092\u0026#34;); props.setProperty(\u0026#34;group.id\u0026#34;, \u0026#34;my-share-group\u0026#34;); KafkaShareConsumer\u0026lt;String, String\u0026gt; consumer = new KafkaShareConsumer\u0026lt;\u0026gt;( props, new StringDeserializer(), new StringDeserializer()); consumer.subscribe(Arrays.asList(\u0026#34;my-topic\u0026#34;)); while (true) { ConsumerRecords\u0026lt;String, String\u0026gt; records = consumer.poll( Duration.ofMillis(100)); (1) for (ConsumerRecord\u0026lt;String, String\u0026gt; record : records) { process(record); } }\n(1) Fetch the next batch of messages, implicitly acknowledging the messages of the previous batch\nThis approach lacks fine-grained control over acknowledgements, but it can be interesting if your primary interest in using share groups is to increase the number of workers in a consumer group beyond the partition count. For a typical queueing use case however, you’ll want message-level acknowledgements. This can be achieved via the ShareConsumer::acknowledge() method.
It takes a record and an acknowledge type:\nwhile (true) { ConsumerRecords\u0026lt;String, String\u0026gt; records = consumer.poll( Duration.ofMillis(100)); for (ConsumerRecord\u0026lt;String, String\u0026gt; record : records) { if (isProcessable(record)) { process(record); consumer.acknowledge(record, AcknowledgeType.ACCEPT); (1) } else if (isRetriable(record)) { consumer.acknowledge(record, AcknowledgeType.RELEASE); (1) } else { consumer.acknowledge(record, AcknowledgeType.REJECT); (1) } } consumer.commitSync(); (2) }\n(1) Acknowledge a message\n(2) Synchronously commit the acknowledgement state of all messages of the batch\nThe acknowledge type can be one of the following:\nACCEPT, if the message could be processed successfully\nRELEASE, if the message cannot be processed due to some transient error, i.e. it may be processed successfully when retrying later on\nREJECT, if the message cannot be processed and also is not retriable\nThe acknowledgement status for a given message will only be actually committed by calling commitSync(). If the consumer crashes after calling acknowledge() but before the commit happens, all messages from the batch will be presented to a consumer of the group again. When not calling commitSync(), the next invocation of poll() will commit automatically. This happens asynchronously though, which means you might receive a new batch of messages while the commit of the acknowledgement status of a previous batch fails.\nWhen releasing a message for retrying, it will be part of a subsequent batch until the maximum delivery count for the message has been reached, in which case it will transition to the Archived state, without having been processed. If required, a message’s delivery count can be obtained from the ConsumerRecord. This allows you for instance to log a record when it hits the retry limit before archiving it.\nNewly created share groups start processing from the latest offset by default.
If you want a group to start from the beginning of the input topic(s) instead, you need to set the newly added configuration property share.auto.offset.reset to earliest. Unlike the well-known auto.offset.reset option, this is not a consumer configuration option, but a group configuration option. You can use the AdminClient API for setting it:\ntry (Admin admin = AdminClient.create(adminProperties)) { ConfigEntry entry = new ConfigEntry(\u0026#34;share.auto.offset.reset\u0026#34;, \u0026#34;earliest\u0026#34;); AlterConfigOp op = new AlterConfigOp(entry, AlterConfigOp.OpType.SET); Map\u0026lt;ConfigResource, Collection\u0026lt;AlterConfigOp\u0026gt;\u0026gt; configs = Map.of( new ConfigResource( ConfigResource.Type.GROUP, SHARE_GROUP), Arrays.asList(op)); admin.incrementalAlterConfigs(configs).all().get(); }\nMessage-level acknowledgement is a key improvement to Kafka, enabling use cases like job queuing which were not well supported before. At the same time, the feature still feels relatively basic at this point.\nMost importantly, there’s no notion of a dead letter queue (DLQ) as of the Apache Kafka 4.0 release. Once an unprocessable message has been archived, there’s no way of identifying it. For many use cases it will be required to either have means for retrieving the unprocessable messages with an offset smaller than the SPSO or, better yet, to have bespoke DLQ support, i.e. a dedicated topic to which unprocessable messages are sent automatically. In scenarios where there’s a dependency between messages with the same key, it would also be desirable to send all subsequent messages to the DLQ once one message with a given key got DLQ-ed, until that issue has been resolved. As of today, this is something you’d have to build entirely yourself.\nAnother useful enhancement would be more flexible retrying behaviors.
In the current form of Kafka queues, a released message will be retried immediately; there’s no support for delaying retries (e.g. via exponential back-off) or configuring a scheduled redelivery. This means that all available retry attempts will happen very quickly, which isn’t ideal for dealing with transient failures such as not being able to connect to an external service. Retrying within a short period of time may not be useful in this situation, while retrying after 30 or 60 minutes could.\nAll that being said, the support for queue semantics in Kafka 4.0 is an early access feature after all, and I’m sure all kinds of improvements can and will be made in subsequent releases. In particular, DLQ support is explicitly being mentioned in the KIP as a future extension.\nRetry Behavior and State Management\nLet’s dig a bit deeper and explore how retries are currently handled by the share group API. To do so, I’ve built a share consumer which processes some messages as shown in the in-flight records example in the KIP:\nThe messages on the topic have a String value which matches their offset: \u0026#34;0\u0026#34;, \u0026#34;1\u0026#34;, \u0026#34;2\u0026#34;, etc.
The process logic looks as follows:\nSystem.out.println(\u0026#34;Record | Status | Delivery Count\u0026#34;); System.out.println(\u0026#34;--------------------------------\u0026#34;); while (true) { ConsumerRecords\u0026lt;String, String\u0026gt; records = consumer.poll( Duration.ofMillis(100)); for (ConsumerRecord\u0026lt;String, String\u0026gt; record : records) { String status = switch(record.value()) { case \u0026#34;0\u0026#34;, \u0026#34;1\u0026#34;, \u0026#34;5\u0026#34; -\u0026gt; { consumer.acknowledge(record, AcknowledgeType.ACCEPT); yield \u0026#34;ACKED\u0026#34;; } case \u0026#34;3\u0026#34;, \u0026#34;7\u0026#34;, \u0026#34;8\u0026#34;, \u0026#34;9\u0026#34; -\u0026gt; { consumer.acknowledge(record, AcknowledgeType.RELEASE); yield \u0026#34;AVAIL\u0026#34;; } case \u0026#34;6\u0026#34; -\u0026gt; { consumer.acknowledge(record, AcknowledgeType.REJECT); yield \u0026#34;ARCHV\u0026#34;; } /* doing nothing, i.e. remain in Acquired state */ default -\u0026gt; { yield \u0026#34;ACQRD\u0026#34;; } }; System.out.println(String.format(\u0026#34;%s | %s | %s\u0026#34;, record.value(), status, record.deliveryCount().get())); } consumer.commitSync(); }\nStarting from the beginning of the topic, here’s the output of the first polling iteration:\nRecord | Status | Delivery Count\n--------------------------------\n0 | ACKED | 1\n1 | ACKED | 1\n2 | ACQRD | 1\n3 | AVAIL | 1\n4 | ACQRD | 1\n5 | ACKED | 1\n6 | ARCHV | 1\n7 | AVAIL | 1\n8 | AVAIL | 1\n9 | AVAIL | 1\n2 | ACQRD | 1\n4 | ACQRD | 1\n2 | ACQRD | 1\n4 | ACQRD | 1\n2 | ACQRD | 1\n4 | ACQRD | 1\n...\nThe first ten lines—corresponding to the first batch returned by the poll() call—are not too surprising: all messages are processed as expected. But then something interesting is happening: messages 2 and 4 (but not messages 3, 7, 8, 9 in Available state) are retrieved again and again.
As it turns out, messages in Acquired status are returned indefinitely by poll() until they are acknowledged. This happens purely client-side, i.e. reaching the broker-side maximum lock duration (configured via group.share.record.lock.duration.ms, defaulting to 30s) does not cause an interruption here, which may be surprising. Also note that the delivery count is not increased in this case. After speaking to the engineering team working on this feature I learned that exact behaviors and semantics are still in flux here—the API is marked as unstable at this point—so you probably are going to see some changes here with the 4.1 release.\nAn exception is triggered only when actually acknowledging a message and trying to commit after the maximum lock duration has been reached. It is not actually raised though; instead you need to examine the partition-exception map returned by commitSync():\nMap\u0026lt;TopicIdPartition, Optional\u0026lt;KafkaException\u0026gt;\u0026gt; syncResult = consumer.commitSync(); System.out.println(syncResult); // output adjusted for readability: // { oj_vK_XvQeSrL58aI81r1g:my-topic-0=Optional[org.apache.kafka.common.errors.InvalidRecordStateException: // The record state is invalid. The acknowledgement of delivery could not be completed.]}\nNote that this affects all the messages on that share partition whose acknowledgement you tried to commit. That is, even a message which you acknowledged would be delivered again in this case.\nWhen running another consumer in the same share group—or when restarting the consumer above—it’ll receive the available messages 3, 7, 8, and 9. Whether it’ll also receive 2 and 4 depends on whether the acquisition lock already has expired or not.\nShare Group State Persistence\nThe state of in-flight messages needs to be made durable by the share-partition coordinator.
This responsibility is handled through a component called the share-group state persister; while the KIP mentions that this could be a pluggable component eventually, there’s only a single persister implementation right now. It stores the state of share groups in a special Kafka topic named __share_group_state.\nThere are two kinds of records on that topic, ShareSnapshot and ShareUpdate records. The former represents a complete self-contained snapshot of the persistent state of a share-group, whereas the latter represents an incremental update to that state. An epoch field in the records is used to fence off writes by zombie share-partition leaders. Upon start-up, the coordinator reads the entire topic and builds up the state for a given share-partition. It does so by finding the latest snapshot record and then applying all subsequent updates. As such, the share group state topic isn’t suitable for Kafka topic compaction (i.e. keeping only the latest record with a given message key). Instead, the coordinator itself deletes all records for a share partition before the latest snapshot record.\nTo take a look at the __share_group_state topic, you can use the standard Kafka console consumer; just make sure to use the class o.a.k.t.c.g.s.ShareGroupStateMessageFormatter as a formatter:\nbin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --property print.key=true --topic __share_group_state --from-beginning --formatter=org.apache.kafka.tools.consumer.group.share.ShareGroupStateMessageFormatter\nHere’s a message describing the state of the in-flight messages shown above:\n{ \u0026#34;key\u0026#34;: { \u0026#34;version\u0026#34;: 1, (1) \u0026#34;data\u0026#34;: { \u0026#34;groupId\u0026#34;: \u0026#34;my-share-group\u0026#34;, \u0026#34;topicId\u0026#34;:
\u0026#34;YrHYV-TdRrqvUkvejYQ8Gw\u0026#34;, \u0026#34;partition\u0026#34;: 0 } }, \u0026#34;value\u0026#34;: { \u0026#34;version\u0026#34;: 0, \u0026#34;data\u0026#34;: { \u0026#34;snapshotEpoch\u0026#34;: 0, \u0026#34;leaderEpoch\u0026#34;: 0, \u0026#34;startOffset\u0026#34;: 0, (2) \u0026#34;stateBatches\u0026#34;: [ { \u0026#34;firstOffset\u0026#34;: 0, \u0026#34;lastOffset\u0026#34;: 1, \u0026#34;deliveryState\u0026#34;: 2, (3) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 3, \u0026#34;lastOffset\u0026#34;: 3, \u0026#34;deliveryState\u0026#34;: 0, (4) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 5, \u0026#34;lastOffset\u0026#34;: 5, \u0026#34;deliveryState\u0026#34;: 2, (3) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 6, \u0026#34;lastOffset\u0026#34;: 6, \u0026#34;deliveryState\u0026#34;: 4, (5) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 7, \u0026#34;lastOffset\u0026#34;: 9, \u0026#34;deliveryState\u0026#34;: 0, (4) \u0026#34;deliveryCount\u0026#34;: 1 } ] } } }\n(1) Indicates this is a ShareUpdate record\n(2) The current share-partition start offset\n(3) Status ACKED\n(4) Status AVAIL\n(5) Status ARCHV\nTo manage the state of share groups, the aforementioned script bin/kafka-share-groups.sh can be used. It allows you to list and describe existing share groups and their members, reset and delete their offsets, and more:\nbin/kafka-share-groups.sh --bootstrap-server localhost:9092 --describe --group my-share-group --verbose\nGROUP TOPIC PARTITION LEADER-EPOCH START-OFFSET\nmy-share-group\tmy-topic-2 0 - 2\nSummary and Outlook\nKIP-932: Queues for Kafka adds a long-awaited capability to the Apache Kafka project: queue-like semantics, including the ability to acknowledge messages on a one-by-one basis. This positions Kafka for use cases such as job queuing, for which it hasn’t been a good fit historically.
As multiple members of a share group can process the messages from a single topic partition, the partition count does not limit the degree of consumer parallelism any longer. The number of consumers in a group can quickly be increased and decreased as needed, without having to repartition the topic.\nBuilt on top of Kafka’s event log semantics, Kafka queues provide some interesting characteristics typically not found in other queue implementations, such as the ability to retain the messages on a queue for an indefinite period of time, reprocess some or all of them, and have multiple independent groups of consumers, with each group processing all the messages on the topic. For instance, you could have two share groups applying slightly different variants of some processing logic in an A/B testing scenario.\nOne aspect which I couldn’t explore due to time constraints is the performance characteristics of Kafka’s queue support. It would be interesting to see how the overall throughput increases as more consumers are added to a share group—without increasing the number of partitions—how message-level acknowledgements impact performance, or what the impact of, say, rejecting and retrying every 10th message would be. This would be a highly interesting topic for a follow-up post.\nAvailable as an early access feature as of the Kafka 4.0 release, Kafka queues are not recommended for production usage yet, and there are several limitations worth calling out: most importantly, the lack of DLQ support. More control over retry timing would be desirable, too. As such, I don’t think Kafka queues in their current form will make users of established queue solutions such as Artemis or RabbitMQ migrate to Kafka. It is a very useful addition to the Kafka feature set nevertheless, coming in handy for instance for teams that already run Kafka and are looking for a solution for simple queuing use cases, avoiding the need to stand up and operate a separate solution just for these.
This story will become even more compelling if the feature gets built out and improved in future Kafka releases.\nVoting for the release 4.0.0. RC1 of Apache Kafka just started earlier today, so it shouldn’t be much longer until you can give queue support a try yourself with an official release. To discuss any feedback you may have, reach out to the Kafka developer mailing list.\nMany thanks to Andrew Schofield for his input and feedback while writing this post!\n","id":7,"publicationdate":"Mar 5, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eIn the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am going to explore interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. If you think there’s a specific subject I should take a look at, let me know in the comments below!\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cspan class=\"image\"\u003e\u003cimg src=\"/images/kip_932_1.jpg\" alt=\"kip 932 1\" width=\"333px\"/\u003e\u003c/span\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThat guy above? Yep, that’s me, whenever someone says \u0026#34;Kafka queue\u0026#34;. Because, that’s not what Apache Kafka is. At its core, Kafka is a distributed durable event log. Producers write events to a topic, organized in partitions which are distributed amongst the brokers of a Kafka cluster. Consumers, organized in groups, divide the partitions they process amongst themselves, so that each partition of a topic is read by exactly one consumer in the group.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Let's Take a Look at... 
KIP-932: Queues for Kafka!","uri":"https://www.morling.dev/blog/kip-932-queues-for-kafka/"},{"content":" If you are following the news around Debezium—​an open-source platform for Change Data Capture (CDC) for a variety of databases—​you may have seen the announcement that the project is in the process of moving to the Commonhaus Foundation. I think this is excellent news for the Debezium project, its community, and open-source CDC at large. In this post I’d like to share some more context on why I am so excited about this development.\nDebezium was founded in 2016 by Randall Hauch, back then a software engineer at Red Hat. Pretty much from the get go, the Apache-licensed project attracted a very diverse community not only of users, but also of contributors. The Debezium development team has been doing an amazing job of fostering a welcoming and open environment, establishing a level playing field for contributors from Red Hat—​who funded and continues to fund the salaries of the core team working on the project, including mine, while leading Debezium between 2017 and 2022—​as well as other organizations alike. Companies such as Google, IBM, Stripe, Slack, WePay, SugarCRM, Instaclustr, Bolt, and many others have put substantial resources into the project. More than 650 individuals have contributed to the project at this point, ranging from small fixes and improvements, over developing complete features in the Debezium core framework, all the way up to driving the work and roadmap of specific connectors.\nOf course I am biased, but I think it’s fair to say that when it comes to \u0026#34;vendor-owned\u0026#34; open-source, Debezium has been a tremendous success. When the project website says the following, it’s truly meant like that and not just empty words:\nThe Debezium project is operated as a community-centric open source project. 
While Red Hat product management has a voice, it is akin to the same voice of any member of the community, whether they contribute code, bug reports, bug fixes or documentation.\nThe community has truly lived up to this aspiration and the project has always managed to align the interests of the different parties involved (I only remember a single time when there was a prolonged discussion about a specific feature and its implementation between the core team and the contributing team, with the idea of forking the project being floated briefly). Nevertheless, ultimately the Debezium project was controlled by a single entity, Red Hat. They owned the name, the domain, the GitHub organization, social media channels, etc. Despite the continued demonstration of best intentions, some folks may have had reservations about contributing to a project managed like that.\nThat’s why I was thrilled to learn that several other projects sponsored and managed by Red Hat, for instance Quarkus and Hibernate, announced their move to the Commonhaus Foundation earlier this year. This foundation acts as a 100% neutral home for open-source projects, addressing any potential concerns around ownership which contributors may have. I was hoping for Debezium to make the move to Commonhaus as well, and I could not have been any happier when I learned a few weeks back that it is actually going to happen.\nThe Commonhaus Foundation is a particularly interesting instance of an open-source foundation, as it provides its projects with an extensive degree of freedom. Quoting their FAQ, what differentiates Commonhaus from other foundations such as the Apache Software Foundation, the Eclipse Foundation, or the Linux Foundation is this (check out the full FAQ for comparisons with specific foundations):\nThe Commonhaus Foundation sets itself apart by providing open source projects with a unique combination of autonomy and tailored support, adapted to their specific stages of development and needs. 
By simplifying access to funding and offering a stable, long-term home for their assets, the Foundation enables projects to govern themselves and leverage collective resources for greater visibility and impact.\nUnlike the structured environments and specific licensing and infrastructure requirements characteristic of foundations like the Apache and Eclipse Foundations, Commonhaus allows projects to maintain their established brand, community identity, infrastructure, and governance practices. It also supports a broader array of OSI-approved licenses.\nThe way I perceive it, Commonhaus is a \u0026#34;No frills\u0026#34; foundation, a neutral project home which acts as the owner of IP such as project trademarks, helps projects with financial management, provides them with infrastructure for receiving donations (something we always struggled with during my time leading the project), and more. But it stays out of projects’ day-to-day operations as much as possible. I believe it’s a perfect fit for a project like Debezium, with a strong existing community, brand, and established processes. Debezium is going to join the ranks of other popular projects under the Commonhaus umbrella, such as SDKMan, OpenRewrite, and Jackson. SlateDB, a recently open-sourced embedded database built on object storage, also just moved to Commonhaus, which goes to show that the foundation is a great home for young projects, too, relatively early in their lifecycle.\nAs such, I think moving to Commonhaus is an outstanding milestone for the Debezium project, ensuring its ongoing success in the future. Big kudos to the Debezium team for making this move, and massive props to Red Hat for supporting this step. It shows a deep understanding of and belief in open source and its unique advantages, not paralleled by many other organizations. Now, some folks might wonder whether this is about dumping a project on an open-source foundation and then quickly pulling resources after that. 
Needless to say that I am not a spokesperson for Red Hat and I can’t predict what’s going to happen in the future. But personally, this is not something I am worried about. Historically, this isn’t something the company has been doing (with the exception of the Ceylon programming language perhaps, which got discontinued pretty quickly after moving to the Eclipse Foundation). Case in point, they just published a job posting for a Principal Software Engineer working on Debezium.\nTo wrap things up, I think the future is looking really bright for Debezium. The need for CDC and the interest in Debezium as the leading open-source implementation is unbroken. At the same time, it’s a very active space, with new projects popping up frequently, so it’s vital for Debezium and its community to keep moving and innovating. The move to Commonhaus lays an excellent foundation for this next chapter of Debezium’s success. With the team currently discussing the project roadmap for 2025, it’s a perfect time for getting involved and becoming a part of the journey.\n","id":8,"publicationdate":"Nov 27, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIf you are following the news around Debezium—​an open-source platform for Change Data Capture (CDC) for a variety of databases—​you may have seen the announcement that the project is in the process of \u003ca href=\"https://debezium.io/blog/2024/11/04/debezium-moving-to-commonhaus/\"\u003emoving to the Commonhaus Foundation\u003c/a\u003e. I think this is excellent news for the Debezium project, its community, and open-source CDC at large. 
In this post I’d like to share some more context on why I am so excited about this development.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Thoughts On Moving Debezium to the Commonhaus Foundation","uri":"https://www.morling.dev/blog/thoughts-on-moving-debezium-to-commonhaus-foundation/"},{"content":" Every now and then, it can come in very handy to build OpenJDK from source yourself, for instance if you want to explore a feature which is under development on a branch for which no builds are published. For some reason I always thought that building OpenJDK is a very complex process, requiring the installation of arcane tool chains etc. But as it turns out, this is actually not true: the project does a great job of documenting what’s needed and only a few steps are necessary to build your very own JDK.\nThe following is a run-down of what I had to do to build JDK 24 from source on macOS 14.7.1. This is mostly for my own reference; check out the upstream documentation for a comprehensive description of the OpenJDK build, all requirements, build options, etc.\nFirst, install the required tools:\nA boot JDK, typically the previous version; I highly recommend using SDKMan to do so:\nsdk install java 23.0.1-tem\nXcode, Apple’s development environment for macOS; the easiest way is to get it from the App Store. Unfortunately though, the current release 16.1 ships a broken version of clang which makes the JDK build fail. 
So you should either install 15.4 from Apple Developer, or apply the following patch before building OpenJDK which sidesteps that issue (at the price of building with fewer compiler optimizations):\ngit apply \u0026lt;\u0026lt; EOF\n--- a/make/autoconf/flags-cflags.m4\n+++ b/make/autoconf/flags-cflags.m4\n@@ -337,9 +337,9 @@ AC_DEFUN([FLAGS_SETUP_OPTIMIZATION],\n C_O_FLAG_HIGHEST=\u0026#34;-O3 -finline-functions\u0026#34;\n C_O_FLAG_HI=\u0026#34;-O3 -finline-functions\u0026#34;\n else\n- C_O_FLAG_HIGHEST_JVM=\u0026#34;-O3\u0026#34;\n- C_O_FLAG_HIGHEST=\u0026#34;-O3\u0026#34;\n- C_O_FLAG_HI=\u0026#34;-O3\u0026#34;\n+ C_O_FLAG_HIGHEST_JVM=\u0026#34;-O1\u0026#34;\n+ C_O_FLAG_HIGHEST=\u0026#34;-O1\u0026#34;\n+ C_O_FLAG_HI=\u0026#34;-O1\u0026#34;\n fi\n C_O_FLAG_NORM=\u0026#34;-O2\u0026#34;\n C_O_FLAG_DEBUG_JVM=\u0026#34;-O0\u0026#34;\nEOF\nAutoconf:\nbrew install autoconf\nWith that, you should have everything in place for building OpenJDK:\nClone the project:\ngit clone https://git.openjdk.org/jdk\ncd jdk\nRun configure:\nbash configure\nRun the actual build:\nmake images\nRejoice:\n./build/macosx-aarch64-server-release/jdk/bin/java --version\nopenjdk 24-internal 2025-03-18\nOpenJDK Runtime Environment (build 24-internal-adhoc.gunnarmorling.jdk)\nOpenJDK 64-Bit Server VM (build 24-internal-adhoc.gunnarmorling.jdk, mixed mode)\nAnd that’s it, you now have your own JDK build you can use for testing. Pretty easy, right? 
That said, if you still don’t feel like running this build by yourself, and if you’re on Linux rather than macOS, you can also check out the OpenJDK builds provided by Aleksey Shipilëv, which are available for a variety of OpenJDK projects as well as target platforms.\n","id":9,"publicationdate":"Nov 16, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eEvery now and then, it can come in very handy to build OpenJDK from source yourself,\nfor instance if you want to explore a feature which is under development on a branch for which no builds are published.\nFor some reason I always thought that building OpenJDK is a very complex process,\nrequiring the installation of arcane tool chains etc.\nBut as it turns out, this is actually not true:\nthe project does a great job of documenting what’s needed and only a few steps are necessary to build your very own JDK.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Building OpenJDK From Source On macOS","uri":"https://www.morling.dev/blog/building-openjdk-from-source-on-macos/"},{"content":" This page gives an overview of some talks I have given over the last few years. 
I have spoken at large conferences such as QCon San Francisco, Devoxx and JavaOne, local meet-ups as well as company-internal events, covering topics such as Debezium and Change Data Capture, Bean Validation, NoSQL and more.\nIf you’d like to have me as a speaker at your conference or meet-up, please get in touch.\n2025 Current (Bengaluru, India): Ins and Outs of the Outbox Pattern\n2024 Flink Forward Asia (Jakarta, Indonesia): Streaming Data Contracts With Debezium and Apache Flink\nBig Data Europe (Vilnius, Lithuania): Data Contracts In Practice With Debezium and Apache Flink\nBig Data Europe (Vilnius, Lithuania; panel discussion): Building Effective Data Teams: Strategies for Success\nJ-Fall (Ede, Netherlands): 1BRC–Nerd Sniping the Java Community\nP99 Conf (online): 1BRC–Nerd Sniping the Java Community\nFlink Forward (Berlin; panel discussion): AI and Apache Flink: Expert Panel\nDevoxx Belgium (Antwerp; joint talk together with Roy van Rijn): 1BRC–Nerd Sniping the Java Community\nDevoxx Belgium (Antwerp; joint hands-on lab with Hans-Peter Grahsl): Putting AI Into Real-time ETL with Apache Flink, Debezium, and LangChain4j\nInfoQ DevSummit (Munich): 1BRC–Nerd Sniping the Java Community\nCurrent (Austin): Data Contracts In Practice With Debezium and Apache Flink\nJavaZone (Oslo): 1BRC–Nerd Sniping the Java Community\nData Berlin Midsummer Meetup: From Postgres to OpenSearch in No Time\nJCon OpenBlend Slovenia: 1BRC–Nerd Sniping the Java Community, Syncing your Database To OpenSearch In Real-Time\nJavaDay Istanbul: Syncing your Database To OpenSearch In Real-Time\nIndia Open Source Data Infrastructure Meetup (Bengaluru): From Postgres to OpenSearch in No Time\nKafka Summit (Bengaluru): Debezium Snapshots Revisited!\nKafka Summit (London): Data Contracts In Practice With Debezium and Apache Flink\nJavaLand (Nürburgring): 1BRC–Nerd Sniping the Java Community\n2023 Open Source Data Infrastructure Meetup (Berlin): From Postgres to OpenSearch in No Time\nFlink 
Forward (Seattle): Debezium Snapshots Revisited!\nCurrent (San José): Debezium Snapshots Revisited!\nJavaZone (Oslo): Real-time Change Stream Processing with Apache Flink\nKafka Summit (London): Taming Kafka Connect with kcctl\nData Council (Austin): Change Data Streaming Patterns With Debezium \u0026amp; Apache Flink\nQCon London: Change Data Capture for Microservices; I also was the host for the Building Modern Backends track\njProfessionals (Sofia): \u0026#34;Change Stream Processing with Debezium and Apache Flink\u0026#34;\n2022 Devoxx (Antwerp): Taming Kafka Connect with kcctl\nDevoxx (Antwerp): Keep Your Cache Always Fresh with Debezium!\nCurrent (Austin): Keep Your Cache Always Fresh with Debezium!\ncode.talks (Hamburg): Change Data Streaming Patterns für Verteilte Systeme\nUptime (Amsterdam): Keep Your Cache Always Fresh with Debezium!\nJava User Group Hamburg: Mit Java-18-APIs zum Mond und weiter\nJBCONConf (Barcelona): Keep Your Cache Always Fresh with Debezium!\nKafka Summit London: Keep Your Cache Always Fresh with Debezium! 
(recording, slides)\nCarnegie Mellon University \u0026#34;Vaccination Database Tech Talks\u0026#34; (online): Open-source Change Data Capture With Debezium (recording, slides)\njChampionsConference (online): Continuous Performance Regression Testing with JfrUnit (recording, slides)\n2021 DevNation Tech Talk (online): To the Moon and Beyond With Java 17 APIs!\nRed Hat Summit Connect Developer Experience (online, German): Change Data Streaming Patterns für Verteilte Systeme\nFlink Forward (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems\nVoxxed Days Romania (joint presentation with Hans-Peter Grahsl; online): Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB\nP99 Conf (online): Continuous Performance Regression Testing with JfrUnit\nAccento (online): To the Moon and Beyond With Java 17 APIs!; Panel The Present and Future of Java (17)\nHeise betterCode() Java (online): Mit Java-17-APIs zum Mond und weiter\nKafka Summit Americas (joint presentation with Hans-Peter Grahsl; online): Dissecting our Legacy: The Strangler Fig Pattern with Debezium, Apache Kafka \u0026amp; MongoDB\nApache Pinot Meetup (joint presentation with Kenny Bastani; online): Analyzing Real-time Order Deliveries using CDC with Debezium and Pinot\nMongoDB World (joint presentation with Hans-Peter Grahsl; online): Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB\njLove (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems\nBerlin Buzzwords (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems (recording, slides)\nThe Developer’s Conference (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems (slides)\nKafka Summit Europe (joint presentation with Hans-Peter Grahsl; online): Advanced Change Data Streaming Patterns 
in Distributed Systems (recording, slides)\nDevNation Tech Talk (online): Continuous performance regression testing with JfrUnit (recording, slides)\nBordeaux JUG (joint presentation with Katja Aresti; online): Don’t fear outdated caches — change data capture to the rescue! Let’s discover Infinispan and Debezium\njChampionsConference (joint presentation with Andres Almiray; online): Plug-in Architectures for Java with Layrry \u0026amp; the Java Module System\n2020 JokerConf (online): Change data capture pipelines with Debezium and Kafka Streams\nVirtual JUG (joint presentation with Andres Almiray; online): Plug-in Architectures With Layrry and the Java Module System (recording, slides)\nQConPlus (online): Serverless Search for My Blog With Java, Quarkus, \u0026amp; AWS Lambda\nJFall (joint presentation with Andres Almiray; online): Plug-in Architectures for Java With Layrry and the Java Module System\nJava Day Istanbul (online): Change Data Streaming Use Cases With Apache Kafka \u0026amp; Debezium\nGreat International Developer Summit (online): Change Data Capture Pipelines with Debezium and Kafka Streams\nKafka Summit (online): Change Data Capture Pipelines with Debezium and Kafka Streams\nRed Hat Summit Virtual Experience: Data integration patterns for microservices with Debezium and Apache Kafka\n2019 Nordic Coding, Kiel: Quarkus - Supersonic Subatomic Java\nJava User Group Paderborn: Change Data Streaming Use Cases mit Debezium und Apache Kafka\nQCon San Francisco: Practical Change Data Streaming Use Cases With Apache Kafka \u0026amp; Debezium\nJokerConf, St. 
Petersburg: Practical change data streaming use cases with Apache Kafka and Debezium\nJavaZone, Oslo: Change Data Streaming For Microservices With Apache Kafka and Debezium\nMicroXchg, Berlin: Change Data Streaming Patterns For Microservices With Debezium\nJavaLand, Brühl\nChange Data Streaming für Microservices mit Debezium\nDas Annotation Processing API - Use Cases und Best Practices\nRivieraDev, Sophia Antipolis: Practical Change Data Streaming Use Cases With Apache Kafka and Debezium\nKafka Summit London: Change Data Streaming Patterns For Microservices With Debezium\nRed Hat Summit, Boston\nBridging microservice boundaries with Apache Kafka and Debezium (hands-on lab)\nChange data streaming patterns for microservices with Debezium\nRed Hat Modern Integration and Application Development Day, Milano: Data Strategies for Microservices: Change Data Capture with Debezium\n2018 Devoxx Morocco, Marrakesh\nChange Data Streaming Patterns for Microservices With Debezium\nMap me if you can! Painless bean mappings with MapStruct\nKafka Summit San Francisco: Change Data Streaming Patterns for Microservices With Debezium\nVoxxedDays Microservices Paris: Data Streaming for Microservices using Debezium\nJUG Saxony Day, Dresden: Streaming von Datenbankänderungen mit Debezium\nJava User Group Darmstadt: Streaming von Datenbankänderungen mit Debezium\nJavaLand, Brühl: Hibernate - State of the Union; Migrating to Java 9 Modules with ModiTect\nRivieraDev, Sophia Antipolis: Data Streaming for Microservices using Debezium\nRed Hat Summit, San Francisco: Running data-streaming applications with Kafka on OpenShift (hands-on lab)\nJava User Group Münster, Streaming von Datenbankänderungen mit Debezium\n2017 JavaZone, Oslo: Keeping Your Data Sane with Bean Validation 2.0\ncode.talks, Hamburg: Neues in Bean Validation 2.0 - Support für Java 8 und mehr (recording)\nJavaOne, San Francisco\nKeeping Your Data Sane with Bean Validation 2.0\nNoSQL? 
Have it Your Way!\nDevoxx Belgium, Antwerp\nStreaming Database Changes with Debezium\nShort talks on Bean Validation 2.0 and MapStruct\njdk.io, Copenhagen: Keeping Your Data Sane with Bean Validation 2.0\nRivieraDev, Sophia Antipolis: Keeping Your Data Sane with Bean Validation 2.0\nJavaLand, Brühl\nBean Validation 2.0\nHibernate Search and Elasticsearch\n2016 JavaZone, Oslo: From Hibernate to Elasticsearch in no time\n2013 Berlin Expert Days: Bean Validation 1.1 - Whats Cooking? (slides)\n","id":10,"publicationdate":"Nov 7, 2024","section":"","summary":"This page gives an overview of some talks I have given over the last few years. I have spoken at large conferences such as QCon San Francisco, Devoxx and JavaOne, local meet-ups as well as company-internal events, covering topics such as Debezium and Change Data Capture, Bean Validation, NoSQL and more.\nIf you’d like to have me as a speaker at your conference or meet-up, please get in touch.\n2025 Current (Bengaluru, India): Ins and Outs of the Outbox Pattern","tags":null,"title":"Conferences","uri":"https://www.morling.dev/conferences/"},{"content":" During and after my time as the lead of Debezium, a widely used open-source platform for Change Data Capture (CDC) for a variety of databases, I was repeatedly asked whether I’d be interested in creating a company around CDC. VCs, including well-known household names, did and do reach out to me, pitching this idea.\nOn the surface, this sounds tempting. CDC, and Debezium in particular, are widely used in the data sphere. So taking a few million in seed capital and building CDC-as-a-Service sounds like an attractive idea, doesn’t it? Living the start-up life and creating a unicorn-to-be, oh what a sweet dream. 
But having worked on CDC for quite a few years now, I am convinced that this wouldn’t be the right thing to do.\nThe reason is that CDC is a feature, not a product.\nBy that I mean that CDC is an incredibly powerful tool, a huge enabler for working with your data in real-time, supporting a wide range of use cases such as replication, cache and search index updates, auditing, microservice data exchange, and many others. Liberty for your data, rejoice!\nBut it’s just that: an enabler. CDC isn’t really that useful on its own. You ingest data change events into Kafka, and then what? At the very least, you want to have sink connectors which take that data and put it elsewhere. For a successful product, you need to solve a problem people have. And that problem rarely is \u0026#34;Take my data from Postgres to Kafka\u0026#34;, and much more often is \u0026#34;Take my data from Postgres to Snowflake/Elasticsearch/S3\u0026#34;. Very often, you also want to apply some processing to your change event streams, e.g. to filter, transform, denormalize, or aggregate them.\nIn my opinion, CDC makes sense as part of a cohesive data platform which integrates all these things. These, and more: data governance, schema management, observability, quality management, etc. Another angle for CDC productization could be to marry it closely with a database. Imagine Postgres provided, out of the box, a Kafka broker endpoint to which you could subscribe for getting Debezium-formatted data change events. How cool would that be? But again, that’s a feature, not a product.\nNow, there have been a few start-ups focused on CDC lately. Two that stuck out to me were Arcion and PeerDB: they got acquired quickly by Databricks and ClickHouse, respectively. 
Presumably, the goal was to turn them (you guessed it) into features of their data offerings.\n","id":11,"publicationdate":"Oct 18, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eDuring and after my time as the lead of \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e,\na widely used open-source platform for Change Data Capture (CDC) for a variety of databases,\nI was repeatedly asked whether I’d be interested in creating a company around CDC.\nVCs, including well-known household names, did and do reach out to me,\npitching this idea.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"CDC Is a Feature Not a Product","uri":"https://www.morling.dev/blog/cdc-is-a-feature-not-a-product/"},{"content":" Whenever I need a Linux box for some testing or experimentation, or projects like the One Billion Row Challenge a few months back, my go-to solution is Hetzner Online, a data center operator here in Europe.\nTheir prices for VMs are unbeatable, starting with 3.92 €/month for two shared vCPUs (either x64 or AArch64), four GB of RAM, and 20 TB of network traffic (these are prices for their German data centers; they vary between regions). Four dedicated cores with 16 GB, e.g. for running a small web server, will cost you 28.55 €/month. Getting a box with similar specs on AWS would set you back a multiple of that, with the (outbound) network cost being the largest chunk. So it’s not a big surprise that more and more people realize the advantages of this offering, most notably Ruby on Rails creator David Heinemeier Hansson, who has been singing the praises of Hetzner’s dedicated servers, but also their VM instances, quite a bit on Twitter lately.\nSo I thought I’d share the automated process I’ve been using over the last few years for spinning up new boxes on Hetzner Cloud, hoping it’s gonna be helpful to other folks out there eager to explore this world of cheap compute. 
I’ve had that set-up in a GitHub repo for quite a while and meant to write about it, with the recent attention on Hetzner being a nice motivator for finally doing so. Note I am not affiliated with Hetzner in any way or form, I just like their offering and think more people should be aware of it and benefit from it.\nCreating Instances To create new VMs, I am using Terraform, which shouldn’t be a big surprise. The Hetzner Terraform provider is very mature and reflects the latest product features pretty quickly, as far as I can tell (alternatively, there’s a CLI tool, and of course an API as well). Here’s my complete Terraform definition for launching one VM instance and a firewall to control access to it:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 terraform { required_providers { hcloud = { source = \u0026#34;hetznercloud/hcloud\u0026#34; version = \u0026#34;~\u0026gt; 1.45\u0026#34; } } } variable \u0026#34;hcloud_token\u0026#34; { sensitive = true } variable \u0026#34;firewall_source_ip\u0026#34; { default = \u0026#34;0.0.0.0\u0026#34; } # Configure the Hetzner Cloud Provider provider \u0026#34;hcloud\u0026#34; { token = \u0026#34;${var.hcloud_token}\u0026#34; (1) } resource \u0026#34;hcloud_firewall\u0026#34; \u0026#34;common-firewall\u0026#34; { (2) name = \u0026#34;common-firewall\u0026#34; rule { direction = \u0026#34;in\u0026#34; protocol = \u0026#34;tcp\u0026#34; port = \u0026#34;14625\u0026#34; (3) source_ips = [ \u0026#34;${var.firewall_source_ip}/32\u0026#34; (4) ] } rule { direction = \u0026#34;in\u0026#34; protocol = \u0026#34;icmp\u0026#34; source_ips = [ \u0026#34;${var.firewall_source_ip}/32\u0026#34; ] } } resource \u0026#34;hcloud_server\u0026#34; \u0026#34;control\u0026#34; { (5) name = \u0026#34;control\u0026#34; image = \u0026#34;fedora-40\u0026#34; location = \u0026#34;nbg1\u0026#34; server_type = \u0026#34;cx22\u0026#34; (6) 
keep_disk = true ssh_keys = [\u0026#34;some key\u0026#34;] (7) firewall_ids = [hcloud_firewall.common-firewall.id] } output \u0026#34;control_public_ip4\u0026#34; { value = \u0026#34;${hcloud_server.control.ipv4_address}\u0026#34; } 1 Hetzner Cloud API token, defined in .tfvars 2 Setting up a firewall for limiting access to the instance 3 Using a random non-standard SSH port; take that, script kiddies! And no, this is not the one I am actually using 4 If I don’t need public access, allowing to connect only from my own local machine 5 The VM to set up 6 The instance size, in this case the smallest one they have with 2 vCPUs and 4 GB of RAM 7 SSH access key, to be set up in the web console before Bringing up the VM is as easy as running the following command:\n1 TF_VAR_firewall_source_ip=`dig +short txt ch whoami.cloudflare @1.0.0.1 | tr -d \u0026#39;\u0026#34;\u0026#39;` terraform apply -var-file=.tfvars Note how I am injecting my own public IP as a variable, allowing the firewall configuration to be trimmed down to grant access only from that IP. That’s my standard set-up for test and dev boxes which don’t require public access. After just a little bit, your new cloud VM will be up and running, with Terraform reporting the IP address of the new box in its output. The cool thing is that you can rescale this box later on as needed. If you set keep_disk to true as above, the box will keep its initial disk size, allowing you to scale back down later on, too.\nSo I’ll always start with the smallest configuration, which costs not even four Euros per month. Then, when I am actually going to make use of the box for something which requires a bit more juice, I’ll update the server_type line as needed, e.g. to \u0026#34;ccx33\u0026#34; for eight dedicated vCPUs and 32 GB of RAM. This configuration would then cost 9,2 cents per hour, until I scale it back down again to cx22. Rescaling just takes a minute or two and is done by re-running Terraform as shown above. 
So it’s something which you can easily do whenever starting or stopping to work on some project. Of course, this makes sense for ad-hoc usage scenarios like mine, not so much for more permanently running workloads.\nConfiguring SSH After the box has been set up via Terraform, I am using Ansible for provisioning, i.e. the installation of software (yepp, my Red Hat past is shining through here). That way, the process is fully automated, and I can set up and provision new machines with the same configuration with ease at any time. My Ansible set-up is made up of two parts: one for configuring SSH, one for installing whatever packages are needed.\nHere’s the playbook for the SSH configuration, applying some best practices such as enforcing key-based authentication and disabling remote root access:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 --- - name: Create user (1) hosts: all remote_user: root gather_facts: false vars_files: - vars.yml tasks: - name: have {{ user }} user user: name: \u0026#34;{{ user }}\u0026#34; shell: /bin/bash - name: add wheel group group: name: wheel state: present - name: Allow wheel group to have passwordless sudo lineinfile: dest: /etc/sudoers state: present regexp: \u0026#39;^%wheel\u0026#39; line: \u0026#39;%wheel ALL=(ALL) NOPASSWD: ALL\u0026#39; validate: visudo -cf %s - name: add user user: name={{ user }} groups=wheel state=present append=yes - name: Add authorized key authorized_key: user: \u0026#34;{{ user }}\u0026#34; state: present key: \u0026#34;{{ lookup(\u0026#39;file\u0026#39;, \u0026#39;{{ ssh_public_key_file }}\u0026#39;) }}\u0026#34; (2) - name: Set up SSH (3) hosts: all remote_user: \u0026#34;build\u0026#34; become: true become_user: root gather_facts: false vars_files: - vars.yml tasks: - name: Disable root login over SSH lineinfile: dest=/etc/ssh/sshd_config 
regexp=\u0026#34;^PermitRootLogin\u0026#34; line=\u0026#34;PermitRootLogin no\u0026#34; state=present notify: - restart sshd - name: Disable password login lineinfile: dest=/etc/ssh/sshd_config regexp=\u0026#34;^PasswordAuthentication\u0026#34; line=\u0026#34;PasswordAuthentication no\u0026#34; state=present notify: - restart sshd - name: Change SSH port lineinfile: dest=/etc/ssh/sshd_config regexp=\u0026#34;^#Port 22\u0026#34; line=\u0026#34;Port 14625\u0026#34; state=present notify: - restart sshd handlers: - name: restart sshd service: name: sshd state: restarted\n1 Adding a user \u0026#34;build\u0026#34; (name defined in vars.yml) with sudo permissions\n2 The SSH key to add for the user\n3 Configuring SSH: disabling remote root login, disabling password login, and changing the SSH port to a non-standard value.\nBefore running Ansible, I need to put the IP reported by Terraform into the hosts file, along with the paths of the private and public SSH keys:\n[hetzner]\n\u0026lt;IP of the box\u0026gt;:14625 ansible_ssh_private_key_file=path/to/my-key ssh_public_key_file=/path/to/my-key.pub\nThen this playbook can be run like so:\nansible-playbook -i hosts --limit=hetzner init-ssh.yml\nNote that this playbook can be run exactly once: afterwards, the root user can no longer connect via SSH. Purists out there might say that the non-standard SSH port smells a bit like security by obscurity, and they wouldn’t be wrong. But it does help to prevent lots of entries about failed log-in attempts in the log, as most folks just randomly looking for machines to hack won’t bother trying ports other than 22.\nProvisioning Software\nWith the SSH configuration hardened a bit, it’s time to install some software onto the machine. What you’ll install depends on your specific requirements of course. 
For my purposes, I have two roles for installing some commonly required things and Docker, which both are incorporated via a playbook to be executed by the build user set up in the step before:\n1 2 3 4 5 6 7 8 9 --- - hosts: all remote_user: build roles: - base - docker vars_files: - vars.yml Here’s the base role’s task definitions:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 - name: upgrade all packages become: true become_user: root dnf: name=\u0026#34;*\u0026#34; state=latest - name: Have common tools become: true become_user: root dnf: name={{item}} state=latest with_items: - git - wget - the_silver_searcher - htop - acl - dnf-plugins-core - bash-completion - jq - gnupg - haveged - vim-enhanced - entr - zip - fail2ban - httpie - hyperfine - name: Have SDKMan become: no shell: \u0026#34;curl -s \u0026#39;https://get.sdkman.io\u0026#39; | bash\u0026#34; args: executable: /bin/bash creates: /home/build/.sdkman/bin/sdkman-init.sh - name: Have .bashrc copy: src: user_bashrc dest: /home/{{ user }}/.bashrc mode: 0644 I used to install Java via a separate role, allowing me to switch versions via update-alternatives, but this became a bit of a hassle, so I am doing this via the amazing SDKMan tool now. Finally, for the sake of completeness, here are the tasks for installing Docker. 
It’s a bit more complex than I’d like it to be, due to the fact that a separate DNF repo must be configured first:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 - name: Have docker repo become: true become_user: root shell: \u0026#39;dnf config-manager \\ --add-repo \\ https://download.docker.com/linux/fedora/docker-ce.repo\u0026#39; - name: Have dnf cache updated become: true become_user: root shell: \u0026#39;dnf makecache\u0026#39; - name: Have Docker become: true become_user: root dnf: name={{item}} state=latest with_items: - docker-ce - docker-ce-cli - containerd.io - docker-compose - docker-buildx-plugin - name: add docker group group: name=docker state=present become: true become_user: root - name: Have /etc/docker file: path=/etc/docker state=directory become: true become_user: root - name: Have daemon.json become: true become_user: root copy: src: docker_daemon.json dest: /etc/docker/daemon.json - name: Ensure Docker is started become: true become_user: root systemd: state: started enabled: yes name: docker - name: add user become: true become_user: root user: name={{ user}} groups=docker state=present append=yes Try It Out Yourself Thanks to Terraform and Ansible, spinning up a box for testing and development on Hetzner Cloud can be fully automated, letting you go from zero to a running VM—​set up for safe SSH access, and provisioned with the software you need—​within a few minutes. Once your VM is running, you can scale it up, and back down, based on your specific workloads. This allows you to stay on a really, really cheap configuration when you don’t actually need it, and then scale up and pay a bit more just for the hours you actually require the additional power.\nYou can find my complete Terraform and Ansible set-up for Hetzner Cloud in this GitHub repository. 
Note this is purely a side project I am using for personal projects, such as ad-hoc experimentation with new Java versions. I am not a Linux sysadmin by profession, so make sure to examine all the details and use it at your own risk. In case you do want to run this on a publicly reachable box and not behind a firewall, I recommend you install fail2ban as an additional measure of caution.\nIf you have any suggestions for improving this set-up, in particular for further improving security, please let me know in the comments below.\n","id":12,"publicationdate":"Oct 6, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhenever I need a Linux box for some testing or experimentation,\nor projects like the \u003ca href=\"/blog/1brc-results-are-in/\"\u003eOne Billion Row Challenge\u003c/a\u003e a few months back,\nmy go-to solution is \u003ca href=\"https://www.hetzner.com/\"\u003eHetzner Online\u003c/a\u003e, a data center operator here in Europe.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eTheir prices for VMs are unbeatable, starting with 3.92 €/month for two shared vCPUs (either x64 or AArch64), four GB of RAM, and 20 TB of network traffic\n(these are prices for their German data centers, they vary between regions).\nFour dedicated cores with 16 GB, e.g. 
for running a small web server, will cost you 28.55 €/month.\nGetting a box with similar specs on AWS would set you back a multiple of that, with the (outbound) network cost being the largest chunk.\nSo it’s not a big surprise that more and more people realize the advantages of this offering,\nmost notably Ruby on Rails creator \u003ca href=\"https://x.com/dhh/\"\u003eDavid Heinemeier Hansson\u003c/a\u003e,\nwho has been singing the praises of Hetzner’s dedicated servers, but also their VM instances, quite a bit on \u003ca href=\"https://x.com/search?q=from%3Adhh%20hetzner\u0026amp;src=typed_query\u0026amp;f=live\"\u003eTwitter\u003c/a\u003e lately.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"How I Am Setting Up VMs On Hetzner Cloud","uri":"https://www.morling.dev/blog/how-i-am-setting-up-vms-on-hetzner-cloud/"},{"content":" Update Aug 30: This article is discussed on Hacker News and lobste.rs.\nIn distributed systems, for instance when scaling out some workload to multiple compute nodes, it is a common requirement to select a leader for performing a given task: only one of the nodes should process the records from a Kafka topic partition, write to a file system, call a remote API, etc. Otherwise, multiple workers may end up doing the same task twice, overwriting each other’s data, and worse.\nOne way to implement leader election is distributed locking. All the nodes compete to obtain a specific lock, but only one of them can succeed, which will then be the selected leader for as long as it holds that lock. Systems like Apache ZooKeeper or Postgres (via Advisory Locks) provide the required building blocks for this.\nNow, if your application solely is in the business of writing data to object storage such as Amazon S3, Google Cloud Storage, or Azure Blob Storage, running such a stateful service solely for the purpose of leader election can be an overhead which you’d like to avoid from an operational as well as financial perspective. 
While you could implement distributed locks on the latter two platforms for quite a while with their respective compare-and-swap (CAS) operations, this notoriously was not the case for S3. That is, until last week, when AWS announced support for conditional writes on S3, which was received with great excitement by many folks in the data and distributed systems communities.\nIn a nutshell, the S3 PutObject operation now supports an optional If-None-Match header. When specified, the call will only succeed when no file with the same key exists in the target bucket yet; otherwise you’ll get a 412 Precondition Failed response. Compared to what’s available on GCP and Azure, that’s rather limited, but it’s all you need for implementing a locking scheme for leader election.\nThe Algorithm The basic idea is to have nodes compete on creating a lock file, with the winner being the leader. As S3 conditional writes don’t prevent lost updates to existing files, a new lock file will be created for each leader epoch, i.e. when leadership changes either after a node failure or when the leader releases the lock voluntarily. The lock file can be a simple JSON structure like this:\n1 2 3 { \u0026#34;expired\u0026#34; : false } The expired attribute is used for releasing a lock after use (more on that below). The leader epoch, a strictly increasing numeric value, is part of the file name, e.g. lock_0000000001.json. This allows you to determine the current epoch by listing all lock files and finding the one with the highest epoch value (all lock files but the latest one can be removed by a background process, thus keeping the cost for the listing call constant).\nHere’s the overall leader election algorithm:\n1. List all lock files 2. If there is no lock file, or the latest one has expired: 3. Increment the epoch value by 1 and try to create a new lock file 4. If the lock file could be created: 5. The current node is the leader, start with the actual work 6. Otherwise, go back to 1. 7. 
Otherwise, another process already is the leader, so do nothing. Go back to 1. periodically Obtaining the Lock To obtain the lock (step 3.), put a file for the next epoch. The key thing is to pass the If-None-Match header and handle the potential 412 Precondition Failed response. Using the AWS Java SDK, this could look like so:\nint epoch = ...; PutObjectRequest put = PutObjectRequest.builder() .bucket(BUCKET) .key(\u0026#34;lock-%010d.json\u0026#34;.formatted(epoch)) .ifNoneMatch(\u0026#34;*\u0026#34;) .build(); try { s3.putObject(put, RequestBody.fromString(\u0026#34;\u0026#34;\u0026#34; { \u0026#34;expired\u0026#34;: false } \u0026#34;\u0026#34;\u0026#34;)); } catch(S3Exception e) { if (e.statusCode() == 412) { //handle elsewhere and start over throw new LockingFailedException(); } else { throw e; } } If you receive a 412 response, this means another process created the lock file in the time between your listing of the existing locks and now. That way, it is guaranteed that only one process succeeds in creating the lock for the current epoch and thus becomes the leader.\nExpiring a Lock At some point, the current leader may decide to step down from this role, for instance when gracefully shutting down. This is as simple as setting the expired attribute to true and updating the current lock file:\n{ \u0026#34;expired\u0026#34; : true } When other nodes list the existing lock files subsequently, they’ll see that the lock has expired and thus a new leader needs to be elected. Note that only the process which created the lock file for a given epoch may ever expire it, otherwise chaos may ensue. Naturally, this brings up the question of what happens when a leader never expires its lock, such as when it crashes. 
In that case, no new leader could ever be elected without manual intervention, hampering the liveness of the system.\nLock Validity To address this situation, you can add another attribute to the lock file format, defining for how long it should be valid:\n{ \u0026#34;validity_ms\u0026#34; : 60000, \u0026#34;expired\u0026#34; : false } In this example, the lock should be valid for 60 seconds. For each file, S3 provides the last modification timestamp, specifying when it has been created or updated. When performing its work, the current leader needs to check whether the lock is still valid (i.e. whether less than 60 seconds have passed since the lock was obtained), optionally touching the file in order to extend the lease. Similarly, current non-leader nodes can check whether the latest lock is still valid or not.\nWhat about clock drift though? After all, you never should rely on clock accuracy of different nodes when building distributed systems. But the good news is, you don’t have to. Let’s discuss the different cases. If the current leader’s clock is ahead, it will stop doing its work, despite the lock still being valid. Similarly, if the clock of a current non-leader is behind, it may not try to acquire leadership although the current lock already has expired. While this may impact throughput of the system, neither case is a correctness problem.\nThings look different if the current leader relies on a lock after it has expired (because its clock is behind) and another leader has been elected already, or if a non-leader determines prematurely that the current lock has expired (because its clock is ahead) and thus picks up leadership.\nIn both cases, more than one node assumes it is the leader, which is exactly what we want to avoid with leader election. But as it turns out, this is just the nature of the beast: leader election will only ever be eventually correct. 
As Martin Kleppmann describes in this excellent post, checking lock validity and performing the leader’s actual work are not atomic, no matter how hard you try (for instance, think of unexpected GC pauses). So you’ll always need to be prepared to detect and fence off work done by a previous leader.\nMinimizing Clock Drift While you never should rely on clock consistency across systems from a correctness point of view, it does make sense to keep clocks synchronized on a best-effort basis, thus reducing the aforementioned throughput impact. To do so, nodes could create a temporary file on S3 and compare its creation time on S3 with their local time. Alternatively, you could use the Amazon Time Sync Service, which offers microsecond time accuracy.\nFencing Off Zombies As a solution, Kleppmann suggests using the leader epoch as a fencing token. The epoch value only ever increases, so it can be used to identify requests by a stale leader (\u0026#34;zombie\u0026#34;). When for instance invoking a remote API, the fencing token could be passed as a request header, allowing the API provider to recognize and discard zombie requests by keeping track of the highest epoch value it has seen. Of course this requires the remote API to support the notion of fencing tokens, which may or may not be the case.\nAs an example targeting S3 (which doesn’t have bespoke support for fencing tokens), SlateDB implements this by uploading files following a serial order (similar to the lock file naming scheme above) and detecting conflicts between competing writers trying to create the same file. 
Thanks to the new support for conditional writes on S3, this task is trivial, not requiring any external stateful services any longer.\n","id":13,"publicationdate":"Aug 26, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Aug 30: This article is discussed on \u003ca href=\"https://news.ycombinator.com/item?id=41357123\"\u003eHacker News\u003c/a\u003e and \u003ca href=\"https://lobste.rs/s/ljq5pm/leader_election_with_s3_conditional\"\u003elobste.rs\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn distributed systems, for instance when scaling out some workload to multiple compute nodes,\nit is a common requirement to select a \u003cem\u003eleader\u003c/em\u003e for performing a given task:\nonly one of the nodes should process the records from a Kafka topic partition, write to a file system, call a remote API, etc.\nOtherwise, multiple workers may end up doing the same task twice, overwriting each other’s data, and worse.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Leader Election With S3 Conditional Writes","uri":"https://www.morling.dev/blog/leader-election-with-s3-conditional-writes/"},{"content":" In my day job at Decodable, I am currently working with Terraform to provision some cloud infrastructure for an upcoming hands-on lab. Part of this set-up is a Postgres database on Amazon RDS, which I am creating using the Terraform AWS modules. Now, once my database was up and running, I wanted to extract two dynamically generated values from Terraform: the random password created for the root user, and the database host URL. On my way down the rabbit hole for finding a CLI command for doing this efficiently, I learned a few interesting shell details which I’d like to share.\nThe basic idea is to fetch the current Terraform state via terraform show -json and then extract the two values we’re after from that. 
The JSON output of Terraform looks as follows. The values I am after are on lines 20 and 40, respectively (shortened for readability, and no, those aren’t the actual values from my database instance 😉):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 { \u0026#34;format_version\u0026#34;: \u0026#34;1.0\u0026#34;, \u0026#34;terraform_version\u0026#34;: \u0026#34;1.5.4\u0026#34;, \u0026#34;values\u0026#34;: { \u0026#34;root_module\u0026#34;: { \u0026#34;resources\u0026#34;: [ ... ], \u0026#34;child_modules\u0026#34;: [ { \u0026#34;resources\u0026#34;: [ { \u0026#34;address\u0026#34;: \u0026#34;module.lab-001.aws_db_instance.lab_001_db\u0026#34;, \u0026#34;mode\u0026#34;: \u0026#34;managed\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;aws_db_instance\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;lab_001_db\u0026#34;, \u0026#34;provider_name\u0026#34;: \u0026#34;registry.terraform.io/hashicorp/aws\u0026#34;, \u0026#34;schema_version\u0026#34;: 2, \u0026#34;values\u0026#34;: { \u0026#34;address\u0026#34;: \u0026#34;lab-001-db.a4dadf981fgh.us-east-1.rds.amazonaws.com\u0026#34;, ... }, \u0026#34;sensitive_values\u0026#34;: { ... }, \u0026#34;depends_on\u0026#34;: [ \u0026#34;module.lab-001.random_password.this\u0026#34;, ... ] }, { \u0026#34;address\u0026#34;: \u0026#34;module.lab-001.random_password.this\u0026#34;, \u0026#34;mode\u0026#34;: \u0026#34;managed\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;random_password\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;this\u0026#34;, \u0026#34;provider_name\u0026#34;: \u0026#34;registry.terraform.io/hashicorp/random\u0026#34;, \u0026#34;schema_version\u0026#34;: 3, \u0026#34;values\u0026#34;: { \u0026#34;result\u0026#34;: \u0026#34;5adCpQc]$s3pQ=a\u0026#34;, ... }, \u0026#34;sensitive_values\u0026#34;: { ... 
} } ], \u0026#34;address\u0026#34;: \u0026#34;module.lab-001\u0026#34; } ] } } } Extracting the two values is relatively simple using jq. But I wanted to get both values at once, with a single Terraform call—​which is a remote and thus slow operation—​so I could pass them on to psql and get a database session. All that without storing the Terraform output in a file (which would taint my workspace), and as a copy/paste friendly snippet which I can add to the README of the project for documentation purposes.\nAfter fiddling around for a little while, I asked for help in our internal Slack, where my fellow Decoder Jared Breeden took the bits I already had and morphed them into this really cool solution (thanks again, mate!):\n1 2 3 4 5 6 7 8 9 10 ({ read -r host read -r password } \u0026lt; \u0026lt;(terraform show -json | jq -r \u0026#39; .values.root_module.child_modules[] | select(.address==\u0026#34;module.lab-001\u0026#34;) | .resources[] | (select(.address==\u0026#34;module.lab-001.random_password.this\u0026#34;) | .values.result), (select(.address==\u0026#34;module.lab-001.aws_db_instance.lab_001_db\u0026#34;) | .values.address)\u0026#39;) psql \u0026#34;postgresql://root:${password}@${host}:5432/labdb\u0026#34;) This does exactly what I want: retrieving the password and database host from the current Terraform state in one go and using them to open a session with the database via psql. 
So let’s dissect this little gem to understand how it works.\nterraform show -json retrieves the full JSON description of the Terraform state shown above:\nterraform show -json The resulting JSON is piped to jq for extracting the values of password and host:\njq -r \u0026#39; .values.root_module.child_modules[] | select(.address==\u0026#34;module.lab-001\u0026#34;) | .resources[] | (select(.address==\u0026#34;module.lab-001.random_password.this\u0026#34;) | .values.result), (select(.address==\u0026#34;module.lab-001.aws_db_instance.lab_001_db\u0026#34;) | .values.address)\u0026#39; jq is invaluable for handling JSON and I highly recommend spending some time with its reference documentation to learn about it. For the case at hand, the select() function is used within a pipeline for finding the right elements within the array of Terraform child modules and extracting the required values. Putting the two inner select() calls into parentheses makes them two separate expressions whose output will go onto separate lines.\nAt this point, the values of host and password are written to stdout (the order is determined by the order of resource definitions in the input main.tf file and thus stable):\nlab-001-db.a4dadf981fgh.us-east-1.rds.amazonaws.com 5adCpQc]$s3pQ=a How to pass on the two values to psql? This is where the grouping command in curly braces comes in:\n{ read -r host read -r password } \u0026lt; \u0026lt;(...) The list of commands between curly braces will be executed in the current shell context as one unit; in particular any input/output redirections will be applied to all the commands. 
Here we redirect the input (using the \u0026lt; operator, the counterpart to the more commonly used \u0026gt; operator for redirecting a command’s output) of the grouping command to the output of the jq invocation with the help of process substitution (\u0026lt;(...)), about which I wrote recently.\nYou might wonder why input redirection and process substitution are used here, instead of simply piping the output of jq to the grouping command. Indeed this would work when using zsh as a shell. Other shells such as bash execute each command of a pipeline in its own subshell, though. This means that the two variables wouldn’t be available any longer once the grouping command has completed. The input redirection approach thus increases portability of the solution across shells.\nWithin the grouping command, the two lines on stdin are read and stored under the names host and password in the shell context, respectively.\nThat way, they can be referenced in the subsequent command for opening a database session:\n1 psql \u0026#34;postgresql://root:${password}@${host}:5432/labdb\u0026#34; There’s one remaining problem, and that is that the host and password variables are still around after closing the database session, which may pose a security issue. We could call unset to remove them, but it’s even easier to make everything another grouping command, using (...) this time. This ensures a sub-shell is created for the commands which will be destroyed after closing the database session.\nLearning some new shell tricks will never be boring to me. Do you have another solution for solving this little problem? 
Let me know in the comments below!\n","id":14,"publicationdate":"Jul 6, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn my day job at \u003ca href=\"https://www.decodable.co/\"\u003eDecodable\u003c/a\u003e,\nI am currently working with Terraform to provision some cloud infrastructure for an upcoming hands-on lab.\nPart of this set-up is a Postgres database on Amazon RDS,\nwhich I am creating using the \u003ca href=\"https://developer.hashicorp.com/terraform/tutorials/aws/aws-rds\"\u003eTerraform AWS modules\u003c/a\u003e.\nNow, once my database was up and running,\nI wanted to extract two dynamically generated values from Terraform:\nthe random password created for the root user, and the database host URL.\nOn my way down the rabbit hole for finding a CLI command for doing this efficiently,\nI learned a few interesting shell details which I’d like to share.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Shell Spell: Extracting and Propagating Multiple Values With jq","uri":"https://www.morling.dev/blog/extracting-and-propagating-multiple-values-with-jq/"},{"content":" The other day, I was looking for means of zipping two Java streams: connecting them element by element—​essentially a join based on stream offset position—​and emitting an output stream with the results. Unfortunately, there is no zip() method offered by the Java Streams API itself. While it was considered for inclusion in early preview versions, the method was removed before the API went GA with Java 8 and you have to resort to 3rd party libraries such as Google Guava if you need this functionality.\nJava 22, scheduled for release later this week, promises to improve the situation here. It introduces a preview API for so-called stream gatherers (JEP 461). 
Similar to how collectors allow you to implement custom terminal operations on Java streams, gatherers let you add custom intermediary operations to a stream pipeline, providing an extension point for adding stream operations such as distinct() or window(), without having to bake them into the API itself. This sounds pretty much like what we need for a zip() method, doesn’t it?\nSo I spent some time studying the JEP and here’s the basic implementation I came up with:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 public record ObjToObjZipper\u0026lt;T1, T2, R\u0026gt;( Stream\u0026lt;T2\u0026gt; other, BiFunction\u0026lt;T1, T2, R\u0026gt; zipperFunction) (1) implements Gatherer\u0026lt;T1, Iterator\u0026lt;T2\u0026gt;, R\u0026gt; { (2) @Override public Supplier\u0026lt;Iterator\u0026lt;T2\u0026gt;\u0026gt; initializer() { (3) return () -\u0026gt; other.iterator(); } @Override public Integrator\u0026lt;Iterator\u0026lt;T2\u0026gt;, T1, R\u0026gt; integrator() { (4) return Gatherer.Integrator.ofGreedy((state, element, downstream) -\u0026gt; { if (state.hasNext()) { return downstream.push(zipperFunction.apply(element, state.next())); } return false; }); } } 1 This gatherer takes the stream to zip with and a function, which is applied to pairs of elements of the two streams and returns the zipped result 2 Gatherer has three type parameters: the element type of the stream the gatherer is applied to, a type for keeping track of intermediary state (in our case, that’s just the iterator of the second stream), and the output type 3 initializer() returns a supplier of the state tracking type, if needed 4 integrator() returns a function which \u0026#34;integrates provided elements, potentially using the provided intermediate state, optionally producing output to the provided Downstream\u0026#34; It’s the first time I have been using this API, so I hope I haven’t done anything too stupid :) The key part of the gatherer is its Integrator implementation. 
This is where for each element of the stream the gatherer is applied to, we take the corresponding element of the given second stream, apply the given function, and emit the function’s return value to the next stage in the stream pipeline.\nThis particular implementation stops emitting elements as soon as one of the two streams has been exhausted, but of course you also could have an implementation with \u0026#34;left join\u0026#34; semantics, or similar. With some more glue code for instantiating this zipping gatherer \u0026#34;builder style\u0026#34; (you can find the complete source code on GitHub), this is how it can be used:\n1 2 3 4 5 6 7 8 9 10 11 12 @Test public void canZipTwoObjectStreams() { List\u0026lt;String\u0026gt; letters = List.of(\u0026#34;a\u0026#34;, \u0026#34;b\u0026#34;, \u0026#34;c\u0026#34;, \u0026#34;d\u0026#34;, \u0026#34;e\u0026#34;); Stream\u0026lt;Integer\u0026gt; numbers = IntStream.range(0, letters.size()) .mapToObj(i -\u0026gt; i); List\u0026lt;String\u0026gt; zipped = letters.stream() .gather(zip(numbers).with((letter, i) -\u0026gt; i + \u0026#34;-\u0026#34; + letter)) (1) .collect(Collectors.toList()); assertThat(zipped).containsExactly(\u0026#34;0-a\u0026#34;, \u0026#34;1-b\u0026#34;, \u0026#34;2-c\u0026#34;, \u0026#34;3-d\u0026#34;, \u0026#34;4-e\u0026#34;); } 1 gather() applies the given gatherer to each element of the stream Et voilà, we have a zip() function which can be used with Java Streams, and short of having a zip() method directly on the Stream interface itself, the resulting code reads quite nicely. 
In order to avoid the boxing of the int stream, I’ve also built an ObjToIntZipper:\n@Test public void canZipObjectWithIntStream() { List\u0026lt;String\u0026gt; letters = List.of(\u0026#34;a\u0026#34;, \u0026#34;b\u0026#34;, \u0026#34;c\u0026#34;, \u0026#34;d\u0026#34;, \u0026#34;e\u0026#34;); IntStream numbers = IntStream.range(0, letters.size()); List\u0026lt;String\u0026gt; zipped = letters.stream() .gather(zip(numbers).with((letter, i) -\u0026gt; i + \u0026#34;-\u0026#34; + letter)) .collect(Collectors.toList()); assertThat(zipped).containsExactly(\u0026#34;0-a\u0026#34;, \u0026#34;1-b\u0026#34;, \u0026#34;2-c\u0026#34;, \u0026#34;3-d\u0026#34;, \u0026#34;4-e\u0026#34;); } Usually I am cautious of types with three or more type arguments, as they easily lead to APIs which are cumbersome to use. But the Gatherer API actually felt quite intuitive to me after just a little while.\nThe only real downside is that this gatherer cannot be parallelized. While the API itself allows for the creation of parallel-ready gatherers (by implementing the optional combiner() method), you don’t have a handle to the second stream’s spliterator of a particular subdivision step from within a gatherer implementation. The only way of doing this is on the spliterator level, as shown by Jose Paumard here. 
Note that both input streams must have the same length in order for this to work, as otherwise you’d end up zipping elements at different positions in the two input streams.\nYou can find the complete source code of the proof-of-concept zipping gatherer in this GitHub repository.\n","id":15,"publicationdate":"Mar 18, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day, I was looking for means of \u003ca href=\"https://twitter.com/gunnarmorling/status/1764305703047438361\"\u003ezipping two Java streams\u003c/a\u003e:\nconnecting them element by element—​essentially a join based on stream offset position—​and emitting an output stream with the results.\nUnfortunately, there is no \u003ccode\u003ezip()\u003c/code\u003e method offered by the Java Streams API itself.\nWhile it was considered for inclusion in early preview versions,\nthe method was removed before the API went GA with Java 8 and you have to resort to 3rd party libraries such as \u003ca href=\"https://guava.dev/releases/snapshot-jre/api/docs/com/google/common/collect/Streams.html#zip(java.util.stream.Stream,java.util.stream.Stream,java.util.function.BiFunction)\"\u003eGoogle Guava\u003c/a\u003e if you need this functionality.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"A Zipping Gatherer","uri":"https://www.morling.dev/blog/zipping-gatherer/"},{"content":" I had the pleasure to do a few podcasts and interviews, e.g. talking about Debezium, change data capture, stream processing, my career, and software engineering in general.\nUnapologetically Technical with Jesse Anderson: Ep.9 Gunnar Morling - One Billion Row Challenge\nInterview with InfoQ: The One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds\nCoffee + Software with Josh Long: Gunnar Morling on the 1BRC\nHappy Path Programming, by Bruce Eckel \u0026amp; James Ward: Ep. 
#93: Nerd Sniping via the 1B Row Challenge with Gunnar Morling\nDeveloper Voices: Debezium - Capturing Data the Instant it Happens (with Gunnar Morling)\nThe Geek Narrator: Decoding Decodable with Gunnar Morling\nThe Data Stack Show: All About Debezium and Change Data Capture With Gunnar Morling of Decodable\nReal-Time Analytics Podcast: Mr. Debezium on Pinot, Flink, CDC \u0026amp; Decodable | Ep. 4: Gunnar Morling\nSaaS Developer Community: The Wonders of Postgres Logical Decoding Messages\nThe Modern Data Show: S02 E03: Innovating the Modern Data Stack: Change Data Capture and Beyond with Gunnar Morling\nThe Geek Narrator: Change Data Capture and Debezium with Gunnar Morling\nA Bootiful Podcast: Java Champion Gunnar Morling about messaging middleware, Debezium, change data capture, and more.\nTrino Community Broadcast: Episode #25: Trino Going Through Changes; together with Ashhar Hasan and Ayush Chauhan\nThe InfoQ Podcast, with Wes Reisz: Gunnar Morling on Change Data Capture and Debezium\nData Engineering Podcast by Tobias Macey: Episode 114 — Change Data Capture For All Of Your Databases With Debezium; together with Randall Hauch\nAdam Bien’s airhacks.fm podcast: Episode 39 — Use the Most Productive Stack You Can Get\nAdam Bien’s airhacks.fm podcast: Episode 57 — CDC, Debezium, streaming and Apache Kafka\nStreaming Audio: a Confluent podcast about Apache Kafka: Change Data Capture with Debezium ft. Gunnar Morling\nInterview with Thorben Janssen for heise.de (German): Im Gespräch: Gunnar Morling über Debezium und CDC\nThoughts On Java: Interview with Gunnar Morling\n","id":16,"publicationdate":"Mar 11, 2024","section":"","summary":"I had the pleasure to do a few podcasts and interviews, e.g. 
talking about Debezium, change data capture, stream processing, my career, and software engineering in general.\nUnapologetically Technical with Jesse Anderson: Ep.9 Gunnar Morling - One Billion Row Challenge\nInterview with InfoQ: The One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds\nCoffee + Software with Josh Long: Gunnar Morling on the 1BRC","tags":null,"title":"Podcasts and Interviews","uri":"https://www.morling.dev/podcasts/"},{"content":" In many applications it’s a requirement to keep track of when a record was created and updated the last time. Often, this is implemented by having columns such as created_at and updated_at within each table. To make things as simple as possible for application developers, the database itself should take care of maintaining these values automatically when a record gets inserted or updated.\nFor the creation timestamp, that’s as simple as specifying a column default value of current_timestamp. When omitting the value from an INSERT statement, the field will be populated automatically with the current timestamp. What about the update timestamp though?\nSolely relying on the default value won’t cut it, as the field already has a value when a row gets updated. You also shouldn’t set the value from within your application code. Otherwise, create and update timestamps would have different sources, potentially leading to anomalies if there are clock differences between application and database server, such as a row’s created_at timestamp being later than its updated_at timestamp.\nFor MySQL, the ON UPDATE clause can be used to set the current timestamp when a row gets updated. Postgres does not support this feature, unfortunately. If you search for a solution, most folks suggest defining an ON UPDATE trigger for setting the update timestamp. 
This also is what I’d have done until recently; it works, but having to declare such a trigger for every table can quickly become a bit cumbersome.\nBut as I’ve just learned from a colleague, there’s actually a much simpler solution: Postgres lets you explicitly set a field’s value to its default value when updating a row. So given this table and row:\nCREATE TABLE movie ( id SERIAL NOT NULL, title TEXT, viewer_rating NUMERIC(2, 1), created_at TIMESTAMP NOT NULL DEFAULT current_timestamp, updated_at TIMESTAMP NOT NULL DEFAULT current_timestamp ); INSERT INTO movie (title, viewer_rating) VALUES (\u0026#39;North by Northwest\u0026#39;, 9.2); Then auto-updating the updated_at field is as simple as this:\nUPDATE movie SET viewer_rating = 9.6, updated_at = DEFAULT WHERE id = 1; The value will be retrieved by the database when executing the statement, so there is no potential for inconsistencies with the created_at value. It is not quite as elegant as MySQL’s ON UPDATE, as you must make sure to set the value to DEFAULT in each UPDATE statement your application issues. But pretty handy nevertheless, and certainly more convenient than defining triggers for all tables. 
If you need to retrieve the value from within your application as well, you can simply expose it using the RETURNING clause:\nUPDATE movie SET viewer_rating = 9.6, updated_at = DEFAULT WHERE id = 1 RETURNING updated_at; If you want to play with this example by yourself, you can find it here on DB Fiddle.\n","id":17,"publicationdate":"Feb 20, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn many applications it’s a requirement to keep track of when a record was created and updated the last time.\nOften, this is implemented by having columns such as \u003ccode\u003ecreated_at\u003c/code\u003e and \u003ccode\u003eupdated_at\u003c/code\u003e within each table.\nTo make things as simple as possible for application developers,\nthe database itself should take care of maintaining these values automatically when a record gets inserted or updated.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Last Updated Columns With Postgres","uri":"https://www.morling.dev/blog/last-updated-columns-with-postgres/"},{"content":" I am an open-source software engineer in the Java and data streaming space. I currently work as a Technologist at Confluent. In my past role at Decodable I focused on developer outreach and helped them build their stream processing platform based on Apache Flink. 
Prior to that, I spent ten years at Red Hat, where I led the Debezium project, a platform for change data capture.\nI have been a long-time committer to multiple open-source projects, including Hibernate, kcctl, JfrUnit, MapStruct and Deptective; I also serve as the spec lead for Bean Validation 2.0 (first at the JCP, now under the Jakarta EE umbrella at the Eclipse Foundation).\nNamed a Java Champion, I enjoy speaking at conferences, for instance at QCon, JavaOne, Red Hat Summit, JavaZone, JavaLand and Kafka Summit.\nOccasionally, I blog about topics related to software engineering.\nSpeaker Information The following information can be used on conference websites.\nHead shot: 500px, 1000px\n🇬🇧 English Title: Principal Technologist, Confluent\nBio: Gunnar Morling is a software engineer and open-source enthusiast at heart, currently working as a Technologist at Confluent. Previously, he helped to build a realtime stream processing platform based on Apache Flink and led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open source projects such as JfrUnit, kcctl, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at various conferences like QCon, JavaOne, and Devoxx. He lives in Hamburg, Germany.\n🇩🇪 Deutsch Titel: Principal Technologist, Confluent\nBio: Gunnar Morling ist Softwareentwickler und Open-Source-Enthusiast, gegenwärtig tätig als Technologist für Confluent. Zuvor arbeitete er an einer Plattform für Real-Time Stream Processing basierend auf Apache Flink; weiterhin leitete er das Debezium-Projekt, eine verteilte Lösung für Change Data Capture. Er ist ein Java Champion und hat diverse Open-Source-Projekte wie JfrUnit, kcctl und MapStruct ins Leben gerufen. Gunnar bloggt auf morling.dev und teilt seine Erfahrungen in Vorträgen, u.a. bei JavaLand, QCon, JavaOne und Devoxx. 
Er lebt und arbeitet in Hamburg.\n","id":18,"publicationdate":"Feb 11, 2024","section":"","summary":"I am an open-source software engineer in the Java and data streaming space. I currently work as a Technologist at Confluent. In my past role at Decodable I focused on developer outreach and helped them build their stream processing platform based on Apache Flink. Prior to that, I spent ten years at Red Hat, where I led the Debezium project, a platform for change data capture.\nI have been a long-time committer to multiple open-source projects, including Hibernate, kcctl, JfrUnit, MapStruct and Deptective; I also serve as the spec lead for Bean Validation 2.","tags":null,"title":"About Me","uri":"https://www.morling.dev/about/"},{"content":" Recently I ran into a situation where it was necessary to capture the output of a Java process on the stdout stream, and at the same time a filtered subset of the output in a log file. The former, so that the output gets picked up by the Kubernetes logging infrastructure. The latter, for further processing on our end: we were looking to detect when the JVM stops due to an OutOfMemoryError, passing on that information to some error classifier.\nSimply redirecting the standard output stream of the process to a file wouldn’t satisfy the first requirement. Instead, the tee command offers a solution: it reads from stdin and writes everything to stdout as well as a file:\n$ java -XX:+ExitOnOutOfMemoryError -jar my-app.jar \\ | tee my-app.log (1) 1 Pipe stdout to tee, which writes it to both stdout and a log file This kinda gives us what we want, but we lack control over the size of that log file. As is, it can grow indefinitely, eventually causing the application’s pod to run out of disk space. For the case at hand, we’re just interested in specific lines anyway. 
So ideally the content written to the log file would be filtered accordingly, while exposing the complete output to the Kubernetes log collector via stdout.\nTo accommodate that requirement, process substitution can be used. In a nutshell, it provides a bridge between the standard input and output streams and files:\n\u0026gt;(commands) substitutes a file a process writes to with another process which receives the written content on stdin\n\u0026lt;(commands) substitutes a file a process reads from with another process which provides the content on stdout\nNote that there must be no space between \u0026gt;/\u0026lt; and the left parenthesis. I.e. this is no redirection. The former variant is exactly what we need: instead of directly writing all the process output to the log file, we use grep to filter any written content, based on the string we’re looking for:\n$ java -XX:+ExitOnOutOfMemoryError -jar my-app.jar \\ | tee \u0026gt;(grep \u0026#39;OutOfMemoryError\u0026#39; \u0026gt; my-app.log) (1) 1 Represent the stdin of grep as a file for tee to write to That way, the complete stdout of our process gets exposed to Kubernetes\u0026#39; logging infrastructure, while only the filtered output gets written to our log file:\n$ cat my-app.log Terminating due to java.lang.OutOfMemoryError: Java heap space To get a better intuition of what process substitution does under the hood, let’s create a simple Java program which reads from a file specified as a program argument:\nimport java.nio.charset.Charset; import java.nio.file.Files; import java.nio.file.Paths; public void main(String... 
args) throws Exception { var fileName = args[0]; System.out.println(\u0026#34;File: \u0026#34; + fileName); (1) String content = Files.readString( (2) Paths.get(fileName), Charset.defaultCharset() ); System.out.println(\u0026#34;Content: \u0026#34; + content); } 1 Print the passed file name 2 Print the content of the file Here’s the program’s output when using process substitution for exposing the stdout of echo:\n$ java --enable-preview --source 21 read_file.java \u0026lt;(echo \u0026#34;hello\u0026#34;) File: /dev/fd/11 Content: hello /dev/fd is a special directory which contains a file descriptor for each file opened by a process. So what is /dev/fd/11 then? Most implementations of process substitution represent stdin/stdout via anonymous pipes. If we take a look at the list of files opened by the process, we can see that this is the case here too:\n$ lsof -p 99657 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ... java 99657 gunnar 11 PIPE 0xc2e0b19eaf172929 16384 FD 11 is a pipe created through process substitution, and the standard Java file I/O APIs can be used to read its contents via that descriptor.\n","id":19,"publicationdate":"Feb 10, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eRecently I ran into a situation where it was necessary to capture the output of a Java process on the \u003ccode\u003estdout\u003c/code\u003e stream,\nand at the same time a filtered subset of the output in a log file.\nThe former, so that the output gets picked up by the Kubernetes logging infrastructure.\nThe latter, for further processing on our end:\nwe were looking to detect when the JVM stops due to an \u003ccode\u003eOutOfMemoryError\u003c/code\u003e, passing on that information to some error classifier.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Filtering Process Output With tee","uri":"https://www.morling.dev/blog/filtering-process-output-with-tee/"},{"content":" Oh what a wild ride the last few weeks 
have been. The One Billion Row Challenge (1BRC for short), something I had expected to be interesting to a dozen folks or so at best, has gone kinda viral, with hundreds of people competing and engaging. In Java, as intended, but also beyond: folks implemented the challenge in languages such as Go, Rust, C/C++, C#, Fortran, or Erlang, as well as databases (Postgres, Oracle, Snowflake, etc.), and tools like awk.\nIt’s really incredible how far people have pushed the limits here. Pull request by pull request, the execution times for solving the problem laid out in the challenge — aggregating random temperature values from a file with 1,000,000,000 rows — improved by two orders of magnitude in comparison to the initial baseline implementation. Today I am happy to share the final results, as the challenge closed for new entries after exactly one month on Jan 31 and all submissions have been reviewed.\nResults So without further ado, here are the top 10 entries for the official 1BRC competition. 
These results are from running on eight cores of a 32 core AMD EPYC™ 7502P (Zen2) machine:\n# Result (m:s.ms) Implementation JDK Submitter 1\n00:01.535\nlink\n21.0.2-graal\nThomas Wuerthinger, Quan Anh Mai, Alfonso² Peterssen\n2\n00:01.587\nlink\n21.0.2-graal\nArtsiom Korzun\n3\n00:01.608\nlink\n21.0.2-graal\nJaromir Hamala\n00:01.880\nlink\n21.0.1-open\nSerkan Özal\n00:01.921\nlink\n21.0.2-graal\nVan Phu DO\n00:02.018\nlink\n21.0.2-graal\nStephen Von Worley\n00:02.157\nlink\n21.0.2-graal\nRoy van Rijn\n00:02.319\nlink\n21.0.2-graal\nYavuz Tas\n00:02.332\nlink\n21.0.2-graal\nMarko Topolnik\n00:02.367\nlink\n21.0.1-open\nQuan Anh Mai\nYou can find the complete result list with all 164 submissions as well as the description of the evaluation process in the 1BRC repository.\nCongratulations to the implementers of the top three entries (Thomas Wuerthinger/Quan Anh Mai/Alfonso² Peterssen, Artsiom Korzun, and Jaromir Hamala), as well as everyone else on the leaderboard for putting in the effort to participate in this challenge! For the fun of it, and as a small expression of my appreciation, I have created a personalized \u0026#34;certificate\u0026#34; PDF for each accepted submission, stating the author’s name and time. You can find it at your entry in the leaderboard table.\nInitially I had meant to pay for a 1️⃣🐝🏎️ t-shirt for the winner out of my own pocket. But then I remembered I have a company credit card ;) So I will actually do t-shirts for the Top 3 and a 1BRC coffee mug for the Top 20. I will follow up with the winners on the details of getting these to you shortly. Thanks a lot to my employer Decodable (we build a SaaS for real-time ETL and stream processing, you should totally check us out!) for sponsoring not only these prizes, but also the evaluation machine. 
It means a lot to me!\nI am planning to dive into some of the implementation details in another blog post, there is so much to talk about: segmentation and parallelization, SIMD and SWAR, avoiding branch mispredictions and spilling, making sure the processor’s pipelines are always fully utilized, the \u0026#34;process forking\u0026#34; trick, and so much more. For now let me just touch on two things which stick out when looking at the results. One is that all the entries in the Top 10 are using Java’s notorious Unsafe class for faster yet unsafe memory access. Planned to be removed in a future version, it will be interesting to see which replacement APIs will be provided in the JDK for ensuring performance-sensitive applications like 1BRC don’t suffer.\nAnother noteworthy aspect is that with two exceptions all entries in the Top 10 are using GraalVM to produce a native binary. These are faster to start and reach peak performance very quickly (no JIT compilation). As the result times got down to less than two seconds, this makes the deciding difference. Note that other entries of the contest also use GraalVM as a JIT compiler for JVM-based entries, which also was beneficial for the problem at hand. This is a perfect example for the kind of insight I was hoping to gain from 1BRC. A special shout-out to Serkan Özal for creating the fastest JVM-based solution, coming in on a great fourth place!\nBonus Result: 32 Cores, 64 Threads For officially evaluating entries into the challenge, each contender was run on eight cores of the target machine. This was done primarily to keep results somewhat in the same ballpark as the figures of the originally used machine (I had to move to a different environment after a little while, re-evaluating all the previous entries).\nBut it would be a pity to leave all the 24 other cores unused, right? So I ran the fastest 50 entries from the regular evaluation on all 32 cores / 64 threads (i.e. 
SMT is enabled) of the machine, with turbo boost enabled too, and here is the Top 10 from this evaluation (the complete result set for this evaluation can be found here):\n# Result (m:s.ms) Implementation JDK Submitter 1\n00:00.323\nlink\n21.0.2-graal\nJaromir Hamala\n2\n00:00.326\nlink\n21.0.2-graal\nThomas Wuerthinger, Quan Anh Mai, Alfonso² Peterssen\n3\n00:00.349\nlink\n21.0.2-graal\nArtsiom Korzun\n00:00.351\nlink\n21.0.2-graal\nVan Phu DO\n00:00.389\nlink\n21.0.2-graal\nStephen Von Worley\n00:00.408\nlink\n21.0.2-graal\nYavuz Tas\n00:00.415\nlink\n21.0.2-graal\nRoy van Rijn\n00:00.499\nlink\n21.0.2-graal\nMarko Topolnik\n00:00.602\nlink\n21.0.1-graal\nRoman Musin\n00:00.623\nlink\n21.0.1-open\ngonix\nThe fastest one coming in here is Jaromir Hamala, whose entry seems to take a tad more advantage of the increased level of parallelism. I’ve run this benchmark a handful of times, and the times and ordering remained stable, so I feel comfortable about publishing these results, albeit being very, very close. Congrats, Jaromir!\nBonus Result: 10K Key Set One thing which I didn’t expect to happen was that folks would optimize that much for the specific key set used by the example data generator I had provided. While the rules allow for 10,000 different weather station names with a length of up to 100 bytes, the key set used during evaluation contained only 413 distinct names, with most of them being shorter than 16 bytes. This fact heavily impacted implementation strategies, for instance when it comes to parsing rows of the file, or choosing hash functions which work particularly well for aggregating values for those 413 names.\nSo some folks asked for another evaluation using a data set which contains a larger variety of station names (kudos to Marko Topolnik who made a strong push here). 
I didn’t want to change the nature of the original task after folks had already entered their submissions, but another bonus evaluation with 10K keys and longer names seemed like a great idea. Here are the top 10 results from running the fastest 40 entries of the regular evaluation against this data set (all results are here):\n# Result (m:s.ms) Implementation JDK Submitter 1\n00:02.957\nlink\n21.0.2-graal\nArtsiom Korzun\n2\n00:03.058\nlink\n21.0.2-graal\nMarko Topolnik\n3\n00:03.186\nlink\n21.0.2-graal\nStephen Von Worley\n00:03.998\nlink\n21.0.2-graal\nRoy van Rijn\n00:04.042\nlink\n21.0.2-graal\nJaromir Hamala\n00:04.289\nlink\n21.0.1-open\ngonix\n00:04.522\nlink\n21.0.2-graal\ntivrfoa\n00:04.653\nlink\n21.0.2-graal\nJamal Mulla\n00:04.733\nlink\n21.0.1-open\ngonix\n00:04.836\nlink\n21.0.1-graal\nSubrahmanyam\nThis evaluation shows some interesting differences to the other ones. There are some new entries to this Top 10, while some entries from the original Top 10 do somewhat worse for the 10K key set, solely due to the fact that they have been so highly optimized for the 413 stations key set. Congrats to Artsiom Korzun, whose entry is not only the fastest one in this evaluation, but who also is the only contender to be in the Top 3 for all the different evaluations!\nThank You! The goal of 1BRC was to be an opportunity to learn something new, inspire others to do the same, and have some fun along the way. This was certainly the case for me, and I think for participants too. It was just great to see how folks kept working on their submissions, trying out new approaches and techniques, helping each other to improve their implementations, and even teaming up to create joint entries. 
I feel the decision to allow participants to take inspiration from each other and adopt promising techniques explored by others was absolutely the right one, aiding with the \u0026#34;learning\u0026#34; theme of the challenge.\nI’d like to extend my gratitude to everyone who took part in the challenge: Running 1BRC over this month and getting to experience where the community would go with this has been nothing but absolutely amazing. This would not have been possible without all the folks who stepped up to help organize the challenge, be it by creating and extending a test suite for verifying correctness of challenge submissions, setting up and configuring the evaluation machine, or by building the infrastructure for running the benchmark and maintaining the leaderboard. A big shout-out to Alexander Yastrebov, Rene Schwietzke, Jason Nochlin, Marko Topolnik, and everyone else involved!\nA few people have asked for stats around the challenge, so here are some:\n587 integrated pull requests, 164 submissions\n61 discussions, including an amazing \u0026#34;Show \u0026amp; Tell\u0026#34; section where folks show-case their non-Java based solutions\n1.1K forks of the project\n3K star-gazers of the project, with the fastest growth in the second week of January\n1,909 workflow runs on GitHub Actions (it would have been way more, had I set up an action for running the test suite against incoming pull requests earlier, doh)\n187 lines of comment in the entry of Aleksey Shipilëv\n188x speed-up improvement between the baseline implementation and the winning entry\n~100 consumed cups of coffee while evaluating the entries\nLastly, here are some more external resources on 1BRC, either on the challenge itself or folks sharing their insights from building a solution (see here for a longer list of blog posts and videos):\nCliff Click discussing his 1BRC solution on the Coffee Compiler Club (video)\nThe One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in 
Two Seconds (interview by Olimpiu Pop)\nOne Billion Row Challenge (blog post by Ragnar Groot Koerkamp)\nWhich Challenge Will Be Next? Java is alive and kicking! 1BRC has shown that Java and its runtime are powerful and highly versatile tools, suitable also for tasks where performance is of utmost importance. Apart from the tech itself, the most amazing thing about Java is its community though: it sparked a tremendous level of joy to witness how folks came together for solving this challenge, learning with and from each other, sharing tricks, and making this an excellent experience all around.\nSo I guess it’s just natural that some folks asked whether there’d be another challenge like this any time soon, when it is going to happen, what it will be about, etc. Someone even stated they’d take some time off in January next year to fully focus on the challenge :)\nI think for now it’s a bit too early to tell what could be next and I’ll definitely need a break from running a challenge. But if a team came together to organize something like 1BRC next year, with a strong focus on running things in an automated way as much as possible, I could absolutely see this. The key challenge (sic!) will be to find a topic which is as approachable as this year’s task, while providing enough opportunity for exploration and optimization. 
I am sure the community will manage to come up with something here.\nFor now, congrats once again to everyone participating this time around, and a big thank you to everyone helping to make it a reality!\n1️⃣🐝🏎️ ","id":20,"publicationdate":"Feb 4, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOh what a wild ride the last few weeks have been.\nThe \u003ca href=\"/blog/one-billion-row-challenge/\"\u003eOne Billion Row Challenge\u003c/a\u003e (1BRC for short),\nsomething I had expected to be interesting to a dozen folks or so at best,\nhas gone kinda viral, with hundreds of people competing and engaging.\nIn Java, as intended, but also \u003ca href=\"https://github.com/gunnarmorling/1brc/discussions/categories/show-and-tell\"\u003ebeyond\u003c/a\u003e:\nfolks implemented the challenge in languages such as Go, Rust, C/C++, C#, Fortran, or Erlang, as well as databases (Postgres, Oracle, Snowflake, etc.), and tools like awk.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt’s really incredible how far people have pushed the limits here.\nPull request by pull request, the execution times for solving the problem laid out in the challenge — aggregating random temperature values from a file with 1,000,000,000 rows — improved by two orders of magnitude in comparison to the initial baseline implementation.\nToday I am happy to share the final results, as the challenge closed for new entries after exactly one month on Jan 31\nand all submissions have been reviewed.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"1BRC—The Results Are In!","uri":"https://www.morling.dev/blog/1brc-results-are-in/"},{"content":" I have contributed to a wide range of open-source projects over the last years. Here’s a selection of projects I have been involved with.\n1BRC 1️⃣🐝🏎️ The One Billion Row Challenge, or 1BRC for short, is a fun exploration of how quickly 1B rows from a text file can be aggregated with Java. 
It is a coding challenge I ran in January 2024, which provided an opportunity to learn about modern Java APIs, SIMD, and high-performance programming techniques to hundreds of developers. Focussed on Java originally, 1BRC garnered a huge interest from folks in other eco-systems as well.\nDebezium Debezium is a platform for change data capture; it lets you stream changes out of different databases such as Postgres, MySQL, MongoDB and SQL Server into Apache Kafka. I was the lead of the Debezium project for several years.\nQuarkus Quarkus is a \u0026#34;Kubernetes Native Java stack tailored for OpenJDK HotSpot and GraalVM, crafted from the best of breed Java libraries and standards\u0026#34;. My contributions to Quarkus are centered around its extension for Kafka Streams, which I initially created.\nJfrUnit JfrUnit is a JUnit extension for asserting JDK Flight Recorder events. It comes in handy for ensuring the right custom JFR events are emitted by a JVM-based library or application as well as for identifying potential performance regressions by tracking JFR events for memory allocation, network I/O, etc.\nJFR Analytics JFR Analytics is an exploration for running analytics on JDK Flight Recorder recordings. It lets you use SQL to identify things like classloader leaks or thread leaks, allocation-heavy methods in your program, and more.\nkcctl 🧸 kcctl is a command-line client for working with Kafka Connect, allowing you to examine the state of the Connect cluster and individual connectors, register and start/stop connectors, etc. It is based on Quarkus and provided as a native binary for Linux, macOS, and Windows via GraalVM.\nModiTect, Layrry, and Deptective ModiTect is a family of Maven and Gradle plug-ins around the Java Module System, e.g. for creating module descriptors and building modular runtime images via jlink. Layrry is an API and launcher for modularized Java applications, leveraging the Java Module System’s notion of module layers, e.g. 
allowing you to work with multiple versions of one dependency. Deptective is a plug-in for the Java compiler (javac) for enforcing package dependencies within Java projects based on a declarative architecture definition.\nMapStruct MapStruct is a compile-time code generator for bean-to-bean mappings. Based on annotated Java interfaces, MapStruct generates mapping code that is fully type-safe and very efficient by avoiding any usage of reflection. I was the creator and initial project lead of MapStruct.\nBean Validation and Hibernate Validator Bean Validation is a Java specification which lets you express constraints on object models via annotations. Originally developed at the JCP, it’s now part of the Jakarta EE umbrella at the Eclipse Foundation. I have been the spec lead for Bean Validation 2.0 (JSR 380) and the lead of the reference implementation Hibernate Validator.\nOther Hibernate Projects As part of the Hibernate team, I’ve contributed to different projects such as Hibernate OGM (an effort to access NoSQL stores with JPA), Hibernate Search (full-text search for domain models based on Apache Lucene and Elasticsearch) and Hibernate ORM.\n","id":21,"publicationdate":"Jan 20, 2024","section":"","summary":"I have contributed to a wide range of open-source projects over the last years. Here’s a selection of projects I have been involved with.\n1BRC 1️⃣🐝🏎️ The One Billion Row Challenge, or 1BRC for short, is a fun exploration of how quickly 1B rows from a text file can be aggregated with Java. It is a coding challenge I ran in January 2024, which provided an opportunity to learn about modern Java APIs, SIMD, and high-performance programming techniques to hundreds of developers.","tags":null,"title":"Projects","uri":"https://www.morling.dev/projects/"},{"content":" Update Jan 4: Wow, this thing really took off! 
1BRC is discussed at a couple of places on the internet, including Hacker News, lobste.rs, and Reddit.\nFor folks to show-case non-Java solutions, there is a \u0026#34;Show \u0026amp; Tell\u0026#34; now, check that one out for 1BRC implementations in Rust, Go, C++, and others. Some interesting related write-ups include 1BRC in SQL with DuckDB by Robin Moffatt and 1 billion rows challenge in PostgreSQL and ClickHouse by Francesco Tisiot.\nThanks a lot for all the submissions, this is going way beyond what I’d have expected! I am behind a bit with evaluations due to the sheer number of entries, I will work through them bit by bit. I have also made a few clarifications to the rules of the challenge; please make sure to read them before submitting any entries.\nLet’s kick off 2024 true coder style—​I’m excited to announce the One Billion Row Challenge (1BRC), running from Jan 1 until Jan 31.\nYour mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!\nThe text file has a simple structure with one measurement value per row:\nHamburg;12.0 Bulawayo;8.9 Palembang;38.8 St. John\u0026#39;s;15.2 Cracow;12.6 ... The program should print out the min, mean, and max values per station, alphabetically ordered like so:\n{Abha=5.0/18.0/27.4, Abidjan=15.7/26.0/34.1, Abéché=12.1/29.4/35.6, Accra=14.7/26.4/33.1, Addis Ababa=2.1/16.0/24.3, Adelaide=4.1/17.3/29.7, ...} The goal of the 1BRC challenge is to create the fastest implementation for this task, and while doing so, explore the benefits of modern Java and find out how far you can push this platform. 
So grab all your (virtual) threads, reach out to the Vector API and SIMD, optimize your GC, leverage AOT compilation, or pull any other trick you can think of.\nThere’s a few simple rules of engagement for 1BRC (see here for more details):\nAny submission must be written in Java\nAny Java distribution available through SDKMan as well as early access builds from openjdk.net may be used, including EA builds for OpenJDK projects like Valhalla\nNo external dependencies may be used\nTo enter the challenge, clone the 1brc repository from GitHub and follow the instructions in the README file. There is a very basic implementation of the task which you can use as a baseline for comparisons and to make sure that your own implementation emits the correct result. Once you’re satisfied with your work, open a pull request against the upstream repo to submit your implementation to the challenge.\nAll submissions will be evaluated by running the program on a Hetzner Cloud CCX33 instance (8 dedicated vCPU, 32 GB RAM). The time program is used for measuring execution times, i.e. end-to-end times are measured. Each contender will be run five times in a row. The slowest and the fastest runs are discarded. The mean value of the remaining three runs is the result for that contender and will be added to the leaderboard. If you have any questions or would like to discuss any potential 1BRC optimization techniques, please join the discussion in the GitHub repo.\nAs for a prize, by entering this challenge, you may learn something new, get to inspire others, and take pride in seeing your name listed in the scoreboard above. Rumor has it that the winner may receive a unique 1️⃣🐝🏎️ t-shirt, too.\nSo don’t wait, join this challenge, and find out how fast Java can be—​I’m really curious what the community will come up with for this one. 
Happy 2024, coder style!\n","id":22,"publicationdate":"Jan 1, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Jan 4: Wow, this thing really took off!\u003c/em\u003e\n\u003cem\u003e1BRC is discussed at a couple of places on the internet, including \u003ca href=\"https://news.ycombinator.com/item?id=38851337\"\u003eHacker News\u003c/a\u003e, \u003ca href=\"https://lobste.rs/s/u2qcnf/one_billion_row_challenge\"\u003elobste.rs\u003c/a\u003e, and \u003ca href=\"https://old.reddit.com/r/programming/comments/18x0x0u/the_one_billion_row_challenge/\"\u003eReddit\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eFor folks to showcase non-Java solutions, there is a \u003ca href=\"https://github.com/gunnarmorling/1brc/discussions/categories/show-and-tell\"\u003e\u0026#34;Show \u0026amp; Tell\u0026#34;\u003c/a\u003e now; check it out for 1BRC implementations in Rust, Go, C++, and others.\u003c/em\u003e\n\u003cem\u003eSome interesting related write-ups include \u003ca href=\"https://rmoff.net/2024/01/03/1%EF%B8%8F%E2%83%A3%EF%B8%8F-1brc-in-sql-with-duckdb/\"\u003e1BRC in SQL with DuckDB\u003c/a\u003e by Robin Moffatt and \u003ca href=\"https://ftisiot.net/posts/1brows/\"\u003e1 billion rows challenge in PostgreSQL and ClickHouse\u003c/a\u003e by Francesco Tisiot.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThanks a lot for all the submissions, this is going way beyond what I’d have expected!\u003c/em\u003e\n\u003cem\u003eI am a bit behind with evaluations due to the sheer number of entries; I will work through them bit by bit.\u003c/em\u003e\n\u003cem\u003eI have also made a few clarifications to \u003ca href=\"https://github.com/gunnarmorling/1brc#faq\"\u003ethe rules\u003c/a\u003e of the challenge; please make sure to read them before submitting any 
entries.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLet’s kick off 2024 true coder style: I’m excited to announce the \u003ca href=\"https://github.com/gunnarmorling/onebrc\"\u003eOne Billion Row Challenge\u003c/a\u003e (1BRC), running from Jan 1 until Jan 31.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eYour mission, should you decide to accept it, is deceptively simple:\nwrite a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station.\nThere’s just one caveat: the file has \u003cstrong\u003e1,000,000,000 rows\u003c/strong\u003e!\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"The One Billion Row Challenge","uri":"https://www.morling.dev/blog/one-billion-row-challenge/"},{"content":" Update Dec 18: This post is discussed on Hacker News 🍊\nAs regular readers of this blog will know, JDK Flight Recorder (JFR) is one of my favorite tools of the Java platform. This low-overhead event recording engine built into the JVM is invaluable for observing the runtime characteristics of Java applications and identifying any potential performance issues. JFR continues to become better and better with every new release, with one recent addition being support for native memory tracking (NMT).\nNMT by itself is not a new capability of the JVM: it provides you with detailed insight into the memory consumption of your application, which goes way beyond the well-known Java heap space. NMT tells you how much memory the JVM uses for class metadata, thread stacks, the JIT compiler, garbage collection, memory-mapped files, and much more (the one thing which NMT does not report, despite what the name might suggest, is any memory allocated by native libraries, for instance invoked via JNI). 
To learn more about NMT, I highly recommend reading the excellent post Off-Heap memory reconnaissance by Brice Dutheil.\nUntil recently, in order to access NMT, you’d have to use the jcmd command line tool for capturing the values of a running JVM in an ad-hoc way. Since Java 20, however, you can record NMT data continuously with JFR, thanks to two new JFR event types added for this purpose. This makes it much easier to collect that data over a longer period of time and analyze it in a systematic way. You could also expose a live stream of NMT data to remote clients via JFR event streaming, for instance for integration with dashboards and monitoring solutions.\nThe list of JFR event types grows with every release. If you’d like to learn which event types are available in which Java version, take a look at the JFR Events list compiled by Johannes Bechberger from the Java team at SAP. It also shows you the events added in a particular version, for instance here for the new events in Java 21.\nAn Example So let’s see how NMT data is reported via JFR. 
Here’s a simple example program which allocates some off heap memory, once using a good old direct byte buffer, and once using the new Foreign Memory API, finalized in Java 22 with JEP 454 (it feels so nice to be able to allocate 4GB at once, something you couldn’t do before):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 import java.nio.ByteBuffer; import java.lang.foreign.Arena; import java.lang.foreign.MemorySegment; import static java.time.LocalDateTime.now; public void main() throws Exception { System.out.println(STR.\u0026#34;\\{ now() } Started\u0026#34;); Thread.sleep(5000); ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024 * 1024); System.out.println(STR.\u0026#34;\\{ now() } Allocated (Direct)\u0026#34;); Thread.sleep(5000); try (Arena arena = Arena.ofConfined()) { MemorySegment segment = arena.allocate(4L * 1024L * 1024L * 1024L); System.out.println(STR.\u0026#34;\\{ now() } Allocated (FMI)\u0026#34;); Thread.sleep(5000); } buffer = null; System.out.println(STR.\u0026#34;\\{ now() } Deallocated\u0026#34;); Thread.sleep(5000); } JFR records NMT events every second by default, so I’ve sprinkled in some sleep() calls to make sure the program runs long enough and the different allocations are spread out a bit. Just for the fun of it, I’m also using a top-level main method—as supported by JEP 463—and string templates for the log messages (JEP 459).\nLet’s run this and see how those off-heap allocations are tracked by JFR. Somewhat surprisingly, NMT in JFR is controlled via the gc setting, which must be set to a value of \u0026#34;normal\u0026#34;, \u0026#34;detailed\u0026#34;, \u0026#34;high\u0026#34;, or \u0026#34;all\u0026#34; for recording NMT data. This is the case for the default and profile JFR configurations which ship with the SDK, so using either configuration will give you the NMT data. 
Note though that in addition, NMT itself must be enabled using the -XX:NativeMemoryTracking JVM option:\n1 2 3 4 5 6 7 8 9 10 java --enable-preview --source 22 \\ -XX:StartFlightRecording=name=Profiling,filename=nmt-recording.jfr,settings=profile \\ -XX:NativeMemoryTracking=detail main.java [0.316s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default. [0.316s][info][jfr,startup] [0.316s][info][jfr,startup] Use jcmd 47194 JFR.dump name=Profiling to copy recording data to file. 2023-12-17T18:31:00.475598 Started 2023-12-17T18:31:05.609319 Allocated (Direct) 2023-12-17T18:31:11.167484 Allocated (FMI) 2023-12-17T18:31:16.253059 Deallocated Let’s open the recording in JDK Mission Control and see what we find. As of version 8.3, JMC doesn’t have a bespoke view for displaying NMT data, but the NMT events show up in the generic event browser view. There are two event types, the first one being \u0026#34;Total Native Memory Usage\u0026#34;:\nThe two off-heap allocations of 1 GB (direct byte buffer) and 4 GB (Foreign Memory API) show up as expected as increases to the reserved and committed memory of the program. We also see one of the advantages of the new Foreign Memory API: the memory is deallocated as soon as the Arena object is closed, whereas the JVM holds on to the memory of the byte buffer also after discarding the reference. There’s no control over when this memory will be released exactly, it will be done via a phantom-reference-based cleaner some time after the GC has removed the associated buffer object.\nThe second new event type, \u0026#34;Native Memory Usage Per Type\u0026#34;, provides a more fine grained view (when setting -XX:NativeMemoryTracking to detail rather than summary). 
The off-heap allocations show up under the \u0026#34;Other\u0026#34; category there:\nUpdate Dec 18: As OpenJDK developer Eric Gahlin pointed out, you also can take a high-level view at the NMT events of a recording using the JDK’s jfr tool, which provides two built-in views for committed and reserved memory:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 $JAVA_HOME/bin/jfr view native-memory-committed recording.jfr Native Memory Committed Memory Type First Observed Average Last Observed Maximum ------------------------------ -------------- --------- ------------- --------- Other 1,8 MB 1,7 GB 1,0 GB 5,0 GB Java Heap 136,0 MB 136,0 MB 136,0 MB 136,0 MB GC 54,2 MB 54,2 MB 54,2 MB 54,2 MB Metaspace 16,0 MB 16,0 MB 16,1 MB 16,1 MB Tracing 15,6 MB 15,7 MB 15,7 MB 15,7 MB Code 12,6 MB 12,6 MB 12,6 MB 12,6 MB Shared class space 12,4 MB 12,4 MB 12,4 MB 12,4 MB Arena Chunk 8,5 MB 2,2 MB 2,0 kB 8,5 MB Symbol 5,8 MB 5,8 MB 5,8 MB 5,8 MB Class 2,7 MB 2,7 MB 2,7 MB 2,7 MB Native Memory Tracking 1,7 MB 1,7 MB 1,7 MB 1,7 MB Synchronization 1,2 MB 1,2 MB 1,2 MB 1,2 MB Internal 563,4 kB 561,9 kB 561,7 kB 563,4 kB Compiler 202,9 kB 206,4 kB 205,6 kB 238,5 kB Module 174,1 kB 174,1 kB 174,1 kB 174,1 kB Thread 86,0 kB 82,5 kB 81,4 kB 86,0 kB Safepoint 32,0 kB 32,0 kB 32,0 kB 32,0 kB GCCardSet 29,5 kB 29,5 kB 29,5 kB 29,5 kB Serviceability 17,6 kB 17,6 kB 17,6 kB 17,6 kB Object Monitors 1,0 kB 1,0 kB 1,0 kB 1,0 kB String Deduplication 608 bytes 608 bytes 608 bytes 608 bytes Arguments 185 bytes 185 bytes 185 bytes 185 bytes Statistics 128 bytes 128 bytes 128 bytes 128 bytes Logging 32 bytes 32 bytes 32 bytes 32 bytes Test 0 bytes 0 bytes 0 bytes 0 bytes JVMCI 0 bytes 0 bytes 0 bytes 0 bytes Thread Stack 0 bytes 0 bytes 0 bytes 0 bytes Tracking RSS As per the docs, NMT will cause a performance overhead of 5% - 10% (how large the overhead actually is, depends a lot on the specific workload), so it’s probably not something you’d want to do 
permanently in a production setting. Luckily, Java 21 adds another JFR event type, \u0026#34;Resident Set Size\u0026#34; (RSS), which allows you to track the overall memory consumption of your application on an ongoing basis:\nOf course, you can also retrieve the RSS, i.e. the physical memory allocated by a process, using other tools like ps, but recording it via JFR makes it really simple to analyze its development over time, and also allows you to correlate it with other relevant JFR events, for instance for class (un-)loading or garbage collection.\nWith JFR event streaming, you could also expose a live feed of the value to remote monitoring clients, allowing you to keep track visually using a dashboard. You could also apply some kind of pattern matching to this time series of values, triggering an alert when it continues to grow even after the application’s warm-up phase.\nI am planning to explore how to do this with a bit of SQL using JFR Analytics in a future blog post.\n","id":23,"publicationdate":"Dec 17, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Dec 18: This post is \u003ca href=\"https://news.ycombinator.com/item?id=38677628\"\u003ediscussed on Hacker News\u003c/a\u003e\u003c/em\u003e 🍊\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs regular readers of this blog will know, \u003ca href=\"https://openjdk.org/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) is one of my favorite tools of the Java platform.\nThis low-overhead event recording engine built into the JVM is invaluable for observing the runtime characteristics of Java applications and identifying any potential performance issues.\nJFR continues to become better and better with every new release,\nwith one recent addition being support for native memory tracking (NMT).\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Tracking Java Native Memory With JDK Flight 
Recorder","uri":"https://www.morling.dev/blog/tracking-java-native-memory-with-jdk-flight-recorder/"},{"content":" This question came up on the Data Engineering sub-reddit the other day: Can Debezium lose any events? I.e. can there be a situation where a record in a database gets inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?\nI’ve already replied on Reddit itself, but I thought it’d warrant a slightly longer discussion here. To get the most important thing out of the way first: In general, Debezium by itself should never miss any event. If it does, that’s considered a blocker bug which the development team will address with highest priority. After all, Debezium’s semantics are at-least-once (i.e. duplicate events may occur, specifically after an unclean connector shut-down), not at-most-once.\nThat being said, it may happen that due to operational deficiencies portions of the database’s transaction log get discarded before Debezium gets a chance to capture them. This can happen when a Debezium connector isn’t running for a longer period of time, and the maximum transaction log retention time is reached.\nMost databases provide some sort of configuration parameter for controlling this behavior. In MySQL for instance, there is the binlog_expire_logs_seconds parameter for this purpose (which defaults to 2,592,000 seconds, i.e. 30 days). When you are using MySQL on Amazon RDS, the option to use is called binlog retention hours. For SQL Server, the retention time for CDC data can be configured using the stored procedure sys.sp_cdc_change_job().\nIn contrast, Postgres approaches this matter a bit differently: Replication slots keep track of how far consumers have consumed the write-ahead log (WAL). Consumers must actively acknowledge the latest WAL position (log sequence number, LSN) they have consumed. 
Only when an LSN has been acknowledged by all replication slots will the database discard older WAL segments. This means that, by default, even an extended connector downtime will not lead to event loss. This comes at a price though: the database holds on to all the unconsumed WAL segments, consuming more and more disk space until the connector gets restarted again.\nThe Insatiable Replication Slot Even when a replication slot is active, it can happen under specific circumstances that the slot’s consumer cannot acknowledge any LSNs, causing the database machine to run out of disk space eventually. You can learn more about the reasons, and ways for mitigating this issue, in this blog post. Luckily, the issue has recently been resolved in the Postgres JDBC driver, version 42.6.0.\nTherefore, a new configuration option was introduced in Postgres 13, max_slot_wal_keep_size, which defines the maximum WAL size in bytes which a replication slot may retain. If a slot causes retained WAL files to grow beyond the configured value, older segments will be removed. This means that, when configuring this option (the default value is -1, i.e. an indefinite WAL keep size), the behavior is the same as for instance with MySQL, and consumers will not be able to resume processing after falling off the log. By means of the always snapshot mode, you can start with a new complete initial snapshot in this case.\nIn general though, you should avoid this situation to begin with, and have observability tools in place which will trigger an alert when a Debezium connector isn’t running for an extended period of time, for instance by querying the Kafka Connect REST API. 
For Postgres, you can also track the retained WAL size of a replication slot using the pg_current_wal_lsn() and pg_wal_lsn_diff() functions, as I described in this blog post a while ago.\n","id":24,"publicationdate":"Nov 14, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis question came up on the Data Engineering sub-reddit the other day:\n\u003ca href=\"https://old.reddit.com/r/dataengineering/comments/17ttw5e/can_debezium_loose_updates/\"\u003eCan Debezium lose any events\u003c/a\u003e?\nI.e. can there be a situation where a record in a database gets inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Can Debezium Lose Events?","uri":"https://www.morling.dev/blog/can-debezium-lose-events/"},{"content":" The other day at work, we had a situation where we suspected a thread leak in one particular service, i.e. code which continuously starts new threads, without taking care of ever stopping them again. Each thread requires a bit of memory for its stack space, so starting an unbounded number of threads can be considered as a form of memory leak, causing your application to run out of memory eventually. In addition, the more threads there are, the more overhead the operating system incurs for scheduling them, until the scheduler itself will consume most of the available CPU resources. Thus it’s vital to detect and fix this kind of problem early on.\nThe usual starting point for analyzing a suspected thread leak is taking a thread dump, for instance using the jstack CLI tool or via JDK Mission Control; if there’s an unexpectedly large number of threads (oftentimes with similar or even identical names), then it’s very likely that something is wrong indeed. But a thread dump by itself is only a snapshot of the thread state at a given time, i.e. 
it doesn’t tell you how the thread count is changing over time (perhaps there are many threads which are started but also stopped again?), and it also doesn’t provide you with information about the cause, i.e. which part of your application is starting all those threads. Does it happen in your own code base, or within some 3rd party dependency? While the thread names and stacks in the dump can give you some idea, that information isn’t necessarily enough for a conclusive root cause analysis.\nLuckily, Java’s built-in event recorder and performance analysis tool, JDK Flight Recorder, exposes all the data you need to identify thread leaks and their cause. So let’s take a look at the details, bidding farewell to those pesky thread leaks once and for all!\nThe first JFR event type to look at is jdk.JavaThreadStatistics: recorded every second by default, it keeps track of active, accumulated, and peak thread counts. Here is a JFR recording from a simple thread leak demo application I’ve created (newest events at the top):\nThe number of active threads is continuously increasing, never going back down again; this is pretty clearly a thread leak. Now let’s figure out where exactly all those threads are coming from.\nFor this, two other JFR event types come in handy: jdk.ThreadStart and jdk.ThreadEnd. The former captures all the relevant information when a thread is started: time stamp, name of the new thread and the parent thread, and the stack trace of the parent thread when starting the child thread. The latter event type will be recorded when a thread finishes. If we find many thread start events originating at the same code location without a corresponding end event (correlated via the thread id contained in the events), this is very likely a source of a thread leak.\nThis sort of event analysis is a perfect use case for JFR Analytics. This tool allows you to analyze JFR recordings using standard SQL (leveraging Apache Calcite under the hood). 
In JFR Analytics, each event type is represented by its own \u0026#34;table\u0026#34;. Finding thread start events without matching end events is as simple as running a LEFT JOIN on the two event types and keeping only those start events which don’t have a join partner.\nSo let’s load the file into the SQLLine command line client (see the README of JFR Analytics for instructions on building and launching this tool):\n!connect jdbc:calcite:schemaFactory=org.moditect.jfranalytics.JfrSchemaFactory;schema.file=thread_leak_recording.jfr dummy dummy !outputformat vertical Run the following SQL query for finding thread start events without corresponding thread end events:\nSELECT ts.\u0026#34;startTime\u0026#34;, ts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;parentThread\u0026#34;, ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;newThread\u0026#34;, TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 20) AS \u0026#34;stackTrace\u0026#34; FROM \u0026#34;jdk.ThreadStart\u0026#34; ts LEFT JOIN \u0026#34;jdk.ThreadEnd\u0026#34; te ON ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = te.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; WHERE te.\u0026#34;startTime\u0026#34; IS NULL; Note how the parentThread and eventThread columns are of a complex SQL type, allowing you to refer to thread properties such as javaName or javaThreadId using dot notation. 
In that simple example recording, there’s one stack trace which dominates the result set, so looking at any of the events reveals the culprit:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 startTime 2023-02-26 11:36:04.284 javaName executor-thread-0 javaName pool-1060-thread-1 stackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Callable):145 java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791 org.acme.GreetingResource.hello():18 null null null null jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Object, Object[]):104 java.lang.reflect.Method.invoke(Object, Object[]):578 org.jboss.resteasy.core.MethodInjectorImpl.invoke(HttpRequest, HttpResponse, Object, Object[]):170 org.jboss.resteasy.core.MethodInjectorImpl.invoke(HttpRequest, HttpResponse, Object):130 org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(HttpRequest, HttpResponse, Object):660 org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(HttpRequest, HttpResponse, Object):524 org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(HttpRequest, HttpResponse, Object):474 null org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter():364 The call for creating a new thread apparently is initiated by the GreetingResource::hello() method by submitting a Callable to an executor service. 
And sure enough, this is what it looks like:\n@GET @Produces(MediaType.TEXT_PLAIN) public String hello() { ExecutorService executor = Executors.newSingleThreadExecutor(); executor.submit(() -\u0026gt; { while (true) { Thread.sleep(1000L); } }); return \u0026#34;Hello World\u0026#34;; } If things are not as clear-cut as in that contrived example, it can be useful to truncate stack traces to a reasonable line count (e.g. it should be safe to assume that the user code starting a thread is never further away than ten frames from the actual thread start call) and group by that. JFR Analytics provides the built-in function TRUNCATE_STACKTRACE for this purpose:\nSELECT TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 10) AS \u0026#34;stackTrace\u0026#34;, COUNT(1) AS \u0026#34;threadCount\u0026#34; FROM \u0026#34;jdk.ThreadStart\u0026#34; ts LEFT JOIN \u0026#34;jdk.ThreadEnd\u0026#34; te ON ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = te.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; WHERE te.\u0026#34;startTime\u0026#34; IS NULL GROUP BY TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 10) ORDER BY \u0026#34;threadCount\u0026#34; DESC; This points at the problematic stack traces and code locations in a very pronounced way (output slightly adjusted for better readability):\nstackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Callable):145 java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791 org.acme.GreetingResource.hello():18 null null null threadCount 414 --- stackTrace java.util.Timer.\u0026lt;init\u0026gt;(String, boolean):188 
jdk.jfr.internal.PlatformRecorder.lambda$createTimer$0(List):101 null java.lang.Thread.run():1589 threadCount 1 Sometimes you may encounter a situation where new threads are started from within other threads in a 3rd party dependency, rather than directly from threads within your own code base. In that case the stack traces of the thread start events may not tell you enough about the root cause of the problem, i.e. where those other \u0026#34;intermediary\u0026#34; threads are coming from, and how they relate to your own code.\nTo dig into the details here, you can leverage the fact that each jdk.ThreadStart event contains information about the parent thread which started a new thread. So you can join the jdk.ThreadStart table to itself on the parent thread’s id, fetching also the stack traces of the code starting those parent threads:\nSELECT ts.\u0026#34;startTime\u0026#34;, pts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;grandParentThread\u0026#34;, ts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;parentThread\u0026#34;, ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;newThread\u0026#34;, TRUNCATE_STACKTRACE(pts.\u0026#34;stackTrace\u0026#34;, 15) AS \u0026#34;parentStackTrace\u0026#34;, TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 15) AS \u0026#34;stackTrace\u0026#34; FROM \u0026#34;jdk.ThreadStart\u0026#34; ts LEFT JOIN \u0026#34;jdk.ThreadEnd\u0026#34; te ON ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = te.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; JOIN \u0026#34;jdk.ThreadStart\u0026#34; pts ON ts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = pts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; WHERE te.\u0026#34;startTime\u0026#34; IS NULL; Here, stackTrace is the trace of a thread (named \u0026#34;pool-728-thread-1\u0026#34;) of an external library, \u0026#34;greeting provider\u0026#34;, which starts another (leaking) 
thread (named \u0026#34;pool-729-thread-1\u0026#34;), and parentStackTrace points to the code in our own application (thread name \u0026#34;executor-thread-0\u0026#34;) which started that first thread:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 startTime 2023-02-28 09:15:24.493 grandParentThread executor-thread-0 parentThread pool-728-thread-1 newThread pool-729-thread-1 parentStackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Runnable):123 java.util.concurrent.Executors$DelegatedExecutorService.submit(Runnable):786 com.example.greeting.GreetingService.greet():20 com.example.greeting.GreetingService_ClientProxy.greet() org.acme.GreetingResource.hello():20 null null null null jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Object, Object[]):104 java.lang.reflect.Method.invoke(Object, Object[]):578 --- stackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Callable):145 java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791 com.example.greeting.GreetingProvider.createGreeting():13 com.example.greeting.GreetingProvider_ClientProxy.createGreeting() com.example.greeting.GreetingService.lambda$greet$0(AtomicReference):21 null java.util.concurrent.Executors$RunnableAdapter.call():577 java.util.concurrent.FutureTask.run():317 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker):1144 java.util.concurrent.ThreadPoolExecutor$Worker.run():642 
java.lang.Thread.run():1589 If the thread hierarchy is even deeper, you could continue down that path and keep joining more and more parent threads until you’ve arrived at the application’s main thread. I was hoping to leverage recursive query support in Calcite for this purpose, but as it turned out, support for this only exists in the Calcite RelBuilder API at the moment, while the RECURSIVE keyword is not supported for SQL queries yet.\nEquipped with JDK Flight Recorder, JDK Mission Control, and JFR Analytics, identifying and fixing thread leaks in your Java application becomes a relatively simple task. The jdk.JavaThreadStatistics, jdk.ThreadStart, and jdk.ThreadEnd event types are enabled in the default JFR profile, which is meant for permanent usage in production. I.e. you can keep a size-capped continuous recording running all the time, dump it into a file whenever needed, and then start the analysis process as described above.\nTaking things a step further, you could also set up monitoring and alerting on the number of active threads, e.g. by exposing the jdk.JavaThreadStatistics event via a remote JFR event recording stream, allowing you to react in real-time whenever the active thread count reaches an unexpectedly high level.\n","id":25,"publicationdate":"Feb 28, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day at work, we had a situation where we suspected a thread leak in one particular service,\ni.e. 
code which continuously starts new threads, without taking care of ever stopping them again.\nEach thread requires a bit of memory for its stack space,\nso starting an unbounded number of threads can be considered as a form of memory leak, causing your application to run out of memory eventually.\nIn addition, the more threads there are, the more overhead the operating system incurs for scheduling them,\nuntil the scheduler itself will consume most of the available CPU resources.\nThus it’s vital to detect and fix this kind of problem early on.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Finding Java Thread Leaks With JDK Flight Recorder and a Bit Of SQL","uri":"https://www.morling.dev/blog/finding-java-thread-leaks-with-jdk-flight-recorder-and-bit-of-sql/"},{"content":" 27 years of age, and alive and kicking — The Java platform regularly comes out amongst the top contenders in rankings like the TIOBE index. In my opinion, rightly so. The language is very actively maintained and constantly improved; its underlying runtime, the Java Virtual Machine (JVM), is one of, if not the most, advanced runtime environments for managed programming languages.\nThere is a massive eco-system of Java libraries which make it a great tool for a large number of use cases, ranging from command-line and desktop applications, over web apps and backend web services, to datastores and stream processing platforms. With upcoming features like support for vectorized computations (SIMD), light-weight virtual threads, improved integration with native code, value objects and user-defined primitives, and others, Java is becoming an excellent tool for solving a larger number of software development tasks than ever before.\nThe immense breadth of Java and its ecosystem, having grown and matured over nearly three decades, can also be challenging though for folks just starting their careers as a Java developer. Which Java version should you use? How to install it? 
Which build tool and IDE are the right ones? For all these, and many other questions, there are typically a number of options, which can easily overwhelm you if you are new to Java. As the platform has evolved, tools have come and gone, things which were hugely popular years ago have fallen into obsolescence since then. As related information can still be found on the internet, it can be hard to identify what’s still relevant and what not.\nThe idea for this blog post is to provide an opinionated guide for folks getting started with Java development in 2023, helping you with your very first steps with that amazing platform. Note I’m not saying that the things I’m going to recommend are the best ones for each and every situation. The focus is on providing a good getting-started experience. Some of the recommended tools or approaches may make less sense to use as you get more experienced and other choices might be better suited for you then, based on the specific situation and its requirements.\nAlso, very importantly, there is a notion of personal taste and preference to these things, those are my recommendations, and those of others might look different, which is perfectly fine.\nJava — What is What? As you make your first steps with Java, it might be confusing to understand what even Java is. Indeed \u0026#34;Java\u0026#34; refers to several things, which can even trip up more experienced folks. Here’s a list of key terms and concepts:\nThe Java programming language A general-purpose statically-typed object-oriented programming language (with some functional flavors); it is compiled into portable byte code which can be executed on a wide range of platforms\nThe Java platform \u0026#34;A suite of programs that facilitate developing and running programs written in the Java programming language\u0026#34;, with key elements being the Java compiler (javac), the Java virtual machine, and the Java standard class library. 
Focus of this post is the Java Standard Edition (SE), other platforms like Java Micro Edition (ME), and Jakarta Enterprise Edition (EE) are not discussed here. A large number of languages other than Java (the language) itself can run on Java SE (the platform), for instance Kotlin, Groovy, and Scala; they are also out of scope for this article, though\nThe Java virtual machine (JVM) A virtual machine for executing Java programs (or more precisely, byte code), e.g. taking care of tasks like loading byte code and verifying its correctness, compiling it into platform-specific machine code using a just-in-time compiler (JIT), automated memory management via garbage collection, ensuring isolation between different components, providing runtime diagnostics, etc.; multiple JVM implementations exist, including HotSpot and OpenJ9\nThe Java Development Kit (JDK) A distribution of tools for developing and running Java applications\nOpenJDK An open-source implementation of Java SE and related projects; also the name of the open-source community creating this implementation\nThe Java Community Process (JCP) A mechanism for developing specifications in the Java space, including those defining the different Java versions\n📦 Distribution The Java platform is maintained by the OpenJDK open-source project. Similar to Linux, multiple vendors provide binary distributions for this project, including Amazon, the Eclipse Foundation, Microsoft, Oracle, or Red Hat. These distributions differ in aspects like availability of commercial support and duration of the same, supported platforms, extent of testing, certain features like available garbage collectors, potentially bug fixes, and others. So which one should you use?\nFor the beginning, the differences won’t matter too much, and I suggest choosing Eclipse Temurin. It is backed by Adoptium, a working group of companies like Google, Red Hat, Microsoft, Alibaba, Azul, and others. 
You can download and use it for free, it contains everything you’ll need, passes the Technology Compatibility Kit (TCK) of the JDK, and if needed, there is commercial support provided by different vendors.\n📆 Version A new version of Java is released every six months, with the current one at the time of writing this being Java 19. Specific releases are long-term support (LTS) releases, for which vendors provide maintenance for many years. The current LTS release is Java 17 and I recommend getting started with this one.\nWhile newer non-LTS releases may add useful new features, finding a sustainable update strategy can be a bit tricky, and many of the new features are preview or incubating features, meaning that you would not use them in production code anyways. I recommend diving into those later on, once you’ve gained some experience with Java and its ecosystem.\nIf specific 3rd-party libraries don’t work seamlessly with Java 17 yet, you should use the previous LTS (Java 11). Don’t use non-LTS releases apart from the current one, as they are mostly unmaintained, i.e. you may open yourself to security issues and other bugs which won’t get fixed. Also don’t use Java 8 (alternatively named 1.8), which is the LTS before 11, as it’s really ancient by today’s standards.\n🔧 Installation There are different ways of installing your chosen Java distribution. Usually, there’ll be a distribution package which you can download from the vendor’s website. Alternatively, package managers of the operating system allow you to install Java too.\nFor a simplified getting started experience, my recommendation is to take a look at SDKMan. This is a tool which allows you to install software development kits (SDKs) such as Java’s JDK. 
You can also update your installed SDK versions and easily switch between multiple versions.\nIf you have SDKMan installed, obtaining the current Eclipse Temurin build of Java 17 is as simple as running the following in your shell:\n1 2 3 4 5 6 $ sdk install java 17.0.5-tem # Install $ sdk use java 17.0.5-tem # Activate $ java --version # Verify version openjdk 17.0.5 2022-10-18 ... Installation in Windows SDKMan is implemented in bash, so if you are on Windows, you’ll need to install either the Windows Subsystem for Linux (WSL) or Cygwin before you can use SDKMan. I’d recommend having either in any case, but if that’s not an option, you may install Java using the winget package manager or by downloading your distribution directly from its vendors website.\n💡 Your First Java Program Having installed Java, it’s time to write your first Java program. Java is first and foremost an object-oriented language, hence everything in a Java program is defined in the form of classes, which have fields (representing their state) and methods (the behavior operating on that state). The canonical \u0026#34;Hello World\u0026#34; example in Java looks like this:\n1 2 3 4 5 public class HelloWorld { (1) public static void main(String... args) { (2) System.out.println(\u0026#34;Hello world!\u0026#34;); (3) } } 1 The class HelloWorld must be specified in a source file named HelloWorld.java 2 The main() method is the entry point into a Java program 3 The println() method prints the given text to standard out Java source code is compiled into class files which then are loaded into the JVM and executed. Normally, this is done in two steps: first running the compiler javac, then executing the program using the java binary. For quick testing and exploring, both steps can be combined, so you can execute your \u0026#34;Hello World\u0026#34; program like this:\n1 2 $ java HelloWorld.java Hello world! 
For exploring Java in a quick and iterative mode, it provides jshell, an interactive Read-Evaluate-Print Loop (REPL). You can use it for running expressions and statements without defining a surrounding method or class, simplifying \u0026#34;Hello World\u0026#34; quite a bit:\n$ jshell jshell\u0026gt; System.out.println(\u0026#34;Hello World\u0026#34;); Hello World Similar to jshell, but quite a bit fancier, is jbang, which for instance allows you to easily pull 3rd party libraries into your single source file Java programs.\n📚 Learning the Language Providing an introduction to all the features of the Java programming language is beyond the scope of this blog post. To truly learn the language and all its details, my recommendation would be to get a good book, grab a coffee (or two, or three, …​) and work through its chapters, in order of your personal interests. A popular choice for getting started with Java is \u0026#34;Head First Java, 3rd Edition\u0026#34; by Kathy Sierra, Bert Bates, Trisha Gee, nicely complemented by The Well-Grounded Java Developer, 2nd Edition, by Benjamin Evans, Jason Clark, and Martijn Verburg. A must-read for honing your Java skills is \u0026#34;Effective Java, 3rd Edition\u0026#34;, by Joshua Bloch. While it was last updated for Java 9, its contents are pretty much timeless and still apply to current Java versions.\nIf you don’t want to commit to buying a book just yet, check out the \u0026#34;Learn Java\u0026#34; section on dev.java, which has tons of material describing the Java language, key parts of the class library, the JVM and its most important tools, and more in great detail.\nThe authoritative resource on the Java language is the Java Language Specification, or JLS for short. The specification is written in a very concise and well understandable way, and I highly recommend taking a look if you’d like to understand exactly how specific details of the language work. 
That being said, when you’re just about to get started with learning Java, you’ll be better off by studying the resources mentioned above.\nIf certifications are your thing, you might consider learning for and taking the exam for the \u0026#34;Oracle Certified Professional: Java SE 17 Developer\u0026#34; one. I’d only recommend doing so after having worked with Java at least for a year or so, as the exam actually is quite involved. You’ll certainly learn a lot about Java, including all kinds of corner cases and odd details; not everything will necessarily translate into your day-to-day work as a developer, though. So you should consciously decide whether you want to spend the time preparing for the certification or not.\n👷‍♀️ Build Tool Once you go beyond the basics of manually compiling and running a set of Java classes, you’ll need a build tool. It will not only help you with compiling your code, but also with managing dependencies (i.e. 3rd party libraries you are using), testing your application, assembling the output artifacts (e.g. a JAR file with your program), and much more. There are plug-ins for finding common bug patterns, auto-formatting your code, etc. Commonly used build tool options for Java include Apache Maven, Gradle, and Bazel.\nMy recommendation is to stick with Maven for the beginning; it’s the most widely used one, and in my opinion the easiest to learn. Installing it is as simple as running sdk install maven with SDKMan. While it defines a rather rigid structure for your project, that also frees you from having to think about many aspects, which is great in particular when getting started. Maven has support for archetypes, templates which you can use to quickly bootstrap new projects. 
For instance you can use the oss-quickstart archetype which I have built for creating new projects with a reasonable set of pre-configured plug-ins like so:\n1 2 3 4 5 6 7 8 mvn archetype:generate -B \\ -DarchetypeGroupId=org.moditect.ossquickstart \\ -DarchetypeArtifactId=oss-quickstart-simple-archetype \\ -DarchetypeVersion=1.0.0.Alpha1 \\ -DgroupId=com.example.demos \\ -DartifactId=fancy-project \\ -Dversion=1.0.0-SNAPSHOT \\ -DmoduleName=com.example.fancy A lesser known yet super-useful companion to Maven is the Maven Daemon, which helps you to drastically speed up your builds by keeping a daemon process running in the background, avoiding the cost of repeatedly launching and initializing the build environment. You can install it via SDKMan by running sdk install mvnd.\nAlternative build tools like Gradle tend to provide more flexibility and interesting features like \u0026#34;compilation avoidance\u0026#34; (rebuilding only affected parts of large code bases after a change) or distributed build caches (increasing developer productivity in particular in large projects), but I’d wait with looking at those until you’ve gathered some experience with Java itself.\n📝 Editor Many Java developers love to fight over their favorite build tools, and it’s the same with editors and full-blown integrated development environments (IDEs). So whatever I’m going to say here, it’s guaranteed a significant number of people won’t like it ;)\nMy suggestion is to start with VSCode. It’s a rather light-weight editor, which comes with excellent Java developer support, e.g. for testing and debugging your code. It integrates very well with Maven-based projects and has a rich eco-system of plug-ins you can tap into.\nAs your needs grow, you’ll probably look for an IDE which comes with even more advanced functionality, e.g. when it comes to refactoring your code. 
While I’m personally a happy user of the Eclipse IDE, most folks tend to use IntelliJ IDEA these days and it’s thus what I’d recommend you to look into too. It comes with a feature-rich free community edition which will help you a lot with the day-to-day tasks you’ll encounter as a Java developer. Make sure to spend a few hours learning the most important keyboard short-cuts, it will save you lots of time later on.\n🧱 Libraries The ecosystem of 3rd party libraries is one of Java’s absolute super-powers: there is a ready-made library or framework available for pretty much every task you might think of, most of the times available as open-source.\nPerhaps counter-intuitively, my recommendation here is to try and be conservative with pulling in libraries into your project, and instead work with what’s available in Java’s standard class library (which is huge and covers a wide range of functionality already). Next, check out what your chosen application framework (if you use one, see below) offers either itself or provides integrations for.\nAdding a dependency to an external library should always be a conscious decision, as you might easily run into version conflicts between transitive dependencies (i.e. dependencies of dependencies) in different versions, more dependencies increase the complexity of your application (for instance, you must keep them all up-to-date), they may increase the attack surface of your application, etc. Sometimes, you might be better off by implementing something yourself, or maybe copy a bit of code from a 3rd party library into your own codebase, provided the license of that library allows for that.\nThat said, some popular libraries you will encounter in many projects include JUnit (for unit testing), slf4j (logging), Jackson (JSON handling), Hibernate (object-relational persistence, domain model validation, etc.), Testcontainers (integration testing with Docker), and ArchUnit (enforcing software architecture rules). 
The \u0026#34;awesome-java\u0026#34; list is a great starting point for diving into the ecosystem of Java libraries.\nMost open-source dependencies are available via the Maven Central repository; All the build tools integrate with it, not only Maven itself, but also Gradle and all the others. The MVN Repository site is a good starting point for finding dependencies and their latest versions. If you want to distribute libraries within your own organization, you can do so by self-running repository servers like Nexus or Artifactory, or use managed cloud services such as AWS CodeArtifact.\n🐢 Application Framework Most Java enterprise applications are built on top of an application framework which provides support for structuring your code via dependency injection, seamlessly integrates with a curated set of 3rd party libraries in compatible versions, helps with configuring and testing your application, and much more.\nAgain, there’s plenty of options in Java here, such as Spring Boot, Quarkus, Jakarta EE, Micronaut, Helidon, and more. My personal recommendation here is to use Quarkus (it’s the one I’m most familiar with, having worked for Red Hat before, who are the company behind this framework), or alternatively Spring Boot.\nBoth are widely popular, integrate with a wide range of technologies (e.g. web frameworks and databases of all kinds), come with excellent developer tooling, and are backed by very active open-source communities.\n🐳 Container Base Image In particular when you are going to work on an enterprise application, chances are that you’ll publish your application in form of a container image, so people can run it on Docker or Kubernetes.\nSticking to the recommendation on using Eclipse Temurin as your Java distribution, I suggest to use the Temurin image as the base for your application images, e.g. eclipse-temurin:17 for Java 17. 
Just make sure to keep your image up to date, so you and your users benefit from updates to the base image.\nOne base image you should avoid is the OpenJDK one, which is officially deprecated and not recommended for production usage.\n🔭 Next Steps The points above hopefully can help you to embark on a successful journey with the Java platform, but they are only a starting point. Depending on your specific needs and requirements, here are a number of possible next topics to explore and learn about:\nExploring the tools which come with the JDK, for instance javadoc (for generating API documentation), jcmd (for sending diagnostic commands to a running Java application), or jpackage (for packaging self-contained Java applications)\nBuilding native binaries using GraalVM, allowing for a fast start-up and low memory consumption; very useful for instance for building command-line tools or AWS Lambda functions\nAnalyzing the performance and runtime characteristics of your application using JDK Flight Recorder and JDK Mission Control\nSetting up continuous integration (CI) workflows for automatically building and testing your application with GitHub Actions (the aforementioned Maven oss-quickstart archetype will generate a basic template for that automatically)\nPublishing open-source libraries to Maven Central with JReleaser\nFinally, a few resources which should help you to stay up-to-date with everything Java and learn what’s going on in the community include the Java News on dev.java, inside.java (\u0026#34;news and views from members of the Java team at Oracle\u0026#34;), the JEP Search (for searching and filtering Java enhancement proposals, i.e. 
changes to the language and the platform) and Foojay (Friends of OpenJDK).\nMany thanks to Nils Hartmann, Andres Almiray, and Oliver Zeigermann for their input and feedback while writing this post!\n","id":26,"publicationdate":"Jan 15, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e27 years of age, and alive and kicking — The Java platform regularly comes out amongst the top contenders in rankings like the \u003ca href=\"https://www.tiobe.com/tiobe-index/\"\u003eTIOBE index\u003c/a\u003e.\nIn my opinion, rightly so. The language is very actively maintained and constantly improved;\nits underlying runtime, the Java Virtual Machine (JVM),\nis one of, if not the most, advanced runtime environments for managed programming languages.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThere is a massive eco-system of Java libraries which make it a great tool for a large number of use cases,\nranging from command-line and desktop applications, over web apps and backend web services, to datastores and stream processing platforms.\nWith upcoming features like \u003ca href=\"https://openjdk.org/jeps/426\"\u003esupport for vectorized computations\u003c/a\u003e (SIMD),\nlight-weight \u003ca href=\"https://openjdk.org/projects/loom\"\u003evirtual threads\u003c/a\u003e,\nimproved \u003ca href=\"https://openjdk.org/projects/panama/\"\u003eintegration with native code\u003c/a\u003e,\n\u003ca href=\"https://openjdk.org/projects/valhalla/\"\u003evalue objects and user-defined primitives\u003c/a\u003e, and others,\nJava is becoming an excellent tool for solving a larger number of software development tasks than ever before.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Getting Started With Java Development in 2023 — An Opinionated Guide","uri":"https://www.morling.dev/blog/getting-started-with-java-development-2023/"},{"content":" I strongly believe that you should avoid connecting to production environments 
from local developer machines as much as possible. But sometimes, e.g. in order to analyse some specific kinds of failures, doing so can be inevitable.\nNow, if this is the case, I really, really want to be sure that I’m aware of the environment I am working in. I absolutely want to avoid a situation as in the catchy title of this post, when for instance you realize that you just ran some integration test against a production environment. In the context of working with the AWS CLI tool this means I’d like to be aware of the currently active profile by means of coloring my shell accordingly. Here’s how I’ve set this up using iTerm2 and zsh.\nThe first step is to create a profile in iTerm2 for each separate environment which you can easily recognize and tell apart. In my case, I’ve set up two profiles:\nA \u0026#34;Dev\u0026#34; profile with a dark green background\nA \u0026#34;Prod\u0026#34; profile with a dark red background\nI have also added badges with the profile name, which are shown in the upper right corner of the window for further emphasis.\nWhile you can specify the right profile to use for each single invocation of the aws tool, this quickly becomes cumbersome. So I am enabling profiles using the AWS_PROFILE environment variable:\nexport AWS_PROFILE=dev Whenever the value of this environment variable changes, I would like to activate the corresponding iTerm2 profile. This can be done programmatically by echo-ing a specific escape sequence which is interpreted by the terminal emulator:\necho -e \u0026#34;\\033]50;SetProfile=Dev\\a\u0026#34; To make sure the right profile is set, I am using the precmd hook function in zsh. It is invoked every time before the prompt is displayed. 
Just add the following to your .zshrc file (if you have multiple actions you’d like to execute, it can be worthwhile to set them up as separate hook functions, as described in this post):\n1 2 3 4 5 6 7 8 9 10 11 precmd () { if [ \u0026#34;$AWS_PROFILE\u0026#34; = \u0026#34;dev\u0026#34; ] then echo -e \u0026#34;\\033]50;SetProfile=Dev\\a\u0026#34; elif [ \u0026#34;$AWS_PROFILE\u0026#34; = \u0026#34;prod\u0026#34; ] then echo -e \u0026#34;\\033]50;SetProfile=Prod\\a\u0026#34; else echo -e \u0026#34;\\033]50;SetProfile=Default\\a\u0026#34; fi } With that configuration in place (either source your .zshrc or open a new session for activating it), choosing a specific AWS profile automatically triggers the activation of the matching profile in iTerm2:\nThat way, it’s very apparent which AWS profile currently is active, substantially reducing the risk for making any stupid mistakes.\n","id":27,"publicationdate":"Jan 5, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI strongly believe that you should avoid connecting to production environments from local developer machines as much as possible.\nBut sometimes, e.g. 
in order to analyse some specific kinds of failures,\ndoing so can be inevitable.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eNow, if this is the case, I really, really want to be sure that I’m aware of the environment I am working in.\nI absolutely want to avoid a situation as in the catchy title of this post,\nwhen for instance you realize that you just ran some integration test against a production environment.\nIn the context of working with the AWS CLI tool this means I’d like to be aware of the currently active \u003ca href=\"https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html\"\u003eprofile\u003c/a\u003e by means of coloring my shell accordingly.\nHere’s how I’ve set this up using \u003ca href=\"https://iterm2.com/\"\u003eiTerm2\u003c/a\u003e and \u003ca href=\"https://www.zsh.org/\"\u003ezsh\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Oh... This is Prod?!","uri":"https://www.morling.dev/blog/oh_this_is_prod/"},{"content":" Java’s BlockingQueue hierarchy is widely used for coordinating work between different producer and consumer threads. When set up with a maximum capacity (i.e. a bounded queue), no more elements can be added by producers to the queue once it is full, until a consumer has taken at least one element. For scenarios where new work may arrive more quickly than it can be consumed, this applies means of back-pressure, ensuring the application doesn’t run out of memory eventually, while enqueuing more and more work items.\nOne interesting usage of blocking queues is to buffer writes to a database. 
Let’s take SQLite, an embedded RDBMS, as an example; SQLite only allows for a single writer at any given time, and it tends to yield a sub-optimal write through-put when executing many small transactions.\nA blocking queue can be used to mitigate that situation: all threads that wish to perform an update to the database, for instance the worker threads of a web application, submit work items with their write tasks to a blocking queue. Another thread fetches items in batches from that queue, executing one single transaction for all work items of a batch.\nThis results in a much better performance compared to each thread executing its own individual write transaction, in particular when keeping those open for the entire duration of web requests, as it’s commonly the case with most web frameworks. More on that architecture, in particular in regards to failure handling, in a future blog post.\nHow do you find out though when a producer actually is blocked while trying to add items to a BlockingQueue? After all, this is an indicator that the through-put of your system isn’t as high as it would need to be in order to fully satisfy the workload submitted by the producers.\nIf you have the means of running a profiler against the system, then for instance async-profiler with its wall-clock profiling option will come in handy for this task; unlike CPU profiling which only profiles running threads, wall-clock profiling will also tell you about the time spent by threads in blocked and waiting states, as is the case here.\nBut what when connecting with a wall-clock profiler is not an option? In this case, JDK Flight Recorder, Java’s go-to tool for all kinds of performance analyses, and its accompanying client, JDK Mission Control (JMC), can be of help to you. JFR specifically has been designed as an \u0026#34;always-on\u0026#34; event recording engine for usage in production environment. It doesn’t provide bespoke support for identifying blocked queue producers, though. 
BlockingQueue implementations such as ArrayBlockingQueue don’t use Java intrinsic locks (i.e. what you’d get when using the synchronized keyword), but rather locks based on the LockSupport primitives. These don’t show up in the \u0026#34;Lock Instances\u0026#34; view in JMC at this point.\nEmitting Custom Events One possible solution is to emit custom JFR events from within your own code whenever you’re trying to submit an item to a bounded queue at its maximum capacity. For this, you couldn’t use the put() method of the BlockingQueue interface, though, as it actually is blocking and you’d have no way to react to that.\nInstead, you’d have to rely on either offer() (which returns false when it cannot submit an item) or add() (which raises an exception). When the queue is full and you can’t submit another item, you’d instantiate your custom JFR event type, retry to submit the item for as long as it’s needed, and finally commit the JFR event. Needless to say that this kind of busy waiting is not only rather inefficient, you’d also have to remember to implement this pattern in all your blocking queue producers of your program.\nA better option, at least in theory, would be to use the JMC Agent. Part of the JDK Mission Control project, this Java agent allows you to instrument the byte code of existing methods, so that a JFR event will be emitted whenever they are invoked. The configuration of JMC Agent happens via an XML file and is rather straightforward. 
Here’s how instrumenting the put() method of the ArrayBlockingQueue type would look like:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 \u0026lt;?xml version=\u0026#34;1.0\u0026#34; encoding=\u0026#34;UTF-8\u0026#34;?\u0026gt; \u0026lt;jfragent\u0026gt; \u0026lt;config\u0026gt; \u0026lt;classprefix\u0026gt;__JFREvent\u0026lt;/classprefix\u0026gt; \u0026lt;allowtostring\u0026gt;true\u0026lt;/allowtostring\u0026gt; \u0026lt;allowconverter\u0026gt;true\u0026lt;/allowconverter\u0026gt; \u0026lt;/config\u0026gt; \u0026lt;events\u0026gt; \u0026lt;event id=\u0026#34;queue.Put\u0026#34;\u0026gt; \u0026lt;label\u0026gt;Put\u0026lt;/label\u0026gt; \u0026lt;description\u0026gt;Queue Put\u0026lt;/description\u0026gt; \u0026lt;path\u0026gt;Queues\u0026lt;/path\u0026gt; \u0026lt;stacktrace\u0026gt;true\u0026lt;/stacktrace\u0026gt; \u0026lt;class\u0026gt;java.util.concurrent.ArrayBlockingQueue\u0026lt;/class\u0026gt; \u0026lt;method\u0026gt; \u0026lt;name\u0026gt;put\u0026lt;/name\u0026gt; \u0026lt;descriptor\u0026gt;(Ljava/lang/Object;)V\u0026lt;/descriptor\u0026gt; \u0026lt;/method\u0026gt; \u0026lt;location\u0026gt;WRAP\u0026lt;/location\u0026gt; \u0026lt;/event\u0026gt; \u0026lt;/events\u0026gt; \u0026lt;/jfragent\u0026gt; With this agent configuration in place, you’d get an event for every invocation of put() though, no matter whether it actually is blocking or not. While you might be able to make some educated guess based on the duration of these events, that’s not totally reliable. For instance, you couldn’t be quite sure whether a \u0026#34;long\u0026#34; event actually is caused by blocking on the queue or by some GC activity.\nSo how about going one level deeper then? If you look at the implementation of ArrayBlockingQueue::put(), you’ll find that the actual blocking call happens through the await() method on the notFull Condition object. 
You could use JMC Agent to instrument that await() method, but this would give you events for every Condition instance, also for those not used by BlockingQueue implementations.\nFiltering \u0026#34;Thread Park\u0026#34; Events But this finally points us in the right direction: await() is implemented on top of LockSupport::park(), and the JVM itself emits a JFR event whenever a thread is parked. But how do you identify those \u0026#34;Java Thread Park\u0026#34; events which are actually triggered by blocking on a queue? If only there was a way to query and filter JFR events in a structured query language!\nTurns out there is. JFR Analytics lets you do exactly that: analysing JFR recording files using standard SQL. I haven’t worked that much on this project over the last year, but extending it for the use case at hand was easy enough. By means of the new HAS_MATCHING_FRAME() function it becomes trivial to identify the relevant events.\nJFR Analytics hasn’t been released to Maven Central yet, so you need to check out its source code and build it yourself. You then can use the SQLLine command line interface for examining your recordings:\njava --class-path \u0026#34;target/lib/*:target/jfr-analytics-1.0.0-SNAPSHOT.jar\u0026#34; \\ sqlline.SqlLine Then, within the CLI tool, \u0026#34;connect\u0026#34; to a recording file and change the output format to \u0026#34;vertical\u0026#34; for better readability of stack traces:\nsqlline\u0026gt; !connect jdbc:calcite:schemaFactory=org.moditect.jfranalytics.JfrSchemaFactory;schema.file=path/to/lock-recording.jfr dummy dummy sqlline\u0026gt; !outputformat vertical If you need a recording file to play with, check out this example project. It has a very simple main class with two threads: a producer thread which inserts 20 items per second into a blocking queue, and a consumer thread, which takes those items at a rate of ten items per second. 
Once the queue’s capacity has been reached, the producer will regularly block, as it can only insert ten items per second instead of 20. With JFR Analytics, the affected put() calls can be identified via the following query:\n1 2 3 4 5 6 7 8 SELECT \u0026#34;startTime\u0026#34;, \u0026#34;duration\u0026#34; / 1000000 AS \u0026#34;duration\u0026#34;, \u0026#34;eventThread\u0026#34;, TRUNCATE_STACKTRACE(\u0026#34;stackTrace\u0026#34;, 8) as \u0026#34;stack trace\u0026#34; FROM \u0026#34;jdk.ThreadPark\u0026#34; WHERE HAS_MATCHING_FRAME(\u0026#34;stackTrace\u0026#34;, \u0026#39;.*ArrayBlockingQueue\\.put.*\u0026#39;); Et voilà, the query returns exactly those thread park events emitted for any blocked put() call:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ... startTime 2023-01-02 18:42:57.594 duration 455 eventThread pool-1-thread-1 stack trace jdk.internal.misc.Unsafe.park(boolean, long) java.util.concurrent.locks.LockSupport.park():371 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block():506 java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool$ManagedBlocker):3744 java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool$ManagedBlocker):3689 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await():1625 java.util.concurrent.ArrayBlockingQueue.put(Object):370 dev.morling.demos.BlockingQueueExample$1.run():35 startTime 2023-01-02 18:42:58.097 duration 954 eventThread pool-1-thread-1 stack trace jdk.internal.misc.Unsafe.park(boolean, long) java.util.concurrent.locks.LockSupport.park():371 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block():506 java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool$ManagedBlocker):3744 java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool$ManagedBlocker):3689 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await():1625 java.util.concurrent.ArrayBlockingQueue.put(Object):370 
dev.morling.demos.BlockingQueueExample$1.run():35 ... Note how the stack traces are truncated so you can see the immediate caller, in this case the producer thread of the aforementioned example application. One thing to be aware of is that JFR applies a minimum threshold for capturing thread park events: 20 ms with the default configuration and 10 ms with the profile configuration. I.e. you would not know about any calls blocking shorter than that. You can adjust the threshold in your JFR configuration, but be aware of the potential overhead.\nEquipped with the information about any blocked invocations of put(), you now could take appropriate action; depending on the specific workload and its characteristics, you might for instance look into tuning your queue consumers, add more of them (when not in a sequencer scenario as with SQLite above), or maybe share the load across multiple machines. You also might increase the size of the queue, providing more wiggle room to accommodate short load spikes.\nTowards Real-Time Analysis of JFR Events All this happens after the fact though, through offline analysis of JFR recording files. An alternative would be to run this kind of analysis in realtime on live JFR data. The foundation for this is JFR event streaming which provides low-latency access to the JFR events of a running JVM.\nExpanding JFR Analytics into this direction is one of my goals for this year: complementing its current pull query capabilities (based on Apache Calcite) with push queries, leveraging Apache Flink as a stream processing engine. 
That way, blocked queue producers could trigger some kind of alert in a live production environment, for instance raised when the overall duration of blocked calls exceeds a given threshold in a given time window, indicating the need for intervention with a much lower delay than possible with offline analysis.\nTaking things even further, streaming queries could even enable predictive analytics; Flink’s pattern matching capabilities and the MATCH_RECOGNIZE clause could be used for instance to identify specific sequences of events which indicate that a full garbage collection is going to happen very soon. This information could be exposed via a health check, signalling to the load balancer in front of a clustered web application that affected nodes should not receive any more requests for some time, so as to shield users from long GC-induced response times.\nIf this sounds interesting to you, please let me know; I’d love to collaborate with the open-source community on this effort.\nMany thanks to Richard Startin for his feedback while working on this post!\n","id":28,"publicationdate":"Jan 3, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eJava’s \u003ca href=\"https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/BlockingQueue.html\"\u003e\u003ccode\u003eBlockingQueue\u003c/code\u003e\u003c/a\u003e hierarchy is widely used for coordinating work between different producer and consumer threads.\nWhen set up with a maximum capacity (i.e. a \u003cem\u003ebounded queue\u003c/em\u003e), no more elements can be added by producers to the queue once it is full, until a consumer has taken at least one element.\nFor scenarios where new work may arrive more quickly than it can be consumed, this applies means of back-pressure,\nensuring the application doesn’t run out of memory eventually, while enqueuing more and more work items.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Is your Blocking Queue... 
Blocking?","uri":"https://www.morling.dev/blog/is-your-blocking-queue-blocking/"},{"content":" As part of my new job at Decodable, I am also planning to contribute to the Apache Flink project (as Decodable’s fully-managed stream processing platform is based on Flink). Right now, I am in the process of familiarizing myself with the Flink code base, and as such I am of course building the project from source, too.\nFlink uses Apache Maven as its build tool. It comes with the Maven Wrapper, simplifying the onboarding experience for new contributors, who don’t need to have Maven installed upfront. The configured Maven version is quite old though, 3.2.5 from 2014. Not even coloured output on the CLI yet — Boo! So I tried to build Flink with the latest stable version of Maven, 3.8.6 at the time of writing, but ran into some issues doing so.\nSpecifically, there are several dependencies with repository information embedded into their POM files. This is generally considered a bad practice for libraries, as it will inject those repositories into the build of any consumers, e.g. causing slower build processes. In the case at hand, the situation is even worse, as Maven since version 3.8.1 blocks access to non-HTTPS repositories for security reasons. This means that your build will fail if any dependency pulls in an HTTP repository.\nDealing with this is a bit cumbersome, as it’s not always obvious which dependency is causing that issue. For Flink, I encountered two instances of that problem. First, a transitive dependency of the flink-connector-hive_2.12 module (message slightly adapted for readability):\n1 2 3 4 5 6 7 8 9 10 11 12 ... 
[ERROR] Failed to execute goal on project flink-connector-hive_2.12: Could not resolve dependencies for project org.apache.flink:flink-connector-hive_2.12:jar:1.17-SNAPSHOT: Failed to collect dependencies at org.apache.hive:hive-exec:jar:2.3.9 -\u0026gt; org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Failed to read artifact descriptor for org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Could not transfer artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde from/to maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: [ repository.jboss.org (http://repository.jboss.org/nexus/content/groups/public/, default, disabled), conjars (http://conjars.org/repo, default, releases+snapshots), apache.snapshots (http://repository.apache.org/snapshots, default, snapshots) ] ... There are three non-HTTPS repositories involved here which got blocked by Maven. Note that these are all the insecure repositories found in the dependency chain; they are not necessarily all related to this particular error.\nUnfortunately, there’s no good way of identifying exactly which dependency is pulling them into the build and which repository is the problem here. Instead, you need to analyse all the dependencies from the project root to the flagged dependency, including any potential parent POM(s). 
In the case at hand, the problematic repo is the \u0026#34;conjars\u0026#34; one, as defined in the parent POM of the org:apache:hive:hive-exec artifact, org:apache:hive.\nAs far as I am aware, there’s no way for overriding such dependency-defined repositories in a downstream build; the only way I’ve found is to define a repository with the same id in a custom settings.xml file, redefining its URL to make use of HTTPS:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 \u0026lt;?xml version=\u0026#34;1.0\u0026#34; encoding=\u0026#34;UTF-8\u0026#34;?\u0026gt; \u0026lt;settings xmlns=\u0026#34;http://maven.apache.org/SETTINGS/1.0.0\u0026#34; xmlns:xsi=\u0026#34;http://www.w3.org/2001/XMLSchema-instance\u0026#34; xsi:schemaLocation=\u0026#34;http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd\u0026#34;\u0026gt; \u0026lt;mirrors\u0026gt; \u0026lt;mirror\u0026gt; \u0026lt;id\u0026gt;conjars\u0026lt;/id\u0026gt; \u0026lt;name\u0026gt;conjars\u0026lt;/name\u0026gt; \u0026lt;url\u0026gt;https://conjars.org/repo\u0026lt;/url\u0026gt; \u0026lt;mirrorOf\u0026gt;conjars\u0026lt;/mirrorOf\u0026gt; \u0026lt;/mirror\u0026gt; \u0026lt;/mirrors\u0026gt; \u0026lt;/settings\u0026gt; Building Flink with this settings.xml file gets us beyond that error. As far as the other two repositories are concerned, the JBoss one is actually defined in the root POM of Apache Flink itself. I’m not sure whether it’s actually needed, but I have created a pull request for changing it to HTTPS, just in case. The \u0026#34;apache.snapshots\u0026#34; repo is defined in the parent POM of org:apache:hive and seems also not needed. You could override it in your settings.xml using its HTTPS URL as a measure of good practice, though.\nWith that settings.xml in place, I could build Apache Flink using the current Maven version 3.8.6. I noticed though that the build gets stuck for quite some time at the following step:\n1 2 3 4 5 6 ... 
[INFO] ------------------\u0026lt; org.apache.flink:flink-hadoop-fs \u0026gt;------------------ [INFO] Building Flink : FileSystems : Hadoop FS 1.17-SNAPSHOT [INFO] --------------------------------[ jar ]--------------------------------- Downloading from maven-default-http-blocker: http://0.0.0.0/net/minidev/json-smart/maven-metadata.xml ... The build wouldn’t fail, though: after exactly 75 seconds, it continues and runs to completion. So what’s causing this stall? Again, a non-HTTPS repository is the culprit, but in a slightly more confusing way. As it turns out, that transitive dependency to the net.minidev:json-smart library is declared using a version range by the artifact com.nimbusds:nimbus-jose-jwt: [1.3.1,2.3].\nSo Maven reaches out to all configured repositories in order to identify the latest version within that range. Now the hadoop-auth dependency (via its parent hadoop-main) pulls in the JBoss HTTP repository; and while access to this is prevented via Maven’s HTTP blocker, for some reason it still tries to connect to that blocker’s pseudo URL 0.0.0.0. After 75 seconds, this request eventually times out and the build continues. Go figure.\nFor preventing this issue, you have a few options:\nAdd the JBoss repository with HTTPS to your settings.xml (again, the definition in the root POM of your own build does not suffice for that)\nRun the build with the -o (offline) flag\nPin down the version of the artifact in the dependency management of your build, sidestepping the need for resolving the version range:\n1 2 3 4 5 6 7 8 9 10 11 ... \u0026lt;dependencyManagement\u0026gt; \u0026lt;dependencies\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;net.minidev\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;json-smart\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.3\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;/dependencies\u0026gt; \u0026lt;/dependencyManagement\u0026gt; ... 
This approach has the advantage that it can be done in a persistent way as part of the Maven POM itself; there’s no need for a custom settings.xml or build time parameters like the offline flag.\nIn any case, the build will now skip that 75-second pause. I.e. less time for drinking a coffee while the build is running, which is a good thing of course. Now you might wonder why exactly 75 seconds, and I have to admit it’s not fully clear to me.\nWhen running the build with a debugger attached (I know, I know, it’s not en vogue these days), I didn’t see any timeout configuration for establishing that HTTP connection. Some default TCP connection timeout on macOS perhaps? Interestingly, when trying with the latest Alpha of Maven 4, the build would only stall for ten seconds when trying to resolve that version range; Maven’s HTTP client is configured with a timeout of ten seconds as of this release.\nThe moral of the story? Don’t put repository information into published Maven POMs. If you publish something to Maven Central, all its dependencies should be resolvable from there, too. 
Luckily, Maven 4 will make this problem an issue of the past, bringing the long-awaited separation of build and consumer POMs.\nI’d also advise caution when it comes to adding version ranges to dependency definitions, it can have unexpected consequences as demonstrated above, and it’s probably not worth the hassle.\n","id":29,"publicationdate":"Dec 18, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs part of my \u003ca href=\"/blog/why-i-joined-decodable/\"\u003enew job\u003c/a\u003e at Decodable,\nI am also planning to contribute to the \u003ca href=\"https://flink.apache.org/\"\u003eApache Flink\u003c/a\u003e project\n(as Decodable’s fully-managed \u003ca href=\"https://www.decodable.co/product\"\u003estream processing platform\u003c/a\u003e is based on Flink).\nRight now, I am in the process of familiarizing myself with the Flink code base,\nand as such I am of course building the project from source, too.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Maven, What Are You Waiting For?!","uri":"https://www.morling.dev/blog/maven-what-are-you-waiting-for/"},{"content":" While working on a demo for processing change events from Postgres with Apache Flink, I noticed an interesting phenomenon: A Postgres database which I had set up for that demo on Amazon RDS, ran out of disk space. The machine had a disk size of 200 GiB which was fully used up in the course of less than two weeks.\nNow a common cause for this kind of issue are replication slots which are not advanced: in that case, Postgres will hold on to all WAL segments after the latest log sequence number (LSN) which was confirmed for that slot. Indeed I had set up a replication slot (via the Decodable CDC source connector for Postgres, which is based on Debezium). I then had stopped that connector, causing the slot to become inactive. The problem was though that I was really sure that there was no traffic in that database whatsoever! 
What could cause a WAL growth of ~18 GB/day then?\nWhat follows is a quick write-up of my investigations, mostly as a reference for my future self, but I hope this will come in handy for others in the same situation, too.\nThe Observation Let’s start with the observations I made. I don’t have the data and log files from the original situation any longer, but the following steps are enough to reproduce the issue. The first thing is to create a new Postgres database on Amazon RDS (I used version 14.5 on the free tier). Then get a session on the database and create a replication slot like this:\nSELECT * FROM pg_create_logical_replication_slot( \u0026#39;regression_slot\u0026#39;, \u0026#39;test_decoding\u0026#39;, false, true ); Now grab a coffee (or two, or three), and after some hours take a look at the metrics of the database in the RDS web console. \u0026#34;Free Storage Space\u0026#34; shows the following, rather unpleasant, picture:\nWe’ve lost more than two GB within three hours, meaning that the 20 GiB free tier database would run out of disk space within less than two days. Next, let’s take a look at the \u0026#34;Transaction Log Disk Usage\u0026#34; metric. It shows the problem in a very pronounced way:\nRoughly every five minutes the transaction log of the database grows by 64 MB. The \u0026#34;Write IOPS\u0026#34; metric further completes this picture. Again, every five minutes something causes write IOPS in that idle database:\nNow let’s see whether our replication slot actually is the culprit. 
By looking at the difference between its restart LSN (the earliest LSN which the database needs to retain in order to allow for this slot to resume) and the database’s current LSN, we see how many bytes of WAL this slot prevents from being freed while it is inactive:\nSELECT slot_name, pg_size_pretty( pg_wal_lsn_diff( pg_current_wal_lsn(), restart_lsn)) AS retained_wal, active, restart_lsn FROM pg_replication_slots; +-----------------+----------------+----------+---------------+ | slot_name | retained_wal | active | restart_lsn | |-----------------+----------------+----------+---------------| | regression_slot | 2166 MB | False | 0/4A05AF0 | +-----------------+----------------+----------+---------------+ Pretty much exactly the size of the WAL we saw in the database metrics. The big question now is of course what is causing that growth of the WAL? Which process is adding 64 MB to it every five minutes? So let’s take a look at the active server processes in Postgres, using the pg_stat_activity view:\nSELECT pid AS process_id, usename AS username, datname AS database_name, client_addr AS client_address, application_name, backend_start, state, state_change FROM pg_stat_activity WHERE usename IS NOT NULL; +--------------+------------+-----------------+------------------+------------------------+-------------------------------+---------+-------------------------------+ | process_id | username | database_name | client_address | application_name | backend_start | state | state_change | |--------------+------------+-----------------+------------------+------------------------+-------------------------------+---------+-------------------------------| | 370 | rdsadmin | \u0026lt;null\u0026gt; | \u0026lt;null\u0026gt; | | 2022-11-30 11:11:03.424359+00 | \u0026lt;null\u0026gt; | \u0026lt;null\u0026gt; | | 468 | rdsadmin | rdsadmin | 127.0.0.1 | PostgreSQL JDBC Driver | 2022-11-30 11:12:02.517528+00 | idle | 
2022-11-30 14:15:05.601626+00 | | 14760 | postgres | decodabletest | www.xxx.yyy.zzz | pgcli | 2022-11-30 14:04:58.765899+00 | active | 2022-11-30 14:15:06.820204+00 | +--------------+------------+-----------------+------------------+------------------------+-------------------------------+---------+-------------------------------+ This is interesting: besides our own session (user postgres), there are also two other sessions by a user rdsadmin. As we don’t do any data changes ourselves, they must be somehow related to the WAL growth we observe.\nThe Solution At this point I had enough information to do a meaningful Google search, and I came across the blog post \u0026#34;Postgres Logical Replication and Idle Databases\u0026#34; by Byron Wolfman, who ran into the exact same issue as I did. As it turns out, RDS is periodically writing heartbeats into that rdsadmin database:\nIn RDS, we write to a heartbeat table in our internal “rdsadmin” database every 5 minutes\nThis is one part of the explanation: in our seemingly inactive RDS Postgres database, there actually is some traffic. But how is it possible that this heartbeat causes such a large amount of WAL growth? Surely those heartbeat events won’t be 64 MB large?\nAnother blog post hinted at the next bit of information: as of Postgres 11, the WAL segment size (i.e. the size of individual files making up the WAL) can be configured. On RDS, this is changed from the default of 16 MB to 64 MB. This sounds familiar!\nThat knowledge center post also led me to the last missing piece of the puzzle, the archive_timeout parameter, which defaults to five minutes. 
This is what the excellent postgresqlco.nf site has to say about this option:\nWhen this parameter is greater than zero, the server will switch to a new segment file whenever this amount of time has elapsed since the last segment file switch, and there has been any database activity … Note that archived files that are closed early due to a forced switch are still the same length as completely full files.\nAnd this finally explains why that inactive replication slot causes the retention of that much WAL on an idle database: there actually are some data changes made every five minutes in the form of that heartbeat in the rdsadmin database. This in turn causes a new WAL segment of 64 MB to be created every five minutes. As long as that replication slot is inactive and doesn’t make any progress, all those WAL segments will be kept, (not so) slowly causing the database server to run out of disk space.\nTake Away The moral of the story? Don’t leave your replication slots unattended! There shouldn’t be any slots which are inactive for too long. For instance you could set up an alert based on the query above which notifies you if some slot retains more than 100 MB of WAL. And of course you should monitor your free disk space, too.\nThat being said, you still might be in for a bad surprise: under specific circumstances, even an active replication slot can cause unexpected WAL retention. If for instance large amounts of changes are being made to one database but a replication slot has been set up for another database which doesn’t receive any changes, that slot still won’t be able to make any progress.\nA common solution to that scenario is inducing some sort of artificial traffic into the database, as for instance supported by the Debezium Postgres connector. 
Note this doesn’t even require a specific table; periodically writing a message just to the WAL using pg_logical_emit_message() is enough:\nSELECT pg_logical_emit_message(false, \u0026#39;heartbeat\u0026#39;, now()::varchar); If you use a logical decoding plug-in which supports logical replication messages, like pgoutput since Postgres 14, then that’s all that’s needed for letting your replication slot advance within an otherwise idle database.\n","id":30,"publicationdate":"Nov 30, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhile working on a demo for processing change events from Postgres with Apache Flink,\nI noticed an interesting phenomenon:\nA Postgres database which I had set up for that demo on Amazon RDS, ran out of disk space.\nThe machine had a disk size of 200 GiB which was fully used up in the course of less than two weeks.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eNow a common cause for this kind of issue are replication slots which are not advanced:\nin that case, Postgres will hold on to all WAL segments after the latest log sequence number (\u003ca href=\"https://pgpedia.info/l/LSN-log-sequence-number.html\"\u003eLSN\u003c/a\u003e) which was confirmed for that slot.\nIndeed I had set up a replication slot (via the \u003ca href=\"https://www.decodable.co/connectors/postgres-cdc\"\u003eDecodable CDC source connector for Postgres\u003c/a\u003e, which is based on \u003ca href=\"https://debezium.io\"\u003eDebezium\u003c/a\u003e).\nI then had stopped that connector, causing the slot to become inactive.\nThe problem was though that I was really sure that there was no traffic in that database whatsoever!\nWhat could cause a WAL growth of ~18 GB/day then?\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"The Insatiable Postgres Replication Slot","uri":"https://www.morling.dev/blog/insatiable-postgres-replication-slot/"},{"content":" This is a quick rundown of the 
steps required for running JVM applications, built using Quarkus and GraalVM, on Render.\nRender is a cloud platform for running websites and applications. Like most other comparable services such as fly.io, it offers a decent free tier, which lets you try out the service without any financial commitment. Unlike most other services, with Render, you don’t need to provide a credit card in order to use the free tier. Which means there’s no risk of surprise bills, as often is the case with pay-per-use models, where a malicious actor could DDOS your service and drive up cost for consumed CPU resources or egress bandwidth indefinitely.\nIf the free tier limits are reached (see Free Plans for details), your services are shut down, until you either upgrade to a paid plan or the next month has started. This makes Render particularly interesting for personal projects and hobbyist use cases, for which you typically don’t have ops staff around who are looking 24/7 at dashboards and budget alerts and could take down the service in case of a DDOS attack.\nJava Applications on Render Render offers a PaaS-like model: when configuring an application, you point Render to a Git repository with your source code, and the platform will build and deploy it after each push to that repo. Unfortunately, Java is not amongst the supported languages right now. But Render also allows you to deploy applications via Docker, so that’s what we’ll use.\nAs an example project, I have created a very basic Quarkus-based web service. It is generated using code.quarkus.io and contains a single /hello REST endpoint. To make the best use of the resources of the constrained free tier, it is compiled into a native application using GraalVM. That way, it consumes way less memory than when running on the JVM. Feel free to use it for your own experiments.\nRender always builds deployed applications from source, i.e. there is no way for deploying a ready-made container image from a registry like Docker Hub. 
Now we could build our application using Docker on Render, but I have decided against that for two reasons:\nIt’s quite slow: the free tier allocates a rather limited CPU quota to build jobs, so building the container image for that simple Quarkus application takes more than ten minutes\nI like to have my application images in a container image registry, which for instance allows me to run exactly the same bits locally for debugging purposes\nIf you still would like to build a container image for your application directly on Render, check out the Quarkus documentation on multi-stage Docker builds. It describes how to build a Quarkus application within Docker, which is what you need to do in the absence of bespoke support for Java on Render.\nSo I ended up with the following flow for deploying that Quarkus application on Render:\nWhen a commit is pushed to the source repository (1), then a GitHub Action is triggered (2), which builds the application as a native binary, using GraalVM’s native-image tool. The resulting binary is packaged up as a container image, which is deployed to the Docker Hub registry (3). Once the image has been uploaded, a new deployment is triggered on Render (4). The deploy job fetches the container image from Docker Hub and builds the actual image for deployment (5), and finally the service is published to the outside world (6).\nConfiguration Details Now let’s dive into some specifics of the configuration on Render and GitHub. Once you have signed up for your Render account, go to the main dashboard and click the \u0026#34;New +\u0026#34; button for creating a new \u0026#34;Web Service\u0026#34;.\nYou then have two options: \u0026#34;Connect a repository\u0026#34; and \u0026#34;Public Git repository\u0026#34;. The former makes things a bit simpler to use, for instance by configuring all the webhook magic required for a tight integration between GitHub (or GitLab) and Render. 
It requires more permissions than I’m comfortable with though, one of them being \u0026#34;Act on your behalf\u0026#34;. So my recommendation is to go with the second option; it requires some more manual configuration, but it feels a bit safer to me. Specify the URL of your repository and click \u0026#34;Continue\u0026#34;:\nOn the following page, enter the following information:\nName: A unique name for your new application\nRegion: Choose where your application should be deployed\nEnvironment: Choose Docker here, then \u0026#34;Free\u0026#34; plan\nDockerfile Path (under \u0026#34;Advanced\u0026#34;): Specify ./src/main/docker/Dockerfile.render; this is a very simple Dockerfile which has the sole purpose of letting Render build an image for deployment; it simply is derived from the actual image with the application which is deployed to Docker Hub:\n1 FROM gunnarmorling/quarkus-on-render:latest Deploy Hook: Note down this generated URL, you will need it later when configuring the deployment trigger with GitHub Actions\nDocker Hub Access Token Next, create an access token for Docker Hub. This will be used for authenticating the GitHub Action when pushing an image to Docker Hub. Log into your Docker Hub account, click on your name at the upper right corner and choose \u0026#34;Account Settings\u0026#34;. Go to \u0026#34;Security\u0026#34; and click on \u0026#34;New Access Token\u0026#34;.\nSpecify a description for the token and choose \u0026#34;Read \u0026amp; Write\u0026#34; for its access permissions. On the next screen, make sure to copy the generated token, as it will be the only opportunity where you can see it.\nGitHub Actions The last part of the puzzle is setting up a GitHub Action which builds the application, pushes the container image with the application to Docker Hub and triggers a new deployment on Render. 
Navigate to your repository, click on the \u0026#34;Settings\u0026#34; tab and choose \u0026#34;Security\u0026#34; → \u0026#34;Secrets\u0026#34; → \u0026#34;Actions\u0026#34;.\nSet up the following three repository secrets:\nDOCKERHUB_TOKEN: The access token you just generated on Docker Hub\nDOCKERHUB_USERNAME: Your Docker Hub account name\nRENDER_DEPLOY_HOOK: The deploy hook URL from Render\nThese secrets will be used in the GitHub Action. The Action itself is a big wall of YAML, but most of the things should be fairly self-descriptive:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 name: ci on: push: branches: - \u0026#39;main\u0026#39; jobs: docker: runs-on: ubuntu-latest steps: - name: \u0026#39;Check out repository\u0026#39; (1) uses: actions/checkout@v3 - uses: graalvm/setup-graalvm@v1 (2) with: version: \u0026#39;latest\u0026#39; java-version: \u0026#39;17\u0026#39; components: \u0026#39;native-image\u0026#39; github-token: ${{ secrets.GITHUB_TOKEN }} - name: \u0026#39;Cache Maven packages\u0026#39; uses: actions/cache@v3.0.11 with: path: ~/.m2 key: ${{ runner.os }}-m2-${{ hashFiles(\u0026#39;**/pom.xml\u0026#39;) }} restore-keys: ${{ runner.os }}-m2 - name: \u0026#39;Build\u0026#39; (3) run: \u0026gt; ./mvnw -B --file pom.xml verify -Pnative -Dquarkus.native.additional-build-args=-H:-UseContainerSupport - name: Set up QEMU uses: docker/setup-qemu-action@v2 - name: Set up Docker Buildx (4) uses: docker/setup-buildx-action@v2 - name: Login to Docker Hub uses: docker/login-action@v2 with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_TOKEN }} - name: Build and push (5) uses: docker/build-push-action@v3 with: context: . 
push: true file: src/main/docker/Dockerfile.native tags: gunnarmorling/quarkus-on-render:latest - name: Deploy (6) uses: fjogeleit/http-request-action@v1 with: url: ${{ secrets.RENDER_DEPLOY_HOOK }} method: \u0026#39;POST\u0026#39; 1 Retrieve the source code of the application 2 Install GraalVM and its native-image tool 3 Build the project; the -Pnative build option instructs Quarkus to emit a native binary via GraalVM; more on the need for the -H:-UseContainerSupport option further below 4 Install Docker and log into Docker Hub 5 Build the container image and push it to Docker Hub; the used Dockerfile is the one generated by the Quarkus project creation wizard on code.quarkus.io; it packages the native binary on top of the ubi-minimal base image: FROM registry.access.redhat.com/ubi8/ubi-minimal:8.6 WORKDIR /work/ RUN chown 1001 /work \\ \u0026amp;\u0026amp; chmod \u0026#34;g+rwX\u0026#34; /work \\ \u0026amp;\u0026amp; chown 1001:root /work COPY --chown=1001:root target/*-runner /work/application EXPOSE 8080 USER 1001 CMD [\u0026#34;./application\u0026#34;, \u0026#34;-Dquarkus.http.host=0.0.0.0\u0026#34;] Note that setting the build context to . is vital in order to actually package the binary produced by the previous build step; without this, the Docker action would check out a fresh copy of the source repository itself\n6 Trigger a new deployment of the application on Render by invoking the deploy hook You can find the original YAML file here in my example repository. In fact, I am quite impressed how powerful GitHub Actions is by means of using ready-made actions for interacting with Docker, setting up GraalVM, invoking HTTP endpoints, and others.\nOne thing which deserves a special mention is the need for specifying the -H:-UseContainerSupport option when invoking the native-image tool via Quarkus.
This is a work-around for GraalVM bug #4757 which triggers an exception upon invocation of the method java.lang.Runtime::availableProcessors(). It seems the GraalVM code stumbles upon cgroup paths containing a colon, which apparently is the case in the Docker environment on Render (a similar bug, JDK-8272124, has been fixed in OpenJDK recently).\nBy disabling the container support when building the application, this issue is circumvented; the solution is not ideal though: when determining the number of available processors, any CPU quotas applied for the container will not be considered, but rather the number of cores from the host system will be returned (8 in the case of Render as per a quick test I did). This causes thread pools in the application, like the common fork-join pool, to be sized larger than necessary, potentially resulting in performance degradations at runtime. So let’s hope that issue in GraalVM will be fixed soon.\nAnd that’s all there is to it: at this point, you should have all the configuration in place for running a Java application, compiled into a native binary using Quarkus and GraalVM, on the Render cloud platform. Whenever you push a commit to the source repository, a new version of the application will be built, pushed as a container image to Docker Hub, and deployed on Render. The end-to-end execution time for that flow is ca. five minutes, about twice as fast as when building everything on Render itself.
To further improve build times, you’d have to invest in beefier build machines; while compilation times with GraalVM have improved quite a bit over the last few years, it’s still a rather time-consuming experience.\nCheck out my repository on GitHub for the complete source code of the example application, with GitHub Actions definition, Maven POM file, etc.\n","id":31,"publicationdate":"Nov 28, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis is a quick run down of the steps required for running JVM applications,\nbuilt using Quarkus and GraalVM, on Render.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://render.com/\"\u003eRender\u003c/a\u003e is a cloud platform for running websites and applications.\nLike most other comparable services such as \u003ca href=\"https://fly.io/\"\u003efly.io\u003c/a\u003e,\nit offers a decent free tier, which lets you try out the service without any financial commitment.\n\u003cem\u003eUnlike\u003c/em\u003e most other services,\nwith Render, you don’t need to provide a credit card in order to use the free tier.\nWhich means there’s no risk of surprise bills, as often is the case with pay-per-use models,\nwhere a malicious actor could DDOS your service and drive up cost for consumed CPU resources or egress bandwidth indefinitely.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Running a Quarkus Native Application on Render","uri":"https://www.morling.dev/blog/running-quarkus-native-app-on-render/"},{"content":" It’s my first week as a software engineer at Decodable, a start-up building a serverless real-time data platform! 
When I shared this news on social media yesterday, folks were not only super supportive and excited for me (thank you so much for all the nice words and wishes!), but some also asked about the reasons behind my decision for switching jobs and going to a start-up, after having worked for Red Hat for the last few years. That’s a great question indeed, and I thought I’d put down some thoughts in this post. To me, it boils down to three key aspects: the general field of work, the environment, and the team. In the following, I’ll drill a bit further into each of them.\nThe Space: Real-Time Stream Processing Over the last five years, I’ve worked on Debezium, a popular open-source platform for change data capture (CDC). It retrieves change events from the transaction logs of databases such as MySQL and Postgres and emits them in a uniform event format to consumers via data streaming platforms like Apache Kafka, Pulsar or Amazon Kinesis. Reacting to low-latency change events enables all kinds of very interesting use cases, ranging from replication to other databases or (cloud) data warehouses, over updating caches and search indexes, to continuous queries over your operational data, or migrating monolithic architectures to microservices.\nNow, CDC is an important part of data pipelines for implementing such use cases, but it’s not the only one. You need to reason about the sink side of your pipelines and how to get your data from the streaming platform into your target system. There’s many critical questions there, such as: How do you propagate type metadata? How do you handle changes to the schema of your data? How to deal with duplicate events? Another concern is processing data, as it flows through your pipelines; you might want to filter records based on specific criteria and patterns, apply format conversions, group and aggregate data, join multiple data streams, and more. 
Lastly, there are many other kinds of data sources besides CDC, such as sensor data in IoT scenarios, click streams from websites, APIs, and more.\nThis all is to say that I am really excited about the chance to take things to the next level and explore the field of stream processing at large, helping people to implement their streaming use cases end-to-end. Very often, once people have implemented their first low-latency data streaming use case and for instance observe data in their DWH within a second or two after a change has occurred in their operational database, there’s no going back, and they want this for everything. Of course it’s impossible to predict the future, but I think stream processing is at this point on the famous \u0026#34;hockey stick\u0026#34; curve right before it’s massively taking off, and it’s the perfect time to join this space.\nFrom a technical perspective, Apache Flink, an open-source \u0026#34;processing engine for stateful computations over unbounded and bounded data streams\u0026#34; is an excellent and proven foundation for this, and it’s a core technology behind Decodable. Getting my feet wet with Flink, learning about it in detail and hopefully contributing to it is one of the first things I’m planning to do. At the same time, I also think there’s lots of potential for further improving the user experience here, for instance by processing CDC events in a transactional way, smoothly handling schema changes, and much more. Exciting times!\nThe Environment: A Start-up Up to this point, I have mostly worked at large, established companies and enterprises during my career. Red Hat, where I’ve been for the last ten years, grew to more than 20,000 employees during that time. Other places, like the German e-commerce giant Otto Group, had even larger workforce sizes of 50,000 people and more.\nAs with everything, being with such a large company has its pros and cons.
On the upside, it’s a relatively safe bet, there’s brand recognition, you can approach and tackle huge undertakings as part of a big organization. At the same time, there tends to be quite a bit of process overhead, things can take a long time, there can be lots of politics, you need approval and buy-in for many things, etc. Note I am not saying that any of this is necessarily a bad thing (Ok, doing your travel expenses just sucks. Period.), lots of it makes sense and just is a reality in a large organization.\nThat all being said, I just felt that I want to gather some experience in a small environment, in a start-up company. I want to find out how it is to work in this kind of setting, being part of a small, hyper-focused team, working jointly towards one common goal and shared vision. Coming up with ideas, giving it a try, seeing what flies, and what doesn’t. Putting something minimal yet useful out there and quickly gathering user feedback. Having a good sense for your own impact. Seeing how the company grows and evolves. That’s the kind of sensation I am looking for and which I am hoping to find by working at Decodable.\nI could experience a first taste of the agility even before my first day at the company: \u0026#34;Would you feel comfortable to just buy a laptop of your choosing by yourself and expense it?\u0026#34; Sure thing! Some clicks and a few days later I had a very nice MacBook delivered to my doorstep. If you’ve been at bigger organizations, you’ll know how complicated such seemingly simple things like getting a new laptop can be.\nAt the same time, judging by my impressions during interviews, Decodable is a very mature start-up. Most folks have lots of experience, they are senior, in a very positive sense. Sure, there’s a ton of things on our plates, but there’s no expectation to work crazy hours. 
Many people here have families, and there’s a very healthy culture where it’s just normal that people have unforeseen situations where they need to pick up their kids on short notice, things like that. People are treated as the grown-ups they are, with lots of autonomy and trust by the leadership. Another key aspect for me is transparency: it’s one of the company’s core values, so everyone has the chance to know what’s going on (technically, business-wise, etc.), which gave me lots of confidence and trust when making the decision to join the team.\nThe Team: One of a Kind One of the clichés in the industry is: \u0026#34;It’s all about the people\u0026#34;. And yes, it is a cliché, but I’m also 100% convinced that it is true. You could work on the most amazing piece of technology, but if you don’t get along with the people around you, it won’t be an awful lot of fun. Or rather, it could be really bad.\nSo getting a vibe for the team and the people at Decodable was one of the most important things to me when I interviewed with them. And all I can say is that I was really impressed. Starting with the founder and CEO Eric Sammer, I had the opportunity to speak with about one third of the company’s employees during the interviewing process (talking to everyone is one of my personal onboarding goals, when do you ever get that chance?). I loved the passion, but also the respectfulness and sincerity of everyone. Needless to say, I’m deeply impressed with what the team has accomplished so far, since Decodable launched last year. I experienced Eric as a very considerate and mindful person, caring deeply about the concerns of the company’s employees. Plus, not only is he a legend in the data space, he’s also super well connected within Silicon Valley, opening up lots of doors for the company. Decodable being his second start-up will surely help us to avoid many mistakes.\nIn regards to the hiring process itself, it could probably be a topic for a separate blog post.
The experience was nothing but excellent, with everyone being very open and transparent, willing to answer any questions I had. It really wasn’t that much of a series of interviews, but rather really good two-way conversations which helped us to get to know each other and find out whether I would be a good fit for Decodable, and whether the company would be a good fit for me. All in all, I very quickly had a feeling that this is a group of people I want to work with. I’m sure the direction of the company and the product can and will be adjusted over time, but this is a team I can’t wait to work with to make this a success.\nOutlook So those are the three key reasons which made me join Decodable: the exciting field of data streaming, the start-up environment, and a highly competent and friendly team.\nIn case you’re wondering what exactly I will be doing – that’s something we’re still figuring out. I am a member of the engineering organization, so I will get my fingers onto Apache Flink, but of course also on Decodable’s SaaS product around it. But I’m also planning to continue my fair share of evangelization work and talk about technology and its applications in blog posts or conference sessions. I hope to share my input on the product, be part of customer conversations, and much more. For the beginning, I’ll mostly focus on learning and sharing feedback based on my perspective of being the \u0026#34;new guy\u0026#34; on the team.\nFully adhering to the start-up spirit, I’m sure things will be very much in flux and my responsibilities will shift over time. But that dynamic is exactly what I’m looking for by joining Decodable. Let’s do this!\n","id":32,"publicationdate":"Nov 3, 2022","section":"blog","summary":"It’s my first week as a software engineer at Decodable, a start-up building a serverless real-time data platform! 
When I shared this news on social media yesterday, folks were not only super supportive and excited for me (thank you so much for all the nice words and wishes!), but some also asked about the reasons behind my decision for switching jobs and going to a start-up, after having worked for Red Hat for the last few years.","tags":null,"title":"Why I Joined Decodable","uri":"https://www.morling.dev/blog/why-i-joined-decodable/"},{"content":" Kafka Connect, part of the Apache Kafka project, is a development framework and runtime for connectors which either ingest data into Kafka clusters (source connectors) or propagate data from Kafka into external systems (sink connectors). A diverse ecosystem of ready-made connectors has come to life on top of Kafka Connect, which lets you connect all kinds of data stores, APIs, and other systems to Kafka in a no-code approach.\nWith the continued move towards running software in the cloud and on Kubernetes in particular, it’s just natural that many folks also try to run Kafka Connect on Kubernetes. On first thought, this should be simple enough: just take the Connect binary and some connector(s), put them into a container image, and schedule it for execution on Kubernetes. As so often, the devil is in the details though: should you use Connect’s standalone or distributed mode? How can you control the lifecycle of specific connectors via the Kubernetes control plane? How to make sure different connectors don’t compete unfairly on resources such as CPU, RAM, or network bandwidth? In the remainder of this blog post, I’d like to explore running Kafka Connect on Kubernetes, what some of the challenges are for doing so, and how Kafka Connect could potentially be reimagined to become more \u0026#34;Kubernetes-friendly\u0026#34; in the future.\nStandalone or Distributed? If you’ve used Kafka Connect before, then you’ll know that it has two modes of execution: standalone and distributed. 
In the former, you configure Connect via property files which you pass as parameters during launch. There will be a single process which executes all the configured connectors and their tasks. In distributed mode, multiple Kafka Connect worker nodes running on different machines form a cluster onto which the workload of the connectors and their tasks is distributed. Configuration is done via a REST API which is exposed on all the worker nodes. Internally, a Connect-specific protocol (which itself is based on Kafka’s group membership protocol) is used for the purposes of coordination and task assignment.\nThe distributed mode is in general the preferred and recommended mode of operating Connect in production, due to its obvious advantages in regards to scalability (one connector can spawn many tasks which are executed on different machines), reliability (connector configuration and offset state is stored in Kafka topics rather than files in the local file system), and fault tolerance (if one worker node crashes, the tasks which were scheduled on that node can be transparently rebalanced to other members of the Connect cluster).\nThat’s also why Kafka users on Kubernetes typically opt for Connect’s distributed mode, as is the case, for instance, with Strimzi’s operator for Kafka Connect. But that’s not without its issues either, as now essentially two scheduling systems are competing with each other: Kubernetes itself (scheduling pods to compute nodes), and Connect’s worker coordination mechanism (scheduling connector tasks to Connect worker nodes). This becomes particularly apparent in case of node failures. Should the Kubernetes scheduler spin up the affected pods on another node in the Kubernetes cluster, or should you rely on Connect to schedule the affected tasks to another Connect worker node?
Granted, improvements in this area have been made, for instance in form of Kafka improvement proposal KIP-415 (\u0026#34;Incremental Cooperative Rebalancing in Kafka Connect\u0026#34;). It adds a new configuration property scheduled.rebalance.max.delay.ms, allowing you to defer rebalances after worker failures. But such a setting will always be a trade-off, and I think in general it’s fair to say that if there’s multiple components in a system which share the same responsibility (placement of workloads), that’s likely going to be a friction point.\nIssues with Kafka Connect on Kubernetes So let’s explore a bit more the challenges users often encounter when running Kafka Connect on Kubernetes. One general problem is the lack of awareness for running on Kubernetes from a Connect perspective.\nFor instance, consider the case of a stretched Kubernetes cluster, with Kubernetes nodes running in different regions of a cloud provider, or within different data centers. Let’s assume you have a source connector which ingests data from a database running within one of the regions. As you’re only interested in a subset of the records produced by that connector, you use a Kafka Connect single message transformation for filtering out a significant number of records. In that scenario, it makes sense to deploy that connector in local proximity to the database it connects to, so as to limit the data that’s transferred across network boundaries. But Kafka Connect doesn’t have any understanding of \u0026#34;regions\u0026#34; or related Kubernetes concepts like node selectors or node pools, i.e. 
you’ll lack the control needed for making sure that the tasks of that connector get scheduled onto Connect worker nodes running on the right Kubernetes nodes (a mitigation strategy would be to set up multiple Connect clusters, tied to specific Kubernetes node pools in the different regions).\nA second big source of issues is Connect’s model for the deployment of connectors, which in a way resembles the approach taken by Java application servers in the past: multiple, independent connectors are deployed and executed in shared JVM processes. This results in a lack of isolation between connectors, which can have far-reaching consequences in production scenarios:\nConnectors compete on resources: one connector or task can use up an unfairly large share of CPU, RAM or networking resources assigned to a pod, so that other connectors running on the same Connect worker will be negatively impacted; this could be caused by bugs or poor programming, but it also can simply be a result of different workload requirements, with one connector requiring more resources than others. While a rate limiting feature for Connect is being proposed via KIP-731 (which may eventually address the issue of distributing network resources more fairly), there’s no satisfying answer for assigning and limiting CPU and RAM resources when running multiple connectors on one shared JVM, due to its lack of application isolation.\nScaling complexities: when increasing the number of tasks of a connector (so as to scale out its load), it’s likely also necessary to increase the number of Connect workers, unless there were idle workers before; this process seems more complex and at the same time less powerful than it should be. 
For instance, there’s no way for ensuring that additional worker nodes would exclusively be used for the tasks of one particularly demanding connector.\nSecurity implications: as per the OpenJDK Vulnerability Group, \u0026#34;speculative execution vulnerabilities (e.g., Meltdown, Spectre, and RowHammer) cannot be addressed in the JDK. These hardware design flaws make complete intra-process isolation impossible\u0026#34;. Malicious connectors could leverage these attack vectors for instance to obtain secrets from other connectors running on the same JVM. Furthermore, some connectors rely on secrets (such as cloud SDK credentials) to be provided in the form of environment variables or Java system properties, which by definition are accessible by all connectors scheduled on the same Connect worker node.\nRisk of resource leaks: incorrectly implemented connectors can cause memory and thread leaks after they were stopped, resulting in out-of-memory errors after stopping and restarting them several times, potentially impacting other connectors and tasks running on the same Connect worker node.\nCan’t use Kubernetes health checks: as health checks (such as liveness probes) work on the container level, a failed health check would restart the container, and thus the Connect worker node with all its connectors, even if only one connector is actually failing.
On the other hand, when relying on Connect itself to restart failed connectors and/or tasks, that’s not visible at the level of the Kubernetes control plane, resulting potentially in a false impression of a good health status of a connector, while it actually is in a restarting loop.\nCan’t easily examine logs of a single connector: when examining the logs of a Kafka Connect pod, messages from multiple running connectors will potentially show up in an interleaved way, depending on the specific logger configurations; as log messages can be prefixed with the connector name, that’s not that much of an issue when analyzing logs in dedicated tools like Logstash or Splunk, but it can be challenging when looking at the raw pod logs on the command line or via a Kubernetes web console.\nCan’t run multiple versions of one connector: as connectors are solely identified by their classname, it’s not possible to set up a connector instance of a specific version in case there’s multiple versions of that connector present.\nLastly, a third category of issues with running Connect on Kubernetes stems from the inherently mutable design of the system and the ability to dynamically instantiate and reconfigure connectors at runtime via a REST API.\nWithout proper discipline, this can quickly lead to a lack of insight into the connector configuration applying at a given time (in Strimzi, this is solved by preferably deploying connectors via custom Kubernetes resources, rather than invoking the REST API directly). In fact, the REST API itself can be a source of issues: access to it needs to be secured in production use cases, and I’ve come across multiple reports over the years (and witnessed myself) where the REST API became unresponsive, while Connect itself still was running. It’s not exactly clear why this happened, but one potential cause could be a buggy connector, consuming 100% of CPU cycles, leaving not enough resources for the REST API worker threads.
Essentially, I think that such a control plane element like a REST API shouldn’t really be exposed on each member of a data plane, as represented by Connect worker nodes.\nBased on all these challenges, in particular those around lacking isolation between different connectors, many users of Kafka Connect stick to the practice of actually not deploying multiple connectors into shared worker clusters, but instead operate a dedicated cluster of Kafka Connect for each connector. This could be a cluster with a node count equal to the configured number of tasks, essentially resulting in a 1:1 mapping of tasks to worker processes. Some users also deploy a number of spare workers for fail-over purposes. In fact, that’s the recommendation we’ve been giving to users in the Debezium community for a long time, and it also tends to be a common choice amongst providers of managed Kafka Connect services. Another approach taken by some teams is to deploy specific Connect clusters per connector type, preventing interferences between different kinds of connectors.\nAll these strategies can help to run connectors safely and reliably, but the operational overhead of running multiple Connect clusters is evident.\nA Vision for Kubernetes-native Kafka Connect Having explored the potential issues with running Kafka Connect on Kubernetes, let’s finally discuss how Connect could be reimagined for being more Kubernetes-friendly. What are the parts that could remain? Which things would have to change? Many of the questions and shortcomings raised above – such as workload isolation, applying resource constraints, capability-based scheduling, lifecycle management – have been solved by Kubernetes at the pod level already, so how could that foundation be leveraged for Kafka Connect?\nTo put a disclaimer first: this part of this post may be a bit dissatisfying to read for some, as it merely describes an idea, I haven’t actually implemented any of this.
My line of thinking is to hopefully ignite a discussion in the community and gauge the general level of interest, perhaps even motivating someone in the community to follow through and make this a reality. At least, that’s the plan :)\nThe general idea is to keep all the actual runtime bits and pieces of Connect: that’s key to being able to run all the amazing existing connectors out there, which are implemented against Connect’s framework interfaces. All the semantics and behaviors, like converters and SMTs, retries, dead-letter queue support, the upcoming exactly-once support for source connectors (KIP-618), all that could just be used as is.\nBut the entire layer for forming and coordinating clusters of worker nodes and distributing tasks amongst them would be replaced by a Kubernetes operator. To quote the official docs, \u0026#34;operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop\u0026#34;. The overall architecture would look like this:\nIn this envisioned model for Kafka Connect, such an operator would spin up one separate Kubernetes pod (and thus JVM process) for each connector task of a connector. Conceptually, those task processes would be somewhat of a mixture between today’s Connect standalone and distributed modes. Like standalone mode in the sense that there would be no coordination amongst worker nodes and also no capability to dynamically reconfigure or start and stop a running task; each process/pod would run exactly one task in isolation, coordinated by the operator. Similar to distributed mode in the sense that there would be a read-only REST API for health information, and that connector offsets would be stored in a Kafka topic, so as to avoid any pod-local state.
There wouldn’t be the need for the configuration topic though, as the configuration would be passed upon start-up to the task pods (again akin to standalone mode today, e.g. by mapping a properties file to the pod), with the custom Kubernetes resources defining the connectors being the \u0026#34;system of record\u0026#34; for their configuration.\nFor this to work, the connector configuration needs to be pre-sliced into task-specific chunks. This could happen in two different ways, depending on the implementation of the specific connectors. For connectors which have a static set of tasks which doesn’t change at runtime (that’s the case for the Debezium connectors, for instance), the operator would deploy a short-lived pod on the Kubernetes cluster which runs the actual Connector implementation class and invokes its taskConfigs(int maxTasks) method. This could be implemented using a Kubernetes job, for instance. Once the operator has received the result (a list with one configuration map per task), the connector pod can be stopped again and the operator will deploy one pod for each configured task, passing its specific configuration to the pod.\nThings get a bit more tricky if connectors dynamically change the number and/or configuration of tasks at runtime, which also is possible with Connect. For instance, that’s the case for the MirrorMaker 2 connector. Such a connector typically spins up a dedicated thread upon start-up which monitors some input resource. If that resource’s state changes (say, a new topic to replicate gets detected by MirrorMaker 2), it invokes the ConnectorContext::requestTaskReconfiguration() method, which in turn lets Connect retrieve the task configuration from the connector. This requires a permanently running pod for that connector class. Right now, there’d be no way for the operator to know whether that connector pod can be short-lived (static task set) or must be long-lived (dynamic task set).
Either Connect itself would define some means of metadata for connectors to declare that information, or it could be part of the Kubernetes custom resource for a connector described in the next section.\nThe configuration of connectors would happen, the Kubernetes way, via custom resources. This could look rather similar to how Connect and connectors are deployed via CRs with Strimzi today; the only difference being that there’d be one CR which describes both Connect (and the resource limits to apply, the connector archive to run) and the actual connector configuration. Here’s an example of what that could look like (again, that’s a sketch of how such a CR could look; this won’t work with Strimzi right now):\napiVersion: kafka.strimzi.io/v1beta2 kind: KafkaConnector metadata: name: debezium-connect-cluster spec: version: 3.2.0 bootstrapServers: debezium-cluster-kafka-bootstrap:9092 config: config.providers: secrets config.providers.secrets.class: io.strimzi.kafka.KubernetesSecretConfigProvider group.id: connect-cluster offset.storage.topic: connect-cluster-offsets config.storage.topic: connect-cluster-configs status.storage.topic: connect-cluster-status connector: class: io.debezium.connector.mysql.MySqlConnector tasksMax: 1 database.hostname: mysql database.port: 3306 database.user: ${secrets:debezium-example/debezium-secret:username} database.password: ${secrets:debezium-example/debezium-secret:password} database.server.id: 184054 database.server.name: mysql database.include.list: inventory database.history.kafka.bootstrap.servers: debezium-cluster-kafka-bootstrap:9092 database.history.kafka.topic: schema-changes.inventory build: output: type: docker image: 10.110.154.103/debezium-connect-mysql:latest plugins: - name: debezium-mysql-connector artifacts: - type: tgz url:
https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/1.9.0.Final/debezium-connector-mysql-1.9.0.Final-plugin.tar.gz The operator would react to the creation, modification, or deletion of this resource, retrieve the (initial) task configuration as described above and spin up corresponding connector and task pods. To stop or restart a connector or task, the user would update the resource state accordingly, upon which the operator would stop and restart the affected pod(s).\nSuch an operator-based design addresses all the concerns for running Connect on Kubernetes identified above:\nOnly one component in charge of workload distribution: by removing Connect’s own clustering layer from the picture, the scheduling of tasks to compute resources is completely left to one component, the operator; it will determine the number and configuration of tasks to be executed and schedule a pod for each of them; regular health checks can be used for monitoring the state of each task, restarting failed task pods as needed; a degraded health state should be exposed if a connector task is in a retrying loop, so as to make this situation apparent at the Kubernetes level; if a pod crashes, it can be restarted by the operator on the same or another node of the Kubernetes cluster, not requiring any kind of task rebalancing from a Connect perspective. Node selectors could be used to pin a task to specific node groups, e.g. in a specific region or availability zone.\nOne JVM process and Kubernetes pod per task: by launching each task in its own process, all the isolation issues discussed above can be avoided, preventing multiple tasks from negatively impacting each other. If needed, Kubernetes resource limits can be put in place in order to effectively cap the resources available to one particular task, such as CPU and RAM, while also allowing to schedule all the task pods tightly packed onto the compute nodes, making efficient use of the available resources. 
As each process runs exactly one task, log files are easy to consume and analyze. Scaling out can happen by increasing a single configuration parameter in the CR, and a corresponding number of task pods will be deployed by the operator. Thread leaks become a non-issue too, as there would be no notion of stopping or pausing a task; instead, just the pod itself would be stopped for that purpose, terminating the JVM process running inside of it. On the downside, the overall memory consumption across all the tasks would be increased, as there would be no amortization of Connect classes loaded into JVM processes shared by multiple tasks. Considering the significant advantages of process-based isolation, this seems like an acceptable trade-off, just as Java application developers largely have moved on from the model of co-deploying several applications into shared application server instances.\nImmutable design: by driving configuration solely through Kubernetes resources and passing the resulting Connect configuration as parameters to the Connect process upon start-up, there’s no need for exposing a mutating REST API (there’d still be a REST endpoint exposing health information), making things more secure and potentially less complex internally, as the entire machinery for pausing/resuming, dynamically reconfiguring and stopping tasks could be removed. At any time, a connector’s configuration would be apparent by examining its CR, which ideally should be sourced from an SCM (GitOps).\nLooking further out into the future, such a design for making Kafka Connect Kubernetes-native would also allow for other, potentially very interesting explorations: for instance one could compile connectors into native binaries using GraalVM, resulting in a significantly lower consumption of memory and faster start-up times (e.g. when reconfiguring a connector and subsequently restarting the corresponding pod), making that model very interesting for densely packed Kubernetes environments. 
A buildtime toolkit like Quarkus could be used for producing specifically tailored executables, which run exactly one single connector task on top of the Connect framework infrastructure, a bit similar to how Camel-K works under the hood. Ultimately, such Kubernetes-native design could even open up the door to Kafka connectors being built in languages and runtimes other than Java and the JVM, similar to the route explored by the Conduit project.\nIf you think this all sounds exciting and should become a reality, I would love to hear from you. One aspect of specific interest will be which of the proposed changes would have to be implemented within Kafka Connect itself (vs. a separate operator project, for instance under the Strimzi umbrella), without disrupting non-Kubernetes users. In any case, it would be amazing to see the Kafka community at large take its steps towards making Connect truly Kubernetes-native and fully taking advantage of this immensely successful container orchestration platform!\nMany thanks to Tom Bentley, Tom Cooper, Ryanne Dolan, Neil Buesing, Mickael Maison, Mattia Mascia, Paolo Patierno, Jakub Scholz, and Kate Stanley for providing their feedback while writing this post!\n","id":33,"publicationdate":"Sep 6, 2022","section":"blog","summary":"Kafka Connect, part of the Apache Kafka project, is a development framework and runtime for connectors which either ingest data into Kafka clusters (source connectors) or propagate data from Kafka into external systems (sink connectors). 
A diverse ecosystem of ready-made connectors has come to life on top of Kafka Connect, which lets you connect all kinds of data stores, APIs, and other systems to Kafka in a no-code approach.","tags":null,"title":"An Ideation for Kubernetes-native Kafka Connect","uri":"https://www.morling.dev/blog/ideation-kubernetes-native-kafka-connect/"},{"content":" Kafka Connect is a key factor for the wide-spread adoption of Apache Kafka: a framework and runtime environment for connectors, it makes the task of getting data either into Kafka or out of Kafka solely a matter of configuration, rather than a bespoke programming job. There’s dozens, if not hundreds, of readymade source and sink connectors, allowing you to create no-code data pipelines between all kinds of databases, APIs, and other systems.\nThere may be situations though where there is no existing connector matching your requirements, in which case you can implement your own custom connector using the Kafka Connect framework. Naturally, this raises the question of how to test such a Kafka connector, making sure it propagates the data between the connected external system and Kafka correctly and completely. In this blog post I’d like to focus on testing approaches for Kafka Connect source connectors, i.e. connectors like Debezium, which ingest data from an external system into Kafka. Very similar strategies can be employed for testing sink connectors, though.\nUnit Tests One first obvious approach is implementing good old unit tests: simply instantiate the class under test (typically, your SourceConnector or SourceTask implementation), invoke its methods (for instance, SourceConnector::taskConfigs(), or SourceTask::poll()), and assert the return values.\nHere’s an example for such a test from kc-etcd, a simple source connector for etcd, which is a distributed key/value store, most prominently used by Kubernetes as its metadata storage. 
Note that kc-etcd isn’t meant to be a production-ready connector; I have written it primarily for learning and teaching purposes.\nThis test verifies that the connector produces the correct task configuration, based on a given number of maximum tasks of two:\npublic class EtcdSourceConnectorTest { @Test public void shouldCreateConfigurationForTasks() throws Exception { EtcdSourceConnector connector = new EtcdSourceConnector(); Map\u0026lt;String, String\u0026gt; config = new HashMap\u0026lt;\u0026gt;(); config.put( \u0026#34;clusters\u0026#34;, \u0026#34;etcd-a=http://etcd-a-1:2379,http://etcd-a-2:2379,http://etcd-a-3:2379;etcd-b=http://etcd-b-1:2379;etcd-c=http://etcd-c-1:2379\u0026#34; ); (1) connector.start(config); List\u0026lt;Map\u0026lt;String, String\u0026gt;\u0026gt; taskConfigs = connector.taskConfigs(2); (2) assertThat(taskConfigs).hasSize(2); (3) Map\u0026lt;String, String\u0026gt; taskConfig = taskConfigs.get(0); assertThat(taskConfig).containsEntry(\u0026#34;clusters\u0026#34;, \u0026#34;etcd-a=http://etcd-a-1:2379,http://etcd-a-2:2379,http://etcd-a-3:2379;etcd-b=http://etcd-b-1:2379\u0026#34;); (4) taskConfig = taskConfigs.get(1); assertThat(taskConfig).containsEntry(\u0026#34;clusters\u0026#34;, \u0026#34;etcd-c=http://etcd-c-1:2379\u0026#34;); } } 1 Configure the connector with three etcd clusters 2 Request the configuration for two tasks 3 The first connector task should handle the first two clusters 4 The second task should handle the remaining third cluster Things look similar when testing the actual polling loop of the connector’s task class. As this is about testing a source connector, we first need to do some data changes in the configured etcd cluster(s), before we can assert the corresponding SourceRecords that are emitted by the task. As part of kc-etcd, I’ve implemented a very basic testing harness named kcute (\u0026#34;Kafka Connect Unit Testing\u0026#34;) which simplifies that process a bit. 
Here’s an example test from kc-etcd, based on kcute:\npublic class EtcdSourceTaskTest { @RegisterExtension (1) public static final EtcdClusterExtension etcd = EtcdClusterExtension.builder() .withNodes(1) .build(); @RegisterExtension (2) public TaskRunner taskRunner = TaskRunner.forSourceTask(EtcdSourceConnectorTask.class) .with(\u0026#34;clusters\u0026#34;, \u0026#34;test-etcd=\u0026#34; + etcd.clientEndpoints().get(0)) .build(); @Test public void shouldHandleAllTypesOfEvents() throws Exception { Client client = Client.builder() (3) .keepaliveWithoutCalls(false) .endpoints(etcd.clientEndpoints()) .build(); KV kvClient = client.getKVClient(); long currentRevision = getCurrentRevision(kvClient); // insert ByteSequence key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); ByteSequence value = ByteSequence.from(\u0026#34;value-1\u0026#34;.getBytes()); kvClient.put(key, value).get(); // update key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); value = ByteSequence.from(\u0026#34;value-1a\u0026#34;.getBytes()); kvClient.put(key, value).get(); // delete key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); kvClient.delete(key).get(); (4) List\u0026lt;SourceRecord\u0026gt; records = taskRunner.take(\u0026#34;test-etcd\u0026#34;, 3); (5) // insert SourceRecord record = records.get(0); assertThat(record.sourcePartition()).isEqualTo(Collections.singletonMap(\u0026#34;name\u0026#34;, \u0026#34;test-etcd\u0026#34;)); assertThat(record.sourceOffset()).isEqualTo(Collections.singletonMap(\u0026#34;revision\u0026#34;, ++currentRevision)); assertThat(record.keySchema()).isEqualTo(Schema.STRING_SCHEMA); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.valueSchema()).isEqualTo(Schema.STRING_SCHEMA); 
assertThat(record.value()).isEqualTo(\u0026#34;value-1\u0026#34;); // update record = records.get(1); assertThat(record.sourceOffset()).isEqualTo(Collections.singletonMap(\u0026#34;revision\u0026#34;, ++currentRevision)); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isEqualTo(\u0026#34;value-1a\u0026#34;); // delete record = records.get(2); assertThat(record.sourceOffset()).isEqualTo(Collections.singletonMap(\u0026#34;revision\u0026#34;, ++currentRevision)); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isNull(); } } 1 Set up an etcd cluster using the JUnit extension provided by the jetcd client project 2 Set up the task under test using kcute 3 Obtain a client for etcd and do some data changes 4 Retrieve three records for the specified topic via kcute 5 Assert the emitted SourceRecords corresponding to the data changes done before in etcd Now one could argue about whether this test is a true unit test or not, given it launches and relies on an instance of an external system in the form of etcd. My personal take on these matters is to be pragmatic here, as a) there’s a difference to true end-to-end integration tests as discussed in the next section, and b) approaches to mock external systems usually are not worth the effort or, worse, result in poor tests, due to incorrect assumptions when implementing the mocked behavior.\nThis testing approach works very well in general; in particular it doesn’t require you to start Apache Kafka (and ZooKeeper), nor Kafka Connect, resulting in very fast test execution times and a great dev experience when creating and running these tests in your IDE.\nBut there are some limitations, too. Essentially, we end up emulating the behavior of the actual Kafka Connect runtime in our testing harness. 
This can become tedious when more advanced Connect features are required for a given test, for instance retrying/restart logic, the dynamic reconfiguration of connector tasks while the connector is running, etc. Ideally, there’d be a testing harness with all these capabilities provided as part of Kafka Connect itself (similar in spirit to the TopologyTestDriver of Kafka Streams), but in the absence of that, we may be better off for certain tests by deploying our source connector into an actual Kafka Connect instance and running assertions against the topic(s) it writes to.\nIntegration Tests When it comes to setting up the required infrastructure for integration tests in Java, the go-to solution these days is the excellent Testcontainers project. So let’s see what it takes to test a custom Kafka connector using Testcontainers.\nAs far as Kafka itself is concerned, there’s a module for that coming with Testcontainers, based on Confluent Platform. Alternatively, you could use the Testcontainers module from the Strimzi project, which provides you with plain upstream Apache Kafka container images. For Kafka Connect, we provide a Testcontainers integration as part of the Debezium project, offering an API for registering connectors and controlling their lifecycle.\nNow, unfortunately, the application-server-like deployment model of Kafka Connect poses a challenge when it comes to testing a connector which is built as part of the current project itself. For each connector plug-in, Connect expects a directory on its plug-in path which contains all the JARs of the connector itself and its dependencies. I’m not aware of any kind of \u0026#34;exploded mode\u0026#34;, where you could point Connect to a directory which contains a connector’s class files and its dependencies in JAR form.\nThis means that the connector must be packaged into a JAR file as part of the test preparation. 
In order to make integration tests friendly towards being run from within an IDE, this should happen programmatically within the test itself. That way, any code changes to the connector will be picked up automatically the next time the test runs, without having to run the project’s Maven build. The entire code for doing this is a bit too long (and boring) for sharing it in this blog post, but you can find it in the kc-etcd repository on GitHub.\nHere are the key parts of an integration test based on that approach, though:\npublic class EtcdConnectorIT { private static Network network = Network.newNetwork(); (1) private static KafkaContainer kafkaContainer = new KafkaContainer(DockerImageName.parse(\u0026#34;confluentinc/cp-kafka:7.2.0\u0026#34;)) .withNetwork(network); (2) public static DebeziumContainer connectContainer = new DebeziumContainer(\u0026#34;debezium/connect-base:1.9.5.Final\u0026#34;) .withFileSystemBind(\u0026#34;target/kcetcd-connector\u0026#34;, \u0026#34;/kafka/connect/kcetcd-connector\u0026#34;) .withNetwork(network) .withKafka(kafkaContainer) .dependsOn(kafkaContainer); (3) public static EtcdContainer etcdContainer = new EtcdContainer(\u0026#34;gcr.io/etcd-development/etcd:v3.5.4\u0026#34;, \u0026#34;etcd-a\u0026#34;, Arrays.asList(\u0026#34;etcd-a\u0026#34;)) .withNetworkAliases(\u0026#34;etcd\u0026#34;) .withNetwork(network); @BeforeAll public static void startContainers() throws Exception { createConnectorJar(); (4) Startables.deepStart(Stream.of( kafkaContainer, etcdContainer, connectContainer)) .join(); } @Test public void shouldHandleAllTypesOfEvents() throws Exception { Client client = Client.builder() .endpoints(etcdContainer.clientEndpoint()).build(); (5) ConnectorConfiguration connector = 
ConnectorConfiguration.create() .with(\u0026#34;connector.class\u0026#34;, \u0026#34;dev.morling.kcetcd.source.EtcdSourceConnector\u0026#34;) .with(\u0026#34;clusters\u0026#34;, \u0026#34;test-etcd=http://etcd:2379\u0026#34;) .with(\u0026#34;tasks.max\u0026#34;, \u0026#34;2\u0026#34;) .with(\u0026#34;key.converter\u0026#34;, \u0026#34;org.apache.kafka.connect.storage.StringConverter\u0026#34;) .with(\u0026#34;value.converter\u0026#34;, \u0026#34;org.apache.kafka.connect.storage.StringConverter\u0026#34;); (6) connectContainer.registerConnector(\u0026#34;my-connector\u0026#34;, connector); connectContainer.ensureConnectorTaskState(\u0026#34;my-connector\u0026#34;, 0, State.RUNNING); KV kvClient = client.getKVClient(); (7) // insert ByteSequence key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); ByteSequence value = ByteSequence.from(\u0026#34;value-1\u0026#34;.getBytes()); kvClient.put(key, value).get(); // update key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); value = ByteSequence.from(\u0026#34;value-1a\u0026#34;.getBytes()); kvClient.put(key, value).get(); // delete key = ByteSequence.from(\u0026#34;key-2\u0026#34;.getBytes()); kvClient.delete(key).get(); (8) List\u0026lt;ConsumerRecord\u0026lt;String, String\u0026gt;\u0026gt; records = drain(getConsumer(kafkaContainer), 3); // insert ConsumerRecord\u0026lt;String, String\u0026gt; record = records.get(0); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isEqualTo(\u0026#34;value-1\u0026#34;); // update record = records.get(1); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isEqualTo(\u0026#34;value-1a\u0026#34;); // delete record = records.get(2); assertThat(record.key()).isEqualTo(\u0026#34;key-2\u0026#34;); assertThat(record.value()).isNull(); } } 1 Set up Apache Kafka in a container using the Testcontainers Kafka module 2 Set up Kafka Connect in a container, mounting the target/kcetcd-connector 
directory onto the plug-in path; as part of the project’s Maven build, all the dependencies of the kc-etcd connector are copied into that directory 3 Set up etcd in a container 4 Package the connector classes from the target/classes directory into a JAR and add that JAR to the mounted plug-in directory; the complete source code for this can be found here 5 Configure an instance of the etcd source connector, using String as the key and value format 6 Register the connector, then block until its tasks have reached the RUNNING state 7 Do some changes in the source etcd cluster 8 Using a regular Kafka consumer, read three records from the corresponding Kafka topic and assert the keys and values (complete code here) And that’s all there is to it; we now have a test which packages our source connector, deploys it into Kafka Connect and asserts the messages it sends to Kafka. While this is definitely more time-consuming to run than the simple test harness shown above, this true end-to-end approach tests the connector in the actual runtime environment, verifying its behavior when executed via Kafka Connect, just as it would be the case when running the connector in production later on.\nWrap-Up In this post, we’ve discussed two approaches for testing Kafka Connect source connectors: plain unit tests, \u0026#34;manually\u0026#34; invoking the methods of the connector/task classes under test, and integration tests, deploying a connector into Kafka Connect and verifying its behavior there via Testcontainers.\nThe former approach provides you with faster turnaround times and shorter feedback cycles, whereas the latter approach gives you the confidence of testing a connector within the actual Kafka Connect runtime environment, at the cost of a more complex infrastructure set-up and longer test execution times. 
While we’ve focused on testing source connectors in this post, both approaches could equally be applied to sink connectors, with the only difference being that you’d feed records to the connector (either directly or by writing to a Kafka topic) and then observe and assert the expected state changes of the sink system in question.\nYou can find the complete source code for this post, including some parts omitted here for brevity, in the kc-etcd repository on GitHub. If you think that having a test harness like kcute for unit testing connectors is a good idea and it’s something you’d like to contribute to, then please let me know. Ultimately, this could be extracted into its own project, independent from kc-etcd, or even be upstreamed to the Apache Kafka project proper, reusing as much as possible the actual Connect code, sans the bits for \u0026#34;deploying\u0026#34; connectors via a separate process.\nMany thanks to Hans-Peter Grahsl and Kate Stanley for their feedback while writing this blog post!\n","id":34,"publicationdate":"Aug 25, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://kafka.apache.org/documentation/#connect\"\u003eKafka Connect\u003c/a\u003e is a key factor for the wide-spread adoption of Apache Kafka:\na framework and runtime environment for connectors,\nit makes the task of getting data either into Kafka or out of Kafka solely a matter of configuration,\nrather than a bespoke programming job.\nThere’s dozens, if not hundreds, of readymade source and sink connectors,\nallowing you to create no-code data pipelines between all kinds of databases, APIs, and other systems.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThere may be situations though where there is no existing connector matching your requirements,\nin which case you can \u003ca href=\"https://kafka.apache.org/documentation/#connect_development\"\u003eimplement your own\u003c/a\u003e custom 
connector using the Kafka Connect framework.\nNaturally, this raises the question of how to test such a Kafka connector,\nmaking sure it propagates the data between the connected external system and Kafka correctly and completely.\nIn this blog post I’d like to focus on testing approaches for Kafka Connect \u003cem\u003esource\u003c/em\u003e connectors,\ni.e. connectors like \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e, which ingest data from an external system into Kafka.\nVery similar strategies can be employed for testing sink connectors, though.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Testing Kafka Connectors","uri":"https://www.morling.dev/blog/testing-kafka-connectors/"},{"content":" Every so often, I come across some conference talk which is highly interesting in terms of its actual contents, but which unfortunately is presented in a less than ideal way. I’m thinking of basic mistakes here, such as the presenter primarily looking at their slides rather than at the audience. I’m always feeling a bit sorry when this happens, as I firmly believe that everyone can do good and even great talks, just by being aware of — and thus avoiding — a few common mistakes, and sticking to some simple principles.\nNow, who am I to give any advice on public speaking? Indeed I’m not a professional full-time speaker, but I do enjoy presenting on technologies which I am working on or with as part of my job. Over time, I’ve come to learn about a few techniques which I felt helped me to give better talks. A few simple things, which can be easy to get wrong, but which make a big difference for the perception of your talk. Do I always stick to them myself? I try my best, but sometimes, I fail ¯\\_(ツ)_/¯.\nSo, without further ado, here’s ten tips and techniques for making your next conference talk suck a bit less.\n1. 
💦 Rehearse, Rehearse, Rehearse In particular if you have done a few talks already and you start feeling comfortable, it can be tempting to think you could just wing it and skip the rehearsal for your next one. After all, it can feel weird to be alone in your room and speak aloud all by yourself. I highly recommend not falling for that: rehearsing a talk is absolutely vital for making it successful. It will help you to develop a consistent line of argument and identify any things you otherwise may forget to mention. Only proper rehearsing will give you that natural flow you want to have for a talk.\nAlso, it will help you with the right timing of your talk: you don’t want to finish 20 min ahead of time, nor reach the end of your presentation slot with half of your slides remaining. If this happens, it usually means folks haven’t rehearsed once and it’s not a good position to be in. For a new talk, I usually will do three rehearsal runs before presenting it at an event. I will also do a rehearsal run if I repeat an earlier talk after some months, as it’s too easy to forget some important point otherwise.\nWhen doing a rehearsal, it’s a good idea to note down some key timestamps, such as when you transition to a demo. This will come in handy for instance for identifying sections you could shorten if you realize the talk is too long in its initial form.\n2. 🎬 Start With a Mission How to start a talk well could be an entire topic for its own post. After all, the first few seconds decide whether folks will be excited about your talk and pay attention, or rather pull out their laptop and check their emails. What I’ve found to work well for me is starting with a mission. I.e. I’ll often present a specific problem and make the case for how listening to that talk will help to address that problem. Needless to say that the problem should be relevant to the audience, i.e. 
it’s key to motivate why and how it matters to them, and how learning about the solution will benefit them. Don’t focus on the thing you want to talk about, focus on a challenge your audience has and how your talk will help them to overcome that.\nAnother approach is to present the key learnings (for instance three, see below) which the audience will take away from the talk. While this may sound similar to an agenda slide, the framing is different: it’s taking the perspective of the listener and what’s in it for them by sticking through your session. Don’t lead with your personal introduction; if you’re known in the field, people don’t care. And if you’re not, well, they probably also won’t care. In any case, telling much about yourself is not what will attract people to your talk. I usually have a very brief intro slide after discussing the mission or key learnings.\n3. 📖 Tell a Story Good talks tell a story, i.e. there’s a meaningful progression in terms of what you tell, starting with some setting and context, perhaps with some challenge or drama (\u0026#34;And this is when our main production server failed\u0026#34;), and of course a happy ending (\u0026#34;With the new solution we can fail-over to a stand-by in less than a second\u0026#34;).\nNow it doesn’t literally have to be a story (although it can be, as for instance in my talk To the Moon and Beyond With Java 17 APIs!), but you should make sure that there is a logical order of the things you discuss, for instance in a temporal or causal sense, and you should avoid jumping back and forth between different things. The latter for instance can happen due to insufficient rehearsal, forcing you to make a specific point too late during the talk, as you forgot to bring it up at the right moment. Also, for each discussion point and slide there should be a very specific reason for having it in your deck. I.e. 
it should form a cohesive unit, rather than being a collection of random unrelated talking points.\nOther storytelling techniques can be employed to great effect as well, such as doing a quick wrap-up when finishing a key section of your session, or adding little \u0026#34;side quests\u0026#34; for things you really want to mention but which are not strictly related to the main storyline.\nIn terms of crafting a story, I try to start early and collect input over a longer period of time, typically using a mind map. This allows you to identify and gather the most interesting aspects of a given topic, also touching on points which perhaps came up in a revelation you had a while ago. You’ll be less likely to have that breadth of contents at your disposal when starting the day before the presentation. This is not to say that you should use every single bit of information you’ve collected, but starting from a broad foundation allows you to select the most relevant and insightful bits.\n4. 👀 Look at the Audience, Not Your Slides As mentioned at the beginning, one of my pet peeves is presenters turning their back (or side) to the audience and looking towards their slides projected next to them. This creates a big disconnect with your audience. The same applies to the slides on the laptop in front of you; avoid looking at them as much as you can. Instead, try to have as much eye contact with the audience as possible; it makes a huge difference in terms of perception and quality of your talk. Putting a sticker onto your screen can be a helpful reminder. Only if you actually speak to the audience will it be an engaging and immersive experience for them. It’s extra bad if you don’t use a microphone, say at a local meet-up, as it means people will be able to understand you much worse.\nNow why are folks actually looking at their slides? 
I think it’s generally an expression of feeling a bit insecure or uncomfortable, and in particular the concern to forget to mention an important point. To me, the only viable solution here is that you really need to memorize what you want to say, in which case you’ll be able to make your points without having to read anything from your slides. Your slides are not your speaker notes!\n5. 🧹 Put Less Text on Your Slides. Much Less In terms of what should be on slides, this again could be a topic for its own blog post. In general, the fewer words, the better. Note I’m not suggesting you need to go image-only slides TED talk style, but you should minimize the amount of text on slides as much as possible. The reason being that folks will either listen to you or read what’s on your slides, but hardly both. Which means that either your effort for putting the text on the slides is wasted (bad), or folks don’t actually get what you’re telling them (worse). So if you think you’ve removed enough, remove some more. And then some more. This also allows you to make the font size big enough, so that folks actually can read those few items which remain.\nWhat I personally like to have on slides the most is diagrams, charts, sketches, and the like. Anything visual really. Which also brings up one exception to the \u0026#34;Don’t look at your slides\u0026#34; rule: if you actually explain a visual, elaborating a particular part for instance, then shortly turning towards the slide and pointing to some element of it can make sense.\nOn a related note, I recommend not relying on having access to your speaker notes during a talk. While technically it may be possible to show the notes on your laptop and the actual slides on the projector, this will fall apart when you do a live demo, where you really need to work with a mirrored set-up. Think of speaker notes like cheat sheets back in school: the value is in writing them, not in reading them. 
By the time you’ll present your talk, you’ll have memorized what’s on your notes. Make use of them for developing the story line for each slide, and of course they will also be useful when coming back to a talk after a few months.\n6. ✂️ Tailor the Talk Towards Your Audience I don’t see that one done wrong too often, but it’s worth pointing out: a talk should actually match its audience. So if for instance you talk to users of some technology, focussing on use cases of it makes sense, or on how to run it in production etc. Whereas this audience probably won’t care as much about implementation details (as much as you may want to talk about how you solved that one tricky technical challenge using some clever approach). If, on the other hand, you present about the same technology to a conference geared towards builders of tech in that space, diving into those gory details would be highly attractive for the audience.\nThat’s why I focus heavily on use cases when talking about Debezium at developer conferences. Whereas when I had the opportunity to present on Debezium and change data capture (CDC) during an online talk series of Carnegie Mellon’s database group, I centered the talk around implementation challenges and improvements databases could make to better support CDC use cases.\nKey here is expectation management: make sure you know what kind of audience you’re going to speak to and adjust your talk accordingly. Oftentimes, the same basic talk can work well for different settings and audiences, just with framing things the right way and putting the focus on the right parts, for instance by swapping a few slides in and out.\n7. 3️⃣ Rule of Three Over time I’ve become a big believer in the rule of three; for instance, have three main learnings or ideas for a talk. If it’s a talk about a new product release, share three key features. On one slide, have three main points to discuss. When you share examples, give three of them. And so on.\nWhy three? 
It hits the sweet spot of providing representative information and data, letting you enough time to sufficiently dive into each of them, and not being too extensive or repetitive. Your audience can digest only so much input in a given session, so they’ll be better served if you tell them about three things which they can take in and remember, instead of telling them about ten things which they all quickly forget or even miss to begin with.\n8. 🚑 Have a Fallback Plan for Demos Live demos can be a great addition to any technology-centered conference talk. Actually showing how the thing you discuss works can be an eye-opener and be truly impressive. Not so much though if the demo gods aren’t with you. And we’ve all been there: poor network at the conference venue doesn’t let you download that one container image you’re missing, you have a compile error in your code and in the heat of the moment you can’t find out what’s wrong, etc.\nTrying to analyze problems in front of a conference audience can be very stressful, and frankly speaking, it’s quickly getting boring or even weird for the audience. So you always should have a fallback plan in case things don’t go as expected with your demo. My go-to strategy is to have a pre-recorded video of the demo which I can play back, instead of wasting minutes trying to solve any issues. I’ll still live-comment that video, which makes it a bit more interactive rather than collectively listening to my pre-recorded voice. For instance I can pause the video and expand on some specific point.\n9. 💪 Play to Your Strengths Some personal habits are really hard to change. One example: I tend to speak fast, very fast, during talks. I’m well aware of that, listeners told me, a coach told me, I saw it myself in recordings. But it’s somehow impossible for me to change it. 
If I really force myself to speak slower, it works for a while, but typically I’m back to my usual speed soon after.\nSo I’ve decided to not fight against this any longer and just live with it. The reason being that I feel the high pace also gives me some energy and flow which I hope becomes apparent to the audience. I believe viewers (and I) are better off with me doing a passionate talk which may be a bit too fast, instead of one which has a slower pace but lacks the right amount of energy.\nI think that’s generally applicable: You don’t like talking about concepts, but love showing how things work in action? Then shorten the former and make more room for a live demo. You enjoy discussing live questions? Make more time for the Q\u0026amp;A. This all is to say, instead of excessively focussing on things you perceive as your weak sides, rather leverage your strong suits.\n(Yes, the irony of this being part of a post focussing on avoiding basic mistakes is not lost on me.)\n10. 🔄 Circle Back I’ve found it works great if you circle back to a point you made earlier during a talk. The most apparent way of doing this is coming back to the mission statement you set out for the talk at the beginning. You should be able to make the point that the things you presented actually satisfy that original mission. Or you have some sort of catch phrase to which you circle back a few times; repetition can help to drive home a point. Just don’t overdo it, as it can become annoying otherwise. Personally, I like the notion of circling back as it provides some means of closure which is a pleasant sensation.\nAnd that’s it, ten basic tips for making your next talk suck a bit less. You probably won’t get an invitation for doing your first TED talk just by applying them, but they may help you with your next tech conference or meet-up presentation. 
As a presenter, you should think of yourself as a service provider to the audience: they pay with their time (and usually a fair amount of money) to attend your talk, so you should put in the effort to make sure they have a great time and experience.\nWhat are your presentation tips and tricks? Let me know in the comments below!\nMany thanks to Hans-Peter Grahsl, Marta Paes, and Robin Moffatt for their feedback while writing this blog post!\n","id":35,"publicationdate":"Jun 23, 2022","section":"blog","summary":"Every so often, I come across some conference talk which is highly interesting in terms of its actual contents, but which unfortunately is presented in a less than ideal way. I’m thinking of basic mistakes here, such as the presenter primarily looking at their slides rather than at the audience. I’m always feeling a bit sorry when this happens, as I firmly believe that everyone can do good and even great talks, just by being aware of — and thus avoiding — a few common mistakes, and sticking to some simple principles.","tags":null,"title":"Ten Tips to Make Conference Talks Suck Less","uri":"https://www.morling.dev/blog/ten-tips-make-conference-talks-suck-less/"},{"content":" Update Jun 3: This post is discussed on Reddit and Hacker News\nProject Loom (JEP 425) is probably amongst the most awaited feature additions to Java ever; its implementation of virtual threads (or \u0026#34;green threads\u0026#34;) promises developers the ability to create highly concurrent applications, for instance with hundreds of thousands of open HTTP connections, sticking to the well-known thread-per-request programming model, without having to resort to less familiar and often more complex to use reactive approaches.\nHaving been in the workings for several years, Loom got merged into the mainline of OpenJDK just recently and is available as a preview feature in the latest Java 19 early access builds. I.e. 
it’s the perfect time to get your hands on virtual threads and explore the new feature. In this post I’m going to share an interesting aspect I learned about thread scheduling fairness for CPU-bound workloads running on Loom.\nProject Loom First, some context. The problem with the classic thread-per-request model is that it only scales up to a certain point. Threads managed by the operating system are a costly resource, which means you can typically have at most a few thousand of them, but not hundreds of thousands, or even millions. Now, if for instance a web application makes a blocking request to a database, the thread making that request is exactly that, blocked. Of course other threads can be scheduled on the CPU in the meantime, but you cannot have more concurrent requests than threads available to you.\nReactive programming models address this limitation by releasing threads upon blocking operations such as file or network IO, allowing other requests to be processed in the meantime. Once a blocking call has completed, the request in question will be continued, using a thread again. This model makes much more efficient use of the thread resource for IO-bound workloads, unfortunately at the price of a more involved programming model, which doesn’t feel familiar to many developers. Also aspects like debuggability or observability can be more challenging with reactive models, as described in the Loom JEP.\nThis explains the huge excitement and anticipation of Project Loom within the Java community. Loom introduces a notion of virtual threads which are scheduled onto OS-level carrier threads by the JVM. If application code hits a blocking method, Loom will unmount the virtual thread from its current carrier, making space for other virtual threads to be scheduled. Virtual threads are cheap and managed by the JVM, i.e. you can have many of them, even millions. 
The beauty of the model is that developers can stick to the familiar thread-per-request programming model without running into scaling issues due to a limited number of available threads. I highly recommend you to read the JEP of Project Loom, which is very well written and provides much more details and context.\nScheduling Now how does Loom’s scheduler know that a method is blocking? Turns out, it doesn’t. As I learned from Ron Pressler, the main author of Project Loom, it’s the other way around: blocking methods in the JDK have been adjusted for Loom, so as to release the OS-level carrier thread when being called by a virtual thread:\nAll blocking in Java is done through the JDK (unless you explicitly call native code). We changed the \u0026#34;leaf\u0026#34; blocking methods in the JDK to block the virtual thread rather than the platform thread. E.g. in all of java.util.concurrent there\u0026#39;s just one such method: LockSupport.park\n— Ron Pressler (@pressron) May 24, 2022 Ron’s reply triggered a very interesting discussion with Tim Fox (e.g. of Vert.x fame): what happens if code is not IO-bound, but CPU-bound? I.e. if code in a virtual thread runs some heavy calculation without ever calling any of the JDK’s blocking methods, will that virtual thread ever be unmounted?\nPerhaps surprisingly, the answer currently is: No. Which means that CPU-bound code will actually behave very differently with virtual threads than with OS-level threads. 
So let’s take a closer look at that phenomenon with the following example program:\npublic class LoomTest { public static long blackHole; public static void main(String[] args) throws Exception { ExecutorService executor = Executors.newCachedThreadPool(); for(int i = 0; i \u0026lt; 64; i++) { final Instant start = Instant.now(); final int id = i; executor.submit(() -\u0026gt; { BigInteger res = BigInteger.ZERO; for(int j = 0; j \u0026lt; 100_000_000; j++) { res = res.add(BigInteger.valueOf(1L)); } blackHole = res.longValue(); System.out.println(id + \u0026#34;;\u0026#34; + Duration.between(start, Instant.now()).toMillis()); }); } executor.shutdown(); executor.awaitTermination(1, TimeUnit.HOURS); } }\n64 threads are started at approximately the same time using a traditional cached thread pool, i.e. OS-level threads. Each thread counts to 100M (using BigInteger to make it a bit more CPU-intensive) and then prints out how long it took from scheduling the thread to the point of its completion. Here are the results from my Mac Mini M1:\nIn wallclock time, it took all the 64 threads roughly 16 seconds to complete. The threads are rather equally scheduled between the available cores of my machine. I.e. we’re observing a fair scheduling scheme. Now here are the results using virtual threads (by obtaining the executor via Executors::newVirtualThreadPerTaskExecutor()):\nThat chart looks very different. The first eight threads took a wallclock time of about two seconds to complete, the next eight took about four seconds, etc. As the executed code doesn’t hit any of the JDK’s blocking methods, the threads never yield and thus usurp their carrier threads until they have run to completion. This represents an unfair scheduling scheme of the threads. 
While they were all started at the same time, for the first two seconds only eight of them were actually executed, followed by the next eight, and so on.\nLoom’s scheduler uses by default as many carrier threads as there are CPU cores available; there are eight cores in my M1, so processing happens in chunks of eight virtual threads at a time. Using the jdk.virtualThreadScheduler.parallelism system property, the number of carrier threads can be adjusted, e.g. to 16:\nFor the fun of it, let’s add a call to Thread::sleep() (i.e. a blocking method) to the processing loop and see what happens:\n... for(int j = 0; j \u0026lt; 100_000_000; j++) { res = res.add(BigInteger.valueOf(1L)); if (j % 1_000_000 == 0) { try { Thread.sleep(1L); } catch (InterruptedException e) { throw new RuntimeException(e); } } } ...\nSure enough, we’re back to fair scheduling, with all threads completing after roughly the same wallclock time:\nIt’s noteworthy that the actual durations appear more harmonized in comparison to the original results we got from running with 64 OS-level threads. It seems the Loom scheduler can do a slightly better job of distributing the available resources between virtual threads. Surprisingly, a call to Thread::yield() didn’t have the same result. While a scheduler is free to ignore this intent to yield as per the method’s JavaDoc, Sundararajan Athijegannathan indicated that this would be applied by Loom. It would surely be interesting to know why that’s not the case here.\nDiscussion Seeing these results, the big question of course is whether this unfair scheduling of CPU-bound threads in Loom poses a problem in practice or not. Ron and Tim had an expanded debate on that point, which I recommend checking out to form an opinion yourself. 
As per Ron, support for yielding at points in program execution other than blocking methods has been implemented in Loom already, but this hasn’t been merged into the mainline with the initial drop of Loom. It should be easy enough though to bring this back if the current behavior turns out to be problematic.\nNow there’s not much point in overcommitting to more threads than physically supported by a given CPU anyways for CPU-bound code (nor in using virtual threads to begin with). But in any case it’s worth pointing out that CPU-bound code may behave differently with virtual threads than with classic OS-level threads. This may come as a surprise for Java developers, in particular if authors of such code are not in charge of selecting the thread executor/scheduler actually used by an application.\nTime will tell whether yield support also for CPU-bound code will be required or not, either via support for explicit calls to Thread::yield() (which I think should be supported at the very least) or through more implicit means, e.g. by yielding when reaching a safepoint. 
As I learned, Go’s goroutines support yielding in similar scenarios since version 1.14, so I wouldn’t be surprised to see Java and Loom taking the same course eventually.\n","id":36,"publicationdate":"May 27, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Jun 3: This post is discussed on \u003ca href=\"https://www.reddit.com/r/java/comments/v394uh/loom_and_thread_fairness/\"\u003eReddit\u003c/a\u003e and \u003ca href=\"https://news.ycombinator.com/item?id=31600067\"\u003eHacker News\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eProject Loom (\u003ca href=\"https://openjdk.java.net/jeps/425\"\u003eJEP 425\u003c/a\u003e) is probably amongst the most awaited feature additions to Java ever;\nits implementation of virtual threads (or \u0026#34;green threads\u0026#34;) promises developers the ability to create highly concurrent applications,\nfor instance with hundreds of thousands of open HTTP connections,\nsticking to the well-known thread-per-request programming model,\nwithout having to resort to less familiar and often more complex to use reactive approaches.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eHaving been in the workings for several years, Loom got merged into the mainline of OpenJDK \u003ca href=\"https://github.com/openjdk/jdk/commit/9583e3657e43cc1c6f2101a64534564db2a9bd84\"\u003ejust recently\u003c/a\u003e and is available as a preview feature in the latest \u003ca href=\"https://jdk.java.net/19/\"\u003eJava 19 early access builds\u003c/a\u003e.\nI.e. 
it’s the perfect time to get your hands onto virtual threads and explore the new feature.\nIn this post I’m going to share an interesting aspect I learned about thread scheduling fairness for CPU-bound workloads running on Loom.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Loom and Thread Fairness","uri":"https://www.morling.dev/blog/loom-and-thread-fairness/"},{"content":" JDK Mission Control (JMC) is invaluable for analysing performance data recording using JDK Flight Recorder (JFR). The other day, I ran into a problem when trying to run JMC on my Mac Mini M1. Mostly for my own reference, here’s what I did to overcome it.\nUpon launching a freshly downloaded JMC (I tried both the upstream build from OpenJDK and the one from the Eclipse Adoptium project), I’d get the following error message:\nThe JVM shared library \u0026#34;/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/../lib/server/libjvm.dylib\u0026#34; does not contain the JNI_CreateJavaVM symbol.\n\u0026#34;temurin-17.jdk\u0026#34; is my default JDK; it’s the Java 17 build provided by the Eclipse Temurin project for macOS/AArch64, i.e. the right one for the ARM chip of the M1 (\u0026#34;Apple silicon\u0026#34;). The error message isn’t overly helpful; after all, that referenced JDK works just fine for all my other applications. The problem though is that JMC itself currently only is shipped as an x64 application:\n1 2 $ file \u0026#34;JDK Mission Control.app\u0026#34;/Contents/MacOS/jmc .../JDK Mission Control.app/Contents/MacOS/jmc: Mach-O 64-bit executable x86_64 So I decided to try with an x64 JDK build instead; thanks to Apple’s Rosetta project, x64 binaries can be executed on the M1 via a rather efficient emulation.\nAfter downloading the macOS/x64 Temurin build, it needs to be configured as the JDK to use for JMC. For that, open the file JDK Mission Control.app/Contents/Info.plist in an editor and look for the Eclipse key. 
Add the -vm parameter with the path to the x64 JDK to the key’s value. Altogether, it should look like so:\n... \u0026lt;array\u0026gt; \u0026lt;string\u0026gt;-keyring\u0026lt;/string\u0026gt; \u0026lt;string\u0026gt;~/.eclipse_keyring\u0026lt;/string\u0026gt; \u0026lt;string\u0026gt;-vm\u0026lt;/string\u0026gt; \u0026lt;string\u0026gt;/path/to/jdk-17.0.3+7-x86-64/Contents/Home/bin/java\u0026lt;/string\u0026gt; \u0026lt;/array\u0026gt; ...\nEt voilà, JMC will now start just fine on the Apple M1. Note that in some cases I got an intermittent permission issue after editing the plist file. Resetting the permissions helped in that case:\n$ sudo chmod -R 755 \u0026#34;JDK Mission Control.app\u0026#34;\nWith the x64 JDK around, it’s a good idea to make sure it’s only used for JMC, while sticking to the AArch64 build for all other usages for the sake of performance. Unfortunately, it’s not quite obvious to see which flavour you are running, as the target architecture isn’t displayed in the output of java --version:\n$ export JAVA_HOME=path/to/temurin-17.jdk/Contents/Home $ java --version openjdk 17.0.3 2022-04-19 OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7) OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode) $ export JAVA_HOME=path/to/jdk-17.0.3+7-x86-64/Contents/Home $ jdks java --version openjdk 17.0.3 2022-04-19 OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7) OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)\nNot sure what \u0026#34;sharing\u0026#34; exactly means in the x64 output, perhaps it’s a hint? In any case, printing the contents of the os.arch system property will tell the truth, e.g. 
in jshell:\n1 2 3 4 5 6 7 $ export JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home $ jdks jshell | Welcome to JShell -- Version 17.0.3 | For an introduction type: /help intro jshell\u0026gt; System.out.println(System.getProperty(\u0026#34;os.arch\u0026#34;)) aarch64 1 2 3 4 5 6 7 $ export JAVA_HOME=~/Applications/jdks/jdk-17.0.3+7-x86-64/Contents/Home $ jshell | Welcome to JShell -- Version 17.0.3 | For an introduction type: /help intro jshell\u0026gt; System.out.println(System.getProperty(\u0026#34;os.arch\u0026#34;)) x86_64 If you are aware of a quicker way for identifying the current JDK’s target platform, I’d love to learn about it in the comments below. Thanks!\n","id":37,"publicationdate":"May 17, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://jdk.java.net/jmc/8/\"\u003eJDK Mission Control\u003c/a\u003e (JMC) is invaluable for analysing performance data recording using \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR).\nThe other day, I ran into a problem when trying to run JMC on my Mac Mini M1.\nMostly for my own reference, here’s what I did to overcome it.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Running JDK Mission Control on Apple M1","uri":"https://www.morling.dev/blog/running-jmc-on-apple-m1/"},{"content":" When it comes to code reviews, it’s a common phenomenon that there is much focus and long-winded discussions around mundane aspects like code formatting and style, whereas important aspects (does the code change do what it is supposed to do, is it performant, is it backwards-compatible for existing clients, and many others) tend to get less attention.\nTo raise awareness for the issue and providing some guidance on aspects to focus on, I shared a small visual on Twitter the other day, which I called the \u0026#34;Code Review Pyramid\u0026#34;. 
Its intention is to help putting focus on those parts which matter the most during a code review (in my opinion, anyways), and also which parts could and should be automated.\nAs some folks asked for a permanent, referenceable location of that resource and others wanted to have a high-res printing version, I’m putting it here again:\nYou can also download the visual as an SVG file.\nFAQ Why is it a pyramid?\nThe lower parts of the pyramid should be the foundation of a code review and take up the most part of it.\nHey, that’s a triangle!\nYou might think so, but it’s a pyramid from the side.\nWhich tool did you use for creating the drawing?\nExcalidraw.\n","id":38,"publicationdate":"Mar 10, 2022","section":"blog","summary":"When it comes to code reviews, it’s a common phenomenon that there is much focus and long-winded discussions around mundane aspects like code formatting and style, whereas important aspects (does the code change do what it is supposed to do, is it performant, is it backwards-compatible for existing clients, and many others) tend to get less attention.\nTo raise awareness for the issue and providing some guidance on aspects to focus on, I shared a small visual on Twitter the other day, which I called the \u0026#34;Code Review Pyramid\u0026#34;.","tags":null,"title":"The Code Review Pyramid","uri":"https://www.morling.dev/blog/the-code-review-pyramid/"},{"content":" The JDK Flight Recorder (JFR) is one of Java’s secret weapons; deeply integrated into the Hotspot VM, it’s a high-performance event collection framework, which lets you collect metrics on runtime aspects like object allocation and garbage collection, class loading, file and network I/O, and lock contention, do method profiling, and much more.\nJFR data is persisted in recording files (since Java 14, also \u0026#34;realtime\u0026#34; event streaming is supported), which can be loaded for analysis into tools like JDK Mission Control (JMC), or the jfr utility coming with OpenJDK 
itself.\nWhile there are lots of blog posts, conference talks, and other coverage on JFR itself, information about the format of recording files is surprisingly hard to come by. There is no official specification, so the only way to actually understand the JFR file format is to read the source code for writing recordings in the JDK itself, which is a combination of Java and C++ code. Alternatively, you can study the code for parsing recordings in JMC (an official JDK project). Btw., JMC comes with a pure Java-based JFR file writer implementation too.\nApart from the source code itself, the only somewhat related resources which I could find are this JavaOne presentation by Staffan Larssan (2013, still referring to the proprietary Oracle JFR), several JFR-related blog posts by Marcus Hirt, and a post about JFR event sizes by Richard Startin. But there’s no in-depth discussion or explanation of the file format. As it turns out, this is by design; the OpenJDK team shied away from creating a spec, \u0026#34;because of the overhead of maintaining and staying compatible with it\u0026#34;. I.e. the JFR file format is an implementation detail of OpenJDK, and as such the only stable contract for interacting with it are the APIs provided by JFR.\nNow, even if it is an implementation detail, knowing more about the JFR file format would certainly be useful; for instance, you could use this to implement tools for analyzing and visualizing JFR data in non-JVM programming languages, say Python, or to patch corrupted recording files. So my curiosity was piqued and I thought it’d be fun to try and find out how JFR recording files are structured. 
In particular, I was curious about which techniques are used for keeping files relatively small, even with hundreds of thousands or even millions of recorded events.\nI grabbed a hex editor, the source code of JMC’s recording parser (which I found a bit easier to grasp than the Java/C++ hybrid in the JDK itself), and loaded several example recordings from my JFR Analytics project, stepping through the parser code in debug mode (fun fact: while doing so, I noticed JMC currently fails to parse events with char attributes).\nJust a feeew hours later, and I largely understood how the thing works. As an image says more than a thousand words, and I’ll never say no to an opportunity to draw something in the fabulous Excalidraw, I proudly present to you this visualization of the JFR file format as per my understanding (click to enlarge):\nIt’s best viewed on a big screen 😎. Alternatively, here’s an SVG version. Now this doesn’t go into all the finest aspects, so you probably couldn’t go off and implement a clean-room JFR file parser solely based on this. But it does show the relevant concepts and mechanisms. I suggest you spend some time going through sections one to five in the picture, and dive into the sections for header, metadata, constant pool, and actual recorded events. Studying the image should give you a good understanding of the JFR file format and its structure.\nHere are some observations I made as I found my way through the file format:\nJFR recordings are organized in chunks: Chunks are self-contained independent containers of recorded events and all the metadata required for interpreting these events. There’s no additional content in recordings besides the chunks, i.e. concatenate several chunk files, and you’ll have a JFR recording file. 
A multi-chunk recording file can be split up into the individual chunks using the jfr utility which comes with OpenJDK:\njfr disassemble --output \u0026lt;target-dir\u0026gt; some-recording.jfr\nThe default chunk size is 12MB, but if needed, you can override this, e.g. using the -XX:FlightRecorderOptions:maxchunksize=1MB option when starting a recording. A smaller chunk size can come in handy if for instance you only want to transmit a specific section of a long-running recording. On the other hand, many small chunks will increase the overall size of a recording, due to the repeatedly stored metadata and constant pools.\nThe event format is self-descriptive: The metadata part of each chunk describes the structure of the contained events, all referenced types, their attributes, etc.; by means of JFR metadata annotations, such as @Label, @Description, @Timestamp etc., further metadata like human-readable names and descriptions as well as units of measurement are expressed, which makes it possible to consume and parse an event stream without a-priori knowledge of specific event types. In particular, this allows for the definition of custom event types and displaying them in the generic event browser of JMC (of course, bespoke views such as the \u0026#34;Memory\u0026#34; view rely on type-specific interpretations of individual event types).\nThe format is geared towards space efficiency: Integer values are stored in a variable-length encoded way (LEB128), which will save lots of space when storing small values. A constant pool is used to store repeatedly referenced objects, such as String literals, stack traces, class and method names, etc.; for each usage of such a constant in a recorded event, only the constant pool index is stored (a var-length encoded long). Note that Strings can either be stored as raw values within events themselves, or in the constant pool. 
Unfortunately, no control is provided for choosing between the two; strings with a length between 16 and 128 will be stored in the constant pool, any others as raw value. It could be a nice extension to give event authors more control here, e.g. by means of an annotation on the event attribute definition\nWhen using the jdk.OldObjectSample event type, beware of bug JDK-8277919, which may cause a bloat of the constant pool, as the same entry is duplicated in the pool many times. This will be fixed in Java 17.0.3 and 18. The format is row-based: Events are stored sequentially one after another in recording files; this means that for instance boolean attributes will consume one full byte, also if actually eight boolean values could be stored in a single byte. It could be interesting to explore a columnar format as an alternative, which may help to further reduce recording size, for instance also allowing to efficiently compress event timestamps values using delta-encoding\nCompression support in JMC reader implementation: The JFR parser implementation of JMC transparently unpacks recording files which are compressed using GZip, ZIP, or LZ4 (Marcus Hirt discusses the compression of JFR recordings in this post). Interestingly, JMC 8.1 still failed to open such compressed recording with an error message. 
The jfr utility doesn’t support compressed recording files, and I suppose the JFR writer in the JDK doesn’t produce compressed recordings either\n","id":39,"publicationdate":"Feb 20, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) is one of Java’s secret weapons;\ndeeply integrated into the Hotspot VM, it’s a high-performance event collection framework,\nwhich lets you collect metrics on runtime aspects like object allocation and garbage collection,\nclass loading, file and network I/O, and lock contention, do method profiling, and much more.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eJFR data is persisted in recording files\n(since Java 14, also \u003ca href=\"https://openjdk.java.net/jeps/349\"\u003e\u0026#34;realtime\u0026#34; event streaming\u003c/a\u003e is supported),\nwhich can be loaded for analysis into tools like JDK Mission Control (JMC),\nor the \u003cem\u003ejfr\u003c/em\u003e utility coming with OpenJDK itself.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"The JDK Flight Recorder File Format","uri":"https://www.morling.dev/blog/jdk-flight-recorder-file-format/"},{"content":" Update Jan 13: This post is discussed on Reddit\nUpdate Feb 7: This post is discussed on Hacker News\nAs software developers, we’ve all come across those annoying, not-so-useful error messages when using some library or framework: \u0026#34;Couldn’t parse config file\u0026#34;, \u0026#34;Lacking permission for this operation\u0026#34;, etc. Ok, ok, so something went wrong apparently; but what exactly? What config file? Which permissions? And what should you do about it? Error messages lacking this kind of information quickly create a feeling of frustration and helplessness.\nSo what makes a good error message then? 
To me, it boils down to three pieces of information which should be conveyed by an error message:\nContext: What led to the error? What was the code trying to do when it failed?\nThe error itself: What exactly failed?\nMitigation: What needs to be done in order to overcome the error?\nLet’s dive into these individual aspects a bit. Before we start, let me clarify that this is about error messages created by library or framework code, for instance in the form of an exception message, or in the form of a message written to some log file. This means the consumers of these error messages will typically be either other software developers (encountering errors raised by 3rd party dependencies during application development), or ops folks (encountering errors while running an application).\nThat’s in contrast to user-facing error messages, for which other guidance and rules (in particular in regards to security concerns) should be applied. For instance, you typically should not expose any implementation details in a user-facing message, whereas that’s not that much of a concern — or on the contrary, it can even be desirable — for the kind of error messages discussed here.\nContext In a way, an error message tells a story; and as with every good story, you need to establish some context about its general settings. For an error message, this should tell the recipient what the code in question was trying to do when it failed. In that light, the first example above, \u0026#34;Couldn’t parse config file\u0026#34;, is addressing this aspect (and only this one) to some degree, but probably it’s not enough.
For instance, it would be very useful to know the exact name of the file:\nCouldn’t parse config file: /etc/sample-config.properties\u0026#34;\nUsing an example from Debezium, the open-source change data capture platform I am working on in my day job, the second message could read like so with some context about what happened:\nFailed to create an initial snapshot of the data; lacking permission for this operation\nComing back to error messages related to the processing of some input or configuration file, it can be a good idea to print the absolute path. In case file system resources are provided as relative paths, this can help to identify wrong assumptions around the current working directory, or whatever else is used as the root for resolving relative paths. On the other hand, in particular in case of multi-tenant or SaaS scenarios, you may consider filesystem layouts as a confidential implementation detail, which you may prefer to not reveal to unknown code you run. What’s best here depends on your specific situation.\nIf some framework supports different kinds of files, the specific kind of the problematic file in question should be part of the message as well: \u0026#34;Couldn’t parse entity mapping file…​\u0026#34;. If the error is about specific parts of the contents of a file, displaying the line number and/or the line itself is a good idea.\nIn terms of how to convey the context of an error, it can be part of messages themselves, as shown above. Many logging frameworks also support the notion of a Mapped Diagnostic Context (MDC), a map for propagating arbitrary key/value pairs into log messages. So if your messages are meant to show up in logs, setting contextual information to the MDC can be very useful. 
In Debezium this is used for instance to propagate the name of the affected connector, allowing Kafka Connect users to tell apart log messages originating from different connectors deployed to the same Connect cluster.\nAs far as propagating contextual information via log messages is concerned (as opposed to, say, error messages printed by a CLI tool), structured logging, typically in the form of JSON, simplifies any downstream processing. By putting contextual information into separate attributes of a structured log entry, consumers can easily filter messages, ingest only specific sub-sets of messages based on their contents, etc. In case of exceptions, the chain of exceptions leading to the root cause is an important piece of contextual information, too. So I’d recommend always logging the entire exception chain, rather than catching exceptions and only logging some substitute message instead.\nThe Error Itself On to the next part then, the description of the actual error itself. That’s where you should describe what exactly happened in a concise way. Sticking to the examples above, the first message, including context and error description, could read like so:\nCouldn’t parse config file: /etc/sample-config.properties; given snapshot mode \u0026#39;nevr\u0026#39; isn’t valid\nAnd for the second one:\nFailed to create an initial snapshot of the data; database user \u0026#39;snapper\u0026#39; is lacking the required permissions\nOther than that, there’s not too much to be said here; try to be efficient: make messages as long as needed, and as short as possible. One idea could be to work with different variants of messages for the same kind of error, a shorter and a longer one. Which one is used could be controlled via log levels or some kind of \u0026#34;verbose\u0026#34; flag. Java developers may find Cédric Champeau’s jdoctor library useful for implementing this.
Personally, I haven’t used such an approach yet, but it may be worth the effort for specific situations.\nMitigation Having established the context of the failure and what went wrong exactly, the last — and oftentimes most interesting — part is a description of how the user can overcome the error. What’s the action they need to take in order to avoid it? This could be as simple as telling the user about the constraints and/or valid values in case of the config file example (i.e. akin to test failure messages, which show both expected and actual values):\nCouldn’t parse config file: /etc/sample-config.properties; given snapshot mode \u0026#39;nevr\u0026#39; isn’t valid (must be one of \u0026#39;initial\u0026#39;, \u0026#39;always\u0026#39;, \u0026#39;never\u0026#39;)\nIn case of the permission issue, you may clarify which ones are needed:\nCouldn’t take database snapshot: database user \u0026#39;snapper\u0026#39; is lacking the required permissions \u0026#39;SELECT\u0026#39;, \u0026#39;REPLICATION\u0026#39;\nAlternatively, if longer mitigation strategies are required, you may point to a (stable!) URL in your reference documentation which provides the required information:\nCouldn’t take database snapshot: database user \u0026#39;snapper\u0026#39; is lacking the required permissions.
Please see https://example.com/knowledge-base/snapshot-permissions/ for the complete set of necessary permissions\nIf some configuration change is required (for instance database or IAM permissions), your users will love you even more if you share that information in \u0026#34;executable\u0026#34; form, for instance as GRANT statements which they can simply copy, or vendor-specific CLI invocations such as aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/SomePolicy --role-name SomeRole.\nSpeaking of external resources referenced in error messages, it’s a great idea to have unique error codes as part of your messages (such as Oracle’s ORA codes, or the error messages produced by WildFly and its components). Corresponding resources (either provided by yourself, or externally, for instance in answers on StackOverflow) will then be easy to find using your favourite search engine. Bonus points for adding a reference to your own canonical resource right to the error message itself:\nCouldn’t take database snapshot: database user \u0026#39;snapper\u0026#39; is lacking the required permissions (DBZ-42). Please see https://dbz.codes/dbz-42/ for the complete set of necessary permissions\n(That’s a made-up example, we don’t make use of this approach in Debezium currently; but I probably should look into buying the dbz.codes domain 😉).\nThe key take-away is that you should not leave your users in the dark about what they need to do in order to address the error they ran into. Nothing is more frustrating than essentially being told \u0026#34;You did it wrong!\u0026#34;, without getting hinted at what’s the right thing to do instead.\nGeneral Best Practices Lastly, some practices in regards to error messages which I try to adhere to, and which I would generally recommend:\nUniform voice and style: The specific style chosen doesn’t matter too much, but you should settle on either active vs. passive voice (\u0026#34;couldn’t parse config file\u0026#34; vs. 
\u0026#34;config file couldn’t be parsed\u0026#34;), apply consistent casing, either finish all messages with a dot or none of them, etc.; not a big thing, but it will make your messages a bit easier to deal with\nOne concept, one term: Avoid referring to the same concept from your domain using different terms in different error messages; similarly, avoid using the same term for multiple things. Use the same terms as in other places, e.g. your API documentation, reference guides etc.; the more consistent and unambiguous you are, the better\nDon’t localize error messages: This one is not as clear cut, but I’d generally recommend not translating error messages into languages other than English; again, this all is not about user-facing error messages, but about messages geared towards software developers and ops folks, who generally should command reasonable English skills; depending on your audience and target market, translations to specific languages might make sense, in which case a common, unambiguous error code should definitely be part of messages, so as to facilitate searching for the error on the internet\nDon’t make error messages an API contract: In case consumers of your API should be able to react to different kinds of errors, they should not be required to parse any error messages in order to do so. Instead, raise an exception type which exposes a machine-processable error code, or raise specific exception types which can be caught separately by the caller\nBe cautious about exposing sensitive data: if your library is in the business of handling and processing sensitive user data, make sure not to create any privacy concerns; for instance, \u0026#34;show actual vs.
expected value\u0026#34; may not pose a problem for values provided by an application developer or administrator; but it can pose a problem if the actual value is GDPR protected user data\nEither raise an exception OR log an error, but not both: A given error should either be communicated by raising an exception or by logging an error. Otherwise, when doing both, as the exception will typically end up being logged via some kind of generic handler anyways, the user would see information about the same error in their logs twice, which only adds confusion\nFail early: This one is not so much about how to express error messages, but when to raise them; in general, the earlier, the better; a message at application start-up beats one later at runtime; a message at build time beats one at start-up, etc. Quicker feedback makes for shorter turn-around times for fixes and also helps to provide the context of any failures\nWith that all being said, what’s your take on the matter? Any best practices you would recommend? Do you have any examples for particularly well (or poorly) crafted messages? 
Let me know in the comments below!\n","id":40,"publicationdate":"Jan 12, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Jan 13: This post is \u003ca href=\"https://www.reddit.com/r/programming/comments/s2kcp7/whats_in_a_good_error_message/\"\u003ediscussed on Reddit\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Feb 7: This post is \u003ca href=\"https://news.ycombinator.com/item?id=30234572\"\u003ediscussed on Hacker News\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs software developers, we’ve all come across those annoying, not-so-useful error messages when using some library or framework: \u003cem\u003e\u0026#34;Couldn’t parse config file\u0026#34;\u003c/em\u003e, \u003cem\u003e\u0026#34;Lacking permission for this operation\u0026#34;\u003c/em\u003e, etc.\nOk, ok, so \u003cem\u003esomething\u003c/em\u003e went wrong apparently; but what exactly? What config file? Which permissions? And what should you do about it?\nError messages lacking this kind of information quickly create a feeling of frustration and helplessness.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSo what makes a good error message then?\nTo me, it boils down to three pieces of information which should be conveyed by an error message:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cem\u003eContext:\u003c/em\u003e What led to the error? 
What was the code trying to do when it failed?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cem\u003eThe error itself:\u003c/em\u003e What exactly failed?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cem\u003eMitigation:\u003c/em\u003e What needs to be done in order to overcome the error?\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e","tags":null,"title":"What's in a Good Error Message?","uri":"https://www.morling.dev/blog/whats-in-a-good-error-message/"},{"content":" 🧸 It’s Casey. Casey Cuddle.\nI am very happy to announce the first stable release of kcctl, a modern and intuitive command line client for Apache Kafka Connect!\nForget about having to memorize and type the right REST API paths and curl flags; with kcctl, managing your Kafka connectors is done via concise and logically structured commands, modeled after the semantics of the kubectl tool known from Kubernetes.\nStarting now, kcctl is available via SDKMan, which means it’s as easy as running sdk install kcctl for getting the latest kcctl release onto your Linux, macOS, or Windows x86 machine. For the best experience, also install the kcctl shell completion script, which not only \u0026lt;TAB\u0026gt;-completes command names and options, but also dynamic information such as connector, task, and logger names:\nwget https://raw.githubusercontent.com/kcctl/kcctl/main/kcctl_completion\n. kcctl_completion\nkcctl offers commands for all the common tasks you’ll encounter when dealing with Kafka Connect, such as listing the available connector plug-ins, registering new connectors, changing their configuration, pausing and resuming them, changing log levels, and much more.\nSimilar to kubectl, kcctl works with the notion of named configuration contexts. Contexts allow you to set up multiple named Kafka Connect environments (e.g.
\u0026#34;local\u0026#34; and \u0026#34;testing\u0026#34;) and easily switch between them, without having to specify the current Connect cluster URL all the time:\n$ kcctl config get-contexts\n NAME KAFKA CONNECT URI\n local http://localhost:8083\n testing* http://localhost:8084\n\n$ kcctl config use-context local\nUsing context \u0026#39;local\u0026#39;\n\n$ kcctl get plugins\n TYPE CLASS VERSION\n source io.debezium.connector.db2.Db2Connector 1.8.0.Final\n source io.debezium.connector.mongodb.MongoDbConnector 1.8.0.Final\n source io.debezium.connector.mysql.MySqlConnector 1.8.0.Final\n source org.apache.kafka.connect.file.FileStreamSourceConnector 3.0.0\n source org.apache.kafka.connect.mirror.MirrorCheckpointConnector 1\n source org.apache.kafka.connect.mirror.MirrorHeartbeatConnector 1\n source org.apache.kafka.connect.mirror.MirrorSourceConnector 1\n sink org.apache.kafka.connect.file.FileStreamSinkConnector 3.0.0\nOnce you’ve set up a kcctl context, you can start using the tool for managing your connectors. Here is a video which shows a typical workflow in kcctl (note this recording shows an earlier version of kcctl; it has fewer commands, and the notion of contexts has changed slightly since then):\nAs shown in the video, connectors are registered and updated via kcctl apply. This command can also read input from stdin, which for instance comes in handy when templating connector configuration using Jsonnet and setting up multiple similar connectors at once:\nTo learn more about these and all the other commands available in kcctl, run kcctl --help.\nDiscussion and Outlook kcctl offers an easy yet very powerful way for solving your day-to-day tasks with Kafka Connect. In comparison to using the REST API directly via clients such as curl or httpie, kcctl as a dedicated tool offers commands which are more concise and intuitive; also its output is logically organized, using colored formatting to highlight key information.
It has become an invaluable tool for my own work on Debezium, e.g. when testing, or doing some demo. These days, I find myself very rarely using the REST API directly any more.\nI hope kcctl becomes a useful helper for folks working with Kafka Connect. As such, I see it as a complement to other means of interacting with Kafka Connect. Sometimes a CLI client may be what does the job the best, while at other times you may prefer to work with a graphical user interface such as Debezium UI or the vendor-specific consoles of managed connector services, Kubernetes operators such as Strimzi, Terraform, or perhaps even a Java API. It’s all about options!\nWhile all the typical Kafka Connect workflows are supported by kcctl already, there are quite a few additional features I’d love to see. First and foremost, the ability to display (and reset) the offsets of Kafka Connect source connectors. Work on that is well underway, and I expect this to be available very soon. There also should be support for different output formats such as JSON, improving usability in conjunction with other CLI tools such as jq. The restart command should be expanded, so as to take advantage of the API for restarting all (failed) connector tasks added in Kafka Connect 3.0. Going beyond the scope of supporting plain Kafka Connect, there could also be connector-specific commands, such as an option for compacting the history topic of Debezium connectors. Of course, your feature requests are welcome, too! Please log an issue in the kcctl project with your proposals for additions to the tool.
And while at it, we’d also love to welcome you as a stargazer 🌟 to the project!\nLastly, a big thank you to all the amazing people who have contributed to kcctl up to this point:\nAndres Almiray, Guillaume Smet, Hans-Peter Grahsl, Iskandar Abudiab, Jay Patel, Karim ElNaggar, Michael Simons, Mickael Maison, Oliver Weiler, Sergey Nuyanzin, Siddique Ahmad, Thomas Dangleterre, and Tony Foster!\nYou’re the best 🧸!\n","id":41,"publicationdate":"Dec 21, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e🧸 \u003cem\u003eIt’s Casey. Casey Cuddle.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI am very happy to announce the first stable release of \u003ca href=\"https://github.com/kcctl/kcctl\"\u003ekcctl\u003c/a\u003e,\na modern and intuitive command line client for \u003ca href=\"https://kafka.apache.org/documentation/#connect\"\u003eApache Kafka Connect\u003c/a\u003e!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eForget about having to memorize and type the right REST API paths and curl flags;\nwith kcctl, managing your Kafka connectors is done via concise and logically structured commands,\nmodeled after the semantics of the kubectl tool known from Kubernetes.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Announcing the First Release of kcctl","uri":"https://www.morling.dev/blog/announcing-first-release-of-kcctl/"},{"content":" I am very happy to announce the availability of the OSS Quickstart Archetype!\nPart of the ModiTect family of open-source projects, this is a Maven archetype which makes it very easy to bootstrap new Maven-based open-source projects, satisfying common requirements such as configuring plug-in versions, and adhering to best practices like auto-formatting the source code. 
Think Maven Quickstart Archetype and friends, but more modern, complete, and opinionated.\nThe Challenge When bootstrapping new Maven-based projects, be it long-running ones, short-lived proof-of-concept projects, or just some quick demo you’d like to publish on GitHub, there’s always some boilerplate involved: creating the POM with the right plug-in versions and configurations, preparing CI e.g. on GitHub Actions, providing a license file, etc.\nWhile you could try and copy (parts of) an existing project you already have, Maven has a better answer to this problem: archetypes, pre-configured project templates which can be parameterized to some degree and which let you create new projects with just a few steps. Unfortunately, the canonical Maven quickstart archetype is rather outdated, creating projects for Java 1.7, using JUnit 4, etc.\nThe OSS Quickstart Archetype The OSS (open-source software) quickstart archetype is meant as a fresh alternative, not only providing more current defaults and dependency versions, but also going beyond what’s provided by the traditional quickstart archetype. More specifically, it\ndefines up-to-date versions of all plug-ins in use, as well as of JUnit 5 and AssertJ (the opinionated part ;)\nenforces all plug-in versions to be defined via the Maven enforcer plug-in\nprovides a license file and uses the license Maven plug-in for formatting/checking license headers in all source files\ndefines a basic setup for CI on GitHub Actions, building the project upon each push to the main branch of your repository and for each PR\nconfigures plug-ins for auto-formatting code and imports (I told you, it’s opinionated)\ndefines a -Dquick option for skipping all non-essential plug-ins, allowing you to produce the project’s JAR as quickly as possible\n(optionally) provides a module-info.java descriptor\nAnd most importantly, opening braces are not on the next line. We all agree nobody likes that, right?!
Using the OSS Quickstart Archetype for bootstrapping a new project is as simple as running the following command:\nmvn archetype:generate -B \\\n -DarchetypeGroupId=org.moditect.ossquickstart \\\n -DarchetypeArtifactId=oss-quickstart-simple-archetype \\\n -DarchetypeVersion=1.0.0.Alpha1 \\\n -DgroupId=com.example.demos \\\n -DartifactId=fancy-project \\\n -Dversion=1.0.0-SNAPSHOT \\\n -DmoduleName=com.example.fancy\nJust a few seconds later, you’ll have a new project applying all the configuration above, ready for you to start some open-source awesomeness.\nOutlook Version 1.0.0.Alpha1 of the OSS Quickstart Archetype is available today on Maven Central, i.e. you can start using it for bootstrapping new projects right now. It already contains most of the things I wanted it to have, but there are also a few more improvements I would like to make:\nAdd the Maven wrapper (#1)\nMake the license of the generated project configurable; currently, it uses Apache License, version 2. I’d like to make this an option of the archetype, which would let you choose between this license and a few other key open-source licenses, like MIT and BSD 3-clause (#2)\nProvide a variant of the archetype for creating multi-module Maven projects (#7)\nAdd basic CheckStyle configuration (also skippable via -Dquick, #10)\nAny contributions for implementing these, as well as other feature requests, are highly welcome. Note the idea is to keep these archetypes lean and mean, i.e.
they should only contain widely applicable features, leaving more specific things for the user to add after they created a project with the archetype.\nHappy open-sourcing!\nMany thanks to Andres Almiray for setting up the release pipeline for this project, using the amazing JReleaser tool!\n","id":42,"publicationdate":"Dec 2, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI am very happy to announce the availability of the \u003ca href=\"https://github.com/moditect/oss-quickstart\"\u003eOSS Quickstart Archetype\u003c/a\u003e!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003ePart of the \u003ca href=\"https://github.com/moditect/\"\u003eModiTect\u003c/a\u003e family of open-source projects,\nthis is a Maven archetype which makes it very easy to bootstrap new Maven-based open-source projects,\nsatisfying common requirements such as configuring plug-in versions, and adhering to best practices like auto-formatting the source code.\nThink \u003ca href=\"https://maven.apache.org/archetypes/maven-archetype-quickstart/scm.html\"\u003eMaven Quickstart Archetype\u003c/a\u003e and friends, but more modern, complete, and opinionated.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Introducing the OSS Quickstart Archetype","uri":"https://www.morling.dev/blog/introducing-oss-quickstart-archetype/"},{"content":" The other day, I came across an interesting thread in the Java sub-reddit, with someone asking: \u0026#34;Has anyone attempted to write logs directly to Kafka?\u0026#34;. This triggered a number of thoughts and questions for myself, in particular how one should deal in an application when an attempt to send messages to Kafka fails, for instance due to some network connectivity issue? 
What do you do when you cannot reach the Kafka broker?\nWhile the Java Kafka producer buffers requests internally (primarily for performance reasons) and also supports retries, you cannot do so indefinitely (or can you?), so I went to ask the Kafka community on Twitter how they would handle this situation:\n#Kafka users: how do you deal in producers with brokers not being available? Take a use case like sending logs; you don\u0026#39;t want to fail your business process due to Kafka issues here, it\u0026#39;s fine do this later on. Large producer buffer and retries? Some extra buffer (e.g. off-heap)?\n— Gunnar Morling 🌍 (@gunnarmorling) November 27, 2021 This question spawned a great discussion with tons of insightful replies (thanks a lot to you all!), so I thought I’d try and give an overview on the different comments and arguments. As with everything, the right strategy and solution depends on the specific requirements of the use case at hand; in particular whether you can or cannot afford for potential inconsistencies between the state of the caller of your application, its own state, and the state in the Kafka cluster.\nAs an example, let’s consider an application which exposes a REST API for placing purchase orders. 
Acknowledging such a request while actually failing to send a Kafka message with the purchase order to some fulfillment system would be pretty bad: the user would believe their order has been received and will be fulfilled eventually, whereas that’s actually not the case.\nOn the other hand, if the incoming request was safely persisted in a database, and a message is sent to Kafka only for logging purposes, we may be fine to accept this inconsistency between the user’s state (\u0026#34;my order has been received\u0026#34;), the application’s state (order is stored in the database), and the state in Kafka (log message got lost; not ideal, but not the end of the world either).\nUnderstanding these different semantics helps to put the replies to the question into context. There’s one group of replies along the lines of \u0026#34;buffer indefinitely, block inbound requests until messages are sent\u0026#34;, e.g. by Pere Urbón-Bayes:\nThis would certainly depend on the client used and your app use case. Generally speaking, retry forever and block if the buffer is full, leave time for broker to recover, with backpressure.if backpressure not possible, cause use case, off-load off-heap for later recovery.\n— Pere Urbón-Bayes (@purbon) November 28, 2021 This strategy makes a lot of sense if you cannot afford any inconsistency between the state of the different actors at all: e.g. when you’d rather tell the user that you cannot receive their purchase order right now, instead of being at the risk of telling them that you did, whereas you actually didn’t.\nWhat though, if we don’t want to let the availability of a resource like Apache Kafka — which is used for asynchronous message exchanges to begin with — impact the availability of our own application? Can we somehow buffer requests in a safe way, if they cannot be sent to Kafka right away?
This would allow us to complete the inbound request, while hopefully still avoiding any inconsistencies, at least eventually.\nNow simply buffering requests in memory isn’t reliable in any meaningful sense of the word; if the producing application crashes, any unsent messages will be lost, making this approach no different in terms of reliability from working with acks=0, i.e. not waiting for any acknowledgements from the Kafka broker. It may be useful for pure fire-and-forget use cases, where you don’t care about delivery guarantees at all, but these tend to be rare.\nMultiple folks therefore suggested more reliable means of implementing such buffering, e.g. by storing un-sent messages on disk or by using some local, persistent queuing implementation. Some have built solutions using existing open-source components, as Antón Rodriguez and Josh Reagan suggest:\nI usually retry forever, specially when reading from another topic because we can apply backpressure. In some cases, discard after some time is ok. Very rarely off-heap with ChronicleQueue or MapsDB. I have considered but never used an external service as DLQ or a Kafka mesh\n— Antón (@antonmry) November 27, 2021 Embedded broker and in-vm protocol. Either ActiveMQ or Artemis work great.\n— Josh Reagan (@joshdreagan) November 28, 2021 You even could think of having a Kafka cluster close by (which then may have other accessibility characteristics than your \u0026#34;primary\u0026#34; cluster e.g. running in another availability zone) and keeping everything in sync via tools such as MirrorMaker 2. Others, like Jonathan Santilli, create their own custom solutions by forking existing projects:\nI forked Apache Flume and modified it to used a WAL on the disk, so, messages are technically sent, but store on disk, when the Broker is available, the local queue gets flushed, all transparent for the producer.\n— Jonathan Santilli (@pachilo) November 27, 2021 Ready-made wrappers around the producer also exist, e.g.
in Wix\u0026#39; Greyhound Kafka client library, which supports producing via local disk as per Derek Moore:\nI built a proprietary \u0026#34;data refinery\u0026#34; on Kafka for @fanthreesixty and we built ourselves libraries not dissimilar to https://t.co/uQdepGHTzj\n— Derek Moore (@derekm00r3) November 27, 2021 But there be dragons! Persisting to disk will actually not be any better at all, if it’s for instance an ephemeral disk of a Kubernetes pod which gets destroyed after an application crash. But even when using persistent volumes, you may end up with an inherently unreliable solution, as Mic Hussey points out:\nThese are two contradictory requirements 😉 Sooner or later you will run out of local storage capacity. And unless you are very careful you end up moving from a well understood shared queue to a hacked together implicit queue.\n— Mic Hussey (@hussey_mic) November 29, 2021 So it shouldn’t come as a surprise that people in this situation have been looking at alternatives, e.g. by using DynamoDB or S3 as an intermediary buffer; the team around Natan Silnitsky working on Greyhound at Wix is exploring this option currently:\nSo instead we want to fallback only on failure to send. In addition we want to skip the disk all together, because recovery mechanism when a pod is killed in #Kubernetes is too complex (involves a daemonset...), So we\u0026#39;re doing a POC, writing to #DynamoDB/#S3 upon failure 2/3 🧵\n— Natan Silnitsky (@NSilnitsky) November 29, 2021 At this point it’s worth thinking about failure domains, though. Say your application is in its own network and it cannot write to Kafka due to some network split, chances are that it cannot reach other services like S3 either.
So another option could be to use a datastore close by as a buffer, for instance a replicated database running on the same Kubernetes cluster or at least in the same availability zone.\nIf this reminds you of change data capture (CDC) and the outbox pattern, you’re absolutely right; multiple folks made this point as well in the conversation, including Natan Silnitsky and R.J. Lorimer:\nThen a dedicated service will listen to #DynamoDB CDC events and produce to #ApacheKafka including payload, key, headers, etc...\n— Natan Silnitsky (@NSilnitsky) November 29, 2021 For our event sourcing systems the event being delivered actually is critical. For \u0026#34;pure\u0026#34; cqrs services, Kafka being down is paramount to not having a db so we fail. Other systems use a transactional outbox that persists in the db. If Kafka is down it sits there until ready.\n— R.J. Lorimer (He/Him) (@realjenius) November 27, 2021 As Kacper Zielinski tells us, this approach is an example of a staged event-driven architecture, or SEDA for short:\nOutbox / SEDA to rescue here. Not sure if any “retry” can guarantee you more than “either you will loose some messages or fail the business logic by eating all resources” :)\n— Kacper Zielinski (@xkzielinski) November 27, 2021 In this model, a database serves as the buffer for persisting messages before they are sent to Kafka, which makes for a highly reliable solution, provided the right degree of redundancy is implemented, e.g. in the form of replicas. In fact, if your application needs to write to a database anyways, \u0026#34;sending\u0026#34; messages to Kafka via an outbox table and CDC tools like Debezium is a great way to avoid any inconsistencies between the state in the database and Kafka, without incurring any unsafe dual writes.\nBut of course there is a price to pay here too: end-to-end latency will be increased when going through a database first and then to Kafka, rather than going to Kafka directly.
You also should keep in mind that the more moving pieces your solution has, the more complex to operate it will become of course, and the more subtle and hard-to-understand failure modes and edge cases it will have.\nAn excellent point is made by Adam Kotwasinski by stating that it’s not a question of whether things will go wrong, but only when they will go wrong, and that you need to have the right policies in place in order to be prepared for that:\nFor some of my usecases I have a wrapper for Kafka\u0026#39;s producer that requires users to _explicitly_ set up policies like retry/backoff/drop. It allows my customers to think about outages (that will happen!) up front instead of being surprised. Each usecase is different.\n— Adam Kotwasinski (@AKotwasinski) November 28, 2021 In the end it’s all about trade-offs, probabilities and acceptable risks. For instance, would you receive and acknowledge that purchase order request as long as you can store it in a replicated database in the local availability zone, or would you rather reject it, as long as you cannot safely persist it in a multi-AZ Kafka cluster?\nThese questions aren’t merely technical ones any longer, but they require close collaboration with product owners and subject matter experts in the business domain at hand, so to make the most suitable decisions for your specific situation. Managed services with defined SLAs guaranteeing high availability values can make the deciding difference here, as Vikas Sood mentions:\nThat’s why we decided to go with a managed offering to avoid disruptions in some critical processes.Some teams still had another decoupling layer (rabbit) between producers and Kafka. Was never a huge fan of that coz it simply meant more points of failure.\n— Vikas Sood (@Sood1Vikas) November 27, 2021 Thanks a lot again to everyone chiming in and sharing their experiences, this was highly interesting and insightful! You have further ideas and thoughts to share? 
Let me and the community at large know either by leaving a comment below, or by replying to the thread on Twitter. I’m also curious about your feedback on this format of putting a Twitter discussion into some expanded context. It’s the first time I’ve been doing it, and I’d be eager to know whether you find it useful or not. Thanks!\n","id":43,"publicationdate":"Nov 29, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day, I came across an \u003ca href=\"https://www.reddit.com/r/java/comments/r2z17a/has_any_one_attempted_to_write_logs_directly_to/\"\u003einteresting thread\u003c/a\u003e in the Java sub-reddit, with someone asking:\n\u0026#34;Has anyone attempted to write logs directly to Kafka?\u0026#34;.\nThis triggered a number of thoughts and questions for myself,\nin particular how one should deal in an application when an attempt to send messages to Kafka fails,\nfor instance due to some network connectivity issue?\nWhat do you do when you cannot reach the Kafka broker?\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"O Kafka, Where Art Thou?","uri":"https://www.morling.dev/blog/kafka-where-art-thou/"},{"content":" If you work on any kind of software library, ensuring backwards-compatibility is a key concern: if there’s one thing which users really dislike, it is breaking changes in a new version of a library. The rules of what can (and cannot) be changed in a Java API without breaking existing consumers are well defined in the Java language specification (JLS), but things can get pretty interesting in certain corner cases.\nThe Eclipse team provides a comprehensive overview about API evolution guidelines in their wiki. When I shared the link to this great resource on Twitter the other day, I received an interesting reply from Lukas Eder:\nI wish Java had a few tools to prevent some cases of binary compatibility breakages. E.g. 
when refining a method return type, I’d like to keep the old method around in byte code (but not in source code). I think kotlin has such tools? In the remainder of this post, I’d like to provide some more insight into that problem mentioned by Lukas, and how it can be addressed using an open-source tool called Bridger.\nThe Problem Let’s assume we have a Java library which provides a public class and method like this:\n1 2 3 4 5 6 public class SomeService { public Number getSomeNumber() { return 42L; } } The library is released as open-source and it gets adopted quickly by the community; it’s a great service after all, providing 42 as the answer, right?\nAfter some time though, people start to complain: instead of the generic Number return type, they’d rather prefer a more specific return type of Long, which for instance offers the compareTo() method. Since the returned value is always a long value indeed (and no other Number subtype such as Double), we agree that the initial API definition wasn’t ideal, and we alter the method definition, now returning Long instead.\nBut soon after we’ve released version 2.0 of the library with that change, users report a new problem: after upgrading to the new version, they suddenly get the following error when running their application:\n1 2 java.lang.NoSuchMethodError: \u0026#39;java.lang.Number dev.morling.demos.bridgemethods.SomeService.getSomeNumber()\u0026#39; at dev.morling.demos.bridgemethods.SomeClientTest.shouldReturn42(SomeClientTest.java:27) That doesn’t look good! Interestingly, other users don’t have a problem with version 2.0, so what is going on here? In order to understand that, let’s take a look at how this method is used, in source code and in Java binary code. 
First the source code:\npublic class SomeClient { public String getSomeNumber() { SomeService service = new SomeService(); return String.valueOf(service.getSomeNumber()); } } Rather unspectacular; so let’s use javap to examine the byte code of that class:\npublic java.lang.String getSomeNumber(); descriptor: ()Ljava/lang/String; flags: (0x0001) ACC_PUBLIC Code: stack=2, locals=2, args_size=1 0: new #7 // class dev/morling/demos/bridgemethods/SomeService 3: dup 4: invokespecial #9 // Method dev/morling/demos/bridgemethods/SomeService.\u0026#34;\u0026lt;init\u0026gt;\u0026#34;:()V 7: astore_1 8: aload_1 9: invokevirtual #10 // Method dev/morling/demos/bridgemethods/SomeService.getSomeNumber:()Ljava/lang/Number; 12: invokestatic #14 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String; 15: areturn LineNumberTable: line 21: 0 line 22: 8 LocalVariableTable: Start Length Slot Name Signature 0 16 0 this Ldev/morling/demos/bridgemethods/SomeClient; 8 8 1 service Ldev/morling/demos/bridgemethods/SomeService; The interesting part is the invokevirtual at label 9; that’s the invocation of the SomeService::getSomeNumber() method, and as we see, the return type of the invoked method is part of the byte code of that invocation, too. As developers writing code in the Java language, this might come as a surprise at first, as we tend to think of just a method’s name and its parameter types as the method signature. For instance, we may not declare two methods which only differ by their return type in the same Java class. But from the perspective of the Java runtime, the return type of a method is part of method signatures as well.\nThis explains the error reports we got from our users: when changing the method return type from Number to Long, we made a change that broke the binary compatibility of our library.
The JVM was looking for a method SomeService::getSomeNumber() returning Number, but it couldn’t find it in the class file of version 2.0 of our service.\nIt also explains why not all the users reported that problem: those that recompiled their own application when upgrading to 2.0 would not run into any issues, as the compiler would simply use the new version of the method and put the invocation of that signature into the class files of any callers. Only those users who did not re-compile their code encountered the problem, i.e. the change actually was source-compatible.\nBridge Methods to the Rescue At this point you might wonder: Isn’t it possible to refine method return types in sub-classes? How does that work then? Indeed it’s true, Java does support co-variant return types, i.e. a sub-class can override a method using a more specific return type than declared in the super-type:\npublic class SomeSubService extends SomeService { @Override public Long getSomeNumber() { return 42L; } } To make this work for a client coded against the super-type, the Java compiler uses a neat trick: it injects a so-called bridge method into the class file of the sub-class, which has the signature of the overridden method and which calls the overriding method.
This is how this looks like when disassembling the SomeSubService class file:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 public java.lang.Long getSomeNumber(); (1) descriptor: ()Ljava/lang/Long; flags: (0x0001) ACC_PUBLIC Code: stack=2, locals=1, args_size=1 0: ldc2_w #14 // long 42l 3: invokestatic #21 // Method java/lang/Long.valueOf:(J)Ljava/lang/Long; 6: areturn LineNumberTable: line 22: 0 LocalVariableTable: Start Length Slot Name Signature 0 7 0 this Ldev/morling/demos/bridgemethods/SomeSubService; public java.lang.Number getSomeNumber(); (2) descriptor: ()Ljava/lang/Number; flags: (0x1041) ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC (3) Code: stack=1, locals=1, args_size=1 0: aload_0 1: invokevirtual #24 // Method getSomeNumber:()Ljava/lang/Long; 4: areturn LineNumberTable: line 18: 0 LocalVariableTable: Start Length Slot Name Signature 0 5 0 this Ldev/morling/demos/bridgemethods/SomeSubService; 1 The overriding method as defined in the sub-class 2 The bridge method with the signature from the super-class, invoking the overriding method 3 The injected method has the ACC_BRIDGE and ACC_SYNTHETIC modifiers That way, a client compiled against the super-type method will first invoke the bridge method, which in turn delegates to the overriding method of the sub-class, providing the late binding semantics we’d expect from Java.\nAnother situation where the Java compiler relies on bridge methods is compiling sub-types of generic super-classes or interfaces. Refer to the Java Tutorial to learn more about this. Creating Bridge Methods Ourselves So as we’ve seen, with bridge methods, there is a tool in the box to ensure compatibility in case of refining return types in sub-classes. 
Which brings us back to Lukas\u0026#39; question from the beginning: is there a way for using the same trick for ensuring compatibility when evolving our API across library versions?\nNow you can’t define a bridge method using the Java language, this concept just doesn’t exist at the language level. So I thought about quickly hacking together a PoC for this using the ASM bytecode manipulation toolkit; but what’s better than creating open-source? Re-using existing open-source! As it turns out, there’s a tool for that very purpose exactly: Bridger, created by my fellow Red Hatter David M. Lloyd.\nBridger lets you create your own bridge methods, using ASM to apply the required class file transformations for turning a method into a bridge method. It comes with a Maven plug-in for integrating this transformation step into your build process. Here’s the plug-in configuration we need:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.jboss.bridger\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;bridger\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.5.Final\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;weave\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;process-classes\u0026lt;/phase\u0026gt; (1) \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;transform\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;dependencies\u0026gt; \u0026lt;dependency\u0026gt; (2) \u0026lt;groupId\u0026gt;org.ow2.asm\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;asm\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;9.2\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;/dependencies\u0026gt; \u0026lt;/plugin\u0026gt; 1 Bind the transform goal to the process-classes build lifecycle phase, so as to modify the classes produced by the Java compiler 2 Use the latest version of ASM, so we 
can work with Java 17 With the plug-in in place, you can define bridge methods like so, using the $$bridge name suffix (seems the syntax highlighter doesn’t like the $ signs in identifiers…​):\npublic class SomeService { /** * @hidden (1) */ public Number getSomeNumber$$bridge() { (2) return getSomeNumber(); } public Long getSomeNumber() { return 42L; } } 1 By means of the @hidden JavaDoc tag (added in Java 9), this method will be excluded from the JavaDoc generated for our library 2 The bridge method to be; the name suffix will be removed by Bridger, i.e. it will be named getSomeNumber; it will also have the ACC_BRIDGE and ACC_SYNTHETIC modifiers And that’s what the byte code of SomeService looks like after Bridger applied the transformation:\npublic java.lang.Number getSomeNumber(); descriptor: ()Ljava/lang/Number; flags: (0x1041) ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC Code: stack=1, locals=1, args_size=1 0: aload_0 1: invokevirtual #16 // Method getSomeNumber:()Ljava/lang/Long; 4: areturn LineNumberTable: line 21: 0 LocalVariableTable: Start Length Slot Name Signature 0 5 0 this Ldev/morling/demos/bridgemethods/SomeService; public java.lang.Long getSomeNumber(); descriptor: ()Ljava/lang/Long; flags: (0x0001) ACC_PUBLIC Code: stack=2, locals=1, args_size=1 0: ldc2_w #17 // long 42l 3: invokestatic #24 // Method java/lang/Long.valueOf:(J)Ljava/lang/Long; 6: areturn LineNumberTable: line 25: 0 LocalVariableTable: Start Length Slot Name Signature 0 7 0 this Ldev/morling/demos/bridgemethods/SomeService; With that, we have solved the challenge: utilizing a bridge method, we can rectify the glitch in the version 1.0 API and refine the method return type in a new version of our library, without breaking either source or binary compatibility with existing users.\nBy means of the @hidden JavaDoc tag, the source of our bridge method won’t show up in the rendered documentation
(which would be rather confusing), and marked as a synthetic bridge method in the class file, it also won’t show up when looking at the JAR in an IDE.\nIf you’d like to start your own explorations of Java bridge methods, you can find the complete source code of the example in this GitHub repo. Useful tools for tracking API changes and identifying any potential breaking changes include SigTest (we use this one for instance in the Bean Validation specification to ensure backwards compatibility) and Revapi (which we use in Debezium). Lastly, here’s a great blog post by Stuart Marks, where he describes how even the seemingly innocent addition of a Java default method to a widely used (and implemented) interface may lead to problems in the real world.\n","id":44,"publicationdate":"Nov 22, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIf you work on any kind of software library,\nensuring backwards-compatibility is a key concern:\nif there’s one thing which users really dislike, it is breaking changes in a new version of a library.\nThe rules of what can (and cannot) be changed in a Java API without breaking existing consumers are well defined in the Java language specification (JLS),\nbut things can get pretty interesting in certain corner cases.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe Eclipse team provides a \u003ca href=\"https://wiki.eclipse.org/Evolving_Java-based_APIs_2\"\u003ecomprehensive overview\u003c/a\u003e about API evolution guidelines in their wiki.\nWhen I shared the link to this great resource on Twitter the other day,\nI received an \u003ca href=\"https://twitter.com/lukaseder/status/1462358911072317440\"\u003einteresting reply\u003c/a\u003e from Lukas Eder:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"quoteblock\"\u003e\n\u003cblockquote\u003e\nI wish Java had a few tools to prevent some cases of binary compatibility breakages. E.g. 
when refining a method return type, I’d like to keep the old method around in byte code (but not in source code).\n\u003cbr/\u003e\n\u003cbr/\u003e\nI think kotlin has such tools?\n\u003c/blockquote\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn the remainder of this post,\nI’d like to provide some more insight into that problem mentioned by Lukas,\nand how it can be addressed using an open-source tool called \u003ca href=\"https://github.com/dmlloyd/bridger\"\u003eBridger\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Refining The Return Type Of Java Methods Without Breaking Backwards-Compatibility","uri":"https://www.morling.dev/blog/refining-return-type-java-methods-without-breaking-backwards-compatibility/"},{"content":" If you have followed this blog for a while, you’ll know that I am a big fan of JDK Flight Recorder (JFR), the low-overhead diagnostics and profiling framework built into the HotSpot Java virtual machine. And indeed, until recently, this meant only HotSpot: Folks compiling their Java applications into GraalVM native binaries could not benefit from all the JFR goodness so far.\nBut luckily, this situation has changed with GraalVM 21.2: Thanks to a collaboration of engineers from Red Hat and Oracle, GraalVM native binaries now also support JDK Flight Recorder. At this point, the JFR recording engine itself has been put in place, there are not many event types actually emitted yet. As Jie Kang wrote recently in a post about this ongoing work, this should change soon, though:\nThe initial merge for JFR infrastructure is complete but there is a long road ahead before the system can provide a view into native executables produced by GraalVM that is similar to what is possible for HotSpot. Up next is the work to add events for garbage collection, threads, exceptions, and other useful locations in SubstrateVM. 
— JDK Flight Recorder support for GraalVM Native Image: The journey so far What already does work is emitting custom JFR events from your application code. So I took the Quarkus-based todo management application from my earlier post about monitoring REST APIs with JFR and explored what it’d take to make it work as a native binary. And what should I say, essentially things \u0026#34;just worked ™️\u0026#34;. All I had to do, was the following:\nUse a current version of GraalVM (21.3 at the time of writing)\nUpgrade Quarkus to the current version (2.4.2.Final, released just today); with 2.2.3.Final, which I had been using before, I’d get an error at image build time about a modifier mismatch with a native method substituted by Quarkus\nEnable GraalVM’s AllowVMInspection option when creating the native binary\nAs per the GraalVM documentation, the latter is required in order to use JFR events in native binaries. Unfortunately, failing to do so will only be reported at application runtime with an exception like this:\n1 2 3 4 5 6 7 8 9 10 11 2021-11-12 15:31:22,456 ERROR [io.qua.run.Application] (main) Failed to start application (with profile prod): java.lang.UnsatisfiedLinkError: jdk.jfr.internal.JVM.getHandler(Ljava/lang/Class;)Ljava/lang/Object; [symbol: Java_jdk_jfr_internal_JVM_getHandler or Java_jdk_jfr_internal_JVM_getHandler__Ljava_lang_Class_2] at com.oracle.svm.jni.access.JNINativeLinkage.getOrFindEntryPoint(JNINativeLinkage.java:153) at com.oracle.svm.jni.JNIGeneratedMethodSupport.nativeCallAddress(JNIGeneratedMethodSupport.java:57) at jdk.jfr.internal.JVM.getHandler(JVM.java) at jdk.jfr.internal.Utils.getHandler(Utils.java:448) at jdk.jfr.internal.MetadataRepository.getHandler(MetadataRepository.java:174) at jdk.jfr.internal.MetadataRepository.register(MetadataRepository.java:135) at jdk.jfr.internal.MetadataRepository.register(MetadataRepository.java:130) at jdk.jfr.FlightRecorder.register(FlightRecorder.java:136) at 
dev.morling.demos.jfr.Metrics.registerEvent(Metrics.java:27) ... This is triggered by the application code registering the custom JFR event type:\n1 2 3 public void registerEvent(@Observes StartupEvent se) { FlightRecorder.register(JaxRsInvocationEvent.class); } Here I’d wish that either GraalVM’s native-image tool or Quarkus would tell me about this situation already upon build time, in particular as the cause of that problem is not readily apparent from the exception above. In any case, the required fix is simple enough, all we need to do is to set the quarkus.native.enable-vm-inspection option in the application.properties file of the Quarkus application:\n1 quarkus.native.enable-vm-inspection=true With that configuration in place, the application can be built as a native binary via mvn clean verify -Pnative. Grab a coffee while the build is running (it takes about two minutes on my laptop), and then you can start the resulting native binary with the following options for creating a JFR recording:\n1 2 3 ./target/flight-recorder-demo-1.0.0-SNAPSHOT-runner \\ -XX:+FlightRecorder \\ -XX:StartFlightRecording=\u0026#34;filename=my-recording.jfr\u0026#34; You can also configure some more of the known JFR options, such as maximum recording size and duration. What is not possible at this point is starting recordings dynamically at runtime e.g. via jcmd or JDK Mission Control, as the JMX-based infrastructure required for this isn’t present in native binaries (I haven’t tried to do so programmatically from within the application itself, this may be supported already). 
JFR Event Streaming (as introduced with JEP 349 in Java 14) also doesn’t work yet.\nAfter creating some todos in the web application, we can open the JFR recording in JDK Mission Control and examine the JFR events emitted for each invocation of the REST API:\nAs you see, besides the custom REST invocation events and some system events representing environment variables and system properties, the recording is rather empty. Also note how the thread attribute of the custom event type isn’t populated.\nI’ve updated the jfr-custom-events repository on GitHub, so you can get started with your own explorations around JFR events in GraalVM native binaries easily. Just make sure to have a current GraalVM and its native-image tool installed. The initial feature request for adding JFR support to GraalVM native binaries provides some more background information. You also can use JFR with the Mandrel distribution of GraalVM.\nTo learn more about JFR in general, have a look at this post by Mario Torre. 
Finally, if you’d like to find out how to use JFR for identifying potential performance regressions in your Java applications, check out this talk about JfrUnit which I did at the P99Conf conference a few weeks ago.\n","id":45,"publicationdate":"Nov 12, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIf you have followed this blog for a while,\nyou’ll know that I am a big fan of \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR),\nthe low-overhead diagnostics and profiling framework built into the HotSpot Java virtual machine.\nAnd indeed, until recently, this meant \u003cem\u003eonly\u003c/em\u003e HotSpot:\nFolks compiling their Java applications into \u003ca href=\"https://www.graalvm.org/reference-manual/native-image/\"\u003eGraalVM native binaries\u003c/a\u003e could not benefit from all the JFR goodness so far.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"JDK Flight Recorder Events in GraalVM Native Binaries","uri":"https://www.morling.dev/blog/jfr-events-in-graalvm-native-binaries/"},{"content":" If you love to attend conferences around the world without actually leaving the comfort of your house, 2021 certainly was (and is!) a perfect year for you. Tons of online conferences, many of them available for free, are hosting talks on all kinds of topics, and virtual conference platforms are getting better, too.\nAs the year is slowly reaching its end, I thought it might be nice to do a quick recap and gather in one place all the talks on Debezium and change data capture (CDC) which I did in 2021. An overarching theme for these talks was to discuss different CDC usage patterns and putting Debezium into the context of solving common data engineering tasks by combining it with other open-source projects such as Infinispan and Apache Pinot. 
In order to not feel too lonely in front of the screen and make things a bit more exciting, I decided to team up with some amazing friends from the open-source community for the different talks. A big thank you for these fantastic collaborations to Katia Aresti, Kenny Bastani, and Hans-Peter Grahsl!\nSo without further ado, here are four Debezium talks I had the pleasure to co-present in 2021.\nDon’t Fear Outdated Caches – Change Data Capture to the Rescue! As per an old saying in software engineering, there are only two hard things: cache invalidation and naming things. Well, turns out the first is solved actually ;)\nIn this talk at the Bordeaux Java User Group, Katia Aresti from the Infinispan team and I explored how users of an application can benefit from low response times by means of read data models, persisted in distributed caches close to the user. When working with a central database as the authoritative data source — thus receiving all the write requests — these local caches need to be kept up to date, of course. This is where Debezium comes in: any data changes are captured and propagated to the caches via Apache Kafka.\nAnd as if the combination of Kafka, Infinispan and Debezium was not already exciting enough, we also threw some Quarkus and Kafka Streams into the mix, joining the data from multiple Debezium change data topics, allowing to retrieve entire aggregate structures via a single key look-up from the local caches.
It’s still on our agenda to describe that architecture in a blog post, so stay tuned for that.\n📺 Recording on YouTube\n🖥️ Slides\n🤖 Demo source code\nChange Data Streaming Patterns in Distributed Systems While some folks already might feel something like microservices fatigue, the fact is undeniable that organizing business functionality into multiple, loosely coupled services has been one of the biggest trends in software engineering over the last years.\nOf course these services don’t exist in isolation, but they need to exchange data and cooperate; Apache Kafka has become the de-facto standard as the backbone for connecting services, facilitating asynchronous event-driven communication between them. In this joint presentation, my dear friend Hans-Peter Grahsl and I set out to explore what role change data capture can play in such architectures, and which patterns there are for applying CDC to solve common problems related to handling data in microservices architectures. We focused on three patterns in particular, each implemented using log-based CDC via Debezium:\nThe outbox pattern for reliable, eventually consistent data exchange between microservices, without incurring unsafe dual writes or tight coupling\nThe strangler fig pattern for gradually extracting microservices from existing monolithic applications\nThe saga pattern for coordinating long-running business transactions across multiple services, ensuring such activity gets consistently applied or aborted by all participating services\nWe presented that talk at several conferences, including Kafka Summit Europe, Berlin BuzzWords, and jLove. We also did a variation of the presentation at Flink Forward, discussing how to implement the different CDC patterns using Debezium and Apache Flink. The recording of this session should be published soon, in the meantime you can find the slides here.
I also highly recommend taking a look at this blog post by Bilgin Ibryam, in which he discusses these patterns in depth.\n📺 Recording on YouTube\n🖥️ Slides\n🤖 Demo source code\nAnalyzing Real-time Order Deliveries using CDC with Debezium and Pinot Traditionally, there has been a chasm between operational databases backing enterprise applications (i.e. OLTP systems), and systems meant for ad-hoc analytics use cases, such as queries run by a business analyst in the back office (OLAP systems). Data would typically be propagated in batches from the former to the latter, resulting in multi-hour delays until the analytics system would be able to run queries on changed production data.\nWith the current shift to user-facing analytics, we are observing nothing less than a revolution: the ability to serve low-latency analytical queries on large data sets to the end users of an application, based on data that is really fresh (seconds old, rather than hours). Compared to response times and freshness guarantees you’d typically get from earlier generations of data warehouses, this is a game changer.\nIn this model, Debezium is used to capture all data changes from the operational database and propagate them into the analytics system. Kenny Bastani of StarTree and I spoke about the opportunities and use cases enabled by combining Debezium with Apache Pinot, a realtime distributed OLAP datastore, at the Pinot meet-up.
A massive shout-out to Kenny again for putting together an awesome demo, showing how to use Debezium and the outbox pattern for getting the data into Apache Kafka, transform the data and ingest it into Pinot, and do some really cool visualizations via Apache Superset.\n📺 Recording on YouTube\n🤖 Demo source code\nDissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB After talking about three different CDC patterns, Hans-Peter and I decided to explore one of the patterns in some more depth and did this talk focusing exclusively on the strangler fig pattern. Existing monolithic applications are a reality in many enterprises, and oftentimes it’s just not feasible to replace them with a microservices architecture all at once in one single migration step.\nThis is where the strangler fig pattern comes in: it helps you to gradually extract components from a monolith into separate services, relying on CDC for keeping the data stores of the different systems in sync. A routing component, such as Nginx or Envoy Proxy, in front of all the systems sends each incoming request to that system which is in charge of a specific part of the domain at a given point in time during the migration.\nThis talk (which we presented at MongoDB.Live, Kafka Summit Americas, and VoxxedDays Romania) also contains a demo, in which we show how to implement the strangler fig pattern using Debezium, gradually moving data from a legacy system’s MySQL database over to the MongoDB instance of a new microservice, which is built using Quarkus.\n📺 Recording on YouTube\n🖥️ Slides\n🤖 Demo source code\nBonus: Debezium at the Trino Community Broadcast This one is not so much a regular conference talk, but more of an informal exchange, so I’m adding it as a bonus here, hoping you may find it interesting too.
Brian Olsen and Manfred Moser of Starburst, the company behind Trino, invited Ashhar Hasan, Ayush Chauhan, and me onto their Trino Community Broadcast.\nWe had a great time talking about Debezium and CDC in the context of Trino and its federated query capabilities, learning a lot from Ashhar and Ayush about their real-world experiences from using these technologies in production.\nLearning More Thanks again to Katia, Kenny, and Hans-Peter for joining the virtual conference stages with me this year! It would not have been half as much fun without you.\nIf these talks have piqued your interest in open-source change data capture and Debezium, head over to the Debezium website to learn more. You can also find many more examples in the Debezium examples repo on GitHub, and if you are looking for reports by folks from the community about their experiences using Debezium, take a look at this curated list of blog posts and other resources.\n","id":46,"publicationdate":"Nov 2, 2021","section":"blog","summary":"If you love to attend conferences around the world without actually leaving the comfort of your house, 2021 certainly was (and is!) a perfect year for you. Tons of online conferences, many of them available for free, are hosting talks on all kinds of topics, and virtual conference platforms are getting better, too.\nAs the year is slowly reaching its end, I thought it might be nice to do a quick recap and gather in one place all the talks on Debezium and change data capture (CDC) which I did in 2021.","tags":null,"title":"Debezium and Friends – Conference Talks 2021","uri":"https://www.morling.dev/blog/debezium-talks-2021/"},{"content":" I’ve been working from home exclusively for the last nine years, but it was only last year that I started to look into ways for expanding my computer set-up and going beyond the usual combination of having a laptop with your regular external screen. 
The global COVID-19 pandemic, the prospect of having more calls with colleagues than ever (no physical meetings), and the constantly increasing need for recording talks for online conferences and meet-ups made me reevaluate things and steadily improve and fine-tune my set-up, in particular with regard to better video and audio quality.\nWhen I shared a picture of my desk on Twitter recently, a few folks asked for more details on specific parts like the screen, microphone, etc., so I thought I’d provide some insights in this post. Don’t expect any sophisticated test or evaluation of sorts; I’m just going to briefly describe the different components, how I use them, things I like about them, and other aspects which still could be improved. Note that I’m not affiliated in any way with any of the vendors mentioned in this post, so anything positive or negative I’m going to mention is solely based on my personal experience from using the discussed items, without any financial incentive to do so. There are also no affiliate links.\nThe Screen Let’s start with the most apparent part of the set-up, the screen. It’s a curved 49\u0026#34; 32:9 ultra-widescreen display (Samsung C49RG94SSR, 5120 x 1440 pixels), i.e. it offers the same screen real estate as two 16:9 screens next to each other.\nWhether such a large screen suits your personal preferences is something which you only really can find out by yourself. Curvature of the screen is something you may have to get used to; initially I was slightly put off by (wide) windows not appearing 100% straight, but by now I don’t even notice this any more. I suggest you have a look at this article by my colleague Emmanuel Bernard, where he compares ultra-wide monitors to the alternatives and discusses the pros and cons of each. Personally, I’m very happy with this screen and really wouldn’t want to miss it. 
I never was a fan of multi-screen set-ups due to the inevitable frames between screens, and in fact, my only regret is that I didn’t buy it earlier. So thanks a lot for the recommendation, Emmanuel!\nSome folks use window managers to arrange their application windows on large screens (e.g. Rectangle on macOS, a few more alternatives are discussed in this thread by Guillaume Laforge), but I find myself just manually organizing things in roughly three columns: communications (email, chat), editing (documents, shell, IDE, etc.), and preview (e.g. rendered AsciiDoc documents).\nFigure 1. Reading the source code of HasThisTypePatternTried…​Visitor at 300%? No problem! One very useful feature of this monitor is its picture-by-picture mode (PBP): it lets you connect two sources at once, which then will show up next to each other on the screen. Now I’m typically not working with two computers simultaneously (although this can be useful when for instance editing a benchmark on one machine and running it on another), but I use PBP when doing presentations, or when recording conference talks; in that case, I’ll connect the same machine twice, i.e. as primary and secondary screen. This allows me to share one of the screens entirely for the presentation/recording (thus having the commonly expected 16:9 aspect ratio), with other applications being located on the second screen, and without having to manually adjust the size of individually shared windows or tabs. Needless to say that sharing the full screen isn’t very practical, as viewers with a regular screen would just see a small wide ribbon.\nAre there downsides to this screen? So far, I’ve found two. One is its energy consumption; with 55 kWh/1,000h, it’s definitely on the high end of the spectrum. I suppose in parts that’s just due to its sheer size, but I’m sure things could be improved here. 
The other thing to mention is that when using it with a MacBook Pro, you should make sure to have the lid of the laptop closed (implying that you’ll need an external keyboard and mouse/touchpad), as the fan will be substantially more audible when driving both internal and external screens.\nOne last minor annoyance is that the screen’s software forgets the settings when enabling and disabling the picture-by-picture mode. When switching from single input to PBP, I always need to configure the input sources again. Here I really wish the screen would remember the settings from the last time I was using PBP.\nCompute I am using two Apple computers to get things done: a 2019 16\u0026#34; MacBook Pro (2.6 GHz 6-Core Intel Core i7, 32 GB of RAM) provided by my employer, and a Mac Mini M1 2020 with 16 GB of RAM. Most work stuff is happening on the MacBook Pro, and really there’s nothing too exciting to share here; it tends to do its job just as it should. There are two things I don’t like about it though:\nthe touch bar; it’s virtually useless to me, and I wished for physical function keys instead, making it much more reliable to hit the right key combinations, e.g. in the IDE. Granted, I work with an external keyboard most of the time, so it’s not impacting me that much\nthe only connectivity option being USB-C; while surely elegant, the required zoo of connectors and adapters to actually plug in external hardware renders that point more than moot\nThankfully, Apple finally got that memo too and addressed both things in their latest MacBook Pro edition.\nFigure 2. Duplo bricks make for a perfect laptop stand; luckily, I could borrow some from my daughter The Mac Mini is awesome for any kind of video recording and streaming. Recently, I was asked to record two Full HD streams for a talk at AccentoDev: one with my slides, and one with my camera feed, allowing the video editor to freely switch between the two when creating the final recording for publication. 
The M1 wouldn’t break a sweat when recording this video with a resolution of 3,840 x 1,080 pixels via OBS, with the fan barely being audible. Whereas when trying to do the same on the MBP, the fan would spin up heavily, and you’d have a hard time to not capture the fan noise with the microphone.\nFigure 3. MacMini M1 2020 Originally, I bought the Mac Mini M1 to experiment a bit with running Java applications on the AArch64 architecture. Unfortunately, I didn’t really find much time yet to do so. One interesting thing I noticed though from running some quick JMH benchmarks against the new Java Vector API is that results tended to be super-stable, with a much smaller standard deviation than running the same benchmark on the x86-based laptop. I hope to find some time to dive a bit more into that area at some point in the future.\nCloud Compute Every now and then, I do have the need for running something on Linux rather than macOS, or for spinning up multiple boxes, executing a benchmark for instance. Ok, ok, they are not actually running on my desk, but I thought it still might be interesting to share a few words on that.\nMy preferred go-to platform for these scenarios is Hetzner Cloud, as they provide flexible cloud compute options at a really attractive price tag, in particular capped at a fixed limit, so there’s no potential for surprise bills coming in.\nTo make launching and configuring boxes in the Hetzner cloud as easy as possible for me, I have a simple set-up of Terraform and Ansible scripts. The former just launches up the desired number of compute nodes with the chosen spec, using the current version of Fedora as the operating system. The latter installs the tools I commonly need, such as different Java versions, commonly used CLI tools, and such.\nOne neat thing about Hetzner Cloud is that you can easily scale up and down single instances. 
So what I’ll usually do is to spin up a box in the smallest available configuration (CX11); running this for a full month costs a whopping €4.15. But then, when I actually want to use the node, I’ll change the Terraform configuration to something more powerful, such as the CCX22 instance type with 4 dedicated vCPU and 16 GB RAM. One quick terraform apply and a few seconds later, I’ll have a node with the specs I need. Only for the few hours I’m using it, I’ll have to pay the increased price for the better spec, before scaling it back down to the CX11 instance again.\nCameras So let’s change topics and talk a bit about my recording set-up. There’s essentially three scenarios where I need to record myself and/or my screen:\nVideo calls: working 100% from home in a globally distributed development team, there’s not a single day where I won’t have to do at least a couple of calls with my co-workers\nConference talks: with the global pandemic still going on, all the conferences have gone virtual, requiring either to pre-record or live-stream any talks\nDemos: lately, I’ve become a fan of recording short videos introducing new features in the projects I’m involved with, e.g. the Debezium UI or kcctl\nAdditionally, I’m joining Nicolai Parlog once per month on his Twitch channel, where we talk about and explore all things Java.\nWhile I initially used the internal camera and microphone of my laptop, I wasn’t really satisfied with the outcome, in particular once I saw the high quality of recordings shared by other folks. For a really good video image quality, two things are key: using a \u0026#34;real\u0026#34; camera (i.e. not a webcam), and proper lighting. You’ll also want a good external microphone, more on that below.\nSo why not a webcam? Essentially because sensors are too small and lenses are too slow, which means you’ll quickly have noise in the image and you won’t get that nice movie-like look with a shallow depth of field (bokeh). 
Using either a DSLR or a mirrorless system camera will yield a dramatically better image quality. In my case, I am using the Lumix GX80 (sold as GX85 in the US), a mirrorless system camera from Panasonic, using the Micro Four Thirds interchangeable lens standard.\nFigure 4. Panasonic Lumix GX80 and Logitech StreamCam I’m generally happy with it for this purpose: it provides clean HDMI output (i.e. no menu overlays when capturing the live feed via HDMI, as is the case with some cameras), and image quality and ergonomics are good overall. On the downside, it doesn’t provide continuous auto-focus if you’re not actually recording on the camera. This sounds worse than it actually is in practice: using the \u0026#34;Quick AF\u0026#34; option, it will auto-focus when turning on the camera, or when zooming in or out a bit, which is enough to get proper focussing in a relatively static setting such as a screen recording session. If you are planning to move back and forth a lot though, then you should look into other options. Another thing to mention is that the GX80 doesn’t allow you to connect an external microphone to it; in my case, that doesn’t matter though, as I’m connecting the mic via a separate audio interface.\nAs you’d quickly run down the camera’s battery when streaming its video signal for a longer period of time, an external power source should be used. I’m using a dummy battery similar to this one, which does the job as expected. Just make sure to have a USB power adapter which provides enough output current (2A or more); I had missed that initially and was wondering why the camera would always turn off when pressing the focus button… For a camera mount, I’m using this cheap one; it’s pretty crappy, with lots of wobbling, but once you have the camera in the place where you want it to be, it’ll stay there. 
Still, I’d probably pay a bit more to get a more robust mount, should I ever have to buy a new one.\nAs you typically cannot connect a DSLR or a mirrorless system camera like the GX80 via USB, you’ll also need an HDMI converter which you then can plug into your USB port. Here I’m using the ubiquitous Elgato Cam Link 4K. Back when I got it, it was pretty much the only (and pricy) option, but I believe by now there are alternatives, which should work equally well but are a bit cheaper.\nDespite my \u0026#34;no webcam\u0026#34; mantra, I also have a Logitech StreamCam in addition to the GX80. As you’d expect, image quality is not really comparable, in particular white balance tends to be quite off for a while after switching it on. I still use it occasionally for video calls, as it’s a bit quicker to turn on and set up in comparison to the GX80.\nWork in Progress: Teleprompter One of my pet peeves with modern communication is the lack of \u0026#34;eye contact\u0026#34; during virtual conference talks and video calls. As we all want to look onto the screen rather than the camera, the viewer on the other side feels like you are not looking at them, but slightly below or to the side. While I believe I largely manage to look into the camera when doing talk recordings, I find it nearly impossible to do so during calls, as the natural desire to look at the other person’s image on my screen is just too strong.\nThat’s why I’ve started to explore how I could build my own teleprompter, which puts the camera behind a two-way mirror. That way, I can look at the screen, while also looking straight into the camera. 
For this purpose, I bought two-way mirror glass on eBay (Schott beamsplitter glass, which is working amazingly well) as well as a cheap-ish external screen, and built a quick proof-of-concept (again using some of my daughter’s Duplo bricks, this time for the frame).\nThe result was pretty promising, with one open challenge being that the display contents are mirrored from left to right. So I’d need to digitally mirror the output of that display; if you are aware of any option to do so on macOS, any pointers would be appreciated. With 11.6\u0026#34;, the screen also is rather small, if you consider building something like this by yourself, I’d recommend going for a larger one.\nSince then, I’ve dropped that ball a bit and haven’t followed through yet to make it \u0026#34;production-worthy\u0026#34;. I’d still love to make this useful in practice eventually, perhaps once my daughter lets me keep those Duplo bricks ;)\nLighting The best camera won’t help you much if there isn’t enough light to work with. Generally, the more light you have, the easier the job will be for the camera. I have a ring light similar to this one, with adjustable brightness and color temperature. I don’t have much to say about it, other than that it does what I want it to do. Note that the tripod requires some space on the floor, which means you cannot move your desk all the way to the wall if you have the light behind it. It’s not that much of a problem in my case, but you may consider getting a desk-mount alternatively.\nOne problem I do have with the ring light is reflections on my glasses. I haven’t really found a good solution here (no, I won’t get contact lenses), other than pushing the ring light a bit higher than ideal, so that there are no reflections when looking into the camera further below. On the downside, this results in the area below my chin becoming a bit shaded. A case of having to choose your poison, I suppose.\nFigure 5. 
Background Lights When doing conference talks, I have two more lights in the backgrounds which make for a nicer atmosphere of the scenery. A vintage light (no-name brand, got it from my local hardware store) which adds a nice highlight, and a Philips Hue Iris lamp which adds a colored note of my choosing. Overall, I’m like 90% happy with the lighting set-up, the comment by video grandmaster Nicolai about lacking separation of background and foreground still nags me ;)\nAudio Finally, let’s talk about my audio recording set-up. This definitely is the area I knew the least about when setting out to improve my computing and recording gear. I don’t quite remember when and how I got sucked into the audio game, perhaps it was when I learned about scientific research indicating that audio quality impacts the perceived quality of spoken content.\nAfter a rather disappointing experience with the RØDE NT-USB (perhaps it’s my lack of audiophile sensitivity, but I didn’t sense a significant difference compared to using the built-in laptop mic), I decided to look for an external microphone which doesn’t connect via USB. After some research, I decided to go for the RØDE Procaster, which is a rather professional microphone purpose-built for voice recording. It is a dynamic microphone, which in comparison to a condenser microphone will pick up much less noise from your surroundings (you can learn more about the differences between these two kinds of microphones here). This means that I don’t have to ask my family to be extra-silent in the house while I am doing a recording.\nFigure 6. RØDE Procaster Microphone One thing to keep in mind is that this type of microphone is meant to be put rather close to your mouth, which you may or may not find annoying. Personally, I sort of like how this makes speaking a more conscious act, but I’d probably not like to have the microphone in front of me when doing a multi-hour call. 
That’s why I also have a cheap-ish headset as an alternative for these situations. Yet another — and more costly — option would be to get a shotgun microphone which you can position further away from you.\nThe microphone is rather heavy (and you wouldn’t want to hold it anyways), so I am using the PSA1 studio boom arm. It lets you move the microphone with a single finger to where you want it to be, and then it will stay exactly there. A really solid piece of engineering, in particular when comparing it to the no-name mount I’m using for the camera.\nHaving an external microphone is just one part of the story, though. You also need to have an audio interface which lets you plug in the microphone (using an XLR cable) and then propagates the audio signal to your computer via USB. I didn’t do much exploration here, but went for the PreSonus AudioBox USB 96, which was recommended to me by a coworker. In general, it does the job well; there are two things I don’t like about it, though.\nFigure 7. PreSonus AudioBox USB 96 Audio Interface First, it doesn’t have a physical power switch, which means its two (rather bright) red LEDs will be lit as long as it’s connected to the USB port. Secondly, I really wish it had a built-in option to emit the microphone signal on both audio channels, left and right. As a microphone is a mono audio source, you’ll hear the signal only on one channel (typically the left one) on your computer. When doing recordings, where you have the time and ability to do some post-processing, that’s not a big problem; you can simply duplicate the audio track to both channels. But when using the microphone in a Zoom call or similar, the one-sided output is not what you want. In the absence of hardware support for this kind of upmixing in the AudioBox, I had to go for a software solution, which took me quite some time to figure out.\nOn macOS, this requires two programs, LadioCast and BlackHole. 
The former lets you take the single channel input from the AudioBox and expose it on both channels, left and right. This is then connected to a virtual audio device created using the BlackHole audio driver. In Zoom or similar software, you then use that virtual device for the audio input. This works reliably and without any noticeable latency. Still I wished the AudioBox would just take care of all of this and provide me with the microphone input upmixed to both channels.\nFigure 8. Setting up a virtual audio device using BlackHole and connecting the mono microphone input to it using both channels via LadioCast; note how channel 1 is used for both L and R in the input configuration in LadioCast Coming back to the microphone, one thing to be aware of is that it provides a rather low output signal. While you can boost it up far enough with the AudioBox, you’ll start to hear some noise. And I haven’t spent hundreds of Euros and multiple hours to get noise, have I?! So I did what every reasonable person would do in that situation: spend some more money.\nFigure 9. CloudLifter CL-1 Mic Activator The solution was to add a pre-amplifier. Here I went for the CloudLifter, which you put between the microphone and the audio interface. It takes 48V phantom power (which the AudioBox provides) and adds +25dB of gain, giving me audio with proper volume, without any audible hiss whatsoever. Take that, sunken cost fallacy!\nIf you would like to hear (and see) a recording with this set-up, have a look at this session about the JfrUnit project from P99Conf earlier this year.\nWhat’s Next? Overall, I’m very happy with my computing and recording set-up. One thing that still could be improved is lighting. It’s a common practice to work with two front lights (or one from the front and one from the side), so I’ll probably buy another light at some point. 
I also hope to finish the teleprompter project and put it into daily use.\nOther than that, I am sometimes wondering whether I should get a second mirrorless camera and a video switcher like the Atem Mini and explore a multi-camera set-up. I’m certain that this would be lots of fun, on the other hand I don’t really have the need for it…​ yet?\nMany thanks to Hans-Peter Grahsl for his feedback while writing this blog post!\n","id":47,"publicationdate":"Oct 24, 2021","section":"blog","summary":"\u003cdiv class=\"imageblock\"\u003e\n\u003cdiv class=\"content\"\u003e\n\u003cimg src=\"/images/desk_complete.jpg\" alt=\"desk complete\"/\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI’ve been working from home exclusively for the last nine years,\nbut it was only last year that I started to look into ways for expanding my computer set-up and go beyond the usual combination of having a laptop with your regular external screen.\nThe global COVID-19 pandemic, the prospect of having more calls with colleagues than ever (no physical meetings), and the constantly increasing need for recording talks for online conferences and meet-ups made me reevaluate things and steadily improve and fine tune my set-up, in particular in regards to better video and audio quality.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"What's on My Desk?","uri":"https://www.morling.dev/blog/whats-on-my-desk/"},{"content":" It has been just a few weeks since the release of Java 17, but the first changes scheduled for Java 18 begin to show up in early access builds. 
One feature in particular that excites me as a maintainer of different Java libraries is JEP 413 (\u0026#34;Code Snippets in Java API Documentation\u0026#34;).\nSo far, JavaDoc has not made it exactly comfortable to include example code which shows how to use an API: you had to escape special characters like \u0026#34;\u0026lt;\u0026#34;, \u0026#34;\u0026gt;\u0026#34;, and \u0026#34;\u0026amp;\u0026#34;, indentation handling was cumbersome. But the biggest problem was that any such code snippet would have to be specified within the actual JavaDoc comment itself, i.e. you did not have proper editor support when creating it, and worse, it was not validated that the shown code actually is correct. This often led to code snippets which wouldn’t compile if you were to copy them into a Java source file, be it due to an oversight by the author, or simply because APIs changed over time and no one was thinking of updating the corresponding snippets in JavaDoc comments.\nAll this is going to change with JEP 413: it does not only improve ergonomics of inline snippets, but it also allows you to include code snippets from external source files. This means that you’ll be able to edit and refactor any example code using your regular Java toolchain; better yet: you can also compile and test it as part of your build. Welcome to 2021 — no more wrong or outdated code snippets in JavaDoc!\nIncluding Snippets From Your Test Directory You could think of different ways for organizing your snippet files with JEP 413, but one particularly intriguing option is to source them straight from the tests of your project, e.g. the src/test/java directory in case of a Maven project. That way, any incorrect snippet code — be it due to compilation failures or due to failing test assertions — will be directly flagged within your build.\nSo let’s see how to set this up, using the Jakarta Bean Validation API project as an example. 
The required configuration is refreshingly simple; all we need to do is to specify src/test/java as our \u0026#34;snippet path\u0026#34;. While the Maven JavaDoc plug-in does not yet provide a bespoke configuration option for this, we can simply pass it using the \u0026lt;additionalOptions\u0026gt; property (make sure to use version 3.0.0 or later):\n\u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-javadoc-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.3.1\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;attach-javadocs\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;jar\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;additionalOptions\u0026gt; \u0026lt;additionalOption\u0026gt; (1) --snippet-path=${basedir}/src/test/java \u0026lt;/additionalOption\u0026gt; \u0026lt;/additionalOptions\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; 1 Obtain snippets from src/test/java And that’s really all there is to it; you can now start to work with example code as actual source code. 
Here’s an example of a snippet to be included into the API documentation of jakarta.validation.Validation, the entry point into the Bean Validation API:\npackage snippets; (1) import jakarta.validation.Validation; import jakarta.validation.ValidatorFactory; public class CustomProviderSnippet { public void customProvider() { // @start region=\u0026#34;provider\u0026#34; (2) ACMEConfiguration configuration = Validation .byProvider(ACMEProvider.class) .providerResolver( new MyResolverStrategy() ) .configure(); ValidatorFactory factory = configuration.buildValidatorFactory(); // @end (2) } } 1 There are no specific requirements on the package to be used; I like using a descriptive name, snippets, so as to easily tell snippets apart from functional tests 2 If you don’t want to include the entire file, regions allow you to specify the exact section(s) to include While a plain method is shown here, this could of course also be a JUnit test with assertions for making sure that the snippet code does what it is supposed to do (being an API specification, the Bean Validation project itself doesn’t provide an implementation we could test against). Including the snippet into the JavaDoc in the source file is straightforward:\n/** * ... * \u0026lt;li\u0026gt; * The third approach allows you to specify explicitly and in * a type safe fashion the expected provider. * \u0026lt;p\u0026gt; * Optionally you can choose a custom {@code ValidationProviderResolver}. * {@snippet class=\u0026#34;snippets.CustomProviderSnippet\u0026#34; region=\u0026#34;provider\u0026#34;} (1) * \u0026lt;/li\u0026gt; * ... */ 1 Specify the snippet either using the class or the file attribute; optionally define a specific snippet region to be included If needed, you can also customize the appearance of the rendered snippet, for example to add links, highlight key parts (using custom CSS styles if needed), or replace specific parts of the snippet. 
The latter comes in handy for instance to replace non-critical parts with a placeholder such as \u0026#34;…\u0026#34;. This is one of the details I really like about this JEP: even if you did manage example code in separate source files in the past, then manually copying it into JavaDoc, such placeholders made things cumbersome. Naturally, they’d fail compilation, i.e. you always had to do some manual editing when copying over the snippet into JavaDoc. Getting all this \u0026#34;for free\u0026#34; is a very nice improvement.\nHere’s an example showing these adjustments in source form (scroll to the right to see all the snippet tag attributes, as these lines can become fairly long):\npublic void customProvider() { // @start region=\u0026#34;provider\u0026#34; ACMEConfiguration configuration = Validation .byProvider(ACMEProvider.class) // @highlight substring=\u0026#34;byProvider\u0026#34; (1) .providerResolver( new MyResolverStrategy() ) // @replace regex=\u0026#34; new MyResolverStrategy\\(\\) \u0026#34; replacement=\u0026#34;...\u0026#34; (2) .configure(); ValidatorFactory factory = configuration.buildValidatorFactory(); // @link regex=\u0026#34;^.*?ValidatorFactory\u0026#34; target=\u0026#34;jakarta.validation.ValidatorFactory\u0026#34; (3) // @end } 1 Highlight the byProvider() method 2 Replace the parameter value of the method call with \u0026#34;…\u0026#34; 3 Make the ValidatorFactory class name a link to its own JavaDoc And this is how the snippet will look in the rendered documentation:\nSome folks may argue that it might be nice to have proper colored syntax highlighting support. I’m not sure whether I agree though: your typical code snippets in API docs should be rather short, and simply highlighting key parts as shown above may be more useful than colorizing the entire thing. Note the extra new line at the beginning of the snippet shouldn’t really be there; it’s not quite clear to me where it’s coming from. 
I’ll try and get this clarified on the javadoc-dev mailing list.\nSummary Being able to include code snippets from actual source files into API documentation is a highly welcome improvement for Java API docs authors and users alike. It’s great to see Java catching up here with other language eco-systems like Rust, which already support executable documentation examples. I’m expecting this feature to be adopted very quickly, with the first folks already announcing plans to build their API docs with Java 18 as soon as it’s out. Of course, you can still ensure compatibility of your code with earlier Java versions when doing so.\nIf you’d like to get your hands on executable JavaDoc code snippets yourself, you can start with this commit showing the required changes for the Bean Validation API. Run mvn clean verify, and you’ll find the rendered JavaDoc under target/apidocs. Just make sure to build this project using a current Java 18 early access build. Happy snippeting!\n","id":48,"publicationdate":"Oct 18, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt has been just a few weeks since the \u003ca href=\"https://www.infoq.com/news/2021/09/java17-released/\"\u003erelease of Java 17\u003c/a\u003e, but the first changes scheduled for Java 18 begin to show up in early access builds.\nOne feature in particular that excites me as a maintainer of different Java libraries is \u003ca href=\"https://openjdk.java.net/jeps/413\"\u003eJEP 413\u003c/a\u003e (\u0026#34;Code Snippets in Java API Documentation\u0026#34;).\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Executable JavaDoc Code Snippets","uri":"https://www.morling.dev/blog/executable-javadoc-code-snippets/"},{"content":" The ResourceBundle class is Java’s workhorse for managing and retrieving locale specific resources, such as error messages of internationalized applications. 
With the advent of the module system in Java 9, specifics around discovering and loading resource bundles have changed quite a bit, in particular when it comes to retrieving resource bundles across the boundaries of named modules.\nIn this blog post I’d like to discuss how resource bundles can be used in a multi-module application (i.e. a \u0026#34;modular monolith\u0026#34;) for internationalizing error messages. The following requirements should be satisfied:\nThe individual modules of the application should contribute bundles with their specific error messages, avoiding the need for all developers on the team to work on one large shared resource bundle\nOne central component (like an error handler) should use these bundles for displaying or logging the error messages in a uniform way\nThere should be no knowledge about the specific modules needed in the central component, i.e. it should be possible to add further modules to the application, each with their own set of resource bundles, without having to modify the central component\nThe rationale of this design is to enable individual development teams to work independently on their respective components, including the error message resource bundles, while ensuring consistent preparation of messages via the central error handler.\nAs an example, we’re going to use Links, a hypothetical management software for golf courses. It is comprised of the following modules (click on image to enlarge):\nThe core module contains common \u0026#34;framework\u0026#34; code, such as the error handler class. The modules greenkeeping, tournament, and membership represent different parts of the business domain of the Links application. Normally, this is where we’d put our business logic, but in the case at hand they’ll just contain the different resource bundles. 
Lastly, the app module provides the entry point of the application in form of a simple main class.\nThe ResourceBundleProvider Interface If you have worked with resource bundles before, you may have come across approaches for merging multiple bundles into one. While technically still doable when running with named Java modules, it is not advisable; in order to be found across module boundaries, your bundles would have to reside in open packages. Also, as no package may be contained in more than one module, you’d have to implement some potentially complex logic for identifying bundles contributed by different modules, whose exact names you don’t know (see the third requirement above). You could consider using automatic modules, but then you’d void some advantages of the Java module system, such as the ability to create modular runtime images.\nThe solution to these issues comes in the form of the ResourceBundleProvider API, introduced alongside the module system in Java 9. Based on the Java service loader mechanism, it enables one module to retrieve bundles from other modules in a loosely coupled way; the consuming module neither needs to know about the providing modules themselves, nor about implementation details such as their internally used bundle names and locations.\nSo let’s see how we can use ResourceBundleProvider in the Links application. The first step is to define a bundle-specific service provider interface, derived from ResourceBundleProvider:\npackage dev.morling.links.core.spi; import java.util.spi.ResourceBundleProvider; public interface LinksMessagesProvider extends ResourceBundleProvider { } The name of bundle provider interfaces must follow the pattern \u0026lt;package of baseName\u0026gt; + \u0026#34;.spi.\u0026#34; + \u0026lt;simple name of baseName\u0026gt; + \u0026#34;Provider\u0026#34;. 
As the base name is dev.morling.links.core.LinksMessages in our case, the provider interface name must be dev.morling.links.core.spi.LinksMessagesProvider. This can be sort of a stumbling block, as an innocent typo in the package or type name will cause your bundle not to be found, without good means of analyzing the situation, other than double and triple checking that all names are correct.\nNext, we need to declare the usage of this provider interface in the consuming module. Assuming the afore-mentioned error handler class is located in the core module, the module descriptor of the same looks like so:\nmodule dev.morling.links.core { exports dev.morling.links.core; exports dev.morling.links.core.spi; (1) uses dev.morling.links.core.spi.LinksMessagesProvider; (2) } 1 Export the package of the resource bundle provider interface so that implementations can be created in other modules 2 Declare the usage of the LinksMessagesProvider service Using the resource bundle in the error handler class is rather unexciting; note that it is not our own application code which retrieves the resource bundle provider via the service loader; instead, this happens in the ResourceBundle::getBundle() factory method:\npublic class ErrorHandler { public String getErrorMessage(String key, UserContext context) { ResourceBundle bundle = ResourceBundle.getBundle( \u0026#34;dev.morling.links.core.LinksMessages\u0026#34;, context.getLocale()); return \u0026#34;[User: \u0026#34; + context.getName() + \u0026#34;] \u0026#34; + bundle.getString(key); } } Here, the error handler simply obtains the message for a given key from the bundle, using the locale of some user context object, and returning a message prefixed with the user’s name. 
This implementation just serves for example purposes of course; in an actual application, message keys might for instance be obtained from application specific exception types, raised in the different modules, and logged in a unified way via the error handler.\nResource Bundle Providers With the code in the core module in place (mostly, that is, as we’ll see in a bit), let’s shift our attention towards the resource bundle providers in the different application modules. Not too surprisingly, they need to define an implementation of the LinksMessagesProvider contract.\nThere is one challenge though: how can the different modules contribute implementations for one and the same bundle base name and locale? Once the look-up code in ResourceBundle has found a provider which returns a bundle for a requested name and locale, it will not query any other bundle providers. In our case though, we need to be able to obtain messages from any of the bundles contributed by the different modules: messages related to green keeping must be obtained from the bundle of the dev.morling.links.greenkeeping module, tournament messages from dev.morling.links.tournament, and so on.\nThe idea to address this concern is the following:\nPrefix each message key with a module specific string, resulting in keys like tournament.fullybooked, greenkeeping.greenclosed, etc.\nWhen requesting the bundle for a given key in the error handler class, obtain the key’s prefix and pass it to bundle providers\nLet bundle providers react only to their specific message prefix\nThis is where things become a little bit fiddly: there isn’t a really good way for passing such contextual information from bundle consumers to providers. Our loophole here will be to squeeze that information into the requested Locale instance. 
Besides the well-known language and country attributes, Locale can also carry variant data and even application specific extensions.\nThe latter, in form of a private use extension, would actually be pretty much ideal for our purposes. But unfortunately, extensions aren’t evaluated by the look-up routine in ResourceBundle. So instead we’ll go with propagating the key namespace information via the locale’s variant. First, let’s revisit the code in the ErrorHandler class:\npublic class ErrorHandler { public String getErrorMessage(String key, UserContext context) { String prefix = key.split(\u0026#34;\\.\u0026#34;)[0]; (1) Locale locale = new Locale( (2) context.getLocale().getLanguage(), context.getLocale().getCountry(), prefix ); ResourceBundle bundle = ResourceBundle.getBundle( \u0026#34;dev.morling.links.core.LinksMessages\u0026#34;, locale); (3) return \u0026#34;[User: \u0026#34; + context.getName() + \u0026#34;] \u0026#34; + bundle.getString(key); (4) } } 1 Extract the key prefix, e.g. 
\u0026#34;greenkeeping\u0026#34; 2 Construct a new Locale, using the language and country information from the current user’s locale and the key prefix as variant 3 Retrieve the bundle using the adjusted locale 4 Prepare the error message Based on this approach, the resource bundle provider implementation in the greenkeeping module looks like so:\npublic class GreenKeepingMessagesProvider extends AbstractResourceBundleProvider implements LinksMessagesProvider { @Override public ResourceBundle getBundle(String baseName, Locale locale) { if (locale.getVariant().equals(\u0026#34;greenkeeping\u0026#34;)) { (1) baseName = baseName.replace(\u0026#34;core.LinksMessages\u0026#34;, \u0026#34;greenkeeping.internal.LinksMessages\u0026#34;); (2) locale = new Locale(locale.getLanguage(), locale.getCountry()); (3) return super.getBundle(baseName, locale); } return null; (4) } } 1 This provider should only return a bundle for \u0026#34;greenkeeping\u0026#34; messages 2 Retrieve the bundle, adjusting the name (see below) 3 Create a Locale without the variant 4 Let other providers kick in for messages unrelated to green-keeping The adjustment of the bundle name deserves some more explanation. The module system forbids so-called \u0026#34;split packages\u0026#34;, i.e. packages of the same name in several modules of an application. That’s why we cannot have a bundle named dev.morling.links.core.LinksMessages in multiple modules, even if the package dev.morling.links.core isn’t exported by any of them. So each module must have its bundles in a specific package, and the bundle provider has to adjust the name accordingly, e.g. 
into dev.morling.links.greenkeeping.internal.LinksMessages in the greenkeeping module.\nAs with the service consumer, the service provider also must be declared in the module’s descriptor:\nmodule dev.morling.links.greenkeeping { requires dev.morling.links.core; provides dev.morling.links.core.spi.LinksMessagesProvider with dev.morling.links.greenkeeping.internal.GreenKeepingMessagesProvider; } Note how the package of the provider and the bundle isn’t exported or opened, solely being exposed via the service loader mechanism. For the sake of completeness, here are two resource bundle files from the greenkeeping module, one for English, and one for German:\ngreenkeeping.greenclosed=Green closed due to mowing greenkeeping.greenclosed=Grün wegen Pflegearbeiten gesperrt Lastly, a test for the ErrorHandler class, making sure it works as expected:\nErrorHandler errorHandler = new ErrorHandler(); String message = errorHandler.getErrorMessage(\u0026#34;greenkeeping.greenclosed\u0026#34;, new UserContext(\u0026#34;Bob\u0026#34;, Locale.US)); assert message.equals(\u0026#34;[User: Bob] Green closed due to mowing\u0026#34;); message = errorHandler.getErrorMessage(\u0026#34;greenkeeping.greenclosed\u0026#34;, new UserContext(\u0026#34;Herbert\u0026#34;, Locale.GERMANY)); assert message.equals(\u0026#34;[User: Herbert] Grün wegen \u0026#34; + \u0026#34;Pflegearbeiten gesperrt\u0026#34;); message = errorHandler.getErrorMessage(\u0026#34;tournament.fullybooked\u0026#34;, new UserContext(\u0026#34;Bob\u0026#34;, Locale.US)); assert message.equals(\u0026#34;[User: Bob] This tournament is fully booked\u0026#34;); Running on the Classpath At this point, the design supports cross-module look-ups of resource bundles when running the application on the module path. Can we also make it work when running the same modules on the classpath instead? Indeed we can, but some slight additions to the core module will be needed. 
The reason is that the ResourceBundleProvider service contract isn’t considered at all by the bundle retrieval logic in ResourceBundle when running on the classpath.\nThe way out is to provide a custom ResourceBundle.Control implementation which mimics the logic for adjusting the bundle names based on the requested locale variant, as done by the different providers above:\npublic class LinksMessagesControl extends Control { @Override public String toBundleName(String baseName, Locale locale) { if (locale.getVariant() != null) { baseName = baseName.replace(\u0026#34;core.LinksMessages\u0026#34;, locale.getVariant() + \u0026#34;.internal.LinksMessages\u0026#34;); (1) locale = new Locale(locale.getLanguage(), locale.getCountry()); (2) return super.toBundleName(baseName, locale); } return super.toBundleName(baseName, locale); } } 1 Adjust the requested bundle name so that the module-specific bundles are retrieved 2 Drop the variant name from the locale Now we could explicitly pass in an instance of that Control implementation when retrieving a resource bundle through ResourceBundle::getBundle(), but there’s a simpler solution in the form of the not very widely known ResourceBundleControlProvider API:\npublic class LinksMessagesControlProvider implements ResourceBundleControlProvider { @Override public Control getControl(String baseName) { if (baseName.equals(\u0026#34;dev.morling.links.core.LinksMessages\u0026#34;)) { (1) return new LinksMessagesControl(); } return null; } } 1 Return the LinksMessagesControl when the LinksMessages bundle is requested This is another service provider contract; its implementations are retrieved from the classpath when obtaining a resource bundle and no control has been given explicitly. 
Of course, the service implementation still needs to be registered, this time using the traditional approach of specifying the implementation name(s) in the META-INF/services/java.util.spi.ResourceBundleControlProvider file:\ndev.morling.links.core.internal.LinksMessagesControlProvider With the control and control provider in place, the modular resource bundle look-up will work on the module path as well as the classpath, when running on Java 9+. There’s one caveat remaining though if we want to enable the application also to be run on the classpath with Java 8.\nIn Java 8, ResourceBundleControlProvider implementations are not picked up from the classpath, but only via the Java extension mechanism (now deprecated). This means you’d have to provide the custom control provider through the lib/ext or jre/lib/ext directory of your JRE or JDK, respectively, which often isn’t very practical. At this point we might be ready to cave in and just pass in the custom control implementation to ResourceBundle::getBundle(). But we can’t actually do that: when invoked in a named module on Java 9+ (which is the case when running the application on the module path), the getBundle(String, Locale, Control) method will raise an UnsupportedOperationException!\nTo overcome this last obstacle and make the application usable across the different Java versions, we can resort to the multi-release JAR mechanism: two different versions of the ErrorHandler class can be provided within a single JAR, one to be used with Java 8, and another one to be used with Java 9 and later. The latter calls getBundle(String, Locale), i.e. not passing the control, thus using the resource bundle providers (when running on the module path) or the control provider (when running on the classpath). 
The former invokes getBundle(String, Locale, Control), allowing the custom control to be used on Java 8.\nBuilding Multi-Release JARs When multi-release JARs were first introduced in Java 9 with JEP 238, tool support for building them was non-existent, making this task quite a challenging one. Luckily, the situation has improved a lot since then. When using Apache Maven, only two plug-ins need to be configured:\n... \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-compiler-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; (1) \u0026lt;id\u0026gt;compile-java-9\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;compile\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;compile\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;release\u0026gt;9\u0026lt;/release\u0026gt; (2) \u0026lt;compileSourceRoots\u0026gt; \u0026lt;compileSourceRoot\u0026gt; ${project.basedir}/src/main/java-9 (3) \u0026lt;/compileSourceRoot\u0026gt; \u0026lt;/compileSourceRoots\u0026gt; \u0026lt;multiReleaseOutput\u0026gt;true\u0026lt;/multiReleaseOutput\u0026gt; (4) \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-jar-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;archive\u0026gt; \u0026lt;manifestEntries\u0026gt; \u0026lt;Multi-Release\u0026gt;true\u0026lt;/Multi-Release\u0026gt; (5) \u0026lt;/manifestEntries\u0026gt; \u0026lt;/archive\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; ... 
1 Set up another execution of the Maven compiler plug-in for the Java 9 specific sources, 2 using Java 9 bytecode level, 3 picking up the sources from src/main/java-9, 4 and organizing the compilation output in the multi-release structure under META-INF/versions/…​ 5 Configure the Maven JAR plug-in so that the Multi-Release manifest entry is set, marking the JAR as a multi-release JAR Discussion and Wrap-Up Let’s wrap up and evaluate whether the proposed implementation satisfies our original requirements:\nModules of the application contribute bundles with their specific error messages: ✅ Each module of the Links application can provide its own bundle(s), using a specific key prefix; we could even take it a step further and provide bundles via separate i18n modules, for instance created by an external translation agency, independent from the development teams\nCentral error handler component can use these bundles for displaying or logging the error messages: ✅ The error handler in the core module can retrieve messages from all the bundles in the different modules, freeing the developers of the application modules from details like adding the user’s name to the final messages\nNo knowledge about the specific modules in the central component: ✅ Thanks to the different providers (or the custom Control, respectively), there is no need for registering the specific bundles with the error handler in the core module; further modules could be added to the Links application and the error handler would be able to obtain messages from the resource bundles contributed by them\nWith a little bit of extra effort, it also was possible to design the code in the core module in a way that the application can be used with different Java versions and configurations: on the module path with Java 9+, on the classpath with Java 9+, on the classpath with Java 8.\nIf you’d like to explore the complete code by yourself, you can find it in the modular-resource-bundles GitHub repository. 
To learn more about resource bundle retrieval in named modules, please refer to the extensive documentation of ResourceBundle and ResourceBundleProvider.\nMany thanks to Hans-Peter Grahsl for providing feedback while writing this post!\n","id":49,"publicationdate":"Aug 29, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ResourceBundle.html\"\u003e\u003ccode\u003eResourceBundle\u003c/code\u003e\u003c/a\u003e class is Java’s workhorse for managing and retrieving locale specific resources,\nsuch as error messages of internationalized applications.\nWith the advent of the module system in Java 9, specifics around discovering and loading resource bundles have changed quite a bit, in particular when it comes to retrieving resource bundles across the boundaries of named modules.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this blog post I’d like to discuss how resource bundles can be used in a multi-module application\n(i.e. a \u0026#34;modular monolith\u0026#34;) for internationalizing error messages.\nThe following requirements should be satisfied:\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Resource Bundle Look-ups in Modular Java Applications","uri":"https://www.morling.dev/blog/resource-bundle-lookups-in-modular-java-applications/"},{"content":" Unit testing, for performance\nIt’s with great pleasure that I’m announcing the first official release of JfrUnit today!\nJfrUnit is an extension to JUnit which allows you to assert JDK Flight Recorder events in your unit tests. 
This capability opens up a number of interesting use cases in the field of testing JVM-based applications:\nYou can use JfrUnit to ensure your application produces the custom JFR events you expect it to emit\nYou can use JfrUnit to identify potential performance regressions of your application by means of tracking JFR events e.g. for garbage collection, memory allocation and network I/O\nYou can use JfrUnit together with JMC Agent for whitebox tests of your application, ensuring specific methods are invoked with the expected parameters and return values\nGetting Started With JfrUnit JfrUnit is available on Maven Central (a big shout-out to Andres Almiray for setting up a fully automated release pipeline using the excellent JReleaser project!). If you’re working with Apache Maven, add the following dependency to your pom.xml file:\n... \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.moditect.jfrunit\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jfrunit\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.0.0.Alpha1\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;test\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; ... Alternatively, you can of course build JfrUnit from source yourself, as described in the project’s README file.\nWhat is ModiTect? JfrUnit is part of the ModiTect family of open-source projects. All the ModiTect projects are in some way related to Java infrastructure, such as the Java Module System, or JDK Flight Recorder. Besides JfrUnit, the following projects are currently developed under the ModiTect umbrella:\nModiTect: this eponymous project provides tooling for the Java Module System, e.g. 
for adding module descriptors while building with Java 8, creating jlink images, etc.\nLayrry: a Runner and API for layered Java applications, which lets you use the module system’s notion of module layers for implementing plug-in architectures, loading multiple versions of one dependency into your application, etc.\nDeptective 🕵️: a plug-in for the javac compiler for analysing, validating and enforcing well-defined relationships between the packages of a Java application\nWith that dependency in place, the steps of using JfrUnit are the following:\nEnable the JFR event type(s) you want to assert against\nRun the application logic under test\nAssert the emitted JFR events\nTo make things more tangible, here’s an example that asserts the memory allocation done by a Quarkus-based web application for a specific use case:\n@Test @EnableEvent(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) (1) @EnableEvent(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;) public void retrieveTodoShouldYieldExpectedAllocation() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder() .build(); // warm-up (2) for (int i = 1; i\u0026lt;= WARMUP_ITERATIONS; i++) { if (i % 1000 == 0) { System.out.println(i); } executeRequest(r.nextInt(20) + 1, client); } jfrEvents.awaitEvents(); jfrEvents.reset(); (3) (4) for (int i = 1; i\u0026lt;= ITERATIONS; i++) { if (i % 1000 == 0) { System.out.println(i); } executeRequest(r.nextInt(20) + 1, client); } jfrEvents.awaitEvents(); (5) long sum = jfrEvents.filter(this::isObjectAllocationEvent) .filter(this::isRelevantThread) .mapToLong(this::getAllocationSize) .sum(); assertThat(sum / ITERATIONS).isLessThan(33_000); (6) } 1 Enable the jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB JFR event types; on Java 16 and beyond, you could also use the new jdk.ObjectAllocationSample type instead 2 Do some warm-up iterations so as to achieve a steady state for the memory allocation rate 3 Reset the JfrUnit event 
collector after the warm-up 4 Run the code under test, in this case invoking some REST API of the application 5 Wait until all the events from the test have been received 6 Run assertions against the JFR events, in this case summing up all memory allocations and asserting that the value per REST call isn’t larger than 33K (the exact threshold has been determined upfront) The general idea behind this testing approach is that a regression with regard to metrics like memory allocation or I/O — e.g. with a database — can be a hint for a performance degradation. Allocating more memory than anticipated may be an indicator that your application started to do something which it hadn’t done before, and which may impact its latency and throughput characteristics.\nTo learn more about this approach for identifying potential performance regressions, please refer to this post, which introduced JfrUnit originally.\nGroovier Tests With Spock Thanks to an outstanding contribution by Petr Hejl, instead of the Java-based API, you can also use Groovy and the Spock framework for your JfrUnit tests, which makes for very compact and nicely readable tests. 
Here’s an example for asserting two JFR events using the Spock integration:\nclass JfrSpec extends Specification { JfrEvents jfrEvents = new JfrEvents() @EnableEvent(\u0026#39;jdk.GarbageCollection\u0026#39;) (1) @EnableEvent(\u0026#39;jdk.ThreadSleep\u0026#39;) def \u0026#39;should Have GC And Sleep Events\u0026#39;() { when: (2) System.gc() sleep(1000) then: (3) jfrEvents[\u0026#39;jdk.GarbageCollection\u0026#39;] jfrEvents[\u0026#39;jdk.ThreadSleep\u0026#39;].withTime(Duration.ofMillis(1000)) } } 1 Enable the jdk.GarbageCollection and jdk.ThreadSleep event types 2 Run the test code 3 Assert the events; thanks to the integration with Spock, no explicit barrier for awaiting all events is needed To learn more about the Spock-based approach of using JfrUnit, please refer to the instructions in the README.\nFor getting started with JfrUnit yourself, you may take a look at the jfrunit-examples repo, which shows some common usages of the project.\nOutlook This first Alpha release is an important milestone for the JfrUnit project. Since its inception in December of last year, I’ve received tons of invaluable feedback, and the project has matured quite a bit.\nIn terms of next steps, apart from further expanding and honing the API, one area I’d like to explore with JfrUnit is keeping track of and analysing historical event data from multiple test runs over a longer period of time.\nFor instance, consider a case where your REST call allocates 33 KB today, 40 KB next month, 50 KB the month after, etc. Each increase by itself may not be problematic, but when comparing the results from today to those of a run in six months from now, a substantial regression may have accumulated. 
For identifying and analysing such trends, loading JfrUnit result data into a time series database, or repository systems like Hyperfoil Horreum, may be a very interesting feature.\nOn a related note, John O’Hara has started work towards automated event analysis using the rules system of JDK Mission Control, so stay tuned for some really exciting developments in this area!\nLast but not least, I’d like to say thank you to all the folks helping with the work on JfrUnit, be it through discussions, raising feature requests or bug reports, or code changes, including the following fine folks who have contributed to the JfrUnit repository at this point: Andres Almiray, Hash Zhang, Leonard Brünings, Manyanda Chitimbo, Matthias Andreas Benkard, Petr Hejl, Sam Brannen, Sullis, Thomas, Tivrfoa, and Tushar Badgu. Onwards and upwards!\n","id":50,"publicationdate":"Aug 4, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUnit testing, for performance\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt’s with great pleasure that I’m announcing the first official release of JfrUnit today!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://github.com/moditect/jfrunit\"\u003eJfrUnit\u003c/a\u003e is an extension to JUnit which allows you to assert \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e events in your unit tests.\nThis capability opens up a number of interesting use cases in the field of testing JVM-based applications:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eYou can use JfrUnit to ensure your application produces the \u003ca href=\"blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/\"\u003ecustom JFR events\u003c/a\u003e you expect it to 
emit\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eYou can use JfrUnit to identify potential performance regressions of your application by means of tracking JFR events e.g. for garbage collection, memory allocation and network I/O\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eYou can use JfrUnit together with \u003ca href=\"https://wiki.openjdk.java.net/display/jmc/The+JMC+Agent\"\u003eJMC Agent\u003c/a\u003e for whitebox tests of your application, ensuring specific methods are invoked with the expected parameters and return values\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e","tags":null,"title":"Introducing JfrUnit 1.0.0.Alpha1","uri":"https://www.morling.dev/blog/introducing-jfrunit-1-0-0-alpha1/"},{"content":" Over the course of the last few months, I’ve had the pleasure to serve on the Kafka Summit program committee and review several hundred session abstracts for the three Summits happening this year (Europe, APAC, Americas). That’s not only a big honour, but also a unique opportunity to learn what excites people currently in the Kafka eco-system (and yes, it’s a fair amount of work, too ;).\nWhile voting on the proposals, and also generally aspiring to stay informed of what’s going on in the Kafka community at large, I noticed a few repeating themes and topics which I thought would be interesting to share (without touching on any specific talks of course). At first I meant to put this out via a Twitter thread, but then it became a bit too long for that, so I decided to write this quick blog post instead. Here it goes!\nCambrian Explosion of Connectors Apache Kafka is a great commit log and streaming platform, but of course you also need to get data into and out of it. Kafka Connect is vital for doing just that, linking data sources and sinks to the Kafka backbone. Be it integration of legacy apps and databases, external systems (e.g. 
IoT), data lakes, or DWHs, different CDC options (including Debezium, of course) — There’s connectors for everything.\nThe ever-increasing number of connectors is accompanied by growing operational maturity (large-scale deployments, KC on K8s, etc.) and upcoming improvements like KIP-618 (exactly-once source connectors) or KIP-731 (rate limiting). There’s so much activity within the Kafka connector eco-system, and it really sets Kafka apart from alternatives.\nDemocratization of Data Pipelines Another exciting trend is a move to self-service Kafka environments, with portals and infrastructure aimed at reducing the friction for standing up new deployments of Kafka, Connect, and related components like schema registries, while keeping track of and running everything in a safe way, e.g. when it comes to things like access control, role and schema management, (topic) naming conventions, managing data lineage and quality, ensuring compliance, privacy and operational best-practices, or observability.\nA healthy combination of in-house as well as open-source developments is happening here, and I’m sure it’s a field where we’ll see more tools and solutions appearing in the next months and years.\nStream Processing for Everyone Not exactly a new trend, but definitely a growing one: more and more users appreciate the benefits of stream processing for working with their data in Kafka, filtering, transforming, enriching and aggregating it either programmatically using libraries such as Kafka Streams or Apache Flink, or in a declarative fashion, e.g. via ksqlDB or Flink SQL. 
Either way, small, focused stream processing apps are a true manifestation of the microservices idea — have cohesive, independent application units, each focusing on one particular task and loosely coupled to each other, via Apache Kafka in this case.\nIt’s great to see the uptake here, including approaches for dynamic scaling based on end-to-end lag, and innovative new solutions for efficient incremental view materialization.\nHonorable Mentions Besides these bigger trends, there’s also a few more specific topics which I saw several times and which I found very interesting:\nTools and best practices for testing of Kafka-based applications (e.g. for creating test data or mock producers/consumers)\nFeeding ML/AI models is becoming a popular Kafka use case; it’s not my field of experience at all, but it seems like a very logical choice to run ML algorithms on data ingested via Kafka, allowing to gain new insight into business data with a low latency\nPushing data to consumers via GraphQL; (still?) even more niche probably, but I love the idea of push updates to browsers based on data from Kafka; this should allow for some interesting use cases\nOf course there’s also things like geo-replicated Kafka, the ongoing move towards managed Kafka service offerings (which raises interesting questions around connectivity to on-prem systems and data), architectural trends like data meshes, and so much more.\nIf you want to learn more about these and many other facets of Apache Kafka, its use cases, best practices, and latest developments, make sure to register for Kafka Summit (it’s free and online). 
The sessions from the Europe run can already be watched, while the APAC (July 27 - 28) and Americas (September 14 - 15) editions are still to come.\n","id":51,"publicationdate":"May 28, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOver the course of the last few months, I’ve had the pleasure to serve on the \u003ca href=\"https://www.kafka-summit.org/\"\u003eKafka Summit\u003c/a\u003e program committee and review several hundred session abstracts for the three Summits happening this year (Europe, APAC, Americas).\nThat’s not only a big honour, but also a unique opportunity to learn what excites people currently in the Kafka eco-system\n(and yes, it’s a fair amount of work, too ;).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhile voting on the proposals, and also generally aspiring to stay informed of what’s going on in the Kafka community at large, I noticed a few repeating themes and topics which I thought would be interesting to share\n(without touching on any specific talks of course).\nAt first I meant to put this out via a Twitter thread, but then it became a bit too long for that, so I decided to write this quick blog post instead.\nHere it goes!\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Three Plus Some Lovely Kafka Trends","uri":"https://www.morling.dev/blog/three-plus-some-lovely-kafka-trends/"},{"content":" Sometimes, less is more. One case where that’s certainly true is dependencies. And so it shouldn’t come as a surprise that the Apache Kafka community is eagerly awaiting the removal of the dependency on the ZooKeeper service, which currently is used for storing Kafka metadata (e.g.
about topics and partitions) as well as for the purposes of leader election in the cluster.\nThe Kafka improvement proposal KIP-500 (\u0026#34;Replace ZooKeeper with a Self-Managed Metadata Quorum\u0026#34;) promises to make life better for users in many regards:\nBetter getting started and operational experience by requiring to run only one system, Kafka, instead of two\nRemoving potential for discrepancies of metadata state between ZooKeeper and the Kafka controller\nSimplifying configuration, for instance when it comes to security\nBetter scalability, e.g. in terms of number of partitions; faster execution of operations like topic creation\nWith KIP-500, Kafka itself will store all the required metadata in an internal Kafka topic, and controller election will be done amongst (a subset of) the Kafka cluster nodes themselves, based on a variant of the Raft protocol for distributed consensus. Removing the ZooKeeper dependency is great not only for running Kafka clusters in production; for local development and testing, too, being able to start up a Kafka node with a single process comes in very handy.\nHaving been in the works for multiple years, ZK-less Kafka, also known as KRaft (\u0026#34;Kafka Raft metadata mode\u0026#34;), was recently published as an early access feature with Kafka 2.8. So it was the perfect time to get my hands on this and get a first feeling for ZK-less Kafka myself. Note this post isn’t meant to be a thorough evaluation or systematic testing of the new Kafka deployment mode; rather, take it as a description of how to get started with playing with ZK-less Kafka and of a few observations I made while doing so.\nIn the world of ZK-less Kafka, there are two node roles: controller and broker. Each node in the cluster can have either one or both roles (\u0026#34;combined nodes\u0026#34;).
All controller nodes elect the active controller, which is in charge of coordinating the whole cluster, with other controller nodes acting as hot stand-by replicas. In the KRaft KIPs, the active controller sometimes also is simply referred to as leader. This may appear confusing at first, if you are familiar with the existing concept of partition leaders. It started to make sense to me once I realized that the active controller is the leader of the sole partition of the metadata topic. All broker nodes are handling client requests, just as before with ZooKeeper.\nWhile for smaller clusters it is expected that the majority of, or even all cluster nodes act as controllers, you may have dedicated controller-only nodes in larger clusters, e.g. 3 controller nodes and 7 broker nodes in a cluster of 10 nodes overall. As per the KRaft README, having dedicated controller nodes should increase overall stability, as for instance an out-of-memory error on a broker wouldn’t impact controllers, or potentially even cause a leader re-election.\nTrying ZK-less Kafka Yourself As a foundation, I’ve created a variant of the Debezium 1.6 container image, which updates Kafka from 2.7 to Kafka 2.8, and also does the required changes to the entrypoint script for using the KRaft mode. Note this change hasn’t been merged yet to the upstream Debezium repository, so if you’d like to try out things by yourself, you’ll have to clone my repo, and then build the container image yourself like this:\n$ git clone git@github.com:gunnarmorling/docker-images.git $ cd docker-images/kafka/1.6 $ docker build -t debezium/zkless-kafka:1.6 --build-arg DEBEZIUM_VERSION=1.6.0 . In order to start the image with Kafka in KRaft mode, the CLUSTER_ID environment variable must be set. A value can be obtained using the new bin/kafka-storage.sh script; going forward, we’ll likely add an option to the Debezium Kafka container image for doing so. 
If that variable is set, the entrypoint script of the image does the following things:\nUse config/kraft/server.properties instead of config/server.properties as the Kafka configuration file; this one comes with the Kafka distribution and is meant for nodes which should have both the controller and broker roles; i.e. the container image currently only supports combined nodes\nFormat the node’s storage directory, if not the case yet\nSet up a listener for controller communication\nBased on that, here is what’s needed in a Docker Compose file for spinning up a Kafka cluster with three nodes:\nversion: \u0026#39;2\u0026#39; services: kafka-1: image: debezium/zkless-kafka:1.6 ports: - 19092:9092 - 19093:9093 environment: - CLUSTER_ID=oh-sxaDRTcyAr6pFRbXyzA (1) - BROKER_ID=1 (2) - KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093 (3) kafka-2: image: debezium/zkless-kafka:1.6 ports: - 29092:9092 - 29093:9093 environment: - CLUSTER_ID=oh-sxaDRTcyAr6pFRbXyzA (1) - BROKER_ID=2 (2) - KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093 (3) kafka-3: image: debezium/zkless-kafka:1.6 ports: - 39092:9092 - 39093:9093 environment: - CLUSTER_ID=oh-sxaDRTcyAr6pFRbXyzA (1) - BROKER_ID=3 (2) - KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093 (3) 1 Cluster id; must be the same for all the nodes 2 Broker id; must be unique for each node 3 Addresses of all the controller nodes in the format id1@host1:port1,id2@host2:port2…​ No ZooKeeper nodes, yeah :)\nWorking on Debezium, and being a Kafka Connect aficionado allaround, I’m also going to add Connect and a Postgres database for testing purposes (you can find the complete Compose file here):\nversion: \u0026#39;2\u0026#39; services: # ... 
connect: image: debezium/connect:1.6 ports: - 8083:8083 links: - kafka-1 - kafka-2 - kafka-3 - postgres environment: - BOOTSTRAP_SERVERS=kafka-1:9092 - GROUP_ID=1 - CONFIG_STORAGE_TOPIC=my_connect_configs - OFFSET_STORAGE_TOPIC=my_connect_offsets - STATUS_STORAGE_TOPIC=my_connect_statuses postgres: image: debezium/example-postgres:1.6 ports: - 5432:5432 environment: - POSTGRES_USER=postgres - POSTGRES_PASSWORD=postgres Now let’s start everything:\n$ docker-compose -f docker-compose-zkless-kafka.yaml up Let’s also register an instance of the Debezium Postgres connector, which will connect to the PG database and take an initial snapshot, so we got some topics with a few messages to play with:\n$ curl -0 -v -X POST http://localhost:8083/connectors \\ -H \u0026#34;Expect:\u0026#34; \\ -H \u0026#39;Content-Type: application/json; charset=utf-8\u0026#39; \\ --data-binary @- \u0026lt;\u0026lt; EOF { \u0026#34;name\u0026#34;: \u0026#34;inventory-connector\u0026#34;, \u0026#34;config\u0026#34;: { \u0026#34;connector.class\u0026#34;: \u0026#34;io.debezium.connector.postgresql.PostgresConnector\u0026#34;, \u0026#34;tasks.max\u0026#34;: \u0026#34;1\u0026#34;, \u0026#34;database.hostname\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.port\u0026#34;: \u0026#34;5432\u0026#34;, \u0026#34;database.user\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.password\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.dbname\u0026#34; : \u0026#34;postgres\u0026#34;, \u0026#34;database.server.name\u0026#34;: \u0026#34;dbserver1\u0026#34;, \u0026#34;schema.include\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;topic.creation.default.replication.factor\u0026#34;: 2, \u0026#34;topic.creation.default.partitions\u0026#34;: 10 } } EOF Note how this is using a replication factor of 2 for all the topics created via Kafka Connect, which will come in handy for some experimenting later on.\nThe nosy person I am, I first wanted to take a look into that new internal 
metadata topic, where all the cluster metadata is stored. As per the release announcement, it should be named @metadata. But no such topic shows up when listing the available topics; only the __consumer_offsets topic, the change data topics created by Debezium, and some Kafka Connect specific topics are shown:\n# Get a shell on one of the broker containers $ docker-compose -f docker-compose-zkless-kafka.yaml exec kafka-1 bash # In that shell $ /kafka/bin/kafka-topics.sh --bootstrap-server kafka-3:9092 --list __consumer_offsets dbserver1.inventory.customers dbserver1.inventory.geom dbserver1.inventory.orders dbserver1.inventory.products dbserver1.inventory.products_on_hand dbserver1.inventory.spatial_ref_sys my_connect_configs my_connect_offsets my_connect_statuses Seems that this topic is truly meant to be internal; also trying to consume messages from the topic with kafka-console-consumer.sh or kafkacat fails due to the invalid topic name. Let’s see whether things are going to change here, since KIP-595 (\u0026#34;A Raft Protocol for the Metadata Quorum\u0026#34;) explicitly mentions the ability for consumers to \u0026#34;read the contents of the metadata log for debugging purposes\u0026#34;.\nIn the meantime, we can take a look at the contents of the metadata topic using the kafka-dump-log.sh utility, e.g. 
filtering out all RegisterBroker records:\n$ /kafka/bin/kafka-dump-log.sh --cluster-metadata-decoder \\ --skip-record-metadata \\ --files /kafka/data//\\@metadata-0/*.log | grep REGISTER_BROKER payload: {\u0026#34;type\u0026#34;:\u0026#34;REGISTER_BROKER_RECORD\u0026#34;,\u0026#34;version\u0026#34;:0,\u0026#34;data\u0026#34;:{\u0026#34;brokerId\u0026#34;:3,\u0026#34;incarnationId\u0026#34;:\u0026#34;O_PiUrjNTsqVEQv61gB2Vg\u0026#34;,\u0026#34;brokerEpoch\u0026#34;:0,\u0026#34;endPoints\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;PLAINTEXT\u0026#34;,\u0026#34;host\u0026#34;:\u0026#34;172.18.0.2\u0026#34;,\u0026#34;port\u0026#34;:9092,\u0026#34;securityProtocol\u0026#34;:0}],\u0026#34;features\u0026#34;:[],\u0026#34;rack\u0026#34;:null}} payload: {\u0026#34;type\u0026#34;:\u0026#34;REGISTER_BROKER_RECORD\u0026#34;,\u0026#34;version\u0026#34;:0,\u0026#34;data\u0026#34;:{\u0026#34;brokerId\u0026#34;:1,\u0026#34;incarnationId\u0026#34;:\u0026#34;FbOZdz9rSZqTyuSKr12JWg\u0026#34;,\u0026#34;brokerEpoch\u0026#34;:2,\u0026#34;endPoints\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;PLAINTEXT\u0026#34;,\u0026#34;host\u0026#34;:\u0026#34;172.18.0.3\u0026#34;,\u0026#34;port\u0026#34;:9092,\u0026#34;securityProtocol\u0026#34;:0}],\u0026#34;features\u0026#34;:[],\u0026#34;rack\u0026#34;:null}} payload: {\u0026#34;type\u0026#34;:\u0026#34;REGISTER_BROKER_RECORD\u0026#34;,\u0026#34;version\u0026#34;:0,\u0026#34;data\u0026#34;:{\u0026#34;brokerId\u0026#34;:2,\u0026#34;incarnationId\u0026#34;:\u0026#34;ZF_WQqk_T5q3l1vhiWT_FA\u0026#34;,\u0026#34;brokerEpoch\u0026#34;:4,\u0026#34;endPoints\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;PLAINTEXT\u0026#34;,\u0026#34;host\u0026#34;:\u0026#34;172.18.0.4\u0026#34;,\u0026#34;port\u0026#34;:9092,\u0026#34;securityProtocol\u0026#34;:0}],\u0026#34;features\u0026#34;:[],\u0026#34;rack\u0026#34;:null}} ... 
The individual record formats are described in KIP-631 (\u0026#34;The Quorum-based Kafka Controller\u0026#34;).\nAnother approach would be to use a brand-new tool, kafka-metadata-shell.sh. Also defined in KIP-631, this utility script allows to browse a cluster’s metadata, similarly to zookeeper-shell.sh known from earlier releases. For instance, you can list all brokers and get the metadata of the registration of node 1 like this:\n$ /kafka/bin/kafka-metadata-shell.sh --snapshot /kafka/data/@metadata-0/00000000000000000000.log Loading... Starting... [ Kafka Metadata Shell ] \u0026gt;\u0026gt; ls brokers configs local metadataQuorum topicIds topics \u0026gt;\u0026gt; ls brokers 1 2 3 \u0026gt;\u0026gt; cd brokers/1 \u0026gt;\u0026gt; cat registration RegisterBrokerRecord(brokerId=1, incarnationId=TmM_u-_cQ2ChbUy9NZ9wuA, brokerEpoch=265, endPoints=[BrokerEndpoint(name=\u0026#39;PLAINTEXT\u0026#39;, host=\u0026#39;172.18.0.3\u0026#39;, port=9092, securityProtocol=0)], features=[], rack=null) \u0026gt;\u0026gt; Or to display the current leader:\n\u0026gt;\u0026gt; cat /metadataQuorum/leader MetaLogLeader(nodeId=1, epoch=12) Or to show the metadata of a specific topic partition:\n\u0026gt;\u0026gt; cat /topics/dbserver1.inventory.customers/0/data { \u0026#34;partitionId\u0026#34; : 0, \u0026#34;topicId\u0026#34; : \u0026#34;8xjqykVRT_WpkqbXHwbeCA\u0026#34;, \u0026#34;replicas\u0026#34; : [ 2, 3 ], \u0026#34;isr\u0026#34; : [ 2, 3 ], \u0026#34;removingReplicas\u0026#34; : null, \u0026#34;addingReplicas\u0026#34; : null, \u0026#34;leader\u0026#34; : 2, \u0026#34;leaderEpoch\u0026#34; : 0, \u0026#34;partitionEpoch\u0026#34; : 0 } \u0026gt;\u0026gt; Those are just a few of the things you can do with kafka-metadata-shell.sh, and it surely will be an invaluable tool in the box of administrators in the ZK-less era. Another new tool is kafka-cluster.sh, which currently can do two things: displaying the unique id of a cluster, and unregistering a broker. 
While the former worked for me:\n$ /kafka/bin/kafka-cluster.sh cluster-id --bootstrap-server kafka-1:9092 Cluster ID: oh-sxaDRTcyAr6pFRbXyzA The latter always failed with a NotControllerException, no matter on which node I invoked the command:\n$ /kafka/bin/kafka-cluster.sh unregister --bootstrap-server kafka-1:9092 --id 3 [2021-05-15 20:52:54,626] ERROR [AdminClient clientId=adminclient-1] Unregister broker request for broker ID 3 failed: This is not the correct controller for this cluster. It’s not quite clear to me whether I did something wrong, or whether this functionality simply should not be expected to be supported just yet.\nThe Raft-based metadata quorum also comes with a set of new metrics (described in KIP-595), which allow retrieving information like the current active controller, role of the node at hand, and more. Here’s a screenshot of the metrics invoked on a non-leader node:\nTaking Brokers Down An essential aspect of any distributed system like Kafka is the fact that individual nodes of a cluster can disappear at any time, be it due to failures (node crashes, network splits, etc.), or due to controlled shutdowns, e.g. for a version upgrade. So I was curious how Kafka in KRaft mode would deal with the situation where nodes in the cluster are stopped and then restarted. Note I’m stopping nodes gracefully via docker-compose stop, instead of randomly crashing them, Jepsen-style ;)\nThe sequence of events I was testing was the following:\nStop the current active controller, so two nodes from the original three-node cluster remain\nStop the then new active controller node, at which point the majority of cluster nodes isn’t available any longer\nStart both nodes again\nHere’s a few noteworthy things I observed.
As you’d expect, when stopping the active controller, a new leader was elected (as per the result of cat /metadataQuorum/leader in the Kafka metadata shell), and also all partitions which had the previous active controller as partition leader, got re-assigned (in this case node 1 was the active controller and got stopped):\n$ /kafka/bin/kafka-topics.sh --bootstrap-server kafka-2:9092 --describe --topic dbserver1.inventory.customers Topic: dbserver1.inventory.customers\tTopicId: a6qzjnQwQ2eLNSXL5svW8g\tPartitionCount: 10\tReplicationFactor: 2\tConfigs: segment.bytes=1073741824 Topic: dbserver1.inventory.customers\tPartition: 0\tLeader: 1\tReplicas: 1,3\tIsr: 1,3 Topic: dbserver1.inventory.customers\tPartition: 1\tLeader: 1\tReplicas: 3,1\tIsr: 1,3 Topic: dbserver1.inventory.customers\tPartition: 2\tLeader: 1\tReplicas: 1,2\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 3\tLeader: 1\tReplicas: 2,1\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 4\tLeader: 1\tReplicas: 2,1\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 5\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 6\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 7\tLeader: 2\tReplicas: 2,3\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 8\tLeader: 1\tReplicas: 2,1\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 9\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 # After stopping node 1 $ /kafka/bin/kafka-topics.sh --bootstrap-server kafka-2:9092 --describe --topic dbserver1.inventory.customers Topic: dbserver1.inventory.customers\tTopicId: a6qzjnQwQ2eLNSXL5svW8g\tPartitionCount: 10\tReplicationFactor: 2\tConfigs: segment.bytes=1073741824 Topic: dbserver1.inventory.customers\tPartition: 0\tLeader: 3\tReplicas: 1,3\tIsr: 3 Topic: dbserver1.inventory.customers\tPartition: 1\tLeader: 3\tReplicas: 3,1\tIsr: 3 Topic: dbserver1.inventory.customers\tPartition: 2\tLeader: 2\tReplicas: 1,2\tIsr: 2 Topic: 
dbserver1.inventory.customers\tPartition: 3\tLeader: 2\tReplicas: 2,1\tIsr: 2 Topic: dbserver1.inventory.customers\tPartition: 4\tLeader: 2\tReplicas: 2,1\tIsr: 2 Topic: dbserver1.inventory.customers\tPartition: 5\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 6\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 7\tLeader: 2\tReplicas: 2,3\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 8\tLeader: 2\tReplicas: 2,1\tIsr: 2 Topic: dbserver1.inventory.customers\tPartition: 9\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Things got interesting though when also stopping the newly elected leader subsequently. At this point, the cluster isn’t in a healthy state any longer, as no majority of nodes of the cluster is available for leader election. Logs of the remaining node are flooded with an UnknownHostException in this situation:\nkafka-3_1 | 2021-05-16 10:16:45,282 - WARN [kafka-raft-outbound-request-thread:NetworkClient@992] - [RaftManager nodeId=3] Error connecting to node kafka-2:9093 (id: 2 rack: null) kafka-3_1 | java.net.UnknownHostException: kafka-2 kafka-3_1 | at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797) kafka-3_1 | at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505) kafka-3_1 | at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364) kafka-3_1 | at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298) kafka-3_1 | at org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27) kafka-3_1 | at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:111) kafka-3_1 | at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:512) kafka-3_1 | at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:466) kafka-3_1 | at 
org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:172) kafka-3_1 | at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:985) kafka-3_1 | at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311) kafka-3_1 | at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:103) kafka-3_1 | at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99) kafka-3_1 | at scala.collection.Iterator.foreach(Iterator.scala:943) kafka-3_1 | at scala.collection.Iterator.foreach$(Iterator.scala:943) kafka-3_1 | at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) kafka-3_1 | at scala.collection.IterableLike.foreach(IterableLike.scala:74) kafka-3_1 | at scala.collection.IterableLike.foreach$(IterableLike.scala:73) kafka-3_1 | at scala.collection.AbstractIterable.foreach(Iterable.scala:56) kafka-3_1 | at kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99) kafka-3_1 | at kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73) kafka-3_1 | at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:94) kafka-3_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) Here I think it’d be great to get a more explicit indication in the logs of what’s going on, clearly indicating the unhealthy status of the cluster at large.\nWhat’s also interesting is that the remaining node claims to be a leader as per its exposed metrics and value of /metadataQuorum/leader in the metadata shell. This seems a bit dubious, as no leader election can happen without the majority of nodes available. 
Consequently, creation of a topic in this state also times out, so I suspect this is more an artifact of how the cluster state is displayed than of what’s actually going on.\nThings get a bit more troublesome when restarting the two stopped nodes; very often I’d then see very high CPU consumption on the Kafka nodes as well as the Connect node:\n$ docker stats CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS 642eb697fed6 tutorial_connect_1 122.04% 668.3MiB / 7.775GiB 8.39% 99.7MB / 46.9MB 131kB / 106kB 47 5d9806526f92 tutorial_kafka-1_1 9.24% 386.4MiB / 7.775GiB 4.85% 105kB / 104kB 0B / 877kB 93 767e6c0f6cd3 tutorial_kafka-3_1 176.40% 739.2MiB / 7.775GiB 9.28% 14.5MB / 40.6MB 0B / 1.52MB 120 a0ce8438557f tutorial_kafka-2_1 87.51% 567.8MiB / 7.775GiB 7.13% 6.52MB / 24.9MB 0B / 881kB 95 df978d220132 tutorial_postgres_1 0.00% 36.39MiB / 7.775GiB 0.46% 243kB / 5.49MB 0B / 79.4MB 9 In some cases stopping and restarting the Kafka nodes would help, other times only a restart of the Connect node would mitigate the situation. I didn’t further explore this issue by taking a thread dump, but I suppose threads are stuck in some kind of busy spin loop at this point. The early access state of KRaft mode seems to be somewhat showing here. After bringing up the issue on the Kafka mailing list, I’ve logged KAFKA-12801 for this problem, as it seems not to have been tracked before.\nOn the bright side, once all brokers were up and running again, the cluster and the Debezium connector would happily continue their work.\nWrap-Up Not many features have been awaited by the Kafka community as eagerly as the removal of the ZooKeeper dependency. Rightly so: Kafka-based metadata storage and leader election will greatly reduce the operational burden of running Kafka and also allow for better scalability.
Lifting the requirement for running separate ZooKeeper processes or even machines should also help to make things more cost-effective, so you should benefit from this change no matter whether you’re running Kafka yourself or are using a managed service offering.\nThe early access release of ZK-less Kafka in version 2.8 gives a first impression of what will hopefully be the standard way of running Kafka in the not too distant future. As very clearly stated in the KRaft README, you should not use this in production yet; this matches the observations made above: while running Kafka without ZooKeeper definitely feels great, there are still some rough edges to be sorted out. Also check out the README for a list of currently missing features, such as support of transactions, adding partitions to existing topics, partition reassignment, and more. Lastly, any distributed system should only be fully trusted after going through the grinder of the Jepsen test suite, which I’m sure will only be a question of time.\nDespite the early state, I would very much recommend starting to test ZK-less Kafka at this point, so as to get a feeling for it and of course to report back any findings and insights.
To do so, either download the upstream Kafka distribution, or build the Debezium 1.6 container image for Kafka with preliminary support for KRaft mode, which lets you set up a ZK-less Kafka cluster in no time.\nIn order to learn more about ZK-less Kafka, besides diving into the relevant KIPs (which all are linked from the umbrella KIP-500), also check out the QCon talk \u0026#34;Kafka Needs No Keeper\u0026#34; by Colin McCabe, one of the main engineers driving this effort.\n","id":52,"publicationdate":"May 17, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSometimes, less is more.\nOne case where that’s certainly true is dependencies.\nAnd so it shouldn’t come as a surprise that the \u003ca href=\"https://kafka.apache.org/\"\u003eApache Kafka\u003c/a\u003e community is eagerly awaiting the removal of the dependency on the \u003ca href=\"https://zookeeper.apache.org/\"\u003eZooKeeper\u003c/a\u003e service,\nwhich currently is used for storing Kafka metadata (e.g.
about topics and partitions) as well as for the purposes of leader election in the cluster.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe Kafka improvement proposal \u003ca href=\"https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum\"\u003eKIP-500\u003c/a\u003e\n(\u0026#34;Replace ZooKeeper with a Self-Managed Metadata Quorum\u0026#34;)\npromises to make life better for users in many regards:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eBetter getting started and operational experience by requiring to run only one system, Kafka, instead of two\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eRemoving potential for discrepancies of metadata state between ZooKeeper and the Kafka controller\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eSimplifying configuration, for instance when it comes to security\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eBetter scalability, e.g. in terms of number of partitions; faster execution of operations like topic creation\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e","tags":null,"title":"Exploring ZooKeeper-less Kafka","uri":"https://www.morling.dev/blog/exploring-zookeeper-less-kafka/"},{"content":" One of the ultimate strengths of Java is its strong notion of backwards compatibility: Java applications and libraries built many years ago oftentimes run without problems on current JVMs, and the compiler of current JDKs can produce byte code, that is executable with earlier Java versions.\nFor instance, JDK 16 supports byte code levels going back as far as to Java 1.7; But: hic sunt dracones. The emitted byte code level is just one part of the story. It’s equally important to consider which APIs of the JDK are used by the compiled code, and whether they are available in the targeted Java runtime version. 
As an example, let’s consider this simple \u0026#34;Hello World\u0026#34; program:\npackage com.example; import java.util.List; public class HelloWorld { public static void main(String... args) { System.out.println(List.of(\u0026#34;Hello\u0026#34;, \u0026#34;World!\u0026#34;)); } } Let’s assume we’re using Java 16 for compiling this code, aiming for compatibility with Java 1.8. Historically, the Java compiler has provided the --source and --target options for this purpose, which are well known to most Java developers:\n$ javac --source 1.8 --target 1.8 -d classes HelloWorld.java warning: [options] bootstrap class path not set in conjunction with -source 8 1 warning This compiles successfully (we’ll come back to that warning in a bit). But if you actually try to run that class on Java 8, you’re in for a bad surprise:\n$ java -classpath classes com.example.HelloWorld Exception in thread \u0026#34;main\u0026#34; java.lang.NoSuchMethodError: ↩ java.util.List.of(Ljava/lang/Object;Ljava/lang/Object;)Ljava/util/List; at com.example.HelloWorld.main(HelloWorld.java:7) This makes sense: the List.of() methods were only introduced in Java 9, so they are not present in the Java 8 API. Shouldn’t the compiler have let us know about this? Absolutely, and that’s where this warning about the bootstrap class path comes in: the compiler recognized our potentially dangerous endeavour and essentially suggested compiling against the class library matching the targeted Java version instead of that of the JDK used for compilation.
This is done using the -Xbootclasspath option:\n$ javac --source 1.8 --target 1.8 \\ -d classes \\ -Xbootclasspath:${JAVA_8_HOME}/jre/lib/rt.jar \\ (1) HelloWorld.java HelloWorld.java:7: error: cannot find symbol System.out.println(List.of(\u0026#34;Hello\u0026#34;, \u0026#34;World!\u0026#34;)); ^ symbol: method of(String,String) location: interface List 1 error 1 Path to the rt.jar of Java 8 That’s much better: now the invocation of the List.of() method causes compilation to fail, instead of finding out about this problem only during testing, or worse, in production.\nWhile this approach works, it’s not without issues: requiring the target Java version’s class library complicates things quite a bit; multiple Java versions need to be installed, and the targeted JDK’s location must be known, which for instance tends to make build processes not portable between different machines and platforms.\nLuckily, Java 9 improved things significantly here; by means of the new --release option, code can be compiled for older Java versions in a fully safe and portable way. Let’s give this a try:\n$ javac --release 8 -d classes HelloWorld.java HelloWorld.java:7: error: cannot find symbol System.out.println(List.of(\u0026#34;Hello\u0026#34;, \u0026#34;World!\u0026#34;)); ^ symbol: method of(String,String) location: interface List 1 error Very nice, the same compilation error as before, but without the need for any complex configuration besides the --release 8 option. So how does this work? Does the JDK come with full class libraries of all the earlier Java versions which it supports? 
Considering that the modules file of Java 16 has a size of more than one hundred megabytes (to be precise, 118 MB on macOS), that’d clearly not be a good idea; we’d end up with a JDK size of nearly one gigabyte.\nWhat’s happening instead is that the JDK ships \u0026#34;stripped-down class files corresponding to class files from the target platform versions\u0026#34;, as we can read in JEP 247 (\u0026#34;Compile for Older Platform Versions\u0026#34;), which introduced the --release option. Details about the implementation are sparse, though. The JEP only mentions a ZIP file named ct.sym which contains those signature files. So I started by taking a look at what’s in there:\n$ unzip -l $JAVA_HOME/lib/ct.sym Archive: /Library/Java/JavaVirtualMachines/jdk-16.sdk/Contents/Home/lib/ct.sym Length Date Time Name --------- ---------- ----- ---- 0 03-26-2021 18:11 7/java.base/java/awt/peer/ 2557 03-26-2021 18:11 7/java.base/java/awt/peer/ComponentPeer.sig 542 03-26-2021 18:11 7/java.base/java/awt/peer/FramePeer.sig ... 856 03-26-2021 18:11 879A/java.activation/javax/activation/ActivationDataFlavor.sig 491 03-26-2021 18:11 879A/java.activation/javax/activation/CommandInfo.sig 299 03-26-2021 18:11 879A/java.activation/javax/activation/CommandObject.sig ... 1566 03-26-2021 18:11 9ABCDE/java.base/java/lang/Byte.sig 1616 03-26-2021 18:11 9ABCDE/java.base/java/lang/Short.sig ... That’s interesting: lots of *.sig files, organized in a directory structure that looks odd at first. So let’s see what’s there for the java.util.List class:\n$ unzip -l $JAVA_HOME/lib/ct.sym | grep \u0026#34;java/util/List.sig\u0026#34; 1481 03-26-2021 18:11 7/java.base/java/util/List.sig 1771 03-26-2021 18:11 8/java.base/java/util/List.sig 4040 03-26-2021 18:11 9/java.base/java/util/List.sig 4184 03-26-2021 18:11 A/java.base/java/util/List.sig 4097 03-26-2021 18:11 BCDEF/java.base/java/util/List.sig Five different versions altogether, under the directories 7, 8, 9, A, and BCDEF. 
It took a few moments until the structure began to make sense to me: the top-level directory names encode Java version(s), and there’s a new version of the signature file whenever its API changed. I.e. java.util.List changed in Java 7, 8, 9, 10 (A), and 11 (B), and has remained stable since then, i.e. from version 11 to 16, there have been no changes to the public List API.\nSo let’s dive in a bit further and compare the signature files of Java 8 and 9. As JEP 247 states that these files are (stripped-down) class files, we should be able to examine them using javap. In order to do so, I had to change the file extensions from *.sig to *.class, though. After that, I could decompile the files using javap, save the result in text files and compare them using git:\n$ javap List8.class \u0026gt; List8.txt $ javap List9.class \u0026gt; List9.txt $ git diff --no-index List8.txt List9.txt diff --git a/List8.txt b/List9.txt index b2ca320..b276286 100644 --- a/List8.txt +++ b/List9.txt @@ -27,4 +27,16 @@ public interface java.util.List\u0026lt;E\u0026gt; extends java.util.Collection\u0026lt;E\u0026gt; { public abstract java.util.ListIterator\u0026lt;E\u0026gt; listIterator(int); public abstract java.util.List\u0026lt;E\u0026gt; subList(int, int); public default java.util.Spliterator\u0026lt;E\u0026gt; spliterator(); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, 
E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E...); } As expected, the diff between the two signature files reveals the addition of the different List.of() methods in Java 9, as such exactly the reason why the Hello World example from the beginning cannot be executed on Java 8.\nDebugging the Java Compiler In order to understand in detail how the ct.sym file is used by the Java compiler, it can be useful to run javac in debug mode. As javac is written in Java itself, this can be done exactly the same way as when remote debugging any other Java application. You only need to start javac using the usual debug switches, which must be prepended with -J in this case:\n$ javac -J-Xdebug \\ -J-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 \\ HelloWorld.java Make sure to download the right version of the OpenJDK source code and set it up in your IDE, so that you also can step through internal classes whose source code isn’t distributed with binary builds. An interesting starting point for your explorations could be the JDKPlatformProvider class.\nTo double-check, you could also confirm with the API diffs provided by the Java Version Almanac or the Adopt OpenJDK JDK API diff generator. 
While doing so, one more thing piqued my curiosity: these reports don’t show any changes to java.util.List in Java 11, whereas ct.sym contains a new version of the corresponding signature file. To find out what’s going on, javap, this time at a more detailed level, came in handy again:\n$ javap -p -c -s -v -l List10.class \u0026gt; List10.txt $ javap -p -c -s -v -l List11.class \u0026gt; List11.txt $ git diff --no-index -w List10.txt List11.txt ... - #96 = Utf8 RuntimeInvisibleAnnotations - #97 = Utf8 Ljdk/Profile+Annotation; - #98 = Utf8 value - #99 = Integer 1 { public abstract int size(); descriptor: ()I @@ -308,8 +304,3 @@ Constant pool: Signature: #87 // \u0026lt;E:Ljava/lang/Object;\u0026gt;(Ljava/util/Collection\u0026lt;+TE;\u0026gt;;)Ljava/util/List\u0026lt;TE;\u0026gt;; } Signature: #95 // \u0026lt;E:Ljava/lang/Object;\u0026gt;Ljava/lang/Object;Ljava/util/Collection\u0026lt;TE;\u0026gt;; -RuntimeInvisibleAnnotations: - 0: #97(#98=I#99) - jdk.Profile+Annotation( - value=1 - ) An annotation with the interesting name @jdk.Profile+Annotation(1) got removed. Now, if you look at the List.java source file in Java 10, you won’t find this annotation anywhere. In fact, this annotation type doesn’t exist at all. By grepping through the OpenJDK source code for ct.sym, I learned that it is a synthetic annotation which gets added during the process of creating the signature files, denoting which compact profile a class belongs to.\nCompact Profiles Compact Profiles are a notion in Java 8 which defines three specific sub-sets of the Java platform: compact1, compact2, and compact3. Each profile contains a fixed set of JDK packages, and the profiles build upon each other, allowing for more size-efficient deployments to constrained devices, if such a profile is sufficient for a given application. 
With Java 9, the module system, and the ability to create custom runtime images on a much more granular level (using jlink), compact profiles became pretty much obsolete.\nSo that’s another purpose of the ct.sym file: it allows the compiler to ensure compatibility with a chosen compact profile. In current JDKs, javac still supports the -profile option, but only when compiling for Java 8. In that light, it’s not quite clear why that annotation was only removed from the signature file with Java 11.\nSumming up, since Java 9 the javac compiler provides powerful means of ensuring API compatibility with earlier Java versions. With a size of 7.2 MB for Java 16, the ct.sym file contains the JDK API signature versions all the way back to Java 7. With the --release compiler option, backwards-compatible builds are straightforward and fully portable, without the need for actually installing earlier JDKs. With that tool in your box, there’s really no need any longer for using the -source and -target options. 
Not only that, --release will also help to spot subtle compatibility issues related to overriding methods with co-variant return types, such as ByteBuffer.position().\n","id":53,"publicationdate":"Apr 26, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOne of the ultimate strengths of Java is its strong notion of backwards compatibility:\nJava applications and libraries built many years ago oftentimes run without problems on current JVMs,\nand the compiler of current JDKs can produce byte code, that is executable with earlier Java versions.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor instance, JDK 16 supports byte code levels going back as far as to Java 1.7;\nBut: \u003cem\u003ehic sunt dracones\u003c/em\u003e.\nThe emitted byte code level is just one part of the story.\nIt’s equally important to consider which APIs of the JDK are used by the compiled code,\nand whether they are available in the targeted Java runtime version.\nAs an example, let’s consider this simple \u0026#34;Hello World\u0026#34; program:\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"The Anatomy of ct.sym — How javac Ensures Backwards Compatibility","uri":"https://www.morling.dev/blog/the-anatomy-of-ct-sym-how-javac-ensures-backwards-compatibility/"},{"content":" Java 16 is around the corner, so there’s no better time than now for learning more about the features which the new version will bring. After exploring the support for Unix domain sockets a while ago, I’ve lately been really curious about the incubating Vector API, as defined by JEP 338, developed under the umbrella of Project Panama, which aims at \u0026#34;interconnecting JVM and native code\u0026#34;.\nVectors?!? 
Of course this is not about renewing the ancient Java collection types like java.util.Vector (\u0026lt;insert some pun about this here\u0026gt;), but rather about an API which lets Java developers take advantage of the vector calculation capabilities you can find in most CPUs these days. Now I’m by no means an expert on low-level programming leveraging specific CPU instructions, but exactly that’s why I hope to make the case with this post that the new Vector API makes these capabilities approachable to a wide audience of Java programmers.\nWhat’s SIMD Anyways? Before diving into a specific example, it’s worth pointing out why that API is so interesting, and what it could be used for. In a nutshell, CPU architectures like x86 or AArch64 provide extensions to their instruction sets which allow you to apply a single operation to multiple data items at once (SIMD — single instruction, multiple data). If a specific computing problem can be solved using an algorithm that lends itself to such parallelization, substantial performance improvements can be gained. Examples for such SIMD instruction set extensions include SSE and AVX for x64, and Neon of AArch64 (Arm).\nAs such, they complement other means of compute parallelization: scaling out across multiple machines which collaborate in a cluster, and multi-threaded programming. Unlike these though, vectorized computations are done within the scope of an individual method, e.g. operating on multiple elements of an array at once.\nSo far, there was no way for Java developers to directly work with such SIMD instructions. While you can use SIMD intrinsics in languages closer to the metal such as C/C++, no such option exists in Java so far. Note this doesn’t mean Java wouldn’t take advantage of SIMD at all: the JIT compiler can auto-vectorize code in specific situations, i.e. transforming code from a loop into vectorized code. 
Whether that’s possible or not isn’t easy to determine, though; small changes to a loop which the compiler was able to vectorize before, may lead to scalar execution, resulting in a performance regression.\nJEP 338 aims to improve this situation: introducing a portable vector computation API, it allows Java developers to benefit from SIMD execution by means of explicitly vectorized algorithms. Unlike C/C++ style intrinsics, this API will be mapped automatically by the C2 JIT compiler to the corresponding instruction set of the underlying platform, falling back to scalar execution if the platform doesn’t provide the required capabilities. A pretty sweet deal, if you ask me!\nNow, why would you be interested in this? Doesn’t \u0026#34;vector calculation\u0026#34; sound an awful lot like mathematics-heavy, low-level algorithms, which you don’t tend to find that much in your typical Java enterprise applications? I’d say, yes and no. Indeed it may not be that beneficial for say a CRUD application copying some data from left to right. But there are many interesting applications in areas like image processing, AI, parsing, (SIMD-based JSON parsing being a prominently discussed example), text processing, data type conversions, and many others. In that regard, I’d expect that JEP 338 will pave the path for using Java in many interesting use cases, where it may not be the first choice today.\nVectorizing FizzBuzz To see how the Vector API can help with improving the performance of some calculation, let’s consider FizzBuzz. Originally, FizzBuzz is a game to help teaching children division; but interestingly, it also serves as entry-level interview question for hiring software engineers in some places. In any case, it’s a nice example for exploring how some calculation can benefit from vectorization. 
The rules of FizzBuzz are simple:\nNumbers are counted and printed out: 1, 2, 3, …​\nIf a number is divisible by 3, instead of printing the number, print \u0026#34;Fizz\u0026#34;\nIf a number is divisible by 5, print \u0026#34;Buzz\u0026#34;\nIf a number is divisible by 3 and 5, print \u0026#34;FizzBuzz\u0026#34;\nAs the Vector API concerns itself with numeric values instead of strings, rather than \u0026#34;Fizz\u0026#34;, \u0026#34;Buzz\u0026#34;, and \u0026#34;FizzBuzz\u0026#34;, we’re going to emit -1, -2, and -3, respectively. The input of the program will be an array with the numbers from 1 …​ 256, the output an array with the FizzBuzz sequence:\n1, 2, -1, 4, -2, -1, 7, 8, -1, -2, 11, -1, 13, 14, -3, 16, ... The task is easily solved using a plain for loop processing scalar values one by one:\nprivate static final int FIZZ = -1; private static final int BUZZ = -2; private static final int FIZZ_BUZZ = -3; public int[] scalarFizzBuzz(int[] values) { int[] result = new int[values.length]; for (int i = 0; i \u0026lt; values.length; i++) { int value = values[i]; if (value % 3 == 0) { if (value % 5 == 0) { (1) result[i] = FIZZ_BUZZ; } else { result[i] = FIZZ; (2) } } else if (value % 5 == 0) { result[i] = BUZZ; (3) } else { result[i] = value; (4) } } return result; } 1 The current number is divisible by 3 and 5: emit FIZZ_BUZZ (-3) 2 The current number is divisible by 3: emit FIZZ (-1) 3 The current number is divisible by 5: emit BUZZ (-2) 4 The current number is divisible by neither 3 nor 5: emit the number itself As a baseline, this implementation can be executed ~2.2M times per second in a simple JMH benchmark running on my Macbook Pro 2019, with a 2.6 GHz 6-Core Intel Core i7 CPU:\nBenchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s Now let’s see how this calculation could be vectorized and what performance improvements can be gained by doing so. 
When looking at the incubating Vector API, you may be overwhelmed at first by its large API surface. But it’s becoming manageable once you realize that all the types like IntVector, LongVector, etc. essentially expose the same set of methods, only specific for each of the supported data types (and indeed, as per the JavaDoc, all these classes were not hand-written by some poor soul, but generated, from some sort of parameterized template supposedly).\nAmongst the plethora of API methods, there is no modulo operation, though (which makes sense, as for instance there isn’t such instruction in any of the x86 SIMD extensions). So what could we do to solve the FizzBuzz task? After skimming through the API for some time, the method blend​(Vector\u0026lt;Integer\u0026gt; v, VectorMask\u0026lt;Integer\u0026gt; m) caught my attention:\nReplaces selected lanes of this vector with corresponding lanes from a second input vector under the control of a mask. […​]\nFor any lane set in the mask, the new lane value is taken from the second input vector, and replaces whatever value was in the that lane of this vector.\nFor any lane unset in the mask, the replacement is suppressed and this vector retains the original value stored in that lane.\nThis sounds pretty useful; The pattern of expected -1, -2, and -3 values repeats every 15 input values. So we can \u0026#34;pre-calculate\u0026#34; that pattern once and persist it in form of vectors and masks for the blend() method. 
While stepping through the input array, the right vector and mask are obtained based on the current position and are used with blend() in order to mark the values divisible by 3, 5, and 15 (another option could be min(Vector\u0026lt;Integer\u0026gt; v), but I decided against it, as we’d need some magic value for representing those numbers which should be emitted as-is).\nHere is a visualization of the approach, assuming a vector length of eight elements (\u0026#34;lanes\u0026#34;):\nSo let’s see how we can implement this using the Vector API. The mask and second input vector repeat every 120 elements (least common multiple of 8 and 15), so 15 masks and vectors need to be determined. They can be created like so:\npublic class FizzBuzz { private static final VectorSpecies\u0026lt;Integer\u0026gt; SPECIES = IntVector.SPECIES_256; (1) private final List\u0026lt;VectorMask\u0026lt;Integer\u0026gt;\u0026gt; resultMasks = new ArrayList\u0026lt;\u0026gt;(15); private final IntVector[] resultVectors = new IntVector[15]; public FizzBuzz() { List\u0026lt;VectorMask\u0026lt;Integer\u0026gt;\u0026gt; threes = Arrays.asList( (2) VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00100100), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b01001001), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b10010010) ); List\u0026lt;VectorMask\u0026lt;Integer\u0026gt;\u0026gt; fives = Arrays.asList( (3) VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00010000), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b01000010), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00001000), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00100001), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b10000100) ); for(int i = 0; i \u0026lt; 15; i++) { (4) VectorMask\u0026lt;Integer\u0026gt; threeMask = threes.get(i%3); VectorMask\u0026lt;Integer\u0026gt; fiveMask = fives.get(i%5); resultMasks.add(threeMask.or(fiveMask)); (5) resultVectors[i] = 
IntVector.zero(SPECIES) (6) .blend(FIZZ, threeMask) .blend(BUZZ, fiveMask) .blend(FIZZ_BUZZ, threeMask.and(fiveMask)); } } } 1 A vector species describes the combination of a vector element type (in this case Integer) and a vector shape (in this case 256 bit); i.e. here we’re going to deal with vectors that hold eight 32-bit int values 2 Vector masks describing the numbers divisible by three (read the bit values from right to left) 3 Vector masks describing the numbers divisible by five 4 Let’s create the fifteen required result masks and vectors 5 A value in the output array should be set to another value if it’s divisible by three or five 6 Set the value to -1, -2, or -3, depending on whether it’s divisible by three, five, or fifteen, respectively; all other lanes stay 0 and are never used, as the result mask keeps the corresponding value from the input array for them With this infrastructure in place, we can implement the actual method for calculating the FizzBuzz values for an arbitrarily long input array:\npublic int[] simdFizzBuzz(int[] values) { int[] result = new int[values.length]; int i = 0; int upperBound = SPECIES.loopBound(values.length); (1) for (; i \u0026lt; upperBound; i += SPECIES.length()) { (2) IntVector chunk = IntVector.fromArray(SPECIES, values, i); (3) int maskIdx = (i/SPECIES.length())%15; (4) IntVector fizzBuzz = chunk.blend(resultVectors[maskIdx], resultMasks.get(maskIdx)); (5) fizzBuzz.intoArray(result, i); (6) } for (; i \u0026lt; values.length; i++) { (7) int value = values[i]; if (value % 3 == 0) { if (value % 5 == 0) { result[i] = FIZZ_BUZZ; } else { result[i] = FIZZ; } } else if (value % 5 == 0) { result[i] = BUZZ; } else { result[i] = value; } } return result; } 1 Determine the maximum index in the array that’s divisible by the species length; e.g. 
if the input array is 100 elements long, that’d be 96 in the case of vectors with eight elements each 2 Iterate through the input array in steps of the vector length 3 Load the current chunk of the input array into an IntVector 4 Obtain the index of the right result vector and mask 5 Determine the FizzBuzz numbers for the current chunk (i.e. that’s the actual SIMD instruction, processing all eight elements of the current chunk at once) 6 Copy the result values at the right index into the result array 7 Process any remainder (e.g. the last four remaining elements in case of an input array with 100 elements) using the traditional scalar approach, as those values couldn’t fill up another vector instance To reiterate what’s happening here: instead of processing the values of the input array one by one, they are processed in chunks of eight elements each by means of the blend() vector operation, which can be mapped to an equivalent SIMD instruction of the CPU. In case the input array doesn’t have a length that’s a multiple of the vector length, the remainder is processed in the traditional scalar way. The resulting duplication of the logic seems a bit inelegant; we’ll discuss in a bit what can be done about that.\nFor now, let’s see whether our efforts pay off; i.e. is this vectorized approach actually faster than the basic scalar implementation? Turns out it is! Here are the numbers I get from JMH on my machine, showing throughput increasing by a factor of 3:\nBenchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s Is there anything that could be further improved? I’m pretty sure, but as said I’m not an expert here, so I’ll leave it to smarter folks to point out more efficient implementations in the comments. One thing I figured is that the division and modulo operation for obtaining the current mask index isn’t ideal. 
Keeping a separate loop variable that’s reset to 0 after reaching 15 proved to be quite a bit faster:\npublic int[] simdFizzBuzz(int[] values) { int[] result = new int[values.length]; int i = 0; int j = 0; int upperBound = SPECIES.loopBound(values.length); for (; i \u0026lt; upperBound; i += SPECIES.length()) { IntVector chunk = IntVector.fromArray(SPECIES, values, i); IntVector fizzBuzz = chunk.blend(resultVectors[j], resultMasks.get(j)); fizzBuzz.intoArray(result, i); j++; if (j == 15) { j = 0; } } // processing of remainder... } Benchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 8830433,250 ± 69955,161 ops/s This makes for another nice improvement, yielding 4x the throughput of the original scalar implementation. Now, to make this a true apples-to-apples comparison, a mask-based approach can also be applied to the purely scalar implementation, only that each value needs to be looked up individually:\nprivate int[] serialMask = new int[] {0, 0, -1, 0, -2, -1, 0, 0, -1, -2, 0, -1, 0, 0, -3}; public int[] serialFizzBuzzMasked(int[] values) { int[] result = new int[values.length]; int j = 0; for (int i = 0; i \u0026lt; values.length; i++) { int res = serialMask[j]; result[i] = res == 0 ? 
values[i] : res; j++; if (j == 15) { j = 0; } } return result; } Indeed, this implementation is quite a bit better than the original one, but still the SIMD-based approach is more than twice as fast:\nBenchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.scalarFizzBuzzMasked 256 thrpt 5 4156751,424 ± 23668,949 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 8830433,250 ± 69955,161 ops/s Examining the Native Code This all is pretty cool, but can we trust that under the hood things actually happen the way we expect them to happen? In order to verify that, let’s take a look at the native assembly code that gets produced by the JIT compiler for this implementation. This requires you to run the JVM with the hsdis plug-in; see this post for instructions on how to build and install hsdis. Let’s create a simple main class which executes the method in question in a loop, so to make sure the method actually gets JIT-compiled:\npublic class Main { public static int[] blackhole; public static void main(String[] args) { FizzBuzz fizzBuzz = new FizzBuzz(); var values = IntStream.range(1, 257).toArray(); for(int i = 0; i \u0026lt; 5_000_000; i++) { blackhole = fizzBuzz.simdFizzBuzz(values); } } } Run the program, enabling the output of the assembly, and piping its output into a log file:\njava -XX:+UnlockDiagnosticVMOptions \\ -XX:+PrintAssembly -XX:+LogCompilation \\ --add-modules=jdk.incubator.vector \\ --class-path target/classes \\ dev.morling.demos.simdfizzbuzz.Main \u0026gt; fizzbuzz.log Open the fizzbuzz.log file and look for the C2-compiled nmethod block of the simdFizzBuzz method. Somewhere within the method’s native code, you should find the vpblendvb instruction (output slightly adjusted for better readability):\n... 
=========================== C2-compiled nmethod ============================ --------------------------------- Assembly --------------------------------- Compiled method (c2) ... dev.morling.demos.simdfizzbuzz.FizzBuzz:: ↩ simdFizzBuzz (161 bytes) ... 0x000000011895e18d: vpmovsxbd %xmm7,%ymm7 ↩ ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0} ; - jdk.incubator.vector.IntVector::intoArray@42 (line 2962) ; - dev.morling.demos.simdfizzbuzz.FizzBuzz::simdFizzBuzz@76 (line 92) 0x000000011895e192: vpblendvb %ymm7,%ymm5,%ymm8,%ymm0 ↩ ;*invokestatic blend {reexecute=0 rethrow=0 return_oop=0} ; - jdk.incubator.vector.IntVector::blendTemplate@26 (line 1895) ; - jdk.incubator.vector.Int256Vector::blend@11 (line 376) ; - jdk.incubator.vector.Int256Vector::blend@3 (line 41) ; - dev.morling.demos.simdfizzbuzz.FizzBuzz::simdFizzBuzz@67 (line 91) ... vpblendvb is part of the x86 AVX2 instruction set and \u0026#34;conditionally copies byte elements from the source operand (second operand) to the destination operand (first operand) depending on mask bits defined in the implicit third register argument\u0026#34;, as such exactly corresponding to the blend() method in the JEP 338 API.\nOne detail not quite clear to me is why vpmovsxbd for copying the results into the output array (the intoArray() call) shows up before vpblendvb. If you happen to know the reason for this, I’d love to hear from you and learn about this.\nAvoiding Scalar Processing of Tail Elements Let’s get back to the scalar processing of the potential remainder of the input array. This feels a bit \u0026#34;un-DRY\u0026#34;, as it requires the algorithm to be implemented twice, once vectorized and once in a scalar way.\nThe Vector API recognizes the desire for avoiding this duplication and provides masked versions of all the required operations, so that during the last iteration no access beyond the array length will happen. 
Using this approach, the SIMD FizzBuzz method looks like this:\npublic int[] simdFizzBuzzMasked(int[] values) { int[] result = new int[values.length]; int j = 0; for (int i = 0; i \u0026lt; values.length; i += SPECIES.length()) { var mask = SPECIES.indexInRange(i, values.length); (1) var chunk = IntVector.fromArray(SPECIES, values, i, mask); (2) var fizzBuzz = chunk.blend(resultVectors[j], resultMasks.get(j)); fizzBuzz.intoArray(result, i, mask); (2) j++; if (j == 15) { j = 0; } } return result; } 1 Obtain a mask which, during the last iteration, will have the bits unset for those lanes whose index lies beyond the end of the array 2 Perform the same operations as above, but using the mask to prevent any access beyond the array length The implementation looks quite a bit nicer than the version with the explicit scalar processing of the remainder portion. But the impact on throughput is significant, and the result is quite disappointing:\nBenchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.scalarFizzBuzzMasked 256 thrpt 5 4156751,424 ± 23668,949 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 8830433,250 ± 69955,161 ops/s FizzBuzzBenchmark.simdFizzBuzzMasked 256 thrpt 5 1204128,029 ± 5556,553 ops/s In its current form, this approach is even slower than the pure scalar implementation. It remains to be seen whether and how performance gets improved here, as the Vector API matures. Ideally, the mask would have to be only applied during the very last iteration. 
This is something we either could do ourselves — re-introducing some special remainder handling, albeit less different from the core implementation than with the pure scalar approach discussed above — or perhaps even the compiler itself may be able to apply such transformation.\nOne important take-away from this is that a SIMD-based approach does not necessarily have to be faster than a scalar one. So every algorithmic adjustment should be validated with a corresponding benchmark, before drawing any conclusions. Speaking of which, I also ran the benchmark on that shiny new Mac Mini M1 (i.e. an AArch64-based machine) that found its way to my desk recently, and numbers are, mh, interesting:\nBenchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2717990,097 ± 4203,628 ops/s FizzBuzzBenchmark.scalarFizzBuzzMasked 256 thrpt 5 5750402,582 ± 2479,462 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 1297631,404 ± 15613,288 ops/s FizzBuzzBenchmark.simdFizzBuzzMasked 256 thrpt 5 374313,033 ± 2219,940 ops/s FizzBuzzBenchmark.simdFizzBuzzMasksInArray 256 thrpt 5 1316375,073 ± 1178,704 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 998979,324 ± 69997,361 ops/s The scalar implementation on the M1 out-performs the x86 MacBook Pro by quite a bit, but SIMD numbers are significantly lower.\nI haven’t checked the assembly code, but solely based on the figures, my guess is that the JEP 338 implementation in the current JDK 16 builds does not yet support AArch64, and the API falls back to scalar execution.\nHere it would be nice to have some method in the API which reveals whether SIMD support is provided by the current platform or not, as e.g. 
done by .NET with its Vector.IsHardwareAccelerated() method.\nUpdate, March 9th: After asking about this on the panama-dev mailing list, Ningsheng Jian from Arm explained that the AArch64 NEON instruction set has a maximum hardware vector size of 128 bits; hence the Vector API is transparently falling back to the Java implementation in our case of using 256 bits. By passing the -XX:+PrintIntrinsics flag you can inspect which API calls get intrinsified (i.e. executed via corresponding hardware instructions) and which ones do not. When running the main class from above with this option, we get the relevant information (output slightly adjusted for better readability):\n@ 31 jdk.internal.vm.vector.VectorSupport::load (38 bytes) ↩ failed to inline (intrinsic) ... @ 26 jdk.internal.vm.vector.VectorSupport::blend (38 bytes) ↩ failed to inline (intrinsic) ... @ 42 jdk.internal.vm.vector.VectorSupport::store (38 bytes) ↩ failed to inline (intrinsic) ** not supported: arity=0 op=load vlen=8 etype=int ismask=no ** not supported: arity=2 op=blend vlen=8 etype=int ismask=useload ** not supported: arity=1 op=store vlen=8 etype=int ismask=no Fun fact: during the entire benchmark runtime of 10 min the fan of the Mac Mini was barely audible, if at all. Definitely a very exciting platform, and I’m looking forward to doing more Java experiments on it soon.\nWrap-Up Am I suggesting you should go and implement your next FizzBuzz using SIMD? Of course not, FizzBuzz just served as an example here for exploring how a well-known \u0026#34;problem\u0026#34; can be solved more efficiently via the new Java Vector API (at the cost of increased complexity in the code), even without being a seasoned systems programmer. On the other hand, it may make an impression during your next job interview ;)\nIf you want to get started with your own experiments around the Vector API and SIMD, install a current JDK 16 RC (release candidate) build and grab the SIMD FizzBuzz example from this GitHub repo. 
A nice twist to explore would for instance be using ShortVector instead of IntVector (allowing you to put 16 values into a 256-bit vector), running the benchmark on machines with the AVX-512 extension (e.g. via the C5 instance type on AWS EC2), or both :)\nApart from the JEP document itself, there isn’t too much info out yet about the Vector API; a great starting point is the \u0026#34;vector\u0026#34; tagged posts on the blog of Richard Startin. Another inspirational resource is August Nagro’s project for vectorized UTF-8 validation based on a paper by John Keiser and Daniel Lemire. Kishor Kharbas and Paul Sandoz did a talk about the Vector API at CodeOne a while ago.\nTaking a step back, it’s hard to overstate the impact which the Vector API potentially will have on the Java platform. Providing SIMD capabilities in a rather easy-to-use, portable way, without having to rely on CPU instruction set specific intrinsics, may result in nothing less than a \u0026#34;democratization of SIMD\u0026#34;, making these powerful means of parallelizing computations available to a much larger developer audience.\nThe JDK class library itself may also benefit from the Vector API; while JDK authors (unlike Java application developers) already have the JVM intrinsics mechanism at their disposal, the new API will \u0026#34;make prototyping easier, and broaden what might be economical to consider\u0026#34;, as pointed out by Claes Redestad.\nBut nothing in life is free, and code will have to be restructured or even re-written in order to benefit from this. Some problems lend themselves better than others to SIMD-style processing, and only time will tell in which areas the new API will be adopted. As said above, use cases like image processing and AI can benefit from SIMD a lot, due to the nature of the underlying calculations. 
Also specific data store operations can be sped up significantly using SIMD instructions; so my personal hope is that the Vector API can contribute to making Java an attractive choice for such applications, which previously were not considered a sweet spot for the Java platform.\nAs such, I can’t think of many recent Java API additions which may prove as influential as the Vector API.\n","id":54,"publicationdate":"Mar 8, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eJava 16 is around the corner, so there’s no better time than now for learning more about the features which the new version will bring.\nAfter exploring the support for \u003ca href=\"/blog/talking-to-postgres-through-java-16-unix-domain-socket-channels/\"\u003eUnix domain sockets\u003c/a\u003e a while ago,\nI’ve lately been really curious about the incubating Vector API,\nas defined by \u003ca href=\"https://openjdk.java.net/jeps/338\"\u003eJEP 338\u003c/a\u003e,\ndeveloped under the umbrella of \u003ca href=\"https://openjdk.java.net/projects/panama/\"\u003eProject Panama\u003c/a\u003e,\nwhich aims at \u0026#34;interconnecting JVM and native code\u0026#34;.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eVectors?!?\u003c/em\u003e\nOf course this is not about renewing the ancient Java collection types like \u003ccode\u003ejava.util.Vector\u003c/code\u003e\n(\u0026lt;insert some pun about this here\u0026gt;),\nbut rather about an API which lets Java developers take advantage of the vector calculation capabilities you can find in most CPUs these days.\nNow I’m by no means an expert on low-level programming leveraging specific CPU instructions,\nbut exactly that’s why I hope to make the case with this post that the new Vector API makes these capabilities approachable to a wide audience of Java programmers.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"FizzBuzz – SIMD 
Style!","uri":"https://www.morling.dev/blog/fizzbuzz-simd-style/"},{"content":" Update Feb 5: This post is discussed on Hacker News\nReading a blog post about what’s coming up in JDK 16 recently, I learned that one of the new features is support for Unix domain sockets (JEP 380). Before Java 16, you’d have to resort to 3rd party libraries like jnr-unixsocket in order to use them. If you haven’t heard about Unix domain sockets before, they are \u0026#34;data communications [endpoints] for exchanging data between processes executing on the same host operating system\u0026#34;. Don’t be put off by the name btw.; Unix domain sockets are also supported by macOS and even Windows since version 10.\nDatabases such as Postgres or MySQL use them for offering an alternative to TCP/IP-based connections to client applications running on the same machine as the database. In such a scenario, Unix domain sockets are both more secure (no remote access to the database is exposed at all; file system permissions can be used for access control), and also more efficient than TCP/IP loopback connections.\nA common use case are proxies for accessing Cloud-based databases, such as the GCP Cloud SQL Proxy. Running on the same machine as a client application (e.g. in a sidecar container in case of Kubernetes deployments), they provide secure access to a managed database, for instance taking care of the SSL handling.\nMy curiosity was piqued and I was wondering what it’d take to make use of the new Java 16 Unix domain socket for connecting to Postgres. It was your regular evening during the pandemic, without much to do, so I thought \u0026#34;Let’s give this a try\u0026#34;. To have a testing bed, I started with installing Postgres 13 on Fedora 33. 
Fedora might not always have the latest Postgres version packaged just yet, but following the official Postgres instructions it is straight-forward to install newer versions.\nIn order to connect with user name and password via a Unix domain socket, one small adjustment to /var/lib/pgsql/13/data/pg_hba.conf is needed: the access method for the local connection type must be switched from the default value peer (which would try to authenticate using the operating system user name of the client process) to md5.\n... # TYPE DATABASE USER ADDRESS METHOD # \u0026#34;local\u0026#34; is for Unix domain socket connections only local all all md5 ... Make sure to apply the changed configuration by restarting the database (systemctl restart postgresql-13), and things are ready to go.\nThe Postgres JDBC Driver The first thing I looked into was the Postgres JDBC driver. Since version 9.4-1208 (released in 2016) it allows you to configure custom socket factories, a feature which explicitly was added considering Unix domain sockets. The driver itself doesn’t come with a socket factory implementation that’d actually support Unix domain sockets, but a few external open-source implementations exist. Most notably junixsocket provides such a socket factory.\nCustom socket factories must extend javax.net.SocketFactory, and their fully-qualified class name needs to be specified using the socketFactory driver parameter. So it should be easy to create a SocketFactory implementation based on the new UnixDomainSocketAddress class, right?\npublic class PostgresUnixDomainSocketFactory extends SocketFactory { @Override public Socket createSocket() throws IOException { var socket = new Socket(); socket.connect(UnixDomainSocketAddress.of( \u0026#34;/var/run/postgresql/.s.PGSQL.5432\u0026#34;)); (1) return socket; } // other create methods ... 
} 1 Create a Unix domain socket address for the default path of the socket on Fedora and related systems It compiles just fine; but it turns out not all socket addresses are equal, and java.net.Socket only connects to addresses of type InetSocketAddress (and the PG driver maintainers seem to sense some air of mystery around these \u0026#34;unusual\u0026#34; events, too):\norg.postgresql.util.PSQLException: Something unusual has occurred to cause the driver to fail. Please report this exception. at org.postgresql.Driver.connect(Driver.java:285) ... Caused by: java.lang.IllegalArgumentException: Unsupported address type at java.base/java.net.Socket.connect(Socket.java:629) at java.base/java.net.Socket.connect(Socket.java:595) at dev.morling.demos.PostgresUnixDomainSocketFactory.createSocket(PostgresUnixDomainSocketFactory.java:19) ... Now JEP 380 solely speaks about SocketChannel and stays silent about Socket; but perhaps obtaining a socket from a domain socket channel works?\npublic Socket createSocket() throws IOException { var sc = SocketChannel.open(UnixDomainSocketAddress.of( \u0026#34;/var/run/postgresql/.s.PGSQL.5432\u0026#34;)); return sc.socket(); } Nope, no luck either:\njava.lang.UnsupportedOperationException: Not supported at java.base/sun.nio.ch.SocketChannelImpl.socket(SocketChannelImpl.java:226) at dev.morling.demos.PostgresUnixDomainSocketFactory.createSocket(PostgresUnixDomainSocketFactory.java:17) Indeed it looks like JEP 380 is concerning itself only with the non-blocking SocketChannel API, while users of the blocking Socket API do not get to benefit from it. It should be possible to create a custom Socket implementation based on the socket channel support of JEP 380, but that’s going beyond the scope of my little exploration.\nThe Vert.x Postgres Client If the Postgres JDBC driver doesn’t easily benefit from the JEP, what about other Java Postgres clients then? There are several non-blocking options, including the Vert.x Postgres client and R2DBC. 
The former is used to bring Reactive capabilities for Postgres into the Quarkus stack, too, so I turned my attention to it.\nNow the Vert.x Postgres Client already has support for Unix domain sockets, by means of adding the right Netty native transport dependency to your project. So purely from functionality perspective, there’s not that much to be gained here. But being able to use domain sockets also with the default NIO transport would still be nice, as it means one less dependency to take care of. So I dug a bit into the code of the Postgres client and Vert.x itself and figured out, that two things needed adjustment:\nThe NIO-based Transport class of Vert.x needs to learn about the fact that SocketChannel now also supports Unix domain sockets (currently, an exception is raised when trying to use them without a Netty native transport)\nNetty’s NioSocketChannel needs some small changes, as it tries to obtain a Socket from the underlying SocketChannel, which doesn’t work for domain sockets as we’ve seen above\nStep 1 was quickly done by creating a custom sub-class of the default Transport class. Two methods needed changes: channelFactory() for obtaining a factory for the actual Netty transport channel, and convert() for converting a Vert.x SocketAddress into a NIO one:\npublic class UnixDomainTransport extends Transport { @Override public ChannelFactory\u0026lt;? 
extends Channel\u0026gt; channelFactory( boolean domainSocket) { if (!domainSocket) { (1) return super.channelFactory(domainSocket); } else { return () -\u0026gt; { try { var sc = SocketChannel.open(StandardProtocolFamily.UNIX); (2) return new UnixDomainSocketChannel(null, sc); } catch(Exception e) { throw new RuntimeException(e); } }; } } @Override public SocketAddress convert(io.vertx.core.net.SocketAddress address) { if (!address.isDomainSocket()) { (3) return super.convert(address); } else { return UnixDomainSocketAddress.of(address.path()); (4) } } } 1 Delegate creation of non domain socket factories to the regular NIO transport implementation 2 This channel factory returns instances of our own UnixDomainSocketChannel type (see below), passing a socket channel based on the new UNIX protocol family 3 Delegate conversion of non domain socket addresses to the regular NIO transport implementation 4 Create a UnixDomainSocketAddress for the socket’s file system path Now let’s take a look at the UnixDomainSocketChannel class. I was hoping to get away again with creating a sub-class of the NIO-based implementation, io.netty.channel.socket.nio.NioSocketChannel in this case. Unfortunately, though, the NioSocketChannel constructor invokes the taboo SocketChannel#socket() method. Of course that’d not be a problem when doing this change in Netty itself, but for my little exploration I ended up copying the class and doing the required adjustments in that copy. 
I ended up doing two small changes:\nAvoiding the call to SocketChannel#socket() in the constructor:\npublic UnixDomainSocketChannel(Channel parent, SocketChannel socket) { super(parent, socket); config = new NioSocketChannelConfig(this, new Socket()); (1) } 1 Passing a dummy socket instead of socket.socket(), it shouldn’t be accessed in our case anyways A few methods call the Socket methods isInputShutdown() and isOutputShutdown(); those should be possible to be by-passed by keeping track of the two shutdown flags ourselves\nAs I was creating the UnixDomainSocketChannel in my own namespace instead of Netty’s packages, a few references to the non-public method NioChannelOption#getOptions() needed commenting out, which again shouldn’t be relevant for the domain socket case\nYou can find the complete change in this commit. All in all, not exactly an artisanal piece of software engineering, but the little hack seemed good enough at least for taking a quick glimpse at the new domain socket support. Of course a real implementation could be done much more properly within the Netty project itself.\nSo it was time to give this thing a test ride. 
As we need to configure the custom Transport implementation, retrieval of a PgPool instance is a tad more verbose than usual:\nPgConnectOptions connectOptions = new PgConnectOptions() .setPort(5432) (1) .setHost(\u0026#34;/var/run/postgresql\u0026#34;) .setDatabase(\u0026#34;test_db\u0026#34;) .setUser(\u0026#34;test_user\u0026#34;) .setPassword(\u0026#34;topsecret!\u0026#34;); PoolOptions poolOptions = new PoolOptions() .setMaxSize(5); VertxFactory fv = new VertxFactory(); fv.transport(new UnixDomainTransport()); (2) Vertx v = fv.vertx(); PgPool client = PgPool.pool(v, connectOptions, poolOptions); (3) 1 The Vert.x Postgres client constructs the domain socket path from the given port and path (via setHost()); the full path will be /var/run/postgresql/.s.PGSQL.5432, just as above 2 Construct a Vertx instance with the custom transport class 3 Obtain a PgPool instance using the customized Vertx instance We can then use the client instance as usual, only that it now will connect to Postgres using the domain socket instead of via TCP/IP. All this solely using the default NIO-based transports, without the need for adding any Netty native dependency, such as its epoll-based transport.\nI haven’t done any real performance benchmark at this point; in a quick ad-hoc test of executing a trivial SELECT query on a primary key 200,000 times, I observed a latency of ~0.11 ms when using Unix domain sockets (with both netty-transport-native-epoll and JDK 16 Unix domain sockets) and ~0.13 ms when connecting via TCP/IP. So definitely a significant improvement which can be a deciding factor for low-latency use cases, though in comparison to other reports, the latency reduction of ~15% appears to be at the lower end of the spectrum.\nA more thorough performance evaluation should be done, for instance also examining the impact on garbage collection. 
And it goes without saying that you should only trust your own measurements, on your own hardware, based on your specific workloads, in order to decide whether you would benefit from domain sockets or not.\nOther Use Cases Database connectivity is just one of the use cases for domain sockets; highly performant local inter-process communication comes in handy for all kinds of use cases. One which I find particularly intriguing is the creation of modular applications based on a multi-process architecture.\nWhen thinking of classic Java Jakarta EE application servers for instance, you could envision a model where both the application server and each deployment are separate processes, communicating through domain sockets. This would have some interesting advantages, such as stricter isolation (so for instance an OutOfMemoryError in one deployed application won’t impact others) and re-deployments without any risk of classloader leaks, as the JVM of a deployment would be restarted. On the downside, you’d be facing a higher overall memory consumption (although that can at least partly be mitigated through class data sharing, which also works across JVM boundaries) and more costly (remote) method invocations between deployments.\nNow the application server model has fallen out of favour for various reasons, but such a multi-process design still is very interesting, for instance for building modular applications that should expose a single web endpoint, while being assembled from a set of processes which are developed and deployed by several independent teams. Another use case would be desktop applications that are made up of a set of processes for isolation purposes, as it’s e.g. done by most web browsers nowadays with distinct processes for separate tabs. JEP 380 should facilitate this model when creating Java applications, e.g. 
considering rich clients built with JavaFX.\nAnother really interesting feature of Unix domain sockets is the ability to transfer open file descriptors from one process to another. This allows for non-disruptive upgrades of server applications, without dropping any open TCP connections. This technique is used for instance by Envoy Proxy for applying configuration changes: upon a configuration change, a second Envoy instance with the new configuration is started up, takes over the active sockets from the previous instance and after some \u0026#34;draining period\u0026#34; triggers a shutdown of the old instance. This approach enables a truly immutable application design within Envoy itself, with all its advantages, without the need for in-process configuration reloads. I highly recommend reading the two posts linked above; they are super-interesting.\nUnfortunately, JEP 380 doesn’t seem to support file descriptor transfers. So for this kind of architecture, you’d still have to resort to the aforementioned junixsocket library, which explicitly lists file descriptor transfer support as one of its features. While you couldn’t take advantage of that using Java’s NIO API, it should be doable using alternative networking frameworks such as Netty. Probably a topic for another blog post on another one of those pandemic weekends ;)\nAnd that completes my small exploration of Java 16’s support for Unix domain sockets. If you want to do your own experiments of using them to connect to Postgres, make sure to install the latest JDK 16 EA build and grab the source code of my experimentation from this GitHub repo.\nIt’d be my hope that frameworks like Netty and Vert.x make use of this JDK feature fairly quickly, as only a small amount of code changes is required, and users get to benefit from the higher performance of domain sockets without having to pull in any additional dependencies. 
In order to keep compatibility with Java versions prior to 16, multi-release JARs offer one avenue for integrating this feature.\n","id":55,"publicationdate":"Jan 31, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Feb 5: This post is \u003ca href=\"https://news.ycombinator.com/item?id=26012466\"\u003ediscussed on Hacker News\u003c/a\u003e\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eReading a blog post about what’s \u003ca href=\"https://www.loicmathieu.fr/wordpress/en/informatique/java-16-quoi-de-neuf/\"\u003ecoming up in JDK 16\u003c/a\u003e recently,\nI learned that one of the new features is support for Unix domain sockets (\u003ca href=\"https://openjdk.java.net/jeps/380\"\u003eJEP 380\u003c/a\u003e).\nBefore Java 16, you’d have to resort to 3rd party libraries like \u003ca href=\"https://github.com/jnr/jnr-unixsocket\"\u003ejnr-unixsocket\u003c/a\u003e in order to use them.\nIf you haven’t heard about \u003ca href=\"https://en.wikipedia.org/wiki/Unix_domain_socket\"\u003eUnix domain sockets\u003c/a\u003e before,\nthey are \u0026#34;data communications [endpoints] for exchanging data between processes executing on the same host operating system\u0026#34;.\nDon’t be put off by the name btw.;\nUnix domain sockets are also supported by macOS and even Windows since \u003ca href=\"https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/\"\u003eversion 10\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Talking to Postgres Through Java 16 Unix-Domain Socket Channels","uri":"https://www.morling.dev/blog/talking-to-postgres-through-java-16-unix-domain-socket-channels/"},{"content":" Discussions around Java’s jlink tool typically center around savings in terms of (disk) space. 
Instead of shipping an entire JDK, a custom runtime image created with jlink contains only those JDK modules which an application actually requires, resulting in smaller distributables and container images.\nBut the contribution of jlink, as a part of the Java module system at large, to the development of Java applications is bigger than that: with the notion of link time it defines an optional complement to the well-known phases compile time and application run-time:\nLink time is an opportunity to do whole-world optimizations that are otherwise difficult at compile time or costly at run-time. An example would be to optimize a computation when all its inputs become constant (i.e., not unknown). A follow-up optimization would be to remove code that is no longer reachable.\nOther examples for link time optimizations are the removal of unnecessary classes and resources, the conversion of (XML-based) deployment descriptors into binary representations (which will be more efficiently processable at run-time), obfuscation, or the generation of annotation indexes. It would also be very interesting to create AppCDS archives for all the classes of a runtime image at link time and bake that archive into the image, resulting in faster application start-up, without any further manual configuration needed.\nWhile these use cases mostly relate to optimization of the runtime image in one way or another, the link time phase also is beneficial for the validation of applications. In the remainder of this post, I’d like to discuss how link time validation can be employed to ensure the consistency of API signatures within a modularized Java application. 
This helps to avoid potential NoSuchMethodErrors and related errors which would otherwise be raised by the JVM at application run-time, stemming from the usage of incompatible module versions, different from the ones used at compile time.\nThe Example To make things more tangible, let’s look at an application made up of two modules, customer and order. As always, the full source code is available online, for you to play with. The customer module defines a service interface with the following signature:\npublic interface CustomerService { void incrementLoyaltyPoints(long customerId, long orderValue); } The CustomerService interface is part of the customer module’s public API and is invoked from within the order module like so:\npublic class OrderService { public static void main(String[] args) { CustomerService customerService = ...; customerService.incrementLoyaltyPoints(123, 4999); } } Now let’s assume there’s a new version of the customer module; the signature of the incrementLoyaltyPoints() method got slightly changed for the sake of a more expressive and type-safe API:\n// record CustomerId(long id) {} public interface CustomerService { void incrementLoyaltyPoints(CustomerId customerId, long orderValue); } We now create a custom runtime image for the application. But we’re at the end of a tough week, so accidentally we add version 2 of the customer module and the unchanged order module:\n$ $JAVA_HOME/bin/jlink \\ --module-path=path/to/customer-2.0.0.jar:path/to/order-1.0.0.jar \\ --add-modules=com.example.order \\ --output=target/runtime-image Note that jlink won’t complain about this and will create the runtime image. 
When executing the application via the image we’re in for a bad surprise, though (slightly modified for the sake of readability):\n$ ./target/runtime-image/bin/java com.example.order.OrderService Exception in thread \u0026#34;main\u0026#34; java.lang.NoSuchMethodError: \u0026#39;void c.e.customer.CustomerService.incrementLoyaltyPoints(long, long)\u0026#39; at com.example.order@1.0.0/c.e.order.OrderService.main(OrderService.java:5) This might be surprising at first; while jlink and the module system in general put a strong emphasis on reliability and e.g. flag referenced yet missing modules, mismatching API signatures like this are not raised as an issue and will only show up as an error at application run-time.\nIndeed, when I did a quick non-representative poll about this on Twitter, it turned out that more than 40% of participants were not aware of this pitfall:\nNeedless to say that it’d be much more desirable to spot this error already early on at link time, before shipping the affected application to production, and suffering from all the negative consequences associated with that.\nThe API Signature Check jlink Plug-in While jlink doesn’t detect this kind of API signature mismatch by itself, it comes with a plug-in API, which allows you to hook into and enrich the linking process. By creating a custom jlink plug-in, we can implement the API signature check and fail the image creation process when detecting any invalid method references like the one above.\nUnfortunately though, the plug-in mechanism isn’t an official, supported API at this point. As a matter of fact, it is not even exported within jlink’s own module definition. With the right set of javac/java flags and the help of a small Java agent, it is possible though to compile custom plug-ins and have them picked up by jlink. 
To learn more about the required sorcery, check out this blog post which I wrote a while ago over on the Hibernate team blog.\nLet’s start with creating the basic structure of the plug-in implementation class:\nimport jdk.tools.jlink.plugin.Plugin; public class SignatureCheckPlugin implements Plugin { @Override public String getName() { (1) return \u0026#34;check-signatures\u0026#34;; } @Override public Category getType() { (2) return Category.VERIFIER; } @Override public String getDescription() { (3) return \u0026#34;Checks the API references amongst the modules of \u0026#34; + \u0026#34;an application for consistency\u0026#34;; } } 1 Returns the name for the option to enable this plug-in when running the jlink command 2 Returns the category of this plug-in, which impacts the ordering within the plug-in stack (other types include TRANSFORMER, FILTER, etc.) 3 A description which will be shown when listing all plug-ins There are a few more optional methods which we could implement, e.g. if the plug-in had any parameters for controlling its behaviors, or if we wanted it to be enabled by default. But as that’s not the case for the plug-in at hand, the only method that’s missing is transform(), which does the actual heavy-lifting of the plug-in’s work.\nNow implementing the complete rule set of the JVM applied when loading and linking classes at run-time would be a somewhat daunting task. As I am lazy and this is just meant to be a basic PoC, I’m going to limit myself to the detection of mismatching signatures of invoked methods, as shown in the customer/order example above. 
The reason being that this task can be elegantly delegated to an existing tool (I told you, I’m lazy): Animal Sniffer.\nWhile typically used as a build tool plug-in for verifying that classes built on a newer JDK version can also be executed with older Java versions (and as such mostly obsoleted by the JDK’s --release option), Animal Sniffer also provides an API for creating and verifying custom signatures. This comes in handy for our jlink plug-in implementation.\nThe general design of the transform() mechanism is that of a classic input-process-output pipeline. The method receives a ResourcePool object, which allows you to traverse and examine the set of resources going into the image, such as class files, resource bundles, or manifests. A new resource pool is to be returned, which could contain exactly the same resources as the original one (as in our case); but of course it could also contain fewer or newly generated resources, or modified ones:\n@Override public ResourcePool transform(ResourcePool in, ResourcePoolBuilder out) { try { byte[] signature = createSignature(in); (1) boolean broken = checkSignature(in, signature); (2) if (broken) { (3) throw new PluginException(\u0026#34;There are API signature \u0026#34; + \u0026#34;inconsistencies, please check the logs\u0026#34;); } } catch(PluginException e) { throw e; } catch(Exception e) { throw new RuntimeException(e); } in.transformAndCopy(e -\u0026gt; e, out); (4) return out.build(); } /** * Creates a signature for all classes in the resource pool. 
*/ private byte[] createSignature(ResourcePool in) throws IOException { ByteArrayOutputStream signatureStream = new ByteArrayOutputStream(); var builder = new StreamSignatureBuilder(signatureStream, new PrintWriterLogger(System.out)); in.entries() (5) .filter(e -\u0026gt; isClassFile(e) \u0026amp;\u0026amp; !isModuleInfo(e)) .forEach(e -\u0026gt; builder.process(e.path(), e.content())); builder.close(); return signatureStream.toByteArray(); } /** * Checks all classes against the given signature. */ private boolean checkSignature(ResourcePool in, byte[] signature) throws IOException { var checker = new StreamSignatureChecker( new ByteArrayInputStream(signature), Collections.\u0026lt;String\u0026gt;emptySet(), new PrintWriterLogger(System.out) ); checker.setSourcePath(Collections.\u0026lt;File\u0026gt;emptyList()); in.entries() (6) .filter(e -\u0026gt; isClassFile(e) \u0026amp;\u0026amp; !isModuleInfo(e) \u0026amp;\u0026amp; !isJdkClass(e)) .forEach(e -\u0026gt; checker.process(e.path(), e.content())); return checker.isSignatureBroken(); } private boolean isJdkClass(ResourcePoolEntry e) { return e.path().startsWith(\u0026#34;/java.\u0026#34;) || e.path().startsWith(\u0026#34;/javax.\u0026#34;) || e.path().startsWith(\u0026#34;/jdk.\u0026#34;); } private boolean isModuleInfo(ResourcePoolEntry e) { return e.path().endsWith(\u0026#34;module-info.class\u0026#34;); } private boolean isClassFile(ResourcePoolEntry e) { return e.path().endsWith(\u0026#34;class\u0026#34;); } 1 Create an Animal Sniffer signature for all the APIs in modules added to the runtime image 2 Verify all classes against that signature 3 If there’s a signature violation, fail the jlink execution by raising a PluginException 4 All classes are passed on as-is 5 Feed each class to Animal Sniffer’s signature builder for creating the signature; non-class resources and module descriptors are ignored 6 Verify each class against the signature; JDK classes can be skipped here, we assume there’s no 
inconsistencies amongst the JDK’s own modules The input resource pool is traversed twice: first to create an Animal Sniffer signature of all the APIs, then a second time to validate the image’s classes against that signature.\nLet me re-iterate that this is a very basic, PoC-level implementation of link time API signature validation. A number of incompatibilities would not be detected by this, e.g. adding an abstract method to a superclass or interface, modifying the number and specification of the type parameters of a class, and others. The implementation could also be further optimized by validating only cross-module references. Still, this implementation is good enough to demonstrate the general principle and advantages of link time API consistency validation.\nWith the implementation in place (see the README in the PoC’s GitHub repository for details on building the project), it’s time to invoke jlink again, this time activating the new plug-in. Now, as mentioned before, the jlink plug-in API isn’t publicly exposed as of Java 15 (the current Java version at the point of writing), which means we need to jump through some hoops in order to enable the plug-in and expose it to the jlink tool itself.\nIn a nutshell, a Java agent can be used to bend the module configurations as needed. Details can be found in the aforementioned post on the Hibernate blog (the agent’s source code is here). The required boilerplate can be nicely encapsulated within a shell function:\nfunction myjlink { $JAVA_HOME/bin/jlink \\ -J-javaagent:signature-check-jlink-plugin-registration-agent-1.0-SNAPSHOT.jar \\ -J--module-path=signature-check-jlink-plugin-1.0-SNAPSHOT.jar:path/to/animal-sniffer-1.19.jar:path/to/asm-9.0.jar \\ -J--add-modules=dev.morling.jlink.plugins.sigcheck \u0026#34;$@\u0026#34; } All the -J options are VM options passed through to the jlink tool, in order to register the required Java agent and add the plug-in module to jlink’s module path. 
Instead of directly calling the jlink binary itself, this wrapper function can now be used to invoke jlink with the custom plug-in. Let’s first take a look at the description in the plug-in list:\n$ myjlink --list-plugins ... Plugin Name: check-signatures Plugin Class: dev.morling.jlink.plugins.sigcheck.SignatureCheckPlugin Plugin Module: dev.morling.jlink.plugins.sigcheck Category: VERIFIER Functional state: Functional. Option: --check-signatures Description: Checks the API references amongst the modules of an application for consistency ... Now let’s try and create the runtime image with the mismatching customer and order modules again:\nmyjlink --module-path=path/to/customer-2.0.0.jar:path/to/order-1.0.0.jar \\ --add-modules=com.example.order \\ --output=target/runtime-image \\ --check-signatures [INFO] Wrote signatures for 6156 classes. [ERROR] /com.example.order/com/example/order/OrderService.class:5: Undefined reference: void com.example.customer.CustomerService .incrementLoyaltyPoints(long, long) Error: Signature violations, check the logs Et voilà! The mismatching signature of the incrementLoyaltyPoints() method was spotted and the creation of the runtime image failed. Now we could take action, examine our module path and make sure to feed correctly matching versions of the customer and order modules to the image creation process.\nSummary The link time phase — added to the Java platform as part of the module system in version 9, and positioned between the well-known compile time and run-time phases — opens up very interesting opportunities to apply whole-world optimizations and validations to Java applications. One example is checking the API definitions and usages across the different modules of a Java application for consistency. 
By means of a custom plug-in for the jlink tool, this validation can happen at link time, making it possible to detect any mismatches when assembling an application, so that this kind of error can be fixed early on, before it hits an integration test or even production environment.\nThis is particularly interesting when using the Java module system for building large, modular monolithic applications. Unless you’re working with custom module layers — e.g. via the Layrry launcher — only one version of a given module may be present on the module path. If multiple modules of an application depend on different versions of a transitive dependency, link time API signature validation can help to identify inconsistencies caused by converging to a single version of that dependency.\nThe approach can also help save build time; when only modifying a single module of a larger modularized application, instead of re-compiling everything from scratch, you could just re-build that single module. Then, when re-creating the runtime image using this module and the other existing ones, you would be sure that all module API signature definitions and usages still match.\nThe one caveat is the fact that the jlink plug-in API isn’t a public, supported API of the JDK yet. I hope this is going to change some time soon, though. E.g. the next planned LTS release, Java 17, would be a great opportunity for officially adding the ability to build and use custom jlink plug-ins. 
This would open the road towards more wide-spread use of link time optimizations and validations, beyond those provided by the JDK and the jlink tool itself.\nUntil then, you can explore this area starting from the source code of the signature check plug-in and its accompanying Java agent for enabling its usage with jlink.\n","id":56,"publicationdate":"Dec 28, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eDiscussions around Java’s \u003ca href=\"https://openjdk.java.net/jeps/282\"\u003ejlink\u003c/a\u003e tool typically center around savings in terms of (disk) space.\nInstead of shipping an entire JDK,\na custom runtime image created with jlink contains only those JDK modules which an application actually requires,\nresulting in smaller distributables and \u003ca href=\"blog/smaller-faster-starting-container-images-with-jlink-and-appcds/\"\u003econtainer images\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eBut the contribution of jlink — as a part of the Java module system at large — to the development of Java application’s is bigger than that:\nwith the notion of \u003cem\u003elink time\u003c/em\u003e it defines an optional complement to the well known phases \u003cem\u003ecompile time\u003c/em\u003e and application \u003cem\u003erun-time\u003c/em\u003e:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"quoteblock\"\u003e\n\u003cblockquote\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLink time is an opportunity to do whole-world optimizations that are otherwise difficult at compile time or costly at run-time. An example would be to optimize a computation when all its inputs become constant (i.e., not unknown). 
A follow-up optimization would be to remove code that is no longer reachable.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/blockquote\u003e\n\u003c/div\u003e","tags":null,"title":"jlink's Missing Link: API Signature Validation","uri":"https://www.morling.dev/blog/jlinks-missing-link-api-signature-validation/"},{"content":" The other day, a user in the Debezium community reported an interesting issue: they were using Debezium with Java 1.8 and got an odd NoSuchMethodError:\njava.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer; at io.debezium.connector.postgresql.connection.Lsn.valueOf(Lsn.java:86) at io.debezium.connector.postgresql.connection.PostgresConnection.tryParseLsn(PostgresConnection.java:270) at io.debezium.connector.postgresql.connection.PostgresConnection.parseConfirmedFlushLsn(PostgresConnection.java:235) ... A NoSuchMethodError typically is an indication of a mismatch between the Java version used to compile some code and the Java version used for running it: some method existed at compile time, but it’s not available at runtime.\nNow indeed we use JDK 11 for building the Debezium code base, while targeting Java 1.8 as the minimal required version at runtime. But there is a method position(int) defined on the Buffer class (which ByteBuffer extends) also in Java 1.8. And as a matter of fact, the Debezium code compiles just fine with that version, too. So why would the user run into this error then?\nTo understand what’s going on, let’s create a very simple class for reproducing the issue:\nimport java.nio.ByteBuffer; public class ByteBufferTest { public static void main(String... args) { ByteBuffer buffer = ByteBuffer.wrap(new byte[] { 1, 2, 3 }); buffer.position(1); (1) System.out.println(buffer.get()); } } 1 Why does this not work with Java 1.8 when compiled with JDK 9 or newer? 
Compile this with a current JDK:\n$ javac --source 1.8 --target 1.8 ByteBufferTest.java And sure enough, the NoSuchMethodError shows up when running this with Java 1.8:\n$ java ByteBufferTest Exception in thread \u0026#34;main\u0026#34; java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer; at ByteBufferTest.main(ByteBufferTest.java:6) In contrast, when using Java 1.8 to both compile and run this code, it works just fine. Now, if we take a closer look at the error message again, the missing method is defined as ByteBuffer position(int). I.e. for an invoked method like position(), not only its name, parameter type(s), and the name of the declaring class are part of the byte code for that invocation, but also the method’s return type. A look at the byte code of the class using javap confirms that:\n$ javap -p -c -s -v -l -constants ByteBufferTest ... public static void main(java.lang.String...); descriptor: ([Ljava/lang/String;)V flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS Code: stack=4, locals=2, args_size=1 ... 19: aload_1 20: iconst_1 21: invokevirtual #13 // Method java/nio/ByteBuffer.position:(I)Ljava/nio/ByteBuffer; ... And this points us in the right direction: in Java 1.8, there is indeed no such method, only the position() method on Buffer, which, of course, returns Buffer and not ByteBuffer. Whereas since Java 9, this method (and several others) is overridden in ByteBuffer — leveraging Java’s support for co-variant return types — to return ByteBuffer. The Java compiler will now select that method, ByteBuffer position(int), and record that as the invoked method signature in the byte code of the caller class.\nThis is per se a nice usability improvement, as it allows invoking further ByteBuffer methods on the return value, instead of just those methods declared by Buffer. But as we’ve seen, it comes with this little surprise when compiling code on JDK 9 or newer, while trying to keep compatibility with older Java versions. 
And as it turns out, we were not the first or only ones to encounter this issue. Quite a few open-source projects ran into this, e.g. Eclipse Jetty, Apache Pulsar, Eclipse Vert.x, Apache Thrift, the Yugabyte DB client, and a few others.\nHow to Prevent This Situation? So what can you do in order to prevent this issue from happening? One first idea could be to enforce selection of the right method by casting to Buffer:\n((java.nio.Buffer) buffer).position(1); But while this produces the desired byte code indeed, it isn’t exactly the best way of doing so. You’d have to remember to do so for every invocation of any of the affected ByteBuffer methods, and the seemingly unneeded cast might be an easy target for some \u0026#34;clean-up\u0026#34; by unsuspecting co-workers on your team.\nLuckily, there’s a much better way, and this is to rely on the Java compiler’s --release parameter, which was introduced via JEP 247 (\u0026#34;Compile for Older Platform Versions\u0026#34;), added to the platform also in JDK 9. In contrast to the more widely known pair of --source and --target, the --release switch will ensure that only byte code is produced which will actually be usable with the specified Java version. 
For this purpose, the JDK contains the signature data for all supported Java versions (stored in the $JAVA_HOME/lib/ct.sym file).\nSo all that’s needed really is compiling the code with --release=8:\n$ javac --release=8 ByteBufferTest.java Examine the bytecode using javap again, and now the expected signature is in place:\n21: invokevirtual #13 // Method java/nio/ByteBuffer.position:(I)Ljava/nio/Buffer; When run on Java 1.8, this virtual method call will be resolved to Buffer#position(int) at runtime, whereas on Java 9 and later, it’d resolve to the bridge method inserted by the compiler into the class file of ByteBuffer due to the co-variant return type, which itself calls the overriding ByteBuffer#position(int) method.\nNow let’s see what happens if we actually try to make use of the overriding method version in ByteBuffer by re-assigning the result:\n... ByteBuffer buffer = ByteBuffer.wrap(new byte[] { 1, 2, 3 }); buffer = buffer.position(1); ... Et voilà, this gets rejected by the compiler when targeting Java 1.8, as the return type of the JDK 1.8 method Buffer#position(int) cannot be assigned to ByteBuffer:\n$ javac --release=8 ByteBufferTest.java ByteBufferTest.java:6: error: incompatible types: Buffer cannot be converted to ByteBuffer buffer = buffer.position(1); To cut a long story short, we — and many other projects — should have used the --release switch instead of --source/--target, and the user would not have had that issue. In order to achieve the same in your Maven-based build, just specify the following property in your pom.xml:\n... \u0026lt;properties\u0026gt; \u0026lt;maven.compiler.release\u0026gt;8\u0026lt;/maven.compiler.release\u0026gt; \u0026lt;/properties\u0026gt; ... 
Note that theoretically you could achieve the same effect also when using --source and --target; by means of the --boot-class-path option, you could advise the compiler to use a specific set of bootstrap class files instead of those from the JDK used for compilation. But that’d be quite a bit more cumbersome as it requires you to actually provide the classes of the targeted Java version, whereas --release will make use of the signature data coming with the currently used JDK itself.\n","id":57,"publicationdate":"Dec 21, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day, a user in the \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e community reported an interesting issue;\nThey were using Debezium with Java 1.8 and got an odd \u003ccode\u003eNoSuchMethodError\u003c/code\u003e:\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"ByteBuffer and the Dreaded NoSuchMethodError","uri":"https://www.morling.dev/blog/bytebuffer-and-the-dreaded-nosuchmethoderror/"},{"content":" Functional unit and integration tests are a standard tool of any software development organization, helping not only to ensure correctness of newly implemented code, but also to identify regressions — bugs in existing functionality introduced by a code change. The situation looks different though when it comes to regressions related to non-functional requirements, in particular performance-related ones: How to detect increased response times in a web application? How to identify decreased throughput?\nThese aspects are typically hard to test in an automated and reliable way in the development workflow, as they are dependent on the underlying hardware and the workload of an application. 
For instance, assertions on the duration of specific requests of a web application typically cannot be run in a meaningful way on a developer laptop, which differs from the actual production hardware (ironically, nowadays both are possible, the developer laptop being less or more powerful than the actual production environment). When run in a virtualized or containerized CI environment, such tests are prone to severe measurement distortions due to concurrent load of other applications and jobs.\nThis post introduces the JfrUnit open-source project, which offers a fresh angle to this topic by supporting assertions not on metrics like latency/throughput themselves, but on indirect metrics which may impact those. JfrUnit allows you to define expected values for metrics such as memory allocation, database I/O, or number of executed SQL statements, for a given workload and asserts the actual metric values — which are obtained from JDK Flight Recorder events — against these expected values. Starting off from a defined baseline, future failures of such assertions are an indicator of potential performance regressions in an application, as a code change may have introduced higher GC pressure, the retrieval of unnecessary data from the database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.\nJfrUnit provides means of identifying and analyzing such anomalies in a reliable, environment-independent way in standard JUnit tests, before they manifest as actual performance regressions in production. 
Test results are independent of wall clock time and thus provide actionable information, even when not testing with production-like hardware and data volumes.\nThis post is a bit longer than usual (I didn’t have the time to write shorter ;), but it’s broken down into several sections, so you can pause and continue later on with fresh energy:\nGetting Started With JfrUnit\nCase Study 1: Spotting Increased Memory Allocation\nCase Study 2: Identifying Increased I/O With the Database\nDiscussion\nSummary and Outlook\nGetting Started With JfrUnit JfrUnit is an extension for JUnit 5 which integrates Flight Recorder into unit tests; it makes it straightforward to initiate a JFR recording for a given set of event types, execute some test routine, and then assert the JFR events which should have been produced.\nHere is a basic example of a JfrUnit test:\n@JfrEventTest (1) public class JfrUnitTest { public JfrEvents jfrEvents = new JfrEvents(); @Test @EnableEvent(\u0026#34;jdk.GarbageCollection\u0026#34;) (2) @EnableEvent(\u0026#34;jdk.ThreadSleep\u0026#34;) public void shouldHaveGcAndSleepEvents() throws Exception { System.gc(); Thread.sleep(1000); jfrEvents.awaitEvents(); (3) ExpectedEvent event = event(\u0026#34;jdk.GarbageCollection\u0026#34;); (4) assertThat(jfrEvents).contains(event); event = event(\u0026#34;jdk.GarbageCollection\u0026#34;) (4) .with(\u0026#34;cause\u0026#34;, \u0026#34;System.gc()\u0026#34;); assertThat(jfrEvents).contains(event); event = event(\u0026#34;jdk.ThreadSleep\u0026#34;) 
.with(\u0026#34;time\u0026#34;, Duration.ofSeconds(1)); assertThat(jfrEvents).contains(event); assertThat(jfrEvents.ofType(\u0026#34;jdk.GarbageCollection\u0026#34;)).hasSize(1); (5) } } 1 @JfrEventTest marks this as a JfrUnit test, activating its extension 2 All JFR event types to be recorded must be enabled via @EnableEvent 3 After running the test logic, awaitEvents() must be invoked as a synchronization barrier, making sure all previously produced events have been received 4 Using the JfrEventsAssert#event() method, an ExpectedEvent instance can be created — optionally specifying one or more expected attribute values — which then is asserted via JfrEventsAssert#assertThat() 5 JfrEvents#ofType() allows filtering on specific event types, enabling arbitrary assertions against the returned stream of RecordedEvents By means of a custom assertThat() matcher method for AssertJ, JfrUnit allows you to validate that specific JFR events are raised during a test. Events to be matched are described via their event type name, and optionally one or more event attribute values. As we’ll see in a bit, JfrUnit also integrates nicely with the Java Stream API, allowing you to filter and aggregate recorded event attribute values and match them against expected values.\nJfrUnit persists a JFR recording file for each test method, which you can examine after a test failure, for instance using JDK Mission Control. To learn more about JfrUnit and its capabilities, take a look at the project’s README. The project is in an early proof-of-concept stage at the moment, so changes to its APIs and semantics are likely.\nNow that you’ve taken the JfrUnit quick tour, let’s put that knowledge into practice. Our example project will be the Todo Manager Quarkus application you may already be familiar with from my earlier post about custom JFR events. 
We’re going to discuss two examples for using JfrUnit to identify potential performance regressions.\nCase Study 1: Spotting Increased Memory Allocation At first, let’s explore how to identify increased memory allocation rates. Typically, it’s mostly library and middleware authors who are interested in this. For a library such as Hibernate ORM it can make a huge difference whether a method that is invoked many times on a hot code path allocates a few objects more or less. Less object allocations mean less work for the garbage collector, which in turn means those precious CPU cores of your machine can spend more cycles processing your actual business logic.\nBut also for application developers it can be beneficial to keep an eye on — and systematically track — object allocations, as regressions there lead to increased GC pressure, and in turn eventually to higher latencies and reduced throughput.\nThe key for tracking object allocations with JFR are the jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB events, which are emitted when\nan object allocation triggered the creation of a new thread-local allocation buffer (TLAB)\nan object got allocated outside of the thread’s TLAB\nThread-local allocation buffers (TLAB) When creating new object instances on the heap, this primarily happens via thread-local allocation buffers. A TLAB is a pre-allocated memory block that’s exclusively used by a single thread. Since this space is exclusively owned by the thread, creating new objects within a TLAB can happen without costly synchronization with other threads. Once a thread’s current TLAB capacity is about to be exceeded by a new object allocation, a new TLAB will be allocated for that thread. 
In addition, large objects will typically need to be directly allocated outside of the more efficient TLAB space.\nTo learn more about TLAB allocation, refer to part #4 of Aleksey Shipilёv’s \u0026#34;JVM Anatomy Quark\u0026#34; blog series.\nNote these events don’t allow for tracking of each individual object allocation, as multiple objects will be allocated within a TLAB before a new one is required, and thus the jdk.ObjectAllocationInNewTLAB event will be emitted. But as that event exposes the size of the new TLAB, we can keep track of the overall amount of memory that’s allocated while the application is running.\nIn that sense, jdk.ObjectAllocationInNewTLAB represents a sampling of object allocations, which means we need to collect a reasonable number of events to identify those locations in the program which are the sources of high object allocation and thus frequently trigger new TLAB creations.\nSo let’s start and work on a test for spotting regressions in terms of object allocations of one of the Todo Manager app’s API methods, GET /todo/{id}. To identify a baseline of the allocation to be expected, we first invoke that method in a loop and print out the actual allocation values. This should happen in intervals, e.g. 
every 10,000 invocations, so to average out numbers from individual API calls.\n@Test @EnableEvent(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) (1) @EnableEvent(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;) public void retrieveTodoBaseline() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder() .build(); for (int i = 1; i\u0026lt;= 100_000; i++) { executeRequest(r, client); if (i % 10_000 == 0) { jfrEvents.awaitEvents(); (2) long sum = jfrEvents.filter(this::isObjectAllocationEvent) (3) .filter(this::isRelevantThread) .mapToLong(this::getAllocationSize) .sum(); System.out.printf( Locale.ENGLISH, \u0026#34;Requests executed: %s, memory allocated: (%,d bytes/request)%n\u0026#34;, i, sum/10_000 ); jfrEvents.reset(); (4) } } private void executeRequest(Random r, HttpClient client) throws Exception { int id = r.nextInt(20) + 1; HttpRequest request = HttpRequest.newBuilder() .uri(new URI(\u0026#34;http://localhost:8081/todo/\u0026#34; + id)) .headers(\u0026#34;Content-Type\u0026#34;, \u0026#34;application/json\u0026#34;) .GET() .build(); HttpResponse\u0026lt;String\u0026gt; response = client .send(request, HttpResponse.BodyHandlers.ofString()); assertThat(response.statusCode()).isEqualTo(200); } private boolean isObjectAllocationEvent(RecordedEvent re) { (5) String name = re.getEventType().getName(); return name.equals(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) || name.equals(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;); } private long getAllocationSize(RecordedEvent re) { (6) return re.getEventType().getName() .equals(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) ? 
re.getLong(\u0026#34;tlabSize\u0026#34;) : re.getLong(\u0026#34;allocationSize\u0026#34;); } private boolean isRelevantThread(RecordedEvent re) { (7) return re.getThread().getJavaName().startsWith(\u0026#34;vert.x-eventloop\u0026#34;) || re.getThread().getJavaName().startsWith(\u0026#34;executor-thread\u0026#34;); } } 1 Enable the jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB JFR events 2 Every 10,000 events, wait for all the JFR events produced so far 3 Calculate the total allocation size, by summing up the TLAB allocations of all relevant threads 4 Reset the event stream for the next iteration 5 Is this a TLAB event? 6 Get the new TLAB size in case of an in TLAB allocation, otherwise the allocated object size out of TLAB 7 We’re only interested in the web application’s own threads, in particular ignoring the main thread which runs the HTTP client of the test Note that unlike in the initial example showing the usage of JfrUnit, here we’re not using the simple contains() AssertJ matcher, but rather calculate some custom value — the overall object allocation in bytes — by means of filtering and aggregating the relevant JFR events.\nHere are the numbers I got from running 100,000 invocations:\nRequests executed: 10000, memory allocated: 34096 bytes/request Requests executed: 20000, memory allocated: 31768 bytes/request Requests executed: 30000, memory allocated: 31473 bytes/request Requests executed: 40000, memory allocated: 31462 bytes/request Requests executed: 50000, memory allocated: 31547 bytes/request Requests executed: 60000, memory allocated: 31545 bytes/request Requests executed: 70000, memory allocated: 31537 bytes/request Requests executed: 80000, memory allocated: 31624 bytes/request Requests executed: 90000, memory allocated: 31703 bytes/request Requests executed: 100000, memory allocated: 31682 bytes/request As we see, there’s some warm-up phase during which allocation rates still go down, but after ~20 K requests, the allocation per 
request is fairly stable, with a volatility of ~1% when averaged out over 10K requests. This means that this initial phase should be excluded during the actual test.\nTo emphasize the key part again, this allocation is per request; it is independent of wall clock time and thus neither depends on the machine running the test (i.e. the test should behave the same when running on a developer laptop and on a CI machine), nor is it subject to volatility induced by other workloads running concurrently.\nTracking Object Allocations in Java 16 The two TLAB allocation events provide all the information required for analysing object allocations in Java applications, but often it’s not practical to enable them on a continuous basis when running in production. Due to the high number of events produced, enabling them adds some overhead in terms of latency, and the size of JFR recording files can be hard to predict.\nBoth issues are addressed by a JFR improvement that’s proposed for inclusion into Java 16, \u0026#34;JFR Event Throttling\u0026#34;. This will provide control over the emission rate of events, e.g. allowing you to sample object allocations with a defined rate of 100 events per second, which addresses both the overhead and the recording size issue. 
A new event type, jdk.ObjectAllocationSample will be added, too, which will be enabled in the JFR default configuration.\nFor JfrUnit, explicit control over the event sampling rate will be a very interesting capability, as a higher sampling rate may lead to stable results more quickly, in turn resulting in shorter test execution times.\nBased on that, the actual test could look like so:\n@Test @EnableEvent(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) @EnableEvent(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;) public void retrieveTodo() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder().build(); for (int i = 1; i\u0026lt;= 20_000; i++) { (1) executeRequest(r, client); } jfrEvents.awaitEvents(); jfrEvents.reset(); for (int i = 1; i\u0026lt;= 10_000; i++) { (2) executeRequest(r, client); } jfrEvents.awaitEvents(); long sum = jfrEvents.filter(this::isObjectAllocationEvent) .filter(this::isRelevantThread) .mapToLong(this::getAllocationSize) .sum(); assertThat(sum / 10_000).isLessThan(33_000); (3) } 1 Warm-up phase 2 The actual test phase 3 Assert the memory allocation per request is within the expected boundary; note we could also add a lower boundary, so to make sure we notice any future improvements (e.g. caused by upgrading to new efficient versions of a library), which otherwise may hide subsequent regressions Now let’s assume we’ve wrapped up the initial round of work on this application, and its tests have been passing on CI for a while. One day, the retrieveTodo() performance test method fails though:\njava.lang.AssertionError: Expecting: \u0026lt;388370L\u0026gt; to be less than: \u0026lt;33000L\u0026gt; Ugh, it’s suddenly allocating about ten times more memory per request than before! What has happened? 
To find the answer, we can take a look at the test’s JFR recording, which JfrUnit persists under target/jfrunit:\nls target/jfrunit dev.morling.demos.quarkus.TodoResourcePerformanceTest-createTodo.jfr dev.morling.demos.quarkus.TodoResourcePerformanceTest-retrieveTodo.jfr Let’s open the *.jfr file for the failing test in JDK Mission Control (JMC) in order to analyse all the recorded events (note that the recording will always contain some JfrUnit-internal events which are needed for synchronizing the recording stream and the events exposed to the test).\nWhen taking a look at the TLAB events of the application’s executor thread, the culprit is identified quickly; a lot of the sampled TLAB allocations contain this stack trace:\nInteresting, REST Assured loading a Jackson object mapper, what’s going on there? Here’s the full stacktrace:\nSo it seems a REST call to another service is made from within the TodoResource#get(long) method! At this point we know where to look into the source code of the application:\n@GET @Transactional @Produces(MediaType.APPLICATION_JSON) @Path(\u0026#34;/{id}\u0026#34;) public Response get(@PathParam(\u0026#34;id\u0026#34;) long id) throws Exception { Todo res = Todo.findById(id); User user = RestAssured.given().port(8082) .when() .get(\u0026#34;/users/\u0026#34; + res.userId) .as(User.class); res.userName = user.name; return Response.ok() .entity(res) .build(); } Gasp, it looks like a developer on the team has been taking the microservices mantra a bit too far, and has changed the code so it invokes another service in order to obtain some additional data associated with the user who created the retrieved todo.\nWhile that’s problematic in its own right due to the inherent coupling between the two services (how should the Todo Manager service react if the user service isn’t available?), they made matters worse by using the REST Assured API as a REST client, in a less than ideal way. 
The API’s simplicity and elegance make it a great solution for testing (and indeed that’s its primary use case), but this particular usage seems to be not such a good choice for production code.\nAt this point you should ask yourself whether the increased allocation per request actually is a problem for your application or not. To determine if that’s the case, you could run some tests on actual request latency and throughput in a production-like environment. If there’s no impact based on the workload you have to process, you might very well decide that additional allocations are well spent for your application’s purposes.\nIncreasing the allocation per request by a factor of ten in the described way quite likely does not fall into this category, though. At the very least, we should look into making the call against the User REST API more efficient, either by setting up REST Assured in a more suitable way, or by looking for an alternative REST client. Of course the external API call just by itself adds to the request latency, which is something we might want to avoid.\nIt’s also worth examining the application’s garbage collection behavior. In order to do so, you can run the performance test method again, either enabling all the GC-related JFR event types, or by enabling a pre-existing JFR configuration (the JDK comes with two built-in JFR configurations, default and profile, but you can also create and export them via JMC):\n@Test @EnableConfiguration(\u0026#34;profile\u0026#34;) public void retrieveTodo() throws Exception { // ... } Note that the pre-defined configurations imply minimum durations for certain event types; e.g. the I/O events discussed in the next section will only be recorded if they have a duration of 20 ms or longer. 
Depending on your testing requirements, you may have to adjust and tweak the configuration to be used.\nOpen the recording in JMC, and you’ll see there’s a substantial amount of GC activity happening:\nThe difference to the GC behavior before this code change is striking:\nPause times are worse, directly impacting the application’s latency, and the largely increased GC volume means the production environment will be able to serve fewer concurrent requests when reaching its capacity limits, meaning you’d have to provision another machine earlier on as your load increases.\nMemory Leak in the JFR Event Streaming API The astute reader may have noticed that there is a memory leak before and after the code change, as indicated by the ever-increasing heap size post GC. After some exploration it turned out that this is a bug in the JFR event streaming API which holds on to a large number of RecordedEvent instances internally. Erik Gahlin from the OpenJDK team logged JDK-8257906 for tracking and hopefully fixing this in JDK 16.\nNow such a drastic increase of object allocation and thus potential impact on performance should hopefully be an exception rather than a regular situation. But the example shows how continuous performance unit tests on impacting metrics like memory allocation, using JfrUnit and JDK Flight Recorder, can help to identify performance issues in an automated and reliable way, preventing such regressions from sneaking into production. Being able to identify this kind of issue by running tests locally on a developer laptop or a CI server can be a huge time-saver and productivity boost.\nCase Study 2: Identifying Increased I/O With the Database Once you’ve started to look at performance regression tests through the lens of JfrUnit, more and more possibilities pop up. Asserting a maximum number of garbage collections? Not a problem. Avoiding an unexpected amount of file system I/O? The jdk.FileRead and jdk.FileWrite events are our friends. 
Examining and asserting the I/O done with the database? Easily doable. Assertions on application-specific JFR event types you’ve defined yourself? Sure thing!\nYou can find a complete list of all JFR event types by JDK version in this nice matrix created by Tom Schindl. The number of JFR event types is growing constantly; as of JDK 15, there are 157 of them.\nNow let’s take a look at assertions on database I/O, as the amount of data fetched from or written to the database often is a very impactful factor of an enterprise application’s behavior. A regression here, e.g. fetching more data from the database than anticipated, may indicate that data is unnecessarily loaded. For instance it might be the case that a set of data is loaded only in order to filter it in the application subsequently, instead of doing so via SQL in the database itself, resulting in increased request durations.\nSo what could such a test look like for our GET /todo/{id} API call? The general approach is the same as before with memory allocations: first define a baseline of the bytes read and written by invoking the API under test for a given number of executions. 
Once that’s done, you can implement the actual test, including an assertion on the expected number of bytes read or written:\n@Test @EnableEvent(value=\u0026#34;jdk.SocketRead\u0026#34;, stackTrace=INCLUDED) (1) @EnableEvent(value=\u0026#34;jdk.SocketWrite\u0026#34;, stackTrace=INCLUDED) public void retrieveTodo() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder() .build(); for (int i = 1; i\u0026lt;= ITERATIONS; i++) { executeRequest(r, client); } jfrEvents.awaitEvents(); long count = jfrEvents.filter(this::isDatabaseIoEvent).count(); (2) // describedAs() must precede the assertion; AssertJ ignores descriptions set afterwards assertThat(count / ITERATIONS) .describedAs(\u0026#34;write + read per statement, write + read per commit\u0026#34;) .isEqualTo(4); long bytesReadOrWritten = jfrEvents.filter(this::isDatabaseIoEvent) .mapToLong(this::getBytesReadOrWritten) .sum(); assertThat(bytesReadOrWritten / ITERATIONS).isLessThan(250); (3) } private boolean isDatabaseIoEvent(RecordedEvent re) { (4) return ((re.getEventType().getName().equals(\u0026#34;jdk.SocketRead\u0026#34;) || re.getEventType().getName().equals(\u0026#34;jdk.SocketWrite\u0026#34;)) \u0026amp;\u0026amp; re.getInt(\u0026#34;port\u0026#34;) == databasePort); } private long getBytesReadOrWritten(RecordedEvent re) { (5) return re.getEventType().getName().equals(\u0026#34;jdk.SocketRead\u0026#34;) ? re.getLong(\u0026#34;bytesRead\u0026#34;) : re.getLong(\u0026#34;bytesWritten\u0026#34;); } 1 Enable the jdk.SocketRead and jdk.SocketWrite event types; by default, those don’t contain the stacktrace for the events, so that needs to be enabled explicitly 2 There should be four events per invocation of the API method 3 Less than 250 bytes of I/O are expected per invocation 4 Only read and write events on the database port are relevant for this test, but e.g. 
not I/O on the web port of the application 5 Retrieve the value of the event’s bytesRead or bytesWritten field, depending on the event type Now let’s again assume that after some time the test begins to fail. 
This time it’s the assertion on the number of executed reads and writes:\nAssertionFailedError: Expecting: \u0026lt;18L\u0026gt; to be equal to: \u0026lt;4L\u0026gt; but was not. Also the number of bytes read and written has substantially increased:\njava.lang.AssertionError: Expecting: \u0026lt;1117L\u0026gt; to be less than: \u0026lt;250L\u0026gt; That’s definitely something to look into. So let’s open the recording of the failed test in Flight Recorder and take a look at the socket read and write events. Thanks to enabling stacktraces for the two JFR event types we can quite quickly identify the events associated with an invocation of the GET /todo/{id} API:\nAt this point, some familiarity with the application in question will come in handy to identify suspicious events. But even without that, we could compare previous recordings of successful test runs with the recording from the failing one in order to see where differences are. In the case at hand, the BlobInputStream and Hibernate’s BlobTypeDescriptor in the call stack seem pretty unexpected, as our Todo entity didn’t have any BLOB attribute before.\nIn reality, comparing with the latest version and a look into the git history of that class could confirm that there’s a new attribute storing an image (perhaps not a best practice to do so ;):\n@Entity public class Todo extends PanacheEntity { public String title; public int priority; public boolean completed; @Lob (1) public byte[] image; } 1 This looks suspicious! We now would have to decide whether this image attribute actually should be loaded for this particular use case (if so, we’d have to adjust the test accordingly), or whether it would for instance make more sense to mark this property as a lazily loaded one and only retrieve it when actually required.\nSolely working with the raw socket read and write events can be a bit cumbersome, though. Wouldn’t it be nice if we also had the actual SQL statement which caused this I/O? Glad you asked! 
Neither Hibernate nor the Postgres JDBC driver emit any JFR events at the moment (although well-informed sources are telling me that the Hibernate team wants to look into this). Therefore, in part two of this blog post series, we’ll discuss how to instrument an existing library to emit events like this, using a Java agent, without modifying the library in question.\nDiscussion JfrUnit in conjunction with JDK Flight Recorder opens up a very interesting approach for identifying potential performance regressions in Java applications. Instead of directly measuring an application’s performance metrics, most notably latency and throughput, the idea is to measure and assert metrics that impact the performance characteristics. This allows you to implement stable and reliable automated performance regression tests, whose outcome does not depend on the capabilities of the execution environment (e.g. number/size of CPUs), or other influential factors like concurrently running programs.\nRegressions in such impacting metrics, e.g. the amount of allocated memory, or bytes read from a database, are indicators that the application’s performance may have degraded. This approach offers some interesting advantages over performance tests on actual latency and throughput themselves:\nHardware independent: You can identify potential regressions also when running tests on hardware which is different (e.g. 
less powerful) from the actual production hardware\nFast feedback cycle: Being able to run performance regression tests on developer laptops, even in the IDE, allows for fast identification of potential regressions right during development, instead of having to wait for the results of less frequently executed test runs in a traditional performance test lab environment\nRobustness: Tests are robust and not prone to factors such as the load induced by parallel jobs of a CI server or a virtualized/containerized environment\nPro-active identification of performance issues: Asserting a metric like memory allocation can help to identify future performance problems before they actually materialize; while the additional allocation rate may make no difference with the system’s load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to when finding out about such a regression only later on\nReduced need for warm-up: For traditional performance tests of Java-based applications, a thorough warm-up is mandatory, e.g. to ensure proper optimization of the JIT-compiled code. In comparison, metrics like file or database I/O are very stable for a defined workload, so that regressions can be identified also with just a single or a few executions\nNeedless to say, you should be aware of the limitations of this approach, too:\nNo statement on user-visible performance metrics: Measuring and asserting performance-impacting factors doesn’t tell you anything in terms of the user-visible performance characteristics themselves. While we can reason about guarantees like \u0026#34;The system can handle 10K concurrent requests while the 99.9th percentile of requests has a latency of less than 250 ms\u0026#34;, that’s not the case for metrics like memory allocation or I/O. 
What does it mean if an application allocates 100 KB of RAM for a particular use case? Is it a lot? Too much? Just fine?\nFocused on identifying regressions: Somewhat related to the first point, this approach of testing is focused not on specific absolute values, but rather on identifying performance regressions. It’s hard to tell whether 100 KB of database I/O is good or bad for a particular web request, but a change from 100 KB to 200 KB might indicate that something is wrong\nFocused on identifying potential regressions: A change in performance-impacting metrics does not necessarily imply an actual user-visible performance regression. For instance it might be acceptable for a specific request to allocate more RAM than it did before, if the production system generally isn’t under high load and the additional GC effort doesn’t matter in practice\nDoes not work for all performance-impacting metrics: Some performance metrics cannot be meaningfully asserted in plain unit tests; e.g. degraded throughput due to lock contention can typically only be identified with a reasonable number of concurrent requests\nOnly identifies regressions in the application itself: A traditional integrative performance test of an enterprise application will also capture issues in related components, such as the application’s database. A query run with a sub-optimal execution plan won’t be noticed with this testing approach\nVolatile results for timer-based tasks: While metrics like object allocations should be stable e.g. for a specific web request, events which are timing-based would yield more events on a slower environment than on a faster machine\nSummary and Outlook JUnit tests based on performance-impacting factors can be a very useful part of the performance testing strategy for an application. They can help to identify potential performance regressions very early in the development lifecycle, when they can be fixed comparatively easily and cheaply. 
Of course they are no silver bullet; you should consider them as a complement to classic performance tests running on production-like hardware, not a replacement.\nThe approach may feel a bit unfamiliar initially, and it may take some time to learn about the different metrics which can be measured with JFR and asserted via JfrUnit, as well as their implications on an application’s performance characteristics. But once this hurdle is passed, continuous performance regression tests can be a valuable tool in the box of every software and performance engineer.\nJfrUnit is still in its infancy, and could evolve into a complete toolkit around automated testing of JFR-based metrics. Ideas for future development include:\nA more powerful \u0026#34;built-in\u0026#34; API which e.g. provides the functionality for calculating the total TLAB allocations of a given set of threads as a ready-to-use method\nIt could also be very interesting to run assertions against externally collected JFR recording files. This would allow validating workloads which require more complex set-ups, e.g. running in a dedicated performance testing lab, or even from continuous recordings taken in production\nThe JFR event streaming API could be leveraged for streaming queries on live events streamed from a remote system\nAnother use case we haven’t explored yet is the validation of resource consumption before and after a defined workload. E.g. after logging in and out a user 100 times, the system should roughly consume (ignoring any initial growth after starting up) the same amount of memory. A failure of such an assertion would indicate a potential memory leak in the application\nJfrUnit might automatically detect that certain metrics like object allocations are still undergoing some kind of warm-up phase and thus are not stable, and mark such tests as potentially incorrect or flaky\nKeeping track of historical measurement data, e.g. 
allowing you to identify regressions which got introduced step by step over a longer period of time, with one comparatively small change being the straw finally breaking the camel’s back\nYour feedback, feature requests, or even contributions to the project are highly welcome!\nStay tuned for part two of this blog post, where we’ll explore how to trace the SQL statements executed by an application using the JMC Agent and assert these query events using JfrUnit. This will come in very handy for instance for identifying common performance problems like N+1 SELECT statements.\nMany thanks to Hans-Peter Grahsl, John O’Hara, Nitsan Wakart, and Sanne Grinovero for their extensive feedback while writing this blog post!\n","id":58,"publicationdate":"Dec 16, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFunctional unit and integration tests are a standard tool of any software development organization,\nhelping not only to ensure correctness of newly implemented code,\nbut also to identify regressions, i.e. bugs in existing functionality introduced by a code change.\nThe situation looks different though when it comes to regressions related to non-functional requirements, in particular performance-related ones:\nHow to detect increased response times in a web application?\nHow to identify decreased throughput?\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThese aspects are typically hard to test in an automated and reliable way in the development workflow,\nas they are dependent on the underlying hardware and the workload of an application.\nFor instance assertions on the duration of specific requests of a web application typically cannot be run in a meaningful way on a developer laptop,\nwhich differs from the actual production hardware\n(ironically, nowadays both are an option, the developer laptop being less or more powerful than the actual production environment).\nWhen run in a virtualized or 
containerized CI environment, such tests are prone to severe measurement distortions due to concurrent load of other applications and jobs.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis post introduces the \u003ca href=\"https://github.com/gunnarmorling/jfrunit\"\u003eJfrUnit\u003c/a\u003e open-source project, which offers a fresh angle to this topic by supporting assertions not on metrics like latency/throughput themselves, but on \u003cem\u003eindirect metrics\u003c/em\u003e which may impact those.\nJfrUnit allows you to define expected values for metrics such as memory allocation, database I/O, or number of executed SQL statements, for a given workload and asserts the actual metrics values (which are obtained from \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e events) against these expected values.\nStarting off from a defined baseline, future failures of such assertions are an indicator for potential performance regressions in an application, as a code change may have introduced higher GC pressure,\nthe retrieval of unnecessary data from the database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Towards Continuous Performance Regression Testing","uri":"https://www.morling.dev/blog/towards-continuous-performance-regression-testing/"},{"content":" A few months ago I wrote about how you could speed up your Java application’s start-up times using application class data sharing (AppCDS), based on the example of a simple Quarkus application. 
Since then, quite some progress has been made in this area: Quarkus 1.6 brought built-in support for AppCDS, so that now you just need to provide the -Dquarkus.package.create-appcds=true option when building your project, and you’ll find an AppCDS file in the target folder.\nThings get more challenging though when combining AppCDS with custom Java runtime images, as produced using the jlink tool added in Java 9. Combining custom runtime images with AppCDS is very attractive, in particular when looking at the deployment of Java applications via Linux containers. Instead of putting the full Java runtime into the container image, you only add those JDK modules which your application actually requires. (Parts of) what you save in image size by doing so can be used for adding an AppCDS archive to your container image. The result will be a container image which still is smaller than before (and thus is faster to push to a container registry, distribute to worker nodes in a Kubernetes cluster, etc.) and which starts up significantly faster.\nA challenge though is that AppCDS archives must be created with exactly the same Java runtime which later on is used to run the application. In the case of jlink this means the custom runtime image itself must be used to produce the AppCDS archive. In other words, the default archive produced by the Quarkus build unfortunately cannot be used with jlink images. 
The goal for this post is to explore\nthe steps required to create a custom runtime image for a simple Java CRUD application based on Quarkus,\nhow to build a Linux container image with this custom runtime image and the application itself,\nhow this approach compares to container images with the full Java runtime in terms of size and start-up time.\nCreating a Modular Runtime Image for a Quarkus Application It’s a common misconception that only Java applications which have been fully ported to the Java module system (JPMS) would be able to benefit from jlink. 
But as explained by Simon Ritter in this blog post, this is not true actually; you don’t need to fully modularize an application in order to run it via a custom runtime image.\nWhile indeed the creation of a runtime image is a bit easier when it only is comprised of proper Java modules, it also is possible to create a runtime image by explicitly stating which JDK (or other) modules it should contain. The application can then be run via the traditional classpath, just as you’d do it with a full Java runtime. Which JDK modules to add though? To answer this question, the jdeps tool comes in handy. Via its --print-module-deps option it can determine for a given set of JARs which (JDK) modules they depend on, and which thus are the ones that need to go into the custom runtime image.\nHaving built the example application from the previous blog post via mvn clean verify, let’s try and invoke jdeps like so:\njdeps --print-module-deps \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar This results in an error though:\nError: com.sun.istack.istack-commons-runtime-3.0.10.jar is a multi-release jar file but --multi-release option is not set Ok, we need to tell which code version to analyse for multi-release JARs; no problem:\njdeps --print-module-deps \\ --multi-release 15 \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar Hum, some progress, but still an issue:\nException in thread \u0026#34;main\u0026#34; java.lang.module.FindException: Module java.xml.bind not found, required by java.ws.rs This one is a bit odd; the file org.jboss.spec.javax.ws.rs.jboss-jaxrs-api_2.1_spec-2.0.1.Final.jar is an explicit module with a module-info.class descriptor, which references the module java.xml.bind, and this one is not found on the module path. It’s not quite clear to me why this is flagged here, given that the JAX-RS API JAR is part of the class path and not the module path. 
But it’s not a big problem, we simply can add the JAXB API (which also is provided on the class path) on the module path, too.\nThe same issue arises for some other dependencies which are explicit modules already, so we end up with the following configuration:\njdeps --print-module-deps \\ --multi-release 15 \\ --module-path target/lib/jakarta.activation.jakarta.activation-api-1.2.1.jar:target/lib/org.reactivestreams.reactive-streams-1.0.3.jar:target/lib/org.jboss.spec.javax.xml.bind.jboss-jaxb-api_2.3_spec-2.0.0.Final.jar \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar And another issue, now about some missing dependencies:\n... org.postgresql.util.internal.Nullness -\u0026gt; org.checkerframework.dataflow.qual.Pure not found org.wildfly.common.wildfly-common-1.5.4.Final-format-001.jar org.wildfly.common.Substitutions$Target_Branch -\u0026gt; com.oracle.svm.core.annotate.AlwaysInline not found ... After taking a closer look, these are either compile-time only dependencies (like annotations from the Checker framework), or dependencies of optional features which are not relevant for our case. These can be safely ignored using the --ignore-missing-deps switch, which leaves us with this jdeps invocation:\njdeps --print-module-deps \\ --ignore-missing-deps \\ --multi-release 15 \\ --module-path target/lib/jakarta.activation.jakarta.activation-api-1.2.1.jar:target/lib/org.reactivestreams.reactive-streams-1.0.3.jar:target/lib/org.jboss.spec.javax.xml.bind.jboss-jaxb-api_2.3_spec-2.0.0.Final.jar \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar The required JDK modules are printed out finally:\njava.base,java.compiler,java.instrument,java.naming,java.rmi,java.security.jgss,java.security.sasl,java.sql,jdk.jconsole,jdk.unsupported I.e. out of the nearly 60 modules which make up OpenJDK 15, only ten are required by this particular application. 
Building a custom runtime image containing only these modules should result in quite some space saving.\nWhy is a Particular Module Required? When looking at the module list, you might wonder why certain modules actually are needed. What is this application doing with jdk.jconsole for instance? To gain insight into this, jdeps can help, too. Run it again without the --print-module-deps switch, and you can grep for interesting module references:\njdeps \u0026lt;...\u0026gt; | grep jconsole org.jboss.narayana.jta.narayana-jta-5.10.6.Final.jar -\u0026gt; jdk.jconsole com.arjuna.ats.arjuna.tools.stats -\u0026gt; com.sun.tools.jconsole jdk.jconsole In this case, there’s a single dependency to jconsole, from the Narayana transaction manager. Depending on the details, it might be an opportunity to reach out to the maintainers of such library and discuss, whether this dependency really is needed or whether it could be avoided (e.g. by moving the code in question to a separate module), resulting in a further decreased size of custom runtime images.\nWith the list of required modules, creating the actual runtime image is rather simple:\n$JAVA_HOME/bin/jlink \\ --add-modules java.base,java.compiler,java.instrument,java.naming,java.rmi,java.security.jgss,java.security.sasl,java.sql,jdk.jconsole,jdk.unsupported \\ --compress 2 --no-header-files --no-man-pages \\(1) --output target/runtime-image (2) 1 Compressing the runtime image as well as omitting header files and man pages helps to further reduce the size of the runtime image 2 Output location for creating the runtime image In order to create a dynamic AppCDS archive for our application classes later on, we now need to add the class data archive for all of the classes of the image itself. 
Failing to do so results in this error message:\nError occurred during initialization of VM DynamicDumpSharedSpaces is unsupported when base CDS archive is not loaded This step isn’t very well documented, and at this point I was somewhat stuck. But you always can count on the OpenJDK community: after asking about this on Twitter, Claes Redestad pointed me in the right direction:\n./target/runtime-image/bin/java -Xshare:dump Thanks, Claes! This creates the base class data archive under target/runtime-image/lib/server/classes.jsa, adding ~12 MB to the runtime image, which now has a size of ~63 MB; not too bad.\nAdding an AppCDS Archive to a Custom Runtime Image Having created the custom Java runtime image, let’s now add the AppCDS archive to it. Since the introduction of dynamic AppCDS archives in JDK 13, this is one simple step which only requires running the application with the -XX:ArchiveClassesAtExit option:\ncd target (1) mkdir runtime-image/cds (2) (3) runtime-image/bin/java \\ -XX:ArchiveClassesAtExit=runtime-image/cds/app-cds.jsa \\ -jar todo-manager-1.0.0-SNAPSHOT-runner.jar cd .. 1 The class path used when running the application later on must be the same as (or rather a prefix of, to be precise) the class path used for building the AppCDS archive; hence changing to the target directory, so as to run with -jar *-runner.jar, instead of with -jar target/*-runner.jar 2 Creating a folder for storing the AppCDS archive 3 Using the java binary of the runtime image to launch the application and create the AppCDS archive when exiting This will create the CDS archive under target/runtime-image/cds/app-cds.jsa. In the next step this can be added to a Linux container image, built e.g. using Docker or podman. Note that while jlink supports cross-platform builds (so for instance you could build a custom runtime image for a Linux container on macOS), the same isn’t the case for AppCDS. 
This means an AppCDS archive to be used by a containerized application needs to be built on Linux. When not running on Linux yourself, but on Windows or macOS, you could put the entire build process into a container for this purpose.\nCreating a Linux Container Image At this point we have built our actual application, a custom Java runtime image with the required JDK modules, and an AppCDS archive for the application’s classes. The final step is to put everything into a Linux container image, which is quickly done via a small Dockerfile:\nFROM registry.fedoraproject.org/fedora-minimal:33 COPY target/runtime-image /opt/todo-manager/jdk COPY target/lib/* /opt/todo-manager/lib/ COPY target/todo-manager-1.0.0-SNAPSHOT-runner.jar /opt/todo-manager COPY todo-manager.sh /opt/todo-manager ENTRYPOINT [ \u0026#34;/opt/todo-manager/todo-manager.sh\u0026#34; ] This uses the Fedora minimal base image, which is a great foundation for container images. With a size of ~120 MB, it’s small enough to be distributed efficiently, while still providing the flexibility of a complete Linux distribution, e.g. allowing for the installation of additional tools if needed.\nEven Smaller Container Images If you wanted to shrink the image size further and felt adventurous, you could look into using Alpine Linux as a base image; the issue there though is that Alpine comes with musl instead of glibc (as used by the JDK) as its implementation of the ISO C and POSIX standard APIs. The OpenJDK Portola project aims at providing a port to Alpine and musl. But as of JDK 15, no GA build of this port exists yet. For JDK 16, an early access build of the Alpine/musl port is available.\nAnother option for smaller images is to use jib, which also is supported by Quarkus out of the box. 
I haven’t tried out yet though whether/how jib would work with custom runtime images and AppCDS.\nIt’s also worth pointing out that the size of base images doesn’t matter too much in practice, as container images use a layered file system, which means that typically rather stable base image layers don’t need to be redistributed too often when pushing or pulling a container image.\nThe container’s entry point, todo-manager.sh, is a basic shell script, which starts the actual Java application via the Java runtime image:\n#!/bin/bash export PATH=\u0026#34;/opt/todo-manager/jdk/bin:${PATH}\u0026#34; cd /opt/todo-manager \u0026amp;\u0026amp; \\ (1) exec java -Xshare:on -XX:SharedArchiveFile=jdk/cds/app-cds.jsa -jar \\ (2) todo-manager-1.0.0-SNAPSHOT-runner.jar 1 Changing into the todo-manager directory, so as to make sure the same JAR path is passed as when creating the CDS archive 2 Specifying the archive name; the -Xshare:on isn’t strictly needed, it’s used here though to ensure the process will fail if something is wrong with the CDS archive, instead of silently not using it Let’s See Some Numbers! Finally, let’s compare some numbers: container image size, and start-up time for different ways of containerizing the todo manager application. I’ve tried out four different approaches:\nOpenJDK 11 on the RHEL UBI 8.3 image (universal base image), as per the default Dockerfile created for new Quarkus applications\nA full OpenJDK 15 on Fedora 33 (as there’s no OpenJDK 15 package for the RHEL base image yet)\nA custom runtime image for OpenJDK 15 on Fedora 33\nA custom runtime image with AppCDS on Fedora 33\nHere are the results, running on a Hetzner Cloud CX4 instance (4 vCPUs, 16 GB RAM), using Fedora 33 as the host OS:\nAs we can see, the container image size is significantly lower when adding a custom Java runtime image instead of the full JDK. 
In particular when comparing to the OpenJDK package of Fedora 33 which is a fair bit larger than the OpenJDK 11 package of the RHEL UBI 8.3 image, the difference is striking.\nThe start-up times are as displayed by Quarkus, averaged over five runs. Numbers have improved by about 10% by going from OpenJDK 11 to 15, which is explained by multiple improvements in this area, most notably the introduction of default CDS archives for the JDK’s own classes in JDK 12 (JEP 341). Using a custom runtime image by itself doesn’t have any measurable impact on start-up time. The AppCDS archive improves the start-up time by a whopping 54%. Unless pure image size is the key factor for you (in which case you should look for alternative approaches anyway, see note \u0026#34;Even Smaller Container Images\u0026#34; above), I would say that the additional 40 MB for the AppCDS archive are more than worth it. In particular as the resulting container image still is way smaller than when adding the full JDK, be it with the Fedora base image or the RHEL UBI one.\nBased on those numbers, I think it’s fair to say that custom Java runtime images created via jlink, combined with AppCDS archives, are a great foundation for containerized Java applications. Adding a custom runtime image containing only those JDK modules actually needed by an application helps to cut down image size significantly. Parts of that saved space can be invested into adding an AppCDS archive, so you end up with a container image that’s smaller and starts up faster. I.e. you can have this cake, and eat it, too!\nThe one downside is the increased complexity of the build process for producing the runtime image as well as the AppCDS archive. This should be manageable though by means of scripting and automation; also I’d expect tooling like the Quarkus Maven plug-in and others to further improve on this front. 
One tricky aspect is that you must not forget to rebuild the custom runtime image, in case you have added dependencies to your application which affect the set of required JDK modules. Automated tests of the application running via the runtime image should help to identify this situation.\nIf you’d like to give it a try yourself, or obtain numbers for the different deployment approaches on your own hardware, you can find all the required code and information in this GitHub repository.\n","id":59,"publicationdate":"Dec 13, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eA few months ago I \u003ca href=\"/blog/building-class-data-sharing-archives-with-apache-maven/\"\u003ewrote about\u003c/a\u003e how you could speed up your Java application’s start-up times using application class data sharing (\u003ca href=\"http://openjdk.java.net/jeps/350\"\u003eAppCDS\u003c/a\u003e),\nbased on the example of a simple \u003ca href=\"https://quarkus.io/\"\u003eQuarkus\u003c/a\u003e application.\nSince then, quite some progress has been made in this area:\nQuarkus 1.6 brought \u003ca href=\"https://quarkus.io/guides/maven-tooling#quarkus-package-pkg-package-config_quarkus.package.create-appcds\"\u003ebuilt-in support for AppCDS\u003c/a\u003e,\nso that now you just need to provide the \u003cem\u003e-Dquarkus.package.create-appcds=true\u003c/em\u003e option when building your project,\nand you’ll find an AppCDS file in the \u003cem\u003etarget\u003c/em\u003e folder.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThings get more challenging though when combining AppCDS with custom Java runtime images,\nas produced using the \u003ca href=\"https://docs.oracle.com/en/java/javase/15/docs/specs/man/jlink.html\"\u003ejlink\u003c/a\u003e tool added in Java 9.\nCombining custom runtime images with AppCDS is very attractive,\nin particular when looking at the deployment of Java applications via Linux containers.\nInstead 
of putting the full Java runtime into the container image, you only add those JDK modules which your application actually requires.\n(Parts of) what you save in image size by doing so,\ncan be used for adding an AppCDS archive to your container image.\nThe result will be a container image which still is smaller than before — and thus is faster to push to a container registry, distribute to worker nodes in a Kubernetes cluster, etc. — and which starts up significantly faster.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Smaller, Faster-starting Container Images With jlink and AppCDS","uri":"https://www.morling.dev/blog/smaller-faster-starting-container-images-with-jlink-and-appcds/"},{"content":" The Testcontainers project is invaluable for spinning up containerized resources during your (JUnit) tests, e.g. databases or Kafka clusters.\nFor users of JUnit 5, the project provides the @Testcontainers extension, which controls the lifecycle of containers used by a test. When testing a Quarkus application though, this is at odds with Quarkus\u0026#39; own @QuarkusTest extension; it’s a recommended best practice to avoid fixed ports for any containers started by Testcontainers. Instead, you should rely on Docker to automatically allocate random free ports. This avoids conflicts between concurrently running tests, e.g. amongst multiple Postgres containers, started up by several parallel job runs in a CI environment, all trying to allocate Postgres\u0026#39; default port 5432. Obtaining the randomly assigned port and passing it into the Quarkus bootstrap process isn’t possible though when combining the two JUnit extensions.\nOne work-around you can find described e.g. on StackOverflow is setting up the database container via a static class initializer block and then propagating the host and port to Quarkus through system properties. While this works, it’s not ideal in terms of lifecycle control (e.g. 
how to make sure the container is started up once at the beginning of an entire test suite), and in general, it just feels a bit hack-ish.\nLuckily, there’s a better alternative, which interestingly isn’t discussed as much: using Quarkus\u0026#39; notion of test resources. There’s just two steps involved. First, create an implementation of the QuarkusTestResourceLifecycleManager interface, which controls your resource’s lifecycle. In case of a Postgres database, this could look like this:\npublic class PostgresResource implements QuarkusTestResourceLifecycleManager { static PostgreSQLContainer\u0026lt;?\u0026gt; db = new PostgreSQLContainer\u0026lt;\u0026gt;(\u0026#34;postgres:13\u0026#34;) (1) .withDatabaseName(\u0026#34;tododb\u0026#34;) .withUsername(\u0026#34;todouser\u0026#34;) .withPassword(\u0026#34;todopw\u0026#34;); @Override public Map\u0026lt;String, String\u0026gt; start() { (2) db.start(); return Collections.singletonMap( \u0026#34;quarkus.datasource.url\u0026#34;, db.getJdbcUrl() ); } @Override public void stop() { (3) db.stop(); } } 1 Configure the database container, using the Postgres 13 container image, the given database name, and credentials 2 Start up the database; the returned map of configuration properties amends/overrides the configuration properties of the test; in this case the datasource URL will be overridden with the value obtained from Testcontainers, which contains the randomly allocated public port of the Postgres container 3 Shut down the database after all tests have been executed All you then need to do is to reference that test resource from your test class using the @QuarkusTestResource annotation:\n@QuarkusTest @QuarkusTestResource(PostgresResource.class) (1) public class TodoResourceTest { @Test public void createTodoShouldYieldId() { given() .when() .contentType(ContentType.JSON) .body(\u0026#34;\u0026#34;\u0026#34; { \u0026#34;title\u0026#34; : \u0026#34;Learn Quarkus\u0026#34;, \u0026#34;priority\u0026#34; : 1, } 
\u0026#34;\u0026#34;\u0026#34;) .then() .statusCode(201) .body( matchesJson( \u0026#34;\u0026#34;\u0026#34; { \u0026#34;id\u0026#34; : 1, \u0026#34;title\u0026#34; : \u0026#34;Learn Quarkus\u0026#34;, \u0026#34;priority\u0026#34; : 1, \u0026#34;completed\u0026#34; : false, } \u0026#34;\u0026#34;\u0026#34;)); } } 1 Ensures the Postgres database is started up And that’s it! Note that all the test resources of the test module are detected and started up, before starting the first test.\nBonus: Schema Creation One other subtle issue is the creation of the database schema for the test. E.g. for my Todo example application, I’d like to use a schema named \u0026#34;todo\u0026#34; in the Postgres database:\ncreate schema todo; Quarkus supports SQL load scripts for executing SQL scripts when Hibernate ORM starts. But this will be executed only after Hibernate ORM has set up all the database objects, such as tables, sequences, indexes etc. (I’m using the drop-and-create database generation mode during testing). This means that while a load script is great for inserting test data, it’s executed too late for defining the actual database schema itself.\nLuckily, most database container images themselves support the execution of load scripts right upon database start-up; The Postgres image is no exception, so it’s just a matter of exposing that script via Testcontainers. 
All it needs for that is a bit of tweaking of the Quarkus test resource for Postgres:\nstatic PostgreSQLContainer\u0026lt;?\u0026gt; db = new PostgreSQLContainer\u0026lt;\u0026gt;(\u0026#34;postgres:13\u0026#34;) .withDatabaseName(\u0026#34;tododb\u0026#34;) .withUsername(\u0026#34;todouser\u0026#34;) .withPassword(\u0026#34;todopw\u0026#34;) .withClasspathResourceMapping(\u0026#34;init.sql\u0026#34;, (1) \u0026#34;/docker-entrypoint-initdb.d/init.sql\u0026#34;, BindMode.READ_ONLY); 1 Expose the file src/main/resources/init.sql as /docker-entrypoint-initdb.d/init.sql within the container With that in place, Postgres will start up and the \u0026#34;todo\u0026#34; schema will be created in the database, before Quarkus boots Hibernate ORM, which will populate the schema, and finally, all tests can run.\nYou can find the complete source code of this test and the Postgres test resource on GitHub.\nMany thanks to Sergei Egorov for his feedback while writing this blog post!\n","id":60,"publicationdate":"Nov 28, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://www.testcontainers.org/\"\u003eTestcontainers\u003c/a\u003e project is invaluable for spinning up containerized resources during your (JUnit) tests,\ne.g. 
databases or Kafka clusters.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor users of JUnit 5, the project provides the \u003ca href=\"https://www.testcontainers.org/quickstart/junit_5_quickstart/\"\u003e\u003ccode\u003e@Testcontainers\u003c/code\u003e\u003c/a\u003e extension, which controls the lifecycle of containers used by a test.\nWhen testing a \u003ca href=\"https://quarkus.io/\"\u003eQuarkus\u003c/a\u003e application though, this is at odds with Quarkus\u0026#39; own \u003ca href=\"https://quarkus.io/guides/getting-started-testing#recap-of-http-based-testing-in-jvm-mode\"\u003e\u003ccode\u003e@QuarkusTest\u003c/code\u003e\u003c/a\u003e extension;\nit’s a recommended \u003ca href=\"https://bsideup.github.io/posts/testcontainers_fixed_ports/\"\u003ebest practice\u003c/a\u003e to avoid fixed ports for any containers started by Testcontainers.\nInstead, you should rely on Docker to automatically allocate random free ports.\nThis avoids conflicts between concurrently running tests,\ne.g. 
amongst multiple Postgres containers,\nstarted up by several parallel job runs in a CI environment, all trying to allocate Postgres\u0026#39; default port 5432.\nObtaining the randomly assigned port and passing it into the Quarkus bootstrap process isn’t possible though when combining the two JUnit extensions.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Quarkus and Testcontainers","uri":"https://www.morling.dev/blog/quarkus-and-testcontainers/"},{"content":" Layers are sort of the secret sauce of the Java platform module system (JPMS): by providing fine-grained control over how individual JPMS modules and their classes are loaded by the JVM, they enable advanced usages like loading multiple versions of a given module, or dynamically adding and removing modules at application runtime.\nThe Layrry API and launcher provides a small plug-in API based on top of layers, which for instance can be used to dynamically add plug-ins contributing new views and widgets to a running JavaFX application. If such plug-in gets removed from the application again, all its classes need to be unloaded by the JVM, avoiding an ever-increasing memory consumption if for instance a plug-in gets updated multiple times.\nIn this blog post I’m going to explore how to ensure classes from removed plug-in layers are unloaded in a timely manner, and how to find the culprit in case some class fails to be unloaded.\nDo We Really Need Plug-ins? Before diving into the details of class unloading, let’s spend some time to think about the use cases for dynamic plug-ins in Java applications to begin with. I would argue that for typical backend applications this need mostly has diminished. At large, the industry is moving away from application servers and their model around \u0026#34;deploying\u0026#34; applications (which you could consider as some kind of \u0026#34;plug-in\u0026#34;) into a running server process. 
Instead, there’s a strong trend towards immutable application packages, based on stacks like Quarkus or Spring Boot, embedding the web server, the application as well as its dependencies, oftentimes deployed as container images.\nThe advantages of this approach centered around immutable images are manifold, e.g. in terms of security (no interface for deploying applications is needed) and governance (it’s always exactly clear which version of the application is running). Updates (i.e. the deployment of a new revision of the container image) can be put in place e.g. with help of a proxy in front of a cluster of application nodes, which are updated in a rolling manner. That way, there’s no downtime of the service that’ll impact the user. Also techniques like canary releases and A/B testing, as well as rolling back to specific earlier versions of an application become a breeze that way.\nThe situation is different though when it comes to client applications. When thinking of your favourite editor, IDE or web browser for instance, requiring a restart when installing or updating a plug-in is not desirable. Instead, it should be possible to add plug-ins (or new plug-in versions) to a running application instance and use them immediately, without interrupting the flow of the user. The same applies for many IoT scenarios, where e.g. an application consuming sensor measurements should be updateable without any downtime.\nPlug-ins in Layered Java Applications JPMS addresses this requirement via the notion of module layers:\nA layer is created from a graph of modules in a Configuration and a function that maps each module to a ClassLoader. 
Creating a layer informs the Java virtual machine about the classes that may be loaded from the modules so that the Java virtual machine knows which module that each class is a member of.\nLayers are the perfect means of adding new code into a running Java application: they can be added and removed dynamically, and code in an already running layer can invoke functionality from a dynamically added layer in different ways, e.g. via reflection or by using the service loader API. Layrry exposes this functionality via a very basic plug-in API:\npublic interface PluginLifecycleListener { void pluginAdded(PluginDescriptor plugin); void pluginRemoved(PluginDescriptor plugin); } public class PluginDescriptor { public String getName() { ... } public ModuleLayer getModuleLayer() { ... } } A plug-in in this context is a JPMS layer containing one or more modules (either explicit or automatic) which are all loaded via a single class loader. A Layrry-based application can implement the PluginLifecycleListener service contract in order to be notified whenever a plug-in is added or removed. Plug-ins are loaded from configured directories in the file system which are monitored by Layrry (other means of (un-)installing plug-ins may be added in future versions of Layrry).\nInstalling a plug-in is as easy as copying its JAR(s) into a sub-folder of such a monitored directory. Layrry will copy the plug-in contents to a temporary directory, create a layer with all the plug-in’s JARs, and notify any registered plug-in listeners about the new layer. These will typically use the service loader API then to interact with application-specific services which model its extension points, e.g. to contribute visual UI components in case of a desktop application.\nThe reverse process happens when a plug-in gets un-installed: the user removes a plug-in’s directory, and all listeners will be notified by Layrry about the removal. 
They should release all references to any classes from the removed plug-in, rendering it available for garbage collection.\nClass Unloading in Practice There is no API in the Java platform for explicitly unloading a given class. Instead, \u0026#34;a class or interface may be unloaded if and only if its defining class loader may be reclaimed by the garbage collector\u0026#34; (JLS, chapter 12.7). This means in a layered Java application any classes in a layer that got removed can be unloaded as soon as the layer’s class loader is subject to GC. Most importantly, no class in a still running layer may keep a (strong) reference to any class of the removed layer; otherwise this class would hinder collecting the removed layer’s loader and its classes.\nAs an example, let’s look at the modular-tiles demo, a JavaFX application which uses the Layrry plug-in API for dynamically adding and removing tiles with different widgets like clocks and gauges to its graphical UI. The tiles themselves are implemented using the fabulous TilesFX project by Gerrit Grunwald.\nIf you want to follow along, check out the source code of the demo and build it as per the instructions in the README file. Then run the Layrry launcher with the -Xlog:class+unload=info option, so as to be notified about any unloaded classes in the system output:\njava -Xlog:class+unload=info \\ -jar path/to/layrry-launcher-1.0-SNAPSHOT-all.jar \\ --layers-config staging/layers.toml \\ --properties staging/versions.properties Now add and remove some tiles plug-ins a few times:\ncp -r staging/plugins-prepared/* staging/plugins rm -rf staging/plugins/* The widgets will show up and disappear in the JavaFX UI, but what about class unloading in the logs? In all likelihood, nothing! 
This is because without any further configuration, the G1 garbage collector (which is used by the JDK by default since Java 9) will unload classes only during a full garbage collection, which may only run after a long time (if at all), if there’s no substantial object allocation happening.\nJEP 158: Unified JVM Logging The -Xlog option has been defined by JEP 158, added to the JDK with Java 9, which provides a \u0026#34;common logging system for all components of the JVM\u0026#34;. The new unified options should be preferred over the legacy options like -XX:+TraceClassLoading and -XX:+TraceClassUnloading. Usage of -Xlog is described in detail in the java man page; also Nicolai Parlog discusses JEP 158 in great depth in this blog post.\nSo at this point you could trigger a GC explicitly, e.g. via jcmd:\njcmd \u0026lt;pid\u0026gt; GC.run But of course that’s not too desirable when running things in production. Instead, if you’re on JDK 12 or later, you can use the new G1PeriodicGCInterval option for triggering a periodic GC:\njava -Xlog:class+unload=info \\ -XX:G1PeriodicGCInterval=5000 \\ -jar path/to/layrry-launcher-1.0-SNAPSHOT-all.jar \\ --layers-config staging/layers.toml \\ --properties staging/versions.properties Introduced via JEP 346 (\u0026#34;Promptly Return Unused Committed Memory from G1\u0026#34;), this will periodically initiate a concurrent GC cycle (or optionally even a full GC). Add and remove some plug-ins again, and after some time you should see messages about the unloaded classes in the log:\n... [138.912s][info][class,unload] unloading class org.kordamp.tiles.sparkline.SparklineTilePlugin 0x0000000800de1840 [138.912s][info][class,unload] unloading class org.kordamp.tiles.gauge.GaugeTilePlugin 0x0000000800de2040 [138.913s][info][class,unload] unloading class org.kordamp.tiles.clock.ClockTilePlugin 0x0000000800de2840 ... 
From what I observed, class unloading doesn’t happen on every concurrent GC cycle; it might take a few cycles after a plug-in has been removed until its classes are unloaded. If you’re not using G1, but the new low-pause concurrent collectors Shenandoah or ZGC, they’ll be able to concurrently unload classes without any special configuration needed. Note that class unloading is not a mandatory operation which would have to be provided by every GC implementation. E.g. initial ZGC releases did not support class unloading, which would have rendered them unsuitable for this use case.\nJEP 371: Hidden Classes As mentioned above, regular classes can only be unloaded if their defining class loader becomes subject to garbage collection. This can be an issue for frameworks and libraries which generate lots of classes dynamically at runtime, e.g. script language implementations or solutions like Presto, which generates a class for each query.\nThe traditional workaround is to generate each class using its own dedicated class loader, which then can be discarded specifically. This solves the GC issue, but it isn’t ideal in terms of overall memory consumption and speed of class generation. Hence, JDK 15 defines a notion of Hidden Classes (JEP 371), which are not created by class loaders and thus can be unloaded eagerly: \u0026#34;when all instances of the hidden class are reclaimed and the hidden class is no longer reachable, it may be unloaded even though its notional defining loader is still reachable\u0026#34;.\nYou can find some more information on hidden classes in this tweet thread and this code example on GitHub.\nBut who wants to stare at logs in the system output? That’s so 2010! So let’s fire up JDK Mission Control and trigger a recording via the JDK Flight Recorder (JFR) to observe what’s going on in more depth.\nJFR can capture class unloading events; you need to make sure though to enable this event type, as it is not enabled by default. 
In order to do so, start a recording, then go to the Template Manager, edit or create a flight recording template and check the Enabled box for the events under Java Virtual Machine → Class Loading. With the recorder running, add and remove some tiles plug-ins to the running application.\nOnce the recording is finished, you should see class unloading events under JVM Internals → Class Loading:\nIn this case, the classes from a set of plug-ins were unloaded at 16:48:11, which correlates to the periodic GC cycle running at that time and spending a slightly increased time for cleaning up class loader data:\nAs a good Java citizen, Layrry itself also emits JFR events whenever a plug-in layer is added or removed, which helps to track the need for classes to be unloaded:\nIf Things Go Wrong Now let’s look at the situation where some class failed to unload after its plug-in layer was removed. Common reasons for that include remaining references from classes in a still running layer to classes in the removed layer, threads started by a class in the removed layer which were not stopped, and JVM shutdown hooks registered by code in the removed layer.\nThis is known as a class loader leak and is problematic as it means more and more memory will be consumed and cannot be freed as plug-ins are added and removed, which eventually may lead to an OutOfMemoryError. So how could you detect and analyse this situation? An OutOfMemoryError in production would surely be an indicator that there must be a memory or class loader leak somewhere. It’s also a good idea to regularly examine JFR recording files (e.g. in your testing or staging environment): the absence of any class unloading event despite the removal of plug-ins should trigger an investigation.\nAs far as analysing the situation is concerned, examining a heap dump of the application will typically yield insight into the cause rather quickly. 
Take a heap dump using jcmd (via its GC.heap_dump command, analogous to the GC.run invocation shown above), then load the dump into a tool such as Eclipse MAT. In Eclipse MAT, the \u0026#34;Duplicate Classes\u0026#34; action is a great starting point. If one class has been loaded by multiple class loaders, but failed to unload, it’s a pretty strong indicator that something is wrong:\nThe next step is to analyse the shortest path from the involved class loaders to a GC root:\nSome object on that path must hold on to a reference to a class or the class loader of the removed plug-in, preventing the loader from being GC-ed. In the case at hand, it’s the leakingPlugins field in the PluginRegistry class, to which each plug-in is added upon addition of the layer, but then apparently its coffee-deprived author forgot to remove the plug-in from that collection within the pluginRemoved() event handler ;)\nAs a quick side note, there’s a really cool plug-in for Eclipse MAT written by Vladimir Sitnikov, which allows you to query heap dumps using SQL. It maps each class to its own \u0026#34;table\u0026#34;, so that e.g. classes loaded more than once could be selected using the following SQL query on the java.lang.Class class:\nselect c.name, listagg(toString(c.\u0026#34;@classLoader\u0026#34;)) as \u0026#39;loaders\u0026#39;, count(*) as \u0026#39;count\u0026#39; from \u0026#34;java.lang.Class\u0026#34; c where c.name \u0026lt;\u0026gt; \u0026#39;\u0026#39; group by c.name having count(*) \u0026gt; 1 This results in the same list of classes as above:\nThis could come in very handy for more advanced heap dump analyses, which cannot be done using Eclipse MAT’s built-in query capabilities.\nLearning More Via module layers, JPMS provides the foundation for dynamic plug-in architectures, as demonstrated by Layrry. Removing layers at runtime requires some care and consideration, so as to avoid class loader leaks which eventually may lead to OutOfMemoryErrors. 
As so often, JDK Mission Control, JFR, and Eclipse MAT prove to be invaluable tools in the box of every Java developer, helping to ensure class unloading in your layered applications is done correctly, and if it is not, helping to understand and fix the underlying issue.\nHere are some more resources about class unloading and analysing class loader leaks:\nShenandoah GC in JDK 14, Part 2: Concurrent roots and class unloading: A blog post touching on class unloading in Shenandoah by Roman Kennke\nZGC Concurrent Class Unloading: A conference talk by Erik Österlund\nclass loader leaks: A series of blog posts by Mattias Jiderhamn\nClassLoader \u0026amp; memory leaks: a Java love story: A post about heap dump analysis by Aloïs Micard\nLastly, if you’d like to explore the dynamic addition and removal of JPMS layers to a running application yourself, the modular-tiles demo app is a great starting point. Its source code can be found on GitHub.\n","id":61,"publicationdate":"Oct 14, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLayers are sort of the secret sauce of the Java platform module system (JPMS):\nby providing fine-grained control over how individual JPMS modules and their classes are loaded by the JVM,\nthey enable advanced usages like loading multiple versions of a given module, or dynamically adding and removing modules at application runtime.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"/blog/introducing-layrry-runner-and-api-for-modularized-java-applications/\"\u003eLayrry\u003c/a\u003e API and launcher provides a small plug-in API based on top of layers,\nwhich for instance can be used to dynamically add plug-ins contributing new views and widgets to a running JavaFX application.\nIf such plug-in gets removed from the application again,\nall its classes need to be unloaded by the JVM, avoiding an ever-increasing memory consumption if for instance a plug-in gets 
updated multiple times.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this blog post I’m going to explore how to ensure classes from removed plug-in layers are unloaded in a timely manner,\nand how to find the culprit in case some class fails to be unloaded.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Class Unloading in Layered Java Applications","uri":"https://www.morling.dev/blog/class-unloading-in-layered-java-applications/"},{"content":" Lately I’ve been fascinated by the possibility to analyse the assembly code emitted by the Java JIT (just-in-time) compiler. So far I had only looked into Java class files using javap; diving into the world of assembly code feels a bit like Alice must have felt when falling down the rabbit hole into Wonderland.\nMy motivation for this exploration was trying to understand what is faster in Java: a switch statement over strings, or a lookup in a hash map. Solely looking at Java bytecode isn’t going far enough to answer this question, as the difference lies in the actual assembly statements executed on the CPU. I’ll keep the details around that for another time; in this post I’m just going to quickly share what I learned with regard to building a tool needed for this exercise, hsdis.\nhsdis is a disassembler library which can be used with the java runtime as well as tools such as JitWatch to analyse the code produced by the Java JIT compiler. For licensing reasons though it doesn’t come as a binary with the JDK. Instead, you need to build it yourself from source. Instructions for doing so are spread across a few different places, but I couldn’t find any 100% current information, in particular as OpenJDK has moved to git and GitHub just recently.\nSo here is what you need to do in order to build hsdis for OpenJDK 15; in my case I’m running on macOS, slightly different steps may apply for other platforms. 
First, get the OpenJDK source code and check out the version for which you want to build hsdis:\ngit clone git@github.com:openjdk/jdk.git cd jdk git checkout jdk-15+36 # Current stable JDK 15 build The source location of hsdis has changed with the move from Mercurial to git:\ncd src/utils/hsdis In order to build hsdis, you’ll need the GNU Binutils, a collection of several binary tools:\nwget https://ftp.gnu.org/gnu/binutils/binutils-2.35.tar.gz tar xvf binutils-2.35.tar.gz Then run the actual hsdis build (macOS comes with all the required tools like make):\nmake BINUTILS=binutils-2.35 ARCH=amd64 This will take a few minutes; if all goes well, there’ll be an hsdis binary in the build directory, in my case this is build/macosx-amd64/hsdis-amd64.dylib. Copy the library to lib/server of your JDK:\nsudo cp build/macosx-amd64/hsdis-amd64.dylib $JAVA_HOME/lib/server If you’re on Linux, you also can provide the hsdis tool via the LD_LIBRARY_PATH environment variable:\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:path/to/hsdis/build/linux-amd64 Note this won’t work on current macOS versions unfortunately due to its System Integrity Protection feature (SIP). Thanks to Brice Dutheil for this tip!\nCongrats! You now can use the -XX:+PrintAssembly flag of the java command to examine the assembly code of your Java program. Let’s give it a try. Create a Java source file with the following contents:\npublic class PrintAssemblyTest { public static void main(String... 
args) { PrintAssemblyTest hello = new PrintAssemblyTest(); for(int i = 0; i \u0026lt;= 10_000_000; i++) { hello.hello(i); } } private void hello(int i) { if (i % 1_000_000 == 0) { System.out.println(\u0026#34;Hello, \u0026#34; + i); } } } Compile and run it like so:\njavac PrintAssemblyTest.java java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \\ -Xlog:class+load=info -XX:+LogCompilation \\ PrintAssemblyTest You should then find the assembly code of the hello() method somewhere in the output:\n============================= C2-compiled nmethod ============================== ----------------------------------- Assembly ----------------------------------- Compiled method (c2) 1409 106 4 PrintAssemblyTest::hello (20 bytes) total in heap [0x000000011e3fce90,0x000000011e3fd148] = 696 relocation [0x000000011e3fcfe8,0x000000011e3fcff8] = 16 main code [0x000000011e3fd000,0x000000011e3fd080] = 128 stub code [0x000000011e3fd080,0x000000011e3fd098] = 24 oops [0x000000011e3fd098,0x000000011e3fd0a0] = 8 metadata [0x000000011e3fd0a0,0x000000011e3fd0a8] = 8 scopes data [0x000000011e3fd0a8,0x000000011e3fd0d0] = 40 scopes pcs [0x000000011e3fd0d0,0x000000011e3fd140] = 112 dependencies [0x000000011e3fd140,0x000000011e3fd148] = 8 -------------------------------------------------------------------------------- [Constant Pool (empty)] -------------------------------------------------------------------------------- [Entry Point] # {method} {0x000000010d74c4b0} \u0026#39;hello\u0026#39; \u0026#39;(I)V\u0026#39; in \u0026#39;PrintAssemblyTest\u0026#39; # this: rsi:rsi = \u0026#39;PrintAssemblyTest\u0026#39; # parm0: rdx = int # [sp+0x30] (sp of caller) 0x000000011e3fd000: mov 0x8(%rsi),%r10d 0x000000011e3fd004: shl $0x3,%r10 0x000000011e3fd008: movabs $0x800000000,%r11 0x000000011e3fd012: add %r11,%r10 0x000000011e3fd015: cmp %r10,%rax 0x000000011e3fd018: jne 0x0000000116977100 ; {runtime_call ic_miss_stub} 0x000000011e3fd01e: xchg %ax,%ax [Verified Entry Point] 0x000000011e3fd020: 
mov %eax,-0x14000(%rsp) 0x000000011e3fd027: push %rbp 0x000000011e3fd028: sub $0x20,%rsp ;*synchronization entry ; - PrintAssemblyTest::hello@-1 (line 10) 0x000000011e3fd02c: movslq %edx,%r10 0x000000011e3fd02f: mov %edx,%r11d 0x000000011e3fd032: sar $0x1f,%r11d 0x000000011e3fd036: imul $0x431bde83,%r10,%r10 0x000000011e3fd03d: sar $0x32,%r10 0x000000011e3fd041: mov %r10d,%r10d 0x000000011e3fd044: sub %r11d,%r10d 0x000000011e3fd047: imul $0xf4240,%r10d,%r10d ;*irem {reexecute=0 rethrow=0 return_oop=0} ; - PrintAssemblyTest::hello@3 (line 10) 0x000000011e3fd04e: cmp %r10d,%edx 0x000000011e3fd051: je 0x000000011e3fd063 ;*ifne {reexecute=0 rethrow=0 return_oop=0} ; - PrintAssemblyTest::hello@4 (line 10) 0x000000011e3fd053: add $0x20,%rsp 0x000000011e3fd057: pop %rbp 0x000000011e3fd058: mov 0x110(%r15),%r10 0x000000011e3fd05f: test %eax,(%r10) ; {poll_return} 0x000000011e3fd062: retq 0x000000011e3fd063: mov %edx,%ebp 0x000000011e3fd065: sub %r10d,%ebp ;*irem {reexecute=0 rethrow=0 return_oop=0} ; - PrintAssemblyTest::hello@3 (line 10) 0x000000011e3fd068: mov $0xffffff45,%esi 0x000000011e3fd06d: mov %edx,(%rsp) 0x000000011e3fd070: data16 xchg %ax,%ax 0x000000011e3fd073: callq 0x0000000116979080 ; ImmutableOopMap {} ;*ifne {reexecute=1 rethrow=0 return_oop=0} ; - (reexecute) PrintAssemblyTest::hello@4 (line 10) ; {runtime_call UncommonTrapBlob} 0x000000011e3fd078: hlt 0x000000011e3fd079: hlt 0x000000011e3fd07a: hlt 0x000000011e3fd07b: hlt 0x000000011e3fd07c: hlt 0x000000011e3fd07d: hlt 0x000000011e3fd07e: hlt 0x000000011e3fd07f: hlt [Exception Handler] 0x000000011e3fd080: jmpq 0x0000000116a22d80 ; {no_reloc} [Deopt Handler Code] 0x000000011e3fd085: callq 0x000000011e3fd08a 0x000000011e3fd08a: subq $0x5,(%rsp) 0x000000011e3fd08f: jmpq 0x0000000116978ca0 ; {runtime_call DeoptimizationBlob} 0x000000011e3fd094: hlt 0x000000011e3fd095: hlt 0x000000011e3fd096: hlt 0x000000011e3fd097: hlt -------------------------------------------------------------------------------- 
Interpreting the output is left as an exercise for the astute reader ;-) A great resource for getting started doing so is the post PrintAssembly output explained! by Jean-Philippe Bempel.\nWith hsdis in place, you also can use the excellent JitWatch tool for analysing the assembly code, which not only provides an easy way to navigate from source code to byte code to assembly code, but also comes with helpful tooltips explaining the meaning of the different assembly mnemonics.\n","id":62,"publicationdate":"Oct 5, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLately I’ve been fascinated by the possibility to analyse the assembly code emitted by the Java JIT (just-in-time) compiler.\nSo far I had only looked into Java class files using \u003cem\u003ejavap\u003c/em\u003e;\ndiving into the world of assembly code feels a bit like Alice must have felt when falling down the rabbit hole into Wonderland.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Building hsdis for OpenJDK 15","uri":"https://www.morling.dev/blog/building-hsdis-for-openjdk-15/"},{"content":" I’m excited to share the news about an open-source utility I’ve been working on lately: JmFrX, a tool for capturing JMX data with JDK Flight Recorder.\nWhen using JMX (Java Management Extensions), the Java platform’s standard for monitoring and managing applications, JmFrX allows you to periodically record the attributes from any JMX MBean into JDK Flight Recorder (JFR) files, which you then can analyse using JDK Mission Control (JMC).\nThis is useful for a number of reasons:\nYou can track changes to the values of JMX MBean attributes over time without resorting to external monitoring tools\nYou can analyze JMX data from offline JFR recording files in cases where you cannot directly connect to the running application\nYou can export JMX data as live data streams using the JFR event streaming API introduced in Java 14\nIn this blog post I’m going to explain how 
to use JmFrX for recording JMX data in your applications, point out some interesting JmFrX implementation details, and lastly discuss some potential steps for future development of the tool.\nWhy JmFrX? JDK Flight Recorder is a \u0026#34;low-overhead data collection framework for troubleshooting Java applications and the HotSpot JVM\u0026#34;. In combination with the JDK Mission Control client application it allows you to gain deep insights into the performance characteristics of Java applications.\nIn addition to the built-in metrics and event types, JFR also lets you define and emit custom event types. JFR was open-sourced in JDK 11; since then, developers in the Java ecosystem have begun to support it, enabling users to work with JFR and JMC for analyzing the runtime behavior of 3rd party libraries and frameworks. For instance, JUnit 5.7 produces JFR events related to the execution lifecycle of unit tests.\nAt the same time, many library authors are not (yet) in a position where they could easily emit JFR events from their tools, as for instance they might wish to keep compatibility with older Java versions. They might already expose JMX MBeans, though, which often provide fine-grained information about the execution state of Java applications. This is where JmFrX comes in: by periodically capturing the attribute values from a given set of JMX MBeans, it makes this information available in JFR recordings.\nJmFrX isn’t the first effort that seeks to bridge JMX and JFR; JDK Mission Control project lead Marcus Hirt discussed a similar project in a blog post from 2016. But unlike the implementation described by Marcus in that post, JmFrX is based on the public and supported APIs for defining, configuring and emitting JFR events, as available since OpenJDK 11.\nHow To Use JmFrX In order to use JmFrX, make sure to run OpenJDK 11 or newer. 
OpenJDK 8 also contains the open-sourced Flight Recorder bits as of release 8u262 (from July this year); so this should work, too, but I haven’t tested it yet.\nUntil a stable release will be provided, you can obtain JmFrX snapshot builds via JitPack. For that, add the JitPack repository to your pom.xml when using Apache Maven (or apply equivalent configuration for your preferred build tool):\n... \u0026lt;repositories\u0026gt; \u0026lt;repository\u0026gt; \u0026lt;id\u0026gt;jitpack.io\u0026lt;/id\u0026gt; \u0026lt;url\u0026gt;https://jitpack.io\u0026lt;/url\u0026gt; \u0026lt;/repository\u0026gt; \u0026lt;/repositories\u0026gt; ... Then add the JmFrX dependency:\n... \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;com.github.gunnarmorling\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jmfrx\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;master-SNAPSHOT\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; ... The next step is registering the JmFrX event type with JFR in the start-up routine of your program. This could for instance be done in the main() method, the static initializer of a class loaded early on, an eagerly initialized Spring or CDI bean, etc. A Java agent for this purpose will be provided as part of this project soon.\nWhen building applications with Quarkus, you could use an application start-up event like so:\n@ApplicationScoped public class EventRegisterer { public void registerEvent(@Observes StartupEvent se) { Jmfrx.getInstance().register(); } public void unregisterEvent(@Observes ShutdownEvent se) { Jmfrx.getInstance().unregister(); } } Now start your application and create a JFR configuration file which enables the JmFrX event type. To do so, open JDK Mission Control, and choose your running application in the JVM Browser. 
Then perform these steps:\nRight-click the target JVM → Select Start Flight Recording…​\nClick on Template Manager\nCopy the Continuous setting and click Edit to modify this copy\nExpand the JMX and JMX Dump nodes\nMake sure the JMX Dump event type is Enabled; choose a period for dumping the chosen JMX MBeans (by default 60 s) and specify the MBeans whose data should be captured; that’s done by means of a regular expression, which matches one or more JMX object names, for instance .*OperatingSystem.*:\nClose the last two dialogues by clicking OK twice\nImportant: Make sure that the template you edited is selected under Event settings\nClick Finish to begin the recording\nOnce the recording is complete, open the recording file in JDK Mission Control and go to the Event Browser. You should see periodic events corresponding to the selected MBeans under the JMX node:\nWhen initiating recordings not with JDK Mission Control but with the jcmd utility on the command line, follow the same steps as above for creating a configuration. But then, instead of starting the recording, export the configuration file from the template manager and specify its name to jcmd via the settings=/path/to/settings.jfc parameter.\nNow using JmFrX to observe JMX data from the java.lang MBeans like Runtime and OperatingSystem in JFR isn’t too exciting yet, as there are dedicated JFR event types which contain most of that information. But things get more interesting when capturing data from custom MBean types, as shown here for the stream threads metrics from a Kafka Streams application:\nCustomizing Event Formats By default, JmFrX will propagate the raw attribute values from a JMX MBean to the corresponding JFR event. This makes sure that all the information can be retrieved from recordings, but the data format can be a bit unwieldy, e.g. 
when it comes to data amounts in bytes, or time periods in milliseconds since the epoch.\nTo address this, JFR supports a range of metadata annotations such as @DataAmount, @Timespan, or @Percentage, which describe how event attributes should be formatted. This information is then used by JMC, for instance when displaying events in the browser (see event Properties to the left in the screenshot above).\nJmFrX integrates with this metadata facility via the notion of event profiles, which describe the data format of one MBean type and its attributes. When creating an event for a given JMX MBean, JmFrX will look for a corresponding event profile and apply its settings. Event profiles are defined by implementing the EventProfileContributor SPI. As an example, here’s a subset of the built-in profile definition for the OperatingSystem MBean:\npublic class JavaLangEventProfileContributor implements EventProfileContributor { @Override public void contributeProfiles(EventProfileBuilder builder) { builder.addEventProfile(\u0026#34;java.lang:type=OperatingSystem\u0026#34;) (1) .addAttributeProfile(\u0026#34;TotalSwapSpaceSize\u0026#34;, long.class, new AnnotationElement(DataAmount.class, DataAmount.BYTES), (2) v -\u0026gt; v) .addAttributeProfile(\u0026#34;FreeSwapSpaceSize\u0026#34;, long.class, new AnnotationElement(DataAmount.class, DataAmount.BYTES), v -\u0026gt; v) (3) .addAttributeProfile(\u0026#34;CpuLoad\u0026#34;, double.class, new AnnotationElement(Percentage.class), v -\u0026gt; v) .addAttributeProfile(\u0026#34;ProcessCpuLoad\u0026#34;, double.class, new AnnotationElement(Percentage.class), v -\u0026gt; v) .addAttributeProfile(\u0026#34;SystemCpuLoad\u0026#34;, double.class, new AnnotationElement(Percentage.class), v -\u0026gt; v) .addAttributeProfile(\u0026#34;ProcessCpuTime\u0026#34;, long.class, new AnnotationElement(Timespan.class, Timespan.NANOSECONDS), v -\u0026gt; v ); } } 1 Profiles are linked via the MBean name 2 The attribute type is specified via an AnnotationElement for 
one of the JFR type metadata annotations 3 If needed, the actual value can be modified too, e.g. to convert it into another data type, or to shift its value into an expected range (for instance 0 to 1 for percentage values) Once you’ve defined the event profiles for your MBean type(s), don’t forget to register the contributor type. When building a modular Java application, declare it as a service implementation in your module-info.java descriptor:\nmodule com.example { requires jdk.jfr; requires dev.morling.jmfrx; provides dev.morling.jmfrx.spi.EventProfileContributor with com.example.MyEventProfileContributor; } When building an application using the traditional classpath, register the names of all profile contributors in the META-INF/services/dev.morling.jmfrx.spi.EventProfileContributor file.\nThere’s a small (yet hopefully growing) set of event profiles built into JmFrX. But as event profile contributors are discovered using the Java service loader mechanism, you can also easily plug in event profiles for other MBean types, e.g. for the JMX MBeans of Apache Kafka or Kafka Connect, or application servers like WildFly.\nPull requests contributing event profiles for common JMX applications to JmFrX itself are very welcome, too!\nHow It Works If you solely want to use JmFrX, you can pretty much stop reading this post at this point. But if you’re curious about how it works internally, stay with me for a bit longer: JmFrX uses two lesser-known JFR features which might also be interesting for your own application-specific event types: periodic JFR events and dynamic event types.\nUnlike most JFR event types, which are emitted when some specific JVM or application functionality is executed, periodic events are produced at a regular interval. 
The default interval (which can be overridden by the user) is specified using the @Period annotation on the event type definition:\n@Name(JmxDumpEvent.NAME) @Label(\u0026#34;JMX Dump\u0026#34;) @Category(\u0026#34;JMX\u0026#34;) @Description(\u0026#34;Periodically dumps specific JMX MBeans\u0026#34;) @StackTrace(false) @Period(\u0026#34;60 s\u0026#34;) public class JmxDumpEvent extends Event { public static final String NAME = \u0026#34;dev.morling.jmfrx.JmxDumpEvent\u0026#34;; // event implementation ... } Upon application start-up, JmFrX registers this event type with the JFR environment:\n... private Runnable hook; public void register() { hook = () -\u0026gt; { (1) JmxDumpEvent dumpEvent = new JmxDumpEvent(); if (!dumpEvent.isEnabled()) { return; } dumpEvent.begin(); // retrieve data from matching MBean(s) and create event(s) ... dumpEvent.commit(); }; FlightRecorder.addPeriodicEvent(JmxDumpEvent.class, hook); (2) } public void unregister() { FlightRecorder.removePeriodicEvent(hook); (3) } ... 1 The event hook implementation 2 Register the periodic event 3 Unregister the periodic event The regular expression for specifying the MBean name(s) is passed to the event type as a SettingControl. You can learn more about event settings in my post on custom JFR event types.\nWhen the periodic event hook runs, it must create one event for each captured MBean. As JmFrX cannot know which MBean(s) you’re interested in, it’s not an option to pre-define these event types and their structure.\nThis is where dynamic JFR event types come in: Using the EventFactory class, event types can be defined at runtime. Under the covers, JFR will create a corresponding Event sub-class dynamically using the ASM API. Here’s the relevant JmFrX code which defines the event type for a given MBean:\n... 
public static EventDescriptor getDescriptorFor(String mBeanName) { MBeanServer mbeanServer = ManagementFactory.getPlatformMBeanServer(); try { ObjectName objectName = new ObjectName(mBeanName); MBeanInfo mBeanInfo = mbeanServer.getMBeanInfo(objectName); List\u0026lt;AnnotationElement\u0026gt; eventAnnotations = Arrays.asList( (1) new AnnotationElement(Category.class, getCategory(objectName)), new AnnotationElement(StackTrace.class, false), new AnnotationElement(Name.class, getName(objectName)), new AnnotationElement(Label.class, getLabel(objectName)), new AnnotationElement(Description.class, mBeanInfo.getDescription()) ); List\u0026lt;AttributeDescriptor\u0026gt; fields = getFields(objectName, mBeanInfo); List\u0026lt;ValueDescriptor\u0026gt; valueDescriptors = fields.stream() (2) .map(AttributeDescriptor::getValueDescriptor) .collect(Collectors.toList()); return new EventDescriptor(EventFactory.create(eventAnnotations, valueDescriptors), fields); } catch (Exception e) { throw new RuntimeException(e); } } ... 1 Define event metadata like name, label, category etc. via the JFR metadata annotations 2 For each MBean attribute, an attribute is added to the event type; its definition is based on the information in the corresponding event profile, if present The actual implementation is slightly more complex, as it deals with integrating metadata from JmFrX event profiles and more. You can find the complete code in the EventProfile class.\nTakeaways JmFrX is a small utility which allows you to capture JMX data with JDK Flight Recorder. It’s open-source (Apache License, version 2); you can find the source code on GitHub. 
With the wide usage of JMX for application monitoring in the Java world, JmFrX can help to bring that information into JFR recordings, making it available for offline investigations and analyses.\nPotential next steps for JmFrX include more meaningful handling of tabular and composite JMX data, adding a Java agent for registering the event type, providing some more built-in event profiles and publishing a stable release on Maven Central. Eventually, the JmFrX project might move over to the rh-jmc-team GitHub organization, which is managed by Red Hat’s OpenJDK team and contains many other very useful projects around JDK Flight Recorder and Mission Control.\nYour feedback on and contributions to JmFrX are very welcome!\n","id":63,"publicationdate":"Aug 18, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI’m excited to share the news about an open-source utility I’ve been working on lately:\n\u003ca href=\"https://github.com/gunnarmorling/jmfrx\"\u003eJmFrX\u003c/a\u003e,\na tool for capturing JMX data with JDK Flight Recorder.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhen using JMX (\u003ca href=\"https://en.wikipedia.org/wiki/Java_Management_Extensions\"\u003eJava Management Extensions\u003c/a\u003e), the Java platform’s standard for monitoring and managing applications,\nJmFrX allows you to periodically record the attributes from any JMX MBean into \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) files,\nwhich you then can analyse using \u003ca href=\"https://openjdk.java.net/projects/jmc/\"\u003eJDK Mission Control\u003c/a\u003e (JMC).\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Introducing JmFrX: A Bridge From JMX to JDK Flight Recorder","uri":"https://www.morling.dev/blog/introducing-jmfrx-a-bridge-from-jmx-to-jdk-flight-recorder/"},{"content":" I have built a custom search functionality for this blog, based on Java and 
the Apache Lucene full-text search library, compiled into a native binary using the Quarkus framework and GraalVM. It is deployed as a Serverless application running on AWS Lambda, providing search results without any significant cold start delay. If you thought Java wouldn’t be the right language for this job, keep reading; in this post I’m going to give an overview of the implementation of this feature and what I learned along the way.\nHaving a search functionality for my blog has been on my mind for quite some time; I’d like to give users the opportunity to find specific content on this blog right here on this site, without having to use an external search engine. That’s not only nice in terms of user experience; having insight into the kind of information readers look for on this blog should also help me to identify interesting things to write about in the future.\nNow this blog is a static site (generated using Hugo, hosted on GitHub Pages), which makes this an interesting challenge. I didn’t want to rely on an external search service (see \u0026#34;Why No External Search Service\u0026#34; below for the reasoning), and a purely client-side solution as described in this excellent blog post didn’t seem ideal either. While that approach is technically fascinating, I didn’t like the fact that it requires shipping the entire search index to the client for executing search queries. Also things like result highlighting, customized result scoring, word stemming, fuzzy search and more seemed a bit more than I’d be willing to implement on the client.\nAll these issues have largely been solved on the server side by libraries such as Apache Lucene for quite some time. Using a library like Lucene means implementing a custom server-side process, though. How should such a service be deployed? 
Operating a VM 24/7 with my search backend for what’s likely going to be no more than a few dozen queries per month seemed a bit like overkill.\nSo after some consideration I decided to implement my own search functionality, based on the highly popular Apache Lucene library, deployed as a Serverless application, which is started on demand when a user runs a query on my website. In the remainder of this post I’m going to describe the solution I came up with and how it works.\nIf you like, you can try it out right now: this post is about the little search input control at the top right of this page!\nWhy No External Search Service? When I tweeted about my serverless search experiment, one of the questions I received was \u0026#34;What’s wrong with Algolia?\u0026#34;. To be very clear, there’s nothing wrong with it at all. External search services like Algolia, Google Custom Search, or an Elasticsearch provider such as Bonsai promise an easy-to-use, turn-key search functionality which can be a great choice for your specific use case.\nHowever, I felt that none of these options would provide me with the degree of control and customizability I was after. I also ruled out any \u0026#34;free\u0026#34; options, as they’d either mean having ads or paying for the service with my own data or that of my readers. And to be honest, I also just fancied the prospect of solving the problem by myself, instead of relying on an off-the-shelf solution.\nWhy Serverless? First of all, let’s discuss why I opted for a Serverless solution. 
It boils down to three reasons:\nSecurity: While it’d only cost a few EUR per month to set up a VM with a cloud provider like Digital Ocean or Hetzner, having to manage a full operating system installation would require too much of my attention; I don’t want someone to mine bitcoins or do other nasty things on a box I run, just because I failed to apply some security patch\nCost: Serverless promises not only to scale out (and let’s be honest, there likely won’t be millions of search queries on my blog every month), but also to scale to zero. As Serverless is pay-per-use and there are free tiers in place e.g. for AWS Lambda, this service ideally should cost me just a few cents per month\nLearning Opportunity: Last but not least, this also should be a nice occasion for me to dive into the world of Serverless, by means of designing, developing and running a solution for a real-world problem, exploring how Java as my preferred programming language can be used for this task\nSolution Overview The overall idea is quite simple: there’s a small HTTP service which takes a query string, runs the query against a Lucene index with my blog’s contents and returns the search results to the caller. This service gets invoked via JavaScript from my static blog pages, where results are shown to the user.\nThe Lucene search index is read-only and gets rebuilt whenever I update the blog. It’s baked into the search service deployment package, which that way becomes fully immutable. This reduces complexity and the attack surface at runtime. Surely that’s not an approach that’s viable for more dynamic use cases, but for a blog that’s updated every few weeks, it’s perfect. Here’s a visualization of the overall flow:\nThe search service is deployed as a Serverless function on AWS Lambda. 
One important design goal for me is to avoid lock-in to any specific cloud provider: the solution should be portable and also be usable with container-based Serverless approaches like Knative.\nRelying on a Serverless architecture means the service’s start-up time must be a matter of milliseconds rather than seconds, so that a user doesn’t have to wait for a noticeable amount of time in case of a cold start. While substantial improvements to start-up times have been made in recent Java versions, running on the JVM is still not ideal for this kind of use case. Therefore, the application is compiled into a native binary via Quarkus and GraalVM, which results in a start-up time of ~30 ms on my laptop, and ~180 ms when deployed to AWS Lambda. With that we’re in a range where a cold start won’t impact the user experience in any significant way.\nThe Lambda function is exposed to callers via the AWS API Gateway, which takes incoming HTTP requests, maps them to calls of the function and converts its response into an HTTP response which is sent back to the caller.\nNow let’s dive a bit deeper into the specific parts of the solution. Overall, there are four steps involved:\nData extraction: The blog contents to be indexed must be extracted and converted into an easy-to-process data format\nSearch backend implementation: A small HTTP service is needed which exposes the search functionality of Apache Lucene, which in particular requires some steps to enable Lucene to be used in a native GraalVM binary\nIntegration with the website: The search service must be integrated into the static site on GitHub Pages\nDeployment: Finally, the search service needs to be deployed to AWS API Gateway and Lambda\nData Extraction The first step was to obtain the contents of my blog in an easily processable format. 
Rather than needing something like a real search engine’s crawler, I essentially only needed a single file in a structured format which can then be passed on to the Lucene indexer.\nThis task proved rather easy with Hugo; by means of a custom output format it’s straightforward to produce a JSON file which contains the text of all my blog pages. In my config.toml I declared the new output format and activated it for the homepage (largely inspired by this write-up):\n[outputFormats.SearchIndex] mediaType = \u0026#34;application/json\u0026#34; baseName = \u0026#34;searchindex\u0026#34; isPlainText = true notAlternative = true [outputs] home = [\u0026#34;HTML\u0026#34;,\u0026#34;RSS\u0026#34;, \u0026#34;SearchIndex\u0026#34;] The template in layouts/_default/list.searchindex.json isn’t too complex either:\n{{- $.Scratch.Add \u0026#34;searchindex\u0026#34; slice -}} {{- range $index, $element := .Site.Pages -}} {{- $.Scratch.Add \u0026#34;searchindex\u0026#34; (dict \u0026#34;id\u0026#34; $index \u0026#34;title\u0026#34; $element.Title \u0026#34;uri\u0026#34; $element.Permalink \u0026#34;tags\u0026#34; $element.Params.tags \u0026#34;section\u0026#34; $element.Section \u0026#34;content\u0026#34; $element.Plain \u0026#34;summary\u0026#34; $element.Summary \u0026#34;publicationdate\u0026#34; ($element.Date.Format \u0026#34;Jan 2, 2006\u0026#34;)) -}} {{- end -}} {{- $.Scratch.Get \u0026#34;searchindex\u0026#34; | jsonify -}} The result is this JSON file:\n[ ... 
{ \u0026#34;content\u0026#34;: \u0026#34;The JDK Flight Recorder (JFR) is an invaluable tool...\u0026#34;, \u0026#34;id\u0026#34;: 12, \u0026#34;publicationdate\u0026#34;: \u0026#34;Jan 29, 2020\u0026#34;, \u0026#34;section\u0026#34;: \u0026#34;blog\u0026#34;, \u0026#34;summary\u0026#34;: \u0026#34;\\u003cdiv class=\\\u0026#34;paragraph\\\u0026#34;\\u003e\\n\\u003cp\\u003eThe \\u003ca href=\\\u0026#34;https://openjdk.java.net/jeps/328\\\u0026#34;\\u003eJDK Flight Recorder\\u003c/a\\u003e (JFR) is an invaluable tool...\u0026#34;, \u0026#34;tags\u0026#34;: [ \u0026#34;java\u0026#34;, \u0026#34;monitoring\u0026#34;, \u0026#34;microprofile\u0026#34;, \u0026#34;jakartaee\u0026#34;, \u0026#34;quarkus\u0026#34; ], \u0026#34;title\u0026#34;: \u0026#34;Monitoring REST APIs with Custom JDK Flight Recorder Events\u0026#34;, \u0026#34;uri\u0026#34;: \u0026#34;https://www.morling.dev/blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/\u0026#34; }, ... ] This file gets automatically updated whenever I republish the blog.\nSearch Backend Implementation My stack of choice for this kind of application is Quarkus. As a contributor, I am of course biased, but Quarkus is ideal for the task at hand: built and optimized from the ground up for implementing fast-starting and memory-efficient cloud-native and Serverless applications, it makes building HTTP services, e.g. based on JAX-RS, running on GraalVM a trivial effort.\nNow typically a Java library such as Lucene will not run in a GraalVM native binary out-of-the-box. Things like reflection or JNI usage require specific configuration, while other Java features like method handles are only supported partly or not at all.\nApache Lucene in a GraalVM Native Binary Quarkus enables a wide range of popular Java libraries to be used with GraalVM, but at this point there’s no extension yet which would take care of Lucene. So I set out to implement a small Quarkus extension for Lucene. 
Depending on the implementation details of the library in question, this can be a more or less complex and time-consuming endeavor. The workflow looks like this:\ncompile an application using the library down into a native image\nrun into some sort of exception, e.g. due to types accessed via Java reflection (which causes the GraalVM compiler to miss them during call flow analysis so that they are missing from the generated binary image)\nfix the issue e.g. by registering the types in question for reflection\nrinse and repeat\nThe good thing is that the list of Quarkus extensions is constantly growing, so that you hopefully don’t have to go through this by yourself. Or if you do, consider publishing your extension via the Quarkus platform, saving others from the same work.\nFor my particular usage of Lucene, I luckily ran into only two issues. The first is the usage of method handles in the AttributeFactory class for dynamically instantiating sub-classes of the AttributeImpl type, which isn’t supported in that form by GraalVM. One way of dealing with this is to define substitutions: custom methods or classes which override a specific original implementation. As an example, here’s one of the substitution classes I had to create:\n@TargetClass(className = \u0026#34;org.apache.lucene.util.AttributeFactory$DefaultAttributeFactory\u0026#34;) public final class DefaultAttributeFactorySubstitution { public DefaultAttributeFactorySubstitution() {} @Substitute public AttributeImpl createAttributeInstance(Class\u0026lt;? extends Attribute\u0026gt; attClass) { if (attClass == BoostAttribute.class) { return new BoostAttributeImpl(); } else if (attClass == CharTermAttribute.class) { return new CharTermAttributeImpl(); } else if (...) { ... 
} throw new UnsupportedOperationException(\u0026#34;Unknown attribute class: \u0026#34; + attClass); } } During native image creation, the GraalVM compiler will discover all substitute classes and apply their code instead of the original ones.\nThe other problem I ran into was the usage of method handles in the MMapDirectory class, which will be used by Lucene by default on Linux when obtaining a file-system backed index directory. I didn’t explore how to circumvent that, instead I opted for using the SimpleFSDirectory implementation which proved to work fine in my native GraalVM binary.\nWhile this was enough in order to get Lucene going in a native image, you might run into different issues when using other libraries with GraalVM native binaries. Quarkus comes with a rich set of so-called build items which extension authors can use in order to enable external dependencies on GraalVM, e.g. for registering classes for reflective access or JNI, adding additional resources to the image, and much more. I recommend you take a look at the extension author guide in order to learn more.\nBesides enabling Lucene on GraalVM, that Quarkus extension also does two more things:\nParse the previously extracted JSON file, build a Lucene index from that and store that index in the file system; that’s fairly standard Lucene procedure without anything noteworthy; I only had to make sure that the index fields are stored in their original form in the search index, so that they can be accessed at runtime when displaying fragments with the query hits\nRegister a CDI bean, which allows to obtain the index at runtime via @Inject dependency injection from within the HTTP endpoint class\nA downside of creating binaries via GraalVM is the increased build time: creating a native binary for macOS via a locally installed GraalVM SDK takes about two minutes on my laptop. 
For creating a Linux binary to be used with AWS Lambda, I need to run the build in a Linux container, which takes about five minutes. But typically this task is only done once when actually deploying the application, whereas locally I’d work either with the Quarkus Dev Mode (which does a live reload of the application as its code changes) or test on the JVM. In any case it’s a price worth paying: only with start-up times in the range of milliseconds do on-demand Serverless cold starts, with the user waiting for a response, become an option.\nThe Search HTTP Service The actual HTTP service implementation for running queries is rather unspectacular; it’s based on JAX-RS and exposes a simple endpoint which can be invoked with a given query like so:\nhttp \u0026#34;https://my-search-service/search?q=java\u0026#34; HTTP/1.1 200 OK Connection: keep-alive Content-Length: 4930 Content-Type: application/json Date: Tue, 21 Jul 2020 17:05:00 GMT { \u0026#34;message\u0026#34;: \u0026#34;ok\u0026#34;, \u0026#34;results\u0026#34;: [ { \u0026#34;fragment\u0026#34;: \u0026#34;...plug-ins. In this post I\u0026amp;#8217;m going to explore how the \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; Platform Module System\u0026#39;s notion of module layers can be leveraged for implementing plug-in architectures on the JVM. We\u0026amp;#8217;ll also discuss how Layrry, a launcher and runtime for layered \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; applications, can help with this task. 
A key requirement...\u0026#34;, \u0026#34;publicationdate\u0026#34;: \u0026#34;Apr 21, 2020\u0026#34;, \u0026#34;title\u0026#34;: \u0026#34;Plug-in Architectures With Layrry and the \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; Module System\u0026#34;, \u0026#34;uri\u0026#34;: \u0026#34;https://www.morling.dev/blog/plugin-architectures-with-layrry-and-the-java-module-system/\u0026#34; }, { \u0026#34;fragment\u0026#34;: \u0026#34;...the current behavior indeed is not intended (see JDK-8236597) and in a future \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; version the shorter version of the code shown above should work. Wrap-Up In this blog post we\u0026amp;#8217;ve explored how invariants on \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; 14 record types can be enforced using the Bean Validation API. With just a bit...\u0026#34;, \u0026#34;publicationdate\u0026#34;: \u0026#34;Jan 20, 2020\u0026#34;, \u0026#34;title\u0026#34;: \u0026#34;Enforcing \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; Record Invariants With Bean Validation\u0026#34;, \u0026#34;uri\u0026#34;: \u0026#34;https://www.morling.dev/blog/enforcing-java-record-invariants-with-bean-validation/\u0026#34; }, ... ] } Internally it’s using Lucene’s MultiFieldQueryParser for parsing the query and running it against the \u0026#34;title\u0026#34; and \u0026#34;content\u0026#34; fields of the index. It is set to combine multiple terms using the logical AND operator by default (who ever would want the default of OR?), it supports phrase queries given in quotes, and a number of other query operators.\nQuery hits are highlighted using the FastVectorHighlighter highlighter and SimpleHTMLFormatter as a fallback (not all kinds of queries can be processed by FastVectorHighlighter). The highlighter wraps the matched search terms in the returned fragment in \u0026lt;b\u0026gt; tags, which are styled appropriately in my website’s CSS. I was prepared to do some adjustments to result scoring, but this wasn’t necessary so far. 
Title matches are implicitly ranked higher than content matches due to the shorter length of the title field values.\nImplementing the service using a standard HTTP interface instead of relying on specific AWS Lambda contracts is great in terms of local testing as well as portability: I can work on the service using the Quarkus Dev Mode and invoke it locally, without having to deploy it into some kind of Lambda test environment. It also means that should the need arise, I can take this service and run it elsewhere, without requiring any code changes. As I’ll discuss in a bit, Quarkus takes care of making this HTTP service runnable within the Lambda environment by means of a single dependency configuration.\nWiring Things Up Now it was time to hook up the search service into my blog. I wouldn’t want to have the user navigate to the URL of the AWS API Gateway in their browser; this means that the form with the search text input field cannot actually be submitted. Instead, the default form handling must be disabled, and the search string be sent via JavaScript to the API Gateway URL.\nThis means the search feature won’t work for users who have JavaScript disabled in their browser. I deemed this an acceptable limitation; in order to avoid unnecessary confusion and frustration, the search text input field is hidden in that case via CSS:\n\u0026lt;noscript\u0026gt; \u0026lt;style type=\u0026#34;text/css\u0026#34;\u0026gt; .search-input { display:none; } \u0026lt;/style\u0026gt; \u0026lt;/noscript\u0026gt; The implementation of the backend call is fairly standard JavaScript business using the XMLHttpRequest API, so I’ll spare you the details here. You can find the complete implementation in my GitHub repo.\nThere’s one interesting detail to share though in terms of improving the user experience after a cold start. As mentioned above, the Quarkus application itself starts up on Lambda in about ~180 ms. 
Together with the initialization of the Lambda execution environment I typically see ~370 ms for a cold start. Add to that the network round-trip times, and a user will feel a slight delay. Nothing dramatic, but it doesn’t have that snappy instant feeling you get when executing the search with a warm environment.\nThinking about the typical user interaction though, the situation can be nicely improved: if a visitor puts the focus onto the search text input field, it’s highly likely that they will submit a query shortly thereafter. We can take advantage of that and have the website send a small \u0026#34;ping\u0026#34; request right at the point when the input field obtains the focus. This gives us enough of a head start to have the Lambda function started before the actual query comes in. Here’s the request flow of a typical interaction (the \u0026#34;Other\u0026#34; requests are CORS preflight requests):\nNote how the search call is issued only a few hundred ms after the ping. Now you could beat this e.g. when navigating to the text field using your keyboard and if you were typing really fast. But most users will use their mouse or touchpad to put the cursor into the input, and then change to the keyboard to enter the query, which is time enough for this little trick to work.\nThe analysis of the logs confirms that essentially all executed queries hit a warmed up Lambda function, making cold starts a non-issue. To avoid any unneeded warm-up calls, they are only done when entering the input field for the first time after loading the page, or when staying on the page for long enough, so that the Lambda might have shut down again due to lack of activity.\nOf course you’ll be charged for the additional ping requests, but for the volume I expect, this makes no relevant difference whatsoever.\nDeployment to AWS Lambda The last part of my journey towards a Serverless search function was deployment to AWS Lambda. 
I was exploring Heroku and Google Cloud Run as alternatives, too. Both allow you to deploy regular container images, which then are automatically scaled on demand. This results in great portability, as things hardly can get any more standard than plain Linux containers.\nWith Heroku, cold start times proved problematic, though: I observed 5 - 6 seconds, which completely ruled it out. This wasn’t a problem with Cloud Run, and it’d surely work very well overall. In the end I went for AWS Lambda, as its entire package of service runtime, API Gateway and web application firewall seemed more complete and mature to me.\nWith AWS Lambda, I observed cold start times of less than 0.4 sec for my actual Lambda function, plus the actual request round trip. Together with the warm-up trick described above, this means that a user practically never will get a cold start when executing the search.\nYou shouldn’t under-estimate the time needed though to get familiar with Lambda itself, the API Gateway which is needed for routing HTTP requests to your function and the interplay of the two.\nTo get started, I configured some playground Lambda and API in the web console, but eventually I needed something along the lines of infrastructure-as-code, i.e. a means of reproducible and automated steps for configuring and setting up all the required components. My usual go-to solution in this area is Terraform, but here I settled for the AWS Serverless Application Model (SAM), which is tailored specifically to setting up Serverless apps via Lambda and API Gateway and thus promised to be a bit easier to use.\nBuilding Quarkus Applications for AWS Lambda Quarkus supports multiple approaches for building Lambda-based applications:\nYou can directly implement Lambda’s APIs like RequestHandler, which I wanted to avoid though for the sake of portability between different environments and cloud providers\nYou can use the Quarkus Funqy API for building portable functions which e.g. 
can be deployed to AWS, Azure Functions and Google Cloud Functions; the API is really straight-forward and it’s a very attractive option, but right now there’s no way to use Funqy for implementing an HTTP GET API with request parameters, which ruled out this option for my purposes\nYou can implement your Lambda function using the existing and well-known HTTP APIs of Vert.x, RESTEasy (JAX-RS) and Undertow; in this case Quarkus will take care of mapping the incoming function call to the matching HTTP endpoint of the application\nUsed together with the proxy feature of the AWS API Gateway, the third option is exactly what I was looking for. I can implement the search endpoint using the JAX-RS API I’m familiar with, and the API Gateway proxy integration together with Quarkus\u0026#39; glue code will take care of everything else for running this. This is also great in terms of portability: I only need to add the io.quarkus:quarkus-amazon-lambda-http dependency to my project, and the Quarkus build will emit a function.zip file which can be deployed to AWS Lambda. I’ve put this into a separate Maven build profile, so I can easily switch between creating the Lambda function deployment package and a regular container image with my REST endpoint which I can deploy to Knative and environments like OpenShift Serverless, without requiring any code changes whatsoever.\nThe Quarkus Lambda extension also produces templates for the AWS SAM tool for deploying my stack. They are a good starting point which just needs a little bit of massaging; For the purposes of cost control (see further below), I added an API usage plan and API key. I also enabled CORS so that the API can be called from my static website. This made it necessary to disable the configuration of binary media types which the generated template contains by default. 
Lastly, I used a specific pre-configured execution role instead of the default AWSLambdaBasicExecutionRole.\nWith the SAM descriptor in place, re-building and publishing the search service becomes a procedure of three steps:\nmvn clean package -Pnative,lambda -DskipTests=true \\ -Dquarkus.native.container-build=true sam package --template-file sam.native.yaml \\ --output-template-file packaged.yaml \\ --s3-bucket \u0026lt;my S3 bucket\u0026gt; sam deploy --template-file packaged.yaml \\ --capabilities CAPABILITY_IAM \\ --stack-name \u0026lt;my stack name\u0026gt; The lambda profile takes care of adding the Quarkus Lambda HTTP extension, while the native profile makes sure that a native binary is built instead of a JAR to be run on the JVM. As I need to build a Linux binary for the Lambda function while running on macOS locally, I’m using the -Dquarkus.native.container-build=true option, which will make the Quarkus build run in a container itself, producing a Linux binary no matter which platform this build itself is executed on.\nThe function.zip file produced by the Quarkus build has a size of ~15 MB, i.e. it’s uploaded and deployed to Lambda in a few seconds. Currently it also contains the Lucene search index, meaning I need to run the time-consuming GraalVM build whenever I want to update the index. As an optimization I might at some point extract the index into a separate Lambda layer, which then could be deployed by itself, if there were no code changes to the search service otherwise.\nIdentity and Access Management A big pain point for me was identity and access management (IAM) for the AWS API Gateway and Lambda. 
While the AWS IAM is really powerful and flexible, there’s unfortunately no documentation which describes the minimum set of required permissions in order to deploy a stack like my search using SAM.\nThings work nicely if you use a highly-privileged account, but I’m a strong believer in running things with only the least privileges needed for the job. For instance I don’t want my Lambda deployer to set up the execution role, but rather have it use one I pre-defined. The same goes for other resources like the S3 bucket used for uploading the deployment package.\nIdentifying the set of privileges actually needed is a rather soul-crushing experience of trial and error (please let me know in the comments below if there’s a better way to do this), which gets complicated by the fact that different resources in the AWS stack expose insufficient privileges in inconsistent ways, or sometimes in no really meaningful way at all when configured via SAM. I spent hours identifying a lacking S3 privilege when trying to deploy a Lambda layer from the Serverless Application Repository.\nHoping to spare others from this tedious work, here’s the policy for my deployment role I came up with:\n{ \u0026#34;Version\u0026#34;: \u0026#34;2012-10-17\u0026#34;, \u0026#34;Statement\u0026#34;: [ { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;s3:PutObject\u0026#34;, \u0026#34;s3:GetObject\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:s3:::\u0026lt;deployment-bucket\u0026gt;\u0026#34;, \u0026#34;arn:aws:s3:::\u0026lt;deployment-bucket\u0026gt;/*\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;lambda:CreateFunction\u0026#34;, \u0026#34;lambda:GetFunction\u0026#34;, \u0026#34;lambda:GetFunctionConfiguration\u0026#34;, \u0026#34;lambda:AddPermission\u0026#34;, \u0026#34;lambda:UpdateFunctionCode\u0026#34;, \u0026#34;lambda:ListTags\u0026#34;, 
\u0026#34;lambda:TagResource\u0026#34;, \u0026#34;lambda:UntagResource\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:lambda:eu-central-1:\u0026lt;account-id\u0026gt;:function:search-morling-dev-SearchMorlingDev-*\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;iam:PassRole\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:iam::\u0026lt;account-id\u0026gt;:role/\u0026lt;execution-role\u0026gt;\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;cloudformation:DescribeStacks\u0026#34;, \u0026#34;cloudformation:DescribeStackEvents\u0026#34;, \u0026#34;cloudformation:CreateChangeSet\u0026#34;, \u0026#34;cloudformation:ExecuteChangeSet\u0026#34;, \u0026#34;cloudformation:DescribeChangeSet\u0026#34;, \u0026#34;cloudformation:GetTemplateSummary\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:cloudformation:eu-central-1:\u0026lt;account-id\u0026gt;:stack/search-morling-dev/*\u0026#34;, \u0026#34;arn:aws:cloudformation:eu-central-1:aws:transform/Serverless-2016-10-31\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;apigateway:POST\u0026#34;, \u0026#34;apigateway:PATCH\u0026#34;, \u0026#34;apigateway:GET\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:apigateway:eu-central-1::/restapis\u0026#34;, \u0026#34;arn:aws:apigateway:eu-central-1::/restapis/*\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;apigateway:POST\u0026#34;, \u0026#34;apigateway:GET\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:apigateway:eu-central-1::/usageplans\u0026#34;, \u0026#34;arn:aws:apigateway:eu-central-1::/usageplans/*\u0026#34;, \u0026#34;arn:aws:apigateway:eu-central-1::/apikeys\u0026#34;, 
\u0026#34;arn:aws:apigateway:eu-central-1::/apikeys/search-morling-dev-apikey\u0026#34; ] } ] } Perhaps this could be trimmed down some more, but I felt it’s good enough for my purposes.\nPerformance At this point I haven’t conducted any systematic performance testing yet. There’s definitely a significant difference in terms of latency between running things locally on my (not exactly new) laptop and on AWS Lambda. Where the app starts up in ~30 ms locally, it’s ~180 ms when deployed to Lambda. Note this is only the number reported by Quarkus itself, the entire cold start duration of the application on Lambda, i.e. including the time required for fetching the code to the execution environment and starting the container, is ~370 ms (with 256 MB RAM assigned). Due to the little trick described above, though, a visitor is very unlikely to ever experience this delay when executing a query.\nSimilarly, there’s a substantial difference in terms of request execution duration. Still, when running a quick test of the deployed service via Siege, the vast majority of Lambda executions clocked in well below 100 ms (depending on the number of query hits which need result highlighting), putting them into the lowest bracket of billed Lambda execution time. As I learned, Lambda allocates CPU resources proportionally to assigned RAM, meaning assigning twice as much RAM should speed up execution, also if my application actually does not need that much memory. Indeed, with 512 MB RAM assigned, Lambda execution is down to ~30 - 40 ms after some warm-up, which is more than good enough for my purposes.\nRaw Lambda execution of course is only one part of the overall request duration, on top of that some time is spent in the API Gateway and on the wire to the user; The service is deployed in the AWS eu-central-1 region (Frankfurt, Germany), yielding roundtrip times for me, living a few hundred km away, between 50 - 70 ms (again with 512 MB RAM). 
With longer distances, network latencies outweigh the Lambda execution time: My good friend Eric Murphy from Seattle in the US reported a roundtrip time of ~240 ms when searching for \u0026#34;Java\u0026#34;, which I think is still quite good, given the long distance.\nCost Control The biggest issue for me as a hobbyist when using pay-per-use services like AWS Lambda and API Gateway is cost control. Unlike typical enterprise scenarios where you might be willing to accept higher cost for your service in case of growing demand, in my case I’d rather set up a fixed spending limit and shut down my search service for the rest of the month, once that has been reached. I absolutely cannot have an attacker doing millions and millions of calls against my API which could cost me a substantial amount of money.\nUnfortunately, there’s no easy way on AWS for setting up a maximum spending after which all service consumption would be stopped. Merely setting up a budget alert won’t cut it either, as this won’t help me while sitting on a plane for 12h (whenever that will be possible again…​) or being on vacation for three weeks. And needless to say, I don’t have an ops team monitoring my blog infrastructure 24/7 either.\nSo what to do to keep costs under control? An API usage plan is the first part of the answer. It allows you to set up a quota (maximum number of calls in a given time frame) which is pretty much what I need. Any calls beyond the quota are rejected by the API Gateway and not charged.\nThere’s one caveat though: a usage plan is tied to an API key, which the caller needs to pass using the X-API-Key HTTP request header. The idea being that different usage plans can be put in place for different clients of an API. Any calls without the API key are not charged either. Unfortunately though this doesn’t play well with CORS preflight requests as needed in my particular use case. 
Such requests will be sent by the browser before the actual GET calls to validate that the server actually allows for that cross-origin request. CORS preflight requests cannot have any custom request headers, though, meaning they cannot be part of a usage plan. The AWS docs are unclear whether those preflight requests are charged or not, and in a way it seems unfair if they were charged given there’s no way to prevent this situation. But at this point it is fair to assume they are charged and we need a way to prevent having to pay for a gazillion preflight calls by a malicious actor.\nIn good software developer’s tradition I turned to Stack Overflow for finding help, and indeed I received a nice idea: A budget alert can be linked with an SNS topic, to which a message will be sent once the alert triggers. Then another Lambda function can be used to set the allowed rate of API invocations to 0, effectively disabling the API, preventing any further cost to pile up. A bit more complex than I was hoping for, but it does the trick. Thanks a lot to Harish for providing this nice answer on Stack Overflow and his blog! I implemented this solution and sleep much better now.\nNote that you should set the alert to a lower value than what you’re actually willing to spend, as billing happens asynchronously and requests might come in some more time until the alert triggers: as per Corey Quinn, there’s an \u0026#34;8-48 hour lag between \u0026#39;you incur the charge\u0026#39; and \u0026#39;it shows up in the billing system where an alert can see it and thus fire\u0026#39;\u0026#34;. It’s therefore also a good idea to reduce the allowed request rate. E.g. in my case I’m not expecting really that there’d be more than let’s say 25 concurrent requests (unless this post hits the Hackernews front page of course), so setting the allowed rate to that value helps to at least slow down the spending until the alert triggers.\nWith these measures in place, there should (hopefully!) 
be no bad surprises at the end of the month. Assuming a (very generously estimated) number of 10K search queries per month, each returning a payload of 5 KB, I’d be looking at an invoice over EUR 0.04 for the API Gateway, while the Lambda executions would be fully covered by the AWS free tier. That seems manageable :)\nWrap-Up and Outlook Having rolled out the search feature for this blog a few days ago, I’m really happy with the outcome. It was a significant amount of work to put everything together, but I think a custom search is a great addition to this site which hopefully proves helpful to my readers. Serverless is a perfect architecture and deployment option for this use case, being very cost-efficient for the expected low volume of requests, and providing a largely hands-off operations experience for myself.\nWith AOT compilation down to native binaries and enabling frameworks like Quarkus, Java definitely is in the game for building Serverless apps. Its huge eco-system of libraries such as Apache Lucene, sophisticated tooling and solid performance make it a very attractive implementation choice. Basing the application on Quarkus makes it a matter of configuration to switch between creating a deployment package for Lambda and a regular container image, avoiding any kind of lock-in into a specific platform.\nEnabling libraries for being used in native binaries can be a daunting task, but over time I’d expect either library authors themselves to do the required adjustment to smoothen that experience, and of course the growing number of Quarkus extensions also helps to use more and more Java libraries in native apps. I’m also looking forward to Project Leyden, which aims at making AOT compilation a part of the Java core platform.\nThe deployment to AWS Lambda and API Gateway was definitely more involved than I had anticipated; things like IAM and budget control are more complex than I think they could and should be. 
That there is no way to set up a hard spending cap is a severe shortcoming; hobbyists like myself should be able to explore this platform without having to fear any surprise AWS bills. It’s particularly bothersome that API usage plans are not a 100% safe way to enforce API quotas, as they cannot be applied to unauthorized CORS preflight requests and custom scripting is needed in order to close this loophole.\nBut then this experiment also was an interesting learning experience for me; working on libraries and integration solutions most of the time during my day job, I sincerely enjoyed the experience of designing a service from the ground-up and rolling it out into \u0026#34;production\u0026#34;, if I may dare to use that term here.\nWhile the search functionality is rolled out on my blog, ready for you to use, there are a few things I’d like to improve and expand going forward:\nCI pipeline: Automatically re-building and deploying the search service after changes to the contents of my blog; this should hopefully be quite easy using GitHub Actions\nPerformance improvements: While the performance of the query service definitely is good enough, I’d like to see whether and how it could be tuned here and there. Tooling might be challenging there; where I’d use JDK Flight Recorder and Mission Control with a JVM based application, I’m much less familiar with equivalent tooling for native binaries. One option I’d like to explore in particular is taking advantage of the bytecode recording capability of Quarkus: bytecode instructions for creating the in-memory data structure of the Lucene index could be recorded at build time and then just be executed at application start-up; this might be the fastest option for loading the index in my special use case of a read-only index\nServerless comments: Currently I’m using Disqus for the commenting feature of the blog. It’s not ideal in terms of privacy and page loading speed, which is why I’m looking for alternatives. 
One idea could be a custom Serverless commenting functionality, which would be very interesting to explore, in particular as it shifts the focus from a purely immutable application to a stateful service that’ll require some means of modifiable, persistent storage\nIn the meantime, you can find the source code of the Serverless search feature on GitHub. Feel free to take the code and deploy it to your own website!\nMany thanks to Hans-Peter Grahsl and Eric Murphy for their feedback while writing this post!\n","id":64,"publicationdate":"Jul 29, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eI have built a custom search functionality for this blog,\nbased on Java and the Apache Lucene full-text search library,\ncompiled into a native binary using the Quarkus framework and GraalVM.\nIt is deployed as a Serverless application running on AWS Lambda,\nproviding search results without any significant cold start delay.\nIf you thought Java wouldn’t be the right language for this job, keep reading;\nin this post I’m going to give an overview over the implementation of this feature and my learnings along the way.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"How I Built a Serverless Search for My Blog","uri":"https://www.morling.dev/blog/how-i-built-a-serverless-search-for-my-blog/"},{"content":" Ahead-of-time compilation (AOT) is the big topic in the Java ecosystem lately: by compiling Java code to native binaries, developers and users benefit from vastly improved start-up times and reduced memory usage. The GraalVM project made huge progress towards AOT-compiled Java applications, and Project Leyden promises to standardize AOT in a future version of the Java platform.\nThis makes it easy to miss out on significant performance improvements which have been made on the JVM in recent Java versions, in particular when it comes to faster start-up times. 
Besides a range of improvements related to class loading, linking and bytecode verification, substantial work has been done around class data sharing (CDS). Faster start-ups are beneficial in many ways: shorter turnaround times during development, quicker time-to-first-response for users in coldstart scenarios, cost savings when billed by CPU time in the cloud.\nWith CDS, class metadata is persisted in an archive file, which during subsequent application starts is mapped into memory. This is faster than loading the actual class files, resulting in reduced start-up times. When starting multiple JVM processes on the same host, read-only archives of class metadata can also be shared between the VMs, so that less memory is consumed overall.\nOriginally a partially commercial feature of the Oracle JDK, CDS was completely open-sourced in JDK 10 and got incrementally improved since then in a series of Java improvement proposals:\nJEP 310, Application Class-Data Sharing (AppCDS), in JDK 10: \u0026#34;To improve startup and footprint, extend the existing [CDS] feature to allow application classes to be placed in the shared archive\u0026#34;\nJEP 341, Default CDS Archives, in JDK 12: \u0026#34;Enhance the JDK build process to generate a class data-sharing (CDS) archive, using the default class list, on 64-bit platforms\u0026#34;\nJEP 350, Dynamic CDS Archives, in JDK 13: \u0026#34;Extend application class-data sharing to allow the dynamic archiving of classes at the end of Java application execution. The archived classes will include all loaded application classes and library classes that are not present in the default, base-layer CDS archive\u0026#34;\nIn the remainder of this blog post we’ll discuss how to automatically create AppCDS archives as part of your (Maven) project build, based on the improvements made with JEP 350. I.e. Java 13 or later is a prerequisite for this. 
To learn more about using CDS with the current LTS release JDK 11 and about CDS in general, refer to the excellent blog post on everything CDS by Nicolai Parlog.\nManually Creating CDS Archives At first let’s see what’s needed to manually create and use an AppCDS archive (note I’m going to use \u0026#34;AppCDS\u0026#34; and \u0026#34;CDS\u0026#34; somewhat interchangeably for the sake of brevity). Subsequently, we’ll discuss how the task can be automated in a Maven project build.\nTo have an example to work with which goes beyond a plain \u0026#34;Hello World\u0026#34;, I’ve created a small web application for managing personal to-dos, using the Quarkus stack. If you’d like to follow along, clone the repo and build the project:\ngit clone git@github.com:gunnarmorling/quarkus-cds.git cd quarkus-cds mvn clean verify -DskipTests=true The application uses a Postgres database for persisting the to-dos; fire it up via Docker:\ncd compose docker run -d -p 5432:5432 --name pgdemodb \\ -v $(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql \\ -e POSTGRES_USER=todouser \\ -e POSTGRES_PASSWORD=todopw \\ -e POSTGRES_DB=tododb postgres:11 The next step is to run the application and create the CDS archive file. Do so by passing the -XX:ArchiveClassesAtExit option:\njava -XX:ArchiveClassesAtExit=target/app-cds.jsa \\ (1) -jar target/todo-manager-1.0.0-SNAPSHOT-runner.jar 1 Triggers creation of a CDS archive at the given location upon application shutdown Only loaded classes will be added to the archive. As classloading on the JVM happens lazily, you must invoke some functionality in your application in order to cause all the relevant classes to be loaded. For that to happen, open the application’s API endpoint in a browser or invoke it via curl, httpie or similar:\nhttp localhost:8080/api Stop the application by hitting Ctrl+C. This will create the CDS archive under target/app-cds.jsa. In our case it should have a size of about 41 MB. 
Also observe the log messages about classes which were skipped from archiving:\n... [190.220s][warning][cds] Skipping java/lang/invoke/LambdaForm$MH+0x0000000800bd0c40: Hidden or Unsafe anonymous class [190.220s][warning][cds] Skipping java/lang/invoke/LambdaForm$DMH+0x0000000800fdc840: Hidden or Unsafe anonymous class [190.220s][warning][cds] Pre JDK 6 class not supported by CDS: 46.0 antlr/TokenStreamIOException ... Mostly this is about hidden or anonymous classes which cannot be archived; there’s not so much you can do about that (apart from using fewer lambda expressions perhaps…​).\nThe hint on old classfile versions is more actionable: only classes using classfile format 50 (= JDK 1.6) or newer are supported by CDS. In the case at hand, the classes from Antlr 2.7.7 are using classfile format 46 (which was introduced in Java 1.2) and thus cannot be added to the CDS archive. Note this also applies to any subclasses, even if they themselves use a newer classfile format version.\nIt’s thus a good idea to check whether you can upgrade to newer versions of your dependencies, as this may result in more classes becoming available for CDS, resulting in better start-up times in turn.\nUsing the CDS Archive Now let’s run the application again, this time using the previously created CDS archive:\njava -XX:SharedArchiveFile=target/app-cds.jsa \\ (1) -Xlog:class+load:file=target/classload.log \\ (2) -Xshare:on \\ (3) -jar target/todo-manager-1.0.0-SNAPSHOT-runner.jar 1 The path to the CDS archive 2 Classloading logging makes it possible to verify whether the CDS archive gets applied as expected 3 While class data sharing is enabled by default on JDK 12 and newer, explicitly enforcing it will ensure an error is raised if something is wrong, e.g. 
a mismatch of Java versions between building and using the archive When examining the classload.log file, you should see how most class metadata is obtained from the CDS archive (\u0026#34;source: shared objects file\u0026#34;), while some classes such as the ancient Antlr classes are loaded just as usual from the corresponding JAR:\n[0.016s][info][class,load] java.lang.Object source: shared objects file [0.016s][info][class,load] java.io.Serializable source: shared objects file [0.016s][info][class,load] java.lang.Comparable source: shared objects file [0.016s][info][class,load] java.lang.CharSequence source: shared objects file ... [2.555s][info][class,load] antlr.Parser source: file:/.../antlr.antlr-2.7.7.jar ... Note it is vital that the exact same Java version is used as when creating the archive, otherwise an error will be raised. Unfortunately, this also means that AppCDS archives cannot be built cross-platform. This would be very useful, e.g. when building a Java application on macOS or Windows, which should be packaged in a Linux container. If you are aware of a way for doing so, please let me know in the comments below.\nCDS and the Java Module System Beginning with Java 11, not only classes from the classpath can be added to CDS archives, but also classes from the module path of a modularized Java application. One important detail to consider there is that the --upgrade-module-path and --patch-module options will cause CDS to be disabled, or disallowed if -Xshare:on is specified. This is to avoid a mismatch of class metadata in the CDS archive and classes brought in by a newer module version.\nCreating CDS Archives in Your Maven Build Manually creating a CDS archive is neither very efficient nor reliable, so let’s see how the task can be automated as part of your project build. 
The following shows the required configuration when using Apache Maven, but of course the same approach could be implemented with Gradle or any other build system.\nThe basic idea is to follow the same steps as before, but executed as part of the Maven build:\nstart up the application with the -XX:ArchiveClassesAtExit option\ninvoke some application functionality to initiate the loading of all relevant classes\nstop the application\nIt might appear as a compelling idea to produce the CDS archive as part of regular test execution, e.g. via JUnit. This will not work though, as the classpath at the time of using the CDS archive must not miss any entries from the classpath at the time of creating it. As during test execution all the test-scoped dependencies will be part of the classpath, any CDS archive created that way couldn’t be used when running the application later on without those test dependencies.\nSteps 1. and 3. can be automated with help of the Process-Exec Maven plug-in, binding it to the pre-integration-test and post-integration-test build phases, respectively. While I was thinking of using the more widely known Exec plug-in initially, this turned out to not be viable as there’s no way for stopping any forked process in a later build phase.\nHere’s the relevant configuration:\n... 
\u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;com.bazaarvoice.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;process-exec-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;0.9\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; (1) \u0026lt;id\u0026gt;app-cds-creation\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;pre-integration-test\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;start\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;name\u0026gt;todo-manager\u0026lt;/name\u0026gt; \u0026lt;healthcheckUrl\u0026gt;http://localhost:8080/\u0026lt;/healthcheckUrl\u0026gt; (2) \u0026lt;arguments\u0026gt; \u0026lt;argument\u0026gt;java\u0026lt;/argument\u0026gt; (3) \u0026lt;argument\u0026gt;-XX:ArchiveClassesAtExit=app-cds.jsa\u0026lt;/argument\u0026gt; \u0026lt;argument\u0026gt;-jar\u0026lt;/argument\u0026gt; \u0026lt;argument\u0026gt; ${project.build.directory}/${project.artifactId}-${project.version}-runner.jar \u0026lt;/argument\u0026gt; \u0026lt;/arguments\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;execution\u0026gt; (4) \u0026lt;id\u0026gt;stop-all\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;post-integration-test\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;stop-all\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... 1 Start up the application in the pre-integration-test build phase 2 The health-check URL is used to await application start-up before proceeding with the next build phase 3 Assemble the java invocation 4 Stop the application in the post-integration-test build phase What remains to be done is the automation of step 2, the invocation of the required application logic so as to trigger the loading of all relevant classes. This can be done with help of the Maven Failsafe plug-in. 
A simple \u0026#34;integration test\u0026#34; via REST Assured does the trick:\npublic class ExampleResourceAppCds { @Test public void getAll() { given() .when() .get(\u0026#34;/api\u0026#34;) .then() .statusCode(200); } } We just need to configure a specific execution of the plug-in, which only picks up any test classes whose names end with *AppCds.java, so to keep them apart from actual integration tests:\n... \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-failsafe-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.0.0-M4\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;integration-test\u0026lt;/goal\u0026gt; \u0026lt;goal\u0026gt;verify\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;includes\u0026gt; \u0026lt;include\u0026gt;**/*AppCds.java\u0026lt;/include\u0026gt; \u0026lt;/includes\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... And that’s all we need; when now building the project via mvn clean verify, a CDS archive will be created at target/app-cds.jsa. You can find the complete example project and steps for building/running it on GitHub.\nWhat Do You Gain? Creating a CDS archive is nice, but is it also worth the effort? In order to answer this question, I’ve done some measurements of the \u0026#34;time-to-first-response\u0026#34; metric, following the Quarkus guide on measuring performance. I.e. 
instead of awaiting some rather meaningless \u0026#34;start-up complete\u0026#34; status, which could arbitrarily be tweaked by means of lazy initialization, this measures the time until the application is actually ready to handle the first incoming request after start-up.\nI’ve done measurements on OpenJDK 1.8.0_252 (AdoptOpenJDK build), OpenJDK 14.0.1 (upstream build, without and with AppCDS), and OpenJDK 15-ea-b26 (upstream build, with AppCDS). Please see the README file of the example repo for the exact steps.\nHere are the numbers, averaged over ten runs each:\nUpdate, June 12th: I had originally classload logging enabled for the OpenJDK 14 AppCDS runs, which added an unnecessary overhead (thanks a lot to Claes Redestad for pointing this out!). The numbers and chart have been updated accordingly. I’ve also added numbers for OpenJDK 15-ea.\nTime-to-first-response values are 2s 267ms, 2s 162ms, 1s 483ms, and 1s 279ms. I.e. on my machine (2014 MacBook Pro), with this specific workload, there’s an improvement of ~100ms just by upgrading to the current JDK, and of another ~700ms by using AppCDS.\nWith OpenJDK 15 things will further improve. The latest EA build at the time of writing (b26) shortens time-to-first-response by another ~200ms. The upcoming EA build 27 should bring another improvement, as Lambda proxy classes will be added to AppCDS archives then.\nThat all is definitely a nice improvement, in particular as we get it essentially for free, without any changes to the actual application itself. You should contrast this with the additional size of the application distribution, though. E.g. when obtaining the application as a container image from a remote container registry, downloading the additional ~40 MB might take longer than the time saved during application start-up. 
Typically, this will only affect the first start-up on a particular node, though, after which the image will be cached locally.\nAs always when it comes to any kinds of performance numbers, please take these numbers with a grain of salt, do your own measurements, using your own applications and in your own environment.\nAddressing Different Workload Profiles If your application supports different \u0026#34;work modes\u0026#34;, e.g. \u0026#34;online\u0026#34; and \u0026#34;batch\u0026#34;, which work with a largely differing set of classes, you also might consider creating different CDS archives for the specific workloads. This might give you a good balance between additional size and realized improvements of start-up times, when for instance dealing with a large monolithic application instead of more fine-grained microservices.\nWrap-Up AppCDS provides Java developers with a useful tool for reducing start-up times of their applications, without requiring any code changes. For the example discussed, we could observe an improvement of the time-to-first-response metric by about 30% when running with OpenJDK 14. Other users reported even bigger improvements.\nWe didn’t discuss any potential memory improvements due to CDS when sharing class metadata between multiple JVMs on one host. In containerized server applications, with each JVM being packaged in its own container image, this won’t play a role. It could make a difference on desktop systems, though. For instance multiple instances of the Java language server, as leveraged by VSCode and other editors, could benefit from that.\nThat all being said, when raw start-up time is your primary concern, e.g. in a serverless or Function-based setting, you should look at AOT compilation with GraalVM (or Project Leyden in the future). 
This will bring down start-up times to a completely different level; for example the todo manager application would return a first response within a few tens of milliseconds when executed as a native image via GraalVM.\nBut AOT is not always an option, nor does it always make sense: the JVM may offer better latency than native binaries, external dependencies might not be ready for usage in AOT-compiled native images yet, or you simply might want to be able to benefit from all the JVM goodness, like familiar debugging tools, the JDK Flight Recorder, or JMX. In that case, CDS can give you a nice start-up time improvement, solely by means of adding a few steps to your build process.\nBesides class data sharing in OpenJDK, there are some other related techniques for improving start-up times which are worth exploring:\nEclipse OpenJ9 has its own implementation of class data sharing\nAlibaba’s Dragonwell distribution of the OpenJDK comes with JWarmUp, a tool for speeding up initial JIT compilations\nTo learn more about AppCDS, a long yet insightful post is this one by Vladimir Plizga. Volker Simonis did another interesting write-up. Also take a look at the CDS documentation in the reference docs of the java command.\nLastly, the Quarkus team is working on out-of-the-box support for CDS archives. 
This could fully automate the creation of an archive for all required classes without any further configuration, making it even easier to benefit from the start-up time improvements promised by CDS.\n","id":65,"publicationdate":"Jun 11, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAhead-of-time compilation (AOT) is \u003cem\u003ethe\u003c/em\u003e big topic in the Java ecosystem lately:\nby compiling Java code to native binaries, developers and users benefit from vastly improved start-up times and reduced memory usage.\nThe \u003ca href=\"https://www.graalvm.org/\"\u003eGraalVM\u003c/a\u003e project made huge progress towards AOT-compiled Java applications,\nand \u003ca href=\"https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html\"\u003eProject Leyden\u003c/a\u003e promises to standardize AOT in a future version of the Java platform.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis makes it easy to miss out on significant performance improvements which have been made on the JVM in recent Java versions,\nin particular when it comes to \u003ca href=\"https://cl4es.github.io/2019/11/20/OpenJDK-Startup-Update.html\"\u003efaster start-up times\u003c/a\u003e.\nBesides a range of improvements related to class loading, linking and bytecode verification,\nsubstantial work has been done around \u003ca href=\"https://docs.oracle.com/en/java/javase/14/vm/class-data-sharing.html\"\u003eclass data sharing\u003c/a\u003e (CDS).\nFaster start-ups are beneficial in many ways:\nshorter turnaround times during development,\nquicker time-to-first-response for users in coldstart scenarios,\ncost savings when billed by CPU time in the cloud.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWith CDS, class metadata is persisted in an archive file,\nwhich during subsequent application starts is mapped into memory.\nThis is faster than loading the actual class 
files, resulting in reduced start-up times.\nWhen starting multiple JVM processes on the same host, read-only archives of class metadata can also be shared between the VMs, so that less memory is consumed overall.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Building Class Data Sharing Archives with Apache Maven","uri":"https://www.morling.dev/blog/building-class-data-sharing-archives-with-apache-maven/"},{"content":" Do you remember Angus \u0026#34;Mac\u0026#34; MacGyver? The always creative protagonist of the popular 80ies/90ies TV show, who could solve about any problem with nothing more than a Swiss Army knife, duct tape, shoe strings and a paper clip?\nThe single message transformations (SMTs) of Kafka Connect are almost as versatile as MacGyver’s Swiss Army knife:\nHow to change the timezone or format of date/time message fields?\nHow to change the topic a specific message gets sent to?\nHow to filter out specific records?\nSMTs can be the answer to these and many other questions that come up in the context of Kafka Connect. Applied to source or sink connectors, SMTs allow to modify Kafka records before they are sent to Kafka, or after they are consumed from a topic, respectively.\nIn this post I’d like to focus on some interesting (hopefully anyways) usages of SMTs. Those use cases are mostly based on my experiences from using Kafka Connect with Debezium, an open-source platform for change data capture (CDC). I also got some great pointers on interesting SMT usages when asking the community about this on Twitter some time ago:\nI definitely recommend to check out the thread; thanks a lot to all who replied! In order to learn more about SMTs in general, how to configure them etc., refer to the resources given towards the end of this post.\nFor each category of use cases, I’ve also asked our sympathetic TV hero for his opinion on the usefulness of SMTs for the task at hand. 
You can find his rating at the end of each section, ranging from 📎 (poor fit) to 📎📎📎📎📎 (perfect fit).\nFormat Conversions Probably the most common application of SMTs is format conversion, i.e. adjustments to type, format and representation of data. This may apply to entire messages, or to specific message attributes. Let’s first look at a few examples for converting individual message attribute formats:\nTimestamps: Different systems tend to have different assumptions of how timestamps should be typed and formatted. Debezium for instance represents most temporal column types as milliseconds since epoch. Change event consumers on the other hand might expect such date and time values using Kafka Connect’s Date type, or as an ISO-8601 formatted string, potentially using a specific timezone\nValue masking: Sensitive data might have to be masked or truncated, or specific fields should even be removed altogether; the org.apache.kafka.connect.transforms.MaskField and ReplaceField SMTs shipping with Kafka Connect out of the box come in handy for that\nNumeric types: Similar to timestamps, requirements around the representation of (decimal) numbers may differ between systems; e.g. 
Kafka Connect’s Decimal type allows to convey arbitrary-precision decimals, but its binary representation of numbers might not be supported by all sink connectors and consumers\nName adjustments: Depending on the chosen serialization formats, specific field names might be unsupported; when working with Apache Avro for instance, field names must not start with a number\nIn all these cases, either existing, ready-made SMTs or bespoke implementations can be used to apply the required attribute type and/or format conversions.\nWhen using Kafka Connect for integrating legacy services and databases with newly built microservices, such format conversions can play an important role for creating an anti-corruption layer: by using better field names, choosing more suitable data types or by removing unneeded fields, SMTs can help to shield a new service’s model from the oddities and quirks of the legacy world.\nBut SMTs cannot only modify the representation of single fields, also the format and structure of entire messages can be adjusted. E.g. Kafka Connect’s ExtractField transformation allows to extract a single field from a message and propagate that one. A related SMT is Debezium’s SMT for change event flattening. It can be used to convert the complex Debezium change event structure with old and new row state, metadata and more, into a flat row representation, which can be consumed by many existing sink connectors.\nSMTs also allow to fine-tune schema namespaces; that can be of interest when working with a schema registry for managing schemas and their versions, and specific schema namespaces should be enforced for the messages on given topics. 
Two more, very useful examples of SMTs in this category are kafka-connect-transform-xml and kafka-connect-json-schema by Jeremy Custenborder, which will take XML or text and produce a typed Kafka Connect Struct, based on a given XML schema or JSON schema, respectively.\nLastly, as a special kind of format conversion, SMTs can be used to modify or set the key of Kafka records. This may be desirable if a source connector doesn’t produce any meaningful key, but one can be extracted from the record value. Also changing the message key can be useful, when considering subsequent stream processing. Choosing matching keys right at the source side e.g. allows for joining multiple topics via Kafka Streams, without the need for re-keying records.\nMac’s rating: 📎📎📎📎📎 SMTs are the perfect tool for format conversions of Kafka Connect records\nEnsuring Backwards Compatibility Changes to the schema of Kafka records can potentially be disruptive for consumers. If for instance a record field gets renamed, a consumer must be adapted accordingly, reading the value using the new field name. In case a field gets dropped altogether, consumers must not expect this field any longer.\nMessage transformations can help with such transition from one schema version to the next, thus reducing the coupling of the lifecycles of message producers and consumers. In case of a renamed field, an SMT could add the field another time, using the original name. That’ll allow consumers to continue reading the field using the old name and to be upgraded to use the new name at their own pace. After some time, once all consumers have been adjusted, the SMT can be removed again, only exposing the new field name going forward. Similarly, a field that got removed from a message schema could be re-added, e.g. using some sort of constant placeholder value. In other cases it might be possible to derive the field value from other, still existing fields. 
Again consumers could then be updated at their own pace to not expect and access that field any longer.\nIt should be said though that there are limits for this usage: e.g. when changing the type of a field, things quickly become tricky. One option could be a multi-step approach where at first a separate field with the new type is added, before renaming it again as described above.\nMac’s rating: 📎📎📎 SMTs can primarily help to address basic compatibility concerns around schema evolution\nFiltering and Routing When applied on the source side, SMTs allow to filter out specific records produced by the connector. They also can be used for controlling the Kafka topic a record gets sent to. That’s in particular interesting when filtering and routing is based on the actual record contents. In an IoT scenario for instance where Kafka Connect is used to ingest data from some kind of sensors, an SMT might be used to filter out all sensor measurements below a certain threshold, or route measurement events above a threshold to a special topic.\nDebezium provides a range of SMTs for record filtering and routing:\nThe logical topic routing SMT allows to send change events originating from multiple tables to the same Kafka topic, which can be useful when working with partition tables in Postgres, or with data that is sharded into multiple tables\nThe Filter and ContentBasedRouter SMTs let you use script expressions in languages such as Groovy or JavaScript for filtering and routing change events based on their contents; such script-based approach can be an interesting middleground between ease-of-use (no Java code must be compiled and deployed to Kafka Connect) and expressiveness; e.g. here is how the routing SMT could be used with GraalVM’s JavaScript engine for routing change events from a table with purchase orders to different topics in Kafka, based on the order type:\n... 
transforms=route transforms.route.type=io.debezium.transforms.ContentBasedRouter transforms.route.topic.regex=.*purchaseorders transforms.route.language=jsr223.graal.js transforms.route.topic.expression= value.after.ordertype == \u0026#39;B2B\u0026#39; ? \u0026#39;b2b_orders\u0026#39; : \u0026#39;b2c_orders\u0026#39; ... The outbox event router comes in handy when implementing the transactional outbox pattern for data propagation between microservices: it can be used to send events originating from a single outbox table to a specific Kafka topic per aggregate (when thinking of domain driven design) or event type\nThere are also two SMTs for routing purposes in Kafka Connect itself: RegexRouter which allows to re-route records to different topics based on regular expressions, and TimestampRouter for determining topic names based on the record’s timestamp.\nWhile routing SMTs usually are applied to source connectors (defining the Kafka topic a record gets sent to), it can also make sense to use them with sink connectors. That’s the case when a sink connector derives the names of downstream tables, indexes or similar from the topic name.\nMac’s rating: 📎📎📎📎📎 Message filtering and topic routing — no problem for SMTs\nTombstone Handling Tombstone records are Kafka records with a null value. They carry special semantics when working with compacted topics: during log compaction, all records with the same key as a tombstone record will be removed from the topic.\nTombstones will be retained on a topic for a configurable time before compaction happens (controlled via delete.retention.ms topic setting), which means that also Kafka Connect sink connectors need to handle them. Unfortunately though, not all connectors are prepared for records with a null value, typically resulting in NullPointerExceptions and similar. 
A filtering SMT such as the one above can be used to drop tombstone records in such case.\nBut also the exact opposite — producing tombstone records — can be useful: some sink connectors use tombstone records as the indicator to delete corresponding rows from a downstream datastore. Now when using a CDC connector like Debezium to capture changes from a database where \u0026#34;soft deletes\u0026#34; are used (i.e. records are not physically deleted, but a logically deleted flag is set to true when deleting a record), those change events will be exported as update events (which they technically are). A bespoke SMT can be used to translate these update events into tombstone records, triggering the deletion of corresponding records in downstream datastores.\nMac’s rating: 📎📎📎📎 SMTs work well to discard tombstones or convert soft delete events into tombstones. What’s not possible though is to keep the original event and produce an additional tombstone record at the same time\nExternalizing Large Payloads Even some advanced enterprise integration patterns can be implemented with the help of SMTs, one example being the claim check pattern. This pattern comes in handy in situations like this:\nA message may contain a set of data items that may be needed later in the message flow, but that are not necessary for all intermediate processing steps. We may not want to carry all this information through each processing step because it may cause performance degradation and makes debugging harder because we carry so much extra data.\n— Gregor Hohpe, Bobby Woolf; Enterprise Integration Patterns\nA specific example could again be a CDC connector that captures changes from a database table Users, with a BLOB column that contains the user’s profile picture (surely not a best practice, still not that uncommon in reality…).\nApache Kafka and Large Messages Apache Kafka isn’t meant for large messages. 
The maximum message size is 1 MB by default, and while this can be increased, benchmarks are showing best throughput for much smaller messages. Strategies like chunking and externalizing large payloads can thus be vital in order to ensure a satisfying performance.\nWhen propagating change data events from that table to Apache Kafka, adding the picture data to each event poses a significant overhead. In particular, if the picture BLOB hasn’t changed between two events at all.\nUsing an SMT, the BLOB data could be externalized to some other storage. On the source side, the SMT could extract the image data from the original record and e.g. write it to a network file system or an Amazon S3 bucket. The corresponding field in the record would be updated so it just contains the unique address of the externalised payload, such as the S3 bucket name and file path:\nAs an optimization, it could be avoided to re-upload unchanged file contents another time by comparing earlier and current hash of the externalized file.\nA corresponding SMT instance applied to sink connectors would retrieve the identifier of the externalized files from the incoming record, obtain the contents from the external storage and put it back into the record before passing it on to the connector.\nMac’s rating: 📎📎📎📎 SMTs can help to externalize payloads, avoiding large Kafka records. Relying on another service increases overall complexity, though\nLimitations As we’ve seen, single message transformations can help to address quite a few requirements that commonly come up for users of Kafka Connect. But there are limitations, too; Like MacGyver, who sometimes has to reach for some other tool than his beloved Swiss Army knife, you shouldn’t think of SMTs as the perfect solution all the time.\nThe biggest shortcoming is already hinted at in their name: SMTs only can be used to process single records, one at a time. E.g. 
you cannot split up a record into multiple ones using an SMT, as they only can return (at most) one record. Also any kind of stateful processing, like aggregating data from multiple records, or correlating records from several topics is off limits for SMTs. For such use cases, you should be looking at stream processing technologies like Kafka Streams and Apache Flink; also integration technologies like Apache Camel can be of great use here.\nOne thing to be aware of when working with SMTs is configuration complexity; when using generic, highly configurable SMTs, you might end up with lengthy configuration that’s hard to grasp and debug. You might be better off implementing a bespoke SMT which is focussing on one particular task, leveraging the full capabilities of the Java programming language.\nSMT Testing Whether you use ready-made SMTs by means of configuration, or you implement custom SMTs in Java, testing your work is essential.\nWhile unit tests are a viable option for basic testing of bespoke SMT implementations, integration tests running against Kafka Connect connectors are recommended for testing SMT configurations. That way you’ll be sure that the SMT can process actual messages and it has been configured the way you intended to.\nTestcontainers and the Debezium support for Testcontainers are a great foundation for setting up all the required components such as Apache Kafka, Kafka Connect, connectors and the SMTs to test.\nA specific feature I wished for every now and then is the ability to apply SMTs only to a specific sub-set of the topics created or consumed by a connector. In particular if connectors create different kinds of topics (like an actual data topic and another one with metadata), it can be desirable to apply SMTs only to the topics of one group but not the other. 
This requirement is captured in KIP-585 (\u0026#34;Filter and Conditional SMTs\u0026#34;), please join the discussion on that one if you got requirements or feedback related to that.\nLearning More There are several great presentations and blog posts out there which describe in depth what SMTs are, how you can implement your own one, how they are configured etc.\nHere are a few resources I found particularly helpful:\nKIP-66: The original KIP (Kafka Improvement Proposal) that introduced SMTs\nSingle Message Transforms are not the Transformations You’re Looking For: A great overview on SMTs, their capabilities as well as limitations, by Ewen Cheslack-Postava\nA hands-on experience with Kafka Connect SMTs: In-depth blog post on SMT use cases, things to be aware of and more, by Gian D’Uia\nNow, considering this wide range of use cases for SMTs, would MacGyver like and use them for implementing various tasks around Kafka Connect? I would certainly think so. But as always, the right tool for the job must be chosen: sometimes an SMT may be a great fit, another time a more flexible (and complex) stream processing solution might be preferable.\nJust as MacGyver, you got to make a call when to use your Swiss Army knife, duct tape or a paper clip.\nMany thanks to Hans-Peter Grahsl for his feedback while writing this blog post!\n","id":66,"publicationdate":"May 14, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eDo you remember Angus \u0026#34;Mac\u0026#34; MacGyver?\nThe always creative protagonist of the popular 80ies/90ies TV show, who could solve about any problem with nothing more than a Swiss Army knife, duct tape, shoe strings and a paper clip?\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe single message transformations (SMTs) of Kafka Connect are almost as versatile as MacGyver’s Swiss Army knife:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv 
class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eHow to change the timezone or format of date/time message fields?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eHow to change the topic a specific message gets sent to?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eHow to filter out specific records?\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSMTs can be the answer to these and many other questions that come up in the context of Kafka Connect.\nApplied to source or sink connectors,\nSMTs allow to modify Kafka records before they are sent to Kafka, or after they are consumed from a topic, respectively.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Single Message Transformations - The Swiss Army Knife of Kafka Connect","uri":"https://www.morling.dev/blog/single-message-transforms-swiss-army-knife-of-kafka-connect/"},{"content":" For libraries and frameworks it’s a common requirement to make specific aspects customizeable via service provider interfaces (SPIs): contracts to be implemented by the application developer, which then are invoked by framework code, adding new or replacing existing functionality.\nOften times, the method implementations of such an SPI need to return value(s) to the framework. An alternative to return values are \u0026#34;emitter parameters\u0026#34;: passed by the framework to the SPI method, they offer an API for receiving value(s) via method calls. Certainly not revolutionary or even a new idea, I find myself using emitter parameters more and more in libraries and frameworks I work on. Hence I’d like to discuss some advantages I perceive about the emitter parameter pattern.\nAn Example As an example, let’s consider a blogging platform which provides an SPI for extracting categories and tags from given blog posts. Application developers can plug in custom implementations of that SPI, e.g. 
based on the latest and greatest algorithms in information retrieval and machine learning. Here’s how a basic SPI contract for this use case could look, using regular method return values:\npublic interface BlogPostDataExtractor { Set\u0026lt;String\u0026gt; extractCategories(String contents); Set\u0026lt;String\u0026gt; extractTags(String contents); } This probably would get the job done, but there are a few problems: any implementation will have to do two passes on the given blog post contents, once in each method — not ideal. Also let’s assume that most blog posts only belong to exactly one category. Implementations still would have to allocate a set for the single returned category.\nWhile there’s not much we can do about the second issue with a return value based design, the former problem could be addressed by combining the two methods:\npublic interface BlogPostDataExtractor { CategoriesAndTags extractCategoriesAndTags(String contents); } Now an implementation can retrieve both categories and tags at once. But it’s worth thinking about how an SPI implementation would instantiate the return type.\nExposing a concrete class to be instantiated by implementors poses a challenge for future evolution of the SPI: following the best practice and making the return object type immutable, all its properties must be passed to its constructor. Now if an additional attribute should be extracted from blog posts, such as a teaser, the existing constructor cannot be modified, so as to not break existing user code. Instead, we’d have to introduce new constructors whenever adding further attributes. Dealing with all these constructors could become quite inconvenient, in particular if a specific SPI implementation is only interested in producing some of the attributes.\nAll in all, for SPIs it’s often a good idea to only expose interfaces, but no concrete classes. 
So we could make the return type an interface and leave it to SPI implementors to create an implementation class, but that’d be rather tedious.\nThe Emitter Parameter Pattern Or, we could provide some sort of builder object which can be used to construct CategoriesAndTags objects. But then why even return an object at all, instead of simply mutating the state of a builder that is provided through a method parameter? And that’s essentially what the emitter parameter pattern is about: passing in an object which can be used to emit the values which should be \u0026#34;returned\u0026#34; by the method.\nI’m not aware of any specific name for this pattern, so I came up with \u0026#34;emitter parameter pattern\u0026#34; (the notion of callback parameters is related, yet different). And hey, perhaps I’ll become famous for coining a design pattern name ;) Please let me know in the comments below if you know this pattern under a different name.\nHere’s what the extractor SPI could look like when designed with an emitter parameter:\npublic interface BlogPostDataExtractor { void extractData(String contents, BlogPostDataReceiver data); (1) interface BlogPostDataReceiver { (2) void addCategory(String category); void addTag(String tag); } } 1 SPI method with input parameter and emitter parameter 2 Emitter parameter type An implementation would emit the retrieved information by invoking the methods on the data parameter:\npublic class MyBlogPostDataExtractor implements BlogPostDataExtractor { public void extractData(String contents, BlogPostDataReceiver data) { String category = ...; Stream\u0026lt;String\u0026gt; tags = ...; data.addCategory(category); tags.forEach(data::addTag); } } This approach nicely avoids all the issues with the return value based design:\nSingle and multiple value case handled uniformly: an implementation can call addCategory() just once, or multiple times; either way, it doesn’t have to deal with the creation of a 
set, list, or other container for the produced value(s)\nFlexible evolution of the SPI contract: new methods such as addTeaser(), or addTags(String…​ tags) can be added to the emitter parameter type, avoiding the creation of more and more return type constructors; as the passed BlogPostDataReceiver instance is controlled by the framework itself, we also could add methods which provide more context required for the task at hand\nNo need for exposing concrete types on the SPI surface: as no return value needs to be instantiated by SPI implementations, the solution works solely with interfaces on the SPI surface; this provides more control to the framework, e.g. the emitter object could be re-used etc.\nFlexible implementation choices: by not requiring SPI implementations to allocate any return objects, the platform gains a lot of flexibility for how it’s processing the emitted values: while it could collect the values in a set or list, it also has the option to not allocate any intermediary collections, but process and pass on values one-by-one in a streaming-based way, without any of this impacting SPI implementors\nNow, are there some downsides to this approach, too? I can see two: if a method only ever should yield a single value, the emitter API might be misleading. We could raise an exception though if an emitter method is called more than once. Also an implementation might hold on to the emitter object and invoke its methods after the call flow has returned from the SPI method, which typically isn’t desirable. Again that’s something that can be prevented by invalidating the emitter object after the SPI method returned, raising an exception in case of further method invocations.\nOverall, I think the emitter parameter pattern is a valuable tool in the box of library and framework authors; it provides flexibility for implementation choices and future evolution when designing SPIs. 
Real-world examples include the ValueExtractor SPI in Bean Validation 2.0 (where it was chosen to provide a uniform way of extracting single and multiple values from container objects) and the ChangeRecordEmitter contract in Debezium’s SPI.\nMany thanks to Hans-Peter Grahsl and Nils Hartmann for reviewing an early version of this blog post.\n","id":67,"publicationdate":"May 4, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor libraries and frameworks it’s a common requirement to make specific aspects customizeable via \u003ca href=\"https://en.wikipedia.org/wiki/Service_provider_interface\"\u003eservice provider interfaces\u003c/a\u003e (SPIs):\ncontracts to be implemented by the application developer, which then are invoked by framework code,\nadding new or replacing existing functionality.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOften times, the method implementations of such an SPI need to return value(s) to the framework.\nAn alternative to return values are \u0026#34;emitter parameters\u0026#34;:\npassed by the framework to the SPI method, they offer an \u003cem\u003eAPI\u003c/em\u003e for receiving value(s) via method calls.\nCertainly not revolutionary or even a new idea,\nI find myself using emitter parameters more and more in libraries and frameworks I work on.\nHence I’d like to discuss some advantages I perceive about the emitter parameter pattern.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"The Emitter Parameter Pattern for Flexible SPI Contracts","uri":"https://www.morling.dev/blog/emitter-parameter-pattern-for-flexible-spis/"},{"content":" Making applications extensible with some form of plug-ins is a very common pattern in software design: based on well-defined APIs provided by the application core, plug-ins can customize an application’s behavior and provide new functionality. 
Examples include desktop applications like IDEs or web browsers, build tools such as Apache Maven or Gradle, as well as server-side applications such as Apache Kafka Connect, a runtime for Kafka connector plug-ins.\nIn this post I’m going to explore how the Java Platform Module System\u0026#39;s notion of module layers can be leveraged for implementing plug-in architectures on the JVM. We’ll also discuss how Layrry, a launcher and runtime for layered Java applications, can help with this task.\nA key requirement for any plug-in architecture is strong isolation between different plug-ins: their state, classes and dependencies should be encapsulated and independent of each other. E.g. package declarations in two plug-ins should not collide, and they should be able to use different versions of another 3rd party dependency. This is why the default module path of Java (specified using the --module-path option) is not enough for this purpose: it doesn’t support more than one version of a given module.\nThe module system’s answer is module layers: by organizing an application and its plug-ins into multiple layers, the required isolation between plug-ins can be achieved.\nWith the module system, each Java application always contains at least one layer, the boot layer. It contains the platform modules and the modules provided on the module path.\nAn Example: The Greeter CLI App To make things more tangible, let’s consider a specific example: the \u0026#34;Greeter\u0026#34; app is a little CLI utility that can produce greetings in different languages.\nIn order to not limit the number of supported languages, it provides a plug-in API, which allows adding further greeting implementations without the need to rebuild the core application. 
Here is the Greeter contract, which is to be implemented by each language plug-in:\n1 2 3 4 5 package com.example.greeter.api; public interface Greeter { String greet(String name); } Greeters are instantiated via accompanying implementations of GreeterFactory:\n1 2 3 4 5 public interface GreeterFactory { String getLanguage(); (1) String getFlag(); Greeter getGreeter(); (2) } 1 The getLanguage() and getFlag() methods are used to show a description of all available greeters in the CLI application 2 The getGreeter() method returns a new instance of the corresponding Greeter type Here’s the overall architecture of the Greeter application, with three different language implementations:\nThe application is made up of five different layers:\ngreeter-platform: contains the Greeter and GreeterFactory contracts\ngreeter-en, greeter-de and greeter-fr: greeter implementations for different languages; note how each one is depending on a different version of some greeter-date module. As they are isolated in different layers, they can co-exist within the application\ngreeter-app: the \u0026#34;shell\u0026#34; of the application which loads all the greeter implementations and makes them accessible as a simple CLI application\nNow let’s see how this application structure can be assembled using Layrry.\nApplication Plug-ins With Layrry In a previous blog post we’ve explored how applications can be cut into layers, described in Layrry’s layers.yml configuration file. 
A simple static layer definition would defeat the purpose of a plug-in architecture, though: not all possible plug-ins are known when assembling the application.\nLayrry addresses this requirement by allowing layers to be sourced from directories on the file system:\nlayers: platform: (1) modules: - \u0026#34;com.example.greeter:greeter-api:1.0.0\u0026#34; plugins: (2) parents: - \u0026#34;platform\u0026#34; directory: path/to/plugins app: (3) parents: - \u0026#34;plugins\u0026#34; modules: - \u0026#34;com.example.greeter:greeter-app:1.0.0\u0026#34; main: module: com.example.greeter.app class: com.example.greeter.app.App 1 The platform layer with the API module 2 The plug-in layer(s) 3 The application layer with the \u0026#34;application shell\u0026#34; Whereas the platform and app layers are statically defined, using the Maven GAV coordinates of the modules to include, the plugins part of the configuration describes an open-ended set of layers. Each sub-directory of the given directory represents its own layer. All modules within this sub-directory will be added to the layer, and the platform layer will be the parent of each of the plug-in layers. The app layer has all the plug-in layers as its ancestors, allowing it to retrieve plug-in implementations from these layers.\nMore greeter plug-ins can be added to the application by simply creating a sub-directory with the required module(s).\nFinding Plug-in Implementations With the Java Service Loader Structuring the application into different layers isn’t all we need for building a plug-in architecture; we also need a way of detecting and loading the actual plug-in implementations. The service loader mechanism of the Java platform comes in handy for that. If you have never worked with the service loader API, it’s definitely recommended to study its extensive JavaDoc description:\nA service is a well-known interface or class for which zero, one, or many service providers exist. 
A service provider (or just provider) is a class that implements or subclasses the well-known interface or class. A ServiceLoader is an object that locates and loads service providers deployed in the run time environment at a time of an application’s choosing. Having been a supported feature of Java since version 6, the service loader API was reworked and refined to work within modular environments when the Java Module System was introduced in JDK 9.\nIn order to retrieve service implementations via the service loader, a consuming module must declare the use of the service in its module descriptor. For our purposes, the GreeterFactory contract is a perfect exemplification of the service idea. Here’s the descriptor of the Greeter application’s app module, declaring its usage of this service:\nmodule com.example.greeter.app { exports com.example.greeter.app; requires com.example.greeter.api; uses com.example.greeter.api.GreeterFactory; } The module descriptor of each greeter plug-in must declare the service implementation(s) which it provides. E.g. here is the module descriptor of the English greeter implementation:\nmodule com.example.greeter.en { requires com.example.greeter.api; requires com.example.greeter.dateutil; provides com.example.greeter.api.GreeterFactory with com.example.greeter.en.EnglishGreeterFactory; } From within the app module, the service implementations can be retrieved via the java.util.ServiceLoader class.\nWhen using the service loader in layered applications, there’s one potential pitfall though, which mostly will affect existing applications which are migrated: in order to access service implementations located in a different layer (specifically, in an ancestor layer of the loading layer), the method load(ModuleLayer, Class\u0026lt;?\u0026gt;) must be used. When using other overloaded variants of load(), e.g. 
the commonly used load(Class\u0026lt;?\u0026gt;), those implementations won’t be found.\nHence the code for loading the greeter implementations from within the app layer could look like this:\n1 2 3 4 5 6 7 8 9 10 private static List\u0026lt;GreeterFactory\u0026gt; getGreeterFactories() { ModuleLayer appLayer = App.class.getModule().getLayer(); return ServiceLoader.load(appLayer, GreeterFactory.class) .stream() .map(p -\u0026gt; p.get()) .sorted((gf1, gf2) -\u0026gt; gf1.getLanguage().compareTo( gf2.getLanguage())) .collect(Collectors.toList()); } Having loaded the list of greeter factories, it doesn’t take too much code to display a list with all available implementations, expect a choice by the user and invoke the greeter for the chosen language. This code which isn’t too interesting is omitted here for the sake of brevity and can be found in the accompanying example source code repo.\nJDK 9 brought some more nice improvements for the service loader API. E.g. the type of service implementations can be examined without actually instantiating them. This allows for interesting alternatives for providing service meta-data and choosing an implementation based on some criteria. 
For instance, greeter metadata like the language name and flag could be given using an annotation:\n@GreeterDefinition(lang=\u0026#34;English\u0026#34;, flag=\u0026#34;🇬🇧\u0026#34;) public class EnglishGreeterFactory implements GreeterFactory { Greeter getGreeter(); } Then the method ServiceLoader.Provider#type() can be used to obtain the annotation and return a greeter factory for a given language:\nprivate Optional\u0026lt;GreeterFactory\u0026gt; getGreeterFactoryForLanguage( String language) { ModuleLayer layer = App.class.getModule().getLayer(); return ServiceLoader.load(layer, GreeterFactory.class) .stream() .filter(gf -\u0026gt; gf.type().getAnnotation( GreeterDefinition.class).lang().equals(language)) .map(gf -\u0026gt; gf.get()) .findFirst(); } Seeing it in Action Lastly, let’s take a look at the complete Greeter application in action. Here it is, initially with two, and then with three greeter implementations:\nThe layers configuration file is adjusted to load greeter plug-ins from the plugins directory; initially, two greeters for English and French exist. Then the German greeter implementation gets picked up by the application after adding it to the plug-in directory, without requiring any changes to the application itself.\nThe complete source code, including the logic for displaying all the available greeters and prompting for input, is available in the Layrry repository on GitHub.\nAnd there you have it, a basic plug-in architecture using Layrry and the Java Module System. Going forward, this might evolve in a few ways. E.g. it might be desirable to detect additional plug-ins without having to restart the application, e.g. when thinking of desktop application use cases. While loading additional plug-ins in new layers should be comparatively easy, unloading already loaded layers, e.g. when updating a plug-in to a newer version, could potentially be quite tricky. 
In particular, there’s no way to actively unload layers, so we’d have to rely on the garbage collector to clean up unused layers, making sure no references to any of their classes are kept in other, active layers.\nOne also could think of an event bus, allowing different plug-ins to communicate in a safe, yet loosely coupled way. What requirements would you have for plug-in centered applications running on the Java Module System? Let’s exchange in the comments below!\n","id":68,"publicationdate":"Apr 21, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eMaking applications extensible with some form of plug-ins is a very common pattern in software design:\nbased on well-defined APIs provided by the application core, plug-ins can customize an application’s behavior and provide new functionality.\nExamples include desktop applications like IDEs or web browsers, build tools such as Apache Maven or Gradle, as well as server-side applications such as Apache Kafka Connect,\na runtime for Kafka connectors plug-ins.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this post I’m going to explore how the \u003ca href=\"https://www.jcp.org/en/jsr/detail?id=376\"\u003eJava Platform Module System\u003c/a\u003e\u0026#39;s notion of module layers can be leveraged for implementing plug-in architectures on the JVM.\nWe’ll also discuss how \u003ca href=\"https://github.com/moditect/layrry\"\u003eLayrry\u003c/a\u003e, a launcher and runtime for layered Java applications, can help with this task.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Plug-in Architectures With Layrry and the Java Module System","uri":"https://www.morling.dev/blog/plugin-architectures-with-layrry-and-the-java-module-system/"},{"content":" One of the biggest changes in recent Java versions has been the introduction of the module system in Java 9. 
It allows to organize Java applications and their dependencies in strongly encapsulated modules, utilizing explicit and well-defined module APIs and relationships.\nIn this post I’m going to introduce the Layrry open-source project, a launcher and Java API for executing modularized Java applications. Layrry helps Java developers to assemble modularized applications from dependencies using their Maven coordinates and execute them using module layers. Layers go beyond the capabilities of the \u0026#34;flat\u0026#34; module path specified via the --module-path parameter of the java command, e.g. allowing to use multiple versions of one module within one and the same application.\nWhy Layrry? The Java Module System doesn’t define any means of mapping between modules (e.g. com.acme.crm) and JARs providing such module (e.g. acme-crm-1.0.0.Final.jar), or retrieving modules from remote repositories using unique identifiers (e.g. com.acme:acme-crm:1.0.0.Final). Instead, it’s the responsibility of the user to obtain all required JARs of a modularized application and provide them via the --module-path parameter.\nFurthermore, the module system doesn’t define any means of module versioning; i.e. it’s the responsibility of the user to obtain all modules in the right version. Using the --module-path option, it’s not possible, though, to assemble an application that uses multiple versions of one and the same module. This may be desirable for transitive dependencies of an application, which might be required in different versions by two separate direct dependencies.\nThis is where Layrry comes in (pronounced \u0026#34;Larry\u0026#34;): it provides a declarative approach as well as an API for assembling modularized applications. 
The (modular) JARs to be included are described using Maven GAV (group id, artifact id, version) coordinates, solving the issue of retrieving all required JARs from a remote repository, in the right version.\nWith Layrry, applications are organized in module layers, which allows to use different versions of one and the same module in different layers of an application (as long as they are not exposed in a conflicting way on module API boundaries).\nAn Example As an example, let’s consider an application made up of the following modules:\nThe application’s main module, com.example:app, depends on two others, com.example:foo and com.example:bar. They in turn depend on the Log4j API and another module, com.example:greeter. The latter is used in two different versions, though.\nLet’s take a closer look at the Greeter class in these modules. Here is the version in com.example:greeter@1.0.0, as used by com.example:foo:\n1 2 3 4 5 6 public class Greeter { public String greet(String name, String from) { return \u0026#34;Hello, \u0026#34; + name + \u0026#34; from \u0026#34; + from + \u0026#34; (Greeter 1.0.0)\u0026#34;; } } And this is how it looks in com.example:greeter@2.0.0, as used by com.example:bar:\n1 2 3 4 5 6 7 8 9 10 11 public class Greeter { public String hello(String name, String from) { return \u0026#34;Hello, \u0026#34; + name + \u0026#34; from \u0026#34; + from + \u0026#34; (Greeter 2.0.0)\u0026#34;; } public String goodBye(String name, String from) { return \u0026#34;Good bye, \u0026#34; + name + \u0026#34; from \u0026#34; + from + \u0026#34; (Greeter 2.0.0)\u0026#34;; } } The Greeter API has evolved in a backwards-incompatible way, i.e. it’s not possible for the foo and bar modules to use the same version.\nWith a \u0026#34;flat\u0026#34; module path (or classpath), there’s no way for dealing with this situation. 
You’d inevitably end up with a NoSuchMethodError, as either foo or bar would be linked at runtime against a version of the class different from the version it has been compiled against.\nThe lack of support for using multiple module versions when working with the --module-path option might be surprising at first, but it’s an explicit non-requirement of the module system to support multiple module versions or even deal with selecting matching module versions at all.\nThis means that the module descriptors of both foo and bar require the greeter module without any version information:\n1 2 3 4 5 module com.example.foo { exports com.example.foo; requires org.apache.logging.log4j; requires com.example.greeter; } 1 2 3 4 5 module com.example.bar { exports com.example.bar; requires org.apache.logging.log4j; requires com.example.greeter; } Module Layers to the Rescue While only one version of a given module is supported when running applications via java --module-path=…​, there’s a lesser known feature of the module system which provides a way out: module layers.\nA module layer \u0026#34;is created from a graph of modules in a Configuration and a function that maps each module to a ClassLoader.\u0026#34; Using the module layer API, multiple versions of a module can be loaded in different layers, thus using different classloaders.\nNote the layers API doesn’t concern itself with obtaining JARs or modules from remote locations such as the Maven Central repository; instead, any modules must be provided as Path objects. 
Here is how a layer with the foo and greeter:1.0.0 modules could be assembled:\nModuleLayer boot = ModuleLayer.boot(); ClassLoader scl = ClassLoader.getSystemClassLoader(); Path foo = Paths.get(\u0026#34;path/to/foo-1.0.0.jar\u0026#34;); (1) Path greeter10 = Paths.get(\u0026#34;path/to/greeter-1.0.0.jar\u0026#34;); (2) ModuleFinder fooFinder = ModuleFinder.of(foo, greeter10); Configuration fooConfig = boot.configuration() (3) .resolve( fooFinder, ModuleFinder.of(), Set.of(\u0026#34;com.example.foo\u0026#34;, \u0026#34;com.example.greeter\u0026#34;) ); ModuleLayer fooLayer = boot.defineModulesWithOneLoader( fooConfig, scl); (4) 1 Obtain foo-1.0.0.jar 2 Obtain greeter-1.0.0.jar 3 Create a configuration derived from the \u0026#34;boot\u0026#34; layer of the JVM, providing a ModuleFinder for the two JARs obtained before, and resolving the two modules 4 Create a module layer using the configuration, loading all contained modules with a single classloader Similarly, you could create a layer for bar and greeter:2.0.0, as well as layers for log4j and the main application module. The layers API is very flexible, e.g. you could load each module in its own classloader and more. But all this flexibility can make using the API directly a daunting task.\nAlso using an API might not be what you want in the first place: wouldn’t it be nice if there was a CLI tool, akin to using java --module-path=…​, but with the additional powers of module layers?\nThe Layrry Launcher This is where Layrry comes in: it is a CLI tool which takes a configuration of a layered application (defined in a YAML file) and executes it. 
The layer descriptor for the example above looks like so:\nlayers: log: (1) modules: (2) - \u0026#34;org.apache.logging.log4j:log4j-api:jar:2.13.1\u0026#34; - \u0026#34;org.apache.logging.log4j:log4j-core:jar:2.13.1\u0026#34; - \u0026#34;com.example:logconfig:1.0.0\u0026#34; foo: parents: (3) - \u0026#34;log\u0026#34; modules: - \u0026#34;com.example:greeter:1.0.0\u0026#34; - \u0026#34;com.example:foo:1.0.0\u0026#34; bar: parents: - \u0026#34;log\u0026#34; modules: - \u0026#34;com.example:greeter:2.0.0\u0026#34; - \u0026#34;com.example:bar:1.0.0\u0026#34; app: parents: - \u0026#34;foo\u0026#34; - \u0026#34;bar\u0026#34; modules: - \u0026#34;com.example:app:1.0.0\u0026#34; main: (4) module: com.example.app class: com.example.app.App 1 Each layer has a unique name 2 The modules element lists all the modules contained in the layer, using Maven coordinates (group id, artifact id, version), unambiguously referencing a (modular) JAR in a specific version 3 A layer can have one or more parent layers, whose modules it can access; if no parent is given, the JVM’s \u0026#34;boot\u0026#34; layer is the implicit parent of a layer 4 The given main module and class is the one that will be executed by Layrry The configuration above describes four layers, log, foo, bar and app, with the modules they contain and the parent/child relationships between these layers. Note how the versions 1.0.0 and 2.0.0 of the greeter module are used in foo and bar. 
The file also specifies the main class to execute when running this application.\nUsing Layrry, a modular application is executed like this:\n1 2 3 4 5 6 7 java -jar layrry-1.0-SNAPSHOT-jar-with-dependencies.jar \\ --layers-config layers.yml \\ Alice 20:58:01.451 [main] INFO com.example.foo.Foo - Hello, Alice from Foo (Greeter 1.0.0) 20:58:01.472 [main] INFO com.example.bar.Bar - Hello, Alice from Bar (Greeter 2.0.0) 20:58:01.473 [main] INFO com.example.bar.Bar - Good bye, Alice from Bar (Greeter 2.0.0) The log messages show how the two versions of greeter are used by foo and bar, respectively. Layrry will download all referenced JARs using the Maven resolver API, i.e. you don’t have to deal with manually obtaining all the JARs and providing them to the java runtime.\nUsing the Layrry API In addition to the YAML-based launcher, Layrry provides also a Java API for assembling and running layered applications. This can be used in cases where the structure of layers is only known at runtime, or for implementing plug-in architectures.\nIn order to use Layrry programmatically, add the following dependency to your pom.xml:\n1 2 3 4 5 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.moditect.layrry\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;layrry\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.0-SNAPSHOT\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Then, the Layrry Java API can be used like this (showing the same example as above):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Layers layers = Layers.layer(\u0026#34;log\u0026#34;) .withModule(\u0026#34;org.apache.logging.log4j:log4j-api:jar:2.13.1\u0026#34;) .withModule(\u0026#34;org.apache.logging.log4j:log4j-core:jar:2.13.1\u0026#34;) .withModule(\u0026#34;com.example:logconfig:1.0.0\u0026#34;) .layer(\u0026#34;foo\u0026#34;) .withParent(\u0026#34;log\u0026#34;) .withModule(\u0026#34;com.example:greeter:1.0.0\u0026#34;) .withModule(\u0026#34;com.example:foo:1.0.0\u0026#34;) 
.layer(\u0026#34;bar\u0026#34;) .withParent(\u0026#34;log\u0026#34;) .withModule(\u0026#34;com.example:greeter:2.0.0\u0026#34;) .withModule(\u0026#34;com.example:bar:1.0.0\u0026#34;) .layer(\u0026#34;app\u0026#34;) .withParent(\u0026#34;foo\u0026#34;) .withParent(\u0026#34;bar\u0026#34;) .withModule(\u0026#34;com.example:app:1.0.0\u0026#34;) .build(); layers.run(\u0026#34;com.example.app/com.example.app.App\u0026#34;, \u0026#34;Alice\u0026#34;); Next Steps The Layrry project is still in its infancy. Nevertheless it can be a useful tool for application developers wishing to leverage the Java Module System. Obtaining modular JARs via Maven coordinates and providing an easy-to-use mechanism for organizing modules in layers enables usages which cannot be addressed using the plain java --module-path …​ approach.\nLayrry is open-source (under the Apache License version 2.0). The source code is hosted on GitHub, and your contributions are very welcomed.\nPlease let me know about your ideas and requirements in the comments below or by opening up issues on GitHub. 
Planned enhancements include support for creating modular runtime images (jlink) based on the modules referenced in a layers.yml file, and visualization of module layers and their modules via GraphViz.\n","id":69,"publicationdate":"Mar 29, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOne of the biggest changes in recent Java versions has been the introduction of the \u003ca href=\"http://openjdk.java.net/projects/jigsaw/spec/\"\u003emodule system\u003c/a\u003e in Java 9.\nIt allows to organize Java applications and their dependencies in strongly encapsulated modules, utilizing explicit and well-defined module APIs and relationships.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this post I’m going to introduce the \u003ca href=\"https://github.com/moditect/layrry\"\u003eLayrry\u003c/a\u003e open-source project, a launcher and Java API for executing modularized Java applications.\nLayrry helps Java developers to assemble modularized applications from dependencies using their Maven coordinates and execute them using module layers.\nLayers go beyond the capabilities of the \u0026#34;flat\u0026#34; module path specified via the \u003cem\u003e--module-path\u003c/em\u003e parameter of the \u003cem\u003ejava\u003c/em\u003e command,\ne.g. allowing to use multiple versions of one module within one and the same application.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Introducing Layrry: A Launcher and API for Modularized Java Applications","uri":"https://www.morling.dev/blog/introducing-layrry-runner-and-api-for-modularized-java-applications/"},{"content":" Within Debezium, the project I’m working on at Red Hat, we recently encountered an \u0026#34;interesting\u0026#34; situation where we had to resolve a rather difficult merge conflict. 
As others were interested in how we addressed the issue, and also for our own future reference, I’m going to give a quick rundown of the problem we encountered and how we solved it.\nThe Problem Ideally, we’d only ever work on a single branch and would never have to deal with porting changes between the master and other branches. Oftentimes we cannot get around this, though: specific versions of a piece of software may have to be maintained for some time, which requires backporting bugfixes from the current development branch to the branch corresponding to the maintained version.\nIn our specific case we had to deal with backporting changes to our project documentation. To complicate things, this documentation (written in AsciiDoc) had been largely re-organized between master and the targeted older branch, 1.0. What used to be one large AsciiDoc file for each of the Debezium connectors had been split up into multiple smaller files on master. This split was meant to be applied to 1.0 too, but due to some miscommunication in the team (these things happen, right) this wasn’t done, whereas an assorted set of documentation changes had been backported already to the larger, monolithic AsciiDoc files.\nSo the situation we faced was this:\nlarge, monolithic AsciiDoc files on the 1.0 branch\nsmaller, modularized AsciiDoc files on master\nDocumentation updates applied on master, of which only a subset is relevant for 1.0 (new features shouldn’t be added to the Debezium 1.0 documentation)\nSome of the documentation updates relevant for the 1.0 branch already had been backported from master, while others had not\nAll in all, a rather convoluted situation; the full diff of the documentation sub-directory between the two branches was about 13K lines.\nSo what should we do? Cherry-picking individual commits from master was not really an option, as there were a few hundred commits on master since 1.0 had been forked off. Also many commits would contain documentation and code changes. 
The latter had already been backported successfully before.\nRealizing that resolving that merge conflict was next to impossible, the next idea was to essentially start from scratch and re-apply all relevant documentation changes to the 1.0 branch. Our initial idea was to create a patch with the difference of the documentation directory between the two branches. But editing that patch file with 13K lines turned out to be not manageable, either.\nThe Solution This is when we were reminded of the possibilities of git filter-branch: using this command it should be possible to isolate all the documentation changes done on master since Debezium 1.0 and apply the required sub-set of these changes to the 1.0 branch.\nTo start with a clean slate, we created a new temporary branch based on 1.0:\ngit checkout -b docs_backport 1.0 We then reset the contents of the documentation directory to its state as of the 1.0.0.Final release, as that’s where the 1.0 and master branches diverged.\nrm -rf documentation git add documentation git checkout v1.0.0.Final documentation git commit -m \u0026#34;Resetting documentation dir to v1.0.0.Final\u0026#34; # This should yield no differences git diff v1.0.0.Final..docs_backport documentation The next step was to filter all commits on master so to only keep any changes to the documentation directory. This was done on a new branch, docs_filtered. The --subdirectory-filter option comes in handy for that:\ngit checkout -b docs_filtered master git filter-branch -f --prune-empty \\ --subdirectory-filter documentation \\ v1.0.0.Final..docs_filtered This leaves us with a branch docs_filtered which only contains the commits since the v1.0.0.Final tag that modified the documentation directory.\nThe --subdirectory-filter option also moves the contents of the given directory to the root of the repo, though. That’s not exactly what we need. But another option, --tree-filter, lets us restore the original directory layout. 
It lets us run a set of commands against each of the filtered commits. We can use this to move the contents of documentation back to that directory:\ngit filter-branch -f \\ --tree-filter \u0026#39;mkdir -p documentation; \\ mv antora.yml documentation 1\u0026gt;/dev/null 2\u0026gt;/dev/null; \\ mv modules documentation 1\u0026gt;/dev/null 2\u0026gt;/dev/null;\u0026#39; \\ v1.0.0.Final..docs_filtered Examining the history now, we can see that the commits on the docs_filtered branch apply the changes to the documentation directory, as expected.\nOne problem still remains, though: by means of the --subdirectory-filter option, the very first commit removes all contents besides the documentation directory. This can be fixed by doing an interactive rebase of the current branch, beginning at the v1.0.0.Final tag:\ngit rebase -i v1.0.0.Final We need to edit the very first commit; all changes besides those to the documentation directory need to be reverted from that commit. There might be a better way of doing so; I simply ran git checkout for all the other resources:\ngit checkout v1.0.0.Final debezium-connector-mongodb git checkout v1.0.0.Final debezium-connector-mysql ... At this point the filtered branch still is based off of the v1.0.0.Final tag, whereas it should be based off of the docs_backport branch. git rebase --onto to the rescue:\ngit rebase --onto docs_backport v1.0.0.Final docs_filtered This rebases all the commits from the docs_filtered branch onto the docs_backport branch. Now we have a state where all the documentation changes have been cleanly applied to the 1.0 code base, i.e. 
the following should yield no differences:\ngit diff docs_filtered..master documentation The last missing step is to do another rebase of all the documentation commits, discarding those that apply to any features that didn’t get backported to 1.0.\nThankfully, my partner-in-crime Jiri Pechanec stepped in here: as he had done the original feature backport, it didn’t take him too long to go through the list of documentation commits and identify those which were relevant for the 1.0 code base. After one more interactive rebase for applying those we finally were in a state where all the required documentation changes had been backported.\nLooking at the 1.0 history, you’d still see some partial documentation changes up to the point where we decided to start all over and revert these. Theoretically we could do another git filter-branch run to exclude those, but we decided against that, as we already had done releases off of the 1.0 branch and didn’t want to alter the commit history of a released branch after the fact.\n","id":70,"publicationdate":"Mar 16, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWithin \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e, the project I’m working on at Red Hat, we recently encountered an \u0026#34;interesting\u0026#34; situation where we had to resolve a rather difficult merge conflict.\nAs others were interested in how we addressed the issue, and also for our own future reference,\nI’m going to give a quick rundown of the problem we encountered and how we solved it.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Reworking Git Branches with git filter-branch","uri":"https://www.morling.dev/blog/reworking-git-branches-with-git-filter-branch/"},{"content":"","id":71,"publicationdate":"Jan 29, 2020","section":"tags","summary":"","tags":null,"title":"jakartaee","uri":"https://www.morling.dev/tags/jakartaee/"},{"content":"","id":72,"publicationdate":"Jan 29, 
2020","section":"tags","summary":"","tags":null,"title":"java","uri":"https://www.morling.dev/tags/java/"},{"content":"","id":73,"publicationdate":"Jan 29, 2020","section":"tags","summary":"","tags":null,"title":"microprofile","uri":"https://www.morling.dev/tags/microprofile/"},{"content":"","id":74,"publicationdate":"Jan 29, 2020","section":"tags","summary":"","tags":null,"title":"monitoring","uri":"https://www.morling.dev/tags/monitoring/"},{"content":" The JDK Flight Recorder (JFR) is an invaluable tool for gaining deep insights into the performance characteristics of Java applications. Open-sourced in JDK 11, JFR provides a low-overhead framework for collecting events from Java applications, the JVM and the operating system.\nIn this blog post we’re going to explore how custom, application-specific JFR events can be used to monitor a REST API, allowing to track request counts, identify long-running requests and more. We’ll also discuss how the JFR Event Streaming API new in Java 14 can be used to export live events, making them available for monitoring and alerting via tools such as Prometheus and Grafana.\nJFR and its companion tool JDK Mission Control (JMC) for analyzing JFR recordings have come a long way; originally developed at BEA and part of the JRockit VM, they were later on commercial features of the Oracle JDK. As of Java 11, JFR got open-sourced and is part of OpenJDK distributions. JMC is also open-source, but it’s an independent tool under the OpenJDK umbrella, which must be downloaded separately.\nUsing the combination of JFR and JMC, you can get all kinds of information about your Java application, such as events on garbage collection, compilation, classloading, memory allocation, file and socket IO, method profiling data, and much more. To learn more about Flight Recorder and Mission Control in general, have a look at the Code One 2019 presentation Introduction to JDK Mission Control \u0026amp; JDK Flight Recorder by Marcus Hirt and Klara Ward. 
You can find some more links to related useful resources towards the end of this post.\nCustom Flight Recorder Events One thing that’s really great about JFR and JMC is that you’re not limited to the events and data baked into the JVM and platform libraries: JFR also provides an API for implementing custom events. That way you can use the low-overhead event recording infrastructure (its goal is to add at most 1% performance overhead) for your own event types. This allows you to record and analyze higher-level events, using the language of your application-specific domain.\nTaking my day job project Debezium as an example (an open-source platform for change data capture for a variety of databases), we could for instance produce events such as \u0026#34;Snapshot started\u0026#34;, \u0026#34;Snapshotting of table \u0026#39;Customers\u0026#39; completed\u0026#34;, \u0026#34;Captured change event for transaction log offset 123\u0026#34; etc. Users could send us recordings with these events and we could dive into them, in order to identify bugs or performance issues.\nIn the following let’s consider a less complex and hence better approachable example, though. We’ll implement an event for measuring the duration of REST API calls. The Todo service from my recent blog post on Quarkus Qute will serve as our guinea pig. It is based on the Quarkus stack and provides a simple REST API based on JAX-RS. As always, you can find the complete source code for this blog post on GitHub.\nEvent types are implemented by extending the jdk.jfr.Event class; It already provides us with some common attributes such as a timestamp and a duration. 
In sub-classes you can add application-specific payload attributes, as well as some metadata such as a name and category which will be used for organizing and displaying events when looking at them in JMC.\nWhich attributes to add depends on your specific requirements; you should aim for the right balance between capturing all the relevant information that will be useful for analysis purposes later on and not going overboard by adding too much, as that could cause recording files to become too large, in particular for events that are emitted with a high frequency. Also retrieval of the attributes should be an efficient operation, so as to avoid any unnecessary overhead.\nHere’s a basic event class for monitoring our REST API calls:\n@Name(JaxRsInvocationEvent.NAME) (1) @Label(\u0026#34;JAX-RS Invocation\u0026#34;) @Category(\u0026#34;JAX-RS\u0026#34;) @Description(\u0026#34;Invocation of a JAX-RS resource method\u0026#34;) @StackTrace(false) (2) public class JaxRsInvocationEvent extends Event { static final String NAME = \u0026#34;dev.morling.jfr.JaxRsInvocation\u0026#34;; @Label(\u0026#34;Resource Method\u0026#34;) (3) public String method; @Label(\u0026#34;Media Type\u0026#34;) public String mediaType; @Label(\u0026#34;Java Method\u0026#34;) public String javaMethod; @Label(\u0026#34;Path\u0026#34;) public String path; @Label(\u0026#34;Query Parameters\u0026#34;) public String queryParameters; @Label(\u0026#34;Headers\u0026#34;) public String headers; @Label(\u0026#34;Length\u0026#34;) @DataAmount (4) public int length; @Label(\u0026#34;Response Headers\u0026#34;) public String responseHeaders; @Label(\u0026#34;Response Length\u0026#34;) public int responseLength; @Label(\u0026#34;Response Status\u0026#34;) public int status; } 1 The @Name, @Category, @Description and @Label annotations define some meta-data, e.g. 
used for controlling the appearance of these events in the JMC UI 2 JAX-RS invocation events shouldn’t contain a stacktrace by default, as that’d only increase the size of Flight Recordings without adding much value 3 One payload attribute is defined for each relevant property such as HTTP method, media type, the invoked path etc. 4 @DataAmount tags this attribute as a data amount (by default in bytes) and will be displayed accordingly in JMC; there are many other similar annotations in the jdk.jfr package, such as @MemoryAddress, @Timestamp and more Having defined the event class itself, we must find a way for emitting event instances at the right point in time. In the simplest case, e.g. suitable for events related to your application logic, this might happen right in the application code itself. For more \u0026#34;technical\u0026#34; events it’s a good idea though to keep the creation of Flight Recorder events separate from your business logic, e.g. by using mechanisms such as servlet filters, interceptors and similar, which allow to inject cross-cutting logic into the call flow of your application.\nYou also might employ byte code instrumentation at build or runtime for this purpose. The JMC Agent project aims at providing a configurable Java agent that allows to dynamically inject code for emitting JFR events into running programs. 
Via the EventFactory class, the JFR API also provides a way for defining event types dynamically, should their payload attributes only be known at runtime.\nFor monitoring a JAX-RS based REST API, the ContainerRequestFilter and ContainerResponseFilter contracts come in handy, as they allow hooking into the request handling logic before and after a REST request gets processed:\n@Provider (1) public class FlightRecorderFilter implements ContainerRequestFilter, ContainerResponseFilter { @Override (2) public void filter(ContainerRequestContext requestContext) throws IOException { JaxRsInvocationEvent event = new JaxRsInvocationEvent(); if (!event.isEnabled()) { (3) return; } event.begin(); (4) requestContext.setProperty(JaxRsInvocationEvent.NAME, event); (5) } @Override (6) public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) throws IOException { JaxRsInvocationEvent event = (JaxRsInvocationEvent) requestContext .getProperty(JaxRsInvocationEvent.NAME); if (event == null || !event.isEnabled()) { return; } event.end(); (7) event.path = String.valueOf(requestContext.getUriInfo().getPath()); if (event.shouldCommit()) { (8) event.method = requestContext.getMethod(); event.mediaType = String.valueOf(requestContext.getMediaType()); event.length = requestContext.getLength(); event.queryParameters = requestContext.getUriInfo() .getQueryParameters().toString(); event.headers = requestContext.getHeaders().toString(); event.javaMethod = getJavaMethod(requestContext); event.responseLength = responseContext.getLength(); event.responseHeaders = responseContext.getHeaders().toString(); event.status = responseContext.getStatus(); event.commit(); (9) } } private String getJavaMethod(ContainerRequestContext requestContext) { String propName = 
\u0026#34;org.jboss.resteasy.core.ResourceMethodInvoker\u0026#34;; ResourceMethodInvoker invoker = (ResourceMethodInvoker)requestContext.getProperty(propName); return invoker.getMethod().toString(); } } 1 Allows the filter to be picked up automatically by the JAX-RS implementation 2 Will be invoked before the request is processed 3 Nothing to do if the event type is not enabled for recordings currently 4 Begin the timing of the event 5 Store the event in the request context, so it can be obtained again later on 6 Will be invoked after the request has been processed 7 End the timing of the event 8 The event should be committed if it is enabled and its duration is within the threshold configured for it; in that case, populate all the payload attributes of the event based on the values from the request and response contexts 9 Commit the event with Flight Recorder With that, our event class is pretty much ready to be used. There’s only one more thing to do, and that is registering the new type with the Flight Recorder system. A Quarkus application start-up lifecycle method comes in handy for that:\n@ApplicationScoped public class Metrics { public void registerEvent(@Observes StartupEvent se) { FlightRecorder.register(JaxRsInvocationEvent.class); } } Note that this step isn’t strictly needed; the event type can also be used without explicit registration. But doing so will later on allow applying specific settings for the event in Mission Control (see below), even if no event of this type has been emitted yet.\nCreating JFR Recordings Now let’s capture some JAX-RS API events using Flight Recorder and inspect them in Mission Control.\nTo do so, make sure to have Mission Control installed. Just as with OpenJDK, there are different builds for Mission Control to choose from. If you’re in the Fedora/RHEL universe, there’s a repository package which you can install, e.g. 
like this for the Fedora JMC package:\nsudo dnf module install jmc:7/default Alternatively, you can download builds for different platforms from Oracle; some more info about these builds can be found in this blog post by Marcus Hirt. There’s also the Liberica Mission Control build by BellSoft and Zulu Mission Control by Azul. The AdoptOpenJDK project provides snapshot builds of JMC 8 as well as an Eclipse update site for installing JMC into an existing Eclipse instance.\nIf you’d like to follow along and run these steps yourself, check out the source code from GitHub and then perform the following commands:\ncd example-service \u0026amp;\u0026amp; mvn clean package \u0026amp;\u0026amp; cd .. docker-compose up --build This builds the project using Maven and spins up the following services using Docker Compose:\nexample-service: The Todo example application\ntodo-db: The Postgres database used by the Todo service\nprometheus and grafana: For monitoring live events later on\nThen go to http://localhost:8080/todo, where you should see the Todo web application:\nNow fire up Mission Control. The example service run via Docker Compose is configured so you can connect to it on localhost. In the JVM Browser, create a new connection with host \u0026#34;localhost\u0026#34; and port \u0026#34;1898\u0026#34;. Hit \u0026#34;Test connection\u0026#34;, which should yield \u0026#34;OK\u0026#34;, then click \u0026#34;Finish\u0026#34;.\nCreate a new recording by expanding the localhost:1898 node in the JVM Browser, right-clicking on \u0026#34;Flight Recorder\u0026#34; and choosing \u0026#34;Start Flight Recording…​\u0026#34;. Confirm the default settings, which will create a recording with a duration of one minute. 
Go back to the Todo web application and perform a few tasks like creating some new todos, editing and deleting them, or filtering the todo list.\nEither wait for the recording to complete or stop it by right-clicking on the recording name and selecting \u0026#34;Stop\u0026#34;. Once the recording is done, it will be opened automatically. Now you could dive into all the logged events for the OS, the JVM etc., but as we’re interested in our custom JAX-RS events, choose \u0026#34;Event Browser\u0026#34; in the outline view and expand the \u0026#34;JAX-RS\u0026#34; category. You will see the events for all your REST API invocations, including information such as duration of the request, the HTTP method, the resource path and much more:\nIn a real-world use case, you could now use this information for instance to identify long-running requests and correlate these events with other data points in the Flight Recording, such as method profiling and memory allocation data, or sub-optimal SQL statements in your database.\nIf your application is running in production, it might not be feasible to connect to it via Mission Control from your local workstation. The jcmd utility comes in handy in that case; part of the JDK, it can be used to issue diagnostic commands against a running JVM.\nAmongst many other things, it allows you to start and stop Flight Recordings. On the environment with your running application, first run jcmd -l, which will show you the PIDs of all running Java processes. Having identified the PID of the process you’d like to examine, you can initiate a recording like so:\njcmd \u0026lt;PID\u0026gt; JFR.start delay=5s duration=30s \\ name=MyRecording filename=my-recording.jfr This will start a recording of 30 seconds, beginning 5 seconds from now. Once the recording is done, you could copy the file to your local machine and load it into Mission Control for further analysis. 
To learn more about creating Flight Recordings via jcmd, refer to this great cheat sheet.\nAnother useful tool in the belt is the jfr command, which was introduced in JDK 12. It allows you to filter and examine the binary Flight Recording files. You also can use it to extract parts of a recording and convert them to JSON, allowing them to be processed with other tools. E.g. you could convert all the JAX-RS events to JSON like so:\njfr print --json --categories JAX-RS my-recording.jfr Event Settings Sometimes it’s desirable to configure detailed behaviors of a given event type. For the JAX-RS invocation event it might for instance make sense to only log invocations of particular paths in a specific recording, allowing for a smaller recording size and keeping the focus on a particular subset of all invocations. JFR supports this by the notion of event settings. Such settings can be specified when creating a recording; based on the active settings, particular events will be included in or excluded from the recording.\nInspired by the JavaDoc of @SettingDefinition, let’s see what’s needed to enhance JaxRsInvocationEvent with that capability. 
The first step is to define a subclass of jdk.jfr.SettingControl, which serves as the value holder for our setting:\npublic class PathFilterControl extends SettingControl { private Pattern pattern = Pattern.compile(\u0026#34;.*\u0026#34;); (1) @Override (2) public void setValue(String value) { this.pattern = Pattern.compile(value); } @Override (3) public String combine(Set\u0026lt;String\u0026gt; values) { return String.join(\u0026#34;|\u0026#34;, values); } @Override (4) public String getValue() { return pattern.toString(); } (5) public boolean matches(String s) { return pattern.matcher(s).matches(); } } 1 A regular expression pattern that’ll be matched against the path of incoming events; by default all paths are included (.*) 2 Invoked by the JFR runtime to set the value for this setting 3 Invoked when multiple recordings are running at the same time, combining the settings values 4 Invoked by the runtime for instance when getting the default value of the setting 5 Matches the configured setting value against a particular path On the event class itself a method with the following characteristics must be declared, which will receive the setting from the JFR runtime:\nclass JaxRsInvocationEvent extends Event { @Label(\u0026#34;Path\u0026#34;) public String path; // other members... @Label(\u0026#34;Path Filter\u0026#34;) @SettingDefinition (1) protected boolean pathFilter(PathFilterControl pathFilter) { (2) return pathFilter.matches(path); } } 1 Tags this as a setting 2 The method must take a SettingControl type as its single parameter and return boolean; it need not be public (it is protected here) This method will be invoked by the JFR runtime during the shouldCommit() call. It passes in the setting value of the current recording so it can be applied to the path value of the given event. 
In case the filter returns true, the event will be added to the recording, otherwise it will be ignored.\nWe also could use such a setting to control the inclusion or exclusion of specific event attributes. For that, the setting definition method would always have to return true, but depending on the actual setting it might set particular attributes of the event class to null. For instance this might come in handy if we wanted to log the entire request/response body of our REST API. Doing this all the time might be prohibitive in terms of recording size, but it might be enabled for a particular short-term recording for analyzing some bug.\nNow let’s see how the path filter can be applied when creating a new recording in Mission Control. The option is a bit hidden, but here’s how you can enable it. First, create a new Flight Recording, then choose \u0026#34;Template Manager\u0026#34; in the dialogue:\nDuplicate the \u0026#34;Continuous\u0026#34; template and edit it:\nClick \u0026#34;Advanced\u0026#34;:\nExpand \u0026#34;JAX-RS\u0026#34; → \u0026#34;JAX-RS Invocation\u0026#34; and put .*(new|edit).* into the Path Filter control:\nNow close the last two dialogues. In the \u0026#34;Start Flight Recording\u0026#34; dialogue make sure to select your new template under \u0026#34;Event Settings\u0026#34;; although you’ve edited it before, it won’t be selected automatically. I lost an hour or so wondering why my settings were not applied…\nLastly, click \u0026#34;Finish\u0026#34; to begin the recording:\nPerform some tasks in the Todo web app and stop the recording. You should see only the REST API calls for the new and edit operations, whereas no events should be shown for the list and delete operations of the API.\nIn order to apply specific settings when creating a recording on the CLI using jcmd, edit the settings as described above. Then go to the Template Manager and export the profile you’d like to use. 
When starting the recording via jcmd, specify the settings file via the settings=/path/to/settings.jfc parameter.\nJFR Event Streaming Flight Recorder files are great for analyzing performance characteristics in an \u0026#34;offline\u0026#34; approach: you can take recordings in your production environment and ship them to your workstation or a remote support team, without requiring live access to the running application. This is also an interesting mode for open-source projects, where maintainers typically don’t have access to running applications of their users. Exchanging Flight Recordings (limited to a sensible subset of information, so as to avoid exposure of confidential internals) might allow open source developers to gain insight into characteristics of their libraries when deployed to production by their users.\nBut there’s another category of use cases for event data sourced from applications, the JVM and the operating system, where the recording file approach doesn’t quite fit: live monitoring and alerting of running applications. E.g. operations teams might want to set up dashboards showing the most relevant application metrics in \u0026#34;real-time\u0026#34;, without having to create any recording files first. A related requirement is alerting, so as to be notified when metrics reach a certain threshold. For instance it might be desirable to be alerted if the request duration of our JAX-RS API goes beyond a defined value such as 100 ms.\nThis is where JEP 349 (\u0026#34;JFR Event Streaming\u0026#34;) comes in. It’ll be part of Java 14 and its stated goal is to \u0026#34;provide an API for the continuous consumption of JFR data on disk, both for in-process and out-of-process applications\u0026#34;. That’s exactly what we need for our monitoring/dashboarding use case. 
Using the Streaming API, Flight Recorder events of the running application can be exposed to external consumers, without having to explicitly load any recording files.\nNow it may be prohibitively expensive to stream each and every event with all its detailed information to remote clients. But that’s not needed for monitoring purposes anyway. Instead, we can expose metrics based on our events, such as the total number and frequency of REST API invocations, or the average and 99th percentile duration of the calls.\nMicroProfile Metrics The following shows a basic implementation of exposing these metrics for the JAX-RS API events to Prometheus/Grafana, where they can be visualized using a dashboard. Being based on Quarkus, the Todo web application can leverage all the MicroProfile APIs. One of them is the MicroProfile Metrics API, which defines a \u0026#34;unified way for Microprofile servers to export Monitoring data (\u0026#34;Telemetry\u0026#34;) to management agents\u0026#34;.\nWhile the MicroProfile Metrics API is often used in an annotation-driven fashion, it also provides a programmatic API for registering metrics. 
This can be leveraged to expose metrics based on the JAX-RS Flight Recorder events:\n@ApplicationScoped public class Metrics { @Inject (1) MetricRegistry metricsRegistry; private RecordingStream recordingStream; (2) public void onStartup(@Observes StartupEvent se) { recordingStream = new RecordingStream(); (3) recordingStream.enable(JaxRsInvocationEvent.NAME); recordingStream.onEvent(JaxRsInvocationEvent.NAME, event -\u0026gt; { (4) String path = event.getString(\u0026#34;path\u0026#34;) .replaceAll(\u0026#34;(\\\\/)([0-9]+)(\\\\/?)\u0026#34;, \u0026#34;$1{param}$3\u0026#34;); (5) String method = event.getString(\u0026#34;method\u0026#34;); String name = path + \u0026#34;-\u0026#34; + method; Metadata metadata = metricsRegistry.getMetadata().get(name); if (metadata == null) { metricsRegistry.timer(Metadata.builder() (6) .withName(name) .withType(MetricType.TIMER) .withDescription(\u0026#34;Metrics for \u0026#34; + path + \u0026#34; (\u0026#34; + method + \u0026#34;)\u0026#34;) .build()).update(event.getDuration().toNanos(), TimeUnit.NANOSECONDS); } else { (7) metricsRegistry.timer(name).update(event.getDuration() .toNanos(), TimeUnit.NANOSECONDS); } }); recordingStream.startAsync(); (8) } public void stop(@Observes ShutdownEvent se) { recordingStream.close(); (9) try { recordingStream.awaitTermination(); } catch (InterruptedException e) { throw new RuntimeException(e); } } } 1 Inject the MicroProfile Metrics registry 2 A stream providing push access to JFR events 3 Initialize the stream upon application start-up, so it includes the JAX-RS invocation events 4 For each JaxRsInvocationEvent this callback will be invoked 5 To register a corresponding metric, any path parameters are replaced with a constant placeholder, so that e.g. 
all invocations of the todo/{id}/edit path are exposed via one single metric instead of having separate ones for Todo 1, Todo 2 etc. 6 If the metric for the specific path hasn’t been registered yet, then do so; it’s a metric of type TIMER, allowing metric consumers to track the duration of calls of that particular path 7 If the metric for the path has been registered before, update its value with the duration of the incoming event 8 Start the stream asynchronously, not blocking the onStartup() method 9 Close the JFR event stream upon application shutdown When connecting to the running application using JMC now, you’ll see a continuous recording, which serves as the basis for the event stream. It only contains events of the JaxRsInvocationEvent type.\nMicroProfile Metrics exposes any application-provided metrics in the Prometheus format under the /metrics/application endpoint; for each operation of the REST API, e.g. POST to /todo/{id}/edit, the following metrics are provided:\nrequest rate per second, minute, five minutes and 15 minutes\nmin, mean and max duration as well as standard deviation\ntotal invocation count\nduration of 75th, 95th, 99th etc. percentiles\nOnce the endpoint is provided, it’s not difficult to set up a scraping process for ingesting the metrics into the Prometheus time-series database. You can find the required Prometheus configuration in the accompanying source code repository.\nWhile Prometheus provides some visualization capabilities itself, it is often used together with Grafana, which allows to build nicely looking dashboards via a rather intuitive UI. Here’s an example dashboard showing the duration and invocation numbers for the different methods in the Todo REST API:\nAgain you can find the complete configuration for Grafana including the definition of that dashboard in the example repo. It will automatically be loaded when using the Docker Compose set-up shown above. 
Based on that you could easily expand the dashboard for other metrics and set up alerts, too.\nCombining the monitoring of live key metrics with the deep insights possible via detailed JFR recordings enables a very powerful workflow for analysing performance issues in production:\nWhen setting up the continuous recording that serves as the basis for the metrics, have it contain all the event types you’d need to gain insight into GC or memory issues etc.; specify a maximum size via RecordingStream#setMaxSize(), so as to avoid an indefinitely growing recording; you’ll probably need to experiment a bit to find the right trade-off between number of enabled events, duration that’ll be covered by the recording and the required disk space\nOnly expose a relevant subset of the events as metrics to Prometheus/Grafana, such as the JAX-RS API invocation events in our example\nSet up an alert in Grafana on the key metrics, e.g. mean duration of the REST calls, or 99th percentile thereof\nIf the alert triggers, take a dump of the last N minutes of the continuous recording via JMC or jcmd (using the JFR.dump command), and analyze that detailed recording to understand what was happening in the time leading to the alert\nSummary and Related Work Flight Recorder and Mission Control are excellent tools providing deep insight into the performance characteristics of Java applications. While there’s a large amount of data and highly valuable information provided out of the box, JFR and JMC also allow for the recording of custom, application-specific events. With its low overhead, JFR can be enabled on a permanent basis in production environments. Combined with the Event Streaming API introduced in Java 14, this opens up an attractive, very performant alternative to other means of capturing analysis information at application runtime, such as logging libraries. 
Providing live key metrics derived from JFR events to tools such as Prometheus and Grafana enables monitoring and alerting in \u0026#34;real-time\u0026#34;.\nFor many enterprises that are still on Java 11 or even 8, it may be a long time before they can adopt the streaming API. But with more and more companies joining the OpenJDK efforts, it is possible that this useful feature gets backported to earlier LTS releases, just as the open-sourced version of Flight Recorder itself got backported to Java 8.\nThere are quite a few posts and presentations about JFR and JMC available online, but many of them refer to older versions of those tools, before they got open-sourced. Here are some up-to-date resources which I found very helpful:\nContinuous Monitoring with JDK Flight Recorder: a talk from QCon SF 2019 by Mikael Vidstedt\nFlight Recorder \u0026amp; Mission Control at Code One 2019: a compilation of several great sessions on these two tools at last year’s Code One, put together by Marcus Hirt\nDigging Into Sockets With Java Flight Recorder: blog post by Petr Bouda on identifying performance bottlenecks with JFR in a Netty-based web application\nLastly, the Red Hat OpenJDK team is working on some very interesting projects around JFR and JMC, too. E.g. they’ve built a datasource for Grafana which lets you examine the events of a JFR file. They also work on tooling to simplify the usage of JFR in container-based environments such as Kubernetes and OpenShift, including a K8s Operator for controlling Flight Recordings and a web-based UI for managing JFR in remote JVMs. 
Should you happen to be at the FOSDEM conference in Brussels on the next weekend, be sure to not miss the JMC \u0026amp; JFR - 2020 Vision session by Red Hat engineer Jie Kang.\nIf you’d like to experiment with JDK Flight Recorder and JDK Mission Control based on the Todo web application yourself, you can find the complete source code for this post on GitHub.\nMany thanks to Mario Torre and Jie Kang for reviewing an early draft of this post.\n","id":75,"publicationdate":"Jan 29, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) is an invaluable tool for gaining deep insights into the performance characteristics of Java applications.\nOpen-sourced in JDK 11, JFR provides a low-overhead framework for collecting events from Java applications, the JVM and the operating system.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this blog post we’re going to explore how custom, application-specific JFR events can be used to monitor a REST API, allowing to track request counts, identify long-running requests and more.\nWe’ll also discuss how the JFR \u003ca href=\"https://openjdk.java.net/jeps/349\"\u003eEvent Streaming API\u003c/a\u003e new in Java 14 can be used to export live events,\nmaking them available for monitoring and alerting via tools such as Prometheus and Grafana.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","monitoring","microprofile","jakartaee","quarkus"],"title":"Monitoring REST APIs with Custom JDK Flight Recorder Events","uri":"https://www.morling.dev/blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/"},{"content":"","id":76,"publicationdate":"Jan 29, 2020","section":"tags","summary":"","tags":null,"title":"quarkus","uri":"https://www.morling.dev/tags/quarkus/"},{"content":"","id":77,"publicationdate":"Jan 29, 
2020","section":"tags","summary":"","tags":null,"title":"Tags","uri":"https://www.morling.dev/tags/"},{"content":"","id":78,"publicationdate":"Jan 20, 2020","section":"tags","summary":"","tags":null,"title":"bean-validation","uri":"https://www.morling.dev/tags/bean-validation/"},{"content":" Record types are one of the most awaited features in Java 14; they promise to \u0026#34;provide a compact syntax for declaring classes which are transparent holders for shallowly immutable data\u0026#34;. One example where records should be beneficial are data transfer objects (DTOs), as e.g. found in the remoting layer of enterprise applications. Typically, certain rules should be applied to the attributes of such DTO, e.g. in terms of allowed values. The goal of this blog post is to explore how such invariants can be enforced on record types, using annotation-based constraints as provided by the Bean Validation API.\nRecord Invariants and Bean Validation Records (a preview feature as of Java 14) help to cut down the ceremony when defining plain data holder objects. 
In a nutshell, you solely need to declare the attributes that should make up the state of the record type (\u0026#34;components\u0026#34; in terms of JEP 359), and quite a few things you’d otherwise have to implement by hand will be created for you automatically:\na private final field and a corresponding read accessor for each component\na constructor for passing in all component values\ntoString(), equals() and hashCode() methods.\nAs an example, here’s a record Car with three components:\n1 2 3 public record Car(String manufacturer, String licensePlate, int seatCount) { } Now let’s assume a few class invariants should be applied to this record (inspired by an example from the Hibernate Validator reference guide):\nmanufacturer is a non-blank string\nlicense plate is never null and has a length of 2 to 14 characters\nseatCount is at least 2\nClass invariants like these are specific conditions or rules applying to the state of a class (as manifesting in its fields), which always are guaranteed to be satisfied for the lifetime of an instance of the class.\nThe Bean Validation API defines a way for expressing and validating constraints using Java annotations. By putting constraint annotations to the components of a record type, it’s a perfect means of describing the invariants from above:\n1 2 3 4 5 public record Car( @NotBlank String manufacturer, @NotNull @Size(min = 2, max = 14) String licensePlate, @Min(2) int seatCount) { } Of course declaring constraints using annotations by itself won’t magically enforce these invariants. In order to do so, the javax.validation.Validator API must be invoked at suitable points in the object lifecycle, so to avoid any of the invariants to be violated. As records are immutable, it is sufficient to validate the constraints once when creating a new Car instance. 
If no constraints are violated, the created instance is guaranteed to always satisfy its invariants.\nImplementation The key question now is how to validate the invariants while constructing new Car instances. This is where Bean Validation’s API for method validation comes in: it allows to validate pre- and post-conditions that should be satisfied when a Java method or constructor gets invoked. Pre-conditions are expressed by applying constraints to method and constructor parameters, whereas post-conditions are expressed by putting constraints to a method or constructor itself.\nThis can be leveraged for enforcing record invariants: as it turns out, any annotations on the components of a record type are also copied to the corresponding parameters of the generated constructor. I.e. the Car record implicitly has a constructor which looks like this:\n1 2 3 4 5 6 7 8 9 public Car( @NotBlank String manufacturer, @NotNull @Size(min = 2, max = 14) String licensePlate, @Min(2) int seatCount) { this.manufacturer = manufacturer; this.licensePlate = licensePlate; this.seatCount = seatCount; } That’s exactly what we need: by validating these parameter constraints upon instantiation of the Car class, we can make sure that only valid objects can ever be created, ensuring that the record type’s invariants are always guaranteed.\nWhat’s missing is a way for automatically validating them upon constructor invocation. The idea for that is to enhance the byte code of the implicit Car constructor so that it passes the incoming parameter values to Bean Validation’s ExecutableValidator#validateConstructorParameters() method and raises a constraint violation exception in case of any invalid parameter values.\nWe’re going to use the excellent ByteBuddy library for this job. 
Here’s a slightly simplified implementation for invoking the executable validator (you can find the complete source code of this example in this GitHub repository):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 public class ValidationInterceptor { private static final Validator validator = Validation (1) .buildDefaultValidatorFactory() .getValidator(); public static \u0026lt;T\u0026gt; void validate(@Origin Constructor\u0026lt;T\u0026gt; constructor, @AllArguments Object[] args) { (2) Set\u0026lt;ConstraintViolation\u0026lt;T\u0026gt;\u0026gt; violations = validator (3) .forExecutables() .validateConstructorParameters(constructor, args); if (!violations.isEmpty()) { String message = violations.stream() (4) .sorted(ValidationInterceptor::compare) .map(cv -\u0026gt; getParameterName(cv) + \u0026#34; - \u0026#34; + cv.getMessage()) .collect(Collectors.joining(System.lineSeparator())); throw new ConstraintViolationException( (5) \u0026#34;Invalid instantiation of record type \u0026#34; + constructor.getDeclaringClass().getSimpleName() + System.lineSeparator() + message, violations); } } private static int compare(ConstraintViolation\u0026lt;?\u0026gt; o1, ConstraintViolation\u0026lt;?\u0026gt; o2) { return Integer.compare(getParameterIndex(o1), getParameterIndex(o2)); } private static String getParameterName(ConstraintViolation\u0026lt;?\u0026gt; cv) { // traverse property path to extract parameter name } private static int getParameterIndex(ConstraintViolation\u0026lt;?\u0026gt; cv) { // traverse property path to extract parameter index } } 1 Obtain a Bean Validation Validator instance 2 The @Origin and @AllArguments annotations are the hint to ByteBuddy that the invoked constructor and parameter values should be passed to this method from within the enhanced constructor 3 Validate the passed constructor arguments using Bean Validation 4 If there’s at least one violated constraint, create a 
message comprising all constraint violation messages, ordered by parameter index 5 Raise a ConstraintViolationException, containing the message created before as well as all the constraint violations Having implemented the validation interceptor, the code of the record constructor must be enhanced by ByteBuddy, so that it invokes the interceptor. ByteBuddy provides different ways for doing so, e.g. at application start-up using a Java agent. For this example, we’re going to employ build-time enhancement via the ByteBuddy Maven plug-in. The enhancement logic itself is implemented in a custom net.bytebuddy.build.Plugin:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 public class ValidationWeavingPlugin implements Plugin { @Override public boolean matches(TypeDescription target) { (1) return target.getDeclaredMethods() .stream() .anyMatch(m -\u0026gt; m.isConstructor() \u0026amp;\u0026amp; hasConstrainedParameter(m)); } @Override public Builder\u0026lt;?\u0026gt; apply(Builder\u0026lt;?\u0026gt; builder, TypeDescription typeDescription, ClassFileLocator classFileLocator) { return builder.constructor(this::hasConstrainedParameter) (2) .intercept(SuperMethodCall.INSTANCE.andThen( MethodDelegation.to(ValidationInterceptor.class))); } private boolean hasConstrainedParameter(MethodDescription method) { return method.getParameters() (3) .asDefined() .stream() .anyMatch(p -\u0026gt; isConstrained(p)); } private boolean isConstrained( ParameterDescription.InDefinedShape parameter) { (4) return !parameter.getDeclaredAnnotations() .asTypeList() .filter(hasAnnotation(annotationType(Constraint.class))) .isEmpty(); } @Override public void close() throws IOException { } } 1 Determines whether a type should be enhanced or not; this is the case if there’s at least one constructor that has one or more constrained parameters 2 Applies the actual enhancement: into each constrained constructor the call to ValidationInterceptor 
gets injected 3 Determines whether a method or constructor has at least one constrained parameter 4 Determines whether a parameter has at least one constraint annotation (an annotation meta-annotated with @Constraint; for the sake of simplicity the case of constraint inheritance is ignored here) The next step is to configure the ByteBuddy Maven plug-in in the pom.xml of the project:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;net.bytebuddy\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;byte-buddy-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;${version.bytebuddy}\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;transform\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;transformations\u0026gt; \u0026lt;transformation\u0026gt; \u0026lt;plugin\u0026gt; dev.morling.demos.recordvalidation.implementation.ValidationWeavingPlugin \u0026lt;/plugin\u0026gt; \u0026lt;/transformation\u0026gt; \u0026lt;/transformations\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; This plug-in runs in the process-classes phase by default, so it can access and enhance the class files generated during compilation. 
If you were to build the project now, you could use the javap tool to examine the byte code of the Car class, and you’d see that the implicit constructor of that class contains an invocation of the ValidationInterceptor#validate() method.\nAs an example, let’s consider the following attempt to instantiate a Car object, which violates the invariants of that record type:\n1 Car invalid = new Car(\u0026#34;\u0026#34;, \u0026#34;HH-AB-123\u0026#34;, 1); A constraint violation like this will be thrown immediately:\n1 2 3 4 5 javax.validation.ConstraintViolationException: Invalid instantiation of record type Car manufacturer - must not be blank seatCount - must be greater than or equal to 2 at dev.morling.demos.recordvalidation.RecordValidationTest.canValidate(RecordValidationTest.java:20) If all constraints are satisfied, no exception will be thrown and the caller obtains the new Car instance, whose invariants are guaranteed to be met for the remainder of the object’s lifetime.\nAdvantages Having shown how Bean Validation can be leveraged to enforce the invariants of Java record types, it is time to reflect: is this approach worth the additional complexity incurred by adding a library such as Bean Validation and hooking it up using byte code enhancement? After all, you could also validate incoming parameter values using methods such as Objects#requireNonNull().\nAs so often, you need to make such a decision based on your specific requirements and needs. Here are some advantages I can see about the Bean Validation approach:\nInvariants become part of the API: Constraint annotations on public API members such as the implicit record constructor are easily discoverable by users of such a type; they are listed in generated JavaDoc, you can see them when hovering over an invocation in your IDE (once records are supported); when used on the DTOs of a REST layer, the invariants could also be added to automatically generated API documentation. 
All this makes it easy for users of the type to understand the invariants and also avoids potential inconsistencies between a manual validation implementation and corresponding hand-written documentation\nProviding constraint metadata: The Bean Validation constraint meta-data API can be used to obtain information about the constraints of Java types; for instance this can be used to implement client-side validation of constraints in a web application\nLess code: Putting constraint annotations directly to the record components themselves avoids the need for implementing these checks manually in an explicit canonical constructor\nI18N support: Bean Validation provides means of internationalizing constraint violation messages; if your record types are instantiated based on user input (e.g. when using them as data types in a REST API), this allows for localized error messages in the UI\nReturning all constraints at once: For UIs it’s typically beneficial to return all the constraint violations at once instead of showing them one by one; while doable in a hand-written implementation, it requires a bit of effort, whereas you get this \u0026#34;for free\u0026#34; when using Bean Validation which always returns a set of all the violations\nLots of ready-made constraints: Bean Validation comes with a range of constraints out of the box; in addition libraries such as Hibernate Validator and others provide many more ready-to-use constraints, coming in handy for instance when implementing domain-specific value types with complex validation rules:\n1 2 3 public record EmailAddress( @Email @NotNull @Size(min=1, max=250) String value) { } Support for validation groups: Bean Validation’s concept of validation groups allows you to validate only sub-sets of constraints in specific contexts; e.g. based on location and applying legal requirements\nDynamic constraint definition: Using Hibernate Validator, constraints can also be declared dynamically using a fluent API. 
This can be very useful when your validation requirements vary at runtime, e.g. if you need to apply different constraint configurations for different tenants.\nLimitations One area where this current proof-of-concept implementation falls a bit short is the validation of invariants that apply to multiple components. For instance consider a record type representing an interval with a begin and an end attribute, where you’d like to enforce the invariant that end is larger than begin.\nBean Validation addresses this sort of requirement via class-level constraints and, for method and constructor validation, cross-parameter constraints. Class-level constraints are not really suitable for our purposes, because we want to validate the invariants before an object instance is created.\nCross-parameter constraints on the other hand are exactly what we’d need. As they must be given on a constructor or method, the canonical constructor of a record must be explicitly declared in this case. Using Hibernate Validator’s @ParameterScriptAssert constraint, the invariant from above could be expressed like so:\n1 2 3 4 5 6 public record Interval(int begin, int end) { @ParameterScriptAssert(lang=\u0026#34;javascript\u0026#34;, script=\u0026#34;end \u0026gt; begin\u0026#34;) public Interval { } } This works as expected, but there’s one caveat: any annotations from the record components are not propagated to the corresponding parameters of the canonical constructor in this case. This means that any constraints given on the individual components would be lost. 
Right now it’s not quite clear to me whether that’s an intended behavior or rather a bug in the current record implementation.\nIf indeed it is intentional, then there’d be no way other than specifying the constraints explicitly on the parameters of a fully manually implemented constructor:\n1 2 3 4 5 6 7 8 public record Interval(int begin, int end) { @ParameterScriptAssert(lang=\u0026#34;javascript\u0026#34;, script=\u0026#34;end \u0026gt; begin\u0026#34;) public Interval(@Positive int begin, @Positive int end) { this.begin = begin; this.end = end; } } This works, but of course we’re losing a bit of the conciseness promised by records.\nUpdate, Jan 20, 2020, 20:57: Turns out, the current behavior indeed is not intended (see JDK-8236597) and in a future Java version the shorter version of the code shown above should work.\nWrap-Up In this blog post we’ve explored how invariants on Java 14 record types can be enforced using the Bean Validation API. With just a bit of byte code magic the task gets manageable: by validating invariants expressed by constraint annotations on record components right at instantiation time, only valid record instances will ever be exposed to callers. Key for that is the fact that any annotations from record components are automatically propagated to the corresponding parameters of the canonical record constructor. That way they can be validated using Bean Validation’s method validation API. It remains to be seen whether invariants based on multiple record components can also be enforced as easily.\nFrom the perspective of the Bean Validation specification, it’ll surely make sense to explore support for record types. While not as powerful as enforcing invariants at construction time via byte code enhancement, it might also be useful to support the validation of component values via their read accessors. 
For that, the notion of \u0026#34;properties\u0026#34; would have to be relaxed, as the read accessors of records don’t have the JavaBeans get prefix currently expected by Bean Validation. It also should be considered to expand the Bean Validation metadata API accordingly.\nI would also be very happy to learn about your thoughts around this topic. While Bean Validation 3.0 (as part of Jakarta EE 9) in all likelihood won’t bring any changes besides the transition to the jakarta.* package namespace, this may be an area where we could evolve the specification for Jakarta EE 10.\nIf you’d like to experiment with the validation of record types yourself, you can find the complete source code on GitHub.\n","id":79,"publicationdate":"Jan 20, 2020","section":"blog","summary":"Record types are one of the most awaited features in Java 14; they promise to \u0026#34;provide a compact syntax for declaring classes which are transparent holders for shallowly immutable data\u0026#34;. One example where records should be beneficial are data transfer objects (DTOs), as e.g. found in the remoting layer of enterprise applications. Typically, certain rules should be applied to the attributes of such DTO, e.g. in terms of allowed values.","tags":["bean-validation","jakartaee"],"title":"Enforcing Java Record Invariants With Bean Validation","uri":"https://www.morling.dev/blog/enforcing-java-record-invariants-with-bean-validation/"},{"content":" When Java 9 was introduced in 2017, it was the last major version published under the old release scheme. Since then, a six month release cadence has been adopted. This means developers don’t have to wait years for new APIs and language features, but they can get their hands onto the latest additions twice a year. 
In this post I’d like to describe how you can try out new language features such as Java 13 text blocks in the test code of your project, while keeping your main code still compatible with older Java versions.\nOne goal of the increased release cadence is to shorten the feedback loop for the OpenJDK team: have developers in the field try out new functionality early on, collect feedback based on that, adjust as needed. To aid with that process, the JDK has two means of publishing preliminary work before new APIs and language features are cast in stone:\nIncubator JDK modules\nPreview language and VM features\nAn example for the former is the new HTTP client API, which was an incubator module in JDK 9 and 10, before it got standardized as a regular API in JDK 11. Examples for preview language features are switch expressions (added as a preview feature in Java 12) and text blocks (added in Java 13).\nNow especially text blocks are a feature which many developers have missed in Java for a long time. They are really useful when embedding other languages, or just any kind of longer text into your Java program, e.g. multi-line SQL statements, JSON documents and others. So you might want to go and use them as quickly as possible, but depending on your specific situation and requirements, you may not be able to move to Java 13 just yet.\nIn particular when working on libraries, compatibility with older Java versions is a high priority in order to not cut off a large number of potential users. E.g. in the JetBrains Developer Ecosystem Survey from early 2019, 83% of participants said that Java 8 is a version they regularly use. This matches with what I’ve observed myself during conversations e.g. at conferences. 
Now this share may have reduced a bit since then (I couldn’t find any newer numbers), but at this point in time it still seems safe to say that libraries should support Java 8 to not limit their audience in a significant way.\nSo while building on Java 13 is fine, requiring it at runtime for libraries isn’t. Does this mean as a library author you cannot use text blocks then for many years to come? For your main code (i.e. the one shipped to users) it indeed does mean that, but things look different when it comes to test code.\nAn Example One case where text blocks come in extremely handy is testing of REST APIs, where JSON requests need to be created and responses may have to be compared to a JSON string with the expected value. Here’s an example of using text blocks in a test of a Quarkus-based REST service, implemented using RESTAssured and JSONAssert:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 @QuarkusTest public class TodoResourceTest { @Test public void canPostNewTodoAndReceiveId() throws Exception { given() .when() .body(\u0026#34;\u0026#34;\u0026#34; (1) { \u0026#34;title\u0026#34; : \u0026#34;Learn Java\u0026#34;, \u0026#34;completed\u0026#34; : false } \u0026#34;\u0026#34;\u0026#34; ) .contentType(ContentType.JSON) .post(\u0026#34;/hello\u0026#34;) .then() .statusCode(201) .body(matchesJson(\u0026#34;\u0026#34;\u0026#34; (2) { \u0026#34;id\u0026#34; : 1, \u0026#34;title\u0026#34; : \u0026#34;Learn Java\u0026#34;, \u0026#34;completed\u0026#34; : false } \u0026#34;\u0026#34;\u0026#34;) ); } } 1 Text block with the JSON request to send 2 Text block with the expected JSON response Indeed that’s much nicer to read, e.g. when comparing the request JSON to the code you’d typically write without text blocks. 
Concatenating multiple lines, escaping quotes and explicitly specifying line breaks make this quite cumbersome:\n1 2 3 4 5 6 .body( \u0026#34;{\\n\u0026#34; + \u0026#34; \\\u0026#34;title\\\u0026#34; : \\\u0026#34;Learn Java 13\\\u0026#34;,\\n\u0026#34; + \u0026#34; \\\u0026#34;completed\\\u0026#34; : false\\n\u0026#34; + \u0026#34;}\u0026#34; ) Now let’s see what’s needed in terms of configuration to enable usage of Java 13 text blocks for tests, while keeping the main code of a project compatible with Java 8.\nConfiguration Two options of the Java compiler javac come into play here:\n--release: specifies the Java version to compile for\n--enable-preview: allows to use language features currently in \u0026#34;preview\u0026#34; status such as text blocks as of Java 13/14\nThe --release option was introduced in Java 9 and should be preferred over the more widely known pair of --source and --target. The reason being that --release will prevent any accidental usage of APIs only introduced in later versions.\nE.g. say you were to write code such as List.of(\u0026#34;Foo\u0026#34;, \u0026#34;Bar\u0026#34;); the of() methods on java.util.List were only introduced in Java 9, so compiling with --release 8 will raise a compilation error in this case. When using the older options, this situation wouldn’t be detected at compile time, making the problem only apparent when actually running the application on the older Java version.\nBuild tools typically allow to use different configurations for the compilation of main and test code. E.g. here is what you’d use for Maven (you can find the complete source code of the example in this GitHub repo):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 ... \u0026lt;properties\u0026gt; ... \u0026lt;maven.compiler.release\u0026gt;8\u0026lt;/maven.compiler.release\u0026gt; (1) ... \u0026lt;/properties\u0026gt; \u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; ... 
\u0026lt;plugin\u0026gt; \u0026lt;artifactId\u0026gt;maven-compiler-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.8.1\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;default-testCompile\u0026lt;/id\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;release\u0026gt;13\u0026lt;/release\u0026gt; (2) \u0026lt;compilerArgs\u0026gt;--enable-preview\u0026lt;/compilerArgs\u0026gt; (3) \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... \u0026lt;/plugins\u0026gt; ... \u0026lt;/build\u0026gt; ... 1 Compile for release 8 by default, i.e. the main code 2 Compile test code for release 13 3 Also pass the --enable-preview option when compiling the test code Also at runtime preview features must be explicitly enabled. Therefore the java command must be accordingly configured when executing the tests, e.g. like so when using the Maven Surefire plug-in:\n1 2 3 4 5 6 7 8 9 ... \u0026lt;plugin\u0026gt; \u0026lt;artifactId\u0026gt;maven-surefire-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.22.1\u0026lt;/version\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;argLine\u0026gt;--enable-preview\u0026lt;/argLine\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; ... With this configuration in place, text blocks can now be used in tests as the one above, but not in the main code of the program. Doing so would result in a compilation error.\nNote your IDE might still let you do this kind of mistake. At least Eclipse chose for me the maximum of main (8) and test code (13) release levels when importing the project. But running the build on the command line via Maven or on your CI server will detect this situation.\nAs Java 13 now is required to build this code base, it’s a good idea to make this prerequisite explicit in the build process itself. 
The Maven enforcer plug-in comes in handy for that, allowing to express this requirement using its Java version rule:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ... \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-enforcer-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.0.0-M3\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;enforce-java\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;enforce\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;rules\u0026gt; \u0026lt;requireJavaVersion\u0026gt; \u0026lt;version\u0026gt;[13,)\u0026lt;/version\u0026gt; \u0026lt;/requireJavaVersion\u0026gt; \u0026lt;/rules\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... The plug-in will fail the build when being run on a version before Java 13.\nShould You Do This? Having seen how you can use preview features in test code, the question is: should you actually do this? A few things should be kept in mind for answering that. First of all, preview features are really that, a preview. This means that details may change in future Java revisions. Or, albeit unlikely, such feature may even be dropped altogether, should the JDK team arrive at the conclusion that it is fundamentally flawed.\nAnother important factor is the minimum Java language version supported by the JDK compiler. As of Java 13, the oldest supported release is 7; i.e. using JDK 13, you can produce byte code that can be run with Java versions as old as Java 7. In order to keep the Java compiler maintainable, support for older versions is dropped every now and then. 
Right now, there’s no formal process in place which would describe when support for a specific version is going to be removed (defining such policy is the goal of JEP 182).\nAs per JDK developer Joe Darcy, \u0026#34;there are no plans to remove support for --release 7 in JDK 15\u0026#34;. Conversely, this means that support for release 7 theoretically could be removed in JDK 16 and support for release 8 could be removed in JDK 17. In that case you’d be caught between a rock and a hard place: Once you’re on a non-LTS (\u0026#34;long-term support\u0026#34;) release like JDK 13, you’ll need to upgrade to JDK 14, 15 etc. as soon as they are out, in order to not be cut off from bug fixes and security patches. Now while doing so, you’d be forced to increase the release level of your main code, once support for release 8 gets dropped, which may not be desirable. Or you’d have to apply some nice awk/sed magic to replace all those shiny text blocks with traditional concatenated and escaped strings, so you can go back to the current LTS release, Java 11. Not nice, but surely doable.\nThat being said, this all doesn’t seem like a likely scenario to me. JEP 182 expresses a desire \u0026#34;that source code 10 or more years old should still be able to be compiled\u0026#34;; hence I think it’s safe to assume that JDK 17 (the next release planned to receive long-term support) will still support release 8, which will be seven years old when 17 gets released as planned in September 2021. 
In that case you’d be on the safe side, receiving update releases and being able to keep your main code Java 8 compatible for quite a few years to come.\nNeedless to say, it’s a call that you need to make, deciding for yourself whether the benefits of using new language features such as text blocks are worth it in your specific situation or not.\n","id":80,"publicationdate":"Jan 13, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhen Java 9 was introduced in 2017,\nit was the last major version published under the old release scheme.\nSince then, a \u003ca href=\"https://www.infoq.com/news/2017/09/Java6Month/\"\u003esix month release cadence\u003c/a\u003e has been adopted.\nThis means developers don’t have to wait years for new APIs and language features,\nbut they can get their hands onto the latest additions twice a year.\nIn this post I’d like to describe how you can try out new language features such as \u003ca href=\"http://openjdk.java.net/jeps/355\"\u003eJava 13 text blocks\u003c/a\u003e in the test code of your project,\nwhile keeping your main code still compatible with older Java versions.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Using Java 13 Text Blocks (Only) for Your Tests","uri":"https://www.morling.dev/blog/using-java-13-text-blocks-for-tests/"},{"content":" One of the long-awaited features in Quarkus was support for server-side templating: until recently, Quarkus supported only client-side web frameworks which obtain their data by calling a REST API on the backend. 
This has changed with Quarkus 1.1: it comes with a brand-new template engine named Qute, which allows building web applications using server-side templates.\nWhen looking at frameworks for building web applications, there are two large categories:\nclient-side solutions based on JavaScript such as React, Vue.js or Angular\nserver-side frameworks such as Spring Web MVC, JSF or MVC 1.0 (in the Java world)\nBoth have their individual strengths and weaknesses, and it wouldn’t be wise to always prefer one over the other. Instead, the choice should be based on specific requirements (e.g. what kind of interactivity is needed) and prerequisites (e.g. the skillset of the team building the application).\nBeing mostly experienced with Java, I find server-side solutions appealing, as they allow me to use the language I know and tooling (build tools, IDEs) I’m familiar and most productive with. So when Qute was announced, it instantly caught my attention and I had to give it a test ride. In this post I want to share some of my experiences.\nNote this isn’t a comprehensive tutorial for building web apps with Qute; instead, I’d like to discuss a few things that stuck out to me. You can find a complete working example here on GitHub. It implements a basic CRUD application for managing personal todos, persisted in a Postgres database. Here’s a video that shows the demo in action:\nThe Basics The Qute engine is based on RESTEasy/JAX-RS. As such, Qute web applications are implemented by defining resource types with methods answering to specific HTTP verbs and accept headers. The only difference is that HTML pages are returned instead of JSON as in your typical REST-ful data API. The individual pages are created by processing template files. 
Here’s a basic example for returning all the Todo records in our application:\n@Path(\u0026#34;/todo\u0026#34;) public class TodoResource { @Inject Template todos; @GET (1) @Consumes(MediaType.TEXT_HTML) (2) @Produces(MediaType.TEXT_HTML) public TemplateInstance listTodos() { return todos.data(\u0026#34;todos\u0026#34;, Todo.findAll().list()); (3) } } 1 Processes HTTP GET requests for /todo 2 This method consumes and produces the text/html media type 3 Obtain all todos from the database and feed them to the todos template The Todo class is a JPA entity implemented via Hibernate Panache:\n@Entity public class Todo extends PanacheEntity { public String title; public int priority; public boolean completed; } Panache is a perfect fit for this kind of CRUD application. It helps with common tasks such as id mapping, and by means of the active record pattern you get query methods like findAll() \u0026#34;for free\u0026#34;.\nTo produce an HTML page for displaying the result list, the todos template is used. Templates are located under src/main/resources/templates. As you would expect, changes to template files are immediately picked up when running Quarkus in Dev Mode. By default, the template name is derived from the field name of the injected Template instance, i.e. in this case the src/main/resources/templates/todos.html template will be used. It could look like this:\n\u0026lt;!doctype html\u0026gt; \u0026lt;html lang=\u0026#34;en\u0026#34;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;utf-8\u0026#34;\u0026gt; \u0026lt;!-- CSS ... 
--\u0026gt; \u0026lt;link rel=\u0026#34;stylesheet\u0026#34; href=\u0026#34;...\u0026#34;\u0026gt; \u0026lt;title\u0026gt;My Todos\u0026lt;/title\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;div class=\u0026#34;container\u0026#34;\u0026gt; \u0026lt;h1\u0026gt;My Todos\u0026lt;/h1\u0026gt; \u0026lt;table class=\u0026#34;table table-striped table-bordered\u0026#34;\u0026gt; \u0026lt;thead\u0026gt; \u0026lt;tr\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; class=\u0026#34;fit\u0026#34;\u0026gt;Id\u0026lt;/th\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; \u0026gt;Title\u0026lt;/th\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; class=\u0026#34;fit\u0026#34;\u0026gt;Priority\u0026lt;/th\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; class=\u0026#34;fit\u0026#34;\u0026gt;Completed\u0026lt;/th\u0026gt; \u0026lt;/tr\u0026gt; \u0026lt;/thead\u0026gt; {#if todos.size == 0} (1) \u0026lt;tr\u0026gt; \u0026lt;td colspan=\u0026#34;4\u0026#34;\u0026gt;No data found.\u0026lt;/td\u0026gt; \u0026lt;/tr\u0026gt; {#else} {#for todo in todos} (2) \u0026lt;tr\u0026gt; \u0026lt;th scope=\u0026#34;row\u0026#34;\u0026gt;#{todo.id}\u0026lt;/th\u0026gt; \u0026lt;td\u0026gt; {todo.title} (3) \u0026lt;/td\u0026gt; \u0026lt;td\u0026gt; {todo.priority} (4) \u0026lt;/td\u0026gt; \u0026lt;td\u0026gt; (5) \u0026lt;div class=\u0026#34;custom-control custom-checkbox\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;checkbox\u0026#34; class=\u0026#34;custom-control-input\u0026#34; disabled id=\u0026#34;completed-{todo.id}\u0026#34; {#if todo.completed}checked{/if}\u0026gt; \u0026lt;label class=\u0026#34;custom-control-label\u0026#34; for=\u0026#34;completed-{todo.id}\u0026#34;\u0026gt;\u0026lt;/label\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/td\u0026gt; \u0026lt;/tr\u0026gt; {/for} {/if} \u0026lt;/table\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; 1 If the injected todos list is empty, display a placeholder row 2 Otherwise, iterate 
over the todos list and add a table row for each one 3 Table cell for title 4 Table cell for priority 5 Table cell for completion status, rendered as a checkbox If you’ve worked with other templating engines before, this will look very familiar to you. You can refer to injected objects and their properties to display their values, have conditional logic, iterate over collections etc. A very nice aspect about Qute templates is that they are processed at build time, following the Quarkus notion of \u0026#34;compile-time boot\u0026#34;. This means if there is an error in a template such as unbalanced control keywords, you’ll find out about this at build time instead of only at runtime.\nThe reference documentation describes the syntax and all options in depth. Note that things are still in flux here, e.g. I couldn’t work with boolean operators in conditions.\nCombining HTML and Data APIs Thanks to HTTP content negotiation, you can easily combine resource methods for returning HTML and JSON for API-style consumers in a single endpoint. Just add another resource method for handling the required media type, e.g. \u0026#34;application/json\u0026#34;:\n@GET @Produces(MediaType.APPLICATION_JSON) @Consumes(MediaType.APPLICATION_JSON) public List\u0026lt;Todo\u0026gt; listTodosJson() { return Todo.findAll().list(); } A standard HTTP request issued by a web browser would now be answered with the HTML page, whereas an AJAX request with the \u0026#34;application/json\u0026#34; accept header (or a manual invocation via curl) would yield the JSON representation. I really like that idea of considering HTML and JSON-based representations as two different \u0026#34;views\u0026#34; of the same API essentially.\nTemplate Organization If a web application has multiple pages or \u0026#34;views\u0026#34;, chances are there are many similarities between those. E.g. 
there might be a common header and footer for all pages, or one and the same form is used on multiple pages.\nTo avoid duplication in the templates in such cases, Qute supports the notion of includes. E.g. let’s say there’s a common form for creating new and editing existing todos. This can be put into its own template:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 (1) \u0026lt;form action=\u0026#34;/todo/{#if update}{todo.id}/edit{#else}new{/if}\u0026#34; method=\u0026#34;POST\u0026#34; name=\u0026#34;todoForm\u0026#34; enctype=\u0026#34;multipart/form-data\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-row align-items-center\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;col-sm-3 my-1\u0026#34;\u0026gt; \u0026lt;label class=\u0026#34;sr-only\u0026#34; for=\u0026#34;title\u0026#34;\u0026gt;Title\u0026lt;/label\u0026gt; (2) \u0026lt;input type=\u0026#34;text\u0026#34; name=\u0026#34;title\u0026#34; class=\u0026#34;form-control\u0026#34; id=\u0026#34;title\u0026#34; placeholder=\u0026#34;Title\u0026#34; required autofocus {#if update}value=\u0026#34;{todo.title}\u0026#34;{/if}\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;div class=\u0026#34;col-auto my-1\u0026#34;\u0026gt; \u0026lt;select class=\u0026#34;custom-select\u0026#34; name=\u0026#34;priority\u0026#34;\u0026gt; \u0026lt;option disabled value=\u0026#34;\u0026#34;\u0026gt;Priority\u0026lt;/option\u0026gt; {#for prio in priorities} \u0026lt;option value=\u0026#34;{prio}\u0026#34; {#if todo.priority == prio}selected{/if}\u0026gt;{prio}\u0026lt;/option\u0026gt; {/for} \u0026lt;/select\u0026gt; \u0026lt;/div\u0026gt; (3) {#if update} \u0026lt;div class=\u0026#34;col-auto my-1\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-check\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;completed\u0026#34; class=\u0026#34;form-check-input\u0026#34; id=\u0026#34;completed\u0026#34; {#if todo.completed}checked{/if}\u0026gt; \u0026lt;label 
class=\u0026#34;form-check-label\u0026#34; for=\u0026#34;completed\u0026#34;\u0026gt;Completed\u0026lt;/label\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/div\u0026gt; {/if} (4) \u0026lt;button type=\u0026#34;submit\u0026#34; class=\u0026#34;btn btn-primary\u0026#34;\u0026gt;{#if update}Update{#else}Create{/if}\u0026lt;/button\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/form\u0026gt; 1 Post to different path for update and create 2 Display existing title and priority in case of an update 3 Show checkbox for completion status in case of an update 4 Choose button caption depending on use case In order to display this form right under the table with all todos, the template can simply be included like so:\n1 2 \u0026lt;h2\u0026gt;New Todo\u0026lt;/h2\u0026gt; {#include todo-form.html}{/include} It’s also possible to extract the outer shell of multiple pages into a shared template (\u0026#34;template inheritance\u0026#34;). This allows to extract common headers and footers into one single template with placeholders for the inner parts.\nFor that, create a template with the common outer structure:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 \u0026lt;!doctype html\u0026gt; \u0026lt;html lang=\u0026#34;en\u0026#34;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;utf-8\u0026#34;\u0026gt; \u0026lt;!-- CSS ... 
--\u0026gt; \u0026lt;link rel=\u0026#34;stylesheet\u0026#34; href=\u0026#34;...\u0026#34;\u0026gt; \u0026lt;title\u0026gt;{#insert title}Default Title{/}\u0026lt;/title\u0026gt; (1) \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;div class=\u0026#34;container\u0026#34;\u0026gt; \u0026lt;h1\u0026gt;{#insert title}Default Title{/}\u0026lt;/h1\u0026gt; (1) {#insert contents}No contents!{/} (2) \u0026lt;/div\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; 1 Derived templates define a section title which will be inserted here 2 Derived templates define a section contents which will be inserted here Other templates can then extend the base one, e.g. like so for the \u0026#34;Edit Todo\u0026#34; page:\n{#include base.html} (1) {#title}Edit Todo #{todo.id}{/title} (2) {#contents} (3) {#include todo-form.html}{/include} (4) {/contents} {/include} 1 Include the base template 2 Define the title section 3 Define the contents section 4 Include the template for displaying the todo form As so often, a balance needs to be found between extracting common parts and still being able to comprehend the overall structure without having to pursue a large number of template references. But in any case, with includes and inserts Qute puts the necessary tools into your hands.\nError Handling For a great user experience, robust error handling is a must. E.g. it might happen that a user loads the \u0026#34;Edit Todo\u0026#34; dialog and while they’re in the process of editing, that record gets deleted by someone else. When saving, a proper error message should be displayed to the first user. 
Here’s the resource method implementation for that:\n@POST @Consumes(MediaType.MULTIPART_FORM_DATA) @Transactional @Path(\u0026#34;/{id}/edit\u0026#34;) public Object updateTodo( @PathParam(\u0026#34;id\u0026#34;) long id, @MultipartForm TodoForm todoForm) { Todo loaded = Todo.findById(id); (1) if (loaded == null) { (2) return error.data(\u0026#34;error\u0026#34;, \u0026#34;Todo with id \u0026#34; + id + \u0026#34; has been deleted after loading this form.\u0026#34;); } loaded = todoForm.updateTodo(loaded); (3) return Response.status(303) (4) .location(URI.create(\u0026#34;/todo\u0026#34;)) .build(); } 1 Load the todo record to be updated 2 If it doesn’t exist, render the \u0026#34;error\u0026#34; template 3 Otherwise, update the record; as loaded is an attached entity, no call to persist is needed 4 Redirect the user to the main page with status 303 (\u0026#34;See Other\u0026#34;), avoiding issues with reloading etc. (post-redirect-get pattern) Note that TemplateInstance as returned from the Template#data() method doesn’t extend the JAX-RS Response class. Therefore the return type of the method must be declared as Object in this case.\nSearch Thanks to Hibernate Panache it’s quite simple to refine the todo list and only return those whose title matches a given search term. Also ordering the list in some meaningful way would be nice. 
All we need is an optional query parameter for specifying the search term and a custom query method:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 @GET @Consumes(MediaType.TEXT_HTML) @Produces(MediaType.TEXT_HTML) public TemplateInstance listTodos(@QueryParam(\u0026#34;filter\u0026#34;) String filter) { return todos.data(\u0026#34;todos\u0026#34;, find(filter)); } @GET @Produces(MediaType.APPLICATION_JSON) @Consumes(MediaType.APPLICATION_JSON) public List\u0026lt;Todo\u0026gt; listTodosJson(@QueryParam(\u0026#34;filter\u0026#34;) String filter) { return find(filter); } private List\u0026lt;Todo\u0026gt; find(String filter) { Sort sort = Sort.ascending(\u0026#34;completed\u0026#34;) (1) .and(\u0026#34;priority\u0026#34;, Direction.Descending) .and(\u0026#34;title\u0026#34;, Direction.Ascending); if (filter != null \u0026amp;\u0026amp; !filter.isEmpty()) { (2) return Todo.find(\u0026#34;LOWER(title) LIKE LOWER(?1)\u0026#34;, sort, \u0026#34;%\u0026#34; + filter + \u0026#34;%\u0026#34;).list(); } else { return Todo.findAll(sort).list(); (3) } } 1 First sort by completion status, then priority, then by title 2 If a filter is given, apply the search term lower-cased and with wildcards, i.e. 
using a WHERE clause such as where lower(todo0_.title) like lower(%searchterm%) 3 Otherwise, return all todos To enter the search term, a form is added next to the table of todos:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 (1) \u0026lt;form action=\u0026#34;/todo\u0026#34; method=\u0026#34;GET\u0026#34; name=\u0026#34;search\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-row align-items-center\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;col-sm-3 my-1\u0026#34;\u0026gt; \u0026lt;label class=\u0026#34;sr-only\u0026#34; for=\u0026#34;filter\u0026#34;\u0026gt;Search\u0026lt;/label\u0026gt; (2) \u0026lt;input type=\u0026#34;text\u0026#34; name=\u0026#34;filter\u0026#34; class=\u0026#34;form-control\u0026#34; id=\u0026#34;filter\u0026#34; placeholder=\u0026#34;Search By Title\u0026#34; required {#if filtered}value=\u0026#34;{filter}\u0026#34;{/if}\u0026gt; \u0026lt;/div\u0026gt; (3) \u0026lt;input class=\u0026#34;btn btn-primary\u0026#34; value=\u0026#34;Search\u0026#34; type=\u0026#34;submit\u0026#34;\u0026gt;\u0026amp;nbsp; \u0026lt;a class=\u0026#34;btn btn-secondary {#if !filtered}disabled{/if}\u0026#34; href=\u0026#34;/todo\u0026#34; role=\u0026#34;button\u0026#34;\u0026gt;Clear Filter\u0026lt;/a\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/form\u0026gt; 1 Invoke this page with the entered search as query parameter 2 Input for the search term; show the previously entered term, if any 3 A button for clearing the result list if a search term has been entered; otherwise the button will be disabled Smoother User Experience via Unpoly The last thing I wanted to explore is how the usability and performance of the application can be improved by means of some client-side enhancements. By default, a web app rendered on the server-side like ours requires full page loads when going from one page to the other. 
This is where single page applications (SPAs) implemented with client-side frameworks shine: just parts of the document object model tree in the browser will be replaced e.g. when loading a result list via AJAX, resulting in a much smoother and faster user experience.\nDoes this mean we have to give up on server-side rendering altogether if we’re after this kind of UX? Luckily not, as small helper libraries such as Unpoly, Intercooler or Turbolinks can be leveraged to replace just page fragments instead of requiring full page loads. This results in a smooth SPA-like user experience without having to opt into the full client-side programming model. For the Todo example I’ve obtained great results using Unpoly. After importing its JavaScript file, all that’s needed is to add the up-target attribute to links or forms.\nE.g. here’s the form for entering the search term with that modification:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 (1) \u0026lt;form action=\u0026#34;/todo\u0026#34; method=\u0026#34;GET\u0026#34; name=\u0026#34;search\u0026#34; up-target=\u0026#34;.container\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-row align-items-center\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;col-sm-3 my-1\u0026#34;\u0026gt; \u0026lt;label class=\u0026#34;sr-only\u0026#34; for=\u0026#34;filter\u0026#34;\u0026gt;Search\u0026lt;/label\u0026gt; \u0026lt;input type=\u0026#34;text\u0026#34; name=\u0026#34;filter\u0026#34; class=\u0026#34;form-control\u0026#34; id=\u0026#34;filter\u0026#34; placeholder=\u0026#34;Search By Title\u0026#34; required {#if filtered}value=\u0026#34;{filter}\u0026#34;{/if}\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;input class=\u0026#34;btn btn-primary\u0026#34; value=\u0026#34;Search\u0026#34; type=\u0026#34;submit\u0026#34;\u0026gt;\u0026amp;nbsp; (2) \u0026lt;a class=\u0026#34;btn btn-secondary {#if !filtered}disabled{/if}\u0026#34; href=\u0026#34;/todo\u0026#34; role=\u0026#34;button\u0026#34; up-target=\u0026#34;.container\u0026#34;\u0026gt;Clear 
Filter\u0026lt;/a\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/form\u0026gt; 1 When receiving the result of the form submission, replace the \u0026lt;div\u0026gt; with CSS class container of the current page with the one from the response 2 Do the same when following the \u0026#34;Clear Filter\u0026#34; link The magic trick of Unpoly is that links and forms with the up-target attribute are intercepted by Unpoly and executed via AJAX calls. The specified fragments from the result page are then used to replace parts of the already loaded page, instead of having the browser load the full response page. The result is the fast user experience shown in the video above.\nUnpoly also allows to show page fragments in modal dialogs, allowing to remain on the same page also when showing forms such as the one for editing a todo:\nNote that if JavaScript is disabled, the application gracefully falls back to full page loads. I.e. it will still be fully functional, just with a slightly degraded user experience. The same would happen when accessing the edit dialog directly via its URL or when opening the \u0026#34;Edit\u0026#34; link in a new tab or window:\nBonus: Using WebJars In a thread on Twitter James Ward brought up the idea of pulling in required resources such as Bootstrap via WebJars instead of getting them from a CDN. WebJars is a useful utility for obtaining all sorts of client-side libraries with Java build tools such as Maven or Gradle.\nFor Bootstrap, the following dependency must be added to the Maven pom.xml file:\n1 2 3 4 5 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.webjars\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;bootstrap\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;4.4.1\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; The Bootstrap CSS can then be included within the base.html template like so:\n1 2 3 4 5 6 7 ... \u0026lt;head\u0026gt; ... 
\u0026lt;link rel=\u0026#34;stylesheet\u0026#34; href=\u0026#34;/webjars/bootstrap/4.4.1/css/bootstrap.min.css\u0026#34;\u0026gt; ... \u0026lt;/head\u0026gt; ... This is all that’s needed in order to use Bootstrap via WebJars. Note this will work on the JVM and also with a native binary via GraalVM: WebJars resources are located under META-INF/resources, and Quarkus automatically adds all resources from there when building a native image.\nWrap Up This concludes my quick tour through server-side web applications with Quarkus and its new Qute extension. Where only web applications based on REST APIs called by client-side web applications were supported before, Qute is a great addition to the list of Quarkus extensions, allowing you to choose different architecture styles based on your needs and preferences.\nNote that Qute currently is in \u0026#34;Experimental\u0026#34; state, i.e. it’s a great time to give it a try and share your feedback, but be prepared for possible immaturities and potential changes down the road. E.g. I noticed that complex boolean expressions in template conditions aren’t supported yet. Also it would be great to get build-time feedback upon invalid variable references in templates.\nTo learn more, refer to the Qute guide and its reference documentation. 
You can find the complete source code of the Todo example including instructions for building and running in this GitHub repo.\n","id":81,"publicationdate":"Jan 3, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOne of the long-awaited features in Quarkus was support for server-side templating:\nuntil recently, Quarkus supported only client-side web frameworks which obtain their data by calling a REST API on the backend.\nThis has changed with \u003ca href=\"https://quarkus.io/blog/quarkus-1-1-0-final-released/\"\u003eQuarkus 1.1\u003c/a\u003e: it comes with a brand-new template engine named \u003ca href=\"https://quarkus.io/guides/qute\"\u003eQute\u003c/a\u003e,\nwhich allows building web applications using server-side templates.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Quarkus Qute – A Test Ride","uri":"https://www.morling.dev/blog/quarkus-qute-test-ride/"},{"content":" As a software engineer, I like to automate tedious tasks as much as possible. The deployment of this website is no exception: it is built using the Hugo static site generator and hosted on GitHub Pages; so wouldn’t it be nice if the rendered website would automatically be published whenever an update is pushed to its source code repository?\nWith the advent of GitHub Actions, tasks like this can easily be implemented without having to rely on any external CI service. Instead, many ready-made actions can be obtained from the GitHub marketplace and easily be configured as per our needs. E.g. triggered by a push to a specified branch in a GitHub repository, they can execute tasks like project builds, tests and many others, running in virtual machines based on Linux, Windows and even macOS. 
So let’s see what’s needed for building a Hugo website and deploying it to GitHub Pages.\nGitHub Actions To the Rescue Using my favourite search engine, I came across two GitHub actions which do everything we need:\nGitHub Actions for Hugo\nGitHub Actions for GitHub Pages\nThere are multiple alternatives for GitHub Pages deployment. I chose this one basically because it seems to be the most popular one (as per number of GitHub stars), and because it’s by the same author as the Hugo one, so they should nicely play together.\nRegistering a Deploy Key In order for the GitHub action to deploy the website, a GitHub deploy key must be registered.\nTo do so, create a new SSH key pair on your machine like so:\nssh-keygen -t rsa -b 4096 -C \u0026#34;$(git config user.email)\u0026#34; -f gh-pages -N \u0026#34;\u0026#34; This will create two files, the public key (gh-pages.pub) and the private key (gh-pages). Go to https://github.com/\u0026lt;your-user-or-organisation\u0026gt;/\u0026lt;your-repo\u0026gt;/settings/keys and click \u0026#34;Add deploy key\u0026#34;. Paste in the public part of your key pair and check the \u0026#34;Allow write access\u0026#34; box.\nNow go to https://github.com/\u0026lt;your-user-or-organisation\u0026gt;/\u0026lt;your-repo\u0026gt;/settings/secrets and click \u0026#34;Add new secret\u0026#34;. Choose ACTIONS_DEPLOY_KEY as the name and paste the private part of your key pair into the \u0026#34;Value\u0026#34; field.\nThe key will be stored in an encrypted way as per GitHub’s documentation. Nevertheless, I’d recommend using a specific key pair just for this purpose, instead of re-using any other key pair. That way, impact will be reduced to this particular usage, should the private key get leaked somehow.\nDefining the Workflow With the key in place, it’s time to set up the actual GitHub Actions workflow. This is simply done by creating the file .github/workflows/gh-pages-deployment.yml in your repository with the following contents. 
GitHub Actions workflows are YAML files, because YOLO ;)\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 name: GitHub Pages on: (1) push: branches: - master jobs: build-deploy: runs-on: ubuntu-18.04 steps: - uses: actions/checkout@v1 (2) with: submodules: true - name: Install Ruby Dev (3) run: sudo apt-get install ruby-dev - name: Install AsciiDoctor and Rouge run: sudo gem install asciidoctor rouge - name: Setup Hugo (4) uses: peaceiris/actions-hugo@v2 with: hugo-version: \u0026#39;0.62.0\u0026#39; - name: Build (5) run: hugo - name: Deploy (6) uses: peaceiris/actions-gh-pages@v2 env: ACTIONS_DEPLOY_KEY: ${{ secrets.ACTIONS_DEPLOY_KEY }} PUBLISH_BRANCH: gh-pages PUBLISH_DIR: ./public 1 Run this action whenever changes are pushed to the master branch 2 The first step in the job: check out the source code 3 Install AsciiDoctor (in case you use Hugo with AsciiDoc files, like I do) and Rouge, a Ruby gem for syntax highlighting; I’m installing the gems instead of Ubuntu packages in order to get current versions 4 Set up Hugo via the aforementioned GitHub Actions for Hugo 5 Run the hugo command; here you could add parameters such as -F for also building future posts 6 Deploy the website to GitHub pages; the contents of Hugo’s build directory public will be pushed to the gh-pages branch of the upstream repository, using the deploy key configured before And that’s all we need; once the file is committed and pushed to the upstream repository, the deployment workflow will be executed upon each push to the master branch.\nYou can find the complete workflow definition used for publishing this website here. 
Also check out the documentation of GitHub Actions for Hugo and GitHub Actions for GitHub Pages to learn more about their capabilities and the options they offer.\n","id":82,"publicationdate":"Dec 26, 2019","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs a software engineer, I like to automate tedious tasks as much as possible.\nThe deployment of this website is no exception:\nit is built using the \u003ca href=\"https://gohugo.io/\"\u003eHugo\u003c/a\u003e static site generator and hosted on \u003ca href=\"https://pages.github.com/\"\u003eGitHub Pages\u003c/a\u003e;\nso wouldn’t it be nice if the rendered website would automatically be published whenever an update is pushed to its source code repository?\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Automatically Deploying a Hugo Website via GitHub Actions","uri":"https://www.morling.dev/blog/automatically-deploying-hugo-website-via-github-actions/"},{"content":" It has been quite a while since the last post on my old personal blog; since then, I’ve mostly focused on writing about my day-work on the Debezium blog as well as some posts about more general technical topics on the Hibernate team blog.\nNow recently I had some ideas for things I wanted to write about, which didn’t feel like a good fit for either of those two. So it was time to re-boot a personal blog. The previous Blogger-based one really, really feels outdated by now. Plus, I also wanted to have more control over how things work, and also be able to publish a list of projects I work on, conference talks I gave etc. So I decided to build the site using Hugo, a static site generator, and also use a nice new shiny dev domain. And here we are, welcome to morling.dev!\nStay tuned for more posts every now and then about anything related to open source, the projects I work on and software engineering in general. 
Onwards!\n","id":83,"publicationdate":"Dec 26, 2019","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt has been quite a while since the last post on my old \u003ca href=\"http://musingsofaprogrammingaddict.blogspot.com/\"\u003epersonal blog\u003c/a\u003e;\nsince then, I’ve mostly focused on writing about my day-work on the \u003ca href=\"https://debezium.io/blog/\"\u003eDebezium blog\u003c/a\u003e as well as \u003ca href=\"https://in.relation.to/gunnar-morling/\"\u003esome posts\u003c/a\u003e about more general technical topics on the Hibernate team blog.\u003c/p\u003e\n\u003c/div\u003e","tags":null,"title":"Time for a New Blog","uri":"https://www.morling.dev/blog/time-for-new-blog/"},{"content":"","id":84,"publicationdate":"Jan 1, 0001","section":"categories","summary":"","tags":null,"title":"Categories","uri":"https://www.morling.dev/categories/"}]