[{"content":"","id":0,"publicationdate":"May 31, 2026","section":"blog","summary":"","tags":null,"title":"Blogs","uri":"https://www.morling.dev/blog/"},{"content":"","id":1,"publicationdate":"May 31, 2026","section":"","summary":"","tags":null,"title":"Gunnar Morling","uri":"https://www.morling.dev/"},{"content":"","id":2,"publicationdate":"May 31, 2026","section":"tags","summary":"","tags":null,"title":"hardwood","uri":"https://www.morling.dev/tags/hardwood/"},{"content":" Table of Contents Reworked ColumnReader API Geospatial Support Documentation Overhaul Further Fixes and Improvements I am happy to announce the release of Hardwood 1.0.0.CR1!\nThis first candidate release of Hardwood 1.0 brings a substantially improved API for columnar access to Apache Parquet files, initial support for Parquet’s GEOMETRY/GEOGRAPHY column types, and many other improvements to the core library as well as the Hardwood CLI.\nReworked ColumnReader API Hardwood provides two APIs for parsing Parquet files:\nThe RowReader API provides row-oriented access to Parquet records, including nested structs, lists, and maps. Optimized for ergonomics and ease of use, it is the recommended general-purpose API for reading arbitrarily complex structured records one by one\nThe ColumnReader API offers batch-style access to the columnar data of a Parquet file; it is optimized for throughput and the preferred choice for analytical workloads that operate on large numbers of values\nFor the 1.0.0.CR1 release, we’ve reworked the columnar API to close some gaps around the retrieval of optional and repeatable columns and make the API less error-prone to use. Taking inspiration from Apache Arrow’s columnar format for nested data, we introduced a new type, Validity, to model nullability across both flat and nested data. Let’s take a look at some examples. First, here’s how to sum all the values from a flat (i.e. 
non-nested and non-repeatable) column:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ParquetFileReader reader = ...; try (ColumnReader fare = reader.columnReader(\u0026#34;fare_amount\u0026#34;)) { (1) double sum = 0; while (fare.nextBatch()) { int count = fare.getValueCount(); double[] values = fare.getDoubles(); (2) Validity validity = fare.getLeafValidity(); boolean hasNulls = validity.hasNulls(); (3) for (int i = 0; i \u0026lt; count; i++) { (4) if (!hasNulls || validity.isNotNull(i)) { sum += values[i]; } } } } 1 Create a column reader by name (spans all row groups automatically) 2 Get the values from the current batch as double 3 Hoisting the hasNulls() check outside the loop increases throughput if most batches don’t have nulls 4 Process the values from the current batch When reading multiple columns from a file, you can obtain a ColumnReaders object which lets you drive the readers in lockstep:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ParquetFileReader reader = ...; long passengerCount = 0; double tripDistance = 0, fareAmount = 0; ColumnReaders columns = reader.buildColumnReaders( ColumnProjection.columns( \u0026#34;passenger_count\u0026#34;, \u0026#34;trip_distance\u0026#34;, \u0026#34;fare_amount\u0026#34;)) .build(); while (columns.nextBatch()) { int count = columns.getRecordCount(); long[] v0 = columns.getColumnReader(\u0026#34;passenger_count\u0026#34;).getLongs(); double[] v1 = columns.getColumnReader(\u0026#34;trip_distance\u0026#34;).getDoubles(); double[] v2 = columns.getColumnReader(\u0026#34;fare_amount\u0026#34;).getDoubles(); for (int i = 0; i \u0026lt; count; i++) { passengerCount += v0[i]; tripDistance += v1[i]; fareAmount += v2[i]; } } Parquet also allows for repeatable columns (i.e. lists) and even nested repeatable columns (i.e. lists of lists). 
The column reader API captures the nullability of these structures through a notion of layers — telling you which elements at each level of nesting are null, say an outermost list, a list nested within it, or a leaf value inside a list. Here is an example of a dataset which contains multiple temperature measurements per day, and we’d like to calculate the mean daily maximum:\n// temperature_samples is a list\u0026lt;double\u0026gt; — each record holds one day\u0026#39;s // readings. Mean daily maximum: the average, across days, of each day\u0026#39;s // hottest reading. ColumnReader col = reader.columnReader( \u0026#34;temperature_samples.list.element\u0026#34;); (1) double sumOfMaxima = 0; long days = 0; while (col.nextBatch()) { int records = col.getRecordCount(); double[] readings = col.getDoubles(); int[] offsets = col.getLayerOffsets(0); (2) Validity present = col.getLayerValidity(0); (3) Validity valid = col.getLeafValidity(); (4) for (int r = 0; r \u0026lt; records; r++) { if (present.isNull(r)) continue; (5) double dailyMax = Double.NEGATIVE_INFINITY; for (int i = offsets[r]; i \u0026lt; offsets[r + 1]; i++) { (6) if (valid.isNull(i)) continue; if (readings[i] \u0026gt; dailyMax) dailyMax = readings[i]; } if (dailyMax != Double.NEGATIVE_INFINITY) { (7) sumOfMaxima += dailyMax; days++; } } } System.out.printf(\u0026#34;Mean daily maximum: %.1f °C over %d days%n\u0026#34;, sumOfMaxima / days, days); 1 Open a column reader on the list’s leaf, element 2 Per-list boundaries: record r’s readings run from offsets[r] up to (excluding) offsets[r + 1] 3 List-level validity — which records actually logged a day, versus a null list 4 Leaf-level validity — which individual readings within those lists are non-null 5 Skip null lists; the layer model keeps this per-record check separate from element nulls 6 Reduce each list within its own span — a per-list max can’t be recovered from one 
flat array of every reading 7 Left at -inf by an empty list (or one whose readings were all null), so those days don’t count Handing values back as contiguous primitive arrays plus a set-bit-means-present validity bitmap is also exactly the shape vectorized processing wants: callers can run branch-free, data-parallel loops over the values (e.g. with Java’s Vector API). The how-to guide covers the API in depth: multi-level nesting, efficient retrieval of repeatable String values, working effectively with sparse columns, and more.\nNote that the ColumnReader API is marked experimental in Hardwood 1.0: there may be changes—​potentially backwards-incompatible ones—​in response to the feedback we receive. We’re planning to promote the API to stable in a future Hardwood 1.x version.\nGeospatial Support Via its GEOMETRY and GEOGRAPHY logical types, Apache Parquet allows you to store geospatial data using the Well-Known Binary (WKB) serialization. Both column types are supported by Hardwood as of this release, and their geospatial statistics drive predicate push-down to the row group level. Geospatial data is currently exposed as raw byte arrays, so you can use a geometry library of your choice (e.g. JTS) for decoding:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 ParquetFileReader fileReader = ...; FilterPredicate filter = FilterPredicate.intersects( \u0026#34;location\u0026#34;, -25.0, 35.0, 45.0, 72.0); (1) WKBReader wkbReader = new WKBReader(); (2) RowReader rowReader = fileReader.buildRowReader().filter(filter).build(); while (rowReader.hasNext()) { rowReader.next(); byte[] wkb = rowReader.getBinary(\u0026#34;location\u0026#34;); Geometry geom = wkbReader.read(wkb); (3) } 1 Filter records by intersecting with the given bounding box; note this applies at the row group level, i.e. 
a record will be returned if there’s at least one match in the same row group 2 Decode the raw bytes with any WKB library — here JTS\u0026#39; WKBReader 3 The decoded Geometry is ready to inspect, intersect, etc. Documentation Overhaul As we’re approaching the Hardwood 1.0 release, we’ve also spent some time improving and completing the project documentation. We’re big fans of the Diátaxis approach for structuring technical documentation, which proposes to organize docs in four distinct categories: tutorials (learning-oriented), how-to guides (goal-oriented), reference (information-oriented), and explanation (understanding-oriented). The docs on hardwood.dev have been restructured and built out based on this framework:\nWe hope that Hardwood users will find it much easier now to get started with the library, solve specific tasks such as reading files on S3, or learn more about Hardwood’s concurrency model. Any feedback on the new documentation structure is more than welcome!\nFurther Fixes and Improvements In addition, Hardwood 1.0.0.CR1 contains a number of other changes:\nLocal files of arbitrary size can be parsed, as long as individual column chunks don’t exceed 2 GB; for remote files (e.g. 
on S3), the 2 GB total file size limit remains in place\nHardwood now supports Parquet’s FLOAT16 column type\nThe RowReader value model gained more ergonomic accessors: by-index field access on PqStruct, key-based lookup and typed accessors on PqMap, typed List accessors on PqList, and additional Variant accessors\nMulti-column filter expressions are now evaluated more efficiently by pushing as much work as possible to individual page decoder threads\nTo distribute work across cluster engines such as Apache Flink, Hardwood now supports split-aware reading via RowGroupPredicate.byteRange(…​), allowing the row groups of a file to be processed by multiple worker instances\nExhaustive logical-type formatting in the Hardwood CLI; faster navigation of large collections and corrected \u0026#34;go to latest\u0026#34; in the data preview of hardwood dive\nOverall, 50 issues were resolved for Hardwood 1.0.0.CR1; see the release notes and the GitHub milestone for the complete list. The Hardwood library artifacts (hardwood-core, hardwood-s3, etc.) are available on Maven Central, while platform-specific native binaries for the Hardwood CLI can be downloaded from the 1.0.0.CR1 release page.\nAs always, a massive shout-out to everyone who contributed to this release: Carlos Sousa, Fawzi Essam, Manish, Mohamed Ibrahim Elsawy, Muhannd Sayed, polo, Prashant Khanal, Rion Williams, and Said Boudjelda!\nWith 1.0.0.CR1 out the door, we’re on the home stretch to Hardwood 1.0 Final, which should ship in a week or so. 
After that, we’ll begin work on writing Parquet files, slated for Hardwood 1.1 in early summer.\n","id":3,"publicationdate":"May 31, 2026","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_reworked_columnreader_api\"\u003eReworked ColumnReader API\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_geospatial_support\"\u003eGeospatial Support\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_documentation_overhaul\"\u003eDocumentation Overhaul\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_further_fixes_and_improvements\"\u003eFurther Fixes and Improvements\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI am happy to announce the release of \u003ca href=\"https://hardwood.dev/1.0.0.CR1/\"\u003eHardwood 1.0.0.CR1\u003c/a\u003e!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis first candidate release of Hardwood 1.0 brings a substantially improved API for columnar access to Apache Parquet files,\ninitial support for Parquet’s \u003ccode\u003eGEOMETRY\u003c/code\u003e/\u003ccode\u003eGEOGRAPHY\u003c/code\u003e column types, and many other improvements to the core library as well as the Hardwood CLI.\u003c/p\u003e\n\u003c/div\u003e","tags":["parquet","open-source","java","performance","hardwood"],"title":"Improved Column Reader API, First Cut of Geospatial Support: Hardwood 1.0.0.CR1 Is Available","uri":"https://www.morling.dev/blog/improved-column-reader-api-geospatial-support-hardwood-1-0-0-cr1-available/"},{"content":"","id":4,"publicationdate":"May 31, 2026","section":"tags","summary":"","tags":null,"title":"java","uri":"https://www.morling.dev/tags/java/"},{"content":"","id":5,"publicationdate":"May 31, 
2026","section":"tags","summary":"","tags":null,"title":"open-source","uri":"https://www.morling.dev/tags/open-source/"},{"content":"","id":6,"publicationdate":"May 31, 2026","section":"tags","summary":"","tags":null,"title":"parquet","uri":"https://www.morling.dev/tags/parquet/"},{"content":"","id":7,"publicationdate":"May 31, 2026","section":"tags","summary":"","tags":null,"title":"performance","uri":"https://www.morling.dev/tags/performance/"},{"content":"","id":8,"publicationdate":"May 31, 2026","section":"tags","summary":"","tags":null,"title":"Tags","uri":"https://www.morling.dev/tags/"},{"content":" I had the pleasure to do a few podcasts and interviews, e.g. talking about Debezium, change data capture, stream processing, my career, and software engineering in general.\nInterview with InfoQ: Chasing Efficient Java Development: From 1BRC to Developing Hardwood AI Natively\nConfluent Developer Podcast: Ep. 23 - The 1 Billion Row Challenge with Gunnar Morling\nUnapologetically Technical with Jesse Anderson: Ep.9 Gunnar Morling - One Billion Row Challenge\nInterview with InfoQ: The One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds\nCoffee + Software with Josh Long: Gunnar Morling on the 1BRC\nHappy Path Programming, by Bruce Eckel \u0026amp; James Ward: Ep. #93: Nerd Sniping via the 1B Row Challenge with Gunnar Morling\nDeveloper Voices: Debezium - Capturing Data the Instant it Happens (with Gunnar Morling)\nThe Geek Narrator: Decoding Decodable with Gunnar Morling\nThe Data Stack Show: All About Debezium and Change Data Capture With Gunnar Morling of Decodable\nReal-Time Analytics Podcast: Mr. Debezium on Pinot, Flink, CDC \u0026amp; Decodable | Ep. 
4: Gunnar Morling\nSaaS Developer Community: The Wonders of Postgres Logical Decoding Messages\nThe Modern Data Show: S02 E03: Innovating the Modern Data Stack: Change Data Capture and Beyond with Gunnar Morling\nThe Geek Narrator: Change Data Capture and Debezium with Gunnar Morling\nA Bootiful Podcast: Java Champion Gunnar Morling about messaging middleware, Debezium, change data capture, and more.\nTrino Community Broadcast: Episode #25: Trino Going Through Changes; together with Ashhar Hasan and Ayush Chauhan\nThe InfoQ Podcast, with Wes Reisz: Gunnar Morling on Change Data Capture and Debezium\nData Engineering Podcast by Tobias Macey: Episode 114 — Change Data Capture For All Of Your Databases With Debezium; together with Randall Hauch\nAdam Bien’s airhacks.fm podcast: Episode 39 — Use the Most Productive Stack You Can Get\nAdam Bien’s airhacks.fm podcast: Episode 57 — CDC, Debezium, streaming and Apache Kafka\nStreaming Audio: a Confluent podcast about Apache Kafka: Change Data Capture with Debezium ft. Gunnar Morling\nInterview with Thorben Janssen for heise.de (German): Im Gespräch: Gunnar Morling über Debezium und CDC\nThoughts On Java: Interview with Gunnar Morling\n","id":9,"publicationdate":"May 26, 2026","section":"","summary":"I had the pleasure to do a few podcasts and interviews, e.g. talking about Debezium, change data capture, stream processing, my career, and software engineering in general.\nInterview with InfoQ: Chasing Efficient Java Development: From 1BRC to Developing Hardwood AI Natively\nConfluent Developer Podcast: Ep. 
23 - The 1 Billion Row Challenge with Gunnar Morling\nUnapologetically Technical with Jesse Anderson: Ep.9 Gunnar Morling - One Billion Row Challenge\nInterview with InfoQ: The One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds","tags":null,"title":"Podcasts and Interviews","uri":"https://www.morling.dev/podcasts/"},{"content":" Table of Contents 2026 2025 2024 2023 2022 2021 2020 2019 2018 2017 2016 2013 This page gives an overview over some talks I have done over the last years. I have spoken at large conferences such as QCon San Francisco, Devoxx and JavaOne, local meet-ups as well as company-internal events, covering topics such as Debezium and Change Data Capture, Bean Validation, NoSQL and more.\nIf you’d like to have me as a speaker at your conference or meet-up, please get in touch.\n2026 Java User Group Hamburg: Ins and Outs of the Outbox Pattern\nCurrent (London): Hardwood—​Building a Parquet Parser From Scratch (With a Little Help From AI)\nCurrent (London): Transactional Change Stream Processing With Debezium and Apache Flink\n2025 JavaZone (Oslo): Mastering Postgres Replication Slots: Preventing WAL Bloat and Other Production Issues\nJavaZone (Oslo): Ins and Outs of the Outbox Pattern\nData Streaming World Tour (Vienna): Debezium and Apache Flink for Real-Time CDC Stream Processing\nKafka Meetup (Warsaw): Ins and Outs of the Outbox Pattern\nCurrent (London): Ins and Outs of the Outbox Pattern\njPrime (Sofia): Ins and Outs of the Outbox Pattern\nCurrent (Bengaluru, India): Ins and Outs of the Outbox Pattern\n2024 Flink Forward Asia (Jakarta, Indonesia): Streaming Data Contracts With Debezium and Apache Flink\nBig Data Europe (Vilnius, Lithuania): Data Contracts In Practice With Debezium and Apache Flink\nBig Data Europe (Vilnius, Lithuania; panel discussion): Building Effective Data Teams: Strategies for Success\nJ-Fall (Ede, Netherlands): 1BRC–-Nerd Sniping the Java Community\nP99 Conf (online): 1BRC–-Nerd Sniping 
the Java Community\nFlink Forward (Berlin; panel discussion): AI and Apache Flink: Expert Panel\nDevoxx Belgium (Antwerp; joint talk together with Roy van Rijn): 1BRC–-Nerd Sniping the Java Community\nDevoxx Belgium (Antwerp; joint hands-on lab with Hans-Peter Grahsl): Putting AI Into Real-time ETL with Apache Flink, Debezium, and LangChain4j\nInfoQ DevSummit (Munich): 1BRC–-Nerd Sniping the Java Community\nCurrent (Austin): Data Contracts In Practice With Debezium and Apache Flink\nJavaZone (Oslo): 1BRC—​Nerd Sniping the Java Community\nData Berlin Midsummer Meetup: From Postgres to OpenSearch in No Time\nJCon OpenBlend Slovenia: 1BRC—​Nerd Sniping the Java Community, Syncing your Database To OpenSearch In Real-Time\nJavaDay Istanbul: Syncing your Database To OpenSearch In Real-Time\nIndia Open Source Data Infrastructure Meetup (Bengaluru): From Postgres to OpenSearch in No Time\nKafka Summit (Bengaluru): Debezium Snapshots Revisited!\nKafka Summit (London): Data Contracts In Practice With Debezium and Apache Flink\nJavaLand (Nürburgring): 1BRC—​Nerd Sniping the Java Community\n2023 Open Source Data Infrastructure Meetup (Berlin): From Postgres to OpenSearch in No Time\nFlink Forward (Seattle): Debezium Snapshots Revisited!\nCurrent (San José): Debezium Snapshots Revisited!\nJavaZone (Oslo): Real-time Change Stream Processing with Apache Flink\nKafka Summit (London): Taming Kafka Connect with kcctl\nData Council (Austin): Change Data Streaming Patterns With Debezium \u0026amp; Apache Flink\nQCon London: Change Data Capture for Microservices; I also was the host for the Building Modern Backends track\njProfessionals (Sofia): \u0026#34;Change Stream Processing with Debezium and Apache Flink\u0026#34;\n2022 Devoxx (Antwerp): Taming Kafka Connect with kcctl\nDevoxx (Antwerp): Keep Your Cache Always Fresh with Debezium!\nCurrent (Austin): Keep Your Cache Always Fresh with Debezium!\ncode.talks (Hamburg): Change Data Streaming Patterns für Verteilte Systeme\nUptime 
(Amsterdam): Keep Your Cache Always Fresh with Debezium!\nJava User Group Hamburg: Mit Java-18-APIs zum Mond und weiter\nJBCONConf (Barcelona): Keep Your Cache Always Fresh with Debezium!\nKafka Summit London: Keep Your Cache Always Fresh with Debezium! (recording, slides)\nCarnegie Mellon University \u0026#34;Vaccination Database Tech Talks\u0026#34; (online): Open-source Change Data Capture With Debezium (recording, slides)\njChampionsConference (online): Continuous Performance Regression Testing with JfrUnit (recording, slides)\n2021 DevNation Tech Talk (online): To the Moon and Beyond With Java 17 APIs!\nRed Hat Summit Connect Developer Experience (online, German): Change Data Streaming Patterns für Verteilte Systeme\nFlink Forward (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems\nVoxxed Days Romania (joint presentation with Hans-Peter Grahsl; online): Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB\nP99 Conf (online): Continuous Performance Regression Testing with JfrUnit\nAccento (online): To the Moon and Beyond With Java 17 APIs!; Panel The Present and Future of Java (17)\nHeise betterCode() Java (online): Mit Java-17-APIs zum Mond und weiter\nKafka Summit Americas (joint presentation with Hans-Peter Grahsl; online): Dissecting our Legacy: The Strangler Fig Pattern with Debezium, Apache Kafka \u0026amp; MongoDB\nApache Pinot Meetup (joint presentation with Kenny Bastani; online): Analyzing Real-time Order Deliveries using CDC with Debezium and Pinot\nMongoDB World (joint presentation with Hans-Peter Grahsl; online): Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB\njLove (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems\nBerlin Buzzwords (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems (recording, 
slides)\nThe Developer’s Conference (joint presentation with Hans-Peter Grahsl; online): Change Data Streaming Patterns in Distributed Systems (slides)\nKafka Summit Europe (joint presentation with Hans-Peter Grahsl; online): Advanced Change Data Streaming Patterns in Distributed Systems (recording, slides)\nDevNation Tech Talk (online): Continuous performance regression testing with JfrUnit (recording, slides)\nBordeaux JUG (joint presentation with Katja Aresti; online): Don’t fear outdated caches — change data capture to the rescue! Let’s discover Infinispan and Debezium\njChampionsConference (joint presentation with Andres Almiray; online): Plug-in Architectures for Java with Layrry \u0026amp; the Java Module System\n2020 JokerConf (online): Change data capture pipelines with Debezium and Kafka Streams\nVirtual JUG (joint presentation with Andres Almiray; online): Plug-in Architectures With Layrry and the Java Module System (recording, slides)\nQConPlus (online): Serverless Search for My Blog With Java, Quarkus, \u0026amp; AWS Lambda\nJFall (joint presentation with Andres Almiray; online): Plug-in Architectures for Java With Layrry and the Java Module System\nJava Day Istanbul (online): Change Data Streaming Use Cases With Apache Kafka \u0026amp; Debezium\nGreat International Developer Summit (online): Change Data Capture Pipelines with Debezium and Kafka Streams\nKafka Summit (online): Change Data Capture Pipelines with Debezium and Kafka Streams\nRed Hat Summit Virtual Experience: Data integration patterns for microservices with Debezium and Apache Kafka\n2019 Nordic Coding, Kiel: Quarkus - Supersonic Subatomic Java\nJava User Group Paderborn: Change Data Streaming Use Cases mit Debezium und Apache Kafka\nQCon San Francisco: Practical Change Data Streaming Use Cases With Apache Kafka \u0026amp; Debezium\nJokerConf, St. 
Petersburg: Practical change data streaming use cases with Apache Kafka and Debezium\nJavaZone, Oslo: Change Data Streaming For Microservices With Apache Kafka and Debezium\nMicroXchg, Berlin: Change Data Streaming Patterns For Microservices With Debezium\nJavaLand, Brühl\nChange Data Streaming für Microservices mit Debezium\nDas Annotation Processing API - Use Cases und Best Practices\nRivieraDev, Sophia Antipolis: Practical Change Data Streaming Use Cases With Apache Kafka and Debezium\nKafka Summit London: Change Data Streaming Patterns For Microservices With Debezium\nRed Hat Summit, Boston\nBridging microservice boundaries with Apache Kafka and Debezium (hands-on lab)\nChange data streaming patterns for microservices with Debezium\nRed Hat Modern Integration and Application Development Day, Milano: Data Strategies for Microservices: Change Data Capture with Debezium\n2018 Devoxx Morocco, Marrakesh\nChange Data Streaming Patterns for Microservices With Debezium\nMap me if you can! Painless bean mappings with MapStruct\nKafka Summit San Francisco: Change Data Streaming Patterns for Microservices With Debezium\nVoxxedDays Microservices Paris: Data Streaming for Microservices using Debezium\nJUG Saxony Day, Dresden: Streaming von Datenbankänderungen mit Debezium\nJava User Group Darmstadt: Streaming von Datenbankänderungen mit Debezium\nJavaLand, Brühl: Hibernate - State of the Union; Migrating to Java 9 Modules with ModiTect\nRivieraDev, Sophia Antipolis: Data Streaming for Microservices using Debezium\nRed Hat Summit, San Francisco: Running data-streaming applications with Kafka on OpenShift (hands-on lab)\nJava User Group Münster, Streaming von Datenbankänderungen mit Debezium\n2017 JavaZone, Oslo: Keeping Your Data Sane with Bean Validation 2.0\ncode.talks, Hamburg: Neues in Bean Validation 2.0 - Support für Java 8 und mehr (recording)\nJavaOne, San Francisco\nKeeping Your Data Sane with Bean Validation 2.0\nNoSQL? 
Have it Your Way!\nDevoxx Belgium, Antwerp\nStreaming Database Changes with Debezium\nShort talks on Bean Validation 2.0 and MapStruct\njdk.io, Copenhagen: Keeping Your Data Sane with Bean Validation 2.0\nRivieraDev, Sophia Antipolis: Keeping Your Data Sane with Bean Validation 2.0\nJavaLand, Brühl\nBean Validation 2.0\nHibernate Search and Elasticsearch\n2016 JavaZone, Oslo: From Hibernate to Elasticsearch in no time\n2013 Berlin Expert Days: Bean Validation 1.1 - Whats Cooking? (slides)\n","id":10,"publicationdate":"May 21, 2026","section":"","summary":"Table of Contents 2026 2025 2024 2023 2022 2021 2020 2019 2018 2017 2016 2013 This page gives an overview over some talks I have done over the last years. I have spoken at large conferences such as QCon San Francisco, Devoxx and JavaOne, local meet-ups as well as company-internal events, covering topics such as Debezium and Change Data Capture, Bean Validation, NoSQL and more.\nIf you’d like to have me as a speaker at your conference or meet-up, please get in touch.","tags":null,"title":"Conferences","uri":"https://www.morling.dev/conferences/"},{"content":" Table of Contents VARIANT Support Hardwood CLI TUI Unified Reader API Performance Improvements Wrapping Up I am happy to announce the release of Hardwood 1.0.0.Beta2!\nThe latest version of this new parser for Apache Parquet comes with support for VARIANT columns, an interactive text-based UI (TUI) for examining and analysing the structure of Parquet files, significantly improved performance, more efficient reading of files from object storage, and much more.\nVARIANT Support Parquet’s VARIANT logical type lets you store semi-structured, JSON-like data in a self-describing binary encoding. 
Physically it is a group of two required BYTE_ARRAY children, metadata and value, whose bytes together define a variant value with its own type tag (object, array, string, int, etc.).\nVARIANT columns come in handy to store dynamically shaped data, such as entity-attribute-value (EAV) data models. They are also useful to model data with varying types, for instance a \u0026#34;measurements\u0026#34; column which contains both long and double values. Hardwood surfaces variant values through the new PqVariant API:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 // attributes.parquet — an entity-attribute-value table: // // id BIGINT -- entity being described // name STRING -- which attribute // value VARIANT -- the attribute\u0026#39;s value, shape depends on `name` // // id name value // (1, \u0026#34;age\u0026#34;, 42) - INT64 // (1, \u0026#34;email\u0026#34;, \u0026#34;ada@example.com\u0026#34;) - STRING // (1, \u0026#34;preferences\u0026#34;, { \u0026#34;theme\u0026#34;: \u0026#34;dark\u0026#34;, \u0026#34;opt_in\u0026#34;: true }) - OBJECT RowReader rows = file.rowReader(); while (rows.hasNext()) { rows.next(); long id = rows.getLong(\u0026#34;id\u0026#34;); String name = rows.getString(\u0026#34;name\u0026#34;); PqVariant v = rows.getVariant(\u0026#34;value\u0026#34;); (1) String rendered = switch (v.type()) { (2) case INT8, INT16, INT32, INT64 -\u0026gt; Long.toString(v.asLong()); case STRING -\u0026gt; v.asString(); case OBJECT -\u0026gt; v.asObject().getString(\u0026#34;theme\u0026#34;); default -\u0026gt; \u0026#34;\u0026lt;\u0026#34; + v.type() + \u0026#34;\u0026gt;\u0026#34;; }; System.out.println(id + \u0026#34; \u0026#34; + name + \u0026#34; = \u0026#34; + rendered); // 1 age = 42 // 1 email = ada@example.com // 1 preferences = dark byte[] metadata = v.metadata(); (3) byte[] value = v.value(); } 1 dynamically typed variant value, shape varies per row 2 narrow to specific runtime type 3 access raw 
canonical bytes if needed The as*() methods (asInt(), asString(), asTimestamp(), etc.) let you extract primitives from a variant value. Via getObject() and getArray() you can navigate to nested variant objects and arrays, respectively.\nHardwood also supports the retrieval of shredded variants: Some writers store part of the payload in a typed sibling column (typed_value) alongside value for better compression and pushdown. Reassembly is transparent: access is exactly the same as for non-shredded variants, and metadata() and value() return canonical bytes regardless of whether the file was shredded, so PqVariant consumers see a single consistent representation. Note that predicate push-down and path projections are not aware of shredding yet; this optimization is tracked as #309.\nHardwood CLI TUI The Hardwood CLI now has a new command dive which lets you interactively explore and analyse Parquet files through a text-based UI (TUI). It complements the existing non-interactive commands such as inspect, schema, and convert, which continue to be available e.g. for scripting and automation use cases. The TUI shows you file statistics and schemas; you can drill into row groups and column chunks, examine indexes and dictionaries, take a look at the parsed data, and much more.\nTo run the TUI, grab the Hardwood CLI native binary distribution for your platform from GitHub, then launch it via hardwood dive, specifying the name of the file to explore (either locally or on S3):\n1 hardwood dive -f s3://your-bucket/your-data.parquet See the following screen recording for some of the features of the Hardwood TUI:\nWhen examining a file on object storage, only the required sections are retrieved; the number of S3 requests and the downloaded data volume are shown in the title bar. 
A local off-heap cache ensures that each segment of a file is downloaded only once.\nThe current release is just the starting point for the TUI; we have quite a few ideas for expanding it into a complete Parquet diagnostics tool, e.g. showing raw page data, inspecting Bloom filters, and much more. Of course, your ideas and feature requests are welcome in the issue tracker, too.\nUnified Reader API As we added more capabilities to the core Hardwood row reader API (projections, filters, row limits, start offsets, etc.), more and more overloaded versions of the createRowReader() method accumulated. So we decided to rework this API. A reader for fetching all rows and all columns can now be obtained via rowReader(). Otherwise, a builder can be used to customize readers as needed:\nParquetFileReader fileReader = ParquetFileReader.open(\u0026lt;some file\u0026gt;); RowReader rowReader = fileReader.buildRowReader() .projection(ColumnProjection.columns(\u0026#34;id\u0026#34;, \u0026#34;name\u0026#34;)) (1) .filter(FilterPredicate.gt(\u0026#34;age\u0026#34;, 21)) (2) .firstRow(1_000_000) (3) .head(100) (4) .build(); 1 project only the id and name columns 2 only return those rows where age is greater than 21 3 start from row 1,000,000 4 return 100 rows Similarly, buildColumnReader() allows you to retrieve a customized columnar reader. The previous split of single-file and multi-file readers has been replaced with one unified reader abstraction.\nPerformance Improvements As part of this release, we’ve substantially reworked and optimized the core page fetching and decoding pipeline, yielding some nice performance gains. 
The pipeline applies per-column parallelism when fetching and decoding pages, and uses column filters, when set, to skip entire pages of non-matching values.\nOn a MacBook Pro M3 Max, the existing flat-file benchmark (processing 9.6 GB of NYC taxi ride data, aggregating three of 20 columns via the Hardwood row reader API) improved from ~2.7 sec to 2.2 sec. The nested-file benchmark (4.7 M rows from the Overture Maps dataset, via the column reader API) improved from ~1.4 sec to 0.7 sec [1]. Object allocations have been reduced, resulting in less GC pressure and thus more stable tail latencies.\n[1] Setting up a comprehensive benchmark suite, systematically testing Hardwood’s performance for a range of representative workloads and comparing it to other solutions including parquet-java, is very high on our roadmap; if you’d like to help with this task, please reach out.\nWhen reading files from S3, GET requests for file segments are scheduled much more efficiently than before. When applicable, requests are coalesced across column chunks, small columns are fetched in a single request, and fetched segments can be cached locally for repeated access.\nWrapping Up Other additions in this release include support for more Parquet logical types (INTERVAL, MAP/LIST, INT96), reproducible builds for the published Hardwood JARs, and a reorganized hardwood inspect CLI command with a more consistent subcommand layout (if you have scripts using inspect, take a look at the release notes for the migration). See the 1.0.0.Beta2 release notes and GitHub milestone for the complete list of closed issues.\nI am particularly excited about the growing number of people involved with this project. A big thank you to everyone contributing to this release: André Rouél, Brandon Brown, Bruno Borges, Fawzi Essam, Manish Ghildiyal, polo, Rion Williams, Sabarish Rajamohan, and Trevin Chow. 
If you’d like to start your own contribution journey, then check out the \u0026#34;good first issue\u0026#34; and \u0026#34;help wanted\u0026#34; labels in the issue tracker. If you want to discuss any ideas or have questions around the project, join the Hardwood Discussions on GitHub.\nA first candidate release for Hardwood 1.0 should be out next week, followed by the 1.0 Final release later in May, barring any unforeseen issues. Hardwood 1.1 with support for writing Parquet files should follow shortly thereafter.\n","id":11,"publicationdate":"Apr 29, 2026","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_variant_support\"\u003eVARIANT Support\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_hardwood_cli_tui\"\u003eHardwood CLI TUI\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_unified_reader_api\"\u003eUnified Reader API\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_performance_improvements\"\u003ePerformance Improvements\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrapping_up\"\u003eWrapping Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI am happy to announce the release of Hardwood 1.0.0.Beta2!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe latest version of this \u003ca href=\"https://hardwood.dev\"\u003enew parser for Apache Parquet\u003c/a\u003e comes with support for \u003ccode\u003eVARIANT\u003c/code\u003e columns,\nan interactive text-based UI (TUI) for examining and analysing the structure of Parquet files,\nsignificantly improved performance, more efficient reading of files from object storage, and much more.\u003c/p\u003e\n\u003c/div\u003e","tags":["parquet","open-source","java","performance","hardwood"],"title":"VARIANT Support, 
Interactive Parquet File TUI: Hardwood 1.0.0.Beta2 Is Out","uri":"https://www.morling.dev/blog/variant-support-interactive-parquet-file-tui-hardwood-1-0-0-beta2-is-out/"},{"content":" Table of Contents S3 Backend Predicate Push-Down Avro Bindings hardwood-cli Wrapping Up I am pleased to announce the release of Hardwood 1.0.0.Beta1!\nHardwood is a new parser for Apache Parquet, optimized for minimal dependencies and great performance. Since the project’s initial release just a few weeks back, a small yet very active community has come together and evolved Hardwood significantly. Today, we are shipping an S3 backend, allowing to parse files directly from object storage, predicate pushdown for both local and remote files, Avro bindings, a CLI for inspecting Parquet files, and much more. We’re also excited to launch a website for the project, hardwood.dev, which contains the documentation and API reference.\nLet’s dig in.\nS3 Backend Hardwood now allows you to parse files from Amazon S3, or any API-compatible object storage such as Cloudflare R2 or Google Cloud Storage. This means you can parse remote files directly, without having to download them first. Together with column projection and predicate push-down (see below), this can drastically reduce network IO if you only want to access a certain subset of your data, which is key when querying Parquet files in a data lake.\nLiving up to Hardwood’s premise of having a minimal dependency footprint, the S3 feature adds no mandatory dependencies, in particular avoiding pulling in heavy dependencies such as the AWS S3 SDK. Instead, Hardwood issues requests to the S3 REST API using Java’s built-in HTTP client; requests are signed using a custom implementation of the AWS SigV4 algorithm. 
Complete compatibility with the reference implementation is ensured by validating the signer against the complete suite of the official SigV4 test vectors.\nAuthentication is done via a simple callback API; in the simplest case, access key id and secret access key can be specified like so:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 S3Source source = S3Source.builder() .region(\u0026#34;us-east-1\u0026#34;) .credentials(S3Credentials.of(\u0026#34;AKIA...\u0026#34;, \u0026#34;secret\u0026#34;)) .build(); try (ParquetFileReader reader = ParquetFileReader.open( source.inputFile(\u0026#34;s3://my-bucket/data/trips.parquet\u0026#34;))) { try (RowReader rows = reader.createRowReader()) { while (rows.hasNext()) { rows.next(); long id = rows.getLong(\u0026#34;id\u0026#34;); } } } For dynamic or refreshable credentials, you can implement the S3CredentialsProvider functional interface:\n1 2 3 4 S3Source source = S3Source.builder() .region(\u0026#34;us-east-1\u0026#34;) .credentials(() -\u0026gt; fetchCredentialsFromVault()) .build(); If you’d like to use the full AWS credential chain (env vars, ~/.aws/credentials, EC2/ECS instance profile, SSO, web identity), you can do so by adding the optional hardwood-aws-auth module (which in turn relies on the software.amazon.awssdk:auth module from the official AWS SDK):\n1 2 3 4 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;dev.hardwood\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hardwood-aws-auth\u0026lt;/artifactId\u0026gt; \u0026lt;/dependency\u0026gt; 1 2 3 4 5 6 import dev.hardwood.aws.auth.SdkCredentialsProviders; S3Source source = S3Source.builder() .region(\u0026#34;us-east-1\u0026#34;) .credentials(SdkCredentialsProviders.defaultChain()) .build(); For S3-compatible services, timeout and retry configuration, and other options, see the S3 backend documentation.\nPredicate Push-Down When querying files on remote storage, it is essential to limit the amount of fetched data as much as possible, reducing network I/O and 
thus minimizing query times as well as any potential data transfer fees. For this purpose, Hardwood now also supports predicate push-down in addition to column projections. Parquet files can optionally contain statistics for row groups as well as for specific chunk pages. At the row-group level, entire row groups whose statistics prove no rows can match are skipped. Within matching row groups, the Column Index (per-page min/max statistics) is used to skip individual pages, avoiding unnecessary decompression and decoding.\nThe FilterPredicate API allows you to create filters based on the operators eq, notEq, lt, ltEq, gt, gtEq, in, isNull, and isNotNull.\n1 2 3 4 5 6 7 8 9 10 11 // Simple filter FilterPredicate filter = FilterPredicate.gt(\u0026#34;age\u0026#34;, 21); // IN filter FilterPredicate filter = FilterPredicate.in(\u0026#34;department_id\u0026#34;, 1, 3, 7); FilterPredicate filter = FilterPredicate.inStrings( \u0026#34;city\u0026#34;, \u0026#34;NYC\u0026#34;, \u0026#34;LA\u0026#34;, \u0026#34;Chicago\u0026#34;); // NULL checks FilterPredicate filter = FilterPredicate.isNull(\u0026#34;middle_name\u0026#34;); FilterPredicate filter = FilterPredicate.isNotNull(\u0026#34;email\u0026#34;); The logical operators and, or, and not can be used to combine basic filters:\n1 2 3 4 5 // Compound filter FilterPredicate filter = FilterPredicate.and( FilterPredicate.gtEq(\u0026#34;salary\u0026#34;, 50000L), FilterPredicate.lt(\u0026#34;age\u0026#34;, 65) ); Then, when obtaining a Parquet row or column reader, specify the filter predicate like so:\n1 2 3 4 5 6 7 8 9 10 try (ParquetFileReader fileReader = ParquetFileReader.open( InputFile.of(path)); RowReader rowReader = fileReader.createRowReader(filter)) { while (rowReader.hasNext()) { rowReader.next(); // Only rows from non-skipped row groups are returned } } The reference documentation discusses predicate push-down in full depth, for instance touching on how to use this together with column projections as well as on 
some limitations of the current implementation.\nAvro Bindings If your application already works with Avro records, for instance in a Kafka pipeline or Flink job, the new hardwood-avro module lets you read Parquet files directly into GenericRecord instances. Add it alongside hardwood-core:\n1 2 3 4 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;dev.hardwood\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hardwood-avro\u0026lt;/artifactId\u0026gt; \u0026lt;/dependency\u0026gt; Then use the AvroReaders class to obtain a reader:\n1 2 3 4 5 6 7 8 9 10 11 try (ParquetFileReader fileReader = ParquetFileReader.open( InputFile.of(path)); AvroRowReader reader = AvroReaders.createRowReader(fileReader)) { while (reader.hasNext()) { GenericRecord record = reader.next(); long id = (Long) record.get(\u0026#34;id\u0026#34;); GenericRecord address = (GenericRecord) record.get(\u0026#34;address\u0026#34;); } } Column projection and predicate push-down are fully supported, so you’re not giving anything up compared to the native row API. The schema conversion and type mapping match the behavior of parquet-java’s AvroReadSupport, which should make migration straightforward. See the Avro documentation for the full details.\nhardwood-cli Building a command line client for Hardwood was something which had been on my mind for a while, but initially I had planned to only do so after the 1.0 release. However, Brandon Brown stepped up to build a first version of the CLI before I even got around to it. It lets you examine Parquet files (both locally and on object storage), e.g. 
to take a look at their metadata and schema, inspect dictionaries and column indexes, print a few lines for getting a quick understanding of a file’s contents, convert them to JSON and CSV, and more.\nYou can see some of the features in action in this recording:\nTo get started with the Hardwood CLI, download the right native binary for your platform from GitHub; we currently provide binaries for Linux (x86_64 and aarch64), macOS (x86_64 and aarch64), and Windows (x86_64).\nWrapping Up Besides these key features, there’s also support for key/value metadata, Page CRC verification, and more. See the release notes for the details. You can grab the new release from Maven Central.\nHardwood wouldn’t be possible without the help of the following amazing folks from the open-source community, who contributed to this release: Arnav Balyan, Said Boudjelda, Brandon Brown, Manish Ghildiyal, Nicolas Grondin, Rion Williams, and Romain Manni-Bucau. Thank you all!\nAt this point, Hardwood handles the common Parquet reading use cases. For the remainder of the 1.0 release train, we are planning to focus on performance optimizations, close some gaps like Bloom filters, and stabilize the public API. 
You should expect a first 1.0 candidate in a week or two, with the 1.0.0.Final release hopefully following later this month.\nKey features for the 1.1 release later this year will be write support as well as support for less widely adopted Parquet features such as VARIANT columns.\n","id":12,"publicationdate":"Apr 2, 2026","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_s3_backend\"\u003eS3 Backend\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_predicate_push_down\"\u003ePredicate Push-Down\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_avro_bindings\"\u003eAvro Bindings\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_hardwood_cli\"\u003ehardwood-cli\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrapping_up\"\u003eWrapping Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI am pleased to announce the release of Hardwood 1.0.0.Beta1!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://github.com/hardwood-hq/hardwood\"\u003eHardwood\u003c/a\u003e is a new parser for Apache Parquet, optimized for minimal dependencies and great performance.\nSince the project’s initial release just \u003ca href=\"/blog/hardwood-new-parser-for-apache-parquet/\"\u003ea few weeks back\u003c/a\u003e, a small yet very active community has come together and evolved Hardwood significantly.\nToday, we are shipping an S3 backend, allowing to parse files directly from object storage,\npredicate pushdown for both local and remote files, Avro bindings, a CLI for inspecting Parquet files, and much more.\nWe’re also excited to launch a website for the project, \u003ca href=\"https://hardwood.dev\"\u003ehardwood.dev\u003c/a\u003e, which contains the documentation and API 
reference.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLet’s dig in.\u003c/p\u003e\n\u003c/div\u003e","tags":["parquet","open-source","java","performance","hardwood"],"title":"Hardwood Reaches Beta: S3, Predicate Push-Down, CLI, and More","uri":"https://www.morling.dev/blog/hardwood-reaches-beta-s3-predicate-push-down-cli/"},{"content":" I have contributed to a wide range of open-source projects over the last years. Here’s a selection of projects I have been involved with.\n1BRC 1️⃣🐝🏎️ The One Billion Row Challenge, or 1BRC for short, is a fun exploration of how quickly 1B rows from a text file can be aggregated with Java. It is a coding challenge I ran in January 2024, which provided an opportunity to learn about modern Java APIs, SIMD, and high-performance programming techniques to hundreds of developers. Focussed on Java originally, 1BRC garnered huge interest from folks in other eco-systems as well.\nDebezium Debezium is a platform for change data capture; it lets you stream changes out of different databases such as Postgres, MySQL, MongoDB and SQL Server into Apache Kafka. I was the lead of the Debezium project for several years.\nHardwood Hardwood is a parser for the Apache Parquet file format, optimized for minimal dependencies (no Hadoop, IYKYK) and great performance, applying some of the learnings from 1BRC.\nJfrUnit JfrUnit is a JUnit extension for asserting JDK Flight Recorder events. It comes in handy for ensuring the right custom JFR events are emitted by a JVM-based library or application as well as for identifying potential performance regressions by tracking JFR events for memory allocation, network I/O, etc.\nJFR Analytics JFR Analytics is an exploration for running analytics on JDK Flight Recorder recordings. 
It lets you use SQL to identify things like classloader leaks or thread leaks, allocation-heavy methods in your program, and more.\nkcctl 🧸 kcctl is a command-line client for working with Kafka Connect, allowing you to examine the state of the Connect cluster and individual connectors, register and start/stop connectors, etc. It is based on Quarkus and provided as a native binary for Linux, macOS, and Windows via GraalVM.\nPersistasaurus Persistasaurus is a PoC of a durable execution engine backed by SQLite. It lets you define multi-step workflows in Java using a simple annotation-based programming model, with SQLite used for persisting the execution log. If a flow gets interrupted, for instance due to a machine failure, Persistasaurus can resume it from the last successfully executed step. You can learn more about it in this blog post.\nModiTect, Layrry, and Deptective ModiTect is a family of Maven and Gradle plug-ins around the Java Module System, e.g. for creating module descriptors and building modular runtime images via jlink. Layrry is an API and launcher for modularized Java applications, leveraging the Java Module System’s notion of module layers, e.g. allowing you to work with multiple versions of one dependency. Deptective is a plug-in for the Java compiler (javac) for enforcing package dependencies within Java projects based on a declarative architecture definition.\nMapStruct MapStruct is a compile-time code generator for bean-to-bean mappings. Based on annotated Java interfaces, MapStruct generates mapping code that is fully type-safe and very efficient by avoiding any usage of reflection. I was the creator and initial project lead of MapStruct.\nQuarkus Quarkus is a \u0026#34;Kubernetes Native Java stack tailored for OpenJDK HotSpot and GraalVM, crafted from the best of breed Java libraries and standards\u0026#34;. 
My contributions to Quarkus are centered around its extension for Kafka Streams, which I initially created.\nBean Validation and Hibernate Validator Bean Validation is a Java specification which lets you express constraints on object models via annotations. Originally developed at the JCP, it’s now part of the Jakarta EE umbrella at the Eclipse foundation. I have been the spec lead for Bean Validation 2.0 (JSR 380) and the lead of the reference implementation Hibernate Validator.\nOther Hibernate Projects As part of the Hibernate team, I’ve contributed to different projects such as Hibernate OGM (an effort to access NoSQL stores with JPA), Hibernate Search (full-text search for domain models based on Apache Lucene and Elasticsearch) and Hibernate ORM.\n","id":13,"publicationdate":"Feb 27, 2026","section":"","summary":"I have contributed to a wide range of open-source projects over the last years. Here’s a selection of projects I have been involved with.\n1BRC 1️⃣🐝🏎️ The One Billion Row Challenge, or 1BRC for short, is a fun exploration of how quickly 1B rows from a text file can be aggregated with Java. It is a coding challenge I ran in January 2024, which provided an opportunity to learn about modern Java APIs, SIMD, and high-performance programming techniques to hundreds of developers.","tags":null,"title":"Projects","uri":"https://www.morling.dev/projects/"},{"content":" Table of Contents Why Hardwood? Hello, Hardwood! Parsing Performance Built With AI, Not By AI What’s Next? Today, it’s my great pleasure to announce the first public release of Hardwood, a new parser for the Apache Parquet file format, optimized for minimal dependencies and great performance.\nHardwood is open-source (Apache License 2.0) and supports Java 21 or newer. You can grab it from Maven Central and start parsing your Parquet files with ease and efficiency.\nWhy Hardwood? Apache Parquet has become the lingua franca of the modern data ecosystem. 
With its columnar data layout, Parquet enables efficient analytical queries and significant compression. This makes it a highly popular storage format for data lakes, supported by table formats such as Apache Iceberg and Delta Lake, as well as query engines like Trino and DuckDB. Whether you’re building ETL pipelines, running ad-hoc analytics, or training ML models, chances are Parquet files are somewhere in the mix.\nParquet support exists across a wide range of languages and runtimes. For Java, the parquet-java project is the most widely used library for parsing (and writing) Parquet files. Unfortunately, parquet-java is very dependency-heavy—most notably, pulling in Hadoop, amongst other things—and its reader is single-threaded, not taking advantage of all the cores your system may have.\nThis is where Hardwood comes in: it is a brand-new Parquet parser implementation written from the ground up in modern Java, based on the Parquet specification and test files. It avoids external dependencies as much as possible, the only exception being (optional) libraries for compression algorithms found in Parquet files, such as snappy or zstd. Another key objective for Hardwood is achieving great performance: a multi-threaded decoding pipeline distributes the work of parsing Parquet files across all the available CPU cores, yielding significantly faster parsing times.\nHello, Hardwood! Let’s take a quick look at how to use Hardwood for parsing a Parquet file. To get started, add Hardwood as a project dependency, e.g. like so for Maven:\n1 2 3 4 5 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;dev.hardwood\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hardwood-core\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.0.0.Alpha1\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Hardwood doesn’t pull in any transitive dependencies. 
If you want to parse Parquet files using one of the supported compression algorithms, add the library for that, too (one of snappy-java, zstd-jni, lz4-java, or brotli4j). For instance, add the following for snappy:\n1 2 3 4 5 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.xerial.snappy\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;snappy-java\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.1.10.8\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Hardwood provides two APIs for accessing the contents of Parquet files, a row-oriented API and a columnar API. The first one, RowReader, comes in handy in particular when working with complex, nested record schemas, making it very easy to access the contents of nested structs, lists, etc.:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Path myParquetFile = ...; try (ParquetFileReader fileReader = ParquetFileReader.open(myParquetFile); RowReader rowReader = fileReader.createRowReader()) { while (rowReader.hasNext()) { rowReader.next(); // Access columns with typed accessors by name... long id = rowReader.getLong(\u0026#34;id\u0026#34;); // ... Or by index String name = rowReader.getString(1); // Logical types are automatically converted LocalDate birthDate = rowReader.getDate(\u0026#34;birth_date\u0026#34;); UUID accountId = rowReader.getUuid(\u0026#34;account_id\u0026#34;); // Check for null values if (!rowReader.isNull(\u0026#34;age\u0026#34;)) { int age = rowReader.getInt(\u0026#34;age\u0026#34;); } // Access nested structs PqStruct address = rowReader.getStruct(\u0026#34;address\u0026#34;); if (address != null) { String city = address.getString(\u0026#34;city\u0026#34;); int zip = address.getInt(\u0026#34;zip\u0026#34;); } // Access lists and iterate with typed accessors PqList tags = rowReader.getList(\u0026#34;tags\u0026#34;); if (tags != null) { for (String tag : tags.strings()) { //... 
} } } } In contrast, the ColumnReader API is a bit more low-level, providing direct access to the column values of a Parquet file. It is the preferred choice when peak performance is the primary concern:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 try (ParquetFileReader reader = ParquetFileReader.open(myParquetFile)) { try (ColumnReader fare = reader.createColumnReader(\u0026#34;fare_amount\u0026#34;)) { double sum = 0; while (fare.nextBatch()) { int count = fare.getValueCount(); double[] values = fare.getDoubles(); // null if column is required BitSet nulls = fare.getElementNulls(); for (int i = 0; i \u0026lt; count; i++) { if (nulls == null || !nulls.get(i)) { sum += values[i]; } } } } } The columnar API is less convenient to use, in particular when it comes to reading files with nested or repeatable schemas. However, it can yield significantly higher throughput than the row reader API. It returns batches in the form of typed primitive arrays (e.g., double[]) that can be iterated with a simple for loop, avoiding the per-row next()/getDouble() method calls, virtual dispatch, and null-checking overhead of the row reader. It also enables the JIT compiler to auto-vectorize tight loops over contiguous arrays and improves CPU cache utilization by accessing a single column’s data sequentially rather than interleaving across all columns.\nParsing Performance Hardwood is built with high performance in mind. It applies many of the lessons learned from 1BRC, such as memory-mapping files or multi-threading. 
I am planning to share more details in a future blog post, so I’m going to focus just on one specific performance-related aspect here: Parallelizing the work of parsing Parquet files, so as to utilize the available CPU resources as much as possible and achieve high throughput.\nThis task is surprisingly complex due to the subtleties of the format, so Hardwood pulls a few tricks for taking advantage of all the available cores:\nPage-level parallelism, fanning out the work of decoding individual data pages to multiple worker threads. This allows for a much higher CPU utilization (and lower memory consumption) than when solely processing different column chunks, row groups, or even files in parallel.\nAdaptive page prefetching, ensuring that columns which are slower to decode than others (e.g. depending on their data type) receive more resources, so that all columns of a file can be read at the same pace.\nCross-file prefetching, starting to map and decode the pages of file N+1 when approaching the end of file N of a multi-file dataset, avoiding any slowdown at file transitions.\nBy employing these techniques and some others, such as minimizing allocations and avoiding auto-boxing of primitive values, Hardwood’s performance has come quite a long way since starting the project at the end of last year. As an example, the values of three out of 20 columns of the NYC taxi ride data set (a subset of 119 files overall, ~9.2 GB total, ~650M rows) can be summed up in ~2.7 sec using the row reader API with indexed access on my MacBook Pro M3 Max with 16 CPU cores. With the column reader API, the same task takes ~1.2 sec.\nThe taxi ride data set has a completely flat schema, i.e. it doesn’t contain any structs, lists, or maps. Most Parquet-based data sets fall into this category, and thus the focus for optimizing Hardwood has primarily been on these kinds of files so far. While less commonly found, the Parquet format also supports nested schemas. 
An example for this category are the Parquet files of the Overture Maps project. On the same machine as above, Hardwood can completely parse all the columns of a file with points of interest (~900 MB, ~9M records) in ~2.1 sec using the row reader API and in ~1.3 sec with the column reader API.\nIn order to identify bottlenecks, Hardwood comes with support for the JDK Flight Recorder, tracking key performance metrics and events such as prefetch misses, page decoding times, etc.\nFurther improving performance remains a key objective for the project going forward; to that end there are some first automated performance tests for flat and nested schemas and we are planning to set up an automated change detection pipeline using Apache Otava, allowing us to detect any potential regressions early on.\nBuilt With AI, Not By AI AI is used extensively for building Hardwood. Getting first-hand experience of how capable current LLMs are when building a relatively low-level code base like a file parser has been one of the motivations for starting the project.\nClaude Code is the tool of choice, and overall the experience has been really good. It would have been impossible to make progress as quickly without it. The task lends itself well to LLM-assisted coding: there is a comprehensive specification and an extensive suite of test files to assert correctness against. Adding support for another encoding or compression algorithm, analyzing failing tests, fixing thread pool starvation bugs—Claude handles these tasks very effectively.\nSo, are we using AI for building Hardwood? Absolutely. Is Hardwood vibe-coded? Absolutely not.\nThe LLM-generated code is a starting point, not an end state. Claude will happily duplicate logic, paper over corner cases with yet another if/else, or quietly exclude an unexpected result from a test, instead of fixing the underlying bug. You need to examine the code, understand it, and take ownership of it. 
\u0026#34;But Claude did this\u0026#34; is not going to cut it when things go wrong.\nAs things stand, AI is an amazing tool and an incredible productivity booster, but one you need to use with intention and deep understanding. And I don’t think it’s going to make libraries like Hardwood obsolete any time soon. Could you vibe-code something which parses a given file? Maybe. But building a solution that is correct, fast, and maintainable still takes significant effort, and there’s no point in reinventing the wheel over and over again.\nWhat’s Next? So, what’s next for Hardwood then? Really, we’re just at the beginning. I’d love for folks to take the 1.0.0.Alpha1 release for a spin and parse some of their Parquet files. Any bug reports or feature requests are welcome in the issue tracker.\nAs of this release, Hardwood supports all Parquet column types, column projections, as well as all key encoding and compression types. So it will work very well for use cases processing all the values of a file or certain columns.\nWhile the parser is relatively complete overall, one key part still missing is support for predicate push-down. The Parquet format supports different ways for pruning entire row groups from processing, including column statistics and Bloom filters. This is on the roadmap for one of the next 1.0 preview releases, at which point Hardwood will also be a good fit for use cases with higher data selectivity, e.g. allowing to fetch only relevant segments of a file from remote object storage. We are also working on a parquet-java compatibility layer, implementing its key APIs on top of Hardwood, to simplify migrations from one project to the other.\nFor the time after the 1.0 final release, we are planning to add support for writing Parquet files, and we may provide a CLI tool for inspecting and analyzing Parquet files. 
In the long term, the project could evolve into a library and toolkit for working with Parquet-based data lake table formats such as Apache Iceberg, it could serve as a testing bed for alternative columnar file formats, and much more.\nLastly, I’d like to give a massive shout-out to Andres Almiray (release automation) and Rion Williams (performance optimizations, JFR support) for their contributions to the project and this release!\nOnwards and upwards!\n","id":14,"publicationdate":"Feb 26, 2026","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_why_hardwood\"\u003eWhy Hardwood?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_hello_hardwood\"\u003eHello, Hardwood!\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_parsing_performance\"\u003eParsing Performance\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_built_with_ai_not_by_ai\"\u003eBuilt With AI, Not By AI\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_whats_next\"\u003eWhat’s Next?\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eToday, it’s my great pleasure to announce the first public release of Hardwood, a new parser for the Apache Parquet file format, optimized for minimal dependencies and great performance.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://github.com/hardwood-hq/hardwood\"\u003eHardwood\u003c/a\u003e is open-source (Apache License 2.0) and supports Java 21 or newer. 
You can grab it from \u003ca href=\"https://central.sonatype.com/search?q=hardwood\u0026amp;namespace=dev.hardwood\"\u003eMaven Central\u003c/a\u003e and start parsing your Parquet files with ease and efficiency.\u003c/p\u003e\n\u003c/div\u003e","tags":["parquet","open-source","java","performance","hardwood"],"title":"Hardwood: A New Parser for Apache Parquet","uri":"https://www.morling.dev/blog/hardwood-new-parser-for-apache-parquet/"},{"content":"","id":15,"publicationdate":"Dec 7, 2025","section":"tags","summary":"","tags":null,"title":"architecture","uri":"https://www.morling.dev/tags/architecture/"},{"content":"","id":16,"publicationdate":"Dec 7, 2025","section":"tags","summary":"","tags":null,"title":"cdc","uri":"https://www.morling.dev/tags/cdc/"},{"content":"","id":17,"publicationdate":"Dec 7, 2025","section":"tags","summary":"","tags":null,"title":"materialized-views","uri":"https://www.morling.dev/tags/materialized-views/"},{"content":"","id":18,"publicationdate":"Dec 7, 2025","section":"tags","summary":"","tags":null,"title":"streaming","uri":"https://www.morling.dev/tags/streaming/"},{"content":" Table of Contents Materialized Views Embracing Data Duplication Streams for machines, tables for humans Historically, data management systems have been built around the notion of pull queries: users query data which, for instance, is stored in tables in an RDBMS, Parquet files in a data lake, or a full-text index in Elasticsearch. When a user issues a query, the engine will produce the result set at that point in time by churning through the data set and finding all matching records (oftentimes sped up by utilizing indexes).\nGenerally, this approach of pulling data works well and it matches with how people think and operate. You have a question about your data set? Express it as a query, run that query, and the system will provide the answer. 
But there are some challenges with that, too:\nPerformance: Queries might be prohibitively expensive to process, taking too long to provide an answer in a satisfactory amount of time, for instance if the data set is very large, or if a query is very complex.\nData format: Depending on the kind of query, the system storing the data might not be the most suitable one to answer it. For instance, an RDBMS such as Postgres might not be the best tool for processing analytical queries.\nData shape: The data might not be in the right shape to answer a given query efficiently, for instance requiring a complex many-way join of highly normalized tables which is expensive to compute.\nData location: The data might just not be at the right place. Depending on the specific use case, it might be necessary to store it close to users, e.g. at edge locations or even on the user’s mobile device, in order to meet defined latency or availability requirements.\nMaterialized Views All these problems can be overcome with the help of materialized views, applying a very broad interpretation of that term. No matter whether you use actual materialized views in a database or work with derived datasets stored in different kinds of data systems, the idea is always the same: precompute the results of a given query and store them in a format, shape, and location optimized specifically for that query.\nThat five-way join required to fetch a purchase order with all its order lines, associated product data, shipment details, and customer information? Precompute it and store it in a materialized view in your database, allowing for super-fast look-up solely based on the order id. Ad-hoc queries by the data science team for identifying upsell opportunities across all the orders, which would put a massive load onto your operational database? Copy the data into a data lake, allowing for all sorts of analytical queries without impacting the operational database whatsoever. 
Need to answer queries with the lowest latency possible? Put a derived view of the data set into a cache close to your users.\nPut differently, in order to get the most value out of pull queries, you should lean into data duplication and denormalization. Arranging multiple materialized views of your data set in different ways, each one optimized for specific query and access patterns, allows you to satisfy the requirements of different use cases operating on that data set. Ultimately, it is physics that demands this: there’s only ever going to be a single way to iterate efficiently through a data set in its natural sort order. This is why we have indexes in databases, which, if you squint a little, are just another kind of materialized view, at least in their covering form [1].\n[1] A covering index is an index which contains all the columns required by a given query. A query engine therefore only needs to scan the index for processing the query; it doesn’t have to retrieve further data from the table’s primary B+ tree or the heap.\nEmbracing Data Duplication Now, the thought of duplication might trigger some reservations: isn’t chaos going to ensue if there are multiple copies of the same data set? Which version is the right one? Aren’t you at risk of producing inconsistent query results? I think there’s not much to fear if you keep one fundamental principle in mind: there should be exactly one canonical instance of the data set. As the system of record, it is the only one that gets mutated by business transactions. All other views of the data set are derived from this one, i.e. it is the source of truth.\nIn practice, it is not feasible to update all derived views synchronously, in particular if they are located in another system. This means consumers need to account for eventual consistency of view data. For many use cases, such as analytics, that is perfectly acceptable. 
Other situations might have stronger consistency requirements, making it necessary to prevent stale data from being retrieved from a view. Different techniques exist for doing so, such as tracking logical timestamps or log sequence numbers (LSNs).\nThis raises the question of how to keep all these different flavors of materialized views in sync as the original data set changes. New records will be added, existing ones will be updated or removed, and all the derived views need to be updated in order to reflect these changes. You could periodically recreate any derived views from scratch, but not only might this be a very costly operation, the query results would also become outdated or incomplete again very quickly.\nThinking about it, recomputing materialized views from scratch can be pretty wasteful. Typically, only small parts of a dataset change, so only small parts of any derived view should need to be updated. Intuitively, this makes a lot of sense. For instance, assume you’d want to keep track of the revenue per product category across the purchase orders in your system. When a new order arrives, would you recalculate the totals for all the categories, by processing all the orders? Of course not. Instead, you’d keep the totals of all the unrelated categories as-is. Only the total of the incoming order’s category needs updating, and you’d compute that one by simply adding the new order’s value to the previous total.\nThis is exactly how push queries work. Triggered by changes to rows in the source tables they operate on, they’ll emit new (or updated) results reflecting exactly these changes. A new row in the purchase orders table in the previous example will yield exactly one update to the sum of that order’s category. That way, push queries solve the concern of pull queries potentially being too costly and taking too long to run. 
As they operate on small incremental data changes, the cost is distributed over time, and each output update can be calculated very quickly.\nA core assumption of push queries is that the delta they operate on is comparatively small. If there is a massive data change—for instance when doing bulk deletes, or when backfilling historical data—instead of processing all these changes incrementally, triggering millions of state updates, each with its own overhead (lookups, partial aggregations, downstream propagation), it may be advantageous to fall back to a pull query processing the complete data set.\nStreams for machines, tables for humans Such a stream of query result updates is great for implementing realtime use cases, acting on the data as it changes. For instance, to build a system for fraud detection you could define a push query identifying certain patterns in newly created purchase orders as they come in and emit any results to some alerting system.\nIn contrast, push queries don’t work very well for how humans operate. You probably don’t want to be notified for every single update to the revenue-per-category query. Users want to process query results at their own pace and demand. Most of the time, users of a data system are not interested in a stream of data changes, but rather the effective result at a given point in time. When looking for the current balance of your bank account, you don’t want to sum up all previous transactions; you just want to see the resulting number.\nThat’s where the combination of push and pull queries comes in: taking the incrementally computed updates to the results of a push query and storing them in a system supporting pull queries lets you have the cake and eat it too.\nThere are several ways for implementing this. This could be a stream processor such as Flink SQL, operating on change events sourced from a database via change data capture, and writing data into Elasticsearch, or an Iceberg table. 
It could be a database supporting incremental view maintenance (IVM), such as Postgres with the pg_ivm extension. Or it could be an external IVM engine such as Feldera, Materialize, or RisingWave.\nThe IVM space has seen considerable activity over the past few years, with vendors working to extend incremental computation to increasingly complex SQL queries, e.g. with windowed aggregations or recursive logic, managing the potentially large state [2] required for incremental computations and amortizing it across multiple queries, supporting edge-based caching (ReadySet), and much more.\n[2] As an example, consider an incremental MAX() query on rows which can be updated or deleted. If the row contributing the current max value gets deleted, the next largest value needs to be emitted instead. Thus, that operator needs to store all row values for that column.\nHowever, no matter which specific solution you choose, this approach allows you to materialize views incrementally and efficiently, making your data available for pull-based querying in the right format and shape, at the right location.\nIf you want instant pulls, you need constant pushes.\n","id":19,"publicationdate":"Dec 7, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_materialized_views\"\u003eMaterialized Views\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_embracing_data_duplication\"\u003eEmbracing Data Duplication\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_streams_for_machines_tables_for_humans\"\u003eStreams for machines, tables for humans\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eHistorically, data management systems have been built around the notion of \u003cem\u003epull queries\u003c/em\u003e: users query data which, for instance, is stored in tables in 
an RDBMS, Parquet files in a data lake, or a full-text index in Elasticsearch. When a user issues a query, the engine will produce the result set at that point in time by churning through the data set and finding all matching records (oftentimes sped up by utilizing indexes).\u003c/p\u003e\n\u003c/div\u003e","tags":["streaming","cdc","materialized-views","architecture"],"title":"You Gotta Push If You Wanna Pull","uri":"https://www.morling.dev/blog/you-gotta-push-if-you-wanna-pull/"},{"content":"","id":20,"publicationdate":"Nov 25, 2025","section":"tags","summary":"","tags":null,"title":"distributed-systems","uri":"https://www.morling.dev/tags/distributed-systems/"},{"content":" Table of Contents UUIDs Monotonically Increasing Sequences Deriving Idempotency Keys From the Transaction Log Discussion In distributed systems, there’s a common understanding that it is not possible to guarantee exactly-once delivery of messages. What is possible though is exactly-once processing. By adding a unique idempotency key to each message, you can enable consumers to recognize and ignore duplicate messages, i.e. messages which they have received and successfully processed before.\nNow, how does this work exactly? When receiving a message, a consumer takes the message’s idempotency key and compares it to the keys of the messages which it already has processed. If it has seen the key before, the incoming message is a duplicate and can be ignored. Otherwise, the consumer goes on to process the message, for instance by storing the message itself, or a view derived from it, in some kind of database.\nIn addition, it stores the idempotency key of the message. Critically, these two things must happen atomically, typically by wrapping them in a database transaction. Either the message gets processed and its idempotency key gets persisted. Or, the transaction gets rolled back and no changes are applied at all. 
That way, it is ensured that the consumer will process a message again upon redelivery, if it failed to do so before. It is also ensured that duplicates received after successfully processing the message are skipped over.\nUUIDs So let’s discuss what makes for a good idempotency key then. One possible option would be to use a UUIDv4. These random identifiers satisfy the requirement of uniquely identifying each message. However, they require the consumer to store the UUIDs of all the previous messages it has ever received in order to reliably identify a duplicate. Depending on the message volume, this may not be practical. Pragmatically, you might get away with discarding received UUIDs after some time period, if it is acceptable to occasionally receive and process a duplicate after that period. Unfortunately, neither the producer of the message nor the consumer will have any indication of the duplicated processing in that case.\nWe can somewhat improve this situation by adding a timestamp to the idempotency key, for instance by using a UUIDv7, which contains both a timestamp part (first 48 bits) and a random part (the remaining bits, apart from the version and variant fields), or a ULID. That way, the consumer can detect when it receives a message with an idempotency key which is \u0026#34;too old\u0026#34;. While it can’t decide whether the message is a duplicate or not, it can flag to the producer that it can’t handle that message. It is then up to the producer to decide how to proceed. For instance, if the message is part of a payment flow, the system might suggest to the user to first check in their banking account whether this payment has already been executed or not. Only if that’s not the case would a new message with the same payload and a fresh UUID be sent.\nMonotonically Increasing Sequences All these intricacies can be avoided when it is possible to use a monotonically increasing sequence value as the idempotency key. 
In that case, the consumer does not need to store all the keys it ever has processed (or a reasonably sized subset thereof). It only needs to store a single value, the one of the latest message which it has processed. If it receives a message with the same or a lower idempotency key, that message must be a duplicate and can be ignored. When receiving messages from a partitioned source, such as a Kafka topic with multiple partitions, or from multiple independent producers (e.g., different clients of a REST API, each using their own separate sequence), then the latest key value per partition must be stored.\nMonotonically increasing idempotency keys are a great improvement from the perspective of the message consumer. On the flipside, they may make things more complicated for producers: creating monotonically increasing sequence values isn’t without its own challenges. It is trivial if producers are single-threaded, producing one message at a time. In that case, a database sequence, or even a simple in-memory counter, can be used for creating the idempotency keys. Gaps in the sequence are fine, hence it is possible to increment the persistent state of the sequence or counter in larger steps, and dispense the actual values from an in-memory copy. That way, disk IO can be reduced. From a consumer perspective, Kafka partition offsets fall into that bucket, as they can be considered a monotonically increasing idempotency key for the messages consumed from a given partition.\nThings get more complicated when the producer is subject to multiple concurrent requests at once, for instance a REST service with multiple request workers, perhaps even scaled out to multiple compute nodes in a cluster. To ensure monotonicity, retrieval of the idempotency key and emitting a message with that key must happen atomically, uninterrupted by other worker threads. 
Otherwise, you may end up in a situation where thread A fetches sequence value 100, thread B fetches sequence value 101, B emits a message with idempotency key 101, and then A emits a message with idempotency key 100. A consumer would then, incorrectly, discard A’s message as a duplicate.\nFor most cases, ensuring this level of atomicity will impose a severe bottleneck, essentially serializing all requests of the producer system, regardless of how many worker threads or service instances you deploy. Note that if you really wanted to go down that route, solely using a database sequence for producing the idempotency key will not work. Instead, you’d have to use a mechanism such as Postgres advisory locks in order to guarantee monotonicity of idempotency keys in the outgoing messages.\nDeriving Idempotency Keys From the Transaction Log Now, is there a way for us to have this cake and eat it too? Can we get the space efficiency for consumers when using monotonically increasing idempotency keys, without hampering performance of multi-threaded producers? Turns out we can, at least when the emission of messages can be made an asynchronous activity in the producer system, happening independently from processing inbound requests. This means clients of the producer system receive confirmation that the intent to send a message or request was persisted, but they don’t get the result of the same right away. If a use case can be modeled with these semantics, the problem can be reduced to the single-threaded situation above: instead of emitting messages directly to the target system, each producer thread inserts them into a queue. This queue is processed by a single-threaded worker process which emits all the messages sequentially. 
As argued in The Synchrony Budget, making activities asynchronous can be generally advantageous, if we don’t require their outcome right away.\nOne specific way to do so would be a variation of the widely used outbox pattern, utilizing the transaction log of the producer service’s database. After all, it’s not necessary to sequence inbound requests ourselves as the database already is doing that for us when serializing the transactions in its log. When producers persist the intent to send a message in the transaction log—for instance by writing a record into a specific table—a process tailing the log can assign idempotency keys to these messages based on their position in the transaction log.\nAn implementation of this is straight-forward using tools for log-based Change Data Capture (CDC), such as Debezium: You retrieve the messages to be sent from the log by capturing the INSERT events from the outbox table, and assign an idempotency key before emitting them, derived from their log offset. The exact details are going to depend on the specific database.\nFor example, in Postgres it is ensured that the log sequence numbers (LSN) of commit events within its write-ahead log (WAL) are monotonically increasing: the commit event of a transaction committing after another transaction will have a higher LSN. Furthermore, it is guaranteed that within a given transaction, the LSNs of the events are also monotonically increasing. This makes the tuple of { Commit LSN, Event LSN } a great fit for an idempotency key. In order to not leak the fact that a producer is using a Postgres database, both values can be encoded into a single 128 bit number value. Note that you don’t need to deploy Kafka or Kafka Connect for this solution. Debezium’s embedded engine is a great fit for this use case, allowing you to assign idempotency keys from within a callback method in the producer service itself, not requiring any further infrastructure. 
When using Postgres to implement this pattern, you don’t even need a dedicated outbox table, as it lets you write arbitrary contents into the transaction log via pg_logical_emit_message(), which is perfect for the use case at hand.\nDiscussion So, when to use which kind of idempotency key then? As always, there are no silver bullets, and the answer depends on your specific use case. For many scenarios, using UUIDs and dropping them after some time will probably be sufficient, provided you can tolerate that messages occasionally can be processed a second time when duplicates arrive after the retention period of processed keys.\nThe more messages you need to process overall, the more attractive a solution centered around monotonically increasing sequences becomes, as it allows for space-efficient duplicate detection and exclusion, no matter how many messages you have. The proposed log-based approach can be an efficient solution for doing so, but it also adds operational complexity: your database needs to support logical replication, you need to run a CDC connector, etc. However, many organizations already operate CDC pipelines for other purposes (analytics, search indexing, cache invalidation, etc.). If you’re in that category, the incremental complexity is minimal. 
If you’re not, you should weigh the operational overhead against the benefits (constant-space duplicate detection) for your specific scale.\n","id":21,"publicationdate":"Nov 25, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_uuids\"\u003eUUIDs\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_monotonically_increasing_sequences\"\u003eMonotonically Increasing Sequences\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_deriving_idempotency_keys_from_the_transaction_log\"\u003eDeriving Idempotency Keys From the Transaction Log\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_discussion\"\u003eDiscussion\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn distributed systems, there’s a common understanding that it is not possible to guarantee exactly-once delivery of messages.\n\u003ca href=\"https://medium.com/@jaykreps/exactly-once-support-in-apache-kafka-55e1fdd0a35f\"\u003eWhat is possible\u003c/a\u003e though is \u003cem\u003eexactly-once processing\u003c/em\u003e. By adding a unique idempotency key to each message, you can enable consumers to recognize and ignore duplicate messages, i.e. 
messages which they have received and successfully processed before.\u003c/p\u003e\n\u003c/div\u003e","tags":["distributed-systems","reliability","patterns"],"title":"On Idempotency Keys","uri":"https://www.morling.dev/blog/on-idempotency-keys/"},{"content":"","id":22,"publicationdate":"Nov 25, 2025","section":"tags","summary":"","tags":null,"title":"patterns","uri":"https://www.morling.dev/tags/patterns/"},{"content":"","id":23,"publicationdate":"Nov 25, 2025","section":"tags","summary":"","tags":null,"title":"reliability","uri":"https://www.morling.dev/tags/reliability/"},{"content":" Table of Contents Hello Persistasaurus! Capturing Execution State Delayed Executions Human Interaction Managing State Wrapping Up Lately, there has been a lot of excitement around Durable Execution (DE) engines. The basic idea of DE is to take (potentially long-running) multi-step workflows, such as processing a purchase order or a user sign-up, and make their individual steps persistent. If a flow gets interrupted while running, for instance due to a machine failure, the DE engine can resume it from the last successfully executed step and drive it to completion.\nThis is a very interesting value proposition: the progress of critical business processes is captured reliably, ensuring they’ll complete eventually. Importantly, any steps performed already successfully won’t be repeated when retrying a failed flow. This helps to ensure that flows are executed correctly (for instance preventing inventory from getting assigned twice to the same purchase order), efficiently (e.g. avoiding repeated remote API calls), and deterministically. One particular category of software which benefits from this are agentic systems, or more generally speaking, any sort of system which interacts with LLMs. LLM calls are slow and costly, and their results are non-deterministic. 
So it is desirable to avoid repeating any previous LLM calls when continuing an agentic flow after a failure.\nNow, at a high level, \u0026#34;durable execution\u0026#34; is nothing new. A scheduler running a batch job for moving purchase orders through their lifecycle? You could consider this a form of durable execution. Sending a Kafka message from one microservice to another and reacting to the response message in a callback? Also durable execution, if you squint a little. A workflow engine running a BPMN job? Implementing durable execution, before the term actually got popularized. All these approaches model multi-step business transactions—​making the logical flow of the overall transaction more or less explicit—​in a persistent way, ensuring that transactions progress safely and reliably and eventually complete.\nHowever, modern DE typically refers to one particular approach for achieving this goal: Workflows defined in code, using general purpose programming languages such as Python, TypeScript, or Java. That way, developers don’t need to pick up a new language for defining flows, as was the case with earlier process automation platforms. They can use their familiar tooling for editing flows, versioning them, etc. A DE engine transparently tracks program progress, persists execution state in the form of durable checkpoints, and enables resumption after failures.\nNaturally, this piqued my interest: what would it take to implement a basic DE engine in Java? Can we achieve something useful with less than, let’s say, 1,000 lines of code? The idea being not to build a production-ready engine, but to get a better understanding of the problem space and potential solutions for it. You can find the result of this exploration, called Persistasaurus, in this GitHub repository. Coincidentally, this project also serves as a very nice example of how modern Java versions can significantly simplify the life of developers.\nHello Persistasaurus! 
Let’s take a look at an example of what you can do with Persistasaurus and then dive into some of the key implementation details. As per the idea of DE, flows are implemented as regular Java code. The entry point of a flow is a method marked with the @Flow annotation. Individual flow steps are methods annotated with @Step:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 public class HelloWorldFlow { @Flow public void sayHello() { int sum = 0; for (int i = 0; i \u0026lt; 5; i++) { sum += say(\u0026#34;World\u0026#34;, i); } System.out.println(String.format(\u0026#34;Sum: %s\u0026#34;, sum)); } @Step protected int say(String name, int count) { System.out.println(String.format(\u0026#34;Hello, %s (%s)\u0026#34;, name, count)); return count; } } Steps are the unit of persistence: their outcomes are recorded, and when resuming a flow after a failure, it will continue from the last successfully run step method. Now, which exact parts of a flow warrant being persisted as a step is up to the developer to decide. You don’t want to define steps too granularly, so as to keep the overhead of logging low. In general, flow sections which are costly or time-consuming to run or whose result cannot easily be reproduced are great candidates for being moved into a step method.\nA flow is executed by obtaining a FlowInstance object and then calling the flow’s main method:\n1 2 3 4 5 6 UUID uuid = UUID.randomUUID(); FlowInstance\u0026lt;HelloWorldFlow\u0026gt; flow = Persistasaurus.getFlow( HelloWorldFlow.class, uuid); flow.run(f -\u0026gt; f.sayHello()); Each flow run is identified by a unique id, which allows it to be re-executed after a failure, or resumed when waiting for an external signal (\u0026#34;human in the loop\u0026#34;, more on that below). 
If the Hello World flow runs to completion, the following will be logged to stdout:\n1 2 3 4 5 6 Hello, World (0) Hello, World (1) Hello, World (2) Hello, World (3) Hello, World (4) Sum: 10 Now let’s assume something goes wrong while executing the third step:\n1 2 3 4 Hello, World (0) Hello, World (1) Hello, World (2) RuntimeException(\u0026#34;Uh oh\u0026#34;) When re-running the flow, using the same UUID as before, it will retry that failed step and resume from there. The first two steps which were already run successfully are not re-executed. Instead, they will be replayed from a persistent execution log, which is based on SQLite, an embedded SQL database:\n1 2 3 Hello, World (3) Hello, World (4) Sum: 10 In the following, let’s take a closer look at some of the implementation choices in Persistasaurus.\nCapturing Execution State At the core of every DE engine there’s some form of persistent durable execution log. You can think of this a bit like the write-ahead log of a database. It captures the intent to execute a given flow step, which makes it possible to retry that step should it fail, using the same parameter values. Once successfully executed, a step’s result will also be recorded in the log, so that it can be replayed from there if needed, without having to actually re-execute the step itself.\nDE logs come in two flavours largely speaking; one is in the form of an external state store which is accessed via some sort of SDK. Example frameworks taking this approach include Temporal, Restate, Resonate, and Inngest. The other option is to persist DE state in the local database of a given application or (micro)service. One solution in this category is DBOS, which implements DE on top of Postgres.\nTo keep things simple, I went with the local database model for Persistasaurus, using SQLite for storing the execution log. 
But as we’ll see later on, depending on your specific use case, SQLite actually might also be a great choice for a production scenario, for instance when building a self-contained agentic system.\nThe structure of the execution log table in SQLite is straight-forward. It contains one entry for each durable execution step:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 CREATE TABLE IF NOT EXISTS execution_log ( flowId TEXT NOT NULL, (1) step INTEGER NOT NULL, (2) timestamp INTEGER NOT NULL, (3) class_name TEXT NOT NULL, (4) method_name TEXT NOT NULL, (5) delay INTEGER, (6) status TEXT (7) CHECK( status IN (\u0026#39;PENDING\u0026#39;,\u0026#39;WAITING_FOR_SIGNAL\u0026#39;,\u0026#39;COMPLETE\u0026#39;) ) NOT NULL, attempts INTEGER NOT NULL DEFAULT 1, (8) parameters BLOB, (9) return_value BLOB, (10) PRIMARY KEY (flowId, step) ) 1 The UUID of the flow 2 The sequence number of the step within the flow, in the order of execution 3 The timestamp of first running this step 4 The name of the class defining the step method 5 The name of the step method (currently ignoring overloaded methods for this PoC) 6 For delayed steps, the delay in milli-seconds 7 The current status of the step 8 A counter for keeping track of how many times the step has been tried 9 The serialized form of the step’s input parameters, if any 10 The serialized form of the step’s result, if any This log table stores all information needed to capture execution intent and persist results. More details on the notion of delays and signals follow further down.\nWhen running a flow, the engine needs to know when a given step gets executed so it can be logged. One common way for doing so is via explicit API calls into the engine, e.g. like so with DBOS Transact:\n1 2 3 4 5 @Workflow public void workflow() { DBOS.runStep(() -\u0026gt; stepOne(), \u0026#34;stepOne\u0026#34;); DBOS.runStep(() -\u0026gt; stepTwo(), \u0026#34;stepTwo\u0026#34;); } This works, but tightly couples workflows to the DE engine’s API. 
For Persistasaurus I aimed to avoid this dependency as much as possible. Instead, the idea is to transparently intercept the invocations of all step methods and track them in the execution log, allowing for a very concise flow expression, without any API dependencies:\n1 2 3 4 5 @Flow public void workflow() { stepOne(); stepTwo(); } In order for the DE engine to know when a flow or step method gets invoked, the proxy pattern is used: a proxy wraps the actual flow object and handles each of its method invocations, updating the state in the execution log before and after passing the call on to the flow itself. Thanks to Java’s dynamic nature, creating such a proxy is relatively easy, requiring just a little bit of bytecode generation. Unsurprisingly, I’m using the ByteBuddy library for this job:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 private static \u0026lt;T\u0026gt; T getFlowProxy(Class\u0026lt;T\u0026gt; clazz, UUID id) { try { return new ByteBuddy() .subclass(clazz) (1) .method(ElementMatchers.any()) (2) .intercept( (3) MethodDelegation.withDefaultConfiguration() .withBinders( Morph.Binder.install(OverrideCallable.class)) .to(new Interceptor(id))) .make() .load(Persistasaurus.class.getClassLoader()) (4) .getLoaded() .getDeclaredConstructor() .newInstance(); (5) } catch (Exception e) { throw new RuntimeException(\u0026#34;Couldn\u0026#39;t instantiate flow\u0026#34;, e); } } 1 Create a sub-class proxy for the flow type 2 Intercept all method invocations on this proxy…​ 3 …​and delegate them to an Interceptor object 4 Load the generated proxy class 5 Instantiate the flow proxy As an aside, Claude Code does an excellent job in creating code using the ByteBuddy API, which is not always self-explanatory. Now, whenever a method is invoked on the flow proxy, the call is delegated to the Interceptor class, which will record the step in the execution log before invoking the actual flow method. 
I am going to spare you the complete details of the method interceptor implementation (you can find it here on GitHub), but the high-level logic looks like so:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 public Object intercept(@This Object instance, @Origin Method method, @AllArguments Object[] args, @Morph OverrideCallable callable) throws Throwable { if (!isFlowOrStep(method)) { return callable.call(args); } Invocation loggedInvocation = executionLog.getInvocation(id, step); if (loggedInvocation != null \u0026amp;\u0026amp; loggedInvocation.status() == InvocationStatus.COMPLETE) { (1) step++; return loggedInvocation.returnValue(); } else { executionLog.logInvocationStart( id, step, method.getName(), InvocationStatus.PENDING, args); (2) int currentStep = step; step++; Object result = callable.call(args); (3) executionLog.logInvocationCompletion(id, currentStep, result); (4) return result; } } 1 Replay completed step if present 2 Log invocation 3 Execute the actual step method 4 Log result Replaying completed steps from the log is essential for ensuring deterministic execution. Each step typically runs exactly once, capturing non-deterministic values such as the current time or random numbers while doing so.\nThere’s an important failure mode, though: if the system crashes after a step has been executed but before the result can be recorded in the log, that step would be repeated when rerunning the flow. Odds for this to happen are pretty small, but whether it is acceptable or not depends on the particular use case. When executing steps with side-effects, such as remote API calls, it may be a good idea to add idempotency keys to the requests, which lets the invoked services detect and ignore any potential duplicate calls.\nThe actual execution log implementation isn’t that interesting, you can find its source code here. 
All it does is persist step invocations and their status in the execution_log SQLite table shown above.\nDelayed Executions At this point, we have a basic Durable Execution engine which can run simple flows such as the one above. Next, I explored implementing delayed execution steps. As an example, consider a user onboarding flow, where you might want to send out an email with useful resources a few days after a user has signed up. Using the annotation-based programming model of Persistasaurus, this can be expressed like so:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 public class SignupFlow { @Flow public void signUp(String userName, String email) { long id = createUserRecord(userName, email); sendUsefulResources(id); } @Step protected long createUserRecord(String userName, String email) { // persist the user... return id; } @Step(delay=3, timeUnit=DAYS) protected void sendUsefulResources(long userId) { // send the email... } } Naturally, we don’t want to block the initiating thread (for instance, a web application’s request handler) when delaying a step. Instead, we need a way to temporarily yield execution of the flow, return control to the caller, and then later on, when the configured delay has passed, resume the flow.\nUnlike other programming languages, Java doesn’t support continuations via its public API. So how could we yield control then? One option would be to define a specific exception type, let’s say FlowYieldException, and raise it from within the method interceptor when encountering a delayed method. The call stack would be unwound until some framework-provided exception handler catches that exception and returns control to the code triggering the flow. For this to work, it is essential that no user-provided flow or step code catches that exception type. 
Alternatively, one could transform the bytecode of the step method (and all the methods below it in the call stack), so that it can return control at given suspension points and later on resume from there, similar to how Kotlin’s coroutines are implemented under the hood (\u0026#34;continuation passing style\u0026#34;).\nLuckily, Java 21 offers a much simpler solution. This version added support for virtual threads (JEP 444), and while you shouldn’t block OS level threads, blocking virtual threads is totally fine. Virtual threads are lightweight user-mode threads managed by the JVM, and an application can have hundreds of thousands, or even millions of them at once. Thus I decided to implement delayed executions in Persistasaurus through virtual threads, sleeping for the given period of time when encountering a delayed method.\nTo run a flow with a delayed step, trigger it via runAsync(), which immediately returns control to the caller:\n1 2 3 4 FlowInstance\u0026lt;SignupFlow\u0026gt; flow = Persistasaurus.getFlow( SignupFlow.class, uuid); flow.runAsync(f -\u0026gt; f.signUp(\u0026#34;Bob\u0026#34;, \u0026#34;bob@example.com\u0026#34;)); When putting a virtual thread running a flow method asleep, it will be unmounted from the underlying OS level carrier thread, freeing its resources. Later on, once the sleep time has passed, the virtual thread will be remounted onto a carrier thread and continue the flow. When rerunning non-finished flows with a delayed execution step, Persistasaurus will only sleep for the remainder of the configured delay, which might be zero if enough time has passed since the original run of the flow.\nSo in fact, you could think of virtual threads as a form of continuations; and indeed, if you look closely at the stacktrace of a virtual thread, you’ll see that the frame at the very bottom is the enter() method of a JDK-internal class Continuation. 
Interestingly, this class was even part of the public Java API in early preview versions of virtual threads, but it got made private later on.\nHuman Interaction As the last step of my exploration I was curious how flows with \u0026#34;human in the loop\u0026#34;-steps could be implemented: steps where externally provided input or data is required in order for a flow to continue. Sticking to the sign-up flow example, this could be an email by the user, so as to confirm their identity (double opt-in). As much as possible, I tried to stick to the idea of using plain method calls for expressing the flow logic, but I couldn’t get around making flows invoke a Persistasaurus-specific method, await(), for signalling that a step requires external input:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 public class SignupFlow { @Flow public void signUp(String userName, String email) { long id = createUserRecord(userName, email); sendEmailConfirmationRequest(email); await(() -\u0026gt; confirmEmailAddress(any())); (1) finalizeSignUp(id); } @Step protected void confirmEmailAddress(Instant timeOfConfirmation) { // ... } } 1 Await the invocation of the given step method When the method interceptor encounters a step method invoked from within an await() block, it doesn’t go on to actually execute right away. Instead, the flow will await continuation until the step method gets triggered. This is why it doesn’t matter which parameter values are passed to that step within the flow definition. 
You could pass null, or, as a convention, the any() placeholder method.\nIn order to provide the input to a waiting step and continue the flow, call the step method via resume(), for instance like so, in a request handler method of a Spring Boot web application:\n1 2 3 4 5 6 7 8 9 @PostMapping(\u0026#34;/email-confirmations\u0026#34;) void confirmEmailAddress(@RequestBody Confirmation confirmation) { FlowInstance\u0026lt;UserSignupFlow\u0026gt; flow = Persistasaurus.getFlow( UserSignupFlow.class, confirmation.uuid()); flow.resume(f -\u0026gt; { f.confirmEmailAddress(confirmation.timestamp()); }); } The flow will then continue from that step, using the given parameter value(s) as its input. For this to work, we need a way for the engine to know whether a given step method gets invoked from within resume() and thus actually should be executed, or whether it gets invoked from within await() and hence should be suspended.\nSeasoned framework developers might immediately think of using thread-local variables for this purpose, but as of Java 25, this can be solved much more elegantly and safely using so-called scoped values, as defined in JEP 506. To quote that JEP, scoped values\nenable a method to share immutable data both with its callees within a thread, and with child threads. Scoped values are easier to reason about than thread-local variables. They also have lower space and time costs\nScoped values are typically defined as a static field like so:\n1 2 3 4 5 6 7 8 9 public class Persistasaurus { enum CallType { RUN, AWAIT, RESUME; } static final ScopedValue\u0026lt;CallType\u0026gt; CALL_TYPE = ScopedValue.newInstance(); // ... } To set the scoped value and run some unit of code with that value, call ScopedValue::where():\n1 2 3 public static void await(Runnable r) { ScopedValue.where(CALL_TYPE, CallType.AWAIT).run(r); } Unlike thread-local variables, this ensures the scoped value is cleared when leaving the scope. 
Then, further down in the call stack, within the method handler, the scoped value can be consumed:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 CallType callType = CALL_TYPE.get(); if (callType == CallType.RESUME) { WaitCondition waitCondition = getWaitCondition(flowId); waitCondition.lock.lock(); try { waitCondition.condition.signal(); } finally { waitCondition.lock.unlock(); } } In order to yield control when waiting for external input and to resume when that input has been provided, a ReentrantLock with a wait condition is used. Similar to the sleep() call used for fixed delay steps above, a virtual thread will be unmounted from its carrier when waiting for a condition.\nWhen accidentally trying to access a scoped value which isn’t actually set, an exception will be raised, addressing another issue you’d commonly encounter with thread-local variables. This might not seem like a huge deal, but it’s great to see how the Java platform continues to evolve and improves things like this.\nManaging State Let’s dive a bit deeper into managing state in a durable execution engine. For the example DE implementation developed for this blog post, I went with SQLite primarily for the sake of simplicity. Now, would you use SQLite, as an embedded database, also in an actual production-ready implementation? The answer is going to depend on your specific use case. If, for instance, you are building a self-contained AI agent and you want to use DE for making sure LLM invocations are not repeated when the agent crashes, an embedded database such as SQLite would make for a great store for persisting execution state. Each agent could have its own database, thus avoiding any concurrent writes, which can pose a bottleneck due to SQLite’s single-writer design.\nOn the other hand, if you’re building a system with a high number of parallel requests by different users, such as a typical microservice, a client/server database such as Postgres or MySQL would be a better fit. 
If that system already maintains state in a database (as most services do), then re-using that same database to store execution state provides a critical advantage: updates to the application’s data and its execution state can be committed atomically in a single database transaction. This solution is implemented by the DBOS engine, on top of Postgres, for instance.\nAnother category of DE engines, which includes systems such as Temporal and Restate, utilizes a separate server component with its own dedicated store for persisting execution state. This approach can be very useful to implement flows spanning multiple services (sometimes referred to as Sagas). By keeping track of the overall execution state in one central place, they essentially avoid the need for cross-system transactions.\nAnother advantage of this approach is that the actual application doesn’t have to keep running while waiting for delayed execution steps, making it a great fit for systems implemented in the form of scale-to-zero serverless designs (Function-as-a-Service, Knative, etc.). The downside of this centralized design is the potentially closer coupling of the participating services, as they all need to converge on a specific DE engine, on one specific version of that engine, etc. Also, HA and fault tolerance must be a priority in order to avoid the creation of a single point of failure between all the orchestrated services.\nWrapping Up At its heart, the idea of Durable Execution is not a complex one: potentially long-running workflows are organized into individual steps whose execution status and result are persisted in a durable form. That way, flows become resumable after failures, while skipping any steps already executed successfully. 
You could think of it as a persistent implementation of the memoization pattern, or a persistent form of continuations.\nAs demonstrated in this post and the accompanying source code, it doesn’t take too much work to create a functioning PoC for a DE engine. Of course, it’s still quite a way to go from there to a system you’d actually want to put into production. At the persistence level, you’d have to address aspects such as (horizontal) scalability, fault tolerance and HA. The engine should support things such as retrying failing steps with exponential back-off, parallel execution of workflow steps, throttling flow executions, compensation steps for implementing Sagas, and more. You’d also want to have a UI for managing flows, analyzing, restarting, and debugging them. Finally, you should also have a strategy for evolving flow definitions and the state they persist, in particular when dealing with long-running flows which may take days, weeks, or months to complete.\n","id":24,"publicationdate":"Nov 20, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_hello_persistasaurus\"\u003eHello Persistasaurus!\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_capturing_execution_state\"\u003eCapturing Execution State\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_delayed_executions\"\u003eDelayed Executions\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_human_interaction\"\u003eHuman Interaction\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_managing_state\"\u003eManaging State\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrapping_up\"\u003eWrapping Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLately, there has been a lot of excitement around Durable Execution (DE) 
engines.\nThe basic idea of DE is to take (potentially long-running) multi-step workflows,\nsuch as processing a purchase order or a user sign-up,\nand make their individual steps persistent.\nIf a flow gets interrupted while running, for instance due to a machine failure,\nthe DE engine can resume it from the last successfully executed step and drive it to completion.\u003c/p\u003e\n\u003c/div\u003e","tags":["sqlite","durable-execution","architecture"],"title":"Building a Durable Execution Engine With SQLite","uri":"https://www.morling.dev/blog/building-durable-execution-engine-with-sqlite/"},{"content":"","id":25,"publicationdate":"Nov 20, 2025","section":"tags","summary":"","tags":null,"title":"durable-execution","uri":"https://www.morling.dev/tags/durable-execution/"},{"content":"","id":26,"publicationdate":"Nov 20, 2025","section":"tags","summary":"","tags":null,"title":"sqlite","uri":"https://www.morling.dev/tags/sqlite/"},{"content":" Looking to make it to the front page of HackerNews? Then writing a post arguing that \u0026#34;Postgres is enough\u0026#34;, or why \u0026#34;you don’t need Kafka at your scale\u0026#34; is a pretty failsafe way of achieving exactly that. No matter how often it has been discussed before, this topic is always doing well. And sure, what’s not to love about that? I mean, it has it all: Postgres, everybody’s most favorite RDBMS—​check! Keeping things lean and easy—​sure, count me in! A somewhat spicy take—​bring it on!\nThe thing is, I feel all these articles kinda miss the point; Postgres and Kafka are tools designed for very different purposes, and naturally, which tool to use depends very much on the problem you actually want to solve. To me, the advice \u0026#34;You Don’t Need Kafka, Just Use Postgres\u0026#34; is doing more harm than good, leading to systems built in a less than ideal way, and I’d like to discuss why this is in more detail in this post. 
Before getting started though, let me get one thing out of the way really quick: this is not an anti-Postgres post. I enjoy working with Postgres as much as the next person (for those use cases it is meant for). I’ve used it in past jobs, and I’ve written about it on this blog before. No, this is a pro-\u0026#34;use the right tool for the job\u0026#34; post.\nSo what’s the argument of the \u0026#34;You Don’t Need Kafka, Just Use Postgres\u0026#34; posts? Typically, they argue that Kafka is hard to run or expensive to run, or a combination thereof. When you don’t have \u0026#34;big data\u0026#34;, this cost may not be justified. And if you already have Postgres as a database in your tech stack, why not keep using this, instead of adding yet another technology?\nUsually, these posts then go on to show how to use SELECT ... FOR UPDATE SKIP LOCKED for building a… job queue. Which is where things already start to make a bit less sense to me. The reason being that queuing just is not a typical use case for Kafka to begin with. It requires message-level consumer parallelism, as well as the ability to acknowledge individual messages, something Kafka historically has not supported. Now, the Kafka community actually is working towards queue support via KIP-932, but this is not quite ready for primetime yet (I took a look at that KIP earlier this year). Until then, the argument boils down to not use Kafka for something it has not been designed for in the first place. Hm, yeah, ok?\nThat being said, building a robust queue on top of Postgres is actually harder than it may sound. Long-running transactions by queue consumers can cause MVCC bloat and WAL pile-up; Postgres\u0026#39; vacuum process not being able to keep up with the rate of changes can quickly become a problem for this use case. So if you want to go down that path, make sure to run representative performance tests, for a sustained period of time. 
You won’t find out about issues like this by running two minute tests.\nSo let’s actually take a closer look at the \u0026#34;small scale\u0026#34; argument, as in \u0026#34;with such a low data volume, you just can use Postgres\u0026#34;. But to use it for what exactly? What is the problem you are trying to solve? After all, Postgres and Kafka are tools designed for addressing specific use cases. One is a database, the other is an event streaming platform. Without knowing and talking about what one actually wants to achieve, the conversation boils down to \u0026#34;I like this tool better than that\u0026#34; and is pretty meaningless.\nKafka enables a wide range of use cases such as microservices communication and data exchange, ingesting IoT sensor data, click streams, or metrics, log processing and aggregation, low-latency data pipelines between operational databases and data lakes/warehouses, and realtime stream processing, for instance for fraud detection and recommendation systems.\nSo if you have one of those use cases, but at a small scale (low volume of data), could you then use Postgres instead of Kafka? And if so, does it make sense? To answer this, you need to consider the capabilities and features you get from Kafka which make it such a good fit for these applications. And while scalability indeed is one of Kafka’s core characteristics, it has many other traits which make it very attractive for event streaming applications:\nLog semantics: At its core, Kafka is a persistent ordered event log. Records are not deleted after processing, instead they are subject to time-based retention policies or key-based compaction, or they could be retained indefinitely. Consumers can replay a topic from a given offset, or from the very beginning. If needed, consumers can work with exactly-once semantics. 
This goes way beyond simple queue semantics and replicating it on top of Postgres will be a substantial undertaking.\nFault tolerance and high availability (HA): Kafka workloads are scaled out in clusters running on multiple compute nodes. This is done for two reasons: increasing the throughput the system can handle (not relevant at small scale) and increasing reliability (very much relevant also at small scale). By replicating the data to multiple nodes, instance failures can be easily tolerated. Each node in the cluster can be a leader for a topic partition (i.e., receive writes), with another node taking over if the previous leader becomes unavailable. With Postgres in contrast, all writes go to a single node, while replicas only support read requests. A broker failover in Kafka will affect (in the form of increased latencies) only those partitions it is the leader for, whereas the failure of the Postgres primary node in a cluster is going to affect all writers. While Kafka broker failovers happen automatically, manual intervention is required in order to promote a Postgres replica to primary, or an external coordinator such as Patroni must be used. Alternatively, you might consider Postgres-compatible distributed databases such as CockroachDB, but then the conversation shifts quite a bit away from \u0026#34;Just use Postgres\u0026#34;.\nConsumer groups: One of the strengths of the Kafka protocol is its support for organizing consumers in groups. Multiple clients can distribute the load of reading the messages from a given topic, making sure that each message is processed by exactly one member of the group. Also when handling only a low volume of messages, this is very useful. For instance, consider a microservice which receives messages from another service. For the purposes of fault-tolerance, the service is scaled out to multiple instances. 
By configuring a Kafka consumer group for all the service instances, the incoming messages will be distributed amongst them. How would the same look when using Postgres? Considering the \u0026#34;small scale\u0026#34; scenario, you could decide that only one of the service instances should read all the messages. But which one do you select? What happens if that node fails? Some kind of leader election would be required. Ok, so let’s make each member of the application cluster consume from the topic then? For this you need to think about how to distribute the messages from the Postgres-based topic, how to handle client failures, etc. So your job now essentially is to re-implement Kafka’s consumer rebalance protocol. This is far from trivial and it certainly goes against the initial goal of keeping things simple.\nLow latency: Let’s talk about latency, i.e. the time it takes from sending a message to a topic until it gets processed by a consumer. Having a low data volume doesn’t necessarily imply that you do not want low latency. Think about fraud detection, for example. Also when processing only a handful of transactions per second, you want to be able to spot fraudulent patterns very quickly and take action accordingly. Or a data pipeline from your operational data store to a search index. For a good user experience, search results should be based on the latest data as much as possible. With Kafka, latencies in the milli-second range can be achieved for use cases like this. Trying to do the same with Postgres would be really tough, if possible at all. You don’t want to hammer your database with queries from a herd of poll-based queue clients too often, while LISTEN/NOTIFY is known to suffer from heavy lock contention problems.\nConnectors: One important aspect which is usually omitted from all the \u0026#34;Just use Postgres\u0026#34; posts is connectivity. 
When implementing data pipelines and ETL use cases, you need to get data out of your data source and put it into Kafka. From there, it needs to be propagated into all kinds of data sinks, with the same dataset oftentimes flowing into multiple sinks at once, such as a search index and a data lake. Via Kafka Connect, Kafka has a vast ecosystem of source and sink connectors, which can be combined, mix-and-match style. Taking data from MySQL into Iceberg? Easy. Going from Salesforce to Snowflake? Sure. There’s ready-made connectors for pretty much every data system under the sun. Now, what would this look like when using Postgres instead? There’s no connector ecosystem for Postgres like there is for Kafka. This makes sense, as Postgres never has been meant to be a data integration platform, but it means you’ll have to implement bespoke source and sink connectors for all the systems you want to integrate with.\nClients, schemas, developer experience: One last thing I want to address is the general programming model of a \u0026#34;Just use Postgres\u0026#34; event streaming solution. You might think of using SQL as the primary interface for producing and consuming messages. That sounds easy enough, but it’s also very low level. Building some sort of client will probably make sense. You may need consumer group support, as discussed above. You’ll need support for metrics and observability (\u0026#34;What’s my consumer lag?\u0026#34;). How do you actually go about converting your events into a persistent format? Some kind of serializer/deserializer infrastructure will be needed, and while at it, you probably should have support for schema management and evolution, too. What about DLQ support? With Kafka and its ecosystem, you get battle-proven clients and tooling, which will help you with all that, for all kinds of programming languages. 
You could rebuild all this, of course, but it would take a long time and essentially equate to recreating large parts of Kafka and its ecosystem.\nSo where does all that leave us? Should you use Postgres as a job queue then? I mean, why not, if it fits the bill for you, go for it. Don’t build it yourself though, use an existing extension like pgmq. And make sure to understand the potential implications on MVCC bloat and vacuuming discussed above.\nNow, when it comes to using Postgres instead of Kafka as an event streaming platform, this proposition just doesn’t make an awful lot of sense to me, no matter what the volume of the data is going to be. There’s so much more to event streaming than what’s typically discussed in the \u0026#34;Just use Postgres\u0026#34; posts; while you might be able to punt some of the challenges for some time, you’ll eventually find yourself in the business of rebuilding your own version of Kafka, on top of Postgres. But what’s the point of recreating and maintaining the work already done by hundreds of contributors in the course of many years? What starts as an effort to \u0026#34;keep things simple\u0026#34; actually creates a substantial amount of unnecessary complexity. Solving this challenge might sound like a lot of fun purely from an engineering perspective, but for most organizations out there, it’s probably just not the right problem they should focus on.\nAnother problem of the \u0026#34;small scale\u0026#34; argument is that what’s a low data volume today may be a much bigger volume next week. This is a trade-off, of course, but a common piece of advice is to build your systems for the current and the next order of magnitude of load: you should be able to sustain 10x of your current load and data volume as your business grows. This will be easily doable with Kafka which has been designed with scalability at its core, but it may be much harder for a queue implementation based on Postgres. 
It is single-writer as discussed above, so you’d have to look at scaling up, which becomes really expensive really quickly. So you might decide to migrate to Kafka eventually, which will be a substantial effort when thinking of migrating data, moving your applications from your home-grown clients to Kafka, etc.\nIn the end, it all comes down to choosing the right tool for the job. Use Postgres if you want to manage and query a relational data set. Use Kafka if you need to implement realtime event streaming use cases. Which means, yes, oftentimes, it actually makes sense to work with both tools as part of your overall solution: Postgres for managing a service’s internal state, and Kafka for exchanging data and events with other services. Rather than trying to emulate one with the other, use each one for its specific strengths. How to keep both Postgres and Kafka in sync in this scenario? Change data capture, and in particular the outbox pattern can help there. So if there is a place for \u0026#34;Postgres over Kafka\u0026#34;, it is actually here: for many cases it makes sense to write to Kafka not directly, but through your database, and then to emit events to Kafka via CDC, using tools such as Debezium. That way, both resources are (eventually) consistent, keeping things very simple from an application developer perspective.\nThis approach also has the benefit of decoupling (and protecting) your operational datastore from the potential impact of downstream event consumers. You probably don’t want to be at the risk of increased tail latencies of your operational REST API because there’s a data lake ingest process, perhaps owned by another team, which happens to reread an entire topic from a table in your service’s database at the wrong time. Adhering to the idea of the synchrony budget, it makes sense to separate the systems for addressing these different concerns.\nWhat about the operational overhead then? 
While this definitely warrants consideration, I believe that oftentimes that concern is overblown. Running Kafka for small data sets really isn’t that hard. With the move from ZooKeeper to KRaft mode, running a single Kafka instance is trivial for scenarios not requiring fault tolerance. Managed services make running Kafka a very uneventful experience (pun intended) and should be the first choice, in particular when setting out with low scale use cases. Cost will be manageable kinda by definition by virtue of having a low volume of data. Plus, the time and effort for solving all the issues with a custom implementation discussed above should be part of the TCO consideration to be useful.\nSo yes, if you want to make it to the front page of HackerNews, arguing that \u0026#34;Postgres is enough\u0026#34; may get you there; but if you actually want to solve your real-world problems in an effective and robust way, make sure to understand the sweet spots and limitations of your tools and use the right one for the job.\n","id":27,"publicationdate":"Nov 3, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLooking to make it to the front page of HackerNews? Then writing a post arguing that \u0026#34;Postgres is enough\u0026#34;, or why \u0026#34;you don’t need Kafka at your scale\u0026#34; is a pretty failsafe way of achieving exactly that. No matter how often it has been discussed before, this topic is always doing well. And sure, what’s not to love about that? I mean, it has it all: Postgres, everybody’s most favorite RDBMS—​check! Keeping things lean and easy—​sure, count me in! 
A somewhat spicy take—​bring it on!\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","postgres","architecture","streaming"],"title":"\"You Don't Need Kafka, Just Use Postgres\" Considered Harmful","uri":"https://www.morling.dev/blog/you-dont-need-kafka-just-use-postgres-considered-harmful/"},{"content":"","id":28,"publicationdate":"Nov 3, 2025","section":"tags","summary":"","tags":null,"title":"kafka","uri":"https://www.morling.dev/tags/kafka/"},{"content":"","id":29,"publicationdate":"Nov 3, 2025","section":"tags","summary":"","tags":null,"title":"postgres","uri":"https://www.morling.dev/tags/postgres/"},{"content":"","id":30,"publicationdate":"Sep 17, 2025","section":"tags","summary":"","tags":null,"title":"gc","uri":"https://www.morling.dev/tags/gc/"},{"content":" Table of Contents ZGC Allocation Stalls Summary In the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am exploring interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, relevant improvements to Java and the JVM, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. If you think there’s a specific subject I should take a look at, let me know in the comments below.\nJava 25 was released earlier this week, and it is the first Java release with long-term support (LTS) which ships with Generational ZGC as the one (and only) flavor of the ZGC garbage collector. ZGC itself is a relatively new concurrent collector, originally added in Java 11.\nThe high-level intuition on concurrent garbage collectors (another example being Shenandoah) is that they move as much of their work as possible from the application’s threads to separate GC threads. That way, they essentially do away with GC pauses, which used to plague Java users in the past in the form of high tail latencies of their applications. 
ZGC pushes GC times in application threads down to the sub-millisecond range, making GC pauses practically a non-issue for the vast majority of use cases. Of course, there is no free lunch: by running the GC logic in separate threads, concurrent collectors require more CPU resources, thus reducing the overall throughput of the system.\nUntil now, I hadn’t had the chance to gather hands-on experience with ZGC; hence, I set out to run some comparisons of ZGC and G1, which has been Java’s default garbage collector since version 9. Now, ZGC oftentimes is associated with large heaps of hundreds of gigabytes and beyond, but I was curious whether it would also be beneficial for a typical microservice deployment with just a few gigabytes. Furthermore, I was eager to learn about the performance characteristics using the default settings, i.e. I’m not too interested in fine-tuning specific garbage collectors. In practice, most folks don’t bother doing so for running their applications either. Hardly anyone has the time or interest to find optimal settings, which may be obsolete very soon anyways when details of the workload change, or a new Java version with changes to the GC behavior gets released. So arguably, in most cases the performance with default settings matters more than a theoretical peak performance achievable only with highly tuned settings.\nI started by benchmarking a sample microservice built using the Quarkus framework, returning some data from a Postgres database. Using Vegeta as a load generator, I created a moderate load of 1,000 requests per second. The test ran on a Hetzner CCX43 instance, using four of its 16 exclusive CPU cores and four GB of RAM. Here are the request latencies from running the test for two minutes with each collector, discarding the first 30 seconds of each run to exclude any warm-up effects. 
It’s not a super-scientific benchmark by any means, but good enough to show some interesting results:\nWhile latencies are practically identical up to the 99th percentile, the p999 and p9999 latencies show quite an advantage for ZGC. Let’s try and find out whether GC pauses indeed explain the difference. Examining the actual request latencies in the Vegeta plot shows that there are several significant outliers with G1:\nWhereas the runtimes look much more homogenous with ZGC:\nIn order to verify whether the G1 outliers actually were caused by GC pauses, I enabled JDK Flight Recorder while running the tests. And indeed there we can observe GC pauses of more than 20 ms at the respective offsets in the JFR recording:\nWith ZGC on the other hand, the longest GC pause time observed is ~50 microseconds:\nThat’s pretty neat: solely by using ZGC as the garbage collector, we could improve tail latencies of this example service substantially, without any sort of tuning. Note you may potentially get better results out of G1 too by playing with JVM options such as -XX:MaxGCPauseMillis, but the much lower tail latencies you get from ZGC with default settings are what make it very appealing. Results may look different for your specific workloads, but it’s definitely worth giving ZGC a try. Chances are you may see some really nice benefits, without a lot of effort.\nGarbage collections are not the only cause for JVM pauses. Other examples include the deoptimization of compiled methods and the creation of heap dumps. These, and other operations, require all threads to come to a JVM safepoint, which may take some time. This post by Zac Blanco discusses potential causes for JVM pauses and ways to analyse them.\nNow, the test above didn’t put an awful lot of pressure on the garbage collectors to begin with (~17 MB/sec), and the takeaway should by no means be that ZGC always is a superior choice. 
In particular if there is a high CPU load on the system, ZGC’s more resource-intensive approach of cleaning up garbage in concurrent threads may actually yield higher request times than G1.\nZGC Allocation Stalls To see when and how that can be the case, let’s turn to another example. This is a synthetic benchmark which allocates large amounts of objects in the form of List\u0026lt;Long\u0026gt; with random numbers in a loop. The results are interesting:\nWhen allocating ~12 GB/sec (using 4 cores of the test system), the picture is similar to the one above: up to p99, G1 and ZGC are on par, whereas the p999 and p9999 latencies are significantly lower with ZGC. In contrast, when allocating ~30 GB/sec (using all the 16 cores of the test system), latencies are generally lower with G1 than with ZGC.\nAs above, a JFR recording can help to identify the cause. Looking at GC pause times is going to be misleading though: the longest pause time of ZGC still is in the microseconds range. So what is going on? Running on all the system’s cores, the workload under test is CPU bound, not leaving enough CPU resources for the concurrent GC threads of ZGC. This means that the collector can’t free up memory fast enough in order to keep up with the application allocating new objects at such a high rate. In that situation, ZGC will stall allocations until memory has become available again. Since Java 15, a dedicated JFR event, ZAllocationStall is logged in this case:\nSimilar to GC pause times, allocation stalls increase tail latencies of an application. They shouldn’t be equated to GC pauses though: unlike when using a non-concurrent collector which is causing pauses in application threads, a healthy application using ZGC should generally not encounter any allocation stalls at runtime. If it does, it is a sign that the workload doesn’t have enough CPU capacity at its disposal and you should either identify potential bottlenecks using a profiler, or provision more CPU resources. 
It’s a good idea to monitor allocation stalls via JFR event streaming and trigger an alert when they manifest.\nSummary ZGC is a very interesting addition to the JVM’s portfolio of garbage collectors. With all the heavy lifting moved to separate GC threads, large tail latencies due to GC pauses essentially are a thing of the past, making the Java platform a compelling choice also for workloads for which it historically may not have been considered.\nIf you haven’t looked at ZGC before, now may be a great time to do so: Java 25 is the first LTS release which includes Generational ZGC as the one and only form of this collector (Java 21 also shipped generational support for ZGC, but it had to be enabled via a bespoke JVM option), yielding significant improvements with regard to throughput and tail latencies over Java 17’s single generation ZGC implementation.\nGenerational garbage collectors organize the heap in multiple generations, taking advantage \u0026#34;of the weak-generational hypothesis, which posits that most objects become unreachable shortly after they are created\u0026#34;. Objects which have survived for some time after being created are moved to a heap area called the \u0026#34;old generation\u0026#34; which is scanned less frequently, thus making more efficient usage of CPU resources.\nIt’s important to keep in mind though that there is no single best garbage collector for each and every situation. While you can get a very welcome improvement to tail latencies by moving to ZGC, there’s a price to pay for this in the form of a lower overall throughput. In particular if your application is close to being CPU bound already, ZGC may not be the right choice. For many workloads though this can be mitigated by scaling out to multiple compute nodes organized in a cluster. You should do your own testing with your specific workload in your specific runtime environment to evaluate whether moving to ZGC is beneficial or not. 
Luckily that’s as simple as specifying -XX:+UseZGC when starting the JVM.\nYou can find the source code of the benchmarks used for this blog post here and here. If you’d like to learn more about ZGC and its concepts, the blog by OpenJDK developer Per Liden is a great starting point.\nWhen I find the time, I’d like to run some data streaming workloads using Apache Kafka and Flink on ZGC and share my findings in a follow-up to this post. If you have any experience and insight from running these systems on ZGC, I’d love to hear from you in the comments!\n","id":31,"publicationdate":"Sep 17, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_zgc_allocation_stalls\"\u003eZGC Allocation Stalls\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003eIn the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am exploring interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, relevant improvements to Java and the JVM, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. 
If you think there’s a specific subject I should take a look at, let me know in the comments below.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eJava 25 \u003ca href=\"https://www.oracle.com/news/announcement/oracle-releases-java-25-2025-09-16/\"\u003ewas released\u003c/a\u003e earlier this week,\nand it is the first Java release with long-term support (LTS) which ships with Generational ZGC as the one (and only) flavor of the ZGC garbage collector.\n\u003ca href=\"https://openjdk.org/jeps/333\"\u003eZGC\u003c/a\u003e itself is a relatively new concurrent collector, originally added in Java 11.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","performance","gc"],"title":"Let's Take a Look at... Lower Java Tail Latencies With ZGC","uri":"https://www.morling.dev/blog/lower-java-tail-latencies-with-zgc/"},{"content":" Table of Contents confirmed_flush_sn: Tracking Consumer Progress restart_lsn: Handling Concurrent Transactions Mid-Transaction Recovery Looking Forward: Streaming In-Progress Transactions Replication slots in Postgres keep track of how far consumers have read a replication stream. After a restart, consumers—​either Postgres read replicas or external tools for change data capture (CDC), like Debezium—resume reading from the last confirmed log sequence number (LSN) of their replication slot. The slot prevents the database from disposing of required log segments, allowing safe resumption after downtime.\nIn this post, we are going to take a look at why Postgres replication slots don’t have one but two LSN-related attributes: restart_lsn and confirmed_flush_lsn. Understanding the difference between the two is crucial for troubleshooting replication issues, optimizing WAL retention, and avoiding common pitfalls in production environments.\nTo examine the state of a replication slot, you can query the pg_replication_slots view. 
It also shows the restart_lsn and confirmed_flush_lsn of each slot:\n1 2 3 4 5 6 7 8 9 10 SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots; +----------------+---------------+-------------+---------------------+ | slot_name | plugin | restart_lsn | confirmed_flush_lsn | |----------------+---------------+-------------+---------------------| | logical_slot_4 | pgoutput | 0/1D4A478 | 0/1D52850 | | demo_slot | test_decoding | 0/1DDC4D0 | 0/1DDC4D0 | +----------------+---------------+-------------+---------------------+ So what’s the difference between the two? Shouldn’t a single LSN be enough for tracking the progress a consumer has made?\nconfirmed_flush_sn: Tracking Consumer Progress In order to understand why Postgres manages these two LSN separately, let’s take a look at what the Postgres docs have to say, starting with confirmed_flush_lsn:\nconfirmed_flush_lsn: The address (LSN) up to which the logical slot’s consumer has confirmed receiving data. Data corresponding to the transactions committed before this LSN is not available anymore. NULL for physical slots.\nSo this is the latest LSN acknowledged by the slot’s consumer. In general, streaming will be continued from the next entry in the write-ahead log (WAL) after this LSN when restarting a consumer (we’ll get to one exception later on). 
Let’s confirm this by creating a slot using the test_decoding plug-in:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 -- Creating a table for experimenting CREATE TABLE inventory.customers ( id SERIAL NOT NULL PRIMARY KEY, first_name VARCHAR(255) NOT NULL, last_name VARCHAR(255) NOT NULL, email VARCHAR(255) NOT NULL UNIQUE, is_test_account BOOLEAN NOT NULL ); ALTER SEQUENCE inventory.customers_id_seq RESTART WITH 1001; ALTER TABLE inventory.customers REPLICA IDENTITY FULL; -- Creating a replication slot SELECT * FROM pg_create_logical_replication_slot(\u0026#39;demo_slot\u0026#39;, \u0026#39;test_decoding\u0026#39;); The test_decoding logical decoding plug-in is great for some quick testing and experimenting. For production use cases, the pgoutput plug-in should be used. You can learn more about this and other best practices for managing Postgres replication slots in this blog post I wrote recently.\nInsert some data:\n1 2 3 4 5 6 7 8 9 INSERT INTO inventory.customers (first_name, last_name, email, is_test_account) SELECT md5(random()::text), md5(random()::text), md5(random()::text), false FROM generate_series(1, 3) g; And consume it using Postgres\u0026#39; built-in SQL interface for working with logical replication streams (output shortened for readability):\n1 2 3 4 5 6 7 8 9 10 11 SELECT * FROM pg_logical_slot_get_changes(\u0026#39;demo_slot\u0026#39;, NULL, NULL); +-----------+-----+-----------------------------------------------+ | lsn | xid | data | |-----------+-----+-----------------------------------------------| | 0/1DDC648 | 765 | BEGIN 765 | | 0/1DDC6B0 | 765 | table customers: INSERT: id[integer]:1138 ... | | 0/1DDF4D0 | 765 | table customers: INSERT: id[integer]:1139 ... | | 0/1DDF610 | 765 | table customers: INSERT: id[integer]:1140 ... 
| | 0/1DDF780 | 765 | COMMIT 765 | +-----------+-----+-----------------------------------------------+ SELECT 5 The slot’s confirmed_flush_lsn is the last consumed LSN, 0/1DDF780, automatically acknowledged by pg_logical_slot_get_changes():\n1 2 3 4 5 6 7 8 9 SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots; +----------------+---------------+-------------+---------------------+ | slot_name | plugin | restart_lsn | confirmed_flush_lsn | |----------------+---------------+-------------+---------------------| | demo_slot | test_decoding | 0/1DDC610 | 0/1DDF780 | +----------------+---------------+-------------+---------------------+ If we were to do more data changes and consume the slot again, we’d receive any events after LSN 0/1DDF780.\nrestart_lsn: Handling Concurrent Transactions So far, so good; what’s the deal with restart_lsn then? Unfortunately, the official docs are a bit vague on this one:\nrestart_lsn: The address (LSN) of oldest WAL which still might be required by the consumer of this slot and thus won’t be automatically removed during checkpoints unless this LSN gets behind more than max_slot_wal_keep_size from the current LSN. NULL if the LSN of this slot has never been reserved.\nWhen would a consumer need access to events older than its last confirmed LSN? This becomes clear when we examine how the WAL captures multiple transactions running concurrently, and how these are streamed to logical replication consumers. By default, the events of a transaction will only be published after the transaction has been committed. Suppose there are two transactions A and B running at the same time, with transaction A beginning before transaction B, but B committing before A. The WAL might look like this (LSNs shortened for readability):\nWhen transaction B commits, all its changes get published to the consumer of the replication slot, which eventually will confirm the latest LSN it has seen, 0/A6 (the LSN of the commit event). 
This does not mean though that the database can prune all earlier WAL sections just yet. At this point, transaction A still is running, so any WAL sections with the changes from this transaction still need to be retained until the transaction commits and the change events have been received and acknowledged by the replication slot consumer.\nAnd this is exactly the purpose of the replication slot’s restart_lsn: it is the latest LSN prior to all transactions which either are still in-flight, or which have committed but not been acknowledged by the consumer yet, in the example 0/A0. It acts as a retention boundary—​WAL segments before this point can be safely discarded.\nThis way of handling concurrent transactions has a few important implications:\nConsumers of logical replication cannot rely on the LSNs of received events to be strictly increasing. As transactions are exposed in commit order, events with a lower LSN can be published after events with a higher LSN. Only the tuple (commit_lsn, lsn) is guaranteed to be strictly increasing, i.e. commit LSNs are non-decreasing, and the LSNs of the events within one and the same transaction are non-decreasing.\nLarge or long-running transactions prevent the database from increasing the restart LSN of replication slots and hence may cause excessive amounts of WAL to be retained; therefore, you should generally avoid these types of transactions when possible\nYou also might wonder how the logical replication engine identifies the events to publish when encountering a COMMIT event in the WAL. A data structure called the \u0026#34;reorder buffer\u0026#34; is used for this purpose. It stores all events retrieved from the WAL, keyed by transaction id. Upon processing a transaction’s commit event, all events for the transaction are fetched from the buffer and emitted to the consumer. 
That way, no costly seeking in the WAL is required.\nThe buffer can spill over to disk for large transactions when reaching a given threshold, defaulting to 64 MB and configurable via the logical_decoding_work_mem setting. As this means additional disk I/O though, you should keep an eye on the amount of disk spill, using the pg_stat_replication_slots view.\nMid-Transaction Recovery Above, I mentioned there’d be one situation where a consumer may receive events from before the confirmed_flush_lsn of its replication slot when resuming to process a replication stream after a downtime. This happens when confirmed_flush_lsn points to an event in the middle of a transaction, rather than to a COMMIT event. In this case, all events of the entire transaction will be replayed to the consumer, starting with a BEGIN event.\nLet’s try to reproduce this situation. pg_logical_slot_get_changes() always returns all the events of a transaction, also when instructed to fetch a lower number of events. So we’ll have to be a bit more creative. 
First, let’s retrieve the current LSN and then insert a couple of rows into the customers table in a transaction:\nSELECT pg_current_wal_lsn(); +--------------------+ | pg_current_wal_lsn | |--------------------| | 2/50955BD8 | +--------------------+ BEGIN; INSERT INTO inventory.customers (first_name, last_name, email, is_test_account) SELECT md5(random()::text), md5(random()::text), md5(random()::text), false FROM generate_series(1, 3) g; COMMIT; To find out the LSN of one of the individual row inserts, we can use the pg_walinspect extension; it provides the pg_get_wal_records_info() function which lets you take a look at the WAL records of a given LSN range (as an aside, this shows that there is no explicit event for the beginning of a transaction in the WAL; the BEGIN events in a replication stream are inserted by the logical replication system):\nSELECT start_lsn, end_lsn, xid, record_type FROM pg_get_wal_records_info( \u0026#39;2/50955BD8\u0026#39;, pg_current_wal_lsn() ); +------------+------------+-----+---------------+ | start_lsn | end_lsn | xid | record_type | |------------+------------+-----+---------------| | 2/50955BD8 | 2/50955C40 | 777 | LOG | | 2/50955C40 | 2/50956E18 | 777 | INSERT | | 2/50956E18 | 2/50956E90 | 777 | INSERT_LEAF | | 2/50956E90 | 2/50956F90 | 777 | INSERT_LEAF | | 2/50956F90 | 2/50957030 | 777 | INSERT | | 2/50957030 | 2/50957070 | 777 | INSERT_LEAF | | 2/50957070 | 2/50957170 | 777 | INSERT_LEAF | | 2/50957170 | 2/50957210 | 777 | INSERT | | 2/50957210 | 2/50957250 | 777 | INSERT_LEAF | | 2/50957250 | 2/50957350 | 777 | INSERT_LEAF | | 2/50957350 | 2/50957388 | 0 | RUNNING_XACTS | | 2/50957388 | 2/509573B8 | 777 | COMMIT | +------------+------------+-----+---------------+ Next, move the replication slot forward to the LSN of the second INSERT:\nSELECT pg_replication_slot_advance(\u0026#39;demo_slot\u0026#39;, 
\u0026#39;2/50956F90\u0026#39;); If you now retrieve the changes from the slot, you’ll see that it still returns all the events from that transaction, including the first INSERT, despite this one having an LSN older than confirmed_flush_lsn:\n1 2 3 4 5 6 7 8 9 10 SELECT * FROM pg_logical_slot_get_changes(\u0026#39;demo_slot\u0026#39;, NULL, NULL); +------------+-----+---------------------------------------------------+ | lsn | xid | data | |------------+-----+---------------------------------------------------| | 2/50955BD8 | 777 | BEGIN 777 | | 2/50955C40 | 777 | table customers: INSERT: id[integer]:10001159 ... | | 2/50956F90 | 777 | table customers: INSERT: id[integer]:10001160 ... | | 2/50957170 | 777 | table customers: INSERT: id[integer]:10001161 ... | | 2/509573B8 | 777 | COMMIT 777 | +------------+-----+---------------------------------------------------+ It is therefore generally advisable to confirm commit LSNs, as it allows the database to discard all the WAL elements for that transaction. When using Debezium, you can set the connector option provide.transaction.metadata to true in order to achieve that. Otherwise, Debezium would only acknowledge the LSN of the last event within a transaction. This is due to the constraints of the Kafka Connect framework, which only triggers a commit of source offsets when emitting records to Kafka.\nLooking Forward: Streaming In-Progress Transactions One last thing worth mentioning is that since version 14, Postgres also supports logical replication of in-progress transactions. This can be an interesting option to mitigate the issue of replication slots retaining a lot of WAL for large transactions, and it also can help to reduce end-to-end latencies as CDC tools can process change events (format them, filter them, etc.) 
before a transaction commits.\nOn the other hand, it also shifts quite a bit of complexity into the CDC layer, which now—​similar to Postgres\u0026#39; internal reorder buffer—​requires a way to store all the events of a transaction, so as to drop the events of transactions which eventually get rolled back. Debezium tracks this feature under the issue DBZ-9309.\n","id":32,"publicationdate":"Aug 5, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_confirmed_flush_sn_tracking_consumer_progress\"\u003econfirmed_flush_sn: Tracking Consumer Progress\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_restart_lsn_handling_concurrent_transactions\"\u003erestart_lsn: Handling Concurrent Transactions\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_mid_transaction_recovery\"\u003eMid-Transaction Recovery\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_looking_forward_streaming_in_progress_transactions\"\u003eLooking Forward: Streaming In-Progress Transactions\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eReplication slots in Postgres keep track of how far consumers have read a replication stream.\nAfter a restart, consumers—​either Postgres read replicas or external tools for change data capture (CDC), like \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e—resume reading from the last confirmed log sequence number (LSN) of their replication slot. 
The slot prevents the database from disposing of required log segments, allowing safe resumption after downtime.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this post, we are going to take a look at why Postgres replication slots don’t have one but two LSN-related attributes: \u003ccode\u003erestart_lsn\u003c/code\u003e and \u003ccode\u003econfirmed_flush_lsn\u003c/code\u003e.\nUnderstanding the difference between the two is crucial for troubleshooting replication issues, optimizing WAL retention, and avoiding common pitfalls in production environments.\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","cdc","replication"],"title":"Postgres Replication Slots: Confirmed Flush LSN vs. Restart LSN","uri":"https://www.morling.dev/blog/postgres-replication-slots-confirmed-flush-lsn-vs-restart-lsn/"},{"content":"","id":33,"publicationdate":"Aug 5, 2025","section":"tags","summary":"","tags":null,"title":"replication","uri":"https://www.morling.dev/tags/replication/"},{"content":"","id":34,"publicationdate":"Jul 17, 2025","section":"tags","summary":"","tags":null,"title":"concurrency","uri":"https://www.morling.dev/tags/concurrency/"},{"content":" This post explores how virtual threads in Java 21+ provide an elegant solution for converting legacy Future objects into CompletableFuture instances.\nSince Java 8, the CompletableFuture API provides a convenient way for performing asynchronous operations in a functional, composable way. 
This makes it very simple to call some long-running methods—​for instance involving external I/O—​asynchronously and process each result as soon as it is available, without blocking on any threads:\n1 2 3 CompletableFuture.supplyAsync(() -\u0026gt; getCustomerFromDb(\u0026#34;Bob\u0026#34;)) .thenApply(c -\u0026gt; getOrdersForCustomer(c.id())) .thenAccept(o -\u0026gt; System.out.println(\u0026#34;Bob\u0026#39;s orders: \u0026#34; + o)); Unfortunately, many Java platform APIs, for instance ExecutorService, as well as 3rd party libraries, still don’t expose CompletableFuture, but the legacy Future type from Java 5.0 times. While both types represent the result of an asynchronous computation, CompletableFuture provides several advantages. Not only is it composable, it also is push-based, i.e. it notifies you when the computation result is available.\nFuture, on the other hand, only supports pull-style access: To retrieve the result value, you need to call the get() method, which will block the current thread until the result is available, thus somewhat defeating the purpose of using an asynchronous processing model to begin with. To mitigate the situation, you can check whether the Future has been completed by calling the (non-blocking) isDone() method, and do something else until the result finally is there.\nThis approach is neither very elegant nor efficient, which raises the question whether a Future can be converted into a CompletableFuture. A first attempt could look like so:\n1 2 3 4 5 6 7 8 9 public \u0026lt;T\u0026gt; CompletableFuture\u0026lt;T\u0026gt; toCompletableFuture(Future\u0026lt;T\u0026gt; future) { return CompletableFuture.supplyAsync(() -\u0026gt; { try { return future.get(); } catch (InterruptedException | ExecutionException e) { throw new CompletionException(e); } }); } This lets you integrate a Future, for instance returned by some legacy library, into a CompletableFuture-based processing pipeline. 
However, closer inspection reveals that this only shifts the problem from one place to another: the call to Future::get() blocks the thread running the lambda expression passed to supplyAsync(). By default, when not specifying a particular executor, this will be a thread of the common fork-join pool. Since this pool is shared globally across the application, blocking threads in it is undesirable.\nAlternatively, you could envision a solution based on the aforementioned isDone() method: you could use another thread or a timer which regularly checks the future for completion, and once that’s the case, the CompletableFuture gets completed with the value obtained from get(). While this avoids any thread blocking, it either adds CPU overhead (when calling isDone() with a very high frequency) or adds latency (when checking less frequently).\nNow, taking a step back, let’s rethink the first solution. Is blocking a thread actually always bad? In fact, it is not, thanks to virtual threads, as available since Java 21. Virtual threads are cheap; you can have hundreds of thousands, or even millions of them. When a virtual thread blocks, it will be unmounted from the underlying operating system thread which is running it (\u0026#34;carrier thread\u0026#34;), thus freeing it for running other virtual threads. Only once the virtual thread gets unblocked will it be mounted to a carrier again.\nIn certain situations, a virtual thread actually can block its carrier, a situation known as \u0026#34;pinning\u0026#34;. Most notably, on Java 21 this would happen when calling a blocking operation from within a synchronized block. Starting with Java 24, this will not cause pinning any longer. The use case discussed in this post is unaffected by pinning.\nHow could we use virtual threads then to convert a Future into a CompletableFuture? One option would be to pass an executor backed by virtual threads when calling CompletableFuture::supplyAsync() in the solution above. 
Or, we could just start a virtual thread ourselves and manually complete a CompletableFuture object with the result of the original Future:\n1 2 3 4 5 6 7 8 9 10 11 12 13 public \u0026lt;T\u0026gt; CompletableFuture\u0026lt;T\u0026gt; toCompletableFuture(Future\u0026lt;T\u0026gt; future) { CompletableFuture\u0026lt;T\u0026gt; completable = new CompletableFuture\u0026lt;T\u0026gt;(); Thread.ofVirtual().start(() -\u0026gt; { try { completable.complete(future.get()); } catch (InterruptedException | ExecutionException e) { completable.completeExceptionally(e); } }); return completable; } Virtual threads provide an elegant solution to a long-standing integration challenge. Thanks to their lightweight nature, you can seamlessly bridge the gap between legacy Future-based APIs and modern CompletableFuture composition patterns, without the traditional trade-offs of thread blocking or polling overhead.\n","id":35,"publicationdate":"Jul 17, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003eThis post explores how virtual threads in Java 21+ provide an elegant solution for converting legacy \u003ccode\u003eFuture\u003c/code\u003e objects into \u003ccode\u003eCompletableFuture\u003c/code\u003e instances.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSince Java 8, the \u003ca href=\"https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/concurrent/CompletableFuture.html\"\u003e\u003ccode\u003eCompletableFuture\u003c/code\u003e\u003c/a\u003e API provides a convenient way for performing asynchronous operations in a functional, composable way.\nThis makes it very simple to call some long-running methods—​for instance involving external I/O—​asynchronously and process each result as soon as it is available, without blocking on any threads:\u003c/p\u003e\n\u003c/div\u003e","tags":["java","concurrency","virtual-threads"],"title":"Converting Future to CompletableFuture With Java Virtual 
Threads","uri":"https://www.morling.dev/blog/future-to-completablefuture-with-java-virtual-threads/"},{"content":"","id":36,"publicationdate":"Jul 17, 2025","section":"tags","summary":"","tags":null,"title":"virtual-threads","uri":"https://www.morling.dev/tags/virtual-threads/"},{"content":"","id":37,"publicationdate":"Jul 8, 2025","section":"tags","summary":"","tags":null,"title":"debezium","uri":"https://www.morling.dev/tags/debezium/"},{"content":" Table of Contents Use the pgoutput Logical Decoding Output Plug-in Define a Maximum Replication Slot Size Enable Heartbeats Use Table-level Publications Use Column and Row Filters Enable Fail-Over Slots Consider Using Replica Identity FULL Monitor, Monitor, Monitor! Drop Unused Replication Slots Summary Over the last couple of years, I’ve helped dozens of users and organizations to build Change Data Capture (CDC) pipelines for their Postgres databases. A key concern in that process is setting up and managing replication slots, which are Postgres\u0026#39; mechanism for making sure that any segments of the write-ahead log (WAL) of the database are kept around until they have been processed by registered replication consumers.\nWhen not being careful, a replication slot may cause unduly large amounts of WAL segments to be retained by the database. This post describes best practices helping to prevent this and other issues, discussing aspects like heartbeats, replication slot failover, monitoring, the management of Postgres publications, and more. While this is primarily based on my experience of using replication slots via Debezium’s Postgres connector, the principles are generally applicable and are worth considering also when using other CDC tools for Postgres based on logical replication.\nUse the pgoutput Logical Decoding Output Plug-in Postgres uses logical decoding output plug-ins for serializing the data sent to logical replication clients. 
When creating a replication slot, you need to specify which plug-in to use. While several options exist, I’d recommend using the pgoutput plug-in, which is the standard decoding plug-in also used for logical replication between Postgres servers. It has a couple of advantages:\npgoutput is available out-of-the-box with Postgres 10+ (including AWS, GCP, and Azure managed services), requiring no additional installation\nCompared to other plug-ins using JSON as a serialization format, pgoutput uses the efficient binary Postgres replication message format\nIt provides fine-grained control over the replicated tables, columns, and rows (see further down for more information)\nWhen using CDC tools like Debezium, they’ll typically create the replication slot automatically. In order to manually create a slot using the pgoutput plug-in, call the pg_create_logical_replication_slot() function like so:\nSELECT * FROM pg_create_logical_replication_slot(\u0026#39;my_slot\u0026#39;, \u0026#39;pgoutput\u0026#39;); +-----------+-----------+ | slot_name | lsn | |-----------+-----------| | my_slot | 0/15AF761 | +-----------+-----------+ The returned LSN (log sequence number) is a 64-bit pointer uniquely identifying a location in the WAL, with the first part identifying the WAL segment and the second part identifying the offset within that segment. A consumer subscribing to this slot will receive change events starting from this LSN.\nUnlike text-based decoding plug-ins such as test_decoding, pgoutput emits a binary format, which means that you cannot inspect the messages for a replication slot using Postgres functions such as pg_logical_slot_peek_changes().
To take a quick look at the messages produced by pgoutput on the command line, without running a full CDC solution such as Debezium, you can use my tool pgoutput-cli:\n1 2 3 4 5 6 docker run -it --rm --network my-network gunnarmorling/pgoutput-cli \\ pgoutput-cli --host=postgres --port=5432 \\ --database=inventorydb --user=dbz_user \\ --password=kusnyf-maczuz-7qabnA \\ --publication=dbz_publication --slot=my_slot 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 { \u0026#34;op\u0026#34;: \u0026#34;I\u0026#34;, \u0026#34;message_id\u0026#34;: \u0026#34;b14952a5-4033-4252-a33c-6039f700e9db\u0026#34;, \u0026#34;lsn\u0026#34;: 26691112, \u0026#34;transaction\u0026#34;: { \u0026#34;tx_id\u0026#34;: 755, \u0026#34;begin_lsn\u0026#34;: 26691432, \u0026#34;commit_ts\u0026#34;: \u0026#34;2025-07-02T12:09:05.854104Z\u0026#34; }, \u0026#34;table_schema\u0026#34;: { \u0026#34;column_definitions\u0026#34;: [ { \u0026#34;name\u0026#34;: \u0026#34;id\u0026#34;, \u0026#34;part_of_pkey\u0026#34;: true, \u0026#34;type_id\u0026#34;: 23, \u0026#34;type_name\u0026#34;: \u0026#34;integer\u0026#34;, \u0026#34;optional\u0026#34;: false }, ... 
], \u0026#34;db\u0026#34;: \u0026#34;inventorydb\u0026#34;, \u0026#34;schema_name\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;table\u0026#34;: \u0026#34;customers\u0026#34;, \u0026#34;relation_id\u0026#34;: 16391 }, \u0026#34;before\u0026#34;: null, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 1028, \u0026#34;first_name\u0026#34;: \u0026#34;Sarah\u0026#34;, \u0026#34;last_name\u0026#34;: \u0026#34;O\u0026#39;Brian\u0026#34;, \u0026#34;email\u0026#34;: \u0026#34;sarah@example.com\u0026#34;, \u0026#34;is_test_account\u0026#34;: \u0026#34;f\u0026#34; } } Define a Maximum Replication Slot Size Replication protocols like Postgres\u0026#39; need to balance keeping transaction logs around long enough so that consumers can catch up after a downtime—​for instance during an upgrade, and making sure transaction logs don’t consume an unreasonable amount of disk space on the database machine.\nDifferent systems make different trade-offs in this area. As an example, MySQL allows you to configure a maximum retention time for its binlog; consumers are responsible to process all events from the binlog in time. In contrast, Postgres replication slots require active acknowledgement by the consumer in order to release processed WAL segments. The database will hold onto all the WAL segments for as long as needed for all replication consumers to process the data. Historically, this meant that a replication slot could cause an unlimited amount of WAL backlog if a consumer stopped processing that slot, potentially exhausting the disk space of the database machine when not taking action.\nFortunately, this situation changed with Postgres version 13, which allows you to limit the maximum WAL size a replication slot can hold on to. 
To do so, specify the max_slot_wal_keep_size parameter in your postgresql.conf file:\n1 max_slot_wal_keep_size=50GB If the difference between a slot’s restart LSN and the current LSN is larger than this limit, the database will invalidate the replication slot and drop older WAL segments. This renders the slot unusable, which means a new slot needs to be created, typically also taking a fresh initial snapshot of the data when using Debezium. While that’s inconvenient for event consumers, it’s definitely preferable to your operational database running out of disk space eventually.\nEnable Heartbeats A situation which is particularly prone to accidental WAL growth is the combination of multiple logical databases with different traffic patterns on one Postgres host. This is because there is one shared WAL for the entire instance, whereas replication slots are scoped to individual databases. Now, imagine a situation where there are many transactions run against one database—​thus adding many entries to the WAL—​while there’s another database which is idle. A replication slot for that second database can’t make any progress, as it never receives any change events, and therefore will cause more and more WAL segments to be retained.\nA solution to this is to produce some \u0026#34;fake\u0026#34; traffic in that second database, allowing its replication slot to progress. Historically, a dedicated heartbeat table was often used for this purpose. However, Postgres 14+ offers an elegant table-less alternative: via the function pg_logical_emit_message(), arbitrary contents can be written to the WAL, without them manifesting in any table. There’s a number of interesting applications for these logical decoding messages, including advancing replication slots in low traffic databases. To enable heartbeat messages with Debezium, add the following to your connector configuration:\n1 2 3 4 5 6 { ... 
\u0026#34;heartbeat.interval.ms\u0026#34; : \u0026#34;60000\u0026#34;, \u0026#34;heartbeat.action.query\u0026#34; : \u0026#34;SELECT pg_logical_emit_message(false, \u0026#39;heartbeat\u0026#39;, now()::varchar)\u0026#34;, ... } The connector executes this query every 60 seconds, writing a logical decoding message with the current timestamp to the WAL. Subsequently, it will retrieve the message via logical replication and thus allow the slot to advance. Note that the EXECUTE permission for this function must have been granted to the Debezium database user:\n1 2 GRANT EXECUTE ON FUNCTION pg_logical_emit_message(transactional boolean, prefix text, content text) TO \u0026lt;debezium_user\u0026gt;; Use Table-level Publications If you are using the pgoutput logical decoding plug-in, you have fine-grained control over the contents of the replication stream. If you are interested only in changes to ten tables out of 100 tables in your database, streaming changes for exactly those ten tables not only helps with saving resources (CPU, network I/O) on the database side, but it also can drastically reduce egress cost, when streaming change events into another availability zone of your cloud provider.\nThe pgoutput plug-in relies on Postgres publications for defining which kinds of changes should be published via logical replication. To create a table-level publication, specify all the tables for which change events should be published:\n1 CREATE PUBLICATION mypublication FOR TABLE customers, purchase_orders; If your database has multiple schemas but you only want to capture changes to the tables in a given schema, you can do so by creating publication as follows:\n1 CREATE PUBLICATION mypublication FOR TABLES IN SCHEMA inventory; When using Debezium, it can create publications for you automatically. By default, it will create a publication FOR ALL TABLES. 
However, this requires superuser permissions and it may unnecessarily stream events of tables which you are going to filter out in the connector anyways.\nAlternatively, you can have Debezium create table-level publications by setting the publication.autocreate.mode connector option to filtered. Debezium will then create a publication reflecting the set of captured tables as defined via the connector’s table and schema include/exclude filters. Note that this requires ownership permissions to all affected tables for the connector user.\nTo follow the principle of least privilege, you should therefore consider creating a publication for the connector by yourself, thus minimizing the set of permissions you need to grant to the connector user. By default, the publication is expected to be named \u0026#34;dbz_publication\u0026#34;, but you can override the name via the publication.name connector property. When setting up multiple connectors for capturing distinct sets of tables in the same database, a dedicated publication needs to be created for each connector.\nUse Column and Row Filters As of Postgres 15 and beyond, publications let you further trim down the contents of a replication stream. Via column lists, you can specify which column(s) of a table should be published. This can be very useful to exclude large columns, for instance binary data, which isn’t required for a given use case. Unfortunately, there’s no way to exclude a given column; instead, the names of all columns to be captured need to be specified when creating the publication:\n1 2 CREATE PUBLICATION mypublication FOR TABLE customers (id, first_name, last_name); If this publication is used via Debezium, make sure that the connector’s column list (as specified via the column.include.list and column.exclude.list connector options), matches the column list of the publication.\nColumn lists represent a form of projection, i.e. they are akin to the SELECT clause of a SQL query. 
In addition, publications also provide control over which rows to include in a replication stream via row filters. This corresponds to the WHERE clause of a query, and it is looking very similar to that when creating a publication:\n1 2 CREATE PUBLICATION mypublication FOR TABLE customers WHERE (is_test_account IS FALSE); Row filters can come in very handy to exclude portions of the operational data set from replication, for example test data or logically deleted data. You can learn more about row filters in this post I wrote after Postgres 15 was released.\nWhen using the snapshotting feature of Debezium—​which retrieves rows not via logical replication but by scanning the actual tables in the database—​you should specify the same filter expression via the snapshot.select.statement.overrides option in order to ensure consistency between snapshotting and streaming events.\nEnable Fail-Over Slots A long-standing shortcoming of logical replication in Postgres used to be the lack of fail-over support. Until relatively recently, replication slots could only be created on primary instances. If you had set up a Postgres cluster comprising a primary server and a read replica, logical replication couldn’t resume from the replica after promoting it to primary in case of a failure. Instead, you’d typically have to create a new replication slot, which also meant starting with a new initial snapshot if writes could occur on the new primary before creating a new replication slot.\nLuckily, over the last few Postgres versions, this issue finally got addressed. In Postgres 16, support for creating replication slots on replicas was added. While not solving the failover problem directly, this is a substantial improvement, as it allows you to have slots on primary and standby servers and manually keep them in sync. 
To do so, you need to track the progress of the primary slot and move the slot on the stand-by forward accordingly with the help of the pg_replication_slot_advance() function. I wrote about this topic in this post a while ago.\nPostgres 17 finally added full support for failover slots. It now can automatically sync the status of a replication slot on a standby server with a slot on the primary, without requiring any manual intervention whatsoever. After failover, consumers can continue to read from the slot on the newly promoted primary, without missing any events, or facing a large amount of duplicate events (some duplicates are to be expected in case the consumer has fetched events from the slot on the primary and a failover happens before the slot state could be updated accordingly on the replica server). To enable failover slots, a bit of configuration is required.\nOn the primary:\nPass failover=true when calling pg_create_logical_replication_slot() for creating the replication slot on the primary; With Debezium 3.0.5 or newer, you can have Debezium create a failover slot by setting the slot.failover connector option to true\nSet the option synchronized_standby_slots to the name of the physical slot connecting primary and standby server; this ensures that no logical replication slot can advance beyond the latest LSN synchronized from the primary to the replica\nAnd on the stand-by server:\nSet the option sync_replication_slots to on; this will start a worker process which automatically synchronizes the state of any logical replication slots from the primary server to the stand-by server; alternatively, you can call the function pg_sync_replication_slots() manually for synchronizing the slot state\nAdd the slot’s database name to the connection string used for connecting to the primary server (primary_conninfo), e.g. 
…dbname=inventorydb; If you are using Postgres on Amazon RDS, specify the database name instead using the option rds.logical_slot_sync_dbname\nSet the option hot_standby_feedback to true\nIf you connect to Postgres through a proxy, for instance pgbouncer, promoting a replica to primary can be made fully transparent to your replication consumers such as Debezium, which can then seamlessly continue to process change events after a failover. You can find a complete example for doing so in this blog post.\nConsider Using Replica Identity FULL In Postgres, a table’s replica identity determines which fields of a row will be written to the WAL for the old row image for update and delete events. By default, old values will only be recorded for the primary key columns. In addition, the value of any TOAST columns will only be contained in the new row image if their value changed.\nThese peculiarities can make change events somewhat difficult and complex to process for consumers. When performing incremental stream processing on a change event stream, the missing old row image (the before part of Debezium change events) requires a costly state materialization operation. Due to values for unchanged TOAST columns being absent from update events (represented by Debezium with a special value, __debezium_unavailable_value), consumers cannot apply such a change event with simple upsert semantics (I’ve discussed a potential solution for backfilling missing TOAST values via Apache Flink here).\nTo avoid these problems, consider changing the replica identity of your tables from DEFAULT to FULL:\nALTER TABLE inventory.customers REPLICA IDENTITY FULL; This will cause the complete old and new row image, including TOAST columns, to be written to the WAL and thus be available in data change events. Some Postgres DBAs are concerned about the potential impact on disk utilization and CPU consumption. However, the overhead is actually manageable in many cases.
The details depend on your specific workload, so you should do your own benchmarking to measure the exact impact. But as an example, this post mentions a moderate increase of peak CPU consumption from 30% to 35% when enabling replica identity FULL. This should be acceptable in many cases, and doing so can help substantially to simplify the consumption and processing of change event streams.\nMonitor, Monitor, Monitor! Deep observability is key for operating data systems successfully in production. When running Postgres, you should put monitoring and alerting for your replication slots in place to make sure that they’ll never consume unreasonably large amounts of WAL. The following metrics should be constantly tracked using observability tools such as Prometheus and Grafana, Datadog, Elastic, or similar:\nTotal WAL size\nRetained WAL size per replication slot\nRemaining WAL space per replication slot\nStatus (active/inactive/invalid) per replication slot\nTo obtain the total WAL size, you can sum up the sizes of all the files returned by the pg_ls_waldir() function. The slot specific metrics can be retrieved from the pg_replication_slots view, e.g. 
like so:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 SELECT slot_name, plugin, database, restart_lsn, CASE WHEN invalidation_reason IS NOT NULL THEN \u0026#39;invalid\u0026#39; ELSE CASE WHEN active IS TRUE THEN \u0026#39;active\u0026#39; ELSE \u0026#39;inactive\u0026#39; END END as \u0026#34;status\u0026#34;, pg_size_pretty( pg_wal_lsn_diff( pg_current_wal_lsn(), restart_lsn)) AS \u0026#34;retained_wal\u0026#34;, pg_size_pretty(safe_wal_size) AS \u0026#34;safe_wal_size\u0026#34; FROM pg_replication_slots ORDER BY slot_name; +----------------+----------+-------------+-------------+----------+--------------+---------------+ | slot_name | plugin | database | restart_lsn | status | retained_wal | safe_wal_size | |----------------+----------+-------------+-------------+----------+--------------+---------------| | logical_slot_1 | pgoutput | inventorydb | 0/1983A40 | inactive | 2386 MB | 48 GB | | logical_slot_2 | pgoutput | inventorydb | 0/96BFA970 | active | 3920 bytes | 50 GB | +----------------+----------+-------------+-------------+----------+--------------+---------------+ The retained WAL size can be calculated by determining the difference between the slot’s restart LSN (the earliest LSN it holds on to) and the current LSN of the database. The safe_wal_size field in the view represents the number of bytes which the slot can hold in addition until it hits the limit configured via max_slot_wal_keep_size (see above).\nAll these metrics can be obtained from a Postgres instance very easily using the postgres_exporter project, which exposes a Prometheus-compatible endpoint. In addition, it also makes sense to track the remaining free space of the disk or volume holding the WAL. Postgres itself doesn’t expose this value, instead you’ll have to obtain it from your operating system, job orchestrator (such as Kubernetes), or cloud provider (when running Postgres on a service such as Amazon RDS). 
Last but not least, it is recommended to monitor the MilliSecondsBehindSource metric which Debezium provides for each connector instance. It represents the time it takes from the point in time a change is made in the database until that event is being processed by Debezium. Debezium provides its metrics via JMX; via Prometheus\u0026#39; jmx_exporter component, they can be exposed via HTTP in a Prometheus-compatible format.\nAs a starting point for your own observability solution for Postgres logical replication slot, you can find a Grafana dashboard displaying most of these metrics here:\nThe example shows results from a 30 min run of pgbench (20 connections, four threads each). There are three Debezium connectors with corresponding logical replication slots. Replication slot 1 shows a constant level of WAL retention, as this connector is continuously running and is emitting events. The connector owning slot 2 is stopped for a few minutes in the middle of the run, as indicated by the red columns in the activity status panel. During that time, the WAL backlog of that slot goes up, but it then shrinks again as the connector catches up after being restarted. Slot 3 finally continuously retains more and more WAL, the reason being that this slot is configured against another database on this Postgres host, and no changes are occurring in that database. Thus, Debezium never gets to acknowledge progress on this slot. Heartbeat events, as discussed above, can be used to overcome this situation.\nWhat to do if an active replication consumer can’t keep up with the changes in the source database and its replication lag continuously increases? While I am planning to write a separate blog post about tuning the performance of the Debezium Postgres connector, one solution can be to work with multiple replication slots, each exporting changes to a distinct subset of tables, thus allowing you to split the consumer load to multiple processes running on separate machines. 
To do so, you can copy an existing replication slot with pg_copy_logical_replication_slot(). That way, a second connector can resume processing from the same LSN as the original slot.\nBesides visualizing the values in a dashboard, you should also have alerts which trigger when certain thresholds are passed. The specific values depend on your particular database size and the characteristics of your workload. Consider starting with the following values and adjust the thresholds from there to find the right balance between firing early enough and avoiding unnecessary noise:\nDisk utilization passes 60-70%\nA replication slot is inactive for longer than 30 minutes\nA replication slot retains more than 10-20 GB of WAL data\nOftentimes, more than absolute values themselves, the first derivative—​i.e. changes to the values—​is interesting, and should be subject to alerting, for instance if disk utilization rapidly increases, or if the WAL retained by a replication slot slowly yet steadily grows over a longer period of time.\nIf you apply larger, long-running transactions against your Postgres database, this may cause logical replication to spill state to disk during decoding the WAL contents, increasing the I/O load of the machine and slowing down the replication process. 
On Postgres 14 and newer, you can examine the disk spill of a replication slot by querying the pg_stat_replication_slots view:\nSELECT slot_name, total_txns, spill_txns, pg_size_pretty(spill_bytes) as spilled, pg_size_pretty(total_bytes) as total FROM pg_stat_replication_slots; +-----------+------------+------------+---------+--------+ | slot_name | total_txns | spill_txns | spilled | total | |-----------+------------+------------+---------+--------| | my_slot | 3 | 1 | 66 MB | 122 MB | +-----------+------------+------------+---------+--------+ If you are observing an unduly large amount of disk spill, consider increasing the logical_decoding_work_mem setting (defaults to 64 MB).\nDrop Unused Replication Slots Finally, a housekeeping tip: don’t forget to delete any unused replication slots! In particular, when stopping and deleting a Debezium connector, its replication slot in Postgres will not automatically be removed. If the slot is not required any more for other connectors or other types of replication consumers, you should drop the slot in order to prevent it from blocking the removal of WAL segments. To do so, call the function pg_drop_replication_slot() like so:\nSELECT pg_drop_replication_slot(\u0026#39;my_replication_slot\u0026#39;); Once Postgres 18 has been released (planned for September 2025), the new option idle_replication_slot_timeout will come in handy for that. A time-based counterpart to the aforementioned max_slot_wal_keep_size option, it lets you invalidate replication slots after a configurable period of inactivity. Setting it to a reasonably large value such as 48h or 72h will help to make sure that inactive slots are invalidated in time, preventing them from holding on to more and more WAL segments.\nSummary Logical replication slots are an essential building block for building CDC pipelines with Postgres.
While concerns about potential WAL growth sometimes lead to uncertainty and anxiety among users, these fears are largely unnecessary when replication slots are set up and configured correctly.\nBy carefully configuring aspects like the maximum slot size, fine-grained publications, and heartbeats, you can ensure the stability and performance of your Postgres database and your CDC pipelines. Fail-over slots, as supported since Postgres 17, let you resume replication seamlessly after promoting a stand-by server to primary. Finally, put comprehensive monitoring and alerting in place, to make sure your replication slots behave as intended. The Grafana dashboard shown above can be a starting point for doing so; you can find it in my streaming-examples repository on GitHub. Contributions to this dashboard will be very welcomed!\n","id":38,"publicationdate":"Jul 8, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_use_the_pgoutput_logical_decoding_output_plug_in\"\u003eUse the pgoutput Logical Decoding Output Plug-in\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_define_a_maximum_replication_slot_size\"\u003eDefine a Maximum Replication Slot Size\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_enable_heartbeats\"\u003eEnable Heartbeats\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_use_table_level_publications\"\u003eUse Table-level Publications\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_use_column_and_row_filters\"\u003eUse Column and Row Filters\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_enable_fail_over_slots\"\u003eEnable Fail-Over Slots\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_consider_using_replica_identity_full\"\u003eConsider Using Replica Identity FULL\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca 
href=\"#_monitor_monitor_monitor\"\u003eMonitor, Monitor, Monitor!\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_drop_unused_replication_slots\"\u003eDrop Unused Replication Slots\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOver the last couple of years, I’ve helped dozens of users and organizations to build Change Data Capture (CDC) pipelines for their Postgres databases. A key concern in that process is setting up and managing replication slots, which are Postgres\u0026#39; mechanism for making sure that any segments of the write-ahead log (WAL) of the database are kept around until they have been processed by registered replication consumers.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhen not being careful, a replication slot may cause unduly large amounts of WAL segments to be retained by the database. This post describes best practices helping to prevent this and other issues, discussing aspects like heartbeats, replication slot failover, monitoring, the management of Postgres publications, and more. 
While this is primarily based on my experience of using replication slots via \u003ca href=\"https://debezium.io/documentation/reference/stable/connectors/postgresql.html\"\u003eDebezium’s Postgres connector\u003c/a\u003e, the principles are generally applicable and are worth considering also when using other CDC tools for Postgres based on logical replication.\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","cdc","debezium","replication"],"title":"Mastering Postgres Replication Slots: Preventing WAL Bloat and Other Production Issues","uri":"https://www.morling.dev/blog/mastering-postgres-replication-slots/"},{"content":" Table of Contents What I Use AI For What I Do Not Use AI For When you read a post on this site, you can be sure it is an original article, written by a human: me.\nMy goal is to share unique (in any case, personal), and hopefully innovative perspectives, based on my own work and experience. Large language models (LLMs) are not going to give you that. I don’t see much value in publishing articles written by an LLM; after all, you could just run ChatGPT or Claude by yourself in order to do so.\nThis is not to say I am not working with AI in the process of writing at all. I use AI occasionally, similar to working with a human copy editor, to help me improve articles once I have a draft. The scope of this is limited to light-touch changes, affecting a handful of words. Here is how I am using AI:\nWhat I Use AI For Checking for correct grammar and wording\nMaking suggestions for improving the expressiveness of my language; being a non-native English speaker, I like for instance finding new words to add variety.
Case in point: I tend to overuse the word \u0026#34;use\u0026#34; (just check this page ;), and AI helps to find alternatives\nValidating whether the line of thought of an article is coherent\nMaking suggestions for article titles and teasers, which I rework substantially to focus on the right thing and match my personal voice\nWhat I Do Not Use AI For Original writing in any form\nCreating complete paragraphs or sections\n","id":39,"publicationdate":"Jun 19, 2025","section":"","summary":"Table of Contents What I Use AI For What I Do Not Use AI For When you read a post on this site, you can be sure it is an original article, written by a human: me.\nMy goal is to share unique (in any case, personal), and hopefully innovative perspectives, based on my own work and experience. Large language models (LLMs) are not going to give you that. I don’t see much value in publishing articles written by an LLM; after all, you could just run ChatGPT or Claude by yourself in order to do so.","tags":null,"title":"How I Use (and Don't Use) AI","uri":"https://www.morling.dev/ai/"},{"content":"","id":40,"publicationdate":"Jun 18, 2025","section":"tags","summary":"","tags":null,"title":"ai","uri":"https://www.morling.dev/tags/ai/"},{"content":"","id":41,"publicationdate":"Jun 18, 2025","section":"tags","summary":"","tags":null,"title":"flink","uri":"https://www.morling.dev/tags/flink/"},{"content":"","id":42,"publicationdate":"Jun 18, 2025","section":"tags","summary":"","tags":null,"title":"sql","uri":"https://www.morling.dev/tags/sql/"},{"content":" Table of Contents Agents Need to Interact With LLMs Agents Should Be Event-Driven Agents Need Context Agents Require Memory When SQL Is Not Enough Parting Thoughts AI Agents have improved in leaps and bounds in recent times, moving beyond simple chatbots to sophisticated, autonomous systems. This post explores a novel approach to building agentic systems: using the power of streaming SQL queries. 
Discover how platforms like Apache Flink can transform the development of AI Agents, offering benefits in consistency, scalability, and developer experience.\nA while ago, Seth Wiesman did an excellent talk at Kafka Summit titled \u0026#34;OH: That microservice should have been a SQL query\u0026#34;. In this presentation he made the case for implementing microservices as SQL queries on top of a stream processor, arguing that this approach yields faster times to market, while ensuring high consistency, scalability, and low latency for your data processing. This story resonated a lot with me, considering that stream processing jobs really are an exemplification of the microservices idea: do one thing, and do it well.\nThis got me thinking: can the same line of argument be made for building agentic systems? Would it be a good idea to build an AI Agent as a streaming SQL query? And if so, what would it take to do so? Before running this thought experiment, let’s define what we mean when talking about AI Agents. I like Google’s no fluff definition quite a bit:\nAI agents are software systems that use AI to pursue goals and complete tasks on behalf of users. They show reasoning, planning, and memory and have a level of autonomy to make decisions, learn, and adapt.\nAnother, rather pragmatic way to look at this is to think of AI Agents as microservices, which take some kind of input, process that input, with an AI model, typically a large language model (LLM), and emit the result. This is to say, compared to some lofty ideas of what an AI Agent could be and what it could do (\u0026#34;Here’s my AWS access key, go and cut my cloud spend into half!\u0026#34;), most of the agents people actually deploy at this point are relatively firmly defined AI-assisted workflows (in fact, in their widely received article Building effective agents, the Anthropic team is categorizing agentic systems into workflows and agents). 
Potential use cases include customer service interactions, document processing in healthcare, automated sales processes, predictive maintenance processes, and others.\nNow, why could it be interesting to build AI Agents in the form of streaming jobs, specifically as SQL queries? It might sound a bit like an odd idea at first, but I think it actually warrants some consideration.\nWhen issuing a SQL query in a traditional database, results are determined in a pull approach, i.e. the query is run against the underlying dataset (by scanning tables, querying indexes, etc.) and the entire result set is returned to the client. Streaming query systems reverse this pattern. Queries are running continuously and compute results incrementally, in a push based way. If there is a change to the dataset, only the affected records are processed by the query, and the corresponding delta to the query’s result set is emitted to clients.\nAt their core, stream processors like Apache Flink are a platform for building event-driven data-intensive applications, with a strong focus on high performance, scalability, and robustness. As such, they provide many of the building blocks needed for implementing AI Agents, too. Using SQL makes building agents a possibility not only to application developers, but also to all the SQL-savvy data engineers out there.\nBased on this premise, let’s discuss a few aspects you should keep in mind in order to build an AI Agent successfully, and whether Apache Flink, as an example of a widely used stream processing engine, and Flink SQL in particular, can be a useful foundation for doing so.\nAgents Need to Interact With LLMs While there are many opinions about what AI Agents really are, one thing is for sure: they need to interact with LLMs. Instead of the traditional way of building software which processes data according to some predefined rules, typically yielding a deterministic outcome, LLM-based systems are less pre-defined. 
Input data, structured and unstructured, and context such as conversation history are passed as natural language to the LLM, which produces a response in natural language (potentially wrapped in a structured container such as a JSON document to simplify further processing). This output then either is emitted as the result to the caller, or it serves as input for further LLM interactions. In agent-to-agent scenarios, it may also be passed on as the input to another AI Agent.\nSo how does Flink fare in that regard? Can you interact with ML/AI models from within your streaming SQL queries? Indeed you can. FLIP-437 (\u0026#34;Support ML Models in Flink SQL\u0026#34;) aims at making models first-class citizens in streaming applications. A new DDL statement CREATE MODEL allows for the registration of AI models from providers such as OpenAI, Google AI, AWS Bedrock, and others.\nAs an example, let’s assume we’d like to stay on top of new research papers around databases and data streaming from conferences such as VLDB. As reading all the papers can be quite time-consuming, let’s build an agent which summarizes given papers, a task for which LLMs come in really handy. Here’s what a solution for this problem could look like, running on a fully managed stream processing platform such as Confluent Cloud:\nNew papers are uploaded to some S3 bucket (for instance using Apache Tika to extract the text from the original PDF files), where they are picked up by an S3 source connector and submitted to a Kafka topic. The agent, implemented as a streaming SQL query, creates a summary for each new paper with the help of an OpenAI model. The result is written to another topic, from where the summary of each new paper can, for instance, be pushed into some Slack channel. 
Here’s how the model can be created:\nCREATE MODEL summarization_model INPUT(text STRING) OUTPUT(title STRING, authors STRING, year_of_publication INT, summary STRING) COMMENT \u0026#39;Research paper summarization model\u0026#39; WITH ( \u0026#39;provider\u0026#39; = \u0026#39;openai\u0026#39;, \u0026#39;task\u0026#39; = \u0026#39;text_generation\u0026#39;, \u0026#39;openai.connection\u0026#39; = \u0026#39;openai-connection\u0026#39;, \u0026#39;openai.model_version\u0026#39; = \u0026#39;gpt-4.1-mini\u0026#39;, \u0026#39;openai.output_format\u0026#39; = \u0026#39;json\u0026#39;, \u0026#39;openai.system_prompt\u0026#39; = \u0026#39;This is a text extract of a research paper in PDF format. ↩ Provide its title, authors, year of publication, and a summary ↩ of 200 to 400 words. Reply with a JSON structure with the fields ↩ \u0026#34;title\u0026#34;, \u0026#34;authors\u0026#34;, \u0026#34;year_of_publication\u0026#34;, \u0026#34;summary\u0026#34;. Return ↩ only the JSON itself, no Markdown mark-up.\u0026#39; ); Note how that model definition also contains the system prompt to be used. The FLIP is still a work in progress in the Flink open-source project, but it already is supported by some Flink-based offerings, including Confluent Cloud for Apache Flink. Once you have defined a model, you can query it via the ML_PREDICT() function (see FLIP-525 for more details). 
For instance like so, querying the summarization model above:\nINSERT INTO papers_summarized SELECT fulltext, title, authors, year_of_publication, summary FROM research_papers, LATERAL TABLE(ML_PREDICT(\u0026#39;summarization_model\u0026#39;, fulltext)); Once this query is running, a new paper pushed to the input topic, research_papers, will yield a result like this on the papers_summarized topic:\n| fulltext | title | authors | year_of_publication | summary | +----------------------------------------------+------------------------+---------------------------------------------+---------------------+------------------------------------------------+ | Styx: Transactional Stateful Functions on... | Styx: Transactional... | Kyriakos Psarakis, George Christodoulou,... | 2025 | This paper introduces Styx, a novel runtime... | | ... | ... | ... | ... | ... | In this example we’re using an LLM for summarizing the elements of a data stream, but you could also follow the same approach for sentiment analysis, categorizing data, creating recommendations, detecting spam, translating text, and much more.\nAgents Should Be Event-Driven When thinking of agentic AI, conversational agents—based on synchronous request-response patterns—may be the first thing coming to mind, with a ubiquitous example being LLM-backed chatbots. At this point, probably everyone has communicated with LLMs that way, either directly by using tools like ChatGPT or Claude, or indirectly by talking to chatbots on the website of an ecommerce platform or an airline.\nArguably though, in an enterprise context, autonomous event-driven agents oftentimes are more relevant. Based on real-time data and event streams, such as user interactions in a web shop, sensor data from a wind turbine, or changes in some database, such agents take intelligent action without user intervention, for instance to restock inventory, issue a predictive repair order, etc. 
An event-driven agent performs its job not when a human happens to engage with it, but when the input data requires it. The result typically will be another type of event, either consumed asynchronously as input by other AI agents, as a command by traditional non-agentic systems, or by a human for validation and approval.\nThis sort of event-driven data processing is an absolute sweet spot for Flink SQL, and Flink in general. Its large ecosystem of ready-made connectors provides integration with a wide range of source and sink systems, data stores, and services. Clickstream data via Kafka, change data feeds from your database, sensor measurements via MQTT: there are connectors for pretty much everything.\nWhile Flink lets you run connectors directly embedded into the stream processing engine, in particular the combination with an event streaming platform such as Apache Kafka opens up many interesting possibilities. This approach allows you to create networks of specialized loosely coupled agents, which can build on each other’s results, without having to know details like where a given agent runs. Kafka connects and unlocks your company’s systems, teams, and databases, providing agents with the context they need to operate and provide value on top of your organization’s proprietary data. Thanks to Flink’s unification of stream and batch processing, agents can not only react to incoming events in real-time, but—with the right retention policy for your Kafka topics—they also can reprocess a stream of input data if needed. This is not only very useful for the purposes of failure recovery, but also for testing and validating changed processing logic after updating an agent. 
In an A/B testing scenario, two different variants of the same agent could process the same set of input topics, allowing you to compare the different outcomes and evaluate which one performs better.\nFinally, an event-driven architecture also helps to overcome an inherent limitation of LLMs: they are fixed in time. Their knowledge is subject to the cutoff date of their training dataset. With a RAG-based approach (retrieval-augmented generation), as discussed in the next section, additional data can be fed to a model at inference time. Ingesting new or changed data in real-time into a vector store helps to make the latest and up-to-date information available to the LLM.\nAgents Need Context LLMs are general-purpose models created from huge bodies of publicly available datasets. However, many, if not most, AI Agents for enterprise use cases require access to context such as internal data and resources, tools and services. How can this be implemented when building an agentic system using Flink SQL?\nFirst, let’s consider the case of structured data, for instance details about a given customer stored in an external database. SQL is a natural fit for accessing that kind of data: Flink SQL allows you to enrich the data to be sent to an LLM using SQL join semantics. One option is to join streams sourced from one of the wide range of source connectors (and by extension, also using the Kafka Connect source connector ecosystem). Alternatively, in particular for reference data which doesn’t frequently change, you also can use look-up joins, which let you retrieve data from external data sources, such as databases or CRM systems. 
In that case, Flink will take care of caching look-up results in a local RocksDB instance for the sake of efficiency, fetching data from the upstream source only when needed.\nWhen it comes to feeding non-public unstructured data—documentation and wiki pages, reports, knowledgebases, customer contracts, etc.—to an LLM, retrieval-augmented generation (RAG) is a proven solution. With the help of a language model, unstructured domain-specific information is encoded into embeddings, which are stored in a vector database such as Pinecone or Elasticsearch, or alternatively using a vector index of a more traditional data store like Postgres or MongoDB. Thanks to Flink SQL’s rich type system, vectors are natively supported as ARRAY\u0026lt;FLOAT\u0026gt;. When an agent is about to make a query to an LLM, the input data is used to query the vector store, allowing the agent to enrich the LLM prompt with relevant domain-specific information, yielding higher quality results, based on the latest data and information of your specific business context.\nWhat does that mean for our thought experiment of building AI Agents as Flink SQL queries? Following up on the example of summarizing research papers, let’s assume we’re also doing company-internal research, the results of which are documented in an internal wiki. Based on the summary of an incoming research paper, we’d like to identify relevant internal research and get some understanding of the relationship between the new paper and our own research, for instance providing new angles and perspectives for future research activities. To solve that task, we could think of having two streaming SQL jobs, which both taken together form an agentic system:\nOne job creates and updates the embeddings in the vector store, whenever there’s a change in the internal research wiki. 
In other scenarios, thanks to the rich ecosystem of Flink connectors, the data could also be retrieved in real-time from a relational database using change data capture, through a webhook which receives a notification after changes to a company’s wiki pages, etc. To create the vector embeddings (A1), the ML_PREDICT() function can be used with an embedding model such as OpenAI’s text-embedding-3-small model. That way, the embedding representation in the vector store is continuously kept in sync with the original data (A2).\nIn the actual agent job itself, we’d create a summary of each new paper as described above (B1). Next, we’d use ML_PREDICT() with the same embedding model for creating a vector representation of that summary (B2). This embedding then is used to query the vector store and identify the most relevant internal research documents, for instance based on cosine similarity (B3). Currently, there’s no support for this built into Apache Flink itself, so this is something you’d have to implement yourself with a user-defined function (UDF). When running on Confluent Cloud, there’s a ready-made function VECTOR_SEARCH(), which lets you execute queries against different vector stores; eventually, I’d expect this capability to also be available in upstream Flink. Finally, we’d use the results to augment another LLM invocation via ML_PREDICT() for establishing the relationship between the new paper and our own research (B4).\nArguably, so far we’ve stayed on the workflow side of the workflow/agent dichotomy mentioned initially. For building a true AI Agent, it may be necessary to let the LLM itself decide which resources or tools to tap into for a given prompt. 
Anthropic’s MCP standard (Model Context Protocol) has seen a massive uptake over the last few months for exactly this use case, allowing you to integrate custom services and data sources into your agentic workflows.\nUnfortunately, as of today, this is not something which is supported by Flink SQL out-of-the-box. But you can close this gap by implementing a UDF. In particular, Process Table Functions (PTFs, defined by FLIP-440), a new kind of UDF available in Flink 2.1, come in very handy for this purpose. They allow you to integrate arbitrary logic written in Java into your SQL pipelines, which means you could build a PTF for the integration of external tools via MCP, for instance using the LangChain4j API.\nPTFs allow for very flexible customizations of the processing logic of Flink SQL jobs. The integration of MCP into a PTF may be a subject for a future post; in the meantime, refer to this post for taking a first look at using PTFs in the context of a change data capture pipeline for Postgres.\nAs PTFs are table-valued functions, they can not only operate on single rows and events, but also on groups of rows, for instance all the events pertaining to a specific customer or workflow instance. This makes them a candidate for implementing agent memory; more on that in the following section.\nAgents Require Memory Finally, let’s discuss the aspect of state when it comes to building AI Agents. When processing an incoming event, it may be necessary to look back at previous events when assembling the prompt for an LLM. In our research example, this may be previous papers of the same author. In a recommendation use case, this could for instance be all the purchase orders of the customers in a given segment. 
In a conversational scenario, this might be all the previous messages, requests and responses, in a given conversation.\nWhile Flink SQL manages state for different kinds of query operators (for instance, for windowed aggregations or joins), SQL by itself doesn’t give you the level of fine-grained state access you’d need to model the memory of an AI Agent. The aforementioned process table functions can help with that, though. When applying a PTF to partitioned input streams, you can manage arbitrary state in the context of individual partitions, such as all the events and messages pertaining to a given instance of an AI-based workflow, including previous LLM responses. You could then retrieve these messages from the state store when building the LLM prompt. In that light, a PTF backed by Flink state can be considered as a form of durable execution, tracking the progress of a long-running operation in persistent, resumable form. As a bonus, Flink automatically takes care of distributing that state in a cluster, allowing you to scale out stateful AI Agents to as many compute nodes as needed.\nWhen SQL Is Not Enough So, it seems we can use Flink SQL for building agentic systems, be it workflows or agents; but does this also mean we should? Are we at risk that—with that squirrely hammer in our hand—every problem is looking like a nail?\nRelatively uncontroversially, SQL is great for all kinds of pre- and post-processing of the (structured) data consumed and created by an agent: filtering and transforming data, joining multiple streams, aggregating and grouping data is the sweet spot of a stream processing engine like Flink SQL. It offers tools such as the very powerful MATCH_RECOGNIZE() operator, which lets you search for specific patterns in your input data streams to identify records relevant for further processing. All that on top of a highly scalable, fault-tolerant and battle-proven runtime. 
But as we’ve seen, it’s also possible to bridge the world to unstructured data processing in natural language, using LLMs, relatively easily. Thanks to recent additions such as built-in model support, LLMs can be integrated into event-driven streaming pipelines, also providing tools like PTFs for managing context and state, integration of MCP, and more.\nThis post explores the implementation of agentic systems in the form of streaming SQL jobs. Another facet to this discussion is how AI Agents can interact with data streaming infrastructure as part of their business logic, for instance in order to identify relevant topics on a Kafka cluster and retrieve data from them, issue Flink streaming queries, etc. The community has been working on several MCP servers for this purpose, including mcp-kafka and mcp-confluent, which enables the integration of Confluent Cloud resources into agentic workflows.\nBut what if you want to build an AI Agent which requires some more, well, agency? At some point, you may need to go beyond what’s reasonably doable with a SQL-based implementation. Would it still make sense then to use Flink (instead of Flink SQL), as a runtime for AI Agents? The community seems to think so, considering the recent announcement of the Flink Agents sub-project (FLIP-531).\nA collaboration between engineers from Confluent and Alibaba, this project proposal aims at the creation of a Flink-based runtime for AI Agents. The idea is to re-use Flink’s proven foundation for low-latency continuous data processing, which offers many desirable traits such as fault tolerance, scalability, state management, observability, and more. The FLIP seeks to explore a new easy-to-use agent framework on top of that, making AI Agents a first class citizen in the Flink ecosystem. Besides Java, Python support is envisioned, allowing agent authors to tap into the vast ecosystem of AI-related Python libraries. 
The agent SDK will provide out-of-the-box integration of external tools via MCP, vector search, agent-to-agent communication, etc. In particular that last aspect might trigger some memories of an earlier, now dormant, project under the Flink umbrella: Stateful Functions (StateFun). It remains to be seen whether this will see a revival in the form of an agentic runtime as part of the work on this FLIP.\nParting Thoughts Apache Flink, with its robust stream processing capabilities and evolving AI integrations, is a compelling and versatile platform for building intelligent, event-driven agentic systems. While some more work needs to be done—for instance around the integration of external tools and resources via MCP—to bridge the gap between agentic workflows and true AI Agents, Flink provides you with the essential tools for connecting to all kinds of event streams and data sources in real-time, LLM integration, context and state management, and much more.\nTo me, the appeal of using SQL in particular for building agentic systems in a declarative way lies in its notion of democratization: with the right building blocks—for instance, ready-made UDFs for invoking tools via MCP—everyone familiar with SQL can build agentic solutions and put them into production on one of the available fully managed services for Apache Flink. To automate parts of their own personal workflows, but also to create reusable workflows and agents for others.\nSo, coming back to the original premise of this post—Is this all to say that you should build all your AI Agents using Apache Flink, or Flink SQL? Certainly not. But can it be a very solid foundation for certain cases? 
Absolutely!\nMany thanks to everyone who provided their input and feedback while writing this post, including Joydeep Bhattacharya, Brandon Brown, Steffen Hoellinger, and Michael Noll!\n","id":43,"publicationdate":"Jun 18, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_agents_need_to_interact_with_llms\"\u003eAgents Need to Interact With LLMs\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_agents_should_be_event_driven\"\u003eAgents Should Be Event-Driven\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_agents_need_context\"\u003eAgents Need Context\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_agents_require_memory\"\u003eAgents Require Memory\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_when_sql_is_not_enough\"\u003eWhen SQL Is Not Enough\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_parting_thoughts\"\u003eParting Thoughts\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003eAI Agents have improved in leaps and bounds in recent times, moving beyond simple chatbots to sophisticated, autonomous systems. This post explores a novel approach to building agentic systems: using the power of streaming SQL queries. 
Discover how platforms like Apache Flink can transform the development of AI Agents, offering benefits in consistency, scalability, and developer experience.\u003c/p\u003e\n\u003c/div\u003e","tags":["ai","flink","streaming","sql"],"title":"This AI Agent Should Have Been a SQL Query","uri":"https://www.morling.dev/blog/this-ai-agent-should-have-been-sql-query/"},{"content":" Table of Contents Debezium Reselect Postprocessor Flink DataStream API Flink SQL With OVER Aggregation Flink Process Table Functions Summary and Discussion Postgres logical replication, while powerful for capturing real-time data changes, presents challenges with TOAST columns, whose values can be absent from data change events in specific situations. This post discusses how Debezium addresses this through its built-in reselect post processor, then explores more robust solutions leveraging Apache Flink’s capabilities for stateful stream processing, including Flink SQL and the brand-new process table functions (PTFs) in Flink 2.1.\nLogical replication allows you to capture and propagate all the data changes from a Postgres database in real-time. Not only is it widely used for replication within Postgres clusters, thanks to the well documented protocol, also non-Postgres tools can tap into the replication data stream and leverage it for heterogeneous replication pipelines across system boundaries. With the help of logical replication clients such as the Debezium connector for Postgres, you can transfer data from your operational database into data warehouses, data lakes, or search indexes, typically with (sub-)second end-to-end latencies.\nBut logical replication has its quirks, too. Besides WAL pile-up caused by inactive replication slots (something I’ve written about here), one common stumbling stone is the specific way of how TOAST (The Oversized-Attribute Storage Technique) columns are handled by logical replication. 
TOAST is Postgres\u0026#39; way of dealing with large column values: if a tuple (the physical representation of a row in a Postgres table) is larger than two kilobytes, large column values will be split up into several tuples, spread across multiple database pages. Such large values are commonly found when dealing with unstructured text, or when storing non-textual media blobs, for example for multi-modal AI use cases. For each table with TOAST-able column types (for instance, text and bytea), an associated TOAST table will be created for storing these out-of-line values.\nNow, how does all that relate to logical replication? The answer to this depends on the replica identity configured for a given table. Specifically, unless a table has replica identity FULL (which isn’t always desirable due to the impact on WAL size and CPU consumption), if a row in that table gets updated, logical replication will expose a TOAST-ed field only if its value has changed. Conversely, unchanged TOAST-ed fields will not have a value provided. This means that the change events created by a CDC tool such as Debezium don’t completely describe the current state of that row, which makes them more complex to handle for consumers. Debezium change events contain a special marker value for unchanged TOAST columns in this situation, __debezium_unavailable_value.\nYou might wonder why this relatively generic sentinel value was chosen. The reason is that the value is not only used for representing missing TOAST columns in data change events emitted by the Postgres connector, but for instance also for representing Oracle LOB/BLOB columns in a similar situation.\nA change event consumer supporting partial updates can issue specific update queries which exclude any fields with that marker value. For example, Snowflake lets you do this through MERGE queries with a CASE clause. This approach isn’t ideal for a number of reasons, though. 
It requires the consumer to be aware of the fact that specific columns are TOAST-able, and it needs to have that information for each affected column of each affected table. Worse, if there are multiple consumers, each and every one of them will have to implement that logic. Finally, not all downstream systems may allow for partial updates to begin with, only letting you update entire records at once.\nTaking a step back, the underlying problem is that we are leaking an implementation detail here, requiring consumers to deal with something they shouldn’t really have to care about. It would be much better to solve this issue at the producer side, establishing a consciously designed data contract which shields consumers from intricacies like TOAST columns. Moving this sort of processing closer to the source of a data pipeline (\u0026#34;Shift Left\u0026#34;) helps to create reusable data products which are easier to consume, without having to reinvent the wheel in every single consumer, be it a data warehouse, data lake, or a search index.\nIn the remainder of this post I’d like to discuss several techniques for doing exactly that: Debezium’s built-in solution, column reselects, as well as stateful stream processing with Apache Flink.\nDebezium Reselect Postprocessor While Debezium by default exports the __debezium_unavailable_value sentinel value for unchanged TOAST-ed fields for tables with default replica identity, it provides some means to improve the situation. A post processor is available that queries the source database to retrieve the current value of the affected field, updating the change event with that value before it’s emitted. To set up the post processor, add the following to your Debezium connector configuration:\n{ \u0026#34;connector.class\u0026#34;: \u0026#34;io.debezium.connector.postgresql.PostgresConnector\u0026#34;, ... 
\u0026#34;post.processors\u0026#34;: \u0026#34;reselector\u0026#34;, \u0026#34;reselector.type\u0026#34;: \u0026#34;io.debezium.processors.reselect.ReselectColumnsPostProcessor\u0026#34;, (1) \u0026#34;reselector.reselect.columns.include.list\u0026#34;: \u0026#34;inventory.authors:biography\u0026#34;, (2) \u0026#34;reselector.reselect.unavailable.values\u0026#34;: \u0026#34;true\u0026#34;, \u0026#34;reselector.reselect.null.values\u0026#34; : \u0026#34;false\u0026#34; } 1 Enable the column reselect post processor for the events emitted by this connector 2 Query missing values for the biography column of the inventory.authors table This may do the trick in certain situations, in particular if a TOAST-ed column rarely or even never changes. There are some important implications, though. Most importantly, the solution is inherently prone to data races: If there are multiple updates to a row in quick succession and the TOAST-ed column changes, an earlier change event may be enriched with the latest value of the column. This may happen as Postgres does not support queries for past values (Debezium implements a more robust solution for Oracle using an AS OF SCN query). Longer delays between creating a change event in the database and processing it with Debezium—​for instance in case of a connector downtime—​exacerbate that problem.\nFurthermore, there may be a performance impact: running a query for every event adds latency, and it may impose undesired load onto the source database, in particular considering that currently there’s no batching applied for these look-ups. When using the reselect post processor, you should make sure to run Debezium close to your database, in order to minimize the latency impact.\nIssuing a database query for getting the current value of a TOAST-ed column isn’t ideal. Rather, we’d want to retrieve the column value exactly as it was when that update happened, ideally also offloading these look-ups to a separate system. 
This kind of processing is a prime use case for stateful stream processors such as Apache Flink. So let’s explore how we could implement TOAST column backfills using Flink.\nFlink DataStream API Flink supports several APIs for implementing stream processing jobs which differ in terms of their complexity and the capabilities they offer. The DataStream API is a foundational API which provides you with the highest degree of freedom and flexibility; at the same time, it has a steep learning curve, and it is easy to shoot yourself in the foot.\nTo implement a backfill of TOAST columns, we’ll need to create a custom processing function which manages the column values through a persistent state store. It puts the value into the state store when processing an insert change event, and later on, it’ll read it back to replace the sentinel value in update events which don’t modify the TOAST column. As the state needs to be managed per record, the KeyedProcessFunction contract must be implemented:\npublic class ToastBackfillFunction extends KeyedProcessFunction\u0026lt;Long, KafkaRecord, KafkaRecord\u0026gt; { (1) private static final String UNCHANGED_TOAST_VALUE = \u0026#34;__debezium_unavailable_value\u0026#34;; private final String columnName; private ValueStateDescriptor\u0026lt;String\u0026gt; descriptor; (2) public ToastBackfillFunction(String columnName) { this.columnName = columnName; } @Override public void open(OpenContext openContext) throws Exception { descriptor = new ValueStateDescriptor\u0026lt;String\u0026gt;(columnName, String.class); (3) } @Override public void processElement(KafkaRecord in, Context ctx, Collector\u0026lt;KafkaRecord\u0026gt; out) throws Exception { (4) ValueState\u0026lt;String\u0026gt; state = getRuntimeContext().getState(descriptor); Map\u0026lt;String, 
Object\u0026gt;) in.value().get(\u0026#34;after\u0026#34;); switch ((String)in.value().get(\u0026#34;op\u0026#34;)) { case \u0026#34;r\u0026#34;, \u0026#34;c\u0026#34; -\u0026gt; state.update((String) newRowState.get(columnName)); (5) case \u0026#34;u\u0026#34; -\u0026gt; { if (UNCHANGED_TOAST_VALUE.equals( newRowState.get(columnName))) { (6) newRowState.put(columnName, state.value()); } else { state.update((String) newRowState.get(columnName)); (7) } } case \u0026#34;d\u0026#34; -\u0026gt; { state.clear(); (8) } } out.collect(in); (9) } } 1 This is a keyed process function working on Long keys (the primary key type of our table), consuming and emitting Kafka records mapped via Jackson 2 Descriptor for a key-scoped value store containing the latest value of the TOAST column 3 Initialize the state store when the function instance gets created and configured 4 The processElement() method is invoked for each element on the stream 5 When receiving an insert (c) or read (r, i.e. snapshot) event, put the value of the given TOAST column into the state store 6 When receiving an update event which doesn’t modify the TOAST column, retrieve the value from the state store and put it into the event 7 When receiving an update event which does modify the column, update the value in the state store 8 When receiving a delete event, remove the value from the state store 9 Emit the event The function must be applied to a stream which is keyed by the change event’s primary key:\nStreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); KafkaSource\u0026lt;KafkaRecord\u0026gt; source = ...; KafkaSink\u0026lt;KafkaRecord\u0026gt; sink = ...; env.fromSource(source, WatermarkStrategy.noWatermarks(), \u0026#34;Kafka Source\u0026#34;) .keyBy(record -\u0026gt; { (1) return Long.valueOf((Integer) record.key().get(\u0026#34;id\u0026#34;)); }) .process(new ToastBackfillFunction(\u0026#34;biography\u0026#34;)) (2) .sinkTo(sink); 
env.execute(\u0026#34;Flink TOAST Backfill\u0026#34;); 1 Key the incoming change event stream by the table’s primary key, id 2 For each change event, apply the TOAST backfill function The Kafka source shown in the job reads Debezium data change events from a Kafka topic, whereas the Kafka sink will write them to another topic, once they have been processed. For each record of the source table, the processing function keeps the latest value of the TOAST column in the state store. Depending on the number of records and the size of the TOAST column values, a sizable amount of state will be stored. That’s not a fundamental problem though: Flink jobs commonly manage hundreds of gigabytes of state size, and newer developments like the disaggregated state management in Flink 2.0 can help with that task.\nYou can find the complete runnable example in my streaming-examples repo on GitHub.\nFlink SQL With OVER Aggregation Besides the DataStream API, Apache Flink also provides a relational interface to stream processing in the form of Flink SQL and the accompanying Table API. This makes stateful stream processing accessible to a much larger audience: all the developers and data engineers who are familiar with SQL. Which begs the question: can TOAST column backfills be implemented with a SQL query? 
As it turns out, yes it can!\nThe key idea is to use Flink’s Apache Kafka SQL connector in append-only mode for operating on the \u0026#34;raw\u0026#34; stream of Debezium change events and applying the necessary backfill with an OVER aggregation:\nINSERT INTO authors_backfilled SELECT id, before, ROW( id, after.first_name, after.last_name, CASE WHEN after.biography IS NULL THEN NULL ELSE LAST_VALUE(NULLIF(after.biography, \u0026#39;__debezium_unavailable_value\u0026#39;)) OVER ( PARTITION BY id ORDER BY proctime RANGE BETWEEN INTERVAL \u0026#39;30\u0026#39; DAY PRECEDING AND CURRENT ROW ) END, after.dob ), source, op, ts_ms FROM authors Unlike a regular GROUP BY aggregation, which condenses multiple input rows into a single output row, an OVER aggregation produces an aggregated value for every input row, based on a given window.\nThe LAST_VALUE() aggregation function propagates the last non-NULL value for each window. By mapping the unavailable value placeholder to NULL using NULLIF(), this will always be the latest value of the biography column. The data is partitioned by id: the aggregation window comprises all the change events with the same primary key within the given interval of 30 days.\nFinding the right value for that look-back period can be tricky, as it depends on the lifecycle of your data. If update events for a record can come in 180 days after the previous update, state in the Flink job must be retained for that entire time, and the 30-day window shown above would be too short. Ideally, we’d dispose of the state for a given record once the delete event for that key has been ingested. Unfortunately, I am not aware of any way of doing so purely with Flink SQL on an append-only data stream. The PTF solution discussed in the next section implements this logic.\nIn order to handle the situation where the TOAST-ed column actually is set to NULL, the aggregation is wrapped by a CASE clause which emits the NULL value in this case. 
Note that the statement above is simplified somewhat for the sake of comprehensibility. In particular, it ignores the case of delete events whose after field is null, which could be implemented using another CASE clause.\nSolving the problem solely with SQL makes for a generally elegant and portable solution, especially when considering that Flink SQL tends to be more widely supported by Flink SaaS vendors than the DataStream API, due to the inherent complexities of operating the latter. Yet, it is not a silver bullet: The complexity of statements can become a problem quickly. As discussed above, you lack fine-grained control over the retention period of the required state. Furthermore, SQL arguably has a bit of a discoverability problem, in particular software engineers with a background in application development may not necessarily be aware of features such as OVER aggregations.\nThis leads us to the next and final way for backfilling TOAST columns, which combines the simplicity of SQL with the flexibility and expressiveness of implementing key parts of the functionality imperatively.\nFlink Process Table Functions The idea of this approach is to delegate state management to a custom process table function (PTF). Specified in FLIP-440, PTFs are a new kind of user-defined function (UDF) for Flink SQL, which will be available in Flink 2.1. 
Complementing other types of UDFs already present in earlier Flink SQL versions, such as scalar and aggregate functions, PTFs are much more powerful and have a few very interesting characteristics:\nJust like a custom process function you’d implement for the DataStream API, they provide you with access to persistent state and timers\nUnlike scalar functions, they are table-valued functions (TVFs) that accept tables as input and produce a table as output\nThey are also polymorphic functions (in fact, PTFs are called polymorphic table functions in the SQL standard), which means that their input and output types are determined dynamically, rather than statically\nThe polymorphic nature allows for extremely powerful customizations of your SQL queries, for instance there could be a PTF which exposes the contents of a Parquet file in a typed way, allowing for the projection of specific columns. Other potential use cases for custom PTFs include implementing specific join operators, doing remote REST API calls for enriching your data, integrating with LLMs for sentiment analysis or categorization, and much more.\nPTFs are a comprehensive extension to the Flink API and definitely warrant their own blog post at some point, for now let’s just take a look at how to use a PTF for backfilling Postgres TOAST columns. Note that PTFs are still work-in-progress and details of the API may change. The following has been implemented against Flink built from source as of commit f7b5d00.\nTo create a PTF, create a subclass of ProcessTableFunction, parameterized with the output type. In our case that’s Row, as this PTF produces entire table rows. 
The processing logic needs to be implemented in a method named eval(), which takes the function’s arguments, and optionally a state carrier object as well as other context, as input:\npublic class ToastBackfillFunction extends ProcessTableFunction\u0026lt;Row\u0026gt; { private static final String UNCHANGED_TOAST_VALUE = \u0026#34;__debezium_unavailable_value\u0026#34;; public static class ToastState { (1) public String value; } public void eval(ToastState state, Row input, String column) { (2) Row newRowState = (Row) input.getField(\u0026#34;after\u0026#34;); switch ((String)input.getField(\u0026#34;op\u0026#34;)) { case \u0026#34;r\u0026#34;, \u0026#34;c\u0026#34; -\u0026gt; { (3) state.value = (String) newRowState.getField(column); } case \u0026#34;u\u0026#34; -\u0026gt; { (4) if (UNCHANGED_TOAST_VALUE.equals(newRowState.getField(column))) { newRowState.setField(column, state.value); } else { state.value = (String) newRowState.getField(column); } } case \u0026#34;d\u0026#34; -\u0026gt; { (5) state.value = null; } } collect(input); (6) } } 1 A custom state type for managing the persistent state of this PTF; stores the latest value for the given TOAST column 2 The eval() method will be invoked for each row to be processed; it declares the state type and two arguments of the PTF: the table to process, and the name of the TOAST column 3 If the incoming event is an insert (c) or snapshot (r) event, store the value of the specified TOAST column in the state store 4 If the incoming event is an update and the value of the TOAST column didn’t change, retrieve the value from the state store and update the input row with it; if the value did change, update the value in the state store 5 If the incoming event is a delete, remove the value for the given key from the state; i.e. 
in contrast to the OVER aggregation solution, the state retention time now closely matches the lifecycle of the underlying data itself 6 Emit the table row In most cases, the semantics of the arguments of the eval() method can be determined automatically via reflection, or they can be specified using annotations such as @StateHint and @ArgumentHint. The TOAST backfill PTF is special insofar as its output type can’t be specified statically; instead, it mirrors the type of the table the PTF is applied to. For dynamic cases like this, the getTypeInference() method can be overridden, allowing you to declare the exact input and output type semantics for the method:\n@Override public TypeInference getTypeInference(DataTypeFactory typeFactory) { LinkedHashMap\u0026lt;String, StateTypeStrategy\u0026gt; stateTypeStrategies = LinkedHashMap.newLinkedHashMap(1); (1) stateTypeStrategies.put(\u0026#34;state\u0026#34;, StateTypeStrategy.of( TypeStrategies.explicit( DataTypes.of(ToastState.class).toDataType(typeFactory)))); return TypeInference.newBuilder() .staticArguments( (2) StaticArgument.table( (3) \u0026#34;input\u0026#34;, Row.class, false, EnumSet.of(StaticArgumentTrait.TABLE_AS_SET)), StaticArgument.scalar(\u0026#34;column\u0026#34;, DataTypes.STRING(), false) (4) ) .stateTypeStrategies(stateTypeStrategies) (1) .outputTypeStrategy(callContext -\u0026gt; (5) Optional.of(callContext.getArgumentDataTypes().get(0))) .build(); } 1 Declares the state type of the PTF 2 Defines the arguments of the PTF 3 The first argument is the input table; it has \u0026#34;set\u0026#34; semantics, which means the method operates on partitioned sets of rows (as opposed to \u0026#34;row\u0026#34; semantics, in which case it would operate on individual rows of the table); the PTF’s state is managed within the context of each of those partitioned sets; the argument is of type Row (representing a table row) and it is not optional 4 The 
second argument is the name of the TOAST column to process; it is of type String and also not optional 5 The output type is exactly the same as the row type of the input table With that PTF definition in place, it can be invoked like this:\nINSERT INTO authors_backfilled SELECT id, before, after, source, op, ts_ms FROM ToastBackfill(TABLE authors PARTITION BY id, \u0026#34;biography\u0026#34;); (1) 1 Invoke the PTF for the authors table, partitioned by id, and backfilling values for the biography TOAST column Invoking a table-valued function might feel unusual at first, but on the upside the overall statement is quite a bit less complex than the OVER aggregation shown above. This illustrates another potential benefit of PTFs: they let you encapsulate that logic in a reusable function, thus allowing for less complex and verbose queries. You might develop a library of parameterized PTFs tailored to your specific use cases, ready to be used by the data engineers in your organization for building streaming pipelines.\nSummary and Discussion Used for storing large values, Postgres TOAST columns are not fully represented in data change events for tables without replica identity FULL. As such, they create complexities for downstream consumers, which typically are better off with events describing the complete state of a row.\nIn this post, we’ve explored several solutions to address this issue. Debezium’s built-in reselect post processor queries the database for missing values. It can be a solution for simple cases, but it is prone to data races and can create performance issues. Stateful stream processing, using Apache Flink, is a powerful alternative. 
Flink provides multiple options for solving this task, ranging from a purely imperative solution using the DataStream API, through a purely SQL-based implementation in the form of an OVER aggregation, to a hybrid solution with a custom process table function for state management, invoked from within a very basic SQL query.\nBuilt on PTFs, which are to be officially released with Flink 2.1 later this year, the hybrid approach strikes a very appealing balance between expressiveness and flexibility—​for instance with regard to managing the lifecycle of TOAST backfill data in the Flink state store—​and ease of use for authors of SQL queries.\nNow, could Debezium also provide a reliable and robust solution out of the box, thus eliminating the need for any subsequent processing? Indeed I think it could: Next to the existing reselect post processor, there could be another one which implements the backfilling logic described in this post. To do so, such a post processor could directly manage values in a persistent store such as RocksDB or SlateDB. Alternatively, it could also embed Flink into the connector process, using Flink’s mini-cluster deployment mode. 
I’ve logged issue DBZ-9078 for exploring this further; please reach out if this sounds interesting to you!\nMany thanks to Andrew Sellers, Renato Mefi, and Steffen Hausmann for their feedback while writing this post!\n","id":44,"publicationdate":"May 26, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_debezium_reselect_postprocessor\"\u003eDebezium Reselect Postprocessor\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_flink_datastream_api\"\u003eFlink DataStream API\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_flink_sql_with_over_aggregation\"\u003eFlink SQL With OVER Aggregation\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_flink_process_table_functions\"\u003eFlink Process Table Functions\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary_and_discussion\"\u003eSummary and Discussion\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003ePostgres logical replication, while powerful for capturing real-time data changes, presents challenges with TOAST columns,\nwhose values can be absent from data change events in specific situations.\nThis post discusses how Debezium addresses this through its built-in reselect post processor,\nthen explores more robust solutions leveraging Apache Flink’s capabilities for stateful stream processing,\nincluding Flink SQL and the brand-new process table functions (PTFs) in Flink 2.1.\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","debezium","cdc","flink"],"title":"Backfilling Postgres TOAST Columns in Debezium Data Change Events","uri":"https://www.morling.dev/blog/backfilling-postgres-toast-columns-debezium-change-events/"},{"content":" Often times, \u0026#34;Stream vs. 
Batch\u0026#34; is discussed as if it’s one or the other, but to me this does not make that much sense really.\nMany streaming systems will apply batching too, i.e. processing or transferring multiple records (a \u0026#34;batch\u0026#34;) at once, thus offsetting connection overhead, amortizing the cost of fanning out work to multiple threads, opening the door for highly efficient SIMD processing, etc., all to ensure high performance. The prevailing trend towards storage/compute separation in data streaming and processing architectures (for instance, thinking of platforms such as WarpStream, and Diskless Kafka at large) further accelerates this development.\nTypically, this is happening transparently to users, done in an opportunistic way: handling all of those records (up to some limit) which have arrived in a buffer since the last batch. This makes for a very nice self-regulating system. High arrival rate of records: larger batches, improving throughput. Low arrival rate: smaller batches, perhaps with even just a single record, ensuring low latency. Columnar in-memory data formats like Apache Arrow are of great help for implementing such a design.\nIn contrast, what the \u0026#34;Stream vs. Batch\u0026#34; discussion in my opinion should actually be about, are \u0026#34;Pull vs. Push\u0026#34; semantics: will the system query its sources for new records in a fixed interval, or will new records be pushed to the system as soon as possible? Now, no matter how often you pull, you can’t convert a pull-based solution into a streaming one. Unless a source represents a consumable stream of changes itself (you see where this is going), a pull system may miss updates happening between fetch attempts, as well as deletes.\nThis is what makes streaming so interesting and powerful: it provides you with a complete view of your data in real-time. 
A streaming system lets you put your data to the location where you need it, in the format you need it, and in the shape you need it (think denormalization), immediately as it gets produced or updated. The price for this is a potentially higher complexity, for example when reasoning about streaming joins (and their state), or handling out-of-order data. But the streaming community is working continuously to improve things here, e.g. via disaggregated state backends, transactional stream processing, and much more. I’m really excited about all the innovation happening in this space right now.\nNow, you might wonder: \u0026#34;Do I really need streaming (push), though? I’m fine with batch (pull).\u0026#34;\nThat’s a common and fair question. In my experience, it is best answered by giving it a try yourself. Again and again I have seen how folks who were skeptical at first, very quickly wanted to get real-time streaming for more and more, if not all of their use cases, once they had seen it in action once. If you’ve experienced a data freshness of a second or two in your data warehouse, you don’t want to ever miss this magic again.\nAll that being said, it’s actually not even about pull or push so much—​the approaches complement each other. For instance, backfills often are done via batching, i.e. querying, in an otherwise streaming-based system. Also, if you want the completeness of streaming but don’t require a super low latency, you may decide to suspend your streaming pipelines (thus saving cost) in times of low data volume, resume when there’s new data to process, and halt again.\nBatch streaming, if you will.\n","id":45,"publicationdate":"May 14, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOften times, \u0026#34;Stream vs. 
Batch\u0026#34; is discussed as if it’s one \u003cem\u003eor\u003c/em\u003e the other, but to me this does not make that much sense really.\u003c/p\u003e\n\u003c/div\u003e","tags":["streaming","data-processing","architecture"],"title":"\"Streaming vs. Batch\" Is a Wrong Dichotomy, and I Think It's Confusing","uri":"https://www.morling.dev/blog/streaming-vs-batch-wrong-dichotomy/"},{"content":"","id":46,"publicationdate":"May 14, 2025","section":"tags","summary":"","tags":null,"title":"data-processing","uri":"https://www.morling.dev/tags/data-processing/"},{"content":" Table of Contents Speaker Information 🇬🇧 English 🇩🇪 Deutsch I am an open-source software engineer in the Java and data streaming space. I currently work as a Technologist for Confluent. In my past role at Decodable I focused on developer outreach and helped them build their stream processing platform based on Apache Flink. Prior to that, I spent ten years at Red Hat, where I led the Debezium project, a platform for change data capture.\nI have been a long-time committer to multiple open-source projects, including Hibernate, kcctl, JfrUnit, MapStruct and Deptective; I also serve as the spec lead for Bean Validation 2.0 (first at the JCP, now under the Jakarta EE umbrella at the Eclipse Foundation).\nNamed a Java Champion, I enjoy speaking at conferences, for instance at QCon, JavaOne, Red Hat Summit, JavaZone, JavaLand and Kafka Summit.\nOccasionally, I blog about topics related to software engineering.\nSpeaker Information The following information can be used on conference websites.\nHead shot: 500px, 1000px\n🇬🇧 English Title: Senior Principal Technologist, Confluent\nBio: Gunnar Morling is an open-source software engineer in the Java and data streaming space, currently working as a Technologist at Confluent. Previously, he helped to build a realtime stream processing platform based on Apache Flink and led the Debezium project, a distributed platform for change data capture. 
He is a Java Champion and has founded multiple open-source projects such as Hardwood, kcctl, JfrUnit, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at various conferences like QCon, JavaOne, and Devoxx. He lives in Hamburg, Germany.\n🇩🇪 Deutsch Titel: Senior Principal Technologist, Confluent\nBio: Gunnar Morling ist Open-Source-Softwareentwickler im Java- und Data-Streaming-Bereich, gegenwärtig tätig als Technologist für Confluent. Zuvor arbeitete er an einer Plattform für Real-Time Stream Processing basierend auf Apache Flink; weiterhin leitete er das Debezium-Projekt, eine verteilte Lösung für Change Data Capture. Er ist ein Java Champion und hat diverse Open-Source-Projekte wie Hardwood, kcctl, JfrUnit und MapStruct ins Leben gerufen. Gunnar bloggt auf morling.dev und teilt seine Erfahrungen in Vorträgen, u.a. bei JavaLand, QCon, JavaOne und Devoxx. Er lebt und arbeitet in Hamburg.\n","id":47,"publicationdate":"May 2, 2025","section":"","summary":"Table of Contents Speaker Information 🇬🇧 English 🇩🇪 Deutsch I am an open-source software engineer in the Java and data streaming space. I currently work as a Technologist for Confluent. In my past role at Decodable I focused on developer outreach and helped them build their stream processing platform based on Apache Flink. Prior to that, I spent ten years at Red Hat, where I led the Debezium project, a platform for change data capture.","tags":null,"title":"About Me","uri":"https://www.morling.dev/about/"},{"content":" The last few days I spent some time digging into the recently announced KIP-1150 (\u0026#34;Diskless Kafka\u0026#34;), as well as AutoMQ’s Kafka fork, tightly integrating Apache Kafka and object storage, such as S3. 
Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.\nThis got me thinking, if we were to start all over and develop a durable cloud-native event log from scratch—​Kafka.next if you will—​which traits and characteristics would be desirable for this to have? Separating storage and compute and object store support would be table stakes, but what else should be there? Having used Kafka for many years for building event-driven applications as well as for running realtime ETL and change data capture pipelines, here’s my personal wishlist:\nDo away with partitions: topic partitions were crucial for scaling purposes when data was stored on node-local disks, but they are not required when storing data on effectively infinitely large object storage in the cloud. While partitions also provide ordering guarantees, this never struck me as overly useful from a client perspective. You either want to have global ordering of all messages on a given topic, or (more commonly) ordering of all messages with the same key. In contrast, defined ordering of otherwise unrelated messages whose key happens to yield the same partition after hashing isn’t that valuable, so there’s not much point in exposing partitions as a concept to users.\nKey-centric access: instead of partition-based access, efficient access and replay of all the messages with one and the same key would be desirable. Rather than coarse-grained scanning of all the records on a given topic or partition, let’s have millions of entity-level streams! Not only would this provide access exactly to the subset of data you need, it would also let you increase and decrease the number of consumers dynamically based on demand, not hitting the limits of a pre-defined partition count. 
Key-level streams (with guaranteed ordering) would be a perfect foundation for Event Sourcing architectures as well as actor-based and agentic systems. In addition, this approach largely solves the problem of head-of-line blocking found in partition-based systems with cumulative acknowledgements: if a consumer can’t process a particular message, this will only block other messages with the same key (which oftentimes is exactly what you’d want), while all other messages are not affected. Rather than coarse-grained partitions, individual message keys become the failure domain.\nTopic hierarchies: available in systems like Solace, topic hierarchies promote parts of the message payload into structured path-like topic identifiers, allowing clients to subscribe efficiently to arbitrary subsets of all the available streams based on patterns, without requiring brokers to deserialize and parse entire messages.\nMeans of concurrency control: As is, using Kafka as a system of record can be problematic as you can’t prevent writing messages which are based on an outdated view of the stored data. Concurrency control, for instance via optimistic locking of message keys, would help to detect and fence off concurrent conflicting writes. That way, when a message gets acknowledged successfully, it is guaranteed that it has been produced seeing the latest state of that key, avoiding lost updates.\nBroker-side schema support: Kafka treats messages as opaque byte arrays with arbitrary content, requiring out-of-band propagation of message schemas to consumers. This can be especially problematic when erroneous (or malicious) producers send non-conformant data. Also, without additional tooling, the current architecture prevents Kafka data from being written to open table formats such as Apache Iceberg. 
For all these reasons, Kafka is used with a schema registry most of the time, but making schema support a first-class concept would allow for better user ergonomics—​for instance, Kafka could expose AsyncAPI-compatible metadata out of the box—​and also open the door for storing data in different ways, for instance in a columnar representation.\nExtensibility and pluggability: a common trait of many successful open-source projects like Postgres or Kubernetes is their extensibility. Users and integrators can customize the behavior of the system by providing implementations of well-defined extension points and plug-in contracts, rather than by modifying the system’s core itself (following the Open-closed principle). This would enable for instance custom broker-side message filters and transformations (addressing many scenarios currently requiring a protocol-aware proxy such as Kroxylicious), storage formats (e.g. columnar), and more. Functionality such as rate limiting, topic encryption, or backing a topic via an Iceberg table should be possible to implement solely via extensions to the system.\nSynchronous commit callbacks: End-to-end Kafka pipelines ensure eventual consistency. When producing a record to a topic and then using that record for materializing some derived data view on some downstream data store, there’s no way for the producer to know when it will be able to \u0026#34;see\u0026#34; that downstream update. For certain use cases it would be helpful to be able to guarantee that derived data views have been updated when a produce request gets acknowledged, allowing Kafka to act as a log for a true database with strong read-your-own-writes semantics.\nSnapshotting: Currently, Kafka supports topic compaction, which will only retain the last record for a given key. This works well, if records contain the full state of the entity they represent (a customer, purchase order etc.). 
It doesn’t work, though, for partial or delta events, which describe changes to an entity and which need to be applied one after another to fully restore the state of the entity. Assuming there was support for efficient key-based message replay (see above), replaying them would take longer and longer as the number of records for a key increases. Built-in snapshot support could allow for \u0026#34;logical compaction\u0026#34;, passing all events for a key to some event handler which condenses them into a snapshot. This would then serve as the foundation for subsequent update events, while all previous records for that key could be removed during compaction.\nMulti-tenancy: Any modern data system should be built with multi-tenancy in mind from the ground up. Spinning up a new customer-specific environment should be a very cheap operation, happening instantaneously; the workloads of individual tenants should be strictly isolated, not interfering with each other with regard to access control and security, resource utilization, metering etc.\nSome of these features are supported in other systems already—​for instance, high-cardinality streams in S2, optimistic locking in Waltz, or multi-tenancy in Apache Pulsar. But others are not, and I am not aware of a single system, let alone an open-source one, which would combine all these traits.\nNow, this describes my personal wishlist (which is to say that this post should in no way be understood as speaking for my employer, Confluent, in any official capacity) for what a Kafka.next could be and the semantics it could provide, driven by the use cases and applications I’ve seen people wanting to employ Kafka for. But I am sure everyone who has worked with Kafka or comparable platforms for some time will have their own thoughts around this, and I’d love to learn about yours in the comments!\nFinally, an important question of course is how such a system would actually be architected. 
While I’ll have to leave the answer to that for another time, it’s safe to say that building that system on top of a log-structured merge (LSM) tree would be a likely choice.\n","id":48,"publicationdate":"Apr 24, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe last few days I spent some time digging into the recently announced \u003ca href=\"https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics\"\u003eKIP-1150\u003c/a\u003e (\u0026#34;Diskless Kafka\u0026#34;), as well as \u003ca href=\"https://github.com/AutoMQ/automq\"\u003eAutoMQ’s Kafka fork\u003c/a\u003e, tightly integrating Apache Kafka and object storage, such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis got me thinking, if we were to start all over and develop a durable cloud-native event log from scratch—​Kafka.next if you will—​which traits and characteristics would be desirable for this to have? Separating storage and compute and object store support would be table stakes, but what else should be there? 
Having used Kafka for many years for building event-driven applications as well as for running realtime ETL and change data capture pipelines, here’s my personal wishlist:\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","architecture","streaming"],"title":"What If We Could Rebuild Kafka From Scratch?","uri":"https://www.morling.dev/blog/what-if-we-could-rebuild-kafka-from-scratch/"},{"content":" Table of Contents Flink SQL Connectors for Apache Kafka The Apache Kafka SQL Connector in Append-Only Mode The Apache Kafka SQL Connector As a Changelog Source The Upsert Kafka SQL Connector Summary Over the years, I’ve spoken quite a bit about the use cases for processing Debezium data change events with Apache Flink, such as metadata enrichment, building denormalized data views, and creating data contracts for your CDC streams. One detail I haven’t covered in depth so far is how to actually ingest Debezium change events from a Kafka topic into Flink, in particular via Flink SQL. Several connectors and data formats exist for this, which can make things somewhat confusing at first. So let’s dive into the different options and the considerations around them!\nFlink SQL Connectors for Apache Kafka For processing events from a Kafka topic using Flink SQL (or the Flink Table API, which essentially offers a programmatic counterpart to SQL), there are two connectors provided by the Apache Flink project: The Apache Kafka SQL connector and the Upsert Kafka SQL Connector.\nBoth connectors can be used as a source connector—​reading data from a Kafka topic—​and as a sink connector, for writing data to a Kafka topic. There’s support for different data formats such as JSON and Apache Avro, the latter with a schema registry such as the Confluent schema registry, or API-compatible implementations like Apicurio. 
The Apache Kafka SQL Connector also supports Debezium-specific JSON and Avro formats.\nThe combination of connector and format defines the exact semantics, in particular whether the ingested Debezium events are processed as an append-only stream, or as a changelog stream, building and incrementally updating materialized views of the source tables based on the incoming INSERT, UPDATE, and DELETE events (Dynamic Tables in Flink SQL terminology).\nThe Apache Kafka SQL Connector in Append-Only Mode When using the Apache Kafka SQL Connector with the JSON format, no Debezium-specific semantics are applied: The Kafka topic with the Debezium events is interpreted as an append-only log of independent events. The same is the case when using the Confluent Avro format instead of JSON.\nThe schema of the table must be exactly modeled after Debezium’s data event structure, including all the fields of both message key (representing the record’s primary key) and message value (the change event):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 CREATE TABLE authors_append_only_source ( id BIGINT NOT NULL, (1) before ROW( (2) id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT ), after ROW( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT ), source ROW( version STRING, connector STRING, name STRING, ts_ms BIGINT, snapshot BOOLEAN, db STRING, sequence STRING, table STRING, txid BIGINT, lsn BIGINT, xmin BIGINT ), op STRING, ts_ms BIGINT ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;dbserver1.inventory.authors\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;scan.startup.mode\u0026#39; = \u0026#39;earliest-offset\u0026#39;, (3) \u0026#39;key.format\u0026#39; = \u0026#39;json\u0026#39;, (4) \u0026#39;key.fields\u0026#39; = 
\u0026#39;id\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;json\u0026#39;, (5) \u0026#39;value.fields-include\u0026#39; = \u0026#39;EXCEPT_KEY\u0026#39; ); 1 The id field maps to the key of incoming Kafka messages 2 The before, after, source, op, and ts_ms fields map to the value of incoming Kafka messages 3 Start reading from the earliest offset of the topic 4 Use JSON as the format for Kafka keys, with the id field being part of the key 5 Use JSON as the format for Kafka values, excluding the key fields (id in this case) When taking a look at the type of the events in the Flink source table—​for instance by setting the result mode to changelog when querying the table in the Flink SQL client—​you’ll see that all the events are insertions (first op column in the listing below), no matter what their change event type is from a Debezium perspective (second op column):\n1 2 3 4 5 6 | op | id | before | after | source | op | ts_ms | +----+------+--------------------------------+--------------------------------+--------------------------------+ ---+---------------+ | +I | 1001 | \u0026lt;NULL\u0026gt; | (1001, John, Stenton, ZbJa0... | (3.1.0.Final, postgresql, d... | r | 1744296502685 | | +I | 1008 | \u0026lt;NULL\u0026gt; | (1009, John, Thomas, ZbJ0du... | (3.1.0.Final, postgresql, d... | c | 1744360987874 | | +I | 1009 | (1009, John, Thomas, ZbJ0du... | (1009, John, Beck, ZbJ0duaf... | (3.1.0.Final, postgresql, d... | u | 1744626041413 | | +I | 1008 | (1009, John, Beck, ZbJ0duaf... | \u0026lt;NULL\u0026gt; | (3.1.0.Final, postgresql, d... | d | 1744627927160 | For writing (potentially processed) change events back into an output topic, another table can be created with exactly the same schema and configuration, only that you’d adjust the topic name accordingly and omit the scan.startup.mode option. 
The mapping of the key is required for both source and sink table in order to ensure that the partitioning, and thus the ordering, of the Debezium events on the output topic is the same as on the input topic.\nWhen to use it: The Apache Kafka SQL Connector in append-only mode is a great choice when you want to operate on a \u0026#34;raw\u0026#34; stream of Debezium data change events, without applying any changelog or upsert semantics. It comes in handy for applying transformations such as adjusting date formats or filtering events based on specific field values. In that sense, this is similar to using the Flink DataStream API on a change event stream, only that you are using SQL rather than Java for your processing logic.\nThe Apache Kafka SQL Connector As a Changelog Source Besides the append-only mode, the Apache Kafka SQL Connector also supports changelog semantics via the Debezium data format. Both JSON (by specifying debezium-json as the value format of your table) and Avro with a registry (via debezium-avro-confluent) are supported. The INSERT, UPDATE, and DELETE events ingested from the Kafka topic are used by the Flink SQL engine to incrementally re-compute the corresponding dynamic table, as well as any continuous queries you are running against it. If you query a changelog-based source table, the result set always represents the current state of that table, updated in realtime whenever a new Debezium event comes in.\nThe table schema looks quite a bit different than before. Instead of modeling the entire Debezium envelope structure, only the actual table schema (i.e. 
the contents of the before and after sections) needs to be specified:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 CREATE TABLE authors_changelog_source ( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT, PRIMARY KEY (id) NOT ENFORCED (1) ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;dbserver1.inventory.authors\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;scan.startup.mode\u0026#39; = \u0026#39;earliest-offset\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;debezium-json\u0026#39; (2) ); 1 While not strictly needed here, a primary key definition—in conjunction with setting the job-level configuration table.exec.source.cdc-events-duplicate to true—ensures that duplicates are discarded in case Debezium events are ingested a second time, for instance after a connector crash 2 Using debezium-json as the value format enables changelog semantics for this table When querying this table in the Flink SQL client, the operation type reflects the kind of the incoming Debezium event. Note how update events are broken up into an update-before event (-U, representing the retraction of the old row) and an update-after event (+U, the insertion of the new row) internally by the Flink SQL engine:\n1 2 3 4 5 6 7 +----+------+------------+-----------+-----------+------------------+ | op | id | first_name | last_name | biography | registered | +----+------+------------+-----------+-----------+------------------+ | +I | 1010 | John | Thomas | ZbJ0duDvW | 1741642600000000 | | -U | 1010 | John | Thomas | ZbJ0duDvW | 1741642600000000 | | +U | 1010 | John | Stenton | ZbJ0duDvW | 1741642600000000 | | -D | 1010 | John | Stenton | ZbJ0duDvW | 1741642600000000 | For a source table it is typically not required to map the Kafka message key field(s) to the table schema when using the Debezium data format. 
Instead, they are part of the change event value. For situations where that’s not the case, key fields can be mapped via the key.fields configuration option; also the value.fields-include option must be set to EXCEPT_KEY then. Optionally, additional Debezium metadata fields such as the origin timestamp or the name of the source table and schema can be mapped as virtual columns:\n1 2 3 4 5 6 7 8 9 CREATE TABLE authors_changelog_source ( ts_ms TIMESTAMP_LTZ METADATA FROM \u0026#39;value.ingestion-timestamp\u0026#39; VIRTUAL, (1) source_table STRING METADATA FROM \u0026#39;value.source.table\u0026#39; VIRTUAL, (2) source_properties MAP\u0026lt;STRING, STRING\u0026gt; METADATA FROM \u0026#39;value.source.properties\u0026#39; VIRTUAL, (3) id BIGINT, ... ) WITH ( ... ); 1 Maps the ts_ms field of the change events (the time at which the data change occurred in the source database) 2 Maps the source.table field of the change events 3 Maps all the source metadata of the change events Flink’s Debezium data format requires change events to have not only the after section, but also the before part which describes the previous state of a row which got updated or deleted. This old row image is required by Flink for retracting previous values when incrementally re-computing derived data views. Unfortunately, this means that Postgres users can leverage this format only for tables which have a replica identity of FULL. Otherwise, the old row image isn’t captured in the Postgres WAL and thus not exposed via logical replication. An exception is raised in this case:\n1 2 3 java.lang.IllegalStateException: The \u0026#34;before\u0026#34; field of UPDATE message is null, if you are using Debezium Postgres Connector, please check the Postgres table has been set REPLICA IDENTITY to FULL level. at org.apache.flink.formats.json.debezium.DebeziumJsonDeserializationSchema.deserialize(DebeziumJsonDeserializationSchema.java:159) ... 
While Flink’s ChangelogNormalize operator can materialize the retract events (at the cost of persisting all the required data in its own state store), this currently is not supported when using the Apache Kafka SQL Connector as a changelog source with the Debezium change event format. I don’t think there’s a fundamental issue which would prevent this from being possible; it just currently isn’t implemented.\nIn order to propagate change events to another Kafka topic, you’ll need to set up a sink connector, also using debezium-json as the value format. You can define which field(s) should go into the Kafka message key via the key.fields property. Make sure to use json (not debezium-json!) as the key format:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 CREATE TABLE authors_changelog_sink ( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;authors_processed\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;key.format\u0026#39; = \u0026#39;json\u0026#39;, \u0026#39;key.fields\u0026#39; = \u0026#39;id\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;debezium-json\u0026#39; ); While the events on the downstream Kafka topic adhere to Debezium’s event envelope schema, they are produced by Flink, not Debezium. In particular, they are lacking all the metadata you’d usually find in the source block. Also, updates are reflected by two events, rather than a single event as Debezium would emit it: a deletion event with the old row state, followed by an insert event with the new row state.\nWhen to use it: The Apache Kafka SQL connector as a changelog source (and sink) is great when you want to implement streaming queries against incoming data change events, for instance in order to create denormalized views or to enable real-time analytics of the data in an OLTP datastore. 
As all the Debezium metadata is removed, it is not the best choice for ETL pipelines which don’t require stateful processing. Also, splitting updates into a delete and insert event causes write amplification in downstream systems, which otherwise might support in-place updates to existing rows.\nThe Upsert Kafka SQL Connector Last, let’s take a look at the Upsert Kafka SQL Connector. It consumes/produces a changelog stream applying \u0026#34;upsert\u0026#34; semantics. As a source connector, the first event for a given key is considered an INSERT; all subsequent events for that key with a non-null value are considered UPDATEs to the same. Tombstone records on the Kafka topic (i.e. records with a key and a null value) are interpreted as DELETE events for that key.\nTombstone records are used by Kafka to remove records during log compaction. You therefore need to configure a value for the topic’s delete.retention.ms setting which is long enough to make sure Flink gets to ingest all tombstones, also considering there may be downtimes of your processing job.\nAs a sink connector, any insert or update for a key yields an event with the current state as the value, and the deletion of a key yields a tombstone record.\nIn order for Debezium to emit such a \u0026#34;flat\u0026#34; event structure with just the current state of a row—​instead of the full Debezium change event envelope—​the new record state transformation (a Kafka Connect single message transform, SMT) needs to be applied when configuring the connector:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 { \u0026#34;name\u0026#34;: \u0026#34;inventory-connector\u0026#34;, \u0026#34;config\u0026#34;: { \u0026#34;connector.class\u0026#34;: \u0026#34;io.debezium.connector.postgresql.PostgresConnector\u0026#34;, \u0026#34;tasks.max\u0026#34;: \u0026#34;1\u0026#34;, \u0026#34;database.hostname\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.port\u0026#34;: \u0026#34;5432\u0026#34;, 
\u0026#34;database.user\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.password\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.dbname\u0026#34; : \u0026#34;postgres\u0026#34;, \u0026#34;topic.prefix\u0026#34;: \u0026#34;dbserver1\u0026#34;, \u0026#34;schema.include.list\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;slot.name\u0026#34; : \u0026#34;dbserver1\u0026#34;, \u0026#34;plugin.name\u0026#34; : \u0026#34;pgoutput\u0026#34;, \u0026#34;transforms\u0026#34; : \u0026#34;unwrap\u0026#34;, (1) \u0026#34;transforms.unwrap.type\u0026#34; : \u0026#34;io.debezium.transforms.ExtractNewRecordState\u0026#34;, \u0026#34;transforms.unwrap.drop.tombstones\u0026#34; : \u0026#34;false\u0026#34; (2) } } 1 Apply the ExtractNewRecordState transform before sending the events to Kafka 2 As some Kafka Connect sink connectors can’t handle tombstone records, the connector supports dropping them. Setting this option to false keeps tombstone records, allowing delete events to be propagated to Flink With this SMT in place, the contents of the after section of INSERT and UPDATE events will be extracted and propagated as the sole change event value, i.e. the new row state. DELETE events will be propagated as Kafka tombstones, as expected by the upsert connector. 
Note that the ExtractNewRecordState SMT is highly configurable; for instance, you could opt into exporting specific source metadata properties as fields in the change event value, or as header properties of the emitted Kafka records.\nThe configuration of a source table for the upsert connector is pretty similar to the previous changelog source, only that the connector type is upsert-kafka:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 CREATE TABLE authors_upsert_source ( id BIGINT, first_name STRING, last_name STRING, biography STRING, registered BIGINT, PRIMARY KEY (id) NOT ENFORCED (1) ) WITH ( \u0026#39;connector\u0026#39; = \u0026#39;upsert-kafka\u0026#39;, \u0026#39;topic\u0026#39; = \u0026#39;dbserver1.inventory.authors\u0026#39;, \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;localhost:9092\u0026#39;, \u0026#39;key.format\u0026#39; = \u0026#39;json\u0026#39;, \u0026#39;value.format\u0026#39; = \u0026#39;json\u0026#39; ); 1 A primary key definition is mandatory when using the upsert connector; it determines which field(s) are part of the Kafka message key and thus form the upsert key The same goes for defining sink tables. Now, is it also possible to ingest full Debezium change events, i.e. with the envelope, but emit upsert-style events? Indeed it is, as you can mix and match the Kafka SQL connector as a source using the debezium-json format with the Upsert Kafka SQL connector as a sink using the json format. This comes in handy for instance for writing updates to an incrementally recomputed materialized view to an OLAP store for serving purposes, without incurring the overhead of the delete + insert event pair emitted by the non-upsert connector.\nWhen to use it: Use the Upsert Kafka SQL Connector for processing \u0026#34;flat\u0026#34; data change events, without the Debezium event envelope. Similar to the Kafka SQL Connector as a changelog source, the upsert connector lets you implement streaming queries on change event feeds. 
Unlike the Kafka SQL Connector, updates are emitted as a single event, which results in less write overhead on downstream systems, in particular if partial updates (rather than full row rewrites) are supported.\nSummary When venturing into the world of processing Debezium data change events in realtime with Apache Flink and Flink SQL, the combination of available connectors and data formats for doing so can be somewhat overwhelming. The table below gives an overview over the different options, their characteristics, and use cases:\nConnector Kafka SQL Connector Kafka SQL Connector as changelog source Upsert Kafka SQL Connector Stream type\nAppend-only\nChangelog\nChangelog\nChange event format\njson, avro-confluent\ndebezium-json, debezium-avro-confluent\njson, avro-confluent\nInput event type\nDebezium change event envelope\nDebezium change event envelope\nFlat events with current state; tombstone records\nOutput event type\nDebezium change event envelope\nSynthetic Debezium change event envelope; updates broken up into delete + insert event\nFlat events with current state; tombstone records\nMetadata\nIn change event envelope\nMapped to table schema\nMapped to table schema, must be part of row state\nStart reading position\nConfigurable\nConfigurable\nEarliest offset\nWhen to use\nProcessing of change events themselves, e.g. transformation, enrichment, routing\nRealtime queries on changelog streams of full Debezium events, e.g. to create materialized views and enable realtime analytics\nRealtime queries on changelog streams of \u0026#34;flat\u0026#34; data change events, e.g. to create materialized views and enable realtime analytics\nInterestingly, whereas the Apache Flink project itself provides two separate Kafka connectors for upsert and non-upsert use cases, managed Flink SQL offerings in the cloud tend to provide a more unified experience centered around one single higher-level connector. 
As an example, the connector for integrating Flink with Kafka topics on Confluent Cloud exposes a setting changelog.mode, which defaults to append when deriving a Flink table from an uncompacted Kafka topic and to upsert for compacted topics. Similar abstractions exist on other services too, with the general aim being to shield users from some of the intricacies here.\nOne more thing you might wonder at this point is: how does Flink CDC fit into all this? Also hosted by the Apache Software Foundation, this project integrates Debezium as a native connector into Flink, instead of channeling data change events through Apache Kafka. The Flink CDC connectors also emit changelog streams with retraction events as shown above, only the Postgres connector optionally supports upsert semantics via its changelog-mode setting.\nThere are pros and cons for both ways of integrating Debezium and Flink, for instance in regards to the replayability of events. This warrants a separate blog post just dedicated to comparing both approaches at some point, though.\nIf you’d like to experiment with the different connectors and data formats for ingesting Debezium data change events from Kafka into Flink SQL by yourself, check out this project in my stream-examples repository which contains Flink jobs for all the different configurations.\n","id":49,"publicationdate":"Apr 16, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_flink_sql_connectors_for_apache_kafka\"\u003eFlink SQL Connectors for Apache Kafka\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_apache_kafka_sql_connector_in_append_only_mode\"\u003eThe Apache Kafka SQL Connector in Append-Only Mode\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_apache_kafka_sql_connector_as_a_changelog_source\"\u003eThe Apache Kafka SQL Connector As a 
Changelog Source\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_upsert_kafka_sql_connector\"\u003eThe Upsert Kafka SQL Connector\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOver the years, I’ve spoken quite a bit about the use cases for processing \u003ca href=\"https://2023.javazone.no/program/355869fa-5aa0-43a7-abd2-7c5250e10bcd\"\u003eDebezium data change events with Apache Flink\u003c/a\u003e,\nsuch as metadata enrichment, building denormalized data views, and creating data contracts for your CDC streams.\nOne detail I haven’t covered in depth so far is how to actually ingest Debezium change events from a Kafka topic into Flink,\nin particular via Flink SQL.\nSeveral connectors and data formats exist for this, which can make things somewhat confusing at first.\nSo let’s dive into the different options and the considerations around them!\u003c/p\u003e\n\u003c/div\u003e","tags":["debezium","flink","kafka","cdc"],"title":"A Deep Dive Into Ingesting Debezium Events From Kafka With Flink SQL","uri":"https://www.morling.dev/blog/ingesting-debezium-events-from-kafka-with-flink-sql/"},{"content":" Table of Contents KIP-974: Docker Image for GraalVM based Native Kafka Broker With the help of the GraalVM configuration developed for KIP-974 (Docker Image for GraalVM based Native Kafka Broker), you can easily build a self-contained native binary for Apache Kafka. Read on to learn how you can build a native Kafka executable yourself, starting in milliseconds, making it a perfect fit for development and testing purposes.\nWhen I wrote about ahead-of-time class loading and linking in Java 24 recently, I also published the start-up time for Apache Kafka as a native binary for comparison. 
This was done via Docker, as there’s no pre-built native binary of Kafka available for the operating system I’m running on, macOS. But there is a native Kafka container image, so this is what I chose for the sake of convenience.\nNow, running in a container adds a little bit of overhead of course, so it wasn’t a surprise when Thomas Würthinger, lead of the GraalVM project at Oracle, brought up the question what the value would be when running Kafka natively on macOS. Needless to say I can’t let this kind of nice nerd snipe pass, so I set out to learn how to build a native Kafka binary on macOS, using GraalVM.\nKIP-974: Docker Image for GraalVM based Native Kafka Broker The container image for Kafka as a native binary based on GraalVM was added via KIP-974, available since Kafka 3.8.0. And while the container image, available on DockerHub, is the only official release artifact for a native Kafka binary, the tooling and infrastructure for creating that image can be used for producing a native binary for macOS as well. You can find it in the docker/native sub-directory of the Kafka source tree.\nFor creating a native binary, you’ll need to have GraalVM installed first of all. The simplest way to do so is via SDKMan:\n1 sdk install java 21.0.6-graal This will also install GraalVM’s native-image tool, which is needed for creating native application binaries. The build requires all the Kafka libraries (JARs) as an input. Either download the latest Kafka distribution, or just build it yourself from source:\n1 2 3 4 git clone git@github.com:apache/kafka.git cd kafka ./gradlew releaseTarGz tar xvf core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT.tgz -C core/build/distributions This will give you a Kafka distribution directory under core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT. 
GraalVM binary image builds require a fair bit of configuration, for instance to specify which classes should be subject to reflection, which interfaces should be available for the creation of dynamic proxies, and more. All the required configuration files are provided under docker/native/native-image-configs. Using those configuration files and the JARs from the Kafka distribution, you can build a Kafka native binary like so (there’s a ready-made script native_command.sh wrapping this invocation):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 native-image --no-fallback \\ --enable-http \\ --enable-https \\ --report-unsupported-elements-at-runtime \\ --install-exit-handlers \\ --enable-monitoring=jmxserver,jmxclient,heapdump,jvmstat \\ -H:+ReportExceptionStackTraces \\ -H:+EnableAllSecurityServices \\ -H:EnableURLProtocols=http,https \\ -H:AdditionalSecurityProviders=sun.security.jgss.SunProvider \\ -H:ReflectionConfigurationFiles=docker/native/native-image-configs/reflect-config.json \\ -H:JNIConfigurationFiles=docker/native/native-image-configs/jni-config.json \\ -H:ResourceConfigurationFiles=docker/native/native-image-configs/resource-config.json \\ -H:SerializationConfigurationFiles=docker/native/native-image-configs/serialization-config.json \\ -H:PredefinedClassesConfigurationFiles=docker/native/native-image-configs/predefined-classes-config.json \\ -H:DynamicProxyConfigurationFiles=docker/native/native-image-configs/proxy-config.json \\ --verbose \\ -march=compatibility \\ -cp \u0026#34;core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT/libs/*\u0026#34; kafka.docker.KafkaDockerWrapper \\ -o \u0026#34;native-kafka\u0026#34;; say \u0026#34;Enjoy native Kafka\u0026#34; This takes about 1m 36s on my machine (2023 MacBook Pro M3 Max with 48 GB of shared RAM), after which there is a fully self-contained macOS/AArch64 binary native-kafka. To see how this one is used, refer to the launch script.\nThe binary supports two modes, setup and start. 
The former formats a Kafka log directory. As the primary use case is in containers, the set-up mode supports the overlay of a set of default configuration files with user-provided configuration provided via a volume mount, which is merged and then written out to another directory. For a quick test run we can overlay the default configuration from the Kafka distribution with the one from the container image for setting up a single node Kafka cluster and write the result to a new directory:\n1 2 3 4 5 6 7 8 9 mkdir native-conf export CLUSTER_ID=\u0026#34;5L6g3nShT-eMCtK--X86sw\u0026#34; # Obtain a unique id via \u0026#34;$(bin/kafka-storage.sh random-uuid)\u0026#34; ./native-kafka setup \\ --default-configs-dir core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT/config \\ --mounted-configs-dir docker \\ --final-configs-dir native-conf Formatting metadata directory /tmp/kraft-combined-logs with metadata.version 4.0-IV3 With the log directory being formatted, the actual Kafka broker can be run using the start mode like so:\n1 2 3 ./native-kafka start \\ --config docker/server.properties \\ -Dlog4j2.configurationFile=native-conf/log4j2.yaml Now, interestingly, this actually takes a fair bit longer to start than when run via Docker as in the previous post: about 220 ms from the first log message emitted by Kafka to the \u0026#34;Kafka Server started\u0026#34; message, vs the 120 ms I had observed via Docker. Which is kinda puzzling, considering that Linux containers are running in a virtual machine on macOS. 
It would be very interesting to learn why that’s the case, perhaps some more efficient library implementation in Linux when running in a container?\nThat being said, starting up the container itself takes about 340 ms on my machine (time from starting Docker up to the first Kafka log message), so running the native executable directly on macOS still is the fastest way to launch a Kafka broker.\n","id":50,"publicationdate":"Apr 7, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_kip_974_docker_image_for_graalvm_based_native_kafka_broker\"\u003eKIP-974: Docker Image for GraalVM based Native Kafka Broker\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003eWith the help of the GraalVM configuration developed for KIP-974 (Docker Image for GraalVM based Native Kafka Broker),\nyou can easily build a self-contained native binary for Apache Kafka.\nRead on to learn how you can build a native Kafka executable yourself,\nstarting in milliseconds, making it a perfect fit for development and testing purposes.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhen I wrote about \u003ca href=\"/blog/jep-483-aot-class-loading-linking/\"\u003eahead-of-time class loading and linking in Java 24\u003c/a\u003e recently,\nI also published the start-up time for Apache Kafka as a native binary for comparison.\nThis was done via Docker, as there’s no pre-built native binary of Kafka available for the operating system I’m running on, macOS.\nBut there is a native Kafka container image, so this is what I chose for the sake of convenience.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eNow, running in a container adds a little bit of overhead of course,\nso it wasn’t a surprise when Thomas Würthinger, 
lead of the GraalVM project at Oracle,\n\u003ca href=\"https://bsky.app/profile/thomaswue.dev/post/3lloypreatk2s\"\u003ebrought up the question\u003c/a\u003e what the value would be when running Kafka natively on macOS.\nNeedless to say I can’t let this kind of nice nerd snipe pass,\nso I set out to learn how to build a native Kafka binary on macOS, using GraalVM.\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","graalvm","native-image","performance"],"title":"Building a Native Binary for Apache Kafka on macOS","uri":"https://www.morling.dev/blog/building-native-binary-for-apache-kafka-macos/"},{"content":"","id":51,"publicationdate":"Apr 7, 2025","section":"tags","summary":"","tags":null,"title":"graalvm","uri":"https://www.morling.dev/tags/graalvm/"},{"content":"","id":52,"publicationdate":"Apr 7, 2025","section":"tags","summary":"","tags":null,"title":"native-image","uri":"https://www.morling.dev/tags/native-image/"},{"content":"","id":53,"publicationdate":"Mar 27, 2025","section":"tags","summary":"","tags":null,"title":"aot","uri":"https://www.morling.dev/tags/aot/"},{"content":" Table of Contents Building an AOT Cache for Apache Kafka AOT Caching With Apache Flink Summary In the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am exploring interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, relevant improvements to Java and the JVM, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. 
If you think there’s a specific subject I should take a look at, let me know in the comments below.\nJava 24 got released last week, and what a meaty release it is: more than twenty Java Enhancement Proposals (JEPs) have been shipped, including highlights such as compact object headers (JEP 450, I hope to spend some time diving into that one some time soon), a new class-file API (JEP 484), and more flexible constructor bodies (JEP 492, third preview). One other JEP which might fly a bit under the radar is JEP 483 (\u0026#34;Ahead-of-Time Class Loading \u0026amp; Linking\u0026#34;). It promises to reduce the start-up time of Java applications without requiring any modifications to the application itself, what’s not to be liked about that? Let’s take a closer look!\nJEP 483 is part of a broader OpenJDK initiative called Project Leyden, whose objective is to reduce the overall footprint of Java programs, including start-up time and time to peak performance. Eventually, its goal is to enable ahead-of-time compilation of Java applications, thus providing an alternative to GraalVM and its support for AOT native image compilation, which has seen tremendous success and uptake recently. AOT class loading and linking is the first step towards this goal within Project Leyden. It builds upon the Application Class Data Sharing (AppCDS) feature available in earlier Java versions. While AppCDS only reads and parses the class files referenced by an application and dumps them into an archive file, JEP 483 also loads and links the classes and caches that data. In other words, even more work is moved from application runtime to build time, thus resulting in further reduced start-up times.\nAs is the case with AppCDS, a training run is required for creating the AOT cache file. 
During that run, you should make sure that the right set of classes gets loaded: when not loading all the classes required by an application, the AOT cache is not utilized to the fullest extent and the JVM will fall back to loading them on demand at runtime. On the other hand, when loading classes actually not used by an application at runtime (for instance classes of a testing framework), the size of the cache file gets bloated without any benefit. The classpath must be consistent between training run and actual application run: the same JAR files must be present, in the same order. The runtime classpath may be amended with additional JARs though, which naturally will not feed into the AOT cache.\nLet’s put AOT class loading and linking into action using Apache Kafka as an example. While the start-up overhead of a long-running component like a Kafka broker typically may not be that relevant, it absolutely can make a difference when for instance frequently starting and stopping brokers during development and testing.\nBuilding an AOT Cache for Apache Kafka Coincidentally, Apache Kafka 4.0 was released last week, too. So let’s download it and use it for our experiments. Unpack the distribution and format a directory for the Kafka files:\n1 2 3 tar xvf kafka_2.13-4.0.0.tgz KAFKA_CLUSTER_ID=\u0026#34;$(bin/kafka-storage.sh random-uuid)\u0026#34; bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties Building an AOT cache is a two-step process. First, a list of all the classes which should go into the archive needs to be generated. This list is then used for creating the archive itself. 
This feels a bit more convoluted than it should be, and indeed the JEP mentions that simplifying this is on the roadmap.\nCreate the class list like so:\n1 2 export EXTRA_ARGS=\u0026#34;-XX:AOTMode=record -XX:AOTConfiguration=kafka.aotconf\u0026#34; (1) bin/kafka-server-start.sh config/server.properties 1 The EXTRA_ARGS variable can be used to pass any additional arguments to the JVM when launching Kafka, in this case to specify that the list of classes for the AOT cache should be recorded in the file kafka.aotconf As an aside, Kafka has completely parted ways with ZooKeeper as of the 4.0 release and exclusively supports KRaft for cluster coordination. By using the server.properties file, our single broker runs in the so-called \u0026#34;combined\u0026#34; mode, so it has both the \u0026#34;broker\u0026#34; and \u0026#34;controller\u0026#34; roles. Very nice to see how simple things have become here over the years!\nOnce Kafka has started, open a separate shell window. Create a topic in Kafka, then produce and consume a couple of messages like so:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 bin/kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 Created topic my-topic. bin/kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092 \u0026gt;hello \u0026gt;world \u0026lt;Ctrl + C\u0026gt; bin/kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092 hello world \u0026lt;Ctrl + C\u0026gt; Processed a total of 2 messages This shows the trade-off involved when creating AOT cache files: we don’t have to produce and consume messages here, but in all likelihood this will trigger the loading of classes which otherwise would be loaded and linked at runtime only. 
It may be a good idea to monitor which classes get loaded via JDK Flight Recorder, thus making sure you are indeed capturing the relevant set when creating the AOT cache file.\nStop the broker by hitting \u0026lt;Ctrl + C\u0026gt; in the session where you started it. If you take a look at the kafka.aotconf file, you’ll see that it essentially is a long list of classes to be cached, as well as other class-related metadata. The comment at the top still hints at the history of Leyden’s AOT support being built on top of CDS:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 # NOTE: Do not modify this file. # # This file is generated via the -XX:DumpLoadedClassList=\u0026lt;class_list_file\u0026gt; option # and is used at CDS archive dump time (see -Xshare:dump). # java/lang/Object id: 0 java/io/Serializable id: 1 java/lang/Comparable id: 2 java/lang/CharSequence id: 3 java/lang/constant/Constable id: 4 java/lang/constant/ConstantDesc id: 5 java/lang/String id: 6 java/lang/reflect/AnnotatedElement id: 7 java/lang/reflect/GenericDeclaration id: 8 java/lang/reflect/Type id: 9 java/lang/invoke/TypeDescriptor id: 10 ... Next, let’s try and create the actual AOT cache file. To do so, specify the -XX:AOTMode=create option. 
Note that the application is not actually executed during this process, instead the JVM will only create the AOT cache file and exit again:\n1 2 export EXTRA_ARGS=\u0026#34;-XX:AOTMode=create -XX:AOTConfiguration=kafka.aotconf -XX:AOTCache=kafka.aot\u0026#34; (1) bin/kafka-server-start.sh config/server.properties 1 Create the AOT cache using the previously created configuration file Uh, oh, something isn’t quite working as expected:\n1 2 3 4 5 java.lang.IllegalArgumentException: javax.management.NotCompliantMBeanException: com.sun.management.UnixOperatingSystemMXBean: During -Xshare:dump, module system cannot be modified after it\u0026#39;s initialized at java.management/javax.management.StandardMBean.\u0026lt;init\u0026gt;(StandardMBean.java:270) at java.management/java.lang.management.ManagementFactory.addMXBean(ManagementFactory.java:882) at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$1(ManagementFactory.java:474) ... This message was a bit confusing to me—​I don’t think I’m interacting with the Java module system in any way? So I sent a message to the leyden-dev mailing list, where I learned that this may be triggered by starting the JMX agent of the JVM. While I was not actively doing that, indeed this is the case by default as per the run-class.sh launcher script coming with the Kafka distribution. 
So let’s disable JMX diagnostics and try again:\n1 2 export KAFKA_JMX_OPTS=\u0026#34; \u0026#34; bin/kafka-server-start.sh config/server.properties Some of the classes are skipped for different reasons, but overall, things look much better this time:\n1 2 3 4 5 6 7 8 [0.908s][warning][cds] Preload Warning: Verification failed for org.apache.logging.log4j.core.async.AsyncLoggerContext [2.307s][warning][cds] Skipping org/slf4j/Logger: Old class has been linked [2.307s][warning][cds,resolve] Cannot aot-resolve Lambda proxy because org.slf4j.Logger is excluded [2.613s][warning][cds ] Skipping jdk/internal/event/Event: JFR event class [2.615s][warning][cds ] Skipping org/apache/logging/slf4j/Log4jLogger: Unlinked class not supported by AOTClassLinking [2.615s][warning][cds ] Skipping org/apache/logging/slf4j/Log4jLoggerFactory: Unlinked class not supported by AOTClassLinking ... AOTCache creation is complete: kafka.aot A tad concerning that Log4j’s AsyncLoggerContext class fails verification, but we’ll leave analysis of that for another time. The AOT cache file has a size of 66 MB in this case. It is considered an implementation detail and as such is subject to change between Java versions. Now let’s see what’s the impact of using the AOT cache on Kafka’s start-up time. To do so, simply specify the name of the cache file when running the application:\n1 2 export EXTRA_ARGS=\u0026#34;-XX:AOTCache=kafka.aot\u0026#34; bin/kafka-server-start.sh config/server.properties I’ve measured the start-up time by comparing the timestamp of the very first log message emitted by Kafka to the timestamp of the message saying \u0026#34;Kafka Server started\u0026#34;, always starting from a freshly formatted Kafka logs directory and flushing the page cache in between runs. Averaged over five runs, this took 285 ms on my machine (a 2023 MacBook Pro with M3 Max processor and 48 GB shared memory). In comparison, Kafka took 690 ms to start without the archive, i.e. 
the AOT cache makes for a whopping 59% reduction of start-up time in this scenario.\nWhen building the AOT cache, you can also disable AOT class loading and linking by specifying the -XX:-AOTClassLinking option, effectively resulting in the same behavior you’d get when using AppCDS on earlier Java versions. This results in a Kafka start-up time of 327 ms on my laptop, i.e. the lion’s share of the improvement in the case at hand indeed originates from reading and parsing the class files ahead of time, with AOT loading and linking them only yielding a relatively small improvement in addition. Finally, I’ve also measured how long it takes to start the Kafka native binary in a Docker container (see KIP-974), which took 118 ms, i.e. less than half of the time it took with the AOT cache. Keep in mind though that this image is considered experimental and not ready for production, whereas there shouldn’t be any concern of that kind when running Kafka with the AOT cache on the JVM.\nAOT Caching With Apache Flink As mentioned before, apart from testing scenarios, Kafka typically is a long-running workload, and as such, start-up times don’t matter that much in the grand scheme of things. To add another data point, I’ve also tested how beneficial AOT class loading and linking is for a simple Apache Flink job.\nNow, Flink jobs usually are deployed by uploading them as a JAR to a Flink cluster, after which their code is loaded with a custom classloader. As of today, JEP 483 doesn’t support AOT class loading and linking with user-defined class loaders, though (the JEP suggests that this limitation may be lifted in a future Java version). This means that only Flink’s built-in classes would benefit from AOT, while any classes of a Flink job and its dependencies would be excluded. 
For my experimentation I’ve therefore decided to go with Flink’s mini-cluster deployment, a simplified mode of using Flink in a non-distributed manner, just by running the job’s main class.\nThe test job uses the Flink connector for Apache Kafka to read a message from a Kafka topic. I measured the time-to-first-message after starting the job: without the AOT cache (again averaged over five runs), this took 1.875 seconds on my machine, vs. 0.913 seconds with the AOT cache. A 51% reduction of time-to-first-message in this scenario, very nice! Using the AOT cache without loading and linking classes yielded a 40% improvement over the default behavior (1.118 seconds). I couldn’t test Flink as a GraalVM native binary; if you are aware of any work towards making that a reality, I’d love to hear from you!\nSummary AOT class loading and linking is a very welcome addition to Java. Built upon the previously existing concepts of CDS and AppCDS, it helps to further cut down the start-up time of JVM-based applications, by moving the process of loading and linking classes ahead to build time. The actual impact will vary between specific applications; for Kafka and a basic Flink job I could observe a reduction of 59% and 51% of start-up time, respectively.\nWhile start-up times don’t matter that much for long-running workloads, they can make a huge difference in cloud-native scenarios where applications are dynamically scaled out, spinning up new instances on demand as the load of incoming requests increases. 
Also think of scale-to-zero deployments, preview jobs for real-time queries in a cloud-based stream processing solution, CLI utilities, starting up resources such as Kafka for integration tests, and many more. Whenever a human is waiting for a process to come up and provide a response, every bit of time you can save will result in a better user experience immediately.\nThe great thing about the AOT machinery provided by Project Leyden and JEP 483 is that it requires no modifications whatsoever to your application code. It can be used with any Java application, providing potentially significant reductions to start-up times essentially for free. The required training run feels a bit cumbersome in its current form, but the JEP suggests that improvements in that area will be made in future revisions. In fact, there’s a draft JEP already which provides some more details on what this might look like. In general, the requirement of a training run can be challenging from a software development lifecycle perspective, in particular when considering (immutable) container images, for instance when deploying to Kubernetes. The application will have to be executed at image build time, also performing some work to trigger loading and linking all relevant classes, potentially requiring remote resources such as a database, too. This may not always be trivial to do.\nThe big elephant in the room is how Project Leyden compares to GraalVM, the other Java AOT technology developed by Oracle. As far as I can tell, there’s quite a bit of overlap between the goals of the two projects. At this point, GraalVM is much more advanced than Leyden, with full support for AOT compilation, not only providing even more impressive improvements to start-up times (a Java application can start in a few milliseconds when compiled into a native binary using GraalVM) but also yielding a significant reduction of memory usage. 
On the downside, applications and their dependencies typically need adjustment and more or less complex configuration in order to make use of GraalVM’s AOT compilation (frameworks like Quarkus can help with this task). Furthermore, the closed-world assumption underlying GraalVM prevents the dynamism the JVM is known for, such as loading classes at application runtime for plug-in use cases, modifying or even generating classes on the fly, etc.\nIn that regard it will be interesting to see what Project Leyden will come up with in this space. It also seeks to support AOT compilation eventually, but is exploring a middle ground between a highly constrained closed-world assumption and full dynamism, for instance by providing developers with means for specifying which modules of their application may be subject to class redefinitions and which ones may not. Besides faster start-up times, another goal here is faster warm-up, i.e. a faster time to peak performance.\nAfter being kicked off in 2020, Leyden went quiet for quite some time, but it has picked up steam again more recently, with JEP 483 being one of the first actual deliverables. It’ll definitely be worth keeping your eyes open for the other Leyden JEPs, AOT code compilation and AOT method profiling. 
Currently in draft state, there’s no target Java version known for those, but early access builds can already be obtained from the OpenJDK website.\n","id":54,"publicationdate":"Mar 27, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_building_an_aot_cache_for_apache_kafka\"\u003eBuilding an AOT Cache for Apache Kafka\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_aot_caching_with_apache_flink\"\u003eAOT Caching With Apache Flink\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003eIn the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am exploring interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, relevant improvements to Java and the JVM, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. 
If you think there’s a specific subject I should take a look at, let me know in the comments below.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://openjdk.org/projects/jdk/24/\"\u003eJava 24\u003c/a\u003e got released last week,\nand what a meaty release it is:\nmore than twenty Java Enhancement Proposals (JEPs) have been shipped,\nincluding highlights such as compact object headers (\u003ca href=\"https://openjdk.org/jeps/450\"\u003eJEP 450\u003c/a\u003e, I hope to spend some time diving into that one some time soon),\na new class-file API (\u003ca href=\"https://openjdk.org/jeps/484\"\u003eJEP 484\u003c/a\u003e),\nand more flexible constructor bodies (\u003ca href=\"https://openjdk.org/jeps/492\"\u003eJEP 492\u003c/a\u003e, third preview).\nOne other JEP which might fly a bit under the radar is \u003ca href=\"https://openjdk.org/jeps/483\"\u003eJEP 483\u003c/a\u003e (\u0026#34;Ahead-of-Time Class Loading \u0026amp; Linking\u0026#34;).\nIt promises to reduce the start-up time of Java applications without requiring any modifications to the application itself,\nwhat’s not to be liked about that?\nLet’s take a closer look!\u003c/p\u003e\n\u003c/div\u003e","tags":["java","performance","aot"],"title":"Let's Take a Look at... JEP 483: Ahead-of-Time Class Loading \u0026 Linking!","uri":"https://www.morling.dev/blog/jep-483-aot-class-loading-linking/"},{"content":"","id":55,"publicationdate":"Mar 18, 2025","section":"tags","summary":"","tags":null,"title":"microservices","uri":"https://www.morling.dev/tags/microservices/"},{"content":" For building a system of distributed services, one concept I think is very valuable to keep in mind is what I call the synchrony budget: as much as possible, a service should minimize the number of synchronous requests which it makes to other services.\nThe reasoning behind this is two-fold: synchronous calls are costly. 
The more synchronous requests you are doing, the longer it will take to process inbound requests to your own service; users don’t like to wait and might decide to take their business elsewhere if things take too long. Secondly, synchronous requests impact the availability of your service, because all the invoked services must be up and running in order for your service to work. The more services you rely on in a synchronous manner, the lower the availability of your service will be.\nSynchronous calls are tools that can help assure consistency, but by design they block progression until complete. In that sense, the idea of the synchrony budget is not about a literal budget which you can spend, but rather about being mindful how you implement communication flows between services: as asynchronous as possible, as synchronous as necessary.\nLet’s make things a bit more tangible by looking at an example. Consider an e-commerce website where users can place purchase orders. When an order comes in, the order entry service needs to interact with a couple of other services in order to process that order:\na payment service for processing the payment of the customer\nan inventory service for allocating stock of the purchased item\na shipment service for triggering the fulfillment of the order\nLet’s start with the last one, the shipment service. Does it matter to the customer who is placing an order when exactly the shipment service receives that notification? Not at all. Hence, notifying the shipment service synchronously from within the order entry request handler would be a waste of our synchrony budget. Not only would it cause that inbound request to take longer than it has to, it would also cause the order entry request to fail when the shipment service isn’t available, for instance due to maintenance, a network split, or some other kind of failure. 
Also, we don’t need to report any response from the shipment service back to the client making the inbound order placement request. This makes this call a perfect candidate for asynchronous execution, for instance by having the order service send a message to a Kafka topic, which then gets consumed by the shipment service. That way, the order service request isn’t slowed down by awaiting a response from the shipment service, and any downtime of the shipment service won’t affect the order service’s availability. The shipment service will just process any pending messages from the Kafka topic when it is up again. In general, whenever one service solely needs to notify another service about something that happened, defaulting to asynchronous communication makes a lot of sense.\nIn a similar spirit, if any changed data should be propagated from an OLTP data store to an OLAP system, this should be done asynchronously. By definition, analytical queries issued against the latter don’t require instantaneous visibility into each single data change as it is occurring in the OLTP system. So sending synchronous requests to an OLAP store would be another good example of unnecessarily spending your synchrony budget.\nNow, what if our messaging infrastructure, such as Kafka, can’t be reached? Aren’t we back to square one? We might envision some means of buffering for that case, such as storing the messages to be sent in some local state store and sending them out once connectivity to Kafka has been restored. Luckily, we don’t have to reinvent the wheel here: the outbox pattern is a well-established approach for channeling outgoing messages through a service’s data store, transactionally consistent with any other data changes that need to be done at the same time. Tools for log-based change data capture (CDC), such as Debezium, can be used for extracting the messages from an outbox table with low overhead and high performance. 
That way, the only stateful resource which is required by a service to process incoming requests is its own database.\nLet’s look at the communication with the inventory service next. When the order service processes an incoming request, it needs to know whether the specified item is available in the desired quantity. This differs from the notification semantics used for communicating with the shipment service, as we do need data from the inventory service in order to process the inbound request. So should we make a synchronous call in this case? It certainly could be an option, but again it would eat into our synchrony budget: there’d be an impact on our response times, and what should we do in case the inventory service isn’t available? Should the incoming request be failed? But not accepting customer requests because of some internal technical hiccup doesn’t sound that desirable.\nReversing the communication flow can be a way out: the inventory service could publish a feed of inventory changes, pushing a message to Kafka whenever there’s an inventory update. The order service could subscribe to this feed and materialize a view of this data in its own local data store. That way, no synchronous calls between services are required when processing an order request; this can be done solely by querying the order service’s database. The change feed of the inventory service could again be implemented via the outbox pattern; another option would be to use CDC for capturing changes in the actual business tables in the inventory database and then leverage stream processing, for instance with Apache Flink, to establish a stable data contract for that data stream. 
That way, consumers like the order service are shielded from any potentially disruptive changes to the inventory service’s data model and the stream processor can handle denormalizing relational tables to provide consumers with fully contextualized events.\nOf course, there is a trade-off here: as updates to the order service’s view of the inventory data happen asynchronously, we might run into a situation where that view is outdated and a request for an item gets accepted, while it actually is not in stock any more. In practice, Debezium and Kafka can propagate data changes with sub-second latency end-to-end, so the time window for errors will be very small during normal operation. But it also helps to take a step back and look at things from a business perspective: reality isn’t transactional to begin with. I remember a birthday party a few years back where one of my friends was on call and had to patch the inventory table of an e-commerce application after a rack of flowers had been knocked over in the warehouse. In other words, a business needs to have means of dealing with situations like this in any case. In all likelihood, we’ll be better off sending a customer a $10 voucher as an apology in the rare case of accepting an order for an item without inventory, instead of spending our synchrony budget and establishing a synchronous call flow for this process.\nNow, let’s look at the communication with the payment service. Depending on the specifics, this one actually may be a case where a synchronous call is justified. When for instance building a flight booking system, you really want to be 100% sure that the credit card of the customer can be charged successfully, before acknowledging a booking request. Replicating the data of all credit cards and bank accounts in the world obviously isn’t possible, so the call flow can’t be reversed either. It’s for a reason that payment processor APIs are built with extremely high availability in mind. 
And this is what the notion of the synchrony budget is about: implement inter-service calls asynchronously whenever it’s possible, so you have the room to make synchronous calls if and when it’s absolutely required. That being said, for an e-commerce application it may actually be feasible to make synchronous calls to the payment service by default, but fall back to asynchronous processing in case of failures. As the contract to sell typically only gets accepted when an item gets shipped, you still have the room to cancel an order if a payment falls through on the asynchronous processing path.\nFinally, here’s what our overall solution for the data flows relevant to the order service could look like, applying the mental model of a synchrony budget:\n","id":56,"publicationdate":"Mar 18, 2025","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor building a system of distributed services, one concept I think is very valuable to keep in mind is what I call the \u003cem\u003esynchrony budget\u003c/em\u003e:\nas much as possible, a service should minimize the number of synchronous requests which it makes to other services.\u003c/p\u003e\n\u003c/div\u003e","tags":["architecture","distributed-systems","microservices"],"title":"The Synchrony Budget","uri":"https://www.morling.dev/blog/the-synchrony-budget/"},{"content":" Table of Contents Towards Queue Support in Kafka—​Introducing Share Groups Share Groups in Action Retry Behavior and State Management Share Group State Persistence Summary and Outlook In the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am going to explore interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. 
If you think there’s a specific subject I should take a look at, let me know in the comments below!\nThat guy above? Yep, that’s me, whenever someone says \u0026#34;Kafka queue\u0026#34;. Because, that’s not what Apache Kafka is. At its core, Kafka is a distributed durable event log. Producers write events to a topic, organized in partitions which are distributed amongst the brokers of a Kafka cluster. Consumers, organized in groups, divide the partitions they process amongst themselves, so that each partition of a topic is read by exactly one consumer in the group.\nThis partition-based design defines two of Kafka’s key characteristics:\nThe maximum degree of consumer parallelism: Each partition is processed by not more than one consumer; in order to increase the number of consumers processing a topic, it needs to be split up into more partitions, which implies a potentially costly repartitioning operation for existing topics with a large amount of data.\nOrdered processing of messages: All messages with the same partitioning key will be sent to the same partition which is processed by a single consumer.\nThese semantics make Kafka a great foundation for a large variety of high volume data streaming use cases such as click stream processing, metrics and log ingestion, real-time ETL and analytics, microservices data exchange, fraud detection, and many more. On the flip side, Kafka, as is, is not a good fit for use cases requiring queuing semantics, where you’d like to process messages one by one, potentially scaling out consumers way beyond the number of partitions in a topic. In particular, consumers as of today commit the progress they’ve made within a partition by means of persisting the offset of the last message they’ve processed. It is not possible to acknowledge or reject individual messages. 
This leads to a problem known as \u0026#34;head-of-line blocking\u0026#34;: if a given message can’t be consumed for whatever reason, or if it just takes very long to do so, that consumer can’t easily move beyond that message.\nIn Kafka terminology, the elements of a topic are referred to as \u0026#34;records\u0026#34;, with \u0026#34;message\u0026#34; oftentimes being used interchangeably. Personally, I am using the former when referring to the technical concept of an entry of a log, whereas I’m using \u0026#34;message\u0026#34; (or \u0026#34;event\u0026#34;, depending on the specific use case) when discussing the semantic entity which is represented by a record.\nOne common example of this is job queueing: you’d like to submit unrelated work items to a queue, from where they are picked up and processed as quickly as possible by a set of independent workers. Each item should be processed in isolation, i.e. while one worker is consuming an item from the queue, another worker should be able to pick up the next one in parallel, without having to await successful handling of the first one. If there are many work items, or if they take a longer time to process, it should be possible to add more workers to ensure a reasonable overall throughput of the system. A work item which cannot be processed for some reason should not hold up the processing of subsequent items.\nWhile some efforts were made to support this kind of use case when using Kafka, for instance in the form of Confluent’s parallel consumer, actual queue implementations such as ActiveMQ Artemis or RabbitMQ were traditionally better suited for this. 
To learn more about the fundamental differences between event logs and queues, and why it can be interesting to implement the latter on top of the former, refer to this excellent blog post by Jack Vanlightly.\nAs of Kafka 4.0—​due in a couple of weeks—​things will change though: after two years of work, an Early Access of KIP-932: Queues for Kafka is part of this release. It promises to add queue-like semantics to Kafka. Let’s take a look!\nTowards Queue Support in Kafka—​Introducing Share Groups At the core of KIP-932 are so-called share groups : expanding the existing notion of Kafka consumer groups, a share group is a set of cooperative consumers processing the messages from a topic. Unlike consumer groups though, multiple members of a share group can process the messages on one and the same partition. This means that there can be more (active) members in a share group than there are partitions, and a high degree of consumer parallelism can be achieved also when having just a few or even only a single partition. Membership in a share group is coordinated using the new consumer rebalance protocol introduced in Kafka 4.0 via KIP-898. A partition consumed by a share group is called a share partition.\nMessages can be acknowledged individually, allowing for much more flexibility than the offset-based approach of consumer groups. A broker-side component called the share-partition leader manages the state of in-flight messages, distributing them to the members of the share group. The share-partition leader is co-located with the leader of the partition, i.e. 
it’s currently not supported to use share groups and thus Kafka queues when reading from a follower node in the Kafka cluster.\nThe messages in a share-partition go through a life cycle of distinct states as shown below:\nThe share-partition leader processes messages which are eligible for consumption on a share-partition via a sliding window, demarcated by a lower offset called the share-partition start offset (SPSO) and a higher offset called the share-partition end offset (SPEO) . All messages before the SPSO are in the Archived state, all messages after the SPEO are in Available state. The messages within the window are called in-flight messages. When a consumer fetches messages, the leader will search for available messages in the in-flight window, mark them as acquired, and return them in a batch to the consumer. To limit memory consumption on the broker, the maximum number of messages in Acquired state can be controlled via the group.share.partition.max.record.locks configuration setting. When processing a message, a consumer may\nacknowledge it as successfully consumed, transitioning it to Acknowledged state,\nrelease it, transitioning it back to Available state and thus making it available for redelivery, or\nreject it, transitioning it to Archived state, marking it as unprocessable.\nEvery message has a delivery counter, which gets increased each time it gets acquired. The maximum number of deliveries is limited using the group.share.delivery.attempt.limit broker option, preventing an infinite retry loop of consuming some unprocessable message (\u0026#34;poison pill\u0026#34;).\nOne key aspect to understand is that the specific message states exist exclusively within the scope of a specific share group; this means that for instance a message may be rejected by one share group but be processed successfully by another. A share group may also be reset, allowing it to reprocess all the messages of a topic, or all the messages after a given timestamp. 
The Kafka distribution provides a new script, bin/kafka-share-groups.sh , for this purpose.\nAs the available messages on a share-partition are distributed amongst the members of the share group, there’s no guarantee in regards to the order of processing. Depending on specific timing behaviors, potential retries, etc., messages with higher offsets may be consumed before messages with lower offsets in the same partition. This is in stark contrast to how traditional Kafka consumer groups work, where the messages in one partition are always consumed in order of increasing offset. The KIP mentions that ordering of messages within a single batch is guaranteed to be in increasing offset order, but I’m not sure how useful this is going to be in practice, given consumers lack control over which messages end up in a given batch.\nOn the other hand it could be very useful for certain use cases to have guaranteed ordering for the messages with one and the same key. Consider for instance an ETL use case consuming data change events produced by a CDC tool such as Debezium. The source record’s primary key is used as the Kafka message key in this scenario, ensuring all change events for a given record are written to the same partition of the corresponding Kafka topic. With regular consumer groups, ordering of events for the same key is ensured, which is vital to make sure that the destination of such a pipeline receives the change events in the correct order, for instance when considering two subsequent updates to a record.\nBut arguably, the partition-based ordering is too coarse-grained in this scenario, as the order of events across keys typically doesn’t matter (and where it does matter, it would have to be global for the entire topic, not just a single partition). This comes at the price of reduced flexibility to parallelize and scale out the consumer, as described above. 
In contrast, share groups essentially don’t provide strong ordering guarantees, making them not suitable for this use case. If there was support for strong key-based ordering, that’d be a very useful middle ground between scalability and the provided semantics. It would be great to see this in a future version of queue support for Apache Kafka.\nShare Groups in Action Let’s shift gears a bit and take a look at how share groups can be used from within a Java application. At the time of writing, there’s no preview build of Apache Kafka 4.0 available yet, so I’ve built Kafka and its client libraries from source, which luckily is as straight forward as running the following:\n1 ./gradlew releaseTarGz publishToMavenLocal This will yield a Kafka distribution archive under core/build/distributions/kafka_2.13-4.1.0-SNAPSHOT and install the client libraries into the local Maven repository.\nAs of the Kafka 4.0 release, share groups are an early access feature, not meant for production usage yet. As such, the feature needs to be enabled explicitly. To do so, add the following settings to your broker configuration file (for more details, see the release notes as well as the KIP, which provides a list of all new configuration options added for share group support):\n1 2 unstable.api.versions.enable=true group.coordinator.rebalance.protocols=classic,consumer,share The Kafka client library contains a new API, KafkaShareConsumer, which exposes the new queue and share group semantics. Its overall programming model is very similar to the existing KafkaConsumer API, simplifying the transition from one to the other. For console-based access, the Kafka distribution contains a new shell script, kafka-console-share-consumer.sh , similar to kafka-console-consumer.sh known from previous Kafka versions.\nThe share consumer supports two working modes: implicit and explicit acknowledgement of messages. 
When using implicit mode, message acknowledgements will be committed automatically for the entire batch of messages processed by the consumer. In the simplest case, this happens for the previous batch when calling poll() again:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Properties props = new Properties(); props.setProperty(\u0026#34;bootstrap.servers\u0026#34;, \u0026#34;localhost:9092\u0026#34;); props.setProperty(\u0026#34;group.id\u0026#34;, \u0026#34;my-share-group\u0026#34;); KafkaShareConsumer\u0026lt;String, String\u0026gt; consumer = new KafkaShareConsumer\u0026lt;\u0026gt;( props, new StringDeserializer(), new StringDeserializer()); consumer.subscribe(Arrays.asList(\u0026#34;my-topic\u0026#34;)); while (true) { ConsumerRecords\u0026lt;String, String\u0026gt; records = consumer.poll( Duration.ofMillis(100)); (1) for (ConsumerRecord\u0026lt;String, String\u0026gt; record : records) { process(record); } } 1 Fetch the next batch of messages, implicitly acknowledging the messages of the previous batch This approach lacks fine-grained control over acknowledgements, but it can be interesting if your primary interest in using share groups is to increase the number of workers in a consumer group beyond the partition count. For a typical queueing use case however, you’ll want message-level acknowledgements. This can be achieved via the ShareConsumer::acknowledge() method. 
It takes a record and an acknowledge type:\nwhile (true) { ConsumerRecords\u0026lt;String, String\u0026gt; records = consumer.poll( Duration.ofMillis(100)); for (ConsumerRecord\u0026lt;String, String\u0026gt; record : records) { if (isProcessable(record)) { process(record); consumer.acknowledge(record, AcknowledgeType.ACCEPT); (1) } else if (isRetriable(record)) { consumer.acknowledge(record, AcknowledgeType.RELEASE); (1) } else { consumer.acknowledge(record, AcknowledgeType.REJECT); (1) } } consumer.commitSync(); (2) } 1 Acknowledge a message 2 Synchronously commit the acknowledgement state of all messages of the batch The acknowledge type can be one of the following:\nACCEPT, if the message could be processed successfully\nRELEASE, if the message cannot be processed due to some transient error, i.e. it may be processed successfully when retrying later on\nREJECT, if the message cannot be processed and also is not retriable\nThe acknowledgement status for a given message will only actually be committed by calling commitSync(). If the consumer crashes after calling acknowledge() but before the commit happens, all messages from the batch will be presented to a consumer of the group again. When not calling commitSync(), the next invocation of poll() will commit automatically. This happens asynchronously though, which means you might receive a new batch of messages while the commit of the acknowledgement status of a previous batch fails.\nWhen releasing a message for retrying, it will be part of a subsequent batch until the maximum delivery count for the message has been reached, in which case it will transition to Archived state, without having been processed. If required, a message’s delivery count can be obtained from the ConsumerRecord. This allows you for instance to log a record when it hits the retry limit before archiving it.\nNewly created share groups start processing from the latest offset by default. 
If you want it to start from the beginning of the input topic(s) instead, you need to set the newly added configuration property share.auto.offset.reset to earliest. Unlike the well-known auto.offset.reset option, this is not a consumer configuration option, but a group configuration option. You can use the AdminClient API for setting it:\n1 2 3 4 5 6 7 8 9 10 11 12 13 try (AdminClient client = AdminClient.create(adminProperties)) { ConfigEntry entry = new ConfigEntry(\u0026#34;share.auto.offset.reset\u0026#34;, \u0026#34;earliest\u0026#34;); AlterConfigOp op = new AlterConfigOp(entry, AlterConfigOp.OpType.SET); Map\u0026lt;ConfigResource, Collection\u0026lt;AlterConfigOp\u0026gt;\u0026gt; configs = Map.of( new ConfigResource( ConfigResource.Type.GROUP, SHARE_GROUP), Arrays.asList(op)); try (Admin admin = AdminClient.create(adminProperties)) { admin.incrementalAlterConfigs(configs).all().get(); } } Message-level acknowledgement is a key improvement to Kafka, enabling use cases like job queuing which were not well supported before. At the same time, the feature still feels relatively basic at this point.\nMost importantly, there’s no notion of a dead letter queue (DLQ) as of the Apache Kafka 4.0 release. Once an unprocessable message has been archived, there’s no way of identifying it. For many use cases it will be required to either have means for retrieving the unprocessable messages with an offset smaller than the SPSO or, better yet, to have bespoke DLQ support, i.e. a dedicated topic to which unprocessable messages are sent automatically. In scenarios where there’s a dependency between messages with the same key, it would also be desirable to send all subsequent messages to the DLQ once one message with a given key got DLQ-ed, until that issue has been resolved. As of today, this is something you’d have to build entirely yourself.\nAnother useful enhancement would be more flexible retrying behaviors. 
In the current form of Kafka queues, a released message will be retried immediately; there’s no support for delaying retries (e.g. via exponential back-off) or for configuring a scheduled redelivery. This means that all available retry attempts will happen very quickly, which isn’t ideal for dealing with transient failures such as not being able to connect to an external service. Retrying within a short period of time may not be useful in this situation, while retrying after 30 or 60 minutes could.\nAll that being said, the support for queue semantics in Kafka 4.0 is an early access feature after all, and I’m sure all kinds of improvements can and will be made in subsequent releases. In particular, DLQ support is explicitly mentioned in the KIP as a future extension.\nRetry Behavior and State Management Let’s dig a bit deeper and explore how retries are currently handled by the share group API. To do so, I’ve built a share consumer which processes some messages as shown in the in-flight records example in the KIP:\nThe messages on the topic have a String value which matches their offset: \u0026#34;0\u0026#34;, \u0026#34;1\u0026#34;, \u0026#34;2\u0026#34;, etc. 
The process logic looks like follows:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 System.out.println(\u0026#34;Record | Status | Delivery Count\u0026#34;); System.out.println(\u0026#34;--------------------------------\u0026#34;); while (true) { ConsumerRecords\u0026lt;String, String\u0026gt; records = consumer.poll( Duration.ofMillis(100)); for (ConsumerRecord\u0026lt;String, String\u0026gt; record : records) { String status = switch(record.value()) { case \u0026#34;1\u0026#34;, \u0026#34;5\u0026#34; -\u0026gt; { consumer.acknowledge(record, AcknowledgeType.ACCEPT); yield \u0026#34;ACKED\u0026#34;; } case \u0026#34;3\u0026#34;, \u0026#34;7\u0026#34;, \u0026#34;8\u0026#34;, \u0026#34;9\u0026#34; -\u0026gt; { consumer.acknowledge(record, AcknowledgeType.RELEASE); yield \u0026#34;AVAIL\u0026#34;; } case \u0026#34;6\u0026#34; -\u0026gt; { consumer.acknowledge(record, AcknowledgeType.REJECT); yield \u0026#34;ARCHV\u0026#34;; } // doing nothing, i.e. remain in Acquired state default -\u0026gt; { yield \u0026#34;ACQRD\u0026#34;; } }; System.out.println(String.format(\u0026#34;%s | %s | %s\u0026#34;, record.value(), status, record.deliveryCount().get())); } consumer.commitSync(); } Starting from the beginning of the topic, here’s the output of the first polling iteration:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Record | Status | Delivery Count -------------------------------- 0 | ACKED | 1 1 | ACKED | 1 2 | ACQRD | 1 3 | AVAIL | 1 4 | ACQRD | 1 5 | ACKED | 1 6 | ARCHV | 1 7 | AVAIL | 1 8 | AVAIL | 1 9 | AVAIL | 1 2 | ACQRD | 1 4 | ACQRD | 1 2 | ACQRD | 1 4 | ACQRD | 1 2 | ACQRD | 1 4 | ACQRD | 1 ... The first ten lines—​corresponding to the first batch returned by the poll() call—​are not too surprising: all messages are processed as expected. But then something interesting is happening: messages 2 and 4 (but not messages 3, 7, 8, 9 in Available state) are retrieved again and again. 
As it turns out, messages in Acquired status are returned indefinitely by poll() until they are acknowledged. This happens purely client-side, i.e. reaching the broker-side maximum lock duration (configured via group.share.record.lock.duration.ms, defaulting to 30s) does not cause an interruption here, which may be surprising. Also note that the delivery count is not increased in this case. After speaking to the engineering team working on this feature I learned that exact behaviors and semantics are still in flux here (the API is marked as unstable at this point), so you probably are going to see some changes here with the 4.1 release.\nAn exception is only triggered when actually acknowledging a message and trying to commit after the maximum lock duration has been reached. It is not actually raised though; instead you need to examine the partition-exception map returned by commitSync():\nMap\u0026lt;TopicIdPartition, Optional\u0026lt;KafkaException\u0026gt;\u0026gt; syncResult = consumer.commitSync(); System.out.println(syncResult); // output adjusted for readability: // { oj_vK_XvQeSrL58aI81r1g:my-topic-0=Optional[org.apache.kafka.common.errors.InvalidRecordStateException : // The record state is invalid. The acknowledgement of delivery could not be completed.]} Note that this affects all the messages on that share partition whose acknowledgement you tried to commit. That is, even a message which you acknowledged would be retried in this case.\nWhen running another consumer in the same share group, or when restarting the consumer above, it’ll receive the available messages 3, 7, 8, and 9. Whether it’ll also receive 2 and 4 depends on whether the acknowledgement lock already has expired or not.\nShare Group State Persistence The state of inflight messages needs to be made durable by the share-partition coordinator. 
This responsibility is handled through a component called the share-group state persister. While the KIP mentions that this could be a pluggable component eventually, there’s only a single persister implementation right now. It stores the state of share groups in a special Kafka topic named __share_group_state.\nThere are two kinds of records on that topic, ShareSnapshot and ShareUpdate records. The former represents a complete self-contained snapshot of the persistent state of a share-group, whereas the latter represents an incremental update to that state. An epoch field in the records is used to fence off writes by zombie share-partition leaders. Upon start-up, the coordinator reads the entire topic and builds up the state for a given share-partition. It does so by finding the latest snapshot record and then applying all subsequent updates. As such, the share group state topic isn’t suitable for Kafka topic compaction (i.e. keeping only the latest record with a given message key). Instead, the coordinator itself deletes all records for a share partition before the latest snapshot record.\nTo take a look at the __share_group_state topic, you can use the standard Kafka console consumer; just make sure to use the class o.a.k.t.c.g.s.ShareGroupStateMessageFormatter as a formatter:\nbin/kafka-console-consumer.sh \\ --bootstrap-server localhost:9092 \\ --property print.key=true \\ --topic __share_group_state \\ --from-beginning \\ --formatter=org.apache.kafka.tools.consumer.group.share.ShareGroupStateMessageFormatter Here’s a message describing the state of the inflight messages shown above:\n{ \u0026#34;key\u0026#34;: { \u0026#34;version\u0026#34;: 1, (1) \u0026#34;data\u0026#34;: { \u0026#34;groupId\u0026#34;: \u0026#34;my-share-group\u0026#34;, \u0026#34;topicId\u0026#34;: 
\u0026#34;YrHYV-TdRrqvUkvejYQ8Gw\u0026#34;, \u0026#34;partition\u0026#34;: 0 } }, \u0026#34;value\u0026#34;: { \u0026#34;version\u0026#34;: 0, \u0026#34;data\u0026#34;: { \u0026#34;snapshotEpoch\u0026#34;: 0, \u0026#34;leaderEpoch\u0026#34;: 0, \u0026#34;startOffset\u0026#34;: 0, (2) \u0026#34;stateBatches\u0026#34;: [ { \u0026#34;firstOffset\u0026#34;: 0, \u0026#34;lastOffset\u0026#34;: 1, \u0026#34;deliveryState\u0026#34;: 2, (3) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 3, \u0026#34;lastOffset\u0026#34;: 3, \u0026#34;deliveryState\u0026#34;: 0, (4) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 5, \u0026#34;lastOffset\u0026#34;: 5, \u0026#34;deliveryState\u0026#34;: 2, (3) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 6, \u0026#34;lastOffset\u0026#34;: 6, \u0026#34;deliveryState\u0026#34;: 4, (5) \u0026#34;deliveryCount\u0026#34;: 1 }, { \u0026#34;firstOffset\u0026#34;: 7, \u0026#34;lastOffset\u0026#34;: 9, \u0026#34;deliveryState\u0026#34;: 0, (4) \u0026#34;deliveryCount\u0026#34;: 1 } ] } } } 1 Indicates this is a ShareUpdate record) 2 The current share-partition start offset 3 Status ACKED 4 Status AVAIL 5 Status ARCHV To manage the state of share groups, the aforementioned script bin/kafka-share-groups.sh can be used. It allows you to list and describe existing share groups and their members, reset and delete their offsets, and more:\n1 2 3 4 5 6 7 8 9 bin/kafka-share-groups.sh \\ --bootstrap-server localhost:9092 \\ --describe \\ --group my-share-group \\ --verbose GROUP TOPIC PARTITION LEADER-EPOCH START-OFFSET my-share-group\tmy-topic-2 0 - 2 Summary and Outlook KIP-932: Queues for Kafka adds a long awaited capability to the Apache Kafka project: queue-like semantics, including the ability to acknowledge messages on a one-by-one basis. This positions Kafka for use cases such as job queuing, for which it hasn’t been a good fit historically. 
As multiple members of a share group can process the messages from a single topic partition, the partition count does not limit the degree of consumer parallelism any longer. The number of consumers in a group can quickly be increased and decreased as needed, without requiring to repartition the topic.\nBuilt on top of Kafka’s event log semantics, Kafka queues provide some interesting characteristics typically not found in other queue implementations, such as the ability to retain the messages on a queue for an indefinite period of time, reprocess some or all of them, and have multiple independent groups of consumers, with each group processing all the messages on the topic. For instance, you could have two share groups applying slightly different variants of some processing logic in an A/B testing scenario.\nOne aspect which I couldn’t explore due to time constraints are the performance characteristics of Kafka’s queue support. It would be interesting to see how the overall throughput increases as more consumers are added to a share group—​without increasing the number of partitions—​how message-level acknowledgements impact performance, or what the impact of, say, rejecting and retrying every 10th message would be. This would be a highly interesting topic for a follow-up post.\nAvailable as an early access feature as of the Kafka 4.0 release, Kafka queues are not recommended for production usage yet, and there are several limitations worth calling out: most importantly, the lack of DLQ support. More control over retry timing would be desirable, too. As such, I don’t think Kafka queues in their current form will make users of established queue solutions such as Artemis or RabbitMQ migrate to Kafka. It is a very useful addition to the Kafka feature set nevertheless, coming in handy for instance for teams already running Kafka and who look for a solution for simple queuing use cases, avoiding to stand up and operate a separate solution just for these. 
This story will become even more compelling if the feature gets built out and improved in future Kafka releases.\nVoting for the release 4.0.0. RC1 of Apache Kafka just started earlier today, so it shouldn’t be much longer until you can give queue support a try yourself with an official release. To discuss any feedback you may have, reach out to the Kafka developer mailing list.\nMany thanks to Andrew Schofield for his input and feedback while writing this post!\n","id":57,"publicationdate":"Mar 5, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_towards_queue_support_in_kafkaintroducing_share_groups\"\u003eTowards Queue Support in Kafka—​Introducing Share Groups\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_share_groups_in_action\"\u003eShare Groups in Action\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_retry_behavior_and_state_management\"\u003eRetry Behavior and State Management\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_share_group_state_persistence\"\u003eShare Group State Persistence\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary_and_outlook\"\u003eSummary and Outlook\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003eIn the \u0026#34;Let’s Take a Look at…​!\u0026#34; blog series I am going to explore interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. 
If you think there’s a specific subject I should take a look at, let me know in the comments below!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cspan class=\"image\"\u003e\u003cimg src=\"/images/kip_932_1.jpg\" alt=\"kip 932 1\" width=\"333px\"/\u003e\u003c/span\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThat guy above? Yep, that’s me, whenever someone says \u0026#34;Kafka queue\u0026#34;. Because, that’s not what Apache Kafka is. At its core, Kafka is a distributed durable event log. Producers write events to a topic, organized in partitions which are distributed amongst the brokers of a Kafka cluster. Consumers, organized in groups, divide the partitions they process amongst themselves, so that each partition of a topic is read by exactly one consumer in the group.\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","streaming","queues"],"title":"Let's Take a Look at... KIP-932: Queues for Kafka!","uri":"https://www.morling.dev/blog/kip-932-queues-for-kafka/"},{"content":"","id":58,"publicationdate":"Mar 5, 2025","section":"tags","summary":"","tags":null,"title":"queues","uri":"https://www.morling.dev/tags/queues/"},{"content":" Table of Contents Fault Tolerance and High Availability Manually Triggering Savepoints Observability Bonus: Managing Flink Jobs With the Heimdall UI Summary and Discussion This post originally appeared on the Decodable blog. All rights reserved.\nWelcome back to this two-part blog post series about running Apache Flink on Kubernetes, using the Flink Kubernetes operator. In part one, we discussed installation and setup of the operator, different deployment types, how to deploy Flink jobs using custom Kubernetes resources, and how to create container images for your own Flink jobs. In this part, we’ll focus on aspects such as fault tolerance and high availability of your Flink jobs running on Kubernetes, savepoint management, observability, and more. 
You can find the complete source code for all the examples shown in this series in the Decodable examples repository on GitHub.\nFault Tolerance and High Availability So far, we don’t have any means of fault tolerance or high availability (HA) in place for our job. If for instance a task or job manager pod crashes, all progress the job has made will be lost. Similarly, its state is not retained across restarts. If you change the job’s configuration or suspend and restart it, it will start from the beginning. To ensure that no state is lost in case of failures or (intentional) restarts, the following is required:\nCheckpointing for the Flink job, allowing it to recover from task manager failures\nJob manager HA, allowing for graceful handling of job manager failures\nSavepoints, allowing you to take a consistent snapshot of a job and to resume from that snapshot later on\nIt is also possible to ensure HA for the Flink Kubernetes operator itself. To do so, enable leader election for the operator, as described in the documentation. You can then run multiple replicas of the operator, with one of them being the leader and others in a stand-by capacity, ready to take over should the current leader fail.\nWhether this is necessary or not depends on the requirements of the specific use case. Even without operator HA, existing job deployments are not affected by an operator failure and will continue to run. It thus may be acceptable to have short downtimes of the operator, provided the right monitoring is in place to detect the situation swiftly and resolve it manually.\nAs Kubernetes pods are ephemeral, the corresponding state needs to be persisted externally. Oftentimes, object storage such as an S3 bucket is used for this purpose, thus avoiding the need to mount any persistent volumes to the Flink pods. 
Based on the upstream HA example resource, the following shows how to configure checkpoints, savepoints, and job manager HA using the bucket created in MinIO we set up before:\nResource definition custom-job-ha.yaml for a custom Flink job with fault tolerance and high availability 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 apiVersion: flink.apache.org/v1beta1 kind: FlinkDeployment metadata: name: custom-job-ha spec: image: decodable-examples/flink-hello-world:1.0 flinkVersion: v1_20 flinkConfiguration: taskmanager.numberOfTaskSlots: \u0026#34;2\u0026#34; s3.access.key: minio s3.secret.key: minio123 s3.endpoint: http://minio-service.default.svc.cluster.local:9000 s3.path.style.access: \u0026#34;true\u0026#34; state.backend: rocksdb state.backend.incremental: \u0026#34;true\u0026#34; state.checkpoints.dir: s3://flink-data/checkpoints state.savepoints.dir: s3://flink-data/savepoints high-availability.type: kubernetes high-availability.storageDir: s3://flink-data/ha serviceAccount: flink jobManager: resource: memory: \u0026#34;2048m\u0026#34; cpu: 1 taskManager: resource: memory: \u0026#34;2048m\u0026#34; cpu: 1 podTemplate: spec: containers: - name: flink-main-container env: - name: ENABLE_BUILT_IN_PLUGINS value: \u0026#34;flink-s3-fs-presto-1.20.0.jar\u0026#34; job: jarURI: local:///opt/flink/examples/streaming/flink-hello-world-1.0.jar parallelism: 2 upgradeMode: savepoint state: running Deploy the job by applying the resource:\n1 $ kubectl apply -f flink/custom-job-ha.yaml Quite a few things are going on here, so let’s digest them one by one. First, the directories for storing checkpoints and savepoints are configured using the state.checkpoints.dir and state.savepoints.dir options, respectively. 
We are using RocksDB as the Flink state backend with incremental checkpointing enabled, which avoids the need to transfer the complete state upon each checkpoint and can substantially reduce checkpoint durations. While not that relevant for a small-scale example like the one at hand, it can make sense to make use of S3 entropy injection: by adding a random part at the beginning of checkpoint paths, the data will be distributed across multiple shards of the S3 bucket.\nTo enable Flink’s job manager HA services, high-availability.type must be set to kubernetes and a directory for storing job manager metadata must be given via high-availability.storageDir. In order for Flink to access data on S3, an S3 file system plug-in is enabled via the ENABLE_BUILT_IN_PLUGINS environment variable set for the Flink containers in the pod template of the resource descriptor. Flink provides two plug-ins for S3; we are using the Presto one as this is the recommended option for storing checkpoint data.\nFor accessing the S3 API provided by MinIO in this example, we are providing the access key and secret key right within the resource definition itself. In a production use case, storing data in an S3 bucket on AWS, this should be avoided; instead, authentication should happen via IAM, granting access to S3 via an IAM role for the Flink pods.\nLastly, when job.upgradeMode is set to savepoint, the operator will trigger a savepoint when suspending a job and start from that savepoint when resuming. To verify that this actually works, make a change to the resource descriptor (for instance, lower its parallelism from 2 to 1) and apply it again. If you examine the logs of the task manager pod subsequently, you should notice that the emitted sequence numbers don’t start at 1 but at the point where the job left off before.\nYou can also suspend a job explicitly, if for instance your use case doesn’t require real-time processing of data and you want to save on compute resources. 
To do so, patch the job’s target state to suspended:\n1 $ kubectl patch FlinkDeployment custom-job-ha --patch \u0026#39;{\u0026#34;spec\u0026#34; : { \u0026#34;job\u0026#34; : { \u0026#34;state\u0026#34; : \u0026#34;suspended\u0026#34; }}}\u0026#39; --type=merge Once the operator picks up this change, it will shut down the job, removing the job manager and task manager pods. The actual FlinkDeployment resource remains present, though, now indicating that it is suspended:\n1 2 3 $ kubectl get FlinkDeployment custom-job-ha NAME JOB STATUS LIFECYCLE STATE custom-job-ha FINISHED SUSPENDED To resume the job, patch the target state back to running, after which you should be able to observe that the job continues from the latest savepoint. The cool thing is that savepoints created that way are represented by Kubernetes resources, too. The operator provides a dedicated resource type for this, FlinkStateSnapshot, which you can query to obtain the list of savepoints:\n1 2 3 $ kubectl get FlinkStateSnapshots NAME PATH RESULT TIMESTAMP SNAPSHOT STATE custom-job-ha-savepoint-upgrade-1736963811478 s3://flink-data/savepoints/savepoint-52e52e-58013f2a2604 2025-01-15T17:56:51.536782Z COMPLETED Manually Triggering Savepoints It is also possible to manually create savepoints and use them when resuming a suspended job, or when creating a new instance of that job. 
To do so, create and apply a FlinkStateSnapshot resource, referencing the job to snapshot:\nResource definition savepoint.yaml for manually triggering a savepoint 1 2 3 4 5 6 7 8 9 10 apiVersion: flink.apache.org/v1beta1 kind: FlinkStateSnapshot metadata: name: example-savepoint spec: backoffLimit: 1 jobReference: kind: FlinkDeployment name: custom-job-ha savepoint: {} 1 $ kubectl apply -f flink/savepoint.yaml In order to start a job from this savepoint, retrieve its path on S3 via kubectl describe FlinkStateSnapshot example-savepoint and specify it under spec.job.initialSavepointPath in the FlinkDeployment resource of the job:\n1 2 3 4 5 6 7 8 ... spec: job: jarURI: local:///opt/flink-jobs/flink-hello-world-1.0.jar upgradeMode: savepoint state: running initialSavepointPath: s3://flink-data/savepoints/savepoint-eba39a-1fd698a34a01 ... Note that if you’d like to restart an existing job from a specific savepoint, you also need to specify savepointRedeployNonce in addition. Set it to 1 when it hasn’t been set before, otherwise increment its value for each restart:\n1 2 3 4 ... initialSavepointPath: s3://flink-data/savepoints/savepoint-eba39a-1fd698a34a01 savepointRedeployNonce: 1 ... Observability Once you are moving your Flink jobs to production, it is critical to have good observability for all the components in place, allowing you to examine the system’s state and react to failures, degraded performance, etc. Typically, an application’s logs, metrics, and traces are imported into services and tools such as Datadog, New Relic, or OpenSearch for this purpose.\nIt’s beyond the scope of this post to provide a comprehensive description of an observability solution for Flink deployments. 
To give you a starting point though, the accompanying repository contains a basic example setup which shows how to ingest logs from job and task manager pods into Elasticsearch, allowing you to analyze them in a Kibana dashboard.\nInstall the Elasticsearch Kubernetes operator, then set up Elasticsearch and Kibana by running the following:\n$ kubectl create -f https://download.elastic.co/downloads/eck/2.16.0/crds.yaml $ kubectl apply -f https://download.elastic.co/downloads/eck/2.16.0/operator.yaml $ kubectl -n logging apply -f logging/elastic.yaml $ kubectl -n logging apply -f logging/kibana.yaml The Kubernetes Logging operator is used for collecting and forwarding log files. You can deploy it with the following Helm command:\n$ helm upgrade --install --wait --create-namespace --namespace logging logging-operator oci://ghcr.io/kube-logging/helm-charts/logging-operator Next, configure a Logging instance:\n$ kubectl -n logging apply -f logging/logging.yaml This installs an agent (fluentbit) on each worker node of the Kubernetes cluster, which collects the logs of the pods on this node, enriches them with Kubernetes metadata such as pod labels, and forwards them to a collector service (fluentd). There they are filtered and transformed as per the current configuration and finally forwarded to Elasticsearch. By using log4j2’s JsonTemplateLayout for Flink log messages (configured here), the messages are emitted in a structured form, making it very easy to process them:\n... 
{\u0026#34;@timestamp\u0026#34;:\u0026#34;2025-01-17T09:51:55.295Z\u0026#34;,\u0026#34;ecs.version\u0026#34;:\u0026#34;1.2.0\u0026#34;,\u0026#34;log.level\u0026#34;:\u0026#34;INFO\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;Completed checkpoint 360 for job 2bd169b220ae0d7e26f3edf0869c244f (15960 bytes, checkpointDuration=9 ms, finalizationTime=0 ms).\u0026#34;,\u0026#34;process.thread.name\u0026#34;:\u0026#34;jobmanager-io-thread-1\u0026#34;,\u0026#34;log.logger\u0026#34;:\u0026#34;org.apache.flink.runtime.checkpoint.CheckpointCoordinator\u0026#34;,\u0026#34;flink-job-id\u0026#34;:\u0026#34;2bd169b220ae0d7e26f3edf0869c244f\u0026#34;} ... The ingestion pipeline into Elasticsearch is set up via two resources processed by the Kubernetes Logging operator. The first is a ClusterOutput resource, describing the destination of the pipeline, i.e. the Elasticsearch endpoint, the credentials, etc.:\nResource definition elastic-output.yaml, describing the output of the logging pipeline 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 apiVersion: logging.banzaicloud.io/v1beta1 kind: ClusterOutput metadata: name: es-output spec: elasticsearch: host: quickstart-es-http.logging.svc.cluster.local port: 9200 scheme: https ssl_verify: false ssl_version: TLSv1_2 user: elastic password: valueFrom: secretKeyRef: name: quickstart-es-elastic-user key: elastic buffer: timekey: 1m timekey_wait: 30s timekey_use_utc: true Create the output:\n1 $ kubectl -n logging apply -f logging/elastic-output.yaml The second resource is of type Flow and describes the set of pods to ingest from (leveraging the component labels set up by the Flink Kubernetes operator), as well as the parsing logic and destination of log pipeline:\nResource definition elastic-flow.yaml, describing the logic of the logging pipeline 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 apiVersion: logging.banzaicloud.io/v1beta1 kind: Flow metadata: name: es-flow spec: filters: - tag_normaliser: {} - parser: 
remove_key_name_field: true reserve_data: true parse: type: json time_key: \u0026#34;@timestamp\u0026#34; match: - select: labels: component: jobmanager - select: labels: component: taskmanager globalOutputRefs: - es-output Create the flow:\n$ kubectl apply -f logging/elastic-flow.yaml Once these resources are in place, the Kubernetes Logging operator will set up the required infrastructure for extracting and propagating the Flink logs, utilizing fluentbit and fluentd. The logs will be ingested into Elasticsearch, where you can inspect and query them using Kibana. To access Kibana, retrieve the password of the elastic user like so:\n$ kubectl -n logging get secret quickstart-es-elastic-user -o=jsonpath=\u0026#39;{.data.elastic}\u0026#39; | base64 --decode; echo 9V9520L2yvgzJla0RIyQo784 Then forward port 5601 of the \u0026#34;quickstart-kb-http\u0026#34; service:\n$ kubectl -n logging port-forward service/quickstart-kb-http 5601 Navigate to https://localhost:5601/ (accept the browser warning about the unknown certificate issuer) and log in, then go to \u0026#34;Analytics\u0026#34; → \u0026#34;Discover\u0026#34; and create a data view for the index named fluentd:\nFigure 1. Flink job logs ingested into Elasticsearch As mentioned before, this is just scratching the surface when it comes to setting up observability infrastructure for Flink. There’s a wide range of tools and platforms in that space, and you should choose what matches your requirements, integrating with the solutions and infrastructure already in use in your organization. Besides logging, your observability strategy will typically also cover metrics (which are supported by Flink via a range of metrics exporters, for instance for Prometheus, InfluxDB, and Datadog) and traces (e.g. supported via the OpenTelemetry trace exporter). 
Java-specific tools such as JDK Flight Recorder can also be invaluable for gaining insight into the performance characteristics and other runtime behaviors of your Flink jobs.\nBonus: Managing Flink Jobs With the Heimdall UI For working with the Flink web UI in a production deployment, the operator optionally sets up a Kubernetes Ingress resource for each job, making its UI available for external access. To do so, simply add the following section to the job’s resource definition:\ningress: template: \u0026#34;localhost/{{name}}(/|$)(.*)\u0026#34; className: \u0026#34;nginx\u0026#34; annotations: nginx.ingress.kubernetes.io/rewrite-target: \u0026#34;/$2\u0026#34; You’ll also need an ingress controller, which you can install for the local kind cluster like so:\n$ helm upgrade --install ingress-nginx ingress-nginx \\ --repo https://kubernetes.github.io/ingress-nginx \\ --namespace ingress-nginx --create-namespace --set controller.nodeSelector.\u0026#34;kubernetes\\.io/hostname\u0026#34;=my-cluster-worker $ kubectl apply -f heimdall/deploy-ingress-nginx.yaml Once the controller is running, you can then access the Flink web UI from your local host, without requiring any port forwarding, for instance at http://localhost/custom-job-ha/ for the custom-job-ha job.\nWith a large number of jobs, it can become challenging to keep track of all the deployed jobs and where to find their UIs, logs, or metrics. This is where Heimdall comes into the picture, a project by Yaroslav Tkachenko: it provides one unified UI for accessing all the Flink jobs executed via the operator on one Kubernetes cluster. You can install it into the cluster by running this:\n$ kubectl apply -f heimdall/heimdall.yaml Once running, the UI provides a list of all the deployed jobs, their status, resources, etc. Forward its port to access it from your host:\n$ kubectl port-forward service/heimdall 8080 Figure 2. 
Heimdall, a UI for managing multiple Flink jobs on one Kubernetes cluster The endpoint URLs in the Heimdall UI are customizable, so that you can easily link to the right locations of the Flink UI, API, metrics, and logs in your specific environment.\nSummary and Discussion The Apache Flink Kubernetes Operator is becoming increasingly popular for running Flink jobs on Kubernetes. Following the well-established Kubernetes operator pattern, it allows you to deploy and run your stream processing jobs by creating declarative resources which are processed in a reconciliation loop by the operator. This means you get to focus on the \u0026#34;What?\u0026#34; of your job, with a lot of the \u0026#34;How?\u0026#34; being taken care of automatically for you. Rather than making you deal with details such as configuring deployments, services, and ingresses, the operator lets you express your intent in a higher-level resource and derives all the required lower-level resources from that.\nThere are quite a few features of the operator which we couldn’t discuss in this post, most prominently its capabilities for auto-scaling (i.e. scaling jobs automatically up and down in order to minimize back pressure while also satisfying given utilization targets) and auto-tuning Flink jobs (automatically adjusting the memory consumed by a job). Both are very promising, warranting their own article.\nThe Flink Kubernetes operator is a versatile solution for deploying and managing Flink jobs. 
When it comes to building a full-fledged Flink-based stream processing platform for an organization, there’s a non-trivial number of additional aspects to keep in mind, including the following:\nThe management and evolution of data schemas\nDeveloper experience, for instance the ability to deploy SQL jobs and preview their results\nHigher-level resources such as source and sink connectors or declarative end-to-end data pipelines\nCost attribution and quota management between the teams and departments of a larger organization\nSecurity and compliance with regulatory requirements such as HIPAA, GDPR, SOC 2\nOperator upgrades as well as upgrades of (stateful) Flink jobs\nIf all this sounds like it’s too much work, a fully-managed platform like Decodable might be interesting to you. Sign up for a free trial today and give it a try.\nAnother interesting feature of the operator is the ability to customize its behavior via plug-ins. This can be useful if for instance you want to make sure all deployed jobs adhere to specific standards (by implementing a custom resource validator) or you want to amend their configuration upon deployment, e.g. to add specific pod labels automatically (by implementing a resource mutator).\nThe Flink Kubernetes operator is under active development and definitely a project worth keeping an eye on. 
For upcoming versions, the team plans to work on a better \u0026#34;rollback mechanism and stability conditions\u0026#34; as well as several improvements to the autoscaler.\n","id":59,"publicationdate":"Jan 28, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_fault_tolerance_and_high_availability\"\u003eFault Tolerance and High Availability\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_manually_triggering_savepoints\"\u003eManually Triggering Savepoints\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_observability\"\u003eObservability\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_bonus_managing_flink_jobs_with_the_heimdall_ui\"\u003eBonus: Managing Flink Jobs With the Heimdall UI\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary_and_discussion\"\u003eSummary and Discussion\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/get-running-with-apache-flink-on-kubernetes-2\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWelcome back to this two-part blog post series about running Apache Flink on Kubernetes, using the Flink Kubernetes operator.\nIn \u003ca href=\"/blog/get-running-with-apache-flink-on-kubernetes-1\"\u003epart one\u003c/a\u003e, we discussed installation and setup of the operator, different deployment types, how to deploy Flink jobs using custom Kubernetes resources, and how to create container images for your own Flink jobs.\nIn this part, we’ll focus on aspects such as fault tolerance and high availability of your Flink jobs running on Kubernetes, savepoint management, observability, and more.\nYou can find the complete source code for all the examples shown in this series in the Decodable \u003ca href=\"https://github.com/decodableco/examples/blob/main/flink-on-kubernetes/\"\u003eexamples repository\u003c/a\u003e on GitHub.\u003c/p\u003e\n\u003c/div\u003e","tags":["flink","kubernetes","streaming"],"title":"Get Running with Apache Flink on Kubernetes, part 2 of 2","uri":"https://www.morling.dev/blog/get-running-with-apache-flink-on-kubernetes-2/"},{"content":"","id":60,"publicationdate":"Jan 28, 2025","section":"tags","summary":"","tags":null,"title":"kubernetes","uri":"https://www.morling.dev/tags/kubernetes/"},{"content":" Table of Contents Installation and Setup Deployment Types Deploying Your First Flink Job on Kubernetes Building Custom Job Images This post originally appeared on the Decodable blog. All rights reserved.\nKubernetes is a widely used deployment platform for Apache Flink. While Flink has had native support for Kubernetes for quite a while, it is in particular the operator pattern which makes deploying Flink jobs onto Kubernetes clusters a compelling option: you define jobs in a declarative resource, and a control loop running in a component called a Kubernetes operator takes care of provisioning and maintaining (e.g. 
scaling, updating) all the required resources. Automation is the keyword here, significantly reducing the manual effort required for running Flink jobs in production.\nThere are multiple Kubernetes operators for Flink, including Lyft’s flinkk8soperator, flink-on-k8s-operator by Spotify (a fork of a now unmaintained operator initially developed by Google), and flink-controller by Andrea Medeghini. The most widely used operator these days though is the upstream Flink Kubernetes Operator, developed under the umbrella of the Apache Flink project itself, which is also the focus of this two-part blog post series. The series gives you an overview of what it takes to install the operator and run your first jobs on Kubernetes, and discusses what’s missing to create a complete Flink-based data platform for an organization.\nIn part one, we’ll touch on installation and setup, deploying Flink jobs via custom resources, and creating container images for your own Flink jobs. Part two covers fault tolerance and high availability, savepoint management, observability, and UI access.\nThe idea for these posts is to give you a fully runnable example so you can try out everything yourself. To follow along, grab the complete source code from the Decodable examples repository on GitHub:\n$ git clone $ cd flink-on-kubernetes You’ll need the following prerequisites:\nDocker, for running and building container images\nkind, for setting up a local Kubernetes cluster, and kubectl for interacting with it\nHelm, for installing the Kubernetes operator\nFig. 1 gives an overview of the solution we are going to build in this blog series, with all the key resources and most relevant interactions:\nFigure 1. Solution overview Installation and Setup Start by creating a local Kubernetes cluster using kind:\n$ kind create cluster --name my-cluster --config kind-cluster.yml The cluster contains a control plane as well as a worker node. 
The kind configuration makes the directory /tmp/kind-worker available to the worker node of the cluster, which will allow you to keep Flink savepoints when rebuilding the kind cluster. It also maps ports 80 and 443 of the worker node to the host, which will allow you to access the Flink web UI via a Kubernetes ingress later on:\nkind-cluster.yml for spinning up a local Kubernetes cluster 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 nodes: - role: control-plane - role: worker extraMounts: - hostPath: /tmp/kind-worker containerPath: /files extraPortMappings: - containerPort: 80 hostPort: 80 protocol: TCP - containerPort: 443 hostPort: 443 protocol: TCP Verify the cluster is running:\n1 2 3 4 $ kubectl get nodes NAME STATUS ROLES AGE VERSION my-cluster-control-plane Ready control-plane 2d18h v1.29.2 my-cluster-worker Ready 2d18h v1.29.2 Installing the Flink Kubernetes operator is straight-forward thanks to the provided Helm chart. Add the Helm repository for the current version 1.10 (released in October 2024) of the operator like so:\n1 $ helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.10.0/ Before installing the operator itself, it is required to install the Kubernetes cert-manager (needed for issuing TLS certificates used by the operator’s admission controller):\n1 $ kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml The operator Helm chart can be customized upon installation with a wide set of parameters, for instance to control which Kubernetes namespaces the operator should observe, which images should be used by default for deploying Flink jobs, etc. 
We’re gonna make the following adjustments:\nInject the configuration required for running Flink jobs on Java 17; while Flink itself provides support for Java 17 since version 1.18, the operator in version 1.10 doesn’t pass the --add-opens flags required to run Flink on Java 17 and newer. This has been added recently (see FLINK-36646), but no release of the operator with those changes is available as of now\nShorten the lifetime of Flink job managers after a job has been suspended; this setting, kubernetes.operator.jm-deployment.shutdown-ttl, defaults to one day, which means you’ll have a superfluous job manager pod in your cluster for quite a while when suspending a job\nThe Helm values file with these configuration amendments looks like so:\nhelm-values.yaml for customizing the operator configuration 1 2 3 4 5 6 7 defaultConfiguration: create: true append: true flink-conf.yaml: |+ kubernetes.operator.default-configuration.flink-version.v1_20.env.java.default-opts.all: --add-exports=java.base/sun.net.util=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED kubernetes.operator.jm-deployment.shutdown-ttl: 10000 Install the Helm chart, using this values file:\n1 $ helm install -f helm-values.yaml flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator This should yield the following output:\n1 2 3 4 5 6 NAME: flink-kubernetes-operator LAST DEPLOYED: Mon Jan 20 17:02:43 2025 NAMESPACE: default STATUS: deployed REVISION: 1 TEST SUITE: None The Helm chart creates two Kubernetes roles and associated service accounts: flink-operator, used by the Flink operator for managing Flink deployments, and flink, used by the Flink job manager for spawning task manager pods. By default, the former is cluster-scoped, while the latter is namespace-scoped. In the following, we are going to deploy Flink jobs in the default namespace; if you want to deploy jobs in other namespaces, refer to the operator’s RBAC documentation to learn how to create the required role and service account.\nNext, we are going to create a deployment of MinIO to have an S3 backend for persisting Flink checkpoints and savepoints as well as Flink high availability data, independent of the lifetime of specific Flink pods. The MinIO deployment is taken pretty much verbatim from the upstream Kubernetes examples, only that the volume for persisting the MinIO data is backed by the aforementioned Kind extra mount:\n1 $ kubectl apply -f storage/ Create a bucket named flink-data:\n1 $ kubectl -n default run minio-client --image=minio/mc:latest --restart=Never --command=true -- sh -c \u0026#34;mc config host add minio http://minio-service.default.svc.cluster.local:9000 minio minio123 \u0026amp;\u0026amp; mc mb minio/flink-data\u0026#34; This concludes the installation steps and we are ready to run our first Flink job on Kubernetes.\nDeployment Types Following the Kubernetes operator pattern, Flink jobs are deployed using custom Kubernetes resource definitions (CRDs). 
This means you define a job and its configuration in a resource descriptor (a YAML file), and the operator will take care of creating and managing all the required Kubernetes resources, such as pods for job and task managers, config maps for job configuration, ingresses for accessing the Flink web UI, etc. If you make changes to a job’s definition (or delete it), the operator will perform the steps required to keep the deployed resources in sync with the resource definition.\nThe Flink Kubernetes operator supports running Flink in both application and session mode, with dedicated resource definitions for each mode. In application mode, each job is executed in its own Flink cluster, i.e. with its own job manager. When the job is finished, that cluster will be shut down. The FlinkDeployment resource type is used for running jobs in application mode.\nIn contrast, Flink’s session mode utilizes a long-running Flink cluster onto which multiple jobs are deployed, using the FlinkSessionJob resource type. Session mode can be more resource-efficient due to the shared job manager, at the cost of reduced isolation between jobs.\nIn the following, we are going to use application mode, i.e., a dedicated Flink cluster per job.\nSticking to the recommendations from the operator documentation, we’ll deploy Flink using the Native Kubernetes resource provider, which means that Flink will directly talk to the Kubernetes API to orchestrate resources such as task manager pods. In particular when deploying jobs of unknown provenance, it can make sense to use the Standalone provider instead, in which case Flink jobs don’t have access to the Kubernetes API and the operator manages all the required resources instead.\nDeploying Your First Flink Job on Kubernetes The operator’s repository on GitHub contains a number of very useful example resource definitions. 
Slightly adjusted to run with Flink 1.20 on Java 17, the most basic FlinkDeployment definition for running one of the example jobs shipping with Flink looks like this:\nResource definition basic.yaml for a first Flink job 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 apiVersion: flink.apache.org/v1beta1 kind: FlinkDeployment metadata: name: basic-example spec: image: flink:1.20-java17 flinkVersion: v1_20 flinkConfiguration: taskmanager.numberOfTaskSlots: \u0026#34;2\u0026#34; serviceAccount: flink jobManager: resource: memory: \u0026#34;2048m\u0026#34; cpu: 1 taskManager: resource: memory: \u0026#34;2048m\u0026#34; cpu: 1 job: jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar parallelism: 2 upgradeMode: stateless This is running the StateMachineExample.jar job on Flink 1.20 with a parallelism of 2, assigning 1 CPU core and 2 GB of RAM to the job manager and task manager pods, respectively. No execution state will be retained between job runs (upgrade mode stateless). To deploy this job, apply the resource definition using kubectl:\n1 $ kubectl apply -f flink/basic.yaml As the operator exposes Flink deployments just like any other kind of Kubernetes resource, you can examine their status with kubectl, too (output adjusted for readability):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 $ kubectl describe FlinkDeployment basic-example Name: basic-example Namespace: default Labels: Annotations: API Version: flink.apache.org/v1beta1 Kind: FlinkDeployment Metadata: Creation Timestamp: 2025-01-15T09:58:26Z Finalizers: flinkdeployments.flink.apache.org/finalizer Generation: 2 Managed Fields: API Version: flink.apache.org/v1beta1 ... 
Resource Version: 16235 UID: 470322f2-2658-48d0-bd86-7473816a5e40 Spec: Flink Configuration: taskmanager.numberOfTaskSlots: 2 Flink Version: v1_20 Image: flink:1.20 Job: Args: Jar URI: local:///opt/flink/examples/streaming/StateMachineExample.jar Parallelism: 4 State: running Upgrade Mode: stateless Job Manager: Replicas: 1 Resource: Cpu: 1 Memory: 2048m Service Account: flink Task Manager: Resource: Cpu: 1 Memory: 2048m Status: Cluster Info: Flink - Revision: b1fe7b4 @ 2024-07-25T04:22:22+02:00 Flink - Version: 1.20.0 Total - Cpu: 3.0 Total - Memory: 6442450944 Job Manager Deployment Status: READY Job Status: Checkpoint Info: Last Periodic Checkpoint Timestamp: 0 Job Id: 0f646b467710dd8f39e856f4a892be1d Job Name: State machine job Savepoint Info: Last Periodic Savepoint Timestamp: 0 Savepoint History: Start Time: 1736935168717 State: RUNNING Update Time: 1736935179288 Lifecycle State: STABLE Observed Generation: 2 Reconciliation Status: Last Reconciled Spec: {\u0026#34;spec\u0026#34;:{...}} Last Stable Spec: {\u0026#34;spec\u0026#34;:{...}} Reconciliation Timestamp: 1736935106418 State: DEPLOYED Task Manager: Label Selector: component=taskmanager,app=basic-example Replicas: 2 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Submit 6m28s JobManagerDeployment Starting deployment Normal JobStatusChanged 5m15s Job Job status changed from RECONCILING to RUNNING When deploying a job, the operator also creates a Kubernetes Service for exposing the Flink web UI. For testing purposes, you can forward its port, allowing you to access the UI on http://localhost:8081:\n1 $ kubectl port-forward service/basic-example-rest 8081 Figure 2. The Flink web UI Next, let’s make some changes to the job definition. Currently, the job runs with a parallelism of 2. 
As there are also two slots per task manager, a single task manager pod is running (besides the job manager pod) for the job:\n$ kubectl get pods -l app=basic-example NAME READY STATUS RESTARTS AGE basic-example-5d9dd5d5c4-h42jd 1/1 Running 0 14s basic-example-taskmanager-1-1 1/1 Running 0 6s Patch the resource to change the parallelism to 4:\n$ kubectl patch FlinkDeployment basic-example --patch \u0026#39;{\u0026#34;spec\u0026#34; : { \u0026#34;job\u0026#34; : { \u0026#34;parallelism\u0026#34; : 4 }}}\u0026#39; --type=merge The operator will recognize the change and reconcile the deployment accordingly. Shortly thereafter, you’ll see another task manager pod running:\n$ kubectl get pods -l app=basic-example NAME READY STATUS RESTARTS AGE basic-example-6fcfd8464c-sm9vc 1/1 Running 0 77s basic-example-taskmanager-1-1 1/1 Running 0 68s basic-example-taskmanager-1-2 1/1 Running 0 68s If you go back to the Flink web UI (recreate the port forwarding, if needed), you’ll also see that there are two task managers now, with four task slots overall:\nFigure 3. Updated job in the Flink web UI Finally, to delete the job, run the following:\n$ kubectl delete FlinkDeployment basic-example Besides Java, you can also implement your stream processing jobs using Flink’s Python API, PyFlink. Refer to this post to learn how to get started with running your first PyFlink jobs on Kubernetes.\nBuilding Custom Job Images So far, we’ve run an example Flink job which is contained within the upstream Flink container image. In practice, you’ll of course want to run your own Flink job JARs. 
Multiple options exist for doing so:\nAdding an init container to the Flink pods which retrieves the JAR at start-up from a remote location such as an S3 bucket, as shown in this blog post\nRetrieving the JAR from an HTTP server—for instance an internal Maven repository—by specifying an HTTP URI for the jarURI parameter; when using Flink 1.19 or later, the parameter also accepts S3 URIs, provided an S3 file system plug-in has been configured (see FLINK-28915)\nBuilding a custom container image, extending the upstream Flink image and adding your job JAR file at build time\nWhile all these approaches do the trick, my personal preference and recommendation is the last one, building a custom image with your job, rather than fetching the JAR at runtime. This can help to speed up things when restarting a job (as the image may be retrieved from a node-local image cache) and generally is more in line with the philosophy of immutable container images; for instance, when inspecting the image run by a pod, it’s immediately clear which version of the Flink job this is.\nThe example project of this post contains a small Flink job, which prints out a sequence of numbers, using Flink’s built-in DataGen source. You can build a container image for this job via Docker like so:\n1 $ docker build -t decodable-examples/hello-world-job:1.0 hello-world-job While you’d normally push this image to an image registry such as ECR, we’re simply going to load it directly into Kind in this case:\n1 $ kind load docker-image decodable-examples/hello-world-job:1.0 --name my-cluster Next, we need to create a FlinkDeployment resource for this job. 
This is very similar to the one before, only that now we’re referring to our custom image containing the Flink job JAR file:\nResource definition custom-job.yaml for a custom Flink job 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 apiVersion: flink.apache.org/v1beta1 kind: FlinkDeployment metadata: name: custom-job spec: image: decodable-examples/flink-hello-world:1.0 flinkVersion: v1_20 flinkConfiguration: taskmanager.numberOfTaskSlots: \u0026#34;2\u0026#34; serviceAccount: flink jobManager: resource: memory: \u0026#34;2048m\u0026#34; cpu: 1 taskManager: resource: memory: \u0026#34;2048m\u0026#34; cpu: 1 job: jarURI: local:///opt/flink/examples/streaming/flink-hello-world-0.1.jar parallelism: 1 upgradeMode: stateless Apply the descriptor:\n1 $ kubectl apply -f flink/custom-job.yaml Then verify the job status is running:\n1 2 3 $ kubectl get FlinkDeployment custom-job NAME JOB STATUS LIFECYCLE STATE custom-job RUNNING STABLE For further analysis, you can also take a look at the logs of the task manager pod like so:\n1 2 3 4 5 6 7 8 9 10 $ kubectl logs -l app=custom-job,component=taskmanager -f ... 2025-01-20 11:07:40,323 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Generator Source -\u0026gt; Sink: Writer (1/1)#0 (d70586a2890af7b3459578ad30dcd550_cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from DEPLOYING to INITIALIZING. 2025-01-20 11:07:40,365 INFO org.apache.flink.runtime.state.StateBackendLoader [] - State backend is set to heap memory org.apache.flink.runtime.state.hashmap.HashMapStateBackend@79439cec 2025-01-20 11:07:40,375 INFO org.apache.flink.runtime.state.StateBackendLoader [] - State backend is set to heap memory org.apache.flink.runtime.state.hashmap.HashMapStateBackend@177287da 2025-01-20 11:07:40,380 INFO org.apache.flink.runtime.taskmanager.Task [] - Source: Generator Source -\u0026gt; Sink: Writer (1/1)#0 (d70586a2890af7b3459578ad30dcd550_cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from INITIALIZING to RUNNING. 
Number: 0 Number: 1 Number: 2 ... This concludes part one of this blog post series. Head over to part two to learn about fault tolerance and high availability of your Flink jobs running on Kubernetes, savepoint management, observability, and more.\n","id":61,"publicationdate":"Jan 21, 2025","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_installation_and_setup\"\u003eInstallation and Setup\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_deployment_types\"\u003eDeployment Types\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_deploying_your_first_flink_job_on_kubernetes\"\u003eDeploying Your First Flink Job on Kubernetes\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_building_custom_job_images\"\u003eBuilding Custom Job Images\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/get-running-with-apache-flink-on-kubernetes-1\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eKubernetes is a widely used deployment platform for Apache Flink.\nWhile Flink has had native support for Kubernetes for quite a while, it is in particular the operator pattern which makes deploying Flink jobs onto Kubernetes clusters a compelling option: you define jobs in a declarative resource, and a control loop running in a component called a Kubernetes operator takes care of provisioning and maintaining (e.g.\nscaling, updating) all the required resources.\nAutomation is the keyword here, significantly reducing the manual effort required for running Flink jobs in production.\u003c/p\u003e\n\u003c/div\u003e","tags":["flink","kubernetes","streaming"],"title":"Get Running with Apache Flink on Kubernetes, part 1 of 2","uri":"https://www.morling.dev/blog/get-running-with-apache-flink-on-kubernetes-1/"},{"content":" Table of Contents Hello, Failover Slots! Failover Slots in Decodable Wrapping Up This post originally appeared on the Decodable blog. All rights reserved.\nPostgres read replicas are commonly used not only to distribute query load amongst multiple nodes, but also to ensure high availability (HA) of the database. If the primary node of a Postgres cluster fails, a read replica can be promoted to be the new primary, processing write (and read) requests from thereon.\nPrior to Postgres version 16, read replicas (or stand-by servers) couldn’t be used at all for logical replication. Logical replication is a method for replicating data from a Postgres publisher to subscribers. These subscribers can be other Postgres instances, as well as non-Postgres tools, such as Debezium, which use logical replication for change data capture (CDC). Logical replication slots—which keep track of how far a specific subscriber has consumed the database’s change event stream—could only be created on the primary node of a Postgres cluster. 
This meant that after a failover from primary to replica you’d have to create a new replication slot and typically also start with a new initial snapshot of the data. Otherwise you might have missed change events occurring after reading from the slot on the old primary and before creating the slot on the new primary.\nWhilst external tools such as pg_failover_slots were added over time, a built-in solution for failing over replication slots from one Postgres node to another was sorely missed by many users. This situation substantially improved with the release of Postgres 16, which brought support for setting up replication slots on read replicas. I’ve discussed this feature in great detail in this post. Back then, I also explored how to use replication slots on replicas for manually implementing replication slot failover. But the good news is, as of Postgres version 17, all this is not needed any longer, as it finally supports failover slots out of the box!\nFailover slots are replication slots which are created on the primary node and which are propagated automatically to read replicas. Their state on the replica is kept in sync with the upstream slot on the primary, which means that after a failover, when promoting a replica to primary, you can continue to consume the slot on the new primary without the risk of missing any change events. This is really great news for using tools like Debezium (and by extension, data platforms such as Decodable, which provides a fully-managed Postgres CDC connector based on Debezium) in HA scenarios, so let’s take a closer look at how this works.\nHello, Failover Slots! Let’s start by exploring failover slots solely from the perspective of using Postgres\u0026#39; SQL interface to logical replication. If you want to follow along, check out this project from the Decodable examples repository. 
It contains a Docker Compose file which brings up a Postgres primary and a read replica (based on this set-up by Peter Eremeykin, which makes it really easy to spin up an ephemeral Postgres cluster for testing purposes), as well as a few other components which we’ll use later on. Start everything by running:\n1 $ docker compose up Primary and replica are synchronized via a physical replication slot (i.e. unlike with logical replication, all the WAL segments are replicated), ensuring that all data changes done on the primary are replicated to the replica immediately. Get a session on the primary (I’m going to use pgcli which is my favorite Postgres CLI client, but psql will do the trick as well):\n1 $ pgcli --prompt \u0026#34;\\u@primary:\\d\u0026gt; \u0026#34; \u0026#34;postgresql://user:top-secret@localhost:5432/inventorydb\u0026#34; The prompt has been adjusted to show the current instance. To verify you are on the primary and not the replica run the following:\n1 2 3 4 5 6 user@primary:inventorydb\u0026gt; SELECT * from pg_is_in_recovery(); +-------------------+ | pg_is_in_recovery | |-------------------| | False | +-------------------+ Both primary and replica are already configured with WAL level logical, as required for logical replication:\n1 2 3 4 5 6 user@primary:inventorydb\u0026gt; SHOW wal_level; +-----------+ | wal_level | |-----------| | logical | +-----------+ Note that if you are running Postgres on Amazon RDS (where version 17 is available in the preview environment right now), you’ll need to set the parameter rds.logical_replication to on for this.\nThere’s a physical replication slot for synchronizing changes from the primary to the replica server:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 user@primary:inventorydb\u0026gt; WITH node_status AS ( SELECT CASE WHEN pg_is_in_recovery() = \u0026#39;True\u0026#39; Then \u0026#39;stand-by\u0026#39; ELSE \u0026#39;primary\u0026#39; END AS role ) SELECT node_status.role AS node, slot_name, 
slot_type, active, plugin, database, failover, synced, confirmed_flush_lsn FROM pg_replication_slots, node_status; +---------+------------------+-----------+--------+--------+----------+----------+--------+---------------------+ | node | slot_name | slot_type | active | plugin | database | failover | synced | confirmed_flush_lsn | |---------+------------------+-----------+--------+--------+----------+----------+--------+---------------------| | primary | replication_slot | physical | True | | | False | False | | +---------+------------------+-----------+--------+--------+----------+----------+--------+---------------------+ This slot must be added to the set of synchronized stand-by slots using the synchronized_standby_slots configuration option:\nuser@primary:inventorydb\u0026gt; ALTER SYSTEM SET synchronized_standby_slots=\u0026#39;replication_slot\u0026#39;; user@primary:inventorydb\u0026gt; SELECT pg_reload_conf(); This setting, new in Postgres 17, makes sure that any logical replication slots we set up later cannot advance beyond the confirmed log sequence number (LSN) of this physical slot. Without that setting, logical replication consumers such as Debezium may receive changes which never got propagated to the replica in case of a failure of the primary, resulting in an inconsistent state.\nNext, get a database session on the replica:\n$ pgcli --prompt \u0026#34;\\u@replica:\\d\u0026gt; \u0026#34; \u0026#34;postgresql://user:top-secret@localhost:5433/inventorydb\u0026#34; As above, verify that you are indeed on the right node:\nuser@replica:inventorydb\u0026gt; SELECT * from pg_is_in_recovery(); +-------------------+ | pg_is_in_recovery | |-------------------| | True | +-------------------+ It is already configured with hot_standby_feedback=ON which is a requirement for failover slots to work. 
In addition, the database name must be added to primary_conninfo on the replica (its connection string for connecting to the primary). I couldn’t find a way for just adding a single attribute, so I retrieved the current value and added the database name, so that the complete string can be written back (on RDS, that setting can’t be changed, instead set the rds.logical_slot_sync_dbname parameter to the name of the database):\n1 2 3 user@replica:inventorydb\u0026gt; ALTER SYSTEM SET primary_conninfo = \u0026#39;user=replicator password=\u0026#39;\u0026#39;zufsob-kuvtum-bImxa6\u0026#39;\u0026#39; channel_binding=prefer host=postgres_primary port=5432 sslmode=prefer sslnegotiation=postgres sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=inventorydb\u0026#39;; user@replica:inventorydb\u0026gt; SELECT pg_reload_conf(); At this point, primary and replica are set up for failover slots to work. So let’s create a logical replication slot on the primary next:\n1 2 3 4 5 6 user@primary:inventorydb\u0026gt; SELECT * FROM pg_create_logical_replication_slot(\u0026#39;test_slot\u0026#39;, \u0026#39;test_decoding\u0026#39;, false, false, true); +-----------+-----------+ | slot_name | lsn | |-----------+-----------| | test_slot | 0/304DCA0 | +-----------+-----------+ As of Postgres 17, there’s a new optional parameter of the pg_create_logical_replication_slot() function for specifying that a failover slot should be created (failover=true). On the replica, call pg_sync_replication_slots() for synchronizing all failover slots from the primary:\n1 2 3 4 5 6 user@replica:inventorydb\u0026gt; SELECT pg_sync_replication_slots(); +---------------------------+ | pg_sync_replication_slots | |---------------------------| | | +---------------------------+ This will make sure that the slots on both primary and replica are at exactly the same LSN. 
To verify that this is the case, query the status of the slot on both nodes, using the same query as above:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 WITH node_status AS ( SELECT CASE WHEN pg_is_in_recovery() = \u0026#39;True\u0026#39; Then \u0026#39;stand-by\u0026#39; ELSE \u0026#39;primary\u0026#39; END AS role ) SELECT node_status.role AS node, slot_name, slot_type, active, plugin, database, failover, synced, confirmed_flush_lsn FROM pg_replication_slots, node_status; Right now, the confirmed_flush_lsn of the slot on the replica matches that of the slot on the primary, i.e. the two slots are in sync. But once a client consumes changes from the primary slot, the slot on the replica will not be updated accordingly. You could manually call pg_sync_replication_slots() repeatedly to synchronize the slot on the replica, but luckily, there’s an easier way. By setting sync_replication_slots to on on the replica, a synchronization worker will be started, which will propagate the replication state automatically:\n1 2 user@replica:inventorydb\u0026gt; ALTER SYSTEM SET sync_replication_slots = true; user@replica:inventorydb\u0026gt; SELECT pg_reload_conf(); Now do some data changes in the primary and consume them from the replication slot:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 user@primary:inventorydb\u0026gt; UPDATE inventory.customers SET first_name=\u0026#39;Sarah\u0026#39; where id = 1001; user@primary:inventorydb\u0026gt; UPDATE inventory.customers SET first_name=\u0026#39;Sam\u0026#39; where id = 1001; user@primary:inventorydb\u0026gt; SELECT * FROM pg_logical_slot_get_changes(\u0026#39;test_slot\u0026#39;, NULL, NULL); 
+-----------+-----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | lsn | xid | data | |-----------+-----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0/304DCD8 | 752 | BEGIN 752 | | 0/304DCD8 | 752 | table inventory.customers: UPDATE: old-key: id[integer]:1001 first_name[character varying]:\u0026#39;Sally\u0026#39; last_name[character varying]:\u0026#39;Thomas\u0026#39; email[character varying]:\u0026#39;sally.thomas@acme.com\u0026#39; is_test_account[boolean]:false new-tuple: id[integer]:1001 first_name[character varying]:\u0026#39;Sarah\u0026#39; last_name[character varying]:\u0026#39;Thomas\u0026#39; email[character varying]:\u0026#39;sally.thomas@acme.com\u0026#39; is_test_account[boolean]:false | | 0/304DFE8 | 752 | COMMIT 752 | | 0/304E038 | 753 | BEGIN 753 | | 0/304E038 | 753 | table inventory.customers: UPDATE: old-key: id[integer]:1001 first_name[character varying]:\u0026#39;Sarah\u0026#39; last_name[character varying]:\u0026#39;Thomas\u0026#39; email[character varying]:\u0026#39;sally.thomas@acme.com\u0026#39; is_test_account[boolean]:false new-tuple: id[integer]:1001 first_name[character varying]:\u0026#39;Sam\u0026#39; last_name[character varying]:\u0026#39;Thomas\u0026#39; email[character varying]:\u0026#39;sally.thomas@acme.com\u0026#39; 
is_test_account[boolean]:false | | 0/304E100 | 753 | COMMIT 753 | +-----------+-----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ If you query pg_replication_slots once more, you’ll see that the confirmed flush LSN of the slot on the replica matches still that of the primary. To conclude our basic experiments, let’s see what happens when we try to consume the replication slot on the replica server:\n1 2 3 4 user@replica:inventorydb\u0026gt; SELECT * FROM pg_logical_slot_get_changes(\u0026#39;test_slot\u0026#39;, NULL, NULL); cannot use replication slot \u0026#34;test_slot\u0026#34; for logical decoding DETAIL: This replication slot is being synchronized from the primary server. HINT: Specify another replication slot. Somewhat to be expected, this triggers an error: as the state of failover slots on a replica is driven by the corresponding slot on the primary, they cannot be consumed. Only after promoting a replica to the primary, clients can connect to that slot and read change events from it. Let’s give this a try by setting up an instance of the Decodable Postgres CDC connector next!\nFailover Slots in Decodable Decodable is a fully managed realtime data platform based on Apache Flink. It offers managed connectors for a wide range of source and sink systems, allowing you to build robust and efficient ETL pipelines with ease. Its Postgres CDC connector is using Debezium under the hood, which in turn uses logical replication for ingesting data change events from Postgres. 
Thanks to failover slots, it’s possible to continue streaming changes from a Postgres read replica which got promoted to primary into Decodable, without missing any events.\nFor this kind of use case it makes sense to put a Postgres proxy in front of the database cluster, exposing one stable endpoint for it. This will enable us later on to fail over from primary to replica without having to reconfigure the Decodable Postgres connector. In a production scenario, this approach will make the fail-over an implementation detail managed by the team owning the database, not requiring coordination with the team running the data platform. The Docker Compose set-up from above contains the pgbouncer proxy for this purpose.\nIn order to access the database on your machine from Decodable which is running in the cloud, we are going to use ngrok, an API gateway service. Amongst other things, ngrok lets you expose non-public resources to the cloud, i.e. exactly what we need for this example. You may consider this approach also for connecting to an on-prem database in a production use case.\nThe overall architecture looks like this:\nFigure 1. Solution overview For the following steps, you’ll need to have these things:\nA free Decodable account\nThe Decodable CLI installed on your machine\nA free ngrok account\nThe pgbouncer proxy is configured to connect to the primary Postgres host, and ngrok exposes a publicly accessible tunnel for connecting to pgbouncer.\nDecodable provides support for declaring your resources—connectors, SQL pipelines, etc.—in a declarative way. You describe the resources you’d like to have in a YAML file, and the Decodable platform will take care of materializing them in your account. 
The example project contains definitions for a Postgres CDC source connector, a Decodable secret with the database password to be used by the connector, and a stream, to which the connector will write its data:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 --- kind: secret metadata: name: inventorydb-password tags: context: failover-slots-demo spec_version: v1 spec: value_literal: \u0026#34;top-secret\u0026#34; --- kind: connection metadata: name: inventorydb-source tags: context: failover-slots-demo spec_version: v2 spec: connector: postgres-cdc type: source stream_mappings: - stream_name: inventorydb__inventory__customers external_resource_specifier: database-name: inventorydb schema-name: inventory table-name: customers properties: hostname: \u0026#34;%DB_HOST%\u0026#34; password: inventorydb-password port: \u0026#34;%DB_PORT%\u0026#34; database-name: inventorydb decoding.plugin.name: pgoutput username: user --- kind: stream metadata: name: inventorydb__inventory__customers description: Automatically created stream for inventorydb.inventory.customers tags: context: failover-slots-demo spec_version: v1 spec: schema_v2: fields: - kind: physical name: id type: INT NOT NULL - kind: physical name: first_name type: VARCHAR(255) - kind: physical name: last_name type: VARCHAR(255) - kind: physical name: email type: VARCHAR(255) - kind: physical name: is_test_account type: BOOLEAN constraints: primary_key: - id type: CHANGE properties: partition.count: \u0026#34;2\u0026#34; compaction.enable: \u0026#34;false\u0026#34; properties.compression.type: zstd ngrok creates a tunnel for publicly exposing the local Postgres instance using a random host name and port. 
With the help of httpie and jq, we can retrieve the public endpoint of the tunnel from the local ngrok REST API (alternatively, there’s also a UI running on http://localhost:4040/) and propagate these values to the resource definition via sed, before applying it using the Decodable CLI (the special file name - tells the CLI to read from stdin rather than from a given file):\n$ sed -e s/%DB_HOST%/$(http localhost:4040/api/tunnels | jq -r \u0026#39;.tunnels[] | select(.name==\u0026#34;pgbouncer\u0026#34;) | .public_url | sub(\u0026#34;tcp://\u0026#34;; \u0026#34;\u0026#34;) | sub(\u0026#34;:.*\u0026#34;; \u0026#34;\u0026#34;) \u0026#39;)/g \\ -e s/%DB_PORT%/`http localhost:4040/api/tunnels | jq -r \u0026#39;.tunnels[] | select(.name==\u0026#34;pgbouncer\u0026#34;) | .public_url | sub(\u0026#34;tcp://\u0026#34;; \u0026#34;\u0026#34;) | sub(\u0026#34;.*:\u0026#34;; \u0026#34;\u0026#34;) \u0026#39;`/g \\ decodable-resources.yaml | \\ decodable apply - --- kind: secret name: inventorydb-password id: 8fc44565 result: created --- kind: connection name: inventorydb-source id: \u0026#34;62943949\u0026#34; result: created --- kind: stream name: inventorydb__inventory__customers id: \u0026#34;92523911\u0026#34; result: created • Wrote plaintext values for secret IDs: [8fc44565] At this point, all the resources have been created in your Decodable account, but the Postgres source connector is not running yet. Before we can start it (\u0026#34;activate\u0026#34; in Decodable terminology), we need to create the replication slot in the database, configuring it as a failover slot. For the Decodable Postgres CDC connector to pick up the slot, it must have the name decodable_\u0026lt;connection-id\u0026gt;. 
To obtain the connection id, refer to the output of the decodable apply command above, or retrieve it using decodable query like so:\n$ decodable query --name \u0026#34;inventorydb-source\u0026#34; --kind connection --keep-ids | yq .metadata.id 62943949 In the primary Postgres instance, create the replication slot, configuring it as a failover slot:\nuser@primary:inventorydb\u0026gt; SELECT * FROM pg_create_logical_replication_slot(\u0026#39;decodable_62943949\u0026#39;, \u0026#39;pgoutput\u0026#39;, false, false, true); Next, activate the connector:\n$ decodable connection activate $(decodable query --name \u0026#34;inventorydb-source\u0026#34; --kind connection --keep-ids | yq .metadata.id) inventorydb-source id 62943949 description - connector postgres-cdc type source ... Upon its first activation, the connector will create an initial snapshot of the data in the customers table and then read any subsequent data changes incrementally via the replication slot we’ve created. To take a look at the data, head over to the Decodable web UI in your browser and go to the \u0026#34;Streams\u0026#34; view. Select the inventorydb__inventory__customers stream and go to the \u0026#34;Preview\u0026#34; tab, where you should see the data from the snapshot. Make a few data changes in the Postgres session on the primary node:\nuser@primary:inventorydb\u0026gt; UPDATE inventory.customers SET first_name=\u0026#39;Saundra\u0026#39; where id = 1001; user@primary:inventorydb\u0026gt; UPDATE inventory.customers SET first_name=\u0026#39;Samantha\u0026#39; where id = 1001; Shortly thereafter, these updates will be reflected in the stream preview as well:\nFigure 2. Postgres data change events in the Decodable stream preview Now let’s simulate a failover from primary to replica server and see how the failover of the replication slot is handled. 
Stop the primary Postgres node:\n$ docker compose stop postgres_primary Go to the database session on the replica and promote it to primary:\nuser@replica:inventorydb\u0026gt; select pg_promote(); At this point, pgbouncer still points to the previous, now defunct primary server. Change its configuration in the Docker Compose file so it forwards to the new primary:\npgbouncer: image: edoburu/pgbouncer:latest environment: - - DB_HOST=postgres_primary + - DB_HOST=postgres_replica - DB_PORT=5432 - DB_USER=user - DB_PASSWORD=top-secret - ADMIN_USERS=postgres,admin - AUTH_TYPE=scram-sha-256 Stop and restart pgbouncer (other proxies, such as pgcat, are able to reload updated configuration on the fly):\n$ docker compose stop pgbouncer $ docker compose up -d If you go back to the Decodable web UI and take a look at the inventorydb-source connection, it should be in the \u0026#34;Retrying\u0026#34; state at this point, as it lost the connection to the previous primary (via the proxy). After a little while, it will be in the \u0026#34;Running\u0026#34; state again, as the proxy now routes to the new primary server. 
The connector now consumes from the replication slot on that new primary server, as you can confirm by making a few more data changes on that node and examining the data in the Decodable stream preview:\nuser@replica:inventorydb\u0026gt; INSERT INTO inventory.customers VALUES (default, \u0026#39;Rudy\u0026#39;, \u0026#39;Replica\u0026#39;, \u0026#39;rudy@example.com\u0026#39;, FALSE); If you retrieve the state of the replication slot again, using the same query as above, you’ll also see that it is marked as active now:\n+----------+--------------------+-----------+--------+----------+-------------+----------+--------+---------------------+ | node | slot_name | slot_type | active | plugin | database | failover | synced | confirmed_flush_lsn | |----------+--------------------+-----------+--------+----------+-------------+----------+--------+---------------------| | stand-by | decodable_62943949 | logical | False | pgoutput | inventorydb | True | True | 0/304E2E0 | +----------+--------------------+-----------+--------+----------+-------------+----------+--------+---------------------+ Wrapping Up Failover slots are an essential part of using Postgres and logical replication in HA scenarios. Added in Postgres 17, they allow logical replication clients such as Debezium to seamlessly continue streaming change events after a database fail-over, ensuring no events are lost in the process. This renders previous solutions such as manually synchronizing slots on a standby server (as supported since Postgres 16) or external tools such as pg_failover_slots obsolete (although the latter still comes in handy if you are on an older Postgres version and can’t upgrade to 17 just yet).\nTo create a failover slot, the new failover parameter must be set to true when calling pg_create_logical_replication_slot(). 
Debezium does not do this yet at the moment, but I am planning to implement the required change in the next few weeks. Keep an eye on DBZ-8412 to track the progress there. In the meantime, you can create the replication slot manually, as described above. To learn more about Postgres failover slots, check out the blog posts by Bertrand Drouvot and Amit Kapila. RDS users should refer to the AWS documentation on managing failover slots.\nIf you’d like to give it a try yourself, including the managed Postgres CDC connector on Decodable, you can find the complete source code for this blog post in the Decodable examples repository on GitHub.\n","id":62,"publicationdate":"Dec 3, 2024","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_hello_failover_slots\"\u003eHello, Failover Slots!\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_failover_slots_in_decodable\"\u003eFailover Slots in Decodable\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrapping_up\"\u003eWrapping Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/failover-replication-slots-with-postgres-17\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003ePostgres read replicas are commonly used not only to distribute query load amongst multiple nodes, but also to ensure high availability (HA) of the database.\nIf the primary node of a Postgres cluster fails, a read replica can be promoted to be the new primary, processing write (and read) requests from thereon.\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","cdc","debezium"],"title":"Failover Replication Slots with Postgres 17","uri":"https://www.morling.dev/blog/failover-replication-slots-with-postgres-17/"},{"content":"","id":63,"publicationdate":"Nov 27, 2024","section":"tags","summary":"","tags":null,"title":"governance","uri":"https://www.morling.dev/tags/governance/"},{"content":" If you are following the news around Debezium—​an open-source platform for Change Data Capture (CDC) for a variety of databases—​you may have seen the announcement that the project is in the process of moving to the Commonhaus Foundation. I think this is excellent news for the Debezium project, its community, and open-source CDC at large. In this post I’d like to share some more context on why I am so excited about this development.\nDebezium was founded in 2016 by Randall Hauch, back then a software engineer at Red Hat. Pretty much from the get go, the Apache-licensed project attracted a very diverse community not only of users, but also of contributors. The Debezium development team has been doing an amazing job of fostering a welcoming and open environment, establishing a level playing field for contributors from Red Hat—​who funded and continues to fund the salaries of the core team working on the project, including mine, while leading Debezium between 2017 and 2022—​as well as other organizations alike. Companies such as Google, IBM, Stripe, Slack, WePay, SugarCRM, Instaclustr, Bolt, and many others have put substantial resources into the project. 
More than 650 individuals have contributed to the project at this point, ranging from small fixes and improvements, over developing complete features in the Debezium core framework, all the way up to driving the work and roadmap of specific connectors.\nOf course I am biased, but I think it’s fair to say that when it comes to \u0026#34;vendor-owned\u0026#34; open-source, Debezium has been a tremendous success. When the project website says the following, it’s truly meant like that and not just empty words:\nThe Debezium project is operated as a community-centric open source project. While Red Hat product management has a voice, it is akin to the same voice of any member of the community, whether they contribute code, bug reports, bug fixes or documentation.\nThe community has truly lived up to this aspiration and the project has always managed to align the interests of the different parties involved (I only remember a single time where there was a continued discussion about a specific feature and its implementation between the core team and the contributing team, with the idea of forking the project being floated briefly). Nevertheless, ultimately the Debezium project was controlled by a single entity, Red Hat. They owned the name, the domain, the GitHub organization, social media channels, etc. Despite the continued demonstration of best intentions, some folks may have had reservations to contribute to a project managed like that.\nThat’s why I was thrilled to learn that several other projects sponsored and managed by Red Hat, for instance Quarkus and Hibernate, announced their move to the Commonhaus Foundation earlier this year. This foundation acts as a 100% neutral home of open-source projects, addressing any potential concerns around ownership which contributors may have. 
I was hoping for Debezium to make the move to Commonhaus as well, and I could not have been any happier when learning a few weeks back that it actually is going to happen.\nThe Commonhaus Foundation is a particularly interesting instance of an open-source foundation, as it provides its projects with an extensive degree of freedom. Quoting their FAQ, what differentiates Commonhaus from other foundations such as the Apache Software Foundation, Eclipse Foundation, or Linux Foundation is this (check out the full FAQ for comparisons with specific foundations):\nThe Commonhaus Foundation sets itself apart by providing open source projects with a unique combination of autonomy and tailored support, adapted to their specific stages of development and needs. By simplifying access to funding and offering a stable, long-term home for their assets, the Foundation enables projects to govern themselves and leverage collective resources for greater visibility and impact.\nUnlike the structured environments and specific licensing and infrastructure requirements characteristic of foundations like the Apache and Eclipse Foundations, Commonhaus allows projects to maintain their established brand, community identity, infrastructure, and governance practices. It also supports a broader array of OSI-approved licenses.\nThe way I perceive it, Commonhaus is a \u0026#34;No frills\u0026#34; foundation, a neutral project home which acts as the owner of IP such as project trademarks, helps projects with financial management, provides them with infrastructure for receiving donations (something we always struggled with during my time leading the project), and more. But it stays out of projects’ day-to-day operations as much as possible. I believe it’s a perfect fit for a project like Debezium, with a strong existing community, brand, and established processes. Debezium is going to join the ranks of other popular projects under the Commonhaus umbrella, such as SDKMan, OpenRewrite, and Jackson. 
SlateDB, a recently open-sourced embedded database built on object storage, also just moved to Commonhaus, which goes to show that the foundation is a great home for young projects early in their lifecycle, too.\nAs such, I think moving to Commonhaus is an outstanding milestone for the Debezium project, ensuring its ongoing success in the future. Big kudos to the Debezium team for making this move, and massive props to Red Hat for supporting this step. It shows a deep understanding of and belief in open source and its unique advantages, not paralleled by many other organizations. Now, some folks might wonder whether this is about dumping a project on an open-source foundation and then quickly pulling resources after that. Needless to say, I am not a spokesperson for Red Hat and I can’t predict what’s going to happen in the future. But personally, this is not something I am worried about. Historically, this isn’t something the company has been doing (with the exception of the Ceylon programming language perhaps, which got discontinued pretty quickly after moving to the Eclipse Foundation). Case in point, they just published a job posting for a Principal Software Engineer working on Debezium.\nTo wrap things up, I think the future is looking really bright for Debezium. The need for CDC and the interest in Debezium as the leading open-source implementation is unbroken. At the same time, it’s a very active space, with new projects popping up frequently, so it’s vital for Debezium and its community to keep moving and innovating. The move to Commonhaus lays an excellent foundation for this next chapter of Debezium’s success. 
With the team currently discussing the project roadmap for 2025, it’s a perfect time for getting involved and becoming a part of the journey.\n","id":64,"publicationdate":"Nov 27, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIf you are following the news around Debezium—​an open-source platform for Change Data Capture (CDC) for a variety of databases—​you may have seen the announcement that the project is in the process of \u003ca href=\"https://debezium.io/blog/2024/11/04/debezium-moving-to-commonhaus/\"\u003emoving to the Commonhaus Foundation\u003c/a\u003e. I think this is excellent news for the Debezium project, its community, and open-source CDC at large. In this post I’d like to share some more context on why I am so excited about this development.\u003c/p\u003e\n\u003c/div\u003e","tags":["debezium","open-source","governance"],"title":"Thoughts On Moving Debezium to the Commonhaus Foundation","uri":"https://www.morling.dev/blog/thoughts-on-moving-debezium-to-commonhaus-foundation/"},{"content":"","id":65,"publicationdate":"Nov 16, 2024","section":"tags","summary":"","tags":null,"title":"build-tools","uri":"https://www.morling.dev/tags/build-tools/"},{"content":" Every now and then, it can come in very handy to build OpenJDK from source yourself, for instance if you want to explore a feature which is under development on a branch for which no builds are published. For some reason I always thought that building OpenJDK is a very complex process, requiring the installation of arcane tool chains etc. But as it turns out, this is actually not true: the project does a great job of documenting what’s needed and only a few steps are necessary to build your very own JDK.\nThe following is a run-down of what I had to do to build JDK 24 from source on macOS 14.7.1. 
This is mostly for my own reference; check out the upstream documentation for a comprehensive description of the OpenJDK build, all requirements, build options, etc.\nFirst, install the required tools:\nA boot JDK, typically the previous version; I highly recommend using SDKMan to do so:\n1 sdk install java 23.0.1-tem Xcode, Apple’s development environment for macOS; the easiest way is to get it from the App Store. Unfortunately though, the current release 16.1 ships a broken version of clang which makes the JDK build fail. So you should either install 15.4 from Apple Developer, or apply the following patch before building OpenJDK which sidesteps that issue (at the price of building with fewer compiler optimizations):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 git apply \u0026lt;\u0026lt; EOF --- a/make/autoconf/flags-cflags.m4 +++ b/make/autoconf/flags-cflags.m4 @@ -337,9 +337,9 @@ AC_DEFUN([FLAGS_SETUP_OPTIMIZATION], C_O_FLAG_HIGHEST=\u0026#34;-O3 -finline-functions\u0026#34; C_O_FLAG_HI=\u0026#34;-O3 -finline-functions\u0026#34; else - C_O_FLAG_HIGHEST_JVM=\u0026#34;-O3\u0026#34; - C_O_FLAG_HIGHEST=\u0026#34;-O3\u0026#34; - C_O_FLAG_HI=\u0026#34;-O3\u0026#34; + C_O_FLAG_HIGHEST_JVM=\u0026#34;-O1\u0026#34; + C_O_FLAG_HIGHEST=\u0026#34;-O1\u0026#34; + C_O_FLAG_HI=\u0026#34;-O1\u0026#34; fi C_O_FLAG_NORM=\u0026#34;-O2\u0026#34; C_O_FLAG_DEBUG_JVM=\u0026#34;-O0\u0026#34; EOF Autoconf:\n1 brew install autoconf With that, you should have everything in place for building OpenJDK:\nClone the project:\n1 2 git clone https://git.openjdk.org/jdk cd jdk Run configure:\n1 bash configure Run the actual build:\n1 make images Rejoice:\n1 2 3 4 ./build/macosx-aarch64-server-release/jdk/bin/java --version openjdk 24-internal 2025-03-18 OpenJDK Runtime Environment (build 24-internal-adhoc.gunnarmorling.jdk) OpenJDK 64-Bit Server VM (build 24-internal-adhoc.gunnarmorling.jdk, mixed mode) And that’s it, you now have your own JDK build you can use for testing. Pretty easy, right? 
That said, if you still don’t feel like running this build by yourself, and if you’re on Linux rather than macOS, you can also check out the OpenJDK builds provided by Aleksey Shipilëv, which are available for a variety of OpenJDK projects as well as target platforms.\n","id":66,"publicationdate":"Nov 16, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eEvery now and then, it can come in very handy to build OpenJDK from source yourself,\nfor instance if you want to explore a feature which is under development on a branch for which no builds are published.\nFor some reason I always thought that building OpenJDK is a very complex process,\nrequiring the installation of arcane tool chains etc.\nBut as it turns out, this is actually not true:\nthe project does a great job of documenting what’s needed and only a few steps are necessary to build your very own JDK.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","openjdk","build-tools"],"title":"Building OpenJDK From Source On macOS","uri":"https://www.morling.dev/blog/building-openjdk-from-source-on-macos/"},{"content":"","id":67,"publicationdate":"Nov 16, 2024","section":"tags","summary":"","tags":null,"title":"openjdk","uri":"https://www.morling.dev/tags/openjdk/"},{"content":" Table of Contents Recap: What’s The Outbox Pattern? Implementation Considerations Polling vs. Log-Based CDC The Outbox Table pg_logical_emit_message() Format Considerations Backfills Idempotency for Consumers Criticisms of the Outbox pattern Database Overhead Complexity Latency Discussion Alternatives to the Outbox Pattern Dapr Read-yourself Stream Processing on \u0026#34;Raw\u0026#34; Change Event Streams 2PC Durable Execution Summary This post originally appeared on the Decodable blog. All rights reserved.\nOver the last few years, the outbox pattern has become a common solution for implementing data exchange flows between microservices. 
It allows services to safely and reliably update their own local datastore and at the same time send out notifications to other services via data streaming platforms such as Apache Kafka. But time isn’t standing still: people ask about disadvantages of the pattern (is the database becoming a bottleneck?), alternative solutions such as \u0026#34;listen-to-yourself\u0026#34; have been proposed, Kafka is about to get support for participating in 2-phase commit (2PC) transactions, and more. It’s time to take another look at the outbox pattern, what it is and how to implement it, whether it’s still relevant in 2024, and which alternatives exist!\nRecap: What’s The Outbox Pattern? Congrats! You’ve landed that job as a software engineer at Oh-my-Dawg, the nation’s latest and hottest franchise in beauty care for our four-pawed friends. Pedicure for poodles, lathering for labradors, grooming for greyhounds—​there’s a lot of money to be made in this business, and your task is gonna be to build robust and reliable backend systems for that.\nLet’s look at one of the business processes, the creation of appointments for a treatment, and how it could be implemented. Oh-my-Dawg runs a microservices architecture, with one service for managing appointments, and another one for managing inventory. When the appointment service receives a request via its REST API, it must persist the information about the requested treatment in its own database. We’re members of the boring technology club, so Postgres is used as a data store. At the same time, the appointment service needs to notify the inventory service about the new appointment, so that shampoo and whatever else is needed can be reserved, and backordered, if needed. To decouple the two services, the notification is sent via Apache Kafka.\nFigure 1. 
The Oh-my-Dawg microservices architecture Now, it is vital that these two operations, updating the appointment database and sending the notification, happen atomically—​that is, they either both happen, or neither of them does. Otherwise, the overall state of the system would be inconsistent:\nIf the database transaction commits but the inventory notification isn’t sent, we’d end up without shampoo for the appointment. Our furry customers won’t be happy!\nIf the database transaction fails but the notification is sent, we’d inform the customer about the failed request but still have allocated inventory for a non-existent appointment. The folks over at Oh-my-Dawg accounting won’t like that!\nAs of today, there’s no way for Kafka to participate in a distributed transaction shared with a database (more on that, in particular whether it is a good idea or not, later on). Fortunately, the outbox pattern provides a solution which doesn’t require distributed transactions to begin with. As part of its local database transaction, the appointment service inserts the notification message which it wants to send into a table in that database. The transaction ensures atomicity for writing the actual data and that message.\nFigure 2. The outbox pattern for atomically updating the Appointments table and emitting a notification A separate process, called the outbox relay, picks up that message from this outbox table and sends it to Kafka. This happens asynchronously and can be retried if needed, without impacting the clients of the appointment service in any way. Different approaches exist for implementing the outbox relay, with one popular option being log-based change data capture (CDC), using tools such as Debezium. 
You can learn more about the foundations of the outbox pattern in this article, while this post provides an in-depth hands-on example.\nNote that, while the outbox pattern ensures atomicity of a local database update and sending out a message to external consumers, it does not provide complete ACID transactional guarantees. Specifically, the pattern provides eventual consistency semantics: changes to the local datastore of the writing service become visible immediately, while the notification of other services happens asynchronously.\nImplementation Considerations Having discussed what the outbox pattern is and what it is used for, let’s dive into some of the implementation details.\nPolling vs. Log-Based CDC One key component of an outbox pattern implementation is the mechanism for retrieving messages from the outbox table. A commonly suggested solution is polling: Some background process queries the outbox table at regular intervals for newly inserted messages and sends them to Kafka. Once that’s done, all emitted messages are marked as processed in the database, or they are deleted right away.\nWhile deceptively simple, this approach has a number of problems. For one, it can be resource-intensive, creating spikes of high load on the database upon each polling attempt. There’s a natural conflict between achieving low latency and not overwhelming the database due to polling too frequently.\nThe biggest challenge of this approach though, as pointed out by Martin Kleppmann, is its poor ordering semantics. If there are multiple transactions running in parallel, and each one emits an outbox message, it can’t be guaranteed that the order of messages in the outbox table—​e.g., designated by timestamps or a sequence field—​is the same as the order of commits. This can have severe implications. 
If you are not very careful, it can cause the relay to miss outbox messages, providing consumers with an incomplete feed of events (Oskar Dudycz does a great job in explaining the problem and describing one potential solution; the basic idea is to consider an outbox event only once there are no more transactions running which are older than the one emitting the outbox event). But this also means that there can be inconsistencies between the state of the writing service’s local database and the external state as represented by its published messages. While serialization may be enforced in some cases, for instance using optimistic locking on a specific record, the problem can’t be solved generically in the presence of concurrent writers.\nA much better solution is to retrieve the outbox events via log-based CDC. By tailing the database’s transaction log (e.g. the write-ahead log, WAL, in case of Postgres), events are emitted in the exact same order as transactions were committed to the database, ensuring consistency between internal and external representation of the data. Log-based CDC comes with a few other advantages: it avoids the polling overhead and ensures a low latency (typically, changes can be propagated from Postgres to Kafka within a two-digit milliseconds timeframe). It should be the preferred option whenever possible.\nThe Outbox Table At the center of the outbox pattern is a table for storing the outbox events. The writing service—​such as the appointment service in the Oh-my-Dawg example—​only ever makes inserts into this table, but it never updates. Existing events are immutable and cannot be altered after the fact. 
In that sense, the outbox table represents an append-only log, not unlike the actual transaction log of the database itself.\nBesides the actual event payload (often JSON, but you can use any format of your choosing) an outbox table typically has columns for message id (allowing consumers to identify messages sent more than once, more on that later) and event type, allowing you to route events of different types (e.g. \u0026#34;Appointment\u0026#34;) to different topics. When using a partitioned streaming platform such as Kafka, you’ll also need a column for the id of the represented entity (for instance, the appointment id), so that all events pertaining to the same record will be written to the same partition, thus ensuring correct ordering of these events.\nAs an example, here is the default outbox table format when using Debezium as a log-based outbox relay:\nid (UUID): unique event identifier\naggregatetype (VARCHAR(255)): the type of domain object described by an event, e.g. \u0026#34;appointment\u0026#34;; used for routing outbox events of different types to different Kafka topics\naggregateid (VARCHAR(255)): the id of the represented domain object, e.g. the appointment id; used as message key to ensure correct event ordering with partitioned Kafka topics\ntype (VARCHAR(255)): The type of event, e.g. \u0026#34;appointment created\u0026#34;; can be used by consumers to trigger specific event handlers\npayload (JSONB): The actual event payload\nHousekeeping One detail which can be easy to ignore at first but which is critical in production scenarios is housekeeping for the outbox table: once events have been picked up from the outbox table, they can and should be removed from the outbox table, preventing it from growing larger and larger. 
With a polling-based approach as described above this can be done as part of the polling loop (which has to issue read-write transactions to do this, though).\nWhen retrieving outbox messages via log-based CDC, the removal can actually be done right away after inserting a message into the outbox table. More specifically, the INSERT and DELETE can be two subsequent operations within one and the same transaction. As both operations are represented as entries in the append-only transaction log, the outbox message can be safely retrieved from the INSERT change event, while the outbox table always is empty when running a SELECT against it. This mitigates at least in parts the concern about storage overhead that is occasionally brought up against the outbox pattern (see below for details).\npg_logical_emit_message() What if, instead of implementing what is, in effect, an append-only log in a custom table, you could just use the database’s actual transaction log itself for relaying outbox events? Turns out you can—​at least with Postgres!\nThrough its function pg_logical_emit_message(), Postgres lets you write arbitrary messages to the WAL. This is exactly what you need for the outbox pattern: instead of inserting the outbox messages into a dedicated table, you just store them in the transaction log by means of a simple function call:\nWriting an outbox message to the transaction log with pg_logical_emit_message() 1 2 3 4 5 6 7 8 9 10 11 12 SELECT * FROM pg_logical_emit_message( -- This message is transactional, only emit it -- if the transaction commits true, -- An arbitrary prefix which can be used to differentiate -- between different kinds of messages \u0026#39;appointments\u0026#39;, -- The actual outbox message payload, for instance as JSON \u0026#39;{ ... 
}\u0026#39; ); These messages never materialize in any tables (and thus don’t cause any database growth apart from the WAL itself) and you also don’t need to take care of housekeeping, as any obsolete WAL segments will automatically be disposed of. If you are on Postgres, such a log-only outbox implementation is the one I’d recommend using; you can learn more about the details in this article. It shows how to implement the outbox pattern with Postgres logical decoding messages, using Flink CDC and Debezium as a log-based outbox relay.\nFor services using MySQL as a datastore, the BLACKHOLE storage engine can be used in a similar way. Akin to writing something to /dev/null, any data written to a table with this storage engine will be immediately thrown away. The writes will be reflected in records appended to the binlog (MySQL’s transaction log) though, allowing you to retrieve them using log-based CDC. Note you can use BLACKHOLE and InnoDB tables in one shared transaction, ensuring atomicity for the writes to your actual data tables and the outbox table. If you are aware of similar capabilities for other databases, I’d love to hear from you!\nFormat Considerations Whether you are using an actual outbox table or are storing outbox events only in the transaction log, you’ll need to decide on the format for the actual event payload. As far as the logical format is concerned, it’s completely up to you how the events should be structured. Being independent from the schema of a service’s actual data tables, the schema of outbox messages can be considered as a form of a data contract, allowing you to evolve your internal table model without impacting or even breaking any external services. The event format should be evolved in a forward-compatible manner, i.e. you can add new fields and drop existing optional fields. 
That way, the data producer can evolve the schema without having to synchronize with consumers, as consumers with an old schema version will still be able to process events adhering to a new schema version.\nAs for a physical message format, JSON continues to be a popular choice. It is verbose though, so you may consider using compression, or working with a binary format such as Apache Avro or Google Protocol Buffers instead. With the latter option it can be interesting—​instead of connecting to a schema registry from within the writing service—​to use a registryless serializer. As schemas won’t change while a service is running, they can be statically defined at build time. That way, there’s no runtime dependency from the application to a schema registry and thus one less failure point on the synchronous request processing path. If required, schemas can still be published to a registry (for instance, the Confluent schema registry, or Apicurio) via a CI/CD pipeline when deploying a new version of the source microservice, making them available for discovery and consumption by downstream processes.\nAnother interesting option to consider can be whether to adopt CloudEvents for your event payloads. It’s a specification for describing events in a standardized way, allowing consumers to uniformly access common attributes such as event id, source, and timestamp.\nBackfills One of the most common questions I’ve seen around the outbox pattern is how to deal with backfills. Let’s assume Oh-my-Dawg has been operating for a while already, and only now the need comes up for notifying other services about appointment updates. So you adjust the appointment service to use the outbox pattern for that purpose, but how do you emit messages describing the appointments already existing in the database at this point? 
Or maybe disaster has struck, and you’ve lost the topic with appointment events on the Kafka cluster so now your inventory is going to be out of sync.\nOne solution to this is to use the same machinery and communication channel—​i.e. outbox, CDC, Kafka—​and emit backfill events which essentially describe the current state of the data. This is relatively easy if there are no concurrent writes. At Oh-my-Dawg, you’d scan the existing dataset and insert an event for each existing appointment into the outbox table. Unfortunately, in a live production system you usually won’t have the luxury of exclusive write access. In which case you’ll need to deal with concurrent updates and make sure that any backfill events don’t overwrite updates to a record which is happening in parallel.\nThis can be done by implementing the watermark-based snapshotting approach introduced in the DBLog paper (which since then has been implemented by Debezium, Flink CDC, and others). The high-level intuition for this algorithm is to incrementally step through the dataset to be backfilled in ordered chunks, consuming change events from the transaction log in parallel, and apply a deduplication step for giving any events from the log precedence over backfill events. Chunks are segmented by marker events which are inserted into the log by the snapshotting mechanism. A backfilling job for outbox events could look like this:\nInsert a \u0026#34;chunk start\u0026#34; event into the outbox\nSelect the next chunk of data to be backfilled, for instance appointments with ids 1-1,000, and insert corresponding backfill events into the outbox\nInsert a \u0026#34;chunk end\u0026#34; event into the outbox\nRepeat at 1.) until all data has been backfilled\nWhen extracting events from the outbox, for each of the windowed chunks, processing happens in a buffered way. All the regular, non-backfill events are propagated. 
A backfill event will only be propagated if there’s no non-backfill event for the same record (for instance, the appointment with id 42) in the current chunk, otherwise it will be discarded. This buffering could happen in different ways, for instance using a Kafka Connect SMT, or using a Flink stream processing job. To learn more about this approach to concurrent incremental backfilling, refer to this post on the Debezium blog.\nIdempotency for Consumers Let’s spend a few minutes thinking about how event consumers such as the inventory service could be implemented. A key requirement there is to make sure that each event is processed no more than once. Otherwise, we might end up allocating too much inventory, say two portions of shampoo for one and the same treatment. The challenge is that these kinds of data pipelines typically have at-least-once semantics. If, for instance, the CDC process crashes after emitting an event to Kafka, but before acknowledging that event as consumed with the source database, it will be emitted a second time after restarting.\nOne option for detecting—​and discarding—​such duplicates is adding a unique identifier to each event, such as a UUID. Consumers keep track of the UUIDs of the events they have processed successfully, for instance by storing them in a journal table in their database. When an event comes in, they check whether they’ve seen its UUID before, and if so ignore that duplicate event. The problem though is: for how long should a consumer keep these UUIDs before removing them from its journal? Keeping them for too long may cause the journal to grow unwieldy, dropping them too early may cause duplicate events to go unnoticed if they arrive after the retention period of the journal.\nA better approach is using a monotonically increasing value, i.e. a value that only ever increases and never decreases between different events. That way, consumers need to store only the latest value they’ve seen, just like a watermark. 
When they receive an event with a sequence value which is the same as or lower than the current watermark (and transport semantics are not at-most-once, i.e. events are guaranteed to not get lost), this must be a duplicate which can be discarded. Now, as discussed above, you can’t use a standard database sequence for creating that value, as you can’t guarantee that it is going to be monotonically increasing for events created in multiple concurrent transactions. Instead, the record’s position (offset) in the source database’s transaction log can be used, for instance the LSN (log sequence number) in case of Postgres, a \u0026#34;byte offset into the [transaction log], increasing monotonically with each new record\u0026#34;. Similar ids exist in other databases too; check their documentation for the exact format and semantics.\nCriticisms of the Outbox pattern The outbox pattern definitely fulfills its purpose: allowing a service to update its own database and send out notifications to other services via streaming platforms such as Kafka, in a safe and reliable way, without requiring distributed transactions. When it comes to critique of this pattern, I mostly see two things being mentioned: overhead on the database, and complexity. A less commonly brought up concern is latency. Let’s take a look at all of these in turn.\nDatabase Overhead This concern is about the additional load put onto a service’s database by using it for storing and emitting outbox messages. The potential impact is twofold: more data needs to be stored in the database (adding to the overall size of the database), and transactions have a larger payload, as besides the actual table writes, there’s also the outbox inserts (adding to the overall I/O of the database, thus potentially impacting latency and throughput).\nWhile true, it depends on the actual situation whether these things actually are a problem or not. 
When it comes to the size overhead, this can be mitigated by using a log-only outbox implementation, instead of having an actual outbox table, as described above. That way, outbox messages are only present in the transaction log from where they can be discarded as soon as they have been picked up by the outbox relay, i.e. they are typically short-lived. As for the overhead on transactions, each transaction has a given baseline cost anyways. Doing one more insert for an outbox event probably won’t be significant in most cases, but in the end it’s something you can only determine in your actual environment with your actual workload and its characteristics.\nSo far I have seen this mostly as a theoretical concern rather than as an actual, empirically demonstrated problem.\nComplexity The complexity argument needs consideration from two different angles: an external perspective looking from the outside at a service landscape such as Oh-my-Dawg’s as a whole, and an internal perspective looking at how individual services are implemented.\nFrom the external perspective, you’re looking at things such as the overall number of moving parts of the solution as well as their interactions. In a microservices architecture, there’ll be a database for each service and a streaming platform for inter-service messaging in any case. When implementing the outbox pattern, you’ll also need the outbox relay. If you don’t do CDC yet, adding a system like Debezium—​for instance by standing up a Kafka Connect cluster—​for this purpose will add one more component indeed. This component needs to be operated, updated, kept secure, etc. On the other hand, chances are you need CDC anyways, in which case configuring it for also capturing outbox events doesn’t change the consideration much. 
Another option can be to run the outbox relay within the writing service yourself: via its embedded engine, Debezium can be used as a library in Java-based applications, thus avoiding the need for running the outbox relay in a separate process.\nThe other, internal, angle to the complexity argument is primarily about the programming model, i.e. how hard or simple is it to make use of the outbox pattern from an application development perspective. For that I’d argue, when done right, the outbox pattern can actually reduce complexity. Instead of dealing with database APIs such as JDBC and the Kafka client, developers of the writing service only need to concern themselves with the database access. As the Kafka access happens behind the scenes via the CDC process, they don’t need to be aware of the intricacies of Kafka producers, how to configure and tune them, etc. The key thing here is to find the right abstractions, so as to simplify the process of emitting outbox messages as much as possible. As an example, Debezium’s extension for implementing the outbox pattern within Quarkus-based microservices allows you to use CDI events for doing so:\nPersisting an appointment and emitting an outbox message in a Quarkus-based microservice 1 2 3 4 5 6 7 8 9 10 11 12 13 @Inject AppointmentRepository appointmentRepository; @Inject \u0026lt;Event\u0026gt; event; @Transactional public Appointment placeAppointment(Appointment appointment) { // update the database appointment = appointmentRepository.save(appointment); // emit the outbox event; the Quarkus extension takes care // of persisting the event in the outbox event.fire(new AppointmentCreatedEvent(Instant.now(), appointment)); return appointment; } Similar solutions exist for other stacks, for instance for Spring Boot, and programming languages. 
Whether using a ready-made solution or implementing something from scratch, the required infrastructure for enabling the outbox pattern within an application can and should be encapsulated in an easy-to-use component, shielding application logic from the implementation details.\nLatency A minor disadvantage of the outbox pattern is the increased latency of messages sent to external consumers. This will take longer when routing them through the database than when publishing them to Kafka directly. While that’s true, I don’t think this is really significant in practice. The notification of external processes is designed as an asynchronous process anyways, and the publication of messages via log-based CDC can be really fast. If it is a problem in a microservices architecture when one service receives a message sent by another service with a delay of a few hundred milliseconds, then probably there’s something not quite right with how the services are cut in the first place.\nDiscussion If you want to achieve consistency in a distributed system, such as an ensemble of cooperating microservices, there is going to be a cost. This goes for the outbox pattern, as well as for the potential alternatives discussed in the next section. As such, there are valid criticisms of the outbox pattern, but in the end it’s all about trade-offs: does the outbox put an additional load onto your database? Yes, it does (though it usually is insignificant, in particular when using a log-based outbox relay implementation). Does it increase complexity? Potentially. But this will be a price well worth paying most of the time, in order to achieve consistency amongst distributed services. In the next section, we are going to take a look at other solutions to this problem and the trade-offs they make.\nAlternatives to the Outbox Pattern The outbox pattern isn’t the only solution for implementing reliable data exchanges between different systems in a microservices architecture. 
Let’s explore some of the options and their specific pros and cons.\nDapr Not so much an alternative as a variation of the outbox pattern is provided by Dapr, a distributed platform for building reliable microservices. In its outbox pattern implementation, Dapr first writes the message to be published to an internal Pub/Sub topic. If that succeeds, it updates the local state, also writing an identifier of that previously sent message.\nA separate Dapr process reads from the internal topic and looks for the identifier in the state store. If present, it re-publishes the message on a user-facing topic—​from where it can be read by the consuming service—​and removes the identifier from the state store. If the message identifier cannot be found in the state store, this indicates that the state store update failed and the internal Pub/Sub message will not be propagated.\nThe advantage is that the outbox event doesn’t have to be written to the state store, addressing the overhead concern discussed above. That is being paid for though by having two Pub/Sub writes (internal and external) instead of one, plus the additional look-up and removal of the identifier in the state store (when compared to a purely log-based outbox implementation). So the trade-offs are slightly different, but not clearly advantageous to me.\nRead-yourself Instead of having a service write to its database and drive the update to Kafka from there via an outbox, a commonly suggested alternative is to reverse this order. When handling a request, the service writes only to Kafka. Thanks to transactional producers, writes to an internal data topic and a public topic for external consumption can happen atomically. The service subscribes to this data topic and updates data views in its local state store based on that (Event Sourcing).\nThis provides the same characteristics in terms of availability and consistency across services as the original outbox pattern. 
The downside of this approach is that there are no synchronous read-your-own-write guarantees. As consumption via Kafka is asynchronous, it is not guaranteed that a service will be able to retrieve a record from the data store right after it has been published to Kafka. This can make for confusing user experiences, if for instance a user can’t see their appointment right after creating it. Also other tasks which are trivial when writing to a database first—​such as enforcing unique constraints—​are becoming challenging with this approach. While some techniques for mitigation exist, it’s not an approach I’d recommend due to the inherent complexities.\nStream Processing on \u0026#34;Raw\u0026#34; Change Event Streams Another alternative is to use change data capture not for an outbox table, but for the actual data tables of the service itself, such as the \u0026#34;Appointment\u0026#34; table in the Oh-my-Dawg example. Exposing these change event streams across service—​and team—​boundaries can be problematic, as it exposes a service’s internal data model and its structure to external consumers. In particular when it comes to changes to the data schema, this can cause friction and disruption. Stream processing, for instance via Apache Flink or Kafka Streams, can be a way out, establishing consciously crafted data contracts between the providing service and event consumers. With tools like Flink SQL, you can apply data transformations for shaping the published events, limit which events get published, create denormalized events by joining multiple change event streams, and much more. Flink CDC can be used to run Debezium natively within Flink, essentially letting a Flink job take the role of the outbox relay.\nFigure 3. Joining \u0026#34;raw\u0026#34; change event streams (produced by Debezium and Flink CDC) via Flink SQL The big advantage of this over the outbox pattern is that no code changes to the source application itself are required. 
Also there is no impact on the source database, as it only contains the actual data tables and nothing more. The shaping of published events sent to other services happens completely asynchronously. The downside is that you need to add a stream processing solution to the overall architecture (though managed services can help navigate this complexity), and stream processing has a learning curve on its own. Establishing transactional semantics—​i.e. emitting a derived public event only once all relevant events from a source transaction have been processed—​can be a challenging task, too. With the outbox pattern, which emits messages in the context of transactions in the source database, this comes for free.\n2PC Some folks argue that the outbox pattern essentially is a work-around for the lack of 2PC (two-phase commit) support in Kafka. A spicy take, I like it! And indeed, there’s something to this. If you are using messaging infrastructure like a good ol\u0026#39; JMS broker which supports enlisting in global transactions alongside databases, you’ll get the all-or-nothing semantics for writing to a service’s database and sending a message to other services which we’re after.\nSo it should be good news then that something similar will be possible with Kafka soon, as the Kafka community currently is working on KIP-939 (\u0026#34;Support Participation in 2PC\u0026#34;). But in my opinion, this KIP won’t render the outbox pattern obsolete; the reason being the implications on service availability. With the outbox pattern, a service only needs a single resource to be available in order to process an incoming request—​its own database. Whereas with 2PC, it needs two resources, the database and the Kafka cluster, which means that the service’s overall availability actually is lower.\nAlso, there usually isn’t a good reason in the first place to make the process of emitting a notification to external services part of the synchronous call flow. 
This activity isn’t relevant for the client making the inbound request and you’re just unnecessarily extending request processing time. Another way to see this is as eating into your \u0026#34;synchrony budget\u0026#34; (a notion I am planning to discuss in more detail in a future blog post) without any benefit. Another, minor, advantage of the outbox pattern is that developers of the writing service only need to know one API (for interacting with the database) rather than two APIs, for database access and for message platform access, respectively.\nDurable Execution Using the outbox pattern and a bit of glue code, it is possible to orchestrate Sagas, long-running business transactions, across distributed services. Durable execution frameworks such as Temporal, Restate, or DBOS Transact aim to take things one step further with a higher-level programming experience. In a nutshell, they provide a form of persistent continuations, transparently retrying failed operations, keeping track of the execution state of a service, and automatically restoring it in case of failures.\nThere has been a fair amount of buzz around these solutions lately and their promise of drastically simplifying the creation of robust and fault-tolerant distributed workflows. Whether they’ll gain widespread adoption in the industry remains to be seen at this point. In particular the requirement to implement workflows using particular SDKs—​potentially creating a lock-in effect—​and the need for framework-specific execution runtimes such as the Temporal Service or the Restate server, typically with bespoke state store implementations, may pose a challenge for proliferation. But if they manage to live up to their promise, they may free developers from having to deal with details such as outboxes, retries, etc.\nSummary So, where does all that leave us? 
Is the outbox pattern still relevant and should the development team at Oh-my-Dawg use it for implementing data exchanges between their microservices? Or are there clearly superior alternatives, rendering the pattern obsolete?\nAs far as I am concerned, the outbox pattern continues to deserve a very central spot in the toolbox of development teams building microservices architectures.\nWhile alternatives do exist, they each come with their own specific trade-offs, around a number of aspects such as consistency, availability, queryability, developer experience, operational complexity, and more. The outbox pattern puts a strong focus on consistency and reliability (i.e. eventual consistency across services is ensured also in case of failures), availability (a writing service only needs a single resource, its own database) and letting developers benefit from all the capabilities of their well-known datastore (instant read-your-own-writes, queries, etc.). But each situation comes with its own context: if, for instance, you are not in a position to modify a given service, stream processing on table-level change event streams may be the way to go for you.\nWhenever possible, the outbox pattern should be implemented via log-based CDC rather than polling, thus ensuring correct ordering of outbox events and mitigating concerns around overhead on the database. A log-only implementation, for instance using Postgres\u0026#39; pg_logical_emit_message(), is particularly appealing.\nIn any case, whether you are using the outbox pattern, or any of the alternatives, what you should never do is just write to a database and a messaging broker without global transactional guarantees, hoping for the best. Unfortunately, there are many blog posts and articles out there advocating for exactly that, whether aware of the implications or not. 
In particular, in the absence of distributed transactions, it is not sufficient to trigger a Kafka message using a local database transaction synchronization, which is a commonly described approach. Processes will shut down uncleanly, machines will crash, and networks will stop working. So inconsistencies are bound to happen when not taking the right precautions.\nBetter don’t go there and always remember: Friends don’t let friends do dual writes!\n","id":68,"publicationdate":"Oct 31, 2024","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_recap_whats_the_outbox_pattern\"\u003eRecap: What’s The Outbox Pattern?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_implementation_considerations\"\u003eImplementation Considerations\u003c/a\u003e\n\u003cul class=\"sectlevel2\"\u003e\n\u003cli\u003e\u003ca href=\"#_polling_vs_log_based_cdc\"\u003ePolling vs. 
Log-Based CDC\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_outbox_table\"\u003eThe Outbox Table\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_pg_logical_emit_message\"\u003epg_logical_emit_message()\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_format_considerations\"\u003eFormat Considerations\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_backfills\"\u003eBackfills\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_idempotency_for_consumers\"\u003eIdempotency for Consumers\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_criticisms_of_the_outbox_pattern\"\u003eCriticisms of the Outbox pattern\u003c/a\u003e\n\u003cul class=\"sectlevel2\"\u003e\n\u003cli\u003e\u003ca href=\"#_database_overhead\"\u003eDatabase Overhead\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_complexity\"\u003eComplexity\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_latency\"\u003eLatency\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_discussion\"\u003eDiscussion\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_alternatives_to_the_outbox_pattern\"\u003eAlternatives to the Outbox Pattern\u003c/a\u003e\n\u003cul class=\"sectlevel2\"\u003e\n\u003cli\u003e\u003ca href=\"#_dapr\"\u003eDapr\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_read_yourself\"\u003eRead-yourself\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_stream_processing_on_raw_change_event_streams\"\u003eStream Processing on \u0026#34;Raw\u0026#34; Change Event Streams\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_2pc\"\u003e2PC\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_durable_execution\"\u003eDurable Execution\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca 
href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/revisiting-the-outbox-pattern\"\u003eDecodable blog\u003c/a\u003e. All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOver the last few years, the outbox pattern has become a common solution for implementing data exchange flows between microservices.\nIt allows services to safely and reliably update their own local datastore and at the same time send out notifications to other services via data streaming platforms such as Apache Kafka.\nBut time isn’t standing still: people ask about disadvantages of the pattern (\u003cem\u003eis the database becoming a bottleneck?\u003c/em\u003e), alternative solutions such as \u0026#34;listen-to-yourself\u0026#34; have been proposed, Kafka is about to get support for participating in 2-phase commit (2PC) transactions, and more.\nIt’s time to take another look at the outbox pattern, what it is and how to implement it, whether it’s still relevant in 2024, and which alternatives exist!\u003c/p\u003e\n\u003c/div\u003e","tags":["cdc","debezium","postgres","kafka"],"title":"Revisiting the Outbox Pattern","uri":"https://www.morling.dev/blog/revisiting-the-outbox-pattern/"},{"content":" During and after my time as the lead of Debezium, a widely used open-source platform for Change Data Capture (CDC) for a variety of databases, I got repeatedly asked whether I’d be interested in creating a company around CDC. VCs, including well-known household names, did and do reach out to me, pitching this idea.\nOn the surface, this sounds tempting. CDC, and Debezium in particular, are widely used in the data sphere. So taking a few million in seed capital and building CDC-as-a-Service sounds like an attractive idea, doesn’t it? 
Living the start-up life and creating a unicorn-to-be, oh what a sweet dream. But having worked on CDC for quite a few years now, I am convinced that this wouldn’t be the right thing to do.\nThe reason is that CDC is a feature, not a product.\nBy that I mean that CDC is an incredibly powerful tool, a huge enabler for working with your data in real time, enabling a wide range of use cases such as replication, cache and search index updates, auditing, microservice data exchange, and many others. Liberty for your data—​rejoice!\nBut it’s just that: an enabler. CDC isn’t really that useful on its own. You ingest data change events into Kafka, and then what? At the very least, you want to have sink connectors which take that data and put it elsewhere. For a successful product, you need to solve a problem people have. And that problem rarely is \u0026#34;Take my data from Postgres to Kafka\u0026#34;, and much more often is \u0026#34;Take my data from Postgres to Snowflake/Elasticsearch/S3\u0026#34;. Very often, you also want to apply some processing to your change event streams, e.g. to filter, transform, denormalize, or aggregate them.\nIn my opinion, CDC makes sense as part of a cohesive data platform which integrates all these things. These, and more: also data governance, schema management, observability, quality management, etc. Another angle for CDC productization could be to marry it closely with a database. Imagine Postgres provided out of the box a Kafka broker endpoint to which you can subscribe for getting Debezium-formatted data change events. How cool would that be? But again, that’s a feature, not a product.\nNow, there have been a few start-ups focused on CDC lately. Two that stuck out to me were Arcion and PeerDB: They got acquired quickly by Databricks and ClickHouse, respectively. 
I suppose this was with the goal of turning them—​you guessed it—​into features of their data offerings.\n","id":69,"publicationdate":"Oct 18, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eDuring and after my time as the lead of \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e,\na widely used open-source platform for Change Data Capture (CDC) for a variety of databases,\nI got repeatedly asked whether I’d be interested in creating a company around CDC.\nVCs, including well-known household names, did and do reach out to me,\npitching this idea.\u003c/p\u003e\n\u003c/div\u003e","tags":["cdc","debezium","architecture"],"title":"CDC Is a Feature Not a Product","uri":"https://www.morling.dev/blog/cdc-is-a-feature-not-a-product/"},{"content":"","id":70,"publicationdate":"Oct 6, 2024","section":"tags","summary":"","tags":null,"title":"automation","uri":"https://www.morling.dev/tags/automation/"},{"content":"","id":71,"publicationdate":"Oct 6, 2024","section":"tags","summary":"","tags":null,"title":"cloud","uri":"https://www.morling.dev/tags/cloud/"},{"content":" Table of Contents Creating Instances Configuring SSH Provisioning Software Try It Out Yourself Whenever I need a Linux box for some testing or experimentation, or projects like the One Billion Row Challenge a few months back, my go-to solution is Hetzner Online, a data center operator here in Europe.\nTheir prices for VMs are unbeatable, starting with 3.92 €/month for two shared vCPUs (either x64 or AArch64), four GB of RAM, and 20 TB of network traffic (these are prices for their German data centers; they vary between regions). Four dedicated cores with 16 GB, e.g. for running a small web server, will cost you 28.55 €/month. Getting a box with similar specs on AWS would set you back a multiple of that, with the (outbound) network cost being the largest chunk. 
So it’s not a big surprise that more and more people realize the advantages of this offering, most notably Ruby on Rails creator David Heinemeier Hansson, who has been singing the praises of Hetzner’s dedicated servers, but also their VM instances, quite a bit on Twitter lately.\nSo I thought I’d share the automated process I’ve been using over the last few years for spinning up new boxes on Hetzner Cloud, hoping it’s gonna be helpful to other folks out there eager to explore this world of cheap compute. I’ve had that set-up in a GitHub repo for quite a while and meant to write about it, with the recent attention on Hetzner being a nice motivator for finally doing so. Note I am not affiliated with Hetzner in any way, shape, or form; I just like their offering and think more people should be aware of it and benefit from it.\nCreating Instances To create new VMs, I am using Terraform, which shouldn’t be a big surprise. The Hetzner Terraform provider is very mature and reflects the latest product features pretty quickly, as far as I can tell (alternatively, there’s a CLI tool, and of course an API as well). 
Here’s my complete Terraform definition for launching one VM instance and a firewall to control access to it:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 terraform { required_providers { hcloud = { source = \u0026#34;hetznercloud/hcloud\u0026#34; version = \u0026#34;~\u0026gt; 1.45\u0026#34; } } } variable \u0026#34;hcloud_token\u0026#34; { sensitive = true } variable \u0026#34;firewall_source_ip\u0026#34; { default = \u0026#34;0.0.0.0\u0026#34; } # Configure the Hetzner Cloud Provider provider \u0026#34;hcloud\u0026#34; { token = \u0026#34;${var.hcloud_token}\u0026#34; (1) } resource \u0026#34;hcloud_firewall\u0026#34; \u0026#34;common-firewall\u0026#34; { (2) name = \u0026#34;common-firewall\u0026#34; rule { direction = \u0026#34;in\u0026#34; protocol = \u0026#34;tcp\u0026#34; port = \u0026#34;14625\u0026#34; (3) source_ips = [ \u0026#34;${var.firewall_source_ip}/32\u0026#34; (4) ] } rule { direction = \u0026#34;in\u0026#34; protocol = \u0026#34;icmp\u0026#34; source_ips = [ \u0026#34;${var.firewall_source_ip}/32\u0026#34; ] } } resource \u0026#34;hcloud_server\u0026#34; \u0026#34;control\u0026#34; { (5) name = \u0026#34;control\u0026#34; image = \u0026#34;fedora-40\u0026#34; location = \u0026#34;nbg1\u0026#34; server_type = \u0026#34;cx22\u0026#34; (6) keep_disk = true ssh_keys = [\u0026#34;some key\u0026#34;] (7) firewall_ids = [hcloud_firewall.common-firewall.id] } output \u0026#34;control_public_ip4\u0026#34; { value = \u0026#34;${hcloud_server.control.ipv4_address}\u0026#34; } 1 Hetzner Cloud API token, defined in .tfvars 2 Setting up a firewall for limiting access to the instance 3 Using a random non-standard SSH port; take that, script kiddies! 
And no, this is not the one I am actually using 4 If I don’t need public access, allowing to connect only from my own local machine 5 The VM to set up 6 The instance size, in this case the smallest one they have with 2 vCPUs and 4 GB of RAM 7 SSH access key, to be set up in the web console before Bringing up the VM is as easy as running the following command:\n1 TF_VAR_firewall_source_ip=`dig +short txt ch whoami.cloudflare @1.0.0.1 | tr -d \u0026#39;\u0026#34;\u0026#39;` terraform apply -var-file=.tfvars Note how I am injecting my own public IP as a variable, allowing the firewall configuration to be trimmed down to grant access only from that IP. That’s my standard set-up for test and dev boxes which don’t require public access. After just a little bit, your new cloud VM will be up and running, with Terraform reporting the IP address of the new box in its output. The cool thing is that you can rescale this box later on as needed. If you set keep_disk to true as above, the box will keep its initial disk size, allowing you to scale back down later on, too.\nSo I’ll always start with the smallest configuration, which costs not even four Euros per month. Then, when I am actually going to make use of the box for something which requires a bit more juice, I’ll update the server_type line as needed, e.g. to \u0026#34;ccx33\u0026#34; for eight dedicated vCPUs and 32 GB of RAM. This configuration would then cost 9,2 cents per hour, until I scale it back down again to cx22. Rescaling just takes a minute or two and is done by re-running Terraform as shown above. So it’s something which you can easily do whenever starting or stopping to work on some project. Of course, this makes sense for ad-hoc usage scenarios like mine, not so much for more permanently running workloads.\nConfiguring SSH After the box has been set up via Terraform, I am using Ansible for provisioning, i.e. the installation of software (yepp, my Red Hat past is shining through here). 
That way, the process is fully automated, and I can set up and provision new machines with the same configuration with ease at any time. My Ansible set-up is made up of two parts: one for configuring SSH, one for installing whatever packages are needed.\nHere’s the playbook for the SSH configuration, applying some best practices such as enforcing key-based authentication and disabling remote root access:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 --- - name: Create user (1) hosts: all remote_user: root gather_facts: false vars_files: - vars.yml tasks: - name: have {{ user }} user user: name: \u0026#34;{{ user }}\u0026#34; shell: /bin/bash - name: add wheel group group: name: wheel state: present - name: Allow wheel group to have passwordless sudo lineinfile: dest: /etc/sudoers state: present regexp: \u0026#39;^%wheel\u0026#39; line: \u0026#39;%wheel ALL=(ALL) NOPASSWD: ALL\u0026#39; validate: visudo -cf %s - name: add user user: name={{ user }} groups=wheel state=present append=yes - name: Add authorized key authorized_key: user: \u0026#34;{{ user }}\u0026#34; state: present key: \u0026#34;{{ lookup(\u0026#39;file\u0026#39;, \u0026#39;{{ ssh_public_key_file }}\u0026#39;) }}\u0026#34; (2) - name: Set up SSH (3) hosts: all remote_user: \u0026#34;build\u0026#34; become: true become_user: root gather_facts: false vars_files: - vars.yml tasks: - name: Disable root login over SSH lineinfile: dest=/etc/ssh/sshd_config regexp=\u0026#34;^PermitRootLogin\u0026#34; line=\u0026#34;PermitRootLogin no\u0026#34; state=present notify: - restart sshd - name: Disable password login lineinfile: dest=/etc/ssh/sshd_config regexp=\u0026#34;^PasswordAuthentication\u0026#34; line=\u0026#34;PasswordAuthentication no\u0026#34; state=present notify: - restart sshd - name: Change SSH port lineinfile: dest=/etc/ssh/sshd_config 
regexp=\u0026#34;^#Port 22\u0026#34; line=\u0026#34;Port 14625\u0026#34; state=present notify: - restart sshd handlers: - name: restart sshd service: name: sshd state: restarted 1 Adding a user \u0026#34;build\u0026#34; (name defined in vars.yml) with sudo permissions 2 The SSH key to add for the user 3 Configuring SSH: disabling remote root login, disabling password login, and changing the SSH port to a non-standard value. Before running Ansible, I need to put the IP reported by Terraform into the hosts file, along with the paths of the private and public SSH keys:\n1 2 [hetzner] \u0026lt;IP of the box\u0026gt;:14625 ansible_ssh_private_key_file=path/to/my-key ssh_public_key_file=/path/to/my-key.pub Then this playbook can be run like so:\n1 ansible-playbook -i hosts --limit=hetzner init-ssh.yml Note that this playbook can be executed only once: afterwards, the root user cannot connect via SSH anymore. Purists out there might say that the non-standard SSH port smells a bit like security by obscurity, and they wouldn’t be wrong. But it does help to prevent lots of entries about failed log-in attempts in the log, as most folks just randomly looking for machines to hack won’t bother trying with ports other than 22.\nProvisioning Software With the SSH configuration hardened a bit, it’s time to install some software onto the machine. What you’ll install depends on your specific requirements, of course. 
For my purposes, I have two roles for installing some commonly required things and Docker, which both are incorporated via a playbook to be executed by the build user set up in the step before:\n1 2 3 4 5 6 7 8 9 --- - hosts: all remote_user: build roles: - base - docker vars_files: - vars.yml Here’s the base role’s task definitions:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 - name: upgrade all packages become: true become_user: root dnf: name=\u0026#34;*\u0026#34; state=latest - name: Have common tools become: true become_user: root dnf: name={{item}} state=latest with_items: - git - wget - the_silver_searcher - htop - acl - dnf-plugins-core - bash-completion - jq - gnupg - haveged - vim-enhanced - entr - zip - fail2ban - httpie - hyperfine - name: Have SDKMan become: no shell: \u0026#34;curl -s \u0026#39;https://get.sdkman.io\u0026#39; | bash\u0026#34; args: executable: /bin/bash creates: /home/build/.sdkman/bin/sdkman-init.sh - name: Have .bashrc copy: src: user_bashrc dest: /home/{{ user }}/.bashrc mode: 0644 I used to install Java via a separate role, allowing me to switch versions via update-alternatives, but this became a bit of a hassle, so I am doing this via the amazing SDKMan tool now. Finally, for the sake of completeness, here are the tasks for installing Docker. 
It’s a bit more complex than I’d like it to be, due to the fact that a separate DNF repo must be configured first:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 - name: Have docker repo become: true become_user: root shell: \u0026#39;dnf config-manager \\ --add-repo \\ https://download.docker.com/linux/fedora/docker-ce.repo\u0026#39; - name: Have dnf cache updated become: true become_user: root shell: \u0026#39;dnf makecache\u0026#39; - name: Have Docker become: true become_user: root dnf: name={{item}} state=latest with_items: - docker-ce - docker-ce-cli - containerd.io - docker-compose - docker-buildx-plugin - name: add docker group group: name=docker state=present become: true become_user: root - name: Have /etc/docker file: path=/etc/docker state=directory become: true become_user: root - name: Have daemon.json become: true become_user: root copy: src: docker_daemon.json dest: /etc/docker/daemon.json - name: Ensure Docker is started become: true become_user: root systemd: state: started enabled: yes name: docker - name: add user become: true become_user: root user: name={{ user}} groups=docker state=present append=yes Try It Out Yourself Thanks to Terraform and Ansible, spinning up a box for testing and development on Hetzner Cloud can be fully automated, letting you go from zero to a running VM—​set up for safe SSH access, and provisioned with the software you need—​within a few minutes. Once your VM is running, you can scale it up, and back down, based on your specific workloads. This allows you to stay on a really, really cheap configuration when you don’t actually need it, and then scale up and pay a bit more just for the hours you actually require the additional power.\nYou can find my complete Terraform and Ansible set-up for Hetzner Cloud in this GitHub repository. 
Note that this is purely a side project I am using for personal projects, such as ad-hoc experimentation with new Java versions. I am not a Linux sysadmin by profession, so make sure to examine all the details and use it at your own risk. In case you do want to run this on a publicly reachable box and not behind a firewall, I recommend you install fail2ban as an additional measure of caution.\nIf you have any suggestions for improving this set-up, in particular for further improving security, please let me know in the comments below.\n","id":72,"publicationdate":"Oct 6, 2024","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_creating_instances\"\u003eCreating Instances\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_configuring_ssh\"\u003eConfiguring SSH\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_provisioning_software\"\u003eProvisioning Software\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_try_it_out_yourself\"\u003eTry It Out Yourself\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhenever I need a Linux box for some testing or experimentation,\nor projects like the \u003ca href=\"/blog/1brc-results-are-in/\"\u003eOne Billion Row Challenge\u003c/a\u003e a few months back,\nmy go-to solution is \u003ca href=\"https://www.hetzner.com/\"\u003eHetzner Online\u003c/a\u003e, a data center operator here in Europe.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eTheir prices for VMs are unbeatable, starting with 3.92 €/month for two shared vCPUs (either x64 or AArch64), four GB of RAM, and 20 TB of network traffic\n(these are prices for their German data centers, they vary between regions).\nFour dedicated cores with 16 GB, e.g. 
for running a small web server, will cost you 28.55 €/month.\nGetting a box with similar specs on AWS would set you back a multiple of that, with the (outbound) network cost being the largest chunk.\nSo it’s not a big surprise that more and more people realize the advantages of this offering,\nmost notably Ruby on Rails creator \u003ca href=\"https://x.com/dhh/\"\u003eDavid Heinemeier Hansson\u003c/a\u003e,\nwho has been singing the praise for Hetzner’s dedicated servers, but also their VM instances, quite a bit on \u003ca href=\"https://x.com/search?q=from%3Adhh%20hetzner\u0026amp;src=typed_query\u0026amp;f=live\"\u003eTwitter\u003c/a\u003e lately.\u003c/p\u003e\n\u003c/div\u003e","tags":["cloud","infrastructure","automation"],"title":"How I Am Setting Up VMs On Hetzner Cloud","uri":"https://www.morling.dev/blog/how-i-am-setting-up-vms-on-hetzner-cloud/"},{"content":"","id":73,"publicationdate":"Oct 6, 2024","section":"tags","summary":"","tags":null,"title":"infrastructure","uri":"https://www.morling.dev/tags/infrastructure/"},{"content":"","id":74,"publicationdate":"Aug 26, 2024","section":"tags","summary":"","tags":null,"title":"algorithms","uri":"https://www.morling.dev/tags/algorithms/"},{"content":"","id":75,"publicationdate":"Aug 26, 2024","section":"tags","summary":"","tags":null,"title":"aws","uri":"https://www.morling.dev/tags/aws/"},{"content":" Table of Contents The Algorithm Obtaining the Lock Expiring a Lock Lock Validity Fencing Off Zombies In distributed systems, for instance when scaling out some workload to multiple compute nodes, it is a common requirement to select a leader for performing a given task: only one of the nodes should process the records from a Kafka topic partition, write to a file system, call a remote API, etc. Otherwise, multiple workers may end up doing the same task twice, overwriting each other’s data, and worse.\nOne way to implement leader election is distributed locking. 
All the nodes compete to obtain a specific lock, but only one of them can succeed; that node is then the selected leader for as long as it holds the lock. Systems like Apache ZooKeeper or Postgres (via Advisory Locks) provide the required building blocks for this.\nNow, if your application is solely in the business of writing data to object storage such as Amazon S3, Google Cloud Storage, or Azure Blob Storage, running such a stateful service solely for the purpose of leader election can be an overhead which you’d like to avoid from an operational as well as financial perspective. While you could implement distributed locks on the latter two platforms for quite a while with their respective compare-and-swap (CAS) operations, this notoriously was not the case for S3. That is, until last week, when AWS announced support for conditional writes on S3, which was received with great excitement by many folks in the data and distributed systems communities.\nIn a nutshell, the S3 PutObject operation now supports an optional If-None-Match header. When specified, the call will only succeed when no file with the same key exists in the target bucket yet; otherwise you’ll get a 412 Precondition Failed response. Compared to what’s available on GCP and Azure, that’s rather limited, but it’s all you need for implementing a locking scheme for leader election.\nThe Algorithm The basic idea is to have nodes compete to create a lock file, with the winner being the leader. As S3 conditional writes don’t prevent lost updates to existing files, a new lock file will be created for each leader epoch, i.e. whenever leadership changes, either after a node failure or when the leader releases the lock voluntarily. The lock file can be a simple JSON structure like this:\n{ \u0026#34;expired\u0026#34; : false } The expired attribute is used for releasing a lock after use (more on that below). The leader epoch, a strictly increasing numeric value, is part of the file name, e.g. 
lock-0000000001.json. This allows you to determine the current epoch by listing all lock files and finding the one with the highest epoch value (all lock files but the latest one can be removed by a background process, thus keeping the cost for the listing call constant).\nHere’s the overall leader election algorithm:\n1. List all lock files 2. If there is no lock file, or the latest one has expired: 3. Increment the epoch value by 1 and try to create a new lock file 4. If the lock file could be created: 5. The current node is the leader, start with the actual work 6. Otherwise, go back to 1. 7. Otherwise, another process already is the leader, so do nothing. Go back to 1. periodically Obtaining the Lock To obtain the lock (step 3.), put a file for the next epoch. The key thing is to pass the If-None-Match header and handle the potential 412 Precondition Failed response. Using the AWS Java SDK, this could look like so:\nint epoch = ...; PutObjectRequest put = PutObjectRequest.builder() .bucket(BUCKET) .key(\u0026#34;lock-%010d.json\u0026#34;.formatted(epoch)) .ifNoneMatch(\u0026#34;*\u0026#34;) .build(); try { s3.putObject(put, RequestBody.fromString(\u0026#34;\u0026#34;\u0026#34; { \u0026#34;expired\u0026#34;: false } \u0026#34;\u0026#34;\u0026#34;)); } catch(S3Exception e) { if (e.statusCode() == 412) { //handle elsewhere and start over throw new LockingFailedException(); } else { throw e; } } If you receive a 412 response, this means another process has created the lock file between the time you listed the existing locks and now. That way, it is guaranteed that only one process succeeds in creating the lock for the current epoch and thus becomes the leader.\nExpiring a Lock At some point, the current leader may decide to step down from this role, for instance when gracefully shutting down. 
This is as simple as setting the expired attribute to true and updating the current lock file:\n{ \u0026#34;expired\u0026#34; : true } When other nodes list the existing lock files subsequently, they’ll see that the lock has expired and thus a new leader needs to be elected. Note that only the process which created the lock file for a given epoch may ever expire it; otherwise, chaos may ensue. Naturally, this brings up the question of what happens when a leader never expires its lock, such as when it crashes. In that case, no new leader could ever be elected without manual intervention, hampering the liveness of the system.\nLock Validity To address this situation, you can add another attribute to the lock file format, defining for how long it should be valid:\n{ \u0026#34;validity_ms\u0026#34; : 60000, \u0026#34;expired\u0026#34; : false } In this example, the lock should be valid for 60 seconds. For each file, S3 provides the last modification timestamp, specifying when it was created or last updated. When performing its work, the current leader needs to check whether the lock is still valid (i.e. whether less than 60 seconds have passed since the lock was obtained), optionally touching the file in order to extend the lease. Similarly, current non-leader nodes can check whether the latest lock is still valid or not.\nWhat about clock drift though? After all, you never should rely on clock accuracy of different nodes when building distributed systems. But the good news is, you don’t have to. Let’s discuss the different cases. If the current leader’s clock is ahead, it will stop doing its work, despite the lock still being valid. Similarly, if the clock of a current non-leader is behind, it may not try to acquire leadership although the current lock already has expired. 
While this may impact throughput of the system, neither case is a correctness problem.\nThings look different if the current leader relies on a lock after it has expired (because its clock is behind) and another leader has been elected already, or if a non-leader determines prematurely that the current lock has expired (because its clock is ahead) and thus picks up leadership.\nIn both cases, there’s more than one node which assumes it is the leader, which is exactly what we want to avoid with leader election. But as it turns out, this is just the nature of the beast: leader election will only ever be eventually correct. As Martin Kleppmann describes in this excellent post, checking lock validity and performing the leader’s actual work are not atomic, no matter how hard you try (for instance, think of unexpected GC pauses). So you’ll always need to be prepared to detect and fence off work done by a previous leader.\nMinimizing Clock Drift While you never should rely on clock consistency across systems from a correctness point of view, it does make sense to keep clocks synchronized on a best-effort basis, thus reducing the aforementioned throughput impact. To do so, nodes could create a temporary file on S3 and compare its creation time on S3 with their local time. Alternatively, you could use the Amazon Time Sync Service, which offers microsecond time accuracy.\nFencing Off Zombies As a solution, Kleppmann suggests using the leader epoch as a fencing token. The epoch value only ever increases, so it can be used to identify requests by a stale leader (\u0026#34;zombie\u0026#34;). When for instance invoking a remote API, the fencing token could be passed as a request header, allowing the API provider to recognize and discard zombie requests by keeping track of the highest epoch value it has seen. 
Of course this requires the remote API to support the notion of fencing tokens, which may or may not be the case.\nAs an example targeting S3 (which doesn’t have bespoke support for fencing tokens), SlateDB implements this by uploading files following a serial order (similar to the lock file naming scheme above) and detecting conflicts between competing writers trying to create the same file. Thanks to the new support for conditional writes on S3, this task is trivial, not requiring any external stateful services any longer.\n","id":76,"publicationdate":"Aug 26, 2024","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_algorithm\"\u003eThe Algorithm\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_obtaining_the_lock\"\u003eObtaining the Lock\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_expiring_a_lock\"\u003eExpiring a Lock\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_lock_validity\"\u003eLock Validity\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_fencing_off_zombies\"\u003eFencing Off Zombies\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn distributed systems, for instance when scaling out some workload to multiple compute nodes,\nit is a common requirement to select a \u003cem\u003eleader\u003c/em\u003e for performing a given task:\nonly one of the nodes should process the records from a Kafka topic partition, write to a file system, call a remote API, etc.\nOtherwise, multiple workers may end up doing the same task twice, overwriting each other’s data, and worse.\u003c/p\u003e\n\u003c/div\u003e","tags":["distributed-systems","aws","algorithms"],"title":"Leader Election With S3 Conditional 
Writes","uri":"https://www.morling.dev/blog/leader-election-with-s3-conditional-writes/"},{"content":"","id":77,"publicationdate":"Jul 6, 2024","section":"tags","summary":"","tags":null,"title":"jq","uri":"https://www.morling.dev/tags/jq/"},{"content":"","id":78,"publicationdate":"Jul 6, 2024","section":"tags","summary":"","tags":null,"title":"shell","uri":"https://www.morling.dev/tags/shell/"},{"content":" In my day job at Decodable, I am currently working with Terraform to provision some cloud infrastructure for an upcoming hands-on lab. Part of this set-up is a Postgres database on Amazon RDS, which I am creating using the Terraform AWS modules. Now, once my database was up and running, I wanted to extract two dynamically generated values from Terraform: the random password created for the root user, and the database host URL. On my way down the rabbit hole for finding a CLI command for doing this efficiently, I learned a few interesting shell details which I’d like to share.\nThe basic idea is to fetch the current Terraform state via terraform show -json and then extract the two values we’re after from that. The JSON output of Terraform looks like follows. The values I am after are on lines 20 and 40, respectively (shortened for readability, and no, those aren’t the actual values from my database instance 😉):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 { \u0026#34;format_version\u0026#34;: \u0026#34;1.0\u0026#34;, \u0026#34;terraform_version\u0026#34;: \u0026#34;1.5.4\u0026#34;, \u0026#34;values\u0026#34;: { \u0026#34;root_module\u0026#34;: { \u0026#34;resources\u0026#34;: [ ... 
], \u0026#34;child_modules\u0026#34;: [ { \u0026#34;resources\u0026#34;: [ { \u0026#34;address\u0026#34;: \u0026#34;module.lab-001.aws_db_instance.lab_001_db\u0026#34;, \u0026#34;mode\u0026#34;: \u0026#34;managed\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;aws_db_instance\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;lab_001_db\u0026#34;, \u0026#34;provider_name\u0026#34;: \u0026#34;registry.terraform.io/hashicorp/aws\u0026#34;, \u0026#34;schema_version\u0026#34;: 2, \u0026#34;values\u0026#34;: { \u0026#34;address\u0026#34;: \u0026#34;lab-001-db.a4dadf981fgh.us-east-1.rds.amazonaws.com\u0026#34;, ... }, \u0026#34;sensitive_values\u0026#34;: { ... }, \u0026#34;depends_on\u0026#34;: [ \u0026#34;module.lab-001.random_password.this\u0026#34;, ... ] }, { \u0026#34;address\u0026#34;: \u0026#34;module.lab-001.random_password.this\u0026#34;, \u0026#34;mode\u0026#34;: \u0026#34;managed\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;random_password\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;this\u0026#34;, \u0026#34;provider_name\u0026#34;: \u0026#34;registry.terraform.io/hashicorp/random\u0026#34;, \u0026#34;schema_version\u0026#34;: 3, \u0026#34;values\u0026#34;: { \u0026#34;result\u0026#34;: \u0026#34;5adCpQc]$s3pQ=a\u0026#34;, ... }, \u0026#34;sensitive_values\u0026#34;: { ... } } ], \u0026#34;address\u0026#34;: \u0026#34;module.lab-001\u0026#34; } ] } } } Extracting the two values is relatively simple using jq. But I wanted to get both values at once, with a single Terraform call—​which is a remote and thus slow operation—​so I could pass them on to psql and get a database session. 
All that without storing the Terraform output in a file (which would taint my workspace), and as a copy/paste friendly snippet which I can add to the README of the project for documentation purposes.\nAfter fiddling around for a little while, I asked for help in our internal Slack, where my fellow Decoder Jared Breeden took the bits I already had and morphed them into this really cool solution (thanks again, mate!):\n1 2 3 4 5 6 7 8 9 10 ({ read -r host read -r password } \u0026lt; \u0026lt;(terraform show -json | jq -r \u0026#39; .values.root_module.child_modules[] | select(.address==\u0026#34;module.lab-001\u0026#34;) | .resources[] | (select(.address==\u0026#34;module.lab-001.random_password.this\u0026#34;) | .values.result), (select(.address==\u0026#34;module.lab-001.aws_db_instance.lab_001_db\u0026#34;) | .values.address)\u0026#39;) psql \u0026#34;postgresql://root:${password}@${host}:5432/labdb\u0026#34;) This does exactly what I want: retrieving the password and database host from the current Terraform state in one go and using them to open a session with the database via psql. So let’s dissect this little gem to understand how it works.\nterraform show -json retrieves the full JSON description of the Terraform state shown above:\n1 terraform show -json The resulting JSON is piped to jq for extracting the values of password and host:\n1 2 3 4 5 6 jq -r \u0026#39; .values.root_module.child_modules[] | select(.address==\u0026#34;module.lab-001\u0026#34;) | .resources[] | (select(.address==\u0026#34;module.lab-001.random_password.this\u0026#34;) | .values.result), (select(.address==\u0026#34;module.lab-001.aws_db_instance.lab_001_db\u0026#34;) | .values.address)\u0026#39; jq is invaluable for handling JSON and I highly recommend spending some time with its reference documentation to learn about it. 
For the case at hand, the select() function is used within a pipeline for finding the right elements within the array of Terraform child modules and extracting the required values. Putting the two inner select() calls into parentheses makes them two separate expressions whose output will go onto separate lines.\nAt this point, the values of host and password are passed to stdout (the order is determined by the order of resource definitions in the input main.tf file and thus stable):\nlab-001-db.a4dadf981fgh.us-east-1.rds.amazonaws.com 5adCpQc]$s3pQ=a How to pass on the two values to psql? This is where the grouping command in curly braces comes in:\n{ read -r host read -r password } \u0026lt; \u0026lt;(...) The list of commands between curly braces will be executed in the current shell context as one unit; in particular any input/output redirections will be applied to all the commands. Here we redirect the input (using the \u0026lt; operator, the counterpart to the more commonly used \u0026gt; operator for redirecting a command’s output) of the grouping command to the output of the jq invocation with the help of process substitution (\u0026lt;(...)), about which I wrote recently.\nYou might wonder why input redirection and process substitution are used here, instead of simply piping the output of jq to the grouping command. Indeed this would work when using zsh as a shell. Other shells such as bash execute each command of a pipeline in its own subshell, though. This means that the two variables wouldn’t be available any longer once the grouping command has completed. 
The input redirection approach thus increases portability of the solution across shells.\nWithin the grouping command, the two lines on stdin are read and stored under the names host and password in the shell context, respectively.\nThat way, they can be referenced in the subsequent command for opening a database session:\n1 psql \u0026#34;postgresql://root:${password}@${host}:5432/labdb\u0026#34; There’s one remaining problem, and that is that the host and password variables are still around after closing the database session, which may pose a security issue. We could call unset to remove them, but it’s even easier to make everything another grouping command, using (...) this time. This ensures a sub-shell is created for the commands which will be destroyed after closing the database session.\nLearning some new shell tricks will never be boring to me. Do you have another solution for solving this little problem? Let me know in the comments below!\n","id":79,"publicationdate":"Jul 6, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn my day job at \u003ca href=\"https://www.decodable.co/\"\u003eDecodable\u003c/a\u003e,\nI am currently working with Terraform to provision some cloud infrastructure for an upcoming hands-on lab.\nPart of this set-up is a Postgres database on Amazon RDS,\nwhich I am creating using the \u003ca href=\"https://developer.hashicorp.com/terraform/tutorials/aws/aws-rds\"\u003eTerraform AWS modules\u003c/a\u003e.\nNow, once my database was up and running,\nI wanted to extract two dynamically generated values from Terraform:\nthe random password created for the root user, and the database host URL.\nOn my way down the rabbit hole for finding a CLI command for doing this efficiently,\nI learned a few interesting shell details which I’d like to share.\u003c/p\u003e\n\u003c/div\u003e","tags":["shell","jq","terraform"],"title":"Shell Spell: Extracting and Propagating Multiple Values With 
jq","uri":"https://www.morling.dev/blog/extracting-and-propagating-multiple-values-with-jq/"},{"content":"","id":80,"publicationdate":"Jul 6, 2024","section":"tags","summary":"","tags":null,"title":"terraform","uri":"https://www.morling.dev/tags/terraform/"},{"content":" The other day, I was looking for means of zipping two Java streams: connecting them element by element—​essentially a join based on stream offset position—​and emitting an output stream with the results. Unfortunately, there is no zip() method offered by the Java Streams API itself. While it was considered for inclusion in early preview versions, the method was removed before the API went GA with Java 8 and you have to resort to 3rd party libraries such as Google Guava if you need this functionality.\nJava 22, scheduled for release later this week, promises to improve the situation here. It introduces a preview API for so-called stream gatherers (JEP 461). Similar to how collectors allow you to implement custom terminal operations on Java streams, gatherers let you add custom intermediary operations to a stream pipeline, providing an extension point for adding stream operations such as distinct() or window(), without having to bake them into the API itself. 
This sounds pretty much like what we need for a zip() method, doesn’t it?\nSo I spent some time studying the JEP and here’s the basic implementation I came up with:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 public record ObjToObjZipper\u0026lt;T1, T2, R\u0026gt;( Stream\u0026lt;T2\u0026gt; other, BiFunction\u0026lt;T1, T2, R\u0026gt; zipperFunction) (1) implements Gatherer\u0026lt;T1, Iterator\u0026lt;T2\u0026gt;, R\u0026gt; { (2) @Override public Supplier\u0026lt;Iterator\u0026lt;T2\u0026gt;\u0026gt; initializer() { (3) return () -\u0026gt; other.iterator(); } @Override public Integrator\u0026lt;Iterator\u0026lt;T2\u0026gt;, T1, R\u0026gt; integrator() { (4) return Gatherer.Integrator.ofGreedy((state, element, downstream) -\u0026gt; { if (state.hasNext()) { return downstream.push(zipperFunction.apply(element, state.next())); } return false; }); } } 1 This gatherer takes the stream to zip with and a function, which is applied to pairs of elements of the two streams and returns the zipped result 2 Gatherer has three type parameters: the element type of the stream the gatherer is applied to, a type for keeping track of intermediary state (in our case, that’s just the iterator of the second stream), and the output type 3 initializer() returns a supplier of the state tracking type, if needed 4 integrator() returns a function which \u0026#34;integrates provided elements, potentially using the provided intermediate state, optionally producing output to the provided Downstream\u0026#34; It’s the first time I have been using this API, so I hope I haven’t done anything too stupid :) The key part of the gatherer is its Integrator implementation. 
This is where for each element of the stream the gatherer is applied to, we take the corresponding element of the given second stream, apply the given function, and emit the function’s return value to the next stage in the stream pipeline.\nThis particular implementation stops emitting elements as soon as one of the two streams has been exhausted, but of course you also could have an implementation with \u0026#34;left join\u0026#34; semantics, or similar. With some more glue code for instantiating this zipping gatherer \u0026#34;builder style\u0026#34; (you can find the complete source code on GitHub), this is how it can be used:\n1 2 3 4 5 6 7 8 9 10 11 12 @Test public void canZipTwoObjectStreams() { List\u0026lt;String\u0026gt; letters = List.of(\u0026#34;a\u0026#34;, \u0026#34;b\u0026#34;, \u0026#34;c\u0026#34;, \u0026#34;d\u0026#34;, \u0026#34;e\u0026#34;); Stream\u0026lt;Integer\u0026gt; numbers = IntStream.range(0, letters.size()) .mapToObj(i -\u0026gt; i); List\u0026lt;String\u0026gt; zipped = letters.stream() .gather(zip(numbers).with((letter, i) -\u0026gt; i + \u0026#34;-\u0026#34; + letter)) (1) .collect(Collectors.toList()); assertThat(zipped).containsExactly(\u0026#34;0-a\u0026#34;, \u0026#34;1-b\u0026#34;, \u0026#34;2-c\u0026#34;, \u0026#34;3-d\u0026#34;, \u0026#34;4-e\u0026#34;); } 1 gather() applies the given gatherer to each element of the stream Et voilà, we have a zip() function which can be used with Java Streams, and short of having a zip() method directly on the Stream interface itself, the resulting code reads quite nicely. 
In order to avoid the boxing of the int stream, I’ve also built an ObjToIntZipper:\n@Test public void canZipObjectWithIntStream() { List\u0026lt;String\u0026gt; letters = List.of(\u0026#34;a\u0026#34;, \u0026#34;b\u0026#34;, \u0026#34;c\u0026#34;, \u0026#34;d\u0026#34;, \u0026#34;e\u0026#34;); IntStream numbers = IntStream.range(0, letters.size()); List\u0026lt;String\u0026gt; zipped = letters.stream() .gather(zip(numbers).with((letter, i) -\u0026gt; i + \u0026#34;-\u0026#34; + letter)) .collect(Collectors.toList()); assertThat(zipped).containsExactly(\u0026#34;0-a\u0026#34;, \u0026#34;1-b\u0026#34;, \u0026#34;2-c\u0026#34;, \u0026#34;3-d\u0026#34;, \u0026#34;4-e\u0026#34;); } Usually I am cautious of types with three or more type arguments, as they easily lead to APIs which are cumbersome to use. But the Gatherer API actually felt quite intuitive to me after just a little while.\nThe only real downside is that this gatherer cannot be parallelized. While the API itself allows for the creation of parallel-ready gatherers (by implementing the optional combiner() method), you don’t have a handle to the second stream’s spliterator of a particular subdivision step from within a gatherer implementation. The only way of doing this is on the spliterator level, as shown by Jose Paumard here. 
Note that both input streams must have the same length in order for this to work, as otherwise you’d end up zipping elements at different positions in the two input streams.\nYou can find the complete source code of the proof-of-concept zipping gatherer in this GitHub repository.\n","id":81,"publicationdate":"Mar 18, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day, I was looking for means of \u003ca href=\"https://twitter.com/gunnarmorling/status/1764305703047438361\"\u003ezipping two Java streams\u003c/a\u003e:\nconnecting them element by element—​essentially a join based on stream offset position—​and emitting an output stream with the results.\nUnfortunately, there is no \u003ccode\u003ezip()\u003c/code\u003e method offered by the Java Streams API itself.\nWhile it was considered for inclusion in early preview versions,\nthe method was removed before the API went GA with Java 8 and you have to resort to 3rd party libraries such as \u003ca href=\"https://guava.dev/releases/snapshot-jre/api/docs/com/google/common/collect/Streams.html#zip(java.util.stream.Stream,java.util.stream.Stream,java.util.function.BiFunction)\"\u003eGoogle Guava\u003c/a\u003e if you need this functionality.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","streams","language-features"],"title":"A Zipping Gatherer","uri":"https://www.morling.dev/blog/zipping-gatherer/"},{"content":"","id":82,"publicationdate":"Mar 18, 2024","section":"tags","summary":"","tags":null,"title":"language-features","uri":"https://www.morling.dev/tags/language-features/"},{"content":"","id":83,"publicationdate":"Mar 18, 2024","section":"tags","summary":"","tags":null,"title":"streams","uri":"https://www.morling.dev/tags/streams/"},{"content":" Table of Contents Full Events Delta Events Id-only Events Change Event Metadata Comparison This post originally appeared on the Decodable blog. 
All rights reserved.\nData change events are at the core of Change Data Capture (CDC) solutions such as Debezium. They describe the changes made to a specific record in a database and allow event consumers to take action based on this information, enabling a wide range of use cases, such as real-time ETL (by propagating the updated data into downstream data stores such as data warehouses, analytics databases, or fulltext search indexes), microservices data exchange, or audit logging.\nWhat is contained within a change event, really? What kinds of change events exist, and when should you use which? These are some of the questions I’d like to answer in this post by developing a taxonomy of data change events, discussing three kinds of events:\nFull events, which contain the complete state of a changed record,\nDelta events, which contain the mutated fields of a record, and\nId-only events, which contain only the id (primary key) of a changed record.\nFull Events Let’s start with the type of event which most users of CDC probably will be familiar with: full, or complete, data change events. Whenever something changes to a record in a source datastore, such a change event will contain the complete state of that record. As an example, let’s consider a table customers with columns id, first_name, and last_name, as well as an array-typed column emails. If a customer record’s first_name value gets updated, while the other fields don’t change, the corresponding change event could look like this, using JSON notation:\n1 2 3 4 5 6 { \u0026#34;id\u0026#34; : 42, \u0026#34;first_name\u0026#34; : \u0026#34;Barry\u0026#34;, \u0026#34;last_name\u0026#34; : \u0026#34;Wilson\u0026#34;, \u0026#34;emails\u0026#34; : [\u0026#34;barry@example.com\u0026#34;, \u0026#34;bwilson@example.com\u0026#34;] } The change event is fully self-contained. It describes the complete state of the record at the point in time when it was altered, specifically, the record’s new state after the modification. 
Many CDC solutions expose the old and the new state (sometimes referred to as old and new \u0026#34;row image\u0026#34;) of a modified record in their change events, for instance named before and after in the case of Debezium:\n{ \u0026#34;before\u0026#34;: { \u0026#34;id\u0026#34; : 42, \u0026#34;first_name\u0026#34; : \u0026#34;Billy\u0026#34;, \u0026#34;last_name\u0026#34; : \u0026#34;Wilson\u0026#34;, \u0026#34;emails\u0026#34; : [\u0026#34;barry@example.com\u0026#34;, \u0026#34;bwilson@example.com\u0026#34;] }, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34; : 42, \u0026#34;first_name\u0026#34; : \u0026#34;Barry\u0026#34;, \u0026#34;last_name\u0026#34; : \u0026#34;Wilson\u0026#34;, \u0026#34;emails\u0026#34; : [\u0026#34;barry@example.com\u0026#34;, \u0026#34;bwilson@example.com\u0026#34;] } } Which parts are present in an event depends on the kind of data change:\nfor an event representing the insertion of a record, only the new row image is present,\nfor an update, both old and new are present, and\nfor a delete event, only the old row image in the before block is present.\nWhether the old row image actually is present in update and delete events also depends on the configuration of the source database. Typically, the retention of old row images must be explicitly enabled, as it comes at the cost of additional disk space consumption by the database system. As an example, in order to emit the old row version in change events with Postgres, the table’s replica identity must be set to FULL.\nData Change Events and Apache Kafka\nWhen transferring data change events via partitioned systems such as Apache Kafka, you also need to define a key for your messages. It defines which partition of a change event topic a record will be sent to, ensuring correct ordering of all the records with the same key. For data change events, the key should be derived from the primary key of the represented record in the source data store. 
That way, all change events for one record will go into one and the same partition of the corresponding change event topic, and consumers will receive them in the exact same order as they occurred in the source database.\nDelta Events Let’s look at delta events, or partial change events, next. They don’t contain the full state of the represented record, but only those columns or fields whose value actually changed as well as the record’s id. In other words, they describe exactly what has changed compared to the previous version of a record (but nothing more). For an event representing an insert operation, these are all the record’s attributes, and for an update operation just the mutated ones. For a delete, only the id will be present.\nPartial change events can be designed in two different ways. The first one is to emit any modified attribute in a change event. Let’s consider the example from the previous section again: the first name of customer 42 gets modified, while last name and email addresses remain unchanged. Using JSON notation again, and just focusing on the new row image, the corresponding change event could look like this:\n1 2 3 4 { \u0026#34;id\u0026#34; : 42, \u0026#34;first_name\u0026#34; : \u0026#34;Barry\u0026#34; } Depending on the chosen serialization format, there are some subtleties around the handling of null values. In particular, it must allow you to differentiate between an (optional) attribute being set to null and an attribute not being mutated at all. In JSON, you could distinguish between these two cases by emitting a null value for the field vs. omitting it from the event payload.\nA second option for partial data change events is to describe which operations were applied to which attributes specifically. This can come in handy in particular when dealing with array-valued attributes. 
For updates, if the change event format contains the full new value, a small change can cause write amplification, for instance when adding or removing a single element to/from an array with twenty entries. Formats such as JSON Patch are useful here, as they allow you to describe the changes on a more fine-grained basis:\n{ \u0026#34;id\u0026#34; : 42, \u0026#34;patch\u0026#34; : [ { \u0026#34;op\u0026#34;: \u0026#34;replace\u0026#34;, \u0026#34;path\u0026#34;: \u0026#34;/first_name\u0026#34;, \u0026#34;value\u0026#34;: \u0026#34;Barry\u0026#34; }, { \u0026#34;op\u0026#34;: \u0026#34;add\u0026#34;, \u0026#34;path\u0026#34;: \u0026#34;/emails/-\u0026#34;, \u0026#34;value\u0026#34;: \u0026#34;berry@example.com\u0026#34; } ] } Unlike full events, delta data change events are not fully self-contained. When receiving a partial update event, an event consumer must be able to access the previous state of that record in order to be able to apply that patch event. If for instance the consuming system is a SQL database, an UPDATE statement could be issued for updating the affected columns.\nBut what to do when a sink data system does not support partial updates and instead always requires ingesting the complete record when an update has happened? For cases like this, stateful stream processing, for instance using Apache Flink, can be a useful option. You’d put this stream processor between your event source and sink, and it would \u0026#34;re-hydrate\u0026#34; full events, that is, apply all the incoming partial events one after the other. To do so, it would utilize an internal state store (such as RocksDB, in case of Flink). 
When processing an insert change event for a record, this event would be put into the state store before being emitted downstream.\nLater on, when processing update events, the stream processor can obtain any attribute values absent from incoming partial events from the state store, thus only ever exposing full events to the downstream event consumers. While a similar read-before-write approach could also be implemented within sink data stores, doing it in a stream processing pipeline allows you to build the re-hydration logic once and then let multiple sinks benefit from it.\nThis technique can also come in handy for situations where a CDC system emits full data change events most of the time, but may emit partial events in certain cases. One example is Debezium’s connector for Postgres, which doesn’t emit the value for TOAST columns if their value hasn’t changed. Stateful stream processing as described above can help to shield consumers from this behavior and always expose complete events to event consumers.\nId-only Events The last and most basic form of data change events are id-only events. They merely describe which record in the source database was affected by a change. For this purpose, all that the event must contain is the id of the record (for instance, the primary key value of a row in an RDBMS):\n{ \u0026#34;id\u0026#34; : 42 } Id-only events are used in contexts other than databases and CDC in the strict sense of the word, too. One example is Amazon S3 event notifications, which you can use for subscribing to changes occurring in an S3 bucket, such as the addition or removal of files. The id-only event style is used here as it would not be practical to expose the entire file state in the corresponding change events.\nBy its very nature, such an id-only event doesn’t tell you what exactly has changed about the represented record. This makes this event type only useful for quite a narrow range of applications. 
For instance, you could use it to invalidate items in a cache, but you couldn’t use it by itself to update a cache. Examples of systems working with id-only events include the Change Tracking feature of Microsoft SQL Server, the \u0026#34;key only\u0026#34; mode of CockroachDB, and the KEYS_ONLY stream view type in DynamoDB.\nIf you’d like to obtain the entire row, you don’t have any other choice than re-selecting it from the source store. This could be done by change event consumers themselves, but also by a stream processor which then emits full change events to downstream consumers. There are a few things to consider when doing this.\nMost importantly, CDC tools emit change events asynchronously, which means that by the time you run a query for obtaining the complete row state, that row may already have been mutated again. The query will return the current state of the row, not the one which was valid at the point in time when the change event was triggered originally. If there are multiple changes to the row in close temporal proximity, you may not be able to extract all the intermediary versions of that record. The following visual illustrates this situation:\nAt time t1, the record with id 4 is inserted into the customers table, and the corresponding id-only change event is emitted. Shortly thereafter, the record gets updated at time t2, changing the first name of the customer from \u0026#34;Jazmine\u0026#34; to \u0026#34;Melissa\u0026#34;. At time t3, the change event for the original insert operation gets processed, issuing a re-select of the record from the database. 
As the update already has been committed at this point, the stream processor will propagate the state as of after the update, rather than its original version at insertion time.\nIt may even be possible that the record has been deleted since then, which means that its state cannot be reconstructed by querying for it.\nAn exception are databases which allow for point-in-time queries, provided the change event contains an unambiguous timestamp or log position describing when the event occurred. In that case, you could use that information to retrieve the right version, for instance using an AS OF SCN query in Oracle.\nEvents re-hydrated that way can still be very useful, for instance for propagating data changes into a full-text search engine; in general, you’ll be fine there with just having the latest version of a record in the index, and you don’t need to apply all intermediary updates occurring in a short period of time. On the other hand, if you’re using CDC for tracking the state transitions of a purchase order and triggering corresponding downstream actions, or for maintaining an audit log, then it is vital to keep track of each and every data change and this technique would not be useful.\nWhen implementing a re-select strategy, you should consider retrieving multiple records at once. So, for example, when receiving change events for ten customer records, instead of executing ten queries for retrieving them one-by-one, you might batch them into one single query, significantly reducing the load on the source database. Another interesting option is to not only retrieve the specific record itself, but instead to retrieve an entire aggregate of data. 
When receiving that id-only event for customer 42, you might for instance run a query which retrieves the customer data as well as their address information and bank account details by joining all the relevant tables.\nBefore comparing the three types of data change events and discussing their individual pros and cons, there’s another concern which deserves attention, and this is event metadata, i.e. data which describes contextual information for an event.\nChange Event Metadata Besides the actual change event payload, representing the data change itself, it often is useful to have additional metadata for an event. This typically includes:\nThe type of a change (insert, update, delete)\nTimestamp when the event occurred\nName of the originating database, schema, and table\nTransaction id\nPosition of the event in the source database’s transaction log\nThe query triggering a change\nAs an example, here’s an update event as emitted by the Debezium connector for Postgres, with a range of event metadata in the ts_ms, op, and source fields (you’d find similar metadata in the events emitted by other CDC tools such as Maxwell’s Daemon):\n{ \u0026#34;before\u0026#34;: { \u0026#34;id\u0026#34;: 1004, \u0026#34;first_name\u0026#34;: \u0026#34;Billy\u0026#34;, \u0026#34;last_name\u0026#34;: \u0026#34;Wilson\u0026#34;, \u0026#34;email\u0026#34;: \u0026#34;bwilson@example.com\u0026#34; }, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 1004, \u0026#34;first_name\u0026#34;: \u0026#34;Barry\u0026#34;, \u0026#34;last_name\u0026#34;: \u0026#34;Wilson\u0026#34;, \u0026#34;email\u0026#34;: \u0026#34;bwilson@example.com\u0026#34; }, \u0026#34;source\u0026#34;: { \u0026#34;version\u0026#34;: \u0026#34;2.5.0.Final\u0026#34;, \u0026#34;connector\u0026#34;: \u0026#34;postgresql\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;dbserver1\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1705663711187, 
\u0026#34;snapshot\u0026#34;: \u0026#34;false\u0026#34;, \u0026#34;db\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;sequence\u0026#34;: \u0026#34;[\\\u0026#34;34471328\\\u0026#34;,\\\u0026#34;34494376\\\u0026#34;]\u0026#34;, \u0026#34;schema\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;table\u0026#34;: \u0026#34;customers\u0026#34;, \u0026#34;txId\u0026#34;: 773, \u0026#34;lsn\u0026#34;: 34494376, \u0026#34;xmin\u0026#34;: null }, \u0026#34;op\u0026#34;: \u0026#34;u\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1705663711220 } Change event metadata allows for a number of interesting applications on the consumer side. For instance, the information about which transaction an event originated from can be used for propagating the same transactional semantics to the sink of a data pipeline too: instead of ingesting incoming events one by one, you could buffer the events for one transaction and apply them all at once in a transaction to the sink data store. That way, queries against the sink data store are subject to the same isolation guarantees as with the source database. Another interesting metadata field is the sequence attribute emitted by Debezium’s Postgres connector, which can be used by clients for deduplication in data pipelines with at-least-once semantics.\nComparison Having explored the three kinds of data change events, which one should you use? As so often, there is no universal answer to that. Each of the types has its respective advantages and disadvantages and you’ll need to make an informed decision based on your specific context.\nComplete data change events tend to be the easiest to handle for consuming systems. The incoming event can be simply written to a sink data store using \u0026#34;upsert\u0026#34; semantics, overwriting whatever version there might have been there before. When propagating change events via distributed log systems such as Apache Kafka, a topic with full change events can be compacted. 
As each event is fully self-contained, it is sufficient to keep the latest change event per record in the log and it is still possible to propagate the complete state of the data set to consumers (and if your change events contain new and old row images, you’d even have the last two versions per record even in a compacted change event topic). It is also easily possible to bootstrap new event consumers solely from the state in the distributed log. The downside of full events is their larger size.\nId-only events are much more compact, as they don’t convey any information other than the id of changed records. In order to retrieve the actual event state, you’ll need to query the source system again, which comes at the risk that you may miss any intermediary updates to a record occurring between the point in time when a change event was triggered and when you process it. As such, their use is rather limited, but they can come in handy for some use cases such as cache invalidation.\nDelta events can be an interesting middle ground. Conveying only the modified fields of a changed record, they consume less space than a full event. But in order to propagate them into a sink datastore, it must have the capability to do partial updates, i.e. the ability to just update a subset of a record’s fields, rather than having to rewrite the entire record. If that is not an option, you can use a stateful stream processing pipeline between the CDC tool and the sink datastore to recreate full events. A change event topic with delta events cannot be compacted, as consumers otherwise may miss update events they need to recreate the represented source record. 
As a consequence, when there is a high volume of updates, a topic with partial events may even consume more space than a (compacted) topic with full events.\nThe following table provides an overview of the three change event types and their specific properties:\nAs the comparison shows, each of the different event types has its individual characteristics. Which one to use depends not only on the capabilities of the systems you’re working with, but also, as always, on the specific use case and its requirements.\nReal-time stream processing with solutions such as Apache Flink is a powerful companion to CDC tools like Debezium, allowing you to transform and amend change event streams if and when needed. Examples include the expansion of id-only change events (by selecting the entire row state from the source database) as well as the hydration of full events from delta change events by using a state store. But you also can employ stream processing to great effect for other CDC-related tasks, for instance for establishing stable data contracts for your change event streams, as I’ve discussed in another blog post not too long ago.\n","id":84,"publicationdate":"Mar 13, 2024","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_full_events\"\u003eFull Events\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_delta_events\"\u003eDelta Events\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_id_only_events\"\u003eId-only Events\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_change_event_metadata\"\u003eChange Event Metadata\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_comparison\"\u003eComparison\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca 
href=\"https://www.decodable.co/blog/taxonomy-of-data-change-events\"\u003eDecodable blog\u003c/a\u003e. All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eData change events are at the core of Change Data Capture (CDC) solutions such as \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e.\nThey describe the changes made to a specific record in a database and allow event consumers to take action based on this information, enabling a wide range of \u003ca href=\"/blog/cdc-use-cases\"\u003euse cases\u003c/a\u003e, such as real-time ETL (by propagating the updated data into downstream data stores such as data warehouses, analytics databases, or fulltext search indexes), microservices data exchange, or audit logging.\u003c/p\u003e\n\u003c/div\u003e","tags":["cdc","debezium","streaming"],"title":"A Taxonomy Of Data Change Events","uri":"https://www.morling.dev/blog/taxonomy-of-data-change-events/"},{"content":"","id":85,"publicationdate":"Feb 20, 2024","section":"tags","summary":"","tags":null,"title":"database-design","uri":"https://www.morling.dev/tags/database-design/"},{"content":" In many applications it’s a requirement to keep track of when a record was created and updated the last time. Often, this is implemented by having columns such as created_at and updated_at within each table. To make things as simple as possible for application developers, the database itself should take care of maintaining these values automatically when a record gets inserted or updated.\nFor the creation timestamp, that’s as simple as specifying a column default value of current_timestamp. When omitting the value from an INSERT statement, the field will be populated automatically with the current timestamp. What about the update timestamp though?\nSolely relying on the default value won’t cut it, as the field already has a value when a row gets updated. 
You also shouldn’t set the value from within your application code. Otherwise, create and update timestamps would have different sources, potentially leading to anomalies if there are clock differences between application and database server, such as a row’s created_at timestamp being later than its updated_at timestamp.\nFor MySQL, the ON UPDATE clause can be used to set the current timestamp when a row gets updated. Postgres does not support this feature, unfortunately. If you search for a solution, most folks suggest defining an ON UPDATE trigger for setting the update timestamp. This also is what I’d have done until recently; it works, but having to declare such a trigger for every table can quickly become a bit cumbersome.\nBut as I’ve just learned from a colleague, there’s actually a much simpler solution: Postgres lets you explicitly set a field’s value to its default value when updating a row. So given this table and row:\nCREATE TABLE movie ( id SERIAL NOT NULL, title TEXT, viewer_rating NUMERIC(2, 1), created_at TIMESTAMP NOT NULL DEFAULT current_timestamp, updated_at TIMESTAMP NOT NULL DEFAULT current_timestamp ); INSERT INTO movie (title, viewer_rating) VALUES (\u0026#39;North by Northwest\u0026#39;, 9.2); Then auto-updating the updated_at field is as simple as this:\nUPDATE movie SET viewer_rating = 9.6, updated_at = DEFAULT WHERE id = 1; The value will be retrieved by the database when executing the statement, so there is no potential for inconsistencies with the created_at value. It is not quite as elegant as MySQL’s ON UPDATE, as you must make sure to set the value to DEFAULT in each UPDATE statement your application issues. But it is pretty handy nevertheless, and certainly more convenient than defining triggers for all tables. 
If you need to retrieve the value from within your application as well, you simply can expose it using the RETURNING clause:\nUPDATE movie SET viewer_rating = 9.6, updated_at = DEFAULT WHERE id = 1 RETURNING updated_at; If you want to play with this example by yourself, you can find it here on DB Fiddle.\n","id":86,"publicationdate":"Feb 20, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn many applications it’s a requirement to keep track of when a record was created and updated the last time.\nOften, this is implemented by having columns such as \u003ccode\u003ecreated_at\u003c/code\u003e and \u003ccode\u003eupdated_at\u003c/code\u003e within each table.\nTo make things as simple as possible for application developers,\nthe database itself should take care of maintaining these values automatically when a record gets inserted or updated.\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","database-design"],"title":"Last Updated Columns With Postgres","uri":"https://www.morling.dev/blog/last-updated-columns-with-postgres/"},{"content":" Recently I ran into a situation where it was necessary to capture the output of a Java process on the stdout stream, and at the same time a filtered subset of the output in a log file. The former, so that the output gets picked up by the Kubernetes logging infrastructure. The latter for further processing on our end: we were looking to detect when the JVM stops due to an OutOfMemoryError, passing on that information to some error classifier.\nSimply redirecting the standard output stream of the process to a file wouldn’t satisfy the first requirement. 
Instead, the tee command offers a solution: it reads from stdin and writes everything to stdout as well as a file:\n$ java -XX:+ExitOnOutOfMemoryError -jar my-app.jar \\ | tee my-app.log (1) 1 Pipe stdout to tee, which writes it to both stdout and a log file This kinda gives us what we want, but we lack control over the size of that log file. As is, it can grow indefinitely, eventually causing the application’s pod to run out of disk space. For the case at hand, we’re just interested in specific lines anyways. So ideally the content written to the log file would be filtered accordingly, while exposing the complete output to the Kubernetes log collector via stdout.\nTo accommodate that requirement, process substitution can be used. In a nutshell, it provides a bridge between the standard input and output streams and files:\n\u0026gt;(commands) substitutes a file a process writes to with another process which receives the written content on stdin\n\u0026lt;(commands) substitutes a file a process reads from with another process which provides the content on stdout\nNote that there must be no space between \u0026gt;/\u0026lt; and the left parenthesis. That is, this is not a redirection. 
The former variant is exactly what we need: instead of directly writing all the process output to the log file, we use grep to filter any written content, based on the string we’re looking for:\n$ java -XX:+ExitOnOutOfMemoryError -jar my-app.jar \\ | tee \u0026gt;(grep \u0026#39;OutOfMemoryError\u0026#39; \u0026gt; my-app.log) (1) 1 Represent the stdin of grep as a file for tee to write to That way, the complete stdout of our process gets exposed to Kubernetes\u0026#39; logging infrastructure, while only the filtered output gets written to our log file:\n$ cat my-app.log Terminating due to java.lang.OutOfMemoryError: Java heap space To get a better intuition of what process substitution does under the hood, let’s create a simple Java program which reads from a file specified as a program argument:\nimport java.nio.charset.Charset; import java.nio.file.Files; import java.nio.file.Paths; public void main(String... args) throws Exception { var fileName = args[0]; System.out.println(\u0026#34;File: \u0026#34; + fileName); (1) String content = Files.readString( (2) Paths.get(fileName), Charset.defaultCharset() ); System.out.println(\u0026#34;Content: \u0026#34; + content); } 1 Print the passed file name 2 Read the content of the file Here’s the program’s output when using process substitution for exposing the stdout of echo:\n$ java --enable-preview --source 21 read_file.java \u0026lt;(echo \u0026#34;hello\u0026#34;) File: /dev/fd/11 Content: hello /dev/fd is a special directory which contains a file descriptor for each file opened by a process. So what is /dev/fd/11 then? Most implementations of process substitution represent stdin/stdout via anonymous pipes. If we take a look at the list of files opened by the process, we can see that this is the case here too:\n$ lsof -p 99657 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ... 
java 99657 gunnar 11 PIPE 0xc2e0b19eaf172929 16384 FD 11 is a pipe created through process substitution, and the standard Java file I/O APIs can be used to read its contents via that descriptor.\n","id":87,"publicationdate":"Feb 10, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eRecently I ran into a situation where it was necessary to capture the output of a Java process on the \u003ccode\u003estdout\u003c/code\u003e stream,\nand at the same time a filtered subset of the output in a log file.\nThe former, so that the output gets picked up by the Kubernetes logging infrastructure.\nThe latter for further processing on our end:\nwe were looking to detect when the JVM stops due to an \u003ccode\u003eOutOfMemoryError\u003c/code\u003e, passing on that information to some error classifier.\u003c/p\u003e\n\u003c/div\u003e","tags":["shell","linux","logging"],"title":"Filtering Process Output With tee","uri":"https://www.morling.dev/blog/filtering-process-output-with-tee/"},{"content":"","id":88,"publicationdate":"Feb 10, 2024","section":"tags","summary":"","tags":null,"title":"linux","uri":"https://www.morling.dev/tags/linux/"},{"content":"","id":89,"publicationdate":"Feb 10, 2024","section":"tags","summary":"","tags":null,"title":"logging","uri":"https://www.morling.dev/tags/logging/"},{"content":" Table of Contents Results Bonus Result: 32 Cores, 64 Threads Bonus Result: 10K Key Set Thank You! Which Challenge Will Be Next? Oh what a wild ride the last few weeks have been. The One Billion Row Challenge (1BRC for short), something I had expected to be interesting to a dozen folks or so at best, has gone kinda viral, with hundreds of people competing and engaging. In Java, as intended, but also beyond: folks implemented the challenge in languages such as Go, Rust, C/C++, C#, Fortran, or Erlang, as well as databases (Postgres, Oracle, Snowflake, etc.), and tools like awk.\nIt’s really incredible how far people have pushed the limits here. 
Pull request by pull request, the execution times for solving the problem laid out in the challenge (aggregating random temperature values from a file with 1,000,000,000 rows) improved by two orders of magnitude in comparison to the initial baseline implementation. Today I am happy to share the final results, as the challenge closed for new entries after exactly one month on Jan 31 and all submissions have been reviewed.\nResults So without further ado, here are the top 10 entries for the official 1BRC competition. These results are from running on eight cores of a 32 core AMD EPYC™ 7502P (Zen2) machine:\n# Result (m:s.ms) Implementation JDK Submitter 1\n00:01.535\nlink\n21.0.2-graal\nThomas Wuerthinger, Quan Anh Mai, Alfonso² Peterssen\n2\n00:01.587\nlink\n21.0.2-graal\nArtsiom Korzun\n3\n00:01.608\nlink\n21.0.2-graal\nJaromir Hamala\n00:01.880\nlink\n21.0.1-open\nSerkan Özal\n00:01.921\nlink\n21.0.2-graal\nVan Phu DO\n00:02.018\nlink\n21.0.2-graal\nStephen Von Worley\n00:02.157\nlink\n21.0.2-graal\nRoy van Rijn\n00:02.319\nlink\n21.0.2-graal\nYavuz Tas\n00:02.332\nlink\n21.0.2-graal\nMarko Topolnik\n00:02.367\nlink\n21.0.1-open\nQuan Anh Mai\nYou can find the complete result list with all 164 submissions as well as the description of the evaluation process in the 1BRC repository.\nCongratulations to the implementers of the top three entries (Thomas Wuerthinger/Quan Anh Mai/Alfonso² Peterssen, Artsiom Korzun, and Jaromir Hamala), as well as everyone else on the leaderboard for putting in the effort to participate in this challenge! For the fun of it, and as a small expression of my appreciation, I have created a personalized \u0026#34;certificate\u0026#34; PDF for each accepted submission, stating the author’s name and time. You can find it at your entry in the leaderboard table.\nInitially I had meant to pay for a 1️⃣🐝🏎️ t-shirt for the winner out of my own pocket. 
But then I remembered I have a company credit card ;) So I will actually do t-shirts for the Top 3 and a 1BRC coffee mug for the Top 20. I will follow up with the winners on the details of getting these to you shortly. Thanks a lot to my employer Decodable (we build a SaaS for real-time ETL and stream processing, you should totally check us out!) for sponsoring not only these prizes, but also the evaluation machine. It means a lot to me!\nI am planning to dive into some of the implementation details in another blog post, there is so much to talk about: segmentation and parallelization, SIMD and SWAR, avoiding branch mispredictions and spilling, making sure the processor’s pipelines are always fully utilized, the \u0026#34;process forking\u0026#34; trick, and so much more. For now let me just touch on two things which stick out when looking at the results. One is that all the entries in the Top 10 are using Java’s notorious Unsafe class for faster yet unsafe memory access. Planned to be removed in a future version, it will be interesting to see which replacement APIs will be provided in the JDK for ensuring performance-sensitive applications like 1BRC don’t suffer.\nAnother noteworthy aspect is that with two exceptions all entries in the Top 10 are using GraalVM to produce a native binary. These are faster to start and reach peak performance very quickly (no JIT compilation). As the result times got down to less than two seconds, this makes the deciding difference. Note that other entries of the contest also use GraalVM as a JIT compiler for JVM-based entries, which also was beneficial for the problem at hand. This is a perfect example for the kind of insight I was hoping to gain from 1BRC. A special shout-out to Serkan Özal for creating the fastest JVM-based solution, coming in on a great fourth place!\nBonus Result: 32 Cores, 64 Threads For officially evaluating entries into the challenge, each contender was run on eight cores of the target machine. 
This was done primarily to keep results somewhat in the same ballpark as the figures of the originally used machine (I had to move to a different environment after a little while, re-evaluating all the previous entries).\nBut it would be a pity to leave all the 24 other cores unused, right? So I ran the fastest 50 entries from the regular evaluation on all 32 cores / 64 threads (i.e. SMT is enabled) of the machine, with turbo boost enabled too, and here is the Top 10 from this evaluation (the complete result set for this evaluation can be found here):\n# Result (m:s.ms) Implementation JDK Submitter 1\n00:00.323\nlink\n21.0.2-graal\nJaromir Hamala\n2\n00:00.326\nlink\n21.0.2-graal\nThomas Wuerthinger, Quan Anh Mai, Alfonso² Peterssen\n3\n00:00.349\nlink\n21.0.2-graal\nArtsiom Korzun\n00:00.351\nlink\n21.0.2-graal\nVan Phu DO\n00:00.389\nlink\n21.0.2-graal\nStephen Von Worley\n00:00.408\nlink\n21.0.2-graal\nYavuz Tas\n00:00.415\nlink\n21.0.2-graal\nRoy van Rijn\n00:00.499\nlink\n21.0.2-graal\nMarko Topolnik\n00:00.602\nlink\n21.0.1-graal\nRoman Musin\n00:00.623\nlink\n21.0.1-open\ngonix\nThe fastest one coming in here is Jaromir Hamala, whose entry seems to take a tad more advantage of the increased level of parallelism. I’ve run this benchmark a handful of times, and the times and ordering remained stable, so I feel comfortable about publishing these results, albeit being very, very close. Congrats, Jaromir!\nBonus Result: 10K Key Set One thing which I didn’t expect to happen was that folks would optimize that much for the specific key set used by the example data generator I had provided. While the rules allow for 10,000 different weather station names with a length of up to 100 bytes, the key set used during evaluation contained only 413 distinct names, with most of them being shorter than 16 bytes. 
This fact heavily impacted implementation strategies, for instance when it comes to parsing rows of the file, or choosing hash functions which work particularly well for aggregating values for those 413 names.\nSo some folks asked for another evaluation using a data set which contains a larger variety of station names (kudos to Marko Topolnik who made a strong push here). I didn’t want to change the nature of the original task after folks had already entered their submissions, but another bonus evaluation with 10K keys and longer names seemed like a great idea. Here are the top 10 results from running the fastest 40 entries of the regular evaluation against this data set (all results are here):\n# Result (m:s.ms) Implementation JDK Submitter 1\n00:02.957\nlink\n21.0.2-graal\nArtsiom Korzun\n2\n00:03.058\nlink\n21.0.2-graal\nMarko Topolnik\n3\n00:03.186\nlink\n21.0.2-graal\nStephen Von Worley\n00:03.998\nlink\n21.0.2-graal\nRoy van Rijn\n00:04.042\nlink\n21.0.2-graal\nJaromir Hamala\n00:04.289\nlink\n21.0.1-open\ngonix\n00:04.522\nlink\n21.0.2-graal\ntivrfoa\n00:04.653\nlink\n21.0.2-graal\nJamal Mulla\n00:04.733\nlink\n21.0.1-open\ngonix\n00:04.836\nlink\n21.0.1-graal\nSubrahmanyam\nThis evaluation shows some interesting differences to the other ones. There are some new entries to this Top 10, while some entries from the original Top 10 do somewhat worse for the 10K key set, solely due to the fact that they have been so highly optimized for the 413 stations key set. Congrats to Artsiom Korzun, whose entry is not only the fastest one in this evaluation, but who also is the only contender to be in the Top 3 for all the different evaluations!\nThank You! The goal of 1BRC was to be an opportunity to learn something new, inspire others to do the same, and have some fun along the way. This was certainly the case for me, and I think for participants too. 
It was just great to see how folks kept working on their submissions, trying out new approaches and techniques, helping each other to improve their implementations, and even teaming up to create joint entries. I feel the decision to allow participants to take inspiration from each other and adopt promising techniques explored by others was absolutely the right one, aiding with the \u0026#34;learning\u0026#34; theme of the challenge.\nI’d like to extend my gratitude to everyone who took part in the challenge: Running 1BRC over this month and getting to experience where the community would go with this has been nothing but absolutely amazing. This would not have been possible without all the folks who stepped up to help organize the challenge, be it by creating and extending a test suite for verifying correctness of challenge submissions, setting up and configuring the evaluation machine, or by building the infrastructure for running the benchmark and maintaining the leaderboard. A big shout-out to Alexander Yastrebov, Rene Schwietzke, Jason Nochlin, Marko Topolnik, and everyone else involved!\nA few people have asked for stats around the challenge, so here are some:\n587 integrated pull requests, 164 submissions\n61 discussions, including an amazing \u0026#34;Show \u0026amp; Tell\u0026#34; section where folks show-case their non-Java based solutions\n1.1K forks of the project\n3K star-gazers of the project, with the fastest growth in the second week of January\n1,909 workflow runs on GitHub Actions (it would have been way more, had I set up an action for running the test suite against incoming pull requests earlier, doh)\n187 lines of comment in the entry of Aleksey Shipilëv\n188x speed-up improvement between the baseline implementation and the winning entry\n~100 consumed cups of coffee while evaluating the entries\nLastly, here are some more external resources on 1BRC, either on the challenge itself or folks sharing their insights from building a solution (see 
here for a longer list of blog posts and videos):\nCliff Click discussing his 1BRC solution on the Coffee Compiler Club (video)\nThe One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds (interview by Olimpiu Pop)\nOne Billion Row Challenge (blog post by Ragnar Groot Koerkamp)\nWhich Challenge Will Be Next? Java is alive and kicking! 1BRC has shown that Java and its runtime are powerful and highly versatile tools, suitable also for tasks where performance is of utmost importance. Apart from the tech itself, the most amazing thing about Java is its community though: it sparked a tremendous level of joy to witness how folks came together to solve this challenge, learning with and from each other, sharing tricks, and making this an excellent experience all-around.\nSo I guess it’s just natural that some folks asked whether there’d be another challenge like this any time soon, when it is going to happen, what it will be about, etc. Someone even stated they’d take some time off in January next year to fully focus on the challenge :)\nI think for now it’s a bit too early to tell what could be next and I’ll definitely need a break from running a challenge. But if a team came together to organize something like 1BRC next year, with a strong focus on running things in an automated way as much as possible, I could absolutely see this. The key challenge (sic!) will be to find a topic which is as approachable as this year’s task, while providing enough opportunity for exploration and optimization. 
I am sure the community will manage to come up with something here.\nFor now, congrats once again to everyone participating this time around, and a big thank you to everyone helping to make it a reality!\n1️⃣🐝🏎️ ","id":90,"publicationdate":"Feb 4, 2024","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_results\"\u003eResults\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_bonus_result_32_cores_64_threads\"\u003eBonus Result: 32 Cores, 64 Threads\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_bonus_result_10k_key_set\"\u003eBonus Result: 10K Key Set\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_thank_you\"\u003eThank You!\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_which_challenge_will_be_next\"\u003eWhich Challenge Will Be Next?\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOh what a wild ride the last few weeks have been.\nThe \u003ca href=\"/blog/one-billion-row-challenge/\"\u003eOne Billion Row Challenge\u003c/a\u003e (1BRC for short),\nsomething I had expected to be interesting to a dozen folks or so at best,\nhas gone kinda viral, with hundreds of people competing and engaging.\nIn Java, as intended, but also \u003ca href=\"https://github.com/gunnarmorling/1brc/discussions/categories/show-and-tell\"\u003ebeyond\u003c/a\u003e:\nfolks implemented the challenge in languages such as Go, Rust, C/C++, C#, Fortran, or Erlang, as well as databases (Postgres, Oracle, Snowflake, etc.), and tools like awk.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt’s really incredible how far people have pushed the limits here.\nPull request by pull request, the execution times for solving the problem laid out in the challenge — aggregating random temperature values from 
a file with 1,000,000,000 rows — improved by two orders of magnitude in comparison to the initial baseline implementation.\nToday I am happy to share the final results, as the challenge closed for new entries after exactly one month on Jan 31\nand all submissions have been reviewed.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","performance","challenge"],"title":"1BRC—The Results Are In!","uri":"https://www.morling.dev/blog/1brc-results-are-in/"},{"content":"","id":91,"publicationdate":"Feb 4, 2024","section":"tags","summary":"","tags":null,"title":"challenge","uri":"https://www.morling.dev/tags/challenge/"},{"content":" Update Jan 4: Wow, this thing really took off! 1BRC is discussed at a couple of places on the internet, including Hacker News, lobste.rs, and Reddit.\nFor folks to show-case non-Java solutions, there is a \u0026#34;Show \u0026amp; Tell\u0026#34; now, check that one out for 1BRC implementations in Rust, Go, C++, and others. Some interesting related write-ups include 1BRC in SQL with DuckDB by Robin Moffatt and 1 billion rows challenge in PostgreSQL and ClickHouse by Francesco Tisiot.\nThanks a lot for all the submissions, this is going way beyond what I’d have expected! I am behind a bit with evaluations due to the sheer amount of entries, I will work through them bit by bit. I have also made a few clarifications to the rules of the challenge; please make sure to read them before submitting any entries.\nLet’s kick off 2024 true coder style—​I’m excited to announce the One Billion Row Challenge (1BRC), running from Jan 1 until Jan 31.\nYour mission, should you decide to accept it, is deceptively simple: write a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station. There’s just one caveat: the file has 1,000,000,000 rows!\nThe text file has a simple structure with one measurement value per row:\nHamburg;12.0 Bulawayo;8.9 Palembang;38.8 St. 
John\u0026#39;s;15.2 Cracow;12.6 ... The program should print out the min, mean, and max values per station, alphabetically ordered like so:\n1 {Abha=5.0/18.0/27.4, Abidjan=15.7/26.0/34.1, Abéché=12.1/29.4/35.6, Accra=14.7/26.4/33.1, Addis Ababa=2.1/16.0/24.3, Adelaide=4.1/17.3/29.7, ...} The goal of the 1BRC challenge is to create the fastest implementation for this task, and while doing so, explore the benefits of modern Java and find out how far you can push this platform. So grab all your (virtual) threads, reach out to the Vector API and SIMD, optimize your GC, leverage AOT compilation, or pull any other trick you can think of.\nThere’s a few simple rules of engagement for 1BRC (see here for more details):\nAny submission must be written in Java\nAny Java distribution available through SDKMan as well as early access builds from openjdk.net may be used, including EA builds for OpenJDK projects like Valhalla\nNo external dependencies may be used\nTo enter the challenge, clone the 1brc repository from GitHub and follow the instructions in the README file. There is a very basic implementation of the task which you can use as a baseline for comparisons and to make sure that your own implementation emits the correct result. Once you’re satisfied with your work, open a pull request against the upstream repo to submit your implementation to the challenge.\nAll submissions will be evaluated by running the program on a Hetzner Cloud CCX33 instance (8 dedicated vCPU, 32 GB RAM). The time program is used for measuring execution times, i.e. end-to-end times are measured. Each contender will be run five times in a row. The slowest and the fastest runs are discarded. The mean value of the remaining three runs is the result for that contender and will be added to the leaderboard. 
If you have any questions or would like to discuss any potential 1BRC optimization techniques, please join the discussion in the GitHub repo.\nAs for a prize, by entering this challenge, you may learn something new, get to inspire others, and take pride in seeing your name listed in the scoreboard above. Rumor has it that the winner may receive a unique 1️⃣🐝🏎️ t-shirt, too.\nSo don’t wait, join this challenge, and find out how fast Java can be—​I’m really curious what the community will come up with for this one. Happy 2024, coder style!\n","id":92,"publicationdate":"Jan 1, 2024","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUpdate Jan 4: Wow, this thing really took off!\u003c/em\u003e\n\u003cem\u003e1BRC is discussed at a couple of places on the internet, including \u003ca href=\"https://news.ycombinator.com/item?id=38851337\"\u003eHacker News\u003c/a\u003e, \u003ca href=\"https://lobste.rs/s/u2qcnf/one_billion_row_challenge\"\u003elobste.rs\u003c/a\u003e, and \u003ca href=\"https://old.reddit.com/r/programming/comments/18x0x0u/the_one_billion_row_challenge/\"\u003eReddit\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eFor folks to show-case non-Java solutions, there is a \u003ca href=\"https://github.com/gunnarmorling/1brc/discussions/categories/show-and-tell\"\u003e\u0026#34;Show \u0026amp; Tell\u0026#34;\u003c/a\u003e now, check that one out for 1BRC implementations in Rust, Go, C++, and others.\u003c/em\u003e\n\u003cem\u003eSome interesting related write-ups include \u003ca href=\"https://rmoff.net/2024/01/03/1%EF%B8%8F%E2%83%A3%EF%B8%8F-1brc-in-sql-with-duckdb/\"\u003e1BRC in SQL with DuckDB\u003c/a\u003e by Robin Moffatt and \u003ca href=\"https://ftisiot.net/posts/1brows/\"\u003e1 billion rows challenge in PostgreSQL and ClickHouse\u003c/a\u003e by Francesco Tisiot.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv 
class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThanks a lot for all the submissions, this is going way beyond what I’d have expected!\u003c/em\u003e\n\u003cem\u003eI am behind a bit with evaluations due to the sheer amount of entries, I will work through them bit by bit.\u003c/em\u003e\n\u003cem\u003eI have also made a few clarifications to \u003ca href=\"https://github.com/gunnarmorling/1brc#faq\"\u003ethe rules\u003c/a\u003e of the challenge; please make sure to read them before submitting any entries.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLet’s kick off 2024 true coder style—​I’m excited to announce the \u003ca href=\"https://github.com/gunnarmorling/onebrc\"\u003eOne Billion Row Challenge\u003c/a\u003e (1BRC), running from Jan 1 until Jan 31.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eYour mission, should you decide to accept it, is deceptively simple:\nwrite a Java program for retrieving temperature measurement values from a text file and calculating the min, mean, and max temperature per weather station.\nThere’s just one caveat: the file has \u003cstrong\u003e1,000,000,000 rows\u003c/strong\u003e!\u003c/p\u003e\n\u003c/div\u003e","tags":["java","performance","challenge"],"title":"The One Billion Row Challenge","uri":"https://www.morling.dev/blog/one-billion-row-challenge/"},{"content":" Table of Contents Stand-By Logical Replication With Debezium Towards Fail-Over Slots Wrap-Up This post originally appeared on the Decodable blog. All rights reserved.\nWelcome back to this series about logical replication from Postgres 16 stand-by servers, in which we’ll discuss how to use this feature with Debezium—a popular open-source platform for Change Data Capture (CDC) for a wide range of databases—as well as how to manage logical replication in case of failover scenarios, i.e. 
a situation where your primary Postgres server becomes unavailable and a stand-by server needs to take over.\nIf you want to learn more about logical replication in general, and why and how to use it with stand-by servers in Postgres 16, then I suggest heading over to part one of this blog series before continuing here.\nStand-By Logical Replication With Debezium The beauty of Postgres logical replication is that not only can other Postgres instances serve as the consumer of a replication stream, but other clients can also subscribe to a replication slot for ingesting real-time change event feeds from Postgres. Debezium, offering CDC support for many databases, including Postgres, makes use of that for exposing change event streams via Apache Kafka, but also—via its Debezium Server component—to other kinds of messaging and streaming platforms such as AWS Kinesis, Apache Pulsar, NATS, and others. So let’s quickly test how to stream changes from a Postgres stand-by with Debezium.\nNote that we’ll need to use the latest Debezium release—2.5.0.Beta1, released last week—in order to stream changes from a stand-by server. When I first tested this, things didn’t quite work, as the connector made use of the function pg_current_wal_lsn() in order to obtain the current WAL position. This is only available on primary servers, though. So I took the opportunity to make my first little Debezium contribution for quite a while, changing it to invoke pg_last_wal_receive_lsn() instead when connecting against a stand-by. Thanks a lot to the team for the quick merge and inclusion in the Beta1 release!\nAs a playground for this experiment, I’ve created a simple Docker Compose file which launches Kafka and Kafka Connect as the runtime environment for Debezium. Here’s an overview of all the involved components:\nFigure 1. Solution Overview Fun fact: this uses Kafka in KRaft mode, i.e. no ZooKeeper process is needed. Good times! 
If you want to follow along, make sure you have Docker installed and you have a Postgres primary and stand-by node set up on Amazon RDS as described in part one. Then clone the decodable/examples repository and launch the demo environment like so:\n1 2 3 git clone git@github.com:decodableco/examples.git decodable-examples cd decodable-examples/postgres-logical-replication-standby docker compose up In order to use Debezium with Postgres on RDS, it is recommended to use the pgoutput logical decoding plug-in. It is the standard decoding plug-in, also used for logical replication to other Postgres instances. This plug-in requires a publication to be set up, which configures which kinds of changes should be published for which tables. Usually, Debezium will set up the publication—similar to the logical replication slot—automatically. Unfortunately, this is not supported when ingesting changes from a stand-by server, as publications (unlike replication slots) can only be created on a primary server. Debezium doesn’t know about the primary, so you’ll need to create that publication manually before setting up the connector:\n1 2 primary\u0026gt; CREATE PUBLICATION my_publication FOR ALL TABLES; CREATE PUBLICATION Having to create publications for stand-by replication slots on the primary seems somewhat inconsistent and it’s also not ideal in terms of operations, but there may be a good reason for that requirement.\nInstead of the ALL TABLES publication, you could also narrow this further down and only expose the changes for specific tables, omit certain columns (e.g. with PII data) or rows (e.g. logically deleted rows), and more. See the docs for the CREATE PUBLICATION command for more details.\nLet’s take a look at the connector configuration then. 
This is done via a JSON-based configuration file looking like this:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 { \u0026#34;name\u0026#34; : \u0026#34;inventory-connector\u0026#34;, \u0026#34;config\u0026#34; : { \u0026#34;connector.class\u0026#34; : \u0026#34;io.debezium.connector.postgresql.PostgresConnector\u0026#34;, \u0026#34;tasks.max\u0026#34; : \u0026#34;1\u0026#34;, \u0026#34;database.hostname\u0026#34; : \u0026#34;\u0026#34;, \u0026#34;database.port\u0026#34; : \u0026#34;5432\u0026#34;, \u0026#34;database.user\u0026#34; : \u0026#34;\u0026#34;, \u0026#34;database.password\u0026#34; : \u0026#34;\u0026#34;, \u0026#34;database.dbname\u0026#34; : \u0026#34;\u0026#34;, \u0026#34;topic.prefix\u0026#34; : \u0026#34;dbserver1\u0026#34;, \u0026#34;plugin.name\u0026#34; : \u0026#34;pgoutput\u0026#34;, \u0026#34;publication.name\u0026#34; : \u0026#34;my_publication\u0026#34;, \u0026#34;poll.interval.ms\u0026#34; : \u0026#34;100\u0026#34; } } Adjust database host, user name, password, and database name as needed when applying this to your own environment. To apply this configuration, the REST API of Kafka Connect can be invoked. But if you’re like me and tend to forget all the exact endpoint URLs, then take a look at kcctl 🧸 (yes, the teddy bear emoji is part of the name), which I am going to use in the following. It is a command line client for Kafka Connect, which makes it very easy to create connectors, restart and stop them, etc. 
Following the semantics of kubectl, a configuration file is applied like this:\n1 2 kcctl apply -f postgres-connector.json Created connector inventory-connector Let’s take a look at the connector and its status:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 kcctl describe connector inventory-connector Name: inventory-connector Type: source State: RUNNING Worker ID: 172.21.0.3:8083 Config: connector.class: io.debezium.connector.postgresql.PostgresConnector database.dbname: database.hostname: database.password: database.port: 5432 database.user: name: inventory-connector plugin.name: pgoutput poll.interval.ms: 100 publication.name: my_publication tasks.max: 1 topic.prefix: dbserver1 Tasks: 0: State: RUNNING Worker ID: 172.21.0.3:8083 Topics: dbserver1.public.some_data Having confirmed that the connector is running, let’s do a quick update in the primary database and examine the corresponding change events in Kafka, ingested from the stand-by instance:\n1 2 3 4 docker run --tty --rm \\ --network postgres-logical-replication-standby_default \\ quay.io/debezium/tooling:1.2 \\ kcat -b kafka:9092 -C -o beginning -q -t dbserver1.public.some_data | jq .payload 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 { \u0026#34;before\u0026#34;: null, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 1, \u0026#34;short_text\u0026#34;: \u0026#34;c4ca4\u0026#34;, \u0026#34;long_text\u0026#34;: \u0026#34;3a3c3274941c83e253ebf8d2438ea5a2\u0026#34; }, \u0026#34;source\u0026#34;: { \u0026#34;version\u0026#34;: \u0026#34;2.5.0.Beta1\u0026#34;, \u0026#34;connector\u0026#34;: \u0026#34;postgresql\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;dbserver1\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1702469846436, \u0026#34;snapshot\u0026#34;: \u0026#34;first_in_data_collection\u0026#34;, \u0026#34;db\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;sequence\u0026#34;: \u0026#34;[null,\\\u0026#34;406746824704\\\u0026#34;]\u0026#34;, 
\u0026#34;schema\u0026#34;: \u0026#34;public\u0026#34;, \u0026#34;table\u0026#34;: \u0026#34;some_data\u0026#34;, \u0026#34;txId\u0026#34;: null, \u0026#34;lsn\u0026#34;: 406746824704, \u0026#34;xmin\u0026#34;: null }, \u0026#34;op\u0026#34;: \u0026#34;r\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1702469849341, \u0026#34;transaction\u0026#34;: null } ... These are the snapshot events emitted by the connector when starting up. Let’s do an update to one of the records on the primary:\nprimary\u0026gt; UPDATE some_data SET short_text=\u0026#39;hello\u0026#39; WHERE id = 1; And shortly thereafter, the corresponding change event should show up in the Kafka topic:\n... \u0026#34;before\u0026#34;: { \u0026#34;id\u0026#34;: 1, \u0026#34;short_text\u0026#34;: \u0026#34;c4ca4\u0026#34;, \u0026#34;long_text\u0026#34;: \u0026#34;3a3c3274941c83e253ebf8d2438ea5a2\u0026#34; }, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 1, \u0026#34;short_text\u0026#34;: \u0026#34;hello\u0026#34;, \u0026#34;long_text\u0026#34;: \u0026#34;3a3c3274941c83e253ebf8d2438ea5a2\u0026#34; }, ... At this point, you could hook up that change event stream with Apache Flink or the Decodable Kafka source connector for feeding it into a real-time stream processing pipeline, but I’ll leave that for another day 🙂.\nTowards Fail-Over Slots Postgres\u0026#39; support for logical replication has been built out significantly over the last few years. One thing is still missing, though: failover slots. Logical replication slots on the primary are not propagated to stand-bys. This means that when the primary unexpectedly goes down, any slots must be recreated on the new primary after promotion. Unfortunately, this can cause gaps in the change event stream, as any data change occurring before the new slot has been created would be missed. Clients would be forced to backfill the entire data set (i.e. 
take a snapshot in Debezium terminology) to be sure that no data is missing.\nDiscussions around adding support for fail-over slots go back to Postgres versions as old as 9.6. More recently, Patroni added their own solution to the problem, and EnterpriseDB released pg_failover_slots, a Postgres extension for slot failover. It remains to be seen when Postgres itself adds this feature (as hinted at in this presentation, it may happen with Postgres 17). Until then, in cases where the pg_failover_slots extension isn’t available—such as on Amazon RDS—logical replication slots on stand-bys let you build your own version of failover slots. The idea is to create two corresponding slots on primary and stand-by, and use the pg_replication_slot_advance() function (added in Postgres 11) to keep the two in sync. The replication consumer would connect to the slot on the primary at first. After a fail-over, when the stand-by server has been promoted to primary, the consumer would reconnect to the slot on that server.\nFor this to work, it is critical to periodically move the stand-by slot forward by calling pg_replication_slot_advance() with the confirmed flush LSN from the primary, i.e. the latest position in the WAL which has been processed and acknowledged by the consumer of the primary slot. Otherwise, the stand-by slot would retain larger and larger amounts of WAL, and the consumer would receive lots of duplicated events after a fail-over.\nThis could for instance be implemented using a cron job, or, when running on AWS, with a scheduled Lambda function. 
This job would periodically retrieve the confirmed flush LSN for the slot on the primary via the pg_replication_slots view:\n1 2 3 4 5 6 7 8 9 primary\u0026gt; SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots; +--------------+---------------------+ | slot_name | confirmed_flush_lsn | |--------------+---------------------| | primary_slot | 5F/C501098 | +--------------+---------------------+ The slot on the stand-by would then be advanced to that LSN:\n1 2 3 4 5 6 standby\u0026gt; SELECT * FROM pg_replication_slot_advance(\u0026#39;failover_slot\u0026#39;, \u0026#39;5F/C501098\u0026#39;); +---------------+------------+ | slot_name | end_lsn | |---------------+------------| | failover_slot | 5F/C501098 | +---------------+------------+ How often you should advance the stand-by slot depends on the amount of duplication you are willing to accept after a failover: the closer the stand-by slot follows the primary slot, the fewer duplicates there will be when switching from one slot to the other. Note that the stand-by slot must never be advanced beyond the confirmed LSN of the primary slot. Otherwise, events would be lost when reading from the stand-by slot after a failover. Specifically, when setting up the stand-by slot, it will in all likelihood be on a newer LSN than what has been confirmed by the primary and it is vital to synchronize the two slots at first. To do so, wait for the next LSN to be confirmed by the primary slot, make sure this LSN has been replicated to the stand-by, and then advance the stand-by slot to that LSN.\nWrap-Up Logical replication from Postgres stand-by servers has been a long awaited functionality, and it finally shipped with Postgres 16. Not only does it allow you to build chains of Postgres replicas (one stand-by server subscribing to another), but also non-Postgres clients, such as Debezium are not limited any longer to solely being able to connect to primary Postgres instances. 
This can be very useful for the purposes of load distribution or in situations where you prefer a CDC tool not to connect directly to your primary database.\nThe last missing piece in the puzzle here is full support for fail-over slots, for which you still need to either use a separate extension (pg_failover_slots) or implement your own approach by manually keeping two slots on primary and stand-by in sync. It would be great to see official support for this in a future Postgres release.\nFinally, if you’d like to learn more about logical replication from stand-bys, check out these posts from some fine folks in the Postgres community:\nPostgres 16 highlight: Logical decoding on standby, by Bertrand Drouvot\nLogical Replication on Standbys in Postgres 16, by Roberto Mello\nPostgres 16: The exciting and the unnoticed, by Samay Sharma\nPostgreSQL Logical Replication: Advantages, EDB’s Contributions and PG 16 Enhancements, by Shaun Thomas\nMany thanks to Bertrand Drouvot, Robert Metzger, Robin Moffatt, Gwen Shapira, and Sharon Xie for their feedback while writing this post.\n","id":93,"publicationdate":"Dec 19, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_stand_by_logical_replication_with_debezium\"\u003eStand-By Logical Replication With Debezium\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_towards_fail_over_slots\"\u003eTowards Fail-Over Slots\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up\"\u003eWrap-Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/logical-replication-from-postgres-16-stand-by-servers-part-2-of-2\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWelcome back to this series about logical replication from Postgres 16 stand-by servers, in which we’ll discuss how to use this feature with Debezium—a popular open-source platform for Change Data Capture (CDC) for a wide range of databases—as well as how to manage logical replication in case of failover scenarios, i.e.\na situation where your primary Postgres server becomes unavailable and a stand-by server needs to take over.\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","debezium","cdc"],"title":"Logical Replication From Postgres 16 Stand-By Servers—Debezium and Failover Slots","uri":"https://www.morling.dev/blog/logical-replication-from-postgres-stand-by-servers-debezium-and-failover-slots/"},{"content":" Table of Contents What is Postgres Logical Replication? Why Logical Replication On Stand-By Servers? Provisioning a Testing Environment Testing Things Out This post originally appeared on the Decodable blog. All rights reserved.\nFor users of Change Data Capture (CDC), one of the most exciting features in Postgres version 16 (released in September this year) is the support for logical replication from stand-by servers. Instead of connecting to your primary server, you can now point CDC tools such as Debezium to a replica server, which is very interesting for instance from a load distribution perspective. I am going to take a closer look at this new feature in this two-part blog series:\nPart I (this part): Explaining what Postgres logical replication is, why to use it on stand-by servers, and how to set it up\nPart II: Discussing how to use logical replication on Postgres stand-bys with Debezium and how to handle failover scenarios\nWhat is Postgres Logical Replication? Let’s start with the fundamentals. 
Replication is the process of synchronizing all the data from one database server to one or more other servers, for the purposes of ensuring high availability (HA)—if the primary server fails, one of the stand-bys can take over—and load distribution, as replicas can serve read requests. Postgres supports two kinds of continuous replication from primary to replicas: streaming and logical replication.\nStreaming replication (sometimes also called physical replication) propagates all the segments of the Write-Ahead Log (WAL) from a primary server to one or more stand-by servers (or replicas). It includes all the databases and tables and automatically propagates schema changes, resulting in an exact copy of the primary server. This makes it a great choice for ensuring high availability of your data, in particular when used in synchronous mode where write transactions on the primary server are only committed after successful replication.\nIn contrast, logical replication works much more selectively, \u0026#34;replicating data objects and their changes, based upon their replication identity (usually a primary key)\u0026#34;. It operates in the context of one single database on a Postgres server, and you can control on a very fine-grained level which schemas or tables should be part of the replication stream.\nLogical replication offers a flexible publish/subscribe model, which goes beyond the basic primary/read-replica scheme of physical replication. In this model, a publisher node is a Postgres instance which exposes one or more publications, describing which tables should be replicated. Publications also allow you to limit which columns, rows, and operations (INSERT, UPDATE, or DELETE) should be propagated. Once set up, one or more subscriber nodes retrieve the changes from a publication.\nA subscriber can consume the changes from multiple publishers, and it can also be a publisher itself. 
You can execute local write transactions on subscribers, which means that logical replication can form the foundation for multi-master architectures. Unlike streaming replication, you can use logical replication between Postgres instances with different versions, making it a useful tool for zero-downtime version upgrades.\nOn the downside, there are some limitations: most importantly, logical replication does not propagate DDL changes—i.e. you must make sure by yourself that the database schemas of source and destination are in sync. Also replicating sequences, (materialized) views, and large objects is currently not supported.\nWhat makes logical replication so interesting, beyond the purposes of just replicating data from one Postgres instance to another, is the notion of logical decoding plug-ins. They control the data format used for the logical replication messages and allow external (i.e. non-Postgres) consumers to subscribe to a replication stream. This is how tools such as Debezium implement change data capture (CDC) for Postgres, enabling use cases such as the replication of data to data warehouses and data lakes, cache invalidation, low latency synchronization of search indexes, and many others.\nHaving established what logical replication is and why it is such an important part of the data engineering toolbox, let’s explore why it is so interesting that Postgres now supports logical replication from stand-by servers and how to set this up.\nWhy Logical Replication On Stand-By Servers? The first and most important reason for using logical replication from stand-by servers is load distribution. When you have many logical replication slots and corresponding clients, this potentially can create an unduly heavy load on the primary database server. 
In that case it can be beneficial to set up the replication slots on one or even multiple read replicas of the primary, thus distributing resource consumption (CPU, network bandwidth) across multiple machines.\nIn addition, setting up replication slots on stand-by servers is one step towards enabling failover slots, i.e. the ability for logical replication consumers to resume processing after a failure of the primary and subsequent promotion of a stand-by server. While Postgres does not (yet) support failover slots, you can implement them yourself with a bit of glue code, as we’ll discuss in the second part of this blog series.\nFinally, in some cases people just don’t like the idea of logical replication clients, in particular non-Postgres tools like Debezium, directly connecting to their operational database. Whether rightfully so or not, setting up logical replication on read replicas just helps with peace of mind.\nOn the flip side, there are some implications to consider. One is the slightly increased end-to-end latency: as changes are traveling from the primary server to the stand-by and then to the client, things will take a bit longer than when connecting your replication consumers directly to the primary. The other aspect is that read replicas are exactly that—read-only. While CDC in general doesn’t require write access to the database, there are some exceptions to that. In the case of Debezium, you won’t be able to use its incremental snapshotting implementation, as this requires inserting watermarking events into a signaling table (you can learn more about this in my talk \u0026#34;Debezium Snapshots Revisited!\u0026#34;). Also, you cannot use the heartbeat feature, which lets the connector update its restart offsets even when no change events are coming in for any of the tables it is capturing. 
If you want to use these features, then you’ll need to create a replication slot on the primary server.\nEnough with all the theory, let’s have a look at logical replication on stand-bys in practice, starting with setting it up.\nProvisioning a Testing Environment As a testing environment I am going to use Postgres 16 on Amazon RDS (engine version \u0026#34;PostgreSQL 16.1-R1\u0026#34; at the time of writing), as this makes setting up a Postgres primary and an associated read replica a matter of just a few clicks. A free tier instance type like t4g.micro will do for this experiment. Instead of the pre-configured Multi-AZ cluster deployment option (which isn’t available in the free tier anyways), a single DB instance can be used, with a manually created replica.\nNote that you must have backups enabled for the primary server, as otherwise the option for creating a read replica will be disabled in the RDS console. In order to use logical replication, the database’s WAL level must be set to logical. To do so, set rds.logical_replication to 1 in the DB parameter groups of primary and stand-by, as I’ve recently discussed in Data Streaming Quick Tips episode #4. Finally, you should set hot_standby_feedback to true on the stand-by server. Otherwise, the primary server might prematurely discard tuples for system catalog rows which are still needed on the stand-by for logical decoding.\nEnabling hot stand-by feedback may cause a large amount of WAL segments to be pinned on the primary when the stand-by replication slot is not consumed. In the worst case this can cause the primary server to run out of disk space, clearly something you’d want to avoid at all costs. 
While this situation generally can be avoided by putting proper monitoring and alerting into place and has further been improved with the addition of the max_slot_wal_keep_size setting in Postgres 13 (at the price that a replication consumer may not be able to resume its work after a prolonged downtime), it is something to keep in mind when setting up logical replication on read replicas.\nIf you would like to work with a local set-up instead, you can follow the steps from this blog post for setting up two Postgres nodes via Docker, or use any of the Postgres Kubernetes operators for a Kube-based set-up. Set wal_level to logical in the postgresql.conf of the stand-by in this case.\nTesting Things Out With our database cluster up and running, let’s take a look at the existing replication slots on the primary and on the stand-by server:\nprimary\u0026gt; WITH node_status AS ( SELECT CASE WHEN pg_is_in_recovery() = \u0026#39;True\u0026#39; THEN \u0026#39;stand-by\u0026#39; ELSE \u0026#39;primary\u0026#39; END AS role ) SELECT node_status.role as node, slot_name, slot_type, active, plugin, database, confirmed_flush_lsn FROM pg_replication_slots, node_status; +---------+------------------------------------------------+-----------+--------+--------+----------+---------------------+ | node | slot_name | slot_type | active | plugin | database | confirmed_flush_lsn | |---------+------------------------------------------------+-----------+--------+--------+----------+---------------------| | primary | rds_eu_central_1_db_eygryp2llpzqewobw57j5mrtum | physical | True | | | | +---------+------------------------------------------------+-----------+--------+--------+----------+---------------------+ As expected, there is one physical (i.e. streaming) replication slot on the primary. This slot has been set up automatically when creating a replica on RDS and it propagates all the changes from the primary to the stand-by server. 
On the stand-by server itself, there’s no replication slot at this point. So let’s create a logical replication slot on the stand-by using pg_create_logical_replication_slot():\nstandby\u0026gt; SELECT * FROM pg_create_logical_replication_slot(\u0026#39;demo_slot_standby\u0026#39;, \u0026#39;test_decoding\u0026#39;); +-------------------+-------------+ | slot_name | lsn | |-------------------+-------------| | demo_slot_standby | 5B/D80000E8 | +-------------------+-------------+ The test_decoding plug-in, which is used here, comes with Postgres out of the box and emits changes in a simple text-based representation. It allows you to examine the change events using Postgres\u0026#39; logical decoding SQL interface, which comes in handy for testing logical replication without setting up a replication consumer.\nAs Bertrand Drouvot discusses in this blog post, the creation of the slot may take a while if there’s no traffic on the primary database: the slot will only be created when the next xl_running_xact record (describing currently active transactions) is received. Postgres 16 therefore adds a new function pg_log_standby_snapshot(), which can be invoked on the primary to trigger such a record. Unfortunately, it failed with a permission error when I tried to invoke it on RDS:\nprimary\u0026gt; SELECT pg_log_standby_snapshot(); permission denied for function pg_log_standby_snapshot So if you are on RDS, and there are not a lot of transactions on your primary, you may have to wait for a minute or two for the slot to be created. 
Once that’s done, running the query from above again should show the new slot:\n1 2 3 4 5 +----------+-------------------+-----------+--------+---------------+-----------+---------------------+ | node | slot_name | slot_type | active | plugin | database | confirmed_flush_lsn | |----------+-------------------+-----------+--------+---------------+-----------+---------------------| | stand-by | demo_slot_standby | logical | False | test_decoding | inventory | 5E/700010B8 | +----------+-------------------+-----------+--------+---------------+-----------+---------------------+ Now it’s time to do some data changes on the primary then. Let’s create a simple table on the primary:\n1 2 3 4 5 6 primary\u0026gt; CREATE TABLE some_data ( id SERIAL NOT NULL PRIMARY KEY, short_text VARCHAR(255) NOT NULL, long_text text ); ALTER TABLE some_data REPLICA IDENTITY FULL; The table’s replica identity is set to FULL, ensuring that all its columns are part of the replication messages.\nInsert a few rows with random data using the generate_series() function:\n1 2 3 4 5 primary\u0026gt; INSERT INTO some_data(short_text, long_text) SELECT left(md5(i::text), 5), md5(random()::text) FROM generate_series(1, 5) s(i); To confirm that streaming replication from the primary to the stand-by server works, query that data on the stand-by:\n1 2 3 4 5 6 7 8 9 10 standby\u0026gt; SELECT * FROM some_data; +----+------------+----------------------------------+ | id | short_text | long_text | |----+------------+----------------------------------| | 1 | c4ca4 | 22f3cc8011bb8e2f553f8af1c5db18be | | 2 | c81e7 | 205304c828220a5aea30d5a13af4a01f | | 3 | eccbc | c049b5cdd131fbdceb6b3172dfe7399e | | 4 | a87ff | 0aed3b4e9d9e1a50e65597bdec7dfbc6 | | 5 | e4da3 | 68db5e5b4a57e7189f48cc89936567c2 | +----+------------+----------------------------------+ Still on the replica, we can now retrieve the corresponding change events from the logical replication slot with help of the pg_logical_slot_get_changes() function. 
The function takes the following parameters:\nslot_name: The name of the replication slot to get changes from,\nupto_lsn: The latest LSN (log sequence number, i.e. an offset in the WAL) to fetch,\nupto_nchanges: The maximum number of events to retrieve,\noptions: A variadic text parameter for specifying any decoding plug-in specific options\nIf neither of upto_lsn or upto_nchanges are specified, the WAL is consumed until the end. So let’s retrieve all changes for the slot:\n1 2 3 4 5 6 7 8 9 10 11 12 standby\u0026gt; SELECT * FROM pg_logical_slot_get_changes(\u0026#39;demo_slot_standby\u0026#39;, NULL, NULL); +-------------+------+----------------------------------------------------------------------------------------------------------------------------------------+ | lsn | xid | data | |-------------+------+----------------------------------------------------------------------------------------------------------------------------------------| | 5E/90005120 | 6983 | BEGIN 6983 | | 5E/90005188 | 6983 | table public.some_data: INSERT: id[integer]:1 short_text[character varying]:\u0026#39;c4ca4\u0026#39; long_text[text]:\u0026#39;3a3c3274941c83e253ebf8d2438ea5a2\u0026#39; | | 5E/90005290 | 6983 | table public.some_data: INSERT: id[integer]:2 short_text[character varying]:\u0026#39;c81e7\u0026#39; long_text[text]:\u0026#39;9d6d86f986523accd08b372333e77b4f\u0026#39; | | 5E/90005338 | 6983 | table public.some_data: INSERT: id[integer]:3 short_text[character varying]:\u0026#39;eccbc\u0026#39; long_text[text]:\u0026#39;8848fd2b0bd6fbddbbb72091658c047d\u0026#39; | | 5E/900053E0 | 6983 | table public.some_data: INSERT: id[integer]:4 short_text[character varying]:\u0026#39;a87ff\u0026#39; long_text[text]:\u0026#39;d7ce7d90eacaee6b8f8390023fa0636f\u0026#39; | | 5E/90005488 | 6983 | table public.some_data: INSERT: id[integer]:5 short_text[character varying]:\u0026#39;e4da3\u0026#39; long_text[text]:\u0026#39;843ea832ec1f7adc2f23647379435982\u0026#39; | | 5E/90005560 | 6983 | 
COMMIT 6983 | +-------------+------+----------------------------------------------------------------------------------------------------------------------------------------+ Very nice, change events retrieved from a replication slot on a stand-by server—things work as expected!\nThis concludes part one of this blog post series where we have explored what Postgres logical replication is, what you can do with it, and why and how to use it on stand-by servers, as possible since Postgres version 16. If you’d like to know how to use logical replication on stand-bys with Debezium and how to manage replication slots in failover scenarios, then check out the second part of this series!\nMany thanks to Bertrand Drouvot, Robert Metzger, Robin Moffatt, Gwen Shapira, and Sharon Xie for their feedback while writing this post.\n","id":94,"publicationdate":"Dec 19, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_what_is_postgres_logical_replication\"\u003eWhat is Postgres Logical Replication?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_why_logical_replication_on_stand_by_servers\"\u003eWhy Logical Replication On Stand-By Servers?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_provisioning_a_testing_environment\"\u003eProvisioning a Testing Environment\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_testing_things_out\"\u003eTesting Things Out\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/postgres-logical-replication\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor users of Change Data Capture (CDC), one of the most exciting features in Postgres version 16 (released in September this year) is the support for logical replication from stand-by servers.\nInstead of connecting to your primary server, you can now point CDC tools such as Debezium to a replica server, which is very interesting for instance from a load distribution perspective.\nI am going to take a closer look at this new feature in this two-part blog series:\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","cdc","debezium"],"title":"Using Stand-by Servers for Postgres Logical Replication","uri":"https://www.morling.dev/blog/stand-by-servers-for-postgres-logical-replication/"},{"content":"","id":95,"publicationdate":"Dec 17, 2023","section":"tags","summary":"","tags":null,"title":"jfr","uri":"https://www.morling.dev/tags/jfr/"},{"content":"","id":96,"publicationdate":"Dec 17, 2023","section":"tags","summary":"","tags":null,"title":"monitoring","uri":"https://www.morling.dev/tags/monitoring/"},{"content":" Table of Contents An Example Tracking RSS As regular readers of this blog will know, JDK Flight Recorder (JFR) is one of my favorite tools of the Java platform. This low-overhead event recording engine built into the JVM is invaluable for observing the runtime characteristics of Java applications and identifying any potential performance issues. JFR continues to become better and better with every new release, with one recent addition being support for native memory tracking (NMT).\nNMT by itself is not a new capability of the JVM: it provides you with detailed insight into the memory consumption of your application, which goes way beyond the well-known Java heap space. 
NMT tells you how much memory the JVM uses for class metadata, thread stacks, the JIT compiler, garbage collection, memory-mapped files, and much more (the one thing which NMT does not report, despite what the name might suggest, is any memory allocated by native libraries, for instance invoked via JNI). To learn more about NMT, I highly recommend reading the excellent post Off-Heap memory reconnaissance by Brice Dutheil.\nUntil recently, in order to access NMT, you’d have to use the jcmd command line tool for capturing the values of a running JVM in an ad-hoc way. Since Java 20, however, you can record NMT data continuously with JFR, thanks to two new JFR event types added for this purpose. This makes it much easier to collect that data over a longer period of time and analyze it in a systematic way. You could also expose a live stream of NMT data to remote clients via JFR event streaming, for instance for integration with dashboards and monitoring solutions.\nThe list of JFR event types grows with every release. If you’d like to learn which event types are available in which Java version, take a look at the JFR Events list compiled by Johannes Bechberger from the Java team at SAP. It also shows you the events added in a particular version, for instance here for the new events in Java 21.\nAn Example So let’s see how NMT data is reported via JFR. 
Here’s a simple example program which allocates some off heap memory, once using a good old direct byte buffer, and once using the new Foreign Memory API, finalized in Java 22 with JEP 454 (it feels so nice to be able to allocate 4GB at once, something you couldn’t do before):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 import java.nio.ByteBuffer; import java.lang.foreign.Arena; import java.lang.foreign.MemorySegment; import static java.time.LocalDateTime.now; public void main() throws Exception { System.out.println(STR.\u0026#34;\\{ now() } Started\u0026#34;); Thread.sleep(5000); ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024 * 1024); System.out.println(STR.\u0026#34;\\{ now() } Allocated (Direct)\u0026#34;); Thread.sleep(5000); try (Arena arena = Arena.ofConfined()) { MemorySegment segment = arena.allocate(4L * 1024L * 1024L * 1024L); System.out.println(STR.\u0026#34;\\{ now() } Allocated (FMI)\u0026#34;); Thread.sleep(5000); } buffer = null; System.out.println(STR.\u0026#34;\\{ now() } Deallocated\u0026#34;); Thread.sleep(5000); } JFR records NMT events every second by default, so I’ve sprinkled in some sleep() calls to make sure the program runs long enough and the different allocations are spread out a bit. Just for the fun of it, I’m also using a top-level main method—as supported by JEP 463—and string templates for the log messages (JEP 459).\nLet’s run this and see how those off-heap allocations are tracked by JFR. Somewhat surprisingly, NMT in JFR is controlled via the gc setting, which must be set to a value of \u0026#34;normal\u0026#34;, \u0026#34;detailed\u0026#34;, \u0026#34;high\u0026#34;, or \u0026#34;all\u0026#34; for recording NMT data. This is the case for the default and profile JFR configurations which ship with the SDK, so using either configuration will give you the NMT data. 
Note though that in addition, NMT itself must be enabled using the -XX:NativeMemoryTracking JVM option:\n1 2 3 4 5 6 7 8 9 10 java --enable-preview --source 22 \\ -XX:StartFlightRecording=name=Profiling,filename=nmt-recording.jfr,settings=profile \\ -XX:NativeMemoryTracking=detail main.java [0.316s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default. [0.316s][info][jfr,startup] [0.316s][info][jfr,startup] Use jcmd 47194 JFR.dump name=Profiling to copy recording data to file. 2023-12-17T18:31:00.475598 Started 2023-12-17T18:31:05.609319 Allocated (Direct) 2023-12-17T18:31:11.167484 Allocated (FMI) 2023-12-17T18:31:16.253059 Deallocated Let’s open the recording in JDK Mission Control and see what we find. As of version 8.3, JMC doesn’t have a bespoke view for displaying NMT data, but the NMT events show up in the generic event browser view. There are two event types, the first one being \u0026#34;Total Native Memory Usage\u0026#34;:\nThe two off-heap allocations of 1 GB (direct byte buffer) and 4 GB (Foreign Memory API) show up as expected as increases to the reserved and committed memory of the program. We also see one of the advantages of the new Foreign Memory API: the memory is deallocated as soon as the Arena object is closed, whereas the JVM holds on to the memory of the byte buffer also after discarding the reference. There’s no control over when this memory will be released exactly, it will be done via a phantom-reference-based cleaner some time after the GC has removed the associated buffer object.\nThe second new event type, \u0026#34;Native Memory Usage Per Type\u0026#34;, provides a more fine grained view (when setting -XX:NativeMemoryTracking to detail rather than summary). 
The off-heap allocations show up under the \u0026#34;Other\u0026#34; category there:\nUpdate Dec 18: As OpenJDK developer Eric Gahlin pointed out, you also can take a high-level view at the NMT events of a recording using the JDK’s jfr tool, which provides two built-in views for committed and reserved memory:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 $JAVA_HOME/bin/jfr view native-memory-committed recording.jfr Native Memory Committed Memory Type First Observed Average Last Observed Maximum ------------------------------ -------------- --------- ------------- --------- Other 1,8 MB 1,7 GB 1,0 GB 5,0 GB Java Heap 136,0 MB 136,0 MB 136,0 MB 136,0 MB GC 54,2 MB 54,2 MB 54,2 MB 54,2 MB Metaspace 16,0 MB 16,0 MB 16,1 MB 16,1 MB Tracing 15,6 MB 15,7 MB 15,7 MB 15,7 MB Code 12,6 MB 12,6 MB 12,6 MB 12,6 MB Shared class space 12,4 MB 12,4 MB 12,4 MB 12,4 MB Arena Chunk 8,5 MB 2,2 MB 2,0 kB 8,5 MB Symbol 5,8 MB 5,8 MB 5,8 MB 5,8 MB Class 2,7 MB 2,7 MB 2,7 MB 2,7 MB Native Memory Tracking 1,7 MB 1,7 MB 1,7 MB 1,7 MB Synchronization 1,2 MB 1,2 MB 1,2 MB 1,2 MB Internal 563,4 kB 561,9 kB 561,7 kB 563,4 kB Compiler 202,9 kB 206,4 kB 205,6 kB 238,5 kB Module 174,1 kB 174,1 kB 174,1 kB 174,1 kB Thread 86,0 kB 82,5 kB 81,4 kB 86,0 kB Safepoint 32,0 kB 32,0 kB 32,0 kB 32,0 kB GCCardSet 29,5 kB 29,5 kB 29,5 kB 29,5 kB Serviceability 17,6 kB 17,6 kB 17,6 kB 17,6 kB Object Monitors 1,0 kB 1,0 kB 1,0 kB 1,0 kB String Deduplication 608 bytes 608 bytes 608 bytes 608 bytes Arguments 185 bytes 185 bytes 185 bytes 185 bytes Statistics 128 bytes 128 bytes 128 bytes 128 bytes Logging 32 bytes 32 bytes 32 bytes 32 bytes Test 0 bytes 0 bytes 0 bytes 0 bytes JVMCI 0 bytes 0 bytes 0 bytes 0 bytes Thread Stack 0 bytes 0 bytes 0 bytes 0 bytes Tracking RSS As per the docs, NMT will cause a performance overhead of 5% - 10% (how large the overhead actually is, depends a lot on the specific workload), so it’s probably not something you’d want to do 
permanently in a production setting. Luckily, Java 21 adds another JFR event type, \u0026#34;Resident Set Size\u0026#34; (RSS), which allows you to track the overall memory consumption of your application on an ongoing basis:\nOf course you can also retrieve the RSS, i.e. the physical memory allocated by a process, using other tools like ps, but recording it via JFR makes it really simple to analyze its development over time, and also allows you to correlate it with other relevant JFR events, for instance for class (un-)loading or garbage collection.\nWith JFR event streaming, you could also expose a live feed of the value to remote monitoring clients, allowing you to keep track visually using a dashboard. But you could also apply some kind of pattern matching to this time series of values, triggering an alert when it continues to grow even after the application’s warm-up phase.\nI am planning to explore how to do this with a bit of SQL using JFR Analytics in a future blog post.\n","id":97,"publicationdate":"Dec 17, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_an_example\"\u003eAn Example\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_tracking_rss\"\u003eTracking RSS\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs regular readers of this blog will know, \u003ca href=\"https://openjdk.org/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) is one of my favorite tools of the Java platform.\nThis low-overhead event recording engine built into the JVM is invaluable for observing the runtime characteristics of Java applications and identifying any potential performance issues.\nJFR continues to become better and better with every new release,\nwith one recent addition being support for native memory tracking 
(NMT).\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","performance","monitoring"],"title":"Tracking Java Native Memory With JDK Flight Recorder","uri":"https://www.morling.dev/blog/tracking-java-native-memory-with-jdk-flight-recorder/"},{"content":" Table of Contents What Is PyFlink and Why Should You Care? Prerequisites Installing the Flink Kubernetes Operator Installing Strimzi and Apache Kafka A Simple PyFlink Job Building a Container Image With Your PyFlink Job Deploying a PyFlink Job On Kubernetes This post originally appeared on the Decodable blog. All rights reserved.\nThe other day, I wanted to get my feet wet with PyFlink. While there is a fair amount of related information out there, I couldn’t find really up-to-date documentation on using current versions of PyFlink with Flink on Kubernetes.\nKubernetes is the common go-to platform for running Flink at scale in production these days, allowing you to deploy and operate Flink jobs efficiently and securely, providing high availability, observability, features such as auto-scaling, and much more. All of which are good reasons for running PyFlink jobs on Kubernetes, too, and so I thought I’d provide a quick run-through of the required steps for getting started with PyFlink on Kubernetes as of Apache Flink 1.18.\nIn the remainder of this blog post, I’m going to explore how to\ninstall Flink and its Kubernetes operator on a local Kubernetes cluster,\ninstall Kafka on the same cluster, using the Strimzi operator,\ncreate a PyFlink job which creates some random data using Flink’s DataGen connector and writes that data to a Kafka topic using Flink SQL, and\ndeploy and run that job to Kubernetes.\nThe overall solution is going to look like this:\nFigure 1. Solution Overview What Is PyFlink and Why Should You Care? In a nutshell, PyFlink is the Python-based API to Apache Flink. 
Per the docs, it enables you to:\nBuild scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes.\nIt is particularly useful for development teams who are more familiar with Python than with Java and who still would like to benefit from Flink’s powerful stream processing capabilities. Another factor is Python’s rich 3rd party ecosystem: not only does it provide libraries for data engineering (e.g. Pandas) and scientific computing (e.g. NumPy), but also for ML and AI (e.g. PyTorch and TensorFlow), which makes PyFlink the ideal link between these fields and real-time stream processing. So if for instance you wanted to train your ML models on real time event data sourced from your production RDBMS, doing some data cleaning, filtering, and aggregation along the way, PyFlink would be a great option.\nSimilar to Flink’s Java APIs, PyFlink comes with a DataStream API and a Table API. The former lets you implement operations such as filtering and mapping, joining, grouping and aggregation, and much more on data streams, providing you with a large degree of freedom and control over aspects such as state management, late event handling, or output control. In contrast, the Table API offers a more rigid but also easier-to-use and less verbose relational programming interface, including support for defining stream processing pipelines using SQL, benefitting from automatic optimizations by Flink’s query planner.\nImplementation-wise, PyFlink acts as a wrapper around Flink’s Java APIs. Upon start-up, the job graph is retrieved from the Python job using Py4J, a bridge between the Python VM and the JVM. It then gets transformed into an equivalent job running on the Flink cluster. At runtime, Flink will call back to the Python VM for executing any Python-based user-defined functions (UDFs) with the help of the Apache Beam portability framework. 
As of Flink 1.15, there’s also support for a new execution mode called \u0026#34;thread mode\u0026#34;, where Python code is executed on the JVM itself (via JNI), avoiding the overhead of cross-process communication.\nPrerequisites So let’s set up a Kubernetes cluster and install the aforementioned operators onto it. To follow along, make sure to have the following things installed on your machine:\nDocker, for creating and running containers\nkind, for setting up a local Kubernetes cluster (of course you could also use alternatives such as MiniKube or a managed cluster on EKS or GKE, etc.)\nhelm, for installing software into the cluster\nkubectl, for interacting with Kubernetes\nBegin by creating a new cluster with one control plane and one worker node via kind:\n1 2 3 4 5 6 7 { cat | kind create cluster --name pyflink-test --config -; } \u0026lt;\u0026lt; EOF kind: Cluster apiVersion: kind.x-k8s.io/v1alpha4 nodes: - role: control-plane - role: worker EOF After a minute or so, the Kubernetes cluster should be ready, and you can take a look at its nodes with kubectl:\n1 2 3 4 kubectl get nodes NAME STATUS ROLES AGE VERSION pyflink-test-2-control-plane Ready control-plane 67s v1.27.3 pyflink-test-2-worker Ready 48s v1.27.3 Installing the Flink Kubernetes Operator Next, install the Flink Kubernetes Operator. Official part of the Flink project, its task will be to deploy and run Flink jobs on Kubernetes, based on custom resource definitions. 
It is installed using a Helm chart, using the following steps (refer to the upstream documentation for more details):\nDeploy the certificate manager (required later on when the operator’s webhook is invoked during the creation of custom resources):\n1 kubectl create -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml Add the Helm repository for the operator:\n1 helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.7.0/ Install the operator using the provided Helm chart:\n1 helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator After a short time, a Kubernetes pod with the operator should be running on the cluster:\n1 2 3 kubectl get pods NAME READY STATUS RESTARTS AGE flink-kubernetes-operator-f4bbff6-jtd4x 2/2 Running 0 0h17m Installing Strimzi and Apache Kafka With the Flink Kubernetes Operator up and running, it’s time to install Strimzi. It is another Kubernetes Operator, in this case in charge of deploying and running Kafka clusters. Strimzi is a very powerful tool, supporting all kinds of Kafka deployments, Kafka Connect, MirrorMaker2, and much more. We are going to use it for installing a simple one node Kafka cluster, following the steps from Strimzi’s quickstart guide:\nCreate a Kubernetes namespace for Strimzi and Kafka:\n1 kubectl create namespace kafka Install Strimzi into that namespace:\n1 kubectl create -f \u0026#39;https://strimzi.io/install/latest?namespace=kafka\u0026#39; -n kafka Create a Kafka cluster:\n1 kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka Again this will take some time to complete. 
You can use the wait command to await the Kafka cluster materialization:\nkubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka\nOnce the Kafka broker is up, the command will return and you’re ready to go.\nA Simple PyFlink Job\nWith the operators for Flink and Kafka as well as a single-node Kafka cluster in place, let’s create a simple stream processing job using PyFlink. Inspired by the Python example job coming with the Flink Kubernetes operator, it uses the Flink DataGen SQL connector for creating random purchase orders. But instead of just printing them to stdout, we’re going to emit the stream of orders to a Kafka topic, using Flink’s Apache Kafka SQL Connector.\nIn order to get started with setting up a PyFlink development environment on your local machine, check out this post by Robin Moffatt. In particular, make sure not to use a Python version newer than 3.10 with PyFlink 1.18. At the time of writing, Python 3.11 is not supported yet; support is on the roadmap for PyFlink 1.19 (FLINK-33030).\nHere is the complete job:\npyflink_hello_world.py (source)\nimport logging\nimport sys\nimport os\nfrom pyflink.datastream import StreamExecutionEnvironment\nfrom pyflink.table import StreamTableEnvironment\ndef pyflink_hello_world():\n    env = StreamExecutionEnvironment.get_execution_environment()\n    env.set_parallelism(1)\n    t_env = StreamTableEnvironment.create(stream_execution_environment=env)\n    kafka_jar = os.path.join(os.path.abspath(os.path.dirname(__file__)),\n                             \u0026#39;flink-sql-connector-kafka-3.0.2-1.18.jar\u0026#39;)\n    t_env.get_config()\\\n        .get_configuration()\\\n        .set_string(\u0026#34;pipeline.jars\u0026#34;, \u0026#34;file://{}\u0026#34;.format(kafka_jar))\n    t_env.execute_sql(\u0026#34;\u0026#34;\u0026#34;\n    CREATE TABLE orders (\n      order_number BIGINT,\n      price DECIMAL(32,2),\n      buyer ROW\u0026lt;first_name STRING, last_name STRING\u0026gt;,\n      order_time TIMESTAMP(3)\n    ) WITH (\n
\u0026#39;connector\u0026#39; = \u0026#39;datagen\u0026#39;,\n      \u0026#39;rows-per-second\u0026#39; = \u0026#39;4\u0026#39;\n    )\u0026#34;\u0026#34;\u0026#34;)\n    t_env.execute_sql(\u0026#34;\u0026#34;\u0026#34;\n    CREATE TABLE orders_sink (\n      order_number BIGINT,\n      price DECIMAL(32,2),\n      buyer ROW\u0026lt;first_name STRING, last_name STRING\u0026gt;,\n      order_time TIMESTAMP(3)\n    ) WITH (\n      \u0026#39;connector\u0026#39; = \u0026#39;kafka\u0026#39;,\n      \u0026#39;topic\u0026#39; = \u0026#39;orders\u0026#39;,\n      \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;my-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092\u0026#39;,\n      \u0026#39;properties.group.id\u0026#39; = \u0026#39;orders-sink\u0026#39;,\n      \u0026#39;format\u0026#39; = \u0026#39;json\u0026#39;\n    )\u0026#34;\u0026#34;\u0026#34;)\n    t_env.execute_sql(\u0026#34;\u0026#34;\u0026#34;\n    INSERT INTO orders_sink SELECT * FROM orders\u0026#34;\u0026#34;\u0026#34;)\nif __name__ == \u0026#39;__main__\u0026#39;:\n    logging.basicConfig(stream=sys.stdout, level=logging.INFO, format=\u0026#34;%(message)s\u0026#34;)\n    pyflink_hello_world()\nThe logic is fairly straightforward: A StreamTableEnvironment is created and the Flink SQL Kafka connector JAR is registered with it. Then two tables are created: a source table orders, based on the datagen connector, and a sink table orders_sink, based on the Kafka sink connector. Finally, an INSERT query is issued, which continuously propagates the generated rows from the source to the sink table.\nBuilding a Container Image With Your PyFlink Job\nIn order to deploy the PyFlink job onto Kubernetes, you’ll need to create a container image, containing the Python job itself and all its dependencies: PyFlink, any required Python packages, as well as any Flink connectors used by the job.\nUnfortunately, the Dockerfile of the aforementioned Python example coming with the Flink Kubernetes operator didn’t quite work for me. When trying to build an image from it, I’d get the following error:\n... 
38.20 Collecting pemja==0.3.0 38.22 Downloading pemja-0.3.0.tar.gz (48 kB) ... 42.47 × Getting requirements to build wheel did not run successfully. 42.47 | exit code: 255 42.47 ╰─\u0026gt; [1 lines of output] 42.47 Include folder should be at \u0026#39;/opt/java/openjdk/include\u0026#39; but doesn\u0026#39;t exist. Please check you\u0026#39;ve installed the JDK properly. 42.47 [end of output] ... The problem is that the upstream Flink container image which is used as a base image here, itself is derived from a JRE image for Java, i.e. it contains only a subset of all the modules provided by the Java platform. PemJa, one of PyFlink’s dependencies, requires some header files which only are provided by a complete JDK, though.\nI have therefore created a new base image for PyFlink jobs, which removes the JRE and adds back the complete JDK, inspired by the equivalent step in the Dockerfile for creating the JDK 11 image provided by the Eclipse Temurin project. This is not quite ideal in terms of overall image size, and the cleaner approach would be to create a new image derived from the JDK one, but it does the trick for now to get going:\nDockerfile.pyflink-base (source) 1 2 3 4 5 6 7 8 9 10 11 12 FROM flink:1.18.0 RUN rm -rf $JAVA_HOME RUN /bin/sh -c set -eux; ARCH=\u0026#34;$(dpkg --print-architecture)\u0026#34;; case \u0026#34;${ARCH}\u0026#34; in aarch64|arm64) ESUM=\u0026#39;8c3146035b99c55ab26a2982f4b9abd2bf600582361cf9c732539f713d271faf\u0026#39;; BINARY_URL=\u0026#39;https://github.com/adoptium/temurin11-binaries/releases/download/jdk-11.0.21%2B9/OpenJDK11U-jdk_aarch64_linux_hotspot_11.0.21_9.tar.gz\u0026#39;; ;; amd64|i386:x86-64) ESUM=\u0026#39;60ea98daa09834fdd3162ca91ddc8d92a155ab3121204f6f643176ee0c2d0d5e\u0026#39;; BINARY_URL=\u0026#39;https://github.com/adoptium/temurin11-binaries/releases/download/jdk-11.0.21%2B9/OpenJDK11U-jdk_x64_linux_hotspot_11.0.21_9.tar.gz\u0026#39;; ;; armhf|arm) 
ESUM=\u0026#39;a64b005b84b173e294078fec34660ed3429d8c60726a5fb5c140e13b9e0c79fa\u0026#39;; BINARY_URL=\u0026#39;https://github.com/adoptium/temurin11-binaries/releases/download/jdk-11.0.21%2B9/OpenJDK11U-jdk_arm_linux_hotspot_11.0.21_9.tar.gz\u0026#39;; ;; ppc64el|powerpc:common64) ESUM=\u0026#39;262ff98d6d88a7c7cc522cb4ec4129491a0eb04f5b17dcca0da57cfcdcf3830d\u0026#39;; BINARY_URL=\u0026#39;https://github.com/adoptium/temurin11-binaries/releases/download/jdk-11.0.21%2B9/OpenJDK11U-jdk_ppc64le_linux_hotspot_11.0.21_9.tar.gz\u0026#39;; ;; s390x|s390:64-bit) ESUM=\u0026#39;bc67f79fb82c4131d9dcea32649c540a16aa380a9726306b9a67c5ec9690c492\u0026#39;; BINARY_URL=\u0026#39;https://github.com/adoptium/temurin11-binaries/releases/download/jdk-11.0.21%2B9/OpenJDK11U-jdk_s390x_linux_hotspot_11.0.21_9.tar.gz\u0026#39;; ;; *) echo \u0026#34;Unsupported arch: ${ARCH}\u0026#34;; exit 1; ;; esac; wget --progress=dot:giga -O /tmp/openjdk.tar.gz ${BINARY_URL}; echo \u0026#34;${ESUM} */tmp/openjdk.tar.gz\u0026#34; | sha256sum -c -; mkdir -p \u0026#34;$JAVA_HOME\u0026#34;; tar --extract --file /tmp/openjdk.tar.gz --directory \u0026#34;$JAVA_HOME\u0026#34; --strip-components 1 --no-same-owner ; rm -f /tmp/openjdk.tar.gz ${JAVA_HOME}/lib/src.zip; find \u0026#34;$JAVA_HOME/lib\u0026#34; -name \u0026#39;*.so\u0026#39; -exec dirname \u0026#39;{}\u0026#39; \u0026#39;;\u0026#39; | sort -u \u0026gt; /etc/ld.so.conf.d/docker-openjdk.conf; ldconfig; java -Xshare:dump; # install python3 and pip3 RUN apt-get update -y \u0026amp;\u0026amp; \\ apt-get install -y python3 python3-pip python3-dev \u0026amp;\u0026amp; \\ rm -rf /var/lib/apt/lists/* # install PyFlink RUN pip3 install apache-flink==1.18.0 Create an image from that Dockerfile and store it in your local container image registry:\n1 docker build -f Dockerfile.pyflink-base . 
-t decodable-examples/pyflink-base:latest\nAs for the actual image with our job, all that remains to be done is to extend that base image and add the Python job with all its dependencies (in this case, just the Kafka connector):\nDockerfile (source)\nFROM decodable-examples/pyflink-base:latest\nRUN wget -P /opt/flink/usrlib https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka/3.0.2-1.18/flink-sql-connector-kafka-3.0.2-1.18.jar\nADD --chown=flink:flink pyflink_hello_world.py /opt/flink/usrlib/pyflink_hello_world.py\nLet’s also build an image for that:\ndocker build -f Dockerfile . -t decodable-examples/pyflink-hello-world:latest\nIn order to use that image from within Kubernetes, you’ll finally need to load it into the cluster, which can be done with kind like this:\nkind load docker-image decodable-examples/pyflink-hello-world:latest --name pyflink-test\nDeploying a PyFlink Job On Kubernetes\nAt this point, you have everything in place for deploying your PyFlink job to Kubernetes. The operator project comes with an example resource of type FlinkDeployment, which works pretty much as-is for our purposes. 
Only the image name and the Flink version need to be changed, so that they match the image you’ve just created:\npyflink-hello-world.yaml (source)\napiVersion: flink.apache.org/v1beta1\nkind: FlinkDeployment\nmetadata:\n  name: pyflink-hello-world\nspec:\n  image: decodable-examples/pyflink-hello-world:latest\n  flinkVersion: v1_18\n  flinkConfiguration:\n    taskmanager.numberOfTaskSlots: \u0026#34;1\u0026#34;\n  serviceAccount: flink\n  jobManager:\n    resource:\n      memory: \u0026#34;2048m\u0026#34;\n      cpu: 1\n  taskManager:\n    resource:\n      memory: \u0026#34;2048m\u0026#34;\n      cpu: 1\n  job:\n    jarURI: local:///opt/flink/opt/flink-python-1.18.0.jar\n    entryClass: \u0026#34;org.apache.flink.client.python.PythonDriver\u0026#34;\n    args: [\u0026#34;-pyclientexec\u0026#34;, \u0026#34;/usr/bin/python3\u0026#34;, \u0026#34;-py\u0026#34;, \u0026#34;/opt/flink/usrlib/pyflink_hello_world.py\u0026#34;]\n    parallelism: 1\n    upgradeMode: stateless\nNote how the PythonDriver class is used as the entry point for running a PyFlink job and the job to run is passed in via the -py argument. Just as any other Kubernetes resource, this Flink job can be deployed using kubectl:\nkubectl create -f pyflink-hello-world.yaml\nThe operator will pick up that resource definition and spin up the corresponding pods with the Flink job manager and task manager for running this job. 
As before, you can await the creation of the job:\nkubectl wait FlinkDeployment/pyflink-hello-world --for=jsonpath=\u0026#39;{.status.jobStatus.state}\u0026#39;=RUNNING --timeout=300s\nAs the last step, you can check out the Kafka topic, confirming that the job propagates all the orders from the datagen source to the Kafka sink as expected:\nkubectl -n kafka run kafka-consumer -ti --rm=true --restart=Never \\\n  --image=quay.io/strimzi/kafka:0.38.0-kafka-3.6.0 -- \\\n  bin/kafka-console-consumer.sh --bootstrap-server \\\n  my-cluster-kafka-bootstrap.kafka:9092 --topic orders\n{\u0026#34;order_number\u0026#34;:1574206908793601022,\u0026#34;price\u0026#34;:824651156650254403841457913856,\u0026#34;buyer\u0026#34;:{\u0026#34;first_name\u0026#34;:\u0026#34;3ffbd66e114b1d26e08181bd8c248aac514b812bce11ebaf11b3f2ee8941d0df3feea556bced4c07bd040bac4da53af1774b\u0026#34;,\u0026#34;last_name\u0026#34;:\u0026#34;24eb04e10e90a4e1717dd5afa574eb964775466e22f725a6d603d1723f27d2616d095792cfaf5f83815c728b6eb0a961c673\u0026#34;},\u0026#34;order_time\u0026#34;:\u0026#34;2023-12-06 09:11:10.397\u0026#34;}\n{\u0026#34;order_number\u0026#34;:4306245129932523235,\u0026#34;price\u0026#34;:966772426991501079169688666112,\u0026#34;buyer\u0026#34;:{\u0026#34;first_name\u0026#34;:\u0026#34;c7d08f3d15b2e993b6e12be76a891b1e367a5a841850375142ee4b2b62dc2a1541a94b16267f568f0ada8c3e97963f346745\u0026#34;,\u0026#34;last_name\u0026#34;:\u0026#34;2d80bb118ca512c1a7aa5a9bfcf651b62521de8489ea7a80554723625674e000153e3a37f652ce1137df4dc154e573fd09ce\u0026#34;},\u0026#34;order_time\u0026#34;:\u0026#34;2023-12-06 09:11:10.397\u0026#34;}\nYou can also take a look at the deployed job in the Flink web UI by forwarding the REST port from the job manager:\nkubectl port-forward service/pyflink-hello-world-rest 8081:rest\nThis makes the Flink Dashboard accessible at http://localhost:8081, allowing you to take a look at the job’s health status, metrics, etc.:\nFigure 2. 
PyFlink job in the Apache Flink Dashboard\nFinally, to stop your job, simply delete its resource definition:\nkubectl delete -f pyflink-hello-world.yaml\nAnd there you have it: your first PyFlink job running on Kubernetes 🎉. To try everything out yourself, you can find the complete source code in our examples repository on GitHub. Happy PyFlink-ing!\nMany thanks to Robert Metzger and Robin Moffatt for their feedback while writing this post.\n","id":98,"publicationdate":"Dec 7, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_what_is_pyflink_and_why_should_you_care\"\u003eWhat Is PyFlink and Why Should You Care?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_prerequisites\"\u003ePrerequisites\u003c/a\u003e\n\u003cul class=\"sectlevel2\"\u003e\n\u003cli\u003e\u003ca href=\"#_installing_the_flink_kubernetes_operator\"\u003eInstalling the Flink Kubernetes Operator\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_installing_strimzi_and_apache_kafka\"\u003eInstalling Strimzi and Apache Kafka\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_a_simple_pyflink_job\"\u003eA Simple PyFlink Job\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_building_a_container_image_with_your_pyflink_job\"\u003eBuilding a Container Image With Your PyFlink Job\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_deploying_a_pyflink_job_on_kubernetes\"\u003eDeploying a PyFlink Job On Kubernetes\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/getting-started-with-pyflink-on-kubernetes\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day, I wanted to get my feet wet with PyFlink.\nWhile there is a fair amount of related information out there, I couldn’t find really up-to-date documentation on using current versions of PyFlink with Flink on Kubernetes.\u003c/p\u003e\n\u003c/div\u003e","tags":["flink","kubernetes","streaming"],"title":"Getting Started With PyFlink on Kubernetes","uri":"https://www.morling.dev/blog/getting-started-with-pyflink-on-kubernetes/"},{"content":" Table of Contents CDC—​A Quick Primer Does CDC Break Encapsulation? Entering Data Contracts Implementation Approaches For Data Contracts The Outbox Pattern Stream Processing Streaming Data Contracts—​Beyond the Basics Handling Schema Changes Summary This post originally appeared on the Decodable blog. All rights reserved.\nHaving worked on Debezium—​an open-source platform for Change Data Capture (CDC)—​for several years, one concern I’ve heard repeatedly is this: aren’t you breaking the encapsulation of your application when you expose change event feeds directly from your database? After all, CDC exposes your internal persistent data model to the outside world, which may have unintended consequences, e.g. in terms of data exposure but also when it comes to changes to the schema of your data, which may break downstream consumers.\nIn this blog post I am going to dive into this problem space, discuss when—​and when not—​CDC can break encapsulation, whether it matters, and explore strategies for avoiding these problems when it does.\nCDC—​A Quick Primer With log-based change data capture—​for instance, using Debezium—​you can expose realtime change event streams for the tables of your database, sourced from the database’s transaction log. 
For each executed INSERT, UPDATE, and DELETE, an event is appended to the log, from where it is captured by the CDC tool and propagated to consumers, usually through data streaming platforms such as Apache Kafka or Amazon Kinesis.\nThese event streams enable a large variety of use cases, such as low-latency data feeds for analytical data stores, cache updates, full-text search indexes, and many more. While there are different alternatives for implementing CDC systems (for instance based on polling for changed rows, or using database triggers), log-based CDC is generally the most powerful and efficient approach and should be preferred whenever possible.\nDoes CDC Break Encapsulation?\nIn software design, encapsulation refers to the practice of hiding the implementation details and inner data structures of a component from the outside world, providing access to the component’s functionality and data only through well-defined interfaces. By publishing change event streams for the tables of an application’s database, this encapsulation may be violated.\nOver time, several people have touched on this aspect, for instance Chris Riccomini in this blog post and Yaroslav Tkachenko here. The following implications of using log-based CDC are typically at the center of this discussion:\nYour table model becomes your API: by default, your table’s column names and types correspond to fields in the change events emitted by the CDC tool. This can yield less-than-ideal event schemas, particularly for legacy applications.\nFine-grained events: CDC event streams typically expose one event per affected table row, whereas it can be desirable to publish higher-level events to consumers. An example of this would be wanting one event for one purchase order with all its order lines, even if they are stored within two separate tables in an RDBMS. 
The loss of transaction semantics in CDC event streams can aggravate that concern, as consumers cannot easily correlate the events originating from one and the same transaction in the source database.\nSchema changes might break things: Downstream consumers of the change events will expect the data to adhere to the schema known to them. As there is no abstraction between your internal data model and consumers, any changes to the database schema, such as renaming a column or changing its type, could cause downstream event consumers to break, unless they are updated in lockstep.\nYou may accidentally leak sensitive data: a change event stream will, by default, contain all the rows of a table with all its columns. This means that sensitive data which shouldn’t leave the security perimeter of your application could be exposed to external consumers.\nNow, this perhaps sounds a bit scarier than it actually is! In order to understand whether there actually is a problem here or not, it helps to look at how and where your change event streams are consumed. Specifically, do change events permeate multiple bounded contexts (in terminology of Domain-Driven Design)? Are they propagated across system (and team) boundaries, or not?\nIf a change event stream is consumed within the same context as the source database itself—​such as updating an in-memory cache managed by the application or service owning the database—​then I would argue that there actually isn’t much to be concerned about. You actually want the data in the cache to match the original data. Similarly, if a change event stream is used for feeding a search index owned and managed by the same team also building the source application itself, aspects like schema changes can be coordinated by the team itself and applied to the database and index together.\nThings look different though when crossing context or organizational boundaries. 
Your change events are consumed by the analytics team sitting at the end of the hallway to whom you only speak once a year? You are using CDC to propagate data changes between microservices created by different parts of your organization? In situations like these, directly exposing change streams from your internal data model indeed may be problematic. Tying these kinds of external consumers to your own data model and its lifecycle can lead to a loss of agility (changes to your model require convoluted and time-consuming change control processes tightly synchronized between different teams) and service disruptions (downstream pipelines and consumers fail due to incompatible schema changes).\nEntering Data Contracts So how do you mitigate these risks of using CDC between bounded contexts and/or teams? The solution is similar to what you’d do for exposing any other remote API—​such as REST or gRPC—​from your application: you have an API layer which is separate from the internal data model. This layer exposes a service’s functionality and data in exactly the way it’s needed, evolving at its own pace and independently from changes you make to your internal model, with a strong notion of not breaking compatibility in mind.\nIn databases, views have historically been a proven means of establishing module and system boundaries, for instance providing an explicit interface to traditional pull-based ETL tools. Unfortunately, (non-materialized) database views cannot be exposed via CDC because they don’t operate via the transaction log. But as we’ll see below, there are other ways for setting up separate change streams which don’t directly mirror the raw streams corresponding to the tables of your data model. These public change event streams adhere to their own well-defined and deliberately crafted data contract.\nData contracts have been quite the hotness lately, and for good reason. 
They typically comprise the following aspects as a formal agreement between data providers and consumers:\nData schema: describes the structure and format of the events in a stream, i.e. the fields, their names and types, constraints, etc.\nData semantics: describe the meaning of event attributes, e.g. units of measurement for numeric values\nService level agreements (SLAs): describe qualitative aspects of a data stream, such as mean and maximum latency, event rates, etc.\nEvolution rules: describe how and when a data contract can be changed, in particular its schema, so that consumers can prepare for any changes without breaking\nMetadata and policies: describe attributes like the owner of a data contract, what the data can (and cannot) be used for, etc.\nExamples: show what events adhering to the contract may look like\nThis \u0026#34;contract\u0026#34; can be expressed and managed using different formats and tools. For instance, data schemas can be defined using JSON Schema, Avro schemas, or ProtoBuf definitions. Another option would be describing your change event streams with the help of the AsyncAPI specification. Evolution rules can be managed and enforced using schema registries such as Confluent’s or Apicurio. Other contract elements may lend themselves to a textual representation, perhaps on a wiki page or some other kind of document. But a more formal, machine-readable contract definition is possible as well, such as in the form of YAML (will this ever stop ?!), as suggested by the Data Contracts specification.\nRegardless of how it’s implemented, the development team building a service from whose database change event streams are exposed should also own any data contracts for these streams. 
That way, the contracts are part of that team’s product—​developed and maintained by that team, just as they would for any other APIs of the service.\nImplementation Approaches For Data Contracts Having established that explicitly designed data contracts are very useful, how can you go about implementing them—​specifically event schemas and their evolution—​for your CDC events? In the following, I’d like to describe two approaches for doing so: the Outbox Pattern, and stream processing using something like Apache Flink. I’ll also illustrate exactly how data contracts help you address some of the potential encapsulation risks identified earlier on.\nThe Outbox Pattern The idea of the Outbox Pattern is that instead of capturing the changes from your internal domain model tables, your application emits bespoke events via a separate table, typically called the outbox table. A CDC tool like Debezium will capture only the events inserted into that outbox table and relay them to any downstream consumers. Very importantly, any actual data changes (e.g. an update to a customer record), and the insertion of the outbox event, must happen in one single database transaction, ensuring atomic all-or-nothing guarantees.\nThe contract of the outbox events is kept separated from the internal data model, allowing you to expose the data in exactly the way you want to expose it. For instance you could emit a single event for an entire aggregate root which is persisted in multiple tables in your internal model, as shown below:\nFigure 1. The outbox pattern with Debezium But you may also decide to adjust the types of exposed fields, only expose a subset of all your data attributes—​giving you the opportunity to omit any sensitive or implementation-specific attributes—​and much more. Having a separate contract allows you to evolve it independently from your internal model, too. 
Say, you rename a column in one of your tables; it’s a conscious decision then to also rename it in the schema of your outbox events (potentially requiring a new major version of the same), or keep it as is.\nDebezium comes with powerful support for implementing the Outbox Pattern. This includes a routing component for propagating outbox events to specific topics in Kafka, based on configurable event metadata. If you are on Postgres, an interesting implementation option is logical decoding messages: instead of having a bespoke outbox table, Postgres lets you write arbitrary events solely to its write-ahead log, from where they can be retrieved using CDC. This spares you from implementing your own housekeeping routines (removal of events from the outbox table after they have been captured and sent), as the database itself will discard any obsolete segments of the transaction log automatically.\nSo, the Outbox Pattern is a rather simple option for implementing data contracts (at least the schema portion) in a reliable way. It avoids unsafe dual writes (e.g. to your service’s database and Apache Kafka), while not exposing your internal data model to external consumers. On the downside, it does require you to modify your service so that it emits the outbox events, and you must make sure to consistently do this for all the write operations of your service. There is a potential performance impact (as you do another insert call in write transactions) and for high-volume use cases, the additional disk space required for storing outbox events in the database could be an issue.\nStream Processing As we saw above, when we implement data contracts using the Outbox Pattern, the application itself is responsible for forming and emitting the change events that adhere to the contract specified. But what if we can’t—​or don’t want to, for whatever reason—​change the application to do this? 
The alternative is to use stream processing to publish change event streams with explicit data contracts after the fact. The idea here is to take the unprocessed table-level change events from a CDC source, process and convert them as needed, and re-publish them as separate streams with defined data contracts.\nWhile simple event transformations can be implemented with stateless tools like Single Message Transforms in Kafka Connect, stateful stream processing engines like Apache Flink provide much greater flexibility and more possibilities. This is particularly relevant when it comes to joining multiple input streams into a single output stream, creating multiple output versions for one input stream, enriching change events with contextual metadata, and more. With its support for SQL, Flink also has the benefit of using a commonly-understood language to define contracts, making them more accessible and maintainable than the kind of non-portable JSON configuration that other tools might provide for data transformation.\nThe following image shows the overall approach, using Flink SQL to establish a public data contract for a table customers in a Postgres database:\nFigure 2. Stream processing pipeline with Apache Flink Flink CDC is used to ingest the raw CDC feed into a Flink SQL table. The data is transformed and published to a Kafka topic which adheres to a stable data contract (specifically, the topic’s schema represents the schema part of the contract). 
The source and sink connectors are represented as tables in Flink SQL; here’s the definition of the source table, customers, using the postgres-cdc connector, which itself is built on top of the Debezium connector for Postgres (you can find the complete source code for this blog post in the decodableco/examples repository on GitHub):\nConfiguration of the Postgres source connector\nCREATE TABLE customers (\n  id INT,\n  fname STRING,\n  lname STRING,\n  email STRING,\n  street STRING,\n  zip STRING,\n  city STRING,\n  phone STRING,\n  status INT,\n  registered TIMESTAMP,\n  PRIMARY KEY (id) NOT ENFORCED\n) WITH (\n  \u0026#39;connector\u0026#39; = \u0026#39;postgres-cdc\u0026#39;,\n  \u0026#39;hostname\u0026#39; = \u0026#39;postgres\u0026#39;,\n  \u0026#39;port\u0026#39; = \u0026#39;5432\u0026#39;,\n  \u0026#39;username\u0026#39; = \u0026#39;postgres\u0026#39;,\n  \u0026#39;password\u0026#39; = \u0026#39;postgres\u0026#39;,\n  \u0026#39;database-name\u0026#39; = \u0026#39;postgres\u0026#39;,\n  \u0026#39;schema-name\u0026#39; = \u0026#39;inventory\u0026#39;,\n  \u0026#39;table-name\u0026#39; = \u0026#39;customers\u0026#39;,\n  \u0026#39;slot.name\u0026#39; = \u0026#39;customers_replication_slot\u0026#39;,\n  \u0026#39;decoding.plugin.name\u0026#39; = \u0026#39;pgoutput\u0026#39;\n);\nFor publishing the change events into a Kafka topic customers, another table, customers_public, is created which looks like this:\nConfiguration of the Kafka sink connector\nCREATE TABLE customers_public (\n  id INT,\n  first_name STRING,\n  last_name STRING,\n  email STRING,\n  zip STRING,\n  phone STRING,\n  status STRING,\n  registration_date STRING,\n  PRIMARY KEY (id) NOT ENFORCED\n) WITH (\n  \u0026#39;connector\u0026#39; = \u0026#39;upsert-kafka\u0026#39;,\n  \u0026#39;topic\u0026#39; = \u0026#39;customers\u0026#39;,\n  \u0026#39;properties.bootstrap.servers\u0026#39; = \u0026#39;kafka:9092\u0026#39;,\n  \u0026#39;key.format\u0026#39; = \u0026#39;json\u0026#39;,\n  \u0026#39;value.format\u0026#39; = 
\u0026#39;json\u0026#39;\n);\nNote that the upsert-kafka connector must be used (i.e. not the kafka one), as I’ve discussed recently in this Data Streaming Quick Tip episode. Finally, the transformation from the source stream into the published format is a simple INSERT statement, using just a few lines of Flink SQL:\nFlink SQL job for transforming the source table into the sink table\nINSERT INTO customers_public\nSELECT\n  id,\n  fname,\n  lname,\n  email,\n  zip,\n  phone,\n  CASE\n    WHEN status = 1 THEN \u0026#39;NEW\u0026#39;\n    WHEN status = 2 THEN \u0026#39;VIP\u0026#39;\n    WHEN status = 3 THEN \u0026#39;BLOCKED\u0026#39;\n    ELSE \u0026#39;STANDARD\u0026#39;\n  END,\n  DATE_FORMAT(registered, \u0026#39;dd-MM-yyyy\u0026#39;)\nFROM customers;\nWhen submitting this query, a job will be deployed onto your Flink cluster which executes the query in a continuous fashion, propagating any changes from the source to the destination, whenever there is a row inserted, updated, or deleted in the source table. The following transformations are applied:\nRenamings: The names of some fields are changed, e.g. from fname to first_name (by means of inserting the selected values into fields with the desired alternative names in the sink table)\nOmissions: Some sensitive fields are excluded from the published stream (all address fields besides zip)\nType changes: The registered field is changed from TIMESTAMP to STRING\nValue conversions: Instead of the original numeric constants, corresponding string labels are emitted for the status field\nGenerally speaking, all these transformations are projections. 
But depending on your requirements, you could take things much further and for instance:\napply filters using a WHERE clause, excluding test accounts or logically deleted customer records from the change event stream,\njoin multiple change event streams into a single one, which becomes particularly useful when publishing change events for one aggregate which is persisted in multiple tables in a relational database (see further below for an example for that),\nadd derived fields, e.g. with a customer’s full name, and much more.\nInstead of publishing a single data contract for a table like the customers table in the example above, you also may decide to create multiple streams with slightly different contracts, geared towards different consumers and use cases. For instance, you may have two public customer change event streams, one with address data (which can only be accessed by a small number of authorized clients) and one without (which would be accessible by a larger number of clients).\nThere is an interesting tension here between defining contracts which are widely applicable vs. contracts which are optimized for specific consumers. A useful guiding principle for resolving this tension could be \u0026#34;As general as possible, as specific as needed\u0026#34;. This means that, for instance, you would include the widest set of fields possible in a change stream’s data contract by default, only keeping sensitive data exclusive to streams for specific privileged consumers. Other kinds of filtering on the other hand—​such as filtering out any backfilling events—​would be the responsibility of individual clients, based on their specific requirements.\nAs far as filtering specific rows or columns is concerned, you have different options. You could ingest everything into Flink but only publish a subset, as shown in the example above. This would be useful when you want to have the option to publish multiple variants of a data contract, as just discussed. 
But you also could decide to omit specific sensitive fields in the Flink source table definition, thus making sure they never can be part of any published data contracts. Depending on your database, you may even exclude specific data from the ingested change stream altogether. As an example, Postgres supports the definition of column lists and row filters, providing fine-grained control over the contents of any logical replication streams, thus helping to reduce network traffic and any potential cost associated with it.\nStreaming Data Contracts—​Beyond the Basics Once you have embarked onto your journey of creating data contracts with stateful stream processing, the sky’s the limit, and you have all kinds of interesting related capabilities at your disposal.\nSo let’s discuss how to expose a complexly structured change event stream, derived from two tables in the source database. Imagine that the domain model from the example above is changed so that a customer can have multiple phone numbers, instead of just a single one. To model that, instead of having a phone column within the customers table, let’s assume that there’s a separate table phone_numbers, with a 1:n relationship between the two tables.\nA stream processing job could then be used to join the two tables, emitting the phone numbers as part of the data contract for the customer change event stream. That way, instead of having to deal with two table-level streams, consumers would be able to ingest all the data pertaining to a customer from one single stream, independent from how that data is organized in the source database. To make things a bit more interesting, let’s emit one of the numbers (the customer’s preferred one) via its own field phone, and all the others via an array-typed field further_phones:\nFigure 3. 
Joining two change event streams A Flink SQL user-defined function (UDF) for aggregating the elements of the \u0026#34;many\u0026#34; side of the join into an array in a type-safe way can come in handy here, something I’ve recently explored in this video. Using the ARRAY_AGGR() function discussed in that quick tip, the two source tables could be joined like this:\nJoining two source change streams into a single public stream INSERT INTO customers_public SELECT c.id, c.fname, c.lname, c.email, c.zip, preferred.`value`, ARRAY_AGGR(further_phones.`value`), CASE WHEN status = 1 THEN \u0026#39;NEW\u0026#39; WHEN status = 2 THEN \u0026#39;VIP\u0026#39; WHEN status = 3 THEN \u0026#39;BLOCKED\u0026#39; ELSE \u0026#39;STANDARD\u0026#39; END, DATE_FORMAT(registered, \u0026#39;dd-MM-yyyy\u0026#39;) FROM customers c LEFT JOIN (SELECT * FROM phone_numbers WHERE preferred = true) preferred ON c.id = preferred.customer_id LEFT JOIN (SELECT * FROM phone_numbers WHERE preferred = false) further_phones ON c.id = further_phones.customer_id GROUP BY c.id, fname, lname, email, zip, c.phone, registered, status, preferred.`value`; The phone_numbers table is left-joined here twice, once to obtain the preferred phone number and once for all the non-preferred numbers, which then are exposed as an array via the aforementioned ARRAY_AGGR() function.\nBut you also can go beyond the pure needs of data contracts themselves. One example is the expansion of partial change events: UPDATE events emitted by the Debezium connector for Cassandra only contain those fields of a record whose values actually changed, whereas unchanged values are omitted. Similarly, the Postgres connector won’t emit values for unchanged TOAST columns (large column values stored by the database in a specific way). If a consumer only supports full record updates, it won’t easily be able to process such partial change events. 
This could be addressed by implementing a job for publishing a data contract with Flink’s DataStream API which leverages a state store for expanding any partial events into full ones, retrieving any missing field values from the store.\nAnother very interesting option would be taking advantage of the metadata emitted by Debezium for transaction boundaries; with this information you could implement buffering logic for emitting change events originating from one and the same transaction only when all the events from that transaction have been ingested, which is particularly useful when joining multiple raw change event streams into a single one.\nHandling Schema Changes As the saying goes: Nothing is permanent except change. It’s only a question of time until new columns are added to your application’s data schema, existing ones are renamed or removed, or their types get changed. With explicitly defined data contracts in place, you have taken the first step for making sure that any changes to your internal data schema do not directly affect the consumers of your change event streams.\nFrom a procedural perspective, it’s important that the team owning and publishing a data contract can apply changes to the contract without having to synchronize with any event consumers, who perhaps may not even be known to the upstream team. At the same time, any changes to the contract should not break existing consumers—​after a schema change they should be able to continue to process a change event stream based on the previous schema known to them. Of course, they will need to be adjusted eventually, so as to take advantage of the capabilities of a new contract version, such as any added fields. 
The guarantees around duration of support for particular versions of a contract is something that would be built into them along with the other metadata previously discussed such as SLAs.\nThis means data contracts for change event streams should be evolved in a forward compatible way, which allows for the addition of new fields and the removal of optional fields, whereas existing non-optional fields may not be removed.\nFigure 4. Producer-driven evolution of a schema To learn more about the guidelines for schema evolution, I highly recommend referring to Gwen Shapira’s presentation \u0026#34;Streaming Microservices: Contracts \u0026amp; Compatibility\u0026#34;, where she discusses this topic around 16:50 min. A schema registry should be used in order to ensure that any changes to a data contract adhere to these requirements. Before rolling out any data contract changes to production, a CI/CD pipeline would validate any schema changes using the compatibility rules configured in the registry, as for instance described in this excellent blog post by Chad Sanderson and Adrian Kreuziger. Any contract changes which would actually break existing consumers, would fail the build process and thus be prevented from being deployed.\nEvolving data contracts in a forward-compatible manner means that consumers cannot replay any events from the beginning using only the latest schema version. This would fail when, for instance, re-processing an event lacking a non-optional field added in a later schema version. Instead, each event should be processed with the schema version valid at the time when the event was originally created.\nNow, how can stream processing help you with managing these kinds of data contract changes? As an example, consider the case of renaming a column within a source table. The schema of the table’s change stream would change correspondingly, whenever the first change event after the name change is ingested. 
Exposing this schema change as-is to any downstream consumers would be an incompatible change and thus should be avoided. Different options for solving the issue exist:\nCreating another version of the stream with the new schema (i.e. new field name); both, old and new stream versions, would co-exist, and clients could migrate from the old to the new one at their own pace\nExpand the schema of the existing stream, so that it contains another field with the new name, next to the existing field with the old name\nKeep the existing stream schema, i.e. don’t change the field name in the public data contract\nIn every case, stream processing can be used to apply the required transformations between the source events (containing either old or new field name, depending on the specific stream position) and the published counterparts(s). Let’s see how the last option—​completely shielding consumers from that name change—​can be implemented with Flink SQL.\nFigure 5. Renaming a column in the source table The key idea is to use Flink’s savepoint mechanism for pausing the job, while applying the required schema changes to the database and the Flink job, making sure the job maps both old and new incoming field names to the existing name in the public contract. 
The exact sequence of events would be this:\nIn Flink, stop the job with a savepoint: STOP JOB \u0026#39;\u0026lt;job id\u0026gt;\u0026#39; WITH SAVEPOINT; This makes sure the job, after restarting, will continue to process the source change stream from the exact position where it left off, not missing any changes which happened in between.\nIn the source database, rename the column: ALTER TABLE customers RENAME COLUMN fname TO first_name;\nIn Flink, add a column with the new name to the table, keeping the one with the old name too: ALTER TABLE customers ADD first_name STRING;\nIn Flink, configure the savepoint path: SET \u0026#39;execution.savepoint.path\u0026#39; = \u0026#39;/path/to/savepoints/savepoint-\u0026lt;job id\u0026gt;\u0026#39;;\nIn Flink, create a new version of the job, using the COALESCE() function to retrieve the first name either from the old or the new field, depending on which one exists in the incoming event:\nRetrieving the first name from the correct column, depending on which value is present INSERT INTO customers_public SELECT id, COALESCE(fname, first_name), ... FROM customers; With this procedure, any consumers of the public data contract are fully shielded from the column name change in the database. The job will source the first name from the correct incoming field, no matter whether it processes a change event from before or after the schema change was made in the database.\nNote that the correct order of steps is vital here; in particular, the Flink job must be stopped before applying the schema change in the source database. Otherwise, it would not pick up the value from change events emitted after the column has been renamed. Therefore, the development team owning the database schema should also be in charge of the CDC pipeline and the Flink job for creating the public change stream with the data contract.\nBeyond renaming columns, other schema changes can also be handled with a stream processing engine. 
New columns could be added just like above. For dropped NOT NULL columns, the streaming job could emit a sentinel value such as \u0026#34;n/a\u0026#34; to ensure compatibility with existing consumers. Cardinality changes (for instance, going from a single phone column within the customers table to a separate table with multiple phone numbers per customer, as shown above) are also possible by aggregating all the values into a new array-typed field in the public data stream.\nSummary Now, does CDC break data encapsulation? As we’ve seen, the answer to this question is surprisingly nuanced and depends a lot on how and where change event streams are consumed: the key consideration is whether events cross team and/or context boundaries, or not.\nIn cases where encapsulation is a concern, consciously designed data contracts can be a great tool to shield external consumers of a change event stream from the implementation details of an application’s persistent data model and any changes to its schema. With the help of stream processing, for instance using Apache Flink, you can establish well-defined APIs for your data, resulting in more robust and reliable data pipelines. 
Flink SQL makes the creation of data contracts a matter of describing the shape of your data with a few lines of SQL, while Flink’s DataStream API can be used for implementing more advanced requirements such as expanding incoming partial change events into full events.\nWith those tools and corresponding processes in place, you don’t need to be concerned about accidentally exposing your internal data model and changes to the schema of the same breaking your change stream consumers.\n","id":99,"publicationdate":"Nov 21, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_cdca_quick_primer\"\u003eCDC—​A Quick Primer\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_does_cdc_break_encapsulation\"\u003eDoes CDC Break Encapsulation?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_entering_data_contracts\"\u003eEntering Data Contracts\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_implementation_approaches_for_data_contracts\"\u003eImplementation Approaches For Data Contracts\u003c/a\u003e\n\u003cul class=\"sectlevel2\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_outbox_pattern\"\u003eThe Outbox Pattern\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_stream_processing\"\u003eStream Processing\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_streaming_data_contractsbeyond_the_basics\"\u003eStreaming Data Contracts—​Beyond the Basics\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_handling_schema_changes\"\u003eHandling Schema Changes\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca 
href=\"https://www.decodable.co/blog/change-data-capture-breaks-encapsulation-does-it-though\"\u003eDecodable blog\u003c/a\u003e. All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eHaving worked on Debezium—​an open-source platform for Change Data Capture (CDC)—​for several years, one concern I’ve heard repeatedly is this: aren’t you breaking the encapsulation of your application when you expose change event feeds directly from your database?\nAfter all, CDC exposes your internal persistent data model to the outside world, which may have unintended consequences, e.g.\nin terms of data exposure but also when it comes to changes to the schema of your data, which may break downstream consumers.\u003c/p\u003e\n\u003c/div\u003e","tags":["cdc","debezium","flink","streaming"],"title":"\"Change Data Capture Breaks Encapsulation\". Does it, though?","uri":"https://www.morling.dev/blog/change-data-capture-breaks-encapsulation-does-it-though/"},{"content":" This question came up on the Data Engineering sub-reddit the other day: Can Debezium lose any events? I.e. can there be a situation where a record in a database get inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?\nI’ve already replied on Reddit itself, but I thought it’d warrant a slightly longer discussion here. To get the most important thing out of the way first: In general, Debezium by itself should never miss any event. If it does, that’s considered a blocker bug which the development team will address with highest priority. After all, Debezium’s semantics are at-least-once (i.e. duplicate events may occur, specifically after an unclean connector shut-down), not at-most-once.\nThat being said, it may happen that due to operational deficiencies portions of the database’s transaction log get discarded before Debezium gets a chance to capture them. 
This can happen when a Debezium connector isn’t running for an extended period of time, and the maximum transaction log retention time is reached.\nMost databases provide some sort of configuration parameter for controlling this behavior. In MySQL for instance, there is the binlog_expire_logs_seconds parameter for this purpose (which defaults to 2,592,000 seconds, i.e. 30 days). When you are using MySQL on Amazon RDS, the option to use is called binlog retention hours. For SQL Server, the retention time for CDC data can be configured using the stored procedure sys.sp_cdc_change_job().\nIn contrast, Postgres approaches this matter a bit differently: Replication slots keep track of how far consumers have consumed the write-ahead log (WAL). Consumers must actively acknowledge the latest WAL position (log sequence number, LSN) they have consumed. Only once an LSN has been acknowledged by all replication slots will the database discard older WAL segments. This means that, by default, even an extended connector downtime will not lead to event loss. This comes at a price though: the database holds on to all the unconsumed WAL segments, consuming more and more disk space until the connector gets restarted again.\nThe Insatiable Replication Slot Even when a replication slot is active, it can happen under specific circumstances that the slot’s consumer cannot acknowledge any LSNs, causing the database machine to run out of disk space eventually. You can learn more about the reasons, and ways for mitigating this issue, in this blog post. Luckily, the issue has recently been resolved in the Postgres JDBC driver, version 42.6.0.\nTherefore, a new configuration option was introduced in Postgres 13, max_slot_wal_keep_size, which defines the maximum amount of WAL (in megabytes, if no unit is given) which a replication slot may retain. If a slot causes retained WAL files to grow beyond the configured value, older segments will be removed. 
This means that, when configuring this option (the default value is -1, i.e. an indefinite WAL keep size), the behavior is the same as for instance with MySQL, and consumers will not be able to resume processing after falling off the log. By means of the always snapshot mode, you can start with a new complete initial snapshot in this case.\nIn general though, you should avoid this situation to begin with, and have observability tools in place which will trigger an alert when a Debezium connector isn’t running for a longer period of time, for instance by querying the Kafka Connect REST API. For Postgres, you also can track the retained WAL size of a replication slot using the pg_current_wal_lsn() and pg_wal_lsn_diff() functions, as I described in this blog post a while ago.\n","id":100,"publicationdate":"Nov 14, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis question came up on the Data Engineering sub-reddit the other day:\n\u003ca href=\"https://old.reddit.com/r/dataengineering/comments/17ttw5e/can_debezium_loose_updates/\"\u003eCan Debezium lose any events\u003c/a\u003e?\nI.e. can there be a situation where a record in a database get inserted, updated, or deleted, but Debezium fails to capture that event from the transaction log and propagate it to downstream consumers?\u003c/p\u003e\n\u003c/div\u003e","tags":["debezium","cdc","reliability"],"title":"Can Debezium Lose Events?","uri":"https://www.morling.dev/blog/can-debezium-lose-events/"},{"content":" Table of Contents What is CDC? CDC Tools Analytics Data Platforms Application Caches Full-Text Search Audit Logs Continuous Queries Microservices Data Exchange Monolith-to-Microservices Migration Summary This post originally appeared on the Decodable blog. All rights reserved.\nChange Data Capture (CDC) is a powerful tool in data engineering and has seen a tremendous uptake in organizations of all kinds over the last few years. 
This is because it enables the tight integration of transactional databases into many other systems in your business at a very low latency.\nWhat is CDC? CDC responds to changes—​such as inserts, updates, and deletions—​as they are made in a transactional database and sends those changes in real time to another system for ingesting and processing. While there are multiple possible ways for implementing a CDC solution, the most powerful approach is log-based CDC, which retrieves change events from the transaction log of a database.\nFigure 1. Log-based change data capture Compared to other styles of CDC, such as periodic polling for changed records, log-based CDC has a number of advantages, including:\nvery low latency while also being resource-efficient\nguaranteed to never miss any changes, including deletes\nno impact on your source data model\nCDC Tools There are a number of commercial and open-source software (OSS) CDC tools, but the most widely used OSS CDC platform is Debezium, which provides connectors for many popular databases such as MySQL, Postgres, SQL Server, Oracle, Cassandra, Google Cloud Spanner, and others. Several database vendors have implemented their own CDC connectors based on Debezium, for instance Yugabyte and ScyllaDB.\nDebezium’s record format, used to model change events, describes the old and new states of data records and includes metadata like the source database and table name, as well as the position in the transaction log file. 
This format has become a de-facto standard for change event data, supported by numerous projects and vendors in the data streaming space.\nCDC platforms, like Debezium, are frequently used alongside popular data engineering tools such as Apache Kafka for data streaming and Apache Flink, which natively supports Debezium event formats for stateful stream processing, including filtering and joining change event streams from various sources.\nIn this article, I’ll present an overview of seven common use cases for change data capture—Let’s go!\nAnalytics Data Platforms One of the most widely adopted uses for CDC is getting data from transactional databases into systems that more efficiently support analytical processing. Whereas transactional database systems such as MySQL or Postgres are optimized for efficiently processing transactions on comparatively small numbers of records, OLAP systems are designed to handle queries that read potentially millions of rows of data. CDC provides a way to keep OLAP data stores up to date with the freshest data possible from OLTP systems, with end-to-end latencies in the range of mere seconds.\nCDC is commonly used as a component in data pipelines that propagate data changes into cloud data warehouses (e.g. Snowflake, via Snowpipe, or Google BigQuery) and data lakes, enabling data science, general reporting, and ad-hoc querying use cases. Low-latency ingestion into real-time analytics stores on the other hand (e.g. Apache Pinot, Apache Druid or Clickhouse) allows you to implement use cases like in-app analytics or real-time dashboards. For use cases involving frequent updates to existing records, such as mutable data, stores optimized for upsert semantics (e.g., Pinot or Rockset) often offer tailored support for the Debezium event format.\nApplication Caches A common pattern used for improving application performance is the introduction of a local cache of read-only data. 
One of the main challenges with this is keeping the cache fresh and ensuring that the application does not read stale data. CDC can be an excellent fit here. CDC responds to changes to the database in real time and can feed those changes into a cache updater component that ensures that the local caches provide an up-to-date view.\nBesides dedicated caching solutions such as Redis, Infinispan, or Hazelcast, embedded SQLite databases are an interesting option for implementing application-side caches, as they not only allow for simple key-based look-ups but support full query flexibility via SQL. If you would like to learn more about this kind of architecture, take a look at my talk \u0026#34;Keep Your Cache Always Fresh With Debezium\u0026#34;, which explores one implementation of this in depth ( video recording, slides).\nFigure 2. Application-side caches in a distributed service, kept in sync with the primary DB via CDC Instead of caching raw data as-is, it can be useful to create denormalized data views of your data which are then stored in the cache. For instance, Apache Flink could be used to join the change event streams from two tables in an RDBMS and create one single nested data structure from that. That way, data can be retrieved from the cache very efficiently, without performing any costly read-time joins.\nFull-Text Search Similar to the way in which transactional database systems are not well suited to directly supporting analytical workloads, full-text search also benefits greatly from data stores that are specifically designed for that purpose, such as Elasticsearch or OpenSearch. Leveraging search-specific functionality such as stemming, normalization, stop words, synonyms, and abbreviations ensures rapid delivery of the most relevant results for broad or even \u0026#34;fuzzy\u0026#34; searches.\nJust like with the cache example above, CDC fits perfectly in your architecture for keeping your full-text search data store up to date. 
CDC responds in real time to changes made to the database and sends the changed data events to a tool such as Apache Flink which loads them into your search system.\nAnother consideration is the potential need to create nested document structures which are used in document stores like Elasticsearch. I recently published an episode of \u0026#34;Data Streaming Quick Tips\u0026#34;, exploring array aggregation and how to join and nest the data of one aggregate instead of trying to force a 1:1 mapping between RDBMS source tables and search indexes. If you prefer a text-based version, you can find a transcript of this video here.\nAudit Logs In enterprise applications, retaining an audit log of your data is a common requirement, keeping track of when and how data records changed. CDC offers an effective solution since extracting data changes from a database transaction log provides that information: A change event stream, with events for all the inserts, updates, and deletes executed for a table, could be considered a simple form of an audit log.\nHowever, this approach lacks contextual metadata, such as user information, client details, use case identifiers, etc. To address this limitation, applications can provide metadata, either via a dedicated metadata table or in the form of a logical decoding message emitted at the start of each transaction. Stream processing with Apache Flink can then be used to incorporate the missing context into the records.\nFlink’s DataStream API can be used for enriching all the change events from one transaction with the applicable metadata. As all change events emitted by Debezium contain the ID of the transaction they originate from, correlating the events of one transaction isn’t complicated. You can find a basic implementation of this in the Decodable examples repository.\nFigure 3. 
Enriching CDC events with transactional metadata using stream processing Within the same Flink job, you now could add a sink connector and for instance write the enriched events into a Kafka topic. Alternatively, depending on your business requirements, the enriched change events could also be written as an audit log to an object store such as S3, or a queryable analytics data store.\nContinuous Queries Not all queries need to be served from static stores like Pinot or Snowflake. Apache Flink provides dynamic tables; a dynamic table is one that is \u0026#34;continuously updated and can be queried like a regular, static table\u0026#34;. However, in contrast to a static table query, a query on a dynamic table runs continuously and produces a table that is continuously updated based on changes to the input table, with results stored in a new dynamic table.\nCDC streams can be used as the source that drives a continuous query, representing a form of incrementally updated materialized view which always yields the latest results as the underlying data changes.\nFigure 4. Continuous query for incrementally computing a join between two CDC streams This allows you to create a data pipeline that ingests CDC data into Flink and runs continuous queries to perform aggregations, filters, or pattern recognition on the CDC data stream; the results can then be used to enable real-time analytics and decision-making. For instance, you may consider pushing any updates directly to a dashboard in connected web browsers using technologies such as server-sent events or WebSockets, completely avoiding the need for any intermediary query layer.\nMicroservices Data Exchange As part of their business logic, microservices often not only have to update their own local data store, but also need to notify other services about data changes that have happened. 
The outbox pattern —which is implemented using CDC—is an approach for letting services execute these two tasks in a safe and consistent manner, avoiding the pitfalls of unsafe dual writes.\nFigure 5. Microservices data exchange with the outbox pattern CDC is a great fit for responding to new entries in the outbox table and streaming them to a messaging service such as Apache Kafka for propagation to other services. By only modifying a single resource—the source microservice’s own database—it avoids any potential inconsistencies of altering multiple resources at the same time which don’t share one common transactional context.\nIf you are on Postgres, then you don’t even need to have a bespoke outbox table for implementing the outbox pattern. With the help of logical decoding messages, you can insert your outbox events exclusively to the transaction log, from where they can be retrieved and propagated with CDC tools such as Debezium.\nMonolith-to-Microservices Migration Oftentimes, you don’t start building applications from scratch, but there is an existing application landscape which needs to be evolved and expanded. In this context, it may be necessary to migrate from an existing monolithic architecture to a set of loosely coupled microservices, splitting up the existing monolith. In order to avoid the risks of a big bang migration, it is recommended to take a gradual approach, extracting one service at a time. This approach is also named the \u0026#34;strangler fig pattern\u0026#34;, as the new services grow around the old application, \u0026#34;strangling\u0026#34; it over time.\nWhen doing so, the old monolith and the new service(s) will co-exist for some time. For example, you may start with extracting just a read view of some data (e.g. a customer’s order history) to a new microservice, while the monolithic application continues to handle data writes. You can then use CDC to propagate data changes from the monolith to the service providing the read view. 
I discussed the strangler pattern with CDC in a joint talk with Hans-Peter Grahsl, where we also explored advanced aspects such as avoiding infinite replication loops when doing bi-directional CDC or using stream processing for establishing an anti-corruption layer between legacy data models and newly created microservices.\nSummary And that’s it—seven use cases for CDC. It is a powerful enabler for your data, allowing you to react to any data changes in real time. I have worked with CDC for several years now, and I’m still surprised to learn about new use cases regularly. Case in point: just recently we started to use CDC for capturing changes in our control plane database and materializing corresponding Kubernetes resources.\nWhen ingesting raw change streams is not enough, stateful stream processing—e.g. with Apache Flink—allows you to transform, filter, join and aggregate your data. Flink’s rich ecosystem of connectors also provides you with connectivity with a wide range of data sinks, allowing you to implement cohesive end-to-end data flows on one unified platform.\nAre you using CDC already? For any of the use cases discussed above, or maybe others not mentioned here? I’d love to hear from you about your experiences with CDC—just reach out to me on Twitter, LinkedIn, or the Decodable Slack community.\nWant to try out CDC for yourself? 
Decodable has managed CDC— try it now and get started with processing your change streams today.\n","id":101,"publicationdate":"Nov 2, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_what_is_cdc\"\u003eWhat is CDC?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_cdc_tools\"\u003eCDC Tools\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_analytics_data_platforms\"\u003eAnalytics Data Platforms\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_application_caches\"\u003eApplication Caches\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_full_text_search\"\u003eFull-Text Search\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_audit_logs\"\u003eAudit Logs\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_continuous_queries\"\u003eContinuous Queries\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_microservices_data_exchange\"\u003eMicroservices Data Exchange\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_monolith_to_microservices_migration\"\u003eMonolith-to-Microservices Migration\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/cdc-use-cases\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eChange Data Capture (CDC) is a powerful tool in data engineering and has seen a tremendous uptake in organizations of all kinds over the last few years.\nThis is because it enables the tight integration of transactional databases into many other systems in your business at a very low latency.\u003c/p\u003e\n\u003c/div\u003e","tags":["cdc","debezium","streaming","flink"],"title":"CDC Use Cases: 7 Ways to Put CDC to Work","uri":"https://www.morling.dev/blog/cdc-use-cases/"},{"content":" The other day at work, we had a situation where we suspected a thread leak in one particular service, i.e. code which continuously starts new threads, without taking care of ever stopping them again. Each thread requires a bit of memory for its stack space, so starting an unbounded number of threads can be considered as a form of memory leak, causing your application to run out of memory eventually. In addition, the more threads there are, the more overhead the operating system incurs for scheduling them, until the scheduler itself will consume most of the available CPU resources. Thus it’s vital to detect and fix this kind of problem early on.\nThe usual starting point for analyzing a suspected thread leak is taking a thread dump, for instance using the jstack CLI tool or via JDK Mission Control; if there’s an unexpected large number of threads (oftentimes with similar or even identical names), then it’s very likely that something is wrong indeed. But a thread dump by itself is only a snapshot of the thread state at a given time, i.e. it doesn’t tell you how the thread count is changing over time (perhaps there are many threads which are started but also stopped again?), and it also doesn’t provide you with information about the cause, i.e. which part of your application is starting all those threads. 
Does it happen in your own code base, or within some 3rd party dependency? While the thread names and stacks in the dump can give you some idea, that information isn’t necessarily enough for a conclusive root cause analysis.\nLuckily, Java’s built-in event recorder and performance analysis tool, JDK Flight Recorder, exposes all the data you need to identify thread leaks and their cause. So let’s take a look at the details, bidding farewell to those pesky thread leaks once and for all!\nThe first JFR event type to look at is jdk.JavaThreadStatistics: recorded every second by default, it keeps track of active, accumulated, and peak thread counts. Here is a JFR recording from a simple thread leak demo application I’ve created (newest events at the top):\nThe number of active threads is continuously increasing, never going back down again, so this is pretty clearly a thread leak. Now let’s figure out where exactly all those threads are coming from.\nFor this, two other JFR event types come in handy: jdk.ThreadStart and jdk.ThreadEnd. The former captures all the relevant information when a thread is started: time stamp, name of the new thread and the parent thread, and the stack trace of the parent thread when starting the child thread. The latter event type will be recorded when a thread finishes. If we find many thread start events originating at the same code location without a corresponding end event (correlated via the thread id contained in the events), this is very likely a source of a thread leak.\nThis sort of event analysis is a perfect use case for JFR Analytics. This tool allows you to analyze JFR recordings using standard SQL (leveraging Apache Calcite under the hood). In JFR Analytics, each event type is represented by its own \u0026#34;table\u0026#34;. 
Finding thread start events without matching end events is as simple as running a LEFT JOIN on the two event types and keeping only those start events which don’t have a join partner.\nSo let’s load the file into the SQLLine command line client (see the README of JFR Analytics for instructions on building and launching this tool):\n!connect jdbc:calcite:schemaFactory=org.moditect.jfranalytics.JfrSchemaFactory;schema.file=thread_leak_recording.jfr dummy dummy !outputformat vertical Run the following SQL query for finding thread start events without corresponding thread end events:\nSELECT ts.\u0026#34;startTime\u0026#34;, ts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;parentThread\u0026#34;, ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;newThread\u0026#34;, TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 20) AS \u0026#34;stackTrace\u0026#34; FROM \u0026#34;jdk.ThreadStart\u0026#34; ts LEFT JOIN \u0026#34;jdk.ThreadEnd\u0026#34; te ON ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = te.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; WHERE te.\u0026#34;startTime\u0026#34; IS NULL; Note how the parentThread and eventThread columns are of a complex SQL type, allowing you to refer to thread properties such as javaName or javaThreadId using dot notation. 
In that simple example recording, there’s one stack trace which dominates the result set, so looking at any of the events reveals the culprit:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 startTime 2023-02-26 11:36:04.284 javaName executor-thread-0 javaName pool-1060-thread-1 stackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Callable):145 java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791 org.acme.GreetingResource.hello():18 null null null null jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Object, Object[]):104 java.lang.reflect.Method.invoke(Object, Object[]):578 org.jboss.resteasy.core.MethodInjectorImpl.invoke(HttpRequest, HttpResponse, Object, Object[]):170 org.jboss.resteasy.core.MethodInjectorImpl.invoke(HttpRequest, HttpResponse, Object):130 org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(HttpRequest, HttpResponse, Object):660 org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(HttpRequest, HttpResponse, Object):524 org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(HttpRequest, HttpResponse, Object):474 null org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter():364 The call for creating a new thread apparently is initiated by the GreetingResource::hello() method by submitting a Callable to an executor service. 
And sure enough, this is what it looks like:\n@GET @Produces(MediaType.TEXT_PLAIN) public String hello() { ExecutorService executor = Executors.newSingleThreadExecutor(); executor.submit(() -\u0026gt; { while (true) { Thread.sleep(1000L); } }); return \u0026#34;Hello World\u0026#34;; } If things are not as clear-cut as in that contrived example, it can be useful to truncate stack traces to a reasonable line count (e.g. it should be safe to assume that the user code starting a thread is never further away than ten frames from the actual thread start call) and group by that. JFR Analytics provides the built-in function TRUNCATE_STACKTRACE for this purpose:\nSELECT TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 10) AS \u0026#34;stackTrace\u0026#34;, COUNT(1) AS \u0026#34;threadCount\u0026#34; FROM \u0026#34;jdk.ThreadStart\u0026#34; ts LEFT JOIN \u0026#34;jdk.ThreadEnd\u0026#34; te ON ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = te.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; WHERE te.\u0026#34;startTime\u0026#34; IS NULL GROUP BY TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 10) ORDER BY \u0026#34;threadCount\u0026#34; DESC; This points at the problematic stack traces and code locations in a very pronounced way (output slightly adjusted for better readability):\nstackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Callable):145 java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791 org.acme.GreetingResource.hello():18 null null null threadCount 414 --- stackTrace java.util.Timer.\u0026lt;init\u0026gt;(String, boolean):188 
jdk.jfr.internal.PlatformRecorder.lambda$createTimer$0(List):101 null java.lang.Thread.run():1589 threadCount 1 Sometimes you may encounter a situation where new threads are started from within other threads in a 3rd party dependency, rather than directly from threads within your own code base. In that case the stack traces of the thread start events may not tell you enough about the root cause of the problem, i.e. where those other \u0026#34;intermediary\u0026#34; threads are coming from, and how they relate to your own code.\nTo dig into the details here, you can leverage the fact that each jdk.ThreadStart event contains information about the parent thread which started a new thread. So you can join the jdk.ThreadStart table to itself on the parent thread’s id, fetching also the stack traces of the code starting those parent threads:\nSELECT ts.\u0026#34;startTime\u0026#34;, pts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;grandParentThread\u0026#34;, ts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;parentThread\u0026#34;, ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaName\u0026#34; AS \u0026#34;newThread\u0026#34;, TRUNCATE_STACKTRACE(pts.\u0026#34;stackTrace\u0026#34;, 15) AS \u0026#34;parentStackTrace\u0026#34;, TRUNCATE_STACKTRACE(ts.\u0026#34;stackTrace\u0026#34;, 15) AS \u0026#34;stackTrace\u0026#34; FROM \u0026#34;jdk.ThreadStart\u0026#34; ts LEFT JOIN \u0026#34;jdk.ThreadEnd\u0026#34; te ON ts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = te.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; JOIN \u0026#34;jdk.ThreadStart\u0026#34; pts ON ts.\u0026#34;parentThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; = pts.\u0026#34;eventThread\u0026#34;.\u0026#34;javaThreadId\u0026#34; WHERE te.\u0026#34;startTime\u0026#34; IS NULL; Here, stackTrace is the trace of a thread (named \u0026#34;pool-728-thread-1\u0026#34;) of an external library, \u0026#34;greeting provider\u0026#34;, which starts another (leaking) 
thread (named \u0026#34;pool-729-thread-1\u0026#34;), and parentStackTrace points to the code in our own application (thread name \u0026#34;executor-thread-0\u0026#34;) which started that first thread:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 startTime 2023-02-28 09:15:24.493 grandParentThread executor-thread-0 parentThread pool-728-thread-1 newThread pool-729-thread-1 parentStackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Runnable):123 java.util.concurrent.Executors$DelegatedExecutorService.submit(Runnable):786 com.example.greeting.GreetingService.greet():20 com.example.greeting.GreetingService_ClientProxy.greet() org.acme.GreetingResource.hello():20 null null null null jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Object, Object[]):104 java.lang.reflect.Method.invoke(Object, Object[]):578 --- stackTrace java.lang.System$2.start(Thread, ThreadContainer):2528 jdk.internal.vm.SharedThreadContainer.start(Thread):160 java.util.concurrent.ThreadPoolExecutor.addWorker(Runnable, boolean):953 java.util.concurrent.ThreadPoolExecutor.execute(Runnable):1364 java.util.concurrent.AbstractExecutorService.submit(Callable):145 java.util.concurrent.Executors$DelegatedExecutorService.submit(Callable):791 com.example.greeting.GreetingProvider.createGreeting():13 com.example.greeting.GreetingProvider_ClientProxy.createGreeting() com.example.greeting.GreetingService.lambda$greet$0(AtomicReference):21 null java.util.concurrent.Executors$RunnableAdapter.call():577 java.util.concurrent.FutureTask.run():317 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker):1144 java.util.concurrent.ThreadPoolExecutor$Worker.run():642 
java.lang.Thread.run():1589 If the thread hierarchy is even deeper, you could continue down that path and keep joining more and more parent threads until you’ve arrived at the application’s main thread. I was hoping to leverage recursive query support in Calcite for this purpose, but as it turned out, support for this only exists in the Calcite RelBuilder API at the moment, while the RECURSIVE keyword is not supported for SQL queries yet.\nEquipped with JDK Flight Recorder, JDK Mission Control, and JFR Analytics, identifying and fixing thread leaks in your Java application becomes a relatively simple task. The jdk.JavaThreadStatistics, jdk.ThreadStart, and jdk.ThreadEnd event types are enabled in the default JFR profile, which is meant for permanent usage in production. That is, you can keep a size-capped continuous recording running all the time, dump it into a file whenever needed, and then start the analysis process as described above.\nTaking things a step further, you could also set up monitoring and alerting on the number of active threads, e.g. by exposing the jdk.JavaThreadStatistics event via a remote JFR event recording stream, allowing you to react in real time whenever the active thread count reaches an unexpectedly high level.\n","id":102,"publicationdate":"Feb 28, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day at work, we had a situation where we suspected a thread leak in one particular service,\ni.e. 
code which continuously starts new threads, without taking care of ever stopping them again.\nEach thread requires a bit of memory for its stack space,\nso starting an unbounded number of threads can be considered as a form of memory leak, causing your application to run out of memory eventually.\nIn addition, the more threads there are, the more overhead the operating system incurs for scheduling them,\nuntil the scheduler itself will consume most of the available CPU resources.\nThus it’s vital to detect and fix this kind of problem early on.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","troubleshooting","performance"],"title":"Finding Java Thread Leaks With JDK Flight Recorder and a Bit Of SQL","uri":"https://www.morling.dev/blog/finding-java-thread-leaks-with-jdk-flight-recorder-and-bit-of-sql/"},{"content":"","id":103,"publicationdate":"Feb 28, 2023","section":"tags","summary":"","tags":null,"title":"troubleshooting","uri":"https://www.morling.dev/tags/troubleshooting/"},{"content":" Table of Contents 📦 Distribution 📆 Version 🔧 Installation 💡 Your First Java Program 📚 Learning the Language 👷‍♀️ Build Tool 📝 Editor 🧱 Libraries 🐢 Application Framework 🐳 Container Base Image 🔭 Next Steps 27 years of age, and alive and kicking — The Java platform regularly comes out amongst the top contenders in rankings like the TIOBE index. In my opinion, rightly so. The language is very actively maintained and constantly improved; its underlying runtime, the Java Virtual Machine (JVM), is one of, if not the most, advanced runtime environments for managed programming languages.\nThere is a massive eco-system of Java libraries which make it a great tool for a large number of use cases, ranging from command-line and desktop applications, over web apps and backend web services, to datastores and stream processing platforms. 
With upcoming features like support for vectorized computations (SIMD), light-weight virtual threads, improved integration with native code, value objects and user-defined primitives, and others, Java is becoming an excellent tool for solving a larger number of software development tasks than ever before.\nThe immense breadth of Java and its ecosystem, having grown and matured over nearly three decades, can also be challenging though for folks just starting their careers as a Java developer. Which Java version should you use? How to install it? Which build tool and IDE are the right ones? For all these, and many other questions, there are typically a number of options, which can easily overwhelm you if you are new to Java. As the platform has evolved, tools have come and gone, things which were hugely popular years ago have fallen into obsolescence since then. As related information can still be found on the internet, it can be hard to identify what’s still relevant and what not.\nThe idea for this blog post is to provide an opinionated guide for folks getting started with Java development in 2023, helping you with your very first steps with that amazing platform. Note I’m not saying that the things I’m going to recommend are the best ones for each and every situation. The focus is on providing a good getting-started experience. Some of the recommended tools or approaches may make less sense to use as you get more experienced and other choices might be better suited for you then, based on the specific situation and its requirements.\nAlso, very importantly, there is a notion of personal taste and preference to these things, those are my recommendations, and those of others might look different, which is perfectly fine.\nJava — What is What? As you make your first steps with Java, it might be confusing to understand what even Java is. Indeed \u0026#34;Java\u0026#34; refers to several things, which can even trip up more experienced folks. 
Here’s a list of key terms and concepts:\nThe Java programming language A general-purpose statically-typed object-oriented programming language (with some functional flavors); it is compiled into portable byte code which can be executed on a wide range of platforms\nThe Java platform \u0026#34;A suite of programs that facilitate developing and running programs written in the Java programming language\u0026#34;, with key elements being the Java compiler (javac), the Java virtual machine, and the Java standard class library. Focus of this post is the Java Standard Edition (SE), other platforms like Java Micro Edition (ME), and Jakarta Enterprise Edition (EE) are not discussed here. A large number of languages other than Java (the language) itself can run on Java SE (the platform), for instance Kotlin, Groovy, and Scala; they are also out of scope for this article, though\nThe Java virtual machine (JVM) A virtual machine for executing Java programs (or more precisely, byte code), e.g. taking care of tasks like loading byte code and verifying its correctness, compiling it into platform-specific machine code using a just-in-time compiler (JIT), automated memory management via garbage collection, ensuring isolation between different components, providing runtime diagnostics, etc.; multiple JVM implementations exist, including HotSpot and OpenJ9\nThe Java Development Kit (JDK) A distribution of tools for developing and running Java applications\nOpenJDK An open-source implementation of Java SE and related projects; also the name of the open-source community creating this implementation\nThe Java Community Process (JCP) A mechanism for developing specifications in the Java space, including those defining the different Java versions\n📦 Distribution The Java platform is maintained by the OpenJDK open-source project. Similar to Linux, multiple vendors provide binary distributions for this project, including Amazon, the Eclipse Foundation, Microsoft, Oracle, or Red Hat. 
These distributions differ in aspects like availability and duration of commercial support, supported platforms, extent of testing, certain features like available garbage collectors, potentially bug fixes, and others. So which one should you use?\nTo begin with, the differences won’t matter too much, and I suggest choosing Eclipse Temurin. It is backed by Adoptium, a working group of companies like Google, Red Hat, Microsoft, Alibaba, Azul, and others. You can download and use it for free, it contains everything you’ll need, passes the Technology Compatibility Kit (TCK) of the JDK, and if needed, there is commercial support provided by different vendors.\n📆 Version A new version of Java is released every six months, with the current one at the time of writing this being Java 19. Specific releases are long-term support (LTS) releases, for which vendors provide maintenance for many years. The current LTS release is Java 17 and I recommend getting started with this one.\nWhile newer non-LTS releases may add useful new features, finding a sustainable update strategy can be a bit tricky, and many of the new features are preview or incubating features, meaning that you would not use them in production code anyway. I recommend diving into those later on, once you’ve gained some experience with Java and its ecosystem.\nIf specific 3rd-party libraries don’t work seamlessly with Java 17 yet, you should use the previous LTS (Java 11). Don’t use non-LTS releases apart from the current one, as they are mostly unmaintained, i.e. you may open yourself to security issues and other bugs which won’t get fixed. Also don’t use Java 8 (alternatively named 1.8), which is the LTS before 11, as it’s really ancient by today’s standards.\n🔧 Installation There are different ways of installing your chosen Java distribution. Usually, there’ll be a distribution package which you can download from the vendor’s website. 
Alternatively, package managers of the operating system allow you to install Java too.\nFor a simplified getting started experience, my recommendation is to take a look at SDKMan. This is a tool which allows you to install software development kits (SDKs) such as Java’s JDK. You can also update your installed SDK versions and easily switch between multiple versions.\nIf you have SDKMan installed, obtaining the current Eclipse Temurin build of Java 17 is as simple as running the following in your shell:\n$ sdk install java 17.0.5-tem # Install $ sdk use java 17.0.5-tem # Activate $ java --version # Verify version openjdk 17.0.5 2022-10-18 ... Installation on Windows SDKMan is implemented in bash, so if you are on Windows, you’ll need to install either the Windows Subsystem for Linux (WSL) or Cygwin before you can use SDKMan. I’d recommend having either in any case, but if that’s not an option, you may install Java using the winget package manager or by downloading your distribution directly from its vendor’s website.\n💡 Your First Java Program Having installed Java, it’s time to write your first Java program. Java is first and foremost an object-oriented language, hence everything in a Java program is defined in the form of classes, which have fields (representing their state) and methods (the behavior operating on that state). The canonical \u0026#34;Hello World\u0026#34; example in Java looks like this:\npublic class HelloWorld { (1) public static void main(String... args) { (2) System.out.println(\u0026#34;Hello world!\u0026#34;); (3) } } 1 The class HelloWorld must be specified in a source file named HelloWorld.java 2 The main() method is the entry point into a Java program 3 The println() method prints the given text to standard out Java source code is compiled into class files which then are loaded into the JVM and executed. 
Normally, this is done in two steps: first running the compiler javac, then executing the program using the java binary. For quick testing and exploring, both steps can be combined, so you can execute your \u0026#34;Hello World\u0026#34; program like this:\n$ java HelloWorld.java Hello world! For exploring the language in a quick and iterative mode, Java provides jshell, an interactive Read-Evaluate-Print Loop (REPL). You can use it for running expressions and statements without defining a surrounding method or class, simplifying \u0026#34;Hello World\u0026#34; quite a bit:\n$ jshell jshell\u0026gt; System.out.println(\u0026#34;Hello World\u0026#34;); Hello World Similar to jshell, but quite a bit fancier, is jbang, which for instance allows you to easily pull 3rd party libraries into your single source file Java programs.\n📚 Learning the Language Providing an introduction to all the features of the Java programming language is beyond the scope of this blog post. To truly learn the language and all its details, my recommendation would be to get a good book, grab a coffee (or two, or three, …​) and work through its chapters, in order of your personal interests. A popular choice for getting started with Java is \u0026#34;Head First Java, 3rd Edition\u0026#34; by Kathy Sierra, Bert Bates, Trisha Gee, nicely complemented by The Well-Grounded Java Developer, 2nd Edition, by Benjamin Evans, Jason Clark, and Martijn Verburg. A must-read for honing your Java skills is \u0026#34;Effective Java, 3rd Edition\u0026#34;, by Joshua Bloch. 
While this has been updated for Java 9 the last time, its contents are pretty much timeless and still apply to current Java versions.\nIf you don’t want to commit to buying a book just yet, check out the \u0026#34;Learn Java\u0026#34; section on dev.java, which has tons of material describing the Java language, key parts of the class library, the JVM and its most important tools, and more in great detail.\nThe authoritative resource on the Java language is the Java Language Specification, or JLS for short. The specification is written in a very concise and well understandable way, and I highly recommend you to take a look if you’d like to understand how specific details of the language exactly work. That being said, when you’re just about to get started with learning Java, you’ll be better off by studying the resources mentioned above.\nIf certifications are your thing, you might consider learning for and taking the exam for the \u0026#34;Oracle Certified Professional: Java SE 17 Developer\u0026#34; one. I’d only recommend doing so after having worked with Java at least for a year or so, as the exam actually is quite involved. You’ll certainly learn a lot about Java, including all kinds of corner cases and odd details; not everything will necessarily translate into your day-to-day work as a developer, though. So you should consciously decide whether you want to spend the time preparing for the certification or not.\n👷‍♀️ Build Tool Once you go beyond the basics of manually compiling and running a set of Java classes, you’ll need a build tool. It will not only help you with compiling your code, but also with managing dependencies (i.e. 3rd party libraries you are using), testing your application, assembling the output artifacts (e.g. a JAR file with your program), and much more. There are plug-ins for finding common bug patterns, auto-formatting your code, etc. 
Commonly used build tool options for Java include Apache Maven, Gradle, and Bazel.\nMy recommendation is to stick with Maven for the beginning; it’s the most widely used one, and in my opinion the easiest to learn. Installing it is as simple as running sdk install maven with SDKMan. While it defines a rather rigid structure for your project, that also frees you from having to think about many aspects, which is great in particular when getting started. Maven has support for archetypes, templates which you can use to quickly bootstrap new projects. For instance you can use the oss-quickstart archetype which I have built for creating new projects with a reasonable set of pre-configured plug-ins like so:\n1 2 3 4 5 6 7 8 mvn archetype:generate -B \\ -DarchetypeGroupId=org.moditect.ossquickstart \\ -DarchetypeArtifactId=oss-quickstart-simple-archetype \\ -DarchetypeVersion=1.0.0.Alpha1 \\ -DgroupId=com.example.demos \\ -DartifactId=fancy-project \\ -Dversion=1.0.0-SNAPSHOT \\ -DmoduleName=com.example.fancy A lesser known yet super-useful companion to Maven is the Maven Daemon, which helps you to drastically speed up your builds by keeping a daemon process running in the background, avoiding the cost of repeatedly launching and initializing the build environment. You can install it via SDKMan by running sdk install mvnd.\nAlternative build tools like Gradle tend to provide more flexibility and interesting features like \u0026#34;compilation avoidance\u0026#34; (rebuilding only affected parts of large code bases after a change) or distributed build caches (increasing developer productivity in particular in large projects), but I’d wait with looking at those until you’ve gathered some experience with Java itself.\n📝 Editor Many Java developers love to fight over their favorite build tools, and it’s the same with editors and full-blown integrated development environments (IDEs). 
So whatever I’m going to say here, it’s guaranteed a significant number of people won’t like it ;)\nMy suggestion is to start with VSCode. It’s a rather light-weight editor, which comes with excellent Java developer support, e.g. for testing and debugging your code. It integrates very well with Maven-based projects and has a rich eco-system of plug-ins you can tap into.\nAs your needs grow, you’ll probably look for an IDE which comes with even more advanced functionality, e.g. when it comes to refactoring your code. While I’m personally a happy user of the Eclipse IDE, most folks tend to use IntelliJ IDEA these days and it’s thus what I’d recommend you to look into too. It comes with a feature-rich free community edition which will help you a lot with the day-to-day tasks you’ll encounter as a Java developer. Make sure to spend a few hours learning the most important keyboard short-cuts, it will save you lots of time later on.\n🧱 Libraries The ecosystem of 3rd party libraries is one of Java’s absolute super-powers: there is a ready-made library or framework available for pretty much every task you might think of, most of the times available as open-source.\nPerhaps counter-intuitively, my recommendation here is to try and be conservative with pulling in libraries into your project, and instead work with what’s available in Java’s standard class library (which is huge and covers a wide range of functionality already). Next, check out what your chosen application framework (if you use one, see below) offers either itself or provides integrations for.\nAdding a dependency to an external library should always be a conscious decision, as you might easily run into version conflicts between transitive dependencies (i.e. dependencies of dependencies) in different versions, more dependencies increase the complexity of your application (for instance, you must keep them all up-to-date), they may increase the attack surface of your application, etc. 
Sometimes, you might be better off by implementing something yourself, or maybe copy a bit of code from a 3rd party library into your own codebase, provided the license of that library allows for that.\nThat said, some popular libraries you will encounter in many projects include JUnit (for unit testing), slf4j (logging), Jackson (JSON handling), Hibernate (object-relational persistence, domain model validation, etc.), Testcontainers (integration testing with Docker), and ArchUnit (enforcing software architecture rules). The \u0026#34;awesome-java\u0026#34; list is a great starting point for diving into the ecosystem of Java libraries.\nMost open-source dependencies are available via the Maven Central repository; All the build tools integrate with it, not only Maven itself, but also Gradle and all the others. The MVN Repository site is a good starting point for finding dependencies and their latest versions. If you want to distribute libraries within your own organization, you can do so by self-running repository servers like Nexus or Artifactory, or use managed cloud services such as AWS CodeArtifact.\n🐢 Application Framework Most Java enterprise applications are built on top of an application framework which provides support for structuring your code via dependency injection, seamlessly integrates with a curated set of 3rd party libraries in compatible versions, helps with configuring and testing your application, and much more.\nAgain, there’s plenty of options in Java here, such as Spring Boot, Quarkus, Jakarta EE, Micronaut, Helidon, and more. My personal recommendation here is to use Quarkus (it’s the one I’m most familiar with, having worked for Red Hat before, who are the company behind this framework), or alternatively Spring Boot.\nBoth are widely popular, integrate with a wide range of technologies (e.g. 
web frameworks and databases of all kinds), come with excellent developer tooling, and are backed by very active open-source communities.\n🐳 Container Base Image In particular when you are going to work on an enterprise application, chances are that you’ll publish your application in the form of a container image, so people can run it on Docker or Kubernetes.\nSticking to the recommendation on using Eclipse Temurin as your Java distribution, I suggest using the Temurin image as the base for your application images, e.g. eclipse-temurin:17 for Java 17. Just make sure to keep your image up to date, so you and your users benefit from updates to the base image.\nOne base image you should avoid is the OpenJDK one, which is officially deprecated and not recommended for production usage.\n🔭 Next Steps The points above can hopefully help you to embark on a successful journey with the Java platform, but they are only a starting point. Depending on your specific needs and requirements, here are a number of possible next topics to explore and learn about:\nExploring the tools which come with the JDK, for instance javadoc (for generating API documentation), jcmd (for sending diagnostic commands to a running Java application), or jpackage (for packaging self-contained Java applications)\nBuilding native binaries using GraalVM, allowing for a fast start-up and low memory consumption; very useful for instance for building command-line tools or AWS Lambda functions\nAnalyzing the performance and runtime characteristics of your application using JDK Flight Recorder and JDK Mission Control\nSetting up continuous integration (CI) workflows for automatically building and testing your application with GitHub Actions (the aforementioned Maven oss-quickstart archetype will generate a basic template for that automatically)\nPublishing open-source libraries to Maven Central with JReleaser\nFinally, a few resources which should help you to stay up-to-date with everything Java and learn what’s 
going on in the community include the Java News on dev.java, inside.java (\u0026#34;news and views from members of the Java team at Oracle\u0026#34;) the JEP Search (for searching and filtering Java enhancement proposals, i.e. changes to the language and the platform) and Foojay (Friends of OpenJDK).\nMany thanks to Nils Hartmann, Andres Almiray, and Oliver Zeigermann for their input and feedback while writing this post!\n","id":104,"publicationdate":"Jan 15, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_distribution\"\u003e📦 Distribution\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_version\"\u003e📆 Version\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_installation\"\u003e🔧 Installation\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_your_first_java_program\"\u003e💡 Your First Java Program\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_learning_the_language\"\u003e📚 Learning the Language\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_️_build_tool\"\u003e👷‍♀️ Build Tool\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_editor\"\u003e📝 Editor\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_libraries\"\u003e🧱 Libraries\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_application_framework\"\u003e🐢 Application Framework\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_container_base_image\"\u003e🐳 Container Base Image\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_next_steps\"\u003e🔭 Next Steps\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e27 years of age, and alive and kicking — The Java platform regularly comes out amongst the top contenders in rankings like the \u003ca 
href=\"https://www.tiobe.com/tiobe-index/\"\u003eTIOBE index\u003c/a\u003e.\nIn my opinion, rightly so. The language is very actively maintained and constantly improved;\nits underlying runtime, the Java Virtual Machine (JVM),\nis one of, if not the most, advanced runtime environments for managed programming languages.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThere is a massive eco-system of Java libraries which make it a great tool for a large number of use cases,\nranging from command-line and desktop applications, over web apps and backend web services, to datastores and stream processing platforms.\nWith upcoming features like \u003ca href=\"https://openjdk.org/jeps/426\"\u003esupport for vectorized computations\u003c/a\u003e (SIMD),\nlight-weight \u003ca href=\"https://openjdk.org/projects/loom\"\u003evirtual threads\u003c/a\u003e,\nimproved \u003ca href=\"https://openjdk.org/projects/panama/\"\u003eintegration with native code\u003c/a\u003e,\n\u003ca href=\"https://openjdk.org/projects/valhalla/\"\u003evalue objects and user-defined primitives\u003c/a\u003e, and others,\nJava is becoming an excellent tool for solving a larger number of software development tasks than ever before.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","tooling","getting-started"],"title":"Getting Started With Java Development in 2023 — An Opinionated Guide","uri":"https://www.morling.dev/blog/getting-started-with-java-development-2023/"},{"content":"","id":105,"publicationdate":"Jan 15, 2023","section":"tags","summary":"","tags":null,"title":"getting-started","uri":"https://www.morling.dev/tags/getting-started/"},{"content":"","id":106,"publicationdate":"Jan 15, 2023","section":"tags","summary":"","tags":null,"title":"tooling","uri":"https://www.morling.dev/tags/tooling/"},{"content":" I strongly believe that you should avoid connecting to production environments from local developer machines as much as possible. But sometimes, e.g. 
in order to analyse some specific kinds of failures, doing so can be inevitable.\nNow, if this is the case, I really, really want to be sure that I’m aware of the environment I am working in. I absolutely want to avoid a situation as in the catchy title of this post, when for instance you realize that you just ran some integration test against a production environment. In the context of working with the AWS CLI tool this means I’d like to be aware of the currently active profile by means of coloring my shell accordingly. Here’s how I’ve set this up using iTerm2 and zsh.\nThe first step is to create a profile in iTerm2 for each separate environment which you can easily recognize and tell apart. In my case, I’ve set up two profiles:\nA \u0026#34;Dev\u0026#34; profile with a dark green background\nA \u0026#34;Prod\u0026#34; profile with a dark red background\nI have also added badges with the profile name, which are shown at the upper right corner of the window for further emphasis.\nWhile you can specify the right profile to use for each single invocation of the aws tool, this quickly becomes cumbersome. So I am enabling profiles using the AWS_PROFILE environment variable:\nexport AWS_PROFILE=dev Whenever the value of this environment variable changes, I would like to activate the corresponding iTerm2 profile. This can be done programmatically by echo-ing a specific escape sequence which is interpreted by the terminal emulator:\necho -e \u0026#34;\\033]50;SetProfile=Dev\\a\u0026#34; To make sure the right profile is set, I am using the precmd hook function in zsh. It is invoked every time before the prompt is displayed. 
Just add the following to your .zshrc file (if you have multiple actions you’d like to execute, it can be worthwhile to set them up as separate hook functions, as described in this post):\nprecmd () { if [ \u0026#34;$AWS_PROFILE\u0026#34; = \u0026#34;dev\u0026#34; ]; then echo -e \u0026#34;\\033]50;SetProfile=Dev\\a\u0026#34;; elif [ \u0026#34;$AWS_PROFILE\u0026#34; = \u0026#34;prod\u0026#34; ]; then echo -e \u0026#34;\\033]50;SetProfile=Prod\\a\u0026#34;; else echo -e \u0026#34;\\033]50;SetProfile=Default\\a\u0026#34;; fi; } With that configuration in place (either source your .zshrc or open a new session for activating it), choosing a specific AWS profile automatically triggers the activation of the matching profile in iTerm2:\nThat way, it’s very apparent which AWS profile is currently active, substantially reducing the risk of making any stupid mistakes.\n","id":107,"publicationdate":"Jan 5, 2023","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI strongly believe that you should avoid connecting to production environments from local developer machines as much as possible.\nBut sometimes, e.g. 
in order to analyse some specific kinds of failures,\ndoing so can be inevitable.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eNow, if this is the case, I really, really want to be sure that I’m aware of the environment I am working in.\nI absolutely want to avoid a situation as in the catchy title of this post,\nwhen for instance you realize that you just ran some integration test against a production environment.\nIn the context of working with the AWS CLI tool this means I’d like to be aware of the currently active \u003ca href=\"https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html\"\u003eprofile\u003c/a\u003e by means of coloring my shell accordingly.\nHere’s how I’ve set this up using \u003ca href=\"https://iterm2.com/\"\u003eiTerm2\u003c/a\u003e and \u003ca href=\"https://www.zsh.org/\"\u003ezsh\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e","tags":["aws","shell","tooling"],"title":"Oh... This is Prod?!","uri":"https://www.morling.dev/blog/oh_this_is_prod/"},{"content":" Table of Contents Emitting Custom Events Filtering \u0026#34;Thread Park\u0026#34; Events Towards Real-Time Analysis of JFR Events Java’s BlockingQueue hierarchy is widely used for coordinating work between different producer and consumer threads. When set up with a maximum capacity (i.e. a bounded queue), no more elements can be added by producers to the queue once it is full, until a consumer has taken at least one element. For scenarios where new work may arrive more quickly than it can be consumed, this applies means of back-pressure, ensuring the application doesn’t run out of memory eventually, while enqueuing more and more work items.\nOne interesting usage of blocking queues is to buffer writes to a database. 
Let’s take SQLite, an embedded RDBMS, as an example; SQLite only allows for a single writer at any given time, and it tends to yield a sub-optimal write through-put when executing many small transactions.\nA blocking queue can be used to mitigate that situation: all threads that wish to perform an update to the database, for instance the worker threads of a web application, submit work items with their write tasks to a blocking queue. Another thread fetches items in batches from that queue, executing one single transaction for all work items of a batch.\nThis results in much better performance compared to each thread executing its own individual write transaction, in particular when keeping those open for the entire duration of web requests, as is commonly the case with most web frameworks. More on that architecture, in particular with regard to failure handling, in a future blog post.\nHow do you find out though when a producer actually is blocked while trying to add items to a BlockingQueue? After all, this is an indicator that the through-put of your system isn’t as high as it would need to be in order to fully satisfy the workload submitted by the producers.\nIf you have the means of running a profiler against the system, then for instance async-profiler with its wall-clock profiling option will come in handy for this task; unlike CPU profiling which only profiles running threads, wall-clock profiling will also tell you about the time spent by threads in blocked and waiting states, as is the case here.\nBut what if connecting with a wall-clock profiler is not an option? In this case, JDK Flight Recorder, Java’s go-to tool for all kinds of performance analyses, and its accompanying client, JDK Mission Control (JMC), can be of help to you. JFR specifically has been designed as an \u0026#34;always-on\u0026#34; event recording engine for usage in production environments. It doesn’t provide bespoke support for identifying blocked queue producers, though. 
BlockingQueue implementations such as ArrayBlockingQueue don’t use Java intrinsic locks (i.e. what you’d get when using the synchronized keyword), but rather locks based on the LockSupport primitives. These don’t show up in the \u0026#34;Lock Instances\u0026#34; view in JMC at this point.\nEmitting Custom Events One possible solution is to emit custom JFR events from within your own code whenever you’re trying to submit an item to a bounded queue at its maximum capacity. For this, you couldn’t use the put() method of the BlockingQueue interface, though, as it actually is blocking and you’d have no way to react to that.\nInstead, you’d have to rely on either offer() (which returns false when it cannot submit an item) or add() (which raises an exception). When the queue is full and you can’t submit another item, you’d instantiate your custom JFR event type, retry to submit the item for as long as it’s needed, and finally commit the JFR event. Needless to say that this kind of busy waiting is not only rather inefficient, you’d also have to remember to implement this pattern in all your blocking queue producers of your program.\nA better option, at least in theory, would be to use the JMC Agent. Part of the JDK Mission Control project, this Java agent allows you to instrument the byte code of existing methods, so that a JFR event will be emitted whenever they are invoked. The configuration of JMC Agent happens via an XML file and is rather straightforward. 
Here’s how instrumenting the put() method of the ArrayBlockingQueue type would look like:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 \u0026lt;?xml version=\u0026#34;1.0\u0026#34; encoding=\u0026#34;UTF-8\u0026#34;?\u0026gt; \u0026lt;jfragent\u0026gt; \u0026lt;config\u0026gt; \u0026lt;classprefix\u0026gt;__JFREvent\u0026lt;/classprefix\u0026gt; \u0026lt;allowtostring\u0026gt;true\u0026lt;/allowtostring\u0026gt; \u0026lt;allowconverter\u0026gt;true\u0026lt;/allowconverter\u0026gt; \u0026lt;/config\u0026gt; \u0026lt;events\u0026gt; \u0026lt;event id=\u0026#34;queue.Put\u0026#34;\u0026gt; \u0026lt;label\u0026gt;Put\u0026lt;/label\u0026gt; \u0026lt;description\u0026gt;Queue Put\u0026lt;/description\u0026gt; \u0026lt;path\u0026gt;Queues\u0026lt;/path\u0026gt; \u0026lt;stacktrace\u0026gt;true\u0026lt;/stacktrace\u0026gt; \u0026lt;class\u0026gt;java.util.concurrent.ArrayBlockingQueue\u0026lt;/class\u0026gt; \u0026lt;method\u0026gt; \u0026lt;name\u0026gt;put\u0026lt;/name\u0026gt; \u0026lt;descriptor\u0026gt;(Ljava/lang/Object;)V\u0026lt;/descriptor\u0026gt; \u0026lt;/method\u0026gt; \u0026lt;location\u0026gt;WRAP\u0026lt;/location\u0026gt; \u0026lt;/event\u0026gt; \u0026lt;/events\u0026gt; \u0026lt;/jfragent\u0026gt; With this agent configuration in place, you’d get an event for every invocation of put() though, no matter whether it actually is blocking or not. While you might be able to make some educated guess based on the duration of these events, that’s not totally reliable. For instance, you couldn’t be quite sure whether a \u0026#34;long\u0026#34; event actually is caused by blocking on the queue or by some GC activity.\nSo how about going one level deeper then? If you look at the implementation of ArrayBlockingQueue::put(), you’ll find that the actual blocking call happens through the await() method on the notFull Condition object. 
You could use JMC Agent to instrument that await() method, but this would give you events for every Condition instance, also for those not used by BlockingQueue implementations.\nFiltering \u0026#34;Thread Park\u0026#34; Events But this finally points us in the right direction: await() is implemented on top of LockSupport::park(), and the JVM itself emits a JFR event whenever a thread is parked. But how do you identify those \u0026#34;Java Thread Park\u0026#34; events which are actually triggered by blocking on a queue? If only there were a way to query and filter JFR events in a structured query language!\nTurns out there is. JFR Analytics lets you do exactly that: analysing JFR recording files using standard SQL. I haven’t worked that much on this project over the last year, but extending it for the use case at hand was easy enough. By means of the new HAS_MATCHING_FRAME() function it becomes trivial to identify the relevant events.\nJFR Analytics hasn’t been released to Maven Central yet, so you need to check out its source code and build it from source yourself. You then can use the SQLLine command line interface for examining your recordings:\njava --class-path \u0026#34;target/lib/*:target/jfr-analytics-1.0.0-SNAPSHOT.jar\u0026#34; sqlline.SqlLine Then, within the CLI tool, \u0026#34;connect\u0026#34; to a recording file and change the output format to \u0026#34;vertical\u0026#34; for better readability of stack traces:\nsqlline\u0026gt; !connect jdbc:calcite:schemaFactory=org.moditect.jfranalytics.JfrSchemaFactory;schema.file=path/to/lock-recording.jfr dummy dummy sqlline\u0026gt; !outputformat vertical If you need a recording file to play with, check out this example project. It has a very simple main class with two threads: a producer thread which inserts 20 items per second into a blocking queue, and a consumer thread, which takes those items at a rate of ten items per second. 
Once the queue’s capacity has been reached, the producer will regularly block, as it can only insert ten items per second instead of 20. With JFR Analytics, the affected put() calls can be identified via the following query:\n1 2 3 4 5 6 7 8 SELECT \u0026#34;startTime\u0026#34;, \u0026#34;duration\u0026#34; / 1000000 AS \u0026#34;duration\u0026#34;, \u0026#34;eventThread\u0026#34;, TRUNCATE_STACKTRACE(\u0026#34;stackTrace\u0026#34;, 8) as \u0026#34;stack trace\u0026#34; FROM \u0026#34;jdk.ThreadPark\u0026#34; WHERE HAS_MATCHING_FRAME(\u0026#34;stackTrace\u0026#34;, \u0026#39;.*ArrayBlockingQueue\\.put.*\u0026#39;); Et voilà, the query returns exactly those thread park events emitted for any blocked put() call:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ... startTime 2023-01-02 18:42:57.594 duration 455 eventThread pool-1-thread-1 stack trace jdk.internal.misc.Unsafe.park(boolean, long) java.util.concurrent.locks.LockSupport.park():371 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block():506 java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool$ManagedBlocker):3744 java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool$ManagedBlocker):3689 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await():1625 java.util.concurrent.ArrayBlockingQueue.put(Object):370 dev.morling.demos.BlockingQueueExample$1.run():35 startTime 2023-01-02 18:42:58.097 duration 954 eventThread pool-1-thread-1 stack trace jdk.internal.misc.Unsafe.park(boolean, long) java.util.concurrent.locks.LockSupport.park():371 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block():506 java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool$ManagedBlocker):3744 java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool$ManagedBlocker):3689 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await():1625 java.util.concurrent.ArrayBlockingQueue.put(Object):370 
dev.morling.demos.BlockingQueueExample$1.run():35 ... Note how the stack traces are truncated so you can see the immediate caller, in this case the producer thread of the aforementioned example application. One thing to be aware of is that JFR applies a minimum threshold for capturing thread park events: 20 ms with the default configuration and 10 ms with the profile configuration. I.e. you would not know about any calls blocking shorter than that. You can adjust the threshold in your JFR configuration, but be aware of the potential overhead.\nEquipped with the information about any blocked invocations of put(), you now could take appropriate action; depending on the specific workload and its characteristics, you might for instance look into tuning your queue consumers, add more of them (when not in a sequencer scenario as with SQLite above), or maybe share the load across multiple machines. You also might increase the size of the queue, providing more wiggle room to accommodate short load spikes.\nTowards Real-Time Analysis of JFR Events All this happens after the fact though, through offline analysis of JFR recording files. An alternative would be to run this kind of analysis in realtime on live JFR data. The foundation for this is JFR event streaming which provides low-latency access to the JFR events of a running JVM.\nExpanding JFR Analytics into this direction is one of my goals for this year: complementing its current pull query capabilities (based on Apache Calcite) with push queries, leveraging Apache Flink as a stream processing engine. 
That way, blocked queue producers could trigger some kind of alert in a live production environment, for instance raised when the overall duration of blocked calls exceeds a given threshold in a given time window, indicating the need for intervention with a much lower delay than possible with offline analysis.\nTaking things even further, streaming queries could even enable predictive analytics; Flink’s pattern matching capabilities and the MATCH_RECOGNIZE clause could be used for instance to identify specific sequences of events which indicate that a full garbage collection is going to happen very soon. This information could be exposed via a health check, signalling to the load balancer in front of a clustered web application that affected nodes should not receive any more requests for some time, so as to shield users from long GC-induced response times.\nIf this sounds interesting to you, please let me know; I’d love to collaborate with the open-source community on this effort.\nMany thanks to Richard Startin for his feedback while working on this post!\n","id":108,"publicationdate":"Jan 3, 2023","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_emitting_custom_events\"\u003eEmitting Custom Events\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_filtering_thread_park_events\"\u003eFiltering \u0026#34;Thread Park\u0026#34; Events\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_towards_real_time_analysis_of_jfr_events\"\u003eTowards Real-Time Analysis of JFR Events\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eJava’s \u003ca href=\"https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/BlockingQueue.html\"\u003e\u003ccode\u003eBlockingQueue\u003c/code\u003e\u003c/a\u003e hierarchy is widely 
used for coordinating work between different producer and consumer threads.\nWhen set up with a maximum capacity (i.e. a \u003cem\u003ebounded queue\u003c/em\u003e), no more elements can be added by producers to the queue once it is full, until a consumer has taken at least one element.\nFor scenarios where new work may arrive more quickly than it can be consumed, this applies means of back-pressure,\nensuring the application doesn’t run out of memory eventually, while enqueuing more and more work items.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","concurrency","sqlite"],"title":"Is your Blocking Queue... Blocking?","uri":"https://www.morling.dev/blog/is-your-blocking-queue-blocking/"},{"content":"","id":109,"publicationdate":"Dec 18, 2022","section":"tags","summary":"","tags":null,"title":"maven","uri":"https://www.morling.dev/tags/maven/"},{"content":" As part of my new job at Decodable, I am also planning to contribute to the Apache Flink project (as Decodable’s fully-managed stream processing platform is based on Flink). Right now, I am in the process of familiarizing myself with the Flink code base, and as such I am of course building the project from source, too.\nFlink uses Apache Maven as its build tool. It comes with the Maven Wrapper, simplifying the onboarding experience for new contributors, who don’t need to have Maven installed upfront. The configured Maven version is quite old though, 3.2.5 from 2014. Not even coloured output on the CLI yet — Boo! So I tried to build Flink with the latest stable version of Maven, 3.8.6 at the time of writing, but ran into some issues doing so.\nSpecifically, there are several dependencies with repository information embedded into their POM files. This is generally considered a bad practice for libraries, as it will inject those repositories into the build of any consumers, e.g. causing slower build processes. 
In the case at hand, the situation is even worse, as Maven since version 3.8.1 blocks access to non-HTTPS repositories for security reasons. This means that your build will fail if any dependency pulls in an HTTP repository.\nDealing with this is a bit cumbersome, as it’s not always obvious which dependency is causing that issue. For Flink, I encountered two instances of that problem. First, a transitive dependency of the flink-connector-hive_2.12 module (message slightly adapted for readability):\n... [ERROR] Failed to execute goal on project flink-connector-hive_2.12: Could not resolve dependencies for project org.apache.flink:flink-connector-hive_2.12:jar:1.17-SNAPSHOT: Failed to collect dependencies at org.apache.hive:hive-exec:jar:2.3.9 -\u0026gt; org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Failed to read artifact descriptor for org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde: Could not transfer artifact org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde from/to maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: [ repository.jboss.org (http://repository.jboss.org/nexus/content/groups/public/, default, disabled), conjars (http://conjars.org/repo, default, releases+snapshots), apache.snapshots (http://repository.apache.org/snapshots, default, snapshots) ] ... There are three non-HTTPS repositories involved here which got blocked by Maven. Note that those are all the insecure repositories found in the dependency chain; they are not necessarily related to that particular error.\nUnfortunately, there’s no good way of identifying which dependency exactly is pulling them into the build and which repository is the problem here. Instead, you need to analyse all the dependencies from the project root to the flagged dependency, including any potential parent POM(s). 
In the case at hand, the problematic repo is the \u0026#34;conjars\u0026#34; one, as defined in the parent POM of the org.apache.hive:hive-exec artifact, org.apache.hive:hive.\nAs far as I am aware, there’s no way of overriding such dependency-defined repositories in a downstream build; the only way I’ve found is to define a mirror for that repository’s id in a custom settings.xml file, redirecting it to an HTTPS URL:\n\u0026lt;?xml version=\u0026#34;1.0\u0026#34; encoding=\u0026#34;UTF-8\u0026#34;?\u0026gt; \u0026lt;settings xmlns=\u0026#34;http://maven.apache.org/SETTINGS/1.0.0\u0026#34; xmlns:xsi=\u0026#34;http://www.w3.org/2001/XMLSchema-instance\u0026#34; xsi:schemaLocation=\u0026#34;http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd\u0026#34;\u0026gt; \u0026lt;mirrors\u0026gt; \u0026lt;mirror\u0026gt; \u0026lt;id\u0026gt;conjars\u0026lt;/id\u0026gt; \u0026lt;name\u0026gt;conjars\u0026lt;/name\u0026gt; \u0026lt;url\u0026gt;https://conjars.org/repo\u0026lt;/url\u0026gt; \u0026lt;mirrorOf\u0026gt;conjars\u0026lt;/mirrorOf\u0026gt; \u0026lt;/mirror\u0026gt; \u0026lt;/mirrors\u0026gt; \u0026lt;/settings\u0026gt; Building Flink with this settings.xml file gets us beyond that error. As far as the other two repositories are concerned, the JBoss one is actually defined in the root POM of Apache Flink itself. I’m not sure whether it’s actually needed, but I have created a pull request for changing it to HTTPS, just in case. The \u0026#34;apache.snapshots\u0026#34; repo is defined in the parent POM of org.apache.hive:hive and also doesn’t seem to be needed. You could override it in your settings.xml using its HTTPS URL as a measure of good practice, though.\nWith that settings.xml in place, I could build Apache Flink using the current Maven version 3.8.6. I noticed though that the build gets stuck for quite some time at the following step:\n... 
[INFO] ------------------\u0026lt; org.apache.flink:flink-hadoop-fs \u0026gt;------------------ [INFO] Building Flink : FileSystems : Hadoop FS 1.17-SNAPSHOT [INFO] --------------------------------[ jar ]--------------------------------- Downloading from maven-default-http-blocker: http://0.0.0.0/net/minidev/json-smart/maven-metadata.xml ... The build wouldn’t fail, though: after exactly 75 seconds, it continues and runs to completion. So what’s causing this stall? Again, a non-HTTPS repository is the culprit, but in a slightly more confusing way. As it turns out, that transitive dependency to the net.minidev:json-smart library is declared using a version range by the artifact com.nimbusds:nimbus-jose-jwt: [1.3.1,2.3].\nSo Maven reaches out to all configured repositories in order to identify the latest version within that range. Now the hadoop-auth dependency (via its parent hadoop-main) pulls in the JBoss HTTP repository; and while access to this is prevented via Maven’s HTTP blocker, for some reason it still tries to connect to that blocker’s pseudo URL 0.0.0.0. After 75 seconds, this request eventually times out and the build continues. Go figure.\nFor preventing this issue, you have a few options:\nAdd the JBoss repository with HTTPS to your settings.xml (again, the definition in the root POM of your own build does not suffice for that)\nRun the build with the -o (offline) flag\nPin down the version of the artifact in the dependency management of your build, sidestepping the need for resolving the version range:\n1 2 3 4 5 6 7 8 9 10 11 ... \u0026lt;dependencyManagement\u0026gt; \u0026lt;dependencies\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;net.minidev\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;json-smart\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.3\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;/dependencies\u0026gt; \u0026lt;/dependencyManagement\u0026gt; ... 
This approach has the advantage that it can be done in a persistent way as part of the Maven POM itself; there’s no need for a custom settings.xml or build-time parameters like the offline flag.\nIn any case, the build will now skip that 75-second pause. That means less time for drinking a coffee while the build is running, which is a good thing of course. Now you might wonder why exactly 75 seconds, and I have to admit it’s not fully clear to me.\nWhen running the build with a debugger attached (I know, I know, it’s not en vogue these days), I didn’t see any timeout configuration for establishing that HTTP connection. Some default TCP connection timeout on macOS perhaps? Interestingly, when trying with the latest Alpha of Maven 4, the build would only stall for ten seconds when trying to resolve that version range; Maven’s HTTP client is configured with a timeout of ten seconds as of this release.\nThe moral of the story? Don’t put repository information into published Maven POMs. If you publish something to Maven Central, all its dependencies should be resolvable from there, too. 
Luckily, Maven 4 will make this problem an issue of the past, bringing the long-awaited separation of build and consumer POMs.\nI’d also advise caution when it comes to adding version ranges to dependency definitions, it can have unexpected consequences as demonstrated above, and it’s probably not worth the hassle.\n","id":110,"publicationdate":"Dec 18, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs part of my \u003ca href=\"/blog/why-i-joined-decodable/\"\u003enew job\u003c/a\u003e at Decodable,\nI am also planning to contribute to the \u003ca href=\"https://flink.apache.org/\"\u003eApache Flink\u003c/a\u003e project\n(as Decodable’s fully-managed \u003ca href=\"https://www.decodable.co/product\"\u003estream processing platform\u003c/a\u003e is based on Flink).\nRight now, I am in the process of familiarizing myself with the Flink code base,\nand as such I am of course building the project from source, too.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","maven","flink","build-tools"],"title":"Maven, What Are You Waiting For?!","uri":"https://www.morling.dev/blog/maven-what-are-you-waiting-for/"},{"content":" Table of Contents Using Logical Decoding Row Filters With Debezium Observing Filtered Change Events Wrap-Up This post originally appeared on the Decodable blog. All rights reserved.\nSince logical decoding was added to Postgres in version 9.4, this powerful feature for capturing changes from the write-ahead log of the database has been continuously improved. Postgres 15, released in October this year, added support for fine-grained control over which columns (by means of column lists) and rows (via row filters) should be exported from captured tables. 
This means, in relational terminology, projections and filters are now natively supported by Postgres change event publications.\nReasons for specifically configuring which columns and rows should be contained in a change data stream are manifold:\nExcluding large columns (say, a binary column with image data) can significantly reduce the size of change events and thus the required network bandwidth\nExcluding columns or rows with sensitive data can be necessary in order to satisfy privacy requirements, when for instance Personally Identifiable Information (PII) shouldn’t be exposed to external systems\nFiltering published rows by tenant id can be useful for setting up tenant-specific change streams in a multi-tenant architecture\nBefore the advent of Postgres-native column lists and row filters, users of Debezium, a popular open-source platform for change data capture (CDC) which is also used by several Decodable CDC connectors, would typically have implemented these kinds of use cases via a combination of configuration options and single message transformations (SMTs).\nProjections are supported in Debezium via the column.include.list and column.exclude.list options. These configuration options are applied client-side, i.e. within the Debezium connector, which makes them less efficient than server-side column lists, potentially causing large amounts of data to be streamed to Debezium, only to be discarded there.\nFilters are a bit more involved: while there is built-in support for filtering the contents of initial and ad-hoc incremental snapshots, filtering change events emitted from the WAL requires a custom SMT. Pushing this logic into the logical replication mechanism of the database itself makes a lot of sense from a usability and efficiency perspective.\nSo let’s see how Postgres 15 row filters can be used together with Debezium. Initially, I meant to demonstrate the usage of column lists, too. 
But in the course of exploring that feature, I discovered a bug in Postgres which causes incorrect events to be emitted for UPDATE and DELETE statements when column lists are present. So this will have to wait for another time. The Postgres community took care of this super fast: a bug fix has already been applied, so that column lists should work as expected in the next Postgres release.\nUsing Logical Decoding Row Filters With Debezium To follow along, check out the postgres-publication-filtering demo project from GitHub. It contains a Docker Compose file for running Postgres as well as Apache Kafka and Kafka Connect with Debezium:\n1 2 3 git clone git@github.com:gunnarmorling/postgres-publication-filtering.git cd postgres-publication-filtering docker-compose up --build That Postgres example container image contains a table products with the following schema:\nLet’s set up a change event stream for that table which only contains events if the quantity of the given product item is below 10. We could then for instance envision a microservice which subscribes to that stream and places backfill orders with our suppliers for those products.\nRow filters are configured via Postgres publications, as used with the pgoutput logical decoding plug-in. As Debezium can only create publications with the default settings (at least for now), you need to manually create a custom publication with the required configurations and have Debezium make use of it. 
To do so, launch a Postgres session via pgcli:\n1 2 3 4 docker run --tty --rm -i \\ --network postgres-publication-filtering_default \\ quay.io/debezium/tooling:1.2 \\ bash -c \u0026#39;pgcli postgresql://postgresuser:postgrespw@postgres:5432/postgresdb\u0026#39; Then create a publication like so:\n1 2 SET search_path TO inventory; CREATE PUBLICATION inventory_publication FOR TABLE products WHERE (quantity \u0026lt; 10); As of Postgres 15, the CREATE PUBLICATION statement allows you to narrow down the events to be emitted for a given table via a custom WHERE clause. A few conditions apply to that clause (see this post for more information), most importantly:\nIf the publication publishes UPDATE or DELETE events, only columns which are part of the table’s replica identity may be referenced\nOnly simple expressions are allowed, for example not referring to user-defined functions or types, system columns etc.\nThat’s all we need to do on the Postgres side. Now let’s take a look at the required Debezium configuration:\n1 2 3 \u0026#34;plugin.name\u0026#34; : \u0026#34;pgoutput\u0026#34;, \u0026#34;publication.autocreate.mode\u0026#34; : \u0026#34;disabled\u0026#34;, \u0026#34;publication.name\u0026#34; : \u0026#34;inventory_publication\u0026#34;, As the Postgres publication only is used when Debezium retrieves change events via logical decoding from the WAL, you also need to customize the SELECT statement used for the products table when snapshotting the table. Otherwise, you’d get snapshot events for all the rows of that table, no matter what their quantity is. 
This can be done via the following configuration:\n1 \u0026#34;snapshot.select.statement.overrides\u0026#34; : \u0026#34;inventory.products:WHERE quantity \u0026lt; 10\u0026#34; Altogether, the connector configuration looks like this:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 { \u0026#34;name\u0026#34;: \u0026#34;inventory-connector\u0026#34;, \u0026#34;config\u0026#34;: { \u0026#34;connector.class\u0026#34;:\u0026#34;io.debezium.connector.postgresql.PostgresConnector\u0026#34;, \u0026#34;tasks.max\u0026#34;: \u0026#34;1\u0026#34;, \u0026#34;database.hostname\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.port\u0026#34;: \u0026#34;5432\u0026#34;, \u0026#34;database.user\u0026#34;: \u0026#34;postgresuser\u0026#34;, \u0026#34;database.password\u0026#34;: \u0026#34;postgrespw\u0026#34;, \u0026#34;database.dbname\u0026#34; : \u0026#34;postgresdb\u0026#34;, \u0026#34;topic.prefix\u0026#34;: \u0026#34;dbserver1\u0026#34;, \u0026#34;schema.include.list\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;table.include.list\u0026#34; : \u0026#34;inventory.products,inventory.customers,inventory.test\u0026#34;, \u0026#34;plugin.name\u0026#34; : \u0026#34;pgoutput\u0026#34;, \u0026#34;publication.autocreate.mode\u0026#34; : \u0026#34;disabled\u0026#34;, \u0026#34;publication.name\u0026#34; : \u0026#34;inventory_publication\u0026#34;, \u0026#34;snapshot.select.statement.overrides\u0026#34; : \u0026#34;inventory.products\u0026#34;, \u0026#34;snapshot.select.statement.overrides.inventory.products\u0026#34; : \u0026#34;SELECT * FROM inventory.products WHERE quantity \u0026lt; 10\u0026#34; } } Now register a connector instance with this configuration. 
If you have kcctl 🧸 installed (which I highly recommend), that’s as simple as that:\n1 kcctl apply -f inventory-connector.json Alternatively, use curl to post the configuration directly to Kafka Connect’s REST API:\n1 2 curl -i -X POST -H \u0026#34;Accept:application/json\u0026#34; -H\u0026#34;Content-Type:application/json\u0026#34; \\ http://localhost:8083/connectors/ -d @inventory-connector.json Observing Filtered Change Events Return to your Postgres session and display the contents of the products table:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 postgresdb\u0026gt; select id, name, quantity from inventory.products; +------+--------------------+------------+ | id | name | quantity | |------+--------------------+------------| | 101 | scooter | 5 | | 102 | car battery | 10 | | 103 | 12-pack drill bits | 44 | | 104 | hammer | 12 | | 105 | hammer | 42 | | 106 | hammer | 37 | | 107 | rocks | 9 | | 108 | jacket | 19 | | 109 | spare tire | 28 | +------+--------------------+------------+ SELECT 9 Time: 0.050s Out of those nine product items, only those with a quantity of less than ten show up as snapshot events in the corresponding Kafka topic:\n1 2 3 4 5 docker run --tty --rm \\ --network postgres-publication-filtering_default \\ quay.io/debezium/tooling:1.2 \\ kafkacat -b kafka:9092 -C -o beginning -q \\ -t dbserver1.inventory.products | jq \u0026#39;.payload | { op, ts_ms, after }\u0026#39; 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 { \u0026#34;op\u0026#34;: \u0026#34;r\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1669375471236, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 101, \u0026#34;name\u0026#34;: \u0026#34;scooter\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;Small 2-wheel scooter\u0026#34;, \u0026#34;weight\u0026#34;: 3.14, \u0026#34;quantity\u0026#34;: 5 } } { \u0026#34;op\u0026#34;: \u0026#34;r\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1669375471238, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 107, \u0026#34;name\u0026#34;: 
\u0026#34;rocks\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;box of assorted rocks\u0026#34;, \u0026#34;weight\u0026#34;: 5.3, \u0026#34;quantity\u0026#34;: 9 } } Now let’s do some data changes and observe the resulting change events, as retrieved from the database via logical decoding. First, insert a few records into the table:\n1 2 3 4 5 INSERT INTO products VALUES (DEFAULT, \u0026#39;deck chair\u0026#39;, \u0026#39;A cozy wooden deck chair\u0026#39;, 15.7, 7), (DEFAULT, \u0026#39;paint\u0026#39;, \u0026#39;A bucket of white paint\u0026#39;, 5.0, 15), (DEFAULT, \u0026#39;lamp\u0026#39;, \u0026#39;A green library style lamp\u0026#39;, 4.8, 3); Nothing too exciting is happening in the Kafka topic: as you would expect, only events for the deck chair and the lamp products show up in Kafka, but not for the paint item, as its quantity is larger than 10. Things get a bit more interesting when doing some updates:\n1 2 3 4 UPDATE products SET quantity = 6 WHERE NAME = \u0026#39;deck chair\u0026#39;; UPDATE products SET quantity = 14 WHERE NAME = \u0026#39;paint\u0026#39;; UPDATE products SET quantity = 9 WHERE NAME = \u0026#39;paint\u0026#39;; UPDATE products SET quantity = 11 WHERE NAME = \u0026#39;lamp\u0026#39;; The following three events are emitted to Kafka for those:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 { \u0026#34;op\u0026#34;: \u0026#34;u\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1669382021989, \u0026#34;before\u0026#34;: { \u0026#34;id\u0026#34;: 110, \u0026#34;name\u0026#34;: \u0026#34;deck chair\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;A cozy wooden deck chair\u0026#34;, \u0026#34;weight\u0026#34;: 15.7, \u0026#34;quantity\u0026#34;: 7 }, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 110, \u0026#34;name\u0026#34;: \u0026#34;deck chair\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;A cozy wooden deck chair\u0026#34;, 
\u0026#34;weight\u0026#34;: 15.7, \u0026#34;quantity\u0026#34;: 6 } } { \u0026#34;op\u0026#34;: \u0026#34;c\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1669382021990, \u0026#34;before\u0026#34;: null, \u0026#34;after\u0026#34;: { \u0026#34;id\u0026#34;: 111, \u0026#34;name\u0026#34;: \u0026#34;paint\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;A bucket of white paint\u0026#34;, \u0026#34;weight\u0026#34;: 5, \u0026#34;quantity\u0026#34;: 9 } } { \u0026#34;op\u0026#34;: \u0026#34;d\u0026#34;, \u0026#34;ts_ms\u0026#34;: 1669382021990, \u0026#34;before\u0026#34;: { \u0026#34;id\u0026#34;: 112, \u0026#34;name\u0026#34;: \u0026#34;lamp\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;A green library style lamp\u0026#34;, \u0026#34;weight\u0026#34;: 4.8, \u0026#34;quantity\u0026#34;: 3 }, \u0026#34;after\u0026#34;: null } Note that not all of them have the u (update) operation type; some are c (create) and d (delete) events. The logic here is that the publication works from a perspective of looking at the row set specified via the WHERE clause for the table. In that light,\nAn update event is emitted for the deck chair quantity update from 7 to 6\nNo event is emitted for the paint quantity update from 15 to 14, as that row is not part of this row set either before or after the change\nA create event is emitted for the paint quantity update from 14 to 9, as that row now became a part of the row set\nA delete event is emitted for the lamp quantity update from 3 to 11, as that row now is not a part of the row set any longer\nFinally, let’s delete some product items:\nDELETE FROM products WHERE NAME = \u0026#39;lamp\u0026#39;; DELETE FROM products WHERE NAME = \u0026#39;deck chair\u0026#39;; In the Kafka topic you can observe that no change event is emitted for the first deletion (as there are 11 lamps in stock). 
But there is an event for the deletion of the deck chair record with a quantity of six.\nAs they say, a picture is worth a thousand words (and I’d never pass up an opportunity for using my favorite tool Excalidraw), so here is an overview of the published events, depending on the specifics of a given data change:\nWrap-Up Row filters (and column lists) are a great addition to the Postgres logical decoding toolbox. Having fine-grained control over which change events should be published and which fields they should contain opens up many interesting opportunities from a perspective of efficiency and data privacy as well as the ability to set up content-specific change data streams, as demonstrated in the example above.\nGoing forward, a good next step usability-wise would be for Debezium to apply any configured row filters and column lists to the Postgres publications it creates, simplifying things for users a bit. As far as Flink SQL and Decodable are concerned, row filters and column lists potentially allow for the push down of filter and projection operators of streaming SQL queries; instead of applying these operators within the Flink stream processing engine, SELECT and WHERE clauses of queries could be re-written transparently and these operators executed as part of the logical replication publication within Postgres itself. Flink supports this kind of push down of logic into data sources via the SupportsFilterPushDown and SupportsProjectionPushDown extension points. For example, this could be very interesting to customers who don’t want specific segments of their data to leave the realm of their database. Please reach out to us if you think this would be an interesting capability to have.\nIf you would like to get started with your own experiments around Postgres 15 row filters using Debezium, you can find the complete source code of the example shown above in this repository on GitHub. 
You can find more information about row filters in this blog post; also refer to this post to learn more about this and other new features related to logical replication in Postgres 15.\nMany thanks to Robert Metzger for his feedback while writing this post!\n","id":111,"publicationdate":"Dec 15, 2022","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_using_logical_decoding_row_filters_with_debezium\"\u003eUsing Logical Decoding Row Filters With Debezium\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_observing_filtered_change_events\"\u003eObserving Filtered Change Events\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up\"\u003eWrap-Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eThis post originally appeared on the \u003ca href=\"https://www.decodable.co/blog/postgres-15-logical-decoding-row-filters-with-debezium\"\u003eDecodable blog\u003c/a\u003e. 
All rights reserved.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSince \u003ca href=\"https://www.postgresql.org/docs/current/logicaldecoding-explanation.html\"\u003elogical decoding\u003c/a\u003e was added to Postgres in version 9.4, this powerful feature for capturing changes from the write-ahead log of the database has been continuously improved.\n\u003ca href=\"https://www.postgresql.org/about/news/postgresql-15-released-2526/\"\u003ePostgres 15\u003c/a\u003e, released in October this year, added support for fine-grained control over which columns (by means of \u003cem\u003ecolumn lists\u003c/em\u003e) and rows (via \u003cem\u003erow filters\u003c/em\u003e) should be exported from captured tables.\nThis means, in relational terminology, projections and filters are now natively supported by Postgres change event publications.\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","debezium","cdc"],"title":"Postgres 15: Logical Decoding Row Filters With Debezium","uri":"https://www.morling.dev/blog/postgres-15-logical-decoding-row-filters-with-debezium/"},{"content":" Table of Contents The Observation The Solution Take Away While working on a demo for processing change events from Postgres with Apache Flink, I noticed an interesting phenomenon: A Postgres database which I had set up for that demo on Amazon RDS, ran out of disk space. The machine had a disk size of 200 GiB which was fully used up in the course of less than two weeks.\nNow a common cause for this kind of issue are replication slots which are not advanced: in that case, Postgres will hold on to all WAL segments after the latest log sequence number (LSN) which was confirmed for that slot. Indeed I had set up a replication slot (via the Decodable CDC source connector for Postgres, which is based on Debezium). I then had stopped that connector, causing the slot to become inactive. 
The problem was, though, that I was really sure that there was no traffic in that database whatsoever! What could cause a WAL growth of ~18 GB/day then?\nWhat follows is a quick write-up of my investigations, mostly as a reference for my future self, but I hope this will come in handy for others in the same situation, too.\nThe Observation Let’s start with the observations I made. I don’t have the data and log files from the original situation any longer, but the following steps are enough to reproduce the issue. The first thing is to create a new Postgres database on Amazon RDS (I used version 14.5 on the free tier). Then get a session on the database and create a replication slot like this:\nSELECT * FROM pg_create_logical_replication_slot( \u0026#39;regression_slot\u0026#39;, \u0026#39;test_decoding\u0026#39;, false, true ); Now grab a coffee (or two, or three), and after some hours take a look into the metrics of the database in the RDS web console. \u0026#34;Free Storage Space\u0026#34; shows the following, rather unpleasant, picture:\nWe’ve lost more than two GB within three hours, meaning that the 20 GiB free tier database would run out of disk space within less than two days. Next, let’s take a look at the \u0026#34;Transaction Log Disk Usage\u0026#34; metric. It shows the problem in a very pronounced way:\nRoughly every five minutes, the transaction log of the database grows by 64 MB. The \u0026#34;Write IOPS\u0026#34; metric further completes this picture. Again, every five minutes something causes write IOPS in that idle database:\nNow let’s see whether our replication slot actually is the culprit. 
By looking at the difference between its restart LSN (the earliest LSN which the database needs to retain in order to allow for this slot to resume) and the database’s current LSN, we see how many bytes of WAL this slot prevents from being freed while it is inactive:\nSELECT slot_name, pg_size_pretty( pg_wal_lsn_diff( pg_current_wal_lsn(), restart_lsn)) AS retained_wal, active, restart_lsn FROM pg_replication_slots; +-----------------+----------------+----------+---------------+ | slot_name | retained_wal | active | restart_lsn | |-----------------+----------------+----------+---------------| | regression_slot | 2166 MB | False | 0/4A05AF0 | +-----------------+----------------+----------+---------------+ Pretty much exactly the size of the WAL we saw in the database metrics. The big question now is of course what is causing that growth of the WAL? Which process is adding 64 MB to it every five minutes? So let’s take a look at the active server processes in Postgres, using the pg_stat_activity view:\nSELECT pid AS process_id, usename AS username, datname AS database_name, client_addr AS client_address, application_name, backend_start, state, state_change FROM pg_stat_activity WHERE usename IS NOT NULL; +--------------+------------+-----------------+------------------+------------------------+-------------------------------+---------+-------------------------------+ | process_id | username | database_name | client_address | application_name | backend_start | state | state_change | |--------------+------------+-----------------+------------------+------------------------+-------------------------------+---------+-------------------------------| | 370 | rdsadmin | \u0026lt;null\u0026gt; | \u0026lt;null\u0026gt; | | 2022-11-30 11:11:03.424359+00 | \u0026lt;null\u0026gt; | \u0026lt;null\u0026gt; | | 468 | rdsadmin | rdsadmin | 127.0.0.1 | PostgreSQL JDBC Driver | 2022-11-30 11:12:02.517528+00 | idle | 
2022-11-30 14:15:05.601626+00 | | 14760 | postgres | decodabletest | www.xxx.yyy.zzz | pgcli | 2022-11-30 14:04:58.765899+00 | active | 2022-11-30 14:15:06.820204+00 | +--------------+------------+-----------------+------------------+------------------------+-------------------------------+---------+-------------------------------+ This is interesting: besides our own session (user postgres), there are also two other sessions by a user rdsadmin. As we don’t do any data changes ourselves, they must be somehow related to the WAL growth we observe.\nThe Solution At this point I had enough information to do some meaningful Google search, and I came across the blog post \u0026#34;Postgres Logical Replication and Idle Databases\u0026#34; by Byron Wolfman, who ran into the exact same issue as I did. As it turns out, RDS is periodically writing heartbeats into that rdsadmin database:\nIn RDS, we write to a heartbeat table in our internal \u0026#34;rdsadmin\u0026#34; database every 5 minutes\nThis is one part of the explanation: in our seemingly inactive RDS Postgres database, there actually is some traffic. But how is it possible that this heartbeat causes such a large amount of WAL growth? Surely those heartbeat events won’t be 64 MB large?\nAnother blog post hinted at the next bit of information: as of Postgres 11, the WAL segment size (i.e. the size of individual files making up the WAL) can be configured. On RDS, this is changed from the default of 16 MB to 64 MB. This sounds familiar!\nThat knowledge center post also led me to the last missing piece of the puzzle, the archive_timeout parameter, which defaults to five minutes on RDS. 
This is what the excellent postgresqlco.nf site has to say about this option:\nWhen this parameter is greater than zero, the server will switch to a new segment file whenever this amount of time has elapsed since the last segment file switch, and there has been any database activity …​ Note that archived files that are closed early due to a forced switch are still the same length as completely full files.\nAnd this finally explains why that inactive replication slot causes the retention of that much WAL on an idle database: there actually are some data changes made every five minutes in form of that heartbeat in the rdsadmin database. This in turn causes a new WAL segment of 64 MB to be created every five minutes. As long as that replication slot is inactive and doesn’t make any progress, all those WAL segments will be kept, (not so) slowly causing the database server to run out of disk space.\nTake Away The moral of the story? Don’t leave your replication slots unattended! There shouldn’t be any slots which are inactive for too long. For instance, you could set up an alert based on the query above which notifies you if some slot retains more than 100 MB of WAL. And of course you should monitor your free disk space, too.\nThat being said, you still might be in for a bad surprise: under specific circumstances, even an active replication slot can cause unexpected WAL retention. If for instance large amounts of changes are being made to one database but a replication slot has been set up for another database which doesn’t receive any changes, that slot still won’t be able to make any progress.\nA common solution to that scenario is inducing some sort of artificial traffic into the database, as for instance supported by the Debezium Postgres connector. 
Note that this doesn’t even require a specific table; periodically writing a message just to the WAL using pg_logical_emit_message() is enough:\nSELECT pg_logical_emit_message(false, \u0026#39;heartbeat\u0026#39;, now()::varchar); If you use a logical decoding plug-in which supports logical replication messages (like pgoutput since Postgres 14), then that’s all that’s needed for letting your replication slot advance within an otherwise idle database.\n","id":112,"publicationdate":"Nov 30, 2022","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_observation\"\u003eThe Observation\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_solution\"\u003eThe Solution\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_take_away\"\u003eTake Away\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhile working on a demo for processing change events from Postgres with Apache Flink,\nI noticed an interesting phenomenon:\nA Postgres database which I had set up for that demo on Amazon RDS, ran out of disk space.\nThe machine had a disk size of 200 GiB which was fully used up in the course of less than two weeks.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eNow a common cause for this kind of issue are replication slots which are not advanced:\nin that case, Postgres will hold on to all WAL segments after the latest log sequence number (\u003ca href=\"https://pgpedia.info/l/LSN-log-sequence-number.html\"\u003eLSN\u003c/a\u003e) which was confirmed for that slot.\nIndeed I had set up a replication slot (via the \u003ca href=\"https://www.decodable.co/connectors/postgres-cdc\"\u003eDecodable CDC source connector for Postgres\u003c/a\u003e, which is based on \u003ca 
href=\"https://debezium.io\"\u003eDebezium\u003c/a\u003e).\nI then had stopped that connector, causing the slot to become inactive.\nThe problem was though that I was really sure that there was no traffic in that database whatsoever!\nWhat could cause a WAL growth of ~18 GB/day then?\u003c/p\u003e\n\u003c/div\u003e","tags":["postgres","cdc","troubleshooting"],"title":"The Insatiable Postgres Replication Slot","uri":"https://www.morling.dev/blog/insatiable-postgres-replication-slot/"},{"content":"","id":113,"publicationdate":"Nov 28, 2022","section":"tags","summary":"","tags":null,"title":"deployment","uri":"https://www.morling.dev/tags/deployment/"},{"content":"","id":114,"publicationdate":"Nov 28, 2022","section":"tags","summary":"","tags":null,"title":"quarkus","uri":"https://www.morling.dev/tags/quarkus/"},{"content":" Table of Contents Java Applications on Render Configuration Details Docker Hub Access Token GitHub Actions This is a quick run down of the steps required for running JVM applications, built using Quarkus and GraalVM, on Render.\nRender is a cloud platform for running websites and applications. Like most other comparable services such as fly.io, it offers a decent free tier, which lets you try out the service without any financial commitment. Unlike most other services, with Render, you don’t need to provide a credit card in order to use the free tier. Which means there’s no risk of surprise bills, as often is the case with pay-per-use models, where a malicious actor could DDOS your service and drive up cost for consumed CPU resources or egress bandwidth indefinitely.\nIf the free tier limits are reached (see Free Plans for details), your services are shut down, until you either upgrade to a paid plan or the next month has started. 
This makes Render particularly interesting for personal projects and hobbyist use cases, for which you typically don’t have ops staff around who are looking 24/7 at dashboards and budget alerts and could take down the service in case of a DDOS attack.\nJava Applications on Render Render offers a PaaS-like model: when configuring an application, you point Render to a Git repository with your source code, and the platform will build and deploy it after each push to that repo. Unfortunately, Java is not amongst the supported languages right now. But Render also allows you to deploy applications via Docker, so that’s what we’ll use.\nAs an example project, I have created a very basic Quarkus-based web service. It is generated using code.quarkus.io and contains a single /hello REST endpoint. To make the best use of the resources of the constrained free tier, it is compiled into a native application using GraalVM. That way, it consumes way less memory than when running on the JVM. Feel free to use it for your own experiments.\nRender always builds deployed applications from source, i.e. there is no way for deploying a ready-made container image from a registry like Docker Hub. Now we could build our application using Docker on Render, but I have decided against that for two reasons:\nIt’s quite slow: the free tier allocates a rather limited CPU quota to build jobs, so building the container image for that simple Quarkus application takes more than ten minutes\nI like to have my application images in a container image registry, which for instance allows me to run exactly the same bits locally for debugging purposes\nIf you still would like to build a container image for your application directly on Render, check out the Quarkus documentation on multi-stage Docker builds. 
It describes how to build a Quarkus application within Docker, which is what you need to do in the absence of bespoke support for Java on Render.\nSo I ended up with the following flow for deploying that Quarkus application on Render:\nWhen a commit is pushed to the source repository (1), then a GitHub Action is triggered (2), which builds the application as a native binary, using GraalVM’s native-image tool. The resulting binary is packaged up as a container image, which is deployed to the Docker Hub registry (3). Once the image has been uploaded, a new deployment is triggered on Render (4). The deploy job fetches the container image from Docker Hub and builds the actual image for deployment (5), and finally the service is published to the outside world (6).\nConfiguration Details Now let’s dive into some specifics of the configuration on Render and GitHub. Once you have signed up for your Render account, go to the main dashboard and click the \u0026#34;New +\u0026#34; button for creating a new \u0026#34;Web Service\u0026#34;.\nYou then have two options: \u0026#34;Connect a repository\u0026#34; and \u0026#34;Public Git repository\u0026#34;. The former makes things a bit simpler to use, for instance by configuring all the webhook magic required for a tight integration between GitHub (or GitLab) and Render. It requires more permissions than I’m comfortable with though, one of them being \u0026#34;Act on your behalf\u0026#34;. So my recommendation is to go with the second option; it requires some more manual configuration, but it feels a bit safer to me. 
Specify the URL of your repository and click \u0026#34;Continue\u0026#34;:\nOn the next page, enter the following information:\nName: A unique name for your new application\nRegion: Choose where your application should be deployed\nEnvironment: Choose Docker here, then the \u0026#34;Free\u0026#34; plan\nDockerfile Path (under \u0026#34;Advanced\u0026#34;): Specify ./src/main/docker/Dockerfile.render; this is a very simple Dockerfile which has the sole purpose of letting Render build an image for deployment; it is simply derived from the actual image with the application which is deployed to Docker Hub:\nFROM gunnarmorling/quarkus-on-render:latest Deploy Hook: Note down this generated URL; you will need it later when configuring the deployment trigger with GitHub Actions\nDocker Hub Access Token Next, create an access token for Docker Hub. This will be used for authenticating the GitHub Action when pushing an image to Docker Hub. Log in to your Docker Hub account, click on your name at the upper right corner and choose \u0026#34;Account Settings\u0026#34;. Go to \u0026#34;Security\u0026#34; and click on \u0026#34;New Access Token\u0026#34;.\nSpecify a description for the token and choose \u0026#34;Read \u0026amp; Write\u0026#34; for its access permissions. On the next screen, make sure to copy the generated token, as this is the only time you will be able to see it.\nGitHub Actions The last part of the puzzle is setting up a GitHub Action which builds the application, pushes the container image with the application to Docker Hub and triggers a new deployment on Render. 
Navigate to your repository, click on the \u0026#34;Settings\u0026#34; tab and choose \u0026#34;Security\u0026#34; → \u0026#34;Secrets\u0026#34; → \u0026#34;Actions\u0026#34;.\nSet up the following three repository secrets:\nDOCKERHUB_TOKEN: The access token you just generated on Docker Hub\nDOCKERHUB_USERNAME: Your Docker Hub account name\nRENDER_DEPLOY_HOOK: The deploy hook URL from Render\nThese secrets will be used in the GitHub Action. The Action itself is a big wall of YAML, but most of it should be fairly self-explanatory:\nname: ci on: push: branches: - \u0026#39;main\u0026#39; jobs: docker: runs-on: ubuntu-latest steps: - name: \u0026#39;Check out repository\u0026#39; (1) uses: actions/checkout@v3 - uses: graalvm/setup-graalvm@v1 (2) with: version: \u0026#39;latest\u0026#39; java-version: \u0026#39;17\u0026#39; components: \u0026#39;native-image\u0026#39; github-token: ${{ secrets.GITHUB_TOKEN }} - name: \u0026#39;Cache Maven packages\u0026#39; uses: actions/cache@v3.0.11 with: path: ~/.m2 key: ${{ runner.os }}-m2-${{ hashFiles(\u0026#39;**/pom.xml\u0026#39;) }} restore-keys: ${{ runner.os }}-m2 - name: \u0026#39;Build\u0026#39; (3) run: \u0026gt; ./mvnw -B --file pom.xml verify -Pnative -Dquarkus.native.additional-build-args=-H:-UseContainerSupport - name: Set up QEMU uses: docker/setup-qemu-action@v2 - name: Set up Docker Buildx (4) uses: docker/setup-buildx-action@v2 - name: Login to Docker Hub uses: docker/login-action@v2 with: username: ${{ secrets.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_TOKEN }} - name: Build and push (5) uses: docker/build-push-action@v3 with: context: . 
push: true file: src/main/docker/Dockerfile.native tags: gunnarmorling/quarkus-on-render:latest - name: Deploy (6) uses: fjogeleit/http-request-action@v1 with: url: ${{ secrets.RENDER_DEPLOY_HOOK }} method: \u0026#39;POST\u0026#39; 1 Retrieve the source code of the application 2 Install GraalVM and its native-image tool 3 Build the project; the -Pnative build option instructs Quarkus to emit a native binary via GraalVM; more on the need for the -H:-UseContainerSupport option further below 4 Install Docker and log in to Docker Hub 5 Build the container image and push it to Docker Hub; the Dockerfile used is the one generated by the Quarkus project creation wizard on code.quarkus.io; it packages the native binary into an image based on the ubi-minimal base image: FROM registry.access.redhat.com/ubi8/ubi-minimal:8.6 WORKDIR /work/ RUN chown 1001 /work \\ \u0026amp;\u0026amp; chmod \u0026#34;g+rwX\u0026#34; /work \\ \u0026amp;\u0026amp; chown 1001:root /work COPY --chown=1001:root target/*-runner /work/application EXPOSE 8080 USER 1001 CMD [\u0026#34;./application\u0026#34;, \u0026#34;-Dquarkus.http.host=0.0.0.0\u0026#34;] Note that setting the build context to . is vital in order to actually package the binary produced by the previous build step; without this, the Docker action would check out a fresh copy of the source repository itself\n6 Trigger a new deployment of the application on Render by invoking the deploy hook You can find the original YAML file here in my example repository. In fact, I am quite impressed by how powerful GitHub Actions is, thanks to ready-made actions for interacting with Docker, setting up GraalVM, invoking HTTP endpoints, and more.\nOne thing which deserves a special mention is the need for specifying the -H:-UseContainerSupport option when invoking the native-image tool via Quarkus. 
This is a work-around for GraalVM bug #4757, which triggers an exception upon invocation of the method java.lang.Runtime::availableProcessors(). It seems the GraalVM code stumbles upon cgroup paths containing a colon, which apparently is the case in the Docker environment on Render (a similar bug, JDK-8272124, has been fixed in OpenJDK recently).\nDisabling the container support when building the application circumvents this issue, but the solution is not ideal: when determining the number of available processors, any CPU quotas applied to the container will not be considered; instead, the number of cores of the host system will be returned (8 in the case of Render, as per a quick test I did). This causes thread pools in the application, like the common fork-join pool, to be sized larger than necessary, potentially resulting in performance degradations at runtime. So let’s hope that issue in GraalVM will be fixed soon.\nAnd that’s all there is to it: at this point, you should have all the configuration in place for running a Java application, compiled into a native binary using Quarkus and GraalVM, on the Render cloud platform. Whenever you push a commit to the source repository, a new version of the application will be built, pushed as a container image to Docker Hub, and deployed on Render. The end-to-end execution time for that flow is ca. five minutes, about twice as fast as when building everything on Render itself. 
To further improve build times, you’d have to invest in beefier build machines; while compilation times with GraalVM have improved quite a bit over the last few years, it’s still a rather time-consuming experience.\nCheck out my repository on GitHub for the complete source code of the example application, with GitHub Actions definition, Maven POM file, etc.\n","id":115,"publicationdate":"Nov 28, 2022","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_java_applications_on_render\"\u003eJava Applications on Render\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_configuration_details\"\u003eConfiguration Details\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_docker_hub_access_token\"\u003eDocker Hub Access Token\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_github_actions\"\u003eGitHub Actions\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis is a quick run down of the steps required for running JVM applications,\nbuilt using Quarkus and GraalVM, on Render.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://render.com/\"\u003eRender\u003c/a\u003e is a cloud platform for running websites and applications.\nLike most other comparable services such as \u003ca href=\"https://fly.io/\"\u003efly.io\u003c/a\u003e,\nit offers a decent free tier, which lets you try out the service without any financial commitment.\n\u003cem\u003eUnlike\u003c/em\u003e most other services,\nwith Render, you don’t need to provide a credit card in order to use the free tier.\nWhich means there’s no risk of surprise bills, as often is the case with pay-per-use models,\nwhere a malicious actor could DDOS your service and drive up cost for consumed CPU resources or egress bandwidth 
indefinitely.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","quarkus","graalvm","deployment"],"title":"Running a Quarkus Native Application on Render","uri":"https://www.morling.dev/blog/running-quarkus-native-app-on-render/"},{"content":"","id":116,"publicationdate":"Nov 3, 2022","section":"tags","summary":"","tags":null,"title":"career","uri":"https://www.morling.dev/tags/career/"},{"content":" Table of Contents The Space: Real-Time Stream Processing The Environment: A Start-up The Team: One of a Kind Outlook It’s my first week as a software engineer at Decodable, a start-up building a serverless real-time data platform! When I shared this news on social media yesterday, folks were not only super supportive and excited for me (thank you so much for all the nice words and wishes!), but some also asked about the reasons behind my decision for switching jobs and going to a start-up, after having worked for Red Hat for the last few years. That’s a great question indeed, and I thought I’d put down some thoughts in this post. To me, it boils down to three key aspects: the general field of work, the environment, and the team. In the following, I’ll drill a bit further into each of them.\nThe Space: Real-Time Stream Processing Over the last five years, I’ve worked on Debezium, a popular open-source platform for change data capture (CDC). It retrieves change events from the transaction logs of databases such as MySQL and Postgres and emits them in a uniform event format to consumers via data streaming platforms like Apache Kafka, Pulsar or Amazon Kinesis. Reacting to low-latency change events enables all kinds of very interesting use cases, ranging from replication to other databases or (cloud) data warehouses, over updating caches and search indexes, to continuous queries over your operational data, or migrating monolithic architectures to microservices.\nNow, CDC is an important part of data pipelines for implementing such use cases, but it’s not the only one. 
You need to reason about the sink side of your pipelines and how to get your data from the streaming platform into your target system. There’s many critical questions there, such as: How do you propagate type metadata? How do you handle changes to the schema of your data? How to deal with duplicate events? Another concern is processing data, as it flows through your pipelines; you might want to filter records based on specific criteria and patterns, apply format conversions, group and aggregate data, join multiple data streams, and more. Lastly, there’s many other kinds of data sources besides CDC, such as sensor data in IoT scenarios, click streams from websites, APIs, and more.\nThis all is to say that I am really excited about the chance to take things to the next level and explore the field of stream processing at large, helping people to implement their streaming use cases end-to-end. Very often, once people have implemented their first low-latency data streaming use case and for instance observe data in their DWH within a second or two after a change has occurred in their operational database, there’s no going back, and they want this for everything. Of course it’s impossible to predict the future, but I think stream processing is at this point on the famous \u0026#34;hockey stick\u0026#34; curve right before it’s massively taking off, and it’s the perfect time to join this space.\nFrom a technical perspective, Apache Flink, an open-source \u0026#34;processing engine for stateful computations over unbounded and bounded data streams\u0026#34; is an excellent and proven foundation for this, and it’s a core technology behind Decodable. Getting my feet wet with Flink, learning about it in detail and hopefully contributing to it is one of the first things I’m planning to do. 
At the same time, I also think there’s lots of potential for further improving the user experience here, for instance by processing CDC events in a transactional way, smoothly handling schema changes, and much more. Exciting times!\nThe Environment: A Start-up Up to this point, I have mostly worked at large, established companies and enterprises during my career. Red Hat, where I’ve been for the last ten years, grew to more than 20,000 employees during that time. Other places, like the German e-commerce giant Otto Group, had even larger workforce sizes of 50,000 people and more.\nAs with everything, being with such a large company has its pros and cons. On the upside, it’s a relatively safe bet, there’s brand recognition, and you can approach and tackle huge undertakings as part of a big organization. At the same time, there tends to be quite a bit of process overhead, things can take a long time, there can be lots of politics, you need approval and buy-in for many things, etc. Note I am not saying that any of this is necessarily a bad thing (Ok, doing your travel expenses just sucks. Period.); lots of it makes sense and is simply a reality in a large organization.\nAll that being said, I felt that I wanted to gather some experience in a small environment, in a start-up company. I want to find out how it is to work in this kind of setting, being part of a small, hyper-focused team, working jointly towards one common goal and shared vision. Coming up with ideas, giving them a try, seeing what flies, and what doesn’t. Putting something minimal yet useful out there and quickly gathering user feedback. Having a good sense for your own impact. Seeing how the company grows and evolves. 
That’s the kind of sensation I am looking for and which I am hoping to find by working at Decodable.\nI could experience a first taste of the agility even before my first day at the company: \u0026#34;Would you feel comfortable to just buy a laptop of your choosing by yourself and expense it?\u0026#34; Sure thing! Some clicks and a few days later I had a very nice MacBook delivered to my doorstep. If you’ve been at bigger organizations, you’ll know how complicated such seemingly simple things like getting a new laptop can be.\nAt the same time, judging by my impressions during interviews, Decodable is a very mature start-up. Most folks have lots of experience, they are senior, in a very positive sense. Sure, there’s a ton of things on our plates, but there’s no expectation to work crazy hours. Many people here have families, and there’s a very healthy culture where it’s just normal that people have unforeseen situations where they need to pick up their kids on short notice, things like that. People are treated as the grown-ups they are, with lots of autonomy and trust by the leadership. Another key aspect for me is transparency: it’s one of the company’s core values, so everyone has the chance to know what’s going on (technically, business-wise, etc.), which gave me lots of confidence and trust when making the decision to join the team.\nThe Team: One of a Kind One of the clichés in the industry is: \u0026#34;It’s all about the people\u0026#34;. And yes, it is a cliché, but I’m also 100% convinced that it is true. You could work on the most amazing piece of technology, but if you don’t get along with the people around you, it won’t be an awful lot of fun. Or rather, it could be really bad.\nSo getting a vibe for the team and the people at Decodable was one of the most important things to me when I interviewed with them. And all I can say is that I was really impressed. 
Starting with the founder and CEO Eric Sammer, I had the opportunity to speak with about one third of the company’s employees during the interviewing process (talking to everyone is one of my personal onboarding goals, when do you ever get that chance?). I loved the passion, but also the respectfulness and sincereness of everyone. Needless to say that I’m deeply impressed with what the team has accomplished so far, since Decodable launched last year. I experienced Eric as a very considerate and mindful person, caring deeply about the concerns of the company’s employees. Plus, not only is he a legend in the data space, he’s also super well connected within Silicon Valley, opening up lots of doors for the company. Decodable being his second start-up will surely help us to avoid many mistakes.\nIn regards to the hiring process itself, it could probably be a topic for a separate blog post. The experience was nothing but excellent, with everyone being very open and transparent, willing to answer any questions I had. It really wasn’t that much of a series of interviews, but rather really good two-way conversations which helped us to get to know each other and find out whether I would be a good fit for Decodable, and whether the company would be a good fit for me. All in all, I very quickly had a feeling that this is a group of people I want to work with. I’m sure the direction of the company and the product can and will be adjusted over time, but this is a team I can’t wait to work with to make this a success.\nOutlook So those are the three key reasons which made me join Decodable: the exciting field of data streaming, the start-up environment, and a highly competent and friendly team.\nIn case you’re wondering what exactly I will be doing — that’s something we’re still figuring out. I am a member of the engineering organization, so I will get my fingers onto Apache Flink, but of course also on Decodable’s SaaS product around it. 
But I’m also planning to continue my fair share of evangelization work and talk about technology and its applications in blog posts or conference sessions. I hope to share my input on the product, be part of customer conversations, and much more. For the beginning, I’ll mostly focus on learning and sharing feedback based on my perspective of being the \u0026#34;new guy\u0026#34; on the team.\nFully adhering to the start-up spirit, I’m sure things will be very much in flux and my responsibilities will shift over time. But that dynamic is exactly what I’m looking for by joining Decodable. Let’s do this!\n","id":117,"publicationdate":"Nov 3, 2022","section":"blog","summary":"Table of Contents The Space: Real-Time Stream Processing The Environment: A Start-up The Team: One of a Kind Outlook It’s my first week as a software engineer at Decodable, a start-up building a serverless real-time data platform! When I shared this news on social media yesterday, folks were not only super supportive and excited for me (thank you so much for all the nice words and wishes!), but some also asked about the reasons behind my decision for switching jobs and going to a start-up, after having worked for Red Hat for the last few years.","tags":["career","streaming","flink"],"title":"Why I Joined Decodable","uri":"https://www.morling.dev/blog/why-i-joined-decodable/"},{"content":" Table of Contents Standalone or Distributed? Issues with Kafka Connect on Kubernetes A Vision for Kubernetes-native Kafka Connect Kafka Connect, part of the Apache Kafka project, is a development framework and runtime for connectors which either ingest data into Kafka clusters (source connectors) or propagate data from Kafka into external systems (sink connectors). 
A diverse ecosystem of ready-made connectors has come to life on top of Kafka Connect, which lets you connect all kinds of data stores, APIs, and other systems to Kafka in a no-code approach.\nWith the continued move towards running software in the cloud and on Kubernetes in particular, it’s just natural that many folks also try to run Kafka Connect on Kubernetes. On first thought, this should be simple enough: just take the Connect binary and some connector(s), put them into a container image, and schedule it for execution on Kubernetes. As so often, the devil is in the details though: should you use Connect’s standalone or distributed mode? How can you control the lifecycle of specific connectors via the Kubernetes control plane? How to make sure different connectors don’t compete unfairly on resources such as CPU, RAM, or network bandwidth? In the remainder of this blog post, I’d like to explore running Kafka Connect on Kubernetes, what some of the challenges are for doing so, and how Kafka Connect could potentially be reimagined to become more \u0026#34;Kubernetes-friendly\u0026#34; in the future.\nStandalone or Distributed? If you’ve used Kafka Connect before, then you’ll know that it has two modes of execution: standalone and distributed. In the former, you configure Connect via property files which you pass as parameters during launch. There will be a single process which executes all the configured connectors and their tasks. In distributed mode, multiple Kafka Connect worker nodes running on different machines form a cluster onto which the workload of the connectors and their tasks is distributed. Configuration is done via a REST API which is exposed on all the worker nodes. 
Internally, a Connect-specific protocol (which itself is based on Kafka’s group membership protocol) is used for the purposes of coordination and task assignment.\nThe distributed mode is in general the preferred and recommended mode of operating Connect in production, due to its obvious advantages in regards to scalability (one connector can spawn many tasks which are executed on different machines), reliability (connector configuration and offset state are stored in Kafka topics rather than files in the local file system), and fault tolerance (if one worker node crashes, the tasks which were scheduled on that node can be transparently rebalanced to other members of the Connect cluster).\nThat’s why Kafka users on Kubernetes also typically opt for Connect’s distributed mode, as is the case, for instance, with Strimzi’s operator for Kafka Connect. But that’s not without its issues either, as now essentially two scheduling systems are competing with each other: Kubernetes itself (scheduling pods to compute nodes), and Connect’s worker coordination mechanism (scheduling connector tasks to Connect worker nodes). This becomes particularly apparent in case of node failures. Should the Kubernetes scheduler spin up the affected pods on another node in the Kubernetes cluster, or should you rely on Connect to schedule the affected tasks to another Connect worker node? Granted, improvements in this area have been made, for instance in the form of Kafka improvement proposal KIP-415 (\u0026#34;Incremental Cooperative Rebalancing in Kafka Connect\u0026#34;). It adds a new configuration property scheduled.rebalance.max.delay.ms, allowing you to defer rebalances after worker failures. 
But such a setting will always be a trade-off, and I think in general it’s fair to say that if there’s multiple components in a system which share the same responsibility (placement of workloads), that’s likely going to be a friction point.\nIssues with Kafka Connect on Kubernetes So let’s explore a bit more the challenges users often encounter when running Kafka Connect on Kubernetes. One general problem is the lack of awareness for running on Kubernetes from a Connect perspective.\nFor instance, consider the case of a stretched Kubernetes cluster, with Kubernetes nodes running in different regions of a cloud provider, or within different data centers. Let’s assume you have a source connector which ingests data from a database running within one of the regions. As you’re only interested in a subset of the records produced by that connector, you use a Kafka Connect single message transformation for filtering out a significant number of records. In that scenario, it makes sense to deploy that connector in local proximity to the database it connects to, so as to limit the data that’s transferred across network boundaries. But Kafka Connect doesn’t have any understanding of \u0026#34;regions\u0026#34; or related Kubernetes concepts like node selectors or node pools, i.e. you’ll lack the control needed for making sure that the tasks of that connector get scheduled onto Connect worker nodes running on the right Kubernetes nodes (a mitigation strategy would be to set up multiple Connect clusters, tied to specific Kubernetes node pools in the different regions).\nA second big source of issues is Connect’s model for the deployment of connectors, which in a way resembles the approach taken by Java application servers in the past: multiple, independent connectors are deployed and executed in shared JVM processes. 
This results in a lack of isolation between connectors, which can have far-reaching consequences in production scenarios:\nConnectors compete on resources: one connector or task can use up an unfairly large share of CPU, RAM or networking resources assigned to a pod, so that other connectors running on the same Connect worker will be negatively impacted; this could be caused by bugs or poor programming, but it also can simply be a result of different workload requirements, with one connector requiring more resources than others. While a rate limiting feature for Connect is being proposed via KIP-731 (which may eventually address the issue of distributing network resources more fairly), there’s no satisfying answer for assigning and limiting CPU and RAM resources when running multiple connectors on one shared JVM, due to its lack of application isolation.\nScaling complexities: when increasing the number of tasks of a connector (so as to scale out its load), it’s likely also necessary to increase the number of Connect workers, unless there were idle workers before; this process seems more complex and at the same time less powerful than it should be. For instance, there’s no way for ensuring that additional worker nodes would exclusively be used for the tasks of one particularly demanding connector.\nSecurity implications: as per the OpenJDK Vulnerability Group, \u0026#34;speculative execution vulnerabilities (e.g., Meltdown, Spectre, and RowHammer) cannot be addressed in the JDK. These hardware design flaws make complete intra-process isolation impossible\u0026#34;. Malicious connectors could leverage these attack vectors for instance to obtain secrets from other connectors running on the same JVM. 
Furthermore, some connectors rely on secrets (such as cloud SDK credentials) to be provided in the form of environment variables or Java system properties, which by definition are accessible by all connectors scheduled on the same Connect worker node.\nRisk of resource leaks: Incorrectly implemented connectors can cause memory and thread leaks after they were stopped, resulting in out-of-memory errors after stopping and restarting them several times, potentially impacting other connectors and tasks running on the same Connect worker node.\nCan’t use Kubernetes health checks: as health checks (such as liveness probes) work on the container level, a failed health check would restart the container, and thus the Connect worker node with all its connectors, even if only one connector is actually failing. On the other hand, when relying on Connect itself to restart failed connectors and/or tasks, that’s not visible at the level of the Kubernetes control plane, potentially resulting in a false impression of a good health status of a connector, while it actually is in a restarting loop.\nCan’t easily examine logs of a single connector: When examining the logs of a Kafka Connect pod, messages from multiple running connectors will potentially show up in an interleaved way, depending on the specific logger configurations; as log messages can be prefixed with the connector name, that’s not that much of an issue when analyzing logs in dedicated tools like Logstash or Splunk, but it can be challenging when looking at the raw pod logs on the command line or via a Kubernetes web console.\nCan’t run multiple versions of one connector: As connectors are solely identified by their class name, it’s not possible to set up a connector instance of a specific version in case there are multiple versions of that connector present.\nLastly, a third category of issues with running Connect on Kubernetes stems from the inherently mutable design of the system and the ability to dynamically 
instantiate and reconfigure connectors at runtime via a REST API.\nWithout proper discipline, this can quickly lead to a lack of insight into the connector configuration applying at a given time (in Strimzi, this is solved by preferably deploying connectors via custom Kubernetes resources, rather than invoking the REST API directly). In fact, the REST API itself can be a source of issues: access to it needs to be secured in production use cases. Also, I’ve come across multiple reports over the years (and witnessed myself) where the REST API became unresponsive, while Connect itself still was running. It’s not exactly clear why this happened, but one potential cause could be a buggy connector consuming 100% of CPU cycles, leaving not enough resources for the REST API worker threads. Essentially, I think that a control plane element like a REST API shouldn’t really be exposed on each member of a data plane, as represented by Connect worker nodes.\nBased on all these challenges, in particular those around lacking isolation between different connectors, many users of Kafka Connect stick to the practice of not deploying multiple connectors into shared worker clusters, but instead operating a dedicated Kafka Connect cluster for each connector. This could be a cluster with a node count equal to the configured number of tasks, essentially resulting in a 1:1 mapping of tasks to worker processes. Some users also deploy a number of spare workers for fail-over purposes. In fact, that’s the recommendation we’ve been giving to users in the Debezium community for a long time, and it also tends to be a common choice amongst providers of managed Kafka Connect services. 
Another approach taken by some teams is to deploy specific Connect clusters per connector type, preventing interference between different kinds of connectors.\nAll these strategies can help to run connectors safely and reliably, but the operational overhead of running multiple Connect clusters is evident.\nA Vision for Kubernetes-native Kafka Connect Having explored the potential issues with running Kafka Connect on Kubernetes, let’s finally discuss how Connect could be reimagined to be more Kubernetes-friendly. What are the parts that could remain? Which things would have to change? Many of the questions and shortcomings raised above — such as workload isolation, applying resource constraints, capability-based scheduling, lifecycle management — have been solved by Kubernetes at the pod level already, so how could that foundation be leveraged for Kafka Connect?\nTo put a disclaimer first: this part of this post may be a bit dissatisfying to read for some, as it merely describes an idea; I haven’t actually implemented any of this. My line of thinking is to hopefully ignite a discussion in the community and gauge the general level of interest, perhaps even motivating someone in the community to follow through and make this a reality. At least, that’s the plan :)\nThe general idea is to keep all the actual runtime bits and pieces of Connect: that’s key to being able to run all the amazing existing connectors out there, which are implemented against Connect’s framework interfaces. All the semantics and behaviors, like converters and SMTs, retries, dead-letter queue support, the upcoming exactly-once support for source connectors (KIP-618), all that could just be used as is.\nBut the entire layer for forming and coordinating clusters of worker nodes and distributing tasks amongst them would be replaced by a Kubernetes operator. 
To quote the official docs, \u0026#34;operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop\u0026#34;. The overall architecture would look like this:\nIn this envisioned model for Kafka Connect, such an operator would spin up one separate Kubernetes pod (and thus JVM process) for each task of a connector. Conceptually, those task processes would be somewhat of a mixture between today’s Connect standalone and distributed modes. Like standalone mode in the sense that there would be no coordination amongst worker nodes and also no capability to dynamically reconfigure or start and stop a running task; each process/pod would run exactly one task in isolation, coordinated by the operator. Similar to distributed mode in the sense that there would be a read-only REST API for health information, and that connector offsets would be stored in a Kafka topic, so as to avoid any pod-local state. There wouldn’t be the need for the configuration topic though, as the configuration would be passed upon start-up to the task pods (again akin to standalone mode today, e.g. by mapping a properties file to the pod), with the custom Kubernetes resources defining the connectors being the \u0026#34;system of record\u0026#34; for their configuration.\nFor this to work, the connector configuration needs to be pre-sliced into task-specific chunks. This could happen in two different ways, depending on the implementation of the specific connectors. For connectors which have a static set of tasks which doesn’t change at runtime (that’s the case for the Debezium connectors, for instance), the operator would deploy a short-lived pod on the Kubernetes cluster which runs the actual Connector implementation class and invokes its taskConfigs(int maxTasks) method. This could be implemented using a Kubernetes job, for instance. 
Once the operator has received the result (a list with one configuration map per task), the connector pod can be stopped again and the operator will deploy one pod for each configured task, passing its specific configuration to the pod.\nThings get a bit more tricky if connectors dynamically change the number and/or configuration of tasks at runtime, which also is possible with Connect. For instance, that’s the case for the MirrorMaker 2 connector. Such a connector typically spins up a dedicated thread upon start-up which monitors some input resource. If that resource’s state changes (say, a new topic to replicate gets detected by MirrorMaker 2), it invokes the ConnectorContext::requestTaskReconfiguration() method, which in turn lets Connect retrieve the task configuration from the connector. This requires a permanently running pod for that connector class. Right now, there’d be no way for the operator to know whether that connector pod can be short-lived (static task set) or must be long-lived (dynamic task set). Either Connect itself would define some means of metadata for connectors to declare that information, or it could be part of the Kubernetes custom resource for a connector described in the next section.\nThe configuration of connectors would happen — the Kubernetes way — via custom resources. This could look rather similar to how Connect and connectors are deployed via CRs with Strimzi today; the only difference being that there’d be one CR which describes both Connect (and the resource limits to apply, the connector archive to run) and the actual connector configuration. 
Here’s an example of how that could look (again, that’s only a sketch of such a CR; this won’t work with Strimzi right now):\napiVersion: kafka.strimzi.io/v1beta2 kind: KafkaConnector metadata: name: debezium-connect-cluster spec: version: 3.2.0 bootstrapServers: debezium-cluster-kafka-bootstrap:9092 config: config.providers: secrets config.providers.secrets.class: io.strimzi.kafka.KubernetesSecretConfigProvider group.id: connect-cluster offset.storage.topic: connect-cluster-offsets config.storage.topic: connect-cluster-configs status.storage.topic: connect-cluster-status connector: class: io.debezium.connector.mysql.MySqlConnector tasksMax: 1 database.hostname: mysql database.port: 3306 database.user: ${secrets:debezium-example/debezium-secret:username} database.password: ${secrets:debezium-example/debezium-secret:password} database.server.id: 184054 database.server.name: mysql database.include.list: inventory database.history.kafka.bootstrap.servers: debezium-cluster-kafka-bootstrap:9092 database.history.kafka.topic: schema-changes.inventory build: output: type: docker image: 10.110.154.103/debezium-connect-mysql:latest plugins: - name: debezium-mysql-connector artifacts: - type: tgz url: https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/1.9.0.Final/debezium-connector-mysql-1.9.0.Final-plugin.tar.gz The operator would react to the creation, modification, or deletion of this resource, retrieve the (initial) task configuration as described above and spin up corresponding connector and task pods. 
To stop or restart a connector or task, the user would update the resource state accordingly, upon which the operator would stop and restart the affected pod(s).\nSuch an operator-based design addresses all the concerns for running Connect on Kubernetes identified above:\nOnly one component in charge of workload distribution: by removing Connect’s own clustering layer from the picture, the scheduling of tasks to compute resources is completely left to one component, the operator; it will determine the number and configuration of tasks to be executed and schedule a pod for each of them; regular health checks can be used for monitoring the state of each task, restarting failed task pods as needed; a degraded health state should be exposed if a connector task is in a retrying loop, so as to make this situation apparent at the Kubernetes level; if a pod crashes, it can be restarted by the operator on the same or another node of the Kubernetes cluster, not requiring any kind of task rebalancing from a Connect perspective. Node selectors could be used to pin a task to specific node groups, e.g. in a specific region or availability zone.\nOne JVM process and Kubernetes pod per task: by launching each task in its own process, all the isolation issues discussed above can be avoided, preventing multiple tasks from negatively impacting each other. If needed, Kubernetes resource limits can be put in place in order to effectively cap the resources available to one particular task, such as CPU and RAM, while also allowing to schedule all the task pods tightly packed onto the compute nodes, making efficient use of the available resources. As each process runs exactly one task, log files are easy to consume and analyze. Scaling out can happen by increasing a single configuration parameter in the CR, and a corresponding number of task pods will be deployed by the operator. 
Thread leaks become a non-issue too, as there would be no notion of stopping or pausing a task; instead, just the pod itself would be stopped for that purpose, terminating the JVM process running inside of it. On the downside, the overall memory consumption across all the tasks would be increased, as there would be no amortization of Connect classes loaded into JVM processes shared by multiple tasks. Considering the significant advantages of process-based isolation, this seems like an acceptable trade-off, just as Java application developers largely have moved on from the model of co-deploying several applications into shared application server instances.\nImmutable design: by driving configuration solely through Kubernetes resources and passing the resulting Connect configuration as parameters to the Connect process upon start-up, there’s no need for exposing a mutating REST API (there’d still be a REST endpoint exposing health information), making things more secure and potentially less complex internally, as the entire machinery for pausing/resuming, dynamically reconfiguring and stopping tasks could be removed. At any time, a connector’s configuration would be apparent by examining its CR, which ideally should be sourced from an SCM (GitOps).\nLooking further out into the future, such a design for making Kafka Connect Kubernetes-native would also allow for other, potentially very interesting explorations: for instance one could compile connectors into native binaries using GraalVM, resulting in a significantly lower consumption of memory and faster start-up times (e.g. when reconfiguring a connector and subsequently restarting the corresponding pod), making that model very interesting for densely packed Kubernetes environments. 
A buildtime toolkit like Quarkus could be used for producing specifically tailored executables, which run exactly one single connector task on top of the Connect framework infrastructure, a bit similar to how Camel-K works under the hood. Ultimately, such Kubernetes-native design could even open up the door to Kafka connectors being built in languages and runtimes other than Java and the JVM, similar to the route explored by the Conduit project.\nIf you think this all sounds exciting and should become a reality, I would love to hear from you. One aspect of specific interest will be which of the proposed changes would have to be implemented within Kafka Connect itself (vs. a separate operator project, for instance under the Strimzi umbrella), without disrupting non-Kubernetes users. In any case, it would be amazing to see the Kafka community at large take its steps towards making Connect truly Kubernetes-native and fully taking advantage of this immensely successful container orchestration platform!\nMany thanks to Tom Bentley, Tom Cooper, Ryanne Dolan, Neil Buesing, Mickael Maison, Mattia Mascia, Paolo Patierno, Jakub Scholz, and Kate Stanley for providing their feedback while writing this post!\n","id":118,"publicationdate":"Sep 6, 2022","section":"blog","summary":"Table of Contents Standalone or Distributed? Issues with Kafka Connect on Kubernetes A Vision for Kubernetes-native Kafka Connect Kafka Connect, part of the Apache Kafka project, is a development framework and runtime for connectors which either ingest data into Kafka clusters (source connectors) or propagate data from Kafka into external systems (sink connectors). 
A diverse ecosystem of ready-made connectors has come to life on top of Kafka Connect, which lets you connect all kinds of data stores, APIs, and other systems to Kafka in a no-code approach.","tags":["kafka","kafka-connect","kubernetes"],"title":"An Ideation for Kubernetes-native Kafka Connect","uri":"https://www.morling.dev/blog/ideation-kubernetes-native-kafka-connect/"},{"content":"","id":119,"publicationdate":"Sep 6, 2022","section":"tags","summary":"","tags":null,"title":"kafka-connect","uri":"https://www.morling.dev/tags/kafka-connect/"},{"content":"","id":120,"publicationdate":"Aug 25, 2022","section":"tags","summary":"","tags":null,"title":"testing","uri":"https://www.morling.dev/tags/testing/"},{"content":" Table of Contents Unit Tests Integration Tests Wrap-Up Kafka Connect is a key factor for the wide-spread adoption of Apache Kafka: a framework and runtime environment for connectors, it makes the task of getting data either into Kafka or out of Kafka solely a matter of configuration, rather than a bespoke programming job. There’s dozens, if not hundreds, of readymade source and sink connectors, allowing you to create no-code data pipelines between all kinds of databases, APIs, and other systems.\nThere may be situations though where there is no existing connector matching your requirements, in which case you can implement your own custom connector using the Kafka Connect framework. Naturally, this raises the question of how to test such a Kafka connector, making sure it propagates the data between the connected external system and Kafka correctly and completely. In this blog post I’d like to focus on testing approaches for Kafka Connect source connectors, i.e. connectors like Debezium, which ingest data from an external system into Kafka. 
Very similar strategies can be employed for testing sink connectors, though.\nUnit Tests One first obvious approach is implementing good old unit tests: simply instantiate the class under test (typically, your SourceConnector or SourceTask implementation), invoke its methods (for instance, SourceConnector::taskConfigs(), or SourceTask::poll()), and assert the return values.\nHere’s an example of such a test from kc-etcd, a simple source connector for etcd, which is a distributed key/value store, most prominently used by Kubernetes as its metadata storage. Note that kc-etcd isn’t meant to be a production-ready connector; I have written it primarily for learning and teaching purposes.\nThis test verifies that the connector produces the correct task configuration, based on a maximum of two tasks:\npublic class EtcdSourceConnectorTest { @Test public void shouldCreateConfigurationForTasks() throws Exception { EtcdSourceConnector connector = new EtcdSourceConnector(); Map\u0026lt;String, String\u0026gt; config = new HashMap\u0026lt;\u0026gt;(); config.put( \u0026#34;clusters\u0026#34;, \u0026#34;etcd-a=http://etcd-a-1:2379,http://etcd-a-2:2379,http://etcd-a-3:2379;etcd-b=http://etcd-b-1:2379;etcd-c=http://etcd-c-1:2379\u0026#34; ); (1) connector.start(config); List\u0026lt;Map\u0026lt;String, String\u0026gt;\u0026gt; taskConfigs = connector.taskConfigs(2); (2) assertThat(taskConfigs).hasSize(2); (3) Map\u0026lt;String, String\u0026gt; taskConfig = taskConfigs.get(0); assertThat(taskConfig).containsEntry(\u0026#34;clusters\u0026#34;, \u0026#34;etcd-a=http://etcd-a-1:2379,http://etcd-a-2:2379,http://etcd-a-3:2379;etcd-b=http://etcd-b-1:2379\u0026#34;); (4) taskConfig = taskConfigs.get(1); assertThat(taskConfig).containsEntry(\u0026#34;clusters\u0026#34;, \u0026#34;etcd-c=http://etcd-c-1:2379\u0026#34;); } } 1 Configure the connector with three etcd clusters 2 Request the configuration for two tasks 3 The first connector 
task should handle the first two clusters 4 The second task should handle the remaining third cluster Things look similar when testing the actual polling loop of the connector’s task class. As this is about testing a source connector, we first need to do some data changes in the configured etcd cluster(s), before we can assert the corresponding SourceRecords that are emitted by the task. As part of kc-etcd, I’ve implemented a very basic testing harness named kcute (\u0026#34;Kafka Connect Unit Testing\u0026#34;) which simplifies that process a bit. Here’s an example test from kc-etcd, based on kcute:\npublic class EtcdSourceTaskTest { @RegisterExtension (1) public static final EtcdClusterExtension etcd = EtcdClusterExtension.builder() .withNodes(1) .build(); @RegisterExtension (2) public TaskRunner taskRunner = TaskRunner.forSourceTask(EtcdSourceConnectorTask.class) .with(\u0026#34;clusters\u0026#34;, \u0026#34;test-etcd=\u0026#34; + etcd.clientEndpoints().get(0)) .build(); @Test public void shouldHandleAllTypesOfEvents() throws Exception { Client client = Client.builder() (3) .keepaliveWithoutCalls(false) .endpoints(etcd.clientEndpoints()) .build(); KV kvClient = client.getKVClient(); long currentRevision = getCurrentRevision(kvClient); // insert ByteSequence key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); ByteSequence value = ByteSequence.from(\u0026#34;value-1\u0026#34;.getBytes()); kvClient.put(key, value).get(); // update key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); value = ByteSequence.from(\u0026#34;value-1a\u0026#34;.getBytes()); kvClient.put(key, value).get(); // delete key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); kvClient.delete(key).get(); (4) List\u0026lt;SourceRecord\u0026gt; records = 
taskRunner.take(\u0026#34;test-etcd\u0026#34;, 3); (5) // insert SourceRecord record = records.get(0); assertThat(record.sourcePartition()).isEqualTo(Collections.singletonMap(\u0026#34;name\u0026#34;, \u0026#34;test-etcd\u0026#34;)); assertThat(record.sourceOffset()).isEqualTo(Collections.singletonMap(\u0026#34;revision\u0026#34;, ++currentRevision)); assertThat(record.keySchema()).isEqualTo(Schema.STRING_SCHEMA); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.valueSchema()).isEqualTo(Schema.STRING_SCHEMA); assertThat(record.value()).isEqualTo(\u0026#34;value-1\u0026#34;); // update record = records.get(1); assertThat(record.sourceOffset()).isEqualTo(Collections.singletonMap(\u0026#34;revision\u0026#34;, ++currentRevision)); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isEqualTo(\u0026#34;value-1a\u0026#34;); // delete record = records.get(2); assertThat(record.sourceOffset()).isEqualTo(Collections.singletonMap(\u0026#34;revision\u0026#34;, ++currentRevision)); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isNull(); } } 1 Set up an etcd cluster using the JUnit extension provided by the jetcd client project 2 Set up the task under test using kcute 3 Obtain a client for etcd and do some data changes 4 Retrieve three records for the specified topic via kcute 5 Assert the emitted SourceRecords corresponding to the data changes done before in etcd Now one could argue about whether this test is a true unit test or not, given it launches and relies on an instance of an external system in the form of etcd. 
My personal take on these matters is to be pragmatic here, as a) there’s a difference from true end-to-end integration tests as discussed in the next section, and b) approaches to mock external systems usually are not worth the effort or, worse, result in poor tests, due to incorrect assumptions when implementing the mocked behavior.\nThis testing approach works very well in general; in particular it doesn’t require you to start Apache Kafka (and ZooKeeper), nor Kafka Connect, resulting in very fast test execution times and a great dev experience when creating and running these tests in your IDE.\nBut there are some limitations, too. Essentially, we end up emulating the behavior of the actual Kafka Connect runtime in our testing harness. This can become tedious when more advanced Connect features are required for a given test, for instance retrying/restart logic, the dynamic reconfiguration of connector tasks while the connector is running, etc. Ideally, there’d be a testing harness with all these capabilities provided as part of Kafka Connect itself (similar in spirit to the TopologyTestDriver of Kafka Streams), but in the absence of that, we may be better off for certain tests by deploying our source connector into an actual Kafka Connect instance and running assertions against the topic(s) it writes to.\nIntegration Tests When it comes to setting up the required infrastructure for integration tests in Java, the go-to solution these days is the excellent Testcontainers project. So let’s see what it takes to test a custom Kafka connector using Testcontainers.\nAs far as Kafka itself is concerned, there’s a module for that coming with Testcontainers, based on Confluent Platform. Alternatively, you could use the Testcontainers module from the Strimzi project, which provides you with plain upstream Apache Kafka container images. 
For Kafka Connect, we provide a Testcontainers integration as part of the Debezium project, offering an API for registering connectors and controlling their lifecycle.\nNow, unfortunately, the application-server-like deployment model of Kafka Connect poses a challenge when it comes to testing a connector which is built as part of the current project itself. For each connector plug-in, Connect expects a directory on its plug-in path which contains all the JARs of the connector itself and its dependencies. I’m not aware of any kind of \u0026#34;exploded mode\u0026#34;, where you could point Connect to a directory which contains a connector’s class files and its dependencies in JAR form.\nThis means that the connector must be packaged into a JAR file as part of the test preparation. In order to make integration tests friendly towards being run from within an IDE, this should happen programmatically within the test itself. That way, any code changes to the connector will be picked up automatically the next time the test runs, without having to run the project’s Maven build. 
The entire code for doing this is a bit too long (and boring) for sharing it in this blog post, but you can find it in the kc-etcd repository on GitHub.\nHere are the key parts of an integration test based on that approach, though:\npublic class EtcdConnectorIT { private static Network network = Network.newNetwork(); (1) private static KafkaContainer kafkaContainer = new KafkaContainer(DockerImageName.parse(\u0026#34;confluentinc/cp-kafka:7.2.0\u0026#34;)) .withNetwork(network); (2) public static DebeziumContainer connectContainer = new DebeziumContainer(\u0026#34;debezium/connect-base:1.9.5.Final\u0026#34;) .withFileSystemBind(\u0026#34;target/kcetcd-connector\u0026#34;, \u0026#34;/kafka/connect/kcetcd-connector\u0026#34;) .withNetwork(network) .withKafka(kafkaContainer) .dependsOn(kafkaContainer); (3) public static EtcdContainer etcdContainer = new EtcdContainer(\u0026#34;gcr.io/etcd-development/etcd:v3.5.4\u0026#34;, \u0026#34;etcd-a\u0026#34;, Arrays.asList(\u0026#34;etcd-a\u0026#34;)) .withNetworkAliases(\u0026#34;etcd\u0026#34;) .withNetwork(network); @BeforeAll public static void startContainers() throws Exception { createConnectorJar(); (4) Startables.deepStart(Stream.of( kafkaContainer, etcdContainer, connectContainer)) .join(); } @Test public void shouldHandleAllTypesOfEvents() throws Exception { Client client = Client.builder() .endpoints(etcdContainer.clientEndpoint()).build(); (5) ConnectorConfiguration connector = ConnectorConfiguration.create() .with(\u0026#34;connector.class\u0026#34;, \u0026#34;dev.morling.kcetcd.source.EtcdSourceConnector\u0026#34;) .with(\u0026#34;clusters\u0026#34;, \u0026#34;test-etcd=http://etcd:2379\u0026#34;) .with(\u0026#34;tasks.max\u0026#34;, \u0026#34;2\u0026#34;) 
.with(\u0026#34;key.converter\u0026#34;, \u0026#34;org.apache.kafka.connect.storage.StringConverter\u0026#34;) .with(\u0026#34;value.converter\u0026#34;, \u0026#34;org.apache.kafka.connect.storage.StringConverter\u0026#34;); (6) connectContainer.registerConnector(\u0026#34;my-connector\u0026#34;, connector); connectContainer.ensureConnectorTaskState(\u0026#34;my-connector\u0026#34;, 0, State.RUNNING); KV kvClient = client.getKVClient(); (7) // insert ByteSequence key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); ByteSequence value = ByteSequence.from(\u0026#34;value-1\u0026#34;.getBytes()); kvClient.put(key, value).get(); // update key = ByteSequence.from(\u0026#34;key-1\u0026#34;.getBytes()); value = ByteSequence.from(\u0026#34;value-1a\u0026#34;.getBytes()); kvClient.put(key, value).get(); // delete key = ByteSequence.from(\u0026#34;key-2\u0026#34;.getBytes()); kvClient.delete(key).get(); (8) List\u0026lt;ConsumerRecord\u0026lt;String, String\u0026gt;\u0026gt; records = drain(getConsumer(kafkaContainer), 3); // insert ConsumerRecord\u0026lt;String, String\u0026gt; record = records.get(0); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isEqualTo(\u0026#34;value-1\u0026#34;); // update record = records.get(1); assertThat(record.key()).isEqualTo(\u0026#34;key-1\u0026#34;); assertThat(record.value()).isEqualTo(\u0026#34;value-1a\u0026#34;); // delete record = records.get(2); assertThat(record.key()).isEqualTo(\u0026#34;key-2\u0026#34;); assertThat(record.value()).isNull(); } } 1 Set up Apache Kafka in a container using the Testcontainers Kafka module 2 Set up Kafka Connect in a container, mounting the target/kcetcd-connector directory onto the plug-in path; as part of the project’s Maven build, all the dependencies of the kc-etcd connector are copied into that directory 3 Set up etcd in a container 4 Package the connector classes from the target/classes directory into a JAR and add that JAR to the mounted 
plug-in directory; the complete source code for this can be found here 5 Configure an instance of the etcd source connector, using String as the key and value format 6 Register the connector, then block until its tasks have reached the RUNNING state 7 Do some changes in the source etcd cluster 8 Using a regular Kafka consumer, read three records from the corresponding Kafka topic and assert the keys and values (complete code here) And that’s all there is to it; we now have a test which packages our source connector, deploys it into Kafka Connect and asserts the messages it sends to Kafka. While this is definitely more time-consuming to run than the simple test harness shown above, this true end-to-end approach tests the connector in the actual runtime environment, verifying its behavior when executed via Kafka Connect, just as it would be the case when running the connector in production later on.\nWrap-Up In this post, we’ve discussed two approaches for testing Kafka Connect source connectors: plain unit tests, \u0026#34;manually\u0026#34; invoking the methods of the connector/task classes under test, and integration tests, deploying a connector into Kafka Connect and verifying its behavior there via Testcontainers.\nThe former approach provides you with faster turnaround times and shorter feedback cycles, whereas the latter approach gives you the confidence of testing a connector within the actual Kafka Connect runtime environment, at the cost of a more complex infrastructure set-up and longer test execution times. 
While we’ve focused on testing source connectors in this post, both approaches could equally be applied to sink connectors, with the only difference being that you’d feed records to the connector (either directly or by writing to a Kafka topic) and then observe and assert the expected state changes of the sink system in question.\nYou can find the complete source code for this post, including some parts omitted here for brevity, in the kc-etcd repository on GitHub. If you think that having a test harness like kcute for unit testing connectors is a good idea and it’s something you’d like to contribute to, then please let me know. Ultimately, this could be extracted into its own project, independent from kc-etcd, or even be upstreamed to the Apache Kafka project proper, reusing as much as possible the actual Connect code, sans the bits for \u0026#34;deploying\u0026#34; connectors via a separate process.\nMany thanks to Hans-Peter Grahsl and Kate Stanley for their feedback while writing this blog post!\n","id":121,"publicationdate":"Aug 25, 2022","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_unit_tests\"\u003eUnit Tests\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_integration_tests\"\u003eIntegration Tests\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up\"\u003eWrap-Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://kafka.apache.org/documentation/#connect\"\u003eKafka Connect\u003c/a\u003e is a key factor for the wide-spread adoption of Apache Kafka:\na framework and runtime environment for connectors,\nit makes the task of getting data either into Kafka or out of Kafka solely a matter of configuration,\nrather than a bespoke programming job.\nThere’s dozens, if not hundreds, of readymade 
source and sink connectors,\nallowing you to create no-code data pipelines between all kinds of databases, APIs, and other systems.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThere may be situations though where there is no existing connector matching your requirements,\nin which case you can \u003ca href=\"https://kafka.apache.org/documentation/#connect_development\"\u003eimplement your own\u003c/a\u003e custom connector using the Kafka Connect framework.\nNaturally, this raises the question of how to test such a Kafka connector,\nmaking sure it propagates the data between the connected external system and Kafka correctly and completely.\nIn this blog post I’d like to focus on testing approaches for Kafka Connect \u003cem\u003esource\u003c/em\u003e connectors,\ni.e. connectors like \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e, which ingest data from an external system into Kafka.\nVery similar strategies can be employed for testing sink connectors, though.\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","kafka-connect","testing"],"title":"Testing Kafka Connectors","uri":"https://www.morling.dev/blog/testing-kafka-connectors/"},{"content":"","id":122,"publicationdate":"Jun 23, 2022","section":"tags","summary":"","tags":null,"title":"conferences","uri":"https://www.morling.dev/tags/conferences/"},{"content":"","id":123,"publicationdate":"Jun 23, 2022","section":"tags","summary":"","tags":null,"title":"speaking","uri":"https://www.morling.dev/tags/speaking/"},{"content":" Table of Contents 1. 💦 Rehearse, Rehearse, Rehearse 2. 🎬 Start With a Mission 3. 📖 Tell a Story 4. 👀 Look at the Audience, Not Your Slides 5. 🧹 Put Less Text on Your Slides. Much Less 6. ✂️ Tailor the Talk Towards Your Audience 7. 3️⃣ Rule of Three 8. 🚑 Have a Fallback Plan for Demos 9. 💪 Play to Your Strengths 10. 
🔄 Circle Back Every so often, I come across some conference talk which is highly interesting in terms of its actual contents, but which unfortunately is presented in a less than ideal way. I’m thinking of basic mistakes here, such as the presenter primarily looking at their slides rather than at the audience. I’m always feeling a bit sorry when this happens, as I firmly believe that everyone can do good and even great talks, just by being aware of — and thus avoiding — a few common mistakes, and sticking to some simple principles.\nNow, who am I to give any advice on public speaking? Indeed I’m not a professional full-time speaker, but I do enjoy presenting on technologies which I am working on or with as part of my job. Over time, I’ve come to learn about a few techniques which I felt helped me to give better talks. A few simple things, which can be easy to get wrong, but which make a big difference for the perception of your talk. Do I always stick to them myself? I try my best, but sometimes, I fail ¯\\_(ツ)_/¯.\nSo, without further ado, here’s ten tips and techniques for making your next conference talk suck a bit less.\n1. 💦 Rehearse, Rehearse, Rehearse In particular if you have done a few talks already and you start feeling comfortable, it can be tempting to think you could just wing it and skip the rehearsal for your next one. After all, it can feel weird to be alone in your room and speak aloud all by yourself. I highly recommend to not fall for that — rehearsing a talk is absolutely vital for making it successful. It will help you to develop a consistent line of argument and identify any things you otherwise may forget to mention. Only proper rehearsing will give you that natural flow you want to have for a talk.\nAlso, it will help you with the right timing of your talk: you don’t want to finish 20 min ahead of time, nor reach the end of your presentation slot with half of your slides remaining. 
If this happens, it usually means folks haven’t rehearsed once and it’s not a good position to be in. For a new talk, I usually will do three rehearsal runs before presenting it at an event. I will also do a rehearsal run if I repeat an earlier talk after some months, as it’s too easy to forget some important point otherwise.\nWhen doing a rehearsal, it’s a good idea to note down some key timestamps, such as when you transition to a demo. This will come in handy for instance for identifying sections you could shorten if you realize the talk is too long in its initial form.\n2. 🎬 Start With a Mission How to start a talk well could be an entire topic for its own post. After all, the first few seconds decide whether folks will be excited about your talk and pay attention, or rather pull out their laptop and check their emails. What I’ve found to work well for me is starting with a mission. I.e. I’ll often present a specific problem and make the case for how listening to that talk will help to address that problem. Needless to say that the problem should be relevant to the audience, i.e. it’s key to motivate why and how it matters to them, and how learning about the solution will benefit them. Don’t focus on the thing you want to talk about; focus on a challenge your audience has and how your talk will help them to overcome that.\nAnother approach is to present the key learnings (for instance three, see below) which the audience will make during the talk. While this may sound similar to an agenda slide, the framing is different: it’s taking the perspective of the listener and what’s in it for them by sticking through your session. Don’t lead with your personal introduction; if you’re known in the field, people don’t care. And if you’re not, well, they probably also won’t care. In any case, telling much about yourself is not what will attract people to your talk. I usually have a very brief intro slide after discussing the mission or key learnings.\n3. 
📖 Tell a Story Good talks tell a story, i.e. there’s a meaningful progression in terms of what you tell, starting with some setting and context, perhaps with some challenge or drama (\u0026#34;And this is when our main production server failed\u0026#34;), and of course a happy ending (\u0026#34;With the new solution we can fail-over to a stand-by in less than a second\u0026#34;).\nNow it doesn’t literally have to be a story (although it can be, as for instance in my talk To the Moon and Beyond With Java 17 APIs!), but you should make sure that there is a logical order of the things you discuss, for instance in a temporal or causal sense, and you should avoid jumping forth and back between different things. The latter for instance can happen due to insufficient rehearsal, forcing you to make a specific point too late during the talk, as you forgot to bring it at the right moment. Also, for each discussion point and slide there should be a very specific reason for having it in your deck. I.e. it should form a cohesive unit, rather than being a collection of random unrelated talking points.\nOther storytelling techniques can be employed to great effect as well, such as doing a quick wrap-up when finishing a key section of your session, or adding little \u0026#34;side quests\u0026#34; for things you really want to mention but which are not strictly related to the main storyline.\nIn terms of crafting a story, I try to start early and collect input over a longer period of time, typically using a mind map. This allows you to identify and gather the most interesting aspects of a given topic, also touching on points which perhaps came up in a revelation you had a while ago. You’ll be less likely to have that breadth of contents at your disposal when starting the day before the presentation. 
This is not to say that you should use every single bit of information you’ve collected, but starting from a broad foundation allows you to select the most relevant and insightful bits.\n4. 👀 Look at the Audience, Not Your Slides As mentioned at the beginning, one of my pet peeves is presenters turning their back (or side) to the audience and looking towards their slides projected next to them. This creates a big disconnect with your audience. The same applies to the slides on the laptop in front of you, avoid looking at them as much as you can. Instead, try to have as much eye contact with the audience as possible, it makes a huge difference in terms of perception and quality of your talk. Putting a sticker onto your screen can be a helpful reminder. Only if you actually speak to the audience, it will be an engaging and immersive experience for them. It’s extra bad if you don’t use a microphone, say at a local meet-up, as it means people will be able to understand you much worse.\nNow why are folks actually looking at their slides? I think it’s generally an expression of feeling a bit insecure or uncomfortable, and in particular the concern to forget to mention an important point. To me, the only viable solution here is that you really need to memorize what you want to say, in which case you’ll be able to make your points without having to read anything from your slides. Your slides are not your speaker notes!\n5. 🧹 Put Less Text on Your Slides. Much Less In terms of what should be on slides, this again could be a topic for its own blog post. In general, the less words the better. Note I’m not suggesting you need to go image-only slides TED talk style, but you should minimize the amount of text on slides as much as possible. The reason being that folks will either listen to you or read what’s on your slides, but hardly both. 
Which means that either your effort for putting the text on the slides is wasted (bad), or folks don’t actually get what you’re telling them (worse). So if you think you’ve removed enough, remove some more. And then some more. This also allows you to make the font size big enough, so that folks actually can read those few items which remain.\nWhat I personally like to have on slides the most is diagrams, charts, sketches, and the like. Anything visual really. Which also brings up one exception to the \u0026#34;Don’t look at your slides\u0026#34; rule: if you actually explain a visual, elaborating a particular part for instance, then shortly turning towards the slide and pointing to some element of it can make sense.\nOn a related note, I recommend not relying on having access to your speaker notes during a talk. While technically it may be possible to show the notes on your laptop and the actual slides on the projector, this will fall apart when you do a live demo, where you really need to work with a mirrored set-up. Think of speaker notes as of cheat sheets back in school: the value is in writing them, not in reading them. By the time you’ll present your talk, you’ll have memorized what’s on your notes. Make use of them for developing the story line for each slide, and of course they will also be useful when coming back to a talk after a few months.\n6. ✂️ Tailor the Talk Towards Your Audience I don’t see that one done wrong too often, but it’s worth pointing out: a talk should actually match its audience. So if for instance you talk to users of some technology, focussing on use cases of it makes sense, or on how to run it in production etc. Whereas this audience probably won’t care as much about implementation details (as much as you may want to talk about how you solved that one tricky technical challenge using some clever approach). 
If, on the other hand, you present about the same technology to a conference geared towards builders of tech in that space, diving into those gory details would be highly attractive for the audience.\nThat’s why I focus heavily on use cases when talking about Debezium at developer conferences. Whereas when I had the opportunity to present on Debezium and change data capture (CDC) during an online talk series of Carnegie Mellon’s database group, I centered the talk around implementation challenges and improvements databases could make to better support CDC use cases.\nKey here is expectation management: make sure you know what kind of audience you’re going to speak to and adjust your talk accordingly. Oftentimes, the same basic talk can work well for different settings and audiences, just with framing things the right way and putting the focus on the right parts, for instance by swapping a few slides in and out.\n7. 3️⃣ Rule of Three Over time I’ve become a big believer in the rule of three; for instance, have three main learnings or ideas for a talk. If it’s a talk about a new product release, share three key features. On one slide, have three main points to discuss. When you share examples, give three of them. And so on.\nWhy three? It hits the sweet spot of providing representative information and data, letting you enough time to sufficiently dive into each of them, and not being too extensive or repetitive. Your audience can digest only so much input in a given session, so they’ll be better served if you tell them about three things which they can take in and remember, instead of telling them about ten things which they all quickly forget or even miss to begin with.\n8. 🚑 Have a Fallback Plan for Demos Live demos can be a great addition to any technology-centered conference talk. Actually showing how the thing you discuss works can be an eye-opener and be truly impressive. Not so much though if the demo gods aren’t with you. 
And we’ve all been there: poor network at the conference venue doesn’t let you download that one container image you’re missing, you have a compile error in your code and in the heat of the moment you can’t find out what’s wrong, etc.\nTrying to analyze problems in front of a conference audience can be very stressful, and frankly speaking, it’s quickly getting boring or even weird for the audience. So you always should have a fallback plan in case things don’t go as expected with your demo. My go-to strategy is to have a pre-recorded video of the demo which I can play back, instead of wasting minutes trying to solve any issues. I’ll still live-comment that video, which makes it a bit more interactive rather than collectively listening to my pre-recorded voice. For instance I can pause the video and expand on some specific point.\n9. 💪 Play to Your Strengths Some personal habits are really hard to change. One example: I tend to speak fast, very fast, during talks. I’m well aware of that, listeners told me, a coach told me, I saw it myself in recordings. But it’s somehow impossible for me to change it. If I really force myself hard to speak slower, it will work for a while, but typically I’ll be back to my usual speed after a while.\nSo I’ve decided to not fight against this any longer and just live with it. The reason being that I feel the high pace also gives me some energy and flow which I hope becomes apparent to the audience. I believe viewers (and I) are better off with me doing a passionate talk which may be a bit too fast, instead of one which has a slower pace but lacks the right amount of energy.\nI think that’s generally applicable: You don’t like talking about concepts, but love showing how things work in action? Then shorten the former and make more room for a live demo. You enjoy discussing live questions? Make more time for the Q\u0026amp;A. 
All this is to say: instead of focussing excessively on things you perceive as your weaknesses, leverage your strong suits.\n(Yes, the irony of this being part of a post focussing on avoiding basic mistakes is not lost on me.)\n10. 🔄 Circle Back I’ve found it works great if you circle back to a point you made earlier during a talk. The most apparent way of doing this is coming back to the mission statement you set out for the talk at the beginning. You should be able to make the point that the things you presented actually satisfy that original mission. Or you have some sort of catch phrase to which you circle back a few times; repetition can help to drive home a point. Just don’t overdo it, as it can become annoying otherwise. Personally, I like the notion of circling back as it provides some means of closure which is a pleasant sensation.\nAnd that’s it, ten basic tips for making your next talk suck a bit less. You probably won’t get an invitation for doing your first TED talk just by applying them, but they may help you with your next tech conference or meet-up presentation. As a presenter, you should think of yourself as a service provider to the audience: they pay with their time (and usually a fair amount of money) to attend your talk, so you should put in the effort to make sure they have a great time and experience.\nWhat are your presentation tips and tricks? Let me know in the comments below!\nMany thanks to Hans-Peter Grahsl, Marta Paes, and Robin Moffatt for their feedback while writing this blog post!\n","id":124,"publicationdate":"Jun 23, 2022","section":"blog","summary":"Table of Contents 1. 💦 Rehearse, Rehearse, Rehearse 2. 🎬 Start With a Mission 3. 📖 Tell a Story 4. 👀 Look at the Audience, Not Your Slides 5. 🧹 Put Less Text on Your Slides. Much Less 6. ✂️ Tailor the Talk Towards Your Audience 7. 3️⃣ Rule of Three 8. 🚑 Have a Fallback Plan for Demos 9. 💪 Play to Your Strengths 10. 
🔄 Circle Back Every so often, I come across some conference talk which is highly interesting in terms of its actual contents, but which unfortunately is presented in a less than ideal way.","tags":["speaking","conferences"],"title":"Ten Tips to Make Conference Talks Suck Less","uri":"https://www.morling.dev/blog/ten-tips-make-conference-talks-suck-less/"},{"content":" Table of Contents Project Loom Scheduling Discussion Project Loom (JEP 425) is probably amongst the most awaited feature additions to Java ever; its implementation of virtual threads (or \u0026#34;green threads\u0026#34;) promises developers the ability to create highly concurrent applications, for instance with hundreds of thousands of open HTTP connections, sticking to the well-known thread-per-request programming model, without having to resort to less familiar and often more complex to use reactive approaches.\nHaving been in the works for several years, Loom got merged into the mainline of OpenJDK just recently and is available as a preview feature in the latest Java 19 early access builds. I.e. it’s the perfect time to get your hands on virtual threads and explore the new feature. In this post I’m going to share an interesting aspect I learned about thread scheduling fairness for CPU-bound workloads running on Loom.\nProject Loom First, some context. The problem with the classic thread-per-request model is that it only scales up to a certain point. Threads managed by the operating system are a costly resource, which means you can typically have at most a few thousand of them, but not hundreds of thousands, or even millions. Now, if for instance a web application makes a blocking request to a database, the thread making that request is exactly that, blocked. 
Of course other threads can be scheduled on the CPU in the meantime, but you cannot have more concurrent requests than threads available to you.\nReactive programming models address this limitation by releasing threads upon blocking operations such as file or network IO, allowing other requests to be processed in the meantime. Once a blocking call has completed, the request in question will be continued, using a thread again. This model makes much more efficient use of the threads resource for IO-bound workloads, unfortunately at the price of a more involved programming model, which doesn’t feel familiar to many developers. Also aspects like debuggability or observability can be more challenging with reactive models, as described in the Loom JEP.\nThis explains the huge excitement and anticipation of Project Loom within the Java community. Loom introduces a notion of virtual threads which are scheduled onto OS-level carrier threads by the JVM. If application code hits a blocking method, Loom will unmount the virtual thread from its current carrier, making space for other virtual threads to be scheduled. Virtual threads are cheap and managed by the JVM, i.e. you can have many of them, even millions. The beauty of the model is that developers can stick to the familiar thread-per-request programming model without running into scaling issues due to a limited number of available threads. I highly recommend reading the JEP of Project Loom, which is very well written and provides much more details and context.\nScheduling Now how does Loom’s scheduler know that a method is blocking? Turns out, it doesn’t. As I learned from Ron Pressler, the main author of Project Loom, it’s the other way around: blocking methods in the JDK have been adjusted for Loom, so as to release the OS-level carrier thread when being called by a virtual thread:\nAll blocking in Java is done through the JDK (unless you explicitly call native code). 
We changed the \u0026#34;leaf\u0026#34; blocking methods in the JDK to block the virtual thread rather than the platform thread. E.g. in all of java.util.concurrent there\u0026#39;s just one such method: LockSupport.park\n— Ron Pressler (@pressron) May 24, 2022 Ron’s reply triggered a very interesting discussion with Tim Fox (e.g. of Vert.x fame): what happens if code is not IO-bound, but CPU-bound? I.e. if code in a virtual thread runs some heavy calculation without ever calling any of the JDK’s blocking methods, will that virtual thread ever be unmounted?\nPerhaps surprisingly, the answer currently is: No. Which means that CPU-bound code will actually behave very differently with virtual threads than with OS-level threads. So let’s take a closer look at that phenomenon with the following example program:\npublic class LoomTest { public static long blackHole; public static void main(String[] args) throws Exception { ExecutorService executor = Executors.newCachedThreadPool(); for(int i = 0; i \u0026lt; 64; i++) { final Instant start = Instant.now(); final int id = i; executor.submit(() -\u0026gt; { BigInteger res = BigInteger.ZERO; for(int j = 0; j \u0026lt; 100_000_000; j++) { res = res.add(BigInteger.valueOf(1L)); } blackHole = res.longValue(); System.out.println(id + \u0026#34;;\u0026#34; + Duration.between(start, Instant.now()).toMillis()); }); } executor.shutdown(); executor.awaitTermination(1, TimeUnit.HOURS); } } 64 threads are started at approximately the same time using a traditional cached thread pool, i.e. OS-level threads. Each thread counts to 100M (using BigInteger to make it a bit more CPU-intensive) and then prints out how long it took from scheduling the thread to the point of its completion. Here are the results from my Mac Mini M1:\nIn wallclock time, it took all the 64 threads roughly 16 seconds to complete. 
The threads are rather equally scheduled between the available cores of my machine. I.e. we’re observing a fair scheduling scheme. Now here are the results using virtual threads (by obtaining the executor via Executors::newVirtualThreadPerTaskExecutor()):\nThat chart looks very different. The first eight threads took a wallclock time of about two seconds to complete, the next eight took about four seconds, etc. As the executed code doesn’t hit any of the JDK’s blocking methods, the threads never yield and thus usurp their carrier threads until they have run to completion. This represents an unfair scheduling scheme of the threads. While they were all started at the same time, for the first two seconds only eight of them were actually executed, followed by the next eight, and so on.\nLoom’s scheduler uses by default as many carrier threads as there are CPU cores available; there are eight cores in my M1, so processing happens in chunks of eight virtual threads at a time. Using the jdk.virtualThreadScheduler.parallelism system property, the number of carrier threads can be adjusted, e.g. to 16:\nFor the fun of it, let’s add a call to Thread::sleep() (i.e. a blocking method) to the processing loop and see what happens:\n... for(int j = 0; j \u0026lt; 100_000_000; j++) { res = res.add(BigInteger.valueOf(1L)); if (j % 1_000_000 == 0) { try { Thread.sleep(1L); } catch (InterruptedException e) { throw new RuntimeException(e); } } } ... Sure enough, we’re back to a fair scheduling, with all threads completing after roughly the same wallclock time:\nIt’s noteworthy that the actual durations appear more harmonized in comparison to the original results we got from running with 64 OS-level threads. It seems the Loom scheduler can do a slightly better job of distributing the available resources between virtual threads. Surprisingly, a call to Thread::yield() didn’t have the same result. 
While a scheduler is free to ignore this intent to yield as per the method’s JavaDoc, Sundararajan Athijegannathan indicated that this would be applied by Loom. It would surely be interesting to know why that’s not the case here.\nDiscussion Seeing these results, the big question of course is whether this unfair scheduling of CPU-bound threads in Loom poses a problem in practice or not. Ron and Tim had an expanded debate on that point, which I recommend checking out to form an opinion yourself. As per Ron, support for yielding at points in program execution other than blocking methods has been implemented in Loom already, but this hasn’t been merged into the mainline with the initial drop of Loom. It should be easy enough though to bring this back if the current behavior turns out to be problematic.\nNow there’s not much point in overcommitting to more threads than physically supported by a given CPU anyways for CPU-bound code (nor in using virtual threads to begin with). But in any case it’s worth pointing out that CPU-bound code may behave differently with virtual threads than with classic OS-level threads. This may come as a surprise for Java developers, in particular if authors of such code are not in charge of selecting the thread executor/scheduler actually used by an application.\nTime will tell whether yield support also for CPU-bound code will be required or not, either via support for explicit calls to Thread::yield() (which I think should be supported at the very least) or through more implicit means, e.g. by yielding when reaching a safepoint. 
As I learned, Go’s goroutines have supported yielding in similar scenarios since version 1.14, so I wouldn’t be surprised to see Java and Loom taking the same course eventually.\n","id":125,"publicationdate":"May 27, 2022","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_project_loom\"\u003eProject Loom\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_scheduling\"\u003eScheduling\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_discussion\"\u003eDiscussion\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eProject Loom (\u003ca href=\"https://openjdk.java.net/jeps/425\"\u003eJEP 425\u003c/a\u003e) is probably amongst the most awaited feature additions to Java ever;\nits implementation of virtual threads (or \u0026#34;green threads\u0026#34;) promises developers the ability to create highly concurrent applications,\nfor instance with hundreds of thousands of open HTTP connections,\nsticking to the well-known thread-per-request programming model,\nwithout having to resort to less familiar and often more complex to use reactive approaches.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eHaving been in the works for several years, Loom got merged into the mainline of OpenJDK \u003ca href=\"https://github.com/openjdk/jdk/commit/9583e3657e43cc1c6f2101a64534564db2a9bd84\"\u003ejust recently\u003c/a\u003e and is available as a preview feature in the latest \u003ca href=\"https://jdk.java.net/19/\"\u003eJava 19 early access builds\u003c/a\u003e.\nI.e. 
it’s the perfect time to get your hands on virtual threads and explore the new feature.\nIn this post I’m going to share an interesting aspect I learned about thread scheduling fairness for CPU-bound workloads running on Loom.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","concurrency","virtual-threads"],"title":"Loom and Thread Fairness","uri":"https://www.morling.dev/blog/loom-and-thread-fairness/"},{"content":" JDK Mission Control (JMC) is invaluable for analysing performance data recorded using JDK Flight Recorder (JFR). The other day, I ran into a problem when trying to run JMC on my Mac Mini M1. Mostly for my own reference, here’s what I did to overcome it.\nUpon launching a freshly downloaded JMC (I tried both the upstream build from OpenJDK and the one from the Eclipse Adoptium project), I’d get the following error message:\nThe JVM shared library \u0026#34;/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/../lib/server/libjvm.dylib\u0026#34; does not contain the JNI_CreateJavaVM symbol.\n\u0026#34;temurin-17.jdk\u0026#34; is my default JDK; it’s the Java 17 build provided by the Eclipse Temurin project for macOS/AArch64, i.e. the right one for the ARM chip of the M1 (\u0026#34;Apple silicon\u0026#34;). The error message isn’t overly helpful; after all, that referenced JDK works just fine for all my other applications. The problem though is that JMC itself currently is only shipped as an x64 application:\n$ file \u0026#34;JDK Mission Control.app\u0026#34;/Contents/MacOS/jmc .../JDK Mission Control.app/Contents/MacOS/jmc: Mach-O 64-bit executable x86_64 So I decided to try with an x64 JDK build instead; thanks to Apple’s Rosetta project, x64 binaries can be executed on the M1 via a rather efficient emulation.\nAfter downloading the macOS/x64 Temurin build, it needs to be configured as the JDK to use for JMC. For that, open the file JDK Mission Control.app/Contents/Info.plist in an editor and look for the Eclipse key. 
Add the -vm parameter with the path to the x64 JDK to the key’s value. Altogether, it should look like so:\n... \u0026lt;array\u0026gt; \u0026lt;string\u0026gt;-keyring\u0026lt;/string\u0026gt; \u0026lt;string\u0026gt;~/.eclipse_keyring\u0026lt;/string\u0026gt; \u0026lt;string\u0026gt;-vm\u0026lt;/string\u0026gt; \u0026lt;string\u0026gt;/path/to/jdk-17.0.3+7-x86-64/Contents/Home/bin/java\u0026lt;/string\u0026gt; \u0026lt;/array\u0026gt; ... Et voilà, JMC will now start just fine on the Apple M1. Note that in some cases I got an intermittent permission issue after editing the plist file. Resetting the permissions helped in that case:\n$ sudo chmod -R 755 \u0026#34;JDK Mission Control.app\u0026#34; With the x64 JDK around, it’s a good idea to make sure it’s only used for JMC, while sticking to the AArch64 build for all other usages for the sake of performance. Unfortunately, it’s not obvious which flavour you are running, as the target architecture isn’t displayed in the output of java --version:\n$ export JAVA_HOME=path/to/temurin-17.jdk/Contents/Home $ java --version openjdk 17.0.3 2022-04-19 OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7) OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode) $ export JAVA_HOME=path/to/jdk-17.0.3+7-x86-64/Contents/Home $ jdks java --version openjdk 17.0.3 2022-04-19 OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7) OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing) Not sure what \u0026#34;sharing\u0026#34; exactly means in the x64 output, perhaps it’s a hint? In any case, printing the contents of the os.arch system property will tell the truth, e.g. 
in jshell:\n$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home $ jdks jshell | Welcome to JShell -- Version 17.0.3 | For an introduction type: /help intro jshell\u0026gt; System.out.println(System.getProperty(\u0026#34;os.arch\u0026#34;)) aarch64 $ export JAVA_HOME=~/Applications/jdks/jdk-17.0.3+7-x86-64/Contents/Home $ jshell | Welcome to JShell -- Version 17.0.3 | For an introduction type: /help intro jshell\u0026gt; System.out.println(System.getProperty(\u0026#34;os.arch\u0026#34;)) x86_64 If you are aware of a quicker way of identifying the current JDK’s target platform, I’d love to learn about it in the comments below. Thanks!\n","id":126,"publicationdate":"May 17, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://jdk.java.net/jmc/8/\"\u003eJDK Mission Control\u003c/a\u003e (JMC) is invaluable for analysing performance data recorded using \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR).\nThe other day, I ran into a problem when trying to run JMC on my Mac Mini M1.\nMostly for my own reference, here’s what I did to overcome it.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","tooling"],"title":"Running JDK Mission Control on Apple M1","uri":"https://www.morling.dev/blog/running-jmc-on-apple-m1/"},{"content":"","id":127,"publicationdate":"Mar 10, 2022","section":"tags","summary":"","tags":null,"title":"code-review","uri":"https://www.morling.dev/tags/code-review/"},{"content":"","id":128,"publicationdate":"Mar 10, 2022","section":"tags","summary":"","tags":null,"title":"software-engineering","uri":"https://www.morling.dev/tags/software-engineering/"},{"content":" Table of Contents FAQ When it comes to code reviews, it’s a common phenomenon that there is much focus and long-winded discussions around mundane aspects like code formatting and style, whereas important aspects (does the code change 
do what it is supposed to do, is it performant, is it backwards-compatible for existing clients, and many others) tend to get less attention.\nTo raise awareness for the issue and providing some guidance on aspects to focus on, I shared a small visual on Twitter the other day, which I called the \u0026#34;Code Review Pyramid\u0026#34;. Its intention is to help putting focus on those parts which matter the most during a code review (in my opinion, anyways), and also which parts could and should be automated.\nAs some folks asked for a permanent, referenceable location of that resource and others wanted to have a high-res printing version, I’m putting it here again:\nYou can also download the visual as an SVG file.\nFAQ Why is it a pyramid?\nThe lower parts of the pyramid should be the foundation of a code review and take up the most part of it.\nHey, that’s a triangle!\nYou might think so, but it’s a pyramid from the side.\nWhich tool did you use for creating the drawing?\nExcalidraw.\n","id":129,"publicationdate":"Mar 10, 2022","section":"blog","summary":"Table of Contents FAQ When it comes to code reviews, it’s a common phenomenon that there is much focus and long-winded discussions around mundane aspects like code formatting and style, whereas important aspects (does the code change do what it is supposed to do, is it performant, is it backwards-compatible for existing clients, and many others) tend to get less attention.\nTo raise awareness for the issue and providing some guidance on aspects to focus on, I shared a small visual on Twitter the other day, which I called the \u0026#34;Code Review Pyramid\u0026#34;.","tags":["software-engineering","code-review"],"title":"The Code Review Pyramid","uri":"https://www.morling.dev/blog/the-code-review-pyramid/"},{"content":"","id":130,"publicationdate":"Feb 20, 2022","section":"tags","summary":"","tags":null,"title":"internals","uri":"https://www.morling.dev/tags/internals/"},{"content":" The JDK Flight Recorder (JFR) 
is one of Java’s secret weapons; deeply integrated into the Hotspot VM, it’s a high-performance event collection framework, which lets you collect metrics on runtime aspects like object allocation and garbage collection, class loading, file and network I/O, and lock contention, do method profiling, and much more.\nJFR data is persisted in recording files (since Java 14, also \u0026#34;realtime\u0026#34; event streaming is supported), which can be loaded for analysis into tools like JDK Mission Control (JMC), or the jfr utility coming with OpenJDK itself.\nWhile there’s lots of blog posts, conference talks, and other coverage on JFR itself, information about the format of recording files is surprisingly hard to come by. There is no official specification, so the only way to actually understand the JFR file format is to read the source code for writing recordings in the JDK itself, which is a combination of Java and C++ code. Alternatively, you can study the code for parsing recordings in JMC (an official JDK project). Btw., JMC comes with a pure Java-based JFR file writer implementation too.\nApart from the source code itself, the only somewhat related resources which I could find are this JavaOne presentation by Staffan Larssan (2013, still referring to the proprietary Oracle JFR), several JFR-related blog posts by Marcus Hirt, and a post about JFR event sizes by Richard Startin. But there’s no in-depth discussion or explanation of the file format. As it turns out, this is by design; the OpenJDK team shied away from creating a spec, \u0026#34;because of the overhead of maintaining and staying compatible with it\u0026#34;. I.e. 
the JFR file format is an implementation detail of OpenJDK, and as such the only stable contract for interacting with it are the APIs provided by JFR.\nNow, even if it is an implementation detail, knowing more about the JFR file format would certainly be useful; for instance, you could use this to implement tools for analyzing and visualizing JFR data in non-JVM programming languages, say Python, or to patch corrupted recording files. So my curiosity was piqued and I thought it’d be fun to try and find out how JFR recording files are structured. In particular, I was curious about which techniques are used for keeping files relatively small, even with hundreds of thousands or even millions of recorded events.\nI grabbed a hex editor, the source code of JMC’s recording parser (which I found a bit easier to grasp than the Java/C++ hybrid in the JDK itself), and loaded several example recordings from my JFR Analytics project, stepping through the parser code in debug mode (fun fact: while doing so, I noticed JMC currently fails to parse events with char attributes).\nJust a feeew hours later, and I largely understood how the thing works. As an image says more than a thousand words, and I’ll never say no to an opportunity to draw something in the fabulous Excalidraw, I proudly present to you this visualization of the JFR file format as per my understanding (click to enlarge):\nIt’s best viewed on a big screen 😎. Alternatively, here’s an SVG version. Now this doesn’t go into all the finest aspects, so you probably couldn’t go off and implement a clean-room JFR file parser solely based on this. But it does show the relevant concepts and mechanisms. I suggest you spend some time going through sections one to five in the picture, and dive into the sections for header, metadata, constant pool, and actual recorded events. 
Studying the image should give you a good understanding of the JFR file format and its structure.\nHere are some observations I made as I found my way through the file format:\nJFR recordings are organized in chunks: Chunks are self-contained independent containers of recorded events and all the metadata required for interpreting these events. There’s no additional content in recordings besides the chunks, i.e. concatenate several chunk files, and you’ll have a JFR recording file. A multi-chunk recording file can be split up into the individual chunks using the jfr utility which comes with OpenJDK:\njfr disassemble --output \u0026lt;target-dir\u0026gt; some-recording.jfr The default chunk size is 12MB, but if needed, you can override this, e.g. using the -XX:FlightRecorderOptions:maxchunksize=1MB option when starting a recording. A smaller chunk size can come in handy if for instance you only want to transmit a specific section of a long-running recording. On the other hand, many small chunks will increase the overall size of a recording, due to the repeatedly stored metadata and constant pools\nThe event format is self-descriptive: The metadata part of each chunk describes the structure of the contained events, all referenced types, their attributes, etc.; by means of JFR metadata annotations, such as @Label, @Description, @Timestamp etc., further metadata like human-readable names and descriptions as well as units of measurement are expressed, making it possible to consume and parse an event stream without a-priori knowledge of specific event types. In particular, this allows for the definition of custom event types and displaying them in the generic event browser of JMC (of course, bespoke views such as the \u0026#34;Memory\u0026#34; view rely on type-specific interpretations of individual event types)\nThe format is geared towards space efficiency: Integer values are stored in a variable-length encoded way (LEB128), which will save lots of space when storing small values. 
A constant pool is used to store repeatedly referenced objects, such as String literals, stack traces, class and method names, etc.; for each usage of such a constant in a recorded event, only the constant pool index is stored (a var-length encoded long). Note that Strings can either be stored as raw values within events themselves, or in the constant pool. Unfortunately, no control is provided for choosing between the two; strings with a length between 16 and 128 will be stored in the constant pool, any others as raw value. It could be a nice extension to give event authors more control here, e.g. by means of an annotation on the event attribute definition\nWhen using the jdk.OldObjectSample event type, beware of bug JDK-8277919, which may cause a bloat of the constant pool, as the same entry is duplicated in the pool many times. This will be fixed in Java 17.0.3 and 18. The format is row-based: Events are stored sequentially one after another in recording files; this means that for instance boolean attributes will consume one full byte, even though eight boolean values could actually be stored in a single byte. It could be interesting to explore a columnar format as an alternative, which may help to further reduce recording size, for instance by also allowing event timestamp values to be compressed efficiently using delta-encoding\nCompression support in JMC reader implementation: The JFR parser implementation of JMC transparently unpacks recording files which are compressed using GZip, ZIP, or LZ4 (Marcus Hirt discusses the compression of JFR recordings in this post). Interestingly, JMC 8.1 still failed to open such compressed recordings with an error message. 
The jfr utility doesn’t support compressed recording files, and I suppose the JFR writer in the JDK doesn’t produce compressed recordings either\n","id":131,"publicationdate":"Feb 20, 2022","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) is one of Java’s secret weapons;\ndeeply integrated into the Hotspot VM, it’s a high-performance event collection framework,\nwhich lets you collect metrics on runtime aspects like object allocation and garbage collection,\nclass loading, file and network I/O, and lock contention, do method profiling, and much more.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eJFR data is persisted in recording files\n(since Java 14, also \u003ca href=\"https://openjdk.java.net/jeps/349\"\u003e\u0026#34;realtime\u0026#34; event streaming\u003c/a\u003e is supported),\nwhich can be loaded for analysis into tools like JDK Mission Control (JMC),\nor the \u003cem\u003ejfr\u003c/em\u003e utility coming with OpenJDK itself.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","internals"],"title":"The JDK Flight Recorder File Format","uri":"https://www.morling.dev/blog/jdk-flight-recorder-file-format/"},{"content":"","id":132,"publicationdate":"Jan 12, 2022","section":"tags","summary":"","tags":null,"title":"api-design","uri":"https://www.morling.dev/tags/api-design/"},{"content":" Table of Contents Context The Error Itself Mitigation General Best Practices As software developers, we’ve all come across those annoying, not-so-useful error messages when using some library or framework: \u0026#34;Couldn’t parse config file\u0026#34;, \u0026#34;Lacking permission for this operation\u0026#34;, etc. Ok, ok, so something went wrong apparently; but what exactly? What config file? Which permissions? And what should you do about it? 
Error messages lacking this kind of information quickly create a feeling of frustration and helplessness.\nSo what makes a good error message then? To me, it boils down to three pieces of information which should be conveyed by an error message:\nContext: What led to the error? What was the code trying to do when it failed?\nThe error itself: What exactly failed?\nMitigation: What needs to be done in order to overcome the error?\nLet’s dive into these individual aspects a bit. Before we start, let me clarify that this is about error messages created by library or framework code, for instance in form of an exception message, or in form of a message written to some log file. This means the consumers of these error messages will typically be either other software developers (encountering errors raised by 3rd party dependencies during application development), or ops folks (encountering errors while running an application).\nThat’s in contrast to user-facing error messages, for which other guidance and rules (in particular in regards to security concerns) should be applied. For instance, you typically should not expose any implementation details in a user-facing message, whereas that’s not that much of a concern — or on the contrary, it can even be desirable — for the kind of error messages discussed here.\nContext In a way, an error message tells a story; and as with every good story, you need to establish some context about its general settings. For an error message, this should tell the recipient what the code in question was trying to do when it failed. In that light, the first example above, \u0026#34;Couldn’t parse config file\u0026#34;, is addressing this aspect (and only this one) to some degree, but probably it’s not enough. 
For instance, it would be very useful to know the exact name of the file:\nCouldn’t parse config file: /etc/sample-config.properties\nUsing an example from Debezium, the open-source change data capture platform I am working on in my day job, the second message could read like so with some context about what happened:\nFailed to create an initial snapshot of the data; lacking permission for this operation\nComing back to error messages related to the processing of some input or configuration file, it can be a good idea to print the absolute path. In case file system resources are provided as relative paths, this can help to identify wrong assumptions around the current working directory, or whatever else is used as the root for resolving relative paths. On the other hand, in particular in case of multi-tenant or SaaS scenarios, you may consider filesystem layouts as a confidential implementation detail, which you may prefer to not reveal to unknown code you run. What’s best here depends on your specific situation.\nIf some framework supports different kinds of files, the specific kind of the problematic file in question should be part of the message as well: \u0026#34;Couldn’t parse entity mapping file…​\u0026#34;. If the error is about specific parts of the contents of a file, displaying the line number and/or the line itself is a good idea.\nIn terms of how to convey the context of an error, it can be part of messages themselves, as shown above. Many logging frameworks also support the notion of a Mapped Diagnostic Context (MDC), a map for propagating arbitrary key/value pairs into log messages. So if your messages are meant to show up in logs, setting contextual information to the MDC can be very useful. 
In Debezium this is used for instance to propagate the name of the affected connector, allowing Kafka Connect users to tell apart log messages originating from different connectors deployed to the same Connect cluster.\nAs far as propagating contextual information via log messages is concerned (as opposed to, say, error messages printed by a CLI tool), structured logging, typically in form of JSON, simplifies any downstream processing. By putting contextual information into separate attributes of a structured log entry, consumers can easily filter messages, ingest only specific sub-sets of messages based on their contents, etc. In case of exceptions, the chain of exceptions leading to the root cause is an important contextual information, too. So I’d recommend to always log the entire exception chain, rather than catching exceptions and only logging some substitute message instead.\nThe Error Itself On to the next part then, the description of the actual error itself. That’s where you should describe what exactly happened in a concise way. Sticking to the examples above, the first message, including context and error description could read like so:\nCouldn’t parse config file: /etc/sample-config.properties; given snapshot mode \u0026#39;nevr\u0026#39; isn’t valid\nAnd for the second one:\nFailed to create an initial snapshot of the data; database user \u0026#39;snapper\u0026#39; is lacking the required permissions\nOther than that, there’s not too much to be said here; try to be efficient: make messages as long as needed, and as short as possible. One idea could be to work with different variants of messages for the same kind of error, a shorter and a longer one. Which one is used could be controlled via log levels or some kind of \u0026#34;verbose\u0026#34; flag. Java developers may find Cédric Champeau’s jdoctor library useful for implementing this. 
Personally, I haven’t used such an approach yet, but it may be worth the effort for specific situations.\nMitigation Having established the context of the failure and what went wrong exactly, the last — and oftentimes most interesting — part is a description of how the user can overcome the error. What’s the action they need to take in order to avoid it? This could be as simple as telling the user about the constraints and/or valid values in case of the config file example (i.e. akin to test failure messages, which show both expected and actual values):\nCouldn’t parse config file: /etc/sample-config.properties; given snapshot mode \u0026#39;nevr\u0026#39; isn’t valid (must be one of \u0026#39;initial\u0026#39;, \u0026#39;always\u0026#39;, \u0026#39;never\u0026#39;)\nIn case of the permission issue, you may clarify which ones are needed:\nCouldn’t take database snapshot: database user \u0026#39;snapper\u0026#39; is lacking the required permissions \u0026#39;SELECT\u0026#39;, \u0026#39;REPLICATION\u0026#39;\nAlternatively, if longer mitigation strategies are required, you may point to a (stable!) URL in your reference documentation which provides the required information:\nCouldn’t take database snapshot: database user \u0026#39;snapper\u0026#39; is lacking the required permissions. 
Please see https://example.com/knowledge-base/snapshot-permissions/ for the complete set of necessary permissions\nIf some configuration change is required (for instance database or IAM permissions), your users will love you even more if you share that information in \u0026#34;executable\u0026#34; form, for instance as GRANT statements which they can simply copy, or vendor-specific CLI invocations such as aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/SomePolicy --role-name SomeRole.\nSpeaking of external resources referenced in error messages, it’s a great idea to have unique error codes as part of your messages (such as Oracle’s ORA codes, or the error messages produced by WildFly and its components). Corresponding resources (either provided by yourself, or externally, for instance in answers on StackOverflow) will then be easy to find using your favourite search engine. Bonus points for adding a reference to your own canonical resource right to the error message itself:\nCouldn’t take database snapshot: database user \u0026#39;snapper\u0026#39; is lacking the required permissions (DBZ-42). Please see https://dbz.codes/dbz-42/ for the complete set of necessary permissions\n(That’s a made-up example, we don’t make use of this approach in Debezium currently; but I probably should look into buying the dbz.codes domain 😉).\nThe key take-away is that you should not leave your users in the dark about what they need to do in order to address the error they ran into. Nothing is more frustrating than essentially being told \u0026#34;You did it wrong!\u0026#34;, without getting hinted at what’s the right thing to do instead.\nGeneral Best Practices Lastly, some practices in regards to error messages which I try to adhere to, and which I would generally recommend:\nUniform voice and style: The specific style chosen doesn’t matter too much, but you should settle on either active vs. passive voice (\u0026#34;couldn’t parse config file\u0026#34; vs. 
\u0026#34;config file couldn’t be parsed\u0026#34;), apply consistent casing, either finish all messages with a dot or none of them, etc.; not a big thing, but it will make your messages a bit easier to deal with\nOne concept, one term: Avoid referring to the same concept from your domain using different terms in different error messages; similarly, avoid using the same term for multiple things. Use the same terms as in other places, e.g. your API documentation, reference guides etc.; the more consistent and unambiguous you are, the better\nDon’t localize error messages: This one is not as clear cut, but I’d generally recommend to not translate error messages into other languages than English; again, this is all not about user-facing error messages, but about messages geared towards software developers and ops folks, who generally should command reasonable English skills; depending on your audience and target market, translations to specific languages might make sense, in which case a common, unambiguous error code should definitely be part of messages, so as to facilitate searching for the error on the internet\nDon’t make error messages an API contract: In case consumers of your API should be able to react to different kinds of errors, they should not be required to parse any error messages in order to do so. Instead, raise an exception type which exposes a machine-processable error code, or raise specific exception types which can be caught separately by the caller\nBe cautious about exposing sensitive data: if your library is in the business of handling and processing sensitive user data, make sure to not create any privacy concerns; for instance, \u0026#34;show actual vs. 
expected value\u0026#34; may not pose a problem for values provided by an application developer or administrator; but it can pose a problem if the actual value is GDPR protected user data\nEither raise an exception OR log an error, but not both: A given error should either be communicated by raising an exception or by logging an error. Otherwise, when doing both, as the exception will typically end up being logged via some kind of generic handler anyways, the user would see information about the same error in their logs twice, which only adds confusion\nFail early: This one is not so much about how to express error messages, but when to raise them; in general, the earlier, the better; a message at application start-up beats one later at runtime; a message at build time beats one at start-up, etc. Quicker feedback makes for shorter turn-around times for fixes and also helps to provide the context of any failures\nWith that all being said, what’s your take on the matter? Any best practices you would recommend? Do you have any examples for particularly well (or poorly) crafted messages? 
Let me know in the comments below!\n","id":133,"publicationdate":"Jan 12, 2022","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_context\"\u003eContext\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_error_itself\"\u003eThe Error Itself\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_mitigation\"\u003eMitigation\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_general_best_practices\"\u003eGeneral Best Practices\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs software developers, we’ve all come across those annoying, not-so-useful error messages when using some library or framework: \u003cem\u003e\u0026#34;Couldn’t parse config file\u0026#34;\u003c/em\u003e, \u003cem\u003e\u0026#34;Lacking permission for this operation\u0026#34;\u003c/em\u003e, etc.\nOk, ok, so \u003cem\u003esomething\u003c/em\u003e went wrong apparently; but what exactly? What config file? Which permissions? And what should you do about it?\nError messages lacking this kind of information quickly create a feeling of frustration and helplessness.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSo what makes a good error message then?\nTo me, it boils down to three pieces of information which should be conveyed by an error message:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cstrong\u003eContext:\u003c/strong\u003e What led to the error? 
What was the code trying to do when it failed?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cstrong\u003eThe error itself:\u003c/strong\u003e What exactly failed?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\u003cstrong\u003eMitigation:\u003c/strong\u003e What needs to be done in order to overcome the error?\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e","tags":["software-engineering","api-design"],"title":"What's in a Good Error Message?","uri":"https://www.morling.dev/blog/whats-in-a-good-error-message/"},{"content":" Table of Contents Discussion and Outlook 🧸 It’s Casey. Casey Cuddle.\nI am very happy to announce the first stable release of kcctl, a modern and intuitive command line client for Apache Kafka Connect!\nForget about having to memorize and type the right REST API paths and curl flags; with kcctl, managing your Kafka connectors is done via concise and logically structured commands, modeled after the semantics of the kubectl tool known from Kubernetes.\nStarting now, kcctl is available via SDKMan, which means it’s as easy as running sdk install kcctl for getting the latest kcctl release onto your Linux, macOS, or Windows x86 machine. For the best experience, also install the kcctl shell completion script, which not only \u0026lt;TAB\u0026gt;-completes command names and options, but also dynamic information such as connector, task, and logger names:\n1 2 wget https://raw.githubusercontent.com/kcctl/kcctl/main/kcctl_completion . kcctl_completion kcctl offers commands for all the common tasks you’ll encounter when dealing with Kafka Connect, such as listing the available connector plug-ins, registering new connectors, changing their configuration, pausing and resuming them, changing log levels, and much more.\nSimilar to kubectl, kcctl works with the notion of named configuration contexts. Contexts allow you to set up multiple named Kafka Connect environments (e.g. 
\u0026#34;local\u0026#34; and \u0026#34;testing\u0026#34;) and easily switch between them, without having to specify the current Connect cluster URL all the time:\n$ kcctl config get-contexts NAME KAFKA CONNECT URI local http://localhost:8083 testing* http://localhost:8084 $ kcctl config use-context local Using context \u0026#39;local\u0026#39; $ kcctl get plugins TYPE CLASS VERSION source io.debezium.connector.db2.Db2Connector 1.8.0.Final source io.debezium.connector.mongodb.MongoDbConnector 1.8.0.Final source io.debezium.connector.mysql.MySqlConnector 1.8.0.Final source org.apache.kafka.connect.file.FileStreamSourceConnector 3.0.0 source org.apache.kafka.connect.mirror.MirrorCheckpointConnector 1 source org.apache.kafka.connect.mirror.MirrorHeartbeatConnector 1 source org.apache.kafka.connect.mirror.MirrorSourceConnector 1 sink org.apache.kafka.connect.file.FileStreamSinkConnector 3.0.0 Once you’ve set up a kcctl context, you can start using the tool for managing your connectors. Here is a video which shows a typical workflow in kcctl (note this recording shows an earlier version of kcctl; there are a few fewer commands, and the notion of contexts has slightly changed since then):\nAs shown in the video, connectors are registered and updated via kcctl apply. This command can also read input from stdin, which for instance comes in handy when templating connector configuration using Jsonnet and setting up multiple similar connectors at once:\nTo learn more about these and all the other commands available in kcctl, run kcctl --help.\nDiscussion and Outlook kcctl offers an easy yet very powerful way for solving your day-to-day tasks with Kafka Connect. In comparison to using the REST API directly via clients such as curl or httpie, kcctl as a dedicated tool offers commands which are more concise and intuitive; also its output is logically organized, using colored formatting to highlight key information. 
It has become an invaluable tool for my own work on Debezium, e.g. when testing, or doing a demo. These days, I find myself very rarely using the REST API directly any more.\nI hope kcctl becomes a useful helper for folks working with Kafka Connect. As such, I see it as a complement to other means of interacting with Kafka Connect. Sometimes a CLI client may be what does the job the best, while at other times you may prefer to work with a graphical user interface such as Debezium UI or the vendor-specific consoles of managed connector services, Kubernetes operators such as Strimzi, Terraform, or perhaps even a Java API. It’s all about options!\nWhile all the typical Kafka Connect workflows are supported by kcctl already, there’s quite a few additional features I’d love to see. First and foremost, the ability to display (and reset) the offsets of Kafka Connect source connectors. Work on that is well underway, and I expect this to be available very soon. There also should be support for different output formats such as JSON, improving usability in conjunction with other CLI tools such as jq. The restart command should be expanded, so as to take advantage of the API for restarting all (failed) connector tasks added in Kafka Connect 3.0. Going beyond the scope of supporting plain Kafka Connect, there could also be connector specific commands, such as an option for compacting the history topic of Debezium connectors. Of course, your feature requests are welcome, too! Please log an issue in the kcctl project with your proposals for additions to the tool. 
And while at it, we’d also love to welcome you as a stargazer 🌟 to the project!\nLastly, a big thank you to all the amazing people who have contributed to kcctl up to this point:\nAndres Almiray, Guillaume Smet, Hans-Peter Grahsl, Iskandar Abudiab, Jay Patel, Karim ElNaggar, Michael Simons, Mickael Maison, Oliver Weiler, Sergey Nuyanzin, Siddique Ahmad, Thomas Dangleterre, and Tony Foster!\nYou’re the best 🧸!\n","id":134,"publicationdate":"Dec 21, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_discussion_and_outlook\"\u003eDiscussion and Outlook\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e🧸 \u003cem\u003eIt’s Casey. Casey Cuddle.\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI am very happy to announce the first stable release of \u003ca href=\"https://github.com/kcctl/kcctl\"\u003ekcctl\u003c/a\u003e,\na modern and intuitive command line client for \u003ca href=\"https://kafka.apache.org/documentation/#connect\"\u003eApache Kafka Connect\u003c/a\u003e!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eForget about having to memorize and type the right REST API paths and curl flags;\nwith kcctl, managing your Kafka connectors is done via concise and logically structured commands,\nmodeled after the semantics of the kubectl tool known from Kubernetes.\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","kafka-connect","tooling"],"title":"Announcing the First Release of kcctl","uri":"https://www.morling.dev/blog/announcing-first-release-of-kcctl/"},{"content":" Table of Contents The Challenge The OSS Quickstart Archetype Outlook I am very happy to announce the availability of the OSS Quickstart Archetype!\nPart of the ModiTect family of open-source 
projects, this is a Maven archetype which makes it very easy to bootstrap new Maven-based open-source projects, satisfying common requirements such as configuring plug-in versions, and adhering to best practices like auto-formatting the source code. Think Maven Quickstart Archetype and friends, but more modern, complete, and opinionated.\nThe Challenge When bootstrapping new Maven-based projects, be it long-running ones, short-lived proof-of-concept projects, or just some quick demo you’d like to publish on GitHub, there’s always some boilerplate involved: creating the POM with the right plug-in versions and configurations, preparing CI e.g. on GitHub Actions, providing a license file, etc.\nWhile you could try and copy (parts of) an existing project you already have, Maven has a better answer to this problem: archetypes, pre-configured project templates which can be parameterized to some degree and which let you create new projects with just a few steps. Unfortunately, the canonical Maven quickstart archetype is rather outdated, creating projects for Java 1.7, using JUnit 4, etc.\nThe OSS Quickstart Archetype The OSS (open-source software) quickstart archetype is meant as a fresh alternative, not only providing more current defaults and dependency versions, but also going beyond what’s provided by the traditional quickstart archetype. 
More specifically, it\ndefines up-to-date versions of all plug-ins in use, as well as of JUnit 5 and AssertJ (the opinionated part ;)\nenforces all plug-in versions to be defined via the Maven enforcer plug-in\nprovides a license file and uses the license Maven plug-in for formatting/checking license headers in all source files\ndefines a basic set up for CI on GitHub Actions, building the project upon each push to the main branch of your repository and for each PR\nconfigures plug-ins for auto-formatting code and imports (I told you, it’s opinionated)\ndefines a -Dquick option for skipping all non-essential plug-ins, allowing you to produce the project’s JAR as quickly as possible\n(optionally) provides a module-info.java descriptor\nAnd most importantly, opening braces are not on the next line. We all agree nobody likes that, right?! Using the OSS Quickstart Archetype for bootstrapping a new project is as simple as running the following command:\nmvn archetype:generate -B \\ -DarchetypeGroupId=org.moditect.ossquickstart \\ -DarchetypeArtifactId=oss-quickstart-simple-archetype \\ -DarchetypeVersion=1.0.0.Alpha1 \\ -DgroupId=com.example.demos \\ -DartifactId=fancy-project \\ -Dversion=1.0.0-SNAPSHOT \\ -DmoduleName=com.example.fancy Just a few seconds later, and you’ll have a new project applying all the configuration above, ready for you to start some open-source awesomeness.\nOutlook Version 1.0.0.Alpha1 of the OSS Quickstart Archetype is available today on Maven Central, i.e. you can start using it for bootstrapping new projects right now. It already contains most of the things I wanted it to have, but there’s also a few more improvements I would like to make:\nAdd the Maven wrapper (#1)\nMake the license of the generated project configurable; currently, it uses Apache License, version 2. 
I’d like to make this an option of the archetype, which would let you choose between this license and a few other key open-source licenses, like MIT and BSD 3-clause (#2)\nProvide a variant of the archetype for creating multi-module Maven projects (#7)\nAdd basic CheckStyle configuration (also skippable via -Dquick, #10)\nAny contributions for implementing these, as well as other feature requests are highly welcome. Note the idea is to keep these archetypes lean and mean, i.e. they should only contain widely applicable features, leaving more specific things for the user to add after they created a project with the archetype.\nHappy open-sourcing!\nMany thanks to Andres Almiray for setting up the release pipeline for this project, using the amazing JReleaser tool!\n","id":135,"publicationdate":"Dec 2, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_challenge\"\u003eThe Challenge\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_oss_quickstart_archetype\"\u003eThe OSS Quickstart Archetype\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_outlook\"\u003eOutlook\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI am very happy to announce the availability of the \u003ca href=\"https://github.com/moditect/oss-quickstart\"\u003eOSS Quickstart Archetype\u003c/a\u003e!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003ePart of the \u003ca href=\"https://github.com/moditect/\"\u003eModiTect\u003c/a\u003e family of open-source projects,\nthis is a Maven archetype which makes it very easy to bootstrap new Maven-based open-source projects,\nsatisfying common requirements such as configuring plug-in versions, and adhering to best practices like auto-formatting the source code.\nThink \u003ca 
href=\"https://maven.apache.org/archetypes/maven-archetype-quickstart/scm.html\"\u003eMaven Quickstart Archetype\u003c/a\u003e and friends, but more modern, complete, and opinionated.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","maven","tooling"],"title":"Introducing the OSS Quickstart Archetype","uri":"https://www.morling.dev/blog/introducing-oss-quickstart-archetype/"},{"content":"","id":136,"publicationdate":"Nov 29, 2021","section":"tags","summary":"","tags":null,"title":"error-handling","uri":"https://www.morling.dev/tags/error-handling/"},{"content":" The other day, I came across an interesting thread in the Java sub-reddit, with someone asking: \u0026#34;Has anyone attempted to write logs directly to Kafka?\u0026#34;. This triggered a number of thoughts and questions for myself, in particular how one should deal in an application when an attempt to send messages to Kafka fails, for instance due to some network connectivity issue? What do you do when you cannot reach the Kafka broker?\nWhile the Java Kafka producer buffers requests internally (primarily for performance reasons) and also supports retries, you cannot do so indefinitely (or can you?), so I went to ask the Kafka community on Twitter how they would handle this situation:\n#Kafka users: how do you deal in producers with brokers not being available? Take a use case like sending logs; you don\u0026#39;t want to fail your business process due to Kafka issues here, it\u0026#39;s fine do this later on. Large producer buffer and retries? Some extra buffer (e.g. off-heap)?\n— Gunnar Morling 🌍 (@gunnarmorling) November 27, 2021 This question spawned a great discussion with tons of insightful replies (thanks a lot to you all!), so I thought I’d try and give an overview on the different comments and arguments. 
As with everything, the right strategy and solution depend on the specific requirements of the use case at hand; in particular whether you can or cannot afford potential inconsistencies between the state of the caller of your application, its own state, and the state in the Kafka cluster.\nAs an example, let’s consider an application which exposes a REST API for placing purchase orders. Acknowledging such a request while actually failing to send a Kafka message with the purchase order to some fulfillment system would be pretty bad: the user would believe their order has been received and will be fulfilled eventually, whereas that’s actually not the case.\nOn the other hand, if the incoming request was safely persisted in a database, and a message is sent to Kafka only for logging purposes, we may be fine to accept this inconsistency between the user’s state (\u0026#34;my order has been received\u0026#34;), the application’s state (order is stored in the database), and the state in Kafka (log message got lost; not ideal, but not the end of the world either).\nUnderstanding these different semantics helps to put the replies to the question into context. There’s one group of replies along the lines of \u0026#34;buffer indefinitely, block inbound requests until messages are sent\u0026#34;, e.g. by Pere Urbón-Bayes:\nThis would certainly depend on the client used and your app use case. Generally speaking, retry forever and block if the buffer is full, leave time for broker to recover, with backpressure.if backpressure not possible, cause use case, off-load off-heap for later recovery.\n— Pere Urbón-Bayes (@purbon) November 28, 2021 This strategy makes a lot of sense if you cannot afford any inconsistency between the state of the different actors at all: e.g. 
when you’d rather tell the user that you cannot receive their purchase order right now, instead of running the risk of telling them that you did, whereas you actually didn’t.\nWhat though, if we don’t want to let the availability of a resource like Apache Kafka — which is used for asynchronous message exchanges to begin with — impact the availability of our own application? Can we somehow buffer requests in a safe way, if they cannot be sent to Kafka right away? This would allow us to complete the inbound request, while hopefully still avoiding any inconsistencies, at least eventually.\nNow simply buffering requests in memory isn’t reliable in any meaningful sense of the word; if the producing application crashes, any unsent messages will be lost, making this approach no different in terms of reliability from working with acks=0, i.e. not waiting for any acknowledgements from the Kafka broker. It may be useful for pure fire-and-forget use cases, where you don’t care about delivery guarantees at all, but these tend to be rare.\nMultiple folks therefore suggested more reliable means of implementing such buffering, e.g. by storing un-sent messages on disk or by using some local, persistent queuing implementation. Some have built solutions using existing open-source components, as Antón Rodriguez and Josh Reagan suggest:\nI usually retry forever, specially when reading from another topic because we can apply backpressure. In some cases, discard after some time is ok. Very rarely off-heap with ChronicleQueue or MapsDB. I have considered but never used an external service as DLQ or a Kafka mesh\n— Antón (@antonmry) November 27, 2021 Embedded broker and in-vm protocol. Either ActiveMQ or Artemis work great.\n— Josh Reagan (@joshdreagan) November 28, 2021 You even could think of having a Kafka cluster close by (which then may have other accessibility characteristics than your \u0026#34;primary\u0026#34; cluster e.g. 
running in another availability zone) and keeping everything in sync via tools such as MirrorMaker 2. Others, like Jonathan Santilli, create their own custom solutions by forking existing projects:\nI forked Apache Flume and modified it to used a WAL on the disk, so, messages are technically sent, but store on disk, when the Broker is available, the local queue gets flushed, all transparent for the producer.\n— Jonathan Santilli (@pachilo) November 27, 2021 Also ready-made wrappers around the producer exist, e.g. in Wix\u0026#39; Greyhound Kafka client library, which supports producing via local disk as per Derek Moore:\nI built a proprietary \u0026#34;data refinery\u0026#34; on Kafka for @fanthreesixty and we built ourselves libraries not dissimilar to https://t.co/uQdepGHTzj\n— Derek Moore (@derekm00r3) November 27, 2021 But there be dragons! Persisting to disk will actually not be any better at all, if it’s for instance an ephemeral disk of a Kubernetes pod which gets destroyed after an application crash. But even when using persistent volumes, you may end up with an inherently unreliable solution, as Mic Hussey points out:\nThese are two contradictory requirements 😉 Sooner or later you will run out of local storage capacity. And unless you are very careful you end up moving from a well understood shared queue to a hacked together implicit queue.\n— Mic Hussey (@hussey_mic) November 29, 2021 So it shouldn’t come as a surprise that people in this situation have been looking at alternatives, e.g. by using DynamoDB or S3 as an intermediary buffer; the team around Natan Silnitsky working on Greyhound at Wix are exploring this option currently:\nSo instead we want to fallback only on failure to send. 
In addition we want to skip the disk all together, because recovery mechanism when a pod is killed in #Kubernetes is too complex (involves a daemonset...), So we\u0026#39;re doing a POC, writing to #DynamoDB/#S3 upon failure 2/3 🧵\n— Natan Silnitsky (@NSilnitsky) November 29, 2021 At this point it’s worth thinking about failure domains, though. Say your application is in its own network and it cannot write to Kafka due to some network split, chances are that it cannot reach other services like S3 either. So another option could be to use a datastore close by as a buffer, for instance a replicated database running on the same Kubernetes cluster or at least in the same availability zone.\nIf this reminds you of change data capture (CDC) and the outbox pattern, you’re absolutely right; multiple folks made this point as well in the conversation, including Natan Silnitsky and R.J. Lorimer:\nThen a dedicated service will listen to #DynamoDB CDC events and produce to #ApacheKafka including payload, key, headers, etc...\n— Natan Silnitsky (@NSilnitsky) November 29, 2021 For our event sourcing systems the event being delivered actually is critical. For \u0026#34;pure\u0026#34; cqrs services, Kafka being down is paramount to not having a db so we fail. Other systems use a transactional outbox that persists in the db. If Kafka is down it sits there until ready.\n— R.J. Lorimer (He/Him) (@realjenius) November 27, 2021 As Kacper Zielinski tells us, this approach is an example of a staged event-driven architecture, or SEDA for short:\nOutbox / SEDA to rescue here. 
Not sure if any \u0026#34;retry\u0026#34; can guarantee you more than \u0026#34;either you will loose some messages or fail the business logic by eating all resources\u0026#34; :)\n— Kacper Zielinski (@xkzielinski) November 27, 2021 In this model, a database serves as the buffer for persisting messages before they are sent to Kafka, which makes for a highly reliable solution, provided the right degree of redundancy is implemented e.g. in the form of replicas. In fact, if your application needs to write to a database anyways, \u0026#34;sending\u0026#34; messages to Kafka via an outbox table and CDC tools like Debezium is a great way to avoid any inconsistencies between the state in the database and Kafka, without incurring any unsafe dual writes.\nBut of course there is a price to pay here too: end-to-end latency will be increased when going through a database first and then to Kafka, rather than going to Kafka directly. You also should keep in mind that the more moving pieces your solution has, the more complex it will become to operate, and the more subtle and hard-to-understand failure modes and edge cases it will have.\nAn excellent point is made by Adam Kotwasinski by stating that it’s not a question of whether things will go wrong, but only when they will go wrong, and that you need to have the right policies in place in order to be prepared for that:\nFor some of my usecases I have a wrapper for Kafka\u0026#39;s producer that requires users to _explicitly_ set up policies like retry/backoff/drop. It allows my customers to think about outages (that will happen!) up front instead of being surprised. Each usecase is different.\n— Adam Kotwasinski (@AKotwasinski) November 28, 2021 In the end it’s all about trade-offs, probabilities and acceptable risks. 
For instance, would you receive and acknowledge that purchase order request as long as you can store it in a replicated database in the local availability zone, or would you rather reject it, as long as you cannot safely persist it in a multi-AZ Kafka cluster?\nThese questions aren’t merely technical ones any longer, but they require close collaboration with product owners and subject matter experts in the business domain at hand, so as to make the most suitable decisions for your specific situation. Managed services with defined SLAs guaranteeing high availability values can make the deciding difference here, as Vikas Sood mentions:\nThat\u0026#39;s why we decided to go with a managed offering to avoid disruptions in some critical processes.Some teams still had another decoupling layer (rabbit) between producers and Kafka. Was never a huge fan of that coz it simply meant more points of failure.\n— Vikas Sood (@Sood1Vikas) November 27, 2021 Thanks a lot again to everyone chiming in and sharing their experiences, this was highly interesting and insightful! You have further ideas and thoughts to share? Let me and the community at large know either by leaving a comment below, or by replying to the thread on Twitter. I’m also curious about your feedback on this format of putting a Twitter discussion into some expanded context. It’s the first time I’ve been doing it, and I’d be eager to know whether you find it useful or not. 
Thanks!\n","id":137,"publicationdate":"Nov 29, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day, I came across an \u003ca href=\"https://www.reddit.com/r/java/comments/r2z17a/has_any_one_attempted_to_write_logs_directly_to/\"\u003einteresting thread\u003c/a\u003e in the Java sub-reddit, with someone asking:\n\u0026#34;Has anyone attempted to write logs directly to Kafka?\u0026#34;.\nThis triggered a number of thoughts and questions for myself,\nin particular how one should deal in an application when an attempt to send messages to Kafka fails,\nfor instance due to some network connectivity issue?\nWhat do you do when you cannot reach the Kafka broker?\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","reliability","error-handling"],"title":"O Kafka, Where Art Thou?","uri":"https://www.morling.dev/blog/kafka-where-art-thou/"},{"content":"","id":138,"publicationdate":"Nov 22, 2021","section":"tags","summary":"","tags":null,"title":"compatibility","uri":"https://www.morling.dev/tags/compatibility/"},{"content":" Table of Contents The Problem Bridge Methods to the Rescue Creating Bridge Methods Ourselves If you work on any kind of software library, ensuring backwards-compatibility is a key concern: if there’s one thing which users really dislike, it is breaking changes in a new version of a library. The rules of what can (and cannot) be changed in a Java API without breaking existing consumers are well defined in the Java language specification (JLS), but things can get pretty interesting in certain corner cases.\nThe Eclipse team provides a comprehensive overview about API evolution guidelines in their wiki. When I shared the link to this great resource on Twitter the other day, I received an interesting reply from Lukas Eder:\nI wish Java had a few tools to prevent some cases of binary compatibility breakages. E.g. 
when refining a method return type, I’d like to keep the old method around in byte code (but not in source code). I think kotlin has such tools? In the remainder of this post, I’d like to provide some more insight into that problem mentioned by Lukas, and how it can be addressed using an open-source tool called Bridger.\nThe Problem Let’s assume we have a Java library which provides a public class and method like this:\npublic class SomeService { public Number getSomeNumber() { return 42L; } } The library is released as open-source and it gets adopted quickly by the community; it’s a great service after all, providing 42 as the answer, right?\nAfter some time though, people start to complain: instead of the generic Number return type, they’d rather prefer a more specific return type of Long, which for instance offers the compareTo() method. Since the returned value is always a long value indeed (and no other Number subtype such as Double), we agree that the initial API definition wasn’t ideal, and we alter the method definition, now returning Long instead.\nBut soon after we’ve released version 2.0 of the library with that change, users report a new problem: after upgrading to the new version, they suddenly get the following error when running their application:\njava.lang.NoSuchMethodError: \u0026#39;java.lang.Number dev.morling.demos.bridgemethods.SomeService.getSomeNumber()\u0026#39; at dev.morling.demos.bridgemethods.SomeClientTest.shouldReturn42(SomeClientTest.java:27) That doesn’t look good! Interestingly, other users don’t have a problem with version 2.0, so what is going on here? In order to understand that, let’s take a look at how this method is used, in source code and in Java binary code. 
First the source code:\npublic class SomeClient { public String getSomeNumber() { SomeService service = new SomeService(); return String.valueOf(service.getSomeNumber()); } } Rather unspectacular; so let’s use javap to examine the byte code of that class:\npublic java.lang.String getSomeNumber(); descriptor: ()Ljava/lang/String; flags: (0x0001) ACC_PUBLIC Code: stack=2, locals=2, args_size=1 0: new #7 // class dev/morling/demos/bridgemethods/SomeService 3: dup 4: invokespecial #9 // Method dev/morling/demos/bridgemethods/SomeService.\u0026#34;\u0026lt;init\u0026gt;\u0026#34;:()V 7: astore_1 8: aload_1 9: invokevirtual #10 // Method dev/morling/demos/bridgemethods/SomeService.getSomeNumber:()Ljava/lang/Number; 12: invokestatic #14 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String; 15: areturn LineNumberTable: line 21: 0 line 22: 8 LocalVariableTable: Start Length Slot Name Signature 0 16 0 this Ldev/morling/demos/bridgemethods/SomeClient; 8 8 1 service Ldev/morling/demos/bridgemethods/SomeService; The interesting part is the invokevirtual at label 9; that’s the invocation of the SomeService::getSomeNumber() method, and as we see, the return type of the invoked method is part of the byte code of that invocation, too. As developers writing code in the Java language, this might come as a surprise at first, as we tend to think of just a method’s name and its parameter types as the method signature. For instance, we may not declare two methods which only differ by their return type in the same Java class. But from the perspective of the Java runtime, the return type of a method is part of method signatures as well.\nThis explains the error reports we got from our users: when changing the method return type from Number to Long, we did a change that broke the binary compatibility of our library. 
The JVM was looking for a method SomeService::getSomeNumber() returning Number, but it couldn’t find it in the class file of version 2.0 of our service.\nIt also explains why not all the users reported that problem: those that recompiled their own application when upgrading to 2.0 would not run into any issues, as the compiler would simply use the new version of the method and put the invocation of that signature into the class files of any callers. Only those users who did not re-compile their code encountered the problem, i.e. the change actually was source-compatible.\nBridge Methods to the Rescue At this point you might wonder: Isn’t it possible to refine method return types in sub-classes? How does that work then? Indeed it’s true, Java does support co-variant return types, i.e. a sub-class can override a method using a more specific return type than declared in the super-type:\npublic class SomeSubService extends SomeService { @Override public Long getSomeNumber() { return 42L; } } To make this work for a client coded against the super-type, the Java compiler uses a neat trick: it injects a so-called bridge method into the class file of the sub-class, which has the signature of the overridden method and which calls the overriding method. 
This is what this looks like when disassembling the SomeSubService class file:\npublic java.lang.Long getSomeNumber(); (1) descriptor: ()Ljava/lang/Long; flags: (0x0001) ACC_PUBLIC Code: stack=2, locals=1, args_size=1 0: ldc2_w #14 // long 42l 3: invokestatic #21 // Method java/lang/Long.valueOf:(J)Ljava/lang/Long; 6: areturn LineNumberTable: line 22: 0 LocalVariableTable: Start Length Slot Name Signature 0 7 0 this Ldev/morling/demos/bridgemethods/SomeSubService; public java.lang.Number getSomeNumber(); (2) descriptor: ()Ljava/lang/Number; flags: (0x1041) ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC (3) Code: stack=1, locals=1, args_size=1 0: aload_0 1: invokevirtual #24 // Method getSomeNumber:()Ljava/lang/Long; 4: areturn LineNumberTable: line 18: 0 LocalVariableTable: Start Length Slot Name Signature 0 5 0 this Ldev/morling/demos/bridgemethods/SomeSubService; 1 The overriding method as defined in the sub-class 2 The bridge method with the signature from the super-class, invoking the overriding method 3 The injected method has the ACC_BRIDGE and ACC_SYNTHETIC modifiers That way, a client compiled against the super-type method will first invoke the bridge method, which in turn delegates to the overriding method of the sub-class, providing the late binding semantics we’d expect from Java.\nAnother situation where the Java compiler relies on bridge methods is compiling sub-types of generic super-classes or interfaces. Refer to the Java Tutorial to learn more about this. Creating Bridge Methods Ourselves So as we’ve seen, with bridge methods, there is a tool in the box to ensure compatibility in case of refining return types in sub-classes. 
Which brings us back to Lukas\u0026#39; question from the beginning: is there a way to use the same trick for ensuring compatibility when evolving our API across library versions?\nNow you can’t define a bridge method using the Java language, this concept just doesn’t exist at the language level. So I thought about quickly hacking together a PoC for this using the ASM bytecode manipulation toolkit; but what’s better than creating open-source? Re-using existing open-source! As it turns out, there’s a tool for that very purpose exactly: Bridger, created by my fellow Red Hatter David M. Lloyd.\nBridger lets you create your own bridge methods, using ASM to apply the required class file transformations for turning a method into a bridge method. It comes with a Maven plug-in for integrating this transformation step into your build process. Here’s the plug-in configuration we need:\n\u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.jboss.bridger\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;bridger\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.5.Final\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;weave\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;process-classes\u0026lt;/phase\u0026gt; (1) \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;transform\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;dependencies\u0026gt; \u0026lt;dependency\u0026gt; (2) \u0026lt;groupId\u0026gt;org.ow2.asm\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;asm\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;9.2\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;/dependencies\u0026gt; \u0026lt;/plugin\u0026gt; 1 Bind the transform goal to the process-classes build lifecycle phase, so as to modify the classes produced by the Java compiler 2 Use the latest version of ASM, so we 
can work with Java 17 With the plug-in in place, you can define bridge methods like so, using the $$bridge name suffix (seems the syntax highlighter doesn’t like the $ signs in identifiers…​):\npublic class SomeService { /** * @hidden (1) */ public Number getSomeNumber$$bridge() { (2) return getSomeNumber(); } public Long getSomeNumber() { return 42L; } } 1 By means of the @hidden JavaDoc tag (added in Java 9), this method will be excluded from the JavaDoc generated for our library 2 The bridge method to be; the name suffix will be removed by Bridger, i.e. it will be named getSomeNumber; it will also have the ACC_BRIDGE and ACC_SYNTHETIC modifiers And that’s what the byte code of SomeService looks like after Bridger applied the transformation:\npublic java.lang.Number getSomeNumber(); descriptor: ()Ljava/lang/Number; flags: (0x1041) ACC_PUBLIC, ACC_BRIDGE, ACC_SYNTHETIC Code: stack=1, locals=1, args_size=1 0: aload_0 1: invokevirtual #16 // Method getSomeNumber:()Ljava/lang/Long; 4: areturn LineNumberTable: line 21: 0 LocalVariableTable: Start Length Slot Name Signature 0 5 0 this Ldev/morling/demos/bridgemethods/SomeService; public java.lang.Long getSomeNumber(); descriptor: ()Ljava/lang/Long; flags: (0x0001) ACC_PUBLIC Code: stack=2, locals=1, args_size=1 0: ldc2_w #17 // long 42l 3: invokestatic #24 // Method java/lang/Long.valueOf:(J)Ljava/lang/Long; 6: areturn LineNumberTable: line 25: 0 LocalVariableTable: Start Length Slot Name Signature 0 7 0 this Ldev/morling/demos/bridgemethods/SomeService; With that, we have solved the challenge: utilizing a bridge method, we can rectify the glitch in the version 1.0 API and refine the method return type in a new version of our library, without breaking source or binary compatibility with existing users.\nBy means of the @hidden JavaDoc tag, the source of our bridge method won’t show up in the rendered documentation 
(which would be rather confusing), and marked as a synthetic bridge method in the class file, it also won’t show up when looking at the JAR in an IDE.\nIf you’d like to start your own explorations of Java bridge methods, you can find the complete source code of the example in this GitHub repo. Useful tools for tracking API changes and identifying any potential breaking changes include SigTest (we use this one for instance in the Bean Validation specification to ensure backwards compatibility) and Revapi (which we use in Debezium). Lastly, here’s a great blog post by Stuart Marks, where he describes how even the seemingly innocent addition of a Java default method to a widely used (and implemented) interface may lead to problems in the real world.\n","id":139,"publicationdate":"Nov 22, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_problem\"\u003eThe Problem\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_bridge_methods_to_the_rescue\"\u003eBridge Methods to the Rescue\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_creating_bridge_methods_ourselves\"\u003eCreating Bridge Methods Ourselves\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIf you work on any kind of software library,\nensuring backwards-compatibility is a key concern:\nif there’s one thing which users really dislike, it is breaking changes in a new version of a library.\nThe rules of what can (and cannot) be changed in a Java API without breaking existing consumers are well defined in the Java language specification (JLS),\nbut things can get pretty interesting in certain corner cases.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe Eclipse team provides a \u003ca 
href=\"https://wiki.eclipse.org/Evolving_Java-based_APIs_2\"\u003ecomprehensive overview\u003c/a\u003e about API evolution guidelines in their wiki.\nWhen I shared the link to this great resource on Twitter the other day,\nI received an \u003ca href=\"https://twitter.com/lukaseder/status/1462358911072317440\"\u003einteresting reply\u003c/a\u003e from Lukas Eder:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"quoteblock\"\u003e\n\u003cblockquote\u003e\nI wish Java had a few tools to prevent some cases of binary compatibility breakages. E.g. when refining a method return type, I’d like to keep the old method around in byte code (but not in source code).\n\u003cbr/\u003e\n\u003cbr/\u003e\nI think kotlin has such tools?\n\u003c/blockquote\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn the remainder of this post,\nI’d like to provide some more insight into that problem mentioned by Lukas,\nand how it can be addressed using an open-source tool called \u003ca href=\"https://github.com/dmlloyd/bridger\"\u003eBridger\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","api-design","compatibility"],"title":"Refining The Return Type Of Java Methods Without Breaking Backwards-Compatibility","uri":"https://www.morling.dev/blog/refining-return-type-java-methods-without-breaking-backwards-compatibility/"},{"content":" If you have followed this blog for a while, you’ll know that I am a big fan of JDK Flight Recorder (JFR), the low-overhead diagnostics and profiling framework built into the HotSpot Java virtual machine. And indeed, until recently, this meant only HotSpot: Folks compiling their Java applications into GraalVM native binaries could not benefit from all the JFR goodness so far.\nBut luckily, this situation has changed with GraalVM 21.2: Thanks to a collaboration of engineers from Red Hat and Oracle, GraalVM native binaries now also support JDK Flight Recorder. 
At this point, while the JFR recording engine itself has been put in place, there are not many event types actually emitted yet. As Jie Kang wrote recently in a post about this ongoing work, this should change soon, though:\nThe initial merge for JFR infrastructure is complete but there is a long road ahead before the system can provide a view into native executables produced by GraalVM that is similar to what is possible for HotSpot. Up next is the work to add events for garbage collection, threads, exceptions, and other useful locations in SubstrateVM. — JDK Flight Recorder support for GraalVM Native Image: The journey so far What already does work is emitting custom JFR events from your application code. So I took the Quarkus-based todo management application from my earlier post about monitoring REST APIs with JFR and explored what it’d take to make it work as a native binary. And what can I say, essentially things \u0026#34;just worked ™️\u0026#34;. All I had to do was the following:\nUse a current version of GraalVM (21.3 at the time of writing)\nUpgrade Quarkus to the current version (2.4.2.Final, released just today); with 2.2.3.Final, which I had been using before, I’d get an error at image build time about a modifier mismatch with a native method substituted by Quarkus\nEnable GraalVM’s AllowVMInspection option when creating the native binary\nAs per the GraalVM documentation, the latter is required in order to use JFR events in native binaries. 
Unfortunately, failing to do so will only be reported at application runtime with an exception like this:\n1 2 3 4 5 6 7 8 9 10 11 2021-11-12 15:31:22,456 ERROR [io.qua.run.Application] (main) Failed to start application (with profile prod): java.lang.UnsatisfiedLinkError: jdk.jfr.internal.JVM.getHandler(Ljava/lang/Class;)Ljava/lang/Object; [symbol: Java_jdk_jfr_internal_JVM_getHandler or Java_jdk_jfr_internal_JVM_getHandler__Ljava_lang_Class_2] at com.oracle.svm.jni.access.JNINativeLinkage.getOrFindEntryPoint(JNINativeLinkage.java:153) at com.oracle.svm.jni.JNIGeneratedMethodSupport.nativeCallAddress(JNIGeneratedMethodSupport.java:57) at jdk.jfr.internal.JVM.getHandler(JVM.java) at jdk.jfr.internal.Utils.getHandler(Utils.java:448) at jdk.jfr.internal.MetadataRepository.getHandler(MetadataRepository.java:174) at jdk.jfr.internal.MetadataRepository.register(MetadataRepository.java:135) at jdk.jfr.internal.MetadataRepository.register(MetadataRepository.java:130) at jdk.jfr.FlightRecorder.register(FlightRecorder.java:136) at dev.morling.demos.jfr.Metrics.registerEvent(Metrics.java:27) ... This is triggered by the application code registering the custom JFR event type:\n1 2 3 public void registerEvent(@Observes StartupEvent se) { FlightRecorder.register(JaxRsInvocationEvent.class); } Here I’d wish that either GraalVM’s native-image tool or Quarkus would tell me about this situation already upon build time, in particular as the cause of that problem is not readily apparent from the exception above. In any case, the required fix is simple enough, all we need to do is to set the quarkus.native.enable-vm-inspection option in the application.properties file of the Quarkus application:\n1 quarkus.native.enable-vm-inspection=true With that configuration in place, the application can be built as a native binary via mvn clean verify -Pnative. 
Grab a coffee while the build is running (it takes about two minutes on my laptop), and then you can start the resulting native binary with the following options for creating a JFR recording:\n1 2 3 ./target/flight-recorder-demo-1.0.0-SNAPSHOT-runner \\ -XX:+FlightRecorder \\ -XX:StartFlightRecording=\u0026#34;filename=my-recording.jfr\u0026#34; You can also configure some more of the known JFR options, such as maximum recording size and duration. What is not possible at this point is starting recordings dynamically at runtime e.g. via jcmd or JDK Mission Control, as the JMX-based infrastructure required for this isn’t present in native binaries (I haven’t tried to do so programmatically from within the application itself, this may be supported already). JFR Event Streaming (as introduced with JEP 349 in Java 14) also doesn’t work yet.\nAfter creating some todos in the web application, we can open the JFR recording in JDK Mission Control and examine the JFR events emitted for each invocation of the REST API:\nAs you see, besides the custom REST invocation events and some system events representing environment variables and system properties, the recording is rather empty. Also note how the thread attribute of the custom event type isn’t populated.\nI’ve updated the jfr-custom-events repository on GitHub, so you can get started with your own explorations around JFR events in GraalVM native binaries easily. Just make sure to have a current GraalVM and its native-image tool installed. The initial feature request for adding JFR support to GraalVM native binaries provides some more background information. You also can use JFR with the Mandrel distribution of GraalVM.\nTo learn more about JFR in general, have a look at this post by Mario Torre. 
Finally, if you’d like to find out how to use JFR for identifying potential performance regressions in your Java applications, check out this talk about JfrUnit which I did at the P99Conf conference a few weeks ago.\n","id":140,"publicationdate":"Nov 12, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIf you have followed this blog for a while,\nyou’ll know that I am a big fan of \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR),\nthe low-overhead diagnostics and profiling framework built into the HotSpot Java virtual machine.\nAnd indeed, until recently, this meant \u003cem\u003eonly\u003c/em\u003e HotSpot:\nFolks compiling their Java applications into \u003ca href=\"https://www.graalvm.org/reference-manual/native-image/\"\u003eGraalVM native binaries\u003c/a\u003e could not benefit from all the JFR goodness so far.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","graalvm","native-image"],"title":"JDK Flight Recorder Events in GraalVM Native Binaries","uri":"https://www.morling.dev/blog/jfr-events-in-graalvm-native-binaries/"},{"content":" Table of Contents Don’t Fear Outdated Caches – Change Data Capture to the Rescue! Change Data Streaming Patterns in Distributed Systems Analyzing Real-time Order Deliveries using CDC with Debezium and Pinot Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB Bonus: Debezium at the Trino Community Broadcast Learning More If you love to attend conferences around the world without actually leaving the comfort of your house, 2021 certainly was (and is!) a perfect year for you. 
Tons of online conferences, many of them available for free, are hosting talks on all kinds of topics, and virtual conference platforms are getting better, too.\nAs the year is slowly reaching its end, I thought it might be nice to do a quick recap and gather in one place all the talks on Debezium and change data capture (CDC) which I did in 2021. An overarching theme for these talks was to discuss different CDC usage patterns and to put Debezium into the context of solving common data engineering tasks by combining it with other open-source projects such as Infinispan and Apache Pinot. In order to not feel too lonely in front of the screen and make things a bit more exciting, I decided to team up with some amazing friends from the open-source community for the different talks. A big thank you for these fantastic collaborations to Katia Aresti, Kenny Bastani, and Hans-Peter Grahsl!\nSo without further ado, here are four Debezium talks I had the pleasure to co-present in 2021.\nDon’t Fear Outdated Caches – Change Data Capture to the Rescue! As per an old saying in software engineering, there’s only two hard things: cache invalidation and naming things. Well, turns out the first is solved actually ;)\nIn this talk at the Bordeaux Java User Group, Katia Aresti from the Infinispan team and I explored how users of an application can benefit from low response times by means of read data models, persisted in distributed caches close to the user. When working with a central database as the authoritative data source — thus receiving all the write requests — these local caches need to be kept up to date, of course. 
This is where Debezium comes in: any data changes are captured and propagated to the caches via Apache Kafka.\nAnd as if the combination of Kafka, Infinispan and Debezium was not already exciting enough, we also threw some Quarkus and Kafka Streams into the mix, joining the data from multiple Debezium change data topics, which allows for retrieving entire aggregate structures via a single key look-up from the local caches. It’s still on our agenda to describe that architecture in a blog post, so stay tuned for that.\n📺 Recording on YouTube\n🖥️ Slides\n🤖 Demo source code\nChange Data Streaming Patterns in Distributed Systems While some folks already might feel something like microservices fatigue, the fact is undeniable that organizing business functionality into multiple, loosely coupled services has been one of the biggest trends in software engineering over the last years.\nOf course these services don’t exist in isolation, but they need to exchange data and cooperate; Apache Kafka has become the de-facto standard as the backbone for connecting services, facilitating asynchronous event-driven communication between them. In this joint presentation, my dear friend Hans-Peter Grahsl and I set out to explore what role change data capture can play in such architectures, and which patterns there are for applying CDC to solve common problems related to handling data in microservices architectures. 
We focused on three patterns in particular, each implemented using log-based CDC via Debezium:\nThe outbox pattern for reliable, eventually consistent data exchange between microservices, without incurring unsafe dual writes or tight coupling\nThe strangler fig pattern for gradually extracting microservices from existing monolithic applications\nThe saga pattern for coordinating long-running business transactions across multiple services, ensuring such activity gets consistently applied or aborted by all participating services\nWe presented that talk at several conferences, including Kafka Summit Europe, Berlin BuzzWords, and jLove. We also did a variation of the presentation at Flink Forward, discussing how to implement the different CDC patterns using Debezium and Apache Flink. The recording of this session should be published soon; in the meantime you can find the slides here. I also highly recommend taking a look at this blog post by Bilgin Ibryam, in which he discusses these patterns in depth.\n📺 Recording on YouTube\n🖥️ Slides\n🤖 Demo source code\nAnalyzing Real-time Order Deliveries using CDC with Debezium and Pinot Traditionally, there has been a chasm between operational databases backing enterprise applications (i.e. OLTP systems), and systems meant for ad-hoc analytics use cases, such as queries run by a business analyst in the back office (i.e. OLAP systems). Data would typically be propagated in batches from the former to the latter, resulting in multi-hour delays until the analytics system would be able to run queries on changed production data.\nWith the current shift to user-facing analytics, we are observing nothing less than a revolution: the ability to serve low-latency analytical queries on large data sets to the end users of an application, based on data that is really fresh (seconds old, rather than hours). 
Compared to response times and freshness guarantees you’d typically get from earlier generations of data warehouses, this is a game changer.\nIn this model, Debezium is used to capture all data changes from the operational database and propagate them into the analytics system. Kenny Bastani of StarTree and I spoke about the opportunities and use cases enabled by combining Debezium with Apache Pinot, a realtime distributed OLAP datastore, at the Pinot meet-up. A massive shout-out to Kenny again for putting together an awesome demo, showing how to use Debezium and the outbox pattern for getting the data into Apache Kafka, transform the data and ingest it into Pinot, and do some really cool visualizations via Apache Superset.\n📺 Recording on YouTube\n🤖 Demo source code\nDissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB After talking about three different CDC patterns, Hans-Peter and I decided to explore one of the patterns in some more depth and did this talk focusing exclusively on the strangler fig pattern. Existing monolithic applications are a reality in many enterprises, and oftentimes it’s just not feasible to replace them with a microservices architecture all at once in one single migration step.\nThis is where the strangler fig pattern comes in: it helps you to gradually extract components from a monolith into separate services, relying on CDC for keeping the data stores of the different systems in sync. 
A routing component, such as Nginx or Envoy Proxy, in front of all the systems sends each incoming request to the system which is in charge of a specific part of the domain at a given point in time during the migration.\nThis talk (which we presented at MongoDB.Live, Kafka Summit Americas, and VoxxedDays Romania) also contains a demo in which we show how to implement the strangler fig pattern using Debezium, gradually moving data from a legacy system’s MySQL database over to the MongoDB instance of a new microservice, which is built using Quarkus.\n📺 Recording on YouTube\n🖥️ Slides\n🤖 Demo source code\nBonus: Debezium at the Trino Community Broadcast This one is not so much a regular conference talk, but more of an informal exchange, so I’m adding it as a bonus here, hoping you may find it interesting too. Brian Olsen and Manfred Moser of Starburst, the company behind Trino, invited Ashhar Hasan, Ayush Chauhan, and me onto their Trino Community Broadcast.\nWe had a great time talking about Debezium and CDC in the context of Trino and its federated query capabilities, learning a lot from Ashhar and Ayush about their real-world experiences from using these technologies in production.\nLearning More Thanks again to Katia, Kenny, and Hans-Peter for joining the virtual conference stages with me this year! It would not have been half as much fun without you.\nIf these talks have piqued your interest in open-source change data capture and Debezium, head over to the Debezium website to learn more. You can also find many more examples in the Debezium examples repo on GitHub, and if you look for reports by folks from the community about their experiences using Debezium, take a look at this curated list of blog posts and other resources.\n","id":141,"publicationdate":"Nov 2, 2021","section":"blog","summary":"Table of Contents Don’t Fear Outdated Caches – Change Data Capture to the Rescue! 
Change Data Streaming Patterns in Distributed Systems Analyzing Real-time Order Deliveries using CDC with Debezium and Pinot Dissecting our Legacy: The Strangler Fig Pattern with Apache Kafka, Debezium and MongoDB Bonus: Debezium at the Trino Community Broadcast Learning More If you love to attend conferences around the world without actually leaving the comfort of your house, 2021 certainly was (and is!","tags":["debezium","cdc","conferences"],"title":"Debezium and Friends – Conference Talks 2021","uri":"https://www.morling.dev/blog/debezium-talks-2021/"},{"content":"","id":142,"publicationdate":"Oct 24, 2021","section":"tags","summary":"","tags":null,"title":"hardware","uri":"https://www.morling.dev/tags/hardware/"},{"content":" Table of Contents The Screen Compute Cameras Work in Progress: Teleprompter Lighting Audio What’s Next? I’ve been working from home exclusively for the last nine years, but it was only last year that I started to look into ways for expanding my computer set-up and go beyond the usual combination of having a laptop with your regular external screen. The global COVID-19 pandemic, the prospect of having more calls with colleagues than ever (no physical meetings), and the constantly increasing need for recording talks for online conferences and meet-ups made me reevaluate things and steadily improve and fine tune my set-up, in particular in regards to better video and audio quality.\nWhen I shared a picture of my desk on Twitter recently, a few folks asked for more details on specific parts like the screen, microphone etc, so I thought I’d provide some insights in this post. Don’t expect any sophisticated test or evaluation of sorts, I’m just going to briefly describe the different components, how I use them, things I like about them, and other aspects which still could be improved. 
Note that I’m not affiliated in any way with any of the vendors mentioned in this post, so anything positive or negative I’m going to mention, is solely based on my personal experience from using the discussed items, without any financial incentive to do so. There are also no affiliate links.\nThe Screen Let’s start with the most apparent part of the set-up, the screen. It’s a curved 49\u0026#34; 32:9 ultra-widescreen display (Samsung C49RG94SSR, 5120 x 1440 pixels), i.e. it offers the same screen real estate like two 16:9 screens next to each other.\nWhether such a large screen suits your personal preferences is something which you only really can find out by yourself. Curvature of the screen is something you may have to get used to, initially I was slightly put off by (wide) windows not appearing 100% straight, but by now I don’t even notice this any more. I suggest you have a look at this article by my colleague Emmanuel Bernard, where he compares ultra-wide monitors to the alternatives and discusses the pros and cons of each. Personally, I’m very happy with this screen and really wouldn’t want to miss it. I never was a fan of multi-screen set-ups due to the inevitable frames between screens, and in fact, my only regret is that I didn’t buy it earlier. So thanks a lot for the recommendation, Emmanuel!\nSome folks use window managers to arrange their application windows on large screens (e.g. Rectangle on macOS, a few more alternatives are discussed in this thread by Guillaume Laforge), but I find myself just manually organizing things in roughly three columns: communications (email, chat), editing (documents, shell, IDE, etc.), and preview (e.g. rendered AsciiDoc documents).\nFigure 1. Reading the source code of HasThisTypePatternTried…​Visitor at 300%? No problem! One very useful feature of this monitor is its picture-by-picture mode (PBP): it lets you connect two sources at once, which then will show up next to each other on the screen. 
Now I’m typically not working with two computers simultaneously (although this can be useful when for instance editing a benchmark on one machine and running it on another), but I use PBP when doing presentations, or when recording conference talks; in that case, I’ll connect the same machine twice, i.e. as primary and secondary screen. This allows me to share one of the screens entirely for the presentation/recording (thus having the commonly expected 16:9 aspect ratio), with other applications being located on the second screen, and without having to manually adjust the size of individually shared windows or tabs. Needless to say that sharing the full screen isn’t very practical, as viewers with a regular screen would just see a small wide ribbon.\nAre there downsides to this screen? So far, I’ve found two. One is its energy consumption; with 55 kWh/1,000h, it’s definitely on the high end of the spectrum. I suppose in parts that’s just due to its sheer size, but I’m sure things could be improved here. The other thing to mention is that when using it with a MacBook Pro, you should make sure to have the lid of the laptop closed (implying that you’ll need an external keyboard and mouse/touchpad), as the fan will be audible substantially more when driving both internal and external screens.\nOne last minor annoyance is that the screen’s software forgets the settings when enabling and disabling the picture-by-picture mode. When switching from single input to PBP, I always need to configure the input sources again. Here I’d really wish the screen would memorize the settings from the last time I was using PBP.\nCompute I am using two Apple computers to get things done: a 2019 16\u0026#34; MacBook Pro (2.6 GHz 6-Core Intel Core i7, 32 GB of RAM) provided by my employer, and a Mac Mini M1 2020 with 16 GB of RAM. Most work stuff is happening on the MacBook Pro, and really there’s nothing too exciting to share here; it tends to do its job just as it should. 
There’s two things I don’t like about it though:\nthe touch bar; it’s virtually useless to me, and I wished for physical function keys instead, making it much more reliable to hit the right key combinations, e.g. in the IDE. Granted, I work with an external keyboard most of the time, so it’s not impacting me that much\nthe only connectivity option being USB-C; while surely elegant, the required zoo of connectors and adapters to actually plug in external hardware, renders that point more than moot\nThankfully, Apple finally got that memo too and addressed both things in their latest MacBook Pro edition.\nFigure 2. Duplo bricks make for a perfect laptop stand; luckily, I could borrow some from my daughter The Mac Mini is awesome for any kind of video recording and streaming. Recently, I was asked to record two Full HD streams for a talk at AccentoDev: one with my slides, and one with my camera feed, allowing the video editor to freely switch between the two when creating the final recording for publication. The M1 wouldn’t break a sweat when recording this video with a resolution of 3,840 x 1,080 pixels via OBS, with the fan barely being audible. Whereas when trying to do the same on the MBP, the fan would spin up heavily, and you’d have a hard time to not capture the fan noise with the microphone.\nFigure 3. MacMini M1 2020 Originally, I bought the Mac Mini M1 to experiment a bit with running Java applications on the AArch64 architecture. Unfortunately, I didn’t really find much time yet to do so. One interesting thing I noticed though from running some quick JMH benchmarks against the new Java Vector API is that results tended to be super-stable, with a much smaller standard deviation than running the same benchmark on the x86-based laptop. 
I hope to find some time to dive a bit more into that area at some point in the future.\nCloud Compute Every now and then, I do have the need for running something on Linux rather than macOS, or for spinning up multiple boxes, executing a benchmark for instance. Ok, ok, they are not actually running on my desk, but I thought it still might be interesting to share a few words on that.\nMy preferred go-to platform for these scenarios is Hetzner Cloud, as they provide flexible cloud compute options at a really attractive price tag, in particular capped at a fixed limit, so there’s no potential for surprise bills coming in.\nTo make launching and configuring boxes in the Hetzner cloud as easy as possible for me, I have a simple set-up of Terraform and Ansible scripts. The former just launches up the desired number of compute nodes with the chosen spec, using the current version of Fedora as the operating system. The latter installs the tools I commonly need, such as different Java versions, commonly used CLI tools, and such.\nOne neat thing about Hetzner Cloud is that you can easily scale up and down single instances. So what I’ll usually do is to spin up a box in the smallest available configuration (CX11); running this for a full month costs a whopping €4.15. But then, when I actually want to use the node, I’ll change the Terraform configuration to something more powerful, such as the CCX22 instance type with 4 dedicated vCPU and 16 GB RAM. One quick terraform apply and a few seconds later, I’ll have a node with the specs I need. Only for the few hours I’m using it, I’ll have to pay the increased price for the better spec, before scaling it back down to the CX11 instance again.\nCameras So let’s change topics and talk a bit about my recording set-up. 
There’s essentially three scenarios where I need to record myself and/or my screen:\nVideo calls: working 100% from home in a globally distributed development team, there’s not a single day where I won’t have to do at least a couple of calls with my co-workers\nConference talks: with the global pandemic still going on, all the conferences have gone virtual, requiring either to pre-record or live-stream any talks\nDemos: lately, I’ve become a fan of recording short videos introducing new features in the projects I’m involved with, e.g. the Debezium UI or kcctl\nAdditionally, I’m joining Nicolai Parlog once per month on his Twitch channel, where we talk about and explore all things Java.\nWhile I initially used the internal camera and microphone of my laptop, I wasn’t really satisfied with the outcome, in particular once I saw the high quality of recordings shared by other folks. For a really good video image quality, two things are key: using a \u0026#34;real\u0026#34; camera (i.e. not a webcam), and proper lighting. You’ll also want a good external microphone, more on that below.\nSo why not a webcam? Essentially because sensors are too small and lenses are too slow, which means you’ll quickly have noise in the image and you won’t get that nice movie-like look with a shallow depth of field (bokeh). Using either a DSLR or a mirrorless system camera will yield a dramatically better image quality. In my case, I am using the Lumix GX80 (sold as GX85 in the US), a mirrorless system camera from Panasonic, using the Micro Four Thirds interchangeable lens standard.\nFigure 4. Panasonic Lumix GX80 and Logitech StreamCam I’m generally happy with it for this purpose: it provides clean HDMI output (i.e. no menu overlays when capturing the live feed via HDMI, as it’s the case with some cameras), image quality and ergonomics are good overall. On the downside, it doesn’t provide continuous auto-focus if you’re not actually recording on the camera. 
This sounds worse than it actually is in practice: using the \u0026#34;Quick AF\u0026#34; option, it will auto-focus when turning on the camera, or when zooming in or out a bit, which is enough to get proper focussing in a relatively static setting such as a screen recording session. If you are planning to move forth and back a lot though, then you should look into other options. Another thing to mention is that the GX80 doesn’t allow to connect an external microphone to it; in my case, that doesn’t matter though, as I’m connecting the mic via a separate audio interface.\nAs you’d quickly run down the camera’s battery when streaming its video signal for a longer period of time, an external power source should be used. I’m using a dummy battery similar to this one, which does the job as expected. Just make sure to have an USB power adapter which provides enough output current (2A or more); I had missed that initially and was wondering why the camera would always turn off when pressing the focus button…​ . For a camera mount, I’m using this cheap one; it’s pretty crappy, with lots of wobbling, but once you have the camera in the place where you want it to be, it’ll stay there. Still, I’d probably pay a bit more to get a more robust mount, should I ever have to buy a new one.\nAs you typically cannot connect a DSLR or a mirrorless system camera like the GX80 via USB, you’ll also need an HDMI converter which you then can plug into your USB port. Here I’m using the ubiquitous Elgato Cam Link 4K. Back when I got it, it was pretty much the only (and pricy) option, but I believe by now there are alternatives, which should work equally well but are a bit cheaper.\nDespite my \u0026#34;no webcam\u0026#34; mantra, I also have a Logitech StreamCam in addition to the GX80. As you’d expect, image quality is not really comparable, in particular white balance tends to be quite off for a while after switching it on. 
I still use it occasionally for video calls, as it’s a bit quicker to turn on and set up in comparison to the GX80.\nWork in Progress: Teleprompter One of my pet peeves with modern communication is the lack of \u0026#34;eye contact\u0026#34; during virtual conference talks and video calls. As we all want to look onto the screen rather than the camera, the viewer on the other side feels like you are not looking at them, but slightly below or to the side. While I believe I largely manage to look into the camera when doing talk recordings, I find it nearly impossible to do so during calls, as the natural desire to look at the other person’s image on my screen is just too strong.\nThat’s why I’ve started to explore how I could build my own teleprompter, which puts the camera behind a two-way mirror. That way, I can look at the screen, while also looking straight into the camera. For this purpose, I bought two-way mirror glass on eBay (Schott beamsplitter glass, which is working amazingly well) as well as a cheap-ish external screen, and built a quick proof-of-concept (again using some of my daughter’s Duplo bricks, this time for the frame).\nThe result was pretty promising, with one open challenge being that the display contents are mirrored from left to right. So I’d need to digitally mirror the output of that display; if you are aware of any option to do so on macOS, any pointers would be appreciated. With 11.6\u0026#34;, the screen also is rather small, if you consider building something like this by yourself, I’d recommend going for a larger one.\nSince then, I’ve dropped that ball a bit and haven’t followed through yet to make it \u0026#34;production-worthy\u0026#34;. I’d still love to make this useful in practice eventually, perhaps once my daughter lets me keep those Duplo bricks ;)\nLighting The best camera won’t help you much if there isn’t enough light to work with. Generally, the more light you have, the easier the job will be for the camera. 
I have a ring light similar to this one, with adjustable brightness and color temperature. I don’t have much to say about it, other than that it does what I want it to do. Note that the tripod requires some space on the floor, which means you cannot move your desk all the way to the wall if you have the light behind it. It’s not that much of a problem in my case, but you may consider getting a desk-mount alternatively.\nOne problem I do have with the ring light is reflections on my glasses. I haven’t really found a good solution here (no, I won’t get contact lenses), other than pushing the ring light a bit higher than ideal, so that there are no reflections when looking into the camera further below. On the downside, this results in the area below my chin becoming a bit shaded. A case of having to choose your poison, I suppose.\nFigure 5. Background Lights When doing conference talks, I have two more lights in the backgrounds which make for a nicer atmosphere of the scenery. A vintage light (no-name brand, got it from my local hardware store) which adds a nice highlight, and a Philips Hue Iris lamp which adds a colored note of my choosing. Overall, I’m like 90% happy with the lighting set-up, the comment by video grandmaster Nicolai about lacking separation of background and foreground still nags me ;)\nAudio Finally, let’s talk about my audio recording set-up. This definitely is the area I knew the least about when setting out to improve my computing and recording gear. I don’t quite remember when and how I got sucked into the audio game, perhaps it was when I learned about scientific research indicating that audio quality impacts the perceived quality of spoken content.\nAfter a rather disappointing experience with the RØDE NT-USB (perhaps it’s my lack of audiophile sensitivity, but I didn’t sense a significant difference compared to using the built-in laptop mic), I decided to look for an external microphone which doesn’t connect via USB. 
After some research, I decided to go for the RØDE Procaster, which is a rather professional microphone purpose-built for voice recording. It is a dynamic microphone, which in comparison to a condenser microphone will pick up much less noise from your surroundings (you can learn more about the differences between these two kinds of microphones here). This means that I don’t have to ask my family to be extra-silent in the house while I am doing a recording.\nFigure 6. RØDE Procaster Microphone One thing to keep in mind is that this type of microphone is meant to be put rather close to your mouth, which you may or may not find annoying. Personally, I sort of like how this makes speaking a more conscious act, but I’d probably not like to have the microphone in front of me when doing a multi-hour call. That’s why I also have a cheap-ish headset as an alternative for these situations. Yet another — and more costly — option would be to get a shotgun microphone which you can position further away from you.\nThe microphone is rather heavy (and you wouldn’t want to hold it anyways), so I am using the PSA1 studio boom arm. It lets you move the microphone with a single finger to where you want it to be, and then it will stay exactly there. A really solid piece of engineering, in particular when comparing it to the no-name mount I’m using for the camera.\nHaving an external microphone is just one part of the story, though. You also need to have an audio interface which lets you plug in the microphone (using an XLR cable) and then propagates the audio signal to your computer via USB. I didn’t do much exploration here, but went for the PreSonus AudioBox USB 96, which was recommended to me by a coworker. In general, it does the job well, there’s two things I don’t like about it though.\nFigure 7. 
PreSonus AudioBox USB 96 Audio Interface First, it doesn’t have a physical power switch, which means its two (rather bright) red LEDs will stay lit for as long as it’s connected to the USB port. Secondly, I really wish it had a built-in option to emit the microphone signal on both audio channels, left and right. As a microphone is a mono audio source, you’ll hear the signal only on one channel (typically the left one) on your computer. When doing recordings, where you have the time and ability to do some post-processing, that’s not a big problem; you can simply duplicate the audio track to both channels. But when using the microphone in a Zoom call or similar, the one-sided output is not what you want. In the absence of hardware support for this kind of upmixing in the AudioBox, I had to go for a software solution, which took me quite some time to figure out.\nOn macOS, this requires two programs, LadioCast and BlackHole. The former lets you take the single channel input from the AudioBox and expose it on both channels, left and right. This is then connected to a virtual audio device created using the BlackHole audio driver. In Zoom or similar software, you then use that virtual device for the audio input. This works reliably and without any noticeable latency. Still, I wish the AudioBox would just take care of all of this and provide me with the microphone input upmixed to both channels.\nFigure 8. Setting up a virtual audio device using BlackHole and connecting the mono microphone input to it using both channels via LadioCast; note how channel 1 is used for both L and R in the input configuration in LadioCast Coming back to the microphone, one thing to be aware of is that it provides a rather low output signal. While you can boost it up far enough with the AudioBox, you’ll start to hear some noise. And I haven’t spent hundreds of Euros and multiple hours to get noise, have I?! 
So I did what every reasonable person would do in that situation: spend some more money.\nFigure 9. CloudLifter CL-1 Mic Activator The solution was to add a pre-amplifier. Here I went for the CloudLifter, which you put between the microphone and the audio interface. It takes 48V phantom power (which the AudioBox provides) and adds +25dB of gain, giving me audio with proper volume, without any audible hiss whatsoever. Take that, sunk cost fallacy!\nIf you would like to hear (and see) a recording with this set-up, have a look at this session about the JfrUnit project from P99Conf earlier this year.\nWhat’s Next? Overall, I’m very happy with my computing and recording set-up. One thing that still could be improved is lighting. It’s a common practice to work with two front lights (or one from the front and one from the side), so I’ll probably buy another light at some point. I also hope to finish the teleprompter project and put it into daily use.\nOther than that, I sometimes wonder whether I should get a second mirrorless camera and a video switcher like the Atem Mini and explore a multi-camera set-up. 
I’m certain that this would be lots of fun, on the other hand I don’t really have the need for it…​ yet?\nMany thanks to Hans-Peter Grahsl for his feedback while writing this blog post!\n","id":143,"publicationdate":"Oct 24, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_screen\"\u003eThe Screen\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_compute\"\u003eCompute\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_cameras\"\u003eCameras\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_work_in_progress_teleprompter\"\u003eWork in Progress: Teleprompter\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_lighting\"\u003eLighting\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_audio\"\u003eAudio\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_whats_next\"\u003eWhat’s Next?\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"imageblock\"\u003e\n\u003cdiv class=\"content\"\u003e\n\u003cimg src=\"/images/desk_complete.jpg\" alt=\"desk complete\"/\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI’ve been working from home exclusively for the last nine years,\nbut it was only last year that I started to look into ways for expanding my computer set-up and go beyond the usual combination of having a laptop with your regular external screen.\nThe global COVID-19 pandemic, the prospect of having more calls with colleagues than ever (no physical meetings), and the constantly increasing need for recording talks for online conferences and meet-ups made me reevaluate things and steadily improve and fine tune my set-up, in particular in regards to better video and audio quality.\u003c/p\u003e\n\u003c/div\u003e","tags":["workspace","hardware"],"title":"What's on My 
Desk?","uri":"https://www.morling.dev/blog/whats-on-my-desk/"},{"content":"","id":144,"publicationdate":"Oct 24, 2021","section":"tags","summary":"","tags":null,"title":"workspace","uri":"https://www.morling.dev/tags/workspace/"},{"content":"","id":145,"publicationdate":"Oct 18, 2021","section":"tags","summary":"","tags":null,"title":"documentation","uri":"https://www.morling.dev/tags/documentation/"},{"content":" Table of Contents Including Snippets From Your Test Directory Summary It has been just a few weeks since the release of Java 17, but the first changes scheduled for Java 18 begin to show up in early access builds. One feature in particular that excites me as a maintainer of different Java libraries is JEP 413 (\u0026#34;Code Snippets in Java API Documentation\u0026#34;).\nSo far, JavaDoc has not made it exactly comfortable to include example code which shows how to use an API: you had to escape special characters like \u0026#34;\u0026lt;\u0026#34;, \u0026#34;\u0026gt;\u0026#34;, and \u0026#34;\u0026amp;\u0026#34;, indentation handling was cumbersome. But the biggest problem was that any such code snippet would have to be specified within the actual JavaDoc comment itself, i.e. you did not have proper editor support when creating it, and worse, it was not validated that the shown code actually is correct. This often led to code snippets which wouldn’t compile if you were to copy them into a Java source file, be it due to an oversight by the author, or simply because APIs changed over time and no one was thinking of updating the corresponding snippets in JavaDoc comments.\nAll this is going to change with JEP 413: it does not only improve ergonomics of inline snippets, but it also allows you to include code snippets from external source files. This means that you’ll be able to edit and refactor any example code using your regular Java toolchain; better yet: you can also compile and test it as part of your build. 
Welcome to 2021 — no more wrong or outdated code snippets in JavaDoc!\nIncluding Snippets From Your Test Directory You could think of different ways for organizing your snippet files with JEP 413, but one particularly intriguing option is to source them straight from the tests of your project, e.g. the src/test/java directory in case of a Maven project. That way, any incorrect snippet code — be it due to compilation failures or due to failing test assertions — will be directly flagged within your build.\nSo let’s see how to set this up, using the Jakarta Bean Validation API project as an example. The required configuration is refreshingly simple; all we need to do is to specify src/test/java as our \u0026#34;snippet path\u0026#34;. While the Maven JavaDoc plug-in does not yet provide a bespoke configuration option for this, we can simply pass it using the \u0026lt;additionalOptions\u0026gt; property (make sure to use version 3.0.0 or later):\n\u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-javadoc-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.3.1\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;attach-javadocs\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;jar\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;additionalOptions\u0026gt; \u0026lt;additionalOption\u0026gt; (1) --snippet-path=${basedir}/src/test/java \u0026lt;/additionalOption\u0026gt; \u0026lt;/additionalOptions\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; 1 Obtain snippets from src/test/java And that’s really all there is to it; you can now start to work with example code as actual source code. 
Here’s an example of a snippet to be included in the API documentation of jakarta.validation.Validation, the entry point into the Bean Validation API:\npackage snippets; (1) import jakarta.validation.Validation; import jakarta.validation.ValidatorFactory; public class CustomProviderSnippet { public void customProvider() { // @start region=\u0026#34;provider\u0026#34; (2) ACMEConfiguration configuration = Validation .byProvider(ACMEProvider.class) .providerResolver( new MyResolverStrategy() ) .configure(); ValidatorFactory factory = configuration.buildValidatorFactory(); // @end (2) } } 1 There are no specific requirements on the package to be used; I like using a descriptive name such as snippets, to easily tell snippets apart from functional tests 2 If you don’t want to include the entire file, regions allow you to specify the exact section(s) to include While a plain method is shown here, this could of course also be a JUnit test with assertions for making sure that the snippet code does what it is supposed to do (being an API specification, the Bean Validation project itself doesn’t provide an implementation we could test against). Including the snippet in the JavaDoc in the source file is straightforward:\n/** * ... * \u0026lt;li\u0026gt; * The third approach allows you to specify explicitly and in * a type safe fashion the expected provider. * \u0026lt;p\u0026gt; * Optionally you can choose a custom {@code ValidationProviderResolver}. * {@snippet class=\u0026#34;snippets.CustomProviderSnippet\u0026#34; region=\u0026#34;provider\u0026#34;} (1) * \u0026lt;/li\u0026gt; * ... */ 1 Specify the snippet either using the class or the file attribute; optionally define a specific snippet region to be included If needed, you can also customize the appearance of the rendered snippet, for instance to add links, highlight key parts (using custom CSS styles if needed), or replace specific parts of the snippet. 
The latter comes in handy for instance to replace non-critical parts with a placeholder such as \u0026#34;…​\u0026#34;. This is one of the details I really like about this JEP: Even if you did manage example code in separate source files in the past, then manually copying them into JavaDoc, such placeholders made things cumbersome. Naturally, they’d fail compilation, i.e. you always had to do some manual editing when copying over the snippet into JavaDoc. Getting all this \u0026#34;for free\u0026#34; is a very nice improvement.\nHere’s an example showing these adjustments in source form (scroll to the right to see all the snippet tag attributes, as these lines can become fairly long):\npublic void customProvider() { // @start region=\u0026#34;provider\u0026#34; ACMEConfiguration configuration = Validation .byProvider(ACMEProvider.class) // @highlight substring=\u0026#34;byProvider\u0026#34; (1) .providerResolver( new MyResolverStrategy() ) // @replace regex=\u0026#34; new MyResolverStrategy\\(\\) \u0026#34; replacement=\u0026#34;...\u0026#34; (2) .configure(); ValidatorFactory factory = configuration.buildValidatorFactory(); // @link regex=\u0026#34;^.*?ValidatorFactory\u0026#34; target=\u0026#34;jakarta.validation.ValidatorFactory\u0026#34; (3) // @end } 1 Highlight the byProvider() method 2 Replace the parameter value of the method call with \u0026#34;…​\u0026#34; 3 Make the ValidatorFactory class name a link to its own JavaDoc And this is how the snippet will look in the rendered documentation:\nSome folks may argue that it might be nice to have proper colored syntax highlighting support. I’m not sure whether I agree though: your typical code snippets in API docs should be rather short, and simply highlighting key parts as shown above may be more useful than colorizing the entire thing. Note that the extra newline at the beginning of the snippet shouldn’t really be there; it’s not quite clear to me where it’s coming from. 
I’ll try and get this clarified on the javadoc-dev mailing list.\nSummary Being able to include code snippets from actual source files into API documentation is a highly welcome improvement for Java API docs authors and users alike. It’s great to see Java catching up here with other language ecosystems like Rust, which already support executable documentation examples. I’m expecting this feature to be adopted very quickly, with the first folks already announcing plans to build their API docs with Java 18 as soon as it’s out. Of course you can still ensure compatibility of your code with earlier Java versions when doing so.\nIf you’d like to get your hands on executable JavaDoc code snippets yourself, you can start with this commit showing the required changes for the Bean Validation API. Run mvn clean verify, and you’ll find the rendered JavaDoc under target/apidocs. Just make sure to build this project using a current Java 18 early access build. Happy snippeting!\n","id":146,"publicationdate":"Oct 18, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_including_snippets_from_your_test_directory\"\u003eIncluding Snippets From Your Test Directory\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt has been just a few weeks since the \u003ca href=\"https://www.infoq.com/news/2021/09/java17-released/\"\u003erelease of Java 17\u003c/a\u003e, but the first changes scheduled for Java 18 begin to show up in early access builds.\nOne feature in particular that excites me as a maintainer of different Java libraries is \u003ca href=\"https://openjdk.java.net/jeps/413\"\u003eJEP 413\u003c/a\u003e (\u0026#34;Code Snippets in Java API 
Documentation\u0026#34;).\u003c/p\u003e\n\u003c/div\u003e","tags":["java","documentation","tooling"],"title":"Executable JavaDoc Code Snippets","uri":"https://www.morling.dev/blog/executable-javadoc-code-snippets/"},{"content":"","id":147,"publicationdate":"Aug 29, 2021","section":"tags","summary":"","tags":null,"title":"i18n","uri":"https://www.morling.dev/tags/i18n/"},{"content":"","id":148,"publicationdate":"Aug 29, 2021","section":"tags","summary":"","tags":null,"title":"jpms","uri":"https://www.morling.dev/tags/jpms/"},{"content":" Table of Contents The ResourceBundleProvider Interface Resource Bundle Providers Running on the Classpath Discussion and Wrap-Up The ResourceBundle class is Java’s workhorse for managing and retrieving locale specific resources, such as error messages of internationalized applications. With the advent of the module system in Java 9, specifics around discovering and loading resource bundles have changed quite a bit, in particular when it comes to retrieving resource bundles across the boundaries of named modules.\nIn this blog post I’d like to discuss how resource bundles can be used in a multi-module application (i.e. a \u0026#34;modular monolith\u0026#34;) for internationalizing error messages. The following requirements should be satisfied:\nThe individual modules of the application should contribute bundles with their specific error messages, avoiding the need for developers from the team having to work on one large shared resource bundle\nOne central component (like an error handler) should use these bundles for displaying or logging the error messages in a uniform way\nThere should be no knowledge about the specific modules needed in the central component, i.e. 
it should be possible to add further modules to the application, each with their own set of resource bundles, without having to modify the central component\nThe rationale of this design is to enable individual development teams to work independently on their respective components, including the error message resource bundles, while ensuring consistent preparation of messages via the central error handler.\nAs an example, we’re going to use Links, a hypothetical management software for golf courses. It comprises the following modules (click on image to enlarge):\nThe core module contains common \u0026#34;framework\u0026#34; code, such as the error handler class. The modules greenkeeping, tournament, and membership represent different parts of the business domain of the Links application. Normally, this is where we’d put our business logic, but in the case at hand they’ll just contain the different resource bundles. Lastly, the app module provides the entry point of the application in form of a simple main class.\nThe ResourceBundleProvider Interface If you have worked with resource bundles before, you may have come across approaches for merging multiple bundles into one. While technically still doable when running with named Java modules, it is not advisable; in order to be found across module boundaries, your bundles would have to reside in open packages. Also, as no package must be contained in more than one module, you’d have to implement some potentially complex logic for identifying bundles contributed by different modules, whose exact names you don’t know (see the third requirement above). You may consider using automatic modules, but then you’d void some advantages of the Java module system, such as the ability to create modular runtime images.\nThe solution to these issues comes in the form of the ResourceBundleProvider API, introduced alongside the module system in Java 9. 
Based on the Java service loader mechanism, it enables one module to retrieve bundles from other modules in a loosely coupled way; the consuming module neither needs to know about the providing modules themselves, nor about implementation details such as their internally used bundle names and locations.\nSo let’s see how we can use ResourceBundleProvider in the Links application. The first step is to define a bundle-specific service provider interface, derived from ResourceBundleProvider:\npackage dev.morling.links.core.spi; import java.util.spi.ResourceBundleProvider; public interface LinksMessagesProvider extends ResourceBundleProvider { } The name of bundle provider interfaces must follow the pattern \u0026lt;package of baseName\u0026gt; + \u0026#34;.spi.\u0026#34; + \u0026lt;simple name of baseName\u0026gt; + \u0026#34;Provider\u0026#34;. As the base name is dev.morling.links.core.LinksMessages in our case, the provider interface name must be dev.morling.links.core.spi.LinksMessagesProvider. This can be sort of a stumbling block, as an innocent typo in the package or type name will cause your bundle not to be found, without good means of analyzing the situation, other than double and triple checking that all names are correct.\nNext, we need to declare the usage of this provider interface in the consuming module. 
Assuming the aforementioned error handler class is located in the core module, the module descriptor of the same looks like so:\nmodule dev.morling.links.core { exports dev.morling.links.core; exports dev.morling.links.core.spi; (1) uses dev.morling.links.core.spi.LinksMessagesProvider; (2) } 1 Export the package of the resource bundle provider interface so that implementations can be created in other modules 2 Declare the usage of the LinksMessagesProvider service Using the resource bundle in the error handler class is rather unexciting; note that it isn’t our own application code which retrieves the resource bundle provider via the service loader, but rather the ResourceBundle::getBundle() factory method:\npublic class ErrorHandler { public String getErrorMessage(String key, UserContext context) { ResourceBundle bundle = ResourceBundle.getBundle( \u0026#34;dev.morling.links.core.LinksMessages\u0026#34;, context.getLocale()); return \u0026#34;[User: \u0026#34; + context.getName() + \u0026#34;] \u0026#34; + bundle.getString(key); } } Here, the error handler simply obtains the message for a given key from the bundle, using the locale of some user context object, and returning a message prefixed with the user’s name. This implementation just serves for example purposes of course; in an actual application, message keys might for instance be obtained from application specific exception types, raised in the different modules, and logged in a unified way via the error handler.\nResource Bundle Providers With the code in the core module in place (mostly, that is, as we’ll see in a bit), let’s shift our attention towards the resource bundle providers in the different application modules. Not too surprisingly, they need to define an implementation of the LinksMessagesProvider contract.\nThere is one challenge though: how can the different modules contribute implementations for one and the same bundle base name and locale? 
Once the look-up code in ResourceBundle has found a provider which returns a bundle for a requested name and locale, it will not query any other bundle providers. In our case though, we need to be able to obtain messages from any of the bundles contributed by the different modules: messages related to green keeping must be obtained from the bundle of the dev.morling.links.greenkeeping module, tournament messages from dev.morling.links.tournament, and so on.\nThe idea to address this concern is the following:\nPrefix each message key with a module specific string, resulting in keys like tournament.fullybooked, greenkeeping.greenclosed, etc.\nWhen requesting the bundle for a given key in the error handler class, obtain the key’s prefix and pass it to bundle providers\nLet bundle providers react only to their specific message prefix\nThis is where things become a little bit fiddly: there isn’t a really good way to pass such contextual information from bundle consumers to providers. Our loophole here will be to squeeze that information into the requested Locale instance. Besides the well-known language and country attributes, Locale can also carry variant data and even application specific extensions.\nThe latter, in form of a private use extension, would actually be pretty much ideal for our purposes. But unfortunately, extensions aren’t evaluated by the look-up routine in ResourceBundle. So instead we’ll go with propagating the key namespace information via the locale’s variant. 
First, let’s revisit the code in the ErrorHandler class:\npublic class ErrorHandler { public String getErrorMessage(String key, UserContext context) { String prefix = key.split(\u0026#34;\\\\.\u0026#34;)[0]; (1) Locale locale = new Locale( (2) context.getLocale().getLanguage(), context.getLocale().getCountry(), prefix ); ResourceBundle bundle = ResourceBundle.getBundle( \u0026#34;dev.morling.links.core.LinksMessages\u0026#34;, locale); (3) return \u0026#34;[User: \u0026#34; + context.getName() + \u0026#34;] \u0026#34; + bundle.getString(key); (4) } } 1 Extract the key prefix, e.g. \u0026#34;greenkeeping\u0026#34; 2 Construct a new Locale, using the language and country information from the current user’s locale and the key prefix as variant 3 Retrieve the bundle using the adjusted locale 4 Prepare the error message Based on this approach, the resource bundle provider implementation in the greenkeeping module looks like so:\npublic class GreenKeepingMessagesProvider extends AbstractResourceBundleProvider implements LinksMessagesProvider { @Override public ResourceBundle getBundle(String baseName, Locale locale) { if (locale.getVariant().equals(\u0026#34;greenkeeping\u0026#34;)) { (1) baseName = baseName.replace(\u0026#34;core.LinksMessages\u0026#34;, \u0026#34;greenkeeping.internal.LinksMessages\u0026#34;); (2) locale = new Locale(locale.getLanguage(), locale.getCountry()); (3) return super.getBundle(baseName, locale); } return null; (4) } } 1 This provider should only return a bundle for \u0026#34;greenkeeping\u0026#34; messages 2 Retrieve the bundle, adjusting the name (see below) 3 Create a Locale without the variant 4 Let other providers kick in for messages unrelated to green-keeping The adjustment of the bundle name deserves some more explanation. The module system forbids so-called \u0026#34;split packages\u0026#34;, i.e. 
packages of the same name in several modules of an application. That’s why we cannot have a bundle named dev.morling.links.core.LinksMessages in multiple modules, even if the package dev.morling.links.core isn’t exported by any of them. So each module must have its bundles in a specific package, and the bundle provider has to adjust the name accordingly, e.g. into dev.morling.links.greenkeeping.internal.LinksMessages in the greenkeeping module.\nAs with the service consumer, the service provider also must be declared in the module’s descriptor:\nmodule dev.morling.links.greenkeeping { requires dev.morling.links.core; provides dev.morling.links.core.spi.LinksMessagesProvider with dev.morling.links.greenkeeping.internal.GreenKeepingMessagesProvider; } Note how the package of the provider and the bundle isn’t exported or opened, solely being exposed via the service loader mechanism. For the sake of completeness, here are two resource bundle files from the greenkeeping module, one for English, and one for German:\ngreenkeeping.greenclosed=Green closed due to mowing greenkeeping.greenclosed=Grün wegen Pflegearbeiten gesperrt Lastly, here is a test for the ErrorHandler class, making sure it works as expected:\nErrorHandler errorHandler = new ErrorHandler(); String message = errorHandler.getErrorMessage(\u0026#34;greenkeeping.greenclosed\u0026#34;, new UserContext(\u0026#34;Bob\u0026#34;, Locale.US)); assert message.equals(\u0026#34;[User: Bob] Green closed due to mowing\u0026#34;); message = errorHandler.getErrorMessage(\u0026#34;greenkeeping.greenclosed\u0026#34;, new UserContext(\u0026#34;Herbert\u0026#34;, Locale.GERMANY)); assert message.equals(\u0026#34;[User: Herbert] Grün wegen \u0026#34; + \u0026#34;Pflegearbeiten gesperrt\u0026#34;); message = errorHandler.getErrorMessage(\u0026#34;tournament.fullybooked\u0026#34;, new UserContext(\u0026#34;Bob\u0026#34;, Locale.US)); assert message.equals(\u0026#34;[User: Bob] This 
tournament is fully booked\u0026#34;); Running on the Classpath At this point, the design supports cross-module look-ups of resource bundles when running the application on the module path. Can we also make it work when running the same modules on the classpath instead? Indeed we can, but some slight additions to the core module will be needed. The reason is that the ResourceBundleProvider service contract isn’t considered at all by the bundle retrieval logic in ResourceBundle when running on the classpath.\nThe way out is to provide a custom ResourceBundle.Control implementation which mimics the logic for adjusting the bundle names based on the requested locale variant, as done by the different providers above:\npublic class LinksMessagesControl extends Control { @Override public String toBundleName(String baseName, Locale locale) { if (!locale.getVariant().isEmpty()) { baseName = baseName.replace(\u0026#34;core.LinksMessages\u0026#34;, locale.getVariant() + \u0026#34;.internal.LinksMessages\u0026#34;); (1) locale = new Locale(locale.getLanguage(), locale.getCountry()); (2) return super.toBundleName(baseName, locale); } return super.toBundleName(baseName, locale); } } 1 Adjust the requested bundle name so that the module-specific bundles are retrieved 2 Drop the variant name from the locale Now we could explicitly pass in an instance of that Control implementation when retrieving a resource bundle through ResourceBundle::getBundle(), but there’s a simpler solution in form of the not very widely known ResourceBundleControlProvider API:\npublic class LinksMessagesControlProvider implements ResourceBundleControlProvider { @Override public Control getControl(String baseName) { if (baseName.equals(\u0026#34;dev.morling.links.core.LinksMessages\u0026#34;)) { (1) return new LinksMessagesControl(); } return null; } } 1 Return the LinksMessagesControl when the LinksMessages bundle is requested This is another 
service provider contract; its implementations are retrieved from the classpath when obtaining a resource bundle and no control has been given explicitly. Of course, the service implementation still needs to be registered, this time using the traditional approach of specifying the implementation name(s) in the META-INF/services/java.util.spi.ResourceBundleControlProvider file:\ndev.morling.links.core.internal.LinksMessagesControlProvider With the control and control provider in place, the modular resource bundle look-up will work on the module path as well as the classpath, when running on Java 9+. There’s one caveat remaining though if we want to enable the application also to be run on the classpath with Java 8.\nIn Java 8, ResourceBundleControlProvider implementations are not picked up from the classpath, but only via the Java extension mechanism (now deprecated). This means you’d have to provide the custom control provider through the lib/ext or jre/lib/ext directory of your JRE or JDK, respectively, which often isn’t very practical. At this point we might be ready to cave in and just pass in the custom control implementation to ResourceBundle::getBundle(). But we can’t actually do that: when invoked in a named module on Java 9+ (which is the case when running the application on the module path), the getBundle(String, Locale, Control) method will raise an UnsupportedOperationException!\nTo overcome this last obstacle and make the application usable across the different Java versions, we can resort to the multi-release JAR mechanism: two different versions of the ErrorHandler class can be provided within a single JAR, one to be used with Java 8, and another one to be used with Java 9 and later. The latter calls getBundle(String, Locale), i.e. not passing the control, thus using the resource bundle providers (when running on the module path) or the control provider (when running on the classpath). 
The former invokes getBundle(String, Locale, Control), allowing the custom control to be used on Java 8.\nBuilding Multi-Release JARs When multi-release JARs were first introduced in Java 9 with JEP 238, tool support for building them was non-existent, making this task quite a challenging one. Luckily, the situation has improved a lot since then. When using Apache Maven, only two plug-ins need to be configured:\n... \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-compiler-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; (1) \u0026lt;id\u0026gt;compile-java-9\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;compile\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;compile\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;release\u0026gt;9\u0026lt;/release\u0026gt; (2) \u0026lt;compileSourceRoots\u0026gt; \u0026lt;compileSourceRoot\u0026gt; ${project.basedir}/src/main/java-9 (3) \u0026lt;/compileSourceRoot\u0026gt; \u0026lt;/compileSourceRoots\u0026gt; \u0026lt;multiReleaseOutput\u0026gt;true\u0026lt;/multiReleaseOutput\u0026gt; (4) \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-jar-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;archive\u0026gt; \u0026lt;manifestEntries\u0026gt; \u0026lt;Multi-Release\u0026gt;true\u0026lt;/Multi-Release\u0026gt; (5) \u0026lt;/manifestEntries\u0026gt; \u0026lt;/archive\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; ... 
1 Set up another execution of the Maven compiler plug-in for the Java 9 specific sources, 2 using Java 9 bytecode level, 3 picking up the sources from src/main/java-9, 4 and organizing the compilation output in the multi-release structure under META-INF/versions/…​ 5 Configure the Maven JAR plug-in so that the Multi-Release manifest entry is set, marking the JAR as a multi-release JAR Discussion and Wrap-Up Let’s wrap up and evaluate whether the proposed implementation satisfies our original requirements:\nModules of the application contribute bundles with their specific error messages: ✅ Each module of the Links application can provide its own bundle(s), using a specific key prefix; we could even take it a step further and provide bundles via separate i18n modules, for instance created by an external translation agency, independent from the development teams\nCentral error handler component can use these bundles for displaying or logging the error messages: ✅ The error handler in the core module can retrieve messages from all the bundles in the different modules, freeing the developers of the application modules from details like adding the user’s name to the final messages\nNo knowledge about the specific modules in the central component: ✅ Thanks to the different providers (or the custom Control, respectively), there is no need for registering the specific bundles with the error handler in the core module; further modules could be added to the Links application and the error handler would be able to obtain messages from the resource bundles contributed by them\nWith a little bit of extra effort, it also was possible to design the code in the core module in a way that the application can be used with different Java versions and configurations: on the module path with Java 9+, on the classpath with Java 9+, on the classpath with Java 8.\nIf you’d like to explore the complete code yourself, you can find it in the modular-resource-bundles GitHub repository. 
To learn more about resource bundle retrieval in named modules, please refer to the extensive documentation of ResourceBundle and ResourceBundleProvider.\nMany thanks to Hans-Peter Grahsl for providing feedback while writing this post!\n","id":149,"publicationdate":"Aug 29, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_resourcebundleprovider_interface\"\u003eThe \u003ccode\u003eResourceBundleProvider\u003c/code\u003e Interface\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_resource_bundle_providers\"\u003eResource Bundle Providers\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_running_on_the_classpath\"\u003eRunning on the Classpath\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_discussion_and_wrap_up\"\u003eDiscussion and Wrap-Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ResourceBundle.html\"\u003e\u003ccode\u003eResourceBundle\u003c/code\u003e\u003c/a\u003e class is Java’s workhorse for managing and retrieving locale specific resources,\nsuch as error messages of internationalized applications.\nWith the advent of the module system in Java 9, specifics around discovering and loading resource bundles have changed quite a bit, in particular when it comes to retrieving resource bundles across the boundaries of named modules.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this blog post I’d like to discuss how resource bundles can be used in a multi-module application\n(i.e. 
a \u0026#34;modular monolith\u0026#34;) for internationalizing error messages.\nThe following requirements should be satisfied:\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jpms","i18n"],"title":"Resource Bundle Look-ups in Modular Java Applications","uri":"https://www.morling.dev/blog/resource-bundle-lookups-in-modular-java-applications/"},{"content":" Table of Contents Getting Started With JfrUnit Groovier Tests With Spock Outlook Unit testing, for performance\nIt’s with great pleasure that I’m announcing the first official release of JfrUnit today!\nJfrUnit is an extension to JUnit which allows you to assert JDK Flight Recorder events in your unit tests. This capability opens up a number of interesting use cases in the field of testing JVM-based applications:\nYou can use JfrUnit to ensure your application produces the custom JFR events you expect it to emit\nYou can use JfrUnit to identify potential performance regressions of your application by means of tracking JFR events e.g. for garbage collection, memory allocation and network I/O\nYou can use JfrUnit together with JMC Agent for whitebox tests of your application, ensuring specific methods are invoked with the expected parameters and return values\nGetting Started With JfrUnit JfrUnit is available on Maven Central (a big shout-out to Andres Almiray for setting up a fully automated release pipeline using the excellent JReleaser project!). If you’re working with Apache Maven, add the following dependency to your pom.xml file:\n1 2 3 4 5 6 7 8 ... \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.moditect.jfrunit\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jfrunit\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.0.0.Alpha1\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;test\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; ... Alternatively, you can of course build JfrUnit from source yourself, as described in the project’s README file.\nWhat is ModiTect? 
JfrUnit is part of the ModiTect family of open-source projects. All the ModiTect projects are in some way related to Java infrastructure, such as the Java Module System, or JDK Flight Recorder. Besides JfrUnit, the following projects are currently developed under the ModiTect umbrella:\nModiTect: this eponymous project provides tooling for the Java Module System, e.g. for adding module descriptors while building with Java 8, creating jlink images, etc.\nLayrry: a Runner and API for layered Java applications, which lets you use the module system’s notion of module layers for implementing plug-in architectures, loading multiple versions of one dependency into your application, etc.\nDeptective 🕵️: a plug-in for the javac compiler for analysing, validating and enforcing well-defined relationships between the packages of a Java application\nWith that dependency in place, the steps of using JfrUnit are the following:\nEnable the JFR event type(s) you want to assert against\nRun the application logic under test\nAssert the emitted JFR events\nTo make things more tangible, here’s an example that asserts the memory allocation done by a Quarkus-based web application for a specific use case:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 @Test @EnableEvent(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) (1) @EnableEvent(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;) public void retrieveTodoShouldYieldExpectedAllocation() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder() .build(); // warm-up (2) for (int i = 1; i\u0026lt;= WARMUP_ITERATIONS; i++) { if (i % 1000 == 0) { System.out.println(i); } executeRequest(r.nextInt(20) + 1, client); } jfrEvents.awaitEvents(); jfrEvents.reset(); (3) (4) for (int i = 1; i\u0026lt;= ITERATIONS; i++) { if (i % 1000 == 0) { System.out.println(i); } executeRequest(r.nextInt(20) + 1, client); } jfrEvents.awaitEvents(); (5) long sum = 
jfrEvents.filter(this::isObjectAllocationEvent) .filter(this::isRelevantThread) .mapToLong(this::getAllocationSize) .sum(); assertThat(sum / ITERATIONS).isLessThan(33_000); (6) } 1 Enable the jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB JFR event types; on Java 16 and beyond, you could also use the new jdk.ObjectAllocationSample type instead 2 Do some warm-up iterations so as to achieve a steady state for the memory allocation rate 3 Reset the JfrUnit event collector after the warm-up 4 Run the code under test, in this case invoking some REST API of the application 5 Wait until all the events from the test have been received 6 Run assertions against the JFR events, in this case summing up all memory allocations and asserting that the value per REST call isn’t larger than 33K (the exact threshold has been determined upfront) The general idea behind this testing approach is that a regression with regard to metrics like memory allocation or I/O (e.g. with a database) can be a hint for a performance degradation. Allocating more memory than anticipated may be an indicator that your application started to do something which it hadn’t done before, and which may impact its latency and throughput characteristics.\nTo learn more about this approach for identifying potential performance regressions, please refer to this post, which introduced JfrUnit originally.\nGroovier Tests With Spock Thanks to an outstanding contribution by Petr Hejl, instead of the Java-based API, you can also use Groovy and the Spock framework for your JfrUnit tests, which makes for very compact and nicely readable tests. 
Here’s an example for asserting two JFR events using the Spock integration:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 class JfrSpec extends Specification { JfrEvents jfrEvents = new JfrEvents() @EnableEvent(\u0026#39;jdk.GarbageCollection\u0026#39;) (1) @EnableEvent(\u0026#39;jdk.ThreadSleep\u0026#39;) def \u0026#39;should Have GC And Sleep Events\u0026#39;() { when: (2) System.gc() sleep(1000) then: (3) jfrEvents[\u0026#39;jdk.GarbageCollection\u0026#39;] jfrEvents[\u0026#39;jdk.ThreadSleep\u0026#39;].withTime(Duration.ofMillis(1000)) } } 1 Enable the jdk.GarbageCollection and jdk.ThreadSleep event types 2 Run the test code 3 Assert the events; thanks to the integration with Spock, no explicit barrier for awaiting all events is needed To learn more about the Spock-based approach of using JfrUnit, please refer to the instructions in the README.\nFor getting started with JfrUnit yourself, you may take a look at the jfrunit-examples repo, which shows some common usages of the project.\nOutlook This first Alpha release is an important milestone for the JfrUnit project. Since its inception in December of last year, I’ve received tons of invaluable feedback, and the project has matured quite a bit.\nIn terms of next steps, apart from further expanding and honing the API, one area I’d like to explore with JfrUnit is keeping track of and analysing historical event data from multiple test runs over a longer period of time.\nFor instance, consider a case where your REST call allocates 33 KB today, 40 KB next month, 50 KB the month after, etc. Each increase by itself may not be problematic, but when comparing the results from today to those of a run six months from now, a substantial regression may have accumulated. 
For identifying and analysing such trends, loading JfrUnit result data into a time series database, or repository systems like Hyperfoil Horreum, may be a very interesting feature.\nOn a related note, John O’Hara has started work towards automated event analysis using the rules system of JDK Mission Control, so stay tuned for some really exciting developments in this area!\nLast but not least, I’d like to say thank you to all the folks helping with the work on JfrUnit, be it through discussions, raising feature requests or bug reports, or code changes, including the following fine folks who have contributed to the JfrUnit repository at this point: Andres Almiray, Hash Zhang, Leonard Brünings, Manyanda Chitimbo, Matthias Andreas Benkard, Petr Hejl, Sam Brannen, Sullis, Thomas, Tivrfoa, and Tushar Badgu. Onwards and upwards!\n","id":150,"publicationdate":"Aug 4, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_getting_started_with_jfrunit\"\u003eGetting Started With JfrUnit\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_groovier_tests_with_spock\"\u003eGroovier Tests With Spock\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_outlook\"\u003eOutlook\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eUnit testing, for performance\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt’s with great pleasure that I’m announcing the first official release of JfrUnit today!\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003ca href=\"https://github.com/moditect/jfrunit\"\u003eJfrUnit\u003c/a\u003e is an extension to JUnit which allows you to assert \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight 
Recorder\u003c/a\u003e events in your unit tests.\nThis capability opens up a number of interesting use cases in the field of testing JVM-based applications:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eYou can use JfrUnit to ensure your application produces the \u003ca href=\"blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/\"\u003ecustom JFR events\u003c/a\u003e you expect it to emit\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eYou can use JfrUnit to identify potential performance regressions of your application by means of tracking JFR events e.g. for garbage collection, memory allocation and network I/O\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eYou can use JfrUnit together with \u003ca href=\"https://wiki.openjdk.java.net/display/jmc/The+JMC+Agent\"\u003eJMC Agent\u003c/a\u003e for whitebox tests of your application, ensuring specific methods are invoked with the expected parameters and return values\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e","tags":["java","jfr","testing","performance"],"title":"Introducing JfrUnit 1.0.0.Alpha1","uri":"https://www.morling.dev/blog/introducing-jfrunit-1-0-0-alpha1/"},{"content":" Table of Contents Cambrian Explosion of Connectors Democratization of Data Pipelines Stream Processing for Everyone Honorable Mentions Over the course of the last few months, I’ve had the pleasure to serve on the Kafka Summit program committee and review several hundred session abstracts for the three Summits happening this year (Europe, APAC, Americas). 
That’s not only a big honour, but also a unique opportunity to learn what excites people currently in the Kafka eco-system (and yes, it’s a fair amount of work, too ;).\nWhile voting on the proposals, and also generally aspiring to stay informed of what’s going on in the Kafka community at large, I noticed a few repeating themes and topics which I thought would be interesting to share (without touching on any specific talks of course). At first I meant to put this out via a Twitter thread, but then it became a bit too long for that, so I decided to write this quick blog post instead. Here it goes!\nCambrian Explosion of Connectors Apache Kafka is a great commit log and streaming platform, but of course you also need to get data into and out of it. Kafka Connect is vital for doing just that, linking data sources and sinks to the Kafka backbone. Be it integration of legacy apps and databases, external systems (e.g. IoT), data lakes, or DWHs, different CDC options (including Debezium, of course) — There’s connectors for everything.\nThe ever-increasing number of connectors is accompanied by growing operational maturity (large-scale deployments, KC on K8s, etc.) and upcoming improvements like KIP-618 (exactly-once source connectors) or KIP-731 (rate limiting). There’s so much activity within the Kafka connector eco-system, and it really sets Kafka apart from alternatives.\nDemocratization of Data Pipelines Another exciting trend is a move to self-service Kafka environments, with portals and infrastructure aimed at reducing the friction for standing up new deployments of Kafka, Connect, and related components like schema registries, while keeping track of and running everything in a safe way, e.g. 
when it comes to things like access control, role and schema management, (topic) naming conventions, managing data lineage and quality, ensuring compliance, privacy and operational best-practices, or observability.\nA healthy combination of in-house as well as open-source developments is happening here, and I’m sure it’s a field where we’ll see more tools and solutions appearing in the next months and years.\nStream Processing for Everyone Not exactly a new trend, but definitely a growing one: more and more users appreciate the benefits of stream processing for working with their data in Kafka, filtering, transforming, enriching and aggregating it either programmatically using libraries such as Kafka Streams or Apache Flink, or in a declarative fashion, e.g. via ksqlDB or Flink SQL. Either way, small, focused stream processing apps are a true manifestation of the microservices idea — have cohesive, independent application units, each focusing on one particular task and loosely coupled to each other, via Apache Kafka in this case.\nIt’s great to see the uptake here, including approaches for dynamic scaling based on end-to-end lag, and innovative new solutions for efficient incremental view materialization.\nHonorable Mentions Besides these bigger trends, there’s also a few more specific topics which I saw several times and which I found very interesting:\nTools and best practices for testing of Kafka-based applications (e.g. for creating test data or mock producers/consumers)\nFeeding ML/AI models is becoming a popular Kafka use case; it’s not my field of experience at all, but it seems like a very logical choice to run ML algorithms on data ingested via Kafka, allowing to gain new insight into business data with a low latency\nPushing data to consumers via GraphQL; (still?) 
even more niche probably, but I love the idea of push updates to browsers based on data from Kafka; this should allow for some interesting use cases\nOf course there’s also things like geo-replicated Kafka, the ongoing move towards managed Kafka service offerings (which raises interesting questions around connectivity to on-prem systems and data), architectural trends like data meshes, and so much more.\nIf you want to learn more about these and many other facets of Apache Kafka, its use cases, best practices, and latest developments, make sure to register for Kafka Summit (it’s free and online). The sessions from the Europe run can already be watched, while the APAC (July 27 - 28) and Americas (September 14 - 15) editions are still to come.\n","id":151,"publicationdate":"May 28, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_cambrian_explosion_of_connectors\"\u003eCambrian Explosion of Connectors\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_democratization_of_data_pipelines\"\u003eDemocratization of Data Pipelines\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_stream_processing_for_everyone\"\u003eStream Processing for Everyone\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_honorable_mentions\"\u003eHonorable Mentions\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOver the course of the last few months, I’ve had the pleasure to serve on the \u003ca href=\"https://www.kafka-summit.org/\"\u003eKafka Summit\u003c/a\u003e program committee and review several hundred session abstracts for the three Summits happening this year (Europe, APAC, Americas).\nThat’s not only a big honour, but also a unique opportunity to learn what excites people currently in the Kafka eco-system\n(and yes, it’s a fair 
amount of work, too ;).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhile voting on the proposals, and also generally aspiring to stay informed of what’s going on in the Kafka community at large, I noticed a few repeating themes and topics which I thought would be interesting to share\n(without touching on any specific talks of course).\nAt first I meant to put this out via a Twitter thread, but then it became a bit too long for that, so I decided to write this quick blog post instead.\nHere it goes!\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","kafka-connect","streaming"],"title":"Three Plus Some Lovely Kafka Trends","uri":"https://www.morling.dev/blog/three-plus-some-lovely-kafka-trends/"},{"content":" Table of Contents Trying ZK-less Kafka Yourself Taking Brokers Down Wrap-Up Sometimes, less is more. One case where that’s certainly true is dependencies. And so it shouldn’t come as a surprise that the Apache Kafka community is eagerly awaiting the removal of the dependency on the ZooKeeper service, which currently is used for storing Kafka metadata (e.g. about topics and partitions) as well as for the purposes of leader election in the cluster.\nThe Kafka improvement proposal KIP-500 (\u0026#34;Replace ZooKeeper with a Self-Managed Metadata Quorum\u0026#34;) promises to make life better for users in many regards:\nBetter getting started and operational experience by requiring to run only one system, Kafka, instead of two\nRemoving potential for discrepancies of metadata state between ZooKeeper and the Kafka controller\nSimplifying configuration, for instance when it comes to security\nBetter scalability, e.g. 
in terms of number of partitions; faster execution of operations like topic creation\nWith KIP-500, Kafka itself will store all the required metadata in an internal Kafka topic, and controller election will be done amongst (a subset of) the Kafka cluster nodes themselves, based on a variant of the Raft protocol for distributed consensus. Removing the ZooKeeper dependency is great not only for running Kafka clusters in production, but also for local development and testing, where being able to start up a Kafka node with a single process comes in very handy.\nHaving been in the works for multiple years, ZK-less Kafka, also known as KRaft (\u0026#34;Kafka Raft metadata mode\u0026#34;), was recently published as an early access feature with Kafka 2.8. So this is the perfect time to get my hands on this and get a first feeling for ZK-less Kafka myself. Note this post isn’t meant to be a thorough evaluation or systematic testing of the new Kafka deployment mode; rather, take it as a description of how to get started with playing with ZK-less Kafka and of a few observations I made while doing so.\nIn the world of ZK-less Kafka, there are two roles for nodes: controller and broker. Each node in the cluster can have either one or both roles (\u0026#34;combined nodes\u0026#34;). All controller nodes elect the active controller, which is in charge of coordinating the whole cluster, with other controller nodes acting as hot stand-by replicas. In the KRaft KIPs, the active controller sometimes also is simply referred to as leader. This may appear confusing at first, if you are familiar with the existing concept of partition leaders. It started to make sense to me once I realized that the active controller is the leader of the sole partition of the metadata topic. 
All broker nodes are handling client requests, just as before with ZooKeeper.\nWhile for smaller clusters it is expected that the majority of, or even all cluster nodes act as controllers, you may have dedicated controller-only nodes in larger clusters, e.g. 3 controller nodes and 7 broker nodes in a cluster of 10 nodes overall. As per the KRaft README, having dedicated controller nodes should increase overall stability, as for instance an out-of-memory error on a broker wouldn’t impact controllers, or potentially even cause a leader re-election.\nTrying ZK-less Kafka Yourself As a foundation, I’ve created a variant of the Debezium 1.6 container image, which updates Kafka from 2.7 to Kafka 2.8, and also does the required changes to the entrypoint script for using the KRaft mode. Note this change hasn’t been merged yet to the upstream Debezium repository, so if you’d like to try out things by yourself, you’ll have to clone my repo, and then build the container image yourself like this:\n1 2 3 $ git clone git@github.com:gunnarmorling/docker-images.git $ cd docker-images/kafka/1.6 $ docker build -t debezium/zkless-kafka:1.6 --build-arg DEBEZIUM_VERSION=1.6.0 . In order to start the image with Kafka in KRaft mode, the CLUSTER_ID environment variable must be set. A value can be obtained using the new bin/kafka-storage.sh script; going forward, we’ll likely add an option to the Debezium Kafka container image for doing so. If that variable is set, the entrypoint script of the image does the following things:\nUse config/kraft/server.properties instead of config/server.properties as the Kafka configuration file; this one comes with the Kafka distribution and is meant for nodes which should have both the controller and broker roles; i.e. 
the container image currently only supports combined nodes\nFormat the node’s storage directory, if that hasn’t been done yet\nSet up a listener for controller communication\nBased on that, here is what’s needed in a Docker Compose file for spinning up a Kafka cluster with three nodes:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 version: \u0026#39;2\u0026#39; services: kafka-1: image: debezium/zkless-kafka:1.6 ports: - 19092:9092 - 19093:9093 environment: - CLUSTER_ID=oh-sxaDRTcyAr6pFRbXyzA (1) - BROKER_ID=1 (2) - KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093 (3) kafka-2: image: debezium/zkless-kafka:1.6 ports: - 29092:9092 - 29093:9093 environment: - CLUSTER_ID=oh-sxaDRTcyAr6pFRbXyzA (1) - BROKER_ID=2 (2) - KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093 (3) kafka-3: image: debezium/zkless-kafka:1.6 ports: - 39092:9092 - 39093:9093 environment: - CLUSTER_ID=oh-sxaDRTcyAr6pFRbXyzA (1) - BROKER_ID=3 (2) - KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093 (3) 1 Cluster id; must be the same for all the nodes 2 Broker id; must be unique for each node 3 Addresses of all the controller nodes in the format id1@host1:port1,id2@host2:port2…​ No ZooKeeper nodes, yeah :)\nWorking on Debezium, and being a Kafka Connect aficionado all around, I’m also going to add Connect and a Postgres database for testing purposes (you can find the complete Compose file here):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 version: \u0026#39;2\u0026#39; services: # ... 
connect: image: debezium/connect:1.6 ports: - 8083:8083 links: - kafka-1 - kafka-2 - kafka-3 - postgres environment: - BOOTSTRAP_SERVERS=kafka-1:9092 - GROUP_ID=1 - CONFIG_STORAGE_TOPIC=my_connect_configs - OFFSET_STORAGE_TOPIC=my_connect_offsets - STATUS_STORAGE_TOPIC=my_connect_statuses postgres: image: debezium/example-postgres:1.6 ports: - 5432:5432 environment: - POSTGRES_USER=postgres - POSTGRES_PASSWORD=postgres Now let’s start everything:\n1 $ docker-compose -f docker-compose-zkless-kafka.yaml up Let’s also register an instance of the Debezium Postgres connector, which will connect to the PG database and take an initial snapshot, so we got some topics with a few messages to play with:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 $ curl -0 -v -X POST http://localhost:8083/connectors \\ -H \u0026#34;Expect:\u0026#34; \\ -H \u0026#39;Content-Type: application/json; charset=utf-8\u0026#39; \\ --data-binary @- \u0026lt;\u0026lt; EOF { \u0026#34;name\u0026#34;: \u0026#34;inventory-connector\u0026#34;, \u0026#34;config\u0026#34;: { \u0026#34;connector.class\u0026#34;: \u0026#34;io.debezium.connector.postgresql.PostgresConnector\u0026#34;, \u0026#34;tasks.max\u0026#34;: \u0026#34;1\u0026#34;, \u0026#34;database.hostname\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.port\u0026#34;: \u0026#34;5432\u0026#34;, \u0026#34;database.user\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.password\u0026#34;: \u0026#34;postgres\u0026#34;, \u0026#34;database.dbname\u0026#34; : \u0026#34;postgres\u0026#34;, \u0026#34;database.server.name\u0026#34;: \u0026#34;dbserver1\u0026#34;, \u0026#34;schema.include\u0026#34;: \u0026#34;inventory\u0026#34;, \u0026#34;topic.creation.default.replication.factor\u0026#34;: 2, \u0026#34;topic.creation.default.partitions\u0026#34;: 10 } } EOF Note how this is using a replication factor of 2 for all the topics created via Kafka Connect, which will come in handy for some experimenting later on.\nThe nosy person I 
am, I first wanted to take a look into that new internal metadata topic, where all the cluster metadata is stored. As per the release announcement, it should be named @metadata. But no such topic shows up when listing the available topics; only the __consumer_offsets topic, the change data topics created by Debezium, and some Kafka Connect specific topics are shown:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 # Get a shell on one of the broker containers $ docker-compose -f docker-compose-zkless-kafka.yaml exec kafka-1 bash # In that shell $ /kafka/bin/kafka-topics.sh --bootstrap-server kafka-3:9092 --list __consumer_offsets dbserver1.inventory.customers dbserver1.inventory.geom dbserver1.inventory.orders dbserver1.inventory.products dbserver1.inventory.products_on_hand dbserver1.inventory.spatial_ref_sys my_connect_configs my_connect_offsets my_connect_statuses Seems that this topic is truly meant to be internal; also trying to consume messages from the topic with kafka-console-consumer.sh or kafkacat fails due to the invalid topic name. Let’s see whether things are going to change here, since KIP-595 (\u0026#34;A Raft Protocol for the Metadata Quorum\u0026#34;) explicitly mentions the ability for consumers to \u0026#34;read the contents of the metadata log for debugging purposes\u0026#34;.\nIn the meantime, we can take a look at the contents of the metadata topic using the kafka-dump-log.sh utility, e.g. 
filtering out all RegisterBroker records:\n1 2 3 4 5 6 7 8 $ /kafka/bin/kafka-dump-log.sh --cluster-metadata-decoder \\ --skip-record-metadata \\ --files /kafka/data//\\@metadata-0/*.log | grep REGISTER_BROKER payload: {\u0026#34;type\u0026#34;:\u0026#34;REGISTER_BROKER_RECORD\u0026#34;,\u0026#34;version\u0026#34;:0,\u0026#34;data\u0026#34;:{\u0026#34;brokerId\u0026#34;:3,\u0026#34;incarnationId\u0026#34;:\u0026#34;O_PiUrjNTsqVEQv61gB2Vg\u0026#34;,\u0026#34;brokerEpoch\u0026#34;:0,\u0026#34;endPoints\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;PLAINTEXT\u0026#34;,\u0026#34;host\u0026#34;:\u0026#34;172.18.0.2\u0026#34;,\u0026#34;port\u0026#34;:9092,\u0026#34;securityProtocol\u0026#34;:0}],\u0026#34;features\u0026#34;:[],\u0026#34;rack\u0026#34;:null}} payload: {\u0026#34;type\u0026#34;:\u0026#34;REGISTER_BROKER_RECORD\u0026#34;,\u0026#34;version\u0026#34;:0,\u0026#34;data\u0026#34;:{\u0026#34;brokerId\u0026#34;:1,\u0026#34;incarnationId\u0026#34;:\u0026#34;FbOZdz9rSZqTyuSKr12JWg\u0026#34;,\u0026#34;brokerEpoch\u0026#34;:2,\u0026#34;endPoints\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;PLAINTEXT\u0026#34;,\u0026#34;host\u0026#34;:\u0026#34;172.18.0.3\u0026#34;,\u0026#34;port\u0026#34;:9092,\u0026#34;securityProtocol\u0026#34;:0}],\u0026#34;features\u0026#34;:[],\u0026#34;rack\u0026#34;:null}} payload: {\u0026#34;type\u0026#34;:\u0026#34;REGISTER_BROKER_RECORD\u0026#34;,\u0026#34;version\u0026#34;:0,\u0026#34;data\u0026#34;:{\u0026#34;brokerId\u0026#34;:2,\u0026#34;incarnationId\u0026#34;:\u0026#34;ZF_WQqk_T5q3l1vhiWT_FA\u0026#34;,\u0026#34;brokerEpoch\u0026#34;:4,\u0026#34;endPoints\u0026#34;:[{\u0026#34;name\u0026#34;:\u0026#34;PLAINTEXT\u0026#34;,\u0026#34;host\u0026#34;:\u0026#34;172.18.0.4\u0026#34;,\u0026#34;port\u0026#34;:9092,\u0026#34;securityProtocol\u0026#34;:0}],\u0026#34;features\u0026#34;:[],\u0026#34;rack\u0026#34;:null}} ... 
The individual record formats are described in KIP-631 (\u0026#34;The Quorum-based Kafka Controller\u0026#34;).\nAnother approach would be to use a brand-new tool, kafka-metadata-shell.sh. Also defined in KIP-631, this utility script allows to browse a cluster’s metadata, similarly to zookeeper-shell.sh known from earlier releases. For instance, you can list all brokers and get the metadata of the registration of node 1 like this:\n1 2 3 4 5 6 7 8 9 10 11 12 13 $ /kafka/bin/kafka-metadata-shell.sh --snapshot /kafka/data/@metadata-0/00000000000000000000.log Loading... Starting... [ Kafka Metadata Shell ] \u0026gt;\u0026gt; ls brokers configs local metadataQuorum topicIds topics \u0026gt;\u0026gt; ls brokers 1 2 3 \u0026gt;\u0026gt; cd brokers/1 \u0026gt;\u0026gt; cat registration RegisterBrokerRecord(brokerId=1, incarnationId=TmM_u-_cQ2ChbUy9NZ9wuA, brokerEpoch=265, endPoints=[BrokerEndpoint(name=\u0026#39;PLAINTEXT\u0026#39;, host=\u0026#39;172.18.0.3\u0026#39;, port=9092, securityProtocol=0)], features=[], rack=null) \u0026gt;\u0026gt; Or to display the current leader:\n1 2 3 \u0026gt;\u0026gt; cat /metadataQuorum/leader MetaLogLeader(nodeId=1, epoch=12) Or to show the metadata of a specific topic partition:\n1 2 3 4 5 6 7 8 9 10 11 12 13 \u0026gt;\u0026gt; cat /topics/dbserver1.inventory.customers/0/data { \u0026#34;partitionId\u0026#34; : 0, \u0026#34;topicId\u0026#34; : \u0026#34;8xjqykVRT_WpkqbXHwbeCA\u0026#34;, \u0026#34;replicas\u0026#34; : [ 2, 3 ], \u0026#34;isr\u0026#34; : [ 2, 3 ], \u0026#34;removingReplicas\u0026#34; : null, \u0026#34;addingReplicas\u0026#34; : null, \u0026#34;leader\u0026#34; : 2, \u0026#34;leaderEpoch\u0026#34; : 0, \u0026#34;partitionEpoch\u0026#34; : 0 } \u0026gt;\u0026gt; Those are just a few of the things you can do with kafka-metadata-shell.sh, and it surely will be an invaluable tool in the box of administrators in the ZK-less era. 
Another new tool is kafka-cluster.sh, which currently can do two things: displaying the unique id of a cluster, and unregistering a broker. While the former worked for me:\n$ /kafka/bin/kafka-cluster.sh cluster-id --bootstrap-server kafka-1:9092 Cluster ID: oh-sxaDRTcyAr6pFRbXyzA The latter always failed with a NotControllerException, no matter on which node I invoked the command:\n$ /kafka/bin/kafka-cluster.sh unregister --bootstrap-server kafka-1:9092 --id 3 [2021-05-15 20:52:54,626] ERROR [AdminClient clientId=adminclient-1] Unregister broker request for broker ID 3 failed: This is not the correct controller for this cluster. It’s not quite clear to me whether I did something wrong, or whether this functionality simply should not be expected to be supported just yet.\nThe Raft-based metadata quorum also comes with a set of new metrics (described in KIP-595), which let you retrieve information like the current active controller, the role of the node at hand, and more. Here’s a screenshot of the metrics invoked on a non-leader node:\nTaking Brokers Down An essential aspect of any distributed system like Kafka is the fact that individual nodes of a cluster can disappear at any time, be it due to failures (node crashes, network splits, etc.), or due to controlled shutdowns, e.g. for a version upgrade. So I was curious how Kafka in KRaft mode would deal with the situation where nodes in the cluster are stopped and then restarted. Note I’m stopping nodes gracefully via docker-compose stop, instead of randomly crashing them, Jepsen-style ;)\nThe sequence of events I was testing was the following:\nStop the current active controller, so two nodes from the original three-node cluster remain\nStop the then new active controller node, at which point the majority of cluster nodes isn’t available any longer\nStart both nodes again\nHere are a few noteworthy things I observed. 
As you’d expect, when stopping the active controller, a new leader was elected (as per the result of cat /metadataQuorum/leader in the Kafka metadata shell), and also all partitions which had the previous active controller as partition leader, got re-assigned (in this case node 1 was the active controller and got stopped):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 $ /kafka/bin/kafka-topics.sh --bootstrap-server kafka-2:9092 --describe --topic dbserver1.inventory.customers Topic: dbserver1.inventory.customers\tTopicId: a6qzjnQwQ2eLNSXL5svW8g\tPartitionCount: 10\tReplicationFactor: 2\tConfigs: segment.bytes=1073741824 Topic: dbserver1.inventory.customers\tPartition: 0\tLeader: 1\tReplicas: 1,3\tIsr: 1,3 Topic: dbserver1.inventory.customers\tPartition: 1\tLeader: 1\tReplicas: 3,1\tIsr: 1,3 Topic: dbserver1.inventory.customers\tPartition: 2\tLeader: 1\tReplicas: 1,2\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 3\tLeader: 1\tReplicas: 2,1\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 4\tLeader: 1\tReplicas: 2,1\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 5\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 6\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 7\tLeader: 2\tReplicas: 2,3\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 8\tLeader: 1\tReplicas: 2,1\tIsr: 1,2 Topic: dbserver1.inventory.customers\tPartition: 9\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 # After stopping node 1 $ /kafka/bin/kafka-topics.sh --bootstrap-server kafka-2:9092 --describe --topic dbserver1.inventory.customers Topic: dbserver1.inventory.customers\tTopicId: a6qzjnQwQ2eLNSXL5svW8g\tPartitionCount: 10\tReplicationFactor: 2\tConfigs: segment.bytes=1073741824 Topic: dbserver1.inventory.customers\tPartition: 0\tLeader: 3\tReplicas: 1,3\tIsr: 3 Topic: dbserver1.inventory.customers\tPartition: 1\tLeader: 3\tReplicas: 3,1\tIsr: 3 Topic: 
dbserver1.inventory.customers\tPartition: 2\tLeader: 2\tReplicas: 1,2\tIsr: 2 Topic: dbserver1.inventory.customers\tPartition: 3\tLeader: 2\tReplicas: 2,1\tIsr: 2 Topic: dbserver1.inventory.customers\tPartition: 4\tLeader: 2\tReplicas: 2,1\tIsr: 2 Topic: dbserver1.inventory.customers\tPartition: 5\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 6\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 7\tLeader: 2\tReplicas: 2,3\tIsr: 2,3 Topic: dbserver1.inventory.customers\tPartition: 8\tLeader: 2\tReplicas: 2,1\tIsr: 2 Topic: dbserver1.inventory.customers\tPartition: 9\tLeader: 2\tReplicas: 3,2\tIsr: 2,3 Things got interesting though when also stopping the newly elected leader subsequently. At this point, the cluster isn’t in a healthy state any longer, as no majority of nodes of the cluster is available for leader election. Logs of the remaining node are flooded with an UnknownHostException in this situation:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 kafka-3_1 | 2021-05-16 10:16:45,282 - WARN [kafka-raft-outbound-request-thread:NetworkClient@992] - [RaftManager nodeId=3] Error connecting to node kafka-2:9093 (id: 2 rack: null) kafka-3_1 | java.net.UnknownHostException: kafka-2 kafka-3_1 | at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797) kafka-3_1 | at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505) kafka-3_1 | at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364) kafka-3_1 | at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298) kafka-3_1 | at org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27) kafka-3_1 | at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:111) kafka-3_1 | at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:512) kafka-3_1 | at 
org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:466) kafka-3_1 | at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:172) kafka-3_1 | at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:985) kafka-3_1 | at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311) kafka-3_1 | at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:103) kafka-3_1 | at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99) kafka-3_1 | at scala.collection.Iterator.foreach(Iterator.scala:943) kafka-3_1 | at scala.collection.Iterator.foreach$(Iterator.scala:943) kafka-3_1 | at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) kafka-3_1 | at scala.collection.IterableLike.foreach(IterableLike.scala:74) kafka-3_1 | at scala.collection.IterableLike.foreach$(IterableLike.scala:73) kafka-3_1 | at scala.collection.AbstractIterable.foreach(Iterable.scala:56) kafka-3_1 | at kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99) kafka-3_1 | at kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73) kafka-3_1 | at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:94) kafka-3_1 | at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) Here I think it’d be great to get a more explicit indication in the logs of what’s going on, clearly indicating the unhealthy status of the cluster at large.\nWhat’s also interesting is that the remaining node claims to be a leader as per its exposed metrics and value of /metadataQuorum/leader in the metadata shell. This seems a bit dubious, as no leader election can happen without the majority of nodes available. 
Consequently, creation of a topic in this state also times out, so I suspect this is more an artifact of how the cluster state is displayed than a reflection of what’s actually going on.\nThings get a bit more troublesome when restarting the two stopped nodes: very often I’d then see very high CPU consumption on the Kafka nodes as well as the Connect node:\n$ docker stats CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS 642eb697fed6 tutorial_connect_1 122.04% 668.3MiB / 7.775GiB 8.39% 99.7MB / 46.9MB 131kB / 106kB 47 5d9806526f92 tutorial_kafka-1_1 9.24% 386.4MiB / 7.775GiB 4.85% 105kB / 104kB 0B / 877kB 93 767e6c0f6cd3 tutorial_kafka-3_1 176.40% 739.2MiB / 7.775GiB 9.28% 14.5MB / 40.6MB 0B / 1.52MB 120 a0ce8438557f tutorial_kafka-2_1 87.51% 567.8MiB / 7.775GiB 7.13% 6.52MB / 24.9MB 0B / 881kB 95 df978d220132 tutorial_postgres_1 0.00% 36.39MiB / 7.775GiB 0.46% 243kB / 5.49MB 0B / 79.4MB 9 In some cases stopping and restarting the Kafka nodes would help, other times only a restart of the Connect node would mitigate the situation. I didn’t further explore this issue by taking a thread dump, but I suppose threads are stuck in some kind of busy spin loop at this point. The early access state of KRaft mode seems to be somewhat showing here. After bringing up the issue on the Kafka mailing list, I’ve logged KAFKA-12801 for this problem, as it seems not to have been tracked before.\nOn the bright side, once all brokers were up and running again, the cluster and the Debezium connector would happily continue their work.\nWrap-Up Not many features have been awaited by the Kafka community as eagerly as the removal of the ZooKeeper dependency. Rightly so: Kafka-based metadata storage and leader election will greatly reduce the operational burden of running Kafka and also allow for better scalability. 
Lifting the requirement for running separate ZooKeeper processes or even machines should also help to make things more cost-effective, so you should benefit from this change no matter whether you’re running Kafka yourself or are using a managed service offering.\nThe early access release of ZK-less Kafka in version 2.8 gives a first impression of what will hopefully be the standard way of running Kafka in the not too distant future. As very clearly stated in the KRaft README, you should not use this in production yet; this matches the observations made above: while running Kafka without ZooKeeper definitely feels great, there are still some rough edges to be sorted out. Also check out the README for a list of currently missing features, such as support of transactions, adding partitions to existing topics, partition reassignment, and more. Lastly, any distributed system should only be fully trusted after going through the grinder of the Jepsen test suite, which I’m sure will only be a question of time.\nDespite the early state, I would very much recommend that you start testing ZK-less Kafka at this point, so as to get a feeling for it and of course to report back any findings and insights. 
To do so, either download the upstream Kafka distribution, or build the Debezium 1.6 container image for Kafka with preliminary support for KRaft mode, which lets you set up a ZK-less Kafka cluster in no time.\nIn order to learn more about ZK-less Kafka, besides diving into the relevant KIPs (which all are linked from the umbrella KIP-500), also check out the QCon talk \u0026#34;Kafka Needs No Keeper\u0026#34; by Colin McCabe, one of the main engineers driving this effort.\n","id":152,"publicationdate":"May 17, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_trying_zk_less_kafka_yourself\"\u003eTrying ZK-less Kafka Yourself\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_taking_brokers_down\"\u003eTaking Brokers Down\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up\"\u003eWrap-Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSometimes, less is more.\nOne case where that’s certainly true is dependencies.\nAnd so it shouldn’t come at a surprise that the \u003ca href=\"https://kafka.apache.org/\"\u003eApache Kafka\u003c/a\u003e community is eagerly awaiting the removal of the dependency to the \u003ca href=\"https://zookeeper.apache.org/\"\u003eZooKeeper\u003c/a\u003e service,\nwhich currently is used for storing Kafka metadata (e.g. 
about topics and partitions) as well as for the purposes of leader election in the cluster.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe Kafka improvement proposal \u003ca href=\"https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum\"\u003eKIP-500\u003c/a\u003e\n(\u0026#34;Replace ZooKeeper with a Self-Managed Metadata Quorum\u0026#34;)\npromises to make life better for users in many regards:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eBetter getting started and operational experience by requiring to run only one system, Kafka, instead of two\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eRemoving potential for discrepancies of metadata state between ZooKeeper and the Kafka controller\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eSimplifying configuration, for instance when it comes to security\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eBetter scalability, e.g. in terms of number of partitions; faster execution of operations like topic creation\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e","tags":["kafka","kraft","architecture"],"title":"Exploring ZooKeeper-less Kafka","uri":"https://www.morling.dev/blog/exploring-zookeeper-less-kafka/"},{"content":"","id":153,"publicationdate":"May 17, 2021","section":"tags","summary":"","tags":null,"title":"kraft","uri":"https://www.morling.dev/tags/kraft/"},{"content":" One of the ultimate strengths of Java is its strong notion of backwards compatibility: Java applications and libraries built many years ago oftentimes run without problems on current JVMs, and the compiler of current JDKs can produce byte code, that is executable with earlier Java versions.\nFor instance, JDK 16 supports byte code levels going back as far as to Java 1.7; But: hic sunt dracones. 
The emitted byte code level is just one part of the story. It’s equally important to consider which APIs of the JDK are used by the compiled code, and whether they are available in the targeted Java runtime version. As an example, let’s consider this simple \u0026#34;Hello World\u0026#34; program:\npackage com.example; import java.util.List; public class HelloWorld { public static void main(String... args) { System.out.println(List.of(\u0026#34;Hello\u0026#34;, \u0026#34;World!\u0026#34;)); } } Let’s assume we’re using Java 16 for compiling this code, aiming for compatibility with Java 1.8. Historically, the Java compiler has provided the --source and --target options for this purpose, which are well known to most Java developers:\n$ javac --source 1.8 --target 1.8 -d classes HelloWorld.java warning: [options] bootstrap class path not set in conjunction with -source 8 1 warning This compiles successfully (we’ll come back to that warning in a bit). But if you actually try to run that class on Java 8, you’re in for a bad surprise:\n$ java -classpath classes com.example.HelloWorld Exception in thread \u0026#34;main\u0026#34; java.lang.NoSuchMethodError: ↩ java.util.List.of(Ljava/lang/Object;Ljava/lang/Object;)Ljava/util/List; at com.example.HelloWorld.main(HelloWorld.java:7) This makes sense: the List.of() methods were only introduced in Java 9, so they are not present in the Java 8 API. Shouldn’t the compiler have let us know about this? Absolutely, and that’s where this warning about the bootstrap class path comes in: the compiler recognized our potentially dangerous endeavour and essentially suggested compiling against the class library matching the targeted Java version instead of that of the JDK used for compilation. 
This is done using the -Xbootclasspath option:\n1 2 3 4 5 6 7 8 9 10 $ javac --source 1.8 --target 1.8 \\ -d classes \\ -Xbootclasspath:${JAVA_8_HOME}/jre/lib/rt.jar \\ (1) HelloWorld.java HelloWorld.java:7: error: cannot find symbol System.out.println(List.of(\u0026#34;Hello\u0026#34;, \u0026#34;World!\u0026#34;)); ^ symbol: method of(String,String) location: interface List 1 error 1 Path to the rt.jar of Java 8 That’s much better: now the invocation of the List.of() method causes compilation to fail, instead of finding out about this problem only during testing, or worse, in production.\nWhile this approach works, it’s not without issues: requiring the target Java version’s class library complicates things quite a bit; multiple Java versions need to be installed, and the targeted JDK’s location must be known, which for instance tends to make build processes not portable between different machines and platforms.\nLuckily, Java 9 improved things significantly here; by means of the new --release option, code can be compiled for older Java versions in a fully safe and portable way. Let’s give this a try:\n1 2 3 4 5 6 7 $ javac --release 8 -d classes HelloWorld.java HelloWorld.java:7: error: cannot find symbol System.out.println(List.of(\u0026#34;Hello\u0026#34;, \u0026#34;World!\u0026#34;)); ^ symbol: method of(String,String) location: interface List 1 error Very nice, the same compilation error as before, but without the need for any complex configuration besides the --release 8 option. So how does this work? Does the JDK come with full class libraries of all the earlier Java versions which it supports? 
Considering that the modules file of Java 16 has a size of more than one hundred megabytes (to be precise, 118 MB on macOS), that’d clearly not be a good idea; we’d end up with a JDK size of nearly one gigabyte.\nWhat’s happening instead is that the JDK ships \u0026#34;stripped-down class files corresponding to class files from the target platform versions\u0026#34;, as we can read in JEP 247 (\u0026#34;Compile for Older Platform Versions\u0026#34;), which introduced the --release option. Details about the implementation are sparse, though. The JEP only mentions a ZIP file named ct.sym which contains those signature files. So I started by taking a look at what’s in there:\n$ unzip -l $JAVA_HOME/lib/ct.sym Archive: /Library/Java/JavaVirtualMachines/jdk-16.sdk/Contents/Home/lib/ct.sym Length Date Time Name --------- ---------- ----- ---- 0 03-26-2021 18:11 7/java.base/java/awt/peer/ 2557 03-26-2021 18:11 7/java.base/java/awt/peer/ComponentPeer.sig 542 03-26-2021 18:11 7/java.base/java/awt/peer/FramePeer.sig ... 856 03-26-2021 18:11 879A/java.activation/javax/activation/ActivationDataFlavor.sig 491 03-26-2021 18:11 879A/java.activation/javax/activation/CommandInfo.sig 299 03-26-2021 18:11 879A/java.activation/javax/activation/CommandObject.sig ... 1566 03-26-2021 18:11 9ABCDE/java.base/java/lang/Byte.sig 1616 03-26-2021 18:11 9ABCDE/java.base/java/lang/Short.sig ... That’s interesting: lots of *.sig files, organized in a directory structure that looks odd at first. 
So let’s see what’s there for the java.util.List class:\n$ unzip -l $JAVA_HOME/lib/ct.sym | grep \u0026#34;java/util/List.sig\u0026#34; 1481 03-26-2021 18:11 7/java.base/java/util/List.sig 1771 03-26-2021 18:11 8/java.base/java/util/List.sig 4040 03-26-2021 18:11 9/java.base/java/util/List.sig 4184 03-26-2021 18:11 A/java.base/java/util/List.sig 4097 03-26-2021 18:11 BCDEF/java.base/java/util/List.sig Five different versions altogether, under the directories 7, 8, 9, A, and BCDEF. It took a few moments until the structure began to make sense to me: the top-level directory names encode Java version(s), one character per version (with A standing for 10, B for 11, and so on), and there’s a new version of the signature file whenever its API changed. That is, java.util.List changed in Java 7, 8, 9, 10 (A), and 11 (B), and has remained stable since then, i.e. from version 11 to 16, there have been no changes to the public List API.\nSo let’s dive in a bit further and compare the signature files of Java 8 and 9. As JEP 247 states that these files are (stripped-down) class files, we should be able to examine them using javap. In order to do so, I had to change the file extensions from *.sig to *.class, though. 
After that, I could decompile the files using javap, save the result in text files and compare them using git:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 $ javap List8.class \u0026gt; List8.txt $ javap List9.class \u0026gt; List9.txt $ git diff --no-index List8.txt List9.txt diff --git a/List8.txt b/List9.txt index b2ca320..b276286 100644 --- a/List8.txt +++ b/List9.txt @@ -27,4 +27,16 @@ public interface java.util.List\u0026lt;E\u0026gt; extends java.util.Collection\u0026lt;E\u0026gt; { public abstract java.util.ListIterator\u0026lt;E\u0026gt; listIterator(int); public abstract java.util.List\u0026lt;E\u0026gt; subList(int, int); public default java.util.Spliterator\u0026lt;E\u0026gt; spliterator(); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E, E, E, E, E, E, E, E, E, E); + public static \u0026lt;E\u0026gt; java.util.List\u0026lt;E\u0026gt; of(E...); } As expected, the diff between the two signature files reveals the addition of the different List.of() methods in Java 9, as such exactly the reason why the Hello World example from the beginning cannot 
be executed on Java 8.\nDebugging the Java Compiler In order to understand in detail how the ct.sym file is used by the Java compiler, it can be useful to run javac in debug mode. As javac is written in Java itself, this can be done exactly the same way as when remote debugging any other Java application. You only need to start javac using the usual debug switches, which must be prepended with -J in this case:\n1 2 3 $ javac -J-Xdebug \\ -J-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 \\ HelloWorld.java Make sure to download the right version of the OpenJDK source code and set it up in your IDE, so that you also can step through internal classes whose source code isn’t distributed with binary builds. An interesting starting point for your explorations could be the JDKPlatformProvider class.\nTo double-check, you could also confirm with the API diffs provided by the Java Version Almanac or the Adopt OpenJDK JDK API diff generator. While doing so, one more thing piqued my curiosity: these reports don’t show any changes to java.util.List in Java 11, whereas ct.sym contains a new version of the corresponding signature file; To find out what’s going on, again javap — this time with a bit more detail level — came in handy:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 $ javap -p -c -s -v -l List10.class \u0026gt; List10.txt $ javap -p -c -s -v -l List11.class \u0026gt; List11.txt $ git diff --no-index -w List10.txt List11.txt ... 
- #96 = Utf8 RuntimeInvisibleAnnotations - #97 = Utf8 Ljdk/Profile+Annotation; - #98 = Utf8 value - #99 = Integer 1 { public abstract int size(); descriptor: ()I @@ -308,8 +304,3 @@ Constant pool: Signature: #87 // \u0026lt;E:Ljava/lang/Object;\u0026gt;(Ljava/util/Collection\u0026lt;+TE;\u0026gt;;)Ljava/util/List\u0026lt;TE;\u0026gt;; } Signature: #95 // \u0026lt;E:Ljava/lang/Object;\u0026gt;Ljava/lang/Object;Ljava/util/Collection\u0026lt;TE;\u0026gt;; -RuntimeInvisibleAnnotations: - 0: #97(#98=I#99) - jdk.Profile+Annotation( - value=1 - ) An annotation with the interesting name @jdk.Profile+Annotation(1) got removed. Now, if you look at the List.java source file in Java 10, you won’t find this annotation anywhere. In fact, this annotation type doesn’t exist at all. By grepping through the OpenJDK source code for ct.sym, I learned that it is a synthetic annotation which gets added during the process of creating the signature files, denoting which compact profile a class belongs to.\nCompact Profiles Compact Profiles are a notion in Java 8 which defines three specific sub-sets of the Java platform: compact1, compact2, and compact3. Each profile contains a fixed set of JDK packages, and the profiles build upon each other, allowing for more size-efficient deployments to constrained devices, if such a profile is sufficient for a given application. With Java 9, the module system, and the ability to create custom runtime images on a much more granular level (using jlink), compact profiles became pretty much obsolete.\nSo that’s another purpose of the ct.sym file: it allows the compiler to ensure compatibility with a chosen compact profile. In current JDKs, javac still supports the -profile option, but only when compiling for Java 8. In that light, it’s not quite clear why that annotation was only removed from the signature file with Java 11.\nSumming up, since Java 9 the javac compiler provides powerful means of ensuring API compatibility with earlier Java versions. 
With a size of 7.2 MB for Java 16, the ct.sym file contains the JDK API signature versions all the way back to Java 7. Using the --release compiler option, backwards-compatible builds that are fully portable and don’t require actually installing earlier JDKs are straightforward. With that tool in your box, there’s really no need any longer for using the -source and -target options. Not only that, --release will also help to spot subtle compatibility issues related to overriding methods with co-variant return types, such as ByteBuffer.position().\n","id":154,"publicationdate":"Apr 26, 2021","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOne of the ultimate strengths of Java is its strong notion of backwards compatibility:\nJava applications and libraries built many years ago oftentimes run without problems on current JVMs,\nand the compiler of current JDKs can produce byte code, that is executable with earlier Java versions.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor instance, JDK 16 supports byte code levels going back as far as to Java 1.7;\nBut: \u003cem\u003ehic sunt dracones\u003c/em\u003e.\nThe emitted byte code level is just one part of the story.\nIt’s equally important to consider which APIs of the JDK are used by the compiled code,\nand whether they are available in the targeted Java runtime version.\nAs an example, let’s consider this simple \u0026#34;Hello World\u0026#34; program:\u003c/p\u003e\n\u003c/div\u003e","tags":["java","compatibility","internals"],"title":"The Anatomy of ct.sym — How javac Ensures Backwards Compatibility","uri":"https://www.morling.dev/blog/the-anatomy-of-ct-sym-how-javac-ensures-backwards-compatibility/"},{"content":" Table of Contents What’s SIMD Anyways? 
Vectorizing FizzBuzz Examining the Native Code Avoiding Scalar Processing of Tail Elements Wrap-Up Java 16 is around the corner, so there’s no better time than now for learning more about the features which the new version will bring. After exploring the support for Unix domain sockets a while ago, I’ve lately been really curious about the incubating Vector API, as defined by JEP 338, developed under the umbrella of Project Panama, which aims at \u0026#34;interconnecting JVM and native code\u0026#34;.\nVectors?!? Of course this is not about renewing the ancient Java collection types like java.util.Vector (\u0026lt;insert some pun about this here\u0026gt;), but rather about an API which lets Java developers take advantage of the vector calculation capabilities you can find in most CPUs these days. Now I’m by no means an expert on low-level programming leveraging specific CPU instructions, but exactly that’s why I hope to make the case with this post that the new Vector API makes these capabilities approachable to a wide audience of Java programmers.\nWhat’s SIMD Anyways? Before diving into a specific example, it’s worth pointing out why that API is so interesting, and what it could be used for. In a nutshell, CPU architectures like x86 or AArch64 provide extensions to their instruction sets which allow you to apply a single operation to multiple data items at once (SIMD — single instruction, multiple data). If a specific computing problem can be solved using an algorithm that lends itself to such parallelization, substantial performance improvements can be gained. Examples for such SIMD instruction set extensions include SSE and AVX for x64, and Neon of AArch64 (Arm).\nAs such, they complement other means of compute parallelization: scaling out across multiple machines which collaborate in a cluster, and multi-threaded programming. Unlike these though, vectorized computations are done within the scope of an individual method, e.g. 
operating on multiple elements of an array at once.\nUntil now, there has been no way for Java developers to directly work with such SIMD instructions. While you can use SIMD intrinsics in languages closer to the metal such as C/C++, no such option has existed in Java. Note this doesn’t mean Java doesn’t take advantage of SIMD at all: the JIT compiler can auto-vectorize code in specific situations, i.e. transforming code from a loop into vectorized code. Whether that’s possible or not isn’t easy to determine, though; small changes to a loop which the compiler was able to vectorize before may lead to scalar execution, resulting in a performance regression.\nJEP 338 aims to improve this situation: introducing a portable vector computation API, it allows Java developers to benefit from SIMD execution by means of explicitly vectorized algorithms. Unlike C/C++ style intrinsics, this API will be mapped automatically by the C2 JIT compiler to the corresponding instruction set of the underlying platform, falling back to scalar execution if the platform doesn’t provide the required capabilities. A pretty sweet deal, if you ask me!\nNow, why would you be interested in this? Doesn’t \u0026#34;vector calculation\u0026#34; sound an awful lot like mathematics-heavy, low-level algorithms, which you don’t tend to find that much in your typical Java enterprise applications? I’d say, yes and no. Indeed it may not be that beneficial for, say, a CRUD application copying some data from left to right. But there are many interesting applications in areas like image processing, AI, parsing (SIMD-based JSON parsing being a prominently discussed example), text processing, data type conversions, and many others. In that regard, I’d expect that JEP 338 will pave the way for using Java in many interesting use cases where it may not be the first choice today.\nVectorizing FizzBuzz To see how the Vector API can help with improving the performance of some calculation, let’s consider FizzBuzz. 
Originally, FizzBuzz is a game to help teach children division; interestingly, it also serves as an entry-level interview question for hiring software engineers in some places. In any case, it’s a nice example for exploring how some calculation can benefit from vectorization. The rules of FizzBuzz are simple:\nNumbers are counted and printed out: 1, 2, 3, …​\nIf a number is divisible by 3, instead of printing the number, print \u0026#34;Fizz\u0026#34;\nIf a number is divisible by 5, print \u0026#34;Buzz\u0026#34;\nIf a number is divisible by 3 and 5, print \u0026#34;FizzBuzz\u0026#34;\nAs the Vector API concerns itself with numeric values instead of strings, rather than \u0026#34;Fizz\u0026#34;, \u0026#34;Buzz\u0026#34;, and \u0026#34;FizzBuzz\u0026#34;, we’re going to emit -1, -2, and -3, respectively. The input of the program will be an array with the numbers from 1 …​ 256, the output an array with the FizzBuzz sequence:\n1, 2, -1, 4, -2, -1, 7, 8, -1, -2, 11, -1, 13, 14, -3, 16, ...
The task is easily solved using a plain for loop processing scalar values one by one:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 private static final int FIZZ = -1; private static final int BUZZ = -2; private static final int FIZZ_BUZZ = -3; public int[] scalarFizzBuzz(int[] values) { int[] result = new int[values.length]; for (int i = 0; i \u0026lt; values.length; i++) { int value = values[i]; if (value % 3 == 0) { if (value % 5 == 0) { (1) result[i] = FIZZ_BUZZ; } else { result[i] = FIZZ; (2) } } else if (value % 5 == 0) { result[i] = BUZZ; (3) } else { result[i] = value; (4) } } return result; } 1 The current number is divisible by 3 and 5: emit FIZZ_BUZZ (-3) 2 The current number is divisible by 3: emit FIZZ (-1) 3 The current number is divisible by 5: emit BUZZ (-2) 4 The current number is divisible by neither 3 nor 5: emit the number itself As a baseline, this implementation can be executed ~2.2M times per second in a simple JMH benchmark running on my Macbook Pro 2019, with a 2.6 GHz 6-Core Intel Core i7 CPU:\n1 2 Benchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s Now let’s see how this calculation could be vectorized and what performance improvements can be gained by doing so. When looking at the incubating Vector API, you may be overwhelmed at first by its large API surface. But it’s becoming manageable once you realize that all the types like IntVector, LongVector, etc. essentially expose the same set of methods, only specific for each of the supported data types (and indeed, as per the JavaDoc, all these classes were not hand-written by some poor soul, but generated, from some sort of parameterized template supposedly).\nAmongst the plethora of API methods, there is no modulo operation, though (which makes sense, as for instance there isn’t such instruction in any of the x86 SIMD extensions). So what could we do to solve the FizzBuzz task? 
After skimming through the API for some time, the method blend​(Vector\u0026lt;Integer\u0026gt; v, VectorMask\u0026lt;Integer\u0026gt; m) caught my attention:\nReplaces selected lanes of this vector with corresponding lanes from a second input vector under the control of a mask. […​]\nFor any lane set in the mask, the new lane value is taken from the second input vector, and replaces whatever value was in the that lane of this vector.\nFor any lane unset in the mask, the replacement is suppressed and this vector retains the original value stored in that lane.\nThis sounds pretty useful; The pattern of expected -1, -2, and -3 values repeats every 15 input values. So we can \u0026#34;pre-calculate\u0026#34; that pattern once and persist it in form of vectors and masks for the blend() method. While stepping through the input array, the right vector and mask are obtained based on the current position and are used with blend() in order to mark the values divisible by 3, 5, and 15 (another option could be min(Vector\u0026lt;Integer\u0026gt; v), but I decided against it, as we’d need some magic value for representing those numbers which should be emitted as-is).\nHere is a visualization of the approach, assuming a vector length of eight elements (\u0026#34;lanes\u0026#34;):\nSo let’s see how we can implement this using the Vector API. The mask and second input vector repeat every 120 elements (least common multiple of 8 and 15), so 15 masks and vectors need to be determined. 
They can be created like so:\npublic class FizzBuzz { private static final VectorSpecies\u0026lt;Integer\u0026gt; SPECIES = IntVector.SPECIES_256; (1) private final List\u0026lt;VectorMask\u0026lt;Integer\u0026gt;\u0026gt; resultMasks = new ArrayList\u0026lt;\u0026gt;(15); private final IntVector[] resultVectors = new IntVector[15]; public FizzBuzz() { List\u0026lt;VectorMask\u0026lt;Integer\u0026gt;\u0026gt; threes = Arrays.asList( (2) VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00100100), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b01001001), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b10010010) ); List\u0026lt;VectorMask\u0026lt;Integer\u0026gt;\u0026gt; fives = Arrays.asList( (3) VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00010000), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b01000010), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00001000), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b00100001), VectorMask.\u0026lt;Integer\u0026gt;fromLong(SPECIES, 0b10000100) ); for(int i = 0; i \u0026lt; 15; i++) { (4) VectorMask\u0026lt;Integer\u0026gt; threeMask = threes.get(i%3); VectorMask\u0026lt;Integer\u0026gt; fiveMask = fives.get(i%5); resultMasks.add(threeMask.or(fiveMask)); (5) resultVectors[i] = IntVector.zero(SPECIES) (6) .blend(FIZZ, threeMask) .blend(BUZZ, fiveMask) .blend(FIZZ_BUZZ, threeMask.and(fiveMask)); } } } 1 A vector species describes the combination of a vector element type (in this case Integer) and a vector shape (in this case 256 bit); i.e.
here we’re going to deal with vectors that hold eight 32-bit int values 2 Vector masks describing the numbers divisible by three (read the bit values from right to left) 3 Vector masks describing the numbers divisible by five 4 Let’s create the fifteen required result masks and vectors 5 A value in the output array should be set to another value if it’s divisible by three or five 6 Set the value to -1, -2, or -3, depending on whether it’s divisible by three, five, or fifteen, respectively; all other lanes remain 0 and are never copied to the output, as the result mask is unset for them With this infrastructure in place, we can implement the actual method for calculating the FizzBuzz values for an arbitrarily long input array:\npublic int[] simdFizzBuzz(int[] values) { int[] result = new int[values.length]; int i = 0; int upperBound = SPECIES.loopBound(values.length); (1) for (; i \u0026lt; upperBound; i += SPECIES.length()) { (2) IntVector chunk = IntVector.fromArray(SPECIES, values, i); (3) int maskIdx = (i/SPECIES.length())%15; (4) IntVector fizzBuzz = chunk.blend(resultVectors[maskIdx], resultMasks.get(maskIdx)); (5) fizzBuzz.intoArray(result, i); (6) } for (; i \u0026lt; values.length; i++) { (7) int value = values[i]; if (value % 3 == 0) { if (value % 5 == 0) { result[i] = FIZZ_BUZZ; } else { result[i] = FIZZ; } } else if (value % 5 == 0) { result[i] = BUZZ; } else { result[i] = value; } } return result; } 1 Determine the largest multiple of the species length that doesn’t exceed the array length; e.g. if the input array is 100 elements long, that’d be 96 in the case of vectors with eight elements each 2 Iterate through the input array in steps of the vector length 3 Load the current chunk of the input array into an IntVector 4 Obtain the index of the right result vector and mask 5 Determine the FizzBuzz numbers for the current chunk (i.e.
that’s the actual SIMD instruction, processing all eight elements of the current chunk at once) 6 Copy the result values at the right index into the result array 7 Process any remainder (e.g. the last four remaining elements in case of an input array with 100 elements) using the traditional scalar approach, as those values couldn’t fill up another vector instance To reiterate what’s happening here: instead of processing the values of the input array one by one, they are processed in chunks of eight elements each by means of the blend() vector operation, which can be mapped to an equivalent SIMD instruction of the CPU. In case the input array doesn’t have a length that’s a multiple of the vector length, the remainder is processed in the traditional scalar way. The resulting duplication of the logic seems a bit inelegant; we’ll discuss in a bit what can be done about that.\nFor now, let’s see whether our efforts pay off; i.e. is this vectorized approach actually faster than the basic scalar implementation? Turns out it is! Here are the numbers I get from JMH on my machine, showing throughput increasing by a factor of three:\nBenchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s Is there anything that could be further improved? I’m pretty sure, but as said I’m not an expert here, so I’ll leave it to smarter folks to point out more efficient implementations in the comments. One thing I figured is that the division and modulo operation for obtaining the current mask index isn’t ideal.
Keeping a separate loop variable that’s reset to 0 after reaching 15 proved to be quite a bit faster:\npublic int[] simdFizzBuzz(int[] values) { int[] result = new int[values.length]; int i = 0; int j = 0; int upperBound = SPECIES.loopBound(values.length); for (; i \u0026lt; upperBound; i += SPECIES.length()) { IntVector chunk = IntVector.fromArray(SPECIES, values, i); IntVector fizzBuzz = chunk.blend(resultVectors[j], resultMasks.get(j)); fizzBuzz.intoArray(result, i); j++; if (j == 15) { j = 0; } } // processing of remainder... } Benchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 8830433,250 ± 69955,161 ops/s This makes for another nice improvement, yielding 4x the throughput of the original scalar implementation. Now, to make this a true apples-to-apples comparison, a mask-based approach can also be applied to the purely scalar implementation, only that each value needs to be looked up individually:\nprivate int[] serialMask = new int[] {0, 0, -1, 0, -2, -1, 0, 0, -1, -2, 0, -1, 0, 0, -3}; public int[] serialFizzBuzzMasked(int[] values) { int[] result = new int[values.length]; int j = 0; for (int i = 0; i \u0026lt; values.length; i++) { int res = serialMask[j]; result[i] = res == 0 ?
values[i] : res; j++; if (j == 15) { j = 0; } } return result; } Indeed, this implementation is quite a bit better than the original one, but still the SIMD-based approach is more than twice as fast:\n1 2 3 4 5 Benchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.scalarFizzBuzzMasked 256 thrpt 5 4156751,424 ± 23668,949 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 8830433,250 ± 69955,161 ops/s Examining the Native Code This all is pretty cool, but can we trust that under the hood things actually happen the way we expect them to happen? In order to verify that, let’s take a look at the native assembly code that gets produced by the JIT compiler for this implementation. This requires you to run the JVM with the hsdis plug-in; see this post for instructions on how to build and install hsdis. Let’s create a simple main class which executes the method in question in a loop, so to make sure the method actually gets JIT-compiled:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 public class Main { public static int[] blackhole; public static void main(String[] args) { FizzBuzz fizzBuzz = new FizzBuzz(); var values = IntStream.range(1, 257).toArray(); for(int i = 0; i \u0026lt; 5_000_000; i++) { blackhole = fizzBuzz.simdFizzBuzz(values); } } } Run the program, enabling the output of the assembly, and piping its output into a log file:\n1 2 3 4 5 java -XX:+UnlockDiagnosticVMOptions \\ -XX:+PrintAssembly -XX:+LogCompilation \\ --add-modules=jdk.incubator.vector \\ --class-path target/classes \\ dev.morling.demos.simdfizzbuzz.Main \u0026gt; fizzbuzz.log Open the fizzbuzz.log file and look for the C2-compiled nmethod block of the simdFizzBuzz method. 
Somewhere within the method’s native code, you should find the vpblendvb instruction (output slightly adjusted for better readability):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ... =========================== C2-compiled nmethod ============================ --------------------------------- Assembly --------------------------------- Compiled method (c2) ... dev.morling.demos.simdfizzbuzz.FizzBuzz:: ↩ simdFizzBuzz (161 bytes) ... 0x000000011895e18d: vpmovsxbd %xmm7,%ymm7 ↩ ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0} ; - jdk.incubator.vector.IntVector::intoArray@42 (line 2962) ; - dev.morling.demos.simdfizzbuzz.FizzBuzz::simdFizzBuzz@76 (line 92) 0x000000011895e192: vpblendvb %ymm7,%ymm5,%ymm8,%ymm0 ↩ ;*invokestatic blend {reexecute=0 rethrow=0 return_oop=0} ; - jdk.incubator.vector.IntVector::blendTemplate@26 (line 1895) ; - jdk.incubator.vector.Int256Vector::blend@11 (line 376) ; - jdk.incubator.vector.Int256Vector::blend@3 (line 41) ; - dev.morling.demos.simdfizzbuzz.FizzBuzz::simdFizzBuzz@67 (line 91) ... vpblendvb is part of the x86 AVX2 instruction set and \u0026#34;conditionally copies byte elements from the source operand (second operand) to the destination operand (first operand) depending on mask bits defined in the implicit third register argument\u0026#34;, as such exactly corresponding to the blend() method in the JEP 338 API.\nOne detail not quite clear to me is why vpmovsxbd for copying the results into the output array (the intoArray() call) shows up before vpblendvb. If you happen to know the reason for this, I’d love to hear from you and learn about this.\nAvoiding Scalar Processing of Tail Elements Let’s get back to the scalar processing of the potential remainder of the input array. 
This feels a bit \u0026#34;un-DRY\u0026#34;, as it requires the algorithm to be implemented twice, once vectorized and once in a scalar way.\nThe Vector API recognizes the desire for avoiding this duplication and provides masked versions of all the required operations, so that during the last iteration no access beyond the array length will happen. Using this approach, the SIMD FizzBuzz method looks like this:\npublic int[] simdFizzBuzzMasked(int[] values) { int[] result = new int[values.length]; int j = 0; for (int i = 0; i \u0026lt; values.length; i += SPECIES.length()) { var mask = SPECIES.indexInRange(i, values.length); (1) var chunk = IntVector.fromArray(SPECIES, values, i, mask); (2) var fizzBuzz = chunk.blend(resultVectors[j], resultMasks.get(j)); fizzBuzz.intoArray(result, i, mask); (2) j++; if (j == 15) { j = 0; } } return result; } 1 Obtain a mask which, during the last iteration, will have the bits unset for those lanes whose index lies beyond the end of the array; in all earlier iterations, every lane is set 2 Perform the same operations as above, but using the mask to prevent any access beyond the array length The implementation looks quite a bit nicer than the version with the explicit scalar processing of the remainder portion. But the impact on throughput is significant, and the result is quite disappointing:\nBenchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2204774,792 ± 76581,374 ops/s FizzBuzzBenchmark.scalarFizzBuzzMasked 256 thrpt 5 4156751,424 ± 23668,949 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 6748723,261 ± 34725,507 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 8830433,250 ± 69955,161 ops/s FizzBuzzBenchmark.simdFizzBuzzMasked 256 thrpt 5 1204128,029 ± 5556,553 ops/s In its current form, this approach is even slower than the pure scalar implementation.
It remains to be seen whether and how performance gets improved here, as the Vector API matures. Ideally, the mask would have to be only applied during the very last iteration. This is something we either could do ourselves — re-introducing some special remainder handling, albeit less different from the core implementation than with the pure scalar approach discussed above — or perhaps even the compiler itself may be able to apply such transformation.\nOne important take-away from this is that a SIMD-based approach does not necessarily have to be faster than a scalar one. So every algorithmic adjustment should be validated with a corresponding benchmark, before drawing any conclusions. Speaking of which, I also ran the benchmark on that shiny new Mac Mini M1 (i.e. an AArch64-based machine) that found its way to my desk recently, and numbers are, mh, interesting:\n1 2 3 4 5 6 7 Benchmark (arrayLength) Mode Cnt Score Error Units FizzBuzzBenchmark.scalarFizzBuzz 256 thrpt 5 2717990,097 ± 4203,628 ops/s FizzBuzzBenchmark.scalarFizzBuzzMasked 256 thrpt 5 5750402,582 ± 2479,462 ops/s FizzBuzzBenchmark.simdFizzBuzz 256 thrpt 5 1297631,404 ± 15613,288 ops/s FizzBuzzBenchmark.simdFizzBuzzMasked 256 thrpt 5 374313,033 ± 2219,940 ops/s FizzBuzzBenchmark.simdFizzBuzzMasksInArray 256 thrpt 5 1316375,073 ± 1178,704 ops/s FizzBuzzBenchmark.simdFizzBuzzSeparateMaskIndex 256 thrpt 5 998979,324 ± 69997,361 ops/s The scalar implementation on the M1 out-performs the x86 MacBook Pro by quite a bit, but SIMD numbers are significantly lower.\nI haven’t checked the assembly code, but solely based on the figures, my guess is that the JEP 338 implementation in the current JDK 16 builds does not yet support AArch64, and the API falls back to scalar execution.\nHere it would be nice to have some method in the API which reveals whether SIMD support is provided by the current platform or not, as e.g. 
done by .NET with its Vector.IsHardwareAccelerated property.\nUpdate, March 9th: After asking about this on the panama-dev mailing list, Ningsheng Jian from Arm explained that the AArch64 NEON instruction set has a maximum hardware vector size of 128 bits; hence the Vector API is transparently falling back to the Java implementation in our case of using 256 bits. By passing the -XX:+PrintIntrinsics flag you can inspect which API calls get intrinsified (i.e. executed via corresponding hardware instructions) and which ones do not. When running the main class from above with this option, we get the relevant information (output slightly adjusted for better readability):\n@ 31 jdk.internal.vm.vector.VectorSupport::load (38 bytes) ↩ failed to inline (intrinsic) ... @ 26 jdk.internal.vm.vector.VectorSupport::blend (38 bytes) ↩ failed to inline (intrinsic) ... @ 42 jdk.internal.vm.vector.VectorSupport::store (38 bytes) ↩ failed to inline (intrinsic) ** not supported: arity=0 op=load vlen=8 etype=int ismask=no ** not supported: arity=2 op=blend vlen=8 etype=int ismask=useload ** not supported: arity=1 op=store vlen=8 etype=int ismask=no Fun fact: during the entire benchmark runtime of 10 min the fan of the Mac Mini was barely audible, if at all. Definitely a very exciting platform, and I’m looking forward to doing more Java experiments on it soon.\nWrap-Up Am I suggesting you should go and implement your next FizzBuzz using SIMD? Of course not, FizzBuzz just served as an example here for exploring how a well-known \u0026#34;problem\u0026#34; can be solved more efficiently via the new Java Vector API (at the cost of increased complexity in the code), even without being a seasoned systems programmer.
On the other hand, it may make an impression during your next job interview ;)\nIf you want to get started with your own experiments around the Vector API and SIMD, install a current JDK 16 RC (release candidate) build and grab the SIMD FizzBuzz example from this GitHub repo. A nice twist to explore would for instance be using ShortVector instead of IntVector (allowing to put 16 values into 256-bit vector), running the benchmark on machines with the AVX-512 extension (e.g. via the C5 instance type on AWS EC2), or both :)\nApart from the JEP document itself, there isn’t too much info out yet about the Vector API; a great starting point are the \u0026#34;vector\u0026#34; tagged posts on the blog of Richard Startin. Another inspirational resource is August Nagro’s project for vectorized UTF-8 validation based on a paper by John Keiser and Daniel Lemire. Kishor Kharbas and Paul Sandoz did a talk about the Vector API at CodeOne a while ago.\nTaking a step back, it’s hard to overstate the impact which the Vector API potentially will have on the Java platform. Providing SIMD capabilities in a rather easy-to-use, portable way, without having to rely on CPU instruction set specific intrinsics, may result in nothing less than a \u0026#34;democratization of SIMD\u0026#34;, making these powerful means of parallelizing computations available to a much larger developer audience.\nAlso the JDK class library itself may benefit from the Vector API; while JDK authors — unlike Java application developers — already have the JVM intrinsics mechanism at their disposal, the new API will \u0026#34;make prototyping easier, and broaden what might be economical to consider\u0026#34;, as pointed out by Claes Redestad.\nBut nothing in life is free, and code will have to be restructured or even re-written in order to benefit from this. Some problems lend themselves better than others to SIMD-style processing, and only time will tell in which areas the new API will be adopted. 
As said above, use cases like image processing and AI can benefit from SIMD a lot, due to the nature of the underlying calculations. Also specific data store operations can be sped up significantly using SIMD instructions; so my personal hope is that the Vector API can contribute to making Java an attractive choice for such applications, which previously were not considered a sweet spot for the Java platform.\nAs such, I can’t think of many recent Java API additions which may prove as influential as the Vector API.\n","id":155,"publicationdate":"Mar 8, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_whats_simd_anyways\"\u003eWhat’s SIMD Anyways?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_vectorizing_fizzbuzz\"\u003eVectorizing FizzBuzz\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_examining_the_native_code\"\u003eExamining the Native Code\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_avoiding_scalar_processing_of_tail_elements\"\u003eAvoiding Scalar Processing of Tail Elements\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up\"\u003eWrap-Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eJava 16 is around the corner, so there’s no better time than now for learning more about the features which the new version will bring.\nAfter exploring the support for \u003ca href=\"/blog/talking-to-postgres-through-java-16-unix-domain-socket-channels/\"\u003eUnix domain sockets\u003c/a\u003e a while ago,\nI’ve lately been really curious about the incubating Vector API,\nas defined by \u003ca href=\"https://openjdk.java.net/jeps/338\"\u003eJEP 338\u003c/a\u003e,\ndeveloped under the umbrella of \u003ca href=\"https://openjdk.java.net/projects/panama/\"\u003eProject 
Panama\u003c/a\u003e,\nwhich aims at \u0026#34;interconnecting JVM and native code\u0026#34;.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003e\u003cem\u003eVectors?!?\u003c/em\u003e\nOf course this is not about renewing the ancient Java collection types like \u003ccode\u003ejava.util.Vector\u003c/code\u003e\n(\u0026lt;insert some pun about this here\u0026gt;),\nbut rather about an API which lets Java developers take advantage of the vector calculation capabilities you can find in most CPUs these days.\nNow I’m by no means an expert on low-level programming leveraging specific CPU instructions,\nbut exactly that’s why I hope to make the case with this post that the new Vector API makes these capabilities approachable to a wide audience of Java programmers.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","performance","simd"],"title":"FizzBuzz – SIMD Style!","uri":"https://www.morling.dev/blog/fizzbuzz-simd-style/"},{"content":"","id":156,"publicationdate":"Mar 8, 2021","section":"tags","summary":"","tags":null,"title":"simd","uri":"https://www.morling.dev/tags/simd/"},{"content":"","id":157,"publicationdate":"Jan 31, 2021","section":"tags","summary":"","tags":null,"title":"networking","uri":"https://www.morling.dev/tags/networking/"},{"content":" Table of Contents The Postgres JDBC Driver The Vert.x Postgres Client Other Use Cases Reading a blog post about what’s coming up in JDK 16 recently, I learned that one of the new features is support for Unix domain sockets (JEP 380). Before Java 16, you’d have to resort to 3rd party libraries like jnr-unixsocket in order to use them. If you haven’t heard about Unix domain sockets before, they are \u0026#34;data communications [endpoints] for exchanging data between processes executing on the same host operating system\u0026#34;. 
Don’t be put off by the name btw.; Unix domain sockets are also supported by macOS and even Windows since version 10.\nDatabases such as Postgres or MySQL use them for offering an alternative to TCP/IP-based connections to client applications running on the same machine as the database. In such a scenario, Unix domain sockets are both more secure (no remote access to the database is exposed at all; file system permissions can be used for access control), and also more efficient than TCP/IP loopback connections.\nProxies for accessing cloud-based databases, such as the GCP Cloud SQL Proxy, are a common use case. Running on the same machine as a client application (e.g. in a sidecar container in case of Kubernetes deployments), they provide secure access to a managed database, for instance taking care of the SSL handling.\nMy curiosity was piqued and I was wondering what it’d take to make use of the new Java 16 Unix domain socket support for connecting to Postgres. It was your regular evening during the pandemic, without much to do, so I thought \u0026#34;Let’s give this a try\u0026#34;. To have a testing bed, I started with installing Postgres 13 on Fedora 33. Fedora might not always have the latest Postgres version packaged just yet, but following the official Postgres instructions it is straightforward to install newer versions.\nIn order to connect with user name and password via a Unix domain socket, one small adjustment to /var/lib/pgsql/13/data/pg_hba.conf is needed: the access method for the local connection type must be switched from the default value peer (which would try to authenticate using the operating system user name of the client process) to md5.\n... # TYPE DATABASE USER ADDRESS METHOD # \u0026#34;local\u0026#34; is for Unix domain socket connections only local all all md5 ...
Make sure to apply the changed configuration by restarting the database (systemctl restart postgresql-13), and things are ready to go.\nThe Postgres JDBC Driver The first thing I looked into was the Postgres JDBC driver. Since version 9.4-1208 (released in 2016) it allows you to configure custom socket factories, a feature which explicitly was added considering Unix domain sockets. The driver itself doesn’t come with a socket factory implementation that’d actually support Unix domain sockets, but a few external open-source implementations exist. Most notably junixsocket provides such socket factory.\nCustom socket factories must extend javax.net.SocketFactory, and their fully-qualified class name needs to be specified using the socketFactory driver parameter. So it should be easy to create SocketFactory implementation based on the new UnixDomainSocketAddress class, right?\n1 2 3 4 5 6 7 8 9 10 11 12 public class PostgresUnixDomainSocketFactory extends SocketFactory { @Override public Socket createSocket() throws IOException { var socket = new Socket(); socket.connect(UnixDomainSocketAddress.of( \u0026#34;/var/run/postgresql/.s.PGSQL.5432\u0026#34;)); (1) return socket; } // other create methods ... } 1 Create a Unix domain socket address for the default path of the socket on Fedora and related systems It compiles just fine; but it turns out not all socket addresses are equal, and java.net.Socket only connects to addresses of type InetSocketAddress (and the PG driver maintainers seem to sense some air of mystery around these \u0026#34;unusual\u0026#34; events, too):\n1 2 3 4 5 6 7 8 9 10 org.postgresql.util.PSQLException: Something unusual has occurred to cause the driver to fail. Please report this exception. at org.postgresql.Driver.connect(Driver.java:285) ... 
Caused by: java.lang.IllegalArgumentException: Unsupported address type at java.base/java.net.Socket.connect(Socket.java:629) at java.base/java.net.Socket.connect(Socket.java:595) at dev.morling.demos.PostgresUnixDomainSocketFactory.createSocket(PostgresUnixDomainSocketFactory.java:19) ... Now JEP 380 solely speaks about SocketChannel and stays silent about Socket; but perhaps obtaining a socket from a domain socket channel works?\n1 2 3 4 5 public Socket createSocket() throws IOException { var sc = SocketChannel.open(UnixDomainSocketAddress.of( \u0026#34;/var/run/postgresql/.s.PGSQL.5432\u0026#34;)); return sc.socket(); } Nope, no luck either:\n1 2 3 java.lang.UnsupportedOperationException: Not supported at java.base/sun.nio.ch.SocketChannelImpl.socket(SocketChannelImpl.java:226) at dev.morling.demos.PostgresUnixDomainSocketFactory.createSocket(PostgresUnixDomainSocketFactory.java:17) Indeed it looks like JEP 380 is concerning itself only with the non-blocking SocketChannel API, while users of the blocking Socket API do not get to benefit from it. It should be possible to create a custom Socket implementation based on the socket channel support of JEP 380, but that’s going beyond the scope of my little exploration.\nThe Vert.x Postgres Client If the Postgres JDBC driver doesn’t easily benefit from the JEP, what about other Java Postgres clients then? There are several non-blocking options, including the Vert.x Postgres client and R2DBC. The former is used to bring Reactive capabilities for Postgres into the Quarkus stack, too, so I turned my attention to it.\nNow the Vert.x Postgres Client already has support for Unix domain sockets, by means of adding the right Netty native transport dependency to your project. So purely from functionality perspective, there’s not that much to be gained here. But being able to use domain sockets also with the default NIO transport would still be nice, as it means one less dependency to take care of. 
So I dug a bit into the code of the Postgres client and Vert.x itself and figured out, that two things needed adjustment:\nThe NIO-based Transport class of Vert.x needs to learn about the fact that SocketChannel now also supports Unix domain sockets (currently, an exception is raised when trying to use them without a Netty native transport)\nNetty’s NioSocketChannel needs some small changes, as it tries to obtain a Socket from the underlying SocketChannel, which doesn’t work for domain sockets as we’ve seen above\nStep 1 was quickly done by creating a custom sub-class of the default Transport class. Two methods needed changes: channelFactory() for obtaining a factory for the actual Netty transport channel, and convert() for converting a Vert.x SocketAddress into a NIO one:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 public class UnixDomainTransport extends Transport { @Override public ChannelFactory\u0026lt;? extends Channel\u0026gt; channelFactory( boolean domainSocket) { if (!domainSocket) { (1) return super.channelFactory(domainSocket); } else { return () -\u0026gt; { try { var sc = SocketChannel.open(StandardProtocolFamily.UNIX); (2) return new UnixDomainSocketChannel(null, sc); } catch(Exception e) { throw new RuntimeException(e); } }; } } @Override public SocketAddress convert(io.vertx.core.net.SocketAddress address) { if (!address.isDomainSocket()) { (3) return super.convert(address); } else { return UnixDomainSocketAddress.of(address.path()); (4) } } } 1 Delegate creation of non domain socket factories to the regular NIO transport implementation 2 This channel factory returns instances of our own UnixDomainSocketChannel type (see below), passing a socket channel based on the new UNIX protocol family 3 Delegate conversion of non domain socket addresses to the regular NIO transport implementation 4 Create a UnixDomainSocketAddress for the socket’s file system path Now let’s take a look at the UnixDomainSocketChannel 
class. I was hoping to get away again with creating a sub-class of the NIO-based implementation, io.netty.channel.socket.nio.NioSocketChannel in this case. Unfortunately, though, the NioSocketChannel constructor invokes the taboo SocketChannel#socket() method. Of course that’d not be a problem when doing this change in Netty itself, but for my little exploration I ended up copying the class and doing the required adjustments in that copy. The adjustments were small:\nAvoiding the call to SocketChannel#socket() in the constructor:\npublic UnixDomainSocketChannel(Channel parent, SocketChannel socket) { super(parent, socket); config = new NioSocketChannelConfig(this, new Socket()); (1) } 1 Passing a dummy socket instead of socket.socket(); it shouldn’t be accessed in our case anyway A few methods call the Socket methods isInputShutdown() and isOutputShutdown(); it should be possible to bypass those by keeping track of the two shutdown flags ourselves\nAs I was creating the UnixDomainSocketChannel in my own namespace instead of Netty’s packages, a few references to the non-public method NioChannelOption#getOptions() needed commenting out, which again shouldn’t be relevant for the domain socket case\nYou can find the complete change in this commit. All in all, not exactly an artisanal piece of software engineering, but the little hack seemed good enough at least for taking a quick glimpse at the new domain socket support. Of course a real implementation could be done much more properly within the Netty project itself.\nSo it was time to give this thing a test ride. 
As we need to configure the custom Transport implementation, retrieval of a PgPool instance is a tad more verbose than usual:\nPgConnectOptions connectOptions = new PgConnectOptions() .setPort(5432) (1) .setHost(\u0026#34;/var/run/postgresql\u0026#34;) .setDatabase(\u0026#34;test_db\u0026#34;) .setUser(\u0026#34;test_user\u0026#34;) .setPassword(\u0026#34;topsecret!\u0026#34;); PoolOptions poolOptions = new PoolOptions() .setMaxSize(5); VertxFactory fv = new VertxFactory(); fv.transport(new UnixDomainTransport()); (2) Vertx v = fv.vertx(); PgPool client = PgPool.pool(v, connectOptions, poolOptions); (3) 1 The Vert.x Postgres client constructs the domain socket path from the given port and path (via setHost()); the full path will be /var/run/postgresql/.s.PGSQL.5432, just as above 2 Construct a Vertx instance with the custom transport class 3 Obtain a PgPool instance using the customized Vertx instance We can then use the client instance as usual, only that it now will connect to Postgres using the domain socket instead of via TCP/IP. All this solely using the default NIO-based transports, without the need for adding any Netty native dependency, such as its epoll-based transport.\nI haven’t done any real performance benchmark at this point; in a quick ad-hoc test of executing a trivial SELECT query on a primary key 200,000 times, I observed a latency of ~0.11 ms when using Unix domain sockets (with both netty-transport-native-epoll and JDK 16 Unix domain sockets) and ~0.13 ms when connecting via TCP/IP. So definitely a significant improvement which can be a deciding factor for low-latency use cases, though in comparison to other reports, the latency reduction of ~15% appears to be at the lower end of the spectrum.\nA more thorough performance evaluation should be done, for instance also examining the impact on garbage collection. 
And it goes without saying that you should only trust your own measurements, on your own hardware, based on your specific workloads, in order to decide whether you would benefit from domain sockets or not.\nOther Use Cases Database connectivity is just one of the use cases for domain sockets; highly performant local inter-process communication comes in handy for all kinds of use cases. One which I find particularly intriguing is the creation of modular applications based on a multi-process architecture.\nWhen thinking of classic Java Jakarta EE application servers for instance, you could envision a model where both the application server and each deployment are separate processes, communicating through domain sockets. This would have some interesting advantages, such as stricter isolation (so for instance an OutOfMemoryError in one deployed application won’t impact others) and re-deployments without any risk of classloader leaks, as the JVM of a deployment would be restarted. On the downside, you’d be facing a higher overall memory consumption (although that can at least partly be mitigated through class data sharing, which also works across JVM boundaries) and more costly (remote) method invocations between deployments.\nNow the application server model has fallen out of favour for various reasons, but such a multi-process design still is very interesting, for instance for building modular applications that should expose a single web endpoint, while being assembled from a set of processes which are developed and deployed by several independent teams. Another use case would be desktop applications that are made up of a set of processes for isolation purposes, as it’s e.g. done by most web browsers nowadays with distinct processes for separate tabs. JEP 380 should facilitate this model when creating Java applications, e.g. 
considering rich clients built with JavaFX.\nAnother really interesting feature of Unix domain sockets is the ability to transfer open file descriptors from one process to another. This allows for non-disruptive upgrades of server applications, without dropping any open TCP connections. This technique is used for instance by Envoy Proxy for applying configuration changes: upon a configuration change, a second Envoy instance with the new configuration is started up, takes over the active sockets from the previous instance and after some \u0026#34;draining period\u0026#34; triggers a shutdown of the old instance. This approach enables a truly immutable application design within Envoy itself, with all its advantages, without the need for in-process configuration reloads. I highly recommend reading the two posts linked above; they are super-interesting.\nUnfortunately, JEP 380 doesn’t seem to support file descriptor transfers. So for this kind of architecture, you’d still have to resort to the aforementioned junixsocket library, which explicitly lists file descriptor transfer support as one of its features. While you couldn’t take advantage of that using Java’s NIO API, it should be doable using alternative networking frameworks such as Netty. Probably a topic for another blog post on another one of those pandemic weekends ;)\nAnd that completes my small exploration of Java 16’s support for Unix domain sockets. If you want to do your own experiments of using them to connect to Postgres, make sure to install the latest JDK 16 EA build and grab the source code of my experimentation from this GitHub repo.\nIt’d be my hope that frameworks like Netty and Vert.x make use of this JDK feature fairly quickly, as only a small number of code changes are required, and users get to benefit from the higher performance of domain sockets without having to pull in any additional dependencies. 
In order to keep compatibility with Java versions prior to 16, multi-release JARs offer one avenue for integrating this feature.\n","id":158,"publicationdate":"Jan 31, 2021","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_postgres_jdbc_driver\"\u003eThe Postgres JDBC Driver\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_vert_x_postgres_client\"\u003eThe Vert.x Postgres Client\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_other_use_cases\"\u003eOther Use Cases\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eReading a blog post about what’s \u003ca href=\"https://www.loicmathieu.fr/wordpress/en/informatique/java-16-quoi-de-neuf/\"\u003ecoming up in JDK 16\u003c/a\u003e recently,\nI learned that one of the new features is support for Unix domain sockets (\u003ca href=\"https://openjdk.java.net/jeps/380\"\u003eJEP 380\u003c/a\u003e).\nBefore Java 16, you’d have to resort to 3rd party libraries like \u003ca href=\"https://github.com/jnr/jnr-unixsocket\"\u003ejnr-unixsocket\u003c/a\u003e in order to use them.\nIf you haven’t heard about \u003ca href=\"https://en.wikipedia.org/wiki/Unix_domain_socket\"\u003eUnix domain sockets\u003c/a\u003e before,\nthey are \u0026#34;data communications [endpoints] for exchanging data between processes executing on the same host operating system\u0026#34;.\nDon’t be put off by the name btw.;\nUnix domain sockets are also supported by macOS and even Windows since \u003ca href=\"https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/\"\u003eversion 10\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","postgres","networking"],"title":"Talking to Postgres Through Java 16 Unix-Domain Socket 
Channels","uri":"https://www.morling.dev/blog/talking-to-postgres-through-java-16-unix-domain-socket-channels/"},{"content":"","id":159,"publicationdate":"Dec 28, 2020","section":"tags","summary":"","tags":null,"title":"jlink","uri":"https://www.morling.dev/tags/jlink/"},{"content":" Table of Contents The Example The API Signature Check jlink Plug-in Summary Discussions around Java’s jlink tool typically center around savings in terms of (disk) space. Instead of shipping an entire JDK, a custom runtime image created with jlink contains only those JDK modules which an application actually requires, resulting in smaller distributables and container images.\nBut the contribution of jlink, as a part of the Java module system at large, to the development of Java applications is bigger than that: with the notion of link time it defines an optional complement to the well-known phases compile time and application run-time:\nLink time is an opportunity to do whole-world optimizations that are otherwise difficult at compile time or costly at run-time. An example would be to optimize a computation when all its inputs become constant (i.e., not unknown). A follow-up optimization would be to remove code that is no longer reachable.\nOther examples of link time optimizations are the removal of unnecessary classes and resources, the conversion of (XML-based) deployment descriptors into binary representations (which will be more efficiently processable at run-time), obfuscation, or the generation of annotation indexes. It would also be very interesting to create AppCDS archives for all the classes of a runtime image at link time and bake that archive into the image, resulting in faster application start-up, without any further manual configuration needed.\nWhile these use cases mostly relate to optimization of the runtime image in one way or another, the link time phase also is beneficial for the validation of applications. 
In the remainder of this post, I’d like to discuss how link time validation can be employed to ensure the consistency of API signatures within a modularized Java application. This helps to avoid potential NoSuchMethodErrors and related errors which would otherwise be raised by the JVM at application run-time, stemming from the usage of incompatible module versions, different from the ones used at compile time.\nThe Example To make things more tangible, let’s look at an application made up of two modules, customer and order. As always, the full source code is available online, for you to play with. The customer module defines a service interface with the following signature:\n1 2 3 public interface CustomerService { void incrementLoyaltyPoints(long customerId, long orderValue); } The CustomerService interface is part of the customer module’s public API and is invoked from within the order module like so:\n1 2 3 4 5 6 7 public class OrderService { public static void main(String[] args) { CustomerService customerService = ...; customerService.incrementLoyaltyPoints(123, 4999); } } Now let’s assume there’s a new version of the customer module; the signature of the incrementLoyaltyPoints() method got slightly changed for the sake of a more expressive and type-safe API:\n1 2 3 4 5 // record CustomerId(long id) {} public interface CustomerService { void incrementLoyaltyPoints(CustomerId customerId, long orderValue); } We now create a custom runtime image for the application. But we’re at the end of a tough week, so accidentally we add version 2 of the customer module and the unchanged order module:\n1 2 3 4 $ $JAVA_HOME/bin/jlink \\ --module-path=path/to/customer-2.0.0.jar:path/to/order-1.0.0.jar \\ --add-modules=com.example.order \\ --output=target/runtime-image Note that jlink won’t complain about this and create the runtime image. 
When executing the application via the image we’re in for a bad surprise, though (slightly modified for the sake of readability):\n$ ./target/runtime-image/bin/java com.example.order.OrderService Exception in thread \u0026#34;main\u0026#34; java.lang.NoSuchMethodError: \u0026#39;void c.e.customer.CustomerService.incrementLoyaltyPoints(long, long)\u0026#39; at com.example.order@1.0.0/c.e.order.OrderService.main(OrderService.java:5) This might be surprising at first; while jlink and the module system in general put a strong emphasis on reliability and e.g. flag referenced yet missing modules, mismatching API signatures like this are not raised as an issue and will only show up as an error at application run-time.\nIndeed, when I did a quick non-representative poll about this on Twitter, it turned out that more than 40% of participants were not aware of this pitfall.\nNeedless to say, it’d be much more desirable to spot this error early on, at link time, before shipping the affected application to production and suffering all the negative consequences associated with that.\nThe API Signature Check jlink Plug-in While jlink doesn’t detect this kind of API signature mismatch by itself, it comes with a plug-in API, which allows hooking into and enriching the linking process. By creating a custom jlink plug-in, we can implement the API signature check and fail the image creation process when detecting any invalid method references like the one above.\nUnfortunately though, the plug-in mechanism isn’t an official, supported API at this point. As a matter of fact, it is not even exported within jlink’s own module definition. With the right set of javac/java flags and the help of a small Java agent, it is possible though to compile custom plug-ins and have them picked up by jlink. 
To learn more about the required sorcery, check out this blog post which I wrote a while ago over on the Hibernate team blog.\nLet’s start with creating the basic structure of the plug-in implementation class:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 import jdk.tools.jlink.plugin.Plugin; public class SignatureCheckPlugin implements Plugin { @Override public String getName() { (1) return \u0026#34;check-signatures\u0026#34;; } @Override public Category getType() { (2) return Category.VERIFIER; } @Override public String getDescription() { (3) return \u0026#34;Checks the API references amongst the modules of \u0026#34; + \u0026#34;an application for consistency\u0026#34;; } } 1 Returns the name for the option to enable this plug-in when running the jlink command 2 Returns the category of this plug-in, which impacts the ordering within the plug-in stack (other types include TRANSFORMER, FILTER, etc.) 3 A description which will be shown when listing all plug-ins There are a few more optional methods which we could implement, e.g. if the plug-in had any parameters for controlling its behaviors, or if we wanted it to be enabled by default. But as that’s not the case for the plug-in at hand, the only method that’s missing is transform(), which does the actual heavy-lifting of the plug-in’s work.\nNow implementing the complete rule set of the JVM applied when loading and linking classes at run-time would be a somewhat daunting task. As I am lazy and this is just meant to be a basic PoC, I’m going to limit myself to the detection of mismatching signatures of invoked methods, as shown in the customer/order example above. 
The reason being that this task can be elegantly delegated to an existing tool (I told you, I’m lazy): Animal Sniffer.\nWhile typically used as build tool plug-in for verifying that classes built on a newer JDK version can also be executed with older Java versions (and as such mostly obsoleted by the JDK’s --release option), Animal Sniffer also provides an API for creating and verifying custom signatures. This comes in handy for our jlink plug-in implementation.\nThe general design of the transform() mechanism is that of a classic input-process-output pipeline. The method receives a ResourcePool object, which allows to traverse and examine the set of resources going into the image, such as class files, resource bundles, or manifests. A new resource pool is to be returned, which could contain exactly the same resources as the original one (as in our case); but of course it could also contain less or newly generated resources, or modified ones:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 @Override public ResourcePool transform(ResourcePool in, ResourcePoolBuilder out) { try { byte[] signature = createSignature(in); (1) boolean broken = checkSignature(in, signature); (2) if (broken) { (3) throw new PluginException(\u0026#34;There are API signature \u0026#34; + \u0026#34;inconsistencies, please check the logs\u0026#34;); } } catch(PluginException e) { throw e; } catch(Exception e) { throw new RuntimeException(e); } in.transformAndCopy(e -\u0026gt; e, out); (4) return out.build(); } /** * Creates a signature for all classes in the resource pool. 
*/ private byte[] createSignature(ResourcePool in) throws IOException { ByteArrayOutputStream signatureStream = new ByteArrayOutputStream(); var builder = new StreamSignatureBuilder(signatureStream, new PrintWriterLogger(System.out)); in.entries() (5) .filter(e -\u0026gt; isClassFile(e) \u0026amp;\u0026amp; !isModuleInfo(e)) .forEach(e -\u0026gt; builder.process(e.path(), e.content())); builder.close(); return signatureStream.toByteArray(); } /** * Checks all classes against the given signature. */ private boolean checkSignature(ResourcePool in, byte[] signature) throws IOException { var checker = new StreamSignatureChecker( new ByteArrayInputStream(signature), Collections.\u0026lt;String\u0026gt;emptySet(), new PrintWriterLogger(System.out) ); checker.setSourcePath(Collections.\u0026lt;File\u0026gt;emptyList()); in.entries() (6) .filter(e -\u0026gt; isClassFile(e) \u0026amp;\u0026amp; !isModuleInfo(e) \u0026amp;\u0026amp; !isJdkClass(e)) .forEach(e -\u0026gt; checker.process(e.path(), e.content())); return checker.isSignatureBroken(); } private boolean isJdkClass(ResourcePoolEntry e) { return e.path().startsWith(\u0026#34;/java.\u0026#34;) || e.path().startsWith(\u0026#34;/javax.\u0026#34;) || e.path().startsWith(\u0026#34;/jdk.\u0026#34;); } private boolean isModuleInfo(ResourcePoolEntry e) { return e.path().endsWith(\u0026#34;module-info.class\u0026#34;); } private boolean isClassFile(ResourcePoolEntry e) { return e.path().endsWith(\u0026#34;class\u0026#34;); } 1 Create an Animal Sniffer signature for all the APIs in modules added to the runtime image 2 Verify all classes against that signature 3 If there’s a signature violation, fail the jlink execution by raising a PluginException 4 All classes are passed on as-is 5 Feed each class to Animal Sniffer’s signature builder for creating the signature; non-class resources and module descriptors are ignored 6 Verify each class against the signature; JDK classes can be skipped here, we assume there’s no 
inconsistencies amongst the JDK’s own modules The input resource pool is traversed twice: first to create an Animal Sniffer signature of all the APIs, then a second time to validate the image’s classes against that signature.\nLet me re-iterate that this is a very basic, PoC-level implementation of link time API signature validation. A number of incompatibilities would not be detected by this, e.g. adding an abstract method to a superclass or interface, modifying the number and specification of the type parameters of a class, and others. The implementation could also be further optimized by validating only cross-module references. Still, this implementation is good enough to demonstrate the general principle and advantages of link time API consistency validation.\nWith the implementation in place (see the README in the PoC’s GitHub repository for details on building the project), it’s time to invoke jlink again, this time activating the new plug-in. Now, as mentioned before, the jlink plug-in API isn’t publicly exposed as of Java 15 (the current Java version at the time of writing), which means we need to jump through some hoops in order to enable the plug-in and expose it to the jlink tool itself.\nIn a nutshell, a Java agent can be used to bend the module configurations as needed. Details can be found in the aforementioned post on the Hibernate blog (the agent’s source code is here). The required boilerplate can be nicely encapsulated within a shell function:\nfunction myjlink { \\ $JAVA_HOME/bin/jlink \\ -J-javaagent:signature-check-jlink-plugin-registration-agent-1.0-SNAPSHOT.jar \\ -J--module-path=signature-check-jlink-plugin-1.0-SNAPSHOT.jar:path/to/animal-sniffer-1.19.jar:path/to/asm-9.0.jar \\ -J--add-modules=dev.morling.jlink.plugins.sigcheck \u0026#34;$@\u0026#34; \\ } All the -J options are VM options passed through to the jlink tool, in order to register the required Java agent and add the plug-in module to jlink’s module path. 
Instead of directly calling the jlink binary itself, this wrapper function can now be used to invoke jlink with the custom plug-in. Let’s first take a look at the description in the plug-in list:\n$ myjlink --list-plugins ... Plugin Name: check-signatures Plugin Class: dev.morling.jlink.plugins.sigcheck.SignatureCheckPlugin Plugin Module: dev.morling.jlink.plugins.sigcheck Category: VERIFIER Functional state: Functional. Option: --check-signatures Description: Checks the API references amongst the modules of an application for consistency ... Now let’s try and create the runtime image with the mismatching customer and order modules again:\nmyjlink --module-path=path/to/customer-2.0.0.jar:path/to/order-1.0.0.jar \\ --add-modules=com.example.order \\ --output=target/runtime-image \\ --check-signatures [INFO] Wrote signatures for 6156 classes. [ERROR] /com.example.order/com/example/order/OrderService.class:5: Undefined reference: void com.example.customer.CustomerService .incrementLoyaltyPoints(long, long) Error: Signature violations, check the logs Et voilà! The mismatching signature of the incrementLoyaltyPoints() method was spotted and the creation of the runtime image failed. Now we could take action, examine our module path and make sure to feed correctly matching versions of the customer and order modules to the image creation process.\nSummary The link time phase (added to the Java platform as part of the module system in version 9, and positioned between the well-known compile time and run-time phases) opens up very interesting opportunities to apply whole-world optimizations and validations to Java applications. One example is checking the API definitions and usages across the different modules of a Java application for consistency. 
By means of a custom plug-in for the jlink tool, this validation can happen at link time, so that any mismatches are detected when assembling an application and this kind of error can be fixed early on, before it hits an integration test or even a production environment.\nThis is particularly interesting when using the Java module system for building large, modular monolithic applications. Unless you’re working with custom module layers (e.g. via the Layrry launcher), only one version of a given module may be present on the module path. If multiple modules of an application depend on different versions of a transitive dependency, link time API signature validation can help to identify inconsistencies caused by converging to a single version of that dependency.\nThe approach can also help save build time; when only modifying a single module of a larger modularized application, instead of re-compiling everything from scratch, you could just re-build that single module. Then, when re-creating the runtime image using this module and the other existing ones, you would be sure that all module API signature definitions and usages still match.\nThe one caveat is the fact that the jlink plug-in API isn’t a public, supported API of the JDK yet. I hope this is going to change some time soon, though. For instance, the next planned LTS release, Java 17, would be a great opportunity for officially adding the ability to build and use custom jlink plug-ins. 
This would open the road towards more widespread use of link time optimizations and validations, beyond those provided by the JDK and the jlink tool itself.\nUntil then, you can explore this area starting from the source code of the signature check plug-in and its accompanying Java agent for enabling its usage with jlink.\n","id":160,"publicationdate":"Dec 28, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_example\"\u003eThe Example\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_api_signature_check_jlink_plug_in\"\u003eThe API Signature Check jlink Plug-in\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary\"\u003eSummary\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eDiscussions around Java’s \u003ca href=\"https://openjdk.java.net/jeps/282\"\u003ejlink\u003c/a\u003e tool typically center around savings in terms of (disk) space.\nInstead of shipping an entire JDK,\na custom runtime image created with jlink contains only those JDK modules which an application actually requires,\nresulting in smaller distributables and \u003ca href=\"blog/smaller-faster-starting-container-images-with-jlink-and-appcds/\"\u003econtainer images\u003c/a\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eBut the contribution of jlink, as a part of the Java module system at large, to the development of Java applications is bigger than that:\nwith the notion of \u003cem\u003elink time\u003c/em\u003e it defines an optional complement to the well-known phases \u003cem\u003ecompile time\u003c/em\u003e and application \u003cem\u003erun-time\u003c/em\u003e:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"quoteblock\"\u003e\n\u003cblockquote\u003e\n\u003cdiv 
class=\"paragraph\"\u003e\n\u003cp\u003eLink time is an opportunity to do whole-world optimizations that are otherwise difficult at compile time or costly at run-time. An example would be to optimize a computation when all its inputs become constant (i.e., not unknown). A follow-up optimization would be to remove code that is no longer reachable.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/blockquote\u003e\n\u003c/div\u003e","tags":["java","jlink","validation"],"title":"jlink's Missing Link: API Signature Validation","uri":"https://www.morling.dev/blog/jlinks-missing-link-api-signature-validation/"},{"content":"","id":161,"publicationdate":"Dec 28, 2020","section":"tags","summary":"","tags":null,"title":"validation","uri":"https://www.morling.dev/tags/validation/"},{"content":" Table of Contents How to Prevent This Situation? The other day, a user in the Debezium community reported an interesting issue; They were using Debezium with Java 1.8 and got an odd NoSuchMethodError:\n1 2 3 4 5 java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer; at io.debezium.connector.postgresql.connection.Lsn.valueOf(Lsn.java:86) at io.debezium.connector.postgresql.connection.PostgresConnection.tryParseLsn(PostgresConnection.java:270) at io.debezium.connector.postgresql.connection.PostgresConnection.parseConfirmedFlushLsn(PostgresConnection.java:235) ... A NoSuchMethodError typically is an indication for a mismatch of the Java version used to compile some code, and the Java version used for running it: some method existed at compile time, but it’s not available at runtime.\nNow indeed we use JDK 11 for building the Debezium code base, while targeting Java 1.8 as the minimal required version at runtime. But there is a method position(int) defined on the Buffer class (which ByteBuffer extends) also in Java 1.8. And as a matter of fact, the Debezium code compiles just fine with that version, too. 
So why would the user run into this error then?\nTo understand what’s going on, let’s create a very simple class for reproducing the issue:\n1 2 3 4 5 6 7 8 9 import java.nio.ByteBuffer; public class ByteBufferTest { public static void main(String... args) { ByteBuffer buffer = ByteBuffer.wrap(new byte[] { 1, 2, 3 }); buffer.position(1); (1) System.out.println(buffer.get()); } } 1 Why does this not work with Java 1.8 when compiled with JDK 9 or newer? Compile this with a current JDK:\n1 $ javac --source 1.8 --target 1.8 ByteBufferTest.java And sure enough, the NoSuchMethodError shows up when running this with Java 1.8:\n1 2 3 4 $ java ByteBufferTest Exception in thread \u0026#34;main\u0026#34; java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer; at ByteBufferTest.main(ByteBufferTest.java:6) Whereas, when using 1.8 to compile and run this code, it just works fine. Now, if we take a closer look at the error message again, the missing method is defined as ByteBuffer position(int). I.e. for an invoked method like position(), not only its name, parameter type(s), and the name of the declaring class are part of the byte code for that invocation, but also the method’s return type. A look at the byte code of the class using javap confirms that:\n1 2 3 4 5 6 7 8 9 10 11 12 13 $ javap -p -c -s -v -l -constants ByteBufferTest ... public static void main(java.lang.String...); descriptor: ([Ljava/lang/String;)V flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS Code: stack=4, locals=2, args_size=1 ... 19: aload_1 20: iconst_1 21: invokevirtual #13 // Method java/nio/ByteBuffer.position:(I)Ljava/nio/ByteBuffer; ... And this points us to the right direction; In Java 1.8, indeed there is no such method, only the position() method on Buffer, which, of course, returns Buffer and not ByteBuffer. Whereas since Java 9, this method (and several others) is overridden in ByteBuffer — leveraging Java’s support for co-variant return types — to return ByteBuffer. 
The Java compiler will now select that method, ByteBuffer position(int), and record that as the invoked method signature in the byte code of the caller class.\nThis is per se a nice usability improvement, as it allows invoking further ByteBuffer methods on the return value, instead of just those methods declared by Buffer. But as we’ve seen, it comes with this little surprise when compiling code on JDK 9 or newer, while trying to keep compatibility with older Java versions. And as it turns out, we were not the first or only ones to encounter this issue. Quite a few open-source projects ran into this, e.g. Eclipse Jetty, Apache Pulsar, Eclipse Vert.x, Apache Thrift, the Yugabyte DB client, and a few others.\nHow to Prevent This Situation? So what can you do in order to prevent this issue from happening? One first idea could be to enforce selection of the right method by casting to Buffer:\n((java.nio.Buffer) buffer).position(1); But while this produces the desired byte code indeed, it isn’t exactly the best way for doing so. You’d have to remember to do so for every invocation of any of the affected ByteBuffer methods, and the seemingly unneeded cast might be an easy target for some \u0026#34;clean-up\u0026#34; by unsuspecting co-workers on your team.\nLuckily, there’s a much better way, and this is to rely on the Java compiler’s --release parameter, which was introduced via JEP 247 (\u0026#34;Compile for Older Platform Versions\u0026#34;), added to the platform also in JDK 9. In contrast to the more widely known pair of --source and --target, the --release switch will ensure that only byte code is produced which actually will be usable with the specified Java version. 
For this purpose, the JDK contains the signature data for all supported Java versions (stored in the $JAVA_HOME/lib/ct.sym file).\nSo all that’s needed really is compiling the code with --release=8:\n1 $ javac --release=8 ByteBufferTest.java Examine the bytecode using javap again, and now the expected signature is in place:\n1 21: invokevirtual #13 // Method java/nio/ByteBuffer.position:(I)Ljava/nio/Buffer; When run on Java 1.8, this virtual method call will be resolved to Buffer#position(int) at runtime, whereas on Java 9 and later, it’d resolve to the bridge method inserted by the compiler into the class file of ByteBuffer due to the co-variant return type, which itself calls the overriding ByteBuffer#position(int) method.\nNow let’s see what happens if we actually try to make use of the overriding method version in ByteBuffer by re-assigning the result:\n1 2 3 4 ... ByteBuffer buffer = ByteBuffer.wrap(new byte[] { 1, 2, 3 }); buffer = buffer.position(1); ... Et voilà, this gets rejected by the compiler when targeting Java 1.8, as the return type of the JDK 1.8 method Buffer#position(int) cannot be assigned to ByteBuffer:\n1 2 3 4 $ javac --release=8 ByteBufferTest.java ByteBufferTest.java:6: error: incompatible types: Buffer cannot be converted to ByteBuffer buffer = buffer.position(1); To cut a long story short, we — and many other projects — should have used the --release switch instead of --source/--target, and the user would not have had that issue. In order to achieve the same in your Maven-based build, just specify the following property in your pom.xml:\n1 2 3 4 5 ... \u0026lt;properties\u0026gt; \u0026lt;maven.compiler.release\u0026gt;8\u0026lt;/maven.compiler.release\u0026gt; \u0026lt;/properties\u0026gt; ... 
Note that theoretically you could achieve the same effect also when using --source and --target; by means of the --boot-class-path option, you could advise the compiler to use a specific set of bootstrap class files instead of those from the JDK used for compilation. But that’d be quite a bit more cumbersome as it requires you to actually provide the classes of the targeted Java version, whereas --release will make use of the signature data coming with the currently used JDK itself.\n","id":162,"publicationdate":"Dec 21, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_how_to_prevent_this_situation\"\u003eHow to Prevent This Situation?\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe other day, a user in the \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e community reported an interesting issue;\nThey were using Debezium with Java 1.8 and got an odd \u003ccode\u003eNoSuchMethodError\u003c/code\u003e:\u003c/p\u003e\n\u003c/div\u003e","tags":["java","compatibility","troubleshooting"],"title":"ByteBuffer and the Dreaded NoSuchMethodError","uri":"https://www.morling.dev/blog/bytebuffer-and-the-dreaded-nosuchmethoderror/"},{"content":" Table of Contents Getting Started With JfrUnit Case Study 1: Spotting Increased Memory Allocation Case Study 2: Identifying Increased I/O With the Database Discussion Summary and Outlook Functional unit and integration tests are a standard tool of any software development organization, helping not only to ensure correctness of newly implemented code, but also to identify regressions — bugs in existing functionality introduced by a code change. 
The situation looks different though when it comes to regressions related to non-functional requirements, in particular performance-related ones: How to detect increased response times in a web application? How to identify decreased throughput?\nThese aspects are typically hard to test in an automated and reliable way in the development workflow, as they are dependent on the underlying hardware and the workload of an application. For instance, assertions on the duration of specific requests of a web application typically cannot be run in a meaningful way on a developer laptop, which differs from the actual production hardware (ironically, nowadays both are possible, the developer laptop being less or more powerful than the actual production environment). When run in a virtualized or containerized CI environment, such tests are prone to severe measurement distortions due to concurrent load of other applications and jobs.\nThis post introduces the JfrUnit open-source project, which offers a fresh angle to this topic by supporting assertions not on metrics like latency/throughput themselves, but on indirect metrics which may impact those. JfrUnit allows you to define expected values for metrics such as memory allocation, database I/O, or number of executed SQL statements, for a given workload, and asserts the actual metric values (which are obtained from JDK Flight Recorder events) against these expected values. Starting off from a defined baseline, future failures of such assertions are an indicator of potential performance regressions in an application, as a code change may have introduced higher GC pressure, the retrieval of unnecessary data from the database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.\nJfrUnit provides means of identifying and analyzing such anomalies in a reliable, environment-independent way in standard JUnit tests, before they manifest as actual performance regressions in production. 
Test results are independent of wall clock time and thus provide actionable information, even when not testing with production-like hardware and data volumes.\nThis post is a bit longer than usual (I didn’t have the time to write a shorter one ;), but it’s broken down into several sections, so you can pause and continue later on with fresh energy:\nGetting Started With JfrUnit\nCase Study 1: Spotting Increased Memory Allocation\nCase Study 2: Identifying Increased I/O With the Database\nDiscussion\nSummary and Outlook\nGetting Started With JfrUnit JfrUnit is an extension for JUnit 5 which integrates Flight Recorder into unit tests; it makes it straightforward to initiate a JFR recording for a given set of event types, execute some test routine, and then assert the JFR events which should have been produced.\nHere is a basic example of a JfrUnit test:\n@JfrEventTest (1) public class JfrUnitTest { public JfrEvents jfrEvents = new JfrEvents(); @Test @EnableEvent(\u0026#34;jdk.GarbageCollection\u0026#34;) (2) @EnableEvent(\u0026#34;jdk.ThreadSleep\u0026#34;) public void shouldHaveGcAndSleepEvents() throws Exception { System.gc(); Thread.sleep(1000); jfrEvents.awaitEvents(); (3) ExpectedEvent event = event(\u0026#34;jdk.GarbageCollection\u0026#34;); (4) assertThat(jfrEvents).contains(event); event = event(\u0026#34;jdk.GarbageCollection\u0026#34;) (4) .with(\u0026#34;cause\u0026#34;, \u0026#34;System.gc()\u0026#34;); assertThat(jfrEvents).contains(event); event = event(\u0026#34;jdk.ThreadSleep\u0026#34;). 
with(\u0026#34;time\u0026#34;, Duration.ofSeconds(1)); assertThat(jfrEvents).contains(event); assertThat(jfrEvents.ofType(\u0026#34;jdk.GarbageCollection\u0026#34;)).hasSize(1); (5) } } 1 @JfrEventTest marks this as a JfrUnit test, activating its extension 2 All JFR event types to be recorded must be enabled via @EnableEvent 3 After running the test logic, awaitEvents() must be invoked as a synchronization barrier, making sure all previously produced events have been received 4 Using the JfrEventsAssert#event() method, an ExpectedEvent instance can be created, optionally specifying one or more expected attribute values, which is then asserted via JfrEventsAssert#assertThat() 5 JfrEvents#ofType() allows you to filter on specific event types, enabling arbitrary assertions against the returned stream of RecordedEvents By means of a custom assertThat() matcher method for AssertJ, JfrUnit allows you to validate that specific JFR events are raised during a test. Events to be matched are described via their event type name, and optionally one or more event attribute values. As we’ll see in a bit, JfrUnit also integrates nicely with the Java Stream API, allowing you to filter and aggregate recorded event attribute values and match them against expected values.\nJfrUnit persists a JFR recording file for each test method, which you can examine after a test failure, for instance using JDK Mission Control. To learn more about JfrUnit and its capabilities, take a look at the project’s README. The project is in an early proof-of-concept stage at the moment, so changes to its APIs and semantics are likely.\nNow that you’ve taken the JfrUnit quick tour, let’s put that knowledge into practice. Our example project will be the Todo Manager Quarkus application you may already be familiar with from my earlier post about custom JFR events. 
We’re going to discuss two examples of using JfrUnit to identify potential performance regressions.\nCase Study 1: Spotting Increased Memory Allocation First, let’s explore how to identify increased memory allocation rates. Typically, it’s mostly library and middleware authors who are interested in this. For a library such as Hibernate ORM it can make a huge difference whether a method that is invoked many times on a hot code path allocates a few objects more or less. Fewer object allocations mean less work for the garbage collector, which in turn means those precious CPU cores of your machine can spend more cycles processing your actual business logic.\nBut also for application developers it can be beneficial to keep an eye on, and systematically track, object allocations, as regressions there lead to increased GC pressure, and in turn eventually to higher latencies and reduced throughput.\nThe key to tracking object allocations with JFR are the jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB events, which are emitted, respectively, when\nan object allocation triggered the creation of a new thread-local allocation buffer (TLAB)\nan object got allocated outside of the thread’s TLAB\nThread-local allocation buffers (TLAB) New object instances are primarily created on the heap via thread-local allocation buffers. A TLAB is a pre-allocated memory block that’s exclusively used by a single thread. Since this space is exclusively owned by the thread, creating new objects within a TLAB can happen without costly synchronization with other threads. Once a thread’s current TLAB capacity is about to be exceeded by a new object allocation, a new TLAB will be allocated for that thread. 
In addition, large objects will typically need to be directly allocated outside of the more efficient TLAB space.\nTo learn more about TLAB allocation, refer to part #4 of Aleksey Shipilёv’s \u0026#34;JVM Anatomy Quark\u0026#34; blog series.\nNote these events don’t allow for tracking of each individual object allocation, as multiple objects will be allocated within a TLAB before a new one is required, and thus the jdk.ObjectAllocationInNewTLAB event will be emitted. But as that event exposes the size of the new TLAB, we can keep track of the overall amount of memory that’s allocated while the application is running.\nIn that sense, jdk.ObjectAllocationInNewTLAB represents a sampling of object allocations, which means we need to collect a reasonable number of events to identify those locations in the program which are the sources of high object allocation and thus frequently trigger new TLAB creations.\nSo let’s start and work on a test for spotting regressions in terms of object allocations of one of the Todo Manager app’s API methods, GET /todo/{id}. To identify a baseline of the allocation to be expected, we first invoke that method in a loop and print out the actual allocation values. This should happen in intervals, e.g. 
every 10,000 invocations, so as to average out numbers from individual API calls.\n@Test @EnableEvent(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) (1) @EnableEvent(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;) public void retrieveTodoBaseline() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder() .build(); for (int i = 1; i\u0026lt;= 100_000; i++) { executeRequest(r, client); if (i % 10_000 == 0) { jfrEvents.awaitEvents(); (2) long sum = jfrEvents.filter(this::isObjectAllocationEvent) (3) .filter(this::isRelevantThread) .mapToLong(this::getAllocationSize) .sum(); System.out.printf( Locale.ENGLISH, \u0026#34;Requests executed: %s, memory allocated: (%,d bytes/request)%n\u0026#34;, i, sum/10_000 ); jfrEvents.reset(); (4) } } } private void executeRequest(Random r, HttpClient client) throws Exception { int id = r.nextInt(20) + 1; HttpRequest request = HttpRequest.newBuilder() .uri(new URI(\u0026#34;http://localhost:8081/todo/\u0026#34; + id)) .headers(\u0026#34;Content-Type\u0026#34;, \u0026#34;application/json\u0026#34;) .GET() .build(); HttpResponse\u0026lt;String\u0026gt; response = client .send(request, HttpResponse.BodyHandlers.ofString()); assertThat(response.statusCode()).isEqualTo(200); } private boolean isObjectAllocationEvent(RecordedEvent re) { (5) String name = re.getEventType().getName(); return name.equals(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) || name.equals(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;); } private long getAllocationSize(RecordedEvent re) { (6) return re.getEventType().getName() .equals(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) ? 
re.getLong(\u0026#34;tlabSize\u0026#34;) : re.getLong(\u0026#34;allocationSize\u0026#34;); } private boolean isRelevantThread(RecordedEvent re) { (7) return re.getThread().getJavaName().startsWith(\u0026#34;vert.x-eventloop\u0026#34;) || re.getThread().getJavaName().startsWith(\u0026#34;executor-thread\u0026#34;); } } 1 Enable the jdk.ObjectAllocationInNewTLAB and jdk.ObjectAllocationOutsideTLAB JFR events 2 Every 10,000 events, wait for all the JFR events produced so far 3 Calculate the total allocation size, by summing up the TLAB allocations of all relevant threads 4 Reset the event stream for the next iteration 5 Is this a TLAB event? 6 Get the new TLAB size in case of an in TLAB allocation, otherwise the allocated object size out of TLAB 7 We’re only interested in the web application’s own threads, in particular ignoring the main thread which runs the HTTP client of the test Note that unlike in the initial example showing the usage of JfrUnit, here we’re not using the simple contains() AssertJ matcher, but rather calculate some custom value — the overall object allocation in bytes — by means of filtering and aggregating the relevant JFR events.\nHere are the numbers I got from running 100,000 invocations:\n1 2 3 4 5 6 7 8 9 10 Requests executed: 10000, memory allocated: 34096 bytes/request Requests executed: 20000, memory allocated: 31768 bytes/request Requests executed: 30000, memory allocated: 31473 bytes/request Requests executed: 40000, memory allocated: 31462 bytes/request Requests executed: 50000, memory allocated: 31547 bytes/request Requests executed: 60000, memory allocated: 31545 bytes/request Requests executed: 70000, memory allocated: 31537 bytes/request Requests executed: 80000, memory allocated: 31624 bytes/request Requests executed: 90000, memory allocated: 31703 bytes/request Requests executed: 100000, memory allocated: 31682 bytes/request As we see, there’s some warm-up phase during which allocation rates still go down, but after ~20 K 
requests, the allocation per request is fairly stable, with a volatility of ~1% when averaged out over 10K requests. This means that this initial phase should be excluded from the actual test.\nTo emphasize the key part again: this allocation is per request. It is independent of wall clock time and thus is neither dependent on the machine running the test (i.e. the test should behave the same when running on a developer laptop and on a CI machine), nor is it subject to volatility induced by other workloads running concurrently.\nTracking Object Allocations in Java 16 The two TLAB allocation events provide all the information required for analysing object allocations in Java applications, but often it’s not practical to enable them on a continuous basis when running in production. Due to the high number of events produced, enabling them adds some overhead in terms of latency; also, the size of JFR recording files can be hard to predict.\nBoth issues are addressed by a JFR improvement that’s proposed for inclusion into Java 16, \u0026#34;JFR Event Throttling\u0026#34;. This will provide control over the emission rate of events, e.g. allowing you to sample object allocations at a defined rate of 100 events per second, which addresses both the overhead and the recording size issue. 
A new event type, jdk.ObjectAllocationSample will be added, too, which will be enabled in the JFR default configuration.\nFor JfrUnit, explicit control over the event sampling rate will be a very interesting capability, as a higher sampling rate may lead to stable results more quickly, in turn resulting in shorter test execution times.\nBased on that, the actual test could look like so:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 @Test @EnableEvent(\u0026#34;jdk.ObjectAllocationInNewTLAB\u0026#34;) @EnableEvent(\u0026#34;jdk.ObjectAllocationOutsideTLAB\u0026#34;) public void retrieveTodo() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder().build(); for (int i = 1; i\u0026lt;= 20_000; i++) { (1) executeRequest(r, client); } jfrEvents.awaitEvents(); jfrEvents.reset(); for (int i = 1; i\u0026lt;= 10_000; i++) { (2) executeRequest(r, client); } jfrEvents.awaitEvents(); long sum = jfrEvents.filter(this::isObjectAllocationEvent) .filter(this::isRelevantThread) .mapToLong(this::getAllocationSize) .sum(); assertThat(sum / 10_000).isLessThan(33_000); (3) } 1 Warm-up phase 2 The actual test phase 3 Assert the memory allocation per request is within the expected boundary; note we could also add a lower boundary, so to make sure we notice any future improvements (e.g. caused by upgrading to new efficient versions of a library), which otherwise may hide subsequent regressions Now let’s assume we’ve wrapped up the initial round of work on this application, and its tests have been passing on CI for a while. One day, the retrieveTodo() performance test method fails though:\n1 2 3 4 5 java.lang.AssertionError: Expecting: \u0026lt;388370L\u0026gt; to be less than: \u0026lt;33000L\u0026gt; Ugh, it’s suddenly allocating about ten times more memory per request than before! What has happened? 
To find the answer, we can take a look at the test’s JFR recording, which JfrUnit persists under target/jfrunit:\n1 2 3 4 ls target/jfrunit dev.morling.demos.quarkus.TodoResourcePerformanceTest-createTodo.jfr dev.morling.demos.quarkus.TodoResourcePerformanceTest-retrieveTodo.jfr Let’s open the *.jfr file for the failing test in JDK Mission Control (JMC) in order to analyse all the recorded events (note that the recording will always contain some JfrUnit-internal events which are needed for synchronizing the recording stream and the events exposed to the test).\nWhen taking a look at the TLAB events of the application’s executor thread, the culprit is identified quickly; a lot of the sampled TLAB allocations contain this stack trace (click on the image to enlarge):\nInteresting, REST Assured loading a Jackson object mapper, what’s going on there? Here’s the full stacktrace:\nSo it seems a REST call to another service is made from within the TodoResource#get(long) method! At this point we know where to look into the source code of the application:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 @GET @Transactional @Produces(MediaType.APPLICATION_JSON) @Path(\u0026#34;/{id}\u0026#34;) public Response get(@PathParam(\u0026#34;id\u0026#34;) long id) throws Exception { Todo res = Todo.findById(id); User user = RestAssured.given().port(8082) .when() .get(\u0026#34;/users/\u0026#34; + res.userId) .as(User.class); res.userName = user.name; return Response.ok() .entity(res) .build(); } Gasp, it looks like a developer on the team has been taking the microservices mantra a bit too far, and has changed the code so it invokes another service in order to obtain some additional data associated to the user who created the retrieved todo.\nWhile that’s problematic in its own right due to the inherent coupling between the two services (how should the Todo Manager service react if the user service isn’t available?), they made matters worse by using the REST Assured API as a REST client, 
in a less than ideal way. The API’s simplicity and elegance make it a great solution for testing (and indeed that’s its primary use case), but this particular usage doesn’t seem to be a good choice for production code.\nAt this point you should ask yourself whether the increased allocation per request actually is a problem for your application or not. To determine if that’s the case, you could run some tests on actual request latency and throughput in a production-like environment. If there’s no impact based on the workload you have to process, you might very well decide that additional allocations are well spent for your application’s purposes.\nIncreasing the allocation per request by a factor of ten in the described way quite likely does not fall into this category, though. At the very least, we should look into making the call against the User REST API more efficient, either by setting up REST Assured in a more suitable way, or by looking for an alternative REST client. Of course the external API call just by itself adds to the request latency, which is something we might want to avoid.\nIt’s also worth examining the application’s garbage collection behavior. In order to do so, you can run the performance test method again, either by enabling all the GC-related JFR event types, or by enabling a pre-existing JFR configuration (the JDK comes with two built-in JFR configurations, default and profile, but you can also create and export them via JMC):\n@Test @EnableConfiguration(\u0026#34;profile\u0026#34;) public void retrieveTodo() throws Exception { // ... } Note that the pre-defined configurations imply minimum durations for certain event types; e.g. the I/O events discussed in the next section will only be recorded if they have a duration of 20 ms or longer. 
Depending on your testing requirements, you may have to adjust and tweak the configuration to be used.\nOpen the recording in JMC, and you’ll see there’s a substantial amount of GC activity happening:\nThe difference to the GC behavior before this code change is striking:\nPause times are worse, directly impacting the application’s latency, and the largely increased GC volume means the production environment will be able to serve fewer concurrent requests when reaching its capacity limits, meaning you’d have to provision another machine earlier on as your load increases.\nMemory Leak in the JFR Event Streaming API The astute reader may have noticed that there is a memory leak before and after the code change, as indicated by the ever-increasing heap size post GC. After some exploration it turned out that this is a bug in the JFR event streaming API which holds on to a large number of RecordedEvent instances internally. Erik Gahlin from the OpenJDK team logged JDK-8257906 for tracking and hopefully fixing this in JDK 16.\nNow such a drastic increase of object allocation and thus potential impact on performance should hopefully be an exception rather than a regular situation. But the example shows how continuous performance unit tests on impacting metrics like memory allocation, using JfrUnit and JDK Flight Recorder, can help to identify performance issues in an automated and reliable way, preventing such regressions from sneaking into production. Being able to identify this kind of issue by running tests locally on a developer laptop or a CI server can be a huge time-saver and productivity boost.\nCase Study 2: Identifying Increased I/O With the Database Once you’ve started to look at performance regression tests through the lens of JfrUnit, more and more possibilities pop up. Asserting a maximum number of garbage collections? Not a problem. Avoiding an unexpected amount of file system I/O? The jdk.FileRead and jdk.FileWrite events are our friends. 
Examining and asserting the I/O done with the database? Easily doable. Assertions on application-specific JFR event types you’ve defined yourself? Sure thing!\nYou can find a complete list of all JFR event types by JDK version in this nice matrix created by Tom Schindl. The number of JFR event types is growing constantly; as of JDK 15, there are 157 different ones.\nNow let’s take a look at assertions on database I/O, as the amount of data fetched from or written to the database often has a very strong impact on an enterprise application’s behavior. A regression here, e.g. fetching more data from the database than anticipated, may indicate that data is unnecessarily loaded. For instance, it might be the case that a set of data is loaded only in order to filter it in the application subsequently, instead of doing so via SQL in the database itself, resulting in increased request durations.\nSo what could such a test look like for our GET /todo/{id} API call? The general approach is the same as before with memory allocations: first define a baseline of the bytes read and written by invoking the API under test for a given number of executions. 
Once that’s done, you can implement the actual test, including an assertion on the expected number of bytes read or written:\n@Test @EnableEvent(value=\u0026#34;jdk.SocketRead\u0026#34;, stackTrace=INCLUDED) (1) @EnableEvent(value=\u0026#34;jdk.SocketWrite\u0026#34;, stackTrace=INCLUDED) public void retrieveTodo() throws Exception { Random r = new Random(); HttpClient client = HttpClient.newBuilder() .build(); for (int i = 1; i\u0026lt;= ITERATIONS; i++) { executeRequest(r, client); } jfrEvents.awaitEvents(); long count = jfrEvents.filter(this::isDatabaseIoEvent).count(); (2) assertThat(count / ITERATIONS) .describedAs(\u0026#34;write + read per statement, write + read per commit\u0026#34;) .isEqualTo(4); long bytesReadOrWritten = jfrEvents.filter(this::isDatabaseIoEvent) .mapToLong(this::getBytesReadOrWritten) .sum(); assertThat(bytesReadOrWritten / ITERATIONS).isLessThan(250); (3) } private boolean isDatabaseIoEvent(RecordedEvent re) { (4) return ((re.getEventType().getName().equals(\u0026#34;jdk.SocketRead\u0026#34;) || re.getEventType().getName().equals(\u0026#34;jdk.SocketWrite\u0026#34;)) \u0026amp;\u0026amp; re.getInt(\u0026#34;port\u0026#34;) == databasePort); } private long getBytesReadOrWritten(RecordedEvent re) { (5) return re.getEventType().getName().equals(\u0026#34;jdk.SocketRead\u0026#34;) ? re.getLong(\u0026#34;bytesRead\u0026#34;) : re.getLong(\u0026#34;bytesWritten\u0026#34;); } 1 Enable the jdk.SocketRead and jdk.SocketWrite event types; by default, those don’t contain the stacktrace for the events, so that needs to be enabled explicitly 2 There should be four events per invocation of the API method; note that AssertJ only applies describedAs() when it is called before the assertion itself 3 Less than 250 bytes of I/O are expected per invocation 4 Only read and write events on the database port are relevant for this test, but e.g. 
not I/O on the web port of the application 5 Retrieve the value of the event’s bytesRead or bytesWritten field, depending on the event type Now let’s again assume that after some time the test begins to fail. This time it’s the assertion on the number of executed reads and writes:\nAssertionFailedError: Expecting: \u0026lt;18L\u0026gt; to be equal to: \u0026lt;4L\u0026gt; but was not. Also the number of bytes read and written has substantially increased:\njava.lang.AssertionError: Expecting: \u0026lt;1117L\u0026gt; to be less than: \u0026lt;250L\u0026gt; That’s definitely something to look into. So let’s open the recording of the failed test in JDK Mission Control and take a look at the socket read and write events. Thanks to enabling stacktraces for the two JFR event types we can quite quickly identify the events associated with an invocation of the GET /todo/{id} API:\nAt this point, some familiarity with the application in question will come in handy to identify suspicious events. But even without that, we could compare previous recordings of successful test runs with the recording from the failing one in order to see where the differences are. In the case at hand, the BlobInputStream and Hibernate’s BlobTypeDescriptor in the call stack seem pretty unexpected, as our Todo entity didn’t have any BLOB attribute before.\nIn reality, comparing with the latest version and a look into the git history of that class could confirm that there’s a new attribute storing an image (perhaps not a best practice to do so ;):\n@Entity public class Todo extends PanacheEntity { public String title; public int priority; public boolean completed; @Lob (1) public byte[] image; } 1 This looks suspicious! 
We now would have to decide whether this image attribute actually should be loaded for this particular use case, (if so, we’d have to adjust the test accordingly), or whether it would for instance make more sense to mark this property as a lazily loaded one and only retrieve it when actually required.\nSolely working with the raw socket read and write events can be a bit cumbersome, though. Wouldn’t it be nice if we also had the actual SQL statement which caused this I/O? Glad you asked! Neither Hibernate nor the Postgres JDBC driver emit any JFR events at the moment (although well-informed sources are telling me that the Hibernate team wants to look into this). Therefore, in part two of this blog post series, we’ll discuss how to instrument an existing library to emit events like this, using a Java agent, without modifying the library in question.\nDiscussion JfrUnit in conjunction with JDK Flight Recorder opens up a very interesting approach for identifying potential performance regressions in Java applications. Instead of directly measuring an application’s performance metrics, most notably latency and throughput, the idea is to measure and assert metrics that impact the performance characteristics. This allows you to implement stable and reliable automated performance regression tests, whose outcome does not depend on the capabilities of the execution environment (e.g. number/size of CPUs), or other influential factors like concurrently running programs.\nRegressions in such impacting metrics, e.g. the amount of allocated memory, or bytes read from a database, are indicators that the application’s performance may have degraded. This approach offers some interesting advantages over performance tests on actual latency and throughput themselves:\nHardware independent: You can identify potential regressions also when running tests on hardware which is different (e.g. 
less powerful) from the actual production hardware\nFast feedback cycle: Being able to run performance regression tests on developer laptops, even in the IDE, allows for fast identification of potential regressions right during development, instead of having to wait for the results of less frequently executed test runs in a traditional performance test lab environment\nRobustness: Tests are robust and not prone to factors such as the load induced by parallel jobs of a CI server or a virtualized/containerized environment\nPro-active identification of performance issues: Asserting a metric like memory allocation can help to identify future performance problems before they actually materialize; while the additional allocation rate may make no difference with the system’s load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to finding out about such a regression only later on\nReduced need for warm-up: For traditional performance tests of Java-based applications, a thorough warm-up is mandatory, e.g. to ensure proper optimization of the JIT-compiled code. In comparison, metrics like file or database I/O are very stable for a defined workload, so that regressions can be identified with just a single or a few executions\nNeedless to say, you should be aware of the limitations of this approach, too:\nNo statement on user-visible performance metrics: Measuring and asserting performance-impacting factors doesn’t tell you anything in terms of the user-visible performance characteristics themselves. While we can reason about guarantees like \u0026#34;The system can handle 10K concurrent requests while the 99.9 percentile of requests has a latency of less than 250 ms\u0026#34;, that’s not the case for metrics like memory allocation or I/O. 
What does it mean if an application allocates 100 KB of RAM for a particular use case? Is it a lot? Too much? Just fine?\nFocused on identifying regressions: Somewhat related to the first point, this approach of testing is focused not on specific absolute values, but rather on identifying performance regressions. It’s hard to tell whether 100 KB of database I/O is good or bad for a particular web request, but a change from 100 KB to 200 KB might indicate that something is wrong\nFocused on identifying potential regressions: A change in performance-impacting metrics does not necessarily imply an actual user-visible performance regression. For instance it might be acceptable for a specific request to allocate more RAM than it did before, if the production system generally isn’t under high load and the additional GC effort doesn’t matter in practice\nDoes not work for all performance-impacting metrics: Some performance metrics cannot be meaningfully asserted in plain unit tests; e.g. degraded throughput due to lock contention can typically only be identified with a reasonable number of concurrent requests\nOnly identifies regressions in the application itself: A traditional integrative performance test of an enterprise application will also capture issues in related components, such as the application’s database. A query run with a sub-optimal execution plan won’t be noticed with this testing approach\nVolatile results for timer-based tasks: While metrics like object allocations should be stable e.g. for a specific web request, events which are timing-based would yield more events on a slower environment than on a faster machine\nSummary and Outlook JUnit tests based on performance-impacting factors can be a very useful part of the performance testing strategy for an application. They can help to identify potential performance regressions very early in the development lifecycle, when they can be fixed comparatively easily and cheaply.
Of course they are no silver bullet; you should consider them as complement for classic performance tests running on production-like hardware, not a replacement.\nThe approach may feel a bit unfamiliar initially, and it may take some time to learn about the different metrics which can be measured with JFR and asserted via JfrUnit, as well as their implications on an application’s performance characteristics. But once this hurdle is passed, continuous performance regression tests can be a valuable tool in the box of every software and performance engineer.\nJfrUnit is still in its infancy, and could evolve into a complete toolkit around automated test of JFR-based metrics. Ideas for future development include:\nA more powerful \u0026#34;built-in\u0026#34; API which e.g. provides the functionality for calculating the total TLAB allocations of a given set of threads as a ready-to-use method\nIt could also be very interesting to run assertions against externally collected JFR recording files. This would allow to validate workloads which require more complex set-ups, e.g. running in a dedicated performance testing lab, or even from continuous recordings taken in production\nThe JFR event streaming API could be leveraged for streaming queries on live events streamed from a remote system\nAnother use case we haven’t explored yet is the validation of resource consumption before and after a defined workload. E.g. after logging in and out a user 100 times, the system should roughly consume — ignoring any initial growth after starting up — the same amount of memory. A failure of such assertion would indicate a potential memory leak in the application\nJfrUnit might automatically detect that certain metrics like object allocations are still undergoing some kind of warm-up phase and thus are not stable, and mark such tests as potentially incorrect or flaky\nKeeping track of historical measurement data, e.g. 
allowing to identify regressions which got introduced step by step over a longer period of time, with one comparatively small change being the straw finally breaking the camel’s back\nYour feedback, feature requests, or even contributions to the project will be highly welcomed!\nStay tuned for part two of this blog post, where we’ll explore how to trace the SQL statements executed by an application using the JMC Agent and assert these query events using JfrUnit. This will come in very handy for instance for identifying common performance problems like N+1 SELECT statements.\nMany thanks to Hans-Peter Grahsl, John O’Hara, Nitsan Wakart, and Sanne Grinovero for their extensive feedback while writing this blog post!\n","id":163,"publicationdate":"Dec 16, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_getting_started_with_jfrunit\"\u003eGetting Started With JfrUnit\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_case_study_1_spotting_increased_memory_allocation\"\u003eCase Study 1: Spotting Increased Memory Allocation\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_case_study_2_identifying_increased_io_with_the_database\"\u003eCase Study 2: Identifying Increased I/O With the Database\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_discussion\"\u003eDiscussion\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary_and_outlook\"\u003eSummary and Outlook\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFunctional unit and integration tests are a standard tool of any software development organization,\nhelping not only to ensure correctness of newly implemented code,\nbut also to identify regressions — bugs in existing functionality introduced by a code change.\nThe situation looks different 
though when it comes to regressions related to non-functional requirements, in particular performance-related ones:\nHow to detect increased response times in a web application?\nHow to identify decreased throughput?\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThese aspects are typically hard to test in an automated and reliable way in the development workflow,\nas they are dependent on the underlying hardware and the workload of an application.\nFor instance assertions on the duration of specific requests of a web application typically cannot be run in a meaningful way on a developer laptop,\nwhich differs from the actual production hardware\n(ironically, nowadays both are an option, the developer laptop being less or more powerful than the actual production environment).\nWhen run in a virtualized or containerized CI environment, such tests are prone to severe measurement distortions due to concurrent load of other applications and jobs.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis post introduces the \u003ca href=\"https://github.com/gunnarmorling/jfrunit\"\u003eJfrUnit\u003c/a\u003e open-source project, which offers a fresh angle to this topic by supporting assertions not on metrics like latency/throughput themselves, but on \u003cem\u003eindirect metrics\u003c/em\u003e which may impact those.\nJfrUnit allows you to define expected values for metrics such as memory allocation, database I/O, or number of executed SQL statements, for a given workload and asserts the actual metrics values — which are obtained from \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e events — against these expected values.\nStarting off from a defined base line, future failures of such assertions are an indicator for potential performance regressions in an application, as a code change may have introduced higher GC pressure,\nthe retrieval of unnecessary data from the
database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","testing","performance"],"title":"Towards Continuous Performance Regression Testing","uri":"https://www.morling.dev/blog/towards-continuous-performance-regression-testing/"},{"content":"","id":164,"publicationdate":"Dec 13, 2020","section":"tags","summary":"","tags":null,"title":"containers","uri":"https://www.morling.dev/tags/containers/"},{"content":" Table of Contents Creating a Modular Runtime Image for a Quarkus Application Adding an AppCDS Archive to a Custom Runtime Image Creating a Linux Container Image Let’s See Some Numbers! A few months ago I wrote about how you could speed up your Java application’s start-up times using application class data sharing (AppCDS), based on the example of a simple Quarkus application. Since then, quite some progress has been made in this area: Quarkus 1.6 brought built-in support for AppCDS, so that now you just need to provide the -Dquarkus.package.create-appcds=true option when building your project, and you’ll find an AppCDS file in the target folder.\nThings get more challenging though when combining AppCDS with custom Java runtime images, as produced using the jlink tool added in Java 9. Combining custom runtime images with AppCDS is very attractive, in particular when looking at the deployment of Java applications via Linux containers. Instead of putting the full Java runtime into the container image, you only add those JDK modules which your application actually requires. (Parts of) what you save in image size by doing so, can be used for adding an AppCDS archive to your container image. The result will be a container image which still is smaller than before — and thus is faster to push to a container registry, distribute to worker nodes in a Kubernetes cluster, etc. 
— and which starts up significantly faster.\nA challenge though is that AppCDS archives must be created with exactly the same Java runtime which later on is used to run the application. In the case of jlink this means the custom runtime image itself must be used to produce the AppCDS archive. In other words, the default archive produced by the Quarkus build unfortunately cannot be used with jlink images. The goal for this post is to explore\nthe steps required to create a custom runtime image for a simple Java CRUD application based on Quarkus,\nhow to build a Linux container image with this custom runtime image and the application itself,\nhow this approach compares to container images with the full Java runtime in terms of size and start-up time.\nCreating a Modular Runtime Image for a Quarkus Application It’s a common misbelief that only Java applications which have been fully ported to the Java module system (JPMS) would be able to benefit from jlink. But as explained by Simon Ritter in this blog post, this is not true actually; you don’t need to fully modularize an application in order to run it via a custom runtime image.\nWhile indeed the creation of a runtime image is a bit easier when it only is comprised of proper Java modules, it also is possible to create a runtime image by explicitly stating which JDK (or other) modules it should contain. The application can then be run via the traditional classpath, just as you’d do it with a full Java runtime. Which JDK modules to add though? To answer this question, the jdeps tool comes in handy.
Via its --print-module-deps option it can determine for a given set of JARs which (JDK) modules they depend on, and which thus are the ones that need to go into the custom runtime image.\nHaving built the example application from the previous blog post via mvn clean verify, let’s try and invoke jdeps like so:\n1 2 3 jdeps --print-module-deps \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar This results in an error though:\n1 Error: com.sun.istack.istack-commons-runtime-3.0.10.jar is a multi-release jar file but --multi-release option is not set Ok, we need to tell which code version to analyse for multi-release JARs; no problem:\n1 2 3 4 jdeps --print-module-deps \\ --multi-release 15 \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar Hum, some progress, but still an issue:\n1 Exception in thread \u0026#34;main\u0026#34; java.lang.module.FindException: Module java.xml.bind not found, required by java.ws.rs This one is a bit odd; the file org.jboss.spec.javax.ws.rs.jboss-jaxrs-api_2.1_spec-2.0.1.Final.jar is an explicit module with a module-info.class descriptor, which references the module java.xml.bind, and this one is not found on the module path. It’s not quite clear to me why this is flagged here, given that the JAX-RS API JAR is part of the class path and not the module path. 
But it’s not a big problem, we simply can add the JAXB API (which also is provided on the class path) on the module path, too.\nThe same issue arises for some other dependencies which are explicit modules already, so we end up with the following configuration:\n1 2 3 4 5 jdeps --print-module-deps \\ --multi-release 15 \\ --module-path target/lib/jakarta.activation.jakarta.activation-api-1.2.1.jar:target/lib/org.reactivestreams.reactive-streams-1.0.3.jar:target/lib/org.jboss.spec.javax.xml.bind.jboss-jaxb-api_2.3_spec-2.0.0.Final.jar \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar And another issue, now about some missing dependencies:\n1 2 3 4 5 ... org.postgresql.util.internal.Nullness -\u0026gt; org.checkerframework.dataflow.qual.Pure not found org.wildfly.common.wildfly-common-1.5.4.Final-format-001.jar org.wildfly.common.Substitutions$Target_Branch -\u0026gt; com.oracle.svm.core.annotate.AlwaysInline not found ... After taking a closer look, these are either compile-time only dependencies (like annotations from the Checker framework), or dependencies of optional features which are not relevant for our case. These can be safely ignored using the --ignore-missing-deps switch, which leaves us with this jdeps invocation:\n1 2 3 4 5 6 jdeps --print-module-deps \\ --ignore-missing-deps \\ --multi-release 15 \\ --module-path target/lib/jakarta.activation.jakarta.activation-api-1.2.1.jar:target/lib/org.reactivestreams.reactive-streams-1.0.3.jar:target/lib/org.jboss.spec.javax.xml.bind.jboss-jaxb-api_2.3_spec-2.0.0.Final.jar \\ --class-path target/lib/* \\ target/todo-manager-1.0.0-SNAPSHOT-runner.jar The required JDK modules are printed out finally:\n1 java.base,java.compiler,java.instrument,java.naming,java.rmi,java.security.jgss,java.security.sasl,java.sql,jdk.jconsole,jdk.unsupported I.e. out of the nearly 60 modules which make up OpenJDK 15, only ten are required by this particular application. 
Building a custom runtime image containing only these modules should result in quite some space saving.\nWhy is a Particular Module Required? When looking at the module list, you might wonder why certain modules actually are needed. What is this application doing with jdk.jconsole for instance? To gain insight into this, jdeps can help, too. Run it again without the --print-module-deps switch, and you can grep for interesting module references:\n1 2 3 4 5 jdeps \u0026lt;...\u0026gt; | grep jconsole org.jboss.narayana.jta.narayana-jta-5.10.6.Final.jar -\u0026gt; jdk.jconsole com.arjuna.ats.arjuna.tools.stats -\u0026gt; com.sun.tools.jconsole jdk.jconsole In this case, there’s a single dependency to jconsole, from the Narayana transaction manager. Depending on the details, it might be an opportunity to reach out to the maintainers of such library and discuss, whether this dependency really is needed or whether it could be avoided (e.g. by moving the code in question to a separate module), resulting in a further decreased size of custom runtime images.\nWith the list of required modules, creating the actual runtime image is rather simple:\n1 2 3 4 $JAVA_HOME/bin/jlink \\ --add-modules java.base,java.compiler,java.instrument,java.naming,java.rmi,java.security.jgss,java.security.sasl,java.sql,jdk.jconsole,jdk.unsupported \\ --compress 2 --no-header-files --no-man-pages \\(1) --output target/runtime-image (2) 1 Compressing the runtime image as well as omitting header files and man pages helps to further reduce the size of the runtime image 2 Output location for creating the runtime image In order to create a dynamic AppCDS archive for our application classes later on, we now need to add the class data archive for all of the classes of the image itself. 
Failing to do so results in this error message:\n1 2 Error occurred during initialization of VM DynamicDumpSharedSpaces is unsupported when base CDS archive is not loaded This step isn’t very well documented, and at this point I was somewhat stuck. But you always can count on the OpenJDK community: after asking about this on Twitter, Claes Redestad pointed me into the right direction:\n1 ./target/runtime-image/bin/java -Xshare:dump Thanks, Claes! This creates the base class data archive under target/runtime-image/lib/server/classes.jsa, adding ~12 MB to the runtime image, which now has a size of ~63 MB; not too bad.\nAdding an AppCDS Archive to a Custom Runtime Image Having created the custom Java runtime image, let’s now add the AppCDS archive to it. Since the introduction of dynamic AppCDS archives in JDK 13, this is one simple step which only requires to run the application with the -XX:ArchiveClassesAtExit option:\n1 2 3 4 5 6 7 8 9 10 cd target (1) mkdir runtime-image/cds (2) (3) runtime-image/bin/java \\ -XX:ArchiveClassesAtExit=runtime-image/cds/app-cds.jsa \\ -jar todo-manager-1.0.0-SNAPSHOT-runner.jar cd .. 1 The class path used when running the application later on must be the same as (or rather a prefix of, to be precise) the class path used for building the AppCDS archive; hence changing to the target directory, so to run with -jar *-runner.jar, instead of with -jar target/*-runner.jar 2 Creating a folder for storing the AppCDS archive 3 Using the java binary of the runtime image to launch the application and create the AppCDS archive when exiting This will create the CDS archive under target/runtime-image/cds/app-cds.jsa. In the next step this can be added to a Linux container image, built e.g. using Docker or podman. Note that while jlink supports cross-platform builds (so for instance you could build a custom runtime image for a Linux container on macOS), the same isn’t the case for AppCDS. 
This means an AppCDS archive to be used by a containerized application needs to be built on Linux. When not running on Linux yourself, but on Windows or macOS, you could put the entire build process into a container for this purpose.\nCreating a Linux Container Image At this point we have built our actual application, a custom Java runtime image with the required JDK modules, and an AppCDS archive for the application’s classes. The final step is to put everything into a Linux container image, which is quickly done via a small Dockerfile:\n1 2 3 4 5 6 7 8 FROM registry.fedoraproject.org/fedora-minimal:33 COPY target/runtime-image /opt/todo-manager/jdk COPY target/lib/* /opt/todo-manager/lib/ COPY target/todo-manager-1.0.0-SNAPSHOT-runner.jar /opt/todo-manager COPY todo-manager.sh /opt/todo-manager ENTRYPOINT [ \u0026#34;/opt/todo-manager/todo-manager.sh\u0026#34; ] This uses the Fedora minimal base image, which is a great foundation for container images. With a size of ~120 MB, it’s small enough to be distributed efficiently, while still providing the flexibility of a complete Linux distribution, e.g. allowing for the installation of additional tools if needed.\nEven Smaller Container Images If you wanted to shrink the image size further and felt adventurous, you could look into using Alpine Linux as a base image; the issue there though is that Alpine comes with musl instead of glibc (as used by the JDK) as its implementation of the ISO C and POSIX standard APIs. The OpenJDK Portola project aims at providing a port to Alpine and musl. But as of JDK 15, no GA build of this port exists yet. For JDK 16, an early access build of the Alpine/musl port is available.\nAnother option for smaller images is to use jib, which also is supported by Quarkus out of the box.
I haven’t tried out yet though whether/how jib would work with custom runtime images and AppCDS.\nIt’s also worth pointing out that the size of base images doesn’t matter too much in practice, as container images use a layered file system, which means that typically rather stable base image layers don’t need to be redistributed too often when pushing or pulling a container image.\nThe container’s entry point, todo-manager.sh, is a basic shell script, which starts the actual Java application via the Java runtime image:\n1 2 3 4 5 6 7 #!/bin/bash export PATH=\u0026#34;/opt/todo-manager/jdk/bin:${PATH}\u0026#34; cd /opt/todo-manager \u0026amp;\u0026amp; \\ (1) exec java -Xshare:on -XX:SharedArchiveFile=jdk/cds/app-cds.jsa -jar \\ (2) todo-manager-1.0.0-SNAPSHOT-runner.jar 1 Changing into the todo-manager directory, so to make sure the same JAR path is passed as when creating the CDS archive 2 Specifying the archive name; the -Xshare:on isn’t strictly needed, it’s used here though to ensure the process will fail if something is wrong with the CDS archive, instead of silently not using it Let’s See Some Numbers! Finally, let’s compare some numbers: container image size, and start-up time for different ways of containerizing the todo manager application. I’ve tried out four different approaches:\nOpenJDK 11 on the RHEL UBI 8.3 image (universal base image), as per the default Dockerfile created for new Quarkus applications\nA full OpenJDK 15 on Fedora 33 (as there’s no OpenJDK 15 package for the RHEL base image yet)\nA custom runtime image for OpenJDK 15 on Fedora 33\nA custom runtime image with AppCDS on Fedora 33\nHere are the results, running on a Hetzner Cloud CX4 instance (4 vCPUs, 16 GB RAM), using Fedora 33 as the host OS:\nAs we can see, the container image size is significantly lower when adding a custom Java runtime image instead of the full JDK.
In particular when comparing to the OpenJDK package of Fedora 33 which is a fair bit larger than the OpenJDK 11 package of the RHEL UBI 8.3 image, the difference is striking.\nThe start-up times are as displayed by Quarkus, averaged over five runs. Numbers have improved by about 10% by going from OpenJDK 11 to 15, which is explained by multiple improvements in this area, most notably the introduction of default CDS archives for the JDK’s own classes in JDK 12 (JEP 341). Using a custom runtime image by itself doesn’t have any measurable impact on start-up time. The AppCDS archive improves the start-up time by a whopping 54%. Unless pure image size is the key factor for you (in which case you should look for alternative approaches anyways, see note \u0026#34;Even Smaller Container Images\u0026#34; above), I would say that the additional 40 MB for the AppCDS archive are more than worth it. In particular as the resulting container image still is way smaller than when adding the full JDK, be it with the Fedora base image or the RHEL UBI one.\nBased on those numbers, I think it’s fair to say that custom Java runtime images created via jlink, combined with AppCDS archives are a great foundation for containerized Java applications. Adding a custom runtime image containing only those JDK modules actually needed by an application helps to cut down image size significantly. Parts of that saved space can be invested into adding an AppCDS archive, so you end up with a container image that’s smaller and starts up faster. I.e. you can have this cake, and eat it, too!\nThe one downside is the increased complexity of the build process for producing the runtime image as well as the AppCDS archive. This should be manageable though by means of scripting and automation; also I’d expect tooling like the Quarkus Maven plug-in and others to further improve on this front.
One tricky aspect is that you must not forget to rebuild the custom runtime image, in case you have added dependencies to your application which affect the set of required JDK modules. Automated tests of the application running via the runtime image should help to identify this situation.\nIf you’d like to give it a try yourself, or obtain numbers for the different deployment approaches on your own hardware, you can find all the required code and information in this GitHub repository.\n","id":165,"publicationdate":"Dec 13, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_creating_a_modular_runtime_image_for_a_quarkus_application\"\u003eCreating a Modular Runtime Image for a Quarkus Application\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_adding_an_appcds_archive_to_a_custom_runtime_image\"\u003eAdding an AppCDS Archive to a Custom Runtime Image\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_creating_a_linux_container_image\"\u003eCreating a Linux Container Image\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_lets_see_some_numbers\"\u003eLet’s See Some Numbers!\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eA few months ago I \u003ca href=\"/blog/building-class-data-sharing-archives-with-apache-maven/\"\u003ewrote about\u003c/a\u003e how you could speed up your Java application’s start-up times using application class data sharing (\u003ca href=\"http://openjdk.java.net/jeps/350\"\u003eAppCDS\u003c/a\u003e),\nbased on the example of a simple \u003ca href=\"https://quarkus.io/\"\u003eQuarkus\u003c/a\u003e application.\nSince then, quite some progress has been made in this area:\nQuarkus 1.6 brought \u003ca 
href=\"https://quarkus.io/guides/maven-tooling#quarkus-package-pkg-package-config_quarkus.package.create-appcds\"\u003ebuilt-in support for AppCDS\u003c/a\u003e,\nso that now you just need to provide the \u003cem\u003e-Dquarkus.package.create-appcds=true\u003c/em\u003e option when building your project,\nand you’ll find an AppCDS file in the \u003cem\u003etarget\u003c/em\u003e folder.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThings get more challenging though when combining AppCDS with custom Java runtime images,\nas produced using the \u003ca href=\"https://docs.oracle.com/en/java/javase/15/docs/specs/man/jlink.html\"\u003ejlink\u003c/a\u003e tool added in Java 9.\nCombining custom runtime images with AppCDS is very attractive,\nin particular when looking at the deployment of Java applications via Linux containers.\nInstead of putting the full Java runtime into the container image, you only add those JDK modules which your application actually requires.\n(Parts of) what you save in image size by doing so,\ncan be used for adding an AppCDS archive to your container image.\nThe result will be a container image which still is smaller than before — and thus is faster to push to a container registry, distribute to worker nodes in a Kubernetes cluster, etc. — and which starts up significantly faster.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","performance","containers","jlink"],"title":"Smaller, Faster-starting Container Images With jlink and AppCDS","uri":"https://www.morling.dev/blog/smaller-faster-starting-container-images-with-jlink-and-appcds/"},{"content":" Table of Contents Bonus: Schema Creation The Testcontainers project is invaluable for spinning up containerized resources during your (JUnit) tests, e.g. databases or Kafka clusters.\nFor users of JUnit 5, the project provides the @Testcontainers extension, which controls the lifecycle of containers used by a test. 
When testing a Quarkus application though, this is at odds with Quarkus\u0026#39; own @QuarkusTest extension; it’s a recommended best practice to avoid fixed ports for any containers started by Testcontainers. Instead, you should rely on Docker to automatically allocate random free ports. This avoids conflicts between concurrently running tests, e.g. amongst multiple Postgres containers, started up by several parallel job runs in a CI environment, all trying to allocate Postgres\u0026#39; default port 5432. Obtaining the randomly assigned port and passing it into the Quarkus bootstrap process isn’t possible though when combining the two JUnit extensions.\nOne work-around you can find described e.g. on StackOverflow is setting up the database container via a static class initializer block and then propagating the host and port to Quarkus through system properties. While this works, it’s not ideal in terms of lifecycle control (e.g. how to make sure the container is started up once at the beginning of an entire test suite), and in general, it just feels a bit hack-ish.\nLuckily, there’s a better alternative, which interestingly isn’t discussed as much: using Quarkus\u0026#39; notion of test resources. There’s just two steps involved. First, create an implementation of the QuarkusTestResourceLifecycleManager interface, which controls your resource’s lifecycle. 
In case of a Postgres database, this could look like this:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 public class PostgresResource implements QuarkusTestResourceLifecycleManager { static PostgreSQLContainer\u0026lt;?\u0026gt; db = new PostgreSQLContainer\u0026lt;\u0026gt;(\u0026#34;postgres:13\u0026#34;) (1) .withDatabaseName(\u0026#34;tododb\u0026#34;) .withUsername(\u0026#34;todouser\u0026#34;) .withPassword(\u0026#34;todopw\u0026#34;); @Override public Map\u0026lt;String, String\u0026gt; start() { (2) db.start(); return Collections.singletonMap( \u0026#34;quarkus.datasource.url\u0026#34;, db.getJdbcUrl() ); } @Override public void stop() { (3) db.stop(); } } 1 Configure the database container, using the Postgres 13 container image, the given database name, and credentials 2 Start up the database; the returned map of configuration properties amends/overrides the configuration properties of the test; in this case the datasource URL will be overridden with the value obtained from Testcontainers, which contains the randomly allocated public port of the Postgres container 3 Shut down the database after all tests have been executed All you then need to do is to reference that test resource from your test class using the @QuarkusTestResource annotation:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 @QuarkusTest @QuarkusTestResource(PostgresResource.class) (1) public class TodoResourceTest { @Test public void createTodoShouldYieldId() { given() .when() .contentType(ContentType.JSON) .body(\u0026#34;\u0026#34;\u0026#34; { \u0026#34;title\u0026#34; : \u0026#34;Learn Quarkus\u0026#34;, \u0026#34;priority\u0026#34; : 1, } \u0026#34;\u0026#34;\u0026#34;) .then() .statusCode(201) .body( matchesJson( \u0026#34;\u0026#34;\u0026#34; { \u0026#34;id\u0026#34; : 1, \u0026#34;title\u0026#34; : \u0026#34;Learn Quarkus\u0026#34;, \u0026#34;priority\u0026#34; : 1, \u0026#34;completed\u0026#34; : false, } \u0026#34;\u0026#34;\u0026#34;)); 
} } 1 Ensures the Postgres database is started up And that’s it! Note that all the test resources of the test module are detected and started up, before starting the first test.\nBonus: Schema Creation One other subtle issue is the creation of the database schema for the test. E.g. for my Todo example application, I’d like to use a schema named \u0026#34;todo\u0026#34; in the Postgres database:\n1 create schema todo; Quarkus supports SQL load scripts for executing SQL scripts when Hibernate ORM starts. But this will be executed only after Hibernate ORM has set up all the database objects, such as tables, sequences, indexes etc. (I’m using the drop-and-create database generation mode during testing). This means that while a load script is great for inserting test data, it’s executed too late for defining the actual database schema itself.\nLuckily, most database container images themselves support the execution of load scripts right upon database start-up; the Postgres image is no exception, so it’s just a matter of exposing that script via Testcontainers.
All it needs for that is a bit of tweaking of the Quarkus test resource for Postgres:\n1 2 3 4 5 6 7 8 static PostgreSQLContainer\u0026lt;?\u0026gt; db = new PostgreSQLContainer\u0026lt;\u0026gt;(\u0026#34;postgres:13\u0026#34;) .withDatabaseName(\u0026#34;tododb\u0026#34;) .withUsername(\u0026#34;todouser\u0026#34;) .withPassword(\u0026#34;todopw\u0026#34;) .withClasspathResourceMapping(\u0026#34;init.sql\u0026#34;, (1) \u0026#34;/docker-entrypoint-initdb.d/init.sql\u0026#34;, BindMode.READ_ONLY); 1 Expose the file src/main/resources/init.sql as /docker-entrypoint-initdb.d/init.sql within the container With that in place, Postgres will start up and the \u0026#34;todo\u0026#34; schema will be created in the database, before Quarkus boots Hibernate ORM, which will populate the schema, and finally, all tests can run.\nYou can find the complete source code of this test and the Postgres test resource on GitHub.\nMany thanks to Sergei Egorov for his feedback while writing this blog post!\n","id":166,"publicationdate":"Nov 28, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_bonus_schema_creation\"\u003eBonus: Schema Creation\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://www.testcontainers.org/\"\u003eTestcontainers\u003c/a\u003e project is invaluable for spinning up containerized resources during your (JUnit) tests,\ne.g. 
databases or Kafka clusters.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor users of JUnit 5, the project provides the \u003ca href=\"https://www.testcontainers.org/quickstart/junit_5_quickstart/\"\u003e\u003ccode\u003e@Testcontainers\u003c/code\u003e\u003c/a\u003e extension, which controls the lifecycle of containers used by a test.\nWhen testing a \u003ca href=\"https://quarkus.io/\"\u003eQuarkus\u003c/a\u003e application though, this is at odds with Quarkus\u0026#39; own \u003ca href=\"https://quarkus.io/guides/getting-started-testing#recap-of-http-based-testing-in-jvm-mode\"\u003e\u003ccode\u003e@QuarkusTest\u003c/code\u003e\u003c/a\u003e extension;\nit’s a recommended \u003ca href=\"https://bsideup.github.io/posts/testcontainers_fixed_ports/\"\u003ebest practice\u003c/a\u003e to avoid fixed ports for any containers started by Testcontainers.\nInstead, you should rely on Docker to automatically allocate random free ports.\nThis avoids conflicts between concurrently running tests,\ne.g. amongst multiple Postgres containers,\nstarted up by several parallel job runs in a CI environment, all trying to allocate Postgres\u0026#39; default port 5432.\nObtaining the randomly assigned port and passing it into the Quarkus bootstrap process isn’t possible though when combining the two JUnit extensions.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","quarkus","testing","testcontainers"],"title":"Quarkus and Testcontainers","uri":"https://www.morling.dev/blog/quarkus-and-testcontainers/"},{"content":"","id":167,"publicationdate":"Nov 28, 2020","section":"tags","summary":"","tags":null,"title":"testcontainers","uri":"https://www.morling.dev/tags/testcontainers/"},{"content":" Table of Contents Do We Really Need Plug-ins? 
Plug-ins in Layered Java Applications Class Unloading in Practice If Things Go Wrong Learning More Layers are sort of the secret sauce of the Java platform module system (JPMS): by providing fine-grained control over how individual JPMS modules and their classes are loaded by the JVM, they enable advanced usages like loading multiple versions of a given module, or dynamically adding and removing modules at application runtime.\nThe Layrry API and launcher provide a small plug-in API built on top of layers, which for instance can be used to dynamically add plug-ins contributing new views and widgets to a running JavaFX application. If such a plug-in gets removed from the application again, all its classes need to be unloaded by the JVM, avoiding an ever-increasing memory consumption if for instance a plug-in gets updated multiple times.\nIn this blog post I’m going to explore how to ensure classes from removed plug-in layers are unloaded in a timely manner, and how to find the culprit in case some class fails to be unloaded.\nDo We Really Need Plug-ins? Before diving into the details of class unloading, let’s spend some time thinking about the use cases for dynamic plug-ins in Java applications to begin with. I would argue that for typical backend applications this need has mostly diminished. By and large, the industry is moving away from application servers and their model around \u0026#34;deploying\u0026#34; applications (which you could consider as some kind of \u0026#34;plug-in\u0026#34;) into a running server process. Instead, there’s a strong trend towards immutable application packages, based on stacks like Quarkus or Spring Boot, embedding the web server, the application as well as its dependencies, oftentimes deployed as container images.\nThe advantages of this approach centered around immutable images are manifold, e.g. 
in terms of security (no interface for deploying applications is needed) and governance (it’s always exactly clear which version of the application is running). Updates — i.e. the deployment of a new revision of the container image — can be put in place e.g. with help of a proxy in front of a cluster of application nodes, which are updated in a rolling manner. That way, there’s no downtime of the service that’ll impact the user. Also techniques like canary releases and A/B testing, as well as rolling back to specific earlier versions of an application become a breeze that way.\nThe situation is different though when it comes to client applications. When thinking of your favourite editor, IDE or web browser for instance, requiring a restart when installing or updating a plug-in is not desirable. Instead, it should be possible to add plug-ins (or new plug-in versions) to a running application instance and be usable immediately, without interrupting the flow of the user. The same applies for many IoT scenarios, where e.g. an application consuming sensor measurements should be updateable without any downtime.\nPlug-ins in Layered Java Applications JPMS addresses this requirement via the notion of module layers:\nA layer is created from a graph of modules in a Configuration and a function that maps each module to a ClassLoader. Creating a layer informs the Java virtual machine about the classes that may be loaded from the modules so that the Java virtual machine knows which module that each class is a member of.\nLayers are the perfect means of adding new code into a running Java application: they can be added and removed dynamically, and code in an already running layer can invoke functionality from a dynamically added layer in different ways, e.g. via reflection or by using the service loader API. 
Layrry exposes this functionality via a very basic plug-in API:\npublic interface PluginLifecycleListener { void pluginAdded(PluginDescriptor plugin); void pluginRemoved(PluginDescriptor plugin); } public class PluginDescriptor { public String getName() { ... } public ModuleLayer getModuleLayer() { ... } } A plug-in in this context is a JPMS layer containing one or more modules (either explicit or automatic) which are all loaded via a single class loader. A Layrry-based application can implement the PluginLifecycleListener service contract in order to be notified whenever a plug-in is added or removed. Plug-ins are loaded from configured directories in the file system which are monitored by Layrry (other means of (un-)installing plug-ins may be added in future versions of Layrry).\nInstalling a plug-in is as easy as copying its JAR(s) into a sub-folder of such a monitored directory. Layrry will copy the plug-in contents to a temporary directory, create a layer with all the plug-in’s JARs, and notify any registered plug-in listeners about the new layer. These will then typically use the service loader API to interact with application-specific services which model its extension points, e.g. to contribute visual UI components in case of a desktop application.\nThe reverse process happens when a plug-in gets un-installed: the user removes a plug-in’s directory, and all listeners will be notified by Layrry about the removal. They should release all references to any classes from the removed plug-in, rendering it available for garbage collection.\nClass Unloading in Practice There is no API in the Java platform for explicitly unloading a given class. Instead, \u0026#34;a class or interface may be unloaded if and only if its defining class loader may be reclaimed by the garbage collector\u0026#34; (JLS, chapter 12.7). 
This means that in a layered Java application, any classes in a layer that got removed can be unloaded as soon as the layer’s class loader is subject to GC. Most importantly, no class in a still-running layer may keep a (strong) reference to any class of the removed layer; otherwise this class would hinder collecting the removed layer’s loader and its classes.\nAs an example, let’s look at the modular-tiles demo, a JavaFX application which uses the Layrry plug-in API for dynamically adding and removing tiles with different widgets like clocks and gauges to its graphical UI. The tiles themselves are implemented using the fabulous TilesFX project by Gerrit Grunwald.\nIf you want to follow along, check out the source code of the demo and build it as per the instructions in the README file. Then run the Layrry launcher with the -Xlog:class+unload=info option, so as to be notified about any unloaded classes in the system output:\njava -Xlog:class+unload=info \\ -jar path/to/layrry-launcher-1.0-SNAPSHOT-all.jar \\ --layers-config staging/layers.toml \\ --properties staging/versions.properties Now add and remove some tiles plug-ins a few times:\ncp -r staging/plugins-prepared/* staging/plugins rm -rf staging/plugins/* The widgets will show up and disappear in the JavaFX UI, but what about class unloading in the logs? In all likelihood, nothing! This is because without any further configuration, the G1 garbage collector (which is used by the JDK by default since Java 9) will unload classes only during a full garbage collection, which may only run after a long time (if at all), if there’s no substantial object allocation happening.\nJEP 158: Unified JVM Logging The -Xlog option has been defined by JEP 158, added to the JDK with Java 9, which provides a \u0026#34;common logging system for all components of the JVM\u0026#34;. The new unified options should be preferred over the legacy options like -XX:+TraceClassLoading and -XX:+TraceClassUnloading. 
Usage of -Xlog is described in detail in the java man page; also Nicolai Parlog discusses JEP 158 in great depth in this blog post.\nSo at this point you could trigger a GC explicitly, e.g. via jcmd:\n1 jcmd \u0026lt;pid\u0026gt; GC.run But of course that’s not too desirable when running things in production. Instead, if you’re on JDK 12 or later, you can use the new G1PeriodicGCInterval option for triggering a periodic GC:\n1 2 3 4 5 java -Xlog:class+unload=info \\ -XX:G1PeriodicGCInterval=5000 \\ -jar path/to/layrry-launcher-1.0-SNAPSHOT-all.jar \\ --layers-config staging/layers.toml \\ --properties staging/versions.properties Introduced via JEP 346 (\u0026#34;Promptly Return Unused Committed Memory from G1\u0026#34;), this will periodically initiate a concurrent GC cycle (or optionally even a full GC). Add and remove some plug-ins again, and after some time you should see messages about the unloaded classes in the log:\n1 2 3 4 5 ... [138.912s][info][class,unload] unloading class org.kordamp.tiles.sparkline.SparklineTilePlugin 0x0000000800de1840 [138.912s][info][class,unload] unloading class org.kordamp.tiles.gauge.GaugeTilePlugin 0x0000000800de2040 [138.913s][info][class,unload] unloading class org.kordamp.tiles.clock.ClockTilePlugin 0x0000000800de2840 ... From what I observed, class unloading doesn’t happen on every concurrent GC cycle; it might take a few cycles after a plug-in has been removed until its classes are unloaded. If you’re not using G1, but the new low-pause concurrent collectors Shenandoah or ZGC, they’ll be able to concurrently unload classes without any special configuration needed. Note that class unloading is not a mandatory operation which would have to be provided by every GC implementation. E.g. 
initial ZGC releases did not support class unloading, which would have rendered them unsuitable for this use case.\nJEP 371: Hidden Classes As mentioned above, regular classes can only be unloaded if their defining class loader becomes subject to garbage collection. This can be an issue for frameworks and libraries which generate lots of classes dynamically at runtime, e.g. script language implementations or solutions like Presto, which generates a class for each query.\nThe traditional workaround is to generate each class using its own dedicated class loader, which then can be discarded specifically. This solves the GC issue, but it isn’t ideal in terms of overall memory consumption and speed of class generation. Hence, JDK 15 defines a notion of Hidden Classes (JEP 371), which are not created by class loaders and thus can be unloaded eagerly: \u0026#34;when all instances of the hidden class are reclaimed and the hidden class is no longer reachable, it may be unloaded even though its notional defining loader is still reachable\u0026#34;.\nYou can find some more information on hidden classes in this tweet thread and this code example on GitHub.\nBut who wants to stare at logs in the system output? That’s so 2010! So let’s fire up JDK Mission Control and trigger a recording via the JDK Flight Recorder (JFR) to observe what’s going on in more depth.\nJFR can capture class unloading events; you need to make sure to enable this event type though, as it is not enabled by default. In order to do so, start a recording, then go to the Template Manager, edit or create a flight recording template and check the Enabled box for the events under Java Virtual Machine → Class Loading. 
With the recorder running, add and remove some tiles plug-ins to the running application.\nOnce the recording is finished, you should see class unloading events under JVM Internals → Class Loading:\nIn this case, the classes from a set of plug-ins were unloaded at 16:48:11, which correlates to the periodic GC cycle running at that time and spending a slightly increased time for cleaning up class loader data:\nAs a good Java citizen, Layrry itself also emits JFR events whenever a plug-in layer is added or removed, which helps to track the need for classes to be unloaded:\nIf Things Go Wrong Now let’s look at the situation where some class failed to unload after its plug-in layer was removed. Common reasons for that include remaining references from classes in a still running layer to classes in the removed layer, threads started by a class in the removed layer which were not stopped, and JVM shutdown hooks registered by code in the removed layer.\nThis is known as a class loader leak and is problematic as it means more and more memory will be consumed and cannot be freed as plug-ins are added and removed, which eventually may lead to an OutOfMemoryError. So how could you detect and analyse this situation? An OutOfMemoryError in production would surely be an indicator that there must be a memory or class loader leak somewhere. It’s also a good idea to regularly examine JFR recording files (e.g. in your testing or staging environment): the absence of any class unloading event despite the removal of plug-ins should trigger an investigation.\nAs far as analysing the situation is concerned, examining a heap dump of the application will typically yield insight into the cause rather quickly. Take a heap dump using jcmd as shown above, then load the dump into a tool such as Eclipse MAT. In Eclipse MAT, the \u0026#34;Duplicate Classes\u0026#34; action is a great starting point. 
If one class has been loaded by multiple class loaders, but failed to unload, it’s a pretty strong indicator that something is wrong:\nThe next step is to analyse the shortest path from the involved class loaders to a GC root:\nSome object on that path must hold on to a reference to a class or the class loader of the removed plug-in, preventing the loader to be GC-ed. In the case at hand, it’s the leakingPlugins field in the PluginRegistry class, to which each plug-in is added upon addition of the layer, but then apparently its coffee-deprived author forgot to remove the plug-in from that collection within the pluginRemoved() event handler ;)\nAs a quick side note, there’s a really cool plug-in for Eclipse MAT written by Vladimir Sitnikov, which allows you to query heap dumps using SQL. It maps each class to its own \u0026#34;table\u0026#34;, so that e.g. classes loaded more than once could be selected using the following SQL query on the java.lang.Class class:\n1 2 3 4 5 6 7 8 9 10 11 12 select c.name, listagg(toString(c.\u0026#34;@classLoader\u0026#34;)) as \u0026#39;loaders\u0026#39;, count(*) as \u0026#39;count\u0026#39; from \u0026#34;java.lang.Class\u0026#34; c where c.name \u0026lt;\u0026gt; \u0026#39;\u0026#39; group by c.name having count(*) \u0026gt; 1 Resulting in the same list of classes as above:\nThis could come in very handy for more advanced heap dump analyses, which cannot be done using Eclipse MAT’s built-in query capabilities.\nLearning More Via module layers, JPMS provides the foundation for dynamic plug-in architectures, as demonstrated by Layrry. Removing layers at runtime requires some care and consideration, so to avoid class loader leaks which eventually may lead to OutOfMemoryErrors. 
As so often, JDK Mission Control, JFR, and Eclipse MAT prove to be invaluable tools in the box of every Java developer, helping to ensure class unloading in your layered applications is done correctly, and if it is not, helping to understand and fix the underlying issue.\nHere are some more resources about class unloading and analysing class loader leaks:\nShenandoah GC in JDK 14, Part 2: Concurrent roots and class unloading: A blog post touching on class unloading in Shenandoah by Roman Kennke\nZGC Concurrent Class Unloading: A conference talk by Erik Österlund\nclass loader leaks: A series of blog posts by Mattias Jiderhamn\nClassLoader \u0026amp; memory leaks: a Java love story: A post about heap dump analysis by Aloïs Micard\nLastly, if you’d like to explore the dynamic addition and removal of JPMS layers to a running application yourself, the modular-tiles demo app is a great starting point. Its source code can be found on GitHub.\n","id":168,"publicationdate":"Oct 14, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_do_we_really_need_plug_ins\"\u003eDo We Really Need Plug-ins?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_plug_ins_in_layered_java_applications\"\u003ePlug-ins in Layered Java Applications\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_class_unloading_in_practice\"\u003eClass Unloading in Practice\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_if_things_go_wrong\"\u003eIf Things Go Wrong\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_learning_more\"\u003eLearning More\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLayers are sort of the secret sauce of the Java platform module system (JPMS):\nby providing fine-grained control over how individual JPMS modules and 
their classes are loaded by the JVM,\nthey enable advanced usages like loading multiple versions of a given module, or dynamically adding and removing modules at application runtime.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"/blog/introducing-layrry-runner-and-api-for-modularized-java-applications/\"\u003eLayrry\u003c/a\u003e API and launcher provide a small plug-in API built on top of layers,\nwhich for instance can be used to dynamically add plug-ins contributing new views and widgets to a running JavaFX application.\nIf such a plug-in gets removed from the application again,\nall its classes need to be unloaded by the JVM, avoiding an ever-increasing memory consumption if for instance a plug-in gets updated multiple times.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this blog post I’m going to explore how to ensure classes from removed plug-in layers are unloaded in a timely manner,\nand how to find the culprit in case some class fails to be unloaded.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jpms","plugin-architecture"],"title":"Class Unloading in Layered Java Applications","uri":"https://www.morling.dev/blog/class-unloading-in-layered-java-applications/"},{"content":"","id":169,"publicationdate":"Oct 14, 2020","section":"tags","summary":"","tags":null,"title":"plugin-architecture","uri":"https://www.morling.dev/tags/plugin-architecture/"},{"content":" Lately I’ve been fascinated by the possibility of analysing the assembly code emitted by the Java JIT (just-in-time) compiler. So far I had only looked into Java class files using javap; diving into the world of assembly code feels a bit like Alice must have felt when falling down the rabbit hole into Wonderland.\nMy motivation for this exploration was trying to understand what is faster in Java: a switch statement over strings, or a lookup in a hash map. 
Solely looking at Java bytecode isn’t going far enough to answer this question, as the difference lies in the actual assembly statements executed on the CPU. I’ll keep the details around that for another time; in this post I’m just going to quickly share what I learned with regard to building a tool needed for this exercise, hsdis.\nhsdis is a disassembler library which can be used with the java runtime as well as tools such as JitWatch to analyse the code produced by the Java JIT compiler. For licensing reasons though it doesn’t come as a binary with the JDK. Instead, you need to build it yourself from source. Instructions for doing so are spread across a few different places, but I couldn’t find any 100% current information, in particular as OpenJDK has moved to git and GitHub just recently.\nSo here is what you need to do in order to build hsdis for OpenJDK 15; I’m running on macOS, so slightly different steps may apply on other platforms. First, get the OpenJDK source code and check out the version for which you want to build hsdis:\ngit clone git@github.com:openjdk/jdk.git cd jdk git checkout jdk-15+36 # Current stable JDK 15 build The source location of hsdis has changed with the move from Mercurial to git:\ncd src/utils/hsdis In order to build hsdis, you’ll need the GNU Binutils, a collection of several binary tools:\nwget https://ftp.gnu.org/gnu/binutils/binutils-2.35.tar.gz tar xvf binutils-2.35.tar.gz Then run the actual hsdis build (macOS comes with all the required tools like make):\nmake BINUTILS=binutils-2.35 ARCH=amd64 This will take a few minutes; if all goes well, there’ll be an hsdis binary in the build directory; in my case this is build/macosx-amd64/hsdis-amd64.dylib. 
Copy the library to lib/server of your JDK:\nsudo cp build/macosx-amd64/hsdis-amd64.dylib $JAVA_HOME/lib/server If you’re on Linux, you can also provide the hsdis tool via the LD_LIBRARY_PATH environment variable:\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:path/to/hsdis/build/linux-amd64 Unfortunately, this won’t work on current macOS versions due to the System Integrity Protection (SIP) feature. Thanks to Brice Dutheil for this tip!\nCongrats! You can now use the -XX:+PrintAssembly flag of the java command to examine the assembly code of your Java program. Let’s give it a try. Create a Java source file with the following contents:\npublic class PrintAssemblyTest { public static void main(String... args) { PrintAssemblyTest hello = new PrintAssemblyTest(); for (int i = 0; i \u0026lt;= 10_000_000; i++) { hello.hello(i); } } private void hello(int i) { if (i % 1_000_000 == 0) { System.out.println(\u0026#34;Hello, \u0026#34; + i); } } } Compile and run it like so:\njavac PrintAssemblyTest.java java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \\ -Xlog:class+load=info -XX:+LogCompilation \\ PrintAssemblyTest You should then find the assembly code of the hello() method somewhere in the output:\n============================= C2-compiled nmethod ============================== ----------------------------------- Assembly ----------------------------------- Compiled method (c2) 1409 106 4 PrintAssemblyTest::hello (20 bytes) total in heap [0x000000011e3fce90,0x000000011e3fd148] = 696 relocation [0x000000011e3fcfe8,0x000000011e3fcff8] = 16 main code [0x000000011e3fd000,0x000000011e3fd080] = 128 stub code [0x000000011e3fd080,0x000000011e3fd098] = 24 oops [0x000000011e3fd098,0x000000011e3fd0a0] 
= 8 metadata [0x000000011e3fd0a0,0x000000011e3fd0a8] = 8 scopes data [0x000000011e3fd0a8,0x000000011e3fd0d0] = 40 scopes pcs [0x000000011e3fd0d0,0x000000011e3fd140] = 112 dependencies [0x000000011e3fd140,0x000000011e3fd148] = 8 -------------------------------------------------------------------------------- [Constant Pool (empty)] -------------------------------------------------------------------------------- [Entry Point] # {method} {0x000000010d74c4b0} \u0026#39;hello\u0026#39; \u0026#39;(I)V\u0026#39; in \u0026#39;PrintAssemblyTest\u0026#39; # this: rsi:rsi = \u0026#39;PrintAssemblyTest\u0026#39; # parm0: rdx = int # [sp+0x30] (sp of caller) 0x000000011e3fd000: mov 0x8(%rsi),%r10d 0x000000011e3fd004: shl $0x3,%r10 0x000000011e3fd008: movabs $0x800000000,%r11 0x000000011e3fd012: add %r11,%r10 0x000000011e3fd015: cmp %r10,%rax 0x000000011e3fd018: jne 0x0000000116977100 ; {runtime_call ic_miss_stub} 0x000000011e3fd01e: xchg %ax,%ax [Verified Entry Point] 0x000000011e3fd020: mov %eax,-0x14000(%rsp) 0x000000011e3fd027: push %rbp 0x000000011e3fd028: sub $0x20,%rsp ;*synchronization entry ; - PrintAssemblyTest::hello@-1 (line 10) 0x000000011e3fd02c: movslq %edx,%r10 0x000000011e3fd02f: mov %edx,%r11d 0x000000011e3fd032: sar $0x1f,%r11d 0x000000011e3fd036: imul $0x431bde83,%r10,%r10 0x000000011e3fd03d: sar $0x32,%r10 0x000000011e3fd041: mov %r10d,%r10d 0x000000011e3fd044: sub %r11d,%r10d 0x000000011e3fd047: imul $0xf4240,%r10d,%r10d ;*irem {reexecute=0 rethrow=0 return_oop=0} ; - PrintAssemblyTest::hello@3 (line 10) 0x000000011e3fd04e: cmp %r10d,%edx 0x000000011e3fd051: je 0x000000011e3fd063 ;*ifne {reexecute=0 rethrow=0 return_oop=0} ; - PrintAssemblyTest::hello@4 (line 10) 0x000000011e3fd053: add $0x20,%rsp 0x000000011e3fd057: pop %rbp 0x000000011e3fd058: mov 0x110(%r15),%r10 0x000000011e3fd05f: test %eax,(%r10) ; {poll_return} 0x000000011e3fd062: retq 0x000000011e3fd063: mov %edx,%ebp 0x000000011e3fd065: sub %r10d,%ebp ;*irem {reexecute=0 rethrow=0 return_oop=0} ; - 
PrintAssemblyTest::hello@3 (line 10) 0x000000011e3fd068: mov $0xffffff45,%esi 0x000000011e3fd06d: mov %edx,(%rsp) 0x000000011e3fd070: data16 xchg %ax,%ax 0x000000011e3fd073: callq 0x0000000116979080 ; ImmutableOopMap {} ;*ifne {reexecute=1 rethrow=0 return_oop=0} ; - (reexecute) PrintAssemblyTest::hello@4 (line 10) ; {runtime_call UncommonTrapBlob} 0x000000011e3fd078: hlt 0x000000011e3fd079: hlt 0x000000011e3fd07a: hlt 0x000000011e3fd07b: hlt 0x000000011e3fd07c: hlt 0x000000011e3fd07d: hlt 0x000000011e3fd07e: hlt 0x000000011e3fd07f: hlt [Exception Handler] 0x000000011e3fd080: jmpq 0x0000000116a22d80 ; {no_reloc} [Deopt Handler Code] 0x000000011e3fd085: callq 0x000000011e3fd08a 0x000000011e3fd08a: subq $0x5,(%rsp) 0x000000011e3fd08f: jmpq 0x0000000116978ca0 ; {runtime_call DeoptimizationBlob} 0x000000011e3fd094: hlt 0x000000011e3fd095: hlt 0x000000011e3fd096: hlt 0x000000011e3fd097: hlt -------------------------------------------------------------------------------- Interpreting the output is left as an exercise for the astute reader ;-) A great resource for getting started doing so is the post PrintAssembly output explained! by Jean-Philippe Bempel.\nWith hsdis in place, you also can use the excellent JitWatch tool for analysing the assembly code, which e.g. 
not only provides an easy way to navigate from source code to byte code to assembly code, but also comes with helpful tooltips explaining the meaning of the different assembly mnemonics.\n","id":170,"publicationdate":"Oct 5, 2020","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eLately I’ve been fascinated by the possibility of analysing the assembly code emitted by the Java JIT (just-in-time) compiler.\nSo far I had only looked into Java class files using \u003cem\u003ejavap\u003c/em\u003e;\ndiving into the world of assembly code feels a bit like Alice must have felt when falling down the rabbit hole into Wonderland.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","openjdk","performance","build-tools"],"title":"Building hsdis for OpenJDK 15","uri":"https://www.morling.dev/blog/building-hsdis-for-openjdk-15/"},{"content":" Table of Contents Why JmFrX? How To Use JmFrX Customizing Event Formats How It Works Takeaways I’m excited to share the news about an open-source utility I’ve been working on lately: JmFrX, a tool for capturing JMX data with JDK Flight Recorder.\nWhen using JMX (Java Management Extensions), the Java platform’s standard for monitoring and managing applications, JmFrX allows you to periodically record the attributes from any JMX MBean into JDK Flight Recorder (JFR) files, which you then can analyse using JDK Mission Control (JMC).\nThis is useful for a number of reasons:\nYou can track changes to the values of JMX MBean attributes over time without resorting to external monitoring tools\nYou can analyze JMX data from offline JFR recording files in cases where you cannot directly connect to the running application\nYou can export JMX data as live data streams using the JFR event streaming API introduced in Java 14\nIn this blog post I’m going to explain how to use JmFrX for recording JMX data in your applications, point out some interesting JmFrX implementation details, and lastly discuss some potential steps for 
future development of the tool.\nWhy JmFrX? JDK Flight Recorder is a \u0026#34;low-overhead data collection framework for troubleshooting Java applications and the HotSpot JVM\u0026#34;. In combination with the JDK Mission Control client application it allows to gain deep insights into the performance characteristics of Java applications.\nIn addition to the built-in metrics and event types, JFR also allows to define and emit custom event types. JFR got open-sourced in JDK 11; since then, developers in the Java eco-system began to support this, enabling users to work with JFR and JMC for analyzing the runtime behavior of 3rd party libraries and frameworks. For instance, JUnit 5.7 produces JFR events related to the execution lifecycle of unit tests.\nAt the same time, many library authors are not (yet) in a position where they could easily emit JFR events from their tools, as for instance they might wish to keep compatibility with older Java versions. They might already expose JMX MBeans though which often provide fine-grained information about the execution state of Java applications. This is where JmFrX comes in: by periodically capturing the attribute values from a given set of JMX MBeans, it allows to capture this information in JFR recordings.\nJmFrX isn’t the first effort that seeks to bridge JMX and JFR; JDK Mission Control project lead Marcus Hirt discusses a similar project in a blog post in 2016. But unlike the implementation described by Marcus in this post, JmFrX is based on the public and supported APIs for defining, configuring and emitting JFR events, as available since OpenJDK 11.\nHow To Use JmFrX In order to use JmFrX, make sure to run OpenJDK 11 or newer. OpenJDK 8 also contains the open-sourced Flight Recorder bits as of release 8u262 (from July this year); so this should work, too, but I haven’t tested it yet.\nUntil a stable release will be provided, you can obtain JmFrX snapshot builds via JitPack. 
For that, add the JitPack repository to your pom.xml when using Apache Maven (or apply equivalent configuration for your preferred build tool):\n1 2 3 4 5 6 7 8 ... \u0026lt;repositories\u0026gt; \u0026lt;repository\u0026gt; \u0026lt;id\u0026gt;jitpack.io\u0026lt;/id\u0026gt; \u0026lt;url\u0026gt;https://jitpack.io\u0026lt;/url\u0026gt; \u0026lt;/repository\u0026gt; \u0026lt;/repositories\u0026gt; ... Then add the JmFrX dependency:\n1 2 3 4 5 6 7 ... \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;com.github.gunnarmorling\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jmfrx\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;master-SNAPSHOT\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; ... The next step is registering the JmFrX event type with JFR in the start-up routine of your program. This could for instance be done in the main() method, the static initializer of a class loaded early on, an eagerly initialized Spring or CDI bean, etc. A Java agent for this purpose will be provided as part of this project soon.\nWhen building applications with Quarkus, you could use an application start-up event like so:\n1 2 3 4 5 6 7 8 9 10 11 @ApplicationScoped public class EventRegisterer { public void registerEvent(@Observes StartupEvent se) { Jmfrx.getInstance().register(); } public void unregisterEvent(@Observes ShutdownEvent se) { Jmfrx.getInstance().unregister(); } } Now start your application and create a JFR configuration file which enables the JmFrX event type. To do so, open JDK Mission Control, and choose your running application in the JVM Browser. 
Then perform these steps:\nRight-click the target JVM → Select Start Flight Recording…\nClick on Template Manager\nCopy the Continuous setting and click Edit for modifying this copy\nExpand the JMX and JMX Dump nodes\nMake sure the JMX Dump event type is Enabled; choose a period for dumping the chosen JMX MBeans (by default 60 s) and specify the MBeans whose data should be captured; that’s done by means of a regular expression, which matches one or more JMX object names, for instance .*OperatingSystem.*:\nClose the last two dialogs by clicking OK in each\nImportant: Make sure that the template you edited is selected under Event settings\nClick Finish to begin the recording\nOnce the recording is complete, open the recording file in JDK Mission Control and go to the Event Browser. You should see periodic events corresponding to the selected MBeans under the JMX node:\nWhen initiating recordings not via JDK Mission Control but via the jcmd utility on the command line, follow the same steps as above for creating a configuration. But then, instead of starting the recording, export the configuration file from the template manager and specify its name to jcmd via the settings=/path/to/settings.jfc parameter.\nNow using JmFrX to observe JMX data from the java.lang MBeans like Runtime and OperatingSystem in JFR isn’t too exciting yet, as there are dedicated JFR event types which contain most of that information. But things get more interesting when capturing data from custom MBean types, as e.g. here for the stream threads metrics from a Kafka Streams application:\nCustomizing Event Formats By default, JmFrX will propagate the raw attribute values from a JMX MBean to the corresponding JFR event. This makes sure that all the information can be retrieved from recordings, but the data format can be a bit unwieldy, e.g. 
when it comes to data amounts in bytes, or time periods in milliseconds since epoch.\nTo address this, JFR supports a range of metadata annotations such as @DataAmount, @Timespan, or @Percentage, which allow you to format event attributes. This information then is used by JMC for instance when displaying events in the browser (see event Properties to the left in the screenshot above).\nJmFrX integrates with this metadata facility via the notion of event profiles, which describe the data format of one MBean type and its attributes. When creating an event for a given JMX MBean, JmFrX will look for a corresponding event profile and apply its settings. Event profiles are defined by implementing the EventProfileContributor SPI. As an example, here’s a subset of the built-in profile definition for the OperatingSystem MBean:\npublic class JavaLangEventProfileContributor implements EventProfileContributor { @Override public void contributeProfiles(EventProfileBuilder builder) { builder.addEventProfile(\u0026#34;java.lang:type=OperatingSystem\u0026#34;) (1) .addAttributeProfile(\u0026#34;TotalSwapSpaceSize\u0026#34;, long.class, new AnnotationElement(DataAmount.class, DataAmount.BYTES), (2) v -\u0026gt; v) .addAttributeProfile(\u0026#34;FreeSwapSpaceSize\u0026#34;, long.class, new AnnotationElement(DataAmount.class, DataAmount.BYTES), v -\u0026gt; v) (3) .addAttributeProfile(\u0026#34;CpuLoad\u0026#34;, double.class, new AnnotationElement(Percentage.class), v -\u0026gt; v) .addAttributeProfile(\u0026#34;ProcessCpuLoad\u0026#34;, double.class, new AnnotationElement(Percentage.class), v -\u0026gt; v) .addAttributeProfile(\u0026#34;SystemCpuLoad\u0026#34;, double.class, new AnnotationElement(Percentage.class), v -\u0026gt; v) .addAttributeProfile(\u0026#34;ProcessCpuTime\u0026#34;, long.class, new AnnotationElement(Timespan.class, Timespan.NANOSECONDS), v -\u0026gt; v ); } } 1 Profiles are linked via the MBean name 
2 The attribute type is specified via an AnnotationElement for one of the JFR type metadata annotations 3 If needed, the actual value can be modified too, e.g. to convert it into another data type, or to shift its value into an expected range (for instance 0 to 1 for percentage values) Once you’ve defined the event profiles for your MBean type(s), don’t forget to register the contributor type as a service implementation in your module-info.java descriptor (when building a modular Java application):\nmodule com.example { requires jdk.jfr; requires dev.morling.jmfrx; provides dev.morling.jmfrx.spi.EventProfileContributor with com.example.MyEventProfileContributor; } When building an application using the traditional classpath, register the names of all profile contributors in the META-INF/services/dev.morling.jmfrx.spi.EventProfileContributor file.\nThere’s a small (yet hopefully growing) set of event profiles built into JmFrX. But as event profile contributors are discovered using the Java service loader mechanism, you can also easily plug in event profiles for other MBean types, e.g. for the JMX MBeans of Apache Kafka or Kafka Connect, or application servers like WildFly.\nAlso your pull requests for contributing event profiles for common JMX applications to JmFrX itself will be very welcome!\nHow It Works If you solely want to use JmFrX, you can pretty much stop reading this post at this point. But if you’re curious about how it works internally, stay with me for a bit longer: JmFrX uses two lesser-known JFR features which also might be interesting for your own application-specific event types, periodic JFR events and dynamic event types.\nUnlike most JFR event types which are emitted when some specific JVM or application functionality is executed, periodic events are produced at a regular interval. 
The default interval (which can be overridden by the user) is specified using the @Period annotation on the event type definition:\n1 2 3 4 5 6 7 8 9 10 11 12 @Name(JmxDumpEvent.NAME) @Label(\u0026#34;JMX Dump\u0026#34;) @Category(\u0026#34;JMX\u0026#34;) @Description(\u0026#34;Periodically dumps specific JMX MBeans\u0026#34;) @StackTrace(false) @Period(\u0026#34;60 s\u0026#34;) public class JmxDumpEvent extends Event { public static final String NAME = \u0026#34;dev.morling.jmfrx.JmxDumpEvent\u0026#34;; // event implementation ... } Upon application start-up, JmFrX registers this event type with the JFR environment:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ... private Runnable hook; public void register() { hook = () -\u0026gt; { (1) JmxDumpEvent dumpEvent = new JmxDumpEvent(); if (!dumpEvent.isEnabled()) { return; } dumpEvent.begin(); // retrieve data from matching MBean(s) and create event(s) ... dumpEvent.commit(); }; FlightRecorder.addPeriodicEvent(JmxDumpEvent.class, hook); (2) } public void unregister() { FlightRecorder.removePeriodicEvent(hook); (3) } ... 1 The event hook implementation 2 Register the periodic event 3 Unregister the periodic event The regular expression for specifying the MBean name(s) is passed to the event type as a SettingControl. You can learn more about event settings in my post on custom JFR event types.\nWhen the periodic event hook runs, it must create one event for each captured MBean. As JmFrX cannot know which MBean(s) you’re interested in, it’s not an option to pre-define these event types and their structure.\nThis is where dynamic JFR event types come in: Using the EventFactory class, event types can be defined at runtime. Under the covers, JFR will create a corresponding Event sub-class dynamically using the ASM API. Here’s the relevant JmFrX code which defines the event type for a given MBean:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ... 
public static EventDescriptor getDescriptorFor(String mBeanName) { MBeanServer mbeanServer = ManagementFactory.getPlatformMBeanServer(); try { ObjectName objectName = new ObjectName(mBeanName); MBeanInfo mBeanInfo = mbeanServer.getMBeanInfo(objectName); List\u0026lt;AnnotationElement\u0026gt; eventAnnotations = Arrays.asList( (1) new AnnotationElement(Category.class, getCategory(objectName)), new AnnotationElement(StackTrace.class, false), new AnnotationElement(Name.class, getName(objectName)), new AnnotationElement(Label.class, getLabel(objectName)), new AnnotationElement(Description.class, mBeanInfo.getDescription()) ); List\u0026lt;AttributeDescriptor\u0026gt; fields = getFields(objectName, mBeanInfo); List\u0026lt;ValueDescriptor\u0026gt; valueDescriptors = fields.stream() (2) .map(AttributeDescriptor::getValueDescriptor) .collect(Collectors.toList()); return new EventDescriptor(EventFactory.create(eventAnnotations, valueDescriptors), fields); } catch (Exception e) { throw new RuntimeException(e); } } ... 1 Define event metadata like name, label, category etc. via the JFR metadata annotations 2 For each MBean attribute, an attribute is added to the event type; its definition is based on the information in the corresponding event profile, if present The actual implementation is slightly more complex, as it deals with integrating metadata from JmFrX event profiles and more. You can find the complete code in the EventProfile class.\nTakeaways JmFrX is a small utility which allows you to capture JMX data with JDK Flight Recorder. It’s open-source (Apache License, version 2); you can find the source code on GitHub. 
With the wide usage of JMX for application monitoring in the Java world, JmFrX can help to bring that information into JFR recordings, making it available for offline investigations and analyses.\nPotential next steps for JmFrX include more meaningful handling of tabular and composite JMX data, adding a Java agent for registering the event type, providing some more built-in event profiles and publishing a stable release on Maven Central. Eventually, the JmFrX project might move over to the rh-jmc-team GitHub organization, which is managed by Red Hat’s OpenJDK team and contains many other very useful projects around JDK Flight Recorder and Mission Control.\nYour feedback on and contributions to JmFrX will be very welcome!\n","id":171,"publicationdate":"Aug 18, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_why_jmfrx\"\u003eWhy JmFrX?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_how_to_use_jmfrx\"\u003eHow To Use JmFrX\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_customizing_event_formats\"\u003eCustomizing Event Formats\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_how_it_works\"\u003eHow It Works\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_takeaways\"\u003eTakeaways\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eI’m excited to share the news about an open-source utility I’ve been working on lately:\n\u003ca href=\"https://github.com/gunnarmorling/jmfrx\"\u003eJmFrX\u003c/a\u003e,\na tool for capturing JMX data with JDK Flight Recorder.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhen using JMX (\u003ca href=\"https://en.wikipedia.org/wiki/Java_Management_Extensions\"\u003eJava Management Extensions\u003c/a\u003e), the Java platform’s 
standard for monitoring and managing applications,\nJmFrX allows you to periodically record the attributes from any JMX MBean into \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) files,\nwhich you then can analyse using \u003ca href=\"https://openjdk.java.net/projects/jmc/\"\u003eJDK Mission Control\u003c/a\u003e (JMC).\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","jmx","monitoring"],"title":"Introducing JmFrX: A Bridge From JMX to JDK Flight Recorder","uri":"https://www.morling.dev/blog/introducing-jmfrx-a-bridge-from-jmx-to-jdk-flight-recorder/"},{"content":"","id":172,"publicationdate":"Aug 18, 2020","section":"tags","summary":"","tags":null,"title":"jmx","uri":"https://www.morling.dev/tags/jmx/"},{"content":" Table of Contents Why Serverless? Solution Overview Data Extraction Search Backend Implementation Apache Lucene in a GraalVM Native Binary The Search HTTP Service Wiring Things Up Deployment to AWS Lambda Building Quarkus Applications for AWS Lambda Identity and Access Management Performance Cost Control Wrap-Up and Outlook I have built a custom search functionality for this blog, based on Java and the Apache Lucene full-text search library, compiled into a native binary using the Quarkus framework and GraalVM. It is deployed as a Serverless application running on AWS Lambda, providing search results without any significant cold start delay. If you thought Java wouldn’t be the right language for this job, keep reading; in this post I’m going to give an overview of the implementation of this feature and what I learned along the way.\nHaving a search functionality for my blog has been on my mind for quite some time; I’d like to give users the opportunity to find specific contents on this blog right here on this site, without having to use an external search engine. 
That’s not only nice in terms of user experience, but also having insight into the kind of information readers look for on this blog should help me to identify interesting things to write about in the future.\nNow this blog is a static site (generated using Hugo, hosted on GitHub Pages), which makes this an interesting challenge. I didn’t want to rely on an external search service (see \u0026#34;Why No External Search Service\u0026#34; below for the reasoning), and also a purely client-side solution as described in this excellent blog post didn’t seem ideal. While technically fascinating, I didn’t like the fact that it requires shipping the entire search index to the client for executing search queries. Also things like result highlighting, customized result scoring, word stemming, fuzzy search and more seemed a bit more than I’d be willing to implement on the client.\nAll these issues have largely been solved on the server-side by libraries such as Apache Lucene for quite some time. Using a library like Lucene means implementing a custom server-side process, though. How to deploy such a service? Operating a VM 24/7 with my search backend for what’s likely going to be not more than a few dozen queries per month seemed a bit like overkill.\nSo after some consideration I decided to implement my own search functionality, based on the highly popular Apache Lucene library, deployed as a Serverless application, which is started on-demand if a user runs a query on my website. In the remainder of this post I’m going to describe the solution I came up with and how it works.\nIf you like, you can try it out right now; this post is about this little search input control at the top right of this page!\nWhy No External Search Service? When tweeting about my serverless search experiment, one of the questions was \u0026#34;What’s wrong with Algolia?\u0026#34;. To be very clear, there’s nothing wrong with it at all. 
External search services like Algolia, Google Custom Search, or an Elasticsearch provider such as Bonsai promise an easy-to-use, turn-key search functionality which can be a great choice for your specific use case.\nHowever, I felt that none of these options would provide me the degree of control and customizability I was after. I also ruled out any \u0026#34;free\u0026#34; options, as they’d either mean having ads or paying for the service with my own data or that of my readers. And to be honest, I also just fancied the prospect of solving the problem by myself, instead of relying on an off-the-shelf solution.\nWhy Serverless? First of all, let’s discuss why I opted for a Serverless solution. It boils down to three reasons:\nSecurity: While it’d only cost a few EUR per month to set up a VM with a cloud provider like Digital Ocean or Hetzner, having to manage a full operating system installation would require too much of my attention; I don’t want someone to mine bitcoins or do other nasty things on a box I run, just because I failed to apply some security patch\nCost: Serverless does not only promise to scale out (and let’s be honest, there likely won’t be millions of search queries on my blog every month), but also scale-to-zero. As Serverless is pay-per-use and there are free tiers in place e.g. for AWS Lambda, this service ideally should cost me just a few cents per month\nLearning Opportunity: Last but not least, this also should be a nice occasion for me to dive into the world of Serverless, by means of designing, developing and running a solution for a real-world problem, exploring how Java as my preferred programming language can be used for this task\nSolution Overview The overall idea is quite simple: there’s a simple HTTP service which takes a query string, runs the query against a Lucene index with my blog’s contents and returns the search results to the caller. 
This service gets invoked via JavaScript from my static blog pages, where results are shown to the user.\nThe Lucene search index is read-only and gets rebuilt whenever I update the blog. It’s baked into the search service deployment package, which that way becomes fully immutable. This reduces complexity and the attack surface at runtime. Surely that’s not an approach that’s viable for more dynamic use cases, but for a blog that’s updated every few weeks, it’s perfect. Here’s a visualization of the overall flow:\nThe search service is deployed as a Serverless function on AWS Lambda. One important design goal for me is to avoid lock-in to any specific cloud provider: the solution should be portable and also be usable with container-based Serverless approaches like Knative.\nRelying on a Serverless architecture means its start-up time must be a matter of milliseconds rather than seconds, so as not to have a user wait for a noticeable amount of time in case of a cold start. While substantial improvements have been made in recent Java versions to improve start-up times, it’s still not ideal for this kind of use case. Therefore, the application is compiled into a native binary via Quarkus and GraalVM, which results in a start-up time of ~30 ms on my laptop, and ~180 ms when deployed to AWS Lambda. With that we’re in a range where a cold start won’t impact the user experience in any significant way.\nThe Lambda function is exposed to callers via the AWS API Gateway, which takes incoming HTTP requests, maps them to calls of the function and converts its response into an HTTP response which is sent back to the caller.\nNow let’s dive down a bit more into the specific parts of the solution. 
Overall, there are four steps involved:\nData extraction: The blog contents to be indexed must be extracted and converted into an easy-to-process data format\nSearch backend implementation: A small HTTP service is needed which exposes the search functionality of Apache Lucene, which in particular requires some steps to enable Lucene being used in a native GraalVM binary\nIntegration with the website: The search service must be integrated into the static site on GitHub Pages\nDeployment: Finally, the search service needs to be deployed to AWS API Gateway and Lambda\nData Extraction The first step was to obtain the contents of my blog in an easily processable format. Instead of requiring something like a real search engine’s crawler, I essentially only needed to have a single file in a structured format which then can be passed on to the Lucene indexer.\nThis task proved rather easy with Hugo; by means of a custom output format it’s straight-forward to produce a JSON file which contains the text of all my blog pages. 
In my config.toml I declared the new output format and activate it for the homepage (largely inspired by this write-up):\n1 2 3 4 5 6 7 8 [outputFormats.SearchIndex] mediaType = \u0026#34;application/json\u0026#34; baseName = \u0026#34;searchindex\u0026#34; isPlainText = true notAlternative = true [outputs] home = [\u0026#34;HTML\u0026#34;,\u0026#34;RSS\u0026#34;, \u0026#34;SearchIndex\u0026#34;] The template in layouts/_default/list.searchindex.json isn’t too complex either:\n1 2 3 4 5 {{- $.Scratch.Add \u0026#34;searchindex\u0026#34; slice -}} {{- range $index, $element := .Site.Pages -}} {{- $.Scratch.Add \u0026#34;searchindex\u0026#34; (dict \u0026#34;id\u0026#34; $index \u0026#34;title\u0026#34; $element.Title \u0026#34;uri\u0026#34; $element.Permalink \u0026#34;tags\u0026#34; $element.Params.tags \u0026#34;section\u0026#34; $element.Section \u0026#34;content\u0026#34; $element.Plain \u0026#34;summary\u0026#34; $element.Summary \u0026#34;publicationdate\u0026#34; ($element.Date.Format \u0026#34;Jan 2, 2006\u0026#34;)) -}} {{- end -}} {{- $.Scratch.Get \u0026#34;searchindex\u0026#34; | jsonify -}} The result is this JSON file:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 [ ... 
{ \u0026#34;content\u0026#34;: \u0026#34;The JDK Flight Recorder (JFR) is an invaluable tool...\u0026#34;, \u0026#34;id\u0026#34;: 12, \u0026#34;publicationdate\u0026#34;: \u0026#34;Jan 29, 2020\u0026#34;, \u0026#34;section\u0026#34;: \u0026#34;blog\u0026#34;, \u0026#34;summary\u0026#34;: \u0026#34;\\u003cdiv class=\\\u0026#34;paragraph\\\u0026#34;\\u003e\\n\\u003cp\\u003eThe \\u003ca href=\\\u0026#34;https://openjdk.java.net/jeps/328\\\u0026#34;\\u003eJDK Flight Recorder\\u003c/a\\u003e (JFR) is an invaluable tool...\u0026#34;, \u0026#34;tags\u0026#34;: [ \u0026#34;java\u0026#34;, \u0026#34;monitoring\u0026#34;, \u0026#34;microprofile\u0026#34;, \u0026#34;jakartaee\u0026#34;, \u0026#34;quarkus\u0026#34; ], \u0026#34;title\u0026#34;: \u0026#34;Monitoring REST APIs with Custom JDK Flight Recorder Events\u0026#34;, \u0026#34;uri\u0026#34;: \u0026#34;https://www.morling.dev/blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/\u0026#34; }, ... ] This file gets automatically updated whenever I republish the blog.\nSearch Backend Implementation My stack of choice for this kind of application is Quarkus. As a contributor, I am of course biased, but Quarkus is ideal for the task at hand: built and optimized from the ground up for implementing fast-starting and memory-efficient cloud-native and Serverless applications, it makes building HTTP services, e.g. based on JAX-RS, running on GraalVM a trivial effort.\nNow typically a Java library such as Lucene will not run in a GraalVM native binary out-of-the-box. Things like reflection or JNI usage require specific configuration, while other Java features like method handles are only supported partly or not at all.\nApache Lucene in a GraalVM Native Binary Quarkus enables a wide range of popular Java libraries to be used with GraalVM, but at this point there’s no extension yet which would take care of Lucene. So I set out to implement a small Quarkus extension for Lucene. 
Depending on the implementation details of the library in question, this can be a more or less complex and time-consuming endeavor. The workflow is like so:\ncompile down an application using the library into a native image\nrun into some sort of exception, e.g. due to types accessed via Java reflection (which causes the GraalVM compiler to miss them during call flow analysis so that they are missing from the generated binary image)\nfix the issue e.g. by registering the types in question for reflection\nrinse and repeat\nThe good thing there is that the list of Quarkus extensions is constantly growing, so that you hopefully don’t have to go through this by yourself. Or if you do, consider publishing your extension via the Quarkus platform, saving others from the same work.\nFor my particular usage of Lucene, luckily I ran into only two issues. The first is the usage of method handles in the AttributeFactory class for dynamically instantiating sub-classes of the AttributeImpl type, which isn’t supported in that form by GraalVM. One way for dealing with this is to define substitutions, custom methods or classes which will override a specific original implementation. As an example, here’s one of the substitution classes I had to create:\n@TargetClass(className = \u0026#34;org.apache.lucene.util.AttributeFactory$DefaultAttributeFactory\u0026#34;) public final class DefaultAttributeFactorySubstitution { public DefaultAttributeFactorySubstitution() {} @Substitute public AttributeImpl createAttributeInstance(Class\u0026lt;? extends Attribute\u0026gt; attClass) { if (attClass == BoostAttribute.class) { return new BoostAttributeImpl(); } else if (attClass == CharTermAttribute.class) { return new CharTermAttributeImpl(); } else if (...) { ... 
} throw new UnsupportedOperationException(\u0026#34;Unknown attribute class: \u0026#34; + attClass); } } During native image creation, the GraalVM compiler will discover all substitute classes and apply their code instead of the original ones.\nThe other problem I ran into was the usage of method handles in the MMapDirectory class, which will be used by Lucene by default on Linux when obtaining a file-system backed index directory. I didn’t explore how to circumvent that; instead I opted for using the SimpleFSDirectory implementation which proved to work fine in my native GraalVM binary.\nWhile this was enough in order to get Lucene going in a native image, you might run into different issues when using other libraries with GraalVM native binaries. Quarkus comes with a rich set of so-called build items which extension authors can use in order to enable external dependencies on GraalVM, e.g. for registering classes for reflective access or JNI, adding additional resources to the image, and much more. I recommend you take a look at the extension author guide in order to learn more.\nBesides enabling Lucene on GraalVM, that Quarkus extension also does two more things:\nParse the previously extracted JSON file, build a Lucene index from that and store that index in the file system; that’s fairly standard Lucene procedure without anything noteworthy; I only had to make sure that the index fields are stored in their original form in the search index, so that they can be accessed at runtime when displaying fragments with the query hits\nRegister a CDI bean, which allows the index to be obtained at runtime via @Inject dependency injection from within the HTTP endpoint class\nA downside of creating binaries via GraalVM is the increased build time: creating a native binary for macOS via a locally installed GraalVM SDK takes about two minutes on my laptop. 
For creating a Linux binary to be used with AWS Lambda, I need to run the build in a Linux container, which takes about five minutes. But typically this task is only done once when actually deploying the application, whereas locally I’d work either with the Quarkus Dev Mode (which does a live reload of the application as its code changes) or test on the JVM. In any case it’s a price worth paying: only with start-up times in the range of milliseconds do on-demand Serverless cold starts, with the user waiting for a response, become an option.\nThe Search HTTP Service The actual HTTP service implementation for running queries is rather unspectacular; it’s based on JAX-RS and exposes a simple endpoint which can be invoked with a given query like so:\nhttp \u0026#34;https://my-search-service/search?q=java\u0026#34; HTTP/1.1 200 OK Connection: keep-alive Content-Length: 4930 Content-Type: application/json Date: Tue, 21 Jul 2020 17:05:00 GMT { \u0026#34;message\u0026#34;: \u0026#34;ok\u0026#34;, \u0026#34;results\u0026#34;: [ { \u0026#34;fragment\u0026#34;: \u0026#34;...plug-ins. In this post I\u0026amp;#8217;m going to explore how the \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; Platform Module System\u0026#39;s notion of module layers can be leveraged for implementing plug-in architectures on the JVM. We\u0026amp;#8217;ll also discuss how Layrry, a launcher and runtime for layered \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; applications, can help with this task. 
A key requirement...\u0026#34;, \u0026#34;publicationdate\u0026#34;: \u0026#34;Apr 21, 2020\u0026#34;, \u0026#34;title\u0026#34;: \u0026#34;Plug-in Architectures With Layrry and the \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; Module System\u0026#34;, \u0026#34;uri\u0026#34;: \u0026#34;https://www.morling.dev/blog/plugin-architectures-with-layrry-and-the-java-module-system/\u0026#34; }, { \u0026#34;fragment\u0026#34;: \u0026#34;...the current behavior indeed is not intended (see JDK-8236597) and in a future \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; version the shorter version of the code shown above should work. Wrap-Up In this blog post we\u0026amp;#8217;ve explored how invariants on \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; 14 record types can be enforced using the Bean Validation API. With just a bit...\u0026#34;, \u0026#34;publicationdate\u0026#34;: \u0026#34;Jan 20, 2020\u0026#34;, \u0026#34;title\u0026#34;: \u0026#34;Enforcing \u0026lt;b\u0026gt;Java\u0026lt;/b\u0026gt; Record Invariants With Bean Validation\u0026#34;, \u0026#34;uri\u0026#34;: \u0026#34;https://www.morling.dev/blog/enforcing-java-record-invariants-with-bean-validation/\u0026#34; }, ... ] } Internally it’s using Lucene’s MultiFieldQueryParser for parsing the query and running it against the \u0026#34;title\u0026#34; and \u0026#34;content\u0026#34; fields of the index. It is set to combine multiple terms using the logical AND operator by default (who ever would want the default of OR?), it supports phrase queries given in quotes, and a number of other query operators.\nQuery hits are highlighted using the FastVectorHighlighter highlighter and SimpleHTMLFormatter as a fallback (not all kinds of queries can be processed by FastVectorHighlighter). The highlighter wraps the matched search terms in the returned fragment in \u0026lt;b\u0026gt; tags, which are styled appropriately in my website’s CSS. I was prepared to do some adjustments to result scoring, but this wasn’t necessary so far. 
Title matches are implicitly ranked higher than content matches due to the shorter length of the title field values.\nImplementing the service using a standard HTTP interface instead of relying on specific AWS Lambda contracts is great in terms of local testing as well as portability: I can work on the service using the Quarkus Dev Mode and invoke it locally, without having to deploy it into some kind of Lambda test environment. It also means that should the need arise, I can take this service and run it elsewhere, without requiring any code changes. As I’ll discuss in a bit, Quarkus takes care of making this HTTP service runnable within the Lambda environment by means of a single dependency configuration.\nWiring Things Up Now it was time to hook up the search service into my blog. I wouldn’t want to have the user navigate to the URL of the AWS API Gateway in their browser; this means that the form with the search text input field cannot actually be submitted. Instead, the default form handling must be disabled, and the search string be sent via JavaScript to the API Gateway URL.\nThis means the search feature won’t work for users who have JavaScript disabled in their browser. I deemed this an acceptable limitation; in order to avoid unnecessary confusion and frustration, the search text input field is hidden in that case via CSS:\n\u0026lt;noscript\u0026gt; \u0026lt;style type=\u0026#34;text/css\u0026#34;\u0026gt; .search-input { display:none; } \u0026lt;/style\u0026gt; \u0026lt;/noscript\u0026gt; The implementation of the backend call is fairly standard JavaScript business using the XMLHttpRequest API, so I’ll spare you the details here. You can find the complete implementation in my GitHub repo.\nThere’s one interesting detail to share though in terms of improving the user experience after a cold start. As mentioned above, the Quarkus application itself starts up on Lambda in ~180 ms. 
Together with the initialization of the Lambda execution environment I typically see ~370 ms for a cold start. Add to that the network round-trip times, and a user will feel a slight delay. Nothing dramatic, but it doesn’t have that snappy instant feeling you get when executing the search with a warm environment.\nThinking about the typical user interaction though, the situation can be nicely improved: if a visitor puts the focus onto the search text input field, it’s highly likely that they will submit a query shortly thereafter. We can take advantage of that and have the website send a small \u0026#34;ping\u0026#34; request right at the point when the input field obtains the focus. This gives us enough of a head start for the Lambda function to be started before the actual query comes in. Here’s the request flow of a typical interaction (the \u0026#34;Other\u0026#34; requests are CORS preflight requests):\nNote how the search call is issued only a few hundred ms after the ping. Now you could beat this e.g. when navigating to the text field using your keyboard and if you were typing really fast. But most users will use their mouse or touchpad to put the cursor into the input, and then change to the keyboard to enter the query, which is time enough for this little trick to work.\nThe analysis of the logs confirms that essentially all executed queries hit a warmed up Lambda function, making cold starts a non-issue. To avoid any unneeded warm-up calls, they are only done when entering the input field for the first time after loading the page, or when staying on the page for long enough, so that the Lambda might have shut down again due to lack of activity.\nOf course you’ll be charged for the additional ping requests, but for the volume I expect, this makes no relevant difference whatsoever.\nDeployment to AWS Lambda The last part of my journey towards a Serverless search function was deployment to AWS Lambda. 
I was exploring Heroku and Google Cloud Run as alternatives, too. Both allow you to deploy regular container images, which then are automatically scaled on demand. This results in great portability, as things hardly can get any more standard than plain Linux containers.\nWith Heroku, cold start times proved problematic, though: I observed 5 - 6 seconds, which completely ruled it out. This wasn’t a problem with Cloud Run, and it’d surely work very well overall. In the end I went for AWS Lambda, as its entire package of service runtime, API Gateway and web application firewall seemed more complete and mature to me.\nWith AWS Lambda, I observed cold start times of less than 0.4 sec for my actual Lambda function, plus the actual request round trip. Together with the warm-up trick described above, this means that a user practically never will get a cold start when executing the search.\nYou shouldn’t under-estimate the time needed though to get familiar with Lambda itself, the API Gateway which is needed for routing HTTP requests to your function and the interplay of the two.\nTo get started, I configured some playground Lambda and API in the web console, but eventually I needed something along the lines of infrastructure-as-code, i.e. a means of reproducible and automated steps for configuring and setting up all the required components. My usual go-to solution in this area is Terraform, but here I settled for the AWS Serverless Application Model (SAM), which is tailored specifically to setting up Serverless apps via Lambda and API Gateway and thus promised to be a bit easier to use.\nBuilding Quarkus Applications for AWS Lambda Quarkus supports multiple approaches for building Lambda-based applications:\nYou can directly implement Lambda’s APIs like RequestHandler, which I wanted to avoid though for the sake of portability between different environments and cloud providers\nYou can use the Quarkus Funqy API for building portable functions which e.g. 
can be deployed to AWS, Azure Functions and Google Cloud Functions; the API is really straight-forward and it’s a very attractive option, but right now there’s no way to use Funqy for implementing an HTTP GET API with request parameters, which ruled out this option for my purposes\nYou can implement your Lambda function using the existing and well-known HTTP APIs of Vert.x, RESTEasy (JAX-RS) and Undertow; in this case Quarkus will take care of mapping the incoming function call to the matching HTTP endpoint of the application\nUsed together with the proxy feature of the AWS API Gateway, the third option is exactly what I was looking for. I can implement the search endpoint using the JAX-RS API I’m familiar with, and the API Gateway proxy integration together with Quarkus\u0026#39; glue code will take care of everything else for running this. This is also great in terms of portability: I only need to add the io.quarkus:quarkus-amazon-lambda-http dependency to my project, and the Quarkus build will emit a function.zip file which can be deployed to AWS Lambda. I’ve put this into a separate Maven build profile, so I can easily switch between creating the Lambda function deployment package and a regular container image with my REST endpoint which I can deploy to Knative and environments like OpenShift Serverless, without requiring any code changes whatsoever.\nThe Quarkus Lambda extension also produces templates for the AWS SAM tool for deploying my stack. They are a good starting point which just needs a little bit of massaging; For the purposes of cost control (see further below), I added an API usage plan and API key. I also enabled CORS so that the API can be called from my static website. This made it necessary to disable the configuration of binary media types which the generated template contains by default. 
Lastly, I used a specific pre-configured execution role instead of the default AWSLambdaBasicExecutionRole.\nWith the SAM descriptor in place, re-building and publishing the search service becomes a procedure of three steps:\n1 2 3 4 5 6 7 8 9 10 mvn clean package -Pnative,lambda -DskipTests=true \\ -Dquarkus.native.container-build=true sam package --template-file sam.native.yaml \\ --output-template-file packaged.yaml \\ --s3-bucket \u0026lt;my S3 bucket\u0026gt; sam deploy --template-file packaged.yaml \\ --capabilities CAPABILITY_IAM \\ --stack-name \u0026lt;my stack name\u0026gt; The lambda profile takes care of adding the Quarkus Lambda HTTP extension, while the native profile makes sure that a native binary is built instead of a JAR to be run on the JVM. As I need to build a Linux binary for the Lambda function while running on macOS locally, I’m using the -Dquarkus.native.container-build=true option, which will make the Quarkus build running in a container itself, producing a Linux binary no matter which platform this build itself is executed on.\nThe function.zip file produced by the Quarkus build has a size of ~15 MB, i.e. it’s uploaded and deployed to Lambda in a few seconds. Currently it also contains the Lucene search index, meaning I need to run the time-consuming GraalVM build whenever I want to update the index. As an optimization I might at some point extract the index into a separate Lambda layer, which then could be deployed by itself, if there were no code changes to the search service otherwise.\nIdentity and Access Management A big pain point for me was identity and access management (IAM) for the AWS API Gateway and Lambda. 
While the AWS IAM is really powerful and flexible, there’s unfortunately no documentation, which would describe the minimum set of required permissions in order to deploy a stack like my search using SAM.\nThings work nicely if you use a highly-privileged account, but I’m a strong believer into running things with only the least privileges needed for the job. For instance I don’t want my Lambda deployer to set up the execution role, but rather have it using one I pre-defined. The same goes for other resources like the S3 bucket used for uploading the deployment package.\nIdentifying the set of privileges actually needed is a rather soul-crushing experience of trial and error (please let me know in the comments below if there’s a better way to do this), which gets complicated by the fact that different resources in the AWS stack expose insufficient privileges in inconsistent ways, or sometimes in no really meaningful way at all when configured via SAM. I spent hours identifying a lacking S3 privilege when trying to deploy a Lambda layer from the Serverless Application Repository.\nHoping to spare others from this tedious work, here’s the policy for my deployment role I came up with:\n{ \u0026#34;Version\u0026#34;: \u0026#34;2012-10-17\u0026#34;, \u0026#34;Statement\u0026#34;: [ { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;s3:PutObject\u0026#34;, \u0026#34;s3:GetObject\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:s3:::\u0026lt;deployment-bucket\u0026gt;\u0026#34;, \u0026#34;arn:aws:s3:::\u0026lt;deployment-bucket\u0026gt;/*\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;lambda:CreateFunction\u0026#34;, \u0026#34;lambda:GetFunction\u0026#34;, \u0026#34;lambda:GetFunctionConfiguration\u0026#34;, \u0026#34;lambda:AddPermission\u0026#34;, \u0026#34;lambda:UpdateFunctionCode\u0026#34;, \u0026#34;lambda:ListTags\u0026#34;, 
\u0026#34;lambda:TagResource\u0026#34;, \u0026#34;lambda:UntagResource\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:lambda:eu-central-1:\u0026lt;account-id\u0026gt;:function:search-morling-dev-SearchMorlingDev-*\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;iam:PassRole\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:iam::\u0026lt;account-id\u0026gt;:role/\u0026lt;execution-role\u0026gt;\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;cloudformation:DescribeStacks\u0026#34;, \u0026#34;cloudformation:DescribeStackEvents\u0026#34;, \u0026#34;cloudformation:CreateChangeSet\u0026#34;, \u0026#34;cloudformation:ExecuteChangeSet\u0026#34;, \u0026#34;cloudformation:DescribeChangeSet\u0026#34;, \u0026#34;cloudformation:GetTemplateSummary\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:cloudformation:eu-central-1:\u0026lt;account-id\u0026gt;:stack/search-morling-dev/*\u0026#34;, \u0026#34;arn:aws:cloudformation:eu-central-1:aws:transform/Serverless-2016-10-31\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;apigateway:POST\u0026#34;, \u0026#34;apigateway:PATCH\u0026#34;, \u0026#34;apigateway:GET\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:apigateway:eu-central-1::/restapis\u0026#34;, \u0026#34;arn:aws:apigateway:eu-central-1::/restapis/*\u0026#34; ] }, { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Action\u0026#34;: [ \u0026#34;apigateway:POST\u0026#34;, \u0026#34;apigateway:GET\u0026#34; ], \u0026#34;Resource\u0026#34;: [ \u0026#34;arn:aws:apigateway:eu-central-1::/usageplans\u0026#34;, \u0026#34;arn:aws:apigateway:eu-central-1::/usageplans/*\u0026#34;, \u0026#34;arn:aws:apigateway:eu-central-1::/apikeys\u0026#34;, 
\u0026#34;arn:aws:apigateway:eu-central-1::/apikeys/search-morling-dev-apikey\u0026#34; ] } ] } Perhaps this could be trimmed down some more, but I felt it’s good enough for my purposes.\nPerformance At this point I haven’t conducted any systematic performance testing yet. There’s definitely a significant difference in terms of latency between running things locally on my (not exactly new) laptop and on AWS Lambda. Where the app starts up in ~30 ms locally, it’s ~180 ms when deployed to Lambda. Note this is only the number reported by Quarkus itself, the entire cold start duration of the application on Lambda, i.e. including the time required for fetching the code to the execution environment and starting the container, is ~370 ms (with 256 MB RAM assigned). Due to the little trick described above, though, a visitor is very unlikely to ever experience this delay when executing a query.\nSimilarly, there’s a substantial difference in terms of request execution duration. Still, when running a quick test of the deployed service via Siege, the vast majority of Lambda executions clocked in well below 100 ms (depending on the number of query hits which need result highlighting), putting them into the lowest bracket of billed Lambda execution time. As I learned, Lambda allocates CPU resources proportionally to assigned RAM, meaning assigning twice as much RAM should speed up execution, also if my application actually does not need that much memory. Indeed, with 512 MB RAM assigned, Lambda execution is down to ~30 - 40 ms after some warm-up, which is more than good enough for my purposes.\nRaw Lambda execution of course is only one part of the overall request duration, on top of that some time is spent in the API Gateway and on the wire to the user; The service is deployed in the AWS eu-central-1 region (Frankfurt, Germany), yielding roundtrip times for me, living a few hundred km away, between 50 - 70 ms (again with 512 MB RAM). 
With longer distances, network latencies outweigh the Lambda execution time: My good friend Eric Murphy from Seattle in the US reported a roundtrip time of ~240 ms when searching for \u0026#34;Java\u0026#34;, which I think is still quite good, given the long distance.\nCost Control The biggest issue for me as a hobbyist when using pay-per-use services like AWS Lambda and API Gateway is cost control. Unlike typical enterprise scenarios where you might be willing to accept higher cost for your service in case of growing demand, in my case I’d rather set up a fixed spending limit and shut down my search service for the rest of the month, once that has been reached. I absolutely cannot have an attacker doing millions and millions of calls against my API which could cost me a substantial amount of money.\nUnfortunately, there’s no easy way on AWS for setting up a maximum spending after which all service consumption would be stopped. Merely setting up a budget alert won’t cut it either, as this won’t help me while sitting on a plane for 12h (whenever that will be possible again…​) or being on vacation for three weeks. And needless to say, I don’t have an ops team monitoring my blog infrastructure 24/7 either.\nSo what to do to keep costs under control? An API usage plan is the first part of the answer. It allows you to set up a quota (maximum number of calls in a given time frame) which is pretty much what I need. Any calls beyond the quota are rejected by the API Gateway and not charged.\nThere’s one caveat though: a usage plan is tied to an API key, which the caller needs to pass using the X-API-Key HTTP request header. The idea being that different usage plans can be put in place for different clients of an API. Any calls without the API key are not charged either. Unfortunately though this doesn’t play well with CORS preflight requests as needed in my particular use case. 
Such requests will be sent by the browser before the actual GET calls to validate that the server actually allows for that cross-origin request. CORS preflight requests cannot have any custom request headers, though, meaning they cannot be part of a usage plan. The AWS docs are unclear whether those preflight requests are charged or not, and in a way it seems unfair if they were charged given there’s no way to prevent this situation. But at this point it is fair to assume they are charged and we need a way to prevent having to pay for a gazillion preflight calls by a malicious actor.\nIn good software developer’s tradition I turned to Stack Overflow for finding help, and indeed I received a nice idea: A budget alert can be linked with an SNS topic, to which a message will be sent once the alert triggers. Then another Lambda function can be used to set the allowed rate of API invocations to 0, effectively disabling the API, preventing any further cost to pile up. A bit more complex than I was hoping for, but it does the trick. Thanks a lot to Harish for providing this nice answer on Stack Overflow and his blog! I implemented this solution and sleep much better now.\nNote that you should set the alert to a lower value than what you’re actually willing to spend, as billing happens asynchronously and requests might come in some more time until the alert triggers: as per Corey Quinn, there’s an \u0026#34;8-48 hour lag between \u0026#39;you incur the charge\u0026#39; and \u0026#39;it shows up in the billing system where an alert can see it and thus fire\u0026#39;\u0026#34;. It’s therefore also a good idea to reduce the allowed request rate. E.g. in my case I’m not expecting really that there’d be more than let’s say 25 concurrent requests (unless this post hits the Hackernews front page of course), so setting the allowed rate to that value helps to at least slow down the spending until the alert triggers.\nWith these measures in place, there should (hopefully!) 
be no bad surprises at the end of the month. Assuming a (very generously estimated) number of 10K search queries per month, each returning a payload of 5 KB, I’d be looking at an invoice over EUR 0.04 for the API Gateway, while the Lambda executions would be fully covered by the AWS free tier. That seems manageable :)\nWrap-Up and Outlook Having rolled out the search feature for this blog a few days ago, I’m really happy with the outcome. It was a significant amount of work to put everything together, but I think a custom search is a great addition to this site which hopefully proves helpful to my readers. Serverless is a perfect architecture and deployment option for this use case, being very cost-efficient for the expected low volume of requests, and providing a largely hands-off operations experience for myself.\nWith AOT compilation down to native binaries and enabling frameworks like Quarkus, Java definitely is in the game for building Serverless apps. Its huge eco-system of libraries such as Apache Lucene, sophisticated tooling and solid performance make it a very attractive implementation choice. Basing the application on Quarkus makes it a matter of configuration to switch between creating a deployment package for Lambda and a regular container image, avoiding any kind of lock-in into a specific platform.\nEnabling libraries for being used in native binaries can be a daunting task, but over time I’d expect either library authors themselves to do the required adjustment to smoothen that experience, and of course the growing number of Quarkus extensions also helps to use more and more Java libraries in native apps. I’m also looking forward to Project Leyden, which aims at making AOT compilation a part of the Java core platform.\nThe deployment to AWS Lambda and API Gateway was definitely more involved than I had anticipated; things like IAM and budget control are more complex than I think they could and should be. 
That there is no way to set up a hard spending cap is a severe shortcoming; hobbyists like myself should be able to explore this platform without having to fear any surprise AWS bills. It’s particularly bothersome that API usage plans are not a 100% safe way to enforce API quotas, as they cannot be applied to unauthorized CORS preflight requests and custom scripting is needed in order to close this loophole.\nBut then this experiment also was an interesting learning experience for me; working on libraries and integration solutions most of the time during my day job, I sincerely enjoyed the experience of designing a service from the ground-up and rolling it out into \u0026#34;production\u0026#34;, if I may dare to use that term here.\nWhile the search functionality is rolled out on my blog, ready for you to use, there are a few things I’d like to improve and expand going forward:\nCI pipeline: Automatically re-building and deploying the search service after changes to the contents of my blog; this should hopefully be quite easy using GitHub Actions\nPerformance improvements: While the performance of the query service definitely is good enough, I’d like to see whether and how it could be tuned here and there. Tooling might be challenging there; where I’d use JDK Flight Recorder and Mission Control with a JVM based application, I’m much less familiar with equivalent tooling for native binaries. One option I’d like to explore in particular is taking advantage of Quarkus’ bytecode recording capability: bytecode instructions for creating the in-memory data structure of the Lucene index could be recorded at build time and then just be executed at application start-up; this might be the fastest option for loading the index in my special use case of a read-only index\nServerless comments: Currently I’m using Disqus for the commenting feature of the blog. It’s not ideal in terms of privacy and page loading speed, which is why I’m looking for alternatives. 
One idea could be a custom Serverless commenting functionality, which would be very interesting to explore, in particular as it shifts the focus from a purely immutable application to a stateful service that’ll require some means of modifiable, persistent storage\nIn the meantime, you can find the source code of the Serverless search feature on GitHub. Feel free to take the code and deploy it to your own website!\nMany thanks to Hans-Peter Grahsl and Eric Murphy for their feedback while writing this post!\n","id":173,"publicationdate":"Jul 29, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_why_serverless\"\u003eWhy Serverless?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_solution_overview\"\u003eSolution Overview\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_data_extraction\"\u003eData Extraction\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_search_backend_implementation\"\u003eSearch Backend Implementation\u003c/a\u003e\n\u003cul class=\"sectlevel2\"\u003e\n\u003cli\u003e\u003ca href=\"#_apache_lucene_in_a_graalvm_native_binary\"\u003eApache Lucene in a GraalVM Native Binary\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_search_http_service\"\u003eThe Search HTTP Service\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wiring_things_up\"\u003eWiring Things Up\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_deployment_to_aws_lambda\"\u003eDeployment to AWS Lambda\u003c/a\u003e\n\u003cul class=\"sectlevel2\"\u003e\n\u003cli\u003e\u003ca href=\"#_building_quarkus_applications_for_aws_lambda\"\u003eBuilding Quarkus Applications for AWS Lambda\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_identity_and_access_management\"\u003eIdentity and Access 
Management\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_performance\"\u003ePerformance\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_cost_control\"\u003eCost Control\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up_and_outlook\"\u003eWrap-Up and Outlook\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph teaser\"\u003e\n\u003cp\u003eI have built a custom search functionality for this blog,\nbased on Java and the Apache Lucene full-text search library,\ncompiled into a native binary using the Quarkus framework and GraalVM.\nIt is deployed as a Serverless application running on AWS Lambda,\nproviding search results without any significant cold start delay.\nIf you thought Java wouldn’t be the right language for this job, keep reading;\nin this post I’m going to give an overview over the implementation of this feature and my learnings along the way.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","quarkus","serverless","graalvm"],"title":"How I Built a Serverless Search for My Blog","uri":"https://www.morling.dev/blog/how-i-built-a-serverless-search-for-my-blog/"},{"content":"","id":174,"publicationdate":"Jul 29, 2020","section":"tags","summary":"","tags":null,"title":"serverless","uri":"https://www.morling.dev/tags/serverless/"},{"content":"","id":175,"publicationdate":"Jun 11, 2020","section":"tags","summary":"","tags":null,"title":"appcds","uri":"https://www.morling.dev/tags/appcds/"},{"content":" Table of Contents Manually Creating CDS Archives Using the CDS Archive Creating CDS Archives in Your Maven Build What Do You Gain? Wrap-Up Ahead-of-time compilation (AOT) is the big topic in the Java ecosystem lately: by compiling Java code to native binaries, developers and users benefit from vastly improved start-up times and reduced memory usage. 
The GraalVM project made huge progress towards AOT-compiled Java applications, and Project Leyden promises to standardize AOT in a future version of the Java platform.\nThis makes it easy to miss out on significant performance improvements which have been made on the JVM in recent Java versions, in particular when it comes to faster start-up times. Besides a range of improvements related to class loading, linking and bytecode verification, substantial work has been done around class data sharing (CDS). Faster start-ups are beneficial in many ways: shorter turnaround times during development, quicker time-to-first-response for users in coldstart scenarios, cost savings when billed by CPU time in the cloud.\nWith CDS, class metadata is persisted in an archive file, which during subsequent application starts is mapped into memory. This is faster than loading the actual class files, resulting in reduced start-up times. When starting multiple JVM processes on the same host, read-only archives of class metadata can also be shared between the VMs, so that less memory is consumed overall.\nOriginally a partially commercial feature of the Oracle JDK, CDS was completely open-sourced in JDK 10 and got incrementally improved since then in a series of Java improvement proposals:\nJEP 310, Application Class-Data Sharing (AppCDS), in JDK 10: \u0026#34;To improve startup and footprint, extend the existing [CDS] feature to allow application classes to be placed in the shared archive\u0026#34;\nJEP 341, Default CDS Archives, in JDK 12: \u0026#34;Enhance the JDK build process to generate a class data-sharing (CDS) archive, using the default class list, on 64-bit platforms\u0026#34;\nJEP 350, Dynamic CDS Archives, in JDK 13: \u0026#34;Extend application class-data sharing to allow the dynamic archiving of classes at the end of Java application execution. 
The archived classes will include all loaded application classes and library classes that are not present in the default, base-layer CDS archive\u0026#34;\nIn the remainder of this blog post we’ll discuss how to automatically create AppCDS archives as part of your (Maven) project build, based on the improvements made with JEP 350. I.e. Java 13 or later is a prerequisite for this. To learn more about using CDS with the current LTS release JDK 11 and about CDS in general, refer to the excellent blog post on everything CDS by Nicolai Parlog.\nManually Creating CDS Archives At first let’s see what’s needed to manually create and use an AppCDS archive (note I’m going to use \u0026#34;AppCDS\u0026#34; and \u0026#34;CDS\u0026#34; somewhat interchangeably for the sake of brevity). Subsequently, we’ll discuss how the task can be automated in a Maven project build.\nTo have an example to work with which goes beyond a plain \u0026#34;Hello World\u0026#34;, I’ve created a small web application for managing personal to-dos, using the Quarkus stack. If you’d like to follow along, clone the repo and build the project:\n1 2 3 git clone git@github.com:gunnarmorling/quarkus-cds.git cd quarkus-cds mvn clean verify -DskipTests=true The application uses a Postgres database for persisting the to-dos; fire it up via Docker:\n1 2 3 4 5 6 cd compose docker run -d -p 5432:5432 --name pgdemodb \\ -v $(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql \\ -e POSTGRES_USER=todouser \\ -e POSTGRES_PASSWORD=todopw \\ -e POSTGRES_DB=tododb postgres:11 The next step is to run the application and create the CDS archive file. Do so by passing the -XX:ArchiveClassesAtExit option:\n1 2 java -XX:ArchiveClassesAtExit=target/app-cds.jsa \\ (1) -jar target/todo-manager-1.0.0-SNAPSHOT-runner.jar 1 Triggers creation of a CDS archive at the given location upon application shutdown Only loaded classes will be added to the archive. 
As classloading on the JVM happens lazily, you must invoke some functionality in your application in order to cause all the relevant classes to be loaded. For that to happen, open the application’s API endpoint in a browser or invoke it via curl, httpie or similar:\n1 http localhost:8080/api Stop the application by hitting Ctrl+C. This will create the CDS archive under target/app-cds.jsa. In our case it should have a size of about 41 MB. Also observe the log messages about classes which were skipped from archiving:\n1 2 3 4 5 ... [190.220s][warning][cds] Skipping java/lang/invoke/LambdaForm$MH+0x0000000800bd0c40: Hidden or Unsafe anonymous class [190.220s][warning][cds] Skipping java/lang/invoke/LambdaForm$DMH+0x0000000800fdc840: Hidden or Unsafe anonymous class [190.220s][warning][cds] Pre JDK 6 class not supported by CDS: 46.0 antlr/TokenStreamIOException ... Mostly this is about hidden or anonymous classes which cannot be archived; there’s not so much you can do about that (apart from using less Lambda expressions perhaps…​).\nThe hint on old classfile versions is more actionable: only classes using classfile format 50 (= JDK 1.6) or newer are supported by CDS. In the case at hand, the classes from Antlr 2.7.7 are using classfile format 46 (which was introduced in Java 1.2) and thus cannot be added to the CDS archive. 
Note this also applies to any subclasses, even if they themselves use a newer classfile format version.\nIt’s thus a good idea to check whether you can upgrade to newer versions of your dependencies, as this may result in more classes becoming available for CDS, resulting in better start-up times in turn.\nUsing the CDS Archive Now let’s run the application again, this time using the previously created CDS archive:\njava -XX:SharedArchiveFile=target/app-cds.jsa \\ (1) -Xlog:class+load:file=target/classload.log \\ (2) -Xshare:on \\ (3) -jar target/todo-manager-1.0.0-SNAPSHOT-runner.jar 1 The path to the CDS archive 2 Classloading logging makes it possible to verify whether the CDS archive gets applied as expected 3 While class data sharing is enabled by default on JDK 12 and newer, explicitly enforcing it will ensure an error is raised if something is wrong, e.g. a mismatch of Java versions between building and using the archive When examining the classload.log file, you should see how most class metadata is obtained from the CDS archive (\u0026#34;source: shared objects file\u0026#34;), while some classes such as the ancient Antlr classes are loaded just as usual from the corresponding JAR:\n[0.016s][info][class,load] java.lang.Object source: shared objects file [0.016s][info][class,load] java.io.Serializable source: shared objects file [0.016s][info][class,load] java.lang.Comparable source: shared objects file [0.016s][info][class,load] java.lang.CharSequence source: shared objects file ... [2.555s][info][class,load] antlr.Parser source: file:/.../antlr.antlr-2.7.7.jar ... Note it is vital that the exact same Java version is used as when creating the archive, otherwise an error will be raised. Unfortunately, this also means that AppCDS archives cannot be built cross-platform. This would be very useful, e.g. when building a Java application on macOS or Windows, which should be packaged in a Linux container. 
If you are aware of a way of doing so, please let me know in the comments below.\nCDS and the Java Module System Beginning with Java 11, not only classes from the classpath can be added to CDS archives, but also classes from the module path of a modularized Java application. One important detail to consider there is that the --upgrade-module-path and --patch-module options will cause CDS to be disabled, or disallowed if -Xshare:on is specified. This is to avoid a mismatch of class metadata in the CDS archive and classes brought in by a newer module version.\nCreating CDS Archives in Your Maven Build Manually creating a CDS archive is neither very efficient nor reliable, so let’s see how the task can be automated as part of your project build. The following shows the required configuration when using Apache Maven, but of course the same approach could be implemented with Gradle or any other build system.\nThe basic idea is to follow the same steps as before, but executed as part of the Maven build:\nstart up the application with the -XX:ArchiveClassesAtExit option\ninvoke some application functionality to initiate the loading of all relevant classes\nstop the application\nIt might appear as a compelling idea to produce the CDS archive as part of regular test execution, e.g. via JUnit. This will not work though, as the classpath at the time of using the CDS archive must not miss any entries from the classpath at the time of creating it. As during test execution all the test-scoped dependencies will be part of the classpath, any CDS archive created that way couldn’t be used when running the application later on without those test dependencies.\nSteps 1. and 3. can be automated with help of the Process-Exec Maven plug-in, binding it to the pre-integration-test and post-integration-test build phases, respectively. 
While I was thinking of using the more widely known Exec plug-in initially, this turned out to not be viable as there’s no way of stopping any forked process in a later build phase.\nHere’s the relevant configuration:\n... \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;com.bazaarvoice.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;process-exec-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;0.9\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; (1) \u0026lt;id\u0026gt;app-cds-creation\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;pre-integration-test\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;start\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;name\u0026gt;todo-manager\u0026lt;/name\u0026gt; \u0026lt;healthcheckUrl\u0026gt;http://localhost:8080/\u0026lt;/healthcheckUrl\u0026gt; (2) \u0026lt;arguments\u0026gt; \u0026lt;argument\u0026gt;java\u0026lt;/argument\u0026gt; (3) \u0026lt;argument\u0026gt;-XX:ArchiveClassesAtExit=app-cds.jsa\u0026lt;/argument\u0026gt; \u0026lt;argument\u0026gt;-jar\u0026lt;/argument\u0026gt; \u0026lt;argument\u0026gt; ${project.build.directory}/${project.artifactId}-${project.version}-runner.jar \u0026lt;/argument\u0026gt; \u0026lt;/arguments\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;execution\u0026gt; (4) \u0026lt;id\u0026gt;stop-all\u0026lt;/id\u0026gt; \u0026lt;phase\u0026gt;post-integration-test\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;stop-all\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... 
1 Start up the application in the pre-integration-test build phase 2 The health-check URL is used to await application start-up before proceeding with the next build phase 3 Assemble the java invocation 4 Stop the application in the post-integration-test build phase What remains to be done is the automation of step 2, the invocation of the required application logic so as to trigger the loading of all relevant classes. This can be done with help of the Maven Failsafe plug-in. A simple \u0026#34;integration test\u0026#34; via REST Assured does the trick:\npublic class ExampleResourceAppCds { @Test public void getAll() { given() .when() .get(\u0026#34;/api\u0026#34;) .then() .statusCode(200); } } We just need to configure a specific execution of the plug-in, which only picks up any test classes whose names end with *AppCds.java, so as to keep them apart from actual integration tests:\n... \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-failsafe-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.0.0-M4\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;integration-test\u0026lt;/goal\u0026gt; \u0026lt;goal\u0026gt;verify\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;includes\u0026gt; \u0026lt;include\u0026gt;**/*AppCds.java\u0026lt;/include\u0026gt; \u0026lt;/includes\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... And that’s all we need; when now building the project via mvn clean verify, a CDS archive will be created at target/app-cds.jsa. You can find the complete example project and steps for building/running it on GitHub.\nWhat Do You Gain? 
Creating a CDS archive is nice, but is it also worth the effort? In order to answer this question, I’ve done some measurements of the \u0026#34;time-to-first-response\u0026#34; metric, following the Quarkus guide on measuring performance. I.e. instead of awaiting some rather meaningless \u0026#34;start-up complete\u0026#34; status, which could arbitrarily be tweaked by means of lazy initialization, this measures the time until the application is actually ready to handle the first incoming request after start-up.\nI’ve done measurements on OpenJDK 1.8.0_252 (AdoptOpenJDK build), OpenJDK 14.0.1 (upstream build, without and with AppCDS), and OpenJDK 15-ea-b26 (upstream build, with AppCDS). Please see the README file of the example repo for the exact steps.\nHere are the numbers, averaged over ten runs each:\nUpdate, June 12th: I originally had classload logging enabled for the OpenJDK 14 AppCDS runs, which added an unnecessary overhead (thanks a lot to Claes Redestad for pointing this out!). The numbers and chart have been updated accordingly. I’ve also added numbers for OpenJDK 15-ea.\nTime-to-first-response values are 2s 267ms, 2s 162ms, 1s 483ms, and 1s 279ms. I.e. on my machine (2014 MacBook Pro), with this specific workload, there’s an improvement of ~100ms just by upgrading to the current JDK, and of another ~700ms by using AppCDS.\nWith OpenJDK 15 things will further improve. The latest EA build at the time of writing (b26) shortens time-to-first-response by another ~200ms. The upcoming EA build 27 should bring another improvement, as Lambda proxy classes will be added to AppCDS archives then.\nThat all is definitely a nice improvement, in particular as we get it essentially for free, without any changes to the actual application itself. You should contrast this with the additional size of the application distribution, though. E.g. 
when obtaining the application as a container image from a remote container registry, downloading the additional ~40 MB might take longer than the time saved during application start-up. Typically, this will only affect the first start-up on a particular node, though, after which the image will be cached locally.\nAs always when it comes to any kind of performance numbers, please take these numbers with a grain of salt, do your own measurements, using your own applications and in your own environment.\nAddressing Different Workload Profiles If your application supports different \u0026#34;work modes\u0026#34;, e.g. \u0026#34;online\u0026#34; and \u0026#34;batch\u0026#34;, which work with a largely differing set of classes, you might also consider creating different CDS archives for the specific workloads. This might give you a good balance between additional size and realized improvements of start-up times, when for instance dealing with a large monolithic application instead of more fine-grained microservices.\nWrap-Up AppCDS provides Java developers with a useful tool for reducing start-up times of their applications, without requiring any code changes. For the example discussed, we could observe an improvement of the time-to-first-response metric by about 30% when running with OpenJDK 14. Other users reported even bigger improvements.\nWe didn’t discuss any potential memory improvements due to CDS when sharing class metadata between multiple JVMs on one host. In containerized server applications, with each JVM being packaged in its own container image, this won’t play a role. It could make a difference on desktop systems, though. For instance multiple instances of the Java language server, as leveraged by VSCode and other editors, could benefit from that.\nThat all being said, when raw start-up time is your primary concern, e.g. in a serverless or Function-based setting, you should look at AOT compilation with GraalVM (or Project Leyden in the future). 
This will bring down start-up times to a completely different level; for example the todo manager application would return a first response within a few tens of milliseconds when executed as a native image via GraalVM.\nBut AOT is not always an option, nor does it always make sense: the JVM may offer a better latency than native binaries, external dependencies might not be ready for usage in AOT-compiled native images yet, or you simply might want to be able to benefit from all the JVM goodness, like familiar debugging tools, the JDK Flight Recorder, or JMX. In that case, CDS can give you a nice start-up time improvement, solely by means of adding a few steps to your build process.\nBesides class data sharing in OpenJDK, there are some other related techniques for improving start-up times which are worth exploring:\nEclipse OpenJ9 has its own implementation of class data sharing\nAlibaba’s Dragonwell distribution of the OpenJDK comes with JWarmUp, a tool for speeding up initial JIT compilations\nTo learn more about AppCDS, a long yet insightful post is this one by Vladimir Plizga. Volker Simonis did another interesting write-up. Also take a look at the CDS documentation in the reference docs of the java command.\nLastly, the Quarkus team is working on out-of-the-box support for CDS archives. 
This could fully automate the creation of an archive for all required classes without any further configuration, making it even easier to benefit from the start-up time improvements promised by CDS.\n","id":176,"publicationdate":"Jun 11, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_manually_creating_cds_archives\"\u003eManually Creating CDS Archives\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_using_the_cds_archive\"\u003eUsing the CDS Archive\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_creating_cds_archives_in_your_maven_build\"\u003eCreating CDS Archives in Your Maven Build\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_what_do_you_gain\"\u003eWhat Do You Gain?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up\"\u003eWrap-Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAhead-of-time compilation (AOT) is \u003cem\u003ethe\u003c/em\u003e big topic in the Java ecosystem lately:\nby compiling Java code to native binaries, developers and users benefit from vastly improved start-up times and reduced memory usage.\nThe \u003ca href=\"https://www.graalvm.org/\"\u003eGraalVM\u003c/a\u003e project made huge progress towards AOT-compiled Java applications,\nand \u003ca href=\"https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html\"\u003eProject Leyden\u003c/a\u003e promises to standardize AOT in a future version of the Java platform.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThis makes it easy to miss out on significant performance improvements which have been made on the JVM in recent Java versions,\nin particular when it comes to \u003ca href=\"https://cl4es.github.io/2019/11/20/OpenJDK-Startup-Update.html\"\u003efaster 
start-up times\u003c/a\u003e.\nBesides a range of improvements related to class loading, linking and bytecode verification,\nsubstantial work has been done around \u003ca href=\"https://docs.oracle.com/en/java/javase/14/vm/class-data-sharing.html\"\u003eclass data sharing\u003c/a\u003e (CDS).\nFaster start-ups are beneficial in many ways:\nshorter turnaround times during development,\nquicker time-to-first-response for users in coldstart scenarios,\ncost savings when billed by CPU time in the cloud.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWith CDS, class metadata is persisted in an archive file,\nwhich during subsequent application starts is mapped into memory.\nThis is faster than loading the actual class files, resulting in reduced start-up times.\nWhen starting multiple JVM processes on the same host, read-only archives of class metadata can also be shared between the VMs, so that less memory is consumed overall.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","performance","maven","appcds"],"title":"Building Class Data Sharing Archives with Apache Maven","uri":"https://www.morling.dev/blog/building-class-data-sharing-archives-with-apache-maven/"},{"content":" Table of Contents Format Conversions Ensuring Backwards Compatibility Filtering and Routing Tombstone Handling Externalizing Large Payloads Limitations Learning More Do you remember Angus \u0026#34;Mac\u0026#34; MacGyver? 
The always creative protagonist of the popular 80ies/90ies TV show, who could solve about any problem with nothing more than a Swiss Army knife, duct tape, shoe strings and a paper clip?\nThe single message transformations (SMTs) of Kafka Connect are almost as versatile as MacGyver’s Swiss Army knife:\nHow to change the timezone or format of date/time message fields?\nHow to change the topic a specific message gets sent to?\nHow to filter out specific records?\nSMTs can be the answer to these and many other questions that come up in the context of Kafka Connect. Applied to source or sink connectors, SMTs allow to modify Kafka records before they are sent to Kafka, or after they are consumed from a topic, respectively.\nIn this post I’d like to focus on some interesting (hopefully, anyway) usages of SMTs. Those use cases are mostly based on my experiences from using Kafka Connect with Debezium, an open-source platform for change data capture (CDC). I also got some great pointers on interesting SMT usages when asking the community about this on Twitter some time ago:\nI definitely recommend checking out the thread; thanks a lot to all who replied! In order to learn more about SMTs in general, how to configure them etc., refer to the resources given towards the end of this post.\nFor each category of use cases, I’ve also asked our sympathetic TV hero for his opinion on the usefulness of SMTs for the task at hand. You can find his rating at the end of each section, ranging from 📎 (poor fit) to 📎📎📎📎📎 (perfect fit).\nFormat Conversions Probably the most common application of SMTs is format conversion, i.e. adjustments to type, format and representation of data. This may apply to entire messages, or to specific message attributes. Let’s first look at a few examples for converting individual message attribute formats:\nTimestamps: Different systems tend to have different assumptions of how timestamps should be typed and formatted. 
Debezium for instance represents most temporal column types as milliseconds since epoch. Change event consumers on the other hand might expect such date and time values using Kafka Connect’s Date type, or as an ISO-8601 formatted string, potentially using a specific timezone\nValue masking: Sensitive data might have to be masked or truncated, or specific fields should even be removed altogether; the org.apache.kafka.connect.transforms.MaskField and ReplaceField SMTs shipping with Kafka Connect out of the box come in handy for that\nNumeric types: Similar to timestamps, requirements around the representation of (decimal) numbers may differ between systems; e.g. Kafka Connect’s Decimal type can convey arbitrary-precision decimals, but its binary representation of numbers might not be supported by all sink connectors and consumers\nName adjustments: Depending on the chosen serialization formats, specific field names might be unsupported; when working with Apache Avro for instance, field names must not start with a number\nIn all these cases, either existing, ready-made SMTs or bespoke implementations can be used to apply the required attribute type and/or format conversions.\nWhen using Kafka Connect for integrating legacy services and databases with newly built microservices, such format conversions can play an important role for creating an anti-corruption layer: by using better field names, choosing more suitable data types or by removing unneeded fields, SMTs can help to shield a new service’s model from the oddities and quirks of the legacy world.\nBut SMTs can not only modify the representation of single fields; the format and structure of entire messages can be adjusted, too. E.g. Kafka Connect’s ExtractField transformation extracts a single field from a message and propagates only that one. A related SMT is Debezium’s SMT for change event flattening. 
It can be used to convert the complex Debezium change event structure with old and new row state, metadata and more, into a flat row representation, which can be consumed by many existing sink connectors.\nSMTs also allow fine-tuning schema namespaces; that can be of interest when working with a schema registry for managing schemas and their versions, and specific schema namespaces should be enforced for the messages on given topics. Two more very useful examples of SMTs in this category are kafka-connect-transform-xml and kafka-connect-json-schema by Jeremy Custenborder, which will take XML or text and produce a typed Kafka Connect Struct, based on a given XML schema or JSON schema, respectively.\nLastly, as a special kind of format conversion, SMTs can be used to modify or set the key of Kafka records. This may be desirable if a source connector doesn’t produce any meaningful key, but one can be extracted from the record value. Changing the message key can also be useful when considering subsequent stream processing. Choosing matching keys right at the source side, for example, allows for joining multiple topics via Kafka Streams, without the need for re-keying records.\nMac’s rating: 📎📎📎📎📎 SMTs are the perfect tool for format conversions of Kafka Connect records\nEnsuring Backwards Compatibility Changes to the schema of Kafka records can potentially be disruptive for consumers. If for instance a record field gets renamed, a consumer must be adapted accordingly, reading the value using the new field name. In case a field gets dropped altogether, consumers must not expect this field any longer.\nMessage transformations can help with such a transition from one schema version to the next, thus reducing the coupling of the lifecycles of message producers and consumers. In case of a renamed field, an SMT could add the field another time, using the original name. 
That’ll allow consumers to continue reading the field using the old name and to be upgraded to use the new name at their own pace. After some time, once all consumers have been adjusted, the SMT can be removed again, only exposing the new field name going forward. Similarly, a field that got removed from a message schema could be re-added, e.g. using some sort of constant placeholder value. In other cases it might be possible to derive the field value from other, still existing fields. Again consumers could then be updated at their own pace to not expect and access that field any longer.\nIt should be said though that there are limits for this usage: e.g. when changing the type of a field, things quickly become tricky. One option could be a multi-step approach where at first a separate field with the new type is added, before renaming it again as described above.\nMac’s rating: 📎📎📎 SMTs can primarily help to address basic compatibility concerns around schema evolution\nFiltering and Routing When applied on the source side, SMTs can filter out specific records produced by the connector. They also can be used for controlling the Kafka topic a record gets sent to. That’s particularly interesting when filtering and routing is based on the actual record contents. 
In an IoT scenario for instance where Kafka Connect is used to ingest data from some kind of sensors, an SMT might be used to filter out all sensor measurements below a certain threshold, or route measurement events above a threshold to a special topic.\nDebezium provides a range of SMTs for record filtering and routing:\nThe logical topic routing SMT can send change events originating from multiple tables to the same Kafka topic, which can be useful when working with partitioned tables in Postgres, or with data that is sharded into multiple tables\nThe Filter and ContentBasedRouter SMTs let you use script expressions in languages such as Groovy or JavaScript for filtering and routing change events based on their contents; such a script-based approach can be an interesting middle ground between ease-of-use (no Java code must be compiled and deployed to Kafka Connect) and expressiveness; e.g. here is how the routing SMT could be used with GraalVM’s JavaScript engine for routing change events from a table with purchase orders to different topics in Kafka, based on the order type:\n... transforms=route transforms.route.type=io.debezium.transforms.ContentBasedRouter transforms.route.topic.regex=.*purchaseorders transforms.route.language=jsr223.graal.js transforms.route.topic.expression= value.after.ordertype == \u0026#39;B2B\u0026#39; ? \u0026#39;b2b_orders\u0026#39; : \u0026#39;b2c_orders\u0026#39; ... 
The outbox event router comes in handy when implementing the transactional outbox pattern for data propagation between microservices: it can be used to send events originating from a single outbox table to a specific Kafka topic per aggregate (when thinking of domain driven design) or event type\nThere are also two SMTs for routing purposes in Kafka Connect itself: RegexRouter, which re-routes records to different topics based on regular expressions, and TimestampRouter for determining topic names based on the record’s timestamp.\nWhile routing SMTs usually are applied to source connectors (defining the Kafka topic a record gets sent to), it can also make sense to use them with sink connectors. That’s the case when a sink connector derives the names of downstream tables, indexes or similar from the topic name.\nMac’s rating: 📎📎📎📎📎 Message filtering and topic routing — no problem for SMTs\nTombstone Handling Tombstone records are Kafka records with a null value. They carry special semantics when working with compacted topics: during log compaction, all records with the same key as a tombstone record will be removed from the topic.\nTombstones will be retained on a topic for a configurable time before compaction happens (controlled via the delete.retention.ms topic setting), which means that Kafka Connect sink connectors need to handle them, too. Unfortunately though, not all connectors are prepared for records with a null value, typically resulting in NullPointerExceptions and similar. A filtering SMT such as the one above can be used to drop tombstone records in such a case.\nBut also the exact opposite — producing tombstone records — can be useful: some sink connectors use tombstone records as the indicator to delete corresponding rows from a downstream datastore. Now when using a CDC connector like Debezium to capture changes from a database where \u0026#34;soft deletes\u0026#34; are used (i.e. 
records are not physically deleted, but a logically deleted flag is set to true when deleting a record), those change events will be exported as update events (which they technically are). A bespoke SMT can be used to translate these update events into tombstone records, triggering the deletion of corresponding records in downstream datastores.\nMac’s rating: 📎📎📎📎 SMTs work well to discard tombstones or convert soft delete events into tombstones. What’s not possible though is to keep the original event and produce an additional tombstone record at the same time\nExternalizing Large Payloads Even some advanced enterprise integration patterns can be implemented with the help of SMTs, one example being the claim check pattern. This pattern comes in handy in situations like this:\nA message may contain a set of data items that may be needed later in the message flow, but that are not necessary for all intermediate processing steps. We may not want to carry all this information through each processing step because it may cause performance degradation and makes debugging harder because we carry so much extra data.\n— Gregor Hohpe, Bobby Woolf; Enterprise Integration Patterns\nA specific example could again be a CDC connector that captures changes from a database table Users, with a BLOB column that contains the user’s profile picture (surely not a best practice, still not that uncommon in reality…​).\nApache Kafka and Large Messages Apache Kafka isn’t meant for large messages. The maximum message size is 1 MB by default, and while this can be increased, benchmarks show the best throughput for much smaller messages. Strategies like chunking and externalizing large payloads can thus be vital in order to ensure a satisfying performance.\nWhen propagating change data events from that table to Apache Kafka, adding the picture data to each event poses a significant overhead. 
That is particularly wasteful if the picture BLOB hasn’t changed at all between two events.\nUsing an SMT, the BLOB data could be externalized to some other storage. On the source side, the SMT could extract the image data from the original record and e.g. write it to a network file system or an Amazon S3 bucket. The corresponding field in the record would be updated so it just contains the unique address of the externalized payload, such as the S3 bucket name and file path.\nAs an optimization, re-uploading unchanged file contents could be avoided by comparing the earlier and current hashes of the externalized file.\nA corresponding SMT instance applied to sink connectors would retrieve the identifier of the externalized files from the incoming record, obtain the contents from the external storage and put it back into the record before passing it on to the connector.\nMac’s rating: 📎📎📎📎 SMTs can help to externalize payloads, avoiding large Kafka records. Relying on another service increases overall complexity, though\nLimitations As we’ve seen, single message transformations can help to address quite a few requirements that commonly come up for users of Kafka Connect. But there are limitations, too: like MacGyver, who sometimes has to reach for some other tool than his beloved Swiss Army knife, you shouldn’t think of SMTs as the perfect solution all the time.\nThe biggest shortcoming is already hinted at in their name: SMTs can only be used to process single records, one at a time. E.g. you cannot split up a record into multiple ones using an SMT, as they can only return (at most) one record. Also any kind of stateful processing, like aggregating data from multiple records, or correlating records from several topics is off limits for SMTs. 
For such use cases, you should be looking at stream processing technologies like Kafka Streams and Apache Flink; also integration technologies like Apache Camel can be of great use here.\nOne thing to be aware of when working with SMTs is configuration complexity; when using generic, highly configurable SMTs, you might end up with lengthy configuration that’s hard to grasp and debug. You might be better off implementing a bespoke SMT which focuses on one particular task, leveraging the full capabilities of the Java programming language.\nSMT Testing Whether you use ready-made SMTs by means of configuration, or you implement custom SMTs in Java, testing your work is essential.\nWhile unit tests are a viable option for basic testing of bespoke SMT implementations, integration tests running against Kafka Connect connectors are recommended for testing SMT configurations. That way you’ll be sure that the SMT can process actual messages and that it has been configured the way you intended.\nTestcontainers and the Debezium support for Testcontainers are a great foundation for setting up all the required components such as Apache Kafka, Kafka Connect, connectors and the SMTs to test.\nA specific feature I wished for every now and then is the ability to apply SMTs only to a specific sub-set of the topics created or consumed by a connector. In particular if connectors create different kinds of topics (like an actual data topic and another one with metadata), it can be desirable to apply SMTs only to the topics of one group but not the other. 
This requirement is captured in KIP-585 (\u0026#34;Filter and Conditional SMTs\u0026#34;), please join the discussion on that one if you have requirements or feedback related to that.\nLearning More There are several great presentations and blog posts out there which describe in depth what SMTs are, how you can implement your own one, how they are configured etc.\nHere are a few resources I found particularly helpful:\nKIP-66: The original KIP (Kafka Improvement Proposal) that introduced SMTs\nSingle Message Transforms are not the Transformations You’re Looking For: A great overview on SMTs, their capabilities as well as limitations, by Ewen Cheslack-Postava\nA hands-on experience with Kafka Connect SMTs: In-depth blog post on SMT use cases, things to be aware of and more, by Gian D’Uia\nNow, considering this wide range of use cases for SMTs, would MacGyver like and use them for implementing various tasks around Kafka Connect? I would certainly think so. But as always, the right tool for the job must be chosen: sometimes an SMT may be a great fit, another time a more flexible (and complex) stream processing solution might be preferable.\nJust as MacGyver, you’ve got to make a call when to use your Swiss Army knife, duct tape or a paper clip.\nMany thanks to Hans-Peter Grahsl for his feedback while writing this blog post!\n","id":177,"publicationdate":"May 14, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_format_conversions\"\u003eFormat Conversions\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_ensuring_backwards_compatibility\"\u003eEnsuring Backwards Compatibility\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_filtering_and_routing\"\u003eFiltering and Routing\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_tombstone_handling\"\u003eTombstone 
Handling\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_externalizing_large_payloads\"\u003eExternalizing Large Payloads\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_limitations\"\u003eLimitations\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_learning_more\"\u003eLearning More\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eDo you remember Angus \u0026#34;Mac\u0026#34; MacGyver?\nThe always creative protagonist of the popular 80ies/90ies TV show, who could solve about any problem with nothing more than a Swiss Army knife, duct tape, shoe strings and a paper clip?\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe single message transformations (SMTs) of Kafka Connect are almost as versatile as MacGyver’s Swiss Army knife:\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"ulist\"\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eHow to change the timezone or format of date/time message fields?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eHow to change the topic a specific message gets sent to?\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eHow to filter out specific records?\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eSMTs can be the answer to these and many other questions that come up in the context of Kafka Connect.\nApplied to source or sink connectors,\nSMTs allow to modify Kafka records before they are sent to Kafka, or after they are consumed from a topic, respectively.\u003c/p\u003e\n\u003c/div\u003e","tags":["kafka","kafka-connect","debezium","cdc"],"title":"Single Message Transformations - The Swiss Army Knife of Kafka Connect","uri":"https://www.morling.dev/blog/single-message-transforms-swiss-army-knife-of-kafka-connect/"},{"content":" Table of Contents An Example The Emitter Parameter Pattern For 
libraries and frameworks it’s a common requirement to make specific aspects customizeable via service provider interfaces (SPIs): contracts to be implemented by the application developer, which then are invoked by framework code, adding new or replacing existing functionality.\nOften times, the method implementations of such an SPI need to return value(s) to the framework. An alternative to return values are \u0026#34;emitter parameters\u0026#34;: passed by the framework to the SPI method, they offer an API for receiving value(s) via method calls. Certainly not revolutionary or even a new idea, I find myself using emitter parameters more and more in libraries and frameworks I work on. Hence I’d like to discuss some advantages I perceive about the emitter parameter pattern.\nAn Example As an example, let’s consider a blogging platform which provides an SPI for extracting categories and tags from given blog posts. Application developers can plug in custom implementations of that SPI, e.g. based on the latest and greatest algorithms in information retrieval and machine learning. Here’s how a basic SPI contract for this use case could look like, using regular method return values:\n1 2 3 4 5 public interface BlogPostDataExtractor { Set\u0026lt;String\u0026gt; extractCategories(String contents); Set\u0026lt;String\u0026gt; extractTags(String contents); } This probably would get the job done, but there are a few problems: any implementation will have to do two passes on the given blog post contents, once in each method — not ideal. Also let’s assume that most blog posts only belong to exactly one category. 
Implementations still would have to allocate a set for the single returned category.\nWhile there’s not much we can do about the second issue with a return value based design, the former problem could be addressed by combining the two methods:\n1 2 3 4 public interface BlogPostDataExtractor { CategoriesAndTags extractCategoriesAndTags(String contents); } Now an implementation can retrieve both categories and tags at once. But it’s worth thinking about how an SPI implementation would instantiate the return type.\nExposing a concrete class to be instantiated by implementors poses a challenge for future evolution of the SPI: following the best practice and making the return object type immutable, all its properties must be passed to its constructor. Now if an additional attribute should be extracted from blog posts, such as a teaser, the existing constructor cannot be modified, so to not break existing user code. Instead, we’d have to introduce new constructors whenever adding further attributes. Dealing with all these constructors could become quite inconvenient, in particular if a specific SPI implementation is only interested in producing some of the attributes.\nAll in all, for SPIs it’s often a good idea to only expose interfaces, but no concrete classes. So we could make the return type an interface and leave it to SPI implementors to create an implementation class, but that’d be rather tedious.\nThe Emitter Parameter Pattern Or, we could provide some sort of builder object which can be used to construct CategoriesAndTags objects. But then why even return an object at all, instead of simply mutating the state of a builder that is provided through a method parameter? 
And that’s essentially what the emitter parameter pattern is about: passing in an object which can be used to emit the values which should be \u0026#34;returned\u0026#34; by the method.\nI’m not aware of any specific name for this pattern, so I came up with \u0026#34;emitter parameter pattern\u0026#34; (the notion of callback parameters is related, yet different). And hey, perhaps I’ll become famous for coining a design pattern name ;) Please let me know in the comments below if you know this pattern under a different name.\nHere’s what the extractor SPI could look like when designed with an emitter parameter:\npublic interface BlogPostDataExtractor { void extractData(String contents, BlogPostDataReceiver data); (1) interface BlogPostDataReceiver { (2) void addCategory(String category); void addTag(String tag); } } 1 SPI method with input parameter and emitter parameter 2 Emitter parameter type An implementation would emit the retrieved information by invoking the methods on the data parameter:\npublic class MyBlogPostDataExtractor implements BlogPostDataExtractor { public void extractData(String contents, BlogPostDataReceiver data) { String category = ...; Stream\u0026lt;String\u0026gt; tags = ...; data.addCategory(category); tags.forEach(data::addTag); } } This approach nicely avoids all the issues with the return value based design:\nSingle and multiple value case handled uniformly: an implementation can call addCategory() just once, or multiple times; either way, it doesn’t have to deal with the creation of a set, list, or other container for the produced value(s)\nFlexible evolution of the SPI contract: new methods such as addTeaser(), or addTags(String... tags) can be added to the emitter parameter type, avoiding the creation of more and more return type constructors; as the passed BlogPostDataReceiver instance is controlled by the framework itself, we also could add methods which provide more context required for the 
task at hand\nNo need for exposing concrete types on the SPI surface: as no return value needs to be instantiated by SPI implementations, the solution works solely with interfaces on the SPI surface; this provides more control to the framework, e.g. the emitter object could be re-used etc.\nFlexible implementation choices: by not requiring SPI implementations to allocate any return objects, the platform gains a lot of flexibility for how it’s processing the emitted values: while it could collect the values in a set or list, it also has the option to not allocate any intermediary collections, but process and pass on values one-by-one in a streaming-based way, without any of this impacting SPI implementors\nNow, are there some downsides to this approach, too? I can see two: if a method only ever should yield a single value, the emitter API might be misleading. We could raise an exception though if an emitter method is called more than once. Also an implementation might hold on to the emitter object and invoke its methods after the call flow has returned from the SPI method, which typically isn’t desirable. Again that’s something that can be prevented by invalidating the emitter object after the SPI method returned, raising an exception in case of further method invocations.\nOverall, I think the emitter parameter pattern is a valuable tool in the box of library and framework authors; it provides flexibility for implementation choices and future evolution when designing SPIs. 
Real-world examples include the ValueExtractor SPI in Bean Validation 2.0 (where it was chosen to provide a uniform value of extracting single and multiple values from container objects) and the ChangeRecordEmitter contract in Debezium’s SPI.\nMany thanks to Hans-Peter Grahsl and Nils Hartmann for reviewing an early version of this blog post.\n","id":178,"publicationdate":"May 4, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_an_example\"\u003eAn Example\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_emitter_parameter_pattern\"\u003eThe Emitter Parameter Pattern\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eFor libraries and frameworks it’s a common requirement to make specific aspects customizeable via \u003ca href=\"https://en.wikipedia.org/wiki/Service_provider_interface\"\u003eservice provider interfaces\u003c/a\u003e (SPIs):\ncontracts to be implemented by the application developer, which then are invoked by framework code,\nadding new or replacing existing functionality.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOften times, the method implementations of such an SPI need to return value(s) to the framework.\nAn alternative to return values are \u0026#34;emitter parameters\u0026#34;:\npassed by the framework to the SPI method, they offer an \u003cem\u003eAPI\u003c/em\u003e for receiving value(s) via method calls.\nCertainly not revolutionary or even a new idea,\nI find myself using emitter parameters more and more in libraries and frameworks I work on.\nHence I’d like to discuss some advantages I perceive about the emitter parameter pattern.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","api-design","patterns"],"title":"The Emitter Parameter Pattern for Flexible SPI 
Contracts","uri":"https://www.morling.dev/blog/emitter-parameter-pattern-for-flexible-spis/"},{"content":" Table of Contents An Example: The Greeter CLI App Application Plug-ins With Layrry Finding Plug-in Implementations With the Java Service Loader Seeing it in Action Making applications extensible with some form of plug-ins is a very common pattern in software design: based on well-defined APIs provided by the application core, plug-ins can customize an application’s behavior and provide new functionality. Examples include desktop applications like IDEs or web browsers, build tools such as Apache Maven or Gradle, as well as server-side applications such as Apache Kafka Connect, a runtime for Kafka connectors plug-ins.\nIn this post I’m going to explore how the Java Platform Module System\u0026#39;s notion of module layers can be leveraged for implementing plug-in architectures on the JVM. We’ll also discuss how Layrry, a launcher and runtime for layered Java applications, can help with this task.\nA key requirement for any plug-in architecture is strong isolation between different plug-ins: their state, classes and dependencies should be encapsulated and independent of each other. E.g. package declarations in two plug-ins should not collide, also they should be able to use different versions of another 3rd party dependency. This is why the default module path of Java (specified using the --module-path option) is not enough for this purpose: it doesn’t support more than one version of a given module.\nThe module system’s answer are module layers: by organizing an application and its plug-ins into multiple layers, the required isolation between plug-ins can be achieved.\nWith the module system, each Java application always contains at least one layer, the boot layer. 
It contains the platform modules and the modules provided on the module path.\nAn Example: The Greeter CLI App To make things more tangible, let’s consider a specific example: the \u0026#34;Greeter\u0026#34; app is a little CLI utility that can produce greetings in different languages.\nIn order to not limit the number of supported languages, it provides a plug-in API, which allows adding further greeting implementations without the need to rebuild the core application. Here is the Greeter contract, which is to be implemented by each language plug-in:\npackage com.example.greeter.api; public interface Greeter { String greet(String name); } Greeters are instantiated via accompanying implementations of GreeterFactory:\npublic interface GreeterFactory { String getLanguage(); (1) String getFlag(); Greeter getGreeter(); (2) } 1 The getLanguage() and getFlag() methods are used to show a description of all available greeters in the CLI application 2 The getGreeter() method returns a new instance of the corresponding Greeter type Here’s the overall architecture of the Greeter application, with three different language implementations:\nThe application is made up of five different layers:\ngreeter-platform: contains the Greeter and GreeterFactory contracts\ngreeter-en, greeter-de and greeter-fr: greeter implementations for different languages; note how each one depends on a different version of some greeter-date module. As they are isolated in different layers, they can co-exist within the application\ngreeter-app: the \u0026#34;shell\u0026#34; of the application which loads all the greeter implementations and makes them accessible as a simple CLI application\nNow let’s see how this application structure can be assembled using Layrry.\nApplication Plug-ins With Layrry In a previous blog post we’ve explored how applications can be cut into layers, described in Layrry’s layers.yml configuration file. 
A simple static layer definition would defeat the purpose of a plug-in architecture, though: not all possible plug-ins are known when assembling the application.\nLayrry addresses this requirement by allowing to source different layers from directories on the file system:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 layers: platform: (1) modules: - \u0026#34;com.example.greeter:greeter-api:1.0.0\u0026#34; plugins: (2) parents: - \u0026#34;api\u0026#34; directory: path/to/plugins app: (3) parents: - \u0026#34;plugins\u0026#34; modules: - \u0026#34;com.example.greeter:greeter-app:1.0.0\u0026#34; main: module: com.example.greeter.app class: com.example.greeter.app.App 1 The platform layer with the API module 2 The plug-in layer(s) 3 The application layer with the \u0026#34;application shell\u0026#34; Whereas the platform and app layers are statically defined, using the Maven GAV coordinates of the modules to include, the plugins part of the configuration describes an open-ended set of layers. Each sub-directory of the given directory represents its own layer. All modules within this sub-directory will be added to the layer, and the API layer will be the parent of each of the plug-in layers. The app layer has all the plug-in layers as its ancestors, allowing it to retrieve plug-in implementations from these layers.\nMore greeter plug-ins can be added to the application by simply creating a sub-directory with the required module(s).\nFinding Plug-in Implementations With the Java Service Loader Structuring the application into different layers isn’t all we need for building a plug-in architecture; we also need a way for detecting and loading the actual plug-in implementations. The service loader mechanism of the Java platform comes in handy for that. If you have never worked with the service loader API, it’s definitely recommended to study its extensive JavaDoc description:\nA service is a well-known interface or class for which zero, one, or many service providers exist. 
A service provider (or just provider) is a class that implements or subclasses the well-known interface or class. A ServiceLoader is an object that locates and loads service providers deployed in the run time environment at a time of an application’s choosing. Having been a supported feature of Java since version 6, the service loader API was reworked and refined to work within modular environments when the Java Module System was introduced in JDK 9.\nIn order to retrieve service implementations via the service loader, a consuming module must declare the use of the service in its module descriptor. For our purposes, the GreeterFactory contract is a perfect example of the service idea. Here’s the descriptor of the Greeter application’s app module, declaring its usage of this service:\nmodule com.example.greeter.app { exports com.example.greeter.app; requires com.example.greeter.api; uses com.example.greeter.api.GreeterFactory; } The module descriptor of each greeter plug-in must declare the service implementation(s) which it provides. E.g. here is the module descriptor of the English greeter implementation:\nmodule com.example.greeter.en { requires com.example.greeter.api; requires com.example.greeter.dateutil; provides com.example.greeter.api.GreeterFactory with com.example.greeter.en.EnglishGreeterFactory; } From within the app module, the service implementations can be retrieved via the java.util.ServiceLoader class.\nWhen using the service loader in layered applications, there’s one potential pitfall though, which mostly will affect existing applications which are migrated: in order to access service implementations located in a different layer (specifically, in an ancestor layer of the loading layer), the method load(ModuleLayer, Class\u0026lt;?\u0026gt;) must be used. When using other overloaded variants of load(), e.g. 
the commonly used load(Class\u0026lt;?\u0026gt;), those implementations won’t be found.\nHence the code for loading the greeter implementations from within the app layer could look like this:\n1 2 3 4 5 6 7 8 9 10 private static List\u0026lt;GreeterFactory\u0026gt; getGreeterFactories() { ModuleLayer appLayer = App.class.getModule().getLayer(); return ServiceLoader.load(appLayer, GreeterFactory.class) .stream() .map(p -\u0026gt; p.get()) .sorted((gf1, gf2) -\u0026gt; gf1.getLanguage().compareTo( gf2.getLanguage())) .collect(Collectors.toList()); } Having loaded the list of greeter factories, it doesn’t take too much code to display a list with all available implementations, expect a choice by the user and invoke the greeter for the chosen language. This code which isn’t too interesting is omitted here for the sake of brevity and can be found in the accompanying example source code repo.\nJDK 9 brought some more nice improvements for the service loader API. E.g. the type of service implementations can be examined without actually instantiating them. This allows for interesting alternatives for providing service meta-data and choosing an implementation based on some criteria. 
For instance, greeter metadata like the language name and flag could be given using an annotation:\n@GreeterDefinition(lang=\u0026#34;English\u0026#34;, flag=\u0026#34;🇬🇧\u0026#34;) public class EnglishGreeterFactory implements GreeterFactory { @Override public Greeter getGreeter() { ... } } Then the method ServiceLoader.Provider#type() can be used to obtain the annotation and return a greeter factory for a given language:\nprivate Optional\u0026lt;GreeterFactory\u0026gt; getGreeterFactoryForLanguage( String language) { ModuleLayer layer = App.class.getModule().getLayer(); return ServiceLoader.load(layer, GreeterFactory.class) .stream() .filter(gf -\u0026gt; gf.type().getAnnotation( GreeterDefinition.class).lang().equals(language)) .map(gf -\u0026gt; gf.get()) .findFirst(); } Seeing it in Action Lastly, let’s take a look at the complete Greeter application in action. Here it is, initially with two, and then with three greeter implementations:\nThe layers configuration file is adjusted to load greeter plug-ins from the plugins directory; initially, two greeters for English and French exist. Then the German greeter implementation gets picked up by the application after adding it to the plug-in directory, without requiring any changes to the application itself.\nThe complete source code, including the logic for displaying all the available greeters and prompting for input, is available in the Layrry repository on GitHub.\nAnd there you have it, a basic plug-in architecture using Layrry and the Java Module System. Going forward, this might evolve in a few ways. For instance, it might be desirable to detect additional plug-ins without having to restart the application, e.g. when thinking of desktop application use cases. While loading additional plug-ins in new layers should be comparatively easy, unloading already loaded layers, e.g. when updating a plug-in to a newer version, could potentially be quite tricky. 
In particular, there’s no way to actively unload layers, so we’d have to rely on the garbage collector to clean up unused layers, making sure no references to any of their classes are kept in other, active layers.\nOne also could think of an event bus, allowing different plug-ins to communicate in a safe, yet loosely coupled way. What requirements would you have for plug-in centered applications running on the Java Module System? Let’s exchange in the comments below!\n","id":179,"publicationdate":"Apr 21, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_an_example_the_greeter_cli_app\"\u003eAn Example: The Greeter CLI App\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_application_plug_ins_with_layrry\"\u003eApplication Plug-ins With Layrry\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_finding_plug_in_implementations_with_the_java_service_loader\"\u003eFinding Plug-in Implementations With the Java Service Loader\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_seeing_it_in_action\"\u003eSeeing it in Action\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eMaking applications extensible with some form of plug-ins is a very common pattern in software design:\nbased on well-defined APIs provided by the application core, plug-ins can customize an application’s behavior and provide new functionality.\nExamples include desktop applications like IDEs or web browsers, build tools such as Apache Maven or Gradle, as well as server-side applications such as Apache Kafka Connect,\na runtime for Kafka connectors plug-ins.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this post I’m going to explore how the \u003ca href=\"https://www.jcp.org/en/jsr/detail?id=376\"\u003eJava 
Platform Module System\u003c/a\u003e\u0026#39;s notion of module layers can be leveraged for implementing plug-in architectures on the JVM.\nWe’ll also discuss how \u003ca href=\"https://github.com/moditect/layrry\"\u003eLayrry\u003c/a\u003e, a launcher and runtime for layered Java applications, can help with this task.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jpms","plugin-architecture"],"title":"Plug-in Architectures With Layrry and the Java Module System","uri":"https://www.morling.dev/blog/plugin-architectures-with-layrry-and-the-java-module-system/"},{"content":" Table of Contents Why Layrry? An Example Module Layers to the Rescue The Layrry Launcher Using the Layrry API Next Steps One of the biggest changes in recent Java versions has been the introduction of the module system in Java 9. It allows to organize Java applications and their dependencies in strongly encapsulated modules, utilizing explicit and well-defined module APIs and relationships.\nIn this post I’m going to introduce the Layrry open-source project, a launcher and Java API for executing modularized Java applications. Layrry helps Java developers to assemble modularized applications from dependencies using their Maven coordinates and execute them using module layers. Layers go beyond the capabilities of the \u0026#34;flat\u0026#34; module path specified via the --module-path parameter of the java command, e.g. allowing to use multiple versions of one module within one and the same application.\nWhy Layrry? The Java Module System doesn’t define any means of mapping between modules (e.g. com.acme.crm) and JARs providing such module (e.g. acme-crm-1.0.0.Final.jar), or retrieving modules from remote repositories using unique identifiers (e.g. com.acme:acme-crm:1.0.0.Final). 
Instead, it’s the responsibility of the user to obtain all required JARs of a modularized application and provide them via the --module-path parameter.\nFurthermore, the module system doesn’t define any means of module versioning; i.e. it’s the responsibility of the user to obtain all modules in the right version. Using the --module-path option, it’s not possible, though, to assemble an application that uses multiple versions of one and the same module. This may be desirable for transitive dependencies of an application, which might be required in different versions by two separate direct dependencies.\nThis is where Layrry comes in (pronounced \u0026#34;Larry\u0026#34;): it provides a declarative approach as well as an API for assembling modularized applications. The (modular) JARs to be included are described using Maven GAV (group id, artifact id, version) coordinates, solving the issue of retrieving all required JARs from a remote repository, in the right version.\nWith Layrry, applications are organized in module layers, which allows to use different versions of one and the same module in different layers of an application (as long as they are not exposed in a conflicting way on module API boundaries).\nAn Example As an example, let’s consider an application made up of the following modules:\nThe application’s main module, com.example:app, depends on two others, com.example:foo and com.example:bar. They in turn depend on the Log4j API and another module, com.example:greeter. The latter is used in two different versions, though.\nLet’s take a closer look at the Greeter class in these modules. 
Here is the version in com.example:greeter@1.0.0, as used by com.example:foo:\n1 2 3 4 5 6 public class Greeter { public String greet(String name, String from) { return \u0026#34;Hello, \u0026#34; + name + \u0026#34; from \u0026#34; + from + \u0026#34; (Greeter 1.0.0)\u0026#34;; } } And this is how it looks in com.example:greeter@2.0.0, as used by com.example:bar:\n1 2 3 4 5 6 7 8 9 10 11 public class Greeter { public String hello(String name, String from) { return \u0026#34;Hello, \u0026#34; + name + \u0026#34; from \u0026#34; + from + \u0026#34; (Greeter 2.0.0)\u0026#34;; } public String goodBye(String name, String from) { return \u0026#34;Good bye, \u0026#34; + name + \u0026#34; from \u0026#34; + from + \u0026#34; (Greeter 2.0.0)\u0026#34;; } } The Greeter API has evolved in a backwards-incompatible way, i.e. it’s not possible for the foo and bar modules to use the same version.\nWith a \u0026#34;flat\u0026#34; module path (or classpath), there’s no way for dealing with this situation. 
You’d inevitably end up with a NoSuchMethodError, as either foo or bar would be linked at runtime against a version of the class different from the version it has been compiled against.\nThe lack of support for using multiple module versions when working with the --module-path option might be surprising at first, but it’s an explicit non-requirement of the module system to support multiple module versions or even deal with selecting matching module versions at all.\nThis means that the module descriptors of both foo and bar require the greeter module without any version information:\n1 2 3 4 5 module com.example.foo { exports com.example.foo; requires org.apache.logging.log4j; requires com.example.greeter; } 1 2 3 4 5 module com.example.bar { exports com.example.bar; requires org.apache.logging.log4j; requires com.example.greeter; } Module Layers to the Rescue While only one version of a given module is supported when running applications via java --module-path=…​, there’s a lesser known feature of the module system which provides a way out: module layers.\nA module layer \u0026#34;is created from a graph of modules in a Configuration and a function that maps each module to a ClassLoader.\u0026#34; Using the module layer API, multiple versions of a module can be loaded in different layers, thus using different classloaders.\nNote the layers API doesn’t concern itself with obtaining JARs or modules from remote locations such as the Maven Central repository; instead, any modules must be provided as Path objects. 
Here is how a layer with the foo and greeter:1.0.0 modules could be assembled:\nModuleLayer boot = ModuleLayer.boot(); ClassLoader scl = ClassLoader.getSystemClassLoader(); Path foo = Paths.get(\u0026#34;path/to/foo-1.0.0.jar\u0026#34;); (1) Path greeter10 = Paths.get(\u0026#34;path/to/greeter-1.0.0.jar\u0026#34;); (2) ModuleFinder fooFinder = ModuleFinder.of(foo, greeter10); Configuration fooConfig = boot.configuration() (3) .resolve( fooFinder, ModuleFinder.of(), Set.of(\u0026#34;com.example.foo\u0026#34;, \u0026#34;com.example.greeter\u0026#34;) ); ModuleLayer fooLayer = boot.defineModulesWithOneLoader( fooConfig, scl); (4) 1 Obtain foo-1.0.0.jar 2 Obtain greeter-1.0.0.jar 3 Create a configuration derived from the \u0026#34;boot\u0026#34; layer of the JVM, providing a ModuleFinder for the two JARs obtained before, and resolving the two modules 4 Create a module layer using the configuration, loading all contained modules with a single classloader Similarly, you could create a layer for bar and greeter:2.0.0, as well as layers for log4j and the main application module. The layers API is very flexible, e.g. you could load each module in its own classloader and more. But all this flexibility can make using the API directly a daunting task.\nAlso using an API might not be what you want in the first place: wouldn’t it be nice if there was a CLI tool, akin to using java --module-path=…, but with the additional powers of module layers?\nThe Layrry Launcher This is where Layrry comes in: it is a CLI tool which takes a configuration of a layered application (defined in a YAML file) and executes it. 
The layer descriptor for the example above looks like so:\nlayers: log: (1) modules: (2) - \u0026#34;org.apache.logging.log4j:log4j-api:jar:2.13.1\u0026#34; - \u0026#34;org.apache.logging.log4j:log4j-core:jar:2.13.1\u0026#34; - \u0026#34;com.example:logconfig:1.0.0\u0026#34; foo: parents: (3) - \u0026#34;log\u0026#34; modules: - \u0026#34;com.example:greeter:1.0.0\u0026#34; - \u0026#34;com.example:foo:1.0.0\u0026#34; bar: parents: - \u0026#34;log\u0026#34; modules: - \u0026#34;com.example:greeter:2.0.0\u0026#34; - \u0026#34;com.example:bar:1.0.0\u0026#34; app: parents: - \u0026#34;foo\u0026#34; - \u0026#34;bar\u0026#34; modules: - \u0026#34;com.example:app:1.0.0\u0026#34; main: (4) module: com.example.app class: com.example.app.App 1 Each layer has a unique name 2 The modules element lists all the modules contained in the layer, using Maven coordinates (group id, artifact id, version), unambiguously referencing a (modular) JAR in a specific version 3 A layer can have one or more parent layers, whose modules it can access; if no parent is given, the JVM’s \u0026#34;boot\u0026#34; layer is the implicit parent of a layer 4 The given main module and class is the one that will be executed by Layrry The configuration above describes four layers, log, foo, bar and app, with the modules they contain and the parent/child relationships between these layers. Note how the versions 1.0.0 and 2.0.0 of the greeter module are used in foo and bar. 
The file also specifies the main class to execute when running this application.\nUsing Layrry, a modular application is executed like this:\n1 2 3 4 5 6 7 java -jar layrry-1.0-SNAPSHOT-jar-with-dependencies.jar \\ --layers-config layers.yml \\ Alice 20:58:01.451 [main] INFO com.example.foo.Foo - Hello, Alice from Foo (Greeter 1.0.0) 20:58:01.472 [main] INFO com.example.bar.Bar - Hello, Alice from Bar (Greeter 2.0.0) 20:58:01.473 [main] INFO com.example.bar.Bar - Good bye, Alice from Bar (Greeter 2.0.0) The log messages show how the two versions of greeter are used by foo and bar, respectively. Layrry will download all referenced JARs using the Maven resolver API, i.e. you don’t have to deal with manually obtaining all the JARs and providing them to the java runtime.\nUsing the Layrry API In addition to the YAML-based launcher, Layrry provides also a Java API for assembling and running layered applications. This can be used in cases where the structure of layers is only known at runtime, or for implementing plug-in architectures.\nIn order to use Layrry programmatically, add the following dependency to your pom.xml:\n1 2 3 4 5 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.moditect.layrry\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;layrry\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;1.0-SNAPSHOT\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Then, the Layrry Java API can be used like this (showing the same example as above):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Layers layers = Layers.layer(\u0026#34;log\u0026#34;) .withModule(\u0026#34;org.apache.logging.log4j:log4j-api:jar:2.13.1\u0026#34;) .withModule(\u0026#34;org.apache.logging.log4j:log4j-core:jar:2.13.1\u0026#34;) .withModule(\u0026#34;com.example:logconfig:1.0.0\u0026#34;) .layer(\u0026#34;foo\u0026#34;) .withParent(\u0026#34;log\u0026#34;) .withModule(\u0026#34;com.example:greeter:1.0.0\u0026#34;) .withModule(\u0026#34;com.example:foo:1.0.0\u0026#34;) 
.layer(\u0026#34;bar\u0026#34;) .withParent(\u0026#34;log\u0026#34;) .withModule(\u0026#34;com.example:greeter:2.0.0\u0026#34;) .withModule(\u0026#34;com.example:bar:1.0.0\u0026#34;) .layer(\u0026#34;app\u0026#34;) .withParent(\u0026#34;foo\u0026#34;) .withParent(\u0026#34;bar\u0026#34;) .withModule(\u0026#34;com.example:app:1.0.0\u0026#34;) .build(); layers.run(\u0026#34;com.example.app/com.example.app.App\u0026#34;, \u0026#34;Alice\u0026#34;); Next Steps The Layrry project is still in its infancy. Nevertheless, it can be a useful tool for application developers wishing to leverage the Java Module System. Obtaining modular JARs via Maven coordinates and providing an easy-to-use mechanism for organizing modules in layers enables usages which cannot be addressed using the plain java --module-path …​ approach.\nLayrry is open-source (under the Apache License version 2.0). The source code is hosted on GitHub, and your contributions are very welcome.\nPlease let me know about your ideas and requirements in the comments below or by opening up issues on GitHub. 
Planned enhancements include support for creating modular runtime images (jlink) based on the modules referenced in a layers.yml file, and visualization of module layers and their modules via GraphViz.\n","id":180,"publicationdate":"Mar 29, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_why_layrry\"\u003eWhy Layrry?\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_an_example\"\u003eAn Example\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_module_layers_to_the_rescue\"\u003eModule Layers to the Rescue\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_layrry_launcher\"\u003eThe Layrry Launcher\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_using_the_layrry_api\"\u003eUsing the Layrry API\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_next_steps\"\u003eNext Steps\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOne of the biggest changes in recent Java versions has been the introduction of the \u003ca href=\"http://openjdk.java.net/projects/jigsaw/spec/\"\u003emodule system\u003c/a\u003e in Java 9.\nIt allows to organize Java applications and their dependencies in strongly encapsulated modules, utilizing explicit and well-defined module APIs and relationships.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this post I’m going to introduce the \u003ca href=\"https://github.com/moditect/layrry\"\u003eLayrry\u003c/a\u003e open-source project, a launcher and Java API for executing modularized Java applications.\nLayrry helps Java developers to assemble modularized applications from dependencies using their Maven coordinates and execute them using module layers.\nLayers go beyond the capabilities of the \u0026#34;flat\u0026#34; module path 
specified via the \u003cem\u003e--module-path\u003c/em\u003e parameter of the \u003cem\u003ejava\u003c/em\u003e command,\ne.g. allowing to use multiple versions of one module within one and the same application.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jpms","plugin-architecture"],"title":"Introducing Layrry: A Launcher and API for Modularized Java Applications","uri":"https://www.morling.dev/blog/introducing-layrry-runner-and-api-for-modularized-java-applications/"},{"content":"","id":181,"publicationdate":"Mar 16, 2020","section":"tags","summary":"","tags":null,"title":"git","uri":"https://www.morling.dev/tags/git/"},{"content":" Table of Contents The Problem The Solution Within Debezium, the project I’m working on at Red Hat, we recently encountered an \u0026#34;interesting\u0026#34; situation where we had to resolve a rather difficult merge conflict. As others were interested in how we addressed the issue, and also for our own future reference, I’m going to give a quick rundown of the problem we encountered and how we solved it.\nThe Problem Ideally, we’d only ever work on a single branch and would never have to deal with porting changes between the master and other branches. Oftentimes we cannot get around this, though: specific versions of a software may have to be maintained for some time, which requires backporting bug fixes from the current development branch to the branch corresponding to the maintained version.\nIn our specific case we had to deal with backporting changes to our project documentation. To complicate things, this documentation (written in AsciiDoc) has been largely re-organized between master and the targeted older branch, 1.0. What used to be one large AsciiDoc file for each of the Debezium connectors, got split up into multiple smaller files on master now. 
This split was meant to be applied to 1.0 too, but due to some miscommunication in the team (these things happen, right) this wasn’t done, whereas an assorted set of documentation changes had been backported already to the larger, monolithic AsciiDoc files.\nSo the situation we faced was this:\nlarge, monolithic AsciiDoc files on the 1.0 branch\nsmaller, modularized AsciiDoc files on master\nDocumentation updates applied on master, of which only a subset is relevant for 1.0 (new features shouldn’t be added to the Debezium 1.0 documentation)\nSome of the documentation updates relevant for the 1.0 branch already had been backported from master, while others had not\nAll in all, a rather convoluted situation; the full diff of the documentation sub-directory between the two branches was about 13K lines.\nSo what should we do? Cherry-picking individual commits from master was not really an option, as there were a few hundred commits on master since 1.0 had been forked off. Also, many commits would contain both documentation and code changes. The latter had already been backported successfully before.\nRealizing that resolving that merge conflict was next to impossible, the next idea was to essentially start from scratch and re-apply all relevant documentation changes to the 1.0 branch. Our initial idea was to create a patch with the difference of the documentation directory between the two branches. 
But editing that patch file with 13K lines turned out to be unmanageable, too.\nThe Solution This is when we were reminded of the possibilities of git filter-branch: using this command it should be possible to isolate all the documentation changes done on master since Debezium 1.0 and apply the required sub-set of these changes to the 1.0 branch.\nTo start with a clean slate, we created a new temporary branch based on 1.0:\ngit checkout -b docs_backport 1.0 We then reset the contents of the documentation directory to its state as of the 1.0.0.Final release, as that’s where the 1.0 and master branches diverged.\nrm -rf documentation git add documentation git checkout v1.0.0.Final documentation git commit -m \u0026#34;Resetting documentation dir to v1.0.0.Final\u0026#34; # This should yield no differences git diff v1.0.0.Final..docs_backport documentation The next step was to filter all commits on master so as to keep only the changes to the documentation directory. This was done on a new branch, docs_filtered. The --subdirectory-filter option comes in handy for that:\ngit checkout -b docs_filtered master git filter-branch -f --prune-empty \\ --subdirectory-filter documentation \\ v1.0.0.Final..docs_filtered This leaves us with a branch docs_filtered which only contains the commits since the v1.0.0.Final tag that modified the documentation directory.\nThe --subdirectory-filter option also moves the contents of the given directory to the root of the repo, though. That’s not exactly what we need. But another option, --tree-filter, lets us restore the original directory layout. It allows running a set of commands against each of the filtered commits. 
We can use this to move the contents of documentation back to that directory:\ngit filter-branch -f \\ --tree-filter \u0026#39;mkdir -p documentation; \\ mv antora.yml documentation 1\u0026gt;/dev/null 2\u0026gt;/dev/null; \\ mv modules documentation 1\u0026gt;/dev/null 2\u0026gt;/dev/null;\u0026#39; \\ v1.0.0.Final..docs_filtered Examining the history now, we can see that the commits on the docs_filtered branch apply the changes to the documentation directory, as expected.\nOne problem still remains, though: by means of the --subdirectory-filter option, the very first commit removes all contents besides the documentation directory. This can be fixed by doing an interactive rebase of the current branch, beginning at the v1.0.0.Final tag:\ngit rebase -i v1.0.0.Final We need to edit the very first commit; all changes besides those to the documentation directory need to be reverted from that commit. There might be a better way of doing so; I simply ran git checkout for all the other resources:\ngit checkout v1.0.0.Final debezium-connector-mongodb git checkout v1.0.0.Final debezium-connector-mysql ... At this point the filtered branch still is based off of the v1.0.0.Final tag, whereas it should be based off of the docs_backport branch. git rebase --onto to the rescue:\ngit rebase --onto docs_backport v1.0.0.Final docs_filtered This rebases all the commits from the docs_filtered branch onto the docs_backport branch. Now we have a state where all the documentation changes have been cleanly applied to the 1.0 code base, i.e. 
the following should yield no differences:\ngit diff docs_filtered..master documentation The last missing step is to do another rebase of all the documentation commits, discarding those that apply to any features that didn’t get backported to 1.0.\nThankfully, my partner-in-crime Jiri Pechanec stepped in here: as he had done the original feature backport, it didn’t take him too long to go through the list of documentation commits and identify those which were relevant for the 1.0 code base. After one more interactive rebase for applying those we finally were in a state where all the required documentation changes had been backported.\nLooking at the 1.0 history, you’d still see some partial documentation changes up to the point where we decided to start all over and revert these. Theoretically we could do another git filter-branch run to exclude those, but we decided against that, as we already had done releases off of the 1.0 branch and didn’t want to alter the commit history of a released branch after the fact.\n","id":182,"publicationdate":"Mar 16, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_problem\"\u003eThe Problem\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_the_solution\"\u003eThe Solution\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWithin \u003ca href=\"https://debezium.io/\"\u003eDebezium\u003c/a\u003e, the project I’m working on at Red Hat, we recently encountered an \u0026#34;interesting\u0026#34; situation where we had to resolve a rather difficult merge conflict.\nAs others were interested in how we addressed the issue, and also for our own future reference,\nI’m going to give a quick rundown of the problem we encountered and how we solved 
it.\u003c/p\u003e\n\u003c/div\u003e","tags":["git","tooling"],"title":"Reworking Git Branches with git filter-branch","uri":"https://www.morling.dev/blog/reworking-git-branches-with-git-filter-branch/"},{"content":" Table of Contents Custom Flight Recorder Events Creating JFR Recordings Event Settings JFR Event Streaming MicroProfile Metrics Summary and Related Work The JDK Flight Recorder (JFR) is an invaluable tool for gaining deep insights into the performance characteristics of Java applications. Open-sourced in JDK 11, JFR provides a low-overhead framework for collecting events from Java applications, the JVM and the operating system.\nIn this blog post we’re going to explore how custom, application-specific JFR events can be used to monitor a REST API, allowing to track request counts, identify long-running requests and more. We’ll also discuss how the JFR Event Streaming API new in Java 14 can be used to export live events, making them available for monitoring and alerting via tools such as Prometheus and Grafana.\nJFR and its companion tool JDK Mission Control (JMC) for analyzing JFR recordings have come a long way; originally developed at BEA and part of the JRockit VM, they were later on commercial features of the Oracle JDK. As of Java 11, JFR got open-sourced and is part of OpenJDK distributions. JMC is also open-source, but it’s an independent tool under the OpenJDK umbrella, which must be downloaded separately.\nUsing the combination of JFR and JMC, you can get all kinds of information about your Java application, such as events on garbage collection, compilation, classloading, memory allocation, file and socket IO, method profiling data, and much more. To learn more about Flight Recorder and Mission Control in general, have a look at the Code One 2019 presentation Introduction to JDK Mission Control \u0026amp; JDK Flight Recorder by Marcus Hirt and Klara Ward. 
You can find some more links to related useful resources towards the end of this post.\nCustom Flight Recorder Events One thing that’s really great about JFR and JMC is that you’re not limited to the events and data baked into the JVM and platform libraries: JFR also provides an API for implementing custom events. That way you can use the low-overhead event recording infrastructure (its goal is to add at most 1% performance overhead) for your own event types. This allows you to record and analyze higher-level events, using the language of your application-specific domain.\nTaking my day job project Debezium as an example (an open-source platform for change data capture for a variety of databases), we could for instance produce events such as \u0026#34;Snapshot started\u0026#34;, \u0026#34;Snapshotting of table \u0026#39;Customers\u0026#39; completed\u0026#34;, \u0026#34;Captured change event for transaction log offset 123\u0026#34; etc. Users could send us recordings with these events and we could dive into them, in order to identify bugs or performance issues.\nIn the following let’s consider a less complex and hence more approachable example, though. We’ll implement an event for measuring the duration of REST API calls. The Todo service from my recent blog post on Quarkus Qute will serve as our guinea pig. It is based on the Quarkus stack and provides a simple REST API based on JAX-RS. As always, you can find the complete source code for this blog post on GitHub.\nEvent types are implemented by extending the jdk.jfr.Event class; it already provides us with some common attributes such as a timestamp and a duration. 
In sub-classes you can add application-specific payload attributes, as well as some metadata such as a name and category which will be used for organizing and displaying events when looking at them in JMC.\nWhich attributes to add depends on your specific requirements; you should aim for the right balance between capturing all the relevant information that will be useful for analysis purposes later on, while not going overboard and adding too much, as that could cause recording files to become too large, in particular for events that are emitted with a high frequency. Also, retrieval of the attributes should be an efficient operation, so as to avoid any unnecessary overhead.\nHere’s a basic event class for monitoring our REST API calls:\n@Name(JaxRsInvocationEvent.NAME) (1) @Label(\u0026#34;JAX-RS Invocation\u0026#34;) @Category(\u0026#34;JAX-RS\u0026#34;) @Description(\u0026#34;Invocation of a JAX-RS resource method\u0026#34;) @StackTrace(false) (2) public class JaxRsInvocationEvent extends Event { static final String NAME = \u0026#34;dev.morling.jfr.JaxRsInvocation\u0026#34;; @Label(\u0026#34;Resource Method\u0026#34;) (3) public String method; @Label(\u0026#34;Media Type\u0026#34;) public String mediaType; @Label(\u0026#34;Java Method\u0026#34;) public String javaMethod; @Label(\u0026#34;Path\u0026#34;) public String path; @Label(\u0026#34;Query Parameters\u0026#34;) public String queryParameters; @Label(\u0026#34;Headers\u0026#34;) public String headers; @Label(\u0026#34;Length\u0026#34;) @DataAmount (4) public int length; @Label(\u0026#34;Response Headers\u0026#34;) public String responseHeaders; @Label(\u0026#34;Response Length\u0026#34;) public int responseLength; @Label(\u0026#34;Response Status\u0026#34;) public int status; } 1 The @Name, @Category, @Description and @Label annotations define some meta-data, e.g. 
used for controlling the appearance of these events in the JMC UI 2 JAX-RS invocation events shouldn’t contain a stacktrace by default, as that’d only increase the size of Flight Recordings without adding much value 3 One payload attribute is defined for each relevant property such as HTTP method, media type, the invoked path etc. 4 @DataAmount tags this attribute as a data amount (by default in bytes) and will be displayed accordingly in JMC; there are many other similar annotations in the jdk.jfr package, such as @MemoryAddress, @Timestamp and more Having defined the event class itself, we must find a way for emitting event instances at the right point in time. In the simplest case, e.g. suitable for events related to your application logic, this might happen right in the application code itself. For more \u0026#34;technical\u0026#34; events it’s a good idea though to keep the creation of Flight Recorder events separate from your business logic, e.g. by using mechanisms such as servlet filters, interceptors and similar, which allow to inject cross-cutting logic into the call flow of your application.\nYou also might employ byte code instrumentation at build or runtime for this purpose. The JMC Agent project aims at providing a configurable Java agent that allows to dynamically inject code for emitting JFR events into running programs. 
Via the EventFactory class, the JFR API also provides a way for defining event types dynamically, should their payload attributes only be known at runtime.\nFor monitoring a JAX-RS based REST API, the ContainerRequestFilter and ContainerResponseFilter contracts come in handy, as they allow to hook into the request handling logic before and after a REST request gets processed:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 @Provider (1) public class FlightRecorderFilter implements ContainerRequestFilter, ContainerResponseFilter { @Override (2) public void filter(ContainerRequestContext requestContext) throws IOException { JaxRsInvocationEvent event = new JaxRsInvocationEvent(); if (!event.isEnabled()) { (3) return; } event.begin(); (4) requestContext.setProperty(JaxRsInvocationEvent.NAME, event); (5) } @Override (6) public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) throws IOException { JaxRsInvocationEvent event = (JaxRsInvocationEvent) requestContext .getProperty(JaxRsInvocationEvent.NAME); if (event == null || !event.isEnabled()) { return; } event.end(); (7) event.path = String.valueOf(requestContext.getUriInfo().getPath()); if (event.shouldCommit()) { (8) event.method = requestContext.getMethod(); event.mediaType = String.valueOf(requestContext.getMediaType()); event.length = requestContext.getLength(); event.queryParameters = requestContext.getUriInfo() .getQueryParameters().toString(); event.headers = requestContext.getHeaders().toString(); event.javaMethod = getJavaMethod(requestContext); event.responseLength = responseContext.getLength(); event.responseHeaders = responseContext.getHeaders().toString(); event.status = responseContext.getStatus(); event.commit(); (9) } } private String getJavaMethod(ContainerRequestContext requestContext) { String propName = 
\u0026#34;org.jboss.resteasy.core.ResourceMethodInvoker\u0026#34;; ResourceMethodInvoker invoker = (ResourceMethodInvoker)requestContext.getProperty(propName); return invoker.getMethod().toString(); } } 1 Allows the filter to be picked up automatically by the JAX-RS implementation 2 Will be invoked before the request is processed 3 Nothing to do if the event type is not enabled for recordings currently 4 Begin the timing of the event 5 Store the event in the request context, so it can be obtained again later on 6 Will be invoked after the request has been processed 7 End the timing of the event 8 The event should be committed if it is enabled and its duration is within the threshold configured for it; in that case, populate all the payload attributes of the event based on the values from the request and response contexts 9 Commit the event with Flight Recorder With that, our event class is pretty much ready to be used. There’s only one more thing to do, and that is registering the new type with the Flight Recorder system. A Quarkus application start-up lifecycle method comes in handy for that:\n1 2 3 4 5 6 7 @ApplicationScoped public class Metrics { public void registerEvent(@Observes StartupEvent se) { FlightRecorder.register(JaxRsInvocationEvent.class); } } Note this step isn’t strictly needed, the event type can also be used without explicit registration. But doing so will later on allow to apply specific settings for the event in Mission Control (see below), also if no event of this type has been emitted yet.\nCreating JFR Recordings Now let’s capture some JAX-RS API events using Flight Recorder and inspect them in Mission Control.\nTo do so, make sure to have Mission Control installed. Just as with OpenJDK, there are different builds for Mission Control to choose from. If you’re in the Fedora/RHEL universe, there’s a repository package which you can install, e.g. 
like this for the Fedora JMC package:\n1 sudo dnf module install jmc:7/default Alternatively, you can download builds for different platforms from Oracle; some more info about these builds can be found in this blog post by Marcus Hirt. There’s also the Liberica Mission Control build by BellSoft and Zulu Mission Control by Azul. The AdoptOpenJDK provides snapshot builds of JMC 8 as well as an Eclipse update site for installing JMC into an existing Eclipse instance.\nIf you’d like to follow along and run these steps yourself, check out the source code from GitHub and then perform the following commands:\n1 2 cd example-service \u0026amp;\u0026amp; mvn clean package \u0026amp;\u0026amp; cd .. docker-compose up --build This builds the project using Maven and spins up the following services using Docker Compose:\nexample-service: The Todo example application\ntodo-db: The Postgres database used by the Todo service\nprometheus and grafana: For monitoring live events later on\nThen go to http://localhost:8080/todo, where you should see the Todo web application:\nNow fire up Mission Control. The example service run via Docker Compose is configured so you can connect to it on localhost. In the JVM Browser, create a new connection with host \u0026#34;localhost\u0026#34; and port \u0026#34;1898\u0026#34;. Hit \u0026#34;Test connection\u0026#34;, which should yield \u0026#34;OK\u0026#34;, then click \u0026#34;Finish\u0026#34;.\nCreate a new recording by expanding the localhost:1898 node in the JVM Explorer, right-clicking on \u0026#34;Flight Recorder\u0026#34; and choosing \u0026#34;Start Flight Recording…​\u0026#34;. Confirm the default settings, which will create a recording with a duration of one minute. 
Go back to the Todo web application and perform a few tasks like creating some new todos, editing and deleting them, or filtering the todo list.\nEither wait for the recording to complete or stop it by right-clicking on the recording name and selecting \u0026#34;Stop\u0026#34;. Once the recording is done, it will be opened automatically. Now you could dive into all the logged events for the OS, the JVM etc., but as we’re interested in our custom JAX-RS events, choose \u0026#34;Event Browser\u0026#34; in the outline view and expand the \u0026#34;JAX-RS\u0026#34; category. You will see the events for all your REST API invocations, including information such as duration of the request, the HTTP method, the resource path and much more:\nIn a real-world use case, you could now use this information for instance to identify long-running requests and correlate these events with other data points in the Flight Recording, such as method profiling and memory allocation data, or sub-optimal SQL statements in your database.\nIf your application is running in production, it might not be feasible to connect to it via Mission Control from your local workstation. The jcmd utility comes in handy in that case; part of the JDK, you can use it to issue diagnostic commands against a running JVM.\nAmongst many other things, it allows you to start and stop Flight Recordings. On the environment with your running application, first run jcmd -l, which will show you the PIDs of all running Java processes. Having identified the PID of the process you’d like to examine, you can initiate a recording like so:\njcmd \u0026lt;PID\u0026gt; JFR.start delay=5s duration=30s \\ name=MyRecording filename=my-recording.jfr This will start a recording of 30 seconds, beginning 5 seconds from now. Once the recording is done, you could copy the file to your local machine and load it into Mission Control for further analysis. 
To learn more about creating Flight Recordings via jcmd, refer to this great cheat sheet.\nAnother useful tool in the belt is the jfr command, which was introduced in JDK 12. It allows you to filter and examine the binary Flight Recording files. You also can use it to extract parts of a recording and convert them to JSON, allowing them to be processed with other tools. E.g. you could convert all the JAX-RS events to JSON like so:\n1 jfr print --json --categories JAX-RS my-recording.jfr Event Settings Sometimes it’s desirable to configure detailed behaviors of a given event type. For the JAX-RS invocation event it might for instance make sense to only log invocations of particular paths in a specific recording, allowing for a smaller recording size and keeping the focus on a particular subset of all invocations. JFR supports this by the notion of event settings. Such settings can be specified when creating a recording; based on the active settings, particular events will be included or excluded in the recording.\nInspired by the JavaDoc of @SettingDefinition let’s see what’s needed to enhance JaxRsInvocationEvent with that capability. 
The first step is to define a subclass of jdk.jfr.SettingControl, which serves as the value holder for our setting:\npublic class PathFilterControl extends SettingControl { private Pattern pattern = Pattern.compile(\u0026#34;.*\u0026#34;); (1) @Override (2) public void setValue(String value) { this.pattern = Pattern.compile(value); } @Override (3) public String combine(Set\u0026lt;String\u0026gt; values) { return String.join(\u0026#34;|\u0026#34;, values); } @Override (4) public String getValue() { return pattern.toString(); } (5) public boolean matches(String s) { return pattern.matcher(s).matches(); } } 1 A regular expression pattern that’ll be matched against the path of incoming events; by default all paths are included (.*) 2 Invoked by the JFR runtime to set the value for this setting 3 Invoked when multiple recordings are running at the same time, combining the settings values 4 Invoked by the runtime for instance when getting the default value of the setting 5 Matches the configured setting value against a particular path On the event class itself a method with the following characteristics must be declared which will receive the setting from the JFR runtime:\nclass JaxRsInvocationEvent extends Event { @Label(\u0026#34;Path\u0026#34;) public String path; // other members... @Label(\u0026#34;Path Filter\u0026#34;) @SettingDefinition (1) protected boolean pathFilter(PathFilterControl pathFilter) { (2) return pathFilter.matches(path); } } 1 Tags this as a setting 2 The method must take a SettingControl type as its single parameter and return boolean This method will be invoked by the JFR runtime during the shouldCommit() call. It passes in the setting value of the current recording so it can be applied to the path value of the given event. 
In case the filter returns true, the event will be added to the recording; otherwise it will be ignored.\nWe also could use such a setting to control the inclusion or exclusion of specific event attributes. For that, the setting definition method would always have to return true, but depending on the actual setting it might set particular attributes of the event class to null. For instance this might come in handy if we wanted to log the entire request/response body of our REST API. Doing this all the time might be prohibitive in terms of recording size, but it might be enabled for a particular short-term recording for analyzing some bug.\nNow let’s see how the path filter can be applied when creating a new recording in Mission Control. The option is a bit hidden, but here’s how you can enable it. First, create a new Flight Recording, then choose \u0026#34;Template Manager\u0026#34; in the dialogue:\nDuplicate the \u0026#34;Continuous\u0026#34; template and edit it:\nClick \u0026#34;Advanced\u0026#34;:\nExpand \u0026#34;JAX-RS\u0026#34; → \u0026#34;JAX-RS Invocation\u0026#34; and put .*(new|edit).* into the Path Filter control:\nNow close the last two dialogues. In the \u0026#34;Start Flight Recording\u0026#34; dialogue make sure to select your new template under \u0026#34;Event Settings\u0026#34;; although you’ve edited it before, it won’t be selected automatically. I lost an hour or so wondering why my settings were not applied…\nLastly, click \u0026#34;Finish\u0026#34; to begin the recording:\nPerform some tasks in the Todo web app and stop the recording. You should see only the REST API calls for the new and edit operations, whereas no events should be shown for the list and delete operations of the API.\nIn order to apply specific settings when creating a recording on the CLI using jcmd, edit the settings as described above. Then go to the Template Manager and export the profile you’d like to use. 
When starting the recording via jcmd, specify the settings file via the settings=/path/to/settings.jfc parameter.\nJFR Event Streaming Flight Recorder files are great for analyzing performance characteristics in an \u0026#34;offline\u0026#34; approach: you can take recordings in your production environment and ship them to your work station or a remote support team, without requiring live access to the running application. This is also an interesting mode for open-source projects, where maintainers typically don’t have access to running applications of their users. Exchanging Flight Recordings (limited to a sensible subset of information, so as to avoid exposure of confidential internals) might allow open source developers to gain insight into characteristics of their libraries when deployed to production at their users.\nBut there’s another category of use cases for event data sourced from applications, the JVM and the operating system, where the recording file approach doesn’t quite fit: live monitoring and alerting of running applications. E.g. operations teams might want to set up dashboards showing the most relevant application metrics in \u0026#34;real-time\u0026#34;, without having to create any recording files first. A related requirement is alerting, so as to be notified when metrics reach a certain threshold. For instance it might be desirable to be alerted if the request duration of our JAX-RS API goes beyond a defined value such as 100 ms.\nThis is where JEP 349 (\u0026#34;JFR Event Streaming\u0026#34;) comes in. It’ll be part of Java 14 and its stated goal is to \u0026#34;provide an API for the continuous consumption of JFR data on disk, both for in-process and out-of-process applications\u0026#34;. That’s exactly what we need for our monitoring/dashboarding use case. 
Using the Streaming API, Flight Recorder events of the running application can be exposed to external consumers, without having to explicitly load any recording files.\nNow it may be prohibitively expensive to stream each and every event with all its detailed information to remote clients. But that’s not needed for monitoring purposes anyway. Instead, we can expose metrics based on our events, such as the total number and frequency of REST API invocations, or the average and 99th percentile duration of the calls.\nMicroProfile Metrics The following shows a basic implementation of exposing these metrics for the JAX-RS API events to Prometheus/Grafana, where they can be visualized using a dashboard. Being based on Quarkus, the Todo web application can leverage all the MicroProfile APIs. One of them is the MicroProfile Metrics API, which defines a \u0026#34;unified way for Microprofile servers to export Monitoring data (\u0026#34;Telemetry\u0026#34;) to management agents\u0026#34;.\nWhile the MicroProfile Metrics API is often used in an annotation-driven fashion, it also provides a programmatic API for registering metrics. 
This can be leveraged to expose metrics based on the JAX-RS Flight Recorder events:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 @ApplicationScoped public class Metrics { @Inject (1) MetricRegistry metricsRegistry; private RecordingStream recordingStream; (2) public void onStartup(@Observes StartupEvent se) { recordingStream = new RecordingStream(); (3) recordingStream.enable(JaxRsInvocationEvent.NAME); recordingStream.onEvent(JaxRsInvocationEvent.NAME, event -\u0026gt; { (4) String path = event.getString(\u0026#34;path\u0026#34;) .replaceAll(\u0026#34;(\\\\/)([0-9]+)(\\\\/?)\u0026#34;, \u0026#34;$1{param}$3\u0026#34;); (5) String method = event.getString(\u0026#34;method\u0026#34;); String name = path + \u0026#34;-\u0026#34; + method; Metadata metadata = metricsRegistry.getMetadata().get(name); if (metadata == null) { metricsRegistry.timer(Metadata.builder() (6) .withName(name) .withType(MetricType.TIMER) .withDescription(\u0026#34;Metrics for \u0026#34; + path + \u0026#34; (\u0026#34; + method + \u0026#34;)\u0026#34;) .build()).update(event.getDuration().toNanos(), TimeUnit.NANOSECONDS); } else { (7) metricsRegistry.timer(name).update(event.getDuration() .toNanos(), TimeUnit.NANOSECONDS); } }); recordingStream.startAsync(); (8) } public void stop(@Observes ShutdownEvent se) { recordingStream.close(); (9) try { recordingStream.awaitTermination(); } catch (InterruptedException e) { throw new RuntimeException(e); } } } 1 Inject the MicroProfile Metrics registry 2 A stream providing push access to JFR events 3 Initialize the stream upon application start-up, so it includes the JAX-RS invocation events 4 For each JaxRsInvocationEvent this callback will be invoked 5 To register a corresponding metric, any path parameters are replaced with a constant placeholder, so that e.g. 
all invocations of the todo/{id}/edit path are exposed via one single metric instead of having separate ones for Todo 1, Todo 2 etc. 6 If the metric for the specific path hasn’t been registered yet, then do so; it’s a metric of type TIMER, allowing metric consumers to track the duration of calls of that particular path 7 If the metric for the path has been registered before, update its value with the duration of the incoming event 8 Start the stream asynchronously, not blocking the onStartup() method 9 Close the JFR event stream upon application shutdown When connecting to the running application using JMC now, you’ll see a continuous recording, which serves as the basis for the event stream. It only contains events of the JaxRsInvocationEvent type.\nMicroProfile Metrics exposes any application-provided metrics in the Prometheus format under the /metrics/application endpoint; for each operation of the REST API, e.g. POST to /todo/{id}/edit, the following metrics are provided:\nrequest rate per second, minute, five minutes and 15 minutes\nmin, mean and max duration as well as standard deviation\ntotal invocation count\nduration of 75th, 95th, 99th etc. percentiles\nOnce the endpoint is provided, it’s not difficult to set up a scraping process for ingesting the metrics into the Prometheus time-series database. You can find the required Prometheus configuration in the accompanying source code repository.\nWhile Prometheus provides some visualization capabilities itself, it is often used together with Grafana, which allows to build nicely looking dashboards via a rather intuitive UI. Here’s an example dashboard showing the duration and invocation numbers for the different methods in the Todo REST API:\nAgain you can find the complete configuration for Grafana including the definition of that dashboard in the example repo. It will automatically be loaded when using the Docker Compose set-up shown above. 
Based on that you could easily expand the dashboard for other metrics and set up alerts, too.\nCombining the monitoring of live key metrics with the deep insights possible via detailed JFR recordings enables a very powerful workflow for analysing performance issues in production:\nWhen setting up the continuous recording that serves as the basis for the metrics, have it contain all the event types you’d need to gain insight into GC or memory issues etc.; specify a maximum size via RecordingStream#setMaxSize(), so as to avoid an indefinitely growing recording; you’ll probably need to experiment a bit to find the right trade-off between number of enabled events, duration that’ll be covered by the recording and the required disk space\nOnly expose a relevant subset of the events as metrics to Prometheus/Grafana, such as the JAX-RS API invocation events in our example\nSet up an alert in Grafana on the key metrics, e.g. mean duration of the REST calls, or 99th percentile thereof\nIf the alert triggers, take a dump of the last N minutes of the continuous recording via JMC or jcmd (using the JFR.dump command), and analyze that detailed recording to understand what was happening in the time leading to the alert\nSummary and Related Work Flight Recorder and Mission Control are excellent tools providing deep insight into the performance characteristics of Java applications. While there’s a large amount of data and highly valuable information provided out of the box, JFR and JMC also allow for the recording of custom, application-specific events. With its low overhead, JFR can be enabled on a permanent basis in production environments. Combined with the Event Streaming API introduced in Java 14, this opens up an attractive, very performant alternative to other means of capturing analysis information at application runtime, such as logging libraries. 
Providing live key metrics derived from JFR events to tools such as Prometheus and Grafana enables monitoring and alerting in \u0026#34;real-time\u0026#34;.\nFor many enterprises that are still on Java 11 or even 8, it’ll still be far out into the future until they might adopt the streaming API. But with more and more companies joining the OpenJDK efforts, it might be a possibility that this useful feature gets backported to earlier LTS releases, just as the open-sourced version of Flight Recorder itself got backported to Java 8.\nThere are quite a few posts and presentations about JFR and JMC available online, but many of them refer to older versions of those tools, before they got open-sourced. Here are some up-to-date resources which I found very helpful:\nContinuous Monitoring with JDK Flight Recorder: a talk from QCon SF 2019 by Mikael Vidstedt\nFlight Recorder \u0026amp; Mission Control at Code One 2019: a compilation of several great sessions on these two tools at last year’s Code One, put together by Marcus Hirt\nDigging Into Sockets With Java Flight Recorder: blog post by Petr Bouda on identifying performance bottlenecks with JFR in a Netty-based web application\nLastly, the Red Hat OpenJDK team is working on some very interesting projects around JFR and JMC, too. E.g. they’ve built a datasource for Grafana which lets you examine the events of a JFR file. They also work on tooling to simplify the usage of JFR in container-based environments such as Kubernetes and OpenShift, including a K8s Operator for controlling Flight Recordings and a web-based UI for managing JFR in remote JVMs. 
Should you happen to be at the FOSDEM conference in Brussels on the next weekend, be sure to not miss the JMC \u0026amp; JFR - 2020 Vision session by Red Hat engineer Jie Kang.\nIf you’d like to experiment with JDK Flight Recorder and JDK Mission Control based on the Todo web application yourself, you can find the complete source code for this post on GitHub.\nMany thanks to Mario Torre and Jie Kang for reviewing an early draft of this post.\n","id":183,"publicationdate":"Jan 29, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_custom_flight_recorder_events\"\u003eCustom Flight Recorder Events\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_creating_jfr_recordings\"\u003eCreating JFR Recordings\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_event_settings\"\u003eEvent Settings\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_jfr_event_streaming\"\u003eJFR Event Streaming\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_microprofile_metrics\"\u003eMicroProfile Metrics\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_summary_and_related_work\"\u003eSummary and Related Work\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eThe \u003ca href=\"https://openjdk.java.net/jeps/328\"\u003eJDK Flight Recorder\u003c/a\u003e (JFR) is an invaluable tool for gaining deep insights into the performance characteristics of Java applications.\nOpen-sourced in JDK 11, JFR provides a low-overhead framework for collecting events from Java applications, the JVM and the operating system.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIn this blog post we’re going to explore how custom, application-specific JFR events can be used to monitor a REST API, allowing to track 
request counts, identify long-running requests and more.\nWe’ll also discuss how the JFR \u003ca href=\"https://openjdk.java.net/jeps/349\"\u003eEvent Streaming API\u003c/a\u003e new in Java 14 can be used to export live events,\nmaking them available for monitoring and alerting via tools such as Prometheus and Grafana.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","jfr","monitoring","quarkus"],"title":"Monitoring REST APIs with Custom JDK Flight Recorder Events","uri":"https://www.morling.dev/blog/rest-api-monitoring-with-custom-jdk-flight-recorder-events/"},{"content":" Table of Contents Record Invariants and Bean Validation Implementation Advantages Limitations Wrap-Up Record types are one of the most awaited features in Java 14; they promise to \u0026#34;provide a compact syntax for declaring classes which are transparent holders for shallowly immutable data\u0026#34;. One example where records should be beneficial are data transfer objects (DTOs), as e.g. found in the remoting layer of enterprise applications. Typically, certain rules should be applied to the attributes of such DTO, e.g. in terms of allowed values. The goal of this blog post is to explore how such invariants can be enforced on record types, using annotation-based constraints as provided by the Bean Validation API.\nRecord Invariants and Bean Validation Records (a preview feature as of Java 14) help to cut down the ceremony when defining plain data holder objects. 
In a nutshell, you solely need to declare the attributes that should make up the state of the record type (\u0026#34;components\u0026#34; in terms of JEP 359), and quite a few things you’d otherwise have to implement by hand will be created for you automatically:\na private final field and a corresponding read accessor for each component\na constructor for passing in all component values\ntoString(), equals() and hashCode() methods.\nAs an example, here’s a record Car with three components:\n1 2 3 public record Car(String manufacturer, String licensePlate, int seatCount) { } Now let’s assume a few class invariants should be applied to this record (inspired by an example from the Hibernate Validator reference guide):\nmanufacturer is a non-blank string\nlicense plate is never null and has a length of 2 to 14 characters\nseatCount is at least 2\nClass invariants like these are specific conditions or rules applying to the state of a class (as manifesting in its fields), which always are guaranteed to be satisfied for the lifetime of an instance of the class.\nThe Bean Validation API defines a way for expressing and validating constraints using Java annotations. By putting constraint annotations to the components of a record type, it’s a perfect means of describing the invariants from above:\n1 2 3 4 5 public record Car( @NotBlank String manufacturer, @NotNull @Size(min = 2, max = 14) String licensePlate, @Min(2) int seatCount) { } Of course declaring constraints using annotations by itself won’t magically enforce these invariants. In order to do so, the javax.validation.Validator API must be invoked at suitable points in the object lifecycle, so to avoid any of the invariants to be violated. As records are immutable, it is sufficient to validate the constraints once when creating a new Car instance. 
If no constraints are violated, the created instance is guaranteed to always satisfy its invariants.\nImplementation The key question now is how to validate the invariants while constructing new Car instances. This is where Bean Validation’s API for method validation comes in: it allows to validate pre- and post-conditions that should be satisfied when a Java method or constructor gets invoked. Pre-conditions are expressed by applying constraints to method and constructor parameters, whereas post-conditions are expressed by putting constraints to a method or constructor itself.\nThis can be leveraged for enforcing record invariants: as it turns out, any annotations on the components of a record type are also copied to the corresponding parameters of the generated constructor. I.e. the Car record implicitly has a constructor which looks like this:\n1 2 3 4 5 6 7 8 9 public Car( @NotBlank String manufacturer, @NotNull @Size(min = 2, max = 14) String licensePlate, @Min(2) int seatCount) { this.manufacturer = manufacturer; this.licensePlate = licensePlate; this.seatCount = seatCount; } That’s exactly what we need: by validating these parameter constraints upon instantiation of the Car class, we can make sure that only valid objects can ever be created, ensuring that the record type’s invariants are always guaranteed.\nWhat’s missing is a way for automatically validating them upon constructor invocation. The idea for that is to enhance the byte code of the implicit Car constructor so that it passes the incoming parameter values to Bean Validation’s ExecutableValidator#validateConstructorParameters() method and raises a constraint violation exception in case of any invalid parameter values.\nWe’re going to use the excellent ByteBuddy library for this job. 
Here’s a slightly simplified implementation for invoking the executable validator (you can find the complete source code of this example in this GitHub repository):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 public class ValidationInterceptor { private static final Validator validator = Validation (1) .buildDefaultValidatorFactory() .getValidator(); public static \u0026lt;T\u0026gt; void validate(@Origin Constructor\u0026lt;T\u0026gt; constructor, @AllArguments Object[] args) { (2) Set\u0026lt;ConstraintViolation\u0026lt;T\u0026gt;\u0026gt; violations = validator (3) .forExecutables() .validateConstructorParameters(constructor, args); if (!violations.isEmpty()) { String message = violations.stream() (4) .sorted(ValidationInterceptor::compare) .map(cv -\u0026gt; getParameterName(cv) + \u0026#34; - \u0026#34; + cv.getMessage()) .collect(Collectors.joining(System.lineSeparator())); throw new ConstraintViolationException( (5) \u0026#34;Invalid instantiation of record type \u0026#34; + constructor.getDeclaringClass().getSimpleName() + System.lineSeparator() + message, violations); } } private static int compare(ConstraintViolation\u0026lt;?\u0026gt; o1, ConstraintViolation\u0026lt;?\u0026gt; o2) { return Integer.compare(getParameterIndex(o1), getParameterIndex(o2)); } private static String getParameterName(ConstraintViolation\u0026lt;?\u0026gt; cv) { // traverse property path to extract parameter name } private static int getParameterIndex(ConstraintViolation\u0026lt;?\u0026gt; cv) { // traverse property path to extract parameter index } } 1 Obtain a Bean Validation Validator instance 2 The @Origin and @AllArguments annotations are the hint to ByteBuddy that the invoked constructor and parameter values should be passed to this method from within the enhanced constructor 3 Validate the passed constructor arguments using Bean Validation 4 If there’s at least one violated constraint, create a 
message comprising all constraint violation messages, ordered by parameter index 5 Raise a ConstraintViolationException, containing the message created before as well as all the constraint violations Having implemented the validation interceptor, the code of the record constructor must be enhanced by ByteBuddy, so that it invokes the interceptor. ByteBuddy provides different ways for doing so, e.g. at application start-up using a Java agent. For this example, we’re going to employ build-time enhancement via the ByteBuddy Maven plug-in. The enhancement logic itself is implemented in a custom net.bytebuddy.build.Plugin:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 public class ValidationWeavingPlugin implements Plugin { @Override public boolean matches(TypeDescription target) { (1) return target.getDeclaredMethods() .stream() .anyMatch(m -\u0026gt; m.isConstructor() \u0026amp;\u0026amp; hasConstrainedParameter(m)); } @Override public Builder\u0026lt;?\u0026gt; apply(Builder\u0026lt;?\u0026gt; builder, TypeDescription typeDescription, ClassFileLocator classFileLocator) { return builder.constructor(this::hasConstrainedParameter) (2) .intercept(SuperMethodCall.INSTANCE.andThen( MethodDelegation.to(ValidationInterceptor.class))); } private boolean hasConstrainedParameter(MethodDescription method) { return method.getParameters() (3) .asDefined() .stream() .anyMatch(p -\u0026gt; isConstrained(p)); } private boolean isConstrained( ParameterDescription.InDefinedShape parameter) { (4) return !parameter.getDeclaredAnnotations() .asTypeList() .filter(hasAnnotation(annotationType(Constraint.class))) .isEmpty(); } @Override public void close() throws IOException { } } 1 Determines whether a type should be enhanced or not; this is the case if there’s at least one constructor that has one or more constrained parameters 2 Applies the actual enhancement: into each constrained constructor the call to ValidationInterceptor 
gets injected 3 Determines whether a method or constructor has at least one constrained parameter 4 Determines whether a parameter has at least one constraint annotation (an annotation meta-annotated with @Constraint; for the sake of simplicity the case of constraint inheritance is ignored here) The next step is to configure the ByteBuddy Maven plug-in in the pom.xml of the project:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;net.bytebuddy\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;byte-buddy-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;${version.bytebuddy}\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;transform\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;transformations\u0026gt; \u0026lt;transformation\u0026gt; \u0026lt;plugin\u0026gt; dev.morling.demos.recordvalidation.implementation.ValidationWeavingPlugin \u0026lt;/plugin\u0026gt; \u0026lt;/transformation\u0026gt; \u0026lt;/transformations\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; This plug-in runs in the process-classes phase by default, so it can access and enhance the class files generated during compilation. 
If you were to build the project now, you could use the javap tool to examine the byte code of the Car class, and you’d see that the implicit constructor of that class contains an invocation of the ValidationInterceptor#validate() method.\nAs an example, let’s consider the following attempt to instantiate a Car object, which violates the invariants of that record type:\n1 Car invalid = new Car(\u0026#34;\u0026#34;, \u0026#34;HH-AB-123\u0026#34;, 1); A constraint violation like this will be thrown immediately:\n1 2 3 4 5 javax.validation.ConstraintViolationException: Invalid instantiation of record type Car manufacturer - must not be blank seatCount - must be greater than or equal to 2 at dev.morling.demos.recordvalidation.RecordValidationTest.canValidate(RecordValidationTest.java:20) If all constraints are satisfied, no exception will be thrown and the caller obtains the new Car instance, whose invariants are guaranteed to be met for the remainder of the object’s lifetime.\nAdvantages Having shown how Bean Validation can be leveraged to enforce the invariants of Java record types, it is time to reflect: is this approach worth the additional complexity incurred by adding a library such as Bean Validation and hooking it up using byte code enhancement? After all, you could also validate incoming parameter values using methods such as Objects#requireNonNull().\nAs so often, you need to make such a decision based on your specific requirements and needs. Here are some advantages I can see about the Bean Validation approach:\nInvariants become part of the API: Constraint annotations on public API members such as the implicit record constructor are easily discoverable by users of such type; they are listed in generated JavaDoc, you can see them when hovering over an invocation in your IDE (once records are supported); when used on the DTOs of a REST layer, the invariants could also be added to automatically generated API documentation. 
All this makes it easy for users of the type to understand the invariants and also avoids potential inconsistencies between a manual validation implementation and corresponding hand-written documentation\nProviding constraint metadata: The Bean Validation constraint meta-data API can be used to obtain information about the constraints of Java types; for instance this can be used to implement client-side validation of constraints in a web application\nLess code: Putting constraint annotations directly to the record components themselves avoids the need for implementing these checks manually in an explicit canonical constructor\nI18N support: Bean Validation provides means of internationalizing constraint violation messages; if your record types are instantiated based on user input (e.g. when using them as data types in a REST API), this allows for localized error messages in the UI\nReturning all constraints at once: For UIs it’s typically beneficial to return all the constraint violations at once instead of showing them one by one; while doable in a hand-written implementation, it requires a bit of effort, whereas you get this \u0026#34;for free\u0026#34; when using Bean Validation which always returns a set of all the violations\nLots of ready-made constraints: Bean Validation comes with a range of constraints out of the box; in addition libraries such as Hibernate Validator and others provide many more ready-to-use constraints, coming in handy for instance when implementing domain-specific value types with complex validation rules:\n1 2 3 public record EmailAddress( @Email @NotNull @Size(min=1, max=250) String value) { } Support for validation groups: Bean Validation’s concept of validation groups allows you to validate only sub-sets of constraints in specific contexts; e.g. based on location and applying legal requirements\nDynamic constraint definition: Using Hibernate Validator, constraints can also be declared dynamically using a fluent API. 
This can be very useful when your validation requirements vary at runtime, e.g. if you need to apply different constraint configurations for different tenants.\nLimitations One area where this current proof-of-concept implementation falls a bit short is the validation of invariants that apply to multiple components. For instance consider a record type representing an interval with a begin and an end attribute, where you’d like to enforce the invariant that end is larger than begin.\nBean Validation addresses this sort of requirement via class-level constraints and, for method and constructor validation, cross-parameter constraints. Class-level constraints are not really suitable for our purposes, because we want to validate the invariants before an object instance is created.\nCross-parameter constraints on the other hand are exactly what we’d need. As they must be given on a constructor or method, the canonical constructor of a record must be explicitly declared in this case. Using Hibernate Validator’s @ParameterScriptAssert constraint, the invariant from above could be expressed like so:\n1 2 3 4 5 6 public record Interval(int begin, int end) { @ParameterScriptAssert(lang=\u0026#34;javascript\u0026#34;, script=\u0026#34;end \u0026gt; begin\u0026#34;) public Interval { } } This works as expected, but there’s one caveat: any annotations from the record components are not propagated to the corresponding parameters of the canonical constructor in this case. This means that any constraints given on the individual components would be lost. 
Right now it’s not quite clear to me whether that’s an intended behavior or rather a bug in the current record implementation.\nIf indeed it is intentional, then there’d be no way other than specifying the constraints explicitly on the parameters of a fully manually implemented constructor:\n1 2 3 4 5 6 7 8 public record Interval(int begin, int end) { @ParameterScriptAssert(lang=\u0026#34;javascript\u0026#34;, script=\u0026#34;end \u0026gt; begin\u0026#34;) public Interval(@Positive int begin, @Positive int end) { this.begin = begin; this.end = end; } } This works, but of course we’re losing a bit of the conciseness promised by records.\nUpdate, Jan 20, 2020, 20:57: Turns out, the current behavior indeed is not intended (see JDK-8236597) and in a future Java version the shorter version of the code shown above should work.\nWrap-Up In this blog post we’ve explored how invariants on Java 14 record types can be enforced using the Bean Validation API. With just a bit of byte code magic the task gets manageable: by validating invariants expressed by constraint annotations on record components right at instantiation time, only valid record instances will ever be exposed to callers. Key for that is the fact that any annotations from record components are automatically propagated to the corresponding parameters of the canonical record constructor. That way they can be validated using Bean Validation’s method validation API. It remains to be seen whether invariants based on multiple record components also can be enforced as easily.\nFrom the perspective of the Bean Validation specification, it’ll surely make sense to explore support for record types. While not as powerful as enforcing invariants at construction time via byte code enhancement, it might also be useful to support the validation of component values via their read accessors. 
For that, the notion of \u0026#34;properties\u0026#34; would have to be relaxed, as the read accessors of records don’t have the JavaBeans get prefix currently expected by Bean Validation. It also should be considered to expand the Bean Validation metadata API accordingly.\nI would also be very happy to learn about your thoughts around this topic. While Bean Validation 3.0 (as part of Jakarta EE 9) in all likelihood won’t bring any changes besides the transition to the jakarta.* package namespace, this may be an area where we could evolve the specification for Jakarta EE 10.\nIf you’d like to experiment with the validation of record types yourself, you can find the complete source code on GitHub.\n","id":184,"publicationdate":"Jan 20, 2020","section":"blog","summary":"Table of Contents Record Invariants and Bean Validation Implementation Advantages Limitations Wrap-Up Record types are one of the most awaited features in Java 14; they promise to \u0026#34;provide a compact syntax for declaring classes which are transparent holders for shallowly immutable data\u0026#34;. One example where records should be beneficial are data transfer objects (DTOs), as e.g. found in the remoting layer of enterprise applications. Typically, certain rules should be applied to the attributes of such DTO, e.","tags":["java","validation","records"],"title":"Enforcing Java Record Invariants With Bean Validation","uri":"https://www.morling.dev/blog/enforcing-java-record-invariants-with-bean-validation/"},{"content":"","id":185,"publicationdate":"Jan 20, 2020","section":"tags","summary":"","tags":null,"title":"records","uri":"https://www.morling.dev/tags/records/"},{"content":" Table of Contents An Example Configuration Should You Do This? When Java 9 was introduced in 2017, it was the last major version published under the old release scheme. Since then, a six month release cadence has been adopted. 
This means developers don’t have to wait years for new APIs and language features, but they can get their hands on the latest additions twice a year. In this post I’d like to describe how you can try out new language features such as Java 13 text blocks in the test code of your project, while keeping your main code still compatible with older Java versions.\nOne goal of the increased release cadence is to shorten the feedback loop for the OpenJDK team: have developers in the field try out new functionality early on, collect feedback based on that, adjust as needed. To aid with that process, the JDK has two means of publishing preliminary work before new APIs and language features are cast in stone:\nIncubator JDK modules\nPreview language and VM features\nAn example for the former is the new HTTP client API, which was an incubator module in JDK 9 and 10, before it got standardized as a regular API in JDK 11. Examples for preview language features are switch expressions (added as a preview feature in Java 12) and text blocks (added in Java 13).\nNow especially text blocks are a feature which many developers have missed in Java for a long time. They are really useful when embedding other languages, or just any kind of longer text into your Java program, e.g. multi-line SQL statements, JSON documents and others. So you might want to go and use them as quickly as possible, but depending on your specific situation and requirements, you may not be able to move to Java 13 just yet.\nIn particular when working on libraries, compatibility with older Java versions is a high priority in order to not cut off a large number of potential users. E.g. in the JetBrains Developer Ecosystem Survey from early 2019, 83% of participants said that Java 8 is a version they regularly use. This matches with what I’ve observed myself during conversations e.g. at conferences. 
Now this share may have decreased a bit since then (I couldn’t find any newer numbers), but at this point in time it still seems safe to say that libraries should support Java 8 to not limit their audience in a significant way.\nSo while building on Java 13 is fine, requiring it at runtime for libraries isn’t. Does this mean as a library author you cannot use text blocks then for many years to come? For your main code (i.e. the one shipped to users) it indeed does mean that, but things look different when it comes to test code.\nAn Example One case where text blocks come in extremely handy is testing of REST APIs, where JSON requests need to be created and responses may have to be compared to a JSON string with the expected value. Here’s an example of using text blocks in a test of a Quarkus-based REST service, implemented using RESTAssured and JSONAssert:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 @QuarkusTest public class TodoResourceTest { @Test public void canPostNewTodoAndReceiveId() throws Exception { given() .when() .body(\u0026#34;\u0026#34;\u0026#34; (1) { \u0026#34;title\u0026#34; : \u0026#34;Learn Java\u0026#34;, \u0026#34;completed\u0026#34; : false } \u0026#34;\u0026#34;\u0026#34; ) .contentType(ContentType.JSON) .post(\u0026#34;/hello\u0026#34;) .then() .statusCode(201) .body(matchesJson(\u0026#34;\u0026#34;\u0026#34; (2) { \u0026#34;id\u0026#34; : 1, \u0026#34;title\u0026#34; : \u0026#34;Learn Java\u0026#34;, \u0026#34;completed\u0026#34; : false } \u0026#34;\u0026#34;\u0026#34;) ); } } 1 Text block with the JSON request to send 2 Text block with the expected JSON response Indeed that’s much nicer to read, e.g. when comparing the request JSON to the code you’d typically write without text blocks. 
Concatenating multiple lines, escaping quotes and explicitly specifying line breaks make this quite cumbersome:\n1 2 3 4 5 6 .body( \u0026#34;{\\n\u0026#34; + \u0026#34; \\\u0026#34;title\\\u0026#34; : \\\u0026#34;Learn Java 13\\\u0026#34;,\\n\u0026#34; + \u0026#34; \\\u0026#34;completed\\\u0026#34; : false\\n\u0026#34; + \u0026#34;}\u0026#34; ) Now let’s see what’s needed in terms of configuration to enable usage of Java 13 text blocks for tests, while keeping the main code of a project compatible with Java 8.\nConfiguration Two options of the Java compiler javac come into play here:\n--release: specifies the Java version to compile for\n--enable-preview: allows to use language features currently in \u0026#34;preview\u0026#34; status such as text blocks as of Java 13/14\nThe --release option was introduced in Java 9 and should be preferred over the more widely known pair of --source and --target. The reason being that --release will prevent any accidental usage of APIs only introduced in later versions.\nE.g. say you were to write code such as List.of(\u0026#34;Foo\u0026#34;, \u0026#34;Bar\u0026#34;); the of() methods on java.util.List were only introduced in Java 9, so compiling with --release 8 will raise a compilation error in this case. When using the older options, this situation wouldn’t be detected at compile time, making the problem only apparent when actually running the application on the older Java version.\nBuild tools typically allow to use different configurations for the compilation of main and test code. E.g. here is what you’d use for Maven (you can find the complete source code of the example in this GitHub repo):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 ... \u0026lt;properties\u0026gt; ... \u0026lt;maven.compiler.release\u0026gt;8\u0026lt;/maven.compiler.release\u0026gt; (1) ... \u0026lt;/properties\u0026gt; \u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; ... 
\u0026lt;plugin\u0026gt; \u0026lt;artifactId\u0026gt;maven-compiler-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.8.1\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;default-testCompile\u0026lt;/id\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;release\u0026gt;13\u0026lt;/release\u0026gt; (2) \u0026lt;compilerArgs\u0026gt;--enable-preview\u0026lt;/compilerArgs\u0026gt; (3) \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... \u0026lt;/plugins\u0026gt; ... \u0026lt;/build\u0026gt; ... 1 Compile for release 8 by default, i.e. the main code 2 Compile test code for release 13 3 Also pass the --enable-preview option when compiling the test code Also at runtime preview features must be explicitly enabled. Therefore the java command must be accordingly configured when executing the tests, e.g. like so when using the Maven Surefire plug-in:\n1 2 3 4 5 6 7 8 9 ... \u0026lt;plugin\u0026gt; \u0026lt;artifactId\u0026gt;maven-surefire-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.22.1\u0026lt;/version\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;argLine\u0026gt;--enable-preview\u0026lt;/argLine\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; ... With this configuration in place, text blocks can now be used in tests as the one above, but not in the main code of the program. Doing so would result in a compilation error.\nNote your IDE might still let you do this kind of mistake. At least Eclipse chose for me the maximum of main (8) and test code (13) release levels when importing the project. But running the build on the command line via Maven or on your CI server will detect this situation.\nAs Java 13 now is required to build this code base, it’s a good idea to make this prerequisite explicit in the build process itself. 
The Maven enforcer plug-in comes in handy for that, allowing to express this requirement using its Java version rule:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ... \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-enforcer-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.0.0-M3\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;enforce-java\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;enforce\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;rules\u0026gt; \u0026lt;requireJavaVersion\u0026gt; \u0026lt;version\u0026gt;[13,)\u0026lt;/version\u0026gt; \u0026lt;/requireJavaVersion\u0026gt; \u0026lt;/rules\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; ... The plug-in will fail the build when being run on a version before Java 13.\nShould You Do This? Having seen how you can use preview features in test code, the question is: should you actually do this? A few things should be kept in mind for answering that. First of all, preview features are really that, a preview. This means that details may change in future Java revisions. Or, albeit unlikely, such feature may even be dropped altogether, should the JDK team arrive at the conclusion that it is fundamentally flawed.\nAnother important factor is the minimum Java language version supported by the JDK compiler. As of Java 13, the oldest supported release is 7; i.e. using JDK 13, you can produce byte code that can be run with Java versions as old as Java 7. In order to keep the Java compiler maintainable, support for older versions is dropped every now and then. 
Right now, there’s no formal process in place which would describe when support for a specific version is going to be removed (defining such a policy is the goal of JEP 182).\nAs per JDK developer Joe Darcy, \u0026#34;there are no plans to remove support for --release 7 in JDK 15\u0026#34;. Conversely, this means that support for release 7 theoretically could be removed in JDK 16 and support for release 8 could be removed in JDK 17. In that case you’d be caught between a rock and a hard place: Once you’re on a non-LTS (\u0026#34;long-term support\u0026#34;) release like JDK 13, you’ll need to upgrade to JDK 14, 15 etc. as soon as they are out, in order to not be cut off from bug fixes and security patches. Now while doing so, you’d be forced to increase the release level of your main code, once support for release 8 gets dropped, which may not be desirable. Or you’d have to apply some nice awk/sed magic to replace all those shiny text blocks with traditional concatenated and escaped strings, so you can go back to the current LTS release, Java 11. Not nice, but surely doable.\nThat being said, this all doesn’t seem like a likely scenario to me. JEP 182 expresses a desire \u0026#34;that source code 10 or more years old should still be able to be compiled\u0026#34;; hence I think it’s safe to assume that JDK 17 (the next release planned to receive long-term support) will still support release 8, which will be seven years old when 17 gets released as planned in September 2021. 
In that case you’d be on the safe side, receiving update releases and being able to keep your main code Java 8 compatible for quite a few years to come.\nNeedless to say, it’s a call that you need to make, deciding for yourself whether the benefits of using new language features such as text blocks are worth it in your specific situation or not.\n","id":186,"publicationdate":"Jan 13, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_an_example\"\u003eAn Example\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_configuration\"\u003eConfiguration\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_should_you_do_this\"\u003eShould You Do This?\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eWhen Java 9 was introduced in 2017,\nit was the last major version published under the old release scheme.\nSince then, a \u003ca href=\"https://www.infoq.com/news/2017/09/Java6Month/\"\u003esix month release cadence\u003c/a\u003e has been adopted.\nThis means developers don’t have to wait years for new APIs and language features,\nbut they can get their hands onto the latest additions twice a year.\nIn this post I’d like to describe how you can try out new language features such as \u003ca href=\"http://openjdk.java.net/jeps/355\"\u003eJava 13 text blocks\u003c/a\u003e in the test code of your project,\nwhile keeping your main code still compatible with older Java versions.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","testing","language-features"],"title":"Using Java 13 Text Blocks (Only) for Your Tests","uri":"https://www.morling.dev/blog/using-java-13-text-blocks-for-tests/"},{"content":" Table of Contents The Basics Combining HTML and Data APIs Template Organization Error Handling Search Smoother User Experience via Unpoly Bonus: 
Using WebJars Wrap Up One of the long-awaited features in Quarkus was support for server-side templating: until recently, Quarkus supported only client-side web frameworks which obtain their data by calling a REST API on the backend. This has changed with Quarkus 1.1: it comes with a brand-new template engine named Qute, which allows building web applications using server-side templates.\nWhen looking at frameworks for building web applications, there are two large categories:\nclient-side solutions based on JavaScript such as React, vue.js or Angular\nserver-side frameworks such as Spring Web MVC, JSF or MVC 1.0 (in the Java world)\nBoth have their individual strengths and weaknesses and it wouldn’t be very wise to always prefer one over the other. Instead, the choice should be based on specific requirements (e.g. what kind of interactivity is needed) and prerequisites (e.g. the skillset of the team building the application).\nBeing mostly experienced with Java, server-side solutions are appealing to me, as they allow me to use the language I know and tooling (build tools, IDEs) I’m familiar and most productive with. So when Qute was announced, it instantly caught my attention and I had to give it a test ride. In this post I want to share some of the experiences I made.\nNote this isn’t a comprehensive tutorial for building web apps with Qute, instead, I’d like to discuss a few things that stuck out to me. You can find a complete working example here on GitHub. It implements a basic CRUD application for managing personal todos, persisted in a Postgres database. Here’s a video that shows the demo in action:\nThe Basics The Qute engine is based on RESTEasy/JAX-RS. As such, Qute web applications are implemented by defining resource types with methods answering to specific HTTP verbs and accept headers. The only difference is that HTML pages are returned instead of JSON as in your typical REST-ful data API. The individual pages are created by processing template files. 
Here’s a basic example for returning all the Todo records in our application:\n1 2 3 4 5 6 7 8 9 10 11 12 13 @Path(\u0026#34;/todo\u0026#34;) public class TodoResource { @Inject Template todos; @GET (1) @Consumes(MediaType.TEXT_HTML) (2) @Produces(MediaType.TEXT_HTML) public TemplateInstance listTodos() { return todos.data(\u0026#34;todos\u0026#34;, Todo.findAll().list()); (3) } } 1 Processes HTTP GET requests for /todo 2 This method consumes and produces the text/html media type 3 Obtain all todos from the database and feed them to the todos template The Todo class is a JPA entity implemented via Hibernate Panache:\n1 2 3 4 5 6 7 @Entity public class Todo extends PanacheEntity { public String title; public int priority; public boolean completed; } Panache is a perfect fit for this kind of CRUD application. It helps with common tasks such as id mapping, and by means of the active record pattern you get query methods like findAll() \u0026#34;for free\u0026#34;.\nTo produce an HTML page for displaying the result list, the todos template is used. Templates are located under src/main/resources/templates. As you would expect, changes to template files are immediately picked up when running Quarkus in Dev Mode. By default, the template name is derived from the field name of the injected Template instance, i.e. in this case the src/main/resources/templates/todos.html template will be used. It could look like this:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 \u0026lt;!doctype html\u0026gt; \u0026lt;html lang=\u0026#34;en\u0026#34;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;utf-8\u0026#34;\u0026gt; \u0026lt;!-- CSS ... 
--\u0026gt; \u0026lt;link rel=\u0026#34;stylesheet\u0026#34; href=\u0026#34;...\u0026#34;\u0026gt; \u0026lt;title\u0026gt;My Todos\u0026lt;/title\u0026gt; \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;div class=\u0026#34;container\u0026#34;\u0026gt; \u0026lt;h1\u0026gt;My Todos\u0026lt;/h1\u0026gt; \u0026lt;table class=\u0026#34;table table-striped table-bordered\u0026#34;\u0026gt; \u0026lt;thead\u0026gt; \u0026lt;tr\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; class=\u0026#34;fit\u0026#34;\u0026gt;Id\u0026lt;/th\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; \u0026gt;Title\u0026lt;/th\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; class=\u0026#34;fit\u0026#34;\u0026gt;Priority\u0026lt;/th\u0026gt; \u0026lt;th scope=\u0026#34;col\u0026#34; class=\u0026#34;fit\u0026#34;\u0026gt;Completed\u0026lt;/th\u0026gt; \u0026lt;/tr\u0026gt; \u0026lt;/thead\u0026gt; {#if todos.size == 0} (1) \u0026lt;tr\u0026gt; \u0026lt;td colspan=\u0026#34;4\u0026#34;\u0026gt;No data found.\u0026lt;/td\u0026gt; \u0026lt;/tr\u0026gt; {#else} {#for todo in todos} (2) \u0026lt;tr\u0026gt; \u0026lt;th scope=\u0026#34;row\u0026#34;\u0026gt;#{todo.id}\u0026lt;/th\u0026gt; \u0026lt;td\u0026gt; {todo.title} (3) \u0026lt;/td\u0026gt; \u0026lt;td\u0026gt; {todo.priority} (4) \u0026lt;/td\u0026gt; \u0026lt;td\u0026gt; (5) \u0026lt;div class=\u0026#34;custom-control custom-checkbox\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;checkbox\u0026#34; class=\u0026#34;custom-control-input\u0026#34; disabled id=\u0026#34;completed-{todo.id}\u0026#34; {#if todo.completed}checked{/if}\u0026gt; \u0026lt;label class=\u0026#34;custom-control-label\u0026#34; for=\u0026#34;completed-{todo.id}\u0026#34;\u0026gt;\u0026lt;/label\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/td\u0026gt; \u0026lt;/tr\u0026gt; {/for} {/if} \u0026lt;/table\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; 1 If the injected todos list is empty, display a placeholder row 2 Otherwise, iterate 
over the todos list and add a table row for each one 3 Table cell for title 4 Table cell for priority 5 Table cell for completion status, rendered as a checkbox If you’ve worked with other templating engines before, this will look very familiar to you. You can refer to injected objects and their properties to display their values, have conditional logic, iterate over collections etc. A very nice aspect about Qute templates is that they are processed at build time, following the Quarkus notion of \u0026#34;compile-time boot\u0026#34;. This means if there is an error in a template such as unbalanced control keywords, you’ll find out about this at build time instead of only at runtime.\nThe reference documentation describes the syntax and all options in depth. Note that things are still in flux here, e.g. I couldn’t work with boolean operators in conditions.\nCombining HTML and Data APIs Thanks to HTTP content negotiation, you can easily combine resource methods for returning HTML and JSON for API-style consumers in a single endpoint. Just add another resource method for handling the required media type, e.g. \u0026#34;application/json\u0026#34;:\n1 2 3 4 5 6 @GET @Produces(MediaType.APPLICATION_JSON) @Consumes(MediaType.APPLICATION_JSON) public List\u0026lt;Todo\u0026gt; listTodosJson() { return Todo.findAll().list(); } A standard HTTP request issued by a web browser would now be answered with the HTML page, whereas an AJAX request with the \u0026#34;application/json\u0026#34; accept header (or a manual invocation via curl) would yield the JSON representation. I really like that idea of considering HTML and JSON-based representations as two different \u0026#34;views\u0026#34; of the same API essentially.\nTemplate Organization If a web application has multiple pages or \u0026#34;views\u0026#34;, chances are there are many similarities between those. E.g. 
there might be a common header and footer for all pages, or one and the same form is used on multiple pages.\nTo avoid duplication in the templates in such cases, Qute supports the notion of includes. E.g. let’s say there’s a common form for creating new and editing existing todos. This can be put into its own template:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 (1) \u0026lt;form action=\u0026#34;/todo/{#if update}{todo.id}/edit{#else}new{/if}\u0026#34; method=\u0026#34;POST\u0026#34; name=\u0026#34;todoForm\u0026#34; enctype=\u0026#34;multipart/form-data\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-row align-items-center\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;col-sm-3 my-1\u0026#34;\u0026gt; \u0026lt;label class=\u0026#34;sr-only\u0026#34; for=\u0026#34;title\u0026#34;\u0026gt;Title\u0026lt;/label\u0026gt; (2) \u0026lt;input type=\u0026#34;text\u0026#34; name=\u0026#34;title\u0026#34; class=\u0026#34;form-control\u0026#34; id=\u0026#34;title\u0026#34; placeholder=\u0026#34;Title\u0026#34; required autofocus {#if update}value=\u0026#34;{todo.title}\u0026#34;{/if}\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;div class=\u0026#34;col-auto my-1\u0026#34;\u0026gt; \u0026lt;select class=\u0026#34;custom-select\u0026#34; name=\u0026#34;priority\u0026#34;\u0026gt; \u0026lt;option disabled value=\u0026#34;\u0026#34;\u0026gt;Priority\u0026lt;/option\u0026gt; {#for prio in priorities} \u0026lt;option value=\u0026#34;{prio}\u0026#34; {#if todo.priority == prio}selected{/if}\u0026gt;{prio}\u0026lt;/option\u0026gt; {/for} \u0026lt;/select\u0026gt; \u0026lt;/div\u0026gt; (3) {#if update} \u0026lt;div class=\u0026#34;col-auto my-1\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-check\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;completed\u0026#34; class=\u0026#34;form-check-input\u0026#34; id=\u0026#34;completed\u0026#34; {#if todo.completed}checked{/if}\u0026gt; \u0026lt;label 
class=\u0026#34;form-check-label\u0026#34; for=\u0026#34;completed\u0026#34;\u0026gt;Completed\u0026lt;/label\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/div\u0026gt; {/if} (4) \u0026lt;button type=\u0026#34;submit\u0026#34; class=\u0026#34;btn btn-primary\u0026#34;\u0026gt;{#if update}Update{#else}Create{/if}\u0026lt;/button\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/form\u0026gt; 1 Post to different path for update and create 2 Display existing title and priority in case of an update 3 Show checkbox for completion status in case of an update 4 Choose button caption depending on use case In order to display this form right under the table with all todos, the template can simply be included like so:\n1 2 \u0026lt;h2\u0026gt;New Todo\u0026lt;/h2\u0026gt; {#include todo-form.html}{/include} It’s also possible to extract the outer shell of multiple pages into a shared template (\u0026#34;template inheritance\u0026#34;). This allows to extract common headers and footers into one single template with placeholders for the inner parts.\nFor that, create a template with the common outer structure:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 \u0026lt;!doctype html\u0026gt; \u0026lt;html lang=\u0026#34;en\u0026#34;\u0026gt; \u0026lt;head\u0026gt; \u0026lt;meta charset=\u0026#34;utf-8\u0026#34;\u0026gt; \u0026lt;!-- CSS ... 
--\u0026gt; \u0026lt;link rel=\u0026#34;stylesheet\u0026#34; href=\u0026#34;...\u0026#34;\u0026gt; \u0026lt;title\u0026gt;{#insert title}Default Title{/}\u0026lt;/title\u0026gt; (1) \u0026lt;/head\u0026gt; \u0026lt;body\u0026gt; \u0026lt;div class=\u0026#34;container\u0026#34;\u0026gt; \u0026lt;h1\u0026gt;{#insert title}Default Title{/}\u0026lt;/h1\u0026gt; (1) {#insert contents}No contents!{/} (2) \u0026lt;/div\u0026gt; \u0026lt;/body\u0026gt; \u0026lt;/html\u0026gt; 1 Derived templates define a section title which will be inserted here 2 Derived templates define a section contents which will be inserted here Other templates can then extend the base one, e.g. like so for the \u0026#34;Edit Todo\u0026#34; page:\n1 2 3 4 5 6 {#include base.html} (1) {#title}Edit Todo #{todo.id}{/title} (2) {#contents} (3) {#include todo-form.html}{/include} (4) {/contents} {/include} 1 Include the base template 2 Define the title section 3 Define the contents section 4 Include the template for displaying the todo form As so often, a balance needs to be found between extracting common parts and still being able to comprehend the overall structure without having to pursue a large number of template references. But in any case with includes and inserts Qute puts the necessary tools into your hands.\nError Handling For a great user experience robust error handling is a must. E.g. it might happen that a user loads the \u0026#34;Edit Todo\u0026#34; dialog and while they’re in the process of editing, that record gets deleted by someone else. When saving, a proper error message should be displayed to the first user. 
Here’s the resource method implementation for that:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 @POST @Consumes(MediaType.MULTIPART_FORM_DATA) @Transactional @Path(\u0026#34;/{id}/edit\u0026#34;) public Object updateTodo( @PathParam(\u0026#34;id\u0026#34;) long id, @MultipartForm TodoForm todoForm) { Todo loaded = Todo.findById(id); (1) if (loaded == null) { (2) return error.data(\u0026#34;error\u0026#34;, \u0026#34;Todo with id \u0026#34; + id + \u0026#34; has been deleted after loading this form.\u0026#34;); } loaded = todoForm.updateTodo(loaded); (3) return Response.status(301) (4) .location(URI.create(\u0026#34;/todo\u0026#34;)) .build(); } 1 Load the todo record to be updated 2 If it doesn’t exist, render the \u0026#34;error\u0026#34; template 3 Otherwise, update the record; as loaded is an attached entity, no call to persist is needed 4 redirect the user to the main page, avoiding issues with reloading etc. (post-redirect-get pattern) Note that TemplateInstance as returned from the Template#data() method doesn’t extend the JAX-RS Response class. Therefore the return type of the method must be declared as Object in this case.\nSearch Thanks to Hibernate Panache it’s quite simple to refine the todo list and only return those whose title matches a given search term. Also ordering the list in some meaningful way would be nice. 
All we need is an optional query parameter for specifying the search term and a custom query method:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 @GET @Consumes(MediaType.TEXT_HTML) @Produces(MediaType.TEXT_HTML) public TemplateInstance listTodos(@QueryParam(\u0026#34;filter\u0026#34;) String filter) { return todos.data(\u0026#34;todos\u0026#34;, find(filter)); } @GET @Produces(MediaType.APPLICATION_JSON) @Consumes(MediaType.APPLICATION_JSON) public List\u0026lt;Todo\u0026gt; listTodosJson(@QueryParam(\u0026#34;filter\u0026#34;) String filter) { return find(filter); } private List\u0026lt;Todo\u0026gt; find(String filter) { Sort sort = Sort.ascending(\u0026#34;completed\u0026#34;) (1) .and(\u0026#34;priority\u0026#34;, Direction.Descending) .and(\u0026#34;title\u0026#34;, Direction.Ascending); if (filter != null \u0026amp;\u0026amp; !filter.isEmpty()) { (2) return Todo.find(\u0026#34;LOWER(title) LIKE LOWER(?1)\u0026#34;, sort, \u0026#34;%\u0026#34; + filter + \u0026#34;%\u0026#34;).list(); } else { return Todo.findAll(sort).list(); (3) } } 1 First sort by completion status, then priority, then by title 2 If a filter is given, apply the search term lower-cased and with wildcards, i.e. 
using a WHERE clause such as where lower(todo0_.title) like lower(%searchterm%) 3 Otherwise, return all todos To enter the search term, a form is added next to the table of todos:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 (1) \u0026lt;form action=\u0026#34;/todo\u0026#34; method=\u0026#34;GET\u0026#34; name=\u0026#34;search\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-row align-items-center\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;col-sm-3 my-1\u0026#34;\u0026gt; \u0026lt;label class=\u0026#34;sr-only\u0026#34; for=\u0026#34;filter\u0026#34;\u0026gt;Search\u0026lt;/label\u0026gt; (2) \u0026lt;input type=\u0026#34;text\u0026#34; name=\u0026#34;filter\u0026#34; class=\u0026#34;form-control\u0026#34; id=\u0026#34;filter\u0026#34; placeholder=\u0026#34;Search By Title\u0026#34; required {#if filtered}value=\u0026#34;{filter}\u0026#34;{/if}\u0026gt; \u0026lt;/div\u0026gt; (3) \u0026lt;input class=\u0026#34;btn btn-primary\u0026#34; value=\u0026#34;Search\u0026#34; type=\u0026#34;submit\u0026#34;\u0026gt;\u0026amp;nbsp; \u0026lt;a class=\u0026#34;btn btn-secondary {#if !filtered}disabled{/if}\u0026#34; href=\u0026#34;/todo\u0026#34; role=\u0026#34;button\u0026#34;\u0026gt;Clear Filter\u0026lt;/a\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/form\u0026gt; 1 Invoke this page with the entered search as query parameter 2 Input for the search term; show the previously entered term, if any 3 A button for clearing the result list if a search term has been entered; otherwise the button will be disabled Smoother User Experience via Unpoly The last thing I wanted to explore is how the usability and performance of the application can be improved by means of some client-side enhancements. By default, a web app rendered on the server-side like ours requires full page loads when going from one page to the other. 
This is where single page applications (SPAs) implemented with client-side frameworks shine: just parts of the document object model tree in the browser will be replaced e.g. when loading a result list via AJAX, resulting in a much smoother and faster user experience.\nDoes this mean we have to give up on server-side rendering altogether if we’re after this kind of UX? Luckily not, as small helper libraries such as Unpoly, Intercooler or Turbolinks can be leveraged to replace just page fragments instead of requiring full page loads. This results in a smooth SPA-like user experience without having to opt into the full client-side programming model. For the Todo example I’ve obtained great results using Unpoly. After importing its JavaScript file, all that’s needed is to add the up-target attribute to links or forms.\nE.g. here’s the form for entering the search term with that modification:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 (1) \u0026lt;form action=\u0026#34;/todo\u0026#34; method=\u0026#34;GET\u0026#34; name=\u0026#34;search\u0026#34; up-target=\u0026#34;.container\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;form-row align-items-center\u0026#34;\u0026gt; \u0026lt;div class=\u0026#34;col-sm-3 my-1\u0026#34;\u0026gt; \u0026lt;label class=\u0026#34;sr-only\u0026#34; for=\u0026#34;filter\u0026#34;\u0026gt;Search\u0026lt;/label\u0026gt; \u0026lt;input type=\u0026#34;text\u0026#34; name=\u0026#34;filter\u0026#34; class=\u0026#34;form-control\u0026#34; id=\u0026#34;filter\u0026#34; placeholder=\u0026#34;Search By Title\u0026#34; required {#if filtered}value=\u0026#34;{filter}\u0026#34;{/if}\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;input class=\u0026#34;btn btn-primary\u0026#34; value=\u0026#34;Search\u0026#34; type=\u0026#34;submit\u0026#34;\u0026gt;\u0026amp;nbsp; (2) \u0026lt;a class=\u0026#34;btn btn-secondary {#if !filtered}disabled{/if}\u0026#34; href=\u0026#34;/todo\u0026#34; role=\u0026#34;button\u0026#34; up-target=\u0026#34;.container\u0026#34;\u0026gt;Clear 
Filter\u0026lt;/a\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;/form\u0026gt; 1 When receiving the result of the form submission, replace the \u0026lt;div\u0026gt; with CSS class container of the current page with the one from the response 2 Do the same when following the \u0026#34;Clear Filter\u0026#34; link The magic trick of Unpoly is that links and forms with the up-target attribute are intercepted by Unpoly and executed via AJAX calls. The specified fragments from the result page are then used to replace parts of the already loaded page, instead of having the browser load the full response page. The result is the fast user experience shown in the video above.\nUnpoly also allows to show page fragments in modal dialogs, allowing to remain on the same page also when showing forms such as the one for editing a todo:\nNote that if JavaScript is disabled, the application gracefully falls back to full page loads. I.e. it will still be fully functional, just with a slightly degraded user experience. The same would happen when accessing the edit dialog directly via its URL or when opening the \u0026#34;Edit\u0026#34; link in a new tab or window:\nBonus: Using WebJars In a thread on Twitter James Ward brought up the idea of pulling in required resources such as Bootstrap via WebJars instead of getting them from a CDN. WebJars is a useful utility for obtaining all sorts of client-side libraries with Java build tools such as Maven or Gradle.\nFor Bootstrap, the following dependency must be added to the Maven pom.xml file:\n1 2 3 4 5 \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.webjars\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;bootstrap\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;4.4.1\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; The Bootstrap CSS can then be included within the base.html template like so:\n1 2 3 4 5 6 7 ... \u0026lt;head\u0026gt; ... 
\u0026lt;link rel=\u0026#34;stylesheet\u0026#34; href=\u0026#34;/webjars/bootstrap/4.4.1/css/bootstrap.min.css\u0026#34;\u0026gt; ... \u0026lt;/head\u0026gt; ... This is all that’s needed in order to use Bootstrap via WebJars. Note this will work on the JVM and also with a native binary via GraalVM: WebJars resources are located under META-INF/resources, and Quarkus automatically adds all resources from there when building a native image.\nWrap Up This concludes my quick tour through server-side web applications with Quarkus and its new Qute extension. Where only web applications based on REST APIs called by client-side web applications were supported before, Qute is a great addition to the list of Quarkus extensions, allowing to choose different architecture styles based on your needs and preferences.\nNote that Qute currently is in \u0026#34;Experimental\u0026#34; state, i.e. it’s a great time to give it a try and share your feedback, but be prepared for possible immaturities and potential changes down the road. E.g. I noticed that complex boolean expressions in template conditions aren’t supported yet. Also it would be great to get build-time feedback upon invalid variable references in templates.\nTo learn more, refer to the Qute guide and its reference documentation. 
You can find the complete source code of the Todo example including instructions for building and running in this GitHub repo.\n","id":187,"publicationdate":"Jan 3, 2020","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_the_basics\"\u003eThe Basics\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_combining_html_and_data_apis\"\u003eCombining HTML and Data APIs\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_template_organization\"\u003eTemplate Organization\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_error_handling\"\u003eError Handling\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_search\"\u003eSearch\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_smoother_user_experience_via_unpoly\"\u003eSmoother User Experience via Unpoly\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_bonus_using_webjars\"\u003eBonus: Using WebJars\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_wrap_up\"\u003eWrap Up\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eOne of the long-awaited features in Quarkus was support for server-side templating:\nuntil recently, Quarkus supported only client-side web frameworks which obtain their data by calling a REST API on the backend.\nThis has changed with \u003ca href=\"https://quarkus.io/blog/quarkus-1-1-0-final-released/\"\u003eQuarkus 1.1\u003c/a\u003e: it comes with a brand-new template engine named \u003ca href=\"https://quarkus.io/guides/qute\"\u003eQute\u003c/a\u003e,\nwhich allows building web applications using server-side templates.\u003c/p\u003e\n\u003c/div\u003e","tags":["java","quarkus","web-development"],"title":"Quarkus Qute – A Test 
Ride","uri":"https://www.morling.dev/blog/quarkus-qute-test-ride/"},{"content":"","id":188,"publicationdate":"Jan 3, 2020","section":"tags","summary":"","tags":null,"title":"web-development","uri":"https://www.morling.dev/tags/web-development/"},{"content":" Table of Contents GitHub Actions To the Rescue Registering a Deploy Key Defining the Workflow As a software engineer, I like to automate tedious tasks as much as possible. The deployment of this website is no exception: it is built using the Hugo static site generator and hosted on GitHub Pages; so wouldn’t it be nice if the rendered website would automatically be published whenever an update is pushed to its source code repository?\nWith the advent of GitHub Actions, tasks like this can easily be implemented without having to rely on any external CI service. Instead, many ready-made actions can be obtained from the GitHub marketplace and easily be configured as per our needs. E.g. triggered by a push to a specified branch in a GitHub repository, they can execute tasks like project builds, tests and many others, running in virtual machines based on Linux, Windows and even macOS. So let’s see what’s needed for building a Hugo website and deploying it to GitHub Pages.\nGitHub Actions To the Rescue Using my favourite search engine, I came across two GitHub actions which do everything we need:\nGitHub Actions for Hugo\nGitHub Actions for GitHub Pages\nThere are multiple alternatives for GitHub Pages deployment. 
I chose this one basically because it seems to be the most popular one (as per number of GitHub stars), and because it’s by the same author as the Hugo one, so they should play together nicely.\nRegistering a Deploy Key In order for the GitHub action to deploy the website, a GitHub deploy key must be registered.\nTo do so, create a new SSH key pair on your machine like so:\nssh-keygen -t rsa -b 4096 -C \u0026#34;$(git config user.email)\u0026#34; -f gh-pages -N \u0026#34;\u0026#34; This will create two files, the public key (gh-pages.pub) and the private key (gh-pages). Go to https://github.com/\u0026lt;your-user-or-organisation\u0026gt;/\u0026lt;your-repo\u0026gt;/settings/keys and click \u0026#34;Add deploy key\u0026#34;. Paste in the public part of your key pair and check the \u0026#34;Allow write access\u0026#34; box.\nNow go to https://github.com/\u0026lt;your-user-or-organisation\u0026gt;/\u0026lt;your-repo\u0026gt;/settings/secrets and click \u0026#34;Add new secret\u0026#34;. Choose ACTIONS_DEPLOY_KEY as the name and paste the private part of your key pair into the \u0026#34;Value\u0026#34; field.\nThe key will be stored in an encrypted way as per GitHub’s documentation. Nevertheless, I’d recommend using a specific key pair just for this purpose, instead of re-using any other key pair. That way, impact will be reduced to this particular usage, should the private key get leaked somehow.\nDefining the Workflow With the key in place, it’s time to set up the actual GitHub Actions workflow. This is simply done by creating the file .github/workflows/gh-pages-deployment.yml in your repository with the following contents. 
GitHub Actions workflows are YAML files, because YOLO ;)\nname: GitHub Pages on: (1) push: branches: - master jobs: build-deploy: runs-on: ubuntu-18.04 steps: - uses: actions/checkout@v1 (2) with: submodules: true - name: Install Ruby Dev (3) run: sudo apt-get install ruby-dev - name: Install AsciiDoctor and Rouge run: sudo gem install asciidoctor rouge - name: Setup Hugo (4) uses: peaceiris/actions-hugo@v2 with: hugo-version: \u0026#39;0.62.0\u0026#39; - name: Build (5) run: hugo - name: Deploy (6) uses: peaceiris/actions-gh-pages@v2 env: ACTIONS_DEPLOY_KEY: ${{ secrets.ACTIONS_DEPLOY_KEY }} PUBLISH_BRANCH: gh-pages PUBLISH_DIR: ./public 1 Run this action whenever changes are pushed to the master branch 2 The first step in the job: check out the source code 3 Install AsciiDoctor (in case you use Hugo with AsciiDoc files, like I do) and Rouge, a Ruby gem for syntax highlighting; I’m installing the gems instead of Ubuntu packages in order to get current versions 4 Set up Hugo via the aforementioned GitHub Actions for Hugo 5 Run the hugo command; here you could add parameters such as -F for also building future posts 6 Deploy the website to GitHub Pages; the contents of Hugo’s build directory public will be pushed to the gh-pages branch of the upstream repository, using the deploy key configured before And that’s all we need; once the file is committed and pushed to the upstream repository, the deployment workflow will be executed upon each push to the master branch.\nYou can find the complete workflow definition used for publishing this website here. 
Also check out the documentation of GitHub Actions for Hugo and GitHub Actions for GitHub Pages to learn more about their capabilities and the options they offer.\n","id":189,"publicationdate":"Dec 26, 2019","section":"blog","summary":"\u003cdiv id=\"toc\" class=\"toc\"\u003e\n\u003cdiv id=\"toctitle\"\u003eTable of Contents\u003c/div\u003e\n\u003cul class=\"sectlevel1\"\u003e\n\u003cli\u003e\u003ca href=\"#_github_actions_to_the_rescue\"\u003eGitHub Actions To the Rescue\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_registering_a_deploy_key\"\u003eRegistering a Deploy Key\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"#_defining_the_workflow\"\u003eDefining the Workflow\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eAs a software engineer, I like to automate tedious tasks as much as possible.\nThe deployment of this website is no exception:\nit is built using the \u003ca href=\"https://gohugo.io/\"\u003eHugo\u003c/a\u003e static site generator and hosted on \u003ca href=\"https://pages.github.com/\"\u003eGitHub Pages\u003c/a\u003e;\nso wouldn’t it be nice if the rendered website would automatically be published whenever an update is pushed to its source code repository?\u003c/p\u003e\n\u003c/div\u003e","tags":["hugo","ci-cd","github-actions"],"title":"Automatically Deploying a Hugo Website via GitHub Actions","uri":"https://www.morling.dev/blog/automatically-deploying-hugo-website-via-github-actions/"},{"content":"","id":190,"publicationdate":"Dec 26, 2019","section":"tags","summary":"","tags":null,"title":"blogging","uri":"https://www.morling.dev/tags/blogging/"},{"content":"","id":191,"publicationdate":"Dec 26, 2019","section":"tags","summary":"","tags":null,"title":"ci-cd","uri":"https://www.morling.dev/tags/ci-cd/"},{"content":"","id":192,"publicationdate":"Dec 26, 
2019","section":"tags","summary":"","tags":null,"title":"github-actions","uri":"https://www.morling.dev/tags/github-actions/"},{"content":"","id":193,"publicationdate":"Dec 26, 2019","section":"tags","summary":"","tags":null,"title":"hugo","uri":"https://www.morling.dev/tags/hugo/"},{"content":"","id":194,"publicationdate":"Dec 26, 2019","section":"tags","summary":"","tags":null,"title":"meta","uri":"https://www.morling.dev/tags/meta/"},{"content":" It has been quite a while since the last post on my old personal blog; since then, I’ve mostly focused on writing about my day-work on the Debezium blog as well as some posts about more general technical topics on the Hibernate team blog.\nNow recently I had some ideas for things I wanted to write about, which didn’t feel like a good fit for either of those two. So it was time to re-boot a personal blog. The previous Blogger-based one really, really feels outdated by now. Plus, I also wanted to have more control over how things work, and also be able to publish a list of projects I work on, conference talks I gave etc. So I decided to build the site using Hugo, a static site generator, and also use a nice new shiny dev domain. And here we are, welcome to morling.dev!\nStay tuned for more posts every now and then about anything related to open source, the projects I work on and software engineering in general. 
Onwards!\n","id":195,"publicationdate":"Dec 26, 2019","section":"blog","summary":"\u003cdiv class=\"paragraph\"\u003e\n\u003cp\u003eIt has been quite a while since the last post on my old \u003ca href=\"http://musingsofaprogrammingaddict.blogspot.com/\"\u003epersonal blog\u003c/a\u003e;\nsince then, I’ve mostly focused on writing about my day-work on the \u003ca href=\"https://debezium.io/blog/\"\u003eDebezium blog\u003c/a\u003e as well as \u003ca href=\"https://in.relation.to/gunnar-morling/\"\u003esome posts\u003c/a\u003e about more general technical topics on the Hibernate team blog.\u003c/p\u003e\n\u003c/div\u003e","tags":["meta","blogging"],"title":"Time for a New Blog","uri":"https://www.morling.dev/blog/time-for-new-blog/"}]