This project has been built with a doc-as-code approach from the repository: https://github.com/fugerit-org/graalkus.
The cover image of the PDF version was generated with the help of DALL·E.

Introduction

In recent years, interest in going AOT has grown more and more in the Java developer community.

Several projects were born with, or later added, support for GraalVM and native compilation. Just to name a few:

Using GraalVM has some great benefits (for instance faster startup and a lower memory footprint) and a few limitations (configuration complexity, executables that run only on the target environment).

AOT may not be viable for every scenario, but where it can be used, performance can improve and costs can drop considerably.

Since 2023 I have been using it more and more on the projects I work on.

Talking with other developers interested in the technology, I found that the biggest obstacle to GraalVM adoption is configuration complexity (for features like reflection).

Graalkus is a simple microservice, based on Quarkus, that I created to share my personal experience migrating a JIT application to AOT.

Conversion approach

Figure 1. JIT to AOT conversion approach

I usually consider two possible approaches when migrating a JIT application to AOT:

  1. Full approach - when all the features can easily be configured for inclusion in a native build

  2. Mixed approach - when not all features can be converted to AOT, for instance because of:

    • costs - the feature would need to be rewritten and we decide the conversion is not worth the effort

    • technical limitations - a feature relies on technology that cannot be converted (e.g. a very old library)

The mixed approach is often a good idea: conversion can be complex, and it is easier to isolate the features to convert, perhaps starting with the easier ones and iterating on the others later.

Demo scenario

This demo is inspired by a real microservice I configured for AOT some time ago.

The scenario we consider is a JIT application used to generate documents in various formats (HTML, Markdown, AsciiDoc and PDF) through REST services.

Let’s define each format as a feature; the load is roughly distributed as follows:

  1. HTML: 40%

  2. Markdown: 30%

  3. AsciiDoc: 20%

  4. PDF: 10%

We will find out that the PDF feature is not easy to convert.

So we will use the mixed approach, converting only formats 1, 2 and 3. In the end the AOT application will handle 90% of the load (40% + 30% + 20%), while the JIT application will be left with only the remaining 10%.

We can use an API gateway or some other technology to keep the split transparent to clients.
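For instance, a minimal sketch of NGINX routing rules for this split, assuming the two instances are reachable as graalkus-jit and graalkus-aot on port 8080 (hypothetical names and ports, to be placed inside a server block):

# PDF requests (the 10% left in JIT mode) go to the JIT instance,
# everything else to the AOT one; upstream names/ports are assumptions.
location ~ \.pdf$ {
    proxy_pass http://graalkus-jit:8080;
}
location / {
    proxy_pass http://graalkus-aot:8080;
}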

Part I - Development

In this section we describe how to develop our demo application.

Requirements

  • Oracle GraalVM (tested on 21.0.5)

  • Apache Maven (tested on 3.9.9)

  • Container environment (e.g. Docker, Podman)
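A quick way to verify the toolchain (version output will vary with your setup):

# all of these should be on the PATH; native-image ships with GraalVM
java -version
native-image --version
mvn -version
docker --version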

Project Initialization (JIT)

We will create a project based on Venus, a framework that produces documents in different output formats starting from an XML document model.

Venus has a Maven plugin to initialize a Maven project in several flavours. I’m going to pick a Quarkus application with this command:

mvn org.fugerit.java:fj-doc-maven-plugin:init \
-DgroupId=org.fugerit.java.demo \
-DartifactId=graalkus \
-Dflavour=quarkus-3 \
-Dextensions=base,freemarker,mod-fop

This creates a Maven project structure with a REST service for document generation in HTML, AsciiDoc, Markdown and PDF format.

Just run:

mvn quarkus:dev

Then access the Swagger UI to check the available paths:

For instance the PDF version at http://localhost:8080/doc/example.pdf or the AsciiDoc one at http://localhost:8080/doc/example.adoc.
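To try the endpoints from the command line, something like this should work (default Quarkus dev port 8080 assumed):

# fetch the PDF to a file, print the AsciiDoc to stdout
curl -o example.pdf http://localhost:8080/doc/example.pdf
curl http://localhost:8080/doc/example.adoc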

Ready for the next step?

Going AOT

As stated in the Quarkus documentation, we try to build a native executable by running:

mvn install -Dnative

This leads to a few errors, starting with:

Error: Detected a started Thread in the image heap. Thread name: Java2D Disposer. Threads running in the image generator are no longer running at image runtime. If these objects should not be stored in the image heap, you can use

    '--trace-object-instantiation=java.lang.Thread'

It is often possible to achieve AOT compatibility by tweaking a few parameters. GraalVM is very good at providing hints on what to do (as in the error above). There are also a few techniques that help configure the application for a native image, for instance the tracing agent sketched below.
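A hedged sketch of the tracing agent technique (a standard GraalVM facility; this demo does not strictly need it): run the JVM build with the agent while exercising the endpoints, and it records reflection and resource usage as JSON configuration files that native-image picks up from META-INF/native-image on the classpath.

# build the JVM version, then run it with the tracing agent attached
mvn package
java -agentlib:native-image-agent=config-output-dir=src/main/resources/META-INF/native-image \
     -jar target/quarkus-app/quarkus-run.jar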

Generally speaking the framework we are using, Venus, is already configured for AOT. Unfortunately not all modules are native ready. In particular the mod-fop extension is not easy to build with GraalVM.

This is partly explained in a Quarkus Camel issue about pdfbox 2.

Our goal is to demonstrate the mixed JIT to AOT conversion approach.

In this scenario we modify the application to run in both JIT and AOT mode, but in the latter the PDF feature will be disabled.

We will achieve this with three simple modifications.

1. Update the Maven pom file

The main reason we get the error is that GraalVM fails on this dependency at build time:

    <dependency>
      <groupId>org.fugerit.java</groupId>
      <artifactId>fj-doc-mod-fop</artifactId>
      <exclusions>
        <exclusion>
          <groupId>xml-apis</groupId>
          <artifactId>xml-apis</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

So we will move it to the profiles section and make it available only in the JIT profile:

    <profile>
      <id>jit</id>
      <activation>
        <property>
          <name>!native</name>
        </property>
      </activation>
      <dependencies>
        <dependency>
          <groupId>org.fugerit.java</groupId>
          <artifactId>fj-doc-mod-fop</artifactId>
          <exclusions>
            <exclusion>
              <groupId>xml-apis</groupId>
              <artifactId>xml-apis</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
      </dependencies>
    </profile>
    <profile>
      <id>native</id>
      <activation>
        <property>
          <name>native</name>
        </property>
      </activation>
      <properties>
        <skipITs>true</skipITs>
        <quarkus.native.enabled>true</quarkus.native.enabled>
      </properties>
      <dependencies>
        <dependency>
          <groupId>org.fugerit.java</groupId>
          <artifactId>fj-doc-mod-fop</artifactId>
          <scope>provided</scope>
          <exclusions>
            <exclusion>
              <groupId>xml-apis</groupId>
              <artifactId>xml-apis</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
      </dependencies>
    </profile>
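With this setup, profile selection follows the native property:

# jit profile active (native property absent): fj-doc-mod-fop is a regular runtime dependency
mvn package

# native profile active: fj-doc-mod-fop is scoped to provided and stays out of the native image
mvn package -Dnative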

Of course PDF generation, which relies on Apache FOP, will fail when running the native executable.

2. Update the project config file

This application uses the file src/main/resources/graalkus/fm-doc-process-config.xml to load doc handler classes by reflection; we will mark as unsafe the handlers that are not available in AOT mode:

<freemarker-doc-process-config>
  <!-- Type handler generating an xsl:fo style sheet -->
  <docHandler id="fo-fop" info="fo" type="org.fugerit.java.doc.mod.fop.FreeMarkerFopTypeHandlerUTF8" unsafe="true"/>
  <!-- Type handler generating pdf -->
  <docHandler id="pdf-fop" info="pdf" type="org.fugerit.java.doc.mod.fop.PdfFopTypeHandler" unsafe="true"/>
</freemarker-doc-process-config>
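To make the intent of the unsafe flag concrete, here is a hypothetical Java sketch of this kind of reflective loading (not Venus’s actual code): an unsafe handler whose class is missing at runtime, such as the FOP handlers in the native image, is skipped instead of failing the whole configuration.

import java.util.HashMap;
import java.util.Map;

public class DocHandlerLoader {

    private final Map<String, Object> handlers = new HashMap<>();

    public void loadHandler(String id, String type, boolean unsafe) {
        try {
            Class<?> handlerClass = Class.forName(type);
            handlers.put(id, handlerClass.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | NoClassDefFoundError e) {
            if (!unsafe) {
                throw new IllegalStateException("Required doc handler missing: " + id, e);
            }
            // handler marked unsafe: log and move on
            System.err.println("Skipping unavailable doc handler: " + id);
        }
    }
}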

3. Disable relevant tests

We are going to add the JUnit 5 DisabledInNativeImage annotation to the tests that would otherwise fail:

    // requires: import org.junit.jupiter.api.condition.DisabledInNativeImage;
    @Test
    @DisabledInNativeImage
    void testPdf() {
        given().when().get("/doc/example.pdf").then().statusCode(200);
    }

Now we can try to build the native image again:

mvn package -Dnative

This time the build succeeds, and the HTML, AsciiDoc and Markdown features are available, for instance http://localhost:8080/doc/example.adoc, while the PDF version http://localhost:8080/doc/example.pdf will fail.

We now have a project that can be built in both JIT and AOT mode.

Now it’s time for the container images.

Container images

In this step we are going to build and test the container image.

JIT container

First of all we build the application:

mvn package

Then build the container image:

docker build -f src/main/docker/Dockerfile.jvm -t graalkus-jit .

And launch it:

docker run --rm -p 8080:8080 --name graalkus-jit graalkus-jit

On my system Quarkus starts in 0.458s.

__  ____  __  _____   ___  __ ____  ______
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2024-12-01 00:50:30,285 INFO  [org.fug.jav.dem.gra.AppInit] (main) The application is starting...
2024-12-01 00:50:30,333 INFO  [io.quarkus] (main) graalkus 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.17.2) started in 0.458s. Listening on: http://0.0.0.0:8080

AOT container

After building the application:

mvn package -Dnative -Dquarkus.native.container-build=true

This time we use the quarkus.native.container-build option, so the native build is handled inside a container.

We can now build the container image:

docker build -f src/main/docker/Dockerfile.native-micro -t graalkus-aot .

And launch it:

docker run --rm -p 8080:8080 --name graalkus-aot graalkus-aot

This time Quarkus starts in 0.020s, roughly 23 times faster than the JIT version!

__  ____  __  _____   ___  __ ____  ______
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2024-12-01 00:52:13,027 INFO  [org.fug.jav.dem.gra.AppInit] (main) The application is starting...
2024-12-01 00:52:13,029 INFO  [io.quarkus] (main) graalkus 1.0.0-SNAPSHOT native (powered by Quarkus 3.17.2) started in 0.020s. Listening on: http://0.0.0.0:8080

Benchmark application

In this step we are going to benchmark the application, both the JIT and the AOT version.

Requirements

For this benchmark we will use a script available in the repository at src/main/script/bench-graph-h2-load.sh.

The script needs:

Benchmark JIT application

Build the application

mvn package

Run the script (it will also launch the application)

./src/main/script/bench-graph-h2-load.sh -m JIT

Benchmark AOT application

Build the application

mvn install -Dnative

Run the script (it will also launch the application)

./src/main/script/bench-graph-h2-load.sh -m AOT

Sample output

Here I show, as an example, the results on my system.

  • OS : Ubuntu 24.04.1 LTS

  • CPU : AMD Ryzen 7 3700X (8 core, 16 thread)

  • Memory : 32 GB

With the standard script parameters (h2load; see the sketch after this list):

  • 50000 requests for the warm-up run (w)

  • 250000 requests for the benchmark run (r)

  • 12 clients (c)

  • 1 thread (t)
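For reference, a hypothetical direct h2load invocation with these parameters (the script wraps something similar; the exact flags and URL used inside the script are assumptions):

# -n total requests, -c concurrent clients, -t worker threads
h2load -n250000 -c12 -t1 http://localhost:8080/doc/example.html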

JIT result:

finished in 13.11s, 19068.33 req/s, 23.58MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      120us     11.00ms       600us       468us    92.87%
time for connect:      112us       588us       312us       141us    66.67%
time to 1st byte:     3.93ms      5.26ms      4.27ms       356us    91.67%
req/s           :    1589.12     1596.61     1592.01        2.11    75.00%

AOT result:

finished in 16.87s, 14819.13 req/s, 18.33MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      207us     14.50ms       781us       531us    94.41%
time for connect:       89us       542us       286us       137us    66.67%
time to 1st byte:     1.57ms      2.11ms      1.74ms       172us    75.00%
req/s           :    1234.99     1241.60     1238.78        2.47    58.33%

And the corresponding resource plots:

Figure 2. JIT Benchmark plotting
Figure 3. AOT Benchmark plotting

As you can see:

  • The request rate is of the same order of magnitude for the JIT and AOT versions (here the JIT version is roughly 25% higher)

  • All requests are successful in both scenarios

  • The CPU footprint is also comparable (except at startup, where AOT performs better)

  • The AOT memory footprint is about 3 times lower than the JIT one

Keep in mind we did not add any optimization to the JIT version (for instance CRaC) or to the AOT one (e.g. PGO).

Profile-Guided Optimizations

Native executables built with GraalVM can perform better if they are optimized using real workload data.

In this section we will explore GraalVM’s Profile-Guided Optimizations feature.

Instrumentation

  1. We add an instrumented profile to our project:

<profile>
  <id>instrumented</id>
  <build>
    <finalName>${project.artifactId}-${project.version}-instrumented</finalName>
  </build>
  <properties>
    <quarkus.native.additional-build-args>${base-native-build-args},--pgo-instrument</quarkus.native.additional-build-args>
  </properties>
</profile>
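The ${base-native-build-args} property is assumed to be defined elsewhere in the pom as the shared part of the native build arguments; a hedged sketch of what such a property could look like:

<properties>
  <!-- hypothetical: shared native build args reused by the instrumented and optimized profiles -->
  <base-native-build-args>-H:IncludeResources=graalkus/fm-doc-process-config.xml</base-native-build-args>
</properties>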
  2. Then we create the native image:

mvn install -Dnative -Pinstrumented
  3. Start the application:

./target/graalkus-1.0.0-SNAPSHOT-instrumented-runner
  4. Provide some relevant workload:

./src/main/script/bench-graph-h2-load.sh

After the application shuts down, a .iprof file will be available in the working directory.

Optimization

  1. Add another profile to build the optimized native image:

<profile>
  <id>optimized</id>
  <build>
    <finalName>${project.artifactId}-${project.version}-optimized</finalName>
  </build>
  <properties>
    <quarkus.native.additional-build-args>${base-native-build-args},--pgo=${project.basedir}/default.iprof</quarkus.native.additional-build-args>
  </properties>
</profile>
  2. Create the optimized native executable:

mvn install -Dnative -Poptimized
  3. Run the benchmark:

./src/main/script/bench-graph-h2-load.sh -m AOT -a graalkus-1.0.0-SNAPSHOT-optimized-runner
  4. Sample optimized result

This section contains the result of an optimized benchmark run:

finished in 12.84s, 19464.30 req/s, 24.07MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      150us     32.20ms       588us       542us    96.14%
time for connect:      101us       576us       317us       148us    66.67%
time to 1st byte:     1.19ms      2.09ms      1.65ms       252us    75.00%
req/s           :    1622.11     1632.73     1628.10        3.58    58.33%

And the corresponding resource plot:

Figure 4. Optimized AOT Benchmark plotting

Let’s compare the result with the unoptimized benchmark (both were run on the same system).

After optimization, the CPU and memory footprint is more or less the same, but the request rate is about 20% higher (from 160.94 req/s to 197.93 req/s).

More optimization options are available. A good resource is the Build and test various capabilities of Spring Boot & GraalVM repository (even though it focuses on Spring Boot, most concepts and options apply to other frameworks too).
Profile-Guided Optimizations are only available in Oracle GraalVM; other distributions such as GraalVM Community Edition or Mandrel do not provide them.

Conclusion

In this first part we:

  1. Developed the standalone JIT application

  2. Converted it to an AOT application

  3. Created the container image version of each

  4. Ran benchmarks on the standalone applications

  5. Applied native image optimization (PGO)

Here is a summary of the results:

Info             | JIT      | AOT      | Optimized AOT
Startup time (s) | 0.634    | 0.018    | 0.014
Requests/s       | 19068.33 | 14819.13 | 19464.30
Memory (MB)      | 400/500  | 150/250  | 150/250

Part II - CI and container images

This section describes the container images built through CI:

JIT Container image

finished in 12.65s, 19763.04 req/s, 24.44MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      142us      6.53ms       584us       262us    72.69%
time for connect:      117us       608us       351us       150us    66.67%
time to 1st byte:     4.15ms      5.57ms      4.50ms       397us    91.67%
req/s           :    1646.94     1654.02     1650.47        1.74    75.00%

AOT Container image

finished in 21.21s, 11785.14 req/s, 14.58MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      298us      8.97ms      1.00ms       555us    92.20%
time for connect:      105us       599us       341us       152us    66.67%
time to 1st byte:     1.42ms      2.38ms      1.91ms       280us    66.67%
req/s           :     982.14      985.04      982.96        0.89    83.33%

AOT Optimized Container image

finished in 12.63s, 19792.74 req/s, 24.48MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      183us      8.08ms       583us       354us    95.75%
time for connect:      109us       558us       314us       137us    66.67%
time to 1st byte:     1.17ms      1.73ms      1.38ms       152us    83.33%
req/s           :    1649.48     1659.64     1653.12        2.69    83.33%

Benchmark with limits

Let’s configure a Docker Compose file to limit resources for our containers:

services:
  graalkus-jit-limit:
    image: fugeritorg/graalkus:latest
    container_name: graalkus-jit-limit
    restart: always
    ports:
      - "8084:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 128M
        reservations:
          cpus: 1.0
          memory: 64M
  graalkus-aot-limit:
    image: fugeritorg/graalkus:latest-amd64native
    container_name: graalkus-aot-limit
    restart: always
    ports:
      - "8085:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 128M
        reservations:
          cpus: 1.0
          memory: 64M
  graalkus-aot-optimized-limit:
    image: fugeritorg/graalkus:latest-amd64native-pgo
    container_name: graalkus-aot-optimized-limit
    restart: always
    ports:
      - "8086:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 128M
        reservations:
          cpus: 1.0
          memory: 64M
  graalkus-jit-limit-high:
    image: fugeritorg/graalkus:latest
    container_name: graalkus-jit-high-limit
    restart: always
    ports:
      - "8087:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 256M
        reservations:
          cpus: 1.0
          memory: 64M

and start the containers:

docker compose -f src/main/docker/docker-compose-limit.yml up -d

This compose configuration uses the pre-built fugeritorg/graalkus container images.
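To verify that the limits are actually enforced while the benchmarks run, docker stats can help:

# one-shot snapshot of CPU and memory usage per container
docker stats --no-stream graalkus-jit-limit graalkus-aot-limit graalkus-aot-optimized-limit graalkus-jit-high-limit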

Then benchmark the services one by one:

  1. JIT Version (1.0 CPU, max 64/128 MB)

./src/main/script/bench-graph-h2-load.sh -u http://localhost:8084
finished in 172.83s, 1446.51 req/s, 1.79MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      147us    590.09ms      8.25ms     24.06ms    91.73%
time for connect:      114us       592us       330us       146us    66.67%
time to 1st byte:    15.10ms     16.43ms     15.69ms       500us    66.67%
req/s           :     120.55      121.13      120.78        0.21    58.33%
  2. AOT Version (1.0 CPU, max 64/128 MB)

./src/main/script/bench-graph-h2-load.sh -u http://localhost:8085
finished in 311.73s, 801.98 req/s, 1015.69KB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      304us    104.70ms     14.92ms     31.52ms    85.05%
time for connect:      115us       584us       333us       143us    66.67%
time to 1st byte:     2.17ms      4.38ms      3.12ms       885us    66.67%
req/s           :      66.83       66.95       66.90        0.04    75.00%
  3. AOT Optimized Version (1.0 CPU, max 64/128 MB)

./src/main/script/bench-graph-h2-load.sh -u http://localhost:8086
finished in 132.81s, 1882.39 req/s, 2.33MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      179us     97.61ms      6.33ms     20.95ms    93.63%
time for connect:      185us      1.07ms       609us       278us    66.67%
time to 1st byte:     1.43ms      2.80ms      2.09ms       465us    58.33%
req/s           :     156.86      157.24      157.01        0.15    83.33%
  4. JIT Version, High limits (1.0 CPU, max 64/256 MB)

./src/main/script/bench-graph-h2-load.sh -u http://localhost:8087
finished in 156.90s, 1593.32 req/s, 1.97MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
                     min         max         mean         sd        +/- sd
time for request:      152us     94.86ms      7.48ms     22.68ms    92.48%
time for connect:      130us       614us       350us       146us    66.67%
time to 1st byte:     7.48ms     83.63ms     14.15ms     21.88ms    91.67%
req/s           :     132.78      133.47      133.06        0.23    75.00%

Appendix A: Going AOT in depth

In the Going AOT section we adapted our software with just a few modifications.

This was possible because:

  1. The application is built on Quarkus, which is already native ready: all core modules already support native image builds.

  2. The Venus framework too is already pre-configured for native image builds, for instance by setting the native build args:

quarkus:
  native:
    # if needed add -H:+UnlockExperimentalVMOptions
    additional-build-args: -H:IncludeResources=graalkus/fm-doc-process-config.xml,\
      -H:IncludeResources=graalkus/template/document.ftl
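For reference, the equivalent configuration in application.properties form (same options, different syntax):

# -H:IncludeResources embeds the config and template files in the native image
quarkus.native.additional-build-args=-H:IncludeResources=graalkus/fm-doc-process-config.xml,-H:IncludeResources=graalkus/template/document.ftl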

In a legacy application not based on a native-ready framework like Quarkus or Spring Boot, the conversion could take considerably longer.

One possible approach is to split the monolith’s features into microservices and go AOT where possible.

Figure 5. From legacy to AOT application

Appendix B: Resources