This project has been built with a doc-as-code approach from the repository : https://github.com/fugerit-org/graalkus.
The cover image of the PDF version has been generated with the help of DALL·E.
Introduction
In recent years, interest in going AOT has grown steadily in the Java developer community.
Some projects were born with support for GraalVM and native compilation, and others added it later. Just to name a few :
Using GraalVM has some great benefits (for instance faster startup and a lower memory footprint) and a few limitations (configuration complexity, and a native executable that runs only on its target environment).
AOT may not be viable for all scenarios, but where it can be used, performance can improve and costs can drop considerably.
Since 2023 I’ve been using it more and more on the projects I work on.
Talking with other developers interested in the technology, I found that one big obstacle to GraalVM adoption is configuration complexity (for features like reflection).
Graalkus is a simple microservice, based on Quarkus, that I created to share my personal experience of migrating JIT applications to AOT.
Conversion approach
I usually consider two possible approaches when migrating a JIT application to AOT :
-
Full approach - when all the features can be easily configured to be included in a native build
-
Mixed approach - when not all features can be converted to AOT for any reason, for instance :
-
costs - we would need to rewrite the feature and we decide the conversion is not worth it
-
technical limitation - some feature relies on a technology which cannot be converted (e.g. a very old library)
Often the mixed approach is a good idea, because conversion can sometimes be complex and it is easier to isolate the features to be converted, starting with the easier ones and iterating on the others later.
Demo scenario
This demo is inspired by a real microservice I configured for AOT some time ago.
The scenario we consider is a JIT application used to generate documents in various formats (HTML, MarkDown, AsciiDoc and PDF) through REST services.
Let’s define every format as a feature, with the load split roughly as follows :
-
HTML : 40%
-
MarkDown : 30%
-
AsciiDoc : 20%
-
PDF : 10%
We will find out that PDF conversion is not easy to implement.
So we will use the mixed approach, converting only the first three formats (HTML, MarkDown and AsciiDoc). In the end the AOT application will handle 90% of the load, whereas the JIT application will be left with only 10%.
We can use an API gateway or some other technology to keep usage transparent for clients.
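As a rough illustration of that routing rule (not code from the Graalkus repository), the sketch below sends PDF requests to the JIT application and everything else to the AOT one, and adds up the load share served by AOT. The class name, the two base URLs and the format keys are assumptions made up for this example; a real setup would express the same rule in the gateway's own configuration.

```java
import java.util.Map;

// Hypothetical routing rule for the mixed approach; names and URLs are placeholders.
final class FormatRouter {

    // Assumed base URLs of the two deployments.
    static final String AOT_BASE = "http://graalkus-aot:8080";
    static final String JIT_BASE = "http://graalkus-jit:8080";

    // Expected load share per format (percent), as listed in the demo scenario.
    static final Map<String, Integer> LOAD = Map.of("html", 40, "md", 30, "adoc", 20, "pdf", 10);

    // PDF stays on the JIT application, every other format goes to the AOT one.
    static String targetFor(String path) {
        return path.endsWith(".pdf") ? JIT_BASE + path : AOT_BASE + path;
    }

    // Sum of the load share of the formats served by the AOT application.
    static int aotSharePercent() {
        return LOAD.entrySet().stream()
                .filter(e -> !e.getKey().equals("pdf"))
                .mapToInt(Map.Entry::getValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(targetFor("/doc/example.adoc"));
        System.out.println(targetFor("/doc/example.pdf"));
        System.out.println("AOT share: " + aotSharePercent() + "%");
    }
}
```

Routing on the file extension keeps the rule trivial to move into any gateway that supports path matching.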
Part I - Development
In this section we will describe how to develop our demo application :
Requirements
-
Oracle GraalVM (tested on 21.0.5)
-
Apache Maven (tested on 3.9.9)
-
Container environment (e.g. Docker, Podman)
Project Initialization (JIT)
We will create a project based on Venus, a framework for producing documents in different output formats starting from an XML document model.
Venus has a maven plugin to initialize a maven project with some flavours. I’m going to pick a Quarkus application with the command :
mvn org.fugerit.java:fj-doc-maven-plugin:init \
-DgroupId=org.fugerit.java.demo \
-DartifactId=graalkus \
-Dflavour=quarkus-3 \
-Dextensions=base,freemarker,mod-fop
This will create a Maven project structure, with a REST service for document generation in HTML, AsciiDoc, MarkDown and PDF formats.
Just run :
mvn quarkus:dev
Then open the Swagger UI to check the available paths :
For instance the PDF version http://localhost:8080/doc/example.pdf or the AsciiDoc one http://localhost:8080/doc/example.adoc.
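The same endpoints can also be called from code. The following is a minimal sketch using the JDK's built-in HTTP client; the class name is made up, and it assumes the application is running locally (for example via mvn quarkus:dev) on port 8080.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the sample document endpoints.
final class DocClient {

    // Builds the URI of the sample document for a given extension (e.g. "adoc", "pdf").
    static URI exampleUri(String baseUrl, String extension) {
        return URI.create(baseUrl + "/doc/example." + extension);
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        for (String ext : new String[] {"adoc", "pdf"}) {
            HttpRequest request = HttpRequest.newBuilder(exampleUri("http://localhost:8080", ext)).GET().build();
            try {
                HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
                System.out.println(ext + " -> HTTP " + response.statusCode() + ", " + response.body().length + " bytes");
            } catch (IOException e) {
                // Typically means the application is not running on the assumed address.
                System.out.println(ext + " -> not reachable: " + e.getMessage());
            }
        }
    }
}
```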
Ready for the next step?
Going AOT
As described in the Quarkus documentation, we try to build a native executable by running :
mvn install -Dnative
This leads to a few errors, starting with :
Error: Detected a started Thread in the image heap. Thread name: Java2D Disposer. Threads running in the image generator are no longer running at image runtime. If these objects should not be stored in the image heap, you can use
'--trace-object-instantiation=java.lang.Thread'
It is often possible to achieve AOT compatibility by tweaking a few parameters. GraalVM is very good at providing hints on what to do (like in the example above). There are also a few techniques that help to configure the application in order to create a native image (for instance the tracing agent).
Generally speaking, the framework we are using, Venus, is already configured for AOT. Unfortunately, not all modules are native ready. In particular, the mod-fop extension is not easy to build with GraalVM.
This is partly explained in a Quarkus Camel issue about pdfbox 2.
Our goal is to show a demo for the mixed JIT to AOT conversion approach.
In this scenario we modify the application to run in both JIT and AOT mode, but in the latter the PDF document feature will be disabled.
We will achieve this with three simple modifications.
1. Update the maven pom file
The main reason why we get the error is that GraalVM fails on this dependency at build time :
<dependency>
  <groupId>org.fugerit.java</groupId>
  <artifactId>fj-doc-mod-fop</artifactId>
  <exclusions>
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
So we will move it to the profiles section : the jit profile keeps it as a regular dependency, while the native profile only declares it with provided scope :
<profile>
  <id>jit</id>
  <activation>
    <property>
      <name>!native</name>
    </property>
  </activation>
  <dependencies>
    <dependency>
      <groupId>org.fugerit.java</groupId>
      <artifactId>fj-doc-mod-fop</artifactId>
      <exclusions>
        <exclusion>
          <groupId>xml-apis</groupId>
          <artifactId>xml-apis</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
<profile>
  <id>native</id>
  <activation>
    <property>
      <name>native</name>
    </property>
  </activation>
  <properties>
    <skipITs>true</skipITs>
    <quarkus.native.enabled>true</quarkus.native.enabled>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.fugerit.java</groupId>
      <artifactId>fj-doc-mod-fop</artifactId>
      <scope>provided</scope>
      <exclusions>
        <exclusion>
          <groupId>xml-apis</groupId>
          <artifactId>xml-apis</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</profile>
Of course, PDF generation with Apache FOP will fail when running the native executable.
2. Update the project config file
This application uses the file src/main/resources/graalkus/fm-doc-process-config.xml to load doc handler classes by reflection. We will mark as unsafe the handlers not available in AOT mode :
<freemarker-doc-process-config>
  <!-- Type handler generating xsl:fo style sheet -->
  <docHandler id="fo-fop" info="fo" type="org.fugerit.java.doc.mod.fop.FreeMarkerFopTypeHandlerUTF8" unsafe="true"/>
  <!-- Type handler generating pdf -->
  <docHandler id="pdf-fop" info="pdf" type="org.fugerit.java.doc.mod.fop.PdfFopTypeHandler" unsafe="true"/>
</freemarker-doc-process-config>
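To make the idea more concrete, here is a small sketch of what loading a handler by reflection with an unsafe flag could look like. This is not the Venus implementation: the class name and the reading of unsafe as "tolerate a handler that cannot be loaded" are my assumptions for the example.

```java
import java.util.Optional;

// Hypothetical loader; it only illustrates the concept, it is not Venus code.
final class HandlerLoader {

    // Tries to instantiate a handler class by name via reflection.
    // With unsafe=true a class that cannot be loaded is tolerated and reported as empty;
    // with unsafe=false the failure is propagated.
    static Optional<Object> load(String type, boolean unsafe) {
        try {
            return Optional.of(Class.forName(type).getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            if (unsafe) {
                return Optional.empty();
            }
            throw new IllegalStateException("Cannot load handler " + type, e);
        }
    }

    public static void main(String[] args) {
        // The FOP handler is not on the classpath here, so the unsafe load yields an empty result.
        System.out.println(load("org.fugerit.java.doc.mod.fop.PdfFopTypeHandler", true).isPresent());
    }
}
```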
3. Disable relevant tests
We are going to add the JUnit 5 DisabledInNativeImage annotation to the tests that would fail :
@Test
@DisabledInNativeImage
void testPdf() {
    given().when().get("/doc/example.pdf").then().statusCode(200);
}
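As far as I know, DisabledInNativeImage relies on the org.graalvm.nativeimage.imagecode system property, which GraalVM sets when code is executed in a native image context. The helper below is only a sketch of that check, handy if application code ever needs the same switch; the class and method names are mine.

```java
// Hypothetical helper mirroring the check behind the JUnit annotation.
final class NativeImageCheck {

    // Property set by GraalVM at native image build time and run time; absent on a regular JVM.
    static final String IMAGE_CODE_PROPERTY = "org.graalvm.nativeimage.imagecode";

    static boolean inNativeImage() {
        return System.getProperty(IMAGE_CODE_PROPERTY) != null;
    }

    public static void main(String[] args) {
        System.out.println("Running in native image: " + inNativeImage());
    }
}
```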
Now we can try again to build the native image :
mvn package -Dnative
This time the build is successful and the features for HTML, AsciiDoc and MarkDown documents will be available, for instance http://localhost:8080/doc/example.adoc, while the PDF version http://localhost:8080/doc/example.pdf will fail.
We now have a project which can be built in both JIT and AOT mode.
Now it’s time for the docker images.
Container images
In this step we are going to build and test the container image.
JIT container
First of all we build the application :
mvn package
Then build the container image :
docker build -f src/main/docker/Dockerfile.jvm -t graalkus-jit .
And launch it :
docker run --rm -p 8080:8080 --name graalkus-jit graalkus-jit
On my system Quarkus starts in 0.458s.
__ ____ __ _____ ___ __ ____ ______
--/ __ \/ / / / _ | / _ \/ //_/ / / / __/
-/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2024-12-01 00:50:30,285 INFO [org.fug.jav.dem.gra.AppInit] (main) The application is starting...
2024-12-01 00:50:30,333 INFO [io.quarkus] (main) graalkus 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.17.2) started in 0.458s. Listening on: http://0.0.0.0:8080
AOT container
After building the application :
mvn package -Dnative -Dquarkus.native.container-build=true
This time we use the quarkus.native.container-build option, so the build is handled by a container.
We can now build the container :
docker build -f src/main/docker/Dockerfile.native-micro -t graalkus-aot .
And launch it :
docker run --rm -p 8080:8080 --name graalkus-aot graalkus-aot
This time Quarkus starts in 0.020s, roughly 23 times faster than the JIT version!
__ ____ __ _____ ___ __ ____ ______
--/ __ \/ / / / _ | / _ \/ //_/ / / / __/
-/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2024-12-01 00:52:13,027 INFO [org.fug.jav.dem.gra.AppInit] (main) The application is starting...
2024-12-01 00:52:13,029 INFO [io.quarkus] (main) graalkus 1.0.0-SNAPSHOT native (powered by Quarkus 3.17.2) started in 0.020s. Listening on: http://0.0.0.0:8080
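The speed-up can be checked directly from the two "started in" log lines. The sketch below extracts the startup time with a regular expression and divides the JIT value by the AOT one; the class name and the pattern are my own, based only on the log format shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper to compare startup times taken from Quarkus log lines.
final class StartupTime {

    private static final Pattern STARTED_IN = Pattern.compile("started in (\\d+\\.\\d+)s");

    // Extracts the startup time in seconds from a "started in ...s" log line.
    static double seconds(String logLine) {
        Matcher m = STARTED_IN.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No startup time in: " + logLine);
        }
        return Double.parseDouble(m.group(1));
    }

    // How many times faster the second application started compared with the first.
    static double speedup(String slowLine, String fastLine) {
        return seconds(slowLine) / seconds(fastLine);
    }

    public static void main(String[] args) {
        String jit = "graalkus 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.17.2) started in 0.458s.";
        String aot = "graalkus 1.0.0-SNAPSHOT native (powered by Quarkus 3.17.2) started in 0.020s.";
        System.out.printf("speed-up: %.1fx%n", speedup(jit, aot));
    }
}
```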
Benchmark application
In this step we are going to benchmark the application, in both its JIT and AOT versions.
Requirements
For this benchmark we will use the script bench-graph-h2-load.sh, which can be found in the repository at src/main/script/bench-graph-h2-load.sh.
The script needs :
Benchmark JIT application
Build the application
mvn package
Run the script (will also launch the application)
./src/main/script/bench-graph-h2-load.sh -m JIT
Benchmark AOT application
Build the application
mvn install -Dnative
Run the script (will also launch the application)
./src/main/script/bench-graph-h2-load.sh -m AOT
Sample output
Here I will show, as an example, the result on my system.
-
OS : Ubuntu 24.04.1 LTS
-
CPU : AMD Ryzen 7 3700X (8 core, 16 thread)
-
Memory : 32 GB
With standard script parameters (h2load) :
-
50000 requests for warm up run (w)
-
250000 requests for benchmark run (r)
-
12 clients (c)
-
1 thread (t)
JIT result :
finished in 13.11s, 19068.33 req/s, 23.58MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 120us 11.00ms 600us 468us 92.87%
time for connect: 112us 588us 312us 141us 66.67%
time to 1st byte: 3.93ms 5.26ms 4.27ms 356us 91.67%
req/s : 1589.12 1596.61 1592.01 2.11 75.00%
AOT result :
finished in 16.87s, 14819.13 req/s, 18.33MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 207us 14.50ms 781us 531us 94.41%
time for connect: 89us 542us 286us 137us 66.67%
time to 1st byte: 1.57ms 2.11ms 1.74ms 172us 75.00%
req/s : 1234.99 1241.60 1238.78 2.47 58.33%
And the relative resource plotting :
As you can see :
-
The request rate is of the same order for the JIT and AOT versions (19068.33 vs 14819.13 req/s in this run)
-
All requests are successful in both scenarios
-
CPU footprint is also comparable (except at startup, where AOT performs better)
-
AOT memory footprint is about 3 times lower than the JIT version
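To compare runs without reading the numbers by eye, the summary line printed by h2load can be parsed. This is only a sketch: the class name and the regular expression are mine and are based on the "finished in ..." line shown in the sample output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the h2load summary line.
final class H2LoadSummary {

    private static final Pattern REQ_PER_SEC = Pattern.compile("(\\d+(?:\\.\\d+)?) req/s");

    // Extracts the overall request rate from a "finished in ..." summary line.
    static double requestsPerSecond(String summaryLine) {
        Matcher m = REQ_PER_SEC.matcher(summaryLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No request rate in: " + summaryLine);
        }
        return Double.parseDouble(m.group(1));
    }

    // Relative change between two rates, in percent (negative means slower).
    static double percentChange(double from, double to) {
        return (to - from) / from * 100.0;
    }

    public static void main(String[] args) {
        double jit = requestsPerSecond("finished in 13.11s, 19068.33 req/s, 23.58MB/s");
        double aot = requestsPerSecond("finished in 16.87s, 14819.13 req/s, 18.33MB/s");
        System.out.printf("AOT vs JIT: %.1f%%%n", percentChange(jit, aot));
    }
}
```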
Profile-Guided Optimizations
Native executables built with GraalVM can perform better if they are optimized with profiling data collected from a realistic workload.
In this section we will explore GraalVM’s Profile-Guided Optimizations feature.
Instrumentation
-
We add an instrumented profile to our project :
<profile>
  <id>instrumented</id>
  <build>
    <finalName>${project.artifactId}-${project.version}-instrumented</finalName>
  </build>
  <properties>
    <quarkus.native.additional-build-args>${base-native-build-args},--pgo-instrument</quarkus.native.additional-build-args>
  </properties>
</profile>
-
Then we will create the native image :
mvn install -Dnative -Pinstrumented
-
Start the application :
./target/graalkus-1.0.0-SNAPSHOT-instrumented-runner
-
Provide some relevant workload :
./src/main/script/bench-graph-h2-load.sh
After the application shuts down, a .iprof file will be available in the working folder.
Optimization
-
Add another profile to build the optimized native image :
<profile>
  <id>optimized</id>
  <build>
    <finalName>${project.artifactId}-${project.version}-optimized</finalName>
  </build>
  <properties>
    <quarkus.native.additional-build-args>${base-native-build-args},--pgo=${project.basedir}/default.iprof</quarkus.native.additional-build-args>
  </properties>
</profile>
-
Create the optimized native executable :
mvn install -Dnative -Poptimized
-
Run the benchmark :
./src/main/script/bench-graph-h2-load.sh -m AOT -a graalkus-1.0.0-SNAPSHOT-optimized-runner
Sample optimized result
This section contains the result of an optimized benchmark run :
finished in 12.84s, 19464.30 req/s, 24.07MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 150us 32.20ms 588us 542us 96.14%
time for connect: 101us 576us 317us 148us 66.67%
time to 1st byte: 1.19ms 2.09ms 1.65ms 252us 75.00%
req/s : 1622.11 1632.73 1628.10 3.58 58.33%
And the relative resource plotting :
Let’s compare the result with the unoptimized benchmark (both have been run on the same system).
After optimization, CPU and memory footprint are more or less the same, but the request rate is about 20% higher (from 160.94 req/s to 197.93 req/s).
More optimization options are available. A good resource is the Build and test various capabilities of Spring Boot & GraalVM repository (even though it focuses on Spring Boot, most concepts and options can be used with other frameworks too).
Profile-Guided Optimizations are only available on Oracle GraalVM. Other distributions, like GraalVM Community Edition or Mandrel, do not provide them.
Conclusion
So in this first part we :
-
Developed the standalone JIT application
-
Converted it to an AOT application
-
Created the container image version of each
-
Ran benchmarks on the standalone application
-
Optimized the native image (PGO)
Here is a summary of the result :
Info | JIT | AOT | Optimized AOT |
---|---|---|---|
Startup time (s) | 0.634 | 0.018 | 0.014 |
Requests/s | 19068.33 | 14819.13 | 19464.30 |
Memory (MB) | 400/500 | 150/250 | 150/250 |
Part II - CI and container images
This section describes container images built through CI :
JIT Container image
finished in 12.65s, 19763.04 req/s, 24.44MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 142us 6.53ms 584us 262us 72.69%
time for connect: 117us 608us 351us 150us 66.67%
time to 1st byte: 4.15ms 5.57ms 4.50ms 397us 91.67%
req/s : 1646.94 1654.02 1650.47 1.74 75.00%
AOT Container image
finished in 21.21s, 11785.14 req/s, 14.58MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 298us 8.97ms 1.00ms 555us 92.20%
time for connect: 105us 599us 341us 152us 66.67%
time to 1st byte: 1.42ms 2.38ms 1.91ms 280us 66.67%
req/s : 982.14 985.04 982.96 0.89 83.33%
AOT Optimized Container image
finished in 12.63s, 19792.74 req/s, 24.48MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 183us 8.08ms 583us 354us 95.75%
time for connect: 109us 558us 314us 137us 66.67%
time to 1st byte: 1.17ms 1.73ms 1.38ms 152us 83.33%
req/s : 1649.48 1659.64 1653.12 2.69 83.33%
Benchmark with limits
Let’s configure a Docker Compose file to limit the resources of our containers :
services:
  graalkus-jit-limit:
    image: fugeritorg/graalkus:latest
    container_name: graalkus-jit-limit
    restart: always
    ports:
      - "8084:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 256M
        reservations:
          cpus: 1.0
          memory: 64M
  graalkus-aot-limit:
    image: fugeritorg/graalkus:latest-amd64native
    container_name: graalkus-aot-limit
    restart: always
    ports:
      - "8085:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 128M
        reservations:
          cpus: 1.0
          memory: 64M
  graalkus-aot-optimized-limit:
    image: fugeritorg/graalkus:latest-amd64native-pgo
    container_name: graalkus-aot-optimized-limit
    restart: always
    ports:
      - "8086:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 128M
        reservations:
          cpus: 1.0
          memory: 64M
  graalkus-jit-limit-high:
    image: fugeritorg/graalkus:latest
    container_name: graalkus-jit-high-limit
    restart: always
    ports:
      - "8087:8080"
    deploy:
      resources:
        limits:
          cpus: 1.0
          memory: 256M
        reservations:
          cpus: 1.0
          memory: 64M
and start the containers :
docker compose -f src/main/docker/docker-compose-limit.yml up -d
This compose configuration uses the pre-built container images.
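To double-check which limits a process actually sees inside a constrained container, a small probe like the one below can help. It only uses the standard java.lang.Runtime API; the class name is made up, it is not part of the Graalkus repository, and I am assuming it would be run inside the container.

```java
// Hypothetical probe printing the resources visible to the running process.
final class RuntimeLimits {

    // Number of processors the runtime reports as available.
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Maximum heap size the runtime will attempt to use, in MB.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("cpus=" + cpus() + ", maxHeapMb=" + maxHeapMb());
    }
}
```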
Then benchmark the services one by one :
-
JIT Version (1.0 CPU, max 64/128 MB)
./src/main/script/bench-graph-h2-load.sh -u http://localhost:8084
finished in 172.83s, 1446.51 req/s, 1.79MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 147us 590.09ms 8.25ms 24.06ms 91.73%
time for connect: 114us 592us 330us 146us 66.67%
time to 1st byte: 15.10ms 16.43ms 15.69ms 500us 66.67%
req/s : 120.55 121.13 120.78 0.21 58.33%
-
AOT Version (1.0 CPU, max 64/128 MB)
./src/main/script/bench-graph-h2-load.sh -u http://localhost:8085
finished in 311.73s, 801.98 req/s, 1015.69KB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 304us 104.70ms 14.92ms 31.52ms 85.05%
time for connect: 115us 584us 333us 143us 66.67%
time to 1st byte: 2.17ms 4.38ms 3.12ms 885us 66.67%
req/s : 66.83 66.95 66.90 0.04 75.00%
-
AOT Optimized Version (1.0 CPU, max 64/128 MB)
./src/main/script/bench-graph-h2-load.sh -u http://localhost:8086
finished in 132.81s, 1882.39 req/s, 2.33MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 179us 97.61ms 6.33ms 20.95ms 93.63%
time for connect: 185us 1.07ms 609us 278us 66.67%
time to 1st byte: 1.43ms 2.80ms 2.09ms 465us 58.33%
req/s : 156.86 157.24 157.01 0.15 83.33%
-
JIT Version, High limits (1.0 CPU, max 64/256 MB)
./src/main/script/bench-graph-h2-load.sh -u http://localhost:8087
finished in 156.90s, 1593.32 req/s, 1.97MB/s
requests: 250000 total, 250000 started, 250000 done, 250000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 250000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 309.20MB (324215904) total, 733.51KB (751116) headers (space savings 95.35%), 303.00MB (317714500) data
min max mean sd +/- sd
time for request: 152us 94.86ms 7.48ms 22.68ms 92.48%
time for connect: 130us 614us 350us 146us 66.67%
time to 1st byte: 7.48ms 83.63ms 14.15ms 21.88ms 91.67%
req/s : 132.78 133.47 133.06 0.23 75.00%
Appendix A : Going AOT in depth
In the Going AOT section we adapted our software with just a few modifications.
This was possible because :
quarkus:
  native:
    # if needed add -H:+UnlockExperimentalVMOptions
    additional-build-args: -H:IncludeResources=graalkus/fm-doc-process-config.xml,\
      -H:IncludeResources=graalkus/template/document.ftl
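The reason these resources must be listed is that a native image only contains the classpath resources it was told to include. A quick way to see whether a resource is visible at runtime is a lookup like the following sketch (the class name is mine; the resource paths in main are the ones from the configuration above).

```java
// Hypothetical check for classpath resources, useful when debugging a native build.
final class ResourceCheck {

    // Returns true if the resource can be found on the classpath of the running application.
    static boolean isAvailable(String path) {
        return ClassLoader.getSystemClassLoader().getResource(path) != null;
    }

    public static void main(String[] args) {
        for (String path : new String[] {
                "graalkus/fm-doc-process-config.xml",
                "graalkus/template/document.ftl"}) {
            System.out.println(path + " available: " + isAvailable(path));
        }
    }
}
```

Run inside the native executable, a false result for one of these paths would suggest that it is missing from the IncludeResources option.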
In a legacy application, not based on a native-ready framework like Quarkus or Spring Boot, the conversion could take longer.
One possible approach could be to split a monolith’s features into microservices and go AOT where possible.
Appendix B : Resources
This appendix lists some useful documentation and resources.
-
Quarkus
-
Quarkus documentation, especially :
-
Quarkus Event Bus Logging Filter JAX-RS Documentation (a very good example of both Quarkus Event Bus usage and doc-as-code approach).
-
Fugerit Venus Doc
-
GraalVM
-
Build and test various capabilities of Spring Boot & GraalVM (GitHub repository)
-
A few videos :
-
Going AOT: Everything you need to know about GraalVM for Java applications by Alina Yurenko SpringIO
-
Bring the action: using GraalVM in production by Alina Yurenko
-
Scale to zero with Spring + GraalVM or WebAssembly by Sébastien Deleuze