Optimize Java Performance On Kubernetes

Lalit Chaturvedi
Dec 28, 2023


Set the CPU and RAM requests and limits appropriately

Apart from optimizing the JVM memory footprint with options such as -Xmx, -XX:MaxRAM, and -XX:+UseStringDeduplication, you should set the requests and limits for the CPU and RAM used by pods and containers in Kubernetes. It's recommended to set the request and the limit to the same value (see the example manifest after the list below).

But selecting the proper memory limits for the application is like sailing between Scylla and Charybdis: FinOps demands shrinking resource usage to reduce cloud bills, but meeting the Service Level Agreement (SLA) made with users or clients may require more resources. Luckily, the pieces of advice below will help you navigate these waters safely:

  • Use load testing to understand the application’s behavior under normal conditions and stress testing to measure resource consumption at peak performance. Tune the settings accordingly. Don’t set the limits too low. Even if your application consumes fewer resources under stable load, it needs more CPU for warmup and peak loads.
  • Determine the lowest requirements needed to meet the SLOs (Service Level Objectives that developers should reach to fulfill the SLA).
  • Utilize tools to study the relevant metrics, such as RAM consumption. For instance, kubectl top pod <podname> provides data on memory usage inside the pod, and jcmd <pid> GC.heap_info gives information about heap usage.
  • Use Native Memory Tracking in addition to GC logs to understand how much memory your application actually uses.
  • Take Kubernetes overhead into account. Pod overhead is the extra resources a pod consumes on a node on top of what its containers request. For instance, AWS Fargate adds 256 MB to each pod's memory reservation for the necessary Kubernetes components.
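
Putting these pieces together, here is a minimal sketch of a Deployment with equal requests and limits and container-friendly JVM flags passed through JAVA_TOOL_OPTIONS. The service name, image, sizes, and flag values are illustrative placeholders and should be derived from your own load and stress tests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                          # hypothetical service name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: example.com/orders-api:1.0.0   # placeholder image
          env:
            # MaxRAMPercentage caps the heap relative to the container memory limit,
            # leaving headroom for metaspace, threads, and other native allocations;
            # NativeMemoryTracking=summary enables the data that
            # `jcmd <pid> VM.native_memory summary` reports (at a small overhead).
            - name: JAVA_TOOL_OPTIONS
              value: >-
                -XX:MaxRAMPercentage=70.0
                -XX:+UseStringDeduplication
                -XX:NativeMemoryTracking=summary
          resources:
            requests:
              cpu: "2"
              memory: 2Gi
            limits:
              cpu: "2"                      # limits match requests, as recommended above
              memory: 2Gi
```

With this in place, kubectl top pod <podname> shows pod-level memory usage, while jcmd <pid> GC.heap_info and jcmd <pid> VM.native_memory summary executed inside the container show how that memory is split between the heap and native allocations.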

Improper configuration will lead to:

  • Overutilization, when the service consumes all available memory, the application spends its time in constant garbage collection, and we have to add more instances to resume normal operation;
  • Underutilization, when the application never uses all of the memory reserved for it, and we pay for over-provisioned instances and waste resources.

Distinguish types of services

When performing optimizations, always consider the type of service. Some services are less critical, and some are highly critical, so the performance requirements differ.

Less critical services

  • Have moderate RTO (recovery time objectives) requirements;
  • Have a burstable Quality of Service (QoS) class, meaning that the pods have lower resource guarantees based on the container request and don’t require a specific memory limit (but at least one container in the pod must have a memory or CPU request/limit);

Highly critical services

  • Have strict RTO requirements;
  • Are highly elastic and designed for handling exponential growth;
  • Have a Guaranteed QoS class, meaning that the pods have strict resource limits and are guaranteed not to be killed unless they exceed their limits. All containers in such pods must have a CPU limit/request and a memory limit/request.

Therefore, the performance requirements should be determined per service.
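
As a rough illustration of how the QoS class follows from the resource specification (the numbers below are placeholders, not recommendations), a container whose request is lower than its limit ends up in the Burstable class, while matching requests and limits across all containers yield the Guaranteed class:

```yaml
# Burstable QoS: the request is lower than the limit
# (acceptable for less critical services)
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
---
# Guaranteed QoS: requests equal limits for every container in the pod
# (appropriate for highly critical services)
resources:
  requests:
    cpu: "2"
    memory: 2Gi
  limits:
    cpu: "2"
    memory: 2Gi
```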

Note that Java applications don't handle vertical scaling well and are better suited for horizontal scaling. This means that the requests and limits should be based on peak-performance data. In addition, the scaling strategy shouldn't be based on CPU and RAM metrics only: sometimes latency or throughput is more important for a given application, as the sketch below shows.
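
For example, if your metrics pipeline exposes a per-pod latency metric to the custom metrics API (for instance via Prometheus Adapter, which is an assumption here rather than something Kubernetes provides out of the box), an autoscaler can scale on latency instead of CPU. The metric name and target value below are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_request_p99_latency_seconds   # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: 250m                        # scale out above ~0.25 s p99 per pod
```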

Use Kubernetes probes effectively

Kubernetes probes, used by the kubelet, are vital for gathering information about the health of your containers. At the same time, their incorrect configuration may lead to performance degradation and unnecessary scaling.

There are three types of Kubernetes probes:

  • A startup probe determines if the containerized application has started. Other probes are turned off until the startup probe confirms the successful startup.
  • A liveness probe determines whether the container is still running correctly. If it fails, the kubelet restarts the container.
  • A readiness probe decides when the container is ready to accept network requests.

The probes work best together. For instance, a startup probe is perfect for slow-starting containers: without it, the liveness probe could kill the container prematurely unless its initialDelaySeconds and failureThreshold are set generously. Conversely, the readiness probe could report that the container is running fine when, in reality, the application is in a deadlock, which can be identified only by the liveness probe.

Major Java frameworks, including Spring Boot, support Kubernetes probe configuration and autoconfiguration. With Spring Boot, add the spring-boot-starter-actuator dependency to the pom.xml file. Spring Boot will then register liveness and readiness probes automatically when the management.health.probes.enabled property is set to true (or management.endpoint.health.probes.enabled=true starting with Spring Boot 2.3.2) in application.properties.
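
A minimal sketch of that Actuator configuration, shown here in application.yaml form (the application.properties equivalent works the same way); the property names come from Spring Boot's documentation, and exposing the health endpoint over HTTP is assumed to be acceptable in your setup:

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true        # registers the liveness and readiness health groups
  endpoints:
    web:
      exposure:
        include: health      # serves /actuator/health/liveness and /actuator/health/readiness
```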

You can then adjust the probe settings for your workloads. Proper configuration will enable you to avoid frequent, unnecessary container restarts and other issues. For instance, suppose the probes don't wait long enough (e.g., you set the response time limits too low) and return negative responses. In that case, the Kubernetes autoscaler may decide that additional pods are needed and perform unnecessary horizontal scaling, thus wasting resources.
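
Here is a hedged sketch of the corresponding probe configuration for a Spring Boot container. The paths assume the Actuator health groups shown above, and the container name, image, and timing values are illustrative starting points to tune against your measured startup and response times, not recommendations:

```yaml
containers:
  - name: orders-api                      # hypothetical container name
    image: example.com/orders-api:1.0.0
    ports:
      - containerPort: 8080
    startupProbe:                         # allows up to 30 x 5 s = 150 s for a slow JVM start
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      periodSeconds: 5
      failureThreshold: 30
    livenessProbe:                        # restarts the container if the app is deadlocked
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 3
    readinessProbe:                       # keeps traffic away until the app can serve it
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      periodSeconds: 5
      timeoutSeconds: 3
```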

Upgrade the Java version

Even if you use regularly updated JDK images, it’s not time to bask in the sun yet. Upgrading the Java version is no less important because the overall JVM performance is getting better with each JDK release. For example, Java has become increasingly container-aware starting with JDK 9:

  • JDK 10+ has an improved Docker container detection algorithm, making better use of the configured resources and allowing more flexible adjustment of the heap size relative to the available RAM;
  • JDK 11+ collects and uses cgroups v1 data;
  • JDK 17+ (as well as recent JDK 11 updates) supports cgroups v2;

And so on. In addition, fresh versions include numerous improvements to garbage collection, affecting the KPIs greatly. But although some fixes are backported to legacy Java versions, fewer and fewer improvements make it to older LTS versions with each new release. Therefore, upgrading the JDK is crucial for optimal Java performance, not only in Kubernetes.

But what if you can't migrate to a newer Java version right now? After all, the migration requires solving compatibility issues and sometimes rewriting the code significantly. In that case, you can still improve performance with the measures below, starting with the choice of garbage collector.

Select the appropriate Garbage Collector

The Java platform offers a variety of garbage collectors tailored for specific workloads and aimed at improving relevant KPIs. For instance,

  • ParallelGC is suitable for high-throughput applications;
  • G1GC is aimed at reducing latency;
  • ZGC is a concurrent, low-latency garbage collector, meaning almost all of the heavy lifting is done while Java threads continue to execute;
  • ShenandoahGC (not included with Oracle Java, but shipped with OpenJDK distributions, including Liberica JDK) focuses on keeping pauses short even with large heaps.

The goal is to select the appropriate collector for your Kubernetes cluster: in most cases, this alone will be sufficient for a performance improvement and won't require extensive GC tuning.

Furthermore, developers should watch out for automatic SerialGC selection. If the container is limited to less than 2 GB of RAM or fewer than two processors, the JVM doesn't treat the environment as a server-class machine and falls back to SerialGC by default (unless another collector is specified explicitly in the JVM settings). SerialGC might be optimal for single-CPU machines and applications running in extremely memory-tight environments, but it may lead to significant performance degradation in other use cases.

Therefore, either avoid setting the limits that low, or use the -XX:+AlwaysActAsServerClassMachine flag, which prevents the automatic SerialGC selection.
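
A short sketch of explicit collector selection via JAVA_TOOL_OPTIONS; the collector choice, image, and sizes are illustrative, so pick the collector that matches your KPIs as described above:

```yaml
containers:
  - name: orders-api
    image: example.com/orders-api:1.0.0
    env:
      # Explicitly pick the collector instead of relying on ergonomics;
      # AlwaysActAsServerClassMachine additionally rules out the SerialGC fallback.
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:+UseG1GC -XX:+AlwaysActAsServerClassMachine"
    resources:
      requests:
        cpu: "2"          # at least two CPUs and 2 GB keep the JVM in server-class territory
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 2Gi
```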

Use a small base OS image

Minimizing the size of containers is crucial for optimizing the resource consumption in your Kubernetes clusters and keeping the cloud costs under control. Although several technologies enable the developers to keep their containers neat and lean, the top-priority step is to choose a minimalistic base OS image. This way, you will immediately reduce the container size without laborious JVM memory configuration or stripping the unnecessary packages off the OS image.

The best lightweight Linux distributions for the cloud are Alpine and Alpaquita Linux. Both have a base image size of less than 4MB (additional packages are easily installed with the APK tool). Still, Alpaquita, which is 100% Alpine-compatible, has several distinguishing features that make it perfect for enterprise Java development:

  • Two libc implementations, optimized musl and glibc, for performance improvement and seamless migration;
  • Additional kernel hardening and regular updates for optimal security;
  • Tools facilitating Java development and four mallocs for various Java workloads;
  • A bonus for Spring developers: we created Alpaquita Containers tailor-made for Spring Boot apps and aimed at reducing their RAM consumption by up to 30 %.
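
For instance, here is a hedged sketch of pointing a workload at a runtime image built on such a small base. The repository is BellSoft's Liberica runtime container, but the exact tag is illustrative and should be checked against the vendor's registry:

```yaml
containers:
  - name: orders-api
    # A JRE-only image on a musl-based, Alpine-compatible OS keeps the pull small;
    # verify the available tags before using this in a real cluster.
    image: bellsoft/liberica-runtime-container:jre-21-slim-musl
```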

Reduce startup and warmup time

Reducing application startup time is critical when you use scale-to-zero services such as Cloud Run. It is also essential if your cloud provider charges you for CPU time. In addition, JVM warmup is associated with increased memory consumption, so you have to allocate more memory to your instances, even though it won't be needed later.

There are several ways to reduce Java application startup time, including AppCDS, AOT compilation, and some completely novel solutions, one of which will be integrated into the upcoming JDK minor release. Along with that, the GraalVM option (https://www.graalvm.org) can also be explored.
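
As one hedged example, Application CDS can be wired up with two standard JVM flags: a training run with -XX:ArchiveClassesAtExit (available since JDK 13) produces a class-data archive, and the runtime container then points -XX:SharedArchiveFile at it. The sketch below assumes the archive was created during the image build and copied to /opt/app/app.jsa, which is a hypothetical path:

```yaml
containers:
  - name: orders-api
    image: example.com/orders-api:1.0.0     # assumed to already contain /opt/app/app.jsa
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:SharedArchiveFile=/opt/app/app.jsa"   # reuse the pre-built CDS archive
```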

The topic is too extensive for this article, so we will take a deep dive into it in the next one.
