{"id":1153,"date":"2020-11-16T13:26:13","date_gmt":"2020-11-16T12:26:13","guid":{"rendered":"https:\/\/www.loicmathieu.fr\/wordpress\/?p=1153"},"modified":"2020-11-16T13:27:58","modified_gmt":"2020-11-16T12:27:58","slug":"profiler-une-image-native-graalvm-avec-perf","status":"publish","type":"post","link":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/profiler-une-image-native-graalvm-avec-perf\/","title":{"rendered":"Profiling a GraalVM native image with perf"},"content":{"rendered":"<p>The GraalVM native-image tool allows you to generate a native executable (or native image) from your Java application.<\/p>\n<p>This native executable will start very quickly and have a much smaller memory footprint than a traditional Java application, at the cost of reduced peak performance and a relatively long build time. More information on native executables <a href=\"https:\/\/www.graalvm.org\/reference-manual\/native-image\/\" rel=\"noopener noreferrer\" target=\"_blank\">here<\/a>.<\/p>\n<p>A native executable contains a minimalist JVM called SubstrateVM, which has some limitations:<\/p>\n<ul><li>Partial support for reflection<\/li>\n\n<li>Partial support for dynamic proxies<\/li>\n\n<li>Partial support for dynamic class loading<\/li>\n\n<li>No JNI<\/li>\n\n<li>No JVMTI<\/li>\n<\/ul>\n<p>No support for <strong>JVMTI<\/strong> means no support for Java agents, JMX, Java profilers, Java debuggers, Java Flight Recorder and Java Mission Control, as well as all the tools delivered with the JDK (jps, jstack, jmap).<\/p>\n<p>For all the needs covered by these tools, you must therefore use a solution integrated into the application (for example, replacing JMX metrics with Prometheus metrics), or standard tools provided by your operating system.<\/p>\n<p>To profile the execution of an application, Linux has a very powerful tool: <strong>perf<\/strong>.<\/p>\n<p>The perf tool has many features: it can access all OS and CPU metrics (performance counters, 
hence its name: perf) and profile the application in many different ways.<\/p>\n<h3>Using perf to profile the CPU<\/h3>\n<p>The perf tool uses the symbols embedded in your application&#8217;s binary to map a memory address to the corresponding Java method (or system call).<\/p>\n<p>By default, these symbols are not kept in native executables, so you must ask the native-image tool to leave them in place via the options <code>-H:-DeleteLocalSymbols -H:+PreserveFramePointer<\/code>.<\/p>\n<p>If you want to test these steps, you can use the Quarkus <a href=\"https:\/\/github.com\/quarkusio\/quarkus-quickstarts\/tree\/master\/getting-started\" rel=\"noopener noreferrer\" target=\"_blank\">getting-started<\/a> application. Quarkus has easy native-image support: just add the property <code>quarkus.native.additional-build-args=-H:-DeleteLocalSymbols,-H:+PreserveFramePointer<\/code> to your application&#8217;s application.properties, and it will automatically add these options to the command line of the native-image tool.<\/p>\n<p>After generating your native executable, launch it and retrieve its PID; we will pass it on the command line of the perf tool.<\/p>\n<p>Once your application is launched, and ideally under load (you can use a tool such as wrk to generate load), you can profile it via the following perf command: <code>perf record -F 99 -p PID --call-graph dwarf sleep 10<\/code>.<\/p>\n<ul><li><strong>record<\/strong>: asks perf to start profiling the application.<\/li>\n\n<li><strong>-F 99<\/strong>: samples at 99 Hertz, which means 99 samples per second.<\/li>\n\n<li><strong>-p PID<\/strong>: asks perf to profile this particular PID (the one of your application).<\/li>\n\n<li><strong>--call-graph dwarf<\/strong>: tells perf to record call stacks and to unwind them using the DWARF debugging information found in your application&#8217;s ELF binary.<\/li>\n\n<li><strong>sleep 10<\/strong>: as perf is attached to a PID rather than launching a command, we give it a command to 
execute. When this command completes, perf stops profiling your application. By using <code>sleep 10<\/code> as the command, we therefore profile the application for 10 seconds.<\/li>\n<\/ul>\n<p>When the command is finished, perf will have generated a data file, <code>perf.data<\/code>, containing the profile of your application (a CPU profile here, because it has not been told which event it should profile).<\/p>\n<p>You can view this profile in the console with the command <code>perf report --stdio<\/code>; the result will look something like this:<\/p>\n<pre>\n# Children      Self  Command          Shared Object                        Symbol                                                                                                                        &gt;\n# ........  ........  ...............  ...................................  ..............................................................................................................................&gt;\n#\n    13.47%     0.00%  tloop-thread-19  libpthread-2.31.so                   [.] 
start_thread\n            |\n            ---start_thread\n               IsolateEnterStub_PosixJavaThreads_pthreadStartRoutine_e1f4a8c0039f8337338252cd8734f63a79b5e3df_06195ea7c1ac11d884862c6f069b026336aa4f8c\n               JavaThreads_threadStartRoutine_241bd8ce6d5858d439c83fac40308278d1b55d23\n               Thread_run_857ee078f8137062fcf27275732adf5c4870652a\n               FastThreadLocalRunnable_run_0329ad2c5210a091812879bcecd155c58e561e60\n               ThreadExecutorMap$2_run_66c8943ee6536a10df07f979fb6cd278adcf96bc\n               SingleThreadEventExecutor$4_run_1b47df7867e302a2fb7f28d7657a73e92f89d91f\n               |          \n               |--12.64%--NioEventLoop_run_be89580b4d16514bef6e948913d2ed21c5e4f679\n               |          |          \n               |          |--5.14%--NioEventLoop_processSelectedKeys_9a76c58d657b781ee037bbb65f41f01d2eb54e7c\n               |          |          NioEventLoop_processSelectedKeysOptimized_c36ca161e53573665bc03cb5392e91c123bcd359\n               |          |          NioEventLoop_processSelectedKey_3a0d92ce472db6c251df4485227a85acb9d3a1ca\n               |          |          AbstractNioByteChannel$NioByteUnsafe_read_45358e803c643a6380776021e488e79d981b159d\n<\/pre>\n<p>And this goes on for thousands of lines&#8230; not easy to analyze, eh?<\/p>\n<p>To easily analyze a profile generated by perf, you can use the FlameGraph tool, available here: <a href=\"https:\/\/github.com\/brendangregg\/FlameGraph\" rel=\"noopener noreferrer\" target=\"_blank\">https:\/\/github.com\/brendangregg\/FlameGraph<\/a><\/p>\n<p>A FlameGraph is a way of visualizing the profile of an application that lets you instantly spot the most frequent code paths. On the x-axis it shows the sample population (generally grouped by method), the width of each frame being proportional to the number of samples in which it appears, and on the y-axis the depth in the stack. 
More information on FlameGraphs <a href=\"http:\/\/www.brendangregg.com\/flamegraphs.html\" rel=\"noopener noreferrer\" target=\"_blank\">here<\/a>.<\/p>\n<p>We can note a small problem in the profile data: the <strong>Command<\/strong> column contains the (truncated) name of the thread instead of the given command. This is a bug in the native-image tool; to work around it, we will use <strong>sed<\/strong> to modify the profile data before feeding it to the FlameGraph tool. The value of the <strong>Command<\/strong> column ends up at the base of the FlameGraph, so it should normally be a single value for the stacks to be aggregated together.<\/p>\n<p>The first step is to use <code>perf script<\/code> to extract the profile data into a text format, then use <code>sed<\/code> to correct the command name so that you can then generate a FlameGraph.<\/p>\n<pre>\nperf script &gt; out.perf\nsed -i -E \"s\/cutor-thread-[0-9]*\/executor-thread\/\" out.perf\nsed -i -E \"s\/ntloop-thread-[0-9]*\/eventloop-thread\/\" out.perf\nsed -i -E \"s\/tloop-thread-[0-9]*\/eventloop-thread\/\" out.perf\n~\/FlameGraph\/stackcollapse-perf.pl out.perf | ~\/FlameGraph\/flamegraph.pl &gt; perf.svg\n<\/pre>\n<p>Here is an example of a generated FlameGraph:<\/p>\n<img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-1.png?resize=640%2C502&#038;ssl=1\" alt=\"\" width=\"640\" height=\"502\" class=\"alignnone size-full wp-image-1158\" srcset=\"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-1.png?w=957&amp;ssl=1 957w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-1.png?resize=300%2C235&amp;ssl=1 300w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-1.png?resize=768%2C603&amp;ssl=1 768w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-1.png?resize=344%2C270&amp;ssl=1 
344w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/>\n<p>What I like most about FlameGraphs is that you can zoom in on them by clicking on a frame:<\/p>\n<img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-2.png?resize=640%2C615&#038;ssl=1\" alt=\"\" width=\"640\" height=\"615\" class=\"alignnone size-full wp-image-1159\" srcset=\"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-2.png?w=959&amp;ssl=1 959w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-2.png?resize=300%2C288&amp;ssl=1 300w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-2.png?resize=768%2C738&amp;ssl=1 768w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/flamegraph-2.png?resize=281%2C270&amp;ssl=1 281w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/>\n<h3>Using perf to profile memory<\/h3>\n<p>To profile memory, we&#8217;ll use the same technique with a slightly modified command.<\/p>\n<p>There are several ways to profile memory with perf: you can ask perf to record memory-related OS events, profile one of the system functions that allocate memory, or use <code>perf mem<\/code>. 
We will use the last option.<\/p>\n<p>For this, you must start your application using the perf tool: <code>perf mem record --call-graph dwarf -F 99 .\/getting-started-1.0-SNAPSHOT-runner<\/code>.<\/p>\n<p>When the application stops, perf will save the profile data to disk; it can then be used in the same way as the CPU profile data (via perf report, perf script and the FlameGraph tool).<\/p>\n<h3>Going further<\/h3>\n<p>A talk I gave on the topic (the relevant part starts at minute 44): <a href=\"https:\/\/www.youtube.com\/watch?v=TXnJ9eyoEhw\" rel=\"noopener noreferrer\" target=\"_blank\"><iframe loading=\"lazy\" width=\"356\" height=\"200\" src=\"https:\/\/www.youtube.com\/embed\/TXnJ9eyoEhw?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen title=\"RemoteClazz Singapore - Engineering Applications with GraalVM\"><\/iframe><\/a>.<\/p>\n<p>Tips for using perf, with lots of recipes: <a href=\"http:\/\/www.brendangregg.com\/perf.html\" rel=\"noopener noreferrer\" target=\"_blank\">http:\/\/www.brendangregg.com\/perf.html<\/a>.<\/p>\n<p>An article describing in detail what a FlameGraph is: <a href=\"https:\/\/queue.acm.org\/detail.cfm?id=2927301\" rel=\"noopener noreferrer\" target=\"_blank\">https:\/\/queue.acm.org\/detail.cfm?id=2927301<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>The GraalVM native-image tool allows you to generate a native executable (or native image) from your Java application. This native executable will start very quickly and have a much smaller memory footprint than a traditional Java application; at the cost of reduced peak performance and a relatively high packaging build time. More information on native executables here. 
A native executable contains a minimalist JVM called SubstrateVM, which has some limitations: Partial support for reflection Partial support for dynamic proxy&#8230;<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/profiler-une-image-native-graalvm-avec-perf\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p><\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[9],"tags":[170,159,187],"class_list":["post-1153","post","type-post","status-publish","format-standard","hentry","category-informatique","tag-graalvm","tag-performance","tag-profiling"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":1090,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/quarkus-jlink-et-application-class-data-sharing-appcds\/","url_meta":{"origin":1153,"position":0},"title":"Quarkus, jlink and Application Class Data Sharing (AppCDS)","author":"admin","date":"Friday May 29th, 2020","format":false,"excerpt":"Quarkus is optimized to start quickly and have a very 
small memory footprint. This is true when deploying in a standard JVM but even more so when deploying our application as a native executable via GraalVM. Quarkus greatly facilitates the creation of a native executable, thanks to this, a Quarkus\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/quarkus_metrics_graphic_bootmem_wide-1024x473.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/quarkus_metrics_graphic_bootmem_wide-1024x473.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/quarkus_metrics_graphic_bootmem_wide-1024x473.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1877,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/java-24-quoi-de-neuf\/","url_meta":{"origin":1153,"position":1},"title":"Java 24 : what&#8217;s new?","author":"admin","date":"Friday January 10th, 2025","format":false,"excerpt":"Now that Java 24 is features complete (Rampdown Phase One at the day of writing), it\u2019s time to walk through all the functionalities that bring to us, developers, this new version. 
This article is part of a series on what\u2019s new on the last versions of Java, for those who\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1353,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/quarkus-tip-comment-ne-pas-creer-une-extension-quarkus\/","url_meta":{"origin":1153,"position":2},"title":"Quarkus Tip: How NOT to create a Quarkus extension","author":"admin","date":"Tuesday November 16th, 2021","format":false,"excerpt":"When you develop an application composed of several components, it is frequent to want to share some code in an external library, for example via an external JAR integrated as a dependency of your components. Quarkus is an extension framework, each extension it offers allows to integrate a technology (BDD\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1267,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/debugger-une-image-native-graalvm-avec-gdb\/","url_meta":{"origin":1153,"position":3},"title":"Debugging a GraalVM native image with GDB","author":"admin","date":"Monday June 14th, 2021","format":false,"excerpt":"In a previous article, I mentioned how to profile a native GraalVM image with perf. If you are not familiar with the GraalVM tool and the limitations it brings, I suggest you reread my article, or at least the beginning of it. 
As seen in my previous article, a native\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":923,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/devoxx-france-2019\/","url_meta":{"origin":1153,"position":4},"title":"(Fran\u00e7ais) Devoxx France 2019","author":"admin","date":"Monday May 13th, 2019","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1668,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/devoxx-fr-2023-hidden-security-features-of-th-jvm-everything-you-didnt-know-and-more-par-steve-poole\/","url_meta":{"origin":1153,"position":5},"title":"(Fran\u00e7ais) Devoxx FR 2023 &#8211; Hidden security features of the JVM &#8211; everything you didn&#8217;t know and more par Steve Poole","author":"admin","date":"Friday April 14th, 2023","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In 
&quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts\/1153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/comments?post=1153"}],"version-history":[{"count":0,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts\/1153\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/media?parent=1153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/categories?post=1153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/tags?post=1153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}