{"id":1180,"date":"2020-12-08T12:04:35","date_gmt":"2020-12-08T11:04:35","guid":{"rendered":"https:\/\/www.loicmathieu.fr\/wordpress\/?p=1180"},"modified":"2020-12-08T12:04:35","modified_gmt":"2020-12-08T11:04:35","slug":"benchmark-conversion-de-long-en-byte","status":"publish","type":"post","link":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/benchmark-conversion-de-long-en-byte\/","title":{"rendered":"Benchmark: conversion from long to byte[]"},"content":{"rendered":"<p>I&#8217;ve been using Kafka a lot lately, and in Kafka a lot of things are byte arrays, even headers!<\/p>\n<p>As I have many components that exchange messages, I added headers to help with message tracking, including a <code>timestamp<\/code> header whose value is <code>System.currentTimeMillis()<\/code>.<\/p>\n<p>So I had to transform a <code>long<\/code> into a byte array. Naively, I wrote this: <code>String.valueOf(System.currentTimeMillis()).getBytes()<\/code>. But instantiating a <code>String<\/code> each time a header is created did not seem optimal to me!<\/p>\n<p>Looking a bit further, Guava has a solution based on bitwise operations in its <code>Longs<\/code> class, as does Kafka with its <code>LongSerializer<\/code>. You can also use a <code>ByteBuffer<\/code> to perform the conversion.<\/p>\n<p>To compare them, nothing beats JMH &#8211; <a href=\"https:\/\/github.com\/openjdk\/jmh\" target=\"_blank\" rel=\"noopener noreferrer\">The Java Microbenchmark Harness<\/a>. This tool lets you write relevant micro-benchmarks that take the internal characteristics of the JVM into account. It also offers integrated tools to analyze the performance of our tests (profiling, disassembling, &#8230;). 
If you don&#8217;t know JMH, you can refer to this article: <a href=\"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/introduction-a-jmh-java-microbenchmark-harness\/\">INTRODUCTION TO JMH &#8211; JAVA MICROBENCHMARK HARNESS<\/a>.<\/p>\n<h2>The benchmark<\/h2>\n<p>First of all, I configured the benchmark with a thread-scoped state so that the setup runs for each thread. In particular, it creates one <code>ByteBuffer<\/code> per thread, so that I can compare implementations with and without reuse of the buffer.<\/p>\n<pre>\n@State(Scope.Thread)\n\/\/ other JMH annotations ...\npublic class LongToByteArray {\n    private static final LongSerializer LONG_SERIALIZER = new LongSerializer();\n\n    long timestamp;\n    ByteBuffer perThreadBuffer;\n\n    @Setup\n    public void setup() {\n        timestamp = System.currentTimeMillis();\n        perThreadBuffer = ByteBuffer.allocate(Long.BYTES);\n    }\n\n    \/\/ benchmark methods\n}\n<\/pre>\n<p>Then, I implemented a benchmark method for each way of converting a <code>long<\/code> to a <code>byte[]<\/code>. 
I implemented two different algorithms for <code>ByteBuffer<\/code>: one that instantiates a new buffer on each conversion, and another that recycles the <code>ByteBuffer<\/code> instantiated in the benchmark setup phase.<\/p>\n<pre>\n    @Benchmark\n    public byte[] testStringValueOf() {\n        return String.valueOf(timestamp).getBytes();\n    }\n\n    @Benchmark\n    public byte[] testGuava() {\n        return Longs.toByteArray(timestamp);\n    }\n\n    @Benchmark\n    public byte[] testKafkaSerde() {\n        return LONG_SERIALIZER.serialize(null, timestamp);\n    }\n\n    @Benchmark\n    public byte[] testByteBuffer() {\n        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES);\n        buffer.putLong(timestamp);\n        return buffer.array();\n    }\n\n    @Benchmark\n    public byte[] testByteBuffer_reuse() {\n        perThreadBuffer.putLong(timestamp);\n        \/\/ array() returns the shared backing array, not a copy\n        byte[] result = perThreadBuffer.array();\n        perThreadBuffer.clear();\n        return result;\n    }\n<\/pre>\n<p>The full benchmark is accessible <a href=\"https:\/\/github.com\/loicmathieu\/jmh-benchmarks\/blob\/master\/src\/main\/java\/fr\/loicmathieu\/jmh\/LongToByteArray.java\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<h2>The results<\/h2>\n<p>All the tests were run on my laptop: Intel(R) Core(TM) i7-8750H 6 cores (12 with hyperthreading) &#8211; Ubuntu 19.10.<\/p>\n<p>The Java version used was: <code>openjdk version &quot;11.0.7&quot; 2020-04-14<\/code>.<\/p>\n<pre>\nBenchmark                             Mode  Cnt   Score   Error  Units\nLongToByteArray.testByteBuffer        avgt    5   4,429 \u00b1 0,204  ns\/op\nLongToByteArray.testByteBuffer_reuse  avgt    5   5,655 \u00b1 0,793  ns\/op\nLongToByteArray.testGuava             avgt    5   6,422 \u00b1 0,428  ns\/op\nLongToByteArray.testKafkaSerde        avgt    5   9,103 \u00b1 1,515  ns\/op\nLongToByteArray.testStringValueOf     avgt    5  39,660 \u00b1 4,372  
ns\/op\n<\/pre>\n<p>First observation: my intuition was right. Instantiating a <code>String<\/code> for each conversion is very costly: roughly 4 to 9 times slower than the other implementations. When we look at the result of the conversion, we understand why. By going through a <code>String<\/code>, we no longer convert a 64-bit number but a character string in which each character (each digit of the number) is encoded in one byte. So we are not comparing exactly the same thing: the conversion via a <code>String<\/code> produces an array of 13 bytes, while a <code>long<\/code> fits in 8 bytes, which is what the conversion via Guava, Kafka or a <code>ByteBuffer<\/code> produces.<\/p>\n<p>Surprisingly, Kafka, which is known for its performance, has a slower implementation than Guava or the <code>ByteBuffer<\/code>-based ones.<\/p>\n<p>The results obtained via <code>ByteBuffer<\/code> are surprising too: instantiating a <code>ByteBuffer<\/code> for each conversion is faster than reusing an existing one (which requires clearing the buffer).<\/p>\n<h2>A slightly more detailed analysis<\/h2>\n<p>Let&#8217;s put aside the implementation via a <code>String<\/code> and try to better understand the differences between the other implementations.<\/p>\n<p>For this I will use the profiling capabilities of JMH via the <code>-prof<\/code> option.<\/p>\n<p>If we profile the memory allocations via <code>-prof gc<\/code>, we get the following results:<\/p>\n<pre>\nLongToByteArray.testByteBuffer                      avgt    5     4,492 \u00b1   0,708   ns\/op\nLongToByteArray.testByteBuffer:\u00b7gc.alloc.rate       avgt    5  4635,903 \u00b1 712,889  MB\/sec\nLongToByteArray.testByteBuffer_reuse                avgt    5     5,798 \u00b1   1,139   
ns\/op\nLongToByteArray.testByteBuffer_reuse:\u00b7gc.alloc.rate avgt    5    \u2248 10\u207b\u2074            MB\/sec\nLongToByteArray.testGuava                           avgt    5     6,939 \u00b1   0,899   ns\/op\nLongToByteArray.testGuava:\u00b7gc.alloc.rate            avgt    5  3000,818 \u00b1 376,613  MB\/sec\nLongToByteArray.testKafkaSerde                      avgt    5     9,317 \u00b1   0,842   ns\/op\nLongToByteArray.testKafkaSerde:\u00b7gc.alloc.rate       avgt    5  4467,791 \u00b1 405,897  MB\/sec\n<\/pre>\n<p>We can clearly see the advantage of reusing the <code>ByteBuffer<\/code>: there is no memory allocation, whereas creating a new buffer for each conversion allocates more than 4 GB\/s of memory!<\/p>\n<p>On the other hand, the allocation rates of the three other implementations are fairly close to one another, so that does not tell us much more.<\/p>\n<p>Now let&#8217;s try to profile the CPU with <code>-prof perf<\/code>, which uses the perf tool to profile the application.<\/p>\n<p>The results are not easy to interpret (you can see them <a href=\"https:\/\/github.com\/loicmathieu\/jmh-benchmarks\/blob\/master\/run\/LongToByteArray\/perf-profile.txt\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>), but here are some observations:<\/p>\n<ul><li>Reusing a <code>ByteBuffer<\/code> seems to involve many more CPU branches; this may be the cause of the performance difference.<\/li>\n\n<li>The Kafka implementation seems to involve more CPU branches than Guava&#8217;s despite executing fewer instructions. Because of these branches, fewer instructions can be executed per CPU cycle. 
This is probably the reason why the Guava implementation is faster.<\/li>\n<\/ul>\n<p>Finally, out of curiosity, I looked at the code of <code>HeapByteBuffer.putLong()<\/code>, which is the implementation used behind <code>ByteBuffer<\/code> here because I don&#8217;t do any direct allocation. It uses the <code>Unsafe.putLongUnaligned()<\/code> method. <code>Unsafe<\/code> is known for its high-performance implementations (but should not be used by everyone). Here, this method is annotated with <code>@HotSpotIntrinsicCandidate<\/code>, which means that an <strong>intrinsic<\/strong> may exist for it, which could explain the performance difference with the other implementations. An <strong>intrinsic<\/strong> can be seen as a piece of native code, optimized for your OS \/ CPU architecture, which the JVM will substitute for the Java implementation of the method under certain conditions.<\/p>\n<h2>Conclusion<\/h2>\n<p>Be careful what you measure: the implementation via a <code>String<\/code> does not generate the same array of bytes as the others, and is therefore much less efficient.<\/p>\n<p>Reusing a <code>ByteBuffer<\/code> is not always the best solution, as the cost of recycling can be significant. Allocations are not very expensive within the JVM, and sometimes it is better to allocate than to execute more instructions.<\/p>\n<p>Follow the force, read the code ;)<\/p>\n<p>Although JMH is a great tool, it takes technical skills and a lot of time to fully analyze its results. Even if the observed differences are not fully explained, I&#8217;m still happy with my little experiment ;)<\/p>","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been using Kafka a lot lately, and in Kafka a lot of things are byte arrays, even headers! As I have many components that exchange messages, I added headers to help with message tracking, including a timestamp header which has the value System.currentTimeMillis(). 
So I had to transform a long into a byte array; in a very naive way, I coded this: String.valueOf(System.currentTimeMillis()).getBytes(). But instantiating a String each time a header is created does not seem very optimal to&#8230;<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/benchmark-conversion-de-long-en-byte\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p><\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[9],"tags":[178,11,188,189,159],"class_list":["post-1180","post","type-post","status-publish","format-standard","hentry","category-informatique","tag-benchmark","tag-java","tag-jmh","tag-micro-benchmark","tag-performance"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":1330,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/devoxx-france-2021-ledition-9-3-4\/","url_meta":{"origin":1180,"position":0},"title":"(Fran\u00e7ais) Devoxx France 2021 &#8211; l&#8217;\u00e9dition 9 3\/4","author":"admin","date":"Friday October  1st, 
2021","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1030,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/for-vs-stream\/","url_meta":{"origin":1180,"position":1},"title":"(Fran\u00e7ais) For vs Stream","author":"admin","date":"Tuesday April 21st, 2020","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":966,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/1-an-chez-zenika\/","url_meta":{"origin":1180,"position":2},"title":"(Fran\u00e7ais) 1 an chez Zenika","author":"admin","date":"Tuesday September  3rd, 2019","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1309,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/3-ans-chez-zenika\/","url_meta":{"origin":1180,"position":3},"title":"(Fran\u00e7ais) 3 ans chez Zenika","author":"admin","date":"Tuesday September  7th, 2021","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In 
&quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1138,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/ma-deuxieme-annee-chez-zenika\/","url_meta":{"origin":1180,"position":4},"title":"(Fran\u00e7ais) Ma deuxi\u00e8me ann\u00e9e chez Zenika","author":"admin","date":"Thursday September  3rd, 2020","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":39,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/sitemesh-gerer-son-layout-sans-douleur\/","url_meta":{"origin":1180,"position":5},"title":"Sitemesh : g\u00e9rer son layout sans douleur","author":"admin","date":"Friday July  6th, 2007","format":false,"excerpt":"Sitemesh permet de g\u00e9rer facilement le layout des application web JAVA. 
Une petite description de comment il marche.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts\/1180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/comments?post=1180"}],"version-history":[{"count":0,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts\/1180\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/media?parent=1180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/categories?post=1180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/tags?post=1180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}