VisualVM SQL profiler

A little while ago, I discovered the SQL profiler of VisualVM and I thought I should share it with you ;).

VisualVM is a tool that provides a visual interface to display detailed information about applications running on a Java Virtual Machine (JVM). VisualVM is designed for use in development and production.

VisualVM provides lightweight profiling tools, including a SQL profiler. It detects each SQL query passing through the JDBC layer and groups them in a tabular view with the number of executions of each query and its total execution time.
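To give an idea of what the profiler sees, here is a minimal sketch of an application issuing a query through a JDBC driver; the connection URL, credentials, and table are placeholders, not anything from Kestra:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL, credentials and table: anything going through a JDBC driver
        // (PostgreSQL here) is visible to VisualVM's JDBC profiler in the same way.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/demo", "demo", "demo");
             PreparedStatement ps = conn.prepareStatement(
                 "select id, payload from messages where processed = false");
             ResultSet rs = ps.executeQuery()) {
            // Each execution of the statement above is recorded by the profiler and
            // aggregated per SQL text (number of executions, total execution time).
            while (rs.next()) {
                System.out.println(rs.getLong("id"));
            }
        }
    }
}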

This tabular view lets you quickly spot the queries that have the greatest impact on performance. A query that is called often but responds quickly is generally not problematic, and a query with a long execution time that is rarely executed is generally not problematic for the overall performance of an application (though it may be for the part of the application that issues it). When optimizing the performance of an application, we therefore usually look at the queries whose total execution time (the sum of the times of each of their executions) is the highest: typically the top 3, 5, or 10 queries, which we then try to optimize.

Here is an example of an optimization done following the performance analysis of Kestra using VisualVM’s SQL profiler. Kestra is a highly scalable data scheduling and orchestration platform that creates, executes, schedules, and monitors millions of complex pipelines. It’s also the company I’ve been working at for the past 3 months!

Kestra, in its open-source version, uses a database engine; the Enterprise Edition allows using an alternative engine based on Kafka and Elasticsearch. The database engine supports H2, PostgreSQL, and MySQL. The optimization described here was done for PostgreSQL.

VisualVM’s JDBC profiler can be accessed from the Profiler tab by clicking the JDBC button. It can be configured with a text filter to restrict the queries to be analyzed. Once started, it records each executed query and displays a table with, for each query, the total execution time, the percentage relative to all queries, and the number of executions.

For my use case, the result was as follows.

As always with performance, you have to look at the top n problems. Here, I will focus on the first 4 queries, which are variations of the same query on the queues table. Each execution takes about 7 ms; depending on the size of the database, they could take much longer. Since I started with an empty database, a select should be almost immediate. Even though they are not the longest individual queries, because they are executed so many times they contribute the most to the overall execution time on the database (about 72% here).

Kestra relies on asynchronous message queues for communication between its internal components, notably during the execution of flows. When the database engine is used, these queues poll the queues table of the database: every 25 ms, a query is executed to read the messages to be processed.
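As an illustration only (this is not Kestra’s actual implementation; the batch size and message handling are simplified), such a fixed-rate poller could look like this:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class QueuePoller {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Polls the queues table every 25 ms, the same cadence as the database queue described above.
    public void start(Connection connection) {
        scheduler.scheduleAtFixedRate(() -> poll(connection), 0, 25, TimeUnit.MILLISECONDS);
    }

    private void poll(Connection connection) {
        // Simplified poll query: read a batch of messages in offset order, locking the selected
        // rows and skipping rows already locked by another poller (auto-commit should be off so
        // the row locks are held until the messages are processed).
        String sql = "select \"value\", \"offset\" from queues "
                + "order by \"offset\" asc fetch next 100 rows only for update skip locked";
        try (PreparedStatement ps = connection.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process rs.getString("value") here, then delete or mark the message as consumed
            }
            connection.commit(); // release the row locks once the batch has been handled
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}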

The first four queries in terms of total execution time are these poll queries, so they have a direct impact on flow execution time.

It is possible to see the entire query by right-clicking on a row and selecting View SQL.

Here is the query:

select "value", "offset" from queues 
where ((type = CAST('io.kestra.core.models.executions.Execution' AS queue_type)) and "consumer_scheduler" = false) 
order by "offset" asc 
fetch next 100 rows only 
for update skip locked

An explain plan of this query shows that, despite the presence of an index on the queue_type column, the PostgreSQL optimizer decides to scan the primary key index, which is suboptimal because it scans all the records in that index.
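If you want to reproduce the analysis, you can prefix the query with EXPLAIN and read the plan back through JDBC; here is a minimal sketch, assuming a connection to the same database:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowPlan {
    // Prints the PostgreSQL execution plan for the poll query. Plain EXPLAIN does not execute
    // the statement, so the FOR UPDATE clause takes no locks.
    static void printPlan(Connection connection) throws Exception {
        String sql = "EXPLAIN "
                + "select \"value\", \"offset\" from queues "
                + "where type = CAST('io.kestra.core.models.executions.Execution' AS queue_type) "
                + "and \"consumer_scheduler\" = false "
                + "order by \"offset\" asc fetch next 100 rows only for update skip locked";
        try (Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // each row is one line of the plan
            }
        }
    }
}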

Since PostgreSQL has no query hints, it is difficult to force it to use a specific index. So I used a little workaround: I removed the primary key and replaced it with a hash index (to still be able to quickly access a message when it is deleted). PostgreSQL then favors the existing index on the queue_type column, and the query execution time is greatly improved, at about 1 ms.
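Here is a minimal sketch of that schema change, assuming the primary key is on the "offset" column; the constraint and index names are made up for the example:

import java.sql.Connection;
import java.sql.Statement;

public class ReplacePrimaryKey {
    // Drops the primary key of the queues table and replaces it with a hash index, so single-row
    // lookups (e.g. when a message is deleted) stay fast while the planner stops favouring the
    // primary key index for the poll query.
    static void apply(Connection connection) throws Exception {
        try (Statement statement = connection.createStatement()) {
            statement.execute("ALTER TABLE queues DROP CONSTRAINT queues_pkey");                   // assumed constraint name
            statement.execute("CREATE INDEX queues_offset_idx ON queues USING hash (\"offset\")"); // assumed index name and column
        }
    }
}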

If we look at the top queries, we see that our 4 queries now account for only 23% of the total execution time.

And the result is there: almost a 50% improvement in flow execution time just by changing an index. For more information, see PR #1012!
