Creating a chatbot with Google Gemini Vertex AI and Quarkus

I recently created a Quarkus extension that provides access to Google Vertex AI.
In this article, I’m going to use this extension to create a chatbot.

The first step is to create a Quarkus project containing the REST and Google Cloud Vertex AI extensions.

Here are the extensions to add to your dependency file:

<dependency>
  <groupId>io.quarkiverse.googlecloudservices</groupId>
  <artifactId>quarkus-google-cloud-vertex-ai</artifactId>
</dependency>
<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-resteasy-reactive</artifactId>
</dependency>

Or, more simply, you can generate a project with these extensions preselected from code.quarkus.io.
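If you prefer the command line, the Quarkus CLI can bootstrap the project directly. A sketch, assuming the CLI is installed; the group and artifact ids below are example values, and the extension names are the artifact ids from the dependency snippet above:

```shell
# Bootstrap a Quarkus project with the REST and Vertex AI extensions
# (example coordinates -- adjust to your own group/artifact id)
quarkus create app fr.loicmathieu:quarkus-vertexai-example \
  --extension=quarkus-resteasy-reactive,quarkus-google-cloud-vertex-ai
cd quarkus-vertexai-example
```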

The extension allows you to inject a VertexAI object into a Quarkus application to access the Google Cloud Vertex AI service. This service enables the Gemini generative AI model to be used in a Google Cloud project.

We’re going to use this extension in a REST endpoint to code a chatbot. As I don’t really like developing UIs, I’m going to use cURL to call the chatbot.

Here is the starting code for a chatbot with VertexAI:

package fr.loicmathieu.quarkus.vertexai;

// [...] imports omitted for brevity

// 1. Standard REST endpoint available on /chat
@Path("/chat")
public class VertexAIResource {
    // 2. System message telling the LLM it is a chatbot
    private static final String SYSTEM_INSTRUCTION = """
        You are a chatbot named Loïc.
        Don't pretend you know everything but be helpful with a touch of humour.""";

    private GenerativeModel model;

    // 3. Injection of the provided VertexAI object
    @Inject
    VertexAI vertexAI;

    // 4. Init the model with Gemini 2.5 Flash
    @PostConstruct
    void initModel() {
        this.model = new GenerativeModel("gemini-2.5-flash", vertexAI)
            .withSystemInstruction(ContentMaker.fromString(SYSTEM_INSTRUCTION));
    }

    // 5. Generate a chat response on each call to /chat
    @GET
    public String chat(@QueryParam("message") String message) throws IOException {
        var response = model.generateContent(message);

        // For simplicity: we only take the first part of the first candidate
        return response.getCandidatesList().getFirst()
            .getContent().getPartsList().getFirst().getText() + "\n";
    }
}
  1. REST endpoint using the JAX-RS standard available from the /chat path of the application.
  2. System message telling the model it’s a chatbot and asking it to respond with a touch of humor.
  3. Injection of the VertexAI service, provided by the Quarkus Google Cloud VertexAI extension.
  4. Model initialization in the REST endpoint’s post-construct method. We’re using Gemini 2.5 Flash here because it balances performance and cost, which is what we want for a chatbot that needs to respond quickly and doesn’t require complex reasoning.
  5. Each time the endpoint is called, the model is called and the first response generated is returned. A real-life application would certainly implement something more advanced.

To start the application in dev mode, use the Quarkus CLI: quarkus dev. The application is then available on port 8080.

To call the chatbot, I use the following cURL command: curl -G localhost:8080/chat --data-urlencode "message=A message?". For those who don’t know this trick: -G forces cURL to do a GET and pass the data as a query string in the URL, while --data-urlencode avoids any space-encoding problems.

Here’s a sample conversation:

As you can see, everything’s going fine until I ask it what my name is! Even though I’ve just told it, it doesn’t know it, because it has no memory.

We’re going to add a memory to the chatbot so that we can have a real conversation with it.

package fr.loicmathieu.quarkus.vertexai;

// [...] Imports omitted for brevity

// 1. Standard REST endpoint available on /chat
@Path("/chat")
public class VertexAIResource {
    // 2. System message telling the LLM it is a chatbot
    private static final String SYSTEM_INSTRUCTION = """
        You are a chatbot named Loïc.
        Don't pretend you know everything but be helpful with a touch of humour.""";

    private GenerativeModel model;
    private ChatSession chatSession;

    // 3. Injection of the provided VertexAI object
    @Inject
    VertexAI vertexAI;

    // 4. Init the model with Gemini 2.5 Flash
    @PostConstruct
    void initModel() {
        this.model = new GenerativeModel("gemini-2.5-flash", vertexAI)
            .withSystemInstruction(ContentMaker.fromString(SYSTEM_INSTRUCTION));

        // For simplicity: we use a single chat session.
        // In a real application, we should use one per user
        this.chatSession = this.model.startChat();
    }

    // 5. Generate a chat response on each call to /chat
    @GET
    public String chat(@QueryParam("message") String message) throws IOException {
        var response = chatSession.sendMessage(message);

        // For simplicity: we only take the first part of the first candidate
        return response.getCandidatesList().getFirst()
           .getContent().getPartsList().getFirst().getText() + "\n";
    }
}

With a memory, the chatbot works much better!

Here, I’ve used a single chat memory for the whole application, stored in memory (RAM). In the real world, of course, you’ll need to use one memory per chat user (or session), ideally stored persistently.
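As a sketch of that per-user approach, the sessions could be kept in a concurrent map keyed by user (or session) id. The ChatSession class below is a stand-in, since the real one from the Vertex AI SDK needs a live model to start a chat:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChatSessions {
    // Stand-in for com.google.cloud.vertexai.generativeai.ChatSession
    static class ChatSession { }

    private final Map<String, ChatSession> sessions = new ConcurrentHashMap<>();

    // Lazily creates one session per user; computeIfAbsent is atomic per key,
    // so two concurrent requests for the same user share the same session.
    ChatSession sessionFor(String userId) {
        return sessions.computeIfAbsent(userId, id -> new ChatSession());
    }
}
```

In the real endpoint, the factory lambda would call this.model.startChat() instead of new ChatSession(), and the key would come from a session cookie or an authenticated user id. For persistence, the conversation history would also have to be stored outside the JVM.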

That’s all for today: we’ve seen how to create a chatbot in Quarkus with Google Vertex AI and the Gemini model.
Google Vertex AI supports many other features: RAG, multimodal completion, tools, and more. You’ll find further details in its documentation.

You can find the project code on GitHub: quarkus-vertexai-example.

Thanks to Google for providing Google Cloud credits to write this article.
