CharacterEscapes: Jackson’s hidden gem

At Kestra, the data orchestration platform I work for, a user opened an issue (#10326, https://github.com/kestra-io/kestra/issues/10326) reporting a problem with the PostgreSQL database and the Unicode character \u0000: a workflow task that returned this character in its output was failing.

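After investigating, we found that PostgreSQL refuses to store a JSONB value containing this character. \u0000 is the Unicode null character (NUL), and PostgreSQL's text type cannot represent it, so the jsonb type rejects it even though JSON itself allows it as the escape sequence \u0000 inside a string. Note that this character has nothing to do with JSON's null literal, which is written as the four characters null.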
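
To see what actually reaches the database, here is a hypothetical, JDK-only helper (escapeJsonString is my own name, not a Jackson API) that escapes a string the way RFC 8259 prescribes for control characters: NUL becomes the six-character escape \u0000 in the JSON text. As far as I know, this matches Jackson's default behaviour, since its standard escape table marks NUL as ESCAPE_STANDARD, and it is this escape that PostgreSQL's jsonb type rejects.

```java
// Hypothetical, JDK-only helper (not Jackson's code): escapes a string
// following RFC 8259, where control characters U+0000..U+001F must be
// written as six-character hex escapes inside a JSON string.
public class JsonEscapeDemo {
    static String escapeJsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);                    // quote and backslash
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c)); // control characters, including NUL
            } else {
                sb.append(c);                                 // everything else as-is
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String value = "a\u0000b"; // three characters: 'a', NUL, 'b'
        System.out.println(value.length());          // prints 3
        System.out.println(escapeJsonString(value)); // prints the escaped JSON string
    }
}
```

The escaped form is perfectly valid JSON; the failure comes from PostgreSQL's storage layer, not from the JSON syntax.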

This is where a little-known feature of Jackson, the library we use for our JSON serialization layer, comes into play: CharacterEscapes. This class lets us configure how Jackson escapes special characters, so we can replace the Unicode character \u0000 with a custom sequence and avoid failing a workflow task that returns it in its output.

import com.fasterxml.jackson.core.SerializableString;
import com.fasterxml.jackson.core.io.CharacterEscapes;
import com.fasterxml.jackson.core.io.SerializedString;

public class SafeguardCharacterEscapes extends CharacterEscapes { // <1>
    private static final SerializableString NULL = new SerializedString("null"); // <2>

    private final int[] asciiEscapes;

    public SafeguardCharacterEscapes() {
        // <3> Start with the standard JSON escapes (the method returns a copy we can modify)
        asciiEscapes = CharacterEscapes.standardAsciiEscapesForJSON();
        // <4> And then specify that the null character should use a custom escape sequence
        asciiEscapes[0] = CharacterEscapes.ESCAPE_CUSTOM;
    }

    @Override
    public int[] getEscapeCodesForAscii() {
        return asciiEscapes;
    }

    @Override
    public SerializableString getEscapeSequence(int ch) {
        if (ch == 0) {
            return NULL; // <5>
        }
        return null; // <6>
    }
}
  1. We create our own implementation of CharacterEscapes, which will be used when configuring the Jackson ObjectMapper. See below.
  2. We define a pre-computed SerializableString that Jackson writes in place of the character: the four characters null. Because it is written inside the JSON string, the result is text, not the JSON null literal.
  3. We pre-compute the escape codes for the ASCII range. Because these characters are the most common, Jackson handles them on a fast path using a lookup table indexed by character code; we initialize that table with the standard JSON escape codes.
  4. We replace the standard escape code for ASCII character 0 (NUL) with ESCAPE_CUSTOM. NUL is the same character as the Unicode character \u0000, because the first 128 Unicode code points coincide with the ASCII range. Custom sequences are supplied on demand by the getEscapeSequence(int) method.
  5. We return our pre-calculated String as a custom sequence for the character corresponding to the code point 0.
  6. We return null for all others, indicating that there is no other custom escape sequence.
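
To make the roles of getEscapeCodesForAscii() and getEscapeSequence(int) more concrete, here is a hypothetical, JDK-only model of the lookup they feed. It is a simplified sketch, not Jackson's implementation: the table semantics (0 for no escape, -1 for the standard hex escape, -2 for a custom sequence, a positive value for a backslash followed by that character) mirror the CharacterEscapes constants, but the writer loop, the names writeJsonString and customSequence, and the choice to write non-ASCII characters as-is are my own assumptions.

```java
// Hypothetical, JDK-only model of table-driven JSON string escaping.
// It imitates the idea behind CharacterEscapes; it is NOT Jackson's code.
public class EscapeTableSketch {
    static final int ESCAPE_NONE = 0;      // copy the character as-is
    static final int ESCAPE_STANDARD = -1; // six-character hex escape
    static final int ESCAPE_CUSTOM = -2;   // ask customSequence(ch)

    // Fast-path table for the 128 ASCII code points (role of getEscapeCodesForAscii()).
    static final int[] ASCII_ESCAPES = new int[128];
    static {
        for (int c = 0; c < 0x20; c++) {
            ASCII_ESCAPES[c] = ESCAPE_STANDARD; // control characters
        }
        ASCII_ESCAPES['\t'] = 't';        // positive value: backslash + this char
        ASCII_ESCAPES['\n'] = 'n';
        ASCII_ESCAPES['"'] = '"';
        ASCII_ESCAPES['\\'] = '\\';
        ASCII_ESCAPES[0] = ESCAPE_CUSTOM; // override NUL, as SafeguardCharacterEscapes does
    }

    // Role of getEscapeSequence(int): only NUL has a custom sequence.
    static String customSequence(int ch) {
        return ch == 0 ? "null" : null;
    }

    static String writeJsonString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Non-ASCII characters skip the table and are written as-is in this sketch.
            int code = ch < 128 ? ASCII_ESCAPES[ch] : ESCAPE_NONE;
            if (code == ESCAPE_NONE) {
                out.append(ch);
            } else if (code == ESCAPE_STANDARD) {
                out.append(String.format("\\u%04X", (int) ch));
            } else if (code == ESCAPE_CUSTOM) {
                String seq = customSequence(ch);
                // Fall back to the standard escape if no custom sequence is defined.
                out.append(seq != null ? seq : String.format("\\u%04X", (int) ch));
            } else {
                out.append('\\').append((char) code);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(writeJsonString("a\u0000b")); // prints "anullb" (including the quotes)
    }
}
```

With the NUL entry switched to ESCAPE_CUSTOM, a string such as a, NUL, b comes out as "anullb": the replacement is written inside the string, which is why it never turns into a JSON null value.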

As a last step, we configure Jackson’s ObjectMapper to use our custom CharacterEscapes:

ObjectMapper objectMapper = new ObjectMapper();
objectMapper.getFactory().setCharacterEscapes(new SafeguardCharacterEscapes());

And that’s it, no more failing tasks! Thank you, Jackson, and your hidden gem CharacterEscapes!

Finally, after discussion, we decided not to integrate this change into Kestra: if PostgreSQL does not support the character, there is a good reason for it. A task output must be storable as JSON, and \u0000 cannot be stored in a PostgreSQL JSONB value. But even though we didn’t integrate this change, I found the feature interesting enough to share it with you ;).
