CharacterEscapes: Jackson’s hidden gem
At Kestra, the data orchestration platform I work for, a user opened an issue ([#10326](https://github.com/kestra-io/kestra/issues/10326)) reporting a problem with the PostgreSQL database and the Unicode character `\u0000`. A workflow task that returned this character in its output was failing.
After investigation, we found that PostgreSQL refuses to store a JSONB value containing this character because its `text` type cannot represent it. `\u0000` is the Unicode null character, which should not be confused with the JSON null value, written as the literal `null`.
This is where a hidden feature of Jackson, which we use for our JSON serialization layer, comes into play: `CharacterEscapes`.
This class lets us configure how Jackson escapes special characters, so we can escape the Unicode character `\u0000` and avoid crashing a workflow task that returns it in its output.

    import com.fasterxml.jackson.core.SerializableString;
    import com.fasterxml.jackson.core.io.CharacterEscapes;
    import com.fasterxml.jackson.core.io.SerializedString;

    public class SafeguardCharacterEscapes extends CharacterEscapes { // <1>
        private static final SerializableString NULL = new SerializedString("null"); // <2>
        private final int[] asciiEscapes;

        SafeguardCharacterEscapes() {
            // <3> Start with the standard JSON escapes
            asciiEscapes = CharacterEscapes.standardAsciiEscapesForJSON();
            // <4> And then specify that the null character should be escaped
            asciiEscapes[0] = CharacterEscapes.ESCAPE_CUSTOM;
        }

        @Override
        public int[] getEscapeCodesForAscii() {
            return asciiEscapes;
        }

        @Override
        public SerializableString getEscapeSequence(int ch) {
            if (ch == 0) {
                return NULL; // <5>
            }
            return null; // <6>
        }
    }

1. We create our own implementation of `CharacterEscapes`, which will be used when configuring the Jackson `ObjectMapper` (see below).
2. We define a pre-computed `SerializableString` that will be written whenever the character is serialized; it contains the string `null`.
3. We pre-compute the escape codes for the ASCII character range. Since these characters are the most commonly used, they have a fast path through a lookup table, which we initialize here from the standard escape codes.
4. We replace the standard escape code for the ASCII character NUL (code 0), which corresponds to the Unicode character `\u0000` because the first 128 Unicode characters coincide with the ASCII range, with the marker for a custom escape sequence. Custom sequences are resolved dynamically by the `getEscapeSequence(int)` method.
5. We return our pre-computed string as the custom sequence for the character with code point 0.
6. We return `null` for all other characters, indicating that there is no other custom escape sequence.
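
To make the lookup-table idea from the steps above more concrete, here is a minimal, dependency-free sketch of how such a table can drive escaping. It is a simplified model and not Jackson's actual implementation: the class `EscapeTableSketch`, its constants, and its methods are all hypothetical names chosen for illustration, and only a subset of the JSON escapes is handled.

```java
// Simplified model of table-driven escaping; NOT Jackson's real code.
// All names here are hypothetical and chosen only for this illustration.
class EscapeTableSketch {
    static final int NONE = 0;      // write the character as-is
    static final int STANDARD = -1; // write a generic backslash-u escape
    static final int CUSTOM = -2;   // ask customSequence(int) what to write

    // Builds a 128-entry table covering the ASCII range.
    static int[] buildTable() {
        int[] table = new int[128];
        for (int i = 0; i < 32; i++) {
            table[i] = STANDARD; // control characters must be escaped in JSON
        }
        // Positive entries hold the character written after a backslash.
        table['"'] = '"';
        table['\\'] = '\\';
        table['\n'] = 'n';
        table['\r'] = 'r';
        table['\t'] = 't';
        // Override the entry for code 0 with the custom marker.
        table[0] = CUSTOM;
        return table;
    }

    // Returns the custom sequence for a character, or null if there is none.
    static String customSequence(int ch) {
        return ch == 0 ? "null" : null;
    }

    // Escapes the input as a quoted JSON-like string using the table.
    static String escape(String input, int[] table) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int code = c < table.length ? table[c] : NONE;
            if (code == NONE) {
                out.append(c);
            } else if (code == CUSTOM) {
                out.append(customSequence(c));
            } else if (code == STANDARD) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append('\\').append((char) code);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        int[] table = buildTable();
        System.out.println(escape("a\u0000b", table)); // prints "anullb"
    }
}
```

Note what the custom sequence implies: the null character is replaced by the four letters `null` inside the surrounding string, so the original character cannot be recovered when the value is read back.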
As a last step, we configure Jackson’s `ObjectMapper` to use our custom `CharacterEscapes`:

    ObjectMapper objectMapper = new ObjectMapper();
    objectMapper.getFactory().setCharacterEscapes(new SafeguardCharacterEscapes());

And that’s it, no more failing tasks! Thank you, Jackson, and your hidden gem `CharacterEscapes`!
In the end, after discussion, we decided not to integrate this change into Kestra. If PostgreSQL does not support the character, there is a good reason. A task output must be storable as JSON, and `\u0000` cannot be stored in a PostgreSQL JSONB value.
But even though we didn’t integrate this change, I found the feature interesting enough to share it with you ;).