The surprising complexity of .properties files
I was recently working on better support for .properties files in Jar.Tools. Probably every Java developer has worked with this format and considers it a simple one. So did I, until I started implementing it. What follows is a list of the quirks and interesting cases I ran into along the way.
There are three separators (and one of them is whitespace)
Most people think .properties means key=value. In reality:
key=value
key:value
key␠value (one or more spaces or tabs)
All three are valid. That means the following are different lines with the same meaning:
server.port=8080
server.port:8080
server.port 8080
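As a quick sanity check, the three forms can be fed to the JDK's own java.util.Properties parser. This is only a minimal sketch: the class name SeparatorDemo and the helper valueOf are made up for illustration and are not part of Jar.Tools.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SeparatorDemo {
    /** Loads one line of .properties text and returns the value stored under the given key. */
    static String valueOf(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String[] lines = {"server.port=8080", "server.port:8080", "server.port 8080"};
        for (String line : lines) {
            System.out.println(valueOf(line, "server.port")); // prints 8080 each time
        }
    }
}
```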
What I validate
Missing separator: if a non-comment, non-blank line has no =, :, or whitespace separator, that's an error.
Empty key: a line that's just = or : (or only whitespace before the value) is an error for an empty key.
=value # ← error: empty key
:value # ← error: empty key
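The two rules above could be sketched roughly as follows. This is an illustration, not the validator's actual code: LineCheck, Result, and check are hypothetical names, and the sketch assumes the caller has already skipped comments and blank lines and joined continuation lines. Note that java.util.Properties itself is more lenient and accepts a bare key as a key with an empty value, so the missing-separator error is a stricter, linter-style rule.

```java
final class LineCheck {
    enum Result { OK, MISSING_SEPARATOR, EMPTY_KEY }

    private static boolean isBlank(char c) {
        return c == ' ' || c == '\t' || c == '\f';
    }

    /** Classifies one logical line (comments and blank lines are assumed to be filtered out). */
    static Result check(String logicalLine) {
        int start = 0;
        while (start < logicalLine.length() && isBlank(logicalLine.charAt(start))) {
            start++; // leading whitespace is not part of the key
        }
        int i = start;
        while (i < logicalLine.length()) {
            char c = logicalLine.charAt(i);
            if (c == '\\') {
                i += 2; // an escaped character can never act as a separator
                continue;
            }
            if (c == '=' || c == ':' || isBlank(c)) {
                break; // first unescaped separator ends the key
            }
            i++;
        }
        if (i >= logicalLine.length()) {
            return Result.MISSING_SEPARATOR;
        }
        return i == start ? Result.EMPTY_KEY : Result.OK;
    }
}
```

With this sketch, check("=value") and check(":value") report EMPTY_KEY, check("key") reports MISSING_SEPARATOR, and check("key=") is OK.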
What is allowed
Explicit empty values are fine with any separator:
empty.key=
empty.key:
empty.key␠
All three parse as empty.key with an empty string value.
Continuations: odd vs even backslashes, and trailing whitespace
A line ending with a continuation backslash \ joins with the next line. This is where bugs hide:
- Odd number of trailing backslashes → continuation.
- Even number → the last backslash is escaped, so no continuation.
# Continues (odd backslashes at EOL)
sql.query=SELECT * FROM users \
# Does NOT continue (even backslashes at EOL)
literal.backslash=path ends with \\ # value ends with a single '\'
Trailing whitespace matters
A backslash followed by trailing spaces still behaves as a continuation marker in practice. If the file ends right after that whitespace (no next line), it's a broken continuation error.
broken.continuation=this ends with a backslash \␠␠␠
# EOF here → error: "Line ends with continuation backslash but file ended."
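The odd/even rule, the trailing-whitespace tolerance, and the EOF check can be sketched in a few lines. The names ContinuationCheck, endsWithContinuation, and brokenAtEof are hypothetical, and the sketch assumes each string is one physical line without its line terminator. It also does not special-case comment lines, which a real implementation would have to exclude.

```java
import java.util.List;

final class ContinuationCheck {
    /** True when the line ends in an odd number of backslashes, ignoring trailing spaces and tabs. */
    static boolean endsWithContinuation(String physicalLine) {
        int end = physicalLine.length();
        while (end > 0 && (physicalLine.charAt(end - 1) == ' ' || physicalLine.charAt(end - 1) == '\t')) {
            end--; // tolerate whitespace after the backslash, as described above
        }
        int backslashes = 0;
        while (end - backslashes > 0 && physicalLine.charAt(end - backslashes - 1) == '\\') {
            backslashes++;
        }
        return backslashes % 2 == 1; // an even count means every backslash is itself escaped
    }

    /** True when the last physical line asks for a continuation that never arrives. */
    static boolean brokenAtEof(List<String> physicalLines) {
        return !physicalLines.isEmpty()
                && endsWithContinuation(physicalLines.get(physicalLines.size() - 1));
    }
}
```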
Multiline values done right
sql.query=SELECT id, name, email \
FROM users \
WHERE active = true \
ORDER BY name
When parsed, this becomes a single value:
SELECT id, name, email FROM users WHERE active = true ORDER BY name
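For comparison, here is how java.util.Properties joins the same value. The sketch assumes the continuation lines are indented (the indentation is my assumption; the parser drops leading whitespace on continuation lines either way), and MultilineDemo and load are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class MultilineDemo {
    /** Parses .properties text and returns the value stored under the given key. */
    static String load(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "sql.query=SELECT id, name, email \\",
                "    FROM users \\",
                "    WHERE active = true \\",
                "    ORDER BY name");
        // The space before each backslash is kept; the indentation of the next line is dropped.
        System.out.println(load(text, "sql.query"));
    }
}
```

The space that separates the joined fragments comes from the space before each backslash, so removing it would glue "email" and "FROM" together.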
Duplicates are subtle (case-sensitive keys)
I treat keys as case-sensitive and flag all occurrences when the same key appears multiple times:
duplicate.key=first
duplicate.key=second
duplicate.key=third
All three lines receive a warning that lists the line number of every occurrence (e.g., "Duplicate key 'duplicate.key' found at: line 2, line 5, line 8"). By contrast:
myKey=one
MyKey=two
myKey=three
Only the two myKey entries get flagged; MyKey is distinct.
Why warn and not error? Real configs sometimes rely on "last one wins," but it's almost never intentional. A warning keeps you honest without breaking builds.
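One way to collect every occurrence of a duplicated key is a map from key to line numbers. This is a sketch under a simplifying assumption: the input list holds one key per line, so index + 1 is the line number. DuplicateKeys and findDuplicates are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateKeys {
    /** Maps each key that occurs more than once to the 1-based line numbers of all its occurrences. */
    static Map<String, List<Integer>> findDuplicates(List<String> keysInFileOrder) {
        Map<String, List<Integer>> linesByKey = new LinkedHashMap<>();
        for (int i = 0; i < keysInFileOrder.size(); i++) {
            // String keys compare case-sensitively, so myKey and MyKey stay separate entries.
            linesByKey.computeIfAbsent(keysInFileOrder.get(i), k -> new ArrayList<>()).add(i + 1);
        }
        linesByKey.values().removeIf(lines -> lines.size() < 2); // keep only real duplicates
        return linesByKey;
    }
}
```

For the keys myKey, MyKey, myKey this returns a single entry, myKey with lines 1 and 3, which is enough to attach the same warning to every occurrence.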
Unicode: \uXXXX escapes, surrogate pairs, and "garbage-in" behavior
Properties files support \uXXXX escapes. That opens a whole can of Unicode worms: invalid lengths, non-hex digits, surrogate pairs for emoji, and "unknown" escapes.
Invalid escape sequences
Things like \u123 or \u12G4 show up in the wild. I parse them gracefully (no exceptions) and keep values as close as possible to what the user typed. The validator focuses on not crashing; it doesn't over-correct malformed text.
Surrogate pairs for emoji
Escaped emoji like \uD83D\uDE80 (🚀) decode correctly. In UTF-8 mode I emit a warning ("Unicode escape sequence detected") because direct Unicode is usually clearer. In ISO-8859-1 mode, escapes are often necessary, so I emit no warning.
Standard escapes "just work"
The usual suspects decode as expected:
- \t, \n, \r, \f, \\
- escaped separators and specials: \␠, \:, \=, \#, \!
Unknown single-letter escapes like \q or \z are treated literally (the backslash disappears, the letter stays). Again: avoid surprising the user.
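The JDK's java.util.Properties makes a handy reference point for these rules. In the sketch below (EscapeDemo and decode are illustrative names), standard escapes, unknown escapes, and surrogate pairs decode as described above, while a malformed Unicode escape makes load throw IllegalArgumentException. That is exactly the crash a validator wants to avoid.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class EscapeDemo {
    /** Decodes a raw value by letting java.util.Properties parse a one-line file. */
    static String decode(String rawValue) {
        Properties props = new Properties();
        try {
            props.load(new StringReader("k=" + rawValue));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props.getProperty("k");
    }

    public static void main(String[] args) {
        System.out.println(decode("\\q"));                                       // prints q
        System.out.println(decode("\\:\\=\\#\\!"));                              // prints :=#!
        System.out.println(decode("\\uD83D\\uDE80").codePointAt(0) == 0x1F680);  // prints true
        try {
            decode("\\u12G4"); // non-hex digit inside a Unicode escape
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // the reference parser refuses malformed escapes
        }
    }
}
```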
Encoding modes: UTF-8 vs ISO-8859-1
Historically, Java treated .properties as Latin-1 (ISO-8859-1), with \uXXXX for anything beyond that range. Many modern tools use UTF-8. To make intent explicit, I let the validator run in either mode.
ISO-8859-1 mode
- Error on characters outside Latin-1.
unicode.chinese=你好世界 # error (outside ISO-8859-1)
unicode.emoji=🚀🚀 # error
valid.iso=café # fine (é is Latin-1)
- \uXXXX for Latin-1 letters like \u00e9 (é) is allowed and not warned.
UTF-8 mode
- Direct Unicode is preferred and not warned.
- \uXXXX escapes are warned as unnecessary (but still decoded). That includes escapes for ASCII: \u0041 → "A" with a warning.
Pick the mode that matches your runtime, and you'll get the right balance of errors vs. guidance.
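The two modes boil down to two small checks on the raw value text. The sketch below is only an approximation of that idea: EncodingCheck, Mode, and check are hypothetical names, and the error message text is invented for illustration (only the warning text is quoted from above).

```java
import java.util.ArrayList;
import java.util.List;

final class EncodingCheck {
    enum Mode { UTF_8, ISO_8859_1 }

    /** Latin-1 covers exactly the code points 0 to 255. */
    static boolean fitsLatin1(String raw) {
        for (int i = 0; i < raw.length(); i++) {
            if (raw.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** True when the raw text contains an unescaped backslash directly followed by the letter u. */
    static boolean containsUnicodeEscape(String raw) {
        for (int i = 0; i + 1 < raw.length(); i++) {
            if (raw.charAt(i) == '\\') {
                if (raw.charAt(i + 1) == 'u') {
                    return true;
                }
                i++; // skip the escaped character so an escaped backslash is not misread
            }
        }
        return false;
    }

    /** Returns the findings for one raw (undecoded) value in the given mode. */
    static List<String> check(String raw, Mode mode) {
        List<String> findings = new ArrayList<>();
        if (mode == Mode.ISO_8859_1 && !fitsLatin1(raw)) {
            findings.add("error: character outside ISO-8859-1"); // message text is made up
        }
        if (mode == Mode.UTF_8 && containsUnicodeEscape(raw)) {
            findings.add("warning: Unicode escape sequence detected");
        }
        return findings;
    }
}
```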
Comments and structure: preserve intent, don't rewrite history
Lines starting with # or ! are comments. During validation, I:
- Attach leading comments to the next property as leadingComments.
- Keep raw text for each entry exactly as read.
- Do not escape or normalize anything during validation.
During formatting, I:
- Preserve comments as-is.
- Add a consistent key = value spacing.
- Escape =, :, and spaces inside values so the output remains parsable:
# original
key=value with = and : chars
# formatted
key = value with \= and \: chars
This "no touching during validation" rule prevents a whole class of "the linter changed my config" surprises.
Lines that look empty... but aren't
A sneaky category:
- A line that's only = or : → empty key error.
- A line that's key␠␠␠ → a valid key with an explicit empty value (whitespace is the separator).
- Whitespace around separators with empty values is fine:
key1 =
key2:
key3␠␠␠
A practical checklist (aka mini-linter rules)
- Flag lines with no =, :, or whitespace separator (error).
- Flag empty keys (error) but allow explicit empty values.
- Handle continuation logic: odd vs even trailing backslashes; treat trailing whitespace after a continuation backslash as continuation; error if EOF cuts it off.
- Treat keys as case-sensitive; warn on duplicates and list all occurrences.
- Decode standard escapes; treat unknown escapes literally without crashing.
- Support UTF-8 and ISO-8859-1 modes:
  - UTF-8: warn on \uXXXX as unnecessary.
  - ISO-8859-1: error on out-of-range chars; allow \uXXXX freely.
- Keep validation read-only; do formatting in a separate step.
- Preserve comments and attach them to following entries for context.
- Represent multiline values as a single logical value; track start/end lines for tooling.
Closing thoughts
I was planning to be done with .properties validation in a few days, tops. After a week of debugging I realized that, even though the format looks simple, real-world files mix legacy encoding rules, permissive separators, escape sequences, and multiline values. I will not touch this format again :)