The surprising complexity of .properties files
I was recently working on better support for .properties files in Jar.Tools. Probably every Java developer has worked with this format and knows it as a simple one. So did I, until I started implementing it. This is a list of the quirks and interesting cases I ran into along the way.
There are three separators (and one of them is whitespace)
Most people think .properties means key=value. In reality, there are three forms:
key=value
key:value
key␠value (␠ is one or more spaces or tabs)
All three are valid. That means the following are different lines with the same meaning:
server.port=8080
server.port:8080
server.port 8080
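A quick way to check the equivalence is to feed each form to java.util.Properties and compare the results. This is only a small demonstration; the class name SeparatorDemo and the helper portFrom are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class SeparatorDemo {
    /** Loads one line with java.util.Properties and returns the value stored under server.port. */
    static String portFrom(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new UncheckedIOException(e);
        }
        return props.getProperty("server.port");
    }

    public static void main(String[] args) {
        System.out.println(portFrom("server.port=8080"));
        System.out.println(portFrom("server.port:8080"));
        System.out.println(portFrom("server.port 8080"));
    }
}
```

Each call should return the same string, 8080, regardless of which separator the line uses.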
What I validate
- Missing separator: if a non-comment, non-blank line has no =, :, or whitespace separator, that's an error.
- Empty key: a line that's just = or : (or just whitespace before the value) is an error for an empty key.
=value    # error: empty key
:value    # error: empty key
What is allowed
Explicit empty values are fine with any separator:
empty.key=
empty.key:
empty.key␠
All three parse as empty.key with an empty string value.
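The rules above can be sketched as a small splitter. This is a minimal illustration, not the Jar.Tools implementation: the class LineSplitter, the method split, and the error messages are hypothetical. It assumes the caller has already skipped comments and blank lines, and it ignores leading whitespace the way java.util.Properties does, which may differ from the validator described here.

```java
class LineSplitter {
    /** Returns {key, value} with escapes left untouched; throws for the two error cases. */
    static String[] split(String line) {
        int n = line.length();
        int start = 0;
        while (start < n && isSpace(line.charAt(start))) start++; // ignore indentation

        // The key ends at the first unescaped '=', ':' or whitespace character.
        int i = start;
        while (i < n) {
            char c = line.charAt(i);
            if (c == '\\') { i += 2; continue; } // skip the escaped character
            if (c == '=' || c == ':' || isSpace(c)) break;
            i++;
        }
        if (i >= n) throw new IllegalArgumentException("missing separator");

        String key = line.substring(start, i);
        if (key.isEmpty()) throw new IllegalArgumentException("empty key");

        // Skip whitespace, then at most one '=' or ':', then whitespace again.
        int j = i;
        while (j < n && isSpace(line.charAt(j))) j++;
        if (j < n && (line.charAt(j) == '=' || line.charAt(j) == ':')) {
            j++;
            while (j < n && isSpace(line.charAt(j))) j++;
        }
        return new String[] { key, line.substring(j) };
    }

    static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\f';
    }
}
```

With this sketch, a line such as "empty.key " (key followed by a space) yields an empty value, while "=value" is rejected as an empty key and "server.port" on its own is rejected as a missing separator.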
Continuations: odd vs even backslashes, and trailing whitespace
A line ending with a continuation backslash \ joins with the next line. This is where bugs hide:
- Odd number of trailing backslashes → continuation.
- Even number → the last backslash is escaped, so no continuation.
# Continues (odd backslashes at EOL)
sql.query=SELECT * FROM users \
# Does NOT continue (even backslashes at EOL); the value ends with a single '\'
literal.backslash=path ends with \\
Trailing whitespace matters
A backslash followed by trailing spaces still behaves as a continuation marker in practice. If the file ends right after that whitespace (no next line), it's a broken continuation error.
broken.continuation=this ends with a backslash \␠␠␠
# EOF here → error: "Line ends with continuation backslash but file ended."
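The odd/even rule and the trailing-whitespace behavior can be expressed in a few lines. This is a sketch under the assumptions stated in this section; the class Continuation and the method continues are invented names. Note that ignoring trailing whitespace is the lenient behavior described here; as far as I know, java.util.Properties itself only treats a backslash directly before the line terminator as a continuation.

```java
class Continuation {
    /**
     * True when the line ends with an odd number of backslashes,
     * ignoring any spaces or tabs that follow them.
     */
    static boolean continues(String line) {
        int end = line.length();
        // Lenient step: drop trailing spaces and tabs before counting.
        while (end > 0 && (line.charAt(end - 1) == ' ' || line.charAt(end - 1) == '\t')) {
            end--;
        }
        int backslashes = 0;
        while (end - backslashes > 0 && line.charAt(end - backslashes - 1) == '\\') {
            backslashes++;
        }
        // Pairs of backslashes are escaped backslashes; a leftover one continues the line.
        return backslashes % 2 == 1;
    }
}
```

A caller that sees continues(lastLine) return true with no following line can then report the broken continuation.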
Multiline values done right
sql.query=SELECT id, name, email \
FROM users \
WHERE active = true \
ORDER BY name
When parsed, this becomes a single value:
SELECT id, name, email FROM users WHERE active = true ORDER BY name
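Joining physical lines into one logical line could look like the sketch below. It is an illustration only: LogicalLines and its methods are hypothetical names, the input is assumed to contain no comments or blank lines, and only a backslash at the very end of a line counts as a continuation here.

```java
import java.util.ArrayList;
import java.util.List;

class LogicalLines {
    /** Joins physical lines into logical lines, dropping continuation backslashes. */
    static List<String> join(List<String> physical) {
        List<String> logical = new ArrayList<>();
        StringBuilder current = null;
        for (String line : physical) {
            // Leading whitespace on a continuation line is not part of the value.
            String part = (current == null) ? line : stripLeading(line);
            boolean continues = oddTrailingBackslashes(part);
            if (continues) {
                part = part.substring(0, part.length() - 1); // drop the continuation backslash
            }
            if (current == null) {
                current = new StringBuilder();
            }
            current.append(part);
            if (!continues) {
                logical.add(current.toString());
                current = null;
            }
        }
        if (current != null) {
            throw new IllegalArgumentException("Line ends with continuation backslash but file ended.");
        }
        return logical;
    }

    static String stripLeading(String s) {
        int i = 0;
        while (i < s.length() && (s.charAt(i) == ' ' || s.charAt(i) == '\t' || s.charAt(i) == '\f')) i++;
        return s.substring(i);
    }

    static boolean oddTrailingBackslashes(String s) {
        int count = 0;
        while (count < s.length() && s.charAt(s.length() - 1 - count) == '\\') count++;
        return count % 2 == 1;
    }
}
```

The space before each backslash in the SQL example is what keeps the words apart after joining; the indentation of the continuation lines is discarded.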
Duplicates are subtle (case-sensitive keys)
I treat keys as case-sensitive and flag all occurrences when the same key appears multiple times:
duplicate.key=first
duplicate.key=second
duplicate.key=third
All three lines receive a warning that includes the index of every occurrence (e.g., "Duplicate key 'duplicate.key' found at: line 2, line 5, line 8"). By contrast:
myKey=one
MyKey=two
myKey=three
Only the two myKey entries get flagged; MyKey is distinct.
Why warn and not error? Real configs sometimes rely on "last one wins," but it's almost never intentional. A warning keeps you honest without breaking builds.
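One way to collect every occurrence of a duplicated key is a map from key to line numbers. The sketch below is an assumption about how such a check could be written, not the actual validator; DuplicateKeys, find, and message are hypothetical names, and the parallel arrays stand in for whatever entry model the parser produces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateKeys {
    /** Maps each key that occurs more than once to all of its line numbers (case-sensitive). */
    static Map<String, List<Integer>> find(String[] keys, int[] lines) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            seen.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(lines[i]);
        }
        // Keep only keys with two or more occurrences.
        seen.values().removeIf(occurrences -> occurrences.size() < 2);
        return seen;
    }

    /** Builds the warning text in the style quoted above. */
    static String message(String key, List<Integer> lines) {
        StringBuilder sb = new StringBuilder("Duplicate key '" + key + "' found at: ");
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append("line ").append(lines.get(i));
        }
        return sb.toString();
    }
}
```

Because String keys compare case-sensitively, myKey and MyKey land in different map entries without any extra work.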
Unicode: \uXXXX escapes, surrogate pairs, and "garbage-in" behavior
Properties files support \uXXXX escapes. That opens a whole can of Unicode worms: invalid lengths, non-hex digits, surrogate pairs for emoji, and "unknown" escapes.
Invalid escape sequences
Things like \u123 or \u12G4 show up in the wild. I parse them gracefully (no exceptions) and keep values as close as possible to what the user typed. The validator focuses on not crashing; it doesn't over-correct malformed text.
Surrogate pairs for emoji
Escaped emoji like \uD83D\uDE80 (🚀) decode correctly. In UTF-8 mode I emit a warning ("Unicode escape sequence detected") because direct Unicode is usually clearer. In ISO-8859-1 mode, escapes are often necessary, so I emit no warning.
Standard escapes "just work"
The usual suspects decode as expected:
- \t, \n, \r, \f, \\
- escaped separators and specials: \ (backslash followed by a space), \:, \=, \#, \!
Unknown single-letter escapes like \q or \z are treated literally (the backslash disappears, the letter stays). Again: avoid surprising the user.
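The escape rules in this section fit into one small decoder. The following is a minimal sketch, assuming malformed \uXXXX sequences are kept exactly as typed; Unescaper and its methods are hypothetical names, not the validator's real API.

```java
class Unescaper {
    /** Decodes escapes in a raw key or value without ever throwing on malformed input. */
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i++);
            if (c != '\\' || i == raw.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char e = raw.charAt(i++);
            switch (e) {
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 <= raw.length() && isHex4(raw, i)) {
                        // Each escape is one UTF-16 unit, so surrogate pairs decode naturally.
                        out.append((char) Integer.parseInt(raw.substring(i, i + 4), 16));
                        i += 4;
                    } else {
                        out.append('\\').append('u'); // malformed: keep what the user typed
                    }
                    break;
                default:
                    out.append(e); // escaped separators, specials, and unknown letters
            }
        }
        return out.toString();
    }

    static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return false;
        }
        return true;
    }
}
```

Since a surrogate pair is just two consecutive escapes, the emoji case needs no special handling: the two decoded char values end up next to each other in the output string.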
Encoding modes: UTF-8 vs ISO-8859-1
Historically, Java treated .properties as Latin-1 (ISO-8859-1), with \uXXXX for anything beyond that range. Many modern tools use UTF-8. To make intent explicit, I let the validator run in either mode.
ISO-8859-1 mode
Error on characters outside Latin-1.
unicode.chinese=你好世界    # error (outside ISO-8859-1)
unicode.emoji=🚀🚀    # error
valid.iso=café    # fine (é is Latin-1)
\uXXXX for Latin-1 letters like \u00e9 (é) is allowed and not warned.
UTF-8 mode
- Direct Unicode is preferred and not warned.
- \uXXXX escapes are warned as unnecessary (but still decoded). That includes escapes for ASCII: \u0041 → "A" with a warning.
Pick the mode that matches your runtime, and you'll get the right balance of errors vs. guidance.
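The two modes boil down to two different checks on the raw text. Here is a small sketch of that decision, with the caveat that EncodingCheck, Mode, and the string results are invented for illustration. It relies on CharsetEncoder.canEncode from the standard library to detect characters outside Latin-1.

```java
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    enum Mode { UTF_8, ISO_8859_1 }

    /** Returns "error", "warning", or "ok" for one raw (still escaped) value. */
    static String check(String rawValue, Mode mode) {
        if (mode == Mode.ISO_8859_1) {
            // Escapes are plain ASCII, so only direct non-Latin-1 characters fail here.
            boolean fits = StandardCharsets.ISO_8859_1.newEncoder().canEncode(rawValue);
            return fits ? "ok" : "error";
        }
        // UTF-8 mode: direct Unicode is fine, escapes are flagged as unnecessary.
        return hasUnicodeEscape(rawValue) ? "warning" : "ok";
    }

    /** True when the text contains a backslash-u that is not itself escaped. */
    static boolean hasUnicodeEscape(String s) {
        for (int i = 0; i + 1 < s.length(); i++) {
            if (s.charAt(i) == '\\') {
                if (s.charAt(i + 1) == 'u') return true;
                i++; // skip the escaped character, e.g. the second backslash of a pair
            }
        }
        return false;
    }
}
```

Running the check on the raw text rather than the decoded value is what lets ISO-8859-1 mode accept escapes for characters it would reject when written directly.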
Comments and structure: preserve intent, don't rewrite history
Lines starting with # or ! are comments. During validation, I:
- Attach leading comments to the next property as leadingComments.
- Keep raw text for each entry exactly as read.
- Do not escape or normalize anything during validation.
During formatting, I:
- Preserve comments as-is.
- Add a consistent key = value spacing.
- Escape =, :, and spaces inside values so the output remains parsable:
# original
key=value with = and : chars
# formatted
key = value with \= and \: chars
This "no touching during validation" rule prevents a whole class of "the linter changed my config" surprises.
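Attaching comments to the entry that follows them can be modeled with a tiny read-only structure. This is a sketch only: the names raw and leadingComments come from the text above, while CommentAttacher, Entry, and the line field are assumptions. It also assumes blank lines are simply skipped and that values do not span multiple lines.

```java
import java.util.ArrayList;
import java.util.List;

class CommentAttacher {
    static final class Entry {
        final String raw;                   // the line exactly as read, never normalized
        final List<String> leadingComments; // comment lines that appeared above it
        final int line;                     // 1-based line number of the entry

        Entry(String raw, List<String> leadingComments, int line) {
            this.raw = raw;
            this.leadingComments = leadingComments;
            this.line = line;
        }
    }

    /** Walks the file once and attaches pending comments to the next property line. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String raw = lines.get(i);
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines do not detach comments
            }
            if (trimmed.startsWith("#") || trimmed.startsWith("!")) {
                pending.add(raw);
                continue;
            }
            entries.add(new Entry(raw, new ArrayList<>(pending), i + 1));
            pending.clear();
        }
        return entries;
    }
}
```

Nothing in parse rewrites the input, so a formatter can later rebuild the file from raw and leadingComments without losing what the author wrote.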
Lines that look empty… but aren't
A sneaky category:
- A line that's only = or : → empty key error.
- A line that's key␠␠␠ → a valid key with an explicit empty value (whitespace is the separator).
- Whitespace around separators with empty values is fine:
key1 =
key2:
key3␠␠␠
A practical checklist (aka mini-linter rules)
- Flag lines with no =, :, or whitespace separator (error).
- Flag empty keys (error) but allow explicit empty values.
- Handle continuation logic: odd vs even trailing backslashes; treat trailing whitespace after a continuation backslash as continuation; error if EOF cuts it off.
- Treat keys as case-sensitive; warn on duplicates and list all occurrences.
- Decode standard escapes; treat unknown escapes literally without crashing.
- Support UTF-8 and ISO-8859-1 modes:
  - UTF-8: warn on \uXXXX as unnecessary.
  - ISO-8859-1: error on out-of-range chars; allow \uXXXX freely.
- Keep validation read-only; do formatting in a separate step.
- Preserve comments and attach them to following entries for context.
- Represent multiline values as a single logical value; track start/end lines for tooling.
Closing thoughts
I was planning to be done with .properties file validation in a few days, tops. After a week of debugging, I realized that even though the format looks simple, real-world files mix legacy encoding rules, permissive separators, escape sequences, and multiline values. I will not touch this format again :)