The surprising complexity of .properties files
I was recently working on better support for .properties files in Jar.Tools. Probably every Java developer has worked with this format and assumes it is a simple one. So did I, until I started implementing it. This is a list of the quirks and interesting cases I ran into along the way.
There are three separators (and one of them is whitespace)
Most people think .properties means key=value. In reality:
- key=value
- key:value
- key value (one or more spaces or tabs)
All three are valid. That means the following are different lines with the same meaning:
server.port=8080
server.port:8080
server.port    8080
What I validate
- Missing separator: if a non-comment, non-blank line has no =, :, or whitespace separator, that's an error.
- Empty key: a line that starts with = or : (or has only whitespace before the value) has an empty key, which is an error.
# error: empty key
=value
# error: empty key
:value
What is allowed
- Explicit empty values are fine with any separator:
empty.key=
empty.key:
empty.key 
All three parse as empty.key with an empty string value (the last line ends with whitespace, which acts as the separator).
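The separator rules above can be sketched as a small splitting function. This is a minimal illustration in Python, not the Jar.Tools implementation: the name split_key_value and the error strings are hypothetical, and it assumes the input is one logical line that is neither blank nor a comment, with leading whitespace ignored.

```python
def split_key_value(line):
    """Split one logical line into (raw_key, raw_value) or raise ValueError.

    Assumption: blank lines and comments were filtered out by the caller.
    Escapes are left untouched; decoding is a separate step.
    """
    text = line.lstrip(" \t\f")
    i = 0
    # Scan the key: stop at the first unescaped '=', ':' or whitespace.
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. '\=' or '\ '
            continue
        if ch in "=: \t\f":
            break
        i += 1
    if i >= len(text):
        raise ValueError("missing separator")
    key = text[:i]
    # Skip whitespace, then at most one '=' or ':', then whitespace again.
    j = i
    while j < len(text) and text[j] in " \t\f":
        j += 1
    if j < len(text) and text[j] in "=:":
        j += 1
        while j < len(text) and text[j] in " \t\f":
            j += 1
    if not key:
        raise ValueError("empty key")
    return key, text[j:]
```

With this sketch, "server.port=8080", "server.port:8080" and "server.port    8080" all split into the same pair, while "=value" is rejected for its empty key.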
Continuations: odd vs even backslashes, and trailing whitespace
A line ending with a continuation backslash \ joins with the next line. This is where bugs hide:
- Odd number of trailing backslashes → continuation.
- Even number → the last backslash is escaped, so no continuation.
# Continues (odd number of backslashes at EOL)
sql.query=SELECT * FROM users \
    WHERE active = true

# Does NOT continue (even number of backslashes at EOL): the value ends with a single '\'
literal.backslash=path ends with \\
Trailing whitespace matters
In my validator, a backslash followed by trailing spaces still counts as a continuation marker (java.util.Properties itself is stricter and only continues a line when the backslash is the last character on it). If the file ends right after that whitespace (no next line), it's a broken continuation error.
broken.continuation=this ends with a backslash \   
# EOF here → error: "Line ends with continuation backslash but file ended."
Multiline values done right
sql.query=SELECT id, name, email \
    FROM users \
    WHERE active = true \
    ORDER BY name
When parsed, this becomes a single value:
SELECT id, name, email FROM users WHERE active = true ORDER BY name
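The continuation rules can be sketched roughly as follows. This is an assumption-laden Python illustration rather than the actual validator: the function names are hypothetical, trailing spaces and tabs after the backslash are tolerated as described above, and leading whitespace on continuation lines is dropped.

```python
def ends_with_continuation(line):
    """True when the line ends with an odd number of backslashes,
    ignoring trailing spaces and tabs (the lenient rule described above)."""
    body = line.rstrip(" \t")
    trailing = len(body) - len(body.rstrip("\\"))
    return trailing % 2 == 1


def is_comment(line):
    return line.lstrip(" \t\f").startswith(("#", "!"))


def join_logical_lines(lines):
    """Merge physical lines into logical lines; comments pass through as-is."""
    logical = []
    parts = []  # pieces of the logical line currently being assembled
    for line in lines:
        if not parts and is_comment(line):
            logical.append(line)  # a comment never starts a continuation
            continue
        # Leading whitespace of a continuation line is not part of the value.
        text = line.lstrip(" \t\f") if parts else line
        if ends_with_continuation(text):
            parts.append(text.rstrip(" \t")[:-1])  # drop the marker itself
        else:
            parts.append(text)
            logical.append("".join(parts))
            parts = []
    if parts:
        raise ValueError("Line ends with continuation backslash but file ended.")
    return logical
```

Fed the four physical lines of the sql.query example, the sketch returns one logical line whose value is the single-line SELECT shown above.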
Duplicates are subtle (case-sensitive keys)
I treat keys as case-sensitive and flag all occurrences when the same key appears multiple times:
duplicate.key=first
duplicate.key=second
duplicate.key=third
All three lines receive a warning that lists the line number of every occurrence (e.g., "Duplicate key 'duplicate.key' found at: line 2, line 5, line 8"). By contrast:
myKey=one
MyKey=two
myKey=three
Only the two myKey entries get flagged; MyKey is distinct.
Why warn and not error? Real configs sometimes rely on "last one wins," but a duplicate is almost never intentional. A warning keeps you honest without breaking builds.
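Duplicate detection along these lines fits in a few lines of code. The sketch below is a Python illustration under assumptions: find_duplicate_keys is a hypothetical name, the input is a list of (line number, key) pairs produced by an earlier parsing step, and the message wording follows the example quoted above.

```python
from collections import defaultdict


def find_duplicate_keys(entries):
    """entries: iterable of (line_number, key) pairs.

    Keys are compared exactly, so 'myKey' and 'MyKey' are different keys.
    Returns one warning message per duplicated key.
    """
    seen = defaultdict(list)
    for line_number, key in entries:
        seen[key].append(line_number)
    warnings = []
    for key, line_numbers in seen.items():
        if len(line_numbers) > 1:
            where = ", ".join(f"line {n}" for n in line_numbers)
            warnings.append(f"Duplicate key '{key}' found at: {where}")
    return warnings
```

Because every occurrence is recorded before any message is built, a tool can attach the same warning to each of the listed lines.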
Unicode: \uXXXX escapes, surrogate pairs, and "garbage-in" behavior
Properties files support \uXXXX escapes. That opens a whole Unicode can of worms: invalid lengths, non-hex digits, surrogate pairs for emoji, and "unknown" escapes.
Invalid escape sequences
Things like \u123 or \u12G4 show up in the wild. I parse them gracefully, without throwing, and keep values as close as possible to what the user typed. The validator focuses on not crashing; it doesn't over-correct malformed text.
Surrogate pairs for emoji
Escaped emoji like \uD83D\uDE80 (🚀) decode correctly. In UTF-8 mode I emit a warning ("Unicode escape sequence detected") because direct Unicode is usually clearer. In ISO-8859-1 mode, escapes are often necessary, so I emit no warning.
Standard escapes "just work"
The usual suspects decode as expected:
- \t, \n, \r, \f, \\
- escaped separators and specials: \ (escaped space), \:, \=, \#, \!
Unknown single-letter escapes like \q or \z are treated literally (the backslash disappears, the letter stays). Again: avoid surprising the user.
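A lenient decoder following these rules might look like the sketch below. It is a Python illustration, not the validator's code: decode_escapes is a hypothetical name, and keeping a malformed \u sequence exactly as typed is one reading of "as close as possible to what the user typed".

```python
import re

SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}
HEX4 = re.compile(r"[0-9a-fA-F]{4}")


def decode_escapes(raw):
    """Decode escapes in a raw key or value without ever raising."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\" or i + 1 >= len(raw):
            out.append(ch)  # ordinary character, or a lone trailing backslash
            i += 1
            continue
        nxt = raw[i + 1]
        if nxt == "u":
            digits = raw[i + 2:i + 6]
            if HEX4.fullmatch(digits):
                out.append(chr(int(digits, 16)))
                i += 6
            else:
                out.append("\\u")  # malformed: keep the text as typed
                i += 2
        elif nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            out.append(nxt)  # '\:', '\=', '\\', '\q': drop the backslash
            i += 2
    text = "".join(out)
    # Merge \uD83D\uDE80-style surrogate pairs into one code point;
    # lone surrogates survive the round trip unchanged.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")
```

The UTF-16 round trip at the end is only there because Python strings hold code points, so the two halves of a surrogate pair have to be merged explicitly.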
Encoding modes: UTF-8 vs ISO-8859-1
Historically, Java treated .properties as Latin-1 (ISO-8859-1), with \uXXXX for anything beyond that range. Many modern tools use UTF-8. To make intent explicit, I let the validator run in either mode.
ISO-8859-1 mode
- Error on characters outside Latin-1.
# error (outside ISO-8859-1)
unicode.chinese=你好世界
# error
unicode.emoji=🚀
# fine (é is Latin-1)
valid.iso=café
- \uXXXX for Latin-1 letters like \u00e9 (é) is allowed and not warned.
UTF-8 mode
- Direct Unicode is preferred and not warned.
- \uXXXX escapes are warned as unnecessary (but still decoded). That includes escapes for ASCII: \u0041 → "A" with a warning.
Pick the mode that matches your runtime, and you'll get the right balance of errors vs. guidance.
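The two modes can be sketched as a per-line check. This is a hedged Python illustration: check_encoding, the mode strings and the error wording are assumptions, while the warning text is the one quoted earlier. The regular expression skips \u sequences whose backslash is itself escaped.

```python
import re

# A \uXXXX escape whose backslash is not itself escaped by a preceding one.
UNICODE_ESCAPE = re.compile(r"(?<!\\)(?:\\\\)*\\u[0-9a-fA-F]{4}")


def check_encoding(raw_line, mode):
    """Return a list of (severity, message) findings for one raw line."""
    findings = []
    if mode == "iso-8859-1":
        for ch in raw_line:
            if ord(ch) > 0xFF:  # Latin-1 covers U+0000 through U+00FF
                findings.append(
                    ("error", f"Character U+{ord(ch):04X} is outside ISO-8859-1")
                )
    elif mode == "utf-8":
        if UNICODE_ESCAPE.search(raw_line):
            findings.append(("warning", "Unicode escape sequence detected"))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return findings
```

Running the check on the raw line, before any decoding, is what lets it tell a typed é from an escaped \u00e9.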
Comments and structure: preserve intent, don't rewrite history
Lines starting with # or ! are comments. During validation, I:
- Attach leading comments to the next property as leadingComments.
- Keep the raw text of each entry exactly as read.
- Do not escape or normalize anything during validation.
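One way to picture the validation-time model is the sketch below. It is a Python illustration with hypothetical names (Entry, attach_comments), and it makes assumptions the list above does not settle: input lines are already joined logical lines, blank lines are skipped without detaching pending comments, and comments with no following property are returned separately.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    raw: str   # the entry exactly as read, never normalized
    line: int  # 1-based line number of the entry
    leading_comments: list = field(default_factory=list)


def attach_comments(lines):
    """Return (entries, trailing_comments) for a list of logical lines."""
    entries = []
    pending = []  # comments waiting for the next property
    for number, raw in enumerate(lines, start=1):
        stripped = raw.lstrip(" \t\f")
        if not stripped:
            continue  # blank line (assumption: pending comments are kept)
        if stripped.startswith(("#", "!")):
            pending.append(raw)
            continue
        entries.append(Entry(raw=raw, line=number, leading_comments=pending))
        pending = []
    return entries, pending
```

Nothing in this step rewrites the text: the raw field is the input line itself, which is what keeps validation read-only.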
During formatting, I:
- Preserve comments as-is.
- Add a consistent key = value spacing.
- Escape =, :, and spaces inside values so the output remains parsable:
# original
key=value with = and : chars
# formatted
key = value with \= and \: chars
This "no touching during validation" rule prevents a whole class of "the linter changed my config" surprises.
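The formatting step can be sketched as follows. This is a Python illustration under assumptions: escape_value and format_entry are hypothetical names, the input is the raw (still escaped) value, and the sketch follows the formatted example above, which escapes = and : but leaves interior spaces alone.

```python
def escape_value(raw_value):
    """Escape unescaped '=' and ':' in a raw value.

    Existing escape sequences are copied through untouched, so running the
    formatter twice does not double-escape anything.
    """
    out = []
    i = 0
    while i < len(raw_value):
        ch = raw_value[i]
        if ch == "\\":
            out.append(raw_value[i:i + 2])  # keep the escape pair as written
            i += 2
        else:
            out.append("\\" + ch if ch in "=:" else ch)
            i += 1
    return "".join(out)


def format_entry(key, raw_value):
    """Render one entry with consistent 'key = value' spacing."""
    return f"{key} = {escape_value(raw_value)}"
```

Walking the value character by character, instead of doing a blind search and replace, is what keeps an already escaped \= from turning into \\=.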
Lines that look empty… but aren't
A sneaky category:
- A line that's only = or : → empty key error.
- A line that's key followed only by whitespace → a valid key with an explicit empty value (whitespace is the separator).
- Whitespace around separators with empty values is fine:
key1 = 
key2: 
key3   
A practical checklist (aka mini-linter rules)
- Flag lines with no =, :, or whitespace separator (error).
- Flag empty keys (error) but allow explicit empty values.
- Handle continuation logic: odd vs even trailing backslashes; treat trailing whitespace after a continuation backslash as continuation; error if EOF cuts it off.
- Treat keys as case-sensitive; warn on duplicates and list all occurrences.
- Decode standard escapes; treat unknown escapes literally without crashing.
- Support UTF-8 and ISO-8859-1 modes:
  - UTF-8: warn on \uXXXX as unnecessary.
  - ISO-8859-1: error on out-of-range chars; allow \uXXXX freely.
- Keep validation read-only; do formatting in a separate step.
- Preserve comments and attach them to following entries for context.
- Represent multiline values as a single logical value; track start/end lines for tooling.
Closing thoughts
I was planning to be done with .properties file validation in a few days tops, but after a week of debugging I realized that, even though the format looks simple, real-world files mix legacy encoding rules, permissive separators, escape sequences, and multiline values. I will not touch this format again :)