The surprising complexity of .properties files

28 Jul, 2025

I was recently working on better support for .properties files in Jar.Tools. Probably every Java developer has worked with this format and assumes it's a simple one. So did I, until I started implementing it. This is a list of the quirks and interesting cases I ran into along the way.

There are three separators (and one of them is whitespace)

Most people think .properties means key=value. In reality:

key=value
key:value
key␠value (one or more spaces or tabs)

All three are valid. That means the following are different lines with the same meaning:

server.port=8080
server.port:8080
server.port    8080
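
A quick way to check this is to feed all three spellings to java.util.Properties. This is only a minimal sketch; the class name SeparatorDemo and the helper portOf are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class SeparatorDemo {
    // Loads one line of .properties text and returns the value stored under server.port.
    static String portOf(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props.getProperty("server.port");
    }

    public static void main(String[] args) {
        System.out.println(portOf("server.port=8080"));
        System.out.println(portOf("server.port:8080"));
        System.out.println(portOf("server.port    8080"));
    }
}
```

Each of the three calls returns the string 8080.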

What I validate

Missing separator: if a non‑comment, non‑blank line has no =, :, or whitespace separator, that’s an error.
Empty key: a line that starts with = or : (nothing but optional whitespace before the separator) is an empty‑key error.
```
# error: empty key
=value
# error: empty key
:value
```

What is allowed

Explicit empty values are fine with any separator:
```
empty.key=
empty.key:
empty.key␠
```
All three parse as empty.key with an empty string value.
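
The rules above can be sketched as a small key/value splitter. This is my own illustration under stated assumptions, not the Jar.Tools code: the class KeyValueSplit, its method names, and the error messages are hypothetical, the input is assumed to be a non‑blank, non‑comment logical line, and the key and value are returned raw (escapes are not decoded).

```java
final class KeyValueSplit {
    static boolean isBlank(char c) {
        return c == ' ' || c == '\t' || c == '\f';
    }

    // Returns {rawKey, rawValue}; throws for a missing separator or an empty key.
    static String[] split(String line) {
        int start = 0;
        while (start < line.length() && isBlank(line.charAt(start))) start++;
        String s = line.substring(start);

        int i = 0;
        boolean escaped = false; // true when the previous char was an unescaped backslash
        while (i < s.length()) {
            char c = s.charAt(i);
            if (!escaped && (c == '=' || c == ':' || isBlank(c))) break;
            escaped = (c == '\\') && !escaped;
            i++;
        }
        if (i == s.length()) throw new IllegalArgumentException("missing separator");
        if (i == 0) throw new IllegalArgumentException("empty key");

        String key = s.substring(0, i);
        // Separator: optional whitespace, at most one '=' or ':', optional whitespace.
        while (i < s.length() && isBlank(s.charAt(i))) i++;
        if (i < s.length() && (s.charAt(i) == '=' || s.charAt(i) == ':')) i++;
        while (i < s.length() && isBlank(s.charAt(i))) i++;
        return new String[] { key, s.substring(i) };
    }

    public static void main(String[] args) {
        String[] kv = split("server.port    8080");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```

An escaped separator inside the key is skipped, so a line such as a\=b=c splits into the raw key a\=b and the value c.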

Continuations: odd vs even backslashes, and trailing whitespace

A line ending with a continuation backslash \ joins with the next line. This is where bugs hide:

Odd number of trailing backslashes → continuation.
Even number → the last backslash is escaped, so no continuation.

# Continues (odd number of backslashes at EOL)
sql.query=SELECT * FROM users \
    WHERE active = true

# Does NOT continue (even number of backslashes at EOL): the value ends with a single '\'
literal.backslash=path ends with \\

Trailing whitespace matters

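In my validator, a backslash followed only by trailing spaces is still treated as a continuation marker. (java.util.Properties itself is stricter: there, a backslash followed by a space is an escaped space, not a continuation.) If the file ends right after that whitespace (no next line), it’s a broken continuation error.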

broken.continuation=this ends with a backslash \␠␠␠
# EOF here → error: “Line ends with continuation backslash but file ended.”
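
The odd/even rule and the trailing‑whitespace behavior fit in one small check. This is a sketch, not the validator's actual code; ContinuationCheck and the lenient flag are names I made up for illustration.

```java
final class ContinuationCheck {
    // lenient = true ignores spaces and tabs after the last backslash (the validator
    // behavior described above); lenient = false looks at the very end of the line.
    static boolean continues(String line, boolean lenient) {
        int end = line.length();
        if (lenient) {
            while (end > 0 && (line.charAt(end - 1) == ' ' || line.charAt(end - 1) == '\t')) end--;
        }
        int backslashes = 0;
        while (end - backslashes > 0 && line.charAt(end - backslashes - 1) == '\\') backslashes++;
        return backslashes % 2 == 1; // odd count: the last backslash is a continuation marker
    }

    public static void main(String[] args) {
        System.out.println(continues("sql.query=SELECT * FROM users \\", false));
        System.out.println(continues("literal.backslash=path ends with \\\\", false));
    }
}
```

With lenient set to false, a line ending in a backslash plus spaces does not continue; with lenient set to true, it does.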

Multiline values done right

sql.query=SELECT id, name, email \
    FROM users \
    WHERE active = true \
    ORDER BY name

When parsed, this becomes a single value:

SELECT id, name, email FROM users WHERE active = true ORDER BY name
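
Here is one way the joining could look, as a hedged sketch. LogicalLines and its helpers are hypothetical names; the sketch uses the strict odd‑backslash check, does not treat comment or blank lines specially, and reuses the error message quoted earlier for the end‑of‑file case.

```java
import java.util.ArrayList;
import java.util.List;

final class LogicalLines {
    static boolean endsWithOddBackslashes(String line) {
        int n = 0;
        for (int i = line.length() - 1; i >= 0 && line.charAt(i) == '\\'; i--) n++;
        return n % 2 == 1;
    }

    static String stripLeadingBlanks(String s) {
        int i = 0;
        while (i < s.length() && (s.charAt(i) == ' ' || s.charAt(i) == '\t' || s.charAt(i) == '\f')) i++;
        return s.substring(i);
    }

    // Joins physical lines into logical lines; leading blanks of continuation lines are dropped.
    static List<String> join(List<String> physicalLines) {
        List<String> logical = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : physicalLines) {
            String line = raw;
            if (current == null) {
                current = new StringBuilder();
            } else {
                line = stripLeadingBlanks(raw);
            }
            if (endsWithOddBackslashes(line)) {
                current.append(line, 0, line.length() - 1); // drop the continuation backslash
            } else {
                current.append(line);
                logical.add(current.toString());
                current = null;
            }
        }
        if (current != null) {
            throw new IllegalStateException("Line ends with continuation backslash but file ended.");
        }
        return logical;
    }
}
```

Fed the four physical lines of the sql.query example, join returns a single logical line whose value is the one‑line SELECT shown above. The space before each backslash is what keeps the words apart.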

Duplicates are subtle (case‑sensitive keys)

I treat keys as case‑sensitive and flag all occurrences when the same key appears multiple times:

duplicate.key=first
duplicate.key=second
duplicate.key=third

All three lines receive a warning that lists the line number of every occurrence (e.g., “Duplicate key ‘duplicate.key’ found at: line 2, line 5, line 8”). By contrast:

myKey=one
MyKey=two
myKey=three

Only the two myKey entries get flagged; MyKey is distinct.

Why warn and not error? Real configs sometimes contain duplicates and quietly depend on “last one wins,” even though that is rarely intentional. A warning keeps you honest without breaking builds.
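
A minimal sketch of the duplicate check, assuming the keys have already been extracted in file order (one per line, which is a simplification). DuplicateKeys and its methods are hypothetical names, and the message uses plain quotes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateKeys {
    // keys.get(i) is assumed to be the key found on line i + 1.
    // Returns only the keys that occur more than once, with all their line numbers.
    static Map<String, List<Integer>> find(List<String> keys) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // String keys compare case-sensitively, so myKey and MyKey stay separate.
            seen.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(i + 1);
        }
        seen.values().removeIf(lines -> lines.size() < 2);
        return seen;
    }

    static String warning(String key, List<Integer> lines) {
        StringBuilder sb = new StringBuilder("Duplicate key '" + key + "' found at: ");
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append("line ").append(lines.get(i));
        }
        return sb.toString();
    }
}
```

For the keys myKey, MyKey, myKey, find reports only myKey, at lines 1 and 3.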

Unicode: `\uXXXX` escapes, surrogate pairs, and “garbage‑in” behavior

Properties files support \uXXXX escapes. That opens a whole Unicode can of worms: invalid lengths, non‑hex digits, surrogate pairs for emoji, and “unknown” escapes.

Invalid escape sequences

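Things like \u123 or \u12G4 show up in the wild. I parse them gracefully, without throwing, and keep values as close as possible to what the user typed. (java.util.Properties.load is stricter here and throws IllegalArgumentException on a malformed \uXXXX escape.) The validator focuses on not crashing; it doesn’t over‑correct malformed text.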

Surrogate pairs for emoji

Escaped emoji like \uD83D\uDE80 (🚀) decode correctly. In UTF‑8 mode I emit a warning (“Unicode escape sequence detected”) because direct Unicode is usually clearer. In ISO‑8859‑1 mode, escapes are often necessary, so I emit no warning.

Standard escapes “just work”

The usual suspects decode as expected:

\t, \n, \r, \f, \\
escaped separators and specials: \ , \:, \=, \#, \!

Unknown single‑letter escapes like \q or \z are treated literally (the backslash disappears, the letter stays). Again: avoid surprising the user.
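
The decoding rules in this section can be sketched in one pass over the raw text. This is an illustration under my own assumptions, not the Jar.Tools decoder: EscapeDecoder is a hypothetical name, and a malformed Unicode escape is kept exactly as typed, which is one possible reading of "as close as possible to what the user typed".

```java
final class EscapeDecoder {
    static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i++);
            if (c != '\\' || i == raw.length()) { // plain char, or a lone trailing backslash
                out.append(c);
                continue;
            }
            char e = raw.charAt(i++);
            switch (e) {
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 <= raw.length() && isHex4(raw, i)) {
                        // Each half of a surrogate pair is decoded on its own; together they form the emoji.
                        out.append((char) Integer.parseInt(raw.substring(i, i + 4), 16));
                        i += 4;
                    } else {
                        out.append('\\').append('u'); // malformed: keep the text as typed
                    }
                    break;
                default:
                    out.append(e); // escaped separators and unknown letters: drop the backslash
            }
        }
        return out.toString();
    }

    static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) return false;
        }
        return true;
    }
}
```

With this sketch, \q decodes to q, \: decodes to a colon, and \u12G4 comes back unchanged.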

Encoding modes: UTF‑8 vs ISO‑8859‑1

Historically, Java treated .properties as Latin‑1 (ISO‑8859‑1), with \uXXXX for anything beyond that range. Many modern tools use UTF‑8. To make intent explicit, I let the validator run in either mode.

ISO‑8859‑1 mode

Error on characters outside Latin‑1.

# error (outside ISO-8859-1)
unicode.chinese=你好世界
# error (outside ISO-8859-1)
unicode.emoji=🎉🚀
# fine (é is Latin‑1)
valid.iso=café

\uXXXX for Latin‑1 letters like \u00e9 (é) is allowed and not warned.

UTF‑8 mode

Direct Unicode is preferred and not warned.
\uXXXX escapes are warned as unnecessary (but still decoded). That includes escapes for ASCII: \u0041 → “A” with a warning.

Pick the mode that matches your runtime, and you’ll get the right balance of errors vs. guidance.
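
As a rough sketch of the two modes: the ISO‑8859‑1 side only needs an encoder check, and the UTF‑8 side only needs to spot an unescaped backslash followed by u. EncodingModeCheck, diagnose, and the wording of the error string are my own placeholders; only the warning text comes from the description above.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class EncodingModeCheck {
    // Index of the first char that ISO-8859-1 cannot encode, or -1 if all chars fit.
    static int firstNonLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) return i;
        }
        return -1;
    }

    // True when the raw text contains a backslash-u escape that is not itself escaped.
    static boolean hasUnicodeEscape(String raw) {
        for (int i = 0; i + 1 < raw.length(); i++) {
            if (raw.charAt(i) != '\\') continue;
            if (raw.charAt(i + 1) == 'u') return true;
            i++; // skip the escaped char so an escaped backslash cannot start an escape
        }
        return false;
    }

    // Returns a diagnostic for the raw value, or null when there is nothing to report.
    static String diagnose(String raw, boolean utf8Mode) {
        if (utf8Mode) {
            return hasUnicodeEscape(raw) ? "warning: Unicode escape sequence detected" : null;
        }
        return firstNonLatin1(raw) >= 0 ? "error: character outside ISO-8859-1" : null;
    }
}
```

In ISO‑8859‑1 mode café passes and 你好世界 is reported; in UTF‑8 mode an escaped A (\u0041) triggers the warning while a literal A does not.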

Comments and structure: preserve intent, don’t rewrite history

Lines starting with # or ! are comments. During validation, I:

Attach leading comments to the next property as leadingComments.
Keep raw text for each entry exactly as read.
Do not escape or normalize anything during validation.
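
A sketch of how the comment attachment could work. The Entry shape (leadingComments plus raw) follows the description above, but everything else is assumed: CommentAttacher is a hypothetical name, blank lines are simply skipped, each entry is taken to be one physical line, and comments with no following property are dropped.

```java
import java.util.ArrayList;
import java.util.List;

final class CommentAttacher {
    static final class Entry {
        final List<String> leadingComments; // comment lines seen since the previous entry
        final String raw;                   // the property line exactly as read

        Entry(List<String> leadingComments, String raw) {
            this.leadingComments = leadingComments;
            this.raw = raw;
        }
    }

    static List<Entry> attach(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue; // assumption: blank lines do not detach comments
            if (trimmed.startsWith("#") || trimmed.startsWith("!")) {
                pending.add(line); // stored untouched, no normalization
            } else {
                entries.add(new Entry(pending, line));
                pending = new ArrayList<>();
            }
        }
        return entries;
    }
}
```

Nothing is rewritten along the way: both the comments and the raw property text are stored as they were read.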

During formatting, I:

Preserve comments as‑is.
Add a consistent key = value spacing.

Escape = and : inside values so the output stays unambiguous to read and parse:

# original
key=value with = and : chars

# formatted
key = value with \= and \: chars

This “no touching during validation” rule prevents a whole class of “the linter changed my config” surprises.
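
The formatting step from the example can be sketched like this. EntryFormatter is a hypothetical name, and the sketch covers only the two things shown in the example: the key = value spacing and a backslash in front of = and : in the value. Characters that are already escaped are left alone, so formatting twice gives the same result.

```java
final class EntryFormatter {
    static String format(String key, String rawValue) {
        StringBuilder out = new StringBuilder(key).append(" = ");
        boolean escaped = false; // true when the previous char was an unescaped backslash
        for (int i = 0; i < rawValue.length(); i++) {
            char c = rawValue.charAt(i);
            if (!escaped && (c == '=' || c == ':')) out.append('\\');
            out.append(c);
            escaped = (c == '\\') && !escaped;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("key", "value with = and : chars"));
    }
}
```

Running main prints the formatted line from the example: key = value with \= and \: chars.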

Lines that look empty… but aren’t

A sneaky category:

A line that’s only = or : → empty key error.
A line that’s key␠␠␠ → a valid key with an explicit empty value (whitespace is the separator).
Whitespace around separators with empty values is fine:
```
key1 = 
key2: 
key3␠␠␠
```

A practical checklist (aka mini‑linter rules)

Flag lines with no =, :, or whitespace separator (error).
Flag empty keys (error) but allow explicit empty values.
Handle continuation logic: odd vs even trailing backslashes; treat trailing whitespace after a continuation backslash as continuation; error if EOF cuts it off.
Treat keys as case‑sensitive; warn on duplicates and list all occurrences.
Decode standard escapes; treat unknown escapes literally without crashing.
Support UTF‑8 and ISO‑8859‑1 modes:
- UTF‑8: warn on \uXXXX as unnecessary.
- ISO‑8859‑1: error on out‑of‑range chars; allow \uXXXX freely.
Keep validation read‑only; do formatting in a separate step.
Preserve comments and attach them to following entries for context.
Represent multiline values as a single logical value; track start/end lines for tooling.

Closing thoughts

I was planning to be done with .properties validation in a few days, tops. After a week of debugging I realized that, even though the format looks simple, real‑world files mix legacy encoding rules, permissive separators, escape sequences, and multiline values. I will not touch this format again :)