Igor's Techno Club

The surprising complexity of .properties files

I was recently working on better support for .properties files in Jar.Tools. Probably every Java developer has worked with this format and assumes it's a simple one. So did I, until I started implementing it. What follows is a list of the quirks and interesting cases I ran into along the way.


There are three separators (and one of them is whitespace)

Most people think .properties means key=value. In reality, the separator between key and value can be =, :, or plain whitespace.

All three are valid. That means the following are different lines with the same meaning:

server.port=8080
server.port:8080
server.port    8080
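A quick way to confirm this against the reference parser is to feed each line to java.util.Properties and read the key back. This is only a sketch: SeparatorDemo and portOf are illustrative names, not part of Jar.Tools.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; java.util.Properties is the only real API used here.
final class SeparatorDemo {

    // Parses a single line of .properties text and returns the value of server.port.
    static String portOf(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("server.port");
    }

    public static void main(String[] args) {
        // Each separator style yields the same key and the same value.
        System.out.println(portOf("server.port=8080"));
        System.out.println(portOf("server.port:8080"));
        System.out.println(portOf("server.port    8080"));
    }
}
```

All three calls return the same string, which is why a validator cannot simply split each line on the first = character.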

What I validate

What is allowed

Continuations: odd vs even backslashes, and trailing whitespace

A line ending with a continuation backslash \ joins with the next line. This is where bugs hide:

# Continues (odd backslashes at EOL)
sql.query=SELECT * FROM users \

# Does NOT continue (even backslashes at EOL); the value ends with a single '\'
literal.backslash=path ends with \\
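The odd/even rule comes down to counting the run of backslashes at the end of the line. A minimal sketch of that check, with ContinuationCheck as a made-up name and no claim that Jar.Tools does it this way:

```java
// Hypothetical helper illustrating the odd/even backslash rule.
final class ContinuationCheck {

    // Counts the unbroken run of backslashes at the very end of the line.
    static int trailingBackslashes(String line) {
        int count = 0;
        for (int i = line.length() - 1; i >= 0 && line.charAt(i) == '\\'; i--) {
            count++;
        }
        return count;
    }

    // An odd run means the last backslash is unescaped, so the line continues.
    // An even run is just pairs of escaped backslashes, so it does not.
    static boolean continues(String line) {
        return trailingBackslashes(line) % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(continues("sql.query=SELECT * FROM users \\"));       // true
        System.out.println(continues("literal.backslash=path ends with \\\\")); // false
    }
}
```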

Trailing whitespace matters

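I still treat a backslash followed by trailing spaces as a continuation marker. (java.util.Properties itself is stricter: it only continues when the backslash is the very last character on the line.) If the file ends right after that whitespace (no next line), it’s a broken continuation error.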

broken.continuation=this ends with a backslash \␠␠␠
# EOF here → error: “Line ends with continuation backslash but file ended.”
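One way to detect this case is to walk the lines while tracking whether the previous line asked for a continuation. The sketch below assumes the whitespace-tolerant behavior described above (spaces and tabs after the backslash are ignored) and needs Java 11 or later. BrokenContinuationCheck is a hypothetical name, and only the error message is taken from the example.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of an end-of-file continuation check.
final class BrokenContinuationCheck {

    static final String MESSAGE = "Line ends with continuation backslash but file ended.";

    // Assumption: trailing spaces and tabs after the backslash are ignored.
    static boolean endsWithContinuation(String line) {
        int end = line.length();
        while (end > 0 && (line.charAt(end - 1) == ' ' || line.charAt(end - 1) == '\t')) {
            end--;
        }
        int backslashes = 0;
        while (end > 0 && line.charAt(end - 1) == '\\') {
            backslashes++;
            end--;
        }
        return backslashes % 2 == 1;
    }

    // Returns the error message if the last logical line is still waiting for more input.
    static Optional<String> check(List<String> lines) {
        boolean pending = false; // true while the previous line requested a continuation
        for (String line : lines) {
            String trimmed = line.stripLeading();
            // A comment only counts as a comment when it is not itself a continuation line.
            boolean comment = !pending && (trimmed.startsWith("#") || trimmed.startsWith("!"));
            pending = !comment && endsWithContinuation(line);
        }
        return pending ? Optional.of(MESSAGE) : Optional.empty();
    }

    public static void main(String[] args) {
        check(List.of("broken.continuation=this ends with a backslash \\   "))
                .ifPresent(System.out::println);
    }
}
```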

Multiline values done right

sql.query=SELECT id, name, email \
    FROM users \
    WHERE active = true \
    ORDER BY name

When parsed, this becomes a single value:

SELECT id, name, email FROM users WHERE active = true ORDER BY name
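The same result can be reproduced with java.util.Properties, which drops the backslash and line break and skips the leading whitespace of each continuation line. MultilineDemo is an illustrative name, and the sketch only shows the reference behavior, not the Jar.Tools parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo of how continuation lines are joined by java.util.Properties.
final class MultilineDemo {

    // The multiline example from the text, one Java string per physical line.
    static final String EXAMPLE = String.join("\n",
            "sql.query=SELECT id, name, email \\",
            "    FROM users \\",
            "    WHERE active = true \\",
            "    ORDER BY name");

    static String load(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The space before each backslash is kept; the indentation after the line break is not.
        System.out.println(load(EXAMPLE, "sql.query"));
    }
}
```

Note that the single space before each backslash is what keeps the words apart. Without it, email and FROM would be glued together.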

Duplicates are subtle (case‑sensitive keys)

I treat keys as case‑sensitive and flag all occurrences when the same key appears multiple times:

duplicate.key=first
duplicate.key=second
duplicate.key=third

All three lines receive a warning that lists the line number of every occurrence (e.g., “Duplicate key ‘duplicate.key’ found at: line 2, line 5, line 8”). By contrast:

myKey=one
MyKey=two
myKey=three

Only the two myKey entries get flagged; MyKey is distinct.
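A case-sensitive duplicate check can be sketched as a map from key to the line numbers where it appears. This is a simplification under stated assumptions: keys are cut at the first =, :, or whitespace, and escapes and continuation lines are ignored. DuplicateKeys, find, and describe are hypothetical names. Only the message format follows the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of case-sensitive duplicate detection.
final class DuplicateKeys {

    // Returns every key that occurs more than once, with its 1-based line numbers.
    static Map<String, List<Integer>> find(List<String> lines) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).stripLeading();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
                continue; // blank lines and comments carry no key
            }
            // Simplification: the key ends at the first '=', ':' or whitespace character.
            String key = line.split("[=:\\s]", 2)[0];
            seen.computeIfAbsent(key, k -> new ArrayList<>()).add(i + 1);
        }
        seen.values().removeIf(occurrences -> occurrences.size() < 2);
        return seen;
    }

    static String describe(String key, List<Integer> lineNumbers) {
        String where = lineNumbers.stream()
                .map(n -> "line " + n)
                .collect(Collectors.joining(", "));
        return "Duplicate key '" + key + "' found at: " + where;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("myKey=one", "MyKey=two", "myKey=three");
        find(lines).forEach((key, where) -> System.out.println(describe(key, where)));
    }
}
```

Because the map is keyed on the raw string, myKey and MyKey land in different entries without any extra logic.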

Why warn and not error? Some real configs do rely on “last one wins,” but a duplicate key is far more often an accident than a decision. A warning keeps you honest without breaking builds.

Unicode: \uXXXX escapes, surrogate pairs, and “garbage‑in” behavior

Properties files support \uXXXX escapes. That opens a whole can of Unicode worms: invalid lengths, non‑hex digits, surrogate pairs for emoji, and “unknown” escapes.

Invalid escape sequences

Things like \u123 or \u12G4 show up in the wild. I parse them gracefully—no exceptions—and keep values as close as possible to what the user typed. The validator focuses on not crashing; it doesn’t over‑correct malformed text.
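For comparison, java.util.Properties rejects a malformed \uXXXX sequence with an IllegalArgumentException. A lenient decoder could look roughly like the sketch below: it converts well-formed escapes and copies everything else through unchanged. LenientUnicode is a hypothetical name, and leaving malformed input exactly as typed is one possible reading of "as close as possible to what the user typed".

```java
// Hypothetical sketch of lenient unicode-escape decoding (other escapes are left untouched).
final class LenientUnicode {

    static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean hasNext = i + 1 < raw.length();
            if (c == '\\' && hasNext && raw.charAt(i + 1) == '\\') {
                // Escaped backslash: copy both characters so a following 'u' is not misread.
                out.append("\\\\");
                i += 2;
            } else if (c == '\\' && hasNext && raw.charAt(i + 1) == 'u' && hasFourHexDigits(raw, i + 2)) {
                // Well-formed escape: exactly four hex digits follow the 'u'.
                out.append((char) Integer.parseInt(raw.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Anything else, including a malformed escape, is copied as typed.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int i = from; i < from + 4; i++) {
            if (Character.digit(s.charAt(i), 16) == -1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0041"));  // well-formed, decodes to A
        System.out.println(decode("\\u123"));   // too short, printed as typed
        System.out.println(decode("\\u12G4"));  // non-hex digit, printed as typed
    }
}
```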

Surrogate pairs for emoji

Escaped emoji like \uD83D\uDE80 (🚀) decode correctly. In UTF‑8 mode I emit a warning (“Unicode escape sequence detected”) because direct Unicode is usually clearer. In ISO‑8859‑1 mode, escapes are often necessary, so I emit no warning.
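The surrogate pair case can be checked against java.util.Properties: each escape becomes one UTF-16 code unit, and the two units together form a single code point. SurrogatePairDemo is an illustrative name. The sketch covers decoding only, not the warning logic.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo: two escaped surrogates decode to one emoji code point.
final class SurrogatePairDemo {

    static String load(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes are Java string escaping; the file content has single ones.
        String value = load("rocket=\\uD83D\\uDE80", "rocket");
        System.out.println(value.length());                           // 2 UTF-16 code units
        System.out.println(value.codePointCount(0, value.length()));  // 1 code point
        System.out.println(Integer.toHexString(value.codePointAt(0))); // 1f680
    }
}
```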

Standard escapes “just work”

The usual suspects decode as expected:

Unknown single‑letter escapes like \q or \z are treated literally (the backslash disappears, the letter stays). Again: avoid surprising the user.
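java.util.Properties handles unknown escapes the same way, which is easy to confirm with a small sketch. EscapeDemo is a made-up name, and the snippet shows the JDK's behavior rather than the validator's code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo of known and unknown escapes in java.util.Properties.
final class EscapeDemo {

    static String load(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Known escape: backslash + t becomes a real tab character.
        System.out.println(load("cols=a\\tb", "cols").equals("a\tb"));
        // Unknown escapes: the backslash is dropped and the letter is kept.
        System.out.println(load("odd=\\q\\z", "odd")); // qz
    }
}
```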

Encoding modes: UTF‑8 vs ISO‑8859‑1

Historically, Java treated .properties as Latin‑1 (ISO‑8859‑1), with \uXXXX for anything beyond that range. Many modern tools use UTF‑8. To make intent explicit, I let the validator run in either mode.
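The difference is easy to see by loading the same UTF-8 bytes both ways. Properties.load(InputStream) decodes as ISO‑8859‑1, while Properties.load(Reader) uses whatever charset the reader was built with. EncodingModes and its two methods are illustrative names for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo: the same bytes read in ISO-8859-1 mode and in UTF-8 mode.
final class EncodingModes {

    // load(InputStream) always decodes the bytes as ISO-8859-1.
    static String loadAsLatin1(byte[] bytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // load(Reader) leaves decoding to the reader, here an explicit UTF-8 one.
    static String loadAsUtf8(byte[] bytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        byte[] utf8 = "city=Z\u00FCrich".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadAsUtf8(utf8, "city"));   // the value as written
        System.out.println(loadAsLatin1(utf8, "city")); // mojibake: two characters replace the umlaut
    }
}
```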

ISO‑8859‑1 mode

UTF‑8 mode

Pick the mode that matches your runtime, and you’ll get the right balance of errors vs. guidance.

Comments and structure: preserve intent, don’t rewrite history

Lines starting with # or ! are comments. During validation, I:

During formatting, I:

This “no touching during validation” rule prevents a whole class of “the linter changed my config” surprises.

Lines that look empty but aren’t

A sneaky category:

A practical checklist (aka mini‑linter rules)

Closing thoughts

I was planning to be done with .properties file validation in a few days, tops, but after a week of debugging I realized that, even though the format looks simple, real‑world files mix legacy encoding rules, permissive separators, escape sequences, and multiline values. I will not touch this format again :)