Sometimes you need to remove duplicate lines from a string but still preserve the structure, including empty lines. This is common when working with structured text data like logs or config files, where whitespace may carry meaning. Here's how to do it in Elixir.
The goal
Given a multi-line string, we want to:
- Keep the first occurrence of each non-empty line
- Preserve empty lines exactly where they appear
- Maintain the original line order
For example:
"""
line 1
line 2
line 1
line 3
line 2
"""
Should become:
"""
line 1
line 2
line 3
"""
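It may be tempting to reach for Enum.uniq here. The short sketch below (the input string is made up for illustration) shows why that falls short: Enum.uniq treats the empty string like any other value, so every empty line after the first is dropped, including the one that represents the trailing newline.

```elixir
# Illustrative input: repeated content separated by blank lines.
input = "line 1\n\nline 2\n\nline 1\n"

naive =
  input
  |> String.split("\n")
  |> Enum.uniq()
  |> Enum.join("\n")

IO.inspect(naive)
# prints "line 1\n\nline 2"
# The second blank line and the trailing newline are gone.
```

So meeting the goal means tracking which lines have been seen while letting every empty line through.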
Here's a function that does exactly that:
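# Module name inferred from the test module below; a bare `def` must live inside a module.
defmodule MyStringUtils do
  def remove_duplicate_lines_preserving_empty(string) do
    string
    |> String.split("\n", trim: false)
    |> Enum.reduce({MapSet.new(), []}, fn
      "", {seen, acc} ->
        # always keep empty lines
        {seen, ["" | acc]}

      line, {seen, acc} ->
        if MapSet.member?(seen, line) do
          # duplicate: skip it
          {seen, acc}
        else
          # first occurrence: remember it and keep it
          {MapSet.put(seen, line), [line | acc]}
        end
    end)
    |> elem(1)
    |> Enum.reverse()
    |> Enum.join("\n")
  end
end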
This function:
- Splits the string into lines while preserving empty ones
- Uses a MapSet to track seen non-empty lines
- Uses Enum.reduce to build the result while filtering duplicates
- Reverses the list because we built it in reverse order
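To make those steps concrete, here is a rough walk-through of the same pipeline with each stage bound to a variable. The input string is an illustrative assumption, and the reduce body mirrors the function above.

```elixir
# Illustrative input with empty lines between repeated content.
input = "a\n\nb\n\na\n"

# Step 1: split on newlines. Empty strings are kept because :trim is false
# (which is also the default for String.split/3).
lines = String.split(input, "\n", trim: false)
# lines is ["a", "", "b", "", "a", ""]

# Step 2: walk the lines, carrying a set of seen lines and a reversed result.
{_seen, reversed} =
  Enum.reduce(lines, {MapSet.new(), []}, fn
    "", {seen, acc} ->
      {seen, ["" | acc]}

    line, {seen, acc} ->
      if MapSet.member?(seen, line) do
        {seen, acc}
      else
        {MapSet.put(seen, line), [line | acc]}
      end
  end)
# reversed is ["", "", "b", "", "a"]

# Step 3: restore the original order and join back into one string.
result = reversed |> Enum.reverse() |> Enum.join("\n")

IO.inspect(result)
# prints "a\n\nb\n\n"
```

Prepending to the accumulator and reversing once at the end keeps each step cheap, since adding to the head of a list is constant time in Elixir while appending is not.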
You can validate this function using ExUnit:
defmodule MyStringUtilsTest do
  use ExUnit.Case

  # Assumes the function is defined in a MyStringUtils module.
  import MyStringUtils

  test "removes duplicates but preserves first occurrence and empty lines" do
    input = """
    line 1
    line 2
    line 1
    line 3
    line 2
    """

    expected = """
    line 1
    line 2
    line 3
    """

    assert remove_duplicate_lines_preserving_empty(input) == expected
  end

  test "only empty lines" do
    input = "\n\n\n"
    expected = "\n\n\n"
    assert remove_duplicate_lines_preserving_empty(input) == expected
  end

  test "no duplicates, includes empty lines" do
    input = """
    a
    b
    c
    """

    assert remove_duplicate_lines_preserving_empty(input) == input
  end

  test "all lines are duplicates except empty ones" do
    input = """
    repeat
    repeat
    repeat
    repeat
    """

    expected = """
    repeat
    """

    assert remove_duplicate_lines_preserving_empty(input) == expected
  end

  test "empty input returns empty string" do
    assert remove_duplicate_lines_preserving_empty("") == ""
  end
end
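A couple of extra cases may be worth covering with explicit "\n" escapes, so the empty lines are unmistakable in the test source. The sketch below is self-contained so it can run as a script with `elixir`; DedupSketch is a hypothetical module name, and its body simply repeats the function shown earlier.

```elixir
ExUnit.start()

# Hypothetical module name; the body mirrors the function from this post.
defmodule DedupSketch do
  def remove_duplicate_lines_preserving_empty(string) do
    string
    |> String.split("\n", trim: false)
    |> Enum.reduce({MapSet.new(), []}, fn
      "", {seen, acc} ->
        {seen, ["" | acc]}

      line, {seen, acc} ->
        if MapSet.member?(seen, line) do
          {seen, acc}
        else
          {MapSet.put(seen, line), [line | acc]}
        end
    end)
    |> elem(1)
    |> Enum.reverse()
    |> Enum.join("\n")
  end
end

defmodule DedupSketchTest do
  use ExUnit.Case
  import DedupSketch

  test "empty lines between duplicates are all kept" do
    input = "repeat\n\nrepeat\n\nrepeat\n"
    # Only the first "repeat" survives; the three empty strings all remain.
    assert remove_duplicate_lines_preserving_empty(input) == "repeat\n\n\n"
  end

  test "whitespace-only lines count as ordinary lines" do
    input = "a\n  \nb\n  \n"
    # Only "" matches the first clause, so the second "  " is a duplicate.
    assert remove_duplicate_lines_preserving_empty(input) == "a\n  \nb\n"
  end
end
```

The second test documents a consequence of matching on the literal empty string: a line containing only spaces is deduplicated like any other content. If such lines should be preserved too, the first clause would need a broader check, which is a design decision left open here.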
If you enjoyed this post or found it useful, please share it! If you have comments, questions, or feedback, you can email me. To get new posts, subscribe via the RSS feed.