Sometimes you need to remove duplicate lines from a string but still preserve the structure, including empty lines. This is common when working with structured text data like logs or config files, where whitespace may carry meaning. Here's how to do it in Elixir.

The goal

Given a multi-line string, we want to:

  • Keep the first occurrence of each non-empty line
  • Preserve empty lines exactly where they appear
  • Maintain the original line order

For example:

"""
line 1

line 2
line 1

line 3
line 2
"""

Should become:

"""
line 1

line 2

line 3
"""

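Here's a function that does exactly that. A def has to live inside a module to compile, so it is wrapped in one here (the name MyStringUtils is chosen to match the test module further down):

defmodule MyStringUtils do
  # Removes repeated non-empty lines, keeping the first occurrence of each,
  # while leaving every empty line in place.
  def remove_duplicate_lines_preserving_empty(string) do
    string
    # trim: false is the default; it is spelled out to show empty lines are kept
    |> String.split("\n", trim: false)
    # accumulator: {set of lines seen so far, kept lines in reverse order}
    |> Enum.reduce({MapSet.new(), []}, fn
      "", {seen, acc} ->
        # always keep empty lines
        {seen, ["" | acc]}

      line, {seen, acc} ->
        if MapSet.member?(seen, line) do
          # duplicate of an earlier line: drop it
          {seen, acc}
        else
          {MapSet.put(seen, line), [line | acc]}
        end
    end)
    |> elem(1)
    |> Enum.reverse()
    |> Enum.join("\n")
  end
end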

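This function:

  • Splits the string on "\n" while keeping empty lines (trim: false is the default and is spelled out for clarity)
  • Uses a MapSet to track the non-empty lines seen so far
  • Uses Enum.reduce to build the result, skipping any non-empty line already in the set
  • Takes the list out of the accumulator tuple with elem(1) and reverses it, because prepending built it in reverse order
  • Joins the remaining lines back together with "\n"

Lines are compared exactly. A line containing only spaces counts as non-empty, and input with "\r\n" line endings keeps the "\r" on every line, so "blank" lines in such input would be deduplicated like any other line.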
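
To make the split and reverse steps concrete, here is a small sketch that can be pasted into iex or run as a script. The sample string is made up for illustration. It shows what String.split/3 returns with and without trim: true, and why a list built by prepending comes out reversed.

```elixir
# Hypothetical sample input: one blank line in the middle, newline at the end.
sample = "a\n\nb\na\n"

# Default behaviour (same as trim: false): empty strings are kept, including
# the one produced by the trailing newline.
kept = String.split(sample, "\n")
IO.inspect(kept)
# → ["a", "", "b", "a", ""]

# With trim: true every empty string is dropped, so blank lines are lost.
trimmed = String.split(sample, "\n", trim: true)
IO.inspect(trimmed)
# → ["a", "b", "a"]

# Prepending while reducing yields the lines in reverse, hence Enum.reverse/1.
reversed = Enum.reduce(kept, [], fn line, acc -> [line | acc] end)
IO.inspect(reversed)
# → ["", "a", "b", "", "a"]
```

The final "" in the split result is what lets Enum.join("\n") restore the trailing newline of the input.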

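You can validate this function using ExUnit. The tests call the function unqualified, so they import the module that defines it (assumed here to be named MyStringUtils):

defmodule MyStringUtilsTest do
  use ExUnit.Case
  # Assumes remove_duplicate_lines_preserving_empty/1 is defined in MyStringUtils
  import MyStringUtils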

  test "removes duplicates but preserves first occurrence and empty lines" do
    input = """
    line 1

    line 2
    line 1

    line 3
    line 2
    """

    expected = """
    line 1

    line 2

    line 3
    """

    assert remove_duplicate_lines_preserving_empty(input) == expected
  end

  test "only empty lines" do
    input = "\n\n\n"
    expected = "\n\n\n"
    assert remove_duplicate_lines_preserving_empty(input) == expected
  end

  test "no duplicates, includes empty lines" do
    input = """
    a

    b

    c
    """

    assert remove_duplicate_lines_preserving_empty(input) == input
  end

  test "all lines are duplicates except empty ones" do
    input = """
    repeat
    repeat

    repeat
    repeat
    """

    expected = """
    repeat

    """

    assert remove_duplicate_lines_preserving_empty(input) == expected
  end

  test "empty input returns empty string" do
    assert remove_duplicate_lines_preserving_empty("") == ""
  end
end
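
As a point of comparison, the same behaviour can be sketched without an explicit accumulator by using Enum.uniq_by/2. This is an alternative formulation, not the function shown earlier: it pairs every line with its index and gives each empty line a key that includes that index, so empty lines never count as duplicates of one another. The module name DedupSketch and the function name dedup/1 are hypothetical.

```elixir
defmodule DedupSketch do
  # Hypothetical alternative to remove_duplicate_lines_preserving_empty/1.
  def dedup(string) do
    string
    |> String.split("\n")
    |> Enum.with_index()
    |> Enum.uniq_by(fn
      # Each empty line gets a key unique to its position, so it is always kept.
      {"", index} -> {:empty, index}
      # Non-empty lines are keyed by content; only the first occurrence survives.
      {line, _index} -> line
    end)
    |> Enum.map_join("\n", fn {line, _index} -> line end)
  end
end

IO.inspect(DedupSketch.dedup("line 1\n\nline 2\nline 1\n\nline 3\nline 2\n"))
# → "line 1\n\nline 2\n\nline 3\n"
```

The reduce version makes the bookkeeping explicit, while this one leans on Enum.uniq_by/2 keeping the first element for each key. Either should pass the same test cases.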