
When you need to convert PDF files to images on a Linux server, pdftoppm (from the Poppler utilities) is a fast and reliable tool. In this post, we’ll look at how to invoke pdftoppm from Elixir and how to run multiple conversions in parallel to improve throughput.

Installing pdftoppm

On most Linux distributions, pdftoppm is part of the poppler-utils package; on macOS with Homebrew, the package is simply called poppler.

# Debian / Ubuntu
sudo apt install poppler-utils
# Alpine
apk add poppler-utils
# macOS
brew install poppler

You can verify the installation with:

pdftoppm -h
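From Elixir, the equivalent check is System.find_executable/1, which searches PATH and returns nil when the program is missing. Here is a minimal sketch, assuming you want to fail fast at application start. PdftoppmCheck and ensure_available/1 are hypothetical names, not part of any library:

```elixir
defmodule PdftoppmCheck do
  # Hypothetical helper: returns {:ok, absolute_path} if the executable is on PATH,
  # or {:error, {:not_found, name}} if it is not.
  def ensure_available(name \\ "pdftoppm") do
    case System.find_executable(name) do
      nil -> {:error, {:not_found, name}}
      path -> {:ok, path}
    end
  end
end
```

Calling PdftoppmCheck.ensure_available() once at boot turns a missing poppler-utils package into a clear error. Without the check, you would only find out later, from an exception raised by Port.open/2.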

Basic pdftoppm usage

To convert a PDF to JPEG images at 150 DPI:

pdftoppm -jpeg -r 150 input.pdf output/page

This produces files like:

output/page-1.jpg
output/page-2.jpg

Each page becomes a separate image. pdftoppm does not create the output directory, so output/ must exist before you run the command.
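The page number in each file name is zero-padded to the number of digits in the document's last page number. A 12-page PDF therefore produces page-01.jpg through page-12.jpg. If your code needs the file names up front, the sketch below reproduces that rule. It assumes the whole document is converted with the default naming (no -singlefile), and that you already know the page count, for example from pdfinfo. PdfPageNames is a hypothetical module:

```elixir
defmodule PdfPageNames do
  # Hypothetical helper mirroring pdftoppm's default naming: "<prefix>-<page>.<ext>",
  # where <page> is zero-padded to the number of digits in page_count.
  def expected_files(prefix, page_count, ext \\ "jpg") when page_count > 0 do
    width = page_count |> Integer.to_string() |> String.length()

    for page <- 1..page_count do
      padded = page |> Integer.to_string() |> String.pad_leading(width, "0")
      "#{prefix}-#{padded}.#{ext}"
    end
  end
end
```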

Writing image data to stdout with pdftoppm

In some setups it is useful to avoid temporary files and let pdftoppm write the rendered image directly to stdout. From Elixir, you can then capture that output and persist it yourself. The rest of this post shows how to do this cleanly while keeping stdout and stderr separated, so error output never ends up mixed into the image data.

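pdftoppm writes images to files by default. If you omit the PPM-file-prefix argument, it writes the image data to stdout instead. With a multi-page range, the images of all pages are written to stdout one after another, so the command below limits the range to a single page with -f and -l.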

To render a single page as JPEG to stdout:

pdftoppm -jpeg -r 150 -f 1 -l 1 -jpegopt quality=85 -aa yes -aaVector yes input.pdf

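On success:

  • stdout contains the binary JPEG data
  • the exit status is 0
  • stderr is usually empty, although pdftoppm may still print warnings for slightly damaged PDFs

On failure:

  • stdout is empty
  • stderr contains the error message
  • the exit status is non-zero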

This makes it a good fit for piping and programmatic use.
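Besides the two streams, the exit status tells you which case you are in, and it is the more reliable signal. The pdftoppm man page lists these exit codes: 0 (no error), 1 (error opening a PDF file), 2 (error opening an output file), 3 (error related to PDF permissions) and 99 (other error). Here is a sketch that maps them to atoms. The module and atom names are our own choice:

```elixir
defmodule PdftoppmExit do
  # Exit codes as documented in the pdftoppm(1) man page; the atom names are our own.
  def reason(0), do: :ok
  def reason(1), do: :pdf_open_error
  def reason(2), do: :output_open_error
  def reason(3), do: :pdf_permission_error
  def reason(99), do: :other_error
  # Anything else, e.g. 128 + N when the process was killed by signal N
  # (the convention Erlang ports use for exit_status).
  def reason(status) when is_integer(status), do: {:unknown_status, status}
end
```

A converter can then return {:error, PdftoppmExit.reason(status)} instead of a bare number.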

Why System.cmd/3 is not enough

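System.cmd/3 can merge stderr into stdout (stderr_to_stdout: true), but it cannot capture the two separately. Without that option, pdftoppm's stderr simply goes to the VM's own stderr. A Port has the same limitation, because Erlang ports only read the program's stdout.

What a Port does add is control over the running program. We receive the image data as it arrives and get the exit status as a message. We can also stop waiting after a timeout, which System.cmd/3 does not support. So the plan is:

  • image data from stdout
  • success or failure from the exit status
  • error messages left on stderr, where they appear in the VM's console or logs

For that, we use a Port.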
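You can reproduce System.cmd/3's behavior with a throwaway shell command standing in for pdftoppm. This sketch assumes only a POSIX sh on PATH, and StderrDemo is a hypothetical module name:

```elixir
defmodule StderrDemo do
  # Stand-in for pdftoppm: writes "data" to stdout and "warn" to stderr.
  @script "printf data; printf warn >&2"

  def run do
    # Without options only stdout is captured; "warn" goes straight to the
    # VM's own stderr (usually your terminal) and never reaches the caller.
    {only_stdout, 0} = System.cmd("sh", ["-c", @script])

    # With stderr_to_stdout: true, both streams land in one binary and
    # can no longer be told apart.
    {merged, 0} = System.cmd("sh", ["-c", @script], stderr_to_stdout: true)

    {only_stdout, merged}
  end
end
```

For pdftoppm, the merged variant means a warning printed during a successful run would end up inside the JPEG bytes.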

Converting a single page from Elixir

The function below renders a single page to JPEG, saves the image to disk, and returns structured errors when something goes wrong.

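defmodule PdfToImage do
  def convert_page(pdf_path, page, output_file, opts \\ []) do
    dpi = Keyword.get(opts, :dpi, 150)

    # Arguments for pdftoppm itself. There is no output prefix after the PDF
    # path, so pdftoppm writes the rendered JPEG to stdout.
    args = [
      "-jpeg",
      "-jpegopt", "quality=85",
      "-aa", "yes",
      "-aaVector", "yes",
      "-r", to_string(dpi),
      "-f", to_string(page),
      "-l", to_string(page),
      pdf_path
    ]

    case System.find_executable("pdftoppm") do
      nil ->
        {:error, :pdftoppm_not_found}

      executable ->
        # The port delivers pdftoppm's stdout as binaries plus its exit status.
        # stderr is not captured; it goes to the VM's own stderr.
        port = Port.open({:spawn_executable, executable}, [:binary, :exit_status, args: args])
        collect_output(port, output_file, [])
    end
  end

  defp collect_output(port, output_file, chunks) do
    receive do
      {^port, {:data, data}} ->
        # Collect chunks as iodata instead of copying a growing binary each time.
        collect_output(port, output_file, [chunks | data])

      {^port, {:exit_status, 0}} ->
        # Write the file only after pdftoppm has exited successfully.
        # Returns :ok or {:error, posix_reason}.
        File.write(output_file, chunks)

      {^port, {:exit_status, status}} ->
        {:error, {:exit_status, status}}
    after
      # Inactivity timeout: the 30 s wait restarts whenever a chunk arrives.
      30_000 ->
        Port.close(port)
        {:error, :timeout}
    end
  end
end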

Usage:

# The output directory must exist before writing into it.
File.mkdir_p!("output")

PdfToImage.convert_page(
  "input.pdf",
  1,
  "output/page-1.jpg",
  dpi: 200
)
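If you want pdftoppm's error text returned as a value rather than printed to the VM's stderr, one workaround is to start it through /bin/sh and let the shell hold on to stderr. The wrapper below is our own sketch, not part of Poppler or Elixir, and it assumes a POSIX sh. stdout passes through untouched, while stderr is kept in a shell variable. That text is printed on stdout only after a non-zero exit, when pdftoppm has not written image data:

```elixir
defmodule SeparatedCmd do
  # POSIX sh snippet: fd 3 is a copy of the port's stdout. The command's stdout
  # goes to fd 3 unchanged while its stderr is captured into $err. The captured
  # text is written to stdout only if the command failed; on success it is dropped.
  @wrapper ~S'exec 3>&1; err=$("$@" 2>&1 1>&3 3>&-); code=$?; if [ "$code" -ne 0 ]; then printf "%s" "$err"; fi; exit "$code"'

  # Returns {:ok, stdout} on exit status 0, {:error, status, stderr} otherwise,
  # or {:error, :timeout} if no message arrives within `timeout` milliseconds.
  def run(command, args, timeout \\ 30_000) do
    port =
      Port.open(
        {:spawn_executable, System.find_executable("sh")},
        [:binary, :exit_status, args: ["-c", @wrapper, "sh", command | args]]
      )

    collect(port, [], timeout)
  end

  defp collect(port, acc, timeout) do
    receive do
      {^port, {:data, data}} -> collect(port, [acc | data], timeout)
      {^port, {:exit_status, 0}} -> {:ok, IO.iodata_to_binary(acc)}
      {^port, {:exit_status, status}} -> {:error, status, IO.iodata_to_binary(acc)}
    after
      timeout ->
        Port.close(port)
        {:error, :timeout}
    end
  end
end
```

A conversion then becomes SeparatedCmd.run("pdftoppm", args), where args are the pdftoppm flags followed by the PDF path. The trade-off is an extra sh process per conversion. Warnings printed during a successful run are also discarded.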

Parallelizing page conversion

Because each page conversion is independent, this approach works well with Task.async_stream/3.

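File.mkdir_p!("output")
pages = 1..10

Task.async_stream(
  pages,
  fn page ->
    PdfToImage.convert_page(
      "input.pdf",
      page,
      "output/page-#{page}.jpg"
    )
  end,
  max_concurrency: System.schedulers_online(),
  # Each convert_page/4 call enforces its own timeout, so the stream waits indefinitely.
  timeout: :infinity
)
|> Enum.to_list()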

Each task spawns its own pdftoppm process, captures binary image data from stdout, and only writes a file once rendering succeeds.
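Task.async_stream/3 emits {:ok, result} for each task that returns. With on_timeout: :kill_task, it emits {:exit, :timeout} for a task that runs too long. Results come back in input order by default, so they can be zipped with the page list. The sketch below splits them into converted pages and failures. The module name, the option defaults and the convert_fun parameter are our own; convert_fun stands in for a call to PdfToImage.convert_page/4:

```elixir
defmodule PageBatch do
  # Runs convert_fun on every page concurrently. convert_fun is expected to
  # return :ok or {:error, reason}, like PdfToImage.convert_page/4.
  # Returns {converted_pages, [{page, reason}]}.
  def run(pages, convert_fun, opts \\ []) do
    pages
    |> Task.async_stream(convert_fun,
      max_concurrency: Keyword.get(opts, :max_concurrency, System.schedulers_online()),
      timeout: Keyword.get(opts, :timeout, :infinity),
      # A task exceeding :timeout is killed and reported as {:exit, :timeout}.
      on_timeout: :kill_task
    )
    # async_stream is ordered by default, so results line up with pages.
    |> Enum.zip(pages)
    |> Enum.reduce({[], []}, fn
      {{:ok, :ok}, page}, {ok, failed} -> {[page | ok], failed}
      {{:ok, {:error, reason}}, page}, {ok, failed} -> {ok, [{page, reason} | failed]}
      {{:exit, reason}, page}, {ok, failed} -> {ok, [{page, {:exit, reason}} | failed]}
    end)
    |> then(fn {ok, failed} -> {Enum.reverse(ok), Enum.reverse(failed)} end)
  end
end
```

The tasks are linked to the caller, so an exception inside convert_fun still crashes the caller. Task.Supervisor.async_stream_nolink/4 is the usual choice when that should not happen.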

Error handling characteristics

  • On success, only stdout is used and written to disk
  • On failure, no file is created
  • The returned error contains pdftoppm's exit status; its error messages go to the VM's stderr, because a Port does not capture stderr separately
  • This makes it suitable for background jobs and structured logging
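For example, a background job could log each result with the page number attached as Logger metadata. Here is a minimal sketch; ConversionLog is a hypothetical module. The :page metadata only appears in log output if your Logger formatter is configured to print it:

```elixir
defmodule ConversionLog do
  require Logger

  # Hypothetical helper: logs the outcome of one page conversion with the page
  # number as Logger metadata, then returns the result unchanged.
  def log_result(result, page) do
    case result do
      :ok ->
        Logger.info("converted PDF page", page: page)

      {:error, reason} ->
        Logger.error("PDF page conversion failed: #{inspect(reason)}", page: page)
    end

    result
  end
end
```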

Conclusion

By letting pdftoppm write image data to stdout and reading it through a Port, you decide when files are written, detect failures from the exit status, and can stop waiting for a stalled conversion. This avoids temporary files, keeps failure cases clean, and integrates well with Elixir’s concurrency primitives.