⚠️ This post links to an external website. ⚠️
In high-throughput data pipelines, the goal is to keep CPUs saturated while avoiding memory exhaustion.
Memory exhaustion happens when pile-ups occur. Without proper back-pressure, a data pipeline can enter an out-of-memory loop (affectionately, an "OOM loop") that makes recovery very difficult: the pipeline goes down because a pile-up caused a memory spike. When it comes back up, there's only more work to do, which means a bigger pile-up and a bigger memory spike.
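To make the idea of back-pressure concrete, here is a minimal sketch in Python. It is not GenStage's mechanism (GenStage has consumers send demand upstream); it swaps in a simpler technique, a bounded blocking queue, to show the same effect: the producer is forced to wait when the consumer falls behind, so the in-memory backlog cannot grow past a fixed size. The function name `run_pipeline` and the parameter `max_in_flight` are hypothetical, chosen only for illustration.

```python
import queue
import threading


def run_pipeline(items, max_in_flight=4):
    """Push items through a bounded buffer.

    Returns (results, peak), where peak is the largest buffer depth
    the producer observed. Hypothetical helper for illustration only.
    """
    # A bounded queue: put() blocks once max_in_flight items are waiting.
    buffer = queue.Queue(maxsize=max_in_flight)
    results = []
    peak = 0
    done = object()  # sentinel that tells the consumer to stop

    def producer():
        nonlocal peak
        for item in items:
            buffer.put(item)  # blocks while the consumer is behind
            peak = max(peak, buffer.qsize())
        buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                break
            results.append(item * 2)  # stand-in for real work

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, peak
```

With an unbounded queue, a slow consumer would let the backlog (and memory) grow with the size of the input. Here the backlog is capped at `max_in_flight` no matter how much work is waiting upstream, which is what keeps a restart from turning into a bigger pile-up.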
Elixir's GenStage provides a stellar example of how to build high-throughput data pipelines that saturate cores while keeping back-pressure in place. We've used it to great effect at Sequin. I explain the principles of its architecture below.
continue reading on blog.sequinstream.com
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email my personal email. To get new posts, subscribe using the RSS feed.