🔗 Kafka is fast -- I'll use Postgres

database postgresql reading-list

727 words, 4 min read

⚠️ This post links to an external website. ⚠️

I feel like the tech world lives in two camps.

One camp chases buzzwords.

This camp tends to adopt whatever’s popular without thinking hard about whether it’s appropriate. They tend to fall for all the purported benefits the sales pitch gives them - real-time, infinitely scale, cutting-edge, cloud-native, serverless, zero-trust, AI-powered, etc.

You see this everywhere in the Kafka world: Streaming Lakehouse™️, Kappa™️ Architecture, Streaming AI Agents1.

This phenomenon is sometimes known as resume-driven design. It can also be a scale cargo-cult. Modern practices actively encourage both. Consultants push “innovative architectures” stuffed with vendor tech via “insight” reports2. System design interviews expect you to design Google-scale architectures that are inevitably at a scale 100x higher than the company you’re interviewing for would ever need. Career progression rewards you for replatforming to the Hot New Stack™️, not for being resourceful.

The other camp chases common sense

This camp is far more pragmatic. They strip away unnecessary complexity and steer clear of overengineered solutions. They reason from first principles before making technology choices. They resist marketing hype and approach vendor claims with healthy skepticism.

Historically, it has felt like Camp 1 definitively held the upper hand in sheer numbers and noise. Today, it feels like the pendulum may be beginning to swing back, at least a tiny bit. Two recent trends are on the side of Camp 2:

Trend 1 - the “Small Data” movement. People are realizing two things - their data isn’t that big and their computers are becoming big too. You can rent a 128-core, 4 TB of RAM instance from AWS. AMD just released 192-core CPUs this summer. That ought to be enough for anybody.3

Trend 2 - the Postgres Renaissance. The space is seeing incredible growth and investment4. In the last 2 years, the phrase “Just Use Postgres (for everything)” has gained a ton of popularity. The basic premise is that you shouldn’t complicate things with new tech when you don’t need to, and that Postgres alone solves most problems pretty well. Postgres competes with purpose-built solutions like:

Elasticsearch (functionality supported by Postgres’ tsvector/tsquery)

MongoDB (jsonb)

Redis (CREATE UNLOGGED TABLE)

AI Vector Databases (pgvector, pgai)

Snowflake / OLAP (pg_lake, pg_duckdb, pg_mooncake)

and… Kafka (this blog).

The claim isn’t that Postgres is functionally equivalent to any of these specialized systems. The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)

When you combine the two trends, the appeal becomes obvious. Postgres is a battle-tested, well-known system that is simple, scalable and reliable. Pair it with today’s powerful hardware and you quickly begin to realize that, more often than not, you do not need the state-of-the-art highly optimized and complex distributed system in order to handle your organization’s scale.

Despite being somebody who is biased towards Kafka, I tend to agree. Kafka is similar to Postgres in that it’s stable, mature, battle-tested and boasts a strong community. It also scales a lot further. Despite that, I don’t think it’s the right choice for a lot of cases. Very often I see it get adopted where it doesn’t make sense.

A 500 KB/s workload should not use Kafka. There is a scalability cargo cult in tech that always wants to choose “the best possible” tech for a problem - but this misses the forest for the trees. The “best possible” solution frequently isn’t a technical question - it’s a practical one. Adriano makes an airtight case for why you should opt for simple tech in his PG as Queue blog (2023) that originally inspired me to write this.

Enough background. In this article, we will do three simple things:

Benchmark how far Postgres can scale for pub/sub messaging - # PG as a Pub/Sub

Benchmark how far Postgres can scale for queueing - # PG as a Queue

Concisely touch upon when Postgres can be a fit for these use cases - # Should You Use Postgres?

I am not aiming for an exhaustive in-depth evaluation. Benchmarks are messy af. Rather, my goal is to publish some reasonable data points which can start a discussion.

(while this article is for Postgres, feel free to replace it with your database of choice)

continue reading on topicpartition.io

If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email my personal email. To get new posts, subscribe use the RSS feed.