<p>If you're interested in database syncing and how to verify that two datastores actually agree, this one is a must-read:</p> <blockquote><p>A common problem you’ve almost certainly faced is to sync two datastores. This problem comes up in numerous shapes and forms: Receiving webhooks and writing them into your datastore, maintaining a materialized view, making sure a cache reflects reality, ensure documents make it from your source of truth to a search index, or your data from your transactional store to your data lake or column store.</p> <p>If you’ve built such a system, you’ve almost certainly seen B drift out of sync. Building a completely reliable syncing mechanism is difficult, but perhaps we can build a checksumming mechanism to check if the two datastores are equal in a few seconds?</p> <p>In this issue of napkin math, we look at implementing a solution to check whether A and B are in sync for 100M records in a few seconds. The key idea is to checksum an indexed updated_at column and use a binary search to drill down to the mismatching records. All of this will be explained in great detail, read on!</p> <p><a href="https://sirupsen.com/napkin/problem-14-using-checksums-to-verify/">Read more</a></p> </blockquote>
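<p>To make the key idea concrete, here is a minimal in-memory sketch of checksumming ranges and bisecting on a mismatch. It is not the article's implementation: the two datastores are modeled as plain dicts mapping an integer id to an <code>updated_at</code> value, and the helper names <code>range_checksum</code> and <code>find_mismatches</code> as well as the <code>leaf_size</code> parameter are made up for illustration. In a real system the per-range checksum would presumably be computed inside each datastore over the indexed column rather than in application code.</p>

```python
import hashlib


def range_checksum(rows, lo, hi):
    """Checksum the updated_at values for ids in [lo, hi).

    `rows` is a dict {id: updated_at}; a missing id hashes as None,
    so a record absent on one side also changes the checksum.
    """
    h = hashlib.md5()
    for row_id in range(lo, hi):
        h.update(f"{row_id}:{rows.get(row_id)};".encode())
    return h.hexdigest()


def find_mismatches(a, b, lo, hi, leaf_size=4):
    """Return the ids in [lo, hi) whose updated_at differs between a and b."""
    # Equal checksums: treat the whole range as in sync and stop here.
    if range_checksum(a, lo, hi) == range_checksum(b, lo, hi):
        return []
    # Small enough range: compare the records one by one.
    if hi - lo <= leaf_size:
        return [i for i in range(lo, hi) if a.get(i) != b.get(i)]
    # Otherwise split the range in half and drill into each side.
    mid = (lo + hi) // 2
    return (find_mismatches(a, b, lo, mid, leaf_size)
            + find_mismatches(a, b, mid, hi, leaf_size))


# Hypothetical data: datastore B has one stale record and one missing record.
a = {i: 1_000 + i for i in range(1_000)}
b = dict(a)
b[137] = 0      # stale updated_at
del b[842]      # record never made it to B

mismatched = find_mismatches(a, b, 0, 1_000)
print(mismatched)
```

<p>When the two sides agree, a single checksum over the full range settles it; the bisection only descends into ranges that disagree.</p>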
