Distributed Data Insights

Why Distributed Data Is Hard (and Worth It)

Sun, 07 Jun 2026 10:00:00 -0700

Every distributed data system is an answer to the same uncomfortable question: what do you do when part of your system fails but the rest keeps running? On a single machine, a crash takes everything down together — clean, if catastrophic. Across a network, failure is partial, ambiguous, and constant.

The three things the network takes away

When you split state across machines, you lose three guarantees you took for granted on a single box:

Quorums and Tunable Consistency

Fri, 05 Jun 2026 09:00:00 -0700

Many distributed databases — Dynamo-style stores like Cassandra and Riak — don’t make you choose consistency once and for all. They let you tune it per request using quorums. The mechanism is simple arithmetic with surprisingly deep consequences.

The setup

Each piece of data is replicated to N nodes. For every operation you pick:

W — how many replicas must acknowledge a write before it’s considered done.
R — how many replicas you read from and compare before returning.
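
To make the N, W, and R knobs concrete, here is a minimal sketch that checks by brute force whether every possible write set shares at least one replica with every possible read set. The function names are hypothetical and the model is deliberately simplified: it ignores sloppy quorums, hinted handoff, and concurrent writes, all of which complicate the picture in real Dynamo-style stores.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Brute force: does every write set of size w intersect every read set of size r?

    Replicas are modeled as the integers 0..n-1 (an illustrative assumption).
    """
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def overlap_by_arithmetic(n: int, w: int, r: int) -> bool:
    """The well-known shortcut: the sets must overlap when R + W > N."""
    return r + w > n


if __name__ == "__main__":
    n = 3
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            print(f"N={n} W={w} R={r} overlap={quorums_always_overlap(n, w, r)}")
```

With N = 3, choosing W = 2 and R = 2 always overlaps, while W = 1 and R = 1 does not, so a read at that setting can miss the latest acknowledged write.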

The headline rule:

LSM-Trees vs. B-Trees: Choosing Your Write Path

Tue, 02 Jun 2026 08:30:00 -0700

Underneath most databases is one of two storage engine designs: a B-tree or an LSM-tree. The choice shapes write throughput, read latency, space usage, and how the system behaves under pressure. It’s worth understanding why so many big distributed stores (Cassandra, HBase, ScyllaDB) and embedded engines such as RocksDB chose LSM-trees.

B-trees: update in place

A B-tree keeps data sorted in fixed-size pages and mutates pages in place. To update a key, you find its page and overwrite it. Reads are fast, since a point lookup typically touches only a handful of pages, and the structure has powered relational databases for decades.
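
The update-in-place idea can be sketched with a toy paged store. This is not a real B-tree: there is a single level of separator keys, no page splits, and nothing touches disk. The names `Page` and `PagedStore` and the default page size are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


class Page:
    """A fixed-capacity page holding sorted keys and their values."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values


class PagedStore:
    """Toy stand-in for B-tree leaf pages (assumes a non-empty initial dataset)."""

    def __init__(self, items: dict, page_size: int = 4):
        keys = sorted(items)
        self.pages = [
            Page(keys[i:i + page_size], [items[k] for k in keys[i:i + page_size]])
            for i in range(0, len(keys), page_size)
        ]
        # The first key of each page plays the role of an inner node's separators.
        self.first_keys = [page.keys[0] for page in self.pages]

    def _locate(self, key):
        """Return (page, slot) for key, or (page, None) if the key is absent."""
        page_idx = max(bisect_right(self.first_keys, key) - 1, 0)
        page = self.pages[page_idx]
        slot = bisect_left(page.keys, key)
        if slot < len(page.keys) and page.keys[slot] == key:
            return page, slot
        return page, None

    def get(self, key):
        page, slot = self._locate(key)
        return None if slot is None else page.values[slot]

    def update(self, key, value) -> bool:
        """Overwrite an existing key in place; inserts and splits are omitted."""
        page, slot = self._locate(key)
        if slot is None:
            return False
        page.values[slot] = value  # the page is mutated, not rewritten elsewhere
        return True


if __name__ == "__main__":
    store = PagedStore({k: f"v{k}" for k in range(10)})
    store.update(5, "new")
    print(store.get(5))
```

The point of the sketch is that an update finds one page and changes it where it lives; the number of pages and their layout stay the same.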

About

About me

I’m Dikang Gu, a distributed systems engineer who has spent his career building the data infrastructure behind some of the largest consumer platforms in the world.

I’m currently a Technical Director at Roblox, where I lead the OLTP database and storage platform that provides data storage and serving for Roblox’s hundreds of millions of monthly users and millions of creators. The work centers on fault-tolerant distributed storage, smart partitioning, rock-solid replication, and squeezing every drop of performance out of planet-scale systems.