<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Distributed Data Insights</title><link>http://ddinsights.net/</link><description>Recent content on Distributed Data Insights</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 07 Jun 2026 10:00:00 -0700</lastBuildDate><atom:link href="http://ddinsights.net/index.xml" rel="self" type="application/rss+xml"/><item><title>Why Distributed Data Is Hard (and Worth It)</title><link>http://ddinsights.net/posts/why-distributed-data-is-hard/</link><pubDate>Sun, 07 Jun 2026 10:00:00 -0700</pubDate><guid>http://ddinsights.net/posts/why-distributed-data-is-hard/</guid><description>&lt;p>Every distributed data system is an answer to the same uncomfortable question:
&lt;em>what do you do when part of your system fails but the rest keeps running?&lt;/em> On a
single machine, a crash takes everything down together — clean, if catastrophic.
Across a network, failure is &lt;strong>partial&lt;/strong>, &lt;strong>ambiguous&lt;/strong>, and &lt;strong>constant&lt;/strong>.&lt;/p>
&lt;h2 id="the-three-things-the-network-takes-away">The three things the network takes away&lt;/h2>
&lt;p>When you split state across machines, you lose three guarantees you took for
granted on a single box:&lt;/p></description></item><item><title>Quorums and Tunable Consistency</title><link>http://ddinsights.net/posts/quorums-and-tunable-consistency/</link><pubDate>Fri, 05 Jun 2026 09:00:00 -0700</pubDate><guid>http://ddinsights.net/posts/quorums-and-tunable-consistency/</guid><description>&lt;p>Many distributed databases — Dynamo-style stores like Cassandra and Riak — don&amp;rsquo;t
make you choose consistency once and for all. They let you tune it &lt;strong>per request&lt;/strong>
using quorums. The mechanism is simple arithmetic with surprisingly deep
consequences.&lt;/p>
&lt;h2 id="the-setup">The setup&lt;/h2>
&lt;p>Each piece of data is replicated to &lt;strong>N&lt;/strong> nodes. For every operation you pick:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>W&lt;/strong> — how many replicas must acknowledge a &lt;em>write&lt;/em> before it&amp;rsquo;s considered done.&lt;/li>
&lt;li>&lt;strong>R&lt;/strong> — how many replicas you read from and compare before returning.&lt;/li>
&lt;/ul>
&lt;p>The headline rule:&lt;/p></description></item><item><title>LSM-Trees vs. B-Trees: Choosing Your Write Path</title><link>http://ddinsights.net/posts/lsm-trees-vs-btrees/</link><pubDate>Tue, 02 Jun 2026 08:30:00 -0700</pubDate><guid>http://ddinsights.net/posts/lsm-trees-vs-btrees/</guid><description>&lt;p>Underneath nearly every database is one of two storage engines: a &lt;strong>B-tree&lt;/strong> or an
&lt;strong>LSM-tree&lt;/strong>. The choice shapes write throughput, read latency, space usage, and
how the system behaves under pressure. It&amp;rsquo;s worth understanding &lt;em>why&lt;/em> so many
write-heavy systems (Cassandra, HBase, ScyllaDB, and the embedded engine RocksDB) chose
LSM-trees.&lt;/p>
&lt;h2 id="b-trees-update-in-place">B-trees: update in place&lt;/h2>
&lt;p>A B-tree keeps data sorted in fixed-size pages and &lt;strong>mutates pages in place&lt;/strong>. To
update a key, you find its page and overwrite it. Reads are fast, typically
a handful of page lookups, and the structure has powered relational databases for
decades.&lt;/p></description></item><item><title>About</title><link>http://ddinsights.net/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://ddinsights.net/about/</guid><description>&lt;h2 id="about-me">About me&lt;/h2>
&lt;p>I&amp;rsquo;m &lt;strong>Dikang Gu&lt;/strong>, a distributed systems engineer. I&amp;rsquo;ve spent my career building
the data infrastructure behind some of the largest consumer platforms in the world.&lt;/p>
&lt;p>I&amp;rsquo;m currently a &lt;strong>Technical Director at Roblox&lt;/strong>, where I lead the OLTP database and
storage platform that provides data storage and serving for Roblox&amp;rsquo;s hundreds of
millions of monthly users and millions of creators. The work centers on
fault-tolerant distributed storage, smart partitioning, rock-solid replication, and
squeezing every drop of performance out of planet-scale systems.&lt;/p></description></item></channel></rss>