DDIA: Chapter 6 Partitioning

The main reason for wanting to partition data is scalability. Normally, partitions are defined in such a way that each piece of data (each record, row, or document) belongs to exactly one partition.

Partitioning and Replication

Partitioning is usually combined with replication so that copies of each partition are stored on multiple nodes. This means that, even though each record belongs to exactly one partition, it may still be stored on several different nodes for fault tolerance. ...
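A minimal sketch of that idea: each record hashes to exactly one partition, and the partition itself is copied to several nodes. The node names, partition count, and hash-mod scheme below are illustrative assumptions, not how any particular database assigns partitions.

```python
import hashlib

NUM_PARTITIONS = 8
NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical cluster
REPLICATION_FACTOR = 2                              # copies kept per partition

def partition_for(key: str) -> int:
    """Each record belongs to exactly one partition (hash of its key)."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def replicas_for(partition: int) -> list[str]:
    """The same partition is stored on several nodes for fault tolerance."""
    start = partition % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

p = partition_for("user:42")
print(p, replicas_for(p))   # one partition, multiple replica nodes
```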

October 24, 2023 · Last updated on August 1, 2025 · 7 min · KKKZOZ

DDIA: Chapter 5 Replication

Replication Versus Partitioning

There are two common ways data is distributed across multiple nodes:

Replication: Keeping a copy of the same data on several different nodes, potentially in different locations. Replication provides redundancy and can also help improve performance.

Partitioning: Splitting a big database into smaller subsets called partitions so that different partitions can be assigned to different nodes (also known as sharding).

These are separate mechanisms, but they often go hand in hand: ...

October 23, 2023 · Last updated on August 1, 2025 · 14 min · KKKZOZ

DDIA: Chapter 4 Encoding and Evolution

Formats for Encoding Data

Two kinds of compatibility are introduced here; both come up again when the individual data encoding formats are analyzed later:

In order for the system to continue running smoothly, we need to maintain compatibility in both directions: Backward compatibility: Newer code can read data that was written by older code. Forward compatibility: Older code can read data that was written by newer code.

A literal Chinese translation is problematic: English keeps "forward/backward" consistent between time and space, whereas Chinese reverses them. "Forward" means moving ahead in space and the future in time, but the Chinese "前" points ahead in space yet to the past in time. Backward compatibility is easy to grasp: newer software/hardware can use data produced by older software/hardware. Translating forward compatibility as 向前兼容 is very confusing; it helps to read it as "compatible with the future": older software/hardware can use data produced by newer software/hardware.

A few examples: Intel's x86 CPUs are backward compatible, because a new CPU can still run old software. Intel guarantees that every instruction present in an older CPU is kept in newer ones; this add-only, never-remove policy means that upgrading a CPU does not force us to replace much of our software. ...
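As a concrete illustration of the two directions, here is a small sketch using plain dictionaries; the record shape and field names (name, email) are made up for this example and are not tied to any particular encoding format.

```python
def decode_v1(record: dict) -> dict:
    """Old code: only knows 'name'. Forward compatibility means it can still
    read data written by newer code, by ignoring fields it does not know."""
    return {"name": record["name"]}

def decode_v2(record: dict) -> dict:
    """New code: adds 'email'. Backward compatibility means it can still read
    data written by old code, by giving the new field a default."""
    return {"name": record["name"], "email": record.get("email")}

old_record = {"name": "Alice"}                     # written by old code
new_record = {"name": "Bob", "email": "b@x.org"}   # written by new code

print(decode_v2(old_record))   # backward: new code reads old data
print(decode_v1(new_record))   # forward: old code reads new data
```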

October 21, 2023 · Last updated on August 1, 2025 · 9 min · KKKZOZ

DDIA: Chapter 2 Data Models and Query Languages

Relational Model Versus Document Model

It starts with the birth of NoSQL:

There are several driving forces behind the adoption of NoSQL databases, including: a need for greater scalability than relational databases can easily achieve, including very large datasets or very high write throughput; a widespread preference for free and open source software over commercial database products; specialized query operations that are not well supported by the relational model; and frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model.

The résumé in the figure below is then used to illustrate the one-to-many relationship ...
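A small sketch of what that one-to-many structure looks like as a self-contained document, loosely following the book's résumé example; the field names and values here are only illustrative.

```python
# One résumé document containing its own one-to-many data (positions,
# education) as nested lists, instead of rows spread across several tables.
resume = {
    "user_id": 251,
    "first_name": "Bill",
    "last_name": "Gates",
    "positions": [                      # one résumé -> many positions
        {"job_title": "Co-chair", "organization": "Gates Foundation"},
        {"job_title": "Co-founder", "organization": "Microsoft"},
    ],
    "education": [                      # one résumé -> many schools
        {"school_name": "Harvard University", "start": 1973, "end": 1975},
    ],
}
```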

October 20, 2023 · Last updated on August 1, 2025 · 7 min · KKKZOZ

DDIA: Chapter 3 Storage and Retrieval

This chapter covers the lower-level internals of databases.

In order to tune a storage engine to perform well on your kind of workload, you need to have a rough idea of what the storage engine is doing under the hood.

Data Structures That Power Your Database

Index

Any kind of index usually slows down writes, because the index also needs to be updated every time data is written. This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but every index slows down writes. ...
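A toy sketch of that trade-off, in the spirit of the chapter's simplest example (an append-only log plus an in-memory hash index); the class and its structure are invented for illustration, not taken from the book.

```python
class TinyStore:
    def __init__(self):
        self.log: list[tuple[str, str]] = []   # append-only "data file"
        self.index: dict[str, int] = {}        # key -> offset in the log

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))          # the write itself
        self.index[key] = len(self.log) - 1    # extra work: keep the index updated

    def get(self, key: str) -> str | None:
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

store = TinyStore()
store.put("k1", "v1")
store.put("k1", "v2")   # later write wins; the index points at the newest offset
print(store.get("k1"))  # "v2": fast point read, at the cost of slower writes
```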

October 20, 2023 · Last updated on August 1, 2025 · 9 min · KKKZOZ

Cascade Speculative Drafting for Even Faster LLM Inference

Extensive Reading

Author Info

Background

While speculative decoding improves latency by using a smaller draft model to generate tokens for a larger target model, it suffers from two specific bottlenecks:

Autoregressive Drafting: The draft model itself generates tokens autoregressively (one by one), which is still computationally expensive and slow.

Inefficient Time Allocation: Standard methods allocate equal time to generating every draft token. However, tokens later in the sequence have a significantly lower probability of acceptance, so spending the same computational resources on these “high-rejection” tokens is inefficient.

Insights

The autoregressive process of the draft model is the bottleneck: use a draft model to accelerate the draft model itself (Vertical Cascade).

Tokens later in the sequence have a lower probability of acceptance: use a faster, lighter draft model later in the sequence (Horizontal Cascade).

Challenges

Approaches ...
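A schematic sketch of the horizontal-cascade idea only: later draft positions get cheaper draft models because they are less likely to be accepted. The model names, the draft/accept stubs, and the schedule below are placeholders, not the paper's actual implementation.

```python
import random

def draft_token(model_name: str, prefix: list[str]) -> str:
    return f"<{model_name}:tok{len(prefix)}>"   # stand-in for a real forward pass

def target_accepts(token: str) -> bool:
    return random.random() < 0.7                # stand-in for rejection sampling

def cascade_draft(prefix: list[str], schedule: list[tuple[str, int]]) -> list[str]:
    """schedule = [(draft_model, num_tokens), ...], heavier models first."""
    drafted = []
    for model_name, n in schedule:
        for _ in range(n):
            drafted.append(draft_token(model_name, prefix + drafted))
    return drafted

drafted = cascade_draft(["Hello"], [("draft-large", 3), ("draft-small", 5)])
accepted = []
for tok in drafted:                 # the target model verifies left to right
    if not target_accepts(tok):
        break
    accepted.append(tok)
print(accepted)
```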

February 10, 2026 · Last updated on February 10, 2026 · 4 min · KKKZOZ

3-Model Speculative Decoding

Extensive Reading

Author Info

Background

The Accuracy-Speed Trade-off: The effectiveness of SD is limited by a fundamental trade-off: very small draft models are fast but often diverge from the target model’s distribution, leading to low acceptance rates. Conversely, larger draft models have higher acceptance rates but are too slow to provide significant speedups.

Limitations of Single-Stage Verification: As the performance gap between the draft and target models widens, the output distributions diverge significantly, diminishing the acceleration gains. Even relaxed verification methods like Fuzzy Speculative Decoding struggle to bridge large distributional gaps between a tiny draft model and a massive target model in a single step.

Insights

The authors propose Pyramid Speculative Decoding, which inserts an intermediate “Qualifier Model” between the small Draft and the large Target. This creates a hierarchical pipeline that bridges the “distributional gap” between the small and large models. ...
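A schematic sketch of that hierarchy: the draft proposes tokens, the mid-sized qualifier filters them, and only qualifier-approved tokens reach the expensive target model. All three "models" below are random placeholders standing in for real acceptance tests, not the paper's method.

```python
import random

def draft_propose(n: int) -> list[str]:
    return [f"d{i}" for i in range(n)]   # placeholder draft tokens

def qualifier_accepts(tok: str) -> bool:
    return random.random() < 0.8         # cheap intermediate check

def target_accepts(tok: str) -> bool:
    return random.random() < 0.9         # expensive final verification

def pyramid_step(num_draft_tokens: int) -> list[str]:
    accepted = []
    for tok in draft_propose(num_draft_tokens):
        if not qualifier_accepts(tok):   # stage 1: qualifier bridges the gap
            break
        if not target_accepts(tok):      # stage 2: target has the final word
            break
        accepted.append(tok)
    return accepted

print(pyramid_step(8))
```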

February 9, 2026 · Last updated on February 9, 2026 · 3 min · KKKZOZ

LayerSkip Enabling Early Exit Inference and Self-Speculative Decoding

Extensive Reading

Author Info

Background

Early Exit (Dynamic Halting): These techniques attempt to stop the forward pass at an intermediate layer if the model is sufficiently confident in the prediction. Problems: in standard LLMs, early layers are “lazy” (not trained to produce final tokens), leading to severe accuracy drops; furthermore, these methods typically require adding and training auxiliary “exit heads,” which increases parameter overhead.

Layer Pruning and Dropout: Existing research has explored skipping layers (dropout) during training to make sub-networks robust, or pruning layers post-training for speed. Problems: standard uniform layer dropout does not specifically incentivize early layers to be accurate, and post-training pruning often results in performance degradation that requires complex fine-tuning to recover.

Insights

Accelerate Large Language Model (LLM) inference by enabling the model to generate tokens using fewer layers when possible, while maintaining accuracy. ...
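A schematic sketch of the self-speculative angle: the "draft model" is the same network truncated at an early exit layer, so no separate draft network is needed, and the full depth only runs to verify. Everything below (the toy layers, shared head, and exit depth) is a stand-in, not the paper's implementation.

```python
def predict(hidden: int) -> str:
    return f"tok{hidden % 5}"                  # placeholder for a shared LM head

def forward(x: int, num_layers: int) -> int:
    for i in range(num_layers):                # toy stand-in for transformer layers
        x = x * 2 + i
    return x

TOTAL_LAYERS, EXIT_LAYER = 12, 4

def self_speculative_step(x: int) -> tuple[str, bool]:
    draft_tok = predict(forward(x, EXIT_LAYER))     # cheap early-exit draft
    final_tok = predict(forward(x, TOTAL_LAYERS))   # full model verifies the draft
    return final_tok, draft_tok == final_tok        # accept iff they agree

print(self_speculative_step(3))
```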

February 9, 2026 · Last updated on February 9, 2026 · 3 min · KKKZOZ