Paper Note: How to Read a Paper

The Three-Pass Approach Each pass accomplishes specific goals and builds upon the previous pass: The first pass gives you a general idea about the paper. The second pass lets you grasp the paper’s content, but not its details. The third pass helps you understand the paper in depth. The First Pass A quick scan to get a bird’s-eye view of the paper. This pass should take about five to ten minutes and consists of the following steps: ...

December 5, 2023 · Last updated on April 5, 2025 · 3 min · KKKZOZ

Paper Note: Ad Hoc Transactions in Web Applications: The Good, the Bad, and the Ugly

这篇文章讲道理不应该写这么长的,但讲的东西比较“好玩”,于是就多记录了一些。 FAQ What is an ad hoc transaction? It refers to database operations coordinated by application code. Specifically, developers might explicitly use locking primitives and validation procedures to implement concurrency control amid the application code to coordinate critical database operations. Why do not use database transactions instead? For flexibility or efficiency. 在一般的数据库使用场景下,伴随着数据库的隔离级别提升,性能下降十分严重,为此,应用层临时事务需要做到既利用低隔离级别的数据库防止性能下降,又要实现应用层的事务机制防止数据一致性错误等问题。 What is a false conflict in database systems? In database systems, a false conflict, also known as a phantom conflict, occurs when a transaction appears to conflict with other transactions, but in reality, it does not. This can happen in situations where transactions are using optimistic concurrency control or multi-version concurrency control mechanisms. ...

November 30, 2023 · Last updated on August 1, 2025 · 9 min · KKKZOZ

Paper Note: Cobra: Making Transactional Key-Value Stores Verifiably Serializable

Analyze 这篇论文关注在如何使用黑盒的方式验证键值存储的的可串行化。 Background 如今许多客户选择使用云数据库提供的键值存储服务,客户程序的正确性受到云数据库的正确性的影响,云数据库的正确性常常通过可串行化来定义,即客户的事务仿佛以串行的方式执行,那么云数据库是否符合了可串行化的约束?这个问题有几个挑战,一方面数据库是黑盒的,我们无法得到数据库的代码,只能分析数据库的行为,即输入和输出,另一方面需要在数据库不断运行中,同步验证其是否符合可串行化的要求,这需要验证手段高效并具有可扩展性。 这篇论文的直觉来源于 SMT solver 以及计算能力的进步,认为其足以自动化的验证可串行化的问题,于是他们基于 SMT solver 提出 Cobra 框架,Cobra 包含一系列技术,做到了高效可扩展的验证可串行化,实验表明 Cobra 能验证实际场景下数据库的可串行化。 Structures Each client request is one of five operations: start, commit, abort (which refer to transactions), and read and write (which refer to keys). History collectors sit between clients and the database, capturing the requests that clients issue and the (possibly wrong) results delivered by the database. A verifier retrieves history fragments from collectors and attempts to verify whether the history is serializable. ...

November 23, 2023 · Last updated on August 1, 2025 · 2 min · KKKZOZ

Paper Note: Epoxy: ACID Transactions Across Diverse Data Stores

Summary 一句话总结,就是:Re-implement the multi-version concurrency control mechanism of Postgres on shim layers. 因为这篇文章在组会上做了汇报,所以我就直接贴 PPT 了。 Content

November 22, 2023 · Last updated on August 1, 2025 · 1 min · KKKZOZ

Paper Note: Zab: High-performance broadcast for primary-backup systems

FAQ What is the difference between receive and deliver? What does it mean by saying “Zab’s transaction log doubles as the database write-ahead transaction log” in page 3? ZooKeeper uses an in-memory database and stores transaction logs (Write-ahead log) and periodic snapshots on disk. Before a transaction is executed and its changes are applied to the in-memory database, it is first logged. This means that if the system crashes before the changes can be applied, the transaction can be replayed from the log to ensure data integrity. ...

November 15, 2023 · Last updated on August 1, 2025 · 4 min · KKKZOZ

Paper Note: CAP Twelve Years Later: How the "Rules" have Changed

FAQ what is a version vector? A version vector is a construct used in distributed systems to track the version of data across different nodes in a network, ensuring consistency and helping to resolve conflicts. Version vectors are particularly useful in systems where multiple nodes may independently modify data and then need to synchronize with each other without relying on a central authority. This concept is fundamental in the context of eventual consistency and conflict resolution in distributed databases, file systems, and data replication scenarios. ...

November 12, 2023 · Last updated on August 1, 2025 · 8 min · KKKZOZ

Paper Note: Chain Replication

FAQ Is chain replication used in practice over other things like Raft or Paxos? Systems often use both. A common way of building distributed systems is to use a configuration server (called the master in the paper) for maintaining configuration info (e.g., who is primary) and a replication system for replicating data. Paxos/Raft are commonly-used to build the configuration server while the replication system often uses primary-backup or chain replication. ...

November 6, 2023 · Last updated on August 1, 2025 · 5 min · KKKZOZ

Paper Note: Time, clocks, and the ordering

Happened Before 关系 论文先从事件之间的 Happened Before 关系开始讲起 结合上图我们举例解释一下「Happened Before」关系: 同一进程内部先后发生的两个事件之间,具有「Happened Before」关系。比如,在进程 Q 内部,q_2 表示一个消息接收事件,q_4表示另一个消息的发送事件,q_2 排在 q_4 前面执行,所以 q_2 → q_4。 同一个消息的发送事件和接收事件,具有「Happened Before」关系。比如,p_1 和 q_2分别表示同一个消息的发送事件和接收事件,所以 p_1→ q_2;同理, q_4→ r_3。 「Happened Before」满足传递关系。比如,由 p_1→ q_2, q_2→_q_4和 q_4→ r_3,可以推出 p_1 → r_3。 作者然后说明了并发的概念: Two distinct events a and b are said to be concurrent if a !→ b and b !→ a. 同时作者也描述了与因果性的关系: Another way of viewing the definition is to say that a→b means that it is possible for event a to causally affect event b. Two events are concurrent if neither can causally affect the other. ...

November 4, 2023 · Last updated on August 1, 2025 · 3 min · KKKZOZ