Paper Note

Paper Note: Cobra: Making Transactional Key-Value Stores Verifiably Serializable

Analyze 这篇论文关注在如何使用黑盒的方式验证键值存储的的可串行化。 Background 如今许多客户选择使用云数据库提供的键值存储服务，客户程序的正确性受到云数据库的正确性的影响，云数据库的正确性常常通过可串行化来定义，即客户的事务仿佛以串行的方式执行，那么云数据库是否符合了可串行化的约束？这个问题有几个挑战，一方面数据库是黑盒的，我们无法得到数据库的代码，只能分析数据库的行为，即输入和输出，另一方面需要在数据库不断运行中，同步验证其是否符合可串行化的要求，这需要验证手段高效并具有可扩展性。这篇论文的直觉来源于 SMT solver 以及计算能力的进步，认为其足以自动化的验证可串行化的问题，于是他们基于 SMT solver 提出 Cobra 框架，Cobra 包含一系列技术，做到了高效可扩展的验证可串行化，实验表明 Cobra 能验证实际场景下数据库的可串行化。 Structures Each client request is one of five operations: start, commit, abort (which refer to transactions), and read and write (which refer to keys). History collectors sit between clients and the database, capturing the requests that clients issue and the (possibly wrong) results delivered by the database. A verifier retrieves history fragments from collectors and attempts to verify whether the history is serializable. ...

Paper Note: Epoxy: ACID Transactions Across Diverse Data Stores

Summary 一句话总结，就是：Re-implement the multi-version concurrency control mechanism of Postgres on shim layers. 因为这篇文章在组会上做了汇报，所以我就直接贴 PPT 了。 Content

Paper Note: Zab: High-performance broadcast for primary-backup systems

FAQ What is the difference between receive and deliver? What does it mean by saying “Zab’s transaction log doubles as the database write-ahead transaction log” in page 3? ZooKeeper uses an in-memory database and stores transaction logs (Write-ahead log) and periodic snapshots on disk. Before a transaction is executed and its changes are applied to the in-memory database, it is first logged. This means that if the system crashes before the changes can be applied, the transaction can be replayed from the log to ensure data integrity. ...

Paper Note: CAP Twelve Years Later: How the "Rules" have Changed

FAQ what is a version vector? A version vector is a construct used in distributed systems to track the version of data across different nodes in a network, ensuring consistency and helping to resolve conflicts. Version vectors are particularly useful in systems where multiple nodes may independently modify data and then need to synchronize with each other without relying on a central authority. This concept is fundamental in the context of eventual consistency and conflict resolution in distributed databases, file systems, and data replication scenarios. ...

Paper Note: Chain Replication

FAQ Is chain replication used in practice over other things like Raft or Paxos? Systems often use both. A common way of building distributed systems is to use a configuration server (called the master in the paper) for maintaining configuration info (e.g., who is primary) and a replication system for replicating data. Paxos/Raft are commonly-used to build the configuration server while the replication system often uses primary-backup or chain replication. ...

Paper Note: Time, clocks, and the ordering

Happened Before 关系论文先从事件之间的 Happened Before 关系开始讲起结合上图我们举例解释一下「Happened Before」关系：同一进程内部先后发生的两个事件之间，具有「Happened Before」关系。比如，在进程 Q 内部，q_2 表示一个消息接收事件，q_4表示另一个消息的发送事件，q_2 排在 q_4 前面执行，所以 q_2 → q_4。同一个消息的发送事件和接收事件，具有「Happened Before」关系。比如，p_1 和 q_2分别表示同一个消息的发送事件和接收事件，所以 p_1→ q_2；同理， q_4→ r_3。「Happened Before」满足传递关系。比如，由 p_1→ q_2， q_2→_q_4和 q_4→ r_3，可以推出 p_1 → r_3。作者然后说明了并发的概念： Two distinct events a and b are said to be concurrent if a !→ b and b !→ a. 同时作者也描述了与因果性的关系： Another way of viewing the definition is to say that a→b means that it is possible for event a to causally affect event b. Two events are concurrent if neither can causally affect the other. ...

Paper Note: Chubby

FAQ What is an advisory lock? An “advisory lock” is simply a tool/API provided by Postgres to create arbitrary locks that can be acquired by applications. These locks, however, are not enforced in any meaningful way by the database – it’s up to application code to give them meaning (the same way any other non-database distributed lock would work). What is a write-through cache? A write-through cache is a caching strategy where data is simultaneously written into the cache and the corresponding database or backing store. ...

Paper Note: BigTable

FAQ What is a single-row transaction? 在 Bigtable 变更（例如读取、写入和删除请求）中，行级更改始终属于原子操作。这包括对单行中的多个列进行的变更，前提是它们包含在同一变更操作中。Bigtable 不支持以原子方式更新多个行的事务。但是，Bigtable 支持某些需要在其他数据库中执行事务的写入操作。实际上，Bigtable 使用单行事务来完成这些操作。这些操作包括读取和写入操作，且所有读取和写入均以原子方式执行，但仍然只是在行级的原子操作：读取-修改-写入 (Read-modify-write) 操作（包括增量和附加）。在读取-修改-写入 (read-modify-write) 操作中，Cloud Bigtable 会读取现有值，对现有值进行增量或附加操作，并将更新后的值写入表中。检查并更改 (Check-and-mutate) 操作（也称为条件更改或条件写入）。在检查并更改 (check-and-mutate) 操作中，Bigtable 会对行进行检查以了解其是否符合指定条件。如果符合条件，Bigtable 则会将新值写入该行中。 What is SSTable format? SSTable (Sorted Strings Table) is a file format used in Apache Cassandra, a popular NoSQL database. An SSTable is a data structure that provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte streams. It contains a series of key-value pairs sorted by keys, which enables efficient lookup and range queries. It’s used to store and retrieve data in a highly optimized manner. The SSTable also supports internal indexing, which makes accessing a particular data point faster. ...