KTransformers Unleashing the Full Potential of CPU GPU Hybrid Inference for MoE Models

Extensive Reading Author Info Hongtao Chen | MADSys Weiyu Xie | MADSys Boxin Zhang | MADSys Background MoE LLMs and hybrid setups Modern MoE models (DeepSeek, Qwen-MoE, etc.) are huge but activate few experts per token. On single-GPU or low-concurrency setups, we naturally pair a small GPU with a big CPU + large DRAM. Limitations of current hybrid / offloading systems Tools like Fiddler or basic offloading keep attention on the GPU and push experts or layers to the CPU. The CPU becomes the bottleneck; generic AMX/AVX-512 kernels are far from peak, and the GPU often waits on the CPU. Hardware inefficiencies on CPU and NUMA Poor weight layouts and scheduling starve caches and AMX units. Multi-socket (NUMA) machines suffer from cross-socket memory traffic and weak scaling. Crude accuracy–latency tradeoffs in MoE Existing accelerations often reduce or skip experts (smaller top-k, pruning). These approaches speed up inference but can noticeably hurt accuracy. There are two major inefficiencies: ...
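To make the hybrid placement concrete, here is a minimal PyTorch sketch (not KTransformers' actual code) of an MoE layer whose attention and router run on the GPU while the experts stay in CPU DRAM and only the active top-k experts run per token; module names, sizes, and the omitted gate weighting are simplifications, and a CUDA device is assumed.

```python
# Hypothetical hybrid MoE layer: attention + routing on GPU, experts on CPU.
# Shapes, names, and the missing gate weighting are illustrative only.
import torch
import torch.nn as nn

class HybridMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=64, top_k=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=16, batch_first=True).cuda()
        self.router = nn.Linear(d_model, n_experts).cuda()
        # Experts live in host DRAM; only a few are touched per token.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )  # stays on CPU
        self.top_k = top_k

    @torch.no_grad()
    def forward(self, x_gpu):
        h, _ = self.attn(x_gpu, x_gpu, x_gpu)               # GPU: attention
        topk = self.router(h).topk(self.top_k, dim=-1).indices   # GPU: routing, (B, T, k)
        h_cpu = h.to("cpu")                                  # ship activations, not weights
        out = torch.zeros_like(h_cpu)
        for e in topk.unique().tolist():                     # CPU: run only active experts
            mask = (topk == e).any(dim=-1).cpu()
            out[mask] += self.experts[e](h_cpu[mask])        # gate weights omitted for brevity
        return x_gpu + out.to(x_gpu.device)
```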

November 17, 2025 · Last updated on November 17, 2025 · 5 min · KKKZOZ

QServe W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Extensive Reading Author Info MIT HAN Lab Background Common quantization formats for LLMs: W8A8: 8-bit weights, 8-bit activations – almost lossless, widely deployed. W4A16: 4-bit weights, 16-bit activations – also near-lossless; good for weight memory. W4A4: 4-bit weights and activations – more aggressive, but accuracy drops and real GPU speedups are disappointing. On data center GPUs (A100, L40S), 4-bit quantization often underperforms because: Dequantization of weights or partial sums runs on slow CUDA cores, not fast tensor cores. For W4A4 systems like Atom and QuaRot, 20–90% of runtime can be eaten by dequantization in the main GEMM loop. To achieve reasonable accuracy, W4A4 must apply per-group quantization, which is finer than per-channel quantization – sharing FP16 scaling factors on a sub-channel basis ...
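As a rough illustration of the per-group scheme mentioned at the end, the sketch below quantizes a weight matrix once with one scale per output channel and once with one FP16 scale per 128-element sub-channel group; the group size and shapes are illustrative and not QServe's exact W4A8KV4 layout.

```python
# Per-channel vs. per-group symmetric 4-bit weight quantization (illustrative).
import torch

def quantize_per_channel(w: torch.Tensor, n_bits: int = 4):
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax          # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale.half()

def quantize_per_group(w: torch.Tensor, n_bits: int = 4, group_size: int = 128):
    out_ch, in_ch = w.shape
    wg = w.reshape(out_ch, in_ch // group_size, group_size)
    qmax = 2 ** (n_bits - 1) - 1
    scale = wg.abs().amax(dim=-1, keepdim=True) / qmax        # one FP16 scale per group
    q = torch.clamp(torch.round(wg / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale.half()

w = torch.randn(4096, 4096)
q_c, s_c = quantize_per_channel(w)
q_g, s_g = quantize_per_group(w)
err_c = (q_c.float() * s_c.float() - w).abs().mean()
err_g = ((q_g.float() * s_g.float()).reshape(4096, 4096) - w).abs().mean()
print(f"mean abs error  per-channel: {err_c:.4f}  per-group: {err_g:.4f}")
```

Finer groups track local magnitude better, which is why the per-group reconstruction error comes out lower, at the cost of storing and dequantizing many more FP16 scales.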

November 16, 2025 · Last updated on November 17, 2025 · 7 min · KKKZOZ

SmoothQuant Accurate and Efficient Post-Training Quantization for Large Language Models

Extensive Reading Author Info MIT HAN Lab Background Modern large language models (LLMs) are extremely costly to serve in FP16 because of their massive parameter counts and long-context workloads; while low-bit quantization (especially INT8) is an attractive way to cut memory and latency, naïve post-training W8A8 (8-bit weights and activations) breaks down on large models due to severe activation outliers that cause large accuracy drops. Existing INT8 solutions either focus on weights only (e.g., GPTQ-style methods) or handle activation outliers with mixed precision (e.g., LLM.int8(), outlier-aware kernels); these approaches can preserve accuracy but often bring limited end-to-end gains because they leave activations/KV caches in higher precision, rely on complex custom kernels, or end up slower than plain FP16 in practical deployments. ...
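A tiny numeric example (made-up values, not from the paper) of why those activation outliers break naive W8A8: a single large channel dictates the per-tensor INT8 scale, so the remaining activations collapse onto only a few integer levels.

```python
# One outlier channel forces a huge per-tensor INT8 scale; the "normal"
# activations then quantize onto a handful of levels. Toy data.
import torch

x = torch.randn(1, 512) * 0.5       # typical activations, |x| mostly < 2
x[0, 7] = 60.0                      # a single outlier channel

scale = x.abs().max() / 127         # naive per-tensor symmetric INT8 scale
x_q = torch.clamp(torch.round(x / scale), -128, 127)
x_dq = x_q * scale

keep = torch.arange(512) != 7
print("scale:", scale.item())                                   # ~0.47, set by the outlier
print("distinct levels used by non-outliers:", x_q[0, keep].unique().numel())
print("mean abs error (non-outliers):", (x_dq - x)[0, keep].abs().mean().item())
```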

November 16, 2025 · Last updated on November 17, 2025 · 4 min · KKKZOZ

LServe Efficient Long-sequence LLM Serving with Unified Sparse Attention

Extensive Reading Author Info MIT HAN Lab Background Long-context LLM serving is bottlenecked by attention and KV caches. Prefilling has quadratic attention cost in sequence length, while decoding is memory-bound due to ever-growing KV caches; this makes 128k–512k contexts and long reasoning traces (e.g., 20k-token CoT) slow and expensive in practice. Existing KV cache optimizations are incomplete. Quantization and compression methods (e.g., KV quantization, paged KV cache) reduce memory and bandwidth but do not change the asymptotic attention complexity, so latency still grows linearly (decoding) or quadratically (prefilling) with context length. ...
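A quick back-of-the-envelope calculation, using a hypothetical Llama-2-7B-like shape rather than any configuration from the paper, shows why decoding becomes memory-bound at these context lengths:

```python
# KV-cache size vs. context length for an assumed 32-layer, 32-KV-head,
# head_dim=128, FP16 model. Every decode step streams this much KV.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem  # 2x for K and V

for ctx in (4_096, 131_072, 524_288):
    print(f"{ctx:>8} tokens -> {kv_cache_bytes(ctx) / 2**30:6.1f} GiB per sequence")
# 4k tokens -> 2 GiB, 128k -> 64 GiB, 512k -> 256 GiB with this config.
```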

November 15, 2025 · Last updated on February 2, 2026 · 3 min · KKKZOZ

DuoAttention Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Extensive Reading Author Info MIT HAN Lab Background Long-context LLMs strain attention and KV caches. As sequence length grows, prefill cost scales quadratically and decoding linearly, while KV cache memory grows linearly, making naive full-attention inference impractical in real-world long-context applications. Existing architectural and approximate-attention methods trade accuracy or require retraining. Linear-attention and specialized long-context architectures reduce complexity but often underperform standard Transformers on long-range reasoning, while methods like H2O, StreamingLLM, TOVA, and FastGen drop or sparsify tokens uniformly across heads, which can severely damage long-context retrieval accuracy and makes them difficult to apply safely in settings with KV-sharing schemes such as GQA. ...

November 13, 2025 · Last updated on February 2, 2026 · 3 min · KKKZOZ

Efficient Streaming Language Models with Attention Sinks

Extensive Reading Author Info MIT HAN Lab Background When applying LLMs to infinite input streams, two main challenges arise: the KV cache grows infinitely, which leads to excessive memory usage and decode latency, and the LLM’s performance degrades when the sequence length goes beyond the attention window size set during pre-training. Window Attention: only keep the $L$ most recent tokens in the KV cache. The model degrades dramatically once the sequence length exceeds the cache size (even if just the first token is evicted). Sliding Window with Re-computation: do not reuse the KV cache; at every step, rebuild the whole window of the last $L$ tokens and run the Transformer on that small segment from scratch. Sliding Window Example t = 1: Window: [x₁] Run the model on this length-1 sequence, use the output of x₁. t = 2: Window: [x₁, x₂] Run the model on [x₁, x₂] (full 2×2 self-attention), use the output of x₂. t = 3: Window: [x₁, x₂, x₃] Run the model on these 3 tokens (3×3 attention), use x₃. t = 4: Window slides: [x₂, x₃, x₄] Run the model again on this 3-token segment (3×3 attention), use x₄. t = 5: Window: [x₃, x₄, x₅], full 3×3 attention, use x₅. t = 6: Window: [x₄, x₅, x₆], full 3×3 attention, use x₆. Observations A surprisingly large amount of attention score is allocated to the initial tokens, irrespective of their relevance to the language modeling task. ...
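The baseline walked through above can be summarized in a few lines; `model` here is a hypothetical callable that runs full self-attention over a short segment and returns per-position outputs, not an API from the paper.

```python
# Sliding window with re-computation: no KV reuse, the last L tokens are
# re-encoded from scratch at every step and only the newest output is kept.
def sliding_window_recompute(tokens, model, L=3):
    outputs = []
    for t in range(1, len(tokens) + 1):
        window = tokens[max(0, t - L):t]   # last L tokens (shorter at the start)
        hidden = model(window)             # full (<=L x L) attention, recomputed
        outputs.append(hidden[-1])         # keep only the newest position's output
    return outputs

# t = 4 with L = 3 runs the model on [x2, x3, x4] and keeps x4's output,
# exactly as in the example; cost per step is a full re-encode plus O(L^2) attention.
```

Re-encoding the window at every step keeps quality stable but pays a full forward pass per generated token, which is the overhead the attention-sink approach aims to remove.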

November 13, 2025 · Last updated on November 17, 2025 · 3 min · KKKZOZ

Quest Query-Aware Sparsity for Efficient Long-Context LLM Inference

Extensive Reading Author Info MIT HAN Lab Background In long-context inference: The KV cache grows linearly with context length ($L$). At each decoding step, the model must read the entire KV cache to compute attention. Existing works recognize that a small subset of tokens can dominate the accuracy of token generation, and they choose to evict unimportant tokens: StreamingLLM keeps a sliding window plus a few “anchor” tokens. H2O, TOVA, etc., use heuristics or statistics to permanently drop “less important” tokens. Once a token is evicted, it’s gone. BUT the important tokens are query-dependent. ...
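A small toy experiment (random vectors, not the paper's data) makes the last point concrete: the top-scoring cached tokens for two different queries barely overlap, so a fixed eviction policy risks dropping exactly the tokens a later query needs.

```python
# Which cached tokens dominate attention depends on the query.
import torch

torch.manual_seed(0)
K = torch.randn(1024, 64)                      # cached keys for 1024 tokens
q1, q2 = torch.randn(64), torch.randn(64)      # two different decoding queries

top1 = (K @ q1).topk(16).indices               # tokens that dominate attention for q1
top2 = (K @ q2).topk(16).indices               # ... and for q2
overlap = len(set(top1.tolist()) & set(top2.tolist()))
print(f"overlap between the two top-16 sets: {overlap}/16")   # usually close to 0
```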

November 13, 2025 · Last updated on February 2, 2026 · 2 min · KKKZOZ

A dynamic parallel method for performance optimization on hybrid CPUs

Extensive Reading Author Info Background Hybrid CPUs (e.g., CPUs that combine performance P-cores and efficiency E-cores) perform poorly on AI inference workloads because their cores have unequal hardware capability. Traditional parallel methods split the work evenly, so the high-performance cores must wait for the low-performance ones, wasting resources. Insights Abandon the traditional "split evenly" parallel strategy in favor of a dynamic "split by capability" strategy: during parallel computation every core, strong or weak, finishes its sub-task in roughly the same time, maximizing overall CPU utilization. A performance ratio $pr_i$ is maintained dynamically for each core and each instruction set, i.e., effectively a table: Core 0 (P-core) 3.5 for AVX-VNNI / 2.0 for AVX2, Core 1 (P-core) 3.5 / 2.0, Core 2 (E-core) 1.0 / 1.0, Core 3 (E-core) 1.0 / 1.0. Work is partitioned according to $$\theta_i = \dfrac{pr_i}{\sum_j pr_j}$$ After a task finishes, $pr_i$ is adjusted dynamically from the measured execution times: $$pr_i^{\prime}=\dfrac{pr_i/t_i}{\sum_j pr_j/t_j}$$ Approaches The system consists of two major components: a CPU Runtime and a Thread Scheduler. CPU Runtime Manages CPU state and is responsible for tracking and updating each core's relative performance. Core binding: the thread pool it creates pins each thread to a specific physical core. Performance-ratio table: it maintains a performance ratio ($pr_i$) for every core, all initialized to the same value. Dynamic update: after a kernel (e.g., one matrix multiplication) finishes, the runtime records each thread's actual execution time ($t_i$) and uses the update formula (Eq. 2 in the paper) to refresh each core's $pr_i$; a filter is applied to suppress noise. ISA awareness: P-cores and E-cores differ by different amounts across instruction sets (ISAs, e.g., AVX-VNNI), so a separate performance ratio is kept per ISA. Thread Scheduler Responsible for dispatching the actual parallel compute tasks during inference (e.g., matrix multiplication or tensor copies). ...
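A minimal Python sketch of the capability-proportional split and the ratio update described above; the work item, timing, and EMA filter are stand-ins (the real runtime pins threads to physical cores, runs the chunks in parallel, and keeps one ratio table per ISA).

```python
# Capability-proportional work split plus dynamic performance-ratio update.
import time

pr = [3.5, 3.5, 1.0, 1.0]          # performance ratios, e.g. 2 P-cores + 2 E-cores

def split_rows(total_rows, pr):
    """theta_i = pr_i / sum_j(pr_j): give each core a share matching its ratio."""
    s = sum(pr)
    shares = [round(total_rows * r / s) for r in pr]
    shares[-1] = total_rows - sum(shares[:-1])   # absorb rounding in the last chunk
    return shares

def update_ratios(pr, t, alpha=0.5):
    """pr_i' = (pr_i / t_i) / sum_j(pr_j / t_j); an EMA stands in for the paper's
    noise filter. Only the proportions matter, so both terms are normalized."""
    denom = sum(p / ti for p, ti in zip(pr, t))
    new = [(p / ti) / denom for p, ti in zip(pr, t)]
    s = sum(pr)
    old = [p / s for p in pr]
    return [alpha * n + (1 - alpha) * o for n, o in zip(new, old)]

def worker(rows):
    start = time.perf_counter()
    _ = sum(i * i for i in range(rows * 1000))   # stand-in for one core's GEMM slice
    return time.perf_counter() - start

shares = split_rows(4096, pr)
times = [worker(rows) for rows in shares]        # the real runtime measures these
                                                 # on threads pinned to each core
pr = update_ratios(pr, times)
print("chunk sizes:", shares, "updated ratios:", [round(r, 3) for r in pr])
```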

November 11, 2025 · Last updated on November 17, 2025 · 1 min · KKKZOZ