KKKZOZ’s Blog

Three paper indexes (LLM/Transactions/Distributed Systems) are pinned.

Other posts are sorted by date.

[Pinned] LLM Inference Papers Index

My reading notes.
2025
1103-1110: EAGLE Speculative Sampling Requires Rethinking Feature Uncertainty
1028-1103: Aegaeon Effective GPU Pooling for Concurrent LLM Serving on the Market; DistServe Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving; Splitwise Efficient Generative LLM Inference Using Phase Splitting; Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
0826-0901: ELMS Elasticized Large Language Models On Mobile Devices; Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash
0819-0825: STI Turbocharge NLP Inference at the Edge via Elastic Pipelining; EdgeMoE Empowering Sparse Large Language Models on Mobile Devices; LLM as a System Service on Mobile Devices; SmallThinker A Family of Efficient Large Language Models Natively Trained for Local Deployment; HeteroLLM Accelerating Large Language Model Inference on Mobile SoCs with Heterogeneous AI Accelerators; A Survey of Resource-efficient LLM and Multimodal Foundation Models; H2O Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
0812-0818: KV-Runahead Scalable Causal LLM Inference by Parallel Key-Value Cache Generation; Striped Attention Faster Ring Attention for Causal Transformers; Ring Attention with Blockwise Transformers for Near-Infinite Context; TPI-LLM Serving 70B-scale LLMs Efficiently on Low-resource Mobile Devices; LLM.int8() 8-bit Matrix Multiplication for Transformers at Scale
0729-0804: Fast On-device LLM Inference with NPUs; Deja Vu Contextual Sparsity for Efficient LLMs at Inference Time; PowerInfer-2 Fast Large Language Model Inference on a Smartphone; LLM in a flash Efficient Large Language Model Inference with Limited Memory; PowerInfer Fast Large Language Model Serving with a Consumer-grade GPU
0722-0728: AWQ Activation-aware Weight Quantization for LLM Compression and Acceleration; FlexGen High-Throughput Generative Inference of Large Language Models with a Single GPU; LoRA Low-Rank Adaptation of Large Language Models; SpecInfer Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification; EdgeLLM Fast On-Device LLM Inference With Speculative Decoding; Efficient Memory Management for Large Language Model Serving with PagedAttention
0715-0721: A Survey on Efficient Inference for Large Language Models
-0714: Orca A Distributed Serving System for Transformer-Based Generative Models; EdgeShard Efficient LLM Inference via Collaborative Edge Computing; ServerlessLLM Locality-Enhanced Serverless Inference for Large Language Models
Uncategorized: WIP 🚧 ...

July 28, 2025 · Last updated on November 10, 2025 · 2 min · KKKZOZ

[Pinned] Transactions Papers Index

My reading notes.
2025
0715-0721: Concurrency Control as a Service; Sonata Multi-Database Transactions Made Fast and Serializable
Uncategorized WIP 🚧: towards-transaction-as-a-service; grit; taking-omid-to-the-clouds; epoxy; ad-hoc-transactions-in-web-applications; omid-reloaded; data-management-in-microservices; scalable-distributed-transactions-across-heterogeneous-stores; cobra

August 1, 2025 · Last updated on August 3, 2025 · 1 min · KKKZOZ

[Pinned] Distributed Papers Index

My reading notes.
2025
2023 && 2024: bigtable; cap-twelve-years-later; zab; mapreduce; chubby; chain-replication; time, clocks, and the ordering; farm; zookeeper

August 1, 2025 · Last updated on August 3, 2025 · 1 min · KKKZOZ

Sync Dot Files with Chezmoi

Keeping dotfiles consistent across multiple machines (macOS, Ubuntu, etc.) is a common challenge. Traditionally, tools like GNU Stow were used to symlink files from a repository into $HOME. Today, chezmoi provides a more powerful, template-driven approach that integrates seamlessly with Git and makes managing dotfiles across systems straightforward.

This post will walk you through:
- Installing chezmoi
- Setting it up on your first machine
- Committing to a remote repository
- Applying and syncing your configuration on other hosts
- Handling OS-specific configuration differences

1. Installing chezmoi

On macOS (via Homebrew): ...
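As a quick preview of that end-to-end flow, a minimal sketch might look like the following (the dotfile path and the repository URL are placeholders for illustration, not taken from the post):

    # Install chezmoi (macOS, via Homebrew)
    brew install chezmoi

    # Initialize the source directory and start tracking a dotfile
    chezmoi init
    chezmoi add ~/.zshrc

    # Commit and push the source state (opens a shell in ~/.local/share/chezmoi by default)
    chezmoi cd
    git remote add origin https://github.com/<user>/dotfiles.git
    git add . && git commit -m "Initial dotfiles" && git push -u origin main

    # On another machine: clone the repository and apply it in one step
    chezmoi init --apply https://github.com/<user>/dotfiles.git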

October 4, 2025 · Last updated on October 4, 2025 · 3 min · KKKZOZ

Clang on Apple

While compiling llama.cpp on macOS recently, I ran into quite a few pitfalls. The conclusion turned out to be simple: on macOS, the most reliable option is to use the system-provided Apple Clang. It needs almost no extra configuration and sidesteps all kinds of ABI and SDK compatibility issues.

Problems I ran into

At first I used the LLVM/Clang installed through Homebrew:

    brew install llvm

and pointed the compilers in my CMake toolchain/preset to:

    /opt/homebrew/opt/llvm/bin/clang
    /opt/homebrew/opt/llvm/bin/clang++

As soon as I built, the problems piled up:

SDK not found. The linker reported:

    ld: library 'System' not found

This happens because Homebrew's clang does not automatically locate the macOS SDK, so core libraries such as libSystem cannot be linked.

ABI mismatch. After fixing the SDK, another link error appeared:

    Undefined symbols for architecture arm64: "std::__1::__hash_memory(void const*, unsigned long)", ...

These symbols come from the newer libc++ 21 ABI shipped with Homebrew's LLVM, while the link step pulled in the older libc++ from the Apple SDK. Headers and library were out of sync, a classic ABI mismatch. ...
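For contrast, a minimal sketch of the working setup, building with the system toolchain (generic CMake flags for illustration, not the exact presets from the post):

    # Configure and build with the Apple Clang that ships with Xcode / Command Line Tools
    cmake -B build \
      -DCMAKE_C_COMPILER=/usr/bin/clang \
      -DCMAKE_CXX_COMPILER=/usr/bin/clang++
    cmake --build build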

September 12, 2025 · Last updated on September 12, 2025 · 1 min · KKKZOZ

Git Essentials

Essential understandings about Git.

Basic Concepts

Working Directory & Staging Area & Branch

Working Directory
The actual set of files on disk that you are currently working on.
It can be "clean", i.e. identical to some version in the repository, or "dirty", i.e. containing modified or untracked files.

Staging Area
A temporary area that holds the snapshots of files about to be committed to the repository.
Changes in the working directory are added to the staging area with git add.

Branch
The core Git concept for parallel development.
Each branch is an independent line of development; you can make changes on it without affecting other branches.

Remote & Upstream

A remote is a reference in your local repository to a remote Git repository.

    # Add a remote
    git remote add origin https://github.com/user/repo.git

    # Add multiple remotes
    git remote add upstream https://github.com/original/repo.git
    git remote add fork https://github.com/your-fork/repo.git

    # Remove a remote
    git remote remove origin

    # Rename a remote
    git remote rename origin github

An upstream is the remote branch that a local branch tracks, establishing an "upstream/downstream" relationship. ...
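To make the upstream relationship concrete, here is a small sketch (the remote name origin and the branch name main are assumptions for illustration):

    # Push the local branch and record origin/main as its upstream in one step
    git push -u origin main

    # Or set the upstream of an existing local branch explicitly
    git branch --set-upstream-to=origin/main main

    # Show which remote branch each local branch tracks
    git branch -vv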

August 8, 2025 · Last updated on September 1, 2025 · 12 min · KKKZOZ

VSCode Essentials

Essential understandings about VSCode. This post will keep being updated to cover every major VSCode release.

Concepts

Workspace

VSCode has two kinds of workspaces:

Single-Folder Workspace
The most common and simplest kind. When you open a folder via File -> Open Folder, that folder becomes your current workspace. All of VSCode's operations and configuration (such as the files under the .vscode directory) are relative to this root folder.

Multi-root Workspace
A multi-root workspace can contain several folders from different locations, all managed in the same VSCode window.
Scenario: imagine a complex project whose frontend lives in one repository (say my-webapp) and whose backend lives in a completely separate repository (say my-api-server). You want to see and edit both projects at the same time.
Steps:
1. Open one of the folders first (for example my-webapp).
2. Click File -> Add Folder to Workspace and select my-api-server.
3. Both folders now appear in the file explorer.
4. Finally, click File -> Save Workspace As...; VSCode creates a file with the .code-workspace extension.
From then on, simply open that .code-workspace file to restore the environment containing both project folders.

Summary: a "workspace" is your current project context in VSCode. It can be a single folder, or a collection of folders defined by a .code-workspace file. ...
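For reference, a minimal .code-workspace file for the two example folders above might look like the following (the relative paths are illustrative assumptions about where the repositories live on disk):

    {
        // Each "folders" entry becomes a root in the Explorer
        "folders": [
            { "path": "my-webapp" },
            { "path": "../my-api-server" }
        ],
        // Settings placed here apply to the whole multi-root workspace
        "settings": {}
    }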

August 6, 2025 · Last updated on August 10, 2025 · 4 min · KKKZOZ

Deep Learning Basic

For self reference.

Forward Pass

    import torch
    import torch.nn.functional as F

    # Setup
    learning_rate = 0.1
    x = torch.randn(1, 5)           # Input data
    y_true = torch.tensor([[1.0]])  # True label

    # Model parameters initialized manually
    # requires_grad=True tells PyTorch to calculate gradients for them
    w = torch.randn(5, 1, requires_grad=True)
    b = torch.randn(1, requires_grad=True)

    print(f"Initial weight:\n{w.data}\n")

    # 1. Forward Pass
    # Calculate a prediction using the current weight and bias
    z = x @ w + b  # `@` is matrix multiplication
    y_pred = torch.sigmoid(z)

    # 2. Calculate Loss
    # Compare the prediction to the true label
    loss = F.binary_cross_entropy(y_pred, y_true)

    # 3. Backward Pass
    # Calculate the gradients of the loss with respect to w and b
    loss.backward()

    # 4. Update Parameters
    # Manually adjust w and b in the opposite direction of their gradients
    with torch.no_grad():  # Temporarily disable gradient tracking for the update
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        # Manually zero out the gradients for the next iteration
        w.grad.zero_()
        b.grad.zero_()

    print(f"Updated weight:\n{w.data}\n")
    print(f"Loss: {loss.item():.4f}")

The "forward pass" is the process in which a neural network starts from the input data and computes layer by layer until it produces the final output (the prediction). You can think of it as information "flowing forward" through the network. ...

May 27, 2025 · Last updated on August 1, 2025 · 4 min · KKKZOZ