KKKZOZ’s Blog

Three paper indexes (LLM / Transactions / Distributed Systems) are pinned.

Other posts are sorted by date.

[Pinned] LLM Inference Papers Index

My reading notes.

2025

1111-1117
- LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
- Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
- Dynamic Sparse Attention on Mobile SoCs
- A dynamic parallel method for performance optimization on hybrid CPUs
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
- Efficient Streaming Language Models with Attention Sinks
- KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models

1104-1110
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

1028-1103
- Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market
- DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
- Splitwise: Efficient Generative LLM Inference Using Phase Splitting
- Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

0826-0901
- ELMS: Elasticized Large Language Models On Mobile Devices
- Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash

0819-0825
- STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining
- EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices
- LLM as a System Service on Mobile Devices
- SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
- HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs with Heterogeneous AI Accelerators
- A Survey of Resource-efficient LLM and Multimodal Foundation Models
- H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

0812-0818
- KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
- Striped Attention: Faster Ring Attention for Causal Transformers
- Ring Attention with Blockwise Transformers for Near-Infinite Context
- TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Mobile Devices
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

0729-0804
- Fast On-device LLM Inference with NPUs
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
- PowerInfer-2: Fast Large Language Model Inference on a Smartphone
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

0722-0728
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
- LoRA: Low-Rank Adaptation of Large Language Models
- SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification
- EdgeLLM: Fast On-Device LLM Inference With Speculative Decoding
- Efficient Memory Management for Large Language Model Serving with PagedAttention

0715-0721
- A Survey on Efficient Inference for Large Language Models

-0714
- Orca: A Distributed Serving System for Transformer-Based Generative Models
- EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
- ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Uncategorized
- WIP 🚧 ...

July 28, 2025 · Last updated on November 17, 2025 · 3 min · KKKZOZ

[Pinned] Transactions Papers Index

My reading notes.

2025

0715-0721
- Concurrency Control as a Service
- Sonata: Multi-Database Transactions Made Fast and Serializable

Uncategorized (WIP 🚧)
- towards-transaction-as-a-service
- grit
- taking-omid-to-the-clouds
- epoxy
- ad-hoc-transactions-in-web-applications
- omid-reloaded
- data-management-in-microservices
- scalable-distributed-transactions-across-heterogeneous-stores
- cobra

August 1, 2025 · Last updated on August 3, 2025 · 1 min · KKKZOZ

[Pinned] Distributed Papers Index

My reading notes.

2025

2023 && 2024
- bigtable
- cap-twelve-years-later
- zab
- mapreduce
- chubby
- chain-replication
- time, clocks, and the ordering
- farm
- zookeeper

August 1, 2025 · Last updated on August 3, 2025 · 1 min · KKKZOZ

Server Management

I manage three Ubuntu servers for our group; this post records some frequently used operations.

User management:

# Create a user
sudo adduser [username]
# List every group a user belongs to
groups [username]
# List the users in a group
getent group groupname
# Create a group
sudo groupadd <groupname>
# Add a user to a group
sudo usermod -aG groupname username

Disk management:

# Show overall disk usage
df -h
# Show the size of each directory under a path (note: loose files are not listed)
du -h -d 1 | sort -hr
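
As a quick illustration of how the user-management commands above fit together, here is a minimal onboarding flow; the user `alice` and group `research` are made-up names for the example:

```bash
# Hypothetical names: user 'alice', group 'research'
sudo groupadd research           # create the shared group
sudo adduser alice               # create the user (interactive prompts)
sudo usermod -aG research alice  # -a appends the group instead of replacing the list
groups alice                     # verify from the user side: should include 'research'
getent group research            # verify from the group side: should list 'alice'
```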

January 8, 2026 · Last updated on January 8, 2026 · 1 min · KKKZOZ

Software Router

Notes from tinkering with a soft router, which I mainly use as a side router (bypass gateway). A long time ago I bought a 电犀牛 R66s; I used it through my junior and senior undergraduate years and the first year of grad school, and it gathered dust for a while after I came back to the institute.

The first step is flashing the firmware; you can get an image from here. Flash it onto a memory card, then connect the soft router's LAN port to your computer with an Ethernet cable. The default admin console is at http://192.168.1.1.

Once in, the main thing to configure is the network interface: pin down the properties of the soft router's LAN port:

- IP: a fixed IP
- Gateway: the real router's IP
- DNS: the real router's IP

Most importantly, remember to disable DHCP on this port (see the sketch below).

With that done, set up PassWall or a similar plugin, then connect the soft router's LAN port to the real router's LAN port.

Devices that want to go online still connect to the real router as usual, but set their IP manually:

- IP: a fixed IP
- Gateway: the soft router's IP
- DNS: the soft router's IP
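
The post does not name the firmware, but assuming an OpenWrt-based image (which is what PassWall typically runs on), the LAN pinning and DHCP disabling described above could be scripted with `uci` roughly like this; all addresses are placeholders:

```bash
# Assumed: OpenWrt firmware; 192.168.1.1 is the real router, 192.168.1.2 the soft router
uci set network.lan.proto='static'
uci set network.lan.ipaddr='192.168.1.2'     # fixed IP for the soft router's LAN port
uci set network.lan.netmask='255.255.255.0'
uci set network.lan.gateway='192.168.1.1'    # the real router's IP
uci set network.lan.dns='192.168.1.1'        # the real router's IP
uci set dhcp.lan.ignore='1'                  # disable DHCP on this interface
uci commit network && uci commit dhcp
/etc/init.d/network restart
```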

January 5, 2026 · Last updated on January 8, 2026 · 1 min · KKKZOZ

Sync Dot Files with Chezmoi

Keeping dotfiles consistent across multiple machines (macOS, Ubuntu, etc.) is a common challenge. Traditionally, tools like GNU Stow were used to symlink files from a repository into $HOME. Today, chezmoi provides a more powerful, template-driven approach that integrates seamlessly with Git and makes managing dotfiles across systems straightforward.

This post will walk you through:

- Installing chezmoi
- Setting it up on your first machine
- Committing to a remote repository
- Applying and syncing your configuration on other hosts
- Handling OS-specific configuration differences

1. Installing chezmoi

On macOS (via Homebrew): ...
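
The excerpt cuts off here, but as a compressed sketch of the workflow the list above describes (the repository URL and dotfile names are placeholders, not from the post):

```bash
# First machine: install, initialize, and start tracking files
brew install chezmoi                  # macOS; other OSes have their own packages
chezmoi init
chezmoi add ~/.zshrc ~/.gitconfig     # copies files into ~/.local/share/chezmoi
chezmoi cd                            # enter the source repo to commit and push
git remote add origin https://github.com/you/dotfiles.git   # placeholder URL
git push -u origin main

# Any other host: clone and apply in one step, then stay in sync
chezmoi init --apply https://github.com/you/dotfiles.git
chezmoi update                        # later: pull the latest changes and apply
```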

October 4, 2025 · Last updated on October 4, 2025 · 3 min · KKKZOZ

Clang on Apple

While building llama.cpp on macOS recently, I hit quite a few pitfalls. The conclusion turned out to be simple: on macOS, the most reliable approach is to use the bundled Apple Clang. It needs almost no extra configuration and sidesteps all sorts of ABI and SDK compatibility problems.

The problems I ran into

At first I used the LLVM/Clang installed through Homebrew:

brew install llvm

and pointed the compilers in my CMake toolchain/preset at:

/opt/homebrew/opt/llvm/bin/clang
/opt/homebrew/opt/llvm/bin/clang++

As soon as I built, the problems piled up.

SDK not found. Linking failed with:

ld: library 'System' not found

This happens because Homebrew's clang does not locate the macOS SDK automatically, so core libraries such as libSystem cannot be linked.

ABI mismatch. After fixing the SDK, linking failed again:

Undefined symbols for architecture arm64: "std::__1::__hash_memory(void const*, unsigned long)", ...

These symbols come from the new ABI in libc++ 21 (Homebrew's LLVM), while the link step picked up the older libc++ shipped with the Apple SDK. Headers from one version, libraries from another: a classic ABI mismatch. ...
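
The fix the excerpt lands on, just using Apple Clang, could look like this in CMake terms; a minimal sketch, where /usr/bin/clang is the stock Xcode Command Line Tools shim rather than anything from the post:

```bash
# Configure with the system toolchain instead of Homebrew LLVM
cmake -B build \
  -DCMAKE_C_COMPILER=/usr/bin/clang \
  -DCMAKE_CXX_COMPILER=/usr/bin/clang++
cmake --build build

# If a build ever needs the SDK path explicitly:
xcrun --show-sdk-path
```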

September 12, 2025 · Last updated on September 12, 2025 · 1 min · KKKZOZ

Git Essentials

Essential understandings about Git.

Basic Concepts: Working Directory & Staging Area & Branch

Working Directory
- The actual set of files on disk that you are currently working on.
- It can be "clean" (identical to some version in the repository) or "dirty" (containing modified or untracked files).

Staging Area
- A temporary area holding the snapshots of files that will go into the next commit.
- Changes move from the working directory into it via git add.

Branch
- The core Git concept for parallel development.
- Each branch is an independent line of development; changes on one branch do not affect the others.

Remote & Upstream

A remote is a reference from your local repository to a remote Git repository.

# Add a remote
git remote add origin https://github.com/user/repo.git
# Add multiple remotes
git remote add upstream https://github.com/original/repo.git
git remote add fork https://github.com/your-fork/repo.git
# Remove a remote
git remote remove origin
# Rename a remote
git remote rename origin github

An upstream is the remote branch that a local branch tracks, establishing an "upstream/downstream" relationship. ...
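
To make the upstream relationship concrete, a short sketch; the branch names are illustrative:

```bash
# Set the upstream while pushing a new branch for the first time
git push -u origin feature-x
# Or set it explicitly for an existing local branch
git branch --set-upstream-to=origin/main main
# Inspect which upstream each local branch tracks
git branch -vv
```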

August 8, 2025 · Last updated on September 1, 2025 · 12 min · KKKZOZ