Paper Note: MapReduce

Analyze Key Features: Processing and generating large data sets. Exploits a restricted programming model to parallelize the user program automatically and to provide transparent fault tolerance. Details: The workload is distributed across many machines, which execute the tasks in parallel. Specifically, the input files are split into $M$ pieces and the master assigns each one to an idle worker. The worker processes its piece with the map function and saves each key/value pair to one of $R$ intermediate files according to the partitioning function. Once all $M$ pieces have been processed, $R$ reduce workers read the data from their corresponding intermediate files, process it with the reduce function, and finally write the results to output files. ...

October 30, 2023 · Last updated on August 1, 2025 · 2 min · KKKZOZ
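
The MapReduce flow summarized in the note above (split into $M$ pieces, map, partition into $R$ intermediate buckets, reduce) can be sketched in a single process. This is only an illustrative sketch, not the paper's implementation: word count is used as the example job, the names `map_fn`, `reduce_fn`, `partition`, and `run_mapreduce` are hypothetical, in-memory dictionaries stand in for the intermediate files, and a CRC32 hash modulo $R$ is assumed as the partitioning function.

```python
import zlib
from collections import defaultdict


def map_fn(piece):
    """Hypothetical map function: emit (word, 1) for every word in a piece."""
    for word in piece.split():
        yield word, 1


def reduce_fn(key, values):
    """Hypothetical reduce function: sum all counts for one key."""
    return sum(values)


def partition(key, r):
    """Assumed partitioning function: a stable hash of the key modulo R."""
    return zlib.crc32(key.encode("utf-8")) % r


def run_mapreduce(pieces, r):
    """Run the map and reduce phases sequentially over M input pieces."""
    # intermediate[i] stands in for the i-th of the R intermediate files.
    intermediate = [defaultdict(list) for _ in range(r)]

    # Map phase: in a real system each piece goes to an idle worker.
    for piece in pieces:
        for key, value in map_fn(piece):
            intermediate[partition(key, r)][key].append(value)

    # Reduce phase: one reduce task per intermediate bucket.
    outputs = []
    for bucket in intermediate:
        outputs.append({k: reduce_fn(k, vs) for k, vs in sorted(bucket.items())})
    return outputs


if __name__ == "__main__":
    pieces = ["a b a", "b c"]  # M = 2 input pieces
    for i, out in enumerate(run_mapreduce(pieces, r=2)):
        print(f"output {i}: {out}")
```

Because the partitioning function depends only on the key, every occurrence of a key lands in the same bucket, so each reduce task sees all values for the keys it owns.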

Paper Note: ZooKeeper

Analyze Key Features: Coordination in large-scale distributed systems by providing general “coordination kernel” APIs that support many use cases. Good performance. Fault tolerance / high availability. Details: ZooKeeper can be used for group messaging, shared registers, and distributed lock services. Reliability: The ZooKeeper data is replicated on each server that composes the service. The data is periodically snapshotted. Fuzzy Snapshot: We do not lock the ZooKeeper state to take the snapshot. ...

October 30, 2023 · Last updated on August 1, 2025 · 1 min · KKKZOZ