Paper Note: MapReduce
Analyze

Key Features

Processes and generates large data sets. Exploits a restricted programming model to parallelize the user program automatically and to provide transparent fault tolerance.

Details

The workload is distributed across many machines, which execute their tasks in parallel. Specifically, the input files are split into $M$ pieces, and the master assigns each piece to an idle worker. The worker processes its piece with the map function and saves each intermediate key/value pair to one of $R$ files according to the partitioning function. Once all $M$ pieces have been processed, $R$ reduce workers read the data from their corresponding intermediate files, process it with the reduce function, and finally save the results to the output files. ...
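
The flow described above (split into $M$ pieces, map, partition into $R$ intermediate files, reduce) can be illustrated with a minimal single-process sketch. It is only an illustration under stated assumptions, not the paper's implementation: there is no master, no worker scheduling, and no fault tolerance, and in-memory dictionaries stand in for the intermediate files. The word-count `map_fn` and `reduce_fn`, the helper names `partition` and `run_mapreduce`, and the choice of CRC32 modulo $R$ as the partitioning function are all assumptions made for this example.

```python
import zlib
from collections import defaultdict


def map_fn(piece):
    """Hypothetical user map function: emit (word, 1) for every word."""
    return [(word, 1) for word in piece.split()]


def reduce_fn(key, values):
    """Hypothetical user reduce function: sum the counts for one key."""
    return sum(values)


def partition(key, r):
    """Assumed partitioning function: a stable hash of the key, modulo R."""
    return zlib.crc32(key.encode("utf-8")) % r


def run_mapreduce(pieces, r):
    # intermediate[i] stands in for the i-th of the R intermediate files.
    intermediate = [defaultdict(list) for _ in range(r)]

    # Map phase: one map task per input piece (M = len(pieces)).
    for piece in pieces:
        for key, value in map_fn(piece):
            intermediate[partition(key, r)][key].append(value)

    # Reduce phase: one reduce task per intermediate file, one output each.
    outputs = []
    for bucket in intermediate:
        outputs.append({key: reduce_fn(key, bucket[key]) for key in sorted(bucket)})
    return outputs


pieces = ["the quick brown fox", "the lazy dog", "the fox"]  # M = 3
outputs = run_mapreduce(pieces, r=2)                          # R = 2
merged = {key: count for out in outputs for key, count in out.items()}
print(merged)
```

Because the partitioning function depends only on the key, every occurrence of a key lands in the same intermediate file, so each key is reduced exactly once and appears in exactly one output.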