Tuesday, May 13, 2014

Personal Notes On Distributed Data Computing [Unofficial]

  • A data computation problem that requires multiple jobs (multiple rounds of "map" followed by "reduce") in Hadoop [5]
    • => YARN (Hadoop) [2]
    • Apache Pig [4]
      • Provides a DSL (Pig Latin) that is translated into one or more MapReduce jobs.
  • Compare: Apache Hama, which implements the Bulk Synchronous Parallel (BSP) model [3].
    • BSP: cycles of parallel computation followed by synchronization.
    • Comparison with MapReduce:
      • "Map": parallel computation
      • "Reduce": synchronization
      • Multiple rounds of "map" followed by "reduce": Bulk Synchronous Parallel model
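
To make the "multiple rounds" idea concrete, here is a minimal in-memory sketch in Python. It is not Hadoop or Pig code: `run_mapreduce` and the mapper/reducer names are hypothetical helpers invented for illustration. Round 1 counts words, and round 2 takes the output of round 1 and groups words by their count. The grouping step inside each round acts as the synchronization point, since no reducer can run until all map output exists.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """Run one MapReduce round in memory (hypothetical helper, not a Hadoop API)."""
    # Map phase: every record is processed independently, so this part
    # could run in parallel.
    mapped = chain.from_iterable(mapper(record) for record in records)
    # Shuffle: group values by key. All map output must be available here,
    # which makes this the synchronization point of the round.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one call per key, in sorted key order for determinism.
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Round 1: word count.
def count_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return (word, sum(ones))


# Round 2: group words by how often they occur.
def invert_mapper(pair):
    word, count = pair
    return [(count, word)]


def invert_reducer(count, words):
    return (count, sorted(words))


lines = ["a b a", "b c a"]
round1 = run_mapreduce(lines, count_mapper, count_reducer)
round2 = run_mapreduce(round1, invert_mapper, invert_reducer)
print(round1)
print(round2)
```

The second call consumes the first call's output, which is the pattern a tool like Pig generates when one script needs more than one job.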

Bulk Synchronous Parallel Model
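
As a rough illustration of the cycle of local computation, communication, and barrier synchronization, here is a small Python sketch that uses threads and `threading.Barrier`. It is not Apache Hama code; `bsp_max` and the ring layout are assumptions made for this example. Each worker passes its current maximum to its right-hand neighbour, and after n - 1 rounds every worker holds the global maximum. For simplicity the sketch waits on the barrier twice per round (once after sending, once after receiving), where a textbook BSP superstep has a single barrier.

```python
import threading


def bsp_max(values):
    """Every worker learns the global maximum (hypothetical example)."""
    n = len(values)
    current = list(values)      # current[i]: local state of worker i
    inbox = [None] * n          # inbox[i]: message delivered to worker i
    barrier = threading.Barrier(n)

    def worker(i):
        for _ in range(n - 1):  # n - 1 rounds are enough on a ring
            # Communication: send the local maximum to the right neighbour.
            inbox[(i + 1) % n] = current[i]
            # Synchronization: wait until every worker has sent.
            barrier.wait()
            # Local computation: fold the received value into local state.
            current[i] = max(current[i], inbox[i])
            # Wait again so no inbox is overwritten before it has been read.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current


print(bsp_max([3, 9, 4, 1]))
```

In terms of the comparison in these notes, the local `max` plays the role of "map" and the barrier plays the role of "reduce".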


References
