Abstract:
In this article, we propose new extensions to Hadoop to enable clusters of reconfigurable active solid-state drives (RASSDs) to process streaming data from SSDs using FPGAs. We also develop an analytical model to estimate the performance of RASSD clusters running under Hadoop. Using the Hadoop RASSD platform and network simulators, we validate our design and demonstrate its impact on performance for different workloads taken from Stanford's Phoenix MapReduce project. Our results show that for a hardware acceleration factor of 20×, compute-intensive workloads processing 153 MB of data can run up to 11× faster than on a standard Hadoop cluster.
Citation:
Kaitoua, A., Hajj, H., Saghir, M. A., Artail, H., Akkary, H., Awad, M., ... & Mershad, K. (2014). Hadoop extensions for distributed computing on reconfigurable active SSD clusters. ACM Transactions on Architecture and Code Optimization (TACO), 11(2), 1-26.