Practical Multiple Node Failure Recovery in Distributed Storage Systems

Itani, M.; Sharafeddine, S.; ElKabbani, I.

dc.contributor.author	Itani, M.
dc.contributor.author	Sharafeddine, S.
dc.contributor.author	ElKabbani, I.
dc.date.accessioned	2018-07-02T11:34:37Z
dc.date.available	2018-07-02T11:34:37Z
dc.date.copyright	2016	en_US
dc.date.issued	2018-07-02
dc.identifier.uri	http://hdl.handle.net/10725/8152
dc.description.abstract	As multiple node failures become increasingly frequent in distributed storage systems, many erasure coding techniques are emerging to handle them. In this paper we apply the fractional repetition code as a redundancy scheme for multiple failure recovery with optimized system cost. The fractional repetition (FR) code is a class of regenerating codes that concatenates an outer maximum distance separable (MDS) code with an inner fractional repetition code, which splits the data into several blocks and stores multiple replicas of each block on different nodes in the system. We model the problem as an integer linear program that uses modified versions of the fractional repetition code allowing different block sizes, and that minimizes the recovery cost over all dependent and independent multiple node failure scenarios. First, we generate an optimized block distribution scheme that minimizes the total system repair cost, together with a full recovery plan that specifies a node repair order for the system. Second, we account for the common scenario of newcomer blocks: we allocate newcomers to nodes with minimal computation and without changing the original optimized plan. The problem is solved using genetic algorithms that search within the feasible solution space. Fast convergence validates the efficacy of our algorithms for different system parameters. Simulation results are shown to be close to optimal for the case of newly arriving blocks.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.subject	Data transmission systems -- Congresses	en_US
dc.subject	Telecommunication -- Data processing -- Congresses	en_US
dc.subject	Wireless sensor networks -- Congresses	en_US
dc.subject	Cloud computing -- Congresses	en_US
dc.subject	Internet of things -- Congresses	en_US
dc.subject	Smart power grids -- Congresses	en_US
dc.title	Practical Multiple Node Failure Recovery in Distributed Storage Systems	en_US
dc.type	Conference Paper / Proceeding	en_US
dc.author.school	SAS	en_US
dc.author.idnumber	200502746	en_US
dc.author.department	Computer Science and Mathematics	en_US
dc.description.embargo	N/A	en_US
dc.publication.place	Piscataway, N.J.	en_US
dc.description.bibliographiccitations	Includes bibliographical references	en_US
dc.identifier.doi	http://dx.doi.org/10.1109/ISCC.2016.7543851	en_US
dc.identifier.ctation	Itani, M., Sharafeddine, S., & ElKabbani, I. (2016, June). Practical multiple node failure recovery in distributed storage systems. In Computers and Communication (ISCC), 2016 IEEE Symposium on (pp. 901-907). IEEE.	en_US
dc.author.email	sanaa.sharafeddine@lau.edu.lb	en_US
dc.conference.date	27-30 June 2016	en_US
dc.conference.pages	901-907	en_US
dc.conference.place	Messina, Italy	en_US
dc.conference.title	2016 IEEE Symposium on Computers and Communication (ISCC)	en_US
dc.identifier.tou	http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php	en_US
dc.identifier.url	https://ieeexplore.ieee.org/abstract/document/7543851/	en_US
dc.orcid.id	https://orcid.org/0000-0001-6548-1624	en_US
dc.publication.date	2016	en_US
dc.author.affiliation	Lebanese American University	en_US