Dynamic single node failure recovery in distributed storage systems

Itani, M.; Sharafeddine, S.; Elkabani, I.

dc.contributor.author	Itani, M.
dc.contributor.author	Sharafeddine, S.
dc.contributor.author	Elkabani, I.
dc.date.accessioned	2018-06-06T12:45:10Z
dc.date.available	2018-06-06T12:45:10Z
dc.date.copyright	2017	en_US
dc.date.issued	2018-06-06
dc.identifier.issn	1389-1286	en_US
dc.identifier.uri	http://hdl.handle.net/10725/8019
dc.description.abstract	Among the many erasure coding techniques that have emerged to provide reliability in practical distributed storage systems, we apply fractional repetition coding to the given data and optimize the allocation of data blocks to system nodes so as to minimize the system repair cost. We selected fractional repetition coding for its simple, uncoded repair process, which minimizes both the repair bandwidth and the disk access bandwidth. To minimize the system repair cost, we formulate the problem using incidence matrices and solve it heuristically with genetic algorithms, considering all possible single node failures. We then address three practical extensions that account, respectively, for newly arriving blocks, newly arriving nodes, and files of variable priority. For the first two extensions, we propose a re-optimization mechanism for the storage allocation matrix that can easily be implemented in real time without redistributing the blocks already stored on the nodes. The third extension is addressed with variable fractional repetition codes, which are shown to achieve significant cost reductions. The contributions of the paper are fourfold: (i) generating an optimized block distribution scheme among the nodes of a given data center for fixed and variable size blocks; (ii) optimizing storage allocation in dynamic environments with arriving data blocks; (iii) optimizing storage allocation with newly added storage nodes; and (iv) generating an effective block distribution scheme among the nodes that accounts for varying priorities among data blocks. We present a wide range of results for the proposed algorithms and the considered scenarios to quantify the achievable performance gains.	en_US
dc.language.iso	en	en_US
dc.title	Dynamic single node failure recovery in distributed storage systems	en_US
dc.type	Article	en_US
dc.description.version	Published	en_US
dc.author.school	SAS	en_US
dc.author.idnumber	200502746	en_US
dc.author.department	Computer Science and Mathematics	en_US
dc.description.embargo	N/A	en_US
dc.relation.journal	Computer Networks	en_US
dc.journal.volume	113	en_US
dc.article.pages	84-93	en_US
dc.keywords	Distributed storage systems	en_US
dc.keywords	Fractional repetition codes	en_US
dc.keywords	Failure recovery	en_US
dc.keywords	Genetic algorithms	en_US
dc.keywords	Variable fractional repetition codes	en_US
dc.identifier.doi	https://doi.org/10.1016/j.comnet.2016.12.005	en_US
dc.identifier.citation	Itani, M., Sharafeddine, S., & Elkabani, I. (2017). Dynamic single node failure recovery in distributed storage systems. Computer Networks, 113, 84-93.	en_US
dc.author.email	sanaa.sharafeddine@lau.edu.lb	en_US
dc.identifier.tou	http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php	en_US
dc.identifier.url	https://www.sciencedirect.com/science/article/pii/S1389128616304200	en_US
dc.orcid.id	https://orcid.org/0000-0001-6548-1624	en_US
dc.author.affiliation	Lebanese American University	en_US
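The abstract describes block placement under fractional repetition (FR) coding as an optimization over an incidence matrix, solved heuristically with a genetic algorithm over all possible single node failures. The Python sketch below illustrates that idea under assumptions that are not taken from the paper: every block is replicated on `rho` distinct nodes; when a node fails, each block it held is re-fetched uncoded from the surviving replica with the lowest per-block transfer cost `cost[k][i]` (a hypothetical cost model); and node capacity is enforced through a hypothetical penalty weight `PENALTY`. All function names and the GA operators (truncation selection, block-wise uniform crossover, random resampling mutation) are illustrative choices, not the authors' algorithm.

```python
import random

PENALTY = 1000  # hypothetical weight per block stored beyond a node's capacity


def incidence_matrix(placement, n_nodes):
    """M[i][j] = 1 iff node i stores a replica of block j."""
    m = [[0] * len(placement) for _ in range(n_nodes)]
    for j, holders in enumerate(placement):
        for i in holders:
            m[i][j] = 1
    return m


def repair_cost(placement, cost):
    """Total repair cost summed over every possible single-node failure.

    Hypothetical model: for a failed node, each block it held is re-fetched
    uncoded from the surviving replica k with the lowest cost[k][failed].
    Requires every block to have at least two replicas.
    """
    total = 0
    for failed in range(len(cost)):
        for holders in placement:
            if failed in holders:
                total += min(cost[k][failed] for k in holders if k != failed)
    return total


def fitness(placement, cost, capacity):
    """Repair cost plus a penalty for nodes storing more than `capacity` blocks."""
    loads = [sum(row) for row in incidence_matrix(placement, len(cost))]
    overflow = sum(max(0, load - capacity) for load in loads)
    return repair_cost(placement, cost) + PENALTY * overflow


def random_placement(n_blocks, n_nodes, rho, rng):
    # Each block is stored on rho distinct nodes (repetition degree rho).
    return [tuple(sorted(rng.sample(range(n_nodes), rho))) for _ in range(n_blocks)]


def crossover(a, b, rng):
    # Uniform crossover per block: the child takes each block's replica set
    # from one parent, so the rho-replication constraint is preserved.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(placement, n_nodes, rho, rng, rate=0.1):
    # With probability `rate`, re-sample a block's replica set at random.
    return [tuple(sorted(rng.sample(range(n_nodes), rho))) if rng.random() < rate else h
            for h in placement]


def genetic_search(cost, n_blocks, rho, capacity, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    n_nodes = len(cost)

    def score(p):
        return fitness(p, cost, capacity)

    pop = [random_placement(n_blocks, n_nodes, rho, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        elite = pop[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), n_nodes, rho, rng))
        pop = elite + children
    return min(pop, key=score)


if __name__ == "__main__":
    # Hypothetical example: 6 nodes on a line, transfer cost = distance between nodes.
    n = 6
    cost = [[abs(k - i) for i in range(n)] for k in range(n)]
    best = genetic_search(cost, n_blocks=6, rho=2, capacity=2)
    print(best, fitness(best, cost, capacity=2))
```

Because each chromosome stores one replica set per block, crossover and mutation never change the repetition degree, so only the capacity constraint needs the penalty term. Classic FR constructions give every node the same number of blocks; this sketch only bounds each node's load from above, and the elitist selection guarantees that the best fitness never gets worse from one generation to the next.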