Authored by Thomas Schwarz, Ahmed Amer, Thomas Kroeger, Ethan Miller, Darrell Long and Jehan-Francois Paris, RESAR: Reliable Storage at Exabyte Scale presents the results of several years of research on how to store data at very large scale.
Stored data needs to be protected against device failure and irrecoverable errors, and doing so at exabyte scale can be challenging given the large number of potential failures that must be handled. To address these challenges, the authors developed RESAR, which offers much greater flexibility and robustness than previous methods, and also offers greater manageability, broader potential for energy savings, and easier handling of heterogeneous storage devices.
“We developed a novel two-failure tolerant disk layout that repairs most of a disk failure faster, and even without this, is a magnitude more robust than the standard architecture of declustered RAID Level 6,” said Thomas Schwarz, lead author of the publication. “An earlier layout with a million disks was emulated at Sandia National Laboratory, and data lost by disk failures was restored in most cases within five minutes.”
“We have been working on this for some time,” said Darrell Long, one of the paper’s co-authors. “We came up with several new coding and layout schemes, including the winning bipartite graph model, and we use graph coloring for data placement.”
The research was supported by the National Science Foundation, the Department of Energy and the industrial members of the Center for Research in Storage Systems, a research center in the Baskin School of Engineering at UC Santa Cruz.
For more information about the research, or to download the publication