I/O deduplication: Utilizing content similarity to improve I/O performance Article

Koller, R., Rangaswami, R. (2010). I/O deduplication: Utilizing content similarity to improve I/O performance. 6(3), 10.1145/1837915.1837921

cited authors

  • Koller, R; Rangaswami, R

abstract

  • Duplication of data in storage systems is becoming increasingly common. We introduce I/O Deduplication, a storage optimization that utilizes content similarity to improve I/O performance by eliminating I/O operations and reducing the mechanical delays during I/O operations. I/O Deduplication consists of three main techniques: content-based caching, dynamic replica retrieval, and selective duplication. Each of these techniques is motivated by our observations of I/O workload traces obtained from actively used production storage systems, all of which revealed surprisingly high levels of content similarity for both stored and accessed data. Evaluation of a prototype implementation using these workloads showed an overall improvement in disk I/O performance of 28% to 47% across these workloads. Further breakdown also showed that each of the three techniques contributed significantly to the overall performance improvement. © 2010 ACM.
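
The first of the three techniques named in the abstract, content-based caching, can be illustrated with a small sketch. This is not the authors' implementation: the class name `ContentCache`, the in-memory `disk` dictionary standing in for a block device, the choice of SHA-1 as the content hash, and the LRU eviction policy are all assumptions made for illustration. The idea shown is only that cached blocks are indexed by a hash of their content, so sectors holding identical data share one cache entry.

```python
import hashlib
from collections import OrderedDict


class ContentCache:
    """Toy LRU cache indexed by content hash instead of sector number.

    Hypothetical illustration only; names and policies are assumptions.
    """

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: sector -> bytes (stand-in for a device)
        self.capacity = capacity      # max number of distinct blocks cached
        self.blocks = OrderedDict()   # digest -> block data, oldest first
        self.sector_digest = {}       # sector -> digest (unbounded in this sketch)
        self.hits = 0
        self.misses = 0

    def _remember(self, sector, data):
        # Record which content the sector holds and cache one copy of it.
        digest = hashlib.sha1(data).hexdigest()
        self.sector_digest[sector] = digest
        self.blocks[digest] = data
        self.blocks.move_to_end(digest)
        # Evict least recently used content when over capacity.
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def write(self, sector, data):
        # Write-through: update the device, then the cache metadata.
        self.disk[sector] = data
        self._remember(sector, data)

    def read(self, sector):
        digest = self.sector_digest.get(sector)
        if digest is not None and digest in self.blocks:
            # Hit: some sector with identical content is already cached.
            self.blocks.move_to_end(digest)
            self.hits += 1
            return self.blocks[digest]
        # Miss: go to the device and cache the result.
        self.misses += 1
        data = self.disk[sector]
        self._remember(sector, data)
        return data
```

With a capacity of one block, writing the same 512 bytes to two different sectors and then reading both back serves both reads from the single cached copy, which a sector-indexed cache of the same size could not do.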

publication date

  • September 1, 2010

Digital Object Identifier (DOI)

  • 10.1145/1837915.1837921

volume

  • 6

issue

  • 3