Storage 101: Two Data Reduction Technologies Explained
Many data storage systems have data reduction features built in. Data reduction is exactly what it sounds like. It is a set of methods by which the amount of data that is stored can be reduced so that it requires less actual space on the company’s storage systems. Data reduction can have a tremendous impact on the bottom line when it comes to storage.
Think about some of the documents and spreadsheets that are in use throughout a company. It’s very likely that there are redundancies within each file. For example, how many times does the word “the” appear in a large document? What if, instead of actually saving the word “the” a few hundred times, the storage system could save just one instance of that word and keep track of where else it appeared in the document? When a user opens the document, every instance of the word “the” would still be visible, but less data would be saved for the file. As this process, called compression, is applied at a broad scale and on all files, companies can achieve major storage capacity savings.
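The idea described above can be sketched in a few lines of Python. This is a toy illustration only, not how any particular storage product compresses data: it stores each distinct word once and records, for every position in the document, which stored word belongs there. The function names `encode` and `decode` are made up for this example.

```python
def encode(text):
    """Store each distinct word once, plus one table reference per word."""
    table = []      # unique words, in order of first appearance
    index_of = {}   # word -> its position in the table
    refs = []       # for each word in the text, the index of its table entry
    for word in text.split(" "):
        if word not in index_of:
            index_of[word] = len(table)
            table.append(word)
        refs.append(index_of[word])
    return table, refs


def decode(table, refs):
    """Rebuild the original text from the word table and the references."""
    return " ".join(table[i] for i in refs)


if __name__ == "__main__":
    document = "the cat and the dog and the bird"
    table, refs = encode(document)
    print(table)                 # each word, including "the", appears once
    print(refs)                  # small integers standing in for the words
    print(decode(table, refs) == document)
```

Real compression algorithms work on byte patterns rather than whole words and pack their references far more tightly, but the principle is the same: keep one copy and point back to it.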
Data deduplication is a lot like compression, but whereas compression works within each individual file, deduplication works across the whole of a data storage environment. Deduplication looks for common patterns across everything stored in the environment. Anything that is identified as duplicate data is removed, and an index keeps track of exactly what has been removed so that the information can be put back when a user needs to access a file.
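A minimal sketch of that index-and-remove approach follows. It assumes fixed-size chunks identified by a SHA-256 hash, which is one common way to detect duplicates but is not necessarily what any given vendor does. The `DedupStore` class and its method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store: each unique chunk is kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size   # assumed fixed chunk size, in bytes
        self.chunks = {}               # chunk hash -> chunk bytes (stored once)
        self.index = {}                # file name -> ordered list of chunk hashes

    def put(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # duplicate chunks are skipped
            recipe.append(digest)
        self.index[name] = recipe

    def get(self, name):
        """Use the index to put the file back together."""
        return b"".join(self.chunks[d] for d in self.index[name])

    def stored_bytes(self):
        """Physical space consumed by unique chunks."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.put("server1", b"AAAABBBBCCCC")
    store.put("server2", b"AAAABBBBCCCC")   # identical copy
    store.put("server3", b"AAAABBBBDDDD")   # mostly the same
    print(store.stored_bytes())             # far less than the 36 bytes written
    print(store.get("server2"))
```

Because the index is the only record of where each chunk belongs, a real system has to protect it carefully: losing the index means losing the ability to reassemble files.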
There is a whole lot more duplication in data center environments than it may seem. Deduplication works across all storage, not just user files. This storage includes the operating systems that run on each server. If the company has hundreds of servers all running the same version of Windows, that means that there are hundreds of opportunities for data deduplication to work its magic. Also, remember that these kinds of data reduction opportunities extend to the trove of data that’s hidden “below the waterline.” Above the waterline is the tier of data that’s visible to users. Hidden below it is a vast supporting mass in the form of all of the storage that users never see. Such storage includes the operating systems that support business workloads, the database server software that enables business intelligence, and the myriad Exchange systems that keep corporate communications flowing.
It quickly becomes obvious that an organization can save a lot of money by implementing data reduction technologies that let it store the full data set in just a fraction of the space. Most storage vendors use a ratio to describe their overall data reduction efficiency. This ratio compares the “real” size of the data with the space it occupies after reduction. For example, a “5:1” reduction ratio would mean that the data reduction technology is reducing data to just 20 percent of its pre-reduced size. In other words, the storage system is storing 5 gigabytes (GB) of data in just 1 GB of storage space.
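The arithmetic behind such a ratio is simple enough to check by hand, and the short sketch below works through the 5:1 example. The function names are illustrative only. Here "logical" means the pre-reduction size of the data and "physical" means the space it actually occupies.

```python
def reduction_ratio(logical_gb, physical_gb):
    """Ratio of pre-reduction size to the space actually consumed."""
    return logical_gb / physical_gb


def percent_of_original(ratio):
    """Share of the original size that is still stored, in percent."""
    return 100.0 / ratio


def percent_saved(ratio):
    """Share of the original size that no longer needs to be stored."""
    return 100.0 - 100.0 / ratio


if __name__ == "__main__":
    ratio = reduction_ratio(logical_gb=5, physical_gb=1)
    print(f"{ratio:g}:1")                    # the 5:1 example from the text
    print(percent_of_original(ratio))        # percent of pre-reduced size
    print(percent_saved(ratio))              # percent of capacity saved
```

Note that the savings flatten out as ratios climb: going from 5:1 to 10:1 raises the capacity saved from 80 percent to 90 percent, not double.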
As you’re looking at storage, though, just make sure you understand exactly how deduplication ratios are derived. Some vendors have been known to exaggerate their ratios or use ratios from outlier scenarios to boost their numbers.
Data reduction technologies used to be difficult to procure as storage vendors charged a hefty premium for the feature set. Moreover, such technologies often had a detrimental impact on the overall performance of the storage system. Because of these factors, the technologies were not always implemented. Today, though, many of these technologies are baked right into solutions available from many vendors. Newer storage vendors often include these features at no additional cost. Those in the market for storage should make sure that these money-saving features are included.