Data Backup vs Data Archiving: It’s Important To Know The Difference
When talking about Data Protection, there seems to be widespread confusion between the concepts of Backing Up Data and Archiving Data. Indeed, we at DataTrust still find this confusion alive and well in the ranks of IT and Data Protection Professionals today!
At a high level, the main differences are:
- Data Backup refers to making multiple copies of original data and storing them separately so that they can be used to restore the original if it is lost or damaged beyond repair. It is a safeguard for data that is in use. The backup copy is taken as a whole and is not processed in any way (e.g. it is not indexed for later rapid search and retrieval). Backups give you multiple versions of files (versioning), so that you can recover from multiple earlier points in time.
- Data Archiving refers to moving the actual item of data from its current location to a long-term storage system, usually because it is no longer actively used or to declutter slow, clogged-up systems. Because this store of data can ultimately become vast, data is tagged and indexed as it is stored, to allow rapid, easy search and retrieval later, mainly for Legal & Compliance e-Discovery reasons. In certain cases, data may also be cryptographically protected (for example, hashed or digitally signed) so that its authentic, unaltered status can be demonstrated later for evidential purposes. Archives store only a single version of each file, as a record, since archived files will probably never change, and an archive is designed to keep that version indefinitely, sometimes for decades.
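To make the contrast concrete, here is a minimal Python sketch that models files as an in-memory dictionary (an assumption for illustration only; real systems work on disks, tapes and databases). The hypothetical `Backup` class copies every file and leaves the original in place, so versions accumulate; the hypothetical `Archive` class removes the item from the live store, keeps a single copy as the record and indexes its words as it is stored.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Copies live data; the original stays in place and versions accumulate."""
    versions: dict[str, list[str]] = field(default_factory=dict)

    def take(self, live: dict[str, str]) -> None:
        # Copy every file exactly as it is right now; the source is untouched.
        for name, content in live.items():
            self.versions.setdefault(name, []).append(content)


@dataclass
class Archive:
    """Moves data out of the live system and keeps one indexed record per item."""
    records: dict[str, str] = field(default_factory=dict)
    index: dict[str, set[str]] = field(default_factory=dict)

    def move(self, live: dict[str, str], name: str) -> None:
        content = live.pop(name)       # removed from its current location
        self.records[name] = content   # a single version, kept as the record
        # Tag/index the item as it is stored, for later search and retrieval.
        for word in content.lower().split():
            self.index.setdefault(word, set()).add(name)
```

Taking two backups of a changing file leaves two versions behind, while archiving a file removes it from the live store and makes it findable by keyword.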
An Example to Illustrate:
Data Backup: “We are currently negotiating a Contract with Acme (Dublin) Ltd. Can I have last Tuesday’s Version of the Contract, please? I don’t want the most recent copy; we seem to have lost our way a bit and I want to go back to that point in time.”
Data Archive: “We agreed a contract with Acme (Dublin) Ltd. in August 2005. Can I have a copy of that, please?”
The significant difference here is that the Contract agreed in 2005 is an Unchanging Matter Of Record, of which there can only be one version, whereas the contract currently being negotiated is an ongoing Work In Progress for which no Final Version has been agreed. It may therefore be important to retain all iterations / versions of the Contract as it is negotiated.
Once an organization has a firm grasp of the differences, it doesn’t take long to realize that backup and archiving are both necessary, complementary services for organizations of all sizes, especially those bound by Legal & Compliance Requirements.
The main focus, or raison d’être, of Data Archiving is to provide a historical reference of information at a low Total Cost of Ownership. The archive process’s final product is a long-term (decades), unchangeable copy of data or information. It is understood and accepted that the archive media must be resilient, capable of surviving over long periods of time, and must guarantee that the archived data remains unchanged for the entire archive lifespan. There should also be an index of the archive to allow rapid and easy retrieval.
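One common way to check that archived data has remained unchanged is to record a cryptographic digest when an item is archived and re-check it later. The sketch below uses SHA-256 from Python’s standard `hashlib`; the function names `seal` and `verify` are hypothetical. Note that a digest on its own only detects change: to show that nobody altered both the data and the digest, the digest has to be stored separately or digitally signed.

```python
import hashlib


def seal(data: bytes) -> str:
    """Record a SHA-256 digest of an item at the moment it is archived."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded_digest: str) -> bool:
    """Re-hash the stored item and compare it with the digest taken at ingest."""
    return hashlib.sha256(data).hexdigest() == recorded_digest
```

Any change to the stored bytes, however small, produces a different digest, so a periodic `verify` pass over the archive flags items that no longer match their original state.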
Another function of Data Archiving is to declutter underperforming, “clogged up” systems. Much of the data on such systems is rarely used, duplicated or completely obsolete. Keeping this kind of data on your primary server slows it down and reduces its search speed and overall performance. To maintain optimal performance of your primary servers, you need to regularly move older or less active data off to a separate archival storage system. Because most Archive Systems will, in effect, be storing ever-growing amounts of data for very long (effectively indefinite) periods of time, Total Cost of Ownership is an important factor.
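Choosing what to move off the primary server is usually policy-driven. As a minimal sketch, assuming a hypothetical rule of “not used for more than a year” and last-used timestamps already collected into a dictionary, the candidates for archiving could be selected like this (`archive_candidates` is an illustrative name, not a real tool):

```python
from datetime import datetime, timedelta


def archive_candidates(last_used: dict[str, datetime], now: datetime,
                       max_idle: timedelta = timedelta(days=365)) -> list[str]:
    """Return files idle for longer than max_idle, oldest first."""
    idle = [(ts, name) for name, ts in last_used.items() if now - ts > max_idle]
    # Sorting (timestamp, name) pairs puts the longest-idle files first.
    return [name for ts, name in sorted(idle)]
```

The threshold is the policy decision: a shorter `max_idle` frees more primary storage but sends more still-useful data to the archive.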
e-Discovery is a key focus of Data Archiving. Archiving is nothing like a data backup that runs as a batch process every night: with many archive systems, particularly email archives, data is archived in real time as it is created or received. Another difference between the two is the level of detail that archives store. Data archives capture the metadata of each item. For example, for emails, a data archiving product stores the subject line, the sender and the recipients, and may even look into the body and attachments for keywords. Archives store all of this information in a database, and store the document or email attachment itself much as a data backup system would. In essence, archives take data and put it into a database that can be searched, whereas a backup system simply writes data to backup media (traditionally tape) with no way to search its contents.
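The difference between merely storing data and storing searchable metadata can be sketched with Python’s built-in `sqlite3` module. The table layout and the `archive_email` / `search` helpers below are hypothetical, and the simple `LIKE` match stands in for the full-text indexing a real archiving product would use.

```python
import sqlite3

# In-memory database for illustration; a real archive would persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE emails (
           id INTEGER PRIMARY KEY,
           subject TEXT, sender TEXT, recipient TEXT, body TEXT)"""
)


def archive_email(subject: str, sender: str, recipient: str, body: str) -> None:
    # Store each metadata field in its own column alongside the content.
    conn.execute(
        "INSERT INTO emails (subject, sender, recipient, body) VALUES (?, ?, ?, ?)",
        (subject, sender, recipient, body),
    )


def search(term: str) -> list[tuple[int, str]]:
    # LIKE is case-insensitive for ASCII text in SQLite by default.
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id, subject FROM emails "
        "WHERE subject LIKE ? OR sender LIKE ? OR recipient LIKE ? OR body LIKE ? "
        "ORDER BY id",
        (pattern,) * 4,
    )
    return rows.fetchall()
```

One query answers “which emails mention this term?” across the whole archive, which is exactly what a pile of backup images cannot do.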
Because Data Archiving and Data Backup serve two totally different, albeit complementary, purposes, your data archiving requirements are very unlikely to be met by your data backup application. If someone asks you for a specific email or file, you won’t be able to go to your data backup system and write a query that yields the desired results. For example, let’s say you have a full backup of your system every Saturday for the last five years. Then your boss comes to you and says, “I want all files and emails with this particular Surname in them.” In theory, you would have to restore and search every backup in turn to be sure you have found all the files. That would mean restoring roughly 5 Years x 52 Saturdays = 260 Backups and searching each one individually!
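The cost of using backups for search grows with every backup kept. This sketch, assuming each restored backup is represented as a dictionary of file names to text (a stand-in for a real, slow restore job), counts how many restores a brute-force search for a surname needs; the helper names are hypothetical.

```python
from datetime import date, timedelta


def weekly_saturdays(start: date, years: int) -> list[date]:
    """Saturday backup dates, assuming 52 per year as in the text."""
    # date.weekday(): Monday is 0, Saturday is 5.
    first = start + timedelta(days=(5 - start.weekday()) % 7)
    return [first + timedelta(weeks=i) for i in range(52 * years)]


def brute_force_search(backups: dict[date, dict[str, str]],
                       surname: str) -> tuple[set[str], int]:
    """Restore every backup in turn and scan each file for the surname."""
    hits: set[str] = set()
    restores = 0
    for day in sorted(backups):
        restored = backups[day]  # stands in for a full restore of that backup
        restores += 1
        hits.update(name for name, text in restored.items() if surname in text)
    return hits, restores
```

Even if the surname appears in only one backup, every one of the 260 must be restored to be sure nothing was missed.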
This is, obviously, not an option.
A classic backup application takes periodic (Daily, Weekly, Monthly etc.) images of active data in order to provide a way to recover records that have been corrupted, deleted or destroyed. These Backups are retained only for a finite period, with later backup images superseding / overwriting earlier ones. Data Backup configurations are generally determined by operational requirements.
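A simple rotation scheme, where only the most recent images are kept and older ones are overwritten, can be sketched with Python’s `collections.deque`, which discards its oldest entry when it is full. The `BackupRotation` class and the retention count are illustrative assumptions; real schedules often combine separate daily, weekly and monthly sets.

```python
from collections import deque


class BackupRotation:
    """Keep only the most recent `keep` backup images; older ones drop off."""

    def __init__(self, keep: int) -> None:
        self.images: deque[tuple[str, dict[str, str]]] = deque(maxlen=keep)

    def run(self, label: str, data: dict[str, str]) -> None:
        # Appending to a full deque silently discards the oldest image,
        # mimicking a backup set being overwritten.
        self.images.append((label, dict(data)))

    def labels(self) -> list[str]:
        return [label for label, _ in self.images]
```

With `keep=3` and five nightly runs, only the last three nights remain available for recovery.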
A Backup process captures a copy of the data being protected exactly as it is at that moment. This data may change from one backup to the next, and these multiple copies may be important for restoring the data to a required point in time. This is referred to as Versioning, as it means retaining multiple Versions of the data as it changes. Each subsequent backup captures the next Version of the data.
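Versioning is what makes a request like “last Tuesday’s Version of the Contract” possible. A minimal sketch, assuming the backup catalogue is a time-sorted list of (timestamp, content) pairs, finds the latest version taken at or before a requested moment with the standard `bisect` module; `version_as_of` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def version_as_of(versions: list[tuple[datetime, str]],
                  when: datetime) -> Optional[str]:
    """Return the latest backed-up version taken at or before `when`."""
    times = [t for t, _ in versions]  # versions must already be sorted by time
    i = bisect_right(times, when)     # number of backups taken at or before `when`
    return versions[i - 1][1] if i else None
```

If no backup existed yet at the requested time, the function returns `None` rather than guessing.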
In a Disaster Situation, the backup needs to be accessible quickly for recovery. Remember that Data Backup protects Live In-Use Data. The Backup also needs to complete quickly, within constrained time periods when data has a relatively low rate of change, e.g. Overnight, Lunch Time etc. Once the amount of data grows beyond a certain size, it may also become important that the backup system can identify and back up only the data that has changed (an incremental backup), so that backups can run within the allowable Backup Window, e.g. Over Lunch, Overnight or Over the Weekend.
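One way to identify “only the data that has changed” is to compare a content hash of each file with the hash recorded on the previous run. The sketch below assumes files are supplied as a dictionary of paths to bytes and that the hashes are kept between runs in `last_seen`; `incremental_backup` is a hypothetical name. Many real backup tools instead rely on modification times or archive flags, and this sketch does not handle deleted files.

```python
import hashlib


def incremental_backup(files: dict[str, bytes],
                       last_seen: dict[str, str]) -> dict[str, bytes]:
    """Return only files whose content changed since the last run; update last_seen."""
    changed: dict[str, bytes] = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if last_seen.get(path) != digest:
            changed[path] = data      # new or modified: include in this backup
            last_seen[path] = digest  # remember it for the next run
    return changed
```

The first run copies everything; later runs copy only what changed, which is what lets a growing data set still fit inside the Backup Window.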