In a previous article, we identified that traditional backup policies, built on long retention for local backups and shorter retention for offsite (Cloud) backups, were becoming less and less relevant.
In this article, we analyze the needs and the impacts of reversing that traditional logic: moving to short local retention and long remote (Cloud) retention.
Understanding deduplication technologies
Although deduplication technologies were quickly integrated into high-end backup solutions through specialized appliances (typically performing target-side deduplication), they have remained only moderately effective in traditional software backup solutions, which rely on source-side deduplication and therefore consume network bandwidth and CPU on the source servers.
Thus, with very mature target-side deduplication technologies (such as Dell Data Domain, Avamar, or HPE StoreOnce), deduplication rates can reach 90 to 95%. However, these appliances come with numerous constraints:
– Scalability and cost of the appliances in terms of storage space
– The need to rehydrate old backup data: for retention periods of 5 or 10 years, you have to follow technological evolutions (in short, migrate backups to new appliances), which is costly and tedious
– Hardware and software obsolescence
For purely software backup solutions, deduplication rates are much lower (between 30 and 70%) and depend on many parameters (the chunking sketch after this list illustrates why):
– File types
– Backup methods
– The algorithms used
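As an illustration of that last point, here is a minimal sketch in Python, not tied to any backup product: it compares fixed-size chunking with a naive content-defined chunking on two successive backups of the same file, where the second version simply has a few bytes inserted at the beginning. The sizes, mask, and function names are illustrative choices, not a reference implementation.

```python
import hashlib
import random

def fixed_chunks(data: bytes, size: int = 4096):
    """Fixed-size chunking: cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_chunks(data: bytes, mask: int = 0x0FFF):
    """Naive content-defined chunking: cut wherever a hash of the last few
    bytes matches a pattern, so the boundaries follow the content itself."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    chunks.append(data[start:])
    return chunks

def reuse(previous, current):
    """Share of the new backup's chunks already present in the chunk store."""
    store = {hashlib.sha256(c).digest() for c in previous}
    return sum(hashlib.sha256(c).digest() in store for c in current) / len(current)

random.seed(0)
v1 = random.randbytes(1024 * 1024)      # first backup of a 1 MiB file
v2 = b"ten bytes!" + v1                 # next backup: 10 bytes inserted at the front

# The insertion shifts every fixed boundary, so almost nothing is reused;
# content-defined boundaries realign after the change, so almost everything is.
print(f"fixed-size chunking:      {reuse(fixed_chunks(v1), fixed_chunks(v2)):.0%} reused")
print(f"content-defined chunking: {reuse(content_chunks(v1), content_chunks(v2)):.0%} reused")
```

This is also why file types matter so much: already compressed or encrypted files produce entirely new chunks after even a small change, so they deduplicate poorly whatever the algorithm.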
In summary, the most effective deduplication has long been delivered by local backup appliances: target-side deduplication is far more efficient, but it carries the constraints of any hardware device (capacity management, obsolescence).
Current software backup solutions that back up directly to the Cloud have much less efficient deduplication engines because they rely on source-side deduplication.
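To recall the principle behind source-side deduplication, here is a minimal, product-agnostic sketch in Python: the source itself hashes every chunk (CPU spent on the production server), asks the target which chunks it already holds, and sends only the missing ones over the network. The DedupTarget class and its methods are hypothetical, not a real backup API.

```python
import hashlib
import random

class DedupTarget:
    """Hypothetical backup target (e.g. a Cloud endpoint) that stores chunks by hash."""
    def __init__(self):
        self.chunks = {}                                  # hash -> chunk data

    def missing(self, hashes):
        """Tell the client which chunk hashes it does not hold yet."""
        return {h for h in hashes if h not in self.chunks}

    def upload(self, chunk):
        self.chunks[hashlib.sha256(chunk).hexdigest()] = chunk

def source_side_backup(data: bytes, target: DedupTarget, size: int = 4096):
    """Source-side deduplication: hashing happens on the production server,
    and only chunks unknown to the target cross the network."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]   # CPU cost on the source
    to_send = target.missing(hashes)
    sent = 0
    for h, c in zip(hashes, chunks):
        if h in to_send:
            target.upload(c)
            sent += len(c)
            to_send.discard(h)                            # do not send the same chunk twice
    return hashes, sent                                   # backup manifest + bytes on the wire

random.seed(0)
target = DedupTarget()
data = random.randbytes(4096 * 100)                       # ~400 KiB of server data
_, sent_day1 = source_side_backup(data, target)
_, sent_day2 = source_side_backup(data, target)           # unchanged data the next day
print(f"day 1: {sent_day1} bytes sent, day 2: {sent_day2} bytes sent")
```

A target-side appliance reverses the roles: the source sends its data as-is and the appliance performs the hashing and duplicate elimination at ingest, which is what makes it both more efficient and dependent on dedicated hardware.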
A Significant Evolution: Perpetual Incremental Backups (Forever Incremental Backup)
A perpetual incremental backup strategy eliminates the concept of full backups and their impact on weekend backup windows (an impact that exists even when using synthetic full backups), but it generates new risks:
– How to be sure that a restore will work when no full backup is ever taken again: how to check the integrity of the backup chain
– How to guarantee a stable restore duration: the restore must not have to read blocks scattered across too many increments, which greatly slows it down (see the sketch after this list)
– How to reorganize the stored blocks at regular intervals without impacting the servers of the original infrastructure
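To make the last two risks concrete, here is a deliberately simplified sketch in Python (blocks are modeled as dictionary entries, not a real backup format, and all names are illustrative): each increment records only the blocks that changed that day, so restoring the latest state has to hunt for every block across the whole chain, while a consolidation step performed purely on the backup side bounds that spread without touching the production servers.

```python
import random

def restore_latest(chain, total_blocks):
    """Rebuild the latest state of the disk: for each block, take the most
    recent version found when walking the backup chain backwards. The longer
    the chain, the more different backups a single restore has to read from."""
    image, touched = {}, set()
    for block in range(total_blocks):
        for gen in range(len(chain) - 1, -1, -1):      # newest backup first
            if block in chain[gen]:
                image[block] = chain[gen][block]
                touched.add(gen)
                break
    return image, touched

def consolidate(chain, keep_last):
    """Merge everything older than the last `keep_last` increments into a new
    baseline, built entirely on the backup side (no reads on the source server)."""
    base = {}
    for increment in chain[:len(chain) - keep_last]:
        base.update(increment)                          # newer blocks overwrite older ones
    return [base] + chain[len(chain) - keep_last:]

random.seed(0)
TOTAL_BLOCKS = 64

# Day 0: full backup of every block, then 30 daily increments that each
# record only the 4 blocks that changed that day.
chain = [{b: f"day0-block{b}" for b in range(TOTAL_BLOCKS)}]
for day in range(1, 31):
    chain.append({b: f"day{day}-block{b}" for b in random.sample(range(TOTAL_BLOCKS), 4)})

image, touched = restore_latest(chain, TOTAL_BLOCKS)
print(f"restore reads from {len(touched)} of the {len(chain)} backups in the chain")

compact = consolidate(chain, keep_last=7)
image2, touched2 = restore_latest(compact, TOTAL_BLOCKS)
assert image == image2          # same restored data, read from far fewer places
print(f"after consolidation: {len(touched2)} of {len(compact)} backups")
```

Backup products typically expose this consolidation as synthetic fulls or repository merge/compaction jobs; whatever the name, the chain has to be shortened regularly for restore times to remain predictable.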
Backup solutions answer these risks in different ways, but keep in mind that the method that inspires the most confidence remains restarting the server from its backed-up data and verifying that everything works.