What new criteria to integrate into your backup policy?

In a previous article, we showed that traditional backup policies, built on long retention for local backups and shorter retention for offsite (Cloud) backups, are becoming less and less relevant.

In this article, we analyze the needs and impacts of inverting that traditional logic: short local retention combined with long Cloud retention.

Understanding deduplication technologies

Deduplication was quickly adopted by high-end backup solutions through specialized appliances performing target-side deduplication, but it has remained only moderately effective in traditional software backup solutions, which rely on source-side deduplication and therefore consume network bandwidth and resources on the source servers.

Mature target-side deduplication appliances (such as Dell Data Domain, Avamar, or HPE StoreOnce) can reach deduplication rates of 90 to 95%. However, these appliances come with significant constraints:

– Limited scalability and high cost per unit of storage space
– For long retention periods (for example, 5 or 10 years), the need to rehydrate old backup data and follow technological evolutions (in short, migrate backups to new appliances), which is costly and tedious
– Hardware and software obsolescence

For purely software backup solutions, deduplication rates are much lower (between 30 and 70%), depending on many parameters:

– File types
– Backup methods
– Algorithms used

In summary, the most effective deduplication solutions have long been deployed on local backup appliances: target-side deduplication is much more efficient, but it carries the constraints of any hardware device (capacity management, obsolescence).

Current software backup solutions that back up to the Cloud have much less efficient deduplication engines because they rely on source-side deduplication.
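Whether deduplication runs on the source or on the target, both approaches rest on the same idea: an index of chunks keyed by their digest, where each unique chunk is stored only once. The following Python sketch is purely illustrative (fixed-size chunks, hypothetical names, not the engine of any particular product):

```python
import hashlib

def dedup_fixed(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store each unique chunk
    once, keyed by its SHA-256 digest (a typical dedup index shape)."""
    store = {}   # digest -> chunk content (the deduplicated store)
    recipe = []  # ordered list of digests needed to rebuild the stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return store, recipe

def restore(store: dict, recipe: list) -> bytes:
    """Rehydrate the original stream from the chunk store."""
    return b"".join(store[d] for d in recipe)

# A stream with many repeated blocks deduplicates well:
data = b"A" * 4096 * 10 + b"B" * 4096 * 5
store, recipe = dedup_fixed(data)
print(len(store), "unique chunks for", len(recipe), "logical chunks")
```

The difference between the two families is where this logic runs: with source-side deduplication, the backup client computes the digests and sends only unknown chunks (saving bandwidth but consuming CPU on the production server); with target-side deduplication, the appliance does it after receiving the data.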

A Significant Evolution: Perpetual Incremental Backups (Forever Incremental Backup)

A forever incremental backup strategy eliminates full backups and the problems they create for weekend backup windows (problems that persist even with synthetic full backups), but it introduces new risks:

– How to guarantee that a restore will work when no full backup is ever taken: how to verify the integrity of the backup chain
– How to guarantee a stable restore duration: a restore that must read from too many scattered data blocks is dramatically slowed down
– How to reorganize the stored blocks at regular intervals without impacting the servers of the source infrastructure

Backup solutions answer these risks in different ways, but keep in mind that the method that inspires the most confidence remains restarting the server with its data and verifying that everything works.
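To make the first risk concrete, here is one possible shape of a repository consistency check (a hypothetical sketch, not the mechanism of any specific product): every block referenced by the latest restore point is re-hashed and compared against its index key, which catches both missing blocks and silent corruption.

```python
import hashlib

def verify_restore_point(store: dict, recipe: list) -> bool:
    """Check that a restore point of a forever-incremental repository
    is restorable: every referenced block must exist in the store and
    its content must still match its SHA-256 key."""
    for digest in recipe:
        block = store.get(digest)
        if block is None:
            return False  # missing block: the restore would fail outright
        if hashlib.sha256(block).hexdigest() != digest:
            return False  # bit rot or corruption in the repository
    return True
```

Running such a check directly on the storage target is what allows consistency to be verified regularly without touching the source infrastructure; it is, however, weaker than actually restarting the server from its backup.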

New, more efficient Backup engines integrating source and target deduplication: the best of both worlds?

New backup engines have appeared quite recently in Linux environments. Why Linux? Mainly because large research organizations generate petabytes of scientific data that must be backed up, with extremely high data growth rates.

In this context of large Linux servers (from a few TB to tens or hundreds of TB), traditional backup solutions suffered from:

– Backup durations that were far too long
– Gigantic storage space consumption
– Interruptions or errors forcing restarts when the software does not handle resumption in all contexts
– And ultimately, the inability to restore these volumes of data within reasonable time frames: restoring a 10 TB server with a traditional backup solution could take 3 to 5 days and regularly failed to complete

New backup engines have therefore emerged, built on the concepts described above:

– Forever incremental backups, which avoid regularly performing resource-hungry full backups, but which require additional mechanisms on the backup storage target to regularly check consistency and reach a high level of confidence
– New deduplication methods based on Content Defined Chunking (CDC): these mechanisms, built on rolling hashes, are a significant advance in the efficiency of deduplication algorithms and achieve a high deduplication rate
– The ability, in a GDPR context, to delete from the backup history all data belonging to a given folder, client, etc.
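The idea behind Content Defined Chunking can be sketched in a few lines: a rolling hash over the last few bytes of the stream decides where chunk boundaries fall, so boundaries follow the content rather than fixed offsets. Inserting one byte then perturbs only the surrounding chunk instead of shifting every fixed-size block, which is what makes the deduplication rate so high. The sketch below uses a Gear-style rolling hash (in the spirit of FastCDC); all constants and names are illustrative, not taken from any particular product:

```python
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random values

def cdc_chunks(data: bytes, mask: int = (1 << 12) - 1,
               min_size: int = 1024, max_size: int = 65536):
    """Content-defined chunking with a Gear rolling hash.
    A boundary is declared when the low bits of the rolling hash are
    all zero, so boundaries depend on content, not on byte offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # reset the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk, possibly short
    return chunks
```

With a 12-bit mask the expected chunk size is on the order of a few KB, and because the boundary condition only depends on the last few bytes, the chunking "resynchronizes" quickly after an insertion: all chunks past the modified region keep the same digests and deduplicate perfectly against the previous backup.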

In a future blog article, we will explain in detail all these possibilities.

Nuabee's contact

65, rue Hénon
69004 Lyon - France