When we originally planned the filesystem for our backup servers, we were confident that even at 250 million files, backups would process within a reasonable time (read: less than 12 hours to process and clean up old backups).
Unfortunately, as of this email, our backup node has over 1.04 billion files in the filesystem. Cleanups are taking longer than the buffer time between backup runs, meaning the load on the server will keep compounding until it locks up.
While we could have used TAR archives, we would need more than 4x the capacity we have now to keep backups, pushing costs to the point where it would be unrealistic to include them for free.
Francisco has been in discussion with multiple hardware vendors to come up with a solid platform to address this major issue. While the new setup will allow for faster backup times as well as sub-minute cleanup times, a complete wipe of all daily backups is required to get a fresh start.
Once the formatting is complete, we expect to have a fresh initial backup in place within 24 hours, with the stallion code adjustments completed to put us back on a 7-day backup rotation.
We're fairly certain we'll be able to keep all snapshots currently in the system.
