How to Find the Right Balance between Archive, Backup, Nearline, and Online
Data expands to fill the storage space available. Well, data isn’t really expanding; rather, this is a corollary to Parkinson’s Law: “Work expands so as to fill the time available for its completion.” In short, it may seem like your storage system will never be big enough. As capacity requirements continue to increase due to high-resolution camera and delivery formats, facilities frequently find themselves battling to free up space on their shared storage systems. Gone are the days when you could simply lay back the show master to Digibeta and throw it in a vault. In 2015 everything is file-based, and there’s an urge to keep everything involved in a project, including camera originals, editorial proxies, CGI, scripts, logs, you name it!
When it comes to storing project data used in post production, it’s common to refer to different storage “tiers” used for the different stages of a file’s life. Tier 1 represents the fastest, and most expensive, collaborative storage. Tier 1 storage utilizes the best technology in spinning disks and/or SSD to attain maximum performance. It’s synonymous with “online” and is the main work volume for compressed HD or 4K and 6K editorial and finishing work in process. For data that is not in process, we turn to a Tier 2 storage system, commonly referred to as “nearline” storage.
Tier 2 storage tends to be cheaper and consequently may have less performance or availability than Tier 1. It may consist of commodity RAID enclosures attached to a client workstation or to a NAS head for network access. Some facilities that own substantial IT infrastructure may use a portion of their enterprise storage as well. Nearline exists primarily to hold data moved off Tier 1 storage that is not actively needed, but that is anticipated to be needed in the near future, possibly on short notice. Nearline also has a management layer that assists in the movement of data to and from Tier 1 when requested. This management can be very simple, as in scheduled movements and automated restore, or very complex, with systems that obscure the location of the data in an object-oriented database.
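To make the “very simple” end of that management layer concrete, here is a minimal Python sketch of a scheduled-movement policy: it lists files on a Tier 1 volume that have not been modified for a given number of days, oldest first, as candidates to move to nearline. The function name, the idle_days threshold, and the use of modification time as the idleness signal are all assumptions made for illustration; this is not how any particular nearline product decides what to move.

```python
import os
import time
from pathlib import Path


def find_nearline_candidates(root, idle_days, now=None):
    """Return files under root not modified within idle_days, oldest first.

    Hypothetical policy: modification time stands in for "not actively needed".
    """
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400  # 86400 seconds per day
    candidates = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            mtime = path.stat().st_mtime
            if mtime < cutoff:
                candidates.append((mtime, path))
    # Oldest files first, so the stalest material is moved before anything else.
    candidates.sort(key=lambda item: item[0])
    return [path for _, path in candidates]
```

A scheduler (cron, for example) could run this nightly against the Tier 1 mount point and hand the resulting list to whatever copy-and-verify step the facility trusts. A real policy would likely also consider project status and file type, not just age.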
This leaves Tier 3, which is most commonly associated with long-term archive, and typically utilizes either LTO (Linear Tape-Open) drives or a cloud storage solution. LTO tape is popular with many facilities due to its high capacity, relatively speedy restore (once the tape is ready), small footprint, and low cost. Cloud storage can be even cheaper, but is best used as an option for project-based deep archive requirements. Since you’re likely to incur charges to retrieve your data from the cloud, it works best if your clients are paying for that service. Tier 3 is likely the place your data ends up when it can’t be deleted, but isn’t immediately needed.
What many facilities fail to consider is the difference between archive and backup. Archive is all about longevity and media compatibility. If you can access the data at all in 10 years, you’ll be happy. Backup, on the other hand, is a safety net, and it’s all about the restore. How quickly can you get back to work after a system failure, accidental deletion, or other disaster? If you rely on generic LTO or cloud services for backup of more than a few terabytes, you may be waiting a long time compared to the spinning disk alternative.
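The restore question comes down to simple arithmetic: data size divided by sustained throughput, plus any fixed delay before the transfer can start (locating and loading a tape, or queuing a cloud retrieval). The Python sketch below shows that calculation. The function name and parameters are hypothetical, and the figures in the usage lines are placeholders, not measured rates for any product; substitute numbers you have measured in your own facility.

```python
def restore_hours(data_tb, throughput_mb_s, startup_minutes=0.0):
    """Estimate hours to restore data_tb terabytes at a sustained rate.

    throughput_mb_s is the sustained restore rate in MB/s (assumed constant).
    startup_minutes is any fixed delay before data starts flowing.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    transfer_seconds = data_mb / throughput_mb_s
    return (transfer_seconds + startup_minutes * 60) / 3600


# Placeholder figures only; replace with rates measured on your own systems.
slow_path = restore_hours(10, throughput_mb_s=100, startup_minutes=15)
fast_path = restore_hours(10, throughput_mb_s=1000)
print(f"slower restore path: {slow_path:.1f} h, faster restore path: {fast_path:.1f} h")
```

Running the numbers this way before a failure happens makes it easier to decide how much in-process material should sit on a disk-based backup rather than on tape or in the cloud.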
Each facility has individual requirements based on the clients they support and the services they offer. If you’ve sold your client on the need to keep their media safe for re-purposing to 4K/8K three to four years from now, you’re in the archive space. Tape can be a great solution at this point; however, a bunch of non-spanned tapes on the shelf, each requiring manual handling to bring data back to Tier 1, can be tricky down the road. For more demanding scenarios, you’ll most likely need a more robust tape infrastructure, such as a robotic library with indexing and data automation for the quickest unattended retrieval.
If you’ve set a clear expectation with the departing client that the minute they need to get back in the facility, the media and all project assets will be available, you’re talking about a hybrid category: offload. Offload is the process of moving data off Tier 1 storage simply due to space constraints. The reason for doing this is obvious, but the target of this data can vary depending on the situation. In the example above, the target may be spinning disk, or LTO if the tape solution is robust.
Offload is not nearline, because nearline is seen as a more managed approach to data movement, while offload is a reactive action to the situation. If you have a nearline strategy, you may not need to offload, as any requirement for offload would be added into the nearline management policies.
Finding the Balance
With the number and size of media files coming through the door steadily increasing, finding the right balance between archive, backup, nearline, and online storage has become critical for any post house.
Tape wins for pure archive due to longevity of the media, but what about short-term archive? Nearline disk is a great backup for jobs that are still in process, but what if restore speed isn’t an issue? What if you have too much media to hold on spinning disk for even another week, but your facility will bear the cost of archiving that material? No one has all the answers because no one knows your business, but some guidelines can be helpful.
At Facilis, we offer our SyncBlock products including LTO drives and libraries, and NL16 nearline disk arrays for customers seeking a nearline solution. The NL16 is a 16-drive RAID5 or RAID6 array based on the ATTO ExpressSAS hardware. It’s shipped with software that manages scheduling, file decisions and policies, and automated restore of data. Give us a call, and we’ll be happy to discuss your unique requirements.