Feature Story



CLIENT: BakBone Software

August 2004: LinuxWorld Magazine

NEXT GENERATION BACKUP TECHNOLOGIES FOR LINUX

Linux-based servers are fast becoming the low-cost alternative to higher-priced proprietary Unix and Windows environments. Finding a cost-effective storage management solution that supports different environments can be a challenge. As enterprise IT environments grow and change, cost-conscious administrators are constantly on the lookout for more efficient ways to scale out storage configurations. So what's available to quickly and flexibly meet these pressing business needs?

A New Approach to Media Management

Enter the Virtual Disk Library (VDL, also referred to as a Virtual Tape Library by many in the industry). It looks and feels like a tape library, but it sets up an entirely new media management paradigm for Linux users. With a VDL, tape becomes one strategic component of a data protection strategy rather than its major element. Managers can create multiple duplications of backup jobs from the VDL to tape, or vice versa. Administrators can set up specific backup policies, such as retention dates, rotation schemes, and media groups. A VDL also allows save sets to be accessed wherever they reside. Incremental saves can be sent to disk for fast restores. Storing backup data on a VDL allows data copy jobs to be run offline, without impacting the network, application servers, or workstations. VDLs are immune to the mechanical afflictions of tape backup over a network, such as shoe-shining or a host that streams data slowly. They can capture data in drips or blasts, arriving at virtual media slots as a save set, which brings us to how VDLs are constructed.

Deconstructing the VDL

Based on a modular, object-oriented architecture, the software that runs VDL lets Linux administrators integrate tape backup and restore operations seamlessly with a variety of databases and messaging applications, storage devices, and storage area networks (SANs).

The VDL is basically a directory structure on disk consisting of directories called drives and slots. Numbered directories reside under each of these directories, and each numbered directory defines a unique slot or drive. A media file resides in each numbered slot directory. These media files are the virtual library's "tapes".
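The layout described above can be sketched in a few lines of Python. This is only an illustration of the directory idea, not the on-disk format of any particular product: the function names and the media file name `media` are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def create_vdl(root: Path, drives: int, slots: int) -> None:
    """Create a toy VDL layout: drives/<n>/ and slots/<n>/media.

    The file name "media" is hypothetical; a real product chooses its own.
    """
    for n in range(1, drives + 1):
        (root / "drives" / str(n)).mkdir(parents=True, exist_ok=True)
    for n in range(1, slots + 1):
        slot_dir = root / "slots" / str(n)
        slot_dir.mkdir(parents=True, exist_ok=True)
        # Empty placeholder standing in for the virtual "tape".
        (slot_dir / "media").touch()


def list_media(root: Path) -> List[Path]:
    """Return the media files in slot-number order."""
    return sorted((root / "slots").glob("*/media"),
                  key=lambda p: int(p.parent.name))
```

Calling `create_vdl(Path("/tmp/vdl-demo"), drives=2, slots=8)` would produce two numbered drive directories and eight numbered slot directories, each slot holding one media file.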

Virtual libraries are treated as real physical libraries. The more drives the VDL contains, the more simultaneous backups can be performed. Virtual libraries always have many more slots than drives and are usually configured with a minimum of 8 slots. Having extra slots allows for the proper handling of backup retention cycles. In addition, different operating systems may impose limits on the maximum file size, which can affect the number of slots needed. When the system is configured for Linux and the number of slots and the media capacity are defined, media files are created and the space is pre-allocated.
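As a rough illustration of how a file size limit drives the slot count, the sketch below assumes each media file can be no larger than the file system's maximum file size, so the library capacity must be spread across enough slots. The function name and the sample figures are hypothetical; the default of 8 reflects the usual minimum mentioned above.

```python
import math


def slots_needed(total_capacity_gb: float,
                 max_file_size_gb: float,
                 minimum_slots: int = 8) -> int:
    """Estimate slot count when one media file cannot exceed max_file_size_gb."""
    by_capacity = math.ceil(total_capacity_gb / max_file_size_gb)
    # Never go below the customary minimum number of slots.
    return max(by_capacity, minimum_slots)


# Hypothetical figures: a 500 GB library on a file system with a 2 GB file limit.
print(slots_needed(500, 2))    # 250
# A 100 GB library with a generous 64 GB limit still gets the minimum.
print(slots_needed(100, 64))   # 8
```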

Administrators can also install application plug-in modules for Oracle®, MySQL®, Sybase®, PostgreSQL®, Informix®, and various other applications. The modules automatically add application-specific components to the backup and restore selection criteria that appear on the system's graphical user interface (GUI). From this common GUI, Linux administrators can manage all backup and restore operations across a SAN, network-attached storage (NAS), wide area network (WAN), or local area network (LAN).

When to Use a VDL

VDL Staging can be useful in two areas. If a company has a huge file system with millions of files, a typical server might not be able to read these files fast enough to stream today's high-performance tape drives. This can lead to shoe-shining and premature drive or media failures. There's no shoe-shining with disk storage, so there's no downside from slow performance when backing up to a VDL.

On the other hand, if the backup window is too small to back up several Clients onto a limited number of tape drives, a VDL with enough Virtual Drives could back up all Clients simultaneously. Performance here would hinge on network bandwidth, requiring a gigabit network to handle the load. For example, if a Linux user wanted to back up five Clients in one hour, each with 10GB of data, and had only one tape drive performing at 18GB/hour, the backup window would be too small: 50GB at 18GB/hour takes nearly three hours. With Disk Staging, the user could define enough Virtual Tape Drives to back up all the Clients within the hour, then copy the data to physical tape afterward.
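The arithmetic behind this example is easy to check. The sketch below uses the figures from the text (five Clients, 10GB each, one 18GB/hour tape drive, a one-hour window); the function names are illustrative, and the network conversion assumes decimal units (1 GB = 8,000 megabits).

```python
def serial_backup_hours(clients: int, gb_per_client: float,
                        drive_gb_per_hour: float) -> float:
    """Hours needed when every Client is backed up through one tape drive."""
    return clients * gb_per_client / drive_gb_per_hour


def required_rate_gb_per_hour(clients: int, gb_per_client: float,
                              window_hours: float) -> float:
    """Aggregate throughput needed to finish all Clients inside the window."""
    return clients * gb_per_client / window_hours


def gb_per_hour_to_mbit_per_s(gb_per_hour: float) -> float:
    """Convert GB/hour to megabits per second (assumes 1 GB = 8000 Mbit)."""
    return gb_per_hour * 8000 / 3600


print(round(serial_backup_hours(5, 10, 18), 2))        # 2.78 hours on one drive
rate = required_rate_gb_per_hour(5, 10, 1)
print(rate)                                            # 50.0 GB/hour needed
print(round(gb_per_hour_to_mbit_per_s(rate), 1))       # 111.1 Mbit/s
```

The required rate of roughly 111 Mbit/s is more than a 100 Mbit/s link can carry even in theory, which is consistent with the call for a gigabit network.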

VDL Staging vs. Multiplexing

There are essentially two approaches to backing up multiple Clients to a limited number of tape drives in a short backup window: staging and multiplexing.

In multiplexing, multiple streams of backup data are sent to one tape device. This results in a number of drawbacks. For one, a backup of any given client will span more tape than is actually required, which calls for handling multiple tapes per client backup. There's also a higher probability that a backup is lost to a media failure, since more media are used for any given backup. Restores take longer, because more tape needs to be scanned for a given restore and the data must be reconstructed from multiple interleaved streams. Multiplexing also uses more CPU time on the backup server, because data streams must be reorganized and packed into a multiplexed stream. This can create performance problems with today's high-speed tape devices.

With the VDL staging approach, extra disk space is required for the virtual library resource allocation. But each Client's backups are always contiguous on tape, which uses less tape and speeds up restores.
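A toy model makes the difference in tape layout concrete. The sketch below assumes a simple round-robin interleave for multiplexing (real products interleave according to how data arrives) and measures how much tape must be traversed to restore one Client. All names are hypothetical.

```python
from itertools import zip_longest


def make_streams(clients, blocks_per_client):
    """Build one list of (client, block_number) tuples per Client."""
    return {c: [(c, i) for i in range(blocks_per_client)] for c in clients}


def multiplexed_layout(streams):
    """Interleave blocks from all Clients round-robin onto one tape."""
    tape = []
    for group in zip_longest(*streams.values()):
        tape.extend(block for block in group if block is not None)
    return tape


def staged_layout(streams):
    """Write each Client's blocks contiguously, one Client after another."""
    tape = []
    for blocks in streams.values():
        tape.extend(blocks)
    return tape


def restore_span(tape, client):
    """Number of tape blocks between a Client's first and last block, inclusive."""
    positions = [i for i, (c, _) in enumerate(tape) if c == client]
    return positions[-1] - positions[0] + 1


streams = make_streams(["A", "B", "C"], 3)
print(restore_span(multiplexed_layout(streams), "A"))  # 7
print(restore_span(staged_layout(streams), "A"))       # 3
```

With three Clients of three blocks each, restoring Client A from the multiplexed tape means passing over seven blocks to collect three, while the staged layout keeps those three blocks adjacent.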

Consolidating File System Backups

Combining VDLs and tape for backup opens up a variety of strategies that answer the need for increased data protection, faster backups and restores, and even reduced data vulnerability through multiple copies. A key strategy involves consolidating file system backups.

Consolidated backups let users create a "synthetic" full backup without running a weekly full backup. Full backups are very resource intensive. They can consume a considerable amount of network bandwidth (especially when backing up across the LAN) and server bandwidth that may be better used elsewhere. Consolidating a file system backup, by contrast, doesn't consume production system resources (network or application server bandwidth), freeing administrators to create full backups anytime they want without impacting production. Although consolidation does consume the backup server's resources and VDL/tape resources, these are typically not in use during normal business hours. Running consolidated full backups makes it easier to run backups during normal hours, which tends to result in a "good" backup, since its progress will likely be monitored.

Before consolidating a Linux system for backup, users should first determine the VDL size, configuration, and location. How big will the library be? For a tight backup window with several clients backed up simultaneously, the VDL must be big enough to accommodate all the data for every client. If the server has a very large file system, clients may only need enough space in their VDL to handle one backup at a time.
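The two sizing cases can be expressed as a small calculation. This is a minimal sketch: it assumes that, in the one-at-a-time case, each staged backup is copied to tape and its space reclaimed before the next backup begins, so only the largest single backup has to fit. The function name and sample sizes are hypothetical.

```python
def vdl_capacity_gb(client_backup_gb: dict, simultaneous: bool) -> float:
    """Estimate the VDL space needed for a set of client backups.

    simultaneous=True  -> all clients stage at once, so sizes add up.
    simultaneous=False -> one backup at a time, so the largest one must fit
                          (assumes space is reclaimed between backups).
    """
    sizes = list(client_backup_gb.values())
    return sum(sizes) if simultaneous else max(sizes)


# Hypothetical backup sizes in GB.
backups = {"web": 10, "mail": 10, "db": 30}
print(vdl_capacity_gb(backups, simultaneous=True))   # 50
print(vdl_capacity_gb(backups, simultaneous=False))  # 30
```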

Another factor to consider is VDL geometry in terms of drives and slots. The number of simultaneous backup jobs a client needs to run will dictate the number of drives required. The total library size and the maximum file size that the Linux system supports will dictate the number of slots needed. Users should know how much free disk space they have available before attempting to create the virtual library. Finally, there's the decision of where to create the VDL. This is usually done on the software supplier's server. The actual physical library or tape drive must be configured, tested, and ready to use by the client.

The Goal: Simplify Systems Management

The goal of any enterprise-wide storage system is to simplify systems management in heterogeneous environments. VDLs create a backup and restore option for scalable enterprise computing environments, one that allows IT staff to administer both tape-based and disk-based storage from a common GUI for greater efficiency and lower cost. For a freely distributable, multi-platform operating system like Linux, a VDL presents a viable storage management solution.

Data Protection Considerations

Below are some factors to consider when determining appropriate backup policies:

  • Time to recovery objectives
  • Total amount of data to be backed up
  • Backup window (i.e., amount of time in which to complete a backup)
  • Type and speed of network infrastructure (LAN, SAN, WAN)
  • Location of the data (Local/Remote)
  • Type and speed of backup media being used (Disk vs. Tape)
  • Length of time that you must keep data
  • Budget for media
  • Number of data files to be backed up
