Amazon Glacier Demands Planning: Analyst

Amazon’s recently launched Glacier service targets companies with a lot of data to archive in a low-cost, highly durable way. By uploading the contents of tape libraries and other deep-storage formats to the cloud, those organizations can save space and resources. Glacier’s cost structure and slow retrieval speed, however, make it best suited for data that is accessed infrequently, such as years-old legal records and research files.

According to one analyst, Glacier demands that users carefully plan their storage strategy before uploading massive amounts of data to the cloud. “Glacier is not appropriate for backups of active, changeable data because it does not have the RTO (recovery-time objective) and RPO (recovery-point objective) that this data needs,” David Hill, an analyst with The Mesabi Group, wrote in an Aug. 29 research note. “It might be useful for the backup of some active archive instances, but this would have to be carefully thought through.”

Glacier is better suited, he added, for “cold” storage of fixed (i.e., permanent) archive data, which he estimates accounts for 80 percent of total data today. Glacier’s 256-bit AES encryption is another bonus, easing the security concerns that come with moving archival data off-premises. With regard to durability, Amazon claims that Glacier can “sustain the concurrent loss of data in two facilities,” and that stored data undergoes regular integrity checks.

Glacier is similar to Amazon’s Simple Storage Service (Amazon S3) in that both are meant for storing and retrieving data from anywhere with a Web connection. However, S3 was designed with rapid data retrieval in mind; Glacier retrieval requests, by contrast, typically take between three and five hours to complete.

Amazon charges $0.01 per gigabyte per month for Glacier storage, with retrieval of the first 5 percent of the customer’s average monthly storage free each month. Glacier stores data as “archives” within containers it calls “vaults,” with a limit of 40TB per individual archive.
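As a rough illustration of that pricing, the back-of-the-envelope sketch below assumes only the two figures quoted above: the $0.01-per-gigabyte monthly storage rate and the 5 percent free-retrieval allowance (the 50TB archive is a made-up example):

```python
# Back-of-the-envelope Glacier cost sketch, using the rates quoted in
# the article: $0.01 per GB per month for storage, with retrieval of
# the first 5 percent of average monthly storage free.

STORAGE_RATE_PER_GB_MONTH = 0.01  # USD, per the article
FREE_RETRIEVAL_FRACTION = 0.05    # 5 percent of average monthly storage

def monthly_storage_cost(total_gb: float) -> float:
    """Storage cost for one month at the quoted rate."""
    return total_gb * STORAGE_RATE_PER_GB_MONTH

def free_retrieval_allowance_gb(avg_monthly_storage_gb: float) -> float:
    """How much data can be retrieved in a month at no charge."""
    return avg_monthly_storage_gb * FREE_RETRIEVAL_FRACTION

# Hypothetical example: a 50TB (51,200 GB) archive
archive_gb = 50 * 1024
print(monthly_storage_cost(archive_gb))         # 512.0  -> $512/month
print(free_retrieval_allowance_gb(archive_gb))  # 2560.0 GB free to retrieve
```

At those rates, 50TB of cold data costs about $512 a month to keep, and up to 2.5TB of it can be pulled back each month without retrieval charges.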

“Storing one file in an archive would facilitate recovery, but if you had millions of files, that would not be very practical,” Hill wrote. “Plus, retrieving an archive with millions of documents where only a much smaller number of documents are needed would incur extra time and cost.”

In light of that, he added: “If some data are more likely to be retrieved than others, assigning that data to specific archives would be useful.” On a much longer time horizon, users will also need to consider that the software used to read and analyze particular types of data may eventually become obsolete: “For guidance in this area, organizations should study the good work that the eXtensible Access Method (XAM) group at the Storage Networking Industry Association (SNIA) is doing on this subject.”
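Hill’s suggestion of grouping data by retrieval likelihood could be sketched as a simple partitioning step before upload. This is a hypothetical illustration, not part of any Glacier API: the `likely_retrieval` flag and file names are assumptions, standing in for whatever access-frequency signal an organization actually has.

```python
# Hypothetical sketch: partition files into separate archives by how
# likely they are to be retrieved, so a retrieval request pulls back a
# small "likely" archive instead of one huge mixed archive.

def partition_for_archiving(files):
    """files: iterable of (name, likely_retrieval) pairs, where
    likely_retrieval is a bool. Returns two lists of file names
    destined for separate archives."""
    likely, unlikely = [], []
    for name, likely_retrieval in files:
        (likely if likely_retrieval else unlikely).append(name)
    return likely, unlikely

# Made-up example inventory
files = [("q3-report.pdf", True), ("2004-logs.tar", False),
         ("contract-2010.pdf", True), ("raw-telemetry.bin", False)]
likely_archive, cold_archive = partition_for_archiving(files)
print(likely_archive)  # ['q3-report.pdf', 'contract-2010.pdf']
print(cold_archive)    # ['2004-logs.tar', 'raw-telemetry.bin']
```

Uploading each list as its own archive keeps the cost and multi-hour wait of a retrieval proportional to the small set of documents actually needed, rather than to one archive holding millions of files.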

Hill believes that the pressure is on other IT vendors and cloud companies to provide similar deep-archiving solutions: “In fact, unless they move quickly, other public cloud providers may essentially cede this market to AWS, which is great for Amazon’s growth and market ambitions.” (On the private cloud front, companies such as SpiderOak offer solutions for storing encrypted data.)

Glacier may not kill the use of in-house tape as an archiving solution, he concludes, but companies large and small could end up gravitating toward it as a way to store data without infrastructure costs and security concerns.


Image: Volodymyr Goinyk/Shutterstock.com
