A Cloud Storage Latency Fix With Minimal Code Change

In my first article on cloud storage latency, I gave a brief review of cloud storage, specifically how Amazon S3 stores and references objects. Now we'll pick up where we left off and illustrate one of many possible solutions to the latency that haunts cloud storage, specifically in the mobile space.

Cloud storage providers that are worth their salt all offer at least two U.S. regions in which to store your objects. Larger providers also have storage facilities in at least Asia-Pacific and Europe. Keep in mind that just because a vendor has multiple sites doesn't mean that your data is replicated between them. Most vendors require you to upload your data to each site where you'd like it to reside, and to name the specific bucket you'd like to retrieve it from. This isn't usually an issue with Web applications, but it can cause serious latency for mobile applications in different parts of the world. Again, there are many ways to solve latency with mobile-to-cloud uploads, and Amazon S3 actually makes it a little easier on both the mobile client side and the server side.

A Scenario

Imagine an Android data collection app that connects via Bluetooth and reads data streams from scientific devices that measure defects in pipes. Once the app has completed the collection and the operator has entered site-specific data, thousands of data points are gathered into a single file to be pushed to the cloud (Amazon S3) in batch style. Having a single connection and transaction is far better than having thousands, or even hundreds, of transactions, as we'll see later. Once the file of data points has been successfully uploaded, the Android device makes a call to the Web service hosted on the Rackspace Cloud Service. At this point, the operator believes that the servers are analyzing the data points and calculating all of the defects in the pipe. Unfortunately, for scans taken outside of the U.S., latency seems to occur inside the Amazon network, which is fairly common between the ingest servers and the final destination servers. This isn't normally an issue, except that the analytic servers “believe” the data already resides at a specific storage location when, in reality, it hasn't yet arrived.
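To make the batch idea concrete, here is a minimal Python sketch of packing many readings into one upload payload. The article doesn't say what the real file format is, so the gzip-compressed JSON layout and the names `pack_readings` and `unpack_readings` are assumptions made purely for illustration.

```python
import gzip
import json


def pack_readings(site_info: dict, readings: list) -> bytes:
    """Bundle site data and all readings into one compressed blob.

    The JSON-plus-gzip layout is an assumption; the point is only that
    thousands of data points travel as a single object.
    """
    payload = {"site": site_info, "readings": readings}
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def unpack_readings(blob: bytes) -> dict:
    """Reverse of pack_readings, as the analysis side might use it."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

With a payload like this, the device performs one upload per scan no matter how many data points the scan produced.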

The Solution

Though there are many ways to handle this scenario, there were constraints on what code could be changed on the server and on the Android device, so we chose the solution that required the fewest code changes. Instead of using just the one West Coast storage bucket on Amazon S3, we created additional buckets for each region around the globe. For each one, we used the same name as the current bucket, but appended a short region name to make sure each bucket name was unique (recall from my first post that Amazon S3 requires ALL bucket names to be unique globally).

Once the buckets were established, the Android upload code was altered to support a drop-down menu that allowed the operator to choose which region to upload the data file to. (This could also have been determined dynamically, but the decision was made to put the option in the hands of the operator.) We also couldn't alter the Web service call or create a different version of the Web service, so we had to alter the retrieval portion of its code to look for the object in all of the Amazon buckets. The service code already did an exponential back-off on download requests for a specific number of retries, so no additional changes were needed there.

Mobile-to-cloud uploads need to make as few round trips as possible, be able to work and gather data offline, and be able to upload once back online. The process worked as designed, and execution was much faster than anticipated.
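The server-side change can be sketched roughly as follows. This is a minimal Python illustration and not the project's actual service code: the base bucket name, the region suffixes, the retry count, and the delays are all hypothetical, and the storage call is passed in as a plain `fetch(bucket, key)` function (assumed to return `None` when the object isn't there) so the example runs without an S3 client.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical names: the article does not give the real bucket or suffixes.
BASE_BUCKET = "pipe-scans"
REGION_SUFFIXES = ["", "-use", "-eu", "-ap"]  # "" stands for the original bucket


def candidate_buckets(base: str = BASE_BUCKET,
                      suffixes: Optional[List[str]] = None) -> List[str]:
    """Build the per-region bucket names: base name plus a short suffix."""
    if suffixes is None:
        suffixes = REGION_SUFFIXES
    return [base + suffix for suffix in suffixes]


def find_object(key: str,
                fetch: Callable[[str, str], Optional[bytes]],
                buckets: List[str],
                max_retries: int = 5,
                base_delay: float = 1.0,
                sleep: Callable[[float], None] = time.sleep,
                ) -> Optional[Tuple[str, bytes]]:
    """Look for `key` in every bucket, backing off between rounds.

    Each round checks all buckets once. If none has the object yet,
    wait base_delay * 2**attempt seconds and try again, up to
    max_retries rounds. Returns (bucket, data), or None if never found.
    """
    for attempt in range(max_retries):
        for bucket in buckets:
            data = fetch(bucket, key)
            if data is not None:
                return bucket, data
        if attempt < max_retries - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

Because the lookup loops over the bucket list inside the existing retry round, the Web service's interface stays the same; only the retrieval step knows that more than one bucket exists.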