When Hurricane Sandy hit the East Coast, the combination of high winds, rain, and storm surges wreaked havoc on homes and businesses alike. SlashDataCenter paid close attention to New York City, where floods swept the southern tip of Manhattan. With a data center on the Avenue of the Americas, CoreSite Realty escaped the worst the storm had to offer. But was it coincidence or careful planning?
Slashdot sat down with Billie Haggard, CoreSite’s senior vice president of data centers. He’s responsible for the design, construction, maintenance, facilities staffing, uptime, reliability, and energy efficiency of CoreSite’s data centers.
First off, can you explain your role within CoreSite?
We build, maintain and operate all of our data centers, from the construction to the facilities, engineering, and security of all the data centers. So they’re the ones in the field making all the magic happen.
From your website, the only adverse effect that CoreSite suffered from Hurricane Sandy was switching over to generator power on the night of the 29th. You were never down. Is that true, or were there problems we didn’t know about?
We were on generators from the 29th, which was a Monday, through the week until Sunday morning, the 4th. For any facility, you have to anticipate at least 24 hours of operating as an island. And when I say that, I mean everything a data center needs from outside the data center. You’ve heard a lot about fuel. Water can actually be a problem in a data center, for periods longer than 24 hours. Extended utility power. Personnel is a really big deal. And then how the network connects to the data center.
You were in Zone B [where flooding was a possibility]. Was there any flooding at all?
There was a little flooding in the basement, but it didn’t affect any of our equipment. I don’t think it affected any of the tenants in 32 Avenue of the Americas.
What sort of contingency planning do you always have in place? And what sort of specialized plans did you have in place for Sandy, when it came on your radar?
For any incident, we have a business continuity plan as well as a disaster recovery plan. And they’re driven by certain timing, as disasters are. For Sandy, five days out, what that triggers is a meeting where we go down through our checklist. Do we have this? Do we have that? Do we need to deploy people? Do we need to reserve hotel rooms? We go through and check communications and the staffing of our personnel, looking at our individuals and determining whether they have special needs. We coordinate with the people we depend upon, which is our vendors and contractors: things like making sure we have an electrician available, so if we have an extended outage we can run an alternate power source.
Something that gets missed, when we have extended outages like this, is feeding people. So part of the preparation is going out, shopping, and having three days’ worth of food at the site. Spare parts, running through all of our equipment checks, making sure all of our operations are ready to handle an outage.
So specifically for Sandy, it’s five days out, going through the contingency plans, making sure that we’ve gone through the checklist and making sure that everyone knew what they were doing and making sure all the preparations were made.
On the evening of the 29th, when the hurricane made landfall, what happened?
We had three people on site, including one facilities person and one operations person. We didn’t know what the impact was going to be. Initially, we started talking about power, and the lights started flickering. Our facilities personnel made the decision that utility power was not stable, so we proactively transferred our site to generator [power], because you don’t have to have an outage to create problems in the data center. Those momentary flickers take hits on the batteries; they can cause spikes on the electrical grid, grounding problems, and things like that. So we proactively went to generators, prior to Sandy hitting, and we were all stable.
At that point, then, the only concern was the power?
For us, it was just power. On the communications side, all of our sites are remotely monitored from our two operational support centers. You’d normally call them a NOC, but we call them an operational support center. They can monitor video feeds, they can use cameras to see where everybody is, and all of our equipment is monitored remotely. But there’s also our security: we have special radios, and I can use a security radio and talk anywhere in the country, independent of the phone systems. Because we found in the past, with the Northeast utility outage that occurred a few years back, that the biggest problem was everybody lost their cell phones.
On the 29th, how many days of fuel did you have available for your generator?
We had three days. Probably about three and a half, but at least three days.
So at what time did you start worrying that wasn’t going to be enough? Or did you?
Well, I’ll tell you, the best investment I ever made… was that we paid $9,000 at the beginning of the year to have a guaranteed fuel delivery within eight hours. So when everyone started scrambling, trying to find fuel, ours was already paid for. We were at the top of the list.
So eight hours in, we already had fuel trucks running. And every 24 hours, we had fuel delivered, even though we didn’t need it.
And who was that contract with? A local fuel supplier?
It was actually two local fuel suppliers. We had one in Manhattan, and one outside Manhattan. Just in case one couldn’t deliver, we had an alternate. But we had guaranteed fuel for us.
$9,000. That was a great investment.
That was the thing. At the end of the year, we might have said, man, that was $9,000 wasted. But for this event, and all future events, it more than paid for itself.
So you always had three days of fuel on hand.
At least two. Because if you get a truck every 24 hours, you always consume a day of fuel.
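The arithmetic behind that buffer is worth spelling out. A minimal sketch, assuming the figures Haggard gives above (a roughly three-day tank, one day of runtime burned per 24 hours, and a refill delivery every day):

```python
# Illustrative fuel-buffer sketch based on the figures in the interview:
# start with a full ~3-day tank, burn one day's worth every 24 hours,
# and top back up to capacity with a daily delivery. The on-hand level
# dips to two days just before each delivery, never lower.
CAPACITY_DAYS = 3.0   # assumed tank capacity, in days of generator runtime
BURN_PER_DAY = 1.0    # 24 hours of generator operation burns one day of fuel

fuel = CAPACITY_DAYS  # tank starts full before the storm hits
minimum = fuel
for day in range(7):          # roughly a week on generators, as in the outage
    fuel -= BURN_PER_DAY      # a day of consumption
    minimum = min(minimum, fuel)
    fuel = CAPACITY_DAYS      # the daily guaranteed delivery refills the tank

print(minimum)  # 2.0 -> at least two days always on hand
```

The numbers are assumptions for illustration; the point is simply that a full tank plus daily deliveries keeps the floor at capacity minus one day of burn.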
You said you had three days worth of food on-site, and three employees there. So they had cots, and slept on-site?
Yes. We reserved hotel rooms, but talk about lessons learned: what we found is even though we had personnel in hotels, they lost power and water in the hotels. So it was actually more advantageous for the guys to be at the sites sleeping, and they had water and shower capabilities and food.
We also found that our customers hadn’t planned on being unable to find places to eat, and we were actually feeding our customers.
So how much food did you actually have on hand?
We had more than three days, and we actually had food delivered from uptown.
I’m wondering: were they subsisting on trail mix, or something more substantial?
They actually ate pretty well, considering that most people in lower Manhattan didn’t have food, power, water, or anything.
As the hurricane moved through, and past, it sounds like CoreSite’s experience wasn’t that bad. What lessons did you learn from all of this?
One is to ensure that our documentation and our checklists go beyond 24 hours. If you look at the tier ratings, how a data center is classified, Tier 1, 2, 3, or 4, with 4 being the most reliable, all the requirements are based on 12 hours: twelve hours of fuel, twelve hours of water for your cooling systems. We’ve always looked at 24 hours. In preparation for disasters such as Sandy, in the future, we’re going to expand that to three days.
And the other thing is, we had to depend a lot on outside organizations. As I said, the data center becomes an island. And within the building at 32 Ave. of the Americas, the biggest problem was with network people. Even though the data center stayed up, we actually had customers of the building that had lost power and had lost connectivity and so we were sending electricians to reroute power and connectivity on our site so they [customers] could use our power and our connectivity to power their facilities.
The other thing is making sure our customers understand that temporary systems are not good in situations like this. One of our major carriers’ backup plan was to bring in a rollup generator. From what I understand, they paid to have it delivered within four hours, and when they had the generator up, the police confiscated it for emergency use. So their backup generator wasn’t there anymore.
So what was their backup plan for that backup plan?
They went down. They went down until they found another rollup generator.
Did that affect CoreSite?
It affected some of our customers that were connected to the network. Again, since we treat the data center as an island, we have to look at people outside data centers that we depend upon. Transportation was locked down in lower Manhattan for three or four days. If there was an electrical problem or a generator problem, you have to make sure that your people are trained to overcome those obstacles, because help was not coming from outside the datacenter.
Given that you were on the Avenue of the Americas, you weren’t as physically close to the storm surge as some on the southern tip of the island. Still, are you taking any more precautions in the future?
Well, because we’re on the seventh floor and most of our equipment is on the 15th floor, we were not as sensitive as some of our competitors, whose fuel pumps, fuel tanks, electrical systems, and rollup generators were prone to flooding. So we’re not going to have to go back and take additional measures for those. But certainly for future buildouts and expansion projects, we’ll take it into consideration.
You know, Sandy was only a Category 1 [hurricane]. You hear the stories of people who went down. I can’t imagine what it would have been like if it had been a Category 3.
One final question: how much sleep did you get throughout the whole thing?
That’s what we were talking about. You know, my job is to worry. Even though I’m in Denver, all my personnel are the ones on the front lines, enduring the 12-on, 12-off shifts, eating in the facility, and being away from their families. I caught a cold along the way, and I refused to take cold medicine because I did not want to be unalert throughout all this. So I got very little sleep through the first week, until things were stabilized.