Lessons learned in a power outage
Having lived through several complete data center power outages over the years, I’ve learned some important lessons about what a virtualized infrastructure means for recovery. Today I experienced another power outage, and my previous experiences ensured that I was ready for this one, so I thought I would share some tips:
- DNS is the most critical component in your environment. For almost anything to work properly you need a DNS server up first. If all your DNS servers are virtualized and sit on shared storage, it’s going to be very difficult to bring them up, because everything that has to come up before them usually relies on DNS. Make sure you have at least one DNS server on the local storage of one of your hosts, so you can start it early without waiting for your shared storage to come up. If you want to go a step further, keep one as a physical server as well, so you can get DNS up right away.
- Active Directory, DHCP and any other authentication servers are also very important. Having a workstation up and running is very handy, because it lets you connect to your hosts from one central place and power things back on. If your workstations and servers use DHCP, you will want a DHCP server up as soon as possible so they can get on the network. Without an Active Directory server available, Windows servers can take a very long time to boot. So again, keep an AD and DHCP server on local storage or on a physical server so you can bring them up quickly. AD, DNS and DHCP are critical to a Windows environment, and without them you’ll find that the rest of your environment is mostly useless.
- Know your ESX command line. If your vCenter Server and your workstations are not available, you’ll need to start VMs from the command line. Even if your DNS server is a VM on local storage, you can’t start it from the vSphere Client when you have no working workstation to run the client on. You’ll have to log into the ESX console and start it manually, and if you don’t know the command to do this, that could be a problem. Keep a cheat sheet by your hosts with the basic commands you’ll need, like vmware-cmd, to get things up and running from the console.
- Know your host IP addresses. If DNS is not up yet, you won’t be able to connect to your hosts by host name using PuTTY or the vSphere Client. You probably won’t remember their IP addresses, and without a DNS server you can’t look them up. Keep a list of your host IP addresses so you can connect to them directly.
- Know how to rescan storage. Your hosts may come up before your shared storage. Once the shared storage is up, you’ll need to rescan from your hosts so they can see it and you can restart the VMs on it. You can do this in the vSphere Client by clicking on Configuration, then Storage Adapters, selecting your HBA, clicking the Rescan button and then selecting the option to search for new devices. You can also do this from the command line with the esxcfg-rescan utility.
- Make sure you know where your data center keys are if you use an electronic card scanner to open your doors. The card systems are usually housed in the data center, so if the power goes out your doors are not going to work. There is nothing worse than running around trying to find keys in a crisis to get into the data center. And make sure you don’t keep the keys in the data center, or you’ll have to break down the door to get in. (Thanks to Tony DiMaggio for reminding me about this one.)
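The cheat sheet and host IP list are only useful if they exist on paper before the power goes out. As a rough sketch of one way to produce them, the Python script below builds a printable sheet from a small inventory. Everything in the inventory is a made-up example (host names, IP addresses, HBA names, datastore and .vmx paths), and the build_cheat_sheet function is my own illustration, not part of any VMware tool. The only real commands it prints are vmware-cmd and esxcfg-rescan, as run from the ESX service console.

```python
"""Print a power outage recovery cheat sheet from a small inventory."""

# Hypothetical inventory: replace every value with your own environment.
HOSTS = [
    {
        "name": "esx01",
        "ip": "192.0.2.11",
        "hbas": ["vmhba1"],
        # VMs on this host's local storage, listed in start order:
        # DNS first, then AD/DHCP.
        "local_vms": [
            ("DNS", "/vmfs/volumes/esx01-local/dns01/dns01.vmx"),
            ("AD/DHCP", "/vmfs/volumes/esx01-local/dc01/dc01.vmx"),
        ],
    },
    {
        "name": "esx02",
        "ip": "192.0.2.12",
        "hbas": ["vmhba1", "vmhba2"],
        "local_vms": [],
    },
]


def build_cheat_sheet(hosts):
    """Return the cheat sheet as one string, ready to print."""
    lines = ["POWER OUTAGE RECOVERY CHEAT SHEET", ""]

    # Host IPs come first, since nothing resolves by name until DNS is back.
    lines.append("Host IP addresses (connect by IP until DNS is up):")
    for host in hosts:
        lines.append(f"  {host['name']:<10} {host['ip']}")

    # Then the per-host console commands, in the order you would run them.
    for host in hosts:
        lines.append("")
        lines.append(f"On {host['name']} ({host['ip']}), at the ESX console:")
        lines.append("  vmware-cmd -l    # list registered VMs (.vmx paths)")
        for role, vmx in host["local_vms"]:
            lines.append(f"  vmware-cmd {vmx} start    # {role}")
            lines.append(f"  vmware-cmd {vmx} getstate")
        for hba in host["hbas"]:
            lines.append(f"  esxcfg-rescan {hba}    # once shared storage is up")

    return "\n".join(lines)


if __name__ == "__main__":
    print(build_cheat_sheet(HOSTS))
```

Run it on any machine with Python 3, print the output and tape it to the rack. The script never contacts a host, so it is safe to run anywhere, but it is only as accurate as the inventory you give it. Paths containing spaces would need quoting, and if your hosts run ESXi rather than ESX, the local shell typically uses vim-cmd instead of vmware-cmd, so check the commands against your own hosts before you rely on the sheet.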
Being prepared is critical in a crisis situation to ensure you can react quickly and get things back up and running. Sometimes it takes a crisis to point out the shortcomings in your environment, but thinking ahead can save you from big headaches later on.