It All Adds Up to Disaster!

I was recently dragged into a corporate disaster situation that was not my own.  Let's just say it was a favor to numerous parties.  The whole exercise reminded me ...

  • Uncontrolled mailbox growth is bad.  When you are a smaller organization that does not typically warrant multiple mail servers, having mailboxes that average 6-8GB is a very bad thing.  If email and mobile messaging are your #1 priority system, you should probably invest some extra in them. 
  • Just "backing it up" is not enough.  Go through the exercise of restoring the data periodically.  Have some documentation available.  Oh, and ensure you keep your backup software and support contracts up to date! 
  • Plan for it.  It is inevitable that something will fail.  If you are lucky, it is something minor or easily repairable.  Have a DR plan or an outline available.  Knowing what to do helps you get a jump on recovery.
  • Communicate.  Do not sugarcoat it.  When failures occur, quickly notify your users and engage any technical support or vendor personnel (equipment and software).  Make sure you have access to a decision maker who can make the "tough calls" when needed.
  • Do not hesitate to ask for help.  Consider engaging an outside consultant with the experience needed.  If you did not spend the money up front for prevention, it is time to spend it on a faster recovery.
  • Keep your options open.  If you can, give yourself several options to get back on your feet and work in parallel.  Do not put all your eggs in one basket only to find out that basket had a hole in the bottom.
  • Fatigue can lead to further disaster.  In small shops, you may be on your feet for 24 hours trying to correct the issue.  Take a break and get a power nap.  Just a couple of hours can go a long way.
  • Have a spare.  Keep an extra server or two around, even if it is an older one.  Newer is not necessarily better if the hardware/software that failed is not compatible with the latest and greatest software/hardware.
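
On the "restore it periodically" point, here is a minimal sketch of what a restore check could look like, assuming you have restored a backup to a scratch directory and still have the original data to compare against.  The function names (file_digest, tree_digests, verify_restore) are made up for illustration and are not part of any backup product; a real mail store would need the vendor's own restore and consistency tools on top of something like this.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Compare a restored tree against the source tree (hypothetical helper).

    Reports files missing from the restore, files present only in the
    restore, and files whose contents differ.
    """
    src = tree_digests(source)
    dst = tree_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "unexpected": sorted(set(dst) - set(src)),
        "corrupt": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

If all three lists come back empty, the restore matches the source byte for byte.  Anything else is worth finding out about on a quiet afternoon rather than in the middle of an outage.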

Preparation and practice can save you time and money as well as speed the recovery.  Failing to invest time and money in fault tolerance and DR will mean longer downtime and possible data loss.