The LandPhil

be honest, be honorable, be kind, be compassionate, and work hard.

January 11, 2011

SharePoint: Disaster Recovery How-To (SharePoint 2010)

Filed under: SharePoint — phil @ 12:39 pm

I haven’t found a good “How To” anywhere for SharePoint disaster recovery. Hopefully, this post will provide at least a little insight into the process. I will document, as thoroughly as I can, the implementation we’ve decided to use and the steps I went through to bring it to fruition.

There are many ways to set up “disaster recovery” for SharePoint, and more than a few of them work at the SQL Server level. Unfortunately, those approaches need a very fast link between your live SQL servers and your DR SQL servers, and we don’t have that link. Instead, we’re using what’s referred to as the hot-standby DR plan, in which you asynchronously mirror every database for which mirroring is supported. That DOES NOT include the SharePoint_Config database, among others. With that in mind, here’s what we did:
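If you want to hand your DBA a concrete list, the split between “mirror this” and “leave this alone” can be sketched in a few lines of Python. This is a minimal illustration, not a support matrix: only SharePoint_Config comes from this post. The Central Administration content database prefix and the sample database names are assumptions, so check Microsoft’s support statement for SharePoint 2010 databases before relying on the list.

```python
# Sketch: decide which SharePoint databases to hand to the DBA for
# asynchronous mirroring. ASSUMPTION: the exclusion rules are illustrative;
# only SharePoint_Config comes from the post.
from typing import Iterable

# Exact database names that must not be mirrored asynchronously.
EXCLUDED_NAMES = {"SharePoint_Config"}

# Hypothetical: default naming of the Central Administration content database.
EXCLUDED_PREFIXES = ("SharePoint_AdminContent",)


def split_for_mirroring(databases: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (mirror, skip), keeping the input order in both lists."""
    mirror: list[str] = []
    skip: list[str] = []
    for name in databases:
        # str.startswith accepts a tuple of prefixes.
        if name in EXCLUDED_NAMES or name.startswith(EXCLUDED_PREFIXES):
            skip.append(name)
        else:
            mirror.append(name)
    return mirror, skip


if __name__ == "__main__":
    # Hypothetical database names for illustration only.
    farm_dbs = [
        "SharePoint_Config",
        "SharePoint_AdminContent_1234",
        "WSS_Content_Intranet",
        "Managed Metadata Service_DB",
    ]
    to_mirror, not_mirrored = split_for_mirroring(farm_dbs)
    print("Mirror:", to_mirror)
    print("Do not mirror:", not_mirrored)
```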

First, we set up our production (and development) environments. Once those were done, I installed SharePoint 2010 on the first DR server and configured Central Administration for the DR environment using the same steps we’d used for development and production. Then I had our SQL DBA configure asynchronous database mirroring between our production and DR SQL servers. (For the SQL admins: that is high-performance mode, not high-safety mode. For the sake of production speed, the principal database doesn’t wait for changes to commit on the mirror database. We also did not configure a witness. Our production and DR sites are in two different data centers, but we have no third data center in which to locate a witness, and we didn’t want the overhead or the problems that come with the witness becoming inaccessible.) This is where I believe I made my first error.
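For reference, here is roughly what the DBA’s side of this looks like. The sketch below generates the T-SQL for one database in asynchronous (high-performance) mode with no witness: the partner is set on the mirror first, then on the principal, and SAFETY OFF makes the session asynchronous. It assumes the mirroring endpoints already exist and that the mirror copy was restored WITH NORECOVERY from a full and a log backup of the principal; the host names and port 5022 are hypothetical.

```python
# Sketch: T-SQL for asynchronous (high-performance) mirroring of one
# database with no witness. ASSUMPTIONS: endpoints already exist and the
# mirror copy was restored WITH NORECOVERY; host names/port are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorScript:
    on_mirror: tuple[str, ...]     # run on the DR (mirror) instance first
    on_principal: tuple[str, ...]  # then run on the production (principal) instance


def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier; a literal ']' is escaped as ']]'."""
    return "[" + name.replace("]", "]]") + "]"


def async_mirror_script(db: str, principal_endpoint: str, mirror_endpoint: str) -> MirrorScript:
    q = quote_name(db)
    on_mirror = (
        # The mirror names the principal as its partner.
        f"ALTER DATABASE {q} SET PARTNER = N'{principal_endpoint}';",
    )
    on_principal = (
        # The principal names the mirror, which starts the session.
        f"ALTER DATABASE {q} SET PARTNER = N'{mirror_endpoint}';",
        # SAFETY OFF = asynchronous: commits don't wait for the mirror.
        f"ALTER DATABASE {q} SET PARTNER SAFETY OFF;",
        # No SET WITNESS statement: this setup deliberately has no witness.
    )
    return MirrorScript(on_mirror, on_principal)


if __name__ == "__main__":
    script = async_mirror_script(
        "WSS_Content_Intranet",
        "TCP://prod-sql.example.com:5022",
        "TCP://dr-sql.example.com:5022",
    )
    for stmt in script.on_mirror:
        print("-- on mirror:", stmt)
    for stmt in script.on_principal:
        print("-- on principal:", stmt)
```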

I then attempted to rebuild the service applications on the DR farm, pointing them at the mirrored databases. For the Business Data Connectivity (BDC) service application, this worked, but with errors, and those errors kept me from getting into the service application’s configuration, so I removed it. I was able to recreate the Managed Metadata Service application the same way, but only by creating it as the same user who originally created it (by default, that user becomes the term store administrator). I ran into the same problems with the Secure Store as I did with the BDC. (Edit: As it turns out, the Managed Metadata service application didn’t work either. It only appeared to work because I had mistyped the database name and inadvertently created a brand-new, unsynchronized Managed Metadata service.)

It then occurred to me that a granular backup of the service applications in the production environment, restored to the DR environment, might work. It took a few failed attempts to actually capture the backups. Don’t write directly to a local drive: for whatever reason, some of the files created there can’t be written to. Instead, share a directory and point the backup at that share. Once I had a backup, though, every attempt to restore it failed with errors saying the databases were already being restored. I suspected this was because the DR copies were mirror databases, which SQL Server keeps in a restoring state, and as luck would have it, I was right.
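I took the backups through Central Administration, but the same granular backup can be scripted with the SharePoint 2010 Backup-SPFarm cmdlet (its -Directory, -BackupMethod, and -Item parameters; Backup-SPFarm -ShowTree lists the exact item names). The Python sketch below only builds the command string, and it refuses anything that isn’t a UNC path so the local-drive mistake can’t happen again. The share and item names are hypothetical. The share typically has to be writable by the SQL Server service account as well as the SharePoint accounts, since SQL Server writes the database portion of the backup.

```python
# Sketch: build a Backup-SPFarm command that always targets a UNC share.
# ASSUMPTIONS: share path and item name are hypothetical; run the output in
# the SharePoint 2010 Management Shell.


def is_unc_path(path: str) -> bool:
    r"""True for paths shaped like \\server\share[\...]."""
    if not path.startswith("\\\\"):
        return False
    parts = [p for p in path[2:].split("\\") if p]
    return len(parts) >= 2  # need at least a server and a share name


def ps_quote(value: str) -> str:
    """PowerShell single-quoted literal; an embedded ' is doubled."""
    return "'" + value.replace("'", "''") + "'"


def backup_spfarm_command(share: str, item: str, method: str = "Full") -> str:
    if not is_unc_path(share):
        # Writing to a local drive is what failed in the post.
        raise ValueError(f"backup directory must be a UNC share, got {share!r}")
    if method not in ("Full", "Differential"):
        raise ValueError(f"unsupported backup method {method!r}")
    return (
        f"Backup-SPFarm -Directory {ps_quote(share)} "
        f"-BackupMethod {method} -Item {ps_quote(item)}"
    )


if __name__ == "__main__":
    print(backup_spfarm_command(r"\\dr-file01\spbackup", "Managed Metadata Service"))
```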

Ultimately, here is how I built the DR site. I deleted all of the mirrored databases, ran a backup of the individual service applications and their proxies from the production Central Administration, and restored those service applications and proxies to the DR site. I backed up the one web application we had configured and restored it the same way. I thought I could do the same with the solutions, but I received errors: you can’t restore a solution to “deployed” status. Instead, the solutions are copied over, and you need to redeploy them to the target web applications.
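Because the restore brings the solution packages over without deploying them, the redeploy step is a good candidate for a script. The sketch below builds Install-SPSolution commands (a SharePoint 2010 cmdlet; -Identity, -WebApplication, and -GACDeployment are its parameters) from a small description of each solution: which web applications it targets and whether it installs assemblies to the GAC. The Solution type, the solution names, and the URL are hypothetical; you would run the generated lines in the SharePoint 2010 Management Shell on the DR farm.

```python
# Sketch: generate Install-SPSolution commands to redeploy solutions after a
# restore. ASSUMPTIONS: the Solution type, names, and URLs are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    wsp: str                        # solution package name, e.g. "Contoso.Intranet.wsp"
    web_apps: tuple[str, ...] = ()  # empty tuple = deploy globally
    gac: bool = False               # package installs assemblies to the GAC


def ps_quote(value: str) -> str:
    """PowerShell single-quoted literal; an embedded ' is doubled."""
    return "'" + value.replace("'", "''") + "'"


def redeploy_commands(solutions: list[Solution]) -> list[str]:
    commands: list[str] = []
    for sol in solutions:
        base = f"Install-SPSolution -Identity {ps_quote(sol.wsp)}"
        if sol.gac:
            base += " -GACDeployment"
        if not sol.web_apps:
            # No web-application-scoped resources: deploy without -WebApplication.
            commands.append(base)
        else:
            # One command per target web application.
            commands.extend(f"{base} -WebApplication {ps_quote(url)}" for url in sol.web_apps)
    return commands


if __name__ == "__main__":
    for cmd in redeploy_commands([
        Solution("Contoso.Intranet.wsp", ("http://intranet",), gac=True),
        Solution("Contoso.Farm.wsp"),
    ]):
        print(cmd)
```

Each Install-SPSolution call schedules a timer job, and SharePoint may refuse a second deployment of the same solution while one is still under way, so wait for each job to finish before running the next line.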

The DR site is up. Things left to do:

  • Document procedures for failing over to the DR site
  • Document procedures for failing BACK over to the production site
  • Test the DR failover procedure

Fun fun fun.
