Sunday, October 14, 2007

Lessons learnt from recent SharePoint crash

Here's a bit of a retrospective on what we learnt after the recent SharePoint issues:
1. Don't be adventurous with patches; let some other poor schmuck find the problem first. Wait at least 7 days before applying any patch to a system, and make sure that you keep across what other key people in the industry, like Wayne, Vlad and Susan (to name but a few), are saying. These people are likely to find the problem first, or to have others come to them with it.
2. Make sure that you have a backup and that the backup works. We have 2 backups: one is a full server backup using NTBackup and the second is a SharePoint data backup using STSADM. The full backup failed to restore TWICE, so we had to fall back to the SharePoint backup. The problem with this is that it takes time to build a suitable machine onto which you can recover SharePoint (service packs, templates and the like). Since most of our SharePoint stuff is virtualized, we will make sure we have a more up-to-date backup machine ready in future. It is also probably a good idea to take an image of the server regularly using something like ShadowProtect. That way, in the event of a disaster, you can at least roll back to a previous point pretty quickly.
3. Realise that the point at which you need to call Microsoft Product Support Services (PSS) is probably much earlier than you think. If you don't work with this stuff day in, day out, then a call to PSS could save you hours and hours of wasted time. If you don't have access to PSS, then again consult others in the community, since they are usually more than willing to help. Also keep in mind that it may take a while for support staff to actually get their hands on the problem. In our case it took 24 hours, which was frustrating, but in the end we were more than happy with the result.
4. NONE of our SharePoint sites (internal or external) run on SBS. Why? Well, we reckon SBS is the heart of our network and shouldn't be strained with additional load when we can achieve the same result by other means (read: use Virtual PC). If SharePoint had been on our SBS, it would have caused an even bigger disruption. Since it wasn't, we could keep focusing on other things while working on the SharePoint issue independently. We acknowledge that in some places this is not possible, but if you can, host SharePoint somewhere else on the network.
5. Understand that if SharePoint or IIS is down, you can't access ANY of your data! Let me repeat that: ANY OF YOUR DATA. This is a single point of failure, and admittedly a fairly unusual one, but if your business DEPENDS on SharePoint day in, day out, then you need to take steps to reduce downtime, otherwise you'll have plenty of staff twiddling their thumbs.
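For item 2, here is a minimal sketch of scripting the STSADM site-collection backup so it can run on a schedule and leave a dated file each time. It is written in Python purely for illustration. The site URL, backup folder and file naming are hypothetical, and it assumes stsadm.exe (which lives in the 12\BIN folder on a WSS 3.0/MOSS 2007 server) is on the PATH. It is no substitute for actually test-restoring the file onto a spare machine.

```python
import subprocess
from datetime import datetime
from pathlib import PureWindowsPath

# Hypothetical values: substitute your own site collection URL and backup folder.
SITE_URL = "http://sharepoint.example.local"
BACKUP_DIR = PureWindowsPath(r"D:\Backups\SharePoint")


def build_backup_command(site_url: str, backup_dir: PureWindowsPath, when: datetime) -> list[str]:
    """Build an stsadm site-collection backup command with a timestamped file name."""
    # A unique, dated file name per run means older backups are never overwritten.
    filename = backup_dir / f"site-{when:%Y%m%d-%H%M}.bak"
    return ["stsadm", "-o", "backup", "-url", site_url, "-filename", str(filename)]


def run_backup() -> None:
    """Run the backup now; assumes stsadm.exe is reachable via PATH."""
    cmd = build_backup_command(SITE_URL, BACKUP_DIR, datetime.now())
    # check=True raises CalledProcessError if stsadm exits with a non-zero code,
    # so a failed backup doesn't go unnoticed.
    subprocess.run(cmd, check=True)
```

Calling `run_backup()` from a scheduled task would then produce one dated `.bak` file per run, which you can periodically restore onto the standby machine to prove it actually works.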
If we think of more, we'll post them, but hopefully everything is now looking OK.