Wednesday, July 21, 2010

The PDF Project



Now that I have SharePoint Foundation 2010 running on SBS 2008, my next project is to get it to index Adobe PDFs because, guess what? It doesn’t by default!

Windows SharePoint Services v3 (WSS v3, i.e. the old Companyweb on SBS 2008) wouldn’t either, but you could install an Adobe PDF iFilter, make a few registry changes and get it working (all the details are in my Guide). Interestingly, my hosted SharePoint site also doesn’t index PDFs. When I asked the hosting company about this they said they were looking into it. I suppose that allowing SharePoint to index PDFs is not ‘standard’, but I believe that going without it is a significant drop in functionality.

After doing a bit of reading on the Internet I couldn’t really find anyone who has definitively been able to get PDFs indexed on SharePoint Foundation 2010. Once again this means that I’ll have to nut it out myself.

The first step in the process is to install the 64 bit PDF iFilter from Adobe, because no PDFs will be indexed without it. Again, make sure you install the 64 bit version. To my knowledge there is no 64 bit version of Acrobat Reader, so if you simply install Acrobat Reader on your SBS 2008 server you’ll only get the 32 bit iFilter, not the 64 bit one. Thus the specific need for the 64 bit iFilter.

The next step was to make registry changes similar to those you make to get PDF indexing going on WSS v3. The only thing to be aware of is that the hive is now \14\ not \12\, but the rest of the registry path is the same. So I made these changes, stopped and started the SharePoint Foundation Search V4 service, did a full manual crawl and then searched for terms that only appeared in PDF documents on my SharePoint Foundation 2010 site. No luck.

When I returned to examine one of the registry entries I found that it was missing. Hmmm... I re-did the entry and went through the search restart and crawl process again. Still no good, and again the registry entry was missing! Now that is interesting. It appears that when you restart the SharePoint Foundation Search V4 service it rewrites this registry entry. Ok, so where is it getting that from?
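For anyone wanting to confirm this sort of overwrite rather than eyeballing regedit, one option is to export the key before and after the service restart and compare the two exports. What follows is only a minimal sketch in Python of that idea, not the method used above. The function names are my own, the key path and values in the sample are made-up placeholders (not the real SharePoint path), and the parser only handles simple string values. Real regedit exports are normally UTF-16 text, so you would read them with `open(path, encoding="utf-16")`.

```python
import re

# Matches a key header line such as [HKEY_LOCAL_MACHINE\SOFTWARE\Example]
KEY_LINE = re.compile(r"^\[(.+)\]$")


def parse_reg_export(text):
    """Map each key path in a .reg export to a dict of its simple values."""
    keys = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = KEY_LINE.match(line)
        if match:
            current = match.group(1)
            keys.setdefault(current, {})
        elif current is not None and "=" in line:
            # Lines look like "Name"="Data"; split on the first '='.
            name, _, data = line.partition("=")
            keys[current][name.strip('"')] = data.strip('"')
    return keys


def missing_values(before, after):
    """List (key, value name) pairs present before but absent after."""
    gone = []
    for key, values in before.items():
        for name in values:
            if name not in after.get(key, {}):
                gone.append((key, name))
    return sorted(gone)


# Hypothetical sample data: placeholder key path and values for illustration.
BEFORE = r"""Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Example\ExtensionList]
"1"="doc"
"2"="pdf"
"""

AFTER = r"""Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Example\ExtensionList]
"1"="doc"
"""

for key, name in missing_values(parse_reg_export(BEFORE), parse_reg_export(AFTER)):
    print(f"missing after restart: {key} -> {name}")
```

Anything the comparison reports is a value that was there before the restart and gone afterwards, which is exactly the symptom described above.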

After some more digging it turns out that the entries in the registry actually come from the SharePoint Foundation 2010 search database. So what I did was create an additional entry in this database for the registry entry that I wanted and again restarted all the services. Still no luck, but at least the required registry entry for PDFs was now there after the restart.

My next guess at what was wrong was the specific GUID for the PDF iFilter, which I suspected was now different from what it was in WSS v3. So I took a working WSS v3 installation and searched for all registry entries that matched the WSS v3 PDF GUID. Across these I found a common string, ‘PDF iFilter’. I then searched the registry on the machine with SharePoint Foundation 2010 for the string ‘PDF iFilter’.
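A search like this can also be done against a text export of the registry instead of clicking through regedit’s Find Next. The following is a rough Python sketch of that approach, under a few assumptions: the export is in the usual .reg text format (normally UTF-16, so read it with `open(path, encoding="utf-16")`), the function name is my own, and the GUIDs and key paths in the sample are invented purely for illustration. It collects the GUIDs that appear in the path of any key holding a value that mentions the search string.

```python
import re

# A GUID in registry form, e.g. {11111111-2222-3333-4444-555555555555}
GUID_RE = re.compile(
    r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}"
)


def guids_mentioning(reg_text, needle):
    """Return GUIDs from key paths whose values contain `needle`."""
    found = []
    current_key = None
    for raw in reg_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Key header line: remember the path for the values that follow.
            current_key = line[1:-1]
        elif current_key and needle.lower() in line.lower():
            # A value under this key mentions the search string.
            for guid in GUID_RE.findall(current_key):
                if guid.upper() not in found:
                    found.append(guid.upper())
    return found


# Hypothetical sample export: these GUIDs and values are made up.
SAMPLE = r"""Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\CLSID\{11111111-2222-3333-4444-555555555555}]
@="PDF iFilter"

[HKEY_CLASSES_ROOT\CLSID\{AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}]
@="Some other filter"
"""

print(guids_mentioning(SAMPLE, "PDF iFilter"))
```

The output is only a list of candidates. You would still need to work out which of them is the one the search service actually expects.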

I turned up quite a few GUIDs, but after comparing these to articles I found on the Internet I determined that the correct GUID is in fact {E8978DA6-047F-4E3D-9C78-CDBE46041603}. I inserted that into the registry in the appropriate place, restarted all the search services again and ran a search.

Joy of joys, it works! Now I gotta say that most people probably don’t want to be hacking the SharePoint search database just to get PDFs to index on SharePoint Foundation 2010, but as far as I can see this is really the only option they have. I’m going to keep looking for a better solution, but with the registry keys getting overwritten on each Search service restart it isn’t going to be simple.

So there you have it. You can index PDFs with SharePoint Foundation 2010, but the process is not at all straightforward and is not a supported option. However, for those who really need it to work, it can be done. Full details and a step by step guide of how to do this will be added to my Guide for subscribers.