Our current search does not seem to include older Office versions (e.g. ppt) in SharePoint 2010 Search results. Why did we limit this?
When I received this problem statement, I started by checking the following:
- Is there a problem in crawl / any restrictions configured to specific file types – Test passed
- Is the Browser Locale setting in the Search Result Core web part trimming the results to a specific location – Test passed
- Is the Remove Duplicate Results setting in the Search Result Core web part causing any issue – Test passed
- Compared the file sizes of the files that appear in the search results with those of the files that do not – Test partially failed: the missing files are always bigger than the files that appear in the search results.
I stopped here and started looking for a limit in the crawler settings on the size of files whose contents are crawled. I found that, by default, SharePoint only crawls the contents of files smaller than 16 MB.
So when SharePoint crawls files from a document library or list and a file exceeds this 16 MB limit, it crawls only the basic metadata associated with that list or library item, such as Title, Created By and Modified By. The contents inside the file are not crawled.
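To confirm which documents are affected, the files above the limit can be listed from the SharePoint 2010 Management Shell. This is only a rough sketch: the site URL and the library name are hypothetical placeholders, and it assumes the default 16 MB limit is still in place.

```powershell
# Sketch only: the site URL and library name below are hypothetical placeholders.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$limitBytes = 16MB                                # default crawl limit discussed above
$web  = Get-SPWeb "http://server/sites/example"   # hypothetical site URL
$list = $web.Lists["Shared Documents"]            # hypothetical library name

try {
    # SPFile.Length is the file size in bytes
    $list.Items |
        Where-Object { $_.File -ne $null -and $_.File.Length -gt $limitBytes } |
        ForEach-Object {
            "{0}  ({1:N1} MB)" -f $_.File.ServerRelativeUrl, ($_.File.Length / 1MB)
        }
}
finally {
    $web.Dispose()   # release the SPWeb object
}
```

Any file this prints is one whose contents would be skipped under the 16 MB limit, leaving only its metadata searchable.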
To increase this limit, we can run the following PowerShell script. A higher limit increases crawl time, so analyze the environment before making this change.
$ssa = Get-SPEnterpriseSearchServiceApplication
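The line above only retrieves the search service application. A minimal sketch of the remaining steps follows, assuming the limit is exposed as the `MaxDownloadSize` property (value in MB), that the farm has a single search service application, and that the search service is named `OSearch14` as on a standard SharePoint 2010 server. The value 64 is purely illustrative and should be chosen for your environment.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Same call as above; assumes a single search service application in the farm
$ssa = Get-SPEnterpriseSearchServiceApplication

# Show the current limit in MB (assumed property name: MaxDownloadSize)
$ssa.GetProperty("MaxDownloadSize")

# Raise the limit; 64 is an illustrative value, not a recommendation
$ssa.SetProperty("MaxDownloadSize", 64)
$ssa.Update()

# Restart the SharePoint Server Search service so the new value is picked up
Restart-Service OSearch14
```

Files that were previously over the limit will most likely need a full crawl before their contents show up in the search results.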
We can also set this limit for a specific file type as shown below: