After migrating one of my clients from SharePoint 2007 to SharePoint 2010, I ran into a funny search crawl error. My client had a webpart in their SharePoint 2007 environment that displayed the recently changed documents on the site. Everything worked just fine in SharePoint 2007, and the migration to SharePoint 2010 ran successfully as well.
Once we configured a search crawl to index the content, we found several errors in the crawl logs telling us that the site couldn’t be downloaded and therefore couldn’t be crawled.
After we made sure that the crawl account had appropriate permissions on the web application, I used a little move I pull from time to time to troubleshoot search crawl issues: I logged in as the crawl account. We found that when the crawl account accesses the site, a threshold error is thrown, telling us that the webpart is exceeding the allowed threshold of 5000 items (the default setting for the web application).
You can adjust the threshold in the web application’s general settings in Central Administration. I would not recommend it, though, because you open the door to very severe performance issues within your environment. Rather, I would recommend rethinking the way you query your data from SharePoint so that it is more efficient.
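To give an idea of what a more threshold-friendly query might look like, here is a minimal sketch in Python that only builds the CAML view XML for a "recently changed documents" query with a small row limit. The function name `build_recent_docs_view` and its parameters are my own for illustration; this is not the code of the original webpart. It also assumes that the `Modified` column is indexed on the list, because as far as I know a row limit alone may not keep a sort on an unindexed column under the threshold.

```python
import xml.etree.ElementTree as ET

# Default list view threshold mentioned above (web application setting).
LIST_VIEW_THRESHOLD = 5000


def build_recent_docs_view(row_limit=10, sort_field="Modified"):
    """Return CAML view XML asking only for the newest `row_limit` items.

    Hypothetical helper: `sort_field` is assumed to be an indexed column.
    """
    if not 0 < row_limit < LIST_VIEW_THRESHOLD:
        raise ValueError("row_limit must be between 1 and the threshold")

    view = ET.Element("View")
    query = ET.SubElement(view, "Query")
    order_by = ET.SubElement(query, "OrderBy")
    # Newest items first, so the row limit keeps only recent changes.
    ET.SubElement(order_by, "FieldRef", Name=sort_field, Ascending="FALSE")
    ET.SubElement(view, "RowLimit").text = str(row_limit)
    return ET.tostring(view, encoding="unicode")


if __name__ == "__main__":
    print(build_recent_docs_view(row_limit=10))
```

The resulting XML is what you would hand to a list query instead of pulling every item and sorting in memory. The point is simply that the webpart asks for ten items, not for the whole list.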
My client had been using his own account, which has administrative privileges, to verify that everything was working as expected after the migration, but the threshold is not applied to that account. The crawl account only has read permissions on the content and therefore wasn’t able to crawl it because of the threshold.
After removing the webpart from the site(s), crawling worked just fine. The webpart will need to be replaced by one that is a little more aware of the threshold in SharePoint 2010.