SharePoint Search Service: Regex Crawl Rules for File Types
Task: Configure the SharePoint Search Service so that only the following file types are crawled and shown in search results: documents (.doc, .docx, .pdf), spreadsheets (.xls, .xlsx), and ASPX pages (.aspx).
Instructions:
- First, compose a regex that matches the URLs you want to crawl. In my case, the regex is:
http://<host>/.*(.aspx|.doc(x)?|.xls(x)?|.pdf)

I recommend using the very helpful site https://regex101.com/ for composing and testing your regular expressions.
- Copy your regex, navigate to SharePoint Admin Center → Services, find the search service, and go to Manage → Crawl Rules. Add a new crawl rule of type Include and paste the regex as its path (ATTENTION: remove any backslashes, such as the one in \., from the regex you got in step 1). Also select the Follow complex URLs checkbox!


- Click Save. On the page where you added the new rule, you can also test individual URLs to check whether they are covered by this rule.

- Next, add a global Exclude rule that matches every URL on the host, and place it below the include rules so that it has lower priority (ATTENTION: make sure the include and exclude rules cover all of your content sources). In my case, the exclude rule regex is:
http://<host>/.*
- Save all rules and run a full crawl. Then check the crawled pages in the crawl log.
- PROFIT!
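Before pasting the step 1 regex into SharePoint, its logic can be sanity-checked locally. The sketch below is Python and assumes a hypothetical host, intranet.example.com, in place of <host>; it also assumes the crawl rule has to match the whole URL, which is modeled with re.fullmatch. Python's re module is not the engine SharePoint uses, so treat this as a check of the pattern, not of SharePoint itself.

```python
import re

# Step 1 regex with <host> replaced by a hypothetical host name.
# The dots are left unescaped, exactly as they go into the crawl rule.
INCLUDE = r"http://intranet.example.com/.*(.aspx|.doc(x)?|.xls(x)?|.pdf)"


def is_included(url: str) -> bool:
    # Assumption: the rule must match the whole URL, hence fullmatch.
    return re.fullmatch(INCLUDE, url) is not None


samples = [
    "http://intranet.example.com/SitePages/Home.aspx",
    "http://intranet.example.com/Shared Documents/spec.docx",
    "http://intranet.example.com/Shared Documents/budget.xls",
    "http://intranet.example.com/Shared Documents/manual.pdf",
    "http://intranet.example.com/Images/logo.png",
    "http://intranet.example.com/files/reportpdf",  # no dot before "pdf"
]

for url in samples:
    print(is_included(url), url)
```

The last sample shows a side effect of the unescaped dots: `.` matches any character, so the rule also accepts URLs that merely end in pdf, docx, and so on. Whether that matters depends on the URLs on your site.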
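The note about removing backslashes has a cost worth knowing about. Assuming the regex101 version of the pattern escaped its dots (the escaped form below is a hypothetical reconstruction, with a hypothetical host), stripping the backslashes turns each literal dot into a match-any-character dot. A small Python sketch of that conversion and its effect:

```python
import re

# Hypothetical regex101 version of the step 1 pattern, with the dots escaped.
ESCAPED = r"http://intranet\.example\.com/.*(\.aspx|\.doc(x)?|\.xls(x)?|\.pdf)"


def to_crawl_rule(pattern: str) -> str:
    # Drop every backslash before pasting the pattern into the crawl rule.
    return pattern.replace("\\", "")


CRAWL_RULE = to_crawl_rule(ESCAPED)
print(CRAWL_RULE)
# http://intranet.example.com/.*(.aspx|.doc(x)?|.xls(x)?|.pdf)

# A URL ending in "pdf" without a dot in front of it:
url = "http://intranet.example.com/files/reportpdf"
print(re.fullmatch(ESCAPED, url) is not None)     # False: "\." needs a real dot
print(re.fullmatch(CRAWL_RULE, url) is not None)  # True: "." matches any character
```

In practice the looser pattern is usually harmless for file extensions, but it is worth testing a few near-miss URLs on the Crawl Rules page before running a full crawl.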
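The include/exclude pair only works because of the rule order. Below is a minimal Python model of that, assuming the rules are checked top to bottom and the first matching rule decides, and assuming both rules use the same scheme and the hypothetical host intranet.example.com. The CrawlRule type and the evaluate function are illustrative names, not part of any SharePoint API.

```python
import re
from typing import List, NamedTuple, Optional


class CrawlRule(NamedTuple):
    pattern: str   # regex in crawl-rule form (no backslashes)
    include: bool  # True for an Include rule, False for an Exclude rule


# Hypothetical host; both rules use the same scheme and host.
RULES = [
    CrawlRule(r"http://intranet.example.com/.*(.aspx|.doc(x)?|.xls(x)?|.pdf)", True),
    CrawlRule(r"http://intranet.example.com/.*", False),  # global exclude, listed last
]


def evaluate(url: str, rules: List[CrawlRule]) -> Optional[bool]:
    """Return True (crawl), False (skip), or None if no rule matches."""
    # Assumption: rules are checked top to bottom and the first match decides.
    for rule in rules:
        if re.fullmatch(rule.pattern, url):
            return rule.include
    return None


for url in [
    "http://intranet.example.com/SitePages/Home.aspx",
    "http://intranet.example.com/Images/logo.png",
    "http://otherhost.example.com/page.aspx",
]:
    print(evaluate(url, RULES), url)

# With the exclude rule moved to the top, the .aspx page is skipped too.
print(evaluate("http://intranet.example.com/SitePages/Home.aspx", RULES[::-1]))
```

Reversing the list (the last line) excludes the .aspx page as well, which is why the global exclude rule has to sit below the include rules.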