Crawler: DiscoveryPatterns

Type: string[]
Parameter syntax
discoveryPatterns: [
  'http://www.example.com/**',
  ...
]

About this parameter

Indicates additional web pages that the Crawler should visit.

When visiting a web page, the Crawler uses micromatch to find links whose URLs match these patterns, and adds every match to the crawling queue. Patterns can use negations, wildcards, and more.

This is useful when intermediate pages link to the pages you want to extract, but you don’t want to extract records from the intermediate pages themselves.

You can think of discoveryPatterns as pathsToMatch without record extraction. The Crawler visits every URL it finds that matches either list, but only runs the recordExtractor on pages whose URL matches pathsToMatch.
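
The relationship between the two lists can be sketched in a few lines of JavaScript. This is an illustrative model only, not the Crawler's implementation: the helper names (`globToRegExp`, `matchesAny`, `classify`) and the example.com URLs are hypothetical, and the tiny glob matcher is a simplified stand-in for micromatch that understands only `*` and `**`.

```javascript
// Simplified stand-in for micromatch (assumption: only `*` and `**` are supported).
// `**` matches any characters, including `/`; `*` matches any characters except `/`.
function globToRegExp(pattern) {
  let source = '';
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '*') {
      if (pattern[i + 1] === '*') {
        source += '.*';
        i++; // consume the second `*`
      } else {
        source += '[^/]*';
      }
    } else {
      // Escape regular-expression metacharacters so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp('^' + source + '$');
}

// True if the URL matches at least one pattern in the list.
function matchesAny(url, patterns) {
  return patterns.some((pattern) => globToRegExp(pattern).test(url));
}

// Hypothetical helper: decide whether a discovered URL is visited,
// and whether the recordExtractor would run on it.
function classify(url, config) {
  const extract = matchesAny(url, config.pathsToMatch);
  const visit = extract || matchesAny(url, config.discoveryPatterns);
  return { visit, extract };
}

// Hypothetical configuration: category pages are only crawled for links,
// product pages are crawled and extracted.
const exampleConfig = {
  discoveryPatterns: ['https://www.example.com/categories/**'],
  pathsToMatch: ['https://www.example.com/products/**'],
};

console.log(classify('https://www.example.com/categories/shoes', exampleConfig));
// { visit: true, extract: false }
console.log(classify('https://www.example.com/products/42', exampleConfig));
// { visit: true, extract: true }
console.log(classify('https://www.example.com/about', exampleConfig));
// { visit: false, extract: false }
```

In this sketch, a URL that matches only discoveryPatterns is queued and visited so that its links can be followed, but no records are extracted from it.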

Examples

{
  discoveryPatterns: ['https://*.algolia.com/**'],
}