Crawler: discoveryPatterns
string[]
discoveryPatterns: [ 'http://www.example.com/**', ... ]
About this parameter
Indicates additional web pages that the Crawler should visit.
When visiting a web page, the Crawler matches its links against these patterns using micromatch, and adds all matching URLs to the crawling queue. You can use negations, wildcards, and more.
This is useful when you want to visit pages that link to the pages you want to extract records from, without extracting records from these intermediate pages themselves.
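For instance, a pattern list can combine a broad wildcard with a negation to exclude one section of a site. This is a sketch, and the example.com URLs below are placeholders:

```javascript
discoveryPatterns: [
  'https://www.example.com/**',          // visit every page on the site...
  '!https://www.example.com/search/**',  // ...except search result pages (negation)
],
```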
You can think of discoveryPatterns as pathsToMatch, but without record extraction. The Crawler visits all URLs that match either list, but only runs the recordExtractor on pages whose URL matches pathsToMatch.
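To illustrate how the two lists interact, here is a minimal configuration sketch: category pages are visited only to discover links, while product pages produce records. The URLs, index name, and CSS selector are illustrative placeholders, not part of the original documentation:

```javascript
{
  // Visited for link discovery only; no records are extracted from these pages
  discoveryPatterns: ['https://www.example.com/categories/**'],
  actions: [
    {
      indexName: 'products',
      // Only pages matching this pattern go through the recordExtractor
      pathsToMatch: ['https://www.example.com/products/**'],
      recordExtractor: ({ url, $ }) => [
        { objectID: url.href, title: $('h1').text() },
      ],
    },
  ],
}
```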
Examples
{
discoveryPatterns: ['https://*.algolia.com/**'],
}