Configuration
| Parameter | Description |
| --- | --- |
| `appId` | The ID of the Algolia application where the crawler stores its extracted records. |
| `apiKey` | The API key for your targeted application. |
| `indexPrefix` | Prefix added to the names of all indices defined in the crawler's configuration. |
| `rateLimit` | The number of concurrent tasks per second that can run for this configuration. |
| `schedule` | How often a complete crawl should be performed. |
| `startUrls` | The crawler uses these URLs as entry points to start crawling. |
| `sitemaps` | Sitemap URLs. The crawler uses the pages listed in these sitemaps as additional entry points. |
| `ignoreRobotsTxtRules` | When set to `true`, the crawler ignores the rules defined in your `robots.txt` files. |
| `ignoreNoIndex` | Whether the crawler should extract records from a page whose `robots` meta tag contains `noindex`. |
| `ignoreNoFollowTo` | Whether the crawler should follow links marked with the `nofollow` attribute. |
| `ignoreCanonicalTo` | Whether the crawler should extract records from a page that specifies a canonical URL. |
| `extraUrls` | Additional URLs to crawl. The crawler treats these like `startUrls`; use them for pages that aren't reachable from your start URLs. |
| `maxDepth` | Limits the processing of URLs to the specified link depth, inclusive. |
| `maxUrls` | Limits the number of URLs your crawler can process. |
| `saveBackup` | Whether to save a backup of your production index before it's overwritten by the index generated during a crawl. |
| `renderJavaScript` | When `true`, the crawler renders pages with a headless browser before extracting records, so JavaScript-generated content can be crawled. |
| `initialIndexSettings` | Defines the initial settings for the indices that the crawler updates. |
| `exclusionPatterns` | Tells the crawler which URLs to ignore or exclude. |
| `ignoreQueryParams` | Filters out the specified query parameters from crawled URLs, which helps you avoid indexing duplicate URLs. You can use wildcards for pattern matching. |
| `requestOptions` | Modifies the behavior of all requests the crawler makes. |
| `linkExtractor` | Overrides the default logic used to extract URLs from pages. |
| `externalData` | Defines the external data sources to use for this configuration and make available to your extractor functions. |
| `login` | Defines how the crawler acquires a session to access protected content. |
| `safetyChecks` | A configurable collection of safety checks to ensure the crawl was successful. |
| `actions` | Determines which web pages are turned into Algolia records, and how. |
| `discoveryPatterns` | Indicates additional web pages the crawler should visit to discover links. |
| `hostnameAliases` | Defines mappings to replace the given hostnames in crawled URLs. |
| `pathAliases` | Defines mappings to replace a path on a given hostname. |
| `cache` | Turns the crawler's cache on or off. |
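To show how these parameters fit together, here is a minimal configuration sketch. The application ID, API key, URLs, and index name are placeholders, and the `recordExtractor` shown is only one possible extraction strategy, not a prescribed one:

```javascript
// Minimal crawler configuration sketch. All IDs, keys, and URLs are
// placeholders; replace them with your own values.
const config = {
  appId: "YOUR_APP_ID",     // application that stores the extracted records
  apiKey: "YOUR_API_KEY",   // API key for that application
  indexPrefix: "crawler_",  // prepended to every index name defined below
  rateLimit: 8,             // concurrent tasks per second
  startUrls: ["https://www.example.com"], // entry points for the crawl
  actions: [
    {
      // Records land in "crawler_pages" because of indexPrefix.
      indexName: "pages",
      // Only pages matching these patterns are processed by this action.
      pathsToMatch: ["https://www.example.com/**"],
      // Turns each matched page into zero or more Algolia records.
      recordExtractor: ({ url, $ }) => [
        { objectID: url.href, title: $("head title").text() },
      ],
    },
  ],
};
```

Optional parameters such as `schedule`, `renderJavaScript`, or `exclusionPatterns` can be added at the top level of the same object as needed.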