Crawler configuration parameters

appId

The ID of the Algolia application where the crawler stores the records it extracts.

apiKey

API key for your targeted application.

indexPrefix

Prefix added to the names of all indices defined in the crawler’s configuration.

rateLimit

Number of concurrent tasks per second that can run for this configuration.

schedule

How often a complete crawl should be performed.
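
Taken together, these top-level settings might look like the following sketch of a configuration object. All values are placeholders, and the natural-language schedule string is only an assumed example format.

```ts
// Sketch of the top-level settings described above; every value is a placeholder.
const config = {
  appId: 'YOUR_APP_ID',    // application that receives the crawler's records
  apiKey: 'YOUR_API_KEY',  // API key for that application
  indexPrefix: 'crawler_', // prepended to every index name in this configuration
  rateLimit: 8,            // concurrent tasks per second for this configuration
  schedule: 'every 1 day', // how often a complete crawl runs (format assumed)
};
```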

startUrls

The crawler uses these URLs as entry points to start crawling.

sitemaps

URLs found in sitemaps are treated as startUrls for the crawler: they are used as starting points for the crawl.
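
As a sketch, the crawl entry points might be declared as follows; the URLs are placeholders.

```ts
const entryPoints = {
  startUrls: ['https://www.example.com/'],           // explicit entry points for the crawl
  sitemaps: ['https://www.example.com/sitemap.xml'], // URLs found in these sitemaps become startUrls
};
```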

ignoreRobotsTxtRules

When set to true, the crawler will ignore rules set in your robots.txt.

ignoreNoIndex

Whether the Crawler should extract records from a page whose robots meta tag contains noindex or none.

ignoreNoFollowTo

Whether the Crawler should follow links with the rel="nofollow" attribute and extract links from a page whose robots meta tag contains nofollow or none.

ignoreCanonicalTo

Whether the Crawler should extract records from a page that has a canonical URL specified.
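
The four robots-related switches are shown here as booleans in a small sketch; whether some of them also accept URL patterns is not covered here.

```ts
const robotsBehavior = {
  ignoreRobotsTxtRules: false, // false: respect the rules in robots.txt
  ignoreNoIndex: false,        // false: skip record extraction on pages marked noindex/none
  ignoreNoFollowTo: false,     // false: don't follow rel="nofollow" links
  ignoreCanonicalTo: false,    // false: skip record extraction on pages with a canonical URL
};
```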

extraUrls

URLs listed in extraUrls are treated as additional startUrls: they are used as extra starting points for the crawl.

maxDepth

Limits the processing of URLs to the specified depth, inclusive.

maxUrls

Limits the number of URLs your crawler can process.

saveBackup

Whether to save a backup of your production index before it is overwritten by the index generated during a crawl.

renderJavaScript

When true, all web pages are rendered with a headless Chrome browser, and the crawler uses the rendered HTML.
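
A sketch of the crawl-scope and rendering settings above; the numbers are arbitrary examples.

```ts
const crawlScope = {
  maxDepth: 5,             // process URLs up to this depth, inclusive
  maxUrls: 100000,         // upper bound on the number of URLs the crawler processes
  saveBackup: true,        // back up the production index before it is overwritten
  renderJavaScript: false, // true would render every page with headless Chrome first
};
```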

initialIndexSettings

Defines the settings for the indices that the crawler updates.
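
A sketch of initialIndexSettings, assuming the object is keyed by index name and that the values are regular Algolia index settings; the attribute names are illustrative.

```ts
const initialIndexSettings = {
  docs: {
    // Assumed: standard Algolia index settings applied when the index is created.
    searchableAttributes: ['title', 'content'],
    customRanking: ['desc(popularity)'],
  },
};
```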

exclusionPatterns

Tells the crawler which URLs to ignore or exclude.

ignoreQueryParams

Filters out specified query parameters from crawled URLs. This can help you avoid indexing duplicate URLs. You can use wildcards to pattern match.
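
Both settings take lists of patterns. The sketch below assumes glob-style wildcards for exclusionPatterns; for ignoreQueryParams, wildcard matching is stated above.

```ts
const urlFiltering = {
  exclusionPatterns: ['https://www.example.com/admin/**'], // URLs to ignore (glob syntax assumed)
  ignoreQueryParams: ['utm_*', 'ref'],                     // query parameters stripped from crawled URLs
};
```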

requestOptions

Modifies the behavior of all requests made by the crawler.

linkExtractor

Override the default logic used to extract URLs from pages.
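
As an illustrative sketch only: a custom linkExtractor might post-filter the URLs produced by the default logic. The ({ defaultExtractor }) signature is an assumption, not something documented on this page.

```ts
// Hypothetical signature: receives a defaultExtractor helper returning the default URL list.
const linkExtractor = ({ defaultExtractor }: { defaultExtractor: () => string[] }) => {
  // Keep only links that point into the /docs/ section.
  return defaultExtractor().filter((link) => link.includes('/docs/'));
};
```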

externalData

Defines the list of external data sources you want to use for this configuration and makes them available to your extractor function.

login

This property defines how the crawler acquires a session to access protected content.

safetyChecks

A configurable collection of safety checks to make sure the crawl was successful.
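
A sketch of one such check, limiting how many records may disappear between crawls before the new index is published; the option names here are assumptions.

```ts
const safetyChecks = {
  beforeIndexPublishing: {
    maxLostRecordsPercentage: 10, // assumed option: block publishing if more than 10% of records vanish
  },
};
```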

actions

Determines which web pages are translated into Algolia records and in what way.
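
An action pairs URL patterns with a target index and a record-extraction function. The sketch below assumes an action shape with indexName, pathsToMatch, and a recordExtractor receiving the page URL and a Cheerio-like $ helper; it is illustrative rather than a drop-in configuration.

```ts
const actions = [
  {
    indexName: 'docs',                                 // target index; indexPrefix is prepended to it
    pathsToMatch: ['https://www.example.com/docs/**'], // pages this action applies to (glob syntax assumed)
    recordExtractor: ({ url, $ }: { url: URL; $: any }) => [
      {
        objectID: url.href,       // every Algolia record needs an objectID
        title: $('title').text(), // example attributes pulled from the page
        content: $('main').text(),
      },
    ],
  },
];
```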

discoveryPatterns

Indicates additional web pages that the Crawler should visit.
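
A sketch, assuming the same glob-style URL patterns as elsewhere in the configuration:

```ts
const discoveryPatterns = ['https://www.example.com/categories/**']; // additional pages for the crawler to visit
```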

hostnameAliases

Defines mappings to replace given hostname(s).

pathAliases

Defines mappings to replace a given path in URLs for a given hostname.
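
A sketch of both alias maps; nesting pathAliases by hostname is an assumption.

```ts
const aliases = {
  hostnameAliases: {
    'dev.example.com': 'www.example.com', // replace the staging hostname with the production one
  },
  pathAliases: {
    'www.example.com': { '/old-docs': '/docs' }, // assumed shape: per-hostname path replacements
  },
};
```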

cache

Turns the crawler's cache on or off.
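
A sketch, assuming the cache is toggled with an enabled flag:

```ts
const cache = { enabled: true }; // assumed shape: keep the crawler's cache turned on
```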
