Crawler: Cache
Type: object
Parameter syntax
cache: {
  enabled: true
}

About this parameter

Turn the crawler’s cache on or off.

Turning on the cache can save bandwidth because the crawler only downloads pages that have changed. When cache.enabled is true, the crawler tries to send conditional requests to your website. To do so, it uses the ETag and Last-Modified response headers that your web server returned during the previous reindex, and sends their values in the If-None-Match and If-Modified-Since request headers, respectively.

When your website replies to such a request with a 304 Not Modified response, the crawler reuses the page’s record(s) from your live index instead of downloading and parsing the web page. Because the page hasn’t been modified since the last reindex, its records wouldn’t change either.
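
The header mapping described above can be sketched as follows. This is a minimal illustration of standard HTTP conditional requests, not the crawler's actual implementation: the function names and the shape of the `previous` object (the validators saved from the last crawl of a page) are assumptions made for this example.

```javascript
// Hypothetical helper: build conditional request headers from the
// validators returned by the web server during the previous crawl.
// `previous` is an assumed shape: { etag?: string, lastModified?: string }.
function buildConditionalHeaders(previous) {
  const headers = {};
  if (previous && previous.etag) {
    // The ETag response header value goes into If-None-Match.
    headers['If-None-Match'] = previous.etag;
  }
  if (previous && previous.lastModified) {
    // The Last-Modified response header value goes into If-Modified-Since.
    headers['If-Modified-Since'] = previous.lastModified;
  }
  return headers;
}

// Hypothetical helper: a 304 Not Modified status means the existing
// records can be reused; any other status means the page is processed
// as usual.
function shouldReuseRecords(statusCode) {
  return statusCode === 304;
}
```

If the previous response carried neither header, the sketch returns an empty object, so the request is a regular, unconditional one.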

Usage notes

  • The crawler doesn’t send conditional requests if your configuration has changed since the last reindex.
  • The crawler doesn’t send conditional requests if the external data associated with the page has changed since the last reindex.
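
The two conditions above amount to a guard that runs before any conditional request is sent. The sketch below is only an illustration: this page doesn't describe how the crawler detects these changes, so the `configVersion` and `externalDataVersion` fields and the function name are hypothetical.

```javascript
// Hypothetical guard: decide whether a conditional request may be sent.
// `previous` describes the last reindex and `current` the present one.
// Both use an assumed shape: { configVersion, externalDataVersion }.
function canSendConditionalRequest(previous, current) {
  // No previous reindex: there is nothing to compare against.
  if (!previous) return false;
  // The configuration differs from the last reindex.
  if (previous.configVersion !== current.configVersion) return false;
  // The external data associated with the page has changed.
  if (previous.externalDataVersion !== current.externalDataVersion) return false;
  return true;
}
```

When the guard returns false, the page is requested and parsed in full, because a changed configuration or changed external data could produce different records even from an unmodified page.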

Examples

{
  cache: {
    enabled: true
  }
}

Parameters

Cache

enabled
type: boolean
default: true
Required

Turn the cache on or off.
