Crawler: Cache
object
cache: { enabled: true }
About this parameter
Turn the crawler’s cache on or off.
Turning on the cache can save bandwidth, because the crawler only downloads and parses pages that have changed.
When cache.enabled is true, the crawler tries to perform conditional requests to your website.
For that, the crawler uses the ETag and Last-Modified response headers returned by your web server during the previous reindex. It sends their values in the If-None-Match and If-Modified-Since request headers, respectively.
When your website replies with a 304 Not Modified response to those requests, the crawler reuses the record(s) of your live index instead of downloading and parsing the web page. Because the page hasn’t been modified since the last reindex, its records wouldn’t change either.
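The header exchange described above can be sketched in JavaScript. This is only an illustration of how HTTP conditional requests work, not the crawler's actual implementation: the function names buildConditionalHeaders and shouldReuseRecords, and the shape of the previous object, are assumptions made for this example.

```javascript
// Hypothetical helper (not part of the crawler's API): map the validators
// stored from the previous reindex to conditional request headers.
function buildConditionalHeaders(previous) {
  const headers = {};
  // ETag from the previous response goes into If-None-Match.
  if (previous.etag) {
    headers['If-None-Match'] = previous.etag;
  }
  // Last-Modified from the previous response goes into If-Modified-Since.
  if (previous.lastModified) {
    headers['If-Modified-Since'] = previous.lastModified;
  }
  return headers;
}

// Hypothetical helper: a 304 Not Modified status means the existing
// records can be reused; any other status means the page is processed again.
function shouldReuseRecords(status) {
  return status === 304;
}
```

If the previous response carried neither header, the sketch returns an empty object, so the request is an ordinary, unconditional one.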
Usage notes
- The crawler doesn’t send conditional requests if your configuration has changed since the last reindex.
- The crawler doesn’t send conditional requests if the external data associated with the page has changed since the last reindex.
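
Taken together, the conditions in these notes amount to a simple check. The following JavaScript sketch shows one way to express it; the function name canSendConditionalRequest and the fields cacheEnabled, configChanged, and externalDataChanged are hypothetical and don't correspond to documented crawler settings.

```javascript
// Hypothetical sketch: decide whether a conditional request may be sent.
// The field names are illustrative, not part of the crawler's API.
function canSendConditionalRequest(state) {
  // Caching must be turned on (cache.enabled is true).
  if (!state.cacheEnabled) return false;
  // A configuration change since the last reindex could change the records.
  if (state.configChanged) return false;
  // Changed external data for the page could change the records too.
  if (state.externalDataChanged) return false;
  return true;
}
```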
Examples
{
  cache: {
    enabled: true
  }
}
Parameters
Cache
Parameter | Description |
---|---|
enabled | type: boolean; default: true; Required. Turn the cache on or off. |