Crawler: RateLimit
number
8
rateLimit: rate_limit
About this parameter
Number of concurrent tasks per second that can run for this configuration.
A higher number means more crawls per second. This number works with the following formula:
MAX ( urls_added_in_the_last_second, urls_currently_being_processed ) <= rateLimit
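As a minimal sketch, the constraint above can be written as a boolean check. The function name and its arguments are hypothetical and simply mirror the terms in the formula; this is not the crawler's actual scheduling code.

```javascript
// Illustrative check of the documented constraint; not the crawler's real scheduler.
// All names here are hypothetical and mirror the terms used in the formula.
function satisfiesRateLimit(urlsAddedInLastSecond, urlsCurrentlyBeingProcessed, rateLimit) {
  // The larger of the two counters must not exceed rateLimit.
  return Math.max(urlsAddedInLastSecond, urlsCurrentlyBeingProcessed) <= rateLimit;
}

console.log(satisfiesRateLimit(3, 5, 5)); // true
console.log(satisfiesRateLimit(6, 2, 5)); // false
```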
If fetching, processing, and uploading a URL takes less than a second, your crawler processes rateLimit
URLs per second.
However, if each URL takes 4 seconds on average to process, your crawler processes rateLimit / 4
pages per second.
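The two cases above can be combined into a rough throughput estimate. The sketch below assumes a simplified model in which every URL takes the same average time; estimatePagesPerSecond and avgSecondsPerUrl are hypothetical names used only for illustration.

```javascript
// Rough throughput estimate under the simplified model described in the text.
// `estimatePagesPerSecond` and `avgSecondsPerUrl` are hypothetical names.
function estimatePagesPerSecond(rateLimit, avgSecondsPerUrl) {
  if (!(rateLimit > 0) || !(avgSecondsPerUrl > 0)) {
    throw new RangeError("rateLimit and avgSecondsPerUrl must be positive numbers");
  }
  // URLs that finish in under a second are capped by rateLimit itself;
  // slower URLs hold their slot longer, which divides the throughput.
  return rateLimit / Math.max(1, avgSecondsPerUrl);
}

console.log(estimatePagesPerSecond(8, 0.5)); // 8
console.log(estimatePagesPerSecond(8, 4));   // 2
```

Real crawls vary per page, so treat the result as an order-of-magnitude guide when choosing a starting value.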
It’s recommended to start with a low value (for example, 2) and increase it only if you need faster crawling: a high rateLimit
can have a large impact on bandwidth cost and server resource consumption.
Examples
{
  rateLimit: 5,
}