Crawlera

Say goodbye to IP bans and proxy management

Crawlera is a smart downloader designed specifically for web crawling and scraping. It allows you to crawl quickly and reliably, managing thousands of proxies internally, so you don’t have to.

Meet Crawlera

What is Crawlera

Crawlera routes requests through a pool of IPs. It throttles access by introducing delays, and it discards IPs from the pool when they get banned from certain domains or have other problems.

Accounts provide a standard HTTP proxy API, so you can configure it in your crawler of choice and start crawling.

How does it work?

Crawlera distributes requests among many internal nodes and uses a proprietary algorithm to throttle the requests each node sends to a site, which minimizes the risk of getting banned. If a node does get banned, Crawlera blacklists it and avoids using it for future requests to that domain.

Banned requests typically return a non-200 response (like 403 or 503), or redirect to a captcha page. These responses are detected by Crawlera and the requests are automatically retried from another (clean) node.
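The detect-and-retry behaviour described above can be illustrated with a small sketch. This is not Crawlera's proprietary algorithm: the `NodePool` class, the `looks_banned` rule, and the `send` callback are hypothetical names invented for illustration, and the ban rule only covers the two status codes and the captcha redirect mentioned in the text.

```python
import random

# Status codes named in the text as typical ban responses (illustrative only).
BAN_STATUSES = {403, 503}


def looks_banned(status, location=None):
    """Return True for a ban status code or a redirect whose target mentions a captcha."""
    if status in BAN_STATUSES:
        return True
    is_redirect = 300 <= status < 400
    return is_redirect and location is not None and "captcha" in location.lower()


class NodePool:
    """Hypothetical pool that remembers which nodes are banned on which domain."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.banned = {}  # domain -> set of nodes banned on that domain

    def clean_nodes(self, domain):
        blocked = self.banned.get(domain, set())
        return [node for node in self.nodes if node not in blocked]

    def mark_banned(self, domain, node):
        self.banned.setdefault(domain, set()).add(node)


def fetch_with_retry(pool, domain, send, max_attempts=3):
    """Retry from a different clean node until a response does not look banned.

    `send(node)` is supplied by the caller and returns (status, location).
    """
    for _ in range(max_attempts):
        candidates = pool.clean_nodes(domain)
        if not candidates:
            break
        node = random.choice(candidates)
        status, location = send(node)
        if not looks_banned(status, location):
            return node, status
        # Blacklist the node for this domain only; it stays usable elsewhere.
        pool.mark_banned(domain, node)
    raise RuntimeError(f"no clean node could fetch {domain}")
```

Because the blacklist is keyed by domain, a node banned on one site remains available for requests to other sites, which matches the per-domain behaviour described in the text.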

You can start right away with an account in our shared pool, or speak to sales to set up a dedicated instance.

Check Pricing for the available plans and the Documentation to start using it.

Example

Using Crawlera can be as easy as running the following command:

curl -U <API_KEY>: -x proxy.crawlera.com:8010 http://crawlera.com


Usage examples

Crawlera plans provide a standard HTTP proxy interface, so you can use Crawlera with any software that supports HTTP proxies. There are some examples below and more available in our Documentation.

cURL

Here is an example using Crawlera with curl to download http://crawlera.com:

curl -U <API_KEY>: -x proxy.crawlera.com:8010 http://crawlera.com
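For readers who call the proxy from Python rather than curl, a roughly equivalent request can be sketched with the standard library alone. The helper names `proxy_auth_header` and `crawlera_opener` are made up for this example. It assumes, as in the curl command, that the API key is the proxy user name and the password is empty.

```python
import base64
import urllib.request

PROXY = "proxy.crawlera.com:8010"  # proxy endpoint taken from the curl example


def proxy_auth_header(api_key):
    """Build the Basic auth value for an API key used as user name with an empty password."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return f"Basic {token}"


def crawlera_opener(api_key, proxy=PROXY):
    """Return a urllib opener that sends plain-HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    # Set the header explicitly rather than relying on credentials
    # embedded in the proxy URL.
    opener.addheaders.append(("Proxy-Authorization", proxy_auth_header(api_key)))
    return opener


# Usage (performs a real network request, so it is left commented out):
# opener = crawlera_opener("<API_KEY>")
# html = opener.open("http://crawlera.com").read()
```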

Command-Line Tools

Several Unix commands (like wget and curl) and applications (including Scrapy) read the http_proxy environment variable to decide which HTTP proxy to use. You can set it before running your command with:

export http_proxy=http://<API_KEY>:@proxy.crawlera.com:8010

Scrapy

To use Crawlera with Scrapy, you can just set the http_proxy environment variable (as explained in the Command-Line Tools section).

There is also middleware provided by scrapy-crawlera that you can use if you need more functionality (like enabling Crawlera only for some specific spiders).

First install scrapy-crawlera:

pip install scrapy-crawlera

Then enable the middleware by adding this to your Scrapy settings:

DOWNLOADER_MIDDLEWARES = {'scrapy_crawlera.CrawleraMiddleware': 600}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'your_apikey'

Pricing

Plan                 C10        C50         C100        C200        Enterprise
Price                $25/month  $100/month  $250/month  $500/month  Custom
Monthly requests     150K       1M          3M          9M          Custom
Concurrent requests  10         50          100         200         Custom
Custom user agent    -          Yes         Yes         Yes         Yes
HTTPS                Yes        Yes         Yes         Yes         Yes
Crawl assistance     -          -           -           -           Yes
Priority support     -          -           -           -           Yes
Residential IPs      -          -           -           -           Available
Request a quote
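Each plan lists a number of concurrent requests, so a client may want to keep its own in-flight requests within that number. The sketch below shows one way to do this with a thread pool. The `PLAN_CONCURRENCY` mapping copies the values from the pricing table, while `fetch_all` and the caller-supplied `fetch` function are hypothetical names. How the service responds when the limit is exceeded is not covered here.

```python
from concurrent.futures import ThreadPoolExecutor

# Concurrent request limits per plan, copied from the pricing table.
PLAN_CONCURRENCY = {"C10": 10, "C50": 50, "C100": 100, "C200": 200}


def fetch_all(urls, fetch, plan="C10"):
    """Run fetch(url) for every URL with at most the plan's limit in flight.

    Results are returned in the same order as `urls`.
    """
    limit = PLAN_CONCURRENCY[plan]
    with ThreadPoolExecutor(max_workers=limit) as executor:
        return list(executor.map(fetch, urls))
```

The pool size acts as the cap: no more than `limit` calls to `fetch` run at the same time, however many URLs are queued.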

Features

  • Instant access to thousands of IPs in our shared pool.
  • IPs from 50+ countries available on request.
  • A ban detection database with over 130 ban types, status codes or captchas.
  • Automatic retrying and throttling to crawl smoothly and prevent bans.
  • HTTP and HTTPS proxy support (with CONNECT).

Feedback and support

In case you need help, check our Documentation and the Crawlera Knowledge Base.

You are also welcome to Contact Us at any time.

Incredibly transparent and stable

Crawlera has solved our problem of making sparse requests from different IP addresses in an incredibly transparent and stable way. Their team is amazing and has been really helpful to us. I definitely recommend this product!

Juan Catalano

CTO - Streema

Allowed us to bypass anti-crawling technology

Scrapinghub allowed us to launch the largest Bitcoin market in the world by scraping millions of items from more than 20 markets. Some of the sites were employing anti-crawling technology but by using Crawlera and crawling from multiple IPs, that problem was solved as well.

Vedran Kajic

Founder - Bspend