Crawlera

Say goodbye to IP bans and proxy management


Meet Crawlera, a service of the Scrapinghub Platform


What is Crawlera?

Crawlera is a smart downloader designed specifically for web crawling and scraping. It allows you to crawl quickly and reliably, managing thousands of proxies internally, so you don’t have to.

Crawlera routes requests through a pool of IPs, throttling access by introducing delays and discarding IPs from the pool when they get banned from certain domains or run into other problems.

Accounts provide a standard HTTP proxy API, so you can configure it in your crawler of choice and start crawling.
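
For instance, here is a minimal sketch of the same idea in Python using the requests library (our choice of client; any HTTP client with proxy support works). Replace <API_KEY> with the key from your account:

    import requests

    # The Crawlera API key is sent as the proxy username; the password is empty.
    proxy = "http://<API_KEY>:@proxy.crawlera.com:8010"

    response = requests.get("http://crawlera.com",
                            proxies={"http": proxy, "https": proxy})
    print(response.status_code)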

How does it work?

Crawlera distributes requests among many internal nodes, using a proprietary algorithm that minimizes the risk of getting banned by throttling the requests each node sends to a given site. If, for whatever reason, a node does get banned, Crawlera blacklists it and avoids using it for future requests to that domain.

Banned requests typically return a non-200 response (like 403 or 503), or redirect to a captcha page. These responses are detected by Crawlera and the requests are automatically retried from another (clean) node.
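
As a rough illustration of that loop (not Crawlera's actual implementation, which is proprietary; the node pool and downloader here are hypothetical stand-ins), the ban-and-retry logic looks like this:

    from urllib.parse import urlparse

    BAN_STATUSES = {403, 503}  # typical ban responses, as noted above
    banned = {}                # domain -> node ids blacklisted for that domain

    def fetch_via_pool(url, nodes, fetch):
        """Try nodes until one returns a clean response for this domain.

        `nodes` is a list of hypothetical node ids; `fetch(node, url)` is a
        hypothetical downloader returning an HTTP status code. Real ban
        detection also catches captcha redirects, not just status codes.
        """
        domain = urlparse(url).netloc
        for node in nodes:
            if node in banned.get(domain, set()):
                continue                                    # skip blacklisted nodes
            status = fetch(node, url)
            if status in BAN_STATUSES:
                banned.setdefault(domain, set()).add(node)  # blacklist this node
                continue                                    # retry from a clean one
            return node, status
        raise RuntimeError("no clean nodes left for " + domain)

    # Demo: a stub downloader where the first node is "banned".
    print(fetch_via_pool("http://example.com/", ["node-1", "node-2"],
                         lambda node, url: 503 if node == "node-1" else 200))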

You can start right away with an account in our shared pool, or speak to sales to set up a dedicated instance.

Check Pricing for the available plans and the Documentation to start using it.


Usage Examples

Crawlera plans provide a standard HTTP proxy interface, so you can use Crawlera with any software that supports HTTP proxies. There are some examples below, and more are available in our Documentation.

cURL

Here is an example using Crawlera with curl to download http://crawlera.com:


        curl -U <API_KEY>: -x proxy.crawlera.com:8010 http://crawlera.com
        

Command-Line Tools

Several Unix commands (like wget and curl) and applications (including Scrapy) support the http_proxy environment variable for configuring which HTTP proxy to use. You can set it before running your command:


        export http_proxy=http://<API_KEY>:@proxy.crawlera.com:8010
        
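
Once the variable is set, any tool that honors http_proxy (wget, curl, and others) will route its requests through Crawlera with no further configuration.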

Scrapy

To use Crawlera with Scrapy you can simply set the http_proxy environment variable (as explained in the Command-Line Tools section).

There is also middleware provided by scrapy-crawlera that you can use if you need more functionality (like enabling Crawlera only for some specific spiders).

First install scrapy-crawlera:

        pip install scrapy-crawlera
        

Then enable the middleware by adding this to your Scrapy settings:


        DOWNLOADER_MIDDLEWARES = {'scrapy_crawlera.CrawleraMiddleware': 600}
        CRAWLERA_ENABLED = True
        CRAWLERA_APIKEY = 'your_apikey'
        
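
For instance, to enable Crawlera for a single spider only, you can put those same settings in Scrapy's standard custom_settings attribute. A minimal sketch (the spider name, URL, and API key are placeholders):

        import scrapy

        class ExampleSpider(scrapy.Spider):
            name = "example"
            start_urls = ["http://crawlera.com"]

            # Per-spider settings override the project settings, so the
            # middleware is active for this spider only.
            custom_settings = {
                "DOWNLOADER_MIDDLEWARES": {
                    "scrapy_crawlera.CrawleraMiddleware": 600,
                },
                "CRAWLERA_ENABLED": True,
                "CRAWLERA_APIKEY": "your_apikey",
            }

            def parse(self, response):
                self.logger.info("Fetched %s (status %d)",
                                 response.url, response.status)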

Pricing


 
C10
  • Price: $25/month
  • Monthly requests: 150K
  • Concurrent requests: 10
  • Custom User agent: No
  • HTTPS: Yes
  • Crawl Assistance: No

C50
  • Price: $100/month
  • Monthly requests: 1M
  • Concurrent requests: 50
  • Custom User agent: Yes
  • HTTPS: Yes
  • Crawl Assistance: No

C100
  • Price: $250/month
  • Monthly requests: 3M
  • Concurrent requests: 100
  • Custom User agent: Yes
  • HTTPS: Yes
  • Crawl Assistance: No

C200
  • Price: $500/month
  • Monthly requests: 9M
  • Concurrent requests: 200
  • Custom User agent: Yes
  • HTTPS: Yes
  • Crawl Assistance: No

Enterprise
  • Price: Contact Sales for custom settings
  • Monthly requests: Contact Sales for custom settings
  • Concurrent requests: Contact Sales for custom settings
  • Custom User agent: Yes
  • HTTPS: Yes
  • Crawl Assistance: Yes

Features

  • Instant access to thousands of IPs in our shared pool.
  • IPs from 50+ countries available on request.
  • A ban detection database with over 130 ban types, status codes or captchas.
  • Automatic retrying and throttling to crawl smoothly and prevent bans.
  • HTTP and HTTPS proxy support (with CONNECT).

Feedback and support

If you need help, check our Documentation and the Crawlera Knowledge Base.
You're also welcome to Contact Us at any time.

Happy Clients


Incredibly transparent and stable.

Crawlera has solved our problem of making sparse requests from different IP addresses in an incredibly transparent and stable way. Their team is amazing and has been really helpful to us. I definitely recommend this product!

Allowed us to bypass anti-crawling technology.

Scrapinghub allowed us to launch the largest Bitcoin market in the world by scraping millions of items from more than 20 markets. Some of the sites were employing anti-crawling technology but by using Crawlera and crawling from multiple IPs, that problem was solved as well.

  Vedran Kajic, Founder, Bspend