Scrapy Cloud

The most powerful platform to run your web crawlers.


Scrapy Cloud is a battle-tested platform for running web crawlers (a.k.a. spiders). Your spiders run in the cloud and scale on demand, from thousands to billions of pages. Think of it as a Heroku for web crawling.

Features

Code your spiders

Write your spiders using Scrapy, the most powerful open-source web crawling framework

...or build them visually

With a point-and-click tool (Portia), which is also open source and extensible

Deploy them to the cloud

With a single command or the push of a button. No servers involved

Manage your spiders

Manage your spiders from a dashboard. Schedule them to run automatically

Watch them run

Watch spiders as they run and scrape data; compare and annotate the scraped data

Manage resources wisely

Run heavy jobs with more memory and lighter jobs with more concurrency

Download the data

In JSON, CSV or XML format

...or share it with the world

By publishing a dataset
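To make the three download formats concrete, here is a minimal Python sketch that renders the same two items as JSON, CSV and XML using only the standard library. The items, their field names, and the `to_json`/`to_csv`/`to_xml` helpers are hypothetical. The XML layout (an `items` root with one `item` element per record) mirrors Scrapy's default XML exporter and is only an assumption about what a Scrapy Cloud export looks like.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical scraped items; real exports contain whatever fields your spider yields.
items = [
    {"title": "Example product", "price": "9.99"},
    {"title": "Another product", "price": "4.50"},
]


def to_json(items):
    """A JSON array with one object per item."""
    return json.dumps(items)


def to_csv(items):
    """A header row built from the first item's keys, then one row per item."""
    if not items:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(items[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


def to_xml(items):
    """<items><item><field>value</field>...</item>...</items> (assumed layout)."""
    root = ET.Element("items")
    for item in items:
        node = ET.SubElement(root, "item")
        for field, value in item.items():
            ET.SubElement(node, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_json(items))
print(to_csv(items))
print(to_xml(items))
```

JSON preserves nested fields as-is, while CSV works best when every item has the same flat set of fields.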

Full API access

To integrate and build your apps with Scrapinghub data


Pricing


Scrapy Cloud is forever free

$0
  • unlimited team members
  • unlimited projects
  • unlimited requests
  • 24-hour max job run time
  • 1 concurrent crawl
  • 7-day data retention
  • no credit card required

Elastic Pricing

One of Scrapy Cloud's key features is its elastic capacity. You can purchase capacity units (essentially 1 GB of RAM each) when you need to scale up, and distribute them intelligently: heavy spiders can be configured to require more units to run and lighter spiders fewer. This way, capacity is matched to the size of each spider.
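The unit arithmetic described above can be sketched in a few lines of Python. This is a hypothetical model, not Scrapy Cloud's actual scheduler: it assumes each running job occupies a whole number of units, that one unit means 1 GB of RAM, and it ignores any free-tier capacity. `Job`, `fits`, `max_parallel` and `memory_gb`, as well as the spider names, are illustrative only.

```python
from dataclasses import dataclass

GB_PER_UNIT = 1  # per the text, one capacity unit is essentially 1 GB of RAM


@dataclass
class Job:
    spider: str
    units: int  # hypothetical: how many units this spider is configured to require


def fits(purchased_units: int, jobs: list[Job]) -> bool:
    """True if all jobs can run at the same time within the purchased units."""
    return sum(job.units for job in jobs) <= purchased_units


def max_parallel(purchased_units: int, units_per_job: int) -> int:
    """How many identical jobs of a given size can run concurrently."""
    return purchased_units // units_per_job


def memory_gb(job: Job) -> int:
    """RAM available to a job, assuming RAM scales linearly with its units."""
    return job.units * GB_PER_UNIT


jobs = [Job("heavy_catalog", 3), Job("light_news", 1)]
print(fits(4, jobs))                            # True: 3 + 1 = 4 units
print(fits(4, jobs + [Job("light_blog", 1)]))   # False: 5 units needed
print(max_parallel(4, 1), max_parallel(4, 2))   # 4 2
print([memory_gb(job) for job in jobs])         # [3, 1]
```

Under these assumptions, 4 purchased units could run one 3-unit spider alongside one 1-unit spider, or four 1-unit spiders at once, but not a 3-unit spider plus two 1-unit spiders.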

Each capacity unit adds:
  • +1 GB of RAM
  • +1 concurrent crawl

When you purchase any unit, you also get:
  • the ability to run jobs longer than 24 hours
  • 120 days of data retention (vs. 7 days in the free tier)
  • personalized support
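The free and paid limits listed on this page reduce to simple date arithmetic. The sketch below encodes only the numbers stated here (a 24-hour maximum job run time and 7-day retention on the free tier, 120-day retention with purchased units). Since no paid runtime limit is given, it treats paid runtime as unbounded, which is an assumption; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Limits as stated on this page; no paid runtime cap is given, so none is modeled.
FREE_MAX_RUNTIME = timedelta(hours=24)
FREE_RETENTION = timedelta(days=7)
PAID_RETENTION = timedelta(days=120)


def retention_expiry(finished_at: datetime, paid: bool) -> datetime:
    """Approximate moment after which a finished job's data is no longer retained."""
    return finished_at + (PAID_RETENTION if paid else FREE_RETENTION)


def runtime_allowed(runtime: timedelta, paid: bool) -> bool:
    """Free jobs may run for at most 24 hours; paid jobs are treated as uncapped (assumption)."""
    return paid or runtime <= FREE_MAX_RUNTIME


finished = datetime(2024, 1, 1)
print(retention_expiry(finished, paid=False))            # 2024-01-08 00:00:00
print(retention_expiry(finished, paid=True))             # 2024-04-30 00:00:00
print(runtime_allowed(timedelta(hours=30), paid=False))  # False
print(runtime_allowed(timedelta(hours=30), paid=True))   # True
```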


Happy Clients


Provides a simple way to run your crawls and browse results.

Scrapy is really pleasant to work with. It hides most of the complexity of web crawling, letting you focus on the primary work of data extraction.

Scrapinghub provides a simple way to run your crawls and browse results, which is especially useful for larger projects with multiple developers.

Does not force vendor lock-in.

I love that Scrapy Cloud does not force vendor lock-in, unlike other scraping and crawling services. Investment in developing the right scraping logic is not stuck in some proprietary format or jailed behind some user-friendly interface. With Scrapy Cloud, scraping logic is standard Python code calling the open-source Scrapy Python library. You retain the freedom to run the scraping Python code on your own computers or someone else’s servers.