Scrapy Cloud

The most powerful platform to run your web crawlers

Scrapy Cloud is a battle-tested platform for running web crawlers (a.k.a. spiders). Your spiders run in the cloud and scale on demand, from thousands to billions of pages. Think of it as a Heroku for web crawling.

Features

Code your spiders

Write your spiders using Scrapy, the most powerful open source web crawling framework

...or build them visually

With a point and click tool (Portia), which is also open source and extensible

Deploy to the cloud

With a single command or the push of a button. No servers involved

Manage your spiders

Unearth actionable insights. We’re able to filter, normalize, augment, analyze, and aggregate your data.

Watch them run

Watch spiders as they run and scrape data, then compare and annotate the scraped data

Manage resources wisely

Run heavy jobs with more memory and lighter jobs with more concurrency

Download the data

Download your data in JSON, CSV, or XML format
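
As a rough illustration of working with such exports, the sketch below parses the same two items from JSON, CSV, and XML text using only Python's standard library. The field names (title, price), the sample payloads, and the XML layout are assumptions made for this example; the actual export layout may differ.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample exports; real downloaded files may be laid out differently.
JSON_EXPORT = '[{"title": "Widget", "price": "9.99"}, {"title": "Gadget", "price": "19.50"}]'
CSV_EXPORT = "title,price\nWidget,9.99\nGadget,19.50\n"
XML_EXPORT = (
    "<items>"
    "<item><title>Widget</title><price>9.99</price></item>"
    "<item><title>Gadget</title><price>19.50</price></item>"
    "</items>"
)


def load_json(text):
    """Parse a JSON array of item objects into a list of dicts."""
    return json.loads(text)


def load_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def load_xml(text):
    """Parse <items><item>...</item></items> into a list of dicts."""
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in item} for item in root.findall("item")]


# All three loaders yield the same list of items for these samples.
items_from_json = load_json(JSON_EXPORT)
items_from_csv = load_csv(CSV_EXPORT)
items_from_xml = load_xml(XML_EXPORT)
```

Note that CSV and XML carry every value as text, so numeric fields such as price would need an explicit conversion after loading.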

...or share with the world

You can choose to share the data by publishing a dataset

Full API access

To integrate and build your apps with Scrapinghub data

Pricing

Scrapy Cloud is forever free

$0
  • Unlimited team members
  • Unlimited projects
  • Unlimited requests
  • 1 concurrent crawl
  • 7 day data retention
  • No credit card required
Sign up for free

Elastic pricing

One of Scrapy Cloud's key features is its elastic capacity. You can purchase capacity units (essentially, 1 GB of RAM each) when you need to scale up. You can distribute these units intelligently by assigning more units to heavy spiders and fewer to lighter ones. This way, capacity is matched to the size of your spiders.

1 Scrapy Cloud Unit
1 GB of RAM + 1 concurrent crawl
$9 Per Month

When you purchase any unit, you also get:

  • Ability to run jobs longer than 24 hours
  • 120 days of data retention (vs 7 days in the free tier)
  • Personalized support
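
The unit arithmetic in this section can be sketched as follows. The price ($9 per unit per month) and the meaning of a unit (roughly 1 GB of RAM plus one concurrent crawl) come from this page; the function names, spider names, and job sizes are hypothetical, and the sketch ignores any billing details not stated here.

```python
PRICE_PER_UNIT_USD = 9  # monthly price of one Scrapy Cloud unit, per this page


def monthly_cost(units):
    """Monthly cost in USD for a number of purchased units."""
    return units * PRICE_PER_UNIT_USD


def units_needed(job_units):
    """Units required to run all the given jobs at the same time.

    job_units maps a (hypothetical) spider name to the units assigned to it.
    """
    return sum(job_units.values())


# Hypothetical workload: one heavy spider and two light ones.
jobs = {"heavy_catalog": 4, "light_news": 1, "light_prices": 1}
total = units_needed(jobs)
print(total, monthly_cost(total))  # prints: 6 54
```

With the same six units, one could instead run six single-unit spiders concurrently, which is the trade-off between memory and concurrency described above.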

Provides a simple way to run your crawls and browse results

Scrapy is really pleasant to work with. It hides most of the complexity of web crawling, letting you focus on the primary work of data extraction. Scrapinghub provides a simple way to run your crawls and browse results, which is especially useful for larger projects with multiple developers.

Jacob Perkins

StreamHacker.com

Does not force vendor lock-in

I love that Scrapy Cloud does not force vendor lock-in, unlike the other scraping and crawling services. Investment developing the right scraping logic is not stuck in some proprietary format or jailed behind some user friendly interface. With Scrapy Cloud scraping logic is in standard Python code calling the open-source Scrapy Python library. You retain the freedom to run the scraping Python code on your own computers or someone else’s servers.

Castedo Ellerman

Quantitative Analyst - Developer

Need web data?

Contact us

Interested in our platform?

Sign up for free