Scrapy Cloud is a battle-tested platform for running web crawlers (a.k.a. spiders). Your spiders run in the cloud and scale on demand, from thousands to billions of pages. Think of it as a Heroku for web crawling.
Write your spiders using Scrapy, the most powerful open source web crawling framework.
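As a quick illustration, a minimal Scrapy spider looks something like the sketch below; the site (quotes.toscrape.com) and the CSS selectors are placeholders, not a prescribed target:

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    """A minimal spider: crawl a page and yield one item per quote."""
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Extract each quote block and yield a plain dict as the scraped item.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
```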
Deploy with a single command or the push of a button, with no servers to manage.
Use the API to integrate and build your apps with Scrapinghub data.
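For example, here is a sketch using the python-scrapinghub client; the API key, project ID, and spider name are placeholders you would replace with your own:

```python
from scrapinghub import ScrapinghubClient

# Hypothetical credentials and project ID.
client = ScrapinghubClient("YOUR_API_KEY")
project = client.get_project(123456)

# Look up the most recent finished run of a spider and read its scraped items.
for job_summary in project.jobs.iter(spider="quotes", state="finished", count=1):
    job = client.get_job(job_summary["key"])
    for item in job.items.iter():
        print(item)
```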
Unearth actionable insights: we can filter, normalize, augment, analyze, and aggregate your data.
Watch spiders as they run and scrape data, then compare and annotate the scraped data.
Run heavy jobs with more memory and lighter jobs with more concurrency
Download your data in JSON, CSV, or XML formats.
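As a rough sketch of fetching a job's items over HTTP, something like the snippet below should work; the storage endpoint, job key, and format parameter are assumptions to verify against the current API documentation:

```python
import requests

# Hypothetical job key (<project>/<spider>/<job>) and API key.
JOB_KEY = "123456/1/7"
API_KEY = "YOUR_API_KEY"

# Items endpoint; the same request with format=csv or format=xml is assumed
# to return the other supported export formats.
url = f"https://storage.scrapinghub.com/items/{JOB_KEY}"
resp = requests.get(url, params={"format": "json"}, auth=(API_KEY, ""))
resp.raise_for_status()

items = resp.json()
print(f"Downloaded {len(items)} items")
```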
You can choose to share your data by publishing it as a dataset.
One of Scrapy Cloud's key features is its elastic capacity. You can purchase capacity units (essentially, 1 GB RAM each) when you need to scale up. You can distribute these units intelligently: heavy spiders can be assigned more units while lighter spiders need fewer, so your capacity is matched to the size of each spider.
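As an illustration, assuming the python-scrapinghub client's job-scheduling call accepts a units argument (an assumption worth checking against the client docs), you could give a heavy crawl more capacity than a light one:

```python
from scrapinghub import ScrapinghubClient

# Hypothetical API key, project ID, and spider names.
client = ScrapinghubClient("YOUR_API_KEY")
project = client.get_project(123456)

# A heavy crawl gets 3 capacity units (roughly 3 GB RAM), a light one gets 1.
project.jobs.run("heavy_spider", units=3)
project.jobs.run("light_spider", units=1)
```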