Scrapy Cloud is a battle-tested platform for running web crawlers (a.k.a. spiders). Your spiders run in the cloud and scale on demand, from thousands to billions of pages. Think of it as a Heroku for web crawling.
Code your spiders
Write your spiders using Scrapy, the most powerful open source web crawling framework
... or build them visually
With Portia, a point-and-click tool that is also open source and extensible
Deploy them to the cloud
With a single command or the push of a button. No servers to manage
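The single-command flow typically uses shub, Scrapinghub's command-line client. The commands below are a sketch; the project ID is a placeholder, and this fragment assumes you run it from a Scrapy project directory:

```shell
# Hypothetical deployment flow with the shub client.
pip install shub        # install the command-line client
shub login              # authenticate with your API key
shub deploy 12345       # deploy the current project (ID is a placeholder)
```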
Manage your spiders
Manage your spiders from a dashboard. Schedule them to run automatically
Watch them run
Watch spiders as they run and scrape data, compare and annotate the data scraped
Manage resources wisely
Run heavy jobs with more memory and lighter jobs with more concurrency
Download the data
In JSON, CSV or XML format
... or share it with the world
By publishing a dataset
Full API access
To integrate and build your apps with Scrapinghub data
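As a sketch of what API integration might involve, the helper below builds an items-endpoint URL for a finished job; the endpoint path and parameters are assumptions to check against the API documentation:

```python
# Hypothetical sketch of addressing a job's scraped items through the
# storage API. The endpoint layout and query parameter are assumptions.
def items_url(project_id, spider_id, job_id, fmt="json"):
    """Build the storage-API URL for a job's items in the given format."""
    return (
        "https://storage.scrapinghub.com/items/"
        f"{project_id}/{spider_id}/{job_id}?format={fmt}"
    )

# With the `requests` library, fetching might then look like:
#   resp = requests.get(items_url(12345, 1, 7), auth=(API_KEY, ""))
#   items = resp.json()
print(items_url(12345, 1, 7))
```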
Scrapy Cloud is forever free
- unlimited team members
- unlimited projects
- unlimited requests
- 24 hour max job run time
- 1 concurrent crawl
- 7 day data retention
- no credit card required
One of Scrapy Cloud's key features is its elastic capacity. You can purchase capacity units (essentially 1 GB of RAM each) when you need to scale up, and distribute them intelligently: heavy spiders can require more units to run, lighter spiders fewer. This way, capacity is matched to the size of your spiders.
+1 Scrapy Cloud unit
+1 GB of RAM
+1 concurrent crawl
When you purchase any unit, you also get:
- ability to run jobs longer than 24 hours
- 120 days of data retention (vs. 7 days on the free tier)
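The unit arithmetic above can be illustrated with a small sketch; the function name is ours, and it simply restates that each unit contributes 1 GB of RAM and one concurrency slot:

```python
# Illustration of elastic capacity: each purchased unit adds 1 GB of RAM
# and one concurrency slot, so units divide between heavy and light jobs.
def max_concurrent_jobs(total_units, units_per_job):
    """How many jobs of a given size fit into the purchased capacity."""
    return total_units // units_per_job

# With 4 purchased units you could run, for example:
print(max_concurrent_jobs(4, 1))  # 4 light jobs of 1 unit (1 GB) each
print(max_concurrent_jobs(4, 2))  # 2 heavy jobs of 2 units (2 GB) each
```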
Provides a simple way to run your crawls and browse results.
Scrapy is really pleasant to work with. It hides most of the complexity of web crawling, letting you focus on the primary work of data extraction.
Scrapinghub provides a simple way to run your crawls and browse results, which is especially useful for larger projects with multiple developers.
- Jacob Perkins
Does not force vendor lock-in.
I love that Scrapy Cloud does not force vendor lock-in, unlike other scraping and crawling services. The investment in developing the right scraping logic is not stuck in some proprietary format or jailed behind a user-friendly interface. With Scrapy Cloud, scraping logic is standard Python code calling the open-source Scrapy library. You retain the freedom to run that code on your own computers or someone else's servers.
- Castedo Ellerman
- Quantitative Analyst / Developer