Frequently Asked Questions

What is Scrapinghub?

Scrapinghub is a company that provides web crawling and scraping services, including Scrapy Cloud and scraping consultancy.

What is Scrapy?

Scrapy is a popular screen scraping and web crawling framework for Python. It provides an environment for writing custom scrapers, called "spiders", for any site, and lets you group multiple spiders into a project so they can be run together.

What is Scrapy Cloud?

Scrapy Cloud is a platform for running and managing Scrapy projects. Projects can be deployed using Scrapy and managed through the API or the Web Panel.

How much does it cost?

See the Pricing page.

What is a spider?

In the Scrapy screen scraping and web crawling framework, a spider is the code that defines how to crawl and parse the pages of a particular website.

In other systems these are sometimes referred to as wrappers, agents or scrapers.

Do you write custom spiders?

Yes, see Professional Services and the Professional Services FAQ.

Do I own the code of my spiders?

Yes. The code of any Scrapy spiders we write for you is always owned by you, the customer. If you upload your own spiders to Scrapy Cloud, the code is already yours, and we will never share it without your permission.

Do I own the data I scrape?

Scrapinghub does not claim ownership of the data you scrape. You control how it is used and who may access it.

Will you share or resell my data?

No, we will never share your data without your permission. Customer data is considered proprietary and confidential.

Does Scrapy Cloud have an API?

Yes. Scrapy Cloud has an API for scheduling scraping jobs (i.e. runs of a spider), checking which jobs have run, and retrieving the scraped data. See the API documentation for details.
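For a rough idea of what scheduling a job involves, the sketch below builds (but does not send) an HTTP request using only Python's standard library. The endpoint `https://app.scrapinghub.com/api/run.json`, the `project` and `spider` form fields, and sending the API key as the HTTP Basic Auth username are assumptions based on the legacy Scrapy Cloud API; check the API documentation for the current endpoints and parameters. The key, project ID and spider name are placeholders.

```python
import base64
import urllib.parse
import urllib.request

# Assumed endpoint for scheduling a job; verify against the API documentation.
RUN_URL = "https://app.scrapinghub.com/api/run.json"


def build_schedule_request(api_key: str, project_id: int, spider: str) -> urllib.request.Request:
    """Build a POST request asking Scrapy Cloud to run `spider` in project `project_id`."""
    form = urllib.parse.urlencode({"project": project_id, "spider": spider}).encode("ascii")
    # Assumption: the API key is the Basic Auth username, with an empty password.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return urllib.request.Request(
        RUN_URL,
        data=form,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_schedule_request("YOUR_API_KEY", 12345, "quotes")
    # To actually send it: urllib.request.urlopen(req)
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect or log what would be sent before pointing it at the live service.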

What format do you support for retrieving the scraped data?

Scraped data can be retrieved in JSON format.
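Because the items come back as JSON, any standard JSON parser can load them. A minimal sketch with Python's `json` module, assuming the response body is a JSON array of item objects; the `text` and `author` field names are hypothetical:

```python
import json
from typing import Any, Dict, List


def load_items(body: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of scraped items into a list of dicts."""
    items = json.loads(body)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


# Hypothetical response body containing two scraped items.
sample = '[{"text": "Hello", "author": "Ann"}, {"text": "Hi", "author": "Bob"}]'
authors = [item["author"] for item in load_items(sample)]
print(authors)
```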

Can I download Scrapy Cloud?

Scrapy Cloud operates as a hosted online service, so there is nothing to download. All Scrapy Cloud operations can be controlled through the web-based panel, and you can connect your own applications and systems to Scrapy Cloud using its web service APIs.

If the hosted online service has limitations that make it unsuitable for your needs, please contact us and we will try to propose a solution.