Scrapinghub is a company that provides web crawling and scraping services, including Scrapy Cloud and scraping consultancy.
Scrapy is the most popular screen scraping framework for Python. It provides an environment for writing custom scrapers, called "spiders", for any site, and lets you group multiple spiders into a project so they can be run together.
Scrapy Cloud is a platform for running and managing Scrapy projects. Projects can be deployed using Scrapy, and managed from the API or via the Web Panel.
See the Pricing page.
A Scrapy spider, or spider for short, is the code that defines how a particular website is crawled and how its pages are parsed, written for the Scrapy screen scraping and web crawling framework.
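To make the idea concrete, here is a minimal, framework-free sketch of what a spider does: it holds the rules for one site and turns a page into a scraped item plus a list of links to follow. This is not Scrapy's API. The names `ExampleSiteSpider` and `LinkAndTitleParser`, and the `parse(url, html)` signature, are hypothetical and use only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class ExampleSiteSpider:
    """Hypothetical spider-like class: crawl and parse rules for one site."""

    name = "example_site"
    start_urls = ["https://example.com/"]  # placeholder start page

    def parse(self, url, html):
        """Return one scraped item and the absolute URLs to visit next."""
        parser = LinkAndTitleParser()
        parser.feed(html)
        item = {"url": url, "title": parser.title.strip()}
        next_urls = [urljoin(url, href) for href in parser.links]
        return item, next_urls
```

A real Scrapy spider expresses the same two responsibilities (where to start and how to parse each response), while the framework handles downloading, scheduling and exporting.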
In other systems these are sometimes referred to as wrappers, agents or scrapers.
Yes, see Professional Services and the Professional Services FAQ.
Yes, the Scrapy spider code we write for you is always owned by the customer. If you upload your own spiders to Scrapy Cloud, the code is already yours and we'll never share it without your permission.
Scrapinghub does not claim ownership of the data you scrape. You control how it is used and who may access it.
No, we'll never share your data without your permission. Customer data is considered proprietary and confidential.
Yes, Scrapy Cloud has an API for scheduling scraping jobs (i.e. executions of spiders), checking which jobs have run, and retrieving the scraped data. See the API documentation for more info.
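As a rough sketch of what scheduling a job from your own application might look like, the snippet below builds (but does not send) an HTTP request. The base URL, the `/schedule` path, the `project` and `spider` parameter names, and the use of HTTP Basic authentication with an API key are all assumptions made for illustration. Check the API documentation for the real endpoints and authentication scheme.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder base URL, not the real Scrapy Cloud endpoint.
API_BASE = "https://scrapy-cloud.example.com/api"


def build_schedule_request(api_key, project_id, spider_name):
    """Build a POST request asking the (hypothetical) API to run a spider."""
    # Form-encoded body; the parameter names are assumed.
    body = urlencode({"project": project_id, "spider": spider_name}).encode("ascii")
    # Assumed auth scheme: API key as the Basic auth username, empty password.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return Request(
        f"{API_BASE}/schedule",
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect before it goes out.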
Scraped data can be retrieved in JSON format.
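Once retrieved, JSON data can be loaded with any standard JSON library. The sketch below assumes the payload is a JSON array with one object per scraped item. The field names `url` and `title`, the sample data, and the `load_items` helper are made up for illustration; the actual shape depends on your spiders and on the API.

```python
import json


def load_items(payload):
    """Parse a JSON payload that is assumed to be an array of item objects."""
    items = json.loads(payload)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


# Made-up sample payload standing in for data returned by the API.
sample = (
    '[{"url": "https://example.com/a", "title": "A"},'
    ' {"url": "https://example.com/b", "title": "B"}]'
)
titles = [item["title"] for item in load_items(sample)]
```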
Scrapy Cloud operates as a hosted online service, so there is nothing to download. All Scrapy Cloud operations can be controlled through the web-based panel, and you can connect your own applications and systems to Scrapy Cloud using web-services APIs.
If the hosted online service has limitations that make it unsuitable for your needs, please contact us so that we can try to propose a solution.