Meet Portia, a service of the Scrapinghub Platform
What is Portia?
Portia lets you scrape websites without any programming knowledge.
Create a template by clicking the elements on pages you would like to scrape, and Portia will create a spider to scrape similar pages from the website.
No need to download or install anything, as Portia runs in your web browser!
Portia is completely open source! Check out the Portia GitHub repository. Portia projects in Scrapinghub can be exported and used with the open source project, giving users all the freedom and benefits of open source.
See how it works!
Define your data schema
Annotate a web page
Run your spider
Watch your spider run
Review the scraped data
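Curious what those steps amount to under the hood? The sketch below is only a conceptual illustration in plain Python, not Portia's actual implementation or API. It assumes a hypothetical two-field schema and made-up class names (`recipe-title`, `cook-time`) standing in for the elements you would click while annotating a page.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Field:
    """One field of a hypothetical data schema."""
    name: str
    css_class: str  # class of the element "clicked" during annotation (assumed)


class ClassTextExtractor(HTMLParser):
    """Collects the text of the first element carrying each annotated class.

    Capture stops at the first closing tag, so nested markup inside an
    annotated element is not handled in this simplified sketch.
    """

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.item = {}
        self._current = None  # name of the field currently being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field in self.fields:
            if field.css_class in classes and field.name not in self.item:
                self._current = field.name
                self.item[field.name] = ""
                break

    def handle_data(self, data):
        if self._current is not None:
            self.item[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def scrape(html, fields):
    """Run the 'spider' on one page and return the scraped item."""
    parser = ClassTextExtractor(fields)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.item.items()}


# Step 1: define the data schema (hypothetical fields).
SCHEMA = [Field("title", "recipe-title"), Field("time", "cook-time")]

# Step 2: the "annotated" page, reduced to a tiny made-up sample.
PAGE = (
    '<div><h1 class="recipe-title">Pancakes</h1>'
    '<span class="cook-time"> 20 min </span></div>'
)

# Steps 3 to 5: run the extraction and review the scraped data.
item = scrape(PAGE, SCHEMA)
print(item)
```

With Portia, the schema and annotations are created by clicking in the browser, so none of this has to be written by hand.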
See it in action
Using Portia to scrape your favorite recipes from allrecipes.com
Vastly simplified my development efforts.
I have been using Scrapinghub for more than a year for personal projects. The interface provided saves about 95% of the time normally needed for extracting information from webpages. With about an hour of learning, this tool vastly simplified my development efforts, and runs consistently and reliably. If you have a need to gather data from a series of webpages, I could not recommend Scrapinghub enough.
- Bill Frischling
Allows us to expedite crawling simple sites.
Portia allows us to expedite crawling simple sites. The visual spider editor and Scrapy Cloud's scheduling functionality enable us to create crawlers and run scraping jobs without touching a line of code, meaning less development time and less maintenance time. Since we began using the platform, Explore has become even better and more exhaustive in its monitoring process.
- Cédric Pegorier, Middleware Manager
Lets me extract data by clicking around.
Portia allows me to get the data I want on a constant basis without needing to code or download a desktop client. A fun extra is that some people I work with think I’m some kind of genius - because I’m able to extract all this data from different sites, and they're not realizing I’m doing so by merely clicking around.
- Christian Eder