Open Source at Scrapinghub

Supporting the leading crawl technologies through sponsored open source work


An Open Source DNA

Scrapinghub was built on the success of Scrapy, an open source web crawling framework our founders released in 2008. We’ve been managing Scrapy with the same commitment and enthusiasm ever since.

Five years later, we’re a team of over 110 people with a few dozen more open source projects under our belt, all crafted and maintained with the same love and passion. We can’t imagine running Scrapinghub any other way. Open source is in our DNA.


Our Open Source Projects

Scrapely

Scrapely is a library for generating parsers for web pages.

DateParser

DateParser is our library for parsing human-readable dates and times. It supports 18 languages.
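To give a flavour of what "human-readable" means here, the sketch below resolves a handful of English relative phrases such as "2 days ago" or "in 3 hours" against a fixed reference time. It is a toy written for illustration, not DateParser's API or algorithm: the function name `parse_relative` and the phrases it accepts are hypothetical, and it covers English only. The library's own entry point is `dateparser.parse`.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical toy parser for a few English relative-date phrases.
# Not DateParser's API or algorithm; English only.
_UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}
_AGO = re.compile(r"^(\d+)\s+(minute|hour|day|week)s?\s+ago$")
_IN = re.compile(r"^in\s+(\d+)\s+(minute|hour|day|week)s?$")


def parse_relative(text: str, now: datetime) -> Optional[datetime]:
    """Resolve a relative phrase against `now`; return None if unrecognised."""
    phrase = " ".join(text.lower().split())  # normalise case and whitespace
    if phrase in ("now", "today"):
        return now
    if phrase == "yesterday":
        return now - _UNITS["day"]
    if phrase == "tomorrow":
        return now + _UNITS["day"]
    match = _AGO.match(phrase)
    if match:
        return now - int(match.group(1)) * _UNITS[match.group(2)]
    match = _IN.match(phrase)
    if match:
        return now + int(match.group(1)) * _UNITS[match.group(2)]
    return None
```

For example, `parse_relative("2 days ago", datetime(2024, 1, 10, 12, 0))` returns `datetime(2024, 1, 8, 12, 0)`. Passing `now` explicitly keeps the result deterministic; a real parser also has to handle absolute formats, time zones and many languages.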

ScrapyJS

ScrapyJS is our middleware for Splash, our JavaScript rendering service, making it easy to use Splash in your Scrapy projects.

Frontera

Frontera is a framework for managing your crawl logic and policies.

Formasaurus

Formasaurus uses machine learning to work out what type an HTML form is: login, search, sign-up, password recovery, contact form, and so on.

w3lib

w3lib provides a collection of web-related helper functions for your web scraping projects, covering tasks such as URL handling, HTML cleanup and character encoding detection.

ScrapyRT

ScrapyRT lets you reuse your spider’s logic to extract data from web pages through a single HTTP request.

Loginform

Loginform is a library that detects and fills login forms on specified URLs.

Webstruct

Webstruct is our library for building named entity recognition (NER) systems that work with HTML.

Queuelib

Queuelib lets you create disk-based queues in Python.
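The point of a disk-based queue is that pending items survive a process restart because they live in a file rather than in memory. The sketch below illustrates that idea with the standard library's `sqlite3` module. It is not Queuelib's code or API; the class name `DiskFifoQueue` and its methods are hypothetical.

```python
import sqlite3
from typing import Optional


class DiskFifoQueue:
    """Hypothetical disk-backed FIFO queue of bytes, stored in SQLite.

    Illustrates the idea of a persistent queue; this is not Queuelib's code.
    """

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        # AUTOINCREMENT ids never repeat, so ordering by id gives FIFO order.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, item BLOB NOT NULL)"
        )
        self._conn.commit()

    def push(self, item: bytes) -> None:
        self._conn.execute("INSERT INTO queue (item) VALUES (?)", (item,))
        self._conn.commit()  # commit each push so it is persisted immediately

    def pop(self) -> Optional[bytes]:
        """Remove and return the oldest item, or None if the queue is empty."""
        row = self._conn.execute(
            "SELECT id, item FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self._conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        self._conn.commit()
        return row[1]

    def __len__(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]

    def close(self) -> None:
        self._conn.close()
```

Opening `DiskFifoQueue("crawl-queue.db")` keeps items on disk between runs, while `":memory:"` gives a throwaway in-memory queue that is handy for tests.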

adblockparser

adblockparser is a library for parsing Adblock Plus filters and matching URLs against them.
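One common way to match URLs against Adblock Plus filters is to translate each filter into a regular expression. The sketch below does that for a small subset of the syntax (`||` domain anchors, `|` anchors, `*` wildcards, `^` separators and `@@` exceptions) and simply ignores `$` options and element-hiding rules. It is an illustration under those assumptions, not adblockparser's code: `filter_to_regex` and `TinyFilterList` are hypothetical names, and the library itself exposes an `AdblockRules` class.

```python
import re
from typing import Iterable, List

# Characters that an Adblock Plus "^" separator must NOT match (or end of URL).
_SEPARATOR = r"(?:[^A-Za-z0-9_\-.%]|$)"


def filter_to_regex(rule: str) -> str:
    """Translate a basic Adblock Plus URL filter into a regex pattern."""
    prefix = ""
    if rule.startswith("||"):
        # Domain anchor: the host, or any subdomain of it, right after the scheme.
        prefix = r"^[a-z][a-z0-9+.\-]*://([^/]*\.)?"
        rule = rule[2:]
    elif rule.startswith("|"):
        prefix = "^"  # anchor at the very start of the URL
        rule = rule[1:]
    suffix = ""
    if rule.endswith("|"):
        suffix = "$"  # anchor at the very end of the URL
        rule = rule[:-1]
    body = []
    for ch in rule:
        if ch == "*":
            body.append(".*")
        elif ch == "^":
            body.append(_SEPARATOR)
        else:
            body.append(re.escape(ch))
    return prefix + "".join(body) + suffix


class TinyFilterList:
    """Hypothetical matcher for a subset of Adblock Plus syntax."""

    def __init__(self, rules: Iterable[str]) -> None:
        self._block: List[re.Pattern] = []
        self._allow: List[re.Pattern] = []
        for raw in rules:
            rule = raw.strip()
            # Skip blank lines, comments and element-hiding rules.
            if not rule or rule.startswith("!") or "##" in rule:
                continue
            rule = rule.split("$", 1)[0]  # this sketch ignores $options
            target = self._block
            if rule.startswith("@@"):
                target, rule = self._allow, rule[2:]  # exception rule
            if rule:
                target.append(re.compile(filter_to_regex(rule), re.IGNORECASE))

    def blocks(self, url: str) -> bool:
        """True if a blocking filter matches and no exception filter does."""
        if any(p.search(url) for p in self._allow):
            return False
        return any(p.search(url) for p in self._block)
```

Checking exceptions first mirrors how `@@` rules override blocking rules; a production matcher would also need to evaluate `$` options such as resource type and third-party status.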

MDR

MDR is a library for detecting and extracting list data from web pages.

Webpager

Webpager is a library for classifying whether a link on a web page is a pagination link or not.

Skinfer

Skinfer is a tool we developed to infer schemas from a sample of JSON data.
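Schema inference works by merging what each sample reveals: the union of types observed at each position, the properties seen in any object, and the keys present in every object (which become `required`). The sketch below applies that idea to produce a simple JSON Schema. It is not Skinfer's algorithm or API; `infer_schema` is a hypothetical name, it handles only the core JSON types, and it widens `integer` to `number` when both appear.

```python
from typing import Any, Dict, List


def _json_type(value: Any) -> str:
    """Map a parsed JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def infer_schema(samples: List[Any]) -> Dict[str, Any]:
    """Infer a simple JSON Schema that every sample satisfies."""
    if not samples:
        raise ValueError("need at least one sample")
    types = {_json_type(s) for s in samples}
    if {"integer", "number"} <= types:
        types.discard("integer")  # "number" already covers integers
    ordered = sorted(types)
    schema: Dict[str, Any] = {"type": ordered[0] if len(ordered) == 1 else ordered}

    objects = [s for s in samples if isinstance(s, dict)]
    if objects:
        keys = sorted(set().union(*objects))
        # Each property's schema is inferred from the objects that have it.
        schema["properties"] = {
            k: infer_schema([o[k] for o in objects if k in o]) for k in keys
        }
        # A key is required only if every sampled object contains it.
        required = [k for k in keys if all(k in o for o in objects)]
        if required:
            schema["required"] = required

    items = [item for s in samples if isinstance(s, list) for item in s]
    if items:
        schema["items"] = infer_schema(items)
    return schema
```

Given `[{"name": "a", "price": 1}, {"name": "b", "price": 2.5, "tags": ["x"]}]`, the sketch marks `name` and `price` as required, types `price` as `number`, and leaves `tags` optional because only one sample has it.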

Scrapy-StreamItem

Scrapy-StreamItem provides support for working with streamcorpus’ StreamItems.

Wappalyzer-Python

Wappalyzer-Python is a Python-based wrapper for Wappalyzer, a tool that identifies the technologies a website uses.

Interested in working with us?

Check out our open positions.