a small library for extracting rich content from urls

Last update: Dec 27, 2022

Overview

A small library for extracting rich content from urls.

what does it do?

micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. micawber also provides functions for parsing blocks of text and HTML and replacing links to videos with rich embedded content.
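
The dictionary returned by `providers.request` in the examples has the shape of an oEmbed response (note the `type`, `provider_name` and `version` fields). In the oEmbed model, a consumer picks a provider by matching the link against that provider's URL patterns, then queries the provider's endpoint with the link as a parameter. The sketch below covers only the matching and URL-building step, using the standard library. The `PROVIDERS` table, its patterns and endpoints, and the `oembed_request_url` helper are illustrative assumptions, not micawber's own provider rules or API.

```python
import re
from urllib.parse import urlencode

# Hypothetical registry: URL pattern -> oEmbed endpoint.
# These entries are illustrative assumptions, not micawber's built-in rules.
PROVIDERS = [
    (re.compile(r"https?://(?:www\.)?youtube\.com/watch\S*"),
     "https://www.youtube.com/oembed"),
    (re.compile(r"https?://(?:www\.)?flickr\.com/photos/\S*"),
     "https://www.flickr.com/services/oembed/"),
]


def oembed_request_url(url, **params):
    """Return the endpoint URL to fetch for `url`, or None if no pattern matches.

    Extra keyword arguments (e.g. maxwidth) are passed through as query
    parameters; the link itself is percent-encoded into the `url` parameter.
    """
    for pattern, endpoint in PROVIDERS:
        if pattern.match(url):
            query = urlencode({"url": url, "format": "json", **params})
            return "{}?{}".format(endpoint, query)
    return None
```

For example, `oembed_request_url('http://www.youtube.com/watch?v=54XHDUOHuzU', maxwidth=480)` yields a YouTube oEmbed query URL with the link percent-encoded; fetching and JSON-decoding that URL is left out to keep the sketch offline.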

examples

here is a quick example:

```python
import micawber

# load up rules for some default providers, such as YouTube and Flickr
providers = micawber.bootstrap_basic()

providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following dictionary:
{
    'author_name': 'pascalbrax',
    'author_url': 'http://www.youtube.com/user/pascalbrax',
    'height': 344,
    'html': '<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'provider_name': 'YouTube',
    'provider_url': 'http://www.youtube.com/',
    'title': 'Future Crew - Second Reality demo - HD',
    'type': 'video',
    'thumbnail_height': 360,
    'thumbnail_url': 'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'thumbnail_width': 480,
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
    'width': 459,
    'version': '1.0',
}
```
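
The `type` field tells a consumer how to display the response. Under the oEmbed specification, `video` and `rich` responses carry ready-made markup in `html`, while `photo` responses carry an image `url` with `width` and `height`. Below is a minimal sketch of choosing markup on that basis, with a hypothetical `render_oembed` helper. Falling back to a plain link built from `url` and `title` is an assumption that matches the dictionary above, where `url` holds the original page address.

```python
from html import escape


def render_oembed(data):
    """Return an HTML snippet for an oEmbed-style response dict (hypothetical helper)."""
    kind = data.get("type")
    if kind in ("video", "rich"):
        # Ready-made embed markup supplied by the provider.
        return data["html"]
    if kind == "photo":
        # Photo responses describe an image: url, width and height.
        return '<img src="{}" width="{}" height="{}" alt="{}">'.format(
            escape(data["url"]),
            data["width"],
            data["height"],
            escape(data.get("title", "")),
        )
    # Fallback (an assumption): link to the page URL, labelled with the title.
    url = data.get("url", "")
    return '<a href="{}">{}</a>'.format(escape(url), escape(data.get("title", url)))
```

Passing the dictionary above would return its `html` value unchanged, since its `type` is `video`.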

```python
providers.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following string:
# this is a test:
# <iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>
```
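
In the `parse_text` example, the link sits on a line by itself and that whole line is replaced by the embed markup. Below is a minimal sketch of such line-by-line substitution, assuming a caller-supplied `lookup` function (for instance a dict's `get`) in place of real provider requests. `parse_text_sketch` and its one-URL-per-line rule are illustrative, not micawber's implementation.

```python
import re

# Deliberately simple URL matcher; real-world URL detection is messier.
URL_RE = re.compile(r"https?://\S+")


def parse_text_sketch(text, lookup):
    """Replace each line consisting solely of a URL with its embed HTML.

    `lookup` is a hypothetical callable mapping a URL to an HTML string,
    or to None when no provider knows the URL (the line is then kept).
    """
    out = []
    for line in text.split("\n"):
        stripped = line.strip()
        if URL_RE.fullmatch(stripped):
            html = lookup(stripped)
            out.append(html if html is not None else line)
        else:
            out.append(line)
    return "\n".join(out)
```

Passing `dict.get` of a small URL-to-HTML table keeps the example offline; lines that contain other text, or URLs the lookup does not recognize, come back unchanged.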

```python
providers.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>')

# returns the following html:
# <p><iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&amp;feature=oembed" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
```
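
The HTML output differs from the text output in one visible detail: the `&` in the `src` query string is written as `&amp;`, the escaped form an HTML serializer produces for attribute values. Below is a minimal standard-library sketch of that escaping step, with a hypothetical `build_tag` helper. It does not reproduce the `allowfullscreen="allowfullscreen"` spelling shown above, which comes from whatever HTML serializer is in use.

```python
from html import escape


def build_tag(name, attrs):
    """Serialize an empty element with HTML-escaped attribute values (hypothetical helper).

    True is written as a bare boolean attribute; False and None are omitted.
    """
    parts = [name]
    for key, value in attrs.items():
        if value is False or value is None:
            continue
        if value is True:
            parts.append(key)
        else:
            parts.append('{}="{}"'.format(key, escape(str(value), quote=True)))
    return "<{}></{}>".format(" ".join(parts), name)
```

For instance, passing the embed URL from the example as `src` yields `...?fs=1&amp;feature=oembed` inside the attribute.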

Owner

Charles Leifer

Henan University of Technology: automatic off-campus check-in for Perfect Campus (完美校园)

Scrapes all articles and their headlines from theonion.com

robobrowser - A simple, Pythonic library for browsing the web without a standalone web browser.

Web crawling framework based on asyncio.

A Web Scraping Program.

API that uses Discord to scrape NameMC searches, drop times and dropping status of Minecraft names

Python framework to scrape Pastebin pastes and analyze them

A high-level distributed crawling framework.

This tool crawls a list of websites and downloads all PDF and Office documents

The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.

A way to scrape a database of all ISEF projects

Facebook Group Scraping Using Beautiful Soup & Selenium

Web Scraping Project - G1 Latest News

Async Python 3.6+ web scraping micro-framework based on asyncio

Lovely Scrapper

Shopee Scraper - A web scraper in Python that extracts sales, prices, available stock, location and more for a given seller in Brazil

A dead simple crawler to get book information from Douban.

A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Scraping weather data using Python to receive umbrella reminders

Genshin Impact crawler that scrapes artifact information from the in-game interface