A Python module to parse the Open Graph Protocol

Last update: Nov 12, 2022

Overview

OpenGraph is a Python module for parsing the Open Graph Protocol. You can read more about the specification at http://ogp.me/

Installation

$ pip install opengraph

Features

Use it as a Python dict
Fetch and parse a page from a given URL
Parse HTML that was extracted earlier
HTML output
JSON output

Usage

From a URL

>>> import opengraph
>>> video = opengraph.OpenGraph(url="http://www.youtube.com/watch?v=q3ixBmDzylQ")
>>> video.is_valid()
True
>>> for x,y in video.items():
...     print("%-15s => %s" % (x, y))
...
site_name       => YouTube
description     => Eric Clapton and Paul McCartney perform George Harrison's "While My Guitar Gently Weeps" at the...
title           => While My Guitar Gently Weeps
url             => http://www.youtube.com/watch?v=q3ixBmDzylQ
image           => http://i2.ytimg.com/vi/q3ixBmDzylQ/default.jpg
video:type      => application/x-shockwave-flash
video:height    => 224
video           => http://www.youtube.com/v/q3ixBmDzylQ?version=3&autohide=1
video:width     => 398
type            => video

From HTML

>>> HTML = """
... <html xmlns:og="http://ogp.me/ns#">
... <head>
... <title>The Rock (1996)</title>
... <meta property="og:title" content="The Rock" />
... <meta property="og:type" content="movie" />
... <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
... <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
... </head>
... </html>
... """
>>> movie = opengraph.OpenGraph() # or you can instantiate as follows: opengraph.OpenGraph(html=HTML)
>>> movie.parser(HTML)
>>> movie.is_valid()
True
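
To make the parsing step above more concrete, here is a minimal standard-library sketch of the idea: collect the og: meta tags into a dict and check the four properties the Open Graph specification lists as required (title, type, image, url). This is not the module's own implementation; the names OGMetaParser, parse_og, has_required and SAMPLE are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# The four properties the Open Graph specification marks as required.
REQUIRED = ("title", "type", "image", "url")

# Sample page, reusing the tags from the example above.
SAMPLE = """
<html xmlns:og="http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>
"""


class OGMetaParser(HTMLParser):
    """Hypothetical helper: collect og: meta tags into a dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Drop the "og:" prefix, so "og:title" is stored as "title".
            self.properties[prop[3:]] = content


def parse_og(html):
    """Return a dict of Open Graph properties found in the HTML string."""
    parser = OGMetaParser()
    parser.feed(html)
    parser.close()
    return parser.properties


def has_required(properties):
    """True when every required property is present and non-empty."""
    return all(properties.get(name) for name in REQUIRED)


movie_properties = parse_og(SAMPLE)
print(movie_properties["title"])
print(has_required(movie_properties))
```

The sketch ignores anything that is not a meta tag with an og: property, so the regular title element in the sample is skipped.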

Generate JSON or HTML

>>> ogp = opengraph.OpenGraph("http://ogp.me/")
>>> print(ogp.to_json())
{"image:type": "image/png", "title": "Open Graph protocol", "url": "http://ogp.me/", "image": "http://ogp.me/logo.png", "scrape": false, "_url": "http://ogp.me/", "image:height": "300", "type": "website", "image:width": "300", "description": "The Open Graph protocol enables any web page to become a rich object in a social graph."}
>>> print(ogp.to_html())

<meta property="og:image:type" content="image/png" />
<meta property="og:title" content="Open Graph protocol" />
<meta property="og:url" content="http://ogp.me/" />
<meta property="og:image" content="http://ogp.me/logo.png" />
<meta property="og:scrape" content="False" />
<meta property="og:_url" content="http://ogp.me/" />
<meta property="og:image:height" content="300" />
<meta property="og:type" content="website" />
<meta property="og:image:width" content="300" />
<meta property="og:description" content="The Open Graph protocol enables any web page to become a rich object in a social graph." />
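
The JSON string can be read back with Python's standard json module. The sketch below uses a shortened copy of the to_json() output shown above instead of calling the library. It assumes that "scrape" and "_url" are internal bookkeeping values and not Open Graph properties, which is an inference from the output, not something the documentation states. The names raw, data, width and og_only are illustrative.

```python
import json

# Shortened copy of the to_json() output shown above.
raw = (
    '{"image:type": "image/png", "title": "Open Graph protocol", '
    '"url": "http://ogp.me/", "scrape": false, "_url": "http://ogp.me/", '
    '"type": "website", "image:width": "300"}'
)

data = json.loads(raw)

# Values arrive as strings, so numeric fields need an explicit conversion.
width = int(data["image:width"])

# Assumption: "scrape" and "_url" are internal values, so filter them out
# to keep only the Open Graph properties.
og_only = {k: v for k, v in data.items() if k not in ("scrape", "_url")}

print(width)
print(sorted(og_only))
```

Filtering before re-serializing avoids carrying tags such as og:scrape and og:_url into other systems.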

Owner

Erik Rivera
