A web crawler for recording posts in "sina weibo"

Last update: Aug 20, 2022

Web Crawler for "sina weibo"

Introduction

This project collects the attributes of posts on "sina weibo". Posts can be recorded from different lists (also called flows or collections), such as search results. The supported lists are described in the "Functions" section.

Functions

Scripts currently available:

`search.py`: Searches for a string within a given time interval and records every post in the search results.
Parameters (edit these at the head of the script):
`search_string`: The string to search for. All posts containing this string are recorded, up to a maximum of 50 pages of results.
`start_time`: Only posts published after this time are recorded (accurate to the hour).
`end_time`: Only posts published before this time are recorded (accurate to the hour).
`rest_time`: The interval between two consecutive requests, in seconds.
Results are saved in Python pickle format at `results/weibo-{search_string}-{start_time}-{end_time}.pkl`. The `start_time` and `end_time` in the filename are written as Unix timestamps, in seconds.

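As a rough illustration of how these parameters relate to the output filename, here is a minimal sketch. It is not taken from `search.py`: the parameter values, the use of `datetime` objects, and the helper names `truncate_to_hour` and `result_path` are all assumptions made for this example, and the real script may represent times and build the path differently.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical values; the real script defines its own at the head of the file.
search_string = "python"
start_time = datetime(2022, 8, 1, 9, 30)
end_time = datetime(2022, 8, 2, 18, 45)
rest_time = 2  # seconds to wait between two requests


def truncate_to_hour(moment: datetime) -> datetime:
    """Drop minutes, seconds and microseconds (hour-level accuracy)."""
    return moment.replace(minute=0, second=0, microsecond=0)


def result_path(search_string: str, start: datetime, end: datetime) -> Path:
    """Build results/weibo-{search_string}-{start_time}-{end_time}.pkl.

    Assumes the timestamps in the filename are whole seconds, local time zone.
    """
    start_ts = int(truncate_to_hour(start).timestamp())
    end_ts = int(truncate_to_hour(end).timestamp())
    return Path("results") / f"weibo-{search_string}-{start_ts}-{end_ts}.pkl"


print(result_path(search_string, start_time, end_time))
```

The exact numbers printed depend on the local time zone, because naive `datetime` values are interpreted as local time when converted to a Unix timestamp.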

Installation

Run `pip install -r requirements.txt`.
Find the script you need in the "Functions" section.
Edit the parameters at the head of the script.
Run the script with Python, for example `python search.py`.
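Once a script has finished, the saved results can be inspected from Python. The sketch below assumes only what the "Functions" section states: that the results are pickle files under `results/`. The structure of the stored object is not documented here, so the example prints its type rather than assuming a list or dictionary. The helper name `load_results` is hypothetical.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load one results file written by the crawler."""
    # Only unpickle files you produced yourself: unpickling can run arbitrary code.
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # List every results file and show what kind of object it contains.
    for pkl_file in sorted(Path("results").glob("weibo-*.pkl")):
        posts = load_results(pkl_file)
        print(pkl_file.name, type(posts).__name__)
```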
