A web crawler for recording posts in "sina weibo"

Last update: Aug 20, 2022

Overview

Web Crawler for "sina weibo"

Introduction

This project helps collect the attributes of posts on "sina weibo". Posts can be recorded from different lists (also called flows or collections), such as search results. The supported lists are given in the "Functions" section.

Functions

Scripts currently available:

`search.py`: Searches for a word within a specific time interval and records every post in the search results.

Parameters (edit these at the head of the script):

- `search_string`: The string to search for. All posts containing this string are recorded, up to a maximum of 50 pages of results.
- `start_time`: Only posts published after this time are recorded (accurate to the hour).
- `end_time`: Only posts published before this time are recorded (accurate to the hour).
- `rest_time`: The interval between two requests, in seconds.

Results are saved in Python pickle format at `results/weibo-{search_string}-{start_time}-{end_time}.pkl`. In the filename, `start_time` and `end_time` are written as Unix timestamps (in seconds).
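As a rough illustration of how these parameters fit together, here is a minimal sketch and not the actual contents of `search.py`. The parameter values, the UTC+8 time zone, and the names `result_path`, `crawl`, and `fetch_page` are all assumptions made for this example; only the parameter names, the 50-page limit, and the filename pattern come from the description above.

```python
import time
from datetime import datetime, timedelta, timezone

# Hypothetical values for the parameters described above.
CST = timezone(timedelta(hours=8))  # assumption: times are given in UTC+8
search_string = "python"
start_time = datetime(2022, 8, 1, 0, tzinfo=CST)  # hour-level precision
end_time = datetime(2022, 8, 2, 0, tzinfo=CST)
rest_time = 1.0  # seconds to wait between two requests
MAX_PAGES = 50   # the documented upper limit on result pages


def result_path(search_string, start_time, end_time):
    """Build the output path, with both times as Unix timestamps in seconds."""
    start_ts = int(start_time.timestamp())
    end_ts = int(end_time.timestamp())
    return f"results/weibo-{search_string}-{start_ts}-{end_ts}.pkl"


def crawl(fetch_page, rest_time, max_pages=MAX_PAGES):
    """Collect posts page by page, pausing rest_time seconds after each request.

    fetch_page is a hypothetical callable that takes a 1-based page number
    and returns a list of posts (an empty list when there are no more).
    """
    posts = []
    for page in range(1, max_pages + 1):
        page_posts = fetch_page(page)
        if not page_posts:
            break
        posts.extend(page_posts)
        time.sleep(rest_time)
    return posts
```

Passing the page fetcher in as a function keeps the pacing logic separate from the network code, so it can be tried out with a stub before any real request is sent.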


Installation

1. Run `pip install -r requirements.txt`.
2. Find the script you need in the "Functions" section.
3. Edit the parameters at the head of the script.
4. Run the script with Python.
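Once a script has finished, its pickle result file can be opened from Python. The sketch below is only an illustration: the helper name `load_posts` and the shape of each post record (a dict with `text` and `created_at` keys) are assumptions, since the structure of the recorded posts is not documented here. It writes a made-up sample file to a temporary directory first so that it runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def load_posts(path):
    """Load a pickle result file. Only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Made-up sample data; the real record layout may differ.
sample_posts = [{"text": "hello", "created_at": 1659283200}]

with tempfile.TemporaryDirectory() as tmp:
    # Filename follows the documented weibo-{search_string}-{start}-{end}.pkl pattern.
    path = Path(tmp) / "weibo-example-1659283200-1659369600.pkl"
    with open(path, "wb") as f:
        pickle.dump(sample_posts, f)
    loaded = load_posts(path)

print(len(loaded), "post(s) loaded")
```

For a real run, point `load_posts` at the file written under `results/`.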
