A Pixiv web crawler module

Overview

Pixiv-spider

A Pixiv spider module

WARNING

It's an unfinished work, browsing the code carefully before using it.

Features

0004 -

​ Readme.md updated, comments fixed, variable names fixed.

0003 -

​ Name changed to "Pixiv-spider", bugs fixed, ugoira added.

Installation

Clone or download this repository than get into it and input on your terminal:

python ./setup.py install

Usage

classes

  • LastestPicGetter - Picker to get the lastest artwork by get method
  • Artwork - Format to request and parse artwork data

Example

def main() :

    COOKIE = ""
    # Use Your cookie if you want to login.
    try : 
        with open("./COOKIE.key") as ios:
            COOKIE = ios.readline()
    except :
        print("COOKIE.key not found")

    UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
    # User-Agent
    
    PROXIES = {"http":"socks5://127.0.0.1:10808", 
    "https":"socks5://127.0.0.1:10808"}
    # Proxies if needed

    keyword = "艦これ"
    # Keyword for searching
    
    mode = "safe" #"r18" or "safe"
    # Logging in is necessary if using R-18 mode

    picker = LastestPicGetter(keyword, mode = mode,
    cookie = COOKIE,
    UA = UA, 
    proxies = PROXIES)
    #Create a picker by get method

    for i in range(5, 6) :

        picker.request(i)
        picker.parsing()

        print("Result:", list(picker.result.keys()))
        print("Last page:", picker.last_page)

        # picker.request_all()
        picker.download_path_all(".\\pics\\")


if __name__ == "__main__":
    main()

MIT License

Copyright (c) 2021 Uzuki

Permission is hereby granted, free of charge, to any person obtaining a copy

of this software and associated documentation files (the "Software"), to deal

in the Software without restriction, including without limitation the rights

to use, copy, modify, merge, publish, distribute, sublicense, and/or sell

copies of the Software, and to permit persons to whom the Software is

furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all

copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,

FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE

AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER

LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,

OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE

SOFTWARE.

Owner
Uzuki
My sin is not from Postmodernism, but Postmodernism is from Capitalism.
Uzuki
Minimal set of tools to conduct stealthy scraping.

Stealthy Scraping Tools Do not use puppeteer and playwright for scraping. Explanation. We only use the CDP to obtain the page source and to get the ab

Nikolai Tschacher 88 Jan 04, 2023
a small library for extracting rich content from urls

A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o

Charles Leifer 588 Dec 27, 2022
Libextract: extract data from websites

Libextract is a statistics-enabled data extraction library that works on HTML and XML documents and written in Python

499 Dec 09, 2022
Crawl BookCorpus

These are scripts to reproduce BookCorpus by yourself.

Sosuke Kobayashi 590 Jan 03, 2023
Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo.

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo. (Todas as infomações)

Guilherme Silva Uchoa 3 Oct 04, 2022
Rottentomatoes, Goodreads and IMDB sites crawler. Semantic Web final project.

Crawler Rottentomatoes, Goodreads and IMDB sites crawler. Crawler written by beautifulsoup, selenium and lxml to gather books and films information an

Faeze Ghorbanpour 1 Dec 30, 2021
Web scrapping

Project Setup Table of Contents Project Setup Table of Contents Run project locally Install Requirements Run script Run project locally Install Requir

Charles 3 Feb 04, 2022
Amazon scraper using scrapy, a python framework for crawling websites.

#Amazon-web-scraper This is a python program, which use scrapy python framework to crawl all pages of the product and scrap products data. This progra

Akash Das 1 Dec 26, 2021
Searching info from Google using Python Scrapy

Python-Search-Engine-Scrapy || Python-爬虫-索引/利用爬虫获取谷歌信息**/ Searching info from Google using Python Scrapy /* 利用 PYTHON 爬虫获取天气信息,以及城市信息和资料**/ translatio

HONGVVENG 1 Jan 06, 2022
Luis M. Capdevielle 1 Jan 14, 2022
对于有验证码的站点爆破,用于安全合法测试

使用方法 python3 main.py + 配置好的文件 python3 main.py Verify.json python3 main.py NoVerify.json 以上分别对应有验证码的demo和无验证码的demo Tips: 你可以以域名作为配置文件名字加载:python3 main

47 Nov 09, 2022
Web scraped S&P 500 Data from Wikipedia using Pandas and performed Exploratory Data Analysis on the data.

Web scraped S&P 500 Data from Wikipedia using Pandas and performed Exploratory Data Analysis on the data. Then used Yahoo Finance to get the related stock data and displayed them in the form of chart

Samrat Mitra 3 Sep 09, 2022
Demonstration on how to use async python to control multiple playwright browsers for web-scraping

Playwright Browser Pool This example illustrates how it's possible to use a pool of browsers to retrieve page urls in a single asynchronous process. i

Bernardas Ališauskas 8 Oct 27, 2022
This is a web crawler that works on employ email data by gmane.org and visualizes it in different ways.

crawler_to_visual_gmane Analyzing an EMAIL Archive from gmane and vizualizing the data using the D3 JavaScript library. This is a set of tools that al

Saim Zafar 1 Dec 20, 2021
This was supposed to be a web scraping project, but somehow I've turned it into a spamming project

Introduction This was supposed to be a web scraping project, but somehow I've turned it into a spamming project.

Boss Perry (Pez) 1 Jan 23, 2022
京东云无线宝积分推送,支持查看多设备积分使用情况

JDRouterPush 项目简介 本项目调用京东云无线宝API,可每天定时推送积分收益情况,帮助你更好的观察主要信息 更新日志 2021-03-02: 查询绑定的京东账户 通知排版优化 脚本检测更新 支持Server酱Turbo版 2021-02-25: 实现多设备查询 查询今

雷疯 199 Dec 12, 2022
This tool can be used to extract information from any website

WEB-INFO- This tool can be used to extract information from any website Install Termux and run the command --- $ apt-get update $ apt-get upgrade $ pk

1 Oct 24, 2021
TikTok Username Swapper/Claimer/etc

TikTok-Turbo TikTok Username Swapper/Claimer/etc I wanted to create it as fast as possible but i eventually gave up and recoded it many many many many

Kevin 12 Dec 19, 2022
A tool to easily scrape youtube data using the Google API

YouTube data scraper To easily scrape any data from the youtube homepage, a youtube channel/user, search results, playlists, and a single video itself

7 Dec 03, 2022
Free-Game-Scraper is a useful script that allows you to track down free games and DLCs on many platforms.

Game Scraper Free-Game-Scraper is a useful script that allows you to track down free games and DLCs on many platforms. Join the discord About The Proj

KursK 2 Mar 28, 2022