Automatically download and crop key information from the arxiv daily paper.

Overview

Arxiv daily 速览

  • 功能:按关键词筛选arxiv每日最新paper,自动获取摘要,自动截取文中表格和图片。

1 测试环境

  • Ubuntu 16+
  • Python3.7
  • torch 1.9
  • Colab GPU

2 使用演示

首先下载权重baiduyun 提取码:il87,放置于code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth

2.1 环境安装

可选择在本地使用或Colab使用,以本地使用为例。

  • 1.提前安装Pytorch GPU版本
  • 2.在本项目根目录启动jupyter notebook,运行Overview_RUNME_Local.ipynb
  • 3.首次运行,先安装环境

image

  • 4.运行文档版面分析服务,确认正常启动后再运行下一步

image

  • 5.按照需要填写关键词进行筛选,如果需要PDF文件needPDF=True,需要将结果打包needZip=True

image

  • 6.启动后,将同时进行下载和文档版面分析,截取需要的内容。下载的文件将保存在./arxiv 目录下,如果needZip=True,会产生 ./arxiv.zip 文件。

2.2 Colab

  • 将code目录压缩上传 google drive根目录
  • 使用Colab运行Overview_RUNME_Colab.ipynb,后续步骤同2.1

3 效果展示

本地解压后,使用Typora markdown阅览工具可进行查看。

image

image

每个文件夹中的abs.md文件保留的是当前pdf的介绍。

ps:排版不规范会导致截图混乱,这也侧面说明了文章质量。

其他

ps:本着能用就行"堆屎山"代码,有bug描述清楚提issue,定期维护。

Owner
HeoLis
Interesting in generate methods.
HeoLis
Here I provide the source code for doing web scraping using the python library, it is Selenium.

Here I provide the source code for doing web scraping using the python library, it is Selenium.

M Khaidar 1 Nov 13, 2021
A spider for Universal Online Judge(UOJ) system, converting problem pages to PDFs.

Universal Online Judge Spider Introduction This is a spider for Universal Online Judge (UOJ) system (https://uoj.ac/). It also works for all other Onl

TriNitroTofu 1 Dec 07, 2021
Dex-scrapper - Hobby project for scrapping dex data on VeChain

Folders /zumo_abis # abi extracted from zumo repo /zumo_pools # runtime e

3 Jan 20, 2022
Create crawler get some new products with maximum discount in banimode website

crawler-banimode create crawler and get some new products with maximum discount in banimode website. این پروژه کوچک جهت یادگیری و کار با ابزار سلنیوم

nourollah rezaei 2 Feb 17, 2022
A Scrapper with python

Scrapper-en-python Scrapper des données signifie récuperer des données pour les traiter ou les analyser. En python, il y'a 2 grands moyens de scrapper

Lun4rIum 1 Dec 05, 2021
UsernameScraperTool - Username Scraper Tool With Python

UsernameScraperTool Username Scraper for 40+ Social sites. How To use git clone

E4crypt3d 1 Dec 20, 2022
jd_maotai rpa 基于selenium驱动的jd抢购rpa机器人

jd_maotai rpa 基于selenium驱动的jd抢购rpa机器人, 照顾我们这样的马大哈, 不会忘记抢购了, 祝大家过年都能喝上茅台. 特别声明: 本仓库发布的jd_maotai_rpa项目定义为自动化rpa项目, 是用于防止忘记参与jd茅台的活动(由于本人时常忘记), 而不是为了秒杀和抢

35 Nov 18, 2022
Iptvcrawl - A scrapy project for crawl IPTV playlist

iptvcrawl a scrapy project for crawl IPTV playlist. Dependency Python3 pip insta

Zhijun 18 May 05, 2022
12306抢票脚本

12306抢票脚本

罐子里的茶 457 Jan 05, 2023
A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

Alex Papadopoulos 1 Nov 13, 2021
淘宝、天猫半价抢购,抢电视、抢茅台,干死黄牛党

taobao_seckill 淘宝、天猫半价抢购,抢电视、抢茅台,干死黄牛党 依赖 安装chrome浏览器,根据浏览器的版本找到对应的chromedriver下载安装 web版使用说明 1、抢购前需要校准本地时间,然后把需要抢购的商品加入购物车 2、如果要打包成可执行文件,可使用pyinstalle

2k Jan 05, 2023
New World Market Scraper

Bean Seller A New Worlds market scraper. Deployment This must be installed on Windows as it uses the Windows api to do its stuff Install Prerequisites

4 Sep 21, 2022
中国大学生在线 四史自动答题刷分(现仅支持英雄篇)

中国大学生在线 “四史”学习教育竞答 自动答题 刷分 (现仅支持英雄篇,已更新可用) 若对您有所帮助,记得点个Star 🌟 !!! 中国大学生在线 “四史”学习教育竞答 自动答题 刷分 (现仅支持英雄篇,已更新可用) 🥰 🥰 🥰 依赖 本项目依赖的第三方库: requests 在终端执行以下

XWhite 229 Dec 12, 2022
Web3 Pancakeswap Sniper bot written in python3

Pancakeswap_BSC_Sniper_Bot Web3 Pancakeswap Sniper bot written in python3, Please note the license conditions! The first Binance Smart Chain sniper bo

Treading-Tigers 295 Dec 31, 2022
Crawler job that scrapes comments from social media posts and saves them in a S3 bucket.

Toxicity comments crawler Crawler job that scrapes comments from social media posts and saves them in a S3 bucket. Twitter Tweets and replies are scra

Douglas Trajano 2 Jan 24, 2022
🐞 Douban Movie / Douban Book Scarpy

Python3-based Douban Movie/Douban Book Scarpy crawler for cover downloading + data crawling + review entry.

Xingbo Jia 1 Dec 03, 2022
Discord webhook spammer with proxy support and proxy scraper

Discord webhook spammer with proxy support and proxy scraper

3 Feb 27, 2022
Extract embedded metadata from HTML markup

extruct extruct is a library for extracting embedded metadata from HTML markup. Currently, extruct supports: W3C's HTML Microdata embedded JSON-LD Mic

Scrapinghub 725 Jan 03, 2023
Kusonime scraper using python3

Features Scrap from url Scrap from recommendation Search by query Todo [+] Search by genre Example # Get download url from kusonime import Scrap

MhankBarBar 2 Jan 28, 2022
Scrapy-based cyber security news finder

Cyber-Security-News-Scraper Scrapy-based cyber security news finder Goal To keep up to date on the constant barrage of information within the field of

2 Nov 01, 2021