Automatically download daily arXiv papers and crop the key information from them.

Last update: Jul 30, 2022

Overview

Arxiv daily quick overview

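Function: filter each day's latest arXiv papers by keyword, automatically fetch their abstracts, and automatically crop the tables and figures from each paper.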
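
As a rough illustration of the keyword filtering described above, the sketch below keeps only papers whose title or abstract mentions at least one keyword. The `Paper` record, the `filter_by_keywords` helper, and the sample entries are hypothetical and are not taken from this project's code; the actual matching rules may differ.

```python
from typing import Iterable, List, NamedTuple


class Paper(NamedTuple):
    """Hypothetical record for one arXiv entry (not the project's own type)."""
    title: str
    abstract: str


def filter_by_keywords(papers: Iterable[Paper], keywords: Iterable[str]) -> List[Paper]:
    """Keep papers whose title or abstract contains any keyword (case-insensitive)."""
    # Normalise keywords once and drop empty ones so "" never matches everything.
    wanted = [k.strip().lower() for k in keywords if k.strip()]
    selected = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}".lower()
        if any(k in text for k in wanted):
            selected.append(paper)
    return selected


if __name__ == "__main__":
    # Made-up sample entries, used only to exercise the helper.
    sample = [
        Paper("Layout analysis for scientific PDFs", "We detect tables and figures."),
        Paper("A note on graph colouring", "Purely combinatorial results."),
    ]
    for paper in filter_by_keywords(sample, ["table", "figure"]):
        print(paper.title)
```

Plain substring matching is the simplest choice here; it also matches inside longer words (for example "table" inside "stable"), so a stricter matcher may be preferable for short keywords.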

1 Test environment

Ubuntu 16+
Python 3.7
torch 1.9
Colab GPU
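
To compare a machine against the tested setup above, a small check along the following lines may help. It is only a sketch: the `describe_environment` helper is hypothetical, and it reports what is installed rather than enforcing any version.

```python
import importlib.util
import platform
import sys


def describe_environment():
    """Report OS, Python version, and (if installed) the torch version and CUDA status."""
    info = {
        "os": platform.system(),
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,  # stays None when torch is not installed
        "cuda": None,   # stays None when torch is not installed
    }
    # Import torch only if it can be found, so the check also runs before setup.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```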

2 Usage

First download the weights from baiduyun (extraction code: il87) and place them at code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth
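
Before starting the notebook, it can be worth confirming that the weights file sits at the expected location. The sketch below assumes it is run from the project root; the `find_weights` helper is hypothetical, and only the relative path comes from this README.

```python
from pathlib import Path

# Relative path given in this README; everything else here is illustrative.
WEIGHTS_RELATIVE_PATH = Path(
    "code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth"
)


def find_weights(project_root="."):
    """Return the weights path if the file exists under project_root, else None."""
    path = Path(project_root) / WEIGHTS_RELATIVE_PATH
    return path if path.is_file() else None


if __name__ == "__main__":
    found = find_weights()
    if found is None:
        print(f"Weights not found; expected at {WEIGHTS_RELATIVE_PATH}")
    else:
        print(f"Weights found at {found}")
```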

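2.1 Environment setup

The project can be used locally or on Colab; local use is taken as the example here.

1. Install the GPU build of PyTorch beforehand.
2. Start jupyter notebook from the project root and run Overview_RUNME_Local.ipynb.
3. On the first run, install the environment first.
4. Start the document layout analysis service and confirm that it has started correctly before moving on to the next step.
5. Fill in the keywords to filter by as needed. Set needPDF=True if you need the PDF files, and needZip=True if you need the results packaged.
6. Once started, downloading and document layout analysis run at the same time, and the required content is cropped. Downloaded files are saved under the ./arxiv directory; if needZip=True, a ./arxiv.zip file is also produced.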
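
The packaging step controlled by needZip can be pictured roughly as follows. This is a minimal sketch using the standard library, not the project's own implementation; the `pack_results` helper and its `need_zip` parameter are hypothetical stand-ins for the notebook's needZip option.

```python
import shutil
from pathlib import Path
from typing import Optional


def pack_results(output_dir: str = "./arxiv", need_zip: bool = True) -> Optional[Path]:
    """Zip the contents of output_dir into <output_dir>.zip when need_zip is set."""
    source = Path(output_dir)
    if not need_zip or not source.is_dir():
        return None
    # make_archive appends ".zip" to base_name itself, so "./arxiv" becomes "./arxiv.zip".
    base_name = str(source.parent / source.name)
    archive = shutil.make_archive(base_name, "zip", root_dir=str(source))
    return Path(archive)


if __name__ == "__main__":
    result = pack_results("./arxiv", need_zip=True)
    print(result if result is not None else "Nothing packed")
```

The archive is written next to the output directory rather than inside it, so the zip file never ends up containing itself.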

2.2 Colab

Compress the code directory and upload it to the root of your Google Drive.
Run Overview_RUNME_Colab.ipynb in Colab; the remaining steps are the same as in 2.1.

3 Results

After unpacking the archive locally, the results can be viewed with the Typora markdown viewer.

The abs.md file in each folder holds the introduction to the corresponding PDF.
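
For browsing results without a markdown viewer, the abs.md files can also be gathered with a few lines of Python. The sketch below assumes only that each paper folder under the output directory contains an abs.md file, as stated above; the `collect_abstracts` helper is hypothetical, and if two folders share a name the later one wins.

```python
from pathlib import Path
from typing import Dict


def collect_abstracts(output_dir: str = "./arxiv") -> Dict[str, str]:
    """Map each paper folder name to the text of its abs.md file."""
    abstracts = {}
    # rglob is used so that no particular nesting depth has to be assumed.
    for abs_file in sorted(Path(output_dir).rglob("abs.md")):
        abstracts[abs_file.parent.name] = abs_file.read_text(encoding="utf-8")
    return abstracts


if __name__ == "__main__":
    for folder, text in collect_abstracts().items():
        print(f"== {folder} ==")
        print(text)
```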

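ps: Irregular typesetting leads to messy crops, which also says something about the quality of the paper.

Other

ps: The code was written on a "good enough to work" basis and is admittedly a "shit mountain". If you find a bug, describe it clearly in an issue; the project is maintained periodically.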

Owner

HeoLis
