Automatically download arXiv daily papers and crop out their key information (CPU version).

Last update: Jul 30, 2022


Overview

FocusAX

Filter the latest daily arXiv papers by keyword, or search arXiv directly.

Automatically downloads each paper, extracts its abstract, and crops out its tables and figures.

Setting up the environment

Install Paddle

# GPU version
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU version
python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple

Install Layout-Parser

pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install "paddleocr>=2.2"

Install the other required packages

pip3 install -r requirements.txt
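Before downloading the weights, it can help to confirm that the packages above are importable. This is a minimal sketch using only the standard library; it assumes the pip packages expose the modules `paddle`, `layoutparser` and `paddleocr`, and the helper `missing_modules` is hypothetical, not part of FocusAX.

```python
import importlib.util

# Assumed module names for the packages installed above
# (paddlepaddle -> paddle, layoutparser -> layoutparser, paddleocr -> paddleocr).
REQUIRED_MODULES = ["paddle", "layoutparser", "paddleocr"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If `paddle` is found, calling `paddle.utils.run_check()` from Python runs Paddle's own installation check.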

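Download the model weights
Download the PubLayNet model, extract it, and place it in the paperparse directory. The directory structure should look like this: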

FocusAX
    - paperparse
        - ppyolov2_r50vd_dcn_365e_publaynet
            - inference.pdiparams
            - inference.pdiparams.info
            - inference.pdmodel
        - ...
    - downloader
        - ...
    - utils
        - ...
    - configs.py
    - focus_daily.py
    - focus_search.py
    - README.md
    - ...
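A small sketch that checks the weights landed where the tree above expects them. The helper `missing_model_files` is hypothetical: it only mirrors the layout shown and is not how FocusAX itself loads the model.

```python
from pathlib import Path

# Folder and file names copied from the directory tree above.
MODEL_DIR = Path("paperparse") / "ppyolov2_r50vd_dcn_365e_publaynet"
MODEL_FILES = ["inference.pdiparams", "inference.pdiparams.info", "inference.pdmodel"]


def missing_model_files(project_root):
    """Return the expected model files that are absent under project_root."""
    model_dir = Path(project_root) / MODEL_DIR
    return [name for name in MODEL_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    print("Model files OK" if not missing else f"Missing: {missing}")
```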

Usage

configs.py: configuration file for the program's parameters

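# =============== Network proxy ================
# proxy = None  # do not use a proxy
proxy = {"http": "socks5://127.0.0.1:8080", "https": "socks5://127.0.0.1:8080"}
# =============== Root directory for saved files ================
root_path = "./arxiv"
# =============== DNN model inference settings ================
threshold = 0.5       # minimum confidence for a detected layout block
enable_mkldnn = True  # use MKL-DNN acceleration for CPU inference
enforce_cpu = True    # run inference on the CPU even if a GPU is available
thread_num = 4        # number of CPU threads used for inference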
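How `threshold` is applied is not documented here. A plausible reading, given that the model is trained on PubLayNet (classes Text, Title, List, Table, Figure) and the project crops tables and figures, is a score cutoff on detected layout blocks. The `Detection` class and `blocks_to_crop` function below are hypothetical and only illustrate that assumption; whether the real code uses `>=` or `>` is also unknown.

```python
from dataclasses import dataclass

threshold = 0.5  # same value as in configs.py

# PubLayNet's layout classes are Text, Title, List, Table and Figure;
# in this sketch only tables and figures are kept for cropping.
CROP_LABELS = {"Table", "Figure"}


@dataclass
class Detection:
    """Hypothetical detected layout block."""
    label: str    # PubLayNet class name
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x1, y1, x2, y2) in page pixels


def blocks_to_crop(detections, min_score=threshold):
    """Keep table and figure detections whose score reaches min_score."""
    return [d for d in detections if d.label in CROP_LABELS and d.score >= min_score]


if __name__ == "__main__":
    page = [
        Detection("Figure", 0.91, (50, 80, 500, 400)),
        Detection("Table", 0.42, (60, 420, 520, 600)),
        Detection("Text", 0.97, (50, 610, 520, 780)),
    ]
    print([d.label for d in blocks_to_crop(page)])  # ['Figure']
```

Lowering the threshold keeps more borderline tables and figures at the cost of more false crops.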

focus_daily.py: filter the papers listed on arXiv daily by keyword (current day only)

if __name__ == '__main__':
    key_words = ['GAN']  # keywords a paper must contain
    subject_words = ['ML', 'CV', 'AI']  # subject categories to include
    start_parse(key_words, subject_words, needPDF=True, needZip=False)
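The filtering that `start_parse` performs is not shown. Assuming it keeps a paper when its title or abstract contains one of `key_words` and one of its subject tags appears in `subject_words`, the logic might look like this sketch. The paper dict layout and the helpers `matches` and `filter_papers` are hypothetical.

```python
def matches(paper, key_words, subject_words, case_sensitive=False):
    """True if the paper mentions a keyword and carries one of the subject tags.

    `paper` is a hypothetical dict with 'title', 'abstract' and 'subjects' keys.
    """
    text = f"{paper['title']} {paper['abstract']}"
    if not case_sensitive:
        text = text.lower()
        key_words = [k.lower() for k in key_words]
    has_keyword = any(k in text for k in key_words)            # substring containment
    has_subject = any(s in paper["subjects"] for s in subject_words)  # exact tag match
    return has_keyword and has_subject


def filter_papers(papers, key_words, subject_words, case_sensitive=False):
    """Return the papers that pass `matches`, preserving their order."""
    return [p for p in papers if matches(p, key_words, subject_words, case_sensitive)]


if __name__ == "__main__":
    papers = [
        {"title": "Training GANs faster", "abstract": "We speed up training.", "subjects": ["CV"]},
        {"title": "Organ segmentation", "abstract": "A U-Net for CT scans.", "subjects": ["CV"]},
        {"title": "GAN theory", "abstract": "Bounds for generators.", "subjects": ["math.ST"]},
    ]
    hits = filter_papers(papers, ["GAN"], ["ML", "CV", "AI"], case_sensitive=True)
    print([p["title"] for p in hits])  # ['Training GANs faster']
```

With case-insensitive matching, "GAN" also hits words such as "organ"; case-sensitive matching avoids that for acronyms but misses lowercase spellings of ordinary keywords.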

focus_search.py: search arXiv by keyword

start_parse('Keyword')
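`focus_search.py` may scrape the arXiv search page rather than call an API. As an independent sketch, the public arXiv API at http://export.arxiv.org/api/query answers keyword queries with an Atom feed. `build_query_url` and `parse_entries` are hypothetical helpers, and the network call is left to the caller.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the API


def build_query_url(keyword, start=0, max_results=10):
    """URL for an arXiv API search over all fields for `keyword`."""
    params = {"search_query": f"all:{keyword}", "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_entries(atom_xml):
    """Extract (title, pdf_url) pairs from an arXiv API Atom response."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        # Titles may contain line breaks; collapse all whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", "").split())
        pdf = next(
            (link.get("href") for link in entry.findall(f"{ATOM}link")
             if link.get("title") == "pdf"),
            None,
        )
        results.append((title, pdf))
    return results
```

For example, `urllib.request.urlopen(build_query_url("GAN")).read()` fetches a feed that can be passed to `parse_entries`. The arXiv API documentation asks clients to wait about three seconds between consecutive requests.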

A new folder is created under the root_path directory to store the results.
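The naming of the result folders is not specified. This sketch assumes one folder per day and one subfolder per paper title under `root_path`; that layout and the helpers `safe_name` and `paper_dir` are hypothetical.

```python
import re
from datetime import date
from pathlib import Path

root_path = "./arxiv"  # default from configs.py


def safe_name(title, max_len=80):
    """Turn a paper title into a folder name without Windows-reserved characters."""
    name = re.sub(r'[\\/:*?"<>|]', " ", title)  # replace reserved characters
    name = re.sub(r"\s+", " ", name).strip()     # collapse whitespace
    return name[:max_len].rstrip() or "untitled"


def paper_dir(title, root=root_path, day=None):
    """Create and return <root>/<YYYY-MM-DD>/<safe title> (hypothetical layout)."""
    day = day or date.today()
    folder = Path(root) / day.isoformat() / safe_name(title)
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Titles often contain colons and slashes, so sanitizing them keeps folder creation from failing on Windows.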

Results

The abs.md file in each folder contains the summary of the corresponding PDF; open it with a Markdown editor such as Typora.
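The exact contents of abs.md are not described. The sketch below assumes it holds the title, authors, abstract and relative links to the cropped images, which Typora renders inline. `write_abs_md` and its arguments are hypothetical.

```python
from pathlib import Path


def write_abs_md(folder, title, authors, abstract, image_files):
    """Write a hypothetical abs.md summarizing one paper and return its path.

    image_files are cropped table/figure images already saved in `folder`;
    relative links let a Markdown editor show them next to the abstract.
    """
    lines = [f"# {title}", "", f"**Authors:** {', '.join(authors)}", "",
             "## Abstract", "", abstract, ""]
    for name in image_files:
        lines += [f"![{Path(name).stem}]({name})", ""]
    path = Path(folder) / "abs.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```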

Note: papers with non-standard layouts can produce messy crops.

Other

Server-side inference version (with separate front end and back end): https://github.com/wmpscc/ArxivDailyOverview

Owner

HeoLis

A standalone pytube wrapper for downloading individual videos from YouTube.

This repository contains code for a youtube-dl GUI written in PyQt.

A CLI that searches and download Youtube videos in mp3 format.

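YouPlay is a Python-based tool for downloading YouTube videos from their URLs

Python-based YouTube video downloader GUI application.

Downloads tick data for China A-shares, automatically determines the trading calendar, and fetches Level-1 data for the whole market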

Newsemble is an API that provides easy access to the current news for programmatic analysis

Tool to download Amazon 4K SDR, HDR, and 1080p video (CDM is not included)

VK sticker downloader written in Python

Simple package for Sublime Text 4; downloads URLs for local viewing and editing

A tool to download program information from Bugcrowd, for use by researchers to compare programs they are eligible to participate in

Download YouTube videos/music and images in MP4, JPG with this tool.

A python script that discovers hidden YouTube API clients. Just a research project.

Download a Minecraft head or skin; supports TLauncher accounts

A cross-platform Python-based utility to download courses from Udemy for personal offline use.

Makes downloading YouTube videos faster and more convenient.

A youtube-dl fork with additional features and fixes

SABnzbd - The automated Usenet download tool