spider-admin-pro

Overview

Spider Admin Pro

PyPI PyPI - Downloads PyPI - Python Version PyPI - License

Github: https://github.com/mouday/spider-admin-pro

Gitee: https://gitee.com/mouday/spider-admin-pro

Pypi: https://pypi.org/project/spider-admin-pro

简介

Spider Admin Pro 是Spider Admin的升级版

  1. 简化了一些功能;
  2. 优化了前端界面,基于Vue的组件化开发;
  3. 优化了后端接口,对后端项目进行了目录划分;
  4. 整体代码利于升级维护。
  5. 目前仅对Python3进行了支持

安装启动

方式一:

$ pip3 install spider-admin-pro

$ python3 -m spider_admin_pro.run

方式二:

$ git clone https://github.com/mouday/spider-admin-pro.git

$ python3 spider_admin_pro/run.py

配置参数

配置优先级:

yaml配置文件 >  env环境变量 > 默认配置 

1、默认配置

# flask 服务配置
PORT = 5002
HOST = '127.0.0.1'

# 登录账号密码
USERNAME = admin
PASSWORD = "123456"
JWT_KEY = FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0=

# token过期时间,单位天
EXPIRES = 7

# scrapyd地址, 结尾不要加斜杆
SCRAPYD_SERVER = 'http://127.0.0.1:6800'

# 调度器 调度历史存储设置
# mysql or sqlite and other, any database for peewee support
SCHEDULE_HISTORY_DATABASE_URL = 'sqlite:///dbs/schedule_history.db'

# 调度器 定时任务存储地址
JOB_STORES_DATABASE_URL = 'sqlite:///dbs/apscheduler.db'

# 日志文件夹
LOG_DIR = 'logs'

2、env环境变量

在运行目录新建 .env 环境变量文件,默认参数如下

注意:为了与其他环境变量区分,使用SPIDER_ADMIN_PRO_作为变量前缀

如果使用python3 -m 运行,需要将变量加入到环境变量中,运行目录下新建文件env.bash

注意,此时等号后面不可以用空格

# flask 服务配置
export SPIDER_ADMIN_PRO_PORT=5002
export SPIDER_ADMIN_PRO_HOST='127.0.0.1'

# 登录账号密码
export SPIDER_ADMIN_PRO_USERNAME='admin'
export SPIDER_ADMIN_PRO_PASSWORD='123456'
export SPIDER_ADMIN_PRO_JWT_KEY='FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0='

增加环境变量后运行

$ source env.bash

$ python3 -m spider_admin_pro.run

3、config.yml

在运行目录下新建config.yml 文件,运行时会自动读取该配置文件

eg:

# flask 服务配置
PORT: 5002
HOST: '127.0.0.1'

# 登录账号密码
USERNAME: admin
PASSWORD: "123456"
JWT_KEY: "FU0qnuV4t8rr1pvg93NZL3DLn6sHrR1sCQqRzachbo0="

# token过期时间,单位天
EXPIRES: 7

# scrapyd地址, 结尾不要加斜杆
SCRAPYD_SERVER: "http://127.0.0.1:6800"

# 日志文件夹
LOG_DIR: 'logs'

生成jwt key

$ python -c 'import base64;import os;print(base64.b64encode(os.urandom(32)).decode())'

部署优化

1、使用 Gunicorn管理应用

Gunicorn文档:https://docs.gunicorn.org/

# 启动服务
$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app

一个配置示例:gunicorn.conf.py

# -*- coding: utf-8 -*-

"""
$ gunicorn --config gunicorn.conf.py spider_admin_pro.run:app
"""

import multiprocessing
import os

from gevent import monkey

monkey.patch_all()

# 日志文件夹
LOG_DIR = 'logs'

if not os.path.exists(LOG_DIR):
    os.mkdir(LOG_DIR)


def resolve_file(filename):
    return os.path.join(LOG_DIR, filename)


def get_workers():
    return multiprocessing.cpu_count() * 2 + 1


# daemon = True
daemon = False  # 使用supervisor不能是后台进程

# 进程名称
proc_name = "spider-admin-pro"

# 启动端口
bind = "127.0.0.1:5001"

# 日志文件
loglevel = 'debug'
pidfile = resolve_file("gunicorn.pid")
accesslog = resolve_file("access.log")
errorlog = resolve_file("error.log")

# 启动的进程数
# workers = get_workers()
workers = 2
worker_class = 'gevent'


# 启动时钩子
def on_starting(server):
    ip, port = server.address[0]
    print('server.address:', f'http://{ip}:{port}')

2、使用supervisor管理进程

文档:http://www.supervisord.org

spider-admin-pro.ini

[program: spider-admin-pro]
directory=/spider-admin-pro
command=/usr/local/python3/bin/gunicorn --config gunicorn.conf.py spider_admin_pro.run:app

stdout_logfile=logs/out.log
stderr_logfile=logs/err.log

stdout_logfile_maxbytes = 20MB
stdout_logfile_backups = 0
stderr_logfile_maxbytes=10MB
stderr_logfile_backups=0

3、使用Nginx转发请求

server {
    listen 80;

    server_name _;

    access_log  /var/log/nginx/access.log;
    error_log  /var/log/nginx/error.log;

    location / {
        proxy_pass         http://127.0.0.1:5001/;
        proxy_redirect     off;

        proxy_set_header   Host                 $host;
        proxy_set_header   X-Real-IP            $remote_addr;
        proxy_set_header   X-Forwarded-For      $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto    $scheme;
    }
}

使用扩展

收集运行日志:scrapy-util 可以帮助你收集到程序运行的统计数据

技术栈:

1、前端技术:

功能 第三方库及文档
基本框架 vue
仪表盘图表 echarts
网络请求 axios
界面样式 Element-UI

2、后端技术

功能 第三方库及文档
接口服务 Flask
任务调度 apscheduler
scrapyd接口 scrapyd-api
网络请求 session-request
ORM peewee
jwt jwt
系统信息 psutil

项目结构

【公开仓库】基于Flask的后端项目spider-admin-pro: https://github.com/mouday/spider-admin-pro

【私有仓库】基于Vue的前端项目spider-admin-pro-web: https://github.com/mouday/spider-admin-pro-web

spider-admin-pro项目主要目录结构:

.
├── run.py        # 程序入口
├── api           # Controller层
├── service       # Sevice层
├── model         # Model层
├── exceptions    # 异常 
├── utils         # 工具类
└── web           # 静态web页

经验总结

Scrapyd 不能直接暴露在外网

  1. 其他人通过deploy部署可以将代码部署到你的机器上,如果是root用户运行,还会在你机器上做其他的事情
  2. 还有运行日志中会出现配置文件中的信息,存在信息泄露的危险

TODO

1. 补全开发文档

2. 支持命令行安装可用

3. 优化代码布局,提取公共库

4. 日志自动刷新

5. scrapy项目数据收集

项目截图

You might also like...
Modern theme for Django admin interface
Modern theme for Django admin interface

Django Suit Modern theme for Django admin interface. Django Suit is alternative theme/skin/extension for Django administration interface. Project home

Django application and library for importing and exporting data with admin integration.
Django application and library for importing and exporting data with admin integration.

django-import-export django-import-export is a Django application and library for importing and exporting data with included admin integration. Featur

:honey_pot: A fake Django admin login screen page.

django-admin-honeypot django-admin-honeypot is a fake Django admin login screen to log and notify admins of attempted unauthorized access. This app wa

"Log in as user" for the Django admin.

django-loginas About "Login as user" for the Django admin. loginas supports Python 3 only, as of version 0.4. If you're on 2, use 0.3.6. Installing dj

Visually distinguish environments in Django Admin
Visually distinguish environments in Django Admin

django-admin-env-notice Visually distinguish environments in Django Admin. Based on great advice from post: 5 ways to make Django Admin safer by hakib

A new style for Django admin
A new style for Django admin

Djamin Djamin a new and clean styles for Django admin based in Google projects styles. Quick start Install djamin: pip install -e git://github.com/her

Responsive Theme for Django Admin With Sidebar Menu
Responsive Theme for Django Admin With Sidebar Menu

Responsive Django Admin If you're looking for a version compatible with Django 1.8 just install 0.3.7.1. Features Responsive Sidebar Menu Easy install

A Django admin theme using Twitter Bootstrap. It doesn't need any kind of modification on your side, just add it to the installed apps.
A Django admin theme using Twitter Bootstrap. It doesn't need any kind of modification on your side, just add it to the installed apps.

django-admin-bootstrapped A Django admin theme using Bootstrap. It doesn't need any kind of modification on your side, just add it to the installed ap

Collection of admin fields and decorators to help to create computed or custom fields more friendly and easy way
Collection of admin fields and decorators to help to create computed or custom fields more friendly and easy way

django-admin-easy Collection of admin fields, decorators and mixin to help to create computed or custom fields more friendly and easy way Installation

Comments
  • Web Interface Source Code

    Web Interface Source Code

    I'm trying to contribute English translation, however, the source code for the web interface isn't publicly available.

    The documentation points me to https://github.com/mouday/spider-admin-pro-web, though I get 404.

    Please can you provide the Vue code to facilitate this?

    opened by karlbaillie 4
  • StatsCollectionModel的日志在哪儿插入的?

    StatsCollectionModel的日志在哪儿插入的?

    只看到写到了api,没看到在哪儿插入日志,我看这个表是用来存结果的,应该是运行结束完回调?也没找到回调的地方。

    @stats_collection_api.post('/addItem')
    def add_item():
        pprint(request.json)
    
        spider_job_id = request.json['job_id']
        project = request.json['project']
        spider = request.json['spider']
        item_scraped_count = request.json['item_scraped_count']
        item_dropped_count = request.json['item_dropped_count']
        start_time = request.json['start_time']
        finish_time = request.json['finish_time']
        duration = request.json['duration']
        finish_reason = request.json['finish_reason']
        log_error_count = request.json['log_error_count']
    
        StatsCollectionModel.create(
            spider_job_id=spider_job_id,
            project=project,
            spider=spider,
            item_scraped_count=item_scraped_count,
            item_dropped_count=item_dropped_count,
            start_time=start_time,
            finish_time=finish_time,
            finish_reason=finish_reason,
            log_error_count=log_error_count,
            duration=duration
        )
    
    
    opened by grayguest 1
  • 上传egg文件报了这个错

    上传egg文件报了这个错

    HTTPConnectionPool(host='127.0.0.1', port=6800): Max retries exceeded with url: /addversion.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000019C73F56DD8>: Failed to establish a new connection: [WinError 10061] 由于目标计算机积极拒绝,无法连接。'))

    opened by zengaorong 0
  • 运行统计

    运行统计"/stats/list" no data collected on windows

    ScrapydUtil.parse_log_file raised err:

    venv\lib\site-packages\scrapy_util\utils.py", line 15, in parse_log_file
        _, project, spider, filename = log_file.split('/')
    ValueError: not enough values to unpack (expected 4, got 1)
    

    when you run on windows then change line 15 : "log_file.split('/')" to log_file.split(os.sep) , and "import os" on the top.

    opened by pinkonio 1
Releases(2.0.1)
  • 2.0.1(Nov 24, 2022)

    • 优化前端界面在windows平台显示异常的问题
    • 修复前端调度日志 列表显示异常的问题
    • 优化定时任务添加,自动选中项目和爬虫
    Source code(tar.gz)
    Source code(zip)
  • 2.0.0(Nov 19, 2022)

    • 升级依赖 requirements.txt, Flask 1.0.3 升级为 2.2.2
    • 优化启动方式
    • 优化启动配置,移除PORTHOST 配置项
    • 移除.env环境变量配置,简化配置流程
    • 移除Flask配置读取,推荐使用gunicorn启动服务
    Source code(tar.gz)
    Source code(zip)
Owner
mouday
努力,学习,奋斗
mouday
PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

80 Dec 13, 2022
Sandwich Batch Normalization

Sandwich Batch Normalization Code for Sandwich Batch Normalization. Introduction We present Sandwich Batch Normalization (SaBN), an extremely easy imp

VITA 48 Dec 15, 2022
Jinja is a fast, expressive, extensible templating engine.

Jinja is a fast, expressive, extensible templating engine. Special placeholders in the template allow writing code similar to Python syntax.

The Pallets Projects 9k Jan 04, 2023
Django Smuggler is a pluggable application for Django Web Framework that helps you to import/export fixtures via the automatically-generated administration interface.

Django Smuggler Django Smuggler is a pluggable application for Django Web Framework to easily dump/load fixtures via the automatically-generated admin

semente 373 Dec 26, 2022
Ajenti Core and stock plugins

Ajenti is a Linux & BSD modular server admin panel. Ajenti 2 provides a new interface and a better architecture, developed with Python3 and AngularJS.

Ajenti Project 7k Jan 07, 2023
GFPGAN is a blind face restoration algorithm towards real-world face images.

GFPGAN is a blind face restoration algorithm towards real-world face images.

Applied Research Center (ARC), Tencent PCG 25.6k Jan 04, 2023
Drop-in replacement of Django admin comes with lots of goodies, fully extensible with plugin support, pretty UI based on Twitter Bootstrap.

Xadmin Drop-in replacement of Django admin comes with lots of goodies, fully extensible with plugin support, pretty UI based on Twitter Bootstrap. Liv

差沙 4.7k Dec 31, 2022
A new style for Django admin

Djamin Djamin a new and clean styles for Django admin based in Google projects styles. Quick start Install djamin: pip install -e git://github.com/her

Herson Leite 236 Dec 15, 2022
AaPanel - Simple but Powerful web-based Control Panel

Introduction: aaPanel is the International version for BAOTA panel(www.bt.cn) There have millions servers had installed BAOTA panel since 2014 in Chin

bt.cn 1.4k Jan 09, 2023
A modern Python package manager with PEP 582 support.

A modern Python package manager with PEP 582 support.

Python Development Master(PDM) 3.6k Jan 05, 2023
Jazzy theme for Django

Django jazzmin (Jazzy Admin) Drop-in theme for django admin, that utilises AdminLTE 3 & Bootstrap 4 to make yo' admin look jazzy Installation pip inst

David Farrington 1.2k Jan 08, 2023
There is a new admin bot by @sinan-m-116 .

find me on telegram! deploy me on heroku, use below button: If you can't have a config.py file (EG on heroku), it is also possible to use environment

Sinzz-sinan-m 0 Nov 09, 2021
Jet Bridge (Universal) for Jet Admin – API-based Admin Panel Framework for your application

Jet Bridge for Jet Admin – Admin panel framework for your application Description About Jet Admin: https://about.jetadmin.io Live Demo: https://app.je

Jet Admin 1.3k Dec 27, 2022
Allow foreign key attributes in list_display with '__'

django-related-admin Allow foreign key attributes in Django admin change list list_display with '__' This is based on DjangoSnippet 2996 which was mad

Petr Dlouhý 62 Nov 18, 2022
With Django Hijack, admins can log in and work on behalf of other users without having to know their credentials.

Django Hijack With Django Hijack, admins can log in and work on behalf of other users without having to know their credentials. Docs See http://django

1.2k Jan 05, 2023
Lazymux is a tool installer that is specially made for termux user which provides a lot of tool mainly used tools in termux and its easy to use

Lazymux is a tool installer that is specially made for termux user which provides a lot of tool mainly used tools in termux and its easy to use, Lazymux install any of the given tools provided by it

DedSecTL 1.8k Jan 09, 2023
WordPress look and feel for Django administration panel

Django WP Admin WordPress look and feel for Django administration panel. Features WordPress look and feel New styles for selector, calendar and timepi

Maciej Marczewski 266 Nov 21, 2022
Django Semantic UI admin theme

Django Semantic UI admin theme A completely free (MIT) Semantic UI admin theme for Django. Actually, this is my 3rd admin theme for Django. The first

Alex 69 Dec 28, 2022
Python books free to read online or download

Python books free to read online or download

Paolo Amoroso 3.7k Jan 08, 2023
Firebase Admin Console is a centralized platform for easy viewing and maintenance of Firestore database, the back-end API is a Python Flask app.

Firebase Admin Console is a centralized platform for easy viewing and maintenance of Firestore database, the back-end API is a Python Flask app. A starting template for developers to customize, build

Daqi Chen 1 Sep 10, 2022