The code for 2021 MGTV AI Challenge Anti Stealing Link, and the online result ranks 10th.

Overview

赛题介绍

芒果TV-第二届“马栏山杯”国际音视频算法大赛-防盗链 随着业务的发展,芒果的视频内容也深受网友的喜欢,不少视频网站和应用开始盗播芒果的视频内容,盗链网站不经过芒果TV的前端系统,跳过广告播放,且消耗大量的服务器、带宽资源,直接给公司带来了巨大的经济损失,因此防盗链在日常运营中显得尤为重要。如何从海量日志中识别出盗链行为,并进行有效拦截,也是防盗链工作的核心。 比赛网址:https://challenge.ai.mgtv.com/contest/detail/7

 

模型介绍

  • 对原始特征进行LabelEncoder编码保存为feather文件
  • 构建 group_feature、count_feature、count_time_feature、cumcount_time_feature、cumcount_feature、cumcountratio_feature、diff_feature、category_encoders_feature,对每类特征加原始特征进行LightGBM建模得到特征重要性
  • 汇总各类特征重要性后分别取前50、100、150、200、250个特征进行LightGBM建模,提交200个特征结果B榜得分0.5174

 

代码

python preprocess.py # 原始数据文件放入/data文件夹中,处理原始数据

python feature_engineering.py # 特征工程,单个特征保存为feather文件

python main.py # 模型训练,按日期排序,最后两天数据作为线下验证集数据
Owner
tongji40
tongji40
IDA Pro plugin that shows the comments in a database

ShowComments A Simple IDA Pro plugin that shows the comments in a database Installation Copy the file showcomments.py to the plugins folder under IDA

Fernando Mercês 32 Dec 10, 2022
Companion Web site for Fluent Python, Second Edition

Fluent Python, the site Source code and content for fluentpython.com. The site complements Fluent Python, Second Edition with extra content that did n

Fluent Python 49 Dec 08, 2022
Pdraw - Generate Deterministic, Procedural Artwork from Arbitrary Text

pdraw.py: Generate Deterministic, Procedural Artwork from Arbitrary Text pdraw a

Brian Schrader 2 Sep 12, 2022
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Jon Crall 638 Dec 13, 2022
Grail(TM) is a web browser written in Python

Grail is distributed in source form. It requires that you have a Python interpreter and a Tcl/Tk installation, with the Python interpreter configured for Tcl/Tk support.

22 Oct 18, 2022
Parser for air tickets' price

Air-ticket-price-parser Parser for air tickets' price How to Install Firefox If geckodriver.exe is not compatible with your Firefox version, download

Situ Xuannn 1 Dec 13, 2021
Security-related flags and options for C compilers

Getting the maximum of your C compiler, for security

135 Nov 11, 2022
FollowSpot is a comprehensive audition tracking fullstack web application for entertainment industry professionals.

FollowSpot is a comprehensive audition tracking fullstack web application for entertainment industry professionals. This app allows users to store information/media for all of their auditions while a

Jen Brissman 9 Jul 12, 2022
A powerful and user-friendly binary analysis platform!

angr angr is a platform-agnostic binary analysis framework. It is brought to you by the Computer Security Lab at UC Santa Barbara, SEFCOM at Arizona S

6.3k Jan 02, 2023
ClamNotif: A tool to send you ClamAV notifications

A tool to forward notifications to different recipients categorised by two severity levels of the regular health reports produced by `clamscan` bundled with the ClamAV antivirus engine.

PiSoft Company Ltd. 1 Nov 15, 2021
An Airflow operator to call the main function from the dbt-core Python package

airflow-dbt-python An Airflow operator to call the main function from the dbt-core Python package Motivation Airflow running in a managed environment

Tomás Farías Santana 93 Jan 08, 2023
A collection of python exercises to help your learning path!

How to use Step 1: run this command git clone https://github.com/TechPenguineer/Python-Exercises.git Step 2: Run this command cd Python-Exercises You

Tech Penguin 5 Aug 05, 2021
Taxonomy addition for complete trees

TACT: Taxonomic Addition for Complete Trees TACT is a Python app for stochastic polytomy resolution. It uses birth-death-sampling estimators across an

Jonathan Chang 3 Jun 07, 2022
Automate your Microsoft Learn Student Ambassadors event certificate with Python

Microsoft Learn Student Ambassador Certificate Automation This repo simply use a template certificate docx file and generates certificates both docx a

Muhammed Oğuz 24 Aug 24, 2022
An universal linux port of deezer, supporting both Flatpak and AppImage

Deezer for linux This repo is an UNOFFICIAL linux port of the official windows-only Deezer app. Being based on the windows app, it allows downloading

Aurélien Hamy 154 Jan 06, 2023
Very Simple 2 Message Spammer!

Very Simple 2 Message Spammer!

Syntax. 4 Dec 06, 2022
Bionic is Python Framework for crafting beautiful, fast user experiences for web and is free and open source.

Bionic is Python Framework for crafting beautiful, fast user experiences for web and is free and open source. Getting Started This is an example of ho

14 Apr 10, 2022
Virtual webcam that takes real webcam footage and replaces the background in order to have Virtual Backgrounds in MS Teams for Linux where the feature is unimplemented.

Background Remover The Need It's been good long while since Microsoft first released a Teams version for Linux and yet, one of Teams' coolest features

Dylan Turner 80 Dec 20, 2022
The code for 2021 MGTV AI Challenge Anti Stealing Link, and the online result ranks 10th.

赛题介绍 芒果TV-第二届“马栏山杯”国际音视频算法大赛-防盗链 随着业务的发展,芒果的视频内容也深受网友的喜欢,不少视频网站和应用开始盗播芒果的视频内容,盗链网站不经过芒果TV的前端系统,跳过广告播放,且消耗大量的服务器、带宽资源,直接给公司带来了巨大的经济损失,因此防盗链在日常运营中显得尤为重要

tongji40 16 Jun 17, 2022
Python bindings for Basler's VisualApplets TCL script generation

About visualapplets.py The Basler AG company provides a TCL scripting engine to automatize the creation of VisualApplets designs (a former Silicon Sof

Jürgen Hock 2 Dec 07, 2022