List of content farm sites like g.penzai.com.

Last update: Jan 03, 2023

Overview

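Content Farm Site List

Google's Chinese-language search results include a sizable share of content-farm entries, such as「小 X 知识网」and「小 X 百科网」. These links usually 302-redirect to the farm's main site. The pages are auto-generated, stuffed with keywords, and mixed with scraped content, so they are unreadable and have no reference value.

Worse, a single site of this kind may have thousands or even tens of thousands of alias domains indexed by Google, which seriously degrades the search experience. See the community feedback from early October 2021 for details.

Matching titles with regular expressions cannot block these results completely, so this list was compiled to make it easier for everyone to filter their search results.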
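As a rough illustration of why title matching leaves gaps, the sketch below compares a title regex with a domain-suffix check. The pattern, the sample titles, and the `farm.example` domain are all made up for this example and are not entries from the list.

```python
import re
from urllib.parse import urlsplit

# Hypothetical title pattern for site names shaped like "小 X 知识网" / "小 X 百科网".
TITLE_PATTERN = re.compile(r"小\s*\S{1,2}\s*(知识|百科)网")

# Hypothetical blocked domain (illustration only, not taken from the list).
BLOCKED_DOMAINS = {"farm.example"}


def blocked_by_title(title: str) -> bool:
    """Return True if the result title matches the title regex."""
    return TITLE_PATTERN.search(title) is not None


def blocked_by_domain(url: str) -> bool:
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


# Two made-up search results served from the same farm.
results = [
    ("如何煮鸡蛋 - 小明知识网", "https://a1.farm.example/q/1"),
    ("如何煮鸡蛋，详细步骤", "https://b2.farm.example/q/2"),
]

for title, url in results:
    print(blocked_by_title(title), blocked_by_domain(url))
```

The second result carries no site name in its title, so the title rule misses it, while the domain rule catches both.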

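Because「小搭百科网」, the site at the center of this incident, shut itself down after the impact it caused, this list will go on to track and include other similar content farm sites.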

Usage

uBlacklist

Install uBlacklist:

Chrome Web Store / Firefox Add-ons / App Store (for macOS and iOS)

Then open the Options menu, click Add a subscription, and enter the following:

Name: content-farm-list
URL: https://raw.githubusercontent.com/wdmpa/content-farm-list/main/uBlacklist.txt

or

Name: content-farm-list
URL: https://wdmpa.org/content-farm-list/uBlacklist.txt

Click the 'Add' button.
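Before subscribing, it can help to see which hosts a ruleset would block. The sketch below assumes the ruleset is plain text with one rule per line in match-pattern form such as `*://*.example.com/*`, and that lines starting with `#` are comments; the actual file contents are not shown here, so both points are assumptions. The sample text and the function name are hypothetical.

```python
import re

# Matches rules shaped like "*://*.example.com/*" or "*://example.com/*"
# and captures the host part (assumed rule format).
MATCH_PATTERN = re.compile(r"^(?:\*|https?)://(?:\*\.)?([^/*]+)/\*$")


def hosts_from_ruleset(text: str) -> list[str]:
    """Extract host names from match-pattern rules, skipping other lines."""
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or assumed comment line
        match = MATCH_PATTERN.match(line)
        if match:
            hosts.append(match.group(1).lower())
    return hosts


# Hypothetical sample, not copied from the real subscription file.
sample = """\
# sample rules
*://*.farm.example/*
*://mirror.farm.example/*
title/not a match pattern/
"""

print(hosts_from_ruleset(sample))
```

To inspect the real file, the text could be downloaded from one of the subscription URLs above and passed to the same function.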

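Subscription files

File	Description
`uBlacklist.txt`	Combined uBlacklist ruleset
`Surge.txt`	Combined Surge ruleset
`uBlacklist/spam/g.penzai.com.txt`	小搭百科网 domains, for uBlacklist only
`Surge/spam/g.penzai.com.txt`	小搭百科网 domains, for Surge only
`uBlacklist/machine-translated/stackoverflow.txt`	Machine-translated StackOverflow domains, for uBlacklist only
`Surge/machine-translated/stackoverflow.txt`	Machine-translated StackOverflow domains, for Surge only

Search engine settings

Results that match a domain in the list are removed, which can leave too few entries on each results page to browse comfortably. We recommend signing in and setting the search engine to show 100 results per page.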

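What can we do?

1. Submit a PR to add domains

Export the domain list from your local uBlacklist extension
Try long-tail keywords in a search engine to discover more farm domains that currently rank low

Add new category files under the domains directory, following its existing structure. Match the format of the entries already in each file; new entries can go anywhere in the file. (Either fork this repository, edit, and push, or edit directly on the web page.)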
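When collecting candidates from an export or from search results, a small helper can reduce them to bare host names and drop ones already listed. This is a minimal sketch that assumes the domain files hold one bare domain per line, which is not confirmed here; the function names and sample values are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_domain(entry: str) -> str:
    """Reduce a URL or host string to a lowercase host name ("" if none found)."""
    entry = entry.strip().lower()
    if "://" not in entry:
        entry = "//" + entry  # let urlsplit treat a bare host as the network location
    host = urlsplit(entry).hostname or ""
    return host.rstrip(".")


def new_domains(existing_lines: list[str], candidates: list[str]) -> list[str]:
    """Return candidate domains not already listed, in first-seen order."""
    known = {normalize_domain(line) for line in existing_lines if line.strip()}
    fresh = []
    for candidate in candidates:
        domain = normalize_domain(candidate)
        if domain and domain not in known:
            known.add(domain)
            fresh.append(domain)
    return fresh


# Made-up example values.
existing = ["farm.example"]
found = ["https://Farm.Example/a", "b.farm.example", "b.farm.example/"]
print(new_domains(existing, found))
```

Only the entries returned by `new_domains` would then need to be added to the category file.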

File	Description
`domains/spam/g.penzai.com.txt`	小搭百科网 domains
`domains/machine-translated/stackoverflow.txt`	Machine-translated StackOverflow domains

After you submit, a script automatically updates the subscription files.
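The repository's actual update script is not shown here. As a hedged sketch of what such a step could look like, the code below turns a list of bare domains into uBlacklist match patterns and Surge `DOMAIN-SUFFIX` rules. The input format (one domain per line), both output formats, and every function name are assumptions made for illustration.

```python
def read_domains(text: str) -> list[str]:
    """Parse one domain per line, ignoring blanks and duplicates (assumed input format)."""
    seen = set()
    domains = []
    for raw in text.splitlines():
        domain = raw.strip().lower()
        if domain and domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


def to_ublacklist(domains: list[str]) -> list[str]:
    """Build match patterns covering each domain and its subdomains (assumed output)."""
    return [f"*://*.{domain}/*" for domain in domains]


def to_surge(domains: list[str]) -> list[str]:
    """Build Surge DOMAIN-SUFFIX rule lines (assumed output)."""
    return [f"DOMAIN-SUFFIX,{domain}" for domain in domains]


# Made-up domain file content.
source = "farm.example\n\nmirror.example\nfarm.example\n"
domains = read_domains(source)
print("\n".join(to_ublacklist(domains)))
print("\n".join(to_surge(domains)))
```

Keeping the domain files as the single source and deriving both rulesets from them means contributors only ever edit one place.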

2. Report abuse

Report the abuse to the cloud service providers these sites use.


Owner

WDMPA
