Generate a repository with mirror links for DriveDroid app

Last update: Nov 19, 2022

Overview

DriveDroid Repository Generator

Generate a repository for DriveDroid, an app that lets you boot a PC from ISO files stored on your Android phone.

See also the official scraper written in JavaScript.

Try Already Built Repo

Add one of the following links to the image repositories in the DriveDroid app:

https://dd.hexed.pw

https://raw.githubusercontent.com/flameshikari/ddrg/master/repo/repo.json

Requirements
Usage
How to Make a Scraper
Misc
Roadmap
Credits
License

Requirements

Python 3.6+ with the packages listed in requirements.txt.

I recommend creating a venv and installing the packages there.

Usage

python ./src/main.py [-i dir] [-o dir] [-g]

-i dir where dir is a directory with distro scrapers (./src/distros is default).

-o dir where dir is a directory where the built repo will be saved (./build is default).

-g will generate a webpage to present the content of repo.json.

-h prints the help message.
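The options above map naturally onto argparse. The following is a minimal sketch of such a parser, not the project's actual main.py: the function name build_parser and the dest names are assumptions made for illustration, and only the flags and defaults come from the list above.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags; the real main.py may differ.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-i", dest="input_dir", metavar="dir",
                        default="./src/distros",
                        help="directory with distro scrapers")
    parser.add_argument("-o", dest="output_dir", metavar="dir",
                        default="./build",
                        help="directory where the built repo is saved")
    parser.add_argument("-g", dest="generate_page", action="store_true",
                        help="generate a webpage presenting repo.json")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-o", "./out", "-g"])
print(args.input_dir, args.output_dir, args.generate_page)
```

argparse adds the -h option automatically, which matches the behavior described above.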

How to Make a Scraper

Create a folder in ./src/distros with the following structure:

distro_name
├── info.toml
├── logo.png
└── scraper.py

If distro_name starts with an underscore (e.g. _disabled), the folder is skipped.
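The underscore rule can be pictured with a short sketch. This is an illustration under assumptions, not the generator's real discovery code: find_distro_dirs is a hypothetical name, and sorting the result is a choice made here for predictable output.

```python
from pathlib import Path


def find_distro_dirs(distros_root):
    """Return distro folders under distros_root, skipping names that start with '_'."""
    root = Path(distros_root)
    return sorted(
        entry for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith("_")
    )
```

Renaming a folder from arch to _arch is then enough to disable a scraper without deleting it.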

Let's take a look at each file.

`info.toml`

info.toml contains the distro name and a link to its official website. Arch Linux info.toml example:

name = "Arch Linux" # name of distro
url  = "https://example.com" # official site

If info.toml is missing or a value isn't provided, fallback values are used. For Arch Linux the fallback values would be:

name = "arch" # distro folder name as value, also used in url
url  = "https://distrowatch.com/table.php?distribution=arch"
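The fallback rule can be sketched as a small function. This is only an illustration: resolve_info is a hypothetical name, the parsed contents of info.toml are assumed to arrive as a plain dict (TOML parsing is left out), and treating an empty string as "not provided" is also an assumption.

```python
def resolve_info(folder_name, info=None):
    """Fill in a missing or empty name/url with the documented fallback values."""
    info = info or {}
    fallback_url = f"https://distrowatch.com/table.php?distribution={folder_name}"
    return {
        # The distro folder name doubles as the fallback display name.
        "name": info.get("name") or folder_name,
        "url": info.get("url") or fallback_url,
    }


print(resolve_info("arch"))
print(resolve_info("arch", {"name": "Arch Linux"}))
```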

`logo.png`

Should be 128x128 px with a transparent background.

If logo.png is missing, a fallback logo is used.

`scraper.py`

A scraper can be written however you like, as long as it returns the expected values.

It must return a list of tuples; each tuple contains iso_url, iso_arch, iso_size and iso_version, in that order.

The Arch Linux scraper returns the following values:

[
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.05.01/archlinux-2021.05.01-x86_64.iso',
    'x86_64',
    792014848,
    '2021.05.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.06.01/archlinux-2021.06.01-x86_64.iso',
    'x86_64',
    811937792,
    '2021.06.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.07.01/archlinux-2021.07.01-x86_64.iso',
    'x86_64',
    817180672,
    '2021.07.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot-network.iso',
    'x86_64',
    516947968,
    '2020.07'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot.iso',
    'x86_64',
    1280491520,
    '2020.07'
  )
]
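While developing a scraper it can help to check the returned list against this shape before handing it to the generator. The check below is a sketch, not part of the project: validate_entries is a hypothetical helper, and the expected types (strings for URL, architecture and version, an integer for size) are inferred from the example output above.

```python
def validate_entries(entries):
    """Raise ValueError unless every entry is (iso_url, iso_arch, iso_size, iso_version)."""
    for index, entry in enumerate(entries):
        if not isinstance(entry, tuple) or len(entry) != 4:
            raise ValueError(f"entry {index}: expected a 4-item tuple")
        iso_url, iso_arch, iso_size, iso_version = entry
        if not (isinstance(iso_url, str)
                and iso_url.startswith(("http://", "https://"))):
            raise ValueError(f"entry {index}: iso_url must be an http(s) URL")
        if not isinstance(iso_arch, str) or not isinstance(iso_version, str):
            raise ValueError(f"entry {index}: iso_arch and iso_version must be strings")
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(iso_size, bool) or not isinstance(iso_size, int) or iso_size < 0:
            raise ValueError(f"entry {index}: iso_size must be a non-negative integer")
    return entries
```

Calling validate_entries(init()) at the end of a local test run surfaces a misplaced or mistyped field early.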

A scraper starts with from public import *, which brings the following names into its namespace:

bs (short for BeautifulSoup)
json
re
requests

It also provides these helper functions:

get_afh_url(iso_url): returns a download link for a file hosted on AndroidFileHost; iso_url must look like https://androidfilehost.com/?fid=8889791610682936459
get_iso_arch(iso_url): returns the processor architecture of iso_url
get_iso_size(iso_url): returns the file size of iso_url in bytes
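The implementation of these helpers is not shown here. To give an idea of what an architecture lookup could look like, here is a sketch that guesses the architecture from the file name. Everything in it is an assumption: the name guess_iso_arch, the pattern table, and the x86_64 default are illustrative and may not match what get_iso_arch actually does.

```python
import re

# Hypothetical lookup table: file-name patterns mapped to a normalized architecture.
# Order matters: the 64-bit patterns are tried before the generic "x86".
ARCH_PATTERNS = [
    (r"x86[_-]64|amd64|x64", "x86_64"),
    (r"aarch64|arm64", "aarch64"),
    (r"i[3-6]86|x86", "x86"),
]


def guess_iso_arch(iso_url, default="x86_64"):
    """Guess the architecture from the last path segment of iso_url."""
    filename = iso_url.rsplit("/", 1)[-1].lower()
    for pattern, arch in ARCH_PATTERNS:
        if re.search(pattern, filename):
            return arch
    # Assumed default for file names that carry no architecture hint.
    return default
```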

Arch Linux scraper.py example:

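# public provides bs, re, requests, get_iso_arch and get_iso_size.
from public import *  # noqa


def init():
    """Return a list of (iso_url, iso_arch, iso_size, iso_version) tuples."""

    array = []
    base_urls = [
        "https://mirror.yandex.ru/archlinux/iso/latest",
        "https://mirror.yandex.ru/archlinux/iso/archboot/latest"
    ]

    for base_url in base_urls:

        # Parse the mirror's directory listing.
        html = bs(requests.get(base_url).text, "html.parser")

        # Every link whose href ends in .iso is treated as an image.
        for link in html.find_all("a", {"href": re.compile(r"^.*\.iso$")}):

            iso_url = f"{base_url}/{link['href']}"
            iso_arch = get_iso_arch(iso_url)
            iso_size = get_iso_size(iso_url)
            # The version is the first dotted number after a hyphen, e.g. 2021.05.01.
            iso_version = re.search(r"-(\d+\.\d+(\.\d+)?)", iso_url).group(1)

            array.append((iso_url, iso_arch, iso_size, iso_version))

    return array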
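The version pattern is the part of a scraper most likely to need tuning per distro, and it can be tried offline against sample URLs. The sketch below uses a pattern like the one in the scraper, with the dots escaped so they match literal dots only. extract_version is a hypothetical helper, and returning None on a miss (instead of raising) is a choice made for this illustration.

```python
import re

# First dotted number that follows a hyphen, with an optional third component.
VERSION_RE = re.compile(r"-(\d+\.\d+(\.\d+)?)")


def extract_version(iso_url):
    """Return the version string found in iso_url, or None if there is none."""
    match = VERSION_RE.search(iso_url)
    return match.group(1) if match else None


samples = [
    "https://mirror.yandex.ru/archlinux/iso/2021.07.01/archlinux-2021.07.01-x86_64.iso",
    "https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot.iso",
]
for url in samples:
    print(extract_version(url))
```

In the scraper itself, re.search(...).group(1) raises AttributeError when nothing matches, so checking the pattern against a few real file names first avoids a failed run.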

Misc

Here's an nginx snippet for the case where you self-host the repository together with the website and want DriveDroid to reach repo.json using only the hostname. It serves /repo.json for requests to / whose User-Agent contains okhttp. Place it in the server section of your config:

location = / {
  if ($http_user_agent ~* 'okhttp') {
    rewrite ^/(.*)$ /repo.json break;
  }
}

Roadmap

Option to generate a webpage
Add a mechanism to retry scraping if a network error occurs
Option to select mirrors (currently most mirrors are based in Russia)
Possibly package the project
Improve the code quality

Credits

afh-dl by kade-robertson
Yandex.Disk direct links by DokPub

License

MIT License

Owner

Evgeny
