Web Downloader With Python

Overview

Web Downloader

Introduction

This module provides an API to download the components of a web page (the HTML file, image files, CSS files, JavaScript files, and files linked by href) based on an input URL. The URL must start with 'http' or 'https'.

To process multiple URLs at the same time, list all the URLs you want to download in the file "urllist.txt" as shown below:

# Add the URL you want to download line by line(The url must start with 'http' or 'https' ):
# example: https://www.google.com
https://www.google.com
https://www.carousell.sg/
https://www.google.com/search?q=github&sxsrf=AOaemvJh3t5_h8H85AE8Ajbb1IMnBrRISA%3A1636698503535&source=hp&ei=hwmOYY6mHdGkqtsPq8S9sAY&iflsig=ALs-wAMAAAAAYY4Xl7GLWS16_xc2Q9XrG0p3q277DpkL&oq=&gs_lcp=Cgdnd3Mtd2l6EAEYADIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzINCC4QxwEQowIQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJ1AAWABgjgdoAXAAeACAAQCIAQCSAQCYAQCwAQo&sclient=gws-wiz
https://stackoverflow.com/questions/66022042/how-to-let-kubernetes-pod-run-a-local-script/66025424
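
A file in this format could be read as in the sketch below. This is only an illustration, not code taken from webDownload.py: the function name parse_url_list is hypothetical, and the check is slightly stricter than the rule above because it requires the full 'http://' or 'https://' prefix.

```python
from typing import Iterable, List


def parse_url_list(lines: Iterable[str]) -> List[str]:
    """Return the usable URLs from the lines of a urllist.txt-style file."""
    urls = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith('#'):
            continue
        # Keep only lines that start with 'http://' or 'https://'.
        if line.startswith(('http://', 'https://')):
            urls.append(line)
    return urls


# Small in-memory sample so the sketch runs without a file on disk.
sample = [
    "# Add the URL you want to download line by line",
    "https://www.google.com",
    "",
    "www.example.com",
    "https://www.carousell.sg/",
]
print(parse_url_list(sample))
```

To read the real file, pass an open file object instead of the sample list, for example: with open('urllist.txt') as f: urls = parse_url_list(f)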

Program Setup

Development Environment : python 3.7.4
Additional Lib/Software Needed
  1. beautifulsoup4 4.10.0

    install:

    pip install beautifulsoup4
    

    Lib link: https://pypi.org/project/beautifulsoup4/

Hardware Needed : None
Program File List

version: v0.1

Program File | Execution Env | Description
webDownload.py | python 3 | Main executable program that uses the API.
urllist.txt | | URL record list.

Program Usage

Module API Usage
  1. Downloader init:
soup = urlDownloader(imgFlg=True, linkFlg=True, scriptFlg=True)
  • imgFlg: Set to "True" to download all the files referenced by "<img>" tags.
  • linkFlg: Set to "True" to download all the HTML sections, images, icons, and CSS files imported by "<link>" tags.
  • scriptFlg: Set to "True" to download all the JS files.
  2. Call the API method savePage to scrape the URL and save the data in a folder:

    soup.savePage('<url>', '<folder name>')

    # Example:
    soup.savePage('https://www.google.com', 'www_google_com')
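
The three init flags each switch one kind of tag on or off. The sketch below shows how the resource URLs for each kind might be collected from a page. It is an assumption-laden illustration, not the logic of webDownload.py: it uses the standard-library html.parser in place of beautifulsoup4 so it runs with no extra install, and the class name ResourceCollector and its attributes are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect resource URLs referenced by <img>, <link> and <script> tags."""

    # Attribute that carries the resource URL for each tag of interest.
    URL_ATTR = {'img': 'src', 'link': 'href', 'script': 'src'}

    def __init__(self, base_url, imgFlg=True, linkFlg=True, scriptFlg=True):
        super().__init__()
        self.base_url = base_url
        self.enabled = {'img': imgFlg, 'link': linkFlg, 'script': scriptFlg}
        self.found = {'img': [], 'link': [], 'script': []}

    def handle_starttag(self, tag, attrs):
        # Ignore tags that are not tracked or whose flag is off.
        if not self.enabled.get(tag):
            return
        value = dict(attrs).get(self.URL_ATTR[tag])
        if value:
            # Resolve relative paths against the page URL.
            self.found[tag].append(urljoin(self.base_url, value))


page = '''<html><head>
<link rel="stylesheet" href="/static/main.css">
<script src="app.js"></script>
</head><body><img src="https://cdn.example.com/logo.png"></body></html>'''

# scriptFlg=False: script URLs are skipped, the other two kinds are collected.
collector = ResourceCollector('https://www.example.com/index.html', scriptFlg=False)
collector.feed(page)
print(collector.found)
```

Each collected URL would then be fetched and written into the matching subfolder of the output folder.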
    
          
         
Program Execution
  1. Copy the URLs you want to download into the URL record file "urllist.txt".

  2. cd to the program folder and run the program:

    python webDownload.py
    
  3. Check the result:

    For example, if "https://www.carousell.sg/" is the first URL in "urllist.txt", all the HTML files, image files and JS files will be saved under the folder "1_www.carousell.sg_files":

    • The main web page will be saved as: "1_www.carousell.sg_files/1_www.carousell.sg.html"
    • The images used in the page will be saved in folder: "1_www.carousell.sg_files/img"
    • The HTML/image/CSS files imported by href will be saved in folder: "1_www.carousell.sg_files/link"
    • The JS files used by the page will be saved in folder: "1_www.carousell.sg_files/script"
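
The layout above can be sketched as a small helper. This is an illustration under assumptions: the function name output_paths is hypothetical, and it assumes the folder name is built only from the URL's position in the list and its host name, which matches the carousell example. How webDownload.py names URLs that carry a path or query string is not stated here.

```python
from urllib.parse import urlparse


def output_paths(index, url):
    """Build the output layout for the index-th URL (1-based) in the list."""
    host = urlparse(url).netloc          # e.g. 'www.carousell.sg'
    name = f"{index}_{host}"             # e.g. '1_www.carousell.sg'
    root = f"{name}_files"               # e.g. '1_www.carousell.sg_files'
    return {
        'page': f"{root}/{name}.html",   # main web page
        'img': f"{root}/img",            # images used in the page
        'link': f"{root}/link",          # files imported by href
        'script': f"{root}/script",      # JS files used by the page
    }


paths = output_paths(1, 'https://www.carousell.sg/')
for kind, path in paths.items():
    print(kind, '->', path)
```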

Problem and Solution

Problem[0]: Downloaded files differ slightly from the browser's

Why is there a slight difference between the files downloaded by the program and the files saved with a web browser's "Save page as" for the same URL, such as www.google.com?

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

This is normal: web scraping and browser display follow different logic. If different people open www.google.com in their own browsers, the pages they see also differ, because the browser cache, tokens in local storage, and cookies all influence the "GET" request. This is why each person sees their own Gmail icon in the top-right corner. If you clear your browser's cache, local-storage tokens, and cookies and then use "Save page as", the saved file should be the same as the one the program downloads.
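
The difference can be seen in the request itself. The sketch below only builds two request objects with the standard-library urllib.request and sends nothing over the network. The cookie value is a made-up placeholder, and the sketch does not claim to show how webDownload.py builds its requests.

```python
from urllib.request import Request

url = 'https://www.google.com'

# Request as a script would typically send it: no stored browser state.
plain = Request(url)

# Request as a logged-in browser might send it.
# The cookie value below is a placeholder, not a real session.
with_cookie = Request(url, headers={'Cookie': 'SID=placeholder-session-id'})

print('script request has a cookie: ', plain.has_header('Cookie'))
print('browser request has a cookie:', with_cookie.has_header('Cookie'))
```

A server that reads the Cookie header can return a personalized page for the second request and a generic page for the first, which is why the two saved copies may differ.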

Problem[2]: Some downloaded images are empty

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

If a website stores its images on third-party storage and that storage requires authorization before download, the program's download request is rejected and returns 'null'. The saved image file is then empty.
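
To see which downloads were affected, the output folder can be scanned for zero-byte files. The sketch below is a hypothetical helper, not part of webDownload.py, and it assumes a rejected download leaves a file of size zero on disk. The demo runs against a temporary folder so it does not touch real downloads.

```python
import os
import tempfile


def find_empty_files(folder):
    """Return the paths of all zero-byte files under folder, sorted."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


# Demo: one file with content and one empty file in a temporary 'img' folder.
with tempfile.TemporaryDirectory() as tmp:
    img_dir = os.path.join(tmp, 'img')
    os.makedirs(img_dir)
    with open(os.path.join(img_dir, 'ok.png'), 'wb') as f:
        f.write(b'\x89PNG')
    open(os.path.join(img_dir, 'blocked.png'), 'wb').close()

    empty_names = [os.path.basename(p) for p in find_empty_files(tmp)]
    print(empty_names)
```

Pointing find_empty_files at a real output folder such as "1_www.carousell.sg_files" lists the files that may need to be fetched again with proper authorization.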


Last edit by LiuYuancheng([email protected]) at 13/11/2021
