Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

This tutorial shows how to integrate Oxylabs' Residential Proxies with aiohttp, an asynchronous HTTP client/server library for Python.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Oxylabs Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass placeholders with your Oxylabs account credentials.
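
Hardcoding credentials in source files makes them easy to leak. Here's a minimal sketch that reads them from environment variables instead; the variable names OXYLABS_USER and OXYLABS_PASSWORD are illustrative, not an Oxylabs convention:

import os

# Illustrative variable names; set these in your shell first, for example:
#   export OXYLABS_USER=your-username
#   export OXYLABS_PASSWORD=your-password
USER = os.environ["OXYLABS_USER"]
PASSWORD = os.environ["OXYLABS_PASSWORD"]
END_POINT = "pr.oxylabs.io:7777"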

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using instead of your own.
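
If you'd rather verify this from code, here is a small sketch that compares the IP address seen with and without the proxy, reusing the placeholder credentials from above:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Without a proxy, this prints your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Direct IP: ", await resp.text())
        # Through the proxy, it should print a different, rotating IP.
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxied IP:", await resp.text())

asyncio.run(check_ip())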

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send many requests at once with little risk of CAPTCHAs or IP blocks, which makes the scraping process fast and efficient: data for thousands of products can be extracted in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
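    # Allow at most 4 requests in flight at a time.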
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, the default ProactorEventLoop can raise
    # "RuntimeError: Event loop is closed" on shutdown, so switch to the
    # selector-based event loop policy instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and run pip:

pip install -r requirements.txt
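
The requirements.txt file isn't reproduced here, but judging from the script's imports it would need at least the following packages (version pins omitted; lxml is needed because BeautifulSoup is created with the "lxml" parser):

aiohttp
beautifulsoup4
lxml
pandas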

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
