Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can add it with the pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials together with the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
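Both snippets send the same request; the in-URL form is more compact, while aiohttp.BasicAuth keeps the credentials out of the URL string, which is convenient if you log or share your proxy URLs.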

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
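If you'd rather check from code, here is a minimal sketch that fetches the same endpoint twice, once directly and once through the proxy, so you can compare the two addresses (the credentials below are placeholders for your own):

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def compare_ips():
    async with aiohttp.ClientSession() as session:
        # Your own public IP, fetched without a proxy.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Direct IP:", (await resp.text()).strip())
        # The proxy's exit IP; it should differ from the address above.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", (await resp.text()).strip())


asyncio.run(compare_ips())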

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with little risk of running into CAPTCHAs or getting blocked, which makes web scraping fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the leading "../.." so the path can later be joined
            # with the catalogue base URL.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The star rating is stored as the second CSS class,
            # e.g. "star-rating Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows, Python 3.8+ defaults to the ProactorEventLoop, which can
    # raise an "Event loop is closed" error here; use the selector policy instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install some additional packages. To do that, download the requirements.txt file and use the pip command:

pip install -r requirements.txt
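The exact file ships with the project, but judging by the imports in the sample script above, it should list at least the following packages:

aiohttp
beautifulsoup4
lxml
pandas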

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
