Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

This tutorial shows how to integrate Oxylabs' Residential Proxies with AIOHTTP, an asynchronous HTTP client/server framework for Python.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Oxylabs Residential Proxies.
If you don't have the aiohttp library installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to create an aiohttp.BasicAuth object with your credentials and pass it along with the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to embed the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using rather than your own.
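You can also verify this programmatically by comparing a direct request with a proxied one. Below is a minimal sketch reusing the credentials from the examples above; the check_proxy helper name is just for illustration:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # Request the endpoint directly, then through the proxy.
        async with session.get("http://ip.oxylabs.io") as resp:
            direct_ip = (await resp.text()).strip()
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            proxied_ip = (await resp.text()).strip()
    # If the proxy is working, the two addresses should differ.
    print(f"Direct IP: {direct_ip}, proxied IP: {proxied_ip}")

asyncio.run(check_proxy())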

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with little risk of running into CAPTCHAs or IP blocks, which makes web scraping fast and efficient: the script below gathers data on a thousand products (50 pages of 20 listings) in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
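    # Limit concurrency to 4 simultaneous requests.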
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different event loop policy must be set on Windows with Python 3.8+.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info.minor >= 8:
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and use pip:

pip install -r requirements.txt
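
If you don't have the original requirements.txt at hand, note that the script only imports aiohttp, pandas, and BeautifulSoup with the lxml parser, so a minimal equivalent (unpinned; add version pins as needed) would be:

aiohttp
beautifulsoup4
lxml
pandas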

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you, feel free to contact Oxylabs customer support at [email protected].
