Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
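
To avoid hardcoding credentials in the script, you can also read them from environment variables. This is just an optional sketch; the variable names OXYLABS_USERNAME and OXYLABS_PASSWORD are placeholders rather than anything required by Oxylabs:

import os

# Placeholder environment variable names - adjust them to your own setup.
USER = os.environ.get("OXYLABS_USERNAME", "user")
PASSWORD = os.environ.get("OXYLABS_PASSWORD", "pass")
END_POINT = "pr.oxylabs.io:7777"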

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
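
You can run the same check from a script. The following is a minimal sketch that reuses the placeholder USER, PASSWORD, and END_POINT from the examples above; each request should print a different IP address, confirming that the residential proxies rotate between requests:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Each request goes through the proxy and should exit from a different IP.
        for _ in range(3):
            async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
            ) as resp:
                print(await resp.text())

asyncio.run(check_ip())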

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once without worrying about CAPTCHAs or getting blocked, which makes the scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
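    # Parse the listing page HTML and append one dictionary per book to results_list.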
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
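    # The semaphore caps how many requests are in flight at the same time.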
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
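    # Limit the number of concurrent requests to 4.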
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # Different EventLoopPolicy must be loaded if you're using Windows OS.
    # This helps to avoid "Event Loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and use pip:

pip install -r requirements.txt
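
The exact contents of requirements.txt are defined in the project repository, but judging from the imports in the script above, it should roughly list the following packages:

aiohttp
beautifulsoup4
lxml
pandas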

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
