Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed yet, you can install it with the pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool
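To double-check that your environment meets the library and Python version requirements, a quick check like the one below should be enough (a minimal sketch; aiohttp exposes its version as aiohttp.__version__):

import sys

import aiohttp

# The async/await syntax used in this guide needs Python 3.6 or newer.
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"
print("Python:", sys.version.split()[0])
print("aiohttp:", aiohttp.__version__)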

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials together with the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
 
async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

The second way is to pass the authentication credentials directly in the proxy URL:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io", 
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp: 
            print(await resp.text())

In order to use your own proxies, replace the user and pass values with your Oxylabs account credentials.
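Both snippets only define the fetch() coroutine, so it still has to be scheduled on an event loop. A minimal entry point (assuming either variant above) could look like this:

import asyncio

if __name__ == "__main__":
    # Run the coroutine on the default event loop (Python 3.7+).
    asyncio.run(fetch())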

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
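For a more direct comparison, you can fetch that endpoint twice from the same script, once without the proxy and once through it, and confirm that the two addresses differ. Below is a minimal sketch reusing the credentials defined earlier:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def compare_ips():
    async with aiohttp.ClientSession() as session:
        # Your own IP address, as seen without the proxy.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Without proxy:", await resp.text())
        # The IP address of the residential exit node.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Through proxy:", await resp.text())


asyncio.run(compare_ips())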

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once without worrying about CAPTCHAs or IP blocks, which makes the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds!

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the leading "../.." from the relative product link.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The second CSS class of the first <p> holds the star rating, e.g. "Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)  # Allow at most 4 requests in flight at a time.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be set if you're running on Windows.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and use the pip command:

pip install -r requirements.txt
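The exact pinned versions ship with the original project, but judging by the imports in the script above, requirements.txt should list roughly the following packages (this list is an assumption, not the published file):

aiohttp
pandas
beautifulsoup4
lxml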

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you, feel free to contact Oxylabs customer support at [email protected].
