GitHub Scraper is a Streamlit app, built with the BeautifulSoup Python package, that scrapes data from a specific GitHub user profile.

Last update: Apr 05, 2022

Related tags

Web Crawling github-scraper-app

Overview

GitHub Scraper

GitHub Scraper scrapes data for a specific user profile.
The app takes a GitHub profile name and checks whether that user exists.
If the user exists, the app scrapes the data from that GitHub profile.
If the user doesn't exist, the app displays an info message.
You can download the scraped data as CSV, JSON, or a pandas-profiling HTML report.
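
The existence check described above is not shown in this README, so the following is only a minimal sketch of how it might look. The function name `profile_exists` and its `get` parameter are hypothetical, and the sketch assumes that github.com answers with status 404 for a profile that does not exist and 200 for one that does.

```python
def profile_exists(user_name, get=None):
    """Return True if github.com serves a profile page for user_name.

    `get` is a hypothetical injection point with the same call shape as
    requests.get; when omitted, requests.get itself is used.
    """
    if get is None:
        import requests  # imported lazily so a stub can replace it in tests

        get = requests.get

    response = get("https://github.com/{}".format(user_name), timeout=10)
    # Assumption: a missing profile answers 404, an existing one 200.
    return response.status_code == 200
```

In the app, a `False` result would correspond to the info message (for example via `st.info`), and a `True` result would lead on to scraping the profile.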

Installation :-

To install all the packages the app requires 👇

pip install -r requirements.txt

Packages Used :-

import requests
import pandas as pd
import streamlit as st
from bs4 import BeautifulSoup
from pandas_profiling import ProfileReport
from streamlit_pandas_profiling import st_profile_report

Function To Scrape the Data :-

def ScrapeData(user_name):
    """Scrape profile details and the repository list for a GitHub user.

    Returns a dict of profile fields and a DataFrame with one row per repository.
    """
    url = "https://github.com/{}?tab=repositories".format(user_name)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    # Profile header fields; these lookups assume the profile page exists.
    info = {"name": soup.find(class_="vcard-fullname").get_text()}
    info["image_url"] = soup.find(class_="avatar-user")["src"]
    info["followers"] = (
        soup.select_one("a[href*=followers]").get_text().strip().split("\n")[0]
    )
    info["following"] = (
        soup.select_one("a[href*=following]").get_text().strip().split("\n")[0]
    )

    # Location and website are optional. select_one returns None when the
    # element is absent, so .get_text() raises AttributeError and the field
    # falls back to an empty string.
    try:
        info["location"] = soup.select_one("li[itemprop*=home]").get_text().strip()
    except AttributeError:
        info["location"] = ""

    try:
        info["url"] = soup.select_one("li[itemprop*=url]").get_text().strip()
    except AttributeError:
        info["url"] = ""

    # One entry per repository listed on the page.
    repositories = soup.find_all(class_="source")
    repo_info = []
    for repo in repositories:
        try:
            name = repo.select_one("a[itemprop*=codeRepository]").get_text().strip()
            link = "https://github.com/{}/{}".format(user_name, name)
        except AttributeError:
            name = ""
            link = ""

        try:
            updated = repo.find("relative-time").get_text()
        except AttributeError:
            updated = ""

        try:
            language = repo.select_one("span[itemprop*=programmingLanguage]").get_text()
        except AttributeError:
            language = ""

        try:
            description = repo.select_one("p[itemprop*=description]").get_text().strip()
        except AttributeError:
            description = ""

        repo_info.append(
            {
                "name": name,
                "link": link,
                "updated": updated,
                "language": language,
                "description": description,
            }
        )
    repo_info = pd.DataFrame(repo_info)
    return info, repo_info
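
The overview mentions CSV and JSON downloads, but the export code is not included here. As a rough sketch, the repository DataFrame returned by the function could be serialized as below. The helper names `to_csv_bytes` and `to_json_bytes` are hypothetical and are not taken from the app.

```python
import pandas as pd


def to_csv_bytes(df):
    """Serialize a DataFrame to UTF-8 CSV bytes, without the index column."""
    return df.to_csv(index=False).encode("utf-8")


def to_json_bytes(df):
    """Serialize a DataFrame to UTF-8 JSON bytes, one object per row."""
    return df.to_json(orient="records").encode("utf-8")


# Example with a hand-made frame shaped like the scraper's repository table.
example = pd.DataFrame([{"name": "demo", "language": "Python"}])
csv_payload = to_csv_bytes(example)
json_payload = to_json_bytes(example)
```

In a Streamlit app, bytes like these can be passed as the `data` argument of `st.download_button`, together with a `file_name` and `mime` type of your choosing.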

Owner

Siva Prakash

Half-price flash-sale grabber for Taobao and Tmall: snap up TVs and Moutai, and beat the scalpers

Scrape all the media from an OnlyFans account - Updated regularly

Download NCERT books using Scrapy

A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

PyQuery-based scraping micro-framework.

A web crawler script that crawls the target website and lists its links

Scrape Twitter for Tweets

This app will let you continuously scrape certain parts of LeasePlan and extract data of cars becoming available for lease.

A Very simple free proxy list scraper.

Scraping weather data using Python to receive umbrella reminders

Web Scraping Practica With Python

Comment Webpage Screenshot is a GitHub Action that captures screenshots of web pages and HTML files located in the repository

Danbooru scraper with python

An automated Udemy coupon scraper that scrapes coupons and auto-posts the results to a Blogspot post

SkyScrapers: A collection of variety of Scraping Apps

An IpVanish Proxies Scraper

A Python script to scrape overviews and reviews of companies from Glassdoor.

WebScraping - Scrapes Job website for python developer jobs and exports the data to a csv file

Script to scrape direct download links (DDLs) from a Google Drive index.

Automatically completes tasks, checks in, and claims data and points in the China Unicom mobile app.