GitHub scraper app that scrapes data for a specific user profile, built with the Streamlit and BeautifulSoup Python packages

Last update: Apr 05, 2022

Related tags

Web Crawling github-scraper-app

Overview

Github Scraper

GitHub scraper app scrapes data from a specific user's profile.
The app takes a GitHub profile name and checks whether that user name exists.
If the user name exists, the app scrapes the data from that GitHub profile.
If the user name doesn't exist, the app displays an info message.
You can download the scraped data as CSV, JSON, or a pandas-profiling HTML report.
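As a rough illustration of the CSV and JSON downloads mentioned above, the scraped table could be serialized with pandas along these lines. This is only a sketch: the helper names `to_csv_bytes` and `to_json_text` and the sample data are hypothetical and are not taken from the app's source.

```python
import pandas as pd


def to_csv_bytes(frame: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 encoded CSV without the index column."""
    return frame.to_csv(index=False).encode("utf-8")


def to_json_text(frame: pd.DataFrame) -> str:
    """Serialize a DataFrame to a JSON array with one object per row."""
    return frame.to_json(orient="records")


# Hypothetical sample standing in for the scraped repository table.
sample = pd.DataFrame([{"name": "demo-repo", "language": "Python"}])

csv_bytes = to_csv_bytes(sample)
json_text = to_json_text(sample)
```

In a Streamlit app, values like these could then be handed to `st.download_button` as its `data` argument, together with a `file_name` and `mime` type. Whether the original app does it this way is an assumption.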

Installation :-

To install all the packages the app requires 👇

pip install -r requirements.txt

Packages Used :-

import requests
import pandas as pd
import streamlit as st
from bs4 import BeautifulSoup
from pandas_profiling import ProfileReport
from streamlit_pandas_profiling import st_profile_report

Function To Scrape the Data :-

def ScrapeData(user_name):
    """Scrape profile details and the repository list of a GitHub user.

    Returns a tuple: (dict of profile info, DataFrame of repositories).
    """
    url = "https://github.com/{}?tab=repositories".format(user_name)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    # Profile header fields; these lookups assume the profile page exists.
    info = {"name": soup.find(class_="vcard-fullname").get_text()}
    info["image_url"] = soup.find(class_="avatar-user")["src"]
    info["followers"] = (
        soup.select_one("a[href*=followers]").get_text().strip().split("\n")[0]
    )
    info["following"] = (
        soup.select_one("a[href*=following]").get_text().strip().split("\n")[0]
    )

    # Optional fields: select_one() returns None when the element is missing,
    # so calling .get_text() on it raises AttributeError.
    try:
        info["location"] = soup.select_one("li[itemprop*=home]").get_text().strip()
    except AttributeError:
        info["location"] = ""

    try:
        info["url"] = soup.select_one("li[itemprop*=url]").get_text().strip()
    except AttributeError:
        info["url"] = ""

    # One entry per repository listed on the page.
    repositories = soup.find_all(class_="source")
    repo_info = []
    for repo in repositories:
        try:
            name = repo.select_one("a[itemprop*=codeRepository]").get_text().strip()
            link = "https://github.com/{}/{}".format(user_name, name)
        except AttributeError:
            name = ""
            link = ""

        try:
            updated = repo.find("relative-time").get_text()
        except AttributeError:
            updated = ""

        try:
            language = repo.select_one("span[itemprop*=programmingLanguage]").get_text()
        except AttributeError:
            language = ""

        try:
            description = repo.select_one("p[itemprop*=description]").get_text().strip()
        except AttributeError:
            description = ""

        repo_info.append(
            {
                "name": name,
                "link": link,
                "updated": updated,
                "language": language,
                "description": description,
            }
        )
    repo_info = pd.DataFrame(repo_info)
    return info, repo_info
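
The scraping function looks up profile elements directly, so it relies on the profile page existing. The overview says the app checks the user name first. A minimal sketch of such a check follows, assuming GitHub answers with HTTP 200 for an existing profile and a non-200 status (typically 404) otherwise. The function name `profile_exists` and its `session` and `timeout` parameters are hypothetical and not taken from the app's source.

```python
import requests


def profile_exists(user_name, session=None, timeout=10):
    """Return True if https://github.com/<user_name> answers with HTTP 200.

    `session` is any object with a requests-style get() method. It defaults
    to the requests module itself and mainly exists to make testing easy.
    """
    http = session if session is not None else requests
    response = http.get(
        "https://github.com/{}".format(user_name), timeout=timeout
    )
    return response.status_code == 200
```

A caller could run this check before scraping and show an info message (for example with `st.info`) when it returns False.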

Owner

Siva Prakash

A Simple Web Scraper made to Extract Download Links from Todaytvseries2.com

Proxy scraper. Format: IP | PORT | COUNTRY | TYPE

Comment Webpage Screenshot is a GitHub Action that captures screenshots of web pages and HTML files located in the repository

Make git downloads from GitHub 1000 times faster for users in China!

Automatically submit the daily body-temperature report (GitHub Actions)

Dex-scrapper - Hobby project for scraping dex data on VeChain

A simple code to fetch comments below an Instagram post and save them to a csv file

mlscraper: Scrape data from HTML pages automatically with Machine Learning

Simply scrape / download all the media from a Fansly account.

Google Scholar Web Scraping

This project was built with Python and Flask to scrape a music site

Unja is a fast & light tool for fetching known URLs from Wayback Machine

Quick Project made to help scrape Lexile and Atos(AR) levels from ISBN

Free-Game-Scraper is a useful script that allows you to track down free games and DLCs on many platforms.

Some crawler-related signature and CAPTCHA cracking

A high-level distributed crawling framework.

Binance Smart Chain Contract Scraper + Contract Evaluator

a small library for extracting rich content from urls

A tool for scraping and organizing data from NewsBank API searches

Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.