Projeto: Machine Learning: Linguagens de Programacao 2004-2001

Overview

Projeto: Machine Learning: Linguagens de Programacao 2004-2001

Projeto de Data Science e Machine Learning de análise de linguagens de programação de 2004 a 2021 obtidos a partir do seguinte dataset do Kaggle.

Tecnologias

  • Python 3
  • Jupyter Notebook
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-Learn
  • Requests
  • REST API Call (Github API)

Algoritmos

  • Regressão Linear

Inicialmente, serão visualizados dados de séries temporais e regressão linear.

Tratamento de dados

df = pd.read_csv('Most Popular Programming Languages from 2004 to 2021 V4.csv')

def createDataFrameFor(df, colunas, colunaAtual):
    return pd.DataFrame(
        {
            'Date': df.Date,
            'Timestamp': map(lambda i : datetime.strptime(df["Date"][i], '%B %Y'), range(len(df.Date))),
            'Language': colunas[colunaAtual],
            'Value': df[df.columns[colunaAtual]]
        }
    )

colunas = df.columns

dados_tratados = createDataFrameFor(df, colunas, 1)

for coluna in range(1, len(colunas)):
    dados_tratados = pd.concat([dados_tratados, createDataFrameFor(df, colunas, coluna)])

dados_tratados.reset_index(drop=True, inplace=True)

dados_tratados['UnixTime'] = list(map(lambda i: (pd.to_datetime([dados_tratados['Timestamp'][i]]).astype(int) / 10**9)[0], range(len(dados_tratados['Date']))))

Visualização dos dados

df_java = dados_tratados[dados_tratados['Language'] == 'Java']
sns.regplot(x="UnixTime", y="Value", data= df_java)
plt.gcf().set_size_inches(16, 6)
plt.ylabel('% de uso da linguagem Java')
plt.xlabel('Anos em Unix Time de 2004 a 2021')
plt.show()

X = df_java.UnixTime.values.reshape(-1, 1)
y = df_java.Value.values.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

plt.scatter(X_test, y_test,  color='blue')
plt.plot(X_test, y_pred, color='red', linewidth=3)

plt.gcf().set_size_inches(16, 6)

plt.ylabel('% de uso da linguagem Java')
plt.xlabel('Anos em Unix Time de 2004 a 2021')

plt.show()

Plots gerados

  • Regressão Linear com Seaborn (SNS):

Seaborn

  • Regressão Linear com Scikit-Learn (LinearRegression) e Matplotlib:

Matplotlib ScikitLearn

Plots de evolução das linguagens Java, Javascript, Python, C#, PHP, Delphi, Dart e Cobol nos últimos 17 anos

dados = df

dados['Date'] = pd.to_datetime(dados['Date'])
dados.set_index('Date', inplace = True) 

fig, axes = plt.subplots(nrows=4, ncols=2)

dados['Java'].plot(ax=axes[0,0], title = "Análise da linguagem Java nos últimos 17 anos")
dados['JavaScript'].plot(ax=axes[1,0], title = "Análise da linguagem JavaScript nos últimos 17 anos")
dados['Python'].plot(ax=axes[0,1], title = "Análise da linguagem Python nos últimos 17 anos")
dados['C#'].plot(ax=axes[1,1], title = "Análise da linguagem C# nos últimos 17 anos")
dados['PHP'].plot(ax=axes[2,0], title = "Análise da linguagem PHP nos últimos 17 anos")
dados['Delphi'].plot(ax=axes[2,1], title = "Análise da linguagem Delphi nos últimos 17 anos")
dados['Dart'].plot(ax=axes[3,0], title = "Análise da linguagem Dart nos últimos 17 anos")
dados['Cobol'].plot(ax=axes[3,1], title = "Análise da linguagem Cobol nos últimos 17 anos")

plt.gcf().set_size_inches(16, 22)

plt.show()

17 anos

Autor

  • Victor Hugo Negrisoli
  • Desenvolvedor Back-End Sênior | Analista de Dados
Owner
Victor Hugo Negrisoli
Cientista da Computação (UniFil), cursando pós em Data Science (PUC-MG), experiência em Java, Javascript e Python. Atuo como Desenvolvedor Back-End na Ilegra
Victor Hugo Negrisoli
This project impelemented for midterm of the Machine Learning #Zoomcamp #Alexey Grigorev

MLProject_01 This project impelemented for midterm of the Machine Learning #Zoomcamp #Alexey Grigorev Context Dataset English question data set file F

Hadi Nakhi 1 Dec 18, 2021
neurodsp is a collection of approaches for applying digital signal processing to neural time series

neurodsp is a collection of approaches for applying digital signal processing to neural time series, including algorithms that have been proposed for the analysis of neural time series. It also inclu

NeuroDSP 224 Dec 02, 2022
A Pythonic framework for threat modeling

pytm: A Pythonic framework for threat modeling Introduction Traditional threat modeling too often comes late to the party, or sometimes not at all. In

Izar Tarandach 644 Dec 20, 2022
A collection of Machine Learning Models To Web Api which are built on open source technologies/frameworks like Django, Flask.

Author Ibrahim Koné From-Machine-Learning-Models-To-WebAPI A collection of Machine Learning Models To Web Api which are built on open source technolog

Ibrahim Koné 2 May 24, 2022
pymc-learn: Practical Probabilistic Machine Learning in Python

pymc-learn: Practical Probabilistic Machine Learning in Python Contents: Github repo What is pymc-learn? Quick Install Quick Start Index What is pymc-

pymc-learn 196 Dec 07, 2022
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Epistasis Lab at UPenn 8.9k Jan 09, 2023
Databricks Certified Associate Spark Developer preparation toolkit to setup single node Standalone Spark Cluster along with material in the form of Jupyter Notebooks.

Databricks Certification Spark Databricks Certified Associate Spark Developer preparation toolkit to setup single node Standalone Spark Cluster along

19 Dec 13, 2022
Generate music from midi files using BPE and markov model

Generate music from midi files using BPE and markov model

Aditya Khadilkar 37 Oct 24, 2022
CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

ZhihuiYangCS 8 Jun 07, 2022
Apple-voice-recognition - Machine Learning

Apple-voice-recognition Machine Learning How does Siri work? Siri is based on large-scale Machine Learning systems that employ many aspects of data sc

Harshith VH 1 Oct 22, 2021
Provide an input CSV and a target field to predict, generate a model + code to run it.

automl-gs Give an input CSV file and a target field you want to predict to automl-gs, and get a trained high-performing machine learning or deep learn

Max Woolf 1.8k Jan 04, 2023
Learn how to responsibly deliver value with ML.

Made With ML Applied ML · MLOps · Production Join 30K+ developers in learning how to responsibly deliver value with ML. 🔥 Among the top MLOps reposit

Goku Mohandas 32k Dec 30, 2022
Send rockets to Mars with artificial intelligence(Genetic algorithm) in python.

Send Rockets To Mars With AI Send rockets to Mars with artificial intelligence(Genetic algorithm) in python. Tools Python 3 EasyDraw How to Play Insta

Mohammad Dori 3 Jul 15, 2022
A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and A* Search (Manhattan Distance Heuristic)

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and the A* Search (using the Manhattan Distance Heuristic)

17 Aug 14, 2022
Repositório para o #alurachallengedatascience1

1° Challenge de Dados - Alura A Alura Voz é uma empresa de telecomunicação que nos contratou para atuar como cientistas de dados na equipe de vendas.

Sthe Monica 16 Nov 10, 2022
A handy tool for common machine learning models' hyper-parameter tuning.

Common machine learning models' hyperparameter tuning This repo is for a collection of hyper-parameter tuning for "common" machine learning models, in

Kevin Hu 2 Jan 27, 2022
Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining

**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.** S

Sebastian Raschka 4k Dec 30, 2022
Lightweight Machine Learning Experiment Logging 📖

Simple logging of statistics, model checkpoints, plots and other objects for your Machine Learning Experiments (MLE). Furthermore, the MLELogger comes with smooth multi-seed result aggregation and co

Robert Lange 65 Dec 08, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 07, 2023
MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

The collaboration platform for Machine Learning MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine

MLReef 1.4k Dec 27, 2022