Implementation of the bachelor's thesis "Real-time stock predictions with deep learning and news scraping".

Last update: Feb 09, 2022

Overview

Real-time stock predictions with deep learning and news scraping

This repository contains a partial implementation of my bachelor's thesis "Real-time stock predictions with deep learning and news scraping". The code is built with PyTorch Lightning; read its documentation for a complete overview of how this repository is structured.

Disclaimer: Neither the pipeline nor the model published in this repository is the one used in the thesis. The pipeline here matches headlines with prices of the same day, whereas the thesis used news published the day before. The model shared here is unrelated to the original and should be considered a toy model.
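The difference between the two alignments can be illustrated with a small sketch. The function name `pair_news_with_prices`, the dictionary-based inputs, and the use of calendar days instead of trading days are all assumptions made for illustration; this is not the code of this repository or of the thesis.

```python
from datetime import date, timedelta


def pair_news_with_prices(headlines, prices, lag_days=0):
    """Pair each day's closing price with headlines published lag_days earlier.

    headlines: dict mapping date -> list of headline strings (hypothetical layout)
    prices: dict mapping date -> closing price (hypothetical layout)
    lag_days=0 mimics the same-day matching described for this repository;
    lag_days=1 mimics using news published the day before.
    """
    pairs = []
    for day in sorted(prices):
        news_day = day - timedelta(days=lag_days)
        if news_day in headlines:
            pairs.append((day, headlines[news_day], prices[day]))
    return pairs


# Made-up sample data, only to show the effect of the lag.
headlines = {
    date(2018, 5, 1): ["Headline A"],
    date(2018, 5, 2): ["Headline B"],
}
prices = {date(2018, 5, 1): 100.0, date(2018, 5, 2): 101.5}

same_day = pair_news_with_prices(headlines, prices, lag_days=0)
previous_day = pair_news_with_prices(headlines, prices, lag_days=1)
```

With a one-day lag, the first price has no matching news and is dropped, so `previous_day` contains a single pair: the price of May 2 together with the headlines of May 1.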

Preparing the data

The data used in the thesis was crawled and assembled from scratch. It includes the titles and descriptions of the news published on Reuters.com from January 2010 to May 2018, together with the end-of-day stock prices of S&P 500 companies extracted from AlphaVantage.co. Everything is compressed in an HDF5 file that you can download from this link.

The first step is to clone this repository and install its dependencies:

git clone https://github.com/davidalvarezdlt/bachelor_thesis.git
cd bachelor_thesis
pip install -r requirements.txt

Move both bachelor_thesis_data.hdf5 and word2vec.bin inside ./data. The resulting folder structure should look like this:

bachelor_thesis/
    bachelor_thesis/
    data/
        bachelor_thesis_data.hdf5
        word2vec.bin
    lightning_logs/
    .gitignore
    .pre-commit-config.yaml
    LICENSE
    README.md
    requirements.txt
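Before training, it can help to confirm that both data files are in place. The following is a minimal sketch: the helper `missing_data_files` is hypothetical and not part of this repository; only the two file names and the `data` folder come from the structure above.

```python
from pathlib import Path

# File names taken from the folder structure above.
EXPECTED_FILES = ("bachelor_thesis_data.hdf5", "word2vec.bin")


def missing_data_files(data_dir="data"):
    """Return the expected data files that are not present in data_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not (Path(data_dir) / name).is_file()
    ]
```

Calling `missing_data_files()` from the repository root returns an empty list when both files are where the code expects them.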

Training the model

In short, you can train the model by calling:

python -m bachelor_thesis

You can modify the default parameters of the code by using CLI parameters. Get a complete list of the available parameters by calling:

python -m bachelor_thesis --help

For instance, to train the model using GOOGL stock prices, with a batch size of 32 and one GPU, we would call:

python -m bachelor_thesis --symbol GOOGL --batch_size 32 --gpus 1

Every time you train the model, a new folder inside ./lightning_logs will be created. Each folder represents a different version of the model, containing its checkpoints and auxiliary files.
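To locate the checkpoint of the most recent run, a short helper like the one below may be handy. It is a sketch that assumes PyTorch Lightning's default layout (`lightning_logs/version_N/checkpoints/*.ckpt`); the function `latest_checkpoint` is hypothetical, and the pattern needs adjusting if this repository configures its logger or checkpoint callback differently.

```python
from pathlib import Path


def latest_checkpoint(logs_dir="lightning_logs"):
    """Return the most recently modified .ckpt file under logs_dir, or None.

    Assumes the default layout: <logs_dir>/version_N/checkpoints/*.ckpt
    """
    checkpoints = list(Path(logs_dir).glob("version_*/checkpoints/*.ckpt"))
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda path: path.stat().st_mtime)
```

The returned path can then be passed to any command that expects a checkpoint.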

Testing the model

You can measure the loss and the accuracy of the model (number of times the prediction is correct) and store them in TensorBoard by calling:

python -m bachelor_thesis --test --test_checkpoint <test_checkpoint>

Here, <test_checkpoint> is a valid path to the model checkpoint that should be used.
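The accuracy mentioned above counts how often the prediction matches the target. A minimal sketch of that idea follows; the function `accuracy`, the normalization into a fraction, and the 0/1 label encoding are assumptions for illustration, not the repository's actual metric code.

```python
def accuracy(predictions, targets):
    """Fraction of positions where the prediction equals the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical labels: 1 = price goes up, 0 = price goes down.
example = accuracy([1, 0, 1, 1], [1, 1, 1, 0])  # 2 of 4 correct -> 0.5
```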

Citation

If you use the data provided in this repository or if you find this thesis useful, please use the following citation:

@thesis{Alvarez2018,
    type = {Bachelor's Thesis},
    author = {David Álvarez de la Torre},
    title = {Real-time stock predictions with Deep Learning and news scrapping},
    school = {Universitat Politècnica de Catalunya},
    year = 2018,
}
