NLP_0-project

Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures¹. We are a "democratic" and collaborative group of five; our names are listed below according to our initial division of work 😄.

Here is the outline of our project:

Data collection.

@LeiyuanHuo, jyang130, FanFanShark, xdc1999, gaojiamin1116

Based on the company list in data-WRDS-list.csv, write a web scraper to download every 10-K (HTML format) that these companies filed with the SEC from 2010 to 2022 from Historical EDGAR documents, and rename each file data-10K-COMPNAME-Year.html.
Parse the HTML files to extract the Business and MD&A sections.
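The parsing step could be sketched as follows. This is only a minimal illustration using the Python standard library, not our final parser. It assumes the Business section sits between the "Item 1." and "Item 1A." headings and MD&A between "Item 7." and "Item 7A.", and it keeps the longest match to skip the table of contents. The helper names (`html_to_text`, `extract_section`, `filing_filename`) and the rule of stripping punctuation from COMPNAME are hypothetical choices made for this example.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace so the heading regexes see one long line.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html):
    """Strip tags from a filing and return its plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def extract_section(text, start_item, end_item):
    """Return the longest span between 'Item <start>.' and 'Item <end>.'.

    Taking the longest span is a heuristic (an assumption of this sketch):
    the table of contents also lists the item headings, but the text
    between them there is much shorter than the real section.
    """
    start_re = re.compile(r"\bItem\s*%s\s*[.:]" % re.escape(start_item), re.IGNORECASE)
    end_re = re.compile(r"\bItem\s*%s\s*[.:]" % re.escape(end_item), re.IGNORECASE)
    best = ""
    for start in start_re.finditer(text):
        end = end_re.search(text, start.end())
        if end is None:
            continue
        candidate = text[start.end():end.start()].strip()
        if len(candidate) > len(best):
            best = candidate
    return best


def filing_filename(company, year):
    """Build data-10K-COMPNAME-Year.html, dropping non-alphanumeric characters."""
    safe = re.sub(r"[^A-Za-z0-9]+", "", company)
    return "data-10K-%s-%d.html" % (safe, year)
```

For a downloaded filing, `extract_section(html_to_text(html), "1", "1A")` would return the Business text and `extract_section(html_to_text(html), "7", "7A")` the MD&A text. Real filings vary a lot in layout, so the heading patterns will likely need tuning.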

Text Processing: feature extraction²

Part-of-speech (POS) tagging (a main method) to extract product names and descriptions. Store these for each company.
Named entity recognition (NER) (also a main method) to extract the competitor names each company mentions. Store these for each company.
Product texts: BoW and tf-idf for each company's product(s), which should give us a term-product matrix.
Competitor texts: BoW, since we care about how often each competitor is mentioned.
‼️ We also need to incorporate sector and firm size/market power into the competitor texts and re-count.
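To make the BoW and tf-idf steps concrete, here is a small standard-library sketch. It is an illustration under assumptions, not our chosen pipeline: the tokenizer keeps lowercase alphabetic tokens only, tf-idf is the plain unsmoothed variant tf × log(N / df) (library implementations such as scikit-learn use a smoothed formula by default), and `count_mentions` assumes a list of competitor names is already available, for example from the NER step. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Return term counts (BoW) for one document."""
    return Counter(tokenize(text))


def tfidf(docs):
    """Map each document name to {term: tf * log(N / df)}.

    docs: dict of name -> text. tf is the term's share of the document's
    tokens; df is the number of documents containing the term.
    """
    counts = {name: bag_of_words(text) for name, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for bow in counts.values():
        doc_freq.update(bow.keys())
    weights = {}
    for name, bow in counts.items():
        total = sum(bow.values())
        weights[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bow.items()
        }
    return weights


def count_mentions(text, competitor_names):
    """Count whole-word, case-insensitive mentions of each competitor name."""
    lowered = text.lower()
    mentions = Counter()
    for name in competitor_names:
        pattern = r"\b%s\b" % re.escape(name.lower())
        mentions[name] = len(re.findall(pattern, lowered))
    return mentions
```

A term that appears in every document gets weight zero under this variant, which is the intended effect: words shared by all product descriptions do not help tell products apart.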

Text Processing: feature transformation and representation²

Term-product matrix: calculate pairwise cosine similarity scores between products, then use a score threshold to group similar products.
Term-product matrix: alternatively, apply a clustering method (e.g., KMeans) directly to the product vectors.
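The threshold approach could look roughly like the sketch below. It assumes each product is a sparse vector stored as a `{term: weight}` dict, and it treats similarity as transitive: if A is similar to B and B to C, all three land in one group (single-linkage grouping via union-find). That transitivity is a design assumption of this example, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {term: weight}."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_by_threshold(vectors, threshold):
    """Group products whose pairwise cosine similarity is >= threshold.

    vectors: dict of product name -> {term: weight}.
    Returns a sorted list of groups, each a sorted list of product names.
    """
    names = sorted(vectors)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

The pairwise loop is quadratic in the number of products, which is fine for a first pass; the threshold itself would need to be chosen empirically.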

Econometric Analysis and Hypothesis Testing²

Multivariate regression: the dependent variable (DV) is profitability (e.g., sales, revenue, Tobin's q); the independent variables (IVs) are our two competition measures (one from the count of similar products, one from mentions as a competitor), plus relevant control variables.
Cross-sectional portfolios: our competition measures are cross-sectional (one cross-section for each year), so we can form long-short portfolios on both measures and examine the effects on stock returns.
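As a rough illustration of the portfolio step, the sketch below sorts firms on a competition measure for one period, goes long the lowest group and short the highest group, and returns the equal-weighted return spread. The long/short direction, the equal weighting, and the quintile default are all assumptions made for this example; the outline above does not fix them. The function name and its parameters are hypothetical.

```python
def long_short_return(measure, returns, n_groups=5):
    """Equal-weighted long-short spread for one period.

    measure: dict of firm -> competition measure for the period.
    returns: dict of firm -> stock return over the following period.
    Assumption: long the lowest-measure group, short the highest.
    Returns (spread, long_firms, short_firms).
    """
    # Keep firms present in both inputs; break ties by name for determinism.
    firms = sorted(
        (firm for firm in measure if firm in returns),
        key=lambda firm: (measure[firm], firm),
    )
    size = max(1, len(firms) // n_groups)
    if len(firms) < 2 * size:
        raise ValueError("not enough firms to form two disjoint portfolios")
    long_firms = firms[:size]
    short_firms = firms[-size:]

    def mean_return(group):
        return sum(returns[firm] for firm in group) / len(group)

    spread = mean_return(long_firms) - mean_return(short_firms)
    return spread, long_firms, short_firms
```

Running this once per year for each of the two measures would give a time series of spreads whose average could then be tested against zero.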

¹ Two papers inspired this project: Eisdorfer, A., Froot, K., Ozik, G., & Sadka, R. (2021). Competition Links and Stock Returns. The Review of Financial Studies, 2021-12-20. Hoberg, G., & Phillips, G. (2016). Text-Based Network Industries and Endogenous Product Differentiation. The Journal of Political Economy, 124(5), 1423-1465.
² Text processing steps are based on the MFIN7036 Lecture_Notes and a review paper: Marty, T., Vanstone, B., & Hahn, T. (2020). News media analytics in finance: A survey. Accounting and Finance (Parkville), 60(2), 1385-1434.
