NLP

T5 Project proposal

Topic Modeling and Clustering of News Articles and Essays

Students:

Nasser Alshehri
Abdullah Bushnag
Abdulrhman Alqurashi

OVERVIEW

News comes in different formats, types, and categories. Here we use topic modeling and clustering to find out what each article is about, first based on its content and then based only on its title.

The process would be:

1. Load the data.
2. Keep only what we need from the data.
3. Clean the text (e.g., remove stopwords).
4. Build the bag of words for all documents.
5. Build the bag of words for each document.
6. Vectorize the data.
7. Run the LDA model.
8. Run the model on all the data and save the output to a DataFrame.
9. Run the clustering algorithm.
10. Save the data to CSV.
11. Make the charts.
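The steps above can be sketched end to end on a toy corpus. This is a minimal illustration, not the notebooks' actual code: it assumes scikit-learn's `CountVectorizer` and `LatentDirichletAllocation` for the bag of words and LDA (the project may use gensim for these instead), K-means as the clustering algorithm, and a hypothetical choice of three topics and three clusters. The sample sentences are made up.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the cleaned 'content' column (made-up sentences).
docs = [
    "The senate passed the budget bill after a long debate",
    "Voters head to the polls as the election campaign ends",
    "The team won the championship game in overtime",
    "The striker scored twice and the coach praised the team",
    "New phone chips promise faster software and longer battery life",
    "The startup released software that runs on any phone",
]

# Bag of words: one row per document, one column per vocabulary term,
# with English stopwords removed during tokenization.
vectorizer = CountVectorizer(stop_words="english")
bow = vectorizer.fit_transform(docs)

# LDA turns each document into a probability distribution over topics.
N_TOPICS = 3  # assumption: chosen only for this example
lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0)
doc_topics = lda.fit_transform(bow)

# Cluster the documents using their topic distributions as features.
kmeans = KMeans(n_clusters=N_TOPICS, n_init=10, random_state=0)
clusters = kmeans.fit_predict(doc_topics)

# Collect the output in a DataFrame and serialize it as CSV text.
results = pd.DataFrame(
    doc_topics, columns=[f"topic_{i}" for i in range(N_TOPICS)]
)
results.insert(0, "text", docs)
results["dominant_topic"] = doc_topics.argmax(axis=1)
results["cluster"] = clusters
csv_text = results.to_csv(index=False)  # pass a file path to write to disk
```

Running the same steps on the 'title' column instead of 'content' would give the title-only variant described in the overview.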

DATA

The data is acquired from: https://components.one/datasets/all-the-news-articles-dataset

The raw data contains 12 features: id, title, author, date, content, year, month, publication, category, digital, section, url.

The features we are using are only the 'title' and 'content'.

The data we are not interested in will be dropped/ignored.
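A minimal sketch of this selection step with pandas is shown below. The in-memory CSV and its three rows are made up for illustration (the real file would be loaded from disk by path), and dropping rows with a missing title or content is an assumption, not something the proposal specifies.

```python
import io

import pandas as pd

# Made-up sample with the same 12 columns as the raw dataset.
sample_csv = io.StringIO(
    "id,title,author,date,content,year,month,publication,category,digital,section,url\n"
    "1,Budget bill passes,A. Writer,2017-01-05,The senate passed the budget bill.,"
    "2017,1,Example Post,newspaper,1,politics,https://example.com/1\n"
    "2,Team wins final,B. Writer,2017-02-10,,"
    "2017,2,Example Post,newspaper,1,sports,https://example.com/2\n"
    "3,New phone released,C. Writer,2017-03-15,The startup released a new phone.,"
    "2017,3,Example Post,newspaper,1,tech,https://example.com/3\n"
)

raw = pd.read_csv(sample_csv)

# Keep only the two features used in this project; the rest are ignored.
# Assumption: rows missing either field are dropped before modeling.
articles = raw[["title", "content"]].dropna().reset_index(drop=True)
```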

The 'title' is the headline of the news article or essay. The 'content' is the body of the news article or essay itself.

TOOLS

pandas, NumPy, scikit-learn, Matplotlib, seaborn, NLTK, gensim
