Applied Natural Language Processing in the Enterprise - An O'Reilly Media Publication

Overview

This is the companion repo for Applied Natural Language Processing in the Enterprise, an O'Reilly Media publication by Ankur A. Patel and Ajay Uppili Arasanipalai. Here you will find all of the book's source code, published on GitHub for your convenience.

Follow the steps below to get started with setting up your environment and running the code examples.

Setup

To install all the required libraries and dependencies, run the following command:

pip install nlpbook

However, the recommended approach is to use conda, a cross-platform, language-agnostic package manager that automatically handles dependency conflicts.

If you haven't already, install the Miniforge distribution of Python 3.8 for your OS. On Windows, you can use the Anaconda distribution of Python 3.8 instead if you prefer.
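Because the setup targets Python 3.8, a quick check of the running interpreter can save debugging later. This is a minimal sketch using only the standard library; the helper name python_version_matches is hypothetical and not part of the book's code.

```python
import sys

# The setup instructions target Python 3.8, compared as a (major, minor) pair.
EXPECTED_PYTHON = (3, 8)


def python_version_matches(expected=EXPECTED_PYTHON, actual=None):
    """Return True if the interpreter's (major, minor) version equals `expected`."""
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual[:2]) == tuple(expected)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_matches():
        print(f"Python {found}: matches the book's setup")
    else:
        print(f"Python {found}: the book's setup assumes Python 3.8")
```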

Once conda is installed, run the following command:

conda install -c nlpbook nlpbook

Alternatively, if you'd like to keep your environment for this book isolated from the rest of your system (which we highly recommend), run the following commands:

conda create -n nlpbook
conda activate nlpbook
conda install -c nlpbook nlpbook

Then run conda activate nlpbook every time you want to return to your environment. To exit the environment, run conda deactivate.
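To confirm you are inside the book's environment before opening a notebook, you can read the CONDA_DEFAULT_ENV variable, which conda activate sets to the active environment's name. This is a minimal sketch; the helper names active_conda_env and in_nlpbook_env are hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment's name, or None if none is active."""
    env = os.environ if environ is None else environ
    # conda activate sets CONDA_DEFAULT_ENV to the environment's name.
    return env.get("CONDA_DEFAULT_ENV")


def in_nlpbook_env(environ=None, name="nlpbook"):
    """Return True when the active conda environment is the one created above."""
    return active_conda_env(environ) == name


if __name__ == "__main__":
    current = active_conda_env()
    print(f"Active conda environment: {current or 'none'}")
    if not in_nlpbook_env():
        print("Run 'conda activate nlpbook' before opening the notebooks.")
```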

Next, install the spaCy models.

python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_trf
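Each spaCy model is installed as an ordinary Python package named after the model, so you can confirm that all three downloads succeeded without loading them (the large and transformer models are slow to load). This is a minimal sketch using importlib; the helper name missing_spacy_models is hypothetical. Once a model is present, spacy.load("en_core_web_sm") loads it by that same name.

```python
import importlib.util

# The three models downloaded above; each is installed as a package of the same name.
SPACY_MODELS = ("en_core_web_sm", "en_core_web_lg", "en_core_web_trf")


def missing_spacy_models(names=SPACY_MODELS):
    """Return the names in `names` that are not importable in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_spacy_models()
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All spaCy models are installed.")
```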

Setting Up the Environment Directly

If you'd like to get up and running with the book's code quickly, create the environment from the environment.yml file by running the following commands from the root of this repo (see the "Getting the Code" section below to clone the repo first).

conda env create --file environment.yml
conda activate nlpbook

You can also install all the dependencies with pip:

pip install -r requirements.txt

Getting the Code

All publicly released code is in this repository. The simplest way to get started is via Git:

git clone https://github.com/nlpbook/nlpbook.git

If you're on Windows or another platform that doesn't already have Git installed, you may need to install a Git client first.

If you need the version of the code that matches your copy of the book (the code is occasionally updated), you can find previous versions on the releases page.

Getting the Data

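Next, download the data from AWS S3 (the data files are too large to store on GitHub). The commands below use the AWS CLI; the --no-sign-request flag lets you download from the public bucket without AWS credentials.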

aws s3 cp s3://applied-nlp-book/data/ data --recursive --no-sign-request
aws s3 cp s3://applied-nlp-book/models/ag_dataset/ models/ag_dataset --recursive --no-sign-request
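If you'd rather not install the AWS CLI, the same anonymous download can be sketched in Python with boto3. This assumes boto3 is installed separately (it is not part of the setup steps above) and that the bucket allows anonymous reads, as the --no-sign-request flag implies. A client configured with botocore's UNSIGNED signature version plays the role of --no-sign-request, and the path helper mirrors how aws s3 cp --recursive keeps each object's path relative to the prefix. The function names are hypothetical; this is not the book's own download script.

```python
from pathlib import Path, PurePosixPath

BUCKET = "applied-nlp-book"  # bucket name taken from the aws s3 cp commands above


def local_path_for(key, prefix, dest_dir):
    """Map an S3 key under `prefix` to a local file under `dest_dir`,
    keeping the key's path relative to the prefix."""
    relative = PurePosixPath(key).relative_to(prefix.rstrip("/"))
    return Path(dest_dir, *relative.parts)


def download_prefix(prefix, dest_dir, bucket=BUCKET):
    """Anonymously download every object under `prefix` into `dest_dir`."""
    # Imported here so local_path_for can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # An unsigned client sends no credentials, like --no-sign-request.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip zero-byte "folder" placeholder keys
                continue
            target = local_path_for(key, prefix, dest_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, key, str(target))


# Usage (requires network access and boto3):
# download_prefix("data/", "data")
# download_prefix("models/ag_dataset/", "models/ag_dataset")
```

The two commented calls correspond to the two aws s3 cp commands above.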

How This Repo is Organized

Each chapter in the book has a corresponding notebook in the root of this project repository. Chapter notebooks are named chXX.ipynb, where XX is the chapter number; the appendices are named apXX.ipynb.
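Because the notebooks follow the chXX/apXX naming scheme, a short script can list them in reading order. This is a minimal sketch that assumes the notebooks sit directly in the repo root as described above; the function names order_notebooks and list_notebooks are illustrative.

```python
import re
from pathlib import Path

# chXX.ipynb for chapters, apXX.ipynb for appendices.
NOTEBOOK_PATTERN = re.compile(r"^(ch|ap)(\d+)\.ipynb$")


def order_notebooks(names):
    """Keep only chapter/appendix notebooks: chapters first, then appendices, by number."""
    keyed = []
    for name in names:
        match = NOTEBOOK_PATTERN.match(name)
        if match:
            prefix, number = match.groups()
            # False sorts before True, so chapters come before appendices.
            keyed.append((prefix == "ap", int(number), name))
    return [name for _, _, name in sorted(keyed)]


def list_notebooks(repo_root="."):
    """List the book's notebooks found directly in `repo_root`, in reading order."""
    return order_notebooks(path.name for path in Path(repo_root).iterdir())
```

Sorting on the parsed number rather than the raw file name keeps the order correct even if a number is not zero-padded.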

Note: This repo only contains the code for the chapters, not the actual text in the book. For the complete text, please purchase a copy of the book. Chapters 1, 2, and 3 have been open-sourced, courtesy of O'Reilly and the authors.

Once you've navigated to the nlpbook project directory, you can launch a Jupyter client such as JupyterLab, Jupyter Notebook, or VS Code to view and run the notebooks.

Contributions and Errata

We welcome any suggestions, feedback, and errata from readers. If you notice anything that seems off in the book or could use improvement, we'd love to hear from you. Feel free to submit an issue here on GitHub or on our errata page.

Copyright Notice

This material is made available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.

Note: You are free to use the code in accordance with the MIT license, but you are not allowed to redistribute or sell any of the text presented in chapters 1, 2, and 3, which have been open-sourced for the benefit of the community. Please consider purchasing a copy of the book if you are interested in reading the text that accompanies the code presented in this repo.

Comments
  • Download failed for train_prepared.csv

    download failed: s3://applied-nlp-book/data/ag_dataset/prepared/train_prepared.csv to data/train_prepared.csv An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

    opened by sharma-ji
  • Chapter 05: data contains no attribute "Field"

    In chapter 05, when setting up the fields for training an embedding on the IMDB data, you propose:

    TEXT = data.Field(lower=True, include_lengths=True,
                      batch_first=False, tokenize='spacy')
    LABEL = data.LabelField()

    However, data has not been defined yet. The module data imported from torchtext.__all__ does not contain an attribute Field, and I couldn't find it in the torchtext sources either.

    Can you advise, or define data?

    My PyTorch version: 1.9.0. My torchtext version: 0.10.0.

    opened by iNLyze
  • No 'data' folder in Ch. 1

    Hello,

    I purchased your book and started reading Ch. 1. Great book so far. I tried to reproduce what is written in the book and the notebook, but there is no "data" folder from which to load the Jeopardy questions. I suspect this won't be the last missing piece, even though I'm only on the first chapter. Could you run your notebooks in a fresh environment and check what is missing? Thank you in advance. One option would be to make your notebooks run in Colab; you could then add a setup step at the beginning of each chapter so users won't have issues running the scripts.

    opened by knslee07
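Regarding the Chapter 05 report above (torchtext 0.10.0 has no torchtext.data.Field): as we understand torchtext's release history, the classic Field and LabelField API lived in torchtext.data up to 0.8, moved to torchtext.legacy.data in 0.9, and was removed in 0.12. The sketch below picks the import path from the installed version under those assumed boundaries; the helper names field_module_for and load_legacy_data are hypothetical and not part of the book's code.

```python
import importlib


def field_module_for(torchtext_version):
    """Return the module path that provides Field/LabelField for a torchtext version.

    Assumed boundaries: torchtext.data up to 0.8,
    torchtext.legacy.data from 0.9 through 0.11, removed in 0.12.
    """
    major, minor = (int(part) for part in torchtext_version.split(".")[:2])
    if (major, minor) < (0, 9):
        return "torchtext.data"
    if (major, minor) < (0, 12):
        return "torchtext.legacy.data"
    raise ImportError(
        f"torchtext {torchtext_version} no longer includes the legacy Field API; "
        "install torchtext<0.12 to run this code as written"
    )


def load_legacy_data():
    """Import the module that the Chapter 05 code refers to as `data`."""
    import torchtext  # imported lazily so field_module_for works without torchtext

    return importlib.import_module(field_module_for(torchtext.__version__))


# Usage with the Chapter 05 snippet (requires a compatible torchtext install):
# data = load_legacy_data()
# TEXT = data.Field(lower=True, include_lengths=True,
#                   batch_first=False, tokenize='spacy')
# LABEL = data.LabelField()
```

With the torchtext 0.10.0 install from the report, this resolves to torchtext.legacy.data. Note that pinning an older torchtext typically also constrains which PyTorch version you can use.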
Releases (v1.0.0)
  • v1.0.0 (May 29, 2021)

    This is the initial public release of the source code for "Applied Natural Language Processing in the Enterprise" by Ankur A. Patel and Ajay Uppili Arasanipalai.
