Python wrapper for Stanford CoreNLP.

Last update: Dec 25, 2022

Overview

stanfordcorenlp

stanfordcorenlp is a Python wrapper for Stanford CoreNLP. It provides a simple API for text processing tasks such as Tokenization, Part of Speech Tagging, Named Entity Reconigtion, Constituency Parsing, Dependency Parsing, and more.

Prerequisites

Java 1.8+ (Check with command: java -version) (Download Page)

Stanford CoreNLP (Download Page)

Py Version	CoreNLP Version
v3.7.0.1 v3.7.0.2	CoreNLP 3.7.0
v3.8.0.1	CoreNLP 3.8.0
v3.9.1.1	CoreNLP 3.9.1

Installation

pip install stanfordcorenlp

Example

Simple Usage

# Simple usage
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print 'Tokenize:', nlp.word_tokenize(sentence)
print 'Part of Speech:', nlp.pos_tag(sentence)
print 'Named Entities:', nlp.ner(sentence)
print 'Constituency Parsing:', nlp.parse(sentence)
print 'Dependency Parsing:', nlp.dependency_parse(sentence)

nlp.close() # Do not forget to close! The backend server will consume a lot memery.

Output format:

# Tokenize
[u'Guangdong', u'University', u'of', u'Foreign', u'Studies', u'is', u'located', u'in', u'Guangzhou', u'.']

# Part of Speech
[(u'Guangdong', u'NNP'), (u'University', u'NNP'), (u'of', u'IN'), (u'Foreign', u'NNP'), (u'Studies', u'NNPS'), (u'is', u'VBZ'), (u'located', u'JJ'), (u'in', u'IN'), (u'Guangzhou', u'NNP'), (u'.', u'.')]

# Named Entities
 [(u'Guangdong', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'of', u'ORGANIZATION'), (u'Foreign', u'ORGANIZATION'), (u'Studies', u'ORGANIZATION'), (u'is', u'O'), (u'located', u'O'), (u'in', u'O'), (u'Guangzhou', u'LOCATION'), (u'.', u'O')]

# Constituency Parsing
 (ROOT
  (S
    (NP
      (NP (NNP Guangdong) (NNP University))
      (PP (IN of)
        (NP (NNP Foreign) (NNPS Studies))))
    (VP (VBZ is)
      (ADJP (JJ located)
        (PP (IN in)
          (NP (NNP Guangzhou)))))
    (. .)))

# Dependency Parsing
[(u'ROOT', 0, 7), (u'compound', 2, 1), (u'nsubjpass', 7, 2), (u'case', 5, 3), (u'compound', 5, 4), (u'nmod', 2, 5), (u'auxpass', 7, 6), (u'case', 9, 8), (u'nmod', 7, 9), (u'punct', 7, 10)]

Other Human Languages Support

Note: you must download an additional model file and place it in the .../stanford-corenlp-full-2018-02-27 folder. For example, you should download the stanford-chinese-corenlp-2018-02-27-models.jar file if you want to process Chinese.

# _*_coding:utf-8_*_

# Other human languages support, e.g. Chinese
sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27', lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))

General Stanford CoreNLP API

Since this will load all the models which require more memory, initialize the server with more memory. 8GB is recommended.

 # General json output
nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')
print nlp.annotate(sentence)
nlp.close()

You can specify properties:

annotators: tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref (See Detail)
pipelineLanguage: en, zh, ar, fr, de, es (English, Chinese, Arabic, French, German, Spanish) (See Annotator Support Detail)
outputFormat: json, xml, text

text = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \
       'GDUFS is active in a full range of international cooperation and exchanges in education. '

props={'annotators': 'tokenize,ssplit,pos','pipelineLanguage':'en','outputFormat':'xml'}
print nlp.annotate(text, properties=props)
nlp.close()

Use an Existing Server

Start a CoreNLP Server with command:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

And then:

# Use an existing server
nlp = StanfordCoreNLP('http://localhost', port=9000)

Debug

import logging
from stanfordcorenlp import StanfordCoreNLP

# Debug the wrapper
nlp = StanfordCoreNLP(r'path_or_host', logging_level=logging.DEBUG)

# Check more info from the CoreNLP Server 
nlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)
nlp.close()

Build

We use setuptools to package our project. You can build from the latest source code with the following command:

$ python setup.py bdist_wheel --universal

You will see the .whl file under dist directory.

Comments

INFO:root:Waiting until the server is available.

Hi, I'm trying to use this library for a project where I would need corefence resolution. I was testing out the test.py file and this kept on showing indefinitely, how do I fix this?

Thanks in advance.

opened by antoineChammas 13
中文指代消解没有输出

我在general api里添加dcoref的时候，没有输出结果

但是pipeline中去掉dcoref,就会有正确的输出，英文部分并不存在这个问题，我在源码中看到中文和英文的处理代码除了model文件之外几乎没有区别，请问是这个wrapper现在还不支持中文的指代消解吗？

下面的代码是没有问题的测试代码

opened by chenjiaxiang 10
Problems with NER

When I tried to run the demo code for ner: print 'Named Entities:', nlp.ner(sentence) Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 146, in ner r_dict = self._request('ner', sentence) File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 171, in _request r_dict = json.loads(r.text) File "/usr/lib/pypy/lib-python/2.7/json/init.py", line 347, in loads return _default_decoder.decode(s) File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 363, in decode obj, end = self.raw_decode(s, idx=WHITESPACE.match(s, 0).end()) File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 381, in raw_decode raise ValueError("No JSON object could be decoded") ValueError: No JSON object could be decoded

Does anyone know how to solve this problem? I use StanfordCoreNLP version 3.7.0

opened by TwinkleChow 10

No JSON object error in test.py

When I ran the test with python 2.7, I got the following error:

Initializing native server...
java -Xmx4g -cp "/home/ehsan/Java/JavaLibraries/stanford-corenlp-full-2016-10-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
The server is available.
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print('Tokenize:', nlp.word_tokenize(sentence))
  File "/home/ehsan/Python/stanford-corenlp/stanfordcorenlp/corenlp.py", line 78, in word_tokenize
    r_dict = self._request('ssplit,tokenize', sentence)
  File "/home/ehsan/Python/stanford-corenlp/stanfordcorenlp/corenlp.py", line 114, in _request
    r_dict = json.loads(r.text)
  File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

opened by ehsanmok 10

JSONDecoderError: Expecting Value line 1 column 1

I was trying to do NER with my text

from stanfordcorenlp import StanfordCoreNLP

documents = pd.read_csv('some csv file')['documents'].values.tolist() text = documents[0] ## test is a string

nlp = StanfordCoreNLP('my_path\stanford-corenlp-full-2018-02-27') ## latest version print(nlp.ner(text))

But I keep getting this error

raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

opened by yceny 6
FileNotFoundError

I'm getting a file not found error as shown below:

Traceback (most recent call last): File "D:/Users/[user]/Documents/NLP/arabic_tagger/build_model.py", line 6, in <module> nlp = StanfordCoreNLP(corenlp_path, lang='ar', memory='4g') File "C:\Users\[user]\AppData\Roaming\Python\Python36\site-packages\stanfordcorenlp\corenlp.py", line 46, in __init__ if not subprocess.call(['java', '-version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) == 0: File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 267, in call with Popen(*popenargs, **kwargs) as p: File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__ restore_signals, start_new_session) File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 997, in _execute_child startupinfo) FileNotFoundError: [WinError 2] The system cannot find the file specified

I have used this wrapper before and am using it in the same way as always:

corenlp_path = 'D:/Users/[user]/Desktop/StanfordCoreNLP/Full_CoreNLP_3.8.0' nlp = StanfordCoreNLP(corenlp_path, lang='ar', memory='4g')

Just to be sure, I downloaded version 3.8.0 as well as the Arabic models and made sure they are in the path specified. I'm wondering if the FileNotFoundError is not referring to the CoreNLP path but something else... subprocess.py is in the correct directory. So yeah... not sure what's wrong/what to do.

Thanks!

opened by mjrinker 5
Batch processing

It is a great wrapper. Can you make it run as a batch process as it is too slow to run this each time for a new sentence? I need to make it dependency parse several sentences within seconds. Please look into the issue.

opened by Subh1m 5
How to use stanfordcorenlp to replace all pronouns in a sentence with the nouns

How can I use standfordcorenlp to replace all pronouns with their nouns in a sentence. For example the sentence is : Fred Rogers lives in a house with pets. It is two stories, and he has a dog, a cat, a rabbit, three goldfish, and a monkey.

This needs to be converted to : Fred Rogers lives in a house with pets. House is two stories, and Fred Rogers has a dog, a cat, a rabbit, three goldfish, and a monkey.

I had used the below code nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2017-06-09', quiet=False) props = {'annotators': 'coref', 'pipelineLanguage': 'en'}

text = 'Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.' result = json.loads(nlp.annotate(text, properties=props))

mentions = result['corefs'].items()

Although, I cannot understand how to read through mentions and perform what I want to do.

opened by manavpolikara 4
Connecting to a unavailable server doesn't throw exception

When trying to connect to a existing server that is unavailable the code throws no exception and becomes unresponsive while trying to annotate leading to timeout errors.

opened by GanadiniAkshay 3

How to extract relations from stanford annotatedText?

I am able to annotate text successfully by using stanford-corenlp as follows

nlp = StanfordCoreNLP('http://localhost', port=9000)
sentence = '''Michael James editor of Publishers Weekly,
                 Bill Gates is the owner of Microsoft,
                 Obama is the owner of Microsoft,
                 Satish lives in Hyderabad'''

props={'annotators': 'tokenize, ssplit, pos, lemma, ner, regexner,coref',
       'regexner.mapping':'training.txt','pipelineLanguage':'en'}

annotatedText = json.loads(nlp.annotate(sentence, properties=props))

I am trying to get the relation of annotatedText, but it returns nothing

roles = """
    (.*(                   
    analyst|
    owner|
    lives|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
ROLES = re.compile(roles, re.VERBOSE)
for rel in nltk.sem.extract_rels('PERSON', 'ORGANIZATION', annotatedText,corpus='ace', pattern = ROLES):
print(nltk.sem.rtuple(rel))

can you please help how to extract the relation from Stanford annotated text using NLTK nltk.sem.extract_rels

opened by satishkumarkt 3

AttributeError: 'NoneType' object has no attribute 'PROCFS_PATH

Exception ignored in: <bound method StanfordCoreNLP.del of <stanfordcorenlp.corenlp.StanfordCoreNLP object at 0x7f5d1e4856d8>> Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/stanfordcorenlp/corenlp.py", line 111, in del File "/usr/lib/python3/dist-packages/psutil/init.py", line 349, in init File "/usr/lib/python3/dist-packages/psutil/init.py", line 370, in _init File "/usr/lib/python3/dist-packages/psutil/_pslinux.py", line 849, in init File "/usr/lib/python3/dist-packages/psutil/_pslinux.py", line 151, in get_procfs_path AttributeError: 'NoneType' object has no attribute 'PROCFS_PATH AttributeError: 'NoneType' object has no attribute 'PROCFS_PATH

opened by yzho0907 3

Unable to print word_tokenize Chinese word

I used the example code:

from stanfordcorenlp import StanfordCoreNLP

# Other human languages support, e.g. Chinese
sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'install_packages/stanford-corenlp-full-2016-10-31', lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))

And my output is:

['', '', '', '', '']
[('', 'NR'), ('', 'NN'), ('', 'VV'), ('', 'NR'), ('', 'PU')]
[('', 'ORGANIZATION'), ('', 'ORGANIZATION'), ('', 'O'), ('', 'GPE'), ('', 'O')]
(ROOT
  (IP
    (NP (NR 清华) (NN 大学))
    (VP (VV 位于)
      (NP (NR 北京)))
    (PU 。)))
[('ROOT', 0, 3), ('compound:nn', 2, 1), ('nsubj', 3, 2), ('dobj', 3, 4), ('punct', 3, 5)]

It seems that parse code is able to print correct result, but other codes can't. I don't know why this happened.

opened by PolarisRisingWar 0

'json.decoder.JSONDecodeError' Error
As the lastest version of Chinese language package met the 'json.decoder.JSONDecodeError', which is caused by http request error :[500], the fuction _request should be updated as(change the annotators conf) :

def _request(self, annotators=None, data=None, *args, **kwargs): if sys.version_info.major >= 3: data = data.encode('utf-8') if annotators: annotators = 'tokenize,ssplit' + ',' + annotators # NEW language package({version} > 3.9.1.1) API cmd must inintial with these two annototars:tokenize,ssplit properties = {'annotators': annotators, 'outputFormat': 'json'} params = {'properties': str(properties), 'pipelineLanguage': self.lang} if 'pattern' in kwargs: params = {"pattern": kwargs['pattern'], 'properties': str(properties), 'pipelineLanguage': self.lang}

logging.info(params) r = requests.post(self.url, params=params, data=data, headers={'Connection': 'close'}) print(self.url, r) r_dict = json.loads(r.text) return r_dict
opened by mvllwong 0

how to extract data anotator

How can I handle error. If there is error like this I'm using like this syntax

after several trial. I have notice, to load json using this function from this issue

import nltk, json, pycorenlp, stanfordcorenlp
from clause import clauseSentence

url_stanford = 'http://corenlp.run'
model_stanford = "stanford-corenlp-4.0.0"
try:
  nlp = pycorenlp.StanfordCoreNLP(url_stanford) # pycorenlp
except:
  nlp = stanfordcorenlp.StanfordCoreNLP(model_stanford) # stanford nlp

props={"annotators":"parse","outputFormat": "json"}
sent = 'The product shall plot the data points in a scientifically correct manner'
dataform = nlp.annotate(sent, properties=props)
try:
  dt = clauseSentence().print_clauses(dataform['sentences'][0]['parse'])
except:
  parser = json.loads(dataform)
  dt = clauseSentence().print_clauses(parser['sentences'][0]['parse'])
print(dt)

but there is problem that result this one. JSONDecodeError: Expecting value: line 1 column 1 (char 0)

opened by asyrofist 0

error with "%"

when I use this tool to analyse a text which include "%", it raise error like this:

and I fix it by changing the post code in corenlp.py like this:

opened by liuxin99 0
How to use 4class NER classifier instead of default classifier

Hi, I am trying to use 4class NER classifier but not sure how to do this. I tried to search online and found a post but unable to do this in the above Python package. Any help will be greatly appreciated. I tried to follow this link below but not successful. https://stackoverflow.com/questions/31711995/how-to-load-a-specific-classifier-in-stanfordcorenlp

opened by KRSTD 0

Releases(v3.9.1.1)

v3.9.1.1(Apr 25, 2018)
Add coref function

Change token['word'] to token['originalText']

Source code(tar.gz)
Source code(zip)
stanfordcorenlp-3.9.1.1-py2.py3-none-any.whl(5.59 KB)
v3.8.0.1(Feb 14, 2018)

stanfordcorenlp 3.8.0.1 released! More robust exit strategy and more stable.

It's Valentine's Day, my girlfriend seems to lose interest in me.

I love you not only for what you are, but for what I am when I am with you.
Source code(tar.gz)
Source code(zip)
stanfordcorenlp-3.8.0.1-py2.py3-none-any.whl(5.49 KB)
v3.7.0.2(Jun 7, 2017)
Add

Support Python 3.x

More robust checking

Source code(tar.gz)
Source code(zip)
stanfordcorenlp-3.7.0.2-py2.py3-none-any.whl(4.96 KB)
v3.7.0.1(May 28, 2017)

stanfordcorenlp is a Python wrapper for Stanford CoreNLP. It provides a simple API for text processing tasks such as Tokenization, Part of Speech Tagging, Named Entity Reconigtion, Constituency Parsing, Dependency Parsing, and more.
Source code(tar.gz)
Source code(zip)
stanfordcorenlp-3.7.0.1-py2-none-any.whl(3.65 KB)

Owner

GitHub Repository

A collection of automation aids to connect various database systems into Lookout for Metrics

3 Apr 28, 2022

A Discord bot to allow people to create lists of random characters, with limit reroll options.

Mugen Bot A small bot I made to practice python and allow people to publically select random characters on a discord server. Uses py-cord, as that is

2 Feb 06, 2022

摩尔庄园手游脚本

摩尔庄园 BlueStacks 脚本手游上线，情怀再起，但面对游戏中枯燥无味的每日任务和资源采集，你是否觉得肝疼呢？本项目通过生成 BlueStacks 模拟器的宏脚本，帮助玩家护肝。使用脚本请阅读使用方式和对应的功能及说明联系 Telegram 频道 @mole61 Telegram

43 Dec 16, 2022

discord bot made in discord.py

udeline discord bot made in discord.py, which's main features include: general use server moderation fun commands other cool commands dependencies dis

1 Feb 08, 2022

Filters to block and remove copycat-websites from DuckDuckGo and Google. Specific to dev websites like StackOverflow or GitHub.

uBlock-Origin-dev-filter Filters to block and remove copycat-websites from DuckDuckGo and Google. Specific to dev websites like StackOverflow or GitHu

1.7k Dec 30, 2022

How to make a QR Code of your own in python

QR CODE Bilgilendirme! " pip install qrcode pillow " kurmalısınız.

1 Dec 24, 2021

Minimal Python client for the Iris API, built on top of Authlib and httpx.

🕸️ Iris Python Client Minimal Python client for the Iris API, built on top of Authlib and httpx. Installation pip install dioptra-iris-client Usage f

1 Jan 28, 2022

AWS Lambda - Parsing Cloudwatch Data and sending the response via email.

AWS Lambda - Parsing Cloudwatch Data and sending the response via email. Author: Evan Erickson Language: Python Backend: AWS / Serverless / AWS Lambda

1 Nov 14, 2021

VideoMergeDcBot1 - Video Merge Dc Bot for telegram

VIDEO MERGE BOT An Telegram Bot Demo 👉 @VideoMergeDcBot To Merge multiple Video

2 Feb 04, 2022

SkyzoMusicBot - Bot Music Telegram By Skyzo

SKYZO MUSIC BOT Telegram Music Bot And Stream Feature New Version Ready to use m

19 Apr 08, 2022

Quickly and efficiently delete your entire tweet history with the help of your Twitter archive without worrying about the pointless 3200 tweet limit imposed by Twitter.

Twitter Nuke Quickly and efficiently delete your entire tweet history with the help of your Twitter archive without worrying about the puny and pointl

73 Dec 12, 2022

Python client for Arista eAPI

Arista eAPI Python Library The Python library for Arista's eAPI command API implementation provides a client API work using eAPI and communicating wit

124 Nov 23, 2022

Discord py bot that plays magic the gathering.

Klunker Discord py bot that can play magic the gathering Bug Hunter Hello Bug Hunters. To help out with production of this bot, we need help catching

0 Apr 25, 2022

Termux Pkg

PKG Install Termux All Basic Pkg. Installation : pkg update && pkg upgrade && pkg install python && pkg install python2 && pkg install git && git clon

1 Oct 28, 2021

Male' Map Telegram Bot

Male' Map TelegramBot A simple TelegramBot to fetch residential addresses in Male', Maldives. The bot can be queried inline or directly. sample .env f

12 Nov 25, 2022

Stock trading bot made using the Robinhood API / Python library...

High-Low Stock trading bot made using the Robinhood API / Python library... Index Installation Use Development Notes Installation To Install and run t

1 Jan 07, 2022

We’re releasing an open-source tool you can use now, which we developed as a homemade Just-In-Time database access control tool for our sensitive database. This tool syncs with our directory service, slack, SIEM, and finally, our Apache Cassandra database.

Cassandra Access Control By Aner Izraeli - Intezer Security Manager ([email p

6 Mar 31, 2022

A bot to view Garfield comics directly from Discord and get updates of the comics automatically

Garfield-Bot A bot to view Garfield comics directly from Discord and get updates of the comics automatically. Instructions to use the bot: Invite the

3 Feb 13, 2022

The Simple Google Colab Notebook to Download Files from Direct Link to Google Drive with custom name and bulk link support.

Direct Link to Google Drive (Advanced! 🔥 ) The Most Advanced yet Simple Google Colab Notebook to Download Files from Direct Link to Google Drive. 🆕

14 Jul 26, 2022

Simple Python script that lets you upload image/video to imgur

Pymgur 🐍 Simple Python script that lets you upload image/video to imgur! Usage 🔨 Git Clone this repository install the requirements (pip install -r

3 Feb 20, 2022