Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

Overview

NeuroNER

NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com.

This page gives step-by-step instructions to install and use NeuroNER.

Requirements

NeuroNER relies on Python 3, TensorFlow 1.0+, and optionally on BRAT:

  • Python 3: NeuroNER does not work with Python 2.x. On Windows, it has to be Python 3.6 64-bit or later.
  • TensorFlow is a library for machine learning. NeuroNER uses it for its NER engine, which is based on neural networks. Official website: https://www.tensorflow.org
  • BRAT (optional) is a web-based annotation tool. It only needs to be installed if you wish to conveniently create annotations or view the predictions made by NeuroNER. Official website: http://brat.nlplab.org

Installation

For GPU support, the GPU requirements for TensorFlow must be satisfied. If your system does not meet these requirements, you should use the CPU version. To install NeuroNER:

# For CPU support (no GPU support):
pip3 install pyneuroner[cpu]

# For GPU support:
pip3 install pyneuroner[gpu]

You will also need to download some support packages.

  1. The English language module for spaCy:
# Download the SpaCy English module
python -m spacy download en
  2. Download word embeddings from http://neuroner.com/data/word_vectors/glove.6B.100d.zip and unzip them to the folder ./data/word_vectors:
# Get word embeddings
wget -P data/word_vectors http://neuroner.com/data/word_vectors/glove.6B.100d.zip
unzip data/word_vectors/glove.6B.100d.zip -d data/word_vectors/
  3. Load sample datasets. These can be loaded by calling the neuromodel.fetch_data() function from a Python interpreter or with the --fetch_data argument at the command line.
# Load a dataset from the command line
neuroner --fetch_data=conll2003
neuroner --fetch_data=example_unannotated_texts
neuroner --fetch_data=i2b2_2014_deid
# Load a dataset from a Python interpreter
from neuroner import neuromodel
neuromodel.fetch_data('conll2003')
neuromodel.fetch_data('example_unannotated_texts')
neuromodel.fetch_data('i2b2_2014_deid')
  4. Load a pretrained model. Models can be loaded by calling the neuromodel.fetch_model() function from a Python interpreter or with the --fetch_trained_model argument at the command line.
# Load a pre-trained model from the command line
neuroner --fetch_trained_model=conll_2003_en
neuroner --fetch_trained_model=i2b2_2014_glove_spacy_bioes
neuroner --fetch_trained_model=i2b2_2014_glove_stanford_bioes
neuroner --fetch_trained_model=mimic_glove_spacy_bioes
neuroner --fetch_trained_model=mimic_glove_stanford_bioes
# Load a pre-trained model from a Python interpreter
from neuroner import neuromodel
neuromodel.fetch_model('conll_2003_en')
neuromodel.fetch_model('i2b2_2014_glove_spacy_bioes')
neuromodel.fetch_model('i2b2_2014_glove_stanford_bioes')
neuromodel.fetch_model('mimic_glove_spacy_bioes')
neuromodel.fetch_model('mimic_glove_stanford_bioes')

Installing BRAT (optional)

BRAT is a tool that can be used to create, modify, or view BRAT-style annotations. For installation and usage instructions, see the BRAT website.

Installing Perl (platform dependent)

Perl is required because the official CoNLL-2003 evaluation script is written in this language. On Unix and Mac OS X systems, Perl should already be installed. On Windows systems, you may need to install it, for example from http://strawberryperl.com.

Using NeuroNER

NeuroNER can either be run from the command line or from a Python interpreter.

Using NeuroNER from a Python interpreter

To use NeuroNER from a Python interpreter, create an instance of neuromodel.NeuroNER with your desired arguments, and then call the relevant methods. Additional parameters can be set from a parameters.ini file in the working directory. For example:

from neuroner import neuromodel
nn = neuromodel.NeuroNER(train_model=False, use_pretrained_model=True)

More detail to follow.
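
In the meantime, here is a minimal sketch of a prediction workflow from a Python interpreter. It assumes that a pretrained model such as conll_2003_en has already been fetched (see above) and that the NeuroNER object exposes predict() and close() methods; check the package source if your version differs:

# Minimal sketch (assumed API): load a pretrained model and tag a sentence
from neuroner import neuromodel

nn = neuromodel.NeuroNER(train_model=False, use_pretrained_model=True,
                         pretrained_model_folder='./trained_models/conll_2003_en')
# predict() is assumed to return the entities found in the given text
entities = nn.predict('Barack Obama was born in Hawaii.')
print(entities)
# Release the TensorFlow session when done
nn.close()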

Using NeuroNER from the command line

By default, NeuroNER is configured to train and test on the CoNLL-2003 dataset. Running neuroner with the default settings starts training on CoNLL-2003 (the F1-score on the test set should be around 0.90, i.e. on par with state-of-the-art systems). To start the training:

# To use the CPU if you have installed tensorflow, or use the GPU if you have installed tensorflow-gpu:
neuroner

# To use the CPU only if you have installed tensorflow-gpu:
CUDA_VISIBLE_DEVICES="" neuroner

# To use only GPU 1 if you have installed tensorflow-gpu:
CUDA_VISIBLE_DEVICES=1 neuroner

If you wish to change any of NeuroNER's parameters, you can modify the parameters.ini configuration file in your working directory or pass the parameter as a command-line argument.

For example, to reduce the number of training epochs and not use any pre-trained token embeddings:

neuroner --maximum_number_of_epochs=2 --token_pretrained_embedding_filepath=""
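
The same change can be made in the parameters.ini configuration file instead of the command line. A sketch, using the section names from the default configuration file:

# Equivalent settings in parameters.ini
[training]
maximum_number_of_epochs = 2

[ann]
# Leave empty to use random initialization instead of pre-trained embeddings
token_pretrained_embedding_filepath =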

To perform NER on some plain texts using a pre-trained model:

neuroner --train_model=False --use_pretrained_model=True --dataset_text_folder=./data/example_unannotated_texts --pretrained_model_folder=./trained_models/conll_2003_en

If a parameter is specified in both the parameters.ini configuration file and as an argument, then the argument takes precedence (i.e., the parameter in parameters.ini is ignored). You may specify a different configuration file with the --parameters_filepath command line argument. The command line arguments have no default value except for --parameters_filepath, which points to parameters.ini.

NeuroNER has three modes of operation (example invocations follow the list):

  • training mode (from scratch): the dataset folder must have train and valid sets. Test and deployment sets are optional.
  • training mode (from pretrained model): the dataset folder must have train and valid sets. Test and deployment sets are optional.
  • prediction mode (using pretrained model): the dataset folder must have either a test set or a deployment set.
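
For illustration, the three modes roughly correspond to the following invocations, where ./data/my_dataset is a placeholder for your own dataset folder:

# Training mode (from scratch)
neuroner --train_model=True --use_pretrained_model=False --dataset_text_folder=./data/my_dataset

# Training mode (from a pretrained model)
neuroner --train_model=True --use_pretrained_model=True --dataset_text_folder=./data/my_dataset --pretrained_model_folder=./trained_models/conll_2003_en

# Prediction mode (using a pretrained model)
neuroner --train_model=False --use_pretrained_model=True --dataset_text_folder=./data/my_dataset --pretrained_model_folder=./trained_models/conll_2003_en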

Adding a new dataset

A dataset may be provided in either CoNLL-2003 or BRAT format (small format examples follow the lists below). The dataset files and folders should be organized and named as follows:

  • Training set: train.txt file (CoNLL-2003 format) or train folder (BRAT format). It must contain labels.
  • Validation set: valid.txt file (CoNLL-2003 format) or valid folder (BRAT format). It must contain labels.
  • Test set: test.txt file (CoNLL-2003 format) or test folder (BRAT format). It must contain labels.
  • Deployment set: deploy.txt file (CoNLL-2003 format) or deploy folder (BRAT format). It shouldn't contain any label (if it does, labels are ignored).

We provide several examples of datasets:

  • data/conll2003/en: annotated dataset with the CoNLL-2003 format, containing 3 files (train.txt, valid.txt and test.txt).
  • data/example_unannotated_texts: unannotated dataset with the BRAT format, containing 1 folder (deploy/). Note that the BRAT format with no annotation is the same as plain texts.
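
For reference, a CoNLL-2003 file has one token per line with its NER label in the last column (the standard CoNLL-2003 files also include POS and chunk tags) and a blank line between sentences, while a BRAT folder contains .txt files with matching .ann standoff files. A small illustrative fragment of each; the file name train/doc_001.ann is only an example, and its offsets assume a text file starting with "EU rejects German call to boycott British lamb .":

# train.txt (CoNLL-2003 format: token, POS tag, chunk tag, NER label)
EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O

# train/doc_001.ann (BRAT format; tab-separated: id, type with character offsets, surface text)
T1	ORG 0 2	EU
T2	MISC 11 17	German
T3	MISC 34 41	British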

Using a pretrained model

In order to use a pretrained model, the pretrained_model_folder parameter in the parameters.ini configuration file must be set to the folder containing the pretrained model. The following parameters in parameters.ini must also be set to the same values as in the configuration file located in the specified pretrained_model_folder (example values follow the list):

use_character_lstm
character_embedding_dimension
character_lstm_hidden_state_dimension
token_pretrained_embedding_filepath
token_embedding_dimension
token_lstm_hidden_state_dimension
use_crf
tagging_format
tokenizer
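
For the conll_2003_en model, for example, these values typically look as follows (taken from the default configuration; always verify them against the parameters.ini shipped inside the pretrained model folder, and adjust the embeddings path to wherever you unzipped GloVe):

use_character_lstm = True
character_embedding_dimension = 25
character_lstm_hidden_state_dimension = 25
token_pretrained_embedding_filepath = ./data/word_vectors/glove.6B.100d.txt
token_embedding_dimension = 100
token_lstm_hidden_state_dimension = 100
use_crf = True
tagging_format = bioes
tokenizer = spacy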

Sharing a pretrained model

You are highly encouraged to share models trained on your own datasets, so that other users can use them on their data. We provide the neuroner/prepare_pretrained_model.py script to make it easy to prepare a pretrained model for sharing. To use the script, you only need to specify the output_folder_name, epoch_number, and model_name parameters in the script.
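
A sketch of that edit, with placeholder values for the three variables named above:

# In neuroner/prepare_pretrained_model.py (placeholder values)
output_folder_name = 'my_pretrained_model'
epoch_number = 22
model_name = 'my_dataset_glove_spacy_bioes'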

By default, the only information about the dataset contained in the pretrained model is the list of tokens that appear in the training dataset and the corresponding embeddings learned from that dataset.

If you wish to share a pretrained model without providing any information about the dataset (including the list of tokens appearing in the dataset), you can do so by setting

delete_token_mappings = True

when running the script. In this case, it is highly recommended to use some external pre-trained token embeddings and freeze them while training the model to obtain high performance. This can be done by specifying the token_pretrained_embedding_filepath and setting

freeze_token_embeddings = True

in the parameters.ini configuration file during training.
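
Using the section names from the default configuration file, this corresponds to something like:

[ann]
token_pretrained_embedding_filepath = ./data/word_vectors/glove.6B.100d.txt

[advanced]
freeze_token_embeddings = True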

In order to share a pretrained model, please submit a new issue on the GitHub repository.

Using TensorBoard

You may launch TensorBoard during or after the training phase. To do so, run the following command in the terminal from the NeuroNER folder:

tensorboard --logdir=output

This starts a web server that is accessible at http://127.0.0.1:6006 from your web browser.

Citation

If you use NeuroNER in your publications, please cite this paper:

@article{2017neuroner,
  title={{NeuroNER}: an easy-to-use program for named-entity recognition based on neural networks},
  author={Dernoncourt, Franck and Lee, Ji Young and Szolovits, Peter},
  journal={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2017}
}

The neural network architecture used in NeuroNER is described in this article:

@article{2016deidentification,
  title={De-identification of Patient Notes with Recurrent Neural Networks},
  author={Dernoncourt, Franck and Lee, Ji Young and Uzuner, Ozlem and Szolovits, Peter},
  journal={Journal of the American Medical Informatics Association (JAMIA)},
  year={2016}
}
Comments
  • Trouble running main.py missing module attribute-distutils-util


    Hey, thanks for the code. Unfortunately I am having difficulty running the pretrained conll_2003_en model. My parameters.ini file looks like this:

    
    #----- Possible modes of operation -----------------------------------------------------------------------------------------------------------------#
    # training mode (from scratch): set train_model to True, and use_pretrained_model to False (if training from scratch).                        #
    #				 				Must have train and valid sets in the dataset_text_folder, and test and deployment sets are optional.               #
    # training mode (from pretrained model): set train_model to True, and use_pretrained_model to True (if training from a pretrained model).     #
    #				 						 Must have train and valid sets in the dataset_text_folder, and test and deployment sets are optional.      #
    # prediction mode (using pretrained model): set train_model to False, and use_pretrained_model to True.                                       #
    #											Must have either a test set or a deployment set.                                                        #
    # NOTE: Whenever use_pretrained_model is set to True, pretrained_model_folder must be set to the folder containing the pretrained model to use, and #
    # 		model.ckpt, dataset.pickle and parameters.ini must exist in the same folder as the checkpoint file.                                         #
    #---------------------------------------------------------------------------------------------------------------------------------------------------#
    
    [mode]
    # At least one of use_pretrained_model and train_model must be set to True.
    train_model = False
    use_pretrained_model = True
    pretrained_model_folder = ../trained_models/conll_2003_en
    
    [dataset]
    dataset_text_folder = ../data/conll2003/en
    
    # main_evaluation_mode should be either 'conll', 'bio', 'token', or 'binary'. ('conll' is entity-based)
    # It determines which metric to use for early stopping, displaying during training, and plotting F1-score vs. epoch.
    main_evaluation_mode = conll
    
    output_folder = ../output
    
    #---------------------------------------------------------------------------------------------------------------------#
    # The parameters below are for advanced users. Their default values should yield good performance in most cases.      #
    #---------------------------------------------------------------------------------------------------------------------#
    
    [ann]
    use_character_lstm = True
    character_embedding_dimension = 25
    character_lstm_hidden_state_dimension = 25
    
    # In order to use random initialization instead, set token_pretrained_embedding_filepath to empty string, as below:
    # token_pretrained_embedding_filepath =
    token_pretrained_embedding_filepath = ../data/word_vectors/glove.6B.100d.txt
    token_embedding_dimension = 100
    token_lstm_hidden_state_dimension = 100
    
    use_crf = True
    
    [training]
    patience = 10
    maximum_number_of_epochs = 100
    
    # optimizer should be either 'sgd', 'adam', or 'adadelta'
    optimizer = sgd
    learning_rate = 0.005
    # gradients will be clipped above |gradient_clipping_value| and below -|gradient_clipping_value|, if gradient_clipping_value is non-zero
    # (set to 0 to disable gradient clipping)
    gradient_clipping_value = 5.0
    
    # dropout_rate should be between 0 and 1
    dropout_rate = 0.5
    
    # Upper bound on the number of CPU threads NeuroNER will use
    number_of_cpu_threads = 8
    
    # Upper bound on the number of GPU NeuroNER will use
    # If number_of_gpus > 0, you need to have installed tensorflow-gpu
    number_of_gpus = 0
    
    [advanced]
    experiment_name = test
    
    # tagging_format should be either 'bioes' or 'bio'
    tagging_format = bioes
    
    # tokenizer should be either 'spacy' or 'stanford'. The tokenizer is only used when the original data is provided only in BRAT format.
    # - 'spacy' refers to spaCy (https://spacy.io). To install spacy: pip install -U spacy
    # - 'stanford' refers to Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/). Stanford CoreNLP is written in Java: to use it one has to start a
    #              Stanford CoreNLP server, which can tokenize sentences given on the fly. Stanford CoreNLP is portable, which means that it can be run
    #              without any installation.
    #              To download Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/download.html
    #              To run Stanford CoreNLP, execute in the terminal: `java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 50000`
    #              By default Stanford CoreNLP is in English. To use it in other languages, see: https://stanfordnlp.github.io/CoreNLP/human-languages.html
    #              Stanford CoreNLP 3.6.0 and higher requires Java 8. We have tested NeuroNER with Stanford CoreNLP 3.6.0.
    tokenizer = spacy
    # spacylanguage should be either 'de' (German), 'en' (English) or 'fr' (French). (https://spacy.io/docs/api/language-models)
    # To install the spaCy language: `python -m spacy.de.download`; or `python -m spacy.en.download`; or `python -m spacy.fr.download`
    spacylanguage = en
    
    # If remap_unknown_tokens is set to True, map to UNK any token that hasn't been seen in neither the training set nor the pre-trained token embeddings.
    remap_unknown_tokens_to_unk = True
    
    # If load_only_pretrained_token_embeddings is set to True, then token embeddings will only be loaded if it exists in token_pretrained_embedding_filepath
    # or in pretrained_model_checkpoint_filepath, even for the training set.
    load_only_pretrained_token_embeddings = False
    
    # If load_all_pretrained_token_embeddings is set to True, then all pretrained token embeddings will be loaded even for the tokens that do not appear in the dataset.
    load_all_pretrained_token_embeddings = False
    
    # If check_for_lowercase is set to True, the lowercased version of each token will also be checked when loading the pretrained embeddings.
    # For example, if the token 'Boston' does not exist in the pretrained embeddings, then it is mapped to the embedding of its lowercased version 'boston',
    # if it exists among the pretrained embeddings.
    check_for_lowercase = True
    
    # If check_for_digits_replaced_with_zeros is set to True, each token with digits replaced with zeros will also be checked when loading pretrained embeddings.
    # For example, if the token '123-456-7890' does not exist in the pretrained embeddings, then it is mapped to the embedding of '000-000-0000',
    # if it exists among the pretrained embeddings.
    # If both check_for_lowercase and check_for_digits_replaced_with_zeros are set to True, then the lowercased version is checked before the digit-zeroed version.
    check_for_digits_replaced_with_zeros = True
    
    # If freeze_token_embeddings is set to True, token embedding will remain frozen (not be trained).
    freeze_token_embeddings = False
    
    # If debug is set to True, only 200 lines will be loaded for each split of the dataset.
    debug = False
    verbose = False
    
    # plot_format specifies the format of the plots generated by NeuroNER. It should be either 'png' or 'pdf'.
    plot_format = pdf
    
    # specify which layers to reload from the pretrained model
    reload_character_embeddings = True
    reload_character_lstm = True
    reload_token_embeddings = True
    reload_token_lstm = True
    reload_feedforward = True
    reload_crf = True
    
    parameters_filepath = ./parameters.ini
    

    However, when I run python3.5 main.py or python3.5 main.py --train_model=False --use_pretrained_model=True --dataset_text_folder=../data/example_unannotated_texts --pretrained_model_folder=../trained_models/conll_2003_en, I get this:

    TensorFlow version: 1.3.0
    Traceback (most recent call last):
      File "main.py", line 250, in <module>
        main()
      File "main.py", line 245, in main
        nn = NeuroNER(**arguments)
      File "/home/beast/Documents/NeuroNER-master/src/neuroner.py", line 257, in __init__
        parameters, conf_parameters = self._load_parameters(arguments['parameters_filepath'], arguments=arguments)
      File "/home/beast/Documents/NeuroNER-master/src/neuroner.py", line 118, in _load_parameters
        parameters[k] = distutils.util.strtobool(v)
    AttributeError: module 'distutils' has no attribute 'util'
    Exception ignored in: <bound method NeuroNER.__del__ of <neuroner.NeuroNER object at 0x7fb575c84588>>
    Traceback (most recent call last):
      File "/home/beast/Documents/NeuroNER-master/src/neuroner.py", line 489, in __del__
        self.sess.close()
    AttributeError: 'NeuroNER' object has no attribute 'sess'
    
    Any ideas? 
    Thanks
    
    opened by Blair-Young 15
  • License?


    As you provide no explicit license, that makes this project unusable to anybody. No license is equivalent to "only look, don't touch". Even in a purely academic or research context, it is technically illegal to use code from a repo with no license.

    opened by fnl 13
  • Using NeuroNER with Brat with custom annotations


    Hi Franck and all the rest of you! First, thanks for all the great work with this.

    I am trying to get started with using NeuroNER on a dataset I have annotated with BRAT, within a specific domain (music), so the entities are all custom. I want to add this dataset to NeuroNER by adding the BRAT files (the .txt file and the .ann file) and then train on it, but I am not getting anywhere.

    As I understand it from your docs, I should put both train.txt and train.ann in a folder and then point to that in parameters.ini, but I guess I misunderstood because I'm not making progress...

    Could you offer some guidance on how to get started with this?

    Best regards, Robert

    enhancement 
    opened by robertkviby 12
  • AttributeError: module 'distutils' has no attribute 'util'


    File "main.py", line 250, in main() File "main.py", line 245, in main nn = NeuroNER(**arguments) File "/home/server1/share/NeuroNER-master/src/neuroner.py", line 257, in init parameters, conf_parameters = self._load_parameters(arguments['parameters_filepath'], arguments=arguments) File "/home/server1/share/NeuroNER-master/src/neuroner.py", line 118, in _load_parameters parameters[k] = distutils.util.strtobool(v) AttributeError: module 'distutils' has no attribute 'util' Exception ignored in: <bound method NeuroNER.del of <neuroner.NeuroNER object at 0x7fb8d35fcda0>> Traceback (most recent call last): File "/home/server1/share/NeuroNER-master/src/neuroner.py", line 489, in del self.sess.close() AttributeError: 'NeuroNER' object has no attribute 'sess'

    opened by w2781993753 10
  • FileNotFoundError with conll_output_filepath


    I'm trying to follow the steps from the README to run NeuroNER using the default parameters.ini settings. I'm running into a FileNotFoundError at train.py, line 95.

    I'm new to python, but will try to track down the source of the issue. It looks like maybe the file should've been created by this line, but it's not clear to me why.

    This is on ubuntu 16.04, python 3.5.2. Any advice on how to debug this issue?

    Here's partial output from running python main.py:

    Starting epoch 0
    Training completed in 0.00 seconds
    Evaluate model on the train set
    Traceback (most recent call last):
      File "main.py", line 445, in <module>
        main()
      File "main.py", line 392, in main
        y_pred, y_true, output_filepaths = train.predict_labels(sess, model, transition_params_trained, parameters, dataset, epoch_number, stats_graph_folder, dataset_filepaths)
      File "/home/user/neuroner/neuroner/src/train.py", line 113, in predict_labels
        prediction_output = prediction_step(sess, dataset, dataset_type, model, transition_params_trained, stats_graph_folder, epoch_number, parameters, dataset_filepaths)
      File "/home/user/neuroner/neuroner/src/train.py", line 95, in prediction_step
        with open(conll_output_filepath, 'r') as f:
    FileNotFoundError: [Errno 2] No such file or directory: '../output/en_2017-07-05_22-35-05-549137/000_train.txt_conll_evaluation.txt'
    
    
    question 
    opened by davidbenton 10
  • NeuroNER installation on Windows


    I am new to Python and have been trying to install and run NeuroNER on Windows for 2 days, but it's not running and I think I am not able to install it properly on Windows 10 64-bit. The installation tutorial for Ubuntu is available, but for Windows I am unable to find any video tutorial. Can anyone please create a step-by-step video tutorial or installation manual with step-by-step screenshots? I really need it for my MS research ASAP.

    question 
    opened by Rabia-Noureen 10
  • The output file is not created.


    Hi. First of all, I will explain the steps I have taken for unannotated texts.

    The dataset folder now contains a deploy folder with phrase.txt. No other files are included. In that case, when I run main.py, I get an error that train.txt is not found (FileNotFoundError).

    If I include train, test and valid files in the same folder, it runs, but I am not getting the expected output.

    As per the instructions given, I have changed these parameters in the parameters.ini file in src:

    train_model=False
    use_pretrained_model=True
    pretrained_model_folder=../trained_models/conll_2003_en
    dataset_text_folder=../data/dataset

    use_character_lstm=True
    character_embedding_dimension=25
    character_lstm_hidden_state_dimension=25
    token_pretrained_embedding_filepath=../data/word_vectors/glove.6B.100d.txt
    token_embedding_dimension=100
    token_lstm_hidden_state_dimension=100

    use_crf=True
    tagging_format=bioes
    tokenizer=spacy

    Please help me to find out what error I am making in this.

    question 
    opened by elsaresearch 10
  • AttributeError: 'PolyCollection' object has no attribute 'get_axes'


    I see the following error. Is there a way to fix it? Thanks.

    ~/linux/test/python/nlpdl/NeuroNER/src$ python3 main.py
    NeuroNER version: 1.0-dev
    TensorFlow version: 1.4.0
    {'character_embedding_dimension': 25,
     'character_lstm_hidden_state_dimension': 25,
     'check_for_digits_replaced_with_zeros': 1,
     'check_for_lowercase': 1,
     'dataset_text_folder': '../data/conll2003/en',
     'debug': 0,
     'dropout_rate': 0.5,
     'experiment_name': 'test',
     'freeze_token_embeddings': 0,
     'gradient_clipping_value': 5.0,
     'learning_rate': 0.005,
     'load_all_pretrained_token_embeddings': 0,
     'load_only_pretrained_token_embeddings': 0,
     'main_evaluation_mode': 'conll',
     'maximum_number_of_epochs': 100,
     'number_of_cpu_threads': 8,
     'number_of_gpus': 0,
     'optimizer': 'sgd',
     'output_folder': '../output',
     'parameters_filepath': './parameters.ini',
     'patience': 10,
     'plot_format': 'pdf',
     'pretrained_model_folder': '../trained_models/conll_2003_en',
     'reload_character_embeddings': 1,
     'reload_character_lstm': 1,
     'reload_crf': 1,
     'reload_feedforward': 1,
     'reload_token_embeddings': 1,
     'reload_token_lstm': 1,
     'remap_unknown_tokens_to_unk': 1,
     'spacylanguage': 'en',
     'tagging_format': 'bioes',
     'token_embedding_dimension': 100,
     'token_lstm_hidden_state_dimension': 100,
     'token_pretrained_embedding_filepath': '../data/word_vectors/glove.6B.100d.txt',
     'tokenizer': 'spacy',
     'train_model': 1,
     'use_character_lstm': 1,
     'use_crf': 1,
     'use_pretrained_model': 0,
     'verbose': 0}
    Formatting train set from CONLL to BRAT... Done.
    Converting CONLL from BIO to BIOES format... Done.
    Formatting valid set from CONLL to BRAT... Done.
    Converting CONLL from BIO to BIOES format... Done.
    Formatting test set from CONLL to BRAT... Done.
    Converting CONLL from BIO to BIOES format... Done.
    Load dataset... done (93.21 seconds)
    Load token embeddings... done (0.43 seconds)
    number_of_token_original_case_found: 14618
    number_of_token_lowercase_found: 11723
    number_of_token_digits_replaced_with_zeros_found: 119
    number_of_token_lowercase_and_digits_replaced_with_zeros_found: 16
    number_of_loaded_word_vectors: 26476
    dataset.vocabulary_size: 28984
    
    Starting epoch 0
    Training completed in 0.00 seconds
    Evaluate model on the train set
    processed 203621 tokens with 23499 phrases; found: 198223 phrases; correct: 3218.
    accuracy:   3.71%; precision:   1.62%; recall:  13.69%; FB1:   2.90
                  LOC: precision:   1.32%; recall:   2.77%; FB1:   1.79  14951
                 MISC: precision:   1.23%; recall:  60.62%; FB1:   2.42  168908
                  ORG: precision:  16.17%; recall:   5.32%; FB1:   8.00  2078
                  PER: precision:   4.88%; recall:   9.09%; FB1:   6.35  12286
    
    Evaluate model on the valid set
    processed 51362 tokens with 5942 phrases; found: 50303 phrases; correct: 869.
    accuracy:   3.43%; precision:   1.73%; recall:  14.62%; FB1:   3.09
                  LOC: precision:   1.58%; recall:   2.94%; FB1:   2.06  3415
                 MISC: precision:   1.27%; recall:  59.87%; FB1:   2.48  43568
                  ORG: precision:  21.15%; recall:   6.04%; FB1:   9.40  383
                  PER: precision:   6.20%; recall:   9.88%; FB1:   7.62  2937
    
    Evaluate model on the test set
    processed 46435 tokens with 5648 phrases; found: 45225 phrases; correct: 689.
    accuracy:   3.43%; precision:   1.52%; recall:  12.20%; FB1:   2.71
                  LOC: precision:   1.59%; recall:   3.24%; FB1:   2.13  3404
                 MISC: precision:   1.09%; recall:  59.26%; FB1:   2.13  38275
                  ORG: precision:  16.52%; recall:   4.46%; FB1:   7.02  448
                  PER: precision:   4.68%; recall:   8.97%; FB1:   6.15  3098
    
    Generating plots for the train set
    Traceback (most recent call last):
      File "main.py", line 250, in <module>
        main()
      File "main.py", line 246, in main
        nn.fit()
      File "/Users/xxx/linux/test/python/nlpdl/NeuroNER/src/neuroner.py", line 394, in fit
        evaluate.evaluate_model(results, dataset, y_pred, y_true, stats_graph_folder, epoch_number, epoch_start_time, output_filepaths, parameters)
      File "/Users/xxx/linux/test/python/nlpdl/NeuroNER/src/evaluate.py", line 239, in evaluate_model
        verbose=verbose)
      File "/Users/xxx/linux/test/python/nlpdl/NeuroNER/src/evaluate.py", line 22, in assess_model
        cmap='RdBu')
      File "/Users/xxx/linux/test/python/nlpdl/NeuroNER/src/utils_plots.py", line 162, in plot_classification_report
        heatmap(np.array(plotMat), title, xlabel, ylabel, xticklabels, yticklabels, figure_width, figure_height, correct_orientation, cmap=cmap)
      File "/Users/xxx/linux/test/python/nlpdl/NeuroNER/src/utils_plots.py", line 112, in heatmap
        show_values(c, fmt=fmt)
      File "/Users/xxx/linux/test/python/nlpdl/NeuroNER/src/utils_plots.py", line 36, in show_values
        ax = pc.get_axes()
    AttributeError: 'PolyCollection' object has no attribute 'get_axes'
    
    opened by pengyu 8
  • What should be the max epoch_number for training?


    What should the max epoch_number value be to train the model on the first run of main.py? I tried to change it with

    python main.py --maximum_number_of_epochs=2 --token_pretrained_embedding_filepath=""
    

    And after 3 or 4 runs the model starts training again. Should it stop at some point?

    question 
    opened by Rabia-Noureen 8
  • Only predicting "O", also on provided examples

    Running python3.5 main.py --train_model=False --use_pretrained_model=True --dataset_text_folder=../data/example_unannotated_texts --pretrained_model_folder=../trained_models/conll_2003_en just yields "O"s.

    Output:

    NeuroNER version: 1.0-dev
    TensorFlow version: 1.1.0
    NeuroNER version: 1.0-dev
    TensorFlow version: 1.1.0
    {'character_embedding_dimension': 25,
     'character_lstm_hidden_state_dimension': 25,
     'check_for_digits_replaced_with_zeros': 1,
     'check_for_lowercase': 1,
     'dataset_text_folder': '../data/example_unannotated_texts',
     'debug': 0,
     'dropout_rate': 0.5,
     'experiment_name': 'test',
     'freeze_token_embeddings': 0,
     'gradient_clipping_value': 5.0,
     'learning_rate': 0.005,
     'load_only_pretrained_token_embeddings': 0,
     'main_evaluation_mode': 'conll',
     'maximum_number_of_epochs': 100,
     'number_of_cpu_threads': 8,
     'number_of_gpus': 0,
     'optimizer': 'sgd',
     'output_folder': '../output',
     'parameters_filepath': './parameters.ini',
     'patience': 10,
     'plot_format': 'pdf',
     'pretrained_model_folder': '../trained_models/conll_2003_en',
     'reload_character_embeddings': 1,
     'reload_character_lstm': 1,
     'reload_crf': 1,
     'reload_feedforward': 1,
     'reload_token_embeddings': 1,
     'reload_token_lstm': 1,
     'remap_unknown_tokens_to_unk': 1,
     'spacylanguage': 'en',
     'tagging_format': 'bioes',
     'token_embedding_dimension': 100,
     'token_lstm_hidden_state_dimension': 100,
     'token_pretrained_embedding_filepath': '../data/word_vectors/glove.6B.100d.txt',
     'tokenizer': 'spacy',
     'train_model': 0,
     'use_character_lstm': 1,
     'use_crf': 1,
     'use_pretrained_model': 1,
     'verbose': 0}
    Formatting deploy set from BRAT to CONLL... Done.
    Converting CONLL from BIO to BIOES format... Done.
    Load dataset... done (40.78 seconds)
    
    Starting epoch 0
    Load token embeddings... done (89.64 seconds)
    number_of_token_original_case_found: 94
    number_of_token_lowercase_found: 25
    number_of_token_digits_replaced_with_zeros_found: 0
    number_of_token_lowercase_and_digits_replaced_with_zeros_found: 0
    number_of_loaded_word_vectors: 119
    dataset.vocabulary_size: 119
    Load token embeddings from pretrained model... done (0.22 seconds)
    number_of_loaded_vectors: 104
    dataset.vocabulary_size: 119
    Load character embeddings from pretrained model... done (0.23 seconds)
    number_of_loaded_vectors: 58
    dataset.alphabet_size: 58
    Training completed in 92.45 seconds
    Predict labels for the deploy set
    Formatting 000_deploy set from CONLL to BRAT... Done.
    Finishing the experiment
    
    question 
    opened by kootenpv 8
  • Using pre-trained model example not working


    Hi,

    Thanks a lot for the model and the code! They are very useful.

    I'm trying to re-use the conll-2003 pre-trained model like in the example, using the example files in the same folder path (..\data\example_unannotated_texts\deploy).

    with: dataset_text_folder = ../data/example_unannotated_texts

    while the text files to be annotated are in ..\data\example_unannotated_texts\deploy

    The output text file is empty, and the loading shows that it found no tokens.

    I tried running it from dataset_text_folder = ../data/example_unannotated_texts/deploy instead, but then I get an error message (assertion error) saying that the tag is not 'O' (from the remove BIO function).

    I also get an error message from spacy (asking to download the 'en' data from it first, which I did a few times already) the first time I run the code on the data to annotate. If I run it a second time, it then runs but the spacy file created during the first run is empty (which I believe is the problem).

    Thanks for your help! Yoann

    question 
    opened by YoannMR 7
  • Bump tensorflow from 1.1.0 to 2.9.3


    Bumps tensorflow from 1.1.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



    dependencies 
    opened by dependabot[bot] 1
  • Project dependencies may have API risk issues


    Hi. In NeuroNER, inappropriate dependency version constraints can cause risks.

    Below are the dependencies and version constraints that the project is using

    matplotlib==3.0.2
    networkx==2.2
    pycorenlp==0.3.0
    scikit-learn==0.20.2
    scipy==1.2.0
    spacy==2.0.18
    tensorflow==1.1.0
    numpy==1.16.0
    

    The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict. Version constraints with no upper bound, or *, introduce a risk of missing-API errors because the latest versions of the dependencies may remove some APIs.

    After further analysis, in this project the version constraint of the matplotlib dependency can be changed to >=1.3.0,<=3.0.3; networkx can be changed to >=2.0,<=2.8.4; scikit-learn can be changed to >=0.15.0,<=0.20.4; and spacy can be changed to >=0.100.0,<=3.3.1.

    The above modification suggestions reduce dependency conflicts as much as possible while introducing the latest possible versions that do not cause API errors in the project.

    The invocation of the current project includes all the following methods.

    The calling methods from the matplotlib
    matplotlib.colors.ListedColormap
    matplotlib.cm.get_cmap
    matplotlib.use
    
    The calling methods from the networkx
    max
    
    The calling methods from the scikit-learn
    sklearn.preprocessing.LabelBinarizer.fit
    sklearn.preprocessing.LabelBinarizer.transform
    sklearn.metrics.precision_recall_fscore_support
    sklearn.preprocessing.normalize
    random.choice
    sklearn.metrics.confusion_matrix
    sklearn.metrics.classification_report
    sklearn.preprocessing.LabelBinarizer
    sklearn.metrics.f1_score
    sklearn.metrics.accuracy_score
    
    The calling methods from the spacy
    spacy.load
    
    The calling methods from the all methods
    distutils.util.strtobool
    neuroner.conll_to_brat.conll_to_brat
    plot_handles.extend
    neuroner.neuromodel.fetch_data
    numpy.arange
    bioes_filepath.utils.get_basename_without_extension.split
    self.unique_label_indices_of_interest.append
    matplotlib.pyplot.axvline
    tensorflow.python.tools.inspect_checkpoint.print_tensors_in_checkpoint_file
    tensorflow.contrib.tensorboard.plugins.projector.ProjectorConfig
    tensorflow.stack
    pprint.pprint
    tensorflow.constant
    neuroner.utils.get_parameter_to_section_of_configparser
    self._parse_dataset
    pkg_resources.resource_isdir
    collections.OrderedDict
    token_dict.split
    argparse.ArgumentParser.print_help
    get_current_time_in_seconds
    sorted.append
    input_conll_filepath.utils.get_basename_without_extension.split
    annotation_filepath.codecs.open.close
    conll_to_brat
    numpy.histogram
    os.path.abspath
    dataset_type.character_indices_padded.append
    json.dump
    random.choice.split
    f.write
    neuroner.utils_nlp.is_token_in_pretrained_embeddings
    neuroner.utils.get_basename_without_extension
    config.sections
    p.vertices.mean
    time.strftime
    self.index_to_label.keys
    neuroner.utils_nlp.load_pretrained_token_embeddings
    token.split
    self.tokens_mapped_to_unk.append
    tensorflow.name_scope
    type
    os.makedirs
    codecs.open
    os.path.getsize
    pretraining_string_to_index.keys
    numpy.linspace
    str
    neuroner.brat_to_conll.check_brat_annotation_and_text_compatibility
    matplotlib.pyplot.gca.set_xticklabels
    sess.run
    matplotlib.pyplot.gca.text
    AssertionError
    matplotlib.pyplot.clf
    json.load
    line.str.replace
    tensorflow.argmax
    neuroner.evaluate.save_results
    output_conll_lines_with_bioes
    tensorflow.train.Saver.save
    neuroner.utils.reverse_dictionary.keys
    dataset_filepaths.get
    _get_default_param.items
    sorted.add
    os.path.isfile
    pickle.load
    dataset_filepaths.keys
    support.append
    line.split.replace
    split_lines.append
    all_y_true.extend
    shutil.rmtree
    matplotlib.pyplot.gcf.set_size_inches
    get_sentences_and_tokens_from_stanford.append
    neuroner.train.predict_labels
    unary_scores.tolist
    self.index_to_character.keys
    tensorflow.nn.embedding_lookup
    tensorflow.square
    matplotlib.pyplot.xlabel
    entity_lstm.EntityLSTM
    neuroner.evaluate.evaluate_model
    line.strip
    self.dataset_filepaths.update
    tensorflow.nn.tanh
    tokens.append
    sorted.sort
    tensorflow.expand_dims
    writers.add_summary
    tensorflow.zeros
    neuroner.train.prediction_step
    neuroner.neuromodel.fetch_model
    input_filepath.xml.etree.ElementTree.parse.getroot
    line.split.strip
    matplotlib.pyplot.gca.set_yticks
    matplotlib.pyplot.title
    vars
    matplotlib.pyplot.gca.invert_yaxis
    epoch_number.results.append
    kwargs.items
    self.character_indices.update
    labels_bioes.append
    neuroner.dataset.Dataset
    codecs.open.write
    argparse.ArgumentParser
    self.token_embedding_weights.assign
    self.token_embedding_weights.read_value
    neuroner.utils.pad_list
    cm2inch
    generate_reference_text_file_for_conll
    line.strip.replace
    writers.flush
    labels_bio.append
    line.strip.split.strip
    os.path.splitext
    neuroner.utils_nlp.bioes_to_bio
    matplotlib.pyplot.figure
    matplotlib.colors.ListedColormap
    file_obj.RenameUnpickler.load
    self.saver.restore
    sklearn.metrics.classification_report
    filepath.os.path.basename.replace
    datetime.datetime.now
    end_current_entity
    tensorflow.reduce_mean
    dataset_type.token_lengths.append
    map
    argparse.ArgumentParser.add_argument
    self.model.load_pretrained_token_embeddings
    remove_bio_from_label_name
    self.load_pretrained_token_embeddings
    l.rstrip.replace.replace.replace.strip.split
    tensorflow.summary.histogram
    get_entities_from_brat
    self._create_stats_graph_folder
    tensorflow.Variable
    self.sess.close
    self.optimizer.compute_gradients
    config.items
    round
    tag.get
    matplotlib.pyplot.ylabel
    self.index_to_token.keys
    self.load_embeddings_from_pretrained_model
    check_validity_of_conll_bioes
    tensorflow.summary.FileWriter
    get_sentences_and_tokens_from_stanford
    os.path.exists
    dataset_type.character_indices.append
    dataset_types.append
    os.path.join
    sys.exit
    plotMat.append
    document_count.str.zfill
    shutil.copytree
    os.system
    pycorenlp.StanfordCoreNLP
    token_dict.strip
    tensorflow.Session
    tensorflow.tile
    _get_default_param
    ax.yaxis.get_major_ticks
    _clean_param_dtypes.items
    bio_to_bioes
    sorted
    matplotlib.pyplot.bar
    neuroner.utils_plots.heatmap
    tensorflow.variable_scope
    labels.y_pred.y_true.sklearn.metrics.precision_recall_fscore_support.tolist
    print
    line.strip.split.pop
    tensorflow.reduce_max
    numpy.copy.flatten
    pc.update_scalarmappable
    matplotlib.pyplot.xlim
    output_file.write
    bidirectional_LSTM
    numpy.fill_diagonal
    IOError
    tensorflow.contrib.rnn.CoupledInputForgetGateLSTMCell
    tensorflow.clip_by_value
    neuroner.conll_to_brat.output_brat
    all_predictions.extend
    line.split
    matplotlib.pyplot.plot
    os.path.relpath
    tensorflow.assign
    open
    l.rstrip.replace.replace
    self.model.restore_from_pretrained_model
    self.modeldata.update_dataset
    sklearn.metrics.confusion_matrix.tolist
    collections.defaultdict.items
    self.dataset_brat_folders.update
    sorted.remove
    assess_model
    matplotlib.cm.get_cmap
    dataset_type.writers.close
    tensorflow.nn.bidirectional_dynamic_rnn
    len
    _get_config_param
    get_stanford_annotations
    shutil.copyfile
    embedding_weights.read_value
    neuroner.neuromodel.load_parameters
    xml_to_brat
    neuroner.utils_nlp.load_pretrained_token_embeddings.keys
    tensorflow.contrib.rnn.LSTMStateTuple
    tensorflow.contrib.layers.xavier_initializer
    plot_handles.append
    token.strip
    matplotlib.pyplot.gcf
    param_config.items
    matplotlib.pyplot.gca.barh
    neuroner.utils.convert_configparser_to_dictionary
    tensorflow.summary.scalar
    dataset_type.token_indices.append
    matplotlib.pyplot.subplots
    neuroner.brat_to_conll.get_entities_from_brat
    numpy.random.rand
    self.characters.update
    self._get_valid_dataset_filepaths
    tensorflow.contrib.crf.viterbi_decode
    prepare_pretrained_model_for_restoring
    os.listdir
    new_token_sequence.append
    range
    predictions.tolist.tolist
    neuroner.neuromodel.NeuroNER.close
    dataset_type.characters.append
    matplotlib.pyplot.grid
    _clean_param_dtypes
    pc.get_array
    tensorflow.global_variables_initializer
    dataset_type.label_indices.append
    neuroner.utils.convert_configparser_to_dictionary.items
    pretraining_dataset.label_to_index.keys
    self.define_training_procedure
    labels.append
    random.shuffle
    tensorflow.reduce_min
    pretraining_dataset.index_to_token.values
    cmap
    tensorflow.nn.xw_plus_b
    tensorflow.get_collection
    _fetch
    neuroner.utils.renamed_load
    numpy.copy
    entity.replace
    neuroner.neuromodel.NeuroNER.fit
    prediction_step
    configparser.ConfigParser.set
    trim_dataset_pickle
    matplotlib.pyplot.gca
    sklearn.preprocessing.normalize.flatten
    sklearn.metrics.f1_score
    tensorflow.train.Saver.restore
    matplotlib.pyplot.colorbar
    sklearn.preprocessing.LabelBinarizer.fit
    tensorflow.summary.merge_all
    pretraining_dataset.index_to_character.values
    trim_model_checkpoint
    self.character_indices_padded.update
    classification_report.split
    l.rstrip.replace.replace.replace.strip
    configparser.ConfigParser.read
    neuroner.utils_nlp.remove_bio_from_label_name
    dataset.token_to_index.keys
    neuroner.utils.reverse_dictionary
    os.path.dirname
    os.remove
    self._check_param_compatibility
    _clean_param_dtypes.update
    labels.copy
    show_values
    pc.get_facecolors
    self.label_vector_indices.update
    tensorflow.cast
    get_sentences_and_tokens_from_spacy
    random.choice
    neuroner.train.train_step
    operator.itemgetter
    parse_arguments.items
    tensorflow.get_variable
    dictionary.items
    self.prediction_count.str.zfill
    list
    sentence_tokens.append
    neuroner.evaluate.remap_labels
    int
    os.path.basename
    numpy.random.uniform
    tensorflow.contrib.tensorboard.plugins.projector.visualize_embeddings
    token.replace
    tensorflow.concat
    neuroner.utils_nlp.get_parsed_conll_output
    colors.append
    copy.copy
    pkg_resources.resource_filename
    classes.append
    neuroner.entity_lstm.EntityLSTM
    save_results
    tensorflow.train.GradientDescentOptimizer
    neuroner.utils_tf.resize_tensor_variable
    tensorflow.sqrt
    l.rstrip.replace.replace.replace
    classification_report.keys
    neuroner.neuromodel.NeuroNER
    infrequent_token_indices.append
    neuroner.utils_plots.plot_classification_report
    neuroner.utils_nlp.replace_unicode_whitespaces_with_ascii_whitespace
    dataset_type.label_vector_indices.append
    dataset_type.f1_dict_all.append
    epoch_number.str.zfill
    check_param_compatibility
    enumerate
    os.path.isdir
    new_label_sequence.append
    neuroner.utils.order_dictionary.get
    neuroner.utils.copytree
    neuroner.utils.order_dictionary
    super
    neuroner.brat_to_conll.brat_to_conll
    token.lower
    min
    tensorflow.nn.dropout
    class_names.append
    numpy.all
    argparse.ArgumentParser.parse_args
    str.lower
    self.label_indices.update
    embeddings_projector_config.embeddings.add
    collections.defaultdict.keys
    ax.xaxis.get_major_ticks
    sklearn.metrics.confusion_matrix
    conf_parameters.write
    spacy.load
    join
    input_filepath.xml.etree.ElementTree.parse.getroot.findall
    token.token.text.replace
    max
    self.modeldata.load_dataset
    spacy_nlp
    time.time
    tensorflow.train.AdadeltaOptimizer
    list.remove
    ax.pcolor.append
    load_parameters
    heatmap
    sklearn.preprocessing.LabelBinarizer.transform
    embedding_weights.assign
    tensorflow.ConfigProto
    self.token_indices.update
    self.sess.run
    cur_line.split.split
    matplotlib.use
    line.strip.replace.split
    matplotlib.pyplot.ylim
    tensorflow.train.Saver
    codecs.open.readline
    plot_f1_vs_epoch
    shutil.copy
    all
    self.unique_labels_of_interest.remove
    time.localtime
    cur_line.split.strip
    warnings.warn
    numpy.array
    re.sub
    tensorflow.variables_initializer
    f.read
    RenameUnpickler
    ValueError
    tensorflow.train.AdamOptimizer
    format
    set
    utils.create_folder_if_not_exists
    matplotlib.pyplot.gca.set_yticklabels
    random.randint
    neuroner.utils.get_current_time_in_miliseconds
    original_conll_file.readline.strip
    neuroner.utils.create_folder_if_not_exists
    pickle.dump
    collections.defaultdict
    neuroner.utils_nlp.convert_conll_from_bio_to_bioes
    self.__del__
    sklearn.preprocessing.normalize
    string.split
    sklearn.preprocessing.LabelBinarizer
    glob.glob
    get_cmap
    pretraining_dataset.label_to_index.copy
    pycorenlp.StanfordCoreNLP.annotate
    main
    tensorflow.equal
    output_entities
    parse_arguments
    token_dict.replace
    tensorflow.placeholder
    input_filepath.xml.etree.ElementTree.parse.getroot.findtext
    float
    result.update
    AUC.flatten.min
    get_start_and_end_offset_of_token_from_spacy
    self.unique_labels.append
    model.saver.save
    matplotlib.pyplot.legend
    writers.get_logdir
    f1.np.asarray.argmax
    dataset.__dict__.keys
    pc.get_paths
    l.rstrip.replace
    set.add
    dataset_type.result_update.update
    remap_labels
    abs
    index_to_string.items
    entities.append
    check_bio_bioes_compatibility
    tensorflow.nn.softmax_cross_entropy_with_logits
    line.strip.split.append
    tuple
    matplotlib.pyplot.gca.set_xticks
    output_filepaths.keys
    AUC.flatten.max
    self.RenameUnpickler.super.find_class
    xml.etree.ElementTree.parse
    sklearn.metrics.accuracy_score
    self._convert_to_indices
    numpy.asarray
    line.split.split
    self.token_lengths.update
    f.read.splitlines
    matplotlib.pyplot.axhline
    tensorflow.shape
    codecs.open.close
    l.rstrip
    dict
    bioes_to_bio
    matplotlib.pyplot.gca.pcolor
    warnings.filterwarnings
    neuroner.conll_to_brat.check_compatibility_between_conll_and_brat_text
    json.loads
    zip
    matplotlib.pyplot.close
    shutil.copy2
    self.sess.as_default
    neuroner.utils_tf.variable_summaries
    line.strip.split
    self.optimizer.apply_gradients
    neuroner.conll_to_brat.output_entities
    ax.xaxis.tick_top
    tensorflow.contrib.crf.crf_log_likelihood
    sklearn.metrics.precision_recall_fscore_support
    matplotlib.pyplot.savefig
    configparser.ConfigParser
    setuptools.setup
    results.keys
    tensorflow.squeeze
    

    @developer Could you please help me check this issue? May I submit a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • link from README.md to conll2003 dataset broken


    The link from README.md to https://github.com/Franck-Dernoncourt/NeuroNER/blob/master/data/conll2003/en is not working.

    It is from this paragraph: "We provide several examples of datasets: data/conll2003/en: annotated dataset with the CoNLL-2003 format, containing 3 files (train.txt, valid.txt and test.txt)."

    I assume that this is not a problem per se, since the dataset is available at https://huggingface.co/datasets/conll2003. You may want to update the broken link though.

    opened by poedator 0
  • Bump numpy from 1.16.0 to 1.22.0


    Bumps numpy from 1.16.0 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



    dependencies 
    opened by dependabot[bot] 1
  • Switch input/output folder vars in prepare_pretrained_model.py


    It seems like in https://github.com/Franck-Dernoncourt/NeuroNER/blob/master/neuroner/prepare_pretrained_model.py#L105, input_model_folder should use model_name and output_model_folder should use output_folder_name.

    Do you concur?

    opened by matt-thomas 1
Releases: 1.0-dev2
Owner: Franck Dernoncourt