Yet another Python binding for fastText

Last update: Nov 16, 2022

Overview

pyfasttext

Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresearch/fastText/tree/master/python

The binding supports Python 2.6, 2.7 and Python 3. It requires Cython.

Numpy and cysignals are also dependencies, but are optional.

pyfasttext has been tested successfully on Linux and Mac OS X.
Warning: if you want to compile pyfasttext on Windows, do not compile with the cysignals module because it does not support this platform.

Table of Contents
- Installation
- Usage

Installation

To compile pyfasttext, make sure you have one of the following compilers:

GCC (g++) with C++11 support.
LLVM (clang++) with (at least) partial C++17 support.

Simplest way to install pyfasttext: use pip

Just type these lines:

pip install cython
pip install pyfasttext

Possible compilation error

If you have a compilation error, you can try to install cysignals manually:

pip install cysignals

Then retry installing pyfasttext with the pip command mentioned above.

Cloning

pyfasttext uses git submodules.
So, you need to add the --recursive option when you clone the repository.

git clone --recursive https://github.com/vrasneur/pyfasttext.git
cd pyfasttext

Requirements for Python 2.7

Python 2.7 support relies on the future module: pyfasttext needs bytes objects, which are not available natively in Python 2.
You can install the future module with pip.

pip install future

Building and installing manually

First, install all the requirements:

pip install -r requirements.txt

Then, build and install with setup.py:

python setup.py install

Building and installing without optional dependencies

pyfasttext can export word vectors as numpy ndarrays; however, this feature can be disabled at compile time.

To compile without numpy, pyfasttext has a USE_NUMPY environment variable. Set this variable to 0 (or empty), like this:

USE_NUMPY=0 python setup.py install

If you want to compile without cysignals, likewise, you can set the USE_CYSIGNALS environment variable to 0 (or empty).
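
As an illustration of the "0 (or empty)" convention, a build script could interpret such a variable roughly as follows. This is only a sketch, not the actual setup.py code; the helper name env_flag and its environ parameter are hypothetical.

```python
import os

def env_flag(name, default=True, environ=None):
    """Return False when the variable is set to '0' or to an empty string."""
    environ = os.environ if environ is None else environ
    if name not in environ:
        return default  # variable not set: keep the optional dependency
    return environ[name] not in ('', '0')

# Simulated environments (hypothetical values, for illustration only).
print(env_flag('USE_NUMPY', environ={'USE_NUMPY': '0'}))  # False
print(env_flag('USE_NUMPY', environ={}))                  # True
```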

Usage

How to load the library?

>>> from pyfasttext import FastText

How to load an existing model?

>>> model = FastText('/path/to/model.bin')

>>> model = FastText()
>>> model.load_model('/path/to/model.bin')

Word representation learning

You can use all the options provided by the fastText binary (input, output, epoch, lr, ...).
Just use keyword arguments in the training methods of the FastText object.

Training using Skipgram

>>> model = FastText()
>>> model.skipgram(input='data.txt', output='model', epoch=100, lr=0.7)

Training using CBoW

>>> model = FastText()
>>> model.cbow(input='data.txt', output='model', epoch=100, lr=0.7)

Word vectors

Word vectors access

Vector for a given word

By default, a single word vector is returned as a regular Python array of floats.

>>> model['dog']
array('f', [-1.308749794960022, -1.8326224088668823, ...])

Numpy ndarray

The model.get_numpy_vector(word) method returns the word vector as a numpy ndarray.

>>> model.get_numpy_vector('dog')
array([-1.30874979, -1.83262241, ...], dtype=float32)

If you want a normalized vector (i.e. the vector divided by its norm), there is an optional boolean parameter named normalized.

>>> model.get_numpy_vector('dog', normalized=True)
array([-0.07084749, -0.09920666, ...], dtype=float32)
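
To make "the vector divided by its norm" concrete, here is a minimal pure-Python sketch. It assumes the Euclidean (L2) norm; the function name l2_normalize is hypothetical and is not part of pyfasttext.

```python
import math

def l2_normalize(vector):
    """Divide each component by the Euclidean norm of the vector."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]

unit = l2_normalize([3.0, 4.0])  # the norm of this toy vector is 5.0
print(unit)  # [0.6, 0.8]
```

A normalized vector has a norm of 1, which is why dot products between normalized vectors can be read as cosine similarities.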

Words for a given vector

The inverse operation of model[word] or model.get_numpy_vector(word) is model.words_for_vector(vector, k).
It returns a list of the k words closest to the provided vector. The default value for k is 1.

>>> king = model.get_numpy_vector('king')
>>> man = model.get_numpy_vector('man')
>>> woman = model.get_numpy_vector('woman')
>>> model.words_for_vector(king + woman - man, k=1)
[('queen', 0.77121970653533936)]
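
The idea of "closest words" can be sketched without a trained model. The example below ranks a tiny hand-made vocabulary by cosine similarity to a query vector. Both the toy vectors and the choice of cosine similarity are assumptions made for illustration; they do not describe how pyfasttext computes its scores internally.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_words(vectors, query, k=1):
    """Return the k (word, similarity) pairs most similar to the query."""
    scored = [(word, cosine(vec, query)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional vectors, chosen so the analogy works out.
toy = {
    'king':  [1.0, 1.0, 0.0],
    'queen': [1.0, 0.0, 1.0],
    'man':   [0.0, 1.0, 0.0],
    'woman': [0.0, 0.0, 1.0],
}
query = [k + w - m for k, w, m in zip(toy['king'], toy['woman'], toy['man'])]
best = closest_words(toy, query, k=1)
print(best[0][0])  # queen
```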

Get the number of words in the model

>>> model.nwords
500000

Get all the word vectors in a model

>>> for word in model.words:
...   print(word, model[word])

Numpy ndarray

If you want all the word vectors as a big numpy ndarray, you can use the numpy_normalized_vectors member. Note that all these vectors are normalized.

>>> model.nwords
500000
>>> model.numpy_normalized_vectors
array([[-0.07549749, -0.09407753, ...],
       [ 0.00635979, -0.17272158, ...],
       ..., 
       [-0.01009259,  0.14604086, ...],
       [ 0.12467574, -0.0609326 , ...]], dtype=float32)
>>> model.numpy_normalized_vectors.shape
(500000, 100) # (number of words, dimension)

Misc operations with word vectors

Word similarity

>>> model.similarity('dog', 'cat')
0.75596606254577637

Most similar words

>>> model.nearest_neighbors('dog', k=2)
[('dogs', 0.7843924736976624), ('cat', 0.75596606254577637)]

Analogies

The model.most_similar() method works similarly to the one in gensim.

>>> model.most_similar(positive=['woman', 'king'], negative=['man'], k=1)
[('queen', 0.77121970653533936)]

Text classification

Supervised learning

>>> model = FastText()
>>> model.supervised(input='/path/to/input.txt', output='/path/to/model', epoch=100, lr=0.7)

Get all the labels

>>> model.labels
['LABEL1', 'LABEL2', ...]

Get the number of labels

>>> model.nlabels
100

Prediction

To obtain the k most likely labels from test sentences, there are multiple model.predict_*() methods.
The default value for k is 1. If you want to obtain all the possible labels, use None for k.

Labels and probabilities

If you have a list of strings (or an iterable object), use this:

>>> model.predict_proba(['first sentence\n', 'second sentence\n'], k=2)
[[('LABEL1', 0.99609375), ('LABEL3', 1.953126549381068e-08)], [('LABEL2', 1.0), ('LABEL3', 1.953126549381068e-08)]]

If you want to test a single string, use this:

>>> model.predict_proba_single('first sentence\n', k=2)
[('LABEL1', 0.99609375), ('LABEL3', 1.953126549381068e-08)]

WARNING: In order to get the same probabilities as the fastText binary, you have to add a newline (\n) at the end of each string.
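
If your sentences come from a source that strips line endings, a small helper can add the newline back before prediction. This is a sketch; the name with_trailing_newline is hypothetical and not part of pyfasttext.

```python
def with_trailing_newline(sentences):
    """Yield each sentence, appending a newline when it is missing."""
    for sentence in sentences:
        yield sentence if sentence.endswith('\n') else sentence + '\n'

batch = list(with_trailing_newline(['first sentence', 'second sentence\n']))
print(batch)  # ['first sentence\n', 'second sentence\n']
```

The resulting list can then be passed to model.predict_proba() as in the examples above.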

If your test data is stored inside a file, use this:

>>> model.predict_proba_file('/path/to/test.txt', k=2)
[[('LABEL1', 0.99609375), ('LABEL3', 1.953126549381068e-08)], [('LABEL2', 1.0), ('LABEL3', 1.953126549381068e-08)]]

Normalized probabilities

For performance reasons, fastText probabilities often do not sum up to 1.0.

If you want normalized probabilities (where the sum is closer to 1.0 than the original probabilities), you can use the normalized=True parameter in all the methods that output probabilities (model.predict_proba(), model.predict_proba_file() and model.predict_proba_single()).

>>> sum(proba for label, proba in model.predict_proba_single('this is a sentence that needs to be classified\n', k=None))
0.9785203068801335
>>> sum(proba for label, proba in model.predict_proba_single('this is a sentence that needs to be classified\n', k=None, normalized=True))
0.9999999999999898
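
One simple way to obtain probabilities that sum to 1.0 is to divide each value by the total. The sketch below shows that rescaling on made-up numbers. Whether pyfasttext uses exactly this scheme for normalized=True is an assumption; the function name and the sample values are hypothetical.

```python
def normalize_probabilities(pairs):
    """Rescale (label, probability) pairs so the probabilities sum to 1."""
    total = sum(proba for _, proba in pairs)
    if total == 0.0:
        return list(pairs)  # nothing to rescale
    return [(label, proba / total) for label, proba in pairs]

# Hypothetical raw output whose probabilities sum to about 0.95.
raw = [('LABEL1', 0.75), ('LABEL2', 0.15), ('LABEL3', 0.05)]
normalized = normalize_probabilities(raw)
print(sum(proba for _, proba in normalized))
```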

Labels only

If you have a list of strings (or an iterable object), use this:

>>> model.predict(['first sentence\n', 'second sentence\n'], k=2)
[['LABEL1', 'LABEL3'], ['LABEL2', 'LABEL3']]

If you want to test a single string, use this:

>>> model.predict_single('first sentence\n', k=2)
['LABEL1', 'LABEL3']

WARNING: In order to get the same probabilities as the fastText binary, you have to add a newline (\n) at the end of each string.

If your test data is stored inside a file, use this:

>>> model.predict_file('/path/to/test.txt', k=2)
[['LABEL1', 'LABEL3'], ['LABEL2', 'LABEL3']]

Quantization

Use keyword arguments in the model.quantize() method.

>>> model.quantize(input='/path/to/input.txt', output='/path/to/model')

You can load quantized models using the FastText constructor or the model.load_model() method.

Is a model quantized?

If you want to know if a model has been quantized before, use the model.quantized attribute.

>>> model = FastText('/path/to/model.bin')
>>> model.quantized
False
>>> model = FastText('/path/to/model.ftz')
>>> model.quantized
True

Subwords

fastText can use subwords (i.e. character ngrams) when doing unsupervised or supervised learning.

You can access the subwords, and their associated vectors, using pyfasttext.

Get the subwords

fastText's word embeddings can be augmented with subword-level information. It is possible to retrieve the subwords and their associated vectors from a model using pyfasttext.

To retrieve all the subwords for a given word, use the model.get_all_subwords(word) method.

>>> model.args.get('minn'), model.args.get('maxn')
(2, 4)
>>> model.get_all_subwords('hello') # word + subwords from 2 to 4 characters
['hello', '<h', '<he', '<hel', 'he', 'hel', 'hell', 'el', 'ell', 'ello', 'll', 'llo', 'llo>', 'lo', 'lo>', 'o>']

For fastText, < means "beginning of a word" and > means "end of a word".

As you can see, fastText includes the full word. You can omit it using the omit_word=True keyword argument.

>>> model.get_all_subwords('hello', omit_word=True)
['<h', '<he', '<hel', 'he', 'hel', 'hell', 'el', 'ell', 'ello', 'll', 'llo', 'llo>', 'lo', 'lo>', 'o>']
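
The enumeration above can be reproduced in a few lines of pure Python, which may help to see where each subword comes from. This is a simplified sketch that works on Python characters; the real fastText implementation works on UTF-8 bytes and may differ in edge cases. The function name char_ngrams is hypothetical.

```python
def char_ngrams(word, minn, maxn, omit_word=False):
    """List the character n-grams of '<word>' with minn <= length <= maxn."""
    wrapped = '<' + word + '>'  # add the begin and end markers
    ngrams = [] if omit_word else [word]
    for start in range(len(wrapped)):
        for size in range(minn, maxn + 1):
            end = start + size
            if end > len(wrapped):
                break  # longer n-grams would run past the end marker
            ngrams.append(wrapped[start:end])
    return ngrams

print(char_ngrams('hello', 2, 4))
```

With minn=2 and maxn=4, this prints the same list as the model.get_all_subwords('hello') example above.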

When a model is quantized, fastText may prune some subwords. If you want to see only the subwords that are really used when computing a word vector, you should use the model.get_subwords(word) method.

>>> model.quantized
True
>>> model.get_subwords('beautiful')
['eau', 'aut', 'ful', 'ul']
>>> model.get_subwords('hello')
['hello'] # fastText will not use any subwords when computing the word vector, only the full word

Get the subword vectors

To get the individual vectors given the subwords, use the model.get_numpy_subword_vectors(word) method.

>>> model.get_numpy_subword_vectors('beautiful') # 4 vectors, so 4 rows
array([[ 0.49022141,  0.13586822,  ..., -0.14065443,  0.89617103], # subword "eau"
       [-0.42594951,  0.06260503,  ..., -0.18182631,  0.34219387], # subword "aut"
       [ 0.49958718,  2.93831301,  ..., -1.97498322, -1.16815805], # subword "ful"
       [-0.4368791 , -1.92924356,  ...,  1.62921488, 1.90240896]], dtype=float32) # subword "ul"

In fastText, the final word vector is the average of these individual vectors.

>>> import numpy as np
>>> vec1 = model.get_numpy_vector('beautiful')
>>> vecs2 = model.get_numpy_subword_vectors('beautiful')
>>> np.allclose(vec1, np.average(vecs2, axis=0))
True

Sentence and text vectors

To compute the vector of a sequence of words (i.e. a sentence), fastText uses two different methods:

one for unsupervised models
another one for supervised models

When fastText computes a word vector, recall that it uses the average of the following vectors: the word itself and its subwords.

Unsupervised models

For unsupervised models, the representation of a sentence for fastText is the average of the normalized word vectors.

To get the resulting vector as a regular Python array, use the model.get_sentence_vector(line) method.
To get the resulting vector as a numpy ndarray, use the model.get_numpy_sentence_vector(line) method.

>>> vec = model.get_numpy_sentence_vector('beautiful cats')
>>> vec1 = model.get_numpy_vector('beautiful', normalized=True)
>>> vec2 = model.get_numpy_vector('cats', normalized=True)
>>> np.allclose(vec, np.average([vec1, vec2], axis=0))
True
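
The same computation can be written without numpy, which makes the two steps (normalize each word vector, then average) explicit. The toy vectors and the helper names below are hypothetical; this is a sketch of the arithmetic, not pyfasttext code.

```python
import math

def l2_normalize(vector):
    """Divide each component by the Euclidean norm of the vector."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def sentence_vector(word_vectors):
    """Average of the normalized word vectors."""
    return average([l2_normalize(vec) for vec in word_vectors])

# Hypothetical 2-dimensional word vectors.
toy_vectors = {'beautiful': [3.0, 4.0], 'cats': [0.0, 2.0]}
sentence = sentence_vector([toy_vectors[w] for w in 'beautiful cats'.split()])
print(sentence)
```

Here the normalized vectors are [0.6, 0.8] and [0.0, 1.0], so the sentence vector is approximately [0.3, 0.9].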

Supervised models

For supervised models, fastText uses the regular word vectors, as well as vectors computed using word ngrams (i.e. shorter sequences of words from the sentence). When computing the average, these vectors are not normalized.

To get the resulting vector as a regular Python array, use the model.get_text_vector(line) method.
To get the resulting vector as a numpy ndarray, use the model.get_numpy_text_vector(line) method.

>>> model.get_numpy_sentence_vector('beautiful cats') # for an unsupervised model
array([-0.20266785,  0.3407566 ,  ...,  0.03044436,  0.39055538], dtype=float32)
>>> model.get_numpy_text_vector('beautiful cats') # for a supervised model
array([-0.20840774,  0.4289546 ,  ..., -0.00457615,  0.52417743], dtype=float32)

Misc utilities

Show the module version

>>> import pyfasttext
>>> pyfasttext.__version__
'0.4.3'

Show fastText version

As there is no version number in fastText, we use the latest fastText commit hash (from HEAD) as a substitute.

>>> import pyfasttext
>>> pyfasttext.__fasttext_version__
'431c9e2a9b5149369cc60fb9f5beba58dcf8ca17'

Show the model (hyper)parameters

>>> model.args
{'bucket': 11000000,
 'cutoff': 0,
 'dim': 100,
 'dsub': 2,
 'epoch': 100,
...
}

Show the model version number

fastText uses a versioning scheme for its generated models. You can retrieve the model version number using the model.version attribute.

version number	description
-1	for really old models with no version number
11	first version number added by fastText
12	for models generated after fastText added support for subwords in supervised learning

>>> model.version
12

Extract labels or classes from a dataset

You can use the FastText object to extract labels or classes from a dataset. The label prefix (which is __label__ by default) is set using the label parameter in the constructor.

If you load an existing model, the label prefix will be the one defined in the model.

>>> model = FastText(label='__my_prefix__')

Extract labels

There can be multiple labels per line.

>>> model.extract_labels('/path/to/dataset1.txt')
[['LABEL2', 'LABEL5'], ['LABEL1'], ...]

Extract classes

There can be only one class per line.

>>> model.extract_classes('/path/to/dataset2.txt')
['LABEL3', 'LABEL1', 'LABEL2', ...]
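
To show what the dataset format looks like, the following sketch extracts labels from in-memory lines by looking for tokens that start with the label prefix. It assumes whitespace-separated tokens and strips the prefix, as the outputs above suggest. The function name and sample lines are hypothetical; the real methods read from a file.

```python
def extract_labels_from_lines(lines, prefix='__label__'):
    """Return, for each line, the list of labels with the prefix removed."""
    result = []
    for line in lines:
        labels = [token[len(prefix):] for token in line.split()
                  if token.startswith(prefix)]
        result.append(labels)
    return result

# Hypothetical dataset lines in the fastText supervised format.
sample = [
    '__label__LABEL2 __label__LABEL5 some text about two topics',
    '__label__LABEL1 some other text',
]
labels = extract_labels_from_lines(sample)
print(labels)  # [['LABEL2', 'LABEL5'], ['LABEL1']]
```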

Exceptions

The fastText source code directly calls exit() when something goes wrong (e.g. a model file does not exist, ...).

Instead of exiting, pyfasttext raises a Python exception (RuntimeError).

>>> import pyfasttext
>>> model = pyfasttext.FastText('/path/to/non-existing_model.bin')
Model file cannot be opened for loading!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/pyfasttext.pyx", line 124, in pyfasttext.FastText.__cinit__ (src/pyfasttext.cpp:1800)
  File "src/pyfasttext.pyx", line 348, in pyfasttext.FastText.load_model (src/pyfasttext.cpp:5947)
RuntimeError: fastext tried to exit: 1
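
Because the RuntimeError message does not name the file, it can help to validate the path in Python before loading. Note that a leading ~ is expanded by the shell, not by file-opening APIs, so a path such as '~/models/model.bin' should be expanded explicitly. The sketch below requires Python 3 (for FileNotFoundError); the helper name resolve_model_path and the example path are hypothetical.

```python
import os

def resolve_model_path(path):
    """Expand '~', make the path absolute, and check that the file exists."""
    expanded = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(expanded):
        raise FileNotFoundError('model file not found: ' + expanded)
    return expanded

# Intended use (commented out because it needs pyfasttext and a real model):
# model = FastText(resolve_model_path('~/models/model.bin'))
```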

Interruptible operations

pyfasttext uses cysignals to make all the computationally intensive operations (e.g. training) interruptible.

To easily interrupt such an operation, just type Ctrl-C in your Python shell.

>>> model.skipgram(input='/path/to/input.txt', output='/path/to/mymodel')
Read 12M words
Number of words:  60237
Number of labels: 0
... # type Ctrl-C during training
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/pyfasttext.pyx", line 680, in pyfasttext.FastText.skipgram (src/pyfasttext.cpp:11125)
  File "src/pyfasttext.pyx", line 674, in pyfasttext.FastText.train (src/pyfasttext.cpp:11009)
  File "src/pyfasttext.pyx", line 668, in pyfasttext.FastText.train (src/pyfasttext.cpp:10926)
  File "src/cysignals/signals.pyx", line 94, in cysignals.signals.sig_raise_exception (build/src/cysignals/signals.c:1328)
KeyboardInterrupt
>>> # you can have your shell back!

Comments

The Model just can not be loaded. | "RuntimeError: fastext tried to exit: 1"
The model just cannot be loaded. I've tried every approach I could come up with, and nothing worked. Some people discussed it here: #125, but I found nothing helpful.

EMBEDDINGS_MODEL_PATH = '~/fastText/result/fil9.bin'
self.word_model = FastText(EMBEDDINGS_MODEL_PATH)
  File "src/pyfasttext.pyx", line 137, in pyfasttext.FastText.__cinit__ (src/pyfasttext.cpp:2249)
  File "src/pyfasttext.pyx", line 466, in pyfasttext.FastText.load_model (src/pyfasttext.cpp:7906)
RuntimeError: fastext tried to exit: 1

Is there any advice someone can give me on this? It would be very helpful.

Thanks a lot.
opened by adamliuio 12
How does pyfasttext handle sentence inputs?

Hi Vincent,

I had a question about how pyfasttext handles inputs: how do you process sentences, as opposed to individual tokens? For instance, do I need to tokenise my input text separately, or can I provide it without any preprocessing? From my initial attempts it appears that I can, but since you don't mention it in your documentation (which is great, by the way), I wasn't sure.

Thank you for creating this package!

opened by nsanthanam 11
installation failure on Mac OS with gcc 8.2: fatal error: 'random' file not found

ERROR MESSAGES:

src/pyfasttext.cpp:648:10: fatal error: 'random' file not found
#include <random>
         ^~~~~~~~
1 warning and 1 error generated.
error: command 'gcc' failed with exit status 1

I have installed gcc and LLVM:

UK-xxxx:vercheng$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 10.0.0 (clang-1000.10.40.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I tried this: https://github.com/vrasneur/pyfasttext/issues/24 but it didn't help. I wonder if it's due to the latest gcc version, 8.2. Thanks

opened by chengwq613 7

Error during compilation - "fatal error: 'cstdint' file not found"

Here's my output:

➜  pyfasttext git:(master) python3 setup.py install
Compiling src/pyfasttext.pyx because it changed.
[1/1] Cythonizing src/pyfasttext.pyx
running install
running build
running build_ext
building 'pyfasttext' extension
creating build
creating build/temp.macosx-10.7-x86_64-3.6
creating build/temp.macosx-10.7-x86_64-3.6/src
creating build/temp.macosx-10.7-x86_64-3.6/src/fastText
creating build/temp.macosx-10.7-x86_64-3.6/src/fastText/src
gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -iquote . -include src/custom_exit.h -I. -I/Users/xx/anaconda/include/python3.6m -c src/pyfasttext.cpp -o build/temp.macosx-10.7-x86_64-3.6/src/pyfasttext.o
In file included from <built-in>:1:
./src/custom_exit.h:14:61: error: no member named 'to_string' in namespace 'std'
  throw std::runtime_error("fastext tried to exit: " + std::to_string(status));
                                                       ~~~~~^
In file included from src/pyfasttext.cpp:488:
src/fastText/src/vector.h:13:10: fatal error: 'cstdint' file not found
#include <cstdint>
         ^
2 errors generated.
error: command 'gcc' failed with exit status 1

Compiling the "normal" FastText binaries, as well as SaleStock's fastText.py works without any problems. Any thoughts on how to fix this?

opened by joneidejohnsen 6

pip install fails with Cython 0.28

Cython 0.28 was released yesterday. pip install pyfasttext using this Cython breaks with:

Collecting pyfasttext
  Downloading pyfasttext-0.4.4.tar.gz (235kB)
    Complete output from command python setup.py egg_info:
    Collecting cysignals
      Downloading cysignals-1.6.9.tar.gz (85kB)
    Building wheels for collected packages: cysignals
      Running setup.py bdist_wheel for cysignals: started
      Running setup.py bdist_wheel for cysignals: finished with status 'done'
      Stored in directory: /root/.cache/pip/wheels/c3/dd/fa/e7a20f8ca22a48bb55b07486dde4e8ed256907192218339b72
    Successfully built cysignals
    Installing collected packages: cysignals
    Successfully installed cysignals-1.6.9

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
          ret['label'] = self.label
          return ret

        cdef size_t index = 0
        args = get_fasttext_args(self.ft)
        args_map = get_args_map(args)
                               ^
    ------------------------------------------------------------

    src/pyfasttext.pyx:264:28: Cannot assign type 'shared_ptr[const Args]' to 'const shared_ptr[const Args]'
    Compiling src/pyfasttext.pyx because it depends on /usr/local/lib/python2.7/dist-packages/cysignals/signals.pxd.
    [1/1] Cythonizing src/pyfasttext.pyx
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-_kbz9r/pyfasttext/setup.py", line 88, in <module>
        'FASTTEXT_VERSION': get_fasttext_commit_hash()}),
      File "/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 1026, in cythonize
        cythonize_one(*args)
      File "/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 1146, in cythonize_one
        raise CompileError(None, pyx_file)
    Cython.Compiler.Errors.CompileError: src/pyfasttext.pyx

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-_kbz9r/pyfasttext/

If I do pip install Cython==0.27.3 first I can successfully install pyfasttext

The above was run using Python 2.7.12 on Ubuntu 16.0.4.3

opened by flawaetz 3

pyfasttext import error

Hi, I successfully installed pyfasttext under Python 3.6, but on import I get this error:

ImportError Traceback (most recent call last)
in ()
----> 1 from pyfasttext import FastText

ImportError: /home/stephane.mbatchou/anaconda3/lib/python3.6/site-packages/pyfasttext.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZTINSt6thread6_StateE

Thanks

opened by bananemure 3

Build fails with macOS 10.12

With Python 2.7.14 the compilation fails on macOS 10.12. Requirements (cython, cysignals, future and numpy) are satisfied. Compilation fails both using pip and manual conversion.

C-Compiler is:

Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

src/fasttext_access.cpp:54:48: error: 'find' is a private member of 'fasttext::Dictionary'
ALLOW_CONST_METHOD_ACCESS(Dictionary, int32_t, find, const std::string&);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

opened by MarcoNiemann 3

Quantization error
I am not able to pass other options such as qnorm when trying to quantize the model:

model.quantize(input=train_path, output=model_out_path, qnorm=True, retrain=True, cutoff=100000)

This produces the stack trace below:

src/pyfasttext.pyx in pyfasttext.FastText.quantize (src/pyfasttext.cpp:11347)()
src/pyfasttext.pyx in pyfasttext.FastText.train (src/pyfasttext.cpp:10844)()
RuntimeError: fastext tried to exit: 1
opened by whiletruelearn 2
Install fails with pip 10
Summary

Install currently fails with pip >= 10. This is because this line uses pip's .main API which is no longer present in the package.

See for more details: https://mail.python.org/pipermail/distutils-sig/2017-October/031642.html https://github.com/pypa/pip/issues/5191

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-5ikhpkeg/pyfasttext/setup.py", line 18, in <module>
    ret = pip.main(['install', 'cysignals'])
AttributeError: module 'pip' has no attribute 'main'
opened by DomHudson 2
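
For context, the pip documentation advises against importing pip from a program and recommends running it as a subprocess of the current interpreter instead. A minimal sketch of that pattern is shown below. It is not necessarily what pyfasttext's setup.py does; the helper name pip_install_command is hypothetical.

```python
import sys

def pip_install_command(package):
    """Build the argument list for 'python -m pip install <package>'."""
    return [sys.executable, '-m', 'pip', 'install', package]

command = pip_install_command('cysignals')
print(command[1:])  # ['-m', 'pip', 'install', 'cysignals']

# To actually run the installation:
# import subprocess
# subprocess.check_call(command)
```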

Quantization error: Model file cannot be opened for loading!

Great library! I have an issue with quantization.

When I try to run the quantization example:

from pyfasttext import FastText

model = FastText()
model.quantize(input='data/input_data.txt', output='artifacts/quantized_model',
               epoch=40, lr=0.3,
               dim=80, minn=2,
               maxn=3,
               label='__label__',
               wordNgrams=2)

The python process exits with code 139 and the following error message:

Model file cannot be opened for loading!
------------------------------------------------------------------------
0   signals.cpython-35m-darwin.so       0x0000000110fbcbb8 sigdie + 120
1   signals.cpython-35m-darwin.so       0x0000000110fbcaef cysigs_signal_handler + 351
2   libsystem_platform.dylib            0x00007fff5c0d4f5a _sigtramp + 26
3   libc++abi.dylib                     0x00007fff59efe467 GCC_except_table51 + 119
4   pyfasttext.cpython-35m-darwin.so    0x00000001100f6519 _ZL39__pyx_pf_10pyfasttext_8FastText_36trainP31__pyx_obj_10pyfasttext_FastTextP7_objectS2_ + 6905
5   pyfasttext.cpython-35m-darwin.so    0x00000001100f0c8f _ZL39__pyx_pw_10pyfasttext_8FastText_37trainP7_objectS0_S0_ + 111
6   python                              0x000000010f7de8fe PyCFunction_Call + 62
7   pyfasttext.cpython-35m-darwin.so    0x00000001100e1f71 _ZL19__Pyx_PyObject_CallP7_objectS0_S0_ + 97
8   pyfasttext.cpython-35m-darwin.so    0x00000001100f1573 _ZL42__pyx_pw_10pyfasttext_8FastText_45quantizeP7_objectS0_S0_ + 227
9   python                              0x000000010f7de8fe PyCFunction_Call + 62
10  python                              0x000000010f85f357 PyEval_EvalFrameEx + 23159
11  python                              0x000000010f8624ab _PyEval_EvalCodeWithName + 3115
12  python                              0x000000010f85988c PyEval_EvalCode + 44
13  python                              0x000000010f88881d PyRun_FileExFlags + 205
14  python                              0x000000010f887d88 PyRun_SimpleFileExFlags + 280
15  python                              0x000000010f8a00e6 Py_Main + 2982
16  python                              0x000000010f78f128 main + 232
17  libdyld.dylib                       0x00007fff5be53115 start + 1
18  ???                                 0x0000000000000002 0x0 + 2
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

opened by saxelsen 2

add Trove classifier for POSIX
As of pyfasttext == 0.4.4, the relevant operating systems are not indicated on PyPI, except for the comment:

Warning: pyfasttext does not currently compile on Windows because the cysignals module does not support this platform.

in the README.

(Thanks for mentioning this in the README, it made me aware of cysignals requirements.) Also, it may be worth noting that cysignals appears to work using Cygwin. See also https://github.com/sagemath/cysignals/pull/64.

Also, it is possible to support platforms where cysignals is unavailable via conditional compilation:

IF HAVE_CYSIGNALS:
    from cysignals.signals cimport sig_on, sig_off, sig_check
ELSE:  # for non-POSIX systems
    noop = lambda: None
    sig_on = noop
    sig_off = noop
    sig_check = noop

at:

https://github.com/vrasneur/pyfasttext/blob/35bbbf4e80fc65851b37e1bb988d8d396f1b34cb/src/pyfasttext.pyx#L38

as outlined here. (In this case, a try:...cimport...except ImportError: raises errors when compiling with Cython, so an IF...ELSE is needed).
opened by johnyf 2
pyhdc.cpp:5:10: fatal error: 'cstdint' file not found , 1 error generated. error: command 'gcc' failed with exit status 1

I am using macOS High Sierra 10.13.6. I tried to install pyhdc, but I got the following error:

bb$ sudo python3 setup.py install
running install
running build
running build_ext
building 'pyhdc' extension
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/eman/anaconda3/include -arch x86_64 -I/Users/eman/anaconda3/include -arch x86_64 -DHNAME=permutations_8160.h -I/Users/eman/anaconda3/lib/python3.7/site-packages/numpy/core/include -I. -I/Users/eman/anaconda3/include/python3.7m -c pyhdc.cpp -o build/temp.macosx-10.7-x86_64-3.7/pyhdc.o -std=c++11
pyhdc.cpp:5:10: fatal error: 'cstdint' file not found
#include <cstdint>
         ^~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1

I installed gcc and Std libraries, but it does not work. Any advice?

opened by emfhasan 0
Failed to install pyfasttext on MacOS 10.14.6 (Mojave) via pip

I want to install pyfasttext into my project virtual environment on MacOS 10.14.6 (Mojave) via pip.

Some dependencies have been installed (like cython and cysignals), but I still get the error below.

I tried reinstalling the Xcode command line tools, but it did not work.

gcc -DNDEBUG -g -fwrapv -O3 -Wall -iquote . -include src/custom_exit.h -Isrc -I/Users/duan/IDMED/zeta_search/venv/lib/python3.6/site-packages/cysignals -I. -Isrc/variant/include -I/Users/duan/IDMED/zeta_search/venv/include -I/Library/Frameworks/Python.framework/Versions/3.6/include/python3.6m -I/Users/duan/IDMED/zeta_search/venv/lib/python3.6/site-packages/numpy/core/include -c src/fasttext_access.cpp -o build/temp.macosx-10.9-x86_64-3.6/src/fasttext_access.o -Wno-sign-compare -std=c++0x
src/fasttext_access.cpp:43:1: error: non-type template argument is not a pointer to member constant
ALLOW_METHOD_ACCESS(FastText, bool, checkModel, std::istream&);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/private_access.h:55:38: note: expanded from macro 'ALLOW_METHOD_ACCESS'
  template struct rob<Only_##MEMBER, (RET_TYPE(CLASS::*)(__VA_ARGS__))(&CLASS::MEMBER)>
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasttext_access.cpp:54:1: error: non-type template argument is not a pointer to member constant
ALLOW_CONST_METHOD_ACCESS(Dictionary, int32_t, find, const std::string&);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/private_access.h:59:38: note: expanded from macro 'ALLOW_CONST_METHOD_ACCESS'
  template struct rob<Only_##MEMBER, (RET_TYPE(CLASS::*)(__VA_ARGS__) const)(&CLASS::MEMBER)>
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasttext_access.cpp:55:1: error: non-type template argument is not a pointer to member constant
ALLOW_CONST_METHOD_ACCESS(Dictionary, void, pushHash, std::vector<int32_t>&, int32_t);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/private_access.h:59:38: note: expanded from macro 'ALLOW_CONST_METHOD_ACCESS'
  template struct rob<Only_##MEMBER, (RET_TYPE(CLASS::*)(__VA_ARGS__) const)(&CLASS::MEMBER)>
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasttext_access.cpp:56:1: error: non-type template argument is not a pointer to member constant
ALLOW_METHOD_ACCESS(Dictionary, void, initTableDiscard, );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/private_access.h:55:38: note: expanded from macro 'ALLOW_METHOD_ACCESS'
  template struct rob<Only_##MEMBER, (RET_TYPE(CLASS::*)(__VA_ARGS__))(&CLASS::MEMBER)>
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasttext_access.cpp:57:1: error: non-type template argument is not a pointer to member constant
ALLOW_METHOD_ACCESS(Dictionary, void, initNgrams, );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/private_access.h:55:38: note: expanded from macro 'ALLOW_METHOD_ACCESS'
  template struct rob<Only_##MEMBER, (RET_TYPE(CLASS::*)(__VA_ARGS__))(&CLASS::MEMBER)>
                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 errors generated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/duan/IDMED/zeta_search/venv/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/ls/1mmwn3p11nd2j32fbw3704ww0000gn/T/pip-install-itzgumsa/pyfasttext/setup.py'"'"'; __file__='"'"'/private/var/folders/ls/1mmwn3p11nd2j32fbw3704ww0000gn/T/pip-install-itzgumsa/pyfasttext/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/ls/1mmwn3p11nd2j32fbw3704ww0000gn/T/pip-record-haa1t52z/install-record.txt --single-version-externally-managed --compile --install-headers /Users/duan/IDMED/zeta_search/venv/include/site/python3.6/pyfasttext Check the logs for full command output.

opened by Kungreye 4
Fails install on windows 10 with python 3
Hi, I tried to install pyfasttext on Windows 10 under Python 3.6. I have g++ from MinGW and clang (LLVM 7.0.1), and I added both to my environment variables. I set USE_CYSIGNALS=0 because I'm on Windows. But when I run the command python setup.py install, I get the following error:

File "setup.py", line 66, in build_extensions
    if 'clang' in self.compiler.compiler[0]:
AttributeError: 'MSVCCompiler' object has no attribute 'compiler'

Can anyone help me with this? I want to be able to use pyfasttext on Windows.
opened by Vonisoa 0
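
The traceback above comes from reading a compiler attribute that the MSVC compiler object does not define. One possible way to guard such a check, as a minimal sketch only: uses_clang, FakeUnixCompiler and FakeMSVCCompiler are hypothetical names for illustration and are not part of pyfasttext's setup.py. It assumes that a missing compiler attribute should simply be treated as "not clang".

```python
def uses_clang(compiler):
    """Return True if the compiler object appears to wrap clang.

    Hypothetical helper: falls back to False when the object has no
    `compiler` attribute, instead of raising AttributeError.
    """
    command = getattr(compiler, "compiler", None)
    if not command:
        return False
    return "clang" in command[0]


class FakeUnixCompiler:
    """Stand-in for a compiler object that exposes its command as a list."""

    def __init__(self, command):
        self.compiler = command


class FakeMSVCCompiler:
    """Stand-in for a compiler object without a `compiler` attribute."""


if __name__ == "__main__":
    print(uses_clang(FakeUnixCompiler(["clang++", "-O3"])))
    print(uses_clang(FakeUnixCompiler(["gcc"])))
    print(uses_clang(FakeMSVCCompiler()))
```

Whether the rest of the build then succeeds with MSVC is a separate question that this sketch does not address.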
The official fastText example is very strange; does anybody know how to use it?

The official Python binding from the fastText repository (https://github.com/facebookresearch/fastText/tree/master/python) has only a few examples. Compared with the pyfasttext documentation, I cannot understand the official documentation. Does anybody know how to understand the official fastText example? In my opinion, the pyfasttext documentation is better than the official fastText documentation.

opened by chenbaicheng 0
bayesopt import error

I tried to run the example given in the repo, but I got a bayesopt import error. I couldn't find this package on pip either. Can you tell me what I have missed?

opened by giriannamalai 2

Releases(0.4.6)

0.4.6(Dec 8, 2018)
fix parsing of boolean quantization arguments

0.4.5(May 6, 2018)
make cysignals optional (and enables compilation for Windows systems)

add more PyPI trove classifiers (thanks @johnyf)

support pip 10

0.4.4(Nov 4, 2017)
add support for clang++

add more details in the README

add more examples

fix build with old cython versions

fix a bug when accessing the subwords

0.4.3(Oct 26, 2017)
update fastText source code

update the variant library source code

add access to subword vectors (Python or numpy array)

add more getters (model version, module version, fastText version, ...)

update README

0.4.2(Oct 14, 2017)
force installation of cysignals before compiling the module with pip

0.4.1(Oct 13, 2017)
Fix dependency build order when installing using pip

Add a new model.get_subwords(word) method

0.4.0(Oct 4, 2017)
Training and other expensive methods are interruptible (e.g. by typing Ctrl-C)

Better const correctness

Update fastText source code (add subwords support in supervised models)

0.3.0(Sep 2, 2017)
Support older fastText models (before quantization support)

Remove useless parameter

Update fastText source code to latest version

0.2.2(Aug 22, 2017)
Add documentation for model.words_for_vector() (thanks @yaakov2)

Add continuous integration (build and tests) using Travis CI

Change variant library in order to support g++ 4.7 and 4.8

0.2.1(Aug 16, 2017)

Add a new model.words_for_vector(vector, k) method. (Thanks @yaakov2)
0.2.0(Aug 6, 2017)
Fix some bugs

Add support for numpy ndarray (optional)

Update the fastText source to the latest version

0.1.0(Aug 1, 2017)
Initial release!

Fairly complete support of all the features of the fastText binary (training, prediction, quantization)

Add some new features too (normalized probabilities, label extraction, ...)


Owner

Vincent Rasneur


