🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

Overview

From the makers of spaCy, Prodigy and FastAPI

Thinc is a lightweight deep learning library that offers an elegant, type-checked, functional-programming API for composing models, with support for layers defined in other frameworks such as PyTorch, TensorFlow and MXNet. You can use Thinc as an interface layer, a standalone toolkit or a flexible way to develop new models. Previous versions of Thinc have been running quietly in production in thousands of companies, via both spaCy and Prodigy. We wrote the new version to let users compose, configure and deploy custom models built with their favorite framework.

🔥 Features

  • Type-check your model definitions with custom types and a mypy plugin.
  • Wrap PyTorch, TensorFlow and MXNet models for use in your network.
  • Concise functional-programming approach to model definition, using composition rather than inheritance.
  • Optional custom infix notation via operator overloading.
  • Integrated config system to describe trees of objects and hyperparameters.
  • Choice of extensible backends.
  • Read more →
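The composition-over-inheritance idea can be sketched in a few lines of plain Python. This is an illustrative toy, not the real Thinc API (Thinc's actual combinators also thread backpropagation callbacks and state through the layers, and the chaining can additionally be spelled with the `>>` operator mentioned above):

```python
# Toy sketch of composition-based model building (not the real Thinc API).
# Each "layer" is just a function; chain composes them left to right.

def chain(*layers):
    def forward(X):
        for layer in layers:
            X = layer(X)
        return X
    return forward

double = lambda xs: [x * 2 for x in xs]
increment = lambda xs: [x + 1 for x in xs]

model = chain(double, increment)
print(model([1, 2, 3]))  # [3, 5, 7]
```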

🚀 Quickstart

Thinc is compatible with Python 3.6+ and runs on Linux, macOS and Windows. The latest releases with binary wheels are available from pip. Before you install Thinc and its dependencies, make sure that your pip, setuptools and wheel are up to date. For the most recent releases, pip 19.3 or newer is recommended.

pip install -U pip setuptools wheel
pip install thinc --pre

⚠️ Note that Thinc 8.0 is currently in alpha preview and not necessarily ready for production yet.

See the extended installation docs for details on optional dependencies for different backends and GPU. You might also want to set up static type checking to take advantage of Thinc's type system.

⚠️ If you have installed PyTorch and you are using Python 3.7+, uninstall the package dataclasses with pip uninstall dataclasses, since it may have been installed by PyTorch and is incompatible with Python 3.7+.

📓 Selected examples and notebooks

Also see the /examples directory and usage documentation for more examples. Most examples are Jupyter notebooks – to launch them on Google Colab (with GPU support!) click on the button next to the notebook name.

Notebook · Description

  • intro_to_thinc (Open in Colab) – Everything you need to know to get started. Composing and training a model on the MNIST data, using config files, registering custom functions and wrapping PyTorch, TensorFlow and MXNet models.
  • transformers_tagger_bert (Open in Colab) – How to use Thinc, transformers and PyTorch to train a part-of-speech tagger, from model definition and config to the training loop.
  • pos_tagger_basic_cnn (Open in Colab) – Implementing and training a basic CNN part-of-speech tagging model without external dependencies, using different levels of Thinc's config system.
  • parallel_training_ray (Open in Colab) – How to set up synchronous and asynchronous parameter server training with Thinc and Ray.

View more →

📖 Documentation & usage guides

Introduction Everything you need to know.
Concept & Design Thinc's conceptual model and how it works.
Defining and using models How to compose models and update state.
Configuration system Thinc's config system and function registry.
Integrating PyTorch, TensorFlow & MXNet Interoperability with machine learning frameworks.
Layers API Weights layers, transforms, combinators and wrappers.
Type Checking Type-check your model definitions and more.
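For a flavor of the config system, here is a hedged sketch of what a Thinc-style config might look like. Section names, the registry string and the interpolation syntax are illustrative; check the configuration docs for the exact format:

```ini
[hyper_params]
n_hidden = 512
dropout = 0.2

# The @layers key tells Thinc which registered function builds this block.
[model]
@layers = "Relu.v1"
nO = ${hyper_params:n_hidden}
```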

🗺 What's where

Module Description
thinc.api User-facing API. All classes and functions should be imported from here.
thinc.types Custom types and dataclasses.
thinc.model The Model class. All Thinc models are an instance (not a subclass) of Model.
thinc.layers The layers. Each layer is implemented in its own module.
thinc.shims Interface for external models implemented in PyTorch, TensorFlow etc.
thinc.loss Functions to calculate losses.
thinc.optimizers Functions to create optimizers. Currently supports "vanilla" SGD, Adam and RAdam.
thinc.schedules Generators for different rates, schedules, decays or series.
thinc.backends Backends for numpy and cupy.
thinc.config Config parsing and validation and function registry system.
thinc.util Utilities and helper functions.
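As an illustration of what thinc.schedules provides: a rate schedule is just a generator of values that the training loop draws from. A minimal sketch (illustrative, not the library's code):

```python
import itertools

# Sketch of a hyperbolically decaying learning-rate schedule, in the spirit
# of thinc.schedules (illustration only).
def decaying(base_rate, decay):
    for step in itertools.count():
        yield base_rate / (1.0 + decay * step)

# Take the first few rates from the (infinite) schedule.
rates = list(itertools.islice(decaying(0.001, 1e-4), 3))
print(rates[0])  # 0.001
```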

🐍 Development notes

Thinc uses black for auto-formatting, flake8 for linting and mypy for type checking. All code is written to be compatible with Python 3.6+, with type hints wherever possible. See the type reference for more details on Thinc's custom types.

👷‍♀️ Building Thinc from source

Building Thinc from source requires the full dependencies listed in requirements.txt to be installed. You'll also need a compiler to build the C extensions.

git clone https://github.com/explosion/thinc
cd thinc
python -m venv .env
source .env/bin/activate
pip install -U pip setuptools wheel
pip install -r requirements.txt
pip install --no-build-isolation .

Alternatively, install in editable mode:

pip install -r requirements.txt
pip install --no-build-isolation --editable .

Or by setting PYTHONPATH:

export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py build_ext --inplace

🚦 Running tests

Thinc comes with an extensive test suite. The following should all pass and not report any warnings or errors:

python -m pytest thinc    # test suite
python -m mypy thinc      # type checks
python -m flake8 thinc    # linting

To view test coverage, you can run python -m pytest thinc --cov=thinc. We aim for 100% test coverage. This doesn't mean we meticulously write tests for every single line – we ignore blocks that are not relevant or difficult to test and make sure the tests execute all code paths.

Issues
  • thinc_gpu_ops not built properly

    windows 10, cuda10, spacy100, python3.6

    I tried to check the folder as https://github.com/explosion/thinc/issues/79#issuecomment-461230144 and did not see the cpython file.

    Then I proceed to build gpu_ops by running "pip install --force-reinstall --no-binary :all: thinc-gpu-ops"

    But still there is no cpython file being generated.

    thinc_gpu_ops.AVAILABLE evaluates to False

    Thank you, D

    opened by ghost 39
  • module 'thinc_gpu_ops' has no attribute 'hash'

    On Win10, thinc 6.12.0, spacy 2.0.16, cupy 4.1.0

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-5-2432a7701a48> in <module>()
          1 # defining doc
    ----> 2 doc = nlp("Jill laughed at John Johnson.")
          3 spacy.displacy.render(doc, style='dep', options={'distance' : 140}, jupyter=True)
    
    E:\Anaconda3python\lib\site-packages\spacy\language.py in __call__(self, text, disable)
        344             if not hasattr(proc, '__call__'):
        345                 raise ValueError(Errors.E003.format(component=type(proc), name=name))
    --> 346             doc = proc(doc)
        347             if doc is None:
        348                 raise ValueError(Errors.E005.format(name=name))
    
    pipeline.pyx in spacy.pipeline.Tagger.__call__()
    
    pipeline.pyx in spacy.pipeline.Tagger.predict()
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in predict(self, X)
         53     def predict(self, X):
         54         for layer in self._layers:
    ---> 55             X = layer(X)
         56         return X
         57 
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in predict(seqs_in)
        291     def predict(seqs_in):
        292         lengths = layer.ops.asarray([len(seq) for seq in seqs_in])
    --> 293         X = layer(layer.ops.flatten(seqs_in, pad=pad))
        294         return layer.ops.unflatten(X, lengths, pad=pad)
        295 
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in predict(self, X)
         53     def predict(self, X):
         54         for layer in self._layers:
    ---> 55             X = layer(X)
         56         return X
         57 
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in predict(self, X)
        123 
        124     def predict(self, X):
    --> 125         y, _ = self.begin_update(X)
        126         return y
        127 
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in uniqued_fwd(X, drop)
        372                                                     return_counts=True)
        373         X_uniq = layer.ops.xp.ascontiguousarray(X[ind])
    --> 374         Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
        375         Y = Y_uniq[inv].reshape((X.shape[0],) + Y_uniq.shape[1:])
        376         def uniqued_bwd(dY, sgd=None):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(self, X, drop)
         59         callbacks = []
         60         for layer in self._layers:
    ---> 61             X, inc_layer_grad = layer.begin_update(X, drop=drop)
         62             callbacks.append(inc_layer_grad)
         63         def continue_update(gradient, sgd=None):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\hash_embed.py in begin_update(self, ids, drop)
         49         if ids.ndim >= 2:
         50             ids = self.ops.xp.ascontiguousarray(ids[:, self.column], dtype='uint64')
    ---> 51         keys = self.ops.hash(ids, self.seed) % self.nV
         52         vectors = self.vectors[keys].sum(axis=1)
         53         mask = self.ops.get_dropout_mask((vectors.shape[1],), drop)
    
    ops.pyx in thinc.neural.ops.CupyOps.hash()
    
    AttributeError: module 'thinc_gpu_ops' has no attribute 'hash'
    
    opened by j2l 32
  • Add assert that size hasn't changed for reduce mean backprop

    Behavior of this with numpyops seems wrong, as instead of giving an error it produces nans, as though it's accessing out of bounds memory or something. See https://github.com/explosion/spaCy/pull/9669.

    Some more general issues / questions around this:

    1. Do we want to add this kind of check for all cases where appropriate?
    2. Is there a better place (like in numpyops) for this check or an equivalent check?
    3. Other asserts in Thinc seem to not use messages, should the message here be removed?
    feat / layers feat / ux 
    opened by polm 14
  • numpy.ndarray size changed: Expected 88 from C header, got 80 from PyObject

    error shown in the terminal:
    File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
    ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

    numpy version 1.19.0 is installed on my server. Could you please show me how to solve this problem? Thanks!

    install 
    opened by hzg456 13
  • Using list2padded for sequences

    I have referred to https://github.com/explosion/thinc/blob/master/examples/03_pos_tagger_basic_cnn.ipynb to get a better understanding of the thinc layers. The model is as follows in the example

    model = strings2arrays() >> with_array(
            HashEmbed(nO=width, nV=vector_width, column=0)
            >> expand_window(window_size=1)
            >> ReLu(nO=width, nI=width * 3)
            >> ReLu(nO=width, nI=width)
            >> Softmax(nO=nr_classes, nI=width)
        )
    

    Can someone please explain what strings2arrays does? The documentation says that it takes a sequence of sequences of strings and produces a List[Ints2d].

    The input X_train is something like [["this","is","awesome"],["thinc","is","cool"]]. What does strings2arrays transform this example into? I am unable to wrap my head around what exactly strings2arrays does and how it transforms the input from a List[List] (2D lists) to a List[Ints2d] (3D lists/sequences) technically.
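    One way to picture strings2arrays (a hypothetical sketch, not Thinc's implementation): each token is mapped to an integer ID, and each sequence of N tokens becomes an N x 1 integer array, so the list of sequences becomes a list of 2d arrays:

```python
# Hypothetical sketch of strings2arrays' behavior: hash each token to an
# integer ID; each sequence of N strings becomes an N x 1 "array" (here a
# list of 1-element rows). Not Thinc's actual implementation.

def strings2arrays_sketch(seqs):
    return [[[hash(tok) & 0xFFFFFFFF] for tok in seq] for seq in seqs]

X_train = [["this", "is", "awesome"], ["thinc", "is", "cool"]]
arrays = strings2arrays_sketch(X_train)
# arrays[0] has shape (3, 1): one integer row per token.
```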

    feat / layers 
    opened by naveenjafer 10
  • Fixes for slow tests

    Will leave this in draft mode until explosionbot is updated to support the Buildkite thinc slow test suite.

    Requirements

    • explosion/explosion-bot#3
    • #672
    tests 
    opened by shadeMe 9
  • Fix reductions when applied to zero-length sequences

    • A bug was introduced in NumpyOps that caused the output pointer not to be incremented for zero-length sequences.
    • When using uninitialized arrays, the sum or mean for a zero-length array was not correctly set to zero in Ops.

    We should also do something about zero-length sequences for reduce_max. However, it's unclear what the corresponding index should be set to (since 0 is not a valid index into a zero-length sequence). Maybe we should throw an exception when trying to apply reduce_max to a zero-length sequence?
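    The bug class discussed here can be illustrated with a hedged pure-Python sketch of a length-based mean reduction (an illustration, not Thinc's code):

```python
# Sketch of reduce_mean over a flattened batch with per-sequence lengths.
# The key details: the read offset must advance correctly even when n == 0,
# and a zero-length sequence must produce an explicit 0.0, not garbage.

def reduce_mean(flat, lengths):
    out, i = [], 0
    for n in lengths:
        out.append(sum(flat[i:i + n]) / n if n else 0.0)
        i += n
    return out

print(reduce_mean([1.0, 2.0, 3.0, 4.0], [2, 0, 2]))  # [1.5, 0.0, 3.5]
```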

    bug feat / ops 
    opened by danieldk 9
  • Reduce unnecessary zero-init'd allocations

    I've added a new keyword arg uninitialized to the allocator methods in the NumpyOps and Ops classes. Also replaced some zero-init'd allocations where the elements in the destination array were immediately replaced.

    enhancement feat / ops 
    opened by shadeMe 9
  • Add textcat from config example

    A textcat from config example that supports two datasets with mutually exclusive classes:

    • imdb: 2 classes, from ml-datasets
    • dbpedia_ontology: 14 classes, loader implemented locally for fast.ai dataset on AWS

    Requires syntok for tokenization.

    examples 
    opened by adrianeboyd 9
  • ModuleNotFoundError: No module named 'thinc.neural.util'

    I installed spaCy on my Mac OSX 10.11.6 into an Anaconda Python 3.6 environment with conda. I also was able to build thinc from source with 'fab'.

    $ conda --version
    conda 4.3.29
    $ python --version
    Python 3.6.3
    $ fab --version
    Fabric 1.14.0
    Paramiko 2.3.1
    
    

    When I try to install the spaCy model I get this error:

     python -m sense2vec.download
    
    

    ModuleNotFoundError: No module named 'thinc.neural.util'

    I see util.py in thinc.neural package on github. There must be a conflict or version mismatch with what I downloaded from conda and what is in the latest version of 'thinc' on github. Any ideas?

    From Conda: spacy: 2.0.4-py36_0 conda-forge thinc: 6.10.0-py36_0 conda-forge

    $ conda list thinc
    # packages in environment at /Users/davidlaxer/anaconda/envs/py36:
    #
    thinc                     6.10.2                    <pip>
    thinc                     6.10.0                   py36_0    conda-forge
    thinc                     5.0.8                     <pip>
    (py36) David-Laxers-MacBook-Pro:sense2vec davidlaxer$ conda list spacy
    # packages in environment at /Users/davidlaxer/anaconda/envs/py36:
    #
    spacy                     2.0.5                     <pip>
    spacy                     2.0.4                    py36_0    conda-forge
    spacy                     0.101.0                   <pip>
    (py36) David-Laxers-MacBook-Pro:sense2vec davidlaxer$ python -m spacy validate
    Traceback (most recent call last):
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/runpy.py", line 183, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/runpy.py", line 142, in _get_module_details
        return _get_module_details(pkg_main_name, error)
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/runpy.py", line 109, in _get_module_details
        __import__(pkg_name)
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/site-packages/spacy/__init__.py", line 4, in <module>
        from .cli.info import info as cli_info
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/site-packages/spacy/cli/__init__.py", line 1, in <module>
        from .download import download
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/site-packages/spacy/cli/download.py", line 10, in <module>
        from .link import link
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/site-packages/spacy/cli/link.py", line 7, in <module>
        from ..compat import symlink_to, path2str
      File "/Users/davidlaxer/anaconda/envs/py36/lib/python3.6/site-packages/spacy/compat.py", line 11, in <module>
        from thinc.neural.util import copy_array
    ModuleNotFoundError: No module named 'thinc.neural.util'
    
    
    
    opened by dbl001 9
  • Failed to compile on Windows and Python 3.6

    Hi, I tried to install spacy 2.0 alpha on Windows, but I can´t compile the thinc.

    I´m using Python 3.6 64 bits.

    Finished generating code
        building 'thinc.neural.gpu_ops' extension
        C:\Program Files (x86)\Microsoft Visual Studio 15.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Ic:\python36\include -IC:\Users\bratao\AppData\Local\Temp\pip-build-8r66gl1_\thinc\include -Ic:\python36\include -Ic:\python36\include "-IC:\Program Files (x86)\Microsoft Visual Studio 15.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 15.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\winrt" /EHsc /Tpthinc/neural/gpu_ops.cpp /Fobuild\temp.win-amd64-3.6\Release\thinc/neural/gpu_ops.obj gcc nvcc
        cl : Command line warning D9024 : unrecognized source file type 'gcc', object file assumed
        cl : Command line warning D9027 : source file 'gcc' ignored
        cl : Command line warning D9024 : unrecognized source file type 'nvcc', object file assumed
        cl : Command line warning D9027 : source file 'nvcc' ignored
        gpu_ops.cpp
        c:\users\bratao\appdata\local\temp\pip-build-8r66gl1_\thinc\include\numpy\npy_deprecated_api.h(8) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
        C:\Program Files (x86)\Microsoft Visual Studio 15.0\VC\BIN\x86_amd64\link.exe /nologo /INCREMENTAL:NO /LTCG /nodefaultlib:libucrt.lib ucrt.lib /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:c:\python36\libs /LIBPATH:c:\python36\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio 15.0\VC\LIB\amd64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio 15.0\VC\ATLMFC\LIB\amd64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" /EXPORT:PyInit_gpu_ops build\temp.win-amd64-3.6\Release\thinc/neural/gpu_ops.obj /OUT:build\lib.win-amd64-3.6\thinc\neural\gpu_ops.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\thinc/neural\gpu_ops.cp36-win_amd64.lib
        gpu_ops.obj : warning LNK4197: export 'PyInit_gpu_ops' specified multiple times; using first specification
           Creating library build\temp.win-amd64-3.6\Release\thinc/neural\gpu_ops.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\thinc/neural\gpu_ops.cp36-win_amd64.exp
        gpu_ops.obj : error LNK2001: unresolved external symbol "void __cdecl gpu_max_pool(float *,int *,float const *,int const *,int,int,int)" ([email protected]@[email protected])
        gpu_ops.obj : error LNK2001: unresolved external symbol "void __cdecl gpu_backprop_max_pool(float *,float const *,int const *,int const *,int,int,int)" ([email protected]@[email protected])
        gpu_ops.obj : error LNK2001: unresolved external symbol "void __cdecl gpu_mean_pool(float *,float const *,int const *,int,int,int)" ([email protected]@[email protected])
        gpu_ops.obj : error LNK2001: unresolved external symbol "void __cdecl gpu_backprop_mean_pool(float *,float const *,int const *,int,int,int)" ([email protected]@[email protected])
        gpu_ops.obj : error LNK2001: unresolved external symbol "void __cdecl gpu_hash_data(char *,char const *,unsigned __int64,unsigned __int64,unsigned __int64,unsigned int)" ([email protected]@[email protected])
        build\lib.win-amd64-3.6\thinc\neural\gpu_ops.cp36-win_amd64.pyd : fatal error LNK1120: 5 unresolved externals
        error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 15.0\\VC\\BIN\\x86_amd64\\link.exe' failed with exit status 1120
    
        ----------------------------------------
    Command "c:\python36\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\bratao\\AppData\\Local\\Temp\\pip-build-8r66gl1_\\thinc\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\bratao\AppData\Local\Temp\pip-coc6ueyu-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\bratao\AppData\Local\Temp\pip-build-8r66gl1_\thinc\
    
    
    bug 
    opened by bratao 9
  • Fix performance regressions in parser caused by `get_ops`

    The thinc.backends.get_ops function gets called in the parser model's hot path to fetch the correct CBlas struct (when executing on the GPU, i.e. when CupyOps is used).

    This was problematic for two reasons:

    • Firstly, the function attempts to import a couple of external packages (which adds a lot of overhead when the package in question is not found on the host machine, leading to the Python runtime making a lot of IO syscalls).
    • Secondly, it attempts to enumerate all possible, registered ops. The get_all method it calls on the Registry object performs a copy of a global REGISTRY dict and iterates over it, incurring even more overhead.

    This PR introduces the following changes:

    • It moves the imports of thinc_apple_ops and thinc_backend_ops to the compat module to make them a one-off operation.
    • It modifies get_ops to directly use the specific Ops types when querying for CPU backends instead of looking it up by name in the global ops registry.
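    The one-off import pattern described above can be sketched as follows (the module name is a placeholder, and this is an illustration, not the PR's code):

```python
# Try the optional backend once, at module import time, instead of repeating
# the (potentially slow, syscall-heavy) import attempt inside a hot function.
try:
    import hypothetical_optional_backend  # placeholder name, not a real package
except ImportError:
    hypothetical_optional_backend = None

def get_backend():
    # Hot-path code now just reads a module-level variable.
    return hypothetical_optional_backend
```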
    bug feat / ops 
    opened by shadeMe 3
  • Add `with_signpost_interval` layer

    This layer wraps a layer, adding macOS interval signposts for the forward and backward pass. These intervals can then be visualized in the macOS Instruments.app timeline (similar to how NVTX ranges work).

    Example of Thinc layers visualized in the Instruments.app timeline (screenshot omitted).
    opened by danieldk 0
  • Label smooth threshold fix

    The maximum label-smoothing parameter allowed in to_categorical was 0.5. However, this bound is not correct in general. What we actually need to ensure is that the value at the index of the 1 in the one-hot vector remains the maximum after smoothing: we are looking for the largest alpha such that argmax(x) = argmax(smooth(x, alpha)).

    In the new version, the input validation dynamically adapts the maximum value that label_smoothing can take, and it also checks that the label_smoothing parameter is not negative.
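    Assuming the common smoothing scheme where the true index gets 1 - alpha and the remaining mass is spread as alpha / (n_classes - 1), the dynamic bound can be sketched as follows (an illustration, not the PR's exact code):

```python
# argmax is preserved iff 1 - alpha > alpha / (n_classes - 1),
# which rearranges to alpha < (n_classes - 1) / n_classes.

def max_label_smoothing(n_classes):
    return (n_classes - 1) / n_classes

def validate_label_smoothing(alpha, n_classes):
    upper = max_label_smoothing(n_classes)
    if not 0.0 <= alpha < upper:
        raise ValueError(f"label_smoothing must be in [0, {upper})")

print(max_label_smoothing(2))  # 0.5 -- the old fixed bound, correct only for 2 classes
```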

    bug feat / loss 
    opened by kadarakos 0
  • `get_array_module` with non-array input returns `None`

    The get_array_module utility returned numpy as the array module whenever the input was not a cupy array. I think this can lead to hard-to-catch errors, so this PR suggests returning None when the input is neither a cupy nor a numpy array.

    opened by kadarakos 0
  • Unroll `argmax` in `maxout` for small sizes of `P`

    maxout uses the argmax function to determine the index of the maximum value of each group of P inputs. argmax uses a generic array loop, which impedes speculative execution and could also prevent unrolling of the outer maxout loop.

    This change unrolls argmax for small values of P using a variadic template, leading to a small performance improvement.

    Before this change, maxout took ~3.9% of the runtime; after this change, ~2.7% (profiler screenshots omitted).

    (The unrolled argmax function is no longer a separate unit in the profile.)

    enhancement performance feat / ops 
    opened by danieldk 0
Releases (v8.0.17)
  • v8.0.17(Jun 2, 2022)

    ✨ New features and improvements

    • Extend support for typing_extensions up to v4.1.x (for Python 3.7 and earlier).
    • Various fixes in the test suite.

    👥 Contributors

    @adrianeboyd, @danieldk, @honnibal, @ines, @shadeMe

  • v8.0.16(May 19, 2022)

    🔴 Bug fixes

    • Fix issue #624: Support CPU inference for models trained with gradient scaling.
    • Fix issue #633: Fix invalid indexing in Beam when no states have valid transitions.
    • Fix issue #639: Improve PyTorch Tensor handling in CupyOps.asarray.
    • Fix issue #649: Clamp inputs in Ops.sigmoid to prevent overflow.
    • Fix issue #651: Fix type safety issue with model ID assignment.
    • Fix issue #653: Correctly handle Tensorflow GPU tensors in tests.
    • Fix issue #660: Make is_torch_array work without PyTorch installed.
    • Fix issue #664: Fix out of-bounds writes in CupyOps.adam and NumpyOps.adam.

    ⚠️ Backwards incompatibilities

    • The init implementations for layers no longer return Model.

    👥 Contributors

    @adrianeboyd, @danieldk, @honnibal, @ines, @kadarakos, @koaning, @notplus, @richardpaulhudson, @shadeMe

  • v8.0.15(Mar 15, 2022)

    🔴 Bug fixes

    • Fix issue #610: Improve compatibility with PyTorch versions before v1.9.0.

    👥 Contributors

    @adrianeboyd, @danieldk

  • v8.0.14(Mar 14, 2022)

    🔴 Bug fixes

    • Fix issue #552: Do not backpropagate Inf/NaN out of PyTorch layers when using mixed-precision training.
    • Fix issue #578: Correctly cast the threshold argument of CupyOps.mish and correct an equation in Ops.backprop_mish.
    • Fix issue #587: Correct invariant checks in CategoricalCrossentropy.get_grad.
    • Fix issue #592: Update the murmurhash requirement.
    • Fix issue #594: Do not sort positional arguments in Config.

    ⚠️ Backwards incompatibilities

    • The out keyword argument of Ops.mish and Ops.backprop_mish is replaced by inplace for consistency with other activations.
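    The out → inplace change matches the convention used by Thinc's other activations: instead of supplying an output buffer, the caller asks the op to overwrite its input. A rough sketch of the convention, using a hypothetical relu-style function rather than Thinc's code:

```python
import numpy as np

def relu(X: np.ndarray, inplace: bool = False) -> np.ndarray:
    """Apply max(x, 0). With inplace=True, X itself is overwritten
    and returned, avoiding an extra allocation."""
    if inplace:
        np.maximum(X, 0.0, out=X)
        return X
    return np.maximum(X, 0.0)
```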

    👥 Contributors

    @adrianeboyd, @andrewsi-z, @danieldk, @honnibal, @ines, @Jette16, @kadarakos, @kianmeng, @polm, @svlandeg, @thatbudakguy

  • v8.0.12(Oct 28, 2021)

    🔴 Bug fixes

    • Fix issue #553: Switch torch tensor type with set_ops and use_ops.
    • Fix issue #554: Always restore original ops after use_ops.

    👥 Contributors

    @adrianeboyd, @danieldk, @ryndaniels, @svlandeg

  • v8.0.11(Oct 20, 2021)

    ✨ New features and improvements

    • Speed up GPU training by up to ~25% by using cuBLAS for computing Frobenius norms in gradient clipping.
    • Give preference to AppleOps (if available) when calling get_ops("cpu").
    • Support missing values in CategoricalCrossentropy when the labels are integers.
    • Provide the option to run model.walk with depth-first traversal.
    • Wrap forward/init callbacks of a Model in with_debug and with_nvtx_range to facilitate recursively instrumenting models.
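    The cuBLAS speedup above concerns the Frobenius-norm computation inside gradient clipping. The underlying operation can be sketched in NumPy as follows (a per-array sketch of norm-based clipping, not Thinc's implementation; on GPU the norm itself is what cuBLAS accelerates):

```python
import numpy as np

def clip_grad_norm(grad: np.ndarray, max_norm: float) -> np.ndarray:
    """Rescale grad so its Frobenius (L2) norm is at most max_norm."""
    norm = float(np.linalg.norm(grad))  # Frobenius norm of the array
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```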

    🔴 Bug fixes

    • Fix issue #537: Fix replace_node on nodes with indirect node refs.

    👥 Contributors

    @adrianeboyd, @danieldk, @honnibal, @ines, @svlandeg

  • v8.0.10(Sep 7, 2021)

  • v8.0.9(Sep 3, 2021)

    ✨ New features and improvements

    • Add ops registry.
    • Enable config overrides to add new keys.
    • Allow newer releases of nbconvert and nbformat.
    • Add a layer for marking NVTX ranges.
    • Support mixed-precision training in the PyTorch shim (experimental).

    🔴 Bug fixes

    • Fix issue #521: Fix numpy_ops gemm output.
    • Fix issue #525: Fix mypy plugin crash on variadic arguments.

    👥 Contributors

    @adrianeboyd, @connorbrinton, @danieldk, @honnibal, @ines, @svlandeg

  • v8.0.8(Jul 19, 2021)

  • v8.0.7(Jul 1, 2021)

    🔴 Bug fixes

    • Fix issue #512: Include final n-gram in NumpyOps.ngrams.
    • Fix issue #516: Update initializers for typing in numpy 1.21+.
  • v8.0.6(Jul 1, 2021)

  • v8.0.5(Jun 16, 2021)

  • v8.0.4(Jun 11, 2021)

    ✨ New features and improvements

    • Add tuplify layer.
    • More generic implementation of the concatenate layer.
    • Add resizable layer.
    • Introduce force parameter for model.set_dim().
    • Improve UX when setting the GPU allocator.
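    The tuplify layer runs several child layers on the same input and returns their outputs as a tuple. A plain-Python sketch of the combinator's forward behaviour (ignoring Thinc's Model machinery and backpropagation):

```python
def tuplify(*layers):
    """Return a function that feeds one input X to every layer
    and collects the results in a tuple."""
    def forward(X):
        return tuple(layer(X) for layer in layers)
    return forward

# Hypothetical usage: two simple "layers" applied to the same input.
double_and_square = tuplify(lambda x: x * 2, lambda x: x * x)
```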

    🔴 Bug fixes

    • Fix issue #492: Fix backpropagation in with_getitem.
    • Fix issue #494: Resolve forward refs issue with Pydantic.
    • Fix issue #496: Avoid Pydantic versions with security vulnerabilities.

    👥 Contributors

    @adrianeboyd, @honnibal, @ines, @kludex, @polm, @svlandeg, @thomashacker

  • v8.0.3(Apr 19, 2021)

    🔴 Bug fixes

    • Fix issue #486: Fix expand_window for empty docs on GPU.
    • Fix issue #487: Require catalogue>=2.0.3 due to performance regressions related to importlib-metadata.
    • Fix issue #488: Fix config override & interpolate interaction.
  • v8.0.2(Mar 9, 2021)

    ✨ New features and improvements

    • Add map_list layer (#472)

    🔴 Bug fixes

    • Fix issue #465: Fix saving models to Pathy paths.
    • Fix issue #466: Avoid initializing with Y if X is set.
    • Fix issue #470: Reset torch tensor type in require_cpu.
    • Fix issue #484: Ensure consistency of nO dim for BiLSTM.
  • v8.0.1(Mar 9, 2021)

  • v8.0.0(Jan 24, 2021)

    🔮 This version of Thinc has been rewritten from the ground up and will be used to power the upcoming spaCy v3.0. The new Thinc v8.0 is a lightweight deep learning library that offers an elegant, type-checked, functional-programming API for composing models, with support for layers defined in other frameworks such as PyTorch, TensorFlow or MXNet. You can use Thinc as an interface layer, a standalone toolkit or a flexible way to develop new models. For more details, see the documentation.

    ✨ New features and improvements

    • Use any framework: Switch between PyTorch, TensorFlow and MXNet models without changing your application, or even create mutant hybrids using zero-copy array interchange.
    • Type checking: Develop faster and catch bugs sooner with sophisticated type checking. Trying to pass a 1-dimensional array into a model that expects two dimensions? That’s a type error. Your editor can pick it up as the code leaves your fingers.
    • Config system: Configuration is a major pain for ML. Thinc lets you describe trees of objects with references to your own functions, so you can stop passing around blobs of settings. It's simple, clean, and it works for both research and production.
    • Super lightweight: Small and easy to install with very few required dependencies, available on pip and conda for Linux, macOS and Windows. Simple source with a consistent API.
    • Concise functional-programming approach to model definition using composition rather than inheritance.
    • First-class support for variable-length sequences: multiple built-in sequence representations and your layers can use any object.
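    The config system describes trees of objects by naming registered functions in the configuration itself. A toy resolver illustrating the idea (Thinc's real Config and registry add interpolation, validation and much more; the names here are hypothetical):

```python
# Toy registry-driven config resolver, sketching the pattern.
registry = {}

def register(name):
    """Decorator that records a factory function under a name."""
    def wrapper(func):
        registry[name] = func
        return func
    return wrapper

def resolve(config):
    """Recursively build objects: a dict with an '@factory' key is
    replaced by the result of calling the registered function with
    the remaining keys as keyword arguments."""
    if isinstance(config, dict):
        resolved = {k: resolve(v) for k, v in config.items() if k != "@factory"}
        if "@factory" in config:
            return registry[config["@factory"]](**resolved)
        return resolved
    return config

@register("make_greeting")
def make_greeting(name, punctuation="!"):
    return f"Hello, {name}{punctuation}"

cfg = {"greeting": {"@factory": "make_greeting", "name": "Thinc"}}
```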
  • v7.4.5(Dec 11, 2020)

  • v7.4.4(Dec 10, 2020)

    🔴 Bug fixes

    • Update for compatibility with cupy v8.
    • Remove f-strings from PyTorchWrapper.
    • Remove detailed numpy build constraints from pyproject.toml.
    • Update Cython extension setup.
  • v7.4.3(Dec 10, 2020)

    ✨ New features and improvements

    • Add seed argument to ParametricAttention.
    • Dynamically include numpy headers and add numpy build constraints.
    • Update tests to support hypothesis v5.

    🔴 Bug fixes

    • Fix memory leak in Beam.
  • v7.4.2(Dec 10, 2020)

  • v7.4.1(May 24, 2020)

    🔴 Bug fixes

    • Use a 0-vector for OOV in StaticVectors to fix a similarity bug in spaCy.
    • Fix murmurhash on platforms where the long type was not 64-bit.
  • v7.3.1(Oct 30, 2019)

  • v7.3.0(Oct 28, 2019)

    ✨ New features and improvements

    • Add Mish activation. Use via the thinc.v2v.Mish layer, which computes f(X) = mish(W @ X + b). CUDA and Cython kernels are included to make the activation efficient.
    • Add experimental support for RAdam to the optimizer. Enable it by setting the keyword argument use_radam to True. In preliminary testing, it's a small change that's worth enabling.
    • Add experimental support for Lookahead to the optimizer. Enable it by setting the keyword argument lookahead_k to a positive integer. In preliminary testing, it helps if you're not using parameter averaging, but with averaging it's a bit worse.
    • Add experimental support for LARS to the optimizer. Enable it by setting use_lars to True. In preliminary testing, this hasn't worked well at all – possibly our implementation is broken.
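    The Mish activation added above is defined as mish(x) = x · tanh(softplus(x)). A NumPy sketch of the function itself (Thinc ships optimized CUDA and Cython kernels instead):

```python
import numpy as np

def mish(X: np.ndarray) -> np.ndarray:
    """mish(x) = x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x)).
    np.logaddexp(0, X) computes softplus without overflow for large X."""
    return X * np.tanh(np.logaddexp(0.0, X))
```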

    🙏 Acknowledgements

    Big thanks to @digantamisra98 for the Mish activation, especially the extensive experiments and simple gradient calculation. We expect to be using the activation in the next round of spaCy models.

    Gratitude to the fast.ai community for their crowd-sourced experiments, and especially to users @LessW2020, @MGrankin and others for their optimizer implementations, which we referenced heavily when implementing the optimizers for Thinc. More importantly, it's super helpful to have a community filtering the deluge of papers for techniques that work on a few different datasets. This thread on optimization research was particularly helpful.

  • v7.2.0(Oct 20, 2019)

    ✨ New features and improvements

    • Ditch thinc_gpu_ops for simpler GPU install.
    • Improve GPU support and PyTorch wrapper.

    🔴 Bug fixes

    • Fix issue #47: Fix ExtractWindow nW>=2.
    • Fix issue #51: Ditch thinc_gpu_ops for simpler GPU install.
    • Fix issue #88: Fix Quora URL in datasets.
    • Fix issue #115: Fix compilation on cygwin.

    👥 Contributors

    Thanks to @rupsaijna and @KoichiYasuoka for the pull requests!

  • v7.1.1(Sep 10, 2019)

    ✨ New features and improvements

    • Add support for preshed v3.0.0, which fixes some bugs when items are deleted from the table and adds Bloom filters.
    • Use collections.abc when possible and avoid deprecation warning.
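    A Bloom filter, as now provided by preshed, is a compact set-membership structure that can report false positives but never false negatives. A toy pure-Python sketch of the data structure (preshed's version is a tuned C implementation; sizes and hash scheme here are illustrative):

```python
class BloomFilter:
    """Toy Bloom filter: n_hashes bit positions per item in a
    size-bit field, stored as a Python int."""

    def __init__(self, size: int = 256, n_hashes: int = 3):
        self.size = size
        self.n_hashes = n_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.n_hashes):
            yield hash((seed, item)) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # All bits set => "probably present"; any bit clear => definitely absent.
        return all(self.bits & (1 << pos) for pos in self._positions(item))
```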

    👥 Contributors

    Thanks to @hervenicol for the pull request!

  • v7.1.0(Aug 23, 2019)

    ✨ New features and improvements

    • Support read-only numpy arrays, by specifying const in Cython memory-view types. Read-only arrays are helpful for shared-memory multiprocessing, e.g. from Apache Arrow's Plasma object store.

    • Update to cython-blis v0.4, which supports non-x86_64 CPU architectures. For wide (but slow) support, you can specify the environment variable BLIS_ARCH=generic before installing.
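    Read-only arrays of the kind the const memoryview support accepts can be produced in NumPy by clearing the write flag; buffers mapped from shared-memory stores like Plasma behave the same way. A small sketch:

```python
import numpy as np

def as_read_only(arr: np.ndarray) -> np.ndarray:
    """Return a view of arr that refuses in-place writes, mimicking
    buffers mapped from a shared-memory object store."""
    view = arr.view()
    view.setflags(write=False)
    return view
```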

  • v7.0.8(Jul 11, 2019)

  • v7.0.7(Jul 11, 2019)

  • v7.0.6(Jul 11, 2019)

Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing