🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

Overview

thinc-apple-ops

Make spaCy and Thinc up to 8 Γ— faster on macOS by calling into Apple's native libraries.

⏳ Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind scences is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. Matrix multiplication kernels use architecture-specific SIMD instructions for data-level parallism and can take factors such as cache sizes and intstruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have a separate matrix co-processor(s) called AMX. Since AMX is not well-documented, it is unclear how many AMX units Apple M CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit and there is empirical evidence that both performance clusters of the M1 Pro/Max have an AMX unit.

Even though AMX units use a set of undocumented instructions, the units can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement of the BLIS library that is used by Thinc. This is where the thinc-apple-ops package comes in. thinc-apple-ops extends the default Thinc ops, so that gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).

⏱ Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchark compares prediction speed of the de_core_news_lg spaCy model between the M1 with and without thinc-apple-ops. Results for an Intel Mac Mini and AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, using thinc-apple-ops improves performance by 4.3 times.

CPU BLIS thinc-apple-ops Package power (Watt)
Mac Mini (M1) 6492 27676 5
MacBook Air Core i5 2020 9790 10983 9
AMD Ryzen 5900X 22568 N/A 52

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. Using thinc-apple-ops improves training time by 3.0 times.

CPU BLIS thinc-apple-ops Package power (Watt)
Mac Mini M1 2020 3.34 10.07 5
MacBook Air Core i5 2020 3.10 3.27 10
AMD Ryzen 5900X 6.53 N/A 53
Comments
  • Pass through Accelerate sgemm/saxpy in Ops.cblas

    Pass through Accelerate sgemm/saxpy in Ops.cblas

    This can be used by e.g. the parser in spaCy 3.4 to use Accelerate's implementations.

    I am not sure how to handle this dependency-wise, since this requires Thinc 8.1, but we still want to people to be able to use thinc-apple-ops with Thinc 8.0.x and spaCy < 3.4. Do we need another minor release that sets thinc < 8.1.0?

    opened by danieldk 5
  • IndexError: Out of bounds on buffer access (axis 1)

    IndexError: Out of bounds on buffer access (axis 1)

    Hi I tried to use this awesome package and I am getting this error. Not sure what it means, maybe you guys could help me?

    I should mention that my data is quite big and I am also using some SWAP space. Could this be the reason of this error?

    [2021-09-28 21:09:01,238] [INFO] Set up nlp object from config
    [2021-09-28 21:09:01,500] [INFO] Pipeline: ['tok2vec', 'ner', 'sentencizer', 'entity_linker']
    [2021-09-28 21:09:01,505] [INFO] Created vocabulary
    [2021-09-28 21:09:01,505] [INFO] Finished initializing nlp object
    Traceback (most recent call last):
      File "/Users/joozty/Documents/kolurbo/venv/bin/spacy", line 8, in <module>
        sys.exit(setup_cli())
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/_util.py", line 69, in setup_cli
        command(prog_name=COMMAND)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
        return self.main(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1062, in main
        rv = self.invoke(ctx)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 763, in invoke
        return __callback(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
        return callback(**use_params)  # type: ignore
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/train.py", line 60, in train_cli
        nlp = init_nlp(config, use_gpu=use_gpu)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/training/initialize.py", line 84, in init_nlp
        nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/language.py", line 1272, in initialize
        proc.initialize(get_examples, nlp=self, **p_settings)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/pipeline/tok2vec.py", line 216, in initialize
        self.model.initialize(X=doc_sample)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 86, in init
        layer.initialize(X=curr_input, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 90, in init
        curr_input = layer.predict(curr_input)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 315, in predict
        return self._func(self, X, is_train=False)[0]
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in forward
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in <listcomp>
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
        return self._func(self, X, is_train=is_train)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/ml/staticvectors.py", line 46, in forward
        vectors_data = model.ops.gemm(model.ops.as_contig(V[rows]), W, trans2=True)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc_apple_ops/ops.py", line 25, in gemm
        C = blas.gemm(x, y, trans1=trans1, trans2=trans2)
      File "thinc_apple_ops/blas.pyx", line 37, in thinc_apple_ops.blas.gemm
      File "thinc_apple_ops/blas.pyx", line 53, in thinc_apple_ops.blas.gemm
    IndexError: Out of bounds on buffer access (axis 1)
    

    Info about spaCy

    • spaCy version: 3.1.3
    • Platform: macOS-11.6-arm64-arm-64bit
    • Python version: 3.9.7
    • Pipelines: en_core_web_sm (3.1.0), en_core_web_md (3.1.0)
    opened by Joozty 2
  • Can't compile thinc on Macbook Air M1

    Can't compile thinc on Macbook Air M1

    Hello, I find myself unable to compile this otherwise magnificent tool! Please help, if you can!

    I am on MacOS 12.1, Kernel Version 21.2.0, and have installed the latest Python (3.10.2)

    Here is the error message I get after trying to install with pip (apparently it can't find the Accelerate Libraries, especially Accelerate.h Header ...):

    ERROR: Command errored out with exit status 1: command: /Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10 /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/tmp0bhlw2sh cwd: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-install-wgga78t9/thinc-apple-ops_f5b38888c7a149cd9f99fd524c2bd340 Complete output (34 lines): running bdist_wheel running build running build_py creating build creating build/lib.macosx-10.9-universal2-3.10 creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/ops.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests copying thinc_apple_ops/tests/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests copying thinc_apple_ops/tests/test_gemm.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests running egg_info warning: no files found matching '.pxd' under directory 'thinc_apple_ops' warning: no files found matching '.txt' under directory 'thinc_apple_ops' writing manifest file 'thinc_apple_ops.egg-info/SOURCES.txt' copying thinc_apple_ops/blas.pyx -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/py.typed -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops running build_ext creating build/temp.macosx-10.9-universal2-3.10 creating build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c thinc_apple_ops/blas.c -o build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops/blas.o In file included from thinc_apple_ops/blas.c:706: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1960: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] #warning "Using deprecated NumPy API, disable it with "
    ^ thinc_apple_ops/blas.c:714:10: fatal error: 'Accelerate/Accelerate.h' file not found #include "Accelerate/Accelerate.h" ^~~~~~~~~~~~~~~~~~~~~~~~~ thinc_apple_ops/blas.c:714:10: note: did not find header 'Accelerate.h' in framework 'Accelerate' (loaded from '/System/Library/Frameworks') 1 warning and 1 error generated. error: command '/Library/Developer/CommandLineTools/usr/bin/clang' failed with exit code 1

    ERROR: Failed building wheel for thinc-apple-ops Failed to build thinc-apple-ops ERROR: Could not build wheels for thinc-apple-ops, which is required to install pyproject.toml-based projects

    ------------------------------------------ END---------------------------------------------------------------------------

    Any help would be greatly appreciated, thanks!

    duplicate 
    opened by amal1us 1
  • AppleOps.gemm: write in-place when `output` is given

    AppleOps.gemm: write in-place when `output` is given

    NumpyOps.gemm (with BLIS) writes the result of matrix multiplication in-place when the output argument is given. This changes AppleOps.gemm to do the same, avoiding allocation of a temporary.

    enhancement 
    opened by danieldk 0
  • Change thinc upper bound to <8.1.0

    Change thinc upper bound to <8.1.0

    thinc-apple-ops will require thinc >= 8.1.0 in the future for the CBLAS passthrough functionality. As discussed in #15, we should first do another minor thinc-apple-ops release specifically for thinc <8.1.0.

    Also bump the version to v0.0.7 to prepare for the release.

    opened by danieldk 0
  • Fix 0-size arrays

    Fix 0-size arrays

    Our bit of Cython code uses memory buffers, which apparently have a bounds-check when the size is 0 when acquiring the pointer. In contrast, in other bits of code we often acquire the buffer by casting the array.data pointer, which has no such bounds check. This led to IndexError being raised when zero shapes were passed through.

    opened by honnibal 0
  • Require thinc with ops registry

    Require thinc with ops registry

    Technically it doesn't require a currently unreleased version of thinc to run, but if people install it into an existing venv, then it's better to require the version of thinc to upgraded so that it's detected and used.

    opened by adrianeboyd 0
Releases(v0.1.3)
Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
Explosion
Rock πŸ’Ž Paper πŸ“ Scissors βœ‚οΈ Lizard 🦎 Spock πŸ––

Rock πŸ’Ž Paper πŸ“ Scissors βœ‚οΈ Lizard 🦎 Spock πŸ–– If you’ve seen The Big Bang Theory, you’ve heard of a game called β€œRock, Paper, Scissors, Lizard, Spoc

AmirHossein Mohammadi 16 Jun 19, 2022
Simple script to match riders with drivers.

theBestPooler Simple script to match riders with drivers. It's a greedy, unoptimised search, so no guarantees that it works. It just seems to work (ve

Devansh 1 Nov 22, 2021
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

Performance Ninja Class This is an online course where you can learn to find and fix low-level performance issues, for example CPU cache misses and br

Denis Bakhvalov 1.2k Dec 30, 2022
The fundamentals of Python!

The fundamentals of Python Author: Mohamed NIANG, Staff ML Scientist Presentation This repository contains notebooks on the fundamentals of Python. Th

Mohamed NIANG 1 Mar 15, 2022
Entitlement AND Hardened Runtime Check

Python3 script for macOS to recursively check /Applications and also check /usr/local/bin, /usr/bin, and /usr/sbin for binaries with problematic/interesting entitlements. Also checks for hardened run

Cedric Owens 79 Nov 16, 2022
A simple script that shows important photography times. written in python.

A simple script that shows important photography times. written in python.

John Evans 13 Oct 16, 2022
Script to use SysWhispers2 direct system calls from Cobalt Strike BOFs

SysWhispers2BOF Script to use SysWhispers2 direct system calls from Cobalt Strike BOFs. Introduction This script was initially created to fix specific

FalconForce 101 Dec 20, 2022
Machine Learning powered app to decide whether a photo is food or not.

Food Not Food dot app ( πŸ” 🚫 πŸ” ) Code for building a machine Learning powered app to decide whether a photo is of food or not. See it working live a

Daniel Bourke 48 Dec 28, 2022
Very Simple Zoom Spam Pinger!

Very Simple Zoom Spam Pinger!

Syntax. 2 Mar 05, 2022
Script to check if your Bistromatic handle everything as it should.

Bistromatic Checker Script to check if your Bistromatic handle everything as it should. The bistromatic is the project marking the end of the CPool at

Mathias 1 Dec 27, 2021
addons to the turtle package that help you drew stuff more quickly

TurtlePlus addons to the turtle package that help you drew stuff more quickly --------------

1 Nov 18, 2021
My programming language named JoLang. (Mainly created for fun)

JoLang status: not ready So this is my programming language which I decided to name 'JoLang' (inspired by Jonathan and GoLang). Features I implemented

Jonathan 14 Dec 22, 2022
Consulta cpf fds

Consulta-cpf Consulta cpf fds InstalaΓ§Γ£o: apt-get update -y

Moleey 1 Nov 24, 2021
My HA controller for veg and flower rooms

HAGrowRoom My HA controller for veg and flower rooms I will do my best to keep this updated as I change, add and improve. System heavily uses custom t

4 May 25, 2022
A joke conlang with minimal semantics

SyntaxLang Reserved Defined Words Word Function fo Terminates a noun phrase or verb phrase tu Converts an adjective block or sentence to a noun to Ter

Leo Treloar 1 Dec 07, 2021
Step by step development of a vending coffee machine project, including tkinter, sqlite3, simulation, etc.

Step by step development of a vending coffee machine project, including tkinter, sqlite3, simulation, etc.

Nikolaos Avouris 2 Dec 05, 2021
High-level bindings to the Valhalla framework.

Valhalla for Python This spin-off project simply offers improved Python bindings to the fantastic Valhalla project. Installation pip install valhalla

GISβ€Šβ€’β€ŠOPS 20 Dec 13, 2022
AlexaUsingPython - Alexa will pay attention to your order, as: Hello Alexa, play music, Hello Alexa

AlexaUsingPython - Alexa will pay attention to your order, as: Hello Alexa, play music, Hello Alexa, what's the time? Alexa will pay attention to your order, get it, and afterward do some activity as

Abubakar Sattar 10 Aug 18, 2022
A Trace Explorer for Reverse Engineers

Tenet - A Trace Explorer for Reverse Engineers Overview Tenet is an IDA Pro plugin for exploring execution traces. The goal of this plugin is to provi

1k Jan 02, 2023
This is an implementation of PEP 557, Data Classes.

This is an implementation of PEP 557, Data Classes. It is a backport for Python 3.6. Because dataclasses will be included in Python 3.7, any discussio

Eric V. Smith 561 Dec 06, 2022