🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

Overview

thinc-apple-ops

Make spaCy and Thinc up to 8 × faster on macOS by calling into Apple's native libraries.

Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind scences is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. Matrix multiplication kernels use architecture-specific SIMD instructions for data-level parallism and can take factors such as cache sizes and intstruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have a separate matrix co-processor(s) called AMX. Since AMX is not well-documented, it is unclear how many AMX units Apple M CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit and there is empirical evidence that both performance clusters of the M1 Pro/Max have an AMX unit.

Even though AMX units use a set of undocumented instructions, the units can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement of the BLIS library that is used by Thinc. This is where the thinc-apple-ops package comes in. thinc-apple-ops extends the default Thinc ops, so that gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).

Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchark compares prediction speed of the de_core_news_lg spaCy model between the M1 with and without thinc-apple-ops. Results for an Intel Mac Mini and AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, using thinc-apple-ops improves performance by 4.3 times.

CPU BLIS thinc-apple-ops Package power (Watt)
Mac Mini (M1) 6492 27676 5
MacBook Air Core i5 2020 9790 10983 9
AMD Ryzen 5900X 22568 N/A 52

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. Using thinc-apple-ops improves training time by 3.0 times.

CPU BLIS thinc-apple-ops Package power (Watt)
Mac Mini M1 2020 3.34 10.07 5
MacBook Air Core i5 2020 3.10 3.27 10
AMD Ryzen 5900X 6.53 N/A 53
Comments
  • Pass through Accelerate sgemm/saxpy in Ops.cblas

    Pass through Accelerate sgemm/saxpy in Ops.cblas

    This can be used by e.g. the parser in spaCy 3.4 to use Accelerate's implementations.

    I am not sure how to handle this dependency-wise, since this requires Thinc 8.1, but we still want to people to be able to use thinc-apple-ops with Thinc 8.0.x and spaCy < 3.4. Do we need another minor release that sets thinc < 8.1.0?

    opened by danieldk 5
  • IndexError: Out of bounds on buffer access (axis 1)

    IndexError: Out of bounds on buffer access (axis 1)

    Hi I tried to use this awesome package and I am getting this error. Not sure what it means, maybe you guys could help me?

    I should mention that my data is quite big and I am also using some SWAP space. Could this be the reason of this error?

    [2021-09-28 21:09:01,238] [INFO] Set up nlp object from config
    [2021-09-28 21:09:01,500] [INFO] Pipeline: ['tok2vec', 'ner', 'sentencizer', 'entity_linker']
    [2021-09-28 21:09:01,505] [INFO] Created vocabulary
    [2021-09-28 21:09:01,505] [INFO] Finished initializing nlp object
    Traceback (most recent call last):
      File "/Users/joozty/Documents/kolurbo/venv/bin/spacy", line 8, in <module>
        sys.exit(setup_cli())
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/_util.py", line 69, in setup_cli
        command(prog_name=COMMAND)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
        return self.main(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1062, in main
        rv = self.invoke(ctx)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 763, in invoke
        return __callback(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
        return callback(**use_params)  # type: ignore
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/train.py", line 60, in train_cli
        nlp = init_nlp(config, use_gpu=use_gpu)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/training/initialize.py", line 84, in init_nlp
        nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/language.py", line 1272, in initialize
        proc.initialize(get_examples, nlp=self, **p_settings)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/pipeline/tok2vec.py", line 216, in initialize
        self.model.initialize(X=doc_sample)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 86, in init
        layer.initialize(X=curr_input, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 90, in init
        curr_input = layer.predict(curr_input)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 315, in predict
        return self._func(self, X, is_train=False)[0]
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in forward
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in <listcomp>
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
        return self._func(self, X, is_train=is_train)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/ml/staticvectors.py", line 46, in forward
        vectors_data = model.ops.gemm(model.ops.as_contig(V[rows]), W, trans2=True)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc_apple_ops/ops.py", line 25, in gemm
        C = blas.gemm(x, y, trans1=trans1, trans2=trans2)
      File "thinc_apple_ops/blas.pyx", line 37, in thinc_apple_ops.blas.gemm
      File "thinc_apple_ops/blas.pyx", line 53, in thinc_apple_ops.blas.gemm
    IndexError: Out of bounds on buffer access (axis 1)
    

    Info about spaCy

    • spaCy version: 3.1.3
    • Platform: macOS-11.6-arm64-arm-64bit
    • Python version: 3.9.7
    • Pipelines: en_core_web_sm (3.1.0), en_core_web_md (3.1.0)
    opened by Joozty 2
  • Can't compile thinc on Macbook Air M1

    Can't compile thinc on Macbook Air M1

    Hello, I find myself unable to compile this otherwise magnificent tool! Please help, if you can!

    I am on MacOS 12.1, Kernel Version 21.2.0, and have installed the latest Python (3.10.2)

    Here is the error message I get after trying to install with pip (apparently it can't find the Accelerate Libraries, especially Accelerate.h Header ...):

    ERROR: Command errored out with exit status 1: command: /Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10 /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/tmp0bhlw2sh cwd: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-install-wgga78t9/thinc-apple-ops_f5b38888c7a149cd9f99fd524c2bd340 Complete output (34 lines): running bdist_wheel running build running build_py creating build creating build/lib.macosx-10.9-universal2-3.10 creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/ops.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests copying thinc_apple_ops/tests/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests copying thinc_apple_ops/tests/test_gemm.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests running egg_info warning: no files found matching '.pxd' under directory 'thinc_apple_ops' warning: no files found matching '.txt' under directory 'thinc_apple_ops' writing manifest file 'thinc_apple_ops.egg-info/SOURCES.txt' copying thinc_apple_ops/blas.pyx -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/py.typed -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops running build_ext creating build/temp.macosx-10.9-universal2-3.10 creating build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c thinc_apple_ops/blas.c -o build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops/blas.o In file included from thinc_apple_ops/blas.c:706: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1960: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] #warning "Using deprecated NumPy API, disable it with "
    ^ thinc_apple_ops/blas.c:714:10: fatal error: 'Accelerate/Accelerate.h' file not found #include "Accelerate/Accelerate.h" ^~~~~~~~~~~~~~~~~~~~~~~~~ thinc_apple_ops/blas.c:714:10: note: did not find header 'Accelerate.h' in framework 'Accelerate' (loaded from '/System/Library/Frameworks') 1 warning and 1 error generated. error: command '/Library/Developer/CommandLineTools/usr/bin/clang' failed with exit code 1

    ERROR: Failed building wheel for thinc-apple-ops Failed to build thinc-apple-ops ERROR: Could not build wheels for thinc-apple-ops, which is required to install pyproject.toml-based projects

    ------------------------------------------ END---------------------------------------------------------------------------

    Any help would be greatly appreciated, thanks!

    duplicate 
    opened by amal1us 1
  • AppleOps.gemm: write in-place when `output` is given

    AppleOps.gemm: write in-place when `output` is given

    NumpyOps.gemm (with BLIS) writes the result of matrix multiplication in-place when the output argument is given. This changes AppleOps.gemm to do the same, avoiding allocation of a temporary.

    enhancement 
    opened by danieldk 0
  • Change thinc upper bound to <8.1.0

    Change thinc upper bound to <8.1.0

    thinc-apple-ops will require thinc >= 8.1.0 in the future for the CBLAS passthrough functionality. As discussed in #15, we should first do another minor thinc-apple-ops release specifically for thinc <8.1.0.

    Also bump the version to v0.0.7 to prepare for the release.

    opened by danieldk 0
  • Fix 0-size arrays

    Fix 0-size arrays

    Our bit of Cython code uses memory buffers, which apparently have a bounds-check when the size is 0 when acquiring the pointer. In contrast, in other bits of code we often acquire the buffer by casting the array.data pointer, which has no such bounds check. This led to IndexError being raised when zero shapes were passed through.

    opened by honnibal 0
  • Require thinc with ops registry

    Require thinc with ops registry

    Technically it doesn't require a currently unreleased version of thinc to run, but if people install it into an existing venv, then it's better to require the version of thinc to upgraded so that it's detected and used.

    opened by adrianeboyd 0
Releases(v0.1.3)
Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
Explosion
Create a simple program by applying the use of class

TUGAS PRAKTIKUM 8 💻 Nama : Achmad Mahfud NIM : 312110520 Kelas : TI.21.C5 Perintah : Buat program sederhana dengan mengaplikasikan pengguna

Achmad Mahfud 1 Dec 23, 2021
Tools for dos (denial-of-service) website / web server

DoS Attack Tools Tools for dos (denial-of-service) website / web server di buat olah NurvySec How to install on debian / ubuntu $ apt update $ apt ins

nurvy 1 Feb 10, 2022
Superset custom path for python

It is a common requirement to have superset running under a base url, (https://mydomain.at/analytics/ instead of https://mydomain.at/). I created the

9 Dec 14, 2022
Shopify Backend Developer Intern Challenge - Summer 2022

Shopify Backend Developer Intern The task is build an inventory tracking web application for a logistics company. The detailed task details can be fou

Meet Gandhi 11 Oct 08, 2022
Make pack up python files easier.

python-easy-pack make pack up python files easier. 目前只提供了中文环境 如何使用? 将index.py复制到你的项目文件夹,或者把.py文件拷贝到这个文件夹。 打开你的cmd或者powershell 切换到程序所在目录,输入python index

2 Dec 15, 2021
a really simple bot that send you memes from reddit to whatsapp

a really simple bot that send you memes from reddit to whatsapp want to use use it? install the dependencies with pip3 install -r requirements.txt the

pai 10 Nov 28, 2021
This python code will get requests from SET (The Stock Exchange of Thailand) a previously-close stock price and return it in Thai Baht currency using beautiful soup 4 HTML scrapper.

This python code will get requests from SET (The Stock Exchange of Thailand) a previously-close stock price and return it in Thai Baht currency using beautiful soup 4 HTML scrapper.

Andre 1 Oct 24, 2022
Demodulate and error correct FIS-B and ADS-B signals on 978 MHz.

FIS-B 978 ('fisb-978') is a set of programs that demodulates and error corrects FIS-B (Flight Information System - Broadcast) and ADS-B (Automatic Dep

2 Nov 15, 2022
A place where one-off ideas/partial projects can live comfortably

A place to post ideas, partial projects, or anything else that doesn't necessarily warrant its own repo, from my mind to the web.

Carson Scott 2 Feb 25, 2022
Simply create JIRA releases based on your github releases

Simply create JIRA releases based on your github releases

8 Jun 17, 2022
A pairs trade is a market neutral trading strategy enabling traders to profit from virtually any market conditions.

A pairs trade is a market neutral trading strategy enabling traders to profit from virtually any market conditions. This strategy is categorized as a statistical arbitrage and convergence trading str

Kanupriya Anand 13 Nov 27, 2022
Painel simples com consulta de cep,CNPJ,placa e ip

Painel mpm Um painel simples com consultas de IP, CNPJ, CEP e PLACA Início 🌐 apt update && apt upgrade -y pkg i python git pip install requests Insta

8 Feb 27, 2022
A corona information module

A corona information module

Fayas Noushad 3 Nov 28, 2021
Automatic certificate unpinning for Android apps

What is this? Script used to perform automatic certificate unpinning of an APK by adding a custom network security configuration that permits user-add

Antoine Neuenschwander 5 Jul 28, 2021
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

Olga Tsiouri 1 Jan 23, 2022
AdventOfCode 2021 solutions from the Devcord server

adventofcode-21 Ein Sammel-Repository für Advent of Code 2021-Lösungen der deutschen DevCord-Community. A repository collecting Advent of Code 2021 so

Devcord 12 Aug 26, 2022
monster hunter world randomizer project

mhw_randomizer monster hunter world randomizer project Settings are in rando_config.py Current script for attack randomization is n mytest.py There ar

2 Jan 24, 2022
A multi-platform fuzzer for poking at userland binaries and servers

litefuzz A multi-platform fuzzer for poking at userland binaries and servers litefuzz intro why how it works what it does what it doesn't do support p

52 Nov 18, 2022
Kellogg bad | Union good | Support strike funds

KelloggBot Credit to SeanDaBlack for the basis of the script. req.py is selenium python bot. sc.js is a the base of the ios shortcut [COMING SOON] Set

407 Nov 17, 2022
Apilytics for Python - Easy API analytics for Python backends

apilytics-python Installation Sign up and get your API key from https://apilytics.io - we offer a completely free trial with no credit card required!

Apilytics 6 Sep 29, 2022