PPLNN

Overview

PPLNN, short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine. It runs a variety of ONNX models and has enhanced support for OpenMMLab models.

Important Notice

PMX was renamed to OPMX on 25/04/2024.
ChatGLM1 will not be supported in OPMX.
All LLM models must be converted (or simply rename pmx_params.json to opmx_params.json) and exported again.
The old code is available at llm_v1.
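
For an already-exported model, the rename mentioned above can be scripted. The following is a minimal sketch, assuming the exported model sits in a directory that contains pmx_params.json. The function name rename_opmx_params and the example path are hypothetical and not part of PPLNN; the model still has to be exported again as the notice says.

```shell
# Hypothetical helper (not part of PPLNN): rename pmx_params.json to
# opmx_params.json inside an exported model directory.
rename_opmx_params() {
    model_dir="$1"
    old="$model_dir/pmx_params.json"
    new="$model_dir/opmx_params.json"

    # Nothing to do if the old file is missing.
    if [ ! -f "$old" ]; then
        echo "no pmx_params.json found in $model_dir" >&2
        return 1
    fi

    # Refuse to overwrite an existing OPMX parameter file.
    if [ -e "$new" ]; then
        echo "$new already exists; leaving both files untouched" >&2
        return 1
    fi

    mv "$old" "$new"
}

# Example usage with a placeholder path:
# rename_opmx_params /path/to/exported_model
```

The helper refuses to overwrite an existing opmx_params.json, so running it twice on the same directory is harmless.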

Known Issues

NCCL issue on some devices: L40S and H800 have been reported to hit illegal memory access errors in NCCL AllReduce. As a workaround, we suggest disabling the NCCL Simple protocol by setting the environment variable NCCL_PROTO=^Simple.
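
In practice the workaround is a single exported variable set before the inference process starts. A minimal sketch; the launch command itself is omitted because it depends on your setup, and the quoting is only a precaution for shells that treat ^ specially.

```shell
# Disable the NCCL "Simple" protocol for processes started from this shell.
# A leading ^ tells NCCL to exclude the listed protocol(s).
export NCCL_PROTO='^Simple'

# Show the value that child processes will inherit.
echo "NCCL_PROTO=$NCCL_PROTO"
```

Start the multi-GPU job from the same shell afterwards so that it inherits the setting.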

LLM Features

New LLM Engine (Overview)
Flash Attention
Split-k Attention (similar to Flash Decoding)
Group-query Attention
Dynamic Batching (also called Continuous Batching or In-flight Batching)
Tensor Parallelism
Graph Optimization
INT8 groupwise KV Cache (numerical accuracy is very close to FP16🚀)
INT8 per-token, per-channel Quantization (W8A8)

LLM Model Zoo

Hello, world!

Installing prerequisites:

On Debian or Ubuntu:

apt-get install build-essential cmake git python3 python3-dev

On RedHat or CentOS:

yum install gcc gcc-c++ cmake3 make git python3 python3-devel

Cloning source code:

git clone https://github.com/openppl-public/ppl.nn.git

Building from source:

cd ppl.nn
./build.sh -DPPLNN_USE_X86_64=ON -DPPLNN_ENABLE_PYTHON_API=ON

Running python demo:

PYTHONPATH=./pplnn-build/install/lib python3 ./tools/pplnn.py --use-x86 --onnx-model tests/testdata/conv.onnx
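
If you plan to run the demo several times, it can be convenient to export PYTHONPATH once instead of prefixing every command. The following is a small sketch that assumes you are in the ppl.nn source directory and that the build placed the Python module under pplnn-build/install/lib, as in the command above; the variable PPLNN_PY_LIB is a name chosen here for illustration.

```shell
# Assumed install location of the Python module, relative to the ppl.nn checkout.
PPLNN_PY_LIB="$PWD/pplnn-build/install/lib"

# Warn (without failing) if the build output is not where we expect it.
if [ ! -d "$PPLNN_PY_LIB" ]; then
    echo "warning: $PPLNN_PY_LIB not found; has the build finished?" >&2
fi

# Prepend the module directory, keeping any PYTHONPATH that was already set.
export PYTHONPATH="$PPLNN_PY_LIB${PYTHONPATH:+:$PYTHONPATH}"

# The demo can then be run without the inline prefix:
# python3 ./tools/pplnn.py --use-x86 --onnx-model tests/testdata/conv.onnx
```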

Refer to Documents for more details.

Documents

Building from Source
How to Integrate
APIs
- C++
  - Getting Started
  - API Reference
- Python
  - Getting Started
  - API Reference
Develop Guide
- Adding New Engines and Ops
- X86
  - Supported Ops and Platforms
  - Adding Ops (Chinese version)
  - Benchmark (Chinese version)
- CUDA
  - Supported Ops and Platforms
  - Adding Ops (Chinese version)
  - Benchmark (Chinese version)
- RISCV
  - Supported Ops and Platforms
  - Adding Ops (Chinese version)
  - Benchmark (Chinese version)
- ARM
  - Adding Ops (Chinese version)
  - Benchmark (Chinese version)
- LLM-CUDA
  - Overview
Models
- Converting ONNX Opset
- Generating ONNX models from OpenMMLab
Implementation Details

Contact Us

Questions, reports, and suggestions are welcome through GitHub Issues!

WeChat Official Account: OpenPPL
QQ Group: 627853444

Contributions

This project uses the Contributor Covenant as its code of conduct. Contributions of any kind are highly appreciated.

Acknowledgements

License

This project is distributed under the Apache License, Version 2.0.
