An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Last update: Dec 27, 2022

Overview

Ultra_Fast_Lane_Detection_TensorRT

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)
这是一个基于TensorRT加速UFLD的repo，包含PyThon ONNX Parser以及C++ TensorRT API版本, 还包括Torch2TRT版本, 对源码和论文感兴趣的请参见：https://github.com/cfzd/Ultra-Fast-Lane-Detection

一. PyThon ONNX Parser

1. How to run

1) pip install -r requirements.txt

2) TensorRT7.x wil be fine, and other version may got some errors

2) For PyTorch, you can also try another version like 1.6, 1.5 or 1.4

2. Build ONNX（将训练好的pth/pt模型转换为onnx）

1) static（生成静态onnx模型）:
python3 torch2onnx.py onnx_dynamic_int8/configs/tusimple_4.py --test_model ./tusimple_18.pth 

2) dynamic（生成支持动态输入的onnx模型）:
First: vim torch2onnx.py
second: change "fix" from "True" to "False"
python3 torch2onnx.py onnx_dynamic_int8/configs/tusimple_4.py --test_model ./tusimple_18.pth

3. Build trt engine（将onnx模型转换为TensorRT的推理引擎）

We support many different types of engine export, such as static fp32, fp16, dynamic fp32, fp16, and int8 quantization
我们支持多种不同类型engine的导出，例如：静态fp32、fp16，动态fp32、fp16，以及int8的量化

static(fp32, fp16): 对于静态模型的导出，终端输入：

fp32:
python3 build_engine.py --onnx_path model_static.onnx --mode fp32<br/>
fp16:
python3 build_engine.py --onnx_path model_static.onnx --mode fp16<br/>

dynamic(fp32, fp16): 对于动态模型的导出，终端输入：

fp32:
python3 build_engine.py --onnx_path model_dynamic.onnx --mode fp32 --dynamic
fp16:
python3 build_engine.py --onnx_path model_dynamic.onnx --mode fp16 --dynamic

int8 quantization 如果想使用int8量化，终端输入：

python3 build_engine.py --onnx_path model_static.onnx --mode int8 --int8_data_path data/testset1000
# （int8_data_Path represents the calibration dataset）
# （其中int8_data_path表示校正数据集)

4. evaluate(compare)

（If you want to compare the acceleration and accuracy of reasoning through TRT with using pytorch, you can run the script）
（如果您想要比较通过TRT推理后，相对于使用PyTorch的加速以及精确度情况，可以运行该脚本）

python3 evaluate.py --pth_path PATH_OF_PTH_MODEL --trt_path PATH_OF_TRT_MODEL

二. torch2trt

torch2trt is an easy tool to convert pytorch model to tensorrt, you can check model details here:
https://github.com/NVIDIA-AI-IOT/torch2trt
（torch2trt 是一个易于使用的PyTorch到TensorRT转换器）

How to run

1) git clone https://github.com/NVIDIA-AI-IOT/torch2trt

2) python setup.py install

2) PyTorch >= 1.6 (other versions may got some errors)

生成trt模型

python3 export_trt.py

torch2trt 预测demo (可视化)

python3 demo_torch2trt.py --trt_path PATH_OF_TRT_MODEL --data_path PATH_OF_YOUR_IMG

evaluated

python3 evaluate.py --pth_path PATH_OF_PTH_MODEL --trt_path PATH_OF_TRT_MODEL --data_path PATH_OF_YOUR_IMG --torch2trt

三. C++ TensorRT API

生成权重文件

python3 export_trtcy.py

trt模型生成

修改第十行为 #define USE_FP32,则为FP32模式, 修改第十行为 #define USE_FP16,则为FP16模式

mkdir build
cd build
cmake ..
make
./lane_det -transfer             //  'lane_det.engine'

Tensorrt预测

./lane_det -infer  ../imgs

四. trtexec

test tensorrt_dynamic_model on terminal, for instance, for batch_size=BATCH_SIZE, just run:

trtexec  --explicitBatch --minShapes=1x3x288x800 --optShapes=1x3x288x800 --maxShapes=32x3x288x800 --shapes=BATCH_SIZEx3x288x800 --loadEngine=lane_fp32_dynamic.trt --noDataTransfers --dumpProfile --separateProfileRun

You might also like...

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

This repository is a modification of: https://github.com/openai/gpt-2 for our sp

1 Jan 6, 2022

Arabic-Phonetic-Output - You can input the phonetic version of any Arabic text here. This software will show you output in Arabic (with vowels)

Arabic-Phonetic-Output You can input the phonetic version of any Arabic text her

1 Dec 30, 2021

One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

One Stop Anomaly Shop (OSAS) Quick start guide Step 1: Get/build the docker image Option 1: Use precompiled image (might not reflect latest changes):

148 Dec 26, 2022

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.

Colibri Core by Maarten van Gompel, [email protected], Radboud University Nijmegen Licensed under GPLv3 (See http://www.gnu.org/licenses/gpl-3.0.html

122 Nov 17, 2022

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

60 Dec 31, 2022

AIDynamicTextReader - A simple dynamic text reader based on Artificial intelligence

AI Dynamic Text Reader: This is a simple dynamic text reader based on Artificial

1 Jan 18, 2022

A fast Text-to-Speech (TTS) model. Work well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far). 快速语音合成模型，适用于英语、普通话/中文、日语、韩语、俄语和藏语（当前已测试）。

简体中文 | English 并行语音合成 [TOC] 新进展 2021/04/20 合并 wavegan 分支到 main 主分支，删除 wavegan 分支！ 2021/04/13 创建 encoder 分支用于开发语音风格迁移模块！ 2021/04/13 softdtw 分支支持使用 Sof

161 Dec 19, 2022

Simple and efficient RevNet-Library with DeepSpeed support

RevLib Simple and efficient RevNet-Library with DeepSpeed support Features Half the constant memory usage and faster than RevNet libraries Less memory

112 Dec 5, 2022

A high-level yet extensible library for fast language model tuning via automatic prompt search

ruPrompts ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with Hugg

37 Dec 7, 2022

Comments

bug in UFLD_C++/main.cpp

in function softmax_mul() : exp() don't substruct channel's (100) largest value; int funcion argmax(): "int max" should change to "float max".

opened by tangjianping54 0
请问怎么用CULane数据集训练的权重来推理

我使用UFLD_C++来进行推理，修改了export_trtcy.py中的model = parsingNet(pretrained=False, backbone='18', cls_dim=(101, 56, 4), use_aux=False).cuda()，改为model = parsingNet(pretrained=False, backbone='18', cls_dim=(201, 18, 4), use_aux=False).cuda()，并且把OUTPUT_C改成201，把OUTPUT_H改成18，把OUTPUT_W改为4. 然后运行./lane_det -transfer的时候抛出了下面的错误： ./lane_det -transfer Loading weights: ../lane_culane.trtcy Platform supports fp16 mode and use it !!! Building engine, please wait for a while... [08/29/2022-11:29:31] [E] [TRT] (Unnamed Layer* 73) [Constant]: constant weights has count 29638656 but 46333952 was expected [08/29/2022-11:29:31] [E] [TRT] Could not compute dimensions for (Unnamed Layer* 73) [Constant]_output, because the network is not valid. [08/29/2022-11:29:31] [E] [TRT] Network validation failed. Build engine successfully! lane_det: /home/juche/Desktop/lmf_workspace/Ultra_Fast_Lane_Detection_TensorRT/UFLD_C++/UFLD/UFLD_net.cpp:138: void UFLD_net::APIToModel(nvinfer1::IHostMemory**): Assertion `engine != nullptr' failed. Aborted (core dumped)

请问我该怎么办？

opened by limengfei3675 1
Unpickling issue with torch2trt

I converted the tusimple_18.pth weight from the original UFLD repo using torch2onnx.py and build_engine.py scripts to a trt file. Running evaluate.py shows Inference time with PyTorch = 141.777 ms and Inference time with TensorRT_static = 27.395 ms in fp16. However, running UFLD_torch2trt/demo_torch2trt.py returns this error: Traceback (most recent call last): File "UFLD_torch2trt/demo_torch2trt.py", line 96, in <module> demo_with_torch2trt(trt_path, data_path) File "UFLD_torch2trt/demo_torch2trt.py", line 31, in demo_with_torch2trt model_trt.load_state_dict(torch.load(trt_file_path)) File "/home/nam/.local/lib/python3.6/site-packages/torch/serialization.py", line 593, in load return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args) File "/home/nam/.local/lib/python3.6/site-packages/torch/serialization.py", line 762, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: unpickling stack underflow It appears the issue mostly comes from loading old torchvision models, I tried to delete torch caches but it didnt work. I tried for both static and dynamic model but the result is the same. :(

opened by namKolorfuL 0
Issue with demo_trt.py

Hi, I downloaded tusimple_18.pth weight from the original UFLD repo and converted it to trt using your scipts in UFLD_Tiny. However, when doing inference with demo_trt.py, i got this error:

[email protected]:~/Desktop/Ultra_Fast_Lane_Detection_TensorRT$ python3 UFLD_Tiny/demo_trt.py --model ./model_static_fp16 Loading TRT file from path ./model_static_fp16.trt... [array([-0.2890625 , -1. , -1.4892578 , ..., 2.9804688 , 0.18823242, 9.140625 ], dtype=float32)] Traceback (most recent call last): File "UFLD_Tiny/demo_trt.py", line 123, in <module> main() File "UFLD_Tiny/demo_trt.py", line 93, in main out_j = trt_outputs[0].reshape(97, 56, 4) # tiny版本不一样 ValueError: cannot reshape array of size 22624 into shape (97,56,4) The output looks like a 1-D array. Any idea how to solve this? My system: Jetson TX2, Jetpack 4.5.1, Ubuntu 18.04, CUDA 10.2, Tensorrt 7.1.3

opened by namKolorfuL 0

Releases(TRT2021)

TRT2021(May 1, 2021)

Source code(tar.gz)
Source code(zip)

Owner

steven.yan

Algorithm engineer

GitHub Repository

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Related tags

Overview

Ultra_Fast_Lane_Detection_TensorRT

一. PyThon ONNX Parser

1. How to run

2. Build ONNX（将训练好的pth/pt模型转换为onnx）

3. Build trt engine（将onnx模型转换为TensorRT的推理引擎）

static(fp32, fp16): 对于静态模型的导出，终端输入：

dynamic(fp32, fp16): 对于动态模型的导出，终端输入：

int8 quantization 如果想使用int8量化，终端输入：

4. evaluate(compare)

二. torch2trt

How to run

生成trt模型

torch2trt 预测demo (可视化)

evaluated

三. C++ TensorRT API

生成权重文件

trt模型生成

修改第十行为 #define USE_FP32,则为FP32模式, 修改第十行为 #define USE_FP16,则为FP16模式

Tensorrt预测

四. trtexec

test tensorrt_dynamic_model on terminal, for instance, for batch_size=BATCH_SIZE, just run:

You might also like...

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

Arabic-Phonetic-Output - You can input the phonetic version of any Arabic text here. This software will show you output in Arabic (with vowels)

One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

AIDynamicTextReader - A simple dynamic text reader based on Artificial intelligence

A fast Text-to-Speech (TTS) model. Work well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far). 快速语音合成模型，适用于英语、普通话/中文、日语、韩语、俄语和藏语（当前已测试）。

Simple and efficient RevNet-Library with DeepSpeed support

A high-level yet extensible library for fast language model tuning via automatic prompt search

Comments

bug in UFLD_C++/main.cpp

请问怎么用CULane数据集训练的权重来推理

Unpickling issue with torch2trt

Issue with demo_trt.py

Releases(TRT2021)

TRT2021(May 1, 2021)

Owner

steven.yan

Sequence-to-Sequence Framework in PyTorch

A complete NLP guideline for enthusiasts

fastai ulmfit - Pretraining the Language Model, Fine-Tuning and training a Classifier

NLP, Machine learning

COVID-19 Related NLP Papers

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Fake news detector filters - Smart filter project allow to classify the quality of information and web pages

Code examples for my Write Better Python Code series on YouTube.

A 10000+ hours dataset for Chinese speech recognition

KoBERTopic은 BERTopic을 한국어 데이터에 적용할 수 있도록 토크나이저와 BERT를 수정한 코드입니다.

A paper list for aspect based sentiment analysis.

A python package for deep multilingual punctuation prediction.

Deep learning for NLP crash course at ABBYY.

Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Sequence-to-Sequence learning using PyTorch

基于百度的语音识别，用python实现，pyaudio+pyqt

Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration

Simple translation demo showcasing our headliner package.

Code-autocomplete, a code completion plugin for Python

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.