Skip to content

zhangbo2008/bpe_algorithm_can_finetune_tokenizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

"# bpe_algorithm_can_finetune_tokenizer"

this is an implyment for huggingface/transformers#15153

I just add tens of lines of code into the py_bpe algorithm. function finetune_tokenizer is main function added.

Details can be see in example.py , actuctally it is very simple. the official python library tokenizer is written is rust. I am learning hoping to give a rust version of this code.

ps: the_factor_of_new_added_token_divided_unk_number is the only param you should set. hoping can find a auto algorithm to set it.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages