PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Overview

This repository contains a PyTorch implementation of Poincaré embeddings (Nickel & Kiela, 2017), which learn hierarchical representations of symbolic data by embedding them into hyperbolic space, i.e. the Poincaré ball.

[Figure: wn-nouns.jpg, a two-dimensional Poincaré embedding of the WordNet noun hierarchy]

Installation

Clone this repository and set up the environment via

git clone https://github.com/facebookresearch/poincare-embeddings.git
cd poincare-embeddings
conda env create -f environment.yml
source activate poincare
python setup.py build_ext --inplace

Example: Embedding WordNet Mammals

To embed the transitive closure of the WordNet mammals subtree, first generate the data via

cd wordnet
python transitive_closure.py

This will generate the transitive closure of the full noun hierarchy as well as of the mammals subtree of WordNet.
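
The closure script walks the WordNet noun hierarchy and pairs each synset with all of its ancestors. As a rough illustration of that computation, here is a minimal sketch using NLTK's WordNet interface; the edge orientation and output format of the actual script may differ:

from nltk.corpus import wordnet as wn

root = wn.synset('mammal.n.01')
edges = set()
for syn in wn.all_synsets('n'):
    # hypernym_paths() lists every path from the top of WordNet down to syn
    for path in syn.hypernym_paths():
        if root in path:
            # pair syn with every ancestor from mammal.n.01 downwards
            for ancestor in path[path.index(root):]:
                if ancestor != syn:
                    edges.add((syn.name(), ancestor.name()))
print(len(edges), 'closure edges')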

To embed the mammals subtree in the reconstruction setting (i.e., without missing data), go to the root directory of the project and run

./train-mammals.sh

This shell script includes the appropriate parameter settings for the mammals subtree and saves the trained model as mammals.pth.
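
The saved checkpoint can then be inspected directly with torch.load. The snippet below is a hedged sketch: the key names ('embeddings', 'objects') are assumptions, so print the checkpoint's keys to see what your version of the code actually stores.

import torch

checkpoint = torch.load('mammals.pth', map_location='cpu')
print(checkpoint.keys())
# assumed keys: a (num_objects x dim) tensor and the index-to-name mapping
embeddings = checkpoint['embeddings']
objects = checkpoint['objects']
print(embeddings.shape, objects[:5])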

A similar script to learn embeddings of the entire noun hierarchy is located at train-nouns.sh. This script contains the hyperparameter settings to reproduce the results for 10-dimensional embeddings of (Nickel & Kiela, 2017). The hyperparameter settings to reproduce the MAP results are provided as comments in the script.

The embeddings are trained via multithreaded async SGD. In the example above, the number of threads is set to a conservative value (NTHREADS=2), which should run well even on smaller machines. On machines with many cores, increase NTHREADS for faster convergence, e.g. NTHREADS=8 ./train-mammals.sh.

Dependencies

  • Python 3 with NumPy
  • PyTorch
  • Scikit-Learn
  • NLTK (to generate the WordNet data)

References

If you find this code useful for your research, please cite the following paper in your publication:

@incollection{nickel2017poincare,
  title = {Poincar\'{e} Embeddings for Learning Hierarchical Representations},
  author = {Nickel, Maximilian and Kiela, Douwe},
  booktitle = {Advances in Neural Information Processing Systems 30},
  editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
  pages = {6341--6350},
  year = {2017},
  publisher = {Curran Associates, Inc.},
  url = {http://papers.nips.cc/paper/7213-poincare-embeddings-for-learning-hierarchical-representations.pdf}
}

License

This code is licensed under CC-BY-NC 4.0.

Comments
  • Question about evaluation: mean rank and mAP

    Hi, I am new to this task and want to know the relation between mean rank and mAP. I am reproducing the results of the link prediction task and trained a TransE model as well as a Poincaré model. When I evaluated those two models, I found that TransE may get a higher mean rank and a higher MAP, but Poincaré may get a lower mean rank and a lower MAP. Should it always be lower mean rank and higher MAP? Are there some differences between those two ways of evaluation? Or maybe there is something wrong with my evaluation code :(
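
    In general, a better ranking should give both a lower mean rank and a higher MAP, but the two can disagree: MAP is dominated by how well the top of each ranking looks, while mean rank is sensitive to a few badly ranked tail items. For reference, a minimal sketch of how both metrics are commonly computed for one query node (our own illustration, not this repository's evaluation code):

    import numpy as np

    def rank_metrics(dists, positives):
        # dists: distances from the query to all candidates (lower = better)
        # positives: indices of the true neighbors of the query
        order = np.argsort(dists)
        ranks = np.empty_like(order)
        ranks[order] = np.arange(1, len(dists) + 1)
        pos_ranks = np.sort(ranks[list(positives)])
        mean_rank = pos_ranks.mean()
        # average precision = mean of precision@k at each true neighbor's rank k
        avg_prec = np.mean([(i + 1) / r for i, r in enumerate(pos_ranks)])
        return mean_rank, avg_prec

    print(rank_metrics(np.array([0.1, 0.9, 0.3, 0.7, 0.5]), {0, 2}))  # -> (1.5, 1.0)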

    opened by xxkkrr 10
  • KeyError: 'Traceback

    [email protected]:~/ub16_prj/poincare-embeddings$ NTHREADS=2 ./train-nouns.sh
    Using 2 threads
    slurp: objects=82115, edges=743086
    Indexing data
    json_conf: {"distfn": "poincare", "dim": 10, "lr": 1, "batchsize": 50, "negs": 50}
    Burnin: lr=0.01
    Traceback (most recent call last):
      File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 19, in train_mp
        train(model, data, optimizer, opt, log, rank, queue)
      File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 46, in train
        for inputs, targets in loader:
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 286, in __next__
        return self._process_next_batch(batch)
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    KeyError: 'Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/mldl/ub16_prj/poincare-embeddings/model.py", line 185, in __getitem__
        if n not in self._weights[t]:
    KeyError: tensor(23511)'

    (the same KeyError and traceback are printed once more by the second training thread)

    opened by loveJasmine 8
  • “train-nouns.py” stops

    Hi, I am wondering whether it is normal for “train-nouns.py” to stall for a relatively long time after many epochs. I have run it for nearly 1 day and it does not produce any output now.

    opened by ShaoTengLiu 6
  • MAP around 0.4 after 300 epochs, far less than results in your paper

    Hi, I just trained your code on PyTorch 0.4.0, on the mammals dataset, with your default hyperparameters, and I only get an embedding with a MAP of around 0.4 after 300 epochs, which is far less than the 0.927 reported in your paper. BTW, I only added 6 lines of code of the form t = int(t) because of the PyTorch version; the other code remains the same. I am wondering how this could happen?

    opened by ydtydr 6
  • Question on using these embeddings in practice

    I wrote this question on stack overflow here... this is not so much an issue with the repo, as it is a question of how to implement these embeddings in practice.

    My assumption for how to implement these is to take a sentence, like “The US and UK could agree a ‘phenomenal’ trade deal after Britain leaves the EU.”, and tokenize it into a list of synsets [[], Synset('united_states.n.01'), [], Synset('united_kingdom.n.01'), ... ], but in order to do that, one needs each word's unique representative synset node of the WordNet, based on the context in which that word lives.

    This seems like a pretty difficult aspect of using these embeddings, and I'm wondering what strategies there are to solve this, what literature is available, whether this is totally not how they're implemented in practice, or whether there are open-source projects which can take both the word and the sentence context into account to map a sentence to a list of "optimal" synsets. This seems like a critical aspect of using these embeddings (and wordnets in general), but it is not frequently discussed.

    BTW many thanks for open-sourcing this, I find this technology really fascinating.
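
    For what it's worth, NLTK ships a simple baseline for exactly this word-to-synset step: the Lesk algorithm, which picks the synset whose dictionary gloss overlaps most with the sentence context. A minimal sketch (Lesk is a weak baseline; dedicated word-sense-disambiguation or entity-linking models do much better):

    # requires: nltk.download('punkt'); nltk.download('wordnet')
    from nltk.tokenize import word_tokenize
    from nltk.wsd import lesk

    sentence = "The US and UK could agree a phenomenal trade deal after Britain leaves the EU."
    tokens = word_tokenize(sentence)
    for word in ("US", "UK", "deal"):
        print(word, '->', lesk(tokens, word))  # a Synset, or None if no match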

    opened by erjenkins29 5
  • ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'

    Thank you for sharing this great code. However, I encountered a ValueError when trying to reproduce train-mammals.sh. The error information is as follows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>sh train-mammals.sh
    Specified hogwild training with GPU, defaulting to CPU...
    Using edge list dataloader
    Traceback (most recent call last):
      File "embed.py", line 246, in <module>
        main()
      File "embed.py", line 147, in main
        manifold, opt, idx, objects, weights, sparse=opt.sparse
      File "C:\Users\DELL\Github\poincare-embeddings-master\hype\sn.py", line 64, in initialize
        opt.ndproc, opt.burnin > 0, opt.dampening)
      File "hype\graph_dataset.pyx", line 75, in hype.graph_dataset.BatchedDataset.__cinit__
        self._mk_weights(idx, weights)
      File "hype\graph_dataset.pyx", line 81, in hype.graph_dataset.BatchedDataset._mk_weights
        def _mk_weights(self, npc.ndarray[npc.long_t, ndim=2] idx, npc.ndarray[npc.double_t, ndim=1] weights):
    ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'
    

    I am using PyTorch 1.0 with CUDA 10.0 on Windows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python -c "import torch; print(torch.version.cuda)"
    10.0
    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
    Cuda compilation tools, release 10.0, V10.0.130
    

    I found that the problem comes from the earlier step python setup.py build_ext --inplace. When I run python setup.py build_ext --inplace, it shows the following information:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python setup.py build_ext --inplace
    Compiling hype/graph_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    Compiling hype/adjacency_matrix_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    [1/2] Cythonizing hype/adjacency_matrix_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    [2/2] Cythonizing hype/graph_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    running build_ext
    building 'hype.graph_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/graph_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/graph_dataset.obj -std=c++11
    cl : command line warning D9002 : ignoring unknown option '-std=c++11'
    graph_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/graph_dataset.cpp(3033): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
    hype/graph_dataset.cpp(3418): warning C4244: '=': conversion from '__pyx_t_5numpy_long_t' to 'long', possible loss of data
    hype/graph_dataset.cpp(3429): warning C4244: '=': conversion from '__pyx_t_5numpy_long_t' to 'long', possible loss of data
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_graph_dataset build\temp.win-amd64-3.6\Release\hype/graph_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib
      Creating library build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.exp
    Generating code
    Finished generating code
    building 'hype.adjacency_matrix_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/adjacency_matrix_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj -std=c++11
    cl : command line warning D9002 : ignoring unknown option '-std=c++11'
    adjacency_matrix_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/adjacency_matrix_dataset.cpp(3144): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
    hype/adjacency_matrix_dataset.cpp(5484): warning C4244: '=': conversion from 'Py_ssize_t' to 'long', possible loss of data
    hype/adjacency_matrix_dataset.cpp(5595): warning C4244: '=': conversion from 'Py_ssize_t' to 'long', possible loss of data
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_adjacency_matrix_dataset build\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib
      Creating library build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.exp
    Generating code
    Finished generating code
    

    Your suggestions are welcome. Thanks.

    opened by chengjun 5
  • [Question] Predicting the parent of an unseen word in an existing hierarchy

    Hello there,

    I saw this question and I am looking for something similar.

    But, in my case, let's say I have a list like:

    parent | child
    -- | --
    animal | carnivore
    carnivore | dog
    dog | rottweiler
    dog | huskey
    dog | pug

    From this list I can train a Poincaré model and get Poincaré embeddings for every word/node in this hierarchy. Then, let's say I receive a new word: "dogo argentino". What I want at this point is a way to infer automatically that "dogo argentino" is the child of "dog". Can Poincaré embeddings be of help here? Ignoring Poincaré embeddings altogether, one thought would be to get fastText embeddings for all the words, find the word that is closest (maybe in terms of cosine similarity) to the fastText embedding for "dogo argentino", and then assign the parent of that closest word to "dogo argentino", as in the sketch below. Any thoughts would be greatly appreciated. Thank you!
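
    As a concrete sketch of that nearest-neighbour fallback (the vectors below are random stand-ins for real fastText embeddings, and the vocabulary is the toy hierarchy above):

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = {w: rng.normal(size=300) for w in ("rottweiler", "huskey", "pug", "carnivore")}
    parent = {"rottweiler": "dog", "huskey": "dog", "pug": "dog", "carnivore": "animal"}

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    new_vec = rng.normal(size=300)  # stand-in for fastText("dogo argentino")
    nearest = max(vocab, key=lambda w: cosine(vocab[w], new_vec))
    print("predicted parent:", parent[nearest])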

    opened by stelmath 4
  • modifies shell script to use correct python version

    Going through the code instructions on Mac OSX gave the following error:

    ModuleNotFoundError: No module named 'torch'

    Which was resolved by modifying python3 to python.

    CLA Signed 
    opened by denmonz 4
  • Embedding of Tree Structures: root not in the center of the disk

    Hi, I'm dealing with the embedding of hierarchical structures (a tree) and I used your algorithm. It works, but when I visualise the 2D or 3D embedding, the root of the tree is not in the center of the disk. Is there some command or flag to add to the code? Or is the root supposed to end up in the center on its own?

    opened by federicodassereto 4
  • Lack of hypernymysuite

    Hi, I tried to use this tool following the README file, but the latest version does not include the hypernymysuite module. My system returns:

    "from hypernymysuite.base import HypernymySuiteModel ModuleNotFoundError: No module named 'hypernymysuite'"

    opened by QingkaiZeng 3
  • Problem of Poincare Distance and Norm

    Hi, I trained a 100-dimensional model with train-nouns.sh in the Poincaré manifold. I tested the model by calculating the tensor.norm of some nodes and the poincare.distance between some nodes.

    However, for Gauss, Brain, Blue and many other nodes near the boundary, their norms become 1.0000, which means infinity in the Poincaré ball. In addition, if I calculate the Poincaré distance between these near-boundary nodes and other nodes, the distance is always 12.2061, which is a little confusing.

    Thanks for your attention!
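
    For context, the Poincaré distance is d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2))), which diverges as either norm approaches 1. Norms printed as 1.0000 are almost certainly values clipped just inside the boundary for numerical stability, and a nearly constant distance of 12.2061 is consistent with the arcosh argument being dominated by that clipping. A small self-contained illustration (the clipping value 1e-5 is our assumption, not necessarily this repository's):

    import torch

    def poincare_dist(u, v):
        squnorm = u.pow(2).sum(-1)
        sqvnorm = v.pow(2).sum(-1)
        sqdist = (u - v).pow(2).sum(-1)
        return torch.acosh(1 + 2 * sqdist / ((1 - squnorm) * (1 - sqvnorm)))

    eps = 1e-5
    u = torch.tensor([1 - eps, 0.0])     # a point clipped to the boundary
    v = torch.tensor([0.5, 0.0])
    print(u.norm())                      # prints 1.0000 at default precision
    print(poincare_dist(u, v))           # large and nearly constant in eps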

    opened by ShaoTengLiu 3
  • Large dataset that needs continuous training

    Hi there, I have a quite large dataset and a GPU allocation that does not exceed 24 hours. In this case, I need to restore from a checkpoint and continue training the old models. The current situation is that each time I try to do so, it is reported that the dimension of the parameters is not correct: originally I trained a 2-dimensional embedding, but now the dimension of the parameter is 21. Any thoughts on this? Any advice?

    opened by lkcao 0
  • how was mammal_closure.csv created

    Hello, apologies in advance if this is a silly question. I was just looking at mammal_closure.csv because I want to do something similar with my own data and run it like train-mammals.sh. I was wondering, what do the numbers in the file indicate? Such as vixen.n.02, mastiff.n.01: what are the n.01 and n.02? Thanks!
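
    For reference, names like vixen.n.02 are WordNet synset identifiers in lemma.pos.sense-number format: "n" marks a noun, and "02" means the second noun sense of "vixen". NLTK resolves them directly:

    from nltk.corpus import wordnet as wn

    print(wn.synset('vixen.n.02').definition())  # the second noun sense of "vixen"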

    opened by jayachaturvedi 1
  • What should we do after training?

    Hello, I am Tianyu. I just tried to use my data to train a new model using embed.py. However, after I get the trained model, what is the next step to extract the embeddings? I don't find clear steps for this in the documentation. Thanks a lot.

    opened by HelloWorldLTY 4
  • In Euclidean space, the distance is incorrectly squared

    In the formula angle_at_u(...), dist is used squared in the numerator and unsquared in the denominator. Thus, the formula expects distance(...) to return a true norm, not a squared norm.

    [code screenshots omitted]

    You can notice the inconsistency if you put norm(...) and distance(...) next to each other:

    [code screenshot omitted]
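
    For reference, the cone angle at u being computed here follows from the law of cosines; in our notation (which may not match the repository's exactly):

    \cos\bigl(\Xi(u, v)\bigr) = \frac{\lVert v \rVert^{2} - \lVert u \rVert^{2} - \lVert u - v \rVert^{2}}{2 \,\lVert u \rVert \,\lVert u - v \rVert}

    The squared distance appears in the numerator and the unsquared distance in the denominator, which is why distance(...) must return a true norm here.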

    CLA Signed 
    opened by FremyCompany 4
  • Entailment cones compute the wrong angle?

    Hi,

    I could be wrong, but I have the impression the entailment cones energy function is wrong, and optimizes the inverted order relationship between concepts.

    If I understood correctly...:

    • In our data files, id2 should be the generic concept, and id1 the more specific one.
    • After training, norm(embedding(id2)) should usually be lower than norm(embedding(id1)), as cones have more space near the boundary, so generic concepts should be close to the origin and specific concepts should be far from it.

    However, after training, I observe the reverse to be true:

    • Generic concepts are as far as possible from the center.
    • In their cones, one can find even-more-generic concepts, and no more specific concepts.

    I'm either doing something very wrong, or have incorrect assumptions, or the energy function is reversed compared to what it should be according to the entailment cones article, no?
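
    For reference, the entailment cones energy of Ganea et al. (2018) penalizes a supposed child v lying outside the cone anchored at u:

    E(u, v) = \max\bigl(0,\; \Xi(u, v) - \psi(u)\bigr)

    where \Xi(u, v) is the angle of v as seen from u and \psi(u) is the half-aperture of the cone at u. If u and v (i.e., id1 and id2) were swapped when this energy is computed, the learned order relation would come out inverted, exactly as described above.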

    opened by FremyCompany 1