PyTorch implementation of the NIPS 2017 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Overview

PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

[Figure: wn-nouns.jpg — Poincaré embedding of the WordNet noun hierarchy]

Installation

Simply clone this repository via

git clone https://github.com/facebookresearch/poincare-embeddings.git
cd poincare-embeddings
conda env create -f environment.yml
source activate poincare
python setup.py build_ext --inplace

Example: Embedding WordNet Mammals

To embed the transitive closure of the WordNet mammals subtree, first generate the data via

cd wordnet
python transitive_closure.py

This will generate the transitive closure of the full noun hierarchy as well as of the mammals subtree of WordNet.
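For intuition, the following is a rough sketch (not the repository script) of how such a closure can be generated with NLTK. The output file name and column layout here are illustrative; check the files that transitive_closure.py actually writes before substituting your own data.

# Illustrative sketch only: emit (child, parent) pairs for the transitive
# closure of the WordNet mammal subtree using NLTK's WordNet interface.
from nltk.corpus import wordnet as wn

root = wn.synset('mammal.n.01')

# all noun synsets that have mammal.n.01 among their transitive hypernyms
mammals = {s for s in wn.all_synsets('n')
           if root in s.closure(lambda x: x.hypernyms())}

edges = set()
for child in mammals:
    # connect each synset to every ancestor that is itself in the mammal subtree
    for parent in child.closure(lambda x: x.hypernyms()):
        if parent == root or parent in mammals:
            edges.add((child.name(), parent.name()))

with open('mammal_closure_sketch.csv', 'w') as f:
    f.write('id1,id2\n')
    for c, p in sorted(edges):
        f.write(f'{c},{p}\n')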

To embed the mammals subtree in the reconstruction setting (i.e., without missing data), go to the root directory of the project and run

./train-mammals.sh

This shell script includes the appropriate parameter settings for the mammals subtree and saves the trained model as mammals.pth.
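After training, the checkpoint can be inspected directly from Python. The snippet below is a minimal sketch rather than part of the repository: the checkpoint key names ('embeddings', 'objects') are assumptions, so print the loaded dictionary's keys to see what your version actually saves. The distance function is the Poincaré distance from the paper (torch.acosh requires a reasonably recent PyTorch).

# Sketch: inspect a trained checkpoint and compute Poincaré distances.
# The key names below are assumptions -- check chkpnt.keys() first.
import torch

chkpnt = torch.load('mammals.pth', map_location='cpu')
emb = chkpnt['embeddings']        # assumed: (num_objects, dim) tensor
objects = chkpnt['objects']       # assumed: list of synset names

def poincare_distance(u, v, eps=1e-5):
    # d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2) (1 - ||v||^2)))
    sq = ((u - v) ** 2).sum(-1)
    alpha = (1 - (u ** 2).sum(-1)).clamp_min(eps)
    beta = (1 - (v ** 2).sum(-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq / (alpha * beta))

# synset names depend on the dataset; these are from the mammal closure
i, j = objects.index('dog.n.01'), objects.index('mammal.n.01')
print(poincare_distance(emb[i], emb[j]))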

A similar script to learn embeddings of the entire noun hierarchy is located at train-nouns.sh. This script contains the hyperparameter settings to reproduce the results for 10-dimensional embeddings reported in Nickel & Kiela (2017). The hyperparameter settings to reproduce the MAP results are provided as comments in the script.

The embeddings are trained via multithreaded async SGD. In the example above, the number of threads is set to a conservative value (NTHREADS=2) which should run well even on smaller machines. On machines with many cores, increase NTHREADS for faster convergence.
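For example, on a machine with many cores you might run:

NTHREADS=16 ./train-nouns.sh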

Dependencies

  • Python 3 with NumPy
  • PyTorch
  • Scikit-Learn
  • NLTK (to generate the WordNet data)

References

If you find this code useful for your research, please cite the following paper in your publication:

@incollection{nickel2017poincare,
  title = {Poincar\'{e} Embeddings for Learning Hierarchical Representations},
  author = {Nickel, Maximilian and Kiela, Douwe},
  booktitle = {Advances in Neural Information Processing Systems 30},
  editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
  pages = {6341--6350},
  year = {2017},
  publisher = {Curran Associates, Inc.},
  url = {http://papers.nips.cc/paper/7213-poincare-embeddings-for-learning-hierarchical-representations.pdf}
}

License

This code is licensed under CC-BY-NC 4.0.


Comments
  • Question about evaluation: mean rank and mAP

    Hi, I am new to this task and want to know the relation between mean rank and mAP. I am reproducing the link prediction results and trained a TransE model as well as a Poincaré model. When I evaluate the two models, I find that TransE gets both a higher mean rank and a higher MAP, while Poincaré gets both a lower mean rank and a lower MAP. Shouldn't it always be lower mean rank together with higher MAP? Is there some difference between these two ways of evaluating, or is something wrong with my evaluation code? :(
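    For reference, the two metrics summarise the same ranked lists differently: mean rank is dominated by the worst-ranked true links, while MAP is dominated by precision near the top of the list, so one model can win on one metric and lose on the other. A minimal sketch of computing both (illustrative only, not the repository's evaluation code):

    # Illustrative only: compute mean rank and MAP from per-query candidate
    # scores and the set of true neighbours (assumes every query has one).
    import numpy as np

    def rank_metrics(scores, positives):
        """scores: {query: 1-D array of candidate scores, higher = better}
           positives: {query: set of candidate indices that are true links}"""
        all_ranks, avg_precisions = [], []
        for q, s in scores.items():
            order = np.argsort(-s)              # best-scored candidate first
            hits, precisions = 0, []
            for rank, cand in enumerate(order, start=1):
                if cand in positives[q]:
                    hits += 1
                    all_ranks.append(rank)      # rank of this true link
                    precisions.append(hits / rank)
            avg_precisions.append(np.mean(precisions))
        return np.mean(all_ranks), np.mean(avg_precisions)   # mean rank, MAP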

    opened by xxkkrr 10
  • KeyError: 'Traceback

    [email protected]:~/ub16_prj/poincare-embeddings$ NTHREADS=2 ./train-nouns.sh
    Using 2 threads
    slurp: objects=82115, edges=743086
    Indexing data
    json_conf: {"distfn": "poincare", "dim": 10, "lr": 1, "batchsize": 50, "negs": 50}
    Burnin: lr=0.01
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/mldl/ub16_prj/poincare-embeddings/model.py", line 185, in __getitem__
        if n not in self._weights[t]:
    KeyError: tensor(23511)

    The worker error above is then re-raised in the main process (once per thread):

    Traceback (most recent call last):
      File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 19, in train_mp
        train(model, data, optimizer, opt, log, rank, queue)
      File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 46, in train
        for inputs, targets in loader:
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 286, in __next__
        return self._process_next_batch(batch)
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    KeyError: <the worker traceback above, as a string>
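    A likely cause: in newer PyTorch versions, indexing returns 0-dimensional tensors, so tensor(23511) ends up being used as a dictionary key where a plain int is expected. A hedged workaround (the line and variable names follow the traceback, but may differ in your checkout) is to cast the indices before the lookup in model.py:

    # sketch of a local workaround in model.py's __getitem__, not an official fix
    t = int(t)
    n = int(n)
    if n not in self._weights[t]:
        ...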

    opened by loveJasmine 8
  • “train-nouns.py” stops

    Hi, I am wondering whether it is normal for “train-nouns.py” to stall for a relatively long time after many epochs. I have run it for nearly a day and it is not producing any output now.


    opened by ShaoTengLiu 6
  • MAP around 0.4 after 300 epochs, far less than results in your paper

    Hi, I trained your code with PyTorch 0.4.0 on the mammals dataset with the default hyperparameters, but after 300 epochs I only get an embedding with a MAP around 0.4, which is far less than the 0.927 reported in your paper. The only change I made was adding six lines of the form t = int(t) because of the PyTorch version; everything else is unchanged. I am wondering how this could happen?

    opened by ydtydr 6
  • Question on using these embeddings in practice

    I wrote this question on stack overflow here... this is not so much an issue with the repo, as it is a question of how to implement these embeddings in practice.

    My assumption for how to use these is to take a sentence, like "The US and UK could agree a 'phenomenal' trade deal after Britain leaves the EU.", and tokenize it into a list of synsets [[], Synset('united_states.n.01'), [], Synset('united_kingdom.n.01'), ...]. But to do that, one needs to pick, for each word, a unique representative synset node in WordNet based on the context in which that word appears.

    This seems like a pretty difficult aspect of using these embeddings, and I'm wondering what strategies there are to solve it, what literature is available, whether this is not how they're used in practice at all, or whether there are open-source projects that can take both the word and its sentence context into account to map a sentence to a list of "optimal" synsets. This seems like a critical aspect of using these (and WordNet in general), but it is not frequently discussed.

    BTW many thanks for open-sourcing this, I find this technology really fascinating.
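    As a starting point for the question above, NLTK ships a simple (and admittedly weak) baseline for mapping words in context to synsets: the Lesk algorithm. Entity linking or supervised word-sense disambiguation generally works better, but a minimal sketch looks like this:

    # Illustrative only: map a few words of the example sentence to WordNet
    # synsets with NLTK's Lesk implementation (may return None for some words).
    from nltk import word_tokenize
    from nltk.wsd import lesk

    sentence = ("The US and UK could agree a phenomenal trade deal "
                "after Britain leaves the EU.")
    tokens = word_tokenize(sentence)
    for word in ["US", "UK", "deal", "Britain"]:
        print(word, "->", lesk(tokens, word, pos='n'))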

    opened by erjenkins29 5
  • ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'

    Thank you for sharing this great code. However, I encountered a ValueError while trying to reproduce train-mammals.sh. The error is as follows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>sh train-mammals.sh
    Specified hogwild training with GPU, defaulting to CPU...
    Using edge list dataloader
    Traceback (most recent call last):
      File "embed.py", line 246, in <module>
        main()
      File "embed.py", line 147, in main
        manifold, opt, idx, objects, weights, sparse=opt.sparse
      File "C:\Users\DELL\Github\poincare-embeddings-master\hype\sn.py", line 64, in initialize
        opt.ndproc, opt.burnin > 0, opt.dampening)
      File "hype\graph_dataset.pyx", line 75, in hype.graph_dataset.BatchedDataset.__cinit__
        self._mk_weights(idx, weights)
      File "hype\graph_dataset.pyx", line 81, in hype.graph_dataset.BatchedDataset._mk_weights
        def _mk_weights(self, npc.ndarray[npc.long_t, ndim=2] idx, npc.ndarray[npc.double_t, ndim=1] weights):
    ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'
    

    I am using PyTorch 1.0 with CUDA 10.0 on Windows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python -c "import torch; print(torch.version.cuda)"
    10.0
    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
    Cuda compilation tools, release 10.0, V10.0.130
    

    I found that the problem comes from the earlier step python setup.py build_ext --inplace. When I run it, it prints the following output:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python setup.py build_ext --inplace
    Compiling hype/graph_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    Compiling hype/adjacency_matrix_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    [1/2] Cythonizing hype/adjacency_matrix_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    [2/2] Cythonizing hype/graph_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    running build_ext
    building 'hype.graph_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/graph_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/graph_dataset.obj -std=c++11
    cl: command line warning D9002: ignoring unknown option '-std=c++11'
    graph_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/graph_dataset.cpp(3033): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
    hype/graph_dataset.cpp(3418): warning C4244: '=': conversion from '__pyx_t_5numpy_long_t' to 'long', possible loss of data
    hype/graph_dataset.cpp(3429): warning C4244: '=': conversion from '__pyx_t_5numpy_long_t' to 'long', possible loss of data
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_graph_dataset build\temp.win-amd64-3.6\Release\hype/graph_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib
      Creating library build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.exp
    Generating code
    Finished generating code
    building 'hype.adjacency_matrix_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/adjacency_matrix_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj -std=c++11
    cl: command line warning D9002: ignoring unknown option '-std=c++11'
    adjacency_matrix_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/adjacency_matrix_dataset.cpp(3144): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
    hype/adjacency_matrix_dataset.cpp(5484): warning C4244: '=': conversion from 'Py_ssize_t' to 'long', possible loss of data
    hype/adjacency_matrix_dataset.cpp(5595): warning C4244: '=': conversion from 'Py_ssize_t' to 'long', possible loss of data
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_adjacency_matrix_dataset build\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib
      Creating library build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.exp
    Generating code
    Finished generating code
    

    Your suggestions are welcome. Thanks.
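    A frequently reported cause on Windows is that the C long type is 32-bit there, so the npc.long_t buffer declaration does not match the 64-bit index array. A hedged workaround (not an official fix) is to make the declaration explicitly 64-bit in hype/graph_dataset.pyx, ensure the idx array is created with dtype=np.int64, and rebuild with python setup.py build_ext --inplace:

    # hype/graph_dataset.pyx -- sketch: use an explicit 64-bit buffer dtype
    def _mk_weights(self, npc.ndarray[npc.int64_t, ndim=2] idx, npc.ndarray[npc.double_t, ndim=1] weights):
        ...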

    opened by chengjun 5
  • [Question] Predicting the parent of an unseen word in an existing hierarchy

    Hello there,

    I saw this question and I am looking for something similar.

    But, in my case, let's say I have a list like:

    parent | child
    -- | --
    animal | carnivore
    carnivore | dog
    dog | rottweiler
    dog | huskey
    dog | pug

    From this list I can train a Poincaré model and get Poincaré embeddings for every word/node in this hierarchy. Then, let's say I receive a new word: "dogo argentino". What I want at this point is a way to infer automatically that "dogo argentino" is a child of "dog". Can Poincaré embeddings help here? Setting Poincaré embeddings aside, one thought would be to get fastText embeddings for all the words, find the word closest (maybe in terms of cosine similarity) to the fastText embedding of "dogo argentino", and then assign the parent of that closest word to "dogo argentino". Any thoughts would be greatly appreciated. Thank you!
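    For what it's worth, the fastText nearest-neighbour heuristic described above is easy to prototype. In the sketch below, word_vectors (any word-embedding lookup, e.g. fastText, which can embed unseen phrases via subwords) and parent_of (a child -> parent dict built from the training edges) are assumed inputs, not part of this repository:

    # Illustrative heuristic: attach an unseen word to the parent of its
    # nearest neighbour in a word-vector space.
    import numpy as np

    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def infer_parent(new_word, word_vectors, parent_of):
        # nearest existing child node by cosine similarity, then inherit its parent
        nearest = max(parent_of, key=lambda w: cosine(word_vectors[new_word],
                                                      word_vectors[w]))
        return parent_of[nearest]

    # e.g. parent_of = {"carnivore": "animal", "dog": "carnivore", "rottweiler": "dog"}
    # infer_parent("dogo argentino", word_vectors, parent_of)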

    opened by stelmath 4
  • modifies shell script to use correct python version

    Going through the code instructions on Mac OSX gave the following error:

    ModuleNotFoundError: No module named 'torch'

    This was resolved by changing python3 to python.

    CLA Signed 
    opened by denmonz 4
  • Embedding of Tree Structures: root not in the center of the disk

    Hi, I'm dealing with the embedding of hierarchical structures (a tree) and I used your algorithm. It works, but when I visualise the 2D or 3D embedding, the root of the tree is not in the center of the disk. Is there some command or flag to add to the code, or is the root supposed to end up in the center on its own?
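    There is no flag for this: the loss only depends on pairwise distances, and the Poincaré ball has distance-preserving maps (Möbius translations), so nothing forces the root to the origin. One post-hoc option, sketched below in NumPy under that assumption, is to re-centre the trained embedding on the root; this preserves all Poincaré distances:

    # Sketch: re-centre a trained embedding on a chosen node via the Moebius
    # translation x -> (-a) (+) x, an isometry of the unit Poincare ball.
    import numpy as np

    def mobius_add(u, v):
        # u (+) v for the unit ball (curvature c = 1)
        uv = np.dot(u, v)
        uu, vv = np.dot(u, u), np.dot(v, v)
        num = (1 + 2 * uv + vv) * u + (1 - uu) * v
        return num / (1 + 2 * uv + uu * vv)

    def recentre(embeddings, root_idx):
        a = embeddings[root_idx]
        return np.stack([mobius_add(-a, x) for x in embeddings])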

    opened by federicodassereto 4
  • Lack of hypernymysuite

    Hi, I am trying to use this tool following the README file, but the latest version does not include the hypernymysuite module. My system returns:

    "from hypernymysuite.base import HypernymySuiteModel ModuleNotFoundError: No module named 'hypernymysuite'"

    opened by QingkaiZeng 3
  • Problem of Poincare Distance and Norm

    Hi, I trained a 100-dimensional model with train-nouns.sh in the Poincaré manifold. I tested the model by computing tensor.norm for some nodes and the poincare.distance between some nodes.

    However, for Gauss, Brain, Blue and many other nodes near the boundary, their norms become 1.0000, which would mean infinite distance in the Poincaré ball. In addition, if I compute the Poincaré distance between these near-boundary nodes and other nodes, the distance is always 12.2061, which is a little confusing.

    Thanks for your attention!
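    For context, this is usually a numerical effect rather than a true norm of 1: the optimizer keeps points strictly inside the ball (just below norm 1), so near-boundary norms print as 1.0000 at four decimals, and distances between such near-boundary points saturate at a common large value. A small hedged check, reusing the emb / objects variables sketched in the README section above (the synset name here is illustrative):

    # hedged check: 1 - ||x||^2 should still be positive even when the
    # norm prints as 1.0000 in float32 at four decimals
    x = emb[objects.index('gauss.n.01')].double()
    print(x.norm().item(), (1 - x.norm() ** 2).item())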

    opened by ShaoTengLiu 3
  • Large dataset that needs continuous training

    Hi there, I have quite a large dataset and my GPU allocation cannot exceed 24 hours, so I need to restore from a checkpoint and continue training old models. Each time I try to do so, it reports that the parameter dimension is not correct: originally I trained a 2-dimensional embedding, but now the parameter dimension is 21. Any thoughts or advice?

    opened by lkcao 0
  • how was mammal_closure.csv created

    Hello, apologies in advance if this is a silly question. I was looking at mammal_closure.csv because I want to do something similar with my own data and run something like train-mammals.sh. I was wondering what the suffixes in the file indicate. For example, in vixen.n.02 and mastiff.n.01, what are the n.01 and n.02? Thanks!
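    For reference, these are standard WordNet synset identifiers of the form lemma.pos.sense_number: n marks a noun and 01/02 is the sense index, so mastiff.n.01 is the first noun sense of "mastiff" and vixen.n.02 is the second noun sense of "vixen". You can inspect them with NLTK:

    # look up the two synsets mentioned above and print their glosses
    from nltk.corpus import wordnet as wn

    print(wn.synset('mastiff.n.01').definition())
    print(wn.synset('vixen.n.02').definition())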

    opened by jayachaturvedi 1
  • What should we do after training?

    Hello, I am Tianyu. I just tried to train a new model on my own data using embed.py. However, after I get the trained model, what is the next step to obtain the embeddings? I cannot find clear steps for this. Thanks a lot.

    opened by HelloWorldLTY 4
  • In the Euclidean space, the distance is incorrectly squared

    In the formula angle_at_u(...), dist is squared in the numerator and unsquared in the denominator, so the formula expects distance(...) to return a true norm, not a squared norm.

    [screenshots of the angle_at_u(...) and related code omitted]

    You can notice the inconsistency if you put norm(...) and distance(...) next to each other:

    [screenshot comparing norm(...) and distance(...) omitted]

    CLA Signed 
    opened by FremyCompany 4
  • Entailment cones compute the wrong angle?

    Hi,

    I could be wrong, but I have the impression the entailment cones energy function is wrong, and optimizes the inverted order relationship between concepts.

    If I understood correctly...:

    • In our data files, id2 should be the generic concept, and id1 the more specific one.
    • After training norm(embedding(id2)) should usually be lower than norm(embedding(id1)) as cones have more space on the edge, so generic concepts should be close to the origin and specific concepts should be far from it.

    However, after training, I observe the reverse to be true:

    • Generic concepts are as far as possible from the center.
    • In their cones, one can find even-more-generic concepts, and no more specific concepts.

    I'm either doing something very wrong, or have incorrect assumptions, or the energy function is reversed compared to what it should be according to the entailment cones article, no?

    opened by FremyCompany 1