PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Overview

PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations (Nickel & Kiela, 2017).

[Figure: wn-nouns.jpg]

Installation

Simply clone this repository via

git clone https://github.com/facebookresearch/poincare-embeddings.git
cd poincare-embeddings
conda env create -f environment.yml
source activate poincare
python setup.py build_ext --inplace

Example: Embedding WordNet Mammals

To embed the transitive closure of the WordNet mammals subtree, first generate the data via

cd wordnet
python transitive_closure.py

This will generate the transitive closure of the full noun hierarchy as well as of the mammals subtree of WordNet.
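If you want to build a similar closure for your own hierarchy, the idea is simply to emit one edge per (descendant, ancestor) pair. Below is a minimal sketch using NLTK's WordNet interface; the repository's transitive_closure.py may differ in details such as filtering and the exact output columns.

# Hedged sketch: transitive closure of the WordNet mammal subtree.
# Output format is illustrative; the repository's CSV may use different columns.
from nltk.corpus import wordnet as wn
import csv

root = wn.synset('mammal.n.01')

edges = set()
for s in wn.all_synsets(pos='n'):
    # hypernym_paths() lists every path from a WordNet root down to s
    for path in s.hypernym_paths():
        if root in path:
            for ancestor in path[path.index(root):]:
                if ancestor != s:
                    edges.add((s.name(), ancestor.name()))

with open('mammal_closure_example.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id1', 'id2'])
    for child, parent in sorted(edges):
        writer.writerow([child, parent])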

To embed the mammals subtree in the reconstruction setting (i.e., without missing data), go to the root directory of the project and run

./train-mammals.sh

This shell script includes the appropriate parameter settings for the mammals subtree and saves the trained model as mammals.pth.
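Once training finishes, the saved checkpoint can be inspected directly from Python. The key names below ('objects', 'embeddings') are assumptions about the checkpoint contents and may differ between versions of this code; print the keys first to see what is actually stored.

# Hedged sketch: inspect a trained checkpoint.
# The key names are assumptions and may differ between versions of this repository.
import torch

chkpnt = torch.load('mammals.pth', map_location='cpu')
print(chkpnt.keys())                      # see what the checkpoint actually contains

objects = chkpnt.get('objects')           # expected: list of node names
embeddings = chkpnt.get('embeddings')     # expected: tensor of shape (num_objects, dim)

if objects is not None and embeddings is not None:
    idx = objects.index('mammal.n.01')
    print(objects[idx], embeddings[idx])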

An identical script to learn embeddings of the entire noun hierarchy is located at train-nouns.sh. This script contains the hyperparameter settings to reproduce the results for 10-dimensional embeddings in (Nickel & Kiela, 2017). The hyperparameter settings to reproduce the MAP results are provided as comments in the script.

The embeddings are trained via multithreaded async SGD. In the example above, the number of threads is set to a conservative value (NTHREADS=2) which should run well even on smaller machines. On machines with many cores, increase NTHREADS for faster convergence.
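The training loop follows the common Hogwild pattern: the embedding parameters live in shared memory and several worker processes apply SGD updates without locking. The snippet below is a generic sketch of that pattern, not the repository's actual training code.

# Generic Hogwild-style training sketch (illustration only, not this repository's train loop).
import torch
import torch.multiprocessing as mp

def worker(model, optimizer, steps):
    for _ in range(steps):
        # the real code samples batches of (positive, negative) pairs here
        x = torch.randn(8, 10)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                  # lock-free update of the shared parameters

if __name__ == '__main__':
    model = torch.nn.Linear(10, 1)
    model.share_memory()                  # make parameters visible to all worker processes
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    workers = [mp.Process(target=worker, args=(model, optimizer, 100)) for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()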

Dependencies

  • Python 3 with NumPy
  • PyTorch
  • Scikit-Learn
  • NLTK (to generate the WordNet data)

References

If you find this code useful for your research, please cite the following paper in your publication:

@incollection{nickel2017poincare,
  title = {Poincar\'{e} Embeddings for Learning Hierarchical Representations},
  author = {Nickel, Maximilian and Kiela, Douwe},
  booktitle = {Advances in Neural Information Processing Systems 30},
  editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
  pages = {6341--6350},
  year = {2017},
  publisher = {Curran Associates, Inc.},
  url = {http://papers.nips.cc/paper/7213-poincare-embeddings-for-learning-hierarchical-representations.pdf}
}

License

This code is licensed under CC-BY-NC 4.0.


Comments
  • Question about evaluation: mean rank and mAP


    Hi, I am new to this task and want to know the relation between mean rank and mAP. I am reproducing the link prediction results and trained both a TransE model and a Poincaré model. When I evaluate the two models, TransE gets a higher mean rank and a higher MAP, while Poincaré gets a lower mean rank and a lower MAP. Shouldn't it always be lower mean rank together with higher MAP? Is there some difference between the two evaluation protocols, or is something wrong with my evaluation code? :(
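    For reference, both metrics are derived from the same per-node ranking of candidates, so they normally move together (lower mean rank with higher MAP), although MAP penalizes errors near the top of the ranking more heavily. A toy sketch of how both can be computed (an illustration only, not this repository's evaluation code):

    # Toy illustration of mean rank and MAP from a ranking of candidates.
    import numpy as np

    def mean_rank_and_map(scores, positives):
        ranks, aps = [], []
        for q, s in scores.items():
            order = np.argsort(-s)                                  # best-scoring candidate first
            r = np.sort([np.where(order == p)[0][0] + 1 for p in positives[q]])  # 1-based ranks
            ranks.extend(r)
            aps.append(np.mean(np.arange(1, len(r) + 1) / r))       # average precision for query q
        return np.mean(ranks), np.mean(aps)

    scores = {0: np.array([0.9, 0.1, 0.8, 0.3])}                    # toy scores for 4 candidates
    positives = {0: [0, 2]}                                         # candidates 0 and 2 are true neighbors
    print(mean_rank_and_map(scores, positives))                     # -> (1.5, 1.0)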

    opened by xxkkrr 10
  • KeyError: 'Traceback


    [email protected]:~/ub16_prj/poincare-embeddings$ NTHREADS=2 ./train-nouns.sh
    Using 2 threads
    slurp: objects=82115, edges=743086
    Indexing data
    json_conf: {"distfn": "poincare", "dim": 10, "lr": 1, "batchsize": 50, "negs": 50}
    Burnin: lr=0.01
    Traceback (most recent call last):
      File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 19, in train_mp
        train(model, data, optimizer, opt, log, rank, queue)
      File "/home/mldl/ub16_prj/poincare-embeddings/train.py", line 46, in train
        for inputs, targets in loader:
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 286, in __next__
        return self._process_next_batch(batch)
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    KeyError: 'Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/mldl/ub16_prj/poincare-embeddings/model.py", line 185, in __getitem__
        if n not in self._weights[t]:
    KeyError: tensor(23511)'

    (The same worker traceback is raised by the second training thread as well.)

    opened by loveJasmine 8
  • “train-nouns.py” stops


    Hi, I am wondering whether it is normal for “train-nouns.py” to stall for a relatively long time after many epochs. I have run it for nearly one day and it no longer produces any output.


    opened by ShaoTengLiu 6
  • MAP around 0.4 after 300 epochs, far less than results in your paper


    Hi, I trained your code with PyTorch 0.4.0 on the mammals dataset with your default hyperparameters, but I only get an embedding with MAP around 0.4 after 300 epochs, which is far less than the 0.927 reported in your paper. BTW, I only added 6 lines of code of the form t = int(t) because of the PyTorch version; the rest of the code is unchanged. I am wondering how this could happen?

    opened by ydtydr 6
  • Question on using these embeddings in practice


    I wrote this question on stack overflow here... this is not so much an issue with the repo, as it is a question of how to implement these embeddings in practice.

    My assumption for how to use these is to take a sentence, like """The US and UK could agree a “phenomenal' trade deal after Britain leaves the EU.""", and tokenize it into a list of synsets [[], Synset('united_states.n.01'), [], Synset('united_kingdom.n.01'), ... ]. But in order to do that, one needs to pick a single representative WordNet synset for each word, based on the context in which that word appears.

    This seems like a pretty difficult aspect of using these embeddings, and I'm wondering what strategies there are to solve it, what literature is available, whether this is simply not how they are used in practice, or whether there are open-source projects that can take both the word and the sentence context into account to map a sentence to a list of "optimal" synsets. This seems like a critical aspect of using these embeddings (and wordnets in general), but it is not frequently discussed.
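    One common, if imperfect, baseline for that mapping step is classic word-sense disambiguation, e.g. the Lesk algorithm shipped with NLTK; the synset it returns can then be looked up in the embedding. A rough sketch, assuming NLTK and its punkt/wordnet data are installed:

    # Rough sketch: map tokens of a sentence to WordNet synsets with NLTK's Lesk WSD.
    # This is only a baseline illustration, not a recommended production pipeline.
    from nltk import word_tokenize
    from nltk.wsd import lesk

    sentence = "The US and UK could agree a phenomenal trade deal after Britain leaves the EU."
    tokens = word_tokenize(sentence)

    for token in tokens:
        synset = lesk(tokens, token, pos='n')     # may return None for non-nouns or unknown words
        if synset is not None:
            print(token, '->', synset.name())     # the chosen synset can be looked up in the embedding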

    BTW many thanks for open-sourcing this, I find this technology really fascinating.

    opened by erjenkins29 5
  • ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'


    Thank you for sharing this great code. However, I encountered a ValueError when trying to reproduce train-mammals.sh. The error information is as follows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>sh train-mammals.sh
    Specified hogwild training with GPU, defaulting to CPU...
    Using edge list dataloader
    Traceback (most recent call last):
      File "embed.py", line 246, in <module>
        main()
      File "embed.py", line 147, in main
        manifold, opt, idx, objects, weights, sparse=opt.sparse
      File "C:\Users\DELL\Github\poincare-embeddings-master\hype\sn.py", line 64, in initialize
        opt.ndproc, opt.burnin > 0, opt.dampening)
      File "hype\graph_dataset.pyx", line 75, in hype.graph_dataset.BatchedDataset.__cinit__
        self._mk_weights(idx, weights)
      File "hype\graph_dataset.pyx", line 81, in hype.graph_dataset.BatchedDataset._mk_weights
        def _mk_weights(self, npc.ndarray[npc.long_t, ndim=2] idx, npc.ndarray[npc.double_t, ndim=1] weights):
    ValueError: Buffer dtype mismatch, expected 'long_t' but got 'long'
    

    I am using PyTorch 1.0 with CUDA 10.0 on Windows:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python -c "import torch; print(torch.version.cuda)"
    10.0
    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
    Cuda compilation tools, release 10.0, V10.0.130
    

    I found that the problem comes from the earlier step python setup.py build_ext --inplace. When I run python setup.py build_ext --inplace, it shows the following information:

    (poincare) C:\Users\DELL\Github\poincare-embeddings-master>python setup.py build_ext --inplace
    Compiling hype/graph_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    Compiling hype/adjacency_matrix_dataset.pyx because it depends on C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Includes\numpy\__init__.pxd.
    [1/2] Cythonizing hype/adjacency_matrix_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    [2/2] Cythonizing hype/graph_dataset.pyx
    C:\Anaconda3\envs\poincare\lib\site-packages\Cython\Compiler\Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.pyx
      tree = Parsing.p_module(s, pxd, full_module_name)
    running build_ext
    building 'hype.graph_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/graph_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/graph_dataset.obj -std=c++11
    cl: command line warning D9002: ignoring unknown option '-std=c++11'
    graph_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/graph_dataset.cpp(3033): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
    hype/graph_dataset.cpp(3418): warning C4244: '=': conversion from '__pyx_t_5numpy_long_t' to 'long', possible loss of data
    hype/graph_dataset.cpp(3429): warning C4244: '=': conversion from '__pyx_t_5numpy_long_t' to 'long', possible loss of data
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_graph_dataset build\temp.win-amd64-3.6\Release\hype/graph_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\graph_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib
      Creating library build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\hype\graph_dataset.cp36-win_amd64.exp
    Generating code
    Finished generating code
    building 'hype.adjacency_matrix_dataset' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Anaconda3\envs\poincare\lib\site-packages\numpy\core\include -IC:\Anaconda3\envs\poincare\include -IC:\Anaconda3\envs\poincare\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tphype/adjacency_matrix_dataset.cpp /Fobuild\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj -std=c++11
    cl: command line warning D9002: ignoring unknown option '-std=c++11'
    adjacency_matrix_dataset.cpp
    c:\anaconda3\envs\poincare\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    hype/adjacency_matrix_dataset.cpp(3144): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
    hype/adjacency_matrix_dataset.cpp(5484): warning C4244: '=': conversion from 'Py_ssize_t' to 'long', possible loss of data
    hype/adjacency_matrix_dataset.cpp(5595): warning C4244: '=': conversion from 'Py_ssize_t' to 'long', possible loss of data
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Anaconda3\envs\poincare\libs /LIBPATH:C:\Anaconda3\envs\poincare\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" /EXPORT:PyInit_adjacency_matrix_dataset build\temp.win-amd64-3.6\Release\hype/adjacency_matrix_dataset.obj /OUT:C:\Users\DELL\Github\poincare-embeddings-master\hype\adjacency_matrix_dataset.cp36-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib
      Creating library build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\hype\adjacency_matrix_dataset.cp36-win_amd64.exp
    Generating code
    Finished generating code
    

    Your suggestions are welcome. Thanks.

    opened by chengjun 5
  • [Question] Predicting the parent of an unseen word in an existing hierarchy


    Hello there,

    I saw this question and I am looking for something similar.

    But, in my case, let's say I have a list like:

    parent | child
    -- | --
    animal | carnivore
    carnivore | dog
    dog | rottweiler
    dog | huskey
    dog | pug

    From this list I can train a Poincaré model and get Poincaré embeddings for every word/node in this hierarchy. Then, let's say I receive a new word: "dogo argentino". What I want at this point is a way to infer automatically that "dogo argentino" is a child of "dog". Can Poincaré embeddings be of help here? If we ignore Poincaré embeddings altogether, one thought would be to get fastText embeddings for all the words, find the known word whose fastText embedding is closest (say, in terms of cosine similarity) to the embedding of "dogo argentino", and then assign the parent of that closest word to "dogo argentino". Any thoughts would be greatly appreciated. Thank you!
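    For what it's worth, the nearest-neighbour idea above fits in a few lines. Everything here is hypothetical: get_vector stands in for any word-embedding lookup (e.g. fastText) and is not provided by this repository.

    # Hypothetical sketch of the nearest-neighbour parent-assignment idea described above.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def guess_parent(new_word, parent_of, get_vector):
        # parent_of: dict mapping each known child word to its parent in the hierarchy
        v = get_vector(new_word)
        best = max(parent_of, key=lambda w: cosine(v, get_vector(w)))
        return parent_of[best]                    # inherit the parent of the most similar known word

    # usage (with some hypothetical embedding lookup get_vector):
    # parent_of = {"carnivore": "animal", "dog": "carnivore", "rottweiler": "dog", "pug": "dog"}
    # print(guess_parent("dogo argentino", parent_of, get_vector))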

    opened by stelmath 4
  • modifies shell script to use correct python version


    Going through the code instructions on Mac OSX gave the following error:

    ModuleNotFoundError: No module named 'torch'

    This was resolved by changing python3 to python in the script.

    CLA Signed 
    opened by denmonz 4
  • Embedding of Tree Structures: root not in the center of the disk


    Hi, I'm dealing with the embedding of hierarchical structures (a tree) and I used your algorithm. It works, but when I visualise the 2D or 3D embedding, the root of the tree is not at the center of the disk. Is there some command or flag to add to the code, or is the root supposed to end up at the center on its own?

    opened by federicodassereto 4
  • Lack of hypernymysuite


    Hi, I tried to use this tool by following the readme file, but the latest version does not include the hypernymysuite module. My system returns:

    "from hypernymysuite.base import HypernymySuiteModel ModuleNotFoundError: No module named 'hypernymysuite'"

    opened by QingkaiZeng 3
  • Problem of Poincare Distance and Norm


    Hi, I trained a 100-dimensional model with train-nouns.sh in the Poincaré manifold. I tested the model by calculating the tensor norm of some nodes and the Poincaré distance between some nodes.

    However, for Gauss, Brain, Blue and many other nodes near the boundary, the norm becomes 1.0000, which means infinity in the Poincaré ball. In addition, if I calculate the Poincaré distance between these near-boundary nodes and other nodes, the distance is always 12.2061, which is a little confusing.
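    For context, the Poincaré distance d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))) blows up as ||u|| approaches 1, and implementations clamp the denominator with a small epsilon; once a stored norm rounds to 1.0000 in float32, many distances collapse to the same large value. A small illustration (the epsilon is an assumption, not necessarily the value used in this repository):

    # Illustration of distance saturation near the boundary of the Poincaré ball.
    # The clamping eps is an assumption; the repository may use a different value.
    import torch   # torch.acosh requires PyTorch >= 1.5

    def poincare_distance(u, v, eps=1e-5):
        sq = torch.sum((u - v) ** 2)
        alpha = torch.clamp(1 - torch.sum(u ** 2), min=eps)
        beta = torch.clamp(1 - torch.sum(v ** 2), min=eps)
        return torch.acosh(1 + 2 * sq / (alpha * beta))

    v = torch.tensor([0.1, 0.2])
    for x in (0.99, 0.999999, 1.0):               # points pushed toward the boundary
        u = torch.tensor([x, 0.0])
        print(x, poincare_distance(u, v).item())  # distances saturate once 1 - ||u||^2 hits eps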

    Thanks for your attention!

    opened by ShaoTengLiu 3
  • Large dataset that needs continuous training


    Hi there, I have a quite large dataset and a GPU allocation that cannot exceed 24 hours. In this case, I need to restore from a checkpoint and continue training the old models. Currently, each time I try to do so, the dimension of the parameters is reported as incorrect: originally I trained a 2-dimensional embedding, but now the parameter dimension is 21. Any thoughts on this? Any advice?
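    A generic PyTorch checkpointing pattern (save the model and optimizer state every epoch, reload both before continuing) usually covers this case; the key names below are assumptions, not necessarily what embed.py writes.

    # Generic resume-from-checkpoint sketch; key names are assumptions,
    # not necessarily the schema written by embed.py.
    import torch

    def save_checkpoint(path, model, optimizer, epoch):
        torch.save({'epoch': epoch,
                    'model_state': model.state_dict(),
                    'optimizer_state': optimizer.state_dict()}, path)

    def load_checkpoint(path, model, optimizer):
        state = torch.load(path, map_location='cpu')
        model.load_state_dict(state['model_state'])        # fails loudly on a dimension mismatch
        optimizer.load_state_dict(state['optimizer_state'])
        return state['epoch']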

    opened by lkcao 0
  • how was mammal_closure.csv created


    Hello, apologies in advance if this is a silly question. I was just looking at mammal_closure.csv because I want to do something similar with my own data and run it like train-mammals.sh. I was wondering, what do the numbers in the file indicate? For entries such as vixen.n.02 and mastiff.n.01, what are the n.01 and n.02? Thanks!
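    For context on the naming scheme: these suffixes are standard WordNet sense identifiers, where 'n' is the part of speech (noun) and the number selects one sense of the lemma. They can be resolved with NLTK, for example:

    # The .n.01 / .n.02 suffixes are WordNet sense identifiers (part of speech + sense number).
    from nltk.corpus import wordnet as wn

    print(wn.synset('vixen.n.02').definition())    # second noun sense of "vixen" (the female fox)
    print(wn.synset('mastiff.n.01').definition())  # first noun sense of "mastiff"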

    opened by jayachaturvedi 1
  • What should we do after training?


    Hello, I am Tianyu. I just trained a new model on my own data using embed.py. However, after I obtain the trained model, what is the next step to extract the embeddings? I could not find clear instructions for this. Thanks a lot.

    opened by HelloWorldLTY 4
  • In the Euclidean space, the distance is incorrectly squared


    In the formula angle_at_u(...), dist is used squared in the numerator and unsquared in the denominator. Thus, it expects distance(...) to return a true norm, not a squared norm.


    You can notice the inconsistency if you put norm(...) and distance(...) next to each other.

    CLA Signed 
    opened by FremyCompany 4
  • Entailment cones compute the wrong angle?


    Hi,

    I could be wrong, but I have the impression that the entailment cones energy function is wrong and optimizes the inverted order relationship between concepts.

    If I understood correctly...:

    • In our data files, id2 should be the generic concept, and id1 the more specific one.
    • After training, norm(embedding(id2)) should usually be lower than norm(embedding(id1)), as cones have more room near the boundary, so generic concepts should be close to the origin and specific concepts should be far from it.

    However, after training, I observe the reverse to be true:

    • Generic concepts are as far as possible from the center.
    • In their cones, one can find even-more-generic concepts, and no more specific concepts.

    I'm either doing something very wrong, or have incorrect assumptions, or the energy function is reversed compared to what it should be according to the entailment cones article, no?

    opened by FremyCompany 1