Simple PyTorch Implementation of "Grokking"

Implementation of Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Usage

Running train.py with default arguments runs my best attempt so far at reproducing the "Grokking" behavior on modular division, as seen in Figure 1 of the paper.

python train.py

The results seem highly sensitive to optimizer hyperparameter selection, and I have not yet tried all of the configurations outlined in the paper.
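For readers unfamiliar with the modular division task mentioned above, here is a minimal sketch of how such a dataset could be built in plain Python. It is not taken from train.py. The function name `modular_division_dataset`, the modulus `p=97`, the 50% training fraction, and the seed are all assumptions chosen for illustration.

```python
import random


def modular_division_dataset(p=97, train_fraction=0.5, seed=0):
    """Build every equation x / y = z (mod p) and split into train/validation.

    Hypothetical helper for illustration; p is assumed to be prime.
    """
    # Division mod a prime p is multiplication by the modular inverse of y.
    # By Fermat's little theorem, y^(p-2) is that inverse, so y must be nonzero.
    examples = [
        (x, y, (x * pow(y, p - 2, p)) % p)
        for x in range(p)
        for y in range(1, p)
    ]
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(examples)
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]


train, val = modular_division_dataset()
print(len(train), len(val))
```

Each example is a triple `(x, y, z)` with `z * y % p == x`. The model would be trained on the first split and evaluated on the held-out equations in the second.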

Citations

@inproceedings{power2021grokking,
  title={Grokking: Generalization beyond overfitting on small algorithmic datasets},
  author={Power, Alethea and Burda, Yuri and Edwards, Harri and Babuschkin, Igor and Misra, Vedant},
  booktitle={ICLR MATH-AI Workshop},
  year={2021}
}

teddykoker/grokking