Boostcamp AI Tech 3rd / Basic Paper reading w.r.t Embedding

Overview

Boostcamp AI Tech 3rd : Basic Paper Reading w.r.t Embedding

TL;DR

1992년부터 2018년도까지 이루어진 word/sentence embedding의 중요한 줄기를 이루는 기초 논문 스터디를 진행하고자 합니다. 

논문 정리 발표에 들어갈 내용

  • 저자가 풀려고 하는 문제는 어떤 것인가?
  • 어떤 식으로 해결하고자 했는가. 어떤 장점이 있는가(시간 여유가 된다면, 이전에는 어떤 방법이 있었고 그 방법들의 단점)
  • 그 방법에 대한 intuition (수학 없이)
  • 방법에 대한 이해(수학적으로)
  • 방법의 성공성을 보여주기 위해 사용한 데이터, 메트릭, 성능비교
  • 부족하다 생각되는 것, 애매한 것, 혹은 좋았던 점 등의 Discussion point

리딩 리스트

Dates Paper(author) Year Presenter File upload Code explained
Class-Based n-gram Models of Natural Language(Peter F Brown, et al.) 1992 소연 설명
Efficient Estimation of Word Representations in Vector Space(Tomas Mikolov, et al) 2013 동진 발표
Distributed Representations of Words and Phrases and their Compositionality(Tomas Mikolov, et al) 2013 나연 설명 skip-gram, CBOW
Distributed Representations of Sentences and Documents(Quoc V. Le and Tomas Mikolov) 2014 기원 설명 Doc2Vec
GloVe: Global Vectors for Word Representation(Jeffrey Pennington, et al.) 2015 수정 설명
Skip-Thought Vectors(Ryan Kiros, et al.) 2015 기범 설명
Enriching Word Vectors with Subword Information(Piotr Bojanowski, et al.) 2017 은기 설명
Universal Sentence Encoder(Daniel Cer et al.) 2018

issue & 추가 스터디 자료

Dates Topic Presenter File upload
04/14 genism을 이용한 word2vec 사용 현지 링크
04/14 negative samping & subsampling 나경 링크
04/14 hierarchical softmax 소연 링크
04/14 negative contrastive estimation(NCE) 수정 링크

스터디 룰

  • 스터디 시간 : 목요일 저녁 9시 30분!
  • 스터디 분량 : 매주 1주씩! (프로덕트 서빙 커리큘럼 기간에 집중할 수 있게 그전에 끝내보아영)
    • 각각 읽고, 질문 최소 1개를 github issue에 올림(+ 거기에 대한 답변을 안다면 답변 달아주기!)
  • 발표자 : 해당 요일에 랜덤 선택. 발표 자료는 자유 양식
    • 논문 발표 : 발표자는 발표 후 정리 내용 해당 레포 폴더파서 업로드. 발표자 외 사람 중 공유하고 싶은 사람은 issues에 남기거나 file upload 에 마찬가지로 링크 추가 가능(자율)
    • 코드뷰 설명: 해당 논문 발표자는 다음주차에 코드뷰 설명(e.g, 어떤 라이브러리로 쉽게 쓸 수 있는지 usage 설명, 알고리즘이 복잡한 경우 코드뷰로 어떻게 구현되었는지 설명 등 본인 기호에 맞게)

참여자

강나경, 김소연, 김현지, 박기범, 임동진, 임수정, 정기원, 한나연 , 김은기

참고 링크

논문을 정리하는 틀과 issues를 통한 discussion이 좋았던 깃헙 레포 참고

리딩 리스트를 참고한 NLP Must Read paper 정리된 깃헙 레포 참고

국내 NLP 리뷰 모임 참고 (season1의 beginners에 중복되는 논문들 있어요!)

Owner
Soyeon Kim
Soyeon Kim
Price-Prediction-For-a-Dream-Home - A machine learning based linear regression trained model for house price prediction.

Price-Prediction-For-a-Dream-Home ROADMAP TO THIS LINEAR REGRESSION BASED HOUSE PRICE PREDICTION PREDICTION MODEL Import all the dependencies of the p

DIKSHA DESWAL 1 Dec 29, 2021
A repo to show how to use custom dataset to train s2anet, and change backbone to resnext101

A repo to show how to use custom dataset to train s2anet, and change backbone to resnext101

jedibobo 3 Dec 28, 2022
Detectron2-FC a fast construction platform of neural network algorithm based on detectron2

What is Detectron2-FC Detectron2-FC a fast construction platform of neural network algorithm based on detectron2. We have been working hard in two dir

董晋宗 9 Jun 06, 2022
Official implementation of "Refiner: Refining Self-attention for Vision Transformers".

RefinerViT This repo is the official implementation of "Refiner: Refining Self-attention for Vision Transformers". The repo is build on top of timm an

101 Dec 29, 2022
🤖 Project template for your next awesome AI project. 🦾

🤖 AI Awesome Project Template 👋 Template author You may want to adjust badge links in a README.md file. 💎 Installation with pip Installation is as

Wiktor Łazarski 18 Nov 23, 2022
A simple, fully convolutional model for real-time instance segmentation.

You Only Look At CoefficienTs ██╗ ██╗ ██████╗ ██╗ █████╗ ██████╗████████╗ ╚██╗ ██╔╝██╔═══██╗██║ ██╔══██╗██╔════╝╚══██╔══╝ ╚██

Daniel Bolya 4.6k Dec 30, 2022
An extremely simple, intuitive, hardware-friendly, and well-performing network structure for LiDAR semantic segmentation on 2D range image. IROS21

FIDNet_SemanticKITTI Motivation Implementing complicated network modules with only one or two points improvement on hardware is tedious. So here we pr

YimingZhao 54 Dec 12, 2022
This project is for a Twitter bot that monitors a bird feeder in my backyard. Any detected birds are identified and posted to Twitter.

Backyard Birdbot Introduction This is a silly hobby project to use existing ML models to: Detect any birds sighted by a webcam Identify whic

Chi Young Moon 71 Dec 25, 2022
HugsVision is a easy to use huggingface wrapper for state-of-the-art computer vision

HugsVision is an open-source and easy to use all-in-one huggingface wrapper for computer vision. The goal is to create a fast, flexible and user-frien

Labrak Yanis 166 Nov 27, 2022
Deep-learning X-Ray Micro-CT image enhancement, pore-network modelling and continuum modelling

EDSR modelling A Github repository for deep-learning image enhancement, pore-network and continuum modelling from X-Ray Micro-CT images. The repositor

Samuel Jackson 7 Nov 03, 2022
Object detection and instance segmentation toolkit based on PaddlePaddle.

Object detection and instance segmentation toolkit based on PaddlePaddle.

9.3k Jan 02, 2023
Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis

Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis [Paper] [Online Demo] The following results are obtained by our SCUNet with purely syn

Kai Zhang 312 Jan 07, 2023
Contrastive Multi-View Representation Learning on Graphs

Contrastive Multi-View Representation Learning on Graphs This work introduces a self-supervised approach based on contrastive multi-view learning to l

Kaveh 208 Dec 23, 2022
This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on table detection and table structure recognition.

WTW-Dataset This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on ICCV 2021. Here, you can download the

109 Dec 29, 2022
Pytorch cuda extension of grid_sample1d

Grid Sample 1d pytorch cuda extension of grid sample 1d. Since pytorch only supports grid sample 2d/3d, I extend the 1d version for efficiency. The fo

lyricpoem 24 Dec 03, 2022
Codebase of deep learning models for inferring stability of mRNA molecules

Kaggle OpenVaccine Models Codebase of deep learning models for inferring stability of mRNA molecules, corresponding to the Kaggle Open Vaccine Challen

Eternagame 40 Dec 29, 2022
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking We revisit and address issues with Oxford 5k and Paris 6k image retrieval benchm

Filip Radenovic 188 Dec 17, 2022
Robustness via Cross-Domain Ensembles

Robustness via Cross-Domain Ensembles [ICCV 2021, Oral] This repository contains tools for training and evaluating: Pretrained models Demo code Traini

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 27 Dec 23, 2022
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

Rainbow 🌈 An implementation of Rainbow DQN which reaches a median HNS of 205.7 after only 10M frames (the original Rainbow from Hessel et al. 2017 re

Dominik Schmidt 31 Dec 21, 2022
Official PyTorch implementation of SyntaSpeech (IJCAI 2022)

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech | | | | 中文文档 This repository is the official PyTorch implementation of our IJCAI-2022

Zhenhui YE 116 Nov 24, 2022