trstop

Turkish Stop Words (Türkçe Dolgu Sözcükleri)

In this repository I have collected the Turkish stop words that appear among the 10,000 most frequent Turkish words. So that new candidate words can be tested in the future, I have also added a small Python script and the list of the 10,000 most frequent words.

Another list of Turkish stop words is available at https://github.com/sgsinclair/trombone/blob/master/src/main/resources/org/voyanttools/trombone/keywords/stop.tr.turkish-lucene.txt. However, some of the stop words in that list are not among the 10,000 most frequent words.
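The frequency check mentioned above can be sketched roughly as follows. This is only an illustration, not the script shipped with the repository: the helper names `load_frequent_words` and `split_candidates` are hypothetical, and the five-word sample stands in for the real 10,000-word list, which is assumed to hold one lowercase word per line.

```python
def load_frequent_words(lines):
    """Build a set of words from an iterable of lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}


def split_candidates(candidates, frequent_words):
    """Separate candidates into those inside and outside the frequency list."""
    accepted, rejected = [], []
    for word in candidates:
        if word in frequent_words:
            accepted.append(word)
        else:
            rejected.append(word)
    return accepted, rejected


# Tiny stand-in for the 10,000-word frequency list (assumption, not real data).
sample_lines = ["ve\n", "bir\n", "bu\n", "için\n", "ama\n"]
frequent = load_frequent_words(sample_lines)

accepted, rejected = split_candidates(["ama", "lakin"], frequent)
print(accepted)  # ['ama']
print(rejected)  # ['lakin']
```

With a real word list, the same function could be fed an open file object instead of `sample_lines`, since a text file iterates line by line.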

To use the module, import it and pass the word you want to check to is_stop_word:

import trstop

print(trstop.is_stop_word(parameter))
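A typical use of such a check is filtering stop words out of a sentence. The sketch below is self-contained, so it does not import `trstop`. Instead it defines a local stand-in `is_stop_word` backed by a four-word sample set, on the assumption that `trstop.is_stop_word` returns a truthy value for stop words. The `remove_stop_words` helper is hypothetical and not part of the module.

```python
# Small illustrative sample, not the module's actual stop word list.
_SAMPLE_STOP_WORDS = {"ve", "bir", "bu", "için"}


def is_stop_word(word):
    """Stand-in for trstop.is_stop_word (assumed to return a truthy value)."""
    return word in _SAMPLE_STOP_WORDS


def remove_stop_words(text, predicate=is_stop_word):
    """Split text on whitespace and drop tokens the predicate flags."""
    return [token for token in text.split() if not predicate(token)]


print(remove_stop_words("bu kitap ve defter senin için"))
# ['kitap', 'defter', 'senin']
```

With the module installed, passing `predicate=trstop.is_stop_word` should work the same way if that assumption about its return value holds.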

Contributors:

Ahmet Aksoy
Toprak Öztürk

Stop words (dolgu sözcükleri) are words that are used frequently but whose removal does not significantly change the meaning of the sentence they are taken from.

I use the term "dolgu sözcükleri" as the Turkish equivalent of "stop words". If there is a better option, I am open to changing it. The "turkce-stop-words-dict.py" script included in the repository can be used to check a word's usage frequency when we want to add new words to the list in the future.

Last updated: 29.06.2018
