Exploring dimension-reduced embeddings

Last update: Nov 29, 2022

Related tags

Text Data & NLP sleepwalk

Overview

sleepwalk

This is the code repository. See https://anders-biostat.github.io/sleepwalk/ for the Sleepwalk web page.

License and disclaimer

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Comments

Error running sleepwalk: cannot open the connection
Dear sleepwalk developers, thanks a lot for providing such a nice method. I was able to install the package, but I get the following error when I try to run it:

> sleepwalk([email protected][email protected], [email protected][email protected])
Estimating 'maxdist' for feature matrix 1
Server has been stopped.
Server has been stopped.
Error in app$openPage(useViewer, browser) : Timeout waiting for websocket.
In addition: Warning messages:
1: In file(con, "r") :
  cannot open file 'sleepwalk_canvas.html': No such file or directory
2: In func(req) : File '/favicon.ico' is not found

I know this is probably not a sleepwalk-specific error, but I couldn't find a solution for it. Any hints on how to fix this issue?

Also, I have a question about the output. Besides using the interactive mode to manually inspect cells that might be "misplaced" in the reduced-dimension space, I would like to systematically find the cells that don't quite fit the clusters they were originally assigned to. In other words, how would you suggest using sleepwalk to refine my clustering, since I suspect that many of my cells were assigned to the wrong clusters? I am using the Seurat package for dimension reduction and clustering.

Thank you very much, Gustavo
opened by gufranca 2
Error: 'browser' must be a non-empty character string
Hello,

After calling the sleepwalk function on a Seurat object, I got this error:

> sleepwalk( as.matrix([email protected][email protected]), as.matrix([email protected][email protected]) )
Estimating 'maxdist' for feature matrix 1
Error in browseURL(str_c("http://localhost:", port, "/", pageobj$startPage), :
  'browser' must be a non-empty character string

I have loaded the stringr library (which contains the function str_c()), but I cannot find where this error originates. Has anyone else run into this problem?

Thank you
opened by PedroRaposo 2
slw_on_selection error when sleepwalk is not attached

Running sleepwalk without attaching the package (i.e., NOT specifying library(sleepwalk)) like this works fine:

sleepwalk::sleepwalk(se[email protected][email protected], t([email protected][[email protected],]))

But the moment you select cells with your mouse, it crashes (the browser tab closes) and R gives this error:

Error in slw_on_selection(selPoints, 1) : could not find function "slw_on_selection"

Loading the package using library(sleepwalk) solves the issue, but it'd be nice if it weren't necessary.

opened by FelixTheStudent 0
doc for comparison

The example on the web page for comparing two embeddings still uses the old version, where both distances are used concurrently. We also need to change the explanation below it to say that the same cell always has the same colour in all embeddings.

opened by simon-anders 0
Suggestion: Link embeddings from transposed table

Let's say I have a matrix with individuals (e.g. cells) as rows and features as columns, and I run UMAP on both the ordinary matrix and the transposed one. Then it would be natural to look at the individual UMAP with the default usage (the distances to other individuals), but it would also be interesting to see the features for that individual (and vice versa).

Is it clear what I mean?

opened by StaffanBetner 2

Releases (v0.3.2)

v0.3.2 (Sep 17, 2021)
jrc (as of v0.5.0) now uses the setLimits function for all security restrictions. This update fixes the dependency problem caused by that change.

v0.3.1 (Sep 30, 2020)
Fixed a broken path to the start page caused by a jrc update.

v0.3.0 (Feb 27, 2020)
New argument metric allows using angular distance (metric = "cosine") as an alternative to the default Euclidean distance (metric = "euclid").

If compare = "distances", it is no longer required to provide several embeddings. If only one embedding is given, it will be used for all the distances.
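A minimal base-R sketch of the two metrics follows. It rests on stated assumptions. Sleepwalk colours points by their distance to the point under the mouse, and the two functions below compute that distance vector for one selected point. The functions euclid_dist and cosine_dist are illustrations written for this page, not code from the package. This sketch assumes that "cosine" means 1 minus cosine similarity; the package may instead use the angle itself. The sleepwalk call only runs in an interactive session. Passing several feature matrices as a list together with compare = "distances" is also an assumption.

```r
# Illustration only: distances from one selected point (row i) to every row
# of x, under the two notions behind metric = "euclid" and metric = "cosine".
euclid_dist <- function(x, i) {
  # subtract the selected row from every row, then take row-wise norms
  sqrt(rowSums(sweep(x, 2, x[i, ])^2))
}

cosine_dist <- function(x, i) {
  # assumption: cosine distance = 1 - cosine similarity (rows must be non-zero)
  norms <- sqrt(rowSums(x^2))
  sim <- as.vector(x %*% x[i, ]) / (norms * norms[i])
  1 - sim
}

x <- rbind(c(1, 0), c(2, 0), c(0, 1))
euclid_dist(x, 1)  # 0, 1, sqrt(2)
cosine_dist(x, 1)  # 0, 0, 1

# Only runs in an interactive session with sleepwalk installed.
if (interactive() && requireNamespace("sleepwalk", quietly = TRUE)) {
  library(sleepwalk)
  emb <- matrix(rnorm(200), ncol = 2)    # one toy 2D embedding, 100 points
  fm1 <- matrix(rnorm(1000), ncol = 10)  # toy feature matrix, 100 x 10
  fm2 <- matrix(rnorm(500), ncol = 5)    # second toy feature matrix, 100 x 5
  # one embedding reused for both distances (assumed list input)
  sleepwalk(emb, list(fm1, fm2), compare = "distances", metric = "cosine")
}
```

Points 1 and 2 point in the same direction, so their cosine distance is 0 even though their Euclidean distance is 1. The angular metric ignores the overall length of a feature vector and compares only its direction.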

v0.2.1 (Oct 2, 2019)
Changes due to an update of the jrc package.

Indices of selected points are no longer stored in a variable and can be accessed only via the callback function. Thus, no changes are made to the global environment unless the user makes them explicitly.

Added the possibility to pass arguments to jrc::openPage (such as the port number or the browser in which to open the app).
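The two v0.2.1 changes can be sketched together in R. This is a minimal sketch, not code from the package. make_selection_recorder() is a hypothetical helper that keeps selections in a closure, so nothing is written to the global environment. The sketch makes three assumptions. First, the callback is passed as on_selection. Second, the callback receives the indices of the selected points and the index of the embedding, as the slw_on_selection(selPoints, 1) call in the error report above suggests. Third, port and browser are forwarded to jrc::openPage through sleepwalk's extra arguments. The package is attached with library(sleepwalk) because of the slw_on_selection issue reported above.

```r
# Hypothetical helper (not part of sleepwalk): keeps every selection inside a
# closure, so nothing is assigned in the global environment.
make_selection_recorder <- function() {
  selections <- list()
  callback <- function(points, emb) {
    # points: indices of the selected points
    # emb: index of the embedding the selection was made in
    selections[[length(selections) + 1]] <<- list(embedding = emb, points = points)
    invisible(NULL)
  }
  list(callback = callback, selections = function() selections)
}

recorder <- make_selection_recorder()

# Only runs in an interactive session with sleepwalk installed.
if (interactive() && requireNamespace("sleepwalk", quietly = TRUE)) {
  library(sleepwalk)  # attach the package (see the slw_on_selection issue)
  emb <- matrix(rnorm(200), ncol = 2)         # toy 2D embedding, 100 points
  features <- matrix(rnorm(1000), ncol = 10)  # toy feature matrix, 100 x 10
  sleepwalk(emb, features,
            on_selection = recorder$callback,  # assumed argument name
            port = 8080,         # assumed to be forwarded to jrc::openPage
            browser = "firefox") # adjust to a browser installed on your system
  # after selecting points in the browser, inspect: str(recorder$selections())
}
```

A closure is used here instead of a global variable so that selections stay available after the app is closed without modifying the user's workspace.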

v0.2.0 (Sep 27, 2019)
HTML Canvas is now used to plot the embedding. This makes Sleepwalk faster and allows it to display more points simultaneously.

A new parameter, mode = c("canvas", "svg"), allows the user to go back to the old SVG-based version of the Sleepwalk app.

Fixed a bug in slw_snapshot: the function no longer returns a list of identical plots when used with several different embeddings.

Owner

S. Anders's research group at ZMBH

GitHub Repository https://anders-biostat.github.io/sleepwalk/

GSoC'2021 | TensorFlow implementation of Wav2Vec2

73 Nov 28, 2022

Shellcode antivirus evasion framework

Schrodinger's Cat Schrodinger'sCat is a Shellcode antivirus evasion framework Technical principle Please visit my blog https://idiotc4t.com/ How to us

27 Jul 09, 2022

A python script that will use hydra to get user and password to login to ssh, ftp, and telnet

Hydra-Auto-Hack A python script that will use hydra to get user and password to login to ssh, ftp, and telnet Project Description This python script w

2 Jan 16, 2022

An extension for asreview implements a version of the tf-idf feature extractor that saves the matrix and the vocabulary.

Extension - matrix and vocabulary extractor for TF-IDF and Doc2Vec An extension for ASReview that adds a tf-idf extractor that saves the matrix and th

4 Jun 17, 2022

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

wav2vec-toolkit A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models This repository accompanies the

29 Oct 23, 2022

PG-19 Language Modelling Benchmark

PG-19 Language Modelling Benchmark This repository contains the PG-19 language modeling benchmark. It includes a set of books extracted from the Proje

161 Oct 30, 2022

This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular intervals. It sends out the most recent news at random!

Nepali-news-notifier This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular in

1 Feb 11, 2022

A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository

We provide a notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository. The notebook also shows how to segment the corpus using BPE tokenizatio

9 Oct 13, 2022

🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk

1.8k Dec 21, 2022

Model for recasing and repunctuating ASR transcripts

Recasing and punctuation model based on Bert Benoit Favre 2021 This system converts a sequence of lowercase tokens without punctuation to a sequence o

88 Dec 29, 2022

File-based TF-IDF: Calculates keywords in a document, using a word corpus.

File-based TF-IDF Calculates keywords in a document, using a word corpus. Why? Because I found myself with hundreds of plain text files, with no way t

1 Feb 11, 2022

Text to speech is a process to convert any text into voice. A text-to-speech project takes words on digital devices and converts them into audio. Here I have used the Google Text-to-Speech library, popularly known as gTTS, to convert a text file to an .mp3 file. Hope you like my project!

Text to speech (using Python) Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and co

19 Jun 30, 2022

Opal-lang - A WIP programming language based on Python

thanks to aphitorite for the beautiful logo! opal opal is a WIP transcompiled pr

3 Nov 04, 2022

An end to end ASR Transformer model training repo

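END TO END ASR TRANSFORMER This project is an end-to-end speech recognition system built on the basic transformer structure of 6*encoder+6*decoder. Model Instructions 1. Data preparation: download the data yourself and follow the file structure below: ├── data │ ├── train │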

10 Jul 19, 2022

Transformers-regression - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates

Regression Free Model Update Code for the paper: Regression Bugs Are In Your Mod

2 Feb 17, 2022
