Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Last update: Dec 14, 2022

Overview

Little Ball of Fur is a graph sampling extension library for Python.

Please look at the Documentation, relevant Paper, Promo video and External Resources.

Little Ball of Fur consists of methods that can sample from graph structured data. To put it simply, it is a Swiss Army knife for graph sampling tasks. First, it includes a large variety of vertex, edge, and exploration sampling techniques. Second, it provides a unified public application interface, which makes applying the sampling algorithms straightforward for end users. The implemented methods come from a wide range of networking (Networking, INFOCOM, SIGCOMM) and data mining (KDD, TKDD, ICDE) conferences, workshops, and prominent journals.

Citing

If you find Little Ball of Fur useful in your research, please consider citing the following paper:

@inproceedings{littleballoffur,
               title={{Little Ball of Fur: A Python Library for Graph Sampling}},
               author={Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
               year={2020},
               pages = {3133–3140},
               booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)},
               organization={ACM},
}

A simple example

Little Ball of Fur makes using modern graph subsampling techniques quite easy (see here for the accompanying tutorial). For example, this is all it takes to use Diffusion Sampling on a Newman-Watts-Strogatz graph:

import networkx as nx
from littleballoffur import DiffusionSampler

graph = nx.newman_watts_strogatz_graph(1000, 20, 0.05)

sampler = DiffusionSampler()

new_graph = sampler.sample(graph)

Methods included

In detail, the following sampling methods were implemented.

Node Sampling

Degree Based Node Sampler from Adamic et al.: Search In Power-Law Networks (Physical Review E 2001)
Random Node Sampler from Stumpf et al.: SubNets of Scale-Free Networks Are Not Scale-Free: Sampling Properties of Networks (PNAS 2005)
PageRank Based Node Sampler from Leskovec et al.: Sampling From Large Graphs (KDD 2006)
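
To illustrate the idea behind degree based node sampling, here is a minimal pure-Python sketch. It is not the library's implementation: the function name `degree_based_node_sample`, its parameters, and the adjacency-dict graph representation are assumptions made for this example only.

```python
import random


def degree_based_node_sample(adjacency, number_of_nodes, seed=42):
    """Draw nodes with probability proportional to their degree and
    return the subgraph induced by the drawn nodes (hypothetical helper)."""
    rng = random.Random(seed)
    candidates = list(adjacency)
    weights = [len(adjacency[node]) for node in candidates]
    # Isolated nodes have weight 0 and can never be drawn, so cap the target.
    target = min(number_of_nodes, sum(1 for weight in weights if weight > 0))
    sampled = set()
    while len(sampled) < target:
        # Repeated draws of an already sampled node are simply discarded.
        node = rng.choices(candidates, weights=weights, k=1)[0]
        sampled.add(node)
    # Keep only the edges whose two endpoints were both sampled.
    return {node: {nb for nb in adjacency[node] if nb in sampled} for node in sampled}


# A small star-like graph with one isolated node (node 5).
graph = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}, 5: set()}
sample = degree_based_node_sample(graph, 3)
print(sorted(sample))
```

High-degree nodes such as node 0 are more likely to end up in the sample, which is the defining bias of this family of samplers.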

Edge Sampling

Random Edge Sampler from Krishnamurthy et al.: Reducing Large Internet Topologies for Faster Simulations (Networking 2005)
Random Node-Edge Sampler from Krishnamurthy et al.: Reducing Large Internet Topologies for Faster Simulations (Networking 2005)
Hybrid Node-Edge Sampler from Krishnamurthy et al.: Reducing Large Internet Topologies for Faster Simulations (Networking 2005)
Random Edge Sampler with Induction from Ahmed et al.: Network Sampling: From Static to Streaming Graphs (TKDD 2013)
Random Edge Sampler with Partial Induction from Ahmed et al.: Network Sampling: From Static to Streaming Graphs (TKDD 2013)
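
The general idea of random edge sampling followed by an induction step can be sketched as below. This is an illustrative simplification, not the code shipped with the library: the function name, the edge-list input, and the return format are assumptions for this example.

```python
import random


def random_edge_sample_with_induction(edges, number_of_edges, seed=42):
    """Pick edges uniformly at random, then add every original edge whose
    endpoints were both touched by a picked edge (hypothetical helper)."""
    rng = random.Random(seed)
    edges = [tuple(edge) for edge in edges]
    picked = rng.sample(edges, min(number_of_edges, len(edges)))
    # Nodes that appear in at least one picked edge.
    nodes = {node for edge in picked for node in edge}
    # Induction step: keep all original edges between the sampled nodes.
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, induced


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5)]
nodes, induced = random_edge_sample_with_induction(edges, 2)
print(sorted(nodes), induced)
```

The induction step means the result can contain more edges than were originally picked, since it restores edges between sampled nodes that the random draw missed.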

Exploration Based Sampling

Snowball Sampler from Goodman: Snowball Sampling (The Annals of Mathematical Statistics 1961)
Loop-Erased Random Walk Sampler from Wilson: Generating Random Spanning Trees More Quickly Than the Cover Time (STOC 1996)
Forest Fire Sampler from Leskovec et al.: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations (KDD 2005)
Random Node-Neighbor Sampler from Leskovec et al.: Sampling From Large Graphs (KDD 2006)
Random Walk With Restart Sampler from Leskovec et al.: Sampling From Large Graphs (KDD 2006)
Metropolis Hastings Random Walk Sampler from Hubler et al.: Metropolis Algorithms for Representative Subgraph Sampling (ICDM 2008)
Random Walk Sampler from Gjoka et al.: Walking in Facebook: A Case Study of Unbiased Sampling of OSNs (INFOCOM 2010)
Random Walk With Jump Sampler from Ribeiro et al.: Estimating and Sampling Graphs with Multidimensional Random Walks (SIGCOMM 2010)
Frontier Sampler from Ribeiro et al.: Estimating and Sampling Graphs with Multidimensional Random Walks (SIGCOMM 2010)
Community Structure Expansion Sampler from Maiya et al.: Sampling Community Structure (WWW 2010)
Non-Backtracking Random Walk Sampler from Lee et al.: Beyond Random Walk and Metropolis-Hastings Samplers: Why You Should Not Backtrack for Unbiased Graph Sampling (SIGMETRICS 2012)
Randomized Depth First Search Sampler from Doerr et al.: Metric Convergence in Social Network Sampling (HotPlanet 2013)
Randomized Breadth First Search Sampler from Doerr et al.: Metric Convergence in Social Network Sampling (HotPlanet 2013)
Rejection Constrained Metropolis Hastings Random Walk Sampler from Li et al.: On Random Walk Based Graph Sampling (ICDE 2015)
Circulated Neighbors Random Walk Sampler from Zhou et al.: Leveraging History for Faster Sampling of Online Social Networks (VLDB 2015)
Shortest Path Sampler from Rezvanian et al.: Sampling Social Networks Using Shortest Paths (Physica A 2015)
Diffusion Sampler from Rozemberczki et al.: Fast Sequence-Based Embedding with Diffusion Graphs (Complex Networks 2018)
Diffusion Tree Sampler from Rozemberczki et al.: Fast Sequence-Based Embedding with Diffusion Graphs (Complex Networks 2018)
Common Neighbor Aware Random Walk Sampler from Li et al.: Walking with Perception: Efficient Random Walk Sampling via Common Neighbor Awareness (ICDE 2019)
Spiky Ball Sampler from Ricaud et al.: Spikyball Sampling: Exploring Large Networks via an Inhomogeneous Filtered Diffusion (Algorithms 2020)
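
As a rough illustration of exploration based sampling, the sketch below performs a simple random walk until enough distinct nodes have been visited. It is a simplified stand-in rather than the library's sampler: the function name, the `max_steps` guard, and the adjacency-dict representation are assumptions, and the graph is assumed to be connected.

```python
import random


def random_walk_sample(adjacency, number_of_nodes, start_node=None, seed=42, max_steps=100_000):
    """Walk from neighbor to neighbor and return the subgraph induced by
    the visited nodes (hypothetical helper, assumes a connected graph)."""
    if number_of_nodes > len(adjacency):
        raise ValueError("cannot sample more nodes than the graph contains")
    rng = random.Random(seed)
    current = start_node if start_node is not None else rng.choice(sorted(adjacency))
    sampled = {current}
    # The step cap prevents an endless walk if the target cannot be reached.
    for _ in range(max_steps):
        if len(sampled) >= number_of_nodes:
            break
        current = rng.choice(sorted(adjacency[current]))
        sampled.add(current)
    return {node: adjacency[node] & sampled for node in sampled}


# A ring of 20 nodes; every node is connected to its two neighbors.
ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
walk_sample = random_walk_sample(ring, 5, start_node=0)
print(sorted(walk_sample))
```

Because the walk only ever moves along existing edges, the sampled nodes always form a connected subgraph, which is one reason exploration samplers are often stated for connected input graphs.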

Head over to our documentation to find out more about installation and data handling, a full list of implemented methods, and datasets. For a quick start, check out our examples.

If you notice anything unexpected, please open an issue and let us know. If you are missing a specific method, feel free to open a feature request. We are motivated to constantly make Little Ball of Fur even better.

Installation

Little Ball of Fur can be installed with the following pip command.

$ pip install littleballoffur

As we create new releases frequently, upgrading the package from time to time might be beneficial.

$ pip install littleballoffur --upgrade

Running examples

As part of the documentation we provide a number of use cases to show how to use various sampling techniques. These can be accessed here with detailed explanations.

Besides the case studies we provide synthetic examples for each model. These can be tried out by running the scripts in the examples folder. You can try out the random walk sampling example by running:

$ cd examples
$ python ./exploration_sampling/randomwalk_sampler.py

Running tests

$ python setup.py test

License

GNU General Public License v3.0

Comments

change initial num of nodes formula

to avoid having more initial nodes than the requested final number of nodes (when the final number of nodes requested is much smaller than the graph size).

opened by bricaud 7

Error install dependency networkit==7.1

I didn't manage to install littleballoffur because one of its dependencies seems outdated. Installing networkit==7.1 failed, although I did manage to install and run its latest version. However, littleballoffur runs on networkit==7.1.

I am using a Jupyter notebook as an environment and the following system specs: posix Darwin 21.4.0 3.8.12 (default, Mar 17 2022, 14:54:15) [Clang 13.0.0 (clang-1300.0.29.30)]

The specific error output:

Collecting networkit==7.1
  Using cached networkit-7.1.tar.gz (3.1 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [2 lines of output]
      ERROR: No suitable compiler found. Install any of these:  ['g++', 'g++-8', 'g++-7', 'g++-6.1', 'g++-6', 'g++-5.3', 'g++-5.2', 'g++-5.1', 'g++-5', 'g++-4.9', 'g++-4.8', 'clang++', 'clang++-3.8', 'clang++-3.7']
      If using AppleClang, OpenMP might be needed. Install with: 'brew install libomp'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Please note that: libomp 14.0.0 is already installed and up-to-date.

Is there some way I could install the library on networkit v10? Thanks a lot!

opened by CristinBSE 6

Node attributes are not copied from original graph

The Breadth First and Depth First Search samplers return subgraphs without the correct attributes on nodes/edges. In fact, I found that the dict containing those attributes is missing entirely from the sampled graph. Is this a known issue? Is the sampler supposed to work this way?

opened by jungla88 6

Why can't I use the graph imported by nx.read_edgelist()

graph = nx.read_edgelist("filename", nodetype=int, data=(("Weight", int),))

Error: AssertionError: Graph is not connected. Why? 'graph' is a NetworkX graph.

opened by DeathSentence 5
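
Regarding the "Graph is not connected" assertion above: an edge list read from a file can easily contain several disconnected pieces. One possible workaround is to sample only from the largest connected component. The sketch below shows the idea in pure Python on an adjacency dict; all function names are hypothetical helpers for this example. With NetworkX itself, `nx.is_connected`, `nx.connected_components`, and `graph.subgraph(nodes).copy()` cover the same steps.

```python
from collections import deque


def adjacency_from_edge_lines(lines):
    """Parse 'u v weight' lines into an undirected adjacency dict."""
    adjacency = {}
    for line in lines:
        u, v, _weight = line.split()
        u, v = int(u), int(v)
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def connected_components(adjacency):
    """Return a list of node sets, one per connected component (BFS)."""
    seen = set()
    components = []
    for source in adjacency:
        if source in seen:
            continue
        component = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components


def largest_component_subgraph(adjacency):
    """Keep only the nodes and edges of the largest connected component."""
    largest = max(connected_components(adjacency), key=len)
    return {node: set(adjacency[node]) & largest for node in largest}


lines = ["1 2 5", "2 3 1", "10 11 2"]  # two separate pieces
adjacency = adjacency_from_edge_lines(lines)
components = connected_components(adjacency)
largest = largest_component_subgraph(adjacency)
print(len(components), sorted(largest))
```

Note that the release notes further down mention that the connectivity checks were later cleaned up and removed from the backend, so newer versions may not raise this assertion at all.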

Spikyball exploration sampling

You might find the change a bit invasive (understandable :) This adds a new family of exploration sampling methods (spikyball) described in the paper Spikyball sampling: Exploring large networks via an inhomogeneous filtered diffusion, available at https://arxiv.org/abs/2010.11786 and submitted for publication in the Combinatorial Optimization, Graph, and Network Algorithms journal. The version number has been increased in order not to collide with official releases of lbof; you might want to change this...

opened by naspert 4

Assumptions on graph properties

Hi there,

I am wondering whether it would be possible to relax some of the constraints a graph has to satisfy before an exploration can start on it. In particular, the connectivity requirement seems a bit strong to me. I think a graph sampling procedure could easily deal with disconnected graphs: the sampling could take place on the individual connected components, or the exploration could rely on the neighborhood of the node currently being explored. This looks pretty natural to me for node sampling strategies like BFS and DFS, and also for Random Walk Sampling (the variant with a restart probability could be a little tricky). Something strange could probably happen for edge sampling if the graph is not connected. Do you see any possibility of extending Little Ball of Fur to such graphs? What led you to assume connectivity in the first place?

Thank you !

opened by jungla88 3

Error importing DiffusionSampler

Hello,

First of all, thank you for your great work building this library. Great extension to NetworkX.

I am facing an issue when trying to import the DiffusionSampler specifically. All the other samplers are imported just fine. However, importing the DiffusionSampler raises an error.

I am using a Jupyter notebook as an environment.

The specific error output:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-29-fbd222d9c756> in <module>
----> 1 from littleballoffur import DiffusionSampler
      2
      3
      4 model = DiffusionSampler()
      5 new_graph = model.sample(wd50k_connected_relabeled)

ImportError: cannot import name 'DiffusionSampler' from 'littleballoffur'

Is this replicable?

Thank you in advance for looking into it.

opened by DimitrisAlivas 3

ForestFireSampler throws exceptions for some seed values

Hi,

I am trying to sample an undirected, connected graph of 5559 nodes and 10804 edges down to a sample of 100 nodes. As I loop over the "creation of samples" part, I change the seed of the ForestFireSampler every time to obtain a different sample.

E.g.

seed_value = random.randint(1,2147483646)
sampler = ForestFireSampler(100, seed=seed_value)

However, for some runs an exception is thrown, and it is reproducible. I assume it is related to specific seed values that the sampler doesn't seem to be able to handle. An example is seed value 1176372277.

Traceback (most recent call last):
  File "/project/topology_extraction.py", line 472, in <module>
    abstraction_G = graph_sampling(S)
  File "/project/topology_extraction.py", line 234, in graph_sampling
    new_graph = sampler.sample(S)
  File "/usr/local/lib/python3.8/dist-packages/littleballoffur/exploration_sampling/forestfiresampler.py", line 74, in sample
    self._start_a_fire(graph)
  File "/usr/local/lib/python3.8/dist-packages/littleballoffur/exploration_sampling/forestfiresampler.py", line 47, in _start_a_fire
    top_node = node_queue.popleft()
IndexError: pop from an empty deque

Process finished with exit code 1

I believe this is a bug in the library.

Thanks! Nils

opened by nrodday 3

Error in forest fire sampling

Hi,

While running the forest fire sampling code, I got an error that it is trying to pop an element from an empty deque.

File "/opt/anaconda3/lib/python3.7/site-packages/littleballoffur/exploration_sampling/forestfiresampler.py", line 47, in _start_a_fire
    top_node = node_queue.popleft()
IndexError: pop from an empty deque

I am not sure whether this was caused by my data, whether the library needs an empty check or a try/except, or whether it should be handled by application code. Hence I opened an issue.

Thank you

opened by apurvamulay 2
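
The "pop from an empty deque" error in the two reports above is what happens when a forest fire burns out before the requested number of nodes has been reached. The following sketch shows one way to guard against that by re-igniting the fire at a random unburned node whenever the queue runs empty. It is a simplified illustration under stated assumptions, not the library's actual fix: the function name, the burning probability `p`, and the adjacency-dict representation are all hypothetical.

```python
import random
from collections import deque


def forest_fire_sample(adjacency, number_of_nodes, p=0.4, seed=42):
    """Return a set of burned nodes; never pops from an empty queue."""
    rng = random.Random(seed)
    target = min(number_of_nodes, len(adjacency))
    sampled = set()
    queue = deque()
    while len(sampled) < target:
        if not queue:
            # The fire died out: re-ignite at a random node not burned yet.
            start = rng.choice(sorted(set(adjacency) - sampled))
            sampled.add(start)
            queue.append(start)
            continue
        node = queue.popleft()
        unburned = sorted(set(adjacency[node]) - sampled)
        rng.shuffle(unburned)
        # Keep burning one more neighbor while a biased coin comes up heads.
        burn_count = 0
        while burn_count < len(unburned) and rng.random() < p:
            burn_count += 1
        for neighbor in unburned[:burn_count]:
            if len(sampled) >= target:
                break
            sampled.add(neighbor)
            queue.append(neighbor)
    return sampled


ring = {i: {(i - 1) % 50, (i + 1) % 50} for i in range(50)}
sizes = {len(forest_fire_sample(ring, 10, seed=s)) for s in range(200)}
disconnected = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: set()}
burned = forest_fire_sample(disconnected, 5)
print(sizes, sorted(burned))
```

With the guard in place, the sampler returns the requested number of nodes for every seed in this example, and it also works on a disconnected graph, at the cost of the sample possibly spanning several fire origins.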

Broken link in Readme (readthedocs)

https://littleballoffur.readthedocs.io/en/latest/notes/introduction.html

as of 2020-05-18 9:35 AM EDT, it says "sorry this page does not exist"

opened by bbrewington 1

Error in line 254 _checking_indexing() of backend.py

According to your code, an error is raised whenever numeric_indices != node_indices. In my scenario, I constructed a NetworkX graph whose node indices start from '1', and the sampler did not work. This error is triggered whenever the node indices of a NetworkX graph do not start from '0'. I had to adjust my graph so that the node indices start from '0' in order to use your samplers. I hope you can refine this part of the code so that others do not run into the same problem.

opened by Haoran-Young 0
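
A possible workaround for the indexing requirement described above is to relabel the nodes to consecutive integers starting at 0 before sampling and to keep the mapping so the original labels can be restored afterwards. The helper below is a hypothetical pure-Python sketch on an adjacency dict. In NetworkX, `nx.convert_node_labels_to_integers(graph, first_label=0)` performs the same relabeling.

```python
def relabel_to_consecutive_integers(adjacency):
    """Map node labels to 0..n-1 (in sorted order) and return both the
    relabeled adjacency dict and the old-to-new mapping."""
    mapping = {node: index for index, node in enumerate(sorted(adjacency))}
    relabeled = {
        mapping[node]: {mapping[neighbor] for neighbor in neighbors}
        for node, neighbors in adjacency.items()
    }
    return relabeled, mapping


# A path graph whose node indices start from 1 instead of 0.
one_based = {1: {2}, 2: {1, 3}, 3: {2}}
relabeled, mapping = relabel_to_consecutive_integers(one_based)
inverse = {new: old for old, new in mapping.items()}
print(relabeled, inverse)
```

The `inverse` dict can be applied to the sampled graph afterwards to translate the sampled nodes back to their original labels.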

Releases(v_20200)

v_20200(Aug 15, 2022)

Fixing dependencies: scipy, numpy, pandas, networkx.

Full Changelog: https://github.com/benedekrozemberczki/littleballoffur/compare/v2.1.12...v_20200

v2.1.12(Jan 22, 2022)

Fix pandas to be compliant with cython.

v_20011(Jan 5, 2022)

Connectivity checks are cleaned up everywhere.

v_20010(Jan 5, 2022)

Removed backend connection checks.

v_20009(Jan 5, 2022)

What's Changed

Flag for disconnected nodes.

change initial num of nodes formula by @bricaud in https://github.com/benedekrozemberczki/littleballoffur/pull/15

Update setup.py by @mamonu in https://github.com/benedekrozemberczki/littleballoffur/pull/16

New Contributors

@mamonu made their first contribution in https://github.com/benedekrozemberczki/littleballoffur/pull/16

Full Changelog: https://github.com/benedekrozemberczki/littleballoffur/compare/v_20008...v_20009

v_20008(Mar 17, 2021)

Docs failing for wrong import.

v_20007(Dec 4, 2020)

Added new tests and modifications.

v_20(Dec 4, 2020)

v_20005(Dec 2, 2020)

Added Diffusion Sampler Tree for Branching Process.

v_20004(Dec 2, 2020)

Add the new diffusion sampler with tests and references.

v_2(Dec 2, 2020)

v_20003(Nov 20, 2020)

Forest Fire Mods.

v_20002(Nov 13, 2020)

Fire halting.

v_20001(Sep 8, 2020)

Optional starting nodes for exploration sampling.

v_20000(Aug 26, 2020)

v_10004(Aug 9, 2020)

Added abstract backend design.

v_10003(Jul 18, 2020)

Sample method NetworkX Graph type output enforced.

v_10002(Jul 18, 2020)

Type hints are added for every class.

v_10001(Jul 5, 2020)

v_10000(Jun 9, 2020)

First general release.

100% test coverage.

Travis CI.

v_00009(Jun 4, 2020)

v_00008(May 28, 2020)

Fixed the exceptions.

v_00007(May 20, 2020)

SB

v_00006(May 20, 2020)

CNARW

NBTRW

BFS

DFS

LERW

v_00005(May 11, 2020)

Partial edge induction.

Common neighbor aware.

v_00004(May 10, 2020)

Community Structure Sampler

Circulated Neighbor Random Walk Sampler

Shortest Path Sampler

v_00003(May 9, 2020)

Frontier sampler fixed.

v_00002(May 7, 2020)

Includes basic methods.

Owner

Benedek Rozemberczki

Machine Learning Engineer at AstraZeneca and PhD candidate at The University of Edinburgh.

GitHub Repository https://little-ball-of-fur.readthedocs.io
