Bringing sanity to the world of messed-up data

Last update: Oct 26, 2021

Sanitize

sanitize is a Python module for making sure various things (e.g. HTML) are safe to use. It was originally written by Mark Pilgrim and is distributed under the BSD license.

Usage

>>> from sanitize import HTML
>>> HTML('<b>hello')
'<b>hello</b>'
>>> HTML('<img>')
'<img />'
>>> HTML('<b><b><b>hello')
'<b><b><b>hello</b></b></b>'
>>> HTML('<img src="foo"/')
''
>>> HTML('<input type="checkbox" checked>')
'<input type="checkbox" checked="checked" />'
>>> # dangerous tags (a small sample)
>>> HTML('safe<applet code="foo.class" codebase="http://example.com/"></applet> <b>description</b>')
'safe <b>description</b>'
>>> HTML('safe<frameset rows="*"><frame src="http://example.com/"></frameset> <b>description</b>')
'safe <b>description</b>'
>>> # bad protocols (a small sample)
>>> HTML('<a href="java' + chr(1) + 'script:foo">bar</a>')
'<a href="#foo">bar</a>'
>>> HTML('<a href="vbscript:foo">bar</a>')
'<a href="#foo">bar</a>'
>>>
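
The tag-closing and attribute-normalizing behaviour in the first few examples above can be approximated with the standard library alone. The following is a minimal sketch, not the module's actual implementation: `TagCloser`, `close_tags`, and the `VOID_TAGS` list are hypothetical names introduced for illustration, and the sketch does not remove dangerous tags or attributes, so it is not a sanitizer by itself.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial, illustrative list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagCloser(HTMLParser):
    """Re-emit markup and remember which elements are still open."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # A valueless attribute such as `checked` arrives with value None;
        # expand it to checked="checked". Values are re-escaped on output.
        rendered = "".join(
            ' %s="%s"' % (name, escape(name if value is None else value))
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append("<%s%s />" % (tag, rendered))
        else:
            self.out.append("<%s%s>" % (tag, rendered))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: drop it
        # Close everything up to and including the matching element.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        # Text was decoded by the parser, so escape it again.
        self.out.append(escape(data, quote=False))


def close_tags(markup):
    """Return markup with any elements left open closed in reverse order."""
    parser = TagCloser()
    parser.feed(markup)
    parser.close()
    closers = ["</%s>" % tag for tag in reversed(parser.open_tags)]
    return "".join(parser.out + closers)
```

Under these assumptions, `close_tags('<b>hello')` returns `'<b>hello</b>'`. Comments, declarations, and processing instructions are silently dropped because the sketch defines no handlers for them.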

For more usage examples, see tests/test_sanitize_html.py.
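
The bad-protocol examples in the usage session suggest that control characters are stripped from a URL before its scheme is checked, and that a disallowed scheme is replaced with `#`. Below is a minimal sketch of that idea using only the standard library. The function name `neutralize_url` and the `SAFE_SCHEMES` allow-list are assumptions made for illustration; the module's real rules are not shown in this README and may differ.

```python
import re

# Hypothetical allow-list; the module's real list is not documented here.
SAFE_SCHEMES = {"http", "https", "ftp", "mailto"}

# ASCII control characters, which can be used to disguise a scheme name.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

# A leading "scheme:" followed by the rest of the URL.
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)


def neutralize_url(url):
    """Return the URL if its scheme is allowed, else '#' plus the remainder."""
    cleaned = _CONTROL_CHARS.sub("", url).strip()
    match = _SCHEME.match(cleaned)
    if match is None:
        return cleaned  # relative URL: no scheme to check
    scheme, rest = match.groups()
    if scheme.lower() in SAFE_SCHEMES:
        return cleaned
    return "#" + rest
```

With this sketch, both `'java' + chr(1) + 'script:foo'` and `'vbscript:foo'` map to `'#foo'`, matching the outputs shown in the session, while `'http://example.com/'` passes through unchanged. An allow-list is used rather than a block-list so that schemes not anticipated by the author are rejected by default.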

Installation

python-sanitize is available on PyPI:

http://pypi.python.org/pypi/sanitize

Install it with pip:

$ pip install sanitize

Or with easy_install:

$ easy_install sanitize

Alternatively, clone python-sanitize's Git repository:

$ git clone git://github.com/Alir3z4/python-sanitize.git

Then install it by running:

$ python setup.py install

Tests

To run the unit tests:

$ python setup.py test

License

Sanitize is distributed under the BSD license.

Releases

Version 2014.10.7 - 2014-10-07

Feature: Add ChangeLog.rst file.

Feature: Add AUTHORS.rst file.

Feature: Add setup.cfg for wheel support.

Feature #2: Add travis-ci testing.

Feature #4: Using unittest for testing.

Feature #7: Add coveralls support.

Feature #8: Add MANIFEST.in file.

Feature #5: Better Readme and documentation.

Feature #1: Python packaging done right.

Feature #9: Change version numbering.

Owner

Alireza Savand
