A library for converting HTML into PDFs using ReportLab

Overview

XHTML2PDF

The current release of xhtml2pdf is 0.2.5. See the Release Notes for details. As with all open-source software, its use in production depends on many factors, so be aware that you may find issues in some cases.

Big thanks to everyone who has worked on this project so far and to those who help maintain it.

About

xhtml2pdf is an HTML to PDF converter using Python, the ReportLab Toolkit, html5lib and PyPDF2. It supports HTML5 and CSS 2.1 (and some of CSS 3). It is written entirely in pure Python, so it is platform independent.

The main benefit of this tool is that a user with web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies.

Documentation

The documentation of xhtml2pdf is available at Read the Docs.

And we could use your help improving it! A good place to start is doc/source/usage.rst.

Installation

This is a typical Python library and can be installed using pip:

pip install xhtml2pdf

Requirements

Python 2.7+. Only Python 3.4+ is tested and guaranteed to work.

All additional requirements are listed in the requirements.txt file and are installed automatically using the pip install xhtml2pdf method.

Alternatives

You can try WeasyPrint. The codebase is pretty, it has different features and it does a lot of what xhtml2pdf does.

Call for testing

This project is heavily dependent on getting its test coverage up! Furthermore, parts of the codebase could do well with cleanups and refactoring.

If you benefit from xhtml2pdf, perhaps look at the test coverage and identify parts that are yet untouched.

Development environment

  1. If you don't have it, install pip, the Python package installer:

    sudo easy_install pip
    

    For more information about pip refer to http://www.pip-installer.org

  2. We recommend using virtualenv for development. It's great to have a separate environment for each project, keeping the dependencies for multiple projects separated:

    sudo pip install virtualenv
    

    For more information about virtualenv refer to http://www.virtualenv.org

  3. Create a virtualenv for the project. This can be inside the project directory, but must not be under version control:

    virtualenv --distribute xhtml2pdfenv --python=python2
    
  4. Activate your virtualenv:

    source xhtml2pdfenv/bin/activate
    

    Later to deactivate it use:

    deactivate
    
  5. The next step will be to install/upgrade dependencies from the requirements.txt file:

    pip install -r requirements.txt
    
  6. Run tests to check your configuration:

    nosetests --with-coverage
    

    You should have a log with the following success status:

    Ran 36 tests in 0.322s
    
    OK
    

Python integration

Some simple demos of how to integrate xhtml2pdf into a Python program may be found here: test/simple.py
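
As a rough illustration of the kind of integration mentioned above, the following sketch renders an HTML string to PDF bytes with pisa.CreatePDF. The helper name html_to_pdf_bytes and the sample markup are assumptions made for this example; they are not taken from test/simple.py.

```python
from io import BytesIO


def html_to_pdf_bytes(html: str) -> bytes:
    """Render an HTML string to PDF and return the raw bytes (illustrative helper)."""
    # Imported lazily so this module can be loaded even where xhtml2pdf is absent.
    from xhtml2pdf import pisa

    buffer = BytesIO()
    # CreatePDF returns a status object; a non-zero `err` signals conversion errors.
    status = pisa.CreatePDF(html, dest=buffer)
    if status.err:
        raise RuntimeError("xhtml2pdf reported %d error(s)" % status.err)
    return buffer.getvalue()
```

The returned bytes can then be written to a file opened in binary mode, for example with open("out.pdf", "wb").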

Running tests

Two different test suites are available to assert that xhtml2pdf works reliably:

  1. Unit tests. The unit testing framework is currently minimal, but is being improved on a regular basis (contributions welcome). They should run in the expected way for Python's unittest module, i.e.:

    nosetests --with-coverage (or your personal favorite)
    
  2. Functional tests. Thanks to mawe42's super cool work, a full functional test suite is available at testrender/.
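
For contributors who want to add coverage, a unit test in the unittest style might look like the sketch below. The class and test names are hypothetical and do not mirror the project's existing tests; the test is skipped when xhtml2pdf is not importable.

```python
import importlib.util
import unittest
from io import BytesIO

# True only when the xhtml2pdf package can be found on the import path.
HAS_XHTML2PDF = importlib.util.find_spec("xhtml2pdf") is not None


@unittest.skipUnless(HAS_XHTML2PDF, "xhtml2pdf is not installed")
class RenderSmokeTest(unittest.TestCase):
    """Hypothetical smoke test: a tiny document should render without errors."""

    def test_minimal_document_renders(self):
        from xhtml2pdf import pisa

        out = BytesIO()
        status = pisa.CreatePDF("<p>Hello</p>", dest=out)
        # `err` counts conversion errors; zero means the run was clean.
        self.assertFalse(status.err)
        # Every PDF file starts with the "%PDF" magic marker.
        self.assertTrue(out.getvalue().startswith(b"%PDF"))
```

A file containing this class can be run with python -m unittest, or picked up by nosetests like the existing tests.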

Contact

This project is community-led! Feel free to open up issues on GitHub about new ideas to improve xhtml2pdf.

History

These are the major milestones and the maintainers of the project:

  • 2000-2007, commercial project, spirito.de, written by Dirk Holtwick
  • 2007-2010 Dirk Holtwick (project named "Pisa", project released as GPL)
  • 2010-2012 Dirk Holtwick (project named "xhtml2pdf", changed license to Apache)
  • 2012-2015 Chris Glass (@chrisglass)
  • 2015-2016 Benjamin Bach (@benjaoming)
  • 2016-2018 Sam Spencer (@LegoStormtroopr)
  • 2018-Current Luis Zarate (@luisza)

For more history, see the CHANGELOG.txt file.

License

Copyright 2010 Dirk Holtwick, holtwick.it

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Issues
  • Problems with some Unicode characters

    Hi, I'm using the latest xhtml2pdf (0.2b1) & reportlab (3.4.0) through django-easy-pdf (0.1.0) on Python 3.6.0 and it's working great for the most part! One problem I am still experiencing, though, is that some Unicode characters are not rendering properly (šŠčČćĆđĐžŽ):

    [Screenshot 2017-03-29 at 16:38:36]

    I'm using the default django-easy-pdf base template and I found that I can somewhat repair things if I override it to declare the html encoding:

    {% block extra_style %}
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    {% endblock %}
    

    Which results in some characters being rendered correctly like Š and Ž, but not all of them (Č, Ć, Đ are still blacked out).

    [Screenshot 2017-03-29 at 16:38:19]

    I tried experimenting with different font declarations (sans-serif, serif, external fonts), but I can't seem to fix this. The characters are never rendered correctly. I don't know if I'm missing some xhtml2pdf / Reportlab setting here. Do you maybe have an idea of a possible solution?

    Fonts 
    opened by metakermit 48
  • black square box while generating pdf (unicode error)

    A weird problem: while generating a PDF, black square boxes appear in place of Unicode characters. I don't know if it's a Unicode or a font-face error. I don't even know whether I should use "font-face and font-family" to render Unicode in the PDF. Is there anything I am missing? Many thanks.

    Code snippet:

    # -*- coding: utf-8 -*-
    from io import BytesIO

    from xhtml2pdf import pisa

    source = """<html>
                <style>
                    @font-face {
                    font-family: Mangal;
                    src: url("mangal.ttf");
                    }

                    body {
                    font-family: Mangal;
                    }
                </style>
                <body>
                    This is a test <br/>
                           सरल
                </body>
            </html>"""

    # Utility function
    def convertHtmlToPdf(source):
        pdf = BytesIO()
        # Pass the HTML as a UTF-8 encoded byte stream
        pisaStatus = pisa.CreatePDF(BytesIO(source.encode("utf-8")), pdf)

        # pisaStatus.err is 0 on success and non-zero on errors
        print("Errors:", pisaStatus.err)
        return pdf

    # Main program
    if __name__ == "__main__":
        pisa.showLogging()
        pdf = convertHtmlToPdf(source)
        with open("test.pdf", "w+b") as fd:
            fd.write(pdf.getvalue())
    
    opened by beebek 31
  • Twitter-Bootstrap Causes Selector CSSParseError

    Twitter Bootstrap has some pretty gnarly CSS selectors that xhtml2pdf doesn't like.

    Result is:

    Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), dest=result, link_callback=fetch_resources )

    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaDocument
      encoding, context=context, xml_output=xml_output)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/document.py" in pisaStory
      pisaParser(src, context, default_css, xhtml, encoding, xml_output)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/parser.py" in pisaParser
      context.parseCSS()
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseCSS
      self.css = self.cssParser.parse(self.cssText)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
      src, stylesheet = self._parseStylesheet(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
      src, atResults = self._parseAtKeyword(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtKeyword
      src, result = self._parseAtImports(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseAtImports
      stylesheet = self.cssBuilder.atImport(import_, mediums, self)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/css.py" in atImport
      return cssParser.parseExternal(import_)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/context.py" in parseExternal
      result = self.parse(cssFile.getData())
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in parse
      src, stylesheet = self._parseStylesheet(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseStylesheet
      src, ruleset = self._parseRuleset(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseRuleset
      src, selectors = self._parseSelectorGroup(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorGroup
      src, selector = self._parseSelector(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelector
      src, selector = self._parseSimpleSelector(src)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSimpleSelector
      src, selector = self._parseSelectorPseudo(src, selector)
    File "/Library/Python/2.7/site-packages/xhtml2pdf-0.0.3-py2.7.egg/xhtml2pdf/w3c/cssParser.py" in _parseSelectorPseudo
      raise self.ParseError('Selector Pseudo Function closing \')\' not found', src, ctxsrc)

    Exception Type: CSSParseError at /p/pdf/gd8lx6xbl
    Exception Value: Selector Pseudo Function closing ')' not found:: (u':not(', u'[controls]) {\n disp')

    opened by Miserlou 22
  • Now broken with html5lib

    From https://pypi.python.org/pypi/html5lib/0.99999999:

    Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer, utils) to be underscore prefixed to clarify their status as private

    Except https://github.com/xhtml2pdf/xhtml2pdf/blob/master/xhtml2pdf/parser.py#L17:

    from html5lib import treebuilders, inputstream
    

    Current fix:

    • Use `pip install html5lib==1.0b8`
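
Besides pinning html5lib, code that needs the module could try both layouts. The sketch below is not the project's actual fix: load_inputstream is a hypothetical helper, and the private name _inputstream is assumed from the underscore-prefix change quoted above.

```python
import importlib


def load_inputstream():
    """Return html5lib's input stream module under either name, or None.

    Hypothetical helper: tries the underscore-prefixed (private) module
    first, then falls back to the older public module name.
    """
    for name in ("html5lib._inputstream", "html5lib.inputstream"):
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or html5lib itself) is not available under this name.
            continue
    return None
```

Relying on a private module remains fragile, so this kind of fallback is best treated as a stopgap.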
      
    opened by LegoStormtroopr 18
  • Python 3

    I made some changes so that the tests now run in both Python 2 and Python 3. Most of the changes I made were the same as those made by @wylee in #205.

    I also added a file to do Travis CI testing #202, and updated some of the dependencies.

    opened by JimInCO 18
  • Add optional pisaDocument argument to set metadata

    Without this the functionality of pisaDocument would need to be recreated in order to set metadata such as the document author.

    Usage is like so:

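    import io

    from xhtml2pdf.document import pisaDocument

    # Open the destination in binary mode: the generated PDF is written as bytes.
    with open(output_file, "wb") as dest:
        pisaDocument(src=io.StringIO(html), dest=dest, context_meta={
            "author": "MyCorp Ltd.",
            "title": "My Document Title",
            "subject": "My Document Subject",
            "keywords": "pdf,documents",
        })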
    
    opened by alistair-broomhead 16
  • Python2/Python3 compatibility

    So, I'm close, but for some reason on my install, images in the docs don't show up in Python 2 and are a little smaller in Python 3.

    I'm gonna fix this even if it kills me.

    Todo:

    • [x] Figure out how to render a transparent PDF as white (-flatten doesn't work for multipage PDFs)
    • [ ] Make the images the right size
    • [x] Clean up the string.join issues in reportlab_paragraph
    • [ ] Fix background for tr's
    opened by LegoStormtroopr 13
  • Unwanted Helvetica font

    No matter what font I use, Helvetica is always included and it's not embedded, so most printing companies cannot print the document because a font is missing.

    opened by maguayo 13
  • ZeroDivisionError: float division by zero

    Hi, I get this error while trying to parse an HTML containing the following piece of code. I'm using the latest versions of all packages needed:

    • html5lib-0.90
    • pyPdf-1.13
    • reportlab-2.5
    • xhtml2pdf-0.0.3

    and Python 2.7 (2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)])

    Python code:

    import cStringIO as StringIO
    from xhtml2pdf import pisa
    ....

    html = '''
    <TABLE BORDER="0" CELLPADDING="2" CELLSPACING="2">
    <TR>
    <TD></TD>
    </TR>
    </TABLE>
    '''
    dest = file('test.pdf', "wb")
    pdf = pisa.CreatePDF(
        StringIO.StringIO(html),
        dest,
        log_warn = 1,
        log_err = 1
        )

    Note: If I put something inside the TD (example: ".... <TD>... some stuff..... </TD>........") or I change the value of the attr cellpadding, it works!!!

    Traceback:

    Traceback (most recent call last):
      File "C:\tmp\test.py", line 95, in <module>
        log_err = 1
      File "C:\Python27\lib\site-packages\xhtml2pdf\document.py", line 131, in pisaDocument
        doc.build(context.story)
      File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 880, in build
        self.handle_flowable(flowables)
      File "C:\Python27\lib\site-packages\reportlab\platypus\doctemplate.py", line 763, in handle_flowable
        if frame.add(f, canv, trySplit=self.allowSplitting):
      File "C:\Python27\lib\site-packages\reportlab\platypus\frames.py", line 174, in _add
        flowable.drawOn(canv, self._x + self._leftExtraIndent, y, _sW=aW-w)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 108, in drawOn
        self._drawOn(canvas)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 89, in _drawOn
        self.draw()#this is the bit you overload
      File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1302, in draw
        self._drawCell(cellval, cellstyle, (colpos, rowpos), (colwidth, rowheight))
      File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 1393, in _drawCell
        w, h = self._listCellGeom(cellval,colwidth,cellstyle,W=W, H=H,aH=rowheight)
      File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 710, in _listCellGeom
        return Table._listCellGeom(self, V, w, s, W=W, H=H, aH=aH)
      File "C:\Python27\lib\site-packages\reportlab\platypus\tables.py", line 377, in _listCellGeom
        vw, vh = v.wrapOn(canv, aW, aH)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 119, in wrapOn
        w, h = self.wrap(aW,aH)
      File "C:\Python27\lib\site-packages\xhtml2pdf\xhtml2pdf_reportlab.py", line 693, in wrap
        return KeepInFrame.wrap(self, availWidth, availHeight)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 970, in wrap
        W, H = func(s1)
      File "C:\Python27\lib\site-packages\reportlab\platypus\flowables.py", line 951, in func
        W /= x
    ZeroDivisionError: float division by zero

    Thanks for your great job, Shen139

    opened by shen139 12
  • Release a new version

    I just upgraded my version with the master branch from github and it fixes a ton of issues in the current 0.0.5 release. Could you release a new version so we can just use pypi?
    Thanks for all the work on this :)

    opened by lzantal 11
  • make rtl languages from left

    for example Persian text must start from right but your result seems like this Farsi / Persian: .‫یم نم‬ ‫مروخب هشيش درد ساسحا ِنودب مناوت‬ also in PDF separate character

    correct Persian من می نوانم ...

    PS: maybe I didn't explain the problem well. For example, the correct word is "word" but your result is "dorw".

    opened by efazati 11
  • fix rtl languages paragraph sentences issue

    In a RTL language, the paragraphs lengths are reversed. In a normal paragraph the shortest sentence is the last sentence but this was the opposite in here.

    Before: [Screen Shot 2022-06-21 at 4:18:14 PM]
    After: [Screen Shot 2022-06-21 at 4:19:00 PM]

    opened by lailalelouch 0
  • <pdf:pagecount> is not working with 0.2.7

    Hi, I've updated my version of 'xhtml2pdf' from 0.2.6 to 0.2.7 with pip (so the dependencies should be up-to-date), and now the tag <pdf:pagecount> (or <pdf:pagecount />) makes all the content of the surrounding tag disappear. If <pdf:pagecount> is a direct child of <div id="footer_content">, all the footer disappears. If I add a surrounding <div> tag, only this last one disappears.

    Notes:

    • <pdf:pagenumber> (and <pdf:pagenumber />) still works fine.
    • I've "updated" to 0.2.6 and it works again (the versions of the other libs did not change when I rolled back).

    Have a good day!

    opened by genglert 1
  • border radius not working

    I'm trying to draw a circle but it doesn't work

    .badge {
        height: 100px;
        width: 100px;
        display: table-cell;
        text-align: center;
        vertical-align: middle;
        border-radius: 50%; /* may require vendor prefixes */
        background: yellow;
    }

    opened by pedro-jmanuel 1
  • Are emojis supported ? Black squares are appearing

    Hello,

    I am using version 0.2.6 and the command-line tool with a simple HTML snippet (see below). The snippet contains an emoji and its equivalent HTML entity, and the HTML file is encoded as UTF-8. The browser has no issue displaying the emojis: [image]

    But when I use the tool to convert, the resulting PDF has black boxes for both characters: [image]

    I tried to set a font-family with some of the defaults indicated in the docs (I even tried an Asian font), but no luck. Am I doing it wrong, or is there no support currently?

    (Btw I have the same issue via python so that's why I tried with the command tool)

    Thanks

    <!doctype html>
    <html>
    <head>
      <meta charset="utf-8">
    </head>
    <body>
      <br>AAA
      😀
      <br>BBB
      &#128512;
    </body>
    </html>
    

    opened by ThaNico 0
  • Linking to in-document anchors are not fully supported?

    I have a link in my HTML document like this: <a href="#target">Link</a>, where the target looks like this: <h3 id="target">Header title </h3>. The link does not work in the generated PDF.

    Is this not supported, or is it a bug? I tried other tools to convert the HTML (e.g. pandoc), and there seems to be no issue there.

    opened by errmac11 0
  • background-image renders on top of other content

    I use a background-image on the @page. In the past this has worked fine, but as of xhtml2pdf 0.2.7, the image now renders on top of all the other content, and because my image is fully opaque, that means none of the content is visible.

    For what it's worth, I think that a CSS property named "background-image" should render the image in ... well, in the background :)

    Reverting to 0.2.6 solves the problem for me.

    opened by direvus 4
Releases(v0.2.8)
  • v0.2.8(Jun 16, 2022)

    🐛 Bug-Fixes

    • Fix background-image issues with #614 and pull requests with #619
    • Fix CSSParseError for minified @font-face definitions #609
    • Fixed a few typos and grammar mistakes in usage.rst documentation. #610
  • v0.2.7(Mar 31, 2022)

    🎉 New

    • Add encryption and password protection
    • New WaterMark management system with new options
    • Add Graphic builder
    • Add signing pdfs (simple and pades)

    🐛 Bug-Fixes

    • Remove import cycle between utils and default
    • Fixed link_callback construction of path
    • Fixed path when is relative to current path

    ⚠️ Deprecation

    • xhtml support in pisa.CreatePDF will be removed in the next release
    • XML2PDF and XHTML2PDF will be removed in the next release; use HTML2PDF instead

    📘 Documentation

    • Add render pdf on documentation and add some html example.
    • Include graphics examples

    Thanks to the following people on GitHub for contributing to this release: @marcelagz for graphics support :)

  • v0.2.6(Mar 11, 2022)

    • Drop python 2 support.
    • Remove most of python 2 code and cleanup
    • Update packages dependencies
    • Remove six dependency and update Readme
    • Set timeout in https options
    • Add new file manager approach using factory method, now new classes deal with different types of data B64InlineURI, LocalProtocolURI, NetworkFileUri, LocalFileURI, BytesFileUri
    • getColor now returns None when None is passed, ignoring the default value, but returns the default if bool(data) == False
    • rtl languages reversed lines added as a ParaFrag (note: not fully supported yet)
    • Check if Paragraph has 'rtl' attribute (note: not fully supported yet)
    • Fix UnboundLocalError in reportlab_paragraph (#585) (#586)
    • Remove usage of getStringIO (#590), which was removed from reportlab
    • Change test for github workflow using only Linux
    • Add Python 3.9, 3.10
    • Switch from PyPDF2 to PyPDF3
    • Add SVG support
    • Update package information.
    • Allow call tests using make.
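
The getColor change listed above can be read as the following rule. This is only a sketch of the described behavior: get_color_sketch is a hypothetical stand-in, and unlike the library's function it returns the value unchanged instead of converting it to a color object.

```python
def get_color_sketch(value, default=None):
    """Hypothetical stand-in for the None/default rule in the 0.2.6 notes."""
    if value is None:
        # An explicit None is passed through; the default is ignored.
        return None
    if not value:
        # Other falsy input (for example an empty string) yields the default.
        return default
    # Assumption: the real function would parse `value` into a color here.
    return value
```

Under this reading, get_color_sketch(None, "black") gives None, while get_color_sketch("", "black") gives "black".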
  • 0.2.4(Jan 21, 2020)

    • Update link_callback documentation.
    • Stylize code lines in documentation.
    • Fixed cgi escape util on setup version.
    • Add test to python 3.7 and 3.8.
    • Fixed width assignation on fragments.
    • Support urllib in python 3 and python 2.
    • Add em unit support.
    • Repair base64 unescaped string.
    • Fixed urlparse when URLs have parameters.
    • Fixed i_rgbcolor support.

  • 0.2.2(Apr 17, 2018)

  • 0.2.1(Feb 15, 2018)

    This new release has a lot of improvements in python 3 and demos.

    Version 0.2.1

    • Improve python3 support - thanks luisza, andreyfedoseev and flupzor
    • Include new Httplibs options - thanks luisza
    • Support to background image - thanks flupzor
    • Remove python23 support - thanks flupzor
    • Transparent images work again in Python 3 - thanks flupzor
    • Readthedocs integration - thanks luisza
    • Update Django demo site - thanks luisza
    • PEP8 and cleanup code - thanks luisza
    • Drop the turbogears module - thanks browniebroke
  • 0.1b2(Aug 1, 2016)

  • 0.1b1(Jun 5, 2016)

    This release is possibly the final release ever of xhtml2pdf, except if someone takes over maintainership. It has Python 3 support, but there are certain bugs also that you can read about in the ~37 unclosed issues.

  • 0.1a4(May 18, 2016)

    Version 0.1alpha4

    • Removed PyPy support
    • Avoid exceptions likely to occur systematic to how narrow a text column is #309 - thanks jkDesignDE
    • Improved tests for tables #305 - thanks taddeimania
    • Fix broken empty PDFs in Python2 #301 - thanks citizen-stig
    • Unknown page sizes now raise an exception #71 - thanks benjaoming
    • Unorderable types caused by duplicate CSS selectors / rules #69 - thanks benjaoming
    • Allow empty page definition with no space after @page - #88 - thanks benjaoming
    • Error when in addFromFile using file-like object #245 - thanks benjaoming
    • Python 3: Bad table formatting with empty columns #279 - thanks citizen-stig and benjaoming
    • Removed paragraph2.py, unused ghost file since the beginning of the project #289 - thanks citizen-stig
    • Catch-all exceptions removed in a lot of places, not quite done #290 - thanks benjaoming