Scan, index, and archive all of your paper documents

Last update: Jan 06, 2023

Overview


Important news about the future of this project

It's been more than 5 years since I started this project on a whim as an effort to try to get a handle on the massive amount of paper I was dealing with in relation to various visa applications (expat life is complicated!) Since then, the project has exploded in popularity, so much so that it overwhelmed me and working on it stopped being "fun" and started becoming a serious source of stress.

In an effort to fix this, I created the Paperless GitHub organisation, and brought on a few people to manage the issue and pull request load. Unfortunately, that model has proven to be unworkable too. With 23 pull requests waiting and 157 issues slowly filling up with confused/annoyed people wanting to get their contributions in, my whole "appoint a few strangers and hope they've got time" idea is showing my lack of foresight and organisational skill.

In the shadow of these difficulties, a fork called Paperless-ng written by Jonas Winkler has cropped up. It's really good, and unlike this project, it's actively maintained (at the time of this writing anyway). With 564 forks currently tracked by GitHub, I suspect there are a few more forks worth looking into out there as well.

So, with all of the above in mind, I've decided to archive this project as read-only and suggest that those interested in new updates or submitting patches have a look at Paperless-ng. If you really like "Old Paperless", that's ok too! The project is GPL licensed, so you can fork it and run it on whatever you like so long as you respect the terms of said license.

In time, I may transfer ownership of this organisation to Jonas if he's interested in taking that on, but for the moment, he's happy to run Paperless-ng out of its current repo. Regardless, if we do decide to make the transfer, I'll post a notification here a few months in advance so that people won't be surprised by new code at this location.

For my part, I'm really happy & proud to have been part of this project, and I'm sorry I've been unable to commit more time to it for everyone. I hope you all understand, and I'm really pleased that this work has been able to continue to live and be useful in a new project. Thank you to everyone who contributed, and for making Free software awesome.

Sincerely, Daniel Quinn

Index and archive all of your scanned paper documents

I hate paper. Environmental issues aside, it's a tech person's nightmare:

There's no search feature
It takes up physical space
Backups mean more paper

In the past few months I've been bitten more than a few times by the problem of not having the right document around. Sometimes I recycled a document I needed (who keeps water bills for two years?) and other times I just lost it... because paper. I wrote this to make my life easier.

How it Works

Paperless does not control your scanner; it only helps you deal with what your scanner produces.

Buy a document scanner that can write to a place on your network. If you need some inspiration, have a look at the scanner recommendations page.
Set it up to "scan to FTP" or something similar. It should be able to push scanned images to a server without you having to do anything. Of course if your scanner doesn't know how to automatically upload the file somewhere, you can always do that manually. Paperless doesn't care how the documents get into its local consumption directory.
Have the target server run the Paperless consumption script to OCR the file and index it into a local database.
Use the web frontend to sift through the database and find what you want.
Download the PDF you need/want via the web interface and do whatever you like with it. You can even print it and send it as if it's the original. In most cases, no one will care or notice.
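
The consumption step in the list above can be pictured as a small polling routine over the consumption directory. The following is only a sketch under stated assumptions, not Paperless's actual consumer: the function name `find_new_documents`, the set of accepted extensions, and the idea of remembering handled paths in a set are all hypothetical.

```python
import os
import tempfile

# Assumption: the file types this hypothetical consumer accepts.
ACCEPTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def find_new_documents(consume_dir, seen):
    """Return sorted paths in consume_dir that have not been handed out yet.

    `seen` is a set of already-returned paths; it is updated in place.
    """
    new_paths = []
    for name in sorted(os.listdir(consume_dir)):
        path = os.path.join(consume_dir, name)
        extension = os.path.splitext(name)[1].lower()
        if not os.path.isfile(path) or extension not in ACCEPTED_EXTENSIONS:
            continue  # skip directories and unsupported file types
        if path in seen:
            continue  # already handed out on an earlier pass
        seen.add(path)
        new_paths.append(path)
    return new_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as consume_dir:
        # Simulate a scanner dropping a file into the directory.
        open(os.path.join(consume_dir, "bill.pdf"), "wb").close()
        seen = set()
        print(find_new_documents(consume_dir, seen))  # first pass: the new PDF
        print(find_new_documents(consume_dir, seen))  # second pass: nothing new
```

Each returned path would then be passed on to the OCR and indexing step. How the files arrive (FTP, manual copy, or anything else) does not matter to a routine like this, which matches the point made above.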

Here's what you get:

Documentation

It's all available on ReadTheDocs.

Requirements

This is all really quite a simple, shiny, user-friendly wrapper around some very powerful tools.

ImageMagick converts the images between colour and greyscale.
Tesseract does the character recognition.
Unpaper despeckles and deskews the scanned image.
GNU Privacy Guard is used as the encryption backend.
Python 3 is the language of the project.
- Pillow loads the image data as a Python object to be used with PyOCR.
- PyOCR is a slick programmatic wrapper around Tesseract.
- Django is the framework this project is written against.
- Python-GNUPG decrypts the PDFs on-the-fly to allow you to download unencrypted files, leaving the encrypted ones on-disk.

Project Status

This project has been around since 2015, and there are lots of people using it. For some reason, it's really popular in Germany -- maybe someone over there can clue me in as to why?

I am no longer doing new development on Paperless as it does exactly what I need it to, and I have since turned my attention to my latest project, Aletheia. However, I'm not abandoning this project. I am happy to field pull requests and answer questions in the issue queue. If you're a developer yourself and want a new feature, float it in the issue queue and/or send me a pull request! I'm happy to add new stuff, but I just don't have the time to do that work myself.

Affiliated Projects

Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:

Paperless App: An Android/iOS app for Paperless.
Paperless Desktop: A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
ansible-role-paperless: An easy way to get Paperless running via Ansible.
paperless-cli: A golang command line binary to interact with a Paperless instance.

Similar Projects

There's another project out there called Mayan EDMS that has a surprising amount of technical overlap with Paperless. Also based on Django and using a consumer model with Tesseract and Unpaper, Mayan EDMS is much more featureful and comes with a slick UI as well, but it is still in Python 2. It may be that Paperless consumes fewer resources, but to be honest, this is just a guess as I haven't tested this myself. One thing's for certain though, Paperless is a way better name.

Important Note

Document scanners are typically used to scan sensitive documents: things like your social insurance number, tax records, invoices, etc. While Paperless encrypts the original files via the consumption script, the OCR'd text is not encrypted and is therefore stored in the clear (it needs to be searchable, so if someone has ideas on how to do that on encrypted data, I'm all ears). This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, run it locally on a server in your own home.

Donations

As with all Free software, the power is less in the finances and more in the collective efforts. I really appreciate every pull request and bug report offered up by Paperless' users, so please keep that stuff coming. If, however, you're not one for coding/design/documentation, and would like to contribute financially, I won't say no ;-)

The thing is, I'm doing ok for money, so I would instead ask you to donate to the United Nations High Commissioner for Refugees. They're doing important work and they need the money a lot more than I do.

Comments

Parse error of some documents (not all)

I get an error when consuming some documents. The log file contains the following message:

PARSE FAILURE for /home/paperless/consume/20181201132749Z - Vodafone_0.pdf: Convert failed at ('convert', '-scale', '500x5000', '-alpha', 'remove', '/home/paperless/consume/20181201132749Z - Vodafone_0.pdf[0]', '/tmp/paperless/paperless-kljps7vb/convert.png')
Consuming /home/paperless/consume/20181201132749Z - Vodafone_0.pdf
Parsers available: RasterisedDocumentParser

The status of the consumer process shows the following:

Dez 01 13:39:19 nuc python3[31976]: Consuming /home/paperless/consume/20181201132749Z-Vodafone_0.pdf
Dez 01 13:39:19 nuc python3[31976]: convert-im6.q16: FailedToExecuteCommand `'gs' -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 '-sDEVICE=pngalpha' -dTextAlphaBits=4 -dGraphicsAlphaBits=4 '-r72x72' -dFirstPage=1 -dLastPage=1 
Dez 01 13:39:19 nuc python3[31976]: convert-im6.q16: no images defined `/tmp/paperless/paperless-k3id431m/convert.png' @ error/convert.c/ConvertImageCommand/3258.
Dez 01 13:39:19 nuc python3[31976]: PARSE FAILURE for /home/paperless/consume/20181201132749Z-Vodafone_0.pdf: Convert failed at ('convert', '-scale', '500x5000', '-alpha', 'remove', '/home/paperless/consume/20181201132749Z-Vodafone_0.pdf[0]', '/tmp/paperless/paperless-k3id431m/convert.png')

I already tried reprinting the PDF file into another one, but that does not work either. Any suggestions?
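
One way to narrow this down is to run the same `convert` call outside of Paperless and read ImageMagick's own error output; the status log above already shows that its call to Ghostscript (`gs`) failed. The sketch below rebuilds the argument tuple exactly as it appears in the PARSE FAILURE line. The helper names `build_convert_command` and `run_convert` are hypothetical and are not part of Paperless.

```python
import subprocess


def build_convert_command(pdf_path, png_path):
    """Rebuild the argument tuple shown in the PARSE FAILURE log line."""
    return (
        "convert",
        "-scale", "500x5000",
        "-alpha", "remove",
        pdf_path + "[0]",  # "[0]" selects the first page of the PDF
        png_path,
    )


def run_convert(pdf_path, png_path):
    """Run the command and return (exit code, stderr text) for inspection."""
    result = subprocess.run(
        build_convert_command(pdf_path, png_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stderr
```

Calling `run_convert` with the path of the failing PDF and a scratch output path should surface the same ImageMagick message as in the log, without the consumer in between.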

opened by Ulli2k 34

Add Dockerfile for application and documentation

This commit adds a Dockerfile to the root of the project, accompanied by a docker-compose.yml for simplified deployment. The Dockerfile is agnostic to whether it will be the webserver, the consumer, or if it is run for a one-off command (e.g. creation of a superuser, migration of the database, document export, ...).

The container's entrypoint is the scripts/docker-entrypoint.sh script. This script verifies that the required permissions are set, remaps the default user's and/or group's ID if required, and installs additional languages if the user wishes.

After initialization, it analyzes the command the user supplied:

If the command starts with a slash, it is expected that the user wants to execute a binary file and the command will be executed without further intervention. (Using exec to effectively replace the started shell-script and not have any reaping-issues.)

If the command does not start with a slash, the command will be passed directly to the manage.py script without further modification. (Again using exec.)

The default command is set to --help.

If the user wants to execute a command that is not meant for manage.py but doesn't start with a slash, the Docker --entrypoint parameter can be used to circumvent the mechanics of docker-entrypoint.sh.
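
The dispatch rule described above is small enough to sketch. The real entrypoint is a shell script; the Python below only illustrates the decision logic, and the names `resolve_command`, `run`, and `MANAGE_PY` (including the bare `manage.py` path) are assumptions made for this example.

```python
import os

# Assumption: how the entrypoint refers to manage.py; the real path may differ.
MANAGE_PY = "manage.py"
# The default command described above.
DEFAULT_ARGS = ["--help"]


def resolve_command(args):
    """Return the argv that would be exec'ed for the user-supplied arguments."""
    args = list(args) if args else list(DEFAULT_ARGS)
    if args[0].startswith("/"):
        # Leading slash: treat it as a binary and run it untouched.
        return args
    # Anything else is handed to manage.py without modification.
    return [MANAGE_PY] + args


def run(args):
    """Replace the current process with the resolved command (like shell exec)."""
    argv = resolve_command(args)
    os.execvp(argv[0], argv)
```

For example, `resolve_command(["/bin/bash"])` leaves the command alone, while `resolve_command(["migrate"])` yields `["manage.py", "migrate"]`. Replacing the process rather than spawning a child is what avoids the reaping issues mentioned above.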

Further information can be found in docs/setup.rst and in docs/migrating.rst.

Some additional points:

Given the discussions in issue #2 and PR #28, this PR will probably supersede PR #28.

If you have skimmed through the migration-documentation you might have realized that I have left out how to restore data using Docker. I have actually written the corresponding documentation and implementation, but it requires a custom loaddata-command which can be found in this gist.

Right now, the license the gist is under is unclear. I have already asked for clarification and as soon as I get a response, I am going to update this PR accordingly.

I have marked this as work in progress. While I have tested everything (at least I think so...) I have documented, I am not comfortable merging this before there are a few responses.

One big point remaining to discuss is Docker Hub. In my eyes it is essential that the official Docker container will be available on the hub, and as up-to-date as possible.

Regarding integrity of the image, using Docker Hub's automated builds is also something I would see as a given. This leaves the issue of "namespacing" -- under what user or organization should this container live?

If we were to bind it to a user, it would have to be @danielquinn as far as I can tell since Docker Hub requires linking your GitHub account to make automated builds work. Additionally, having Docker Hub rebuild the image as soon as master gets updated requires this link as well. (I don't know if we would want this build to happen fully automatically, but rather only if we know building the Docker image will not fail. I have something brewing regarding Travis-CI that could possibly solve this.)

I have no experience with organizations on Docker Hub; maybe someone with more knowledge of them can help here?

Overall, there are a few open points, although I think the whole Docker Hub issue should be discussed separately and should not block this PR from being merged.
enhancement

opened by pitkley 33

Data loss when exporting documents with duplicate titles

The export script writes documents into a directory without checking for name collisions. This means that if I upload two documents with the same title, they will produce distinct entries in the database, but the export will clobber one of them.

I'm happy to tackle fixing this (I have quite a bit more experience with Python and Django than with Docker :-) ), but wanted to get your pre-approval for the approach I would take. I thought to add a timestamp prefix to the export, so that files come out looking like 2016-02-13T23:15:02Z - Title - tag,tag.pdf. It's not 100% watertight, but it seems pretty much good enough to me. It also implies the consumer has to recognise this format -- that has one nice side effect, namely that exporting docs and re-importing them will also preserve the timestamps at which they were originally imported.
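
A minimal sketch of the proposed naming scheme, assuming the document's creation time is available as a timezone-aware `datetime`. The function name `export_filename` and its parameters are hypothetical and not taken from the Paperless code base.

```python
from datetime import datetime, timezone


def export_filename(created, title, tags, extension="pdf"):
    """Build an export name such as '2016-02-13T23:15:02Z - Title - tag,tag.pdf'.

    `created` must be timezone-aware; it is normalised to UTC first.
    """
    stamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = [stamp, title]
    if tags:
        parts.append(",".join(tags))  # tags are comma-separated, no spaces
    return " - ".join(parts) + "." + extension


if __name__ == "__main__":
    created = datetime(2016, 2, 13, 23, 15, 2, tzinfo=timezone.utc)
    print(export_filename(created, "Title", ["tag", "tag"]))
```

Two documents with the same title and tags imported within the same second would still collide, which is the "not 100% watertight" caveat above. The timestamp also contains `:` characters.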

If you prefer a different approach, I'm happy to take it on if you can give me a sketch, or to brainstorm alternative ideas here. I actually need something like this, because one of my use cases for paperless is storing documents that come in at regular intervals and will always get the same name (e.g. the electricity bill).

[ ] Are filenames with .isoformat() datetimes scp-compatible? (There is some concern about the : character.)

bug enhancement help wanted

opened by tikitu 32

Server setup via Docker

It would be really cool to have a small Docker script that allows this project to be built and deployed behind an Apache or nginx server; that way a person could simply drop it onto a cheap host somewhere in the cloud and have the service accessible to them from everywhere...
enhancement help wanted

opened by gamesbook 25
Add PDF preview next to edit form

This adds a simple PDF preview next to the edit form. Adding the CSS to the page footer is a bit ugly. If anyone with better knowledge of Django knows of a cleaner way to do this, please let me know.

On a large enough screen, it looks like this:

If there is not enough space, the preview will be shown below the form:

Closes: #596

opened by bauerj 23
Consumer is not detecting files from 120kb?

Hello,

I'm just testing this nice application. However, I have imported a number of PDF files, and the consumer doesn't detect the larger ones. For example, I downloaded the Paperless documentation PDF, and it wasn't detected.

I am trying to scan a few documents with Android apps, but I can't import them. Is that because of the file size?

opened by MaartenMol 23

IntegrityError on deleting document

When I try to delete a document, I get the following error:

IntegrityError at /paperless/admin/documents/document/239/delete/
FOREIGN KEY constraint failed

Request Method: POST
Request URL: http://FQDN/paperless/admin/documents/document/239/delete/
Django Version: 2.0.8
Exception Type: IntegrityError
Exception Value: FOREIGN KEY constraint failed
Exception Location: /usr/lib/python3.6/site-packages/django/db/backends/sqlite3/base.py in execute, line 303
Python Executable: /usr/bin/python3
Python Version: 3.6.5
Python Path: ['/usr/src/paperless/src',
 '/usr/lib/python36.zip',
 '/usr/lib/python3.6',
 '/usr/lib/python3.6/lib-dynload',
 '/usr/lib/python3.6/site-packages']
Server time: Mon, 3 Sep 2018 21:58:47 +0000

with traceback:

Environment:


Request Method: POST
Request URL: http://FQDN/paperless/admin/documents/document/239/delete/

Django Version: 2.0.8
Python Version: 3.6.5
Installed Applications:
['django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'corsheaders',
 'django_extensions',
 'documents.apps.DocumentsConfig',
 'reminders.apps.RemindersConfig',
 'paperless_tesseract.apps.PaperlessTesseractConfig',
 'django.contrib.admin',
 'rest_framework',
 'crispy_forms',
 'django_filters']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'corsheaders.middleware.CorsMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'paperless.middleware.Middleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware']



Traceback:

File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py" in _execute
  85.                 return self.cursor.execute(sql, params)

File "/usr/lib/python3.6/site-packages/django/db/backends/sqlite3/base.py" in execute
  303.         return Database.Cursor.execute(self, query, params)

The above exception (FOREIGN KEY constraint failed) was the direct cause of the following exception:

File "/usr/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
  35.             response = get_response(request)

File "/usr/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  128.                 response = self.process_exception_by_middleware(e, request)

File "/usr/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  126.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/lib/python3.6/site-packages/django/contrib/admin/options.py" in wrapper
  575.                 return self.admin_site.admin_view(view)(*args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/utils/decorators.py" in _wrapped_view
  142.                     response = view_func(request, *args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
  44.         response = view_func(request, *args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/contrib/admin/sites.py" in inner
  223.             return view(request, *args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/utils/decorators.py" in _wrapper
  62.             return bound_func(*args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/utils/decorators.py" in _wrapped_view
  142.                     response = view_func(request, *args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/utils/decorators.py" in bound_func
  58.                 return func.__get__(self, type(self))(*args2, **kwargs2)

File "/usr/lib/python3.6/site-packages/django/contrib/admin/options.py" in delete_view
  1736.             return self._delete_view(request, object_id, extra_context)

File "/usr/lib/python3.6/site-packages/django/contrib/admin/options.py" in _delete_view
  1768.             self.log_deletion(request, obj, obj_display)

File "/usr/lib/python3.6/site-packages/django/contrib/admin/options.py" in log_deletion
  806.             action_flag=DELETION,

File "/usr/lib/python3.6/site-packages/django/contrib/admin/models.py" in log_action
  29.             change_message=change_message,

File "/usr/lib/python3.6/site-packages/django/db/models/manager.py" in manager_method
  82.                 return getattr(self.get_queryset(), name)(*args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/db/models/query.py" in create
  417.         obj.save(force_insert=True, using=self.db)

File "/usr/lib/python3.6/site-packages/django/db/models/base.py" in save
  729.                        force_update=force_update, update_fields=update_fields)

File "/usr/lib/python3.6/site-packages/django/db/models/base.py" in save_base
  759.             updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)

File "/usr/lib/python3.6/site-packages/django/db/models/base.py" in _save_table
  842.             result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)

File "/usr/lib/python3.6/site-packages/django/db/models/base.py" in _do_insert
  880.                                using=using, raw=raw)

File "/usr/lib/python3.6/site-packages/django/db/models/manager.py" in manager_method
  82.                 return getattr(self.get_queryset(), name)(*args, **kwargs)

File "/usr/lib/python3.6/site-packages/django/db/models/query.py" in _insert
  1125.         return query.get_compiler(using=using).execute_sql(return_id)

File "/usr/lib/python3.6/site-packages/django/db/models/sql/compiler.py" in execute_sql
  1285.                 cursor.execute(sql, params)

File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py" in execute
  100.             return super().execute(sql, params)

File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py" in execute
  68.         return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)

File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py" in _execute_with_wrappers
  77.         return executor(sql, params, many, context)

File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py" in _execute
  85.                 return self.cursor.execute(sql, params)

File "/usr/lib/python3.6/site-packages/django/db/utils.py" in __exit__
  89.                 raise dj_exc_value.with_traceback(traceback) from exc_value

File "/usr/lib/python3.6/site-packages/django/db/backends/utils.py" in _execute
  85.                 return self.cursor.execute(sql, params)

File "/usr/lib/python3.6/site-packages/django/db/backends/sqlite3/base.py" in execute
  303.         return Database.Cursor.execute(self, query, params)

Exception Type: IntegrityError at /paperless/admin/documents/document/239/delete/
Exception Value: FOREIGN KEY constraint failed

Let me know if you can reproduce or not or if you need more information. I am running the Docker container.
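
The traceback shows the error being raised while the admin writes its deletion log entry (`log_deletion`, then `log_action`, then an INSERT), so the message comes from SQLite rejecting a row whose foreign key points at nothing. The sketch below reproduces only that generic SQLite behaviour with the standard-library `sqlite3` module. The table names are hypothetical, and it does not claim to identify which reference is broken in this installation.

```python
import sqlite3


def make_connection():
    """Create an in-memory database with one parent row and FK checks on."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE log_entry ("
        " id INTEGER PRIMARY KEY,"
        " account_id INTEGER NOT NULL REFERENCES account (id))"
    )
    conn.execute("INSERT INTO account (id) VALUES (1)")
    return conn


def insert_log_entry(conn, account_id):
    """Insert a log row that references the given account."""
    conn.execute("INSERT INTO log_entry (account_id) VALUES (?)", (account_id,))


if __name__ == "__main__":
    conn = make_connection()
    insert_log_entry(conn, 1)  # account 1 exists, so this succeeds
    try:
        insert_log_entry(conn, 2)  # no account 2: the constraint is violated
    except sqlite3.IntegrityError as exc:
        print(exc)
```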

opened by kmlucy 22

Document Categories

Thanks for putting together this tool! After a bit of tooling around, I've got a bare metal install to play with. I was wondering what your thoughts are on creating a new property for Documents for a Category, separate from the tags. That way if you want to separate Documents into "Mail", "Receipts", "Statements", you can do that and perhaps even have different behaviors per Category.

An example of something that I personally would like to see is that for all of my Mail that I scan, I scan the envelope first, then open it up and scan the contents as subsequent pages. It would be cool for the "Mail" category to show a thumbnail of the first page of the PDF (in my case it would be the envelope). For the other Categories, maybe allow the user to upload their own images or whatnot to show for them so that at a glance, the Documents can be quickly gleaned as to which Category they belong to (a briefcase or a lock or whatnot).

If you're not opposed to the idea, I could potentially help implement it, time permitting.
enhancement

opened by nuudles 22

Trouble processing multi-page documents

Hi everyone,

First of all, kudos for the project, it's really interesting. I am giving it a try right now. The solution runs in a Raspberry Pi Docker environment. All was going fine so far: my first single-page PDFs were processed correctly. Then I dropped two 2-page recto-verso PDFs (so 4 sheets per doc) into the consumer_folder, plus a new single-page document for which I had created a specific 'correspondent' rule in the web app to see how the feature works. I ended up with an infinite loop over the first 2-page PDF:

  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0001.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0001.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0001.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0001.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0001.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0001.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0001.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational

Then I deleted it from the folder, and the consumer started to process the second 2-page PDF:

April 26, 2018, 10 a.m. | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0002.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611000_0002.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational
  | Starting document consumer at /consume | Informational

It was heading the same way, so I manually deleted that one from the consumer's folder too. Then it processed the single-page PDF successfully:

April 26, 2018, 10:05 a.m. | Document 20180329000000: SKM_C454e18042611340 consumption finished | Informational
  | Completed | Informational
  | Detected document date 2018-03-29T00:00:00+00:00 based on string 29/03/2018 | Informational
  | Parsing for fra | Informational
  | Parsing for eng | Informational
  | OCRing the document | Informational
  | Consuming /consume/SKM_C454e18042611340.pdf | Informational
  | Parsers available: RasterisedDocumentParser | Informational

I did not get an automatic correspondent match, because the literal expression I specified did not appear in the final OCR'ed output, but that is another topic.

My question is: is Paperless able to process multi-page PDFs, and if so, what could cause the loops I experienced with this type of document?

Many thanks,

opened by GarethFox 21

Hybridise PDFs with combined OCR'd text

OCRFeeder is an application built around various recognition backends. It does the page segmentation (finding sections containing text), feeds them to the OCR engine of choice, and provides export options like combined PDF (a PDF containing the plaintext below the original document), plaintext, ODT, and others.

It's written in Python, so it should be possible to integrate the segmentation and export code with Paperless.
enhancement help wanted

opened by janLo 21

Dockerfile not working

Hello,

I don't know if I'm missing something, but judging by the log output, I think there is a missing dependency:

consumer_1   | Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
consumer_1   | Get:2 http://security.debian.org jessie/updates/main amd64 Packages [373 kB]
consumer_1   | Ign http://httpredir.debian.org jessie InRelease
consumer_1   | Get:3 http://httpredir.debian.org jessie Release.gpg [2373 B]
consumer_1   | Get:4 http://httpredir.debian.org jessie Release [148 kB]
consumer_1   | Get:5 http://httpredir.debian.org jessie-updates InRelease [142 kB]
consumer_1   | Get:6 http://httpredir.debian.org jessie/main amd64 Packages [9032 kB]
consumer_1   | Get:7 http://httpredir.debian.org jessie-updates/main amd64 Packages [17.6 kB]
consumer_1   | Fetched 9778 kB in 6s (1442 kB/s)
consumer_1   | Reading package lists...
consumer_1   | dpkg-query: package 'tesseract-ocr-fra' is not installed and no information is available
consumer_1   | Use dpkg --info (= dpkg-deb --info) to examine archive files,
consumer_1   | and dpkg --contents (= dpkg-deb --contents) to list their contents.
consumer_1   | E: No packages found
consumer_1   | Starting document consumer at /consume
consumer_1   | Traceback (most recent call last):
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
consumer_1   |     return self.cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/sqlite3/base.py", line 323, in execute
consumer_1   |     return Database.Cursor.execute(self, query, params)
consumer_1   | sqlite3.OperationalError: no such table: documents_log
consumer_1   |
consumer_1   | The above exception was the direct cause of the following exception:
consumer_1   |
consumer_1   | Traceback (most recent call last):
consumer_1   |   File "/usr/src/paperless/src/manage.py", line 18, in <module>
consumer_1   |     execute_from_command_line(sys.argv)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
consumer_1   |     utility.execute()
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 345, in execute
consumer_1   |     self.fetch_command(subcommand).run_from_argv(self.argv)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 348, in run_from_argv
consumer_1   |     self.execute(*args, **cmd_options)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 399, in execute
consumer_1   |     output = self.handle(*args, **options)
consumer_1   |   File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 50, in handle
consumer_1   |     "Starting document consumer at {}".format(settings.CONSUMPTION_DIR)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1279, in info
consumer_1   |     self._log(INFO, msg, args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1415, in _log
consumer_1   |     self.handle(record)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1425, in handle
consumer_1   |     self.callHandlers(record)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1487, in callHandlers
consumer_1   |     hdlr.handle(record)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 855, in handle
consumer_1   |     self.emit(record)
consumer_1   |   File "/usr/src/paperless/src/documents/loggers.py", line 23, in emit
consumer_1   |     Log.objects.create(**kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/manager.py", line 122, in manager_method
consumer_1   |     return getattr(self.get_queryset(), name)(*args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 401, in create
consumer_1   |     obj.save(force_insert=True, using=self.db)
consumer_1   |   File "/usr/src/paperless/src/documents/models.py", line 260, in save
consumer_1   |     models.Model.save(self, *args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 708, in save
consumer_1   |     force_update=force_update, update_fields=update_fields)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 736, in save_base
consumer_1   |     updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 820, in _save_table
consumer_1   |     result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 859, in _do_insert
consumer_1   |     using=using, raw=raw)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/manager.py", line 122, in manager_method
consumer_1   |     return getattr(self.get_queryset(), name)(*args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 1039, in _insert
consumer_1   |     return query.get_compiler(using=using).execute_sql(return_id)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/sql/compiler.py", line 1060, in execute_sql
consumer_1   |     cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 79, in execute
consumer_1   |     return super(CursorDebugWrapper, self).execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
consumer_1   |     return self.cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/utils.py", line 95, in __exit__
consumer_1   |     six.reraise(dj_exc_type, dj_exc_value, traceback)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/utils/six.py", line 685, in reraise
consumer_1   |     raise value.with_traceback(tb)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
consumer_1   |     return self.cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/sqlite3/base.py", line 323, in execute
consumer_1   |     return Database.Cursor.execute(self, query, params)
consumer_1   | django.db.utils.OperationalError: no such table: documents_log
consumer_1   | Get:1 http://security.debian.org jessie/updates InRelease [63.1 kB]
consumer_1   | Get:2 http://httpredir.debian.org jessie-updates InRelease [142 kB]
consumer_1   | Get:3 http://security.debian.org jessie/updates/main amd64 Packages [373 kB]
consumer_1   | Ign http://httpredir.debian.org jessie InRelease
consumer_1   | Get:4 http://httpredir.debian.org jessie-updates/main amd64 Packages [17.6 kB]
consumer_1   | Get:5 http://httpredir.debian.org jessie Release.gpg [2373 B]
consumer_1   | Get:6 http://httpredir.debian.org jessie Release [148 kB]
consumer_1   | Get:7 http://httpredir.debian.org jessie/main amd64 Packages [9032 kB]
consumer_1   | Fetched 9778 kB in 2s (3820 kB/s)
consumer_1   | Reading package lists...
consumer_1   | dpkg-query: package 'tesseract-ocr-fre' is not installed and no information is available
consumer_1   | Use dpkg --info (= dpkg-deb --info) to examine archive files,
consumer_1   | and dpkg --contents (= dpkg-deb --contents) to list their contents.
consumer_1   | E: No packages found
consumer_1   | Starting document consumer at /consume
consumer_1   | Traceback (most recent call last):
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
consumer_1   |     return self.cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/sqlite3/base.py", line 323, in execute
consumer_1   |     return Database.Cursor.execute(self, query, params)
consumer_1   | sqlite3.OperationalError: no such table: documents_log
consumer_1   |
consumer_1   | The above exception was the direct cause of the following exception:
consumer_1   |
consumer_1   | Traceback (most recent call last):
consumer_1   |   File "/usr/src/paperless/src/manage.py", line 18, in <module>
consumer_1   |     execute_from_command_line(sys.argv)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
consumer_1   |     utility.execute()
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 345, in execute
consumer_1   |     self.fetch_command(subcommand).run_from_argv(self.argv)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 348, in run_from_argv
consumer_1   |     self.execute(*args, **cmd_options)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/core/management/base.py", line 399, in execute
consumer_1   |     output = self.handle(*args, **options)
consumer_1   |   File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 50, in handle
consumer_1   |     "Starting document consumer at {}".format(settings.CONSUMPTION_DIR)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1279, in info
consumer_1   |     self._log(INFO, msg, args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1415, in _log
consumer_1   |     self.handle(record)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1425, in handle
consumer_1   |     self.callHandlers(record)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 1487, in callHandlers
consumer_1   |     hdlr.handle(record)
consumer_1   |   File "/usr/local/lib/python3.5/logging/__init__.py", line 855, in handle
consumer_1   |     self.emit(record)
consumer_1   |   File "/usr/src/paperless/src/documents/loggers.py", line 23, in emit
consumer_1   |     Log.objects.create(**kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/manager.py", line 122, in manager_method
consumer_1   |     return getattr(self.get_queryset(), name)(*args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 401, in create
consumer_1   |     obj.save(force_insert=True, using=self.db)
consumer_1   |   File "/usr/src/paperless/src/documents/models.py", line 260, in save
consumer_1   |     models.Model.save(self, *args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 708, in save
consumer_1   |     force_update=force_update, update_fields=update_fields)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 736, in save_base
consumer_1   |     updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 820, in _save_table
consumer_1   |     result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/base.py", line 859, in _do_insert
consumer_1   |     using=using, raw=raw)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/manager.py", line 122, in manager_method
consumer_1   |     return getattr(self.get_queryset(), name)(*args, **kwargs)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 1039, in _insert
consumer_1   |     return query.get_compiler(using=using).execute_sql(return_id)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/models/sql/compiler.py", line 1060, in execute_sql
consumer_1   |     cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 79, in execute
consumer_1   |     return super(CursorDebugWrapper, self).execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
consumer_1   |     return self.cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/utils.py", line 95, in __exit__
consumer_1   |     six.reraise(dj_exc_type, dj_exc_value, traceback)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/utils/six.py", line 685, in reraise
consumer_1   |     raise value.with_traceback(tb)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
consumer_1   |     return self.cursor.execute(sql, params)
consumer_1   |   File "/usr/local/lib/python3.5/site-packages/django/db/backends/sqlite3/base.py", line 323, in execute
consumer_1   |     return Database.Cursor.execute(self, query, params)
consumer_1   | django.db.utils.OperationalError: no such table: documents_log

I will try to take a look this weekend to fix the Dockerfile.

opened by arckosfr 20

Paperless in Kubernetes with NFS Backing

Hello!

Love this project!

I'm attempting to run this in my home kubernetes cluster leveraging NFS shares for the backends. Unfortunately the service never starts because the docker-entrypoint.sh script attempts to chown / chmod files on an NFS share which isn't permitted.

Is anyone else experiencing this and does anyone have a workaround? Considering adding an env to the entrypoint script which would skip the chown / chmod process altogether.

opened by carpenike 0
ERROR Error while consuming document img_20180606_204601.893.jpg: Invalid rotation (0)

I'm moving all my documents to Paperless-NG but I come across quite a lot of these errors. I started missing documents and then I saw the errors are in the logs (so the feature to show the status in the frontend is a nice one). The error Error while consuming document img_20180606_204601.893.jpg: Invalid rotation (0) is there multiple times. What can I do so that Paperless handles this file?

Unfortunately I'm not comfortable sharing the file since there are personal details in it, but I still hope you are able to help.

opened by TruffelNL 1

Docker install: ERROR: for consumer Container "a713bc3650c5" is unhealthy.

Hi,

I'm trying to install Paperless with/in Docker (Docker version 20.10.2, build 2291f61, Ubuntu 20.04.1). I'm not very familiar with Docker, so I hope somebody can give me a hint.

docker-compose up -d runs into this error:

Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Creating paperless_webserver_1 ... done

ERROR: for consumer  Container "a713bc3650c5" is unhealthy.
ERROR: Encountered errors while bringing up the project.

docker-compose --verbose up -d:

...
compose.parallel.feed_queue: Healthcheck for service(s) upstream of <Service: consumer> failed - not processing
compose.parallel.parallel_execute_iter: Failed: <Service: consumer>
compose.parallel.feed_queue: Pending: set()

ERROR: for consumer  Container "a713bc3650c5" is unhealthy.
ERROR: compose.cli.main.exit_with_metrics: Encountered errors while bringing up the project.

docker-compose.yml:

version: '2.1'

services:
    webserver:
        build: ./
        # uncomment the following line to start automatically on system boot
        restart: always
        ports:
            # You can adapt the port you want Paperless to listen on by
            # modifying the part before the `:`.
            - "8008:8000"
        healthcheck:
            test: ["CMD", "curl" , "-f", "http://localhost:8000"]
            interval: 30s
            timeout: 10s
            retries: 5
        volumes:
            - /media/documents/store:/usr/src/paperless/data
            - /media/documents/media:/usr/src/paperless/media
            # You have to adapt the local path you want the consumption
            # directory to mount to by modifying the part before the ':'.
            - /media/documents/inbox:/consume
        env_file: docker-compose.env
        # The reason the line is here is so that the webserver that doesn't do
        # any text recognition and doesn't have to install unnecessary
        # languages the user might have set in the env-file by overwriting the
        # value with nothing.
        environment:
            - PAPERLESS_OCR_LANGUAGES=
        command: ["gunicorn", "-b", "0.0.0.0:8000"]

    consumer:
        build: ./
        # uncomment the following line to start automatically on system boot
        restart: always
        depends_on:
            webserver:
                condition: service_healthy
        volumes:
            - /media/documents/store:/usr/src/paperless/data
            - /media/documents/media:/usr/src/paperless/media
            # This should be set to the same value as the consume directory
            # in the webserver service above.
            - /media/documents/inbox:/consume
            # Likewise, you can add a local path to mount a directory for
            # exporting. This is not strictly needed for paperless to
            # function, only if you're exporting your files: uncomment
            # it and fill in a local path if you know you're going to
            # want to export your documents.
            - /media/documents/export:/export
        env_file: docker-compose.env
        command: ["document_consumer", "--no-inotify"]

volumes:
    data:
    media:

docker-compose.env:

# Environment variables to set for Paperless
# Commented out variables will be replaced with a default within Paperless.
#
# In addition to what you see here, you can also define any values you find in
# paperless.conf.example here.  Values like:
#
# * PAPERLESS_PASSPHRASE
# * PAPERLESS_CONSUMPTION_DIR
# * PAPERLESS_CONSUME_MAIL_HOST
#
# ...are all explained in that file but can be defined here, since the Docker
# installation doesn't make use of paperless.conf.
#
# NOTE: values in paperless.conf should be wrapped in double quotes, but not in this file
# Example:
# paperless.conf: PAPERLESS_FORGIVING_OCR="true"
# docker-compose.env (this file): PAPERLESS_FORGIVING_OCR=true

# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
TZ=Europe/Berlin

# Additional languages to install for text recognition.  Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# default language used when guessing the language from the OCR output.
PAPERLESS_OCR_LANGUAGES=deu

PAPERLESS_FORGIVING_OCR=true

# Set Paperless to use SSL for the web interface.
# Enabling this will require ssl.key and ssl.cert files in paperless' data directory.
# PAPERLESS_USE_SSL=false

# You can change the default user and group id to a custom one
#USERMAP_UID=1000
#USERMAP_GID=1000

The underlying directories are SMB-shares.

/etc/fstab:

//192.168.1.25/documents /media/documents cifs defaults,nofail,username=admin,password=*supersecure*,vers=1.0,x-systemd.automount,x-systemd.requires=network-online.target,gid=1000,uid=1000,rw,users 0 0

[email protected]:~/Docker/paperless/paperless$ ls -ll /media/documents/
total 0
drwxr-xr-x 2 tony tony 0 Mär  5  2020 archive
drwxr-xr-x 2 tony tony 0 Jan 22 18:53 export
drwxr-xr-x 2 tony tony 0 Jan 19 17:53 inbox
drwxr-xr-x 2 tony tony 0 Jan 22 18:23 media
drwxr-xr-x 2 tony tony 0 Jan 22 19:11 store
drwxr-xr-x 2 tony tony 0 Jan 22 18:38 tmp

[email protected]:~/Docker/paperless/paperless$ id
uid=1000(tony) gid=1000(tony) groups=1000(tony),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),120(lpadmin),131(lxd),132(sambashare),998(docker)

opened by ostpol 0

Disable DjangoQL 0.14 advanced search syntax by default.
DjangoQL 0.14 enables advanced search/completion by default. This patch disables it again, so paperless behaves the same regardless of the version used.

See

https://github.com/ivelum/djangoql/blob/master/CHANGES.rst#0140

https://github.com/ivelum/djangoql#using-djangoql-with-the-standard-django-admin-search

When using DjangoQL 0.13, this is simply a NOOP.
opened by grembo 2

Releases(2.7.0)

2.7.0(Jan 27, 2019)
@syntonym submitted a pull request to catch IMAP connection errors #475.

@sbrunner added psycopg2 to the Pipfile #489. He also fixed a syntax error in docker-compose.yml.example #488 and added DjangoQL, which allows a litany of handy search functionality #492.

@CkuT and @MasterofJOKers hacked out a simple, but super-helpful optimisation to how the thumbnails are served up, improving performance considerably #481.

@tsia added a few fields to the tags REST API. #483.

@cribbstechnologies improved the documentation to help people using Paperless over NFS #484.

@bmsleight updated the documentation to include a note for setting the DEBUG value. The paperless.conf.example file was also updated to mirror the project defaults.

2.6.0(Dec 1, 2018)
Allow an infinite number of logs to be deleted. Thanks to @Ulli2k for noting the problem in #433.

Fixed the RecentCorrespondentsFilter correspondents filter that was added in 2.4 to play nice with the defaults. Thanks to @tsia and @Sblop who pointed this out. #423.

Updated dependencies to include (among other things) a security patch to requests.

Fix text in sample data for tests so that the language guesser stops thinking that everything is in Catalan because we had Lorem ipsum in there.

Tweaked the gunicorn sample command to use filesystem paths instead of Python paths. #441

Added pretty colour boxes next to the hex values in the Tags section, thanks to a pull request from @jat255 #442.

Added a .editorconfig file to better specify coding style.

@jat255 also added some logic to tie Paperless' date guessing logic into how it parses file names on import. #440

2.5.0(Oct 7, 2018)
New dependency: Paperless now optimises thumbnail generation with optipng, so you'll need to install that somewhere in your PATH or declare its location in PAPERLESS_OPTIPNG_BINARY. The Docker image has already been updated on the Docker Hub, so you just need to pull the latest one from there if you're a Docker user.

"Login free" instances of Paperless were breaking whenever you tried to edit objects in the admin: adding/deleting tags or correspondents, or even fixing spelling. This was due to the "user hack" we were applying to sessions that weren't using a login, as that hack user didn't have a valid id. The fix was to attribute the first user id in the system to this hack user. #394

A problem in how we handle slug values on Tags and Correspondents required a few changes to how we handle this field #393:

Slugs are no longer editable. They're derived from the name of the tag or correspondent at save time, so if you wanna change the slug, you have to change the name, and even then you're restricted to the rules of the slugify() function. The slug value is still visible in the admin though.

I've added a migration to go over all existing tags & correspondents and rewrite the .slug values to ones conforming to the slugify() rules.

The consumption process now uses the same rules as .save() in determining a slug and using that to check for an existing tag/correspondent.
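
The slug rules above can be sketched roughly as follows. This is an illustration only, not the project's code: `simple_slugify` is a stdlib stand-in that approximates Django's `slugify()`, and `find_existing` is a hypothetical helper showing how a lookup by derived slug might work.

```python
import re
import unicodedata


def simple_slugify(name: str) -> str:
    """Approximate Django's slugify(): fold to ASCII, lowercase, hyphenate."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Drop anything that is not a word character, whitespace, or hyphen.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_name.lower())
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", cleaned).strip("-_")


def find_existing(name: str, known_slugs: set) -> bool:
    """Hypothetical lookup: compare the derived slug, not the raw name."""
    return simple_slugify(name) in known_slugs
```

Because the slug is always derived from the name, two spellings that slugify to the same value are treated as the same tag or correspondent in this sketch.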

An annoying bug in the date capture code was causing some bogus dates to be attached to documents, which in turn busted the UI. Thanks to @pengc99 for reporting this. #414

A bug in the Dockerfile meant that Tesseract language files weren't being installed correctly. @euri10 was quick to provide a fix: #406, #413.

Document consumption is now wrapped in a transaction as per an old ticket #262.

The get_date() functionality of the parsers has been consolidated onto the DocumentParser class since much of that code was redundant anyway.

2.4.0(Oct 7, 2018)
A new set of actions are now available thanks to @jonaswinkler's very first pull request! You can now do nifty things like tag documents in bulk, or set correspondents in bulk. #405

The import/export system is now a little smarter. By default, documents are tagged as unencrypted, since exports are by their nature unencrypted. It's now in the import step that we decide the storage type. This allows you to export from an encrypted system and import into an unencrypted one, or vice-versa.

The migration history has been slightly modified to accommodate PostgreSQL users. Additionally, you can now tell paperless to use PostgreSQL simply by declaring PAPERLESS_DBUSER in your environment. This will attempt to connect to your Postgres database without a password unless you also set PAPERLESS_DBPASS.
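
A minimal sketch of how such environment-driven database selection could look in a Django settings module. The function name, the database name `paperless`, and the SQLite file name are assumptions made for illustration; only `PAPERLESS_DBUSER` and `PAPERLESS_DBPASS` come from the note above.

```python
import os


def database_settings(env=None):
    """Build a Django-style DATABASES dict from environment variables."""
    env = os.environ if env is None else env
    user = env.get("PAPERLESS_DBUSER")
    if not user:
        # No database user declared: stay on SQLite (file name is assumed).
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": "db.sqlite3",
            }
        }
    config = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "paperless",  # assumed database name
        "USER": user,
    }
    password = env.get("PAPERLESS_DBPASS")
    if password:
        # Only send a password when one was explicitly set.
        config["PASSWORD"] = password
    return {"default": config}
```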

A bug was found in the REST API filter system that was the result of an update of django-filter some time ago. This has now been patched in #412. Thanks to @thepill for spotting it!

2.3.0(Sep 9, 2018)
Support for consuming plain text & markdown documents was added by @jat255! This was a long-requested feature, and its addition is likely to be greatly appreciated by the community: #395. Thanks also to @ddddavidmartin for his assistance on the issue.

@dubit0 found & fixed a bug that prevented management commands from running before we had an operational database: #396. @jat255 added a simple update to the thumbnail generation process to improve performance: #399.

As his last bit of effort on this release, @jat255 also added some code to allow you to view the documents inline rather than download them as an attachment. #400

Finally, @ahyear found a slip in the Docker documentation and patched it. #401.

2.2.1(Sep 2, 2018)
@kmlucy reported a bug quickly after the release of 2.2.0 where we broke the DISABLE_LOGIN feature.

2.2.0(Sep 2, 2018)
Thanks to @dadosch, @wmader, and @brookst, this is the first version of Paperless that supports Django 2.0! As a result of their hard work, you can now also run Paperless on Python 3.7: #386 & #390.

@sbrunner added a few lines of code that made the tagging interface a lot easier on those of us with lots of different tags: #391.

@kiliankoe noticed a bug in how we capture & automatically create tags, so that's fixed now too: #384.

2.1.0(Jul 8, 2018)
@elohmeier added three simple features that make Paperless a lot more user (and developer) friendly:

There's a new search box on the front page: #374.

The correspondents & tags pages now have a column showing the number of relevant documents: #375.

The Dockerfile has been tweaked to build faster for those of us who are doing active development on Paperless using the Docker environment: #376.

You now also have the ability to customise the interface to your heart's content by creating a file called overrides.css and/or overrides.js in the root of your media directory. Thanks to @SummittDweller for this idea: #371.

2.0.0(Jun 17, 2018)
This is a big release as we’ve changed a core-functionality of Paperless: we no longer encrypt files with GPG by default.

The reasons for this are many, but it boils down to that the encryption wasn’t really all that useful, as files on-disk were still accessible so long as you had the key, and the key was most typically stored in the config file. In other words, your files are only as safe as the paperless user is. In addition to that, the contents of the documents were never encrypted, so important numbers etc. were always accessible simply by querying the database. Still, it was better than nothing, but the consensus from users appears to be that it was more an annoyance than anything else, so this feature is now turned off unless you explicitly set a passphrase in your config file.

Migrating from 1.x

Encryption isn’t gone, it’s just off for new users. So long as you have PAPERLESS_PASSPHRASE set in your config or your environment, Paperless should continue to operate as it always has. If however, you want to drop encryption too, you only need to do two things:

Run ./manage.py migrate && ./manage.py change_storage_type gpg unencrypted. This will go through your entire database and Decrypt All The Things.

Remove PAPERLESS_PASSPHRASE from your paperless.conf file, or simply stop declaring it in your environment.

Special thanks to @erikarvstedt, @matthewmoto, and @mcronce who did the bulk of the work on this big change.

1.3.0(Feb 25, 2018)
You can now run Paperless without a login, though you'll still have to create at least one user. This is thanks to a pull-request from @matthewmoto: #295. Note that logins are still required by default, and that you need to disable them by setting PAPERLESS_DISABLE_LOGIN="true" in your environment or in /etc/paperless.conf.

Fix for #303 where sketchily-formatted documents could cause the consumer to break and insert half-records into the database breaking all sorts of things. We now capture the return codes of both convert and unpaper and fail-out nicely.
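
As a rough illustration of failing out cleanly on a bad return code, the sketch below wraps an external command and raises instead of continuing. `ConsumerError` and `run_step` are hypothetical names, not taken from the Paperless source, and the real arguments passed to convert and unpaper are not shown here.

```python
import subprocess
import sys


class ConsumerError(Exception):
    """Hypothetical error raised when an external tool exits non-zero."""


def run_step(command):
    """Run one external command and stop processing if it fails."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if result.returncode != 0:
        raise ConsumerError(
            "{} exited with code {}".format(command[0], result.returncode)
        )
    return result


if __name__ == "__main__":
    # Stand-in for a real tool invocation; succeeds with exit code 0.
    run_step([sys.executable, "-c", "pass"])
```

Raising before any database write is what keeps a half-processed document from leaving a partial record behind in this sketch.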

Fix for additional date types thanks to input from @isaacsando and code from @BastianPoe (#301).

Fix for running migrations in the Docker container (#299). Thanks to @TeraHz for the fix (#300) and to @pitkley for the review.

Fix for Docker cases where the issuing user is not UID 1000. This was a collaborative fix between @ChromoX and @pitkley in #311 and #312 to fix #306.

Patch the historical migrations to support MySQL's um, interesting way of handling indexes (#308). Thanks to @skuzzle for reporting the problem and helping me find where to fix it.

1.2.0(Feb 3, 2018)
New Docker image, now based on Alpine, thanks to the efforts of @addadi and @Pit. This new image is dramatically smaller than the Debian-based one, and it also has a new home on Docker Hub. A proper thank-you to @Pit for hosting the image on his Docker account all this time, but after some discussion, we decided the image needed a more official-looking home.

@BastianPoe has added the long-awaited feature to automatically skip the OCR step when the PDF already contains text. This can be overridden by setting PAPERLESS_OCR_ALWAYS=YES either in your paperless.conf or in the environment. Note that this also means that Paperless now requires libpoppler-cpp-dev to be installed. Important: You'll need to run pip install -r requirements.txt after the usual git pull to properly update.
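
The decision described above can be pictured with a small sketch. It assumes the text layer has already been extracted from the PDF by some other step (not shown), and `needs_ocr` is a hypothetical helper rather than the project's actual function.

```python
def needs_ocr(extracted_text: str, ocr_always: bool = False) -> bool:
    """Decide whether a PDF still needs OCR.

    extracted_text: whatever text was already found in the PDF.
    ocr_always: stands in for PAPERLESS_OCR_ALWAYS being enabled.
    """
    if ocr_always:
        # The override forces OCR even when text is present.
        return True
    # OCR is only needed when no usable text was found.
    return not extracted_text.strip()
```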

@BastianPoe has also contributed a monumental amount of work (#291) to solving #158: setting the document creation date based on finding a date in the document text.

1.1.0(Jan 21, 2018)
Fix for #283, a redirect bug which broke interactions with paperless-desktop. Thanks to @chris-aeviator for reporting it.

Addition of an optional new financial year filter, courtesy of @ddddavidmartin (#256)

Fixed a typo in how thumbnails were named in exports (#285), courtesy of @pzl

1.0.0(Jan 6, 2018)
Because by this point Paperless is stable enough to have a major version.

Changes in this release:

Upgrade to Django 1.11. You'll need to run pip install -r requirements.txt after the usual git pull to properly update.

Replace the templatetag-based hack we had for document listing in favour of a slightly less ugly solution in the form of another template tag with less copypasta.

Support for multi-word-matches for auto-tagging thanks to an excellent patch from @ishirav #277

Fixed a CSS bug reported by @xkpd3 that caused an overlapping of the text and checkboxes under some resolutions #272.

Patched the Docker config to force the serving of static files. Credit for this one goes to @dev-rke via #248.

Fix file permissions during Docker start up thanks to @pitkley on #268.

Date fields in the admin are now expressed as HTML5 date fields thanks to @Findus23's issue #278.

0.8.0(Sep 9, 2017)

Adds support for hosting Paperless in a directory other than the root. So instead of example.com/ you can now host it at example.com/paperless/ if you like. Thanks to @maphy-psd for the PR on this one.

0.7.0(Jul 15, 2017)
Potentially breaking change: As per #235, Paperless will no longer automatically delete documents attached to correspondents when those correspondents are themselves deleted. This was Django's default behaviour, but didn't make much sense in Paperless' case. Thanks to @thomasbrueggemann and @ddddavidmartin for their input on this one.

Fix for #232 wherein Paperless wasn't recognising .tif files properly. Thanks to @ayounggun for reporting this one and to @kskyten for posting the correct solution in the Github issue.

0.6.1(Jun 19, 2017)

Removes debugging info :-/

0.6.0(Jun 18, 2017)
Abandon the shared-secret trick we were using for the POST API in favour of BasicAuth or Django session.

Fix the POST API so it actually works. #236

Breaking change: We've dropped the use of PAPERLESS_SHARED_SECRET as it was being used both for the API (now replaced with a normal auth) and for email polling. Now that we're only using it for email, this variable has been renamed to PAPERLESS_EMAIL_SECRET. The old value will still work for a while, but you should change your config if you've been using the email polling feature. Thanks to @jmgilman for all the help with this feature.
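
One way to picture the transition period is a lookup that prefers the new name and falls back to the old one. This is a sketch under that assumption; `email_secret` is a hypothetical helper and the project's real fallback logic may differ.

```python
import os


def email_secret(env=None):
    """Return the email secret, preferring the new variable name."""
    env = os.environ if env is None else env
    # New name first; the old name is only consulted as a fallback.
    return env.get("PAPERLESS_EMAIL_SECRET") or env.get("PAPERLESS_SHARED_SECRET")
```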

0.5.0(May 27, 2017)
Support for fuzzy matching in the auto-tagger & auto-correspondent systems thanks to @jgysland's patch #220.

Modified the Dockerfile to prepare an export directory (#212). Thanks to combined efforts from @pitkley and @Strubbl in working out the kinks on this one.

Updated the import/export scripts to include support for thumbnails. Big thanks to @CkuT for finding this shortcoming and doing the work to get it fixed in #224.

All of the following changes are thanks to @ddddavidmartin:

Bumped the dependency on pyocr to 0.4.7 so new users can make use of Tesseract 4 if they so prefer (#226).

Fixed a number of issues with the automated mail handler (#227, #228)

Amended the documentation for better handling of systemd service files (#229)

Amended the Django Admin configuration to have nice headers (#230)

0.4.1(Mar 28, 2017)

Fix for #206 wherein the pluggable parser didn't recognise files with all-caps suffixes like .PDF.

0.4.0(Mar 25, 2017)

This release introduces reminders, a feature that's fully functional at the server level, but which is presently not very impressive in the UI. Basically you can now create, edit, update, list, and delete reminders either via the UI (the Django admin) or via the REST API. However, actually, you know, reminding you via some sort of notification system isn't in there yet.

However, this was a feature requested by people who are writing stuff that plugs into Paperless, like Paperless Desktop so maybe they'll be making use of this more than Paperless Core.

0.3.6(Mar 25, 2017)
Introduces pluggable consumers (#197)

Fixes a bug in the API that didn't allow for updating correspondents or tags (#200)

0.3.5(Feb 12, 2017)
This release is primarily for the new look on the documents listing page, but also includes a number of updates to dependency packages. If upgrading, remember to follow the following steps:

git pull

pip install -r requirements.txt

./manage.py migrate

Restart Paperless (however you're running it)

0.3.4(Jan 10, 2017)
Removal of django-suit due to a licensing conflict I bumped into in 0.3.3. Note that you can use Django Suit with Paperless, but only in a non-profit situation as their free license prohibits for-profit use. As a result, I can't bundle Suit with Paperless without conflicting with the GPL. Further development will be done against the stock Django admin.

I shrunk the thumbnails a little 'cause they were too big for me, even on my high-DPI monitor.

BasicAuth support for document and thumbnail downloads, as well as the Push API, thanks to @thomasbrueggemann. See #179.

0.3.3(Jan 8, 2017)
Thumbnails in the UI and a django-suit-based face-lift courtesy of @ekw!

Timezone, items per page, and default language are now all configurable, also thanks to @ekw.

0.3.2(Jan 3, 2017)
Fix for #172: defaulting ALLOWED_HOSTS to ["*"] and allowing the user to set her own value via PAPERLESS_ALLOWED_HOSTS should the need arise.
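For a rough idea of how such a default-with-override can work in a settings module, here is a minimal sketch. It assumes the environment variable holds a comma-separated list of host names; that format, and the helper name allowed_hosts, are assumptions for illustration rather than a description of the actual Paperless settings code.

```python
import os


def allowed_hosts(environ=os.environ):
    """Return the hosts from PAPERLESS_ALLOWED_HOSTS, or ["*"] if it is unset."""
    raw = environ.get("PAPERLESS_ALLOWED_HOSTS", "")
    # Assumption: hosts are separated by commas; surrounding whitespace is ignored.
    hosts = [host.strip() for host in raw.split(",") if host.strip()]
    # Fall back to the permissive default when nothing usable was provided.
    return hosts or ["*"]


ALLOWED_HOSTS = allowed_hosts()
```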

0.3.1(Jan 1, 2017)

Added a default value for CONVERT_BINARY
0.3.0(Jan 1, 2017)
Updated to using django-filter 1.x

Added some system checks so new users aren't confused by misconfigurations.

Consumer loop time is now configurable for systems with slow writes. Just set PAPERLESS_CONSUMER_LOOP_TIME to a number of seconds. The default is 10.
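To make the setting concrete, here is a small sketch of a polling loop that reads its delay from the environment. Only the variable name and the default of 10 come from the note above; consumer_loop_time, run_consumer, and the consume_once callback are hypothetical names, and the real consumer is not necessarily structured this way.

```python
import os
import time


def consumer_loop_time(environ=os.environ) -> int:
    """Return the configured loop time in seconds, defaulting to 10."""
    return int(environ.get("PAPERLESS_CONSUMER_LOOP_TIME", 10))


def run_consumer(consume_once, iterations, sleep=time.sleep, environ=os.environ):
    """Call consume_once a fixed number of times, pausing between passes."""
    delay = consumer_loop_time(environ)
    for _ in range(iterations):
        consume_once()
        # A longer delay gives slow writes time to finish before the next pass.
        sleep(delay)
```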

As per #44, we've removed support for PAPERLESS_CONVERT, PAPERLESS_CONSUME, and PAPERLESS_SECRET. Please use PAPERLESS_CONVERT_BINARY, PAPERLESS_CONSUMPTION_DIR, and PAPERLESS_SHARED_SECRET respectively instead.
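If you're not sure whether your environment still sets any of the removed names, a quick check along these lines may help. This is a sketch only: the old-to-new mapping is taken from the note above, while the function name find_removed_variables and the message wording are made up for this example.

```python
import os

# Removed variable name -> its replacement, as listed in the release note.
RENAMED = {
    "PAPERLESS_CONVERT": "PAPERLESS_CONVERT_BINARY",
    "PAPERLESS_CONSUME": "PAPERLESS_CONSUMPTION_DIR",
    "PAPERLESS_SECRET": "PAPERLESS_SHARED_SECRET",
}


def find_removed_variables(environ=os.environ):
    """Return one message per removed variable that is still set."""
    return [
        f"{old} is no longer supported; use {new} instead."
        for old, new in RENAMED.items()
        if old in environ
    ]


if __name__ == "__main__":
    for message in find_removed_variables():
        print(message)
```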


Owner

Paperless
