Paperless-ngx + Local AI (Optional): Better OCR, Self-Hosted, No Cloud
Paperless NGX is a self-hosted document inbox that lets you drop in PDF scans, automatically
runs OCR, organizes with tags and correspondents, and then makes the whole thing searchable so
when you need a specific page, you're not digging through folders like it's 2006.
And because it's self-hosted, it puts you back in control of your data.
Your documents stay on your hardware under your rules.
No uploading personal paperwork into ChatGPT or some random cloud service just to get indexed.
In this video, we're doing a full repeatable setup with Docker. We'll get Paperless up and
running first, and then we'll optionally add some local AI with Ollama and show what's actually
worth using once everything's set up. Also, there's a full step-by-step guide linked in the description,
including the exact compose files and .env files we're going to use today.
So what are we setting up then? We're setting up Paperless NGX for document storage with some
baseline OCR. We're going to use Ollama as our local AI engine. We'll then use paperless-ai for
tagging, titles, metadata, and suggestions. Then we'll bring in paperless-gpt for a vision model
OCR, and this is a huge upgrade for OCR accuracy. I know that was a lot of information, but think of
it like this. Paperless NGX is the filing cabinet. It stores, indexes, and searches. Ollama is the
local brain; it runs our models. paperless-ai and paperless-gpt are add-ons that plug into
Paperless to improve metadata and, in the case of paperless-gpt, OCR as well.
And just to be clear, all of the AI is optional. Paperless NGX works perfectly fine on its own.
If you just want local document management and basic OCR, you can stop there and you'll have a
great setup. Alright, so let's jump into the Compose stack. Here's my Paperless stack,
and I'm using it with Postgres instead of SQLite.
You might be wondering why I'm using Postgres.
Well, it scales a little bit better
if you start building a really big library.
You'll notice that Paperless is also depending
on a few other services in this stack.
One of them is Redis, an in-memory database.
Another one is Gotenberg, which is new to me,
but this helps convert documents to PDFs.
The next is Tika, which extracts metadata
from those documents that you want to bring in to Paperless.
Then we have Ollama, which helps manage our local LLMs.
Then we have OpenWebUI, which gives us a web interface for managing Ollama and chatting with our local LLMs.
Then we have two of our AI services for paperless.
We have paperless-ai.
And then we have paperless-gpt.
Last but not least is Dozzle.
I usually include this in all of my stacks; it's a web UI to see all of your container
logs, which is super helpful when troubleshooting.
You can see I've exposed a couple of ports on here.
We have Paperless NGX on 8000, paperless-ai on 3000, OpenWebUI
on 3001, paperless-gpt on 3002, and then Dozzle running on port 8080.
Also, if you couldn't tell already, I have a .env per service so that not all services have
access to all of the secrets.
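For reference, here's a trimmed sketch of the kind of compose file we're talking about. The image tags, volume names, and internal ports are from my own setup and may drift from the exact files linked in the description, so treat this as the shape, not a drop-in:

```yaml
services:
  broker:
    image: redis:7
  db:
    image: postgres:16
    env_file: paperless.env          # POSTGRES_DB / POSTGRES_USER / POSTGRES_PASSWORD
    volumes: ["pgdata:/var/lib/postgresql/data"]
  gotenberg:
    image: gotenberg/gotenberg:8     # document-to-PDF conversion
  tika:
    image: apache/tika:latest        # metadata extraction
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    depends_on: [db, broker, gotenberg, tika]
    ports: ["8000:8000"]
    env_file: paperless.env          # PAPERLESS_DBHOST, PAPERLESS_REDIS, etc.
    volumes:
      - "data:/usr/src/paperless/data"
      - "media:/usr/src/paperless/media"
  ollama:
    image: ollama/ollama:latest
    volumes: ["ollama:/root/.ollama"]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3001:8080"]
    env_file: open-webui.env         # e.g. OLLAMA_BASE_URL=http://ollama:11434
  paperless-ai:
    image: clusterzx/paperless-ai:latest
    ports: ["3000:3000"]
    env_file: paperless-ai.env
  paperless-gpt:
    image: icereed/paperless-gpt:latest
    ports: ["3002:8080"]
    env_file: paperless-gpt.env
  dozzle:
    image: amir20/dozzle:latest
    ports: ["8080:8080"]
    volumes: ["/var/run/docker.sock:/var/run/docker.sock:ro"]

volumes:
  pgdata:
  data:
  media:
  ollama:
```

Each service gets its own env_file, which is what keeps any one container from seeing every secret in the stack.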
To bring the stack up, all we need to do is run docker compose up -d.
To see if all the containers are running, you can run docker ps and check there, or you
can go out to Dozzle and see that all of your containers are running.
So now we want to go into OpenWebUI.
So once we get to OpenWebUI, we'll want to get started and then we'll need to create
an account.
So let's create our admin account.
So to download a model, go to your profile, then go to admin panel, then go to settings,
and then go to models.
Here we want to pull a model from Ollama.com.
I found that Llama 3.2 3B works pretty well, and it's a good starting point.
So let's pull that one down.
Alright, so we have that model pulled down.
Let's test it really quick with chat just to confirm it responds.
Tell me a joke.
Okay, so we see a response.
That's a good sign.
That means the model is loaded and working.
If you ever want to do a sanity check on your GPU and make sure that your server is using
it, you can remote into your server and run nvtop.
And if you go back to chat and have it output some more information, other than my UPS going
off because it's drawing a lot of power, you can see that it's actually using my GPU here.
Now let's check on Paperless NGX.
So when we get to the initial startup screen, we'll have to create an admin account and
then we'll sign in.
And congratulations, we now have Paperless NGX running.
We'll come back to Paperless here in a second, but let's set up paperless-ai.
And in order to set up paperless-ai, we need an API token from Paperless first.
So to get our token from Paperless NGX, we can go into our profile.
And if we click into here and we generate one, we can get one right here.
Now let's copy this API token and let's paste it into our paperless-ai .env file right into
the Paperless API token.
You also want to update your username, mine's admin.
And then while we're here, you just want to make sure the model name here matches the one
you pulled down in Ollama.
I used llama3.2:3b.
And the rest of this outside of time zone, you can leave the way it is.
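For reference, the relevant part of my paperless-ai .env looks roughly like this. The variable names are how I understand the project's configuration, so double-check them against the paperless-ai README, and the token value is of course a placeholder:

```env
PAPERLESS_API_URL=http://paperless:8000/api   # "paperless" is the compose service name
PAPERLESS_API_TOKEN=<token-from-your-paperless-profile>
PAPERLESS_USERNAME=admin
AI_PROVIDER=ollama
OLLAMA_API_URL=http://ollama:11434            # "ollama" service name, default Ollama port
OLLAMA_MODEL=llama3.2:3b
```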
Let's restart our Paperless stack now so we can pick up the new .env values.
Let's go to paperless-ai now. I'm checking here to see what port it's running on, and
it's running on port 3000.
Okay, once we get here, we'll need to set up a user account as well.
So we'll set up our user account here.
For connection settings, you'll want to enter the Paperless API URL.
In this stack, that's the paperless service on port 8000, plus /api, so http://paperless:8000/api.
And then for API token, you're going to grab that API token that we just got from paperless
and you're going to want to paste it into here and then enter your paperless user in
here.
For AI settings, it's the same thing we talked about in that .env: we're going to use Ollama.
We're going to set the Ollama URL and we're going to set it to the service name within
our compose stack, which is just Ollama.
The model that we're using, a token limit and token response.
I'm not going to adjust any of these.
The default values work fine for me.
In the advanced settings, you can choose whether or not you want to use existing correspondents
and tags, and that's related to tags within Paperless NGX.
You can set your scan interval time.
So how often paperless-ai is going to check for new documents.
You can choose whether to process only specific pre-tagged documents, and whether to add
an AI-processed tag to documents, so after documents are processed, a tag gets added to
them.
You can choose whether to use specific tags and prompts, meaning whether or not to include
tags in the prompt that you'll see below.
And then there's whether to disable automatic processing, which shuts everything
down without having to go and uncheck everything.
This right here is just a toggle to basically say don't do anything to my documents.
And then whether or not you want to use these AI features.
So do we want to assign tags to it?
I do.
Do we want to detect correspondents within the data?
Yeah, I do.
Document type classification, title generation, and then custom fields if you want.
You can add some additional fields. Say, for instance, total amount if you were scanning
invoices: you'd add it as an integer, and when a document is scanned, if the LLM sees a
total amount in there, it might be able to plug that value into the custom field.
Last but not least, there is a prompt, and this is the prompt we're going to give
the LLM in order to process the documents. Right now it doesn't have one. I would use the example
prompt by clicking on that button; it fills in a pretty good prompt that's
tailored exactly to Paperless NGX. If you need to tweak anything, you definitely can in here.
This is a lot better prompt than I could ever write, so I'm going to keep this one.
Let's hit save, and it saved my configuration. Every save you do resets the UI,
but we're done for now in paperless-ai. So now let's go back to Paperless and upload a few
documents. So I've generated a few sample documents, some documents of say devices with serial numbers,
some sample invoices along with some tax information and some receipts. I tried to generate a variety
of documents to see how Paperless would handle these. To upload documents you can either click
this button or you can drag and drop them into this area. So let's drag them into here. Now all
of these documents are uploading, and if we go into Documents we can see them. If we click into a
document, we can see some of its metadata. This one hasn't been processed
with AI at all; it's only been imported into Paperless NGX, and we've got some OCR. If we
look at the metadata, like the title, it's based on the file name, and we don't have a
lot of information here: no correspondent, no document type,
and no tags right now. If we go into Content, we can see the information that
was extracted using OCR. It did a pretty good job, but this was a PDF and the
information was text already (you can see it's selectable). It did a pretty good job outside of this right
here: "refund your your your our refund" and some garbage around the balance. I think it's getting really confused on
this "test data, not a tax return" text right here, and then we see some stray characters with zero dollars. If
we go into Metadata, this is pretty standard: just metadata it got from the
document itself, from a checksum of it, or from the document type, but nothing
interesting to see in here. Nothing in Notes and nothing in History. So let's look at one of these
other documents where you can see that OCR isn't the greatest. Say, for instance, I go into this
made-up sample image of a camera. You can see we have a title, and the title again is
just the file name. If we go into Content, we can see where it did some OCR, or didn't really
do some OCR. I'm not poking fun at Paperless itself; this is just the OCR library it's
using. But you can see it didn't do the greatest job, and this is a made-up image with the text
printed right on it; it's about as clear as text could possibly be. You can see it got the model,
X2000, then it goes to serial number and "screener," and then who knows what it's doing here; it just
gave up. On the FCC mark I think it output this symbol, which is
probably this mark right here. It got "made in Japan," and somehow it got a dollar sign, "ws,"
and a tilde-dash that I have no idea where it came from. So you can see that OCR itself
isn't the greatest. We're going to see if we can improve this, but just remember this OCR
output. Now that we have paperless-ai running, let's hop over there and process some of these
images and enhance our metadata. So if we go back to paperless-ai on our dashboard, we can see that
it's now seeing 14 of those documents. If you don't see your documents, you can say scan now,
and it will go out to Paperless NGX and scan for those new documents. But now it has 14 documents
ready to process. Now, if it's not processing any of your documents, it might be because you have
this setting turned on in here where it says process only specific pre-tagged documents.
And if you said yes, and you would give it a tag,
like process AI documents,
it would look for those tags and process those.
I'm gonna say no,
because I just wanted to process everything.
And then let's save our configuration.
And the UI is gonna refresh here in a second.
We should see this start processing.
So if we go back to documents,
you could see it's starting to process these documents.
So it's processed two already.
It's getting a new document.
It's scanning it.
I can actually hear my GPU,
like the electrical noise when it scans this document.
It's pretty wild because it's making a noise
almost like it's scanning it.
I know it's not, but just the electrical noise
it's making is pretty awesome.
Okay, so I processed all of my documents really fast.
It did say one processed and I refreshed
and it was already done.
So all 14 have been processed
and we can see we have some information about this document.
So let's actually hop back to Paperless NGX
really quick just to see how it enhanced this document and then we'll come back to some of
these features here. So if we go back to Paperless NGX and go into our documents, this is
really lit up with a lot more information. Just going back into this camera picture, you can see
I have a lot more information about it. It automatically added some tags like electronics,
Japan, made in Japan, and X2000, and it determined from this picture that it's probably product
information. If we go into Content, you can see that it's still not the greatest. It's still
the same OCR text, because paperless-ai doesn't enhance the content with a vision model.
If we go into Metadata, that looks about the same. Under Notes and History, we actually get more
information about what happened and what got updated by paperless-ai, which is pretty cool.
And you can see the title itself got updated too, to X2000; I think all of the titles were based
on the file name before. If we go back out to all of our documents, you can see that now I have
some titles. This Minnesota tax return (not really my tax return) got a
document type of tax return, and it recognized a correspondent in this document of Minnesota
State. That's, I guess, who it would have been from; it would have been Minnesota's version of the IRS, but
easy enough to fix or figure out. For some reason it thinks that the correspondent on this
one is Amazon, which I guess could make sense. It might think this is, I don't know,
a receipt or product information. I don't know why it chose Amazon,
but it was a good guess. So this is a really good example of how paperless-ai enhanced
all of these documents just by scanning them and feeding them through an LLM. But one other cool
thing you could do with paperless-ai is actually to chat about your documents, right? So if I
wanted to look at my Minnesota tax return (my fake one), I could ask,
"How much did I earn this year?" And here we go: it answered pretty
quickly and said my wages for tax year 2025 were $65,978. If we go back to the document itself, was
that true according to this fake return? Yeah, it was; it did pretty well. I mean,
that's probably my adjusted gross, but it did really well. So if I want to do RAG chat with all of my
documents, I just index them, and now they're all indexed. It knows about all of those
documents, or at least it can go and retrieve information about them as I ask
questions. So I could ask, "What is the total across all invoices?" because there were a few sample
invoices in there. It says the total is $8,211.09.
I was going to say we can go and look, but they're right here: it was able to pull up all of
these fake invoices, look at them, and look at the totals. So pretty cool; you can do RAG
chat here if you want. paperless-ai did a pretty good job processing my documents, and it did a good
job with titles and labels, but it really didn't do anything for OCR text quality, and for me, that's
the top priority. If my content is wrong, then search is wrong, and I'm back to hunting for
documents like I did before. That's when I started looking for a better OCR option, and that's when
I found paperless-gpt. The reason it stood out to me is that it can do OCR using an LLM,
especially vision models. So instead of traditional OCR guessing at pixels,
the model can actually understand what it's looking at. On my test documents,
the text extraction was dramatically better. Now, paperless-gpt has a lot of options, and I configured a
lot of them already. You'll want to make sure you have your API token, and your LLM provider is
going to be Ollama. You'll want to set your LLM model as well; I'm using the same one, llama3.2:3b.
Then make sure you set your OCR provider to LLM, your vision LLM
provider to Ollama, and then choose a vision LLM model, which is minicpm-v.
minicpm-v is a multimodal LLM, meaning you can give it an image or you can give
it text, and get text back out, and this one's really good. I used it for a little while and I'm
really impressed: it's high performing, it's pretty small, and it's pretty accurate at getting
text out of images. It's the one they recommend, so I kind of stuck with it.
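For reference, the core of my paperless-gpt .env looks roughly like this. Again, these variable names reflect my understanding of the project's configuration; verify them against the paperless-gpt README before copying:

```env
PAPERLESS_BASE_URL=http://paperless:8000
PAPERLESS_API_TOKEN=<same-token-as-before>
LLM_PROVIDER=ollama
LLM_MODEL=llama3.2:3b
OLLAMA_HOST=http://ollama:11434      # compose service name for Ollama
OCR_PROVIDER=llm                     # use the LLM for OCR instead of traditional OCR
VISION_LLM_PROVIDER=ollama
VISION_LLM_MODEL=minicpm-v
```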
A lot of these variables are the defaults; I only put them here so I understood what I could
change. But while we're in here, we actually need to download that model, minicpm-v,
in Ollama. So back in OpenWebUI, we'll go to the admin panel,
then Settings, then Models, and pull this new model:
minicpm-v, the 8-billion-parameter version. Once that's downloaded, let's
go out to paperless-gpt, which should be on port 3002. Once we get to paperless-gpt, it's a pretty
basic UI, but you don't really need to use it all that much, which is okay. You can see on the
home page that it's scanning for all documents with the paperless-gpt
manual tag, and we don't have any of those in here; this is where we can do ad hoc analysis on some of our documents
with paperless-gpt. We can see a log, or history, of the documents it's processed.
We can see some additional settings, which are all of our prompts, and if you want to adjust
these prompts, I've actually mapped them inside of our stack, so you can edit them
manually if you want. I'm not going to change any of the prompts, nor have I yet. Here in OCR is where
you can run OCR on individual documents if you want to test it out. So let's test it out
really quick, and then we'll process some automatically. It's asking for our document
ID. If we go back to Paperless and into our documents, let's select this document
right here, that camera image, and grab its document ID, which is right
here in the URL. So this is a document ID of five. Let's look at the content again, just to make
sure: it's still that old OCR content. Back in paperless-gpt, let's just paste that
document ID of five. And then let's submit a scan job. It's actually really fast. And here's the
combined OCR result. You can see now that this is a lot better than it was before. But let me
actually save the content so we can go compare within Paperless NGX. So let's save the content
back to that document. Let's go back to our document. It's going to say, hey, it's detected
changes. So let's close out of this document and then open it back up. Now let's go into content.
And you can see here, one, it's in Markdown, which is pretty awesome. It actually did a lot better.
So now it was actually able to pull the serial number right off of here, DC45678901, and "Made in Japan," which it hadn't before.
But even this right here, it was able to see FC and then CE made in Japan.
And then even made a note, "The FC and CE are likely abbreviations for regulatory compliance marks,
but the full meanings of these acronyms were not provided within the image."
So this did a lot better when we fed this image to an LLM, a vision model, because it's able to understand what it's looking at, unlike OCR, which is just doing pixel detection.
So let's see if we can find one more.
Remember we had that weirdness in the text stuff?
Yeah, right here.
So we had a lot of this weird text right here.
And I wasn't really sure where it was coming from.
Like I thought maybe it was reading some of that.
And some of this was overlapped right here.
So let's grab this document ID and feed it to paperless-gpt.
And let's see what it comes up with.
Let's save this content back to the document.
So let's go back into that document.
It was document ID 14, right?
14, yeah, it's right here.
So contents, hey, look at this.
This is pretty cool.
It actually generated Markdown too, which is pretty awesome, because LLMs like Markdown.
So it actually found the title, created a title, created a subtitle.
created body text, and created a table in Markdown. This is pretty cool. There's a footer, and then look at this
additional information: "sample test data, not a tax return," and that's the text going all the way
across there. And then there's a note at the bottom of the page, "synthetic sample document for upload/
OCR testing only." That was me; that was there when I generated this document. It even got page
one. So pretty cool, man, pretty, pretty cool stuff. This is way better than OCR. Having a vision
model actually understand what it's looking at and then parse that out and even give us context
around it. Like these tables in Markdown: pretty, pretty cool. So you're probably thinking, well,
that's a lot of work. I don't want to grab the ID every time, and every time I upload one, I'd
have to go there and, you know, give it the ID. You don't have to do that. You can create a
quick workflow just like this. So if you wanted to create a workflow, you could. I'm just going to
call this on upload. And so a trigger of upload. So every time we added a new document, what are we
going to do? We're going to take an action here. We're going to apply an action. And what we're
going to do is assign tags. We don't have these tags yet, but they're the tags
from our .env. The tag I want to apply is paperless-gpt-ocr-auto; if I apply
this tag to documents when they're uploaded, then paperless-gpt will automatically do OCR on
them. You can also add paperless-gpt-auto, which we'll put in there too, and that tells
paperless-gpt to also process the document titles and such. So let's get the OCR out of
the way. We actually need to create this tag first... oh, we can't create the tag from here. Okay, okay.
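For reference, those two tag names aren't arbitrary; as I understand paperless-gpt's configuration, they're the defaults for these variables, which you can override in its .env (verify the names against its README):

```env
MANUAL_TAG=paperless-gpt               # tag for documents you want to review manually
AUTO_TAG=paperless-gpt-auto            # auto-process titles, tags, and metadata
AUTO_OCR_TAG=paperless-gpt-ocr-auto    # auto-run vision-model OCR
```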
So let's actually create this tag first. Create this tag. There's one. Let's create the other tag
while we're at it too. We want to create this tag also. All right, so now let's go back into our
workflow. On the workflow, I'm going to set the trigger to document added.
Then our action is going to be to assign
tags, and we'll assign two of them: paperless-gpt-auto and paperless-gpt-ocr-auto. So this is
going to process tags and titles, and also run OCR. Let's save. Oh, sort order; it should just set one
for me. Anyway, sort order is one; it's the only workflow we have. Okay, so let's go into documents and let's
upload one more test document: this random diagram. You can see it had one of the tags,
and now both tags are gone. So let's find this document again; let's clear out of here.
The menuing is a little bit weird. The only problem is that once it converts the document, and
even the title, you're kind of left figuring out what you just uploaded. I just searched for
"diagram" and it came up, which was pretty interesting, because I couldn't find it otherwise. "Designing a scalable
web service architecture": that's actually exactly what I was diagramming
and trying to show to someone in Discord, and I didn't even tell it what it was. Pretty cool.
Let's see the content. Here we go: it's a diagram of
some web service, "arrow pointing down labeled user." Yeah, this is pretty good. "Arrows connecting
various components," "database cluster: three boxes connected by arrows in the database cluster."
Yeah, that is kind of right.
Object storage cluster.
Yep, two circles and one box connected by arrows.
It's actually three circles, but this is pretty good.
So this is what it pulled out now using both Vision and the LLM.
So we got titles.
We got some tags of IT services.
Correspondent is Amazon.
I don't know; it likes Amazon for some reason, maybe because of AWS.
And then we got some content out of it too.
So really, really cool stuff.
So that's the full setup. Paperless NGX is the core, Ollama and OpenWebUI for local AI,
paperless-ai for metadata suggestions, and paperless-gpt for vision model OCR you can trust.
If you want to copy this exactly, check out the description for links to all of the documentation.
I hope you enjoyed this video on Paperless NGX. I'm Tim, thanks for watching.