
Paperless-ngx + Local AI (Optional): Better OCR, Self-Hosted, No Cloud

26m 42s · 4,727 words · 330 segments · English

FULL TRANSCRIPT

0:00

Paperless NGX is a self-hosted document inbox that lets you drop in PDF scans, automatically

0:06

runs OCR, organizes with tags and correspondents, and then makes the whole thing searchable so

0:12

when you need a specific page, you're not digging through folders like it's 2006.

0:16

And because it's self-hosted, it puts you back in control of your data.

0:20

Your documents stay on your hardware under your rules.

0:24

No uploading personal paperwork into ChatGPT or some random cloud service just to get indexed.

0:30

In this video, we're doing a full repeatable setup with Docker. We'll get Paperless up and

0:34

running first, and then we'll optionally add some local AI with Ollama and show what's actually

0:39

worth using once everything's set up. Also, there's a full step-by-step guide linked in the description,

0:45

including the exact compose files and .env files we're going to use today.

0:49

So what are we setting up then? We're setting up Paperless NGX for document storage with some

0:55

baseline OCR. We're going to use Ollama as our local AI engine. We'll then use paperless-ai for

1:01

tagging, titles, metadata, and suggestions. Then we'll bring in paperless-gpt for a vision model

1:06

OCR, and this is a huge upgrade for OCR accuracy. I know that was a lot of information, but think of

1:13

it like this. Paperless NGX is the filing cabinet. It stores, indexes, and searches. Ollama is the

1:20

local brain. It runs our models. paperless-ai and paperless-gpt are add-ons that plug into

1:26

Paperless that improve metadata, and in the case of paperless-gpt, it improves OCR.

1:31

And just to be clear, all of the AI is optional. Paperless NGX works perfectly fine on its own.

1:37

If you just want local document management and basic OCR, you can stop there and you'll have a

1:42

great setup. Alright, so let's jump into the Compose stack. Here's my Paperless stack and

1:48

I'm using it with Postgres instead of SQLite.
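As a rough sketch, the core of a stack like this might look as follows. This is not the exact file from the video (that one is linked in the description); the image tags, volume paths, and password are placeholders, and the AI services are omitted for brevity:

```yaml
# Minimal sketch of the core Paperless-ngx stack (AI services omitted).
# Image tags, volume paths, and credentials are placeholders.
services:
  broker:
    image: redis:7

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: change-me
    volumes:
      - ./pgdata:/var/lib/postgresql/data

  gotenberg:
    image: gotenberg/gotenberg:8

  tika:
    image: apache/tika:latest

  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    depends_on: [db, broker, gotenberg, tika]
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: "1"
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
```

The service names matter here: containers on the same compose network reach each other by service name, which is why Paperless points at `db`, `broker`, `tika`, and `gotenberg` rather than at IP addresses.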

1:50

You might be wondering why I'm using Postgres.

1:53

Well, it scales a little bit better

1:55

if you start building a really big library.

1:58

You'll notice that Paperless is also depending

2:00

on a few other services in this stack.

2:02

One of them is Redis, an in-memory database.

2:06

Another one is Gotenberg, which is new to me,

2:08

but this helps convert documents to PDFs.

2:12

The next is Tika, which extracts metadata

2:15

from those documents that you want to bring in to Paperless.

2:18

Then we have Ollama, which helps manage our local LLMs.

2:22

Then we have OpenWebUI, which helps us manage Ollama, which helps us manage our local LLMs.

2:29

Then we have two of our AI services for paperless.

2:32

We have paperless-ai.

2:33

And then we have paperless-gpt.

2:36

Last but not least is Dozzle.

2:38

I usually include this in all of my stacks, but it's a web UI to see all of your container

2:43

logs, which is super helpful when troubleshooting.

2:45

You can see I've exposed a couple of ports on here.

2:48

We have Paperless NGX on 8000, then we have paperless-ai on 3000, then we have OpenWebUI

2:55

on 3001, paperless-gpt on 3002, and then Dozzle running on port 8080.

3:03

Also, if you couldn't tell already, I have a .env per service so that not all services have

3:09

access to all of the secrets.
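One way to wire that up, sketched under the assumption that the secret files live in an `env/` directory (the directory and file names here are hypothetical):

```yaml
# Hypothetical per-service env_file layout: each container only
# reads the secrets file meant for it.
services:
  paperless:
    env_file: ./env/paperless.env      # DB password, secret key, Tika endpoints
  paperless-ai:
    env_file: ./env/paperless-ai.env   # Paperless API token, Ollama model
  paperless-gpt:
    env_file: ./env/paperless-gpt.env  # Paperless API token, OCR/vision settings
```

This way a compromise or misconfiguration of one container doesn't expose every credential in the stack.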

3:11

To bring the stack up, all we need to do is docker compose up -d.

3:15

To see if all the containers are running, you can run a docker ps and check there, or you

3:19

can go out to Dozzle and see that all of your containers are running.

3:22

So now we want to go into OpenWebUI.

3:26

So once we get to OpenWebUI, we'll want to get started and then we'll need to create

3:29

an account.

3:30

So let's create our admin account.

3:32

So to download a model, go to your profile, then go to admin panel, then go to settings,

3:39

and then go to models.

3:40

Here we want to pull a model from Ollama.com.

3:43

I found that llama3.2:3b works pretty well and it's a good starting point.

3:48

So let's pull that one down.

3:50

Alright, so we have that model pulled down.

3:52

Let's test it really quick with chat just to confirm it responds.

3:56

Tell me a joke.

3:59

Okay, so we see a response.

4:00

That's a good sign.

4:01

That means the model is loaded and working.

4:03

If you ever want to do a sanity check on your GPU and make sure that your server is using

4:07

it, you can remote into your server and run nvtop.

4:11

And if you go back to chat and have it output some more information, other than my UPS going

4:17

off because it's drawing a lot of power, you can see that it's actually using my GPU here.

4:23

Now let's check on Paperless NGX.

4:25

So when we get to the initial startup screen, we'll have to create an admin account and

4:29

then we'll sign in.

4:30

And congratulations, we now have Paperless NGX running.

4:34

We'll come back to Paperless here in a second, but let's set up paperless-ai.

4:38

And in order to set up paperless-ai, we need an API token from Paperless first.

4:43

So to get our token from Paperless NGX, we can go into our profile.

4:48

And if we click into here and we generate one, we can get one right here.

4:53

Now let's copy this API token and let's paste it into our paperless-ai .env file right into

5:01

the Paperless API token.

5:03

You also want to update your username, mine's admin.

5:07

And then while we're here, you just want to make sure that your Ollama model that you pull

5:11

down within Ollama matches here.

5:14

And I used llama3.2:3b.

5:17

And the rest of this outside of time zone, you can leave the way it is.
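For reference, a paperless-ai .env along these lines would cover the settings mentioned here. The variable names follow paperless-ai's documented configuration, but double-check them against the version you're running; the token value is obviously a placeholder:

```ini
# env/paperless-ai.env -- illustrative values, not real secrets
PAPERLESS_API_URL=http://paperless:8000/api
PAPERLESS_API_TOKEN=your-token-from-the-paperless-profile-page
PAPERLESS_USERNAME=admin
AI_PROVIDER=ollama
OLLAMA_API_URL=http://ollama:11434
OLLAMA_MODEL=llama3.2:3b
```

Note the hostnames: `paperless` and `ollama` are the compose service names, so these URLs only resolve from inside the stack's network.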

5:21

Let's restart our paperless stack now so it picks up the new .env values.

5:25

Let's go to paperless-ai now and I'm checking here to see what port it's running on and

5:29

it's running on port 3000.

5:31

Okay, once we get here, we'll need to set up a user account as well.

5:35

So we'll set up our user account here.

5:37

For connection settings, you'll want to enter the paperless API URL.

5:42

And in this stack, it's going to be the paperless service on port 8000 and then /api, so http://paperless:8000/api.

5:48

And then for API token, you're going to grab that API token that we just got from paperless

5:53

and you're going to want to paste it into here and then enter your paperless user in

5:57

here.

5:58

For AI settings, the same thing we talked about in that .env, we're going to use Ollama.

6:03

We're going to set the Ollama URL and we're going to set it to the service name within

6:07

our compose stack, which is just Ollama.

6:10

Then there's the model that we're using, a token limit, and a token response limit.

6:14

I'm not going to adjust any of these.

6:16

The default values work fine for me.

6:18

In the advanced settings, you can choose whether or not you want to use existing correspondents

6:22

and tags, and that's related to tags within Paperless NGX.

6:26

You can set your scan interval time.

6:28

So how often paperless-ai is going to check for new documents.

6:33

If you want to process only specific pre-tagged documents, if you want to add an AI process

6:39

tag to the documents, so after these documents are processed, do you want to add a tag to

6:43

these documents?

6:45

Do you want to use specific tags and prompts, whether or not you want to include tags in

6:50

your prompts that you'll see below?

6:52

And then whether or not you want to disable automatic processing, and that will shut everything

6:57

down without having to go and uncheck everything.

7:00

This right here is just a toggle to basically say don't do anything to my documents.

7:04

And then whether or not you want to use these AI features.

7:07

So do we want to assign tags to it?

7:10

I do.

7:11

Do we want to detect correspondents within the data?

7:14

Yeah, I do.

7:16

Document type classification, title generation, and then custom fields if you want.

7:21

You can add some additional fields, say for instance, total amount if you were scanning

7:27

invoices. You could say that it would be an integer, and then when

7:37

scanned, if the LLM can see a total amount in there, it might be able to plug that value into that

7:43

custom field. Last but not least, there is a prompt, and this is the prompt that we're going to give

7:48

the LLM in order to process the documents. Right now it doesn't have one. I would use the example

7:53

prompt by clicking on that button, and it's going to fill in a pretty good prompt that should be

7:59

tailored exactly to Paperless NGX. Now, if you need to tweak anything, you definitely can in here.

8:05

This is a lot better prompt than I could ever write, so I'm going to keep this one.

8:08

Let's go to save, and it saved my configuration. Every save you do is going to reset the UI,

8:15

but we're done for now in paperless-ai. So now let's go back to Paperless and upload a few

8:20

documents. So I've generated a few sample documents, some documents of say devices with serial numbers,

8:28

some sample invoices along with some tax information and some receipts. I tried to generate a variety

8:34

of documents to see how Paperless would handle these. To upload documents you can either click

8:39

this button or you can drag and drop them into this area. So let's drag them into here. Now all

8:45

of these documents are uploading, and if we go into documents we can see them. If we click into a

8:51

document, we can see some of the metadata about it, and this one hasn't been processed

8:56

with AI at all. It's only been imported into Paperless NGX and we've got some OCR. So if we

9:03

look at some of the metadata, like title, this is based on the file name, and then we don't have a

9:08

lot of information here. We don't have the correspondent, we don't have document types,

9:13

and we don't have any tags right now. If we go into content, we can see some of the information that

9:18

was extracted using OCR. You can see it did a pretty good job, but this was a PDF and this

9:26

information was text already, you can see it's selectable. But it did a pretty good job outside of this right

9:32

here: "refund your your your our refund or er t balance." I think it's getting really confused on

9:42

this "test data, not a tax return" right here, and then we see "xr" here with zero dollars. And then if

9:49

we go into metadata, this is pretty standard metadata, just metadata it got from either

9:54

the document or by doing a checksum off of it or looking at the document types, but nothing

10:01

interesting to see in here. Nothing in notes and nothing in history. So let's look at one of these

10:07

other documents where you can see that OCR isn't the greatest. Say for instance I go into this

10:14

made-up sample image of this camera, you can see we have a title, and the title again is

10:21

just the file name. If we go into content, we can see where it actually did some OCR, or didn't really

10:27

do some OCR. I'm not poking fun at Paperless itself, this is just the OCR library that it's

10:33

using, but you can see it didn't do the greatest job, and this is a made-up image with this text

10:39

printed over it. This is as clear as text could possibly be, but you can see: the model is

10:44

X2000, then it goes to serial number and "screener," and then who knows what it's doing here. It just

10:53

said nothing, it gave up here. And then I think on the FCC it did this symbol, maybe, oh, that's

11:01

probably the symbol right here. It got "made in japan," and somehow it got a dollar sign, "ws,"

11:10

and then tilde dash, and I have no idea where that came from. So you can see that OCR itself

11:16

isn't the greatest. We're going to see if we can improve this, but just remember this OCR right

11:22

here. But now that we have paperless-ai running, let's hop over there and process some of these

11:28

images and enhance our metadata. So if we go back to paperless-ai on our dashboard, we can see that

11:34

it's now seeing 14 of those documents. If you don't see your documents, you can say scan now,

11:39

and it will go out to Paperless NGX and scan for those new documents. But now it has 14 documents

11:45

ready to process. Now, if it's not processing any of your documents, it might be because you have

11:51

this setting turned on in here where it says process only specific pre-tagged documents.

11:56

And if you said yes, and you would give it a tag,

11:59

like process AI documents,

12:02

it would look for those tags and process those.

12:04

I'm gonna say no,

12:05

because I just wanted to process everything.

12:08

And then let's save our configuration.

12:10

And the UI is gonna refresh here in a second.

12:12

We should see this start processing.

12:15

So if we go back to documents,

12:16

you could see it's starting to process these documents.

12:19

So it's processed two already.

12:21

It's getting a new document.

12:22

It's scanning it.

12:23

I can actually hear my GPU,

12:26

like the electrical noise when it scans this document.

12:29

It's pretty wild because it's making a noise

12:31

almost like it's scanning it.

12:32

I know it's not, but just the electrical noise

12:35

it's making is pretty awesome.

12:37

Okay, so it processed all of my documents really fast.

12:39

It did say one processed and I refreshed

12:43

and it was already done.

12:44

So all 14 have been processed

12:47

and we can see we have some information about this document.

12:50

So let's actually hop back to Paperless NGX

12:53

really quick just to see how it enhanced this document and then we'll come back to some of

12:57

these features here. So if we go back to Paperless NGX and we go into our documents now this is

13:03

really lit up with a lot more information. So just going back into this camera picture you can see

13:10

I have a lot more information about it. So it automatically added some tags like electronics,

13:15

Japan, made in Japan, x2000. It determined that from this picture this is probably product

13:22

information. If we go into content, you could see that it's still not the greatest. It's still using

13:28

the OCR that it had because paperless-ai doesn't enhance the image content based on a vision model.

13:36

If we go into metadata, that kind of looks the same. Notes history, we actually get some more

13:41

information about what was happening and what got updated from paperless-ai, which is pretty cool.

13:47

And you can see the title itself got updated too to X2000. I think all of these were based

13:51

on the file name. And then if we go back out to all of our documents, you can see that now I have

13:57

some titles. So this Minnesota tax return, not really my tax return, you can see it got a

14:03

document type of tax return, and it recognized a correspondent in this document: Minnesota

14:10

State. That's, I guess, who it would have been from; it would have been the IRS of Minnesota, but

14:14

easy enough to fix or figure out. For some reason it thinks that the correspondent in this

14:19

one is Amazon, which I guess could make sense. They might think that this is, I don't know,

14:24

a receipt or something like that, or product information. I don't know why it chose Amazon,

14:28

but it was a good guess. And so this is a really good example of how paperless-ai just enhanced

14:35

all of these documents just by scanning it and feeding it through an LLM. But one other cool

14:40

thing you could do with paperless-ai is actually to chat about your documents, right? So if I

14:46

wanted to look at my Minnesota tax return, my fake one, I could look at this

14:51

and say, how much did I earn this year? And here we go, it actually answered pretty

15:01

quick. It said my wages for tax year 2025 were $65,978. If we go back to the document itself, was

15:10

that true according to this fake return? Yeah, it was. It did pretty well. I mean,

15:17

probably my adjusted gross, but it did really well. So if I wanted to do RAG chat with all of my

15:24

documents, I just index them, and now they're all indexed. So now it knows about all of those

15:30

documents, or at least it can go and retrieve information about them as I ask

15:35

questions. So I could say, what is the total across all invoices? Because there were a few sample

15:44

invoices in there. I could say, what's the total across all invoices, and it's saying that the

15:50

total is $8,211.09. Now we could, yeah,

15:58

I was gonna say we can go and look, but they're right here. It was able to pull up all of

16:02

these fake invoices and look at them and look at the totals. So pretty cool, you can do RAG

16:07

chat here if you want. paperless-ai did a pretty good job processing my documents, and it did a good

16:13

job with titles and labels, but it really didn't do anything for OCR text quality, and for me that's

16:20

top priority. If my content is wrong, then search is wrong, and I'm back to hunting for

16:26

documents like I did before. So that's when I started looking for a better OCR option, and that's when

16:31

I found paperless-gpt. The reason this stood out to me was that it can do OCR using an LLM,

16:38

especially vision models. So instead of traditional OCR guessing at pixels,

16:43

the model can actually understand what it's looking at. On my test documents,

16:48

the text extraction was dramatically better. Now paperless-gpt has a lot of options. I configured a

16:55

lot of them already. You'll want to make sure that you have your API token. Your LLM provider is

17:00

going to be Ollama, and you want to set your LLM model for this as well. I'm using the same one, llama3.2:3b.

17:08

Then you'll want to make sure you set your OCR provider as LLM, and then your vision LLM

17:16

provider is Ollama. And then you'll want to choose a vision LLM model, which is minicpm-v.

17:25

minicpm-v is a multimodal LLM, meaning you can give it an image or you can give

17:31

it text and get text back out, and this one's really good. I used it for a little while and I'm

17:37

really impressed. It's high performing, it's pretty small, and it's pretty accurate at getting

17:45

data or text out of images. So this is the one that they recommend, and I kind of stuck with it.
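Putting those settings together, a paperless-gpt .env might look roughly like this. The variable names follow paperless-gpt's documented configuration, but verify them against the project's README for your version; the token is a placeholder:

```ini
# env/paperless-gpt.env -- illustrative values, not real secrets
PAPERLESS_BASE_URL=http://paperless:8000
PAPERLESS_API_TOKEN=your-token-from-the-paperless-profile-page
LLM_PROVIDER=ollama
LLM_MODEL=llama3.2:3b
OLLAMA_HOST=http://ollama:11434
# OCR via a vision-capable LLM instead of traditional OCR
OCR_PROVIDER=llm
VISION_LLM_PROVIDER=ollama
VISION_LLM_MODEL=minicpm-v
```

The key split is that the plain LLM model handles titles and tags while the vision model handles the OCR pass, so the two can be sized independently.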

17:51

A lot of these variables in here are the defaults. I only put them here so I understood what I could

17:56

change. But while we're in here, we actually need to download this model, the minicpm-v.

18:03

We'll need to download this in Ollama. So back in OpenWebUI we're going to go to the admin panel,

18:08

go to settings, and then go to models, and then let's pull this new model. The model is

18:15

minicpm-v, the 8-billion-parameter version. So let's pull this model. Once that's downloaded, let's

18:23

go out to paperless-gpt, which should be on port 3002. Once we get to paperless-gpt, it's a pretty

18:29

basic UI, but you don't really need to use it all that much, which is okay. You can see on the

18:35

home page it's actually scanning for all documents that have this tag of paperless-gpt

18:41

manual, which we don't have any of in here. We can do ad hoc analysis on some of our documents

18:48

with paperless-gpt. We can see a log or history of the documents that it's processed.

18:54

We can see some additional settings, which are all of our prompts. If you want to adjust

19:00

these prompts, I've actually mapped them inside of our stack, so if you want to manually edit these

19:06

prompts you can. I'm not going to change any of the prompts, nor have I yet. Here in OCR is where

19:13

you can do OCR on individual documents if you wanted to test this out. So let's test it out

19:19

really quick and then we'll process some automatically. So this is asking for our document

19:24

ID. If we go back to paperless and then we go into our documents, let's select this document

19:29

right here. That was, you know, this camera image and let's select its document ID, which is right

19:35

here in the URL. So this is document ID of five. And let's look at the content again, just to make

19:41

sure, see the content. It's still that OCR content. Back in paperless-gpt, let's just paste that

19:47

document ID of five. And then let's submit a scan job. It's actually really fast. And here's the

19:55

combined OCR result. You can see now that this is a lot better than it was before. But let me

20:01

actually save the content so we can go compare within Paperless NGX. So let's save the content

20:07

back to that document. Let's go back to our document. It's going to say, hey, it's detected

20:12

changes. So let's close out of this document and then open it back up. Now let's go into content.

20:19

And you can see here, one, it's in Markdown, which is pretty awesome. It actually did a lot better.

20:26

So now it was actually able to pull the serial number right off here, DC45678901, and Made in Japan, as it had before.

20:35

But even this right here, it was able to see FC and then CE made in Japan.

20:41

And then even made a note, "The FC and CE are likely abbreviations for regulatory compliance marks,

20:48

but the full meanings of these acronyms were not provided within the image."

20:53

So this did a lot better when we fed this image to an LLM, a vision model, because it's able to understand what it's looking at, unlike OCR, which is just doing pixel detection.

21:05

So let's see if we can find one more.

21:09

Remember we had that weirdness in the text stuff?

21:12

Yeah, right here.

21:13

So we had a lot of this weird text right here.

21:16

And I wasn't really sure where it was coming from.

21:18

Like I thought maybe it was reading some of that.

21:20

And some of this was overlapped right here.

21:22

So let's grab this document ID and feed it to paperless-gpt.

21:28

And let's see what it comes up with.

21:30

Let's save this content back to the document.

21:33

So let's go back into that document.

21:35

It was document ID 14, right?

21:38

14, yeah, it's right here.

21:40

So contents, hey, look at this.

21:41

This is pretty cool.

21:42

It actually generated Markdown too, which is pretty awesome because LLMs like Markdown.

21:47

So it actually found the title, created a title, created a subtitle.

21:52

Created body text, created a table in Markdown, which is pretty cool, and a footer. And then look at this

21:59

additional information: "sample test data, not a tax return," and this is that text going all the way

22:05

across there. And then it has a note at the bottom of the page, "synthetic sample document for upload

22:11

/ OCR testing only." That was me, that was there when I generated this document, and even page

22:16

one. So pretty cool, man, pretty, pretty cool stuff. This is way better than OCR. Having a vision

22:23

model actually understand what it's looking at and then parse that out and even give us context

22:29

around it. Like these tables and markdowns, pretty, pretty cool. So you're probably thinking, well,

22:33

that's a lot of work. I don't want to grab the ID every time. And then every time I upload one,

22:37

have to go there and, you know, give it the ID. You don't have to do that. And you can create a

22:42

quick workflow just like this. So if you wanted to create a workflow, you could. I'm just going to

22:49

call this on upload. And so a trigger of upload. So every time we added a new document, what are we

22:59

going to do? We're going to take an action here. We're going to apply an action. And what we're

23:04

going to do is we are going to assign tags. And we don't have these tags yet, but those tags are

23:12

in our .env. So the tag I want to apply to this is paperless-gpt-ocr-auto, and if I apply

23:22

this tag to those documents when they're uploaded, then paperless-gpt will automatically do OCR on

23:29

them. Now you can also do paperless-gpt-auto, which we'll put in there too, which is going to tell

23:36

paperless-gpt to also process the document titles and stuff like that. So let's get the OCR out of

23:42

the way. We actually need to create this tag... oh, we can't create the tag from here. Okay, okay.

23:49

So let's actually create this tag first. Create this tag. There's one. Let's create the other tag

23:57

while we're at it too. We want to create this tag also. All right, so now let's go back into our

24:03

workflow, and on the workflow I'm going to name it "on document add," a totally developer name for a trigger.

24:11

And then, what's going to happen? Document added. And then our action is going to be to assign

24:19

a tag. We'll assign two of these tags, so we'll do paperless-gpt-auto and we'll do paperless-gpt-ocr-auto. So this is

24:28

going to process tags and titles and also OCR too. So let's save. Oh, sort order, it should just set one

24:36

for me. Anyways, sort order is one, it's the only one we have. Okay, so let's go into documents and

24:42

upload one more test document. Let's upload this random diagram. You can see it had one of the tags,

24:50

and now both tags are gone. So let's find this document again, let's clear out of here.

24:55

Their menuing is a little bit weird. The only problem is, when it converts the document and

25:02

even the title, you're kind of left figuring out what you just uploaded. I just searched for

25:08

diagram and it came up, which was pretty interesting because I couldn't find it there. "Designing a scalable

25:14

web service architecture," that's actually exactly what I was diagramming. That's pretty

25:20

interesting, that's exactly what I was diagramming and trying to show to someone

25:25

in Discord. So, designing a scalable web service architecture, that's exactly what I was designing,

25:30

and I didn't even say what it was. Pretty cool. Let's see, in content, here we go: it's a diagram,

25:36

some web service, arrow pointing down labeled user, yeah, this is pretty good, arrows connecting

25:42

various components, so database cluster, three boxes connected by arrows in the database cluster.

25:49

Yeah, that is kind of right.

25:51

Object storage cluster.

25:52

Yep, two circles and one box connected by arrows.

25:56

It's actually three circles, but this is pretty good.

26:00

So this is what it pulled out now using both Vision and the LLM.

26:06

So we got titles.

26:07

We got some tags of IT services.

26:10

Correspondent is Amazon.

26:11

I don't know, it likes Amazon for some reason.

26:15

And then we got some content out of it too.

26:16

So really, really cool stuff.

26:18

So that's the full setup. Paperless NGX is the core, Ollama and OpenWebUI for local AI,

26:25

paperless-ai for metadata suggestions, and paperless-gpt for vision model OCR you can trust.

26:31

If you want to copy this exactly, check out the description for links to all of the documentation.

26:37

I hope you enjoyed this video on Paperless NGX. I'm Tim, thanks for watching.
