TRANSCRIPTEnglish

Importing Open Source Models to Ollama

7m 15s1,732 words241 segmentsEnglish

FULL TRANSCRIPT

0:00

AMA is the easiest way to run llms on

0:01

your own hardware and today I'm going to

0:03

show you how you can expand the number

0:05

of models that you have access to from

0:06

the dozens of models in 's library to

0:08

hundreds or even thousands thanks to

0:10

commes for requesting this video let's

0:12

go before we dig to the implementation I

0:15

want to answer the question why do we

0:16

want this AMA hosts a lot of great

0:19

models that suit a lot of different use

0:20

cases they range in size from stable

0:22

lm's 1.6 billion parameter model all the

0:25

way up to Goliath's 120 billion

0:26

parameter model they range in purpose

0:28

from mist's General purpose chat model

0:31

to lava's multimodal model that accepts

0:34

image inputs as well and all the way to

0:36

Metatron which is trained specifically

0:37

on medical data there are also models

0:39

that are trained on different languages

0:41

and even models that are trained to not

0:42

be censored but despite all these

0:44

options within the AMA Library there are

0:46

only about 65 total models that are

0:48

supported right now if we compare that

0:50

with the popular model and data

0:51

repository hugging face we can see that

0:53

there are almost half a million models

0:54

that we could be playing with with more

0:56

added every minute one popular Trend in

0:58

the open source llm space is to take

0:59

models that were trained on thousands of

1:01

GPU hours and terabytes of data and

1:03

improve it even further by adding new

1:05

and refined data two leaders in this

1:07

space are Eric Hartford of cognitive

1:09

computations who created the dolphin

1:10

data set and technium of noose research

1:13

that created mistol as well as the open

1:14

Hermes data set you've probably seen a

1:16

lot of models that were derived from

1:18

these data sets so while some of these

1:19

great models are already on oama many

1:21

more of them aren't in fact I just came

1:23

across one on hugging face today that is

1:25

a fine tune of mistol that is trained on

1:27

the copy Bara and Hermes data sets that

1:29

we just mentioned

1:30

it's a 7 billion parameter model that

1:32

promises to beat mistol on pretty much

1:33

all of the major benchmarks by a pretty

1:35

significant margin so let's get it

1:36

running on AMA what we think of

1:38

abstractly as models that have different

1:40

strengths and characteristics are

1:42

actually just a set of configuration

1:43

files which determine the shape of an

1:45

neural network as well as a bunch of

1:47

numbers which are the weights of the

1:48

nodes within that neural network the

1:50

easiest one to use is called ggf or GPT

1:53

unified format and that's a successor of

1:56

gml it's a binary format that allows

1:58

models to be represented as one single

1:59

single file including all the

2:01

configuration and weights while that's

2:02

convenient for packaging and inference

2:04

most models will initially be released

2:06

as safe tensors safe tensors are just

2:08

another binary format that compresses

2:10

all of the weights into a couple

2:11

different files but they need to be

2:13

paired with other files which determine

2:15

the shape of the network um and how all

2:17

of these weights are related importing a

2:19

ggf into AMA is actually dead simple and

2:21

it's really similar to the process that

2:22

we covered in my first video which I'll

2:24

link above here I'll just be following

2:26

the documentation from ama's GitHub page

2:28

which I'll link in the description below

2:30

first we just want to download the ggf

2:31

file of our desired model the bloke on

2:34

hugging face is pretty prodigious at

2:35

quantizing models into this ggf format

2:39

so here I'm on the BLS page on hugging

2:41

face specifically within the model that

2:43

I am interested in so for quantized

2:45

models I would recommend this Q4 km

2:48

quantization basically quantization is

2:50

just a trade-off between how much

2:52

compression you get versus how much

2:53

accuracy your model is going to have so

2:55

the less accurate is the smaller your

2:56

model is going to be the more accurate

2:58

is the larger your model is going to be

2:59

at the end um but if you check out the

3:01

BLS page he kind of has all of these

3:03

little readms for the different size

3:05

models and so you can choose your own as

3:06

you want okay so next we just want to

3:08

create our empty model file and when we

3:11

edit it I'm just going to paste in from

3:14

and then the location of the file that

3:15

we just downloaded that will Point AMA

3:17

to where everything is that it needs to

3:19

get started additionally according to

3:20

ama's documentation some models that

3:22

you're importing yourself may need um

3:24

the specific prompt template to be

3:26

specified as well in order to work

3:27

properly I know that this model is based

3:30

on the chat ml standard uh which was

3:32

created by open Ai and is sort of

3:34

becoming the main standard across the

3:35

model space so I'm just going to paste

3:37

that in as well so here I'm defining the

3:39

prompt template which you're already

3:40

familiar with as well as parameters for

3:41

stop characters and those are

3:43

essentially just special characters that

3:44

lets the model know that it should stop

3:47

uh creating new text finally we can use

3:49

the AMA CLI to create our new model so

3:51

it looks like AMA was able to create all

3:53

the layers and finish the command with a

3:55

success okay so now we get to try it out

3:57

I'm just going to run our model with AMA

3:59

run and then the name of our model looks

4:01

like it's working the Cy bar data set

4:03

that this model was fine tuned on is

4:05

supposed to excel at multi-step chats so

4:06

let's try one of those tell me something

4:09

briefly about copy bar okay great it

4:11

looks like it knows a lot about that

4:14

let's continue with wow what are two

4:16

things that make them so adorable so now

4:17

it actually needs to know what we're

4:18

referring to in the previous chat all

4:20

right perfect so that was the easy way

4:22

to use a model that was already

4:23

converted into a GG UF for us but I want

4:25

to take it one step further and show you

4:26

how to do this conversion yourselves

4:28

it's only one or two more extra steps

4:30

and knowing how to do this will unlock

4:31

way more models for you so the first

4:33

thing that we want to do is go to the

4:34

models page on hugging face next we want

4:36

to go into its models inversions and

4:38

confirm that it's one of the supported

4:39

architectures ama's docs uh specify that

4:42

they support llama mistl and a couple

4:44

others um but there are a lot of

4:46

different architectures out there so

4:47

just be careful so to do that we just

4:49

want to go into this config file here

4:51

and confirm that under architectures the

4:53

architecture starts with mistal or llama

4:55

and then the rest of this is also fine

4:57

here we see that this is a mistal

4:58

architecture so we're good to proceed

5:00

next we can follow the hugging face

5:02

instructions for downloading the model

5:03

hugging face repositories are just get

5:05

repositories under the hood so if you've

5:07

ever used GitHub or gitlab or something

5:08

like that you should feel right at home

5:10

one exception is that hugging face

5:12

repositories deal with a lot of large

5:14

files specifically the safe tensors that

5:16

I was talking about before and so you're

5:17

going to need to have an extra tool

5:19

installed called get lfs or large file

5:21

storage so just copy paste that command

5:23

and your files will begin downloading

5:25

now one caveat is you won't actually see

5:26

a progress bar as you normally would and

5:28

that's because uh these really large

5:30

files are downloading in the background

5:32

I expect this total download to take

5:33

about 10 minutes depending on your

5:34

internet connection but one pro tip that

5:36

I have is even though you don't have a

5:37

progress bar you can actually just open

5:39

up a second terminal window and then

5:41

look at the status of the downloads

5:42

Yourself by inspecting the files so here

5:44

I'm doing LS which lists all the files

5:46

and then HL basically shows me the size

5:48

of the files in a human readable format

5:50

once the download completes we'll have

5:52

all the files needed to convert into one

5:53

single GG UF file that we already know

5:55

how to use with olama first CD into the

5:58

directory of the model that we just

5:59

downloaded

6:00

next I'm going to paste in a command

6:01

straight from the AMA docs this is a

UNLOCK MORE

Sign up free to access premium features

INTERACTIVE VIEWER

Watch the video with synced subtitles, adjustable overlay, and full playback control.

SIGN UP FREE TO UNLOCK

AI SUMMARY

Get an instant AI-generated summary of the video content, key points, and takeaways.

SIGN UP FREE TO UNLOCK

TRANSLATE

Translate the transcript to 100+ languages with one click. Download in any format.

SIGN UP FREE TO UNLOCK

MIND MAP

Visualize the transcript as an interactive mind map. Understand structure at a glance.

SIGN UP FREE TO UNLOCK

CHAT WITH TRANSCRIPT

Ask questions about the video content. Get answers powered by AI directly from the transcript.

SIGN UP FREE TO UNLOCK

GET MORE FROM YOUR TRANSCRIPTS

Sign up for free and unlock interactive viewer, AI summaries, translations, mind maps, and more. No credit card required.

    Importing Open Source… - Full Transcript | YouTubeTranscript.dev