TRANSCRIPTIONEnglish

Importing Open Source Models to Ollama

7m 15s1,732 mots241 segmentsEnglish

TRANSCRIPTION COMPLÈTE

0:00

AMA is the easiest way to run llms on

0:01

your own hardware and today I'm going to

0:03

show you how you can expand the number

0:05

of models that you have access to from

0:06

the dozens of models in 's library to

0:08

hundreds or even thousands thanks to

0:10

commes for requesting this video let's

0:12

go before we dig to the implementation I

0:15

want to answer the question why do we

0:16

want this AMA hosts a lot of great

0:19

models that suit a lot of different use

0:20

cases they range in size from stable

0:22

lm's 1.6 billion parameter model all the

0:25

way up to Goliath's 120 billion

0:26

parameter model they range in purpose

0:28

from mist's General purpose chat model

0:31

to lava's multimodal model that accepts

0:34

image inputs as well and all the way to

0:36

Metatron which is trained specifically

0:37

on medical data there are also models

0:39

that are trained on different languages

0:41

and even models that are trained to not

0:42

be censored but despite all these

0:44

options within the AMA Library there are

0:46

only about 65 total models that are

0:48

supported right now if we compare that

0:50

with the popular model and data

0:51

repository hugging face we can see that

0:53

there are almost half a million models

0:54

that we could be playing with with more

0:56

added every minute one popular Trend in

0:58

the open source llm space is to take

0:59

models that were trained on thousands of

1:01

GPU hours and terabytes of data and

1:03

improve it even further by adding new

1:05

and refined data two leaders in this

1:07

space are Eric Hartford of cognitive

1:09

computations who created the dolphin

1:10

data set and technium of noose research

1:13

that created mistol as well as the open

1:14

Hermes data set you've probably seen a

1:16

lot of models that were derived from

1:18

these data sets so while some of these

1:19

great models are already on oama many

1:21

more of them aren't in fact I just came

1:23

across one on hugging face today that is

1:25

a fine tune of mistol that is trained on

1:27

the copy Bara and Hermes data sets that

1:29

we just mentioned

1:30

it's a 7 billion parameter model that

1:32

promises to beat mistol on pretty much

1:33

all of the major benchmarks by a pretty

1:35

significant margin so let's get it

1:36

running on AMA what we think of

1:38

abstractly as models that have different

1:40

strengths and characteristics are

1:42

actually just a set of configuration

1:43

files which determine the shape of an

1:45

neural network as well as a bunch of

1:47

numbers which are the weights of the

1:48

nodes within that neural network the

1:50

easiest one to use is called ggf or GPT

1:53

unified format and that's a successor of

1:56

gml it's a binary format that allows

1:58

models to be represented as one single

1:59

single file including all the

2:01

configuration and weights while that's

2:02

convenient for packaging and inference

2:04

most models will initially be released

2:06

as safe tensors safe tensors are just

2:08

another binary format that compresses

2:10

all of the weights into a couple

2:11

different files but they need to be

2:13

paired with other files which determine

2:15

the shape of the network um and how all

2:17

of these weights are related importing a

2:19

ggf into AMA is actually dead simple and

2:21

it's really similar to the process that

2:22

we covered in my first video which I'll

2:24

link above here I'll just be following

2:26

the documentation from ama's GitHub page

2:28

which I'll link in the description below

2:30

first we just want to download the ggf

2:31

file of our desired model the bloke on

2:34

hugging face is pretty prodigious at

2:35

quantizing models into this ggf format

2:39

so here I'm on the BLS page on hugging

2:41

face specifically within the model that

2:43

I am interested in so for quantized

2:45

models I would recommend this Q4 km

2:48

quantization basically quantization is

2:50

just a trade-off between how much

2:52

compression you get versus how much

2:53

accuracy your model is going to have so

2:55

the less accurate is the smaller your

2:56

model is going to be the more accurate

2:58

is the larger your model is going to be

2:59

at the end um but if you check out the

3:01

BLS page he kind of has all of these

3:03

little readms for the different size

3:05

models and so you can choose your own as

3:06

you want okay so next we just want to

3:08

create our empty model file and when we

3:11

edit it I'm just going to paste in from

3:14

and then the location of the file that

3:15

we just downloaded that will Point AMA

3:17

to where everything is that it needs to

3:19

get started additionally according to

3:20

ama's documentation some models that

3:22

you're importing yourself may need um

3:24

the specific prompt template to be

3:26

specified as well in order to work

3:27

properly I know that this model is based

3:30

on the chat ml standard uh which was

3:32

created by open Ai and is sort of

3:34

becoming the main standard across the

3:35

model space so I'm just going to paste

3:37

that in as well so here I'm defining the

3:39

prompt template which you're already

3:40

familiar with as well as parameters for

3:41

stop characters and those are

3:43

essentially just special characters that

3:44

lets the model know that it should stop

3:47

uh creating new text finally we can use

3:49

the AMA CLI to create our new model so

3:51

it looks like AMA was able to create all

3:53

the layers and finish the command with a

3:55

success okay so now we get to try it out

3:57

I'm just going to run our model with AMA

3:59

run and then the name of our model looks

4:01

like it's working the Cy bar data set

4:03

that this model was fine tuned on is

4:05

supposed to excel at multi-step chats so

4:06

let's try one of those tell me something

4:09

briefly about copy bar okay great it

4:11

looks like it knows a lot about that

4:14

let's continue with wow what are two

4:16

things that make them so adorable so now

4:17

it actually needs to know what we're

4:18

referring to in the previous chat all

4:20

right perfect so that was the easy way

4:22

to use a model that was already

4:23

converted into a GG UF for us but I want

4:25

to take it one step further and show you

4:26

how to do this conversion yourselves

4:28

it's only one or two more extra steps

4:30

and knowing how to do this will unlock

4:31

way more models for you so the first

4:33

thing that we want to do is go to the

4:34

models page on hugging face next we want

4:36

to go into its models inversions and

4:38

confirm that it's one of the supported

4:39

architectures ama's docs uh specify that

4:42

they support llama mistl and a couple

4:44

others um but there are a lot of

4:46

different architectures out there so

4:47

just be careful so to do that we just

4:49

want to go into this config file here

4:51

and confirm that under architectures the

4:53

architecture starts with mistal or llama

4:55

and then the rest of this is also fine

4:57

here we see that this is a mistal

4:58

architecture so we're good to proceed

5:00

next we can follow the hugging face

5:02

instructions for downloading the model

5:03

hugging face repositories are just get

5:05

repositories under the hood so if you've

5:07

ever used GitHub or gitlab or something

5:08

like that you should feel right at home

5:10

one exception is that hugging face

5:12

repositories deal with a lot of large

5:14

files specifically the safe tensors that

5:16

I was talking about before and so you're

5:17

going to need to have an extra tool

5:19

installed called get lfs or large file

5:21

storage so just copy paste that command

5:23

and your files will begin downloading

5:25

now one caveat is you won't actually see

5:26

a progress bar as you normally would and

5:28

that's because uh these really large

5:30

files are downloading in the background

5:32

I expect this total download to take

5:33

about 10 minutes depending on your

5:34

internet connection but one pro tip that

5:36

I have is even though you don't have a

5:37

progress bar you can actually just open

5:39

up a second terminal window and then

5:41

look at the status of the downloads

5:42

Yourself by inspecting the files so here

5:44

I'm doing LS which lists all the files

5:46

and then HL basically shows me the size

5:48

of the files in a human readable format

5:50

once the download completes we'll have

5:52

all the files needed to convert into one

5:53

single GG UF file that we already know

5:55

how to use with olama first CD into the

5:58

directory of the model that we just

5:59

downloaded

6:00

next I'm going to paste in a command

6:01

straight from the AMA docs this is a

DÉBLOQUER PLUS

Inscrivez-vous gratuitement pour accéder aux fonctionnalités premium

VISUALISEUR INTERACTIF

Regardez la vidéo avec des sous-titres synchronisés, une superposition réglable et un contrôle total de la lecture.

INSCRIVEZ-VOUS GRATUITEMENT POUR DÉBLOQUER

RÉSUMÉ IA

Obtenez un résumé instantané généré par l'IA du contenu de la vidéo, des points clés et des principaux enseignements.

INSCRIVEZ-VOUS GRATUITEMENT POUR DÉBLOQUER

TRADUIRE

Traduisez la transcription dans plus de 100 langues en un seul clic. Téléchargez dans n'importe quel format.

INSCRIVEZ-VOUS GRATUITEMENT POUR DÉBLOQUER

CARTE MENTALE

Visualisez la transcription sous forme de carte mentale interactive. Comprenez la structure en un coup d'œil.

INSCRIVEZ-VOUS GRATUITEMENT POUR DÉBLOQUER

DISCUTER AVEC LA TRANSCRIPTION

Posez des questions sur le contenu de la vidéo. Obtenez des réponses alimentées par l'IA directement à partir de la transcription.

INSCRIVEZ-VOUS GRATUITEMENT POUR DÉBLOQUER

TIREZ LE MEILLEUR PARTI DE VOS TRANSCRIPTIONS

Inscrivez-vous gratuitement et débloquez la visionneuse interactive, les résumés IA, les traductions, les cartes mentales, et plus encore. Aucune carte de crédit requise.