Your Mac Has Hidden VRAM… Here's How to Unlock It

6m 18s · 1,283 words · 173 segments · English

FULL TRANSCRIPT

0:00

If your Apple silicon machine, like a

0:02

MacBook or a little Mac Mini, doesn't

0:04

have much RAM, and you still want to run

0:06

large language models that are decently

0:08

sized, you can download more RAM. You

0:10

can't download more RAM, but there is a

0:12

trick you can do to allow your system to

0:15

use more memory. Here, I've got the base

0:16

model M4 MacBook Air, and it only has 16

0:19

GB of unified memory. So, when I open up

0:21

LM Studio, and I try to load GPT-OSS

0:26

20B, a pretty popular model, look at

0:28

that. It's 11.28 GB on disk, which

0:31

should be okay for a 16 GB machine,

0:34

right? Let's load it up. Load model.

0:37

There it goes. It's trying, desperately

0:39

trying to load that model. Oh, what

0:41

happened here? Oh no, failed to load the

0:44

model. Oh, please check the settings and

0:46

try loading it again. That's not going

0:48

to help. You can't just check the

0:50

settings without doing anything and try

0:51

loading it again. That's like the

0:53

definition of insanity. You need to make

0:55

some changes. So, let's take a look at

0:56

LM Studio. You can go to settings down

0:58

here and if you go to the hardware tab,

1:00

it'll show you what you have available.

1:02

There you go. Apple M4, RAM is 16 GB, and

1:06

VRAM is 11.84 GB. So, LM Studio actually

1:10

tells you you can't use all of 16 GB.

1:13

There's some memory that needs to be

1:14

allocated for other tasks like your

1:16

operating system and other things you're

1:18

running. There's things like the kernel,

1:20

I/O buffers, WindowServer, GPU driver,

1:23

compressed memory pool, and all the

1:24

background tasks. You can see some of

1:26

these actually running. If you go to

1:27

Activity Monitor and take a look, there

1:29

they are. Hey, look at that. They're

1:30

using some memory. Now, macOS is pretty

1:33

good at handling this stuff, putting

1:34

aside what's not being used, compressing

1:37

the rest and things like that. But there

1:38

is a limit to it, right? If you're

1:40

trying to load a model that's going to

1:41

be bigger than this 11.84 GB with

1:45

context, that is. 'Cause when you start to

1:47

load GPT-OSS 20B, estimated memory usage

1:50

is 9.26 GB, total is 12.34 GB. So, we need a

1:55

little bit more, don't we? I'm going to

1:56

show you how to download this RAM. All

1:58

right, enough of that joke. I'm not

1:59

going to show you how to download RAM.
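The gap between the estimate and the VRAM figure quoted above can be checked with a small sketch. This is not LM Studio's own logic: the function name and the simple subtraction are illustrative assumptions, and only the three figures come from the video.

```python
# Figures read out in the video, in GB.
MODEL_ESTIMATE_GB = 9.26   # estimated memory usage for the model
TOTAL_ESTIMATE_GB = 12.34  # estimated total, including context
VRAM_LIMIT_GB = 11.84      # VRAM reported on the hardware tab


def shortfall_gb(required_gb: float, available_gb: float) -> float:
    """Return how many GB are missing, or 0.0 if the request fits."""
    return max(0.0, required_gb - available_gb)


missing = shortfall_gb(TOTAL_ESTIMATE_GB, VRAM_LIMIT_GB)
print(f"Short by about {missing:.2f} GB")
```

The model alone would fit under the limit; it is the total with context that goes over.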

2:01

I'm going to show you this command here

2:02

right now. If you take a look at this

2:04

command, sudo, which means you're going

2:06

to need to execute this as an admin

2:08

account.

2:09

>> Hey, mom. Some guy on the internet told

2:10

me to run a weird command on my

2:12

computer.

2:13

>> Okay, just be careful, dear.

2:14

>> It's like, hey, Mac, what's the maximum

2:16

amount of memory the GPU is allowed to

2:18

lock and keep for itself? Right, kid?

2:20

>> Huh? And then this command will show you

2:22

what this limit is set to right now. You

2:24

need to enter your password. And usually

2:27

it's zero by default. And that's the

2:28

default setting, but you can change

2:30

that. 8192. Why the weird number, Alex?

2:33

Well, these all have to be powers of

2:35

two. Like memory, it goes 2, 4, 8, uh, 16,

2:40

and it goes up from there. 8192 is 4,096

2:44

× 2. These are all powers of two, and

2:46

it's used all throughout computing.

2:49

That's how things just work really well.
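The arithmetic behind 8192, and the shape of the command, can be sketched as follows. The sysctl key name `iogpu.wired_limit_mb` is an assumption here: the command is shown on screen in the video, but the transcript only describes it as the wired limit in megabytes. The helper names are hypothetical, and the sketch only builds the command strings; it does not run them.

```python
MB_PER_GB = 1024  # memory sizes use binary multiples

# Assumed key name; the transcript does not spell it out.
SYSCTL_KEY = "iogpu.wired_limit_mb"


def gb_to_mb(gb: int) -> int:
    """Convert whole gigabytes to megabytes."""
    return gb * MB_PER_GB


def is_power_of_two(n: int) -> bool:
    """True when n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def read_command() -> str:
    """Command string that would print the current limit."""
    return f"sudo sysctl {SYSCTL_KEY}"


def set_command(limit_mb: int) -> str:
    """Command string that would set the limit to limit_mb."""
    return f"sudo sysctl {SYSCTL_KEY}={limit_mb}"


limit_mb = gb_to_mb(8)
print(limit_mb, is_power_of_two(limit_mb))
print(read_command())
print(set_command(limit_mb))
```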

2:50

So, I'm going to set that. And now, if I

2:52

run that initial command to check, and

2:54

now it's set to 8192. Let's restart LM Studio.

2:57

What do you think is going to happen

2:58

here? I've just set the memory limit in

3:01

megabytes to basically half of 16 GB.

3:04

So, if I check hardware, look at that.

3:06

VRAM is now 8 GB instead of 11.84. So, it's

3:09

even less. You get where I'm going here,

3:11

right? I'm just showing you that

3:13

basically you can alter it. You can make

3:15

it less, you can make it more. And LM

3:17

Studio is a good tool to show you it

3:18

graphically, which is pretty cool. So

3:20

now, let's go a little higher so we can

3:22

actually load that model. Well, should

3:24

we set it to 16 GB? No, because if you

3:27

do, you're going to break things. Why

3:30

the heck not? You know what? This

3:31

channel is for that kind of thing. I

3:33

break things so you don't have to. I'm

3:35

going to set it to 16,000 and this is

3:38

probably not going to be good. Yeah,

3:41

it's set to 16,000. Let's quit LM

3:43

Studio. Hey, maybe it'll work. I don't

3:45

know. Let's see. Things are still

3:46

working and snappy. Let's check

3:48

hardware. VRAM is set to 15.63 GB. Um,

3:54

I'm pretty sure that's not a good idea.
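The 15.63 GB figure is what you get when the megabyte value is divided by 1,024 rather than 1,000. A minimal sketch of that conversion follows; the function name is illustrative, and the idea that LM Studio rounds the result for display is an assumption.

```python
MB_PER_GB = 1024  # binary multiple, matching how the limit is expressed


def mb_to_gb(limit_mb: int) -> float:
    """Convert a limit in megabytes to gigabytes."""
    return limit_mb / MB_PER_GB


# The two limits tried so far in the video.
for limit_mb in (8192, 16000):
    print(limit_mb, "MB =", mb_to_gb(limit_mb), "GB")
```

A limit of 16,000 MB works out to 15.625 GB, slightly less than the full 16 GB (16,384 MB).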

3:56

This is probably not very safe, but I'm

3:58

going to go ahead and try to load this

3:59

20 billion parameter model. Load. Let's

4:01

see what's happening here. Let's take a

4:03

look at Activity Monitor and the

4:05

memory pressure. The memory used is

4:07

going up. And there it goes. It's going

4:09

up. Oh boy. This is making me a little

4:11

nervous here. Look at that. It's orange

4:13

now. And now it's going back down. And

4:16

it loaded. It loaded.

4:18

What? Okay, I'm going to keep an eye on

4:20

that little window there while I try to

4:23

run a prompt. New chat. Write a story.

4:26

Boom. It doesn't matter what the prompt

4:28

is. It's just going to generate text no

4:29

matter what. And it's working. That's

4:31

crazy. The memory used is 15.65 GB. The

4:35

memory pressure is crazy high. It's not

4:37

in the red though. Sometimes it turns

4:38

red and that's really not good. But this

4:41

is actually working. Do I recommend that

4:43

you set that setting? No. You want to

4:45

leave a little bit of room unless this

4:47

is all you're doing with this machine.
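One way to pick a limit that leaves a little room, as a rough sketch. The function name and the 2 GB headroom are assumptions for illustration, not a recommendation from macOS or LM Studio.

```python
MB_PER_GB = 1024


def wired_limit_mb(total_ram_gb: int, headroom_gb: int) -> int:
    """Limit in MB that keeps headroom_gb free for the rest of the system."""
    if not 0 <= headroom_gb < total_ram_gb:
        raise ValueError("headroom must be non-negative and smaller than total RAM")
    return (total_ram_gb - headroom_gb) * MB_PER_GB


# 16 GB machine, holding back 2 GB for background processes.
print(wired_limit_mb(16, 2))
```

With 16 GB installed and 2 GB held back, this gives 14336.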

4:49

If you're operating your Mac Mini in

4:50

headless mode, for example, and you can

4:52

do that on all the Apple silicon

4:53

machines, Mac Studios, in fact, uh back

4:56

here I have a cluster of Mac Studios

4:58

where I can run really gigantic models

5:01

sharing the memory between all those. I

5:03

had to go in there and manually adjust

5:05

the available memory. Each one of those

5:07

boxes has 512 GB of memory, but by

5:10

default, that wired limit in megabytes

5:12

is set to zero, which means it's

5:14

going to be calculated as about 75% of

5:17

what's available. So, if you want to

5:19

squeeze a little bit more out of that to

5:21

be able to run larger models, that's the

5:23

setting you want to use. Now, going back

5:25

to a small machine like this one, look

5:26

at that. 16.72 tokens per second. I'm

5:29

going to actually set the memory limit

5:31

to something more reasonable. And this

5:33

is the number I'm going to use, 14336 on

5:36

a 16 gigabyte machine, which is still

5:39

not the safest thing you can do, but

5:41

it'll allow me to have a couple of

5:43

gigabytes for some background

5:45

processes and still have enough room to

5:47

run GPT-OSS. Going to restart LM Studio

5:50

because it needs to restart to detect

5:52

those changes. And yeah, so 14 GB now is

5:55

allocated to VRAM. And now I can load my

5:58

model, GPT-OSS 20B. There's that memory

6:00

pressure going up again. But in the end,

6:02

it loaded. Write a story, thought for a

6:05

brief moment, and off it goes. Now, if

6:07

you want to see some huge models

6:09

running, check out my video on that

6:10

cluster over there. That video is right

6:12

over here. Thanks for watching, and I'll

6:13

see you next time.
