
Gemma 4 Has Landed!

18m 34s · 3,380 words · 478 segments · English

FULL TRANSCRIPT

0:00

Okay, so Google has just dropped Gemma 4, and this is four new models with multimodality, thinking, function calling, the works. And honestly, that alone would get me covering this. But that's not even the interesting part. The interesting part is the license. Gemma 4 ships under an Apache 2 license. Not a custom license with weird restrictions, with the whole sort of "open weights, but don't compete with us" clauses. This is an actual, real Apache 2 license, which means for the first time you can take Google's best open model, modify it, fine-tune it, deploy it commercially, do whatever you want with it. No strings attached. And when we combine that with what's inside these models, we're talking about 128 experts here, native audio, native vision, built-in reasoning, all of that becomes a pretty big deal.

0:56

Okay, so let me give you a quick orientation, because there are four models and the naming is a little bit confusing here. Gemma 4 comes in two tiers. You've got what they're calling your workstation models: a 31 billion parameter dense model and a 26 billion parameter mixture-of-experts model with 4 billion parameters active. And then you've got your edge models: the E2B and the E4B.
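As a rough sketch of what these sizes mean in practice, you can estimate the weight-memory footprint from the parameter counts mentioned above. Note that the E2B/E4B parameter counts below are assumptions inferred from the model names (roughly 2B and 4B effective parameters), not figures stated in the video:

```python
# Rough weight-memory estimates for the Gemma 4 lineup discussed above.
# The 31B and 26B figures come from the transcript; the E2B/E4B counts
# are assumptions inferred from the names, not confirmed numbers.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just for the weights, in GB."""
    return params_billions * 1e9 * bytes_per_param / 1e9

models = {
    "31B dense": 31.0,
    "26B MoE (total)": 26.0,
    "E2B (assumed ~2B)": 2.0,
    "E4B (assumed ~4B)": 4.0,
}

for name, size_b in models.items():
    fp16 = weight_memory_gb(size_b, 2.0)  # 16-bit weights: 2 bytes/param
    q4 = weight_memory_gb(size_b, 0.5)    # ~4-bit quantization: 0.5 bytes/param
    print(f"{name}: ~{fp16:.0f} GB at fp16, ~{q4:.0f} GB at 4-bit")
```

This is weights only; actual memory use also depends on the KV cache and runtime overhead.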

1:22

Now, these are tiny, efficient models designed to run on phones, Raspberry Pis, Jetson Nanos, and really pretty much anywhere at the edge where you need a good quality model.

1:33

Now, I've covered the Gemma line of models since the original release. I covered Gemma 3 on the channel, and I know back then, while a lot of people were very impressed with it, they were kind of frustrated with some of the things around the license. So you had this capable model, but a license with enough restrictions that a lot of people went with Llama or went with Qwen instead. So the Apache 2.0 move here is Google basically saying, "Okay, fine. We'll play by the same terms as some of the other open model providers out there." And in fact, as we're talking about this, some of the other open model providers in China are actually pulling back their latest releases and not making them open like they have in the past.

2:11

So the other big thing up front here is that Google is saying that these are built from Gemini 3 research. So basically, the architecture innovations that went into some of their flagship commercial models are now slowly trickling down into the open weights models.

2:28

So if you've been running local models, and I know a lot of you have, the landscape has kind of settled into this pattern. We've kind of gone past the Llama models. We've now got sort of Qwen and Mistral, and they're all sort of competing on benchmarks in this sort of fixed parameter range for dense models. But we've also seen that, up until recently, most of these models were text only, or at best text plus vision. If you want audio, you're kind of bolting on Whisper, you're bolting on some external ASR pipeline. And often, if you wanted something like function calling, you're kind of hoping that the model cooperates with your prompt template. So what Gemma 4 is doing here is shipping all of that natively in a single model family: vision, audio, thinking, function calling. And all four of these are actually built in at the architecture level, not sort of bolted on after the fact. All right, so

3:22

one of the key things that makes Gemma 4 better than the previous Gemma series is that it now has the ability to do sort of long chain-of-thought reasoning. And we've seen clearly that this can improve outputs and get you better final answers, etc. Now, not only can it reason across text, but it can reason across different modalities. So it can reason across images, if you wanted to basically pass in an image and make use of that. And for the first time, you can actually reason across audio. So that is also cool in here. Obviously, this ability to do long chain of thought has improved a lot of the benchmarks that are out there, and they're getting really strong results on MMMU Pro as well as SWE-bench Pro.

4:07

Along with the reasoning comes function calling. So anything you want to do that's agentic, you want to basically be using function calling and tools. So this has integrated a lot of the research they put into the function Gemma model, which they released at the end of last year. But now this is in both the small models and the bigger models. So a lot of people will think that this is not that new. But really, the way people did this kind of function calling in the past was actually just having the model be better at instruction following and then sort of coaxing it into it. Gemma 4 actually has function calling baked into it from scratch. So this is sort of optimized for multi-turn agentic flows, allows you to work with multiple tools, and it really shows up in some of the agentic benchmarks and tasks that you can do.
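To make the multi-turn, multi-tool idea concrete, here is a minimal, hypothetical sketch of the loop an agentic harness runs around a function-calling model. The `call_model` parameter stands in for whatever inference API you actually use (it is not a real Gemma API), and the tool names and message shapes are invented for illustration:

```python
import json

# Hypothetical tool registry: the model picks a tool by name and supplies
# JSON arguments. These names and signatures are invented for illustration.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "add": lambda a, b: a + b,
}

def dispatch(tool_call: dict):
    """Execute one tool call of the form {"name": ..., "arguments": {...}}."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

def agent_loop(messages: list, call_model) -> str:
    """Multi-turn loop: keep feeding tool results back into the conversation
    until the model returns a final text answer instead of a tool call."""
    while True:
        reply = call_model(messages)      # stub for real model inference
        if reply.get("tool_call") is None:
            return reply["content"]       # final answer, loop ends
        result = dispatch(reply["tool_call"])
        messages.append({"role": "tool",
                         "content": json.dumps({"result": result})})

# Example with a canned "model" that calls one tool, then answers:
def fake_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}
    return {"tool_call": None, "content": "The sum is 5."}

print(agent_loop([{"role": "user", "content": "What is 2+3?"}], fake_model))
```

The point of a model with function calling trained in from scratch is that the first step (emitting a well-formed `tool_call` rather than free text) becomes reliable, instead of something you coax out via the prompt template.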

4:57

All right, I mentioned earlier in the reasoning that the two smaller models (unfortunately not all four models, but the two smaller models) actually have audio support, and that audio support is a lot better than what we had in Gemma 3N and some of the previous Gemma models that had audio support. This means that you can do things like ASR and transcription, but you can also do speech-to-translated-text. So I'll show you that when we go through the walkthrough. On top of this, the audio encoder is not only better, but it's just a lot smaller. So this helps a lot for anything that you want to do at the edge with these models, since you're just not going to be using as much device storage and memory.

5:38

Another thing when comparing Gemma 4 to, say, the Gemma 3N series is the image encoder. The image encoder in those Gemma 3N models, while it was good, really was a bit old-fashioned in the way they did it. It didn't handle things like aspect ratios well.
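To see why aspect ratios matter: a naive encoder squashes every image to a fixed square, which distorts wide documents and hurts OCR. One common fix, sketched here in plain Python as a generic technique (not Gemma's documented pipeline; the 896 target size is an assumption), is to scale while preserving the aspect ratio and pad the rest:

```python
def letterbox_size(width: int, height: int, target: int = 896):
    """Scale (width, height) to fit inside a target x target square without
    distortion, returning the resized dimensions and the padding needed.
    The default target of 896 is an assumed encoder input size."""
    scale = target / max(width, height)           # preserve aspect ratio
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h
    return (new_w, new_h), (pad_w, pad_h)

# A wide document scan: squashing it straight to a square would
# compress every character horizontally, while letterboxing keeps
# the glyph shapes intact and just pads the bottom.
size, pad = letterbox_size(1792, 896)
print(size, pad)
```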

5:56

And because of that, you would often see that it didn't do a great job for things like OCR, etc. The Gemma 4 models basically have native support for these interleaved multi-image inputs. My guess, from playing with it, is that it's probably had a decent amount of OCR and document understanding training in there. And because you can do that sort of multi-image input, you can actually do video here and have reasoning across those multiple images. So generally, just comparing Gemma 4 against Gemma 3 and Gemma 3N, you've got a lot more updates, both with the smaller models supporting audio and with better multimodality support. And whereas Gemma 3N only had a context window of 32K, even the small models in Gemma 4 have a context window of 128K, and then 256K for the bigger models.

6:47

All right, so let's talk about some of these architecture choices and the model sizes themselves. So the mixture-of-experts model is 26 billion total parameters, but only 3.8 billion are active at any time. Now, they haven't gone for a huge number of experts like we've seen some of the other models go for recently.
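As a back-of-the-envelope check on the numbers just mentioned (26 billion total, 3.8 billion active), the per-token compute saving of the MoE design is roughly the ratio of total to active parameters:

```python
# Rough arithmetic for the MoE figures discussed above.
total_params_b = 26.0   # total parameters, in billions (from the transcript)
active_params_b = 3.8   # parameters active per token, in billions

active_fraction = active_params_b / total_params_b
print(f"Active per token: {active_fraction:.1%} of the weights")
print(f"Rough per-token compute saving vs dense: "
      f"{total_params_b / active_params_b:.1f}x")
```

Note this is a compute estimate only: all 26B parameters still have to sit in memory, which is why the model is cheap to run per token but not correspondingly cheap to load.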

7:06

They've got 128 of these sort of tiny experts, eight being activated for each token, plus one sort of shared, always-on expert. So if we compare that to the Gemma 3 model, where the largest model was a 27 billion parameter dense model, obviously in that case you are using all 27 billion at the same time. So roughly, this is giving you sort of the intelligence of a 27B model with the compute cost of something around a 4B model. Now, this you can certainly run on sort of consumer GPUs, and I'm sure that even as I'm recording this, before it comes out, we will see this on Ollama, on LM Studio, etc. And Google themselves are also releasing the QAT checkpoints, that's the quantization-aware training
