Gemma 4 Has Landed!
Okay, so Google has just dropped Gemma 4, and this is four new models with multimodality, thinking, function calling, the works. Honestly, that alone would get me covering this. But that's not even the interesting part. The interesting part is the license. Gemma 4 ships under an Apache 2 license. Not a custom license with weird restrictions, with the whole "open weights but don't compete with us" clauses. This is an actual, real Apache 2 license, which means for the first time you can take Google's best open model, modify it, fine-tune it, deploy it commercially, do whatever you want with it. No strings attached. And when we combine that with what's inside these models (we're talking about 128 experts here, native audio, native vision, built-in reasoning), all of that becomes a pretty big deal.
Okay, so let me give you a quick orientation, because there are four models and the naming is a little bit confusing here. Gemma 4 comes in two tiers. You've got what they're calling the workstation models: a 31 billion parameter dense model and a 26 billion parameter mixture of experts model with about 4 billion parameters active. And then you've got the edge models, the E2B and the E4B. These are tiny, efficient models designed to run on phones, Raspberry Pis, Jetson Nanos, and pretty much anywhere at the edge where you need a good quality model.
Now, I've covered the Gemma line of models since the original release. I covered Gemma 3 on the channel, and I know back then, while a lot of people were very impressed with it, they were kind of frustrated with some of the things around the license. So you had this capable model, but a license with enough restrictions that a lot of people went with Llama or went with Qwen instead. So the Apache 2.0 move here is Google basically saying, "Okay, fine, we'll play by the same terms as some of the other open model providers out there." And in fact, as we're talking about this, some of the other open model providers in China are actually pulling back their latest releases and not making them open like they have in the past.
So the other big thing up front here is that Google is saying these are built from Gemini 3 research. Basically, the architecture innovations that went into some of their flagship commercial models are now slowly trickling down into the open weights models. So if you've been running local models, and I know a lot of you have, the landscape has kind of settled into a pattern. We've kind of gone past the Llama models; we've now got Qwen and Mistral, and they're all competing on benchmarks in this fixed parameter range for dense models. But we've also seen that, up until recently, most of these models were text only, or at best text plus vision. If you wanted audio, you were bolting on Whisper or some external ASR pipeline. And often, if you wanted something like function calling, you were kind of hoping the model cooperates with your prompt template.
So what Gemma 4 is doing here is shipping all of that natively in a single model family: vision, audio, thinking, and function calling, all four actually built in at the architecture level, not bolted on after the fact.
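To make that concrete, here's a minimal sketch of what a single multimodal call could look like, assuming Gemma 4 lands in Hugging Face transformers the way Gemma 3n did. The model ID, the file names, and the exact class names here are my assumptions, not a confirmed API.

```python
# A minimal sketch of one multimodal call: text + image + audio through a
# single model, no external ASR or vision pipeline bolted on in front.
# "google/gemma-4-e4b-it" is a guessed repo name, not a confirmed one.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "google/gemma-4-e4b-it"  # hypothetical checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "photo_of_receipt.jpg"},
        {"type": "audio", "audio": "voice_note.wav"},
        {"type": "text", "text": "Summarize the receipt and the voice note together."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```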
one of the key things that makes Gemma 4
better than the previous Gemma series is
that it now has the ability to do sort
of long chain of thought reasoning. And
we've seen clearly that this can improve
outputs and can get you better final
answers, etc. Now, not only can this
reason across text, but it can reason
across different modalities. So, it can
reason across images if you wanted to
basically pass in an image and make use
of that. And for the first time, you can
actually reason across audio. So, that
is also cool in here. Obviously, this
ability to do the long chain of thought
has improved a lot of the benchmarks
that are out there and they're getting
really strong results on the MMU Pro as
well as Sweetbench Pro. Along with the
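If the thinking mode is exposed through the chat template the way other open reasoning models (Qwen3, for example) do it, toggling it could look something like this. The enable_thinking flag and the model ID are assumptions, not confirmed API.

```python
# A sketch of toggling long chain-of-thought, assuming Gemma 4's chat
# template exposes an enable_thinking-style switch like other open
# reasoning models do. The flag name and model ID are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-31b-it"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "A train leaves at 9:40 and arrives at 13:05. How long is the trip?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,   # assumed flag: emit reasoning before the answer
    tokenize=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, which would include the
# reasoning trace followed by the final answer.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```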
Along with the reasoning comes function calling. So anything you want to do that's agentic, you want to be using function calling and tools. This has integrated a lot of the research they put into the Function Gemma model, which they released at the end of last year, but now it's in both the small models and the bigger models. A lot of people will think this is nothing new, but really the way people did this kind of function calling in the past was just having the model be better at instruction following and then sort of coaxing it into it. Gemma 4 actually has function calling baked in from scratch. So it's optimized for multi-turn agentic flows, it lets you work with multiple tools, and it really shows up in some of the agentic benchmarks and tasks.
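Here's a rough sketch of what that could look like through transformers' standard tools interface, assuming Gemma 4's chat template supports it. The model ID and the exact tool-call output format are assumptions.

```python
# A minimal function-calling sketch, assuming Gemma 4's chat template
# accepts tool schemas via the standard transformers `tools` argument.
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"18C and cloudy in {city}"  # stub implementation

model_id = "google/gemma-4-e4b-it"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What's the weather like in Lisbon?"}]

# transformers converts the function signature and docstring into a JSON
# schema, so a natively trained model can emit structured tool calls
# instead of being coaxed into a format by the prompt.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The nice part of doing it this way is that the schema the model sees stays consistent across turns, which is exactly what multi-turn agentic flows need.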
All right, I mentioned earlier in the reasoning section that the two smaller models (unfortunately not all four models, but the two smaller models) actually have audio support, and that audio support is a lot better than what we had in Gemma 3n and some of the previous Gemma models with audio. This means you can do things like ASR and transcription, but you can also do speech-to-translated-text. I'll show you that when we go through the walkthrough. On top of this, the audio encoder is not only better, it's a lot smaller, which helps a lot for anything you want to do at the edge, since these models just won't use as much device storage and memory.
Another thing, comparing Gemma 4 to, say, the Gemma 3n series, is the image encoder. The image encoder in those Gemma 3n models, while it was good, was a bit old-fashioned in the way they did it. It didn't handle things like aspect ratios well, and because of that, you'd often see it didn't do a great job on things like OCR. The Gemma 4 models have native support for interleaved multi-image inputs, and my guess, from playing with it, is that it's probably had a decent amount of OCR and document understanding training in there. And because you can do that sort of multi-image input, you can actually do video here and have reasoning across those multiple images.
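Since video here is really just interleaved frames, a sketch might sample frames yourself and pass them as multiple images. The model ID is hypothetical, and I'm assuming the chat template accepts PIL images.

```python
# A sketch of video input as interleaved frames, assuming Gemma 4 accepts
# multiple images per message like other interleaved-image models.
# Frame sampling via OpenCV; the model ID is a placeholder.
import cv2
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "google/gemma-4-e4b-it"  # hypothetical checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

def sample_frames(path, n=8):
    """Grab n evenly spaced RGB frames from a video file as PIL images."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // n)
        ok, frame = cap.read()
        if ok:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames

# Build one message that interleaves all sampled frames before the question.
content = [{"type": "image", "image": f} for f in sample_frames("clip.mp4")]
content.append({"type": "text", "text": "Describe what happens across these frames."})

inputs = processor.apply_chat_template(
    [{"role": "user", "content": content}],
    add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```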
So, generally, comparing Gemma 4 against Gemma 3 and Gemma 3n, you've got a lot of updates on both fronts, with the smaller models supporting audio and better multimodality all round. And whereas Gemma 3n only had a context window of 32K, on Gemma 4 even the small models have a context window of 128K, and the bigger models go to 256K.
All right, so let's talk about some of these architecture choices and the model sizes themselves. The mixture of experts model is 26 billion total parameters, but only 3.8 billion are active at any time. Now, they haven't gone for a huge number of experts like we've seen some of the other models do recently. They've got 128 of these tiny experts, with eight activated for each token, plus one shared always-on expert. If we compare that to Gemma 3, where the largest model was a 27 billion parameter dense model, in that case you're obviously using all 27 billion at the same time. So roughly, this is giving you the intelligence of a 27B model with the compute cost of something around a 4B model.
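Quick back-of-envelope on why that's the claim, using just the numbers above:

```python
# Numbers from the video: 26B total parameters, 8 of 128 routed experts
# plus one shared expert active per token, ~3.8B parameters active.
total_params = 26e9
active_params = 3.8e9

active_fraction = active_params / total_params
print(f"active fraction per token: {active_fraction:.1%}")  # ~14.6%

# Per-token decode FLOPs scale roughly with active parameters
# (~2 FLOPs per parameter), which is why this lands near the compute
# cost of a ~4B dense model rather than a 27B one.
dense_27b_flops = 2 * 27e9
moe_flops = 2 * active_params
print(f"compute vs 27B dense: {moe_flops / dense_27b_flops:.1%}")  # ~14%
```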
Now, this you can certainly run on consumer GPUs, and I'm sure that even as I'm recording this, before it comes out, we'll see it on Ollama, on LM Studio, etc. And Google themselves are also releasing the QAT checkpoints, that's the quantization-aware training