TRANSCRIPT (English)

The AI Models, Agents, and Subs I'm Currently Using

21m 25s · 4,917 words · 665 segments · English

FULL TRANSCRIPT

[0:00] I get a lot of questions about what model am I using, what harness am I using, whether or not the new $200 sub from whatever provider is actually worth it. And I wanted to make a video going over my current opinions, as of March 30th, 2026, on all of this stuff. I'm including the date in this intro because, unfortunately, I don't know how well this video is going to age, but I think I want to try and make this a recurring video, entirely for the fact that things are changing so damn fast. But it's how I'm feeling now, and I want to go over it, so I have this thing to point to: these are the models, harnesses, and subs that I am currently using and would recommend to any of my friends or anyone I'm working with. I put together this site called State of AI. It is just AI.davis7.sh. It'll be linked down below in the description. And all it is is a snapshot of how I felt at this moment in time.

[0:45] Starting with the models, my main default model is GPT 5.4. If you look at my token consumption over the course of a week, I would guesstimate that 80 to 90% of it is going through GPT 5.4. It's incredible at instruction following. It has very up-to-date and recent training data; that was a problem that OpenAI models had for a long time that it just doesn't really have anymore. It's incredible at, like, computer-use-type stuff. It's incredible at running within a normal agent harness. And it's able to solve problems that I just can't get any other model to solve. GPT 5.4, for basically everything, is the king right now. There is nothing better. I really didn't like GPT models last year. I was a huge Claude guy. I loved how they felt to use. But after about a week of using 5.3 Codex a couple months ago, I finally got it and it clicked. But once you get there, it's an incredible model.

[1:35] There is one thing that it absolutely sucks at, and that's UI, which is why Opus 4.6 is on this list. I pull out Opus 4.6 when I need to touch front-end code. The UIs that GPT 5.4 spits out are genuinely horrific. I don't know what happened, but it is a massive regression in UI capability compared to even GPT 5. It's just bad at making anything involving front end. Opus isn't. You give it the front-end design skill and it will just do the job. It is incredible. It actually feels really fast to use, even faster than GPT 5.4. And I think a lot of the reason for that is because it's very prone to action.

[2:09] Something you'll notice with Claude models is that they feel great to talk to. The voice of Claude is probably the best, at least of all the frontier labs. Like I said, it biases towards action. So it feels faster, and it feels better in a lot of ways than GPT does, because it'll just do things. But it lacks the discernment and wisdom that 5.4 has. When I was working on the original version of BTCA, which was a massive TypeScript monorepo, it would make very subtle bad changes that would end up having cascading effects in completely different packages across the monorepo, in weird ways that you wouldn't realize just looking at the code it generated, because the code looked right, but it wasn't right in the broader context.

[2:49] That's the thing that 5.4 has over Opus, and the reason why it's my daily driver and not Opus. It's that discernment. Also, another big problem with Opus is that it's really [ __ ] expensive. Like really, really expensive. Another good anecdote from chat (I do record these videos live, you should come by sometime): in his case, he was trying to do some cybersecurity stuff and asked it to sniff some stuff on his Wi-Fi. GPT just flat out refuses, but Claude will do it. I think that this is, again, just kind of the way GPT is. It is very strict about security stuff and is very careful to not make breaking changes and not to do anything "dangerous," quote unquote. And in this case, I think that's exactly what that is. So yeah, if you need to do something like that, Claude does make a lot of sense. I don't hate this model by any stretch of the imagination. It's just not my main coding model.

[3:32] 5.4 Mini, to me, is a massive sleeper pick. It feels a lot like a Sonnet-type model, even though it says "mini" on it. It's way smarter than you would think it is. It is really, really [ __ ] fast, and it's great at tool calling. It keeps a lot of those characteristics that GPT 5.4 has, where it's good at following instructions.

[3:52] It's good at working in a loop. It's good at high-level discernment and not making dumb mistakes. But of course, the trade-off to all that is that it is not nearly as smart as GPT 5.4. But if you need to do sub-agent-type tasks, like searching a codebase, or searching the web, or making some small change on your computer, 5.4 Mini is actually a surprisingly useful model that I would recommend picking up more than you probably are.

[4:15] The Gemini models I really don't have too much to say about, other than that they do have a place. I was benchmarking different models for taking a blob of something (in this case it was emails) and converting it into JSON that parses out what they're for, how important they are, all that kind of thing. And I put together a benchmark to see which one was the best at that specific task. And Gemini won pretty handily. It's a great multimodal model.
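The kind of benchmark described here can be scored mechanically once you have a hand-labeled expected JSON object per email: parse each model's raw output, count how many fields match, and average. This is a minimal sketch of that scoring idea, not the speaker's actual harness; the field names (`category`, `importance`, `action_required`) are hypothetical.

```python
import json

# Hypothetical fields for the email-to-JSON extraction task.
FIELDS = ["category", "importance", "action_required"]

def score_output(model_text: str, expected: dict) -> float:
    """Fraction of expected fields the model got right.
    Unparseable output scores 0, which also measures how reliably
    a model emits valid JSON at all."""
    try:
        got = json.loads(model_text)
    except json.JSONDecodeError:
        return 0.0
    hits = sum(1 for f in FIELDS if got.get(f) == expected.get(f))
    return hits / len(FIELDS)

def rank_models(results: dict) -> list:
    """results maps model name -> list of (raw_output, expected_json)
    pairs. Returns (name, mean_score) tuples, best model first."""
    means = {
        name: sum(score_output(out, exp) for out, exp in pairs) / len(pairs)
        for name, pairs in results.items()
    }
    return sorted(means.items(), key=lambda kv: -kv[1])
```

In practice you would populate `results` by running each model over the same batch of emails; the ranking then falls out of a plain sort on mean score.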

[4:38] It's just not a great coding model, because it cannot run in a coding agent harness to save its life. The only place where I would say the Gemini models are usable for coding is Cursor, and that's just because they spent a ton of time and effort refining the system prompt and tool names for Gemini models so that they wouldn't make mistakes.

[4:56] The last model I put on this list is Composer 2.0. It is actually an RL'd version of Kimi: 25% of the compute went into making Kimi 2.5, which is the base model, then 75% of the compute for Composer (actually training it) went into just refining it through Cursor's data. And it's a really good model. It's absurdly fast, way better at front end than you would expect. And anytime I'm doing pair-programming-type stuff, where I'm reading the code, I'm in the editor, I'm thinking specifically about it, and I just want to have it make some specific change in a file, Composer is a great model for this. The problem is you can't use it outside of the Cursor harnesses. They have a lot of really good harnesses, but you're still limited: it doesn't work in something like Pi or opencode, or even over the API.

[5:39] And it's not nearly as smart as an OpenAI or Anthropic model, but it's still a good model. And you'll probably notice that I don't have any of the Chinese models on this list. The reason for that is because I'm only covering the stuff that I'm using heavily day-to-day and have real experience with. I've tested a lot of these other models. I think they're cool, but I don't have enough experience to say anything intelligent about them other than parroting other people's opinions, and I just don't think that's useful. So that's why they're not in here.

[6:02] But there is something else that's in here, and that's today's sponsor. There's a really weird split that I've been seeing where, on our machines, we have these insanely powerful agents like Codex and Claude Code, but then in the cloud we're still just running, like, normal Vercel AI SDK streamText loops. A lot of the reason for that is because Codex and Claude Code benefit a ton from having a real computer to work with, which is why sandboxes are so damn valuable right now. And that's why I'm really hyped about Upstash's new box product. And you might be thinking, wait, isn't Upstash the company that just has that really nice Redis instance? And yes, they are.
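The "normal streamText loop" being contrasted with full agent harnesses here is, at its core: call the model, execute whatever tool call it requests, append the result to the conversation, and repeat until it produces a final answer. This is a language-agnostic sketch of that shape with a stubbed model; in the actual Vercel AI SDK the equivalent would be `streamText` with a `tools` map, and everything below (message shapes, the stub) is illustrative only.

```python
def run_loop(model, tools: dict, prompt: str, max_steps: int = 8) -> str:
    """Minimal tool-calling loop: the model either returns a final
    answer or requests one tool call, whose result is fed back."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(messages)  # stand-in for a real LLM API call
        if reply["type"] == "answer":
            return reply["content"]
        # Tool call requested: run it, append the result, loop again.
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    raise RuntimeError("loop did not converge")

def stub_model(messages):
    """Fake model: asks for the time once, then answers with it."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"type": "tool_call", "tool": "clock", "args": {}}
    return {"type": "answer", "content": f"It is {tool_msgs[-1]['content']}"}

print(run_loop(stub_model, {"clock": lambda: "12:00"}, "what time is it?"))
# prints: It is 12:00
```

The point of the split the speaker describes: a loop like this only ever touches the tools you hand it, whereas harnesses like Codex and Claude Code want a whole filesystem and shell, which is what sandbox products provide in the cloud.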
