TRANSCRIPT (English)

The AI Models, Agents, and Subs I'm Currently Using

21m 25s · 4,917 words · 665 segments · English

FULL TRANSCRIPT

0:00

I get a lot of questions about what model I'm using, what harness I'm using, and whether or not the new $200 sub from whatever provider is actually worth it. I wanted to make a video going over my current opinions, as of March 30th, 2026, on all of this stuff. I'm including the date in this intro because, unfortunately, I don't know how well this video is going to age, but I want to try to make this a recurring video, entirely because things are changing so damn fast. This is how I'm feeling now, and I want to go over it so I have this thing to point to: these are the models, harnesses, and subs that I am currently using and would recommend to any of my friends or anyone I'm working with. I put together a site called State of AI; it's just AI.davis7.sh. It'll be linked down below in the description. All it is is a snapshot of how I felt at this moment in time.

0:45

Starting with the models: my main default model is GPT 5.4. If you look at my token consumption over the course of a week, I would guesstimate that 80 to 90% of it is going through GPT 5.4. It's incredible at instruction following. It has very up-to-date, recent training data; that was a problem OpenAI models had for a long time that they just don't really have anymore. It's incredible at computer-use-type stuff. It's incredible at running within a normal agent harness. And it's able to solve problems that I just can't get any other model to solve. GPT 5.4 is the king for basically everything right now; there is nothing better. I really didn't like GPT models last year. I was a huge Claude guy; I loved how they felt to use. But after about a week of using 5.3 Codex a couple months ago, I finally got it and it clicked. Once you get there, it's an incredible model.

1:35

There is one thing it absolutely sucks at, and that's UI, which is why Opus 4.6 is on this list. I pull out Opus 4.6 when I need to touch front-end code. The UIs that GPT 5.4 spits out are genuinely horrific. I don't know what happened, but it's a massive regression in UI capability compared to even GPT 5; it's just bad at making anything involving front end. Opus isn't. You give it the front-end design and it will just do the job. It is incredible. It actually feels really fast to use, even faster than GPT 5.4, and I think a lot of the reason for that is that it's very prone to action.

2:09

Something you'll notice with Claude models is that they feel great to talk to. The voice of Claude is probably the best of all the frontier labs'. Like I said, it biases toward action, so it feels faster, and in a lot of ways better, than GPT does, because it'll just do things. But it lacks the discernment and wisdom that 5.4 has. When I was working on the original version of BTCA, which was a massive TypeScript monorepo, Opus would make very subtle bad changes that ended up having cascading effects in completely different packages across the monorepo, in weird ways you wouldn't catch just looking at the code it generated, because the code looked right but wasn't right in the broader context.

2:49

That discernment is the thing 5.4 has over Opus, and the reason it's my daily driver and not Opus. Another big problem with Opus is that it's really [ __ ] expensive. Like really, really expensive. Another good anecdote comes from chat (I record these videos live; you should come by sometime): one viewer was trying to do some cybersecurity stuff and asked it to sniff some traffic on his Wi-Fi. GPT just flat out refuses, but Claude will do it. I think this is, again, just the way GPT is: it's very strict about security stuff and very careful not to make breaking changes or do anything "dangerous." In this case, I think that's exactly what that is. So yeah, if you need to do something like that, Claude does make a lot of sense. I don't hate this model by any stretch of the imagination; it's just not my main coding model.

3:32

5.4 Mini, to me, is a massive sleeper pick. It feels a lot like a Sonnet-type model even though it says "mini" on it. It's way smarter than you would think, it's really, really [ __ ] fast, and it's great at tool calling. It keeps a lot of the characteristics that GPT 5.4 has: it's good at following instructions.

It's good at working in a loop. It's good at high-level discernment and not making dumb mistakes. The trade-off, of course, is that it's not nearly as smart as GPT 5.4. But if you need to do sub-agent-type tasks, like searching a codebase, searching the web, or making some small change on your computer, 5.4 Mini is a surprisingly useful model that I'd recommend picking up more than you probably are.

4:15

The Gemini models I really don't have too much to say about, other than that they do have a place. I was benchmarking different models on taking a blob of something, in this case emails, and converting them into JSON that parses out what they're for, how important they are, all that kind of thing. I put together a benchmark to see which one was best at that specific task, and Gemini won pretty handily. It's a great multimodal model.
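A benchmark like that can be sketched with a tiny scoring function. This is a hypothetical sketch, not the actual benchmark from the video: the field names ("category", "importance") and gold labels are made up for illustration.

```python
import json

# Hypothetical gold labels for one email; the fields are illustrative,
# not the actual benchmark schema.
GOLD = {"category": "billing", "importance": "high"}

def score_output(raw: str, gold: dict) -> float:
    """Fraction of gold fields the model got right; 0.0 if its reply isn't valid JSON."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return 0.0
    return sum(1 for k, v in gold.items() if parsed.get(k) == v) / len(gold)

# A model reply with extra fields still scores on just the fields we care about.
reply = '{"category": "billing", "importance": "high", "summary": "Invoice due"}'
print(score_output(reply, GOLD))  # 1.0
```

Run over a pile of labeled emails, the average of this score per model is enough to pick a winner for one narrow task, which is exactly the kind of comparison being described.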

It's just not a great coding model, because it cannot run in a coding agent harness to save its life. The only place where I'd say the Gemini models are usable for coding is Cursor, and that's just because they spent a ton of time and effort refining the system prompt and tool names for Gemini models so that they wouldn't make mistakes.

4:56

The last model I put on this list is Composer 2.0. It's actually an RL'd version of Kimi: 25% of the compute went into making Kimi 2.5, which is the base model, and then 75% of the compute for Composer, the actual training, went into refining it on Cursor's data. It's a really good model. It's absurdly fast and way better at front end than you would expect. Anytime I'm doing pair-programming-type stuff, where I'm reading the code, I'm in the editor, I'm thinking specifically about it, and I just want it to make some specific change in a file, Composer is a great model. The problem is you can't use it outside of the Cursor harnesses. They have a lot of really good harnesses, but you're still limited: it doesn't work in something like pi or opencode, or even over the API. And it's not nearly as smart as an OpenAI or Anthropic model. But it's still a good model.

5:45

You'll probably notice that I don't have any of the Chinese models on this list. The reason is that I'm only covering the stuff I'm using heavily day-to-day and have real experience with. I've tested a lot of these other models, and I think they're cool, but I don't have enough experience to say anything intelligent about them other than parroting other people's opinions, and I just don't think that's useful. So that's why they're not in here.

6:03

But there is something else in here, and that's today's sponsor. There's a really weird split I've been seeing: on our machines we have these insanely powerful agents like Codex and Claude Code, but in the cloud we're still just running normal Vercel AI SDK streamText loops. A lot of the reason for that is that Codex and Claude Code benefit a ton from having a real computer to work with, which is why sandboxes are so damn valuable right now. And that's why I'm really hyped about Upstash's new box product. You might be thinking, wait, isn't Upstash the company that just has that really nice Redis instance? And yes, they are.
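The cloud-side "streamText loop" shape mentioned above is roughly: the model proposes tool calls, the loop executes them and feeds results back until the model answers. This is a minimal stdlib sketch of that shape with a stubbed model, not the actual Vercel AI SDK; every name in it is illustrative.

```python
# Stubbed "model": asks for one tool call, then answers. A real cloud
# agent would call a hosted model API here instead.
def stub_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_time", "args": {}}
    return {"answer": f"it is {messages[-1]['content']}"}

def get_time():
    return "12:00"

TOOLS = {"get_time": get_time}  # the fixed, predeclared tool set

def run_agent(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(5):  # cap iterations so the loop always terminates
        reply = stub_model(messages)
        if "answer" in reply:
            return reply["answer"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "gave up"

print(run_agent("what time is it?"))  # it is 12:00
```

The contrast being drawn in the video: agents like Codex and Claude Code replace the fixed tool dictionary here with a whole computer (shell, filesystem), and a cloud sandbox is what gives a hosted loop that same capability.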


    The AI Models… - Full Transcript | YouTubeTranscript.dev