The AI Models, Agents, and Subs I'm Currently Using
I get a lot of questions about which model I'm using, which harness I'm using, and whether or not the new $200 sub from whatever provider is actually worth it. So I wanted to make a video going over my current opinions, as of March 30th, 2026, on all of this stuff. I'm including the date in this intro because, unfortunately, I don't know how well this video is going to age, but I want to try and make this a recurring video, entirely because things are changing so damn fast. That's why I'm filming it now: I want to have this thing to point to that says these are the models, harnesses, and subs I am currently using and would recommend to any of my friends or anyone I'm working with. I put together a site called State of AI. It's just AI.davis7.sh, linked down below in the description, and all it is is a snapshot of how I felt at this moment in time.
Starting with the models: my main default model is GPT 5.4. If you look at my token consumption over the course of a week, I would guesstimate that 80 to 90% of it is going through GPT 5.4. It's incredible at instruction following. It has very up-to-date and recent training data; that was a problem OpenAI models had for a long time, and they just don't really have it anymore. It's incredible at computer-use-type stuff. It's incredible at running within a normal agent harness. And it's able to solve problems that I just can't get any other model to solve. GPT 5.4 is the king for basically everything right now. There is nothing better. I really didn't like GPT models last year; I was a huge Claude guy and loved how they felt to use. But after about a week of using 5.3 Codex a couple months ago, I finally got it and it clicked. Once you get there, it's an incredible model.
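For context, the "normal agent harness" that keeps coming up is, at its core, just a loop: call the model, execute whatever tools it asks for, feed the results back in, and repeat until it answers without requesting a tool. Here's a toy sketch of that loop; the `fake_model` and `read_file` tool are obviously hypothetical stand-ins, and real harnesses like Codex add sandboxing, streaming, and far richer tool sets.

```python
# A toy version of the agent-harness loop these models run in:
# call the model, run any tool it requests, feed the result back,
# repeat until it answers without requesting a tool.

def fake_model(messages):
    """Stand-in for an LLM API call: requests one tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "README.md"}}
    return {"answer": "The README says hello."}

# Hypothetical tool registry mapping tool names to callables.
TOOLS = {"read_file": lambda path: f"contents of {path}: hello"}

def run_agent(prompt, model, max_steps=10):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:            # model is done
            return reply["answer"]
        tool = TOOLS[reply["tool"]]      # model asked for a tool
        result = tool(**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish")

print(run_agent("What does the README say?", fake_model))
# prints "The README says hello."
```

Being "good at running in a harness" mostly means a model reliably picks the right tool, reads the result, and knows when to stop, which is exactly what this loop exercises.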
There is one thing that it absolutely sucks at, and that's UI, which is why Opus 4.6 is on this list. I pull out Opus 4.6 when I need to touch front-end code. The UIs that GPT 5.4 spits out are genuinely horrific. I don't know what happened, but it is a massive regression in UI capability compared to even GPT 5; it's just bad at making anything involving front end. Opus isn't. You give it a front-end design task and it will just do the job. It is incredible. It actually feels really fast to use, even faster than GPT 5.4.
And I think a lot of the reason for that
is because it's very prone to action.
Something you'll notice with Claude models is that they feel great to talk to. The voice of Claude is probably the best, at least among the frontier labs. Like I said, it biases towards action, so it feels faster, and in a lot of ways better, than GPT, because it'll just do things. But it lacks the discernment and wisdom that 5.4 has. When I was working on the original version of BTCA, which was a massive TypeScript monorepo, Opus would make very subtle bad changes that ended up having cascading effects in completely different packages across the monorepo, in weird ways you wouldn't catch just by looking at the code it generated, because the code looked right; it just wasn't right in the broader context. That discernment is the thing 5.4 has over Opus, and the reason it's my daily driver and Opus isn't. The other big problem with Opus is that it's really expensive. Like really, really expensive. Another good anecdote comes from chat (I record these videos live; you should come by sometime): one viewer was doing some cybersecurity work and asked it to sniff some traffic on his Wi-Fi. GPT just flat out refuses, but Claude will do it. I think this is, again, just the way GPT is: it's very strict about security stuff, and very careful not to make breaking changes or do anything "dangerous," quote unquote. In this case, I think that's exactly what's happening. So yeah, if you need to do something like that, Claude does make a lot of sense. I don't hate
this model by any stretch of the
imagination. It's just not my main
coding model. 5.4 Mini, to me, is a massive sleeper pick. It feels a lot like a Sonnet-type model even though it says mini on it. It's way smarter than you would think, it's really, really fast, and it's great at tool calling. It keeps a lot of the characteristics GPT 5.4 has: it's good at following instructions, good at working in a loop, and good at high-level discernment and not making dumb mistakes. The trade-off, of course, is that it is not nearly as smart as GPT 5.4. But if you need to do subagent-type tasks, like searching a codebase, searching the web, or making some small change on your computer, 5.4 Mini is a surprisingly useful model that I would recommend picking up more than you
probably are. The Gemini models I really don't have too much to say about, other than that they do have a place. I was benchmarking different models for taking a blob of something (in this case, emails) and converting it into JSON that parses out what each one is for, how important it is, all that kind of thing. I put together a benchmark to see which one was best at that specific task, and Gemini won pretty handily. It's a great multimodal model.
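The benchmark itself isn't shown, but the shape of a task like this is easy to sketch. Here's a minimal, hypothetical scorer; the field names (`purpose`, `importance`) and the scoring rule are my assumptions, not from the video. Each model's raw output is parsed as JSON, checked for the expected fields, and compared against a hand-labeled answer.

```python
import json

# Fields we expect in the model's structured output.
# (Hypothetical schema; the video doesn't specify one.)
REQUIRED_FIELDS = {"purpose", "importance"}

def score_output(raw: str, expected: dict) -> float:
    """Score one model response for the email-to-JSON task.

    Returns 0.0 if the output isn't valid JSON or is missing a
    required field; otherwise the fraction of expected fields
    the model got exactly right.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return 0.0
    if not REQUIRED_FIELDS <= parsed.keys():
        return 0.0
    matches = sum(parsed[k] == v for k, v in expected.items())
    return matches / len(expected)

# Example: one labeled email, three candidate model outputs.
expected = {"purpose": "billing", "importance": "high"}
print(score_output('{"purpose": "billing", "importance": "high"}', expected))  # 1.0
print(score_output('{"purpose": "billing", "importance": "low"}', expected))   # 0.5
print(score_output("not json", expected))                                      # 0.0
```

In practice you'd run each candidate model over the same set of labeled emails and average this score, which is roughly how a benchmark like the one described would pick a winner.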
It's just not a great coding model, because it cannot run in a coding agent harness to save its life. The only place I would say the Gemini models are usable for coding is Cursor, and that's just because they spent a ton of time and effort refining the system prompt and tool names for Gemini models so that they wouldn't make mistakes. The last
model I put on this list is Composer 2.0. It is actually an RL'd version of Kimi: about 25% of the compute went into making Kimi 2.5, which is the base model, and then 75% of the compute for Composer, the actual training, went into refining it on Cursor's data. And it's a really good model. It's absurdly fast and way better at front end than you would expect. Anytime I'm doing pair-programming-type stuff, where I'm reading the code, I'm in the editor, I'm thinking specifically about it, and I just want it to make some specific change in a file, Composer is a great model. The problem is you can't use it outside of the Cursor harnesses. They have a lot of really good harnesses, but you're still limited: it doesn't work in something like pi or opencode, or even over the API. And it's not nearly as smart as an OpenAI or Anthropic model, but it's still a good model. And you'll probably
notice that I don't have any of the
Chinese models on this list. And the
reason for that is because I'm only
covering the stuff that I'm using
heavily day-to-day, and I have real
experience with. I've tested a lot of these other models, and I think they're cool, but I don't have enough experience to say anything intelligent about them other than parroting other people's opinions, and I just don't think that's useful. So that's why they're not in here. But there is something else
that's in here, and that's today's
sponsor. There's a really weird split I've been seeing where on our machines we have these insanely powerful agents like Codex and Claude Code, but in the cloud we're still just running normal Vercel AI SDK streamText loops. A lot of the reason for that is that Codex and Claude Code benefit a ton from having a real computer to work with, which is why sandboxes are so damn valuable right now. And that's why I'm really hyped about Upstash's new box product. You might be thinking: wait, isn't Upstash the company that just has that really nice Redis instance? And yes, they are.