Cursor, Claude Code and Codex all have a BIG problem
FULL TRANSCRIPT
You've probably seen me shifting around
my dev tools over the last few years
from Copilot to Supermaven to Cursor to Claude Code to Codex to the Codex app
and many, many more things. But I've
noticed something about all of these
solutions. A core problem that seems to
be holding all of them back. You've
probably noticed it, too, but maybe not
in the same way that I have, or maybe
not understood the underlying problem
that's causing it. To put it frankly, they all suck. Like, they're just bad. I come from an era where the tools we used were so carefully, finely crafted to behave super well, and I long for the days of Sublime Text almost every day when I open up Cursor, watch it shift my UI around whenever I click anything, and then nothing does what it's supposed to do.
You'd think with all of these AI tools
making it easier than ever to write code
that we'd have things that work better,
right?
About that. That's actually the thing I
really want to talk about today. The
fact that all of these tools were
written with these models is not an
advantage. In many ways, it's a
disadvantage. The problem with Cursor, Claude Code, Codex, and all of these other tools is that they were built with those same tools. Historically, this has been a good thing. When you write the C compiler in C, you're able to do awesome things: it makes you a good target for dogfooding the language, and it makes the thing you're maintaining much easier for its users to maintain. Generally speaking, dogfooding your stuff is good. Generally speaking. And now I have to drop a controversial take: I don't think it was the right call for a lot of these
companies. In fact, I think their choice
to bet on these things as early as they
did might be causing them a lot of
problems. This video is going to ruffle
some feathers and sadly many of those
feathers belong to friends of mine,
companies in my portfolio, etc. I am an early investor in Cursor. I technically have something tied to Anthropic; I don't really know, cuz it was a scout check into Bun that's now part of Anthropic. With Codex, funny enough, I have no financial tie to anything OpenAI beyond using their stuff. So as always, account
for bias. I do my best to be transparent
but I'm also about to talk about these
companies in a way that is going to piss
all of them off and probably get me into
some trouble. So this is going to be
fun. If all of my early investments are
going to go to zero due to all the
things I'm talking about here, I need to
make sure we make some money. So we're
going to take a quick break for today's
sponsor before I start roasting. If
you've been around for a while, you
probably remember how much I loved
today's sponsor, Augment. They were my
favorite way to figure out what's going
on in really big code bases because
their indexing engine was incredible.
But then I started using other tools and
as such stopped using Augment as much.
Can't our agents figure out what they
need to know nowadays anyways? Well,
kind of. But you don't get that absurd
level of immediate responsiveness that I fell in love with in Augment. So, they decided to put this
out for everyone to use via a tool you
can add to whatever agents you're using
today. As you guys know, I like Codex quite a bit, but it has a bad habit of searching forever trying to find information. Here I have the huge T3 Chat codebase, and sometimes Codex can just search this forever trying to find things. Well, Augment has their own CLI
you can use, and it's really cool. I
just opened it and told it to index the
codebase. Where it gets way cooler is
when you use it in other tools. So, now
it is indexed. Check this out. We're
going to switch over to Codex. It just
booted the MCP. Let's ask it for
something a bit annoying: trace the
logic around how subscriptions work for
paying users and it immediately uses the
codebase retrieval tool that is provided
by augment. This tool is weirdly capable
of finding exactly what you need and
almost nothing else. See how quickly it
found all of this. This is real time.
We're under 20 seconds in. This would
have taken Codex like 5 to 10 minutes
before and possibly wouldn't have been
as accurate because it might have missed
things that were related. Indexing your
codebase makes it possible to find
related information that doesn't come up
via grep, and the results end up
consistently being way better than I
would have expected for the exact same
prompt. Working on a small codebase or a
small project, this isn't going to be a
groundbreaking thing. But for those of
us that rotate between small and big
code bases, it makes them both feel
almost identical to work in, which is a
magical unlock for those of us building
bigger and bigger things. Augment always
felt like they understood enterprise and
big business work better than the
competition and have never felt it more
than I have using the context engine in
other tools. If you feel like your
agents don't understand your codebase,
fix that now at soyv.link/augment. So
if we want to explain the problem with Cursor, Claude Code, and Codex really simply: they suck to use.
They're super inconsistent. Not just
because the models are
non-deterministic, but because the
actual code that you're relying on is
really bad. They change every day and in
ways that are obnoxious. One of the most
common comments I get on my videos is,
"How do you get the agent and editor
tabs on the navbar in Cursor?" If you're
not familiar, Cursor had this wonderful
agents editor tab toggle in the corner
here where you could switch between an
agent mode where you were just looking
at and managing the agents and the
editor was secondary and the traditional
editor mode that we all expect something
like Cursor to be. I loved this and I switched between them all the time. I would
spend most of my time in the agent mode
and then switch to the editor mode when
I really wanted to dig into the code.
Soon after this launch (this was the Cursor 2.0 launch), I went to the Cursor office to talk about a lot of the
problems I was having. I was like this close to making what was going to be the Cursor crash-out video. I ended up ranting a lot about my problems with Cursor. It
went kind of viral on Twitter. They had me over to the office. They sweet-talked me.
They bought me some chocolate. Told me
things that I'm still horrified about to
this day. And then I walked off. They
have made progress since. But one of the things they told me (the thing I told them was [ __ ] terrible and stupid and that they should not do) is that they decided to remove the agent editor toggle in favor of letting people customize it more. So instead of that,
we now have all these fun buttons, these
toggles, the change layout, which
underneath has the agent and editor
layout, but those aren't things you
toggle between. Those are default
template starting points for how you
might want to configure your editor. And
now everything's [ __ ] broken. I
switch to editor mode. I switch to agent
mode. The sidebars change where they
are. So here the sidebar for the agent
stuff is on the left. When I switch to
the agent mode, it ends up moving to the
right. There was a hotkey to toggle
between these things before. I think
they changed it. But now when I do
almost any of the things I did before... oh, [ __ ], of course I leaked my [ __ ] email. The fact there's a one-click leak
email button in the editor is enough of
a reason for me to curse them out. What the [ __ ], Cursor.
I don't think there's any product that
has caused me to leak my email more than
[ __ ] Cursor has. I'm very thankful I
use my GitHub email and not one that
actually matters because it's leaked in
half the times I've opened the [ __ ]
app. You see what's happening? I'm
trying to not crash out at Cursor both
because I'm an investor and because I'm
afraid of the company, and also just cuz crash-out content isn't my favorite
thing to do. But half the time I open this editor, it pisses me off. Just being real: it is falling the [ __ ] apart.
It's bad. Regardless, the thing I'm
trying to say is the feature that was in
my video that people liked enough that
they're asking for it in the comment
section just got deprecated for no
[ __ ] reason at all. There's no reason
to have removed the agent editor toggle.
I think it is dumb that they did that. I
told them in the office it was stupid
that they were doing that. They did it
anyways. And now I get questions all the
time like, "Where's that button? That
seems really useful." I agree. It was
really useful. It [ __ ] sucks. They
removed it.
Okay, calm down. Stop just raging at Cursor. Command-Q and go back to VS Code for now. Please, Cursor, let me switch back. I want to, you know, I want to.
You gave me a bunch of credit to
incentivize that I use it. And I still
don't because I don't like using your
editor right now. I use Cursor in the
browser more than I use it in the
[ __ ] IDE at this point.
Anyways, if you thought my Cursor crash-outs were bad, wait till we start talking about Claude Code. Oh man, the fact that a CLI has even more non-deterministic [ __ ] behavior than a fork of a giant app like VS Code. Like, just seriously, imagine this. Imagine five years ago somebody told you that they were moving away from IDEs and toward CLIs because they wanted something simpler and less buggy, and you had to sadly respond, "Sorry about that. The CLIs are actually more buggy." And they just stare at you like, "What the [ __ ] did you just say? The CLI is more buggy? What? How the [ __ ] did we get here?" Obviously, something like
pasting images is not going to be the
most consistent thing in the world in a
terminal UI, but the fact that it is so
non-deterministic and broken is absurd.
It's absurd enough that it caused me to
just [ __ ] lose my [ __ ] on stream last
week. It just took so long. There are so
many things with Claude Code that are driving me mad. What just happened there? Because you don't know what keys I was pressing. When you paste an image in Claude Code, it takes time, and the
images are often big enough that it has
to run local compression before
uploading it to their server. It doesn't
block the input when that happens and it
doesn't show you that anything's
happening. So, I just submitted a
message while it was waiting for the
image to attach. It submitted without
the image attached because it doesn't
block or wait until it's done. We even
figured this out with T3 chat in our
first month. And then when it was done,
it wasn't there. It didn't even show it
in the little section on top here. So I
repasted it and sent it. And when I sent
the follow-up, it... are you kidding? Literally, while I am explaining this, it just failed to compact. I do not get how anyone does serious work in Claude Code. It is not a serious application. It is
what the hell is going on? This model,
this harness, this ecosystem feels like
it is burning. What the [ __ ]? That's the
moment I decided I had to do this video.
I still feel rage from that one. The
amount of things that went wrong there.
First off, pasting an image didn't block
the input. So, I accidentally submitted
without the image. But, it turns out if
you submit a message while the image is
still processing, it will silently be
attached to the next thing you send. So,
I pasted a second image accidentally
because I wanted the model to know what
the UI looked like. So, I pasted the
screenshot again. It showed me I had one
attachment. I sent it now has two.
Obnoxious. I went to go demonstrate
this, pasted a third image, and before I
could even send it, I hit the context
limit because it failed to correctly
compress the other image because of the
weird race condition that existed. And
as a result, the thread was dead. There
was no reviving this thread. The work was gone. It just died. It's over.
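To make that failure shape concrete, here's a minimal Python sketch of the race as described. This is not Claude Code's actual implementation, just the pattern: pasting kicks off async compression, and submitting neither blocks nor waits for it.

```python
import asyncio

class BuggyComposer:
    """Sketch of the described race: paste starts async work, submit ignores it."""

    def __init__(self):
        self.ready = []  # attachments that finished compressing

    async def paste_image(self, name):
        await asyncio.sleep(0.05)   # stand-in for slow local compression
        self.ready.append(name)     # silently lands on whatever is sent NEXT

    def submit(self, text):
        msg = {"text": text, "attachments": list(self.ready)}
        self.ready.clear()          # still-compressing pastes are simply absent
        return msg

async def demo():
    c = BuggyComposer()
    paste = asyncio.create_task(c.paste_image("screenshot.png"))
    await asyncio.sleep(0)                            # paste has started, not finished
    first = c.submit("why does this UI look wrong?")  # sent mid-compression
    await paste
    second = c.submit("follow-up")                    # the image rides along here
    return first, second

first, second = asyncio.run(demo())
# first goes out with no attachments; the screenshot silently attaches to second
```

The fix is equally small: have submit() await any in-flight paste task, or at least block input while one is pending, before building the message.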
It's done. When was the last time you
used a dev tool that wasn't some crazy
AI crap that like one little bug in the
UI could just throw away that whole path
of work and force you to restart from
scratch? It's meme tier. It's [ __ ]
meme tier. I actually cannot fathom that
this is how we are writing code every
day now. It is absurd. Like, the anti-AI people should not be talking about how, I don't know, [ __ ] Copilot sucks or whatever the hell they're bitching about. Just come use these tools and then talk [ __ ]. And that's not even the performance. The performance is unacceptable; we'll get to that. But the state of the UX, the lack of consistency, the fact that it just feels like I'm the only person using
it because I don't see anyone else
bitching about these things. It's so
bad. I have yet to have an experience
with Claude code that didn't feel like
they were forcing an old broken UI into
the terminal with all sorts of
non-deterministic [ __ ]. The worst
part isn't that it fails all the time.
It's that it never fails the same way
twice. It's a slop fest. And that's the
theme I want to drive home here. All of
these [ __ ] tools committed to vibe coding way too hard, way too early. And
sure, vibe coding is a slur, whatever.
We all have our terms we like to use.
What I'm referring to here is letting go
a bit and letting the agent do its
thing. Steering less, coding more, and
being a little too willing to accept the
code that the agent made. You can say
I'm exaggerating about this here that
there's no way that they actually vibe
coded Claude Code. Want to [ __ ] bet? "Within Anthropic, Claude Code is quickly becoming another tool we can't do without. Engineers and researchers across the company use it for everything from major code refactors to squashing commits to generally handling the toil of coding." This was on February 24th. Wait, it's not February 24th. Oh, February 24th of last [ __ ] year, when the best model available was Sonnet 3.7. They were already using Claude Code for the majority of their work on Sonnet 3-fucking-7. That's the problem here.
Sonnet 3.7 was a very impressive model. It was able to do tool calls reliably. It was meaningfully smarter than 3.5. It could do some little UI fixes in ways that were better than before. Maybe, just maybe, if your merge conflict was simple enough, it could help you resolve it. Maybe. You cannot build a serious application with Sonnet 3 [ __ ] 7.
Let's be real here. This is the thing I
want to emphasize.
They committed to using AI to code too
early because they wanted to build
things using AI, using the tools they
were making to maximize the chance that
they would make something good. Both so
they could iterate faster, but also
because they wanted to commit to using
AI for things as they're building tools
to use AI for things. And what resulted
is a total [ __ ] slopfest. To better
explain this, cuz I know a lot of you
guys have not had traditional jobs. And
even for those who have, you probably
were brought into a giant codebase that
has already existed for years. The way
that code bases work over time is very
interesting. If we chart a very, very vague quality over time, the way that working in a codebase tends to go is this: initially it starts off okay, it quickly gets to a nice place, it dips a bit, you care more, you restore it, but eventually you hit a plateau and it stops improving. And when I say quality here, I mean a lot of different things. I mean the quality of the experience you're providing users. I
mean the quality of the patterns that
exist in the codebase that we're relying
on. I mean things like the packages
you're using and what versions of them you're on. I mean the likelihood that
you'll ever upgrade the package in the
first place. If I init a new project and
I'm on React 19 and a month later React
20 comes out, I'm probably going to
update. If I've been using that codebase
for 4 years and React 20 comes out, I'm
much less likely to update. Codebase
inertia is real. Every codebase has a
point in time where the quality of
working in it stops improving and it
stops being a focus of the team. The
amount of time it takes to get there can
vary a lot based on different things. I
would argue generally speaking 3 to 6
months of focused effort from the team
is what you get, and at that point the quality of the codebase is the bar that you're going to hit. Like, things
can get worse. Absolutely. Believe me,
things will always get worse. But the
quality of your codebase 6 months in is
the best it will ever be. If you don't
have the codebase exactly how you want
it and working in the way that you want
to work in it at the 6-month mark, it
will be a downhill ride and you will
never see the light again. And I think
that's what's happened to a shitload of
these projects. If the 6-month mark was
a pile of vibecoded slop that you wrote
with Sonnet 3.5 and 3.7, you're [ __ ]
It's not going to get better. And this
is the problem. A lot of these projects
are no longer months in. They're now
many years in. Claude Code is roughly a
year and three or so months in. And it
was really good for those first 6
months. It felt like it was improving
consistently up until, I don't know, August-ish, September maybe. And now I've
just felt the downward trend as it gets
buggier, as it gets slower. It's now at
the point where I'll just... I'll show you
my favorite one here. I'm just going to
open Claude Code and I'm going to
immediately start typing. This is me.
Okay, it wasn't that bad that time. Half
the time when I open Claude Code, it
doesn't actually start recognizing my
inputs until I'm a word or two words in.
How the [ __ ] is a CLI app locking the
input box? This happening on the web was
a meme that kind of destroyed the web's
reputation. The idea that your keys
would be sticky, it would take too long
to start showing you the changes. And
now we have it in our terminals. Like,
what? Ah, apparently opencode's bad
about this, too. Let's try typing.
Typing. Wow, I got to type a lot before
I started recognizing characters. Here's
what we'll do. I'm just going to run my
finger like 1 to 10.
I got all the way to the end
and it still wasn't recognizing inputs.
That is pretty hilarious. That's what
you get for getting out of that mode.
They just did the database migration, so
that's cool. So, let's try one last
time.
Okay, got to eight and nine. Yeah,
insanity. Uh, how did we get here? I do sympathize with the Cursor team a bit because they started with one of the
most complex giant code bases and none
of their engineers had worked on it
before because they forked VS Code. That
is a genuinely difficult challenge. Taking
something as big as VS Code and making
meaningful changes and having to
maintain it over time while also making
sure to bring in whatever is needed from
the actual OG codebase when things are
being upstreamed. That sucks. It's real
difficult maintenance work and they've
now diverged so far from the original
that bringing things in is basically
nonviable. And honestly, it might be
time to cut ties with that codebase. I think, as crazy as it sounds, considering how many hundreds of engineers have been working on it for years, it might be better to wipe their hands clean and start from scratch.
Maybe bring the harness over and nothing
else. I don't know. I'm not in the
codebase. I don't know how it works. All
I know is that it sucks to use. Claude Code has no [ __ ] excuse. Claude Code was the start from scratch. But now, because the team insists that 100% of Claude Code's code is written by Claude Code, the slop fest continues to the
point where it's easier to buy their
core dependency of Bun and hope it can
fix the hellish performance issues
they've caused for themselves than it is
to just unfuck what they've done. Can we take a moment to appreciate the level of absurdity there? Spending an absurd amount of money to buy one of the most talented native developers in the industry, and his team, building a JavaScript runtime in Zig so they can make things more performant. Acquiring that just to try and hopefully, maybe, make your CLI tool not use 2 gigs of RAM. What?
How are we here? Like the meme isn't,
oh, they used React and React's bad. The
meme is that they vibe coded the whole
[ __ ] thing and now they're in a pile
of slop.
So what do we have so far? Codebase
inertia is real. You will not top the
quality of your codebase 6 months in for
the rest of the time that codebase
exists. And everybody who bet early on models like Sonnet 3.5, the old GPT models, all of these early models used for code: all of them slopped up their [ __ ] so fast that there is no return. And to be clear, modern models are great. Opus 4.6 and Codex 5.3? They're miracle workers. They can do things we never would have imagined. They are much worse off trying
to clean up bad patterns than they are
trying to make new ones. And here is
where the harsh reality is. Let's say
your code base is decent. 90% of it is
good. We'll have green be good and we'll
say whatever is left here, 10, 20,
whatever you want to measure it as. This
other section here is bad. This is your
codebase. Now the code base needs to get
bigger. More people are coming in. More
changes are being made. Code base is
quickly growing. Generally speaking, it
doesn't matter if you're vibe coding, if
you're using AI tools, or if you're just
hiring traditional people, the codebase
gets copied around. The patterns used in
one place will be used in others. And
generally speaking, the pattern that is
being used is the one that is the
easiest to find and the easiest to copy.
Sadly, the ones that are the easiest to
find and copy are very rarely the good
patterns. So, what ends up happening is
the good parts of your codebase expand
linearly, and the bad ones tend to
expand exponentially. So, once you have the starting point, we'll say the point you're at at that 6-month mark, the way things go over time is that the bad parts exponentially increase and the good parts linearly increase. So, you very quickly end up in
a slop fest. The models accelerate this.
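That linear-versus-exponential claim can be sketched as a toy model. Every number here is invented purely for illustration; the point is the shape, not the figures:

```python
# Good patterns get added roughly linearly; bad (convenient) patterns get
# copied in proportion to how much of them already exists, so they compound.
good, bad = 9_000.0, 1_000.0  # lines at the 6-month mark: 90% good
GOOD_PER_MONTH = 500.0        # linear growth of good code
BAD_COPY_RATE = 0.25          # bad code gets cloned at 25% per month

for month in range(24):       # two more years of development
    good += GOOD_PER_MONTH
    bad *= 1 + BAD_COPY_RATE

bad_share = bad / (good + bad)
print(f"after two more years: {bad_share:.0%} slop")  # prints "after two more years: 91% slop"
```

A codebase that was 90% good at the 6-month mark ends up overwhelmingly slop, not because the good work stopped, but because the convenient-to-copy stuff compounds.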
Codex loves referencing the codebase for examples to use in the work that it does. Codex will very happily copy a
[ __ ] pattern from somewhere in your
codebase and apply it somewhere else
because it thinks it passed your bar.
It's in the codebase. And honestly,
that's fair. One of the best moments I
had in my time at Twitch is when I filed
my first PR to the new web repo that
became the Twitch site, the rewrite of the entire site in React. And I made some dumb changes in it. And when I was asked about those changes, because one of the people reviewing it was like, "Wait, why did you do that?", I showed them the code example I found and the page in the docs that led me there. They were like, "Oh,
that's really bad. This shouldn't happen
anywhere." So before I even got to fix
my PR, they updated the docs to no
longer steer in that way. And they filed
multiple PRs, removing any other place
with similar patterns. So it was less likely that a less experienced TypeScript dev
like myself would end up in that
position. The agents accelerate that. If
a junior Theo can come in and make a
dumb mistake because he copied code from
somewhere else, the agents can make that
10 times faster. And they do. So if you're not starting from a really good spot before the agents take over your codebase, you're probably a little bit [ __ ]
And I think that's what's happening. No
model can be better than the code it is
starting with. And the code these things start with is garbage, because a lot of
it was written by worse models in the
past. And I can tell you from experience
that this is the case across almost
everything I've worked in. That's why it
was nearly exactly on that six-month
mark that I made the move away from my
custom sync engine in T3 chat over to
Convex because I knew we were quickly
approaching the point where we couldn't
make things better in the codebase.
There are lots of subtle improvements
you can make, like changing your linter to something better. You can
apply a new lint rule and clean some
things up. You can upgrade a library
here and there, but after that six-month
mark, the majority of that code is going
to stay there. It just is. And any
patterns you've established already,
those aren't going anywhere. So, how do
we fix this? How do we prevent this? How
do we make our code bases pleasant at
the start and stay pleasant both for
humans and more importantly for agents?
This is actually something I feel somewhat qualified in, specifically because I've had to do this a lot. Not because, like, every other coder was
bad and I was good, but because I cared
a lot about velocity and developer
experience and those things tend to line
up really well with the quality of the
codebase. Not that going faster means the codebase is better; actually, kind of the opposite. The codebase being really well built and laid out makes it easier to make changes fast. So the first thing: optimizing for ease, clarity, and speed.
Really try to establish patterns in your codebase early that are easy to make contributions to, clear in what they do, and fast to change.
Ideally, a small change should only have
to touch a small number of files. And a
big change should probably have to touch
a big number of files. A common mistake I see is people architecting their codebase so big changes are simple, and as a result the small changes end up being really, really complex. If small changes and big changes take the same amount of effort, you [ __ ] up both sides, and it's just so common. This is also why things like Tailwind are so cool: it reduces the number of places that have to be touched to make a change. This is also why things like
GraphQL are bad because you have to
touch way more [ __ ] to just add a little
bit of information to your UI. So
optimizing your codebase so that it is
easier to understand what's going on,
clearer as well, and fast to make
contributions to is great. But there's a
bigger piece here that I really want to
emphasize. Tolerate nothing. If a bad
pattern makes it in, it will multiply.
Bad code multiplies way, way faster than
good does. Because the bad code wouldn't
have made it in if it wasn't convenient.
Bad code and convenient code have a lot
of overlapping characteristics. But bad
code multiplies too aggressively to ever
let it in. You have to be strict about
this. You can't make the exception of,
well, we need to hit this deadline, so
we know this is slow, but we'll fix it
later. Later is another word for never
in the software development world.
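Since "later" never comes, one way to make zero tolerance mechanical is a ratchet check in CI: count occurrences of a known-bad pattern and fail the build if the count ever grows. A sketch, where the banned pattern, file glob, and baseline are all placeholders for whatever your codebase's poison is:

```python
import pathlib
import re

BANNED = re.compile(r"\bas any\b")  # placeholder: a banned TypeScript escape hatch

def count_banned(root):
    """Count occurrences of the banned pattern in .ts files under `root`."""
    total = 0
    for path in pathlib.Path(root).rglob("*.ts"):
        total += len(BANNED.findall(path.read_text(errors="ignore")))
    return total

def ratchet_ok(root, baseline):
    """CI gate: the count may shrink, but it may never grow."""
    return count_banned(root) <= baseline
```

Wire ratchet_ok into CI with today's count as the baseline: any PR that adds another instance fails, and when instances get cleaned up, you lower the baseline so they can never come back.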
You're not going to fix the thing. So,
don't tolerate it. Don't let it in the
codebase. And along those lines, if you
do stumble upon something bad, if you do
find something in the codebase that
smells, don't hesitate. Don't look into
the history, don't question why it's
there. Murder it with intensity. There
is no room in our code bases for slop.
It spreads too fast. If you do happen to
stumble upon it, drop everything to kill
it. I don't care what deadline gets
missed. I don't care what manager is
mad. I don't care what agent is
insisting that it's totally fine. If it
smells, it is bad. And if it is bad, it
should be removed. No tolerance. This does mean you have to pay a bit more attention. Not that you have to read every
line of code, but you need to keep an
eye on the general patterns that your
codebase is evolving. How are these
things interacting with each other? What
are the methods they use to define
things? How concisely can they describe
things? A simple litmus test for this: ask an agent how a feature works. If it can give you a good answer in under three minutes, you're probably fine. If it takes more than five minutes to search things, or it doesn't give you a good answer, either of those things being the case, honestly, throw it out. You'll start to build an intuition for this, not just by, like, reading the code, but by seeing how long it takes for agents to complete work. If you ask an agent to do something that should be simple and it takes longer than it should, you know something is off and you need to go fix it. Something I was known for in my
time at Twitch was what I referred to as
sledgehammer style development. When I
came into something that wasn't working
how it was supposed to, it was often
significantly easier to just remove it
and start again than it was to try and
fix the thing that was broken. The reason that usually wouldn't work is that it was too expensive to reproduce it. If you wanted to throw out 5,000 lines of code, good luck: the average engineer writes between 10 and 100 lines a day, so it's going to take you 50 days to replace it all. What if it didn't? What
if you could write the thing correctly
in a few hours instead? Here's where we
get into how we can actually do this.
Right now, a lot of these problems are
because AI accelerated natural problems
in code bases. More devs are maintaining
bigger code bases and they're looking at
the code less. If I were to just not do my job as a manager, this could happen the exact same way without AI. If I write
this codebase, I get it to a decent
state, and then I let my team take it
over, and I don't code review too
closely, I let them do their thing, tech
debt starts to pile up. Bad patterns
start to clone and appear all around the
codebase, and then it all collapses.
I've had to learn patterns to address
this as a manager already, and those
same patterns apply here. But there is
one big difference. You don't have to
feel bad telling the agent it did things
wrong. It [ __ ] sucks and needs to fix
it. If my logs with cloud code were to
go public, I would probably be
institutionalized for some of the things
I've said to it. I am not rude to my
employees. I go very out of my way for
that. I don't even raise my voice to my
team. The thought makes me feel sick,
genuinely. But when [ __ ] [ __ ] sucks,
you need to yell about it sometimes. And
it feels kind of nice to yell at the
agent when it [ __ ] up. But more
importantly, you can give the agent [ __ ]
work. No one wants to be the person to
upgrade a 5 plus year old codebase to
modern patterns. The agent will do it.
Nobody wants to port this old internal
service that 15 people at the company
use, but the agent will do it. The cost
of sledgehammers has gone down
exponentially. Historically, it just
wasn't worth it to delete the 5,000
lines of code and replace it cuz it was
too expensive to replace it. Now, it's
not. You do need to make sure the new
solution is well aligned. But if you can
find the right compartmentalized pieces
of your codebase that are worth
sledgehammering and rebuilding, it's
absolutely worth it now. And I've been
doing this a bunch. There have been many
places in many of my code bases where I
was like, "This just [ __ ] sucks. Can
we rewrite this?" And it worked. It's
crazy. And I'm not saying that these new
models are magically able to never
produce slop. If your codebase has a lot
of slop in it, the models will reproduce
it faster than your engineers can. And
both can already do it quite fast. But
if you do a good job of planning with the model, speccing out exactly what you want, it's a different story. Because this is probably the thing the new models are significantly better at: planning and conversations. Spend a bunch of time in the back and forth with the model. Write a better plan.
Tell the model, "This sucks. I want to
delete this entire folder. Let's work
together to make a better version. What
would it look like?" Maybe you have some
ideas. Maybe you have some syntax or
patterns or an API definition that you
know would be better. Write it out with
the model. Go back and forth. At the
end, you have a markdown document that
you can read and determine, is this good
or is this bad? And once you've decided
if this makes sense or doesn't make
sense, you can tell it to go build. And
as long as you have a good enough plan
in place before you start that piece, it
will probably come out okay. Will it be
as good as handwritten code by a human?
Depends a lot on the human. I would
argue the average human is a worse
engineer than the average model at this
point. I have not hired many engineers
in my career that are better than Codex 5.3. I've hired a few. They're hard to find,
but they all make mistakes. They all
have the same problems. And we need to
build systems that prevent that. First point: spending way more time in plan mode helps a ton. Just work with the model to make sure there is a good plan and that everything in the plan sounds good to you. Just take the time. You'll be
amazed at what it can do. And actually
read the plan. I know a lot of people
aren't reading the plan anymore. It depends on the size of the change and how much
you care. But if you want to make sure
this codebase is maintainable over time,
treat it accordingly. Next, and I hope this was obvious by now, use the latest
stuff. If you're still using Sonnet 4
because your company hasn't approved
Opus 4.5 yet, make them approve it or
get a real job. Like the companies that
don't understand this are so [ __ ]. Similar to this, throw away way more
code. I find most engineers are still
too attached to code and are scared to
throw away the 5,000 or 10,000 lines.
Don't be. Aggressively toss it. If any
part of you is telling you, I should
probably delete this thing, you probably
should have deleted it a month ago. Most
engineers have this bar really poorly
configured where they try too hard to
fix the code instead of replacing the
code. And on that note, actually, don't
be afraid to branch off. What I mean by
this is that I've seen a lot of times
where a code base has too many things in
it that don't really belong in that
codebase. This often happens because of
things like it's too hard to deploy this
thing again somewhere else, so we're
just going to stuff all the features
into version one. Silly example of this: I'm a Twitch streamer. I'm mostly a YouTuber, but I do actually stream on
Twitch. I'm live right now with my
audience filming this video. I used to
work at Twitch. I was working on safety
at Twitch. I was building the internal
safety platform and I was also helping
build the core Twitch site. We had a
really bad incident where one particular
game was being spammed with horrible,
terrible, nasty things. The category had
a bunch of fake streamers spinning up
and streaming vile [ __ ]. At the time,
the internal safety tools did not have a
way to review things pre-report. The
point of the internal tools was to
review reports, not to review live
streams as they were live, which meant
that there was no way for our safety
moderators, admins, internal safety
tooling team to easily go through a
category and ban a bunch of different
content creators. Yes, this was the Artifact incident. God, the word Artifact still haunts me to this day from how [ __ ] up this incident was. We
needed a way to set up our admins to
quickly ban streamers in this category
if they were doing nasty stuff. You
could easily see just browsing the
category if they were actually playing
the game or not. The Twitch safety team
needed a way to deal with this. And the
proposal I was given was to add a
permaban button that only admins could
click in the main Twitch site. So if an
admin was signed in on Twitch, they
could just scroll through a category and
have a one-click instaban. I immediately
jumped on that and said, "Are you
[ __ ] kidding? We're never exposing
the internal permaban endpoint to the
public [ __ ] Twitch codebase. Are you
joking?" No. Never. And obviously a normal user can't hit it, cuz it required the elevated auth of the global mods and admins. But the
code even making it into the main site
was horrifying. And regardless of all of the potential security and safety implications here, think of the code smell: a custom code path that only applies to 10 to 15 people in a codebase that serves millions a day. No, we're not building a feature for 10 to 15 people in a codebase that serves millions. You're [ __ ] joking. Just basic tech debt math tells you to
not do that. But these types of
proposals to this day are very common.
Why don't we just add the feature really
quick? Well, it probably belongs in its
own codebase, right? Well, then we have
to go link it to all the right
dependencies. We have to get permission
from the team to deploy it. We have to
host it. We have to figure out all these
pieces. We have to maintain it. Making a new project to do this one-off thing just wasn't worth it a lot of the time.
Thankfully, for this particular
instance, I had enough ownership of the
internal codebase and enough familiarity
with the external one that I knew I
could quickly recreate the category
browser in the internal tool to make it
easier for the admins to ban people from
it. So, that's what I did. It ended up
working fine, but I had to go do that
and I had to fight my team the whole
time telling them, "Sorry, like I know
this is going to take slightly longer.
It's not going to take that much longer
and it's not going to be a huge risk. We
need to do it this way." And the only
reason they let me, by the way, it
wasn't even the internal versus external
thing or the security or any of that. It
was because the external core Twitch
site only deployed once a day at around
1 p.m. if I recall. So, we were going to
have to get exceptions for them to
redeploy it just for this one button to
be added for the admins. But our
internal tool I could deploy whenever I
wanted. So, it was a deployment
architectural thing where the internal
deployment was slightly easier because I
owned it. That was the only reason that
this button didn't end up in the public
codebase. So, why the [ __ ] did I just go
down this tangent? Hear me out. All of the reasons, 100% of the reasons, to put that in the existing codebase for the public Twitch site instead of making a new thing or putting it in the internal thing, 100% of those are gone.
It is easier than ever to build a new
codebase. It is easier than ever to port
features from codebase A to codebase B.
It is easier than ever to get things
deployed and shipped. There's no more
excuse. And I find a lot of these nasty
code bases are because of things making
it in that don't belong. I would be
blown away if the Claude Code codebase wasn't
full of deprecated features that were
hidden under feature flags and never
shipped. Things that are specific to old
models that no one's using anymore.
Integrations with systems that don't
exist. Internal tools that they use
themselves that aren't part of the
external code that they don't want other
users having. I would be floored if less than 50% of the codebase for Claude Code
was stuff like that. And you have to
fight that instinct. You need to push
back when things don't belong in the
codebase. The number of code bases we
all have in our lives should 10x over
the next year. I went from actively
working in two to three code bases to
like 40. And yes, there's a lot of
context to manage. Yes, that's a lot of
things going on. But the harsh reality
is that most of those code bases once
they do the one thing don't have to be
touched the same way that the shitty
folder in another codebase doesn't have
to be touched. The problem comes when
you have to do anything with that big
codebase. If you have the one major
codebase that matters, like in this case
the Twitch site, doing an upgrade to the
React version should not require an
internal tool that no one's used for 2
years to be bumped as well. If you build
a strong discipline here, if you get
good at keeping unrelated things out of
that codebase, just making new modules,
making new repos, making new projects
for the things that don't need to be in
your main codebase, suddenly maintenance
becomes way less miserable. And I know
for a fact that is not how any of these
things have worked. Cursor keeps adding
things into cursor. They keep building
on top of the mess and the result is an
unstable shitshow. Claude Code is a
slopfest, constantly building on top of
itself with things that no one uses and
they don't even need anymore. Fight that
urge. Push to keep things out. Make it
easier for your team to do this as well.
If it's too hard to deploy a new
service, fix that. It should be easy for
anyone at your company to deploy a new
internal service to a new internal URL
or subdomain. It should be trivial. If
it's hard for your company to spin up a new repo on your internal GitHub Enterprise or whatever shitty source control solution you're using instead, fix it.
It should be trivial for anyone in your
company to spin up and deploy 10 new
code bases in a day. The agents have
made everything else easy. If that part
is why you're blocked, fix it.
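As a rough illustration of what "trivial" could mean here: a minimal shell sketch of a one-command repo bootstrap, assuming the GitHub CLI (`gh`). Everything in it is a placeholder, not something from the video: `my-org`, the template directory, the `reporting-tool` name, and the idea that CI deploys on push are all hypothetical. The `DRY_RUN` flag prints each step instead of executing it, so the flow is visible without touching GitHub.

```shell
# Hypothetical sketch of a one-command "new internal service" bootstrap.
# All names (my-org, the template path, reporting-tool) are placeholders.
# DRY_RUN=1 prints each step instead of running it.
DRY_RUN=1
NAME="reporting-tool"

run() {
  # Print the command in dry-run mode; execute it otherwise.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run gh repo create "my-org/$NAME" --private --clone    # gh = GitHub CLI
run cp -r "$HOME/templates/internal-service/." "$NAME/"
run git -C "$NAME" add -A
run git -C "$NAME" commit -m "bootstrap $NAME from template"
run git -C "$NAME" push -u origin main                 # CI deploys on push
```

The point isn't this exact script. It's that if spinning up and deploying a new repo takes more than roughly one command at your company, that friction is exactly what pushes features into codebases where they don't belong.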
Incentivize new project creation instead
of old project sloppification. If it's
not essential to the features that the majority of your users rely on,
make it something else. And the most
important final piece here, ask
questions. Specifically, ask the AI
agent when it's doing work, what is it
doing? Why is it doing it? Where did it
get this idea from? Why does it think
this thing matters? Why did you choose
to go down this path? If you ever notice
the agent doing something wrong, there
is a reason for it. These are
non-determinism machines. Like, it's
never going to do the same thing twice,
but it's going to do them close enough
and the reasons are going to be
relatively consistent. If it did this
thing wrong, figure out where it got the
idea from. If it came from your
codebase, eliminate it. If it came from
your CLAUDE.md, fix it. Adjust it. The
problem with these tools is that they
didn't have this diligence because they
were too focused on how fast can we ship
and not how well can we ship. And I feel
this deeply. I feel it so deeply that I
bullied Cursor into doing a month of no
new features and just cleaning up the
slop. And honestly, some of these code
bases probably need to just be reset
from scratch. There's a project I think
about a lot and you guys are going to
think I'm insane for going down this
rabbit hole, but just hear me out, guys.
There's a game called Vampire Survivors.
Many of y'all may have heard of this
game before. Vampire Survivors was
originally based on a shitty mobile survival game where you would just walk
around in circles and a gun would
autoshoot and destroy monsters until you
eventually couldn't finish them all off
and you would die. Fun fact about Vampire Survivors: it was written in
Phaser.js in the browser, but you might
see to the right there, Vampire
Survivors for Nintendo Switch. The
Nintendo Switch is not a competent
console. When it came out, it came out
with a processor that was already over
two years old. The RAM speeds are slower than most hard drives. Not SSDs, hard drives. The Switch was a [ __ ] piece
of hardware. There was no world in which
they were going to get the Phaser.js
version along with a full JavaScript
engine running in even vaguely a
performant way for the Switch. So, they
rewrote it in C++. That became the
Vampire Survivors console version. That
is now also the version you get on
Steam, as far as I know. I might be
wrong on that, but I'm pretty sure the
Steam version is also in C++. The lead
dev does not know C++. He hired a couple
additional developers to join in and
port the game to other systems. But
here's where things get real fun. When
the creator of the game wants to build a
new feature for the game, test out new
ideas, play with things, and just
improve the game, he doesn't do it by
editing things in the C++ codebase. He
does it by slopping away in his giant
[ __ ] show of Phaser.js for the browser.
And once he has it in a good enough
state where he feels like the game plays
well, he hands that off to the team to
go build it in the real codebase.
They're maintaining two codebases in parallel. One version, the web Phaser.js version, is the one that the game designer and creator uses to figure out what does and doesn't work, to iterate on ideas, to improve the game.
And once that version has things figured
out, well, the team is assigned the task
of porting it to the polished, refined,
reliable C++ edition. I think we're
going to see more of this going forward.
I do legitimately believe the future's
going to look something like this where
you maintain two or maybe even more
versions of a given codebase. Maybe you
use the vibe code slopware tools to make
something that works and test out
theories. Maybe you even ship that
version to some users to see how they
react to that. I've already seen this
happen in the past doing things like
using Framer mocks to test ideas out for
users. What if we did this more
sincerely? What if we did this as part
of our actual design methodology? What
if we thought about building through
slop as a way to test ideas and then we
used more established engineering
practices to turn those ideas into good
reliable product? What if Cursor just
cut their losses, treated the existing
code base as the slopfest version, used
that to prototype ideas, play with
things, and then started from scratch a
new version where they would port over
the parts that matter and leave behind
all of the slop that doesn't? It's
suddenly way cheaper to do that.
Maintaining the same codebase twice,
once as an internal version for
prototyping and once as an external
version for actual usage sounds insane
and would not have made any sense at all
in the past. I think it's starting to. I
don't know if this is actually going to
be a thing. I don't know if this is
going anywhere, but I think it's a
viable path for the companies that are
in this place where they've slopped
themselves into a corner and they need
to get out of it, but they don't want to
lose the iteration speed and the
experimentation that they can do in the
slopfest edition. I think this might
make sense. And we're even considering
dog fooding it ourselves for some fun
things we're cooking over at T3. I see
Julius has been cooking with all things
T3 code since we went live. He already
seems to have added hotkeys since. If
all goes well, this will hopefully be
out by the time this video is. Might not
be the case, but we have been deep in
the creation of T3 code, which is meant
to be a much more stable way to interact
with these agents. And this might end up
being the first time we test out that
theory where we have a vibecoded slop
version and poor Julius is stuck with
the task of making it actually work
reliably. We effectively already work
this way where I'll put up a slop PR
that makes the feature work that shows
what my intentions are and what I
expected to do and then my team will go
and build it correctly. Half or more of my PRs on T3 Chat don't get merged, even pre-AI, because I built a lot of these things to show the UX feel, see if it works right, and if it doesn't, retry and do it again. Then my team would be stuck with the horrible task of making that an actual good, reliable feature. Happened all the time. The amount of times my [ __ ] PRs have been trumped by Julius is genuinely hilarious. AI is only accelerating it. But I think there might be something here: this idea of
maintaining two versions of the codebase
or at least throwing away a lot of the
code that you're using knowing that
you're writing a ton of code to test
ideas, not to ship. Instead of measure twice, cut once, one of my favorite phrases ever, maybe it's file twice, merge once. Maybe it's file ten PRs, merge once.
But the idea that like code is cheap, so
we should merge all of this cheap code.
Terrible, horrible, nasty. It's resulted
in the hell that we're in today. Maybe
these tools, as useful as they are, are
good for vetting ideas. And then the
final version we end up committing gets
a little bit more care in the approach.
Maybe or maybe the model just gets so
good that none of this matters and they
can rewrite everything from scratch
anyways. Who knows? All of this can go
anywhere. This is all speculation about patterns that may or may not work, drawn from my experience maintaining large codebases and all that and more. I don't get
anybody who says that engineering skills
don't matter. I've never felt like my
skills are being pushed to their limits
more than they are today. There's
opportunity for good engineers to do
great things now. And I hope that this
absurd rant helps you understand why and
also maybe helps you understand why all
of these slop fests are so miserable to
work in and maintain. I don't see a
future where the existing cursor and
cloud code bases become nice to work in
or nice to use the results of. I do see
one where they're treated as the slopfest they are. A new, better thing is built
from scratch and as a result they have a
better app. But I really don't see a
future where Claude Code the codebase gets better. I see a world where Claude Code the executable gets better because a new codebase is used to serve it. No
idea where this will all go. I just
wanted to complain a bunch. I appreciate
you all for listening. I hope that this
helps you maintain code bases a bit
better and maybe at the very least vent
your frustrations with these existing
tools a little better, too. It's never
been more important to maintain your
codebases well cuz the agents are more
than happy to tear them to shreds. Let
me know what y'all think about this or
if I'm just going mad. Until next time.