Efficient COUNT, SUM, MAX with the Aggregate Component
FULL TRANSCRIPT
Counting, summing, finding the min or
max value is quite frankly one of the
most confusing aspects for people new to
convex. Before I became an official
Convex fanboy, I flagged it as one of
the things that I really don't like
about Convex when I first encountered
it. This is more of a technical issue,
but there are no aggregate queries in
the database. So, this means it's not
possible to count the number of items in
a query without explicitly iterating
them. If you are like me and you come
from the SQL world, then you're probably
used to doing something like select
count star from some table, but no such
thing like this exists on convex. Why?
Why on earth not? Well, I did just do a
whole video on the why of that. So, you
might want to check that one out first
if you want to dive deep into that
particular question. But, they're too
long, didn't watch. is that to perform
aggregates such as count or sum, you
kind of have to scan all the rows in the
database which can really cause
performance issues as your data grows.
So, Convex kind of takes the philosophy
of forcing you to think upfront about
how to manage those aggregates yourself.
In the solution section to that video, I
briefly mentioned that you could use
something called the aggregate component
to help out. And I said if people want
to have more information upon the
aggregate component, they leave me a
comment down below. Well, the people
have spoken. So, if you are one of those
commenters or you're somebody who's
looked at the aggregate component before
and noped out of there or you're totally
new to convex and have no clue what I'm
talking about, then this is the video
for you. We're going to go through how
to get started with it, how to do some
basic aggregations like counts and sums
before progressing onto some more
advanced use cases. Then, if we have
time at the end, we're going to talk a
little bit about how it works underneath
the covers. Oh, and do stick around to
the end as I'm going to give what might
be a bit of a spicy take on all of this.
So, once you've dropped me a like and
sub, let's get into it. All right, so
I'm going to assume that you have a
Convex project already running. If not,
check out one of the quick start guides
for how to get started there. Then,
you're going to want to install the
aggregate component with npm install at
convexdev/agregate.
And then, you're going to want to tell
Convex how to use the component by
creating a convex.config.ts ts file in
your convex directory and popping in
something like this. Note here, it's a
bit more complicated than you would
probably typically have it. For this
video, we're going to be using the same
component multiple times with different
names. So, it allows us to easily
identify the component um within the
convex dashboard. Right now, you got the
component set up. Let's have a look at
how to use it. So, let's imagine we're
going to build a game that is going to
show a leaderboard, something like this.
So here we want to show the player name,
the score, and rank. Oh, and by the way,
if you want to try out any of these
demos that I'm going to show in this
video yourself, they're all available in
the example folder in the aggregate
component repo. I've left a link to it
down below. Anyways, hopping back into
the code. First thing we need to do is
define a normal convex table for
recording our scores in our leaderboard
like this. And then of course, we're
going to need a mutation uh a way to
insert our score into the database. So
here we define our ad score mutation and
we push a record into our leaderboard
table. All right. So now the question is
how are we going to display a page of
data in our leaderboard table? I guess
we could do something like this. Here we
use the dot collect function to grab
every single row in the table. We sort
them and then we just slice off a page
of items. And this would work, but as
you've probably already guessed, this
isn't going to scale too well as we get
thousands, tens of thousands, millions
of scores added into our database. We
have to read every single row just to
get back one page of results. What we
kind of want to do is something a bit
like this. Here, we're trying to order
the scores by descending, skip to our
offset, and then take just what we need.
But as you may have noticed, there's
some red squiggles here because convex
doesn't have an order by or a skip.
Well, comics does kind of have an order
by if you use an index. Um, but it
definitely doesn't have a skip. And
related to this issue is the one of how
we go about ranking our scores. So, just
think about this for a second. How would
you go about efficiently ranking these
scores? Say I add another score here.
Let's say Mikey C with a score of 99999.
Well, how am I going to determine my
rank? Well, I guess as long as the
table's sorted, I could loop through all
the scores until I find the one that I'm
above and that's my rank. But h this
doesn't sound very efficient. If we're
going to have millions of scores in the
database, that could be millions of
scans just to find out what rank I am.
But then the issue is I'm going to
insert myself into the position here and
give myself a rank, but then everybody
below me is going to have to then move
down one slot, which is going to cause a
lot more um mutations into the database.
This really doesn't sound very
efficient, does it? But fear not, as
this is precisely where the aggregate
component steps in. So now let's define
a table aggregate at the top of our file
like this. Here we tell it that we're
going to want to sort by the score from
the leaderboard table and it's going to
be a number. Note that we are using a
negative value here because the
aggregate component by default sorts its
entries in the ascending form. So that's
from the smallest to largest. But we
actually want to sort in the descending
form. So from largest to smallest. So by
negating the score here, it kind of does
that. It'll make more sense in a minute.
So now when we insert a new score, we
must also remember to insert it into the
aggregate component like this. For now,
we're going to assume that there's no
existing scores in the table. And we'll
talk a little bit later about what
happens if you have a table with
existing data in it. So now we're
inserting data into our leaderboard
table and our aggregate component. We
can update our paging query to look like
this. So it's pretty simple. We're going
to use the aggregate components.app app
function to find the score that is at
the given offset in our page. So once we
have that first item in the page, we can
then use the pageionate function to
define the lower bound that we're going
to pageionate over and the page size as
our number of items. And once we have
that, we can then go through each of
those items in the page, look up in our
own leaderboard table, uh, the document,
and return the results. Cool. So, let's
try it out. We have our first five
scores here, and we can flip through the
pages, and the scores go down, and
everything feels nice and snappy. We can
change the uh number of items that we
want in a page, and it all just works.
Awesome. And up here we can add another
score and it should be entered into the
right place. Cool. And we can remove
some entries and it all just updates
nicely. Okay. So I guess what we've done
here is to say what is going to be the
given score for this particular rank
we're interested in, right? So as long
as our leaderboard scores are ordered in
a descending form, we can just grab five
or whatever items and just show them.
And this kind of makes sense, right? You
can see how this would be slow if we
were to do this without the aggregate
component. Like for page 100, for
example, we would have to manually scan
through 496 other rows if we wanted to
get just that page that we were
interested in. And this would not only
be just slow, it would also be really
problematic due to the way that the
transactional nature of convex works. So
like if any of those 496 rows uh that we
scan through change then it's going to
invalidate our page and so just even a
simple name change would cause the row
to be invalid validated which causes our
query to be invalidated and yeah it
could get expensive very quickly. So the
way that the aggregate component works
is it stores the data in such a way that
allows us to skip lots of those document
reads and get right to the part that we
are interested in. But I'm getting a
little bit of ahead of myself here.
We're going to talk about how the
aggregate component works a little bit
later on in this video. But for now,
let's just see what else we can do with
the component to help us out. So now
let's say uh that instead of wanting to
look up a score from a given rank, we
want to do the inverse of that. We want
to find out what rank I would be g would
be given if I had a specific score. So
we can do this really easily with the
index of function on the aggregate. Oh,
and not forgetting to invert the score
as we're actually storing our keys in
inverted form in aggregate that we've
talked about before. And now if we take
this for a spin, we can enter a score
here and we get the correct rank. And
again, this is reactive just like
everything in convex. So if we add new
scores or remove them, then it's
instantaneously reflected. And finally,
what happens if we wanted to get some
stats on a particular player like find
out what my average score is or my
higher score? Well, we can do that
pretty easily if we pull in another
aggregate. So now, um, we've changed the
sort key from being just a score to
being this array, this tupil of the name
of the person who got the score and the
score. So, it's first sorted by the name
and then sorted by the score within that
name for that person. We could also
potentially do this using namespaces,
but we'll talk about namespaces in a
minute. We've also added this sum value
here, which is going to sum all the
scores up and cache them inside the
aggregate component as you will see in a
minute. Okay. So, then to show the
average score for someone, we can do
this here. Here we use the component to
count the scores belonging to the player
and the sum of the scores and then just
returning the average from that. And
don't forget because of the way the
aggregate component stores these values,
this lookup is super fast and efficient
and doesn't have to scan all the rows.
If we want to show the highest score the
user's ever gotten, then we can use the
max function like this. Once we have the
item, we can pull off the sum value.
This maybe should be called max value,
but anyway, it's called sum value. Okay,
let's take this for a spin now. So, I
can enter my name in here and cool, it
works. I can see my average and highest
score. And of course, if I change it or
add something else, uh, then it's going
to update. Awesome. Love it. All right.
So, so far so good. What else can we do
with the aggregate components? Well, if
you know that your data is going to be
clearly separated, uh, like the albums
in this photos demo here, then you can
use namespacing to more efficiently
segregate your data cuz this is going to
reduce the amount of reads and writes
uh, by just segregating it by namespace.
So when you define your aggregate at the
top, just tell the component what
namespace we're going to be using for
this given document. Then you can use
that namespace when calling into the
component. So here you see we have a um
query that is going to get the number of
photos in a given album and by using the
name space it means that if um somebody
adds a photo to another album then it's
not going to cause this query to rerun.
We could also use the aggregate
component to do efficient randomization
like in this demo. So we can click this
button to get a new random song or we
can show a page of randomized playlist.
To do something like this, we need to
use a null as our sort key. And
basically what this is going to do is
it's going to cause the component to
order the values in its structure by the
ID of the document which is effectively
random because it's a gooid. Then we can
use the dotrandom function to
efficiently get some random song. Now
the final demo here is um showing
various stats that we can get from the
aggregate. We can add values here and
you can see that the numbers change as I
add more latencies. And this one is
demoing a lower level API called the
direct aggregate. So with this one we
don't actually tie our aggregate to any
specific table. Instead we're going to
tell it what our key and value is
directly. So this is the mutation um
that you would call when you wanted to
report uh a latency. So here we insert
with a key of our latency. Our ID is
just anything and our sum value is uh
our latency. We're going to add up our
latencies to get our sum. And here's how
we uh return that data to the user in
the query. We're using our count. We're
using a sum. We're using our min max.
combine it all together to get a bunch
of stats and then we can just show that
on the client. Easy. Okay. So, before we
pop the hood of the component and work
out how it does its magic, um let's talk
about a couple of issues that you might
encounter with the aggre aggregate
component. Now, you may have noticed
what should we call it an operational
issue uh when working with the
component. Whenever we add a document
into a table like the leaderboard here,
we must also remember to add it into the
aggregate. And the same goes for when we
remove something or when we update
something. We always have to remember to
update the aggregate component because
the keys and values could cause it to
get corrupted or out of sync if we
don't. This required coupling is
necessary, but it is also very
errorprone. If you or one of your
teammates forgets to include the
aggregate component line somewhere, then
you can easily end up with an aggregate
component that's out of sync or
corrupted. Fortunately though, there is
a solution to this problem in the form
of triggers and custom functions.
Triggers, if you're not familiar with
them, are a library that's provided by
the Convex helpers repo and allows you
to wrap your database so that whenever
you make a change to it, it can also
trigger something else to happen
somewhere else. I'll show you what I
mean. So, here we're going to create our
triggers object by registering it on our
leaderboards table and then attaching
both aggregates like so. We can then
pull in um the custom functions library
from the convex helpers repo um as it's
going to allow us to write a custom
mutation that looks like this. We can
then go ahead and replace our mutations
in here with our mutation with triggers
custom function. And then we can go
ahead and remove all those lines where
we were having to update or insert into
the aggregate because our triggers are
now going to cause the aggre aggregate
component to be automatically kept in
sync whenever we make a change to our
leaderboard table. And this is really
nice as it allows us to massively
simplify our code and remove that
errorprone nature of our aggregate
getting out of sync from our table. All
right, so let's just make sure this all
works by taking it for a spin. Cool.
Yep. I can add, remove, update entries
as before. Very nice. Now, as we are
talking about issues, uh there is one
other issue with aggregate component
that I should point out that might bite
you if you're not careful. So, as you're
probably aware, every single convex
project comes with a dashboard. So you
can go into it and you can go in and
edit the data inside the table and it
instantaneously reflects on the UI
because that's how convex works. But now
that I've updated the name of this score
here watch what happens if I try and
delete this row. Now bam, error. The
issue is is that our aggregate has now
gone out of sync with our leaderboard.
But Mike, why? We've just used triggers.
Why? How has it got out of sync? You
told me that there was going to be
automatically kept in sync. Yeah. Well,
unfortunately, triggers don't work when
you make changes to the data from the
convex dashboard. And right now,
unfortunately, there is no work around
this. You just have to be really
careful. Make sure that if you're going
to change data through the dashboard
that you also remember to either update
the aggregate component or reync it or
something like that. We have been
talking internally about ways to fix
this maybe with platform level triggers.
So the the dashboard would al also cause
triggers to be fired, but unfortunately
that's probably going to be a little bit
way off before we get to a solution
there. If you're keen to see a proper
solution to this being developed, then
leave me a comment down below and we'll
see if we can bump up the priority on
that. So hopefully at this point you
should have a good idea what the
aggregate component is and how you can
use it in your own project. But what
happens if you have some existing table
with some data in it already that you
want to incorporate into your component?
How can you go about adding the
aggregate component on after the fact?
Well, to solve that issue, we can pull
in another component, the migrations
component. We can then create a
migration that will iterate over every
row in the leaderboard table, for
example, and populate our aggregate in a
safe way. If you're interested more in
this, then I recommend you check out the
documentation as there's full
step-by-step instructions of the
sequence that you should follow. Okay,
so now if you're anything like me,
you're probably wondering at this point,
how is this wizardry performed? Well, I
started to go deep on this topic and put
together a whole demo that shows how the
data structure evolves over time as you
add and remove items. But the hours
started turning to days. Uh, so I had to
reluctantly park this this project for
now. But if you would like me to spend
some more time on this and continue on
this demo or dive a bit deeper into this
this topic in the future, then leave me
a comment down below. But having said
that, the TLDDR is that the aggre
aggregate component works by storing the
data in a really efficient data
structure called a balanced search tree
or B tree. This is a similar data
structure that you might find in a
regular database like Postgress and
MySQL as an index. So what we are
effectively doing is building an index
that we can control inside of a convex
table. So by constructing this B tree in
the convex database, we can turn what
would be O to the N um operation into an
O to the log N which is obviously much
more efficient. All right, so I promised
a spicy take at the start of this video.
So here it is. Convex chose explicitly
not to offer aggregates in their core
offering and instead offload it into
userland or components like this one for
some very good reasons that I elaborate
in my dedicated video on the topic. By
making users handle the aggregation of
the data themselves rather than leaving
it up to the whims of the almighty query
planner, you end up with a more
predictable performance and cost. On the
other hand though, it is more work uh to
manage these simple counts and other
basic aggregations yourself manually via
code or via the usage of this component.
And unless you use triggers, it's very
easy to forget to update the aggre
aggregate component when you make a
mutation to the underlying data, which
would leave your um component in a
corrupted state and you would basically
have to migrate your way out of it. Or
worse, one of your teammates could
change a value in the dashboard as I
showed earlier in this video. Uh that
would leave your component again in a
corrupted state. These kinds of easy to
make mistakes is in my opinion
anthetical to the typical ethos of the
no foot guns found in the rest of
Convex. So I do wonder whether there's
something that Convex should do more
here. Maybe bringing some of the
aggregations to the platform level or
maybe make the platform work better with
triggers or something like that. Well,
anyways, let me know what you guys think
down below. I'm keen to hear what you
think about this.
Right, that's just about it for me for
today. I hope you enjoy this video and
if you did, please leave me a like and
sub. And if you wanted to check out
another video a bit like this one, then
you might want to check out this one I
did a while back on embeddings and
vector search. I dressed up as a school
girl. It was kind of fun. Anyways, until
next time. Thanks for watching. Cheerio.
All right. So hopefully at this point
you should have a good idea of what the
aggre
hopefully you have a good idea of what
the aggre aggregate you should have a
good idea of what the aggre aggregate
UNLOCK MORE
Sign up free to access premium features
INTERACTIVE VIEWER
Watch the video with synced subtitles, adjustable overlay, and full playback control.
AI SUMMARY
Get an instant AI-generated summary of the video content, key points, and takeaways.
TRANSLATE
Translate the transcript to 100+ languages with one click. Download in any format.
MIND MAP
Visualize the transcript as an interactive mind map. Understand structure at a glance.
CHAT WITH TRANSCRIPT
Ask questions about the video content. Get answers powered by AI directly from the transcript.
GET MORE FROM YOUR TRANSCRIPTS
Sign up for free and unlock interactive viewer, AI summaries, translations, mind maps, and more. No credit card required.