TRANSCRIPTEnglish

Efficient COUNT, SUM, MAX with the Aggregate Component

21m 13s3,939 words543 segmentsEnglish

FULL TRANSCRIPT

0:00

Counting, summing, finding the min or

0:02

max value is quite frankly one of the

0:04

most confusing aspects for people new to

0:06

convex. Before I became an official

0:08

Convex fanboy, I flagged it as one of

0:10

the things that I really don't like

0:12

about Convex when I first encountered

0:13

it. This is more of a technical issue,

0:16

but there are no aggregate queries in

0:17

the database. So, this means it's not

0:19

possible to count the number of items in

0:21

a query without explicitly iterating

0:23

them. If you are like me and you come

0:25

from the SQL world, then you're probably

0:28

used to doing something like select

0:29

count star from some table, but no such

0:32

thing like this exists on convex. Why?

0:35

Why on earth not? Well, I did just do a

0:38

whole video on the why of that. So, you

0:40

might want to check that one out first

0:41

if you want to dive deep into that

0:43

particular question. But, they're too

0:45

long, didn't watch. is that to perform

0:47

aggregates such as count or sum, you

0:50

kind of have to scan all the rows in the

0:52

database which can really cause

0:54

performance issues as your data grows.

0:56

So, Convex kind of takes the philosophy

0:58

of forcing you to think upfront about

1:00

how to manage those aggregates yourself.

1:03

In the solution section to that video, I

1:05

briefly mentioned that you could use

1:06

something called the aggregate component

1:08

to help out. And I said if people want

1:11

to have more information upon the

1:12

aggregate component, they leave me a

1:14

comment down below. Well, the people

1:16

have spoken. So, if you are one of those

1:19

commenters or you're somebody who's

1:20

looked at the aggregate component before

1:22

and noped out of there or you're totally

1:25

new to convex and have no clue what I'm

1:26

talking about, then this is the video

1:28

for you. We're going to go through how

1:30

to get started with it, how to do some

1:32

basic aggregations like counts and sums

1:34

before progressing onto some more

1:36

advanced use cases. Then, if we have

1:38

time at the end, we're going to talk a

1:39

little bit about how it works underneath

1:41

the covers. Oh, and do stick around to

1:43

the end as I'm going to give what might

1:45

be a bit of a spicy take on all of this.

1:48

So, once you've dropped me a like and

1:49

sub, let's get into it. All right, so

1:52

I'm going to assume that you have a

1:53

Convex project already running. If not,

1:56

check out one of the quick start guides

1:57

for how to get started there. Then,

1:59

you're going to want to install the

2:00

aggregate component with npm install at

2:02

convexdev/agregate.

2:05

And then, you're going to want to tell

2:06

Convex how to use the component by

2:08

creating a convex.config.ts ts file in

2:10

your convex directory and popping in

2:12

something like this. Note here, it's a

2:15

bit more complicated than you would

2:16

probably typically have it. For this

2:18

video, we're going to be using the same

2:20

component multiple times with different

2:22

names. So, it allows us to easily

2:24

identify the component um within the

2:26

convex dashboard. Right now, you got the

2:29

component set up. Let's have a look at

2:30

how to use it. So, let's imagine we're

2:33

going to build a game that is going to

2:34

show a leaderboard, something like this.

2:36

So here we want to show the player name,

2:39

the score, and rank. Oh, and by the way,

2:41

if you want to try out any of these

2:42

demos that I'm going to show in this

2:43

video yourself, they're all available in

2:45

the example folder in the aggregate

2:47

component repo. I've left a link to it

2:49

down below. Anyways, hopping back into

2:52

the code. First thing we need to do is

2:54

define a normal convex table for

2:56

recording our scores in our leaderboard

2:58

like this. And then of course, we're

3:00

going to need a mutation uh a way to

3:02

insert our score into the database. So

3:05

here we define our ad score mutation and

3:08

we push a record into our leaderboard

3:10

table. All right. So now the question is

3:12

how are we going to display a page of

3:14

data in our leaderboard table? I guess

3:17

we could do something like this. Here we

3:19

use the dot collect function to grab

3:21

every single row in the table. We sort

3:24

them and then we just slice off a page

3:27

of items. And this would work, but as

3:30

you've probably already guessed, this

3:32

isn't going to scale too well as we get

3:34

thousands, tens of thousands, millions

3:36

of scores added into our database. We

3:38

have to read every single row just to

3:40

get back one page of results. What we

3:43

kind of want to do is something a bit

3:45

like this. Here, we're trying to order

3:47

the scores by descending, skip to our

3:50

offset, and then take just what we need.

3:53

But as you may have noticed, there's

3:55

some red squiggles here because convex

3:57

doesn't have an order by or a skip.

4:00

Well, comics does kind of have an order

4:02

by if you use an index. Um, but it

4:04

definitely doesn't have a skip. And

4:06

related to this issue is the one of how

4:09

we go about ranking our scores. So, just

4:12

think about this for a second. How would

4:14

you go about efficiently ranking these

4:16

scores? Say I add another score here.

4:19

Let's say Mikey C with a score of 99999.

4:23

Well, how am I going to determine my

4:26

rank? Well, I guess as long as the

4:28

table's sorted, I could loop through all

4:31

the scores until I find the one that I'm

4:33

above and that's my rank. But h this

4:37

doesn't sound very efficient. If we're

4:38

going to have millions of scores in the

4:40

database, that could be millions of

4:42

scans just to find out what rank I am.

4:45

But then the issue is I'm going to

4:47

insert myself into the position here and

4:50

give myself a rank, but then everybody

4:52

below me is going to have to then move

4:53

down one slot, which is going to cause a

4:56

lot more um mutations into the database.

4:59

This really doesn't sound very

5:01

efficient, does it? But fear not, as

5:03

this is precisely where the aggregate

5:05

component steps in. So now let's define

5:09

a table aggregate at the top of our file

5:11

like this. Here we tell it that we're

5:13

going to want to sort by the score from

5:15

the leaderboard table and it's going to

5:18

be a number. Note that we are using a

5:21

negative value here because the

5:23

aggregate component by default sorts its

5:26

entries in the ascending form. So that's

5:29

from the smallest to largest. But we

5:32

actually want to sort in the descending

5:34

form. So from largest to smallest. So by

5:37

negating the score here, it kind of does

5:39

that. It'll make more sense in a minute.

5:41

So now when we insert a new score, we

5:44

must also remember to insert it into the

5:46

aggregate component like this. For now,

5:49

we're going to assume that there's no

5:50

existing scores in the table. And we'll

5:52

talk a little bit later about what

5:54

happens if you have a table with

5:56

existing data in it. So now we're

5:58

inserting data into our leaderboard

6:00

table and our aggregate component. We

6:02

can update our paging query to look like

6:05

this. So it's pretty simple. We're going

6:08

to use the aggregate components.app app

6:10

function to find the score that is at

6:13

the given offset in our page. So once we

6:16

have that first item in the page, we can

6:18

then use the pageionate function to

6:22

define the lower bound that we're going

6:24

to pageionate over and the page size as

6:28

our number of items. And once we have

6:30

that, we can then go through each of

6:32

those items in the page, look up in our

6:35

own leaderboard table, uh, the document,

6:37

and return the results. Cool. So, let's

6:40

try it out. We have our first five

6:43

scores here, and we can flip through the

6:44

pages, and the scores go down, and

6:48

everything feels nice and snappy. We can

6:50

change the uh number of items that we

6:53

want in a page, and it all just works.

6:55

Awesome. And up here we can add another

6:58

score and it should be entered into the

7:01

right place. Cool. And we can remove

7:04

some entries and it all just updates

7:07

nicely. Okay. So I guess what we've done

7:09

here is to say what is going to be the

7:12

given score for this particular rank

7:14

we're interested in, right? So as long

7:16

as our leaderboard scores are ordered in

7:19

a descending form, we can just grab five

7:22

or whatever items and just show them.

7:24

And this kind of makes sense, right? You

7:26

can see how this would be slow if we

7:28

were to do this without the aggregate

7:29

component. Like for page 100, for

7:32

example, we would have to manually scan

7:34

through 496 other rows if we wanted to

7:38

get just that page that we were

7:39

interested in. And this would not only

7:41

be just slow, it would also be really

7:44

problematic due to the way that the

7:45

transactional nature of convex works. So

7:48

like if any of those 496 rows uh that we

7:52

scan through change then it's going to

7:54

invalidate our page and so just even a

7:57

simple name change would cause the row

8:00

to be invalid validated which causes our

8:02

query to be invalidated and yeah it

8:04

could get expensive very quickly. So the

8:06

way that the aggregate component works

8:07

is it stores the data in such a way that

8:10

allows us to skip lots of those document

8:12

reads and get right to the part that we

8:15

are interested in. But I'm getting a

8:17

little bit of ahead of myself here.

8:19

We're going to talk about how the

8:20

aggregate component works a little bit

8:22

later on in this video. But for now,

8:23

let's just see what else we can do with

8:24

the component to help us out. So now

8:27

let's say uh that instead of wanting to

8:30

look up a score from a given rank, we

8:32

want to do the inverse of that. We want

8:35

to find out what rank I would be g would

8:37

be given if I had a specific score. So

8:40

we can do this really easily with the

8:42

index of function on the aggregate. Oh,

8:45

and not forgetting to invert the score

8:47

as we're actually storing our keys in

8:50

inverted form in aggregate that we've

8:52

talked about before. And now if we take

8:54

this for a spin, we can enter a score

8:56

here and we get the correct rank. And

9:00

again, this is reactive just like

9:02

everything in convex. So if we add new

9:04

scores or remove them, then it's

9:08

instantaneously reflected. And finally,

9:11

what happens if we wanted to get some

9:14

stats on a particular player like find

9:16

out what my average score is or my

9:19

higher score? Well, we can do that

9:21

pretty easily if we pull in another

9:23

aggregate. So now, um, we've changed the

9:26

sort key from being just a score to

9:29

being this array, this tupil of the name

9:33

of the person who got the score and the

9:35

score. So, it's first sorted by the name

9:37

and then sorted by the score within that

9:40

name for that person. We could also

9:42

potentially do this using namespaces,

9:44

but we'll talk about namespaces in a

9:46

minute. We've also added this sum value

9:48

here, which is going to sum all the

9:51

scores up and cache them inside the

9:53

aggregate component as you will see in a

9:54

minute. Okay. So, then to show the

9:57

average score for someone, we can do

9:58

this here. Here we use the component to

10:01

count the scores belonging to the player

10:03

and the sum of the scores and then just

10:05

returning the average from that. And

10:07

don't forget because of the way the

10:09

aggregate component stores these values,

10:11

this lookup is super fast and efficient

10:13

and doesn't have to scan all the rows.

10:15

If we want to show the highest score the

10:17

user's ever gotten, then we can use the

10:19

max function like this. Once we have the

10:22

item, we can pull off the sum value.

10:24

This maybe should be called max value,

10:26

but anyway, it's called sum value. Okay,

10:29

let's take this for a spin now. So, I

10:31

can enter my name in here and cool, it

10:34

works. I can see my average and highest

10:36

score. And of course, if I change it or

10:40

add something else, uh, then it's going

10:42

to update. Awesome. Love it. All right.

10:46

So, so far so good. What else can we do

10:48

with the aggregate components? Well, if

10:50

you know that your data is going to be

10:51

clearly separated, uh, like the albums

10:54

in this photos demo here, then you can

10:56

use namespacing to more efficiently

10:59

segregate your data cuz this is going to

11:02

reduce the amount of reads and writes

11:04

uh, by just segregating it by namespace.

11:08

So when you define your aggregate at the

11:09

top, just tell the component what

11:11

namespace we're going to be using for

11:13

this given document. Then you can use

11:15

that namespace when calling into the

11:17

component. So here you see we have a um

11:20

query that is going to get the number of

11:23

photos in a given album and by using the

11:26

name space it means that if um somebody

11:30

adds a photo to another album then it's

11:32

not going to cause this query to rerun.

11:35

We could also use the aggregate

11:36

component to do efficient randomization

11:39

like in this demo. So we can click this

11:41

button to get a new random song or we

11:45

can show a page of randomized playlist.

11:48

To do something like this, we need to

11:50

use a null as our sort key. And

11:53

basically what this is going to do is

11:54

it's going to cause the component to

11:56

order the values in its structure by the

11:59

ID of the document which is effectively

12:02

random because it's a gooid. Then we can

12:04

use the dotrandom function to

12:06

efficiently get some random song. Now

12:09

the final demo here is um showing

12:12

various stats that we can get from the

12:14

aggregate. We can add values here and

12:16

you can see that the numbers change as I

12:18

add more latencies. And this one is

12:20

demoing a lower level API called the

12:23

direct aggregate. So with this one we

12:25

don't actually tie our aggregate to any

12:28

specific table. Instead we're going to

12:30

tell it what our key and value is

12:32

directly. So this is the mutation um

12:35

that you would call when you wanted to

12:37

report uh a latency. So here we insert

12:40

with a key of our latency. Our ID is

12:45

just anything and our sum value is uh

12:49

our latency. We're going to add up our

12:50

latencies to get our sum. And here's how

12:53

we uh return that data to the user in

12:56

the query. We're using our count. We're

12:59

using a sum. We're using our min max.

13:01

combine it all together to get a bunch

13:02

of stats and then we can just show that

13:05

on the client. Easy. Okay. So, before we

13:08

pop the hood of the component and work

13:10

out how it does its magic, um let's talk

13:14

about a couple of issues that you might

13:16

encounter with the aggre aggregate

13:19

component. Now, you may have noticed

13:21

what should we call it an operational

13:24

issue uh when working with the

13:26

component. Whenever we add a document

13:28

into a table like the leaderboard here,

13:30

we must also remember to add it into the

13:33

aggregate. And the same goes for when we

13:35

remove something or when we update

13:38

something. We always have to remember to

13:40

update the aggregate component because

13:43

the keys and values could cause it to

13:45

get corrupted or out of sync if we

13:48

don't. This required coupling is

13:50

necessary, but it is also very

13:52

errorprone. If you or one of your

13:55

teammates forgets to include the

13:56

aggregate component line somewhere, then

13:59

you can easily end up with an aggregate

14:01

component that's out of sync or

14:02

corrupted. Fortunately though, there is

14:05

a solution to this problem in the form

14:07

of triggers and custom functions.

14:10

Triggers, if you're not familiar with

14:11

them, are a library that's provided by

14:14

the Convex helpers repo and allows you

14:16

to wrap your database so that whenever

14:19

you make a change to it, it can also

14:21

trigger something else to happen

14:23

somewhere else. I'll show you what I

14:25

mean. So, here we're going to create our

14:27

triggers object by registering it on our

14:30

leaderboards table and then attaching

14:32

both aggregates like so. We can then

14:34

pull in um the custom functions library

14:38

from the convex helpers repo um as it's

14:41

going to allow us to write a custom

14:43

mutation that looks like this. We can

14:45

then go ahead and replace our mutations

14:48

in here with our mutation with triggers

14:52

custom function. And then we can go

14:54

ahead and remove all those lines where

14:56

we were having to update or insert into

14:58

the aggregate because our triggers are

15:01

now going to cause the aggre aggregate

15:03

component to be automatically kept in

15:06

sync whenever we make a change to our

15:08

leaderboard table. And this is really

15:10

nice as it allows us to massively

15:12

simplify our code and remove that

15:14

errorprone nature of our aggregate

15:16

getting out of sync from our table. All

15:19

right, so let's just make sure this all

15:20

works by taking it for a spin. Cool.

15:23

Yep. I can add, remove, update entries

15:26

as before. Very nice. Now, as we are

15:29

talking about issues, uh there is one

15:32

other issue with aggregate component

15:33

that I should point out that might bite

15:35

you if you're not careful. So, as you're

15:38

probably aware, every single convex

15:40

project comes with a dashboard. So you

15:43

can go into it and you can go in and

15:45

edit the data inside the table and it

15:48

instantaneously reflects on the UI

15:50

because that's how convex works. But now

15:53

that I've updated the name of this score

15:56

here watch what happens if I try and

15:58

delete this row. Now bam, error. The

16:01

issue is is that our aggregate has now

16:05

gone out of sync with our leaderboard.

16:07

But Mike, why? We've just used triggers.

16:09

Why? How has it got out of sync? You

16:11

told me that there was going to be

16:12

automatically kept in sync. Yeah. Well,

16:15

unfortunately, triggers don't work when

16:17

you make changes to the data from the

16:20

convex dashboard. And right now,

16:22

unfortunately, there is no work around

16:25

this. You just have to be really

16:27

careful. Make sure that if you're going

16:28

to change data through the dashboard

16:30

that you also remember to either update

16:32

the aggregate component or reync it or

16:35

something like that. We have been

16:37

talking internally about ways to fix

16:39

this maybe with platform level triggers.

16:42

So the the dashboard would al also cause

16:46

triggers to be fired, but unfortunately

16:48

that's probably going to be a little bit

16:49

way off before we get to a solution

16:50

there. If you're keen to see a proper

16:53

solution to this being developed, then

16:55

leave me a comment down below and we'll

16:56

see if we can bump up the priority on

16:58

that. So hopefully at this point you

16:59

should have a good idea what the

17:00

aggregate component is and how you can

17:03

use it in your own project. But what

17:05

happens if you have some existing table

17:07

with some data in it already that you

17:09

want to incorporate into your component?

17:13

How can you go about adding the

17:14

aggregate component on after the fact?

17:17

Well, to solve that issue, we can pull

17:19

in another component, the migrations

17:21

component. We can then create a

17:23

migration that will iterate over every

17:27

row in the leaderboard table, for

17:28

example, and populate our aggregate in a

17:32

safe way. If you're interested more in

17:34

this, then I recommend you check out the

17:36

documentation as there's full

17:39

step-by-step instructions of the

17:40

sequence that you should follow. Okay,

17:42

so now if you're anything like me,

17:44

you're probably wondering at this point,

17:47

how is this wizardry performed? Well, I

17:50

started to go deep on this topic and put

17:51

together a whole demo that shows how the

17:54

data structure evolves over time as you

17:56

add and remove items. But the hours

17:59

started turning to days. Uh, so I had to

18:02

reluctantly park this this project for

18:04

now. But if you would like me to spend

18:06

some more time on this and continue on

18:08

this demo or dive a bit deeper into this

18:11

this topic in the future, then leave me

18:13

a comment down below. But having said

18:15

that, the TLDDR is that the aggre

18:17

aggregate component works by storing the

18:20

data in a really efficient data

18:22

structure called a balanced search tree

18:25

or B tree. This is a similar data

18:28

structure that you might find in a

18:30

regular database like Postgress and

18:32

MySQL as an index. So what we are

18:35

effectively doing is building an index

18:38

that we can control inside of a convex

18:40

table. So by constructing this B tree in

18:43

the convex database, we can turn what

18:46

would be O to the N um operation into an

18:50

O to the log N which is obviously much

18:53

more efficient. All right, so I promised

18:56

a spicy take at the start of this video.

18:58

So here it is. Convex chose explicitly

19:01

not to offer aggregates in their core

19:04

offering and instead offload it into

19:07

userland or components like this one for

19:09

some very good reasons that I elaborate

19:11

in my dedicated video on the topic. By

19:14

making users handle the aggregation of

19:16

the data themselves rather than leaving

19:18

it up to the whims of the almighty query

19:21

planner, you end up with a more

19:23

predictable performance and cost. On the

19:26

other hand though, it is more work uh to

19:28

manage these simple counts and other

19:30

basic aggregations yourself manually via

19:33

code or via the usage of this component.

19:36

And unless you use triggers, it's very

19:38

easy to forget to update the aggre

19:41

aggregate component when you make a

19:42

mutation to the underlying data, which

19:45

would leave your um component in a

19:48

corrupted state and you would basically

19:50

have to migrate your way out of it. Or

19:52

worse, one of your teammates could

19:54

change a value in the dashboard as I

19:56

showed earlier in this video. Uh that

19:59

would leave your component again in a

20:01

corrupted state. These kinds of easy to

20:04

make mistakes is in my opinion

20:07

anthetical to the typical ethos of the

20:09

no foot guns found in the rest of

20:12

Convex. So I do wonder whether there's

20:14

something that Convex should do more

20:16

here. Maybe bringing some of the

20:18

aggregations to the platform level or

20:22

maybe make the platform work better with

20:24

triggers or something like that. Well,

20:27

anyways, let me know what you guys think

20:29

down below. I'm keen to hear what you

20:31

think about this.

20:33

Right, that's just about it for me for

20:35

today. I hope you enjoy this video and

20:37

if you did, please leave me a like and

20:39

sub. And if you wanted to check out

20:42

another video a bit like this one, then

20:43

you might want to check out this one I

20:45

did a while back on embeddings and

20:47

vector search. I dressed up as a school

20:49

girl. It was kind of fun. Anyways, until

20:54

next time. Thanks for watching. Cheerio.

20:59

All right. So hopefully at this point

21:00

you should have a good idea of what the

21:02

aggre

21:04

hopefully you have a good idea of what

21:05

the aggre aggregate you should have a

21:08

good idea of what the aggre aggregate

UNLOCK MORE

Sign up free to access premium features

INTERACTIVE VIEWER

Watch the video with synced subtitles, adjustable overlay, and full playback control.

SIGN UP FREE TO UNLOCK

AI SUMMARY

Get an instant AI-generated summary of the video content, key points, and takeaways.

SIGN UP FREE TO UNLOCK

TRANSLATE

Translate the transcript to 100+ languages with one click. Download in any format.

SIGN UP FREE TO UNLOCK

MIND MAP

Visualize the transcript as an interactive mind map. Understand structure at a glance.

SIGN UP FREE TO UNLOCK

CHAT WITH TRANSCRIPT

Ask questions about the video content. Get answers powered by AI directly from the transcript.

SIGN UP FREE TO UNLOCK

GET MORE FROM YOUR TRANSCRIPTS

Sign up for free and unlock interactive viewer, AI summaries, translations, mind maps, and more. No credit card required.

GET STARTED FREE SIGN IN