TRANSCRIPTEnglish

Categorical scatter plot in Excel

17m 31s1,994 words364 segmentsEnglish

FULL TRANSCRIPT

0:01

hello everybody in this video tutorial

0:04

i want to discuss with you how we can

0:06

generate

0:07

what is called a categorical scatter

0:09

plot in excel

0:11

and i discovered that this is

0:14

surprisingly challenging

0:16

apparently the good people from

0:18

microsoft have no thought about

0:20

something like that so what am i talking

0:23

about

0:25

right so let's say we do an

0:28

experiment and we collect some data it's

0:31

a very small data set

0:34

so for example we want to see

0:37

if some how seedlings are

0:40

growing so we measure the growth of

0:43

seedlings

0:45

we measure that in centimeters so for

0:47

example we let them grow for

0:49

a day or two or something like that and

0:52

then measure

0:52

how much have they grown and we've got

0:56

three kind of

1:00

parameters for that so we have the

1:02

control that would be the untreated

1:05

seedlings

1:06

we have treatment number one that might

1:09

be with some

1:10

nutrients and treatment number two with

1:14

another set of nutrients and we

1:17

want to plot these data so that we can

1:20

immediately see

1:21

what's going on and ideally we also

1:24

would like to

1:26

show our average uh growth

1:29

and also maybe the standard deviation

1:32

so what do we want our plot to look like

1:37

so we would probably want to have

1:40

something like that

1:42

here on the x-axis that's our

1:46

independent variable we want the

1:48

categories so we want the control

1:51

we want treatment number one

1:55

and we want treatment number two

1:59

and on the y-axis that's the dependent

2:02

variable

2:03

that would be the growth and

2:06

here we've got that in

2:10

centimeters let's do it like that

2:14

and uh because we've got only a very

2:17

small set

2:19

of data we want to

2:23

show each data point so we would

2:26

probably we would like to have something

2:29

like

2:30

this where we have three data points for

2:32

the control

2:34

like we've got here three data points

2:36

and

2:37

for treatment one we might have

2:40

something like that

2:41

uh i'm not looking at the data at the

2:43

moment

2:45

we also might want to have say

2:48

the mean of the data

2:51

indicated so for example if here is the

2:54

mean

2:55

of the data we want to indicate it like

2:57

that

2:58

and we might want to have some standard

3:01

deviation some error bars with standard

3:04

deviation

3:05

so that is what we would like to have

3:08

our our plot at the end

3:12

what it should look like so i'll

3:15

remove that to

3:18

get some space here so

3:24

but we we keep that in mind that this

3:27

is what we actually want to have

3:31

so how are we going to deal with it

3:34

now the first thing that we do

3:37

is we calculate

3:41

we calculate the average

3:45

so i need to go in here

3:48

and select okay

3:52

so here we calculate the average for our

3:56

treatments

3:57

so that would be the mean and in excel

4:00

we can easily calculate that that would

4:02

be

4:03

equals average and i start typing so it

4:06

comes up with a suggestion i

4:09

left mouse double click on it

4:12

and i just simply highlight the numbers

4:16

that i want this to

4:19

continue in in the bracket to calculate

4:23

the mean so that would be

4:24

from b3 to b5 here that looks good

4:28

close the bracket and here this is the

4:30

mean

4:32

and for these data now what i can do

4:37

is i can calculate the mean for

4:39

treatment 1 and treatment 2

4:42

and because i'm lazy i just

4:45

go down here and you see this

4:49

black cross here in the bottom corner

4:53

left mouse button clicked and i just

4:56

simply drag it

4:57

over and it will

5:00

do the calculation of the average

5:04

for the other treatments and what we see

5:07

is that

5:08

actually the mean

5:11

for the growth in the control seems to

5:14

be

5:14

the the largest whereas the treatment

5:17

one

5:18

has the lowest growth and treatment too

5:21

well is sort of in between so it looks

5:24

like that

5:25

the seedlings do not like treatment 1 or

5:28

treatment

5:29

2. so this is now

5:32

our data set that we can use

5:35

and now we need to figure out how we can

5:38

actually plot this and as i said this

5:42

is surprisingly challenging

5:46

because excel doesn't really have

5:49

a good option for dealing with

5:52

something like that so what can we do

5:56

we need to prepare our data a little bit

5:59

and

6:00

it will look a little bit odd so what i

6:03

do in the first place

6:04

is i highlight the data

6:08

for treatment 1

6:11

and pull them

6:14

down move them

6:19

so that it looks like something like

6:22

that

6:22

so you notice the

6:25

treatment one here for the controls

6:30

that is now empty and we've got

6:33

treatment

6:34

1 here and we do exactly the same thing

6:39

for treatment 2 and

6:42

do it like that now important

6:47

just to keep track of our last data

6:50

point

6:50

these were the mean so here we had the

6:54

mean

6:54

for control here we have the mean for

6:58

uh treatment one and here we have the

7:00

mean

7:02

for treatment

7:06

two we can also calculate standard

7:09

deviations

7:10

and i do that down here so the

7:13

standard deviation and i just simply

7:17

abbreviate it like that

7:19

so we can calculate the sample standard

7:22

deviation

7:23

that's again that's very easy we just

7:27

start typing and we are looking for the

7:30

sample standard deviation which is

7:32

abbreviated with these

7:34

standard dev s so

7:37

we highlight that because this then will

7:40

give

7:41

our error bar at the end so that's our

7:44

standard deviation

7:46

for the control here we do it for

7:50

the s the standard deviation

7:54

for our treatment one

7:59

and we do the standard deviation

8:05

for treatment two and now we are pretty

8:09

much

8:09

set up with whatever we need

8:14

and we see the standard deviation for

8:17

control 1

8:18

for control and treatment 1 are pretty

8:21

similar

8:23

whereas the treatment 2 the standard

8:25

deviation is quite large

8:26

okay now let's plot so what

8:29

are we going to do we highlight

8:33

control we highlight our data

8:37

like this with the left mouse button

8:39

clicked

8:41

and then we move to

8:45

a line plot now this is probably the

8:48

only

8:49

situation where a line plot is really

8:51

useful

8:52

so let's click on that and we go

8:56

to a line plot where we have the data

9:00

like this so we want to have it

9:05

in this way so this is now

9:08

our plot let's not worry about the chart

9:13

title

9:13

and and things like that because to be

9:16

perfectly honest it

9:18

doesn't really look good

9:21

we are not interested in these

9:24

weird things so what we need to do now

9:27

is

9:28

we need to switch row and column

9:32

so that's up here and let's switch

9:35

row and column and here we get

9:39

our data points control treatment

9:42

and treatment too we don't really

9:45

need the legend here so we can

9:49

remove that and now we've got

9:53

our data points here and they are nicely

9:56

stacked up

9:57

and if we look carefully we see that

9:59

we've got

10:00

four data points so these these are the

10:05

three data points plus the average that

10:08

we have

10:09

and of course we want to highlight the

10:12

average a little bit more so what we can

10:15

do

10:16

is we can try to figure out which one is

10:19

the

10:19

is the uh is the mean so

10:23

here it tells us even if we hover

10:26

over it so this is series point control

10:29

4 with a value of 0.64

10:33

and that actually was our our mean

10:36

so what we can do is we click on it and

10:39

it shows us that we have the mean

10:42

and what we do is if we double click on

10:45

it

10:46

we will open another window here and now

10:48

we can format this data point

10:52

so what do we do with it so we

10:55

go to this bucket here and to format

10:58

this data point

11:01

we go to marker that's this one here

11:04

and we look at the marker options

11:08

now i usually like to have a slightly

11:12

different

11:13

shape for the mean so we go to build in

11:17

and i would like to have let's say

11:25

this one here and

11:29

i would like it to be

11:32

black

11:36

i don't want a fill but i want a solid

11:39

line

11:40

so this gives us our first

11:43

average how does it look like okay

11:46

that's the average

11:47

the other three are the data points and

11:51

now we can actually also

11:55

include error bars into that

11:58

how do we do that or actually let's do

12:01

the averages for the other data points

12:03

as well

12:04

so here our treatment 1

12:07

has a mean of 0.22

12:11

let's see

12:14

i think that's this one here click on it

12:18

yes that's the right one

12:20

now i want to do the same thing here

12:25

i want to change the marker

12:28

marker option built in

12:33

i want this one here i

12:38

want no fill but a

12:41

solid line okay i've got this

12:45

and for treatment number

12:48

two i go for this one here

12:52

yep we've got the right one

12:55

marker options again the

12:58

built-in option i want the

13:01

star again

13:04

that's this one i don't

13:07

want a fill but i want a

13:11

solid line so now i have

13:14

labeled all my averages and it might be

13:17

a good idea to

13:19

note that here then in the legend and

13:21

now how we

13:22

how can we add error bars

13:25

that's also uh pretty straightforward

13:29

so what we can do is we can double click

13:33

on the average this one here

13:37

and let's remove that and we get these

13:40

little sidebars here so click on the

13:43

plus

13:44

and we go to error bars we check this

13:47

box

13:50

and go on the arrow

13:54

and here under more options

13:58

we choose custom

14:02

we get a large error by here but that

14:04

indicates that

14:06

we've got something and here we want to

14:08

specify the value that's this

14:10

the value that we've just calculated

14:12

with the standard deviation

14:14

so here we just put in

14:19

this value and

14:22

here we also put in for the lower one

14:26

this value okay

14:30

so we've got our standard deviation

14:34

let's quickly do that

14:35

for the other ones as well

14:40

so we go to the

14:44

error bars here for this one

14:50

more option we customize it

14:53

specify the value and here we've got the

14:57

standard deviation

14:58

here and the standard deviation

15:01

also for the lower one

15:05

okay oops we

15:09

apparently made a mistake yes here's the

15:12

mistake

15:19

okay that's the error bar here

15:22

and last but not least let's do the same

15:26

thing

15:26

for treatment two

15:30

and error bars

15:33

custom options again we want to

15:38

specify this one

15:41

and also this one here

15:45

and we need to be careful not to make

15:47

the same mistake

15:48

again that we've done before

15:51

and this should now

15:54

look pretty decent so that's

15:58

what we needed to do of course we want

16:00

to give it a chart title

16:02

that is growth of seedlings

16:07

growth of seedlings

16:12

and we probably need to also

16:15

give it a title here on

16:19

the y-axis so again we click on the axis

16:23

we say axis title and in this case

16:28

we want to give that as

16:31

growth

16:35

and we have this in centimeters

16:39

okay so we don't need this axis title

16:42

down here so this is how we can actually

16:47

prepare uh relatively

16:50

decent categorical

16:53

scatter plot and then we have these

16:56

uh data here where we use the different

16:59

categories and of course you can have

17:02

more than just these three categories

17:05

you can have as many categories as you

17:07

like

17:07

but then it gets a little bit wild you

17:09

can also have larger

17:11

data sets but then again it gets a

17:14

little bit

17:14

wild so i think you should not have more

17:17

than five

17:18

six seven uh data here but

17:21

at least we can see how our data

17:24

actually

17:25

look like so i hope this makes sense and

17:27

thank you very much for watching

UNLOCK MORE

Sign up free to access premium features

INTERACTIVE VIEWER

Watch the video with synced subtitles, adjustable overlay, and full playback control.

SIGN UP FREE TO UNLOCK

AI SUMMARY

Get an instant AI-generated summary of the video content, key points, and takeaways.

SIGN UP FREE TO UNLOCK

TRANSLATE

Translate the transcript to 100+ languages with one click. Download in any format.

SIGN UP FREE TO UNLOCK

MIND MAP

Visualize the transcript as an interactive mind map. Understand structure at a glance.

SIGN UP FREE TO UNLOCK

CHAT WITH TRANSCRIPT

Ask questions about the video content. Get answers powered by AI directly from the transcript.

SIGN UP FREE TO UNLOCK

GET MORE FROM YOUR TRANSCRIPTS

Sign up for free and unlock interactive viewer, AI summaries, translations, mind maps, and more. No credit card required.