Excel for Data Analytics - Full Course for Beginners
FULL TRANSCRIPT
data nerds welcome to this full course
tutorial on Excel for data analytics
this is the course I wish I would have
had when I first started as a data
analyst you're going to be working right
alongside me as we Master how to use a
spreadsheet starting with the basics of
functions charts and tables working our
way up to our first portfolio project
we'll then shift gears into advanced
features like pivot tables power query
and power pivot ultimately building our
second and final project analyzing real
world data now to master this tool we're
not going to go straight for 11 hours
instead we're going to break it down
into 10 to 20 minute lessons during this
we'll have exercises for you to learn
while doing not just watching followed
by practice problems to reinforce your
newly learned skills now Excel is the
most popular spreadsheet tool in the
world it's estimated to have over 1
billion users that's one in eight people
in the world and for data nerds it's one
of the most popular skills for data
analysts coming only behind SQL oh and
the same can be said for business
analysts in this Tool's popularity truth
be told Excel was one of the only skills
that I knew when I landed my first role
in data analytics but it was able to
handle everything thrown at me and so
I've been cataloging over the years all
of the most important features to
perform data analytics and I compiled it
in this course and this video is for
absolute beginners you don't need any
analytic or spreadsheet experience we'll
be starting with the first half on the
basic chapters which will build up your
knowledge on the fundamentals with
covering which versions of excel you can
use for the course along with installing
it then we'll get you familiar with
working around how to manipulate a
spreadsheet from there we'll shift into
practical exercises analyzing data using
formulas and functions and then
visualizing it using common charts and
statistical analysis at the end of the
basics chapters we'll put your skills to
the test to build an interactive
dashboard to predict one's salary based on
job and location for the second half of
the course we're going to ramp up our
learnings diving into Advanced
Analytical features focusing on using
pivot tables and add-ins to dive quickly
into data insights we'll learn power
query to connect to a variety of data
sets and perform ETL or extract
transform and load finally we'll learn
data modeling with power pivot and
perform Advanced calculations with the
Dax Language by the end of the advanced
chapters we'll have built a full data
analytics project analyzing the data
science job market which you'll be able
to share this and the previous project
in order to Showcase your experience
with analyzing data in Excel now I'm a
big believer in open- sourcing education
so this course and all the content
required to complete the course is
completely free I not only get you set
up with Excel but I also provide all the
different Excel workbooks and sheets
needed to complete this course with this
you'll get access to the data sets
needed to make those final projects and
even how to share them now unfortunately
the AdSense Revenue alone from this
course isn't enough in order to support
all the different costs associated with
building this so I have an option for
those that want to support and help out
for those that purchase my supporter
resources you're going to get
access to a lot of features that are
going to help speed up your learning all
provided through this custom dashboard
to track your progress you'll get guided
practice problems to perform after each
lesson that will not only provide the
solution but also walk you through how
to get it if you get stuck along the way
you'll have access to a community of
others in order to jump in and comment
and ask for help additionally you'll be
getting my step-by-step instructions
that walk through each of the lessons as
I perform it and finally when you
complete the course I'll email you a
certificate of completion that you can
upload to LinkedIn now one quick shout
out before we jump in and that's to
Kelly Adams she helped me plan out a lot
of the different lessons for this course
along with being the brains behind a lot
of the different practice problems and
frankly if I didn't have help I probably
couldn't have completed this course so
before we go any further with what we
need to and actually diving into this
course we need to First understand what
is Excel and where the heck it came from
so in order to understand this we need
to go back oh a little too far back
ah just right ancient Babylon when we
used to trade livestock like it was
crypto now it's during this time that we
started recordkeeping and we didn't have
paper so we used Stone and we partition
it into rows and columns during the time
of the Romans they began to perfect this
even further with accounting eventually
we get some advancements in technology
we start getting this on paper this is
when the term spreadsheets gets the
introduction this maintained that
familiar row and column format in order
to catalog different things spread
across different sheets spreadsheet
fast forward to the 1900s and we pack
rooms full of underpaid people in order
to maintain and keep track of all the
different transactions on paper
spreadsheets with the Advent of
computers in the late '70s we started to
see our first spreadsheet software VisiCalc
and Lotus 1-2-3 then our boy here decided
to revolutionize the world a little
bit okay not with that but with this I'm
Bill Gates chairman of
Microsoft in this video you're going to
see the future since its launch in 1985
it's been wreaking havoc in the
spreadsheet software Community
dominating market share and to continue
to dominate over the years Microsoft has
added more and more features it
initially started out to where you'd
only be using it for the cells of
entering different formulas and performing
quick calculations along with getting
different charts and Analysis shortly
thereafter it was upgraded with pivot
tables and that's my secret weapon to
quickly analyzing data as I no longer
have to remember which comes first in
index and match now VBA or Visual Basic
for applications was included in the
mid-90s and it's a programming language
that lets you automate tasks in
Microsoft applications now we're not
going to waste any time in this course
learning VBA frankly I feel it's
outdated you should learn python instead
and there's newer tools that actually
automate the process of data analysis
like power query this was first
introduced in 2010 and then rebranded to
get and transform and then rebranded
again to power query sort of similar to
what Google does with renaming products
anyway this bad boy is like washing down
a couple caffeine pills with a shot of
espresso it can ingest and clean so much
data in the blink of an eye hardcore
data nerds call this ETL or extract
transform and load power pivot was also
introduced during this time of power
query and it's like putting your
spreadsheets on steroids this allows us
to perform data modeling on data sets
greater than a million rows greater than
what Excel can actually hold in a
spreadsheet and combined with the power
of Dax or data analysis Expressions we
can supercharge our calculations fast
forward to today and there's been two
other major features added to excel
copilot which is basically ChatGPT
inside of Microsoft Excel and python in
Excel which is basically python inside
of excel anyway copilot is great wait
that's a lie so I do believe AI
chatbots are great at helping us out when we
get stuck but I don't want you to rely on
that to actually learn this technology
of Excel and for python in Excel you
need to know well python if you don't
know this yet it's completely useless
now with all these features it can make
it seem like Excel is overwhelming which
I completely get that but when you focus
on the basics and work from there I
think it makes it a lot easier to learn
it's also why this course is almost 11
hours long all right enough with the
history lesson let's actually get into
the course material and what you're
going to need for this also we're going
to be going over what data set or what
data we're going to be analyzing for the
project for this with the link provided
below you can navigate to this which is
the GitHub repo that has all the
different folders and files needed to
take the course now don't worry if
you're not familiar with GitHub we're
going to walk through this this pane
here basically outlines all the
different folders that you have access
to and if I navigate in something like
resources I can see I have a data sets
folder images folder and even a problems
folder so for those that purchase the
course practice problems you have access
to the problems inside of here and
they're broken down by chapter along
with the lesson in addition to that
resources folder you can see numbered
here we have each of those eight
chapters and if we navigate into
something like spreadsheets intro we
have a workbook for each one of the
lessons so you want to download this
file you just navigate to it click the
three dots and click download but I have
an alternate method coming up in a bit
inside the workbooks I provide a blank
template for you to go through and
actually fill in and we'll be getting to
what's in this final sheet of actually
being filled in now as we move into the
advanced chapters they're going to have
something like the data sheet or you're
going to use the data from the data
sheets in order to do different
operations and we'll put those in
different sheets as well so how do we
get these files well the easiest way is
to come up here to this code and go to
download zip with the file downloaded
all you need is to unzip it and then
from there it has all the different
folders with the appropriate workbooks
inside of them now after going through a
lesson I then have practice problems for
those that purchase the course perks to
go through here's the course dashboard
that you'll get access to that breaks it
all down for the problems based on the
chapter itself and then by the lesson
and inside of each of these lessons is
multiple different problems for
you to go through and work the other perk
that you'll receive with those practice
problems are the course notes these
break down the concepts in a similar
format of all the different chapters and
lesson here's the one on Excel install
which is going to be what we're covering
next but it provides all the different
background on all the different material
that we'll be covering and it's in the
same format that I'm covering it in the
video so you can follow right along just
as a reminder there's no requirement to
purchase these practice problems or
course notes just helps support me
anyway what are we actually going to be
covering in this data analysis that
we're going to be doing inside of excel
well you're going to be taking the role
of a job Seeker in exploring what are
some of the top paying roles along with
skills of data nerds for this we're
going to use the data from my app
datanerd.tech which has collected to this
point up to 3 million jobs it tells
based on a job title and also on a
location what are the top skills and it
not only tells us the salary of these
skills for a particular job but also the
salaries of the jobs themselves now the
main data set we're going to be using
for the majority of this course is this
one here inside the data sets folder of
data job salary all this data set
includes over 30,000 job postings from
2023 and it includes a wealth of
information such as company name salary
and location as we go through these
examples I'm going to be doing it from
the perspective of a data analyst which
is the top job in the data set but as
shown here there's a lot of different
other job titles that you can check out
and use as well so feel free to deviate
additionally I'll be primarily focusing
on the United States but there's a lot
of different countries in there as well
so feel free to plug in your home
country and analyze this instead now
with any course you're probably going to
get stuck along the way and so how do
you get help for this well I don't
recommend just jumping into the comment
section and waiting for somebody to help
you out instead I recommend using a chat
bot like ChatGPT in it you can provide
whatever error you're seeing and it will
help you out and guide you along the way
on what to do and there's other great
options as well such as gemini or even
Claude so feel free to use whichever one
you're most comfortable with all right
if you haven't done so already it's your
turn now to go in and download that
GitHub repo with all the different
workbooks needed for this course in the
next lesson we're going to be getting
into installing Excel and mainly
understanding what are the different
versions that you can actually get with
Excel and which one you need for the
course with it I'll see you there

let's now actually get into
working with Excel so in this lesson
we're going to be going through how to
actually install Excel onto your
computer assuming you don't have it but
before we get to that for those that
maybe have Excel or an older version of
Excel or have different computers we're
going to actually go through what are
the preliminary requirements you need to
have or set up in order to be able to
have the Excel you need for this
course now here's a breakdown of the
different chapters within this course
that is the rows here and then for the
columns are the different Microsoft
products that you can get in order to
have Excel now if you're running Excel
on a Windows machine either through
Microsoft 365 Microsoft Office Home
and student or even an older version of
excel up to about
2010 you're going to be fine with
completing all the different course
content however if you have the Mac
version or Mac operating system and
Excel is installed directly on that
operating system you're not going to be
able to complete the Advanced chapter
specifically on power query and on power
pivot along with the project and it's
similar as well for Microsoft 365 online
as you won't also be able to complete
the Advanced Data analysis section now
if you have any of these first three
versions of excel installed on your
computer you can skip to the next lesson
if you want I'll just be going through
before the install process of breaking
down each of these different versions so
you understand your options what you can
get so let's get into breaking down all
these different versions available first
up is Microsoft
365 now with Microsoft 365 you're going
to get a host of different Microsoft
applications not only Excel but also
things like word PowerPoint and even
Outlook and there's two major plans I'm
going to recommend for this either the
family plan which allows you to give out
these keys for these different services
to up to six people or a personal plan
which allows you to give it to well
yourself now I do want to call out that
if you're a college student or maybe you
work for a big Corporation you may have
access to a free Microsoft 365 plan so
if you're in college check with your
college and if you're working for a
business check for your business if you
have access to this so you don't have to
pay money for it but regardless of that
if money is an issue Microsoft 365
family offers this free one-month trial
which I think you can complete this
course within a month so technically you
could do this for free if you don't want
to get charged you will need to actually
cancel before the end of that 30 days
and at that point you'll still have
Microsoft Excel installed on your
computer just everything will be in view
only mode you won't actually be able to
edit any of the different spreadsheets
that we've operated on during this
course let's now move into Microsoft
Office home and
student now this bad boy is the
alternate recommendation I'm going to
give you if you don't want to pay for a
Microsoft 365 subscription this is
only a onetime purchase and it gives you
keys to Microsoft Office so you can
install all the different Microsoft
products of excel word and PowerPoint
onto your computer for the low low price
of $150 similar to Microsoft 365
subscription this will not only work on
a Windows machine but it will also work
on a Mac machine Let's now move to this
last option because it's sort of in the
bundle of it of Microsoft 365 online
now this version of Microsoft 365 is
completely free but sort of a catch to
this here I am on my web browser logged
into Microsoft 365 online and I have
access to all the different apps within
the browser including something like
Excel so we can go to it now this
version looks very similar to the
version that you can actually install
the applications on your Windows or Mac
machine there are limitations like I
discussed before about power query and
power pivot so you're going to be
limited if you're trying to follow along
in this course when we get to those
Advanced chapters also the layout on the
web browser version of this app is much
different from the one that's installed on
your computer so I'm not going to be
providing any support in this course on
actually how to navigate this
so you're going to have to figure that
out yourself so we've discussed
everything except for these Mac versions
of Microsoft 365 and office so here's a
quick recap of all the different
features and cost of the three major
versions of Microsoft that you can get
in order to get Excel on your computer
for this personally I'm using the
Microsoft 365 family plan because it
includes all the different features that
I need and I also save cost because
I'm splitting with my brother who now
that I think of it is actually paying
for it but it provides everything that I
need and so it's the one I'm
recommending for this
course now before we get into the
install I want to briefly show what are
the differences between using Mac with
Excel installed Vice windows and Excel
installed on it anyway here's Excel
installed on my Windows operating system
and Excel on this operating system is in
my opinion the flagship product from
Microsoft so they're investing all of
their effort and resources into
designing this application to make it
the best possible and then from there
Excel online and then Excel for Mac are
really just copycats of this anyway the
two main differences and the problems
I've run into in the past that Excel for
Mac doesn't have are in this data tab I
have a lot of different data sources I
can choose from and that's specifically
related to our power query lesson and
then finally it has power pivot which is
just completely non-existent on Excel
for Mac now here I am on a Mac machine
and we can see that it looks very
similar to before but there's a lot of
limitations that we're going to find
with this specifically going back to
that power query not a lot of different
sources you can choose from and then
yeah Power pivot is just completely
non-existent you may be like Luke I have
a Mac machine what do I need to do in
order to have the most premier version
of Excel and use for this well for that
I recommend installing a virtual machine
and virtual machines like parallels
shown here allows you to host a
different operating system on your Mac
machine this Windows example that I was
showing earlier if I actually expanded
out you can see in the background here
I'm running this on a Mac machine and I
have full capabilities and am able to carry
out running Windows on this now I've
been paying for and using parallels over
the past 3 years and I can tell you the
support and the offers from it are
perfectly fine and I love using it now
personally I'm using the Parallels
Desktop Pro Edition but you can get by
with just using the standard edition now
they also have this onetime purchase
that you could do which is 129 but it
doesn't get any further updates and I
really like how it actually updates and
fixes any bugs that I may run into now the
other reason why I like parallels is
because it has this coherence mode I
have this blue little icon that I can
click up at the top to go into coherence
mode and then wait for it it allows me
to access any of those windows inside of
my windows virtual machine inside
of Mac so here is Excel running right
here inside my Mac and this is not only
limited to Microsoft Excel but also
products like Power BI which I'm using
pretty frequently as a data analyst I
can also run this into coherence mode
but enough about
that now that we got that out of the
way let's actually get into installing
Excel either on your Windows machine or on
your Windows Virtual Machine so the
first thing we need to do is navigate
over to
microsoft.com and I'm going to click up
here to Microsoft 365 we're going to
be going through setting up the free
30-day version so I'm going to click
this of try for free and from there
start my one month trial it's going to
ask me to sync my data I'm going to assume you
don't have it I'm also going to assume
you don't have an account so we're going
to create one I'm going to put in my
address and then from there create a
password after providing some personal
information you're going to need to
verify your email with the code they
send you now to be clear this is the
Microsoft 365 family plan which after
that 1 month trial it's going to be
charging you at
$99 every year so if you're just one
person and you're trying to switch to
the personal plan after this you'll
need to do that at the end or near the
end of those 30 days from there like any
company they're going to ask for some
payment methods I'm going to just go
ahead with PayPal PayPal's all set go
ahead and do more paperwork of adding
a billing address and with that I can
start trial and pay later so now that
I'm logged in I want to install the
desktop app so it gives me access to
right here it's going to go ahead and
begin this it's going to ask if you want to
allow this app to make changes to your
device yeah I trust them so only took a
few minutes and all the different
Microsoft 365 office apps were installed
so I just come down to the search bar
down here type in Excel let's pop it
open make sure it's working and in order
to get started you need to sign in in
order to verify that it's your
subscription so I put in my email and
password and already forgot my
password now I'm resetting my password
and now I'm all set up all right and we
got to agree to some lawyer talk of
accepting licensing agreements at this
point I'm pretty worn out of going
through this process so I'm just going
to click through everything I'm not
going to send any optional data
personally I don't like to do that I
don't want to personalize right now and
it looks like I'm finally done all right
I'm into it and now that we're into
Excel we can see up here it should have
your name or your account that you're
going into and go in here into the blank
workbook all right so that basically
concludes this lesson on installing
Excel I do want to show real quick how
easy it is to actually cancel your
membership should you want to go about
just getting the free version or the
free 30-day trial and you want to cancel
it before any if I go back to my account
I can go in here to manage
subscriptions and here I'm inside my
Microsoft account which tells me I'm
subscribed to Microsoft 365 family I can
share it with up to zero to five people
and for that I just click on it and I
can copy a link and provide it to
whoever I want to share it with we're
going to cancel it so we can go to
manage subscriptions right here and all
we got to do is click cancel
subscriptions it's going to have me
confirm that I do want to cancel this
family plan makes me scroll all the way
to the bottom after showing me all these
different prices that I could get
instead and I'm going to say yeah I
don't want my subscription and as I'm
filming this on August 27th it basically
says hey you still have access to this for
30 days until September 26th so still
technically have access to it so if you
haven't done it already it's your turn
to now go and install Microsoft Excel
using one of the options that I've shown
here in the next chapter we're going to
get into a spreadsheets intro to get you
familiar with how to actually use all
the different functionality or graphical
user interface or GUI of excel with
that see you in the next one

welcome to this chapter on an intro
to spreadsheets and this chapter has
three different lessons in order to
understand what we're covering those
three different lessons we need to
explore some vocabulary with it so let's
jump into Excel for this lesson we're
going to be focusing on worksheets and
that is basically as you can see this
tab here called sheet one that is how to
manipulate these different cells within
this worksheet or also known as a sheet
in the next lesson we're going to be
going into workbooks so workbooks
basically captures either one sheet like
this one sheet one if I add another one
sheet two so it encapsulates multiple
different sheets within this program of
Excel and then finally in the third
lesson of this chapter we're going to be
moving into the ribbon which is up here
at the top and has a bunch of different
functionality to extend into those
spreadsheets along with using this file
tab up here that has a whole bunch of
features within it as well now this
chapter was designed for those that may
not have experience with using Microsoft
Excel before so if you don't fall in
that category as in you've used excel in
your job and you're pretty familiar with
all those different features I just
shown you can feel free to skip this
chapter and then move into the next one
on functions along with all those
different practice problems but if
you're not comfortable with that stick
around we're going to get into
it all right so the first thing you need
to do is open up that first Excel sheet
in the files you should have downloaded
from GitHub named 1_Worksheets inside
of here I have an original sheet that
allows you to actually go in and fill in
everything we're going to be doing and
manipulating during the course of this
lesson then if you get lost along the
way or want to peek ahead to see what
we're actually going to do you can
actually scroll over here or select the
final sheet to see that now I want to
make this as big as possible for you to
see so I'm going to go ahead and close
out this ribbon up here and you can just
do that by double clicking on any one of
these different items up here and then
from there I also want to zoom in so I'm
going to come down here to the bottom
right and I'm going to just zoom in to
about 200% and scroll on over now inside
the spreadsheet it has all these
different cells and it's organized in a
manner where it has rows and the rows
are labeled with numbers 1 2 3 all the
way down to about a million and then we
have the columns and the columns are
alphabetical and they all go all the way
to where they start duplicating where
they'll put another letter in front of
the other and it'll go all the way
through xfd so let's practice some data
entry here I have a table we're going to
be filling in for this lesson basically
has all the different skills associated
with it and then I want you to actually
go through while we're going through
this and you don't have to provide the
values I do you can if you want we're
going to be filling it in based on our
difficulty when we may have started it
or level and then filling out some other
cell formulas as we go so we're going to
start first with Excel and then the
difficulty so I'm going to select right
here and I can see which cell is
selected because it's sort of
highlighted here on this B and also two
but also right up here next to this
bar which I'll just call the formula bar
we can see that we're calling out the
name of B2 so anytime we reference any
cells it first references the column
letter and then the row number so in
this case I'm selected in C7 so I'm
going to go ahead and give this a number
I'm going to say four for myself as you
notice I just put it right in the box
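
To make the column-letter-then-row-number idea concrete, here is a minimal Python sketch of how a reference like B2 splits apart, and why the last column XFD is column 16,384. The function names `column_index` and `parse_reference` are made up for illustration; this is not how Excel itself is implemented.

```python
import re


def column_index(letters: str) -> int:
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        # Treat the letters like digits in base 26, where A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_reference(ref: str) -> tuple[int, int]:
    """Split an A1-style reference into (column index, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, row = match.groups()
    return column_index(letters), int(row)


print(parse_reference("B2"))   # (2, 2): second column, second row
print(column_index("XFD"))     # 16384: the last column on a sheet
```

The letters work like a base-26 counter, which is why the columns "start duplicating" after Z: AA is simply column 27.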
alternatively I can also select the cell
I want to go to and then come up here
into the formula bar press what I want
so I want five for Python and go from
there whenever I press enter it then
goes down to the next cell so
technically I could just go through and
enter this all in using my keyboard and
I don't have to click or move the mouse
at all except to select the cell that
I wanted so those were all numerical
values when we move into the skill known
column for whether we know it or not we want to
put true or false this is known as a
Boolean value so typing in something
like true I can see when I press enter
it actually updates to be all caps for
this TRUE so it recognizes the data type
of this as Boolean now if you're taking
this course you probably don't know
Excel so we're going to put in false
instead now say I want to update the
rest of these for false false I can yeah
go through and actually type it up or I
can select this lower right hand corner
of cell C2 and now I can drag these
values down and it will autofill it in
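
As a rough mental model of what just happened with true turning into TRUE, here is a small Python sketch that guesses the data type of whatever is typed into a cell. It is a simplified approximation under my own assumptions, not Excel's actual rules (it ignores dates, percentages, formulas and so on), and `classify_entry` and `display` are hypothetical helper names.

```python
import re


def classify_entry(text: str):
    """Return (kind, value) for a typed entry; a simplified approximation."""
    stripped = text.strip()
    # true/false in any casing is treated as a Boolean value.
    if stripped.upper() in ("TRUE", "FALSE"):
        return "boolean", stripped.upper() == "TRUE"
    # Plain integers and decimals are treated as numbers.
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", stripped):
        return "number", float(stripped)
    # Anything else is kept as text.
    return "text", text


def display(value) -> str:
    """Show Booleans in all caps, the way the sheet displays them."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, float):
        return f"{value:g}"
    return str(value)


kind, value = classify_entry("true")
print(kind, display(value))    # boolean TRUE
print(classify_entry("4"))     # ('number', 4.0)
print(classify_entry("Excel")) # ('text', 'Excel')
```

The point is only that the cell stores a typed value (Boolean, number or text) and the all-caps TRUE is how a Boolean is displayed.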
now autofill's not just limited to
Boolean values let's say I had something
like Luke I could put that here and just
drag it down it's going to fill in Luke
all the way through here a cool feature
about Excel is say I have something like
one and then two I could select both of
these cells and then when I drag it down
it's going to actually fill in three or
four now autofill can also throw you off
especially for dates so let's say we're
filling in when we're starting Excel
which we'll put in as today's
date in my case it's August 27
2024 I'm going to go ahead press enter
to save that in it automatically updates
to this formatting here in America if you're in
Europe you may see the month in a
different location anyway if I select
this and actually drag down what you'll
see is is it will do that auto fill in
but it's not going to keep that same day
per it assumes we want to increment by
one day now specifically with dates if I
want to change the format I can actually
come up here and I'll expand out this
home ribbon again and right now it's
recognizing that the number format is date
and for date I have a few different
options I can do short date which is
shown here or even something like long
date I can also go even further which
we'll explore as we get further into
this course into this more number
formats and date actually has a whole
bunch of other different options that we
can choose from but for right now we're
just going to keep it this simple date
format and I'm going to click okay now
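
The fill-handle behavior walked through above (a single value gets copied, two numbers continue their pattern, a single date steps forward one day) can be sketched in Python like this. It is a simplified model under my own assumptions; Excel's real fill logic handles many more cases, and `autofill`, `short_date` and `long_date` are hypothetical names used only for this example. The date formats assume US-style output.

```python
from datetime import date, timedelta


def autofill(seed, count):
    """Return the next `count` values after dragging the fill handle (simplified)."""
    # A single date increments by one day per cell.
    if len(seed) == 1 and isinstance(seed[0], date):
        return [seed[0] + timedelta(days=i) for i in range(1, count + 1)]
    # Two or more numbers continue with the step between the last two.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in seed
    )
    if len(seed) >= 2 and numeric:
        step = seed[-1] - seed[-2]
        return [seed[-1] + step * i for i in range(1, count + 1)]
    # Everything else (text, Booleans, a lone number) is simply repeated.
    return [seed[i % len(seed)] for i in range(count)]


def short_date(d: date) -> str:
    """US-style short date, month first, no leading zeros."""
    return f"{d.month}/{d.day}/{d.year}"


def long_date(d: date) -> str:
    """US-style long date with weekday and month names."""
    return f"{d.strftime('%A, %B')} {d.day}, {d.year}"


print(autofill([False], 3))    # [False, False, False]
print(autofill([1, 2], 2))     # [3, 4]
print([short_date(d) for d in autofill([date(2024, 8, 27)], 2)])
# ['8/28/2024', '8/29/2024']
```

Changing between short date and long date only changes how the same stored date is displayed, which is what `short_date` and `long_date` imitate here.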
assuming you haven't started any of
these I'm going to go ahead and actually
just select all the different cells that
I want and if you were to press backspace
it's only going to clear that top cell
and that's sort of annoying because I
want to delete all these different cells
instead what I'm going to do if I'm on a
Windows machine I'm going to press delete or
in my case since I'm using a Windows VM on a Mac
I'm going to press function delete and it's going
to delete all the different content
right I'm also going to go ahead while
I'm here delete all that different
content down there we don't need it now
we're going to move on to level the type of
data we're going to put into this is
text so in the case of excel you're
probably a beginner so I'll put in
beginner and then if I want to I can go
through and fill out different levels
for each of these so python Advanced Power BI
Advanced and so on for all these now one
thing to notice real quick is for the
date it does specify in here under this
home ribbon that it is a date but all
these other ones it just characterizes as
general which is perfectly fine now for
these other options down here let's go
ahead and say I wanted to put in
beginner for all the rest of these I can't
necessarily drag and drop this but what
I can do is I can actually copy it
specifically I could right click the
cell and come up here and copy it but I
don't recommend that also over here on
the home menu they have an option as
well to copy or even cut something so I
can select something like copy as well
and it's going to put these marching
ants as they call it around the cell to
tell you that hey it's actually selected
and then if I wanted to paste it I go
ahead and select down here and I could
paste it down below that's not what we
want to do I don't like going through
and actually selecting all these
different buttons I want to minimize it
as much as possible and I want to use
shortcuts so in order to stop these
marching ants I can go ahead and press
escape and I'll select the cell that I
want to copy and from there I'll press
Ctrl+C and that copies it and then I
can go ahead and paste it below by
selecting the cell that I want and
pressing Ctrl+V now you'll be
noticing that when I'm going through
this I have these shortcuts appearing right
here next to me on the screen so you'll
be able to follow along as well as I'm
using these shortcuts the other option
is I could cut this so I could press
Ctrl+X and then paste it in here Ctrl+V but
this is going to go ahead and take this
value out of here we don't want to
necessarily do that so I'll just copy
this again Ctrl+C and then paste it right
above here Ctrl+V shortcuts are going
to be a big timesaver and we're going to
be using them a lot throughout this
course in order to save you time and
having to go back to your mouse in
order to manipulate it and select the
different
cells all right so let's step this up a
notch and we're now going to get into
using formulas and formulas are denoted
by whenever we go into a cell like
difficulty here which we want it to be
on a 1 to 10 scale we denote formulas by
an equal sign and in this case we want
the difficulty to be on a 10-point scale
basically transition from that 5 point
scale so we need to multiply it times
two so we could do something like 4 * 2
and I press enter and it's going to give
me as I expect eight but I actually
don't recommend hardcoding values that
are already inside of excel here
specifically this four so instead of
this I'm going to remove this and I can
either type in the cell coordinates of
the cell so I could type in
B2 and as you notice it's highlighting
one the B2 is blue but then the cell B2
is highlighted in blue alternatively I
can have an equal sign here and just go
over and actually select it as well
whenever I press enter it's going to go
ahead and say hey it's four now that
I'm referencing that four I want to
multiply it by 2 so B2 * 2 pressing enter we
have 8 once again we're going to use
that power of autofill so I can select
that cell of F2 and now drag it down and
what's going to be pretty interesting
about this is the two as denoted in the
formula bar and actually whenever I
click into it as well the two Remains
the Same but autofill automatically
knows to adjust the formula or the cell
coordinates for the next cell Down based
on how I did that autofill just to show
this as well I could say hey let's equal
this to B6 right below it and then if I
were to drag this over it's going
to then put in C6 D6 E6 then F6 so
pretty cool I'm going to go ahead and
delete this now the last column we're
going to be filling in is skill and
level we'll also be using a formula for
this and we'll set this equal to this
skill thing and also this level so I'll
start by putting in an equal sign and
then it's not on the screen right now
but I know it's in b or sorry A2 and I
can see that selected by scrolling over
here now how am I going to get in that
F2 well I can do an ampersand now and
from there I'll put in F2 and it has
this selected as well pressing enter
ended up in the wrong one sorry about
that should have been E2 and now I have
Excel beginner but there's no space in
between there this is sort of hard to
read so what I can do is actually
manipulate this to include another
ampersand and then in between this I'm going
to put quotes and this is hey insert
this text character in between it
specifically I want to have a space then
a dash and then another space and then
press enter now if I tried to do this
without the quote if I just did this and
press enter I'm going to get a typo in
my formula you have to actually put
those quotes around to show that it's
text and it's trying to correct it for
some minus sign I don't really like how
it's doing it oh my gosh it's freaking
out now anyway I put the quotes back in
there pressing enter boom we have it and
like before I'm going to just do
autofill to fill all those
in so let's zoom out a little bit cuz
we're going to be now be working with
ranges which is a collection of cells
now if you notice whenever I select in
this case I'm selecting B2 it says B2 up
the top but if I go to select more of
this it will actually call out that five
r or five rows by two c or two columns
and then when I Let Go it just goes back
to B2 anyway ranges are a selection of
multiple cells so if I come over here
to I1 put in an equal sign and then if I
want to say copy this entire range I can
go ahead and select this all so it's
saying it's A1 colon G6 so start the
upper left hand corner of A1 and the
bottom right hand corner of G6 now this
is pretty cool there's a new feature of
excel of dynamic ranges it's going to go
ahead and fill this in there's only one
formula in here of that A1 through G6
but you see that has this shadow border
around here that's showing that this
dynamic range is now filling in for all
these different things and if we look at
the formula bar it's sort of grayed out
here too only at the very
beginning does it show that A1 and G6
and then you could manipulate it so if I
wanted to I could change it to G5 and it
would just go down a row now we're not
limited to just that we could in fact
select an entire column so in this case
I'll put an equal sign and let's say I
want to do the full column of column
a right here I can select up here a it's
going to select all the way down and if
we go over to the formula bar itself we
can see that it's saying a colon a that
means all the contents of column A are
going to be included in this and from
there it's putting a copy putting all
these different things and then when
there's not a value in it because it's a
copy similar to over here for these
dates of zero we're going to see Zero in
all these different values all the way
down now similarly I can also do a copy
of a row or multiple rows so in this case
if I wanted to do rows
five and six I could press enter I'm going
to get an error with this though and that
has to do with this Q column right here
that we already filled in here so I'm
going to go ahead and delete that real
quick get rid of it and now we have that
rows five and six duplicated below along
with that shadow around it and all there
now these ranges are going to save us a
lot of time later so I'm going to go
ahead and delete this right now I don't
want any of that as later on when we get
into actually using functions within
formulas I can use something like the
average function put in a range in here
so it selects all of it and then get the
average of it in this case now one last
thing to note on this before we wrap up
here on how to save this is you may have
noticed that this date started over here
is a number and that's because that's
how Excel stores dates with within this
spreadsheet right here so if I actually
click on it go back up to home right now
it's storing it under the format of
General so if I were to make
this into an actual date we can see that
it is in fact 8/27/2024 now just some fun
little trivia if I were to put in number
one and transition it to a date so
coming up here and selecting date that
first date starts at January 1st 1900
and then they move on the numbers from
there all right last thing we need to do
is now save the work that you just
completed with this you can do this
multiple different ways we can come up
here to the top of your Excel workbook
right here and click save you can also
as shown use Ctrl+S
alternatively you can come over here to
the file menu and then come on down to
save or save as and then if you wanted
to you can specify the location where
you actually want to save your file and
save it there now you do have the option
which I highly recommend if you're
working with real world files you
actually want to save to use this
AutoSave feature the one caveat to this
is that your files have to be stored on
OneDrive right now with the plan that I
have I can store about one terabyte of
files on there so if you'd like to do
that feel free to transition your files
there I'm not going to and I won't
have AutoSave on for this but for very
important files definitely do have
autosave set up all right for those that
have purchased the practice problems and
notes you have some practice problems to
go through and get even more familiar
with manipulating cells inside of a
spreadsheet after that we're going to be
going into manipulating a workbook with
that see you in the next
one all right we're going to be
continuing on with this spreadsheets
intro focusing now on workbooks so
previously we were focusing on
worksheets which are a sheet inside of a
workbook now we're going to be focusing
on manipulating and moving data between
workbooks now for this I don't want you
immediately jumping into that 2or
workbooks Excel file this really just
has all the answers in it it doesn't
have really what we need for it instead
we're going to be starting with a new
workbook and instead importing in some
data so specifically if we go into this
folder of zore resource
into data sets we have this one Excel
file called Data job salary monthly now
this is similar to the data that we're
going to be using for the remainder of
the course we're actually going to use
another Excel sheet but this one here is
pretty neat because it's broken up by
months into different sheets so all the
job postings for January are in this
sheet called Jan and so on for February
and so on for March so what we're going
to be doing in this lesson is moving we
want to just evaluate the January data
move that into a new workbook so to get
a new workbook as easy as possible we're
going to come over here to the file menu
I'm just going to go to new and click blank
workbook now here I have that new
workbook right now it's titled book two
because it hasn't been saved anyway
going back to that file menu just to
show you I have different options to
get a new workbook so we went into new
and just selected a blank workbook
also we could use this Home tab and
select a blank workbook based on
that also have a bunch of different
tutorials you can check out also we have
this open tab right here which allows
you on the left hand side to select a
location like this PC or even browse
different locations in your file system
but frankly I'm using more often than
not over here on the right hand side
this right here where this shows a past
history of Excel files I've worked with
so I can go through and actually select
an Excel file pretty easily we're going
to explore more about this file menu
more in a bit let's get moving some data
first now before we get into copying
this data into the new workbook itself I
want to actually just copy it within its
own workbook so if we notice some controls
down here at the bottom we have all the
different Sheets if we want to add
another sheet which I want to copy it to
I'm just going to add this in right here
and I'm going to call this Jan copy
press enter and that's the new sheet and I
renamed that by just double clicking
in there and then allowing it to edit in
addition I can right-click it and I can
do things like rename it and that will
do the same thing now there's also some
controls around here you notice there's
some arrows on right here and what that
does is just Scrolls all the way over or
incrementally over so I can see all the
different sheets in this case there's
more sheets than I'd actually see in one
view then we have the scroll bar over on
the right hand side this is actually
just controlling the scroll area within
our new sheet of Jan copy so previously
we saw how we can copy ranges using a
formula in this case I'm entering equal
to and then I'm just going to select
this range right here press enter and I
can get it inserted in and then actually
looking at the formula it's just equal
to
J1 uh colon p8 and this has its range
right there all right so I want to get
the contents into this sheet so I'm
going to start by putting an equal sign
and then I go over to that Jan sheet
and when I go over here you're going to
notice that now next to that equal sign
I have Jan the name of the sheet and an
exclamation point this is identifying
the sheet and I want all this different
items so as I go to select it all you
can see that it's updating in the
formula bar right now I have A1 through
P2 selected but I actually want to
select everything in this sheet and
we're about at 3,000 rows and right now
I'm only about 500 of those this is
going to take forever so I don't
recommend necessarily doing this type of
method to try to select all your data so
I'm going to go ahead and Escape out of
this and go back to where we were at the
Jan copy instead once again I'm going to
press that equal sign go back to that
Jan sheet right up in the formula bar once
again I can see that it has the Jan and
the exclamation point I'm going to
select A1 to start with and I'm going to
press the shortcut Ctrl+Shift and then
the right arrow key and now all the top
row is selected from here I'm going to
continue to hold Ctrl+Shift and press
the down arrow and it's going to
select all the different rows so as we
can see up here A1 to P3103 scrolling
down we don't have any more data now all
I have to do is press enter and I did
this to basically show the nomenclature
now so now we're not only selecting a
range but we're also selecting a range
from a different sheet and this is how
Excel does the nomenclature or the formula
necessary to make this work and once
again this is a dynamic range appearing
inside of here but we really want to put
it inside of here into this new workbook
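
As a rough illustration of that sheet-qualified nomenclature, the sketch below splits a reference such as Jan!A1:P3103 into its sheet name and its cell range and counts the rows it spans. The helper names are made up for this example. The quoted form 'Jan copy'!A1 is how Excel writes sheet names that contain spaces.

```python
import re


def split_sheet_ref(ref: str) -> tuple[str, str]:
    """Split 'Jan!A1:P3103' into ('Jan', 'A1:P3103')."""
    sheet, sep, cells = ref.lstrip("=").rpartition("!")
    if not sep or not sheet or not cells:
        raise ValueError(f"expected Sheet!Range, got {ref!r}")
    # Sheet names with spaces are wrapped in single quotes; strip them here
    # and undo the doubled apostrophes used for escaping.
    if sheet.startswith("'") and sheet.endswith("'"):
        sheet = sheet[1:-1].replace("''", "'")
    return sheet, cells


def row_count(cells: str) -> int:
    """Count the rows in a bounded range such as A1:P3103."""
    rows = [int(n) for n in re.findall(r"\d+", cells)]
    if not rows:
        raise ValueError(f"no row numbers in {cells!r}")
    return max(rows) - min(rows) + 1


sheet, cells = split_sheet_ref("=Jan!A1:P3103")
print(sheet, cells, row_count(cells))
```

The exclamation point is the only separator between sheet and range, so the sketch splits on the last one and treats everything before it as the sheet name.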
so what I'm going to do is I'm going to
actually delete this sheet right here
because we don't need this copy sheet in
here I don't want to actually manipulate
my data at all going to right click it
and select delete it's going to prompt
me any time that hey you're going to
permanently delete a sheet do you want
to continue yeah I want to continue now
once again I'm going to go back to that
original blank sheet that we have I want
to put it into here so I'm actually
going to name this one Jan and then
we'll call this one formula CU
technically it was a formula not a copy
I don't know why I did copy before
anyway back into A1 once again I'll
press that equal sign and then going
back to that other workbook I will
select the first cell in there which
is actually A1 and now we can see
in the formula bar which is
sort of strange looking that for
our other workbook that we work with we
have inside of brackets the Excel file
name then the sheet that we're in and then
the actual cell reference of A1 we have
dollar signs around this this locks the
reference which we're going to go
into more detail on but the main thing
to understand is this has A1 selected
right now but we want to select all this
data so that shortcut of Ctrl+Shift
right selects all the different columns
and then Ctrl+Shift down okay it's
all selected I'm going to go ahead and
press enter and it's going to take me
back to my original workbook that I was
trying to work with this now that was
using formulas to copy this data we're
going to explore two more options the
second one is going to be somewhat
familiar using copy and paste so I'm
going to create this new sheet I'm going
to call it Jan copy and
paste from here I'm going to go back to
our original data that we have and since
we're at the bottom of the sheet I'm
just going to select the bottom right
hand corner press Ctrl+Shift left now
if you noticed it went and
stopped at this blank cell right here
which isn't a big deal I'll press it one
more time it'll go to the next cell over
that actually has a value in it and then
once again it's going to go all the way
to the end of a
3103 so basically if there's any blanks
while you're trying to do this it's
going to stop at those values there okay
and then from there I'm going to press
Ctrl+Shift up and as we're seeing
it's going to stop at every different
Blank cell along the way this is going
to take forever unfortunately I don't
recommend you actually do that ever
again
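
The stop-at-every-blank behavior can be modeled in a few lines. This is a simplified sketch of how Ctrl+Arrow jumping appears to work on a single column, not Excel's actual implementation: the function name is hypothetical, blanks are represented as None, and the end of the list stands in for the edge of the sheet.

```python
def ctrl_arrow(cells: list, start: int, step: int) -> int:
    """Return the index reached from `start` when jumping in direction `step`.

    `cells` is one row or column of values where None marks a blank cell;
    `step` is +1 (down/right) or -1 (up/left).
    """
    last = len(cells) - 1
    nxt = start + step
    if nxt < 0 or nxt > last:
        return start  # already at the edge
    pos = start
    if cells[start] is not None and cells[nxt] is not None:
        # Inside a block of data: run to the last filled cell before a blank.
        while 0 <= pos + step <= last and cells[pos + step] is not None:
            pos += step
        return pos
    # Otherwise skip over blanks to the next filled cell, or stop at the edge.
    pos = nxt
    while 0 <= pos + step <= last and cells[pos] is None:
        pos += step
    return pos


# One blank in the middle of a column, starting from the bottom and going up.
column = ["a", "b", None, "c", "d", "e"]
position = len(column) - 1
presses = 0
while position != 0:
    position = ctrl_arrow(column, position, -1)
    presses += 1
print(presses)  # number of key presses needed to reach the top
```

Every gap in a column adds extra key presses, which is why starting from a fully populated header row and first column tends to select the whole table in two moves.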
instead start up at the top left and do
the Ctrl+Shift over to the right and
then all the way down in order to select
all the cells now like we did before we
want to copy it I could either use this
up at the top in the home ribbon right
here I could actually select copy or the
shortcut which I'm going to recommend of
Ctrl+C and from there going back into
our new workbook selecting cell A1 and
then using Ctrl+V and pasting all this
data in now moving on to the third
example which is actually the one I
recommend you do anytime you need to
move sheets of data basically in both of
those previous approaches you could end up
missing some of the data you move over
so I don't really recommend doing that
instead I would come down here to the
Jan sheet right-click it and select move
or copy so we have this new window that
pops up and under to book right now it
has this Excel workbook selected of data
job salary monthly we don't want to move
to that we want to move to book two we
could also move to a new book but book two is
open that's what we've been working in
that's what we're going move to okay we
can see we have the different sheets
that we've already made in there and it
says in this dialogue this is where you
want to put this before this sheet and
we want at the end so we'll select move
to end now we don't want to take this
sheet Jan out of here we just want a
copy of it so we're going to select this
create a copy and then click okay now Jan
has moved over here but I do want to
actually differentiate this so I'm going
to put move or
copy now in the next lesson we're going
to be exploring more about the ribbon
but we're going to be exploring now more
about the file menu or also known as
backstage view we've gone through this
home new and open we also have this here
for share this is available for well if
you're sharing it via OneDrive this
makes it super easy to share with your
co-workers we're not going to go into a
lot of detail but this is a great option
if you're working in OneDrive and you
want to actually collaborate with other
co-workers you can work on Excel files
at the same time moving down to the list
here we also have get add-ins and we're
going to be actually looking at
different add-ins we can use in the
advanced chapters whenever we get to
that so we'll be working with some add-ins
then next up is info which has over here
on the right hand side some key metadata
about our Excel file itself then if we
want to get into actually protecting our
workbook which we're going to cover in a
few chapters down the road you can get
into actually doing that the only other
thing that I find myself doing from time
to time in this section is on version
history once again this requires you to
be using OneDrive for it but you could
go back and revert to a previous
version that you worked with so it's great
for that now moving into save or even
save as since we haven't saved yet
they're both the same right here I'm
going to go ahead and save this but I
don't want to save this on OneDrive
personal I'm just going to save this on
my desktop so I'll come and select
desktop and then I'll name this two
workbooks and save it now Beyond save as
we also have things like print which I
really don't find myself doing too
often you should be sending an electronic
version export if I wanted a PDF version
of something and then finally close as
well same thing as this X up here just to
X out of it and there's two more areas
down here that I want to call out and
that's account and that allows you to
actually see behind the scenes of what's
going on with your Microsoft account and
this is generic to all the different
Microsoft products that you have so not
just Microsoft Excel as you can see from
my information I'm actually inside the
Microsoft 365 Insider program so I get a
lot of access to Insider features get to
experiment with new stuff before any
other people do anyway this is where you
want to come anytime you want to make
sure that you have your Microsoft
products up to date I have automatic
updates available so even if I click
update now it's going to tell me hey I'm
up to date the other thing to note on
this is the different office themes that
you have on this I'm actually going to
change this right now to use system
settings which on my Mac I use dark
theme so it's going to go to that the last
two options are hiding down here behind
more I have feedback so if I wanted to
give feedback on this product I'd
probably go to something like X or
Twitter instead and then finally options
we'll be getting to options later on in
this but this allows much more
advanced features that we can actually
go in and customize using this menu
especially whenever we get into add-ins
we're going to be doing that from here
all right so now you've become an expert at
how to manipulate different spreadsheets
or sheets along with manipulating them
between different workbooks in the next
lesson we're going to be going into this
ribbon up here and actually exploring
everything a little bit further and
getting a sneak peek into each one of
these for those that purchased the
practice problems and course notes you
have some practice problems to go
through now and experiment working with
different workbooks with that see you in
the next one where we get into the
ribbon see you
there all right this final lesson of the
spreadsheets intro we're going to be
getting into the ribbon inside of Excel
and better understanding what are all
the different tabs and what are the
capabilities by doing some simple
exercises for this we're going to
continue to be analyzing that January
data set that we worked with in the last
lesson and we're going to get into
actually performing some data analysis
with it so for this lesson you can open
and use that ribbon menu Excel file
which I have right here and all the data
that we're going to be working with or
that January data is in this data tab
along with all the examples and all the
different tabs but I don't need this I'm
not going to work with this so I'm going
to close this out instead I'm going to
be working off where we left from last
time in that two workbooks where we
actually moved over that January data
set now quick disclaimer for any of
these files that you're opening up if
you're noticing the security warning that
automatic update of links has been
disabled you can go ahead and just enable
the content and then click right here on
do not ask me again for network files
and select yes cuz I want to make it a
trusted document now if you're getting
any of these errors that the file has
been moved renamed or deleted mainly cuz
you have it in a different location than
where I had it here's actually the
address of the file that I'm using when I
open it up this is the actual
address of where the file is anyway you
can come down here and select these
three dots on the file in question and
just select change Source go into browse
and then from there inside the actual
file itself select where this is so in
this case it's looking for that data set
file with the data job salary monthly
I'm going to select it select okay and
then it's prompting me now that this
linked workbook hasn't been refreshed I want
to go ahead and refresh it and it's
going to update it all right close out
of this now anyway that was all silly
because I'm going to go ahead and delete
this formula one right here and also
this copy and paste tab right here we
only want to keep the move or copy which
is the actual sheet that we moved over
that has all the data for this lesson
okay I'm just going to rename the sheet
data so let's dive into this Home tab
and this thing has a lot to do with
formatting the text and how things
appear within the spreadsheet for
example I can select all these top rows
right here so basically A1 all the way
to P1 I can change this font size to
something like 12 for the fill color or
the background color I can change it to
something like a light gray right now it
looks like it's already bold I could
turn it off or turn it back on
inspecting all these different columns I
can see that some of it is hidden
especially here this date column I can
see inside of here this is the actual
value but whenever we actually look
at it from afar it has these
pound signs so double clicking on
the edge of that H column right here it
actually expands out and moves it where
it needs to go you can actually do this
for all the columns by just selecting all
of them and then double clicking that
last one and then that expands it all
the way we can see that that last column
is well super long so it has all the
different skills typically these titles
up the top I'd like to maintain centered
so that way I know that it's a title but
I could move it to either side also so I
can move it up or down if I wanted to
but we'll leave it right there in the
center as well getting into the number
formatting itself I can actually go and
select something like job posted date
it's going to select that whole column
if I wanted to I can turn this into a
date so in our case I want to do a short
date now other columns I would want to
format are these salary year average and
also salary hour average so besides just
clicking here I can also just select
that hey I want to use this as an
accounting number format and it's going
to automatically put two decimal
places at the end which
since we're in the hundred thousands I don't
really care about so I'm actually going
to remove them by clicking decrease
decimal I'm going to do that twice now
for something like salary hour average
I'm going to also convert this to a
currency but these may have
two decimal places of values included in
them so I'm going to leave them now so for
the Styles and cells portion we're going
to be getting into this more especially
into conditional formatting in the
spreadsheets Advanced chapter and
chapter 4 so we'll save that for then
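
The number formats applied above (short date, accounting with zero decimals, currency with two) only change how a value is displayed. The stored value stays a number, and for dates that number is a serial counted from January 1st 1900. Here is a small Python sketch of both ideas, assuming Excel's default 1900 date system. The function names, the serial 45531 (worked out here for 8/27/2024, not shown in the video), and the sample salary figures are illustrative assumptions.

```python
from datetime import date, timedelta


def serial_to_date(serial: int) -> date:
    """Convert a serial number in Excel's default 1900 date system to a date."""
    if serial < 1:
        raise ValueError("serial numbers start at 1")
    if serial == 60:
        # Excel treats 1900 as a leap year, so serial 60 is a Feb 29 that never existed.
        raise ValueError("serial 60 has no real calendar date")
    # Serials after 60 are shifted by one day because of that phantom leap day.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


def as_currency(value: float, decimals: int = 2) -> str:
    """Format a number for display only; the underlying value is unchanged."""
    return f"${value:,.{decimals}f}"


print(serial_to_date(1))        # the first date Excel can represent
print(serial_to_date(45531))    # assumed serial for 8/27/2024
print(as_currency(125000, 0))   # yearly salary with decimals removed (made-up value)
print(as_currency(45.5))        # hourly salary keeping two decimals (made-up value)
```

Clicking decrease decimal twice corresponds to going from `decimals=2` to `decimals=0` here: the display gets shorter but nothing is rounded in the stored value.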
the next thing I want to do is get into
this editing and this is a pretty
powerful feature we can actually sort
and filter our data if we wanted to so
what I'm going to do is actually select
all these cells from P1 all the way to
A1 and then come in here inside of
editing select sort and filter and apply
this filter so let's actually get into
filtering this data specifically I'm
wanting to investigate
data analyst jobs in the United
States and specifically full-time jobs
we're going to be looking at the salary
data for this so I want to filter it
down for it so I'm going to select here
I'm going to unclick select all and
select data analyst and now it's going
to filter for all the different data
analyst roles there's nothing else
in there additionally for that job schedule
type I want to be looking at full-time
roles only I don't want to include any
other ones so I'll select full-time I
want the country I don't want to be
skewed by any other countries I live in
the United States so I'm going to then
select United States and then finally I
only want to look at the salary or the
yearly salary data so I can actually
come over here to the salary rate and
select here I only want to look at the
year data okay so now this has
everything in it that I want we're going
to get to analyzing and visualizing this
in a second before that I want to talk
about two other features add-ins which we
talked about before when we accessed them from
the file menu you can also get to add-ins via
this and finally analyze data which in
my opinion isn't that strong of a
feature this tab uses a little bit of
artificial intelligence behind the
scenes for you to investigate so it'll
actually provide you different
visualizations that you could actually
visualize out of your data and or even
you can go as far as asking a question
about maybe you want to see hey the
distribution of salary rate or something
like that all you have to do is come
down here and then insert in the chart
that you want to insert in I'm going to
close out of this now we can see that
we've made this salary distribution
that we maybe want to visualize overall
though I find that this analyze data is
pretty hit or miss so I'm not using it
very
often now the insert tab is where I
spend the second most of my time after
the Home tab they conveniently put them in
the correct order there's three major
use cases that I'm using out of this in
chapter 4 on the advanced use of
spreadsheets we're going to be going into
tables and then in chapter five we're
going to be going into pivot tables but
even closer to that in chapter 3 we're
going to be going all into depth on how
to use these charts but let's get a
sneak peek into this specifically
remember we filtered this table down to
data analyst jobs in the United States
and specifically full-time roles we want
to visualize this salary year average
column so with column M selected I come
up here to recommended charts and it's
going to give me a visualization of some
well recommended charts now there's only
four here I can also select this other
tab up here on all charts and actually
try to see hey what would this look like
maybe in a pie chart or a bar chart
anyway I want this in a histogram which
we're going to go into more detail on
how to read later all I have to
do is just come in here double click it
it'll insert it in now notice how
whenever this was created we now have
new tabs appear inside of here
specifically with this selected we have
this chart design and format tab if I
select off of it those tabs disappear
and select it again they reappear this
tab allows me to dive in and actually
further customize these visualizations
to how I want them to appear I can even
move them to let's say a new sheet and I
can title this something like histogram
and then move it the charts don't always
necessarily appear just like that let's
actually do a deeper analysis to see
what are the different job title short
values available I want to clear all
these different filters on here so I'm
going to come back up here with this one
row selected come into editing sort and
filter and I'm going to say hey clear
all the different filters now selecting
column A going into insert and into
recommended charts it's recommending this
clustered bar chart which is actually
what I want to view so double clicking
on this this provides me a breakdown of
all the different counts of the
different job titles within our data
set and we can see things like data
scientist engineer and analyst are some
of the highest amount of job postings in
this data set now unlike our histogram
example this actually provides this data
in a pivot table which we're going to be
going into in the pivot table chapter
which allows me to further manipulate
the data so say I want to actually sort
this I could rightclick the values right
here and click hey sort smallest to
largest and then closing out this pivot
table tab right here I can actually see
what is the highest amount of jobs
compared to the lowest which is cloud
engineer now for the remaining tabs
we're going to be going through them in hopefully
rapid fire in order to cover these as I
find I'm using these less frequently
than these other tabs that we previously
talked about the draw tab allows you to
well draw on your spreadsheet so I can
just write on it if I wanted to but I
don't really find myself doing that
except for maybe when I'm building
dashboards besides that use case it's
pretty rare if I want to undo this
drawing right here I can come up here
and click undo or I can press Ctrl Z
and it'll remove it page layout tab is
great if you're having to print out any
data for those co-workers that are
living in the past and don't know how to
accept things digitally you can do
everything from adjusting your page
layout to adjusting the scale that
you're actually viewing things now
personally I find myself more using
these sheet options right here so if I
go to this job count tab right here if I
wanted to I could turn off the grid
lines on here as you can see it got
white on the background I really like
that now if I wanted to make sure I
had actual grid lines around my table I
come back to the Home tab and from here I
can select borders and from there I want
to put all borders on there so now I
look like I have this table right here
along with my graph super fancy next up
is formulas this is where you need to go
if you can't remember a function that
maybe you want to use if it's a text
function you come in here select
something like text you can scroll
through and actually see even a
description of the different
functions that are available so in this
case replace it tells you hey replace
this part of a text string with a
different text string depending on what
version of excel you have in the newer
ones you'll have this insert python to
insert python functions and then finally
they have more advanced features with
maintaining and updating and formatting
your different formulas and functions
which we'll be diving into in the next
chapter now besides the home and insert
tab the data tab is the next tab that I
find myself using all the time in
chapter 7 we'll be diving into Power
query and we're going to be focusing
heavily on this get and transform data
and also queries and connections and
then in chapter 8 when we get to power
pivot we're going to be going into
managing our data model with power pivot
in chapter 4 we're going to be going
into this forecasting and we're also
going to be adding in some extra add-ins
that are going to appear in this data
tab now I sort of skipped over the data
types and sort and filter because we
saw them on the Home tab they're just
conveniently located here in bigger
format for your use also all right this
tab on review is probably the least
likely for me to actually use I can
actually go through and check things
like spelling and add comments or even
protect my sheet besides that I'm not
finding I'm using that all that often the view
tab is similar to the review tab though
I'm using it a little bit more you
can change the format of how you
actually want to view things but mainly
I'm finding myself using this the most
for freeze panes let's say you see I'm
scrolling down here and I don't know
what the job or what the headers are
right here so going over to this view
tab I can actually come in here to
freeze panes and select freeze top row
or even freeze First Column so in this
case that top row actually stays up
there and I really like it like that now
let's say I want to freeze both the top
row and that First Column there's not
really a selection for that so here's
what you can do you can come over here
to freeze panes and select unfreeze
panes and then select something like a
cell like B2 that means I want
everything above this and to the left of
it to freeze so now when I select freeze
panes this upper or top row is actually
Frozen and then the actual First Column
is Frozen as well all right final tab is
help and I'll be honest I think this is
pretty useless if I get stuck with
anything along the way I'm finding
myself navigating to something like
ChatGPT and it's helping me a lot quicker
than trying to navigate through this
help box that it provides and I'm
already getting an error message with
even accessing it so you can see how
often I even use it
then now we've been doing a lot of
manual clicking with using the ribbon
and I think a good resource that goes
with this is shortcuts so if you come
inside of the resources folder we have an
Excel file here called Excel shortcuts
and what this has in it is a list of all
the different shortcuts that I find
myself using anytime I'm inside of excel
so it's worth having all of these I'm
not going to lie committed to memory it
looks like a long list but I'm telling
you by the end of this you're going to
have all of these basically committed to
memory they're going to be a timesaver now
although I shed on people that print out
stuff this would be something that I do
recommend actually printing out and
having next to you so that way you can
reference really quickly while going
through this course all right now I know
we moved fast through that but we're
really going to be diving into as I
called out during this lesson all of
these different tabs even more as we
advance through all the different
chapters that was more of a sneak peek
into what you're going to be exposed to
coming up in this course all right for
those that purchased the practice problems
you have some problems to go through and
actually experiment more with the
tabs in the next chapter we're going to
be jumping into functions and also more
specifically formulas in order to build
them out and perform data analysis on that
data science job posting data set with
that I'll see you in the next
one all right welcome to this chapter on
formulas and functions in this lesson
we're going to be focusing specifically
on going a deep dive and understanding
formulas then in all the follow on
lessons after this we're going to spend the
majority of our time working on
functions for that we'll be exploring
the entire function Library focusing on
the key functions within this library
that I find that I'm using time and time
again in data analytics so what are we
going to be doing in this lesson well
we're going to be focusing on a
fictitious data set we're going to keep
it small in order for us to get more
familiar with operating with formulas
and operating on this data set
specifically by the end of this we're
going to be able to input into this
worksheet a number of years of
experience or total salary and be able
to see whether these jobs meet those
conditions specifically whether they meet
both of those conditions so for this you
can follow along by opening that
formulas intro workbook in this workbook
we'll be staying in this data sheet right
here all the different answers when we
get to the math operators comparison
operators or cell referencing are shown
via that sheet but we'll just be
sticking with data for
now first is math operators and as shown
by this table here you can use a variety
of different symbols to conduct
different multiplication subtraction
division operations that you want to do
so let's dive into testing some of these
out we're going to be filling in each of
these columns that correlate with the
associated job title as we go through
this so the first one's going to be
experience pretty simple right we talked
about before in order to reference
another cell we would use an equal sign
and then from there we can either type
or select a cell I'm going to recommend
just typing it to make it go faster C3
it's highlighted blue because that's the
cell that's highlighted then we'll be
using the autofill feature of this to
fill in all the cells below and we
notice that it updates to here this
one's equal to C12 which correlates to
this one right to the left of it so
let's calculate our total salary and
this is going to be taking our annual
salary in column D and adding it to our
bonus Max in column e so we can do this
by specifying
D3 plus E3 and from there pressing
enter once again to autofill it I select
that cell that I want and drag it on
down now if I want to calculate what is
the rate of bonus or the bonus rate that
is going to be the bonus divided by that
salary so in this case E3 / D3 once
again going to use autofill drag and
drop it all the way down now for all
these values I don't like what it's
formatted as right now I'm actually
going to change this to a percentage and
I want to see one decimal place so I'll
press this one to expand out one now
anytime I do any type of mathematical
operation in Excel I always want to try
to confirm it that it's correct I did
the operation correctly so in the case
of this bonus rate I can do this by
confirming what we got for total salary
previously so we take that bonus rate
which is what we want to confirm right so
we're going to take that and multiply it
times our annual salary right so that
should give us that bonus right
there then if we wanted to like we said
we want to confirm total salary right
here so I can just add in that we want
to also add in that annual salary itself
and we do have that total salary right
here to actually confirm what's going on
dragging it down and doing an autofill
all these values look like they
correlate to what it should be for total
salary so I feel we calculated the bonus
rate correctly now going back into the
formula itself you can see we have
multiple operations in here how do we
know whether multiplication addition
subtraction what comes first well really
if you know the order of operations it
really is the same here here are the
different operators listed in their
order of precedence exponentiation comes
first multiplication and division are second
then addition and subtraction are third
it's Then followed by concatenation
which we did in one of the previous
lessons followed by the comparison
operators which we're about to get
to so with that segue here we are
comparison operators
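As a recap before the comparison operators, here is a minimal Python sketch of the total salary, bonus rate, and confirmation formulas described above. The salary and bonus numbers are hypothetical and are not taken from the workbook. For the operators used here, Python follows the same order of precedence as the one listed above.

```python
import math

# Hypothetical values standing in for one row of the sheet (not from the workbook)
annual_salary = 80_000   # column D
bonus_max = 20_000       # column E

total_salary = annual_salary + bonus_max   # like =D3+E3
bonus_rate = bonus_max / annual_salary     # like =E3/D3

# Confirmation formula: rate times salary gives the bonus back, then the salary is added.
# Multiplication runs before addition, so no parentheses are needed.
confirmed_total = bonus_rate * annual_salary + annual_salary

print(total_salary, bonus_rate, confirmed_total)
print(math.isclose(confirmed_total, total_salary))

# Order of precedence: exponent first, then multiplication, then addition
precedence_demo = 2 + 3 * 4 ** 2   # 4 ** 2 = 16, 3 * 16 = 48, 2 + 48 = 50
```

If the confirmation value and the total salary disagree on any row, that is a hint the bonus rate formula was typed incorrectly.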
for this you probably are familiar with
the first three the last three are
something that get a little bit more
complicated whenever you have a greater
than or equal to less than or equal to
or in this case a not equal to so
previously I just sort of did a cursory
check to make sure this confirmed
total salary column equals this other
total salary column but imagine you have
hundreds of thousands of rows how can we
actually compare this and find these
values well what we can do is we can say
hey is G3
equal to I3 this looks a little bit
confusing right because you have two equal
signs in there but everything to the
right of the first equal sign is basically a
comparison and from there it either ends
up as a true or a false and we can drag
and autofill this in and everything is
true similarly if we want to find
something like is the bonus Max greater
than the annual salary we can do hey is
bonus Max at E3 greater than that at D3
and as is typical of any data science
job none of these really exceed that at
all all right now that we're familiar
with math operators and also comparison
operators let's dive deeper into cell
referencing and we've been doing this
previously whenever we reference another
cell like A2 but we're going to add a
little twist to this I'm going to go
ahead and hide some of these columns
that way we clear up the Clutter going
to hide column F by right clicking it
and selecting hide then I'm also going
to select all the columns H through K
and also hide them I want everything to
appear on the same sheet so we're going
to be referencing this table down here
for this portion of the exercise and
these are potentially goals that you may
have when you're trying to land a job
you may know how many years of
experience or you should know how
many years of experience you have along
with a goal total salary that you want
to achieve and so we're going to be
building out formulas with this in order
to be able to find out which of these
jobs actually meet our conditions of the
expected years of experience and total
salary so for this we'll go with
that I have five years of experience
then I'm looking at
$90,000 the first thing we want to calculate
in column L is whether it meets our
experience so for this we'll say hey is
C15 right here less than or equal to the
value right here in our experience and
as expected five is less than or equal
to basically equal to 5 it's true now
we're going to run into a problem now when we
try to autofill this if I try to
autofill this down I'm getting this one
is false and then these are all true but
I would expect especially this AI
specialist at three it would be false
and so let's actually inspect this well
as we can see from this this is
referencing well c23 which is way down
here but it's still referencing the
correct C11 right here the problem is we
didn't really want this value up here
this C15 to actually change whenever we
went to do the autofill down below it so
what we can do here is provide a fixed
reference of that cell in order to do
this we're going to insert those dollar
signs that we saw
previously before the column and then
also the row so in this case I have C
locked and I have 15 locked now the
formula itself doesn't change at all but
now when I drag and drop this down all
of these are updating correctly as
expected AI specialist is going to be
false whenever I actually click on it to
inspect it it's still referencing that
C15 and C11 next we're going to move on to
column M of seeing if it meets our
salary requirements so for this one
we'll be seeing hey is the salary or
total salary in G3 greater than or equal
to our total salary down here of 90,000
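The autofill problem and the dollar-sign fix described above can be sketched in Python. This is only an illustration and not how Excel is implemented: `fill_down` is a hypothetical helper, and `=C3<=C15` is an assumed form of the experience check. The helper shifts every row number that is not locked with a dollar sign, which is what dragging a formula down does.

```python
import re

# Matches cell references such as C3, $C$15, C$15 or $C15
CELL_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def fill_down(formula: str, rows: int) -> str:
    """Return the formula as it would look after autofilling down by `rows` rows."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        # A dollar sign in front of the row number keeps that row fixed
        new_row = int(row) if row_lock else int(row) + rows
        return f"{col_lock}{col}{row_lock}{new_row}"
    return CELL_REF.sub(shift, formula)

# Relative reference: the goal cell drifts from C15 down to C23
print(fill_down("=C3<=C15", 8))      # =C11<=C23

# Fixed reference: the goal cell stays at $C$15 while C3 still moves
print(fill_down("=C3<=$C$15", 8))    # =C11<=$C$15
```

Eight rows below the first formula, the unlocked version points at C23, which matches the drift seen when inspecting the last cell.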
now we already know we need to lock C16
with this 90,000 because we're going to
be autofilling it down I can manually
type in the dollar signs but a shortcut
to this is just pressing F4 if you're on
a Mac you'll need to press function F4
anyway this locks this in so now
whenever I drag and drop this down as
expected the only one that's less
than 90,000 is this data analyst role
right here now I want to play with this
just a little bit more so we talked
about this right here putting a dollar
sign in front of the column and then a
dollar sign in front of the row is a fixed
reference they also have what is called
a mixed reference so I'm going to go
ahead and put my cursor right there next
to G3 I'm going to press F4 and it's
going to do the absolute reference but
if I press it one more time it's going
to do a mix reference if you notice
there's only a dollar sign in front of
the three or if I press it again there's
only a dollar sign in front of the G now
technically this is going to work just
fine because we're going to now lock
this G column for this but it's going to
allow the three to update so I'm going
to show you this now by actually
dragging and dropping this down and from
there inspecting that last cell contents
we can see that that g is locked as
expected but it moved down now instead
of locking just the column we could also
lock the row so I could also change
up C16 now instead and lock the row of
C16 because we're going to still stay in
that c column right there pressing enter
now autofill we don't have to just go
down we can also go up so inspecting it
locking it didn't really change by only
locking the row of 16 so let's wrap this
all up by actually finding out which
of these actually meet both of our
conditions of 5 years and 90,000 well
it turns out that behind the scenes true
is equal to 1 and false is equal to 0 so
if I actually were to take this and add
this true to this true right here we
should get two autofilling it all the
way down we have 2 1 2 1 which basically
confirms that hey false is zero
because 0 plus 0 is zero now I recommend
instead we're going to be going through
and doing L3 * M3 so that way only when
both of these are true will it
return a one and now in order to get a
true or false back on whether it meets
both we can select that N3 and see hey
is it equal to one type over there equal
to one and it evaluates to true so now
I'm going to go ahead and just hide
these columns so we can actually see
this a little bit better but we can
find values in here that meet our
conditions of the 90,000 and 5 years and
let's say we're doing job searching and
it lasts over a year and we have to
change this to six this will
automatically update the formulas that
we've used here as shown here so that's
our intro to formulas and for me the
hardest thing to wrap my head around
when I was first tackling this was
around absolute and mixed references so
we have some practice problems for those
that purchased the course practice
problems in order to go through and test
this out and understanding what happens
whenever you lock the row or lock the
column all right and after that we'll
next be diving into an intro into
functions which I'll be covering for the
remainder of this chapter with that see
you in the next
one for this lesson we're going to be
focusing on an intro into functions
specifically we're going to be going
over all the different functions that
we're going to be deep diving within
this chapter itself along with some
common problems you may run into and
errors and how to troubleshoot them to do
this we'll be continuing on from that
data set that we used in the last lesson
specifically we'll be calculating things
like averages and counts and how many
jobs actually meet our goals and we'll
be using functions for this so you can
continue working in that workbook that
you had from last time or open this
function intros workbook in this
function intros workbook I've gone ahead
and moved our job goals over here to
columns R and S and then added in this
bottom portion right here for the
averages and total counts really you can
do and manipulate as you
want so why use functions let's look at
a couple quick examples on the
importance of these things let's say we
wanted to get the average of each one of
these Columns of experience annual
salary and bonus Max previously we know
we can actually reference each one of
these cells to calculate the average if we
wanted to do that we would have to
actually add up all the values so I have
to go through select C3 C4 all the way
down to
C12 and we would need to divide it by
that total number of 1 2 3 4 5 6 7 8 9
10 in that case we'd get the average also
me counting to that 10 wasn't necessarily
perfect so I don't really recommend
doing this but anyway nonetheless we can
actually do autofill to calculate the
other averages as well as it
automatically updates the referencing
correctly but I don't recommend
doing that instead I recommend using
functions specifically we can use
something like the average function as
soon as I start typing a function A in
this case all the functions that start
with A pop up if I wanted to well I
do know I want average right here I can
select it it provides a brief statement
of what it's actually going to do and
then I can doubleclick it to insert it
below here it actually specifies what's
going on with this function here and
specifically it prompts me to hey
provide these numbers now I could
select these number by number as we can
see that in brackets here there's this
number two that means it's an optional
parameter but instead what we'll do is
we'll just provide a range providing it
from C3 all the way to C12 in that case
I got 5.3 similar to above and then
dragging this over we can get all the
other values as well as a quick example
also previously we had made this sort of
convoluted formula in order to calculate
whether we met both conditions of
meeting our experience and also our salary
which were specified over here well
there's actually a function for that and
it's called the AND function and what it
takes for its arguments are logical
values so it can take a logical one for
the first parameter I can specify L3 and
then for the second parameter I can
specify M3 and notice how this second
parameter now highlights or becomes more
bold as I put it in so you can keep
track of where you are in the formula
anyway I'm going to close the parenthesis
press enter and it evaluates to True
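Both comparisons from this "why use functions" section can be sketched in Python. The lists below are hypothetical stand-ins for the worksheet columns and are not the real data. The sketch shows that a built-in average matches the manual add-and-divide approach, and that AND gives the same answer as the earlier multiply-and-compare formula.

```python
import math
from statistics import mean

# Hypothetical experience values standing in for C3:C12 (not the workbook data)
experience = [5, 3, 8, 5, 2, 10, 7, 4, 6, 1]

# Manual approach: add every cell, then divide by a hand-counted 10
manual_average = (5 + 3 + 8 + 5 + 2 + 10 + 7 + 4 + 6 + 1) / 10

# Function approach, like =AVERAGE(C3:C12): the count is handled for us
function_average = mean(experience)
print(math.isclose(manual_average, function_average))

# Hypothetical TRUE/FALSE results standing in for columns L and M
meets_experience = [True, False, True, True]
meets_salary = [True, True, False, True]

# Earlier formula, like =L3*M3 then checking =1: True counts as 1 and False as 0
via_product = [(l * m) == 1 for l, m in zip(meets_experience, meets_salary)]

# Function approach, like =AND(L3, M3)
via_and = [l and m for l, m in zip(meets_experience, meets_salary)]
print(via_product == via_and)
```

The function versions are shorter and leave no hand-counted constant to get wrong.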
dragging it all down these should match
these other ones and yeah this is
definitely something I'd use over these
formulas that I've used
before so let's dive into this formulas
tab more and understand the capabilities
that we're going to be carrying out in the
next lessons in this chapter the most
powerful of these especially for those
new to excel is this insert function
anytime you're looking for a function
and maybe can't recall the name
and you're not sure what it even starts
with you can put something in here so
say I wanted maybe the average I can
type in average and then everything that
basically calculates a different average
off of it even if they're closely
related like this rank average will pop
up in here along with a description
below explaining it if you've used a
function recently you can come in here
under recently used and I frequently
find myself just going back to this in
order to select something I may have
used recently now in the next seven
lessons we're going to be diving into
each one of these all the way through it
from logical and text to look up and
also math and trig now one note we
won't be going into detail on these
financial functions because I find
they're sort of nuanced but we will be
going into all the different ones that
I'm using on a daily basis as a data
analyst that aren't specific to
financial
applications so let's get into
understanding the basics about formulas
by calculating these different counts
and especially counts around whether any
of these jobs meet our goals for this I
know I want to use a count function so
I'm going to go to this insert function
I'm going to type in count now there's a
bunch of different ones that pop up
count itself just counts the number of
cells in a Range that contain numbers it
has to have numbers in it if I wanted to
do something more around text I would
say hey count the number of cells in
range that are not empty I could even
do something conversely of counting the
number of blank cells for us we want to
actually do count so as we showed before
I'm just going to come in type count
it's going to prompt me that I need to
at least put at minimum a value and I
want to count all these cells here so
using autofill to fill it over um we can
see that all the different values are 10
nothing really spectacular here but now
let's get into a pretty unique use case
of count so in this scenario that I'm
trying to calculate in cell C16
I'm trying to find out how many jobs
above here in these 10 right here how
many meet our goal of less than or equal
to 5 years and I want to count the
number of these so I know I want a type
of count I can go into insert function I
know it's here inside these different
statistical functions specifically I
have these different counts right here
and I'm going to scroll over this count
if right here and it's going to provide
me a description it says Hey counts the
number of cells Within range that meet
the given condition and that's what we
want to do we want to meet a condition
of a certain amount of experience now it
provides this box in order to help me
input in these values so for the range
here what I can do is specify hey I want
to count inside of here if they meet a
certain criteria and just going back to
that range right here we can see that it
already input all those different values
into an array-like object okay so the
criteria right now is next I want to put
something in here I can also press this
box and it'll make it disappear and I
want to compare it to this experience
but I want it to be less than or equal
to five so I can press enter to accept
it but the problem is it's going to
evaluate whether any of these cells
here equal five and right now we see that
there are two I'm going go ahead and
close up so we can see this better right
now we can see that there's two fives in
here that's not what we want we want
to see everything that is less than or
equal to 5 so instead what we need to
put in here is less than or equal to 5
now I'm going to press enter and we're
going to get an error this is pretty
common whenever you are manipulating
different formulas and you have in this
case I have this less than or equal to
right here so Excel is confused by this
what we need to do is actually put
quotation marks around this which basically
sort of makes it into a string or text
if you will but now it knows hey I want
you to look for less than or equal to 5
I want you to evaluate this entire thing
pressing enter bam we have six values
here that are less than or equal to five
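The difference between passing 5 and passing the text "<=5" as the criteria can be sketched in Python. This `countif` is a hypothetical helper that handles numeric criteria only, not Excel's actual implementation, and the experience values are made up so that the counts mirror the two and six mentioned above.

```python
import operator

# Comparison symbols as they appear inside a criteria string
OPERATORS = {
    "<=": operator.le, ">=": operator.ge, "<>": operator.ne,
    "<": operator.lt, ">": operator.gt, "=": operator.eq,
}

def countif(values, criteria):
    """Count the numbers in `values` that satisfy a criteria such as 5 or "<=5"."""
    text = str(criteria)
    # Check the two-character symbols first so "<=" is not read as "<"
    for symbol in ("<=", ">=", "<>", "<", ">", "="):
        if text.startswith(symbol):
            compare, target = OPERATORS[symbol], float(text[len(symbol):])
            break
    else:
        # No operator given: count exact matches only
        compare, target = operator.eq, float(text)
    return sum(1 for value in values if compare(value, target))

# Hypothetical experience values standing in for C3:C12
experience = [5, 3, 8, 5, 2, 10, 7, 4, 6, 1]

print(countif(experience, 5))       # exact matches only
print(countif(experience, "<=5"))   # operator and number passed together as text
```

The operator has to travel inside the text value, which is why the worksheet version needs the criteria wrapped as a string.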
now similarly I can drag this over
because we want to also do this for
salary but I don't want to do less
than or equal to five I want to do
greater than or equal to
90,000 and in this case we have nine because
we only have one that's less than
this but as you'll find out in this course
I don't like hardcoding values into my
formulas in this case I have five inside
of here but I'm already having five
right here what happens if I want to
change this maybe to say something like
three well it's not going to actually
update these values right here so I'm
going to go ahead and actually change
that back to five and we're going to
make another formula that actually fixes
this so I want to drag these down but we
actually didn't lock either one of
these cells and it will cause errors if
we do so I'll just select right next to
it press F4 next to C3 I'll do the same
with F4 doing the same in this cell as
well all right now I'll take this and
I'll drag this down so now let's
actually fix this to be more Dynamic we
don't want it to be less than or equal
to this five right now what we can do is use
that ampersand operator and then from
there put in a reference to S3 which
contains our five pressing enter bam we
got six same thing here I can delete
that 90,000 put in an ampersand and
then from there we're going to be
basically mashing it
together with that 90,000 and it
evaluates now when we change this
experience to say something like two we
can see that it actually updates
appropriately to see that oh only one
job meets this requirement so pretty
cool I'm going to change that back to
five now frequently you're going to run
into errors with your formulas let's say
I wanted to divide one by zero not a
good thing that we need to do anyway I'm
going to get this error right now you
can notice it because it has this green
check on the upper left hand corner but
also it starts with this hashtag and
it's saying hey you have a divide by
zero error I can even come down into
here and it tells me even more on this
provides help on this or if I wanted to
even ignore it now in this sheet of this
workbook I have a bunch of
different errors in here that you may
run into from time to time and
we're going to be running into these
errors as we go through the rest of this
chapter so if you get stuck along the
way while we're going through this I
feel like this is a good reference for
you to maybe save somewhere in order to
understand what is going on with the
different errors you may encounter now
the biggest time saver I've found with
any of these errors is using some sort
of chatbot specifically for me I'm going to
go to something like ChatGPT or even
Claude they're going to be able to provide
really quick help in understanding what
an error is and what I need to do to fix
it all right so now it's your turn to
dive in and test out this intro into
functions and play with them and
experience some of the errors of your
own after that we'll be diving into
logical functions a major type of
function that you need to be aware of
with that I'll see you in the next
one now that we have the basics down on
formulas and also functions we're going
to be moving into one of the most
important types of functions to know
logical ones the most popular of these
is an if condition basically looking at
something and then providing a response
based on it so for this analysis we're
going to be jumping into our data
science job salary data set but we're
only going to focus on the first 20 rows
of it here and on the next few lessons
as well as I don't want to overwhelm you
with the all the data just yet now for
the final results we're going to be
doing two major things the first is
determining within this list of jobs
whether they meet our conditions of
finding the job we want of a data
analyst or business analyst and we'll
mark it not desired or role desired
additionally we're going to do a common
practice in analytics of bucketing
basically taking those salaries and
depending on the amount value putting it
into a certain bucket for us we're going
to be looking at whether they have
salary data in this data set or more
specifically if they are greater than
our goal of 85,000
so why are these logical functions
needed well let's jump into that last
data set real quick and simplify how we
can actually use these as a quick
example previously in this P column we
were evaluating whether they met both of
our conditions of experience and salary
we can use an if statement in order to
clarify this so I can specifically call
out with an if statement which first
has the logical test that we want to
actually evaluate so I'm going to put in
P3 in this case as it's going to return
true or false and then from there the
next value in there is value if true
which what do we want to return if it is
true well that our goal is met and then
if it's not met we want to have well not
met okay and then this whenever we drag
this down will provide not met or goal
met depending on if this is true or
false and so that's the power of these
if statements in helping us actually
provide this
value so that was just a quick example
of if let's actually jump into some more
examples so you get more familiar with
how to use this so here we are in this
data set and I don't need all the
columns of this data set so I'm just
going to select the columns that I don't
need I'm going select B through G and
then hide it additionally I'm not going
to need I or J so I'll hide these as
well so our first goal is to identify
whether these jobs meet our conditions of
either a data analyst or a business
analyst we're going to start simple by
just finding out which one is a data
analyst first and then which one is a
business analyst and meets those
conditions so once again we'll start
with that if condition and for this
we're going to put in that logical test
remember per the example we need to
have it return either true or false so
we're wanting to check whether senior
data engineer in A2 is equal to data
analyst in K1 now we're going to be
autofilling this down so we need to make
sure that the A2 we're fine with it
actually adjusting as necessary K1 we
want it to lock at least lock on the row
value of one then if it's true it'll be
role desired and if it's not it's not
desired as expected senior data engineer
is not desired let's drag this all the
way down and just double checking it we
see that the data analyst roles are role
desired okay so I can drag this over now
and just to double check it shifted over
to B2 but it's still selecting
the right one of L1 so actually what I'm
going to do is I'm going to delete this
go back up here I'm sort of a
perfectionist I'm going to end up
locking that A value so it stays in that
A column none of my values are going to
change here and then when I actually
drag it over I can check that okay A2 is
the correct one I want selected to
compare it to business analyst in
L1 okay then I'm going to autofill all
the way down looks like there's only two
business analyst roles here so now how
can we identify that it meets either of
those conditions either data analyst or a
business analyst well we're going to do
one approach first and it's called a
nested if statement and it's not really
the approach I'm going to recommend but
it's something that you should be aware
of so what I'm going to do is I'm going
to select cell K2 I'm going to go ahead
and copy this formula plugging it in
here we have it here and making sure
that it operates correctly yep it does
so how does this nested if statement
work well we're going to still evaluate
our first condition is the first role
evaluated as data analyst does it meet
that if it is we want to mark it as role
desired now we get into what happens if
it's not a data analyst well now we want
to now check if it's a business analyst
so I'm going to close out this and what
we can do is I'm going to take this
business analyst formula right here
everything up to the if and I'm going to
go back in here and I'm going to drop it
in right here inside of the value if
false so it's a nested if statement an
if inside of another if so now if we
don't meet this first condition and the
test isn't true it will go into the
nested if statement and start checking
this condition is the senior data
engineer now equal to business analyst if it is
it's role desired if not it's not desired
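As a rough sketch of the nested if logic just described, here it is in Python rather than Excel. The function name `role_label`, the label text and the sample titles are made up for illustration. In the worksheet the equivalent would be something like `=IF($A2=K$1,"Role Desired",IF($A2=L$1,"Role Desired","Not Desired"))`, assuming K1 holds data analyst and L1 holds business analyst.

```python
def role_label(title, first="Data Analyst", second="Business Analyst"):
    """Mimic an IF nested inside the value_if_false of another IF."""
    # Outer IF: check the first condition.
    if title == first:
        return "Role Desired"
    # The outer value_if_false is itself another IF (the nested one).
    if title == second:
        return "Role Desired"
    return "Not Desired"


# Note: Excel's = on text ignores case; this sketch compares exactly.
titles = ["Senior Data Engineer", "Data Analyst", "Business Analyst"]
labels = [role_label(t) for t in titles]
print(labels)
```

Every extra role would need one more level of nesting, which is why this gets hard to read quickly.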
so let's now drag and drop this all the
way down I'm going to expand this out a
little bit and now we can see if it's
data analyst we get role desired along
now with if it's business analyst also role
desired but I'm not a fan of nested ifs as
they're hard to read instead I like
using the functions of AND and OR and these
should be a little bit familiar because
we saw it from the intro lessons that we
did previously with and it evaluates
whether both conditions are true so in
this case I'll put in condition one of
B3 and then condition two of C3 and
both conditions are true so it satisfies
as true dragging this all down in all
the following condition cases they're
not true for both conditions so
therefore it evaluates as false with OR it
checks whether condition one or
condition two is true and then will
return true so inputting in the
conditions of B3 and C3 one of the
conditions here is true actually both
are dragging it down I expect yeah the
second and third rows are also true
where the final one both are false so
therefore it is false so let's run the
same AND and OR logic that we've run before
in order to determine which one we
actually use so in this one we're
checking whether both of these jobs of
data analyst and business analyst are
equal to this one here senior data
engineer as expected false and what
should we expect for all of these
all of them are false because none of
these are going to be both data analyst
and business analyst so as you can
probably guess or it's probably going to
be the one that's going to work for us
we're evaluating whether either data
analyst or business analyst are going to
match up to that value of senior data
engineer in this case we're getting
those true values for data analyst and
true values for business analyst so now
we're going to put that or function
inside of that if for The Logical test
and from there we can determine whether
it's role desired or not desired dragging
it all down all of it's matching as
expected okay I'm going to go ahead and
hide these
rows so now what happens if we don't
want to just evaluate for a true or false
condition basically we want to evaluate
for multiple different conditions well
that's going to be something that comes
up if you need to ever bucket data which
we're going to be doing with salaries
now for this first one we're going to
just use a simple if statement we want
to determine whether a salary is greater
than 85,000 or if it's not we want to
just specify that the salary is low so
for this we're going to be evaluating if
H2 we're going to go ahead and lock that
H column is greater than that 85,000
which we'll lock completely for the
85,000 then we want to say the salary is
greater than 85,000 conversely if it
doesn't meet this we want to say that
the salary is low I'm going to expand
this out a little bit and then we're
going to drag this down as expected we
have the values returning for those over
85,000 and then this one at 35,000 is
marked as low now the problem we're
running into and why we need multiple
conditions is this one says the salary is low
but there's actually no data there we
need to specify in these conditions that
well there's no data so for this we can
use an ifs formula and what happens with
this is you provide a test and then a
value if true and that's just the first
one we can then provide another logical
test and the value if true so the first
thing I'm going to test is if there is
no value there I'm going to go ahead and
lock that H column as well and when I'm
looking for a blank I'm just going to
put in two quote marks signifying that
it's blank and the value if true is no
data okay putting in another comma we can see
we're now on to logical test number
two the next thing we want to test is if
it's greater than 85,000 so we'll see H2
again locking that H and we want send it
if it's greater than or equal to that
85,000 which will lock if it is we want
to return back that salary is greater
than
85k and finally we're on to the final
logical test and basically we want
everything left over to pass this condition so
instead of providing hey salary less
than 85,000 we're just going to pass in
true because we want it to always be true and
we would expect this to catch any values
that are between 0 and
85,000 so like before we're going to
specify salary low running this I'm going
to expand this out and then drag this
down where there's no data it returns no
data for a salary less than 85,000 it returns
salary low and whenever it's
greater than 85,000 we get the correct result now
IFS functions are one of the more
complex functions to work with so you do
need some practice with this for
those that purchased the course practice
problems you have some now to go into
and actually try this out manipulate and
better understand how to work with this
with that in the next one we're going to
be jumping into my next favorite type of
functions math functions which are heavily
used in data analytics all right with
that I'll see you in the next one
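To recap the two patterns from this lesson, here is a minimal Python sketch of the salary bucketing done with IFS and the role check done with OR inside IF. The function names and label text are illustrative assumptions, not the workbook's exact wording. In the worksheet the bucketing would look something like `=IFS($H2="","No Data",$H2>=85000,"Salary > 85k",TRUE,"Salary Low")`, with the threshold typed in directly here instead of a locked cell reference.

```python
def salary_bucket(salary, threshold=85_000):
    """Mimic IFS: tests run in order and the first true one wins."""
    # Test 1: blank cell.
    if salary is None or salary == "":
        return "No Data"
    # Test 2: at or above the threshold.
    if salary >= threshold:
        return "Salary > 85k"
    # Final catch-all, like passing TRUE as the last logical test.
    return "Salary Low"


def role_desired(title, wanted=("Data Analyst", "Business Analyst")):
    """Mimic IF(OR(...)): desired when the title matches either role."""
    return "Role Desired" if title in wanted else "Not Desired"


buckets = [salary_bucket(s) for s in (140_000, None, 35_000)]
print(buckets)
print(role_desired("Business Analyst"))
```

The order of the tests matters: if the blank check came last, an empty cell would already have been caught by an earlier test and labeled as a low salary.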
now in this lesson we're going to be
using math functions and also some
statistical functions in order to
perform Eda or exploratory data analysis
on our job posting data set and for this
we're going to be focusing on the five
major functions of count sum average and
also Min and Max and we're not only
going to focus on the core versions such
as just count but also the IF and IFS
versions so they have multiple different
versions that we're going to get to now
for our analysis we're going to be
diving into the full data set of the
data science job postings which has over
30,000 different job postings and in it
we're going to be specifically diving
into data jobs that are in the United
States for data analyst and we're going
to be able to use these sort of
different functions that incorporate if
and ifs in order to fine-tune in what
we're looking for one quick note you're
not limited to using United States and data
analyst you can use the scenario that
you're in of what country you're in and
what job title you're most interested in
instead so we're going to be filling out
this table right here and we're going to
start on Row three focusing on those
count functions first now the data set
is actually much larger than these three
columns I could actually unhide between A
through K but we're not using any of
these columns in between here so I'm
just hiding them and making it
easier for us to work with for this
we're going to focus on the core
function of only count and we're going
to be looking at those that have all the
yearly salary data in it as you can see
over here that there's missing blanks in
here so we don't want to count those
that are missing anyway what I'm going
to do here is Select column M and as you
can see it selects the range of M colon M
and then from there press enter so what
we're finding is that around 22,000 jobs
out of these 30,000
have salary data and how do I know
about that 30,000 well let's actually
see instead we can
use the COUNTA function which stands for
count all and it counts the number of
cells in a range that are not empty
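The difference between the two counts can be sketched in Python like this. The column contents below are invented sample data and the header names are assumptions. In the worksheet the two calls would be along the lines of `=COUNT(M:M)` and `=COUNTA(A:A)`.

```python
def count_numbers(cells):
    """Like COUNT: only numeric cells are counted; text and blanks are skipped."""
    return sum(
        isinstance(c, (int, float)) and not isinstance(c, bool) for c in cells
    )


def count_non_empty(cells):
    """Like COUNTA: anything that is not empty is counted, header text included."""
    return sum(c is not None and c != "" for c in cells)


# Hypothetical whole columns, header in the first position.
salary_column = ["salary_year_avg", 140_000, None, 90_000, None]
title_column = ["job_title_short", "Data Analyst", "Data Engineer",
                "Data Analyst", "Data Scientist"]

with_salary = count_numbers(salary_column)
all_rows = count_non_empty(title_column)
print(with_salary, all_rows, all_rows - 1)
```

Because a whole text column includes its header, the non-empty count is one higher than the number of job rows, which is where subtracting one comes in.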
specifically I want to capture those in
the job title short column right here so
I'll do A colon A then running this we get
to see that it's around 32,000 jobs one
technical note before we continue
since we're selecting the whole columns
themselves in this case the COUNTA is
also counting that column header
so if we want to be exactly
accurate which in this case I just need
roundabout numbers
technically we would
want to go in and say subtract one to
get what the actual value is but frankly
I'm just trying to look at General
numbers right now I'm not too concerned about
being one or two off so now let's dive into
analyzing this further for my needs
I'm looking to specifically focus on the
United States first so we're going to
find those that have in the job country
here United States and for this we're
going to use the count if function and
this counts the number of cells within a
range that meets the given condition so
you provide it a range in this case we're
going to provide the range of that
column K and then the criteria itself we
want to filter for United States which I
conveniently typed above so we'll select
it right there I'm also going to lock it
by pressing F4
and then running this we get that about
25,000 jobs contain United States so now
let's evaluate those data analyst jobs
using that same thing of COUNTIF once again
we provide the range in this case we're
looking at that job title short column
and for this we want to look for data
analyst locking this cell we get about
9600 jobs for data analyst now next up
we're going to be using count ifs
specifically we're doing this because we
want to find jobs that are not only
data analyst but also
from the United States now we can't just
add these two counts together because
once we add them
up we see that's even greater than all
the jobs there that's not what we
actually want we want conditions like
here on row 16 where it's a data analyst
and United States whereas something here
on row 223 where it's a data analyst in
s that's not going to meet our condition
so we wouldn't count it so using count
ifs this counts the number of cells
specified by a given set of conditions
or criteria for this we need to specify
a range and then the criteria first
we'll focus on the range of a for job
title short and we're looking to match
that of data analyst which I'll lock by
pressing F4 then now we're moving on to
criteria range number two where for this
one we're looking at Job country now and
for that we want to look for the
criteria of United States locking this
with F4 closing this with parenthesis
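The COUNTIFS call just assembled can be sketched in Python as follows, along with why simply adding the two single-condition counts overshoots. The rows are invented sample data and `countifs` is a hypothetical helper name. In the worksheet the formula would be along the lines of `=COUNTIFS(A:A,"Data Analyst",K:K,"United States")`, with the criteria coming from locked cells rather than typed text.

```python
# (job_title_short, job_country) pairs; made-up sample rows.
rows = [
    ("Data Analyst", "United States"),
    ("Data Analyst", "Canada"),
    ("Data Engineer", "United States"),
    ("Data Analyst", "United States"),
]


def countifs(rows, title, country):
    """Count a row only when every criterion matches, like COUNTIFS."""
    return sum(1 for t, c in rows if t == title and c == country)


analysts = sum(1 for t, _ in rows if t == "Data Analyst")   # like COUNTIF on column A
us_jobs = sum(1 for _, c in rows if c == "United States")   # like COUNTIF on column K
both = countifs(rows, "Data Analyst", "United States")

print(analysts + us_jobs, len(rows), both)
```

Adding the two separate counts double counts the rows that satisfy both conditions, so the total can exceed the number of rows in the data set.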
and then running it we get around 8,000
jobs and this makes sense right because
it would be less than that 9,000 data
analyst because some of these aren't
going to be from the United States now
with how this is Flowing we could
actually make a visualization out of
this data right here so going into
insert and then recommended charts we
have here a funnel chart so I'm going to go
ahead and insert that in and this
basically shows the funnel if you will
of jobs we have we started with almost
32,000 jobs and we got towards the end
to the jobs that we actually care about
US data analysts at around 8,000 I'll
go ahead and move this off to the side
for
now all right next moving into the sum
function and the core one of
sum itself is pretty simple we're
obviously going to be
using the salary year average column for
this because we want to sum up the
numbers in it and I'll put in that
column of M and we get the sum of values
there now unlike count which has
a COUNTA or count all where we're
trying to find if there's blanks or not
that's not really applicable to sum and
average and also min or max so I'm
actually going to go ahead and just gray
these out because we're not going to
need them now moving into SUMIF which
adds the cells specified by a given
condition or criteria this one is a
little bit more complex than what we dealt
with with count because we first want to
provide the range that we're going to be
evaluating for a certain criteria
in our case the range we want to
evaluate is job country because we're
evaluating for if it contains United
States which I'll lock with F4
but we're not summing the countries
because that's a text column so we have
to provide this sum range which is
column M similarly once again we can do
that sum if looking for data analyst so
in this case we're going to be looking
at column A to evaluate if it has data
analyst in it and then from there the
sum range once again is going to be that
column M now the sum ifs similar to that
count ifs adds the cells specified by a
given set of conditions or criteria for
this one we provide the sum range first
so it gets a little bit confusing you
got to make sure that you're actually
reading the formulas in this case we're
going to use M because that's the sum
range we want to use and then we're
first going to evaluate for that job
title short that column A which we're
going to evaluate for data analyst and
then we'll evaluate for the job country
evaluating for United States closing the
parentheses and running this bam as
expected this value is less than that of
the data
analyst now moving into the last three
of average min and max which I think are
actually more valuable than that sum one
we did I'm not going to walk through
actually typing all of these in because
now you've had some familiarity with how I
did the sum and the same
pattern follows for average min and max feel free
if you want to go through and
type it out on your own to get more
experience doing it but overall I think
there are some very unique insights
from this analysis we did in it we
can see that salaries in the United
States are around 125,000 where the data
analyst is only around 93,000 and
specifically US data analyst is around
94,000 so data analysts in general have lower
salaries than the other jobs in the data
science industry as far as min and max
go we're having as low as
25,000 but we're having as high as well
at least for a data analyst up to
650,000 and apparently there's a job in
here around
$960,000 and you may be wondering what
jobs correlate to this $15,000 or
$960,000 well we're going to be diving
into that further when we get to the
lookup functions one last note on errors
before we go I find the most
common error with these functions is a
#VALUE! error and that usually occurs
like in this case where we had column A
selected initially for criteria range
number one and let's say we accidentally
selected multiple different columns for
this obviously we're not trying to
evaluate all the different columns we
only want to evaluate one column for
that criteria of whether data analyst falls
in it anyway when I run this I get a
#VALUE! error this is a common one
that I see come up time and time again
so anytime you're going through this any
of these or the practice problems
themselves make sure you're
investigating to see that you've
actually input the correct ranges to
evaluate because that's commonly what causes those
#VALUE! errors all right with that you
have some practice problems to dive into
and next we'll be diving into even more
statistical functions in order to really
dive into how deep you can go with Eda
or exploratory data analysis all right
with that I'll see you in the next
one we're now going to be taking this up
a notch shifting gears from focusing on
math functions now to statistical
functions for this we're going to be
using our job posting data set and
analyzing the salaries in this
specifically looking at common
statistical functions like median
standard deviation and even quartiles
once we have the basics we're going to
shift into an actual analysis looking at
what is the average salary of different
job titles and we'll even get a sneak
peek of visualizing it for this lesson
you can start by opening this
statistical functions workbook we're
going to be starting by filling in this
table here on the different statistical
functions we're going to be filling out
and we're still working with that data
set we did previously if you noticed
I've hidden a lot of the columns that we
won't be using for
this so we've done a few of these
different type of functions already
let's go ahead and fill these in for
count we'll be using the count function
specifically on that M column of salary
year average and like before we have
around 22,000 values for average we'll
be doing the same on that M column we
find that's around
123,000 for min we'll also run this on
the M column and that's around 15,000
for Max that's going to be around
960,000 so let's move on to our first
true statistical function we're actually
going to go into this to actually see
what it does and that's median it
Returns the median or the number in the
middle of the set of given numbers so
let's go ahead and type that out median
and in there we need to specify number
or numbers we can specify a range we're
just going to keep it simple right now
to actually show what this function is
actually doing it's selecting the middle
of the numbers so I'm just going to select
these top three numbers right now and
what I expect this function to do is
given a set of numbers
provide the middle number so it
should provide us 140,000 which is the
center number of these three but we don't
care about the center of just three
values we care about the center of
basically all of our different values so
I'm going to place the entire M column
into it and that is around
115,000 now why is this average higher
than this median well let's actually
visualize it I'm going to select this m
column and go to the insert tab going to
histograms I'm going to insert a
histogram and what this is showing is
the distribution of salaries from 15,000
all the way to 950,000 the bottom x-axis is a
little confusing to read but it's
basically a range so in this case 87,000 to
93,000 and how many salaries are
falling in between that and that's how
large the bar is next to it anyway
getting back to that original question
why is the average higher than the
median itself if you recall from the
definition the median is the middle number
in our list but our average
however is taking all the different
values and well averaging them out and as
we can see from it we have a large
amount of salaries around well
$100,000 but we do have some up here
that are getting close to a million
dollars these outliers are basically
causing us to have a higher average so
basically those values that are near
960,000 are dragging that average way
higher so that's why I prefer to use
something like the median when I can in
order to analyze these salaries because
it's not skewed by these outlier
salaries that are just something that
you're probably not going to get all
right next up is standard deviation and
for this you have two options
STDEV.P and STDEV.S the P stands
for population and the S stands for
sample this data set is around 30,000
salaries and there's way more than
30,000 data science jobs available so
that's a sample of the actual population
so we're going to be using
STDEV.S and for this we can insert a range
into it so what does this value actually
mean well if we had something like a
normal distribution which our salary
data is somewhat close to we'll
find that one standard deviation to one side of
something like the average holds in this
case right here about 34% so if we went
above and below the average by one
standard deviation around 68% which is a
heck of a lot of data is within this one
standard deviation so in our case if I
was to take the average and then
subtract this standard deviation along
with taking the average and then adding
the standard deviation around 70% of the
salaries are going to be between 75,000
and 170,000 but what if we wanted to be
more precise about finding say something
like where does 50% of the data actually
fall well we can use quartiles in this
case specifically calculating the first
and third quartile here's a graph that I
did from my python course which when you
get done with this course feel free to
check it out but anyway it looks at the
salary distribution of data analysts in the
United States as this histogram right
here very similar to what we plotted
previously in Excel but in it I'm able
to plot out where
quartile 1 starts and then where quartile 3
starts so between these
quartile 1 and quartile 3 marker lines
50% of the data falls with this red
dotted line being the median again so
let's actually get to calculating this
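Before doing this in Excel, here is a small Python sketch of the quartile idea, using the standard library rather than the plotting code from the Python course. The salary list is invented. To my understanding, `method="inclusive"` interpolates the same way Excel's inclusive quartile function does, but treat that mapping as an assumption.

```python
import statistics

# Made-up, evenly spaced salaries purely for illustration.
salaries = [50_000, 70_000, 90_000, 110_000, 130_000,
            150_000, 170_000, 190_000, 210_000]

# n=4 returns the three cut points: quartile 1, the median, quartile 3.
q1, q2, q3 = statistics.quantiles(salaries, n=4, method="inclusive")

# Roughly half of the values should sit between quartile 1 and quartile 3.
inside = [s for s in salaries if q1 <= s <= q3]
share = len(inside) / len(salaries)

print(q1, q2, q3, round(share, 2))
```

The second cut point is the median, which is why quartile 2 and the median function give the same number.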
so if we want to do something like the
quartile we're going to see that there's
a few different functions available for
this we have exclusive and inclusive
we're going to do inclusive first and
then I'll show The Exclusive after to
basically show how it's different so
this takes two arguments the first is
the array so I'll put in that range of
the M column and then lastly it takes
the quartile and we have one for the
first quartile two for the median three
for the third quartile anyway I have
these values over in the U column so
I'll just select that and use that for
this and for the second quartile we're
seeing that basically as I just said
that's also equal to the median now I'm
going to go ahead and get rid of these
min and max because we can also get those
with our quartile function and I'm going
to go ahead and drag and drop this up
and then also below so what we can see
from this with this first and third
quartile is that around 50% of the data
falls between 90,000 and
150,000 so frankly when it comes to
using quartiles like here and standard
deviation I find myself more gravitating
towards quartiles anyway what about that
other quartile function specifically
that one around exclusive values well
once again I can select the array that
we're going to use we're going to use M
and then finally the quartile itself now
notice for this one it doesn't
have a value of 0 or 4 that you
can actually put in for the min and max
it's exclusive so it excludes those
outliers basically of the min and max so
specifying that column next to it when I
actually drag this down we can see that
the min and max aren't provided in this but
it's the same values for that first
second and third quartile if you notice
here we get this #NUM! error and as we
inspected when going through this
formula zero and four were not available
to actually input into the formula so
any time you're inputting a value into a
formula that isn't valid for it
you're likely to get this #NUM! error all
right the last function to investigate
is the mode and this returns a vertical
array of the most frequently occurring
or repetitive values in an array in our
case we'll once again provide column M
and surprisingly we find that 90,000 is
one of the most repetitive values if we
go back to that histogram we plotted
earlier we can see that the largest bar
right here with a value of 1,925
occurrences occurs between 87,000 and
93,000 so this makes sense with the 90,000
being the
mode so let's get into some data
analysis Now by actually ranking the
average salaries of these different job
titles I'm going to go ahead and hide the
columns R through V now in order to rank
the salaries of the different job titles
which I have listed here for you
we need to first calculate the average
salaries of each of these job titles so
for this we're going to be using like last
time AVERAGEIF first we need to specify
the range that we're going to be
basically running that if on not
necessarily the values but the range of
the job titles next we need to provide
the criteria for this we'll provide that of
data analyst which is in W2 and then
finally the actual average range of
column M dragging this all the way down
we have our different averages for all
the job titles one note real quick in
future lessons we're going to be jumping
into using
median to evaluate these job titles because
personally I like that more but that's a
slightly more complex problem so we're
going to stick with simple for now anyway
with these averages we can now
actually rank them using a rank function which returns the
rank of a number in a list of numbers its
size relative to other values in the
list so first we need to put in the
number that we want to rank in this case
we want to do that of data analyst
and then from there they have the ref or
the reference array in this case we're
going to provide it right here from X2
all the way down to X11 now I can change
this from descending to ascending but
I'm going to keep it how it is now I'm
going to drag and drop all the rest of
these and we have a little bit of an
issue because we have repeating numbers right
here it's obviously because I didn't
lock my cells appropriately so selecting
this range that I want to actually lock
and pressing F4 go ahead and lock that
and then we'll drag and drop this again
hopefully this works this time and boom
we have all these ranked from highest to
lowest we can see business analysts are
some of the lowest data analysts not far
behind and senior data scientist has the
highest I'm going to take this one step
further I'm going to highlight
everything from job title down to the
bottom salary for software engineers
I'm going to go into insert in here and go
to recommended charts and basically the
first one that pops up this clustered
bar chart I'm going to insert in and I
can just change the title up here by
double clicking in here and I'll put
average salary of data science jobs and
there we have some data analysis
actually visualizing these one minor touch
to this I really don't like how these
are unordered right now so I could
actually go up here select these three
titles right here and then under the
Home tab select I want to actually
filter it and then order this rank from
well we'll say largest to smallest one
note you may not have been able to see
it but it actually rearranged the data
inside of our data set that's not a big
deal for me I don't care too much but
that is something that will be affected
whenever you do this anyway with this we
can see things like senior roles are
getting paid the most and things like
analysts are sometimes getting paid the
least compared to these all right you
now have some practice problems to go
into and thus practice your skills with
these statistical functions after that
we're going to be jumping in the next
lesson into arrays which is a super
powerful feature sort of new to excel in
the past few years all right with that
I'll see you in the next
one we're going to be now shifting gears
and jumping into a more advanced topic
of arrays and with arrays what you can
do is by typing a formula in a single cell
we can use this to fill in cells below
it or cells to the side of it all with
one single formula so we're going to be
slowly working up through an easy then a
medium and then a hard problem on how
to use these first up with the easy one
we're going to go through and basically
identify all the unique job titles and
then go through and actually sort it
alphabetically using arrays next we're
going to move into our medium problem
of calculating the median salary if you
recall back to our last lesson we were
calculating the average salary based on
a job title well we can use arrays to
calculate the median and then finally
one of the hardest problems we're
going to get into is actually looking at
based on the month how many different
jobs were submitted during that month
and for this we'll be using the SUMPRODUCT
formula in combination with
other ones using arrays for this we'll be
using the arrays formula Excel
workbook now before we jump into those
problems we need to First understand
that there's actually two different
types of arrays we're going to start
with the first one of modern dynamic
arrays which we've seen before and with
this what we can do is using a formula
we can specify a range to identify and
then whenever we press enter B2 to B5
it's going to actually fill in with all
these we can see that it's modern or
dynamic because it has this Shadow
around the edge if I select any of the
other ones and not the core one where
this one's actually highlighted then in the
other ones the formula is grayed out taking
this a step further with array
multiplication we can actually go in and
multiply column one of A2 to A5
times B2 to B5 anyway in
this sequence you can see that it goes
down 1 * 1 is 1 whereas 4 * 4 is well 16
anyway that's modern dynamic arrays
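The element-by-element multiplication described above can be sketched in Python like this. The values 1 through 4 in both columns are an assumption based on the results mentioned (1 times 1 and 4 times 4). In the worksheet the single formula would be something like `=A2:A5*B2:B5`.

```python
column_a = [1, 2, 3, 4]  # stands in for A2:A5
column_b = [1, 2, 3, 4]  # stands in for B2:B5

# One expression produces every result, pairing the rows up position by position,
# the same way one dynamic array formula spills into the cells below it.
products = [a * b for a, b in zip(column_a, column_b)]

print(products)  # [1, 4, 9, 16]
```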
now for classical arrays let's say we want to do
the same thing in this case well we're
going to have to go about it a little
bit differently we need to select all
the cells that we want to fill in first
which is a very key concept to get right
then from there we can start
entering our formula so I put in equal
and in this case we want to do the same
array multiplication I'll take A2 to A5
times B2 to B5 now whenever I am
done with this and I want to actually
execute this I don't just press enter I
have to press control shift enter and
then it fills in the array notice it
doesn't have that shadow around the edges
like the dynamic one this one does not do
that and all of the different formulas
are now filled in below this and you'll
notice that there are curly brackets
around this this was used prior to
around
2020 and so you may come into contact
with Excel spreadsheets that have this
and if you don't know about it if you
come into here and say you want to like
mess with this formula and you press
enter you're going to get an error
message but now let's say we have some
additional values in it we'll say we'll
add five to the bottom of each of these
if I wanted to adjust this array and I
came in here and then changed this to six
for both the bottom and the top and
press control shift enter it's only
going to adjust the ones that were
previously selected so now if I want to
include this bottom row right here for
modern dynamic arrays it's pretty easy I
can just come in here and adjust this to
six and this is done however for
classical arrays or classic arrays not
classical I have to actually select all
these different cells and then go in and
actually enter the formula that I want
to enter if I try and press enter it's
going to give me an error message and I
realize okay I have to press control
shift enter and it'll actually fill in
anyway the main point of this is
classical arrays are a mess we're going
to be focusing on modern or dynamic
arrays for the remainder of this course
but you need to be aware of classical
arrays in case you encounter them in the
wild
so jumping into our data analysis we're
going to be working with the data set
that we've been focusing on before and
I've hidden any columns that I don't
feel are relevant for our future
analysis that we're going to do anyway
the first thing we're going to do is find
the unique job titles and for this we
can use the unique function and this
Returns the unique values from a range
or array so the first thing we need to
do is actually put in the array itself I
don't want to actually select this
column A because I don't want this job
title short to appear so I'm going to
select A2 and then press control shift
down to select all the way to the bottom
I'm going to close this parenthesis and we
have all the different unique titles in
there now I want to get the sorted job
titles out of this so as you'd guess we're
going to use the sort function and for
this all we really need to do is specify
the array now if you notice from this
one whenever I went ahead and selected
it it specifies R2 pound or R2# and that
basically says that hey there's an array
formula inside of R2 and we
want to extract all the contents of that
using R2 pound and so that's going to
work to be able to provide us all those
values and then it's going to sort it in
this case we have it sorted in
alphabetical order one thing I haven't
called out both times is these are once
again dynamic or modern arrays you can
see that gray box around each of these
but just to show this also works by
specifying R2 to R11 it's going to
provide us the exact same results but I
really like the shorthand nomenclature
of the R2 hashtag
sign now we're going to get into
calculating the median salary and if you
recall back to our last lesson on
statistical functions we went through
and calculated the average salary for
each of these job titles using an
averageif function but as we discussed
last time when comparing something like
the average to the median the average in
this data set is slightly higher due to
those outliers of those high
salaries around 960,000 so we want to
use median so what are we going to be
eventually calculating well that's
this table right here where we sorted
our job titles themselves
and then we go into actually calculating
the median salary based on these
different job titles from our data set
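For anyone who thinks better in code, here's a rough Python sketch (not Excel) of what the unique and sort steps are doing to build that list of job titles. The titles below are made up for illustration, and Python's built-ins only approximate how the unique and sort functions behave.

```python
# Hypothetical job titles standing in for column A (header row left out).
job_titles = [
    "Data Engineer", "Data Analyst", "Data Scientist",
    "Data Analyst", "Business Analyst", "Data Engineer",
]

# UNIQUE: keep the first occurrence of each title, in order of appearance.
unique_titles = list(dict.fromkeys(job_titles))

# SORT with only the array given: ascending alphabetical order.
sorted_titles = sorted(unique_titles)

print(unique_titles)
print(sorted_titles)
```

Because the sorted list is derived from the unique list, changing the source titles changes both results, which is roughly the idea behind pointing the sort at the spilled range.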
now there's a pretty complex formula
going into here so because of this we're
actually going to break it down step by
step going through each column and
explaining how this actual process works
in order for us to get to this final
value for this we're going to be doing
it for data analyst only as we can see
the final value we're going to get to is
90,000 which is over here in our final
results so I'm going to go ahead
and delete this to actually start with
now we need to look for two separate
conditions the first one we need to
check is do the job titles here actually
match up to this value here of data
analyst and this provides boolean values
back where we get to this value down
here for true as expected in row 16 we
have data analyst now if we scroll down
further we can see that our next data
analyst job doesn't have a salary for it
these types of things will throw off our
final median function that we're going
to actually be calculating and so we
need to basically filter them out as
well so with the salary data set
selected I'm going to then go through
and check that it's not equal to a
blank value and as expected we're
getting false values for these blank
ones now similar to what we saw in the
intro to arrays where we were
multiplying different arrays together
we're going to do the same thing here
with these boolean values for this I'm
taking that formula and wrapping it in
parentheses it needs to be in
parentheses in order to execute properly
for the first condition that it's data
analyst and
then the second condition that the
salary can't be blank whenever we
multiply these two boolean values
together we get returned back either a
zero or one and the only way we get a
one back is if both these values are
true which is the condition we want to
meet now for zero or one values we can
actually see if we did an if statement
here if we did a logical test of zero
what is it going to return whether true
or false so zero returns false and for
one we'd expect it to return true anyway we
don't want to necessarily return true in
this case we want to return the salary
that corresponds to that Row in the data
set so I'm going to go ahead and delete
this so for this we're going to start
with that if function itself then I want
to place all the different contents that
we saw in that previous V column now we
want to return the salary which are
these contents right here so that'll be our
value if true and then if false we just
want it to be false which we can
just leave blank so now scrolling down
we can see that we have nothing but
those values for data analyst scroll
over just to confirm row 129 yep data
analyst all right for the last step we need
to go ahead and put inside of our median
formula all those contents that we had
before that entire if statement itself
to evaluate so with that array it's
going to basically find all
those salaries for data analyst and
return back the median salary now this
also works for other functions so let's
say we wanted to use the mode and we want to
use a mode if condition they don't have
this available so we could just plug
this inside of mode and then running
this we can see that well the most
common value for data analyst is
apparently also the median of 90,000 so
going back to our data sheet let's
actually go through and step by step
calculate it for each of these different
sorted unique job titles that we did
previously and we're going to be
building this step by step how I'd
normally build a formula so the first thing
we're going to look at is the job titles
themselves do they match that business
analyst so selecting A2 and then
control shift down to select all the
contents of the column we want to see if
that's equal to this business analyst
cell right here and now remember we're
going to be dragging this down to do an
autofill so we need to be particular
about how we lock these cells
specifically we do need to lock these
values right here and just for safe
measure I'm going to lock the column of
this one okay pressing enter all right
we have our array back looking for
business analyst and we can see that
it's working by what we see down here in
row 84 so let's actually do that array
multiplication by now filtering out
salaries that are
blank so we're going to put another set
of parentheses next to it we'll put in
our salary data and that it's not equal
to blank running this we confirm that
the first value of business analyst that
has a salary has a one now we need to
wrap this all inside of an if to
basically return instead of that one we
want it to return the salary itself so
for the value of true I'm going to put
in the selection of the salary yearly
running this we confirm this is again
correct looking at row 180 almost done
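If it helps to see the whole pattern in one place, here's a minimal Python sketch (not Excel) of the steps so far: two true/false checks multiplied together, an if that keeps the salary where the product is one, and then the median of what's left. The rows and the median_if name are made up for illustration. Excel's median skips the false entries in the array, which this sketch imitates by simply dropping them.

```python
from statistics import median

# Hypothetical rows: (job title, yearly salary or None when the cell is blank).
rows = [
    ("Data Analyst", 90000),
    ("Data Engineer", 120000),
    ("Data Analyst", None),       # blank salary, should be excluded
    ("Data Analyst", 75000),
    ("Data Scientist", 150000),
    ("Data Analyst", 110000),
]

def median_if(rows, title):
    # Condition 1: the title matches. Condition 2: the salary is not blank.
    title_match = [t == title for t, _ in rows]
    has_salary = [s is not None for _, s in rows]
    # Multiplying the booleans gives 1 only when both conditions are True.
    mask = [a * b for a, b in zip(title_match, has_salary)]
    # The IF step: keep the salary where the mask is 1, drop everything else.
    kept = [salary for (_, salary), m in zip(rows, mask) if m]
    return median(kept)

analyst_median = median_if(rows, "Data Analyst")
print(analyst_median)  # 90000
```

Swapping median for another aggregate (statistics.mode, for example) is the same idea as plugging the if into mode in the spreadsheet.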
just now need to wrap this all inside of
a median function and Bam
85,000 and hopefully we actually locked
all the cells properly dragging it Down
Bam looks like we got all our things and
we slightly messed up our formatting
here so I'm going to go ahead and put a
thick outside border on again to make
that right again all right so that's how
you basically bring that conditional
countif or averageif style
capability to other
functions in Excel now moving into probably the
most complex example that we're going to
be using not only in this lesson but
probably in the entire course so if you
can get through this you're going to be good
to go for the rest of the course anyway
what we're trying to look at here is the
count of job postings based on the month
that it was posted in and we're going to
be using the sum product function for
this now sum product is not anything
that you should be afraid of basically
before whenever we were
doing the intro and we were talking
about array multiplication we went
through line by line and
we had our values of 1 4 9 16 and 25
line by line well if we were to do the
sum
product of the values in column A along
with the values in column B we're going
to get 55 which when we look at the sum
of these values here we can see that it
is 55 so it's a sum of the product of
the arrays so getting back to our
example that we're going to be solving
we're trying to aggregate it by these
names of these months if we actually
scroll over to the data set itself the
job posted date is in a date time format
so similar to the last example I'm going
to be walking you through column by
column by column on how we get to this
final value that we're going to be
eventually putting into our table here
to thus calculate these values for the
counts per month so we go ahead and
clear these cells to start and we're
going to start first by we want to
extract out the month from this job
posted date column so for this we can
use the text function which is sort
of jumping ahead because we'll be doing
text functions in upcoming lessons but
it's a good little sneak peek anyway we
can plug in here something like a date
time value and then from there we want
to specify the format text
well I know that if we do three m's it's
going to provide me the shorthand month
of this additionally if I do four m's it's
going to provide me the longhand month
of this and there's a host of different
format codes that you can provide shown by
this table here when I'm looking it up
in something like perplexity that says
that hey if you provide certain things
like if I provided a double y it's going
to provide the two-digit year and so on
for other values you can also look this up in
something like chatgpt anyway getting back
to this example itself I want to
actually autofill this all the way
through but it's not next to any other
columns that would let me autofill all
the way down and I don't want to sit
here and drag it all the way so what I
can do is select the column itself and
then when it has these basically four
arrows I can then drag it where I want
I'm going to drag it right next to here
and then now actually autofill it all
the way down now that I have it complete
I'll just select this column again make
sure I have those four arrows again and drag
it back to the column it needs to be in now
seeing what I did here you're probably
like Luke can't you use something like a
countif in order to calculate the
months now using this and you'd be
correct with that remember back
for countifs we can provide a criteria
range in this case we're going to
provide it column V and then for the
criteria itself we'll provide the actual
month and then actually dragging and
dropping this all the way down once
again my formatting got messed up so I'll
put that thick outside border back on
there anyway these values here are the
same as what we're going to get finally
and so you really could stop this lesson
right here and if you want to do this by
creating a new column and then just
using countifs you can do that but this
is a lesson on arrays so we're going to
get more complex with this and see how
to use arrays in order to actually
calculate this without having to create
these extra columns so I'm going to go
ahead and hide this cuz we're not going
to use it so before we can actually
sum anything up we need to get an array of
all the values that are equal to
January so we'll start by creating
that text function it's going to be
slightly different from before cuz we're
going to be building it on an array we
want to actually select all the values
from H2 all the way down to the bottom
we want to then go ahead and lock it we
want it to be evaluating for that long
month name so four lowercase m's and
then I want to check if it's equal to in
our case we're looking for January we'll
look up here at this U2 or rather U1 I got a
typo up there so I'll update that to U1 anyway
we now have okay that this value is true
right here and we can tell from row 11
that this is in January it is true so
it's working out just fine so now if I
tried to actually run a sumproduct which
is what we're finally trying to do on
all the contents of this array itself
we'll do W2 hashtag we're going to
get back zero because this isn't in the
format that we want we actually need to
convert this unfortunately although
true and false are one and zero on the
back end the functions themselves can't
actually calculate with them so we can do this
by basically converting it and the first
thing we can do actually is just we'll
put one negative sign and then I'll put
in that W2 hashtag and what this does is
it negates the boolean values so
basically true which is normally a one
it negates it and makes it negative one
and zero well negative zero is still zero anyway we
need to actually apply two negative
signs cuz we don't want it to be negative
one we want it to be positive one so
doing this one more time we now have
positive ones in there so now we are
using sumproduct because sumproduct
I feel is better with arrays but
in this case where it's a
single array we could use just
sum itself I did want to show that
and we get that value of 3,102 which
matches what I expect as the value
but we're going to use sumproduct
because as you'll find out in future
lessons we're actually going to be
modifying even further what's inside
of here and so we need this sumproduct
in order to do those anyway we get the
same value of
3,102 so going back to our data tab let's
actually calculate this fully for all of
these different values walking through
it step by step as we did
previously we're going to start with our
text function and we want to look at
that job posted date column I'm going to go
ahead and lock all those cells it's very
important for this since we're going to be
dragging and dropping that down and remember for the
format text on this we want it to be
four lowercase m's and with this we're
checking whether it's equal to this
value here of V2 which is January and
I'm going to go ahead and actually lock
just that column pressing enter to make
sure it goes correctly yep we got a true
value here for our row 15 value next
thing we want to do is that double
negation for which we need to actually wrap
this whole formula itself in
parentheses in order to get our zero and
one values and then finally we're going
to wrap this all once again in sumproduct
putting that closing parenthesis
on there pressing enter we get 3,102 and
then doing autofill all the way down we
have all our values once again the format is
messed up so I'm going to put that thick
outside border on now the other reason why
we're using sumproduct in this case is
because in older versions of excel
before we had these modern dynamic
arrays sum is not going to be able
to work over arrays without control shift
enter and you actually
have to use sumproduct so this allows
us also to have a safe way to
calculate using arrays and then give it
to people that may have
older versions of excel all right it's
your turn now to jump into some practice
problems to get more familiar with
working with arrays inside of formulas
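Before the practice problems, here's a small Python sketch (not Excel) of the two ideas from this lesson's last example: sumproduct as "multiply line by line, then add", and the double negative that turns true/false into one/zero so the matches can be counted. The posting dates, the month list, and the function names are assumptions made up for illustration.

```python
from datetime import datetime

# Long month names, standing in for the "mmmm" format code.
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

def sumproduct(*arrays):
    """Multiply the arrays element by element, then add up the products."""
    total = 0
    for values in zip(*arrays):
        product = 1
        for value in values:
            product *= value
        total += product
    return total

def count_postings_in_month(posted_dates, month_name):
    # One True/False per row: does the long month name match?
    is_month = [MONTH_NAMES[d.month - 1] == month_name for d in posted_dates]
    # The double negative turns True/False into 1/0 so they can be added up.
    ones_and_zeros = [-(-flag) for flag in is_month]
    return sumproduct(ones_and_zeros)

squares_total = sumproduct([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(squares_total)  # 1 + 4 + 9 + 16 + 25 = 55

# Hypothetical posting dates standing in for the job posted date column.
posted = [
    datetime(2023, 1, 14, 13, 18),
    datetime(2023, 3, 2, 9, 0),
    datetime(2023, 1, 30, 17, 45),
    datetime(2023, 12, 25, 8, 30),
]
january_count = count_postings_in_month(posted, "January")
print(january_count)  # 2
```

With a single array, sumproduct is just a sum, which is why plain sum gave the same answer in the lesson. Passing a second array of ones and zeros is how a further condition would be multiplied in.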
in the next lesson we're going to be
getting into probably I think one of the
most fun types of functions lookup
functions like vlookup and xlookup and
things like that which are super helpful
for data analysis all right with that
I'll see you in the next
one lookup functions are some of the
I'd say most fun functions whenever
you're learning to be a freak in the
sheets specifically we're going to be
focusing on three different lookup
functions vlookup hlookup and xlookup
the v in vlookup stands for vertical the h in
hlookup stands for horizontal and the x in
xlookup well they just wanted to be
different in order to learn about these
functions we're going to be performing
some data analysis and if you recall
back from our math and statistical
functions lessons we found out what the
median Min and Max salaries were but for
the things like the Min and Max what
were those different job postings that
correlated to that well based on the
structure of our data set we can use the
vlookup and also X lookup functions in
order to find this out now because of
the structure of our data we're going to
have to do something different in order
to implement hlookup and for this
we're going to extract
horizontal type data and
basically transpose it into a
vertical format using hlookup but if
there's anything you remember from this
lesson it's xlookup this one is
the most dynamic and flexible in how it
can be used and we're going to be using
it in a final example in order
to bucket our salary data set allowing
us to categorize it into different
ranges and whether it has data or not
all using xlookup for this we can start
using the lookup functions workbook we
have two main tabs in this data and
data_2 data is where we're going to
start first for this section on
vlookups so for this we're going to be
using that job posting data set I've
hidden any unnecessary columns and we're
going to be filling in this table right
here so what I'm trying to do with this
is fill in based on this min max and
median as you can see from the formulas
where we
actually calculate these from the salary
year average column we want to then
extract out based on these values the
company name the job title associated with
it and then the country associated with
it so we're going to start with vlookup
first and V lookup looks in a vertical
type format specifically it says it
looks for a value in the leftmost column
of a table and then returns a value in
the same Row from a column you specify
so for the first value of this we want
to provide the lookup value in this case
we want to look up 15,000 from that
salary year average column then from
there we need to provide the table array
now remember for this it needs to be the
leftmost column of the table and we want
to get columns M through O I'm going to
start in column O because if we start at M
and try to go down it's going to mess up
cuz there are blanks in it so I'm just
going to do control shift over and then
control shift down to select all the
data and then change this column A to M
instead the next thing we need to
specify is the column index number and
column M would be
the first column so M N O we want
the third column you can imagine if
we have a buttload of columns what kind
of problems we're going to run into so
we'll get to that when we get to it okay
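To picture the three arguments so far, here's a rough Python sketch (not Excel) of a vertical lookup: a lookup value, a table whose leftmost column gets searched, and a one-based column index that says which column to return. It assumes an exact match, and the row layout and the vlookup_exact name are made up for illustration. The company names and salaries echo the ones that come up in this lesson.

```python
def vlookup_exact(lookup_value, table, col_index):
    """Search the leftmost column; return the value col_index columns across."""
    for row in table:
        if row[0] == lookup_value:
            # col_index is 1-based, so 1 means the lookup column itself.
            return row[col_index - 1]
    return "#N/A"

# Hypothetical slice of columns M, N and O: (salary, placeholder, company).
table = [
    (None,   None, "Hypothetical Co"),            # blank salary
    (115000, None, "Volt Technical Resources"),
    (15000,  None, "Net2Source Inc"),
    (115000, None, "Northrop Grumman"),
]

lowest_paid_company = vlookup_exact(15000, table, 3)
print(lowest_paid_company)  # Net2Source Inc
```

Counting columns by hand is the fragile part: if a column is inserted into the table, the index of three quietly points at the wrong one. The loop also stops at the first row that matches, so only one company comes back even when a salary appears twice.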
now they have a range lookup we're going
to leave that blank for the time being
we're just going to execute this formula
as is and for this we're getting an NA
error if we actually click into it value
not available error and why is that well
if we actually go back to that vlookup
function in the definition that it
provides for it the last statement is by
default the table must be sorted in
ascending order right now our salary
values are not sorted so it's having
issues going through it and actually
finding that 15,000 because it's
unsorted anyway we're not going to
actually sort that table that's going to
be too much work we can actually now go
into that fourth parameter of range
lookup and instead of doing an
approximate match which was the default
we're going to do an exact match by
providing false and in that case we find
that Net2Source Inc is the company
name of the job with 15,000 now I want
to autofill for this but we need to
actually lock some cells real quick so
I'm going to lock this right now by
pressing F4 then from there we'll drag
it down now one thing to note on vlookup
xlookup and also hlookup this is just
going to return the first match so in
this case of this 115,000 it says it's
Volt Technical Resources however if I do a
control f for
115,000 we'll find that yes it's at row
19 for Volt Technical Resources the
first one that it provides but it's also in
row 42 with Northrop Grumman so it's only
providing that first match now what
happens if we wanted to next get things
like the job title or the country itself
well if I were to put in the first two
values the lookup value and then the
table array what would we put for the
column index number remember in vlookup
the leftmost column of the table itself
is what we're going to be looking up
however columns A and K are even
further left of that table so unfortunately
we can't use vlookup for this but we will
be using xlookup for this that's why
I'm going to recommend it over vlookup
but I think you should start with the
basics
first however before we get into that
we're going to now shift gears and cover
hlookup in order to look up values in
a horizontally oriented table in this case
this is horizontally oriented because we
have things like the months across the
horizon if you will and then
down the first column we
have the different job titles
data analyst senior data analyst and
so on now the data in this table is
calculated using the data from the data
tab in order to get the counts by month
and you've previously seen this in the
last lesson where we went through that hard
example of sumproduct here we go
through and do some array multiplication
in order to find out the different
counts for the job titles based on a
month anyway for this hlookup we want
to look up based on a month what is the
associated job count for a specific job
type
so let's say we want to just look at
that may column well we can put in h
lookup and this looks for a value in the
top row of a table or array of values
and Returns the value in the same column
from a row you specify so only selects
from that top row for this we provide a
lookup value in this case let's say
we're looking up January then from there
we provide the table array itself we can
go and just select this data now
technically I could select all this data
because it's just going to go to the
column associated with
this so whether we include column A doesn't really
matter then from there we want the row
index number what value do we want from
this January do we want data analyst
senior data analyst senior data
scientist so we can just count down to what
we want we'll start with data analyst
first so we'll put in two since that's the second
row in this so let's try to enter this
and for this we get 753 which if you go
back to this we're doing Jan A1 through
M7 and then the second row so why are we
getting
753 well once again this has to do with
the range lookup we're doing an
approximate match similar to vlookup it
expects that these values for that top
row are in this case in alphabetical
order in order to perform that
approximate match
these aren't in alphabetical order
they're actually in chronological order
so instead we need to specify false now
running it we get the correct value of
982 now we can also apply this to a
situation where maybe we want to
transpose these values into this new
table that we have here on month and
count and then up here I'm going to also
just specify what we're looking at we're
going to look at data analyst now with
our H lookup we're going to be providing
that lookup value the table array and
then the row index number say if we
wanted to go in here instead of data
analysts we wanted to look at data
engineer instead how can we get this to
update well we can use another function
for this specifically we can use the
match function for this and this returns
the relative position of an item in an
array that matches a specified value in
a specified order so in this case I want
to look up data engineer in the array
from A2 to A7 it's providing me a one
cuz it's doing the approximate
match again and once again they're not in
alphabetical order so I have to specify exact
match using zero okay and now I get data
engineer in the fifth place I'm also
going to move this column over and make
this a little bit bigger going back into
that H lookup that we're going to use
for this we're going to provide that
lookup value which we want to actually
lock by pressing F4 then we're going to
provide the table once again as I said you
can select that A column if you want or not
we're going to lock all these values as
well because we'll be dragging it down
from there we'll be providing the row
index number which we've calculated
right here in P based on that match
that we're performing we want to lock this
as well and as far as the range lookup
well we want to do exact match running
this we get an NA error because I was
silly and the lookup value we actually
want is the month of January
not data engineer I was
confusing this with the match sorry about
that so we'll put in O3 for this instead
and then running it and now we're
getting back 236 which is not the data
engineer value we're one off and this
has to do with how we did our match up
here which specified A2 to A7 basically
we're counting down from the second one
where in h lookup we included all the
way up to that first row so this is just
a simple fix by changing this one up
here to A1 and now our values update
appropriately and then I can go ahead
and just drag and drop this all the way
down and once again we're going to get into
some troubleshooting because these are all
the same value and that's because I
fully locked this actual month cell
and instead I wanted to press F4 until I
only lock the column O now finally
getting to the final answer we have it
and we can confirm this that data
engineer should have 396 for the
December value that's correct and now we
can do things like this where I can go
in and say hey instead I want to look at
data analyst and it will update for this
instead now once again with hlookup we
run into issues like vlookup if the
values we want are above this top row I can't really
think of any applications where
that's applicable but it is a
limitation anyway this is why we're
going to be shifting to the next
topic and that is using xlookup to now
based on these salaries that we were
previously looking up
identify a job title and a country
associated with it so what is the
definition of xlookup well this searches a
range or an array for a match and
returns the corresponding item from a
second range or array by default an
exact match is used that's pretty
awesome considering all the issues we ran
into with hlookup and vlookup anyway
instead of using a single table we're
going to be using multiple ranges for
this let's get into it first we're going
to provide the lookup value which in
this case is 15,000 and then we want to
provide the lookup array so we need to
select this entire M column here for
what we want to actually look up but we
have these blanks in here so I'm going
to just do a trick of selecting the O
column selecting all the way down and
then from here I'm going to just go in
and actually change these values to M
instead now we want this to remain the
same so I'm going to press F4 to
actually lock this now that was our
lookup array now we want to get into
the return array or where we want to
actually pull the result from and that's to the
left of this in this job title short
column these arrays have to line up in
where you're selecting
them so in this case I selected over
here from the second row so I need to do the
same for the job title then from there
pressing control shift down I select all
of them once again I'm going to lock all
of these by pressing F4 now let's close
the parentheses and go ahead and execute
it we can see that data engineer is
the lowest paid job with this 15,000
now we can also add in this if not found
parameter in case it can't find a value
you can put not found but in our case
we calculated this min max and
median from our data set so technically
this isn't really necessary anyway let's
see what the other job titles are for
these Max and median looks like it's
data scientist and then the data
engineer for the median which is that
first one that appears right over here
in row 19
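Here's a minimal Python sketch (not Excel) of what that lookup is doing: search one array for an exact match, then hand back the item at the same position in a second array, with an optional fallback when nothing matches. The rows and the xlookup function below are made up for illustration. The values echo the lesson's results, but the row order is an assumption.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found="#N/A"):
    """Exact match by default; the first matching position wins."""
    for position, value in enumerate(lookup_array):
        if value == lookup_value:
            return return_array[position]
    return if_not_found

# Hypothetical rows. The two arrays must cover the same rows so positions line up.
job_titles = ["Data Analyst", "Data Engineer", "Data Scientist", "Data Engineer"]
salaries   = [None,           115000,          960000,           15000]

min_title = xlookup(15000, salaries, job_titles)
max_title = xlookup(960000, salaries, job_titles)
missing   = xlookup(1, salaries, job_titles, "not found")

print(min_title)  # Data Engineer
print(max_title)  # Data Scientist
print(missing)    # not found
```

Because the lookup array and the return array are passed separately, the return column can sit to the left of the lookup column, which is the case a leftmost-column lookup can't handle.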
now doing the same for the country I'm
going to go ahead and just copy and
paste that formula in that we had from
the other cell and I'm going to just
adjust this now to use column K instead
of column A for the actual return array
okay with that updated press enter and
we can see Brazil has the lowest one and
what is the highest one United States
and also the median United
States all right we're going to crank
this up a notch and now we're going to
jump into actually bucketing our salary
using X lookup specifically I want to
use this table that I've created in
order to properly categorize different
values based on this so in this case we
have this value of 140,000 it's going to
fall into our bucket of
125,000 to 200,000 there's no data in this one
so I want to say no data this one's
greater than 200,000 so I want to say
greater than 200,000 so for this we're
going to be creating a new column column
Q and we're going to call it salary year
bucket I'm going to go ahead also and
hide this column O for the time being we
don't really need it for this now
technically you already have the
requisite knowledge in order to bucket
it I could put in a nested IF function
similar to below and it has 1 2 3 four
five if you will nested ifs to go
through and basically check each of the
different values as it's going through
in order to bucket it appropriately
in this case it correctly categorizes it
and then if I wanted to I can drag and
drop it all the way down but now this
sheet is filled with all of these nested
if statements this is really going to
slow your spreadsheet down so I don't
recommend doing this also building
something like this you've now
hardcoded your values into it and
what if you want to change this later
you'd have to update all your formulas
it's a mess I don't recommend doing it so
I'm going to select all this control
shift down and then just delete it all
instead we're going to be using X lookup
for this specifically we need to provide
the lookup value which is going to be
the same one that we did before that M2
and then we want to provide the lookup
array now I conveniently made this table
here that's providing values at if
you will the higher end of the bucket so
we're not going to necessarily do an
exact match for this we'll get to that
in a second anyway now we want to look
at what we want to return the return
array which is on the left side of this
table those are the values I actually want
to return back in that column value if
not found is not necessarily applicable
so now getting into how we're actually
going to match based on these salary
buckets based on these values
highlighted in this T column right here
well we need to do not exact match we
need to do exact match or next larger
item and this is the value of one
basically in this case of this 128,850
it's going to look for initially an
exact match of 128,850 and it's going
to see that nothing's there so then it's
going to look for the next larger item
which is that 200,000 so therefore it's
going to return as we're going to find
out the
125,000 to
200,000 now I can try to drag and drop
this down but I'm going to run into
errors because I didn't lock my formulas
correctly so I need to go back in lock
that s column with F4 and lock that t
column with F4 and then I'm just going
to autofill all the way down and now we
have all of our different job postings
bucketed into these different salaries
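As a rough Python sketch (not Excel) of that "exact match or next larger item" idea: each bucket is stored by its upper bound, and a salary lands in the bucket with the smallest bound that is still greater than or equal to it. The bounds, the labels, the very large top bound, and the explicit blank check are all assumptions made up for illustration, since the lesson's table isn't spelled out here.

```python
# Hypothetical bucket table: upper bound of each bucket and its label.
upper_bounds = [75000, 125000, 200000, 10_000_000]
labels = ["< 75K", "75K - 125K", "125K - 200K", "> 200K"]

def bucket_salary(salary, bounds=upper_bounds, names=labels):
    # Assumption: blank salaries are labelled directly instead of looked up.
    if salary is None:
        return "No Data"
    # Keep every bucket whose upper bound is an exact match or larger...
    candidates = [(bound, name) for bound, name in zip(bounds, names) if bound >= salary]
    if not candidates:
        return "#N/A"
    # ...and take the smallest of those: exact match, else the next larger item.
    return min(candidates, key=lambda pair: pair[0])[1]

print(bucket_salary(140000))  # 125K - 200K
print(bucket_salary(250000))  # > 200K
print(bucket_salary(None))    # No Data
```

The thresholds live in one small table rather than inside every formula, so moving a boundary means editing one value and its label.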
so if instead I wanted to go through and
actually change this to be
150k and then match this to
150,000 I could do it and it would update
appropriately I also need to update this
column as well but now it all updates
and it's in one single location so this
is really the power of using that X
lookup over the ifs in order to perform
this type of bucketing all right you've now
got some practice problems to go through
and get more familiar with using these
different lookup functions as I said
before make sure you're prioritizing
understanding that xlookup it's the
most powerful but the one caveat to
xlookup is that it was introduced around
2020 so anybody using once again an
older version of excel from
before then is going to have
compatibility issues using this so
that's why you need to also be familiar
with vlookup and also hlookup you're
going to encounter them in the wild all
right with that I'll see you in the next
one where we're jumping into text
functions now I know this is a course on
data analysis but text functions are
actually imperative for performing
analysis on Text data and for this we're
going to be working in this lesson on a
data set of job applicants and we're
going to take it a step further using
text functions in order to analyze
specifically for our final analysis we
have information on the different skills
that each one of these job applicants
knows so we're going to be able to
perform an analysis to see what are the
most common skills from these applicants
but before we get to that final analysis
we first need to beef up our knowledge
we're going to focus on three main areas
the first one is text combination we're
going to be working to combine different
columns into a single column from there
we'll move into the second one of
text extraction being able to out of a
single column extract multiple values
and finally in the third one performing
some sort of text search in order to
also extract out in this case we're
going to be extracting out the state
name from an address that contains a
city state and zip code so for this you
can start by opening up the text
functions workbook and in the data tab
we have this data set which you haven't
seen before it's only about 20 rows and
includes a list of job applicants now
we're not using the full data science
job posting data set because a lot of
the examples we're going to do in this
would basically bog down your
Excel spreadsheet so especially with how
we're going to be implementing these
it's really meant to be used for smaller
data sets you may be like Luke what happens
if I have a bigger data set and need to clean
up the text well that's where power
query comes in which we'll be covering
in the advanced chapters so stick around
for that
anyway moving into text combination we
want to Target these columns right here
f and g we want to combine them into one
line to have a single address so I'm
going to go ahead and hide this column H
for the time being we're going to be
putting that full address in column J
and this one's pretty simple all we're
going to do is text join which
concatenates a list or range of text
strings using a delimiter the first
thing I need to specify is the delimiter
how am I going to separate that
street and the city state all I want to
do is a space so I'll do that enclosing
it in double quotes next is ignore empty
basically if there was an empty cell in
here it would just ignore this and it's
not going to input multiple different
spaces between it just ignore it so we
want to in that case we're just going to
put in true the final one is text and we
can specify you could do text one and then
comma and then text two um that's really
verbose I don't really like doing that
instead I'm just going to select the
range of F2 to G2 now we can see that
the address is fully concatenated and we
can drag it on down and it works for all
of
it now the opposite of combination is
extraction which we're going to get into
next and in this case we're just going
to use a single column and extract out
multiple values in this case we have
this full name column we want to extract
out the first name and the last name go
ahead and hide these other columns we're
not using
in this case we're going to specify the
text split and it says it splits text
into rows or columns using delimiters so
we'll first start by specifying the text
which is B2 in this case and then the
column delimiter which in our case is
going to be that space once again we're
going to use that double quotes for that
space and then end Double quotes and
then this is going to be a dynamic array
and it has these two values here now
dragging this all down down we see that
it fills in for all these different
names now we just split text there also
could be cases where we maybe want to
extract out certain amount of values or
certain amount of text from a column in
this case we also have our application
ID number which is a combination of
letters and numbers but as you can see
from this there's some values in here
that are actually repeating sometimes we
want to refer to the shorthand of
this and let's say we only want to get
the last three digits of the applicant
ID because we know that's always
different well in this case we can
specify the right function and it
Returns the specified number of
characters from the end of the text
string we specify the text itself and
then the number of characters in this
case we can just say three and it's
going to provide back that 548 we could
also just change that to include all the
text numbers in case this number gets
bigger than that and then go ahead and
drag it all the way down
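For anyone following along outside of Excel, here is a minimal Python sketch of what text join, text split, and right are doing in the steps above. The helper names and the sample street, name, and applicant ID values are made up for illustration; only the column roles (street plus city/state, full name, an applicant ID ending in 548) come from the lesson. In the workbook the formulas would look roughly like =TEXTJOIN(" ", TRUE, F2:G2), =TEXTSPLIT(B2, " "), and =RIGHT(applicant ID cell, 3).

```python
# Python stand-ins for three Excel text functions. All sample values below
# are hypothetical; they are not rows from the course workbook.

def textjoin(delimiter, ignore_empty, *texts):
    """Mimic =TEXTJOIN(delimiter, ignore_empty, text1, text2, ...)."""
    parts = [str(t) for t in texts]
    if ignore_empty:
        # Skipping empty cells avoids doubled-up delimiters.
        parts = [p for p in parts if p != ""]
    return delimiter.join(parts)


def textsplit(text, col_delimiter):
    """Mimic =TEXTSPLIT(text, col_delimiter): one item per spilled column."""
    return text.split(col_delimiter)


def right(text, num_chars):
    """Mimic =RIGHT(text, num_chars): the last num_chars characters."""
    if num_chars <= 0:
        return ""
    return text[-num_chars:]


# Text combination: street (column F) plus city/state/zip (column G).
full_address = textjoin(" ", True, "123 Main St", "San Jose, CA, 95112")

# Text extraction: split a full name on the space into first and last name.
first_name, last_name = textsplit("Jane Doe", " ")

# Shorthand ID: keep only the last three characters.
short_id = right("APP2023548", 3)

print(full_address)           # 123 Main St San Jose, CA, 95112
print(first_name, last_name)  # Jane Doe
print(short_id)               # 548
```

Setting ignore_empty to True is what keeps an empty cell from adding an extra space between the pieces, which matches how the argument is described above.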
now one last one before we get into
actually performing that analysis we
want to go through and
extract out the state from this city
state and zip and as you notice from all
these they have a common format in that
the city has a comma and then the state
starts and then there's another comma
following that so we're going to be
using those basically delimiters if you
will in order to identify where we
should potentially extract out this
state value this
two-letter value so the approach
we're going to use as we go
through this is we're going to find the
location first of that first comma and
space before the state then next we'll find
where it actually ends and then finally
using those values we'll actually
extract out using the mid function that
state abbreviation so the first thing we
need to do is find that comma with the find function and this
returns the starting position of one
text string within another text string
so in this I'm going to specify that
the find text we're going to be
looking for is the comma itself and
we're going to be looking at within text
obviously G2 now we also need to find
the second comma in this we can use that
find function again specifically we're
finding that comma specifying that
within text of G2 and now we have the
third optional parameter of start
number we want to start from nine which
is the first one we found and in
running this we get nine now the problem
here is because we're starting as the
exact number that the comma actually
starts that's why we're getting that
back that value of nine we need something
slightly bigger than nine but anyway
we'll fix that in a bit instead let's
actually get into extracting out that or
at least trying to extract out that CA
of this value and then we'll fix that
issue in cell R2 so for this we're going
to be using the mid function which
similar to that right function
returns the characters from the middle
of a text string given a starting
position and length so in this case we
want to extract out G2 and we'll provide
it the start number of well what's the
value in Q2 and the number of
characters and for right now we'll just
put in we know we want to extract out
two so we're going to put in two now
we're running into issues we're only
getting back a comma if you will and if we
actually make this longer to actually
zoom in on here we get comma space CA
now when providing four and this has to
do with right here this value on the
start number isn't correct this nine
right here is exactly at the comma we
need to actually specify for that start
number of where the C is and these are
all two spaces over so I'm going to come
in here and I'm just going to modify
this slightly and add two to this this is
also going to fix our previous one that
we had when finding this it's now 13 because
the start is now past the first comma and then
finally that mid is fixed we can change
this now to back to two now you know me
I don't like hardcoding values for something
like this too and really what we're doing
here is we're adding two based on
the length of the comma and then the
space after it so there's two characters
in there two this is still that 11 value
that we saw before similarly inside of
our mid function I don't like doing this
two here because States maybe could be
more than two so I don't want to hold it
necessarily to that so instead I'm going
to do R2 minus Q2 which in our case is
going to be two and we have California
all right now we can take all the
different values actually drag it on
down and we get all of our states
extracted from this
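To make the position arithmetic above easier to check, here is a small Python sketch of the same find and mid logic using Excel's 1-based positions. The sample text is hypothetical, picked so the numbers line up with the ones in the lesson (first comma at 9, start at 11, second comma at 13). Using the length of the comma-plus-space separator in place of a hardcoded 2 is an assumption about how the formula was written; the cell names G2, Q2, and R2 are the ones mentioned above.

```python
def find(find_text, within_text, start_num=1):
    """Mimic =FIND(find_text, within_text, [start_num]) with 1-based positions."""
    idx = within_text.find(find_text, start_num - 1)
    if idx == -1:
        raise ValueError("#VALUE!")  # Excel returns #VALUE! when nothing is found
    return idx + 1


def mid(text, start_num, num_chars):
    """Mimic =MID(text, start_num, num_chars) with a 1-based start."""
    return text[start_num - 1:start_num - 1 + num_chars]


g2 = "San Jose, CA, 95112"  # hypothetical city, state, zip value

first_comma = find(",", g2)        # 9
# Starting the second search at the comma itself finds the same comma again.
same_comma = find(",", g2, 9)      # 9
# Starting MID at the comma returns the separator, not the state.
too_early = mid(g2, 9, 4)          # ", CA"

# Fix: skip past the comma and the space that follows it.
q2 = find(",", g2) + len(", ")     # 11, where the state begins
r2 = find(",", g2, q2)             # 13, the second comma
state = mid(g2, q2, r2 - q2)       # "CA"

print(first_comma, q2, r2, state)  # 9 11 13 CA
```

Because the length passed to mid is r2 minus q2 instead of a fixed 2, the same formula keeps working if a state value is ever longer than two characters.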
all right diving into our final analysis
we're going to actually combine all of these
different functions we just learned
about specifically with this data set we
have this column H right here and it's a
list of different skills that each one
of these job applicants have we want to
combine this and aggregate this in order
to analyze the most common skills for
this we're going to have to walk through
four different steps in order to get
this into our final visualization that
we can actually visualize and see here
so I'm going to go ahead and clear all
these values so we can get started
actually doing this the first thing I
want to do is actually combine all of
these values into a single long text
string and we're already having the
separator of a comma and space between
each skill so we're going to use that
same separator to continue separating
this so using text join we're going to
first specify the delimiter of that
comma and a space it's asking if I want
to ignore those hidden or empty cells I
do and then finally we need to provide
the actual text itself so we'll go down
through and select in our data tab H2 to
h21 going back up into the formula bar
closing this parenthesis and then
pressing an enter look I have a like
slight typo in here I need to actually
put double quotes around both of them
you can't mix double and single quotes
now we have this super long list uh that
has all of our different skills in it it
looks like it's properly delimited now
that we have all these values in one
cell we can then use the text split
function to now separate this into
different cells because we're going to
want to then move into transposing it
next and for this once again the
delimiter we're using is that comma and
space running this we have all the
different values separated out by
different cells so now almost there we
need to get into making a table
right here basically having skills in
the left hand column and then the counts
of those skills from what's above here
so first thing we need to do is get the
unique values of this but if we just run
unique on that row six we're going to
run into an issue to where it actually
goes out to the right and actually
doesn't get the unique values for all
these so the first thing we need to do
is actually
transpose which is moving it from
horizontal to vertical for that
row six okay so it's now up and down all
the way now in this case we want to run
the unique function on this to extract
out all those unique values and
scrolling down looks like we have all
the unique values it does have a zero
because we did that whole row six and so
when we get to these empty cells if we
keep on scrolling over here it records
as zero I'm fine with that for the time
being and we'll continue last thing we
have to do is use basically a count if to
count these different skills based on
whether they appear or how often they
appear in this row of six so I'll type
in count if we need to specify the range
first and we'll do row six I want it to stay
there because we're going to
autofill down so I'm going to F4 that
and then from there specify the criteria
which is going to be a n Kafka okay so
three values for that one and then
dragging this all the way down bam got
this all filled in all right the last
thing we need to do is actually
visualize this cuz we want to visualize
these skill counts select the area that
we want we're going to go in and insert
in under recommended charts you can do a
bar chart but I'm more a fan of
horizontal bar charts especially when we
have text values and we need to be able
to see all the different names so I'm
going to have to expand that out a bit
and I'm going to change this title up
here just to something like skill count
of applicants and bam now we can see
some trends out of this that a
lot of people are claiming to have
experience with Databricks which is
unusually high that's probably something
I want to investigate for this but a
good little thing that we actually can
analyze and see from this analysis that
we did one minor note I would normally
go through and actually sort this from
high to low and you can definitely do
this you'd have to copy and paste the
values over you wouldn't be able to use
these values right here and sort and
filter them because we're using the
modern or dynamic array to find these
unique values so that's definitely an
option if you want to do and I
definitely would recommend you do
something like that before sharing some
sort of visualization like this all
right you now got some practice problems
to go through and get more familiar with
these text functions which like I said
are imperative for data analysis in the
next lesson we're going to be moving
into our last one in this chapter on
formulas and functions on date and time
functions with that I'll see you in the
next one
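As a recap of the skills analysis from this lesson, the four steps (join everything into one string, split it back out, get the unique values, count each one) can be sketched in Python like this. The three sample rows are made up and stand in for the skills column H2 to H21; the variable names are illustrative only. The last line shows the high-to-low sort that was recommended before sharing the chart.

```python
# Hypothetical stand-in for the skills column; each entry is one applicant.
skill_rows = [
    "Python, SQL, Kafka",
    "SQL, Excel",
    "Kafka, Python, SQL",
]

# Step 1: like =TEXTJOIN(", ", TRUE, H2:H21), one long delimited string.
joined = ", ".join(row for row in skill_rows if row != "")

# Step 2: like =TEXTSPLIT(joined, ", "), one skill per cell.
skills = joined.split(", ")

# Step 3: like =UNIQUE(TRANSPOSE(...)), unique skills in order of appearance.
unique_skills = list(dict.fromkeys(skills))

# Step 4: like a COUNTIF over the split skills for each unique skill.
counts = {skill: skills.count(skill) for skill in unique_skills}

# Optional: sort from high to low before charting.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for skill, count in ranked:
    print(skill, count)
```

Rejoining with the same comma-and-space separator that already sits between skills inside each cell is what lets a single split recover every individual skill.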
all right saving the shortest lesson for
last we're going to be focusing on date
and time functions and for this we're
going to be using that same data set
from that last lesson which is about 20
rows of job applicants now similar to
text functions we're not using that full
data science job data set that we've
been using previously because I find
it's not common to really use these date
and time functions on a large set of
data because it's going to slow down
your sheets so that's why we're using
this smaller data set for this once
again if we needed to actually clean
up date and time stuff we're going to
use something like power query which
we're going to be getting to in the
advanced chapter anyway we're going to
be focusing on two main types of
functions first up are date functions
which are going to be able to extract out
things like month day and year and then
from there we're going to transition
into time functions extracting things
out like hour minutes and seconds
finally we're going to move into that
final analysis looking at what is the
time that is most likely for applicants
to apply to jobs for this we're going to
be using the date and time functions
workbook and we'll be working in this
data sheet for this filling in certain
values as we go through this I'm going
to go ahead and hide some of these
unnecessary columns so we have more
space to work with
this anyway jumping right in if we want
to calculate what the month is we have
something like the month function putting
in that D2 similarly we can get
the day by using something like day and
once again providing it D2 then finally
something like year we can provide D2 we
get 2023 now if I wanted to only extract
out the date out of this date time
I can use this date function it
returns the number that represents the
date in Microsoft Excel date-time code okay
got it we're going to put in the
year so we need to provide the year
first then month and then from there day boom
and analyzing this we see it is February 14
2023 one quick refresher on how Excel
stores those datetime objects so right
now it's in the number format of
date if I change this back to General
it's going to shift to this number and
if we recall this stores the values in
it if we start at something like one
converting it to a short date we can see
that it starts at January 1st 1900 now
if you're working with dates before 1900
let's say we put in something like
negative 1 and I convert it here to a date
it's going to just provide all these
pound signs here there's a few
different workarounds for that that's
beyond the scope of this course main
thing to understand is how it's actually
stored within Excel anyway I'm going to
convert this back up into a date and for
each of these I want to actually fill in
the values all the way down bam all
right close up this home ribbon all
right next up is today say we needed
today's date well I can put in the today
function this actually takes no
arguments and will provide us the date
I'm filming this on September the 3rd
now the last common function that I find
myself using all the time is for when I
want to calculate the days since
something happened in this case we want to
find out how many days it has been since
they have applied to the job so we can
use the DATEDIF function for this now
the one thing to note with this is as I'm
typing it in if I type in
just date there's no DATEDIF in there
there's no documentation that Excel
natively actually includes for you to
use this so this is like a function you
just have to know about anyway it takes
three parameters basically the start
date that we want to start from the
reference date that we want to basically
subtract from this which is today we
want to actually go ahead and lock this
I'm going to lock this with F4 and we
want to provide this in the format of
days which we provide this text
character of D and this tells us it's
been about 567 days since Valentine's
Day in 2023 anyway updating all these
cells for this we now have this
data shifting gears into our time
functions as we can expect a lot of
these are going to be the same hour we
use hour function minute has a function
as well as second this column doesn't
really show seconds but we can see up
here it is actually included in the
data similar to the date function for
time we have to provide three parameters
of hour minute and then also second drag
and drop this all the way down we can
see that yep it's correlating correctly
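Here is a rough Python equivalent of the date and time functions covered so far, in case it helps to see the same ideas in code. The application date of February 14 2023 and the 567 day result come from the lesson; the time of day is made up, and the 2024 year for the filming date is an assumption (it is the year that makes the 567 days work out). The serial number conversion uses the common days-since-1899-12-30 rule, which lines up with Excel for dates from March 1900 onward.

```python
from datetime import date, datetime, time

# Hypothetical stand-in for the date time in cell D2.
applied = datetime(2023, 2, 14, 16, 45, 30)

# Like =MONTH(D2), =DAY(D2), =YEAR(D2).
month, day, year = applied.month, applied.day, applied.year

# Like =DATE(year, month, day): rebuild a date-only value from its parts.
date_only = date(year, month, day)

# Excel stores dates as serial numbers where 1 is January 1 1900.
EXCEL_EPOCH = date(1899, 12, 30)
serial = (date_only - EXCEL_EPOCH).days

# Like =DATEDIF(start, end, "D"): whole days between two dates.
today = date(2024, 9, 3)  # assumed year for "September the 3rd"
days_since_applied = (today - date_only).days

# Like =HOUR(D2), =MINUTE(D2), =SECOND(D2), then =TIME(hour, minute, second).
time_only = time(applied.hour, applied.minute, applied.second)

print(month, day, year)    # 2 14 2023
print(serial)              # 44971
print(days_since_applied)  # 567
print(time_only)           # 16:45:30
```

The hour comes back on a 24 hour clock here as well, so 16 corresponds to 4 PM.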
one note for the hour that we previously
calculated this is in military time or
if you're in Europe you also do it this
way anyway I really like this for an
analysis purpose especially when we get
into analyzing it now conversely we can
also use for time and also date you
could use the text function which we
previously saw when we were extracting
out the month out of date Times by
providing a value and then the format
text which we're going to say in this
case is just hour hour minute minute if
I wanted that am PM format not that
military time format I can just add in
here AM/PM and it converts it
appropriately dragging this all down and
then filling it in we get
it now moving into that final analysis
we want to analyze when are these job
postings Happening by hour of day the
first thing we need to actually do is
get a column here of the hours in the day
so we can do some sort of like count if
on it in order to calculate that so for
this I'm going to use the sequence
function and I want 24 rows with it
columns I'm going to leave blank and I want
to start at one and it's going to fill
down from 1 all the way to 24 and now we
need to run a count if basically for each
one of these conditions run down this
list basically matching to see what is
the hour for these things so I have it
hidden but I'm going to go ahead and
make a column again for hour and I'll put
in here hour and unlike last time I'm
actually just going to put the whole
range in here and it's going to provide
it back to me in a modern array now with
this I can actually now use this in the
count if we want to First provide it a
range which is our modern array so it's
going to do I2 hashtag and then a
criteria for the hour we want to search
for we want to search for that one A2
from here we want to fill it all in and
we have some reference errors because we
didn't lock our cells specifically we
didn't lock this cell right here this I2
so I'm going to press F4 on that to
actually lock that then dragging it all
the way down we have it okay our last
portion of this is actually visualizing
this so we're going to go in select all
that data go to insert go to recommended
charts and I'm more of a fan of column
charts with this type of data so I'm
going to go ahead and put this in and
I'm going to change this to job postings
per hour and Bam now from this we're
seeing that basically people are
applying during normal working hours and
apparently they're waiting until the end
of the day to actually submit their job
applications maybe to get in before a
deadline or something all right this is
the last lesson on functions and
formulas in the next chapter we're going
to be moving deeper into understanding
how to actually make these different
visualizations I've only been showing
you a sneak peek at it right now to get
you familiar with how to easily create
it but we're going to go into a lot
greater detail coming up next now we
spent almost nine lessons on these
functions and it's because I feel
functions are one of the most important
things to understand about Excel because
it also transfers to other portions
specifically we're going to be learning
more about the DAX language in the
advanced chapter and we're going to
apply a lot of our knowledge that we
already know about these Excel functions
to DAX functions they're very similar
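Looking back at the hour-of-day count from this lesson's final analysis, a minimal Python sketch of the sequence plus count if approach could look like the following. The list of application hours is made up, and the helper names are illustrative stand-ins for SEQUENCE and COUNTIF. In the workbook the count was described as a count if over the locked I2 spill range with A2 as the criteria. One thing worth keeping in mind: the hour function returns values from 0 to 23, so a list that runs from 1 to 24 never matches a midnight application (hour 0); starting the sequence at 0 is shown as an alternative.

```python
def sequence(rows, start=1, step=1):
    """Mimic =SEQUENCE(rows, , start, step) as a single spilled column."""
    return [start + i * step for i in range(rows)]


def countif(values, criterion):
    """Mimic =COUNTIF(range, criterion) for an exact match."""
    return sum(1 for value in values if value == criterion)


# Hypothetical result of running HOUR over the applied date times.
applied_hours = [0, 9, 9, 14, 17, 17, 17]

# As described in the lesson: 24 rows starting at 1.
hours_1_to_24 = sequence(24, 1)
counts = [countif(applied_hours, hour) for hour in hours_1_to_24]

# Alternative: start at 0 so a midnight application is counted too.
hours_0_to_23 = sequence(24, 0)
counts_from_zero = [countif(applied_hours, hour) for hour in hours_0_to_23]

print(sum(counts))            # 6, the hour 0 application is missed
print(sum(counts_from_zero))  # 7
```

With the small data set in this lesson the difference may never show up, but it is a quick check to run before trusting the chart.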
anyway you got some practice problems to
go through and work on in order to
understand better how to use these
datetime functions and from there we'll
get into that chart chapter with that
I'll see you in the next
one welcome to this chapter on charts
and as much as I love using something
like python a programming language for
making
visualizations I feel that Excel has
some capabilities built into it that
allow it to basically exceed any
programming language and the
customization that you can do to charts
that we'll be finding out in this
chapter for this chapter we have four
lessons this lesson right here is an
intro to charts so we're going to be
focusing on understanding the basics of
using charts and specifically looking at
three types of charts specifically line
charts pie charts and bar or column
charts so technically that's four in the
second lesson we're going to move into
more advanced charts such as Scatter
Plots and also map charts along with
understanding more advanced
customizations that we can do to these
charts in the third lesson we're going
to go hard in the paint in order to
understand statistical charts
specifically histograms and then also
box and whisker charts which are
imperative to understand statistical
distributions of our data we'll finally
wrap this all up with a final lesson
focusing on sparklines which basically
allow us to put charts inside of
individual cells
in Excel pretty neat all right for this
lesson we're going to be using the
charts intro
workbook first thing to understand is
terminology Microsoft refers to all
these different visualizations diagrams
plots whatever you want to call it they
refer to it as a chart basically they want
to use a safe term that encompasses all
the different types of visualizations we
can build with this so you may hear me
from time to time call this a plot or
visualization I basically mean a chart
anyway why do we use charts well looking
at these six examples here we can see some
different characteristics about this
data that we're looking at but what if
we looked at just the core data itself
which is this table right here looking
at what is the number of job postings
per month if we look at this visually
we're not able to see necessarily what
is the highest month and also what is
the lowest month I mean you can figure
out eventually but it's not easy to spot
and that's why charts are so powerful
and so I have a variety of
visualizations here in order to Showcase
that same table that we were just
looking at in basically a variety of
different forms here I even have a few
down below it but we need to
understand which chart to use because
let's say we wanted to use this pie
chart here is that actually a good chart
to use to visualize this or instead
should we be using something like this
line chart to better show a trend over
time while also showing a magnitude of
difference anyway as we go through this
lesson I'm going to be calling out when
you should use certain charts as best
practice along with my recommended tips
for how to customize it to show them
best so for our first chart as I hinted
to we're going to be making this job
posting count into a line chart and this
is the chart I'd use typically for any
time series like data as it's great at
showing a trend over time and how it's
connected so how do we do this well
we're going to select all the data here
all the way from A1 down to B13 come up
into insert and we're going to dive into
each one of these charts individually
but I would encourage you to actually
just start with recommended charts I
really jump to it every time I use it
anyway first thing it has two tabs
here recommended charts and all charts
for recommended charts usually provides
a lot of good tips that you could
potentially use for different charts
sometimes however I do find that I want
a particular chart and it's not here and
that's when I'm going to go to this all
charts Tab and frankly it provides a lot
more control while allowing you to
actually visualize your different data
in our case I know I want a line chart
on this but now I can go in and actually
plot it with markers or even change it
into a 3D line chart highly don't
recommend this we're going to be
sticking to a line chart for this and
I'm going to go ahead and click okay I'm
not going to lie this chart is getting
us 90% of the way there now if you
notice for this when we clicked on the
chart we have certain values highlighted
here basically this purple outline is
showing that this is the X values right
here and then the blue coordinates right
here are showing the actual values
themselves and then conveniently they
put the job posting count which is
highlighted in Orange as the title we'll
be jumping into how to customize this
area in the advanced section but that's
in the next lesson now for those new to
charts there's a bunch of different
elements and I can come up here and I
can click this plus icon right here and
it shows all the different elements on
here I can use the checkbox to control
whether I want to include the axes or
not in this case I do want to include it
and then I can even fine tune it further
to select which one I'm talking about am
I talking about the horizontal or am I
talking about the vertical
just going through these in Rapid
fashion axis titles allow us to
provide titles for the X and Y axis the
chart title shown above I can remove it
or keep it on if I want to include data
labels I can do this along with
controlling what position of them I want
to go with I could also include
something like a data table below but
personally I find this is sometimes
sensory overload I don't really use that
much next are error bars for data then grid
lines whether I want to have horizontal
vertical some minor ones or some other
minor ones a legend if there's more than
one data series I probably want this a trend
line which we'll be adding in a
little bit and then up and down bars
which are going to show whether the data
goes up or down based on each set but
not really necessarily applicable to
this one now I find this plus icon is
where I go most of the time but I could
also go to this chart design tab up here
and it has this box of add chart
elements and basically you can go
through and adjust all the different
ones along with showing a more visual
indication of what's going on here
showing the actual up down bars
so you can see what they actually look
like you can also use this quick layouts
to quickly try out different themes that
Excel has I do find myself from time to
time using this so this chart is almost
done all I do want to do first is change
the title and I usually like to either
provide some sort of snippet of
information from it or ask a question
that I want the reader of this graph to
understand or take away from this chart
so I can put in something like how did
jobs Trend in 2023 so it also tells what
year what's going on here and it asks
them to look at hey what is the trend
going on here which it looks like we
have a peak up in January and a peak up
in August now I try to minimize the
amount of axis titles on here because
like in the month's case that's pretty
self-explanatory however the number in
the y-axis is not so self-explanatory so
in that case I would want to include it
in this case give it a representative
name of counts of jobs the last thing I
want to do with this is just add a trend
line and there's multiple different
options for this we can do linear
exponential a linear forecast where it
actually goes into the future and then
even a two period moving average which
is pretty neat I'm going to just stick
with the basic one right now of linear
and Bam that's our first chart so let's
move in the next
one now if we go back to our original
data set in the data tab we have a
column here on job no degree mention and
basically this column right here
includes whether there's a mention of a
degree in a job posting so in this case
where we have two different values we're
trying to determine what are the
proportions of each a way to compare
this we could either compare this in
like a bar or column chart but I feel a
better one for this is a pie chart so
I've gone through and calculated a count
of the jobs with a no degree mention
along with those that have a mention of
a degree I calculated the total and then
from that I calculated their individual
percentages now I'm not going to just
select all the data here because I don't
want to plot all of it I'm going to
select the first two values here of A2
to A3 press control and then also select C2
to C3 then from here now I'm going
to go to insert and those recommended charts
looks like a couple of bar and column charts
come up but the one we're going to be
using for this it's a pie chart so I'm
going to go ahead and insert that in now
personally I'm not a fan of this layout
here so I'm going to come up into chart
design into quick layouts and I'm going
to just experiment with different ones
looking at them and frankly I like the
one this one right here actually where
we've removed the legend and put the
actual values themselves along with
their titles inside the pie chart itself
to make it super simple to see which one
is which now Excel sometimes gets crazy
with the colors I actually don't
recommend using a lot of different
colors because it could be very
confusing for viewers on where to look
personally I want to highlight more of
the no degree mentioned so I'm going to
use this single color palette right here
or this monochromatic color palette
right here that has these different
shades of blue and I feel the eye is
going to go more to the darker blue now
with each of these labels here I can
actually select it I double clicked it
over time I can actually drag it and
drop it and move it around where I want
it to be I would probably want it to be
more over here I want the degree
mentioned to be stacked basically I want
them opposite of each other now you may
have noticed I can't really read this
text right here and even this text is
hard to read as well so what I can do is
I'll just click outside real quick and
clicking back in I'm going to double
click and this is going to bring up the
format data labels if double clicking
isn't working you can just select it go
into the format tab up here and select
format selection anyway there's a lot to
unpack in this Pane and we'll be
unpacking it as we go along this entire
chapter but the main thing to understand
is they have label options and text
options we want to adjust the text
options and this has things like text
fill and outline text effects and then
also the text box for this we're going
to use the text fill and outline
specifically this drop down here of text
fill we want to change the color so we
want to change it to white now if you
notice only one of these changed and
that's because I only had one of the
boxes selected so I'll actually
click out of this double click back into
this and then make sure both of these
are actually selected go back into text
options go into text fill and then
change this color and then it's going to
change both of these colors
now I'm fine with this text now but
let's say I wanted to customize further
the percentage here maybe I want to
include one more decimal place clicking
on the box itself I can now have this
option for label options and then under
well label options again I can scroll
all the way down or I can actually cover
this up and then unhide this number I
can change the number formatting itself
in this case I do want to still do a
percentage and then maybe I want to do
one decimal place personally I think
that's a little bit too much
data so we're just going to keep it with
the zero all right that's the final
customization the last thing we want to
do is just add a title and I want a very
compelling title what do they want to
look at for this I want them to
understand what jobs mention a degree
and now with this we have a pretty great
visual indication of that about one of
jobs have no degree mention in them
which personally I think that's a pretty
high percentage and hopefully gets
higher so we have data similar to our
first chart that basically explains how
many counts of jobs for the different
job titles now this isn't chronological
so I don't necessarily recommend using
something like a line chart for this
that's why we're going to be making
column and bar charts for this also let me
explain the difference between the two
anyway I'm using the formulas that we
previously have covered you can dive
into it if you want to basically using
UNIQUE and then also a COUNTIF formula in
order to count each one of these in
their data tab anyway if I actually go
to graph these by selecting all these
things go to insert and recommended
charts here provides the recommended
charts and we're going to start with a
column chart first I start with this one
first because we're already running into
problems with how long these labels are
we can see that we have these three
ellipses here basically telling us that
the rest of the name is hidden here so
not all the names are shown here the
other problem that we're getting into
with this column chart um named after
the fact that it looks like columns is
that it's not in an organized manner I
would expect to see it high to low to
make it easier to compare values to
each other and also how they rank so
we'll go ahead and delete this bad boy
anyway this table is organized based on
this unique function which doesn't
necessarily put things in the correct
order and I won't be able to actually go
through and filter it or sort it
appropriately So Below this I made a
different table that I basically use
sort to sort these values from above by
their job count in descending order now
since it's in this order I could
actually select a few less of this
remember how it was cut off last time I
could select only the top six go into
here go into recommended charts and once
again and put in our clustered column
chart now this one I can play around
with and as you see as I expand it out I
can actually see all the different names
here but once again I'm not a fan of
this column chart I'm not going to be
using it for this case instead we're
going to try out a bar chart instead so
selecting all this data to show the
power of these bar charts and then
coming in I can put in that bar chart
now I do like this one better because
all the titles are organized and
they're right off to the side and so
this is a much easier read the
problem now is I'm really nitpicky with
my charts the problem now is I don't
like the order that this is in what
happens is Excel starts plotting
these although it's in descending order
in our table as shown over here it's
going to be plotting them starting at
this zero axis up here and then plotting
from there so technically we don't even
want it like this instead what I can do
is reverse the sort order here I'm just
controlling it by using either 1 or
negative 1 in that sort order portion
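The count-then-sort idea above can be sketched outside of Excel too. This is a minimal Python sketch, not the workbook's formulas: the job titles are made-up sample values, and `sort_by_count` is a hypothetical helper whose `sort_order` argument follows the 1 / negative 1 convention just described.

```python
from collections import Counter

# Hypothetical job titles standing in for the job title column on the data tab.
job_titles = [
    "Data Analyst", "Data Engineer", "Data Analyst", "Data Scientist",
    "Data Analyst", "Data Scientist", "Senior Data Analyst",
]

# Count postings per unique title (the role UNIQUE plus COUNTIF play in the sheet).
job_counts = Counter(job_titles)

def sort_by_count(counts, sort_order=-1):
    """Return (title, count) pairs; sort_order 1 is ascending, -1 is descending."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=(sort_order == -1))

descending = sort_by_count(job_counts, -1)  # largest count first
ascending = sort_by_count(job_counts, 1)    # smallest count first

print(descending)
print(ascending)
```

Since the bar chart plots the first row closest to the axis origin, the ascending table is the one that ends up reading high to low from the top of the chart.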
anyway with this order now we can
finally get into the final bar chart
that we want to actually put in and I'm
just going to skip this recommended
charts come up here into the column and
then the bar charts we want this one
inserted in and I'm also going to zoom
out some now this is more in line with
what I want let's actually clean up this
visualization to identify what we want
I'm actually more curious about what are
the top jobs in data science so that's
what we'll name it additionally I feel the
titles are pretty self-explanatory based
on that title but I would need something
for the x-axis down here so we'll add an
axis title calling this count of job
postings now with this question I'm
asking of what are the top jobs in data
science I'm not really feeling like we
need to include things like machine
learning Engineers software Engineers
cloud engineers or business
analyst how could I actually adjust this
well one way is I could control what
areas are highlighted over here and I
could actually drag this and change this
to whichever ones I want um but I'm not
necessarily going to recommend that
instead I'm going to select our data
make sure all the columns are selected
themselves rightclick it and then go to
select data and this new window is going
to pop up here this tells us a lot of
great things about our visualization
first is the chart data range it tells
us we're selected from a25 to b35 so we
could change that here if we wanted to
the next thing is the two windows down
here of the legend entries and the
horizontal axis so this controls our job
count I'm going to scroll this over here
we could just remove job count but it's
not going to do anything it's mainly
the axis labels right here that we
can control so I know I want data
analyst and all the way down to
senior data analyst I can actually go
through and select remove business
analyst machine learning engineer
software engineer and Cloud engineer and
then click okay and it will remove it
from this visualization while still
keeping this data here so I can easily
go back and add or remove job titles as
necessary and now we have our final
visualization earlier I did go through
and actually delete the chart and start
over but you do have this option in the
chart design tab of change chart type
and allows you to basically go through
and try out different ones if I wanted
to go back to that column chart I could
and it would show me an example of what
it looks like now there is one last
thing that I want to format on this I do
find it a little difficult to read
exactly what are the amount of job
postings that they have here so I'm
going to add data labels to this we have
a couple different options we can be
inside end which I can't read at all
inside base outside end which I'm more
for and then also a data callout that's
just too much there we're going to do
outside end now with this these numbers
I don't like the level of detail I don't
need it down to the single or the ones
digit place to tell what it is instead
I would rather it show something like
9.6k or 9.6 thousand so we can actually format
that so double clicking on one of those
labels this format pane is going
to pop up again and for this I'm going
to go under label options and then label
options again and finally number and for
this I'm going to use instead of
any one of these I'm going to use a
custom type now I have a few of these
already built into here and so they may
not pop up to you but this is actually
sneak peek this is actually what we want
but if you don't have this popping up
right now what you can do is actually go
in in this case I'll just show a
different value what we're going to
first say is how we want this formatted
with how many decimal places so I want
all the values before the decimal place
then a decimal place and then I only
want in this case let's go with two
places after the decimal place and then
from there I want a K on the end so
basically to show this as a thousand so
I'm going to use a parenthesis put a k
and then close parenthesis and I'm going
to click add okay so now this changes it
to the double digits for explaining that
this is the thousands this automatically
whenever I do that K parentheses it
automatically does the math to basically
divide that by a thousand and convert this to
K instead of the thousands anyway I
don't really I'm going to go with the
original one I had of only one decimal
place and Bam that's our final
visualization and we can see from this
that we have a lot of insights into
understanding that more Junior roles
like data analysts data scientists data
Engineers are more prevalent than the
senior roles and that luckily it seems
like there's a lot more data analyst
roles than data scientists and data
engineers all right you now have some
practice problems to go through and get
more familiar with those four major type
of visualizations that frankly I feel
I'm using on a daily basis anytime I'm
making visualizations so don't think
that they're just too plain or too
simple they're really powerful at
explaining data in the next lesson we're
going to be jumping into not only more
advanced charts but even more advanced
customization so with that I'll see you
in that
one we're going to crank this up a notch
and get into some more advanced
visualizations specifically on this
we're going to be doing a deeper dive
into the pay of different jobs not
only based on the different job titles
but also based on where a job is located
using things like a map chart and so for
all these charts also we're going to be
looking into how we can further get into
deeper customization of
these so Scatter Plots are great at
comparing two numerical values in our
data set we have these two columns here
one on the salary year average and the
other on the salary hour average just as
a background on why it's called average
at the end of these sometimes job
postings have a range of salary and so I
took the average of the Min and Max and
hence I named this average anyway we
have yearly salary data and we have
hourly salary data what I did next is
get the unique value of the job titles
and then from there using that median
basically modified median IF function
got the yearly median salaries and then
the hourly median salaries so because we
have these two numerical values to
compare basically we want to see if
there's a trend correlated between the
two because well there is we're going to
find out I'm going to go ahead and
select these all then from there go into
insert and we can come into charts I
know I want a scatter plot and if we go
to insert it in can't see cuz it's
hidden behind here well we'll just go
ahead and show it this isn't necessarily
showing us what I want us to show with
this it's basically showing hey this is
the yearly data up here in the blue and
then this is the hourly data since
hourly data it's super low it didn't
work out how I wanted to by selecting
all the data like we've previously been
doing instead I'm going to go ahead and
delete this what we're going to do is
we're only going to select basically
this B and C column of data once again
we're going to try again inserting that
scatter plot and at this point it's
actually working correctly as we want it
unfortunately we can't tell there's no
basically like data labels for this to
understand what are the different job
titles associated with it even with the
graph we can see that it's only
highlighting this also the incorrect
titles up here it's not just hourly
median salary we're going to fix all
this anyway the first thing that I want
to clean up is actually the selection of
data right now we can see these numbers
are overlapping down here also it goes
all the way down to this zero axis on
both the X and Y I want to change that
so I'm going to double click this x axis
and the format axis pane pops up and we
can see that we have bounds here 0 to
180,000 I can see that there's no values
under about 75,000 so I'm going to go
ahead and put that in for the minimum
and press enter so it's going to jump it
this way now I want to do the same thing
for the y-axis I'll just double click it
and this one didn't necessarily go where
I wanted it to go I wanted to actually
change the values here so we can go
under axis options under axis
options again and under axis options
again we can change this minimum maximum
I'm going to change it to looks like
there's nothing above 20 or below 25 so
we're going to go with that now even
with this change in the formatting of
the values here the minimum I can still
see that there's overlap here so I want
to update this similar to last time
basically cut it off at the thousands
place and put a k at the end so
under axis options axis options
again I'm going to close this drop down
of axis options also instead we're
going to go to number for this we want a
custom type and I do have some values in
here but we're just going to go if you
don't have them in here we're going to
add a new one specifically with this I
wanted to show one I wanted to show a
dollar sign at the front and I don't
want any decimal places whatsoever so
I'm just going to put a zero in there
and then from there like last time I
want to format this in the thousands
place so I'm going to put a comma and
then double quotes to put around the K
which signifies I want to format this
in the thousands place I'm going to go
ahead and click add and now this is much
more readable not so much sensory
overload for our y-axis I don't care at
all about this decimal place right here
so going back into numbers again I can
just format the decimal places as
zero and I'll just leave this one as an
accounting category now which one's
yearly and which one hourly salary well
we need to include actual axis titles
for this so I'll go ahead and enable
that and then for this we're going to do
something a little bit different I'm
going to select this y-axis title and
instead of actually typing in values in
I want to use actually the column header
right here so I'm going to come up into
the formula bar type equals I'm going to
select C1 and then press enter and now
this updates for that column header I can
do the same thing here for the x-axis
title selecting it then from there going
to the formula bar put an equal and
selecting cell B1 and pressing enter for
the title we don't want that hourly
median salary we're really trying to
find out what jobs have the highest pay
and we can basically tell it from this
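Since the point of this scatter plot is to see whether yearly and hourly pay move together, the same question can be checked numerically. This is a minimal sketch under stated assumptions: the median values below are invented for illustration (they are not the workbook's numbers), and `pearson` is a hand-written helper, not something Excel exposes under that name.

```python
from math import sqrt

# Hypothetical median pay per job title: (yearly USD, hourly USD).
median_pay = {
    "Data Analyst": (90000, 45.0),
    "Business Analyst": (100000, 50.0),
    "Data Scientist": (125000, 60.0),
    "Data Engineer": (130000, 65.0),
    "Senior Data Scientist": (150000, 70.0),
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

yearly = [pay[0] for pay in median_pay.values()]
hourly = [pay[1] for pay in median_pay.values()]
r = pearson(yearly, hourly)

# A value close to 1 means the two columns rise together.
print(round(r, 3))
```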
all right so let's actually finally get
to adding data labels to this and we can
see what data labels are actually
available but scrolling over the
different options here we're going to
just go with above for the time being
then I'm going to close on out of this
and I'm going to select the data labels
themselves and format data labels should
pop up if it doesn't you can also go
about doing it by right-clicking this
and going to format data labels anyway
for this I don't want to actually show
The X or the Y value for this anyway uh
I made I made it disappear by actually
closing out of that so actually I'm going
to have to add those data labels
again anyway going back into it under
label options label options then label
options again I'm going to leave that y
value selected for right now but what I
want to do now is provide the job title
itself right next to the data point so I
can do this option here so label
contains value from cells and it's going
to ask me to select the data label range
and so now this is when I'm going to
select all of these different job titles
here and press okay so now we have these
values from cells I no longer want this
y values and I do want to include this
leader lines because we're going to be
actually dragging this around because as
you can see some of these values are
overlapping now also I'm noticing that
this is really busy right now with all
this text and stuff so I'm actually
going to remove the grid lines for the
time being actually for the remainder of
this cuz I I don't feel like it really
needs the grid lines in general and now
I have a little bit less sensory
overload so I can go through and
actually clean up where a lot of these
different job titles are located by just
selecting it and then dragging it and
you notice uh we had that leader line
selected so I have arrows or basically
lines going to each of these ones to
signify which one is which so now I'm
dragging these all over so that way
they're basically more where I want
sometimes if I dragged off of this and
drag maybe the whole chart itself and
make a mistake I press just control Z
and it reverts it back to where I'm
going and then I just continue on to
selecting the box that I want and moving
it anyway this is pretty neat now I
could actually go in if I wanted to and
add a trend line to this and basically
it shows for an increase in that yearly
salary I expect the same with the hourly
data in this case I don't find it as
useful so I'm going to just
leave that off but in general it is
pretty neat to see the trend that's
going on with this that senior data
Engineers although they're underpaid
compared to senior data scientist in
yearly salary you could get the hookup
if instead you look for an hourly gig
in order to get a little bit
higher pay a similar Dynamic happens
between business analysts and data
analyst so if you're a data analyst and
you're looking for a job maybe on upwork
maybe you should advertise as a business
analyst
instead all right going back to our data
set itself we have another column in
here I want to investigate and that's
specifically around the country is
called job country basically where the
job is located at and I like to
visualize these type of things well on a
map to actually see how it affects
others so I've made this table here
under the map chart tab where we have
all the different countries in the
data set then from there we use a count
if to determine how many counts for each
of the countries and then our modified
median if in order to determine what the
median salary is in each of these
countries I've also had to wrap this one
in an if error because some of these if
there's no values it throws an error and
I didn't want that popping up in the
chart so so I had it disappear or make
it basically a blank value if it does
have an error anyway let's get into
visualizing this we're going to first
just visualize what are the counts of
these different jobs based on the
country so I'm going to select column A
and B go to insert and then maps and go
to this map chart now you may have a
pop-up warning that comes up during this
that says data needed to create your map
chart will be sent to Bing and I'm fine with
sending this data to Bing you should be
fine too with it so feel free to accept
this then you shouldn't get this pop up
anymore anyway this chart's pretty neat
because it goes and shows we have a
heavy concentration of jobs basically
from the United States for my job
scraper I'm heavily aggregating jobs
from this country compared to other
countries sorry other countries out
there but I am still n less collecting
from other countries like us has 25,000
India is around 580 for this one I'm
going to change the title to where are
most jobs in Luke's data set from
there's not to say the United States has
more jobs than other countries this is
just how my data set is and how I
extracted the data so don't want you to
come up with the wrong conclusions from
this now the visualization that I really
care about is comparing these countries
to the median salary so holding control
I select a and then C I'm going to do
recommended charts from this cuz I'm
having problems using the maps one
anyway I see that it has the filled map
here I'm going to select okay and I have
all the data filled in all right with
this visualization we can now dive in
we can see that we have a range of these
median salaries from over 157,000 down
to 30,000 with country like China having
around 68,000 and then over in Africa we
have Algeria at
45,000 so looks like we have a lower
salary in the African continent over in
North America and also South America
pretty high salaries along with
Australia as well anyway pretty cool
visualization we were able to generate
out of this I mean I love data and I
just love this visual anyway with this I'm
going to change the title to what are
top paying countries now the last thing
is a minor Point sometimes if you're
going ahead and actually moving maybe
columns around you'll notice that my
visualization is also moving as well and
this can wreak havoc especially whenever
you've made your dash or made your chart
a certain size and then move columns
around and it messes everything up we
can fix this so I'm going to go ahead
and contrl Z both of those column moves
to get it back to where I had previously
and then from there I'm just going to
double click on the chart itself go
under chart options and once again this
like resizing one here and going under
properties right now it's selected under
move and size with cells we don't want
to do that basically we don't want to
move or size with the cells so I'm going
to select that now closing out of this
whenever I go to adjust the column size
it's not going to adjust the
visualization at all this is much more
of what I want also one last note on
this I do have a filter currently
applied to this data set specifically I
go into it it's a custom filter and I
wanted to make sure that I had basically
removed any na values so I put hey I
want values where median salary is greater
than zero and are less than 200,000 so
if I go ahead and clear this filter we
can see that we have some other values
up here basically Russia's up here at
300,000 for a median salary and if we
actually go in investigate Russia we'll
see that they only have around four jobs
with salary data listed so I feel like
this salary is more of an outlier than
anything so that's why I'm applying this
filter of 0 to 200,000 applying this
filter again we get final visualization
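The map table described in this lesson (a job count per country, a median salary per country that goes blank when there's no salary data, then a filter of greater than 0 and less than 200,000) can be sketched like this. It's only a sketch: the rows are invented, the country names "Country A" and "Country B" are placeholders, and `summarize` and `apply_filter` are hypothetical helpers rather than the workbook's formulas.

```python
from statistics import median

# Hypothetical (country, yearly salary) rows; None marks a posting with no salary listed.
postings = [
    ("United States", 100000), ("United States", 120000), ("United States", None),
    ("India", 60000), ("India", 80000),
    ("Country A", 300000),   # a single posting with an outlier salary
    ("Country B", None),     # no salary data at all
]

def summarize(rows):
    """Build {country: (job_count, median_salary)}; median is None when no salary exists."""
    by_country = {}
    for country, salary in rows:
        by_country.setdefault(country, []).append(salary)
    summary = {}
    for country, salaries in by_country.items():
        known = [s for s in salaries if s is not None]
        # None plays the role of the blank produced by wrapping the median in an error check.
        summary[country] = (len(salaries), median(known) if known else None)
    return summary

def apply_filter(summary, low=0, high=200000):
    """Keep countries whose median salary is strictly between low and high."""
    return {c: v for c, v in summary.items() if v[1] is not None and low < v[1] < high}

summary = summarize(postings)
filtered = apply_filter(summary)
print(filtered)
```

A filter on the job count (the first item in each tuple) would be a small change to `apply_filter` and matches the alternative of requiring a minimum number of postings per country.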
now you could also play around with this
and filter it based on the number of
counts to make sure you have values that
are above a certain count that's also an
option and probably maybe even a better
option as well all right it's your turn now
to dive into those practice problems to
try out some different Advanced
visualizations and along with some
Advanced customization with that in the
next lesson we're going to be diving
deeper into understanding how to use
statistical analysis specifically box
and whisker charts and also histograms and
how to read them with that see you in
the next
one this lesson is going to be focused
on actually visualizing a lot of the
things that or a lot of the functions
that we used in that statistical
functions lesson where we're looking
visually at things like the median and
quartiles specifically we're going to
do a refresher on histograms we've seen
them a few times already but we're going to
dive into further understanding how
salaries are distributed specifically
for a target audience of data analyst in
the United States you can feel free
to do whoever you want and then from
there based on the limitation of it
only being able to visualize one job title
we're going to shift gears to looking at
box and whisker charts and these are
great at also showing statistical
distributions like a histogram but we
can take it a step further and we
compare different values specifically in
this case we're going to compare them
across the different job titles on how
they're distributed now box and whisker
charts aren't probably a chart that
you're familiar with or most people are
familiar with so we're going to go
through a review and understand and
break them down to understand those
Concepts we talked about previously
about median and quartiles and where
they fall into this for this we're going
to be using the charts statistics
workbook specifically we're going to be
starting in this data Tab and for all
this we're going to be analyzing salary
data in this video we're going to be
focusing specifically though on that
yearly salary
data so let's actually go back into
breaking down how to read a histogram we
go back into insert recommended charts
and then from there select histogram and
insert in the histogram I don't like
where it is right now I'm actually going
to move this chart into a new sheet now
quick refresher on histograms each one
of these bars represents a count of
values within a range so in this case
there's 920 values between the range of
oh my gosh so hard to read 75,000 to
81,000 and as we're noting by this we
have a large number over here it gets
even out to 960,000 this would be called
a skewed right distribution now this is
different from a column chart because
this data down here on the x-axis is
basically continuous data when one bin
stops so this first bin of 15,000 to
21,000 the next bin picks up now the
first problem with this histogram is
this is for all salary data specifically
all job titles across all countries I
want to actually fine-tune this to look at my
specific use case of data analyst in the
United States so you can come here into
the histogram 2 Tab and I have the four
Columns of interest that I want to use
from the data Tab and I already have the
filters applied but if you want to you
can come in here and actually select to
clear these filters and I'll just select
it here from that Home tab then from
there I'm going to go through and select
data analyst roles that are full-time
only that are in the United States and
then finally I don't want any of these
blank values here so I'm going to
uncheck this value here for blanks now
I will say filtering this data did take
some time to actually do so don't be
alarmed if this takes more than 10 or 15
seconds all right so back in let's
actually make a histogram with this data
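The filter steps above (data analyst roles, full-time only, United States, no blank salaries) boil down to one row test. Here's a minimal sketch of that; the rows are invented and the column names such as `schedule` and `salary_year_avg` are assumptions, not necessarily the workbook's real headers.

```python
# Hypothetical rows; the keys are assumed column names for illustration only.
rows = [
    {"job_title": "Data Analyst", "schedule": "Full-time", "country": "United States", "salary_year_avg": 90000},
    {"job_title": "Data Analyst", "schedule": "Full-time", "country": "United States", "salary_year_avg": None},
    {"job_title": "Data Analyst", "schedule": "Contractor", "country": "United States", "salary_year_avg": 70000},
    {"job_title": "Data Engineer", "schedule": "Full-time", "country": "United States", "salary_year_avg": 130000},
    {"job_title": "Data Analyst", "schedule": "Full-time", "country": "India", "salary_year_avg": 40000},
    {"job_title": "Data Analyst", "schedule": "Full-time", "country": "United States", "salary_year_avg": 105000},
]

def keep(row):
    """True when a row passes all four filters described above."""
    return (
        row["job_title"] == "Data Analyst"
        and row["schedule"] == "Full-time"
        and row["country"] == "United States"
        and row["salary_year_avg"] is not None  # same idea as unchecking blanks
    )

salaries = [row["salary_year_avg"] for row in rows if keep(row)]
print(salaries)
```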
we'll go into insert from here I'm going
to insert in a histogram now once again
this distribution is like the last one
skewed right and we have a heavy amount
of outliers right here even out to this
one value around 370,000 I don't think
this provides a lot of value instead I
want to actually focus more into these
this actual distribution and not
actually on this portion out here that
we have just outliers anyway I'm going
to come in here into our filters up here
insert a number filter and that it's
less than 300,000 click okay all right
this is looking a lot more readable
which we can actually see now the x-axis
now each one of these bars right here or
what what you would see in like a column
chart are called the bins and they're
all equally space but we can control the
width of each one of those bins that
they Encompass specifically I can double
click on the chart to bring up that pane
to the right selecting the x axis I can
then go into axis options and then
once again axis options we can go into
something right now we're noticing that
the bins are automatically determined we
can actually change this binwidth I'm
going to change this to something like
15,000 notice that it is bigger in this
case the bins are bigger than they were
previously you can feel free to test
different options if you will I feel if
you go too small in the case let's say
we went down to 1,000 it just gets too
noisy and also you can't necessarily see
the distribution as well so really you
just have to play around with it until
you get to what you want to find as far
as the axis goes this is a little bit
this is sensory overload for me way too
many zeros in here so I'm going to move
this selecting the x-axis we can see that
it has format axis now I can go under
number and once again we can go in our
custom type none of the ones that I've
previously done are here sometimes it
pops up sometimes it doesn't we're going
to go ahead and just put in we want the
dollar sign zero and then formatted with
the K value basically removing all those
uh thousands zeros and I'm going to go
ahead and click add all right this is a
lot more readable to actually see what
those different ranges are
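To make the two settings used here a bit more concrete, the bin width of 15,000 and the dollar-sign-zero-K label can be sketched in a few lines. This is a rough sketch, not how Excel does it internally: the salaries are made up, the starting edge of 25,000 is an assumption taken from the axis shown later, `bin_counts` and `format_thousands` are hypothetical helpers, and the handling of values that fall exactly on a bin edge is a simplification that may differ from Excel's.

```python
import math
from collections import Counter

def format_thousands(value):
    """Mimic a $0,"K" style axis label for non-negative values: divide by 1,000, round, add K."""
    return f"${math.floor(value / 1000 + 0.5)}K"

def bin_counts(values, bin_width, start):
    """Count values per bin; each bin here covers start <= value < start + bin_width."""
    counts = Counter((v - start) // bin_width for v in values)
    return {
        (start + i * bin_width, start + (i + 1) * bin_width): n
        for i, n in sorted(counts.items())
    }

# Hypothetical yearly salaries for illustration.
salaries = [62000, 71000, 88000, 90000, 97000, 104000, 131000]
histogram = bin_counts(salaries, bin_width=15000, start=25000)

for (low, high), n in histogram.items():
    print(f"{format_thousands(low)} to {format_thousands(high)}: {n}")
```

Trying a much smaller `bin_width` such as 1,000 on a real column shows the same noisiness mentioned above, since most bins end up with only a handful of values.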
and from there I'm going to change the
title of how much do data analysts in
the United States make probably also
best practice here to add a title on the
y-axis for count of jobs and bam now we
have this final visualization shown on
our histogram we can see that a lot of
the salaries are more around the range
of 85,000 to 100,000 with 70,000 to 85,000
coming up next so this is really
visually great at showing where I can expect
to have a salary as a starting data
analyst but now what if we want to
analyze multiple different job titles
which we're going eventually get to is
this box plot here where we're plotting
it for all the different job titles we'll
be able to actually compare different
values across each other but before we
get to that we need to First understand
how to read a box plot also sometimes I
call it a box plot but it's also known
as a box and whiskers chart anyway I
made this visualization here you don't
have to do it there's a bunch of
customization along with it the main
purpose of this is to demonstrate or
help understand how to read a box and
whiskers chart so I took our data that
we previously were analyzing for data
analyst in the United States it was a
full-time role along with all the salary
data and then I use like we previously
did calculating things like the Min
first quartile median average third
quartile and Max just ignore this
portion right here it was used to make
build this visualization right here
anyway I tried as best as possible to
line up this histogram where we have the
x-axis going from 25,000 to 285,000 with
the box and whiskers chart I may below
it from 25,000 to 285,000 so the Box
itself signifies what data nerds call
the interquartile range basically all
the values between Q1 or quartile 1 and
quartile 3 had a typo there got to fix
that anyway that's why it was so
important that previously we calculated
that first quartile and third quartile
and if you remember from that they're
quartiles so 50% of the data Falls
within this box and if we look up we
were to draw imaginary lines into our
histogram we can see that about 50% of
the data does fall within this the next
up inside of here is a line that is for
the median in this case our median is
90,000 and then we have our average of
95,000 which as we discussed previously
the average is going to be higher here
because we have things all the way out
here called outliers basically dragging
that average higher and outliers are
signified by these dots outside of the
whiskers themselves these whiskers are
the lines and the lines themselves
extend to the minimum and the maximum
and these are just relative mins and
Maxes they're not necessarily the true
min and max anyway so that's a box and
whisker chart and frankly by themselves
I don't think they're really great but
when you pair them with other
categorical values I find them super
interesting so let's actually build this
visualization so you can come over to
this box plot2 Tab and I have our data
inside of it none of it is filtered it
has all the different job titles and all
their Associated salaries for this I'm
going to select column M and then also
holding control I'm going to select
column A then from there go in and
insert and go to recommended and from
there look at the box and whiskers chart
which looks like it's already pulling it
up for us so let's pop this bad boy in
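While that chart loads, the pieces of a box and whisker chart covered earlier (first and third quartile, the interquartile range, the median line, whiskers as relative min and max, and outlier dots) can be computed by hand. This is a sketch under assumptions: the salaries are invented, the whiskers use the common 1.5 times IQR rule (the lesson doesn't state which rule Excel applies), and the "inclusive" quartile method is one of several conventions, so the numbers may not match Excel's chart exactly.

```python
from statistics import mean, quantiles

# Hypothetical yearly salaries with one large outlier.
salaries = [55000, 70000, 80000, 85000, 90000, 95000, 100000, 110000, 125000, 280000]

# Quartiles: q1 and q3 are the box edges, q2 is the median line inside the box.
q1, q2, q3 = quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1  # the interquartile range, the middle 50% of the data

# Assumed whisker rule: whiskers reach the furthest points within 1.5 * IQR of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [s for s in salaries if lower_fence <= s <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as an outlier dot.
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers, "mean:", mean(salaries))
```

With this sample the mean lands above the median, which is the same effect described above where outliers drag the average higher.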
now one drawback of these box and
whisker charts in Excel is unlike that
last box plot that I made I custom made
this in order to make it appear in this
horizontal fashion you can't actually do
that you can only have the option to
have them vertical up and down anyway
this is pretty close of what we want to
get the main problem I'm noticing right
now is we have outliers up to 1.2
million and it's really with the data
around 100 150,000 it's really hard to
actually look into those boxes so I'm
going to change this y-value scale double
clicking on the y-axis I'm going to
change the maximum to 300,000
additionally since we're here I'm going
to change that number formatting to use
that 0k value then also I'm finding the
color is a little hard to actually see
these x's in here so under series option
selecting fill and line then fill I'm going to
change this color to more of a lighter
blue okay and that's definitely easier
to read I'm going to add a vertical
axis title of salary USD I'm also going to
bold it all to make it a little bit more
readable and then from there change that
chart title to what are the top paying
jobs in data science all right getting
into actually analyzing this and getting
insights from it now one drawback out of
this is there's not an easy way to sort
these values right here right now I'd
normally put them high to low I'd
probably put them high to low based on
median salary but they've been put into
this graph based on the order that they
first appear over here in column A and
that's when they pop up so that's the
order so technically I could go through
and sort this column
alphabetically but that's going to take
a little bit too much time if you want
to do that feel free to try that out
anyway it looks like roles like machine
learning engineers and also software
engineers have a pretty large
interquartile range which is where the middle 50% of
that data falls so there's basically a
wide range of salaries you could
find with those whereas data nerds data
scientists data analysts and data
engineers have a tighter band also as
expected those data analysts and
business analysts have some of the
lowest median salaries where something
like the data engineers and the senior
roles have even higher median salaries
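
To make the median and interquartile range comparison concrete, here is a minimal Python sketch of the numbers a box and whisker chart summarizes, including the high to low sort by median that the chart itself does not offer. The job titles and salaries are made up for illustration and are not the course data, and Excel has its own quartile settings, so its boxes may differ slightly from this calculation.

```python
from statistics import median, quantiles

# Hypothetical salaries (USD) per job title; illustrative only.
salaries = {
    "Data Analyst": [60000, 75000, 90000, 95000, 110000],
    "Data Engineer": [100000, 120000, 130000, 145000, 160000],
    "Machine Learning Engineer": [70000, 100000, 150000, 190000, 250000],
}

def box_stats(values):
    """Return the median and interquartile range (Q3 - Q1) of a list."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": q3 - q1}

stats = {title: box_stats(vals) for title, vals in salaries.items()}

# Sort job titles from highest to lowest median salary.
by_median = sorted(stats, key=lambda t: stats[t]["median"], reverse=True)

for title in by_median:
    print(title, stats[title]["median"], stats[title]["iqr"])
```

A wide interquartile range, like the machine learning engineer row here, is what shows up as a tall box in the chart.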
overall this is pretty great for going in
and comparing values I would probably work
with this more to fine tune it to only
have a couple of job titles in it and
for that we can use something like
slicers which we'll be covering in an
upcoming chapter well the next chapter
when we get into Advanced Techniques in
Excel so we'll be able to customize this
further once you have that knowledge all
right you now have some practice
problems to go through and get more
familiar with those histograms and
also box and whisker charts in the next
lesson which is a quick one we're going
to be moving into sparklines which is
the final lesson in this chart overview
with that I'll see you in the next
one moving into this last lesson on
charts focusing on sparklines
sparklines are basically ways to insert mini
charts into a cell that summarize data
that's next to it if your data is coming
in a horizontal form similar to this
table you probably have the possibility
of inserting a sparkline
we're going to be going through how to make
them but also how to customize them all right
for this we're going to be using the
sparklines workbook for this we have
like usual our data tab and then our
original tab that calculates data off it
and for this data set we're just looking
at the counts of the different
job titles based on month so this is
basically horizontally oriented this is
great for a sparkline
so how are we going to do this well
we'll go ahead and select the data only
so C4 to N10 then come up into the
insert tab then right here we have this
section on sparklines we can insert a
line column or a win loss we'll just
start with column and it
fills in the data range of C4 to N10 but
it wants us to choose where we want the
sparklines to be placed so the location
range I'll click this arrow here and then
from there actually drag it next to it
I'll close this arrow back and click okay
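
To show the idea of a sparkline outside of Excel, here is a small Python sketch that squeezes one row of monthly counts into a single line of text using block characters. This is only an illustration of the concept, not how Excel draws sparklines, and the monthly counts are hypothetical.

```python
# Eight block characters, from the shortest bar to the tallest.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value to a bar height relative to the row's own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat row gets the same (lowest) bar everywhere.
        return BARS[0] * len(values)
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)

# Hypothetical job posting counts for January through December.
monthly_counts = [90, 60, 55, 50, 48, 52, 58, 70, 50, 45, 30, 28]
line = sparkline(monthly_counts)
print(line)
```

Each row is scaled against its own minimum and maximum, which is why a sparkline is good for spotting the shape of one row rather than comparing magnitudes across rows.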
anyway I wanted to demonstrate that bar
chart because it's not really that great
for here remember anytime we're doing
continuous data in this case we're doing
that monthly data I'm going to want to
use something like a line chart instead
so I can easily change it by coming up
here selecting all of our different data
selecting that sparkline tab and then
just changing it I could change it to
something like win loss which shows no real
data for this or to line which is what we
really want from this now getting into
the customization of this
personally I really like blue so we're going
to stick with the blue color but we
could change the color if we want to and
the other thing we can change is the marker
color right now we don't have any
markers on it we can actually change
which markers are shown right here in the show
section so I can select the
high point right now it's going to
highlight all of them red low point
also red negative points there are no
negative points you can also do the first
point which I don't really find much
value in or last point and then
finally the markers themselves which
just puts a marker on every single one of
them I really like this high point and
this low point and we can customize this
the high points I would really want to
call out with a green color right now
this green is sort of hard to see so
I'm going to change it to something a
little bit darker and Bam we can see
that one a little better the red for the
low point I'm going to keep it as is and
the last thing is all this data has
basically a grid around it I'm just going to
add that in real quick by selecting all
the cells come up into home into the
borders I'm going to put in all borders
around it then it looks like I have a
double line right here for this lower
one so I'll insert this bottom double
border and then finally I'm going to put
a thick border around this all bam we
have our final visualization there now I
can go through and see things like okay
with data analysts and other analysts we
saw spikes in January but with things like
data engineers we didn't see a spike
however all the job titles ran into a
similar problem where apparently they
ran out of budget and the least amount
of jobs were posted in November and
December so this is a pretty cool feature
to show some quick snapshots about the
data you're looking at all right you now
have some practice problems to go
through and basically practice making
some of these sparklines next we're going to
be jumping into the next chapter it's
our final chapter of the basic section
and it's going to be focusing on
Advanced features inside spreadsheets
such as tables formatting and how to
collaborate with others it's our last
section before we build our first
project so with that I'll see you in the
next chapter on Advanced
spreadsheets data nerds welcome to this
last chapter in the basic section
focusing on advanced features in
spreadsheets this is the last chapter
we're going to be covering before we get
into our first project and this chapter
is broken into three different lessons
this one right here is going to be on
tables how to use tables how to use
things like slicers and how to
manipulate them second lesson is on
formatting not just on making cells look
pretty but developing conditional
formatting rules in order to highlight
cells according to well a certain rule
pretty interesting feature within Excel
and the third lesson is on collaboration
for a project we're going to be making a
dashboard and so we need to enact
certain measures in order to protect it
and prevent people from going in and
messing it up and so we're going to go
over a lot of features in order to set
it up properly anyway back to this
lesson what are we going to be doing for
it well first we're going to start out
by using a smaller subset of our data
set basically 15 rows and creating your
first table we're going to be
manipulating it using custom formulas
that we really haven't seen before along
with using some other ones that we have
seen before in order to calculate totals
subtotals and Aggregates by the end of
this lesson we're going to be building a
mini dashboard to analyze that histogram
that we talked about in our previous
lessons specifically we're going to add
slicers to it in order to be able to
filter down and look at a subset of data
that we're most interested in and
that can all be done with the
help of tables for this lesson we're
going to be using the tables workbook in
chapter
4 for this you're going to start in the
tables intro original sheet and then the
final one's going to be what we're going
to eventually get to all these are going
to be labeled similarly with the
original and final and we're all going
to be working with the original it
should look like the final when you get
done with this so let's dive into
creating our first table first thing you
have to do is make sure that we're
selected somewhere in here we don't
necessarily need to select the full
table but just somewhere in here from
there we'll go into the insert Tab and
we'll insert a table also notice that we
can use the shortcut control t for this
so I'm going to do that instead and for
this it automatically pinpoints the
rightmost cell and the bottom most cell
and we need to make sure we have this
check mark enabled for my table has
headers because we have well column
headers and bam we just made our first
table this lesson's over but seriously
let's actually get into exploring this
table design tab that now appears
anytime you're selected to the table if
I click off of it it disappears anyway
we're going to first look at the table
name and I like to have a table name
that's easy to reference so I'm going to
just name it something like jobs naming it
something simple is going to come in handy
whenever we're making
formulas later now we'll get to
this section on tools and
external table data in a little bit but I want to move
over to the style options you can play
around with some of these options here
where you can highlight the First Column
or you can highlight the last column has
a lot of different formatting options
with it but what I really like is this
color formatting if I'm not really
liking the color that it's given to me
just come over here select a new one so
we'll get back to table design in a bit
but what's really the benefit of this
table well one thing is you can easily
add data to a table and it will
autofill let me show you let's say I
wanted to add a new column called salary
year average copy whenever I enter this
new column name and press enter it
automatically fills this in the
skills are sort of covering this up
right now sorry about that so I'll
make this a little bit bigger but you
can see we have salary year average copy
now included within this table and I can
verify that it's included in this
table because if I go to resize table
it will say that it now goes to
K16 now for this I just want to copy the
results of the salary year average
column over here in H so what I'm going
to do is press equals and I'm just
going to select the cell over here of H2
now this is what I was talking about
whenever I said tables have their own
unique formulas what it's
doing here is it's referencing the
salary year average column which is this
portion right here and then it's also
using this at symbol to basically refer
to the same row so H2
for the formula in K2 anyway when I go ahead
and press enter watch what happens we
actually fill in all the different
values of this so if I were to actually
double click into this one down here we
still have that same syntax where we're
selecting the salary year average column
and we're using that at symbol to
get the value that corresponds in that
same row now let's dive deeper into
these different formulas we can use for
this table so I'm going to come over
here into column n and for this remember
we named our table jobs so I'm just
going to type in jobs and I have two
tables in here one called job and one called jobs
you'll only have one popping up here anyway
it automatically pops up so I'm going to
select jobs and now whenever I do this
I'm going to press enter it's using our
modern dynamic arrays basically to fill
in all the data that we have over here
inside of our table so pretty unique in
how we can reference this now what
happens if we wanted to also include the
column headers up at the top well I can
type in jobs and then from there I'm
going to add a square bracket and we
have a few options popping up right now
it looks like it's just a column titles
but if we scroll down we have these
values here with hashtags in it
specifically I want with the column
headers so I'm going to put hashtag
headers I'm going to put a close bracket
on this and then press enter and now we
have the column headers across the top
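
As a rough mental model of what these table references hand back, here is a Python sketch that treats the table as a header row plus data rows. The column names and values are hypothetical stand-ins for the jobs table, and this only models the idea of the references, not Excel's actual behavior.

```python
# Hypothetical stand-in for the "jobs" table: one header row plus data rows.
HEADERS = ["job_title", "salary_year_avg"]
ROWS = [
    ["Data Analyst", 90000],
    ["Data Engineer", 130000],
    ["Data Scientist", 125000],
]

def table_data(rows):
    """Like typing =jobs : every data row, without the header row."""
    return [list(row) for row in rows]

def table_headers(headers):
    """Like typing =jobs[#Headers] : only the header row."""
    return list(headers)

def this_row(headers, rows, row_index, column):
    """Like the at symbol: the value of `column` in the formula's own row."""
    return rows[row_index][headers.index(column)]

print(table_headers(HEADERS))
print(table_data(ROWS))
print(this_row(HEADERS, ROWS, 1, "salary_year_avg"))
```

The point of the model is that the table name alone means the data body, while anything in square brackets narrows that down to a specific part of the table.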
now that's a little bit too much work
having to do two different formulas for
this if instead I type in jobs and
then a square bracket and look at the options
available I can see I have an all a data
only a headers and a totals row the totals
row we're going to get to in a little bit
so we'll do the all for now and if I go
ahead and press enter bam we now have
our column headers and
also the data itself but what happens if
you want to just access certain columns
well I thought you'd never ask
once again I can type in something like
jobs put the square bracket and then we
have a list of different columns
available let's do the salary year
average and do a close bracket once
again this is going to provide the data
values only if we wanted to include the
specific header for this I once again
need to put in jobs and this time I'm
going to need to specify not only the
headers which I need to put in their own
square brackets but I'm also going to
have to do a comma put another set of square
brackets and put salary year average
within its own brackets so it's almost
like a list of items if you're familiar
with python this would be like a list
anyway we have the headers in brackets
and we have salary year average in
brackets pressing enter we get salary
year average up at the top now honestly
an easier way to do this all is to well
use that all command or hashtag all but
it has to be put within its own square
brackets then from there a comma and
then we want to say hey the only subset
that we're providing for this is salary
year average close that bracket and then
close the entire brackets for jobs now
from there when we run it we get the salary
year average header along with all the column
values at any time if you forget this
it's not that big of a deal as you can
just go through and put an equal sign
and like we did previously I could just
highlight well not that um our salary
year average column and look it
automatically populates with that same
formula as above here and when I press
enter boom it pops up there so don't
think you have to memorize these
formulas that I just went over but
do all these formulas actually provide
any value well let's look at a
use case let's say I wanted to identify
jobs that whenever we looked at the
skills we could find out if they
contained the skill of Excel or not so
I'm going to create this new column over
here and call it Excel and for this
we're going to be using the search
function which we need to provide what
text we want to actually find
conveniently I put it in the column
header so I'll go ahead and just select
it and automatically populates the
formula for this then from that we need
to go to the next parameter of within
text we're trying to look at that job
skills column it puts that at symbol at
the front of job skills to basically
signify look at that row then from there
I'm going to go ahead and close the
parentheses and press enter so for that
search function it provides the
numerical location of excel in here
Excel is 36 characters deep into this so
I'm just going to modify this cuz I
don't really care about the number
I'm going to use
the is number function which checks if
it's a number and then returns true or
false in this case we have true values
so we know that for these rows if
they contain Excel they'll have
true so that's how I find myself using
these different formulas and
understanding how to actually manipulate
them anyway let's get into our next step
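
Before moving on, the search and is number combination from that Excel column can be sketched in Python to show the logic. This is a simplified stand-in: it models search as a case-insensitive lookup that gives a 1-based position, and it uses None where Excel would give an error. It does not model the wildcard characters that Excel's SEARCH also accepts, and the skills text is hypothetical.

```python
def search(find_text, within_text):
    """1-based position of find_text in within_text, ignoring case.

    Returns None when the text is not found (Excel would return an error).
    """
    pos = within_text.lower().find(find_text.lower())
    return pos + 1 if pos >= 0 else None

def contains_skill(skill, job_skills):
    """Like wrapping the search in is number: True if found, else False."""
    return search(skill, job_skills) is not None

# Hypothetical job skills text for two rows.
print(contains_skill("excel", "sql, Excel, tableau"))
print(contains_skill("excel", "python, aws"))
```

Wrapping the position check this way is what turns "where is it" into a plain true or false flag per row.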
let's say we wanted to include some sort
of totals Row in order to maybe
calculate median salary how many job
postings there were Etc so we'll go into
this table design tab and I'm going
to select the total row and now
down here in row 17 we have total
written down here along with a bunch of
well blank values except for all the way
to the right looks like it puts us the
number of 15 which is the total of these
now going over to that salary year
average column I can basically select
this totals row right here and you
notice a drop down appears right here
from here we can select some basic
statistics average count min max
variance I'll go ahead and select average
that's the average of this column right
here so pretty neat I could go through and
do that for other columns as well if I wanted
to now you can also go into here and
select more functions and then like we
said we want to calculate median on this
salary so we could go ahead and select this
function of median but I'm actually
going to recommend another approach you
see if we double click inside of here we
actually see that this totals row is
using a function specifically the
subtotal function so let's
actually build this out from scratch
without selecting it luckily we have the
salary your average copy column over
here so I'm going to go in and I'm going
to type in subtotal and it returns a
subtotal in a list or database first is
the function number what do we want it
to actually do and this has even more
values available to it that you can
actually select from and perform on this
so in this case let's say I wanted to
find out what the max value is I would
plug this in it would be 104 and then
for the reference for this well we're
just going to select this salary year
average copy column it automatically
transformed into this special syntax and
then add a closing parenthesis and press
enter and so now we have the max salary
which looking at this it's true but if
we go back into this and actually
inspect what values are available in
this function number we can see that
median is not available in here so what
are we going to do well there's another
function and it's not median
that I recommend using instead of
subtotal and for this one we're going
to use the aggregate function and this
returns an aggregate in a list or
database it's similarly designed where
it has a function number but with this
one we have a lot more options including
things like quartile and stuff like that
anyway it has median available as number
12 now the second parameter on options
allows us to select a host of options uh
no pun intended for how we
want to actually perform this aggregate
basically do we want to maybe ignore
hidden rows or do we want to ignore
error values in my case I don't really
want to ignore anything so I'm just
going to do number four and then finally
we need to insert the array or the
column itself in this case we want
salary year average closing the
parentheses on this and pressing enter
we get our median value of 94,000
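
To show what the function number and options arguments are doing, here is a simplified Python model of the aggregate function. It only covers a handful of function numbers (1 average, 4 max, 5 min, 9 sum, 12 median) and two options (4 ignore nothing, 5 ignore hidden rows); the real function has more of both. The salaries and the hidden flags are hypothetical.

```python
from statistics import mean, median

# A small subset of the function numbers the aggregate function accepts.
FUNCTIONS = {1: mean, 4: max, 5: min, 9: sum, 12: median}

def aggregate(function_num, option, values, hidden=None):
    """Simplified model: option 4 ignores nothing, option 5 skips hidden rows."""
    if option not in (4, 5):
        raise ValueError("this sketch only models options 4 and 5")
    hidden = hidden or [False] * len(values)
    if option == 5:
        # Drop every value whose row is marked as hidden (filtered out).
        values = [v for v, h in zip(values, hidden) if not h]
    return FUNCTIONS[function_num](values)

salaries = [80000, 94000, 150000]  # hypothetical salary column
hidden = [False, False, True]      # pretend the third row is filtered out

print(aggregate(12, 4, salaries))          # median of all rows
print(aggregate(12, 5, salaries, hidden))  # median of visible rows only
print(aggregate(4, 5, salaries, hidden))   # max of visible rows only
```

The last call is roughly the idea behind the 104 used with subtotal earlier: a max that leaves hidden rows out.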
now depending how fast your computer is
you're going to run into some
limitations here I have the table
limits original tab which is the next
one we're going to be working with in
this portion of the lesson it has
around well 32,000 rows of the data
set anyway we're going to run into some
limitations as I'm going to show I'm
going to encourage you to just watch
me do this and then from there
basically decide if you think you have a
strong enough computer or not to
continue on and do this
but if you have a basically
slow computer I wouldn't necessarily
follow along with this anyway I'm going
to convert this into a table by selecting
any portion in here and pressing control T it
selected all the different values and
my table has headers so now we've converted
this into a table and one of the
benefits we haven't really discussed yet
is the ability to actually filter data
because it automatically provides this
filter up at the top now I'm going to go
ahead and filter this down based on a
data analyst job title and when I go
through and actually
select just data analyst and press okay it
runs pretty quickly but I have run into
problems in the past especially working
with smaller computers where it takes a
while to do this I'm working with about
24 GB of RAM on this virtual machine so
if you're at something like 8 or even 4
I'm going to highly recommend that you
not perform this exercise
so moving to this last exercise of this
lesson I've gone ahead and condensed
down this data set you can go into
histogram original and from our previous data
set I basically shortened it down to these
four columns and limited it to only
positions that have a salary year
average value listed basically if
there are blanks I removed those rows so
it's about 20,000 rows anyway this is
what we're going to be manipulating for
this and this shouldn't lock up your
computer if you have basically a
computer with less RAM and we're going
to convert this into a table first
pressing control T it selects all the
values on here and I press okay so now we
have a table now also in this sheet you
may have noticed hopefully since it's
been on the screen I have this histogram
here which is basically aggregating the
data from this data column on salary
year average anyway we're going to be
manipulating this further we want to
basically make this into a dashboard so
we can go through and maybe filter for
different job title different job
schedule types or different job
countries and it can be mildly
inconvenient to come up here and
actually select this arrow and then go
through and select the values you want
that's why slicers are great so with our
table selected I'm going to go into
table design and then from there under
tools I'm going to go to insert slicer
we're going to be entering in a job
title short a job schedule type and a job
country slicer so all three are here now
I'm going to go ahead and position them
and make them look a lot neater all right
got them cleaned up and then from there
I can go ahead and actually select the
slicer and if you notice this slicer
tab pops up conveniently labeled this
slicer has a caption on it or a title as
well and I can just rename it basically
to a more visually appealing title in
this case I want to call it job title
and then it updates here to job title
I'm going to do the same for the other
two updating them to schedule type and
then also country now by default this
slicer and all the slicers have all the
values selected so if I wanted to go
in and actually select a value I could do
something like well we want to look at
data analyst I just select data analyst
it's going to clear all those other ones
and then only select data analyst as you
notice it took a second for it to
actually load that's why with these
20,000 rows of data even that's a little
high for tables I recommend around
10,000 if you're using tables anyway we
have it filtered down to data analyst I
could also filter it down to
full-time along with filtering it for
basically
I want to do United States if you
notice these values are grayed out that
means there's no data for that country
available with the current selections
that I have of data analyst and full-time
so that's what that means there but I
can go in for United States
selecting it and bam we now have our
final basically visualization but what
happens if I want to maybe look at
multiple different values what if I
maybe want to look at both data analyst
and business analyst well in that case
you want to select this box up here
which allows multi-select so I enable
it and now I can go through and select
something like business analyst and this
provides both those values along with that if I
wanted to look at full-time and also
part-time I could enable the multi-select
on this schedule type and select
part-time and bam now we have multiple
values selected for this along with the
United States and this makes the
dashboards that you're building a lot
more interactive and a little bit fun to
play around with and to visualize the
different data all right we have some
practice problems for you now to go
through and dive into not only creating
tables manipulating them but also adding
and playing with slicers as well with
that we'll see you in the next lesson
we're going to be jumping into
formatting specifically conditional
formatting so see you
there in this lesson we're going to be
focusing on formatting and not just cell
formatting where we're going through and
adding borders and colors but also
conditional formatting where a cell's
formatting or highlighting will
update dynamically based on a value in
the first example we're going to focus
on cell formatting specifically we're
going to go back to that table that
we've worked with previously that does a
count of data science jobs over the
month anyway we're going to go through
and actually format it using all the
different functions we can in order to
make it look pretty like I made it from
there we're going to move into our first
conditional formatting example where
we're going to look at basically
highlighting based on a job title those
that are basically high and those that
are low highlighting them appropriately
green or red and then in our final
example we're going to move on from
using color scales to also using things
like data bars and also icon sets to
make it look a lot more dynamic we're
also going to go over best practices on
what not to do cuz sometimes you can go
overboard in how much you're actually
coloring a table and you can make it a
little distracting and ultimately
not meet your goal for this lesson we'll
be working with our formatting workbook
in chapter 4 as usual all the data is
located in the data tab and we'll
be starting with the underscore original
of each of these sheets and then
what we'll get to in this case for format
original is what it looks like in format
final for the cell formatting we're
going to be using this format original
sheet and we're going to be focused on
this Home tab here so I'm actually going
to leave it expanded and for this we're
going to make this to where well what
this table looks like by going through
and actually formatting using all the
different features in here so the first
thing we need to do is highlight it all
and actually remove the formatting so
with it all selected I can go to editing
and then clear and I can either clear
all which is what I don't want to do I
want to do clear format and Bam now we
have an ugly table that doesn't really
make a lot of sense now previously we
were mess with tables so I could
highlight from B3 to 010 and make this
into a table by coming up here to format
as table basically selecting the color
that I want saying that it has headers
and allowing it to update there's
definitely an option um but I'm not
necessarily a fan of this so I'm going
to clear this by pressing contrl Z Now
an underused feature of formatting is
this cell Styles tab right here so I'm
going to go ahead and select the months
up here basically the titles and for
cell Styles they actually have a lot of
pretty unique formatting you can see
happening in the background so I'm going
to try out in this case I'm going to try
out heading two which is pretty neat
because it makes it bold slightly bigger
and it puts a little line underneath it
I could also do something where I
highlight all the rows over here and
then make this into maybe heading three
and then all these values in here are
calculations so technically I could just
highlight this all and for the cell
styles I could come up to the top here
and select hey this is a calculation and
this is not a bad looking table but not
necessarily all I want to do so I'm
going to just remove this all instead
I'm going to start with my months I'm
going to make them bold and also add a
light gray background I'm going to do
the same thing over here for the values
in my rows and then from here we're
going to get the actual grid
lines put in I'm going to only select C3
all the way down to O10 I'm going to
show you why and I'm going to add an all
borders so this is neat it adds all
borders to it what I'm also going to add
which will add a little bit of
flair to it is a thick outside border so
now we've got a thick outside border around
all of this and I'm going to do the same
with this one of an all borders and then
a thick outside border now it did remove
that thick outside border that I had on
this line between B and C so I'm
actually going to go ahead and put that
back in by just clicking it next thing I
want to do is format these with a comma
so I'm going to come up here and well
add a comma and then unfortunately it
adds this space in here and makes this
table bigger than what you can see now
I'm going to first remove the decimal
places and then in order to fix
this I'm going to highlight all the
different columns through here to
January and just double click on one of
them to make them slightly smaller
anyway it's still not fitting completely
on here and I want this to fit within
the view here so I'm just going to
select this all and I'm actually going
to make these values slightly smaller
and I'm not liking the positioning of
these it looks like it's lower now that
I made this smaller so I'm going to
actually center this and do a middle
align to basically move it up slightly all
right my OCD is now looking good all
right now this is looking good now the
last thing we want to do is add a title
to this basically describe what is this
table that we're looking at and I want
to insert this in up on the top row but
I basically want it centered over this
table so what I can do is highlight from
B1 to O1 and from there select up here for
merge and also center because
I want my text centered in this
and from there I put in hey this is the
data science job count tracker and for
the cell style I'll make this heading
one now let's get into conditionally
formatting this table and specifically I
want to say if I'm looking at data
analyst I want to be able to look across
here and see which ones are the highs
and the lows right now I have these
sparklines and I can see that based on the
green and red for the highs and lows but
I want to actually be able to see this
in this table right here and so
underneath the home tab we have this
conditional formatting available we're
going to focus on these three right here
first is data bars and you can
see if I put it in it's basically
looking like a you know like a bar chart
color scales allows us to do well
different color formatting with it and
then an icon set basically allows us to
put in a nice looking icon and we're
going to stick simple for now we're
going to do color scales right now I
have C4 through N4 selected I'm going to
go ahead and select this green to Red
which is not bad if we're looking this
right this is doing exactly what I want
I want August which is the highest to be
highlighted green to attract my eyes to
it and then I want the red to be
November and December cuz at least I want
to attract attention to them but we want
to highlight the entire table here so if
I were to actually select the entire
table if you will from C4 all the way
down to n10 go into conditional
formatting color scales and do the same
thing you're going to notice it
basically does these bands but it does
this entire
table all formatted together and this is
not what we necessarily want of course
the total row is going to be the
highest I want to look through each row
and actually see where I should be
actually looking so anytime we need to
clear or mess with any rules we come into
conditional formatting and go to clear
rules you have clear from selected cells
or entire sheet we're just going to do
the entire sheet then we're going to go
back to where we were before of
selecting just the data analyst values
going into conditional formatting color
scales and I'm going to go to this green
white red I actually want to try to
limit how many colors I use two is
enough so I'm going to go green white
red and I really like this one
better now I don't need to necessarily
go through once again of selecting
senior data analyst doing this again
what I would do instead is I'm going to
select data analyst here and then come
into this home menu up here and you
notice this paintbrush this is a format
painter in the instructions it basically
says select the content with with the
format you like click format painter and
then select something else to
automatically apply the formatting so
from here I can just paint my formatting
on unfortunately this doesn't have a
shortcut so I have to go back up
every single time it removes the
marching ants reselect the format
painter and go through and select it but
now we have this formatted how I want it
where I can look at a certain Row in
this case I look at data analyst see
what some of the highest are and Senior
data Engineers I can see how they
contrast to the other job titles
additionally which we're going to jump into a
little bit more later we can go into
manage rules and we can see the current
conditional formatting applied
right now I have show formatting
rules for current selection I've selected
the top cell right up here so there's no
conditional formatting if I were to
change this to just this worksheet I can
then if I expand this down I can see how
this applies this type of
formatting of the red white green
applies to each of the different cells
and if I needed to actually control what
cells are actually selected I could do
that I could have also gone through
instead of doing that copy formatting and
pasting I could have done a duplicate rule
and modified it as well but I
decided to do it my way instead anyway this
is where you need to go anytime you
need to manage conditional formatting we
click
okay let's crank this up a notch and get
into using some more advanced
functionality with conditional
formatting here we have a new table you
haven't seen before basically it has all
the different job titles the counts of
those jobs aggregated from our data
sheet the median salary what is their
work from home percentage or likelihood
based on the jobs and then finally I
have this job rank right here which
basically uses these cells that are
hidden right here that if we actually
expand it out goes through and
normalizes the values so in this case
the job count normalize it between zero
and one so this job count of 90 is the
highest so it gets a value of one whereas
the lowest gets a value of zero
anyway I did this for all the different
values and then from there provide a
certain weighting factor of like 0453 and
0.15 in order to weight it appropriately
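The normalize-then-weight idea behind the job rank column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample numbers and the weights (0.5, 0.3, 0.2) are made up for the example and are not the values used in the workbook, and the function names are hypothetical.

```python
# Sketch of the hidden helper columns: min-max normalize each metric,
# then combine the normalized values with weights (weights are made up).

def min_max(values):
    """Scale values to the 0..1 range; the largest maps to 1, the smallest to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: nothing to rank, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def job_rank(counts, salaries, wfh_shares, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three normalized metrics, one score per job title."""
    columns = [min_max(counts), min_max(salaries), min_max(wfh_shares)]
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(counts))
    ]


# Hypothetical sample values, not taken from the course data set.
counts = [90, 60, 10]
salaries = [90000, 130000, 110000]
wfh_shares = [0.10, 0.20, 0.05]
scores = job_rank(counts, salaries, wfh_shares)
```

Changing the weights shifts which metric dominates the final score, which is why the ranking reflects the author's own priorities.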
this is all my bias and how I wanted to
actually do it so feel free to adjust it
to what you want anyway we have this
final job rank in order to assess based
on these three values and this is
commonly done especially in like kpis
and stuff like that so we're going to be
making like icons for this column so
let's get into formatting our first
column we're going to do job count first
and for this one I want to have data bar
so I'm going to come down into conditional
formatting into data bars and we'll add
these data bars right here I like the
bars in this case because we're dealing
with a count and we can really see
especially data analysts scientist
Engineers they really make up the
majority of the data here so it really
draws your attention to it next up is a
median salary we're going to do similar
to last time maintain a color scale
we're just going to do this first one
right here where green is the highest
salary and red is the lowest and then
one more we're going to do that work from
home we're also going to do it in a
color scale but for this one let's
actually do a different color go into
more rules and in this case we have this
new formatting rule window right here I
have two colors just say I want to do
one color I'm going to do white from the
lowest value and then we'll do like
purple for the highest value anyway this
is all basically to show a point this is
becoming
entirely too visually
distracting if you were to
give this to somebody else or a
stakeholder where are they supposed to
look and actually organize their
thoughts on where they should
potentially pursue a job right now I'm
thoroughly confused at looking at this
so let's clean this up a bit and for
this I want to make it to where I like
maintaining a solid color across so
that way you know like hey if this color
is darker or there's more of this color
I should be looking there so in this
case we'll make this job count we're
going to just clean it up
slightly for the data bars we're going to
make this like gradient appearance cuz
then I feel we can see the numbers
better and it's not too visually
distracting for the median salary
really my goal of this is to find jobs
that are let's say greater than 100,000
so let's actually just make a rule
that highlights those jobs that are
greater than this value in this case I'm
going to come to conditional formatting
and enter a new rule this new formatting
rule popup comes up once again
and we have a select a rule type this
allows us to do things like format all
cells based on the value format only top
or bottom rank values format only values
that are above or below average I
personally like this one of use a
formula to determine which cells to
format and in this case I want to say
I'm going to click this formula box
right here what I want to look at you can
just select the first item in the range
selected so I'll select D3 it's going to
go through and actually do all of these
don't worry we'll see and for that we
want to highlight those that are greater
than
100,000 and press enter and then right
now it doesn't have any format set so
I'm going to change this to format and
we can control a whole host of things
such as the fill border font and the
number formatting itself but we're going
to stick with that blue theme I'm going
to just come down in here and I'm going
just select this blue color right here
and click okay and then okay again now
you notice my formatting is not
appearing that's because we have
multiple formatting applied to a cell
which you can do so in order to fix this
we need to come into manage rules and as
we see we have both of these applied to
it so I actually need to select this one
and I need to delete this Rule and click
apply and then okay now we're running
into our second issue and I slightly
misled you earlier when I said that D3
works if we go back into manage
rules and we see our formula right here
I'm going to double click it we don't
actually want an absolute
reference of $D$3 because then it's
going to evaluate all those cells based
on D3 instead we want it to be D3
without the dollar signs so it's not an
absolute reference and therefore
whenever I click okay and okay again bam
now it knows appropriately to check the
actual cell that it's looking at within
the range on whether to highlight it or
not moving on to the work from home
we're going to keep this similar but it's
not going to be purple though we're
going to change this to Blue instead so
going into manage rules we have the
actual color right here selected I'm
going to just go in and change this to
this color that we used previously and
click okay and then okay as well so that
way it applies it all right the last
thing is the job rank itself and
for this we're going to be using icon
sets
specifically I like this one over here
on ratings but this becomes a little bit
overwhelming where we have this
rating and also the number next to it so
we can actually remove this number in
the column we go back into manage rules
we can double click on that icon set
Rule and we can even further customize
when these stars are appearing but I'm
going to just go ahead and get to this
portion where it says show icon only
this allows us to only show the icon so
going into applying this bam it's now
showing the icon I want that icon
centered both vertically and also
horizontally so bam now whenever I look
at this I can see especially since it's
all one color my eyes really gravitate
to well data scientists and data
Engineers based on this full star rating
and more of the blue being in this
region and that's what I would hope
people would go to or gravitate to as
well when they're looking at it one
quick note in this conditional
formatting we didn't cover this highlight
cells rules where you highlight greater
than or less than or you do a top
bottom rule where you can highlight the
top 10% or top 10 you can also adjust
that number anyway I find myself
more using custom rules instead by
coming in here into new rule and then
actually fine-tuning what I want to do
so with the practice problems I'd really
dive into actually relying on using
these types of options instead and so as
I subtly hinted you have some practice
problems now to go through and really
practice how to do formatting and also
more specifically conditional formatting
in the next lesson we're going to be
moving into collaboration and covering how
to actually protect your workbooks and
your worksheets so that way whenever you
share these with co-workers or friends
they don't go through and actually mess
them up all right with that I'll see you
in the next
one welcome to this last lesson in
spreadsheets advanced before we jump into
our project and this lesson itself is on
collaboration which sounds sort of
cheesy but in order to demonstrate what
we're actually going to be learning in
this lesson we need to actually
fast forward a little bit and jump into
our project so I'm going to open up the
salary dashboard which is located under
project One dashboard so here's the
dashboard that we're going to build in
it they have three boxes that you can go
through and select this is going to be
using data validation which we're going
to be learning about in this lesson but
it allows you to basically standardize
the inputs that we want somebody to
actually select in in order to get the
results and it prevents them from
putting in values that maybe don't exist
and then breaking our dashboard so for
each of these job titles country and
types we have an Associated
visualization for each showing the
salary by job title the salary by region
and then also salary by job type finally
at the bottom I have some I call them
kpi cards basically outlining certain
characteristics or certain indications
of the median salary what is the top job
platform and then what is the count
of jobs but I can come in here and
select something like maybe I wanted to
look at business analyst and it's going
to filter Down based on this telling me
what their median salary is that
LinkedIn is probably the best place to
go to for this what are the different
types of roles available and what's
available in the job database so the
other feature we're going to be going
through besides this data validation
process that we can do right here is
actually protecting your sheets which
you can find this here underneath review
under protect but anyway if you try to
move these cells around you're not able
to at all so we're going to be able to
design this dashboard in a way that
other co-workers won't be able to
destroy it additionally if you notice
down here at the bottom there's only one
sheet in here there's actually other
Sheets if I go to unhide here there's
other sheets I'll just unhide one of
them we'll just unhide data there's
other sheets inside of here but if
they're not applicable to my co-workers
or stakeholders I don't need to have
them so I can hide them so that's
another feature we're going to go over in
this all right nothing be yaen let's
actually get into this lesson for this
we're going to be using the
collaboration workbook in chapter 4 now
we're going to be building out these
three sheets as we go along and as a
sneak peek in this first example we're
going to be building out this little
portion right here this is going to be
basically preparing us for our project
so a lot of this work is going to be put
to good use anyway we're going to be
building the simple one right here I'm
going to zoom in where we have based on the
job title we can go through and select
it so senior data engineer it's going to pop
up with our median salary so that's what
we're going to be building with this and
specifically we're going to be using
this feature of data validation so I'm
going to create a new sheet to start
with because I don't want to start with
the answer right there I'm going to just
call it calculator I'm going to put in
job title here and then median salary
below I'm also going to bold these by
pressing control B and then next
to it in column C is where we're
actually going to put the actual control
of this now we need to get a list of job
titles to put in this so I'm going to
create a new sheet and call it
validation and basically what I'm going to
do with this is create a sheet of all of
the different job titles available
specifically I'm going to say this is
going to be from the column job title
short and we're going to be using in
order to get the unique values of it
well the unique function we need to
provide it an array so I'm going to come
back over here down to A2 use
control shift down to select all the way down
close the parenthesis press enter okay
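As a rough Python sketch of this step, the snippet below pulls the distinct job titles from a column and also shows one way to order them by how often they occur, which is the ordering this list is given next. The sample titles and the function names are hypothetical, and this is an illustration rather than how Excel computes it internally.

```python
from collections import Counter


def unique_in_order(values):
    """Distinct values in first-seen order."""
    return list(dict.fromkeys(values))


def unique_by_frequency(values):
    """Distinct values, most frequent first; ties keep first-seen order."""
    counts = Counter(values)
    # sorted() is stable, so equal counts stay in first-seen order.
    return sorted(unique_in_order(values), key=lambda v: -counts[v])


# Hypothetical sample column, not the course data.
titles = [
    "Cloud Engineer",
    "Data Analyst",
    "Data Engineer",
    "Data Analyst",
    "Data Engineer",
    "Data Analyst",
]
distinct = unique_in_order(titles)
ordered = unique_by_frequency(titles)
```

Ordering by frequency puts the titles a user is most likely to pick at the top of the drop-down.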
so now we have all of our different
values I'm going to expand this out I'm
also going to zoom in a little bit now
whenever I do this drop-down menu I
want it in some sort of order
specifically I want it in probably what
is the highest count value I want it
appearing at the top and those that are
less likely down at the bottom so what
I'm going to do is actually just copy
this value right here because this is
what we actually want to use what we
want to do is a count ifs we want to
count based on a condition for the
criteria range we're going to be
providing that job title short column
from our table and then for the criteria
we're going to be selecting right next
to it A2 bam there we'll just autofill it
all the way down and then finally we
want to now sort it by this so I'll use
job title short sorted from there we'll
use the sort function to then sort this
by the second column position in
descending order so bam this is more
like I want I want those data analysts
data scientists engineers at the top and
the senior roles and so on cloud
engineers are below so we now have this
list available that we want to use for
data validation speaking of I'm going to
go back to the calculator tab that I
made and for this we're going to go to
the data tab specifically under data
tools they have this selection
available where data validation
actually is and now this is going to
allow us to well customize it right now
the data validation for this cell is any
value I can place any value into it I
could limit it to a whole number I could
limit it to decimals a list a date a
time a bunch of things we're going to
limit it to basically a list of values
and we need to basically provide a
source for this so for the source we're
going to go in and select the validation
tab that we just made and I'm going to
select all the different jobs right here
and then press enter from here I'm going
to accept this and press okay now as you
can see we have this little drop down
right next to it and I have different
selections actually available of data
engineer if I were to go into here
because I have this set to data
validation if I was going to put in
something like data nerd which isn't
available and press enter it says this
value doesn't match the data validation
restrictions defined for this cell
therefore I have to go in and retry so
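The behavior of a list-type validation rule can be sketched as a simple membership check. This is a minimal Python illustration, assuming exact matching against the allowed list; the list contents and the function name are hypothetical.

```python
# Hypothetical allowed list standing in for the validation sheet's range.
ALLOWED_TITLES = ["Data Analyst", "Data Engineer", "Data Scientist"]


def validate_choice(value, allowed):
    """Return the value if it is in the allowed list, otherwise raise ValueError."""
    if value not in allowed:
        raise ValueError(f"{value!r} does not match the allowed list")
    return value


accepted = validate_choice("Data Engineer", ALLOWED_TITLES)

try:
    validate_choice("Data Nerd", ALLOWED_TITLES)
    rejected = False
except ValueError:
    # Mirrors the retry prompt: the entry is refused and nothing is stored.
    rejected = True
```

Rejecting the entry outright, rather than storing it and failing later, is what keeps downstream lookups from breaking.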
only values within there are going to
be able to work in this so now let's
actually get into calculating that
median salary and for this we're going
to create a new sheet similar to this
median salary sheet we're going to call
this one salary wrong spot need to
actually enter it down here and call
this one salary throw this all the way
over first I need the names of job title
short and all that kind of good stuff so
what I'll do is I'll come over to our
validation Tab and I've selected equal
to already I'm going to select these
cells right here press enter so now
they're all appearing here now I'm going
to calculate the median salary for all
these jobs I know our calculator or
dashboard has only one value that it's
calculating at a time but in our dashboard
we're going to build we're actually
going to build a graph with all these
median salaries so we just need to
calculate them now all the median
salaries and then basically calculate
using data validation and also an X look
up what the median salary is going to be
here so for this we're going to be using
the median function and specifically
we're going to be using that if inside
of it because median if isn't available
we first want to check does the job
title here of data analyst meet our
condition of the job title short so I'm
going to type in the table itself of
jobs and then the column of job title
short close bracket and set an equal
sign equal to A2 then I'm going to close
the parentheses on this and actually we
need to wrap all this in parentheses
because we have to do multiple different
conditions we're going to do some array
multiplication the other thing we have
to check is that the values are not
blank or not equal to zero so once again
I'll put in jobs again and we're going
to be using that salary year average
column and we want to make sure that it
doesn't equal to zero and so that's the
condition we're checking for and so now
what do we want to return if true well
we want to return the salary so we'll do
jobs and then salary year average I'll
then close the brackets on that then we
need to close one parenthesis I can see
a red parenthesis still and then a final
black parenthesis now I'm good press
enter looks like I got it right on the
first try let's actually drag this down
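The logic of that formula, a median taken only over rows that match the job title and have a non-zero salary, can be sketched in Python as follows. The rows and the key names `job_title_short` and `salary_year_avg` are assumptions made for the illustration, and the helper name is hypothetical.

```python
from statistics import median

# Hypothetical rows standing in for the jobs table; the key names are assumed.
rows = [
    {"job_title_short": "Data Analyst", "salary_year_avg": 80000},
    {"job_title_short": "Data Analyst", "salary_year_avg": 0},
    {"job_title_short": "Data Analyst", "salary_year_avg": 100000},
    {"job_title_short": "Data Analyst", "salary_year_avg": None},
    {"job_title_short": "Data Analyst", "salary_year_avg": 90000},
    {"job_title_short": "Data Engineer", "salary_year_avg": 125000},
]


def median_salary(rows, title):
    """Median of non-blank, non-zero salaries for rows whose title matches."""
    salaries = [
        r["salary_year_avg"]
        for r in rows
        # Both conditions must hold, like multiplying two true/false arrays.
        if r["job_title_short"] == title and r["salary_year_avg"]
    ]
    return median(salaries) if salaries else None


# One median per title, like filling the formula down the column.
median_by_title = {
    t: median_salary(rows, t) for t in ("Data Analyst", "Data Engineer")
}
```

Dropping the zero and blank salaries first matters because they would otherwise drag the median down for titles where most postings list no salary.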
boom this is pretty nice so now we have
all the median salaries for these
different job titles I'm also going to
take this a step further of actually
sorting this by the median salary because I
know I'm going to be actually
visualizing this in the Project's lesson
so we'll go ahead and sort this as well
sorting it on the second index in
descending order so now we need to
provide the value in this case data
engineer is selected we need to
provide based on this value the median
salary and I want to just calculate it
over here just in case I need to go back
to it so for this I want basically
125,000 to appear right here in G2 so I'm
going to provide an X lookup and the
first thing is this lookup value right
we're going to look up the data engineer
in this now I'm not going to use a cell
reference of going over here of
selecting this cell of data engineer
which is calculator C2 I'm actually
going to escape out of this we're going
to stop this right here I want to go
back to this instead because
I'm going to be referencing these cells
specifically well this one right here
a lot I'm going to just rename this from
C2 to title so right now I can see that
it is named title so going back over to
that salary tab again now we can perform
our X lookup and for the lookup value
we're trying to look up the title for
the lookup array we're looking up
through this job titles right here and
then for a return array the actual
salary values so now we're getting that
data engineer value of
125,000 similarly I also want to name
this cell as well I'm going to name this
one median salary pressing enter boom
locks it in so now when I come back over
to my calculator tab I can just put in
here equal to median salary I'm also
going to go through and format this to
make this look
better so just playing around with this
I can see that I can put in something
like senior data analyst and then
the associated median salary is going
to come up with it but let's say now I
want to give this to a coworker right
how can I prevent them from going in and
potentially you know entering in this
cell and then breaking it well we can
come up here to review and in this case
we're going to select this of protect
sheet now the first thing you can do you
can set a password to unprotect sheet
I'm not going to put a password but say
you wanted to put one you could and then
we have these options for what you
can actually protect whether that's
select locked cells or select unlocked cells
to protect we're just going to leave
both of these checked for the time being
click okay and now well one we can see
that underneath protect here it now says
instead of protect sheet it says
unprotect sheet whenever I go through
this and say I want to change it any
value whatsoever I can't change it so
it's good because the numbers can't
change or the median title can't change
but now I can't change the job title which
is a little bit of a pain so
unfortunately Excel doesn't necessarily
make this the easiest I'm going to start
over again and just click unprotect
sheet and what we want to do is we're
going to select all the cells in here so
with all the cells selected I'm going to
press control and unselect C2 then right
clicking it I'm going to go into format
cells now under this protection tab
right here we're going to notice we have
options for locked and hidden we want to
actually be able to lock all the cells
except for C2 we don't want to hide any
so we're not going to adjust that right
now but now we're going to have the
ability to adjust whether it's locked or
not this doesn't actually change
anything right now so if I go into here
yes I locked those certain cells but if
I were to type into here it's still
going to allow it to be changed so now
what I can do is go into protect sheet
and previously we had both of these
selected of select locked cells and select
unlocked cells and in this case because we
locked all the cells except for C2 we
only want to allow people to select the
unlocked cell of C2 so I'm going to
uncheck this click okay and now I can't
click anywhere else except for where
I've set up that data validation in this
cell and I can still change it and it
will manipulate the value now we could
also go through and protect the workbook
itself I don't necessarily mess
with this as much instead what I
would want to do in this case is
actually hide all these other sheets
with the exception of this calculator
and so I can do this by right clicking a
tab and selecting hide so I'm going to
go through and actually hide all of them
so now we have everything as shown by
this tab down here of calculator we have
every tab hidden except for that and if
I wanted it to
reappear or get a sheet to reappear I
would just right click it click unhide
and then it's going to allow me to
select which option I can unhide and
if I do want to make it to where a user
can't go in and necessarily unhide
sheets well I can go in here and select
protect workbook once again I can enter
a password if I wanted to I'm going to
just set this up but now when I come
down here to rightclick it there's no
option to hide or unhide a sheet so the
entire workbook is now protected so I'm
not going to lie that was definitely an
advanced intro into Data validation and
also protecting your workbooks but I
promise it's going to just come into
great use for whenever we're building
this project which we'll get to next
now we do have some practice problems
for you to go through and just test out all
these different features and with that
we'll be jumping into the next lesson and
actually building this data science
salary dashboard with that I'll see you
in that
one all right let's now dive in and
build our first project with Excel which
is this data science salary dashboard
this project is going to combine
everything that we've used and learned
up to this point from formulas and
functions to charts and then even to
data validation we're going to start
first by looking at the dashboard itself
you can just go to the project One
dashboard folder and Open salary
dashboard workbook now in this right now
you're only going to see one sheet and
as you try to click around you're not
going to be able to do anything so as a
refresher if you want to actually dive
in and see what's going on behind the
scenes you'll need to first if you want
to actually touch any of these points
actually go into the review tab and
click unprotect sheet then you'll be
able to investigate how I name certain
cells and whatnot additionally if you
want to investigate any of the sheets
that I worked on you'll need to go into
unhide and select the appropriate
sheet that you want to well unhide so
for this we're going to be building it
out section by section specifically
we're going to start up at the top
building these data validation drop-down
menus then from from there we'll go into
building the different graphs associated
with it and then finally we'll end up
with these kpi cards now powering each
one of these major topics I've built
individual sheets so for things like jobs
I have all the jobs along with any key
information to then build the
visualizations in it so here is the
basically the table that I made in order
to show the graphic right here similarly
for Country I have all the different
countries and then their associated
median salaries and I use that to not
only make the drop down but also make
the graph same thing for type and then
finally for platform anyway that's just
a quick overview to make sure that
you're familiar with how we're
going to be working through this but
let's actually dive into
it for this I recommend picking up where
we left off in the last lesson on
collaboration we did a lot of work for that
so we're going to use this workbook
first thing I'm going to do once this is
open I'm going to go in and actually
save it as this final dashboard and I
recommend that during this you're saving
this pretty frequently so we don't lose
progress first thing I'm going to do is
start moving this around I basically
know where I want to get these different
titles of these drop downs and then
where I want to put the drop downs we're
not going to be using median salary for
a little bit so I'm just going to take
that control X it and place it down at
the bottom then take the job title put
it in C3 and then move the data
validation to right below that we'll fix
all the formatting when we get to it later
on okay so we have the job title now
the next thing we need to jump into is
country and we'll be putting that right
under this portion right here for this
I'm going to create a new sheet and call
this country with all these sheets I
want to have them pretty much similar to
what the title is above it so in this
case here where we had median salary
it's actually the titles we
named it salary in the previous one so
let's go ahead and just name this title
anyway going back to that country tab
that's where similar to the title tab if
you see we first grab the names of the
job titles from there and then calculate
the median salaries for each we're going
to be doing something similar in the
country tab with first putting in the
country names and then from there
putting in that median salary but I want
to keep a similar format as in this
title case remember we actually pulled
this from the data validation tab which
we're pulling here so I want to keep
this consistent anytime we're creating
anything for those drop downs we're
going to make it here in this data
validation tab so I'm going to create a
column here called job country and then
in this I want to get the unique values
from our data set specifically that jobs
table it's still named that jobs table
and of that column job country go ahead
and close the brackets and then close
parentheses and now we have all of these
different countries not sure why but
this is bolded I'm going to go ahead and
remove that anyway I want this in a
sorted format I'm not going to
necessarily sort it by count like we
did here with the job titles I'm just
going to sort it in alphabetical order
so I'm going to use the sort function
and I'm just going to identify that we
wanted to use
G2 hashtag and Bam now we have all of
this I'll also name this appropriately job
country sorted so now we have our list
we can go back into here and actually
put in the country for the data
validation portion we do that by going
to the data tab selecting data
validation and the values we want to
provide a list to this and for the
source we go back to that data
validation tab close this out and we
basically want to select all these
values here so I'll just do control
shift down pressing enter we now have
everything all the criteria for this I'm
going to go and click okay and I get
this error message and there's a problem
with this formula for some reason I
guess when I moved back it added this
extra sheet in here I'm not too sure
this extra data I can't even select in
here anyway just make sure it's only one
sheet there it's going to work fine
country is now in here I can select something
like Argentina next value that we're
going to be looking at is the job type
so part-time full-time whatnot with this
although we're not going to use it yet
I'm going to create a new sheet and call
it type and also move that to the end
but now we want to get the unique values
of job schedule type so I'll put in the
column here of job schedule type and
then from there we want to get the once
again unique values for this we're using
the jobs table specifically that job
schedule type column and bam now you
will notice from this one
this needs some data clean
up there's a lot of values in
here like sometimes it has combined
values like full-time part-time and
internship and whatnot I'm
actually going to expand this column out
we really just want the single values
from this so something like full-time
contractor part-time internship and then
also temp work so the first thing I'm
noticing about the ones we want to
remove is that they contain the word and
so we'll first identify those that
contain and we do this using the search
function which is a text function to
find text specifically we're looking for
that keyword of and for within text we want
to just look through the whole array so
we'll put in J2
hashtag and I got a little error message
I need to make sure I use double quotes
for the text itself and running this now
I have basically number values for where
the and is located at and it looks like
yeah it looks like we're good on
everything with the exception of the
zero which we'll get in a little bit
okay so we need to convert this into
basically Boolean values because we're
going to end up using this to
pull out what we want using a filter
function so we're going to wrap this in
the is number and we're going to get
false or true and whatnot anyway all
right so now we have false or true the
last thing we need to do is use well not
the last thing second last thing we're
going to use the filter function and in
this we provide it the array so in this
case it's going to be J2 hashtag and
then for what we want to include is this
other array that we just did so I'm
going go ahead and close this and see
what we get returned back and we're
returning now only the values that have
and in it we actually want to do the
opposite of that right we want the
values that don't have an and so in
order to do that we're going to fix this
entire statement right here for the
include portion we're going to wrap it
in a giant not to turn everything
around add an extra parenthesis on the
end bam now we have full-time contractor
part time we got the zero in there
internship and temp work we just need to
remove this zero out of it so we just
need to modify once again this right
here this portion of this include we're
going to do some array multiplication
basically once again looking through and
making sure no values equal to zero so
I'm going to do a multiplication do an
opening closing parenthesis and
basically we're just checking whether J2
hashtag is not equal to zero let's go
ahead and enter this boom now we have it
down to the values that we want for this
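The combined filter just described, dropping any entry that contains the word "and" and any entry equal to zero, can be sketched in Python like this. The sample values and the function name are hypothetical; the case-insensitive substring check is meant to mirror how the search function matches text.

```python
def single_schedule_types(values):
    """Keep values that are not blank or zero and do not contain 'and'."""
    kept = []
    for v in values:
        if v in (0, "", None):
            # Blank cells show up as zero in the spilled list.
            continue
        if "and" in str(v).lower():
            # Substring match, so a value like "Standard" would also be dropped.
            continue
        kept.append(v)
    return kept


# Hypothetical sample of distinct schedule types, not the course data.
schedule_types = [
    "Full-time",
    "Full-time and Part-time",
    "Contractor",
    0,
    "Part-time",
    "Internship",
    "Full-time, Part-time, and Internship",
    "Temp work",
]
single_types = single_schedule_types(schedule_types)
```

Because the match is on a substring rather than a whole word, it is worth scanning the result for single values that were dropped by accident.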
I'm going to name this appropriately job
schedule type sorted also for some
reason this is in this column we're
going to move it over looks like we're
buing one spacing anyway now we need to
go back to our basic calculator Tab and
we need to enter data validation in this
portion to make sure we can select the
right type so going to select data
validation once again allow values of
list and then for the actual Source
itself we'll go to that data validation
tab select all these values in here
press enter and enter okay so now we
have the type in here so all of our data
validation portions are now
built next thing up is moving into
building the three different charts here
we're actually going to start with the
country chart because it's the easiest
and a sneak peek of what data is
actually needed for this I can go to the
country tab inside my final salary
dashboard and all we really need to do
is for each country calculate the median
salary and then throw it into a map
graph so back to our Excel worksheet
first thing we need to do is get those
list of countries and remember we
already have that so I'm going to put an equal sign
it's inside of our data validation here
with these sorted values I want all
these values here so I'm going to
do H2 hashtag press enter we have
them all so let's actually start
developing the formula for building this
out using only we're just going to
calculate first the median salary for
that country and then also remember in
the past we've had to filter out any
values that basically equal zero so for
that if condition for The Logical test
we're going to do we're going to have to
do array multiplication and for our
first array we're going to be checking
for the job country right so we do that
jobs table and specifically that job
country column and we want to make sure
that it's equal to basically A2 in this
case the country right next to it
additionally we want to check that
there are nonzero values and so we're going
to be checking the salary year average
column and making sure that it's not
equal to zero so now moving on to the
value if true we basically want to use
the salary year average column value
false not applicable here go ahead and
close this looks like we have a typo it
went ahead and added that extra
parenthesis and we have a median salary
now and go ahead and copy that all the
way down now this is great but remember
if I go back here to the basic
calculator tab we also want to not only
filter for a specific country but also
we're going to need to filter for a job
title and also for a job type so we need
to include not necessarily the country
because we're doing it for each country
but we need to include the job title and
the type now in order to add that this
formula is going to get a lot longer and
it's now getting hard to read so first I
want to operate in this formula bar if you press
control shift U it expands it out and
then from there you can actually change
it to the desired length that you want
so what I'm going to do now is actually
break this into new lines on
Windows you're going to press Alt Enter on
the Mac I'm pressing option return
anyway I've gone ahead and broken this into
different lines I've also inserted some
spaces in there to basically put in some
indentation so I can read it better
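
As a sketch of what the formula looks like once it is broken across lines, assuming the table is named jobs and its columns are spelled job_country and salary_year_avg (the narration only gives the spoken names, so the exact spellings are assumptions):

```excel
=MEDIAN(
    IF(
        (jobs[job_country] = A2) *
        (jobs[salary_year_avg] <> 0),
        jobs[salary_year_avg]
    )
)
```

Excel ignores the line breaks and extra spaces, so this evaluates the same as the one-line version. Rows that fail either condition return FALSE from the IF, and MEDIAN skips those.
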
don't have to necessarily do that but
now I feel like this is much more readable
for my eyes go ahead and execute this
and Bam we have all the results and if I
do a drag and drop all the way down all
the other ones are updated as well so
the first thing we need to add to this
is to check for the job title itself so
I'm going put a multiplication there go
to the next line pressing Alt Enter and
for this I want to check jobs
specifically I want to check that job
title short column and whether it's
equal to basically title remember we
created title so I'm going to go ahead and
press enter and it looks like we have a
typo because I forgot to insert a
parenthesis at the end press enter looks
like I misspelled the actual table
itself my bad press enter again now I'm
getting this name error right here and
that's because of this title that we're
using if we go back to that basic
calculator and select that cell C4 right
here it's named title_ex and I can
inspect the different names assigned to
cells by going to formulas defined names
and then the name manager now I started
directly with this workbook before we
actually created all these variables
here so what we'll do is this I'm going
to go ahead and actually just delete
this title_ex that was just an
example that's why it says ex then from
there I'm going to just rename it I'm
going to select the cell itself of C4
and I'm going to change it back to title
okay now it's Title Here back to the
country tab uh we have this updated for
the title it's actually appearing now no
name error and I'll go ahead and drag it
all the way down there's going to be a
lot less values for this cuz we're
further filtering this so I'm seeing
some num errors that's as expected all
right the last condition we need to now
take into account is this type right
here and we haven't named this cell
already so I'm selecting K4 and I've
come up here and I'm going to select
type and now I've renamed that as type so
we can finish this formula off
I'm going to do a multiplication sign
start a new line by pressing Alt Enter
then do opening and closing parentheses
for this we want to check if the job
schedule type column is equal to type
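
With the title and type conditions added, the full formula might read as in the sketch below. The column spellings (job_title_short, job_schedule_type and so on) are assumed from the spoken names, and title and type are the named cells on the calculator tab.

```excel
=MEDIAN(
    IF(
        (jobs[job_country] = A2) *
        (jobs[salary_year_avg] <> 0) *
        (jobs[job_title_short] = title) *
        (jobs[job_schedule_type] = type),
        jobs[salary_year_avg]
    )
)
```

When no row satisfies all four conditions, MEDIAN has no numbers to work with and returns a #NUM! error, which is why some countries show errors after the extra filters are added.
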
okay I'm going to go ahead and press
enter for this looks like we have a value I
expect a few more even filtered from
here okay not a lot now one note on this
this formula is perfectly fine for
checking the job schedule type I'm
going to make it slightly better and
actually slightly more correct if I go
over to that data validation tab I'm
going to press uh control shift U to
actually close that formula bar if you
remember
from our job schedule types yeah we
narrowed it down to this list but
actually the true list is
this so what we actually need to do is
check if a value is in here so in our
case we want to check whether the type
is in here so if we select part-time we
will also match on this job type here
where it says full-time part-time or this
one here where it says full-time
part-time temp work and we can do that
using the search function so we can find
something like part time within text of
right here and it's going to give us
back a number and then if it's not there
if I were to actually drag it down to
something like third column it's not
there it's going to give a value
error so I'm going to come back into
this and expand out the formula bar and
I'm going to change this formula right
here to basically get that condition
remember we want to use the search
function we want to find the text of the
type which is that variable that we have
for the job type and we'll be searching
the job schedule type column now
remember this is going to return back a
number of the position if it's there so
we're going to need to wrap this all in
an is number function and then put
closing parentheses so I'm going to
autofill this all the way down again
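
A sketch of the revised formula, with the exact-match test on the schedule type swapped for a text search (column spellings are assumed as before):

```excel
=MEDIAN(
    IF(
        (jobs[job_country] = A2) *
        (jobs[salary_year_avg] <> 0) *
        (jobs[job_title_short] = title) *
        ISNUMBER(SEARCH(type, jobs[job_schedule_type])),
        jobs[salary_year_avg]
    )
)
```

SEARCH returns the position of the match or a #VALUE! error, so ISNUMBER turns it into TRUE or FALSE. SEARCH is also case-insensitive, so a selection of part-time now matches combined entries that list part-time alongside other types.
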
it doesn't look like any values at least
in view actually changed underneath this
formula bar for right now so I'm going to
go ahead and hide it and then for this
when we go to plot it we actually need
to remove these num values from here so
in order to do this I'm going to I'll
create this new one called job country
filter and we're going to be using the
well filter function and for this we
need to include the array so everything
from here downwards pressing control
shift down to select that and then what
do we want to actually include well we
want to check to include anything in
that b column so is number we're going to
check those values are equal to a number
so I entered in that b column then as
well all right let's go ahead and run
this and it looks like it has all of our
values I don't like the order I'd rather
it sorted this is just my preference I'd
rather the numerical values be sorted so
I'm going to wrap this all in a sort
function and this is the array we're
applying to it we want to sort it on the
second index and for this we want
to put it in we'll say descending order
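
Put together, the filter-then-sort step could be sketched like this. The last row (112 here) is a placeholder, since the transcript selects the range with control shift down and never states where it ends.

```excel
=SORT(FILTER(A2:B112, ISNUMBER(B2:B112)), 2, -1)
```

FILTER drops every row whose median is an error rather than a number, and SORT then orders what is left by the second column, with -1 meaning descending.
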
and well Puerto Rico has some of the
highest jobs may have to move there and
okay we're going to get into applying
this now I want to make sure that we
have the maximum amount of values
present there's a lot of countries
missing that I know are available so I'm
going to just select the most basic job
possible to make sure that we have all
the jobs that can appear so we'll
just select data analyst United States
full-time okay now we can go about
selecting column d and e and then
inserting in our map now I don't want
this here so I'm actually going to grab
this map and then come over here and put
it in I'm only going to do some minor
cleanup right now I'm going to remove
the chart title and also legend but we
now have this chart map available for
countries that shows the median salary
one quick note you are going to have
this sort of warning right here if I
click on it and it says hey we plotted
74% of the locations from the data with
high confidence basically some of the
countries in there couldn't align
properly in my opinion it picked out a
lot of the major countries so I'm really
fine with that I'm fine if I didn't
identify all of them 74 is good enough
back to the final dashboard so we made
this country map right here now we need
to make these other two one thing to
call out with this which I don't think
I've called out before if we notice
whenever we select a job so in this case
I'll select data scientist it makes that
bar a darker color blue that way
your eyes go towards it and then you can
compare it to the other ones so how did
I do this well if I go to my jobs tab my
final jobs tab what I'm doing here is I
have all the median salaries which we
calculated already in ours but I added
this over here basically I have one
column without we have data scientist
selected right now so I have one column
without the value appearing and then one
column with it appearing and then what
we'll do from there is just some
basically manipulation of the graph to
make it to where in this case data
scientist appears so going back to our
worksheet of our fancy Dancy dashboard
we have so far going to go to that title
sheet remember we already did all this
portion in the last section first thing
we do is well we need to do some cleanup
we need to get rid of this name error
also we are going to create those extra
columns right here for basically what
job title is selected but we need to
more importantly if I expand out the
formula bar we need to update this
median salary similar to what we do with
job type to not only take into account
the job title but also the country and
the job schedule type so I'm all for not
repeating our work I'm going to go back
over to the country tab select the
median salary and I'm going to basically
just copy all that portion that's in
there anyway I'm going to escape out of
that come back into the job title
tab select B2 and I'll go ahead and just
press uh Alt Enter insert all that in
and then now I just want to clean this
up we do want this country which we're
going to have to
fix but we don't need these middle two
right here that we already basically
have specifically with the job country
though so remember this thing's
calculating the median salary based on
the job title selected in this column here
in column A so this A2 is going to work
here previously we were doing the same
thing with country we don't need to do
country anymore we need to actually put
in a variable of country which we
haven't created yet so I'm just going to
enter country in it's going to give me
an error this name error I'm going to
come back over to the basic calculator
tab select this and then rename G4 to
Country press enter come back to the
title tab we're no longer getting that
name error looks like it's executing
just right I'm going to go ahead and
drag it all the way down and we do have
an error in my formula I have this comma
right here this is supposed to actually
be an array right this whole thing is
supposed to be um an array so now let's
try it again press enter okay $90,000
for data analyst in the United States I
know that's true and now we're filling
it in for all the rest okay so we have
what we need I'm going close out the
formula bar and remember we want to
basically in one column if it has the
word data analyst we want to not include
it and then another one we want to only
include that one so we're going to use
an if for this so if this value which
we're going to go ahead and lock the
column is not equal to the title then
we're going to basically display those
results which I'm going to lock the
column for this otherwise I just wanted
to display an NA and not a value okay going
to go ahead and enter this and it is data
analyst so it's not going to appear
there but it will appear all the rest of
these and so I locked those columns so I
can just drag this over and now with
this other one I want to do the opposite
basically if it's equal to title I want
it to appear and then I'll drag and drop
it all the way down so these are the
values I want to plot so I'm going to
select D2 to D11 then holding control
also select these values right here go
in and insert recommended charts and
first one up is actually the one that I
want so we'll go ahead and insert that
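
The two helper columns behind this chart might be written as in the sketch below. The cell positions are hypothetical: it assumes the job titles sit in column A and their median salaries in column B, with the column letters locked so the formula can be dragged across. The first formula is for the column that hides the selected title, the second for the column that shows only the selected title.

```excel
=IF($A2 <> title, $B2, NA())
=IF($A2 = title, $B2, NA())
```

Charts do not plot #N/A values, so each bar is drawn by only one of the two series, which is what lets the selected title take a different shade from the rest.
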
so I'll take this chart and also move
that right here into the basic
calculator tab with this one once again
I don't want a chart title and I don't
want a legend the other thing are the
values the horizontal values down here
I'm going to go ahead and double click
on that scroll down here all the way to
number and we're going to do that custom
formatting that we've done previously if
it's not appearing uh feel free to type
the code in but we're going to use this
to basically format it as with the
dollar sign in the front and then also
the k for the thousands place all right
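
The transcript does not spell out the format code itself. One custom number format that produces a leading dollar sign and a K for thousands, offered here as an assumption rather than the exact code used in the video, is:

```text
$#,##0,"K"
```

The comma after the last digit placeholder scales the number down by a thousand, so 90000 is displayed as $90K.
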
the last thing is you know I don't like
to use a lot of different colors in this
so making sure the graph is selected go
to chart design and then into chart
colors right now it's set under colorful
which I think is awful default value I'm
going to come down here and select not
this monochromatic palette 4 or 5
but the monochromatic palette 12 and
that's because now data analyst will be
the darkest blue the other ones will be
light so that way my eyes go to that one
instead so now what we just did with the
job title we need to repeat it for job
type so a lot of copying and pasting so
we're going to move a lot faster with
this one because we've done most of this
before for this we're going to be
entering in the type sheet and I'm going
to go ahead and pull all those things in
from data validation tab now we need to
get the median salaries for that I'm
just going to come back over to the
title sheet come into here and actually
just copy this entire formula then
expanding this out with control shift U
pasting this in here now we need to just
change this up slightly so for the job
title we need to actually use
title whereas conversely for the job
type we no longer want to use type we
want to use what's available in A2
pressing enter we get our value for
full-time $90,000 for data analyst that's
correct and then drag it on down I'm
going to go ahead and close this formula
bar and for this I'm going to use uh
similar to what we did in that Country
Sheet in where we not only filter the
data to make sure we include is numbers
but also we sorted it and that's because
sometimes these values sometimes we may
not have values and we go back to this
type tab sometimes there may not be a
certain job schedule type so I'm going
to go ahead and paste this in now it is
working I know there will always be five
values so I'm going to actually change
this to B6 here and also B6 here and
press enter now I also realized I made a
mistake earlier whenever I went to the
title sheet this is only doing the sort
function and we may have a condition
where in certain countries they don't
have all these different job titles
available so we need to do a similar
filter here as well so I'm going to paste
that formula into here and then adjust
it because I know there's always 10 job
titles so it's going to go down to 11 in
this case and 11 here we go ahead and
run that there's going to be no change
the one issue though is in this case if
I go back to that basic calculator it
doesn't do it in the order that I want
so going back to that title sheet I'm
going to change that sorting value from
a negative one to a one so that way it
goes in basically ascending order and I
need to do the same thing here as
well in the type sheet where it's also
in ascending order cuz we're going to be
making the same graph all right similar
to last time I wanted to if the value is
selected I want it to be highlighted so
we need to make those same columns again
so if this is not equal to the type I
want the value to appear and it's NA
because right now full-time is selected
dragging it over and then adjusting it
for equal instead and then dragging it
down I do want it to appear if it's
full-time now I'm going to select
D2 to D6 and then these values in F and G
once again we're going to go to insert
recommended charts I don't like these
clustered columns I prefer a clustered
bar chart so I'm going to take this and
then put it in here make similar formatting
changes as well of removing the
title and then also the legend updating
the x-axis by going into numbers and
changing the format to a custom format
to using the K value instead and then
finally the actual color Itself by going
to that monochromatic color
palette 12 so bam now we have a lot of
this made so I can go through now and
select say data scientist it will
update for selecting data scientist and
then you see all these other values
update as well I can also select the
different type um part-time in this case
and then the values still remain the
same it just changes the bar that it's
selected
to all right the last major thing before
we get into formatting we're going to
make these three kpi cards one is for
the median salary the next is for the
top job platform and then finally on the
job count itself for how many counts of
jobs for all of these now one quick
thing Excel doesn't necessarily have kpi
cards like if you use something like
Power BI or Looker they provide cards for
this we're going to do some sort of
backdoor approach if you will to make
this into a kpi card basically I'm going
to insert in a text box and we're going
to put a cell equal to it you'll see
what we're going to do with it but the
main point is these values this value
itself is not as you can see it's a
rectangle it's not in a Cell per se but
it is calculated within the workbook
anyway what we're going to be doing I
don't need this down here this median
salary what we did from the last lesson
I'm gonna go ahead and delete this but
the first we want to calculate is that
median salary and we basically have it
already and I'm going to calculate it
right here in this column of I2 and for
this we're just going to use a simple x
lookup and the value we want to look up
is based on the job title selected so
title and the lookup array is this array
right here and then the final return
array is right next to it there's a
missing value right now because cloud
engineer is not available in the
current selection so make sure you're
selecting the full values and we're going
to go ahead and close it but we have now
the median salary so I'm going to
actually rename this I2 cell to median
salary and then going back into our
basic calculator tab remember I'm not
going to insert it into a cell in here
but instead we go into insert and then
illustrations and I'm just going to
insert a simple old text box I'll drag
it right there now the thing is I don't
want to type inside of here what I'm
actually going to do is I'm going to select the
Box itself so you no longer have that
blinking cursor in there come up into
the formula bar up here type in equal to
median salary and Bam now if you notice
it copied the formatting that we
previously have right here as a cluster
number looking at right there it copied
the same formatting that we're using
here in I2 so what I'm going to do is
just go in here and change this
formatting to a currency with zero
decimal places and then once we have
this value actually updated go back to
basic calculator we can see boom looks a
lot nicer we'll adjust the formatting as
far as the size and stuff in a little
bit after we calculate all the other
ones the next one from our final
dashboard is the top job platform so
we've only calculated things associated
with the job title the job country and
the job type so we need to make a new
sheet and we'll rename it platform and
technically the column name is job via
and for this we need to get the unique
values of the job via column now for
this one we're trying to get the top job
platform so we're not necessarily doing
that based on what is the top median
salary on this I just want where are the
most jobs actually located so we're
going to be doing a count using control
shift U to expand the formula bar we've been using
this median with this if array in it
we've already built this out
which this formula does so
we're going to use this I'm going to go
ahead and copy it by pressing contrl C
coming over to platform and then pasting
it in with contrl v okay and instead of
median we're going to use count and the
only other thing we need to update on
this is we stole it from the job country
page is we need to update the job
country to be well country and we need
to check one more condition so we need
to add to this array I'm going to press uh
Alt Enter to create a new line and we
want to check that job via is equal to
in this case A2 and we go ahead and
press enter looks like 10 were available
for via ZipRecruiter and then it
calculates all the way down now remember
our data set also has hourly data in
there as well so technically if you
wanted to which I'm going to I'm going
to remove move this condition right here
that we're checking that it's not equal
to zero basically it's also going to
include if there's a job that has an
hourly salary included so I'm going to
go ahead and backspace out of that press
enter and then from there drag and drop
it down and I can see we added a few
more values because of this I'm going to close
this formula bar control shift U all
right so now I need to sort these values
basically from high to low selecting all
the values using control shift down the
sort index we want to use the second
index and we want to put this one in
descending order cuz we want the highest
one up at the top and for this it looks
like snag a job is the highest anyway uh
this is what we want this first one
actually appearing in our kpi card but
if you notice all of these have via in
front of it so what I'm going to use is
a text function of substitute which
replaces existing text with a new text
and for our text in D2
the old text that I want to replace is
via with a space and the new text is
just a blank value so snag a job is now up
at the top this is what I want to be known
as we're going to rename this variable
to platform then we do the same thing on
our dashboard of inserting a text box
and for this I'm going to select it and
say that it's equal to platform all
right so snag a job and for this one
this one is well somewhat simple but in
our data validation tab we were in the
very beginning in the last lesson we
were calculating the count and we were
calculating a generic count of all of
them so we need to once again modify
this because we want the count based on
our three conditions here so what I'm
going to do is just basically steal it
from what we did previously go into that
B2 cell in the platform sheet go ahead
and copy this all and then then in here
I'm going to expand this formula out I'm
going to go ahead and replace that in B2
with this now a few modifications we can
make to this we're no longer checking
the job via column we're not trying to
check that for the count that was
specific to where we stole that from so
I'm going to delete that and also this
uh multiplication point and then this is
checking all of the things selected of
country title and type we're wanting to
check the count of a certain title so
instead of having title we'll put in a
A2 pressing enter we have a lower value
because the current filters are
lower and then we'll fill it all the way
down closing the formula bar out we now
want to get the count for whatever is
selected so I'm going to go to an empty
column over here right here and we're
going to be doing an X lookup again the
lookup value is what is the title that
we're using the lookup array is we'll
use this one right here and then for as
far as the return array right next to it
pressing enter boom get a value of 537
now just to be safe in case there aren't
any results like say it was zero or
something or not applicable it's going
to be basically not applicable I do want
to include if not found I'm going to
enter in no results and I'm going to do
the same thing underneath the title
sheet for where we calculated the median
salary put for no results
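
A sketch of the lookup with the fallback text, using hypothetical ranges (job titles in A2:A11, the value to return in B2:B11; the transcript selects these with the mouse and never names them):

```excel
=XLOOKUP(title, A2:A11, B2:B11, "No results")
```

The fourth argument is XLOOKUP's if_not_found value. It is returned when the selected title is missing from the lookup array, for example after the list has been filtered down for a country that has no postings for that title.
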
so I'm going to go ahead we want to get
that count in there so we insert that
illustration again for us we're going to
insert a text box and that textbox is
going to be equal to count which I don't
think we actually named yet so I
actually need to go back to escape out
of this go back to the data validation
tab rename this count and then from
there with the text box selected I'm
going to put that equal to count now for
each one of these text boxes I need to
go through and actually
as you can see we have a text box
for the value but I actually want to use
a shape basically background to tell us
what we're actually performing or
calculation that this kpi is showing so
I'm going come in here and to insert
illustrations for shapes we're going to
keep it actually we'll say a rectangle
this time and then we'll go ahead and
draw it now for the shape format itself
I'm going to go to this one right here
basically a blue around with white on
the front and with these shapes you can
still put in text in here so I can put
in something like median salary and I
can open up the Home tab and I can
actually customize this further so I can
make this bold I can put in the center I
actually want Center top and I'm going
to make this slightly bigger by 20 point
also I'm noticing this box is a green
outline I don't really like that I'd
rather a blue outline so we have that
now okay so how do we get that number if
you notice the number is no longer visible it's
hidden behind here we can do a couple
different ways but I'm just going to
rightclick this object and then under
shape format you can go to send
backwards specifically I want to send
all the way to the back now getting into
the actual text box itself if you notice
there's a little bit of a a box around
it I don't really like that I'm also
going to expand it all the way to
the edges I'm going to format this one
as well to be centered bold and then
we're going to make the font much bigger
on this and I'm going to like I
talked about remove that shape outline
right now it has a a light one I'm going
to say no outline okay so now it looks
like a kpi card copying this I'm going
to then make two more and for each of
these I'm going to send them back to the
back name appropriately to top job
platform and job count for this I'm
going to just copy this text box here
that has the median salary in it and I
just want to copy the formatting to the
other ones as well so we can
conveniently use this paintbrush this
format painter and I'll select this one it
disappeared I have to reselect it and
I'll also select this one if you notice
the names are cutting off so it's really
important that you extend it all the way
over same thing with the job count as
well now we're getting into the format
portion of actually just doing some
final touches on here I don't like grid
lines so under view tab I'm going to
select remove grid lines for each of
these charts I don't really like those
outlines I want it just to sort of blend
in to make it look like it's there so
for the shape outline I'm going to
change each of them to no outline up in
our data validation point I want to make
the spacing right I'm also going to make
these titles slightly bigger for the
dropdowns themselves I want them to
basically pop out so I'm going to change
this formatting I'm going to go to the
cell Styles and I really like this one
of input because it sort of calls your
eyes to what you need to go to I'm going
to make this G column slightly bigger
and then shift the type over some the
other thing I want to do is add a title
up here at the top for what this
dashboard actually does so I'm going to
select cells B1 through L1 I'm going to
do merge and center and I'm going to
change this to data science salary
calculator along with going to the cell
style we'll do heading one for right now
I want that to still be slightly bigger
okay now we're going to to start moving
stuff around but I want to get in it's
like its final form that I'm going to
give to colleagues and co-workers and
I'm going to give it with the Home tab
closed and also if I go to view I can
remove headings so it removed the column
headers the A and the B and then the row
numbers as well so it looks like
everything's updated correctly one minor
thing this job count I want to make sure
after I selected full-time I saw that
the formatting of the thousands with the
comma is not there so going back into
that data validation tab I'm going to
select this go to home make it a comma
and remove all the decimal places okay
looking good all right now we need to
get this set up to give to colleagues I
don't want them to have all these other
tabs or all these other sheets so I'm
going to go through and actually just
hide the ones that aren't applicable for
them Additionally the sheet of basic
calculator doesn't really make sense
anymore cuz that was for that first
lesson I'm going to actually name this
to salary calculator now colleagues could
still potentially go in and they could
mess up these formulas and so we need to
now protect our worksheet and we only
want them to be able to manipulate these
three cells so we're going to be going
through protecting the sheet but we need
to actually recall that we have to pick
what cells that we want to lock right we
need to select all the cells and I
preemptively told you to hide the
headings you need to go back into view
and show the headings again cuz we need
to be able to select this triangle in
the upper left hand corner in order to select all
the different cells and then from there
holding control unselect these three
cells and then from there we're going to
right click in there go to format cells
under protection and we want to make in
that case that they are locked or
basically we are going to be able to
lock them conversely we need to escape
out of this and now select the three
cells that we want to unlock right click
go to format cells and for these we want
to make sure that they are not checked
for this so basically unlocked whenever
we go ahead and protect the sheet so now
whenever I go into review go to protect
sheet I want to be able to select unlocked
cells once again if you want to enter a
password you can I'm going to click okay
so now I can't click anywhere else
except for where we have our data
validation so I can go through and
select things like data scientist and
Turkey now I'm just going to add that
last final touch of removing the
headings
bam we have our dashboard now I promise
last thing before we go I'm
noticing and you're probably noticing as
well if you're going through and
manipulating these values in this case
let's go to data analyst from the previously
selected data scientist this is me talking
in real time I want to show how
long it takes to load and it takes
forever to load why is it doing this
this is not good for stakeholders
they're going to get annoyed if it takes
this long I'm going to go ahead and unhide
some of our sheets specifically
that platform one now these formulas
that we're using the array formulas
to calculate these values so in
this platforms one we have like oh my
gosh in this case we have close to 200
oh no it's like slowing down even going
through this we're executing this
hundreds of times in here whereas if I
compare it to something like the title
sheet we're only running this you know
10 times which I feel isn't that big but
if we're running this formula hundreds
of times it's going to slow down this
sheet so I have a quick fix for this and
it involves especially for this sheet here
the platform sheet we're not going to use this
array multiplication in order to calculate
this instead we're going to use a
COUNTIFS the first thing we're going to do is
check that the job via column is equal to the
criteria one of A2 so basically the job
platform is what it says it is from
there we'll check the job title short
column to make sure it matches up with
title we'll check the job country is
equal to Country and then finally we're
going to check that the job schedule
type is equal to type and then we're
going to go ahead and execute this and
then we're going to autofill it all the
way down notice that 1490 it's actually
going to go down slightly to
1426 and that's because we've now
changed this condition inside of this
COUNTIFS specifically if I go back to
that title sheet you remember whenever
we matched for this we did a really
in-depth search so if any job schedule
type contained those keywords we matched to
it now we're only matching it if it
exactly matches but since this job
platform is just providing it's not
providing a numerical value it's
providing what is the Top Value I don't
think the Top Value is going to change
that much so I don't think we're being
inaccurate about this if we change this
formula anyway going back to the actual
dashboard itself now whenever I change
this from data analyst to data scientist
it is much faster so now I'm going to go ahead and
hide those sheets and we are done
that was a heck of a lot of work so in
the next lesson we're going to be
getting into how you can actually go
through and share this dashboard
specifically for those that have a
Microsoft subscription you can use
something like Microsoft online because
it has all these features that we have
within here and host it there for others
to use additionally we're going to get
into my recommended method of sharing
any of your projects and that's via LinkedIn
now just a heads up we will be
getting into git and GitHub after
project 2 at the very end of this course
and during that portion we'll talk about
how to share not only project 2 but also
this project here but that's more
complicated and I really want to focus
on Excel so with that we're going to be
shifting in the next lesson to quickly
share it and then moving into the
advanced chapter all right with that
I'll see you in the next
[Music]
one first up congratulations on
completing your first project in Excel
and building this salary dashboard it's been
nothing short of your hard work and you
shouldn't let that hard work go
unnoticed so in this lesson we're going
to be going over different methods you
could go about actually sharing this
project to your social network and to
others to help out in the job search or
future employment now if you were just
learning these skills for fun you had no
intent of getting a new job or increasing
your pay in your current job then you
can feel free to skip this and go to the
next chapter on pivot
tables so there's a few different ways
you can go about sharing your work that
you did we're not going to dive deep
into any of these we're going to look at
these more at a high level before
jumping into one of the options first up
is a portfolio website here I have
lukebarousse.com and if I wanted to I could come
inside of here and edit it and include
my project here along with what I did
for others to see another option even if
you don't have a big following on
YouTube is you could actually go in and
record and describe what you did within
your dashboard and host it somewhere
like YouTube now for both those options
you may be like Luke how do I actually
share my Excel file that I
actually went through well that's where
we run into a little bit of issues as
yes we created this Excel file right
here but how do you actually go about
sharing it with others to see your work
well one option for this is actually
hosting your file online via something
like OneDrive which if you're paying
for a subscription of Microsoft service
you have access to OneDrive and you can
host your dashboard online all I need to
do is navigate to onedrive.live.com go
to this add new and files upload from
there select my file that I actually
want to upload online and then we can go
to it and our file is actually uploaded
here which we can actually go through
and select something like data
scientists and it will actually
calculate based on the changes we make
to it now one note the country chart
inside of excel online doesn't work but
I have a fix for it and mainly it's to
just remove it you go into the review
tab under protection and go to manage
protection and then you turn off sheet
protection then from there you can
delete it next all you need to do is
just take those charts and actually
extend them over so that way they take up
that extra space and then once you're
complete with that turn back on the
sheet protection and now you can go
about actually sharing this so here I'm
coming into share and you can add an
email if you want or if you just want to
share it in general with a link you can
come down here and fine-tune the control
of a link to provide in this case I'm
selecting that I'm going to share with
anyone they can edit it you could make
it view but then they can't change the
dropdowns so I recommend that you still
leave it on edit you could set an
expiration and even password and then
from there click apply and now you have
a link to your dashboard that works even
if you don't have a Microsoft account so
here I am in incognito mode within my
browser so I'm not signed in at all and
I can actually go in and access this
dashboard and go through and select
something and it updates in real time
and because I got that sheet protection
on they can't go through and change
anything except for these dropdowns
don't believe me you can check out my
project via the link below but what
happens if we want to not only maybe
share our file but also write up what we
did the work we did with this and all
the different skills that we used well
that's the case of using something like
GitHub GitHub provides a location to
store Excel files like shown here along
with giving you the ability to go
through and perform a write up detailing
all the different work that you did now
if you wanted to see this you could just
navigate over to my project where you
download all these files from on GitHub
navigate into that project
one dashboard and in here it has our Excel file
and also this read me which then appears
actually underneath here and details all
the different work that we did for this
now getting this project onto GitHub if
you're not familiar with GitHub is
fairly complex we're actually going to
be saving this for after project 2 and
in that case navigating back to the
project itself we'll not only be
uploading project one we'll also be
uploading project two as well so after
we finish the last chapter chapter 8 on
power pivot we'll be getting into all of
this and you'll be learning more about
git GitHub and how to manage
projects now from what I found working
in data science is that the best way
to share your work and your project and
potentially collaborate with others is
use something like LinkedIn a social
media platform for networking in order
to share your project specifically here
I am on my profile right here and if we
scroll on down they have a section in
your profile to basically show all your
different projects that you've worked on
and contributed to and adding a project
is super simple all I got to do is click
this plus icon include a description in
my case I was trying to help out job
seekers investigate salaries for their desired
jobs put in a few skills up to five like
Microsoft Excel data analysis or Excel
dashboards now for media they do have
options to add a link or media in the
case of the media it doesn't support
Excel files and then if you try to
insert your OneDrive link I ran into
errors so I find the best way to
actually just share the link is to post
it inside of the description from there
specify when you started and stopped on
this project anybody that contributed to
it or anything that it is associated
with and then from there click save the
other option that I recommend is
actually just going in and making a post
here I just write up a short little
description of what you did with your
project and then if you want include
something like an image or even
something like a gif which shows an
overview of the project and then
probably the most important thing is
actually sharing that link to your OneDrive
online you can also post it in the
comments and not include it in the
description it's really up to you anyway
go through there and then post
so bam that's how you share your project
as a reminder we will be going into
greater detail into how to share both
this project and also the second project
on GitHub using git and also use things
like markdown in order to write about
your project but that'll be included
after we go through all of the different
Excel content just wanted to have a
quick way of you going through and
actually sharing what you've done so far
cuz I know you're probably excited and
proud of it all right in the next videos
we're going to be shifting gears into the
advanced chapters starting off
first with pivot tables with that I'll
see you in
there all right welcome to the advanced
chapter and because we're getting into the
advanced section you know it's time for
a new
flannel and with this Advanced chapter
we're going to be focusing on a few core
topics that I think are going to make
your life a lot easier specifically
we're going to focus on things like pivot
tables power query and also power pivot
all of these are great at automating my
Excel workflows to make it a lot easier
to do repetitive analytics that my boss
may come back to me again and again for
instead of with something like a formula
where I have to go through and make and
copy and paste that formula all over
again and rerun that whole analysis
these Advanced chapters are going to
make your life a lot easier anyway in
this chapter we're going to be focused
on pivot tables in this lesson specifically
we'll be getting an intro into pivot
tables how to make them how to
manipulate them how to even read them in
the next lesson we'll be going into
advanced pivot tables looking at things
like grouping and even aggregating such
as getting percentages of grand totals
and whatnot and then the final lesson in
this chapter is on pivot charts which
allows us to basically take what we have
in our pivot tables and convert it into
a usable chart hence the name pivot
chart all right so let's actually get
into it and understanding why these
pivot tables are so
important so in the basics chapter we
made this table right here which uses
hardcoded values for the different job
titles along with the different months
and then from there uses formulas
specifically SUMPRODUCT along with
some array calculations in order to
calculate how many job counts per month
this is cool and all but what happens if
we wanted to add another job title so
say we have something like business
analyst or we have software developer
we'd have to actually manipulate and
update all these different formulas
that we have here well here's that same
table but in a pivot table and by its
name that's what they're great at
they're great at pivoting and thus
aggregating data based on certain values
and whatnot so what if we want to add
more job titles to this well I can just come
in here similar to how we manipulate a
table select this filter dropdown and
then go from there and select things
like oh I want to include something like
a business analyst and then the data
automatically updates for this no
readjusting formulas makes it super
simple I can even take this table a step
further and if I wanted to I can
actually filter by the job country in
this case I'm filtering by the United
States and we now have these values
makes it super simple anyway we're
getting ahead of ourselves we actually
need to get into creating our first
pivot
table all right so for the advanced
chapters it's going to be a little bit
different for what files you're going to
use for this the final results of this
lesson will be in the lesson title of
pivot table intro but what I want you to
do whenever you're going through or
following me along in this lesson is
actually revert back to the previous
file of the last lesson in this case or
the first lesson so we don't have one so
I have this one called zero of just
pivot tables that's the one you want to
start with so in this case pivot tables
itself just has the data tab of the data
we want to work with and this sheet of
the table that we've been familiar with
in Basics chapter which by the end of
this we're going to make a pivot table
out of and when I say out of I mean actually out
of the core data itself anyway for the
actual pivot table intro this will have
also those similar tabs but then also
the lesson itself will have all the
different work that we've actually done
to complete what we need to do so feel
free to just have both of these up
during a lesson so that way you can
consult back and forth in case you get
lost all right so let's get into our
first pivot table we're going to be
using the data that we've previously been
using of all the salary data for those
job titles anyway if I go into the
insert tab up here in the top left hand
corner I have pivot tables but I also
have recommended pivot tables if I don't
have an analysis in mind I could come
into recommended pivot tables a pane's
going to appear on the right hand side
and notice here that it actually
selected the data range I know that's
the data range and it goes through and
provides some recommended different
pivot tables that you could put into
here whether you put it into a new sheet
or an existing sheet but I know what
analysis I want to do specifically I
want to do a count of the different job
titles so data engineer I want to find
the counts of this senior data analyst
and so on right now it's not providing
any of that I don't typically find that
any time with recommended pivot tables
that it provides me what I want so I
don't find myself using that often
instead I go directly into pivot tables
right here and then we have three
options but what we're really going to focus on
for this lesson and this chapter is from
table or range I've selected inside of A4
right now but it automatically knows
that this is the data range all the way
down to the bottom the other thing it
says is choose where you want the pivot
table to be placed you can either do a new
worksheet or you can do inside the
existing worksheet but you have to
specify a location we don't want that I
typically like it in a new worksheet to
keep my analysis in one standard
location the last thing it asks is
whether you want to analyze multiple
tables specifically add this to the data
model we're going to be going into Data
models very heavily in the power pivot
chapter or chapter eight or last chapter
this is a super powerful feature when
you have multiple tables you need to
combine it we're not doing it in this
lesson or in this chapter so we're going
to leave it unchecked so now I'm in this
new sheet that I'm going to rename to
job count and I'm also going to move it
over here to the end anyway this pivot
table this pivot table 2 that it is calling
it there's nothing in it right now
and you notice there's a few things that
popped up first is the pivot table
analyze tab which is available with this
and also the design tab we'll be going
into these in some upcoming examples
that we're going to get into we're
however going to be focusing for this
example on the job count I'm
going to close this out and look at this pivot
table fields pane right here now the
layout of this you may see it's somewhat
different is we have the columns over
here on the left so if you remember the
job title short column job title column
job location and then these fields on
the right hand side are things for like
filters rows columns or values so I can
take the job title short column put into
something like the rows and get
basically all the values in the rows now
your layout may be a little bit
different if you come up and select the
tools icon right here you may be under
this fields section and areas section
stacked which has the fields down here on
the bottom I personally don't really
like this because look how short my
column titles are so I like having them
like this instead anyway I think we
understand this columns area right here
but I don't think we understand these
filters rows columns and values so let's
explore this by calculating the counts
of these different job titles now
anytime I add something to the rows or
any of these columns I can either remove
it by grabbing it and pulling it off
notice they have the x mark on it or
similarly I can also just come in here
and uncheck the check mark box that's
more applicable especially if you have
it in multiple different panes and want
to remove it completely makes it simple
besides rows we also have columns and so
instead of the job titles being in rows
they're in the different columns I don't
really like this too much I typically
find myself using rows so we're trying
to calculate what is the count of these
job title shorts so I'm just going to
take that job title short again and put
it into the values and it automatically
Aggregates this by counts of that but
what happens if I don't want to do that
count aggregation well one way is to
come back into that values right here
and I'm going to just click it not right
click it just normal click it and then
go into value field settings and this
pop-up is going to come up first up is
the name of the column itself I actually
don't like this for a name I'm just
going to rename this to job count under
here under the summarize values by tab
you can select a lot of different
aggregation methods we're going to stay
with count you can also change how you
show values as basically if we wanted to
do a percentage of some total or not
we're going to be jumping into that in the
advanced lesson so stand by for that the
last thing to note with this is the
number format so I can come in here and
actually select in our case we have
thousand values so I like to use a thousands
separator along with zero decimal places
and then clicking okay to apply this all
it updates the formatting and the name
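As a rough picture of what the pivot table just did, here is a small Python sketch. The titles and counts below are invented stand-ins for the data sheet, not the real numbers. Putting job title short into both rows and values amounts to counting each distinct label, and the number format step is just a thousands separator with zero decimal places.

```python
from collections import Counter

# Hypothetical job_title_short values standing in for the data sheet
job_titles = ["Data Analyst"] * 1200 + ["Data Engineer"] * 950 + ["Cloud Engineer"] * 33

# Same field in rows and values: one row per distinct title, aggregated by count
job_count = Counter(job_titles)

# Number format: thousands separator, zero decimal places
formatted = {title: f"{count:,.0f}" for title, count in job_count.items()}

for title, text in sorted(formatted.items()):
    print(f"{title:<15} {text:>6}")
```

Swapping the aggregation in value field settings would correspond to replacing the count with a different function over the same groups, which is why nothing else about the layout has to change.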
so we've gone over rows columns and
values what happens if we want to then
filter let's say for only United States
jobs well I could drag something like
the job country column into filters and
right now it's selecting all you
see this pane come up right here and
from there I can actually go
through and select something like the
United States click okay and now the
values as you can see they reduced and
are only United States values other types
of filtering I can do I can filter the
row itself so if I wanted to I could
select the different job titles that I
want to appear in this and click apply I
could also do something where let's say
I wanted only job title so we're going
to do a label filter and jobs that
contain the word data so I could just
type in here
data and whenever I filter it I get all
the different jobs that contain data
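The two filters used so far work at different stages, which a short Python sketch can make concrete. The rows here are hypothetical, and ignoring case in the text match is an assumption of this sketch. The country filter in the filters area removes source rows before anything is counted, while the label filter only hides row labels after the counts exist.

```python
from collections import Counter

# Hypothetical (job_title_short, job_country) pairs
rows = [
    ("Data Analyst", "United States"),
    ("Data Analyst", "Turkey"),
    ("Data Engineer", "United States"),
    ("Software Engineer", "United States"),
    ("Cloud Engineer", "United States"),
    ("Senior Data Analyst", "United States"),
]

# Filters area: keep only the rows for one country, then count by title
us_counts = Counter(title for title, country in rows if country == "United States")

# Label filter "contains data": applied to the row labels of the result
data_counts = {title: n for title, n in us_counts.items() if "data" in title.lower()}

print(us_counts)
print(data_counts)
```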
similarly I could also filter by this
job count here and that's by the values
filter so I'm going to remove this label
filters to start with and we can go back
in here in the values filter and we
could do something like hey we want to
get jobs that are only greater than
let's see here Cloud Engineers 33 I
don't want to see that anymore I get to
greater than 100 and it filters down but
we're not going to use any filters right
now so I'm going to one clear this
filter for the table and then also
remove this filter from filtering for
the United
States so let's get into taking this
analysis a step further and we're going
to want to now analyze the average
salary of these different job titles
while we're going through this we're
also going to be exploring the pivot
table analyze tab so a quick tour of
this tab first up over here on the left
is Pivot tables if I wanted to I could
go through and rename this I'd probably
name this typically something similar to
what is my sheet name itself this case I
named it job count additionally inside
of here we have options which allows us
to do a lot of detailed control of how
we're building our pivot tables it's a
very Advanced feature I don't find
myself going into it quite often unless
I need to fine-tune the functionality of
it active field so that tells us
basically what's the active field
grouping is something we're going to go
into in the next lesson where we actually go
in and perform grouping of different job
titles slicers and timelines we're going
to be going into in the last lesson on
pivot charts in order to basically use
these slicers and timelines to filter the
data section is used to control our data
so I can click something like refresh or
refresh all it's going to refresh the
data that we have so in this case
remember business analyst is around
1,001 so if I go back to our data itself
and I find this entry on business
analyst and then let's say that
that's not correct and I delete that out
of there whenever I come back to this
table itself it still says
1,001 what I have to do is well we've
updated the data so I have to well
refresh it now that I refreshed it it's
down to 1,000 I actually don't want to
remove that entry so I'm going to just
press contrl Z and bring that right back
and then also click refresh to make sure
it's up to date if I want to change the
data source or maybe the range I could
go into something like this of change
data source actions allow us to clear
select and even move a pivot table for
calculations they have things like
calculated fields and items but we're
going to get into measures and I feel
they're way more powerful so we're not
going to cover this much the last thing
to cover with this is over here on the
right hand side is the show section sometimes
whenever you're navigating you'll click
into your pivot table and that pivot
table Fields pane won't pop up you can
also toggle it on and off by clicking this
field list and if you didn't want
something like row labels at the top you
could just remove the field headers as
well so getting into that actual
analysis we want to analyze the salary
year average what is the average value
now I can't see all the different values
selected in here so I'm
actually going to go ahead and close
this pane up here to have a bigger view
anyway what it did was it did a sum of
the salary year average we don't really
want that we want to go to average and
I'll change this column name to
average yearly salary now if you've been
following along since the basic chapter
you probably know that I prefer
performing a median for this salary data
over an average but if you actually go
through this there's no median value for
this that doesn't mean you can't do
median in pivot tables you actually can
you can actually do even more advanced
stuff which we're going to get to in
chapter 8 on power pivot but for now
we're just going to stick to only
performing average for this I'm going to
click okay so the formatting on this is
all jacked up and we could go into that
field settings and adjust that or I can
actually go in as long as I have all the
values selected here I can select hey I
want to convert this to a currency and
that I don't want any decimal places and
it's going to format all the values and
I feel this is a little bit easier
because now actually if you go back and
in exploring the value field settings
inside of number format it actually
applied this custom formatting for me so
it knows to apply that since I applied
it to all the values that were visible
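Here is a small Python sketch of the three aggregations mentioned for the salary field, using made-up salaries rather than the course data. It shows the default sum, the average we switched to, and the median that the standard value field settings list does not offer, along with the currency format with no decimal places.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (job_title_short, salary_year_avg) rows
rows = [
    ("Data Analyst", 80000),
    ("Data Analyst", 90000),
    ("Data Analyst", 160000),
    ("Data Engineer", 120000),
    ("Data Engineer", 140000),
]

# Group the salaries by job title, like putting the title in rows
salaries = defaultdict(list)
for title, salary in rows:
    salaries[title].append(salary)

for title, values in salaries.items():
    total = sum(values)      # the default the pivot table picked for a numeric field
    average = mean(values)   # after changing the aggregation to average
    middle = median(values)  # not available in the standard aggregation list
    print(f"{title}: sum ${total:,.0f}, average ${average:,.0f}, median ${middle:,.0f}")

analyst_average = mean(salaries["Data Analyst"])
analyst_median = median(salaries["Data Analyst"])
```

In this toy data one high salary pulls the data analyst average well above the median, which is the usual reason to prefer a median for salary data.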
now since this is so easy I could also
do something like get the average of the
hourly salary once again it's doing the
sum of that and I don't want that I want
the average itself and I can change that
column Name by just going in here and
typing in average hourly salary
inspecting the value field setting it
also updates inside of here and I'm
going to go ahead and adjust the
formatting as well changes to a currency
with two decimal
places so let's get into actually
cleaning how this table looks up and we
can go and do this by going into the
design tab now I'm going to start over
here on the right in pivot table Styles
and we can actually change what it may
look like in this case I sort of like
this one right here the simplistic look
I can also change things like column
headers which I like the formatting on
it or whether I want banded rows or
banded columns in my case I kind of like
the banded rows we'll go with that last
portion is around the layout if you
notice down here we have this grand
total over here this is a grand total
based on well the column values it's
adding up all the values in the column
so this is on for the column so if I
wanted to turn it off for rows and
columns I could come up here and
actually do that I kind of like this so
we're going to leave it on I could also
turn it on for the rows and columns but
in this case because we're doing
different aggregation methods so a count
here and an average here it's not
necessarily going to do anything over
here for the row grand total whereas for
something like the column grand total
it knows that hey for a job count I
probably need the total count for the
average I probably need an average and
that's what it does for both of these
there's some additional ones up here on
adjusting the report layout adjusting
for blank rolls and then also subtitles
we'll be exploring that as we go along
as we build out more complex pivot
tables
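The point about the grand total following each column's aggregation can be sketched in Python with a few invented rows. For the count column the grand total is just the counts added up, while for the average column it is the average over every underlying row, which is generally not the same as averaging the per-title averages.

```python
from statistics import mean

# Hypothetical (job_title_short, salary_year_avg) rows
rows = [
    ("Data Analyst", 80000),
    ("Data Analyst", 100000),
    ("Data Analyst", 90000),
    ("Data Engineer", 150000),
]

groups = {}
for title, salary in rows:
    groups.setdefault(title, []).append(salary)

group_counts = {title: len(values) for title, values in groups.items()}
group_averages = {title: mean(values) for title, values in groups.items()}

# Grand total for the count column: the per-title counts add up
grand_count = sum(group_counts.values())

# Grand total for the average column: the average over all rows
grand_average = mean(salary for _, salary in rows)

# For comparison only: averaging the per-title averages gives a different number
average_of_averages = mean(group_averages.values())

print(grand_count, grand_average, average_of_averages)
```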
so let's now get into that final
analysis and we're going to be creating
basically this pivot table that we did
previously with formulas and functions
so what we'll need to do or think of
right we're going to need the job title
short in the rows and we're going to
need the month the job posted months in
the columns and then we'll need to
aggregate this by count for the values
now I can navigate back to the data Tab
and once again go to insert pivot table
if you notice here it says from table or
range so that's the really good thing if
we actually convert this to a table
we'll now be able to once we do this
press okay and rename this to something
like jobs now we can really be anywhere
in this workbook in this case I created
a new sheet I go hey insert from table
or range specifically I want to do a
table of jobs and we want to do this
existing worksheet in A1 and all the
values from that jobs table are now here
so we know we need the job title short
along the rows but then we need the job
posted month across the top which right
now we have a date we could put the date
into the columns but we get this error
message hey you cannot place a field
that has more than well 16,000 different
values for it so we're not going to do
that also before we forget I'm going to
rename the sheet to monthly count anyway
we need a monthly value here so what
we're going to have to do is the good thing about
the table itself is now that we've
created this as a table I know next to
this job posted date column I want to
insert in a column called job posted
month and for this we'll just use that
text function that we already know using
the value of job posted date and then
for the format we know we want three
lowercase M to get the month itself it's
going to fill all the way down
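Here is a minimal Python sketch of what that helper column does. The dates are invented and the function name is hypothetical. It turns each posted date into a three-letter month label the way the text function with three lowercase m does, which collapses thousands of distinct dates into at most twelve values that fit across the columns of a pivot table.

```python
from datetime import date

# Spelled out explicitly so the result does not depend on the system locale
MONTH_ABBREVIATIONS = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
]


def job_posted_month(posted: date) -> str:
    """Return the three-letter month label for a posting date."""
    return MONTH_ABBREVIATIONS[posted.month - 1]


# Hypothetical job_posted_date values
posted_dates = [date(2023, 1, 15), date(2023, 1, 28), date(2023, 7, 4), date(2023, 12, 31)]
months = [job_posted_month(d) for d in posted_dates]

distinct_dates = len(set(posted_dates))
distinct_months = len(set(months))
print(months, distinct_dates, distinct_months)
```

Like the result of the text function, the month here is plain text rather than a date, so it carries no year information.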
now we have job posted month going back
to our pivot table itself remember we're
not going to see job posted month in
here until we actually go back into
pivot table analyze and click
refresh now job posted month is inside
of here and conveniently it's also in
the correct order now this thing is
completely blank right now we need to
actually add what values we want so I'm
going to drag job title short into
values and it's going to do a count
notice here we do have column values
which go up and down and then the row
values itself so we can see what the
count of business analyst is around 101
I'm not really a fan of these things
that say row and column labels
so I'm going to toggle off field headers
to make this look a little bit better
and I'm also going to change the name of
this to monthly job count so bam this is
looking good and we compare it to our
basically non-pivot table just to make sure
that our values are correct we can see
we have 982 for data analyst come
over to data analyst we have 982 all right the
last thing we want to do is actually
filter this down and better sort our
values specifically I'm curious about
roles in the United States so I'm going
to drag that job country over here and
select United States from here to apply
to it additionally I care about the most
important jobs at the top and the least
important at the bottom mainly by this
grand total right here and so what I can
do is I can sort it by the grand total
but if you notice I removed that
filter button right here whenever I
actually removed the field headers so I
can also go in instead right click
the value inside of grand total
and I can say sort from in our case
largest to smallest so I feel like that
makes it a lot more convenient to also
sort additionally I'm noticing the
formatting isn't correct for this I'm
going to put in that comma separator and
then remove the two decimal places
similarly not only did we sort by the
grand total let's say I only wanted
maybe the top six of these right here I
could rightclick any of these job titles
right here and then go into filter in
this case I'm going to go top 10 instead
I'm going to select top six press okay
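The whole monthly count pivot can be sketched in a few lines of Python. The rows and the function name below are hypothetical, and the top filter is set to two here only because the sample is tiny. The steps mirror what we just did: filter by country, count by title and month, sort by grand total from largest to smallest, then keep only the top few titles.

```python
from collections import defaultdict

# Hypothetical (job_title_short, job_posted_month, job_country) rows
rows = [
    ("Data Analyst", "Jan", "United States"),
    ("Data Analyst", "Feb", "United States"),
    ("Data Analyst", "Feb", "United States"),
    ("Data Engineer", "Jan", "United States"),
    ("Data Engineer", "Feb", "United States"),
    ("Data Scientist", "Jan", "United States"),
    ("Data Scientist", "Jan", "Turkey"),
    ("Cloud Engineer", "Feb", "Turkey"),
]


def monthly_job_count(rows, country, top_n):
    """Count postings per title and month for one country, largest totals first."""
    table = defaultdict(lambda: defaultdict(int))
    for title, month, row_country in rows:
        if row_country == country:      # filter on job country
            table[title][month] += 1    # rows = title, columns = month, values = count
    totals = {title: sum(months.values()) for title, months in table.items()}
    ranked = sorted(totals, key=lambda t: totals[t], reverse=True)  # largest to smallest
    return [(title, dict(table[title]), totals[title]) for title in ranked[:top_n]]


result = monthly_job_count(rows, "United States", top_n=2)
for title, by_month, grand_total in result:
    print(title, by_month, grand_total)
```

Adding another job title to the data needs no change to this code, which is the same advantage the pivot table has over the hardcoded formula table.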
now that we have this all sorted I can
once again go into that design tab
change the grand totals we're going to
turn it on for columns only and Bam now
we have basically the same pivot table
that we had before with our values or
using formulas but instead now with
pivot tables and this is a lot more
customizable all right all right it's
your turn now to get your hands dirty
with some practice problems and
exploring how to make some different
pivot tables in the next lesson we're
going to go deeper with pivot tables
looking at things like grouping
hierarchy and how we can show different
values as with that I'll see you in the
next
one so let's get into some Advanced
pivot table features and for this lesson
and actually for everything in advanced
chapter we're going to be sticking with
that salary data set of over 30,000 rows
in order to actually analyze for this so
I'm not going to be calling it out
really any further into other lessons or
chapters the first thing we're going to
focus on is hierarchy which allows us to
look at things like we want to aggregate
not only the job title itself but also
by the country so what job titles are
within a country and then look at
specific values there for say like the
salary next we're going to move into
grouping focusing first on automatic
grouping basically using that job posted
date column to automatically aggregate
by year month and whatnot and from there
we'll then shift into some manual
grouping we'll be able to create groups
of different job titles and basically
break out whether we want to look at
maybe senior roles such as senior data
analyst senior data engineers and
compare them to just normal data nerd
roles such as data analyst or data
Engineers with this we're also going to
dive deep into understanding a deeper
method to analyze maybe percentages of
totals or percentages of grand totals
when analyzing these types of groups for
this you can continue working with that
workbook you were working with in the
last lesson if you did everything we
did there or you can just open the pivot
table intro for this lesson once again
as a reminder the solution is going to
be in pivot table Advanced we don't want
to open that just yet because it could
mess up what we're doing here and here
it is so we have four different sheets
that we created with this I only
really care about the data tab right now
so I'm actually going to select all
these other ones by holding control and
then right clicking to hide
them so let's actually look at what a
hierarchy actually creates I'm going to
go in and insert a pivot table from
table/range remember we're using that
table of jobs you should have named the
table that in order for this to work and
we're going to insert it in a new
worksheet I'm going to move this over
and I'm also going to call
this sheet hierarchy so for this we want
to look at the salaries for job titles
in a certain country so we're going to
start by dragging that job country over
to rows and right now there's no
hierarchy but if I drag job title short
into the rows as well when we close this
tab up here we can see that now we have
two values in here and how we have
values underneath here we've now created
a hierarchy so Albania is basically the
parent or the top of this and then we
have data analyst data scientist senior
data scientist notice there's only three
values here and that's because
Albania is sort of a smaller country they
only have three types of jobs there at
least in the data set now we want to
look at salary for this so I'm going to
drag the salary year average into the
values it's going to do a sum once again
going into value field settings I'm
going to change this to average rename
the title to salary year average and
then changing the number format to
currency with zero decimal places
pressing okay for all this bam now I'm
also curious how many jobs we
actually have with a salary value this
is just sort of an add-on so I'm going to
drag that salary year average over going
into the value field settings I'm going
to do a count of this and we'll call
this job count click okay so now we get
more of a relative idea of how many jobs
there are so in Albania we have well only five
job postings so now I want to get into
actually seeing what countries have the
highest pay now as a refresher you can
come in here and select the dropdown and
we could either select how we're going
to filter the row labels or filter the
value labels but remember we want to
sort them and right now the only
option is to sort A to Z or Z to A for
those row labels instead I can just
click making sure I'm clicking the salary
year average because that's what I care
about I can rightclick it and from there
go to sort in this case sort largest
to smallest and what it did is it
sorted the values well within each of
these but it still kept the
countries in alphabetic order instead
what I can do is select this cell for
the countries because I want to sort the
countries highest to lowest and then I
can sort largest to smallest as well now
this is pretty neat because now we can
see things like Belarus Russia Bahamas I
got to go down there have some of the
highest salaries by country and then
what those are based on the different
job titles there now sometimes I
find reading this somewhat difficult in
the manner that it's laid out here I'm
going to show you how you can actually
change this so going back into the
design tab remember we had this report
layout that we sort of breezed over last time
right now it's in this show in compact
form we can actually change this to
something like show in outline form and
it will basically shift this over and
have this hierarchy basically in two
separate columns it also makes it
a little bit
easier to sort another method is
show in tabular form so now it basically
crunches it up and I actually like this
one even better it's still breaking
out the job country and job title short
into two different columns but now it's
actually condensed into fewer lines so I
can actually see more data on here now
this is definitely a form that I'd like
if I want to hand it over to a boss and
if I wanted to convert this even further
there is this repeat all item labels
so now I could if I wanted to actually
copy and paste this into its own table
and analyze further at least now I have
like Bahamas with the software data
engineer not software data engineer I
mean software engineer or senior data
engineer anyway you may have noticed
there are some blank values in here and
that's because those postings have an
associated hourly salary but not a yearly
one what I'd need to do is apply a value
filter because it's a value so I come in
here and click the drop down go to value
filters and then maybe put something
like greater than we'll put zero and now
those values will
disappear the next analysis we're going to
do is a count by the job month but we're
not going to use this job posted
month column that we created in the last
lesson instead we're going to use
automatic grouping for this so we'll go
ahead and insert a pivot table we'll
insert into a new sheet and we'll call
this group automatic I'll go ahead and
move that to the very end okay so what
I'm going to do is I'm going to take the
job posted date remember it's a bunch of
dates and I'm going to throw it into the
rows and this is going to get into some
aggregation it's going to take a little
bit to load my computer's not even
loaded yet but it's about 15 seconds
later and it is now available if you
notice now we have this hierarchy from
this grouping and I can now dive into in
this case January and then 1 Jan here
and then diving in further we can dive
into specific times of job postings
going to go ahead and close this up if
we actually investigate over
here we can see that after I dragged
that job posted date over it basically
created months days and then the date
itself which is actually a date time but
anyway three different values in their own
hierarchy with this automatic grouping
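If it helps to see that month, day, date-time hierarchy outside of Excel, here is a rough Python sketch of the same idea: take a list of timestamps and count them at the month level and at the day level. The `posted` timestamps and both helper functions are hypothetical, invented for illustration, and this is only an approximation of what automatic grouping builds for you.

```python
from collections import Counter
from datetime import datetime

# Hypothetical posting timestamps, invented for illustration only.
posted = [
    datetime(2023, 1, 1, 9, 30),
    datetime(2023, 1, 1, 14, 0),
    datetime(2023, 1, 2, 8, 15),
    datetime(2023, 7, 4, 10, 0),
]

def count_by_month(dates):
    """Top level of the hierarchy: number of postings per month number."""
    return Counter(d.month for d in dates)

def count_by_day(dates):
    """Second level: number of postings per (month, day) pair."""
    return Counter((d.month, d.day) for d in dates)

print(count_by_month(posted))
print(count_by_day(posted))
```

The third level in the pivot table is just the individual date-time values themselves, which is the `posted` list here.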
and so now I can go in and do something
like drag the job title short into here
to get the job count I'm going to rename
this
to job count and also go in and
adjust the formatting but now whenever I
actually go into each one of these
hierarchies and look in we can see how
many job postings were happening on a daily
basis and how many were happening at
a certain date time so now let's say I
wanted to dive deeper to understanding
maybe why July had such a high number
compared to all the other months one I
could double click it or I can just
rightclick it and go to show details
this is going to show well the details
and if we actually go over to the job
posted date column it's going to have
all the values for July inside of here
so this is a pretty unique way to get
into diving deep and showing the details
of what is the data being used to
perform these aggregations and also
double check your
work now we're going to get into manual
grouping specifically we're going to
create this where we actually go through
and aggregate based on the job titles
themselves assigning each into well a group
so put data analysts data scientists and data
engineers into data nerds senior roles into
senior data nerds and then the rest into
other data nerds so we're going to
create a pivot table for this go in and
select okay using the jobs table and
we're going to be grouping the job title
short so I'll drag that into the rows
for the time being and we'll just start
by grouping just the data nerds so I'm
going to just select one of these and
then hold down control and then also
select data engineer and then also data
scientist then I'm going to rightclick
it and select group the other way I
could also do this is go into pivot
table analyze and select group selection
the next ones I want to group are senior
roles so I'm going to just select all
the different senior roles conveniently
they're all right next to each other
then I'm going to right click and go to
group so now they're their own group the
only thing left is getting the rest of
these I'm actually going to have to control
select these and then these as
well and then from there we'll group
that for group one I'm just going to
select it come up to the formula bar and
type in data nerds rename group two to
senior data nerds and then group three
to other data nerds also going to zoom
in a little bit to get a little bit
closer
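As a rough code equivalent of that manual grouping, here is a small Python sketch that assigns each job title to one of the three groups and counts the postings per group. The `job_group` helper, the "starts with Senior" rule, and the sample titles are assumptions made for illustration. In Excel you pick the members by hand, so your groups are whatever you selected.

```python
from collections import Counter

# The three titles grouped by hand as "Data Nerds" in the lesson.
DATA_NERDS = {"Data Analyst", "Data Engineer", "Data Scientist"}

def job_group(title):
    """Map a job title to its manual group (hypothetical rule of thumb)."""
    if title in DATA_NERDS:
        return "Data Nerds"
    if title.startswith("Senior "):
        return "Senior Data Nerds"
    return "Other Data Nerds"

# Hypothetical postings, invented for illustration only.
postings = [
    "Data Analyst",
    "Senior Data Engineer",
    "Cloud Engineer",
    "Data Scientist",
    "Senior Data Analyst",
    "Data Analyst",
]

group_counts = Counter(job_group(title) for title in postings)
print(group_counts)
```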
now that we have all these grouped let's
actually dive into performing a
deeper analysis on this
to look at what percentages these
make up of all the job titles and also
of their respective groups specifically
I'm going to drag
the job title short over into the values
so we're looking at the count and then at
what percentages those counts make up
anyway I'm going to change this
to job count along with going through
and updating the formatting to use a
comma and no decimal places so for this
I still want to use that
count of the job title short so with
these counts we're going to do the
percentages so I'm still going to use
that job title short column going to
drag it into the values it did a count
but now let's actually go inside of
the value field settings remember we've got
that show values as and inside of here
we can have different values percent of
grand total percent of column total
percent of row total we're just going to
go percent of grand total to start press
okay and Bam this is now showing us the
percent of the grand total now I'm not
liking how this is ordered right now so
I'm going to sort this by
selecting one of the values inside of
the job count and sorting it from largest
to smallest and then also I want to do
the actual grand total itself sort
largest to smallest anyway we updated
this to percent of grand total we need
to update the title to specify percent of
grand total and so we can see that data
nerds as the parent are taking up
about 76% roughly 3/4 of the jobs are that
and individually we can see that data
analysts are nearly 30% of that whereas
we get down to the other data nerds
they're only taking up a very small
percentage now what happens if we want
to see what share data analyst is of its
actual parent or what share cloud
engineer is of the parent other data nerds
well I can drag that job title short
into the values again it's aggregating
by count but I can go in and this time
I'm actually just going to rightclick it
and we have this show values as I'm
going to use that instead we could do that
percent of grand total but instead I'm
going to come down here to percent of
parent total and in our case it's asking
us what is the parent now we
haven't gone over this but it actually
created that grouping as job
title short 2 so I'm going to click
okay we don't want to pick job title
short that's not the parent job title
short 2 is and you can see it
actually down here job title short 2
it created inside the rows but anyway
getting back to the parent now it's
showing the percent that it takes to
make the parent and then obviously the
parent is at 100% so I'm going to rename
this one percent of parent now we just
looked at percent of grand total and
percent of parent but show values as
has a lot of other options you
can also use in here if I wanted to I can
even do something like rank largest to
smallest once again it's asking us do we
want to rank part of the parent or part
of the job title short I want to rank
part of job title short and it will show
its individual rankings underneath each
from highest to lowest I'm going to go
ahead and undo this I don't
necessarily want to keep that one more note
before we go for those that purchased the
course practice problems and notes I
also go into calculated items and fields
and have a little worksheet for
you to follow along and try out
calculated items and fields I didn't
include it in this lesson
because I felt that it wasn't a very
powerful feature I instead like
using measures which we're going to
cover in the power pivot chapter but if
you're interested in it I have
content on it in our notes and
calculated fields and items are underneath
that pivot table analyze tab in here under
calculated field and items we're not
going to be covering it outside of those
notes that you can follow along and do
your own self-study with it all right
you have some practice problems now go
through and get more familiar with
these advanced features in pivot tables
because in the next lesson we're going
to be diving into actually making charts
out of these pivot tables using pivot
charts with that see you in the next one
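As a recap of the two show values as options from this lesson, here is a minimal Python sketch of the arithmetic behind percent of grand total and percent of parent total. The counts below are hypothetical numbers invented for illustration, not the course data set, and the two function names are made up as well.

```python
# Hypothetical job counts per group and title, invented for illustration only.
counts = {
    "Data Nerds": {"Data Analyst": 30, "Data Engineer": 25, "Data Scientist": 20},
    "Senior Data Nerds": {"Senior Data Analyst": 6, "Senior Data Engineer": 5,
                          "Senior Data Scientist": 4},
    "Other Data Nerds": {"Software Engineer": 6, "Cloud Engineer": 4},
}

def percent_of_grand_total(counts):
    """Each title's count divided by the count of all postings."""
    grand_total = sum(sum(titles.values()) for titles in counts.values())
    return {
        (group, title): count / grand_total
        for group, titles in counts.items()
        for title, count in titles.items()
    }

def percent_of_parent_total(counts):
    """Each title's count divided by the total of its own group (the parent)."""
    shares = {}
    for group, titles in counts.items():
        parent_total = sum(titles.values())
        for title, count in titles.items():
            shares[(group, title)] = count / parent_total
    return shares

key = ("Data Nerds", "Data Analyst")
print(f"{percent_of_grand_total(counts)[key]:.0%} of all postings")
print(f"{percent_of_parent_total(counts)[key]:.0%} of its group")
```

The only difference between the two is the denominator: every posting for the grand total, or just the postings in the same group for the parent total.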
moving now into pivot charts so we did a
lot of work already in analyzing things
with pivot tables we're going to take it
now to the next level with pivot charts
specifically we're going to be looking
at first what is the average salary by
job title next we'll be looking at which
job has the highest percent of demand
and then lastly we'll be looking
at how jobs are trending
over time we're going to be building all
these charts using pivot tables
additionally we're going to include the
features of slicers and also timelines
based on what chart we're using in order
to be able to filter down and more
easily make our graphs more interactive
as usual in the advanced chapters I want
you starting with the Excel workbook
from the last lesson so pivot tables
advanced and if you want to see the
examples or the final answer you could
go to pivot charts for this we're not
going to be using the hierarchy or that
show details tab so I'm going to go
ahead and hide
those so let's create this first chart
to analyze what is the top paying job in
data science for this I'm going to just
create a new pivot table for this using
that jobs table and we're going to be
aggregating by job title short in the
rows and then the salary year average in
the values and for this under
summarize values we don't want to do the
sum we're going to do the average I did
this by right clicking it but we do need to
have these all formatted correctly in
currency with no decimal places and I'll
update the title as well to average
yearly salary so in order to insert
this pivot chart we're going to go to
the insert tab and we're going to come
here to pivot chart there's only one
option right now because we have
a pivot table selected and that's a pivot
chart itself so I'll go ahead and insert
it with this there's no recommended
charts but I know I want a column chart
so we're going to go with that and pivot
charts aren't that different from
regular charts I can come up here select
this plus sign I can remove things like
the legend I don't really need that and
then I can change things like the title
by just double clicking this to
something like what is the top paying
job in data science now you may notice
these pivot charts are a little bit
different as they have these field
buttons on here that basically allow you
to with the chart itself go in and
filter it this is really convenient if
say this chart was on a different
page anyway I want to have these salaries
sorted from highest to lowest so I can
come into here and you know we can go
sort A to Z or Z to A and you can change
it around we want to actually sort from
highest to lowest so I can come in here
under more sort options and I can change
this from the job title short column to
that average yearly salary column and we
want it to be descending and we'll click
okay so bam now we have our salary
oriented from high to low with our
values if you don't like these field
buttons right here you can come in and
right click it and go hide all field
buttons if you want but if you want to
get them back you have to come back
underneath the pivot chart analyze tab
and select field buttons and uncheck
this hide
all the next thing to analyze is which
job has the highest percentage of demand
we're going to use that percent of
grand total column from before and we're
going to be adding a little twist with
this one as we're going to be also
building in some slicers so we can slice
the data for what we want so back inside
the worksheet you should be working in
we want the percent of grand total
only so I'm going to move out count
percent of parent and also that rank
count to only have what we want next
we move into getting a pivot chart built
for this and once again I'm going to be
using that column chart I'll go ahead
and insert that I'm going rename this to
which job has the highest percentage
once again I don't really care about
that Legend now I want my basically
target audience whoever I give this to
to have control to be able to select
which group they can filter for whether
that's data nerds senior data nerds or
other data nerds so in order to control
that I'm going to first zoom out we're
going to insert some slicers for this so
if we come into the pivot chart analyze
tab with this chart selected
I'm going to go into insert slicer and
we're going to do it for remember
that group from last time is actually
job title short 2 and also we're going
to filter this one by country I'm
going to click okay they're going to pop
up here on top of this I don't really
like this I'm going to drag it over and
I'm going to fix the formatting real
quick so now with these slicers I can
make it a lot easier for somebody using
this to come in and say hey I only want
to look at data nerds or I want to look
at other data nerds and see what their
appropriate percentage is when you click
on a slicer you will notice that this
slicer tab comes up there's some
different formatting options the one
thing that I find myself changing
is the label or the slicer
caption in this case I would rename this
one to something like job group and then
for the job country I would just rename
this to country and you can see they
update appropriately here as a
refresher if you wanted to select
multiple different options I would
select this multi select right here and
then with that enabled I can then select
data nerds and also senior data
nerds the last visualization we're going
to be building with this is a line
chart looking at how jobs are trending
over time using that previous pivot
table we made on the job count for this
one we're going to be using a timeline
filter to be able to select down to
maybe a certain quarter or month so back
in the workbook that we're working in
I'm in this group automatic sheet we
want to create a pivot chart so I go to
insert and into pivot chart and for this
one we want a line so I'm going to go
ahead and insert that I'm going to give
this appropriate title of how are jobs
trending over time additionally I'm
going to remove that Legend and I want
to add a trend line to it now you'll notice
with this one the field buttons
have multiple different ones
here remember it did that automatic
grouping in the last lesson so you have
not only months to filter by but days and
also that job posted date so a lot more
values here now to add a timeline for
this I'm going to go up to Pivot chart
analyze and I'm going to go into insert
timeline there's only one value that's
going to be available for this job
posted date and right now if I expand
this all the way out we have all the
different months that are available I'm
going go ahead and close this up right
below it so if I wanted to filter by
specific months I could be like hey I
want from February to November or in this
case October I actually need to select
February I'm holding down my key for
this and then dragging to November
anyway I can also change this
filter to not only months but also quarters
and even something like years
I typically analyze things in quarters so
we're going to do it that manner and I'm
also going to shift it up here to the
right hand side similar to slicers if I
have the timeline selected I can come up
here and actually change the name in
this case I'm going to change it to date
I could also change things like
formatting or even things like color now
one thing to note with this timeline
it's only going to
filter the pivot chart that was
selected when I created the timeline
so let's say I came into here and I wanted
to look at in our case just data nerds
and then also go into looking at the
counts themselves this isn't necessarily
going to update for that those slicers
aren't connected to other charts but you
can change it to do that so in this case
I could select something like the pivot
table itself going into pivot table
analyze and then here under filters
where you can create things like slicers
and timelines which we did in the pivot
chart anyway they have this thing called
filter connections and I'm going to
expand this out so we can actually see
it and right now it's saying that well
for pivot table 3 as we can see up here
I probably need to give these better
names only the date is actually
connected to this if I wanted to connect
the other ones such as country or job
group I'd have to select them and press
okay now I don't know if you noticed
but these
values actually decreased because I have
fewer values selected here whereas if I
actually select more of these
the values are going to increase
anyway that's sort of hard to see let's
actually show this with sheet one
which actually should be named something like
top paying jobs and in this case I can
go into pivot chart analyze into filter
connections and this is going to show us
based on pivot table 7 which is this one
right here I should have renamed these
there's no different slicers or
timelines attached to it so I can
actually select all of these and apply
it to this one and now when I go to our
grouping right here right so we had all
of them selected if I want to just look
at data nerds here so I can see the
percentages of data analyst data engineer
and data scientist I can see what their
salaries are for it and then also I can
see their counts for those as well so
this is definitely a useful feature if
you're looking to link charts or
specifically pivot tables that are not
necessarily
connected all right now it's your turn
to get more familiar with using pivot
charts we have some practice problems
that you go through and actually
understand more about how to use them
with that in the next lesson we're going
to be jumping into well the next chapter
on Advanced Data analysis and using some
pretty unique and pretty complicated
features in order to analyze data so
with that I'll see you in that
one welcome to this chapter on Advanced
Data analysis and this entire chapter is
really focused on using addins which are
basically programs that people have
built to incorporate into Excel to do
very unique and specific tasks because
of that going from lesson to lesson
we're not going to necessarily be
building on each other as we go through
these lessons every lesson is going to
be sort of its own unique
learning journey about a specific
feature or features to start with this
lesson we're looking at just enabling
the add-ins and looking at some basic
ones such as what-if analysis and we're
going to get more into it in a second
but we're going to be focused on looking
at if we had three different job offers
which one should we actually take in the
next lesson we're going to be continuing
on with what-if analysis focusing on data
tables and this shows us how values are
going to be changing based on one or
multiple variables and then finally the
third lesson is on an add-in called
Analysis ToolPak that provides us
access to a lot of different statistical
analyses where we can just easily select
what type of analysis we want to perform
and it does all the analysis for us
and provides it in a sheet pretty neat
anyway getting into this lesson we're
going to start by first enabling these
add-ins so that way you have it and then
from there we're going to move into our
first somewhat simple example
forecasting what's going to happen into
the future specifically we're going to
look in at our past job postings and try
to predict what's going to happen in the
future from there we're going to be
moving into what if analysis and for
this we're going to have a scenario
where we have three job offers and we're
trying to find what is the most optimal
one we're going to use things like
scenario manager to go through and
automatically calculate what it should
be for those three different job offers
and then let's say we need to actually
negotiate one of those job offers and we
want to match another we can use solver
or goal seek and both of these have
their own unique features
that we're going to dive into to allow
us to adjust what we could potentially
negotiate for better job offers one
quick reminder on which versions of
excel will support this chapter on
Advanced Data analysis all of them will
with the exception of Excel online
it doesn't have the ability to add in
these specific add-ins but if you're on Mac
or the windows version you're going to
be completely fine so for this we're
going to be working inside of the
analysis addins workbook I know I said
previously you need to work with the
previous workbook from the previous
lesson but this chapter in general
doesn't build on anything it has
everything you need within the workbook
so you're going to be fine with this
anyway we just need two sheets from this
forecast original and what if analysis
all the others are just the results that
we're going to be getting and feel free
to go through and select the sheets that
we're not using so these four in this
case and hide them so that way we only
have the two sheets of forecast original
and what if
analysis so before we enable add-ins I
think you need to know what exactly
Excel add-ins are here I am in Perplexity AI
and I asked the question and it goes
on to specify what an add-in is by saying
that it basically interacts with Excel
objects and data and it can add custom
ribbon buttons or menu items
and provide custom functions now this is a
little technical but there are three
different types of add-ins they have web
Excel and COM add-ins today we're going
to be importing Excel add-ins which
are actually created using something
like VBA anyway the most popular Excel
add-ins are things like solver power
pivot power query which you don't necessarily
have to add in unless it's not included
and then also things like Analysis
ToolPak which we're going to get to in that
third lesson all right enough on the
history lesson let's actually get into
enabling your addins if you go to the
data tab right now you'll probably see
that you have this forecast section so
you do have what if analysis available
but you don't have anything ex else over
here right now it's um well usually
blank but we're going to add to it so
I'm going to go into file and then from
there it's hidden but under more I'm
going to go to options on the menu on
the left hand side I'm going to go into
add-ins and this menu right here tells
you what your active application add-ins
are right now I have no active
application add-ins and then your inactive
application add-ins so I do have access
to all these different ones right here
so I want to enable them specifically I
want this Analysis ToolPak and then
well the one we're going to use in this
lesson solver so on manage I have
Excel add-ins that's the one that I want
to actually use for this I'm going to
click go and now we need to enable which
ones we're going to use so Analysis
ToolPak for the third lesson and solver for
this one from there I'm going to click
okay and now over here on the right hand
side we have analysis pop up with data
analysis which is the Analysis ToolPak
and then solver which is the solver
add-in so let's actually get into
forecasting specifically looking at what
we expect job postings to be next
year and right here in the forecast
original sheet I have date and then also
the job count and
this is all the data for 2023
anyway this example is going to show the
custom features that we really can do
with some of these add-ins and also
built-in features so I can select the
date and job count column and then for
this we're going to go into the forecast
and specifically to forecast sheet in
this it plots in blue what are our
values that we currently have for
basically 2023 and then from there it
plots into the future using this orange
I can toggle this between this a line
chart and also a column chart but I'm
not really finding the column chart that
useful It's Time series data so I'm
going to go back to that line chart the
other major thing I control is the
forecast end date so if I wanted to only
do maybe two months I could change this
instead to end in March additionally
hidden underneath this drop down of
options is the ability to go in and
actually change other things like
confidence interval and seasonality and
things like that right now seasonality is
set up to basically detect
automatically and as you
notice in this data it goes up and down
up and down up and down it has a
seasonality to it basically every single
week there's more postings during the
week and on the weekend there's less as
expected so this seasonality is carried
out into the predicted data as you can
see here because the
orange line still goes up and down anyway
going to close this this is great I'm
going to click create in this new sheet
it automatically has this popup here
that says this table contains a copy of
your data with additional forecast of
values at the end you can manually edit
the forecasting formulas in the sheet or
return to the original data to create a
different forecast worksheet okay great
got it I'm going to zoom out a little
bit and what this table did is it still
kept that date and job count column but
it also built out three other columns
if I scroll all the way down they show
what the forecast would be a lower
confidence band and then an upper
confidence band and then looking at the
actual chart that it provides we can see
this where this darker orange color is
what the forecast is this is the
upper band and then this is the lower
band anyway that's pretty cool that I
could generate this all by just clicking
a single button of forecast
sheet all right now we're going to move
into what-if analysis and when we click this
what-if analysis we have three different
things here we have scenario manager
goal seek and data table
for this one we're going to start with
scenario manager but let's first go over
what the data is here in the sheet that
we're basically trying to
calculate first let's focus on these
columns B and C this is if you will a
dashboard or calculator that I built so
I can put into here a base salary a
bonus rate and then an annual raise
amount and it will calculate it so let's
say our base salary is 12,000 I can put
that into here assuming the same 10% and
1.5% it's going to automatically update
for this over here on the right hand
side in E through H over here we have
three different job offers that we
received and they consist of the base
salary the bonus rate and the annual
raise underneath here this fourth or
fifth row if you will these are
constraints that we're going to use
later on I would just ignore these right
now so what's going on down here in the
result cell well what we're doing is
we're
calculating what the expected salary is
for year zero all the way to year four
and then from there we're actually
getting a total so in this case this is
summing up all these values right here
so why am I doing four years why am I doing
a total for these four years well
the Bureau of Labor Statistics basically
estimates that most people have the
average tenure at a company of four
years so the idea with this calculator
that I've made is that we're able to
calculate based on a job offer we receive
what would we expect if we were to stay
at the basically average amount or
median amount of time that a normal
person stays at a job unlike just looking
at what's the first year because
sometimes things like bonuses and annual
raise may actually push us into higher
salaries even though the base salary is
lower than another salary so it
basically helps calculate this out and
even the playing field for these three
jobs that we're trying to calculate
anyway you can go through if you want to
and understand what formulas are
going on behind the scenes here but
basically I'm just taking into account
these three parameters right here and
then every year basically starting with
that previous year's and then adjusting
it for the annual raise and then giving
it its appropriate bonus so as expected
because there's an annual raise on each
one of these the salaries are going up
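The workbook's actual formulas aren't shown in this transcript, so here is only a minimal sketch of what the calculator seems to do. It assumes each year's pay is the base grown by the annual raise, with the bonus applied on top, and that the total sums year zero through year four. The function names are made up for illustration.

```python
def yearly_salaries(base, bonus_rate, annual_raise, years=5):
    """Expected pay for year 0 .. years-1 (assumed model: raised base plus bonus)."""
    return [base * (1 + annual_raise) ** y * (1 + bonus_rate) for y in range(years)]


def total_salary(base, bonus_rate, annual_raise, years=5):
    """Sum of the yearly figures, like the total cell under the year rows."""
    return sum(yearly_salaries(base, bonus_rate, annual_raise, years))


# Job one inputs from the lesson: 100,000 base, 10% bonus, 1.5% annual raise
job_one = yearly_salaries(100_000, 0.10, 0.015)
for year, pay in enumerate(job_one):
    print(f"year {year}: {pay:,.0f}")
print(f"total: {total_salary(100_000, 0.10, 0.015):,.0f}")
```

Under that assumption the job one total lands at roughly 566,000, which lines up with the figure quoted in the lesson.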
so with that what is going on here do I
need to actually go through and actually
put in every single one of those jobs so
I'll put in job one and get the 566,000
and then now do the second job and third
job no I can use scenario manager for
this so going into what if analysis I
select scenario manager and we're going
to add three different scenarios so I'm
going to come up here and select add
this scenario name we're going to call
it job one next we're going to move into
what we're going to use for the changing
cells and I've labeled these basically
or made these into an input format we're
going to select these three right here
so C3 through C5 we'll leave the comment
as is protection as prevent changes and
go to okay now it's going to ask us what
values we want to use for each in this
case I use 100,000 10% and 1.5% it's
already pre-filled in from
there I'm going to click okay now we
need to add job two for this I'm going
to leave changing cells the same this
one I'm going to change to 80,000
15% and then change this bottom one to
1.2%
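The idea behind scenario manager is storing several named sets of inputs and running the same calculation on each one. Here is a rough sketch of that idea, not of how Excel does it. It assumes the salary model is raised base plus bonus over years zero to four, and it assumes job two's base is 80,000. Job three is left out because its inputs aren't stated in the lesson.

```python
def total_salary(base, bonus_rate, annual_raise, years=5):
    """Assumed model: each year's pay is the raised base plus its bonus."""
    return sum(base * (1 + annual_raise) ** y * (1 + bonus_rate) for y in range(years))


# Named input sets, playing the role of the changing cells C3 through C5
scenarios = {
    "job one": {"base": 100_000, "bonus": 0.10, "raise": 0.015},
    "job two": {"base": 80_000, "bonus": 0.15, "raise": 0.012},  # base is an assumption
}

# The "summary": one total per scenario
summary = {
    name: total_salary(s["base"], s["bonus"], s["raise"])
    for name, s in scenarios.items()
}

for name, total in summary.items():
    print(f"{name}: {total:,.0f}")
```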
then finally we need to add that job
three one of the last steps we need to
do is now go into summary right here and
for this we need to figure out what we
want to actually have it provide for us
in our case we want the result cell of
C9 through C14 to be provided from there
we click okay and bam we're going to get
this scenario summary sheet that goes
through in details based on job one job
two and job three for the value that we
input into it and from there it's going
to tell us what year zero is year 1 2 3
all the way down to the total salary now
one thing to note is you see these names
of Base bonus raise year zero uh and
then total salary if I go back to what
if analysis I've actually gone through
already for you and actually named these so
in this case I'm selecting zero it's
named year zero and total salary if I
were to use things that were maybe not
named it would just provide the cell so
if we're using the values here it would
just provide F6 and in that
case we would have seen F6 here also back
in the scenario summary you may not have
ever seen this before but Excel allows
this sort of grouping if you will to
basically manipulate the sheets and what
values are hidden or potentially shown
here anyway pretty unique feature that
you may or may not have seen
before all right moving on to goal
seek let's say we have the scenario
now where we got the job offer for job
one in this case but we want to try to
match that of job three specifically if
I go back to that scenario summary sheet
we can see that job one is at around
566,000 but job three is at 640,000
we'll say we have some Insider
information that human resources told us
hey we can't adjust the base or the
bonus but we can adjust the raise what
raise you get every year and so you
could potentially ask for a higher raise
what raise would you need to basically
put into here to get equal to that job
three so the first thing I'm going to do
is go in and make sure that we have
inside of our formula input in the job
one actual statistics of it so 100,000
10% and 1.5% for the annual raise now I
could go through there so I type 1.7%
and then 1.8% and just keep on going up
until I actually find what it is or
instead we can just actually use this
goal seek and for this we're going to
be setting a cell specifically cell
C14 to that 640,000 that we want to get
to and we need to provide what cell
we're going to actually change in this
case we're going to change cell
C5 which is the annual raise now for this
we can only change one option we're
going to be able to change multiple in
the next scenario but not in this one of
goal seek so from there I'll go ahead
and click okay and Bam automatically
goes through I don't know if you saw
that it stepped through it and it went up to
7.6% and that's what we'll need in order
to get to that 640,000 and it even
provides a nice little dialog box saying
that hey it did find a solution
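To see what a one-variable search like this is doing, here is a small sketch that uses plain bisection on the annual raise. Bisection is swapped in for illustration and isn't claimed to be Excel's own method. The salary model (raised base plus bonus, years zero to four) is an assumption, and the `goal_seek` helper is hypothetical.

```python
def total_salary(base, bonus_rate, annual_raise, years=5):
    """Assumed model: each year's pay is the raised base plus its bonus."""
    return sum(base * (1 + annual_raise) ** y * (1 + bonus_rate) for y in range(years))


def goal_seek(target, base, bonus_rate, lo=0.0, hi=1.0, tol=1e-9):
    """Bisect on the annual raise; works because the total rises with the raise."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if total_salary(base, bonus_rate, mid) < target:
            lo = mid  # total too low, so the raise must be higher
        else:
            hi = mid  # total high enough, so try a smaller raise
        if hi - lo < tol:
            break
    return (lo + hi) / 2


# Job one inputs with the raise left free, aiming for 640,000
raise_needed = goal_seek(640_000, 100_000, 0.10)
print(f"raise needed: {raise_needed:.1%}")  # about 7.6%
```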
sometimes you may put a goal in that's
not achievable and in this case it would
it would tell
you so
7.6% is a pretty high raise let's say
we get further information from HR
saying hey we can actually change not
only the annual raise but also your
bonus we still have the same scenario
you can't change the base salary it needs
to stay at 100,000 for that first year
so we have multiple parameters now that
are changing this is when we're going to
shift from using this goal seek now
over to solver one thing before we start
we need to actually reset these values
in here I'm going to change this back to
1.5% both of these Step Up in value so
you want to reset it before you go so
opening up solver I'm going to set the
objective as before that C14 of that
total salary and we want to get it to a
salary of 640,000 and we want to do this
by like we said we can change two things
in this case the bonus and the annual
raise we can also add constraints which
we'll do in a second after we just run
through this one but I want to actually
just go through and solve it first and
the last thing we need to look at is
select a solving method we're going to
just leave it here I really like this
GRG Nonlinear we'll leave it at that for
the time being and we'll go ahead and
click solve now for this it says solver
found a solution all constraints and
optimality conditions are satisfied as
we can see it increased the bonus and
then also the annual raise and we got to
that 640,000 inside of this popup box we
can have it output certain reports so
I'm going to just hold control and
select multiple different reports along
with clicking this for outline reports
that's going to actually print to
different sheets and from there click
okay anyway the most important of these
three different reports that it gave to
us I feel is the answer report it basically
tells us hey what were the original
values put in for the bonus and raise
and then what are the final values in
order to get to that final value of
640,000 they also have these two other
reports one on sensitivity analysis and
the other one evaluating the limits
which we're not going to get to but these
I don't find as important so now with
this with solver we found that we can
input more than one different input now
we can also specify constraints if I
come back up to solver and it says Hey
in this dialogue box subject to the
constraint right now the annual raise is
sort of low still at 2.1% but that bonus
skyrocketed it was previously at 10% and
it went all the way up to 23% so we
could actually put some constraints in
by clicking add and we'll say hey the
bonus we're not going to let that exceed
15% we'll click add for that and then
for the next one we don't want the
annual raise to exceed we'll say 4% and
we'll click okay remember I did name
these cells so that's why it pops up
automatically as bonus and raise makes it
super easy whenever you name cells all
right let's go ahead and click solve so
look at this solver could not find a
feasible solution with these constraints
basically maxed out the bonus and maxed
out that annual raise and we didn't get
to that 640,000 so what I can do is I
can return to the solver parameters
dialogue click okay and in this case
I'll change the bonus to we'll say 20%
now and then for the raise we'll change
this to 5% click okay and then try to
solve again and we found a solution we
have 17% and
4.4% and for this I'm going to Output
the answers I'll click outline reports
to export it click okay close this out
and then go to the report we can see
what our final values are along with
how we got to our 640,000 final value
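Here is a rough way to reason about why the first set of caps failed and the relaxed caps worked. This is not the GRG Nonlinear method. It just assumes the total rises with both the bonus and the raise, so the caps themselves give the largest reachable total. The salary model and the helper names are assumptions for illustration.

```python
def total_salary(base, bonus_rate, annual_raise, years=5):
    """Assumed model: each year's pay is the raised base plus its bonus."""
    return sum(base * (1 + annual_raise) ** y * (1 + bonus_rate) for y in range(years))


TARGET = 640_000
BASE = 100_000


def feasible(max_bonus, max_raise):
    """The total grows with both inputs, so checking the caps is enough."""
    return total_salary(BASE, max_bonus, max_raise) >= TARGET


def raise_for_bonus(bonus, max_raise):
    """Raise that hits TARGET for a given bonus, or None if the cap is too low."""
    if total_salary(BASE, bonus, max_raise) < TARGET:
        return None
    lo, hi = 0.0, max_raise
    for _ in range(60):
        mid = (lo + hi) / 2
        if total_salary(BASE, bonus, mid) < TARGET:
            lo = mid
        else:
            hi = mid
    return hi


print("caps of 15% and 4% reachable:", feasible(0.15, 0.04))
print("caps of 20% and 5% reachable:", feasible(0.20, 0.05))

# With two free inputs there are many answers; a solver returns just one of them
for bonus in (0.16, 0.17, 0.18, 0.19, 0.20):
    needed = raise_for_bonus(bonus, 0.05)
    if needed is not None:
        print(f"bonus {bonus:.0%} needs a raise of about {needed:.2%}")
```

Since more than one bonus and raise pair reaches the target, the exact pair a solver reports depends on its method and starting values.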
all right you got some practice
problems to now go through and try these
different features out of scenario
manager and goal seek and also solver
and I think once you play around with
them more you can find out which one is
more applicable to which scenario with
that I'll see you in the next section
where we're going to be going deeper
into what if analysis specifically on
data tables one of my favorite features of
what if analysis with that see you
there let's now get wrapped up on what
if analysis by focusing on data tables
we're going to be focusing on building
one input and also two input data tables
for the first one on one input we're
going to be continuing on with that
exercise from last lesson looking into
that job offer one and seeing how we
could change the annual raise in order to
thus affect different salaries at our
4-year point and mainly the total salary
at this point and from there we're going
to shift into building two input data
tables where we're not only analyzing
that annual raise increase but also a
change in the bonus rate to see what the
different salaries are for that final
total amount of those four years so for
this lesson and also for this chapter
we're going to be starting with the
actual workbook of the name of the
chapter in this case data tables and
we're going to be working in this original
sheet but I want to jump into that one
input to basically show you what we're
going to be building
we're going to be inputting into here
the annual raise percentage we're going
to put it in increments of 0.5% and then
along the top in the row we're going to
be inputting the values from over here
um and these values right here across
the top and then the data table itself
is going to fill this in with the
expected result so in this case year
three at a 2% increase in raise
it's going to get around
116,000 we also do the coloring at the
end the data tables don't do that
that's done with conditional formatting
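A one input data table is really just the same formulas re-run once per input value. Here is a small sketch of that, assuming the salary model is raised base plus bonus and assuming the raise column runs from 0% to 4% in 0.5% steps. The names are made up for illustration.

```python
def yearly_salaries(base, bonus_rate, annual_raise, years=5):
    """Assumed model: each year's pay is the raised base plus its bonus."""
    return [base * (1 + annual_raise) ** y * (1 + bonus_rate) for y in range(years)]


# Column input: annual raise values (assumed range 0.0% to 4.0% in 0.5% steps)
raises = [i * 0.005 for i in range(9)]

# One row per raise: the five yearly figures plus their total
rows = []
for r in raises:
    years = yearly_salaries(100_000, 0.10, r)
    rows.append((r, years, sum(years)))

print("raise   " + "".join(f"year {y:<7}" for y in range(5)) + "total")
for r, years, total in rows:
    print(f"{r:<8.1%}" + "".join(f"{s:<12,.0f}" for s in years) + f"{total:,.0f}")
```

In this sketch the year three figure on the 2% row comes out a little under 117,000, in line with the "around 116,000" mentioned in the lesson.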
so here we are back in the original
sheet first thing we need to do is get
the annual raise put up here remember we
want to go in we'll say 0.5% increments so
I'll do zero
0.5% and then for the rest of these I'll
just drag them down I ended up messing up
the formatting so I'm just going to clear the
borders and then put a border back
around the outside now for the salaries
I want that to be for what year zero
then year 1 I'll drag this on over for
these and then we'll put a total so for
this I want to enter in that year zero
we're going to be doing this for all the
different values right there I'm going to
go ahead and put it in if you notice it
has this line through it and I actually
click it and then from here whenever I
look into it it provides the error of
stale value you may or may not see this
but I'm going to show you how to fix this
if you are experiencing this you can go
into file and then into more and then
options and what happens is under
formulas my workbook calculations went
from basically automatically calculating
to manually where under manually if I
look at this little icon right here they
can be manually calculated by pressing
F9 or going to formulas calculate now
anyway there's nothing wrong with having
automatic calculations that's actually
what I want all the time somehow my
thing switched into this manual if yours
does switch it back to automatic click
okay bam we're good to go and we'll
continue on now it's important for up
here at the top that we have them equal
to the formulas here because this is
what's going to be ultimately getting
changed and manipulated so I wouldn't
want to go through and actually manually
fill this in with a 110,000 it needs to
be connected to the formula that
actually is getting calculated so
building our data table now I'm going to
select this entire range right here E3
all the way down to K12 go to the data
Tab and select data table now this
provides us two inputs a row input cell
and a column input cell we're only doing
a one input data table so we only need
to fill in one of these specifically
we're looking for the input either into
the row or the input into the column in
this case we're going to be subbing in
this column this E column right
here we're going to be subbing it into
the formula here and it wants to know
what is the input cell for in this case
the column so I'm going to go ahead and
select it it's C5 I'm gonna go ahead and
click okay and it's going to
automatically fill it in now what's
unique about this is I could also go in
here if I wanted to and maybe change
this to something like 10% and it will
update this entire data table with that
new value I'm actually going to change
that back to 3% but pretty unique anyway
if I wanted to I can come in also and
go in and conditionally format
it I'm only going to select year 0
through four and I'm going to do a white
to green and then for the total I'm
going to do its own because it's almost
in its own bracket here right it's a a
sum of all those different values so I'm
also going to do the same thing of the
white to green and then you know me I
don't really like green so I'm
going to go ahead and select this and
I'm going to end up changing this by
going into manage rules and conditional
formatting selecting on this one
adjusting the color to Blue and also
selecting this one and changing this one
to Blue as well click apply and then
okay and
Bam so with that example complete let's
move into a two input data table and
let's look at the final example for this
for this we're going to have as we had
before the annual raise in the column but
this time we're going to have the bonus
up on that top row and for this we're
going to be calculating as we click here
it's going to be calculating
C14 which is the total salary we're not
going to be calculating that 0 1 through
4 anymore and it's going to go through
and calculate it for all of these
different scenarios if you will all
right to do this I'm going to go back to
that original sheet I'm going to
actually duplicate this by saying copy
it create a copy and click okay okay so
now we have original two so I'm going to
rename original to one input and then
rename original two to two input now for this
one I'm going to end up just clearing
the contents from here I'll go to
editing clear and I'll just select clear
contents and now thinking about it I
want to also clear any of the formatting
that's in here cuz we're going to be
doing something different with it I can
go into clear rules I can go clear rules
from entire sheet all right so we
have the raise in the rows and now we
need the actual bonus in the columns for
this we'll go from 0% to
5% and I need to actually change this
formatting to actually be a percentage
and then drag this all the way through
along with fixing this formatting so now
a two input data table is a little bit
different in that we need in the upper
left hand corner what we actually want
to change whereas the one input we
did across in our case we did across
the rows in this case we just want to
have in the upper left hand corner there
it is I sort of grayed it out you can
make it a little bit darker if you
want to but I would just want to make it
known that hey we're not necessarily
using it so similarly we're actually
going to get into creating it we're
going to select the entire data table go
to the data tab what if analysis data
table for the row input cell so this row
up here what are we wanting to
substitute these values into well we
want to sub it into the bonus and then
similarly for the column input same as
last time that's the annual raise so
we're going to want to sub that into C5
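A two input data table can be pictured as a grid where every cell re-runs the total with one bonus value and one raise value. Here is a sketch of that, assuming the salary model is raised base plus bonus over years zero to four. The bonus columns (0% to 20% in 5% steps) and raise rows (0% to 4% in 0.5% steps) are assumed ranges, and the names are hypothetical.

```python
def total_salary(base, bonus_rate, annual_raise, years=5):
    """Assumed model: each year's pay is the raised base plus its bonus."""
    return sum(base * (1 + annual_raise) ** y * (1 + bonus_rate) for y in range(years))


bonuses = [i * 0.05 for i in range(5)]   # row input: goes into the bonus cell
raises = [i * 0.005 for i in range(9)]   # column input: goes into the raise cell

# grid[i][j] is the total for raises[i] and bonuses[j]
grid = [[total_salary(100_000, b, r) for b in bonuses] for r in raises]

print("raise".ljust(8) + "".join(f"{b:<10.0%}" for b in bonuses))
for r, row in zip(raises, grid):
    print(f"{r:<8.1%}" + "".join(f"{total:<10,.0f}" for total in row))

# Which combinations get to 640,000 or more?
reaches = [
    (r, b)
    for r, row in zip(raises, grid)
    for b, total in zip(bonuses, row)
    if total >= 640_000
]
for r, b in reaches:
    print(f"raise {r:.1%} with bonus {b:.0%} reaches the target")
```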
going to go ahead and click okay so I'm
going to dress this up a little bit I'm
going to bold the header right here also
I'm going to merge and center this all
so we can put inside of here bonus and
then finally I'm going to conditionally
format it like we did last time using
that white to green and then changing
that green to a blue to get it more of
what I want so bam now we have a two
input table and we can see what it's
going to be across all these things also
with this if you remember from our last
lesson right we were looking at finding
what is the value we'd want to be to get
around 640,000
now we have a few different values we
can actually look at for this and we can
tell from this well we're going to need to
be above a bonus rate of 15% to even be
considered to get up to 640,000 so
sometimes I like this visually better
than going in and doing something like
goal seek or even things like solver
because now I have multiple different
variables I can look at and analyze and
try to adjust on my own all right so you
now have some practice problems to go
through and get familiar with data
tables I found when I first started with
data tables I got really confused on the
row input and also the column input
cells but really understanding how those
are being applied into the original
formula helps you figure that out all
right with that I'll see you in the next one
where we're going into the
analysis tool pack and diving into a lot
of different statistical analysis you
can do with Excel so with that see you
there all right this is the last lesson
in this chapter on advanced data analysis
and specifically we're going to be
focusing on that analysis tool pack
add-in now this add-in is packed full of
features and I can make a whole tutorial
just on this add-in alone but we're only
going to be focusing on four core things
of it that it does that I use from time
to time on our job posting salary data
set of over 30,000 rows first we're
going to look at how we can get
descriptive statistics of something like
a salary column so we don't have to go
through and use formulas to get all the
different statistics for it second we're
going to investigate how to make
histograms but these are a little bit
with a Twist in that I feel like they're
more customizable than the previous
histograms we can make third we'll get
into ranking and assigning a percentile
for our salary data so we can understand
where it actually ranks for percentiles
and then finally we're going to be
moving into looking at a moving
average if you remember our job posting
data set had all the seasonality in it
basically went up and down a lot
depending on where it was posted during
the week well we can remove those
fluctuations by a moving average for
this we're going to be working in the
analysis tool pack workbook and all the
answers are in there so you can feel free to
go ahead and actually select all the
different sheets in here and go ahead
and hide them so we only have the data
tab in there and we'll be working with
this
so as a refresher this is the data
analysis tool pack you should have gone
through in that first lesson and
actually enabled it by going into
options into the addins itself and it
should now be under the active addins if
you didn't do that remember you all you
have to do is just go into go into here
and select it all right so let's open
this bad boy up and if I click that
analysis it's going to pop up here and
this dialogue box allows us to select
like I said from a variety of different
tests that we can actually perform
there's a lot of different statistical
tests in here such as regression and
sampling and then even things like
correlation covariance and whatnot so
let's start with the one that I find
myself using the most and that's
descriptive statistics when I want to
perform EDA or exploratory data analysis this
is the first thing I want to do now the
thing about this is we need to provide a
column that has numerical values in it
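For a feel of what a descriptive statistics summary holds for a numeric column, here is a minimal sketch using Python's standard `statistics` module. The salary values are a made-up sample, not the course data set, and the sample standard deviation is used here as an assumption about which variant the tool reports.

```python
import statistics

# Hypothetical sample standing in for the salary year average column
salaries = [90_000, 105_000, 105_000, 120_000, 150_000, 200_000]

summary = {
    "mean": statistics.mean(salaries),
    "median": statistics.median(salaries),
    "mode": statistics.mode(salaries),
    "standard deviation": statistics.stdev(salaries),  # sample standard deviation
    "minimum": min(salaries),
    "maximum": max(salaries),
    "sum": sum(salaries),
    "count": len(salaries),
}

for name, value in summary.items():
    print(f"{name:<20}{value:>14,.2f}")
```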
so we could do the date column but what
we're going to do is we're going to
provide the salary year average column
go ahead and press enter for this we do
have labels in the first row so I need
to click this here for output options we
want to go to a new worksheet so that's
what we'll leave for this and with this
we do want the summary statistics you
can go in and also specify things like
confidence level and the kth largest
and kth smallest but we're going to leave
those default for the time being and
click okay now it's popped up in this
new sheet called sheet one and diving
into it I'm actually going to expand
this out and then format all these
numbers real quick so that's much more
readable so now we have all the key
statistics from it we don't have to go
through and calculate a formula for mean
median mode standard deviation the
minimum maximum sum
whatnot all right next up is histogram
and previously remember we could just
select something like the M column go
into insert here and actually insert a
histogram now the one problem I have
with this is the formatting of the rows
or the X values down here it basically
provides this range this is a lot of
data right there and it's really
hard to format this so let's look at an
alternate option for this using the data
analysis tool pack specifically we're going to
come in here to histogram for the input
range once again I'm going to go ahead
and just select that column M press
enter it does have labels for bin range
I'm going to leave it empty I'm not going to
specify a width of the histogram or the
bin I'm going to leave it just default
for the output I'm going to leave it as
the new worksheet ply I don't want
either of these the Pareto or the
cumulative percentage instead I just
want the chart output of this press okay
and here we have the histogram it's
honestly not too special it's a little
hard to read based on the size of these
bins as you can see basically the
difference between these is around it
looks like they're doing basically a
thousand increments so the increments
are way too small we need to adjust the
bin anyway the one good thing is along
this x-axis it's only one value now so a
lot easier to read so now let's go in
and adjust that bin size so if I go back
to data analysis into histogram and
click okay for the bin range it wants me
to actually put in a range or a
selection so we need to actually
pre-fill out what range or bins we want
for this so I'm going to copy this
header up here cuz we're going to keep
the bin and frequency start a new sheet
paste it in here and I want to go in
we'll say 50,000 increments so 0
50,000 and I want it to go to basically
400,000 so now going into Data analysis
again histogram opening it back up still
has the input range selected correctly
now for the bin range I'll select A2 to
A10 select the output range I
basically want it to be inside of
this notebook so I'm going to select up
here on D1 we'll just start there and we
want a chart output on this page okay
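Here is a small sketch of how a bin range like this turns values into counts. It assumes each bin counts the values up to and including its boundary, with anything past the last boundary falling into one extra bucket. The salaries are a made-up sample and `histogram` is a hypothetical helper.

```python
from bisect import bisect_left

# Bin boundaries in 50,000 steps: 0, 50,000, ... 400,000 (nine values, like A2 to A10)
bins = list(range(0, 400_001, 50_000))


def histogram(values, bins):
    """Count values per bin (value <= boundary); the last slot holds the overflow."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts


# Hypothetical salaries, including one outlier past the last bin
salaries = [45_000, 50_000, 98_500, 120_000, 125_000, 210_000, 399_000, 650_000]
counts = histogram(salaries, bins)

for boundary, count in zip(bins, counts):
    print(f"{boundary:>9,}  {count}")
print(f"{'More':>9}  {counts[-1]}")
```

Setting the boundaries yourself is what gives the control over the x-axis: the outlier ends up in the overflow bucket rather than stretching the chart.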
I'll click okay and I'm getting this
error message that the input range must
contain at least one data point right
now this M is not referring back to
the correct sheet it needs to look at so
actually I'm going to select right here
you can see it selected that other sheet
I actually want to select the M column
of the data tab now we'll press okay so
now I love this because once it output
this I didn't apparently need to do this
frequency thing I got confused anyway we
can actually go in and format this to
remove the legend and then update the
axis title for salary and then we'll
update this one for frequency anyway I
really like this because now look at
this control we were able to minimize it
not to go past 400,000 and have all these
outliers and everything else that is
past 400,000 is put into this basically
More value anyway this is my
preferred method for making histograms
especially whenever I need to control
that
x-axis next up is rank and percentile and
with this one we're going to be doing a
rank and percentile of that salary year
average column once again now this one
depending on the size of your computer
may take a while it may even crash your
computer so if you're concerned that
this is not going to be able to be
performed on your computer don't run it
just look at my example and understand
what you get out of it anyway I selected
rank and percentile and then for the
input range once again I'll select that
column M and then we'll output it to a
new worksheet ply and I can do something
like even name it in this case calling
it Rank and percentile of the sheet that
it's going to go to so clicking okay it
says Rank and percentile input range
contains non-numeric data basically I
forgot to click this of labels in the
first row clicking again it's thinking
how long is it going to take all right
so Excel just crashed on me maybe that wasn't a
great idea let's try that again using
Rank and percentile this time instead of
selecting the whole column I think
because it had some blank values
especially down to a million rows it sort
of crashed it instead what I'm going to
do is I'm going to just select A1 and
then select down all the way to the
bottom I don't know why it changed it
over to column F but the main point of
me doing this is that way we select
column M and also I need to remove this
A1 at the beginning okay and also need
to update this to be starting the second
cell and we're going to try this again I
gave it the name of rank percentile I
didn't have the labels in first row
selected because we're going from the
second cell how long is it going to take
this time all right so that was a lot
quicker this time and we now have in
this rank and percentile sheet our
actual data it did take about a minute
to do so once again if you have a
computer that's not necessarily that
fast don't try this at home all right so
some key statistics about this it
provides a point which is the row number
itself and then from there what is
the value that's the column the rank and
then the percentile what's cool about
this is because it provided the point we could
do something like the index function where
you provide it an array and then the row
number in this case that's the row
number so if I wanted to find out what
the job title is I could select column B
and then from there for the row number
go back to rank and percentile and
select this value right here then close
parenthesis press enter looks like it's
a clinical NLP data scientist and I can
actually autofill this all the way down
anyway let's make sure this is actually
correct okay yeah just double checking
the row number at 25589 is clinical NLP
data scientist so we have it correct
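The point, rank and percentile columns, plus the index lookup back to the job title, can be sketched like this. The percentile formula (share of the other values that are lower) is an assumption about how the tool computes it, and the titles and salaries are made up. The simple counting is slow on a big column and is only meant to show the idea.

```python
# Hypothetical rows standing in for the job title and salary columns
titles = ["Data Analyst", "Data Engineer", "Data Scientist", "Business Analyst"]
salaries = [90_000, 130_000, 150_000, 90_000]


def rank_and_percentile(values):
    """Return (point, value, rank, percentile) rows sorted from highest to lowest."""
    n = len(values)  # assumes at least two values
    rows = []
    for point, v in enumerate(values, start=1):
        rank = 1 + sum(other > v for other in values)        # ties share a rank
        percentile = sum(other < v for other in values) / (n - 1)
        rows.append((point, v, rank, percentile))
    return sorted(rows, key=lambda row: row[2])


rows = rank_and_percentile(salaries)
for point, value, rank, percentile in rows:
    # Like the index function: use the point (row position) to pull the title
    title = titles[point - 1]
    print(f"{point:<3}{value:>9,}  rank {rank}  {percentile:>6.1%}  {title}")
```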
anyway I could go through now and I did
this for the job title itself but you
could imagine you could pull out things
like the job country job title short all
sorts of other key information and get
this in a list of what its rank is
along with its
percentile our last feature to look at
is moving average and this is what we're
going to be calculating here the Blue
Line already is data we already have of
what are the job postings over time but
that orange line is the moving average
we can use this analysis tool pack in
order to calculate this and as you can
see it removes a lot of these
weekly fluctuations if you will
from it and makes it a lot more
readable to see where the actual
peaks and the troughs are now in
order to do this I can't necessarily
just put in that job posted date into it
I have to actually get a count of the
dates and also what are the counts of
the job postings per date so we need to
create a pivot table so we go in insert
pivot table from table we're going to do
it from this table which is named jobs
and we're going to insert it into a new
sheet similar to before we're going to put
that job posted date into the rows and
I'm actually going to take out you can
see it aggregated by month I'm going to
take out the month from there so it goes
by days and now I'm going to throw into
the values here it's going to do a count
so I'm just going to change this to job count and
we can actually visualize this by itself
by going to insert pivot charts
inserting in a pivot chart we want a
line and that's what we saw before with
our Blue Line before that showed how it
basically went across uh went through
time
all right so I'm going to go ahead and
delete this chart because we're going to
be making it and once again we're going
to that data tab into Data analysis and
we're going to be performing moving average
for the input range I'm going to select
B4 and then select all the way to the
Bottom now this grand total went into it
so actually I'm going to back up one and
change this to 368 we didn't select any
labels in the first row so I'm going to
leave that blank in the interval I'm
going to just set it something like
seven for the time being for the output
range I want it to go right next to my
chart so I'm going to copy this above
and paste it below and change these B's
into C's so it's C values right next to
it and we want a chart output along with
standard errors I'm going to go ahead
and click okay now this chart is not
correct um we made a little bit of a
mistake but I did want to show you real
quick this moving average we can see
that it starts 7 days later right here
and so that's what's happening in this C
column here that's the actual moving
average and then the actual error itself
is right next to it it's pretty
consistent around 30 to 40 anyway we
need to fix this we need to take this
entire value if you will and move it out
of a pivot chart so I'm going to select
this all the way down to the bottom and
copy it then inside of a new sheet I'm
going to come in and paste it I'm going
to just paste looks like a pasting with
the pivot table formatting I'm going to
paste uh the values only and change this
to job date so let's try this again
using data analysis going to moving
average for the input range we're going
to select B2 and then all the way down
to the bottom remember this has a grand
total so I actually need to change that
to minus one for the interval I'm going
to adjust it a little bit I'm going to
actually change this now to a 21-day
moving average and then for the output
range this actually needs to be adjusted
to match what the input range is but for
c anyway go ahead leave
everything else checked click okay and
Bam now we have blue and also orange if
you will for the actual and the forecast
now one thing I'm noticing with this
chart is well the markers are pretty
heinous they're clogging
up this chart so what I can do is Select
something like this orange line right
here I can right-click it go to format
data series and then here underneath
this fill and line go into markers and
then for the marker options just
basically do none we just want to have a
line instead additionally we can just go
ahead and click the blue line
and for the markers there we can do none
as well okay sensory overloads gone now
looks a lot more readable with the
exception of down here for some reason
it didn't pick up the dates on mine and
we can adjust that by right clicking
that and going to select data underneath
the horizontal axis labels I'm going to go
ahead and edit this I'm going to select from
A2 all the way down minus one we don't
want to do grand total click okay that
changed the names let's see if that
updated the chart and Bam it did now I'm
going to do some minor cleanup I'm going
to remove that Legend from there and
that looks a lot better so now we have a
graph of our moving average of the job
postings and as we sort of suspected in
August we had a peak along with January
seemed sort of high then went down a
little bit but then up again in August
so we see a lot more Trends and then
tapering out towards the end of the year
all right now it's your turn to go
through and practice with those practice
problems and explore some of these
features in the Analysis ToolPak
add-in with that we're going to be wrapping
up this chapter and in the next one
we'll be jumping into power query which
I'm super excited about to see how to
clean up our data and load it in in the
format that we want easily all right
with that see you
there welcome to this chapter on power
query and no pun intended but this is
one of the most powerful tools within
Excel it allows us to perform ETL
processes or extract transform and load
which is just some fancy data engineering
talk for connecting to a data source and
loading it in after you clean it up
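To make extract, transform, and load a little more concrete, here is a rough Python sketch of the three stages. It is only an analogy for the idea, not how power query itself is implemented. The column names, the sample rows, and the three function names are assumptions made up for this example.

```python
import csv
import io

# Hypothetical raw export: stray spaces and a missing salary.
RAW_CSV = """job_title_short,salary_year_avg
 Data Analyst ,90000
Data Engineer,
Data Scientist,120000
"""


def extract(text):
    """Extract: read the raw source into rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim text and convert salaries to numbers (blank -> None)."""
    cleaned = []
    for row in rows:
        salary = row["salary_year_avg"].strip()
        cleaned.append({
            "job_title_short": row["job_title_short"].strip(),
            "salary_year_avg": float(salary) if salary else None,
        })
    return cleaned


def load(rows, destination):
    """Load: append the cleaned rows to the destination table."""
    destination.extend(rows)
    return len(rows)


table = []
loaded = load(transform(extract(RAW_CSV)), table)
print(loaded, table[0])
```

Because the steps are written down once, running them again on a new export repeats exactly the same cleanup.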
anyway in this chapter we have five
lessons specifically in this one we're
going to have an intro to power query
what it's all about how to actually
connect to a data source in the next one
we'll be moving into the power query
editor and we'll be covering that for
three lessons to go into how to
actually clean up your data and get it
prepared in a format that you want in
the last lesson we'll be diving into the
M language which is powering power query
don't worry we're not going to do any
in-depth coding or anything like that I
just want you to have some familiarity
with it so you have more experience with
using power
query so what's this lesson about well
in order to understand that we have to
understand what power query is and
here on Microsoft's learning platform
they have this fancy-dancy diagram that
basically shows what power query
does it allows us to connect to
different data sources it could be
something like a database a text file or
even something on the cloud from there
power query will then pipe it in to a
bunch of different products they have
and we're going to be using it for
Microsoft Excel but it's also famously
in Power BI now if you have a
Windows version of Excel power query is
going to work just fine on the Mac
versions it is available however it's
very limited so a lot of the stuff we're
going to do within this lesson you're
not going to be able to do and in
Excel online it's just completely not
available so as a reminder power query
is an ETL tool or extract transform load
and we can connect to a data source
such as this here's a Wikipedia page on
the list of S&P 500 companies and it has
all the different 500 companies that are
part of the S&P 500 anyway let's say I
want this table I could go through and
try I mean as you can see I'm trying to
select it right now and it's like
selecting the whole page it's a whole
mess if I'm trying to get this but we
can actually use power query to extract
all these components out all I have to do
is go in and provide the web address of
this which I know it's located right
here I'll then select which of the
tables I want out of the web page which
is this one right here and then I just
load it in and here it is in our
workbook now don't worry I sort of ran
through that example real quick we're
going to go more in depth and Detail in
the last example in this lesson but I
just wanted to show the power of this
and how we can actually get data even
from online into our workbook so easily
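If you want a feel for what picking one table out of a web page involves, here is a small Python sketch using only the standard library. It parses an inline HTML snippet rather than fetching a real page, and the class name, ticker symbols, and company names are invented for the example. It is a simplified illustration (it does not handle nested tables), not a description of how power query does it.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Hypothetical page with two tables, like a list plus a change log.
PAGE = """
<html><body>
<table><tr><th>Symbol</th><th>Security</th></tr>
<tr><td>AAA</td><td>Example Corp</td></tr></table>
<table><tr><th>Date</th><th>Added</th></tr>
<tr><td>2024-01-01</td><td>BBB</td></tr></table>
</body></html>
"""

parser = TableExtractor()
parser.feed(PAGE)
companies = parser.tables[0]  # choose the table we want, like in the navigator
print(len(parser.tables), companies)
```

The page hands back several tables and we pick one by position, which is the same kind of choice the navigator window asks for.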
so why do we need to use power query
well we're going to find that out as we
go along but I'm going to give you the
tidbits right now one it automates
the ETL process so I don't have to do
that annoying task of going to a sheet
and copying it over every time I get new
data I can just get power query to do it
for me additionally sometimes
I may make mistakes when I'm copying and
pasting sheets over so with power query I get
reproducibility and then finally with
this I'm now able to bring data in
that potentially exceeds that 1 million
row limit of Excel which we'll show how
we can deal with that in a bit so let's
actually get into performing our first
example of loading in a simple data set
specifically from another Excel sheet
like I talked about at the beginning of the
advanced chapters you're not going to be
able to actually work inside of the
workbooks that I have given so in this
case power query intro has the final
results but I don't want you working in
that I'll tell you what workbooks you need
to be working with as we go through this
and if you open it you're probably getting the
security warning that external data
connections have been disabled and we'll
get to troubleshooting that at the
end so instead we're going to be
starting with a new blank workbook I'm
going to navigate over here to the
data tab this is where power query is
located specifically under this get and
transform data it doesn't really say
power query but that's where power query
is hidden now anytime I'm importing any
data I typically go to this get data and
then from there I navigate Down Deeper
depending on whether it's from a file a database
from fabric from power platform or even from
other sources for some of these
they also have smaller icons
right next to it that you can hover
over and it basically highlights okay this
is from web and then this is from a
table or range and whatnot we're going
to be going over multiple examples in
this video so don't worry if you're not
following along with which data sources
you can actually import I think you'll
have a good idea by the end of
this so what are we going to import
first well if you navigate into our
course folder under resources under data
sets and then data jobs monthly we
have Excel files for every single month
we're going to start by just importing
one Excel file to start and then in the
next exercise we'll go into how to
import all these at once anyway we're
going to start simple first with just
this Excel file so for this I'm going to
go to get data and it's a file
specifically it's from an Excel workbook
inside the course folder I'm going to
then navigate to the data set going to
resources data sets monthly and then
select that January data set and click
import with power query you're going to
find that it has this Navigator window
pop up and from there it will show you
what it's actually importing in this
case January data jobs the Excel file
and then if it had one or multiple
sheets it will appear there underneath
it whenever I select sheet one it then
shows me to the right hand side a
snapshot or a preview of all the
different data in there it doesn't show
all the columns but a snapshot of it at
the bottom there's a few options to load
or load to and then also transform we're
going to keep it simple for the time
being and we're just going to load so
I'll go ahead and click it so we just
imported in this data set from another
worksheet it's already in its own
table and because it also was sheet one
it's naming the sheet sheet one
parenthesis 2 to signify it's the second
one so congratulations we just completed
our first ETL process of actually
extracting transforming and loading an
Excel workbook into another
workbook so we loaded this table in but
how do we actually go about using it
well in this portion we're going to be
demonstrating how we can manipulate it
with a pivot table and how to basically
control all our different queries if you
notice we had over on the right hand
pane this queries and connections now if
it's not popping up you can go up here
to the data Tab and then you see queries
and connections you can navigate it on
and off by clicking this button power
query sets up these queries and in this
case it named it sheet one after the
sheet one in that workbook that we imported
in and if we hover over it we can get
some details about the columns when it
was last refreshed its load status and
even data source now connections over
here on the right right now we have zero
connections that's actually what's
controlled by power pivot which we're
going to be going over in the next
chapter on power pivot but anyway back
to Power queries itself right now we see
with sheet one that 3,000 rows are
loaded and if necessary we can go through
and refresh the data set it's showing it
loaded the data and 3,000 rows are
loaded again pretty quick so let's
actually get into manipulating this well
it says that 3,000 rows are loaded but I
can actually go in and delete this tab
and it's going to give you this warning
that it's going to permanently delete the sheet do
you want to continue yes I do and
whenever I do that since the data is no
longer loaded it now displays that it's
connection only so we can actually
change where we load our data to if you
will and I can get to this by right
clicking it and then going into here and
we'll be exploring all these other
options as we go through but I only
want to focus right now on this load to
and they have a few different options in
here let's actually explore them right
now it has only create connection so
right now it only has a connection if we
go back to that table and click okay it
once again loads it into that table if
we want to actually get it into a pivot
table we'll select this one pivot table
report we can also do a pivot chart and
it asks whether we want to put it in the
existing worksheet or a new worksheet
and then finally it has add this data to
the data model you've seen this one
before and once again we're going to be
going over data models more in depth in
chapter eight on power pivot so we're
not going to be enabling this checkbox
just yet anyway I want it in the existing
worksheet I don't need that table there
so I'm going to click okay and it says
hey there's possible data loss because
we're going to be basically getting rid
of that table and replacing it with a
pivot table do I want to continue yeah
and now like we did before in the pivot
table chapter we're now using a pivot
table and so we can put things like job
title short and analyze it for the count
of different jobs that it has within it
there's no change whatsoever in
everything we learned in pivot tables
it's still the same application that we're using
it for here
so now let's actually get into
importing multiple different Excel files
we're going to specifically be importing
all 12 of these of January through
December this time I go into
the data tab under get data and we want
to get it from a file but I'm not going
to select an Excel workbook instead what
I'm going to do is select a folder
because all those Excel files are in the
same folder inside my course I'll
navigate into resources data sets and
then I'm going to select the folder
itself and select open now you may
notice the Navigator window looks a
little bit different and that's because
now it contains the metadata of these
Excel files themselves such as the name date
accessed modified created and whatnot and
with this one where before we had that load
and load to along with transform data
now there's also combine so
we're just going to go into combining
this data set so I'm going to go ahead
and click that and specifically we're
going to use combine and load now we
navigate to a window we're more familiar
with of combine files and what this is
doing is showing how it's going to
actually combine the files in we
need to make sure that they're all
the same format and if I actually click
sheet one where the sample file it's
looking at is the first file this is
what it looks like and we know this
already because we looked at the January
file anyway if you wanted to you could
also change this to a specific file I'm
fine with just using first file
selecting a sheet if I was having errors
I would do skip files with errors but
I'm not worried about that just yet I'm
going to go ahead and click okay and bam
now we have once again that table
loaded into here and this has all the
data so I expect it to have around
30,000 results similar to what we've
been working with before and it looks
like it does and if you notice we have
this new column right here of source
name which tells which Excel file each
of these comes from and just doing a
cursory check it looks like all the
different months are in there now onto
this queries and connections pane up
here I'm going to actually make this
smaller so I can actually see it all
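The folder import above boils down to: read every file in a folder, stack the rows, and tag each row with the file it came from. Here is a minimal Python sketch of that idea. It uses small CSV files in a temporary folder as stand-ins for the monthly Excel workbooks (the standard library cannot read .xlsx), and the function names, file names, and `source_name` column are assumptions for illustration only.

```python
import csv
import tempfile
from pathlib import Path


def combine_folder(folder):
    """Stack every CSV in the folder and tag each row with its file name."""
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                combined.append({"source_name": path.name, **row})
    return combined


def write_month(folder, file_name, titles):
    """Demo helper: write one small monthly file with a single column."""
    with (Path(folder) / file_name).open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["job_title_short"])
        writer.writerows([title] for title in titles)


with tempfile.TemporaryDirectory() as folder:
    write_month(folder, "01_jan.csv", ["Data Analyst", "Data Engineer"])
    write_month(folder, "02_feb.csv", ["Data Scientist"])
    first_load = combine_folder(folder)

    # A new file lands in the folder; running the same steps again picks it up.
    write_month(folder, "03_mar.csv", ["Data Analyst"])
    after_refresh = combine_folder(folder)

print(len(first_load), len(after_refresh))  # 3 4
print(first_load[0])
```

This only works cleanly when every file shares the same columns, which is why the files need to be in the same format before combining.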
previously we only had our sheet one
query but now we have also this data
jobs monthly query and with that up here
at the top because we're connecting
multiple different files we have these
helper queries that were created during
the process so you can navigate over
these and basically see that hey it used
the September file as a sample and this
is the steps it took or this is what the
sample file actually looks like anyway
I'm not too concerned with those helper
queries right there or with anything
underneath this transform from files I
mainly care about what's under those
other queries so we have sheet one and
data jobs monthly speaking of which
sheet one is a really bad name for this
so I'm going to rename this to data jobs
January I also rename the sheet so just
to prove with the data jobs monthly that
we actually imported it all in we're
going to go in and load to and we're
going to do this time a pivot chart
going to go ahead and click okay we're
doing the existing sheet with the table
I don't care if I get rid of that table
so I'll click okay and similar to before I'm
going to put that job title short this
we're going to put in the axis or the
rows and then we're going to want a
count of that as well and then I'll just
organize this in descending order based
on the count of job title short so bam
we now connected with power query to
multiple Excel files and imported them in at
once I hope you realize that now this
unlocks a lot of potential because say
you get January of next year's data you
could just put it into this folder here
and then just all you need to do is go
back into the data tab click refresh
it's going to go through and refresh all
that data set and pull those new numbers
in all right in this example you're not
going to follow along I just want to
show the power of power query okay the
pun's getting old by now anyway I have
this CSV file or comma separated values
basically it has commas separating
everything I'm looking at this in VS
Code don't worry about any of this stuff
like I said you're not doing it the main
point is to show this data set itself
right here it's starting at the top row
of one and if I scroll all the way down
we get to the last entry and that's
2.7 million jobs that I have here in
this data set we can actually import
this into Excel now if you recall if you
scroll all the way down to the bottom of
excel it only includes about 1 million
rows so how the heck are we going to do
this with power query so this is a CSV
file I'm going to go to data get data
from file specifically it's a text or
CSV and I'm going to import in this data
jobs large file that I have reminder
again you don't have access to this file
it's just too big to even get onto
GitHub so that's why this is a demo only
this is the data set itself so I'm going
to go in and actually load
it now this is taking a little bit of
time as you can see it's loading around
100,000 rows as it goes through also
well it has three errors now in here
this usually appears whenever it has a
row of data that doesn't necessarily
make sense for what it's supposed to
import so it alerts you there's an error so
the 2.7 million rows are loaded but I
get this error message the query
returned more data than will fit on a
worksheet remember it automatically by
default tries to load it into a table
into Excel and it's telling me that hey
it's not going to fit so I'll click okay
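Here is a quick Python sketch of the limit behind that message. A modern Excel worksheet holds 1,048,576 rows, so 2.7 million rows cannot be placed on a grid, but a summary such as a count per date only needs running totals and never has to hold every row at once. The column names and sample rows below are hypothetical, and this is an analogy for the idea rather than a description of Excel's internals.

```python
import csv
import io
from collections import Counter

EXCEL_MAX_ROWS = 1_048_576  # rows available on one modern worksheet


def fits_on_worksheet(data_rows, header_rows=1):
    """True if the data plus its header row fits on a single sheet."""
    return data_rows + header_rows <= EXCEL_MAX_ROWS


def count_by_date(lines):
    """Stream rows one at a time and keep only a running count per day."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row["job_posted_date"][:10]] += 1  # keep the YYYY-MM-DD part
    return counts


# Hypothetical three-row sample standing in for the large file.
sample = io.StringIO(
    "job_posted_date,job_title_short\n"
    "2023-01-01 08:00:00,Data Analyst\n"
    "2023-01-01 09:30:00,Data Engineer\n"
    "2023-01-02 10:00:00,Data Analyst\n"
)
counts = count_by_date(sample)

print(fits_on_worksheet(2_700_000))  # False
print(dict(counts))  # {'2023-01-01': 2, '2023-01-02': 1}
```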
now it's still going to try to load that
table but it's going to cut it off at
that roughly 1 million row limit but this doesn't mean
we can't analyze it if I hover over
this query it reminds me that the
results of this query are too large to be
loaded to the specified location
worksheets have a limit of 1 million
rows sure instead what I'm going to do
is go and load to and I'm going to load
to a pivot table click okay and it's
going to warn me again about the table
loss yeah I know so once it's loaded it took like
a hot minute to do that I can actually
go through and now analyze these 2.7
million rows so if I do something like
put the job posted date into the rows
and we also want to get a
count of this so I'm going to put the
job posted date also into the values so
we get the counts anyway reformatting
it with commas to actually be able to
read this now we can see that we did
actually get in 2.7 million different
data points for this and as a side note
this is all the data that I've collected
since I started in 2022 doing this so
there's a lot of different jobs so Excel
is not necessarily limited to just
analyzing 1 million rows of
data all right so let's finally get into
that last example of importing in this
list of S&P 500 companies you
don't necessarily have to use this table
from Wikipedia but I'll drop a link
below on where this table is located and
you can use that if you want so I copied
the web address of that page then I'm
going to come in here and select
this one from web you can do basic or
Advanced with Wikipedia it's perfectly
fine to do the basic version putting in
that URL clicking okay we get into that
Navigator window and there's actually
multiple tables inside of here one is
the list of 500 companies and the second
one is a list of companies that have
been added and also removed from there
they also just have random tables in
there as well just because on the
internet you're going to have random
tables like this one of main menu
contents tools appearances not
applicable anyway we want to do table
one I'm going to go ahead and click load
and now that we have it in here anytime
we do this we probably need to rename it
appropriately from something like table
one to S&P 500 in this case and bam
scrolling down we can see that we have
should be 500 oh a little bit more
than 500 apparently the list has been
updated to include a little bit more I
don't know why that is but we got all the
data
nonetheless now quick note on if you
want to actually navigate into any of
the files and see what I've done
whenever you go to open it so in this
case I want to open power query intro
I'm going to open it up you're going to
get this warning that external data connections
have been disabled do you want to enable
content in this case yes you want to
enable all that now the problem you now
may also have is that it may give you a
warning that your data source settings
aren't correct and what do I mean by
that if I go into data and then under
get data we're going to see this thing
here for data source settings and it's
managing settings for your data sources
anyway you're going to see these
locations here these are file locations
of the data sets and they reference the
files that are on my computer that's not
going to be the same for your computer
it's probably going to be in a different
location with a different name so here I
know that this is the data jobs monthly
folder if I wanted to actually go in and
update it with the actual location for
where it is I would go down here select
change source and then from there select
browse to navigate to it you're going to
once again navigate to your course of
excel. analytics into resources data
sets and then there's that data jobs
monthly click open and okay and then
it's going to update you're going to
have to do that for this file and all
the files within a power query and also
power pivot because your file locations
are not the same as my file locations
then after you do all that it should go
through and refresh but if it doesn't
you can manually refresh it underneath
the data tab by clicking refresh
all the last item to call out is the
options menu we're going to be going
into the query editor in the next video
so we're going to save that for that one
anyway query options has a lot of
advanced details for controlling power
query in this case show the query
peek when hovering on a query in the
queries task pane that's sort of
annoying to me it pops up every now and
then I'm going to go ahead and unclick
it but they also have different
behaviors you can control for data
load for the power query editor
security privacy and even diagnostics so
feel free to go through this and
navigate and see what is available to
actually customize with this I'm going
to go ahead and confirm my changes with okay
and now whenever I go to the queries and
connections and actually hover over
something like data jobs Jan it doesn't
just pop up on the screen and sort of
catch me off guard so I sort of like
that all right we now got some practice
problems for you to go through and get
more familiar with performing basic ETL
with power query and loading in some
different data sources with it with that
we'll see you in the next one we're
going to get into the power query editor
anyway nothing to be intimidated by as a
lot of the core principles we've learned
already in Excel are going to be applied
to this new window so you're going to
pick right up on it all right with
that I'll see you in the next
one in this lesson we're going to be
continuing on with power query focusing
on specifically getting you introduced
to this power query editor and in order
to facilitate this we're going to be
going through or walking through
actually importing and cleaning up our
data science job posting data set that
has over 30,000 rows of data we're going
to be automating a lot of the steps
using power query that previously we had
to use functions and formulas for so
it's going to be saving us a lot of
time by actually automating this
data
ingestion for this we're going to be
starting out with a blank workbook so I
know we do have this power query editor workbook
but I don't want you actually editing
from that that's more for a reference
now if you do open this file in order to
reference it as we go along
remember you're going to have issues or
an error saying hey data source isn't
there remember you need to go in and
actually select where this data set is
so under the data tab get data and then
under data source settings you're going
to need to update this link or this
address right here of where you're
actually accessing the data job salary
all Excel file this is my location not
yours got to update it anyway like I
said we're not going to be using this so
I'm going to open up a new workbook and
like before we're going to be importing
in that data set so we'll go to get data
from file from Excel workbook you'll
navigate to the course itself under
resources under data sets and then we're
going to be using this data jobs salary
all Microsoft Excel file go ahead and
import this in we're going to select
that sheet one and this time instead of
doing load or even the load to we're
actually going to go into transform data
and this is now going to pop open the
power query editor and this is where all
the magic happens behind the scenes in
order to get our data cleaned up so
let's go over a quick overview of the
window itself it's laid out very
similar to Excel up at the top we have a
ribbon with four different tabs of home
transform add column and view we'll be
walking through each one of these as we
go through this lesson underneath here
on the left hand side we have which query
we have selected once we're building
multiple queries they'll start popping
up underneath each other we can close
this if we want to make more room
right here in the middle is the
current step or the current status
of our data set now yours may look
a little bit different right now
specifically I have this column
distribution enabled underneath the view
tab which I'm going to go into more in a
second but anyway it basically outlines
all the different columns or where we're
at with the data set itself before we
finally load it in now right above this
area is a formula bar similar
again to the Excel UI and this has
the code in the M language
run in the current step if you will of
actually cleaning up this data set and
you're like step like what step well
over here on the right hand side we have
our query settings and in it we have the
name of our query and then we have the
applied steps this lists all the
different Transformations that we've
walked through so just a brief walkthrough the
first step is source and if I look at
the formula bar basically what it's
doing is it's connecting to that Excel
file with the file path that it has in
the next step of navigation it's
basically selecting hey out of that
Excel file actually select sheet one
from there to actually load in then from
there we can see that the headers are
actually in the first row and not up at
the top so the next or third step is to
promote the headers up to the top and
then finally the last step is change
type it actually goes through and
assigns for each of these what data type
it is so in this case job title short it
assigns to type text whereas something
like job posted date it assigns to type
number which needs to be a date which we're
going to fix in a little bit down
at the bottom there's a few statistics
on this specifically talks about 16
columns and over 999 rows and it tells
you when the last preview is downloaded
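One way to picture the applied steps is as a chain of small functions, each taking the previous step's output. The Python sketch below mimics promoting headers and changing types, and also shows the date fix hinted at above: turning a date stored as a serial number back into a calendar date. It assumes the common Excel convention where serial numbers count days from 30 December 1899 (valid for dates from March 1900 onward). The function names, column names, and sample rows are invented for illustration; this is not the M code that power query generates.

```python
from datetime import date, timedelta

EXCEL_EPOCH = date(1899, 12, 30)  # assumption: standard 1900 date system


def promote_headers(rows):
    """Use the first row as column names for every row after it."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def change_type(records, types):
    """Apply one converter per column, like a change type step."""
    return [{key: types[key](value) for key, value in rec.items()} for rec in records]


def serial_to_date(serial):
    """Convert a date serial number to a date (time of day is dropped)."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


# Hypothetical raw sheet: headers sit in the first data row.
sheet1 = [
    ["job_title_short", "job_posted_date"],
    ["Data Analyst", "44927"],
    ["Data Engineer", "44958"],
]

promoted = promote_headers(sheet1)
typed = change_type(promoted, {"job_title_short": str, "job_posted_date": float})
fixed = change_type(typed, {"job_title_short": str, "job_posted_date": serial_to_date})

print(typed[0])  # the date column is still just a number here
print(fixed[0])  # now it is a real date: 2023-01-01
```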
anyway if I just wanted to stop here
with this data transformation if you
will I would just come up into home go
into close and load we're just going to
do close and load to and in this case
I'm just going to put it in a pivot
table
analyzing job title
short specifically the
count that we have of each we can
see totaling it all up we have around
32,000 anyway that's a quick overview
let's actually get into exploring each
one of those tabs in the power query
editor so we're going to go back to data
get data and from there you can just
select this launch power query editor
similarly you can also use a shortcut of
just alt F12 I'm on a Mac so I have to
press option
but actually launching this up boom it
has it with just a shortcut anytime you
launch it it may be grayed out here so
we need to make sure that we go in and
actually select a query that we want to
analyze and
transform for this overview we're going
to start with the view tab because
mainly I want to get into actually how
we can use the power query editor for
EDA and thus save us a lot of time of
actually having to analyze it in Excel
in the spreadsheets itself instead we
can do it right here so going through
this first thing is you can toggle on
and off the formula bar I always leave
the formula bar on so I don't know why that's
an option next is the data preview I can
change the font type I can also check the
column quality so this is telling us if
there would be a potential error in here
or if in this case of job location if
there's empty values you typically have
error values whenever the data type
isn't being understood correctly
so in this case job title short is text
everything in there is text if
I were to change this to number and press
enter to run I'm going to get errors all
the way through here because well that
was text and you can't convert that text to
numbers also not sure why but it should
say 100% error but it's not anyway they
also have this green bar up at the top
and you can use this that's what I
actually prefer so I'm going to unclick
column quality on the view tab
because you can actually hover up
here and see and then also toggle it so in
this case for salary year average it looks
like 60% of them are valid and
40% are empty now remember
the data set is around 30,000 or
32,000 rows but it's only profiling as it says
down here on the bottom column profiling
based on the top 1,000 rows so that's
all we're seeing right here if I wanted
to see all of the data itself now
depending on how big it is we may not
want to do this I can select this at the
bottom and choose column profiling based on
entire data set and it's going to reload
back into here not sure how long it's
going to take now going over I can see
there's 22,000 data
points for the salary year average and 10,000
are empty the other thing that you may
have enabled by now is that column
distribution to be able to see
the breakdown of distinct
and also unique values before investigating
what distinct and unique actually mean I
went back to job title short and it looks
like now it's actually picking up on all
the different errors so I'm going to
change this back we don't want
this to be number for job title short
we're going to change this back to text
and I'm also going to refresh the
preview by going to that home tab
basically refreshing it to get it all
cleaned up anyway if we recall from our
previous analysis there's 10 different
job titles such as senior data scientist
data engineer and whatnot and so that
is the 10 distinct values distinct
counts each different value once no matter
how often it repeats like right here in rows six
and seven data engineer appears more than
once now if we go over to something like
job country they have 111 distinct
meaning 111
different countries and only 12
countries that show up exactly once which is
what unique means all right the last
thing in data preview is column profile
and this is pretty neat right now I
have the job title short column selected
it provides one on the left hand side
key statistics about the column and then
two it actually shows a breakdown of the
value distribution of it so this is
really helpful in performing EDA if I
wanted to go through here and actually
see something so I can easily go in and
even see something like job country and
see how United States has the majority
of the values and then how the different
other countries fall underneath that
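To pin down the difference between valid, empty, distinct, and unique, here is a small Python sketch of a column profile. Distinct is the number of different values; unique is the number of values that appear exactly once. The function name and sample column are made up, and this sketch leaves empty cells out of the distinct and unique counts, which may not match exactly how the editor treats them. The optional `sample_size` argument mirrors profiling only the top rows instead of the entire data set.

```python
from collections import Counter


def profile_column(values, sample_size=None):
    """Summarize a column: valid/empty share, distinct, unique, distribution."""
    sample = values if sample_size is None else values[:sample_size]
    if not sample:
        raise ValueError("cannot profile an empty column")
    filled = [v for v in sample if v is not None and v != ""]
    counts = Counter(filled)
    return {
        "valid_pct": 100 * len(filled) / len(sample),
        "empty_pct": 100 * (len(sample) - len(filled)) / len(sample),
        "distinct": len(counts),  # how many different values exist
        "unique": sum(1 for n in counts.values() if n == 1),  # seen only once
        "distribution": counts.most_common(),  # most frequent value first
    }


# Hypothetical job_country column with one empty cell.
job_country = [
    "United States", "United States", "India", "Canada",
    "", "India", "United States", "Sudan",
]
profile = profile_column(job_country)

print(profile["valid_pct"], profile["empty_pct"])  # 87.5 12.5
print(profile["distinct"], profile["unique"])      # 4 2
print(profile["distribution"][0])                  # ('United States', 3)
```

Profiling only the first few rows, for example `profile_column(job_country, sample_size=4)`, is faster but can miss values that only show up further down the column.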
this takes up a lot of room and sort of
valuable real estate so I find myself
toggling this column profile on and
off all right last few sections in this
view tab go to column if you have a
large data set with a ton of columns you
can just come down here select the
column you want to go to and then it
will navigate you to it parameters this
is beyond the scope of this course we're
not going to be enabling parameters or
even using them so we'll call this n/a
next is the advanced editor which allows
us access to basically the behind the
scenes of our M language which
we're going to be breaking down further
in an upcoming lesson so we're going to
save that but you can also access that
from the home menu in advanced editor as
well lastly is query dependencies
whenever it gets into complicated ways
that you're actually building your
different queries and how they're
connected to each other this is going to
come in handy in this case it's
showing that hey we connected to that
Excel file on my MacBook and we loaded
it into a pivot
table all right next up is query
settings I'm actually going to go ahead
and close this out for queries over here
anyway with the query settings we can
actually change the name of the query if
we want to in this case is named sheet
one I don't really like that I'm going
to name it something like jobs and I
know it has salary data in it so I'm
going to add salary down here on the
applied steps like we mentioned this is
a step-by-step walkthrough of each of
the individual steps that power query
has taken to actually clean up our data
set now one thing I will call out in
this if I need to modify anything so in
this case if I wanted to modify the data
source here I could come inside of here
into the formula bar and edit it I would
encourage you if you're not familiar
with the formula bar or comfortable
using it to instead
click this settings icon over here
on the right hand side and then
typically a window will pop up and allow
you to edit it so I could technically
change the location of this or change
what type of file it is the same for
navigation as well I can basically pull
back up that navigation window that I
had before and change the sheet I wanted
to for the change type this doesn't
really have a gear icon next to it for
us to edit so we're about to go through
and actually change it but if we inspect
the job posted date we'll see that here
one it has it underneath the type number
but then actually looking at the column
itself it's a number value because
remember Excel stores dates as
number values behind the scenes well we
could convert this to a date by typing
in date here but you may not be
comfortable doing that just yet anyway
with that that's a great segue into the
Home tab into how actually we can change
something like a data
type with the Home tab we've already
seen a lot of things right we
saw the close and load to we also saw
that I can go through and actually
refresh my query to make sure that
it's fully loaded and up to date if I
have multiple queries I can not only do
this refresh preview I can go to this
refresh all and it does a refresh of all
queries we've already seen the advanced
editor before properties just allows us
to actually go in and change the name of
this query if you want to and manage is
more advanced we'll be diving into that in a
little bit similar to under the View tab
with go to column we also have this
option of choose columns and go to column
we can also just actually select a
column if you will so if I wanted to
actually select job posted date or even
more than that I can just do that and
it's going to select it and it's going
to actually remove all the other columns
which is not what we want to do which
brings up a good point if we want to get
rid of a step all we have to do is
come over to the applied steps and
there's a red x mark that will appear
over any step that you do so I'm just
going to go ahead and click X here and
it's going to remove anything that I've
done moving on to remove columns which I
think is pretty self-explanatory if you
want to remove a column you just select
it and you select remove column
additionally if I want to remove all
other columns so in this case job title
let's say I want to keep that I could
select remove all other columns and it
would do that I want to cancel this step
so I'll click X similarly to remove
columns we have well keep rows and also
remove rows and then we have options for
also sorting our values if we want to
sort them from a to z or Z to A
depending on a column so back to job
posted date maybe I wanted them in
numerical order I could just click A to
Z and it would go through and actually
sort it anyway I don't really want to do
this I'm going to clear this step as
well this brings us actually into what
we want to do of we want to change this
job posted date to a date time and
that's what we're going to use underneath
this transform section in the Home tab
right now this data type as I'm
selecting this job posted date it shows
that it's a decimal number I go to
something like search location it
changes to text so what I want to do is
change this data type of decimal number
to specifically a date time because
that's what we have in here we have date
and time now this popup is going to come
up if you're doing this underneath the
step that has changed type already what
it's noticing is that the selected
column has an existing type conversion
would you like to replace the existing
conversion or basically preserve that as
a number and add a separate step I'm
just going to go ahead we're going to do
replace current but I just want to show
what it looks like of adding another
step in this case I converted it in this
step to a number and then the next step
I converted it to a date time I don't
like having a bunch of steps I want to
make this as concise as possible so I'm
going to clear that step instead and
instead this time whenever we go through
it and select date time I'm going to say
hey replace current now underneath here
it updated that job posted date type to
date time and it's all within one step
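For anyone curious what that number-to-date conversion is doing behind the scenes, here is a rough sketch in Python rather than in Power Query. It assumes the workbook uses Excel's default 1900 date system; the function name and the sample serial number are made up for illustration and are not from the course data set.

```python
from datetime import datetime, timedelta

# Assumption: Excel's default 1900 date system. Anchoring at 1899-12-30
# gives correct results for serial numbers from 1900-03-01 onward.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial number to a datetime.

    The whole part counts days, the fractional part is the time of day.
    """
    return EXCEL_EPOCH + timedelta(days=serial)


# Hypothetical value: 45000 days plus half a day
posted = excel_serial_to_datetime(45000.5)
print(posted)  # 2023-03-15 12:00:00
```

The decimal part is why the column shows up as a decimal number and why date time, not just date, is the type that keeps all the information.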
love this similarly to that date time I
also want to convert the salary year
average and the salary hour average
columns right now they're decimal
numbers which is nothing wrong with that
but I actually have the option to change
it to something like a currency in this
case once again I want to replace the
current step for that I'm going to do
the same for salary hour average and
change that to a currency as well for
replace current covering briefly these
other sections in the Home tab first up
is merge and append we're going to be
covering an entire lesson on this and
how we can actually take different Excel
files and different queries and combine
them together with this manage
parameters is outside the scope of this
course I don't find myself ever really
doing this so not something we need to
worry about data source settings similar
to what we saw outside of power query
in Excel basically the same popup is
going to come here to allow you to
change where your data source is and
then down here at the very end if we
wanted to put in a new query I
wouldn't necessarily have to back out of
the power query editor I could just come
in here and select a new source a file
or database or other source and then
work through actually importing it in as
a query sometimes I find myself also
using this one of enter data say I had a
simple table that I wanted to input into
Power query to have I could go through
and just create that
table all right next up is the transform
tab and this one I feel
although it looks like a lot of
options is probably one of the
simplest as you can see we have things
like text column number column date and
time columns structured columns
basically if we have a data type of this
we can go to it if I
have a number column I want to go to
this and see what things I could do to
it such as statistics I
could do rounding to it or I could even
get information out of it if it's even
or odd I also have this section on any
column that basically applies to any
column this allows us to one like we
saw in the Home tab actually convert the
data type of something but also even
more advanced Transformations such as
pivoting and unpivoting columns which
we're going to be diving deeper into in
the next lesson on Advanced
Transformations and finally we have this
section on tables which just does more
of generic things to this data set such
as if I wanted to actually go through
and count the rows on this I could and I
find out I have 32,000 different rows on
this anyway I actually want to transform
a column of this specifically this job
via column as you notice from here that
all these different job platforms have
via and then a space right at the
beginning of it I want to actually
remove that so in order to do this I
make sure that one job via column is
selected I notice up here in the any
column section it has the data type of text now
there are a few options underneath
the text column section for like
splitting columns I could split it by
this half and then delete that via but I
find actually the easiest way to do this
is just go through this replace values
and we're not going to do replace errors
we're going to just do replace values
itself and we find a value in here in
this case we want to find via with a
space and we want to replace it with
well nothing if I wanted to I could go into
advanced options and I have a few
different selections available but
neither of these is applicable so
we're going to just go ahead and click
okay and Bam now we have these job
platforms cleared up now we've been
going through this and keeping the names
of these steps the same but sometimes I
like to be more descriptive when it's
not a generic step now it named this new
step replaced value I may actually do
that a few times and I want to be able
to whenever I go back to this actually
be able to identify what steps did what
in this case change type promoted
headers navigation Source those are all
only usually typically done once so I
know what that means however however for
this one I don't know so I'm going to
right click it and go to rename and I'll
say this is replaced via in job via
which is much more descriptive in my
opinion all right only one more tab to
cover and that is add column with
transform we transformed a current
column with add column we're adding an
additional column to this similar to
transform it has these options for text
number and also date and time so very
familiar features with this so let's say
I wanted to extract the month and the
year out of the job posted date column
basically I want a column for month and
I want a column for year anyway
previously we learned with that
transform tab if I were to come into
here under date time and then select
something like month it's going to
transform this column so it's going to get
rid of the contents of the job posted
date which is not necessarily what I want I
want a new column so I'm going to
actually get rid of this step so with add
column what this does is with that job
posted date column selected I select
date in this case I want month I could
do start of month end of month day of
month whatever I just want the month
itself and then inserted month is pretty
descriptive I however don't like the
name of this so I could come in here
this is an option and change it I double
click on this and name this job posted
month and then press enter now with this
I'm going to get a renamed columns here
so now I have two steps of this month
was inserted into this and then we
rename the column I would encourage you
to minimize the amount of steps you have
because these queries can get quite long
in this case I'm going to delete this
rename column go back to this inserted
month if we actually read this you
don't actually need to understand what's
going on much in here but I can see
basically that we have this month in
quotation marks and this is named month
so I basically can reason that this is
probably the new column title of this so
instead of using month I'm just going to
edit this in the formula bar to job
posted month then I'm going to click at
the end and press enter and now all
within one step I inserted that month
and renamed it as well if you're not
comfortable doing that feel free to go
through that next step of actually
double clicking this and actually
changing it but I would encourage you if
you can actually try to mess around with
the formula if you make a mistake it's
pretty simple to just X out of that step
and then redo it again so there's no
harm to your actual data set now
similarly if I wanted to create that job
posted year column I could just go
through here select year whether I want
start of year end of year year itself
again it inserts year and then I would
want to change the name of this and
change this to job posted year and then
click enter and Bam now we have it I
don't actually need this all these are
from 2023 so this is not
going to provide any useful data for me
so I'm actually going to delete this
step all right I want to do one last
transformation before we actually load
this and go to actually visualize
this so we have our salary year average
column and I also want to compare
this to the salary hour average column
but right now this is on a yearly basis this
is on an hourly basis what we could do
is do a conversion to our salary hour
average column to get it to an equal
value or comparable value to our yearly
value meaning we could put the number of
hours in a year multiply it times this
value and from there get what would be
the expected yearly salary for this hour
data so I could do this via the
transform tab right going into that
number column under standard we want to
actually multiply and there's 2080
working hours in a year for a 40
hour work week I could go through
and actually do that and that's going to
update this column itself but remember
we probably want its own column so I'm
not going to use that instead we'll go
to add column with this salary
hour average column selected select
standard multiply put in those hours of
2080 and then click okay once again I'm
going to rename this I can see that this
multiplication column is titled this via
in this step right here so I'm going to
rename it to salary hour adjusted and in
this case I'm going to also rename this
step to adjusted hourly salary to yearly
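The arithmetic behind that multiply step is small enough to sketch in Python. This is only an illustration under the same assumption the lesson makes, a 40 hour work week for 52 weeks; the function name, the snake_case column names, and the sample rates are made up.

```python
# Assumption: a full-time year is 40 hours a week for 52 weeks.
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52
WORKING_HOURS_PER_YEAR = HOURS_PER_WEEK * WEEKS_PER_YEAR  # 2080


def hourly_to_yearly(hourly_rate):
    """Scale an hourly rate to a comparable yearly figure; blanks stay blank."""
    if hourly_rate is None:
        return None
    return hourly_rate * WORKING_HOURS_PER_YEAR


# Hypothetical rows standing in for the salary hour average column
rows = [
    {"salary_hour_avg": 45.0},
    {"salary_hour_avg": None},
]

# Like add column, this writes a new key and leaves the original untouched
for row in rows:
    row["salary_hour_adjusted"] = hourly_to_yearly(row["salary_hour_avg"])

print(rows[0]["salary_hour_adjusted"])  # 93600.0
```

Rows with no hourly rate keep a blank in the new column, so they don't drag the later averages down.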
now I'm sort of a stickler for keeping
my data set in order right now I have
this job posted month and it's
pretty far away
from my job posted date I would actually
want to move it right next to it so
there's a couple options I can do to
move it I can select the column and then
come up here to the transform tab and
move it left right to beginning or to end
or I can actually just take it and then
drag it and this is taking forever it's
like watching paint dry but find where I want it
boom plant it in and then it inserted the
step of reordered columns I'm going to
do the same thing with salary hour
adjusted and put it right next to salary
hour average and both of these done with
one step of reordered columns so I'm
fine with
that so now let's actually get into
analyzing this specifically I want to be
able to analyze and compare this salary
hour adjusted column that we just
created compared to the salary year
average so going back to home I'm going
to close and load this in we have this
previous analysis that we did before
doing EDA on the jobs but I actually want to
create my own from scratch all right so
back on sheet one we can see our queries
and connections specifically that data job
salary remember in the data tab you can go
into that and toggle on all the
queries and connections anyway we want
to insert I want to analyze that hourly
adjusted salary so I'm going to come in
to create a pivot chart we can also do pivot
chart and pivot table at the same time
anyway when this pops up for pivot table
or pivot charts we're not
going to select a table or range because
this is a power query connection if you
will we're going to use this external
data source and we're going to say
choose connection what connection do we
want to use for this specifically I want
to use that data job salary so go ahead
and click that and open and we're going
to insert it into the existing worksheet
so now the pivot table is set up for us before
going forward one quick note you may be
tempted say if we went back to jobs EDA
to rightclick this and then go load to
and let's say hey I wanted to create a
new pivot chart well the problem is it's
going to then get rid of this pivot
table that we previously created so you
don't want to necessarily if you want to
keep this you don't want to actually do
that back to the pivot table itself
you'll notice now because we have these
queries and connections you can
toggle between the two over here on the
right hand side anyway what I want to
compare is that salary hour adjusted to
that salary year average right now it's
doing sums we don't want that so
we go to value field
settings we're going to do average here
we're eventually going to do median I
promise you but we're going to stick with
average for the time being I'll adjust
both of these to be average then I'm
not really liking the formatting here I
know we adjusted it as currency back in
power query but this is the one
data type that I find doesn't actually
follow through in actually making it into
the correct data type when you import it
into Excel so you do need to go back
still and actually convert it into the
correct thing anyway we're seeing that
the hourly salary is much less than the
yearly salary and moving this over we
can also see this via visualization this
doesn't really show as much I would
rather look at this when compared to job
type
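What the pivot table is about to compute, an average of each salary column per job title, can be sketched in Python like this. It's a minimal illustration, assuming blank cells are skipped the way an Excel average skips them; the helper name and the sample rows are made up, and only the column names echo the lesson.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(rows, group_key, value_keys):
    """Average each value column per group, skipping blank (None) cells."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        group = buckets[row[group_key]]
        for key in value_keys:
            value = row.get(key)
            if value is not None:
                group[key].append(value)
    # A column with no values at all in a group stays blank (None)
    return {
        name: {key: mean(vals[key]) if vals[key] else None for key in value_keys}
        for name, vals in buckets.items()
    }


# Hypothetical rows: each posting has either a yearly or an hourly figure
rows = [
    {"job_title_short": "Data Analyst", "salary_year_avg": 90000.0, "salary_hour_adjusted": None},
    {"job_title_short": "Data Analyst", "salary_year_avg": 100000.0, "salary_hour_adjusted": None},
    {"job_title_short": "Data Analyst", "salary_year_avg": None, "salary_hour_adjusted": 62400.0},
    {"job_title_short": "Data Engineer", "salary_year_avg": 130000.0, "salary_hour_adjusted": None},
]

result = average_by_group(rows, "job_title_short", ["salary_year_avg", "salary_hour_adjusted"])
print(result["Data Analyst"])
# {'salary_year_avg': 95000.0, 'salary_hour_adjusted': 62400.0}
```

Swapping `mean` for `median` from the same module would give the median version the lesson promises to get to later.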
so I'm going to go ahead and grab job
title short and throw it into the axis
now closing out of this and then closing
out of this on the side we can now get a
better view of this I'm not liking the
format of this pivot chart specifically
I'm going to go in here design under
change chart type and change this to a
bar chart I feel like it's going to be
easier to read yeah it's a lot easier to
read also for these visualizations I'm
going to rightclick this and I'm going
to say hide all field button so that
make this easier to view and I'm going
to go ahead and stick The Legend at the
bottom okay we're off to a good start
other things I want to do to clean this
up is oh my goodness this is so long I'm
going to change these column titles to
hourly adjusted salary and then yearly
salary additionally I want to sort this
a little bit better specifically from
high to low so under sort options more
sort options I'm going to go into
sorting this ascending based on the
yearly salary from high to low sorry
that's actually descending selecting
yearly salary clicking okay no it was
right the first time it's ascending okay
this is looking good you know also I
don't like having different colors I
like actually going with a consistent
theme so going into design change colors
I'll change this to this monochromatic
palette 8 and bam we now have our final
visualization where we used power query to
basically ingest all our data in clean
it up create this new column of hourly
adjusted salary perform an analysis in
Excel to average it and we can see that
consistently the hourly salary is well
below that of the yearly salary so I
guess it pays to have a salary job all
right we have some practice problems for
you to now go through and test out all
these different features and get more
familiar with the power query editor in
the next lesson we're going to be going
into advanced Transformations and Diving
deeper specifically in analyzing skills
and using power query to actually clean
it up so we can actually analyze
skills with that see you in that
one all right welcome to this lesson
we're going to continue on with power
query specifically focusing on using
more advanced Transformations and for
this we're actually going to get into
analyzing those skills and being able to
put them on a graph and actually
visualize what are the top skills of
data nerds now if you recall way back in
the functions and formulas chapter when
we went over text functions we did a
little bit of text cleanup to clean up
this column and then plot it but we were
only able to do that with around 20 rows
now with the power of power query we're
actually going to be able to clean up
all these values and be able to
visualize it for all 30,000 job posts
so let's jump in if you want to you can
continue on from that worksheet that we
used in the previous lesson and just
make sure that you do go through and
actually save it before you continue on
however if you got lost along the way or
you just don't have that file anymore
feel free to use the lesson or the file
from the last lesson power query editor
once again you don't want to be using
the actual one working cuz that has the
final results we're going to want to
work with that one and this has all the
different work that we did it also has
some additional analysis where I
looked at plotting it over time to see
how the salary of yearly versus
hourly
compared anyway let's get into editing
this and we can get to the power query
editor by going up to get data launch
power query or pressing alt F12 once it
loads I need to click on the query
that I actually want to look at and I'm
going to close this or minimize this the
first thing that I want to do is start
an index column on this data set because
in general whenever you have a source
data set or a fact table like this is
you want to have an index associated
with it yeah these row numbers are good
but that's not good enough and we'll be
using it more in the power pivot chapter
but it's good practice to start it now
so moving over to the add column tab I'm
going to go to index column it allows us
to start from either zero or one I'm a
coder so I like from zero now Pro tip I
want this index at the front now I could
go to transform and then move and
then move this to the beginning but
remember we did this reordered columns
right here so what I'm actually going to
do is take this added index put it
before reordered columns now that the
reordered columns is right there
whenever I select this index and move
this over to beginning it's going to be
included in part of this step of all of
our column reordering so I don't have once
again multiple different reordered
columns
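As a rough picture of what the index column adds, here's a small Python sketch. It's only an analogy for the add column step, with a made-up helper name and sample rows; the default start of zero mirrors the choice made in the lesson.

```python
def add_index_column(rows, start=0):
    """Return new rows with an 'index' key placed first, counting up from start."""
    return [{"index": i, **row} for i, row in enumerate(rows, start=start)]


# Hypothetical rows standing in for the job postings
jobs = [
    {"job_title_short": "Data Analyst"},
    {"job_title_short": "Data Engineer"},
]

indexed = add_index_column(jobs)
print(indexed[1])  # {'index': 1, 'job_title_short': 'Data Engineer'}
```

Because every posting now carries its own number, rows that later get split apart can still be traced back to the posting they came from.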
all right in order to clean up this job
skills column we're going to end up
putting these skills right
now they're separated by commas inside
of this list we're going to be breaking
them up into their own individual rows
and because we're breaking this up into
different rows this now is going to put
for this Row one value here this is
going to make 1 2 3 4 5 6 7 this is
going to make seven different rows of
data this is going to mess up anytime we
want to analyze anything because imagine
if you have like a salary data it's then
going to appear seven times so the main
point of explaining that is we want a
new query to actually populate and
actually break these skills out into
their own separate rows so in order to
create a query or another query right
now we have one query to create
another query from this we have two
options and that's underneath Home tab
they have manage and we can either
delete a query which we're not going to
do we can either duplicate it or
reference it I can also get to this by
just right-clicking the query and it
also has these of duplicate and
reference let's actually look at both of
those starting with duplicate first so
I've created my duplicate query and as
you can see it basically has a duplicate
of the original query nothing really has
changed from it now this is cool if I
want to walk through all the different
steps again and I wanted to have it in
this new query but I actually like this
other option so I'm going to go to data
job salary right click go down and
select reference okay this query this
one named three is referencing data job
salary and it only has one applied step
if we look at the applied step all it is
doing is referencing data jobs
salary so this first query right now and
populating it for us and this is really
good because say now I make changes to
the original query such as say I want to
go through and I don't want any more
of the hourly data in here I only want
the yearly data so I filter down to only
have the yearly data so now it's
filtered these rows for the yearly data
don't worry we're actually not going to
do this I'm going to delete this step but
anyway if I go to that referenced query
the one with the three at the end this
one only has year values in it this I
can verify is 100% yearly by looking at
either the column distribution or the
column profile everything is year anyway
we don't actually want to do that step
so I'm going to go back to this original
query clear the filtered rows and once
again it's going to just clean this back
up to have two distinct values so
checking the salary rate yearly and
also hourly okay so we like the
reference for our case because we may
make changes to the original one so I'm
going to delete this number two because
remember that was the duplicate and
we're going to keep the number three one
which was the reference we're also going
to be doing all our alterations on the
skills on this one so I'm going to
rename this one data jobs
skills so with this new query data jobs
skills let's actually get into cleaning
up this column of data of job skills
specifically we're going to be
separating this into each of these
skills into the new rows by this comma
delimiter but we need to remove a few
things from this specifically this has
brackets around it and it also has
single quotes we don't need any of that
we need to remove it so going to that
transform tab we're going to go into
replace values and we've done this
before so for the value to find I'm going
to just start with the first square
bracket we want to replace with nothing
I'm going to click okay additionally we
want to replace the other bracket as
well replace it with a blank and then
finally we want to replace that single
quote as well also I'm going to just
rename these all next thing we're going to
do is actually split these columns on
this delimiter of a comma so under
transform we can go here to split column
it has a few different options by
delimiter number of characters by
positions we can go to by delimiter I'm
going to select that for this we're
going to use a comma delimiter because
there's multiple different options you
could potentially use for this we want
to split at not just the leftmost but we
want to split at each occurrence there's
no quote characters in here we removed
all the quote characters so I'm going to
click none and then click okay so now we
just split these skills into let's see
how many different columns we have here
looks like we have up to 24 skills for
all these different skills that we have
so now what we need to do to get all of
these if you will skills within a single
column we need to unpivot them but the
one issue right now so I have all these
skills right here but we also have all
these other columns right here I don't
really care about all of them I want
to mainly just analyze job title short
and index so what I'm going to do to
make this easier because I need to
basically select which columns I want to
remove or which ones I don't want to
remove in this case so what I'm going to
do is go back to source and this one has
before we've actually broken up the job
skills so I'm going to select job skills
hold down control and then from there
select job title short and also index
and then underneath the Home tab we're
going to go to remove columns and what we're
going to do is remove other columns
basically going to keep those three
columns that we have now we are doing
this in the applied steps after that
first step of source so it's asking hey
do we want to insert this step yes we do
and so now we've limited it down to
those three columns and Bam now whenever
we go down here down to that last step
of change type we can see that we have
all our different job skills and then
over on the right hand side we have our
index and our job title short which I
don't really like the order of this I'm
actually going to go back to reorder
this over here I'm going to just take
these column values and then put them in
this order of index job title short and
job skills so now we actually get into
unpivoting these job skills columns
basically making all these job skills
into one column so I'm going to select
instead of selecting all the job skills
columns I'm actually going to select the
opposite holding control select the
index and job title short and I'm going
to go into the transform tab into unpivot
columns and for this one once again
we're going to use the other we want to
unpivot other columns and go ahead and
do this all right so what we do here we
now have this new column of attribute
and value attribute if we go back that
is just the name of the column that was
created previously and then the value is
what was in the cell itself and that's
filled with all the skills so personally
I don't really care for use of this
attribute so I'm going to go ahead and
just remove this column by right
clicking and selecting it additionally
I'm going to go back up here and I don't
want this to be named value so I can go
in and inspect this under unpivot other
columns I can see in here that it
renames these columns attribute and
value in this case I don't want it to be
value like I said I want it to be job
skills clicking enter boom renamed it to
job skills and then in here it is job
skills
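To tie the last few steps together, here's a minimal Python sketch of the same cleanup: replace values, split by the comma delimiter, then unpivot. It's an analogy rather than what Power Query runs, and it collapses the split and the unpivot into one loop instead of creating 24 columns first. The function name and the sample row are made up, and it assumes the skills arrive as a single text value.

```python
def explode_skills(rows):
    """Turn one row per job posting into one row per (posting, skill)."""
    exploded = []
    for row in rows:
        raw = row["job_skills"]
        if raw is None:
            continue  # no skills listed, so nothing to unpivot
        # Replace values: strip the square brackets and single quotes
        cleaned = raw
        for char in "[]'":
            cleaned = cleaned.replace(char, "")
        # Split at each comma, then emit one output row per skill
        for skill in cleaned.split(","):
            if skill:
                exploded.append({
                    "index": row["index"],
                    "job_title_short": row["job_title_short"],
                    "job_skills": skill,
                })
    return exploded


# Hypothetical posting with three skills in the list-style text
jobs = [
    {"index": 0, "job_title_short": "Data Analyst",
     "job_skills": "['sql', 'python', 'excel']"},
]

exploded = explode_skills(jobs)
print([row["job_skills"] for row in exploded])
# ['sql', ' python', ' excel']
```

Note that splitting on a bare comma leaves a leading space on every skill after the first, and that each output row keeps the index of the posting it came from.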
now one thing that's bothering me real
quick before we continue on to actually
visualizing this data is this column
here typically I like to name things
something like job underscore whatever it is
in this case index I want to name job
underscore ID but if you recall we created
this back in this data jobs salary
query specifically here under the step
of added index I want to change this
from index as we've done before going in
and renaming it to job ID however
whenever I do this press enter this is
going to break my queries and this is
going to happen to you anytime you're
manipulating it so I think we need to
get familiar with it so if I go to the
next step of reordered columns we're going
to have this expression error the column
index of the table wasn't found duh
because we named it job ID in the
previous step instead of index but this
step is still the same so what I can do
is come in here change index to job ID
press enter and Bam that updates but
then now going to data job skills we're
going to have the same thing you're
going to notice with this one right the
column index of the table wasn't found
so same error message what
you can do is go to error it's going to
go to the first occurrence of that error
and this is trying to reference index
if you recall if we go to the
first step of source we expect it to be
called job ID now because we renamed it
right so I'm going to change this to job
ID and then scrolling through the
applied steps to see whenever we get to
our next error if there is an error and
that's unpivot other columns
specifically they have job title short
and index I don't want index here I want
job ID and now bam now we have it
cleaned so I should have done that job
ID but that was actually good
troubleshooting to walk through that you
may
encounter so let's actually get into
visualizing this so we're going to go to
home and we're
going to close and load
now it's popping up as a table but we
actually want to analyze this I don't
really care to have it as a table so I'm
going to right click it and I'm going to
click load to specifically we're going
to go to a pivot chart and we'll insert
in the existing worksheet because we're
going to get rid of that data yes
there's going to be possible data loss
we understand that so I'm going to move
this chart off to the side select inside
the pivot table and we want to analyze
the job skills so I'm going to take the
job skills put them in rows and then the
job skills also in the values to
count up the values then also I'm
going to sort them I want to sort them from
high to low so I go to more sort
options um we're doing a descending
order count of job skills so now there's
a ton of different skills in here but
I want you to inspect this if you notice
these skills sometimes have
spaces in the front of them basically we
didn't do a full cleanup of this so
that's why we have python twice in here
because this one has a space in front of it so
opening up the power query editor by
pressing alt F12 so
underneath the data job skills query I'm
going to go ahead and we want to do a
text
transformation specifically if we look
underneath this underneath for format we
can change this to lower case upper
case capitalize each word we're going to
do trim which removes leading and
trailing white space from each of the
cells in the selected cell from there
we'll go back to home close and load
this and now it's going to be reloading
the data and those duplicate values are
now going to be removed now there's a
lot of skills here so I really only want
to see the top 10 so I'm going to put a
filter on here go into value filters and
top 10 specifically want to see the top
10 items by count of job skills also I'm
going to rename this to skill count and
because these are text values down here
I'm actually going to change this from a
column chart going to change chart type
into a bar chart instead clicking okay
boom and then with this obviously it's
not sorted from high to low which is how I
actually want it sorted so I'm going to
go in here back underneath our more sort
options changing this from descending to
ascending and the good thing about this
is we still have that top 10 filter on
it so it's still going to apply this and
have the top 10 values on there first
last little clean up I'm going to hide
all field buttons I'm going to get rid
of this Legend right here and then
I'm going to rename this to what are the
top skills of data
nerds now let's say that I'm frequently
referencing the top 10 skills as we have
right here and instead of having to
populate this every single time I want
to actually create its own
query for this so opening power query
by pressing alt F12 I could do the same
analysis inside of power query and
get this into its own table to be reused
but for this I don't want to use this
data job skills query instead like we
did before I'm going to create a new
query we're not going to duplicate this
instead we're going to reference it so
now it's Unique and distinct and I'll
rename this data jobs skill count
because we're getting the top 10 and their
Associated count so in order to do this
analysis to find what is the count of
all these different skills we want to do
a group by and it's right here under
transform or under that Home tab and I
can do group by which groups rows in the
table based on the values in the
currently selected column we're going to
be performing a basic group by we're using
that job skills column I could change it
to another column if I wanted to and
that new column name is going to be
skill count operation we're going to be
counting the rows we could do any other
type of aggregation as well if we had
numerical data we could do average
median min max whatnot go ahead and
click okay so we've done this
aggregation now the next thing is I just
want to get the top 10 values but before
I do that I need to actually sort this
in descending order right now I can't tell
from looking at the numbers that this is
necessarily sorted although it looks like it is
so clicking the arrow up at the
top I'm just going to say hey sort
descending and then we want the top 10
values so underneath the Home tab under
keep rows I'm going to select keep top
rows and it's going to prompt me for how many
rows I want to keep 10 in
this case I want the top 10 values and now
from here all I got to do is close and
load this into its own separate query
and Bam here we have it and so if I
needed to reference the top 10 skills
any time all I would have to do is just
reference this query and I wouldn't have
to like we did last time go through this
full analysis so power query is
really great at automating some
repetitive analysis and having it just
ready for
you all right last little cleanup if we
look at these skill names they're not
formatted correctly specifically if I
look at something like SQL I expect it to
be all capital letters and for
python I expect a capital letter
at the beginning so we're going
to go through and actually fix this so
that way whenever we present our data to
someone it doesn't look like a hot mess
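The trim, group by, sort descending and keep top rows steps from the skill count query can be sketched outside of power query like this. This is only a rough illustration in Python and not what power query runs behind the scenes; the sample list and the top_skills helper are made up for the example.

```python
from collections import Counter

# Hypothetical sample of the job skills column, one skill per row,
# including the stray leading spaces that split python into two entries.
job_skills = ["python", " python", "sql", " sql", "sql",
              "excel", " tableau", "excel", "sql"]


def top_skills(skills, n):
    """Trim each value, count rows per skill, and keep the n most frequent."""
    trimmed = [s.strip() for s in skills]   # like Format > Trim
    counts = Counter(trimmed)               # like Group By with Count Rows
    # Sort descending by count (ties broken by name), then keep the top n rows.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Before trimming, " python" and "python" count as different skills.
untrimmed_distinct = len(set(job_skills))
result = top_skills(job_skills, 3)

print(untrimmed_distinct)
print(result)
```

Sorting before keeping the top rows matters for the same reason it does in the query: keep top rows just takes the first n rows in whatever order they are in.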
so opening up the power query menu by
pressing alt F12 we're going to go into
the data jobs skills query specifically
on that last step and we want to
alter the job skills column so the first
thing I want to do with this text
cleanup the easiest thing looking at
this is we just need to capitalize the
first letter of every single word and
then from there we'll go through and
actually fine-tune it so that in the
case of SQL we capitalize all letters we'll
have to put in a special case for this
anyway if you recall from before we have
that transform format and they have this
capitalize each word we're going to do
that the next thing though the more
complicated one is we're going to go
into add column and we're going to add a
conditional column so what we're going
to do is go through we're going to keep
the name of custom because
since we're adding a column we're going to have to
go and delete this job skills column
once we create this new one I don't want to
name it job skills right now so I'll
leave it as custom anyway what we want to do is we
want to select the column that we want
so if job skills equals in this case
something like Sql we
want the output to equal SQL then if we
want to add more conditions or Clauses
to it we go to add Clause once again I'm
going to select job skills and I'm going
to put something like
Power Bi it had a lowercase i at the end
and I want the BI in Power BI to be fully
capitalized I also went
through and added some other ones such
as AWS GCP NoSQL and SAS almost all of these
just needed to be capitalized fully
except for the NoSQL one then what do
we want it to be if it's not any of
these conditions well we'll add this
else clause and we want it to be
basically the results of an entire
column we want it to be whatever it is
already in the job skills column I'm
going to go ahead and click okay so now
we have this cleaned up data set as well
with nice looking names now
if you're going through and finding
anything in here that you want to clean
up feel free to add to that conditional
column statement those are the ones I'm
just going to go for right now anyway
because we added this new column and I
don't really know an easy way to do this
without actually creating this new
column we need to now go ahead and
remove job skills and rename custom so
going to the Home tab I'm going to
remove column I'm going to remove the
one that's selected and I'm going to
rename custom to job skills and
conveniently because we're using that
same name and just replacing it if I go
to the data jobs skill count that one
because it references this one will also
get updated and all those values in
there are updated as well anyway let's
go ahead and close and load and inspect
this is our previous pivot table and
pivot chart that we analyzed it's now
going through and loading all the data
and now we have it updated with all that
correct formatting for those different
data points one last thing before we go
this is generic these are the top skills of
all data nerds and that is using
the data job skills query which has the
job title short column in it so we can
actually visualize this for a certain
job by going into pivot chart analyze
I'm going to go into insert slicer
specifically we're going to look at job
title short I'm going to put it over
here and then as usual I'm going to
rename it real quick to job title and
now let's say we want to analyze
something like data analyst we can see
that SQL is the top skill but Excel is
in second place followed by python
Tableau and SAS what about for business
analysts very similar in that SQL is top
and then Excel is in that second place
so really unique and showing the
importance of excel Within These skills
and pretty meta that we used Excel to
find this out all right now it's your
turn to give it a shot you have some
practice problems to go through and get
more familiar with doing these Advanced
Transformations specifically pivoting
unpivoting and then also Group by all
right with that I'll see you in the next
one we're going to be diving into append
and merging queries specifically going
to be doing this with that skill query
that we did previously all right see you
there
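The capitalize each word step plus the conditional column from this lesson boils down to a lookup with a fall-through else. Here is a small Python sketch of that idea, assuming the raw skills are lowercase like "sql" and "power bi"; the SPECIAL_CASES table and both function names are invented for the illustration.

```python
# Hypothetical overrides for names that capitalize-each-word gets wrong.
# Keys are the values AFTER capitalizing each word, as in the lesson.
SPECIAL_CASES = {
    "Sql": "SQL",
    "Power Bi": "Power BI",
    "Aws": "AWS",
    "Gcp": "GCP",
    "Nosql": "NoSQL",
    "Sas": "SAS",
}


def capitalize_each_word(text):
    """Upper-case the first letter of every space-separated word."""
    return " ".join(word.capitalize() for word in text.split(" "))


def clean_skill_name(raw):
    """Trim, capitalize each word, then apply the special cases.

    The final .get(..., titled) plays the role of the else clause:
    anything without an override keeps its capitalized value.
    """
    titled = capitalize_each_word(raw.strip())
    return SPECIAL_CASES.get(titled, titled)


for name in [" sql", "python", "power bi", "nosql", "excel"]:
    print(repr(name), "->", clean_skill_name(name))
```

Adding another cleanup is just one more entry in the table, the same way you'd add one more clause to the conditional column.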
let's now get into how to perform an append
and also merges and so the first portion
of this lesson the easiest portion in my
opinion is going to be append
specifically going back to that Excel
workbook where we had all those different
sheets for the months of the year with
their job postings on each because all
these data sets are of the same format and
have the same columns we're going to be
able to append all these together and
get what is our final data set of all
30,000 rows if you recall each month had
around 3,000 postings so that's how we
get to that value from there the primary
focus of this lesson will then shift to
merge for this we're going to be
combining our two queries that we built
previously one which was our original
data set so we titled that one data jobs
salary and then that new query that we
created in the last lesson on the skills
so data job skills we're going to be
merging those two together
and this will allow us to do some pretty
interesting analysis specifically now
that we've merged those we'll be able to
see based on a skill what is the
expected salary and we're going to build
a visualization for that for the top 10
skills now merge unlike append is a very
complex operation mainly because there's
a lot of different types of merges
specifically there's six types of merges
in power query alone so we're going to be
walking through each one of those so you
understand the differences and know
which one to use when for this first
append example we're going to be using
this data job salary monthly data set
and just as a refresher this contains
a sheet for each month in this case I have
the January sheet selected down here and this
has all the January data which has
around 3,100 rows and we have
each one of the months for the year
here anyway let's use power query to
append all these together because
previously before you knew about this
you'd have to go through and actually
copy and paste all these different
options right here and then put it into
a new sheet doing this 12 times is a hot
mess so since this is only a simple
example that we're not going to use
later on I recommend just opening up a
new workbook for this now coming into
the data tab I can come down to get data
and they do have this option right here
for combine queries merge and also
append but this is for appending two
queries from within this workbook
it's basically assuming you've already
imported it in so instead what we need
to do is actually go to from file and
actually start our first query of
connecting to that Excel workbook with
all those different sheets navigating to
the course underneath resources data
sets and then here down on data job
salary monthly I'll select that select
Import in the Navigator we can see all
the different sheets that are available
we want to actually enable this option to
select multiple items and then go
through and select all the items with
all these loaded we're going to then
shift into not just loading it we want
to actually go into the power query
editor so I'm going to select transform
data and it's going to start by loading
each one of those sheets and just going
to be naming each one of the queries
respectively after those sheets with
power query editor launched we can see
over here in the left hand pane all 12 of
those queries for each of the months so
these are all their separate own queries
because of that we need to now move into
actually appending them and make it one
final query that we can actually export
into or import into Excel so underneath
the Home tab they have the option for
combine append queries they have append
queries and append queries as new with
the January query selected I'm going to
go to append queries and for this I can
say either do two tables and specify the
table I want to do we're going to do
three or more cuz we want to do all of
them with them all selected I'll now go
through and click okay to append now
this inserted a step of appended queries
inside of that January query so
now that January query is all those
different data sets so I just want to
verify that I got all the data in here
right now if we scroll down well I'm
just going to show it right here we're
only showing column profile based on the
top 1,000 the fastest way to actually
find this out is just go to the
transform Tab and go to count rows which
tells me there's 36,000 rows which
is a few thousand too many and if I go
back into the appended query option and
actually look into it I can see in the
formula bar we have August in here I
accidentally selected it twice so I'll
go ahead and delete it and then look at
the counted rows that's actually what I
expect the value to be around 32,000
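The double check I just did, comparing the appended row count against what the source tables add up to and then hunting for a table that got selected twice, can be sketched in Python like this. The tiny monthly tables and both helper names are made up for the illustration, and this is not how power query implements append.

```python
from collections import Counter

# Hypothetical monthly tables, each a list of rows with the same columns.
monthly = {
    "January": [{"job_id": 1}, {"job_id": 2}],
    "February": [{"job_id": 3}],
    "March": [{"job_id": 4}, {"job_id": 5}, {"job_id": 6}],
}


def append_tables(tables, names):
    """Stack the named tables on top of each other, in the order given."""
    combined = []
    for name in names:
        combined.extend(tables[name])
    return combined


def duplicated_sources(names):
    """Return the table names that were selected more than once."""
    return sorted(n for n, c in Counter(names).items() if c > 1)


# February was added twice by mistake, like August in the lesson.
selection = ["January", "February", "March", "February"]

expected_rows = sum(len(rows) for rows in monthly.values())
appended = append_tables(monthly, selection)
print(len(appended), "rows, expected", expected_rows)
print("selected twice:", duplicated_sources(selection))

# Drop the repeated selection (keeping first occurrences) and re-check.
fixed_selection = list(dict.fromkeys(selection))
fixed = append_tables(monthly, fixed_selection)
print(len(fixed), "rows after the fix")
```

The row count alone only tells you something is off; looking at the list of appended tables is what tells you which one to remove.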
anyway that was just to count the rows
additionally I don't want the append
query to be inside of that January query
so I'm going to delete this step as well
instead with the January query selected
I'll go back to that home append queries
and then select append queries as new
this is going to create a completely new
query once again we want to do three or
more tables this time I'm going to hold
control and select all of them and then
move them over at once make sure we
don't have duplicates this time so this
now starts a new query right now it's
called append1 I would probably name it
something like data jobs all and then
pressing enter it then loads in here but
you can see these queries are adding up
right now we have 13
queries I want to organize these a
little bit better so we can actually
group these specifically we can group
these monthly ones I selected April and
then holding control selecting all the
other queries as well then right clicked
it and I'm going to select this option
to move to group we need to have a new
group and I'll call this really uniquely
data jobs
monthly and click okay so now we have
these two folders one with data jobs
monthly I'm going to close that down and
then there's other queries which we've
seen before and there's one query inside
of this data jobs all this cleans it
up also you may get this disclaimer up
here that the preview may be up to 33 days
old feel free to refresh it if you've
been getting that it should have no effect
on your data then if we wanted to we
could go through and actually Analyze
This by pressing close and load to I
prematurely selected close and load
I recommend you select close and load to
anyway nonetheless I'll go to the data
jobs all we'll go to load to
specifically I want to look at a pivot
table I know there's going to be some
data loss because it's going to remove
the data in the sheet and then I can
inspect that job posted date
specifically for the count dragging
job posted date into the rows and then
also dragging job posted date into
values and once again this is why we
double check it this time it looks like
I accidentally imported in January twice
with this as we can see that it's
35,000 anyway opening up that power
query editor going to the data jobs all
query and updating it to remove that
second January that I should have caught
from before and then close and loading
it and now it should refresh and update
for these correct values boom so
now it's actually aligned with what I
expect to see this is why we always double
check any type of query or analysis you
do this double check of the work is
going to save your
butt all right let's now get into the
bulk of this lesson moving into
merge for this feel free to continue
working with that workbook that you were
working with in the last lesson if you
didn't happen to save it or you got lost
you can use the advanced transform
workbook from the last lesson that'll
pick right back up where we left
off and then as usual the append and the
merge workbooks are the final examples that you're
going to see at the end of this which
specifically for append you've already
seen so let's actually get into merging
those queries for this I want to press
alt F12 and right now we have three
queries in here the data job salary
which is basically like our fact table
this includes all of our data going into
transform and count rows we have as
expected around 32,000 data points I'm
going to go ahead and delete that step
similarly we have this data jobs skills
which has all of our skills in it let's
see how many rows are in this by going
up to transform and to count rows and
this has
167,000 now it's important to understand
these numbers because we're going to be
using them or need to understand them
whenever we actually get into the joins
to see when we have missing or more data
so I'm going to go ahead and delete the
step of counted rows as well we don't
need it then we have also this final
query of data job skills count this was
made as an example only we're not going
to use this any further into the future
so I'm actually going to go ahead and
just delete this to minimize my queries
it's going to ask if I'm sure I want to
delete it yep so let's get into merging
these queries I have data job salary
selected come up to the Home tab under
merge queries we're going to have merge
queries and merge queries as new like we
learned from the append with append queries
and append queries as new we're going to
want a new query so that way we still
have these source queries so I'm going
to go merge queries as new with this
the merge window pops up and it says
select the tables and matching columns
to create a merge table specifically we
want to go with the data jobs salary and
we want to merge it on the job ID that's
why we created that a few lessons ago
we're trying to connect to the data jobs
skills on also that job ID now down here
underneath this there's a join kind and
there's six different options from this
of left outer right outer full outer
inner left anti and right anti now Kelly
put together this fancy chart that shows
visually what is happening with these
merges and we're going to be walking
through all of these briefly in order to
understand which type of join you should
be choosing depending on which scenario
you're in as a quick overview these
circles are signifying the two different
tables so in this case table a and table
B and the Shaded Blue Area shows what
portion of the contents from those
tables will be included in the final
table first up is a left outer join and
with this join what's showing here is
that all rows from table a will be
included in the final table and then
from that Center portion right there
where A and B overlap this signifies
that it's only going to keep items from
table B that are in table a or match
with table a so what does it actually
mean so if we go here into join kind and
select left outer and then what we get
told based on this next to this check
mark is the selection matches 29,000 of
32,000 rows from the first table so what
are those missing jobs well basically
there's some jobs that don't have a
skill now this isn't necessarily a bad
thing although we're not going to go
with this join this could be an option
we could use I'm going to click okay to
load it in so right now we have it under
this query called merge one and as you
can see it's not repeating any job
IDs basically we have the original data
jobs salary table and then if we scroll all
the way to the right we have the data
job skills over here and if you see each
one of these items is a table if I click
on it and expand it to see hey what's in
this table we can see that for this one
the job ID of 10,001
this is the table associated with
it so I'm going to go ahead and actually
delete out of this step and go back to
it so what we could do is expand it out
and there's this icon up in the top
right hand corner I'm going to go ahead
and click it and it's going to ask me
how it wants to basically expand out and
in this case I already have the job ID I
already have job title short I would
expand it by job skills so now seeing
how these skills are broken out I can
actually scroll all the way over and see
that now the 10,001 ID is duplicated
multiple times and if I actually look at
the number of rows within this data set
this new data set we have 170,000 rows
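To see why the row count jumps once the nested table column is expanded, here's a rough Python sketch of a left outer merge followed by the expand step. The three jobs, the column names and both helpers are invented for the illustration; it only mimics the shape of what the merge and expand steps produce.

```python
from collections import defaultdict

# Hypothetical stand-ins for data jobs salary (table A) and data job skills (table B).
jobs = [
    {"job_id": 10001, "job_title_short": "Data Analyst"},
    {"job_id": 10002, "job_title_short": "Data Engineer"},
    {"job_id": 10003, "job_title_short": "Business Analyst"},  # no skills listed
]
skills = [
    {"job_id": 10001, "job_skills": "sql"},
    {"job_id": 10001, "job_skills": "excel"},
    {"job_id": 10002, "job_skills": "python"},
]


def merge_left_outer(left, right, key):
    """Keep every left row and attach its matching right rows as a nested table."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**row, "nested": index.get(row[key], [])} for row in left]


def expand(merged, column):
    """Expand the nested table: one output row per match, null when there is none."""
    out = []
    for row in merged:
        base = {k: v for k, v in row.items() if k != "nested"}
        matches = row["nested"] or [{column: None}]
        for match in matches:
            out.append({**base, column: match[column]})
    return out


merged = merge_left_outer(jobs, skills, "job_id")
expanded = expand(merged, "job_skills")

print(len(merged))    # still one row per job
print(len(expanded))  # one row per job and skill pair, plus jobs with no skill
for row in expanded:
    print(row)
```

Before expanding there's one row per job, which is why no job IDs repeat; after expanding, job 10001 shows up once per skill and the job with no skills stays in with a null.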
now technically this merge has exactly
what we want but we still need to go
through those other merge examples to
understand them so we're going to show
them as well now for this I want to go
back to that merge window and I'm going
to click the settings icon I need to get
rid of the step we're going to be trying
out different types of merges so I'm
going to exit out and then go in here and
click the gear icon now it's popping
back up we did left outer next thing
we're going to look at is Right outer
for right outer this takes all of the
rows out of table B and then from there
any that match those rows in table a are
included now this one when we look down
here it says Hey the selection
matches
167,000 of 167,000 rows from the second
table if you recall back from that left
outer we had
170,000 so 3,000 higher why is that well
table a or data job salary has
3,000 jobs in here that don't have any
skills listed hence why this is 3,000 less
this provides a similar type of merge
that we did before where we need to
actually go over to that data job skills
and expand it out selecting the job
skills column and with this table we can
just check that we have 167,000 rows
which bam we confirm all right I'm going
to get rid of these two steps we're
going to move into the next merge next
is inner join and this provides only
matching rows from table a and matching
rows from table B so depending how
you're join it there could be missing
data on both A and B for this one it's
saying hey the selection matches about
29,000 of 32,000 rows from the first
table which what we expect and then
basically all of the rows from the
second table so for this one if I actually go
into it and then expand out those data
job skills looking only at the job
skills column with it expanded out
actually counting the rows we have once
again 167,000 so it's missing those 3,000
jobs that don't have skills next is left
anti and in this case it checks to see
what matches it doesn't have and Returns
the value for that specifically for
table a whichever values don't have a
match it's going to return that so in
this case it says the selection excludes
29,000 out of the 32,000 when I go to
load it I get the rows from table a or
data jobs salary and it still has the
data job skills but actually if I looked
into here right we should be getting
things that don't have a match
specifically there shouldn't be
anything inside this table that I'm
clicking on and as expected they're null
values because it doesn't have skills so
exiting out of navigation going back to
Source counting these rows we can see
that we have 3,000 jobs basically with
no skills for right anti this gets rows
from the right table that do not have
matches in the left table and for this
with right anti- selected this selection
excludes 167,000 out of 167,000 rows from
the second table so basically everything
from this table is excluded we're not
going to walk through this in the power
query cuz this is also not what we want
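The join kinds walked through so far can be compared side by side with a small Python sketch, with the sixth kind, full outer, included for completeness. The sample tables and the join helper are made up for the illustration, and the row order won't match what power query returns; the point is only which rows survive each kind.

```python
# Hypothetical stand-ins: three jobs (one without skills) and three skill rows,
# where every skill row belongs to a job that exists.
jobs = [
    {"job_id": 10001, "job_title_short": "Data Analyst"},
    {"job_id": 10002, "job_title_short": "Data Engineer"},
    {"job_id": 10003, "job_title_short": "Business Analyst"},
]
skills = [
    {"job_id": 10001, "job_skills": "sql"},
    {"job_id": 10001, "job_skills": "excel"},
    {"job_id": 10002, "job_skills": "python"},
]


def join(left, right, key, kind):
    """Return the already-expanded rows for one join kind.

    Unmatched rows are returned as-is; the columns they are missing
    correspond to the null values power query would show.
    """
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    matched = [{**l, **r} for l in left for r in right if l[key] == r[key]]
    left_only = [l for l in left if l[key] not in right_keys]
    right_only = [r for r in right if r[key] not in left_keys]
    if kind == "inner":
        return matched
    if kind == "left_outer":
        return matched + left_only
    if kind == "right_outer":
        return matched + right_only
    if kind == "full_outer":
        return matched + left_only + right_only
    if kind == "left_anti":
        return left_only
    if kind == "right_anti":
        return right_only
    raise ValueError(f"unknown join kind: {kind}")


kinds = ["left_outer", "right_outer", "inner", "left_anti", "right_anti", "full_outer"]
row_counts = {kind: len(join(jobs, skills, "job_id", kind)) for kind in kinds}
for kind, count in row_counts.items():
    print(kind, count)
```

With this sample the pattern matches the lesson's numbers in miniature: left outer and full outer agree, right outer and inner agree, left anti returns only the job with no skills, and right anti returns nothing because every skill row has a matching job.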
the final one we're going to actually
use is a full outer join with this it
takes all rows from table a and all rows
from table B and if there's a match it
will join those two if there's no
matches it's still going to return them
in the table it will just be a null
value for where it doesn't match up and
this talks about how basically selection
matches 29,000 of 32,000 rows from the
first table and all the rows from the
second table loading this in once again
we have data job skills we need to
expand out and we only want to expand
out those job skills and then from there
just going to do a double check I'm I'm
going to do count rows and this has
170,000 rows in it so similar to our
left outer we could have done either of
these these are the two that we
want but I'm going to stick with this
one of the full outer because I have all
the work here anyway I'm going to close out
the step and I think that's a great
example of sometimes there may be
multiple joins that fit the example it's
important that you go through and
actually count the rows and understand
the data set to figure out which one you
need to use and for what purpose anyway
one thing I glossed over real quick
going back to source and that gear icon
is right underneath this underneath the
join kind they have use fuzzy matching
to perform the merge right now we're
doing basically exact matching as the
job ID of 10,001 we're matching up exactly
with the 10,001 from the other table fuzzy
matching allows you to connect to tables
that have basically non-exact matches so
in this case we have table a with a
student ID and a student's name and only
their first name but then in table B we
have the student name full so first and
last name and the grade with the fuzzy
matching we could merge table A and B
based on that student name First Column
and the student name full column now
what happens if we get to where we have
students with multiple similar first
names it's going to create a hot mess so
I don't recommend using this
unless you know the data and you know
you're not going to cause complications with
it so that was a quick overview of
the different joins within power query
if you want a more in-depth tutorial for
how this is done then you can check
out my SQL tutorial where I go through
it with all the different SQL analysis
that we do in that course and break it
down step by step I'll include a link to
that video right here for you to go and
see
it all right so we have the final table
that we actually want for this remember
these do have duplicate values in it so
you have to keep that in mind anytime
you're doing analysis I'm going to
rename this as data jobs merged one last
thing before close and load we have this
job skills column which is sort of
redundant right now because we actually
have the data job skills not job skills
the actual skills itself so I need to
get rid of this column I actually want
to do this I'm going to do this in the
source step before we even break this
out so I'm going to select job skills
and select remove columns it's going to
ask if I want to insert the step which I
do and then after we remove the columns
we go into expanding it out and because
we did it in that order I can actually
come in here instead of renaming it here
I can just rename it via the formula
inside of expanded skills and just
change it to job skills and Bam now I
only added two steps Vice one all right
go ahead now we're going to close and
load to I'm going to want a pivot table
and also pivot chart so I'm going to
select the pivot chart option here and
underneath queries and connections it's
going to show that it's loading this in
here under data jobs merged so let me
show you what we're going to be creating
with this I want to build this
visualization that's showing what is the
salary of the top 10 skills top 10
skills by count for data nerds and this
is a combo chart we're going to have not
only the salary or the average salary
for a skill but also for this line
portion we're going to have the
associated count for the number of
skills that appears or how many jobs it
appears in all right so I'm going to go
ahead and move this pivot chart out of
the way and select the pivot table
remember we want to use the job skills
we're going to be analyzing that so I'm
going to throw in the rows the first
thing I'm going to look at is the
easiest is the count of these job skills
and I'm going to rename this to job
count along with changing the value
field settings going to number format I
want to change the number specifically I
want to use a thousands separator with zero
decimal places I'll go ahead and press
okay so we have a count now we want the
average salary so I'm going to take
salary year average drag it into the
values right now it's doing a sum so
I'll go into value field settings select
average and then for number format we're
going to do currency with zero decimal
places click okay and okay again and I'm
going to change this one to average
salary and then specify the units of USD
all right so now xing out of this and
xing out of this now our pivot chart is
sort of all jacked up well it is jacked
up mainly it's trying to plot this as like
a dual column chart and that's not what
we want so we're going to change this
design of it going to design change
chart type I'm going to go over to combo
and then underneath here for the combo
for the job count I want that to be a
line so I'm going to go up here and
select line and for the average salary I
actually want that to be the column now
I want the job count on a secondary axis
I don't want the same axis as the salary
itself because they're just not
proportional I'm going to go ahead and
click okay I want to clean this up a
little bit further by removing the
legend and then also right clicking here
and hiding all field buttons on this
okay there's now there's still too many
skills on here remember we want the top
10 skills so going into the pivot table
itself I'm going to come up into the
filter into value filters and top 10
we're going to do top 10 items by job
count all right this is getting a lot
more readable now because I have the top
10 by job count I want to order this
from high to low by salary so I'm going
to go to more sort options and we're
going to do descending on average salary
I'll click okay and Bam now we're
getting somewhere so we're seeing things
like spark and AWS have the highest and
Excel did make the top 10 so it's on
there at 100,000 other things I'm going
to change selecting on this pivot chart
is the actual design itself you know how
I am about colors so we're going to
change the colors I'm going to use this
monochromatic palette 8 I want the line
to be a lighter color than the actual
bars themselves I'm going to go ahead and
add axis titles for primary vertical
and secondary vertical for this I'm
going just select the box go into the
formula bar and say hey for this one
make it equal to average yearly salary
for this one selecting the Box going
into the formula bar pressing equal I'm
going to make it equal to job count I'm
also going to add a title to this I'm
going to call this what is the salary
of the top 10 skills of data nerds and
remember this is for all data nerds
and the
great thing about joining these
tables is we now not only get salary data
but we can get job title information so
I'm going to add a slicer now by going
into pivot chart analyze insert slicer adding
in that job title short only going to
move that out of the way now I'm going
to go to slicer I'm going to rename this
to a more friendly title of job title
and now let's actually look at it
for data analyst so with this looks like
python arlor the highest Excel still
makes that top 10 and for data analysts
at
86,000 it's also if we look at this it's
the second most important skill behind
SQL which has a value of 96,000 let's
see what it is for a business analyst
once again SQL and Excel are two of the
highest and for business analysts Excel
is paying 87,000
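What the pivot table is doing here, counting jobs per skill, averaging salary per skill, keeping the top skills by count and then sorting those by average salary, can be sketched in Python as follows. The merged rows, the salary_year_avg column name and the skill_summary helper are assumptions made up for the illustration, and the numbers are not from the course data set.

```python
from collections import defaultdict

# Hypothetical rows from the merged table: one row per job and skill pair,
# so a job's salary repeats once for every skill it lists.
merged = [
    {"job_id": 1, "job_skills": "sql", "salary_year_avg": 100000},
    {"job_id": 1, "job_skills": "excel", "salary_year_avg": 100000},
    {"job_id": 2, "job_skills": "sql", "salary_year_avg": 80000},
    {"job_id": 2, "job_skills": "python", "salary_year_avg": 80000},
    {"job_id": 3, "job_skills": "python", "salary_year_avg": 120000},
    {"job_id": 3, "job_skills": "spark", "salary_year_avg": 120000},
    {"job_id": 4, "job_skills": None, "salary_year_avg": 70000},  # job with no skills
    {"job_id": 5, "job_skills": "sql", "salary_year_avg": 90000},
    {"job_id": 5, "job_skills": "excel", "salary_year_avg": 90000},
]


def skill_summary(rows, top_n):
    """Return (skill, job count, average salary) for the top_n skills by count,
    ordered from highest to lowest average salary."""
    salaries = defaultdict(list)
    for row in rows:
        if row["job_skills"] is not None:      # skip jobs without a skill
            salaries[row["job_skills"]].append(row["salary_year_avg"])
    summary = [(skill, len(vals), sum(vals) / len(vals))
               for skill, vals in salaries.items()]
    # Value filter: top n by job count (ties broken by name).
    top = sorted(summary, key=lambda s: (-s[1], s[0]))[:top_n]
    # More sort options: descending on average salary.
    return sorted(top, key=lambda s: -s[2])


result = skill_summary(merged, 3)
for skill, job_count, avg_salary in result:
    print(skill, job_count, round(avg_salary))
```

Because the salary repeats on every skill row, the average here is the average salary of the jobs that mention a skill, not what the skill itself pays, which is the same caveat about duplicate values mentioned when the merged table was loaded.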
so bam we just showed the power of well
append but also more specifically merge
we can now take this analysis to another
level analyzing skills to other data
points from our main fact table or that
data jobs salary table that has all of
the data in it so now you have some
practice problems to go through and get
more familiar with using both append and
also merge after that we'll be jumping
into the last lesson of power query
focusing on the M language as I warned
at the beginning don't worry if you
don't have coding experience or anything
like that we're going to be taking it
nice and easy and you're going to be
able to follow along and fill it out
pretty easily we're going to be doing
some final prep before we finally send
this data set on over to power pivot
which we're going to cover in the next
chapter all right with that I'll see you
there welcome to this final lesson on
the M language and we're going to be
going into some pretty Advanced
Techniques and understanding how to read
and better utilize the M language in
building your power query queries anyway
nothing that we go through in this lesson
is going to be
used to build on our project so if at any
time you're not following along or
you're not able to do something don't
worry too much nothing's actually going to be
used it's more to inform you about the M
language so you get more familiar with
it as a disclaimer you will not be an
expert on M language you not be able to
code in M language after this mainly
you'll just be able to look look at it
understand what's going on there from
there and make slight adjustments if
necessary feel free to continue working
on in that worksheet that you've been
using previously where we just
calculated in the last lesson looking at
the top 10 skills and what the salary is
for them however if you got lost or
weren't able to follow along or are just
starting over feel free to use this
merge workbook once again don't use that
M language one that one's going to be
what is done at the end of
this lesson so what are we going to be
covering in this lesson well if you open
up the power query editor we can
navigate into it we're going to be
covering three main things first is the
advanced editor actually walking through
a previous query and understanding how
to read it and then from there under add
column tab we're going to go into these
different examples on creating custom
columns and also custom
functions so what exactly is this m
language well if we dive in
documentation we can see that the power
query engine uses a scripting language
behind the scenes for all power query
Transformations the power query M formula
language also known as M so although
we're doing all these edits inside of
this power query editor behind the
scenes if we navigate to something like
the advanced editor it's actually using
this m language right here to carry out
all the Transformations and it goes on
to say if you want to do Advanced
Transformations using the power query
engine you can use the advanced Editor
to access the script of the query and
modify it as you want it even goes on to
discuss that if you're not finding what
you need in the actual GUI or the
graphical user interface of the
power query editor you can use the M
language editing it in the advanced
editor for
this so let's go into breaking down this
m language more by going to that data
jobs merge and entering the advanced
editor and we're going to be just
breaking down this simple query right
here up here on the right hand side
there's a few different options display
options I'm going to do this render
whitespace basically it shows me the
indentation that's going on here right
now I'm seeing that there's four spaces
in here anyway the key thing here is
we first have this let keyword
and then an in keyword the let begins
basically the definition block if you will
this whole portion right here for
defining different variables and
specifically different tasks if we look
we have things like source expanded data
job skills sorted rows removed columns
renamed columns if I go ahead and move
this over to the right those applied
steps are the same thing those are the
variables itself I currently have enable
word wrap enabled and I'm not liking the
format and how it looks I'm going to go
ahead and unclick that finally we have
the in keyword and then this displays
the final value that we want to appear
for our query so in this case we want
the final value of rename columns or the
last applied step to be what appears now
this advanced editor I'm going to expand it
back out again is also a syntax checker
so in this case let's say I deleted this
quotation mark at the end of this renamed
columns one it's going to
give me these red squiggly lines to say
that hey there's something wrong here
and two it's going to actually give you
an error of invalid identifier and so we
would probably know that we probably
need to fix this so we're not going to
be breaking down much more of the
formulas here but I do want you to spot
two main things from this the first
thing is this column names column names
are always put in quotes in here and
conveniently they're also highlighted in
here so if you needed to do any changes
to column names or see what's happening
that's one quick way to identify it the
next thing is this every step that is
taken refers to the previous one what do
I mean by this so this first step is
assigned the variable of source
and I know it's assigned this variable
because it has an equal sign right next
to it
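As a rough illustration of the let and in structure described so far, here is a minimal Python sketch (Python is used only because it is easy to run; it is not M). The step names echo the applied steps mentioned above, while the sample rows and the run_query helper are hypothetical.

```python
# Hypothetical model of a Power Query "let ... in" block: every applied step
# is a named value, and each step builds on the result of the step before it.
rows = [
    {"job_id": 2, "skill": "sql"},
    {"job_id": 1, "skill": "excel"},
]


def run_query(source):
    steps = {}  # plays the role of the let block (name = value pairs)
    steps["Source"] = source
    # This step takes the previous step ("Source") as its input.
    steps["Sorted Rows"] = sorted(steps["Source"], key=lambda r: r["job_id"])
    # This step takes "Sorted Rows" as its input, and so on down the chain.
    steps["Renamed Columns"] = [
        {"job_id": r["job_id"], "job_skills": r["skill"]}
        for r in steps["Sorted Rows"]
    ]
    # The "in" part: the last step is what the query returns.
    return steps["Renamed Columns"], list(steps)


result, applied_steps = run_query(rows)
print(applied_steps)
print(result)
```

The list of step names plays the same role as the applied steps pane: one entry per named value, in the order they were defined.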
and then whenever we go to the next line
of expanded data job skills inside this
function of Table.ExpandTableColumn
it references source which if I scroll
over it I can see that it's giving me
the same formula for source which is
right above it so basically it's
plugging right into it similarly this
expanded data job skills is going to be
located in the next one below it on
sorted rows and it's going to be the
first value in here for this
Table.Sort and if you're curious about what
these different functions are doing you
can just scroll over it as well in this
case Table.Sort sorts the table using
one or more column names and
comparison criteria and it tells us via
the syntax inside the parentheses that
the first parameter is table so
it takes that previous variable which is
a table anyway one minor last thing
about this if you notice
these variables have a
hashtag and then double quotes on each
side and that's because they have white
space in the actual names that we're
doing for this in the case of source
there's no white space it's only one
value with no white space so it doesn't
need to have this around it anyway why
am I yapping about all this stuff well you
need to understand this m language
anyway we're going to actually recreate
this data jobs merge query I'm going to
select it all press contrl C to copy it
then from there I'm going to close out
of it we're going to now create a new
query so underneath the Home tab I'm
going to go to new source and then
under that other sources and I'm just
going to go into blank query okay right
now this is completely blank but I can
go into that advanced editor of query 1
and it has the let and in still and
obviously nothing going on here what I
can do is just highlight this all and
then using contrl V paste all of that
other query into this now when I press
done it goes through and actually
creates that same exact query from data
jobs merged now I could have gone
through and right clicked data jobs merged
and click duplicate but this is more of
to show that you can actually go in copy
queries or copy portions of queries and
thus paste it into other ones which
we're going to do in a little
bit so let's get into more of learning
about the M Language by actually
cleaning up this query one that we just
created by using this column from
example first thing though I do want to
rename this query one this is the one
we'll be working with for the remainder
of this lesson and I'm going to call it
data jobs clean because that's what
we're going to do we're going to clean
it up so we have four major tasks that
we're going to do with this the first is
for job schedule type I just want to
extract out the first value out of here
that's full-time out of it additionally
we're going to be using the date and
date time columns to extract the weekday
and also the hour of the job postings
and then finally we're going to do some
data cleanup on this job title column
that frankly is a mess specifically
we're going to clean up job postings that
have this parentheses remote in them
anyway let's start with this first one
of this job schedule type if I go into
view and then look at the column profile
it looks like we have that full-time
contractor part-time and whatnot but we
have a lot of combinations of full-time and
part-time contractor and temp work
full-time part-time and internship I
basically want to go through and just
extract out what is the first value that
appears in here so in the case of this
full-time and part-time I just want to
extract full-time for contractor and temp
work only contractor so under add column
and then column from example we'll do
from selection and this appears at the
top of add column from examples enter
sample values to create a new column
control enter to apply so I'll first go
by entering full-time and it's already
picking it up I'm just going to type it
in first okay and then I'm going to
scroll down but in this case I'm going
to put in hey I want full time for this
one this is the example remember so now
it's cleaning up that let's scroll down
further if it's done this fully for even
more okay it's getting the first of
these and you might think that this is
correct but the problem we're running
into now is if we go down to this one
where it says contractor it's only
contracto and just looking at the
formula this is the formula it's
generated so far it's doing Text.Start
and nine I don't really know too much
what's going on here but I'm assuming
that it's taking the first nine characters
that's not what I want so inside this
contractor one I'm going to type in
contractor with an R so that way it
hopefully fixes this so this is good and
now it has Text.BeforeDelimiter and a
space so I'm going to go ahead and click
okay to load this in so let's scroll
down to just inspect it to make sure
that we have this correct and an easier
way instead of scrolling down and trying
to find something I can just use this
drop down right here and look in here
and it looks like we're good
except we now have a comma here
specifically I have a full-time and then
a full-time comma so what's going on
here well for values that have more than
two entries so three they actually insert a
comma in there and when we inspect our
formula opening up the formula bar here
it's only checking for a space so the
easiest way to fix this is actually just
like we did before we're pretty familiar
with it let's go to the trans form Tab
and then under replace values we want to
go to replace values specifically we
want to find commas we want to replace
it with a blank bam so now pulling down
that drop down we don't have multiple
different full times we just have that
single one without the comma we have
what we want all right we're going to
rename this and I can just go ahead and
double click this and rename it but I'm
actually going to do something first I
see that I have the step already for
renamed columns so I'm going to take
that and I'm going to drag it to the
Bottom now with rename columns as the
last step I'll then rename it to job
schedule type first press enter and then
it inserts it into that current step as
we can see from here cuz we're now
familiar with it and we don't have
multiple rename columns in there and
then finally you know how I get about
column ordering this job schedule type
first I want it next to the job schedule
type so I'm going to drag this on over
here see how long it takes and we've
moved it over and we now have this new
step of reordered columns all right
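To make the difference between the two generated formulas concrete, here is a small Python sketch of the same logic (not M itself). The helper names and sample values are hypothetical stand-ins for the job schedule type column: first_nine mimics taking the first nine characters, before_delimiter mimics taking the text before the first space, and the final replace mimics the replace values step that removes the comma.

```python
# Hypothetical sample values for the job schedule type column.
samples = [
    "Full-time",
    "Full-time and Part-time",
    "Contractor and Temp work",
    "Full-time, Part-time, and Internship",
]


def first_nine(text):
    # Mimics the first guess: keep only the first nine characters.
    return text[:9]


def before_delimiter(text, delimiter=" "):
    # Mimics the second guess: keep everything before the first delimiter.
    return text.split(delimiter, 1)[0]


def job_schedule_type_first(text):
    # Text before the first space, then strip the stray comma.
    return before_delimiter(text).replace(",", "")


for value in samples:
    print(first_nine(value), "|", before_delimiter(value), "|",
          job_schedule_type_first(value))
```

The first nine characters only happen to work for full-time because that value is exactly nine characters long, which is why contractor gets cut short until the second example is supplied.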
let's look at some other quick examples
for column from examples for this we're
going to be using the job posted date
for this using column from example I'm
going to select from selection now with
some of these things whenever I type in
this box I want to get let's say the
year in this case if I were to type in
four one it would pop up that hey with
all these different options we can do
and so this provides a lot of different
options as far as okay I do know if I
wanted to do the month I could do that
and pressing enter it's going to copy it
all the way down that's not what I
wanted this case though I'm going to
double click it again go
2023 and scrolling down and looking
through this there's this option here of year
from job posted date so we're going to
go with that then press enter and
looking at the transform we can see what
is the m language code that it used for
this it used the Date.Year function
putting in job posted date this is what
we want we'll click okay you know how I am
with naming so we're not going to keep
this named year so I'm going to modify
this m language to be job posted
year with that renamed let's actually
move over to our other example
extracting out the hour for this we're
going to be using that job posted
datetime column column from example from
selection in this case I want the hour
out of it so I'm just going to put
something like nine and we can see that
we also have this here for hour from
job posted datetime I want that one
press enter again inspecting the M
language formula it's extracting the
hour out of this one I'm good with it
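The year and hour extraction can be sketched in Python as follows. This is only an analogy for what the generated M formulas do; the timestamp is a made-up example and the variable names simply follow the column names used in this lesson.

```python
from datetime import datetime

# Hypothetical job posting timestamp (not taken from the real data set).
job_posted_datetime = datetime(2023, 1, 1, 9, 30)

# Pull out the pieces that the two new columns hold.
job_posted_year = job_posted_datetime.year
job_posted_hour = job_posted_datetime.hour

print(job_posted_year, job_posted_hour)
```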
I'm also seeing the other values are
updating correctly I'll click okay and
we have our new column called hour which
you know me we're going to fix this and
update hour to job posted hour press
enter all right now we got it so you're
probably like look I already know how to
go to something like the transform tab and
already extract out that information
using these functions that we used
before well that was mainly as a primer
for this next example we're going to be
doing and that's that with this job
title column there's some job titles in
here that have a lot of sort of
frivolous information that we don't need
like in this case supervisor information
technology specialist and then
parentheses it has associate director I
don't need anything in parenthesis
similarly for this for the senior data
engineer I don't need this remote in
here so let's select this job title go
into add column column from example and
from selection for this first one with
the associate director I'm going to
select it so it appears below and then
just highlight what I want press contrl
C and then paste it in here then
scrolling through here to do a
cursory check so I'm seeing that senior
data engineer remote is in here I could
select it and copy this down here
another option is I just go in here
double click it since it's now
populating and delete out that remote
press enter and it looks like it's doing
this it's getting the text before
delimiter of job title specifically before
the parenthesis and looks like in this
case University grad data scientist PhD
only now hiring it removed all that okay
so this is now doing what we want click
okay and I don't want to call
this column text before delimiter I want to
call this job title clean pressing enter
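Here is a minimal Python sketch of the job title clean idea: keep the text before the first opening parenthesis. The function name and sample titles are illustrative, and the trailing strip is an assumption added here for tidiness; the generated M formula may handle the trailing space differently.

```python
def job_title_clean(title):
    # Keep everything before the first "(" and drop surrounding whitespace.
    return title.split("(", 1)[0].strip()


# Hypothetical titles in the style of the ones shown in the lesson.
titles = [
    "Senior Data Engineer (Remote)",
    "Supervisor Information Technology Specialist (Associate Director)",
    "Data Analyst",
]

for t in titles:
    print(job_title_clean(t))
```

Titles without a parenthesis pass through unchanged in this sketch.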
all right so last thing I want to now
clean up these columns and you know how
I get I want the year and hour to be next
to the date time and the job title clean to be
next to the job title I could drag and
drop these I'm going to show you
something else this reordered column
step we're going to be modifying the M
language for this and I don't want
reordered columns to appear more than
once so I'm going to take it once again
and drag it to the very end now what I
can do is take and modify this m
language that we have in here now if we
actually inspect this reordered columns
it may do this or may not in my case it
didn't add anything after job skills it
basically let any new columns just fall
towards the end so this job skills all
these other columns after it aren't
included which is not a big deal so what I
want to do is I want to move this year
and hour to near job posted date and job
posted month so I'll enter inside of
here put in job posted year and also job
posted hour make sure we're putting
commas after both of those then I'm
going to run this to make sure there's
no issues with it and it looks like it
moved it over inspecting next to job
posted date we have our month and also
year and hour all right the last one is
this job title clean and I want this to
be right after job title so I'll go
ahead and put that in right here making
sure to put a comma after that and then
from there pressing this check mark up
here to move it inspecting over we have
job title clean right next to
it our next thing to look at is custom column
we'll go ahead and actually just select
this and whenever we pull this up this
tells us this allows us to add a column
that's computed from the other column
provides a box to basically put in the
new column name but right here this is
where we put in the custom column
formula or the M language to maybe clean
it up now let's start with something
simple let's say I just wanted to repeat
the job ID column I would come over here
select job ID click insert it's going to
put it in notice that the variable
itself is inside of brackets and I'm
going to rename this job ID repeat down
at the bottom it's telling me that no
syntax errors have been detected I'll
click okay and then I get this new step
for added custom and we can see hey it's
job ID repeat scrolling over yep it
repeated it if I want to go back in to
edit it I'll press that settings icon
and it's going to pull this back up so
let's do something a little bit more
complex now and it's going to involve the
salary year average column and that
salary hour adjusted column go ahead and
cancel out of this what I want is to
create a new column that if there's a
salary year average value it will
basically be in that new column and then
if there's a salary hour adjusted value
it will be in that column instead
just as a forewarning anytime salary year
average is null there's always a value
for salary hour adjusted and vice versa
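The rule for the new column can be sketched in Python like this: use the yearly average when it is present, otherwise fall back to the adjusted hourly value. None stands in for null, and the function name, parameter names, and sample rows are hypothetical.

```python
def salary_year_combined(salary_year_avg, salary_hour_adjusted):
    # Prefer the yearly average; fall back to the hourly-adjusted value.
    if salary_year_avg is not None:
        return salary_year_avg
    return salary_hour_adjusted


# Hypothetical rows: exactly one of the two salary columns is filled in.
rows = [
    {"salary_year_avg": 140000, "salary_hour_adjusted": None},
    {"salary_year_avg": None, "salary_hour_adjusted": 82000},
]

combined = [
    salary_year_combined(r["salary_year_avg"], r["salary_hour_adjusted"])
    for r in rows
]
print(combined)
```

The check is against None rather than truthiness so that a salary of 0 would still count as a value.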
so like I said we're not going to
be becoming coding experts with this so I
recommend making use of chat bots like
ChatGPT Gemini or whatnot lots of free
options available out there anyway we
have this prompt of generate a power
query formula for a custom column I'm
building make the column salary year
average if it's not blank otherwise it
is salary hour adjusted now it's giving
us the entire M language this is
what we'd be providing to the advanced editor
providing that previous step name what
column we're using everything like that
I care about really this formula right
here specifically everything after the
each I'm going to copy this from if all
the way to the end that's the actual
code right here going back to the custom
column I'm going to delete that job ID
out of there I want to make sure that
there's an equal sign still there and
I'm going to paste this in
and you can see from this this is just
basically an if formula it's doing if
salary year average is not equal to null
then salary year average else perform
salary hour adjusted down at the bottom
we can see that no syntax errors have
been detected so I'm going to go ahead
and click okay so bam we now have this I
did leave in that job ID repeat name here so
we're going to actually
rename that value to salary year
combined and then click the check
mark in order to rerun that formula
to update the column and you know I like
to have my steps in order so I'm going to
grab reordered column and I'm going to
drag it to the very end and for this one
I'm just going to drag it over to salary
hour adjusted right after it to salary
year combined so now scrolling down just
to double check it it looks like we got
140,000 here 140,000 82,000 82,000 there
so the formula filled out
correctly so let's get into our final
task so we've been working this data
jobs clean data set we made this salary
year combined which is pretty useful
actually what happens now if we want it
in something like data jobs merged what
do we need to do to actually add it into
here because we have everything we need
for it specifically we have that salary
year average and we have the salary hour
adjusted columns well we could recreate
it in here going through all those steps
creating that if statement or we could
just copy it out of the advanced editor
and bring it in here so I'm going to go
back to data jobs cleaned and then under
home Advanced editor I'm going to go and
find the step that's in here
specifically it was this one of added custom
and I'm going to copy it because I can
see that hey it has the salary year
combined in it I'm going to copy it all
the way to the end and I'm going to
copy it by pressing contrl C okay go and
close out of this one and then bring
over to data jobs merged go into the
advanced editor and I want to insert it
in right at the end so I'm going to go
to at the end of this block of this let
block going to press enter and then from
there press contrl + V to paste it in
now I'm already getting an error message
and it's saying hey token comma
basically expected and it's not getting
it if I scroll over I can see these
squiggly lines right here basically
as we can see there's commas
after every one of these variable
definitions so I need to come up here
put a comma in there next is this a
comma cannot precede an in so if we
scroll over we can see this is red
highlighted it's probably wrong to have a
comma here so we'll get rid of it now
we're not done it's going to say there's
no syntax errors but we didn't complete
this remember
it's got to reference the
previous step name here in it so
even though it says no syntax errors
if I try to click done and go to load it
I'm basically getting an error I can see
this by this basically error bar at the top
of each one of these columns also
there's only one applied step and it's
calling it data job
merged of the actual title itself but we
need to fix this query and actually get
it back to where it had multiple
different applied steps so I'm going to
go back to the advanced editor we're
going to show what we did wrong here and
that has to deal with remember we had
before where we had something like
remove columns you reference the
previous column in it so in this case
remove columns right there well rename
columns is the last one we had I'm going
to go ahead and copy this by control
cing it but yet we have inex inserted
text before delimiter one which is not
correct so I'm going to select all of
that and replace it by pressing crl +v
so we have the rename columns now one
other thing we have to do this last
statement or and the in portion needs to
be referencing that last variable of
added custom so I'm going to go ahead
and copy this contrl C and then pasting
it in control V click done and now
scrolling all the way over we can see
that we have that salary year combined
column that we created in the last query
it's at the end we do need to move it
over but it's in there nonetheless so it
helps with understanding these queries
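The two fixes made after pasting the step (point it at the real previous step, and make the in portion return the new last step) can be sketched as a small Python check. This is a hypothetical helper for illustration only; it is not something Power Query provides, and the step lists are shortened versions of the queries in this lesson.

```python
def check_steps(steps, result_name):
    """Return problems for an ordered list of (step_name, referenced_step)."""
    problems = []
    defined = []
    for name, referenced in steps:
        # A step may only reference a step defined earlier in the let block.
        if referenced is not None and referenced not in defined:
            problems.append(f"{name} references unknown step {referenced}")
        defined.append(name)
    # In this lesson the in portion is expected to return the last step.
    if defined and result_name != defined[-1]:
        problems.append(f"in returns {result_name}, not {defined[-1]}")
    return problems


# The pasted step still points at a step that only exists in the other query.
pasted = [
    ("Source", None),
    ("Renamed Columns", "Source"),
    ("Added Custom", "Inserted Text Before Delimiter1"),
]
# After both fixes: reference Renamed Columns and return Added Custom.
fixed = [
    ("Source", None),
    ("Renamed Columns", "Source"),
    ("Added Custom", "Renamed Columns"),
]

print(check_steps(pasted, "Renamed Columns"))
print(check_steps(fixed, "Added Custom"))
```

This also shows why the editor reported no syntax errors at first: the pasted text was well formed, it just referred to a step name that did not exist in that query.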
now one quick thing before we go we've
gone through basically every single
thing in this chapter on power query up
to this point with the exception of this
invoke custom functions this basically
invokes a custom function defined in the
file for each row of this table this is
more advanced and Beyond the scope of
this course we're not going to be
covering it but is available for you to
dive into say you're doing a lot of
different Imports and you need to
automate the Imports that you do this
would be a path you would go but for
beginners like us I'm going to say stick
away from it for the time being so this
now wraps up the M language and that
was really a crash course in
understanding how to use it by no means
do you need to be a professional or an
expert coder in the M language
if you got lost at any point along the way
nothing to feel ashamed about this is a
pretty complex topic if you would
like to learn more I do recommend this
book which is M is for data monkey it's
a good little read talking about not
only Power query but also how to
manipulate the M language I'll include a
link in the description below anyway
power query in my opinion is one of the
most important features and most
powerful tools within Excel and also
Power BI and so it's worth your time
investing in learning it and so with this
we've now finished
covering power query in this chapter in
the next chapter we're going to be jumping
into Power pivot and that's going to be
jumping into actually data modeling but
before that for those that purchased the
practice problems you have some practice
problems to go through and get more
familiar with that M language before
proceeding forward all right with that
see you in the next
one welcome to this chapter on power
pivot and this chapter consists of four
different lessons where we're going to
go over an intro to Power pivot and
the window that it actually
provides then from there looking into
Dax or data analysis expressions which
is a formula language very similar to
excel formulas but before we actually
jump into this lesson and go over
what we're going for we're going to
focus on what exactly is power
pivot so here I am in Excel and this is
meant for me to just go through and
quickly explain what is the power
of power pivot I know that pun is
getting sort of old by now but it really
is powerful if you're curious of looking
at it it's in the workbook of power
pivot intro part one part two is what
we're going to be using for the actual
lesson so in power query in the last
chapter we ended up cleaning up our data
set to have these two main tables first is
data job salary which has the complete
data set on all the data science job
postings and then data job skills which
is unique to the skills for a job we
also created a data jobs merge table but
that table is actually going to be well
it's pretty much Obsolete and power
pivot is going to help replace that and
for good reason so what exactly is power
pivot well it's an addin we're going to
get to adding it in and it has a few
different features that you can do
within it such as accessing the data
model adding measures kpis and whatnot
this lesson is going to be going over
this tab as a quick refresher power
pivot is going to be available in
basically any Windows version of
Microsoft Excel from
2010 on but it's not available at all
in either the Mac version or the
Microsoft online version so you won't be
able to do this chapter if you have
those versions or the final project
anyway the core portion of power pivot
is actually managing a data model and
what's a data model well a data model
defines how data is basically structured
stored and also related in this case we
have the data jobs salary table right
here and we have the data jobs skill
table what we can do with power pivot
besides modeling these tables and
showing how they're structured is the
more important thing of creating a
relationship in this case I created a
relationship between the job ID of data
job salary and that of data job skills
and because I created this relationship
I can look at things like the job title
short column see how many jobs it
has with it but also I can query across
a table over to the job skills and see
how many skills it has with it in fact
let's actually do that real quick here I
have my data model itself I have my two
tables which are shown anyway I can look
at things like what are the count of the
different job titles themselves I'm
going to do that on job ID and like
we've done plenty of times before here's
the job count with a little clean up of
the actual text here but now with power
pivot I can actually reach across to
that other table of data job skills and
drag the job skills into here and this
is telling us obviously the count of the
skills based on the job title pretty
cool that we can reach across the tables
and do this now the other cool thing
that power pivot unlocks is Dax or data
analysis expressions recall previously
that we were using the average of the
salaries and like we learned way back
earlier in this Excel course we prefer
actually a median salary but
unfortunately looking at the value field
settings window here there is no option
to actually pick median from this and
that's where Dax comes to the
rescue with this I can go to something
like the power pivot Tab and now create
a measure which is where you actually
insert in your Dax and I can create a
new one called median salary and we're
going to be using this Dax formula in
this case I'm going to use the median
formula very similar to the Excel
formula and I can do it on the entire
salary year average column here I'm
going to format it real quick and then
press enter anyway bam now
because of the power of Dax we have the
ability to get the median salary
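To see why the median is preferred over the average for salaries, here is a short Python sketch with made-up numbers (they are not from the course data set). One very large salary pulls the average up sharply while the median barely moves.

```python
from statistics import mean, median

# Hypothetical yearly salaries with a single large outlier.
salaries = [60000, 80000, 90000, 100000, 400000]

average_salary = mean(salaries)
median_salary = median(salaries)

print(average_salary, median_salary)
```

In the workbook itself the equivalent calculation lives in a measure that uses the DAX median function over the salary year average column.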
those Dax things can do some pretty
complicated calculations so in the case
of here we have this job count and count
of skills and we want to see what were
the skills per job specifically in this
case what is something like C2 / B2 and
then dragging all the way down and
filling it for all these this provides a
much better analysis of what's going on
with these values of counts and skills
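The skills per job idea can be sketched in Python with two tiny made-up tables linked on job ID, which is the same relationship the data model uses. The table contents and variable names are hypothetical; the ratio mirrors the C2 / B2 calculation done per job title.

```python
from collections import Counter

# Hypothetical stand-ins for the salary (jobs) table and the skills table.
jobs = [
    {"job_id": 1, "job_title_short": "Data Analyst"},
    {"job_id": 2, "job_title_short": "Data Analyst"},
    {"job_id": 3, "job_title_short": "Data Engineer"},
]
skills = [
    {"job_id": 1, "job_skills": "sql"},
    {"job_id": 1, "job_skills": "excel"},
    {"job_id": 2, "job_skills": "sql"},
    {"job_id": 3, "job_skills": "python"},
    {"job_id": 3, "job_skills": "sql"},
    {"job_id": 3, "job_skills": "aws"},
]

# The relationship: job_id in the skills table looks up the job's title.
title_by_job = {j["job_id"]: j["job_title_short"] for j in jobs}

job_count = Counter(j["job_title_short"] for j in jobs)
skill_count = Counter(title_by_job[s["job_id"]] for s in skills)

# Skills per job: count of skills divided by count of jobs, per title.
skills_per_job = {t: skill_count[t] / job_count[t] for t in job_count}
print(skills_per_job)
```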
here when we get this proportionality we
can create this with measures as shown
in this final pivot table that we're
going to be creating coming up in the
third lesson of this chapter so in
summary power pivot provides us the
opportunity to now model our data which
allows us to one create relationships
and two unlocks these
measures that we can create using
Dax all right so let's get into this
lesson what we're going to be focused on
for well first thing is we're going to
enable the power pivot plugin and
then from there actually get into
data modeling or modeling our data that
we imported through Power query after we
have everything set up with our data
model we're going to then move into
performing our first analysis analyzing
based on a job title how many different
skills they have associated with it like
I said we'll eventually get to that
skills per job in an upcoming lesson so
for this you can continue to work in
that workbook that we were working with
in the last chapter on power query
we're going to continue work on that
because we want to use those queries
that we built if you got lost along the
way and just want to start back up we're
going to be starting from that M
language workbook back in the power
query chapter as a reminder these
lesson workbooks are the
completed workbooks at the end of the
lesson specifically for this lesson part
one was just that intro part two is what
will be done at the end of this lesson
anyway here I am in the M language
workbook we need to get into enabling
power pivot right now you probably don't
see Power pivot up at the top of the
tabs so I'm going to go into file and
then go down to options from here I'm
going to select add-ins like we did
before and instead of excel addins we're
actually going to be using those COM
add-ins I'm going to click go and they have
three different ones available data
streamer power map and power pivot we
want Power pivot I'm going to go ahead and click
okay now power pivot should appear up at
the top all the way on the right hand
side and should look something like this
quick little overview of this tab manage
here pops up the power pivot window
which we're going to be doing a deep
dive on this in the next lesson we're
going to use it a little bit in this
lesson but anyway that's one way you can
actually access it you can also go to
the data Tab and then here under data
tools you should see it also and you'll
be able to manage your data model and
once again it will pop up the window
additionally on this tab you have the
ability to create measures and kpis
which we're going to be diving deep into in
the third and fourth lesson if you have
a table within your worksheets you can
add it to your data model you can also go
about detecting relationships although I
don't find that this feature works that
well and then finally they have settings
and settings I don't really touch that
much nor does it have much control
here so let's actually get into
onboarding some data into our data model
we're going to do a simple example first
here I created a new sheet made three
columns of ID name salary and then
different values associated with it one
way I can add to the data model if I
have data in a table is to use this
feature of add to data model in this my
table has headers I'll go ahead and
continue and then it will pop open power
pivot it's a similar environment
to Excel but I can't actually edit
any numbers in here this is just how
you're modeling your data if you needed
to actually edit it I have to go back to
the sheets and like I said this isn't a
method I typically use typically have
bigger data sets not located in tables
so I'm going to go ahead and rightclick
this down at the bottom this table name
of table two click delete it's going to
say hey are you sure you want to delete
this table and Bam it's gone all right
so now there's nothing in our data model
right now here we are still inside the
power pivot window and if you've noticed
from this in the Home tab right here it
has the option to get external data they
have options for you to actually connect
directly with power pivot to things
like a SQL Server Microsoft Access you
could also get it from some sort of data
feed and then this option would probably
be more useful in that it has a lot of
different sources you could use such as
other Excel files text files such as
csvs and whatnot now you may be asking
yourself I'm going to close out of this
power pivot why would I import with that
when we just went through using power
query to get data and at which
time should I use which well it's very
important to remember the purpose of the
tool that you're using power query is an
ETL tool extract transform and load we
did a lot of Transformations with our
data set and so that's really the power
of power query and then it loads it in
power pivot's strength is not in ETL or
data cleaning instead it's in data
modeling creating these relationships
and DAX now you may be tempted to
come inside of existing connections and
try to connect to specifically that
salary and skills and if we went through
like in the salary case and try to click
open we're going to get an error message
and I'll be honest this is really
confusing because we have this workbook
connections why isn't this working well
it really just comes down to naming
conventions and the fact that power
query connections are not the same as
power pivot connections but we have a
fix for this we just need to exit out of
the power pivot window here inside of
queries and connections remember you can
get to that by going to the data Tab and
going to queries and connections we can
go to something like data job salary
which right now is a connection only
rightclick it and go to load to right
now it's only under only create
connection but we need to check this
check mark of add this data to the data
model I'm going to click okay it's going
to go through this process of loading
the data and now it talks about the rows
are loaded but mainly if I go to the
connection it has this new connection
now of this workbook data model which if
I go to and actually open up or manage
our data model we can see that it's
inside of here we have this basically
sheet for the table itself of data job
salary inside power pivot inside the
data model now we do need to get that
other pivot table or other table into
there as well so I'm going to go to queries
data job skills right-click this load
to and also add this to the data model
okay it says 167,000 rows are
loaded and for connections
it's still only going to be one connection
because we only have one data model in
this case and now when I go to manage
the data model I have two basically
sheets down here but two tables and now
we have the data job skills in
here anyway I want to do some cleanup
real quick I'm going to clean up power
pivot but this data jobs merged and this
data jobs cleaned it's going to be very
confusing like I said we're not using
this mainly for the fact that we have
duplicate values in here for senior data
scientists in this case and then for the
salaries and so if we don't manipulate
this in a correct manner we're going to
get the wrong results so we're just
going to get rid of these so for data
jobs merged I'm going to right-click and
select delete and it's going to say hey
are you sure you want to delete data jobs
merged yes I do and then I'm going to do
the same thing with data jobs cleaned
right click it and select delete also if
you have these tabs down here for data
jobs clean or merge you can go ahead and
delete those as well with our models now
cleaned up let's actually get into going
over really briefly this power pivot
window with this we have three main tabs
of Home Design and advanced advanced
we're not going to go into a lot of
things inside of this if any at all it's
beyond the scope of the course we're
going to be focusing mostly on the home
and the design tab so with this tab
we've already gone over get external
data but we can do things like refresh
our data if we know that it's updated in
power query generate pivot tables and
pivot charts based on our data model
itself change the formatting of a
particular column in this case it's
recognized as text if we go to the data
jobs salary data we can actually scroll
over and see that for the salary year
average column it knows that it's a
currency we did a lot of this cleanup
right in power query and setting these
different data types so this saves a lot
of steps here in power pivot if it
wasn't done now we have options
displaying the table below that we can
actually sort it we can filter it or
sort by a certain column they also
provide options to find a specific value
within here and then these features for
calculations I don't find myself using
that much as far as the auto sum anyway
over on the right the most important
thing I find is it allows you to toggle
the different views of your data set so
right now this is the data View and if I
scroll over here this is the diagram
View and this is going to show our two
different tables side by side I'm going
to move them over and actually expand
this one out to show all the different
columns and then the data job skills now
back on that data view clicking that we
have data view but also below this we
have this calculation area which I can
toggle on and off calculation areas are
where we're going to be storing our
different measures that we build with
DAX and so they'll be appearing
underneath here if we have any
hidden columns we'll be able to toggle
them on and off right now I don't have
any hidden columns now one thing to note
with this data cleanup like some of what
we did before with formatting some of
it's going to be quite limited you may
not be able to do it like in the case of
this data job skills has this job
title short column and actually if we
look at the data jobs salary data set we
have the same repeated column in it so
data job skills this job title short
right here is unnecessary now I could
rightclick it and try to delete the
column
and it asks me if I want to delete it but
it's going to tell me it's not going to be
able to do it because it was created by
a query i.e. through power query and
instead I should actually update it
through Power query which I would
actually argue as best practice anyway
so I could exit out of power pivot launch
power query by pressing alt F12 then go
into the data jobs skills query and if I
want I can just select this column and
select remove columns but you know how I
am I like to actually clean up the
applied steps because depending
on how large your power query query is
it could take a long time to load
unnecessarily so if I go to this
remove other columns that's the first
time that it appears in it I can remove
this by deleting it out of there then
pressing enter we may get an error
message we may not I'm not sure going to
the last step in here I notice there's one
thing a column of the table wasn't found
specifically here it's job
title short in here so I can go ahead
and delete job title short along with
that comma and Bam we now have this
final table just leaner by those two
steps I'm going to go ahead and close
and load this and now going back in to
look at our data model and power pivot I
can see that it updated for data job
skills all right moving into this design
tab within power pivot this has a few
different options within it for adding
columns freezing columns just messing
with the columns they also have
different options for creating
calculations concerning columns we'll be
getting into calculating columns more in
the next lesson so stay tuned for that
right the main thing that we're actually
going to be doing in this portion of the
video is actually setting up
relationships and for that we could go
about creating a relationship here and
right now I have data job skills and I
could relate it with the job ID by
pulling the drop down to the data jobs
salary table on that job ID now that's a
way I can do it I'm actually not going
to do it this way I actually prefer
going to the diagram View and then from
there just dragging and dropping the job
IDs across each other and then it
establishes this connection which we can
see through this line through here now
there's a few different things that we
need to notice from this line here one
this arrow it's going to come back to bite us
in the butt later and that's because that
arrow only allows data flow in One
Direction and by data flow I mean
filtering if I try to filter something
in the data job skills table this arrow
is only pointing in One Direction I
won't be able to filter it back we'll
encounter those problems in a little bit
and we'll talk about strategies for how to
actually offset it the other thing to
note with this relationship here is you
notice right here it says one and over
here it says star in this case this is a
one to many relationship and what does
this mean well going to our data view
for data job salary we only have one
unique ID for each job whereas in the
data jobs skills we have the same job
ID on multiple rows or many job IDs now
if each job ID only appeared once in
there and we actually looked at that
diagram view for
this relationship we'd have a one to one
relationship but we have multiple skills
in there so that's not possible now it's
also possible to have basically an asterisk to
asterisk or many to many relationship but
that causes a mess slows down your data
model and I don't recommend it so you
should typically see either a one to one
or a one to many last little wrap up
before we actually analyze and use this
relationship we have the options for
table properties which we're not going
to be able to look at because this was
created via power query for this
connection and then we have options to
create date tables underneath calendars
which we're going to be exploring in an
upcoming lesson and like always you have
an undo and redo anyway let's actually
get into analyzing and putting this
actual relationship to the test so what
we're going to do is inside the Home tab
go to pivot table cuz we want to
create a pivot table with this we're
going to insert a pivot table and we'll
have it insert into a new worksheet
selecting inside the pivot table it's
not having the field list come up so
I'll select it under pivot table analyze
anyway we want to query across these
tables to show the power of the
relationships so what I'm going to do is
from the data jobs salary table I'm
going to take that job title short throw
it into the rows and then from there
going to come down to the data jobs
skills table and I'm going to throw the
job skills into the values it should be
performing a count and then I'm going to
organize this real quick from largest to
smallest and it looks like data
Engineers have the most so this is
pretty neat we're able now to query
across tables going back into that power
pivot window this connection allows us
to do that I'm going to just show you
something real quick by clicking this
relationship right-clicking it and
deleting it yes delete from model and I
want to show you how these values are
going to change inside our pivot table to
where they're all the same value and that's
how you know that your relationship is
not set up correctly whenever you have
multiple repeating values and you expect
them not to be anyway sometimes you'll
see this popup come up of relationships
between tables may be needed
autodetect and sometimes it works
sometimes it doesn't um in this case it
looked like it worked so we're going to
go with it and just double- checking it
in power pivot it is set up
correctly so for this final analysis
we're going to be looking at building
this visualization right here analyzing
what are the top skills of data nerds
we're basically remaking what we did in
the power query chapter now that we have
that updated data model anyway we're
going to build this out to see the
skill counts for each of these and also
provide filters for job country so back
inside the workbook that we were
previously working with if you
remember we did make that sort
of similar visualization that I talked
about however if I go to data and
actually refresh the data it's going to
give me this error message because once
again we deleted data jobs merged anyway
I thought this was actually going to go
away it didn't and it is not what we want
so we're going to delete this one and then
we're going to do a little bit of
cleanup so that one that we created the
job analysis on I'm going to actually
just rename that quick to job analysis
and then now this new sheet
we're going to name
skill job analysis anyway let's insert a
pivot table in here so we go to insert
pivot table and now what we have the
option for is from data model and it's
asks if I want to put it in the existing
worksheet yes I do remember we want to
analyze the skills and specifically how
many counts they have associated with it
or how many jobs they have associated
with it so I'm going to put the skills into
the rows and then from there I want to
count how many jobs are associated with
it so I'm just going to drag that job ID
into the values right now it's doing a
sum I'm going to click on it go to value field
settings change this to count now you
may be like Luke could we use the job
skills count and we can which has the
same exact values but actually closing
this out and taking out job skills
you're probably more interested in why
can't I use something like the job ID
from the data job salary table well if I
drag that over and then I change this
value field setting to count and
click okay you notice it says
32,672 which is coincidentally the same
number of rows of that data set and this
gets into the point of filter Direction
what do I mean by that let's go back to
the data model itself looking at it in
diagram view remember the arrow is
pointed towards the data job skill table
right now I have job skills in the rows
and I'm trying to filter for data job
salary based on the count of the job IDs
but the arrow doesn't flow in that
direction we can't do it now in
something like Power BI you can actually
right-click this edit the relationship
and change the direction that's not
possible within Excel unfortunately
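To picture the one to many relationship and its single filter direction, here is a rough Python sketch. The two small tables, their column names and the helper functions are invented for illustration; they are not the course data set and not anything power pivot exposes. A filter on the one side (salary) narrows the many side (skills), while a skill sitting in the rows never reaches the salary table, so counting job IDs there returns the full row count every time.

```python
from collections import Counter

# Hypothetical stand-ins for the two tables in the data model.
# salary_table is the "one" side: each job_id appears exactly once.
salary_table = [
    {"job_id": 1, "job_title_short": "Data Analyst"},
    {"job_id": 2, "job_title_short": "Data Engineer"},
    {"job_id": 3, "job_title_short": "Data Analyst"},
]
# skills_table is the "many" side: a job_id repeats once per skill.
skills_table = [
    {"job_id": 1, "job_skills": "sql"},
    {"job_id": 1, "job_skills": "excel"},
    {"job_id": 2, "job_skills": "sql"},
    {"job_id": 2, "job_skills": "python"},
    {"job_id": 3, "job_skills": "excel"},
]


def cardinality(rows, key):
    """Return "one" if every key value is unique, otherwise "many"."""
    counts = Counter(row[key] for row in rows)
    return "one" if all(n == 1 for n in counts.values()) else "many"


def skills_for_title(title):
    """Filter flowing with the arrow: salary (one) -> skills (many)."""
    job_ids = {r["job_id"] for r in salary_table if r["job_title_short"] == title}
    return Counter(r["job_skills"] for r in skills_table if r["job_id"] in job_ids)


def salary_jobs_per_skill():
    """Filter against the arrow: the skill in the row never reaches the
    salary table, so every skill reports the full salary row count."""
    skills = sorted({r["job_skills"] for r in skills_table})
    return {skill: len(salary_table) for skill in skills}


def skill_jobs_per_skill():
    """Counting job_id inside the skills table itself works as expected."""
    return dict(Counter(r["job_skills"] for r in skills_table))


relationship = (cardinality(salary_table, "job_id"), cardinality(skills_table, "job_id"))
print(relationship)
print(dict(skills_for_title("Data Analyst")))
print(salary_jobs_per_skill())
print(skill_jobs_per_skill())
```

The repeated value in `salary_jobs_per_skill` plays the role of the 32,672 that shows up on every row of the pivot table.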
anyway we're going to be using Dax to
fix this in the future for the time
being we're just going to go about using
in this case for this analysis the same
values in the same table I'm going to
remove this other job ID from the other
table anyway we're going to sort these
values from largest to smallest then
additionally I only want to show the top
10 skills so I'll go to Value filters
and then top 10 dot dot dot top 10
items by count of job ID is what I want
and so now we have this so now we have
the values we want to visualize I'll go
in and actually insert a pivot chart for
this I like the bar because it makes it
easier to read the different skills that
it has right there and I'm realizing now
the sort order is actually
backwards in this I want it from
smallest to largest I'm also going to
right click and hide all field buttons
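The count, top 10 filter and sort order applied to the pivot table above can be sketched in Python like this. The skill rows are made up and `top_skills` and `bar_chart_order` are hypothetical helper names; the reversal assumes, as is typical for a horizontal bar chart in Excel, that the first row is drawn at the bottom.

```python
from collections import Counter

# Invented (job_id, skill) pairs standing in for the data job skills table.
skill_rows = [
    (1, "sql"), (1, "excel"), (1, "python"),
    (2, "sql"), (2, "excel"),
    (3, "sql"), (3, "excel"), (3, "python"),
    (4, "sql"), (4, "tableau"),
]


def top_skills(rows, n):
    """Count job postings per skill and keep the n largest counts."""
    counts = Counter(skill for _job_id, skill in rows)
    return counts.most_common(n)


def bar_chart_order(ranked):
    """Flip to smallest-to-largest so the biggest bar ends up on top."""
    return list(reversed(ranked))


top3 = top_skills(skill_rows, 3)
print(top3)
print(bar_chart_order(top3))
```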
we're also going to be adding axis
titles for the primary horizontal and
then removing that Legend we'll update
this title to what are the top skills of
data nerds and then the y- axis is
self-explanatory but for the x-axis
we'll label this skill count in job
postings okay the last thing we need to
do now is actually add some slicers to
this so we can actually control it
better so selecting the table itself
going to insert slicers I'm going to
select the job title short and also we
want job country right here and each
of these slicers I'm going to rename
them also this one job title short I'm
going to rename to job title and then
job country I'm going to rename to
Country now when I go through I can
actually select something like data
analyst and it will filter down and
actually see the associated skills I
could also do something like look
at those in the United States
specifically for their counts and we see
that SQL Excel and Tableau are the three
top skills now you may be scratching
your head on like okay I thought we were
trying earlier to actually aggregate
something in the pivot table and it
didn't work well remember this arrow is
pointing in the filter direction so in
our case we have a job title short
slicer and because this arrow is in the
direction of the data job skills
table we can filter in that direction
but we cannot conversely filter in the
other direction that's why we can't get
the counts from these tables little
confusing I know but I promise you we
will work it out as we go through this
entire chapter in power pivot so bam we
just completed our first analysis for
our final project we have a few more
analysis coming up in the next lessons
you do have some practice problems
though to go through and get yourself
more familiar with power pivot and
understanding what's going on with these
relationships the one to many and
whatnot all right with that I'll see you
in the next one in which we're going to do
a deeper dive looking into that power
pivot window with that I'll see you
there all right let's now dive further
into Power pivot and we're going to be
focusing on the power pivot window for
this we're going to be looking at some
major aspects of it for this we're going
to get into using a little bit of Dax to
create our first measure and with those
measures we're also going to be
exploring the difference between
implicit and explicit measures don't
worry we'll cover that in a bit from
there we're going to move into a feature
that's related to measures called
calculated columns and it's going to
allow us to inside of our data model
create different values such as in this
case we can actually create a date column
from our date time value the last thing
we'll explore are date tables which
power pivot gives us with a click of a
button and allows us to connect
these date tables to our
original data source and then filter it
by a lot of different date values and so we'll
wrap this all up with a final analysis
where we're looking at job postings
based on a day of week using this date
table anyway jumping into Excel for this
we're not going to be using any of the
work that we've done previously instead
we're going to open up a completely new
workbook and be working out of this
instead and the reason is all the work
that we're going to be doing within this
lesson we're not going to be carrying it
on to our project
this lesson is more to get
us more familiar with the powers of power
pivot oh gosh this pun's killing me and
so we'll eventually incorporate some of
the stuff into our final project but
like I said we're going to be starting
with a blank notebook or workbook for
this as always if you want to see what
the results are at the end of this
lesson you can just go to Power pivot
window and it will have
it all right so let's actually get some
data into here to start working with and
like I said we're not going to use power
query at all for this we're going to use
power pivot so I'm going to go to
manage to open up the power pivot data
model and we want to get this external
data specifically we want to get that
Excel workbook that we've been working
with of data jobs salary all so
underneath the Home tab I'm going to go
to get external data and it's going to
be from other sources if we scroll all the way down
we could look at how we can import it
from different databases or whatnot
we're going to be doing it from an Excel
file then from there we're going to
browse the connections navigating into
that data set folder I'm going select
data jobs salary all it prompts me if I want
to use the first row as column headers I
do if I wanted to I could go in and test
the connection to make sure it's
going to succeed and it does so we'll go
from there to next it sees that it has
one sheet within the workbook that's the
one that I want I'll click finish next
it'll go through the import looks like
it completed it has a success got 32,000
rows I'll click close now let's go
through and actually clean this data set
up using power pivot now I know in the
last lesson I talked about hey we're
using power query for ETL and that's
true but let's say you have a quick data
set you need to connect to and model
quickly in that case you would do some
of the stuff that I'm going to do here
in order to quickly model it if I wanted
to rename it I'd come down to this
basically sheet tab down here it's
called sheet one after where it's at
I'll rename it and we'll keep a similar
naming convention of data jobs salary go
ahead and press enter so let's say for
this quick analysis that we're trying to
do in this lesson I'm trying to analyze
only the yearly salary data I don't
care about the hourly salary data and
I don't even want those data entries in
here well I can get rid of that salary
hour average column by just deleting this
column by right-clicking it it's asking
me if I want to delete it yes and now
there are still blank values in here right
so I need to get rid of the salary rate
values that are equal to hour so I'm
going to click the filter here unclick
next to hour and click okay so now we
have that out the other thing I want to
do is actually clean up the format of
the salary and I'm going to change that
instead to a currency and this talks
about how the data is going to be
changed where it's stored yeah I
don't really care about that no data
that I care about will be lost it'll all
be here still and then I'm going to
reduce the decimal places by two the
other thing I can do if I wanted to is
actually sort this based on that job
posted date I could come up here and sort
from newest to oldest and then it's in
order sorry I actually want it oldest to
newest got confused on that one so bam
just did some quick clean up to our data
set and now we're ready to proceed
forward so let's actually get into
building our first measure or measures
specifically I want to analyze this to
understand what is
the number of jobs in here and then also
what is the average and then also more
importantly the median salary well
there's a few different ways we can do
this we're going to do this first within
this power pivot window so in order to
do this first I want
to do a count so we're going to just run
this on this job title short column and
here underneath on the Home tab under
calculations we have this Auto sum I
don't frequently use this I use it every
now and then but I can run things on
this like count or distinct count I'm
going to do count in this case and this
is going to create our first measure
down here remember down below this area
is our calculation area I can toggle it
on and off by clicking calculation area
up here anyway I can also make this
column slightly bigger and what's cool
about this keep on scrolling over is now
it tells us the name of this measure
count of job title short and that there's
22,000 remember there's normally around
30,000 but because we've taken out that
hourly data we're down to 22,000 now I
can also edit this measure if you notice
it appears right up here similarly they
have a formula bar in power pivot and to
the left hand side it tells you what is
actually selected job title short column
and then the actual measure itself in
here now one quick note there is
basically a colon and then an equal sign
that's how we're going to know that
we're doing measures and we'll get to
calculated columns in a little bit and it
will only be the equal sign but this is
Microsoft's way of signifying that
we're using a measure so that way you
don't confuse it with anything else anyway
I can edit the actual title in this
case and I can change this to something
more descriptive job count
pressing enter it now runs it and it's a
lot shorter additionally if I want to
actually format it I can have the
measure selected come up here select
comma and then it formats it with the
comma and then I don't want two decimal
places I'll go ahead and remove it next
let's get into analyzing that salary
column with this once again we can click
this I could use that auto sum and do
something like average here clicking
average and below it it generates that
average
of salary year average 123,000 and I can
change it if I want to average salary
but if I wanted to calculate something
like the median instead I would have to
actually manually type out this
calculation so selecting right below
average salary and then coming into the
formula bar I can type in something like
median salary remember we want to create
a measure so it's going to be a colon
and then an equal to and then for this
we want to use the median function now a
lot of these functions that are Dax
functions are very similar to what we
use in Excel so they have a lot of
different similarities but with this
like we talked about before this allows
us to now put in basically an entire
column into it and then perform that
entire aggregation on it in this case I
want to do it all on salary year average
making sure I put a close parenthesis to
close out that function and Bam right
next to average salary we have this
median salary now which needs to be
formatted so I'll format it as English
United States USD and remove the
decimal places now what happens if I
didn't enter that colon equal sign so
here I have the cell selected below median salary
we'll go ahead and paste in that formula
and we'll delete that colon I haven't
run this yet now I'm
going to press enter and as you notice
by this it's not actually calculating a
value it actually just converts this to
text so this is not what we want that's
why we have to do the colon equal to
sign for entering in the formula bar
there
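As a rough analogy for what these three measures compute, here is a Python sketch using only the standard library. The salary rows are invented and the function names simply mirror the measure names above; this is an illustration of the aggregations, not how power pivot evaluates DAX internally.

```python
from statistics import mean, median

# Made-up rows standing in for the data jobs salary table.
rows = [
    {"job_title_short": "Data Analyst", "salary_year_avg": 90000.0},
    {"job_title_short": "Data Analyst", "salary_year_avg": 110000.0},
    {"job_title_short": "Data Engineer", "salary_year_avg": 130000.0},
    {"job_title_short": "Data Scientist", "salary_year_avg": 250000.0},
]


def job_count(table):
    """Count rows with a non-blank job title, like a count on that column."""
    return sum(1 for row in table if row["job_title_short"])


def average_salary(table):
    """Average of the whole salary column."""
    return mean(row["salary_year_avg"] for row in table)


def median_salary(table):
    """Median of the whole salary column."""
    return median(row["salary_year_avg"] for row in table)


print(job_count(rows))
print(average_salary(rows))
print(median_salary(rows))
```

In this toy data the single 250,000 row pulls the average well above the median, which is the usual reason to look at the median for salaries.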
so with measures it's important to
understand implicit versus explicit
measures so let's close out the power
pivot window and actually get into
exploring these different measures by
creating a pivot table of that median
salary we just created so I'm going to
go to insert pivot table from data model
we're going to insert it into the
existing sheet here we have our table of
data job salary I'm going to analyze the
salary based on the job title short
column so I'll put job title short into
the rows and then look if we scroll down to
the very bottom you'll notice that the
measures that we created have this fx
basically it shows us an equation so we know
it is a measure so I can take these
measures this median salary and in this
case drag it into the values and now
unlike power pivot where it did it over that
whole column we're now filtering down to
do it by well the appropriate job titles
now we could also do something like drag
the job count into the values as
well and actually see the job count
there now both of these measures are
explicit measures because we explicitly
defined them we defined what job
count is and what median salary is so
what is an implicit measure well you
actually created this before so in
regards to that job count we're doing a
count of the job title short column if I
were to drag that down into here you can
see it says say count of job title short
this is an implicit measure these are
great for quick short analysis as we
demonstrated before you can quickly
throw something in and generate it and
you didn't even know you were using
measures but you were similarly with the
salary year average if I drag that in
down here like we did previously changing
this to actually perform an average
moving to average from there that is also
an implicit measure so I think you get
the point but we're going to see the
power of this as we go through this when
we start to make newer measures that are
actually going to use our explicit
measures specifically we're going to be
using our job count in other
calculations and so these explicit
measures are going to save our butt and
save us so much time and ensure we're
doing the correct
calculations so let's get into our first
calculated column and we're going to be
going back into the power pivot window
for this we're going to be creating a
column that will convert the salary year
average values into Euro values so
there's a couple ways we can do this or
add these columns we can go under design
and right here under columns we can
click add to add a column additionally
with that unselected and selecting
back in the table again you see this add
column up here we can just go right in
and add a column I feel that's actually
easier anyway in this case in order to
get the euro value of what it is for
salary year average we need to multiply by
a conversion rate so inside of here I'm
going to put the equal sign and we see
it's popping up here in the formula bar
from there I'm going to use the value in
salary year average I just selected one
of the values and it popped right in
then from there similar to how we wrote
formulas before I'm going to put times
0.9 enter now notice from this one I
didn't use the colon equal sign right
because this is not a measure it's a
calculated column and it still knew that
this was a currency although I don't
like it it has two decimal places so
I'll remove it and to me it knows it's a
currency but it doesn't know that it's a
Euro so I'm actually going to convert it
over to Euro and then remove the two
decimal places additionally I'm going to
rename this from calculated column one
to salary year Euro you can identify
calculated columns because Normal
columns are green the calculated columns
are black also if I go to the diagram
view we can see that well you can't
really tell that we have the calculated
column salary year Euro but you can see your
different measures that you've created
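The difference between a calculated column and a measure can be sketched in Python as follows. The rows are invented, the dictionary keys are placeholder names, and 0.9 is the same rough conversion rate used above rather than a live exchange rate. A calculated column stores one value per row, while a measure collapses a column into a single value.

```python
from statistics import median

USD_TO_EUR = 0.9  # rough rate from the lesson, an assumption and not a live rate

# Made-up rows standing in for the data jobs salary table.
rows = [
    {"job_id": 1, "salary_year_avg": 100000.0},
    {"job_id": 2, "salary_year_avg": 120000.0},
    {"job_id": 3, "salary_year_avg": 150000.0},
]


def add_salary_year_euro(table):
    """Calculated column: evaluated row by row and stored with each row."""
    return [
        {**row, "salary_year_euro": round(row["salary_year_avg"] * USD_TO_EUR, 2)}
        for row in table
    ]


def median_salary_euro(table):
    """Measure: aggregates the calculated column down to one value."""
    return median(row["salary_year_euro"] for row in table)


with_euro = add_salary_year_euro(rows)
for row in with_euro:
    print(row["job_id"], row["salary_year_euro"])
print(median_salary_euro(with_euro))
```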
all right so back to the data view even
though we have this calculated column we
could also create a measure on this
calculated column clicking in the box
below here and then typing in here I
could do something like median salary
Euro and then put in that median
function for salary year Euro and then
close the parenthesis and Bam now we
have it I'm going to spread it out to
actually see we have the value of €
103 now going back into here we can take
this and actually if we wanted to we
could put the salary year euro into
there that column it's going to
aggregate it appropriately right now
it's doing a sum so if I wanted to I
could get a average of this of these
values or we could actually go to that
measure that we created That explicit
measure throw it in here and we get the
explicit value of the median salary
Euro all right so let's shift our focus
on this analysis let's say we wanted to
analyze more around the date
specifically the days of the week for
when job postings are occurring well
let's go back into Power pivot and
manage to open up the power pivot window
right now investigating the diagram
view of our data model we only have one
table in here data jobs salary well if
we go under the design tab as we talked about
in the last lesson we can actually
create a date table I could also
potentially mark this table of data job
salary as a date table but it's not a date table so we
actually need to create one and you'll
see what it looks like after that and
with that I clicked new on this anyway
it created this new table called
calendar and inspecting all of the
different values in here well let's
actually just get out of this view let's
actually go to the data view one thing which
is pretty cool with what it
created is it created it based on the dates
it knew were in our original table
so from the first of 2023 all the way to
the last day of 2023 and with this it
has a year column month day of week and
day of week number so a lot of great
values from it now we need to actually
connect these two there's no
relationship between the two if we go to
that data jobs salary so selecting it
here we only have this job posted
date column which is a date and a time
so we need only a date so because this
column is named inappropriately I'm
going to change it to job posted date
time so now let's create that new column
with that job posted date time this time
though instead of clicking add column
we're going to go to insert function and
this is pretty neat because it allows us
to actually look under different things
in this case we wanted sort of a text
function and we can look and explore
different ones specifically I know we
want this one of format converts a value
to text in the specified number format
so I'm going to click okay and it
automatically fills it in with this
colon and equal sign of format equal to
from there I'll select the job posted
date time column that's the value and
then what do we want for the format well
I know we want in the format of
basically the year first then two months
or two M's and then two D's for month
and day in order to match close that
double quote because that's the actual
format we're using that's all we need so
we'll close the parentheses and press
enter and then I'm going to take this
calculated column one drag it over here
and then I can see that it did convert
it correctly so I'm also going to go now
and rename this appropriately to job
posted date press enter so now let's
create a relationship between the two
remember we can go to that diagram view
or I can use this of create relationship
go to calendar to match on the date
itself let's see what it looks like in
that actual diagram view we always want
to inspect it to make sure we have this
right one to many or one to one anytime
we have many to many you need to start
questioning it depending on what the
data is anyway we now have a
relationship established with
this so let's actually get into
analyzing this with our calendar based
on this day of the week and seeing what
is the proportion that they're turning
out during the week for job postings so
closing out the power pivot window I'm
going to go in and create a new sheet
from there I'm going to go insert
pivot table from data model we're going
to do it in the existing worksheet
underneath calendar Underneath more
Fields I'm going to drag in day of week
into the rows so it has Sunday all the
way to Saturday then from there remember
we created that job count already so I'm
going to take that and drag that into
the values so looking at this I can see
that I think our relationship is not set
up properly cuz we have basically the
blanks at 32,000 I think I know what's
going on with this let's go back into
the power pivot window in calendar when
we select the date it's of the data
type date it also has this format of
date and time I don't think that really
matters too much but if we go into data
jobs salary and we go to that job posted
date because we used that format function
right now the data type is auto of text
we need it to be of date and this now
looks a lot more similar to what it does on
the calendar now when I close out of
this bam all the values pop up here so
don't forget about your data types and
making sure they match within the
data model so let's actually visualize
this by inserting a pivot chart and Bam
we get this bad boy which we'll rename
to when are most jobs posted during
the week and it looks like we have well
Saturday and Sunday are the lowest
obviously during the week it's the
highest with basically a higher amount
on Wednesday so pretty cool analysis
that we were able to do based on the day
of the week we didn't have to create any
additional things and additionally we
can evaluate based on this calendar
table created we can do other analysis
such as by the year month day of the
week and whatnot all right so that's a
brief intro into measures and also
calculated columns don't worry too much
if you're not feeling too confident with
them just yet as one you have some
practice problems to go through to get
more familiar with it but
the next two lessons
will be on DAX and advanced DAX in order
to explore different formulas that you
can also use inside of your measures and
also calculated columns all right with
that I'll see you in the next one where
we're getting into DAX see you there
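As a recap of this lesson, here is a minimal sketch of the calculated column and the measure in DAX. The table and column spellings (data_jobs_salary, job_posted_datetime, salary_year_euro) are assumptions based on the spoken names, and the dash separators in the format string are also an assumption, since the lesson only gives the order of year, two M's and two D's.

```dax
-- Calculated column in data_jobs_salary (assumed table name).
-- FORMAT returns text, so the column's data type still has to be
-- switched to Date before it can relate to the calendar table.
= FORMAT ( data_jobs_salary[job_posted_datetime], "yyyy-mm-dd" )

-- Explicit measure on the euro calculated column (assumed column name).
Median Salary Euro := MEDIAN ( data_jobs_salary[salary_year_euro] )
```

If the relationship to the calendar shows everything under blank, the data type of this column is the first thing to check.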
welcome to this lesson on DAX or data
analysis expressions we've used it a few
times before in the previous lesson but
now we're going to go much more in depth
in actually understanding the basics of
it now as we've learned Dax can be used
within measures or even calculated
columns for the purpose of what we're
going through in the project we're not
going to create any calculated columns
but we will be using it for measures for
this we're going to be focusing on three
major types of functions in this lesson
specifically around aggregation
statistics and also filter these
functions you're going to notice are
very similar to your Excel functions
that we did back in Chapter 2 so a lot
of those similarities and concepts we've
learned already are going to be able to
be applied to this so we'll be able to
move pretty quick now we're going to be
answering two major questions regarding
our final project the first one involves
calculating the number of skills
required per job title we're going to
use Dax in order to calculate this and
then we're even going to go on to
actually graph this to show how it
correlates with median salary spoiler
alert the more skills you have the
higher median salary you can expect from
there we're going to go into a deeper
analysis of salary specifically looking
at the median salary and specifically
being able to compare it from your home
country to the US and also non us
countries so we're going to use a filter
function in order to be able to view
these things within a pivot table now
jumping right into Excel for this you
can continue working in the Excel file
that you have from that first lesson on
power pivot intro where we created this
visualization right here which analyzes
top skills of data nerds and has some
filters for job title and Country if you
don't happen to have that file anymore
or you got lost along the way you can
just use the power pivot intro part two
file and you can start from there now if
you're loading it via the power pivot
intro part two file you're going to have
two sheets in there one skill job
analysis and then also the skill
analysis we're not actually going to be
using the skill analysis so you can feel
free to delete this or conversely if
you're working from the files that
you've been building up during this and
didn't necessarily load from the power
pivot intro part two file you may have
multiple tabs in there once again I only
care about this skill jobs analysis
where we have this this is what we're
going to keep for the final project the
job analysis and also this other one
that we created back in the power query
lesson we're actually going to be
recreating it with power pivot so both
of these I can just delete or anything
else you have in there you can feel free
to delete after holding control and
selecting both of those I'm just going
to delete
them all right so we're going to be
looking at aggregation functions first
conveniently Microsoft has some
documentation around the Dax functions
and also statements that they have so
I'm going to dive right into the link
that's provided on the screen underneath
Dax functions specifically I'm going to
go into the aggregation functions they
have this page here on aggregation
functions overview and it shows a lot of
the different functions they have for
this average count Max Min sum let's
look at count real quick count is pretty
simple all we're going to do is use the
following syntax count and inside of it
you provide a column and for this it
says Hey the column that contains the
values to be counted so pretty simple
function to use similarly we have
distinct count which has the similar
syntax of you provide distinct count and
the column that contains
the values to be counted and it will return
the number of distinct values in a column
we're going to use this so what we're
going to be calculating with those
functions that we just went over is
trying to find out how many skills per
job we're going to first go through
based on a job title and find not only
the skill count but also the job count
and then we're going to take both these
values and divide them to get the skills
per job so I'm going to create a new
sheet for this and inside of here I'm
going to insert in a pivot table from
our data model we're going to do in the
existing worksheet for the rows we're
going to go through the data jobs
salary table and we're going to put that
job title short into the rows and then
now we need the skill count remember we
could go in and do something and create
an implicit measure by throwing job
skills in the values but we want an
explicit measure because we're actually
going to be using the skill count in a
later calculation to find that skills per
job anyway how do we do this well we can
also not only create a measure by going
to power pivot and underneath here going
to new measure you can also just select
in here which table you want to use in
this case I'm doing a skill count so I
want to contain it in the data jobs
skills table doesn't really matter which
table I'll put it in but I just go by my
memory of which one I'm going to know to
go look at for which in there it auto
selects that table of data jobs skills
the measure name is going to be skill
count and then for the formula itself we
want to do a count of the job skills
column from the job skills table make
sure it's not from the job salary table
okay I'm going to put a closing
parenthesis on this and then for this we
do want to format it to use a thousands
separator and zero decimal places click okay and now in
the data job skills table we have this
explicit measure can drag it right next
to it same values are getting created as
the implicit measure so I'm going to
take out that implicit measure next
thing you want to calculate is that job
count we're going to be counting it
based on the distinct values of the job
ID so I'm going to go to add measure we're
going to call this one job count and
we'll do a distinct count of we want to
do it of the job ID column and for this
one we want to make sure that we're
actually doing it from the salary or
data jobs salary table because this has
all the job IDs in it once again we're
going to format as a number with a thousands
separator and click okay and then I'm
going to drag at the bottom the measure
is going to appear I'm going to drag it
into here so now we want to get how many
skills per job so we want to take the
skill count measure and divide it by the
job count measure this one doesn't really
matter too much because it contains both
of them but I'm going to put this in the
data jobs skills table I'm going to call
this skills per job now what's great
about these explicit measures that we
just created is I can go hey I want to
do this skill count and I want to divide
it by the job count and it's right
there so you don't have to necessarily
write out every single time okay I want
to do a count of the job skills column
and then divided by a count of the job
ID column which actually needs to be a
distinct count anyway this is where we
run into errors that's why explicit
measures are so great all
right so I have skill count divided by
job count I'm going to create it as a
number and I want one decimal place for
this go ahead and click okay and then
we're going to add this skills per job
to here now I'm actually going to
recommend although we just use the
division sign I'm going to actually
recommend this divide function with it
which is a math function and what
you would do in this case is you would
provide divide and you list a numerator
and a denominator and the reason why I
like this is because it
catches any division by zero error specifically it
performs division and returns alternate
result or blank on division by zero
so we're not going to necessarily error out
if we have a division by zero issue
and you can actually provide as shown
down here in the alternate result the
value returned when division by zero
results in an error so you could
actually catch that anyway so I'm going
to go back into that skills per job and
I'm going to go to edit measure I'm
going to change this to divide specify
the first and second parameter with a
comma and then click okay overall
no real change here but just a best
practice to know about
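Written out, the three explicit measures from this part might look like the sketch below. The table and column spellings (data_jobs_skills[job_skills], data_jobs_salary[job_id]) are assumptions based on the spoken names, so adjust them to whatever your data model shows.

```dax
-- Count of non-blank values in the skills column (assumed names).
Skill Count := COUNT ( data_jobs_skills[job_skills] )

-- Distinct job IDs, taken from the salary table because it holds every job.
Job Count := DISTINCTCOUNT ( data_jobs_salary[job_id] )

-- DIVIDE returns blank instead of an error when the denominator is zero.
Skills Per Job := DIVIDE ( [Skill Count], [Job Count] )

-- Optional third argument: a value to return on division by zero.
-- Skills Per Job := DIVIDE ( [Skill Count], [Job Count], 0 )
```

Reusing [Skill Count] and [Job Count] by name is what keeps the third measure short and avoids retyping COUNT where DISTINCTCOUNT was needed.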
so now with this skills per job I want
to actually get in and comparing this to
median salary this is what we're going
to be building right here we're going to
be comparing it to median salary and
then graphing it in a scatter chart in
order to see how these different job
titles correlate to each other so first
to know what the final analysis is
going to be of this I'm going to rename
this sheet appropriately specifically
calling it salary vs skills and this
pivot table here we don't need
necessarily the skill count or the job
count we just need the skills per job
okay we're going to calculate now the
median salary and median is a
statistical function which is
encountered underneath here but there's
a lot of different options underneath
here such as median finding the
different percentiles like we did back
in the formulas looking at things like
standard deviation and whatnot so a lot
of good statistical functions that you
have access to Via Dax so for this
measure I'm just going to come up here
to power pivot go under measures and
select new measure I do want this in the
data job salary table and we're going to
call this median salary for this we're
going to be using the median function
and we need to provide it a column
specifically that salary year average
value for formatting we're going to
format it as a currency with zero
decimal places since it's a salary so
now we have median salary here I
actually want it to appear on the x axis
so I'm going to throw it over here on
the First Column so now we have the
median salary and skills per job I'm
just going to rate these or sort these
from highest to lowest to see if I can
see visually if there's anything going
on with a correlation right now I am
seeing some higher skills with a
higher salary but let's actually
visualize this so I'm going to insert
pivot chart and select pivot chart
for this we want to enter a scatter plot
and if you remember back from our charts
lecture we're going to have issues with
this you can't create this chart type with
the data inside a pivot table it doesn't
natively support creating scatter plots
kind of annoying if you ask me anyway
let's X out of this and for this what
we're going to do is we're just going to
set this area starting up here we're
going to set it equal to this entire
table right here I'm not going to
capture the grand total at the bottom
because we're not going to be plotting
that now with these values I'm going to
select the contents in columns
f and g and then from there go insert a
scatter plot specifically this one right
here I can see it already looks pretty
good you can't actually add the data
labels in whenever you create this chart
we actually have to go about doing that
somewhat manually specifically we have
to select on the data points and then
rightclick it and we have to select add
data labels okay now it's giving us
points which actually
correlate to the skills per job value
it's not what we want we want to include
the job title we're going to add that so
what we're going to do is select one of those
values and just rightclick it and then
from there select format data labels
then the pane's going to open up on the
right hand side and it should pop you up
underneath label options
then this label options and right now we
have this y value selected that's not
what we want we want value from cell and
it says Hey select the data label range
what we want is right here all the way
going down it's hidden behind here I'm
going to sort of guess but I know it
goes down to E11 click okay and
scrolling it over bam we got all those
data labels on there now all right so
now we need to clean this bad boy up
because well it's a hot mess that is all
up in the upper right hand quadrant
labels are overlapping we're going to
fix all of this first thing is I'm going
to correct the axes so I'm going to
click on the x axis
and it should go immediately to this
minimum bound underneath axis options
and I can see the first value
begins around 80,000 so I'm
going to change this to that and press
enter okay similarly I'm going to select
the y axis and if it doesn't go to it it should
be under axis options inside that
format axis pane and I'm going to
select this first value that I want to
go to is three I'll leave the default of
nine there next thing is we need some
axis labels for the y axis we'll call
this average skills requested for the
x-axis we'll call this median salary and
we'll specify the units of USD speaking
of which this is not formatted correctly
for how we want the numbers so under
that format axis pane under axis
options and under axis options
again under number we can go to the
custom option specifically you should
have this type hopefully appearing up if
not you can just enter it into this
format code below and then press enter
all right the last two things to do is
rename the title naming it do more
skills equal more money for data nerds
which from this chart it looks like it
does and we can actually confirm this if
we want by adding a trend line now
there's different options here for trend
lines we've gone over linear
exponential and linear forecast I feel
linear best meets this need here also
like the coloring aspect of it so we're
going to go with that all right the last
thing to do is just fix some of these
names on here so right now we have the
data labels appearing to the right of
the data point and in cases where it's
close so senior data scientist it's
too close to the edge and so it's just
sort of over the top of it anyway what
you can do is actually select it twice
so click it twice then you can drag and
drop it and it should have these arrows
or these connectors that connect the
name to where it goes to all right so
now we have our final
visualization and I'd say it's not too
bad some things I'm noticing about this
some correlation if you notice yes we do
see the average skills requested are
going up with the salary but those jobs
I mean you can pretty much see
the dividing line those jobs that end
in engineer versus analyst or scientist
are commanding or requesting more skills
but yet have sort of a similar pay to
their data analyst or scientist
counterparts so I don't know I guess it
kind of pays to be a data analyst and
not a data engineer don't tell my data
engineer friends I said
that all right last analysis we're going
to get into is using filters to actually
aggregate so in this case right here
we're showing what we're going to get to
the final thing of based on a job title
short value what is the median salary in
this first column for the us then what
is the median salary for non us and then
finally that final column of median
salary what is the median salary of in
this case the selected country is
Argentina it's filtered down basically
with the filter functions we're going to
go over we're going to be
figuring out how to prevent
filters from affecting a visualization
so we can get the values that we may
want so we're going to create a new
sheet and I'm going to call this salary
analysis like before we're going to
insert a pivot table from our data model
insert it into this new sheet and we're
going to be putting that job title short
into the rows now we're obviously with
this going to be calculating median
salary so I'm going to go ahead and just
drag that into the values to start
getting those median salaries
additionally we're going to want to
include a Slicer in here so based on the
job country so I'm going to insert
slicer on job country click okay and
then with this we can actually see if we
select something like Argentina it's
going to filter down to what it is or
what the median salary is in
Argentina but remember we're trying to
add two columns to this so we can
compare these values of something like
Argentina to us salaries and maybe non
us salaries so basically countries
outside the US anyway we're going to be
using filter functions for this and fair
warning on this it says it here the
filter and value functions in DAX are
some of the most complex and powerful and
differ greatly from Excel functions so
there's going to be a little bit of
complexity here in understanding this
and for this filter function we're going
to be using this one on calculate and
what it does is it evaluates an
expression in a modified filter context
calculate is pretty simple in my opinion
first you provide an expression so such
as hey perform a count of this column or
a median of this column from there you
provide a filter or filters and as it
states below here filters can be Boolean
filter expressions table filter expressions or
filter modification functions main thing
is here we're going to use things like
logical operators in order to compare
this to maybe a certain value we're
going to expect so let's jump into
creating our first one with median
salary evaluating for median salary in
the United States
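The calculate pattern described above, an expression followed by one or more filters, could be sketched like this. The names data_jobs_salary, salary_year_avg and job_country are assumed spellings of the spoken table and column names, and the measure names are only illustrative.

```dax
-- Base explicit measure (assumed column name for the yearly salary).
Median Salary := MEDIAN ( data_jobs_salary[salary_year_avg] )

-- Expression first, then a Boolean filter using a logical operator.
Median Salary US :=
CALCULATE (
    [Median Salary],
    data_jobs_salary[job_country] = "United States"
)

-- Same pattern with the not-equal operator (<>).
Median Salary Non US :=
CALCULATE (
    [Median Salary],
    data_jobs_salary[job_country] <> "United States"
)
```

Because the Boolean filter targets job_country, it replaces whatever a slicer has put on that same column, so these two values should stay put no matter which country is selected. The text in quotes has to match the column value exactly.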
so I want to create this measure inside
of our data job salary column sorry data
job salary table and for this we're
going to call it median salary us we're
going to be using the calculate function
for this and inside of here we're going
to insert an expression so in our
case the expression is the median of the
salary year average column and what
we're going to do actually I'm just
going to leave this as is cuz filter is
optional we can tell filter is optional
based on the square brackets around it
I'm going to just close out this
calculate function change this to a
format of currency with zero decimal
places and then from there take that
median salary us and actually drag it
onto here so right now calculate is
working by calculating the median salary
and there's no filters applied to it so
pretty simple so let's go in and
actually edit this measure now now
remember we have an explicit measure of
median salary so I actually don't even
need to Define it like I did here I can
actually just call out median salary in
this case kicking okay still the same
value going back in and actually editing
it we now want to apply a filter
specifically for this filter we want to
make sure that the job country column is
equal to United States so I'm going to
type in job country and we can use
logical operators so I'm going to use an
equal sign right next to this and I'm
going to specify United States need to make
sure it's spelled exactly right I know
it's that via the column okay so now
we're going to leave everything else
as is click okay and Bam now it has the
median salary filtered by the US and I
can confirm this by scrolling down to
the United States clicking United States
and seeing that these values are the
same but no matter what I actually click
the United States median salary is going
to stay the same additionally if you
noticed here when I click on something
like the US Virgin Islands what am I
moving there they only have four job
titles available so because of that it
just filters this table down to only show
those four that are applicable along
with their applicable salaries and median
salary in the US so now let's calculate
the median salary for non us countries
and actually see how they differ so come
into data jobs salary select add measure for
this we're going to be using non us
values once again we want to use that
calculate function on the median
salary measure that we created and for
this one we're still evaluating the job
country but we want it not equal to so
we're going to use basically a less than
and greater than sign right next to each
other say not equal to and we'll say
United States we're going to format this
as a currency with zero decimal places
click okay and then add this bad boy to
the values and I want to actually see a
country with more job postings in it so
we'll go to something like Australia and
now something like Australia we can see
one comparing us to non Us in general
the US well except for data Engineers
yeah it looks like only data Engineers
are the lowest one in another country
everything else is higher in the US but
now we can with this one compare hey
what does it look like something like
Australia compared to us and non us
countries so super useful in actually
filtering down providing the right
context for what we want to look at so
as a data analyst median salary is
around 100,000 Which is higher than us
and also any other non us median salary
so may have to move to Australia one
last clean up right quick slicer itself
I don't like it to say job country we're
going to name this to country all right
now wrap up the analysis for this all
right so you now have some practice
problems to go through and test out
these different Dax functions that we
just went through along with some others
now in this lesson we just did some
basic DAX in the next one we're going
to be moving into some more advanced
DAX features that I do find myself
using from time to time but overall most
of the stuff we applied in this lesson I
use dayto day all right with that see
you in the next lesson we'll be wrapping
up basically our final question in our
project and be done with our project see
you
there all right welcome to the last
lesson in this course where we're going
to be going over more advanced DAX
specifically we're going to be focusing
more in depth on filter and also
relationship type functions
these are going to be needed by our data
model in order to calculate what is the
salary or median salary for an
Associated skill if you remember back to
a few lessons ago we had relationship
issues I know I feel that with having
them being able to filter tables in
certain directions and we're going to be
able to see that and fix that in this
lesson
so in this lesson you can start with
the workbook from the last lesson
or if you got lost along the way you
can go into the Dax intro workbook now
let's do a quick overview of where we're
at with which analysis we've done for
this project we've identified what are
the top skills of data nerds along with
different filters to filter for whatever
our interest is in my case I'm looking
for data analyst in the United States
and I can see that SQL Excel and
Tableau are some of the highest
additionally we've zoomed out a little
bit and been able to identify based on
job titles where our job title of
Interest Falls compared to others and
how many skills it requires for data
analysts it's right above business
data analysts and based on the number of
skills it looks like it's appropriately
rewarded for the median salary and then
final thing we did was be able to
analyze additionally Based on data
analyst we can look at different
countries and compare it not only in
that country but to within the US and
outside the US so a lot of good stuff
related to well data analyst that
position and analyzing the salary but
what about skills well we haven't done
that yet we're going to get into
actually analyzing in this first portion
analyzing what is the expected median
salary based on one of the top 10 skills
we did this back in the power query
lesson but now we have this new data
model we need to recalculate it anyway
we're going to run into some issues with
the data model as we're going to find
out additionally we're going to be
calculating the skill likelihood instead
of skill count basically finding the
percentage of a skill in a job posting
this is somewhat complex so this portion
here will be optional and you'll be able
to use job count instead if you don't
want to follow along with this skill
likelihood anyway back in your workbook
whether you started from that uh Dax
intro or you're continuing on from
the last lesson we're going to create
this new sheet for this and for this
we're going to name this
skill salary analysis as usual we're
going to go in and insert in a pivot
table from our data model so we can get
into analyzing the skills going click
okay insert it in and so for this I want
to analyze what is the median salary for
a skill so if I drag the job skills from
the data jobs skills table into the rows
we have all the different skills pop up
underneath here and then if we went up
here and then tried to drag or we will
be dragging in the median salary into
here all these values are going to be
the same additionally we get this popup
right here that relationships between
tables may be needed basically we're
running into an issue with our data
model even if I click auto detect it's
going to tell me no new
relationships are found so what's going
on here well let's actually analyze our
data model by going to manage and then
inside of here go into diagram view so
the air resides with their filtering
dire remember this Arrow right here
signifies which way we can actually
filter our data so in our case we have
job skills which is over here in the
data job skills table and we're trying
to find the median salary the problem is
is we're basing that off of that salary
or average value that's in the data jobs
salary table and based on the direction
of this Arrow we cannot flow in the
opposite direction this is what we
call one-way or single filtering now
unfortunately Excel doesn't support
bidirectional filtering however in things
like Power BI you can actually go in and
change it from single filtering to both
or bidirectional filtering kind of
makes me wish I was in Power BI right now
so back in Excel we can't actually
control this via here and actually click
it to change this to bidirectional
filters we can only control the
relationship itself but we we can use
Dax to fix this now in order to fix this
relationship we actually have
relationship functions inside of Dax
specifically we're going to use this
cross filter function with this function
you put inside of cross filter the
column names so in our case we can
specify basically the job ID from job
salary and the job ID from data job
skills and then from there we specify
the direction for which the parameters under
here go into what we can provide
for direction we can either provide none
basically no cross filtering
both which is what we want filters on
either side or one way which is what we
have already we're not going to use this
you can also control filters left or filters
right with the one way we're also not messing
with that we want both now this cross
filter remember is a filter function so
we need to use this in an appropriate
for formula that we already know
calculate in order to filter so I'm
going to x out of this box right here
cuz that's not applicable
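Putting cross filter inside calculate, as just described, might look like the sketch below. The table and column spellings (data_jobs_salary[job_id], data_jobs_skills[job_id]) are assumed from the spoken names, and the measure name here is only illustrative.

```dax
-- CROSSFILTER changes the filter direction of an existing relationship
-- for this one calculation only; the data model itself is not altered.
Median Salary Skills :=
CALCULATE (
    [Median Salary],
    CROSSFILTER (
        data_jobs_salary[job_id],   -- job ID column on one side (assumed name)
        data_jobs_skills[job_id],   -- job ID column on the other side (assumed name)
        Both                        -- let filters flow in both directions
    )
)
```

With Both, a skill placed in the rows can filter the salary table, so each skill should get its own median instead of the overall one.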
what we're going to do is I'm going to
calculate median salary or a new median
salary if you will inside of the data
jobs skills table and because it's uh
going to use the same name but we're
going to keep it in a different table
it'll be perfectly fine and then for
this remember we want to use still
calculate we want to have an expression
in here in our case we want to calculate
what is the median salary and we'll just
use the explicit measure that we already
defined then from there we'll get into
the filter one of what we want to
actually filter we want to provide for
this cross filter and for this we're
going to specify the job ID of one table
along with the job ID of the other table
then for the filter type we're going to
use both okay I'm going to go ahead and
close this now we're calculating median
salary so I want this formatted as a
currency with zero decimal places I'm
going to go ahead and click okay and I have
an error in my formula should have known
that by the X I need to actually put a
closing parenthesis on here and I
lied to you a measure or column with the
name median salary already exists okay I
thought we could do that silly me so
we'll name it median salary skills go
ahead and click okay okay now I'm going
to drag this into the values and we can
actually see with this one now that the
associated median salaries are actually
there and it's not all that 115,000
which is basically the median of the
entire data set so I'm going to go ahead
and move this other median salary out
of here and from there we're going to
also drag skill count into here I just
want to look at the top 10 most common
skills in this case so I'm going to go
up here into our filter and go to our
value filters for top 10 dot dot dot we
want the top 10 items by in this case
skill count and then from there based on
these top 10 skills I'm going to sort it
from largest to smallest but like usual
this is no good unless we actually
analyze by the country and also by the
job title so if I actually go
back into that skill job analysis I can
just select these two slicers right there
pressing control then copy and paste
them into here now you may notice
whenever I'm clicking this this is not
affecting this pivot table right here so
we can actually inspect this by going to
the slicer and going to report
connections right now this slicer is
only affecting the skill job analysis
tab so this one right here in our case
for this job title we actually want to
affect it on this page here of skill
salary analysis which is right down here
click okay looks like the salary is
updated also we want to do the same
thing for Country adjusting the report
connections for this as well and
selecting this one right here for
underneath the sheet of skill salary
analysis clicking okay bam it updated as
well so now looking at the top skills of
data analysts in the United States which
I'm pretty familiar with I can see
things like Python Oracle and Tableau
are the top three Excel does make the list
and it's the second to last at 84,000
now with this I do want a visualization
with it specifically I want a combo
chart showing this so I'm going to go into
insert pivot chart pivot chart and for
this go down to combo for this I want
the median salary to be the main focus
and then for the skill count we're going
to put that on a secondary axis because
right now it's just way too low if we
keep it on the same axis and this has
the format that I want right here go
ahead and click okay I'm going to hide
all the field buttons on the chart I'm
going to add a primary vertical and also
a secondary vertical axis along with a
chart title and then for the legend
itself I'm going to click it and then
right-click it and go to format legend
and for this it should be under legend
options
I'm going to uncheck show the
legend without overlapping the chart and
I'm just going to move it up here so not
bad I don't necessarily want this orange
line right here for the skill count I
don't really feel like a line is best to
signify the count instead what I'm going
to do is select the line and if the
format data series pane doesn't appear
you can also just right-click it go to
format data series and then underneath
fill and line they have line but also
marker for the line we're going to go no
line and then for the marker we're
actually going to change the marker
options to built-in we'll change it to
this square which is going to be fine or we
can change it to a diamond we'll make it
slightly bigger and I don't really like
the color so I'm going to go into design
and change the color to this
monochromatic palette 8 nope never mind
not that one I meant monochromatic
palette one I want the bar charts to be
more visually popping than the actual
markers themselves I'll change the title
to what's the pay of the top 10 skills
and then change the primary axis to
median salary USD and the other one
to job count closing this out and then
making some room over here for the
actual visualization itself so now we
have our visualization that we want that
looks at this and is able to show us
what are the top 10 skills for data
analysts and their associated pay now one
last thing for this regarding slicers I
want to actually make it to where
they're connected between the charts so
right now I have it to where
for this one on the skill salary
analysis tab if I go over to the skill
job analysis tab select business analyst
it will change then go back to skill
salary analysis it updated to business
analyst anyway I want it so that if we change
a slicer it changes on
the appropriate sheets so the job title
slicer is only on these two sheets
actually that one's perfectly fine but
the one we actually have concerns with
now is the country specifically on this
one I have the United States selected
the skill job analysis one it's also on
the United States and updates
appropriately but then if we look in the
salary analysis that one's on Australia
it's not updating appropriately so we
need to go to slicer report connections
and we're going to be putting the
country one on all the different sheets
so I'm going to go ahead and select all
the sheets for this I'm going to do the
same for skill salary analysis country
slicer which it looks like already updated
along with the skill job analysis so what
I'm going to do is actually copy this
now and put this into the salary versus
skills sheet because we're controlling it on
this page as well and so now whatever I
select something like maybe
United Kingdom it will update
appropriately and update on other sheets
as well anyway one quick note
because we moved those titles around that
one time sometimes it's not going to
match up exactly how we had it before if
you recall I'm going to go ahead and
select all we set up these text boxes in
order to view them whenever basically
all countries were selected so that is
one of the issues about dragging and
dropping those titles and making them
stick to a certain location it gets
messed up whenever you want to
filter down for something like the
United
States so this wraps up basically our
four major analyses that we did now I'm
going to take it a step further this
portion will be completely optional and
that's this right now we're using skill
count in order to look at what is you
know the count of each skill in this case for
data analysts in we'll do United States
we see that SQL is around 4500 and
that Excel is around 3500 but what does
that actually mean well if we go to the
Future file of what we're going to get
to we're actually going to be
calculating a skill likelihood instead
which in this case is looking at what is
the proportion of a skill compared to
all the different jobs that are
available for data analysts in the
United States and so that 4500
and almost 3500 is equal to well greater
than 50% for SQL and about 40% for Excel
so that makes it in my mind a lot clearer
how important that skill is than a count
in that you probably should be learning
SQL and Excel as a data analyst so back
in our sheet where we're actually
calculating with the job count how do we
calculate this well let's actually
move this over to here go back into
our pivot table itself and if we throw in
the job count you may get this
relationships between tables may be needed message
don't worry about it too much now these
values are all the same because of some
issues with the filter direction but
that actually works to our advantage
because for our filter right here
specifically data analyst in the United
States the number of jobs is actually
8339 if I actually remove both of these
filters we would expect it to be the
total rows of the column which is
32672 so coincidentally this is actually
doing what we need we just need to get a
percentage of these two values and that
can be done pretty easily so let's open
the field list and actually get
into creating this measure we're going
to create it in the data job skills table
we'll call this skill likelihood and
what this will do is take skill count
and divide it by job count but remember
we probably want to use the DIVIDE
function for this so putting in
skill count and then job count
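As a rough sketch, the skill likelihood measure described here could look like the following. The measure names Skill Count and Job Count are assumed to match the explicit measures already in the model, and Skill Likelihood is simply the name chosen in this lesson:

```dax
Skill Likelihood :=
DIVIDE (
    [Skill Count],   -- numerator: existing measure (name assumed)
    [Job Count]      -- denominator: existing measure (name assumed)
)
```

DIVIDE returns a blank instead of an error when the denominator is zero, which is the usual reason for preferring it over the / operator.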
there's no option to format this as a
percentage unfortunately so I'm going to
go ahead and click okay from there I'm
going to drag the skill likelihood into
the values and go through and format
this appropriately selecting that it's a
percentage and then with this I'm going
to select a combination that I
know the values for data analyst
in the United States and with those
values selected I can see that Excel is
at 41% which I know is what it should be
and SQL is at 53% for these values so bam
we have this skill likelihood now we can
now go in and remove these other two
columns of skill count and job count and
then from here actually move this graph
back over and unfortunately after
adjusting it we actually have to fix
this and turn this back into a combo
chart so we're going to go to design change
chart type then combo and select for the
skill likelihood we want this to be on
the secondary axis click okay go back to
format data series remove the line and
then change the marker option to be
built-in and to be that diamond at 6
points and then finally update that
secondary axis to basically say it's
skill likelihood and Bam now we have
this final visualization now there's one
more that we actually do need to clean
up and that's this one right here what
are the top skills of data nerds right
now we're doing a count of the job ID an
implicit measure which you know how I
feel about that we should use an
explicit measure specifically we're going to
use skill likelihood instead of that
and remove that count of job postings
once again I need to actually format
this as a percentage so going to home
change it to a percentage and then from
there clicking in it and sorting from
smallest to largest and Bam for this one
data analyst in the United States once
again we can actually see visually what
are the top skills for this so now we
just updated both of these charts to
have a more representative understanding
of what's going on with the data all
right so you should be super proud of
what we just accomplished in this
project going through both power query
and power pivot and actually diving deep
to understand some key statistics about
top paying skills and also top skills
you should be targeting depending on
what job you're pursuing and what
country you're in now I do have some
practice problems to go through and test
out some of these more advanced
functions specifically this CROSSFILTER
function that we went over then after
that in the next lesson we're going to
be getting into how we can actually go
about sharing this project for those
that purchased the course practice
problems and also certificate you can
now go through and complete that end-of-course
survey and you'll be awarded
this course certificate now if you
didn't do this it's not too late for you
to go in and purchase the course so that way
you get this course certificate all you
got to do is go in and take that end-of-course
survey and you'll get it all right
congratulations on your work so far see
you in the next
one all right congratulations again for
finishing that last project in this
video and the next video which are the
last two videos of this entire course
they're going to be focused on how to
actually go through and share your
projects in my recommended way
specifically we're going to be sharing
this on GitHub so that way others can
see it here I am on GitHub and also if
you didn't notice this is where you
actually downloaded all those Excel
files at the beginning of this course
anyway inside of here is where I'm
hosting my different projects and you've
gone through and probably seen this but
you may not have clicked on something
like the project one dashboard and in
this case yeah I have the Excel file but
the README in there displays below
this and this is what we're actually
going to be doing in the next two videos
to set this up and then create this
README and this allows you to detail all the
different skills that you used along
with detailing all the different
analyses that you did while going
through this now that was project one
project two is going to follow a similar
method in that it has the Excel file
and the README and then in the README
itself it details all the different work
that we did in
it so you may be like Luke why the heck
am I going to be using GitHub in order
to share this project I'm not familiar
with it I don't know how to use GitHub
at all why am I going to waste my time
with it well I think it's useful not
only for Excel but also other
technologies specifically programming
here I have my SQL project for my SQL
course and this is where I host my
SQL code and all the different analysis
that I did for it and similarly for my
Python course and the project we
created that I also hosted on GitHub
and detailed all the different steps
that we did along with all the different
Python files associated with it so
moral of the story is I think GitHub's a
great tool to use in order to share your
work not only in Excel but also other
tools now if you recall from project one
we walked through the steps to quickly
share your project on OneDrive if you
had it accessible via like a paid
Microsoft subscription and this provided
a method to go through and share if you
go up here and actually copy the link a
usable link for others whether they have
Excel or not to actually go in and then
manipulate your dashboards that you have
so you may be wondering why the heck are
we not doing this with this second Excel
file that we created with all of our
analysis and then sharing it via this
method well if you recall back to
this handy-dandy table of the different
Microsoft versions and the different
skills or basically technologies within
Excel that each supports Microsoft online
which is where we hosted that first project
doesn't have the capabilities of
Power Query or Power Pivot because of
that I could go through the process of
adding the second project to this which
is this file right here I'll open it
up and then actually investigating it well
if you investigate all the
different sheets it does go through and
actually show the analysis that we did
but if you actually get into
manipulating it like in this case let's
say I wanted to see what are the top
skills of data analyst you're going to
get this popup right here that says this
workbook contains external data
connections or BI features that are not
supported basically Power Pivot and
Power Query aren't supported it can't
actually query the data it's just
showing the basic last snapshot of the
data right here and you can't manipulate
it so in this case Microsoft online
becomes pretty useless so that's why I'm
recommending sharing it via GitHub as
you can share all the associated files
with this if somebody wants to they could
come in here and download it along with
going through and actually detailing
what you actually did so basically
controlling the storyline and sharing
the different analyses or insights
that you actually found now what
you're reading right now is a README
and it requires understanding markdown
and how to write in markdown so we're
going to be covering that more in depth
in the next video when we get into
markdown and creating the README this
video is going to be primarily focused
on just getting this project into GitHub
so what are we going to be doing for
this well we have five major steps we
need to get through the first thing is
installing git which is the core
technology used behind GitHub we'll
explain more in a bit second and third
we'll be going through actually setting
up our GitHub account and then
installing GitHub desktop to then manage
with Git our different folders and
projects and then fourth and fifth we'll
be basically initializing the repository
which is a fancy term for a folder and
from there getting that folder or
repository onto GitHub to then share so
before we install it what the heck is
git well similar to how they have track
changes in things like Word and
PowerPoint git does this it's a version
control system it tracks changes in not
only files but also code and because of
all this it also allows you to
collaborate with others when working on
a project git is the core technology
behind managing all these different
things going on on your own local
computer and then whenever you make any
of these changes GitHub is where it
keeps track of these final changes if
you will and then displays it for the
world to see and also pull those changes
so here's my Excel data analytics course
right here on GitHub and I have the same
folders or repository on my own local
computer now there's actually hidden
folders or git folders in here managing
this and I can do a shortcut on Mac of
command shift period to show that but
anyway I mainly wanted to show
this .git folder in here and this thing
I don't necessarily touch this at all or
work inside of it this .git folder
contains all the different revisions and
tracks all the different changes within
my project so in order to get this git
folder inside your project and then also
get it into GitHub we need to actually
install git
so navigate over to the git website into
their downloads select your operating
system of choice whether macOS or Windows
I'm on a Windows machine right here and
from there I'm going to select the
64-bit version for Windows and click
here to download once downloaded I'm going
to open the file it asks do I want to allow
this to make changes to my device yes I
do and then it's going to walk you
through the setup process for git all of
these things are going to be left as
default so feel free to just go through
and select it all after I've left all
the default settings as is and selected
that it then gets into the actual
install itself looks like it installed
properly we'll go ahead and click finish
we can confirm it's installed by opening
something like terminal and you should
have a terminal app installed this is
just confirming it you don't necessarily
have to do this anyway mine opens in
PowerShell and you can just type
something like git and it shouldn't give
you an error message it should instead
give you how you could go about using
git via the command line in terminal
don't worry don't be afraid of this we're
not going to be using git via the
command line although I may need to make
a separate course on that instead we're
going to be using GitHub desktop to
manage
git so in order to use GitHub you need
to have an account if you already have
an account you can feel free to just
sign right on in but if you don't go
through the whole process of entering
your email providing your different
credentials and then getting logged in
once logged in it should direct you to
your homepage if it doesn't you can come
up here to this icon at the top and from
there just select your profile I would
go through at this point and actually
customize your profile specifically
adding a picture your name a little
description and any social media links
over here on the right hand side of
my homepage I have some different pinned
repositories because you just set it up
you probably have none but this is where
we're going to be putting your Excel
project when you're done so that way
if people navigate to your profile they
can see it now that we have this account
we need to actually get our project or
our repository onto GitHub but
unfortunately there's not really an easy
method I've found with actually using
the UI from the website to do this and
that's mainly because there's a lot of
technical things going on behind the scenes
in managing
git instead I'm going to recommend
downloading GitHub's application to
install on your computer they have it
for both Mac and Windows navigate to
this link here and for this I'm going
to go ahead and just download the 64-bit
version of this application this one's a
lot easier to install than git from here
once we have it downloaded I'm going to
open the file the installer should open
this window for you to next sign into
GitHub once you've entered your
credentials for GitHub you'll use this
to configure git and for this you're
going to basically say hey I want to use
my GitHub account name and email
address to manage all this and click
finish now it should navigate you to the
let's get started screen anyway it has
methods for you to go through and create
a tutorial repository if you want to
we're not going to be doing that and it has
some different options for this that you
can also select via the file menu such
as a new repository add local repository
or clone repository we're going to be
creating a new repository and as a
reminder a repository is basically a
fancy name for a folder but it's a way
for us to maintain and collect all of
our different files and whatnot that
we're using in our project so for this
we need to give it a name so I'm going
to give it something descriptive like Excel
project data analytics and for
description I'll just give the simple
one of my project demonstrating my Excel
skills for the local path we need to
actually point it to the folder that has
this so mine is inside my documents
folder and real quick inside that folder
itself right now I would expect you to
have the project one and project two I'm
also going to be putting in all the
different files that I have for the
different Excel workbooks that we worked
through in the lessons if you don't have
them don't feel like you need them the
most important thing is that you have
both project one and project two in there
and I have them conveniently located in
different folders inside of here anyway
getting out of that so I can select this
Excel project data analytics folder I'm
going to select this folder it's going
to ask if I want to initialize this
repository with a README I do as far as
the gitignore I'll put none and license
none as well and we'll create the
repository so now you're going to be
navigated to this screen here which
is basically the default screen of
GitHub desktop it allows you to select
different repositories right now I have
only the Excel project data analytics one it
allows you to select different branches
we're going to stay on one shifting to
another branch is beyond the scope of this
course then up here at the top it has
something like publish repository which
we want to do but one thing real
quick I can actually investigate what
files are going to be pushed up to
GitHub by going here into history and
right now it's just one I selected that
box for README so the README is in
there and the other one's just .gitattributes
the other files aren't in
there and I'm doing this on a Windows
machine well if I navigate back to the
folder that contains my project so here
I have Excel project data analytics which I
selected from GitHub Desktop
whenever I go into it it actually
created another folder inside of it and
that has the .gitattributes and README
that it's talking about now I've
done this on both Windows and Mac and
Mac doesn't cause this issue of putting
another folder inside your other folder
so for Mac users you may not have this
problem so completely ignore this but
for Windows users this is a problem
because this right here is the project
or the folder that's going to get uploaded
to GitHub so what we need to do is take
all the contents of this by
just pressing control A to select
it all and then dragging it into that
folder so a little confusing but if we
go back to the documents we have our
Excel project data analytics folder then
inside of that we have our GitHub repo
and then now navigating back into GitHub
desktop I go over here and I see changes
we have 85 different files and
folders within there it's actually
picking up on all those different files
that I have in there once again if
you're on a Mac you may not see this
because it's already in there in history
and you can see it's actually within
this portion of the GUI anyway the thing
now is if we go ahead and publish this
repository to GitHub it's only going to
have what's inside of our history right
now under this what we're calling a
commit and a commit is a snapshot of
your
repository at the time that you're
basically committing it so we need to do
a commit in order to get all these
different changes into a repository cuz
technically right now they're in an area
called a staging area or the working
area anyway we need to provide a summary
that's required and I'm going to add
something simple like add all Excel
files doesn't need to be super
descriptive and from there I'm going to
click commit to main now if I go into
history I have this initial commit that
it did but then that add all Excel files
commit is going to have in it all those
different Excel files that I added into
it so now that our local repository on
your machine is up to date we need to
then publish this repository to GitHub
and we can either click this button or
this button here for this we're going to
keep the same name and description that
we have before we don't want to keep
this code private so we're going to
uncheck that box and then from there
we're going to click publish repository
so my repository has quite a bit of
Excel files and the memory size of it is
pretty large so it is taking a little
bit of time to do this so now we've
completed pushing our local repository
to our remote repository on GitHub so
inside of GitHub I can navigate up here
to the right hand side and I go to your
repositories and here it is the Excel
project data analytics that we made
public and it's all in here so now
somebody can come in here and see our
different work in this case our project
One dashboard is inside of here we have
our Excel file in there and Bam we've
set up git and also GitHub and that was
a push so now we need to demonstrate
what is a pull
and so in order to do that pull
we need to actually make changes
on our remote repository so the one on
GitHub and then pull it into our local
repository so here's what we can do for
that I'm going to just go in and we
created this README.md file upon
creation because we selected that
checkbox you can actually come in here
and edit this README by clicking the
edit file button and I'm just going
to come in here and I'm just going to
say hey I added this on github.com
adding it at the bottom now we're going
to go into markdown formatting and stuff as
you can see we have this hashtag here
we're going to go over all that in the next
lesson but anyway I made this change
here so like we did on our
local repository when making a change we
need to commit those changes here and
conveniently it just gives us a commit
message of update README confirm the
correct email and it commits directly to
the main branch we're just staying on
that Branch we're not shifting for this
course at all from there I'm going to
commit changes so now if I go back into
the project itself scroll on down to see
the README I can see that I have I
added this on GitHub whereas on my local
machine if I go in to look at the README
markdown it doesn't have that addition
that I added to the readme file so we
need to pull those changes going back to
the GitHub desktop app I'm going to come
up here and you notice that it says
fetch origin this isn't going to do anything
this is just going to fetch origin
basically check the main branch on GitHub
this isn't going to make any changes to
your files it's just going to update
what it knows is on GitHub and we can see based
on this that we have basically one
change here by this one and this down
arrow and so in order to get these
changes we need to pull origin
and so I'm just going to click it to
pull and now when we go into the history
we now have this new one of update
README we can see that this README has this
addition because it's in green of I added
this on github.com and then inspecting
this in the readme itself it now updated
to say hey I added this on github.com so
bam we just demonstrated how to push and
also pull between our local repository and
machine and our remote
repository so now that we have GitHub
and git all set up we now need to get in
to actually building out those READMEs
and explaining what we did in our
project and demonstrating those skills
that we gained in this course so that's
what we'll be doing in the next lesson
if you're getting stuck at any point
along the way I highly recommend that
you make use of something like ChatGPT
or even Gemini or whatnot and actually
paste in your error message and it will
help you with troubleshooting it it's a
lot quicker than posting a comment in
here saying that you had an issue all
right with that see you in the next one
we're getting into the README see you
there welcome to the last video in this
course and in this we're going to be
going over how we're going to actually
document all the different work that you
did for project one and for project two
we're going to be putting this into our
markdown file or our README and then
from there getting it onto GitHub and
then finally going through how to share
it on LinkedIn so right now navigating
to our GitHub repo with our project in
it you should have at least two folders
in there one for your project One
dashboard and one for your project 2 if
you have your other folders for all the
work that you did for all the other
lessons in this course that's awesome
too but not required mainly just have
your project work in there anyway we
have this README for the entire project
itself and right now it's pretty bare
bones and if we navigate into that
project one dashboard right now you
should only have one file in there
specifically that Excel file but we need
a README in here as well so we can
add a description of what we
did in that dashboard similarly project
2 doesn't have a README as
well now we have demonstrated in that
last lesson how we can actually go into
something like the readme and then from
there edit it inside of your web browser
by just clicking this edit this file
icon it shows not only the edit view for you
to actually go through and maybe type
something but also the preview
of what the file is going to look
like don't worry we're going to be going
over markdown syntax in a little bit but
anyway that's how we're going to be
doing all these different changes to the
files for this I'm not going to do these
changes I'm actually going to cancel
these changes now an alternate option for
making edits to something like a README
is using a text editor or IDE integrated
development environment such as
Visual Studio Code which is
completely free and I have it
launched here it's an app
that I use in order to edit and manage
my different files I can also go through
if I'm editing the README itself I can
type inside of here and edit it but also
during that I can actually go in and
view what's going on with the actual
README itself off to the side while I'm
typing here in this other window anyway
I just want to make you aware of this
that is an option for you to go through
but it does take some experience with
knowing how to use VS Code and setting this
all up so based on the complexity we've
already built up we're going to
stick to just editing our READMEs inside
of github.com
so before we get into building our
project READMEs we need to understand
some syntax here specifically if you
notice this Excel project data analytics is
capitalized and everything else is
lowercased and if we actually go in and
edit the file we can see that we have
this hashtag at the front which
translates this into a heading so they
have special characters that you can
actually use in front or around text to
manipulate text
The team that created Markdown
conveniently created this cheat sheet,
which I'll link here, and it shows all
the different syntax that you can use
to format text and make
different things happen inside your
Markdown file. So let's look at
a few. Here I have a heading one, heading
two, and heading three, denoted by how
many hashtags come before a space, and if I
preview this: heading one, heading two, and
heading three. Next, we can either bold or
italicize text by surrounding it with either
double asterisks or single asterisks, and the
final result right here is bold and
italicized. Notice how the bold text and
italicized text are on the same line. It's
important that before you go to a new
line, you put two spaces at the end of the
current line. Now that I have that in there, it
will shift it to the next line.
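As a rough sketch of the syntax just described, a Markdown file might look like the following. The heading text and sample phrases are placeholders rather than the contents of the file shown on screen, and the bold line ends with two spaces (not visible here) to force the line break.

```markdown
# Heading 1
## Heading 2
### Heading 3

**This text is bold**  
*This text is italic*
```

Without the two trailing spaces, the bold and italic text would render on the same line.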
We can also do things like an ordered
list or an unordered list, which would be
like bullet points, and it conveniently
indents the items and makes them look a lot
nicer. We can also surround something with
a backtick, which is located up at the top
of your keyboard, or you could put triple
backticks above and below it if
you have multiple lines of code. If
we go to preview this, we can
see that the single line of code was
just highlighted inline, whereas the multiline one
creates this entire code block.
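For illustration, the list and code syntax described here could be written as below. The list items and formulas are made-up placeholders, and the outer fence uses four backticks only so that the triple-backtick example can be displayed.

````markdown
1. First item
2. Second item

- A bullet point
- Another bullet point

Use the `MEDIAN` function for inline code.

```
=MEDIAN(A1:A10)
=AVERAGE(A1:A10)
```
````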
Two more are worth mentioning: links and
images. For a link, the text
that you want to appear for the link
goes in square brackets, and
the URL itself goes
inside parentheses right next
to it. Changing this to
a real-world example, something like
google.com, if I go to preview and
click this link, it's going to ask me if
I want to leave the site and go to Google.
I'm not going to do it because it's
going to mess up all my changes, but you
get the point. Images are very similar,
but the text you provide in the square
brackets is just your alternate (alt) text,
which is displayed when the image
can't load, and then after that is
the actual image location. However, this
isn't an actual image location, so I have
this error icon along with the
alt text, hence this broken file. You're
going to notice that if any of your
image files are broken.
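A minimal sketch of the link and image syntax, assuming a hypothetical image path and alt text (neither is taken from the actual repository):

```markdown
[Google](https://www.google.com)

![Salary dashboard preview](images/salary_dashboard.gif)
```

The only difference between the two is the exclamation point in front of the image. If `images/salary_dashboard.gif` does not exist, the preview shows a broken-image icon next to the alt text.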
github.com actually makes it pretty easy
to get images in. In this case I have a
GIF of the dashboard (you could also use
an image file), and all I have to do is
take it and drag it into here. If you
notice, it automatically formatted it
with alt text and then the actual link
location, so it saves the file
for you, and it puts that exclamation
point at the front, signifying that it's
an image, or in this case a GIF. If I go to
preview and scroll down, we can see that
we have our image. Once again, you need to
put spaces after that other line to make
sure that you're not having it all on
the same line, but you get the
point. Anyway, let's get into
creating this README that's on the
homepage, if you will, of our
project. The main point of this one
is that I want people to be navigated to the
appropriate project depending on what
they're looking for. So I went ahead and
put in some text already for how I want
to break this down. I'll
shift over to preview. I'm going to have
a title such as "My Excel Analytics
Projects", and from there we're going to have
the salary dashboard project and the
salary analysis. Right now the image that
I have for the dashboard is in the wrong
location, so I'll shift that up. Now, I
went ahead and added the images for
our salary analysis as well while cleaning up
where the salary dashboard is. I
included only two graphs here, since I
just want to give a sneak peek of what's
going to be inside of those other READMEs
that we're about to build out. Now, you may
be wondering, how the heck do I get
screenshots of graphs in my different
dashboards? Well, depending on whether you're
using Mac or Windows, both have software
installed already, so these shortcuts
should work for you to perform
your screen capture. On a Mac, I
primarily use Command + Shift + 4
to select a certain area; it allows
me to basically just hover over
something and snapshot it. The same
thing can be done on a Windows
machine: you're just going to press
Windows + Shift + S. So I went through
and also added a quick description
to each section. I'm going to go into preview
because it's a little bit easier to read
there. Underneath the title, I just
detail, "Hey, this contains all my Excel
files to follow along with," in my case, my free
course, Excel for Data Analytics. I
would word it differently for you, saying
that you're providing all your
different project work in this
repository. Additionally, I provide a
short description for the first project
and a short description for
the second project. Make sure in this
case that you're putting spaces
after those lines so you don't have
those images overlap the text. Now,
the last thing I would do: as you see
here, I link to my course, but I think
more importantly, what you need to do is
link to the appropriate project folders
within this repository so people can
quickly get to the salary dashboard or
the salary analysis. So I'm going to
add a link to the
appropriate project by first adding this
text, "Check out my work here," in square brackets,
and then inside parentheses I'm going to
list the folder, project 1 dashboard.
You have to make sure you spell it
exactly like the folder that is inside
of your repository, or the link's not
going to work. I'm going to do the same
with the project 2 dashboard as well.
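As a sketch, the two folder links might look like this in the main README. The folder names `Project_1_Dashboard` and `Project_2_Analysis` are assumptions for illustration; use the exact names of the folders in your own repository.

```markdown
## Salary Dashboard

[Check out my work here](Project_1_Dashboard)

## Salary Analysis

[Check out my work here](Project_2_Analysis)
```

Because these are relative links, the text in parentheses has to match the folder name character for character, including capitalization. If a folder name contains spaces, each space generally needs to be written as `%20` for the link to resolve.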
Going to preview it, I can see it's
all there. I probably want some space in
between
these, so I'll just put an extra Enter in
there. Okay, that's good enough. I'm going
to get into committing the changes. The message
is "Update my README"; that sounds good, so I'm
going to commit them. So now, in the home
folder of our repository, Excel
project analytics, scrolling down, I have
my README here. It tells me about the repository, and
then for the salary dashboard it says,
"Hey, check out my work here." When I click
on it, it navigates me into the folder
for the salary dashboard, which you
now need to create a README for. Also, it's just
good practice to
check that the other link works
as well, and in this case it didn't. It's
a good thing we checked: I had project
2 dashboard, and it was actually
project 2 analysis. I'm going to commit the
change, and now when I try
it out, bam, it navigates me to the right
location. So now you have the basics:
you understand Markdown
enough to edit a README. I'm going to walk
through how I built out the project 1
README and also the project 2 README
so you have some understanding
of what you should do going forward. For
project 1, I recommend including a
picture of the dashboard to start and
then a brief intro detailing why you
wanted to do this project. Underneath
this, make sure you include a link to the
file itself, which is conveniently right
here. Then, detailing
the different skills that you used in
building this is really important for
job seekers; that way, if a recruiter
comes and looks at this, they see what
skills you used. From
there, I talk about the data set
itself and what we were trying
to get or extract out of the data. So that's
basically all the foundation they need
in the introduction portion. This portion
I recommend keeping in a similar
format. The next portion you can feel
free to go about however you want.
Specifically, I go into the dashboard
build, breaking it down into three main
areas of focus. First is the charts:
I highlight the different median
salaries for all of the different job titles
and go into some insights from
that. I also talk about the country map
and the insights from it as well. Next,
after charts, I move into functions and
formulas, detailing one of the key
formulas that we used, using MEDIAN and
an IF statement to build
an array formula, so not only
breaking it down but also explaining
what insights we're able to get with
this formula. The third skill I
talk about is data validation, covering
why it's used, with a GIF of how it's
actually applied and how it
looks in Excel. Then finally,
I just wrap it up with a conclusion.
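One way to lay out that structure is sketched below. The section titles follow the walkthrough above, but the file name, image path, and the placeholders inside the formula are assumptions for illustration, not the contents of the actual README.

```markdown
# Salary Dashboard

![Salary dashboard preview](images/salary_dashboard.gif)

## Introduction

Why I built this dashboard and what it answers.

[Dashboard file](Salary_Dashboard.xlsx)

### Excel Skills Used

- Charts
- Formulas and functions
- Data validation

### Data Set

What the data contains and what I wanted to extract from it.

## Dashboard Build

### Charts

Median salary by job title, the country map, and the insights from each.

### Formulas and Functions

`=MEDIAN(IF(<criteria_range>=<criteria>, <salary_range>))`

What the formula does and what insights it provides.

### Data Validation

Why it is used, with a GIF of how it looks in Excel.

## Conclusion
```

Swap each placeholder sentence for a short paragraph of your own and keep the sections brief.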
To recap, for the first project you need
an intro statement describing what you're
doing, why you did it, and what skills
you used. Then, from there, cover the
build itself, explaining what you
actually built, how you used those skills,
and what insights you got out of it, and
then finally wrap it up with a
conclusion. For the second project, mine
is formatted very similarly, in that I have
an introduction, the Excel skills used, and the
data set. Then, since this one was
primarily focused on analysis, I included
the four questions that we went through
and answered for our analysis.
With the template of these four
questions, I broke each one down,
primarily focusing
on, one, what skill did I use to help
answer that question, and two, what
analysis insights did I get out of
answering that question. I repeat the
same thing for the second question,
specifying the skills that we used for
it and then the analysis, or what
insights we got out of it. After going
through questions three and four, we then
get to our final section, a conclusion
of what you actually learned and extracted
as insights. So it's really
good to put all this stuff in it, but I
wouldn't get overwhelmed and think you
need to include everything. Think
about job recruiters themselves: they
don't have a lot of time, so keeping it
as short and to the point as possible is
going to be best for
you. Once you've
gone through and built out your repo with all
its associated READMEs, it's time to get
into sharing this on social
media via LinkedIn. I recommend the same
approach that we used back in project
1: listing this in your Projects
section by going through and
clicking the add icon and adding the
project. If you did go through and
add that salary dashboard
already, I would just focus this one on
the salary analysis. So I'd put in
a name, something like "Data
Science Job Analysis", a description, and
any appropriate skills. There's a ton of
different skills you can select for
what you used; I would focus primarily on
these: Microsoft Excel, Power Query,
data modeling, ETL, and pivot tables. For
the media, in this case I would include a
link to your repo: paste it into
here and click Add. It will then provide
this snapshot thumbnail of what's going
on here and a title. I like it all, so I'll
click Apply. Now, if you recall back from
that first project, we tried to provide
the OneDrive link for
the Excel file and it didn't work. So if you have
that project on LinkedIn, I would go
through and attach this link
to it as well so people know how
to navigate to it. Finally, select your
start and end dates, and add any
contributors if you have them; I don't
in this case. From there,
save it. The last thing I recommend doing
is making a post telling others about
your project so they can come in and see
it. In it, I would definitely include
something like a link, and feel free to
tag Kelly or myself in it. I love
checking out your projects and seeing
the different work that you've done for
it. So once again, congratulations on
finishing this course; it took nothing short
of your hard work. Excel was the first,
or main, skill that I learned, and it
helped me land my first data analytics
opportunity, so I feel the same can go
for you as well. Now, after you take a
short break and you're ready to get back
into learning more skills, I do have a
SQL course that I recommend taking.
As you've learned from analyzing this
data, Excel and SQL are two of the
top skills for data analysts, so it pays to
know them, and you can basically learn SQL
in a weekend. All right, with that, I'll
see you either in the next video or
in the next course. See you there.