### Introduction

This is the second tutorial in the series Elegant Data Visualization with
ggplot2. In the previous tutorial, we understood the concept of grammar of
graphics and even built a bar plot step by step while exploring the different
components of a plot/chart. In this tutorial, we will learn to quickly build a set
of plots that are routinely used to explore data using `qplot()`

. It can be
used to quickly create plots but also has certain limitations. Nevertheless, if
you want to quickly explore data using a single function, `qplot()`

is your friend.

### Libraries, Code & Data

We will use the following libraries in this tutorial:

All the data sets used in this tutorial can be found here and code can be downloaded from here.

#### Scatter Plot

Scatter plots are used to examine the relationship between two continuous variables. The relationship can be examined across the levels of a categorical variable as well. Let us begin by creating scatter plots. The first two inputs are the variables/columns representing the X and Y axis. The next input is the name of the data set.

`qplot(disp, mpg, data = mtcars)`

If you want the relationship between the two variables to be represented by
both points and line, use the `geom`

argument and supply it the values using a
character vector.

`qplot(disp, mpg, data = mtcars, geom = c('point', 'line'))`

The color of the points can be mapped to a categorical variable, in our case
`cyl`

, using the color argument. Ensure that the variable is categorical using
`factor()`

.

`qplot(disp, mpg, data = mtcars, color = factor(cyl))`

The shape and size of the points can also be mapped to variables using the
`shape`

and `size`

argument as shown in the below examples.

`qplot(disp, mpg, data = mtcars, shape = factor(cyl))`

Ensure that size is mapped to a continuous variable.

`qplot(disp, mpg, data = mtcars, size = qsec)`

#### Bar Plot

A bar plot represents data in rectangular bars. The length of the bars are proportional to the values they represent. Bar plots can be either horizontal or vertical. The X axis of the plot represents the levels or the categories and the Y axis represents the frequency/count of the variable.

To create a bar plot, the first input must be a categorical variable. You can
convert a variable to type `factor`

(R equivalent of categorical) using the
`factor()`

function. The next input is the name of the data set and the final
input is the `geom`

which is supplied the value `'bar'`

.

`qplot(factor(cyl), data = mtcars, geom = c('bar'))`

You can create a stacked bar plot using the `fill`

argument and mapping it to
another categorical variable.

`qplot(factor(cyl), data = mtcars, geom = c('bar'), fill = factor(am))`

#### Box Plot

The box plot is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. Box plots are useful for detecting outliers and for comparing distributions. It shows the shape, central tendancy and variability of the data.

Box plots can be created by supplying the value `'boxplot'`

to the `geom`

argument. The firstinput must be a categorical variable and the second must be
a continuous variable.

`qplot(factor(cyl), mpg, data = mtcars, geom = c('boxplot'))`

Unlike `plot()`

, we cannot create box plots using a single variable. If you are
not comparing the distribution of a variable across the levels of a categorical
variable, you must supply the value `1`

as the first input as show below.

`qplot(factor(1), mpg, data = mtcars, geom = c('boxplot'))`

#### Line Chart

Line charts are used to examing trends across time. To create a line chart,
supply the value `'line'`

to the `geom`

argument. The first two inputs should
be names of the columns/variables representing the X and Y axis, and the third
input must be the name of the data set.

`qplot(x = date, y = unemploy, data = economics, geom = c('line'))`

The appearance of the line can be modified using the `color`

argument as shown below.

```
qplot(x = date, y = unemploy, data = economics, geom = c('line'),
color = 'red')
```

#### Histogram

A histogram is a plot that can be used to examine the shape and spread of
continuous data. It looks very similar to a bar graph and can be used to detect
outliers and skewness in data. A histogram is created using the `bins`

argument
as shown below. The first input is the name of the continuous variable and the
second is the name of the data set.

`qplot(mpg, data = mtcars, bins = 5)`

### Summary

In this tutorial, we learnt to quickly create plots using the `qplot()`

function. While useful, it has limitations and can be used only to quickly visualize data.

### Up Next..

In the next tutorial, we will create the same set of plots but using **geoms**.