DATA VISUALIZATION USING GGPLOT2
BASIC
Visualizations bring data to life. A good visualization will give you new insights and will often lead to new ideas for additional analyses or visualizations. As humans we are much better at processing visual information than numeric information - both in terms of comprehension and speed.
The ggplot2 library is one of the gems of R. The syntax for producing plots may appear a bit strange at first, but once we “get it”, it becomes easy to produce beautiful and insightful visualizations in no time. With ggplot2 one can create visualizations by adding layers to a plot.
Any plot in ggplot2 consists of:
- Data
- Aesthetics: which variables go on the x-axis, y-axis, colors, styles etc.
- Style of plot: Bar, scatter, line etc. These are called plot layers in ggplot and are specified using the syntax geom_layer, e.g., geom_point, geom_line, geom_histogram etc.
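The three components listed above can be sketched in a single call. This is a minimal illustration, assuming ggplot2 is installed and using the built-in iris data frame that the rest of this post works with; the variable name p is only a placeholder.

```r
library(ggplot2)

# Data: the built-in iris data frame.
# Aesthetics: Sepal.Length on the x-axis, Petal.Length on the y-axis,
#             and point color determined by Species.
# Layer: geom_point() draws the observations as a scatter plot.
p <- ggplot(data = iris,
            aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

print(p)
```

Swapping geom_point() for another layer such as geom_line() changes the style of the plot without touching the data or the aesthetics.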
Let’s get some data to plot. We will be working with the iris dataset, which is a built-in data frame in R.
- GLIMPSE OF THE DATASET
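The original output for this step is not available, so the following is only one possible way to take a quick look at the structure of the data, using base R's str() and dim(). The glimpse() function from dplyr would give a similar overview, but it is not assumed to be installed here.

```r
# Compact overview: one line per column with its type and first values.
str(iris)

# Number of rows (observations) and columns (variables).
iris_dim <- dim(iris)
print(iris_dim)
```

The dataset holds 150 observations of 5 variables: four numeric measurements and the Species factor.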
- USING HEAD( ) FUNCTION TO VIEW THE FIRST FEW OBSERVATIONS
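A minimal sketch of this step: head() returns the first six rows of a data frame by default, and its second argument changes how many rows are returned.

```r
# First six observations of iris (the default for head()).
first_rows <- head(iris)
print(first_rows)

# Ask for a different number of rows with the second argument.
first_ten <- head(iris, 10)
print(first_ten)
```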
- USING FUNCTION QPLOT( )
First, we will plot Petal.Length against Sepal.Length. Sepal.Length goes on the x-axis, Petal.Length goes on the y-axis, and data refers to the iris data frame. Here a single color is used for all the points, regardless of species.
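The call described above might look like the sketch below. It assumes ggplot2 is installed. Note that qplot() is deprecated in recent ggplot2 releases, so it may print a deprecation warning while still producing the plot. The name p_basic is a placeholder.

```r
library(ggplot2)

# qplot(x, y, data): Sepal.Length on the x-axis, Petal.Length on the y-axis.
# When both x and y are supplied, qplot() draws points.
p_basic <- qplot(Sepal.Length, Petal.Length, data = iris)

print(p_basic)
```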
We can also represent the different species by different colors using the color = Species argument.
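A sketch of the colored version, assuming ggplot2 is installed (p_color is a placeholder name). Mapping color to the Species column gives each species its own color and adds a legend automatically.

```r
library(ggplot2)

# color = Species maps point color to the Species factor,
# so each of the three species gets its own color.
p_color <- qplot(Sepal.Length, Petal.Length, data = iris, color = Species)

print(p_color)
```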
Similarly, we can let the size of each point denote sepal width by adding a size = Sepal.Width argument. This modifies the previous scatter plot so that the size of each dot scales with the flower's sepal width: the wider the sepal, the larger the dot. The plot also shows that Iris setosa flowers have the smallest petals.
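A sketch of the size mapping, assuming ggplot2 is installed and using Sepal.Width as named in the size argument above (p_size is a placeholder name).

```r
library(ggplot2)

# size = Sepal.Width scales each point by that flower's sepal width;
# color = Species is kept so the species remain distinguishable.
p_size <- qplot(Sepal.Length, Petal.Length, data = iris,
                color = Species, size = Sepal.Width)

print(p_size)
```

To size the points by petal width instead, replace Sepal.Width with Petal.Width in the size argument.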
To reduce the effects of overplotting, where points drawn on top of each other hide one another, we can make the points semi-transparent by adding an argument called alpha with a value of 0.7.
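A sketch of the transparency setting, assuming ggplot2 is installed (p_alpha is a placeholder name). One assumption here is the I() wrapper: in qplot(), a bare alpha = 0.7 is treated as a mapped aesthetic and run through a scale, whereas I(0.7) sets the transparency to exactly 0.7 for every point.

```r
library(ggplot2)

# alpha = I(0.7) sets a fixed 70% opacity for all points, so overlapping
# points show up as darker regions instead of hiding each other.
p_alpha <- qplot(Sepal.Length, Petal.Length, data = iris,
                 color = Species, size = Sepal.Width, alpha = I(0.7))

print(p_alpha)
```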
- alpha: Transparency for overlapping elements, expressed as a fraction between 0 (complete transparency) and 1 (complete opacity).
- geom: Specifies the geometric objects that define the graph type. The geom option is expressed as a character vector with one or more entries. geom values include "point", "smooth", "boxplot", "line", "histogram", "density", "bar", and "jitter".
In the scatterplot examples above, we implicitly used a point geom, the default when you supply two arguments to qplot().
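To illustrate how the geom option changes the graph type, here is a sketch that passes one of the listed values explicitly. It assumes ggplot2 is installed. The choice of a boxplot of petal length per species is only an illustration and is not taken from the original post.

```r
library(ggplot2)

# geom = "point" spells out the default used in the scatter plots.
p_point <- qplot(Sepal.Length, Petal.Length, data = iris, geom = "point")

# geom = "boxplot" summarises Petal.Length for each species instead.
p_box <- qplot(Species, Petal.Length, data = iris, geom = "boxplot")

print(p_point)
print(p_box)
```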
In this plot we use "line" as the geometric object, so the observations are connected by lines instead of being drawn as points. Lines of different colors represent the different species of flowers.
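A sketch of the line version, assuming ggplot2 is installed and the same two variables as in the earlier scatter plots (p_line is a placeholder name).

```r
library(ggplot2)

# geom = "line" connects the observations in order of their x values;
# color = Species draws one line per species.
p_line <- qplot(Sepal.Length, Petal.Length, data = iris,
                geom = "line", color = Species)

print(p_line)
```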
Orange is another built-in data frame that describes the growth of orange trees. We can use it to plot how the circumference of each tree varies with age.
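A sketch of such a plot, assuming ggplot2 is installed. The Orange data frame has the columns Tree, age and circumference. The title text and the name p_orange are illustrative choices.

```r
library(ggplot2)

# Orange records the circumference of five trees at several ages.
str(Orange)

# One line per tree: age on the x-axis, circumference on the y-axis.
p_orange <- qplot(age, circumference, data = Orange,
                  geom = "line", color = Tree,
                  main = "Variation of orange tree circumference with age")

print(p_orange)
```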
We can also plot both points and lines.
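Because the geom option accepts a character vector with more than one entry, points and lines can be combined in one call. A sketch, assuming ggplot2 is installed (p_both is a placeholder name):

```r
library(ggplot2)

# geom = c("point", "line") adds two layers: the individual measurements
# as points and a line connecting them for each tree.
p_both <- qplot(age, circumference, data = Orange,
                geom = c("point", "line"), color = Tree)

print(p_both)
```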
ADVANCED