Wednesday, 11 January 2017

Data Visualisation: An Introduction to ggplot2 (05/01/2017)



DATA VISUALIZATION USING GGPLOT2

BASIC

Visualizations bring data to life. A good visualization will give you new insights and will often lead to new ideas for additional analyses or visualizations. As humans we are much better at processing visual information than numeric information - both in terms of comprehension and speed. 

The ggplot2 library is one of the gems of R. The syntax for producing plots may appear a bit strange at first, but once we “get it”, it becomes easy to produce beautiful and insightful visualizations in no time. With ggplot2 one can create visualizations by adding layers to a plot.
  • Any plot in ggplot2 consists of
    • Data
    • Aesthetics: which variables go on the x-axis, y-axis, colors, styles etc.
    • Style of plot: Bar, scatter, line etc. These are called plot layers in ggplot and are specified using the syntax geom_layer, e.g., geom_point, geom_line, geom_histogram etc.
Let’s get some data to plot. We will be working with the iris dataset, which is a built-in data frame in R.
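The three ingredients listed above can be sketched in a single call. This is a minimal illustration, assuming the built-in iris data frame and a scatter layer; the variable name p_layers is hypothetical.

```r
library(ggplot2)

# Data: the built-in iris data frame.
# Aesthetics: Sepal.Length on the x-axis, Petal.Length on the y-axis.
# Layer: geom_point() draws one point per row of the data.
p_layers <- ggplot(data = iris,
                   mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

print(p_layers)
```

Swapping geom_point() for another geom changes the style of the plot while the data and aesthetics stay the same.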


  • GLIMPSE  OF THE DATASET 
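One way to get a quick overview of the data frame is base R's str(). This is a sketch and only one of several options; for instance, dplyr's glimpse() gives a similar summary if that package is installed.

```r
# Compact overview: one line per column with its type and first few values.
str(iris)

# Dimensions as (rows, columns): iris has 150 rows and 5 columns.
iris_dim <- dim(iris)
print(iris_dim)
```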





  • USING HEAD( )  FUNCTION TO VIEW THE FIRST FEW OBSERVATIONS
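A minimal sketch of the call, assuming the default of six rows is what we want to inspect; the variable name first_rows is hypothetical.

```r
# head() returns the first six rows by default.
first_rows <- head(iris)
print(first_rows)

# Pass n to see a different number of rows, e.g. the first ten.
head(iris, n = 10)
```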






  • USING FUNCTION QPLOT( )
First, we plot Sepal.Length against Petal.Length. Sepal.Length goes on the x axis, Petal.Length goes on the y axis, and data refers to the iris data frame. Here a single color denotes all the points, regardless of species.
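The description above corresponds to a call along these lines. It is a sketch rather than the exact original command, and the variable name p_basic is hypothetical. Recent ggplot2 releases mark qplot() as deprecated, so the call may print a warning, but it still runs.

```r
library(ggplot2)

# First argument is x, second is y; data names the data frame to look them up in.
p_basic <- qplot(Sepal.Length, Petal.Length, data = iris)

print(p_basic)
```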


We can also represent the different species by different colors using the color = Species argument.
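A minimal sketch of the colored version, assuming the same two variables as before; the variable name p_color is hypothetical.

```r
library(ggplot2)

# color = Species maps each species to its own color and adds a legend.
p_color <- qplot(Sepal.Length, Petal.Length, data = iris, color = Species)

print(p_color)
```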

Similarly, we can let the size of each point denote sepal width by adding a size = Sepal.Width argument. The scatter plot is a modified version of the one above: the additional size argument scales each dot by its Sepal.Width value, so flowers with wider sepals are drawn as larger dots. Here we also see that the Iris setosa flowers have the smallest petals.
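This could look as follows. It is a sketch that assumes the size mapping is added on top of the species coloring; the variable name p_size is hypothetical.

```r
library(ggplot2)

# size = Sepal.Width maps the sepal width of each flower to the dot size.
p_size <- qplot(Sepal.Length, Petal.Length, data = iris,
                color = Species, size = Sepal.Width)

print(p_size)
```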


To reduce the effects of over-plotting, we add an argument called alpha with a value of 0.7.
Alpha sets the transparency of overlapping elements, expressed as a fraction between 0 (complete transparency) and 1 (complete opacity).
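A sketch of the alpha setting, assuming it is added to the colored and sized plot; the variable name p_alpha is hypothetical. Wrapping the value in I() tells qplot() to use 0.7 as a fixed setting rather than treating it as a variable to map and show in a legend.

```r
library(ggplot2)

# alpha = I(0.7) makes every point 70% opaque, so overlapping points stay visible.
p_alpha <- qplot(Sepal.Length, Petal.Length, data = iris,
                 color = Species, size = Sepal.Width, alpha = I(0.7))

print(p_alpha)
```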


We add axis labels and a title to the plot.
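In qplot() the labels and title can be set with the xlab, ylab and main arguments. The label strings below are illustrative assumptions, not the original post's wording, and the variable name p_labels is hypothetical.

```r
library(ggplot2)

# xlab and ylab set the axis labels; main sets the plot title.
p_labels <- qplot(Sepal.Length, Petal.Length, data = iris,
                  color = Species,
                  xlab = "Sepal Length",   # illustrative label
                  ylab = "Petal Length",   # illustrative label
                  main = "Sepal vs. Petal Length in the iris data")

print(p_labels)
```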


The geom argument specifies the geometric objects that define the graph type. It is expressed as a character vector with one or more entries. geom values include "point", "smooth", "boxplot", "line", "histogram", "density", "bar", and "jitter".


In the scatterplot examples above, we implicitly used a point geom, the default when you supply two arguments to qplot().



In this plot we use the line geom to define the graph type. The lines of different colors show the different species of flowers.
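A minimal sketch of a line plot per species, assuming the same iris variables as in the scatter plots; the variable name p_line is hypothetical.

```r
library(ggplot2)

# geom = "line" connects the observations; color = Species draws one line per species.
p_line <- qplot(Sepal.Length, Petal.Length, data = iris,
                geom = "line", color = Species)

print(p_line)
```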


Orange is another built-in data frame that describes the growth of orange trees. The plot shows how orange tree circumference varies with age.
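One way to draw this, as a sketch: the Orange data frame has the columns Tree, age and circumference, and coloring by Tree (an assumption here) gives one growth curve per tree. The variable name p_orange is hypothetical.

```r
library(ggplot2)

# One line per tree: age on the x-axis, trunk circumference on the y-axis.
p_orange <- qplot(age, circumference, data = Orange,
                  geom = "line", colour = Tree)

print(p_orange)
```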


We can also plot both points and lines.
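Because geom accepts a character vector, both geoms can be requested in one call. This sketch assumes the Orange data is being plotted; the variable name p_both is hypothetical.

```r
library(ggplot2)

# geom = c("point", "line") adds a point layer and a line layer to the same plot.
p_both <- qplot(age, circumference, data = Orange,
                geom = c("point", "line"), colour = Tree)

print(p_both)
```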




ADVANCED

















































