Wednesday, 11 January 2017

Data Visualization in R - 11/01/2017

DATA VISUALIZATION IN R

With the ever-increasing volume of data in today's world, the need for data visualization has also increased. It is important to visualize data effectively so that we can tell stories and draw insights from it. R offers a set of built-in functions and packages to visualize and present data.

Data visualization in R can be broadly classified into the following two categories:
  • BASIC VISUALIZATION
  • ADVANCED VISUALIZATION
Basic visualization can take the form of a histogram, line/bar chart, scatter plot, boxplot, etc.
Advanced visualization can take the form of a heat map, mosaic plot, correlogram, 3D graph, map visualization, etc.

  • SCATTER PLOT

In our mart dataset, if we want to visualize the items by their cost, we can use a scatter plot of two continuous variables, namely Item_Visibility and Item_MRP.
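A minimal base R sketch of such a scatter plot follows. The mart dataset itself is not included in this post, so the data frame below is a synthetic stand-in with made-up values; only the column names Item_Visibility and Item_MRP come from the text.

```r
# Synthetic stand-in for the mart dataset (values are made up for illustration)
set.seed(1)
mart <- data.frame(
  Item_Visibility = runif(60, min = 0, max = 0.3),
  Item_MRP        = runif(60, min = 30, max = 270)
)

# Scatter plot of the two continuous variables
plot(mart$Item_Visibility, mart$Item_MRP,
     xlab = "Item_Visibility", ylab = "Item_MRP",
     main = "Item_MRP vs Item_Visibility", pch = 19)
```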





We can also show a third variable in the same chart, say the categorical variable Item_Type, which gives the category of each data point. Each Item_Type is drawn in a different color.
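One way to sketch the colored version in base R is shown here. The data are again synthetic, and the three Item_Type labels and the color names are illustrative assumptions, not values taken from the real dataset.

```r
# Synthetic stand-in for the mart dataset (values and categories are made up)
set.seed(1)
mart <- data.frame(
  Item_Visibility = runif(60, min = 0, max = 0.3),
  Item_MRP        = runif(60, min = 30, max = 270),
  Item_Type       = factor(rep(c("Dairy", "Snack Foods", "Soft Drinks"), times = 20))
)

# One color per factor level, looked up by the level's integer code
type_cols <- c("steelblue", "darkorange", "forestgreen")

plot(mart$Item_Visibility, mart$Item_MRP,
     col = type_cols[as.integer(mart$Item_Type)], pch = 19,
     xlab = "Item_Visibility", ylab = "Item_MRP",
     main = "Item_MRP vs Item_Visibility by Item_Type")
legend("topright", legend = levels(mart$Item_Type),
       col = type_cols, pch = 19, title = "Item_Type")
```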

We can make it even clearer by drawing a separate scatter plot for each Item_Type.
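A hedged sketch of the per-category panels, using base R's par(mfrow = ...) and a loop over the factor levels. The data and category labels are synthetic assumptions; a faceting function from a plotting package would be an equally reasonable choice.

```r
# Synthetic stand-in for the mart dataset (values and categories are made up)
set.seed(1)
mart <- data.frame(
  Item_Visibility = runif(60, min = 0, max = 0.3),
  Item_MRP        = runif(60, min = 30, max = 270),
  Item_Type       = factor(rep(c("Dairy", "Snack Foods", "Soft Drinks"), times = 20))
)

# One panel per Item_Type, side by side, with shared axis limits
old_par <- par(mfrow = c(1, nlevels(mart$Item_Type)))
for (type in levels(mart$Item_Type)) {
  one_type <- mart[mart$Item_Type == type, ]
  plot(one_type$Item_Visibility, one_type$Item_MRP,
       xlim = range(mart$Item_Visibility), ylim = range(mart$Item_MRP),
       xlab = "Item_Visibility", ylab = "Item_MRP",
       main = type, pch = 19)
}
par(old_par)  # restore the previous layout
```

Sharing xlim and ylim across panels keeps the categories visually comparable.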




  • HISTOGRAM

A histogram is a plot that breaks the data into bins and shows the frequency distribution of these bins. par(mfrow = c(2, 5)) lets us fit multiple plots on the same page, here in a grid of 2 rows and 5 columns. We installed and loaded the RColorBrewer library and, using the VADeaths dataset, drew a set of histograms.



Here we see that if the number of breaks is less than the number of colors specified, the colors just go to extreme values, as in the "Set 3 8 colors" graph. However, if the number of breaks is more than the number of colors, the colors start repeating, as in the first row.
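The histogram setup described above could be sketched roughly as follows. It assumes the RColorBrewer package is installed; the specific break counts, the choice of the "Set3" palette for every panel, and the panel titles are assumptions for illustration and may differ from the original figure.

```r
library(RColorBrewer)  # assumes the package is installed

# 2 rows x 5 columns of plots on one page, with slightly tighter margins
old_par <- par(mfrow = c(2, 5), mar = c(4, 4, 2, 1))

# Top row: more bars than colors, so the colors are recycled across bars
for (n_cols in 3:7) {
  hist(VADeaths, breaks = 10, col = brewer.pal(n_cols, "Set3"),
       main = paste("Set3", n_cols, "colors"), xlab = "Death rate")
}

# Bottom row: fewer bars than colors, so only the first few colors are used
for (n_cols in 8:12) {
  hist(VADeaths, breaks = 3, col = brewer.pal(n_cols, "Set3"),
       main = paste("Set3", n_cols, "colors"), xlab = "Death rate")
}

par(old_par)  # restore the previous layout

# Bin counts without plotting, handy for checking how many bars were drawn
counts_few_breaks <- hist(VADeaths, breaks = 3, plot = FALSE)$counts
```

Note that hist() treats a single number passed as breaks as a suggestion, so the actual number of bars can differ from the value requested.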

  • BAR AND STACKED BAR CHART

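As a hedged illustration of a stacked and a grouped bar chart in base R, the sketch below reuses the built-in VADeaths matrix. The choice of dataset and titles is an assumption made for this example.

```r
# Stacked bar chart: one bar per column, segments stacked by row (age group)
stacked_mids <- barplot(VADeaths, legend.text = TRUE,
                        main = "Death rates in Virginia (stacked)",
                        ylab = "Deaths per 1000")

# Grouped bar chart: beside = TRUE draws the segments next to each other
grouped_mids <- barplot(VADeaths, beside = TRUE, legend.text = TRUE,
                        main = "Death rates in Virginia (grouped)",
                        ylab = "Deaths per 1000")
```

barplot() returns the bar midpoints, which is convenient when adding labels on top of the bars.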







  • BOXPLOT

A box plot is an important tool for visualizing the spread of the data and drawing inferences from it. A boxplot, also known as a box-and-whisker plot, splits the dataset into quartiles. The body of the boxplot is a box that runs from the first quartile to the third quartile. Within the box, a line is drawn at Q2, which represents the median of the dataset.

Below is the code and output for drawing boxplots. 
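A minimal sketch, not necessarily the original code, using the built-in iris dataset as an assumed example. With horizontal = TRUE the median appears as a vertical line inside each box.

```r
# One box per species; the formula reads "Sepal.Length grouped by Species"
bp <- boxplot(Sepal.Length ~ Species, data = iris,
              horizontal = TRUE, col = "lightblue",
              main = "Sepal length by species",
              xlab = "Sepal.Length", ylab = "Species")

# bp$stats holds, per group: lower whisker, lower hinge, median,
# upper hinge, upper whisker
group_medians <- bp$stats[3, ]
```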



  • AREA CHART

An area chart is used to show continuity across a variable or dataset. It is very similar to a line chart and is commonly used for time series plots. It can also be used to plot continuous variables and analyze the underlying trends.
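One way to sketch an area chart in base R is to draw a filled polygon under a line. The built-in AirPassengers time series is used here as an assumed example.

```r
# Extract plain numeric vectors from the built-in monthly time series
years      <- as.numeric(time(AirPassengers))
passengers <- as.numeric(AirPassengers)

# Empty plot first, so the axes start at zero
plot(years, passengers, type = "n", ylim = c(0, max(passengers)),
     xlab = "Year", ylab = "Passengers (thousands)",
     main = "Monthly airline passengers")

# Fill the area between the curve and the zero baseline
polygon(c(years, rev(years)),
        c(passengers, rep(0, length(passengers))),
        col = "lightblue", border = NA)

# Draw the line on top of the filled area
lines(years, passengers, col = "steelblue", lwd = 2)
```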



  • HEAT MAP
A heat map is a two-dimensional representation of data in which values are represented by colors. A simple heat map provides an immediate summary of information. More complex heat maps enable the reader to understand complex datasets. Here the heat map is drawn on the mtcars dataset. An important point to note is that the dataset must first be converted into matrix format.
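A short sketch of this with base R's heatmap() function. Scaling by column is an assumption added here so that variables measured in different units are comparable.

```r
# heatmap() expects a numeric matrix, so convert the data frame first
mtcars_matrix <- as.matrix(mtcars)

# scale = "column" standardizes each variable before mapping it to colors
hm <- heatmap(mtcars_matrix, scale = "column",
              main = "Heat map of mtcars")
```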



  • CORRELOGRAM
The corrgram function produces a graphical description of the correlation matrix, known as a correlogram. The cells of the matrix can be colored or shaded to show the correlation value. The darker the color, the stronger the correlation between the variables; color intensity is proportional to the correlation value. From our dataset, let's check the correlation between item cost, weight, and visibility, along with outlet establishment year and outlet sales.
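A hedged sketch of a correlogram, assuming the corrgram package is installed. The mart data are not available here, so the data frame is synthetic: the values, the column names Item_Weight, Outlet_Establishment_Year, and Item_Outlet_Sales, and the relationship between sales and MRP are all made up for illustration.

```r
library(corrgram)  # assumes the package is installed

# Synthetic stand-in for the mart dataset (all values are made up)
set.seed(42)
n <- 100
mart <- data.frame(
  Item_MRP                  = runif(n, min = 30, max = 270),
  Item_Weight               = runif(n, min = 5, max = 20),
  Item_Visibility           = runif(n, min = 0, max = 0.3),
  Outlet_Establishment_Year = sample(1985:2009, n, replace = TRUE)
)
# Artificial dependence so that at least one cell is visibly shaded
mart$Item_Outlet_Sales <- 15 * mart$Item_MRP + rnorm(n, sd = 500)

# Shaded cells below the diagonal, pie glyphs above it
corrgram(mart, order = TRUE,
         lower.panel = panel.shade, upper.panel = panel.pie,
         text.panel = panel.txt,
         main = "Correlogram of mart variables (synthetic data)")
```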

  • MAP VISUALIZATION
We can also visualize data in R using JavaScript libraries. Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. We can easily install its R package from GitHub and plot a map using simple code.
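A minimal sketch of a Leaflet map in R, assuming the leaflet package is already installed. The marker coordinates and popup text are arbitrary placeholders chosen for this example.

```r
# Installation (run once); the package is available from GitHub and CRAN:
# devtools::install_github("rstudio/leaflet")
# install.packages("leaflet")
library(leaflet)

# Build the map: default OpenStreetMap tiles plus a single marker
m <- leaflet() %>%
  addTiles() %>%
  addMarkers(lng = 174.768, lat = -36.852, popup = "Example marker")

# Typing m at the R console (or in RStudio) displays the interactive map
```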





