DATA VISUALIZATION IN R
With the ever-increasing volume of data in today's world, the need for data
visualization has also increased. It is very important to visualize data
effectively so that we can tell stories and draw insights from it. R offers a
set of built-in functions and packages to visualize and present data.
Data visualization in R can be broadly classified into the following two
categories:
- BASIC VISUALIZATION
- ADVANCED VISUALIZATION
Basic visualization includes histograms, line and bar charts, scatter plots,
boxplots, etc.
Advanced visualization includes heat maps, mosaic plots, correlograms, 3D
graphs, map visualizations, etc.
- SCATTER PLOT
In the mart dataset used above, if we want to visualize the items by their
cost, we can draw a scatter plot of two continuous variables, Item_Visibility
and Item_MRP, as shown below.
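As a minimal sketch in base R, such a scatter plot can be drawn with plot(). It assumes a data frame named mart with the columns Item_Visibility and Item_MRP. Because the mart data does not ship with R, the sketch fills a hypothetical stand-in with random numbers; load the real data (for example with read.csv()) to reproduce the actual chart.

```r
# Hypothetical stand-in for the mart data (random values, not the real dataset).
# With the real file, use something like: mart <- read.csv("<your mart data file>")
set.seed(1)
n <- 150
mart <- data.frame(
  Item_Visibility = runif(n, min = 0, max = 0.3),
  Item_MRP        = runif(n, min = 30, max = 270)
)

# Scatter plot of two continuous variables: visibility on x, MRP on y
plot(Item_MRP ~ Item_Visibility, data = mart,
     pch = 19, col = "steelblue",
     main = "Item_MRP vs Item_Visibility")
```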
We can also show a third variable in the same chart, for example the
categorical variable Item_Type, which gives the category of each item. In the
chart below, each Item_Type is drawn in a different color.
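A minimal base R sketch of the colored version follows. The data frame and its three Item_Type levels are hypothetical random stand-ins, not the real mart data. Each level is mapped to a color through the factor's integer codes, and a legend links colors back to categories.

```r
# Hypothetical stand-in for the mart data (random values, not the real dataset);
# the three Item_Type levels are illustrative only.
set.seed(1)
n <- 150
types <- c("Dairy", "Household", "Snack Foods")
mart <- data.frame(
  Item_Visibility = runif(n, min = 0, max = 0.3),
  Item_MRP        = runif(n, min = 30, max = 270),
  Item_Type       = factor(sample(types, n, replace = TRUE), levels = types)
)

# One color per Item_Type level: the factor's integer codes index the palette
pal <- c("tomato", "steelblue", "darkgreen")
point_cols <- pal[as.integer(mart$Item_Type)]

plot(Item_MRP ~ Item_Visibility, data = mart, pch = 19, col = point_cols,
     main = "Item_MRP vs Item_Visibility, colored by Item_Type")
legend("topright", legend = levels(mart$Item_Type), col = pal, pch = 19,
       cex = 0.8, bg = "white")
```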
To make the chart even clearer, we can draw a separate scatter plot for each Item_Type, as shown below.
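One way to sketch per-category panels in base R is to split() the data by Item_Type and draw one plot per piece inside a par(mfrow) grid. Shared axis limits keep the panels comparable. The data are again a hypothetical random stand-in. ggplot2's facet_wrap() is a common alternative that needs less code.

```r
# Hypothetical stand-in for the mart data (random values, not the real dataset)
set.seed(1)
n <- 150
types <- c("Dairy", "Household", "Snack Foods")
mart <- data.frame(
  Item_Visibility = runif(n, min = 0, max = 0.3),
  Item_MRP        = runif(n, min = 30, max = 270),
  Item_Type       = factor(sample(types, n, replace = TRUE), levels = types)
)

# split() returns a list with one data frame per Item_Type level
by_type <- split(mart, mart$Item_Type)

# One panel per level, side by side, with common axis ranges
old_par <- par(mfrow = c(1, length(by_type)))
for (type in names(by_type)) {
  d <- by_type[[type]]
  plot(d$Item_Visibility, d$Item_MRP, pch = 19, col = "steelblue",
       xlim = range(mart$Item_Visibility), ylim = range(mart$Item_MRP),
       xlab = "Item_Visibility", ylab = "Item_MRP", main = type)
}
par(old_par)  # restore the previous plotting layout
```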
- HISTOGRAM
A histogram breaks the data into bins and shows the frequency distribution
across these bins. par(mfrow = c(2, 5)) divides the plotting area into a grid
of 2 rows and 5 columns, so that multiple plots fit on the same page. We
installed and loaded the RColorBrewer package and, using the built-in
VADeaths dataset, drew a set of histograms.
Here we see that if the number of bins is smaller than the number of colors
specified, only the first colors of the palette are used, as in the "Set3 8
colors" graph. However, if the number of bins is larger than the number of
colors, the colors start repeating, as in the first row.
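The sketch below builds a 2 x 3 grid of histograms along these lines. So that it runs without add-on packages, it swaps RColorBrewer's brewer.pal() for base R's hcl.colors() (R 3.6.0 or later). The palette names and break counts are therefore illustrative choices, not the ones used for the graphs above. With RColorBrewer installed, a call such as brewer.pal(8, "Set3") can be passed to col in the same way.

```r
data(VADeaths)  # built-in matrix: death rates per 1000, Virginia, 1940

# 2 rows x 3 columns of plots on one page; keep the old settings to restore later
old_par <- par(mfrow = c(2, 3))

# 'breaks' is only a suggestion: hist() picks "pretty" cut points near that count.
# Fill colors are recycled across bars: fewer colors than bars -> colors repeat;
# more colors than bars -> only the first colors of the palette are used.
h1 <- hist(VADeaths, breaks = 10, col = hcl.colors(3, "Set 3"), main = "Set 3, 3 colors")
h2 <- hist(VADeaths, breaks = 3,  col = hcl.colors(3, "Set 2"), main = "Set 2, 3 colors")
h3 <- hist(VADeaths, breaks = 7,  col = hcl.colors(3, "Dark 3"), main = "Dark 3, 3 colors")
h4 <- hist(VADeaths, breaks = 2,  col = hcl.colors(8, "Set 3"), main = "Set 3, 8 colors")
h5 <- hist(VADeaths, col = hcl.colors(8, "Greens"), main = "Greens, 8 colors")
h6 <- hist(VADeaths, col = hcl.colors(8, "Grays"),  main = "Grays, 8 colors")

par(old_par)
```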
- BAR AND STACKED BAR CHART
A bar chart represents each category of a variable with a bar whose height
shows a count or value. A stacked bar chart splits each bar into segments for
the levels of a second categorical variable, so that both the totals and their
composition can be compared.
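As a minimal sketch, base R's barplot() draws both kinds of chart. It uses the built-in VADeaths matrix rather than the mart data. A plain vector gives a simple bar chart. A matrix gives a stacked bar chart with one bar per column, because barplot() stacks by default (beside = FALSE).

```r
data(VADeaths)  # rows: age groups; columns: Rural/Urban x Male/Female

old_par <- par(mfrow = c(1, 2))

# Simple bar chart: one bar per age group for a single population group
mids <- barplot(VADeaths[, "Rural Male"], col = "steelblue",
                main = "Rural Male", xlab = "Age group",
                ylab = "Deaths per 1000")

# Stacked bar chart: one bar per column, split into age-group segments
stack_mids <- barplot(VADeaths, col = hcl.colors(nrow(VADeaths), "Blues"),
                      legend.text = rownames(VADeaths),
                      main = "All groups, stacked by age",
                      ylab = "Deaths per 1000")

par(old_par)
```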
- BOXPLOT
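A box plot is an important tool for visualizing the spread of the data and
drawing inferences from it. A boxplot, also known as a box-and-whisker plot,
splits the dataset into quartiles. The body of the boxplot is a box that spans
from the first quartile (Q1) to the third quartile (Q3). Within the box, a line
is drawn at the second quartile (Q2), which is the median of the dataset. By
default in R, the whiskers extend from the box to the most extreme data points
lying within 1.5 times the interquartile range (Q3 - Q1); points beyond the
whiskers are drawn individually as potential outliers.
Below is the code and output for drawing boxplots.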
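The following minimal sketch uses the built-in mtcars data rather than the mart data. Base R's boxplot() draws one box per group from a formula. R draws the box edges at Tukey's hinges (computed by fivenum()), which can differ slightly from the quartiles returned by quantile().

```r
data(mtcars)

# One box per number of cylinders; the formula splits mpg by cyl
b <- boxplot(mpg ~ cyl, data = mtcars, col = "lightblue",
             xlab = "Number of cylinders", ylab = "Miles per gallon",
             main = "Fuel economy by cylinder count")

# b$stats has one column per box; its rows are: lower whisker,
# lower hinge (~Q1), median (Q2), upper hinge (~Q3), upper whisker
print(b$stats)
```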
- AREA CHART
An area chart is used to show how a quantity changes continuously across a variable, such as time. It is similar to a line chart, except that the area between the line and the x-axis is filled in, and it is commonly used for time series plots. It can also be used to plot continuous variables and analyze their underlying trends.
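Here is a minimal base R sketch of an area chart. It uses the built-in Nile time series (annual flow of the river Nile at Aswan, 1871 to 1970) as an example. polygon() fills the region under the series, and lines() draws the series on top. ggplot2's geom_area() serves the same purpose; the sketch sticks to base graphics so it runs without add-ons.

```r
data(Nile)  # annual flow of the Nile, 1871-1970, in 10^8 m^3

yr   <- as.numeric(time(Nile))
flow <- as.numeric(Nile)

# Empty axes with the y-axis starting at zero, so the filled area is honest
plot(yr, flow, type = "n", ylim = c(0, max(flow)),
     xlab = "Year", ylab = "Annual flow (10^8 m^3)",
     main = "Area chart of Nile flow")

# Close the polygon down to y = 0 at both ends, fill it, then draw the line
polygon(c(yr[1], yr, yr[length(yr)]), c(0, flow, 0),
        col = "lightblue", border = NA)
lines(yr, flow, col = "steelblue", lwd = 2)
```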
- HEAT MAP
A heat map is a two-dimensional representation of data in which values are represented by colors. A simple heat map provides an immediate visual summary of information, while more elaborate heat maps help the reader understand complex datasets. The code below draws a heat map of the mtcars dataset. An important point to note is that the dataset must first be converted into a matrix, because functions such as heatmap() expect a numeric matrix, not a data frame.
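One way to draw this with base R's heatmap() is sketched below; the palette and scaling options are illustrative choices. heatmap() rejects a data frame with an error, so mtcars is converted with as.matrix() first. By default it scales by row and reorders rows and columns by hierarchical clustering (drawn as dendrograms). scale = "column" is used here because the mtcars variables have very different units.

```r
data(mtcars)

# heatmap() requires a numeric matrix; passing the data frame itself errors
m <- as.matrix(mtcars)

# scale = "column" standardizes each variable so that columns with large
# values (e.g. disp, hp) do not dominate the color scale
hm <- heatmap(m, scale = "column", col = hcl.colors(50, "YlOrRd", rev = TRUE),
              margins = c(5, 8), main = "mtcars heat map")
```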
- CORRELOGRAM
The corrgram() function from the corrgram package produces a graphical display of a correlation matrix, known as a correlogram. The cells of the matrix can be colored or shaded to show the correlation value: the darker the color, the stronger the correlation between the variables, whether positive or negative. In other words, color intensity increases with the absolute value of the correlation. From our dataset, let's check the correlation between item cost, weight and visibility, outlet establishment year and outlet sales in the plot below.
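If the corrgram package is installed, corrgram(mtcars) draws a shaded correlogram in one call. The sketch below instead builds a simple correlogram in base R so that it runs without add-ons. cor() computes the correlation matrix, and image() shades each cell with a diverging palette: red for negative, blue for positive, stronger colors for larger absolute values. Five columns of the built-in mtcars data stand in for the mart variables.

```r
data(mtcars)

# Stand-in variables from mtcars (not the mart data)
vars <- c("mpg", "disp", "hp", "wt", "qsec")
cm <- cor(mtcars[, vars])   # Pearson correlation matrix
n <- length(vars)

# Diverging palette: red = negative, white = 0, blue = positive;
# zlim = c(-1, 1) keeps 0 at the white midpoint
pal <- colorRampPalette(c("firebrick", "white", "steelblue"))(21)

# image() places z[i, j] at (x[i], y[j]); reversing the columns puts
# the first variable at the top, as in a printed matrix
image(1:n, 1:n, cm[, n:1], zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "", main = "Correlogram (mtcars subset)")
axis(1, at = 1:n, labels = vars)
axis(2, at = 1:n, labels = rev(vars), las = 1)

# Print each correlation coefficient in its cell
text(rep(1:n, times = n), rep(1:n, each = n), labels = sprintf("%.2f", cm[, n:1]))
```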
- MAP VISUALIZATION
We can also visualize data in R with JavaScript libraries. Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. Its R interface, the leaflet package, can be installed from CRAN (or, for the development version, from GitHub), and a map can be plotted with a few lines of code.
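This minimal sketch uses the leaflet R package, installed with install.packages("leaflet"). The coordinates and popup text are hypothetical example values. addTiles() adds the default OpenStreetMap tile layer, setView() centers the map, and addMarkers() places a clickable marker. Printing the map object renders it in the RStudio Viewer or a web browser.

```r
# Requires the leaflet package: install.packages("leaflet")
library(leaflet)  # also re-exports the %>% pipe

# Hypothetical example location (New Delhi); replace with your own coordinates
lng <- 77.2090
lat <- 28.6139

m <- leaflet() %>%
  addTiles() %>%                              # default OpenStreetMap tiles
  setView(lng = lng, lat = lat, zoom = 11) %>%
  addMarkers(lng = lng, lat = lat, popup = "Example marker: New Delhi")

m  # printing the widget draws the interactive map
```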