DATA VISUALIZATION IN R (DETAILED)
With the ever-increasing volume of data in today's world, the need for data visualization has grown as well. Visualizing data effectively lets us tell stories with it and draw insights from it. R offers a set of built-in functions, as well as add-on packages, for visualizing and presenting data.
Data visualization in R can be broadly classified into the following two categories:
- BASIC VISUALIZATION
- ADVANCED VISUALIZATION
Basic visualization can take the form of a histogram, line or bar chart, scatter plot, box plot, etc. Advanced visualization can take the form of a heat map, mosaic plot, correlogram, 3D graph, map visualization, etc.
- HISTOGRAM
A histogram is a plot that breaks the data into bins (intervals) and shows the frequency of observations in each bin. Setting par(mfrow = c(2, 5)) splits the plotting area into a grid of 2 rows and 5 columns, so that multiple plots fit on the same page. We installed and loaded the RColorBrewer package and drew a set of histograms using the VADeaths dataset.
Here we see that when there are fewer bins than specified colors, only the first colors in the palette are used, as in the "Set 3 8 colors" graph. When there are more bins than colors, the colors are recycled (repeated), as in the first row.
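A minimal sketch of such a histogram grid, assuming the built-in VADeaths dataset and a 2 x 5 layout. The `breaks` values and the three-color vector are illustrative choices, not the post's exact code. RColorBrewer's "Set3" palette is used if the package is installed; otherwise the sketch falls back to base R's `hcl.colors(8, "Set 3")`, which is a similar substitute rather than the same palette.

```r
# Sketch: histograms of VADeaths with different break counts and palettes.
# Assumption: breaks values and colors are illustrative, not the original code.
data(VADeaths)  # 5 x 4 matrix of death rates (built-in datasets package)

# Use RColorBrewer's "Set3" palette if available, else a base R substitute
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  set3 <- RColorBrewer::brewer.pal(8, "Set3")
} else {
  set3 <- hcl.colors(8, "Set 3")
}
basic <- c("red", "yellow", "green")

old_par <- par(mfrow = c(2, 5))  # 2 rows x 5 columns of plots on one page
# First row: bars take colors in order; the 3-color vector is recycled
# whenever there are more bins than colors
for (b in c(2, 5, 10, 15, 20)) {
  hist(VADeaths, breaks = b, col = basic, main = paste("breaks =", b))
}
# Second row: an 8-color palette; when there are fewer bins than colors,
# only the first colors of the palette are used
for (b in c(2, 5, 10, 15, 20)) {
  hist(VADeaths, breaks = b, col = set3, main = paste("Set3, breaks =", b))
}
par(old_par)  # restore the previous layout

# Bin counts without drawing: every observation falls in exactly one bin
h <- hist(VADeaths, breaks = 10, plot = FALSE)
```

Note that a single number passed to `breaks` is only a suggestion: R picks "pretty" cut points, so the actual number of bins can differ slightly.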
- BAR/LINE CHART
LINE CHART-
The line chart shows the increase in the number of air passengers over the given time period. Line charts are mainly used to analyze trends over time.
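A minimal sketch of such a line chart, assuming the data is R's built-in AirPassengers time series (monthly totals of international airline passengers, 1949 to 1960). The yearly-average overlay is an illustrative addition to make the trend easier to read.

```r
# Sketch: line chart of a monthly time series (built-in AirPassengers)
data(AirPassengers)  # ts object, monthly, 1949-1960

# plot() on a ts object draws a line chart; type = "l" makes that explicit
plot(AirPassengers, type = "l", col = "steelblue",
     xlab = "Year", ylab = "Passengers (thousands)",
     main = "Monthly airline passengers, 1949-1960")

# Overlay yearly averages to highlight the long-term upward trend
yearly <- aggregate(AirPassengers, nfrequency = 1, FUN = mean)
lines(yearly, col = "darkred", lwd = 2)
```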
BAR CHART-
Bar plots are recommended when we want to plot a categorical variable or a combination of categorical and continuous variables.
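A short sketch of both cases, assuming the built-in iris dataset: a bar chart of a categorical variable alone (counts per species), and one that combines a categorical variable with a continuous one (mean sepal length per species).

```r
# Sketch: bar charts with base R's barplot() on the built-in iris data
data(iris)

old_par <- par(mfrow = c(1, 2))
# Categorical only: number of flowers per species
counts <- table(iris$Species)
barplot(counts, col = "grey70", main = "Flowers per species", ylab = "Count")

# Categorical + continuous: mean sepal length per species
means <- tapply(iris$Sepal.Length, iris$Species, mean)
barplot(means, col = c("tomato", "gold", "skyblue"),
        main = "Mean sepal length", ylab = "cm")
par(old_par)
```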
- BOX PLOT
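A box plot is an important tool for visualizing the spread of the data and drawing inferences from it. A box plot, also known as a box-and-whisker plot, splits the dataset into quartiles. The body of the plot is a box that extends from the first quartile (Q1) to the third quartile (Q3), and a line drawn inside the box marks the second quartile (Q2), which is the median of the dataset. By default, R extends the whiskers to the most extreme data points that lie within 1.5 times the interquartile range (Q3 - Q1) of the box, and draws any points beyond them individually as potential outliers.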
In our example, we drew four boxplots on the same page, which makes the spread of sepal length across the species easy to compare. The last two graphs are examples of color palettes: a color palette is a group of colors applied to a plot to make it more appealing and easier to read.
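A minimal sketch of four boxplots on one page, assuming the built-in iris data. The specific variants (default, horizontal, and two base R palettes, `heat.colors()` and `topo.colors()`) are illustrative choices, not necessarily the ones in the original post.

```r
# Sketch: four boxplots of iris sepal length by species on one page
data(iris)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of plots
boxplot(Sepal.Length ~ Species, data = iris, main = "Default")
# horizontal = TRUE: the median line inside each box becomes vertical
boxplot(Sepal.Length ~ Species, data = iris, horizontal = TRUE,
        main = "Horizontal")
# The last two use base R color palettes, one color per species
boxplot(Sepal.Length ~ Species, data = iris, col = heat.colors(3),
        main = "heat.colors palette")
boxplot(Sepal.Length ~ Species, data = iris, col = topo.colors(3),
        main = "topo.colors palette")
par(old_par)

# Summary statistics without drawing. Rows: lower whisker end, lower hinge
# (about Q1), median (Q2), upper hinge (about Q3), upper whisker end.
# One column per species.
stats <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)$stats
```

R computes the box edges as Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()` for some sample sizes.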
- SCATTER PLOT
A scatter plot is used to see the relationship between two continuous variables, and R can draw both simple and multivariate scatter plots. We can also draw a simple pie chart with pie(table(iris$Species)). However, pie charts are not always a good choice, because the human eye judges angles and areas less accurately than it judges lengths and positions along a common scale. Therefore, instead of a pie chart, we can use a bar chart, which encodes each value as a length.
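A minimal sketch with the built-in iris data. The choice of `pairs()` for the multivariate scatter plot (a scatter-plot matrix) and the pie/bar side-by-side comparison are assumptions made for illustration.

```r
# Sketch: simple and multivariate scatter plots, plus a pie vs bar comparison
data(iris)

# Simple scatter plot of two continuous variables, colored by species
plot(iris$Petal.Length, iris$Sepal.Length,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)

# Multivariate: scatter-plot matrix of all four measurements
pairs(iris[, 1:4], col = as.integer(iris$Species))

# The same species counts as a pie chart and as a bar chart
old_par <- par(mfrow = c(1, 2))
pie(table(iris$Species), main = "Pie chart")
barplot(table(iris$Species), main = "Bar chart")
par(old_par)

# Strength of the linear relationship shown in the first plot
r <- cor(iris$Petal.Length, iris$Sepal.Length)
```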
- HEXBIN PLOT
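Hexagon binning is a form of bivariate histogram, useful for visualizing the structure of datasets with a large number of observations (large n), where an ordinary scatter plot suffers from overplotting. It divides the plane into hexagonal cells and shows how many points fall into each cell. For a better visual effect, we can define a color palette and then draw the hexbin plot with it.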
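A minimal sketch, assuming the hexbin package is installed (`install.packages("hexbin")`). The simulated data, the number of bins (`xbins = 40`) and the palette colors are hypothetical choices for illustration; the palette is built with base R's `colorRampPalette()` and passed through hexbin's `colramp` argument.

```r
# Sketch: hexagon binning of a large simulated dataset
# Assumptions: hexbin is installed; data, xbins and colors are illustrative.
set.seed(42)
n <- 50000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = 0.8)  # correlated, heavily overplotted data

bin <- NULL
if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)
  bin <- hexbin(x, y, xbins = 40)  # count the points in each hexagon
  palette_fn <- colorRampPalette(c("lightyellow", "orange", "darkred"))
  plot(bin, colramp = palette_fn, main = "Hexbin plot")  # color = count
} else {
  message("Package 'hexbin' is not installed")
}
```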
- MOSAIC PLOT
A mosaic plot is a graphical method for visualizing data from two or more qualitative (categorical) variables. It gives an overview of the data and makes it possible to see relationships between the variables. The area of each tile is proportional to the number of observations in that combination of categories.
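A short sketch using base R's `mosaicplot()` on a cross-tabulation of two categorical variables from the built-in mtcars data. The choice of variables (cylinders and gears) and the colors are illustrative.

```r
# Sketch: mosaic plot of two categorical variables from mtcars
data(mtcars)

# Cross-tabulate number of cylinders against number of forward gears
tab <- table(Cylinders = mtcars$cyl, Gears = mtcars$gear)

# Tile area is proportional to the count in each cylinder/gear combination
mosaicplot(tab, color = c("lightblue", "steelblue", "navy"),
           main = "mtcars: cylinders vs gears")
```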
- HEAT MAP
A heat map is a two-dimensional representation of data in which values are represented by colors. A simple heat map provides an immediate visual summary of the information, while more elaborate heat maps help the reader understand complex datasets. We drew a heat map of the mtcars dataset.
An important point to note here is that the dataset must first be converted to matrix format (for example with as.matrix()), because R's heatmap() function accepts only a numeric matrix, not a data frame.
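A minimal sketch with base R's `heatmap()` on mtcars. Column scaling and the `heat.colors()` palette are illustrative choices; scaling keeps variables with large values, such as displacement or horsepower, from dominating the color scale.

```r
# Sketch: heat map of mtcars with base R's heatmap()
data(mtcars)

# heatmap() needs a numeric matrix, so convert the data frame first
m <- as.matrix(mtcars)

# scale = "column" standardizes each variable before mapping it to colors;
# rows and columns are also reordered by hierarchical clustering
hm <- heatmap(m, scale = "column", col = heat.colors(256),
              margins = c(5, 10), main = "mtcars heat map")
```

Calling `heatmap(mtcars)` directly on the data frame fails with an error saying that 'x' must be a numeric matrix.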
- MAP VISUALIZATION
We can also visualize data in R using JavaScript libraries. Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. Its R interface, the leaflet package, is easy to install (from CRAN or from GitHub), and a map can be plotted with a few lines of code.
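A minimal sketch with the R leaflet package, assuming it is installed (for example with `install.packages("leaflet")`). The marker coordinates (Auckland, New Zealand) and popup text are illustrative; calls are namespace-qualified so the sketch runs without attaching the package.

```r
# Sketch: an interactive map with the R leaflet package
# Assumptions: leaflet is installed; coordinates and popup are illustrative.
map <- NULL
if (requireNamespace("leaflet", quietly = TRUE)) {
  map <- leaflet::leaflet()      # empty map widget
  map <- leaflet::addTiles(map)  # default OpenStreetMap tile layer
  map <- leaflet::addMarkers(map, lng = 174.768, lat = -36.852,
                             popup = "The birthplace of R")
  # Printing `map` (e.g. typing it in the RStudio console) renders the
  # interactive map in the viewer or browser
} else {
  message("Package 'leaflet' is not installed")
}
```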
- 3D GRAPHS
We can use R Commander (the Rcmdr package), which provides a graphical user interface (GUI) for R. To draw a 3D graph, we simply install the Rcmdr package, start it, and use the 3D graph options in the Graphs menu.
We can also make 3D graphs with the lattice package (for example, with its cloud() function). Lattice can also draw xyplots, that is, scatter plots that can be split into panels by a conditioning variable.
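A minimal sketch with lattice, which ships with standard R installations as a recommended package. It draws a 3D scatter plot with `cloud()` and a conditioned scatter plot with `xyplot()`, assuming the built-in iris data; the formulas are illustrative.

```r
# Sketch: a 3D scatter plot and a conditioned scatter plot with lattice
library(lattice)
data(iris)

# 3D scatter plot: sepal length against petal length and petal width
p3d <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data = iris,
             groups = Species, auto.key = TRUE, main = "cloud()")

# xyplot: one scatter panel per species
pxy <- xyplot(Sepal.Length ~ Petal.Length | Species, data = iris,
              layout = c(3, 1), main = "xyplot()")

# lattice functions return "trellis" objects; print() draws them,
# which is required inside scripts and functions
print(p3d)
print(pxy)
```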
- CORRELOGRAM
The corrgram() function, from the corrgram package, produces a graphical display of a correlation matrix, known as a correlogram. The cells of the matrix can be colored or shaded to show the correlation value: the darker the color, the stronger the correlation between the two variables, because color intensity is proportional to the magnitude of the correlation. By default, the hue shows the sign (blue for positive and red for negative correlations).
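A minimal sketch, assuming the corrgram package is installed (`install.packages("corrgram")`) and using the built-in mtcars data. The panel choices are illustrative: shaded cells below the diagonal, filled pies above it, and `order = TRUE` to reorder the variables so that correlated ones sit next to each other.

```r
# Sketch: correlogram of mtcars
# Assumptions: corrgram is installed; panel choices are illustrative.
data(mtcars)

cm <- cor(mtcars)  # the correlation matrix that the correlogram displays

if (requireNamespace("corrgram", quietly = TRUE)) {
  corrgram::corrgram(mtcars, order = TRUE,
                     lower.panel = corrgram::panel.shade,  # shaded cells
                     upper.panel = corrgram::panel.pie,    # filled pies
                     text.panel = corrgram::panel.txt,     # variable names
                     main = "mtcars correlogram")
} else {
  message("Package 'corrgram' is not installed")
}
```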