Many of the reports and stories produced by scientists, researchers and journalists (especially those who specialise in business and economics) are driven by data. When data are presented in tabular form, they may not be engaging, and you may not immediately see the real insights. For instance, if you have a dataset containing time-series data and you want to see whether a specific variable (the data you are interested in) increased or decreased during a specific period, a table may not reveal that trend readily.
On the other hand, if you represent the data visually, say as a time-series line chart, the reader can immediately get a good sense of how the data varied over time. It is widely held that we process images much faster than text. This is where data visualisation comes into play.
Data visualisation serves more than one purpose. Apart from being used to communicate ideas based on data, visualisation is one of the techniques used to extract further insight and meaning from data (in statisticians' parlance, this task is called data exploration).
Basically, data visualisation involves transforming data into charts, graphs or maps, turning the components of our data into forms that are easy to interpret. The type of visualisation you use depends on the type of variable you are looking at and the question you wish to analyse. If you are interested in comparing variables, or in the relationship between two or more variables, you have to rely on a specific type of graph. If you are comparing values across a few categories, you can use a bar chart. If you are comparing behaviour over time, a line chart is usually the better choice. If you want to depict the relationship between two continuous variables (such as income and life expectancy), a scatter plot is the natural choice; for more than two variables, a bubble chart, which encodes a third variable as the size of each point, can come in handy.
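To make these rules of thumb concrete, here is a minimal Python sketch that turns them into a simple lookup. The function name `suggest_chart` and the purpose labels are hypothetical, chosen only for illustration; in practice the choice also depends on your audience and on the data themselves.

```python
def suggest_chart(purpose: str, num_variables: int = 1) -> str:
    """Suggest a chart type following the rules of thumb in the text.

    `purpose` is one of three hypothetical labels:
    'compare_categories', 'change_over_time' or 'relationship'.
    """
    if purpose == "compare_categories":
        return "bar chart"
    if purpose == "change_over_time":
        return "line chart"
    if purpose == "relationship":
        # Two continuous variables fit on the x and y axes;
        # a further variable can be encoded as bubble size.
        return "scatter plot" if num_variables <= 2 else "bubble chart"
    raise ValueError(f"unknown purpose: {purpose!r}")


print(suggest_chart("relationship", 2))
print(suggest_chart("relationship", 3))
```

For example, `suggest_chart("relationship", 3)` returns "bubble chart", because the third variable has to be shown through something other than the two axes.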
Data visualisation entails representing numbers through variations in certain features of objects, such as the length or height of an object (bar charts), its position, its size, line weight or colour shading (varying the shades of a colour to show differences or changes in value). To put it differently, we map numbers onto properties of objects. The way in which we map data onto visual objects is called visual encoding, and the kind of encoding you choose depends on your purpose.
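Length encoding, the idea behind bar charts, can be illustrated without any plotting library. The Python sketch below maps each value to a text bar whose length is proportional to it. The function name `encode_as_bars` and the sample figures are made up for illustration, and the sketch assumes non-negative values.

```python
def encode_as_bars(data: dict[str, float], max_width: int = 40) -> list[str]:
    """Encode each value as a text bar whose length is proportional to the value.

    Assumes all values are non-negative and at least one is positive.
    """
    largest = max(data.values())
    if largest <= 0 or min(data.values()) < 0:
        raise ValueError("values must be non-negative with a positive maximum")
    lines = []
    for label, value in data.items():
        # Linear scaling: the largest value gets max_width characters.
        length = round(value / largest * max_width)
        lines.append(f"{label:<10} {'#' * length} {value}")
    return lines


# Hypothetical figures, for illustration only.
for line in encode_as_bars({"Item A": 10, "Item B": 25, "Item C": 40}):
    print(line)
```

Because the scaling is linear, a value twice as large gets a bar about twice as long (up to rounding), which is what makes length such an effective encoding for comparisons.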
If you wish to know more about how to choose an appropriate encoding for your data, you may find a site such as the Dataviz Catalogue rather useful. It features a wide range of the graphic forms that are currently popular.
By navigating the site, you will realise that the field of data visualisation is far richer and more diverse than you might expect. Besides the common visualisation types (bar charts, line charts, scatter plots and so on), there are many other ways to visualise data. The little blue clickable buttons on the site represent some of the encodings and visualisation types currently available. Each button is dedicated to one type of encoding, and you can obtain the details of a graphic type simply by clicking the relevant button. For instance, if you click the icon labelled ‘Box and Whisker Plot’, the site will display a page with a detailed write-up on box and whisker plots.
Online visualisation tools
Apart from conventional data visualisation tools (like MS Excel), there are plenty of simple-to-use data visualisation tools out there (like Tableau Public). Here we introduce one such tool, ‘iNZight’.
‘iNZight’ is a free, open-source data visualisation tool designed and created by the Department of Statistics at the University of Auckland in New Zealand. It comes in two versions: a desktop version that runs offline on your computer and an online version that runs in your browser. The program is based on the popular statistical software ‘R’, and if you are familiar with ‘R’ you can install ‘iNZight’ as an R package. If you don’t want to install anything, you can use iNZight Lite, its web counterpart.
Features of ‘iNZight’
If you wish to practise without loading your own data, you can use the example datasets that come with the application, such as Facebook and Salary. You will also find datasets originally produced by Gapminder, the data foundation co-founded by the celebrated Swedish physician and statistician Hans Rosling, who died in February 2017. If you wish to use your own data, simply bring it into the application via the ‘Import Dataset’ option under the ‘File’ menu.
To get acquainted with the software, let us select the ‘Gapminder’ dataset. When you load data, the application displays a preview of its content (the first few rows of the dataset). This country-level dataset contains more than 50 demographic, health, economic and social variables for countries around the world. It can be used to explore health and economic conditions across the world and how they have changed over time.
Now, to visualise the data, go to the ‘Visualise’ menu and the plot interface will pop up (screenshot above). Select a variable from the ‘Variable selection’ interface and you will instantly get a plot of its distribution (screenshot below).
In the example shown below, each little dot corresponds to a country. Along with the distribution chart, the application also displays a box and whisker plot. You can also obtain a numerical summary of the data (mean, median, standard deviation, quartiles and so on) if you wish. To change the style and appearance of the graph, use the ‘Add To Plot’ menu option.
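If you want to check such a numerical summary outside the application, the same statistics can be computed with Python's standard `statistics` module, as in the sketch below. The values are made up, and quartiles can be calculated in several ways, so iNZight (which is based on R) may report slightly different quartiles from `statistics.quantiles`. The median and quartiles are also the values from which the box in a box and whisker plot is drawn.

```python
import statistics


def numeric_summary(values: list[float]) -> dict[str, float]:
    """Return a summary similar to the one described above (needs at least 2 values)."""
    # quantiles(n=4) returns the three quartile cut points; its default
    # method is 'exclusive', which may differ from other software.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


# Made-up values, for illustration only.
summary = numeric_summary([52.0, 61.5, 68.2, 72.4, 75.1, 79.3, 81.0, 83.2])
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```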
You can even subdivide the data by a specific variable and obtain a set of plots, one for each subset. For instance, if you select the ‘Region’ variable in our example, you will get a separate graph for each region.
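Subdividing data by a variable amounts to grouping the rows by that variable's values and then summarising or plotting each group separately. The Python sketch below shows the grouping step; the function name `split_by`, the column names ‘Region’ and ‘LifeExp’, the region labels and the values are all hypothetical and not taken from iNZight's Gapminder data.

```python
from collections import defaultdict
from statistics import median


def split_by(rows: list[dict], group_key: str, value_key: str) -> dict[str, list[float]]:
    """Group the values of value_key by the category stored under group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return dict(groups)


# Hypothetical rows, for illustration only.
rows = [
    {"Country": "Country 1", "Region": "North", "LifeExp": 80.5},
    {"Country": "Country 2", "Region": "South", "LifeExp": 64.0},
    {"Country": "Country 3", "Region": "North", "LifeExp": 78.0},
    {"Country": "Country 4", "Region": "South", "LifeExp": 70.0},
]

# One summary per subset, analogous to getting one plot per region.
for region, values in split_by(rows, "Region", "LifeExp").items():
    print(region, len(values), median(values))
```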
We have only scratched the surface of this application, which has many other features. It supports multi-variable graph modules, 3D plots, a maps module (for exploring geographical data) and more. It also lets you export plots as a PDF or an image file.