Google search has made astonishing progress over the last few years. Unlike in the past, its capability to understand the real intention of a user is now quite remarkable. Even careless and error-ridden queries are answered with exact results on the very first attempt.
For instance, if you invoke a search with the string ‘ecojomivs of innovation’, Google automatically corrects the spelling and presents results for the corrected version of your query (here, ‘economics of innovation’). It can now understand questions with superlatives (like “longest river in the world”, “highest mountain” and so on) and questions with dates in them (like “gold price in 1970 in India” or “population of India in 1947”). The point to note here is that the search engine now allows us to search in everyday natural language. The technology that makes this possible is called Natural Language Processing, which is the focus of this week’s column.
Natural Language Processing (NLP) is the technology that focuses on understanding the internals of human languages (like English, Hindi and German). NLP strives to understand the way humans learn and use a language. It tries to teach computers how to understand natural language, and also how to generate it.
Auto-summarisation of text (creating a summary of an article in a few words or sentences), spelling correction, text classification (dividing pieces of text into different topic categories), sentiment analysis (classifying social media messages as expressing positive or negative views), topic analysis (locating the topic clusters or groups of related articles in a huge corpus of news articles) and machine translation are some of the problems that come under the purview of NLP.
For all the NLP problems mentioned above, a set of common tasks needs to be performed. The primary task is to break a piece of text down into its words and sentences (in NLP parlance, this is called ‘tokenization’). To get an idea of this task, take a look at the ‘Tokenize’ service available here. Just type in or paste your text in the input box; the application will split the text into a set of words.
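The idea behind tokenization can be sketched in a few lines of Python. The regular expression below is deliberately simple; production tokenizers (such as those in NLTK or spaCy) handle abbreviations, contractions and punctuation far more carefully.

```python
import re

def tokenize(text):
    """Split text into word tokens with a simple regex.

    Matches runs of letters (optionally with an apostrophe part,
    so contractions stay whole), runs of digits, and any other
    single non-space character (punctuation).
    """
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]", text)

print(tokenize("NLP isn't magic; it's engineering."))
# ['NLP', "isn't", 'magic', ';', "it's", 'engineering', '.']
```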
Identifying the type of a word (finding its appropriate part of speech: whether it is a noun, verb, adjective, etc.) is yet another common NLP task (this is called part-of-speech tagging). To get a feel for it, check out the ‘Tag It’ service.
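To illustrate what a part-of-speech tagger does, here is a toy tagger that combines a tiny hand-made lexicon with a couple of suffix rules. The word list and rules are purely illustrative; real taggers are trained statistically on large annotated corpora and are far more accurate.

```python
# A tiny hand-made lexicon; real taggers learn this from corpora.
LEXICON = {"the": "DET", "a": "DET", "is": "VERB", "dog": "NOUN",
           "cat": "NOUN", "runs": "VERB"}

def tag(words):
    """Assign a part-of-speech tag to each word in a list."""
    tagged = []
    for w in words:
        lw = w.lower()
        if lw in LEXICON:
            tagged.append((w, LEXICON[lw]))
        elif lw.endswith("ly"):
            tagged.append((w, "ADV"))    # quickly, slowly ...
        elif lw.endswith(("ing", "ed")):
            tagged.append((w, "VERB"))   # running, jumped ...
        else:
            tagged.append((w, "NOUN"))   # default guess
    return tagged

print(tag(["the", "dog", "runs", "quickly"]))
# [('the', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB'), ('quickly', 'ADV')]
```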
Finding the commonly occurring words or groups of words is the next phase. If you wish to experiment with a word count service, access the Text Fixer service here.
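Counting word frequencies is straightforward in Python with the standard library’s collections.Counter; the sample sentence below is just an illustration:

```python
from collections import Counter

text = ("natural language processing helps computers "
        "understand natural language")

# Split into words and tally how often each one occurs.
counts = Counter(text.split())

# The two most frequent words in this sample.
print(counts.most_common(2))
# [('natural', 2), ('language', 2)]
```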
There are many words that take on different meanings based on the context in which they occur. When a word has multiple meanings, we need to know the context in which it is used in order to comprehend its intended sense. For example, the word ‘interest’ can mean the money paid for the use of borrowed money, or it can mean ‘a sense of concern with someone or something’. Resolving a word’s intended sense from its context, which is called ‘word sense disambiguation’, is an important subtask of many NLP applications.
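One classic approach to word sense disambiguation is the Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below uses two hand-written glosses for ‘interest’ (they are illustrative, not taken from a real dictionary):

```python
# Hand-written glosses for two senses of 'interest',
# for illustration only.
SENSES = {
    "finance": "money paid regularly for the use of borrowed money",
    "concern": "a feeling of wanting to know or learn about something",
}

def disambiguate(context_words, senses=SENSES):
    """Pick the sense whose gloss overlaps most with the context."""
    best, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(set(context_words) & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

sentence = "the bank charges interest on the money you borrowed"
print(disambiguate(sentence.split()))
# finance
```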
As mentioned earlier, one of the real-life applications of NLP is auto-summarisation. If you are an MS-Word user, you can use its ‘AutoSummary Tools’ feature, which identifies the important sentences in a document. You may find this feature quite handy while going through a long piece of writing. If required, you can use it to generate an abstract and insert it at the top of the document. Several online summarisation tools are also available. The online application SMMRY is one such service worth a try. The service offers three input options: pasting in the text, entering the web address of the article to be summarised, and uploading the article file.
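A bare-bones extractive summariser can be written in a few lines of Python: score each sentence by the frequency of its words across the document and keep the top-scoring ones. Tools like SMMRY use considerably more sophisticated scoring, but the underlying idea is similar.

```python
from collections import Counter

def summarise(text, n=1):
    """Return the n sentences whose words are most frequent overall."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Word frequencies across the whole document.
    freq = Counter(w.lower() for s in sentences for w in s.split())
    # Rank sentences by the total frequency of their words.
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w.lower()] for w in s.split()),
                    reverse=True)
    return ". ".join(scored[:n]) + "."

doc = "NLP studies language. Computers process language with NLP. Cats sleep."
print(summarise(doc))
# Computers process language with NLP.
```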
Most of our activities now take place in the virtual space; online shopping, banking, hotel booking and the like have become quite commonplace. Apart from helping us conduct many of our daily chores online, the digital age also provides us with a plethora of tools for expressing our thoughts, views and opinions on any topic of our choice. Quite often, people comment online on hotel facilities, provide feedback about different products, express their thoughts on political events and so on.
All of these carry information about people’s perception of a company, product or person. For anyone who sells a product or service, customer perception is more important than anything else. Before making a purchase decision, people generally go through reviews written by other customers. Naturally, any service provider would be keen to know how the company is perceived in the market: do the customers like the brand, or do they hate it? This ‘perception’ issue is not limited to business organisations; even politicians cannot afford to ignore people’s views.
For instance, a candidate would be eager to know how people are reacting to her speech during an election campaign. This context makes opinion databases (postings appearing on blogs, social media platforms like Facebook, Twitter messages and so forth) a gold mine for opinion miners.
The data flowing across social media channels is so voluminous that it is almost impossible to monitor manually; the task needs to be automated. As these data are already in digital, machine-readable form, performing analytics on them is relatively easy. The solution should scour the various opinion sources, extract insights from the text and identify the sentiment expressed in it. This is what sentiment analysis, yet another application of NLP, does. To get an idea of how sentiment analysis functions, you may access the demo application available here.
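The simplest form of sentiment analysis is lexicon-based: count the positive and negative words in a message and compare the totals. The word lists below are illustrative only; real systems use much larger lexicons or trained classifiers.

```python
# Tiny illustrative sentiment lexicons.
POSITIVE = {"good", "great", "excellent", "love", "clean"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "dirty"}

def sentiment(text):
    """Classify text by counting positive vs negative words."""
    words = text.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The hotel was clean and the staff were great"))
# positive
print(sentiment("Terrible food and poor service"))
# negative
```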
Sentiment analysis has gained a lot of attention in the political realm these days, especially analysis based on Twitter messages. The Twitter-based application ‘Election 2016 Sentiment Map’, which used Twitter feeds to analyse the popularity of political parties in the US presidential election, is an excellent instance of this trend. Though most of the predictions favoured Hillary Clinton, the sentiment map from the application showed a different picture. From the map, it was quite clear that the US electorate was leaning towards Trump. Yet another triumph of NLP/machine learning over human judgment!