Text Analytics: Creating a picture from a thousand (or 10^n) words
In the mid-15th century, Johannes Gutenberg's invention of the printing press gave the European Renaissance new momentum. Philosophers and other thinkers of the time seized the opportunity to spread their ideas on a scale larger than ever before. This was perhaps the earliest sign of the information age we would witness centuries later. For a very long time after the printing press, there wasn’t another path-breaking invention for sharing information, until, of course, the creation of the World Wide Web.
Fast forward to 2016: the World Wide Web is probably best described as a giant sink for all human ideas, in the form of status updates, tweets and blogs. The growth has been so explosive that, by some widely quoted estimates, we have generated more data in the past couple of years than in all of human history before them.
The problem now is the opposite of the one that existed in medieval Europe. Unstructured ‘dark’ data is becoming a massive black hole, our very own Gargantua: we are increasingly curious to know what lies beneath but are handicapped by a lack of technical know-how. What we need is another toolset, as game-changing as the printing press, to move mankind into an era of frictionless information consumption and to let us create intuitive visual representations of this vast trove of unstructured text. In other words, we need to create pictures from the millions of records being generated at an unprecedented pace.
Enter Text Mining, a set of computational tools and techniques for extracting patterns and trends from text through statistical learning. Consider consumer perception. Word-of-mouth publicity is widely recognized as a significant contributor to consumer perception, but we have never really been able to figure out what exactly consumers are talking about, or how they feel about a given product or service. (Yes, we have solutions such as focus groups and consumer research surveys, but gross inaccuracies in their findings have often been attributed to even minor sampling biases.)
Text Mining is a powerful suite of techniques that is increasingly being used by businesses as diverse as consumer goods, social media, insurance, media, telecom and legal services to detect fraud, support e-discovery, analyze customer sentiment, prevent churn, and make informed decisions on other high-impact business initiatives.
The idea of Text Mining can easily be explained by recalling the idiom "a picture is worth a thousand words". In a general sense, a text mining engine has three major components:
- Topic Models – What topics or themes are being discussed?
- Sentiment Analysis – What is the polarity of the sentiments?
- Named Entity and Event Recognition – What identifiable entities are mentioned and how are they related?
For illustrative purposes, consider the two-minute video below:
The text mining engine illustrated above parses text to look for word patterns, sentiments and named entities in the data. The entity recognition component identifies Sam and the popular consumer brands he uses, such as Gatorade, Xbox and Samsung. The sentiment analysis component ascertains the sentiment in each sentence, which can be tied back to Sam’s emotional connection to each brand. The topic model looks at word frequencies in documents to identify the overall document theme, which is ‘play’ in this case, so this document could potentially be about recreation.
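To make the three components concrete, here is a minimal Python sketch. It is not the engine shown in the video: the sample sentences, the entity list, the sentiment word lists and the stopword list are all hypothetical, invented for illustration, and simple dictionary lookups and word counts stand in for the statistical models a real engine would use.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample text and word lists, invented for this sketch only.
TEXT = (
    "Sam loves to play on his Xbox. "
    "After a long play session he enjoys a cold Gatorade. "
    "Sam hates how slow his Samsung phone is."
)
KNOWN_ENTITIES = {"Sam", "Xbox", "Gatorade", "Samsung"}
POSITIVE = {"loves", "enjoys", "great"}
NEGATIVE = {"hates", "slow", "terrible"}
STOPWORDS = {"a", "after", "he", "his", "how", "is", "long", "on", "the", "to"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Return the alphabetic words in the text, preserving case."""
    return re.findall(r"[A-Za-z]+", text)


def find_entities(tokens):
    """Entity recognition stand-in: look tokens up in a fixed list."""
    return [t for t in tokens if t in KNOWN_ENTITIES]


def sentiment_score(tokens):
    """Sentiment stand-in: positive word count minus negative word count."""
    words = [t.lower() for t in tokens]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def top_theme(text):
    """Topic stand-in: most frequent word that is neither stopword nor entity."""
    entity_words = {e.lower() for e in KNOWN_ENTITIES}
    counts = Counter(
        w for w in (t.lower() for t in tokenize(text))
        if w not in STOPWORDS and w not in entity_words
    )
    return counts.most_common(1)[0][0] if counts else None


def analyze(text):
    """Run all three components and tie sentence sentiment to entities."""
    sentences = []
    entity_sentiment = defaultdict(int)
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        entities = find_entities(tokens)
        score = sentiment_score(tokens)
        sentences.append({"text": sentence, "entities": entities, "score": score})
        for entity in set(entities):
            entity_sentiment[entity] += score
    return {
        "sentences": sentences,
        "entity_sentiment": dict(entity_sentiment),
        "theme": top_theme(text),
    }


if __name__ == "__main__":
    report = analyze(TEXT)
    for row in report["sentences"]:
        print(row["score"], row["entities"], row["text"])
    print("Sentiment by entity:", report["entity_sentiment"])
    print("Theme:", report["theme"])
```

On this toy input the sketch picks 'play' as the theme and attaches a positive score to Xbox and Gatorade and a negative one to Samsung. A production engine would swap each stand-in for a trained model, but the flow from raw text to entities, sentiments and themes is the same.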
The math behind these three individual techniques is highly involved but the concept in itself is self-explanatory.
With Big Data and Text Mining technologies, it is now possible for organizations to convert large amounts of unstructured text into intuitive visual representations – pictures that can help you read between the lines and connect the dots.
Imagine a world where organizations are armed with precise answers to the following questions: What are our customers talking about? How do they feel about our products and services? What are their likes and dislikes? What exactly are they saying?
Now imagine that organizations can sense these patterns in real time, zoom into the specifics and develop targeted responses to emerging opportunities. What possibilities does this open up? Is your organization ready to capitalize on these opportunities?