Social Media Text Mining: How Does It Work?


The two major ongoing challenges for any brand are:

  1. Understanding their customers’ needs and motivations
  2. Keeping track of their changing requirements and expectations over time.

Social media data provides the solution to both challenges. Locked within online conversations are the diamonds that brands are looking for – detailed and timely insights into customers’ experience of their brand, product or service.
Social media text mining is nothing new, but the technologies used to mine and analyse text are evolving rapidly. The most advanced social listening tools now combine traditional linguistic rules with machine learning-based analysis to uncover insights hidden in social media conversations.

What is text mining?

Text mining, also referred to as text data mining and similar to text analysis, is the process of extracting meaningful patterns and insights from unstructured text.
Most textual data is unstructured (unstructured data is often estimated at around 80% of all data in the world). That is to say, it isn’t standardised into a tabular format of rows and columns. Social media comments, product reviews, forum posts and customer service transcripts are all written organically and have historically been more difficult for a computer to categorise and make sense of.
Language is as complex and varied as the people who use it. Nuance, subjectivity and idiosyncrasies all play their part. What makes it more complex still is the rate at which language evolves: intergenerational communication is a constant source of confusion and bemusement, further exacerbated by online communication in forums and on social media.
Natural Language Processing (NLP) techniques enable us to make sense of unstructured qualitative data, and machine learning helps computers to adapt to the nuances of language and the ways it changes.
Humans are still more capable of understanding language, but this new technology allows vast amounts of unstructured textual data to be processed at scale, enabling organisations to improve their decision making and business outcomes.
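To make the idea of turning unstructured text into something countable more concrete, here is a minimal Python sketch. The example posts and the `tokenize` helper are assumptions made for illustration, not part of any particular tool; real pipelines handle emoji, hashtags, misspellings and much more.

```python
import re
from collections import Counter

# Hypothetical example posts; real data would come from a social listening export.
posts = [
    "Love the new app update!",
    "The app keeps crashing since the update :(",
    "Delivery was late again...",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# One row per post: a structured view of the unstructured text.
rows = [{"post": p, "tokens": tokenize(p)} for p in posts]

# Word frequencies across all posts.
word_counts = Counter(tok for row in rows for tok in row["tokens"])
print(word_counts.most_common(3))
```

Once each post is a row of tokens, the usual counting, grouping and charting techniques for tabular data can be applied.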

Uses of social media text mining

Text mining social media data can help brands to better understand their customers and their customers’ experience of their brand, products and services.
Here are some of the applications:

Identify topics and subtopics

Find out what is driving conversation on social media, what portion of your audience is talking about any particular topic, and how conversation changes with time.

What’s dominating the conversation?

Find out what portion of online conversation is around a specific product, service or topic.
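As a rough illustration of measuring what portion of conversation falls under each topic, the sketch below tags posts by keyword and computes each topic's share. The topic names, keyword sets and posts are all hypothetical; a real tool would likely use richer rules or a trained model rather than exact word matches.

```python
from collections import Counter

# Hypothetical topic keywords, chosen only for this example.
TOPIC_KEYWORDS = {
    "delivery": {"delivery", "shipping", "courier"},
    "price": {"price", "expensive", "cheap"},
    "app": {"app", "login", "crash"},
}

# Hypothetical posts.
posts = [
    "delivery took two weeks",
    "the app will not let me login",
    "too expensive for what you get",
    "courier lost my parcel and the app showed nothing",
]

def topics_in(post):
    """Return the set of topics whose keywords appear in the post."""
    words = set(post.lower().split())
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}

mentions = Counter(topic for post in posts for topic in topics_in(post))

# Share of posts mentioning each topic; one post can count towards several topics.
share = {topic: mentions[topic] / len(posts) for topic in TOPIC_KEYWORDS}
print(share)
```

Grouping the posts by week or month before computing the shares would show how the conversation changes over time.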

Measure sentiment and emotion

Discover how customers feel about your brand in general or in relation to a specific topic, product or service. Are they positive or negative? What emotions do they express: anger, joy, frustration?
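One simple way to approximate sentiment is to count words from positive and negative word lists. The sketch below assumes two tiny, hypothetical lexicons; it ignores negation, sarcasm and context, which is exactly where commercial tools invest most of their effort.

```python
import re

# Tiny hypothetical lexicons; production tools use far larger ones or trained models.
POSITIVE = {"love", "great", "happy", "fast"}
NEGATIVE = {"hate", "slow", "broken", "angry"}

def sentiment(text):
    """Label text by counting lexicon hits; ignores negation and sarcasm."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for post in ["I love the fast delivery", "The app is slow and broken", "It arrived on Tuesday"]:
    print(post, "->", sentiment(post))
```

The same pattern extends to emotions by swapping the two lists for one word list per emotion, such as anger, joy or frustration.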

Psychographic profiling

Some text mining tools have the capacity to extract complex information from text such as the key personality traits and communication style preferences of the authors of the texts. This information enables you to hone your marketing messaging and communication style to suit your audience and target specific customers.

Competitor analysis

The majority of social media text data is freely and publicly available and provides a perfect opportunity to suss out your competition. Find out what your competitors’ customers’ pain points are, what share of voice you and each of your competitors have, and how it has changed over time.
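Share of voice can be sketched as each brand's mentions divided by the total mentions of all tracked brands. The brand names and posts below are invented for illustration, and matching on exact words is an assumption; real tools also have to handle nicknames, handles and misspellings.

```python
from collections import Counter

# Hypothetical brand names and posts, for illustration only.
BRANDS = ["acme", "globex", "initech"]
posts = [
    "Acme support was brilliant today",
    "Switching from Globex to Acme next month",
    "Globex prices went up again",
    "Initech app is down",
    "Nothing beats Acme for delivery",
]

# Count the posts that mention each brand (a post can mention several brands).
mentions = Counter()
for post in posts:
    words = set(post.lower().split())
    for brand in BRANDS:
        if brand in words:
            mentions[brand] += 1

# Share of voice: a brand's mentions as a fraction of all brand mentions.
total = sum(mentions.values())
share_of_voice = {brand: mentions[brand] / total for brand in BRANDS}
print(share_of_voice)
```

Running the same calculation on posts from different periods gives a simple view of how share of voice shifts over time.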

Customised analysis

If you want to measure something in particular, you can define your own use case and categories and train your machine learning platform to report statistics on that specific topic.

Approaches to text mining

There are two main approaches to text mining: machine learning and linguistic rules. Each has its pros and cons.

Linguistic rules-based approach

The rules-based approach relies on rules compiled by language experts. It can break sentences down into parts of speech, identify syntax and inflexions, and make allowances for certain nuances and variations in style.
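A heavily simplified sketch of the idea: hand-written patterns are checked in order and the first match decides the label. The rules and labels here are hypothetical, and regular expressions stand in for the part-of-speech and syntax analysis that a real linguistic rule set would use.

```python
import re

# Hypothetical hand-written rules: (pattern, label). The first match wins.
RULES = [
    (re.compile(r"\b(refund|money back)\b"), "refund_request"),
    (re.compile(r"\b(not|never|no longer) (work|working|works)\b"), "fault_report"),
    (re.compile(r"\bhow (do|can) i\b"), "how_to_question"),
]

def classify(text):
    """Return the label of the first rule that matches, or 'unclassified'."""
    lowered = text.lower()
    for pattern, label in RULES:
        if pattern.search(lowered):
            return label
    return "unclassified"

print(classify("I want my money back"))
print(classify("The charger is not working"))
print(classify("lol this is peak"))
```

The last call shows the typical weakness: slang that no rule anticipates falls through until someone writes a new rule for it.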

Pros:

  1. Accuracy from the first enquiry. There’s no learning curve in the linguistic rules approach.
  2. Easy to reclassify and fix errors.
  3. Relatively easy to apply and adapt to different languages.

Cons:

  1. Doesn’t adapt and evolve with changes in language, particularly informal language that is subject to frequent change.
  2. Takes a long time to initially create the rules. Requires a lot of time and research with highly skilled linguistic experts.
  3. Some languages are not as thoroughly studied and require further research to understand unique grammar and vocabulary characteristics.
  4. Built by humans with biases and possible gaps in knowledge of dialect, informal speech, specialised language, etc.

Machine learning-based approach

In the machine learning-based approach, developers train the computer by feeding it examples and assigning them meaning. The computer can look at a pattern and map it to a concept such as semantics or intent.
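To illustrate training by example, here is a small Naive Bayes text classifier written with the standard library only. Naive Bayes is just one common choice and is an assumption on our part, as are the class name, the labels and the six training examples; a real system would need far more data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labelled examples.
model = NaiveBayes()
model.train([
    ("I want a refund for this order", "refund_request"),
    ("please give me my money back", "refund_request"),
    ("refund me now", "refund_request"),
    ("the app keeps crashing", "fault_report"),
    ("screen is broken and not working", "fault_report"),
    ("it stopped working after the update", "fault_report"),
])
print(model.predict("can I get a refund please"))
print(model.predict("my phone is not working"))
```

Nobody wrote a rule saying that "refund" signals a refund request; the model inferred it from the labelled examples, and correcting a mistake means adding more examples rather than editing a rule.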

Pros:

  1. Requires fewer resources. You can train with examples, and the machine learns by itself.
  2. Easy to adapt and customise to new conditions.
  3. Less rigid: it can learn and infer meaning from context.
  4. Can uncover unpredictable and more relevant insights.

Cons:

  1. Requires extensive training at first.
  2. A learning curve.
  3. Difficult to fix errors: you would need to feed in several examples for it to learn the new rule.
  4. Slightly less precise.

Hybrid approach

The third approach is to combine the two and get the best of both worlds. Who’s to say how accurate and impressive machine learning will become on its own? For now, at least, text mining benefits from the expertise of linguists to boost precision, while machine learning makes up for the rigidity of the rules-based approach.
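One possible way to combine the two, sketched below, is to try precise hand-written rules first and fall back to a model learned from examples when no rule matches. This ordering is an assumption rather than how any specific product works, and the rule, labels and examples are hypothetical. The "model" is deliberately crude: it only counts which words appeared under which label.

```python
import re
from collections import Counter, defaultdict

# Hypothetical high-precision rule written by hand.
RULES = [(re.compile(r"\b(refund|money back)\b"), "refund_request")]

# Hypothetical labelled examples for the learned fallback.
EXAMPLES = [
    ("the app keeps crashing", "fault_report"),
    ("screen stopped working", "fault_report"),
    ("love the new design", "praise"),
    ("great service, love it", "praise"),
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

# "Training": count how often each word appears under each label.
word_label_counts = defaultdict(Counter)
for text, label in EXAMPLES:
    for word in tokenize(text):
        word_label_counts[word][label] += 1

def learned_guess(text):
    """Let each known word vote for the labels it was seen with."""
    votes = Counter()
    for word in tokenize(text):
        votes.update(word_label_counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else "unclassified"

def classify(text):
    """Return (label, source): rules take priority, the model is the fallback."""
    lowered = text.lower()
    for pattern, label in RULES:
        if pattern.search(lowered):
            return label, "rule"
    return learned_guess(text), "model"

print(classify("I want a refund"))
print(classify("app is crashing again"))
```

Returning the source alongside the label makes it easy to see which decisions came from the rules and which from the model when reviewing errors.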