Three Symanto Papers to be Included in RANLP 2021

We’re excited to announce that not one, not two, but three of our papers have been accepted at this year’s International Conference on Recent Advances in Natural Language Processing (RANLP 2021).

The conference, which is being held online this year, runs from 1st – 3rd September, with workshops subsequently held on the 6th and 7th.

Here, we’ll run down the abstract of each of the papers as an overview, but our researchers will give a more thorough presentation of their papers during the conference over Zoom.

1. Probabilistic Ensembles of Zero- and Few-Shot Learning Models for Emotion Classification

Authors: Angelo Basile, Guillermo Pérez-Torró and Marc Franco-Salvador

Emotion Classification is the task of automatically associating a text with an emotion.

State-of-the-art models for emotion classification are usually learned by one of two methods:

  1. Using annotated corpora
    This is when a body of text is annotated with affective information, such that each document is labelled as being related to a certain emotion. The process is tedious, time consuming and susceptible to annotator bias.
  2. Using hand-crafted affective lexicons
    An affective lexicon is the subset of words used to describe affect or affective conditions such as emotions. Building one also requires considerable human effort, so this method is likewise susceptible to human bias and error.

Through our research, we present an emotion classification model that does not require a large annotated corpus to be competitive.

We experiment with pre-trained language models in both a zero-shot configuration (with no labelled training data) and few-shot configuration (with a few examples).

We build several such models and treat them as biased, noisy annotators. Then, we aggregate the predictions of these models using a Bayesian method developed for modelling crowdsourced annotations.
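To make the idea of treating models as noisy annotators more concrete, here is a minimal sketch of one way such an aggregation could work. It uses a simplified Dawid-Skene style expectation-maximisation procedure, a classic approach to crowdsourced labels, as a stand-in. It is not the method from the paper, and the function name, parameters and toy votes are all hypothetical.

```python
from collections import Counter


def aggregate_annotations(votes, labels, iterations=20, smoothing=0.01):
    """Aggregate noisy labels with a simplified Dawid-Skene EM (illustrative only).

    votes[i][j] is the label that annotator (model) j assigned to text i.
    Returns the most probable label per text and the per-text posteriors.
    """
    n_items = len(votes)
    n_annotators = len(votes[0])

    # Start from a soft majority vote: the share of annotators behind each label.
    posteriors = []
    for row in votes:
        counts = Counter(row)
        posteriors.append({k: counts[k] / len(row) for k in labels})

    for _ in range(iterations):
        # M-step: estimate class priors from the current posteriors.
        priors = {
            k: (sum(p[k] for p in posteriors) + smoothing)
            / (n_items + smoothing * len(labels))
            for k in labels
        }
        # M-step: estimate, for each annotator, P(observed label | true label).
        confusion = []
        for j in range(n_annotators):
            matrix = {}
            for true in labels:
                weights = {obs: smoothing for obs in labels}
                for i, row in enumerate(votes):
                    weights[row[j]] += posteriors[i][true]
                total = sum(weights.values())
                matrix[true] = {obs: w / total for obs, w in weights.items()}
            confusion.append(matrix)

        # E-step: recompute each text's posterior over the true label.
        for i, row in enumerate(votes):
            scores = {}
            for true in labels:
                score = priors[true]
                for j, obs in enumerate(row):
                    score *= confusion[j][true][obs]
                scores[true] = score
            norm = sum(scores.values())
            posteriors[i] = {k: v / norm for k, v in scores.items()}

    predicted = [max(labels, key=lambda k: p[k]) for p in posteriors]
    return predicted, posteriors


# Toy data (made up): three models label six texts with one of two emotions.
example_labels = ["joy", "anger"]
example_votes = [
    ["joy", "joy", "joy"],
    ["anger", "anger", "joy"],
    ["joy", "joy", "anger"],
    ["anger", "anger", "anger"],
    ["joy", "joy", "joy"],
    ["anger", "anger", "joy"],
]
predicted, posteriors = aggregate_annotations(example_votes, example_labels)
```

Unlike a plain majority vote, this kind of procedure learns how reliable each annotator is, so a model that often disagrees with the consensus ends up with less influence on the final label.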

Next, we show that the resulting system performs better than the strongest individual model.

Finally, we show that when that system and the zero-shot one are trained on only a small amount of labelled data, they outperform fully supervised models.

This finding suggests that the system can outperform other more resource-intensive models. We’re excited to share our findings with the natural language processing (NLP) community at RANLP 2021.

2. Exploring Reliability of Gold Labels for Emotion Detection in Twitter

Author: Štajner, S. 2021.

“Gold labels” are sets of training data that have been hand-labeled by subject-matter experts. They are considered higher quality than labels output by algorithms. But the methods of obtaining gold labels for training and testing automatic emotion detection systems vary significantly from one study to another, raising the question of how reliable the gold labels and the resulting classification results are.

In this study, Symanto’s Senior Research Scientist Sanja Štajner systematically explores several ways of obtaining gold labels for Ekman’s emotion model from Twitter data. The study also explores the influence of the chosen strategy on the manual classification results.

Given that emotion detection from social media posts has attracted noticeable attention from the NLP community in recent years, this research is a timely exploration of the variations in the reliability of gold labels for emotion detection in Twitter data.

3. How to Obtain Reliable Labels for MBTI Classification from Texts?

Authors: Štajner, S. and Yenikent, S. 2021

In the last few years there has been considerable interest in automatic detection of the Myers-Briggs Type Indicator (MBTI) from short posts. Given its ready availability, Twitter is a popular source of such data. But recent studies have proven this to be a difficult task.

Obtaining MBTI labels comes with its own set of challenges: human annotation requires the input of trained psychologists. Existing methods of automatically obtaining MBTI labels necessitate the use of long and time-consuming questionnaires.

In this paper, we present a method for collecting reliable MBTI labels using only four carefully selected questions that can be applied to any type of textual data.

Intrigued? You’ll have to await the full presentation of our paper at RANLP 2021.

Symanto Research Team

Collectively, our research team has decades of experience in the fields of psychological profiling and AI innovation.

Their skills and backgrounds complement one another to produce research that advances what’s possible in the real-world application of NLP. You can read more about their backgrounds and experiences on our website.

Of course, their research contributes to the development of all Symanto products, from our APIs to the Symanto Insights Platform. All are direct products of their collective specialist knowledge, and their drive to innovate and constantly advance our technologies.

We’re proud that our research team is included amongst the many inspirational scientists due to attend RANLP 2021, and we’re excited to attend and discover other interesting developments in the field of NLP.

You can find more information about the conference and corresponding events on the RANLP website.
