PROactive PROfiling of HATE speech spreadeRs


According to the World Economic Forum’s Global Risks Report 2017, society is increasingly polarized. People are organizing into distinct communities that share similar stances and positions. Inter-community polarization is often perpetuated by two related phenomena: the lack of communication between groups and the use of aggressive language (hate speech). The Pew Research Center's online harassment report (2017) shows that 41% of respondents have been the target of online harassment, while 62% consider it a major problem. The European Commission (EC) is aware of this problem and has agreed a code of conduct with social media companies to fight hate speech.

Social media can amplify what occurs in society and plays a key role in inter-group communication. However, social media companies have been reluctant to tackle the problem, sidestepping it through workarounds such as reclassifying Facebook pages that contain hate speech as controversial humor. Their business model may also not be conducive to tackling hate speech: hate speech pages can attract many visits, and companies can be unwilling to drive traffic away from their sites for fear of losing advertising revenue.

Compounding the problem is the difficulty of verification. Shielded by anonymity, entities with partisan interests may try to influence public opinion on a massive scale, using disinformation to polarize society. Despite growing interest in polarization and hate speech, and despite numerous research efforts for English, little research targets other European languages such as German or Spanish. The main aim of this project is to address this gap and pave the way for further research on polarization and hate speech beyond English-speaking societies.

Our research will address four problems that contribute to the detection of polarization and hate speech: 1) stance detection with respect to controversial topics; 2) identification of polarized communities; 3) hate speech detection; and 4) bot identification. Unlike some existing research, which addresses these as isolated problems, we will consider them from a holistic perspective. Our proposal is organized around four components: 1) Language Resources; 2) Polarized Communities Network Analysis; 3) Methods and Tools; and 4) Application Scenarios.

The project has several application scenarios. In the context of cyber-security, government agencies could detect individuals and groups that spread hate speech and take appropriate countermeasures. In addition, social media companies could benefit from the project's language resources and tools to automate the detection of hate speech in German, Spanish, and their dialects. Even now, much content review at major social media companies is still done manually.

Moreover, communication and advertising agencies, as well as organizations in general, could benefit from knowing the stance of citizens and consumers on a given topic or service (e.g., self-driving cars) in order to better understand their target users or customers, for example before the launch of a controversial product or service. They can also use stance detection to analyze customers’ posts about their products or services. Bot detection can help organizations detect social media accounts that are trying to harm their reputation.




Yuwon Song