
Discover the science behind natural language processing with our industry-leading NLP research
Explore our latest research papers and publications that advance the science of natural language processing and shape the future of NLP technology.

Participation in research projects co-financed by European and national institutions to promote scientific excellence and social progress.
Research Projects
Research Papers
Our technology is based on scientific research by experts in a wide range of fields, including psychology, AI, NLP and engineering.
Mental Health
Basile, A., Chinea-Rios, M., Uban, A. S., Müller, T., Rössler, L., Yenikent, S., Chulví, B., Rosso, P., & Franco-Salvador, M. (2021). UPV-Symanto at eRisk 2021: Mental Health Author Profiling for Early Risk Prediction on the Internet. In CLEF 2021.
Cohrdes, C., Yenikent, S., Wu, J., Ghanem, B., Franco-Salvador, M., & Vogelgesang, F. (2021). Indications of Depressive Symptoms During the COVID-19 Pandemic in Germany: Comparison of National Survey and Twitter Data. In JMIR Mental Health.
Psychological Profiling
Štajner, S., Yenikent, S., & Franco-Salvador, M. (2021). Five Psycholinguistic Characteristics for Better Interaction with Users. In BESC 2021.
Štajner, S., Yenikent, S., Ghanem, B., & Franco-Salvador, M. (2021). What Motivates You? Benchmarking Automatic Detection of Basic Needs from Short Posts. In ACL-IJCNLP 2021.
Štajner, S., & Yenikent, S. (2021). Why Is MBTI Personality Detection from Texts a Difficult Task? In EACL 2021.
Štajner, S., & Yenikent, S. (2020). A Survey of Automatic Personality Detection from Texts. In COLING 2020.
Sentiment, emotion & social media analysis
Štajner, S. (2021). Exploring Reliability of Gold Labels for Emotion Detection in Twitter. In RANLP 2021.
Ghanem, B., Rosso, P., & Rangel, F. (2020). An Emotional Analysis of False Information in Social Media and News Articles. In ACM Transactions on Internet Technology (TOIT), 20(2), 1-18.
Chinea-Rios, M., Franco-Salvador, M., & Benajiba, Y. (2020). Aspect on: an interactive solution for post-editing the aspect extraction based on online learning. In LREC 2020.
Basile, A., Franco-Salvador, M., Pawar, N., Štajner, S., Rios, M. C., & Benajiba, Y. (2019). Symanto Research at SemEval-2019 task 3: combined neural models for emotion classification in human-chatbot conversations. In SemEval 2019.
Jiang, L., Biran, O., Tiwari, M., Weng, Z., & Benajiba, Y. (2019). End-to-end product taxonomy extension from text reviews. In ICSC 2019.
Giménez-Pérez, R. M., Franco-Salvador, M., & Rosso, P. (2018). String kernels for polarity classification: a study across different languages. In NLDB 2018.
Giménez-Pérez, R. M., Franco-Salvador, M., & Rosso, P. (2017). Single and Cross-domain Polarity Classification using String Kernels. In EACL 2017.
Author Profiling
Bevendorff, J., Borrego-Obrador, I., Chinea-Ríos, M., Franco-Salvador, M., Fröbe, M., Heini, A., Kredens, K., Mayerl, M., Pęzik, P., Potthast, M., Rangel, F., Rosso, P., Stamatatos, E., Stein, B., Wiegmann, M., Wolska, M., & Zangerle, E. (2023). Overview of PAN 2023: Authorship Verification, Multi-Author Writing Style Analysis, Profiling Cryptocurrency Influencers, and Trigger Detection. In CLEF 2023.
Chinea-Rios, M., Borrego-Obrador, I., Franco-Salvador, M., Rangel, F., & Rosso, P. (2023). Profiling Cryptocurrency Influencers with Few-shot Learning. In CLEF 2023.
Bevendorff, J., Chinea-Ríos, M., Franco-Salvador, M., Heini, A., Körner, E., Kredens, K., Mayerl, M., Pęzik, P., Potthast, M., Rangel, F., Rosso, P., Stamatatos, E., Stein, B., Wiegmann, M., Wolska, M., & Zangerle, E. (2023). Overview of PAN 2023: Authorship Verification, Multi-Author Writing Style Analysis, Profiling Cryptocurrency Influencers, and Trigger Detection. In ECIR 2023.
Chinea-Ríos, M., Müller, T., De La Peña, G., Rangel, F., & Franco-Salvador, M. (2022). Zero and Few-shot Learning for Author Profiling. In NLDB 2022.
Rosso, P., & Rangel, F. (2020). Author Profiling Tracks at FIRE. In SN Computer Science 2020.
Ghanem, B., Giachanou, A., Kestemont, M., Manjavacas, E., Potthast, M., Rangel, F., Rosso, P., Specht, G., Stamatatos, E., Stein, B., Wiegmann, M., & Zangerle, E. (2020). Shared Tasks on Authorship Analysis at PAN 2020. In ECIR 2020.
Franco-Salvador, M., Kondrak, G., & Rosso, P. (2017). Bridging the native language and language variety identification tasks. In KES 2017.
Franco-Salvador, M., Plotnikova, N., Pawar, N., & Benajiba, Y. (2017). Subword-based Deep Averaging Networks for Author Profiling in Social Media. In CLEF 2017.
Hate-speech detection
De La Peña, G., & Rosso, P. (2022). Unsupervised Embeddings with Graph Auto-Encoders for Multi-domain and Multilingual Hate Speech Detection. In LREC 2022.
De La Peña, G., & Rosso, P. (2022). Convolutional Graph Neural Networks for Hate Speech Detection in Data-Poor Settings. In NLDB 2022.
Rangel, F., De La Peña, G., Chulvi, B., Fersini, E., & Rosso, P. (2021). Profiling hate speech spreaders on Twitter task at PAN 2021. In CLEF 2021.
Bevendorff, J., Chulvi, B., De La Peña, G., Kestemont, M., Manjavacas, E., Markov, I., Mayerl, M., Potthast, M., Rangel, F., Rosso, P., Stamatatos, E., Stein, B., Wiegmann, M., & Zangerle, E. (2021). Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection. In ECIR 2021.
Bevendorff, J., Chulvi, B., De La Peña, G., Kestemont, M., Manjavacas, E., Markov, I., Mayerl, M., Potthast, M., Rangel, F., Rosso, P., Stamatatos, E., Stein, B., Wiegmann, M., & Zangerle, E. (2021). Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection (extended version). In CLEF 2021.
Low-resource NLP
Basile, A., Franco-Salvador, M., & Rosso, P. (2023). Zero-Shot Data Maps. Efficient Dataset Cartography Without Model Training. In Findings of EMNLP 2023.
Basile, A., Franco-Salvador, M., & Rosso, P. (2022). Unsupervised Ranking and Aggregation of Label Descriptions for Zero-Shot Classifiers. In NLDB 2022.
Müller, T., Pérez-Torró, G., Basile, A., & Franco-Salvador, M. (2022). Active Few-Shot Learning with FASL. In NLDB 2022.
Müller, T., Pérez-Torró, G., & Franco-Salvador, M. (2022). Few-Shot Learning with Siamese Networks and Label Tuning. In ACL 2022.
Basile, A., Pérez-Torró, G., & Franco-Salvador, M. (2021). Probabilistic Ensembles of Zero- and Few-Shot Learning Models for Emotion Classification. In RANLP 2021.
Outahajala, M., Benajiba, Y., Rosso, P., & Zenkouar, L. (2015). Using confidence and informativeness criteria to improve POS-tagging in Amazigh. In Journal of Intelligent & Fuzzy Systems.
Textual similarity
Benajiba, Y., Sun, J., Zhang, Y., Jiang, L., Weng, Z., & Biran, O. (2019). Siamese networks for semantic pattern similarity. In ICSC 2019.
Glavaš, G., Franco-Salvador, M., Ponzetto, S. P., & Rosso, P. (2018). A resource-light method for cross-lingual semantic textual similarity. In Knowledge-based Systems 2018.
Álvarez-Carmona, M. A., Franco-Salvador, M., Villatoro-Tello, E., Montes-y-Gómez, M., Rosso, P., & Villaseñor-Pineda, L. (2018). Semantically-informed distance and similarity measures for paraphrase plagiarism identification. In Journal of Intelligent & Fuzzy Systems.
Text simplification
Štajner, S., Franco-Salvador, M., Rosso, P., & Ponzetto, S. P. (2018). CATS: A tool for customized alignment of text simplification corpora. In LREC 2018.
Štajner, S., Franco-Salvador, M., Ponzetto, S. P., Rosso, P., & Stuckenschmidt, H. (2017). Sentence alignment methods for improving text simplification systems. In ACL 2017.
Fake news detection
Ghanem, B., Ponzetto, S. P., Rosso, P., & Rangel, F. (2021). Fakeflow: Fake news detection by modeling the flow of affective information. In EACL 2021.
Rangel, F., Giachanou, A., Ghanem, B., & Rosso, P. (2020). Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In CLEF 2020.
Bevendorff, J., Ghanem, B., Giachanou, A., Kestemont, M., Manjavacas, E., Markov, I., Mayerl, M., Potthast, M., Rangel, F., Rosso, P., Specht, G., Stamatatos, E., Stein, B., Wiegmann, M., & Zangerle, E. (2020). Overview of PAN 2020: Authorship Verification, Celebrity Profiling, Profiling Fake News Spreaders on Twitter, and Style Change Detection. In CLEF 2020.
Question answering
Krichene, S., Müller, T., & Eisenschlos, J. M. (2021). DoT: An efficient Double Transformer for NLP tasks with tables. In Findings of ACL 2021.
Eisenschlos, J. M., Gor, M., Müller, T., & Cohen, W. W. (2021). MATE: Multi-view Attention for Table Transformer Efficiency. In EMNLP 2021.
Event extraction
Caselli, T., Mutlu, O., Basile, A., & Hürriyetoğlu, A. (2021). PROTEST-ER: Retraining BERT for Protest Event Extraction. In CASE 2021.
Basile, A., & Caselli, T. (2020). Protest Event Detection: When Task-Specific Models Outperform an Event-Driven Method. In CLEF 2020.
Machine generated text
Sarvazyan, A. M., González, J. A., Franco-Salvador, M., Rangel, F., Chulvi, B., & Rosso, P. (2023). Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains. In Sociedad Española de Procesamiento del Lenguaje Natural.
Sarvazyan, A. M., González, J. A., Rosso, P., & Franco-Salvador, M. (2023). Supervised Machine-Generated Text Detectors: Family and Scale Matters. In CLEF 2023.
Other
Whitehouse, E., Gerard, W., Klimovich, Y., & Franco-Salvador, M. (2022). Programming by Example and Text-to-Code Translation for Conversational Code Generation. In arXiv preprint, 2022.