Professors publish article in international journal Frontiers
Study with undergraduate course assistant brings together Computing, Linguistics and Psychiatry
The article “At the interface between linguistics, computer science and psychiatry: analysis of key textual factors influencing the BERT-based classification of schizophrenia in social media texts” was published in the international journal Frontiers. The work stems from doctoral research in Language Studies at PUC-Rio by course assistant João Victor Miranda e Silva, carried out in partnership with IMPA Tech professors Cilene Rodrigues and Emílio Brazil.
The research investigates linguistic patterns associated with schizophrenia (SZ) through an experiment with BERT, a neural-network language model for Natural Language Processing (NLP) that interprets each word in light of the text on both sides of it, rather than reading only from left to right.
“It was highly collaborative work. Together, we worked out how to automatically analyze and statistically compare linguistic data in a way that engages with previous findings in linguistics and psychiatry,” says Silva.
The study used a transformer-based model to distinguish texts written by people with schizophrenia from texts written by people without the disorder, combining contributions from theoretical linguistics with computational approaches.
In all, the team analyzed 31,278 posts from Reddit, an online forum organized into thematic communities. According to the researchers, one of the main challenges was balancing the quantity and quality of the data. “We put a lot of effort into curating the data, carefully selecting and reviewing the material. Still, it’s a laborious process, especially with social media,” says Silva.
On the other hand, computational methods made it possible to analyze large volumes of linguistic data, something uncommon in traditional approaches to the field. “Interdisciplinarity allows us to combine different perspectives on the same problem, broadening our understanding of the phenomena we study,” he adds.
The results indicate that language models can help identify linguistic patterns associated with schizophrenia. The study also highlights the importance of data quality for reliable predictions and points out possible biases, such as the influence of words directly related to the disorder, that must be controlled to improve how well the models generalize.
João Victor Silva is currently a course assistant for Language Skills. Since 2024, he has also tutored English and Introduction to Data Science. In April last year, he was the guest speaker at IMPA Tech’s academic seminar, where he presented preliminary studies from the research to students in the bachelor’s program in Mathematics of Technology and Innovation.
