COMPUTATIONAL LINGUISTICS APPROACHES TO READABILITY AND AUTOMATIC SIMPLIFICATION IN FINANCIAL NARRATIVE (CLARA-FIN)

WHAT IS CLARA-FIN?

One of the conclusions drawn from the FinT-esp project is that financial reports contain a significant amount of implicit information. Understandably, those who communicate on behalf of companies (especially chairs and CEOs) do not want to reveal the losses incurred under their management. In our experiments on automatically classifying the annual reports of profitable and loss-making companies (Moreno et al. 2019; El-Haj et al., in preparation), we found that the two groups are hard to tell apart using purely lexical methods, whether lexicon-based or machine learning. This is difficult even for human specialists, because the information that would make the difference is absent from the text.

The main goal of CLARA-FIN is to describe the core reporting structure that makes it possible to compare the contents of financial reports. Our approach to simplification therefore focuses above all on a clear discourse and syntactic structure, not just on basic vocabulary.

Our first specific objective is to collect new texts to increase the size and, above all, the variety of the FinT-esp corpus, including news from the specialised press and information published on websites. A larger and more varied corpus will enable us to develop more representative financial language models for Spanish.

A second specific objective is to participate in the evaluation shared tasks of the Financial Narrative Processing (FNP) Workshops and MultiLing Financial Summarisation, organised by researchers at UCREL, Lancaster University. This will bring Spanish texts into the summarisation competitions. One of the planned venues for these tasks is the annual conference of the SEPLN (Spanish Society for Natural Language Processing).

A third objective is to advance knowledge of financial narrative from both an economic and a linguistic perspective. The large and valuable body of data collected is an important source for building specialised lexicons and glossaries of financial terms, and for publications on the characteristics of financial discourse, such as how it organises information and argumentation. This knowledge also has an impact on applied language disciplines such as translation and communication.

CLARA-FIN RESEARCH TEAM

RESEARCHERS

ANA GISBERT

Ana Gisbert Clemente is Associate Professor in the Department of Accounting in the Faculty of Economics at Universidad Autónoma de Madrid. She was a Predoctoral Fellow at Lancaster University within the HARMONIA European project on Accounting Harmonisation and Standardisation in Europe. Since 2018 she has been collaborating with the Laboratorio de Lingüística Informática to develop a financial narrative corpus for analysing the use of language in the annual reports of Spanish listed companies. She has published papers on international accounting, corporate governance, audit oversight and earnings management. Her current research focuses on the analysis of financial reporting narratives.

BLANCA CARBAJO CORONADO

Blanca Carbajo Coronado holds a BA in Translation and Interpreting and an MA in Spanish Linguistics. She is currently a PhD student at the Computational Linguistics Laboratory at Universidad Autónoma de Madrid, with an FPU scholarship awarded by the Spanish Ministry of Science, Innovation and Universities. Her thesis deals with cause-effect relations in financial narratives using computational linguistic methods. She has also published work on financial terminology and corpus linguistics.

CLARA-FIN RESOURCES

PUBLICATIONS

A Discourse Marker Tagger for Spanish using Transformers

Authors: Ana García Toro, Jordi Porta Zamorano and Antonio Moreno-Sandoval

Year: 2022

Overview: This paper introduces an automatic discourse marker (DM) tagger for Spanish and describes its development and evaluation, reporting high agreement among human annotators and a high F1-score for the Transformer-based tagger.


The Financial Narrative Summarisation Shared Task (FNS 2022)

Authors: Mahmoud El-Haj, Nadhem Zmandar, Paul Rayson, Ahmed AbuRa’ed, Marina Litvak, Nikiforos Pittaras, George Giannakopoulos, Aris Kosmopoulos, Blanca Carbajo-Coronado and Antonio Moreno-Sandoval

Year: 2022

Overview: This paper presents the results of FNS 2022, a shared task on summarising annual financial reports from the UK, Greece and Spain, held as part of the FNP 2022 Workshop.


The Financial Document Structure Extraction Shared Task (FinTOC 2022)

Authors: Abderrahim Ait Azzi, Sandra Bellato, Blanca Carbajo Coronado, Mahmoud El-Haj, Ismail El Maarouf, Mei Gan, Ana Gisbert, Juyeon Kang and Antonio Moreno Sandoval

Year: 2022

Overview: This paper describes the FinTOC 2022 Shared Task, which focuses on extracting the hierarchical structure of financial documents in order to advance table-of-contents extraction technologies.


The Financial Document Causality Detection Shared Task (FinCausal 2023)

Authors: Antonio Moreno-Sandoval, Jordi Porta-Zamorano, Blanca Carbajo-Coronado, Doaa Samy, Dominique Mariko and Mahmoud El-Haj

Year: 2023

Overview: This paper presents the results and insights from the Financial Document Causality Detection Shared Task (FinCausal 2023). It outlines the task’s objectives, methodology, dataset creation, and evaluation metrics. It also discusses the approaches and results of participating teams.


The Financial Narrative Summarisation Shared Task (FNS 2023)

Authors: Elias Zavitsanos, Aris Kosmopoulos, George Giannakopoulos, Marina Litvak, Blanca Carbajo-Coronado, Antonio Moreno-Sandoval and Mo El-Haj

Year: 2023

Overview: This paper presents the results and insights from the Financial Narrative Summarisation Shared Task (FNS 2023), which focused on summarising annual reports from the UK, Greece and Spain. The task, part of the 5th Financial Narrative Processing Workshop, aimed to apply automatic summarisation techniques, either extractive or abstractive, to condense long financial documents. The challenge attracted six systems from three teams.


LLI-UAM Team at FinancES 2023: Noise, Data Augmentation and Hallucinations

Authors: Jordi Porta-Zamorano, Yanco Torterolo and Antonio Moreno-Sandoval

Year: 2023

Overview: This paper presents a T5-based system developed by LLI-UAM for the FinancES 2023 Shared Task. It reports experiments on noise and data augmentation, using corrected datasets and ChatGPT to improve the training data, and details how noise, data augmentation and hallucinations affect the system's accuracy across tasks.


Lexical indicators of profit and loss in Spanish shareholder letters

Author: Blanca Carbajo Coronado

Year: 2023

Overview: This study examines linguistic differences between the shareholder letters of profitable and loss-making Spanish companies, focusing on verbs and nouns as indicators of financial performance.


Financial concepts extraction and lexical simplification in Spanish

Authors: Blanca Carbajo Coronado and Antonio Moreno Sandoval

Year: 2024

Overview: This paper explores automatic concept extraction and lexical simplification in Spanish financial texts, using AI language models to identify terms and proposing strategies to make complex financial language more accessible.


DEMOS

SimFin

Author: Yanco Torterolo

Year: 2023

Overview: A demo of a financial term simplifier, designed to make complex financial terminology more accessible to non-experts.


PUBLIC DISSEMINATION

Invited talks by Antonio Moreno Sandoval:

«Annotating discourse markers and key financial terms in Spanish with transformers», Invited talk, 3rd Financial Narrative Processing Workshop, Lancaster, 15 September 2021.
«Some issues on Financial Narrative Processing in Spanish», Plenary session, Meaning and Knowledge Representation Conference, UAM, 6 July 2022.
«Some issues in the processing of financial narrative», Invited talk, CITIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidad de Santiago de Compostela, 28 November 2022.
«How does the technology assist organizations in delivering more significant research impact?», Invited talk, University Leaders' Forum 2023, Universitas Muhammadiyah, Yogyakarta, Indonesia, 10 March 2023.
«Linguistics and AI», Invited talk, Mundo actual, UAM, 3 May 2023.
«How can language technologies help in teaching Spanish?», Invited talk, Congreso Español para todos, Salamanca, 27 June 2023.
«A historical overview of Computational Linguistics in Spain through LLI-UAM», Inaugural lecture, Máster de Lingüística aplicada y Tecnologías del lenguaje, UCM, 15 September 2023.
«Language technologies for language teaching», Inaugural lecture, Máster de Español, Universidad de Valladolid, 21 September 2023.
«The evolution of machine translation: from dictionaries to transformers in 40 years», Invited talk, Jornadas DARIAH-ES, BNE (Biblioteca Nacional de España), 7 November 2023.
«LLI-UAM (1988-2023): 35 years of research, teaching and knowledge transfer in Computational Linguistics», Invited talk, Jornadas sobre Docencia e Investigación Lingüística en la Era de la Inteligencia Artificial, Universidad de La Rioja, 13 December 2023.
«Digital tools for literature: the dictionary of lemmas and word forms of Don Quixote», I Jornadas de Humanidades Digitales, UAM, 9 January 2024.

Grant PID2020-116001RA funded by MICIU/AEI/10.13039/501100011033


FULL LIST OF RESEARCHERS

Antonio Moreno Sandoval, José Maria Guirao, Ana Gisbert, Chelo Vargas, Jordi Porta, Marta Tordesillas, Paul Rayson, Mahmoud El-Haj, Doaa Samy, Pablo Haya and Blanca Carbajo Coronado.
