CORMA

This website provides an overview of the process of collecting the CORMA corpus (Corpus Oral de Madrid) and information about the corpus data.

CORMA was recorded between 2016 and 2019. It is an oral corpus that represents the speech of Madrid. In its current version, it consists of 106 conversations among 485 speakers from Madrid, totaling 57 hours of recordings and 469,860 words.

The CORMA project was created to document spontaneous conversational Spanish as it is currently spoken in Madrid.

It is defined by the following features:

It is a corpus of linguistic interactions in everyday settings and activities.
It is characterized by a high degree of situational and sociolinguistic variation, with a representative number of participants of both sexes, different generations, and different socio-cultural levels.
It is a corpus of oral, conversational, and colloquial speech: participants interact in real time and face to face, engaging in dialogues that are primarily cooperative and highly dynamic.
It is a corpus of spontaneous speech, obtained mainly through ordinary techniques.

Background