CORMA

This website provides an overview of the process of collecting the CORMA corpus (Corpus Oral de Madrid) and information about the corpus data.

CORMA was recorded between 2016 and 2019. It is an oral corpus that represents the speech of Madrid. In its current version, it consists of 106 conversations among 485 speakers from Madrid, totaling 57 hours of recordings and 469,860 words.

The CORMA project was created to document spontaneous conversational Spanish as it is currently spoken in Madrid.

It is defined by the following features:

  • It is a corpus of linguistic interactions in everyday settings and activities
  • It is characterized by a high degree of situational and sociolinguistic variation, with a representative number of participants of both sexes, different generations, and different socio-cultural levels
  • It is a corpus of oral, conversational, and colloquial speech, meaning that participants interact immediately, face-to-face, engaging in dialogues that are primarily cooperative, with a high degree of dynamism
  • It is a corpus of spontaneous speech, obtained mainly through ordinary techniques