About CORMA

Overall description

Data collection

The data were collected over three fieldwork campaigns carried out between 2016 and 2019. The table below gives, for each campaign, the date of collection, the approximate number of recording hours, and the number of conversations.

Campaign          Name        Date           Recording hours (approx.)   Number of conversations
First campaign    CORMA2016   April 2016     43                          58
Second campaign   CORMA2.0    January 2019   8                           29
Third campaign    CORMA2.1    October 2019   6                           19

 

Conversations were recorded in different situations, which can be subdivided into four general interactive contexts:

  • Interactions between family members in the private sphere (FA)
  • Conversations between friends in the public or private sphere (AM)
  • Interactions in commercial contexts (customer service = AT)
  • Regular interactions between people who may or may not know each other in the public or private sphere (CON and CO)

Regarding the parameter of colloquialism, most of the conversations included in the corpus are characterized by the following five features:

  • equality between participants
  • a familiar discursive domain
  • a non-specialized theme
  • lack of planning
  • an interpersonal purpose of the communicative act

Conversations that show these features are considered prototypical colloquial conversations (Briz, 1998: 41).

However, it should be noted that these ‘colloquializing’ features are present to a greater or lesser extent, so it is advisable to distinguish between different degrees of colloquialism and, consequently, between prototypical and peripheral colloquial conversations. By way of illustration, a conversation between young friends in the park is considered more colloquial than a transaction between a pharmacist and a customer.

The recordings were collected in an unguided and flexible manner: the recruit (the participant responsible for the recording) controlled the recording and its circumstances. This procedure imposes some limitations on the corpus: the recordings vary considerably in duration, number of participants, and audio quality (with more or less background noise). Additionally, the sociolinguistic distribution of participants is not equal across the different recording contexts.

Transcription and Composition of the Corpus

The transcription of the recordings was carried out by native Spanish speakers using the program Praat (<www.praat.org>).

The transcriptions are orthographic in nature, following the rules of official normative orthography (RAE, 2010), with the exception of two phonetic features typical of orality: the aspiration of the implosive -s (González Montero, 1993) and the elision of consonants or syllables (Gómez Molina and Gómez Devís, 2010).

For more information on the transcription and composition of the corpus, see the following article:

Enghels, R., De Latte, F., & Roels, L. (2020). El Corpus Oral de Madrid (CORMA): materials for the (socio)linguistic study of current colloquial Spanish. Zeitschrift für Katalanistik, 33, 45–76.

Metadata

Data Description

The oral, conversational, and colloquial corpus is characterized by a high degree of situational and sociolinguistic variation, with a representative number of participants of both sexes, different generations, and different socio-cultural levels.

More information about the metadata can be found in the following documents:

For each conversation, information about the following situational and sociological variables was systematically recorded in a data sheet:

Recording Information

Date

Duration

Recording location: neighborhood and spatial environment (e.g., ‘Embajadores’, ‘at home’)

Responsible researcher

Conversation Information

Conversation topic (e.g., travel, school, work, family)

Conversational purpose: interpersonal – transactional

Recording Technique Information

Role of the researcher: absent – present as observer – present as participant

Type of recording: (semi)secret – ordinary

Participant Information

Number of participants

Sociological information of each participant (if available):

    1. Gender: male – female – n/a
    2. Age: generation 1 (0–11) – generation 2 (12–25) – generation 3 (26–55) – generation 4 (over 55)
    3. Education level: primary – secondary – higher education
    4. Profession
    5. Role (based on the relationship with the person responsible for the recording)
    6. Additional observations (e.g., foreign origin of the speaker)

Field Notes (any supplementary information relevant to the course and subsequent analysis of the conversation)
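As an illustration only, the data sheet described above could be modelled as a small record type. This is a minimal sketch: the class names, field names, and the sample values are assumptions made for the example, not the format actually used by the corpus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    """Sociological information for one participant (any field may be unknown)."""
    gender: Optional[str] = None       # "male", "female", or None for n/a
    generation: Optional[int] = None   # 1 to 4
    education: Optional[str] = None    # "primary", "secondary", "higher education"
    profession: Optional[str] = None
    role: Optional[str] = None         # relationship to the person responsible for the recording
    observations: str = ""


@dataclass
class DataSheet:
    """One data sheet per conversation (hypothetical field names)."""
    date: str
    duration: str
    location: str          # neighborhood and spatial environment
    researcher: str
    topic: str
    purpose: str           # "interpersonal" or "transactional"
    researcher_role: str   # "absent", "observer", or "participant"
    recording_type: str    # "(semi)secret" or "ordinary"
    participants: List[Participant] = field(default_factory=list)
    field_notes: str = ""

    @property
    def number_of_participants(self) -> int:
        # Derived from the participant list rather than stored separately.
        return len(self.participants)


# Invented sample values, for illustration only.
sheet = DataSheet(
    date="2016-04-01", duration="00:12:30", location="Embajadores, at home",
    researcher="(unknown)", topic="family", purpose="interpersonal",
    researcher_role="absent", recording_type="ordinary",
    participants=[Participant(gender="female", generation=2), Participant()],
)
print(sheet.number_of_participants)
```

Deriving the number of participants from the list keeps the two pieces of information from drifting apart.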

 

Each data sheet can be consulted on the ‘Consult the corpus’ page.

 

Participant and Conversation Codes

Each speaker has been assigned a code made up of four elements: the situation, role, or educational institution (the latter in the case of young people); age; gender; and intervention.

Situations/Role

AM = friends

FA = family

CON = acquaintances

CO = colleagues

AT = customer service

PEL = hairdresser

BAR = bar or café

FAR = pharmacy

EST = beautician

FLOR = florist

MUEB = furniture dealer

ROPA = clothing store

ALB = builder

EL = electrician

PR = teacher

PORT = doorman

desc = unknown

Educational Institutions

FP = Faculty of Journalism

IIC = IES Isabel la Católica

IR = IES Las Rozas

MS = IES Madrid Sur

RE = IES Renacimiento

VV = IES Villa de Vallecas

L, M, IJ = contact persons

Age

GEN1: 0-11 years (children)

GEN2: 12-25 years (adolescents/young adults)

GEN3: 26-55 years (adults)

GEN4: over 55 years (elderly)
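The age bands can be expressed as a simple lookup. The sketch below is illustrative: the function name is hypothetical, and it assumes that ages are given in whole years and that GEN4 begins at 56, directly after the upper bound of GEN3.

```python
def generation_for_age(age: int) -> str:
    """Map an age in whole years to a generation label.

    Assumption: GEN4 starts at 56, immediately after GEN3 (26-55).
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 11:
        return "GEN1"  # children
    if age <= 25:
        return "GEN2"  # adolescents/young adults
    if age <= 55:
        return "GEN3"  # adults
    return "GEN4"      # elderly
```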

Gender

M = male

F = female

Intervention

A number indicating the order in which the speaker first took part in the conversation.

For example, the speaker with the identification code AM2F1 participates in the corpus as a friend (AM) (= situation), belongs to the second generation (2) (= age), and is a woman (F) (= gender). The final 1 indicates that she was the first to participate in the conversation.

In some situations, additional specifications have been added:

Customer Service

C = customer

P = provider

j = boss

e = employee

Example: ROPAj3F1m = Mother (m) of the boss (j) of the clothing store (ROPA)

Family

h = son

hi = daughter

m = mother

p = father

e = spouse

pr = cousin

s = in-law

a = friend

Example: AM1M7p = Father (p) of one of the young friends (AM) (1)
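The speaker codes illustrated so far can be split into their parts with a regular expression. The following is a sketch under stated assumptions, not an official parser: the pattern is inferred only from the three example codes on this page (AM2F1, ROPAj3F1m, AM1M7p), the names `SPEAKER_CODE`, `SpeakerCode`, and `parse_speaker_code` are hypothetical, and the customer (C) and provider (P) markers are not handled because their position in the code is not specified here.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples on this page (an assumption).
SPEAKER_CODE = re.compile(
    r"^(?P<situation>[A-Z]+|desc)"  # situation, role, or institution (e.g. AM, ROPA)
    r"(?P<rank>[je]?)"              # optional: j = boss, e = employee
    r"(?P<generation>[1-4])"        # age group
    r"(?P<gender>[MF])"             # M = male, F = female
    r"(?P<order>\d+)"               # intervention: order of participation
    r"(?P<relation>[a-z]*)$"        # optional suffix (e.g. m = mother, p = father)
)


class SpeakerCode(NamedTuple):
    situation: str
    rank: str
    generation: int
    gender: str
    order: int
    relation: str


def parse_speaker_code(code: str) -> SpeakerCode:
    """Split a speaker code into its components, or raise ValueError."""
    match = SPEAKER_CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognized speaker code: {code!r}")
    return SpeakerCode(
        situation=match["situation"],
        rank=match["rank"],
        generation=int(match["generation"]),
        gender=match["gender"],
        order=int(match["order"]),
        relation=match["relation"],
    )


for example in ("AM2F1", "ROPAj3F1m", "AM1M7p"):
    print(example, parse_speaker_code(example))
```

Empty strings in `rank` and `relation` mean that the code carries no additional specification.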

Likewise, an identification code has been created for each conversation: AM.GEN3.F.01 constitutes the first conversation (01) recorded between adult (GEN3) female friends (AM, F).
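A conversation code of this shape can be split on its dots. This sketch assumes that every conversation code has exactly four dot-separated fields, as in AM.GEN3.F.01; the names `ConversationCode` and `parse_conversation_code` are hypothetical. The generation and gender fields are kept as plain text because only one example is given here, so the full range of possible values is not known.

```python
from typing import NamedTuple


class ConversationCode(NamedTuple):
    situation: str   # e.g. AM
    generation: str  # e.g. GEN3 (kept as text)
    gender: str      # e.g. F (kept as text)
    number: int      # sequence number, e.g. 1 for "01"


def parse_conversation_code(code: str) -> ConversationCode:
    """Split a code such as 'AM.GEN3.F.01' into its four fields."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {code!r}")
    situation, generation, gender, number = parts
    return ConversationCode(situation, generation, gender, int(number))


print(parse_conversation_code("AM.GEN3.F.01"))
```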