Overall description
Data collection
The data were collected in three fieldwork campaigns carried out between 2016 and 2019. The table below gives the date, the approximate recording hours, and the number of conversations for each campaign.
campaign | name | date | recording hours (approximately) | number of conversations |
first campaign | CORMA2016 | April 2016 | 43 | 58 |
second campaign | CORMA2.0 | January 2019 | 8 | 29 |
third campaign | CORMA2.1 | October 2019 | 6 | 19 |
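As a quick check on the overall size of the corpus, the figures in the table can be added up. The sketch below simply copies the table values into a Python list; the variable names are illustrative, and the hour totals are only as precise as the approximate figures given above.

```python
# Campaign figures copied from the table above:
# (name, date, approximate recording hours, number of conversations)
campaigns = [
    ("CORMA2016", "April 2016", 43, 58),
    ("CORMA2.0", "January 2019", 8, 29),
    ("CORMA2.1", "October 2019", 6, 19),
]

# Sum the third and fourth columns across all campaigns.
total_hours = sum(hours for _, _, hours, _ in campaigns)
total_conversations = sum(count for _, _, _, count in campaigns)

print(f"approx. {total_hours} hours, {total_conversations} conversations")
```

Under these figures the three campaigns together amount to roughly 57 recording hours and 106 conversations.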
Conversations were recorded in different situations, which can be subdivided into four general interactive contexts:
- Interactions between family members in the private sphere (FA)
- Conversations between friends in the public or private sphere (AM)
- Interactions in commercial contexts (customer service = AT)
- Regular interactions between people who may or may not know each other in the public or private sphere (CON and CO)
Regarding the parameter of colloquialism, most of the conversations included in the corpus share the following five features, so that they are considered prototypical colloquial conversations (Briz, 1998: 41):
- equality between participants
- a familiar discursive domain
- a non-specialized theme
- lack of planning
- the interpersonal purpose of the communicative act
However, it should be noted that these ‘colloquializing’ features are present to a greater or lesser extent, so it is advisable to distinguish between different degrees of colloquialism and, consequently, between prototypical and peripheral colloquial conversations. By way of illustration, a conversation between young friends in the park is considered more colloquial than a transaction between a pharmacist and a customer.
The recordings were collected in an unguided and flexible manner: the recruit (the participant responsible for the recording) controlled the recording and its circumstances. This procedure imposes some limitations on the corpus, namely that the recordings vary considerably in duration, number of participants, and audio quality (with more or less background noise). In addition, the sociolinguistic distribution of participants is uneven across the different recording contexts.
Transcription and Composition of the Corpus
The transcription of the recordings was carried out by native Spanish speakers using the program Praat (<www.praat.org>).
The transcriptions are orthographic in nature, following the rules of official normative orthography (RAE, 2010), with the exception of two phonetic features typical of orality: the aspiration of the implosive -s (González Montero, 1993) and the elision of consonants or syllables (Gómez Molina and Gómez Devís, 2010).
For more information on the transcription and composition of the corpus, we refer to the following article:
Enghels, R., De Latte, F., & Roels, L. (2020). El Corpus Oral de Madrid (CORMA): materials for the (socio)linguistic study of current colloquial Spanish. Zeitschrift für Katalanistik, 33, 45–76.
Metadata
Data Description
The oral, conversational, and colloquial corpus is characterized by a high degree of situational and sociolinguistic variation, with a representative number of participants of both sexes, different generations, and different socio-cultural levels.
More information about the metadata can be found in the following documents:
For each conversation, information about the following situational and sociological variables was systematically recorded in a data sheet:
Recording Information
Date
Duration
Recording location: neighborhood and spatial environment (e.g., ‘Embajadores’, ‘at home’)
Responsible researcher
Conversation Information
Conversation topic (e.g., travel, school, work, family)
Conversational purpose: interpersonal – transactional
Recording Technique Information
Role of the researcher: absent – present as observer – present as participant
Type of recording: (semi)secret – ordinary
Participant Information
Number of participants
Sociological information of each participant (if available):
- Gender: male – female – n/a
- Age: generation 1 (0–11) – generation 2 (12–25) – generation 3 (26–55) – generation 4 (+55)
- Education level: primary – secondary – higher education
- Profession
- Role (based on the relationship with the person responsible for the recording)
- Additional observations (e.g., foreign origin of the speaker)
Field Notes (any supplementary information relevant to the course and subsequent analysis of the conversation)
Each data sheet can be consulted on the ‘Consult the corpus’ page.
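For readers who want to work with the data sheets programmatically, the variables listed above could be modelled roughly as follows. This is only a sketch: the class and field names (DataSheet, Participant, duration_minutes, and so on) are hypothetical and do not reflect the actual file format of the corpus, and the example values are invented placeholders, apart from the location and topic, which reuse the examples given in the list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    """Sociological information for one participant (all fields optional)."""
    gender: Optional[str] = None       # "male", "female", or None for n/a
    generation: Optional[int] = None   # 1 to 4
    education: Optional[str] = None    # "primary", "secondary", "higher education"
    profession: Optional[str] = None
    role: Optional[str] = None         # relationship to the person recording
    observations: str = ""


@dataclass
class DataSheet:
    """One data sheet per conversation; field names are assumptions."""
    date: str
    duration_minutes: float            # the unit is an assumption
    location: str                      # neighborhood and spatial environment
    researcher: str
    topic: str
    purpose: str                       # "interpersonal" or "transactional"
    researcher_role: str               # "absent", "observer", or "participant"
    recording_type: str                # "(semi)secret" or "ordinary"
    participants: List[Participant] = field(default_factory=list)
    field_notes: str = ""

    @property
    def number_of_participants(self) -> int:
        return len(self.participants)


# Invented placeholder values, for illustration only.
sheet = DataSheet(
    date="2016-04-01",
    duration_minutes=30.0,
    location="Embajadores, at home",
    researcher="(placeholder)",
    topic="family",
    purpose="interpersonal",
    researcher_role="absent",
    recording_type="ordinary",
    participants=[
        Participant(gender="female", generation=3, education="secondary"),
        Participant(gender="male", generation=2),
    ],
)
print(sheet.number_of_participants)
```

Deriving the number of participants from the participant list, rather than storing it separately, keeps the two values from drifting apart.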
Participant and Conversation Codes
Each speaker has been assigned a code that encodes the situation or role (or, in the case of young people, the educational institution), followed by age, gender, and order of intervention.
Situations/Role
AM = friends
FA = family
CON = acquaintances
CO = colleagues
AT = customer service
PEL = hairdresser
BAR = bar or café
FAR = pharmacy
EST = beautician
FLOR = florist
MUEB = furniture dealer
ROPA = clothing store
ALB = builder
EL = electrician
PR = teacher
PORT = doorman
desc = unknown
Educational Institutions
FP = Faculty of Journalism
IIC = IES Isabel la Católica
IR = IES Las Rozas
MS = IES Madrid Sur
RE = IES Renacimiento
VV = IES Villa de Vallecas
L, M, IJ = contact persons
Age
GEN1: 0-11 years (children)
GEN2: 12-25 years (adolescents/young adults)
GEN3: 26-55 years (adults)
GEN4: > 55 years (elderly)
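The age bands can be expressed as a small lookup function. The sketch below assumes that ages are given in whole years and that everyone older than 55 belongs to GEN4; the function name generation_for_age is illustrative and not part of the corpus tooling.

```python
def generation_for_age(age: int) -> str:
    """Map an age in whole years to the corpus generation label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 11:
        return "GEN1"   # children
    if age <= 25:
        return "GEN2"   # adolescents/young adults
    if age <= 55:
        return "GEN3"   # adults
    return "GEN4"       # elderly (assumed to start right after 55)


print(generation_for_age(30))
```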
Gender
M = male
F = female
Intervention
For example, the speaker with the identification code AM2F1 participates in the corpus as a friend (AM) (= situation), belongs to the second generation (2) (= age), and is a woman (F) (= gender). The final 1 indicates that she was the first to participate in the conversation.
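To make the structure of these codes concrete, the following sketch splits a speaker code into its components. It is an illustration under assumptions, not an official parser: the names parse_speaker_code and Speaker are hypothetical, the lookup tables are copied from the lists in this section, and the layout (optional role letter before the generation digit, optional kinship suffix after the intervention number) is inferred from the examples given here. Real codes in the corpus may include variants that this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional

# Prefixes copied from the code lists in this section.
PREFIXES = {
    "AM": "friends", "FA": "family", "CON": "acquaintances",
    "CO": "colleagues", "AT": "customer service", "PEL": "hairdresser",
    "BAR": "bar or café", "FAR": "pharmacy", "EST": "beautician",
    "FLOR": "florist", "MUEB": "furniture dealer", "ROPA": "clothing store",
    "ALB": "builder", "EL": "electrician", "PR": "teacher",
    "PORT": "doorman", "desc": "unknown",
    "FP": "Faculty of Journalism", "IIC": "IES Isabel la Católica",
    "IR": "IES Las Rozas", "MS": "IES Madrid Sur", "RE": "IES Renacimiento",
    "VV": "IES Villa de Vallecas",
    "L": "contact person", "M": "contact person", "IJ": "contact person",
}
ROLES = {"C": "customer", "P": "provider", "j": "boss", "e": "employee"}
RELATIONS = {"h": "son", "hi": "daughter", "m": "mother", "p": "father",
             "e": "spouse", "pr": "cousin", "s": "in-law", "a": "friend"}
GENDERS = {"M": "male", "F": "female"}

# Assumed layout: letters, generation digit, gender, order, optional suffix.
CODE_RE = re.compile(r"^([A-Za-z]+)([1-4])([MF])(\d+)([a-z]*)$")


class Speaker(NamedTuple):
    situation: str
    role: Optional[str]
    generation: int
    gender: str
    order: int
    relation: Optional[str]


def parse_speaker_code(code: str) -> Speaker:
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"Unrecognized speaker code: {code!r}")
    prefix, generation, gender, order, relation = match.groups()

    # If the whole prefix is unknown, try reading its last letter as a role.
    role = None
    if prefix not in PREFIXES and prefix[:-1] in PREFIXES and prefix[-1] in ROLES:
        prefix, role = prefix[:-1], ROLES[prefix[-1]]
    if prefix not in PREFIXES:
        raise ValueError(f"Unknown situation or institution: {prefix!r}")
    if relation and relation not in RELATIONS:
        raise ValueError(f"Unknown relation suffix: {relation!r}")

    return Speaker(PREFIXES[prefix], role, int(generation),
                   GENDERS[gender], int(order), RELATIONS.get(relation))


print(parse_speaker_code("AM2F1"))
```

Checking the full prefix against the known codes before looking for a role letter avoids misreading codes such as FP or PR, whose last letter happens to coincide with a role abbreviation.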
In some situations, additional specifications have been added:
Customer Service
C = customer
P = provider
j = boss
e = employee
Example: ROPAj3F1m = Mother (m) of the boss (j) of the clothing store (ROPA)
Family
h = son
hi = daughter
m = mother
p = father
e = spouse
pr = cousin
s = in-law
a = friend
Example: AM1M7p = Father (p) of one of the young friends (AM) (1)
Likewise, an identification code has been created for each conversation. For example, AM.GEN3.F.01 denotes the first conversation (01) recorded between adult (GEN3) female friends (AM, F).
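Conversation codes can be split in a similar way. The sketch below assumes four dot-separated fields in the order situation, generation, gender, and sequence number, which is inferred from the single example given here; the function name parse_conversation_code is hypothetical, and the gender field is returned unchanged because the values used for mixed groups are not documented in this section.

```python
import re

# Assumed layout: SITUATION.GENn.GENDER.NN
CONVERSATION_RE = re.compile(r"^([A-Za-z]+)\.GEN([1-4])\.([A-Za-z]+)\.(\d+)$")


def parse_conversation_code(code: str) -> dict:
    """Split a conversation code into its four components."""
    match = CONVERSATION_RE.match(code)
    if match is None:
        raise ValueError(f"Unrecognized conversation code: {code!r}")
    situation, generation, gender, number = match.groups()
    return {
        "situation": situation,          # e.g. "AM" for friends
        "generation": int(generation),   # 1 to 4
        "gender": gender,                # kept as written in the code
        "number": int(number),           # sequence number within the group
    }


print(parse_conversation_code("AM.GEN3.F.01"))
```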