Research Group 'Psychiatric Genetics', Head: Prof. Dr. Hans H. Stassen

Department of Psychiatry, Psychotherapy and Psychosomatics

Psychiatric Hospital, University of Zurich


Representing Speech Characteristics

Speaking Behavior and Voice Sound Characteristics

Speech characteristics can be roughly described by a few major features: speech flow, loudness, intonation and intensity of overtones. Speech flow describes the speed at which utterances are produced, as well as the number and duration of temporary breaks in speaking. Loudness reflects the amount of energy associated with the articulation of utterances and, when regarded as a quantity that varies over time, the speaker's dynamic expressiveness. Intonation is the manner of producing utterances with respect to rise and fall in pitch; it produces tonal shifts above and below the speaker's mean vocal pitch. Overtones are the higher tones that faintly accompany a fundamental tone; they are responsible for the tonal diversity of sounds.
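Speech flow and loudness can both be read off a frame-by-frame energy track. The sketch below is a minimal illustration under stated assumptions, not the group's implementation: loudness is approximated by the RMS level of 10 ms frames in dB, a hypothetical fixed level threshold separates utterances from pauses, and silences shorter than 250 ms (the limit given under Data Analysis) are treated as part of the surrounding utterance. The function names, frame length and threshold are illustrative choices.

```python
import math
from statistics import mean, pstdev

FRAME_S = 0.010      # hypothetical frame length: 10 ms
MIN_PAUSE_S = 0.250  # silences shorter than this are not counted as pauses


def frame_levels_db(samples, frame_len):
    """RMS level of consecutive, non-overlapping frames in dB (re. amplitude 1.0)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # floor avoids log10(0)
    return levels


def segment(levels_db, threshold_db, frame_s=FRAME_S, min_pause_s=MIN_PAUSE_S):
    """Split frames into (is_speech, duration_s) runs using a fixed level threshold (assumption)."""
    runs = []  # [is_speech, n_frames]
    for level in levels_db:
        speech = level >= threshold_db
        if runs and runs[-1][0] == speech:
            runs[-1][1] += 1
        else:
            runs.append([speech, 1])
    # Silence before the first and after the last utterance is not a break in speaking.
    while runs and not runs[0][0]:
        runs.pop(0)
    while runs and not runs[-1][0]:
        runs.pop()
    min_pause_frames = round(min_pause_s / frame_s)
    merged = []
    for speech, n in runs:
        if not speech and n < min_pause_frames:
            speech = True  # too short: treat as part of the surrounding utterance
        if merged and merged[-1][0] == speech:
            merged[-1][1] += n
        else:
            merged.append([speech, n])
    return [(speech, n * frame_s) for speech, n in merged]


def speech_flow_and_loudness(levels_db, threshold_db, frame_s=FRAME_S):
    """Pause statistics (speech flow) plus mean level and level spread while speaking."""
    speaking = [lvl for lvl in levels_db if lvl >= threshold_db]
    if not speaking:
        raise ValueError("no frames above the speech threshold")
    runs = segment(levels_db, threshold_db, frame_s)
    pauses = [d for is_speech, d in runs if not is_speech]
    return {
        "n_pauses": len(pauses),
        "mean_pause_s": mean(pauses) if pauses else 0.0,
        "speech_time_s": sum(d for is_speech, d in runs if is_speech),
        "loudness_db": mean(speaking),    # average level while speaking
        "dynamics_db": pstdev(speaking),  # level spread: a rough proxy for expressiveness
    }
```

With real recordings the threshold would more sensibly be set relative to the measured noise floor of each recording; a fixed value keeps the sketch short.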

Normative Data

Our approach to analyzing the nonverbal information contained in human speech is based on the results of a normative study of 192 healthy volunteers. The study design, with three different types of text and two measurements taken 14 days apart, was chosen to investigate the reproducibility of speech parameters over time and to analyze how sensitive speech parameters are to the form and content of the spoken text. Specifically, we determined (1) the optimum recording time required for a reliable estimation of speech parameters, (2) the distribution of speech parameters in the general population, (3) the intra-individual stability of speech parameters over time, which allows one to distinguish between "natural" fluctuations and "significant" changes, (4) the differences between dialect and non-dialect speech, and between affect-neutral and affect-charged speech, and (5) the amount of variance explained by age, gender and social status. Within the scope of this normative study, we developed a practical recording procedure that a technician can carry out routinely in a standardized setting.
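Point (3) can be illustrated with two measurements of the same speakers. The sketch below assumes a simple test-retest criterion that is not taken from the study: the spread of session-to-session differences among healthy speakers defines a band of natural fluctuation, and a later change counts as significant only if it falls outside the mean difference ± 1.96 standard deviations (roughly 95% of natural fluctuations, if the differences are approximately normal). Function names and example values are hypothetical.

```python
from statistics import mean, stdev


def fluctuation_band(session1, session2, z=1.96):
    """Estimate natural test-retest fluctuation of one speech parameter.

    session1, session2: the parameter measured twice in the same healthy speakers.
    Returns (mean_difference, half_width). The z = 1.96 cut-off is an assumed
    95% criterion, not a value reported by the study.
    """
    if len(session1) != len(session2) or len(session1) < 2:
        raise ValueError("need paired measurements from at least two speakers")
    diffs = [b - a for a, b in zip(session1, session2)]
    return mean(diffs), z * stdev(diffs)


def is_significant_change(before, after, band):
    """True if the change exceeds the band of natural fluctuation."""
    mean_diff, half_width = band
    return abs((after - before) - mean_diff) > half_width


# Hypothetical values of one parameter in four healthy speakers, 14 days apart.
band = fluctuation_band([10.0, 12.0, 11.0, 13.0], [11.0, 11.0, 12.0, 12.0])
print(is_significant_change(10.0, 11.0, band), is_significant_change(10.0, 13.0, band))
```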

Data Analysis

All speech signals are inspected visually and, where necessary, marked with an artifact code so that disturbed intervals can be removed prior to data analysis. Next, segmentation tables are set up to identify pauses and utterances; pauses shorter than 250 ms are disregarded. We then calculate "spectra" on the basis of 1-second epochs by means of a discrete Fourier transform, using "pure" utterances from which all pauses have been eliminated. Finally, we approximate the shape of the F0 distribution curve ("F0" designates the fundamental frequency of the voice, whose mean is the speaker's mean vocal pitch) by a 2nd-degree polynomial and use the distance between its two symmetrical 6-dB points as a measure of "F0-variability" (intonation). The height/width ratio of the fitted polynomial serves as a measure of "F0-narrowness" (monotony). All frequency differences are expressed in quartertones so that speakers can be compared directly, independently of their mean vocal pitch.
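The F0 measures can be sketched as follows, assuming NumPy, a sequence of F0 estimates in Hz (one per analysis frame, 0 for unvoiced frames) and several choices the text does not specify: the speaker's mean pitch is taken as the geometric mean of F0, the distribution is a histogram with 1-quartertone bins, the parabola is fitted only to bins reaching at least 30% of the peak, and the "6-dB points" are read as the points where the fitted parabola has fallen to half its height (-6 dB in amplitude terms). All function names are hypothetical.

```python
import numpy as np

QT_PER_OCTAVE = 24  # 12 semitones = 24 quartertones per octave (frequency doubling)


def hz_to_quartertones(f_hz, ref_hz):
    """Distance of f_hz from ref_hz in quartertones (positive = higher)."""
    return QT_PER_OCTAVE * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)


def parabola_half_height_width(a, b, c):
    """Peak height and distance between the half-height (-6 dB) points of a*x^2 + b*x + c."""
    if a >= 0:
        raise ValueError("parabola must open downwards")
    height = c - b * b / (4.0 * a)  # value at the vertex x0 = -b / (2a)
    if height <= 0:
        raise ValueError("parabola peak must be positive")
    # a * (x - x0)^2 = -height / 2  ->  x - x0 = +/- sqrt(height / (2|a|))
    width = 2.0 * np.sqrt(height / (2.0 * -a))
    return height, width


def f0_distribution_measures(f0_hz, bin_qt=1.0, fit_floor=0.3):
    """F0-variability (width) and F0-narrowness (height/width) under the assumptions above."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]                            # drop unvoiced frames (assumed coded as 0)
    ref = np.exp(np.mean(np.log(f0)))         # mean pitch as geometric mean (assumption)
    qt = hz_to_quartertones(f0, ref)
    edges = np.arange(np.floor(qt.min()) - bin_qt, np.ceil(qt.max()) + 2 * bin_qt, bin_qt)
    counts, edges = np.histogram(qt, bins=edges)
    rel = counts / counts.sum()                # relative frequency per bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    use = rel >= fit_floor * rel.max()         # fit only the central part of the curve
    if use.sum() < 3:
        raise ValueError("too few bins for a 2nd-degree fit; use smaller bins")
    a, b, c = np.polyfit(centers[use], rel[use], 2)
    height, width = parabola_half_height_width(a, b, c)
    return {
        "mean_pitch_hz": float(ref),
        "f0_variability_qt": float(width),       # intonation
        "f0_narrowness": float(height / width),  # monotony
    }
```

Working on the quartertone scale is what makes the width comparable across speakers: a spread of a given number of quartertones corresponds to the same frequency ratio whether the voice centers on 110 Hz or on 220 Hz.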


Braun S, Annovazzi C, Botella C, Bridler R, Camussi E, Delfino JP, Mohr C, Moragrega I, Papagno C, Pisoni A, Soler C, Seifritz E, Stassen HH: Assessing Chronic Stress, Coping Skills and Mood Disorders through Speech Analysis. A Self-Assessment "Voice App" for Laptops, Tablets, and Smartphones. Psychopathology 2016; 49(6): 406-419
Delfino JP, Barragán E, Botella C, Braun S, Bridler R, Camussi E, Chafrat V, Lott P, Mohr C, Moragrega I, Papagno C, Sanchez S, Seifritz E, Soler C, Stassen HH: Quantifying Insufficient Coping Behavior under Chronic Stress. A cross-cultural study of 1,303 students from Italy, Spain, and Argentina. Psychopathology 2015; 48: 230-239
Braun S, Botella C, Bridler R, Chmetz F, Delfino JP, Herzig D, Kluckner VJ, Mohr C, Moragrega I, Schrag Y, Seifritz E, Soler C, Stassen HH: Affective State and Voice: Cross-Cultural Assessment of Speaking Behavior and Voice Sound Characteristics. A Normative Multi-Center Study of 577+36 Healthy Subjects. Psychopathology 2014; 47(5): 327-340
Mohr C, Braun S, Bridler R, Chmetz F, Delfino JP, Kluckner VJ, Lott P, Schrag Y, Seifritz E, Stassen HH: Insufficient Coping Behavior under Chronic Stress and Vulnerability to Psychiatric Disorders. Psychopathology 2014; 47: 235-243
Stassen HH, Delfino JP, Kluckner VJ, Lott P, Mohr C: Vulnerabilität und psychische Erkrankung. Swiss Archives of Neurology and Psychiatry 2014; 165(5): 152-157
Stassen HH: Veränderungen der Sprechmotorik. In: Jahn T (ed) Bewegungsstörungen bei psychischen Erkrankungen. Heidelberg: Springer 2004; 107-125
Stassen HH, Angst J: Wirkung und Wirkungseintritt in der Antidepressiva-Behandlung. In: Böker H, Hell D (eds) Therapie der affektiven Störungen. Stuttgart and New York: Schattauer 2002; 141-165
Lott PR, Guggenbühl S, Schneeberger A, Pulver AE, Stassen HH: Linguistic analysis of the speech output of schizophrenic, bipolar, and depressive patients. Psychopathology 2002; 35(4): 220-227
Püschel J, Stassen HH, Bomben G, Scharfetter C, Hell D: Speaking behavior and voice sound characteristics in acute schizophrenia. J Psychiatr Res 1998; 32: 89-97
Stassen HH, Kuny S, Hell D: The speech analysis approach to determining onset of improvement under antidepressants. Eur Neuropsychopharmacol 1998; 8(4): 303-310
Kuny S, Stassen HH, Hell D: Kognitive Beeinträchtigungen in der Depression. Schweiz Arch Neurol Psychiatr 1997; 150(3): 18-25
Stassen HH: Affekt und Sprache. Stimm- und Sprachanalysen bei Gesunden, depressiven und schizophrenen Patienten. Monographien aus dem Gesamtgebiete der Psychiatrie, Vol. 79. Berlin, Heidelberg: Springer 1995
Stassen HH, Albers M, Püschel J, Scharfetter C, Tewesmeier M, Woggon B: Speaking behavior and voice sound characteristics associated with negative schizophrenia. J Psychiatr Res 1995; 29: 277-296
Kuny S, Stassen HH: Speaking behavior and voice sound characteristics in depressive patients during recovery. J Psychiatr Res 1993; 27: 289-307
Formant Analysis
Voice sound characteristics ("timbre") of a male speaker as quantified by spectral analysis. Spectral intensities are plotted on a logarithmic scale along the y-axis as a function of frequency (x-axis: 7 octaves covering the frequency range 64-8192 Hz).
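A spectrum of this kind can be approximated by averaging DFT power spectra over 1-second epochs of pause-free speech, as described under Data Analysis, and summarizing them per octave between 64 and 8192 Hz (64 · 2^7 = 8192, hence 7 octaves). The sketch assumes NumPy, a Hann window (a choice made here, not stated in the text) and a sampling rate above 16384 Hz so that 8192 Hz lies below the Nyquist frequency. It reduces the curve to one level per octave, a coarser view than the plotted spectrum; the function names are hypothetical.

```python
import numpy as np


def epoch_power_spectrum(signal, fs, epoch_s=1.0):
    """Average power spectrum over consecutive non-overlapping epochs (default 1 s)."""
    n = int(round(epoch_s * fs))
    n_epochs = len(signal) // n
    if n_epochs == 0:
        raise ValueError("signal shorter than one epoch")
    epochs = np.asarray(signal[: n_epochs * n], dtype=float).reshape(n_epochs, n)
    window = np.hanning(n)  # assumed window; reduces spectral leakage
    power = np.abs(np.fft.rfft(epochs * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, power.mean(axis=0)


def octave_band_levels(freqs, power, f_low=64.0, n_octaves=7):
    """Spectral intensity per octave band in dB (log scale, as on the figure's y-axis)."""
    edges = f_low * 2.0 ** np.arange(n_octaves + 1)  # 64, 128, ..., 8192 Hz
    if edges[-1] >= freqs[-1]:
        raise ValueError("sampling rate too low for the requested top octave")
    levels = np.array([
        10.0 * np.log10(power[(freqs >= lo) & (freqs < hi)].sum() + 1e-20)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return edges, levels
```

For example, a 440 Hz test tone sampled at 22050 Hz puts almost all of its energy into the third band (256-512 Hz).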
The mean vocal pitch of female speakers lies about one octave above that of male speakers. The distribution and intensity of overtones, as produced with the vowels "a", "e", "i", "o" and "u", exhibit characteristic patterns that vary widely between individuals but remain stable over time.
Depression significantly reduces the dynamic expressiveness of human voices and thereby greatly diminishes inter-individual differences. As a direct consequence, the patients' voices become more similar to each other (the "depressive voice"). During recovery, the voices regain their distinct individuality.
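One simple way to quantify "more similar to each other" is the mean pairwise distance between speakers' feature vectors, for example the per-octave levels of each voice: the smaller the value, the less individual the voices. This is an illustrative measure chosen here, not necessarily the one used by the group, and the profiles below are made-up numbers rather than measurements.

```python
from itertools import combinations
from math import dist
from statistics import mean


def mean_pairwise_distance(profiles):
    """Average Euclidean distance over all pairs of speaker profiles (equal-length vectors)."""
    if len(profiles) < 2:
        raise ValueError("need at least two speakers")
    return mean(dist(p, q) for p, q in combinations(profiles, 2))


# Made-up per-octave levels (dB) of the same three patients at two time points.
during_episode = [[-22.0, -16.0, -29.0], [-23.0, -15.0, -27.0], [-21.0, -17.0, -30.0]]
after_recovery = [[-20.0, -15.0, -30.0], [-26.0, -12.0, -22.0], [-18.0, -20.0, -35.0]]

# A smaller value means the voices are more alike.
print(mean_pairwise_distance(during_episode) < mean_pairwise_distance(after_recovery))
```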