
Normative Studies with Healthy Volunteers

Nonverbal Information Contained in Human Speech

Prior to using the speech analysis method as a standard tool for the assessment of the affective state of a speaker, several questions have to be addressed: (1) the optimum recording time containing enough information for a reliable quantification of speech parameters; (2) the distribution of the inter-individual scattering of speech parameters in the general population; (3) the intra-individual stability of speech parameters over time; (4) the differences between dialect and non-dialect, and between affect-neutral and affect-charged speech; and (5) the amount of variance explainable by the external factors age, sex and education.

Learning to Recognize

To address these questions, we have carried out a normative study with 192 healthy volunteers, stratified according to age, sex and education. The specific design of this study, with 3 different types of text and 2 repeated measurements at 14-day intervals, allowed us to analyze the intra-individual variation of speech parameters over time, their sensitivity to form and content of spoken text, and the inter-individual scattering of speech parameters. Thus, we were able to derive normative values for the general population and to learn to distinguish between "natural" fluctuations and "significant" changes, which may encompass short-term reactions to the immediate environment or longer persisting deviations from "normality". Natural fluctuations are defined as the interval of plus/minus two standard deviations around the mean values (90%). Deviations beyond these thresholds are regarded as significant.
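The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the study's actual software: the function names `normative_range` and `is_significant`, the use of the sample standard deviation, and the pitch values are all assumptions made for the example.

```python
from statistics import mean, stdev

def normative_range(values, k=2.0):
    """Return (lower, upper) bounds of mean +/- k standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return m - k * s, m + k * s

def is_significant(value, bounds):
    """True if the value lies outside the natural-fluctuation interval."""
    lower, upper = bounds
    return value < lower or value > upper

# Hypothetical mean vocal pitch values (Hz) from a reference group
reference_pitch = [110.0, 118.0, 122.0, 125.0, 130.0, 135.0, 140.0]
bounds = normative_range(reference_pitch)

# A new measurement is compared against the normative interval
print(is_significant(128.0, bounds))
print(is_significant(160.0, bounds))
```

In practice the reference values would come from the stratum (age, sex, education) that matches the speaker, so that the interval reflects the appropriate subgroup.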

Language-Dependence of Speech Parameters

As speech parameters are language-dependent, OPTIMI relies on the normative study previously carried out at Zurich (Switzerland: German), along with a study of 120 healthy volunteers carried out at Bristol (UK: English), and a study of 120 healthy volunteers carried out at Valencia (Spain: Spanish). Test persons are selected on the basis of the Zurich Health Questionnaire (ZGF) and invited twice, at 14-day intervals, to the recording studio, where they are asked to present 3 different types of text: (1) automatic speech, (2) reading out loud an emotionally neutral text, and (3) reading out loud an emotionally stimulating text. The speech recordings are carried out according to the following scheme:

  • Counting out loud from 1 to 40 (0.5 minutes)
  • Short pause (0.5 minutes)
  • Emotionally neutral text from a well-known children's book to be read out loud (3 minutes)
  • Short pause (0.5 minutes)
  • Emotionally stimulating text from a well-known author to be read out loud (3 minutes)
  • Counting out loud from 1 to 40 (0.5 minutes)
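The recording scheme above can be written down as a simple data structure, for example to check the length of a session. The following is a sketch only; the names `RECORDING_SCHEME`, `total_minutes` and `speech_minutes` are hypothetical and not part of any OPTIMI software.

```python
# Each step: (description, duration in minutes, whether speech is recorded)
RECORDING_SCHEME = [
    ("Counting out loud from 1 to 40", 0.5, True),
    ("Short pause", 0.5, False),
    ("Emotionally neutral text, read out loud", 3.0, True),
    ("Short pause", 0.5, False),
    ("Emotionally stimulating text, read out loud", 3.0, True),
    ("Counting out loud from 1 to 40", 0.5, True),
]

def total_minutes(scheme):
    """Total session length, pauses included."""
    return sum(duration for _, duration, _ in scheme)

def speech_minutes(scheme):
    """Time during which the test person is actually speaking."""
    return sum(duration for _, duration, is_speech in scheme if is_speech)

print(total_minutes(RECORDING_SCHEME))   # 8.0
print(speech_minutes(RECORDING_SCHEME))  # 7.0
```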

Fig. 3a (scatter plot): Stability of the speech parameters "mean vocal pitch" and "6dB-bandwidth" over time among 91 healthy volunteers: the first assessment is plotted along the x-axis and the second assessment 14 days later along the y-axis.

While mean vocal pitch displays a high stability over time, the speech parameter "6dB-bandwidth" (measuring intonation) is much less stable. The experimental conditions are "counting" and "freehand speech".
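One common way to quantify the kind of stability shown in the scatter plot is the test-retest correlation between the first and the second assessment. The sketch below computes a Pearson correlation coefficient in plain Python. It is an illustration only: the text does not state which stability measure was used, and the function name `pearson_r` and all data values are hypothetical.

```python
from math import sqrt

def pearson_r(first, second):
    """Pearson correlation between two equally long lists of measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)

# Hypothetical values for six speakers: first assessment and 14 days later
pitch_day0 = [105.0, 120.0, 132.0, 180.0, 210.0, 225.0]   # Hz
pitch_day14 = [107.0, 118.0, 135.0, 178.0, 212.0, 221.0]  # Hz
bandwidth_day0 = [3.1, 4.0, 2.5, 5.2, 3.8, 4.4]           # arbitrary units
bandwidth_day14 = [4.2, 2.9, 3.6, 4.1, 4.9, 3.0]          # arbitrary units

r_pitch = pearson_r(pitch_day0, pitch_day14)
r_bandwidth = pearson_r(bandwidth_day0, bandwidth_day14)
print(f"pitch r = {r_pitch:.2f}, bandwidth r = {r_bandwidth:.2f}")
```

A coefficient close to 1 means that points lie near the diagonal of the scatter plot, that is, speakers keep their relative position from the first to the second assessment; a coefficient near 0 indicates little stability.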

Everis, Spain
ETH, Switzerland
UZH, Switzerland
Freiburg, Germany
MA Systems, UK
Bristol, UK
Xiwrite, Italy
Ultrasis, UK
Jaume, Spain
Valencia, Spain
Lanzhou, China


EU-Grant (FP7):
