OPTIMI: Early Prediction and Prevention of Depression

Institute for Response-Genetics, Department of Psychiatry (KPPP)

Psychiatric Hospital, University of Zurich


Computerized Analysis of Speech Characteristics

Program Package Master.VOX

The program package Master.VOX comprises 27 modules for the analysis of speaking behavior and voice sound characteristics. Master.VOX is built around a databank system to support (1) normative studies that use different types of text and assess the same subject repeatedly at 14-day intervals, and (2) clinical studies with psychiatric patients that assess them repeatedly over several weeks.
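The page does not describe the databank's schema. As a minimal sketch, assuming each record holds one subject, one session date, one text type and a dictionary of extracted speech measures (all field and class names here are hypothetical, not Master.VOX's), a store that supports repeated assessments mainly needs to return a subject's history in date order and the spacing between sessions:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Assessment:
    """One recorded text from one session (hypothetical fields, not Master.VOX's schema)."""
    subject_id: str
    session_date: date
    text_type: str                                # e.g. "free_speech", "counting"
    features: dict = field(default_factory=dict)  # speech measures from the analysis modules


class Databank:
    """Minimal in-memory store that groups assessments by subject."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, assessment):
        self._by_subject[assessment.subject_id].append(assessment)

    def history(self, subject_id, text_type=None):
        """Return a subject's assessments in date order, optionally for one text type."""
        rows = self._by_subject.get(subject_id, [])
        if text_type is not None:
            rows = [r for r in rows if r.text_type == text_type]
        return sorted(rows, key=lambda r: r.session_date)

    def intervals_days(self, subject_id, text_type):
        """Days between consecutive sessions, e.g. to check a 14-day schedule."""
        dates = [r.session_date for r in self.history(subject_id, text_type)]
        return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
```

For example, adding two `counting` assessments for one subject dated 1 March and 15 March and calling `intervals_days` returns `[14]`. A real databank would persist records and index by text type as well; the in-memory dictionary only illustrates the access pattern.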

Learning to Recognize: Normative Studies with Healthy Subjects

Normative studies with healthy subjects are needed to learn to distinguish between "natural fluctuations" and "significant changes" in speaking behavior and voice sound characteristics. Typically, normative studies are carried out on samples stratified by gender, age, and educational level. Two assessments 14 days apart, each with 4 different types of text (free speech; reading aloud an emotionally neutral text; reading aloud an emotionally stimulating text; automatic speech: counting), allow one to estimate between-text variation and within-subject stability over time.

Clinical Studies with Psychiatric Patients

Longitudinal studies of patients under treatment, with assessments repeated at short intervals, can be used to monitor the time course of recovery in terms of speech characteristics. Neural networks (NNs) can also be used, for example, to predict the time course of psychopathology syndrome scores from speech characteristics combined with other relevant factors.
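The page does not define when a change in a patient's speech counts as "significant" rather than a natural fluctuation. One standard psychometric criterion, swapped in here as an assumption and not taken from Master.VOX, is the Jacobson and Truax reliable change index (RCI): it scales the change from baseline by the error expected from the normative standard deviation and the test-retest reliability, and treats |RCI| > 1.96 as reliable change. The sketch below covers only this monitoring step, not the neural-network prediction; the function names are hypothetical.

```python
import math


def reliable_change_index(baseline, followup, normative_sd, test_retest_r):
    """Jacobson and Truax RCI for one speech measure."""
    se = normative_sd * math.sqrt(1.0 - test_retest_r)  # standard error of measurement
    s_diff = math.sqrt(2.0) * se                         # standard error of a difference score
    return (followup - baseline) / s_diff


def flag_changes(series, normative_sd, test_retest_r, threshold=1.96):
    """For each assessment after the first, report whether it differs reliably from baseline."""
    baseline = series[0]
    return [
        abs(reliable_change_index(baseline, x, normative_sd, test_retest_r)) > threshold
        for x in series[1:]
    ]
```

With a normative SD of 10 and a test-retest reliability of 0.84, the standard error of a difference is about 5.66, so a change of 10 units from baseline is not flagged while a change of 12 units is. The normative SD and reliability would come from the normative studies with healthy subjects.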

Speech Recordings


Speech recordings are typically made in an acoustically shielded room, which provides a standardized experimental setting and acoustic conditions that allow recordings with a dynamic range of 60 decibels (dB). Speech signals are digitized online at a sampling rate of 96 kHz with 24-bit resolution. The recording procedure yields a time series of 2 to 3 minutes for each of the two standard texts.
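The recording parameters above imply some simple figures. As a minimal sketch, assuming mono, uncompressed PCM audio (the page states neither the channel count nor the file format, and file headers are ignored), the arithmetic below gives the data rate, the file size per recording, the Nyquist frequency, and the theoretical quantization range of a 24-bit converter:

```python
import math

SAMPLE_RATE_HZ = 96_000  # from the text
BIT_DEPTH = 24           # from the text
CHANNELS = 1             # assumption: mono recording


def bytes_per_second(rate=SAMPLE_RATE_HZ, bits=BIT_DEPTH, channels=CHANNELS):
    """Raw PCM data rate: samples/s x bytes per sample x channels."""
    return rate * (bits // 8) * channels


def pcm_size_bytes(duration_s, **kwargs):
    """Raw PCM size of one recording, excluding any file header."""
    return bytes_per_second(**kwargs) * duration_s


def nyquist_hz(rate=SAMPLE_RATE_HZ):
    """Highest frequency representable at this sampling rate."""
    return rate / 2


def quantization_range_db(bits=BIT_DEPTH):
    """Ideal N-bit quantizer SNR for a full-scale sine: 20*log10(2**N) + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 1.76
```

At 288,000 bytes per second, a 2-minute recording takes about 34.6 MB and a 3-minute recording about 51.8 MB. The Nyquist frequency of 48 kHz lies far above the speech band, and the roughly 146 dB theoretical range of 24-bit quantization is far larger than 60 dB, which suggests that the room acoustics, not the converter, limit the usable dynamic range.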

Everis, Spain
ETH, Switzerland
UZH, Switzerland
Freiburg, Germany
MA Systems, UK
Bristol, UK
Xiwrite, Italy
Ultrasis, UK
Jaume, Spain
Valencia, Spain
Lanzhou, China


EU-Grant (FP7):
