OPTIMI: Early Prediction and Prevention of Depression

Institute for Response-Genetics, Department of Psychiatry (KPPP)

Psychiatric Hospital, University of Zurich


Speech Recordings in a Standardized Setting

Recording Studio

The recordings are carried out in an acoustically shielded room, which guarantees a reproducible setting and the acoustic conditions required for a dynamic range of 60 decibels (dB). Speech signals are digitized online at a sampling rate of 96 kHz and a resolution of 24 bits. Typically, the test person and the technician are situated in two different rooms, separated by an acoustically shielded window. Where no acoustically shielded laboratory is available, a "quiet" room with sufficiently low background noise is used.
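As a rough illustration of what these digitization parameters imply, the following Python sketch computes the raw data rate and the textbook signal-to-noise ratio of an ideal quantizer. The mono channel count and all function names are assumptions made for this example; the text specifies only the sampling rate and the bit depth.

```python
import math

SAMPLE_RATE_HZ = 96_000  # from the text
BIT_DEPTH = 24           # from the text
CHANNELS = 1             # assumption: a single microphone channel


def raw_data_rate_bytes_per_s(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


def ideal_quantization_snr_db(bit_depth):
    """Textbook SNR of an ideal N-bit quantizer for a full-scale sine wave.

    Equals roughly 6.02 * N + 1.76 dB; real converters and rooms achieve less.
    """
    return 20.0 * math.log10(2.0) * bit_depth + 10.0 * math.log10(1.5)


rate = raw_data_rate_bytes_per_s(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)
snr = ideal_quantization_snr_db(BIT_DEPTH)
print(f"raw data rate: {rate} bytes/s")
print(f"ideal 24-bit quantization SNR: {snr:.1f} dB")
```

Under these assumptions the converter's theoretical range lies far above the 60 dB required, which suggests that room acoustics and background noise, not the digitization, are the limiting factors.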

Experimental Setting

A recording encompasses "counting out loud from 1 to 40" and "reading a standard text out loud". The test persons are led to the recording studio, where they are asked to make themselves comfortable, to relax, to count out loud from 1 to 10, and to chat about their personal background in their native dialect and normal voice. The technician uses this speech signal to calibrate the recording (the procedure may be repeated several times if necessary). The microphone distance is kept constant at approximately 50 cm.

Signal Calibration

The technician calibrates the recording so that the maximum signal lies between -2 dB and 0 dB on the peak meter. Important: the signal amplitude must not exceed 0 dB (the red LEDs on the peak meter should not light up) in order to avoid clipping. After voice calibration, a well-defined tone of 5 seconds' duration is generated and recorded via the microphone so that the calibration parameters can be reconstructed later. Under the assumption that counting and chatting with the technician has helped the test person to relax, the definitive measurement is then carried out. The recording comprises two experimental conditions: "automatic speech" (counting) and "reading a standard text out loud". This design eliminates the sources of variation due to differing content of the recorded text or to direct interactions between an interviewer and the test person. Variations due to circadian fluctuations are avoided by always recording test persons within a fixed time window in the morning, between 8 and 11 o'clock.
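The peak-level criterion described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the studio: it assumes samples are floating-point values normalized so that 1.0 corresponds to 0 dB on the peak meter, and the tone frequency (1 kHz) and tone level (-1 dB) are placeholders, since the text states only the 5-second duration.

```python
import math

FULL_SCALE = 1.0  # assumption: floats normalized so that 1.0 equals 0 dB


def peak_db(samples):
    """Peak level of the samples in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")
    return 20.0 * math.log10(peak / FULL_SCALE)


def check_calibration(samples, low_db=-2.0, high_db=0.0):
    """Classify a calibration take against the -2 dB to 0 dB window."""
    level = peak_db(samples)
    if level > high_db:
        return "clipping"   # the red LEDs would light up
    if level < low_db:
        return "too_quiet"  # gain should be raised and the take repeated
    return "ok"


def reference_tone(freq_hz=1000.0, duration_s=5.0,
                   sample_rate_hz=96_000, level_db=-1.0):
    """Synthesize a sine tone; frequency and level are hypothetical values."""
    amplitude = FULL_SCALE * 10.0 ** (level_db / 20.0)
    n = int(round(duration_s * sample_rate_hz))
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


tone = reference_tone()
print(check_calibration(tone), round(peak_db(tone), 2))
```

In fixed-point recordings, clipped samples sit exactly at full scale rather than above it, so a stricter check might also flag runs of consecutive full-scale samples.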

Standard Recording Scheme

Speech recordings are carried out according to the following scheme: (1) The test person is asked to count out loud from 1 to 40; (2) Short pause of at most 30 seconds; (3) The test person is asked to read the standard text out loud; (4) Short pause of at most 30 seconds; (5) The test person is asked to count out loud again from 1 to 40. The entire recording procedure takes approximately 15 minutes, including volume calibration.
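The five-step scheme above could be represented as data and checked against measured step durations, for instance when annotating a finished session. The sketch below is a hypothetical illustration: the `Step` type, the `validate_session` helper, and the example durations are not part of the described protocol; only the step order and the 30-second pause limit come from the text.

```python
from dataclasses import dataclass

MAX_PAUSE_S = 30.0  # from the text: pauses last at most 30 seconds


@dataclass(frozen=True)
class Step:
    kind: str         # "count", "read", or "pause"
    description: str


SCHEME = (
    Step("count", "count out loud from 1 to 40"),
    Step("pause", "short pause"),
    Step("read", "read the standard text out loud"),
    Step("pause", "short pause"),
    Step("count", "count out loud again from 1 to 40"),
)


def validate_session(durations_s):
    """Return a list of problems for the measured step durations (seconds)."""
    if len(durations_s) != len(SCHEME):
        raise ValueError(f"expected {len(SCHEME)} durations, got {len(durations_s)}")
    problems = []
    for number, (step, duration) in enumerate(zip(SCHEME, durations_s), start=1):
        if duration <= 0:
            problems.append(f"step {number}: duration must be positive")
        elif step.kind == "pause" and duration > MAX_PAUSE_S:
            problems.append(f"step {number}: pause of {duration} s exceeds {MAX_PAUSE_S} s")
    return problems


# Example durations are invented for illustration only.
print(validate_session([45.0, 20.0, 150.0, 35.0, 44.0]))
```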


Recording studio (acoustically shielded room) with microphone, desk, and stool where the test person is seated while speaking into the microphone. Test person and technician are typically in two different rooms separated from each other by a window.

Everis, Spain
ETH, Switzerland
UZH, Switzerland
Freiburg, Germany
MA Systems, UK
Bristol, UK
Xiwrite, Italy
Ultrasis, UK
Jaume, Spain
Valencia, Spain
Lanzhou, China


EU-Grant (FP7):
