Speech Recordings in a Standardized Setting

Recording Studio

The recordings are carried out in an acoustically shielded laboratory, which guarantees a reproducible setting and the acoustic conditions required for a dynamic range of 60 decibels (dB). Speech signals are digitized online at a sampling rate of 48 kHz with 20-bit resolution. The test person and the technician are situated in two different rooms, separated from each other by an acoustically shielded window. Where no acoustically shielded laboratory is available, a "quiet" room with sufficiently low background noise is used.
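As a rough illustration of how the digitization parameters relate to the 60 dB requirement, the following sketch computes the theoretical dynamic range of linear quantization and the Nyquist frequency. The function names are illustrative and not part of the original setup; the formula 20·log10(2^N) is the standard textbook estimate, not a measurement of this equipment.

```python
import math


def quantization_dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of N-bit linear quantization, in dB.

    Uses the ratio between full scale and one quantization step:
    20 * log10(2 ** bits), roughly 6.02 dB per bit.
    """
    return 20.0 * math.log10(2 ** bits)


def nyquist_hz(sampling_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2.0


if __name__ == "__main__":
    # Values taken from the text: 20-bit resolution, 48 kHz sampling rate.
    print(f"20-bit dynamic range: {quantization_dynamic_range_db(20):.2f} dB")
    print(f"Nyquist frequency:    {nyquist_hz(48_000):.0f} Hz")
```

Under these assumptions the converter offers about 120 dB of theoretical range, so in this setup the usable dynamic range is presumably limited by the room and the analogue chain rather than by the quantization.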

Experimental Setting

A recording encompasses "counting out loud from 1 to 40" and "reading a standard text out loud". The test persons are led to the recording studio, where they are asked to make themselves comfortable, to relax, and to count out loud from 1 to 40 in their native dialect and with their normal voice. The technician uses this speech signal to calibrate the recording, and the measurement may be repeated several times if necessary. The distance to the microphone should be kept constant at approximately 50 cm.

Signal Calibration

The technician calibrates the recording so that the maximum signal lies between -2 dB and 0 dB on the peak meter. Important: the signal amplitude must not exceed 0 dB (the red LEDs on the peak meter should not light up) in order to avoid clipping. After voice calibration, a well-defined tone of 5 seconds duration is generated and recorded on tape for calibration purposes. On the assumption that counting and chatting with the technician has helped the test person to relax, the actual measurement is carried out while the test person is alone in the recording studio. The recording comprises two experimental conditions: "automatic speech" (counting) and "reading a standard text out loud". This design eliminates all variation due to differences in the content of the recorded text or due to direct interaction between an interviewer and the test person. Variation due to circadian fluctuations can be avoided by always recording test persons within a fixed time window in the morning, for example, between 8 and 11 o'clock.
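The calibration window can also be checked on the digitized signal. The sketch below is a minimal example, assuming that the peak meter reads in dB relative to digital full scale and that samples are floats normalized so that full scale equals 1.0; both are assumptions, as are the function names. It does not describe the meter of the actual recorder.

```python
import math


def peak_level_db(samples, full_scale=1.0):
    """Peak level of a signal in dB relative to full scale.

    Returns -inf for an all-zero (silent) signal.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20.0 * math.log10(peak / full_scale)


def within_calibration_window(samples, low_db=-2.0, high_db=0.0, full_scale=1.0):
    """True if the peak lies in the window named in the text (-2 dB to 0 dB)."""
    level = peak_level_db(samples, full_scale)
    return low_db <= level <= high_db


if __name__ == "__main__":
    # Hypothetical counting take with a peak of 0.9 of full scale (about -0.9 dB).
    take = [0.0, 0.45, -0.9, 0.3]
    print(round(peak_level_db(take), 2), within_calibration_window(take))
```

A take whose peak is below the window would call for more gain and a repeated count; a take above 0 dB would be clipped and is rejected.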

Standard Recording Scheme

Speech recordings are carried out according to the following scheme: (1) The test person is asked to count out loud again from 1 to 40; (2) Short pause of at most 30 seconds; (3) The test person is asked to read the standard text out loud; (4) Short pause of at most 30 seconds; (5) The test person is asked to count out loud again from 1 to 40. The entire recording procedure takes approximately 15 minutes, including volume calibration.
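The five-step scheme above can be written down as data, which makes it straightforward to check a session log against it. This is only a sketch: the Step type, its field names, and the idea of a logged duration per step are hypothetical and not part of the original protocol. Only the step order and the 30-second pause limit come from the text.

```python
from dataclasses import dataclass

MAX_PAUSE_S = 30.0  # upper limit for each pause, from the scheme
EXPECTED_ORDER = ("count", "pause", "read", "pause", "count")


@dataclass(frozen=True)
class Step:
    kind: str          # "count", "read", or "pause"
    duration_s: float  # logged duration in seconds (hypothetical field)


def follows_scheme(steps):
    """True if the steps match the count/pause/read/pause/count order
    and no pause is longer than MAX_PAUSE_S."""
    if tuple(s.kind for s in steps) != EXPECTED_ORDER:
        return False
    return all(s.duration_s <= MAX_PAUSE_S for s in steps if s.kind == "pause")


if __name__ == "__main__":
    # Durations below are made-up illustration values, not measured data.
    session = [
        Step("count", 40.0),
        Step("pause", 20.0),
        Step("read", 180.0),
        Step("pause", 25.0),
        Step("count", 38.0),
    ]
    print(follows_scheme(session))
```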

Technical Specifications

Data are stored on a DAT cassette tape recorder TASCAM DA-30 MkII (TEAC Professional Division) that has been modified with a high-resolution A/D-D/A converter ADD 30 (LAKE PEOPLE). The sampling rate is 48 kHz, the quantization 20 bit linear, and the frequency response 20-20,000 Hz +0.1 dB. The microphone MKH 40 P48 U-3 (SENNHEISER) was specifically selected on the basis of its empirically determined frequency response curve in order to achieve a linearity of 0.1 dB over the frequency range from 64 Hz to 16 kHz.

Fig. 1: Recording studio (acoustically shielded room) with microphone, desk, and stool where the test person is seated while speaking into the microphone. Test person and technician are in two different rooms separated from each other by a window.
