OPTIMI: Early Prediction and Prevention of Depression

Institute for Response-Genetics, Department of Psychiatry (KPPP)

Psychiatric Hospital, University of Zurich


PATTERNS — Similarities between spectral voice patterns

Basic variables (such as pitch, loudness, energy, dynamics, and the duration of pauses and utterances) describe speech recordings in terms of scalar parameters. In contrast, the multivariate spectral pattern approach to speech analysis focuses on the spectral composition of a speech recording together with the individual variability of each spectral component. Spectral patterns enable the application of a variety of algorithms that have proved powerful in the field of pattern recognition. This program constructs spectral patterns in such a way that the total information contained in the speech samples can be decomposed into (1) a static component that represents the individual, genetically determined characteristics of the speaker's voice and (2) a dynamic component that reflects reactive changes due, for example, to the current situation of the speaker (short-term fluctuations) or to the speaker's global affective state (long-term fluctuations). Spectral patterns are derived from 4-8 consecutive 4-second epochs out of a total recording time of 30-60 seconds.
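
As a rough illustration of the idea of a spectral pattern (mean spectral composition plus the variability of each spectral component), the following Python sketch computes a per-line mean and standard deviation across a handful of epoch spectra. It is not the program's actual algorithm: the function name spectral_pattern, the list-of-lists data layout, and the use of the sample standard deviation as the variability measure are all assumptions made for this example, and the numbers are invented.

```python
from statistics import mean, stdev


def spectral_pattern(epoch_spectra):
    """Return (means, sds) per spectral line across epochs.

    epoch_spectra: one spectrum per epoch, each a list of intensities
    with one value per spectral line (hypothetical layout).
    """
    if len(epoch_spectra) < 2:
        raise ValueError("need at least two epochs to estimate variability")
    n_lines = len(epoch_spectra[0])
    if any(len(s) != n_lines for s in epoch_spectra):
        raise ValueError("all spectra must have the same number of lines")
    # Transpose so that each column holds one spectral line across epochs.
    columns = list(zip(*epoch_spectra))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation stands in for "variability".
    sds = [stdev(col) for col in columns]
    return means, sds


# Four invented epoch spectra with three spectral lines each.
epochs = [
    [1.0, 4.0, 2.0],
    [3.0, 4.0, 4.0],
    [2.0, 4.0, 6.0],
    [2.0, 4.0, 4.0],
]
means, sds = spectral_pattern(epochs)
```

In this toy input the second spectral line is identical in every epoch, so its variability is zero, while the third line fluctuates most.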

            Specificationlist:    PATTERNS
            I4 FRST                      1  Default-value
            I4 NSPK                     15  Default-value
            I4 PROT                      0  Default-value
            I4 PLOT                      0  Default-value
            I4 PMAX                      0  Default-value
            I4 LPRT                      6  Default-value
            I4 SAVE                      0  Default-value
            I4 TLOG                      0  Default-value
            01 FRST Specifies first spectrum to be included
            02 NSPK Specifies number of spectra to be used
            03 PROT Controls output to display/printer
            04 PLOT Controls graphic output
            05 PMAX Specifies maximum number of plot pages
            06 LPRT Logical unit number of plot-device
            07 SAVE Saves newly constructed spectral pattern in databank
            08 TLOG Logarithmic transformation of spectral lines
            09 DEMO Examples that illustrate program function
            - FRST = p: Specifies first spectrum to be included
            - NSPK = q: Specifies number of spectra to be used
            - PROT = 0: No print output
                   = 1: Basic characteristics of spectral voice patterns
                   = 2: Details on optimization
            - PLOT = 0: No plot output
                   = 1: Spectral voice patterns
                   = 2: Similarity between spectral voice patterns
            - PMAX = q: Maximum number of plot pages
                   = 0: Unlimited number
            - LPRT = q: Logical unit number of plot-device (standard=6;
                        valid numbers are 46-96)
            - SAVE = 0: No effect
                   = 1: Newly constructed spectral pattern is to be saved
                   = 2: Existing patterns will be replaced
            - TLOG = 0: No effect
                   = 1: Logarithmic transformation of spectral lines
            - DEMO: Voice sound characteristics and spectral patterns
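
The effect of FRST, NSPK and TLOG can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the function select_spectra is hypothetical, FRST is assumed to be a 1-based index, and the natural logarithm is assumed for TLOG=1, since the listing above does not specify the base.

```python
import math


def select_spectra(spectra, frst=1, nspk=15, tlog=0):
    """Pick up to NSPK spectra starting at FRST; optionally log-transform."""
    if frst < 1:
        raise ValueError("FRST is assumed to be 1-based")
    if nspk < 1:
        raise ValueError("NSPK must be at least 1")
    # Convert the assumed 1-based FRST to a 0-based slice.
    chosen = spectra[frst - 1 : frst - 1 + nspk]
    if tlog == 1:
        # Assumption: natural log; non-positive intensities are not handled.
        chosen = [[math.log(v) for v in spectrum] for spectrum in chosen]
    return chosen


# Three invented spectra with two spectral lines each.
spectra = [[1.0, 2.0], [4.0, 8.0], [16.0, 32.0]]
window = select_spectra(spectra, frst=2, nspk=2)
logged = select_spectra(spectra, frst=1, nspk=1, tlog=1)
```

With the defaults shown in the listing (FRST=1, NSPK=15, TLOG=0), such a helper would simply return the first 15 spectra unchanged.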


            &&START CSELECT=Normative speech study zurich: males (study 600)
            &&START CSELECT=Normative speech study zurich: females (study 600)
            &&START PATTERNS=Normative speech study zurich (study 600)
Fig. 24: Voice sound characteristics ("timbre") of a female speaker as quantified through spectral analyses. Spectral intensities are plotted along the y-axis on log-proportional scales as a function of frequency (x-axis: 7 octaves covering the frequency range of 64-8192 Hz). The shaded area denotes the variability around mean spectral intensities ("characteristic variability"). It is this frequency-dependent variability that makes voices easily distinguishable from each other. The maxima represent "overtones" at fixed, physically well-defined intervals over the fundamental frequency F0 (compare with the spectrum of the male speaker in Fig. 22).
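
The axis arithmetic in the caption can be checked with a short Python sketch: 8192/64 = 2^7, so the range spans exactly 7 octaves. The sketch also lists overtones under the standard harmonic-series model, in which they lie at integer multiples of F0. The function names and the F0 of 200 Hz are illustrative assumptions, not values taken from the figure.

```python
import math

F_LOW = 64.0     # lower edge of the plotted range in Hz (from the caption)
F_HIGH = 8192.0  # upper edge of the plotted range in Hz (from the caption)


def octave_position(freq_hz):
    """Position on a log2 frequency axis, in octaves above 64 Hz."""
    return math.log2(freq_hz / F_LOW)


def harmonics_in_range(f0_hz):
    """Integer multiples of F0 that fall inside the 64-8192 Hz range."""
    result = []
    k = 1
    while k * f0_hz <= F_HIGH:
        if k * f0_hz >= F_LOW:
            result.append(k * f0_hz)
        k += 1
    return result


span = octave_position(F_HIGH)       # width of the axis in octaves
overtones = harmonics_in_range(200.0)  # illustrative F0 of 200 Hz
```

On such a logarithmic axis the harmonics crowd together toward higher frequencies, because each octave contains twice as many multiples of F0 as the one below it.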

Everis, Spain
ETH, Switzerland
UZH, Switzerland
Freiburg, Germany
MA Systems, UK
Bristol, UK
Xiwrite, Italy
Ultrasis, UK
Jaume, Spain
Valencia, Spain
Lanzhou, China


EU-Grant (FP7):
