Research Group 'Psychiatric Genetics', Head: Prof. Dr. Hans H. Stassen

Department of Psychiatry, Psychotherapy and Psychosomatics

Psychiatric Hospital, University of Zurich


Molecular-Genetic Neural Network Analysis

Modeling Gene Products

Standard association methods aim to connect genotype with phenotype directly, which greatly oversimplifies the underlying biology. In fact, genes code for proteins or RNA ("gene products"), which may interact in a variety of ways and influence the phenotype only after a cascade of intermediate steps. Molecular-genetic Neural Networks (NNs) generalize standard regression analysis in a natural way by (1) implementing multistage gene products through one or more intermediate "layer(s)", and (2) allowing for linear or nonlinear interactions between genes and between gene products. An advantage of NNs is that knowledge about the cascade of intermediate steps that ultimately leads from genotype to phenotype can be incomplete or even absent: the intermediate steps are modeled as "hidden layers".
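To make the layered structure concrete, the following minimal Python sketch passes a genotype through one intermediate layer of "gene products" to a one-dimensional phenotype. It is an illustration only: the allele-count coding (0/1/2), the sigmoid activation, and all weight values are assumptions made for this example and are not taken from the model described here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(genotype, w_products, w_phenotype):
    """Genotype -> gene products (intermediate layer) -> one-dimensional phenotype."""
    # Each gene product is a nonlinear function of a weighted sum over all loci,
    # so several genes can jointly shape a single product.
    products = [
        sigmoid(sum(w * g for w, g in zip(row, genotype)))
        for row in w_products
    ]
    # The phenotype combines the gene products in a second stage,
    # which allows interactions between gene products.
    phenotype = sigmoid(sum(w * p for w, p in zip(w_phenotype, products)))
    return products, phenotype

# Hypothetical example: three loci coded as allele counts, three gene products.
genotype = [0, 1, 2]
w_products = [[0.5, -0.3, 0.1], [0.0, 0.8, -0.4], [0.2, 0.2, 0.2]]
w_phenotype = [1.0, -1.0, 0.5]
products, phenotype = forward(genotype, w_products, w_phenotype)
print(products, phenotype)
```

With a single intermediate layer and as many gene products as loci, this mirrors the general shape of the configuration described in the next section, though not its fitted weights.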

Fitting the NN Model

During optimization, the algorithm systematically improved genotype-phenotype correlations by iteratively adding or removing genomic loci and fitting the NN model to the set of 1,042 observations under the constraint of reproducibility, enforced by k-fold cross-validation (k = 10). Using a single layer for gene products, we set the number of gene products equal to the number of genomic loci included in the NN model, and chose a one-dimensional phenotype to reflect the IgM level as derived from the multidimensional genotype. The convergence criterion was set to c = 0.03, with a maximum of 70,000 iterations and an initial learning rate of l = 0.012, which was gradually modified during iteration whenever gradient descent got "stuck" without achieving convergence. The weight matrices and classifiers, averaged across the k solutions and applied to the 1,042 samples, yielded an overall performance for each optimization step [Figure]. The optimization stopped when performance reached a plateau at 77.3% correctly classified subjects out of the entire sample.
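The fitting scheme can be sketched in a few lines of Python. This is a simplified stand-in, not the procedure actually used: it fits a single-layer logistic model instead of the NN, runs on small synthetic data, treats c as a threshold on the mean squared error, and adapts the learning rate by a simple grow/halve rule. All of these are assumptions, as are the function names. Only k = 10, l = 0.012 and c = 0.03 are taken from the text; the iteration cap is reduced from 70,000 to keep the demo fast.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def mse(w, X, y):
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(X)

def fit(X, y, lr=0.012, c=0.03, max_iter=300):
    """Gradient descent on the mean squared error with an adaptive learning rate."""
    w = [0.0] * len(X[0])
    err = mse(w, X, y)
    for _ in range(max_iter):
        if err <= c:  # assumed meaning of the convergence criterion
            break
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            p = predict(w, x)
            d = 2.0 * (p - t) * p * (1.0 - p) / len(X)
            for j, xj in enumerate(x):
                grad[j] += d * xj
        trial = [wi - lr * gi for wi, gi in zip(w, grad)]
        trial_err = mse(trial, X, y)
        if trial_err < err:
            w, err = trial, trial_err
            lr *= 1.1   # step accepted: cautiously enlarge the learning rate
        else:
            lr *= 0.5   # step rejected ("stuck"): shrink the learning rate
    return w

def accuracy(w, X, y):
    hits = sum(1 for x, t in zip(X, y) if (predict(w, x) >= 0.5) == (t == 1))
    return hits / len(X)

def cross_validate(X, y, k=10):
    """Fit one solution per fold, then average the k weight vectors."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    solutions, held_out = [], []
    for fold in folds:
        test = set(fold)
        train = [i for i in range(n) if i not in test]
        w = fit([X[i] for i in train], [y[i] for i in train])
        solutions.append(w)
        held_out.append(accuracy(w, [X[i] for i in fold], [y[i] for i in fold]))
    averaged = [sum(col) / k for col in zip(*solutions)]
    return averaged, held_out

# Synthetic data: three loci coded as allele counts plus a constant bias input.
rng = random.Random(0)
X = [[rng.randint(0, 2), rng.randint(0, 2), rng.randint(0, 2), 1] for _ in range(60)]
y = [1 if x[0] > x[1] else 0 for x in X]

averaged_w, held_out_acc = cross_validate(X, y)
overall = accuracy(averaged_w, X, y)  # averaged solution applied to the whole sample
print(held_out_acc, overall)
```

Averaging weights is straightforward for this single-layer stand-in. With hidden layers, the solutions from different folds would first have to be comparable unit by unit, a detail the sketch does not address.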

Prediction of IgM Levels by Genotype

The table below gives re-classification rates, sensitivity and specificity of the NN-based predictors as derived during k-fold cross-validation, prior to averaging weight matrices and classifiers. Such predictors tend to be over-optimistic, in particular if the population under investigation includes subgroups. Averaging weight matrices and classifiers therefore compensates for "local" data characteristics and yields better performance when new, "unknown" samples are to be classified.
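For reference, the three quantities reported for each predictor can be computed from true and predicted class labels as in the short Python sketch below. The labels are invented for illustration, and the coding 1 = case, 0 = non-case is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Rate of correct classification, sensitivity and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    nan = float("nan")
    return {
        "rate": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true positives among cases
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true negatives among non-cases
    }

# Invented labels: 4 cases and 4 non-cases, with one error in each group.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # rate, sensitivity and specificity are all 0.75 here
```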

Iterative optimization of the starting configuration by systematically adding/removing genomic loci while fitting the NN model to the set of 1,042 observations under the constraint of reproducibility with 10-fold cross-validation. The red circles designate the percentage of correctly classified subjects at each optimization step, with optimization steps plotted along the x-axis (disproportionately large decreases in performance indicate the removal of loci of larger weight).
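The add/remove search over genomic loci can be illustrated with a generic stepwise selection loop. The Python sketch below is a simplification under stated assumptions: it accepts a move only if the score improves, whereas the optimization shown in the figure also passes through steps with lower performance. The locus names and the scoring function are hypothetical placeholders for the cross-validated NN fit.

```python
def stepwise_select(candidates, score, start=()):
    """Greedy search: at each step take the single addition or removal
    of a locus that raises the score most; stop when no move helps."""
    selected = list(start)
    best = score(selected)
    history = [best]
    while True:
        # All configurations reachable by adding or removing one locus.
        moves = [selected + [c] for c in candidates if c not in selected]
        moves += [[s for s in selected if s != r] for r in selected]
        best_move = None
        for trial in moves:
            s = score(trial)
            if s > best:
                best, best_move = s, trial
        if best_move is None:
            break
        selected = best_move
        history.append(best)
    return selected, history

# Hypothetical scoring function standing in for the cross-validated
# rate of correctly classified subjects.
contribution = {"locusA": 0.10, "locusB": 0.08, "locusC": -0.02}

def toy_score(loci):
    return 0.5 + sum(contribution[l] for l in loci)

selected, history = stepwise_select(list(contribution), toy_score)
print(selected, history)
```

In this toy run the search adds locusA, then locusB, and stops because neither adding locusC nor removing a selected locus raises the score further.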