Research Group 'Psychiatric Genetics', Head: Prof. Dr. Hans H. Stassen

Department of Psychiatry, Psychotherapy and Psychosomatics

Psychiatric Hospital, University of Zurich


Reproducing Conventionally Derived Results through 500k-Chip Technology

Predicting Individual IgM Levels from Multilocus Genotype

A genetically predisposed aberrancy of the inflammatory response system has been linked to various complex diseases. Consequently, "objective" classifiers are of particular clinical interest: they would enable reliable prediction and offer the opportunity for early intervention before clinical manifestations set in. To investigate the extent to which IgM levels can be reproducibly predicted for each individual patient from his or her multilocus genotype, we carried out a Neural Network (NN) analysis on a sufficiently large sample (n=1,042; genotyped for 5,728 SNPs of a conventionally designed 0.4 Mb genome scan) under the constraint of 10-fold cross-validation. Since NN results tend to be over-optimistic, even when stringent cross-validation is used, we were interested in the reproducibility of predictors across populations ("training" versus "test" samples) and across SNP sets (conventionally designed genome scan versus anonymous 500k-chip). To address these questions, we relied on an independent test sample (n=746; genotyped for 545,080 SNPs of a 500k-chip) along with 6 different SNP sets, each comprising 5,728 SNPs drawn from the 500k-chip under the constraints of maximum informativeness and compatibility with the training SNPs.
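The 10-fold cross-validation used in the training step can be illustrated with a small fold-assignment sketch. It assumes that subjects are shuffled once and dealt round-robin into folds of near-equal size; the page does not say how the folds were formed, the neural network itself is not shown, and the function name `k_fold_splits` is hypothetical.

```python
import random


def k_fold_splits(n_subjects, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Assumption: subjects are shuffled once with a fixed seed and dealt
    round-robin into n_folds folds, so fold sizes differ by at most one.
    Each subject appears in exactly one test fold.
    """
    idx = list(range(n_subjects))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = sorted(folds[k])
        # Training indices are the union of all other folds.
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test
```

For n=1,042 this produces 8 test folds of 104 subjects and 2 of 105; a (hypothetical) NN fit on `train` and evaluation on `test` would run once per fold.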

SNP Selection for Cross-Validation

Based on NCBI36 data, the coordinates X(k) of the 5,728 SNPs of our training sample were used to define surrounding intervals X(k) ± 0.1 Mb (k = 1, 2, ..., 5,728). Typically, 50-80 SNPs of the 500k-chip were located within each of these intervals and served as a pool from which 8 "optimal" SNPs were selected in terms of informativeness and proximity to the original locus X(k). Finally, 6 subsets of 5,728 SNPs each were constructed by randomly combining SNPs from each interval [Figure]. This process led to mutual overlaps between the 6 subsets in the range of 14.6-16.6%. Because of missing data, typically 40 SNPs (0.7%) of the resulting sets had to be excluded from analysis, so that on average only 5,688 SNPs per set were available for testing.
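A minimal sketch of the interval-based selection, assuming positions in base pairs on a single chromosome and using heterozygosity 2p(1-p), computed from the minor allele frequency p, as a stand-in for "informativeness". The page does not define informativeness or how it was weighed against proximity, so the ranking rule (heterozygosity first, distance to X(k) as tie-breaker) and the helper names `select_optimal` and `build_subsets` are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right


def select_optimal(anchor_pos, chip_pos, chip_maf, n_keep=8, half_width=100_000):
    """Return indices of up to n_keep chip SNPs within anchor_pos +/- half_width.

    chip_pos must be sorted positions (bp) on one chromosome; chip_maf holds
    the matching minor allele frequencies. ASSUMED ranking: higher
    heterozygosity 2p(1-p) first, then smaller distance to the anchor.
    """
    lo = bisect_left(chip_pos, anchor_pos - half_width)
    hi = bisect_right(chip_pos, anchor_pos + half_width)

    def score(i):
        p = chip_maf[i]
        return (-2 * p * (1 - p), abs(chip_pos[i] - anchor_pos))

    return sorted(range(lo, hi), key=score)[:n_keep]


def build_subsets(pools, n_sets=6, seed=0):
    """Form n_sets SNP sets by drawing one SNP at random from each interval pool.

    Empty pools (no chip SNP near a training SNP) are skipped.
    """
    rng = random.Random(seed)
    return [[rng.choice(pool) for pool in pools if pool] for _ in range(n_sets)]
```

Drawing independently per interval is what makes partial overlaps between the subsets unavoidable: with 8 candidates per interval, two subsets will sometimes pick the same SNP.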

Reproducibility of Multilocus Configuration

In terms of clusters of at least 3 SNPs within a 0.5 Mb region, the training step yielded a configuration of 15 genomic loci (61 SNPs) that served as the reference for subsequent investigations into the reproducibility of classifiers across populations and SNP sets. Unexpectedly, however, the same algorithm applied to the 746 test subjects with 6 competing SNP sets typically yielded relatively reproducible results for 4 of the 6 SNP sets, whereas the results for the 2 remaining SNP sets fairly consistently turned out to be largely arbitrary. Given current results, no more than 5 of the 15 genomic loci derived from the training sample appear to be reproducible in the test sample independently of the SNP set.
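One way to turn predictor SNPs into "genomic loci" and to count reproduced loci is sketched below. It assumes that a locus is a run of SNPs on one chromosome whose first-to-last span is at most 0.5 Mb, and that a training locus counts as reproduced when some test-set locus on the same chromosome overlaps it. Both rules are illustrative assumptions, as are the names `cluster_loci` and `reproduced`.

```python
def cluster_loci(snps, max_span=500_000, min_snps=3):
    """Group predictor SNPs into loci.

    snps: iterable of (chromosome, position_bp) with a consistent chromosome type.
    A run is closed when the chromosome changes or the next SNP lies more than
    max_span bp from the run's first SNP; runs with fewer than min_snps SNPs
    are discarded. Returns (chrom, start, end, n_snps) tuples.
    """
    loci = []
    run = []  # current run of (chrom, pos) pairs

    def close(current):
        if len(current) >= min_snps:
            loci.append((current[0][0], current[0][1], current[-1][1], len(current)))

    for chrom, pos in sorted(snps):
        if run and (chrom != run[0][0] or pos - run[0][1] > max_span):
            close(run)
            run = []
        run.append((chrom, pos))
    close(run)
    return loci


def reproduced(train_loci, test_loci):
    """Training loci that overlap at least one test locus on the same chromosome."""
    return [t for t in train_loci
            if any(t[0] == s[0] and t[1] <= s[2] and s[1] <= t[2] for s in test_loci)]
```

Running `reproduced` once per SNP set and keeping only the loci found in every set would give a count comparable to the "5 of 15" figure, under these assumed definitions.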


Stassen HH, Szegedi A, Scharfetter C: Modeling Activation of Inflammatory Response System. A Molecular-Genetic Neural Network Analysis. BMC Proc 2007; 1 (Suppl 1): S61, 1-6.
Stassen HH, Anghelescu IG, Hell D, Hoffmann K, Rujescu D, Scharfetter C, Szegedi A, Tadic A: Linking Autoantibody Formation to Genetic Vulnerability to Psychiatric Disorders and Psychotropic Drug Response. Int J Neuropsychopharmacol 2008; 11 (Suppl 1): 101.
Stassen HH, Hoffmann K, Scharfetter C: The Difficulties of Reproducing Conventionally Derived Results through 500k-Chip Technology. BMC Proc 2009; 3 (Suppl 7): S66.
Stassen HH, Hell D, Hoffmann K, Scharfetter C, Szegedi A: Linking Autoantibody Formation to Genetic Vulnerability to Psychiatric Disorders. Normative Rheumatoid Arthritis Study of 1,042 Subjects Genotyped for 5,728 SNPs. Am J Med Genet B Neuropsychiatr Genet (in preparation, 2009).
Replication Study

Independent training and test samples were used to address the question of population dependence. Specifically, 6 randomly selected subsets of SNPs from the 500k-chip data, each comprising 5,728 SNPs, were used to quantify SNP-set dependence. The selection criteria for the 6 subsets were: (1) SNPs from ±0.1 Mb intervals around the original training SNPs; (2) maximum informativeness of the newly selected SNPs; and (3) less than 20% overlap between any two of the 6 subsets.
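Criterion (3) can be checked with a pairwise overlap computation such as the following sketch. It assumes overlap is measured as the number of shared SNPs divided by the size of the smaller set (equivalent to dividing by 5,728 when both sets are complete); the page does not state the denominator, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(subsets):
    """Return {(i, j): fraction of shared SNPs} for every pair of SNP sets.

    ASSUMED definition: |A & B| / min(|A|, |B|).
    """
    sets = [set(s) for s in subsets]
    return {(i, j): len(sets[i] & sets[j]) / min(len(sets[i]), len(sets[j]))
            for i, j in combinations(range(len(sets)), 2)}


def max_overlap_ok(subsets, limit=0.20):
    """True if every pairwise overlap stays strictly below limit."""
    return all(v < limit for v in pairwise_overlap(subsets).values())
```

With 6 subsets there are 15 pairs to check; the largest of these values is what must stay under 20%.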