Diametrical Clustering

Inderjit Dhillon, Edward Marcotte, and Usman Roshan

Supplementary Information


A copy of the paper is available here 
In each dataset, NA indicates a missing value.
Yeast expression matrix (matrix with NA replaced by 0, and expression vectors normalized to Euclidean norm 1), as used in Spellman et al. 98. This contains a subset of 696 of the 800 genes analyzed in Spellman et al. The subset was chosen by removing genes with more than 4 missing values. Each gene expression vector is normalized to have mean 0 and variance 1.
Human Fibroblast expression matrix, as used in Iyer et al. 99. This contains a set of 517 genes. Each entry has been transformed by dividing by the expression level at time 0 and taking the log of the result. Each gene expression vector is normalized to Euclidean norm 1.
The Rosetta Yeast dataset is available here. Because it is commercial data, we can't post it directly on this site.
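
The preprocessing steps described for the matrices above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the posted files: the function names are hypothetical, missing values are assumed to appear as the string "NA", the variance is assumed to be the population variance, and the logarithm is assumed to be the natural log (the base is not stated above).

```python
import math

def fill_missing(vector, missing="NA", fill=0.0):
    """Replace missing-value markers with a fill value; convert the rest to float."""
    return [fill if x == missing else float(x) for x in vector]

def unit_norm(vector):
    """Scale a vector to Euclidean norm 1 (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]

def standardize(vector):
    """Shift and scale a vector to mean 0 and variance 1.

    Assumption: population variance (divide by n). A constant vector
    has zero variance and is mapped to all zeros.
    """
    n = len(vector)
    mean = sum(vector) / n
    var = sum((x - mean) ** 2 for x in vector) / n
    if var == 0:
        return [0.0] * n
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in vector]

def log_ratio_to_time_zero(vector):
    """Divide each entry by the first (time 0) entry and take the log.

    Assumption: natural log; all entries must be positive.
    """
    base = vector[0]
    return [math.log(x / base) for x in vector]

# Hypothetical gene expression vector with one missing value.
row = ["3", "NA", "4"]
normalized = unit_norm(fill_missing(row))
```

Each function acts on a single gene expression vector, so a full matrix would be processed one row at a time.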

The following are clusters for the above datasets created using the diametrical clustering algorithm. For ease of visualization, we provide expression profiles for only the first 100 experiments when a dataset has at least 100 experiments.

Important note for older versions of Netscape and Internet Explorer:

When viewing individual gene profiles in clusters using older versions of Netscape and Internet Explorer, the browser may display the profiles of previously selected genes fetched from the cache, rather than the ones currently selected by the user. To reload the gene profiles, right-click on the window displaying the gene profiles, select reload, and click yes when asked to repost the form data. This problem does not occur with Explorer version 6.0 and Netscape version 7.0.
10 clusters from the Iyer Human Fibroblasts dataset
12 clusters from the Spellman Yeast dataset
80 clusters from the Rosetta Yeast dataset