Diametrical Clustering
Inderjit Dhillon, Edward Marcotte, and Usman Roshan
Supplementary Information
A copy of the paper is available here
In each dataset, NA indicates a missing value.
-
Yeast expression matrix (matrix with NA replaced by 0 and expression
vectors normalized to Euclidean norm 1), as used in Spellman et al. 98.
This contains a subset of 696 of the 800 genes analyzed in Spellman et al.
The subset was chosen by removing genes with more than 4 missing values.
Each gene expression vector is normalized to have mean 0 and variance 1.
-
Human Fibroblast expression matrix, as used in Iyer et al. 99. This
contains a set of 517 genes. Each entry has been transformed by dividing
it by the expression level at time 0 and taking the log of the result.
Each gene expression vector is normalized to Euclidean norm 1.
-
The Rosetta Yeast dataset is available here.
Because it is commercial data, we can't post it directly on this site.
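The preprocessing steps described for the datasets above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the use of None for NA, the natural logarithm, and the population (divide by n) variance are all assumptions, since the page does not specify them.

```python
import math

# Assumption: "NA" entries have been parsed into None; all other entries are floats.

def drop_sparse_genes(matrix, max_missing=4):
    """Keep only rows (genes) with at most `max_missing` missing values."""
    return [row for row in matrix
            if sum(v is None for v in row) <= max_missing]

def fill_missing(row, fill=0.0):
    """Replace missing values (None) with `fill` (0 in the description above)."""
    return [fill if v is None else v for v in row]

def standardize(row):
    """Shift and scale a vector to mean 0 and variance 1.

    Assumption: population variance (divide by n) is used.
    A constant vector is returned as all zeros to avoid division by zero.
    """
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / n
    if var == 0:
        return [0.0] * n
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in row]

def unit_norm(row):
    """Scale a vector to Euclidean norm 1 (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0:
        return list(row)
    return [v / norm for v in row]

def log_ratio_to_time_zero(row):
    """Divide each entry by the time-0 entry and take the log.

    Assumptions: the first entry is time 0, all entries are positive,
    and the natural log is used (the base is not stated on this page).
    """
    base = row[0]
    return [math.log(v / base) for v in row]

# Toy example with made-up numbers (not taken from any of the datasets).
toy = [[1.0, None, 3.0, 2.0], [0.5, 1.5, None, 2.5]]
kept = drop_sparse_genes(toy)
normalized = [unit_norm(fill_missing(row)) for row in kept]
```

Note that a vector cannot have variance 1 and Euclidean norm 1 at the same time unless it has a single nonzero-variance scaling that satisfies both, so `standardize` and `unit_norm` are shown here as separate helpers rather than as a fixed pipeline.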
The following clusters for the above datasets were created using the
diametrical clustering algorithm. For ease of visualization, we provide
expression profiles for only the first 100 experiments when a dataset has
100 or more experiments.
Important note for older versions of Netscape and Internet Explorer:
When viewing individual gene profiles in clusters, older versions of
Netscape and Internet Explorer may display cached profiles of previously
selected genes rather than the profiles of the genes currently selected.
To reload the gene profiles, right-click the window displaying the gene
profiles, select Reload, and click Yes when asked to repost the form data.
This problem does not occur with Internet Explorer version 6.0 or
Netscape version 7.0.
-
10 clusters from the Iyer Human Fibroblast dataset
-
12 clusters from the Spellman Yeast dataset
-
80 clusters from the Rosetta Yeast dataset