Set of potentially coregulated genes. Positive and negative controls.
The starting point for the current algorithm is a set S of nS potentially coregulated genes, which is taken as an input. Several sources for S are possible, including clusters of coexpressed genes from DNA microarray experiments. Here we tested the algorithm with three known sets of coregulated genes as positive controls and with random selections of mouse genes from the RefSeq collection as negative controls.
Random sets
In each iteration, we randomly selected nS genes from the set of mouse RefSeq genes and applied the algorithm to search for cis-regulatory elements, as with the other sets. The assumption is that a random set of genes is unlikely to share a specific set of common regulatory elements. Such genes may share general regulatory elements like the TATA box, but these general elements should be present in many genes throughout the genome and therefore would not be enriched in a randomly picked set. We used the following values of nS: 20, 30 and 40 genes. The process was repeated for 5 iterations. A rigorous bound on the number of modules and p values would require a much larger number of iterations, on the order of 1000; however, each iteration currently takes approximately 50 hours to run. The results for 5 iterations are therefore presented as a plausibility argument that results similar to those obtained in the positive controls below are highly unlikely to arise from a random set of genes. See Performance on random sets of genes for the results.
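The sampling step for the negative controls can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `draw_random_gene_sets`, the fixed seed, and the placeholder gene identifiers are all assumptions, and the sketch assumes that 5 sets are drawn for each value of nS and that genes are sampled without replacement. In practice the identifiers would come from the mouse RefSeq collection.

```python
import random


def draw_random_gene_sets(gene_ids, set_sizes=(20, 30, 40), n_iterations=5, seed=0):
    """Draw negative-control gene sets by sampling without replacement.

    Returns a dict mapping each set size nS to a list of n_iterations
    randomly chosen gene sets (hypothetical helper, for illustration only).
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    gene_ids = list(gene_ids)
    controls = {}
    for n_s in set_sizes:
        if n_s > len(gene_ids):
            raise ValueError(f"cannot draw {n_s} genes from {len(gene_ids)}")
        # Each iteration is an independent draw of n_s distinct genes.
        controls[n_s] = [rng.sample(gene_ids, n_s) for _ in range(n_iterations)]
    return controls


# Placeholder identifiers standing in for the mouse RefSeq gene collection.
refseq_genes = [f"gene_{i:05d}" for i in range(1000)]
controls = draw_random_gene_sets(refseq_genes)
```

Each list in `controls[nS]` would then be passed to the motif search in the same way as a positive-control set.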
Yeast CLB2 set
We studied the regulation of a set of genes that participate in the cell cycle in yeast. These genes show a pattern of expression that peaks in the M phase of the cell cycle (Spellman et al., 1998). The expression of several of these genes is known to be regulated by two transcription factors, MCM and SFF (Spellman et al., 1998; Koranda et al., 2000; Zhu et al., 2000). See Analysis of the CLB2 set in yeast for the results.
Drosophila pattern formation set
We studied the regulation of a set of genes involved in pattern formation in flies. These genes were taken from a previous study that combined computational and experimental work to study clustering of five known transcription factors, namely Bicoid, Caudal, Hunchback, Krüppel and Knirps (Berman et al., 2002). See Analysis of the pattern formation set in flies for the results.
Human muscle set
We studied the regulation of a set of genes expressed in human skeletal muscle. Several transcription factors have been shown to play a role in the regulation of gene expression in skeletal muscle, including Myf, Mef-2, SRF, Tef and Sp-1 (Wasserman and Fickett, 1998). See Analysis of the skeletal muscle set in humans for the results.