Code - Cisregul - Preliminary search for modules

We studied all combinations of up to 4 motifs from the list of motifs L. Let nL be the total number of motifs in L. Here we do not consider the order of the motifs to be relevant; there is still little biological data on this question, and it requires further investigation. A module can also be formed by several instances of the same motif, so homotypic interactions are considered as well. With order ignored and repetition allowed, the total number of possible pairs is nL(nL+1)/2, the total number of possible triplets is nL(nL+1)(nL+2)/6, and the total number of possible quadruplets is nL(nL+1)(nL+2)(nL+3)/24.
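The counts above are the numbers of multisets of size k drawn from nL motifs, i.e. C(nL+k-1, k). The following is a minimal Python sketch of that count and of the corresponding enumeration, not the code used in this work; the function name `count_unordered` and the motif labels are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def count_unordered(n_motifs: int, k: int) -> int:
    """Number of k-motif combinations when order is ignored and a motif
    may be repeated (homotypic modules allowed): C(n_motifs + k - 1, k)."""
    return comb(n_motifs + k - 1, k)


# Hypothetical motif labels, for illustration only.
motifs = ["m1", "m2", "m3"]

# Enumerate unordered pairs, including homotypic ones such as ("m1", "m1").
pairs = list(combinations_with_replacement(motifs, 2))

# The enumeration agrees with the closed form nL(nL+1)/2.
assert len(pairs) == count_unordered(len(motifs), 2)
```

Because the counts grow roughly as nL^k/k!, the quadruplet search dominates the cost for large motif lists.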

A potential module M was defined as a set of n motifs {m1, …, mn}, with each mi in L, that fulfilled the following requirements:
(i) Distance between adjacent motif occurrences less than maxd (we explored maxd=25, 50, 100 and 200 bp).
(ii) Maximum overlap between adjacent motifs of half the motif length.
(iii) The module had to be present in at least ntr genes in S (ntr=4 by default).
(iv) The module had to be enriched in S with respect to the background at p < 0.01 after Bonferroni correction (see definition of enrichment below).
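Requirements (i) to (iii) can be sketched as simple checks on motif occurrences. This is an illustrative Python sketch under stated assumptions, not the implementation used here: the `Occurrence` type and the function names are hypothetical, distance is taken as the gap between the end of one occurrence and the start of the next, and "half the motif length" is taken relative to the shorter of the two adjacent motifs. Requirement (iv) is omitted because it depends on the enrichment definition given separately.

```python
from typing import NamedTuple, Sequence


class Occurrence(NamedTuple):
    motif: str
    start: int   # 0-based start position within the sequence
    length: int  # motif length in bp


def adjacent_ok(a: Occurrence, b: Occurrence, maxd: int = 50) -> bool:
    """Check requirements (i) and (ii) for two adjacent occurrences,
    where a starts no later than b."""
    a_end, b_end = a.start + a.length, b.start + b.length
    gap = b.start - a_end
    if gap >= 0:
        # (i) assumed definition: gap between occurrences below maxd
        return gap < maxd
    # (ii) assumed definition: overlap of at most half the shorter motif
    overlap = min(a_end, b_end) - b.start
    return overlap <= min(a.length, b.length) / 2


def is_candidate_instance(occs: Sequence[Occurrence], maxd: int = 50) -> bool:
    """All adjacent pairs, taken in positional order, satisfy (i) and (ii)."""
    ordered = sorted(occs, key=lambda o: o.start)
    return all(adjacent_ok(a, b, maxd) for a, b in zip(ordered, ordered[1:]))


def enough_genes(genes_with_module: Sequence[str], ntr: int = 4) -> bool:
    """(iii) the module occurs in at least ntr distinct genes."""
    return len(set(genes_with_module)) >= ntr
```

For example, two 10 bp motifs starting at positions 0 and 8 overlap by 2 bp and pass the check, whereas starts at 0 and 3 overlap by 7 bp and fail it.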

If the order constraint was not imposed, the module {m1, m2, ..., mn} was considered to be equivalent to any reordering of the same motifs, e.g. {m2, mn, ..., m1}.
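One simple way to treat reorderings as the same module is to map each motif combination to a canonical key. The sketch below assumes motifs are identified by strings; the function name `module_key` is hypothetical and this is not necessarily how the equivalence was implemented here.

```python
from typing import Sequence, Tuple


def module_key(motifs: Sequence[str], ordered: bool = False) -> Tuple[str, ...]:
    """Canonical identifier of a module.

    Without the order constraint the motifs are sorted, so every
    reordering maps to the same key; repeated motifs are kept, so
    homotypic modules remain distinct from single motifs."""
    return tuple(motifs) if ordered else tuple(sorted(motifs))
```

Counting occurrences per key then pools all orderings of the same motif combination.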

This involves a large number of computations for large values of nL. We therefore used a preliminary definition of modules based on a reduced background set consisting of nb genes. The background genes were randomly selected from the whole genome, with the only constraint that they should not overlap the genes in S. We used nb = 20 nS.
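The background selection can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the original code: nS is taken to be the number of genes in S, "not overlapping" is interpreted as exclusion by gene identifier, and the function name `sample_background` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def sample_background(all_genes: Iterable[str],
                      foreground: Iterable[str],
                      ratio: int = 20,
                      seed: Optional[int] = None) -> List[str]:
    """Draw nb = ratio * nS background genes at random, without
    replacement, from the genes that are not in the foreground set S."""
    fg = set(foreground)
    # Sort the candidates so that a fixed seed gives a reproducible sample.
    candidates = sorted(set(all_genes) - fg)
    nb = ratio * len(fg)
    if nb > len(candidates):
        raise ValueError("not enough non-foreground genes to sample nb genes")
    return random.Random(seed).sample(candidates, nb)
```

Fixing the seed makes the preliminary search repeatable; a different seed gives an independent reduced background.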