Abstract
To detect sequence-based information predictive of the location of eukaryotic transcriptional regulatory domains, the frequencies and distributions of the 36 possible purine/pyrimidine reverse complement hexamer pairs were determined for test sets of real and random sequences. The distribution of one of the hexamer pairs (RRYYRR/YYRRYY, referred to as M1) was further examined in a larger set of sequences (>32 genes, 230 kb). Predominant clusters of M1 and the locations of eukaryotic transcriptional regulatory domains were found to be associated and non-randomly distributed along the DNA, consistent with a periodicity of ∼1.2 kb. In the context of higher-order chromatin, this would align promoters, enhancers and the predominant clusters of M1 longitudinally along one face of a 30 nm fiber. Using only information about the distribution of the M1 motif, 50–70% of a sequence could be eliminated as unlikely to contain transcriptional regulatory domains, with 87% recovery of the regulatory domains present.
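As an illustration of the counting step only, the following minimal Python sketch reduces a DNA sequence to the purine/pyrimidine (R/Y) alphabet, enumerates the reverse complement hexamer pairs, and lists the start positions of the M1 motif. The function names (`to_ry`, `hexamer_pairs`, `motif_positions`) and the handling of ambiguous bases are assumptions made for this sketch; it is not the implementation used in the study and does not attempt the clustering or periodicity analysis.

```python
from itertools import product

# Purine (R) / pyrimidine (Y) class of each unambiguous base.
RY_CLASS = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
# In the R/Y alphabet, complementing swaps purine and pyrimidine.
RY_COMPLEMENT = {"R": "Y", "Y": "R"}


def to_ry(seq):
    """Translate a DNA string into R/Y letters.

    Assumption for this sketch: any other symbol becomes 'N',
    so windows containing it never match a motif.
    """
    return "".join(RY_CLASS.get(base, "N") for base in seq.upper())


def reverse_complement_ry(word):
    """Reverse complement of a word written in the R/Y alphabet."""
    return "".join(RY_COMPLEMENT[c] for c in reversed(word))


def hexamer_pairs():
    """Enumerate the distinct {hexamer, reverse complement} pairs.

    Of the 2**6 = 64 R/Y hexamers, 8 are their own reverse complement,
    leaving 8 + (64 - 8) / 2 = 36 pairs.
    """
    pairs = set()
    for letters in product("RY", repeat=6):
        word = "".join(letters)
        pairs.add(tuple(sorted((word, reverse_complement_ry(word)))))
    return sorted(pairs)


def motif_positions(seq, motifs=("RRYYRR", "YYRRYY")):
    """Return 0-based start positions where either strand form of M1 occurs."""
    ry = to_ry(seq)
    return [i for i in range(len(ry) - 5) if ry[i:i + 6] in motifs]


if __name__ == "__main__":
    print(len(hexamer_pairs()))         # 36
    print(motif_positions("AGCTAGCT"))  # [0, 2]
```

Scanning for both members of the pair on one strand is equivalent to scanning for RRYYRR on both strands, which is why the pair is treated as a single motif.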