Within Normal Limits of Reason

"Chance is the very guide of life"

"In practical medicine the facts are far too few for them to enter into the calculus of probabilities... in applied medicine we are always concerned with the individual" -- S. D. Poisson

October 31, 2004

Determination of Local Statistical Significance of Patterns in Markov Sequences with Application to Promoter Element Identification



Higher eukaryotic genomes present a particular challenge to the computational identification of transcription factor binding sites (TFBSs) because of their long noncoding regions and large numbers of repeat elements. This is evidenced by the noisy results generated by most current methods. In this paper, we present a p-value-based scoring scheme that uses probability generating functions to evaluate the statistical significance of potential TFBSs. Furthermore, we introduce the local genomic context into the model, so that candidate sites are evaluated both on their similarity to known binding sites and on their contrast with their respective local genomic contexts. We demonstrate that our approach is advantageous in the prediction of myogenin and MEF2 binding sites in the human genome. We also apply our method, LMM, to large-scale human binding site sequences in situ and find that, compared to current popular methods, LMM analysis can reduce false positive errors by more than 50% without compromising sensitivity. This improvement will be of importance to any subsequent algorithm that aims to detect regulatory modules based on known PSSMs.
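To make the idea of a context-dependent p-value concrete, here is a minimal sketch. It is not the paper's LMM implementation. It assumes an order-0 (independent bases) background estimated from a local window, where the paper's title refers to Markov sequences, and it assumes integer PSSM scores. The function names (`local_background`, `score_distribution`, `p_value`) and the toy matrix are hypothetical. The score distribution is built one column at a time, which is the same as multiplying the per-column probability generating functions.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"

def local_background(window, pseudocount=1.0):
    """Estimate base frequencies from the local context (order-0 assumption)."""
    counts = Counter(ch for ch in window.upper() if ch in ALPHABET)
    total = sum(counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}

def score_distribution(pssm, background):
    """Exact distribution of the PSSM score under the background model.

    pssm is a list of columns, each a dict mapping base -> integer score.
    Each pass through the loop multiplies in one column's generating function.
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, q in background.items():
                nxt[score + column[base]] += prob * q
        dist = dict(nxt)
    return dist

def site_score(pssm, site):
    """Sum of per-position scores for a candidate site."""
    return sum(column[base] for column, base in zip(pssm, site.upper()))

def p_value(pssm, background, observed):
    """Probability that a background sequence scores at least `observed`."""
    dist = score_distribution(pssm, background)
    return sum(prob for score, prob in dist.items() if score >= observed)

# Toy two-column matrix (made up for illustration) that prefers "AT".
TOY_PSSM = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 2},
]

if __name__ == "__main__":
    observed = site_score(TOY_PSSM, "AT")
    for context in ("ATATATAT", "GCGCGCGC"):
        bg = local_background(context)
        print(context, p_value(TOY_PSSM, bg, observed))
```

The same candidate site gets a smaller p-value in the GC-rich window than in the AT-rich one, which is the kind of contrast against local context that the abstract describes. A higher-order Markov background would need the state of the preceding bases carried through the recursion.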

