Monte Carlo method by Markov chain for rare event probability estimation in bioinformatics problems
Abstract
Natural peptide compounds are biologically active components found in many antibiotics, anticancer and antitumor agents. Although advances in mass spectrometry have considerably increased the accuracy of sequencing peptide compounds and reduced the amount of material required, the problem of identifying natural peptide compounds by database search is still poorly studied. This work considers a method for estimating the statistical significance of peptide spectrum matches that is based on the Markov chain Monte Carlo method. In addition, the developed method makes it possible to estimate the variance of the significance and to construct a confidence interval. The work also presents a stopping criterion that allows a Markov chain to be built with a length sufficient to produce an estimate of a given accuracy.
Peptidic Natural Products (PNPs) are highly sought-after bioactive compounds that include many antibiotic, antiviral and antitumor agents, immunosuppressors and toxins. Even though recent advancements in mass spectrometry have led to the development of accurate sequencing methods for nonlinear (cyclic and branch-cyclic) peptides, requiring only picograms of input material, the identification of PNPs via a database search of mass spectra remains problematic. In this paper we describe a new way of estimating the statistical significance of a Peptide-Spectrum Match for any peptide, linear or nonlinear, using Markov chain Monte Carlo methods. In addition to the estimate itself, our method provides an uncertainty estimate in the form of confidence bounds, as well as an automatic stopping rule for the simulation that ensures that the sample size is sufficient to achieve the desired accuracy.
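The abstract names three ingredients: a Markov chain Monte Carlo estimate of a tail probability, confidence bounds on that estimate, and a stopping rule tied to a target accuracy. The sketch below illustrates how these pieces can fit together on a toy problem and is not the paper's algorithm. It assumes a standard normal target in place of a peptide score distribution, a plain random-walk Metropolis sampler, and a batch-means confidence interval. The function name `mcmc_tail_probability` and all of its parameters are hypothetical. Plain Metropolis sampling of this kind becomes impractical for genuinely rare events, which is presumably why a specialised scheme is needed in practice.

```python
import math
import random


def mcmc_tail_probability(threshold, rel_tol=0.1, batch_size=2000,
                          min_batches=20, max_batches=2000, step=1.0, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) with random-walk Metropolis.

    Returns (estimate, half_width, n_batches), where half_width is an
    approximate 95% confidence half-width computed from batch means.
    All names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = 0.0                      # current state of the chain
    batch_means = []
    estimate, half_width = 0.0, math.inf

    while len(batch_means) < max_batches:
        hits = 0
        for _ in range(batch_size):
            proposal = x + rng.uniform(-step, step)
            # Log acceptance ratio for the N(0, 1) target density.
            log_ratio = 0.5 * (x * x - proposal * proposal)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                x = proposal
            hits += x > threshold
        batch_means.append(hits / batch_size)

        n = len(batch_means)
        if n < 2:
            continue
        # Batch means are treated as roughly independent, so their sample
        # variance gives a standard error for the overall estimate.
        estimate = sum(batch_means) / n
        var = sum((b - estimate) ** 2 for b in batch_means) / (n - 1)
        half_width = 1.96 * math.sqrt(var / n)

        # Stopping rule: relative half-width below the requested tolerance.
        if n >= min_batches and estimate > 0 and half_width <= rel_tol * estimate:
            break

    return estimate, half_width, len(batch_means)


if __name__ == "__main__":
    est, hw, n = mcmc_tail_probability(1.0)
    print(f"estimate = {est:.4f} +/- {hw:.4f} after {n} batches")
```

For a threshold of 1.0 the exact value is about 0.1587, so the printed estimate can be checked against it. Tightening `rel_tol` lengthens the chain, which is the trade-off that a stopping rule of this kind automates.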