Evaluation of mutual information and genetic programming for feature selection in QSAR.

Venkatraman, V., Dalby, A.R. and Yang, Z.R. 2004. Evaluation of mutual information and genetic programming for feature selection in QSAR. Journal of Chemical Information and Computer Sciences. 44 (5), pp. 1686-1692. https://doi.org/10.1021/ci049933v

Title: Evaluation of mutual information and genetic programming for feature selection in QSAR.
Type: Journal article
Authors: Venkatraman, V., Dalby, A.R. and Yang, Z.R.
Abstract

Feature selection is a key step in Quantitative Structure Activity Relationship (QSAR) analysis. Chance correlations and multicollinearity are two major problems often encountered when attempting to find generalized QSAR models for use in drug design. Optimal QSAR models require an objective variable relevance analysis step for producing robust classifiers with low complexity and good predictive accuracy. Genetic algorithms coupled with information theoretic approaches such as mutual information have been used to find near-optimal solutions to such multicriteria optimization problems. In this paper, we describe a novel approach for analyzing QSAR data based on these methods. Our experiments with the Thrombin dataset, previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001, demonstrate the feasibility of this approach. It has been found that it is important to take into account the data distribution, the rule “interestingness”, and the need to look at more invariant and monotonic measures of feature selection.
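As a rough illustration of the mutual-information side of the abstract, the sketch below scores discrete features against a class label and ranks them. It is not the authors' procedure: the genetic programming component is omitted entirely, the toy data is invented, and the function names `mutual_information` and `rank_features` are hypothetical. It assumes features and labels are discrete (for example binary) values.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Estimate I(X;Y) in bits from two equal-length discrete sequences."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and of equal length")
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (name, score) pairs sorted by decreasing mutual information."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy data: one feature copies the label, one is independent of it.
labels = [1, 1, 0, 0]
columns = {
    "copy_of_label": [1, 1, 0, 0],
    "independent": [1, 0, 1, 0],
}
ranking = rank_features(columns, labels)
```

A ranking like this only addresses relevance to the label; it does nothing about the multicollinearity and chance-correlation problems the abstract raises, which is part of why a univariate score alone would not be expected to suffice.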

Journal: Journal of Chemical Information and Computer Sciences
Journal citation: 44 (5), pp. 1686-1692
ISSN: 1549-9596
Year: 2004
Publisher: ACS Publications
Digital Object Identifier (DOI): https://doi.org/10.1021/ci049933v
PubMed ID: 15446827
Web address (URL): http://europepmc.org/abstract/med/15446827
Publication dates
Published: 11 Aug 2004

Permalink - https://westminsterresearch.westminster.ac.uk/item/vw777/evaluation-of-mutual-information-and-genetic-programming-for-feature-selection-in-qsar

