The classification of solvents by combining classical QSPR methodology with principal component analysis

Katritzky A., Fara D., Kuanar M., Hur E., Karelson M.

JOURNAL OF PHYSICAL CHEMISTRY A, vol.109, no.45, pp.10323-10341, 2005 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 109 Issue: 45
  • Publication Date: 2005
  • Doi Number: 10.1021/jp050395e
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.10323-10341
  • Eskisehir Osmangazi University Affiliated: Yes


The results of a quantitative structure-property relationship (QSPR) analysis of 127 different solvent scales and 774 solvents using the CODESSA PRO program are presented. QSPR models for each scale were constructed using only theoretical descriptors. The high quality of the models is reflected by the squared multiple correlation coefficients, which range from 0.726 to 0.999; only 18 models have R² < 0.800. This enables direct theoretical calculation of predicted values for any scale and/or for any organic solvent, including those previously unmeasured. The molecular descriptors involved in the models are classified and discussed according to (i) the origin of their calculation (i.e., constitutional, geometric, charge-related, etc.) and (ii) the commonly accepted classification of physical interactions between the solute and solvent molecules in liquid (condensed) media. A reduced matrix of 774 (solvents) × 100 (solvent scales) was selected for the principal component analysis (PCA) by taking into account only the solvent scales with more than 20 experimental data points. The first five principal components (PCs) account for 75% of the total variance. The robustness of the PCA model was validated by comparing models developed for restricted submatrices of the data with the results obtained for the full data set. The total variance accounted for by the first three PCs, for submatrices with the same number of solvent scales but different numbers of solvents, varies from 68.2% to 59.0%. This demonstrates that the total variance described by the first three components is essentially stable as the number of solvents involved varies from 100 to 774. Subsequently, a matrix with 703 diverse solvents and 100 solvent scales was selected for the general classification of the solvents and scales according to the scores and loadings obtained from the PCA treatment.
Classification of the theoretical molecular descriptors, derived from the chemical structure alone, according to their relevance to specific types of intermolecular interaction (cavity formation, electrostatic polarization, dispersion, and hydrogen bonding) in liquid media enables a more readily comprehensible physical interpretation of the QSPR of molecular properties in liquids and solutions. The reported QSPR models for solvent scales with theoretical molecular descriptors and the results of the PCA are potentially of great practical importance, as they extend the applicability of correlations with empirical solvent scales to many previously unmeasured systems.
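
The PCA robustness check described in the abstract (comparing the variance captured by the leading components for the full solvent × scale matrix and for submatrices with fewer solvents) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic random numbers rather than the paper's solvent scales, the columns are autoscaled before decomposition, and the helper name `explained_variance_ratio` is hypothetical. It is not the CODESSA PRO procedure, and its numbers do not reproduce the reported percentages.

```python
import numpy as np


def explained_variance_ratio(X):
    """Return the fraction of total variance carried by each principal component.

    Columns are autoscaled (zero mean, unit variance) first; this scaling
    is an assumption of the sketch, not a detail taken from the paper.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    # Squared singular values of the scaled matrix are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Xs, compute_uv=False)
    var = s ** 2
    return var / var.sum()


# Synthetic stand-in for a 774 (solvents) x 100 (solvent scales) matrix:
# a few latent factors plus noise. These are NOT the paper's data.
rng = np.random.default_rng(0)
n_solvents, n_scales, n_factors = 774, 100, 5
latent = rng.normal(size=(n_solvents, n_factors))
loadings = rng.normal(size=(n_factors, n_scales))
X = latent @ loadings + 2.0 * rng.normal(size=(n_solvents, n_scales))

ratio_full = explained_variance_ratio(X)
print("first 5 PCs, full matrix:", round(float(ratio_full[:5].sum()), 3))

# Repeat the PCA on random row subsets (fewer solvents, same scales)
# and compare the variance captured by the first three PCs.
subset_top3 = {}
for n in (100, 300, 774):
    rows = rng.choice(n_solvents, size=n, replace=False)
    subset_top3[n] = float(explained_variance_ratio(X[rows])[:3].sum())
    print(n, "solvents, first 3 PCs:", round(subset_top3[n], 3))
```

If the fraction captured by the first three components changes little across subset sizes, the decomposition can be regarded as stable with respect to the choice of solvents, which is the kind of argument the abstract makes for its own data.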