Statistical Data Analysis Exam∗
All documents are allowed. Duration: 2h. The work should be presented as a paper sheet and one or more Excel files. It should be posted on the Moodle platform. All statistical tests should be performed using a confidence level of 95% unless explicitly specified otherwise. The candidate may choose another level if he/she feels it is useful for his/her conclusions. In that case the confidence level must be clearly indicated. All answers must be argued; otherwise they shall be considered null.
Between 1962 and 1964, C. Hansch and T. Fujita published landmark articles considered to be the foundation of QSAR and QSPR modeling [1, 2, 3]. They described several linear correlations between some biological activities and two proportionality parameters, σ and π, deduced from a linear free energy relationship analysis.
The first one, σ, is the Hammett substituent constant. For a given reaction, the rate constants of the reaction acting on an X-substituted benzene ring (k_X) and on the corresponding H-substituted benzene ring (k_H) follow a proportionality law known as the Hammett equation: log(k_X/k_H) = ρσ, where ρ is a constant. σ is interpreted as a measure of the intrinsic reactivity of the compound. The authors related this property to the free energy of interaction of the compound with a biological target.
The second one, π (the Hansch-Fujita constant), is a proportionality coefficient relating the two-phase partition coefficient of an X-substituted compound (P_X) to that of the H-substituted compound (P_H): log(P_X/P_H) = kπ. For octanol/water partition coefficients, used as the reference situation, k = 1. π is interpreted as a measure of the lipophilicity of the compound. The authors used this parameter as a model of the mobility of a compound in a biological medium modeled as a mixture of aqueous and fatty environments.
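As a small illustration of the two definitions above, the following sketch computes σ from a pair of rate constants and π from a pair of partition coefficients. The function names and the numerical inputs are hypothetical and are not taken from the dataset; the logarithm is taken in base 10, as is usual for these constants.

```python
import math


def hammett_sigma(k_x: float, k_h: float, rho: float = 1.0) -> float:
    """Solve log10(k_X / k_H) = rho * sigma for sigma."""
    return math.log10(k_x / k_h) / rho


def hansch_pi(p_x: float, p_h: float, k: float = 1.0) -> float:
    """Solve log10(P_X / P_H) = k * pi for pi (k = 1 for octanol/water)."""
    return math.log10(p_x / p_h) / k


# Hypothetical inputs, for illustration only.
sigma_example = hammett_sigma(k_x=2.0e-4, k_h=1.0e-4)  # substituent doubles the rate
pi_example = hansch_pi(p_x=50.0, p_h=5.0)              # tenfold more lipophilic

print(f"sigma = {sigma_example:.3f}, pi = {pi_example:.3f}")
```

A positive π therefore means that the substituent makes the compound more lipophilic than the unsubstituted parent, and a negative π means the opposite.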
This subject proposes to re-examine some of the original data published by Hansch and Fujita during
The data for 34 phenoxyacetic acids were compiled for this study.
Auxins.xlsx; original data are in the sheet Data.
The majority of the reported compounds had a
The concentration (in mol/L) producing an elongation of plant tissues of 10% in 24h was reported as the logarithm of the inverse of this concentration, log(1/C). For each compound, estimated values for σ and π were also reported. Compounds with log(1/C) < 3 are considered inactive. In other sheets, the σ values reported are more recent estimates. The target factor to model is log(1/C).
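The conversion from a concentration to log(1/C) and the activity rule can be sketched as follows. The function names and the two example concentrations are hypothetical; treating a value exactly equal to 3 as active is an assumption, since the text only states that log(1/C) < 3 means inactive.

```python
import math

# From the text: compounds with log(1/C) < 3 are considered inactive.
ACTIVITY_THRESHOLD = 3.0


def log_inverse_concentration(c_mol_per_l: float) -> float:
    """Return log10(1/C) for a concentration C given in mol/L."""
    return -math.log10(c_mol_per_l)


def is_active(log_inv_c: float) -> bool:
    """Assumption: the boundary value log(1/C) == 3 counts as active."""
    return log_inv_c >= ACTIVITY_THRESHOLD


# Hypothetical concentrations, not taken from Auxins.xlsx.
for c in (1.0e-5, 5.0e-3):
    value = log_inverse_concentration(c)
    print(f"C = {c:.1e} mol/L -> log(1/C) = {value:.2f}, active: {is_active(value)}")
```

A lower effective concentration gives a larger log(1/C), so larger values correspond to more potent compounds.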
First, you will try to characterize the distribution of log(1/C). Define the following terms:
∗Instructors: G. Marcou, Université de Strasbourg, Institut de Chimie, 4, rue Blaise Pascal, 67000 Strasbourg
In the sheet Q1 report for log(1/C) the following values:
The original estimates are given in the sheet Q2 in the column log(1/C) pub. The publication
Compute in the column pi^2 the π^2 values for each compound and apply equation (1) in the column log(1/C) eq1.
Report the difference between the values computed using equation (1) and the published values in the column d1. Check whether the computed and published values are similar or not. Report your conclusions in the sheet Q2, starting at cell I37.
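One possible way to check whether two columns are similar is a paired comparison on the differences d1: if the two columns agree, the mean difference should not differ significantly from zero. The sketch below only computes the paired t statistic; the critical value must be looked up separately for n - 1 degrees of freedom at the chosen confidence level. This is an illustration of one approach, not a prescribed method, and the function name and sample values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(computed, published):
    """Return the differences and the paired t statistic of their mean."""
    if len(computed) != len(published):
        raise ValueError("both columns must have the same length")
    diffs = [c - p for c, p in zip(computed, published)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return diffs, mean_d / (sd_d / math.sqrt(n))


# Hypothetical values, not taken from the sheet Q2.
computed = [1.0, 2.0, 3.0, 4.0]
published = [1.1, 1.9, 3.2, 3.9]
d1, t_stat = paired_t_statistic(computed, published)
print(f"t = {t_stat:.3f} with {len(d1) - 1} degrees of freedom")
```

If |t| is below the critical value, the data give no evidence of a systematic difference between the two columns. Inspecting the largest absolute difference is still worthwhile, since a single discrepant row can hide behind a small mean.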
For the compound 3-SO2CH3, the π value was found to be inadequate by the authors and was re-estimated. Both estimates are present in the dataset reported in the sheet Q2, rows 24 and 25.
Perform a Grubbs test on the difference between the experimental values (column log(1/C)) and the computed values (column log(1/C) pub) of log(1/C). You can report this difference in the column d2. Check that one of the entries for the compound 3-SO2CH3 is an outlier according to this difference. A table of critical values of the Grubbs distribution is given in the sheet Grubbs. Report your conclusions in the sheet Q2, starting at cell L1.
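The Grubbs statistic for the most extreme value of a sample is G = max|x_i - mean| / s, where s is the sample standard deviation. A minimal sketch follows, assuming the critical value is read from a table such as the one in the sheet Grubbs. The function names, the sample differences and the critical value used here are placeholders for illustration only.

```python
import statistics


def grubbs_statistic(values):
    """Return (index, G) for the value farthest from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    deviations = [abs(v - mean) for v in values]
    idx = max(range(len(values)), key=deviations.__getitem__)
    return idx, deviations[idx] / sd


def grubbs_outlier(values, critical_value):
    """Flag the most extreme value if G exceeds the tabulated critical value."""
    idx, g = grubbs_statistic(values)
    return idx, g, g > critical_value


# Hypothetical differences (column d2), not taken from the dataset.
d2 = [0.10, -0.20, 0.00, 0.15, -0.10, 0.05, 2.00]
# Placeholder: replace with the table value for the actual n and risk level.
G_CRITICAL = 2.02

idx, g, flagged = grubbs_outlier(d2, G_CRITICAL)
print(f"most extreme entry: index {idx}, G = {g:.3f}, outlier: {flagged}")
```

The test examines one candidate at a time. It also assumes that the remaining values are approximately normally distributed, which is worth checking before drawing a conclusion.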
The data useful for modeling are reported in the sheet Q3.
Use the data in the sheet Q3 to build the following models:
Comment on your regressions and select the best model.
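To compare competing linear models, one can fit each by ordinary least squares and compare R² with the adjusted R², which penalizes additional descriptors. The sketch below uses only the standard library and solves the normal equations directly. The descriptor sets (π alone versus π, π² and σ), the function name and the synthetic data are assumptions made for illustration and are not the models or data of the exam.

```python
def fit_ols(X, y):
    """Least-squares fit with intercept; returns (coefficients, R2, adjusted R2)."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = M[i][p] - sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / M[i][i]
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return beta, r2, adj_r2


# Synthetic, noise-free data generated from a known relation (illustration only).
pi_vals = [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
sigma_vals = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2]
y = [3.0 + 2.0 * p - p * p + 0.5 * s for p, s in zip(pi_vals, sigma_vals)]

_, r2_lin, adj_lin = fit_ols([[p] for p in pi_vals], y)
beta_full, r2_full, adj_full = fit_ols(
    [[p, p * p, s] for p, s in zip(pi_vals, sigma_vals)], y)
print(f"pi only:          R2 = {r2_lin:.3f}, adjusted R2 = {adj_lin:.3f}")
print(f"pi, pi^2, sigma:  R2 = {r2_full:.3f}, adjusted R2 = {adj_full:.3f}")
```

On real data, R² never decreases when a descriptor is added, so the adjusted R², the significance of each coefficient and the residuals are more informative for choosing between models.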
In fact, the authors did not use the data for the 2-fluoro, 3-SO2CF3, 3-OH and 3-COOH compounds. These compounds are indicated in red in the sheet Q3.
Use the data in the sheet Q3 to rebuild the following models, excluding the compounds in red:
Comment on your regressions and select the best model.
Conclude in a few lines, in the sheet Conclusion.
[1] C. Hansch and T. Fujita; ρ-σ-π Analysis: A Method for the Correlation of Biological Activity and Chemical Structure; JACS, 1964, 1616-1626.
[2] C. Hansch, R. Muir, T. Fujita, P. P. Maloney, F. Geiger and M. Streich; The Correlation of Biological Activity of Plant Growth Regulators and Chloromycetin Derivatives with Hammett Constants and Partition Coefficients; JACS, 1963, 2817-2824.
[3] C. Hansch, P. Peyton, T. Fujita and R. M. Muir; Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients; Nature, 1962, 194, 178-180.