I recently researched the mathematical modelling of the dose–response relationship using NAG’s Optimization Modelling Suite data fitting solver. In an exploration of the use of different model starting points, I saw the effect that this can have on the quality of the optimal model found. The importance of using a versatile data fitting solver in dose–response analysis became clear, particularly a solver that allows for easy interchange of loss functions and regularisation functions, since this aids identification of the best model fit for the data. In my research I also consider the much-discussed issue of collinearity, which can arise when performing a multiple regression, and demonstrate one method to reduce collinearity using regularisation.

### Background on dose–response analysis

Dose–response models are used to describe the response of an organism after being exposed to a stimulus. This response is measured after treating the organism with different concentrations of the chemical. Since humans are constantly exposed to different chemicals over their life, one use of dose–response models is to determine the toxicity of chemicals in a low-cost way (for example, modelling an organism’s response to a chemical is significantly cheaper and faster than conducting the classical rodent bioassay [5]). A further application of dose–response analysis is examining how effective a drug is at decreasing the growth of tumours in a tissue culture. Therefore, the investigation of dose–response relationships is crucial to determining public policy regarding what dosages of drugs and pollutants are considered safe, hazardous, or, in some contexts, beneficial.

The history of dose–response analysis can be traced back to the eighth century BC, in which many people tried to interpret this relationship [9]. It was in the 1500s, however, when an important figure in toxicology named Paracelsus realised that the most important factor in determining whether a chemical is toxic or not is its dosage [11], when he coined the adage “the dose makes the poison”. Moving on to the early 1900s, dose–response relationships in toxicology were mainly an observational science, being studied at the level of disease-specific, animal-based models. It was the 1970s when this field started transitioning to a purely statistical approach, and the fitting of dose–response models to data, which is what we will do in this blog, became more widespread.

### Data fitting via the NAG Library

The process of fitting a model to dose–response data can be achieved through nonlinear regression. This regression allows for the interpolation of unknown values and for the comparison of results from different experiments. However, experimental data of this type can often contain outliers – this can severely impact parameter estimation in nonlinear models. Therefore, it is important that the regression is robust. Robustness of a regression can be changed via the use of different loss functions. The NAG Library solver handle_solve_nldf is a routine that allows the use of various loss functions. More specifically, it can solve a data fitting problem of the form

\begin {aligned} &\min _{x \in \mathbb {R}^{n_{\mathrm {var}}}} \quad &&\sum ^{n_{\mathrm {res}}}_{i=1} \mathcal {X}(r_i(x)) + \rho \sum ^{n_{\text {var}}}_{i=1} \psi (x_i) \\ &\text {subject to} \quad &&l_g \leq g(x) \leq u_g, \\ & &&\tfrac {1}{2} x^T Q_i x + p_i^T x + s_i \leq 0, \quad 1 \leq i \leq m_Q, \\ & &&l_B \leq Bx \leq u_B, \\ & &&l_x \leq x \leq u_x. \end {aligned} Here, $\mathcal {X}$ is the loss function and it acts upon residuals $r_i(x)$, $i=1,\dots ,n_{\mathrm {res}}$, which are equal to the difference in the observed and predicted values of each data point. Further, $\psi$ is the regularisation function with regularisation coefficient $\rho$ – there are currently two different regularisation functions that can be used: the $l_1$-norm and the $l_2$-norm. It should be noted that this solver allows different types of constraints to be imposed on the minimisation problem – these are nonlinear, quadratic, linear, and box constraints, respectively (further details can be found in the documentation). As previously mentioned, a wide range of loss functions can be used, and these different loss functions affect the robustness of the regression. For example, Huber loss is more robust than the $l_2$-norm loss function (least squares) – this means that when residuals get larger, $l_2$-norm loss has a higher penalty than Huber loss and other more robust loss functions (see Figure 1).

### The four-parameter logistic model

We will use dose–response data for one of the chemicals from the US National Toxicology Program (NTP) library of 1408 chemicals [8]. This library originated from an initiative to move away from traditional toxicity testing (typically performed on animals) to focus on developing predictive models that assess how chemicals affect cellular responses and toxicity pathways [6]. This library contains toxicological profiles for many environmental compounds, including solvents, preservatives, and pollutants [12]. There are 14 doses in the study, with three replicates per dose (see Figure 2). Since dose–response curves tend to be sigmoidal in shape and monotonic, the following four-parameter logistic model [1], derived from the original equation developed by Hill [3], which he used to describe the binding of oxygen to haemoglobin, is typically fitted to the data: $y = a + \frac {b-a}{1 + (\frac {c}{x})^d}.$ This nonlinear regression model, where $x$ denotes the dose of a chemical and $y$ denotes the corresponding expected response, estimates four parameters: $a$ is the minimum curve asymptote, $b$ is the maximum curve asymptote, $c$ is the point of inflection in the curve – it also denotes the concentration of the drug corresponding to when half the population are expected to demonstrate the studied response (often referred to as $\mathrm {EC}_{50}$ or $\mathrm {IC}_{50}$), and $d$ is the steepness of the curve (also known as the Hill slope).

The model shown here is one of a family of Hill equations that are able to express many nonlinear relationships, including those found in areas such as ligand binding [10], quantitative pharmacology [2], and even dynamical systems in biosciences [7].

### Results from various regression models

The dose–response model was fitted to the NTP data using the NAG solver in Python and the code used to achieve this can be found on our Github Local Optimization page. Let us first consider this model fitted from a good-quality (or “good”) starting point (in this case, this means a starting point that is near to the optimal solution). In Figure 3, it can be seen that all loss functions result in a model curve that follows the expected sigmoidal shape. However, there is variation in these curves depending on the loss function used in the optimisation. For instance, the curve resulting from Huber loss (as well as Cauchy, quantile, $l_1$-norm, and smooth $l_1$-norm) goes exactly through the middle point of each dose (for larger dose concentrations); it treats the other two replicate doses as outliers. However, since $l_2$-norm loss is less robust, it is more influenced by these outliers, pulling the curve slightly upwards. Furthermore, the curves resulting from $l_\infty$-norm and arctan loss treat different values as outliers for the largest doses and therefore have different model fits for the larger doses.

Since nonlinear regression is a local method (it will only find local optima – not necessarily global optima), the final solution can be affected by the quality of the initial values given. So, let us next consider a model that is fitted starting from a worse initial point (which shall be referred to as “naive”). Note that we restrict our analysis to only $l_2$-norm and Huber loss functions to keep it simple.

#### Huber loss function with regularisation

In Figure 4 (a), we can see the curves resulting from fitting the model using Huber loss and starting from a naive initial value with different regularisation functions (none, $l_1$-norm, and $l_2$-norm). Let us first consider the model fitted using no regularisation function and a naive starting point. As can be seen in Figure 4 (a), the curve is more angular than the curve fitted using a good starting point, which is much smoother. Note that, if one wanted to analyse the standard errors of the parameters for this model, we must calculate a matrix comprised of the Jacobian matrix ($J$) of the Hill equation, this being $J^TJ$, and this matrix must be inverted. However, this matrix is uninvertible for this optimal solution, suggesting that there is collinearity in the model. Since this was not detected when performing other analyses, it is likely that this is a structural multicollinearity arising from the model itself, rather than from the data.

More specifically, multicollinearity can arise in a regression model if there is a correlation between variables. This can cause a few problems: the parameter estimates can become sensitive to small changes in the model, the precision of the parameter estimates can reduce, which also decreases the statistical power of the regression, and furthermore, this can undermine the statistical significance of the independent variable.

One way to try to reduce multicollinearity in a regression model is to use regularisation. Regularisation adds a penalty to the objective function in order to shrink the parameter estimates towards zero and avoid overfitting. We therefore next refit the model using the $l_2$-norm regularisation function in an attempt to deal with the multicollinearity of this model [4]. This results in a visually unimpressive model fit (Figure 4 (a)). However, when $l_1$-norm regularisation is used, the resulting curve is much smoother, and while it is not as good as the optimal solution found from the good starting point, it is the best found from the naive starting point. Furthermore, this regularisation makes the matrix $J^TJ$ invertible, meaning that the standard errors of the parameters could be found.

#### $l_2$-norm loss function with regularisation

Next, we consider the models fitted using the $l_2$-norm loss function for the naive starting point and using various regularisation functions. By visual inspection of Figure 4 (b), it appears that the resulting curves for $l_1$-norm regularisation and no regularisation are very similar to the curve found from the good starting point. However, when $l_2$-norm regularisation was used, the resulting curve looks like a significantly worse fit. A comparison of the optimal solutions found when fitting the model from the two different starting points and using $l_2$-norm loss can be found in Table 1. If we narrow our focus to only the model that was fitted using the good starting point and the model that was fitted using the naive starting point and $l_2$-norm regularisation, we can see that the parameter $d$ has a significantly larger standard error for the naive starting point with $l_2$-norm regularisation than the good starting point. This parameter represents the Hill slope, and the slope is notably different in the two models. In general, the standard errors of the parameters for all models except the one using a naive starting point and $l_2$-norm regularisation appear to be more regular.

 Parameter Start Regularisation Estimate SE 95% CI $a$ 1 None $-38.2$ $4.4$ $(-47.1, -29.3)$ 2 None $-3.4$ $1.0$ $(-5.4, -1.3)$ 2 $l_1$-norm $-3.3$ $1.0$ $(-5.4, -1.3)$ 2 $l_2$-norm $-3.0$ $1.1$ $(-5.4, -0.7)$ $b$ 1 None $-3.4$ $1.0$ $(-5.4, -1.3)$ 2 None $-38.2$ $4.4$ $(-47.1, -29.3)$ 2 $l_1$-norm $-37.7$ $4.3$ $(-46.4, -29.0)$ 2 $l_2$-norm $-28.0$ $3.4$ $(-34.8, -21.2)$ $c$ 1 None $32.5$ $5.6$ $(21.3, 43.8)$ 2 None $32.5$ $5.6$ $(21.3, 43.8)$ 2 $l_1$-norm $31.8$ $5.5$ $(20.7, 42.9)$ 2 $l_2$-norm $21.3$ $3.6$ $(14.1, 28.5)$ $d$ 1 None $-3.4$ $1.3$ $(-5.9, -0.8)$ 2 None $3.4$ $1.3$ $(0.8, 5.9)$ 2 $l_1$-norm $3.5$ $1.3$ $(0.8, 6.1)$ 2 $l_2$-norm $5.7$ $15.8$ $(-26.4, 37.7)$

Now let us compare the parameter estimates found for the $l_2$-norm loss model fitted starting from a naive initial point (see Table 1) to the original definitions of each parameter. Since the model we are fitting is decreasing, we would expect the parameter $d$, which is the Hill slope, to be negative. However, this is not the case for these optimal solutions. Furthermore, the parameters $a$ and $b$, which are the minimum and maximum asymptotes respectively, have swapped meanings for these optimal solutions. Therefore, the regression has essentially tried to fit an increasing curve to decreasing data, which could explain why we are getting slightly different dose–response curves compared to when we started from a good initial point – when the parameter estimates matched their original definitions. This demonstrates the effect that having a good-quality initial guess can have on the final regression model.

### Conclusion

Overall, dose–response analysis requires careful consideration of multiple factors. When fitting such a nonlinear regression model, it is important to consider the quality of the starting point since this can have a big effect on the optimal solution found. Furthermore, the optimal solution is also affected by the loss function used. Several loss functions (of varying robustness) should be tried in order to find a fit that best suits the data being used.

For more information on the NAG® Library and our Optimization Modelling Suite or to try it for yourself, visit ‘Getting Started with the NAG Library’, select your configuration and language, download the software, request a trial key and experiment for yourself.

This post is a part of The NAG Optimization Corner series.

Simplify Mathematical Optimization Problem Solving
References

[1]   Gadagkar, S. R., & Call, G. B. (2015). Computational tools for fitting the Hill equation to dose–response curves. Journal of Pharmacological and Toxicological Methods, 71, 68–76.

[2]   Gesztelyi, R., Zsuga, J., Kemeny-Beke, A., Varga, B., Juhasz, B., & Tosaki, A. (2012). The Hill equation and the origin of quantitative pharmacology. Archive for History of Exact Sciences, 66(4), 427–438.

[3]   Hill, A. V. (1910). The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves. Journal of Physiology, 40, 4–7.

[4]   Kesavulu, P., Vasu K., Bhupathi Naidu, M., Abbaiah, R., & Balasiddamuni, P. (2016). The effect of multicollinearity in nonlinear regression models. International Journal of Applied Research, 2, 506–509.

[5]   Lim, C., Sen, P. K., & Peddada, S. D. (2013). Robust analysis of high throughput screening (HTS) assay data. Technometrics, 55(2), 150–160.

[6]   National Toxicology Program. (2004). A national toxicology program for the 21st century: A roadmap for the future. National Toxicology Program: Research Triangle Park, NC, USA.

[7]   Pérez-Correa, J. R., Lefranc, G., & Fernández-Fernández, M. (2015). A new application of the Hill repressor function: Automatic control of a conic tank level and local stability analysis. Mathematical Problems in Engineering, 2015, Article ID 271216.

[8]   Tice, R. R., Fostel, J., Smith, C. S., Witt, K., Freedman, J. H., Portier, C. J., …& Bucher, J. R. (2007). The National Toxicology Program high throughput screening initiative: Current status and future directions. Toxicologist, 46(246), 151.

[9]   Tsatsakis, A. M., Vassilopoulou, L., Kovatsi, L., Tsitsimpikou, C., Karamanou, M., Leon, G., …& Spandidos, D. A. (2018). The dose response principle from philosophy to modern toxicology: The impact of ancient philosophy and medicine in modern toxicology science. Toxicology Reports, 5, 1107–1113.

[10]   Weiss, J. N. (1997). The Hill equation revisited: Uses and misuses. FASEB Journal, 11(11), 835–841.

[11]   Waddell, W. J. (2010). History of dose response. Journal of Toxicological Sciences, 35(1), 1–8.

[12]   Xia, M., Huang, R., Witt, K. L., Southall, N., Fostel, J., Cho, M. H., …& Austin, C. P. (2008). Compound cytotoxicity profiling using quantitative high-throughput screening. Environmental Health Perspectives, 116(3), 284–291.

Leave a Comment