This post is a part of The NAG Optimization Corner series.

In a previous post, we introduced Derivative-free optimization (DFO) and showed that it is particularly well suited to problems where the objective function's derivatives are unknown and evaluations are noisy or expensive to compute. In this post, we demonstrate how DFO can speed up a practical calibration problem arising frequently in the finance industry: calibrating an option pricing model to market data.

The following material is based on a recent webinar we presented; a recording is available here.

Recent improvements in DFO solvers

DFO has seen enormous activity in the years since our previous DFO post. NAG collaborated on a research project with Coralia Cartis and Lindon Roberts from the University of Oxford. This research led to significant improvements, such as noise resilience and enhancements to interpolation models for structured problems, and brought state-of-the-art DFO techniques [1] into the NAG Library. We now offer a bound-constrained nonlinear least squares solver suitable for data fitting problems, handle_solve_dfls (e04ff), as well as one aimed at general nonlinear functions, handle_solve_dfno (e04jd). We also provide alternative reverse-communication interfaces for passing function values to the solvers: handle_solve_dfls_rcomm (e04fg) and handle_solve_dfno_rcomm (e04je), respectively.

Pricing European Options: calibrating the Heston model

A European option is a contract giving the buyer the right to buy (call) or sell (put) an underlying at a given price (strike) on a given expiration date (maturity). Pricing the option consists of assigning it a price based on the probability that the buyer will want to exercise the option at maturity.

Many numerical pricing methods have been introduced throughout the years, the most common ones still being based on the Black-Scholes equations. However, the Black-Scholes model has some shortcomings, among them the assumption that the volatility is constant over time. It is in fact observed that market implied volatilities are smile-shaped. Stochastic volatility models such as Heston's were introduced to try to explain this shape. The model consists of the following set of equations:

 $\begin{array}{rl} dS_t &= \mu S_t\,dt + \sigma\sqrt{v_t}\,S_t\,dW_t^1 \\ dv_t &= \lambda\left(1-v_t\right)dt + \alpha\sqrt{v_t}\,dW_t^2 \\ dW_t^1 \cdot dW_t^2 &= \rho\,dt \end{array}$

where $S_t$ is the spot price of the underlying, $v_t$ is the time-dependent volatility and $W_t^1$, $W_t^2$ are two correlated Brownian motions.

This model still depends on 4 parameters that are not easy to observe directly in the market: the volatility scaling $\sigma$, the mean reversion rate $\lambda$, the volatility of volatility $\alpha$ and the Brownian motion correlation $\rho$. These parameters therefore need to be calibrated: we optimize them against historical data so that the model's predictions match the market as closely as possible. This is a great candidate for DFO, since derivatives are not readily available and each model evaluation is relatively expensive.
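To build some intuition for these dynamics, the sketch below simulates the two SDEs with a simple Euler-Maruyama scheme and a full-truncation floor on $v_t$. It is purely illustrative: the function name and arguments are our own, and the NAG pricer used later in this post evaluates the model directly rather than by simulation.

```python
import numpy as np

def simulate_heston(s0, v0, mu, sigma, lam, alpha, rho,
                    t_end, n_steps, n_paths, rng):
    """Euler-Maruyama discretisation of the two SDEs above, with a
    full-truncation floor (v is clipped at 0 inside the square roots)."""
    dt = t_end / n_steps
    sq_dt = np.sqrt(dt)
    s = np.full(n_paths, s0, dtype=float)
    v = np.full(n_paths, v0, dtype=float)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # Correlate the two Brownian increments: corr(dW1, dW2) = rho.
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        vp = np.maximum(v, 0.0)
        s = s + mu * s * dt + sigma * np.sqrt(vp) * s * sq_dt * z1
        v = v + lam * (1.0 - vp) * dt + alpha * np.sqrt(vp) * sq_dt * z2
    return s, v
```

Note how each of the 4 parameters enters: $\sigma$ scales the diffusion of $S_t$, $\lambda$ pulls $v_t$ back towards 1, $\alpha$ drives the randomness of $v_t$, and $\rho$ couples the two noise sources.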

Introducing term structure in the Heston model

The Heston equations still assume that our 4 parameters are constant with respect to time, which does not necessarily correspond to market reality. To alleviate this problem, a possible extension is to add term structure: $\sigma$, $\lambda$, $\alpha$ and $\rho$ are only considered constant over fixed time periods. This allows the model to better mimic market behaviour but also multiplies the number of parameters to calibrate by the number of time periods considered.

An implementation of the Heston model with term structure is available in the NAG Library (opt_heston_term, s30ncf) and is the one used in the following numerical experiments.

Setting up the calibration problem

We have access to historical data in the foreign exchange market for the currency pair EUR-USD ranging from 2012 to 2016. For each date, we have data for 7 maturities (2m, 3m, 6m, 1y, 2y, 3y and 5y):

• EUR risk-free rate
• USD risk-free rate
• 25-$\Delta$ risk-reversal (RR) quotes
• 25-$\Delta$ butterfly (BF) quotes
• at-the-money (ATM) quote

One apparent issue with this data is that we have only 3 quotes per maturity but 4 parameters to tune, even before considering term structure. To avoid over-fitting, we chose to fix one of the parameters, $\lambda$, to a constant value after some numerical experiments. Without term structure, calibrating the Heston model then consists of solving the nonlinear least squares optimization problem:

 $\underset{\alpha,\rho,\sigma}{\min}\ \frac{1}{2}\sum_{i=1}^{m}\left(H_i\left(\alpha,\rho,\sigma\right)-M_i^{market}\right)^2$

where ${H}_{i}\left(\alpha ,\rho ,\sigma \right)$ are the prices predicted by the model and ${M}_{i}^{market}$ are the observed market quotes.
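As a concrete, if simplified, illustration, the snippet below sets up a problem of this form and solves it with a derivative-free method. SciPy's Powell method stands in for the NAG DFO solver, and a toy 3-parameter model replaces the Heston prices $H_i$; all names here are our own.

```python
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.1, 5.0, 7)                     # 7 "maturities"
m_market = 0.2 * np.exp(-0.8 * t) + 0.05 * t     # synthetic "market" quotes

def residuals(p):
    # Toy stand-in for H_i(alpha, rho, sigma) - M_i; NOT the Heston model.
    alpha, rho, sigma = p
    return alpha * np.exp(rho * t) + sigma * t - m_market

def objective(p):
    return 0.5 * np.sum(residuals(p) ** 2)       # the least-squares objective

# Powell's method is derivative-free: it never asks for a gradient.
res = minimize(objective, x0=np.array([0.1, -0.5, 0.1]), method="Powell")
```

A solver that exploits the least-squares structure, as handle_solve_dfls (e04ff) does, would be handed the individual residuals rather than the summed objective, which lets it build better interpolation models.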

The next step is to choose the number of time periods to set in the term structure. With 7 maturities it is natural to consider 7 time periods for our term structure: for every date, we consider $k$ time periods for the ${k}^{th}$ maturity.

With this structure, the quotes of each maturity depend on the parameters of all previous periods plus its own, so it is also natural to fit the parameters sequentially. The first parameters $\left({\alpha }_{1},{\rho }_{1},{\sigma }_{1}\right)$ are tuned to the quotes of the 2m maturity, then fixed at their optimal values while $\left({\alpha }_{2},{\rho }_{2},{\sigma }_{2}\right)$ are tuned to the second maturity. The process continues until we end up with a sequence of seven 3-parameter calibration problems:

 $\left\{\begin{array}{ll} \underset{\alpha_1,\rho_1,\sigma_1}{\min}\sum_{i=1}^{m}\left(H_i\left(\alpha_1,\rho_1,\sigma_1\right)-M_i\right)^2 & (P1)\\ \underset{\alpha_2,\rho_2,\sigma_2}{\min}\sum_{i=1}^{m}\left(H_i\left(\alpha_1,\alpha_2,\rho_1,\rho_2,\sigma_1,\sigma_2\right)-M_i\right)^2 & (P2)\\ \quad\vdots & \\ \underset{\alpha_7,\rho_7,\sigma_7}{\min}\sum_{i=1}^{m}\left(H_i\left(\alpha_1,\dots,\alpha_7,\rho_1,\dots,\rho_7,\sigma_1,\dots,\sigma_7\right)-M_i\right)^2 & (P7) \end{array}\right.$ (1)
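The sequential scheme (1) amounts to a loop in which each period's parameters are calibrated and then frozen. The sketch below makes this concrete; the "pricer" is a hypothetical stand-in (a simple weighted sum, not the Heston model) chosen only so that the sketch runs end to end.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the term-structure pricer: the quotes of
# maturity k depend on the parameters of periods 1..k via a weighted sum.
def toy_quotes(params_up_to_k):
    k = params_up_to_k.shape[0]
    w = np.exp(-np.arange(k)[::-1].astype(float))  # recent periods weigh more
    return w @ params_up_to_k                      # one "quote" per parameter

true_params = np.array([[0.30, -0.60, 0.10],
                        [0.25, -0.50, 0.12],
                        [0.20, -0.40, 0.15]])
market = [toy_quotes(true_params[:k]) for k in range(1, 4)]

fitted = []                                        # frozen periods 1..k-1
for k in range(1, 4):
    fixed = np.array(fitted, dtype=float).reshape(-1, 3)

    def sum_sq(p):
        full = np.vstack([fixed, p[None, :]])      # periods 1..k, row k free
        r = toy_quotes(full) - market[k - 1]
        return np.dot(r, r)

    res = minimize(sum_sq, np.zeros(3), method="Powell")
    fitted.append(res.x)                           # freeze (alpha_k, rho_k, sigma_k)
```

Each subproblem sees only 3 free parameters, which keeps the interpolation models of a DFO solver small even though the full term structure has 21 parameters.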

Numerical Experiment

To solve the calibration problem (1), we try two solvers from the NAG Library dedicated to nonlinear least squares problems:

• the derivative-free solver handle_solve_dfls (e04ff) introduced above;
• the derivative-based solver handle_solve_bxnl (e04gg), which requires the Jacobian of the residuals.

Since we don’t have access to the exact derivatives of the Heston model, we estimate the Jacobian of the residuals required by e04gg with finite differences. Using finite differences effectively can be quite challenging (as discussed in our previous post). Here we chose a constant perturbation parameter $h = 10^{-6}$ after some numerical experiments.
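Estimating the Jacobian by forward differences with a constant perturbation can be sketched as below. The helper name is our own, and the residual function passed in would be the Heston residuals; note that each Jacobian estimate costs one extra residual evaluation per parameter, which is exactly the overhead a DFO solver avoids paying at every iteration.

```python
import numpy as np

def fd_jacobian(resid, x, h=1e-6):
    """Forward-difference estimate of the Jacobian of the residual
    vector `resid` at the point `x`, using a constant perturbation h."""
    r0 = np.asarray(resid(x), dtype=float)
    jac = np.empty((r0.size, x.size))
    for j in range(x.size):
        xh = x.astype(float).copy()
        xh[j] += h                                 # perturb one parameter
        jac[:, j] = (np.asarray(resid(xh), dtype=float) - r0) / h
    return jac
```

A constant $h$ is the simplest choice; too small an $h$ amplifies noise in the residuals, too large a value biases the estimate, which is why picking it took some experimentation.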

For our comparisons, we use the same realistic stopping criterion for both solvers. The fit is considered good once it reaches 1-basis-point precision:

 $\frac{\left\|H\left(\alpha,\rho,\sigma\right)-M^{market}\right\|_{\infty}}{S}\le 0.01$
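In code, this criterion is a one-liner: the worst absolute pricing error across the quotes, measured relative to the spot $S$, must stay within the tolerance. The function and argument names below are our own.

```python
import numpy as np

def fit_ok(h_model, m_market, spot, tol=0.01):
    """True if the worst absolute pricing error, relative to the
    spot price, is within the tolerance of the criterion above."""
    err = np.max(np.abs(np.asarray(h_model) - np.asarray(m_market)))
    return bool(err / spot <= tol)
```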

The data we use contains 1070 dates, each defining a sequence of seven calibration problems (7490 problems in total). All were solved with both approaches; the results can be summarized as follows:

• Total number of calls to the Heston model:
  • DFO: 258540 (avg 241); gradient-based: 963933 (avg 900)
  • DFO needed 3.7 times fewer evaluations.
• Number of calibrations that did not reach 1-basis-point accuracy:
  • DFO: 32 (20 problems); gradient-based: 180 (138 problems)
  • DFO solved 98% of the problems to the required accuracy, against 87% for the gradient-based solver.
• Number of calls to the Heston model on the problems both methods solved:
  • DFO: 211326 (avg 229); gradient-based: 575379 (avg 626)
  • DFO required 2.7 times fewer evaluations.

The figure below captures the speedup of the DFO solver over the finite-difference-based solver as the ratio of the number of function evaluations each needed per problem. Each blue dot represents a problem where the DFO solver was faster, whereas each red dot is in favour of the derivative-based solver. The DFO solver clearly converges much faster on the majority of problems.

Things to remember

Calibrating black-box models is not a trivial problem; you should consider using a derivative-free solver whenever you don’t have access to precise estimates of the derivatives. As the practical Heston calibration example shows, you are likely to obtain a better fit in fewer model calls with DFO solvers.

To see all the previous blogs, please go here. You can also find various examples through our GitHub Local optimization page. See you in the next blog.

References

[1] Coralia Cartis, Jan Fiala, Benjamin Marteau, and Lindon Roberts. Improving the flexibility and robustness of model-based derivative-free optimization solvers. ACM Trans. Math. Softw., 45(3), August 2019.