Is Your Software Trustworthy?
by Rob Meyer
While visiting one of our customers a while back I learned that their new financial modeling application, using some of our code, would soon be in the hands of hundreds of retail account managers. I found myself alternately proud that they had the confidence to use our code in this high-profile application and slightly uneasy that someone other than a financial engineer was about to make decisions worth substantial sums based upon the answers. I have the utmost confidence in the abilities of our mathematicians, statisticians and software engineers but I have spent enough years around mathematics and technical software to appreciate a few of the things that can go wrong. The experience made me think about trends in financial applications, the esoteric realities of numerical software quality and the implications for how our organization (and yours) develops software.
In an era where organizations are putting increasingly sophisticated financial modeling software in the hands of employees and customers, it is critical to build in the quality that inspires trust from users who will never see a line of code or a mathematical formula. In the rest of this article I'll talk about trends I see in the development of financial applications and why code quality is increasingly important. I'll also outline some of the steps our organization takes to ensure quality. Though we've been creating, porting and supporting numerical software for 35 years, we are still learning. Perhaps you'll see some steps you can adapt for your software development organization or some questions to ask about any third-party software you consider adopting.
Trends: Sophisticated Applications/Unsophisticated Users?
Our organization has been serving financial customers for more than twenty years. We have seen a number of trends emerge but some of the most profound have been in just the past five years. Among them:
- Software applications are outliving their hardware and creators. It's not unusual for a large organization to invest $500,000 to $1 million or more in labor and other costs in creating a complex application. The hardware it runs on will be scrap in three to four years. Many on the development team will have moved on to other roles and companies in the same timeframe. Companies will make incremental adjustments and port the application to new hardware/operating system versions but they will be loath to rewrite it. My personal object lesson in this phenomenon came five years ago when I visited the research group where I did my graduate studies in the late 1970s. When I introduced myself to one of the doctoral students, he told me that he had just converted a modeling application I wrote from Fortran to C. I was tempted to ask him how many bugs he found while doing the conversion!
- Sophisticated applications are increasingly in the hands of "ordinary mortals" (i.e., not financial engineers) making important business decisions. Gaining business advantage from an application often means enabling more people to use it to make decisions. These users are generally knowledgeable about their jobs but less so about mathematics, statistics and computer software. They don't have the means to validate the results.
- With longer lives, many applications are being ported one or more times to new chip architectures and operating system versions. Even straightforward upgrades can introduce multiple changes in operating systems, compiler options, underlying math libraries and so on. Even without any changes to the application logic, these other changes can introduce errors into the ported application.
- Enterprises are also putting sophisticated financial modeling software into the hands of their customers as the means to deliver services. In some instances, the software modeling application is the product sold to the customer. These customer organizations, in turn, are putting the software into the hands of an increasingly large and diverse staff to gain the business advantage promised by the software.
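The porting risk in the list above can be made concrete with a minimal Python sketch. It is an illustration, not code from any real application. The `flows` values are hypothetical, chosen so that the effect is visible. The sketch assumes that a new compiler or math library adds the same numbers in a different order, which is one common way a port changes results without any change to the application logic.

```python
import math

# Hypothetical cash flows: a large inflow, a small fee, and the matching outflow.
flows = [1e16, 1.0, -1e16]

# Left-to-right accumulation, as a simple loop would do it.
left_to_right = 0.0
for x in flows:
    left_to_right += x

# The same three numbers added in a different order, as a vectorizing
# compiler or a different math library might do (an assumption for this sketch).
reordered = 0.0
for x in [1e16, -1e16, 1.0]:
    reordered += x

# math.fsum tracks intermediate rounding error and returns the correctly rounded sum.
reference = math.fsum(flows)

print(left_to_right, reordered, reference)  # 0.0 1.0 1.0
```

The first loop loses the small value entirely because 1e16 + 1.0 rounds back to 1e16 in double precision. A user who cannot validate the result has no way to tell which of the two answers to trust.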
In the face of an increasingly large and diverse population of users, what do we do to ensure that our code meets user expectations, namely "fast, accurate and never breaks"?
To gain some perspective on what can be done, I sat down with two of our most experienced developers and software engineers to talk about the steps we use, from the selection of the algorithm itself to the delivery of product code to the customer. While some of these may be unique to a commercial software firm or to numerical code, you may find that you can use them in your application development process. You may also wish to use these as a guide in evaluating code you are considering adopting, whether commercial, open-source or internally developed.
Steps on the Road to Quality
- Algorithm Selection: In many areas of mathematics and statistics, there are several methods to solve the same problem. Each is likely to have both advantages and disadvantages. For example, method "A" may be computationally faster to reach a solution while "B" is more robust in handling extreme cases of data or poorly formed problems. Also, some methods are amenable to measuring their own errors while others are not; this is especially important when the code needs to be ported to a new platform. Take the time to evaluate and summarize each of the methods. Better still, have a peer of the first reviewer independently look at the report before choosing. Finally, remember that speed and robustness are often in competition with each other, i.e., the faster method may provide less accuracy or may break more easily. Our general bias is to err on the side of robustness, since each new generation of hardware makes existing code run faster. The shorthand we use for this choice is the rhetorical question: "How fast do you want the wrong answer?"
- Code Engineering: At this stage, our process divides into three parallel tasks: core algorithmic coding, interface design and documentation. The core algorithmic code, where the guts of the computation take place, is documented in XML (Extensible Markup Language). While a user doesn't see this, it permits the automatic adaptation of the documentation to different languages and interfaces without a manual translation (and the errors that can come with it). We separate the interface design because the world is a "moving target" of languages and styles. Abstracting the interface of an algorithm into XML permits software tools to perform the translation to a new environment and eliminates most errors that result from a manual process. The core algorithmic code itself is written under standards that emphasize portability. While it might be tempting to use certain trendy language extensions, we code first for portability and adhere to a set of internally developed standards for things like variable naming and interface design. This point comes home the first time a routine has to undergo a major rewrite as it is being moved to a new environment. At this stage we also subject the code to a host of software tools for validating argument lists, checking for uninitialized variables and finding memory leaks. The result is a careful blend of strict coding standards, design for portability and the use of automated tools to reduce human error.
- Code Engineering Quality Assurance: This is an independent peer review of the core code, interface and a proofreading of documentation to ensure that the developer has adhered to coding standards, run required tools and properly documented the code. You could question this seeming fussiness if only a few routines were involved, but we have over 1,500 at the user-callable level alone. Even for a dozen or more complex routines, code, interface and documentation standards and automated tools will reduce errors and improve the longevity of the code.
- Overnight "build": Using the base core code, interface, documentation, stringents (our test programs, described under Testing) and example programs, we build finished executables each night during the development process on six or more systems (combinations of chip hardware, operating system and compiler) simultaneously using an automated process, logging all results. This system tends to find both systemic code errors and ones that are unique to a particular compiler. This "short loop" system means that errors are caught earlier and portability problems across multiple platforms surface while they are still cheap to fix.
- Testing: Simply put, the temptation is great to short-circuit this step. Within our code base we have 30-year-old routines and six-month-old routines. We plan for the latter to be around as long as the former. We accomplish this by investing more time in test programs (called "stringents") than we do in writing the core code. These stringents exercise all error exits and code paths for a broad range of problems, including "edge cases" where the algorithm begins to fail. Stringents are often two to three times longer than the core code they test. Errors revealed return to Code Engineering for further development. We also use related test programs to assure that the interfaces work properly and example programs to conduct simple tests of the integrated code. These example programs also exercise all error messages to confirm that the messages are meaningful.
- Implementation: After testing, we build the production version of the code with all of the base materials. Part of this process is determining proper compiler "flags" necessary to get an acceptable compromise of performance and accuracy. In addition, we will often test the code on "variants" (slightly different versions of operating systems and compilers) to advise users about other workable variations. Finally, we check the installer and user advisory notes to make certain that they conform to the test system and results.
- Quality Assurance: This is an independent check of installation on the target system. It also includes execution of example programs and a check of stringent test results and installer/user notes. From this, a master CD and set of download installation files is created. The master and download files are then used to do a final test installation.
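Two of the steps above, Algorithm Selection and Testing, can be sketched together in a few lines of Python. This is a hedged illustration only and not our production code. The choice of sample variance as the problem, the function names `variance_fast`, `variance_robust` and `run_stringent`, and the tolerances are all assumptions made for this example.

```python
import math


def variance_fast(xs):
    """One-pass textbook formula. Quick, but it subtracts two huge, nearly equal sums."""
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return (sq - s * s / n) / (n - 1)


def variance_robust(xs):
    """Two-pass formula with explicit error exits for bad input."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    if not all(math.isfinite(x) for x in xs):
        raise ValueError("observations must be finite")
    mean = math.fsum(xs) / len(xs)
    return math.fsum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def run_stringent(variance):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []

    # Well-conditioned case with a known answer of 5/3.
    if not math.isclose(variance([1.0, 2.0, 3.0, 4.0]), 5.0 / 3.0, rel_tol=1e-12):
        failures.append("basic case")

    # Edge case: the same spread shifted by a large offset (think of prices
    # or balances that are large relative to their fluctuations).
    shifted = [1e9 + x for x in (1.0, 2.0, 3.0, 4.0)]
    if not math.isclose(variance(shifted), 5.0 / 3.0, rel_tol=1e-9):
        failures.append("large offset")

    # Error exits: each bad input must be rejected with a ValueError.
    for bad in ([], [1.0], [1.0, float("nan")], [1.0, float("inf")]):
        try:
            variance(bad)
        except ValueError:
            continue
        except Exception as exc:
            failures.append(f"wrong exception for {bad!r}: {type(exc).__name__}")
        else:
            failures.append(f"no error for {bad!r}")
    return failures


print("robust:", run_stringent(variance_robust))
print("fast:  ", run_stringent(variance_fast))
```

Under these assumptions the robust routine passes every check, while the fast routine passes the basic case but fails the large-offset case and all of the error-exit checks. Note also that the test harness is already longer than either routine it tests, which matches our experience with stringents.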
By now, you are probably mentally exhausted and feeling slightly numb. You may also be asking yourself, "Why should I devote this much effort to complex code that goes into my financial modeling applications?" The answer depends on many factors, including the expected longevity of the application and the financial and other consequences of getting it wrong. We take this much care because our users, especially those running the same application on multiple platforms, need to have equal confidence in the correctness of any of 40-50 different implementations. We're also thinking about the next operating system version, chip architecture and compiler improvement. Having done it for nearly 35 years, we aren't about to become short-sighted now.
What useful advice could you take away from this? First, take the time to think through your algorithmic needs, people resources and time horizons. The best method for your situation might come from an open source project on the web, someone on your staff, a published source such as Numerical Recipes or a supported commercial library like that provided by the Numerical Algorithms Group or others. Keep in mind where we started: complex financial modeling applications are costly to develop and usually outlive the hardware and often the developers themselves. Their life cycle costs are dominated by the development staff hours spent building, debugging, maintaining and porting to the next platform. What's the right formula for your next project?
Rob Meyer is the president and CEO of Numerical Algorithms Group (NAG).