I spent a few years designing and partially building a library used for the validation of front office models at a financial institution, and I thought it might be a good idea to write down what I learned during that time. It was my first ever big project, and it happened to some degree by accident: building something that I just thought was neat turned into a modest project with its own team. But before we get to all that, let’s first talk briefly about what model validation is.
What is Model Validation?
Whenever you trade financial instruments you end up holding one half of a trade. You sold an option, so you hold a responsibility to fulfill your end of the option deal. If you buy a forward, you hold that forward until you either sell it or it matures. These holdings are important because at the end of the day someone may ask what all these things are worth. You may want to know what they are worth for many reasons, like: how much money do we expect to make on them? But a very common reason is “if we had to sell (or close out) it all right now, and have no holdings, what’s a fair price that we could ask and that people would be willing to pay?” This “fair value” is an important principle. It often ends up in the financial statements of the organization, and it can be quite problematic when it is wrong.
So how do we get those fair values? Sometimes, it’s simple. If you own some publicly traded stock, just take the end of day price of that stock and multiply it by how much stock you own. Other times, it’s really hard. You have some equity in a private startup that writes its financial statements on a napkin? Well, you’ll probably want a team of analysts and have them spend some time looking at just that company to understand how much that equity is worth to someone else. There is a middle ground too (which was our focus): a set of securities and derivatives whose price can’t just be read off a website, but that don’t need a dedicated team of analysts. A simple example is an over-the-counter (OTC) forward with a custom expiry. Let’s say you have a client that wants to buy salmon futures like the ones on Fishpool, which normally expire at the start of every month. However, they would like it to expire at the end of the month instead; let’s say that’s in 2.5 months’ time. “No problem,” you say. Since you know the 2 month salmon futures price, and the 3 month price, the 2.5 month price should probably be somewhere in between those two. So you draw a line between those two prices, take the price halfway along, add your fee or spread, and give your client an offer. At the end of the day your boss comes over and asks, hey, how are we going to put these into our end of day book? “Well, just draw a line between these two points every day, take the halfway point, and use that,” you say, and there we have our model.
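To make that concrete, here is roughly what that back-of-the-envelope model looks like in code. The prices, dates and spread below are made up for illustration:

from datetime import date

# Quoted monthly futures prices (illustrative numbers, not real Fishpool quotes)
price_2m = 68.0   # price of the contract expiring at the start of month 2
price_3m = 70.0   # price of the contract expiring at the start of month 3
expiry_2m = date(2024, 8, 1)
expiry_3m = date(2024, 9, 1)

# Our custom OTC forward expires between the two listed expiries
custom_expiry = date(2024, 8, 16)

# Draw a straight line between the two quotes and read off the price
weight = (custom_expiry - expiry_2m).days / (expiry_3m - expiry_2m).days
fair_value = price_2m + weight * (price_3m - price_2m)

spread = 0.5  # our fee, also made up
client_quote = fair_value + spread

Trivial as it is, that is already a model: a data dependency, an interpolation choice, and an assumption that the line should be straight.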
Of course it doesn’t really look exactly like this; both the trading decisions and the modelling can get a lot more complicated. Definitely no-one is drawing lines in Sharpie on their Bloomberg terminal (I hope). We do it with code, and indeed, that modelling can get so complicated that it can make people uneasy; dozens of numbers go in, one comes out, how can we know it makes any sense? If you have a whole bunch of asset classes and a dozen models for each, keeping track of it all can be quite daunting. Usually, each model will have a significant amount of documentation that needs to go through review and approval before it can be used, but even then, how can you be sure that the code does what the paper says it does? What if you can’t even read the code, because the software you use was sold to you by a company that really doesn’t want to show you that code? For this, you need open source trading and accounting software... I mean, model validation.
There are a few different ways you can do model validation. You can read the model documentation, ponder it a bit, maybe even look at one or two actual trades in the system, and then write up a big document of what you think of it. This covers some bases but not all of them. Another method, the one we settled on, is to take the entire book of trades, attempt to value them ourselves, and then compare those valuations to whatever the trading system spits out. You could do all of this in Excel (and indeed, sometimes we did) but there are open source libraries that can help. We ended up using QuantLib and the Open Source Risk Engine (ORE), the latter of which is kind of a library extension of QuantLib, though we used it mainly as an executable that you’d feed XMLs of data into and get valuations out of.
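In practice that meant generating the input XMLs (portfolio, market data, curve and pricing configurations) with our own code, pointing a master file at them, and shelling out to the ORE executable. A minimal sketch of that loop, assuming the ORE binary is on your PATH; the directory layout and file names here are illustrative rather than a prescription of how ORE must be set up:

import subprocess
from pathlib import Path

run_dir = Path("validation_run")          # illustrative layout
master_xml = run_dir / "ore.xml"          # master file referencing the other inputs

# We generated the portfolio, market data and configuration XMLs ourselves,
# then let ORE do the valuation work.
subprocess.run(["ore", str(master_xml)], check=True)

# ORE writes its results as report files; where they land depends on the
# setup in the master file. We parsed these back in for comparison.
npv_report = run_dir / "Output" / "npv.csv"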
What’s a model validation library?
So you could write up some scripts to pull together some data and feed that into a bunch of XMLs, but we quickly found that there’s both a lot of repetition and a lot of special cases. So we wanted some reusable components (like yield curves, instruments, volatility surfaces and the like) while still having the ability to modify and compose them to deal with unexpected complexity. What we wanted was a library!
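As a sketch of the kind of reuse we were after, here is how the component idea looks with QuantLib’s handle mechanism (the dates and rates are placeholders): build a curve once, hand the same handle to every instrument that needs it, and relink it later when you want to test a variation:

import QuantLib as ql

today = ql.Date(15, ql.June, 2024)
ql.Settings.instance().evaluationDate = today

# One reusable discount curve component...
discount_handle = ql.RelinkableYieldTermStructureHandle()
discount_handle.linkTo(ql.FlatForward(today, 0.03, ql.Actual365Fixed()))

# ...shared by any number of instruments via engines built on the same handle.
engine = ql.DiscountingSwapEngine(discount_handle)

# To test a modelling variation, swap the curve under everything at once.
discount_handle.linkTo(ql.FlatForward(today, 0.035, ql.Actual365Fixed()))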
In retrospect, I am not completely confident that building a library was a good idea versus some alternative. There are upsides and downsides as with all things, but by talking about them, perhaps others, and even my future self, can make use of the information.
Model validation for us was a big game of unknown unknowns. The nature of the work often meant reverse engineering a black box with poor documentation. When we had a theory, we would test it by running small variations of models against each other to see if they matched the black box output, which meant both understanding our own code in detail and having it be stable.
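The comparison step itself was conceptually simple: line up our valuation against the system’s for every trade and flag whatever falls outside a tolerance. A stripped-down sketch, with made-up column names, numbers and threshold:

import pandas as pd

ours = pd.DataFrame({"trade_id": ["T1", "T2"], "npv": [100.0, -250.0]})
theirs = pd.DataFrame({"trade_id": ["T1", "T2"], "npv": [100.4, -210.0]})

merged = ours.merge(theirs, on="trade_id", suffixes=("_ours", "_system"))
merged["abs_diff"] = (merged["npv_ours"] - merged["npv_system"]).abs()
merged["rel_diff"] = merged["abs_diff"] / merged["npv_system"].abs()

# Anything outside tolerance needs an explanation: a modelling difference,
# a data difference, or a bug (on either side).
breaches = merged[merged["rel_diff"] > 0.01]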
We chose to work in Python because prototyping was easy (which was nice, since at the time it was the only language I knew with any depth). This turned out to work quite well. It’s fairly easy to write readable Python, even though we produced unreadable Python in almost equal measure. But this improved over time, and readability was key. We were aware that if we didn’t write readable code, we would just end up with another black box running alongside the existing one. “Look, the black boxes agree with each other!” is not a very inspiring result. We also had fairly high turnover, and that loss of institutional knowledge led to more weight being put on preserving knowledge in our code. We became archeologists not only of the (sometimes abandonware) systems we were validating, but of our own work.
Anyways, here are some more of the things I learned. I’ll start with a finance-y one, and then move on to some more software engineering-y ones.
There is no market standard.
Part of our job was to make sure that all models were to some degree “market standard”. This means that if everyone else trading European options uses Black-Scholes, you should be using Black-Scholes too. This is a good idea: if you do actually want to close out your positions and call someone up, and your models are the same, then you are likely to agree on a fair price. The sale that happens is likely to be close to what the books said it should sell for.
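For a plain European option, “market standard” really does look like the handful of lines below: the same few inputs and the same Black-Scholes machinery everyone else uses. This is a minimal QuantLib sketch with invented numbers, not our production setup:

import QuantLib as ql

today = ql.Date(15, ql.June, 2024)
ql.Settings.instance().evaluationDate = today

spot = ql.QuoteHandle(ql.SimpleQuote(100.0))
rates = ql.YieldTermStructureHandle(ql.FlatForward(today, 0.03, ql.Actual365Fixed()))
dividends = ql.YieldTermStructureHandle(ql.FlatForward(today, 0.0, ql.Actual365Fixed()))
vol = ql.BlackVolTermStructureHandle(
    ql.BlackConstantVol(today, ql.TARGET(), 0.20, ql.Actual365Fixed()))

process = ql.BlackScholesMertonProcess(spot, dividends, rates, vol)
option = ql.EuropeanOption(
    ql.PlainVanillaPayoff(ql.Option.Call, 105.0),
    ql.EuropeanExercise(ql.Date(15, ql.December, 2024)))
option.setPricingEngine(ql.AnalyticEuropeanEngine(process))

print(option.NPV())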
For the simple stuff, that is mostly true. But things rarely turned out simple. What if there aren’t many others trading that thing you trade? What if you are a large enough player that the price is pretty much whatever you say it is (within reason)? What if two perfectly reasonable modelling assumptions lead to different valuations? A lot of the time, these quibbles over market standard did not have much of an impact on the final valuation anyway. “It’s immaterial” gets bandied about a lot. Still, what is immaterial now may be material under stress. High volatility during market turbulence is going to make your choice of interpolation method matter a lot more, for example.
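As a toy illustration of that last point: interpolate the same quotes linearly and with a cubic spline, and the two methods barely disagree while the curve is gentle, but pull apart once one quote blows out. The numbers below are invented:

import numpy as np
from scipy.interpolate import CubicSpline

tenors = np.array([1.0, 2.0, 3.0, 5.0])        # years
calm = np.array([0.20, 0.21, 0.22, 0.23])      # quiet market vols
stressed = np.array([0.20, 0.45, 0.22, 0.23])  # middle quote spikes

query = 1.5
for name, quotes in [("calm", calm), ("stressed", stressed)]:
    linear = np.interp(query, tenors, quotes)
    spline = float(CubicSpline(tenors, quotes)(query))
    print(name, round(linear, 4), round(spline, 4), round(abs(linear - spline), 4))

Neither answer is wrong in any absolute sense, which is exactly why “market standard” is a harder question than it sounds.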
So I guess the thing I learned is that you really have to 1. consider things from first principles, what the expectations are and how these valuations move, and 2. gather as many cases as possible where models failed, stressed cases, surprising cases, and write them down. You want to build an understanding of what might happen in the same way a trader would. This was hard for me as someone who was just a programmer in the risk department, and I wish I had done it better.
Hammers and nails.
In software engineering there are a lot of hammers. Your programming language will offer you lots and lots of tools, hammers, and their number grows the more mature the language is. These are the packages, patterns, syntactic sugar and so on that are available to you to use in your project and that ostensibly save you time and effort. There is a wonderful advantage to maturing alongside the language you use: if you start a serious programming career and choose a new-ish language, your own experience will grow alongside the increasing tools available to you. The features will be added roughly at the same time as you are ready to grasp them, and most importantly you will understand why they were created. You may even be in a position to contribute some of these yourself. Jumping into a decades-old language is the same as jumping into a decades-old codebase. There are so many hammers and you have no idea why they are there. “Don’t use that pattern for that use case, it’s wrong” will be heard often, and it is a useful thing to learn, but it will for a long time feel dogmatic and unsatisfying. In our project I had a bit of both. Some things I used and only later realized I was probably using wrong; here’s an example. In Python you can use a decorator called @property like this:
class FinancialInstrument:
    ...

    @property
    def maturity(self):  # cannot take any arguments except self
        ...  # some calculations
        return some_value_we_need_to_calculate
It’s a neat idea: you can write code as complex as you want while pretending there is no complexity, and just get the value with financial_instrument.maturity. Thing is, that complexity is a risk. What if at some point you want to take an argument, meaning you want to be able to modify the way the maturity is calculated? Tough luck; everywhere else in your code, the callers assumed this was a simple, property-like value. You have made a promise that this value is simple, when in fact you knew it was a bit complicated. In the end it was just a lack of foresight that meant we chose this path, essentially just to eliminate two brackets, and we paid the price whenever we were wrong. It’s a hammer for the world’s tiniest nail.
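Concretely, the pain looks something like this. The instrument below is a hypothetical stand-in, not our actual code:

from datetime import date, timedelta

class FinancialInstrument:
    def __init__(self, issue_date, tenor_days):
        self.issue_date = issue_date
        self.tenor_days = tenor_days

    @property
    def maturity(self):
        # Looked simple enough at the time: issue date plus tenor.
        return self.issue_date + timedelta(days=self.tenor_days)

bond = FinancialInstrument(date(2024, 1, 15), 365)
expired = bond.maturity < date.today()  # callers bake in attribute-style access

# The day you need "maturity, but rolled to a business day", the property has to
# become a method like bond.maturity(roll=...), and every call site above breaks.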
This pattern of more and more hammers, by the way, may be a bit of a curse. We haven’t had that many generations of programmers and languages, but it does feel like all languages get harder and harder to get into until one day someone gets tired of it and makes a new language with whatever they think are the most important features, and new coders flock to it since it is, for the time being, simple. Then, as it adopts new patterns and features and changes to accommodate all the different use cases it inevitably ends up being used for, it becomes the same as all those that came before it. Even my favorite language, Python, with one of its “Zen of Python” principles being “There should be one– and preferably only one –obvious way to do it,” is gathering more hammers at an alarming rate. Reading the PEPs as they come out, I understand each of them in a vacuum, like them even, but I increasingly believe that this is going to end with a complex system that maybe just doesn’t need to be that complex. I’ll be fine, I’ll understand most of it, but hopefully I’ll never look down on people who “should just learn it” without recognizing the advantage that I had.
It really is just about accidental and necessary complexity.
We’re engineers, and that means we solve problems. Not problems like, “what does the product need”, because that would fall within the purview of the higher ups. We solve practical problems.
Right?
Well, no. The value of any solution comes from solving real, hard problems; it encapsulates the necessary complexity. The product, taken as a whole, is inextricably linked to its own value proposition, and how well it delivers on that trickles all the way down into which parts of the code are necessary and which are accidental. In our case, the necessary complexity was “how do we translate from one set of objects, data and modelling methods (our trading system) to another one (ORE)?” Then there were lots of times when we would be fighting our own code and realize that we could rewrite or even delete some part of it. This was unnecessary complexity that we had introduced, and that, if we did not resolve it, would become the technical debt that would slowly rot the project.
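The shape of that necessary complexity was, at its core, translation code: take a trade record expressed in the trading system’s terms and re-express it in ORE’s terms. A heavily simplified sketch; the field names and XML elements below are illustrative stand-ins, not the actual schemas on either side:

import xml.etree.ElementTree as ET
from datetime import date

def source_trade_to_xml(trade: dict) -> ET.Element:
    # 'trade' is a record pulled from the trading system; the keys are
    # stand-ins for whatever the source system actually calls these fields.
    node = ET.Element("Trade", id=trade["deal_number"])
    ET.SubElement(node, "TradeType").text = trade["product"]
    ET.SubElement(node, "Notional").text = str(trade["notional"])
    ET.SubElement(node, "Maturity").text = trade["maturity"].isoformat()
    return node

xml_node = source_trade_to_xml(
    {"deal_number": "12345", "product": "FxForward",
     "notional": 1_000_000, "maturity": date(2026, 3, 31)})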
Except, zoom out: earlier in this post I made the tongue-in-cheek remark that what we really needed was open source trading and accounting software. I don’t know if that is possible, but if it is, the whole idea of model validation may, in a sense, be unnecessary complexity. Indeed, we couldn’t edit the source system APIs we were interfacing with, so when we saw something that we could reasonably guess was kind of dumb, unnecessary complexity, any code we wrote to work around it “inherited” that property, i.e. it was also unnecessary. For example, say there are two variations of an instrument that the source system stores in two separate relational database tables, but you really, truly believe it should have been just one table with an additional column. Then any code you write to handle those two tables will feel unnecessary. It’s tech debt you can’t fix, and it’s quite demoralizing.
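That two-table situation translates directly into code you resent writing: two near-identical loaders whose only reason to exist is an upstream modelling decision you cannot change. The table and column names below are invented, and conn is assumed to be an ordinary DB-API connection:

def load_fixed_forwards(conn):
    return conn.execute("SELECT deal_id, notional, rate FROM FWD_FIXED").fetchall()

def load_float_forwards(conn):
    # Identical in every way that matters; exists only because the source
    # system split one instrument across two tables.
    return conn.execute("SELECT deal_id, notional, rate FROM FWD_FLOAT").fetchall()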
All that is to say, unnecessary complexity trickles down. The higher up it is, the worse it will be, both because replacing it means more broken dependencies and lost work, but also because the demoralizing effect it has reaches more people, all of those who see it for what it is—inefficiency.
I also wrote about where complexity comes from, both the necessary and the unnecessary, in my first ever post Complexity Fills the Space it’s Given.
I don’t know if I could do it all again.
This post has at times sounded a bit cynical and worrisome, and in all honesty that’s a problem. When I started out, naivety drove me to make a lot of mistakes, but at least it drove me. That motivation, the feeling that you are building something unique where the tradeoffs aren’t so important because it’s just such a good idea, is worth something, and as the project matured I felt like I had lost some of it. My work became more careful, measured, of higher quality, but there was objectively less of it. Maybe it was more about the project and less about me, but I don’t know. I haven’t done a multi-year undertaking since. Maybe I’ll update this once I have.