Using Returns in Pairs Trading

This blog article is taken from our book [1].

In most entry-level materials on pairs trading, such as [2], a mean-reverting basket is usually constructed from this relationship:

\(P_t - \gamma Q_t = Z_t, \textrm{(eq. 1)}\)

where \(P_t\) is the price of asset \(P\) at time \(t\), \(Q_t\) the price of asset \(Q\) at time \(t\), and \(Z_t\) the price of the mean-reverting asset to trade. One way to find \(\gamma\) is cointegration. There are numerous problems with this approach, as detailed in [1]. To mention a few: the identified portfolios are dense; executions involve considerable transaction costs; the resultant portfolios behave like insignificant and non-tradable noise; and cointegration is too stringent and often an unnecessary requirement to satisfy.
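For concreteness, here is a minimal Python sketch of the price-space approach in (eq. 1) on synthetic data (all numbers are invented): estimate \(\gamma\) by least squares and eyeball the mean reversion of the residual via its AR(1) half-life. A real implementation would use a proper cointegration test (e.g. Engle-Granger) rather than this rough check.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic cointegrated pair: Q is a random walk, P = gamma * Q + stationary noise.
n = 2000
true_gamma = 1.5
Q = 100 + np.cumsum(rng.normal(0, 1, n))   # random-walk price path of asset Q
z = np.zeros(n)                            # AR(1), i.e. mean-reverting, spread
for t in range(1, n):
    z[t] = 0.9 * z[t - 1] + rng.normal(0, 1)
P = true_gamma * Q + z                     # price path of asset P

# Estimate gamma by least squares: P_t = gamma * Q_t + c  (eq. 1)
gamma, c = np.polyfit(Q, P, 1)
Z = P - gamma * Q                          # the spread we would trade

# Half-life of mean reversion from an AR(1) fit: Z_t = phi * Z_{t-1} + eps
phi = np.polyfit(Z[:-1], Z[1:], 1)[0]
half_life = -np.log(2) / np.log(abs(phi))
print(f"gamma ~ {gamma:.2f}, half-life ~ {half_life:.1f} steps")
```

With the simulated \(\phi = 0.9\), the recovered half-life is around \(-\ln 2/\ln 0.9 \approx 7\) steps.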

This article highlights one important problem: it is much better to work in the space of (log) returns than in the space of prices. Therefore, we would like to build a mean reverting portfolio using a similar relationship to (eq. 1) but in returns rather than in prices.

The Benefits of Using Log Returns

When we compare the prices of two assets, the comparison depends on arbitrary scales: a $1 move in a $500 stock means something very different from a $1 move in a $5 stock. Log returns are scale-free, additive over time, and directly comparable across assets, which makes them a more natural space in which to model the co-movement of a pair.


A Model for a Mean Reverting Synthetic Asset

Let’s assume prices are log-normally distributed, a popular assumption in quantitative finance, especially in options pricing [5]. Then prices are always positive, satisfying the “limited liability” condition of stocks, while the upside is unbounded. We have:

\(P_t = P_0\exp(r_{P,t}) \\ Q_t = Q_0\exp(r_{Q,t}), \textrm{(eq. 2)}\)

\(r_{P,t}\) is the return for asset \(P\) between times 0 and t; likewise for asset \(Q\).

Instead of applying a relationship such as cointegration (possible, but not a very good way) to the pair of prices, we can apply it to returns. This is possible because, just like log prices, the cumulative returns \(r_{P,t}\) and \(r_{Q,t}\) are random walks, hence \(I(1)\) series. We have (dropping the time subscript):

\(r_P - \gamma r_Q = Z, \textrm{(eq. 3)}\)

Of course, this \(\gamma\) is a different coefficient, and this \(Z\) a different noise term.

Remove the Common Risk Factors

Let’s consider this scenario. Suppose the oil price suddenly drops by half (as is developing in the current market). Exxon Mobil (XOM), being an oil company, follows suit. American Airlines (AAL), on the other hand, saves costs on fuel and may rise. The naive (eq. 3) will show a big disequilibrium and signal a trade on the pair. However, this disequilibrium is spurious. Both XOM and AAL are simply reacting to the new market/oil regime and adjusting their “fair” prices accordingly. (Eq. 3) fails to account for the oil factor common to both companies. Mean reversion trading should trade only on idiosyncratic risks that are not driven by systematic risks.

To improve upon (eq. 3), we need to remove systematic risks or common risk factors from the equation. Let’s consider CAPM. It says:

\(r = r_f + \beta (r_M - r_f) + \epsilon, \textrm{(eq. 4)}\)

The asset return, \(r\), and \(\epsilon\), are normally distributed random variables. The average market return, \(r_M\), and the risk free rate, \(r_f\), are constants.

Substituting (eq. 4) into the L.H.S. of (eq. 3) and grouping some constants, we have:

\((r_P - \beta_P (r_M-r_f)) - \gamma (r_Q - \beta_Q (r_M-r_f)) = \epsilon + \mathrm{constant}\)

To simplify things:

\((r_P - \beta_P r_M) - \gamma (r_Q - \beta_Q r_M) = \epsilon + \gamma_0, \textrm{(eq. 5)}\)

where \(\gamma_0\) is a constant.
(Eq. 5) removes the market/oil effect from the pair. When the market simply moves to a new regime, our pair should not change its value. In general, for \(n\) assets, we have:

\(\gamma_0 + \sum_{i=1}^{n}\gamma_i (r_i - \beta_i r_M) = \epsilon, \textrm{(eq. 6)}\)

For \(n\) assets and \(m\) common risk factors, we have:

\(\gamma_0 + \sum_{i=1}^{n}\gamma_i \left(r_i - \sum_{j=1}^{m}\beta_{ij}F_j\right) = \epsilon, \textrm{(eq. 7)}\)
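A sketch of (eqs. 5–6) on synthetic data (the betas, \(\gamma\), and volatilities below are all invented): strip the market factor out of each asset's returns by regressing on the market, then form the spread from the residual (idiosyncratic) returns.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic daily returns: each asset = beta * market + idiosyncratic component.
n = 1500
r_M = rng.normal(0.0003, 0.01, n)                  # market returns
idio_Q = rng.normal(0, 0.005, n)                   # idiosyncratic part of Q
idio_P = 0.8 * idio_Q + rng.normal(0, 0.001, n)    # P's idio part tracks Q's (gamma = 0.8)
r_P = 1.2 * r_M + idio_P                           # asset P: beta_P = 1.2
r_Q = 0.9 * r_M + idio_Q                           # asset Q: beta_Q = 0.9

# Step 1: estimate each asset's beta against the market (slope of OLS fit).
beta_P = np.polyfit(r_M, r_P, 1)[0]
beta_Q = np.polyfit(r_M, r_Q, 1)[0]

# Step 2: residual returns with the market factor removed, as in (eq. 5).
resid_P = r_P - beta_P * r_M
resid_Q = r_Q - beta_Q * r_M

# Step 3: estimate gamma on the residuals and form the spread.
gamma_hat = np.polyfit(resid_Q, resid_P, 1)[0]
spread = resid_P - gamma_hat * resid_Q             # should be small, market-neutral noise
print(f"beta_P ~ {beta_P:.2f}, beta_Q ~ {beta_Q:.2f}, gamma ~ {gamma_hat:.2f}")
```

A market-wide shock moves both \(r_P\) and \(r_Q\) through their betas but leaves the spread essentially unchanged, which is exactly the point of (eq. 5).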

Trade on Dollar Values

It is easy to see that if we use (eq. 1) to trade the pair, then to long (short) \(Z\), we buy (sell) 1 share of \(P\) and sell (buy) \(\gamma\) shares of \(Q\). How do we trade using (eqs. 3, 5, 6, 7)? When we work in log-return space, we trade, for each stock \(i\), \(\gamma_i\) dollars' worth of that stock. That is, we trade \(\gamma_i/P_i\) shares of stock \(i\), where \(P_i\) is the current price of stock \(i\).

Let’s rewrite (eq. 3) in the price space.

\(\log(P/P_0) - \gamma \log(Q/Q_0) = Z\)

The L.H.S. is

\(\log(P/P_0) - \gamma \log(Q/Q_0) = \log(1 + \frac{P-P_0}{P_0}) - \gamma \log(1 + \frac{Q-Q_0}{Q_0})\)

Using the relationship \(\log(1+r) \approx r, r \ll 1\), we have

\(\log(1 + \frac{P-P_0}{P_0}) - \gamma \log(1 + \frac{Q-Q_0}{Q_0}) \approx \frac{P-P_0}{P_0} - \gamma \frac{Q-Q_0}{Q_0} \\ = (\frac{P}{P_0} -1) - \gamma (\frac{Q}{Q_0} -1) \\ = \frac{1}{P_0}P - \gamma \frac{1}{Q_0}Q + \mathrm{constant} \\= Z\)

Dropping the constant, we have:

\(\frac{1}{P_0}P - \gamma \frac{1}{Q_0}Q = Z, \textrm{(eq. 8)}\)

That is, to long \(Z\), we buy \(\frac{1}{P_0}\) shares of \(P\) at price \(P_0\) and sell \(\frac{\gamma}{Q_0}\) shares of \(Q\) at price \(Q_0\). We can easily extend (eq. 8) to the general cases: we trade, for each stock \(i\), \(\gamma_i/P_i\) shares.
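The bookkeeping above can be sketched in a few lines (the prices and \(\gamma\) below are purely illustrative): each return-space weight \(\gamma_i\) is a dollar exposure, so stock \(i\) gets \(\gamma_i/P_i\) shares.

```python
def shares_to_trade(gammas, prices):
    """Convert return-space weights (dollar exposures) into share counts:
    stock i gets gamma_i / P_i shares, i.e. gamma_i dollars of stock i."""
    return [g / p for g, p in zip(gammas, prices)]

# Long the spread of (eq. 8) with gamma = 1.5: +$1 of P, -$1.5 of Q.
gammas = [1.0, -1.5]     # +P leg, -gamma * Q leg
prices = [50.0, 20.0]    # hypothetical current prices P_0, Q_0
shares = shares_to_trade(gammas, prices)
print(shares)            # 0.02 shares of P, -0.075 shares of Q
```

In practice the fractional share counts would be scaled up by the total capital allocated to the spread and rounded to the exchange's lot size.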


  1. Numerical Methods in Quantitative Trading, Dr. Haksun Li, Dr. Ken W. Yiu, Dr. Kevin H. Sun
  2. Pairs Trading: Quantitative Methods and Analysis, by Ganapathy Vidyamurthy
  3. Identifying Small Mean Reverting Portfolios, Alexandre d’Aspremont
  4. Developing high-frequency equities trading models, Infantino
  5. The Econometrics of Financial Markets, John Y. Campbell, Andrew W. Lo, & A. Craig MacKinlay

SuanShu is the Best Numerical and Statistical Library, ever!

a picture is worth a thousand words…

SuanShu linear algebra API performance

Check out the release notes here:

Happy birthday to me 🙂

Certificate in Quantitative Investment (CQI)

NM FinTech has the vision of promoting rational investment and trading. Jointly organized with top universities, we are offering a 6-month, 9-course program that teaches mathematics, programming, and quantitative/algorithmic trading. We invite famous and established traders from Wall Street banks and funds to share their experience. Students may choose to participate in the classroom or online. More information can be found here:

Change of Measure/Girsanov’s Theorem Explained

Change of Measure, or Girsanov’s Theorem, is such an important theorem in real analysis and quantitative finance. Unfortunately, I never really understood it until long after leaving school. I blamed the professors and the textbook authors, of course. The textbook version usually goes like this.

Given a probability space \((\Omega,\mathcal{F},P)\) and a non-negative random variable \(Z\) satisfying \(\mathbb{E}(Z) = 1\) (why 1?), we define a new probability measure \(Q\) by the formula, for all \(A \in \mathcal{F}\):

\(Q(A) = \int_A Z(\omega)\,dP(\omega)\)

Any random variable \(X\) measurable with respect to \(\mathcal{F}\) now has two expectations: one under the original probability measure \(P\), denoted \(\mathbb{E}_P(X)\), and the other under the new probability measure \(Q\), denoted \(\mathbb{E}_Q(X)\). They are related by the formula

\(\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)\)

If \(P(Z > 0) = 1\), then \(P\) and \(Q\) agree on the null sets. We say \(Z\) is the Radon-Nikodym derivative of \(Q\) with respect to \(P\), and we write \(Z = \frac{dQ}{dP}\). To remove the mean, \(\mu\), of a Brownian motion (in the one-period version, take \(X\) to be standard normal under \(P\)), we define

\(Z=\exp \left ( -\mu X - \frac{1}{2} \mu^2 \right )\)

Then under the probability measure Q, the random variable Y = X + μ is standard normal. In particular, \(\mathbb{E}_Q(X) = 0\) (so what?).

This text made no sense to me when I first read it in school. It was very frustrating that the text was filled with unfamiliar terms like probability space and adaptedness, and scary symbols like integration and \(\frac{dQ}{dP}\). (I knew what \(\frac{dy}{dx}\) meant when \(y\) was a function and \(x\) a variable. But what on earth was \(dQ\) over \(dP\)?)

Now that I have become a professor teaching students in finance and financial math, I get rid of all the jargon and rigor. I focus on the intuition rather than the mathematical details (traders are not mathematicians). Here is my layman’s version.

Start with a probability measure \(P\). A probability measure is just a function that assigns numbers to outcomes, e.g., 0.5 to heads and 0.5 to tails for a fair coin. There could be another measure \(Q\) that assigns different numbers to heads and tails, say 0.6 and 0.4 (an unfair coin)! Assume \(P\) and \(Q\) are equivalent, meaning that they agree on which events are possible (positive probability) and which events have probability 0. Is there a relation between \(P\) and \(Q\)? The answer turns out to be a resounding yes!

Let’s define \(Z=\frac{Q}{P}\). \(Z\) here is a function, as \(P\) and \(Q\) are just functions. \(Z\) evaluates to 0.6/0.5 = 1.2 on heads and 0.4/0.5 = 0.8 on tails. Then we have

\(\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)\)

This is intuitively true by doing some symbol cancellation. Forget about the proof, even though it is quite easy, like two lines; we traders don’t care about proofs. Therefore, the distribution of X under Q is (by plugging an indicator function into the last equation):

\(Q(X \in A) = \mathbb{E}_P(I(X \in A)Z)\)

Moreover, setting \(X = 1\), we have (recall that \(Z\) here is a random variable):

\(\mathbb{E}_Q(1) = 1 = \mathbb{E}_P(Z)\)
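The coin example can be checked numerically (a toy sketch; the payoff \(X\) is arbitrary and made up): with \(P = (0.5, 0.5)\), \(Q = (0.6, 0.4)\), and \(Z = Q/P\), the identity \(\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)\) holds for any payoff \(X\).

```python
# Two outcomes: heads, tails. X is an arbitrary payoff on the coin flip.
P = [0.5, 0.5]                      # fair-coin measure
Q = [0.6, 0.4]                      # unfair-coin measure
Z = [q / p for q, p in zip(Q, P)]   # Radon-Nikodym derivative: [1.2, 0.8]

X = [10.0, -5.0]                    # hypothetical payoff: win 10 on heads, lose 5 on tails

E_Q_X = sum(q * x for q, x in zip(Q, X))              # expectation under Q
E_P_XZ = sum(p * x * z for p, x, z in zip(P, X, Z))   # E_P(X Z)
print(E_Q_X, E_P_XZ)                # both 4.0

# And E_P(Z) = 1, so Q is indeed a probability measure.
assert abs(sum(p * z for p, z in zip(P, Z)) - 1.0) < 1e-12
```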

These results hold in general, especially for the Gaussian random variable and hence Brownian motion. Suppose we have a random (i.e., stochastic) process generated by (adapted to) a Brownian motion and it has a drift μ under a probability measure P. We can find an equivalent measure Q so that under Q, this random process has a 0 drift. Wiki has a picture that shows the same random process under the two different measures: each of the 30 paths in the picture has a different probability under P and Q.

The change of measure, Z, is a function of the original drift (as would be guessed) and is given by:

\(Z=\exp \left ( -\mu X - \frac{1}{2} \mu^2 \right )\)

For a 0-drift process, hence no expected increment, the expectation of the future value of the process is the same as the current value (a layman’s way of saying that the process is a martingale). Therefore, with the ability to remove the drift of any random process (by finding a suitable \(Q\) using the \(Z\) formula), we are ready to do options pricing.
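The drift-removal formula can be sanity-checked by Monte Carlo (a sketch for a single Gaussian step rather than a full Brownian path; the drift and sample size are arbitrary): if \(X\) is standard normal under \(P\) and \(Z = \exp(-\mu X - \mu^2/2)\), then reweighting by \(Z\) makes \(Y = X + \mu\) behave like a standard normal, i.e., a zero-drift variable.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.5
X = rng.standard_normal(500_000)     # standard normal under P
Z = np.exp(-mu * X - 0.5 * mu**2)    # change of measure dQ/dP
Y = X + mu                           # the drifted variable

E_P_Z = Z.mean()                     # should be ~1: Q is a probability measure
E_Q_Y = (Z * Y).mean()               # E_Q(Y) = E_P(YZ), should be ~0: drift removed
E_Q_Y2 = (Z * Y**2).mean()           # E_Q(Y^2), should be ~1: unit variance preserved
print(E_P_Z, E_Q_Y, E_Q_Y2)
```

The three sample averages land close to 1, 0, and 1 respectively, up to Monte Carlo error.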

Now, if you understand my presentation and go back to the textbook version, you should have a much better understanding and easier read, I hope.


FREE .NET/C# Numerical/Math library

On this Christmas Day, we are happy to announce that the .NET version of SuanShu is FREE for all! It has all the features of its Java sibling and has undergone the same many thousands of test cases daily.

There are tutorials and examples that show you how to build a SuanShu application in Visual Studio. One major advantage over the Java version is that it integrates seamlessly with Microsoft Excel. By incorporating the SuanShu library in your spreadsheet, you literally have access to hundreds of numerical algorithms when manipulating and analyzing your data, significantly enhancing Excel’s productivity.

We hope that you enjoy using SuanShu in your work. If you have any interesting stories, comments, or feedback, we’d love to hear from you.

Start downloading now!

SuanShu 2.0

We are proud to announce the release of SuanShu 2.0! This release is the accumulation of customer feedback and the experience we gained over the last three years of coding numerical computation algorithms. SuanShu 2.0 is a redesign of the software architecture; a rewrite of many modules; additions of new modules and functionality driven by user demands and applications; numerous bug fixes; and performance tuning. We believe that SuanShu 2.0 is the best numerical and statistical library available in Java, if not on any platform.

Here are highlights of the new features available since 2.0.

- Ordinary and partial differential equation solvers
- Optimization: quadratic programming, sequential quadratic programming, (mixed) integer linear programming, semi-definite programming
- ARIMA fitting
- LASSO and LARS
- Gaussian quadrature/integration
- Interpolation methods
- Trigonometric functions and physical constants
- Extreme value theory

Continuing our tradition, we will still provide trial licenses and academic licenses for eligible schools and research institutes. Moreover, we now provide another way to get a FREE SuanShu license: the contribution license. If you can contribute code to the SuanShu library, you can get a permanent license. For more information, see:

We hope that you will find the new release of SuanShu more helpful than ever in your work. If you have any comments to help us improve, please do let us know.

Happy birthday to TianTians and Merry Christmas to all!


Trading and Investment as a Science

Here is the synopsis of my presentation at HKSFA, September 2012. The presentation can be downloaded from here.


Many people lose money playing the stock market. The strategies they use are nothing but superstitions. There is no scientific reason why, for example, buying on a breakout of the 250-day moving average would make money. Trading profits do not come from wishful thinking, ad-hoc decisions, gambling, or hearsay, but from diligent, systematic study.
• Moving average as a superstitious trading strategy.


Many professionals make money playing the stock market. One approach to investment decision or trading strategy is to treat it as a science. Before we make the first trade, we want to know how much money we expect to make. We want to know in what situations the strategy will make (or lose) money and how much.
• Moving average as a scientific trading strategy


There are many mathematical tools and theories that we can use to quantify, analyse, and verify a trading strategy. We will showcase some popular ones.
• Markov chain (a trend-following strategy)
• Cointegration (a mean-reversion strategy)
• Stochastic differential equations (the best trading strategy, ever!)
• Extreme value theory (risk management, stop-loss)
• Monte Carlo simulation (what are the success factors in a trading strategy?)

Data Mining

The good quant trading models reveal the nature of the market; the bad ones are merely statistical artifacts.

One of the most popular ways to create a spurious trading model is data snooping, or data mining. Suppose we want to create a model to trade AAPL daily. We download some data, e.g., 100 days of AAPL prices, from Yahoo. If we work hard enough with the data, we will find a curve (model) that explains the data very well. For example, the following curve fits the data perfectly.

Suppose the prices are \(x_1, x_2, \dots, x_n\) on days \(1, 2, \dots, n\). The Lagrange interpolating polynomial passes through every data point exactly:

\(x(t) = \frac{(t-2)(t-3)\dots(t-n)}{(1-2)(1-3)\dots(1-n)}x_1 + \frac{(t-1)(t-3)\dots(t-n)}{(2-1)(2-3)\dots(2-n)}x_2 + \dots + \frac{(t-1)(t-2)\dots(t-n+1)}{(n-1)(n-2)\dots(n-n+1)}x_n\)
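The trap is easy to reproduce (a sketch with made-up "prices"): a degree-\((n-1)\) polynomial through \(n\) points, which is exactly the Lagrange interpolant, fits the sample perfectly yet says nothing about day \(n+1\).

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 9, dtype=float)      # days 1..8
x = 100 + rng.normal(0, 2, t.size)    # made-up daily closing prices

# A degree n-1 polynomial through n points == the Lagrange interpolant.
coeffs = np.polyfit(t, x, deg=t.size - 1)
fitted = np.polyval(coeffs, t)

in_sample_error = np.max(np.abs(fitted - x))   # essentially zero: a "perfect" model
tomorrow = np.polyval(coeffs, 9.0)             # extrapolation: typically wild
print(in_sample_error, tomorrow)
```

The in-sample error is zero up to floating-point noise, while the one-day extrapolation is dominated by the highest-degree term and is useless for trading.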

Of course, most of us are judicious enough to avoid this obvious over-fitting formula. Unfortunately, some may fall into the same trap in disguise. Let’s say we want to understand what factors contribute to AAPL’s price movements, or returns. (We now have 99 returns.) We come up with a list of 99 possible factors, such as P/E, capitalization, dividends, etc. One very popular method for finding significant factors is linear regression. So we have:

\(r_t = \alpha + \beta_1f_{1t} + \dots + \beta_{99}f_{99t} + \epsilon_t\)

Guess how well this fits? The goodness-of-fit (R-squared) turns out to be 100%: a perfect fit! It can be proved that this regression is complete nonsense. Even if we throw in random values for those 99 factors, we still end up with a perfect-fit regression. Consequently, the coefficients and t-statistics mean nothing.
Could we do a “smaller” regression on a small subset of factors, e.g., one factor at a time, and hope to identify the most significant factors? This step-wise regression turns out to be spurious as well. For a large enough pool of factors, there is a big probability of finding (the most) “significant” factors even when the factor values are randomly generated.
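Here is the nonsense regression reproduced (a sketch; all data are randomly generated): 99 observations regressed on an intercept plus 99 purely random factors. Because there are at least as many free coefficients as observations, least squares fits the returns exactly and R-squared is 100%.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 99
r = rng.normal(0, 0.02, n)             # 99 daily "returns" (pure noise)
F = rng.normal(0, 1, (n, 99))          # 99 completely random "factors"
X = np.column_stack([np.ones(n), F])   # design matrix with intercept: 100 columns

# With 100 coefficients and 99 observations, least squares interpolates the data.
beta, *_ = np.linalg.lstsq(X, r, rcond=None)
resid = r - X @ beta
r_squared = 1 - (resid @ resid) / ((r - r.mean()) @ (r - r.mean()))
print(r_squared)                       # ~1.0: a perfect, and meaningless, fit
```

The fit is perfect by construction, not because any factor explains anything; this is exactly why the coefficients and t-statistics carry no information.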

Suppose we happen to regress returns on capitalization alone and find that this factor is significant. Even so, we may in fact be doing some form of data snooping. This is because there are thousands of other people testing the same or different factors using the same data set, i.e., AAPL prices from Yahoo. This community, taken as a whole, is doing exactly the step-wise regression described in the last paragraph. In summary, empirical evidence alone is not sufficient to justify a trading model.

To avoid data snooping when designing a trading strategy, Numerical Method Inc. recommends to our clients a four-step procedure.

  1. Hypothesis: we start with an insight, a theory, or a common sense about how the market works.
  2. Modeling: translate the insight in English into mathematics (in Greek).
  3. Application: in-sample calibration and out-of-sample backtesting.
  4. Analysis: understand and explain the winning vs. losing trades.

In steps 1 and 2, we explicitly write down the model assumptions, deduce the model properties, and compute the p&l distribution. We prove that under those assumptions, the strategy will always make money (on average). Whether these assumptions are true can be verified against data using techniques such as hypothesis testing. Given the model parameters, we know exactly how much money we expect to make. This is all done before we even look at a particular data set. In other words, we avoid data snooping by not touching the data until the calibration step, after we have already created the trading model.

An example of creating a trend following strategy using this procedure can be found in lecture 1 of the course “Introduction to Algorithmic Trading Strategies”.


The Role of Technology in Quantitative Trading Research

I posted my presentation titled “The Role of Technology in Quantitative Trading Research” presented in

You can find the powerpoint here.


There needs to be a technology to streamline the quantitative trading research process. Typically, for quants/traders, going from idea generation to strategy deployment may take weeks if not months. This means not only lost trading opportunities, but also a lengthy, tedious, error-prone process marred with ad-hoc decisions and primitive tools. From the organization’s perspective, comparing the paper performances of different traders is like comparing apples to oranges. The success of the firm relies on hiring the right geniuses. Our solution is a technological process that standardizes and automates most of the mechanical steps in quantitative trading research. Creating a new trading strategy should be as easy and fun as playing with Legos, assembling simpler ideas together. Consequently, traders can focus their attention on what they are supposed to be best at: imagining new trading ideas/strategies.


  • In reality, the research process for a quantitative trading strategy, from conceptual design to actual execution, is very time consuming, e.g., months. The backtesting step, in the broadest sense, takes the longest time. There are too many details that we can include in the backtesting code. To name a few: data cleaning and preparation, mathematics algorithms, mock market simulation, execution and slippage assumptions, parameter calibration, sensitivity analysis, and, worst of all, debugging. In practice, most people will ignore many details and make unfortunate “approximations”. This is one major reason why real and paper p&l’s differ.
  • Before AlgoQuant, there was no publicly available quantitative trading research platform that relieves quants/traders from coding up those “infrastructural” components. Most existing tools either lack extensive built-in math libraries, or lack modern programming language support, or lack plug-and-play “trading toolboxes”.
  • Technology can change the game by enhancing productivity. Imagine a system that automates, on a parallel grid of hundreds of CPUs, all those tedious and mundane tasks: data cleaning, mock market simulation, calibration, and mathematics. You could save 80% of your coding time and focus your attention on trading ideas and analysis. Jim, using Matlab, may find a successful trading strategy in 3 months. You, equipped with the proper technology, may find 3 strategies in a month! The success of a hedge fund shall no longer rely on hiring geniuses.
  • After we code up a strategy and choose a parameter set, there is a whole suite of analysis that we can go through and many measures that we can compute to evaluate the strategy. For instance, we can see how the strategy performs on historical data, on simulated data generated from Monte Carlo simulation (parametric) or bootstrapping (non-parametric), as well as on scenario data (hand-crafted). We can construct the p&l distribution (it is unfortunate that historical p&l seems to be the popular performance measure; we traders do not really care about what we could have made in the past but only about our bonuses in the future, so what we really want to see is the future p&l distribution, which captures uncertainty, not the historical p&l); we can do sensitivity analysis of parameters; and we can compute many performance statistics. All these are very CPU-intensive tasks. Using AlgoQuant, you simply feed your strategy into the system. AlgoQuant runs all these tasks on a parallel grid and generates a nice report for you.
  • The academic community publishes very good papers on quantitative trading strategies. Unfortunately they are by-and-large unexplored. First, they are very difficult to understand because they are written for peer reviewers not laymen. Second, they are very difficult to reproduce because most authors do not publish source code. Third, they are very difficult to apply in real trading because the source code is not meant for public use, even if available.

NUMERICAL METHOD INC Selected as a Red Herring Top 100 Asia Tech Startup

Hong Kong, China – Numerical Method Incorporation Limited has won the Top 100 Asia award. Numerical Method Inc. publishes SuanShu, a Java math library, and AlgoQuant, an algorithmic/quantitative trading strategy research platform.

Red Herring announced its Top 100 Asia award in recognition of the leading private companies from Asia, celebrating these startups’ innovations and technologies across their respective industries.

Red Herring’s Top 100 list has become a mark of distinction for identifying promising new companies and entrepreneurs. Red Herring editors were among the first to recognize that companies such as Facebook, Twitter, Google, Yahoo, Skype, YouTube, and eBay would change the way we live and work.

“Choosing the companies with the strongest potential was by no means a small feat,” said Alex Vieux, publisher and CEO of Red Herring. “After rigorous contemplation and discussion, we narrowed our list down from hundreds of candidates from across Asia to the Top 100 Winners. We believe Numerical Method Inc. embodies the vision, drive and innovation that define a successful entrepreneurial venture. Numerical Method Inc. should be proud of its accomplishment, as the competition was very strong.”

Red Herring’s editorial staff evaluated the companies on both quantitative and qualitative criteria, such as financial performance, technology innovation, management quality, strategy, and market penetration. This assessment of potential is complemented by a review of the track record and standing of startups relative to their sector peers, allowing Red Herring to see past the “buzz” and make the list a valuable instrument of discovery and advocacy for the most promising new business models in Asia.


Red Herring Top 100 Asia Award Ceremony