Change of Measure, or Girsanov’s Theorem, is a very important theorem in Real Analysis and Quantitative Finance. Unfortunately, I never really understood it until long after I had left school. I blamed the professors and the textbook authors, of course. The textbook version usually goes like this.

Given a probability space $$(\Omega,\mathcal{F},P)$$, let Z be a non-negative random variable satisfying $$\mathbb{E}(Z) = 1$$ (why 1?). We then define a new probability measure Q by the formula, for all $$A \in \mathcal{F}$$,

$$Q(A) = \int_A Z(\omega)\,dP(\omega)$$
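To see what this definition does, it may help to assume a finite sample space, where the integral reduces to a weighted sum. Taking A = Ω then shows where the condition on $$\mathbb{E}(Z)$$ comes from: Q must give the whole space probability 1, and the non-negativity of Z keeps every Q(A) non-negative.

$$Q(A) = \sum_{\omega \in A} Z(\omega)\,P(\{\omega\}), \qquad Q(\Omega) = \sum_{\omega \in \Omega} Z(\omega)\,P(\{\omega\}) = \mathbb{E}_P(Z) = 1$$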

Any random variable X, measurable with respect to $$\mathcal{F}$$ (for a process, adapted to the underlying filtration), now has two expectations: one under the original probability measure P, denoted $$\mathbb{E}_P(X)$$, and the other under the new probability measure Q, denoted $$\mathbb{E}_Q(X)$$. They are related by the formula

$$\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)$$
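A sketch of why this holds, following the standard measure-theoretic route: for an indicator X = I(A) it is just the definition of Q, and it then extends to simple random variables by linearity and to non-negative X by monotone convergence (integrable X are split into positive and negative parts).

$$\mathbb{E}_Q(I(A)) = Q(A) = \int_A Z(\omega)\,dP(\omega) = \mathbb{E}_P(I(A)\,Z)$$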

If $$P(Z > 0) = 1$$, then P and Q agree on the null sets. We say Z is the Radon-Nikodym derivative of Q with respect to P, and we write $$Z = \frac{dQ}{dP}$$. To remove the mean μ of Y = X + μ, where X is standard normal under P (for instance, a Brownian motion observed at time 1), we define

$$Z=\exp \left ( -\mu X - \frac{1}{2} \mu^2 \right )$$

Then under the probability measure Q, the random variable Y = X + μ is standard normal. In particular, $$\mathbb{E}_Q(Y) = 0$$ (so what?).
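A quick Monte Carlo check of this example, as a sketch: draw X from the standard normal under P, weight each draw by Z, and the weighted averages approximate Q-expectations. The drift μ = 0.5, the sample size and the seed are arbitrary choices for illustration, and `girsanov_check` is a hypothetical helper name.

```python
import math
import random


def girsanov_check(mu=0.5, n=200_000, seed=42):
    """Estimate E_P(Z), E_Q(Y) and E_Q(Y^2) by reweighting P-samples with Z.

    X ~ N(0, 1) under P, Y = X + mu, Z = exp(-mu*X - mu^2/2).
    Since E_Q(f(Y)) = E_P(f(Y) Z), the Z-weighted sample means estimate
    Q-expectations; if Y is standard normal under Q, they should be near 0 and 1.
    """
    rng = random.Random(seed)
    sum_z = sum_yz = sum_y2z = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # draw X under P
        y = x + mu                           # shifted variable Y
        z = math.exp(-mu * x - 0.5 * mu**2)  # weight Z = dQ/dP
        sum_z += z
        sum_yz += y * z
        sum_y2z += y * y * z
    return sum_z / n, sum_yz / n, sum_y2z / n


mean_z, mean_y_q, second_moment_y_q = girsanov_check()
print(f"E_P(Z)   ~ {mean_z:.3f}")             # should be close to 1
print(f"E_Q(Y)   ~ {mean_y_q:.3f}")           # should be close to 0
print(f"E_Q(Y^2) ~ {second_moment_y_q:.3f}")  # should be close to 1
```

Without the weights, the plain sample mean of Y would sit near μ instead of 0; the reweighting by Z is the whole trick.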

This text made no sense to me when I first read it in school. It was very frustrating that the text was filled with unfamiliar terms like probability space and adaptation, and scary symbols like integrals and $$\frac{dQ}{dP}$$. (I knew what $$\frac{dy}{dx}$$ meant when y was a function and x a variable. But what on earth was dQ over dP?)

Now that I have become a professor teaching students in finance and financial math, I would get rid of all the jargon and rigor. I would focus on the intuition rather than the mathematical details (traders are not mathematicians). Here is my layman’s version.

Start with a probability measure P. A probability measure is just a function that assigns numbers (probabilities) to the outcomes of a random experiment, e.g., 0.5 to heads and 0.5 to tails for a fair coin. There could be another measure Q that assigns different numbers to heads and tails, say, 0.6 and 0.4 (an unfair coin)! Assume P and Q are equivalent, meaning that they agree on which events are possible (positive probability) and which events have probability 0. Is there a relation between P and Q? It turns out the answer is a resounding yes!

Let’s define $$Z=\frac{Q}{P}$$. Z here is a function, since P and Q are just functions. On heads, Z evaluates to 0.6/0.5 = 1.2, and on tails to 0.4/0.5 = 0.8. Then we have

$$\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)$$
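The coin example in a few lines of Python, as a sketch. The payoff X (win 10 on heads, lose 5 on tails) is a hypothetical choice for illustration; any payoff works, because outcome by outcome P(ω)·Z(ω) = Q(ω).

```python
# Fair-coin measure P and unfair-coin measure Q from the text.
P = {"heads": 0.5, "tails": 0.5}
Q = {"heads": 0.6, "tails": 0.4}

# Hypothetical payoff X, chosen only for illustration.
X = {"heads": 10.0, "tails": -5.0}

# Z = Q / P, evaluated outcome by outcome (P and Q are equivalent, so no division by zero).
Z = {w: Q[w] / P[w] for w in P}


def expect(measure, f):
    """Expectation of f (a dict outcome -> value) under a discrete measure."""
    return sum(measure[w] * f[w] for w in measure)


e_q_x = expect(Q, X)                             # E_Q(X) = 0.6*10 + 0.4*(-5)
e_p_xz = expect(P, {w: X[w] * Z[w] for w in P})  # E_P(XZ) = 0.5*10*1.2 + 0.5*(-5)*0.8
print(Z)              # heads -> 1.2, tails -> 0.8
print(e_q_x, e_p_xz)  # both 4.0, up to floating-point rounding
```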

This is intuitively true if you do some symbol cancellation. Forget about the proof, even though it is quite easy (about two lines); we traders don’t care about proofs. Therefore, the distribution of X under Q is (by plugging the indicator function into the last equation):

$$Q(X \in A) = \mathbb{E}_Q(I(X \in A)) = \mathbb{E}_P(I(X \in A)Z)$$

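Moreover, setting X = 1, the constant random variable, we recover the condition from the textbook version, namely that Z (here a random variable) has mean 1 under P:

$$\mathbb{E}_Q(1) = 1 = \mathbb{E}_P(Z)$$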
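With the fair coin as P and the unfair coin as Q from above, both formulas can be checked by hand, taking A to be the event "heads" (so the indicator is 1 on heads and 0 on tails):

$$Q(\text{heads}) = \mathbb{E}_P\big(I(\text{heads})\,Z\big) = 0.5 \times 1 \times 1.2 + 0.5 \times 0 \times 0.8 = 0.6$$

$$\mathbb{E}_P(Z) = 0.5 \times 1.2 + 0.5 \times 0.8 = 0.6 + 0.4 = 1$$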

These results hold in general, in particular for Gaussian random variables and hence for Brownian motion. Suppose we have a random (i.e., stochastic) process generated by (adapted to) a Brownian motion, and suppose it has a drift μ under a probability measure P. We can find an equivalent measure Q under which this random process has zero drift. Wikipedia has a picture that shows the same random process under the two different measures: each of the 30 paths in the picture has a different probability under P and Q.

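The change of measure, Z, is a function of the original drift (as one might guess). For the process $$Y_t = X_t + \mu t$$, where $$X_t$$ is a standard Brownian motion under P, it is given at time t by:

$$Z=\exp \left ( -\mu X_t - \frac{1}{2} \mu^2 t \right )$$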
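As a check, assume $$X_t$$ is a standard Brownian motion under P (so $$X_t \sim N(0, t)$$), the drifted process is $$Y_t = X_t + \mu t$$, and $$Z = \exp\left(-\mu X_t - \frac{1}{2}\mu^2 t\right)$$ (at t = 1, with X = X_1, this is the single-variable formula from the textbook version). Then Z has P-mean 1 and $$Y_t$$ has zero mean under Q. The two facts used are the normal moment generating function $$\mathbb{E}_P(e^{aX_t}) = e^{a^2 t/2}$$ and its derivative in a, $$\mathbb{E}_P(X_t e^{aX_t}) = a t\, e^{a^2 t/2}$$, both with a = -μ:

$$\mathbb{E}_P(Z) = e^{-\frac{1}{2}\mu^2 t}\,\mathbb{E}_P\left(e^{-\mu X_t}\right) = e^{-\frac{1}{2}\mu^2 t}\,e^{\frac{1}{2}\mu^2 t} = 1$$

$$\mathbb{E}_Q(Y_t) = \mathbb{E}_P(X_t Z) + \mu t\,\mathbb{E}_P(Z) = e^{-\frac{1}{2}\mu^2 t}\left(-\mu t\, e^{\frac{1}{2}\mu^2 t}\right) + \mu t = 0$$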

For a zero-drift process, there is no expected increment, so the expectation of the future value of the process equals its current value (a layman’s way of saying that the process is a martingale). Therefore, with the ability to remove the drift of such a random process (by finding a suitable Q using the Z formula), we are ready to do options pricing.

Now, if you understand my presentation and go back to the textbook version, I hope you will find it a much easier read and understand it much better.

## Comments

1. thanks a lot!

2. Thank you. This instantly stopped my 6 hours of struggling to understand this subject.

• This instantly stopped my 6 hours of struggling to understand this subject. After reading this article it is clear.

3. Thanks a lot! Your students must love you for your sense of pedagogy.

4. "Moreover, setting X = 1, we have (Z here is a random variable): $$\mathbb{E}_Q(X) = 1 = \mathbb{E}_P(Z)$$"

Why is this true? $$\mathbb{E}_Q(X) = 1$$

• Hi,

I completely agree. I think it is a misleading statement. The expectation of a random variable X (e.g., a price) is obviously not 1 in general. I guess this was referring to the integral (continuous RV) or sum (discrete RV) of its pdf/pmf.

5. Thank you so much for writing this. Incredibly helpful, great explanation. I wish I had found it sooner!

6. Didn’t you forget the indicator function on the left-hand side of the 3rd equation (from the bottom)?

• Agree

7. Thank you so much!

8. Thank you so much!! I could finally understand change of measure intuitively 🙂

9. A student goes to the beach, a party, or a lecture if a coin shows tails T, heads H, or lands on its edge E, respectively. Enjoyment is X = {T = 1, H = 2, E = -10}. The probabilities for a fair coin C are P(C) = {0.5, 0.5, 0}.
For a specially manufactured unfair coin U, favoring sport over drinking, they are P(U) = {0.6, 0.4, 0}. We still keep zero probability for E, making the probability measures P(C) and P(U) “equivalent”, which later helps us avoid division by zero. The events and values X are the same, but their probabilities change. The average enjoyment is
E(X with C) = 1 * 0.5 + 2 * 0.5 + (-10) * 0 = 1.5 and E(X with U) = 1 * 0.6 + 2 * 0.4 + (-10) * 0 = 1.4. We define Z = P(U)/P(C) = {0.6/0.5 = 1.2, 0.4/0.5 = 0.8, ignore}. The values of Z are not probabilities (they do not sum to one) but ratios of the probabilities of corresponding events (values of X). We ignore the undefined value of Z here, but we could also have dropped the zero-probability events earlier: adding values of impossible events to a random variable changes nothing for us. Here T, H, E, E(C), E(U) are deterministic values; X, P(C), P(U) are sets; and Z is a function. Only the coins C and U (how they fall) are random.

We can reproduce E(X with U) = 1.4 using P(C) if we replace X with X * Z = {1 * 0.6/0.5 = 1.2, 2 * 0.4/0.5 = 1.6}, i.e., the values of X are multiplied by the transforming function. Indeed, E(XZ with C) = 1.2 * 0.5 + 1.6 * 0.5 = 1.4. This is the mean of the product XZ under a different measure (P(C), not P(U)).

Using a finite number of discrete events instead of a continuous random variable gives simpler explanations, including of the theorem on changing the probability measure. But the probability of any single value of a continuous random variable is exactly zero, so one could not apply the same explanation to a Gaussian variable without running into 0/0 everywhere. For continuous variables, non-zero probabilities are assigned to intervals of values instead. While point probabilities in the continuous case are zero, the ratio dP(U)/dP(C) over corresponding shrinking intervals can still converge to a finite number in the limit, and then we reach similar conclusions and can use this useful technique.

Igor Vladimirovich Girsanov, a talented mathematician and a pupil of Dynkin and Kolmogorov, died suddenly in the Sayan Mountains on March 16, 1967, at the age of 32. Like the coin U, he favored sport: alpinism. That was 50 years ago. One of his contributions, Girsanov, Igor, “On Transforming a Certain Class of Stochastic Processes by Absolutely Continuous Substitution of Measures”, Theory of Probability and its Applications, Volume 5, No. 3, pp. 314-330, 1960, has been translated into several languages and is famous. Best regards, Valerii

10. Great description! Thanks a lot!

11. I believe there is a mistake in the above, most likely because the example of removing drift was copied from Shreve, whose book has the same mistake: when you remove the drift in that example, both Z and Y are missing a “t” in their second term: $$Y = X + \mu t$$ and $$Z = \exp\left(-\mu X - \frac{1}{2}\mu^2 t\right)$$.

12. Very helpful, I’ve needed this explanation for 15 years!