Change of Measure, or Girsanov’s Theorem, is an important theorem in Real Analysis and Quantitative Finance. Unfortunately, I never really understood it until long after I had left school. I blamed it on the professors and the textbook authors, of course. The textbook version usually goes like this.

Given a probability space \((\Omega,\mathcal{F},P)\) and a non-negative random variable Z satisfying \(\mathbb{E}(Z) = 1\) (why 1?), we define a new probability measure Q by the formula, for all \(A \in \mathcal{F}\),

\(Q(A) = \int_A Z(\omega)\,dP(\omega)\)

Any random variable X, a measurable process adapted to the natural filtration of \(\mathcal{F}\), now has two expectations: one under the original probability measure P, denoted as \(\mathbb{E}_P(X)\), and the other under the new probability measure Q, denoted as \(\mathbb{E}_Q(X)\). They are related to each other by the formula

\(\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)\)

If \(P(Z > 0) = 1\), then P and Q agree on the null sets. We say Z is the Radon-Nikodym derivative of Q with respect to P, and we write \(Z = \frac{dQ}{dP}\). To remove the mean, μ, of a Brownian motion, we take X to be standard normal under P and define

\(Z=\exp \left ( -\mu X - \frac{1}{2} \mu^2 \right )\)

Then under the probability measure Q, the random variable Y = X + μ is standard normal. In particular, \(\mathbb{E}_Q(Y) = 0\) (so what?).

This text made no sense to me when I first read it in school. It was very frustrating that the text was filled with unfamiliar terms like probability space and adaptation, and scary symbols like integration and \(\frac{dQ}{dP}\). (I knew what \(\frac{dy}{dx}\) meant when y was a function and x a variable. But what on earth was dQ over dP?)

Now that I have become a professor teaching students in finance or financial math, I would get rid of all the jargon and rigor. I would focus on the intuition rather than the math details (traders are not mathematicians). Here is my layman’s version.

Start with a probability measure P. A probability measure is just a function that assigns numbers (probabilities) to the outcomes of a random variable, e.g., 0.5 to head and 0.5 to tail for a fair coin. There could be another measure Q that assigns different numbers to the head and tail, say, 0.6 and 0.4 (an unfair coin)! Assume P and Q are equivalent, meaning that they agree on which events are possible (positive probability) and which events have 0 probability. Is there a relation between P and Q? It turns out to be a resounding yes!

Let’s define \(Z=\frac{Q}{P}\). Z here is a function, as P and Q are just functions. For the coin, Z evaluates to 0.6/0.5 = 1.2 on head and 0.4/0.5 = 0.8 on tail. Then we have

\(\mathbb{E}_Q(X) = \mathbb{E}_P(XZ)\)

This is intuitively true after some symbol cancellation. Forget about the proof, even though it is quite easy (about 2 lines). We traders don’t care about proofs. Therefore, the distribution of X under Q is (by plugging the indicator function into the last equation):

\(Q(X \in A) = \mathbb{E}_Q(I(X \in A)) = \mathbb{E}_P(I(X \in A)Z)\)
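
For readers who do want to see the symbol cancellation, here is a sketch for a finite set of outcomes, assuming every outcome \(\omega\) has \(P(\omega) > 0\). The sum over \(\omega\) and the notation \(P(\omega)\), \(Q(\omega)\) for the probability of a single outcome are shorthand introduced here, not the textbook's:

\(\mathbb{E}_P(XZ) = \sum_{\omega} X(\omega)\,\frac{Q(\omega)}{P(\omega)}\,P(\omega) = \sum_{\omega} X(\omega)\,Q(\omega) = \mathbb{E}_Q(X)\)

The P in the denominator of Z cancels the P used as the weight, leaving Q as the weight.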

Moreover, setting X = 1 (the constant 1, whose expectation is 1 under any probability measure), we have (Z here is a random variable):

\(\mathbb{E}_Q(X) = 1 = \mathbb{E}_P(Z)\)
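
The coin numbers above can be checked in a few lines of Python. This is only a sketch under stated assumptions: the payoff X (2 for head, 1 for tail) is made up for illustration, and the helper `expect` is a hypothetical name, not something from a library.

```python
# Fair coin P and unfair coin Q, as in the text.
P = {"head": 0.5, "tail": 0.5}
Q = {"head": 0.6, "tail": 0.4}

# Hypothetical payoff X, chosen only for illustration.
X = {"head": 2.0, "tail": 1.0}

# Z = Q / P, evaluated outcome by outcome.
Z = {w: Q[w] / P[w] for w in P}


def expect(f, measure):
    """Expectation of f (a dict over outcomes) under the given measure."""
    return sum(f[w] * measure[w] for w in measure)


# E_Q(X) computed directly, and again as E_P(XZ).
e_q_x = expect(X, Q)
e_p_xz = expect({w: X[w] * Z[w] for w in P}, P)

# Probability of the event A = {head} under Q, computed two ways.
indicator = {w: 1.0 if w == "head" else 0.0 for w in P}
q_a = expect(indicator, Q)
p_a_z = expect({w: indicator[w] * Z[w] for w in P}, P)

# Setting X = 1 gives E_P(Z), which should equal 1.
e_p_z = expect(Z, P)

print(e_q_x, e_p_xz)
print(q_a, p_a_z)
print(e_p_z)
```

Each printed pair should agree up to floating-point rounding (about 1.6 for the expectation and about 0.6 for the probability of head), and the last number should be about 1.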

These results hold in general, especially for the Gaussian random variable and hence Brownian motion. Suppose we have a random (i.e., stochastic) process generated by (adapted to) a Brownian motion and it has a drift μ under a probability measure P. We can find an equivalent measure Q so that under Q, this random process has a 0 drift. Wiki has a picture that shows the same random process under the two different measures: each of the 30 paths in the picture has a different probability under P and Q.

The change of measure, Z, is a function of the original drift (as would be guessed). Written for one unit of time, with X the standard normal noise under P, it is given by:

\(Z=\exp \left ( -\mu X - \frac{1}{2} \mu^2 \right )\)

For a 0 drift process, hence no expected increment, the expectation of the future value of the process is the same as the current value (a layman’s way of saying that the process is a martingale). Therefore, with the ability to remove the drift of any random process (by finding a suitable Q using the Z formula), we are ready to do options pricing.
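
As a rough numerical check of the drift removal, here is a small Monte Carlo sketch using only the Python standard library. It assumes a single unit time step, X standard normal under P, and Y = X + μ as the drifted quantity. The drift value `mu = 0.5`, the sample size, and the variable names are all illustrative choices, not part of the original argument.

```python
import math
import random

random.seed(0)   # fixed seed so the run is repeatable
mu = 0.5         # hypothetical drift, chosen for illustration
n = 200_000      # number of simulated draws under P

sum_y = 0.0      # accumulates Y      -> estimates E_P(Y)
sum_yz = 0.0     # accumulates Y * Z  -> estimates E_Q(Y) = E_P(YZ)
sum_z = 0.0      # accumulates Z      -> estimates E_P(Z)

for _ in range(n):
    x = random.gauss(0.0, 1.0)              # X is standard normal under P
    y = x + mu                              # Y has mean mu under P
    z = math.exp(-mu * x - 0.5 * mu * mu)   # the change of measure Z
    sum_y += y
    sum_yz += y * z
    sum_z += z

mean_y_under_p = sum_y / n    # should be close to mu
mean_y_under_q = sum_yz / n   # should be close to 0: the drift is removed
mean_z_under_p = sum_z / n    # should be close to 1

print(mean_y_under_p, mean_y_under_q, mean_z_under_p)
```

The samples are always drawn under P. The measure Q never has to be simulated directly, because weighting each draw by Z does the job.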

Now, if you understand my presentation and go back to the textbook version, you should have a much better understanding and easier read, I hope.


  1. thanks a lot!

  2. Thank you. This stopped instantly my 6 hours of struggling to understand this subject.

    • This instantly stopped my 6 hours of struggling to understand this subject. After reading this article it is clear.

  3. Thank you a lot ! Your students must love you for your sense of pedagogy

  4. Thanks for the explanation. However I am confused about this:

    Moreover, setting X = 1, we have (Z here is a random variable):
    \mathbb{E}_Q(X) = 1 = \mathbb{E}_P(Z)

    Why is this true? \mathbb{E}_Q(X) = 1

    • Hi,

      I completely agree. I think it is a misleading statement. It is obvious that expectation of a random variable X (e.g. price) is not 1. I guess this was referring to an integral (continuous RV) / sum (discrete RV) of its pdf/pmf.

  5. Thank you so much for writing this. Incredibly helpful, great explanation. I wish I would have found it sooner!

  6. Didn’t you forget about the indicator function on the left hand side in the 3rd equation (from the bottom)?

  7. Thank you so much!

  8. Thank you so much!! I could finally understand change of measure intuitively 🙂

  9. A student goes to the beach, a party, or a lecture if a coin shows tail T, head H, or falls on its edge E. Enjoyment is X = {T = 1, H = 2, E = -10}. The probabilities for a fair coin C are P(C) = {0.5, 0.5, 0}.
    For a specially manufactured unfair coin U, favoring sport over drinking, they are P(U) = {0.6, 0.4, 0}. We still keep zero for E, making the probability measures P(C) and P(U) “equivalent”, which helps us later to avoid division by zero. The events and values X are the same but their probabilities change. The average enjoyment is
    E(X with C) = 1 * 0.5 + 2 * 0.5 + (-10) * 0 = 1.5 or E(X with U) = 1 * 0.6 + 2 * 0.4 + (-10) * 0 = 1.4. We define Z = P(U)/P(C) = {0.6/0.5 = 1.2, 0.4/0.5 = 0.8, ignore}. Z is not a set of probabilities (the values do not sum to one) but the ratios of the probabilities for corresponding events or values of X. We ignore the undefined value of Z here, but could equally have dropped the zero-probability event earlier: adding values of impossible events to a random variable changes nothing for us. Here T, H, E, E(C), E(U); X, P(C), P(U); and Z are deterministic values, sets, and functions. Only the coins C and U (how they fall) are random.

    We can reproduce E(X with U) = 1.4 using P(C) if we replace X with X * Z = {1 * 0.6/0.5 = 1.2, 2 * 0.4/0.5 = 1.6}. The values of X are multiplied by the transforming function. Indeed, E(XZ with C) = 1.2 * 0.5 + 1.6 * 0.5 = 1.4. This is the mean of the product XZ under a different (not P(U)) measure.

    Using a finite number of discrete events instead of a continuous random variable provides simpler explanations, including of the theorem on changing the probability measure. But the probability of any single value of a continuous random variable is exactly zero. One cannot apply the same explanation to a Gaussian variable, because one faces 0/0 everywhere. Non-zero measures for the latter are introduced for intervals of values. While point probabilities in the continuous case are zero, the ratio dP(U)/dP(C) over corresponding shrinking intervals can be a finite number, even in the limit, and then we can reach similar conclusions and use this useful technique.

    Igor Vladimirovich Girsanov, a talented mathematician and pupil of Dynkin and Kolmogorov, died suddenly in the Sayan Mountains on March 16, 1967 at the age of 32. Similar to the coin U, he favored sport: alpinism. It was 50 years ago. One of his contributions, Girsanov, Igor, “On Transforming a Certain Class of Stochastic Processes by Absolutely Continuous Substitution of Measures”, Theory of Probability and its Applications, Volume 5, No 3, pp. 314 – 330, 1960, has been translated into several languages and is famous. Best Regards, Valerii

  10. Great description! Thanks a lot!

  11. I believe there is a mistake in the above, most likely because the example of removing drift has been copied from Shreve, who has the mistake in his book as well: when you remove the drift in that example, both Z and Y are missing a “t” in their second term: Y = X + mu*t, and Z = exp( -muX -1/2 * mu^2 t).

  12. Very helpful, I’ve needed this explanation for 15 years!
