Now we can rearrange the equation’s terms as follows:

Thus we can match it against the simple linear regression formula:

Therefore, we obtain:

And this gives us the following OLS estimates:

As in the previous articles:

And finally, the estimate of the remaining parameter is:

Thus we have obtained an estimate of the parameters of the exponential Ornstein-Uhlenbeck process.
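The regression trick described above is easy to express in code. Below is a minimal sketch, assuming the standard AR(1) reduction of a discretely observed Ornstein-Uhlenbeck process (the function name and the simulated self-check are my illustrative additions, not part of the article):

```python
import numpy as np

def estimate_ou(x, dt):
    """Estimate OU parameters (theta, mu, sigma) via the AR(1) regression
    x[i+1] = a + b*x[i] + eps, where b = exp(-theta*dt) and a = mu*(1 - b)."""
    y, z = x[1:], x[:-1]
    b, a = np.polyfit(z, y, 1)           # OLS slope and intercept
    theta = -np.log(b) / dt              # mean-reversion speed
    mu = a / (1.0 - b)                   # long-run mean
    resid = y - (a + b * z)
    # residual variance of the exact OU transition is sigma^2*(1-b^2)/(2*theta)
    sigma = resid.std(ddof=2) * np.sqrt(2 * theta / (1 - b**2))
    return theta, mu, sigma

# quick self-check on a simulated OU path (illustrative parameter values)
rng = np.random.default_rng(42)
dt, theta, mu, sigma = 0.01, 2.0, 1.0, 0.3
x = np.empty(50_000); x[0] = mu
for i in range(len(x) - 1):
    x[i+1] = x[i] + theta*(mu - x[i])*dt + sigma*np.sqrt(dt)*rng.standard_normal()
print(estimate_ou(x, dt))
```

For the exponential OU process the same regression applies to the logarithm of the price.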

An item is a general term for anything an RS recommends to users: stocks, books, videos or anything else.

A user of an RS is usually an individual who doesn’t have enough experience or expertise to choose items on their own. For instance, the popular online book store Amazon provides recommendations based on what the user has bought and viewed in the past.

To be useful, a suggestion (or recommendation) should support some decision-making process: what book to buy, what news to read, or what to do in one’s spare time. In the simplest form, it might be a ranked list of items. This ranking is a way to predict which items suit the user’s behaviour best.

An RS should track how users interact with items. For instance, in a book store one might view a book title, look inside the book and/or buy it. Viewing a title can be considered an implicit sign of preference. Thus the usual recommender system has to deal with users, items and user-to-item actions (transactions).

For an RS implementation to be successful, it should achieve one or more business goals:

- **Sell more items**
- **Sell more diverse items**
- **Increase user satisfaction and fidelity**
- **Understand user behaviour and habits**

These goals can be approached by implementing a few tasks:

- **Find some good items:** Create a ranked list of items along with predictions of how much the user would like them. This is the main task of many RS.
- **Find all good items:** Create a complete ranked list of items. Usually this is required when the number of items is small and the user can benefit from the ranking information. Such RS are quite common in financial applications, where they usually need to examine and rank all possible scenarios.
- **Annotate items in context:** Given an existing context, emphasise items based on long-term user preferences. For instance, such an RS might emphasise TV shows in an EPG based on previous user behaviour.
- **Recommend a bundle:** Suggest a group of items that fit well together. You’ll find such bundles at cable internet providers, travel agencies, etc. For instance, airlines are starting to recommend accommodation and car hire during ticket purchase.
- **Recommend a sequence:** Recommend a sequence of items that is pleasing as a whole. For instance, a recommended track of courses at a university might depend not only on the chosen major, but also on the courses already completed.
- **Browsing:** The RS should help the user browse items that are more likely to be of interest in the current browsing session.
- **Improve the user profile:** This is a permanent task of an RS: collecting information about the user’s actions to provide more personalised recommendations.

That is pretty much what one can expect from a recommender system. In the next posts I plan to cover:

- Overview of basic techniques
- Clustering
- Content-based RS
- Collaborative filtering in RS

It was not easy for me. When I got a proposal to write a book in this area, I hesitated over whether it was something I could cope with. For a first-time writer it is a daunting task: you have to keep to a schedule and try to write consistently, every day, at least half a page of text. Some parts of the book were easy to write, as I had already written about those topics. Others were painful to finish, as I did not really understand how to explain the math and its links to Haskell in plain and clear language.

I’m quite sure that I will now start writing this blog again, though the new projects are quite far from financial math; they are still math and big data projects. Please also check our new company website to see what is in progress now.

But finally it is out and available in book stores such as Amazon, O’Reilly and Safari Books Online:

where the two quantities are the entropy and the internal energy of the system.

In statistical mechanics, entropy is a measure of the number of ways in which a system may be arranged, often taken to be a measure of “disorder” (the higher the entropy, the higher the disorder). This definition describes the entropy as proportional to the natural logarithm of the number of possible microscopic configurations (microstates) that could give rise to the observed macroscopic state (macrostate) of the system. For the sake of simplicity, we assume the constant of proportionality is equal to one:

An order book is in fact the set of all buy/sell orders. Let’s denote it as a set of pairs, where b (s) is the price and B (S) is the number of contracts at the given price for buy (sell) orders. Let’s normalise it by the total number of buy (Tb) and sell (Ts) contracts:

Thus the entropy becomes the sum of the entropies of the buy and sell sides:

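As a side note, the entropy of a concrete order book is straightforward to compute from the definition above. A minimal sketch, assuming the book is given as (price, contracts) pairs and taking the Boltzmann constant as one, with purely illustrative numbers:

```python
import math

def book_entropy(buy, sell):
    """Shannon entropy of an order book: the sum of the entropies of the
    buy and sell sides, each side normalised by its total contracts."""
    def side_entropy(levels):
        total = sum(amount for _, amount in levels)   # Tb or Ts
        return -sum((a / total) * math.log(a / total) for _, a in levels)
    return side_entropy(buy) + side_entropy(sell)

# (price, contracts) levels -- illustrative numbers, not market data
buy  = [(99.5, 10), (99.0, 20), (98.5, 5)]
sell = [(100.0, 15), (100.5, 15), (101.0, 10)]
print(book_entropy(buy, sell))
```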
The internal energy is the total energy contained in a thermodynamic system. It is the energy needed to create the system, but excludes the energy needed to displace the system’s surroundings, any energy associated with the motion of the system as a whole, or energy due to external force fields. Thus, to create the order book, one needs all the money of the buy side and ownership of the securities of the sell side. There could be doubts about how to price the securities of the sell side, but we’ll take the easiest approach:

Let’s try to derive a formula for the temperature under the given assumptions. First, the total differentials of the entropy and the internal energy should be obtained:

Then we can find the derivative of the entropy with respect to the energy using the definition of the total derivative:

where

And substitution into the total derivative yields the formula for temperature:

where L is the Lagrangian. The real beauty of such a description lies in the well-developed and well-studied machinery for solving its equations.

Let’s link it to our trading problem of a single security and define the profit function in the same way as the action:

where q is the price (quote) and p is the position, positive for long and negative for short. Therefore, the Lagrangian is:

As we can see, it depends neither on time nor on the position itself; therefore, according to Noether’s theorem, two conservation laws follow:

Ignoring this fact, we obtain:

I.e. the classic buy-and-hold strategy.

Let’s introduce a control function that doesn’t depend on the elapsed time:

In fact, we state that the position depends only on the current price dynamics. Thus, the Lagrangian becomes:

The extremum of the action is defined by the Euler-Lagrange equations:

As the Lagrangian doesn’t depend on time, the equation reduces to:

If we assume that the control function doesn’t depend on time, then under this assumption the energy is an integral of motion of the system:

where the constant is the initial energy. Let’s simplify further and say that the control function doesn’t depend on the price derivative:

By the definition of the control function:

We obtain the following solution:

In fact, this is an unsatisfactory solution, as it amounts to continuously buying the security without any bound.

If the rate of price change is taken into account, the partial differential equation becomes:

And it has the generic solution:

where the solution involves an arbitrary function of q. In general it lies in the complex plane, thus contradicting the initial setup.

It seems that the continuous approach to the task is not productive. So, in the next articles I’ll try to follow a discrete formulation of the task.

Now let’s construct a more complicated model with the following volatility changes as the states of the chain:

- Less than -25%: denote it as -100.
- From -25% to -15% : -20.
- From -15% to -5% : -10.
- From -5% to 5% : 0.
- From 5% to 15% : 10.
- From 15% to 25% : 20.
- More than 25% : 100.

If a Markov chain has states enumerated from 1 to N, then the probability of each state in the stationary distribution is:

and the state probabilities must sum to one:

or in matrix form:

where

and the last row of the transition matrix specifies the constraint on the probabilities:

The upper part of the transition matrix is in fact a left stochastic matrix, and by the Perron-Frobenius theorem it has a spectral radius that is a positive real number and is an eigenvalue of the matrix, called the Perron-Frobenius eigenvalue. It satisfies the inequalities:

As each column of the transition matrix consists of conditional probabilities and sums to one:

the spectral radius of the transition matrix thus equals one. The added zeros and ones just contribute an additional zero eigenvalue and don’t influence the spectral radius.

The stationary distribution is in fact an eigenvector with eigenvalue 1:

But in the general case the eigenvector might not satisfy the probability constraints, i.e. non-negativity and summing to one. The stationary distribution problem reduces to a linear system of equations:

This system might have no solution, one solution, or infinitely many solutions in the case of collinearity. Therefore, a Markov chain might have no stationary distribution, one stationary distribution, or infinitely many stationary distributions.
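The linear system above can be solved numerically. A minimal sketch, using an illustrative 3-state column-stochastic matrix rather than the article’s data:

```python
import numpy as np

# Column-stochastic transition matrix of an illustrative 3-state chain:
# P[i, j] = probability of moving to state i from state j.
P = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

n = P.shape[0]
# Stationary distribution: (P - I) pi = 0 plus the normalisation sum(pi) = 1.
A = np.vstack([P - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)        # stationary probabilities
print(P @ pi)    # applying P should reproduce pi
```

Least squares handles the overdetermined system; whether the resulting vector is a valid probability distribution still has to be checked explicitly.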

From the experimental data for the year 2010, the following state transition matrix was deduced using Laplace smoothing with k=1 and 7 classes.

From\To | -100 | -20 | -10 | 0 | 10 | 20 | 100 |
---|---|---|---|---|---|---|---|
-100 | 3.34 % | 3.33 % | 13.33 % | 13.33 % | 6.67 % | 20.00 % | 40.00 % |
-20 | 6.25 % | 6.25 % | 9.38 % | 18.75 % | 21.88 % | 18.75 % | 18.74 % |
-10 | 8.62 % | 6.90 % | 17.24 % | 18.96 % | 17.24 % | 12.07 % | 18.97 % |
0 | 3.13 % | 4.69 % | 15.62 % | 39.06 % | 14.06 % | 10.94 % | 12.50 % |
10 | 6.98 % | 20.94 % | 23.26 % | 18.60 % | 11.62 % | 6.97 % | 11.63 % |
20 | 15.15 % | 18.18 % | 21.22 % | 18.18 % | 12.12 % | 6.06 % | 9.09 % |
100 | 25.53 % | 14.89 % | 27.66 % | 8.51 % | 14.89 % | 4.26 % | 4.26 % |

Or, by transposing the table and fitting it into matrix form:

The transition matrix eigenvector for eigenvalue 1 is approximately:

As we can see, the eigenvector doesn’t satisfy the non-negativity requirement for probabilities. This implies that the Markov chain doesn’t have a stationary distribution.

No stationary distribution means that the chain is reducible and/or some of its states are null-recurrent and/or periodic. The chain in question is obviously irreducible, because all states are in one communication class, i.e. it is possible to reach each state from any other state. Therefore, there are two non-exclusive possibilities:

- There are some null-recurrent or transient states.
- The chain is k-periodic with k greater than 1.

The question will be considered in the next article.

So a Markov chain is a set of states together with all the transition probabilities between those states.

Let’s assume that estimation of the drift parameter might lead to the following two states:

- Positive, i.e. the drift is greater than or equal to zero
- Negative, i.e. the drift is less than zero

So one can construct a Markov chain for these states, as shown below.

In 2010, the daily drifts of USD/CHF give the following probabilities:

- P(+) = 53.0769 %
- P(-) = 46.9231 %
- P(+|+) = 51.09 %
- P(-|+) = 48.91 %
- P(-|-) = 45.08 %
- P(+|-) = 54.92 %

This Markov chain has a stationary distribution. It is calculated by putting:

and therefore, by the law of total probability:

Thus the stationary probabilities of positive and negative drifts are:

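For a two-state chain the stationary distribution follows from the balance equation pi(+) * P(-|+) = pi(-) * P(+|-). A quick check with the 2010 numbers listed above:

```python
# Transition probabilities for USD/CHF daily drift signs in 2010 (from the text).
p_minus_given_plus = 0.4891   # P(-|+)
p_plus_given_minus = 0.5492   # P(+|-)

# Balance equation of a two-state chain:
#   pi_plus * P(-|+) = pi_minus * P(+|-),  pi_plus + pi_minus = 1
pi_plus = p_plus_given_minus / (p_plus_given_minus + p_minus_given_plus)
pi_minus = 1.0 - pi_plus
print(f"pi(+) = {pi_plus:.4f}, pi(-) = {pi_minus:.4f}")
```

The result is close to the unconditional frequencies P(+) and P(-) measured above, as one would expect.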
Unfortunately, this doesn’t give any hints on how to predict the future drift. Let’s see what it can give for the volatility parameter.

For volatility, (+) “up” denotes a rise in volatility and (-) “down” denotes a drop in volatility. In 2010, the USD/CHF daily historical volatility probabilities were:

- P(+) = 48.6486 %
- P(-) = 51.3514 %
- P(+|+) = 30.71 %
- P(-|+) = 69.29 %
- P(-|-) = 34.07 %
- P(+|-) = 65.93 %

This asymmetry in the transition probabilities seems to have its roots in the definition of “up” and “down”. It is relatively easy to show that if volatility is a sequence of independent, identically distributed random variables with cumulative distribution function F(x), then the probabilities of up and down and their transitions should be:

- P(+) = 1/2
- P(-) = 1/2
- P(+|+) = 1/3
- P(-|+) = 2/3
- P(-|-) = 1/3
- P(+|-) = 2/3

The state probabilities are derived as follows:

where F(x) is the cumulative distribution function. Integrating the inner expression yields:

And the down probability is accordingly:

The transition probabilities require a little more math. By the definition of conditional probability, the up-up probability is:

Let’s calculate the joint probability of two “ups”:

Integration over y and then x yields:

Therefore, the conditional probability of an up given a previous up is:

And this corresponds well with the experimental data. The up-down case is solved by the following:

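Since the 1/3 and 2/3 values do not depend on F(x), they are easy to verify by simulation. A minimal sketch with i.i.d. uniform draws standing in for the volatility samples:

```python
import random

random.seed(7)
n = 200_000
x = [random.random() for _ in range(n)]          # i.i.d. volatility proxies
moves = [x[i] > x[i - 1] for i in range(1, n)]   # True = "up", False = "down"

# Count up->up transitions between consecutive moves.
up_up = sum(1 for a, b in zip(moves, moves[1:]) if a and b)
up_total = sum(1 for a in moves[:-1] if a)
print(up_up / up_total)   # should be close to 1/3
```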
As an off-topic note, application stores usually rank apps by user comments and ratings. The simplest way to derive an app rating is to calculate the average or the median, i.e. some statistic based on the rating samples. Since the average is not a robust statistic, its value is affected by outliers, for instance by deviant ratings submitted by users. Thus a robust procedure might be used to improve the ranking.

In fact, we can apply HMM machinery to infer the real application rating as the most likely explanation of the observed user ratings. Let’s see how to do that.

The initial (untrained) AppStore HMM has:

- Observables – a list of integers from 1 to 5, where 5 is the best
- Hidden states – integers from 1 to 5 with P = 0.2 for each state
- Probabilities of each observable given a hidden state. The heuristic is that observables should be close to the hidden state, i.e. if the hidden state is 3, then 3 is the most likely output, 2 and 4 are less probable, and 1 and 5 are the least probable ones
- Transitions between hidden states. The heuristic rule is to give the maximum weight to remaining in the same state

The initial model is shown in the picture (click on it to enlarge). Note that the states are numbered from 0 to 4.
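The initial model can be written down directly as matrices. A minimal sketch, where the concrete weights are my illustrative choices (the article specifies only the heuristics, not the numbers):

```python
import numpy as np

n = 5  # hidden states / observables: ratings 1..5 (indexed 0..4)

# Uniform prior over hidden states, P = 0.2 each.
prior = np.full(n, 0.2)

# Emission heuristic: the observed rating is most likely equal to the
# hidden rating, with probability decaying with distance from it.
emission = np.array([[1.0 / (1 + abs(i - j)) for j in range(n)]
                     for i in range(n)])
emission /= emission.sum(axis=1, keepdims=True)     # rows sum to one

# Transition heuristic: the maximum weight goes to staying in the same state.
transition = np.array([[2.0 if i == j else 1.0 / (1 + abs(i - j))
                        for j in range(n)] for i in range(n)])
transition /= transition.sum(axis=1, keepdims=True)

print(emission.round(3))
print(transition.round(3))
```

These matrices are only the starting point; Baum-Welch training on the observed ratings then adjusts them.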

I used an unofficial Android Market API to retrieve rating comments for the top ten free applications and trained the Markov model on their comments. At the time of writing this article, those apps were:

- AppBrain App Market
- Lookout Security & Antivirus
- File Expert
- Android Assistant(17 features)
- Scanner Radio
- Greatest Magic Performances!
- Scanner Radio Pro
- Sudoku Free
- US Yellow Pages
- Brightest Flashlight Free

The final HMM is shown below (click on it to enlarge). The result summary is:

- Ratings 1 and 2 are quite rare
- Apps with rating 3 have a tendency to fall to 1 and 2
- Rating 4 is relatively unstable, as there is a 46.1% chance to remain and a 36.6% chance to move up
- Rating 5 is quite stable, as the probability of remaining at the top is 90.3%

The Android Market API is protected against heavy querying, so it is pretty hard to obtain a significant dataset for further investigation of the model’s applicability. But the results are rather plausible.

Natural language processing might further improve the model as well. That might be the next topic.

Assume that we’re given a one-dimensional stochastic process:

where the drift and diffusion coefficients are some functions of their arguments.

We observe this process by measuring S_i(t_i). For the sake of simplicity, assume that the observations are equidistant in time.

So, let’s estimate the parameters.

As for the method, one should construct the joint probability density of the observations. With the classic approach we would have to construct the conditional probabilities of all time-forward transitions between observations. But we can reduce their number if the process satisfies the Markov property, in other words, if it is memoryless.

Therefore, the joint probability density of the observations is:

where each factor is the probability that the process, starting from one observation with the given parameters, arrives at the next observation.

Let’s apply a common trick and take the logarithm of the probability:

Usually this is better for computational reasons.

The conditional probability is hard to compute in closed form for the most interesting cases, but a general approach exists for Ito processes. There is a differential equation describing the time evolution of the probability density function of a stochastic process: the Fokker-Planck equation, also known as the Kolmogorov forward equation. It is widely used in quantum physics and has been studied quite deeply.

The Fokker-Planck equation for the density function of the given process is:

Thus the conditional probability is a solution of the equation:

The equation usually admits at least a numerical solution, so it is possible to provide a quite efficient implementation of the MLE algorithm.

Another way to obtain the conditional probability is to apply Monte Carlo sampling to estimate the particular conditional probabilities. It is quite a brute-force way to find them, but it might be useful for complicated stochastic processes.

The method is available in Java as part of Monte-Carlo library.

The original paper: Andrew Lo, “Maximum Likelihood Estimation of Generalized Ito Processes with Discretely Sampled Data”.

The SABR model describes a single forward F, such as a LIBOR forward rate, a forward swap rate, or a forward stock price. The volatility of the forward F is described by a parameter σ. SABR is a dynamic model in which both F and σ are represented by stochastic state variables whose time evolution is given by the following system of stochastic differential equations:

The constant parameters should satisfy the condition:

Here, the two driving Wiener processes are correlated with a given correlation coefficient. For simplicity’s sake, we make an additional assumption and put:

In this case it is possible to integrate the equation and find an exact solution:

Therefore:

Therefore:

Let’s introduce new variables for convenience:

With a naive approach, in finite differences the SABR model looks as follows:

The assumption about the correlation gives:

where the two driving random variables are independent and normally distributed.
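The finite-difference scheme can be simulated directly. A minimal sketch of a SABR path, with illustrative parameter values, the usual Cholesky trick for the correlated increments, and the exact log-normal step for the volatility (which, as noted above, integrates exactly):

```python
import numpy as np

def sabr_path(f0, sigma0, alpha, beta, rho, dt, n, rng):
    """Euler discretisation of the SABR SDEs:
       dF     = sigma * F**beta * dW1
       dsigma = alpha * sigma   * dW2,   corr(dW1, dW2) = rho."""
    f, sigma = np.empty(n + 1), np.empty(n + 1)
    f[0], sigma[0] = f0, sigma0
    for i in range(n):
        z1, z2 = rng.standard_normal(2)
        # Correlated increments from two independent normals (Cholesky).
        w1 = z1
        w2 = rho * z1 + np.sqrt(1 - rho**2) * z2
        # Euler step for the forward, floored to keep F**beta well defined.
        f[i+1] = max(f[i] + sigma[i] * f[i]**beta * np.sqrt(dt) * w1, 1e-12)
        # Exact log-normal step for the volatility process.
        sigma[i+1] = sigma[i] * np.exp(alpha * np.sqrt(dt) * w2
                                       - 0.5 * alpha**2 * dt)
    return f, sigma

rng = np.random.default_rng(1)
f, sigma = sabr_path(f0=100.0, sigma0=0.2, alpha=0.3, beta=0.5,
                     rho=-0.4, dt=1/252, n=252, rng=rng)
print(f[-1], sigma[-1])
```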

I don’t know what to do with it further, so I have to fall back to brute-force Maximum Likelihood Estimation with Monte Carlo sampling to obtain the conditional probability.
