Collaborative Filtering for user behavior

Recommender systems suggest items or ideas that match a user's tastes and habits. Collaborative filtering is a widely used technique for predicting missing ratings, or for scoring items based on user behavior.

This note illustrates how the collaborative filtering (CF) algorithm can be used to analyze user behavior such as purchasing and browsing.
1. Data preparation
Input data: clickstream data from a specific time window
Processing steps:

  • Group the data by category, customer_id, and action_name to get a count table for each action, for example:
category   customer_id   count(ProductDetailPageViewEvent)
A1         C0000000004   5
A2         C0000000003   2
  • Python implementation (caas_customer_number and productCategory appear to be the raw column names for customer_id and category):
data.groupby(['action_name', 'caas_customer_number', 'productCategory']).size()

This yields one count table per action.
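
The grouping step can be sketched on a toy clickstream. The rows below are invented for illustration, and `AddToCartEvent` is a hypothetical second action name; only the three column names come from the call above.

```python
import pandas as pd

# Hypothetical clickstream rows; column names follow the groupby call above.
data = pd.DataFrame({
    "action_name": ["ProductDetailPageViewEvent"] * 7 + ["AddToCartEvent"],
    "caas_customer_number": ["C0000000004"] * 5 + ["C0000000003"] * 2 + ["C0000000004"],
    "productCategory": ["A1"] * 5 + ["A2"] * 2 + ["A1"],
})

# Count events per (action, customer, category) and name the result column.
counts = (
    data.groupby(["action_name", "caas_customer_number", "productCategory"])
    .size()
    .reset_index(name="count")
)

# Split into one table per action, keyed by the action name.
tables = {
    action: frame.drop(columns="action_name").reset_index(drop=True)
    for action, frame in counts.groupby("action_name")
}

views = tables["ProductDetailPageViewEvent"]
print(views)
```

Keeping the per-action tables in a dictionary is just one convenient layout; each table can then be fed to the CF step independently.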

2. Collaborative Filtering

  • For each action, take the (category, customer_id) counts as input and form a category × customer_id matrix as follows (blank cells are pairs with no recorded events):
      C0000000004   C0000000003
A1    5
A2                  2
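
A minimal sketch of building this matrix with pandas, assuming the per-action table uses the column names `category`, `customer_id`, and `count`, and assuming unobserved pairs are filled with -1 (the sentinel that the `!= -1` conditions in the cost function test for):

```python
import pandas as pd

# Per-action counts in the same shape as the table from step 1 (toy values).
views = pd.DataFrame({
    "category": ["A1", "A2"],
    "customer_id": ["C0000000004", "C0000000003"],
    "count": [5, 2],
})

MISSING = -1  # assumed sentinel for "no observation"

# Rows are categories, columns are customers, cells are event counts.
Y = (
    views.pivot(index="category", columns="customer_id", values="count")
    .fillna(MISSING)
    .astype(int)
)

# Mask of observed cells: 1 where a count exists, 0 otherwise.
M = (Y != MISSING).astype(int)

print(Y)
print(M)
```

A sentinel of -1 only works because real counts are never negative; a separate boolean mask would avoid the sentinel entirely.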
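  • Low-rank matrix factorization
    Let $n_m$ be the number of product categories, $n_u$ the number of customers, and $n$ the number of latent features. $x^{(i)}$ is the feature vector of category $i$, $\theta^{(j)}$ is the parameter vector of customer $j$, and $y^{(i,j)}$ is the observed count for that pair.
  1. The original CF cost function, summed only over the pairs $(i,j)$ with $r(i,j) \neq -1$ (the observed ones), is: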

$$ J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):\, r(i,j) \neq -1} \left[ (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right]^{2} $$

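2. Define the mask matrix $M(i,j) = 1$ if $y^{(i,j)} \neq -1$ and $M(i,j) = 0$ otherwise, so that the sum can run over all pairs $(i,j)$:

$$ J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j)} \left[ \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right) M(i,j) \right]^{2} $$

Then vectorize. Stack the $x^{(i)}$ as rows of $X \in \mathbb{R}^{n_m \times n}$ and the $\theta^{(j)}$ as rows of $\Theta \in \mathbb{R}^{n_u \times n}$, so that $X\Theta^{T}$, $Y$, and $M$ all have shape $n_m \times n_u$. With $\odot$ denoting the elementwise product:

$$ J(X, \Theta) = \frac{1}{2} \sum_{i,j} \left[ \left( (X\Theta^{T} - Y) \odot M \right)_{ij} \right]^{2} $$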
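
As a sanity check, the masked sum and its vectorized form can be compared numerically. This is a sketch under stated assumptions: the feature and parameter vectors are stored as rows of `X` and `Theta` (so the prediction matrix is `X @ Theta.T`), the latent dimension `n = 3` is an arbitrary choice, and the values are random.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 2, 2, 3  # categories, customers, latent features (n is an assumed choice)

Y = np.array([[5.0, -1.0],
              [-1.0, 2.0]])      # -1 marks an unobserved pair
M = (Y != -1).astype(float)      # mask: 1 for observed, 0 otherwise

X = rng.normal(size=(n_m, n))      # row i is x^(i)
Theta = rng.normal(size=(n_u, n))  # row j is theta^(j)

def cost_loop(X, Theta, Y, M):
    """Sum the squared error over observed pairs one at a time."""
    total = 0.0
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if M[i, j]:
                total += (Theta[j] @ X[i] - Y[i, j]) ** 2
    return 0.5 * total

def cost_vectorized(X, Theta, Y, M):
    """Same cost using the mask and an elementwise product."""
    return 0.5 * np.sum(((X @ Theta.T - Y) * M) ** 2)

loop_value = cost_loop(X, Theta, Y, M)
vec_value = cost_vectorized(X, Theta, Y, M)
print(loop_value, vec_value)
```

Because the mask holds only zeros and ones, multiplying by it before squaring gives the same result as skipping the unobserved pairs.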

3. Add the regularization term:

$$ J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):\, r(i,j) \neq -1} \left[ (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right]^{2} + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_{k}^{(i)})^{2} + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_{k}^{(j)})^{2} $$
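
One way to minimize this cost is plain gradient descent. The sketch below is an illustration, not a tuned implementation: the gradients are derived by differentiating the regularized cost, and the values of `lam`, `alpha`, the latent dimension, and the iteration count are all arbitrary assumptions.

```python
import numpy as np

def cf_cost_and_grads(X, Theta, Y, M, lam):
    """Regularized CF cost with gradients; rows of X and Theta are x^(i) and theta^(j)."""
    E = (X @ Theta.T - Y) * M  # masked prediction error
    J = 0.5 * np.sum(E ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
    grad_X = E @ Theta + lam * X        # dJ/dX
    grad_Theta = E.T @ X + lam * Theta  # dJ/dTheta
    return J, grad_X, grad_Theta

rng = np.random.default_rng(0)
Y = np.array([[5.0, -1.0],
              [-1.0, 2.0]])   # -1 marks an unobserved pair
M = (Y != -1).astype(float)

n = 3                                       # assumed number of latent features
X = 0.1 * rng.normal(size=(Y.shape[0], n))
Theta = 0.1 * rng.normal(size=(Y.shape[1], n))

lam, alpha = 0.1, 0.01  # assumed regularization strength and step size
history = []
for _ in range(200):
    J, grad_X, grad_Theta = cf_cost_and_grads(X, Theta, Y, M, lam)
    history.append(J)
    X -= alpha * grad_X
    Theta -= alpha * grad_Theta

predictions = X @ Theta.T  # estimates for every (category, customer) pair
print(history[0], history[-1])
```

After training, the entries of `predictions` at unobserved positions serve as the filled-in scores for pairs with no recorded events.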