Beyond Collaborative Filtering (Part 1)

Here at Rubikloud, a big focus of our data science team is empowering retailers to deliver personalized one-to-one communications with their customers. A big aspect of personalization is recommending products and services that are tailored to a customer’s wants and needs. Naturally, recommendation systems are an active research area in machine learning, with practical large-scale deployments at companies such as Netflix and Spotify. In Part 1 of this series, I’ll describe the unique challenges that we have faced in building a retail-specific product recommendation system and outline one of the main components of our recommendation system: a collaborative filtering algorithm. In Part 2, I’ll follow up with several useful applications of collaborative filtering and end by highlighting some of its limitations.

Building a Scalable Product Recommendation System

Personalized product recommendations appear in many contexts for a retailer, from their e-commerce homepage to their direct mail marketing to their mobile app. The primary goal of product recommendations is to get the customer to make an additional purchase. Beyond the purchase itself, there may be secondary goals such as up-selling, cross-selling, or clearance. Depending on any number of business factors such as the time of year, supplier contracts, and inventory levels, a retailer may have additional constraints on which recommendations should be shown to their customers. These competing objectives significantly complicate the recommendation process, often “diluting” the optimized recommendations a system provides.

To complicate matters further, real-world retail data is often messy. A retailer’s transaction data is typically reasonably accurate (it is the lifeblood of their business), but neighboring data sources may not be as clean. Examples of bad data include inaccurate or incomplete product attributes, missing promotion schedules, or poor cross-basket customer tracking. These business and data issues significantly complicate the application of off-the-shelf product recommendation systems.

The Recommendation Workhorse: Collaborative Filtering

Given the above data issues, a natural approach to the recommendation problem is collaborative filtering. Collaborative filtering approaches collect information on users’ previous activity and behavior (here, transactions) and predict preferences (here, product recommendations) based on a user’s similarity to other users. The big benefit is that you do not need detailed information about explicit customer preferences or detailed product attributes, which are usually missing or quite messy.

By far the most common collaborative filtering algorithms are ones based on matrix factorization [1]. In fact, this technique is employed by both Spotify [2] and Netflix [3] as a component in their recommendation systems. In the context of product recommendations, we begin with a matrix \(\bf{R}\) mapping \(M\) customers to \(N\) products, where each cell \(r_{i,j}\) corresponds to a rating (or an implicit rating such as a number of purchases) given by customer \(i\) to product \(j\).

{\bf R} = \left[ \begin{array}{ccc}
r_{1,1} & \ldots & r_{1,N} \\
\ldots & r_{i,j} & \ldots \\
r_{M,1} & \ldots & r_{M,N} \\
\end{array} \right]
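
As a rough illustration of how such a matrix might be assembled, here is a minimal Python sketch. The transaction log, the customer and product names, and the choice of purchase counts as the implicit rating are all hypothetical; a production system would use a sparse storage format rather than a dense array.

```python
import numpy as np

# Hypothetical transaction log: (customer_id, product_id, quantity).
transactions = [
    ("alice", "milk", 2),
    ("alice", "bread", 1),
    ("bob", "milk", 1),
    ("carol", "eggs", 3),
    ("alice", "milk", 1),
]

# Map customer and product identifiers to row and column indices.
customers = sorted({c for c, _, _ in transactions})
products = sorted({p for _, p, _ in transactions})
row = {c: i for i, c in enumerate(customers)}
col = {p: j for j, p in enumerate(products)}

# NaN marks an unknown cell, which is different from a rating of 0.
R = np.full((len(customers), len(products)), np.nan)
for c, p, qty in transactions:
    i, j = row[c], col[p]
    # Accumulate purchase counts as the implicit rating r_{i,j}.
    R[i, j] = qty if np.isnan(R[i, j]) else R[i, j] + qty

known = ~np.isnan(R)      # mask of observed cells
density = known.mean()    # fraction of cells that are observed
```

Even in this toy example most cells are unknown; with real retail data the observed fraction is typically far smaller.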

Notice that this is quite a sparse matrix because it is unlikely that a single customer has purchased more than several dozen of the tens of thousands of products available. The empty cells are considered missing because we don’t know whether or not the customer has an affinity for that product (as opposed to \(0\), which means they have no affinity). Matrix factorization techniques aim to factor \(\bf{R}\) into two components \(\bf{U}\) and \(\bf{P}\) with a joint latent factor space where each customer and each product can be represented as an \(f\)-dimensional vector:

{\bf R} \approx {\bf U} {\bf P} =
\left[ \begin{array}{ccc}
u_{1,1} & \ldots & u_{1,f} \\
\ldots & u_{i,j} & \ldots \\
u_{M,1} & \ldots & u_{M,f} \\
\end{array} \right]
\left[ \begin{array}{ccc}
p_{1,1} & \ldots & p_{1,N} \\
\ldots & p_{i,j} & \ldots \\
p_{f,1} & \ldots & p_{f,N} \\
\end{array} \right]

The big advantage of this method is that both customers and products are mapped to the same \(f\)-dimensional latent feature space. Intuitively, one can think of these latent factors as mapping to familiar attributes such as gender, category, brand, or color. However, instead of being explicitly limited to dimensions which we can describe, the model learns the right combination to represent a product-customer preference relationship. To find a customer’s affinity for a product, you simply need to perform a dot product between the \(f\)-dimensional customer vector and the \(f\)-dimensional product vector. In the same way, we can find customer-customer and item-item similarity.
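
To make the dot product concrete, here is a small Python sketch. The factor matrices are random stand-ins for learned factors, the sizes are illustrative, and normalizing the similarity to a cosine is an assumption on my part rather than something the method prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, f = 4, 5, 3              # customers, products, latent factors (illustrative)
U = rng.normal(size=(M, f))    # stand-in for learned customer factors (rows of U)
P = rng.normal(size=(f, N))    # stand-in for learned product factors (columns of P)

def affinity(i, j):
    """Predicted affinity of customer i for product j: a single dot product."""
    return float(U[i] @ P[:, j])

# All customer-product affinities at once, shape (M, N).
scores = U @ P

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

item_sim = cosine_similarity_matrix(P.T)  # item-item, shape (N, N)
cust_sim = cosine_similarity_matrix(U)    # customer-customer, shape (M, M)

# Indices of the two highest-scoring products for customer 0.
top_for_customer_0 = np.argsort(-scores[0])[:2]
```

Because customers and products live in the same space, the same few lines serve for recommendations, similar-item lookups, and customer segmentation.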

The tricky part is factoring \(\bf{R}\), since it is both quite large and has missing values. Instead of filling in the missing values and directly factoring the matrix, we estimate \(\bf{U}\) and \(\bf{P}\) by minimizing the regularized squared error of the known ratings:

\min_{{\bf U}, {\bf P}} \sum_{r_{i,j}\text{ is known}} (r_{i,j} - u_{i,\cdot}^T p_{\cdot,j})^2 + \lambda (\|u_{i,\cdot}\|^2 + \|p_{\cdot, j}\|^2)

where \(u_{i,\cdot}\) and \(p_{\cdot,j}\) represent the \(f\)-dimensional vectors of customer \(i\) and product \(j\) respectively, and \(\lambda\) controls the strength of the regularization. Without getting into much more of the nitty gritty, we should also be using an objective function that takes into account the implicit nature of our data (a customer doesn’t directly rate a product when they buy it). This is explained in [4], with a distributed implementation available in Spark’s MLlib. There are also additional variations of the objective function that take into account factors such as shifting user preferences over time [1].
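
For readers who want to experiment, the following Python sketch minimizes the regularized squared error over the known ratings using plain stochastic gradient descent, one of the learning approaches described in [1]. It is not the implicit-feedback variant from [4] and not our production implementation; the toy matrix, the function names, and the hyperparameter values are all illustrative assumptions.

```python
import numpy as np

def objective(R, known, U, P, lam):
    """Regularized squared error summed over the known cells only."""
    err = np.where(known, R - U @ P, 0.0)
    u_sq = (U ** 2).sum(axis=1)   # ||u_i||^2 for each customer
    p_sq = (P ** 2).sum(axis=0)   # ||p_j||^2 for each product
    # Each known rating contributes its own regularization term.
    reg = (known * (u_sq[:, None] + p_sq[None, :])).sum()
    return float((err ** 2).sum() + lam * reg)

def sgd_factorize(R, known, f=2, lam=0.05, lr=0.02, epochs=200, seed=0):
    """Estimate U (M x f) and P (f x N) by SGD over the known ratings."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    U = rng.normal(scale=0.1, size=(M, f))
    P = rng.normal(scale=0.1, size=(f, N))
    cells = np.argwhere(known)    # (i, j) index pairs of the known ratings
    for _ in range(epochs):
        rng.shuffle(cells)        # visit the known ratings in random order
        for i, j in cells:
            e = R[i, j] - U[i] @ P[:, j]   # prediction error for this rating
            u_old = U[i].copy()
            U[i] += lr * (e * P[:, j] - lam * U[i])
            P[:, j] += lr * (e * u_old - lam * P[:, j])
    return U, P

# Toy ratings matrix; NaN marks unknown cells.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 1.0, 5.0]])
known = ~np.isnan(R)
U, P = sgd_factorize(R, known)
predictions = U @ P   # includes estimates for the unknown cells
```

The unknown cells never enter the loss, yet `predictions` still contains a value for each of them; those filled-in values are what a recommender would rank.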

Please also check out Part 2 of this series, where we go into some applications of collaborative filtering as well as its limitations.

If any of these problems sound interesting to you, we’re always looking for talented people to help us build out our personalization engine. We’ve got a diverse team with some pretty bright people. Come join us!

[1] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (August 2009), 30-37.
[2] “Algorithmic Music Recommendations at Spotify”
[3] “Netflix Recommendations: Beyond the 5 stars”
[4] Y. Hu, Y. Koren and C. Volinsky, “Collaborative Filtering for Implicit Feedback Datasets,” Data Mining, 2008. ICDM ’08. Eighth IEEE International Conference on, Pisa, 2008, pp. 263-272.