How can one apply factorization or model-based algorithms with incremental or online learning to enable a very large-scale online recommender system?
Model-based machine learning algorithms have achieved state-of-the-art results on recommender systems and continue to attract academic attention. Many ideas have been presented: matrix factorization (SVD, SVD++, and the series of solutions developed for the Netflix Prize), factorization machines, personalized learning to rank, and so on. All of this research has done a great job on model training, for example introducing distributed and parallel SGD [3] or L-BFGS to accelerate the training process; some solutions even support incremental or online learning, such as Matchbox [1] or the systems that evolved from Yahoo's CORE (Content Optimization and Relevance Engine) [2], which is used for both content recommendation and targeted advertising.

However, there is a fundamental problem in applying these algorithms to a very large-scale online service, i.e. one with a large item pool (tens of millions of items, or even hundreds of millions). Model-based algorithms focus only on achieving a good scoring function for user-item affinity; in the most typical case, the score is computed as an inner product between a user latent factor vector and an item latent factor vector. But if the top-N items must be selected for a given user, that scoring function has to be evaluated across all items. As far as I know, such a computation can take close to one second for a single user even with only tens of thousands of candidate items, which makes it impossible to perform on the fly with a much larger item pool.
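For concreteness, the brute-force top-N scoring described above looks like the following (a minimal NumPy sketch; the sizes and variable names are illustrative, not from any particular system):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, k, top_n = 100_000, 50, 10          # hypothetical pool size and rank

item_factors = rng.standard_normal((n_items, k))  # item latent factor matrix
user_factor = rng.standard_normal(k)              # one user's latent vector

# Score every item: one matrix-vector product over the entire item pool.
scores = item_factors @ user_factor

# Partial selection of the N best items, then sort just those N descending.
top = np.argpartition(scores, -top_n)[-top_n:]
top = top[np.argsort(scores[top])[::-1]]
```

The matrix-vector product is O(n_items x k) per request, which is exactly the cost the question is worried about: it must be repeated for every user, against every item, every time the model changes.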
On the one hand, researchers strive to shorten training time as much as possible, even making it online (Microsoft's Matchbox [1] is one such solution); on the other hand, whenever the model changes, the affinity scores for all user-item dyadic pairs have to be recomputed, which is an obvious conflict. The problem could therefore be summarized as "how to improve the prediction performance of model-based recommenders", where "performance" means computation cost rather than accuracy. Unfortunately, I have not found any reasonable suggestion in the academic literature. YouTube faces a similar situation with over 50M items; Google adopts a naive item-based CF [4] there, which does not suffer from the prediction-performance issue of model-based approaches. Does any practical solution to this challenge exist?

[1] David H. Stern, Ralf Herbrich, Thore Graepel: Matchbox: Large Scale Online Bayesian Recommendations. WWW 2009: 111-120
[2] Deepak Agarwal, Bee-Chung Chen, Pradheep Elango: Fast Online Learning through Offline Initialization for Time-Sensitive Recommendation. KDD 2010
[3] Y. Zhuang, W. Chin, Y. Juan, C. Lin: A Fast Parallel SGD for Matrix Factorization in Shared Memory Systems. RecSys 2013
[4] James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, Dasarathi Sampath: The YouTube Video Recommendation System. RecSys 2010
Answer:
I'm going to point you at a project I work on, because I think it has characteristics that are directly relevant to your question. Oryx (https://github.com/cloudera/oryx) uses a low-rank matrix factorization to make recommendations. It's not exactly the same model-build process that you cite above, but your question is really about the run-time scoring, which is identical. Yes, you are conceptually multiplying a user-feature vector by the whole item-feature matrix to make recommendations.

For what it's worth, on one core and modern hardware, Oryx can probably score 10,000,000 items in a second, not 10,000. This is in Java. The naïve approach is not that slow. That's still not blazingly fast, but there are tricks from there that it can do (that anyone could implement) to speed it up:

- Multi-thread the scoring to scale "vertically". There is no reason you can't split the operation across N cores efficiently.
- Scale horizontally by adding more servers to handle more requests. Each request is, after all, independent.
- Use locality-sensitive hashing to decide which very small subset of the item-feature matrix is likely to contain elements with a large dot product with the user-feature vector. It's approximate, but can make things much faster without much loss of accuracy.
- Ideally: use business domain knowledge to filter in only a small set of candidates to score, because they are the only ones eligible to be returned.

These make quite usable systems in practice. The bottleneck I find is that the above requires the whole item-feature matrix to be in memory. Memory is cheap these days, but this is usually the barrier you hit before running out of CPU.

You also mention the cost of updating the model. You certainly can't re-factor the matrix on every input, but you don't have to. You can make an approximate update by assuming that a user-item datum only changes that user's feature vector and that item's feature vector. That's not 100% true, but it is true to a first approximation.
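That approximate update can be sketched in the ALS style: hold all item vectors fixed and re-solve the small ridge-regression subproblem for just the affected user (a minimal NumPy illustration with hypothetical names; `reg` is the regularization strength, and this is one common way to do such a fold-in, not necessarily Oryx's exact code path):

```python
import numpy as np

def fold_in_user(item_factors, item_ids, ratings, reg=0.1):
    """Re-solve one user's vector, holding all item vectors fixed.

    item_factors : (n_items, k) item latent matrix Y
    item_ids     : indices of the items this user has interacted with
    ratings      : the corresponding observed values r
    Returns the ridge solution x = (Y_I'Y_I + reg*I)^-1 Y_I'r, where Y_I
    is the submatrix of rows for the user's items. Cost is O(m*k^2 + k^3)
    for m observations -- independent of the total item-pool size.
    """
    Y = item_factors[item_ids]                    # (m, k) rows for rated items
    k = Y.shape[1]
    A = Y.T @ Y + reg * np.eye(k)                 # (k, k) normal equations
    b = Y.T @ np.asarray(ratings, dtype=float)
    return np.linalg.solve(A, b)
```

Because only one k-dimensional vector changes per event, this kind of update can run online, and the symmetric item-side update is analogous.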
I have an old slide on this: http://www.slideshare.net/datasciencelondon/big-practical-recommendations-with-alternating-least-squares-16190093/14
Sean Owen at Quora
Other answers
Here are the major research works I've found that address this problem:

- Efficient Retrieval of Recommendations in a Matrix Factorization Framework. Noam Koenigstein, Parikshit Ram, Yuval Shavitt. CIKM 2012
- Learning Binary Codes for Collaborative Filtering. Ke Zhou, Hongyuan Zha. SIGKDD 2012
- Maximum Inner-Product Search using Tree Data-structures. Parikshit Ram, Alexander G. Gray. arXiv preprint arXiv:1202.6101
- Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS). Anshumali Shrivastava, Ping Li. arXiv preprint arXiv:1405.5869
- Speeding Up the Xbox Recommender System Using a Euclidean Transformation for Inner-Product Spaces. Yoram Bachrach, Yehuda Finkelstein, Ran Gilad-Bachrach, Liran Katzir, Noam Koenigstein, Nir Nice, Ulrich Paquet. RecSys 2014
- Preference Preserving Hashing for Efficient Recommendation. Zhiwei Zhang, Qifan Wang, Lingyun Ruan, Luo Si. SIGIR 2014
- On Symmetric and Asymmetric LSHs for Inner Product Search. Behnam Neyshabur, Nathan Srebro. arXiv preprint arXiv:1410.5518
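The Euclidean-transformation idea behind several of these papers (e.g. the Xbox one) reduces maximum inner product search to ordinary nearest-neighbor search by appending one coordinate, so that standard Euclidean indexes (KD-trees, LSH) apply. A minimal NumPy illustration of the transform itself, not any paper's full indexing scheme:

```python
import numpy as np

rng = np.random.default_rng(2)
items = rng.standard_normal((1000, 16))   # item latent vectors
user = rng.standard_normal(16)            # one user latent vector

# Transform: phi(x) = [x, sqrt(M^2 - ||x||^2)] for items (M = max item norm),
# phi(u) = [u, 0] for the user. Then
#   ||phi(u) - phi(x)||^2 = ||u||^2 + M^2 - 2 u.x,
# so the Euclidean nearest neighbor is exactly the max-inner-product item.
norms = np.linalg.norm(items, axis=1)
M = norms.max()
items_t = np.hstack([items, np.sqrt(M**2 - norms**2)[:, None]])
user_t = np.append(user, 0.0)

nn = np.argmin(np.linalg.norm(items_t - user_t, axis=1))
```

In practice one would build a KD-tree or Euclidean LSH index over `items_t` once per model update, turning per-request scoring from linear in the item pool into (approximately) sublinear.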
Yingfeng Zhang