What is the standard way to order content based on a user's features or query?

  • We are currently building a personalized commerce site. Since the site aims to provide personalized recommendations, we have built a simple yet effective scoring engine that scores items based on a user's input. Once a user feeds some input into our service, we would like every possible list of items to be ordered so that the most relevant items go to the top. We also want to let the user change that input easily. However, there seem to be several difficult problems to solve.

    First, we have to pre-compute each item's score for every possible combination of user input, since scoring takes a long time to compute. Currently this is feasible because user input is fairly limited. Once we expand the range of user input, however, it may become nearly impossible to enumerate every combination.

    Second, even if we assume that all scoring is done offline, ordering and paging a specific category or set of items still has to be efficient. Even with pre-computed scores, ordering and paging a set of items online, without pre-computing the order, may take too much time. For instance, showing the top 20 items for the category 'dress', which presumably contains 10000+ items, makes us order 10000+ numbers, which is far from real-time.

    I am sure many services, especially search engines, handle these issues well: real-time personalization, and ordering and paging based on a query. Is there a standard way to achieve this? Or, even better, is there a relevant book I can read on the subject?

  • Answer:

    Disclaimer: I started learning ML last semester and have worked on a couple of projects here and there. My answer might be a little theoretical, but it might give you a useful head start.

    Pre-computation: First of all, there are a few problems with pre-computing scores for each item in each category:

    1) We don't know the universal set of words from which users' queries will be formed, and we can't rule out having to process an unseen word.

    2) Trying out every possible permutation is brute force, exponential, and in short impractical. Some permutations might be so obsolete that they won't be worth the processing effort and development time.

    Vector space model: You can create a vector space model for the data you have. A few possible dimensions are color, gender of the customer, dress type, manufacturer, and material used. The cost of the dress and date-related parameters might help you with sorting, but I don't see much benefit in having dimensions for them. Example query: "Blue Denim". You can consider only the points (entities) located at <d:color, v:blue>, <d:dress_type, v:jeans>, etc., and neglect all other items. This step narrows down your result set many times over. More about the vector space model: http://en.wikipedia.org/wiki/Vector_space_model

    Category-based clusters: Extending the above idea, you can form clusters of similar items. You can do this step beforehand and do the finer processing afterwards. There are many clustering algorithms, such as K-means and hierarchical clustering; see which fits you better.

    Tag-based approach and N-grams: I don't know how sophisticated your data is, but if you have a description or tags for each item, this approach can also be applied. Tags are nothing but attributes of the entity. Even Google incorporates keyword-based ranking, so it is certainly feasible. If you have a plain-text description, an N-gram vector for each item can be pre-computed. Compute the query vector in the same way and then find the best match.
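The filtering and N-gram matching ideas above can be sketched roughly as follows. This is only an illustration under several assumptions: the catalogue, the field names (`color`, `dress_type`, `text`), character trigrams as the N-gram choice, and cosine similarity as the "best match" measure are all hypothetical and not taken from the answer.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, whitespace-normalised string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical catalogue; ids, fields and descriptions are made up.
ITEMS = [
    {"id": 1, "color": "blue", "dress_type": "jeans", "text": "blue denim slim jeans"},
    {"id": 2, "color": "red", "dress_type": "dress", "text": "red summer dress"},
    {"id": 3, "color": "blue", "dress_type": "shirt", "text": "blue cotton shirt"},
]

# Offline step: one n-gram vector per item, computed once.
ITEM_VECTORS = {item["id"]: char_ngrams(item["text"]) for item in ITEMS}

def search(query, filters=None):
    """Keep items matching every attribute filter, then rank them by similarity."""
    filters = filters or {}
    q = char_ngrams(query)
    candidates = [it for it in ITEMS
                  if all(it.get(k) == v for k, v in filters.items())]
    scored = [(cosine(q, ITEM_VECTORS[it["id"]]), it["id"]) for it in candidates]
    return [item_id for _, item_id in sorted(scored, reverse=True)]
```

Calling `search("Blue Denim", {"color": "blue"})` first discards every non-blue item and only then compares N-gram vectors, so the expensive similarity step runs on a much smaller candidate set.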
    Asymmetric scoring: This is a novel idea for handling false positives. The idea is to set different penalties for mistakes made in different ranges of the ranking. For example, showing a shirt at position 15 when the user asks for a t-shirt should hurt more than showing a shirt at position 97.

    Making use of user data: I assume you will allow users to create profiles on your e-commerce portal. One easy way is to ask for simple details like gender and measurements, because users won't enter these details in the initial search query. That doesn't mean we can't make use of that information, though :D Also, give users an option to change these settings right on the search results page. This handles the situation where users are buying things for friends and family and not for themselves.

    Misc - Sorting: There is a known linear-time solution for the Kth order statistic (source: http://en.wikipedia.org/wiki/Selection_algorithm). This algorithm fits the need better, as we simply want the top K items and don't want to sort the entire data set.

    Hope this helps!
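As a rough sketch of the top-K idea under "Misc - Sorting", the snippet below pages through a category without sorting all of it. Note that it uses Python's `heapq.nlargest`, which is heap-based (roughly O(n log k)) rather than the linear-time selection algorithm linked above; it is swapped in here only because it ships with the standard library. The function name `top_page` and the `scores` mapping are assumptions made for illustration.

```python
import heapq

def top_page(scores, page, page_size=20):
    """Return (item_id, score) pairs for one page of the ranking, best first.

    `scores` maps item id -> precomputed score. Only the best
    (page + 1) * page_size items are ever ordered, not the whole category.
    """
    k = (page + 1) * page_size
    best = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return best[page * page_size:k]
```

For a category of 10000+ items, `top_page(scores, 0)` gives the first 20 results and `top_page(scores, 1)` the next 20. Deep pages make `k` grow, so this helps most when users mainly look at the first few pages.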

Rushabh Mehta at Quora

Other answers

As you pointed out, pre-calculating scores for every possible query-item pair will not scale. An alternative approach commonly used in recommendation systems is matrix factorization. This involves projecting the query and the items into a common space and then calculating the distance, using some measure, between the query and the items in that space.

Generally the matrix factorization, which determines how queries and items are projected into the common space, is calculated offline. The projection of all items can also be pre-calculated. This means that at run time all that needs to be done is to project the query into the common space and then calculate the distance to each item. You may like to look into alternating least squares and SVD as a starting point. Mahout has a large-scale implementation, which is described here: http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares

Another very promising approach is to use RBMs / deep networks, which are very good at extracting non-linear features from the queries and items. Semantic hashing is a promising way of doing this in constant time. See here for details: http://www.utstat.toronto.edu/~rsalakhu/papers/semantic_final.pdf
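To make the run-time step concrete, here is a minimal sketch that assumes the factorization has already been done offline. Everything in it is hypothetical: the factor values are made up, the query is projected by simply summing the latent vectors of its known features (a simplification, not the fold-in procedure of any particular library), and the dot product stands in for "some measure" of closeness.

```python
# Hypothetical latent factors, assumed to come from an offline
# factorization such as ALS or SVD. The numbers are invented.
FEATURE_FACTORS = {
    "blue":   [0.9, 0.1],
    "denim":  [0.8, 0.3],
    "formal": [0.1, 0.9],
}
ITEM_FACTORS = {
    "jeans_01": [1.0, 0.2],
    "suit_07":  [0.1, 1.0],
    "shirt_03": [0.6, 0.6],
}

def project_query(features):
    """Map a query into the latent space by summing its known feature vectors."""
    dims = len(next(iter(ITEM_FACTORS.values())))
    vec = [0.0] * dims
    for feature in features:
        # Unseen features contribute nothing (assumption for this sketch).
        for i, x in enumerate(FEATURE_FACTORS.get(feature, [0.0] * dims)):
            vec[i] += x
    return vec

def rank_items(features):
    """Score every item by dot product with the projected query, best first."""
    q = project_query(features)
    scores = {item: sum(a * b for a, b in zip(q, v))
              for item, v in ITEM_FACTORS.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The only per-request work is one small projection plus one dot product per item, which is why the expensive factorization can stay offline.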

Raphael Cendrillon
