HBase schema design for user profiling?
-
I'm new to HBase and I'm trying to use it for one of our projects. The background is this: there are hundreds of mobile applications on our platform. Each user has a profile: user name, IP address, email, preferences, etc. Within each application, each user has some activities we want to keep track of. Examples include voice queries within s-voice, location history within foursquare, and so on.
Here are some example access patterns:
1. Based on the data, we can do data mining to learn user behavior so as to provide better answers.
2. We might want to find what happens recently for each user.
3. We might want to do some aggregates, such as how many users ask "what weather it is in Seattle".
Please advise what kind of HBase table schema I should design. Thanks a lot!!
-
Answer:
I'm going to be a bit general and point you in the right direction for thinking about the data model. You might have to evolve the data model as the use cases increase, but that's relatively easy since it's a schema-less datastore.

First rough pass on the data model:

Row key: user name
Two column families (c.f.): Profile and Activity

Profile: the "Profile" c.f. can contain the IP address, e-mail, preferences, etc. For example, the column key could be "IP Address" and the value the IP address of the user/request, and so on.

Activity: the "Activity" c.f. can contain the user's activity. The column key could be the AppID and the value the activity, stored in a JSON/Avro sort of data structure. A further column key could be RecentAppID, whose value is the App ID on which the user was most recently active.

2. We might want to find what happens recently for each user.
>> You can enable versioning on this c.f. if needed, so the latest activity by a user on a particular app is easily queryable. Alternatively, you can store all the activity by a user on a particular app as JSON/Avro in a single version that you keep overwriting (read the previous value, add the recent activity to it, write it back). For reference, see this: http://hbase.apache.org/book/versions.html Also, the Activity c.f. holds RecentAppID, which tells us the AppID the user most recently used; you can then query the activity for that particular AppID from the same c.f.

3. We might want to do some aggregates such as how many users ask "what weather it is in Seattle".
>> You can run an M/R job on the table to derive aggregates like this easily. Also, you could use counters if you need some of these aggregates in real time.

The point is that there are various ways to model the data in schema-less data stores. You really need to list out all your access patterns and then come up with data models accordingly.
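To make the layout above a bit more concrete, here is a minimal sketch in Python. It is only an in-memory stand-in for the table, not the HBase client API: the nested dict, the helper names (put, get_latest, record_activity, recent_activity, count_users_who_asked), the sample users, and the "query" field inside the JSON value are all assumptions made for illustration. It shows the user name as row key, the two column families, RecentAppID as a pointer to the latest activity, and a full scan of the kind an M/R job would perform.

```python
import json

# In-memory stand-in for one HBase table (NOT the HBase client API):
# row key -> column family -> column qualifier -> [(timestamp, value), ...]
table = {}

def put(row_key, family, qualifier, value, ts):
    """Add a new version of a cell, keeping versions newest first."""
    cell = table.setdefault(row_key, {}).setdefault(family, {}).setdefault(qualifier, [])
    cell.append((ts, value))
    cell.sort(key=lambda version: version[0], reverse=True)

def get_latest(row_key, family, qualifier):
    """Return the newest value of a cell, or None if the cell does not exist."""
    versions = table.get(row_key, {}).get(family, {}).get(qualifier, [])
    return versions[0][1] if versions else None

def record_activity(user, app_id, activity, ts):
    """Store the activity under the AppID column and update RecentAppID."""
    put(user, "Activity", app_id, json.dumps(activity), ts)
    put(user, "Activity", "RecentAppID", app_id, ts)

def recent_activity(user):
    """Follow RecentAppID to the latest activity on that app."""
    app_id = get_latest(user, "Activity", "RecentAppID")
    if app_id is None:
        return None
    return app_id, json.loads(get_latest(user, "Activity", app_id))

def count_users_who_asked(app_id, query):
    """Scan every row and count users with at least one matching query.
    On a real cluster this is the kind of work an M/R job would do."""
    count = 0
    for families in table.values():
        versions = families.get("Activity", {}).get(app_id, [])
        if any(json.loads(value).get("query") == query for _, value in versions):
            count += 1
    return count

# Hypothetical sample data; timestamps are plain integers for readability.
put("alice", "Profile", "email", "alice@example.com", ts=1)
record_activity("alice", "s-voice", {"query": "what weather it is in Seattle"}, ts=2)
record_activity("alice", "foursquare", {"location": "Seattle"}, ts=3)
record_activity("bob", "s-voice", {"query": "what weather it is in Seattle"}, ts=4)

print(recent_activity("alice"))
print(count_users_who_asked("s-voice", "what weather it is in Seattle"))
```

Because every activity write keeps its own timestamped version here, this corresponds to the "versioning enabled" option; the single-version alternative would instead read the stored JSON, append to it, and overwrite the cell.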
Jahangir Mohammed at Quora