How do I order the nodes of a social network to get the best locality when running map-reduce graph algorithms?
-
The Schimmy approach to optimising the performance of graph algorithms uses a partitioning strategy that groups nodes together using data derived from some attributes of the nodes -- see http://www.cloudera.com/blog/2010/11/do-the-schimmy-efficient-large-scale-graph-analysis-with-hadoop-part-2/ For example, a web-crawl graph will generally get good locality if its nodes are sorted by the domain of the URL of each webpage, since same-domain pages are likely to link to each other. What is the most effective (and efficient to calculate) way to totally order a social graph's nodes to maximise the likelihood that two nodes sorted nearby to each other will want to communicate during processing?
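As a rough illustration of the domain-ordering idea in the question, here is a minimal sketch. It assumes nodes are identified by URL strings, and it range-partitions them after sorting by host. `domain_key` and `range_partition` are hypothetical helper names. Reversing the host labels is an added assumption (a common trick, not something the Schimmy post prescribes).

```python
from urllib.parse import urlsplit


def domain_key(url: str) -> tuple[str, str]:
    """Sort key: the host name with its labels reversed, then the full URL.

    Reversing the labels (www.example.com -> com.example.www) keeps pages from
    the same site, and its subdomains, close together in sorted order.
    """
    host = urlsplit(url).hostname or ""
    return (".".join(reversed(host.split("."))), url)


def range_partition(urls: list[str], num_partitions: int) -> dict[str, int]:
    """Give each URL a position in domain order, then cut that order into
    contiguous ranges of (at most) equal size, one range per partition."""
    ordered = sorted(urls, key=domain_key)
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    return {url: i // size for i, url in enumerate(ordered)}
```

Equal-size ranges can still cut through a large domain. A production partitioner would need to account for skewed domain sizes.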
-
Answer:
It's been a while (the graph was much smaller then, as were the Hadoop cluster and the average number of friends per user), but I spent quite a bit of time trying to run one of the simplest graph algorithms on FB's graph: finding each user's top friends by number of mutual friends. My conclusion was that map-reduce (as implemented in Hadoop) is a very poor paradigm for this particular graph: there is little locality, and far too much intermediate data is generated. I tried many non-trivial ways of partitioning the graph (by different features), and I also estimated the effect of local caching strategies (shuffling only when required). While these helped quite a bit back then, by the time I was close to finishing I was asking myself: why not just do it in memory? We did the math afterward. If we simply stored the graph in memory, with a fast hash table and a decent network, we could do the entire computation on a small handful of machines, with none of the complicated tuning around partitioning and caching.

Partly this may be implementation-dependent. Hadoop is not the fastest map-reduce implementation, my experience is based on v0.13, and Hadoop does grouping by sorting, which can be avoided in most cases (including this one). It's not so much that map-reduce is algorithmically worse off. After all, where an in-memory solution sends network lookups one at a time, map-reduce is simply batching those requests up. The problem is that the implementation (Hadoop) carries various artifacts, extraneous to the problem, that greatly increase the computation cost: buffering incurs the cost of storing data on disk, network I/O is not well overlapped with computation, sorting is really expensive, and running a huge number of mappers/reducers is desirable but is bottlenecked on a single JobTracker, and so on.

Just my little practical experience on this matter.
Joydeep Sen Sarma at Quora
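To make the in-memory alternative concrete, here is a single-machine sketch of "top friends by mutual friends". It assumes an undirected graph held entirely in memory as a dict mapping each user id to a set of friend ids. The function name is hypothetical, and this is not the answerer's code. Each mutual-friend count is a local set intersection rather than a shuffle.

```python
import heapq


def top_friends_by_mutual(adj: dict[int, set[int]], user: int, k: int) -> list[tuple[int, int]]:
    """Return up to k (friend, mutual_count) pairs for `user`, highest count first.

    For each friend f of `user`, the mutual-friend count is |adj[user] & adj[f]|,
    computed with in-memory hash-set intersections.
    """
    mine = adj.get(user, set())
    counts = [(f, len(mine & adj.get(f, set()))) for f in mine]
    # Highest count first; ties broken by ascending friend id for a stable result.
    return heapq.nsmallest(k, counts, key=lambda fc: (-fc[1], fc[0]))
```

For example, with `adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}`, calling `top_friends_by_mutual(adj, 1, 2)` returns `[(2, 2), (3, 1)]`.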
Other answers
For many algorithms, MapReduce is not the best solution. Social networks, the WWW, and many other real-world graphs are "scale-free" [1]. One of the characteristics of a scale-free network is a small diameter [2]. In practice, this means that a scale-free network is hard to partition: after a few iterations of a graph traversal, you have visited a very large portion of the graph. For many graph algorithms, alternative hardware/network architectures have shown very good results, both in research [3] and in practice [4]. Both of these architectures address the lack of locality and partitionability by storing the graph in RAM. There is also research on hybrid MapReduce approaches [5].

[1] See the references at http://en.wikipedia.org/wiki/Scale-free_network#References
[2] http://arxiv.org/abs/cond-mat/0205476
[3] http://www.cc.gatech.edu/~bader/papers/MassiveTwitter.html
[4] http://www.readwriteweb.com/hack/2010/12/how-hunch-built-a-data-crunchi.php
[5] http://www.cc.gatech.edu/~bader/papers/HybridMapReduce.html
Joe Crobak
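The small-diameter point can be checked on your own data by measuring how much of the graph a breadth-first traversal reaches within each hop. This is a minimal sketch that assumes an adjacency dict in which every node appears as a key and the source node is present. `reach_by_hop` is a hypothetical name.

```python
from collections import deque


def reach_by_hop(adj: dict[int, list[int]], source: int) -> list[float]:
    """Cumulative fraction of all nodes reached within 0, 1, 2, ... hops (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Count nodes at each hop distance, then accumulate.
    per_hop = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        per_hop[d] += 1
    n, total, cumulative = len(adj), 0, []
    for count in per_hop:
        total += count
        cumulative.append(total / n)
    return cumulative
```

On a hub-heavy graph, the cumulative fraction jumps toward 1.0 within a few hops. That jump is why partition boundaries get crossed almost immediately.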
If by locality you mean node locality (most of a user's friends' data being stored on the same HDFS node) and minimizing network I/O, my guess is that geography would do. I'd imagine many, if not most, of a person's friends are in the same geographical region as that person. If by locality you mean locality of reference (e.g. most friends' data clustered together), I can't think of a good way to do it.
Reynold Xin
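One way to test the geography guess (or any attribute-based scheme) is to measure what fraction of friendships stay inside a single partition. This is a minimal sketch. The `region` mapping in the usage note is hypothetical example data, and any function from node id to partition label can be passed in.

```python
from typing import Callable, Hashable, Iterable


def local_edge_fraction(edges: Iterable[tuple[int, int]],
                        partition_of: Callable[[int], Hashable]) -> float:
    """Fraction of edges whose two endpoints land in the same partition."""
    total = local = 0
    for u, v in edges:
        total += 1
        if partition_of(u) == partition_of(v):
            local += 1
    return local / total if total else 0.0
```

For example, with `region = {1: "EU", 2: "EU", 3: "US", 4: "US"}` and edges `[(1, 2), (2, 3), (3, 4), (1, 4)]`, calling `local_edge_fraction(edges, region.get)` gives `0.5`. A higher value means less cross-partition traffic.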
One way to get a little speedup is to use an LSH trick to cluster users. Take some hash function h and assign each user v to the cluster given by min_{u ∈ friends(v)} h(u). Users with many connections in common will have a slightly higher probability of ending up in the same cluster. Don't expect any miracles, though.
Erik Bernhardsson
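The min-hash clustering can be sketched as follows, assuming an adjacency dict of friend sets. Keyed BLAKE2 hashing stands in for the "some hash function h" (an assumption), and the function names are hypothetical. Each user is labeled by the friend with the smallest hash, which identifies the same cluster as the minimum hash value itself, barring hash collisions. If h behaved like a random permutation, two users would share a label with probability equal to the Jaccard similarity of their friend sets. Users with no friends are skipped.

```python
import hashlib


def h(user: int, seed: int = 0) -> int:
    """Deterministic 64-bit hash of a user id; `seed` selects a different hash function."""
    digest = hashlib.blake2b(f"{seed}:{user}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_cluster(adj: dict[int, set[int]], seed: int = 0) -> dict[int, int]:
    """Label each user with the friend whose hash is smallest (the cluster label)."""
    return {v: min(friends, key=lambda u: h(u, seed))
            for v, friends in adj.items() if friends}
```

Sorting users by this label (then by id) gives a total order in which users with overlapping friend lists tend to sit next to each other. Users with identical friend sets always get the same label.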
This is a topic of current research. There is at least one recent paper that directly addresses this question: http://www.eecs.harvard.edu/%7Emichaelm/postscripts/kdd2009.pdf It turns out that finding the "best" ordering is NP-hard. However, the authors do give some heuristics and show how they work on a few graphs. You can read the blog post that covers the paper for more details: http://mybiasedcoin.blogspot.com/2009/05/new-paper-on-compressing-social.html
Mangesh Gupte
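To compare candidate orderings, you need a score for how far apart linked nodes end up. The sketch below computes two illustrative arrangement-style costs: the sum of position gaps over all edges, and the sum of their base-2 logarithms. These are in the spirit of the ordering objectives that paper studies, not its exact formulation. The function name is hypothetical, and `order` is assumed to contain every node that appears in `edges`. Self-loops are skipped so that log2(0) never occurs.

```python
import math


def arrangement_costs(edges: list[tuple[int, int]], order: list[int]) -> tuple[int, float]:
    """Score a total order of nodes by how far apart each edge's endpoints land.

    Returns (linear, logarithmic): the sum of |pos(u) - pos(v)| over edges and
    the sum of log2 of those gaps. Lower is better for both.
    """
    pos = {node: i for i, node in enumerate(order)}
    gaps = [abs(pos[u] - pos[v]) for u, v in edges if u != v]
    return sum(gaps), sum(math.log2(g) for g in gaps)
```

For a path graph with edges `[(1, 2), (2, 3), (3, 4)]`, the order `[1, 2, 3, 4]` scores `(3, 0.0)`, while `[1, 3, 2, 4]` scores `(5, 2.0)`.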
If you have a large, global social graph, you could do *much* worse than using the language spoken to partition the space (at least, that's what a basic analysis of the Flickr social graph told us).
Kellan Elliott-McCrea
This is a great research question, and I doubt there exists a "best" solution. Also, graph algorithms are such a broad area, with problems ranging from O(1) to NP-hard, that I don't think one graph partitioning scheme can fit all problems. Locality-based partitioning is probably good for some simple problems such as friends-of-friends and shortest paths. It should make a great research topic.
Ning Zhang