MapReduce: How to find online communities by removing nodes (vertices) from a social graph?
-
I want to carry out graph clustering on a huge undirected graph with millions of edges and nodes. The graph is almost clustered already: different clusters are joined together only by a few nodes (ambiguous nodes that can relate to multiple clusters), and there are very few or almost no edges between two clusters. This problem is similar to finding a vertex cut set of a graph, with one exception: the graph needs to be partitioned into many components, and their number is unknown. (Refer to this picture: https://docs.google.com/file/d/0B7_3zLD0XdtAd3ZwMFAwWDZuU00/edit?pli=1)
It is almost like different strongly connected components sharing a couple of nodes between them, and I am supposed to remove those nodes to separate the components. The edges are weighted, but this problem is more about finding structure in a graph, so edge weights are not relevant. (Another way to think about the problem is to visualize solid spheres touching each other at some points, with the spheres being the strongly connected components and the touching points being the ambiguous nodes.)
I am prototyping something, so I am quite short of time to study graph clustering algorithms myself and select the best possible solution. I also need a solution that cuts nodes and not edges, since in my case different clusters share nodes and not edges.
Is there any research paper or blog that addresses this or a related problem? Or can anyone come up with a solution to this problem, however dirty? Since millions of nodes and edges are involved, I would need a MapReduce implementation of the solution. Any inputs or links for that too? Is there an existing open source MapReduce implementation that I can use directly?
I think this problem is analogous to finding communities in online social network graphs, where the communities need to be discovered by removing nodes (vertices).
-
Answer:
To address your algorithmic challenge: the metric you are describing is a "betweenness" metric. That is, if you construct shortest paths between all pairs of nodes in a graph (using your weighting mechanism), the nodes and edges that join clusters will be reused in a large number of those paths, because traffic between clusters is forced through them. These edges and nodes have high "betweenness centrality". I recommend looking into approximation methods for identifying central edges; note that after each removal of a highly central graph element, recalculation is potentially expensive. This recalculation is necessary to avoid removing edges that are no longer central after earlier removals (e.g., if you had two clusters linked by two nodes). Read about the Girvan-Newman betweenness centrality clustering algorithm. The JUNG Java library has an implementation, although it does not support hierarchical clustering (e.g., saving the clusters after each clustering step).
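As a rough illustration of the idea above, here is a minimal single-machine sketch in Python. It is not a MapReduce implementation and not the JUNG or Girvan-Newman code. It assumes an unweighted, undirected graph stored as a dict of neighbor sets, computes node betweenness with Brandes' algorithm, and repeatedly removes the highest-scoring node, recomputing after each removal. The helper names (`build_graph`, `node_betweenness`, `split_by_node_removal`) and the `max_removals` stopping rule are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_betweenness(adj):
    """Node betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


def connected_components(adj):
    """Return the connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def split_by_node_removal(adj, max_removals):
    """Remove the most central node up to max_removals times.

    Betweenness is recomputed after every removal. Stops early once no
    remaining node lies on a shortest path between two other nodes.
    """
    removed = []
    for _ in range(max_removals):
        scores = node_betweenness(adj)
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break
        removed.append(best)
        adj = {v: nbrs - {best} for v, nbrs in adj.items() if v != best}
    return removed, connected_components(adj)


if __name__ == "__main__":
    # Two 4-node cliques that touch only at the shared node "h".
    edges = list(combinations(["a1", "a2", "a3", "h"], 2))
    edges += list(combinations(["b1", "b2", "b3", "h"], 2))
    removed, comps = split_by_node_removal(build_graph(edges), max_removals=2)
    print(removed, [sorted(c) for c in comps])
```

Each betweenness pass costs on the order of (number of nodes) x (number of edges) for an unweighted graph, which is why the recalculation after every removal is the expensive part and why approximation methods matter at the scale described in the question.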
Jacob Ouellette at Quora
Other answers
Got this answer on Stack Overflow [1]: http://micans.org/mcl. It seems legit at first glance! [1] http://stackoverflow.com/questions/10764597/graph-clustering-for-almost-clustered-graph-by-removing-nodesvertices?noredirect=1#comment25991842_10764597
Shashank Gupta
There's clustering in Mahout, but there is also X-RIME: http://xrime.sourceforge.net/. X-RIME seems to use Hadoop 0.20, though, which may or may not be an issue for you (I am constantly confused as to which API I am supposed to use now).
Simon Thompson