MapReduce: How to find online communities by removing nodes (vertices) from a social graph?

  • I want to cluster a huge undirected graph with millions of nodes and edges. The graph is already almost clustered: the clusters are joined only through a few nodes (ambiguous nodes that can belong to multiple clusters), and there are very few or no edges running directly between two clusters. The problem is close to finding a vertex cut set of a graph, except that the graph needs to be partitioned into many components, and their number is unknown. (See this picture: https://docs.google.com/file/d/0B7_3zLD0XdtAd3ZwMFAwWDZuU00/edit?pli=1) It is like several densely connected components sharing a couple of nodes, and I need to remove those nodes to separate the components. The edges are weighted, but the problem is about finding structure in the graph, so edge weights are not relevant. (Another way to picture it: solid spheres touching each other at a few points, where the spheres are the densely connected components and the touching points are the ambiguous nodes.)

    I am prototyping, so I am quite short of time to study graph clustering algorithms myself and pick the best one. I also need a solution that cuts nodes rather than edges, since in my case different clusters share nodes, not edges.

    Is there a research paper or blog post that addresses this or a related problem? Or can anyone suggest a solution, however dirty? Since millions of nodes and edges are involved, I would need a MapReduce implementation. Any input or links for that? Is there an open source MapReduce implementation that I can use directly?

    I think this problem is analogous to finding communities in online social network graphs, where the communities have to be discovered by removing nodes (vertices).

  • Answer:

    To address your algorithmic challenge: the metric you are describing is a "betweenness" metric. If you construct the shortest paths between all pairs of nodes in the graph (using your weighting mechanism), the nodes and edges that connect clusters will be reused in a large number of those paths, because paths between clusters are forced through them. These nodes and edges have high "betweenness centrality". I recommend looking into approximation methods for identifying central edges; note that after each removal of a highly central graph element, recalculation is potentially expensive. The recalculation is necessary to avoid removing edges or nodes that are no longer central after earlier removals (e.g., if two clusters are linked by two nodes). Read about the Girvan-Newman betweenness centrality clustering algorithm. The JUNG Java library has an implementation, although it does not support hierarchical clustering (e.g., saving the clusters after each clustering step).
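
    The remove-and-recalculate loop described above can be sketched for nodes rather than edges. This is only an illustration under stated assumptions, not JUNG's implementation and not a MapReduce job: the graph is treated as unweighted and small enough to fit in memory, node betweenness is computed with Brandes' algorithm, and the stopping rule (a desired number of components, `target_components`) is an assumption, since the question leaves the number of clusters unknown. The function names `node_betweenness`, `components`, and `split_by_node_removal` are hypothetical.

```python
from collections import deque


def node_betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(comp)
    return result


def split_by_node_removal(adj, target_components):
    """Remove the most central node, recompute betweenness, and repeat.

    target_components is an assumed stopping rule, not part of the question.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    removed = []
    while adj and len(components(adj)) < target_components:
        scores = node_betweenness(adj)  # recalculated after every removal
        hub = max(scores, key=scores.get)
        for w in adj.pop(hub):
            adj[w].discard(hub)
        removed.append(hub)
    return removed, components(adj)


# Two triangles that share the single "ambiguous" node x.
graph = {
    "a": {"b", "x"},
    "b": {"a", "x"},
    "x": {"a", "b", "c", "d"},
    "c": {"x", "d"},
    "d": {"x", "c"},
}
removed, clusters = split_by_node_removal(graph, target_components=2)
print(removed)  # ['x']
```

    Recomputing exact betweenness after every removal costs roughly O(V·E) per round on an unweighted graph, which is why the answer points to approximation methods for graphs with millions of nodes.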

Jacob Ouellette at Quora

Other answers

There's clustering in Mahout, but also see X-RIME: http://xrime.sourceforge.net/ X-RIME seems to use Hadoop 0.20, though, which may or may not be an issue for you (I am constantly confused as to which API I am supposed to use now).

Simon Thompson
