In this assignment, you will implement the Louvain and Leiden algorithms and compare the results on two protein interaction networks. Get started early. This assignment is significantly more difficult than Assignment 1!
MoveNodes function, i.e., one iteration of Phase 1 of the Louvain algorithm
MoveNodes refers to the function in the pseudocode on page 28 of the slides.
Louvain function
MoveNodes back to the function repeatedly until the value of the quality function does not improve.
MoveNodesFast function, i.e., one iteration of the Leiden algorithm
RefinePartition and MergeNodeSubsets functions
RefinePartition should be easy to implement. Hint: For the sake of convenience, when you are invoking MergeNodeSubsets, do not pass the entire partition \(P_{\mathrm{refined}}\). Instead, pass only the nodes in the community \(C\), with each node in its own community.
Although MergeNodeSubsets looks very complex, you can break up the implementation into pieces. Let us name its arguments \(G\), \(P\), and \(S\).
Write a function CheckGammaConnected that takes two arguments \(T\) and \(S\). Both are sets of nodes, where \(T\) is a subset of \(S\). This function simply checks whether the number of edges that connect the nodes in \(T\) to the nodes in \(S-T\) is at least as large as the product of \(\gamma\), the number of nodes in \(T\), and the number of nodes in \(S-T\).
Call CheckGammaConnected repeatedly with each node \(v\) in \(S\) (as the singleton set \(\{v\}\)) and \(S\) as the two arguments.
Call CheckGammaConnected repeatedly with each community in \(P\) and \(S\) as the two arguments. Here, be sure that each community in \(P\) that you consider is a subset of \(S\). This condition should hold true if you follow the hint above.
Leiden function
RefinePartition back to the Leiden function repeatedly until the value of the quality function does not improve.
Louvain.
If Leiden breaks \(c\) up into smaller communities, then deem \(c\) to be badly connected.
MoveNodes or MoveNodesFast. Here it is important to record the time taken by the process itself rather than the clock time. Find out how to do it accurately in the language you use.
A PDF file on Canvas that contains the following plots:
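For the timing requirement above, here is a minimal sketch in Python, assuming that is the language you use. The helper name `cpu_time_of` is hypothetical. It relies on `time.process_time`, which reports the CPU time (user plus system) consumed by the current process and does not count time spent sleeping or waiting.

```python
import time


def cpu_time_of(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, CPU seconds used).

    cpu_time_of is a hypothetical helper name. time.process_time() counts
    CPU time of this process only, so time spent sleeping is not included.
    """
    start = time.process_time()
    result = func(*args, **kwargs)
    elapsed = time.process_time() - start
    return result, elapsed
```

You could then wrap a call to your own function, for example `cpu_time_of(move_nodes, graph, partition)`, where `move_nodes`, `graph`, and `partition` stand for whatever names your implementation uses.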
Grading for items 4 and 5 will be somewhat subjective.
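The CheckGammaConnected test described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the graph is unweighted and stored as a dictionary `adj` that maps each node to the set of its neighbours, and the name `check_gamma_connected` and its argument order are my own choices, not part of the pseudocode.

```python
def check_gamma_connected(adj, T, S, gamma):
    """Return True if T is gamma-connected to the rest of S.

    Assumptions of this sketch: adj maps node -> set of neighbours
    (unweighted, undirected graph) and T is a subset of S.
    """
    T = set(T)
    rest = set(S) - T  # the nodes in S - T
    # Count the edges with one endpoint in T and the other in S - T.
    crossing = sum(1 for u in T for w in adj[u] if w in rest)
    return crossing >= gamma * len(T) * len(rest)
```

Calling it with `T = {v}` covers the per-node check, and calling it with a community from \(P\) covers the per-community check.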
This assignment is challenging. Here are some ideas to help you succeed.
In MoveNodes, you will test moving the node \(v\) to every community \(C\), including the empty community. Here, you should create temporary variables that store the new partition where \(v\) is in \(C\) and compute the modularity/CPM value of this new partition explicitly using the quadratic-time algorithm. Compare the difference between this value and \(h_{old}\) to the value you compute for the change in modularity/CPM value using the efficient function.
MoveNodes. To speed it up, change the implementation of line 16 of the pseudocode for MoveNodes and line 17 of MoveNodesFast. Instead of trying to move node \(v\) to every cluster in the partition, consider only those clusters that contain a node \(u\) that is a neighbour of \(v\). Thus, the number of clusters to which you try to move \(v\) will drop to at most the number of neighbours of \(v\). Consider moving \(v\) to an empty cluster as well.
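The restriction to neighbouring clusters can be sketched as below. The names are hypothetical: `adj` maps each node to the set of its neighbours, `community` maps each node to its cluster label, and `None` is used here as a stand-in for a new, empty cluster.

```python
def candidate_communities(adj, community, v):
    """Clusters worth testing for node v.

    Assumptions of this sketch: adj maps node -> set of neighbours,
    community maps node -> cluster label, and None marks an empty cluster.
    """
    # Only clusters that contain at least one neighbour of v.
    candidates = {community[u] for u in adj[v]}
    # Also consider moving v to an empty cluster.
    candidates.add(None)
    return candidates
```

Because several neighbours may share a cluster, the returned set has at most one entry per neighbour of \(v\), plus the empty cluster.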
For further speed, consider the formulae for change in modularity that I posted on Piazza. One of them is \(\Delta{\cal H}_{\cal P}(v \rightarrow \emptyset) = - \frac{1}{m}\sum_{u \in D} a(u,v) + \frac{\gamma d(v)}{2m^2}\sum_{u \in D} d(u)\). Precompute each of the two sums and cache them so that you can look them up in the functions that compute the change in modularity. If you implement these ideas, be sure to write new functions for MoveNodes and MoveNodesFast so that you still retain the old, slower, but bug-free implementations.
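One way to keep the fast version honest is to compare the formula against an explicit quadratic-time computation, as in the sketch below. It rests on assumptions that the text does not spell out: \(D\) is taken to be the current community of \(v\) without \(v\) itself, the graph is unweighted with no self-loops, and modularity is taken to be \(\frac{1}{2m}\sum_{i,j}\bigl(a(i,j) - \gamma\frac{d(i)d(j)}{2m}\bigr)\) over pairs in the same community. Check these against the slides before relying on it. All function names are hypothetical.

```python
from itertools import product


def modularity(adj, community, gamma=1.0):
    """Quadratic-time modularity over all ordered pairs in the same community."""
    degree = {v: len(adj[v]) for v in adj}
    two_m = sum(degree.values())
    total = 0.0
    for u, w in product(adj, repeat=2):
        if community[u] == community[w]:
            a_uw = 1.0 if w in adj[u] else 0.0
            total += a_uw - gamma * degree[u] * degree[w] / two_m
    return total / two_m


def delta_to_empty(adj, community, v, gamma=1.0):
    """Change in modularity when v moves to an empty community (formula above).

    Assumption: D is the current community of v, excluding v itself.
    """
    degree = {u: len(adj[u]) for u in adj}
    m = sum(degree.values()) / 2
    D = [u for u in adj if u != v and community[u] == community[v]]
    edge_sum = sum(1 for u in D if u in adj[v])   # sum of a(u, v) over D
    degree_sum = sum(degree[u] for u in D)        # sum of d(u) over D
    return -edge_sum / m + gamma * degree[v] * degree_sum / (2 * m * m)
```

To use it, compute `modularity` before and after placing \(v\) in a fresh community label and confirm that the difference matches `delta_to_empty` up to floating-point error.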
In MoveNodes or MoveNodesFast, you will only need to do a lookup in the hash table.
MoveNodes or MoveNodesFast the first time, but I am sure you can work out that these initial values are simple.
In MoveNodes or MoveNodesFast, you will only need to do a lookup in the hash table.
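A possible shape for such hash tables is sketched below. This is an assumption-laden illustration rather than the intended design: the class `SumCache` and its methods are hypothetical names, the graph is unweighted with no self-loops, and \(D\) is taken to be the community of \(v\) without \(v\). The tables are filled once at the start and then adjusted whenever a node moves, so that each query is a dictionary lookup.

```python
from collections import defaultdict


class SumCache:
    """Hypothetical cache of the two sums used in the change-in-modularity formula."""

    def __init__(self, adj, community):
        self.adj = adj
        self.community = dict(community)
        # community label -> sum of degrees of its nodes
        self.degree_sum = defaultdict(int)
        # node v -> community label -> number of neighbours of v in that community
        self.links = {v: defaultdict(int) for v in adj}
        for v, nbrs in adj.items():
            self.degree_sum[self.community[v]] += len(nbrs)
            for u in nbrs:
                self.links[v][self.community[u]] += 1

    def move(self, v, target):
        """Move v to community `target` and update both tables."""
        source = self.community[v]
        d_v = len(self.adj[v])
        self.degree_sum[source] -= d_v
        self.degree_sum[target] += d_v
        for u in self.adj[v]:
            self.links[u][source] -= 1
            self.links[u][target] += 1
        self.community[v] = target

    def sums_for(self, v):
        """Return (sum of a(u, v), sum of d(u)) over D = community of v minus v."""
        c = self.community[v]
        return self.links[v][c], self.degree_sum[c] - len(self.adj[v])
```

Each call to `move` costs time proportional to the degree of the moved node, which is the trade-off for making `sums_for` a constant-time lookup.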