Assignment 1 (CS 4884: Computing the Brain)
Deadline: 11:59pm, February 20, 2020

The goal of this assignment is to determine if different brain networks have the small world property in order to replicate the results in some of the papers we will discuss in class. You will have to write code for this assignment. You can use any programming language you want. Feel free to use matrix or graph libraries that already implement helpful classes and functions, unless I instruct you otherwise. Work on this assignment individually.

Your code will have bugs. It may take a long time to run. Therefore, start well in advance of the deadline!

Analysis of brain connectomes

You will analyze specific brain connectomes for their small-world properties. The networks you will consider are the following. The first five networks are from the Brain Connectivity Toolbox:

  • The weighted and directed interareal connectome for the macaque cerebral cortex used in Cortical High-Density Counterstream Architectures.
    • This file has many columns. You may ignore NEURONS, STATUS, and BIBLIOGRAPHY.
    • The SOURCE and TARGET columns specify the brain regions where each connection starts and ends, respectively.
    • The FLNe value is the edge weight.
    • The same SOURCE-TARGET pair may appear in multiple lines with different values of FLNe. The CASE value will differ across these lines, since each CASE is in effect a different animal.
    • A larger value of FLNe corresponds to more neurons projecting from the SOURCE brain region to the TARGET brain region.
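One way to think about the repeated SOURCE-TARGET pairs and the FLNe weights is sketched below in Python. This is only an illustration of one possible set of choices, not the required approach: the function names `aggregate_flne` and `flne_to_distance` are hypothetical, averaging over CASE values is just one option (summing or taking the maximum are others), and the conversion assumes every FLNe value lies in (0, 1] so that a larger weight can be mapped to a shorter distance via log(1/FLNe).

```python
import math
from collections import defaultdict


def aggregate_flne(rows):
    """Average FLNe over repeated (SOURCE, TARGET) pairs.

    rows: iterable of (source, target, flne) tuples, one per line of the file.
    Averaging is an assumption made for this sketch, not a prescribed rule.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for source, target, flne in rows:
        totals[(source, target)] += flne
        counts[(source, target)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


def flne_to_distance(weights):
    """Map each weight w in (0, 1] to the distance log(1 / w).

    Stronger connections get shorter distances; pairs with weight 0 are
    dropped because they carry no connection at all.
    """
    return {pair: math.log(1.0 / w) for pair, w in weights.items() if w > 0}


# Hypothetical rows: the pair (V1, V2) is reported for two different cases.
rows = [("V1", "V2", 0.5), ("V1", "V2", 0.25), ("V2", "V4", 1.0)]
weights = aggregate_flne(rows)
distances = flne_to_distance(weights)
```

Whatever choices you make here, state them in your report, since they change the shortest path lengths you will compute.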
  1. For each network, write a function to read the corresponding file into a data structure. You may use existing libraries and packages, e.g., the NetworkX library in Python, to store the network. Some libraries may also have their own functions to read in a graph from a file, although you must be careful to ensure that the format of the file is supported by the function. Here are some points to keep in mind as you implement this function.
    • The graph may be directed or undirected and each edge may or may not have a weight; this weight is usually in the third column of the file.
    • You can assume that a graph is undirected if every edge \((u,v)\) appears twice in the file, once with \(u\) in the first column and \(v\) in the second column and another time with the order flipped. However, the edge weight must be the same in both appearances of the edge.
    • In addition, you may assume that a graph is undirected if at least some fraction of edges appear twice in the file, e.g., 0.95 or more of all edges.
    • Your function to read the graph must include tests to determine if the graph is directed or undirected and unweighted or weighted.
    • If a graph has edge weights, you may ignore them for the purpose of computing the clustering coefficient.
    • Some of these graphs may be disconnected. Consider how you will deal with them.
    • Parsing the final dataset (the weighted and directed interareal connectome) and creating a graph from it will require some thought and specific decisions. Explain in your report how you (a) dealt with multiple appearances of the same pair of SOURCE-TARGET values and (b) modified the FLNe values to make them compatible with shortest path computations.
  2. After reading in the network, compute two properties: the average shortest path length and the average clustering coefficient. Implement these functions with your own code, even if the graph library you use contains these functions. The one exception to this rule is that you may use a library function to compute the shortest path between a pair of nodes or from one node to every other node. Recall that to compute the average shortest path length in an undirected graph with \(n\) nodes, you must compute the length of the shortest path between all \(\binom{n}{2}\) pairs of nodes and then calculate the average of these values. Similarly, you must compute the clustering coefficient of every node in the graph and then take the average of these values.
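The two steps above can be sketched as follows for an unweighted, undirected graph. This is a minimal illustration under stated assumptions rather than a reference solution: all function names are hypothetical, the graph is stored as a plain dictionary of neighbor sets, self-loops are ignored, node pairs with no connecting path are skipped when averaging path lengths (one of several possible conventions for disconnected graphs), and nodes with fewer than two neighbors are given a clustering coefficient of 0.

```python
from collections import deque


def reciprocal_fraction(edges):
    """Fraction of listed edges (u, v) whose reverse (v, u) is also listed."""
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_shortest_path_length(adj):
    """Mean hop distance over connected node pairs (unreachable pairs skipped).

    Each unordered pair is visited twice, once from each endpoint, which
    doubles both the total and the pair count and leaves the mean unchanged.
    """
    total = 0
    pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")


def average_clustering(adj):
    """Mean over all nodes of (links among neighbors) / (k choose 2)."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # assumption: coefficient is 0 when k < 2
            continue
        # Each link between two neighbors is seen from both ends, so halve.
        links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else float("nan")


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
toy_path_length = average_shortest_path_length(toy)
toy_clustering = average_clustering(toy)
```

The value returned by `reciprocal_fraction` can be compared against a threshold such as 0.95 to decide whether to treat a file as describing an undirected graph. Running one breadth-first search per node is adequate for graphs of a few hundred nodes; weighted graphs would need Dijkstra's algorithm or a library shortest-path routine in place of `bfs_distances`.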

Submission via Canvas

Turn in a typeset (not handwritten) report describing your results for each graph. Specifically, for each graph, report the following:

  • Is it undirected/directed, weighted/unweighted? Mention what value of the fraction you used as a threshold for deciding that a graph is undirected.
  • What steps did you take in your code to handle the calculations if (a) the graph was disconnected and/or (b) the graph was directed and one or more node pairs were not connected by a path?
  • For the weighted and directed interareal connectome, give your responses to the two points noted above: (a) how you dealt with multiple appearances of the same SOURCE-TARGET pair and (b) how you modified the FLNe values to make them compatible with shortest path computations.
  • How many nodes and how many edges does it contain?
  • What is the average shortest path length?
  • What is the average clustering coefficient?

Mention any difficulties you encountered and how you addressed them. Were there any surprises, i.e., results or trends you did not expect? It will also be helpful to me if you can include your thoughts on what you learnt from this assignment.