Numerous publications have been devoted to the study of the topological organization of biological networks. This was initiated by several articles published by Albert Barabasi's team (Jeong et al., 2000; Jeong et al., 2001), which are now considered seminal for the field of network biology (Barabasi, 2004).
One of the main ideas is that in all biological networks (metabolism, protein interactions, gene regulation, etc.), the degree distribution follows a power law, meaning that the vast majority of the nodes have very few connections, while a few nodes have a large number of connections.
The universality of the power law is however controversial, since several publications showed that the theoretical distribution did not fit actual biological data (Stumpf, 2005; Stumpf, 2005; Khanin, 2006; Lima-Mendez, 2009; Stumpf and Porter, 2012).
Several statistical parameters have been proposed to describe topological properties of the nodes of a network.
Degree: number of neighbours of a node in a network. In a directed network, we may distinguish the out-degree (number of outgoing edges) and the in-degree (number of incoming edges). Nodes with a higher degree than expected in a random network are qualified as hubs, by analogy to computer networks.
Closeness of a vertex is the inverse of the average length of the shortest paths from this vertex to all the other vertices that are reachable from it.
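Closeness is a measure of centrality: with the definition above, the nodes with the highest closeness are those that can be reached in the smallest number of steps from any other node of the same component. If a tool reports the average path length itself rather than its inverse, the most central nodes are instead those with the smallest values.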
Betweenness: proportion of shortest paths passing through a node, among all the shortest paths between all nodes of a graph.
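The three definitions above can be made concrete on a toy network. The following is a minimal sketch in plain Python, assuming an undirected, unweighted graph stored as an adjacency dictionary; the node names and function names are purely illustrative. Betweenness follows the wording used here (share of all shortest paths that pass through a node), whereas many libraries sum per-pair fractions instead, so the absolute values may differ from those reported by NeAT or Cytoscape.

```python
from collections import deque

def bfs(adj, source):
    """Return (dist, sigma): shortest-path lengths from source and
    the number of distinct shortest paths reaching each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def degree(adj):
    """Number of neighbours of each node."""
    return {n: len(neighbours) for n, neighbours in adj.items()}

def closeness(adj):
    """Inverse of the average shortest-path length to reachable nodes."""
    result = {}
    for n in adj:
        dist, _ = bfs(adj, n)
        others = [d for m, d in dist.items() if m != n]
        result[n] = len(others) / sum(others) if others else 0.0
    return result

def betweenness(adj):
    """Share of all shortest paths (over unordered node pairs)
    that pass through each node as an intermediate step."""
    nodes = sorted(adj)
    info = {n: bfs(adj, n) for n in nodes}
    through = {n: 0 for n in nodes}
    total = 0
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue                 # s and t are not connected
            total += sigma_s[t]
            dist_t, sigma_t = info[t]
            for v in nodes:
                if v in (s, t) or v not in dist_s or v not in dist_t:
                    continue
                if dist_s[v] + dist_t[v] == dist_s[t]:
                    through[v] += sigma_s[v] * sigma_t[v]
    return {n: (through[n] / total if total else 0.0) for n in nodes}

# Toy network (hypothetical): a small star with one extra branch.
toy = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}

if __name__ == "__main__":
    print(degree(toy))
    print(closeness(toy))
    print(betweenness(toy))
```

On this toy network the node named hub has the highest value for all three statistics, which is the pattern we will look for in the real networks.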
In the following tutorial, we will discover and compute some topological metrics and visualize the node degree distribution of some biological networks and compare it to the degree distribution of randomly generated networks.
This tutorial will be executed using the tools of the software suite Network Analysis Tools (NeAT): a series of modular computer programs specifically designed for the analysis of biological networks. Although NeAT is also available as command lines or as Web service applications, we will only use the web server (http://rsat.bigre.ulb.ac.be/neat/).
This tutorial will rely on two datasets. We recommend downloading the two corresponding network files to your computer before going further in this protocol.
|Yeast interactome from Uetz (2000)||This network results from the first published large-scale protein interaction dataset (characterized by the two-hybrid method) in the budding yeast (Uetz et al., 2000). The baits only covered a subset of 192 among the 6200 yeast proteins. The resulting network contains 865 interactions between 926 proteins.||uetz_2001.tab|
|Yeast interactome from Gavin (2006)||A more complete protein interaction network characterized by mass spectrometry.|
|Yeast metabolism from BioCyc||This network consists of the union of all the metabolic pathways annotated for the budding yeast S. cerevisiae in the metabolic database BioCyc (Karp et al., 2005), release 10.6. The network is formatted as a directed bipartite graph consisting of 1,184 nodes (compounds and reactions) linked by 2,656 edges (substrate/product relationships between compounds and reactions).||yeast_biocyc.tab|
In the following protocol, we show how to download a network from the BioGrid database. In case of trouble with the database availability, or if you want to skip this protocol, some sample networks are provided in the section datasets above.
Open a connection to the PubMed database and identify a publication of interest. For this tutorial, we will select the interactome from Gavin (2006), associated with the PubMed identifier (PMID) 16429126. Copy this identifier number.
Go to the BioGRID website (http://thebiogrid.org/).
In the frame Search the BioGRID, click on the green tab By publication, paste the PubMed identifier of your selected publication, and click GO.
If the query fails, it may be because you performed a query by Gene ID instead of PubMed ID. In that case, reload the BioGRID home page, and make sure you click the right option before submitting the query.
The screen should now display the abstract of the selected article, followed by a table of interactions. Just below the abstract (before the interaction table), there is a link Download 7592 Interactions For This Publication. Click on this link, select the format BioGRID TAB 2.0 Format and click Download interactions.
Wait until the file is ready. When it is, click on the link Download your file and save it on your hard drive.
Unzip the file you just downloaded.
Open the Cytoscape software tool.
Cytoscape should be pre-installed on your computer. If this is not the case, you can download it from http://www.cytoscape.org/.
The following instructions are suitable for Cytoscape v2.8.2, and might require some adaptations for Cytoscape versions >= 3.
In the top main menu, select File > Import > Network from table (text / MS Excel).
Click on the button Select file, browse your computer to locate the file you downloaded from BioGRID, and check the box Show text file import.
In the section Delimiter, uncheck the option Space.
By default, the import function considers tabs and spaces as field delimiters. However, the BioGRID files use spaces inside fields, and only the tab should be considered as a field separator.
In the section Attribute Names, check the option Transfer first line as attribute names, then click Refresh Preview.
In the section Interaction Definition, select column 8 as Source Interaction and column 9 as Target Interaction.
Click the Import button.
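For readers who prefer to check the file outside Cytoscape, the import settings above (tab as the only delimiter, first line as column names, columns 8 and 9 as source and target) can be mimicked with a short script. This is a minimal sketch, assuming a tab-delimited text file with one header line; the function name and the sample content are hypothetical and are not taken from a real BioGRID download.

```python
import csv
import io

def read_interactions(handle, source_col=8, target_col=9):
    """Read (source, target) pairs from a tab-delimited table.

    Column numbers are 1-based, as in the Cytoscape import dialog.
    Only the tab is used as delimiter, so spaces inside fields are kept.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)                      # skip the header line
    edges = []
    for row in reader:
        if len(row) < max(source_col, target_col):
            continue                        # skip incomplete lines
        edges.append((row[source_col - 1], row[target_col - 1]))
    return edges

# Hypothetical two-line sample with nine columns; one field contains spaces.
sample = (
    "\t".join(f"col{i}" for i in range(1, 10)) + "\n"
    + "\t".join(["1", "x", "x", "x", "x", "name with spaces", "x",
                 "GENE_A", "GENE_B"]) + "\n"
    + "\t".join(["2", "x", "x", "x", "x", "other name", "x",
                 "GENE_A", "GENE_C"]) + "\n"
)

if __name__ == "__main__":
    print(read_interactions(io.StringIO(sample)))
```

With a real download, the same function can be called on a file opened with open(path, newline="", encoding="utf-8"), where path is the location of the unzipped file on your hard drive.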
The network is now imported and displayed in a rudimentary way: nodes are displayed on a square grid, without apparent consistency between their location and their interactions. We will now apply a graph layout in order to obtain a more expressive representation of the network.
Maximize the frame size.
Run the command Layout > yFiles > Organic.
This layout generally gives reasonably good results with biological networks. You can also test some other layout algorithms and compare the result.
Run the command Plugins > Network Analysis > Analyze Network. Check that the network type is set to Undirected: since we are studying a protein-protein interaction network, interactions are symmetrical and non-directed.
Once the analysis is complete, click the button Save Statistics and save the result on your hard drive.
Since the computation of topological parameters takes time, it is convenient to store the result. Cytoscape then allows you to reload these statistics using the command Plugins > Analyze Network > Load Network Statistics.
In the result window of the network statistics, click on the button, select the coloring options (node color and size according to degree, edge color according to betweenness), and display the result.
On the top of the Statistics window, click on the tab Node Degree Distribution. This displays a graph, and a list of buttons for performing additional analyses. Click Fit Power Law.
Evaluate (visually) the fit of the power law on the degree distribution of the interactome from Gavin (2006). Do the points seem to follow the fitted line?
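To see what such a fit amounts to, the sketch below builds a degree distribution from a list of node degrees and fits a straight line on the log-log plot by ordinary least squares. This is only one simple way of fitting a power law, chosen here for illustration; we do not claim that it reproduces the exact procedure used by Cytoscape, and the function names are hypothetical. Keep in mind that a line that looks acceptable on a log-log plot is not a proof that the data follow a power law.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Return sorted (degree, number of nodes) pairs."""
    return sorted(Counter(degrees).items())

def fit_power_law(distribution):
    """Least-squares fit of count = a * k**(-gamma) on log-log axes.

    Returns (a, gamma, r_squared). Points with k == 0 or count == 0
    are ignored because their logarithm is undefined.
    """
    points = [(math.log(k), math.log(c))
              for k, c in distribution if k > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("at least two usable points are required")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same degree")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return math.exp(intercept), -slope, r_squared

if __name__ == "__main__":
    # Synthetic distribution that follows count = 1000 * k**-2 exactly.
    synthetic = [(k, 1000 * k ** -2) for k in range(1, 11)]
    print(fit_power_law(synthetic))
```

The coefficient of determination returned by the fit gives a rough numerical counterpart to the visual inspection requested above.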
Identify the nodes having the highest values for the different scores (degree, closeness, betweenness), and analyze their annotations in the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/). Is the functional description helpful to understand why these particular proteins are highly connected (degree) or central to the network (betweenness, closeness)?
Load the yeast metabolic network (file yeast_biocyc.tab) in Cytoscape, and perform the same operations as above.
Identify the nodes with the highest degree. Do you understand the biochemical reason why these compounds are so highly connected in the metabolic network?
Open a connection to the Network Analysis Tools (NeAT; http://neat.rsat.eu/), open the metabolic path finding tool, and search the shortest path between L-aspartate and L-lysine with the following options:
Does the result look like a biochemically relevant metabolic pathway that would convert L-Aspartate into L-Lysine? Why?
Redo the search with a weight on the compounds, and comment on the results of the two shortest path searches.
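The effect of weighting compounds can be illustrated on a toy network. The sketch below is a generic Dijkstra search in which the cost of a path is the sum of the weights of the nodes it enters; it is not the NeAT implementation, and the network, the node names and the weight of 50 are invented for the example. Without weights, the shortest path takes a shortcut through a highly connected compound; once this compound is penalized, the search returns a longer path through specific intermediates.

```python
import heapq

def lightest_path(adj, weights, source, target):
    """Dijkstra search where each node entered adds its weight (default 1).

    Returns the list of nodes from source to target, or None if the
    target cannot be reached.
    """
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue                      # outdated heap entry
        for nxt in adj.get(node, ()):
            new_cost = cost + weights.get(nxt, 1)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical directed bipartite network (compounds and reactions).
toy_metabolism = {
    "compound_A": ["rxn1", "rxn3"],
    "rxn1": ["H2O"],
    "H2O": ["rxn2"],
    "rxn2": ["compound_B"],
    "rxn3": ["compound_X"],
    "compound_X": ["rxn4"],
    "rxn4": ["compound_Y"],
    "compound_Y": ["rxn5"],
    "rxn5": ["compound_B"],
    "compound_B": [],
}

if __name__ == "__main__":
    # Unweighted search: every node costs 1.
    print(lightest_path(toy_metabolism, {}, "compound_A", "compound_B"))
    # Weighted search: the ubiquitous compound is heavily penalized.
    print(lightest_path(toy_metabolism, {"H2O": 50},
                        "compound_A", "compound_B"))
```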
To compute the betweenness and the closeness, all shortest paths between all pairs of nodes must be computed, which might take some time. In order not to wait too long, we suggest computing these statistics for only one network.
Since the metabolic network is directed, check the directed option before computing its topological statistics.
Open a connection to the NeAT server (http://neat.rsat.eu/).
In the tool menu (left panel), click on Node topology statistics.
Upload the Gavin (2006) interactome network with the option Upload graph from file. For this, click on the Browse button and select the network file previously saved on your hard drive.
Check the box of the statistics you would like to compute (degree, betweenness and closeness).
Click on the GO button.
After the result appears, download the result file Graph topology (tab).
Open this file with a spreadsheet program (Excel, LibreOffice). Select the table and sort it on the basis of the different columns.
Identify the nodes having the highest values for the different scores (degree, closeness, betweenness), and analyze their annotations in the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/).
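The same sorting can be done with a few lines of script instead of a spreadsheet. This is a minimal sketch, assuming a tab-delimited table whose first non-comment line holds the column names; the column names used below (node, degree, closeness, betweenness), the ";" comment prefix and the sample values are assumptions made for the example and should be adapted to the actual result file.

```python
import csv
import io

def top_nodes(handle, column, n=5, comment_prefix=";"):
    """Return the n rows with the highest numeric value in `column`.

    Lines starting with `comment_prefix` are skipped (assumed convention);
    rows with an empty or NA value in the column are ignored.
    """
    lines = (line for line in handle if not line.startswith(comment_prefix))
    reader = csv.DictReader(lines, delimiter="\t")
    rows = [row for row in reader if row.get(column) not in (None, "", "NA")]
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    return rows[:n]

# Hypothetical result table with made-up values.
sample_table = (
    "; comment line\n"
    "node\tdegree\tcloseness\tbetweenness\n"
    "P1\t12\t0.40\t0.05\n"
    "P2\t3\t0.25\t0.20\n"
    "P3\t7\t0.55\t0.01\n"
)

if __name__ == "__main__":
    for column in ("degree", "closeness", "betweenness"):
        best = top_nodes(io.StringIO(sample_table), column, n=2)
        print(column, [row["node"] for row in best])
```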
The results consist of two degree distribution figures (with and without logarithmic scale) as well as a table containing all the computed statistics. The table can be opened as an HTML file or as a simple text file (which can be opened in a spreadsheet).
Some conclusions can be drawn from the analysis of these results.
Even though poorly connected nodes far outnumber highly connected nodes, none of the analyzed networks seems to follow a power law.
The compound with the highest degree and betweenness and the lowest closeness in the metabolic network is water. Co-factors like ATP, ADP, etc. show the same trends.
The node degree distribution of the networks randomized according to the Erdos-Renyi randomization looks like a Poisson distribution (which was expected).
When we compute the degree distribution of a network generated from scratch (200 nodes and 15000 edges), the mean degree is around 150 and the distribution looks more like a normal distribution. Indeed, when the mean of the Poisson distribution increases, the Poisson distribution takes a bell (normal-like) shape.
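The two numerical statements above can be checked with a short calculation. In an undirected network each edge contributes to the degree of two nodes, so the mean degree is 2E/N, which gives 2 x 15000 / 200 = 150. The sketch below also compares a Poisson distribution with a normal distribution of the same mean and variance, for a small and for a large mean. It only illustrates the statement about the shape of the Poisson distribution and does not model the random network itself; the function names and the gap measure are our own choices.

```python
import math

def mean_degree(n_nodes, n_edges):
    """Mean degree of an undirected network: each edge counts twice."""
    return 2 * n_edges / n_nodes

def poisson_pmf(k, lam):
    """Poisson probability of k, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mean, variance):
    """Density of the normal distribution at x."""
    return (math.exp(-(x - mean) ** 2 / (2 * variance))
            / math.sqrt(2 * math.pi * variance))

def shape_gap(lam):
    """Sum over k of |Poisson(k) - Normal(k)| with matched mean and variance.

    A smaller value means that the Poisson distribution is closer
    to a bell-shaped (normal) curve.
    """
    upper = int(lam + 10 * math.sqrt(lam)) + 10
    return sum(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, lam))
               for k in range(upper + 1))

if __name__ == "__main__":
    print("mean degree:", mean_degree(200, 15000))
    print("gap for mean 2:", shape_gap(2))
    print("gap for mean 150:", shape_gap(150))
```

The gap computed for a mean of 150 is much smaller than the gap computed for a mean of 2, in agreement with the bell shape observed for the dense random network.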
In the left menu, click on Randomize network.
In the Upload graph from file : text field, click on the Browse button and select one of the network files you saved on your hard drive in the previous section.
In the Randomization type field, select Erdos-Renyi randomization.
The Erdos-Renyi randomization corresponds to the randomization of an input graph, keeping the original nodes and the number of edges but without preserving topological properties such as node degree or clustering coefficient. In effect, edges are randomly "thrown" between the nodes, and each node has the same probability of being linked by an edge.
Check the box Prevent the graph from containing nodes with no neighbour.
Click on the GO button.
After a short wait, a result page should appear that contains a link to the randomized network. Save the result file by right-clicking on the link Random graph and selecting Save link as....
Repeat the preceding steps for the other network.
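The principle of the Erdos-Renyi randomization used in the steps above can be sketched in a few lines: keep the node set, and draw the same number of edges uniformly among all possible pairs of distinct nodes. This is a minimal illustration for an undirected network without self-loops or duplicate edges; it is not the NeAT code, the function name is hypothetical, and it does not implement the option that prevents nodes from ending up without neighbours.

```python
import random
from itertools import combinations

def randomize_erdos_renyi(nodes, n_edges, seed=None):
    """Draw n_edges distinct undirected edges at random between `nodes`.

    Node degrees of the original network are not preserved: every pair
    of distinct nodes has the same probability of being linked.
    """
    rng = random.Random(seed)
    possible = list(combinations(sorted(nodes), 2))
    if n_edges > len(possible):
        raise ValueError("more edges requested than possible node pairs")
    return rng.sample(possible, n_edges)

if __name__ == "__main__":
    # Hypothetical original network: a star centred on n1.
    original = [("n1", "n2"), ("n1", "n3"), ("n1", "n4"), ("n1", "n5")]
    nodes = {node for edge in original for node in edge}
    print(randomize_erdos_renyi(nodes, len(original), seed=42))
```

Enumerating all possible pairs is convenient for small examples but becomes memory-hungry for networks with many thousands of nodes.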
For both networks, repeat these steps, but select the node degree conservation as randomization type. Leave the box Prevent the graph from containing nodes with no neighbour unchecked.
The node degree conservation randomization corresponds to a shuffling of the edges between the nodes in an adjacency table. It means that a node will keep as many neighbours in the randomized network as in the original network, but these neighbours will differ.
The third randomization available in NeAT, i.e., node degree distribution conservation, consists in exchanging the node names so that each node has different neighbours and a different number of edges, while the node degree distribution of the network remains unchanged. We won't use this randomization strategy in this tutorial.
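Node degree conservation can be illustrated with a classical technique, the double edge swap: two edges a-b and c-d are picked at random and rewired into a-d and c-b, which changes the neighbours of the four nodes but not their degrees. We use this technique here only as an illustration of the principle; we do not claim that NeAT shuffles edges in exactly this way, and the function names are hypothetical.

```python
import random
from collections import Counter

def node_degrees(edges):
    """Degree of each node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Attempt n_swaps double edge swaps; every node keeps its degree.

    Swaps that would create a self-loop or a duplicate edge are skipped.
    """
    rng = random.Random(seed)
    edges = [tuple(edge) for edge in edges]
    if len(edges) < 2:
        return edges
    present = {frozenset(edge) for edge in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if a == d or c == b:
            continue                      # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue                      # would create a duplicate edge
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.update((new1, new2))
        edges[i], edges[j] = (a, d), (c, b)
    return edges

if __name__ == "__main__":
    # Hypothetical ring of six nodes: every node has degree 2.
    ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    shuffled = degree_preserving_shuffle(ring, n_swaps=100, seed=1)
    print(shuffled)
    print(node_degrees(shuffled) == node_degrees(ring))
```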
In the NeAT left menu, click on Randomize network.
Select from scratch Erdos-Renyi randomization as randomization type.
In the number of nodes text field, specify 200.
In the number of edges text field, specify 15000.
Click on the GO button.
Save the result file by right-clicking on the link Random graph and selecting Save link as....
The result of this manipulation is a network where the node names are n1, n2, ..., n200.
Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004 Feb;5(2):101-13.
Gavin et al. Proteome survey reveals modularity of the yeast cell machinery. Nature (2006) vol. 440 (7084) pp. 631-6
Jeong et al. The large-scale organization of metabolic networks. Nature (2000) vol. 407 (6804) pp. 651-4
Jeong et al. Lethality and centrality in protein networks. Nature (2001) vol. 411 (6833) pp. 41-2
Karp PD, Ouzounis CA, Moore-Kochlacs C, Goldovsky L, Kaipa P, Ahren D, Tsoka S, Darzentas N, Kunin V, Lopez-Bigas N. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 2005 Oct 24;33(19):6083-9. Print 2005.
Lima-Mendez and van Helden. The powerful law of the power law and other myths in network biology. Mol Biosyst (2009) vol. 5 (12) pp. 1482-93
Stumpf and Porter. Mathematics. Critical truths about power laws. Science (2012) vol. 335 (6069) pp. 665-6
Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000 Feb 10;403(6770):623-7.