MotifNet – FAQ | Yeger-Lotem Lab

MotifNet: A web-server for network motif analysis

For general issues contact: Esti Yeger-Lotem estiyl@bgu.ac.il

What are network motifs and why do we need them?

How to configure the ‘advanced search options’ in the input form?

What are the ‘Motif dispersity’, ‘Lowest position dispersity’ and ‘Highest position dispersity’ properties in the motif card?

How long does it take to process my job?

For how long can I access the results of my job?

Some of the motifs show a warning message: ‘this motif contains duplicate subgraphs’. What does it mean?

Why does my session show a warning message: ‘The results of the analysis contain too many subgraphs …’?

My session shows an error message. What does it mean?

What is MotifNet and why use it?

MotifNet is a web tool for identifying network motifs. It allows researchers to analyze integrated networks, in which nodes and edges may be labelled, and to search for motifs of up to eight nodes. Several other tools offer network motif analysis for a user-specified network of interest (mFinder, FANMOD, Kavosh and MAVisto). Although MotifNet is based on FANMOD, it provides several services that FANMOD and the other tools do not:

The motif analysis is computed on a web server, accessed via the MotifNet website. The web server has a computational backend that allows the analysis of larger networks.
The identified motifs are presented graphically on the MotifNet website, and the user can interactively filter the motifs and explore their instances in various ways.
The user can search for motifs (or motif instances) that contain specific node names (e.g., genes).
MotifNet provides additional features, namely dispersity measures and a frequency table, through which users can identify dispersed and local motifs.

What are network motifs and why do we need them?

Network motifs are small topological patterns that recur in a network significantly more often than expected by chance [1]. Prominent examples of common network motifs include the feed-forward loop [2] and the mixed-feedback loop [3]. These two examples, and many more [4-13], show how network motifs can extend our knowledge of the design principles underlying complex systems [1].

How to configure the ‘advanced search options’ in the input form?

MotifNet uses the FANMOD algorithm for fast detection of network motifs [14,15]. The basic workflow of the algorithm is as follows. The algorithm generates numerous random networks and counts the instances of each candidate motif in each random network. The p-value of a candidate motif is calculated as the fraction of random networks in which it occurs at least as frequently as in the input network. The random networks are generated by applying a series of edge-swapping operations to the input network, while preserving the node-degree distribution. When an edge is swapped, a partner edge is selected randomly, and the two edges exchange one of their endpoint nodes. The arguments in the advanced settings are used for three purposes:

Determine how the motif instances will be enumerated (‘Subgraphs enumeration’). You can choose exact enumeration for accurate results, or random sampling if the input network or the motif size is very large.
Determine how many random networks will be generated (‘Number of randomized networks’).
Configure the edge-swapping options (all the other arguments). These include how to swap bi-directional edges, how to treat node/edge labels, how many swaps are made per edge, and how many swapping failures are allowed.

To learn more about each argument, hover the cursor over the help icon next to it in the input form, or check the tutorial. For more information about FANMOD, see the FANMOD website.
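The workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FANMOD's implementation: the function names (`count_ffl`, `randomize`, `motif_p_value`) and the parameters (`swaps_per_edge`, `max_tries`) are hypothetical, the network is a simple directed graph without labels, only the feed-forward loop is counted, and occurrences are counted without requiring induced subgraphs.

```python
import random


def count_ffl(edges):
    """Count feed-forward loop occurrences: a->b, b->c and a->c."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for a, targets in succ.items():
        for b in targets:
            for c in succ.get(b, ()):
                if c != a and c != b and c in targets:
                    count += 1
    return count


def randomize(edges, swaps_per_edge=3, max_tries=3, rng=random):
    """Degree-preserving randomization: pairs of edges exchange their targets.

    swaps_per_edge and max_tries are illustrative stand-ins for the
    'swaps per edge' and 'allowed swapping failures' settings.
    """
    edges = list(edges)
    present = set(edges)
    for _round in range(swaps_per_edge):
        for i in range(len(edges)):
            for _attempt in range(max_tries):
                j = rng.randrange(len(edges))  # random partner edge
                (a, b), (c, d) = edges[i], edges[j]
                # A swap fails if it would create a self-loop or a duplicate edge.
                if a == d or c == b or (a, d) in present or (c, b) in present:
                    continue
                present -= {(a, b), (c, d)}
                present |= {(a, d), (c, b)}
                edges[i], edges[j] = (a, d), (c, b)
                break
    return edges


def motif_p_value(edges, count_fn, n_random=100, seed=0):
    """Fraction of random networks with at least as many occurrences as the input."""
    rng = random.Random(seed)
    observed = count_fn(edges)
    hits = sum(
        count_fn(randomize(edges, rng=rng)) >= observed for _ in range(n_random)
    )
    return observed, hits / n_random


# Toy network containing one feed-forward loop (a->b, b->c, a->c).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
observed, p_value = motif_p_value(toy_edges, count_ffl, n_random=50)
```

Each swap keeps every node's in-degree and out-degree unchanged, which is what preserves the node-degree distribution. A larger number of random networks gives a finer-grained p-value at the cost of run time.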

What are the ‘Motif dispersity’, ‘Lowest position dispersity’ and ‘Highest position dispersity’ properties in the motif card?

The dispersity measures reflect how heterogeneous a motif is in terms of the genes that occur in its instances. They help distinguish dispersed motifs, whose instances are independent of each other and might have evolved through different routes, from local motifs, whose instances are centered on a small subset of nodes and may illuminate the functions of those nodes. ‘Motif dispersity’ is computed as the number of distinct genes that appear in the motif instances, divided by the number of motif instances and by the size of the motif. For example, for a 3-node motif that has 10 instances that include 20 different genes, the dispersity equals 20/(3*10) = 0.6667. Note that ‘Motif dispersity’ equals 1 only if no two motif instances share a gene, and values close to 0 imply that many instances have nodes in common.

‘Lowest position dispersity’ and ‘Highest position dispersity’ are computed as follows. First, a position dispersity is computed for each node (position) in the motif: the number of distinct genes that occupy that position across the motif instances, divided by the number of motif instances. For example, given the same motif as above, and assuming that only 3 genes occupy position X across all 10 instances, the dispersity of position X is 3/10 = 0.3. ‘Lowest position dispersity’ and ‘Highest position dispersity’ are then the minimal and maximal position dispersity, respectively.
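The definitions above translate directly into code. The sketch below assumes each motif instance is given as a tuple of gene names in which index k holds the gene at motif position k. The function names and the toy data are hypothetical and only reproduce the numbers of the worked example (10 instances, 20 distinct genes, 3 genes at position X).

```python
def motif_dispersity(instances):
    """Distinct genes / (number of instances * motif size)."""
    n_instances = len(instances)
    motif_size = len(instances[0])
    distinct_genes = {gene for instance in instances for gene in instance}
    return len(distinct_genes) / (n_instances * motif_size)


def position_dispersities(instances):
    """For each motif position: distinct genes at that position / number of instances."""
    n_instances = len(instances)
    return [len(set(column)) / n_instances for column in zip(*instances)]


# Toy data: position 0 plays the role of 'X' and holds only 3 distinct genes;
# positions 1 and 2 hold 10 and 7 distinct genes, giving 20 genes in total.
instances = [(f"x{i % 3}", f"g{i}", f"h{i % 7}") for i in range(10)]

dispersity = motif_dispersity(instances)        # 20 / (3 * 10)
per_position = position_dispersities(instances)  # 3/10, 10/10, 7/10
lowest, highest = min(per_position), max(per_position)
```

Here the lowest position dispersity (3/10) flags position X as the one shared by many instances, while the overall motif dispersity stays at 20/30.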

How long does it take to process my job?

The run time of MotifNet depends on several factors, mainly the size of the sought motifs, the size of the analyzed network, the number of random networks to be generated, and the type of enumeration. A job might take several days. The tutorial page includes a table that shows the expected running time for various inputs.

For how long can I access the results of my job?

Results are saved as a session on the web server. Because each session takes a large amount of disk space, MotifNet keeps a session for two months. If your job is no longer accessible, you can send an e-mail to ilansmoly@gmail.com and we will try to recover your session.

Some of the motifs show a warning message: ‘this motif contains duplicate subgraphs’. What does it mean?

During the processing of the results, MotifNet assembles all the identified instances of each motif. Occasionally an instance appears more than once, with two (or more) copies that differ only in the ordering of the participating nodes (i.e., genes). When this happens, MotifNet keeps only one copy of the instance, because storing all possible orderings would inflate the number of instances. This happens naturally when the motif is symmetrical (i.e., two nodes can be exchanged and the motif remains the same). Because the retained copy is chosen arbitrarily from the possible orderings, the frequency information in the ‘instance exploration window’ might be distorted. For example, if the motif is symmetrical such that nodes ‘a’ and ‘b’ can be exchanged, then any gene ‘x’ that appears in column ‘a’ of the table could equally appear in column ‘b’. However, only one ordering is kept, so in that row ‘x’ appears in only one of these columns. Therefore, the node/edge frequencies for a specific position in the motif might be inaccurate. The overall frequencies shown for ‘all nodes/edges in the motif’, and the ‘frequency in motif’ data, remain accurate because the node ordering does not affect them.
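The effect of keeping one ordering per instance can be illustrated with a short sketch. It is not MotifNet's actual code: the function name is hypothetical, and it assumes that two instances of the same motif with the same set of nodes are copies of one instance (which holds when instances are induced subgraphs).

```python
from collections import Counter


def deduplicate_instances(instances):
    """Keep the first-seen ordering for each set of participating nodes."""
    seen = set()
    unique = []
    for instance in instances:
        key = frozenset(instance)  # ignores the ordering of the nodes
        if key not in seen:
            seen.add(key)
            unique.append(instance)
    return unique


# Columns are motif positions (a, b, c); positions a and b are interchangeable.
instances = [("x", "y", "z"), ("y", "x", "z"), ("x", "w", "z")]
unique = deduplicate_instances(instances)

# Per-position counts depend on which ordering was kept ...
column_b = Counter(instance[1] for instance in unique)
# ... whereas counts over all positions do not.
all_nodes = Counter(gene for instance in unique for gene in instance)
```

In this toy case gene `x` never shows up in column b after deduplication, although it could legitimately occupy that position, while its count over all positions is unaffected.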

Why does my session show a warning message: ‘The results of the analysis contain too many subgraphs …’?

MotifNet cannot handle jobs whose results contain more than 5,000,000 subgraphs. Note that no other existing tool is able to handle such a vast amount of data. In this case you will be able to explore the motifs but will not be able to view their instances.

My session shows an error message. What does it mean?

‘Error (4)’ – cannot run FANMOD. This error typically occurs when the user submits an empty network. Please carefully check the format of the edge files you submitted.

‘Error (5)’ – FANMOD run was interrupted. This error typically occurs when the FANMOD process exceeds its memory allocation (200GB). This is normally due to one (or more) of the following:

The input network is too large.
Motif size is larger than 4.
The number of randomized networks is too high.
Enumeration type is set to ‘Exact’.
Enumeration type is set to ‘Random’ but the sampling probabilities are too high.

‘Error (6)’ – FANMOD run detected 0 subgraphs. This error normally occurs when the following conditions hold:

Enumeration type is set to ‘Random’ and the sampling probabilities are too low.
The input network is too small for the given sampling probabilities.

‘Sampling probabilities’ – Upon selecting ‘Random’ you may provide sampling probabilities. The product of these probabilities is the probability that any single subgraph in the network is sampled. In the algorithm, each subgraph is grown node by node using a seed-and-extend technique, and the sampling probabilities are the probabilities of sampling each successive node of a candidate subgraph. The left-most number is the probability that the first node in the subgraph is sampled, the second left-most number is the probability that the second node is sampled, and so on. Note that you should select high probabilities in the left fields to spread the samples evenly across your network. For more information, see the FANMOD website.
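As a quick arithmetic check of how the sampling probabilities combine, the sketch below multiplies them and estimates how many subgraphs would be sampled on average. The function names and the numbers are hypothetical and serve only as an illustration.

```python
import math


def subgraph_sampling_probability(probabilities):
    """Probability that a single subgraph is sampled: the product of the per-node values."""
    return math.prod(probabilities)


def expected_sampled_subgraphs(total_subgraphs, probabilities):
    """Average number of sampled subgraphs out of total_subgraphs."""
    return total_subgraphs * subgraph_sampling_probability(probabilities)


# High probability in the left-most field, lower ones to the right.
p = subgraph_sampling_probability([1.0, 0.5, 0.5])
n = expected_sampled_subgraphs(1000, [1.0, 0.5, 0.5])

# Low probabilities on a small network: on average fewer than one subgraph
# is sampled, the situation in which zero subgraphs may be detected.
too_few = expected_sampled_subgraphs(200, [0.1, 0.1, 0.1])
```

With probabilities 1.0, 0.5 and 0.5, a quarter of the subgraphs are sampled on average. Lowering all three to 0.1 on a network with only 200 subgraphs leaves an expected count below one, which matches the conditions listed for ‘Error (6)’.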

REFERENCES

1. Milo, R., S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs: simple building blocks of complex networks. Science, 2002. 298(5594): p. 824-7.
2. Mangan, S. and U. Alon, Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci U S A, 2003. 100(21): p. 11980-5.
3. Yeger-Lotem, E., S. Sattath, N. Kashtan, S. Itzkovitz, R. Milo, R.Y. Pinter, U. Alon, and H. Margalit, Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci U S A, 2004. 101(16): p. 5934-9.
4. Martinez, N.J., M.C. Ow, M.I. Barrasa, M. Hammell, R. Sequerra, L. Doucette-Stamm, F.P. Roth, V.R. Ambros, and A.J. Walhout, A C. elegans genome-scale microRNA network contains composite feedback motifs with high flux capacity. Genes Dev, 2008. 22(18): p. 2535-49.
5. Yu, H., Y. Xia, V. Trifonov, and M. Gerstein, Design principles of molecular networks revealed by global comparisons and composite motifs. Genome Biol, 2006. 7(7): p. R55.
6. Burda, Z., A. Krzywicki, O.C. Martin, and M. Zagorski, Motifs emerge from function in model gene regulatory networks. Proc Natl Acad Sci U S A, 2011. 108(42): p. 17263-8.
7. Gerstein, M.B., A. Kundaje, M. Hariharan, S.G. Landt, K.K. Yan, C. Cheng, X.J. Mu, E. Khurana, J. Rozowsky, R. Alexander, R. Min, P. Alves, A. Abyzov, N. Addleman, N. Bhardwaj, A.P. Boyle, P. Cayting, A. Charos, D.Z. Chen, Y. Cheng, D. Clarke, C. Eastman, G. Euskirchen, S. Frietze, Y. Fu, J. Gertz, F. Grubert, A. Harmanci, P. Jain, M. Kasowski, P. Lacroute, J. Leng, J. Lian, H. Monahan, H. O’Geen, Z. Ouyang, E.C. Partridge, D. Patacsil, F. Pauli, D. Raha, L. Ramirez, T.E. Reddy, B. Reed, M. Shi, T. Slifer, J. Wang, L. Wu, X. Yang, K.Y. Yip, G. Zilberman-Schapira, S. Batzoglou, A. Sidow, P.J. Farnham, R.M. Myers, S.M. Weissman, and M. Snyder, Architecture of the human regulatory network derived from ENCODE data. Nature, 2012. 489(7414): p. 91-100.
8. Cheng, C., K.K. Yan, W. Hwang, J. Qian, N. Bhardwaj, J. Rozowsky, Z.J. Lu, W. Niu, P. Alves, M. Kato, M. Snyder, and M. Gerstein, Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data. PLoS Comput Biol, 2011. 7(11): p. e1002190.
9. Yu, H., N.M. Luscombe, J. Qian, and M. Gerstein, Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends Genet, 2003. 19(8): p. 422-7.
10. Hershberg, R., E. Yeger-Lotem, and H. Margalit, Chromosomal organization is shaped by the transcription regulatory network. Trends Genet, 2005. 21(3): p. 138-42.
11. Ma, H.W., B. Kumar, U. Ditges, F. Gunzer, J. Buer, and A.P. Zeng, An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res, 2004. 32(22): p. 6643-9.
12. Ptacek, J., G. Devgan, G. Michaud, H. Zhu, X. Zhu, J. Fasolo, H. Guo, G. Jona, A. Breitkreutz, R. Sopko, R.R. McCartney, M.C. Schmidt, N. Rachidi, S.J. Lee, A.S. Mah, L. Meng, M.J. Stark, D.F. Stern, C. De Virgilio, M. Tyers, B. Andrews, M. Gerstein, B. Schweitzer, P.F. Predki, and M. Snyder, Global analysis of protein phosphorylation in yeast. Nature, 2005. 438(7068): p. 679-84.
13. Lee, T.I., N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber, N.M. Hannett, C.T. Harbison, C.M. Thompson, I. Simon, J. Zeitlinger, E.G. Jennings, H.L. Murray, D.B. Gordon, B. Ren, J.J. Wyrick, J.B. Tagne, T.L. Volkert, E. Fraenkel, D.K. Gifford, and R.A. Young, Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 2002. 298(5594): p. 799-804.
14. Wernicke, S. and F. Rasche, FANMOD: a tool for fast network motif detection. Bioinformatics, 2006. 22(9): p. 1152-3.
15. Wernicke, S., Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform, 2006. 3(4): p. 347-59.