Abstract

Networks often contain regions of tightly connected nodes, or clusters, that highlight their shared relationships. An effective way to create a visual summary of a network is to identify clusters and annotate them with an enclosing shape and a summarizing label. Cytoscape provides the ability to annotate a network with shapes and labels; however, these annotations must be created manually one at a time, which can be a laborious process. AutoAnnotate is a Cytoscape 3 App that automates the process of identifying clusters and visually annotating them. It greatly reduces the time and effort required to fully annotate clusters in a network, and provides freedom to experiment with different strategies for identifying and labelling clusters. Many customization options are available that enable the user to refine the generated annotations as required. Annotated clusters may be collapsed into single nodes using the Cytoscape groups feature, which helps simplify a network by making its overall structure more visible. AutoAnnotate is applicable to any type of network, including enrichment maps, protein-protein interactions, pathways, or social networks.

Keywords: network analysis, enrichment map, tag cloud, network clustering, complexity reduction, modular networks, annotations, cytoscape

Introduction

Identifying clusters of nodes in a network, based on similarity of node attributes or connectivity between the nodes, is useful for defining groups of related nodes. For example, clusters in a protein-protein interaction network often represent molecular complexes, that is, proteins that work together as a group to perform a specific function [1], whereas clusters in a co-authorship network represent groups of authors that often collaborate and publish together. Clusters can be used to create a visual summary of a network by drawing an enclosing shape around each cluster and adding a textual summary label next to each cluster (Figure 1).
This technique is often effective in summarizing the results of a network analysis by highlighting main themes and categories within the network. AutoAnnotate (http://apps.cytoscape.org/apps/autoannotate) was originally created to aid pathway enrichment analysis using the Enrichment Map App [2], as every enrichment map requires clustering and annotation, but AutoAnnotate is now being made available as a stand-alone app to benefit other types of analysis.

Figure 1. AutoAnnotate overview. A network is clustered, textual annotation associated with each cluster is automatically summarized as a single cluster label, and the results are visualized. Users can customize the view by selectively collapsing and expanding clusters.

Cytoscape enables users to add annotations on top of the network, including arrow, image, shape and text objects. However, fully annotating the clusters in a network can be cumbersome because the user must manually create and position each individual annotation. Furthermore, Cytoscape does not maintain any relationship between the network layout and the annotations; when the layout changes, the user must manually reposition the annotations. Because of these limitations, some users prefer to export their networks and add annotations using an external application, such as Adobe Illustrator.

AutoAnnotate is a Cytoscape 3 App that identifies clusters and automatically draws shape and label annotations for each cluster. The enclosing shapes make it visually clear which nodes belong to each cluster when the clusters do not overlap. The generated labels provide a concise semantic summary of the data attached to the nodes in each cluster. AutoAnnotate maintains a relationship between the annotations and the nodes in a cluster; if the layout of the network changes, the annotations are automatically repositioned.
AutoAnnotate maintains multiple sets of annotations for a single network, which allows the user to experiment with different clustering algorithms and label generation strategies. Additionally, AutoAnnotate allows clusters to be collapsed, which can simplify large networks by reducing potentially large sections of the network into single nodes.

AutoAnnotate leverages two existing Cytoscape Apps to do most of its work: clusterMaker2 (http://apps.cytoscape.org/apps/clustermaker2) and WordCloud (http://apps.cytoscape.org/apps/wordcloud). The clusterMaker2 App provides several clustering algorithms and is directly called by AutoAnnotate to identify clusters of nodes in the network. The use of clusterMaker2 is optional, and the user may instead provide their own list of cluster identifiers. WordCloud is a Cytoscape App that creates a visual summary of selected attributes for a set of nodes by displaying a word tag cloud, where more frequent words are displayed using larger font size and adjacent words are grouped closer together. AutoAnnotate invokes WordCloud to generate a word tag cloud for the node data within each cluster, which is used to derive the text for the label annotations. AutoAnnotate can be installed from within Cytoscape via the App Manager or from the Cytoscape App Store.

Operation

This section gives a basic overview of the capabilities of AutoAnnotate. A more detailed user manual is available from http://baderlab.org/Software/AutoAnnotate.

AutoAnnotate creates and manages a list of “Annotation Sets” for each network. An Annotation Set consists of a group of clusters and their associated annotations. Each network may have one active Annotation Set at a time. The user may create as many Annotation Sets as they like, and can easily switch between them.
The use case for supporting multiple Annotation Sets is to allow the user to experiment with different clustering and summarization parameters, and then choose the most satisfactory resulting Annotation Set. Annotation Sets are saved to the Cytoscape session file.

To create an Annotation Set, select AutoAnnotate > New Annotation Set… from the Apps menu. A dialog will pop up enabling the user to define the clusters in the network (Figure 2). Clusters can be defined in two ways: user defined or calculated automatically through AutoAnnotate. AutoAnnotate accepts any node attribute associated with the network to define clusters. Nodes with the same value for the attribute will be placed in the same cluster. This enables users to import cluster definitions from external programs or clustering algorithms. Alternatively, clusters can be calculated within Cytoscape using the clusterMaker2 App [3], which enables users to cluster a network using one of the many clustering algorithms it provides. The clusterMaker2 algorithms are available as commands or from the Apps menu. Most of the algorithms can generate a cluster ID attribute in the node table, which can be consumed by AutoAnnotate. For convenience, AutoAnnotate also provides access to a subset of the clusterMaker2 algorithms directly from the Create Annotation Set dialog (Figure 2). This enables a user to quickly get started using AutoAnnotate without knowing the fine details of clusterMaker2. Currently, due to the simplified interface within AutoAnnotate, only clustering algorithms that take zero or one parameters and that run quickly are available directly from AutoAnnotate. After the clusters are computed, a label for each cluster is generated by the WordCloud app [4].

Figure 2. The Create Annotation Set dialog. Cluster options: ClusterMaker2 requires a clustering algorithm and edge weight attribute to be selected; otherwise, a node attribute must be selected.
Label options: User selects the node attribute to use for creating labels, the labelling algorithm to use, and the maximum number of words per label.

Once clusters and labels are calculated, AutoAnnotate automatically adds them as annotations to the network (e.g. ellipses around each cluster). Each cluster is contained within a bounding shape annotation, with its corresponding label as a text annotation directly above it. The look of the label and shape annotations may be modified using the Display Options panel (Figure 3). From this panel the user can adjust border width, shape type, opacity and visibility for shape annotations, and font size, font scaling and visibility for label annotations.

Figure 3. The Display Options panel. Changing parameters here automatically updates the display.

After an Annotation Set is created, it is shown in the main AutoAnnotate panel (Figure 4). This panel allows the user to choose which Annotation Set is currently active for each network. When an Annotation Set is active, a list of all its clusters and labels is displayed in the table. Each row represents one annotation (i.e. one cluster with its computed label) with the number of nodes it contains and whether it is collapsed or not.

Figure 4. The main Annotation Set panel. Clusters and labels are shown. Clusters can be collapsed or expanded. An annotation can be customized via options available in a context sensitive (right click) menu.

Although AutoAnnotate aims to automatically create the best labels and annotations, it also enables users to customize the resulting annotations. Clusters and labels can be manually adjusted. Multiple clusters can be combined. New clusters can be created from selected nodes directly in the network view. Labels can be re-generated automatically based on different label options.
New Annotation Sets can be created by selecting a subset of clusters from an existing Annotation Set.

To further simplify complex networks, AutoAnnotate has the ability to collapse some or all of the clusters automatically. The set of nodes belonging to a cluster is removed and replaced with a single group node representing the set. The group node is named using the computed label. Collapsing all clusters in a complex network can substantially decrease its complexity, so that the major themes or structure of the underlying network are more readily apparent.

Implementation

Feature implementation

WordCloud registers a command service that calculates a word cloud given a set of nodes and one or more node attributes. AutoAnnotate calls this command once for each cluster. The results of the command are placed in a list of WordInfo objects, which contain the font size of each word, the order in which words appeared in the attributes (maintains word order), and the word adjacency group that contains the word. The word adjacency group is an identifier assigned to each word based on the number of times words occur in adjacent positions. WordCloud computes word frequencies, which AutoAnnotate then uses to calculate the best combination of words to describe the given cluster, considering word frequency in the cluster compared to that in the entire network.

Normally WordCloud returns all of the words in the cloud; however, AutoAnnotate uses a maximum of N words (where N is between 1 and 10) to make a label, and therefore must decide which words from the cloud to use and in which order. AutoAnnotate currently has two options for deciding this. The “Biggest Words” option sorts the words by font size, takes the N largest words, then sorts the result by word order (this preserves the original order in which the words appeared in the selected attribute). The “Adjacent Words” option is a heuristic that attempts to balance word size with word adjacency information.
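
As a minimal sketch of the “Biggest Words” option described above, the following Python snippet selects the N largest words and then restores their original order. It is illustrative only: the `WordInfo` class here is a hypothetical stand-in holding just the fields needed (it is not WordCloud's actual Java class), and the function name, the tie-breaking rule and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordInfo:
    """Hypothetical stand-in for the per-word data returned by WordCloud."""
    word: str
    font_size: int  # larger means more prominent in the word cloud
    order: int      # position at which the word appeared in the attribute text


def biggest_words_label(words, max_words):
    """Pick the max_words largest words, then put them back in original order."""
    # Sort by font size, largest first. Breaking ties by original order is an
    # assumption of this sketch, made so that the result is deterministic.
    by_size = sorted(words, key=lambda w: (-w.font_size, w.order))
    chosen = by_size[:max_words]
    # Re-sort the chosen words by the order in which they originally appeared.
    chosen.sort(key=lambda w: w.order)
    return " ".join(w.word for w in chosen)


# Made-up word cloud for a single cluster.
cloud = [
    WordInfo("regulation", 18, 0),
    WordInfo("of", 6, 1),
    WordInfo("cell", 24, 2),
    WordInfo("cycle", 22, 3),
    WordInfo("process", 10, 4),
]
label = biggest_words_label(cloud, 3)
print(label)
```

With N = 3 the three largest words are "cell", "cycle" and "regulation"; the final re-sort by word order is what keeps the label reading in the same sequence as the source attribute.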
The Adjacent Words heuristic first sorts the words by font size, then adds a size bonus to every word that is in the same adjacency group as one of the N largest words. This makes words that share a group with the N largest words more likely to be chosen. The size bonus cannot cause a word to become bigger than the largest word in the group. The list is then sorted again by size and the N largest words are selected. We have found through trial and error with many networks that a size bonus of eight results in labels that provide a good semantic description of the nodes in the cluster. Thus, we have made this the default label making option.

The Cytoscape group nodes feature is used to collapse clusters. When AutoAnnotate collapses a cluster, it first creates a group node that contains all the nodes in the cluster, and then the group node is collapsed. The shape and label annotations are no longer drawn for the collapsed cluster. When the cluster is collapsed, Cytoscape creates “meta-edges” between the group node and any other nodes it is connected to. The collapsed group nodes and the meta-edges provide a summary of the network. When a cluster is expanded, the group node is first expanded and then deleted. Collapsed groups must be expanded before switching to another Annotation Set because the nodes contained in each group may belong to different clusters in the other Annotation Set.

Software design

AutoAnnotate is a Cytoscape bundle App based on the OSGi (Open Services Gateway Initiative) module framework (https://www.osgi.org/developer/specifications/) supported by Cytoscape 3. AutoAnnotate version 1.1 depends on Java 8, the Cytoscape 3.3 API and WordCloud 3.1. The Cytoscape App Manager or the Cytoscape App Store installs WordCloud automatically when installing AutoAnnotate. ClusterMaker2 is not installed automatically because it is optional.
To make installation of clusterMaker2 easier for the user, the Create Annotation Set dialog attempts to detect the presence of clusterMaker2; if it is not available, the user is presented with a web link to the App Store page for clusterMaker2, from which they can install it.

Cytoscape makes its extensive API available to apps via a number of OSGi service interfaces. These services must be acquired in a bundle activator lifecycle method that is called when the App is initialized. Traditionally the service references are passed down the chain of