Bunch - User’s Manual
Introduction
This document is the User’s Manual for Bunch
Beta 1, a code-analysis tool produced by the Software Engineering Research
Group at the Department of Math & Computer Science of Drexel University.
The document describes the functionality of the tool from a user’s viewpoint
and is meant both as a reference and as an introduction for those approaching
the tool for the first time. More information and documentation on Bunch,
and the research behind it, can be found at Bunch's
site.
Program Functionality
Bunch is a clustering tool intended to aid
the software developer and maintainer in understanding, verifying and maintaining
a source code base. To do this, Bunch lets the user evaluate the quality
of an application’s modularization, by analyzing the source code graph.
Bunch relies solely on the information contained in a module dependency
file, considering nodes as program units or modules, such as files or classes,
and edges between the nodes as calls or relationships between those modules,
such as function calls or inheritance relationships. With this graph, Bunch
can find what a "good" clustering for the system is (thus helping when
documentation of the code is nonexistent or outdated), and it can also
use pre-defined clusters to measure or improve the quality of the system’s
clustering.
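As a rough illustration of the graph model described above, the following Python sketch builds a module dependency graph from "source target" pairs. The one-pair-per-line input, the function name parse_dependencies and the module names are assumptions made for this example; this manual does not specify the layout of Bunch's dependency file.

```python
from collections import defaultdict

def parse_dependencies(text):
    """Return {module: set of modules it depends on}.

    Assumed input (hypothetical format): one "source target" pair per line.
    """
    graph = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        graph[source].add(target)
        graph[target]  # register the target as a node even if it has no outgoing edges
    return dict(graph)

# Hypothetical example system: nodes are modules, edges are calls.
example = """
main parser
main codegen
parser lexer
codegen lexer
"""
graph = parse_dependencies(example)
for module in sorted(graph):
    print(module, "->", sorted(graph[module]))
```

Here every module named in the file becomes a node, including modules such as lexer that are only ever called.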
To learn more about the clustering process
and the uses of Bunch, please consult the on-line
papers.
Installation
Bunch is a Java application. As such, it requires
a Java Virtual Machine 1.1 or compatible to run. Bunch will run with JDK
1.1.6, and you will also need the Java Foundation Classes for JDK 1.1 (JFC,
a.k.a. Swing). We have also tested Bunch with Microsoft’s Java Virtual Machine,
jview; the algorithms run faster there, but you may encounter some problems.
You will also need a decompression tool such as WinZip 95 (for Windows
95) or GNU TAR/GZIP for UNIX environments.
Installation Steps
Java Version Check
Check that you have JDK 1.1.6 (or greater)
by running:
java -version
Swing requires JDK 1.1.5 or greater. We
recommend that, before attempting to run Bunch, you download and install
the JDK and the latest Swing version from the JavaSoft
site.
Java Classpath
To install Bunch, just decompress the distribution
file into any directory (for example, c:\bunch for Windows or /usr/bunch
for UNIX). Then add the JAR file (bunch.jar) that appears in that directory
to your classpath.
For example, if your CLASSPATH was
c:\jdk1.1.6\lib\classes.zip
(for Windows)
or
/usr/jdk1.1.6/lib/classes.zip
(for UNIX)
and the Bunch JAR file is in
c:\bunch (for Windows) or /usr/bunch (for UNIX), you should run the
following command:
set CLASSPATH=c:\bunch\bunch.jar;%CLASSPATH%
(for Windows)
or
export CLASSPATH=/usr/bunch/bunch.jar:$CLASSPATH
(for UNIX)
Running Bunch
To run Bunch, just type:
java bunch.Bunch
Required Software
To run, Bunch requires:
- The Java Runtime Environment or the Java Development Kit.
- The Java Foundation Classes for JDK 1.1.
As of Beta 1, Bunch does not run
as an applet.
Using Bunch
Bunch has a very simple user interface: a
set of tabbed panes that keep all the information easily accessible
without cluttering the screen with rarely used options. The following
is a screenshot of how Bunch appears after it is launched:

As we can see in the picture, the Bunch window
has two parts: "Options", where the user selects the options
that will be used to run a given action (or procedure), and "Action",
where the action to be executed is selected and run.
Clustering a graph
Clustering a graph requires the user to select:
- The graph dependency file
- A graph output file, and its format
- A clustering method
The graph dependency file is selected by clicking
on the "Select…" button to the right of the "Input Graph File" field. After
you select a file in the File Chooser dialog that appears, Bunch
automatically sets the output filename to the name of the
selected dependency file (when writing the results, Bunch adds the
file extension appropriate to the format used, to avoid overwriting the
original dependency file). You can change the output filename by editing
it directly in the "Output Cluster File" field.
The "Clustering Method" list contains
all the available options to cluster a graph file (for a specific description
of the algorithms, their uses and options, please consult the on-line papers).
The algorithms are:
- SAHC: Steepest Ascent Hill Climbing, a pure hill-climbing algorithm.
- NAHC: Next Ascent Hill Climbing, a hill-climbing algorithm that produces slightly worse results than SAHC but is faster.
- GA: a Genetic Algorithm, more efficient than the hill-climbing algorithms for large graphs.
- Optimal: only usable for small graphs. Enumerates every possible partition and chooses the best one.
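To illustrate the difference between the two hill-climbing variants listed above, here is a minimal Python sketch. It is not Bunch's implementation: the neighbor move (relocating one module to another cluster), the toy_score objective and all names are assumptions chosen for the example. Steepest ascent examines every neighbor and takes the best improvement; next ascent takes the first improvement it finds.

```python
from itertools import combinations

# Hypothetical example graph: two triangles joined by one edge.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
LINKED = {frozenset(edge) for edge in EDGES}

def toy_score(assign):
    """Stand-in objective (NOT Bunch's MQ): for each pair of modules sharing
    a cluster, add 1 if they are connected and subtract 1 if they are not."""
    total = 0
    for u, v in combinations(sorted(assign), 2):
        if assign[u] == assign[v]:
            total += 1 if frozenset((u, v)) in LINKED else -1
    return total

def neighbors(assign, k):
    """Yield every assignment obtained by moving one module to another cluster."""
    for module, current in assign.items():
        for target in range(k):
            if target != current:
                moved = dict(assign)
                moved[module] = target
                yield moved

def climb(assign, k, score, steepest=True):
    """Hill-climb from `assign` (module -> cluster index) to a local optimum."""
    best = score(assign)
    while True:
        chosen = None
        for candidate in neighbors(assign, k):
            value = score(candidate)
            if value > best:
                best, chosen = value, candidate
                if not steepest:
                    break  # next ascent: accept the first improvement
        if chosen is None:
            return assign, best  # no neighbor improves the score
        assign = chosen

start = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
sahc_result, sahc_score = climb(start, 2, toy_score, steepest=True)
nahc_result, nahc_score = climb(start, 2, toy_score, steepest=False)
print("SAHC:", sahc_score, "NAHC:", nahc_score)
```

Both variants stop at a partition that no single move can improve. Next ascent evaluates fewer neighbors per step, which is consistent with the speed and quality trade-off described in the list.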
The "Output File Format" list contains all
the available output formats for the clustered graph. The formats are:
- Dotty: output format viewable with AT&T Research’s dotty tool.
- Tom Sawyer: format viewable with Tom Sawyer.
- Text: a simple format that defines one cluster on each line of the file by listing the names of the modules that belong to that cluster. This file format can be used as input for the User Directed Clustering option (see the section "Clustering a graph with predefined clusters").
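The output formats above can be sketched in Python as follows. This is an illustration only: the whitespace separator in the text format, the function names and the exact layout of the DOT output are assumptions, and Bunch's own files may differ. The DOT sketch relies on the Graphviz convention that a subgraph whose name starts with "cluster" is drawn as a box.

```python
def to_text(clusters):
    """One cluster per line; module names separated by spaces (assumed separator)."""
    return "\n".join(" ".join(sorted(c)) for c in clusters) + "\n"

def from_text(text):
    """Read the one-cluster-per-line format back into a list of sets."""
    return [set(line.split()) for line in text.splitlines() if line.strip()]

def to_dot(edges, clusters):
    """Emit a Graphviz DOT digraph with one 'cluster_N' subgraph per cluster."""
    lines = ["digraph G {"]
    for index, cluster in enumerate(clusters):
        lines.append(f"  subgraph cluster_{index} {{")
        for module in sorted(cluster):
            lines.append(f'    "{module}";')
        lines.append("  }")
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines) + "\n"

# Hypothetical clustered graph.
clusters = [{"a", "b"}, {"c"}]
edges = [("a", "b"), ("a", "c")]
print(to_text(clusters))
print(to_dot(edges, clusters))
```

Because the text form can be read back with from_text, a result saved this way can serve as the starting partition for a later run.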
After selecting all the desired options and
making sure that the Action selected is "Optimize MQ", just press the "Run"
button. A progress dialog will appear:

Important note: While the data structures
are initialized prior to the beginning of the optimization process, the
progress window may appear frozen. This initialization time increases with
graph size and decreases with machine speed.
This progress dialog indicates:
- Size of the graph, in the dialog’s title.
- Overall progress of the clustering process (first progress bar), which shows how many generations are left until the process stops, regardless of the result obtained so far.
- Generations without change (second progress bar), which shows how many generations have passed without finding a better partition for the graph. If this progress bar reaches its maximum value, the population is assumed stable (i.e., no further improvements can be made) and the process finishes. This bar only makes sense for the SAHC and NAHC algorithms; its maximum is the algorithm’s threshold times the total number of generations to run (as defined in the algorithms’ options dialog).
- Elapsed time, the time elapsed since the algorithm started running.
- Best MQ Value Found, the MQ value of the best partitioned graph found so far.
The dialog also includes three buttons: Output (initially disabled), Pause
and Cancel. Pressing Cancel makes the dialog ask for confirmation before
canceling the clustering process. Pressing Pause enables the Output button
and changes the Pause label to Resume. In this way, the user can take
"snapshots" of the clustering process by pressing Output. Pressing Resume
continues the clustering process normally.
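The two progress bars described above correspond to two stopping conditions: a fixed generation limit, and a limit on consecutive generations without improvement. The Python sketch below shows one way such a loop could be written. The function name run, the step callback and the rounding of the limit are assumptions; only the rule "threshold times total generations" comes from this manual.

```python
def run(step, max_generations, threshold):
    """Run up to max_generations, stopping early once the best value has not
    improved for threshold * max_generations consecutive generations.

    step(generation) is a hypothetical callback returning that generation's
    best objective value.
    """
    patience = int(threshold * max_generations)  # maximum of the second bar
    best = float("-inf")
    unchanged = 0
    for generation in range(1, max_generations + 1):
        value = step(generation)
        if value > best:
            best, unchanged = value, 0
        else:
            unchanged += 1
            if unchanged >= patience:
                return best, generation, "stable"
    return best, max_generations, "generation limit"

# The value improves for three generations and then stays flat.
print(run(lambda g: min(g, 3), max_generations=20, threshold=0.25))
# The value improves every generation, so only the generation limit applies.
print(run(lambda g: g, max_generations=5, threshold=0.5))
```

In the first call the limit is 5 unchanged generations, so the run ends at generation 8 instead of running all 20.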
User-Directed Clustering
Once a graph is selected, the Options to the
left of the "Basic" pane are activated. The last option in the list is
"User Directed Clustering". Pressing that tab presents the following pane:

This pane lets you select an "Input cluster
file", i.e., a file that contains a given partition for the loaded graph.
Including a predefined clustering in the process has basically two uses:
the first is to analyze the evolution of a system (i.e., find a new clustering
when new modules have been added to the system that weren’t present the
previous time the clustering was run) and the second is to know the MQ
Value of a given partition of the graph.
Clustering a graph with predefined clusters
The first use involves running
the "Optimize MQ" action with a loaded Input Cluster File. In this case,
checking the "Lock Clusters" checkbox means that none of the new
modules will be able to enter the clusters already defined, i.e., they
will be forced to appear in clusters outside the original structure. Leaving
the option unchecked (the most common case) lets new modules integrate
with old clusters.
Calculating a partitioned graph’s MQ
To do this, select the Action
"Calculate MQ" from the Actions List after loading the Input Cluster File
and before starting the process. Doing this with a graph and a pre-defined
clustering presents a dialog containing the information for the
partitioned graph: its MQ, intraconnectivity and interconnectivity values.
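This manual does not state how the three values are computed. The Python sketch below assumes one formulation described in the on-line papers: the intraconnectivity of a cluster with N modules and m internal edges is m / N², the interconnectivity of two clusters with e edges between them is e / (2 · N1 · N2), and MQ is the average intraconnectivity minus the average interconnectivity (or just the intraconnectivity when there is a single cluster). Treat the formula, the function name mq and the example data as assumptions, not as a description of what Bunch Beta 1 actually computes.

```python
from itertools import combinations

def mq(edges, clusters):
    """Return (MQ, average intraconnectivity, average interconnectivity).

    edges: iterable of (source, target) pairs; clusters: list of non-empty sets.
    Uses the ASSUMED formulation described in the text above.
    """
    cluster_of = {m: i for i, c in enumerate(clusters) for m in c}
    k = len(clusters)
    intra = [0] * k   # edges inside each cluster
    inter = {}        # edges between each pair of clusters
    for source, target in edges:
        a, b = cluster_of[source], cluster_of[target]
        if a == b:
            intra[a] += 1
        else:
            key = (min(a, b), max(a, b))
            inter[key] = inter.get(key, 0) + 1
    intra_avg = sum(intra[i] / len(clusters[i]) ** 2 for i in range(k)) / k
    if k == 1:
        return intra_avg, intra_avg, 0.0
    pairs = list(combinations(range(k), 2))
    inter_avg = sum(
        inter.get((i, j), 0) / (2 * len(clusters[i]) * len(clusters[j]))
        for i, j in pairs
    ) / len(pairs)
    return intra_avg - inter_avg, intra_avg, inter_avg

# Hypothetical partitioned graph: two clusters of two modules each.
edges = [("a", "b"), ("b", "a"), ("c", "d"), ("b", "c")]
clusters = [{"a", "b"}, {"c", "d"}]
print(mq(edges, clusters))  # (0.25, 0.375, 0.125)
```

Under this formulation a partition scores higher when edges stay inside clusters and lower when they cross cluster boundaries.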
Using Libraries
"Library" modules are those modules that only
have incoming edges, that is, modules that do not make calls to other modules
in the graph. It is common that such modules appear in the input dependency
file but are not really part of the system. Yet, if included in the automatic
clustering process, they might distort the result, making it correspond less
closely to the actual clustering of the modules.
To avoid this, you can select specific
modules to be "Libraries," and thus leave them out of the optimization (i.e.,
it will be as if they didn’t exist). The interface provided by Bunch for
this is in the "Libraries" tab, which shows a pane as follows:

The list on the left shows all the modules
available to be selected as libraries. Selecting one of the modules and
pressing the button with the arrow pointing to the right will move the
module to the Libraries list, and vice versa. Bunch also includes a facility
to automatically find library modules (i.e., those modules that match the
definition given above). Pressing the "Find" button at the bottom of the
pane will find those modules that only have incoming edges
and place them on the Library Modules list at the right, removing them
from the original modules list.
In the output graph, Library modules will
appear in their own separate cluster, with a special shape when possible.
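The rule behind the "Find" button, as defined above, can be sketched in a few lines of Python. The function names and the example graph are hypothetical; the only rule taken from this manual is that a library module has incoming edges and no outgoing ones.

```python
def find_libraries(edges):
    """Return modules that appear only as edge targets, never as sources."""
    sources = {source for source, _ in edges}
    targets = {target for _, target in edges}
    return sorted(targets - sources)

def without_modules(edges, excluded):
    """Drop every edge touching an excluded module, as if it did not exist."""
    return [(s, t) for s, t in edges if s not in excluded and t not in excluded]

# Hypothetical dependency graph.
edges = [("main", "util"), ("main", "parser"), ("parser", "util")]
libraries = find_libraries(edges)
print(libraries)                          # ['util']
print(without_modules(edges, libraries))  # [('main', 'parser')]
```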
Using Omnipresent Modules
A module is considered "Omnipresent" if its
number of connections (edges) exceeds the average by a given
factor. Omnipresent modules are, in a sense, internal system
libraries: modules that are heavily used or that make heavy use of the
system. Supplier omnipresent modules are those that are heavily used (such
as an internal system library) and Client omnipresent modules are those
that make heavy use of the system in general and thus have connections
to many modules, like a driver. Like Libraries, Omnipresent modules
are excluded from the clustering process because of their
negative influence on the system’s structure.
Bunch lets you select modules from the
available modules list to put them into the clients or suppliers list using
the "Omnipresent" tab in the main window, which presents the following
pane:

The list on the left shows all the modules
available to be selected as omnipresent. (NOTE: modules selected as libraries
cannot be selected as omnipresent, and vice versa.) Selecting one of the
modules and pressing the button with the arrow pointing to the right will
move the module to the appropriate Omnipresent module list, and vice versa.
The pane includes a facility to automatically find omnipresent modules
(i.e., those modules that match the definition given above). Pressing the
"Find" button at the bottom of the pane will find those modules
whose number of incoming or outgoing edges (depending on whether the module
is a supplier or a client) is higher than the average number of edges
times the number specified in the field to the right of the button
(3.0 by default), and place them
on the appropriate omnipresent module list at the right, removing them
from the original modules list.
In the output graph, Omnipresent modules will
appear in their own separate cluster, with a special shape when possible.
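The threshold rule used to find omnipresent modules can be sketched in Python as below. This manual does not say exactly how the "average number of edges" is computed, so the sketch assumes it is the number of edges divided by the number of modules (which is both the average in-degree and the average out-degree). The function name and the example graph are hypothetical.

```python
from collections import Counter

def find_omnipresent(edges, factor=3.0):
    """Return (suppliers, clients) whose in-degree or out-degree exceeds
    factor times the ASSUMED average of edges per module."""
    modules = {module for edge in edges for module in edge}
    if not modules:
        return [], []
    incoming = Counter(target for _, target in edges)
    outgoing = Counter(source for source, _ in edges)
    average = len(edges) / len(modules)
    suppliers = sorted(m for m in modules if incoming[m] > factor * average)
    clients = sorted(m for m in modules if outgoing[m] > factor * average)
    return suppliers, clients

# Hypothetical system: eight modules all call "log"; one extra call m1 -> m2.
edges = [(f"m{i}", "log") for i in range(1, 9)] + [("m1", "m2")]
print(find_omnipresent(edges))  # (['log'], [])
```

With 9 edges over 9 modules the average is 1.0, so with the default factor of 3.0 only "log", with 8 incoming edges, passes the supplier threshold.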