SWAG tools

Tools

CPPX - a GCC-based fact extractor for C and C++
LDX/BFX - a fact extractor for C, C++, Fortran and Pascal
ASX - a fact extractor for Assembler/Disassembler
Grok - a relational calculator
jGrok - a multi-platform relational calculator
LSEdit - a multi-platform, multi-purpose graph visualizer
CLICS - a Clone Interpretation and Classification System
Beagle - an evolution exploration tool
Evolution spectrograph - a graphical evolution exploration tool
Javap2 - a Java class file disassembler
Javex - a Java class file fact extractor
JCD - a Java clone detector
ACD - a C/C++/Assembler clone detector

Pipelines

SWAG Kit - an architecture analysis toolkit, based on CPPX
LDX/BFX pipeline - an architecture analysis toolkit, based on LDX/BFX
Portable Bookshelf (PBS) (retired) - a web-based architecture analysis toolkit

Introduction

Over the years, members of SWAG have produced a considerable number of tools to aid them in their research. The tools are varied in nature, but fall into several broad categories:

fact extractors, which are usually compiler-based programs that process source code in a particular language to extract information about it;
fact manipulators, which operate on factbases to manipulate and analyze the data they contain. Facts produced by fact extractors are by far the most common input for these tools;
visualizers, which are used to display the results of the analysis performed by the above tools;
analyzers, which perform further analysis on, and reasoning about, data produced by other tools.

Most of these tools are freely available for download, and you will find links to the downloads on this page.

Pipelines. While all SWAG tools are useful on their own, they truly shine when combined into "pipelines". A pipeline, as the name suggests, is a collection of tools designed to work together to achieve a final result, with some extra code for tool integration. Following the pipeline philosophy, the output of one tool becomes the input of the next, with each tool in the chain performing a specific task. The two pipelines currently available both aim to make the extraction and analysis of software easier for the beginning researcher. They provide a way to perform a complete extraction, analysis and visualization of a piece of software with as little user input as possible.
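The pipeline idea described above can be illustrated with a minimal Python sketch. It is not taken from any SWAG tool: the stage functions (`extract`, `drop_self_calls`, `render`), the `caller -> callee` input format and the fact tuples are all hypothetical stand-ins for an extractor, a fact manipulator and a visualizer.

```python
from functools import reduce

# A fact is modelled here as a (relation, source, target) tuple.
# This layout is an assumption made for the sketch, not a SWAG format.


def extract(source_lines):
    """Toy extractor: turn lines of the form 'caller -> callee' into call facts."""
    facts = set()
    for line in source_lines:
        if "->" in line:
            caller, callee = (part.strip() for part in line.split("->", 1))
            facts.add(("call", caller, callee))
    return facts


def drop_self_calls(facts):
    """Toy manipulator: discard facts whose source and target coincide."""
    return {fact for fact in facts if fact[1] != fact[2]}


def render(facts):
    """Toy visualizer: produce one sorted text line per fact."""
    return "\n".join(f"{src} --{rel}--> {dst}" for rel, src, dst in sorted(facts))


def run_pipeline(data, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda value, stage: stage(value), stages, data)


if __name__ == "__main__":
    lines = ["main -> parse", "parse -> parse", "main -> emit"]
    print(run_pipeline(lines, [extract, drop_self_calls, render]))
```

Because every stage only agrees on the shape of the data it receives and returns, a stage can be swapped or run on its own, which mirrors how the tools in a pipeline remain usable individually.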

Just like individual tools, the pipelines are freely available for download. Installing a pipeline is usually easier than installing all the individual tools that make up that pipeline, and lets you use all the included tools both individually and in the pipeline. Below you will find the listing of the tools currently available, with links to their short descriptions.

CPPX (C++ Fact Extraction Tool)

CPPX is a free C++ compiler which produces a fact base instead of producing executable code. For design-recovery tool interoperability to become a reality, we need common software to extract facts from code bases, in a common format, and according to a common schema. CPPX extracts facts from C++ source from the highest semantic level (classes and global data and functions) down to the lowest code level of individual statements and expressions. The output format is based on that of Bell Canada's Datrix project. CPPX is available as free software (and in fact depends on GCC for semantic analysis).

LDX and BFX Fact Extraction Tools

BFX and LDX are two complementary binary code extractors targeted at object modules, executables, and dynamic libraries. Normally, they are integrated with the actual software build process to carry out fact extraction. Together they produce information on function calls, variable access, and build dependencies between object modules. Their output is less detailed than that of CPPX, and an order of magnitude smaller. The two extractors offer an extremely simple and reliable way to derive a system model.

ASX is a fact extraction tool that extracts source information from C, C++, assembler, object files, libraries, dynamic libraries and executables, in a format that can immediately be visualized using LSEdit.

CLICS

The CLone Interpretation and Classification System (CLICS) is a tool that extends the work of CCFinder/Gemini by trying to improve the scalability of the clone visualization problem. It automatically filters common types of false matches and classifies clones based on a taxonomy of clones. Users can navigate clones based on this taxonomy, remove clones from the result set, edit the list of files to be included in the analysis without changing the detection results, and visualize the cloning relationships in the software system using LSEdit.

Grok

Grok is a programming language for manipulating binary relations. Grok has an interpreter, which can be considered to be a relational calculator. This interpreter has been used extensively for analyzing factbases produced by parsers that extract information from source programs.

The initial version of Grok was created by the author in 1995. It has evolved to become a language for manipulating factbases. Grok operates at the level of a relational database, in that operators generally apply across entire relations, and not just to single entities. The Grok interpreter has been optimized to handle large factbases (up to several hundred thousand facts). It keeps all of its data structures in memory. It is written in the Turing language.
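To make "operators apply across entire relations" concrete, here is a small Python sketch of a relational calculator. It is deliberately not Grok syntax: the function names (`compose`, `inverse`, `transitive_closure`) and the tiny factbase (`call`, `contain`) are assumptions chosen for illustration. Each binary relation is a set of pairs, and every operator consumes and returns whole relations.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    targets_by_source = {}
    for b, c in s:
        targets_by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in targets_by_source.get(b, ())}


def inverse(r):
    """Swap the two sides of every pair in the relation."""
    return {(b, a) for a, b in r}


def transitive_closure(r):
    """Repeatedly compose the relation with itself until no new pairs appear."""
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


# Hypothetical factbase: which function calls which, and which file contains which function.
call = {("main", "parse"), ("parse", "lex"), ("main", "emit")}
contain = {
    ("app.c", "main"),
    ("front.c", "parse"),
    ("front.c", "lex"),
    ("back.c", "emit"),
}

# Lift function-level calls to file-level dependencies in one expression.
file_call = compose(compose(contain, call), inverse(contain))

# Every function reachable from another through a chain of calls.
reachable = transitive_closure(call)

if __name__ == "__main__":
    print(sorted(file_call))
    print(sorted(reachable))
```

Lifting low-level facts to a coarser level, as `file_call` does here, is a typical factbase manipulation; the sketch keeps everything in memory, as the Grok interpreter is described as doing.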

jGrok

jGrok is a re-implementation and extension of the original Grok, written in Java, completed and maintained by Jingwei Wu. jGrok adds many new features and commands to the set that the original Grok provides. While sacrificing some speed, jGrok gains portability: being a Java program, it runs on any platform that has a Java Virtual Machine available. Like Grok, jGrok is optimized for operating on large factbases, and has been used to operate on collections of up to a million facts.

LSEdit

LSEdit (the name stands for Landscape Editor) is a nested-box-and-arrow graph visualizer. While it is primarily used to display graphs representing software architectures ("landscapes"), it is not limited to visualizing software, and is general enough to display any graph that can be visualized as a collection of nested boxes connected by arrows. LSEdit offers advanced graph layout and editing capabilities, advanced query, elision and search functions, and support for graphs in excess of 300,000 nodes. LSEdit is written in Java and runs equally well on Windows, Unix, Mac OS and any other platform that has a Java virtual machine.

Beagle

Named after the ship upon which Charles Darwin served as a naturalist, Beagle is a tool that aims to help developers gain a better understanding of the software evolution process. It provides a framework that allows users to query, visualize, and navigate through a system's history, and to build persistent, annotated models of how structural changes have impacted the design of the system.

Evolution Spectrograph

Spectrograph provides a metrics-based method to characterize the evolution of a spectrum of closely related components. There are five terms that need to be clarified in the spectrograph modelling of software evolution: time, spectrum, measurement, snapshot, and thread.

The time dimension denotes the whole or partial lifetime of a software system. Time can be measured in two ways. First, we can measure time in units of evolution events such as software versions and repository commits. Second, we can adopt fixed-length periods as time units such as months and years.
The spectrum dimension denotes a specific decomposition of the software system. A spectrum may appear in different forms. For example, a spectrum may contain a group of source files or subsystems based on the system structure or contain a group of program developers based on team organization.
The measurement dimension denotes a set of measured values for each component in the spectrum over each time unit. A variety of software metrics, such as Lines of Code (LOC), Fan In/Out of dependencies, and defect density can be used to measure software properties of interest.
A snapshot captures the evolution states of all the components in the spectrum at a particular time or during a particular period. An aggregation of all the measured values related to a snapshot can produce a single-valued measurement of the whole snapshot. For example, summing the number of lines of code of each source file yields a number for the whole system.
A thread characterizes the whole or partial evolution history of a specific component in the spectrum. An aggregation of all the measured values related to a thread can yield an overall characterization of the evolution of the corresponding component. For example, summing the measured values with a weighting function that favors recent (or old) time units yields a single value describing how recently (or how long ago) the component was active.
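The snapshot and thread aggregations described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Spectrograph implementation: the `measurements` data is invented, and the exponential `decay` weighting is only one possible choice of weighting function.

```python
# Hypothetical measurements: lines changed per component (the spectrum)
# for four consecutive versions (the time dimension), oldest first.
measurements = {
    "parser.c": [10, 0, 0, 5],
    "lexer.c": [0, 0, 20, 40],
    "emit.c": [40, 10, 0, 0],
}


def snapshot_total(measurements, t):
    """Aggregate one snapshot: sum the values of all components at time unit t."""
    return sum(values[t] for values in measurements.values())


def thread_recency(values, decay=0.5):
    """Aggregate one thread with a weighting function that favors recent time units.

    The weight of a value is decay ** age, where age 0 is the latest time unit.
    The exponential form is an assumption of this sketch.
    """
    latest = len(values) - 1
    return sum(value * decay ** (latest - t) for t, value in enumerate(values))


if __name__ == "__main__":
    for t in range(4):
        print("snapshot", t, snapshot_total(measurements, t))
    for component, values in measurements.items():
        print("thread", component, thread_recency(values))
```

With this weighting, a component whose changes are concentrated in the latest versions (here `lexer.c`) scores much higher than one that was last changed long ago (`emit.c`), even when their raw totals are similar.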

SWAG Kit

SWAG Kit is an architecture extraction and analysis toolkit developed by the Software Architecture Group (SWAG) at the University of Waterloo. It comprises the CPPX fact extractor, the Grok fact manipulation engine with several fact manipulation scripts, and the LSEdit graph visualization software.

SWAG Kit can be used to extract, abstract and present software architectures. It currently supports the extraction of facts from C/C++ code, their abstraction to the architectural level, and their presentation in landscape form. SWAG Kit has been used to analyze and visualize many complex software systems, including the Linux operating system kernel and the VIM editor.

LDX/BFX Pipeline

The LDX/BFX family of pipelines was developed by Jingwei Wu while working at SWAG. The two related pipelines are made up of the LDX/BFX fact extraction utilities (from which they get their names), the jGrok fact manipulation and query engine, and the LSEdit graph visualization software. These pipelines are interesting because, unlike other fact extractors, they work alongside the compiler and linker, not instead of them. This means that the facts are extracted as the software system is built: the build process produces both the executable program and the information about that program.

LDX/BFX pipelines have been used to analyze many large software systems, such as Gnumeric (a free spreadsheet program) and Mozilla (the free Internet browser).

Portable Bookshelf (PBS)

The Software Bookshelf is a web-based paradigm for the presentation and navigation of information representing large software systems. The Portable Bookshelf (PBS) is one implementation of this concept. The PBS Toolkit is our set of tools for the generation of a PBS Bookshelf.