Table of Contents Next :  CPPX Architecture

1. Introduction and Background

CPPX is an open source C/C++ extraction tool for reverse engineering and program comprehension. It analyzes source code and produces the program graph in a standard exchange format for use by other reverse engineering and program comprehension tools.

The CPPX project was started with minimal resources and an early delivery date.  Roughly 4 person-months were available for the work (2 people for 4 months).  In addition, the two people (Tom Dean and Andrew Malton) were working out of their homes.  The architecture of CPPX was therefore designed to permit the maximum amount of independent work while minimizing interaction.  We believe that this architecture will allow the project to continue as an open source project for similar reasons.

One of the first decisions was to base the extractor on the GNU C++ compiler.  The compiler would handle preprocessing, parsing, and semantic analysis.  CPPX would convert the GNU C++ graph into a Datrix graph.  According to the GCC web site, the tree dumping facility was only available in version 2.97.  Since the project started in mid-January 2001, we began with the January 15, 2001 snapshot of 2.97.

It quickly became apparent that the dumps were not suitable for the project.  This is not to say that the dumps were unsuitable for their intended purpose, only that they did not meet our needs.  Among other difficulties, the values of real literals were not included in the dumps, and the values of string literals were dumped without any delimiters, making it difficult to find the beginning and end of a string literal (particularly when the literal spanned more than one line because of embedded newlines).  Since it was obvious that we would have to change the dumper anyway, we decided to dump the graph in a binary format, which we called GD (short for GNU to Datrix), to make it easier to read, write, and transform.  The GD definition merges the GCC node types and the Datrix node types, and can thus represent the start graph, the end graph, and any intermediate results.  Note that not everything in the GCC graph is dumped in the format, so the GD file is a projection of the original GCC graph.
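The delimitation problem described above is the usual argument for a length-prefixed binary encoding. The following is a minimal sketch of that idea only; it is not the actual GD layout, and the function names and the 4-byte little-endian length field are assumptions made for illustration.

```python
import struct

def write_string(buf: bytearray, text: str) -> None:
    """Append a string as a 4-byte length followed by its UTF-8 bytes."""
    data = text.encode("utf-8")
    buf.extend(struct.pack("<I", len(data)))  # assumed: little-endian uint32 length
    buf.extend(data)

def read_string(buf: bytes, offset: int):
    """Read one length-prefixed string; return (text, offset of next record)."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    return buf[start:end].decode("utf-8"), end

# A literal with an embedded newline round-trips without any delimiter.
buf = bytearray()
write_string(buf, "line one\nline two")
write_string(buf, "next")
first, pos = read_string(bytes(buf), 0)
second, pos = read_string(bytes(buf), pos)
```

Because the reader knows the byte count before it reads the content, embedded newlines or quote characters in the literal cannot be mistaken for the end of the record.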

The transform itself is structured as a series of individual transforms, each of which accomplishes a small part of the overall transform.  For example, one transform might replace the GCC nodes representing the built-in types (e.g. int, short, char) with Datrix nodes for the same concept.  In the resulting graph, any nodes representing the declaration of a variable of a built-in type will still be GCC nodes, but the edge representing the type of the variable will point to a Datrix node.
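The built-in type example can be sketched as one step in a pipeline over a graph that mixes both node families. This is an illustration under stated assumptions, not CPPX code: the Node class, the "BuiltInType" kind, and the function names are hypothetical, and only the GCC-style kinds "integer_type" and "var_decl" are borrowed from GCC's tree node naming.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    family: str                  # "gcc" or "datrix"; both may coexist in one graph
    kind: str                    # e.g. "var_decl", "integer_type"
    name: str = ""
    edges: dict = field(default_factory=dict)  # edge label -> target Node

BUILTIN_NAMES = {"int", "short", "char"}

def replace_builtin_types(nodes):
    """One small transform: swap GCC built-in type nodes for Datrix nodes."""
    replacements = {}
    for n in nodes:
        if n.family == "gcc" and n.kind == "integer_type" and n.name in BUILTIN_NAMES:
            # "BuiltInType" is a hypothetical Datrix-side kind name.
            replacements[id(n)] = Node("datrix", "BuiltInType", n.name)
    result = [replacements.get(id(n), n) for n in nodes]
    # Redirect every edge that pointed at a replaced node.
    for n in result:
        for label in list(n.edges):
            target = n.edges[label]
            n.edges[label] = replacements.get(id(target), target)
    return result

def run_pipeline(nodes, transforms):
    """Apply each small transform in order; the graph stays valid in between."""
    for transform in transforms:
        nodes = transform(nodes)
    return nodes

int_type = Node("gcc", "integer_type", "int")
var_x = Node("gcc", "var_decl", "x", {"type": int_type})
graph = run_pipeline([int_type, var_x], [replace_builtin_types])
```

After this single step the declaration of x is still a GCC node while its type edge points to a Datrix node, which is the intermediate state described above. Keeping each step this small is what allows the steps to be written and tested independently.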

The GD format may be translated at any time into the GXL, TA, or VCG formats.  This is used not only for exporting the final Datrix version of the graph, but also for examining the original GCC version of the graph and any of the intermediate stages of the transform.
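As a rough illustration of such an export, the sketch below writes a small node and edge list as simplified GXL using Python's standard XML library. It is not the CPPX exporter: the to_gxl function, the "kind" and "label" attribute names, and the input tuples are assumptions, and real GXL output would normally also carry type references and a schema declaration.

```python
import xml.etree.ElementTree as ET

def to_gxl(nodes, edges, graph_id="cppx"):
    """Serialize (id, kind) nodes and (src, label, dst) edges as simplified GXL."""
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", id=graph_id)
    for node_id, kind in nodes:
        node = ET.SubElement(graph, "node", id=node_id)
        attr = ET.SubElement(node, "attr", name="kind")
        ET.SubElement(attr, "string").text = kind
    for src, label, dst in edges:
        # "from" is a Python keyword, so the attributes are passed as a dict.
        edge = ET.SubElement(graph, "edge", {"from": src, "to": dst})
        attr = ET.SubElement(edge, "attr", name="label")
        ET.SubElement(attr, "string").text = label
    return ET.tostring(gxl, encoding="unicode")

# A mixed intermediate graph: a GCC declaration whose type is a Datrix node.
text = to_gxl(
    [("n1", "var_decl"), ("n2", "BuiltInType")],
    [("n1", "type", "n2")],
)
```

Because the same routine works on any stage of the graph, it can be run after each transform to inspect the intermediate result.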

1.1 Call For Volunteers

As mentioned above, CPPX is open source.  Andrew and I are starting full-time faculty positions and can no longer devote 100% of our time to this project.  We intend to continue work on the project as much as we are able, and are recruiting people to assist in moving the project forward.  Please visit the main web site for the project.
