Prev : Introduction and Background Table of Contents Next : The GD Format

2. CPPX Architecture

The CPPX tool consists of four parts:
  1. Modifications to the GCC compiler to dump the GD binary representation of the program graph
  2. A set of transforms to rewrite that graph from the GCC internal compiler schema to the Datrix schema
  3. Export programs that transform the binary GD representation to an external format
  4. A driver program to hide the details for most common uses of the CPPX toolset
Note that the details of the GD format are discussed in the next section.

2.1 GCC Compiler Modifications

The compiler modifications consist of changes to two of the standard compiler files (gcc/c-dump.c and gcc/cp/dump.c), two additional files (gcc/gd_graph.h and gcc/gd_graph.c), and associated changes to the makefiles (gcc/Makefile.in and gcc/cp/Make-lang.in).

The versions of gcc/c-dump.c and gcc/cp/dump.c included with the GCC compiler contain part of the implementation of a breadth-first search of the compiler graph and routines to dump parts of the graph in an ASCII format.  One of the benefits of the breadth-first search is that an index for each node can be computed and used for references from other nodes. The ASCII format makes excellent use of this property. Since the GD format is based on array indexing, we make use of the same feature.
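The indexing idea can be illustrated with a minimal sketch. It is not the dumper's code: the graph is a hypothetical dictionary of node names standing in for the compiler's tree nodes, and the function names are invented for illustration. It only shows how a breadth-first walk yields one index per node, so that edges can be stored as pairs of array indices.

```python
from collections import deque

def number_nodes(root, children):
    """Assign each reachable node an index in breadth-first order.

    `children` maps a node name to the list of nodes it points to
    (a hypothetical stand-in for the compiler's tree nodes).
    """
    index = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in index:      # first visit: allocate the next index
                index[child] = len(index)
                queue.append(child)
    return index

def to_arrays(root, children):
    """Return (nodes, edges): nodes in index order, edges as index pairs."""
    index = number_nodes(root, children)
    nodes = sorted(index, key=index.get)
    edges = [(index[n], index[c]) for n in nodes for c in children.get(n, [])]
    return nodes, edges

# Illustrative graph only; the node names are made up for this example.
graph = {
    "function_decl": ["identifier", "function_type"],
    "function_type": ["integer_type"],
    "identifier": [],
    "integer_type": ["identifier"],
}
nodes, edges = to_arrays("function_decl", graph)
```

Because every node receives its index the first time it is seen, a reference to an already-numbered node (such as the second edge to `identifier` here) is simply that node's index.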

The information dumped is a subset of the information actually present in the internal GCC graph. The internal graph contains information about the RTL code used to communicate with the code generator and other compiler implementation information.  Since we were working originally from the ASCII dumps, only the information originally present in the ASCII dumps is included in the binary dumps, with some changes. One is that the pointers to the binfo nodes (and the binfo nodes themselves) are not included in the dumps. Instead, the dumper makes its own walk of the class inheritance information, dumping a slightly modified graph.  Some other information missing from the ASCII dump (such as the value of real constants) has been added to the graph.  Since our binary representation does not support variable-length tree nodes (such as the GCC TREE_VEC node), these are converted to TREE_LIST nodes (a linked list in the graph) by the dumper.
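The conversion from a variable-length node to a linked list can be sketched as follows. This is an illustration under assumptions, not the dumper's implementation: the `ListCell` class and its `value` and `chain` fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ListCell:
    """One fixed-size cell of a linked list: a value and a link to the next cell."""
    value: object
    chain: Optional["ListCell"] = None

def vec_to_list(elements):
    """Convert a variable-length sequence into a chain of fixed-size cells."""
    head = None
    for value in reversed(elements):    # build back to front so order is preserved
        head = ListCell(value, head)
    return head

def list_values(cell):
    """Walk the chain and collect the values, front to back."""
    out = []
    while cell is not None:
        out.append(cell.value)
        cell = cell.chain
    return out
```

Each cell has the same size regardless of how many elements the original vector held, which is what a format built on fixed-size records needs. An empty vector becomes an empty chain (`None` in this sketch).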

The file gcc/c-dump.c is the common dumper used by both the C compiler and the C++ compiler.  The file gcc/cp/dump.c is a C++ specific version called from gcc/c-dump.c. In the original ASCII dump version, the C++ specific file (gcc/cp/dump.c) could return a value that terminated handling of the current node in the graph.  Since our version is generating a new structure, we could not simply terminate the handling of the node. Instead, the return value is saved and used to set default values for appropriate fields of the node.

The two additional files, gcc/gd_graph.h and gcc/gd_graph.c, are used to define the GD graph representation and provide some utilities to read and write GD files.  They must be part of the compiler since the dumper must create a GD graph and write it for use by the rest of the CPPX tool.  They are explained in the next section. The makefiles were changed only to include the two new files in the appropriate build units.

2.2 CPPX Transforms

The transform from the GCC schema to the Datrix schema is accomplished as a series of individual transforms, each of which performs a small step in the overall transform. For example, one transform replaces the GCC nodes representing a while statement with the Datrix nodes for the while statement. The statements inside the while statement are not changed, and the resulting graph will contain both GCC and Datrix statement nodes. In principle, the order of the rewrites has some flexibility. For example, it should not matter whether the transform of for statements is done before the transform of namespace declaration nodes. However, the current implementation is not quite as flexible.

Although the general architectural style is pipe and filter, UNIX pipes are not supported directly. Instead, intermediate files are used between the passes.  This is not as slow as it might seem.  On the Linux operating system, unallocated memory is used as a disk buffer. If there is enough memory on the system to hold the input and output files simultaneously with the current transform, and the intermediate files are removed after each stage, then very little actual disk activity takes place (this may not be the case with a journalling file system). The individual transformations are discussed in the section on transformations.
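The pass structure described above can be sketched roughly as follows. Everything here is an assumption made for illustration: JSON files stand in for the binary GD files, the node kinds `gcc_while`, `datrix_while`, `gcc_for` and `datrix_for` are invented names, and each real transform does far more than rename a node kind. The sketch shows only the shape: each pass reads one file and writes another, and each intermediate file is removed once the next pass has consumed it.

```python
import json
import os
import tempfile

def rewrite_kind(old, new):
    """Build a pass that renames one node kind and leaves every other node alone."""
    def run(in_path, out_path):
        with open(in_path) as f:
            graph = json.load(f)
        for node in graph["nodes"]:
            if node["kind"] == old:
                node["kind"] = new
        with open(out_path, "w") as f:
            json.dump(graph, f)
    return run

def run_pipeline(passes, in_path, out_path):
    """Run passes in order, handing each one the previous pass's output file."""
    current = in_path
    for i, run in enumerate(passes):
        if i == len(passes) - 1:
            target = out_path
        else:
            fd, target = tempfile.mkstemp(suffix=".json")
            os.close(fd)
        run(current, target)
        if current != in_path:
            os.remove(current)          # drop the intermediate file once consumed
        current = target

def demo(order):
    """Run the two example passes in the given order and return the node kinds."""
    graph = {"nodes": [{"kind": "gcc_while"}, {"kind": "gcc_expr"}, {"kind": "gcc_for"}]}
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "in.json")
    dst = os.path.join(workdir, "out.json")
    with open(src, "w") as f:
        json.dump(graph, f)
    available = {
        "while": rewrite_kind("gcc_while", "datrix_while"),
        "for": rewrite_kind("gcc_for", "datrix_for"),
    }
    run_pipeline([available[name] for name in order], src, dst)
    with open(dst) as f:
        return [n["kind"] for n in json.load(f)["nodes"]]
```

In this toy version the two passes are independent, so `demo(["while", "for"])` and `demo(["for", "while"])` give the same mixed graph, with the untouched `gcc_expr` node still present alongside the Datrix nodes.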

2.3 Export Programs

Currently, there are three export programs:
  1. gd2gxl - exports the GD graph as GXL
  2. gd2ta - exports the GD graph as TA
  3. gd2vcg - exports the GD graph as VCG
None of the three attempts to traverse the graph while generating its output.  Each assumes a valid
graph in the binary representation and simply walks the nodes and the edges separately. gd2ta differs
from the other two in that it makes several passes, due to the nature of the TA representation.
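A minimal sketch of this style of export, assuming the graph is already available as a flat list of node labels and a flat list of edge index pairs. The function name and the input layout are hypothetical, and the output is a simplified VCG-style text, not necessarily what gd2vcg itself emits.

```python
def export_vcg(nodes, edges):
    """Emit a VCG-style description from flat node and edge arrays."""
    lines = ["graph: {"]
    for i, label in enumerate(nodes):           # one pass over the node array
        lines.append(f'  node: {{ title: "{i}" label: "{label}" }}')
    for src, dst in edges:                      # one pass over the edge array
        lines.append(f'  edge: {{ sourcename: "{src}" targetname: "{dst}" }}')
    lines.append("}")
    return "\n".join(lines)

# Illustrative input only: two nodes joined by one edge.
text = export_vcg(["while_stmt", "expr"], [(0, 1)])
```

No edge is followed while writing: the exporter never needs to know which node is the root or whether the graph has cycles, which is why a valid input graph is simply assumed.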

There are several tasks planned (volunteers needed) for the export filters. They are:

  1. A flag for Datrix-only output. Some C++ features are not yet implemented by the transforms. The consequence is that the final graph is a mixed graph containing both GCC and Datrix nodes and edges. A flag that suppresses the GCC nodes, and any edges to them, would produce a Datrix-only graph.  The resulting graph would not be complete with respect to the source program, but would satisfy the Datrix model for use by other tools.
  2. A flag for specifying the GXL schema file. The gd2gxl export program has the schema file gd.xml hard coded. A flag to specify this file should be provided, since the Datrix schema file will most likely have a different name. At the same time, some researchers may wish to deal with the original GCC graph (gcc.xml schema file) or with the merged graph (gd.xml schema file).  The gcc.xml and gd.xml schema files must also be authored.
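The Datrix-only filter proposed in the first task could work along the following lines. This is a sketch of a planned feature, not existing code: the `(schema, kind)` node layout, the schema labels, and the function name are all assumptions. Edges leaving a GCC node are dropped as well as edges entering one, since either would otherwise dangle.

```python
def datrix_only(nodes, edges):
    """Drop GCC nodes and every edge that touches one, then renumber.

    nodes: list of (schema, kind) pairs; edges: list of (src, dst) index pairs.
    """
    keep = [i for i, (schema, _) in enumerate(nodes) if schema == "datrix"]
    remap = {old: new for new, old in enumerate(keep)}   # old index -> new index
    new_nodes = [nodes[i] for i in keep]
    new_edges = [(remap[s], remap[d]) for s, d in edges
                 if s in remap and d in remap]
    return new_nodes, new_edges

# Illustrative mixed graph: node 1 has not been transformed yet.
mixed_nodes = [("datrix", "while"), ("gcc", "expr"), ("datrix", "block")]
mixed_edges = [(0, 1), (0, 2), (1, 2)]
clean_nodes, clean_edges = datrix_only(mixed_nodes, mixed_edges)
```

Renumbering keeps the surviving edges valid as array indices after the GCC nodes are removed.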

2.4 Driver Program

The current driver is simple and supports little functionality. It only supports export in the GXL format, using either the original GCC schema or the final Datrix schema. A flag for keeping the GD file for other uses is also provided.
There are a few extra driver programs intended for testing that run only the transforms (the compiler and dump are run separately). These are called doGeneration, doGeneration2 and doGeneration3. Some of them use the auxiliary shell script mkgraph.

The binary layout of the CPPX program is similar to that of the GCC compiler. There are two main exceptions: the modified compiler is built with a directory called execs that contains the compiler executables, and the bin directory contains the main driver script. This keeps the modified compiler off a user's path and thus prevents it from being invoked accidentally.  The executables for each of the transforms are also placed in the execs directory.  The executables for the export filters are found in the bin directory of the distribution directory.

