2. CPPX Architecture
The CPPX tool consists of four parts:
-
Modifications to the GCC compiler to dump the GD binary representation
of the program graph
-
A set of transforms to rewrite that graph from the GCC internal compiler
schema to the Datrix schema
-
Export programs that transform the binary GD representation to an external
format
-
A driver program to hide the details for most common uses of the CPPX toolset
Note that the details of the GD format are discussed in the next section.
2.1 GCC Compiler Modifications
The compiler modifications consist of changes to two of the standard compiler
files (gcc/c-dump.c, gcc/cp/dump.c), two additional files
(gcc/gd_graph.h, gcc/gd_graph.c), and associated changes
to the makefiles (gcc/Makefile.in, gcc/cp/Make-lang.in).
The versions of gcc/c-dump.c and gcc/cp/dump.c included
with the GCC compiler contain part of the implementation of a breadth-first
search of the compiler graph and routines to dump parts of the graph out
in an ASCII format. One of the benefits of the breadth-first search
is that an index for each node can be computed and used for references from
other nodes. The ASCII format makes excellent use of this property. Since
the GD format is based on array indexing, we make use of the same feature.
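The indexing idea can be illustrated with a minimal sketch. It is not the dumper's actual code: the function names, the toy graph, and the record layout are assumptions made for illustration, and numbering from 0 is an arbitrary choice. The point is only that a breadth-first walk gives every node a stable index, so each record can refer to other nodes by number rather than by pointer.

```python
from collections import deque

def number_nodes_bfs(root, children):
    """Assign each reachable node an index in breadth-first order.

    `children` maps a node name to an ordered list of (edge_label, child) pairs.
    """
    index = {root: 0}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _label, child in children.get(node, []):
            if child not in index:      # the first visit fixes the index
                index[child] = len(order)
                order.append(child)
                queue.append(child)
    return index, order

def dump_records(root, children):
    """Emit one record per node; references are indices, not pointers."""
    index, order = number_nodes_bfs(root, children)
    return [
        (index[node], node,
         [(label, index[child]) for label, child in children.get(node, [])])
        for node in order
    ]

# Hypothetical toy graph in which two nodes refer to the same type node.
toy = {
    "function_decl": [("type", "function_type"), ("body", "compound_stmt")],
    "function_type": [("retn", "integer_type")],
    "compound_stmt": [("type", "integer_type")],
}
records = dump_records("function_decl", toy)
```

Because the shared node receives a single index, both references to it resolve to the same array slot, which is the property an array-indexed format relies on.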
The information dumped is a subset of the information actually present
in the internal GCC graph. The internal graph also contains information about
the RTL code used to communicate with the code generator and other compiler
implementation information. Since we were working originally from
the ASCII dumps, only the information originally present in the ASCII dumps
is included in the binary dumps. Some changes have been made. One is that
the pointers to the binfo nodes (and the binfo nodes themselves) are not
included in the dumps. Instead, the dumper makes its own walk of the class
inheritance information, dumping a slightly modified graph. Some other
information missing from the ASCII dump (such as the value of real constants)
has been added to the graph. Since our binary representation does
not support variable-length tree nodes (such as the GCC TREE_VEC node),
these are converted to TREE_LIST nodes (a linked list in the graph) by the dumper.
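The vector-to-list conversion can be sketched as follows. This is an illustration only, not the dumper's code: the node table is modelled as a Python list, and the field names "value" and "chain" are assumptions that loosely mirror the value and chain fields of a GCC TREE_LIST.

```python
def vec_to_list(elements, nodes):
    """Append a chain of fixed-size list cells to `nodes`.

    Returns the index of the head cell, or None for an empty vector.
    Each cell holds one 'value' reference and a 'chain' reference to the
    next cell, so no node needs a variable number of fields.
    """
    head = None
    # Build back to front so each cell can point at its already-created successor.
    for value in reversed(elements):
        nodes.append({"kind": "tree_list", "value": value, "chain": head})
        head = len(nodes) - 1
    return head

def walk(head, nodes):
    """Follow the chain references and collect the values in order."""
    out = []
    while head is not None:
        out.append(nodes[head]["value"])
        head = nodes[head]["chain"]
    return out
```

For example, `vec_to_list([10, 11, 12], nodes)` appends three cells, and `walk` on the returned head yields the elements in their original order.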
The file gcc/c-dump.c is the common dumper used by both the
C compiler and the C++ compiler. The file gcc/cp/dump.c
is a C++ specific version called from gcc/c-dump.c. In the original
ASCII dump version, the C++ specific file (gcc/cp/dump.c) could
return a value that terminated handling of the current node in the graph.
Since our version is generating a new structure, we could not simply terminate
the handling of the node. Instead, the return value is saved and used to
set default values for appropriate fields of the node.
The two additional files, gcc/gd_graph.h and gcc/gd_graph.c,
are used to define the GD graph representation and provide some utilities
to read and write GD files. They must be part of the compiler since
the dumper must create a GD graph and write it for use by the rest of the
CPPX tool. They are explained in the next section. The makefiles
were changed only to include the two new files in the appropriate build
units.
2.2 CPPX Transforms
The transform from the GCC schema to the Datrix schema is accomplished
as a series of individual transforms, each of which performs a small step
in the overall transform. For example, one transform replaces the GCC nodes
representing a while statement with the Datrix nodes for the while statement.
The statements inside the while statement are not changed, so the resulting
graph contains both GCC and Datrix statement nodes. In principle, the
order of the rewrites has some flexibility. For example, it should not
matter whether the transform of for statements is done before the transform
of namespace declaration nodes. However, the current implementation is
not quite as flexible. Although the general architectural style is pipe and
filter, UNIX pipes are not supported directly. Instead, intermediate
files are used between each of the passes. This is not as slow as
it might seem. On the Linux operating system, unallocated memory
is used as a disk buffer. If there is enough memory on the system to
hold the input and output files simultaneously with the current transform,
and the intermediate files are removed after each stage, then very little
actual disk activity takes place (this may not be the case with a journaling
file system). The individual transformations are discussed in the section
on transformations.
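The pass structure can be sketched roughly as below. This is a simplified model, not the CPPX implementation: it stores graphs as JSON rather than the binary GD format, and the node kind names ("gcc:while_stmt", "datrix:while", and so on), the pass list, and the file names are all hypothetical. It shows each pass reading one file and writing the next, with each intermediate file removed as soon as it has been consumed.

```python
import json
import os
import tempfile

def rewrite_kind(old, new):
    """Build a pass that relabels nodes of kind `old` and leaves all others alone."""
    def run(graph):
        for node in graph["nodes"]:
            if node["kind"] == old:
                node["kind"] = new
        return graph
    return run

# Hypothetical pass list; the kind names are placeholders, not the real schemas.
PASSES = [
    rewrite_kind("gcc:while_stmt", "datrix:while"),
    rewrite_kind("gcc:for_stmt", "datrix:for"),
]

def run_pipeline(input_path, output_path, passes=PASSES, workdir=None):
    """Run each pass as file -> file, deleting each intermediate after use."""
    if not passes:
        raise ValueError("at least one pass is required")
    workdir = workdir or tempfile.mkdtemp()
    current = input_path
    for i, transform in enumerate(passes):
        last = i == len(passes) - 1
        target = output_path if last else os.path.join(workdir, f"stage{i}.json")
        with open(current) as src:
            graph = json.load(src)
        with open(target, "w") as dst:
            json.dump(transform(graph), dst)
        if current != input_path:
            os.remove(current)  # keep at most one intermediate file alive
        current = target
    return output_path
```

After the first pass in this sketch, the graph is mixed: while nodes carry the new kind, while for nodes and everything else still carry their original kinds.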
2.3 Export Programs
Currently, there are three export programs:
-
gd2gxl - exports the gd graph as GXL
-
gd2ta - exports the gd graph as TA
-
gd2vcg - exports the gd graph as VCG
None of the three attempts to traverse the graph structure while generating
its output. They assume a valid
graph in the binary representation, and simply iterate over the nodes and
edges separately. gd2ta differs
from the other two in that it makes several passes, due to the nature
of the TA representation.
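The difference between the two export styles can be sketched as follows. Both functions are illustrative assumptions rather than the real exporters: the in-memory node and edge tables, the line formats, and the exact division into passes are invented for the example, and the second function only imitates the sectioned layout of TA (a tuple section followed by an attribute section) in simplified form.

```python
def export_flat(nodes, edges):
    """One loop over each table: a line per node, then a line per edge."""
    lines = [f"node {i} {n['kind']}" for i, n in enumerate(nodes)]
    lines += [f"edge {src} {label} {dst}" for src, label, dst in edges]
    return lines

def export_ta_like(nodes, edges):
    """Simplified TA-style output; facts are grouped by section, so the
    node table is visited more than once."""
    lines = ["FACT TUPLE :"]
    # Pass 1: declare every node as an instance of its kind.
    lines += [f"$INSTANCE n{i} {n['kind']}" for i, n in enumerate(nodes)]
    # Pass 2: emit the edges as relation tuples.
    lines += [f"{label} n{src} n{dst}" for src, label, dst in edges]
    # Pass 3: revisit the nodes to emit their attributes.
    lines.append("FACT ATTRIBUTE :")
    for i, n in enumerate(nodes):
        attrs = " ".join(f'{k} = "{v}"' for k, v in sorted(n.get("attrs", {}).items()))
        if attrs:
            lines.append(f"n{i} {{ {attrs} }}")
    return lines
```

Neither function follows edges to discover nodes; both rely on the tables already describing a valid graph.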
There are several tasks planned (volunteers needed) for the export filters.
They are:
-
A flag for Datrix-only output. Some C++ features are not yet implemented
by the transforms. The consequence is that the final graph is a mixed graph
containing both GCC and Datrix nodes and edges. The flag would produce
a Datrix-only graph by suppressing the GCC nodes
and any edges to the GCC nodes. The resulting graph would not be
complete with respect to the source program, but would satisfy the Datrix
model for use by other tools.
-
A flag for specifying the GXL schema file. The gd2gxl export program has
the schema file gd.xml hard-coded. A flag to specify this
file should be provided, since the Datrix schema file will most likely have
a different name. At the same time, some researchers may wish to deal
with the original GCC graph (gcc.xml schema file) or with the merged graph
(gd.xml schema file). The gcc.xml and gd.xml schema files must also
be authored.
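As a starting point for volunteers, the two planned flags might look roughly like the sketch below. Everything here is hypothetical: neither `--datrix-only` nor `--schema` exists in gd2gxl, the flag names and the `is_datrix` predicate are invented for illustration, and the graph is modelled as plain Python lists rather than the GD format. The filter drops edges that touch a removed node at either end, which is slightly broader than dropping only edges to GCC nodes, because an edge leaving a removed node would otherwise dangle.

```python
import argparse

def datrix_only(nodes, edges, is_datrix):
    """Drop non-Datrix nodes and any edge touching them; renumber survivors."""
    keep = [i for i, n in enumerate(nodes) if is_datrix(n)]
    remap = {old: new for new, old in enumerate(keep)}
    kept_nodes = [nodes[i] for i in keep]
    kept_edges = [
        (remap[src], label, remap[dst])
        for src, label, dst in edges
        if src in remap and dst in remap
    ]
    return kept_nodes, kept_edges

def parse_args(argv):
    """Hypothetical command line; neither flag exists in gd2gxl today."""
    parser = argparse.ArgumentParser(prog="gd2gxl")
    parser.add_argument("--datrix-only", action="store_true",
                        help="suppress GCC nodes and edges touching them")
    parser.add_argument("--schema", default="gd.xml",
                        help="schema file named in the GXL output")
    parser.add_argument("input", help="GD file to export")
    return parser.parse_args(argv)
```

Keeping gd.xml as the default in this sketch preserves the current behaviour when no schema flag is given.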
2.4 Driver Program
The current driver is simple and supports only limited functionality.
It only supports export in the GXL format of the original GCC schema or
the final Datrix schema. A flag for leaving the gd file for other use is
also provided.
There are a few extra driver programs intended for testing that
run only the transforms (the compiler and dump are run separately). These
are called doGeneration, doGeneration2 and doGeneration3. Some of them
use the auxiliary shell script mkgraph.
The binary layout of the CPPX program is similar to that of the GCC
compiler. The main exception is that the modified compiler is built
with a directory called execs that contains the compiler executables,
while the bin directory contains the main driver script. This keeps
the modified compiler off a user's path and thus prevents
it from being invoked accidentally. The executables
for each of the transforms are also placed in the execs directory.
The executables for the export filters are found in the bin directory
of the distribution directory.
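The layout can be summarised as a small lookup, sketched here under the assumption that bin and execs sit side by side under one installation root. The root path, the "kind" labels, and the function name are hypothetical and serve only to make the directory split concrete.

```python
from pathlib import PurePosixPath

# Which directory holds each kind of tool, following the layout described above.
TOOL_DIRS = {
    "driver": "bin",       # main driver script, visible on the user's path
    "export": "bin",       # export filters such as gd2gxl
    "compiler": "execs",   # modified compiler executables, kept off the path
    "transform": "execs",  # one executable per transform
}

def tool_path(root, kind, name):
    """Return the expected location of a tool under the installation root."""
    return PurePosixPath(root) / TOOL_DIRS[kind] / name
```

For instance, with a hypothetical root of /opt/cppx, `tool_path("/opt/cppx", "export", "gd2gxl")` resolves to /opt/cppx/bin/gd2gxl.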