CPPX - C/C++ Fact Extractor


The CPPX project goal is a compiler from C++ to an interchange language for semantic graphs. The target language is GXL. The data model for the target is described informally in Datrix. Syntactic and semantic analysis is provided by GNU gcc. CPPX provides the translation between these data models and languages.


The members of the CPPX project are Ric Holt, Tom Dean, and Andrew Malton. The project is based at the University of Waterloo's Software Architecture Group.

Tasks and Approach

The CPPX tool is conceptually simple. The GNU gcc compiler provides (to maintainers) an abstract semantic graph at a certain point in its processing of a compilation unit. The target model ("Datrix") is also an abstract semantic graph. The core of CPPX is a union data structure able to represent instances of

The CPPX application is then conceived as a collection of little graph transformers which progressively move a gcc ASG towards a Datrix ASG, all the while representing the results in gd format and maintaining semantic equivalence.
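The idea of chaining small graph transformers can be sketched as follows. This is only an illustration under stated assumptions, not CPPX's actual code: the Graph class, the transformer helpers, and the node kind names (GccFunction, DatrixFunction, and so on) are all hypothetical and are not taken from the gcc or Datrix schemas. The point is that a single graph type can hold nodes of either model, and each step moves the graph a little closer to the target.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy attributed graph whose node kinds may come from either schema."""
    nodes: dict = field(default_factory=dict)  # node id -> kind (hypothetical names)
    edges: list = field(default_factory=list)  # (source id, label, target id)


def rename_kinds(mapping):
    """Build a transformer that renames node kinds according to `mapping`."""
    def transform(g):
        nodes = {n: mapping.get(k, k) for n, k in g.nodes.items()}
        return Graph(nodes, list(g.edges))
    return transform


def drop_kind(kind):
    """Build a transformer that removes nodes of one kind and their edges."""
    def transform(g):
        nodes = {n: k for n, k in g.nodes.items() if k != kind}
        edges = [(s, l, t) for s, l, t in g.edges if s in nodes and t in nodes]
        return Graph(nodes, edges)
    return transform


def run_pipeline(g, steps):
    """Apply each transformer in order, feeding each result to the next."""
    for step in steps:
        g = step(g)
    return g


# Hypothetical source graph: kind names are illustrative only.
source = Graph(
    nodes={1: "GccFunction", 2: "GccParameter", 3: "GccIdentifier"},
    edges=[(1, "args", 2), (1, "name", 3)],
)
steps = [
    rename_kinds({"GccFunction": "DatrixFunction",
                  "GccParameter": "DatrixParameter"}),
    drop_kind("GccIdentifier"),
]
result = run_pipeline(source, steps)
```

Each transformer returns a new graph rather than editing in place, so intermediate results could be saved and inspected between steps, much as the gd files are used for interchange.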

The input to CPPX is any C++ which the gcc compiler will accept. (Since the gcc software also supports compilers of Java, Pascal, and FORTRAN, these languages may eventually also be acceptable input to CPPX. But not now.)

The output from CPPX can be (translated to) GXL, TA, or VCG.
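As a rough illustration of what translation to GXL involves, the sketch below writes a small node-and-edge graph as a GXL document using Python's standard XML library. The element names (gxl, graph, node, edge, attr, string) follow the GXL format, but the function name to_gxl, the choice of a "kind" and a "label" attribute, and the sample node kinds are assumptions made for this example; they do not describe how CPPX itself lays out its output.

```python
import xml.etree.ElementTree as ET


def to_gxl(nodes, edges, graph_id="example"):
    """Serialize a toy graph as a GXL string.

    nodes: dict mapping node id -> kind (hypothetical kind names)
    edges: list of (source id, label, target id)
    """
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", id=graph_id, edgemode="directed")
    for node_id, kind in nodes.items():
        node = ET.SubElement(graph, "node", id=f"n{node_id}")
        attr = ET.SubElement(node, "attr", name="kind")
        ET.SubElement(attr, "string").text = kind
    for src, label, dst in edges:
        # "from" is a Python keyword, so the attributes are passed as a dict.
        edge = ET.SubElement(graph, "edge", {"from": f"n{src}", "to": f"n{dst}"})
        attr = ET.SubElement(edge, "attr", name="label")
        ET.SubElement(attr, "string").text = label
    return ET.tostring(gxl, encoding="unicode")


doc = to_gxl({1: "DatrixFunction", 2: "DatrixParameter"}, [(1, "args", 2)])
```

Printing doc shows the resulting XML text, which a GXL-aware tool could then read.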

The core data structure can be stored in a file; the format is binary and can be read and written quickly. We call these "gd" files. CPPX's interface to GNU gcc, its graph transform steps, and its output translators all use gd files for interchange.


The use of the gcc compiler as the source of semantic information means that CPPX results are not, strictly speaking, at the source level, and the original source is not fully determined by CPPX's output ASGs.

The basis of the Datrix data model has been changed a little, and will be changed some more, to correct errors in it, to move it closer to the actual semantics of C++, or to make its results more generally convenient. We aren't designing a new data model, though: the changes are very small.

Here are a few technical notes about gcc.


We are maintaining various examples to show our progress and to give tool developers access to GXL data before the general release of the CPPX toolset.

Progress Report

At the beginning of the project we built the core data structure, utility routines, and converters, and learned how to obtain the gcc ASG from the depths of the compiler.

Now we are writing graph rewriters which operate on our gd files. There is an up-to-date list of tasks, from which you can look at the source code of rewriters that are completed to date.



This page is maintained by Tom Dean