CPPX is a C++ compiler that produces a fact base instead of executable code.

A fact base for a source file (in C++ or any language) is a collection of individual "facts" about the program and whatever it contains, defines, uses, modifies, relies on, imports, exports, and so on. If expressed in a standard format, the fact base can then be used as input to software reengineering tools of every kind, for parsing, source code extraction, architecture recovery, data flow analysis, pointer analysis, program slicing, query techniques, source code visualization, object recovery, restructuring, refactoring, remodularization, and so forth. In other words, CPPX is a universal C++ front end.

To be most useful, a fact extractor should

These useful goals are met by CPPX as follows.

The output fact base is an E/R graph, whose vertices represent the program's templates, classes, methods, compound statements, and expressions, down to the lowest level of constants and variable references. The edges in the graph represent syntactic relationships (so that the graph contains the parse tree of the input program) and in addition, semantic facts linking identifiers to their declarations, method calls to their targets, objects to their types, and most things to their enclosing scopes. From the CPPX output graph it is (almost) possible to reconstruct the original program.
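To make the shape of such a graph concrete, here is a minimal sketch in C++. Every name in it (`FactBase`, `Node`, `Edge`, the node kinds, and the `child` and `refersTo` relations) is hypothetical and chosen for illustration only; none of it is taken from CPPX or from the Datrix schema. It records the facts for the tiny program `int x; void f() { x; }`, using `child` edges for the syntactic structure and one `refersTo` edge for the semantic link from a variable reference to its declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout: this is NOT the Datrix schema, only an
// illustration of "vertices plus labelled edges".
struct Node {
    std::string kind;  // e.g. "Function", "Block", "VarRef"
    std::string name;  // empty for anonymous entities such as blocks
};

struct Edge {
    std::size_t from;
    std::size_t to;
    std::string relation;  // "child" (syntactic) or "refersTo" (semantic)
};

class FactBase {
public:
    // Adds a vertex and returns its index, which serves as its identifier.
    std::size_t addNode(const std::string& kind, const std::string& name) {
        nodes_.push_back(Node{kind, name});
        return nodes_.size() - 1;
    }

    // Adds a directed, labelled edge between two existing vertices.
    void addEdge(std::size_t from, std::size_t to, const std::string& relation) {
        assert(from < nodes_.size() && to < nodes_.size());
        edges_.push_back(Edge{from, to, relation});
    }

    // Returns the targets of all edges leaving `from` with the given label.
    std::vector<std::size_t> targets(std::size_t from,
                                     const std::string& relation) const {
        std::vector<std::size_t> out;
        for (const Edge& e : edges_) {
            if (e.from == from && e.relation == relation) out.push_back(e.to);
        }
        return out;
    }

    const Node& node(std::size_t id) const { return nodes_.at(id); }

private:
    std::vector<Node> nodes_;
    std::vector<Edge> edges_;
};

// Facts for the program:  int x;  void f() { x; }
FactBase buildExample() {
    FactBase fb;
    std::size_t unit  = fb.addNode("TranslationUnit", "");
    std::size_t var   = fb.addNode("Variable", "x");
    std::size_t fn    = fb.addNode("Function", "f");
    std::size_t block = fb.addNode("Block", "");
    std::size_t ref   = fb.addNode("VarRef", "x");

    // Syntactic edges: together they form the parse tree.
    fb.addEdge(unit, var, "child");
    fb.addEdge(unit, fn, "child");
    fb.addEdge(fn, block, "child");
    fb.addEdge(block, ref, "child");

    // Semantic edge: the reference is linked to its declaration.
    fb.addEdge(ref, var, "refersTo");
    return fb;
}
```

A downstream tool could then answer a question such as "which declaration does this reference resolve to?" by calling `targets(ref, "refersTo")`, without parsing any C++ itself.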

An E/R graph has to conform to a schema in order to be useful. The schema (or data model) which CPPX output conforms to is that of Bell Canada's Datrix™ project, somewhat modified.

The resulting graph is written in GXL, the Graph eXchange Language. CPPX can also produce output in TA and VCG.
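As a rough illustration of what a GXL document looks like, the following C++ sketch serializes a small node and edge list using the basic GXL elements `gxl`, `graph`, `node`, `edge`, `attr` and `string`. The types `GxlNode` and `GxlEdge`, the functions `xmlEscape`, `gxlAttr` and `toGxl`, the graph id `facts`, and the attribute names `kind`, `name` and `relation` are all assumptions made for this example. The layout that CPPX actually emits, including how it refers to the Datrix schema, is not shown in this document and may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of a fact graph (not CPPX's own structures).
struct GxlNode {
    std::string kind;
    std::string name;
};

struct GxlEdge {
    std::size_t from;
    std::size_t to;
    std::string relation;
};

// Escapes the characters that are special in XML text and attribute values.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c; break;
        }
    }
    return out;
}

// Builds one GXL <attr> element holding a string value.
std::string gxlAttr(const std::string& name, const std::string& value) {
    return "<attr name=\"" + xmlEscape(name) + "\"><string>" +
           xmlEscape(value) + "</string></attr>";
}

// Serializes the graph; node i receives the id "n<i>".
std::string toGxl(const std::vector<GxlNode>& nodes,
                  const std::vector<GxlEdge>& edges) {
    std::ostringstream os;
    os << "<gxl>\n  <graph id=\"facts\" edgemode=\"directed\">\n";
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        os << "    <node id=\"n" << i << "\">"
           << gxlAttr("kind", nodes[i].kind)
           << gxlAttr("name", nodes[i].name)
           << "</node>\n";
    }
    for (const GxlEdge& e : edges) {
        // Every edge must connect two nodes that were written above.
        assert(e.from < nodes.size() && e.to < nodes.size());
        os << "    <edge from=\"n" << e.from << "\" to=\"n" << e.to << "\">"
           << gxlAttr("relation", e.relation)
           << "</edge>\n";
    }
    os << "  </graph>\n</gxl>\n";
    return os.str();
}
```

Calling `toGxl` with two nodes and one edge from the second to the first yields a `graph` element containing `<node id="n0">`, `<node id="n1">` and `<edge from="n1" to="n0">`, each carrying its facts as `attr` children.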

CPPX uses GCC, the free GNU compiler, to perform syntactic and semantic analysis. This means that CPPX accepts and compiles the same "dialect" of C++ that the GNU compiler does, which certainly covers commercial-scale software. CPPX's own code converts the internal data structure of GCC into the target schema of the Datrix model. This conversion adds little processing time, so CPPX runs at approximately the same speed as g++.

last update 2001 May 9 by AJM