Changes to the Datrix Model for C++

Changes to the Datrix Model for CPPX.

A data model for exchanging source information about C++ should be

complete: every language feature is modeled somehow
correct: language features are modeled as they are documented in the common manuals
consistent: similar features are modeled in similar ways; related features reveal their relationships; a uniform amount of semantic information is added throughout.

The Datrix model is pretty good according to these criteria. There are a few spots where we saw fit to make small changes, enumerated here. The documentation of the original model can be consulted for comparison: it is divided into numbered Chapters 1 through 6, each having Sections and Subsections. The reference Datrix Model is written in black. The changes are written in blue.

Changes and Interpretations of the Datrix Model

In the reference model, the root node class is called cAsgNds and has attributes beg and end. In CPPX, the root node class is called cASGNode for consistency and has no attributes; source information is represented as described below.
The definitions of scopes and source files (Section 4.9) confuse source location with scope, and do not allow the source hierarchy (of nested inclusions) to be represented properly. We are replacing this part of the model with the following:

Following gcc, each compilation unit is treated as a namespace called "::". Its members are connected to it by cArcSon edges.
Each source file is represented by a new node class cSourceFile.
Inclusion between source files is represented by a new arc type cIncludes having an arc attribute line (the source line number where the #include appears). (Note: this does not handle repeated inclusion, which remains unsolved for now.)
Each cASGNode which represents an item in a source file is connected to the source file it begins in, by a new arc type cArcSource, with attribute line for the line number in the source file where the item begins.
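
The source-file part of the model above can be pictured with a minimal C++ sketch. Only the names cSourceFile, cIncludes, cArcSource, cASGNode and the attribute line come from the text; the name and label members, the from/to/item/file fields, the helper includeLine and the sample data are assumptions made purely for illustration, not part of Datrix or CPPX.

```cpp
#include <cstddef>

// Node classes. The `name` and `label` members are assumed for illustration.
struct cSourceFile { const char* name; };
struct cASGNode    { const char* label; };

// Inclusion arc: `from` contains an #include of `to` on source line `line`.
struct cIncludes  { const cSourceFile* from; const cSourceFile* to; int line; };

// Source arc: `item` begins on line `line` of `file`.
struct cArcSource { const cASGNode* item; const cSourceFile* file; int line; };

// Hypothetical helper: the line on which `from` includes `to`, or -1 if no
// such arc exists. With repeated inclusion it reports only the first arc,
// which mirrors the limitation noted in the text.
template <std::size_t N>
constexpr int includeLine(const cIncludes (&arcs)[N],
                          const cSourceFile* from, const cSourceFile* to) {
    for (std::size_t i = 0; i < N; ++i)
        if (arcs[i].from == from && arcs[i].to == to) return arcs[i].line;
    return -1;
}

// Sample data: main.cc includes util.h on line 3, and an item f begins on
// line 12 of util.h.
constexpr cSourceFile mainCc{"main.cc"};
constexpr cSourceFile utilH{"util.h"};
constexpr cASGNode    fNode{"f"};
constexpr cIncludes   includes[] = {{&mainCc, &utilH, 3}};
constexpr cArcSource  sources[]  = {{&fNode, &utilH, 12}};

static_assert(includeLine(includes, &mainCc, &utilH) == 3,
              "the arc carries the #include line number");
```

Keeping the line number on the arc rather than on the node is what lets source location stay separate from scope: the same node can sit in the "::" namespace through cArcSon and in util.h through cArcSource.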

In the reference model, the cLiteral class has an attribute type, a string indicating (inadequately) the built-in type of the literal. In CPPX, the cLiteral class instead has cInstance edges to indicate the type of the literal; it has no type attribute. This allows all the different types of literals to be represented consistently.
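
As a rough illustration of why a string attribute may be inadequate, the following C++ fragment lists the types the language itself assigns to a few literals. It is only a sketch of the language facts; how CPPX actually names the target type nodes is not stated here.

```cpp
#include <type_traits>

// Each literal already has a full C++ type, not just a built-in type name.
static_assert(std::is_same<decltype(42),   int>::value,           "decimal literal");
static_assert(std::is_same<decltype(42UL), unsigned long>::value, "suffixed literal");
static_assert(std::is_same<decltype(1.5f), float>::value,         "float literal");
static_assert(std::is_same<decltype('a'),  char>::value,          "character literal");
static_assert(std::is_same<decltype(true), bool>::value,          "boolean literal");

// A string literal is an lvalue of array type, so decltype yields a
// reference to const char[3] (two characters plus the terminating '\0').
static_assert(std::is_same<decltype("ab"), const char (&)[3]>::value,
              "string literal");
```

The last case is the awkward one for a plain string attribute: the literal's type is an array type, which an edge to a type node can presumably express in the same way as for any other array.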
In the reference model, the cSwitch node class has structural edges to case labels, which in turn have edges to blocks. This would represent the structure of a Pascal, PL/I, or COBOL case statement, but not a C++ or Java switch statement, which has a single block containing case labels (and any other labels). So in CPPX a cSwitch node has a single structural edge to its unique block.
The handling of arrays, especially in conjunction with typedefs (called cAliasType in Datrix), is not consistent, or at least not clear: array variables and array types are mixed up, and typedefs make it worse.
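
A small C++ example shows why case labels cannot own blocks. The function classify is hypothetical; the point is only that control falls from one label into the next, so the statements belong to the enclosing block and not to any one label.

```cpp
// Hypothetical example: the switch has exactly one block, and the case
// labels are ordinary labeled statements inside it.
constexpr int classify(int n) {
    int score = 0;
    switch (n) {
        case 0:
            score += 1;
            [[fallthrough]];   // execution continues into the next label
        case 1:
            score += 10;
            break;
        default:
            score += 100;
    }
    return score;
}

static_assert(classify(0) == 11, "case 0 runs its own statement and case 1's");
```

Under a label-owns-block model, the statement score += 10 would have to belong to case 0 and case 1 at once; with a single block under cSwitch there is nothing to duplicate.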

(In the array variables model, a dimensioned variable's type is the "base type" of the array, and the dimension(s) are attributes of the variable. There are no array types.)
(In the array types model, a dimensioned variable's type is an array type; arrays have a dimension and a base type; multidimensional arrays are arrays whose base type is an array type.)

The src of a cArrayDim edge is always a cArrayType node.
Each array variable or formal-parameter is represented by a cObject node, having a cInstance edge to a cArrayType node.
Each array typedef is represented by a cAliasType node, having a cInstance edge to a cArrayType node. There are no array dimension edges from a cAliasType node, contrary to the Datrix manual.
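
The array types model matches how C++ itself types arrays, which the following sketch checks with standard type traits. The names Row and grid are made up for the example.

```cpp
#include <type_traits>

typedef int Row[3];   // array typedef: an alias for the array type int[3]
Row grid[2];          // array variable: its type is Row[2], i.e. int[2][3]

// The variable's type is an array type, not the base type int.
static_assert(std::is_same<decltype(grid), int[2][3]>::value,
              "grid has an array type");

// A multidimensional array is an array whose base type is an array type.
static_assert(std::rank<decltype(grid)>::value == 2, "two dimensions");
static_assert(std::extent<decltype(grid)>::value == 2, "outer dimension");
static_assert(std::is_same<std::remove_extent<decltype(grid)>::type, Row>::value,
              "base type of grid is the array type Row");
```

In the terms used above, grid would be a cObject with a cInstance edge to a cArrayType node, and Row a cAliasType with a cInstance edge to another cArrayType node; the dimensions 2 and 3 hang off the two cArrayType nodes, never off the alias.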

In C++, enumerated constants are logically defined in the same scope as the type they belong to. (By contrast, in Java constants are defined in the scope of the class defining them.) It isn't clear from the existing documentation (Sec 4.14) what is meant to be the destination of the ArcDefLoc edge out of cEnumerators. But in keeping with the scope rules of C++, an ArcDefLoc edge connects each cEnumerator to the scope in which the enumerator's cEnumType is defined.
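
The scope rule can be seen directly in C++ for ordinary (unscoped) enumerations. The names shapes, Color, Pen and Style are invented for this sketch.

```cpp
namespace shapes {
    // Red and Green are declared in namespace shapes, alongside Color.
    enum Color { Red, Green };

    struct Pen {
        // Solid and Dashed are members of Pen, alongside Style.
        enum Style { Solid, Dashed };
    };
}

// The enumerators are named through the enclosing scope, not through the
// enumeration type.
static_assert(shapes::Green == 1, "Green is visible in shapes");
static_assert(shapes::Pen::Dashed == 1, "Dashed is visible in Pen");
```

So the ArcDefLoc edge from the cEnumerator for Green would lead to the node for shapes, and the one for Dashed to the node for Pen.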
More to come, no doubt.

AJM 2001-04-4