Changes to the Datrix Model for CPPX.
A data model for exchanging source information about C++ should be
-
complete: every language feature is modeled somehow
-
correct: language features are modeled as they are documented in the
common manuals
-
consistent: similar features are modeled in similar ways; related features
reveal their relationships; a uniform amount of semantic information is
added throughout.
The Datrix model is pretty good according to these criteria. There
are a few spots where we saw fit to make small changes, enumerated here.
The documentation
of the original model can be consulted for comparison: it is divided into numbered
Chapters 1 through 6, each having Sections and Subsections.
The reference Datrix Model is written in black. The
changes are written in blue.
Changes and Interpretations of the Datrix Model
-
In Datrix, the root node class is called cAsgNds, and has attributes beg
and
end. In CPPX, the root node class is called
cASGNode
for consistency, and has no attributes; source information is represented
as described below.
-
The definitions of scopes and source files (Section 4.9) confuse source
location with scope, and doesn't allow the source hierarchy (of nested
inclusions) to be represented properly. We are replacing this part
of the model with the following:
-
Following gcc, each compilation unit is treated as
a namespace called "::". Its members are connected to it
by cArcSon edges.
-
Each source file is represented by a new node class
cSourceFile.
-
Inclusion between source files is represented by
a new arc type cIncludes having an arc attribute line
(for the source line number where the #include appears).
(Note: this doesn't handle repeated inclusion, which remains unsolved
right now.)
-
Each cASGNode which represents an item in
a source file is connected to the source file it begins in, by a new arc
type cArcSource, with attribute line for the line number
in the source file where the item begins.
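As a rough illustration of the replacement described above, the source-file part of the model could be held in memory as follows. Only the names cSourceFile, cIncludes, cArcSource, cASGNode and the line attribute come from the text; the name and label fields, the from/to/node/file members, and the helper directIncludes are assumptions made for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory form of the source-location part of the model.
struct cSourceFile {
    std::string name;   // assumed attribute: the file's path
};

// Arc: file `from` contains an #include of file `to` at source line `line`.
struct cIncludes {
    const cSourceFile* from;
    const cSourceFile* to;
    int line;
};

struct cASGNode {
    std::string label;  // assumed, for illustration only
};

// Arc: the item `node` begins in `file` at source line `line`.
struct cArcSource {
    const cASGNode* node;
    const cSourceFile* file;
    int line;
};

// Collect the files directly included by `file`.
inline std::vector<const cSourceFile*>
directIncludes(const std::vector<cIncludes>& arcs, const cSourceFile& file) {
    std::vector<const cSourceFile*> out;
    for (const cIncludes& arc : arcs) {
        if (arc.from == &file) out.push_back(arc.to);
    }
    return out;
}
```

Because inclusion is an arc between file nodes, nested inclusions form a graph that can be walked independently of the namespace and scope structure.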
-
In Datrix, the cLiteral class has an attribute type which is a string
indicating (inadequately) the built-in type of the literal. In CPPX, the
cLiteral
class has cInstance edges to indicate the type of the literal; it
has no type attribute. This allows all the different types
of literals to be represented consistently.
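One way to see why a plain string attribute is an awkward fit: in C++ every literal has a definite type, and a string literal's type is an array type rather than a simple built-in one. The checks below use only standard type traits; the names HiType and kStringLiteralIsArray are introduced here for illustration.

```cpp
#include <type_traits>

// Each literal has a definite C++ type.
static_assert(std::is_same<decltype(42), int>::value, "42 is an int");
static_assert(std::is_same<decltype(42u), unsigned int>::value, "42u is an unsigned int");
static_assert(std::is_same<decltype(42L), long>::value, "42L is a long");
static_assert(std::is_same<decltype(1.5), double>::value, "1.5 is a double");
static_assert(std::is_same<decltype(1.5f), float>::value, "1.5f is a float");
static_assert(std::is_same<decltype('a'), char>::value, "'a' is a char");
static_assert(std::is_same<decltype(true), bool>::value, "true is a bool");

// decltype of a string literal is an lvalue reference to const char[N],
// so strip the reference to get at the array type itself.
typedef std::remove_reference<decltype("hi")>::type HiType;
static_assert(std::is_same<HiType, const char[3]>::value, "\"hi\" is a const char[3]");

// The type of a string literal is an array type, not a built-in type.
constexpr bool kStringLiteralIsArray = std::is_array<HiType>::value;
```

With an edge to a type node, the literal "hi" can point at the same kind of array-type node that any other array-typed item would use.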
-
In Datrix, the cSwitch node class has structural edges to case labels, which
in turn have edges to blocks. This would represent the structure
of a Pascal, PL/I, or COBOL case statement, but not a C++ or Java switch
statement, which has a single block containing case labels (and any other
labels). So in CPPX a cSwitch node has a single structural
edge to its unique block.
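A small C++ example may make the point concrete. The function classify below is invented for illustration; it shows that the body of a switch is one block, that execution falls through from one label to the next, and that a case label may even sit inside a nested statement, so "label owns a block" is not the structure of the language.

```cpp
// The body of a switch is a single compound statement; case labels are
// just labels on statements somewhere inside it.
inline int classify(int n) {
    int result = 0;
    switch (n) {              // one block follows
    case 0:
        result += 1;          // no break: falls through into case 1
    case 1:
        result += 10;
        break;
    default:
        if (n < 0) {
    case 2:                   // label nested inside the if-block: legal C++
            result += 100;
        }
        break;
    }
    return result;
}
```

Here classify(0) yields 11 because of the fall-through, and classify(2) jumps directly into the body of the if statement and yields 100.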
-
The handling of arrays, especially in conjunction with typedefs (called
cAliasType
in Datrix), is not consistent, or at least not clear: array variables and
array types are mixed up, and typedefs make it worse.
(There are basically two ways to model arrays: the "array variables"
model and the "array types" model.)
-
(In the array variables model, a dimensioned variable's type is the "base
type" of the array, and the dimension(s) are attributes of the variable.
There are no array types.)
-
(In the array types model, a dimensioned variable's type is an array type;
arrays have a dimension and a base type; multidimensional arrays are arrays
whose base type is an array type.)
We have decided to use the array types model, because (a) it's closer to
gcc, which makes our life easier; (b) Datrix mostly uses it too; and (c)
C and C++ really do have array types, and they know their size. This
means the following slight changes to the Datrix model:
-
The src of a cArrayDim edge is
always a cArrayType node.
-
Each array variable or formal parameter is represented
by a cObject node, having a
cInstance edge to a cArrayType
node.
-
Each array typedef is represented by a cAliasType
node, having a cInstance edge to a cArrayType node. There
are no array dimension edges from a cAliasType node, contrary to the Datrix
manual.
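Point (c) can be checked directly in C++. The sketch below uses standard type traits to show that an array variable's type is an array type that knows its size, that a multidimensional array is an array whose base type is itself an array type, and that an array typedef is just another name for an array type. The names Row, grid and dimOf are invented for this example.

```cpp
#include <cstddef>
#include <type_traits>

typedef int Row[4];   // array typedef: an alias for the array type int[4]
Row grid[3];          // array variable of type Row[3], i.e. int[3][4]

// The variable's type is an array type, and the typedef adds no new type.
static_assert(std::is_same<decltype(grid), int[3][4]>::value,
              "grid has type int[3][4]");

// Removing one dimension gives the base type, which is again an array type.
static_assert(std::is_same<std::remove_extent<decltype(grid)>::type, Row>::value,
              "the base type of grid is Row");

// The dimensions belong to the type, not to the variable.
static_assert(std::rank<decltype(grid)>::value == 2, "two dimensions");
static_assert(std::extent<decltype(grid), 0>::value == 3, "outer dimension is 3");
static_assert(std::extent<decltype(grid), 1>::value == 4, "inner dimension is 4");
static_assert(sizeof(grid) == 3 * 4 * sizeof(int), "the type knows its size");

// Recover the outermost dimension of any array from its type alone.
template <typename T, std::size_t N>
constexpr std::size_t dimOf(T (&)[N]) { return N; }
```

In model terms, grid would be a cObject with a cInstance edge to a cArrayType of dimension 3, whose base type is the cArrayType of dimension 4 that the alias Row also refers to.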
-
In C++, enumerated constants are logically defined
in the same scope as the type they belong to. (By contrast, in Java,
constants are defined in the scope of the class defining them.)
It isn't clear from existing documentation (Sec 4.14) what is meant to
be the destination of the ArcDefLoc edge out of cEnumerators.
But in keeping with the scope rules of C++,
an ArcDefLoc edge connects each cEnumerator to the scope
in which the enumerator's cEnumType is defined.
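The C++ scope rule referred to here can be seen in a few lines. The namespace shapes, the class Widget and the function demo are invented for illustration; the enums are ordinary (unscoped) enumerations, which is the kind the text has in mind (C++11 scoped enumerations, enum class, came later and behave differently).

```cpp
namespace shapes {
    enum Color { Red, Green, Blue };   // enumerators land in namespace shapes
}

struct Widget {
    enum Mode { Idle, Busy };          // enumerators land in class Widget
};

inline int demo() {
    // The enumerators are named through the scope that encloses the
    // enum type, not through the enum type itself.
    shapes::Color c = shapes::Green;
    Widget::Mode m = Widget::Busy;
    return static_cast<int>(c) * 10 + static_cast<int>(m);
}
```

Under the rule above, the cEnumerator nodes for Red, Green and Blue would have ArcDefLoc edges to the namespace shapes, and those for Idle and Busy to the class Widget.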
-
More to come, no doubt.
AJM 2001-04-4