Example of Creating a Landscape

Gary Farmaner & Ric Holt - Last updated: November 5, 1997

This document describes how to create a software landscape. This will be done in terms of an example, which explains each step in the process of transforming a C program into a TA program. The TA program is used by the Landscape Viewer (lsedit) to draw a diagram (a software landscape) showing the structure of the C program.

(In this document, we will not go into the further steps that would be carried out if there were subsequent versions and new builds of the same software.)

An Example C Program

The C program that we will use as an example is very simple. In realistic uses of landscapes, the source program would be much larger and more interesting. However, for our purpose, which is to show how a landscape is created, we choose to deal only with this simple case.

Our C program consists of three files: main.c, putint.c and putint.h. The first, main.c, sets a global variable to 5 and calls the putint function with a parameter of 10. The putint.h file declares a global variable to be an integer (as external) and gives the header of the putint function (as external). The putint.c file gives the body of the putint function, which computes the product of its parameter and the global variable and prints the result. These files are listed in Figure 1.

File main.c:

      #include "putint.h"
      
      void main(void){
          global_var=5;
          putint(10);
      }

File putint.h:

      #ifndef _PUTINT_H_
      #define _PUTINT_H_
      
      extern int global_var;
      extern int putint(int param);
      
      #endif

File putint.c:

      #include "putint.h"
      
      int global_var;
      
      int putint(int param) {
          int local_var;
          local_var=global_var*param;
      
          printf("The number is:%d\n",local_var);
      }

Figure 1. EG: The Example C Program

In the rest of this document, we use the term EG (as in "example") to refer to this example C program. We will now show how this program can be visualized as a software landscape.

A Software Landscape for the Example Program

The example program could be diagrammed in many ways. In a software landscape, the emphasis is on the overall structure of the program, rather than details about functions and variables. The software landscape for this program can consist of three boxes, one for each file, and arrows between them that show how the files are related. In our diagram for this program (see Figure 2), we will use three kinds of arrows:

useproc. This arrow (or relation) means that there is a call inside the source box (at the tail of the arrow) that goes to a function inside the target box (located by the arrow head). For example, there is a call from the main function in main.c to the putint function declared in the putint.h file.
usevar. This arrow means that inside the source box there is a use (fetch or store) of a variable declared in the target box. For example, the main function, in main.c, changes the value of the global variable declared in putint.h.
implementby. This arrow means that an item (such as a function) declared in the source box is implemented in the target box. For example, the global variable and the putint function that are declared in putint.h are defined (are implemented) in putint.c.

(Note that we are using C terminology, in which "declare" determines the identifier and type of an item, as in "extern int putint(int param)", and "define" gives the implementation or storage for the item.)


                    +------------+
                    |   main.c   |
                    +------------+
                       |  |
               useproc |  | usevar
                       V  V
                    +------------+
                    |   putint.h |
                    +------------+
                             |
                             | implementby
                             V
                    +------------+
                    |   putint.c |
                    +------------+

Figure 2. Software Landscape for the Example C Program

We will now describe the process to create a landscape from a C program.

What You Need to Get Started

In order to create a landscape, you will need to collect the "ingredients". These are:

Source program. In the case of the example, the source program is available on the Web (it can even be copied from this document). If you are creating a landscape for another program, it will have to be written in C if you want to use our extraction tools (in principle, you can extract facts with other tools, for example, with grep). You should have the "make" file for your C program. Your C program should be syntactically correct; otherwise the fact extractor will not be able to parse it.
Bookshelf Tools. These are all available from the Web. To build your own landscape, you will need cfx, fbgen, grok, and lsedit. The first two, cfx and fbgen, are available in source (C) and ideally are quite portable (but buyer beware!). The third, grok, is available only as an executable file, so currently it can be used only on certain machines. The last, lsedit, is a Java program, available in source, so it should be quite portable.

Once you have your source program and the tools, you are ready to start creating a landscape.

Four Steps to Create a Software Landscape

For a particular C program, the following four steps (see Figure 3) will create a landscape diagram:

Step 1. Extract the facts from the C program. These facts are information such as what procedure calls what procedure, what variables are set by what procedures, and so on. These facts are represented by tuples, such as "call P Q", which means procedure P calls procedure Q. These facts are extracted by a parser that analyzes the C program. This extraction will be done by cfx (C Fact Extractor, which is a modified version of the GNU C compiler) and fbgen (Fact Base Generator).
Step 2. Manipulate the facts to turn them into a standard format (a TA program) and to determine higher level relations. This manipulation is done by a program called grok. It reads the facts produced in Step 1 and manipulates these facts to create a new set of facts. Higher level relations, such as useproc from one module to another, do not appear explicitly in the source program, so grok must compute them. There is a grok language in which scripts are written to tell grok what manipulations to carry out. (Grok also reads two files, called contain.rsf and prefix.rsf, in order to determine the architecture of the C program as broken into subsystems; in the detailed description below, you will read more about these two files.)
Step 3. Create the layout attributes that determine how to draw the facts as a diagram (as a landscape). This is done by a program called lslayout, which reads the facts about the example program (the TA program created in Step 2), along with a TA scheme which describes the organization of these facts.
Step 4. View the landscape. This is done using the lsview program, which reads the manipulated facts (from Step 2) and the layout attributes (from Step 3) and displays the landscape diagram on the screen. The lsview program is written in Java and can be run standalone or under a net browser.

The actions taken in these four steps and the data that flows from step to step are illustrated in Figure 3.

            Software System EG in C 
            (main.c, putint.c, putint.h)
                  |
                  V
 Step 1     Extract Facts for C (cfx),
            Generate fact base (fbgen)
                  |
                  V
            RSF Fact Base for EG           
            (factbase.rsf)                 Grok scripts
                  |                       /
                  V                      /
 Step 2     Manipulate Facts    <--------- Containment & Prefix Facts
            (grok)                         (contain.rsf, prefix.rsf)
                  |
                  V
            EG TA Fact Tuples   <--------- TA Scheme for C Facts
            (eg_bookshelf.tup)            /(common.scheme.tuple,
                  |                      /  common.scheme.attribute)
                  V                     /
 Step 3     Create Layout Attributes <--
            (lslayout)
                  |
                  V
            TA Attributes for EG Layout 
            (eg_bookshelf.lyt)
                  |
                  V
 Step 4     View EG Landscape (lsedit)   <---- EG TA Fact Tuples & C Scheme
                                               (eg_bookshelf.tup,
                                               common.scheme.tuple,
                                               common.scheme.attribute)

Figure 3. Four Steps to Create a Software Landscape

The rest of this document describes these steps in some detail. You may want to carry out each of these steps to gain experience as you go along. Everything needed to do this is available from the Web pages (given that you have the right kind of machine).

Step 1. Extract the Facts from the Source Program

In Step 1, our goal is to extract facts from the source program. There are tools available from the Web to do this for us automatically for C programs. (Ideally, fact extractors will be built or adapted for other programming languages in the future. This first step is the only one that depends on the source language.) The C extraction programs are called cfx and fbgen. Conceptually these two together are one big program; they are distinct mainly for historical reasons.

The first of these programs, cfx, is a modified version of the GNU C compiler. In the case of EG (our example program), we run:

        cfx_cc -c main.c
        cfx_cc -c putint.c

This generates the following two files, which contain facts about the C source program:

        .cfx.main.o
        .cfx.putint.o

We do not need to look at these files, because we simply use them as the input to another fact extraction program, fbgen. However, one of them is listed in Figure 4 so you can get an idea of the format it uses.

# This written by cc1 [Tue Jul 29 13:37:13 1997]
#
# The File Table
3
main.c
main.c
*Initialization*
*Initialization*
putint.h
putint.h
#
# Imports/Exports
#
#     File   Subject     Object
#Kind Attr File  Line  File  Line          Scope Name         Item Name
#---- ---- ---- ------ ---- ------ -------------------------- ---------
22
   38    1    1     -1   -1     -1                      main putint
   21    1    1     -1    3      5                         - putint
   39    1    1     -1   -1     -1                      main global_var
   22    1    1     -1    3      4                         - global_var
    0    1    1      3   -1     -1                         - main
    4    3    3      5   -1     -1                         - putint
    5    3    3      4   -1     -1                         - global_var
    7    2    2      1   -1     -1                         - __GNUC__
    7    2    2      1   -1     -1                         - __GNUC_MINOR__
    7    2    2      1   -1     -1                         - sparc
    7    2    2      1   -1     -1                         - sun
    7    2    2      1   -1     -1                         - unix
    7    2    2      1   -1     -1                         - __GCC_NEW_VARARGS__
    7    2    2      1   -1     -1                         - __sparc__
    7    2    2      1   -1     -1                         - __sun__
    7    2    2      1   -1     -1                         - __unix__
    7    2    2      1   -1     -1                         - __GCC_NEW_VARARGS__
    7    2    2      1   -1     -1                         - __sparc
    7    2    2      1   -1     -1                         - __sun
    7    2    2      1   -1     -1                         - __unix
   14    1    1      1    3     -1                         - putint.h
    7    3    3      2   -1     -1                         - PUTINT_H

Figure 4. Output from Running cfx on main.c, Producing ".cfx.main.o"

Then the special .cfx.*.o files are "linked" by cfx to generate the .cfx.mr file:

        cfx_cc -o eg_bookshelf .cfx.main.o .cfx.putint.o

This generates this .cfx.mr fact file:

        eg_bookshelf.cfx.mr

This file has a format much like the one shown in Figure 4, but now the file contains the facts for the entire C program.

Instead of individually applying cfx to each source file, we can use the program called cfx_make_trans, which reads the make file for the original program and translates it into a new make file whose purpose is to extract facts from the entire program. Once cfx_make_trans has been run, applying make to the result will carry out the entire translation from source to facts.
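The manual sequence shown earlier in this step (one cfx_cc -c per source file, then one linking run) follows a regular pattern. The following is a minimal sketch, in Python, that only assembles those command lines as lists of arguments; it does not run anything and is not a description of how cfx_make_trans works. The function name cfx_commands is hypothetical, and the .cfx.X.o naming is taken from the example above.

```python
def cfx_commands(sources, system):
    """Return the Step 1 cfx command lines as lists of arguments.

    Assumes every name in `sources` ends in ".c", as in the EG example.
    """
    # One fact-extracting "compile" per source file, e.g. cfx_cc -c main.c
    compiles = [["cfx_cc", "-c", src] for src in sources]
    # The facts for X.c are written to a file named .cfx.X.o
    objects = [".cfx." + src[:-2] + ".o" for src in sources]
    # One "link" run that merges the per-file facts for the whole system
    link = ["cfx_cc", "-o", system] + objects
    return compiles + [link]


commands = cfx_commands(["main.c", "putint.c"], "eg_bookshelf")
for cmd in commands:
    print(" ".join(cmd))
```

For EG this prints the same three commands that are typed by hand in this step.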

Once cfx has extracted and linked all the facts into the file eg_bookshelf.cfx.mr, fbgen reads this file by means of this command:

        fbgen eg_bookshelf.cfx.mr factbase.rsf

This produces the file

        factbase.rsf

The contents of factbase.rsf for our EG system are listed in Figure 5. This fact base has a considerable amount of low level information, such as where macros are declared, that we do not need for our landscape (although other tools, such as a debugger, might be able to use this information). In the next step we will use grok to manipulate these facts to create higher level information and to eliminate much of the low level information. (See the documentation for fbgen for a detailed explanation of the information in factbase.rsf.)

fcndef main.c main
defloc main main.c:3
include main.c putint.h
linkcall main putint
linkref main global_var
macrodef *Initialization* __GCC_NEW_VARARGS__
macrodef *Initialization* __GNUC_MINOR__
macrodef *Initialization* __GNUC__
macrodef *Initialization* __sparc
macrodef *Initialization* __sparc__
macrodef *Initialization* __sun
macrodef *Initialization* __sun__
macrodef *Initialization* __unix
macrodef *Initialization* __unix__
macrodef *Initialization* sparc
macrodef *Initialization* sun
macrodef *Initialization* unix
funcdcl putint.h putint
dclloc putint putint.h:5
vardcl putint.h global_var
vardclloc global_var putint.h:4
macrodef putint.h PUTINT_H
fcndef putint.c putint
defloc putint putint.c:4
vardef putint.c global_var
vardefloc global_var putint.c:3
include putint.c putint.h
librarycall putint printf

Figure 5. Fact Base for Example System EG (in factbase.rsf)

The file factbase.rsf is now in a form (called RSF) which is suitable to be manipulated by grok in Step 2. This file may need to be copied into a directory where grok can find it.
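To get a feel for the RSF format, note that every line in Figure 5 is a triple: a relation name followed by two entities. The following Python sketch reads such triples and groups them by relation. It assumes whitespace-separated fields with no embedded spaces, which holds for Figure 5 but may not hold for every fact base; parse_rsf is a hypothetical helper, not part of the Bookshelf tools.

```python
from collections import defaultdict

# A few lines copied from Figure 5.
SAMPLE_RSF = """\
fcndef main.c main
include main.c putint.h
linkcall main putint
linkref main global_var
funcdcl putint.h putint
vardcl putint.h global_var
fcndef putint.c putint
vardef putint.c global_var
librarycall putint printf
"""


def parse_rsf(text):
    """Group (relation, source, target) triples by relation name."""
    facts = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        relation, source, target = fields
        facts[relation].add((source, target))
    return facts


facts = parse_rsf(SAMPLE_RSF)
print(sorted(facts["fcndef"]))
```

Grouping by relation makes it easy to ask questions such as "which file defines each function", which is the kind of query Step 2 relies on.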

Step 2. Manipulating the Facts by Grok

The file factbase.rsf is read by grok and manipulated in various ways. Grok reads a set of scripts, written in the grok language, which tell it what to do. This language is designed to facilitate queries and updates on fact bases such as the one produced by cfx/fbgen. You can write your own grok scripts, or you can modify the standard ones, but for our purposes we will assume that the standard grok scripts are sufficient.

Modules in the C Language

The standard grok scripts assume that the C program uses the typical C style in which each .c file has a corresponding .h file (the exception being the main program). There may be additional .h files that contain shared information. A combined .c and .h unit will be called a module. For example, files M.c and M.h together make up module M.
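As an illustration of this convention only, the following Python sketch groups file names into modules by their base name. It is an assumption about how the pairing could be expressed, not the logic of the standard grok scripts; group_into_modules is a hypothetical name.

```python
import os
from collections import defaultdict


def group_into_modules(filenames):
    """Map each base name M to the .c and .h files that make up module M."""
    modules = defaultdict(list)
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext in (".c", ".h"):
            modules[stem].append(name)
    return {stem: sorted(files) for stem, files in modules.items()}


modules = group_into_modules(["main.c", "putint.c", "putint.h"])
print(modules)
```

For EG this yields a module main made up of main.c alone (the main program has no .h file) and a module putint made up of putint.c and putint.h.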

Higher Level Structure: Modules and Subsystems

One thing the grok scripts do is to "induce" higher level relations. For example, if module M contains a call to a function in module N, we induce the fact that "M calls N", which in turn, becomes the fact "useproc M N". In a similar way, modules are collected into subsystems, and higher level relations are induced between subsystems.
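The idea of inducing a relation can be sketched in a few lines of Python. The facts below are copied from Figure 5; the rules that combine them are an assumption made for illustration (a call is lifted from the calling function's defining file to the callee's declaring file, and so on) and are not the actual grok scripts. The names EG_FACTS and induce_file_level are hypothetical.

```python
# Low level facts from Figure 5, as (relation, source, target).
EG_FACTS = [
    ("fcndef", "main.c", "main"),
    ("fcndef", "putint.c", "putint"),
    ("funcdcl", "putint.h", "putint"),
    ("vardcl", "putint.h", "global_var"),
    ("vardef", "putint.c", "global_var"),
    ("linkcall", "main", "putint"),
    ("linkref", "main", "global_var"),
]


def induce_file_level(facts):
    """Lift function and variable level facts up to the file level."""
    def pairs(relation):
        return [(s, t) for (r, s, t) in facts if r == relation]

    # function -> file that defines it; item -> file that declares it
    defined_in = {func: file for file, func in pairs("fcndef")}
    declared_in = {item: file for file, item in pairs("funcdcl") + pairs("vardcl")}

    induced = set()
    for caller, callee in pairs("linkcall"):
        if caller in defined_in and callee in declared_in:
            induced.add(("useproc", defined_in[caller], declared_in[callee]))
    for user, var in pairs("linkref"):
        if user in defined_in and var in declared_in:
            induced.add(("usevar", defined_in[user], declared_in[var]))
    # An item declared in one file and defined in another is "implemented by" it.
    for file, item in pairs("fcndef") + pairs("vardef"):
        if item in declared_in and declared_in[item] != file:
            induced.add(("implementby", declared_in[item], file))
    return induced


induced = induce_file_level(EG_FACTS)
for fact in sorted(induced):
    print(" ".join(fact))
```

Under these assumed rules, the sketch produces the three arrows drawn in Figure 2.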

Since grok does not know how the program modules should be collected into subsystems, we must tell it. We do this by preparing a file called contain.rsf, which has lines such as

        contain front.ss parser
        contain back.ss codegen

These two lines could mean that the front end subsystem, front.ss, contains the parser module, and the back end subsystem, back.ss, contains the codegen module.

In our simple example program, we will consider that we have only one subsystem, eg_bookshelf.ss (which is the entire system), so our file

        contain.rsf

will contain only these three tuples:

        contain eg_bookshelf.ss main.c
        contain eg_bookshelf.ss putint.c
        contain eg_bookshelf.ss putint.h

Besides the file contain.rsf, the grok scripts expect a file called prefix.rsf. This file can be used to categorize modules into subsystems based on the first characters of their names. For example, if all the modules in the front end subsystem have the prefix characters fe_, the file prefix.rsf would contain:

        prefix front.ss fe_

Since our example system has no subsystems of interest, we will simply create prefix.rsf as an empty file.
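To illustrate what prefix-based categorization means, here is a small Python sketch that turns prefix facts into contain facts. The module names fe_scanner, fe_parser and codegen are made up for the example, and assign_by_prefix is a hypothetical helper; how the grok scripts actually resolve prefixes (for example, when two prefixes match) is not covered here.

```python
def assign_by_prefix(modules, prefix_facts):
    """Return contain facts for modules whose names start with a known prefix."""
    containment = []
    for module in modules:
        for subsystem, prefix in prefix_facts:
            if module.startswith(prefix):
                containment.append(("contain", subsystem, module))
                break  # assumption: the first matching prefix wins
    return containment


prefix_facts = [("front.ss", "fe_")]            # from: prefix front.ss fe_
modules = ["fe_scanner", "fe_parser", "codegen"]  # hypothetical module names
contain_facts = assign_by_prefix(modules, prefix_facts)
for fact in contain_facts:
    print(" ".join(fact))
```

The module codegen matches no prefix, so it would still need an explicit line in contain.rsf.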

In a large software system that is being reengineered, it is not easy to determine how to decompose the system into subsystems. (In fact, it may be that the system has such a messy architecture that there is no good way to break it down into subsystems.) One of the best sources of information about decomposition of the system lies in the heads of the people who are working on the system, or have worked on the system. These people should be interviewed, and any available documentation should be studied to try to find a reasonable decomposition. Once a good decomposition has been obtained, there are bookshelf tools to help maintain this decomposition with the release of new versions of the software, but these tools will not be discussed in this document.

Running the Grok Scripts

Once the files contain.rsf and prefix.rsf have been created, it is time to run the grok scripts. At this point you may wish to study the grok software tool. These grok scripts will:

Input the program's fact base (factbase.rsf).
Induce the low level facts up to the file level.
Input the contain.rsf and prefix.rsf files.
Induce the file level facts up to the subsystem level.
Output individual subsystem.tup TA files for each subsystem to be used for landscapes for each of them. (Note that our simple example only has a single subsystem and only one landscape.)

The relations induced for the module and subsystem levels are: useproc, usevar, and implementby.
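Lifting file level facts to the subsystem level can be sketched in the same spirit. The Python below uses the contain lines from the front.ss and back.ss example above, plus a made-up module level fact (useproc parser codegen). Dropping relations that stay inside one subsystem is an assumption of this sketch, and lift_to_subsystems is a hypothetical name, not a grok operation.

```python
def lift_to_subsystems(relations, contain):
    """Replace each module in a relation by the subsystem that contains it."""
    parent = {child: subsystem for subsystem, child in contain}
    lifted = set()
    for relation, source, target in relations:
        src_ss = parent.get(source)
        tgt_ss = parent.get(target)
        # Assumption: only relations that cross a subsystem boundary are kept.
        if src_ss and tgt_ss and src_ss != tgt_ss:
            lifted.add((relation, src_ss, tgt_ss))
    return lifted


contain = [("front.ss", "parser"), ("back.ss", "codegen")]
module_relations = [("useproc", "parser", "codegen")]  # hypothetical fact
subsystem_relations = lift_to_subsystems(module_relations, contain)
print(subsystem_relations)
```

In EG all three files sit in the single subsystem eg_bookshelf.ss, so under this assumption no subsystem level arrows would be produced.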

(Technical note: For large programs, depending on the nature of the fact base being used, some pre-processing of it may be desirable to reduce the grok overhead.)

In the case of the EG system, the result of running these scripts is the file

        eg_bookshelf.tup

which contains the tuples which will be used in drawing the landscape; Figure 6 gives the contents of this file.

FACT TUPLE :
$INSTANCE eg_bookshelf.ss subsystem
$INSTANCE main.c module
$INSTANCE putint.c module
$INSTANCE putint.h module
contain eg_bookshelf.ss main.c
contain eg_bookshelf.ss putint.c
contain eg_bookshelf.ss putint.h
usevar main.c putint.h
useproc main.c putint.h
implementby putint.h putint.c

Figure 6. Output of Grok Scripts (eg_bookshelf.tup)

As can be seen in Figure 6, much of the low-level information given in factbase.rsf has been eliminated and replaced by high-level information, such as the fact that main.c uses a variable in putint.h: "usevar main.c putint.h".
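The layout of Figure 6 is simple enough to reproduce by hand. The following Python sketch writes a fact section in that shape: a "FACT TUPLE :" header, one $INSTANCE line per entity, then one line per tuple. It mirrors only what is visible in Figure 6 and is not a full description of the TA language; write_ta is a hypothetical helper.

```python
def write_ta(instances, tuples):
    """Format entities and relations in the layout shown in Figure 6."""
    lines = ["FACT TUPLE :"]
    for name, kind in instances:
        lines.append(f"$INSTANCE {name} {kind}")
    for relation, source, target in tuples:
        lines.append(f"{relation} {source} {target}")
    return "\n".join(lines) + "\n"


instances = [
    ("eg_bookshelf.ss", "subsystem"),
    ("main.c", "module"),
    ("putint.c", "module"),
    ("putint.h", "module"),
]
tuples = [
    ("contain", "eg_bookshelf.ss", "main.c"),
    ("contain", "eg_bookshelf.ss", "putint.c"),
    ("contain", "eg_bookshelf.ss", "putint.h"),
    ("usevar", "main.c", "putint.h"),
    ("useproc", "main.c", "putint.h"),
    ("implementby", "putint.h", "putint.c"),
]
ta_text = write_ta(instances, tuples)
print(ta_text, end="")
```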

Now that we have generated the facts for our program at the right level and in the form of a TA program, we are ready to create the layout information that determines how to diagram these facts as a landscape.

Step 3. Create the Layout Attributes for the Landscape Diagram

We are now ready to run lslayout to generate the layout .lyt files for subsystems and modules in our system. For our example system, this is done by the command:

        lslayout -m2 eg_bookshelf.tup eg_bookshelf.lyt

In this case, we are asking lslayout to generate TA attributes for a client-subsystem-services (CSS) model, using the -m2 switch. This command reads the facts in eg_bookshelf.tup and generates a TA "fact attribute" file called

        eg_bookshelf.lyt

which will contain the information needed to diagram our system. This file is listed in Figure 7.

$ROOT {
      x = 0.000000
      y = 0.000000
  width = 800.000000
 height = 600.000000
}

eg_bookshelf.ss {
      x = 25.000000
      y = 100.000000
  width = 750.000000
 height = 400.000000
}

main.c {
      x = 275.040009
      y = 40.000000
  width = 99.959999
 height = 60.000000
}

putint.h {
      x = 275.040009
      y = 140.000000
  width = 99.959999
 height = 60.000000
}

putint.c {
      x = 275.040009
      y = 240.000000
  width = 99.959999
 height = 60.000000
}

Figure 7. Attributes to Draw Landscape Diagram (eg_bookshelf.lyt)

You might notice in Figure 7 that each file (main.c, putint.h, and putint.c) has values given for x, y, width and height, which are used in creating the diagram. There is also an entity called $ROOT which represents the entire drawing area (canvas) as well as the containing system called eg_bookshelf.ss.
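If you want to inspect these attributes programmatically, the block structure of Figure 7 (a name, an opening brace, "key = value" lines, a closing brace) can be read with a short Python sketch. This handles only the shape seen in Figure 7, not the TA attribute syntax in general, and parse_lyt is a hypothetical helper. The interpretation of y as growing downward is also an assumption, based on main.c being drawn above putint.h.

```python
# Two entries copied from Figure 7.
LYT_SAMPLE = """\
main.c {
      x = 275.040009
      y = 40.000000
  width = 99.959999
 height = 60.000000
}

putint.h {
      x = 275.040009
      y = 140.000000
  width = 99.959999
 height = 60.000000
}
"""


def parse_lyt(text):
    """Return {entity: {attribute: float}} for blocks shaped like Figure 7."""
    boxes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            current = line[:-1].strip()   # entity name before the brace
            boxes[current] = {}
        elif line == "}":
            current = None
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            boxes[current][key.strip()] = float(value)
    return boxes


boxes = parse_lyt(LYT_SAMPLE)
# Vertical gap between the bottom of main.c and the top of putint.h,
# assuming y grows downward.
gap = boxes["putint.h"]["y"] - (boxes["main.c"]["y"] + boxes["main.c"]["height"])
print(gap)
```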

If we planned to go on to create a bookshelf, in addition to a basic landscape, we would now need to populate a bookshelf, but we will not discuss this possibility here.

Step 4. View the Landscape

Now that we have created the fact base and the diagramming attributes for our C program, we are ready to view the landscape for our example system. You can edit or view your landscape from this information with the lsedit tool.

Try it Yourself

We have packaged the eg_bookshelf example in the file eg.tar which contains the C language source files, the contain.rsf, and the script buildls which you can run to generate the landscape .tup and .lyt files.

To use this package, you need to:

Download, compile, and install all the Bookshelf tools including the grok scripts.
Ensure the cfx tools, fbgen, grok, and lslayout are in your command path.
Download eg.tar to an empty working directory.
Untar the eg package using the command:
```
tar xf eg.tar
```
Run the buildls script:
```
./buildls (path to root of grok script directory)
```
Note: The root of the grok script directory contains the two subdirectories gk and scripts.

The significant result of running the buildls script will be the directory eg_bookshelf.dir containing the landscape TA files eg_bookshelf.tup and eg_bookshelf.lyt.

See the lsedit/lsview documentation for instructions on viewing the generated landscape.

Conclusions

In this document we have discussed the steps to be followed to create a software landscape, which is a set of diagrams representing the structure of a piece of software. We did this in terms of a very simple example C program. This approach has been used for very large pieces of software, measured in hundreds of thousands of source lines, and you are encouraged to experiment with these tools to diagram your own software.

See also: Toolkit download, Toolkit reference.