Gary Farmaner & Ric Holt - Last updated: November 5, 1997
This document describes how to create a software landscape. This will be done in terms of an example, which explains each step in the process of transforming a C program into a TA program. The TA program is used by the Landscape Viewer (lsedit) to draw a diagram (a software landscape) showing the structure of the C program.
(In this document, we will not go into the further steps that would be carried out if there were subsequent versions and new builds of the same software.)
The C program that we will use as an example is very simple. In realistic uses of landscapes, the source program would be much larger and more interesting. However, for our purpose, which is to show how a landscape is created, we choose to deal only with this simple case.
Our C program consists of three files: main.c, putint.c and putint.h. The first, main.c, sets a global variable to 5 and calls the putint function with a parameter of 10. The putint.h file declares a global variable to be an integer (as external) and gives the header of the putint function (as external). The putint.c file gives the body of the putint function, which computes the product of its parameter and the global variable and prints the result. These files are listed in Figure 1.
File main.c:
#include "putint.h"

void main(void){
    global_var=5;
    putint(10);
}
File putint.h:
#ifndef _PUTINT_H_
#define _PUTINT_H_

extern int global_var;
extern int putint(int param);

#endif
File putint.c:
#include "putint.h"

int global_var;
int putint(int param)
{
    int local_var;
    local_var=global_var*param;
    printf("The number is:%d\n",local_var);
}
The example program could be diagrammed in many ways. In a software landscape, the emphasis is on the overall structure of the program, rather than details about functions and variables. The software landscape for this program can consist of three boxes, one for each file, and arrows between them that show how the files are related. In our diagram for this program (see Figure 2), we will use three kinds of arrows:
useproc: a file calls a function that is declared in another file
usevar: a file references a variable that is declared in another file
implementby: an item declared in one file is defined in another file
(Note that we are using C terminology, in which "declare" determines the identifier and type of an item, as in "extern int putint(int param)", and "define" gives the implementation or storage for the item.)
      +------------+
      |   main.c   |
      +------------+
         |      |
 useproc |      | usevar
         V      V
      +------------+
      |  putint.h  |
      +------------+
            |
            | implementby
            V
      +------------+
      |  putint.c  |
      +------------+
We will now describe the process to create a landscape from a C program.
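For a particular C program, the following four steps (see Figure 3) will create a landscape diagram:
Step 1: Extract facts from the C source (cfx) and generate the fact base (fbgen)
Step 2: Manipulate the facts (grok)
Step 3: Create the layout attributes (lslayout)
Step 4: View the landscape (lsedit)
The actions taken in these four steps and the data that flows from step to step are illustrated in Figure 3.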
        Software System EG in C
        (main.c, putint.c, putint.h)
                |
                V
Step 1  Extract Facts for C (cfx),
        Generate fact base (fbgen)
                |
                V
        RSF Fact Base for EG
        (factbase.rsf)               Grok scripts
                |                   /
                V                  /
Step 2  Manipulate Facts <--------- Containment & Prefix Facts
        (grok)                      (contain.rsf, prefix.rsf)
                |
                V
        EG TA Fact Tuples <--------- TA Scheme for C Facts
        (eg_bookshelf.tup)          /(common.scheme.tuple,
                |                  /  common.scheme.attribute)
                V                 /
Step 3  Create Layout Attributes <--
        (lslayout)
                |
                V
        TA Attributes for EG Layout
        (eg_bookshelf.lyt)
                |
                V
Step 4  View EG Landscape (lsedit) <---- EG TA Fact Tuples & C Scheme
                                         (eg_bookshelf.tup,
                                          common.scheme.tuple,
                                          common.scheme.attribute)
The rest of this document describes these steps in some detail. You may want to carry out each of these steps to gain experience as you go along. Everything needed to do this is available from the Web pages (given that you have the right kind of machine).
In Step 1, our goal is to extract facts from the source program. There are tools available from the Web to do this for us automatically for C programs. (Ideally, fact extractors will be built or adapted for other programming languages in the future. This first step is the only one that depends on the source language.) The C extraction programs are called cfx and fbgen. Conceptually these two together are one big program; they are distinct mainly for historical reasons.
The first of these programs, cfx, is a modified version of the GNU C compiler (gcc). In the case of EG (our example program), we run:
cfx_cc -c main.c
cfx_cc -c putint.c
This generates the following two files, which contain facts about the C source program:
.cfx.main.o
.cfx.putint.o
We do not need to look at these files, because we simply use them as the input to another fact extraction program, fbgen. However, one of them is listed in Figure 4 so you can get an idea of the format it uses.
# This written by cc1 [Tue Jul 29 13:37:13 1997]
#
# The File Table
3
main.c main.c
*Initialization* *Initialization*
putint.h putint.h
#
# Imports/Exports
#
#     File Subject     Object
#Kind Attr File Line   File Line   Scope Name                 Item Name
#---- ---- ---- ------ ---- ------ -------------------------- ---------
22
38    1    1    -1     -1   -1     main                       putint
21    1    1    -1     3    5      -                          putint
39    1    1    -1     -1   -1     main                       global_var
22    1    1    -1     3    4      -                          global_var
0     1    1    3      -1   -1     -                          main
4     3    3    5      -1   -1     -                          putint
5     3    3    4      -1   -1     -                          global_var
7     2    2    1      -1   -1     -                          __GNUC__
7     2    2    1      -1   -1     -                          __GNUC_MINOR__
7     2    2    1      -1   -1     -                          sparc
7     2    2    1      -1   -1     -                          sun
7     2    2    1      -1   -1     -                          unix
7     2    2    1      -1   -1     -                          __GCC_NEW_VARARGS__
7     2    2    1      -1   -1     -                          __sparc__
7     2    2    1      -1   -1     -                          __sun__
7     2    2    1      -1   -1     -                          __unix__
7     2    2    1      -1   -1     -                          __GCC_NEW_VARARGS__
7     2    2    1      -1   -1     -                          __sparc
7     2    2    1      -1   -1     -                          __sun
7     2    2    1      -1   -1     -                          __unix
14    1    1    1      3    -1     -                          putint.h
7     3    3    2      -1   -1     -                          PUTINT_H
Then the special .cfx.*.o files are "linked" by cfx to generate the .cfx.mr file:
cfx_cc -o eg_bookshelf .cfx.main.o .cfx.putint.o
This generates the following .cfx.mr fact file:
eg_bookshelf.cfx.mr
This file has a format much like the one shown in Figure 4, but now the file contains the facts for the entire C program.
Instead of individually applying cfx to each source file, we can use the program called cfx_make_trans, which reads the make file for the original program and translates it into a new make file whose purpose is to extract facts from the entire program. Once cfx_make_trans has been run, applying make to the result will carry out the entire translation from source to facts.
Once cfx has extracted and linked all the facts into the file eg_bookshelf.cfx.mr, fbgen reads this file by means of this command:
fbgen eg_bookshelf.cfx.mr factbase.rsf
This produces the file
factbase.rsf
The contents of factbase.rsf for our EG system are listed in Figure 5. This fact base has a considerable amount of low-level information, such as where macros are declared, that we do not need for our landscape (although other tools, such as a debugger, might be able to use this information). In the next step we will use grok to manipulate these facts to create higher-level information and to eliminate much of the low-level information. (See the documentation for fbgen for a detailed explanation of the information in factbase.rsf.)
fcndef main.c main
defloc main main.c:3
include main.c putint.h
linkcall main putint
linkref main global_var
macrodef *Initialization* __GCC_NEW_VARARGS__
macrodef *Initialization* __GNUC_MINOR__
macrodef *Initialization* __GNUC__
macrodef *Initialization* __sparc
macrodef *Initialization* __sparc__
macrodef *Initialization* __sun
macrodef *Initialization* __sun__
macrodef *Initialization* __unix
macrodef *Initialization* __unix__
macrodef *Initialization* sparc
macrodef *Initialization* sun
macrodef *Initialization* unix
funcdcl putint.h putint
dclloc putint putint.h:5
vardcl putint.h global_var
vardclloc global_var putint.h:4
macrodef putint.h PUTINT_H
fcndef putint.c putint
defloc putint putint.c:4
vardef putint.c global_var
vardefloc global_var putint.c:3
include putint.c putint.h
librarycall putint printf
The file factbase.rsf is now in a form (called RSF) that is suitable to be manipulated by grok in Step 2. This file may need to be copied into a directory where grok can find it.
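To make the RSF format concrete, here is a minimal Python sketch that reads facts like those in Figure 5 into triples and runs one simple query. It is not part of the cfx/fbgen/grok tool set. The function name parse_rsf is hypothetical, and the sketch assumes that every fact is exactly three whitespace-separated fields with no embedded spaces.

```python
def parse_rsf(text):
    """Parse RSF text into a list of (relation, source, target) triples.

    Assumption: each non-blank line holds exactly three
    whitespace-separated fields, as in Figure 5.
    """
    facts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        facts.append(tuple(fields))
    return facts


# A few facts copied from Figure 5.
sample = """\
fcndef main.c main
include main.c putint.h
linkcall main putint
linkref main global_var
"""

facts = parse_rsf(sample)

# Query: which function calls which?
calls = [(src, dst) for rel, src, dst in facts if rel == "linkcall"]
print(calls)
```

Queries of this kind (select the tuples of one relation, then combine them with others) are what the grok scripts carry out in the next step.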
The file factbase.rsf is read by grok and manipulated in various ways. Grok reads a set of scripts, written in the grok language, which tell it what to do. This language is designed to facilitate queries and updates on fact bases such as the one produced by cfx/fbgen. You can write your own grok scripts, or you can modify the standard ones, but for our purposes we will assume that the standard grok scripts are sufficient.
Since grok does not know how the program modules should be collected into subsystems, we must tell it. We do this by preparing a file called contain.rsf, which has lines such as
contain front.ss parser
contain back.ss codegen
These two lines could mean that the front end subsystem, front.ss, contains the parser module, and the back end subsystem, back.ss, contains the codegen module.
In our simple example program, we will consider that we have only one subsystem, eg_bookshelf.ss (which is the entire system), so our file
contain.rsf
will contain only these three tuples:
contain eg_bookshelf.ss main.c
contain eg_bookshelf.ss putint.c
contain eg_bookshelf.ss putint.h
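For a system with a single subsystem, the containment tuples can be produced mechanically from the list of source files. The following Python sketch shows one way to do so. The helper name contain_tuples is hypothetical and is not one of the bookshelf tools.

```python
def contain_tuples(subsystem, modules):
    """Return one 'contain' line per module, all in the same subsystem."""
    return [f"contain {subsystem} {module}" for module in modules]


lines = contain_tuples("eg_bookshelf.ss", ["main.c", "putint.c", "putint.h"])
print("\n".join(lines))
```

The printed lines could be saved as contain.rsf; for a system with several subsystems the function would be called once per subsystem.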
Besides the file contain.rsf, the grok scripts expect a file called prefix.rsf. This file can be used to categorize modules into subsystems based on the first characters of their names. For example, if all the modules in the front end subsystem have the prefix characters fe_, the file prefix.rsf would contain:
prefix front.ss fe_
Since our example system has no subsystems of interest, we will simply create prefix.rsf as an empty file.
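The idea behind prefix-based categorization can be sketched in a few lines of Python. This illustrates the concept only; it does not show how the standard grok scripts process prefix.rsf. The module names and the back.ss rule are made up for the example, and the "first matching rule wins" policy is an assumption.

```python
def assign_by_prefix(prefix_rules, modules):
    """Derive contain tuples from (subsystem, prefix) rules.

    Assumption: a module goes to the first rule whose prefix matches;
    modules that match no rule are left unassigned.
    """
    contain = []
    for module in modules:
        for subsystem, prefix in prefix_rules:
            if module.startswith(prefix):
                contain.append(("contain", subsystem, module))
                break
    return contain


# Hypothetical rules and module names, for illustration only.
rules = [("front.ss", "fe_"), ("back.ss", "be_")]
modules = ["fe_lexer.c", "fe_parser.c", "be_codegen.c", "util.c"]

assigned = assign_by_prefix(rules, modules)
for fact in assigned:
    print(" ".join(fact))
```

Here util.c matches no prefix, so it would still need an explicit entry in contain.rsf.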
In a large software system that is being reengineered, it is not easy to determine how to decompose the system into subsystems. (In fact, it may be that the system has such a messy architecture that there is no good way to break it down into subsystems.) One of the best sources of information about the decomposition of the system lies in the heads of the people who are working on the system, or have worked on it. These people should be interviewed, and any available documentation should be studied, to find a reasonable decomposition. Once a good decomposition has been obtained, there are bookshelf tools to help maintain this decomposition with the release of new versions of the software, but these tools will not be discussed in this document.
(Technical note: For large programs, depending on the nature of the fact base being used, some pre-processing of it may be desirable to reduce the grok overhead.)
In the case of the EG system, the result of running these scripts is the file
eg_bookshelf.tup
which contains the tuples that will be used in drawing the landscape; Figure 6 gives the contents of this file.
FACT TUPLE :
$INSTANCE eg_bookshelf.ss subsystem
$INSTANCE main.c module
$INSTANCE putint.c module
$INSTANCE putint.h module
contain eg_bookshelf.ss main.c
contain eg_bookshelf.ss putint.c
contain eg_bookshelf.ss putint.h
usevar main.c putint.h
useproc main.c putint.h
implementby putint.h putint.c
As can be seen in Figure 6, much of the low-level information given in factbase.rsf has been eliminated and replaced by high-level information, such as the fact that main.c uses a variable in putint.h: "usevar main.c putint.h".
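The raising of function-level and variable-level facts to file-level facts can be illustrated with a short Python sketch. This is not the standard grok script, and the rules below are an assumed reading of how the Figure 5 relations relate to the Figure 6 relations. The sketch also assumes that each name is declared in one file and defined in one file.

```python
def lift(facts):
    """Lift function/variable facts to file-level relations (a sketch)."""
    func_defined_in = {}  # function name -> file that defines it
    var_defined_in = {}   # variable name -> file that defines it
    declared_in = {}      # function or variable name -> declaring file
    for rel, a, b in facts:
        if rel == "fcndef":
            func_defined_in[b] = a
        elif rel == "vardef":
            var_defined_in[b] = a
        elif rel in ("funcdcl", "vardcl"):
            declared_in[b] = a

    lifted = set()
    for rel, a, b in facts:
        if rel == "linkcall" and a in func_defined_in and b in declared_in:
            # The calling function's file uses a procedure declared elsewhere.
            lifted.add(("useproc", func_defined_in[a], declared_in[b]))
        elif rel == "linkref" and a in func_defined_in and b in declared_in:
            # The referencing function's file uses a variable declared elsewhere.
            lifted.add(("usevar", func_defined_in[a], declared_in[b]))
        elif rel == "funcdcl" and b in func_defined_in:
            lifted.add(("implementby", a, func_defined_in[b]))
        elif rel == "vardcl" and b in var_defined_in:
            lifted.add(("implementby", a, var_defined_in[b]))
    return sorted(lifted)


# The relevant low-level facts from Figure 5.
low_level = [
    ("fcndef", "main.c", "main"),
    ("linkcall", "main", "putint"),
    ("linkref", "main", "global_var"),
    ("funcdcl", "putint.h", "putint"),
    ("vardcl", "putint.h", "global_var"),
    ("fcndef", "putint.c", "putint"),
    ("vardef", "putint.c", "global_var"),
]

high_level = lift(low_level)
for fact in high_level:
    print(" ".join(fact))
```

For the EG facts, this yields the same three file-level relations that appear at the end of Figure 6.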
Now that we have generated the facts for our program at the right level and in the form of a TA program, we are ready to create the layout information that determines how to diagram these facts as a landscape.
We are now ready to run lslayout to generate the layout .lyt files for subsystems and modules in our system. For our example system, this is done by the command:
lslayout -m2 eg_bookshelf.tup eg_bookshelf.lyt
In this case, we are asking lslayout to generate TA attributes for a client-subsystem-services (CSS) model, using the -m2 switch. This command reads the facts in eg_bookshelf.tup and generates a TA "fact attribute" file called
eg_bookshelf.lyt
which will contain the information needed to diagram our system. This file is listed in Figure 7.
$ROOT {
  x = 0.000000
  y = 0.000000
  width = 800.000000
  height = 600.000000
}
eg_bookshelf.ss {
  x = 25.000000
  y = 100.000000
  width = 750.000000
  height = 400.000000
}
main.c {
  x = 275.040009
  y = 40.000000
  width = 99.959999
  height = 60.000000
}
putint.h {
  x = 275.040009
  y = 140.000000
  width = 99.959999
  height = 60.000000
}
putint.c {
  x = 275.040009
  y = 240.000000
  width = 99.959999
  height = 60.000000
}
You might notice in Figure 7 that each file (main.c, putint.h, and putint.c) has values given for x, y, width and height, which are used in creating the diagram. The containing subsystem, eg_bookshelf.ss, has the same four attributes, as does an entity called $ROOT, which represents the entire drawing area (canvas).
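As a rough illustration of how these attributes describe the picture, the following Python sketch reads attribute records of the shape shown in Figure 7 and lists the file boxes from top to bottom. It is not how lsedit reads the file. The function name parse_layout is hypothetical, and the regular expressions assume the simple "name { key = number ... }" shape seen here rather than the full TA syntax.

```python
import re


def parse_layout(text):
    """Map each entity name to a dict of its numeric attributes.

    Assumption: records look like 'name { key = number ... }' with no
    nesting, as in Figure 7.
    """
    boxes = {}
    for name, body in re.findall(r"(\S+)\s*\{([^}]*)\}", text):
        attrs = re.findall(r"(\w+)\s*=\s*(\S+)", body)
        boxes[name] = {key: float(value) for key, value in attrs}
    return boxes


# The records from Figure 7.
layout = """
$ROOT { x = 0.000000 y = 0.000000 width = 800.000000 height = 600.000000 }
eg_bookshelf.ss { x = 25.000000 y = 100.000000 width = 750.000000 height = 400.000000 }
main.c { x = 275.040009 y = 40.000000 width = 99.959999 height = 60.000000 }
putint.h { x = 275.040009 y = 140.000000 width = 99.959999 height = 60.000000 }
putint.c { x = 275.040009 y = 240.000000 width = 99.959999 height = 60.000000 }
"""

boxes = parse_layout(layout)

# Order the file boxes by their y value.
files = [name for name in boxes if name.endswith((".c", ".h"))]
stacked = sorted(files, key=lambda name: boxes[name]["y"])
print(stacked)
```

Sorting by y gives main.c, putint.h, putint.c, which is the same vertical ordering as the boxes in Figure 2.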
If we planned to go on to create a bookshelf in addition to a basic landscape, we would now need to populate the bookshelf, but we will not discuss this possibility here.
Now that we have created the fact base and the diagramming attributes for our C program, we are ready to view the landscape for our example system. You can use this information to edit or view your landscape using the lsedit tool.
We have packaged the eg_bookshelf example in the file eg.tar, which contains the C language source files, the contain.rsf file, and the script buildls, which you can run to generate the landscape .tup and .lyt files.
To use this package, you need to:
tar xf eg.tar
./buildls (path to root of grok script directory)
Note: The root of the grok script directory contains the two subdirectories gk and scripts.
See the lsedit/lsview documentation for instructions on viewing the generated landscape.