CONSTRUCTING A BOOKSHELF

SOFTWARE BOOKSHELF: OVERVIEW AND CONSTRUCTION

Ric Holt, Updated 29 Mar 97, 31 Mar 97
holt@uwaterloo.ca

ABSTRACT. Part 1 of this note gives an overview of the concept of a Software Bookshelf, which is a structure for providing web-based documentation and visualization of large software systems. The second part of the note describes how to build Bookshelves.

PART 1: OVERVIEW

A "Software Bookshelf" for a large software system is intended to provide an easily accessible electronic structure for storing information about the system. This structure is based on the hierarchic decomposition of the software system into subsystems [Finnigan et al 97].

This information includes the source code as well as documentation about the system. A Bookshelf could also provide access to other information, such as collected tests, performance analyses, future plans, and project history. Here we concentrate on providing convenient access to architectural diagrams, source code, and system documentation.

This note is intended for readers who want a brief overview of what a Software Bookshelf is and how to build one.

WHO USES A SOFTWARE BOOKSHELF?

A software bookshelf is intended to serve a variety of users, including:

NEW TEAM MEMBERS: These people need to learn about the system and its structure. In many development environments there is no convenient source of information for these people and they must repeatedly consult senior project members to find out basic information.

EXPERIENCED TEAM MEMBERS: These people need to find information that they do not remember, or have not yet learned. For example, tracking down a defect may require understanding the interconnection structure of the system, and this interconnection structure may not be easy to determine. As another example, a developer may need to consult the description of an abstract algorithm but may not have ready access to the document that describes it.

MANAGERS: A manager may need to understand the structure of the system for planning purposes. He or she may not need to be intimately knowledgeable about the software system, but would like to be able to see the structural implications of an architectural change.

BENEFITS OF A SOFTWARE BOOKSHELF

The reason for constructing a software bookshelf is to make information about the system more easily accessible, thereby increasing the efficiency of those working with the system. It should decrease the "ramp up" time for new team members as they come on board, and at the same time should decrease the time that experienced team members must spend training these new people. It should decrease the time taken by experienced team members to fix bugs and to design and implement changes. It should help managers understand the structure of the system, in order to better plan for future work. In a nutshell, the software bookshelf should provide a convenient means of storing information so that this information does not need to be repeatedly derived.

The Bookshelf structure is intended to be useful for both maintenance and re-engineering of software systems. In terms of re-engineering, with an eye toward porting or rewriting in a new language, the bookshelf provides a means of collecting the information needed to plan for and execute the revamping of the system.

HOW IS A SOFTWARE BOOKSHELF USED?

The Software Bookshelf is accessed by means of a set of Web pages. There is a separate page for each subsystem of the target software system. These individual pages are sometimes called "shelves". The pages form a hierarchy that reflects the decomposition of the target system into subsystems, and each of these pages is hot-linked to the pages of related subsystems. Each page displays an architectural diagram, called a "landscape", of its subsystem, and is hot-linked to the subsystem's source code and documentation.

The structure of the Bookshelf is open, so that a developer can, at his or her discretion, directly access the source code and documentation by traditional means such as navigating the file system. He or she can also conveniently link new information into the Bookshelf display.

A PAGE FOR EACH SUBSYSTEM

Each subsystem, as well as the entire system, has a web page. This page displays various things including:

LANDSCAPE DIAGRAM: Each "Landscape" diagram is a box-and-arrow picture in which the boxes represent the items contained in the subsystem and the arrows represent the dependencies among these items. There are also boxes that represent the clients (users) of the subsystem and its suppliers (items used by the subsystem).

LINKS TO CLIENTS AND SUPPLIERS: These are the files and subsystems that use or are used by this subsystem.

LINKS TO CONTAINED ITEMS: The lowest level subsystems contain source code files. Higher level subsystems contain other subsystems. The page for any subsystem contains links to its contained items.

LINKS TO ASSOCIATED DOCUMENTATION: A "Table of Contents" for the page provides web links to on-line information that documents the subsystem's purpose, algorithm, data structures, etc. There is a simple mechanism for automatically linking documents to Subsystem pages. These documents, most of which are manually created, are sometimes called the "books" on the Bookshelf.

LINK TO CONTAINING SYSTEM OR SUBSYSTEM: Each subsystem is contained in another subsystem (or in the system as a whole). The page for each subsystem contains a link to its containing subsystem.
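The navigation links on a shelf follow directly from the containment structure. As an illustration only, here is a minimal Python sketch that computes the links for one page from (container, item) containment pairs and emits a bare HTML list. The page naming scheme `<name>.html` and all function names are hypothetical, and the Landscape panel and documentation links are omitted.

```python
from html import escape

# Hypothetical containment pairs (container, item) for system S.
CONTAIN = [("S", "A"), ("S", "B"), ("A", "F"), ("A", "G"), ("B", "H")]

def children_of(item, contain):
    """Items directly contained in `item`, in the order the facts list them."""
    return [obj for subj, obj in contain if subj == item]

def parent_of(item, contain):
    """The containing subsystem of `item`, or None for the whole system."""
    for subj, obj in contain:
        if obj == item:
            return subj
    return None

def page_links(item, contain):
    """Navigation links for the page of `item`, as (label, href) pairs."""
    links = []
    parent = parent_of(item, contain)
    if parent is not None:
        links.append(("Up: " + parent, parent + ".html"))  # containing subsystem
    for child in children_of(item, contain):
        links.append((child, child + ".html"))  # contained items
    return links

def render_page(item, contain):
    """A bare HTML page listing the links for `item`."""
    items = "\n".join(
        f'<li><a href="{escape(href)}">{escape(label)}</a></li>'
        for label, href in page_links(item, contain)
    )
    return f"<html><body><h1>{escape(item)}</h1>\n<ul>\n{items}\n</ul></body></html>"
```

For example, `page_links("A", CONTAIN)` gives a link up to S followed by links to F and G.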

PART 2: BUILDING A SOFTWARE BOOKSHELF

So far this note has described the organization and use of the Software Bookshelf without saying how such a Bookshelf might be implemented. We now describe how it has been implemented.

TOOLS SUPPORTING SOFTWARE BOOKSHELVES

A set of tools is available to help build Bookshelves, including:

EXTRACTORS: These parse the source language and emit corresponding "facts" about programs. For example, if procedure P calls procedure Q, this fact would be emitted:

     call P Q

This format for facts, sometimes called RSF for Rigi Standard Form, consists of a verb (call), a subject (P) and an object (Q). These facts describe a kind of graph in which there is an edge (arrow) from node (box) P to node (box) Q. An extractor also produces "attributes", such as the fact that this call occurred on line 461 of file F. Extractors of interest include one for PLIX (an IBM internal language) and two C extractors (the newer one is based on an IBM internal program representation) written at the University of Victoria. There is a C extractor called CFX, based on GCC, written at the University of Toronto, and a Pascal extractor for the Mitel dialect of Pascal. There are also a number of public domain extractors, one of the best known being "CIA". (In fact, an extractor may take as input information other than the source program. For example, the PLIX extractor takes as input the program's cross reference table and the internal stream produced by the PLIX compiler.)

FACT MANIPULATOR: Facts produced by an extractor (or by some other means, including manual preparation) are effectively a database, which can be manipulated. A software tool called Grok operates on facts written in RSF, allowing the user to "slice" the database, compute relational compositions (a kind of relational "join"), etc. Grok is used to read the facts for an entire system and to emit the facts (the sub-graph) for each subsystem.

DIAGRAM LAYOUTER: The "layout" attributes for a graph give the positions, sizes and colors for drawing the graph as a diagram. A tool called a Layouter reads facts that represent a graph and adds layout attributes. Besides the Layouter, there is a Layout Adjuster, which reads an old graph with its layout attributes, together with a new graph that is similar to the old one, and produces layout attributes for the new graph such that its diagram resembles the old one.

LANDSCAPE VIEWER: We call each diagram a Software Landscape. A tool called the LS (Landscape) Viewer reads a graph with its layout attributes and displays it as a diagram on the computer screen. The LS Viewer is a Java program; it draws the Landscape in one panel of a web page. In the common case, this page contains information for a particular subsystem of the target system of a Software Bookshelf. There is also an LS Editor, which displays the diagram and allows the user to change it by direct manipulation: changing sizes, positions and colors, and adding and deleting boxes and arrows. The modified diagram can then be emitted as a new graph with new layout attributes.
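Grok has its own script language, which is not reproduced here. To make the Fact Manipulator's operations more concrete, the following Python sketch applies them to relations held as sets of (subject, object) pairs. The function names are our own, and `restrict` is only one simple form of slicing.

```python
def compose(r, s):
    """Composition of r followed by s: (a, c) whenever (a, b) in r and (b, c) in s."""
    by_source = {}
    for b, c in s:
        by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in by_source.get(b, ())}

def inverse(r):
    """Swap the subject and object of every pair."""
    return {(b, a) for a, b in r}

def restrict(r, nodes):
    """Keep only pairs whose endpoints both lie in `nodes` (a simple slice)."""
    return {(a, b) for a, b in r if a in nodes and b in nodes}

# Hypothetical "call" facts: P calls Q, and Q calls R and S.
call = {("P", "Q"), ("Q", "R"), ("Q", "S")}
calls_two_steps = compose(call, call)  # callers connected through one intermediate
```

Composing "call" with itself gives the pairs of procedures connected through exactly one intermediate call, here P to R and P to S.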

TA: A FACT INTERCHANGE LANGUAGE

To allow tools such as the ones just described to communicate, a simple language called TA (Tuple-Attribute Language) has been designed. TA uses an open, human-readable ASCII format, to make it easy to learn, easy to manipulate with standard tools such as "grep" and sort programs, and easy to interface to new or existing tools. See also "TA: The Tuple-Attribute Language" [Holt 97a].

The fundamental notation in TA is the tuple (in RSF format), e.g.,

    include F G

This means that file F includes file G.

TA also supports attributes for each entity or edge, e.g.,

    F { x = 135  y = 216 }

The x and y attributes here tell the Landscape Viewer where to draw a box representing file F on the screen.
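Because TA is plain ASCII, a reader for the two forms shown above takes only a few lines. This Python sketch assumes that each tuple or attribute record fits on one line and that attribute values contain no spaces or braces; it is not a parser for the full TA language described in [Holt 97a].

```python
import re

# An attribute record on one line: entity name, then { name = value ... }.
ATTR_LINE = re.compile(r"^\s*(\S+)\s*\{(.*)\}\s*$")
ASSIGN = re.compile(r"(\w+)\s*=\s*(\S+)")

def parse_ta(text):
    """Split simple TA text into RSF tuples and per-entity attributes."""
    tuples, attrs = [], {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = ATTR_LINE.match(line)
        if m:
            entity, body = m.groups()
            values = attrs.setdefault(entity, {})
            for name, value in ASSIGN.findall(body):
                # Treat integer-looking values (such as x and y) as numbers.
                values[name] = int(value) if re.fullmatch(r"-?\d+", value) else value
        else:
            fields = line.split()
            if len(fields) != 3:
                raise ValueError(f"not a 'verb subject object' tuple: {line!r}")
            tuples.append(tuple(fields))
    return tuples, attrs

tuples, attrs = parse_ta("""
include F G
F { x = 135  y = 216 }
""")
```

After this call, `tuples` holds the single "include" fact and `attrs` maps F to its x and y coordinates.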

TA also supports "schemes", which can be thought of as Entity-Relationship diagrams; a scheme determines the allowed connectivity among entities and the allowed attributes.
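A scheme lets a tool reject facts that make no sense, such as a subsystem including a file. The Python sketch below checks connectivity only, and represents the scheme as a set of allowed (verb, subject type, object type) triples. This representation and the entity types are assumptions for illustration, not TA's own scheme notation.

```python
# Hypothetical scheme: allowed (verb, subject type, object type) combinations.
SCHEME = {
    ("contain", "subsystem", "subsystem"),
    ("contain", "subsystem", "file"),
    ("include", "file", "file"),
    ("call", "procedure", "procedure"),
}

def scheme_violations(facts, entity_type, scheme=SCHEME):
    """Return the facts whose verb and endpoint types the scheme does not allow."""
    bad = []
    for verb, subj, obj in facts:
        key = (verb, entity_type.get(subj), entity_type.get(obj))
        if key not in scheme:
            bad.append((verb, subj, obj))
    return bad

types = {"S": "subsystem", "A": "subsystem", "F": "file", "G": "file"}
facts = [("contain", "S", "A"), ("contain", "A", "F"),
         ("include", "F", "G"), ("include", "A", "F")]
```

Here only the last fact is reported, since a subsystem cannot include a file under this scheme.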

INFORMATION FLOW IN BUILDING A SOFTWARE BOOKSHELF

The Bookshelf is designed so that, as far as possible, it can be maintained automatically. To facilitate automatic maintenance, we divide the information that supports a Bookshelf into two categories:

PRIMARY INFORMATION: Parts of the bookshelf are "primary information", meaning these parts cannot be derived by Bookshelf tools, but exist by themselves. For example, the source code and manually written documentation are primary information from the point of view of the Bookshelf.

DERIVED INFORMATION: Other parts of the bookshelf are "derived information", meaning that during maintenance of the Bookshelf, they are derived from primary information. For example, the cross reference table and the call graph are derived from the source code.

Figure 1 shows the flow of information during the building of a Bookshelf. The primary information includes source code (box 1), containment structure (box 2), and documentation (not shown in figure).

               +-----------+   +-----------+
             1 | Source    |   |Containment| 2
               |  Code     |   | Structure |
               +-----------+   +-----------+
                         |       |   
               Extraction|       |  
                         |       | 
                       +-v-------v-+
                     3 | System    |  Fact Manipulation
                       |  Facts    |
                       +-----------+
                             |
                             | Automatic Layout of Landscapes
                             |
                       +-----v-----+
                     4 | Subsystem |  Manual Layout of Landscapes
                       |  Pages    |
                       +-----------+

Figure 1. Information flow when building a Bookshelf

CONTAINMENT STRUCTURE

The containment structure is derived from interviews with the developers, from system documentation, from source file naming conventions, and from the structure implied by facts derived from the source code. This structure records the hierarchic architecture of the target system in terms of successive decomposition, and is represented in TA notation. For example, the structure of a target system S consisting of subsystems A and B, where A contains files F and G and B contains file H, could be represented in TA as:

    contain S A
    contain S B
    contain A F
    contain A G
    contain B H
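Since the decomposition is hierarchic, every item except the system itself should have exactly one container. A small Python sketch, working on (container, item) pairs taken from contain tuples like those above, checks that rule (it does not check for cycles) and lists the lowest-level items under a subsystem. The function names are hypothetical.

```python
def build_tree(contain):
    """Map items to containers and containers to children; reject a second container."""
    parent, children = {}, {}
    for container, item in contain:
        if item in parent:
            raise ValueError(f"{item} is contained in both {parent[item]} and {container}")
        parent[item] = container
        children.setdefault(container, []).append(item)
    return parent, children

def leaves_under(node, children):
    """All lowest-level items (e.g. source files) below `node`, depth first."""
    kids = children.get(node)
    if not kids:
        return [node]
    result = []
    for kid in kids:
        result.extend(leaves_under(kid, children))
    return result

contain = [("S", "A"), ("S", "B"), ("A", "F"), ("A", "G"), ("B", "H")]
parent, children = build_tree(contain)
```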

Creating the first version of this decomposition for a system may require considerable effort, in terms of interviewing the system's designers and studying its documentation. In some cases the structure is obvious from the naming conventions for files, or from the directory structure used for storing files. In difficult cases, there may be no clear source of this architectural information. In the worst case, the system architecture may have deteriorated beyond recognition, in which case it may not be possible to determine a hierarchic structure for use in a Bookshelf.
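When the directory structure does mirror the architecture, a first draft of the containment structure can be generated mechanically and then corrected by hand after talking with the developers. A hypothetical Python sketch, assuming relative paths with "/" separators:

```python
from pathlib import PurePosixPath

def contain_from_paths(system, paths):
    """Draft 'contain' tuples that mirror the directory tree of the given files."""
    tuples = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        container = system
        for depth in range(1, len(parts) + 1):
            item = "/".join(parts[:depth])  # the full path keeps names unique
            tuples.add(("contain", container, item))
            container = item
    return sorted(tuples)

def to_ta(tuples):
    """Format tuples one per line in the 'verb subject object' style."""
    return "\n".join(" ".join(t) for t in tuples)
```

For files A/F.c, A/G.c and B/H.c in system S, the draft contains "contain S A", "contain A A/F.c", and so on, matching the shape of the example above.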

FACT MANIPULATION

The facts extracted from the source files, as well as the containment structure, are represented in ASCII in TA notation and can be conveniently inspected and manipulated using standard tools such as "grep", "vi" and "sort". This is handy in the all too common case in which there is some problem with the output produced by a tool such as an extractor, or, more worrying still, something is missing, such as a source file.

Many of the manipulations of the facts can be done automatically. Grok scripts exist that separate the facts for particular subsystems from the system's global pool of facts, emitting facts about individual subsystems in TA form suitable for input to the Layouter and Layout Adjuster tools.
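One computation such a script must perform is lifting file-level dependencies to the boxes of a single subsystem's Landscape, separating internal arrows from those to clients and suppliers. The Python sketch below is written under stated assumptions and is not the actual Grok scripts: edges run between lowest-level items, `parent` maps each item to its container, and an outside item is drawn as its ancestor just below the nearest container it shares with the subsystem.

```python
def ancestors(node, parent):
    """The chain node, its container, that container's container, ... up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def box_inside(node, subsystem, parent):
    """The child of `subsystem` that contains `node`, or None if node is outside it."""
    chain = ancestors(node, parent)
    if subsystem not in chain[1:]:
        return None
    return chain[chain.index(subsystem) - 1]

def box_outside(node, subsystem, parent):
    """For an outside node: its ancestor just below the nearest common container."""
    around = set(ancestors(subsystem, parent))
    chain = ancestors(node, parent)
    for i in range(1, len(chain)):
        if chain[i] in around:
            return chain[i - 1]
    return chain[-1]  # no common container: show the outermost item

def subsystem_view(subsystem, edges, parent):
    """Lift file-level edges to (internal, client, supplier) arrows for one Landscape."""
    internal, clients, suppliers = set(), set(), set()
    for src, dst in edges:
        a = box_inside(src, subsystem, parent)
        b = box_inside(dst, subsystem, parent)
        if a is not None and b is not None:
            if a != b:
                internal.add((a, b))
        elif a is not None:
            suppliers.add((a, box_outside(dst, subsystem, parent)))
        elif b is not None:
            clients.add((box_outside(src, subsystem, parent), b))
    return internal, clients, suppliers
```

With the containment structure of S above and file edges F to G, G to H and H to F, the page for A shows F to G inside, B as a client of F, and B as a supplier of G.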

In the long term, during system evolution, one should try to do all fact manipulation automatically.

MANUAL LAYOUT

The Landscape diagrams for subsystems can be laid out automatically by the Layouter tool, i.e., prepared automatically for display by the Landscape Viewer software. Although these automatically prepared diagrams are quite useful, a manual layout by a developer is usually superior. The developer uses the LS Editor to lay out the diagram so as to clarify various information, such as the order of calls, semantic relationships among files, etc.

Once a manual layout has been done for a subsystem, it should not need to be redone for some time, because the Layout Adjuster can use an old layout to create a new one for each new version of the same subsystem. After the initial creation of the Bookshelf, the Layout Adjuster, rather than the Layouter, is the tool used for creating (really, adjusting) the attributes used to draw diagrams.
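The placement algorithm of the real Layouter is not described in this note. The Python sketch below substitutes a naive grid layout, only to show the division of labour between the two tools: the adjuster keeps the positions of boxes that survive from the old graph and places only the new ones. The attribute names x and y follow the TA example earlier in this note; width, height and the function names are assumptions.

```python
import math

def grid_layout(nodes, box_w=80, box_h=40, gap=20):
    """Stand-in Layouter: place boxes on a near-square grid in name order."""
    cols = max(1, math.ceil(math.sqrt(len(nodes))))
    layout = {}
    for i, node in enumerate(sorted(nodes)):
        row, col = divmod(i, cols)
        layout[node] = {"x": col * (box_w + gap), "y": row * (box_h + gap),
                        "width": box_w, "height": box_h}
    return layout

def adjust_layout(old_layout, new_nodes, box_w=80, box_h=40, gap=20):
    """Stand-in Layout Adjuster: keep old boxes where they were, add new ones below."""
    layout = {n: dict(old_layout[n]) for n in new_nodes if n in old_layout}
    bottom = max((a["y"] + a["height"] for a in layout.values()), default=0)
    added = sorted(n for n in new_nodes if n not in old_layout)
    for i, node in enumerate(added):
        layout[node] = {"x": i * (box_w + gap), "y": bottom + gap,
                        "width": box_w, "height": box_h}
    return layout
```

Boxes removed from the subsystem simply disappear from the adjusted layout, and a developer can then tidy the newly added boxes with the LS Editor.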

A BUILD SCRIPT FOR MAINTAINING A BOOKSHELF

To be useful in actual practice, the Bookshelf for a system must be created and later maintained with a minimal amount of human intervention. Programmers and managers are very busy people and do not have much time to spend on new activities, which is how they will regard the work of creating a Bookshelf.

The initial creation of a Bookshelf is rightfully called re-engineering in the usual case in which the documentation does not provide a clear record of the architecture of the system. Once the initial version of the Bookshelf has been created, it must then be maintained to keep up to date with the evolution of the target system.

We should attempt to write a "build" script that reads primary information and creates derived information automatically. This script would extract facts from source programs, manipulate these facts to emit subsystem graphs, lay out subsystem Landscapes, and link to documents. It may be possible to regenerate the derived information each time the target system is compiled. Using "Make" scripts, we may be able to avoid rebuilding those parts of the Bookshelf that have not changed since the previous build.
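Make decides what to rebuild by comparing file timestamps. The Python sketch below illustrates the same idea with content digests instead, grouping source files by subsystem; the grouping and the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def digest(paths):
    """One SHA-256 digest over the names and contents of a group of files."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(str(p).encode())
        h.update(Path(p).read_bytes())
    return h.hexdigest()

def stale_subsystems(files_by_subsystem, previous):
    """Subsystems whose files changed since the digests recorded in `previous`."""
    current = {name: digest(files) for name, files in files_by_subsystem.items()}
    stale = sorted(name for name, d in current.items() if previous.get(name) != d)
    return stale, current  # save `current` for the next build
```

A change in one subsystem can also alter the client and supplier boxes on other subsystems' pages, so a complete build would track those fact dependencies as well; this sketch looks only at each subsystem's own files.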

INTERACTING WITH THE DEVELOPMENT TEAM

Up to now, this note has concentrated on the technical problems involved in constructing a Bookshelf. When you are constructing a Bookshelf in concert with the team responsible for the software system, an even more important issue is how to interact effectively with that team.

The team probably knows much or most of the non-mechanically extractable information needed to construct a bookshelf; they are the experts. They will be willing to share this knowledge with you if they feel you are "on their side". The help they can provide is essential to the successful creation of the Bookshelf.

The team is busy and may not welcome the "distractions" that come with a new project, namely building a Bookshelf. The benefits of a Bookshelf must be clear to the team, or else they are likely to ignore your efforts to create it and may even subtly sabotage its construction. The utility of a Bookshelf must be understood by both management and technical staff; if it is not, the attempt to create a Bookshelf is likely to fail.

SUCCESS OF A BOOKSHELF

In the end, a Bookshelf should be considered successful if it is actually useful to the team, and if they find it worthwhile to continue to maintain it. The team should come to "own" the Bookshelf. This means that it is not enough to construct a Bookshelf; rather, the goal should be to provide a mechanism, the Bookshelf, that gives the team convenient access to information about their system, thereby allowing them to do their jobs more proficiently.

BIBLIOGRAPHY (TO BE COMPLETED)

[Holt 97a] "TA: The Tuple-Attribute Language". Study this document for an understanding of the TA fact interchange language.

[Finnigan et al 97] "The Software Bookshelf", by P.J. Finnigan, R.C. Holt, I. Kalas, S. Kerr, K. Kontogiannis, H.A. Muller, J. Mylopoulos, S.G. Perelgut, M. Stanley, and K. Wong. IBM Systems Journal, Vol. 36, No. 4, 1997, pp. 564-593. Gives the "big picture" of the concepts in a Software Bookshelf.

[Muller 93] "A Reverse Engineering Approach to Subsystem Identification", by H.A. Muller, M.A. Orgun, S.R. Tilley, and J.S. Uhl. Journal of Software Maintenance: Research and Practice, Vol. 5, pp. 181-204, 1993. Discusses the Rigi system, a powerful system for visualizing software structures.

[Pak thesis] Experiments with contrasting versions of software using Landscape-like diagrams. See [Holt 96] for a synopsis of some of these results.

[Holt 96] "GASE: Visualizing Software Evolution-in-the-Large", by R.C. Holt and J. Pak. WCRE 96: Working Conference on Reverse Engineering, Monterey, Nov. 1996. Shows how related Landscapes (one evolved from the other) can be neatly contrasted in a single diagram showing both of them.

[Neighbors] WCRE 96 paper. Gives good advice on how to re-engineer systems.

[Carmichael] "Design Maintenance: Unexpected Architectural Interactions", by I. Carmichael, V. Tzerpos, and R.C. Holt. ICSM '95. Explains how the implemented architecture of a software system drifts from its original design.

[Mancoridis 95] "Extending Programming Environments to Support Architectural Design", by S. Mancoridis and R.C. Holt. Proceedings of the Seventh International Workshop on Computer-Aided Software Engineering, Toronto, Canada, July 1995. Gives some early ideas on using Landscapes in Software Engineering classes.

[Holt 97] "Binary Relational Algebra Applied to Software Architecture", by R.C. Holt. CSRI Tech Report 345, University of Toronto, March 1996. Explains binary relational algebra, which gives the semantics of tuples, and its implications for tools such as Grok and for theories of software architecture.

[MacDaniels] PLIX parser. Parser used by the CSER IBM project concentrating on migration of software systems. This parser inputs XREF information and compiler intermediate language and emits program facts.

[MacDaniels] C parser. Parser developed in the CSER IBM project. This parser inputs XREF information and compiler intermediate language and emits program facts.

[Chen 77] "The Entity-Relationship Approach to Database Design", by P. Chen. QED Information Sciences Inc., 170 Linden St., Wellesley, Mass., 1977, 1991.

[West] CFX parser, developed at the University of Toronto by Tom West and Gerry Kovan (R. Holt's group), based on the GCC compiler. CFX inserts extra markers into the compiler stream in an attempt to preserve source-level constructs, such as macros, that some extractors lose because of preprocessing.

[Ramamoorthy] "The C Information Abstraction System", by Y.F. Chen, M.Y. Nishimoto, and C.V. Ramamoorthy. IEEE Transactions on Software Engineering, Vol. 16, No. 3, March 1990, pp. 325-334. CIA comprises a rich set of solid tools, one of which is an extractor for C.