Setting up and using the BFX pipeline

By Nikita Synytskyy, based on the original README by Jingwei Wu
Mar 04, 2004, updated for BFX Sept 29, 2004

The structure of this document

This document consists of two parts. The first, whose section titles are numbered (0, 1, 2, 3), describes the steps necessary to install the BFX pipeline on your computer and derive, from a given piece of software, a factbase which you can query or visualize.

The second part, with its titles numbered by letter, deals with issues that are not directly related to setup or execution, but nevertheless are, or at least can be, very important when working with the pipeline. For best results, read the whole document before you start installing or extracting, and then use it as a reference while installing or extracting.

0. Introduction

The BFX pipeline is a reverse engineering fact extraction and analysis tool written by Jingwei Wu at the University of Waterloo. It works by analyzing the information extracted from the object files generated during the build process. These facts can then be examined and manipulated to study the architecture of the software being built.

The BFX pipeline has been applied to several large pieces of software and has proven reliable. To apply the pipeline to a new piece of software, follow these general steps:

Set up QLDX on your computer and define certain environment variables, so that all pieces of QLDX know where they are;
Compile and link your software;
Use the BFX fact extractor to extract facts from the object files generated during the build process and perform post-processing.

This document will guide you through these steps one by one.

1. Setting up QLDX

QLDX setup is straightforward. Put all QLDX files in a directory of your choice, define an environment variable called QLDX that points to that directory, point LD_LIBRARY_PATH at its lib/ld subdirectory, and add its bin subdirectory to the search path. For example, if your QLDX files are located in /home/username/bin/QLDX, type the following:

$ export QLDX=/home/username/bin/QLDX
$ export LD_LIBRARY_PATH=$QLDX/lib/ld
$ export PATH=$QLDX/bin:$PATH

You need to have Java 1.5 installed on your computer before you can run QLDX. If Java is not installed, or if you use version 1.4 or earlier, the pipeline will not function correctly.

At the moment, the QLDX pipeline works on Linux only.
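Before moving on, it can help to confirm that the environment from this step is in place. The following is only a sketch: check_qldx_env is a hypothetical helper written for this document, not something shipped with QLDX. It checks that $QLDX names a directory, that $QLDX/bin is on the search path, and that a java command can be found. It prints the first line of the Java version banner but does not interpret it, so the version still has to be checked by eye.

```shell
# check_qldx_env: hypothetical helper, not part of QLDX.
# Returns 0 if the step 1 environment looks usable, 1 otherwise.
check_qldx_env() {
    # QLDX must be set and must name an existing directory.
    if [ -z "$QLDX" ] || [ ! -d "$QLDX" ]; then
        echo "QLDX is unset or is not a directory" >&2
        return 1
    fi
    rc=0
    # The bin subdirectory must appear as one entry of PATH.
    case ":$PATH:" in
        *":$QLDX/bin:"*) ;;
        *) echo "$QLDX/bin is not on PATH" >&2; rc=1 ;;
    esac
    # Java must be installed; show its version banner for manual inspection.
    if command -v java >/dev/null 2>&1; then
        java -version 2>&1 | head -n 1
    else
        echo "java was not found on PATH" >&2
        rc=1
    fi
    return $rc
}
```

Run check_qldx_env in the same shell in which you typed the export commands; a non-zero exit status means at least one of the checks failed.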

2. Compile and link your software

Use GCC to build your software as you normally would.

3. Database building and post-processing

To begin the database building process, use the bfx utility to extract the facts from the object files generated during the build process. You can do this with the following command:

$ bfx `find $PWD -name "*.o"` -o software.bfx.ta

The -o filename switch tells bfx where to store its output (in this case, in a file named software.bfx.ta).
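Because the bfx command above takes its input from find, it is worth checking first that find actually sees some object files. A minimal sketch follows; count_object_files is a hypothetical helper name, not part of BFX.

```shell
# count_object_files: hypothetical helper, not part of BFX.
# Prints the number of *.o files under the given directory
# (default: the current directory), using the same find pattern as above.
count_object_files() {
    find "${1:-$PWD}" -name "*.o" -type f | wc -l | tr -d ' '
}

# Possible use before extraction:
#   [ "$(count_object_files "$PWD")" -gt 0 ] || echo "no object files found"
```

If the count is zero, the build probably did not leave its object files behind.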

When some smaller programs are built, no permanent object files are left behind: the compiler uses temporary object files, which are deleted after linking. In these cases, the build process has to be modified so that the object files are available for analysis. This can be done by building the program in two steps: compiling the source files to obtain the object files, and then building the executable from the object files.
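The two-step build can be sketched with GCC as follows. The scratch directory and the names hello.c, hello.o and hello are purely illustrative; substitute your own source files. The -c option makes gcc stop after compilation and keep the object file.

```shell
# Illustrative demo in a scratch directory; all file names are made up.
demo=$(mktemp -d)
cat > "$demo/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF

# One step (object file is temporary):  gcc hello.c -o hello
# Two steps (object file stays on disk for bfx to read):
gcc -c "$demo/hello.c" -o "$demo/hello.o"   # compile only
gcc "$demo/hello.o" -o "$demo/hello"        # link the saved object file
ls "$demo"
```

After the second form, hello.o remains next to the executable and can be passed to bfx.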

The information in the database needs to be further processed before it can be used. QLDX comes with a series of scripts to perform the post-processing, and a special language, QL, to execute them. The next step is to generate the rawlink data in the form of a .raw.ta file and, optionally, a version of it lifted to file level:

$ ql $QLDX/script/bfx/rawlink.ql  software.bfx.ta software.raw.ta
$ ql $QLDX/script/bfx/liftfile.ql software.raw.ta software.raw.file.ta

You can use either software.raw.ta or software.raw.file.ta as the starting point for the next series of commands. The software.raw.file.ta file has all relations in it lifted to file level, which means it is smaller, contains more general data, and thus takes less time to process. Use software.raw.ta if you want to preserve all relations in as-extracted form.

The next processing step creates the .con.ta hierarchy file. You can either adopt the system hierarchy that has been extracted from the code, or create your own. To adopt the system hierarchy, use the following command:

$ ql $QLDX/script/bfx/syscontain.ql software.raw.ta software.con.ta

To derive your own hierarchy instead, first write the list of files to software.files:

$ ql $QLDX/script/bfx/files.ql software.raw.ta > software.files

Then manually create a software.contain file (in RSF format) based on software.files, and integrate it back into the database using the command:

$ ql $QLDX/script/bfx/addcontain.ql software.contain software.raw.ta software.con.ta
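As a starting point for the manual step, a first draft of software.contain can be generated from software.files and then edited by hand. The sketch below rests on two assumptions that this document does not confirm: that software.files lists one file path per line, and that addcontain.ql accepts RSF triples of the form "contain PARENT CHILD". The helper name draft_contain is hypothetical. Check the output against your own data before using it.

```shell
# draft_contain: hypothetical helper, not part of QLDX.
# ASSUMPTIONS: the input has one file path per line, and the containment
# relation is written as the RSF triple "contain PARENT CHILD".
# Each file is placed in a subsystem named after its directory.
draft_contain() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue            # skip empty lines
        echo "contain $(dirname "$path") $path"
    done < "$1"
}

# Possible use:
#   draft_contain software.files > software.contain
```

The result is only a one-level grouping by directory; regrouping files into meaningful subsystems remains a manual task.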

All that remains is to generate a viewable landscape and view it with LSEdit:

$ schema software.con.ta software.ls.ta
$ lsedit software.ls.ta
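Once the individual commands work, they can be collected into a single shell function. This is a rough sketch, not a script shipped with QLDX: the name run_bfx_pipeline and its optional argument are hypothetical, while the commands inside are the ones shown in this section, taking the path that adopts the extracted system hierarchy. They are chained with && so that a failing step stops the run.

```shell
# run_bfx_pipeline: hypothetical wrapper around the commands of step 3.
# Argument: base name for the generated files (default: software).
# Note: "name" is an ordinary (global) shell variable.
run_bfx_pipeline() {
    name=${1:-software}
    # Extract facts from all object files below the current directory.
    bfx `find "$PWD" -name "*.o"` -o "$name.bfx.ta" &&
    # Generate the rawlink data.
    ql "$QLDX/script/bfx/rawlink.ql" "$name.bfx.ta" "$name.raw.ta" &&
    # Adopt the system hierarchy extracted from the code.
    ql "$QLDX/script/bfx/syscontain.ql" "$name.raw.ta" "$name.con.ta" &&
    # Produce the landscape file for LSEdit.
    schema "$name.con.ta" "$name.ls.ta"
}
```

Run it from the top of the build tree, for example as run_bfx_pipeline software, and then open the result with lsedit software.ls.ta.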

A. On shell variable persistence

The shell variables you define using the "export" command are neither permanent nor universally visible. An exported variable is available in the shell instance in which it was defined, and in all subprocesses spawned by that shell. It is not available to other shell instances running in parallel, or to the parent process of the shell.
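The scoping rules can be observed directly with a short experiment. The variable names DEMO_VAR and CHILD_ONLY are illustrative and have nothing to do with QLDX.

```shell
# An exported variable is inherited by child processes.
export DEMO_VAR=visible
sh -c 'echo "child sees: ${DEMO_VAR:-<unset>}"'

# A variable exported inside a child process never reaches the parent.
sh -c 'export CHILD_ONLY=1'
echo "parent sees: ${CHILD_ONLY:-<unset>}"
```

The first echo reports the value set in the parent shell, while the second reports the variable as unset, because the child's environment disappears when the child exits.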

This has a direct impact on the functioning of QLDX, because QLDX itself depends on certain variables being defined. If you define the $QLDX variable as outlined in step 1, the variable will be available (and QLDX will function correctly) only in the shell in which the variable was defined. In other shells, the variable will remain undefined, and QLDX will not work properly.

If you think that you will be using QLDX frequently, consider placing crucial variable definitions (such as that of the QLDX variable and the addition of the $QLDX/bin directory to the path) in your shell's startup files.
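As a sketch, the lines below could be appended to a startup file. Which file is read depends on your shell; ~/.bashrc is assumed here for bash and is not prescribed by QLDX. The installation path is the example from step 1. The case guard is an addition of this sketch: it keeps $QLDX/bin from being added to the path a second time if the file is read more than once.

```shell
# Candidate lines for a shell startup file (assumed: ~/.bashrc for bash).
# Adjust the path to wherever your QLDX files actually live.
export QLDX=/home/username/bin/QLDX
export LD_LIBRARY_PATH=$QLDX/lib/ld

# Add $QLDX/bin to PATH only if it is not already there.
case ":$PATH:" in
    *":$QLDX/bin:"*) ;;
    *) PATH=$QLDX/bin:$PATH ;;
esac
export PATH
```

New shells started after this change will have the variables defined; shells that are already open need to re-read the file or be restarted.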