Subject Infrastructure Repository
C Object Biographies
This file provides information about each of the C objects we currently make available, or hope to soon make available, including what they are and how they were prepared. (Preparation instructions include details on test suite creation and fault seeding processes.) Also provided are references that discuss the process by which the objects were developed.
Flex, grep, gzip, and make are all Unix utilities obtained from the GNU site. We obtained several sequential, previously released versions of each of these programs.
For these objects, with the exception of a few "smoke" tests, comprehensive test suites were not available. To construct test cases representative of those that might be created in practice for these programs, we used the documentation on the programs, and the parameters and special effects we determined to be associated with each program, as informal specifications. We used these informal specifications, together with the category partition method and an implementation of the TSL tool, to construct a suite of test cases that exercise each parameter, special effect, and erroneous condition affecting program behavior. We then augmented those test suites with additional test cases to increase code coverage (measured at the statement level). We created these suites for the base versions of the programs; they served as regression suites for subsequent versions.
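As a rough illustration of the category partition step described above, the following Python sketch generates test frames from a small set of categories and choices. The categories, choices, and function name are hypothetical and are not taken from the actual TSL specifications; the handling of erroneous conditions (one frame per error choice rather than every combination) follows the general category partition method, not necessarily the exact TSL implementation used here.

```python
from itertools import product

# Hypothetical categories and choices for a grep-like program; these names are
# illustrative only and do not come from the real test specifications.
CATEGORIES = {
    "pattern": ["literal", "regex", "empty"],
    "case_flag": ["none", "-i"],
    "input_file": ["one_match", "no_match", "missing"],
}

# Choices treated as erroneous conditions: exercised once, not in every combination.
ERROR_CHOICES = {("pattern", "empty"), ("input_file", "missing")}


def generate_frames(categories, error_choices):
    """Return all combinations of normal choices, plus one frame per error choice."""
    names = list(categories)
    normal = {
        name: [c for c in categories[name] if (name, c) not in error_choices]
        for name in names
    }
    frames = [
        dict(zip(names, combo))
        for combo in product(*(normal[n] for n in names))
    ]
    for name, choice in sorted(error_choices):
        # Pair the error choice with the first normal choice of every other category.
        frame = {n: normal[n][0] for n in names}
        frame[name] = choice
        frames.append(frame)
    return frames


frames = generate_frames(CATEGORIES, ERROR_CHOICES)
for frame in frames:
    print(frame)
```

Each frame would then be turned into a concrete test case (a command line plus input files) by hand or by script.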
We wished to evaluate the performance of testing techniques with respect to detection of regression faults, that is, faults created in a program version as a result of the modifications that produced that version. Such faults were not available with our object programs; thus, to obtain them, we followed a procedure similar to one defined and employed in several previous studies of testing techniques, as follows. First, we recruited graduate and undergraduate students in computer science with at least two years of C programming experience. We then instructed these students to insert faults that were as realistic as possible based on their experience, and that involved code deleted from, inserted into, or modified between the versions. To further direct their efforts, the fault seeders were given a list of types of faults to consider.
Given ten potential faults seeded in each version of each program, we activated these faults individually, and executed the test suites for the programs to determine which faults could be revealed by which test cases. We excluded any potential faults that were not detected by any test cases: such faults are meaningless to our measures and have no bearing on results. We also excluded any faults that were detected by more than 25% of the test cases; our assumption was that such easily detected faults would be detected by engineers during their unit testing of modifications.
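The exclusion rule above can be sketched in Python as follows. The fault matrix layout, the function name, and the sample data are assumptions made for illustration; only the two criteria (detected by at least one test case, and by no more than 25% of the test cases) come from the text.

```python
def select_faults(fault_matrix, num_tests, max_fraction=0.25):
    """Keep faults revealed by at least one test and by at most max_fraction of tests.

    fault_matrix maps a fault id to the set of test ids that reveal that fault.
    """
    kept = []
    for fault, revealing in sorted(fault_matrix.items()):
        if not revealing:
            continue  # never detected: excluded
        if len(revealing) > max_fraction * num_tests:
            continue  # detected by more than 25% of the tests: excluded
        kept.append(fault)
    return kept


# Hypothetical data: a suite of 20 test cases and four seeded faults.
fault_matrix = {
    "F1": set(),              # revealed by no test case
    "F2": {3, 7},             # revealed by 2 of 20 test cases
    "F3": set(range(12)),     # revealed by 12 of 20 test cases
    "F4": {0, 1, 2, 3, 4},    # revealed by exactly 25% of the test cases
}
kept = select_faults(fault_matrix, num_tests=20)
print(kept)
```

Note that a fault detected by exactly 25% of the test cases is retained in this sketch, since the text excludes only faults detected by more than 25%.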
These subjects, as well as the Siemens subjects described below, may require localization in order to build successfully. Some of these considerations are presented here.
Space consists of 9564 lines of C code (6218 executable), and functions as an interpreter for an array definition language (ADL). The program reads a file that contains several ADL statements, and checks the contents of the file for adherence to the ADL grammar and to specific consistency rules. If the ADL file is correct, space outputs an array data file containing a list of array elements, positions, and excitations; otherwise the program outputs error messages.
Space has 33 associated versions, each containing a single fault that had been discovered during the program's development. Through working with this program, we discovered five additional faults, and created versions containing just those faults. We also discovered that three of the "faulty versions" originally supplied were actually semantically equivalent to the base version.
We constructed a test pool for space in two stages. We obtained an initial pool of 10,000 test cases from Vokolos and Frankl, who had created this pool for another study by randomly generating test cases [Vokolos98]. Beginning with this initial pool, we instrumented the program for coverage and then added test cases to the pool until it contained, for each executable statement or edge (though unlike the Siemens programs, not for each definition-use pair) in the program or its control flow graph, at least 30 test cases that exercised that statement or edge. (We treated the statements and edges executable only on failure of one of the seventeen malloc calls found in the program as non-executable.) This process yielded a test pool of 13,585 test cases.
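The stopping criterion for the second stage can be illustrated with a short Python sketch that reports which statements or edges are still exercised by fewer than the required number of test cases. The data layout and names are hypothetical, and the sketch assumes coverage data has already been collected by an instrumented build; it does not reproduce the actual tooling.

```python
from collections import Counter


def under_covered(coverage, entities, threshold=30):
    """Return {entity: number of additional covering tests needed}.

    coverage maps a test id to the set of entities (statements or edges) it exercises.
    entities lists the entities considered executable; anything reachable only on
    malloc failure would simply be left out of this list.
    """
    counts = Counter()
    for covered in coverage.values():
        counts.update(covered)  # each test counts at most once per entity
    return {e: threshold - counts[e] for e in entities if counts[e] < threshold}


# Hypothetical example with a threshold of 2 instead of 30, for brevity.
coverage = {
    "t1": {"s1", "e1"},
    "t2": {"s1"},
    "t3": {"s1", "s2"},
}
deficit = under_covered(coverage, ["s1", "s2", "e1"], threshold=2)
print(deficit)
```

Test cases would be added to the pool until this report is empty.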
To obtain sample test suites for space, we used this test pool and sampling procedures. We have created several types of suites, some randomly selected, some coverage adequate. The Space CONTENTS files describe the types of suites.
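As one plausible way to draw such suites from a pool, the Python sketch below selects a random suite of a given size and a coverage-adequate suite built by visiting test cases in random order and keeping each one that adds coverage. This is an assumption about the general approach, not a description of the exact procedures used; the CONTENTS files remain the authoritative description. All names and data are hypothetical.

```python
import random


def random_suite(pool, size, rng):
    """Pick `size` distinct test ids uniformly at random from the pool."""
    return rng.sample(sorted(pool), size)


def coverage_adequate_suite(coverage, rng):
    """Visit tests in random order; keep each test that covers something new.

    coverage maps a test id to the set of entities it exercises. The suite is
    adequate once it covers everything the whole pool covers.
    """
    target = set().union(*coverage.values())
    order = sorted(coverage)
    rng.shuffle(order)
    suite, covered = [], set()
    for test in order:
        if covered == target:
            break
        if coverage[test] - covered:
            suite.append(test)
            covered |= coverage[test]
    return suite


# Hypothetical pool of four test cases over three statements.
coverage = {
    "t1": {"s1", "s2"},
    "t2": {"s2"},
    "t3": {"s3"},
    "t4": {"s1", "s3"},
}
adequate = coverage_adequate_suite(coverage, random.Random(0))
sample = random_suite(coverage, 2, random.Random(1))
print(adequate, sample)
```

Passing an explicit random.Random instance keeps suite generation reproducible from a seed.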
[Vokolos98]. Vokolos, F. I. and Frankl, P. G., "Empirical evaluation of the textual differencing regression testing technique", Proceedings of the International Conference on Software Maintenance, November 1998, pages 44-53.
The "Siemens" programs were assembled by Tom Ostrand and colleagues at Siemens Corporate Research for a study of the fault detection capabilities of control-flow and data-flow coverage criteria [Hutchins94], and were made available to us by Tom Ostrand. They have since been partially modified by us for use in further studies.
The Siemens programs perform a variety of tasks: tcas is an aircraft collision avoidance system, schedule2 and schedule are priority schedulers, totinfo computes statistics given input data, printtokens and printtokens2 are lexical analyzers, and replace performs pattern matching and substitution.
The researchers at Siemens sought to study the fault detecting effectiveness of coverage criteria. Therefore, they created faulty versions of the seven base programs by manually seeding those programs with faults, usually by modifying a single line of code in the program. Their goal was to introduce faults that were as realistic as possible, based on their experience with real programs. Ten people performed the fault seeding, working mostly without knowledge of each other's work. The result of this effort was between 7 and 41 versions of each base program, each containing a single fault.
For each base program, the researchers at Siemens created a large test pool containing possible test cases for the program. To populate these test pools, they first created an initial suite of black-box test cases according to good testing practices, based on the tester's understanding of the program's functionality and knowledge of special values and boundary points that are easily observable in the code, using the category partition method and the Siemens Test Specification Language tool. They then augmented this suite with manually-created white-box test cases to ensure that each executable statement, edge, and definition-use pair in the base program or its control-flow graph was exercised by at least 30 test cases. To obtain meaningful results with the seeded versions of the programs, the researchers retained only faults that were neither too easy nor too hard to detect, which they defined as being detectable by at most 350 and at least 3 test cases in the test pool associated with each program.
To obtain sample test suites for these programs, we used the test pools for the base programs and sampling procedures to create suites. We have created several types of suites, some randomly selected, some coverage adequate. The Siemens CONTENTS files describe the types of suites.
The Siemens files are described in the original Siemens paper [Hutchins94].
[Hutchins94]. Hutchins, M., Foster, H., Goradia, T., and Ostrand, T., "Experiments on the effectiveness of dataflow- and control flow-based test adequacy criteria", Proceedings of the 16th International Conference on Software Engineering, May 1994, pages 191-200.