Information Retrieval Applications in Software Development

Abstract: Information retrieval (IR) extracts and organizes natural-language information found in unstructured text. Many of the challenges faced by software engineers can be addressed using IR techniques on the unstructured text provided by source code and its associated documents. A survey of IR-based techniques applied to software engineering challenges during the initial development process is presented.


Related Papers



Abstract: There is a growing interest in creating tools that can assist engineers in all phases of the software life cycle. This assistance requires techniques that go beyond traditional static and dynamic analysis. An example of such a technique is the application of information retrieval (IR), which exploits information found in a project's natural language. Such information can be extracted from the source code's identifiers and comments and from artifacts associated with the project, such as the requirements.




ACM Transactions on Software Engineering and Methodology


Successful development of software systems involves the efficient navigation of software artifacts. However, as artifacts are continuously produced and modified, engineers are typically plagued by challenging information landscapes. One state-of-practice approach to structuring information is to establish trace links between artifacts, a practice that is also enforced by several development standards. Unfortunately, manually maintaining trace links in an evolving system is a tedious task. To tackle this issue, several researchers have proposed treating the capture and recovery of trace links as an Information Retrieval (IR) problem. The goal of this thesis is to contribute to the evaluation of IR-based trace recovery, both by presenting new empirical results and by suggesting how to increase the strength of evidence in future evaluative studies. This thesis is based on empirical software engineering research. In a Systematic Literature Review (SLR), we show that a majority of previous evaluations of IR-based trace recovery have been technology-oriented, conducted in "the cave of IR evaluation", using small datasets as experimental input. Also, software artifacts originating from student projects have frequently been used in evaluations. We conducted a survey among traceability researchers and found that while a majority consider student artifacts to be only partly representative of industrial counterparts, such artifacts were typically not validated for industrial representativeness. Our findings call for additional case studies to evaluate IR-based trace recovery within the full complexity of an industrial setting. Thus, we outline future research on IR-based trace recovery in an industrial study on safety-critical impact analysis. Also, this thesis contributes to the body of empirical evidence of IR-based trace recovery in two experiments with industrial software artifacts.
The technology-oriented experiment highlights the clear dependence between datasets and the accuracy of IR-based trace recovery, in line with findings from the SLR. The human-oriented experiment investigates how different quality levels of tool output affect the tracing accuracy of engineers. While the results are not conclusive, there are indications that it is worthwhile to investigate further the actual value of improving tool support for IR-based trace recovery. Finally, we present how tools and methods are evaluated in the general field of IR research, and propose a taxonomy of evaluation contexts tailored for IR-based trace recovery in software engineering.
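
To make the idea of treating trace recovery as an IR problem concrete, here is a minimal sketch. It assumes a vector space model with smoothed TF-IDF weights and cosine similarity, which is one common choice and not necessarily the model evaluated in the thesis. The function names, the requirement text, and the two artifact files are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, breaking camelCase identifiers apart."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z]+", spaced.lower()) if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency (an assumption of this sketch).
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidate_links(requirement, artifacts):
    """Rank target artifacts by textual similarity to one source requirement."""
    names = list(artifacts)
    vectors = tfidf_vectors([requirement] + [artifacts[name] for name in names])
    query, targets = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, targets)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical requirement and code artifacts, invented for this example.
requirement = "The system shall encrypt user passwords before storing them"
artifacts = {
    "PasswordEncryptor.java": (
        "class PasswordEncryptor { String encryptPassword(String userPassword) } "
        "// encrypt before storing"
    ),
    "ReportPrinter.java": (
        "class ReportPrinter { void printMonthlyReport() } "
        "// prints the monthly sales report"
    ),
}

for name, score in rank_candidate_links(requirement, artifacts):
    print(f"{name}: {score:.3f}")
```

The ranked list is only a set of candidate links. An engineer would still have to vet each candidate, and that vetting step is what the human-oriented experiment described above examines.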


Information and Software Technology


2011 27th IEEE International Conference on Software Maintenance (ICSM)
