Anatomy of a Software Analytics Tool
Many tools exist for software maintenance and software modernization. These tools can scan "development assets" and provide maintenance engineers with "knowledge" about these assets, or even offer semi-automated transformations of existing software. We refer to them collectively as software analytics tools.
A typical software analytics tool includes one or more parsers for selected programming languages, an intermediate representation, possibly an integration layer, and an analysis component. The parsers supported by the tool determine its "footprint" (the software assets that the tool can process). In the software maintenance context, a software analytics tool mines software assets for "knowledge" for the purposes of understanding and evolution.
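The component structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the design of any particular tool: the names `Element`, `AnalyticsTool`, `parse_python` and `parse_c` are hypothetical, the regex-based "parsers" are toy stand-ins for real language front ends, and the optional integration layer is omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Element:
    """One entry in the (hypothetical) intermediate representation."""
    kind: str       # e.g. "function"
    name: str
    language: str


# In this sketch a parser is any function from source text to IR elements.
Parser = Callable[[str], List[Element]]


def parse_python(source: str) -> List[Element]:
    # Toy parser: a regex stands in for a real Python front end.
    names = re.findall(r"^\s*def\s+(\w+)", source, re.MULTILINE)
    return [Element("function", name, "python") for name in names]


def parse_c(source: str) -> List[Element]:
    # Toy parser: matches simple definitions such as "int main(void) {".
    pattern = r"^\s*\w+\s+(\w+)\s*\([^)]*\)\s*\{"
    names = re.findall(pattern, source, re.MULTILINE)
    return [Element("function", name, "c") for name in names]


class AnalyticsTool:
    def __init__(self, parsers: Dict[str, Parser]) -> None:
        self.parsers = parsers
        self.model: List[Element] = []   # the intermediate representation

    def footprint(self) -> List[str]:
        # The supported parsers determine which assets can be processed.
        return sorted(self.parsers)

    def load(self, language: str, source: str) -> None:
        if language not in self.parsers:
            raise ValueError(f"{language!r} is outside this tool's footprint")
        self.model.extend(self.parsers[language](source))

    def count_functions(self) -> Dict[str, int]:
        # Analysis component: works on the model, never on raw source text.
        counts: Dict[str, int] = {}
        for element in self.model:
            if element.kind == "function":
                counts[element.language] = counts.get(element.language, 0) + 1
        return counts


tool = AnalyticsTool({"python": parse_python, "c": parse_c})
tool.load("python", "def a():\n    pass\n\ndef b():\n    pass\n")
tool.load("c", "int main(void) {\n  return 0;\n}\n")
print(tool.footprint())
print(tool.count_functions())
```

In this sketch, adding a parser widens the footprint without touching the analysis code, because the analysis only ever sees `Element` records.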
The software development assets considered by a software analytics tool are almost entirely restricted to source code. Other sources of knowledge about the software development life cycle include (but are not limited to) deployment configuration files, machine code, third-party components, information from software configuration management repositories, information from issue tracking repositories, requirement documents, test cases, and design documentation. The internal information model is often the backbone of the analytics tool: it separates the complexity of parsing the source code (or processing other software development artifacts) from the complexity of performing the knowledge mining. Such a model is often semantically complex.
Traditional software analytics tools are often built as "silos", driven by the parsing technology. The internal model of a software analytics tool is often a form of abstract syntax tree. Abstract syntax trees are essential for compiler construction; however, they create a barrier between tools focused on different languages (or language families), since the analysis component is bound to the ontology of a particular programming language. This has several negative consequences. First, there is little interoperability between analytics tools. Second, each tool provider has to excel in several diverse technologies, such as parsing and analysis, as well as in the subject matter "know-how" that determines the business value addressed by the tool. As a result, each tool has a unique profile of strengths and weaknesses. Software integrators and service organizations are faced with an insurmountable challenge of point-to-point integration between tools.
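The coupling between an analysis and a language-specific abstract syntax tree can be illustrated with Python's standard `ast` module. The function name `count_python_functions` is hypothetical and the analysis is deliberately trivial; the point is only that it is written against Python-specific node types, so it cannot be reused for source code in another language.

```python
import ast


def count_python_functions(source: str) -> int:
    """Count function definitions by walking Python's own AST."""
    tree = ast.parse(source)
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


python_source = "def a():\n    pass\n\ndef b():\n    pass\n"
c_source = "int main(void) { return 0; }"

print(count_python_functions(python_source))

# The same analysis cannot even start on C source: the Python parser
# rejects it, and a C parser would produce entirely different node types.
try:
    count_python_functions(c_source)
    c_supported = True
except SyntaxError:
    c_supported = False
print(c_supported)
```

An equivalent analysis for C would have to be rewritten against the node types of a C parser, which is the kind of duplication the paragraph above describes.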