Specification Overview

This specification defines the Knowledge Discovery Meta-model (KDM), a meta-model for representing information related to existing software assets, their associations, and their operational environments.
The KDM provides a common interchange format that allows interoperability between existing software analysis and modernization tools, services, and their respective models. More specifically, KDM provides a common repository structure that facilitates the exchange of data currently contained within individual tool models that represent existing software assets. The meta-model represents the physical and logical software assets at various levels of abstraction as entities and relations.

KDM separates knowledge about existing systems into several orthogonal facets that are well-known in software engineering and are often referred to as Architecture Views.


Figure: Layers, packages and separation of concerns in KDM

The KDM specification is organized into the following four layers:

  • KDM Infrastructure Layer
  • Program Elements Layer
  • Runtime Resource Layer
  • Abstractions Layer

Each layer is further organized into packages. Each package defines a set of meta-model elements whose purpose is to represent a certain independent facet of knowledge related to existing software systems.

Logically, KDM consists of 9 models. Each KDM model is described by one KDM package, except for the Code model, which is defined jointly by the Code and Action packages.

The KDM Infrastructure Layer consists of the following 3 packages: Core, kdm and Source. The Core and kdm packages do not describe separate KDM models. Instead, they define common meta-model elements that constitute the infrastructure for other packages. The Source package defines the Inventory model, which enumerates the artifacts of the existing software system and defines the mechanism of traceability links between the KDM elements and their original representation in the “source code” of the existing software system.

The Program Elements Layer consists of the Code and Action packages. These packages collectively define the Code model, which represents the implementation-level assets of the existing software system, as determined by the programming languages used in its development. The Code package focuses on the named items from the “source code” and several basic structural relationships between them. The Action package focuses on behavior descriptions and the control- and data-flow relationships determined by them. The Action package is extended by other KDM packages to describe higher-level behavior abstractions that are key elements of knowledge about existing software systems.

The Runtime Resource Layer consists of the following 4 packages: Platform, UI, Event and Data.

The Abstractions Layer consists of the following 3 packages: Structure, Conceptual, Build.
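
The layer and package organization described above can be written down as a small lookup table. The following is only an illustrative sketch, not part of the specification: the names LAYERS, PACKAGE_TO_MODEL, packages, models, and layer_of are hypothetical, and the assumption that each Runtime Resource and Abstractions package defines a model of the same name is ours (the text names only the Inventory and Code models explicitly).

```python
# Layer -> packages, exactly as listed in the text above.
LAYERS = {
    "KDM Infrastructure Layer": ["Core", "kdm", "Source"],
    "Program Elements Layer": ["Code", "Action"],
    "Runtime Resource Layer": ["Platform", "UI", "Event", "Data"],
    "Abstractions Layer": ["Structure", "Conceptual", "Build"],
}

# Package -> model it contributes to; None marks infrastructure-only packages.
# "Inventory" and "Code" come from the text. The other model names are an
# ASSUMPTION: each remaining package is taken to define a model of the same name.
PACKAGE_TO_MODEL = {
    "Core": None,
    "kdm": None,
    "Source": "Inventory",
    "Code": "Code",
    "Action": "Code",  # Code and Action collectively define the Code model
    "Platform": "Platform",
    "UI": "UI",
    "Event": "Event",
    "Data": "Data",
    "Structure": "Structure",
    "Conceptual": "Conceptual",
    "Build": "Build",
}


def packages():
    """Return all package names in layer order."""
    return [pkg for pkgs in LAYERS.values() for pkg in pkgs]


def models():
    """Return the distinct model names, in first-seen order."""
    seen = []
    for pkg in packages():
        model = PACKAGE_TO_MODEL[pkg]
        if model is not None and model not in seen:
            seen.append(model)
    return seen


def layer_of(package):
    """Return the layer that contains the given package."""
    for layer, pkgs in LAYERS.items():
        if package in pkgs:
            return layer
    raise KeyError(package)
```

Under these assumptions the table yields 12 packages and 9 models, which agrees with the counts given in the text.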

Each of these knowledge facets contains more information than a person can process at once. To overcome this obstacle, each facet supports aggregating (summarizing) information to different levels of abstraction. This requires KDM to be scalable. In addition, KDM represents two kinds of information: primary and aggregate. Primary information is assumed to be automatically extracted from the source code and other artifacts, including (but not restricted to) formal models, build scripts, configuration files, and data definition files. Some (or even all) primary information can instead be provided manually by analysts and experts. Aggregate information is obtained from primary information.
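
To make the distinction between primary and aggregate information concrete, the sketch below lifts entity-level relations to container-level relations by counting them. It is a minimal illustration under our own assumptions, not the KDM meta-model: the function aggregate, the tuple layout, and the sample entity and container names are all hypothetical.

```python
from collections import Counter


def aggregate(relations, container_of):
    """Summarize entity-level relations as counts between containers.

    relations    -- iterable of (source_entity, target_entity, kind) tuples
    container_of -- dict mapping each entity name to its container name
    """
    summary = Counter()
    for source, target, _kind in relations:
        src_container = container_of[source]
        dst_container = container_of[target]
        # Relations inside a single container are hidden at this level.
        if src_container != dst_container:
            summary[(src_container, dst_container)] += 1
    return summary


# Hypothetical primary information, as if extracted from source code.
container_of = {
    "parse": "frontend",
    "lex": "frontend",
    "optimize": "backend",
    "emit": "backend",
}
primary = [
    ("parse", "lex", "calls"),
    ("parse", "optimize", "calls"),
    ("parse", "emit", "calls"),
    ("optimize", "emit", "calls"),
]

summary = aggregate(primary, container_of)
```

Here the four primary relations collapse into a single aggregate relation from "frontend" to "backend" with a count of 2, since the other two relations stay within one container.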

Knowledge discovery proceeds at progressively deeper levels of understanding, reflecting the varying levels required to achieve different objectives. These levels are: lexical or syntactic understanding of the program code (language-dependent level); understanding of the application functionality and design (language-independent level); understanding of application packaging and the corresponding dependencies (architecture level); and understanding of the application's behavior (business level).

The following are key design characteristics of KDM:

  • KDM is a MOF model
  • KDM is an Entity-Relationship model
  • KDM defines an ontology for describing existing software systems
  • KDM can be extended to capture language-specific, application-specific and implementation-specific entities and relationships
  • KDM defines multiple hierarchies of entities via containers and groups
  • KDM models are composable (several entities can be grouped into a typed container, which then represents the entire collection of grouped entities via aggregated relationships)
  • An analyst can refactor the model (for example, by moving entities between containers) and map changes in the model to changes in the software through traceability links
  • KDM is aligned with ISO/IEC 11404 Language-Independent Datatypes and SBVR
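
The container and traceability characteristics listed above can be illustrated with a small sketch. This is an assumption-laden toy, not the KDM API: the classes SourceRef, Entity, and Container, the functions move and affected_sources, and the file paths are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical traceability link back to the original artifact."""
    path: str
    line: int


@dataclass
class Entity:
    name: str
    source: SourceRef


@dataclass
class Container:
    name: str
    members: list = field(default_factory=list)


def move(entity, src, dst):
    """Refactor the model by moving an entity between containers."""
    src.members.remove(entity)
    dst.members.append(entity)


def affected_sources(container):
    """Follow traceability links to the source files behind a container."""
    return sorted({member.source.path for member in container.members})


# Hypothetical model: two containers holding three entities.
log = Entity("log", SourceRef("src/util.c", 10))
parse_date = Entity("parse_date", SourceRef("src/util.c", 42))
format_date = Entity("format_date", SourceRef("src/dates.c", 5))

util = Container("util", [log, parse_date])
dates = Container("dates", [format_date])

# Model-level refactoring: group both date functions together.
move(parse_date, util, dates)
```

After the move, affected_sources(dates) lists both src/dates.c and src/util.c, which shows how a change made in the model can be mapped back to the files that would need to change in the software.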