DMS Domain Specifications
Program analysis and transformation tools, to be very general, must apply to a wide variety of languages. Since it is impractical to build "all languages" into a single tool, one must be able to specify a particular language (and even dialect) to such a tool quickly for it to be widely useful.
The DMS Software Reengineering Toolkit is designed to allow the "domain" (language) engineer to specify those language elements quickly and accurately, so that she may spend most of her attention on the actual program analysis or transformation of interest.
These pages discuss such specifications in some detail and show them applied to Niklaus Wirth's Oberon language as an example. This should give the would-be DMS domain engineer a feel for DMS. (Yes, SD has Tools for Oberon based on the definitions shown here.) The tutorials and reference documentation provided with DMS itself are far more extensive and detailed.
For a simpler but holistic view of DMS domain specifications working together, see Algebra as a DMS Domain.
Necessary DMS Domain Definition Elements
The following formal descriptions are minimally needed for the DMS Software Reengineering Toolkit to parse and analyze a programming language:
- A lexical definition: this defines the elements ("tokens") of the target language
- A grammar definition: this defines the allowable sequences of tokens, written as BNF rules
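To make the division of labor between these two definitions concrete, here is a minimal sketch in Python for a toy expression language. It is not DMS notation, and every name in it (`TOKEN_SPEC`, `tokenize`, `parse`) is hypothetical. The lexical definition is a table of regular expressions, and the grammar is a handful of EBNF-style rules implemented by hand with recursive descent. With DMS one would write the definitions declaratively rather than hand-code a parser.

```python
import re

# Lexical definition: each token kind paired with a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[+*()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn source text into a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# Grammar definition, implemented below by recursive descent:
#   expr   = term   { "+" term }
#   term   = factor { "*" factor }
#   factor = NUMBER | IDENT | "(" expr ")"
def parse(tokens):
    """Return a nested-tuple AST for the token list, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == ("OP", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        kind, lexeme = advance()
        if kind == "NUMBER":
            return ("num", int(lexeme))
        if kind == "IDENT":
            return ("id", lexeme)
        if (kind, lexeme) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree
```

Calling `parse(tokenize("x + 2*(y+1)"))` yields a tree in which the multiplication nests under the addition, because the grammar rules encode operator precedence. The lexer knows nothing about structure, and the parser never looks at individual characters.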
As a practical matter, if one is to analyze or manipulate source code, one must provide for Life After Parsing. This means providing definitions to DMS for the following (optional, but strongly encouraged):
- A Prettyprinter: this defines how to print an instance AST as valid/compilable source text, complete with comments
- Static Analysis via Attribute Grammars: how to specify information collection across an AST easily
- Symbol Tables: building a mapping from identifiers to their definition sites, types, and usage instances
- Control Flow Analysis: constructing a control flow graph for the micro-semantics of the program
- Data Flow Analysis: determining how data flows across the program, controlled by the control flow graph. One may need special support to handle indirect references.
- Source to Source Rewrites: defining transformations over trees in terms of surface syntax familiar to the programmer
- Data Flow Pattern Matching: writing surface syntax patterns that can match dataflows rather than syntax
DMS is available with sets of definitions for the above for a wide variety of languages.
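As a toy illustration of three of the items listed above (a prettyprinter, an attribute-grammar-style analysis, and tree rewrites), the following Python sketch works on a nested-tuple AST for simple arithmetic. It is not DMS notation, and the names (`prettyprint`, `identifiers`, `simplify`) are hypothetical. The rewrite rules here are coded directly against tree nodes, whereas DMS lets the engineer state them in the surface syntax of the language.

```python
# AST nodes are nested tuples: ("num", 3), ("id", "x"), ("+", left, right), ("*", left, right).
PRECEDENCE = {"+": 1, "*": 2}


def prettyprint(node, parent_prec=0):
    """Regenerate source text, adding parentheses only where precedence requires."""
    kind = node[0]
    if kind in ("num", "id"):
        return str(node[1])
    prec = PRECEDENCE[kind]
    left = prettyprint(node[1], prec)
    # Operators are treated as left-associative, so a nested right operand
    # of the same precedence is parenthesized to preserve the tree shape.
    right = prettyprint(node[2], prec + 1)
    text = f"{left} {kind} {right}"
    return f"({text})" if prec < parent_prec else text


def identifiers(node):
    """Synthesized attribute: the set of identifier names used in a subtree."""
    kind = node[0]
    if kind == "id":
        return {node[1]}
    if kind == "num":
        return set()
    return identifiers(node[1]) | identifiers(node[2])


def simplify(node):
    """Apply the rewrites e + 0 -> e, 0 + e -> e, e * 1 -> e, 1 * e -> e, bottom-up."""
    kind = node[0]
    if kind in ("num", "id"):
        return node
    left, right = simplify(node[1]), simplify(node[2])
    zero, one = ("num", 0), ("num", 1)
    if kind == "+":
        if right == zero:
            return left
        if left == zero:
            return right
    if kind == "*":
        if right == one:
            return left
        if left == one:
            return right
    return (kind, left, right)
```

For example, applying `simplify` to the tree for `(a + 0) * (b + 2)` and then `prettyprint` to the result regenerates source text without the redundant `+ 0`. The parentheses around `b + 2` are reinserted from operator precedence rather than stored in the tree.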