Tuesday, November 23, 2010

Vesta Beef Risotto Pro Points

An object-oriented metalanguage

In computer magazines and books about model-driven software development in recent times often described the framework
Xtext
and praise - which is mainly due to the fact that it is used for the integration of programming in the Eclipse IDE [. 1] Xtext (or its underlying parser
Antlr
) is an easy-to-learn tool to meet the broad development of domain-specific languages - that is why it is also suitable for entry into the area. But there are interesting alternatives that should at least know.

A more interesting option to make run-time maintainable models for code generation and for the development of DSLs in my view, the so-called
Parsing Expression Grammars (PEG) is
, especially developed by Alessandro Warth object-oriented metalanguage
OMeta
. The basic idea of all PEGs is a generic term the pattern matching, such as it is used in regular expressions. PEGs use the knowledge that all usually are the construction of compilers and interpreters related tasks in a way, only variations of a common mechanism - the pattern recognition [2]




The
Tokenizer
(also
Lexer
or
Scanner
called) transforms an array of characters (the input, the source code) into a sequence of basic syntactic units, tokens. Classically used for this purpose a tool like lex

or its Flex development

. In principle, a Lex or Flex file provides a list of regular expressions that will be applied sequentially to the input until a suitable expression is found. Each regular expression is an "action" in the form of C code assigned, which must eventually end with the return of a well-defined tokens, together with a possible token argument. To illustrate how such a flex file is listed, like my serve
Flex Definition File apc_lexer.lex
project
astro
patterns.

The

parser operates on the array of tokens to generate an internal data structure that is suitable for the efficient processing machine particularly well. Usually the result of parsing an abstract syntax tree (AST). Also, the parsing is basically Pattern recognition, only on a higher level: it uses a different kind of "alphabet" consisting of terminal and nonterminal symbols, and the patterns are the syntax rules, called production rules. Parser has a long history, represented in which
yacc
(

y
et
a nother

c
ompiler
c
ompiler
) a first milestone.
yacc
, although developed over 30 years, is still used for the design of programming languages. I have implemented with
Bison
, the successor to yacc, a
astrological DSL
which serve also to illustrate here should. The
Bison grammar of these DSL
shows a structural similarity that is compared here with the input of a number of so-called production rules. If a suitable rule is found, the associated semantic action

executed.


Type Checker
and
optimizer
are transformations of the AST, which recognize certain patterns and replace them with more specific or more efficient designs. As the product of the parser is usually a syntax tree, one often uses the

visitor design pattern to traverse the tree, run depending on the application-specific operations. [3] The same applies

Finally, the code generator

, who transformed the AST program in source code or binary code for a processor or a virtual machine. Also be used for this purpose to the visitor pattern ajar methods, although this task can be formulated in the form of a grammar with production rules.



OMeta The programming language provides a mechanism to formulate all these tasks in the form of a PEG grammar and fix them. The concept is so general that in principle any transformation of code and data-structures in other target structures. The OMeta syntax can be embedded in different host languages, for example, there OMeta implementations many others,
C #
,

JavaScript, Ruby
- for
Squeak
, COLA (a mixture of Scheme and Smalltalk used the first host language) - increases the number of implementations. I find it particularly attractive to the cooperation with host dynamic languages such as JavaScript or Python. For the interpreted languages are obviously very well prepared for the vision of model-driven software development, at run time in a domain-specific language, the system behavior can be. The JavaScript language also has the additional advantage of being a "default language of the Web Browser" at no extra cost in web applications available. Because I think the language OMeta so important, I have posted on my homepage an area for experimentation at the link
http://ruediger-plantiko.net/ometa/

. It is similar to the http://tinlizzie.org OMeta is, as its name says, an object-oriented metalanguage that has many interesting features:

Entering OMeta must not source or even be a string, but can an array of arbitrary objects be the host language. This allows for example the transformation of abstract syntax trees. For each syntax tree iw as any sequence of nested arrays, tokens, and objects can be displayed. The JavaScript statement document.getElementById ("btnSave") click ();

for example, could be represented by the AST.
listof: p = apply (p) ("," apply p ()) *
would match one or more, separated by commas occurrence of "something that usually p enough '. For example, recognizes the term listof ('expression') a comma separated list of expressions (meet a previously defined rule expression ), while
listof ('name')
on a list of names ( the name of the previously defined rule meet
) fits.


I can not find the entire Language OMeta
explain, but only to stimulate their studies. The above statements summarize key parts of Warth's thesis. Who wants to learn more: they have
    http://www.tinlizzie.org/ometa/
    .
  • least I still want a small show in the scene of the "compiler design" favorite example: a "desktop calculator". This concrete calculator can even work with variables, which shows the way that you can build with OMeta also stateful parser. On my test page
    reach its definition by selecting
     Calculator 
    in the list of available grammars. This example is the dissertation of Warth removed. I will discuss briefly here.

    The first line
     
    ometa Calc
    a new grammar (a new transformation, a new parser, a new OMeta object) is declared. In this case, it will inherit from the built-in grammar parser
    . This built-in class currently contains only one basic parser functionality - namely, a token rule, can be identified with the space-separated string. This token
    rule is automatically highlighted in all of

  • parser inheriting grammar definitions included in the string converted commas. More you do not get through this inheritance.
  • The next line
    var = letter spaces: x -> x,

  • shows the typical syntax of a production rule in OMeta. The name of the rule is followed by an equal sign their definition, and by definition can, with an arrow
  • - separated> , a semantic action
    be specified. A semantic action is listed in the code of the host language and will be evaluated for a recognized pattern. The result of the evaluation is the return value of the so-called
  • matchall ()
    method with which you can then call the parser.

  • In this case we have a rule
     var 
    detection of variable names. First, specify that spaces should be ignored just before a variable name. This is achieved with the built-in rule
    spaces, which recognizes "zero or more spaces." Next comes the actual determination of what is to be recognized as a variable: variables should be single letter in our calculator. For this purpose, we use the built-in rule-based OMeta
    letter. After the colon, then followed by a variable name that contains in the successful application of the rule evaluation result of the semantic action (or, if no semantic action is recognized as fitting the part of the input). These variables that are assigned during parsing, then you can in the semantic Action draw - as here. We determine that the return value of the rule is exactly the recognized character.

    A simpler way would be completely equivalent formulation

    var = letter spaces

  • But we do finally learn how OMeta works, and this is the first version better.
  • sets the following variables according to the rule determine the permitted numbers - positive whole numbers: num = num: digit n: d -> (n * 10 + d * 1) The declaration of the alternatives. We also learn that even parts of a rule is already a semantic action, may be assigned. The first part of the rule also contains the already discussed left recursion.
     This rule could have been equivalent to write again: 

    digit num = +: d -> (1 * d.join ('')),
     new here is the quantifier <= c && c <= y) -> + 
    with the same semantics as in regular expressions . Enriched with the Quantifier
    + or *
    expression always evaluates to an array with the individual hits as elements. The array, we can by in the semantic action the method
     join () 
    the JavaScript Array class
    zusammenspleissen to a string, and finally to force by multiplying by 1 (type coercion) in the conversion of the numeric data type. This variant usually
    num
    is certainly instructive.

  • Now follow the
  • primary expressions : Entering a variable name will return the contents of the input of a number of strings is said to have just that number to answer, and complex expressions are intended to be clamped: primaryExpr = spaces var: x -> self . vars [x] ")" -> R, new here is the access to the variable self.vars to initialize in a special, in the so-called Parserinstanziierung
    ()
     method is defined as an empty hash and us as a container ( more precisely, serves as a symbol table) for the variables entered in the calculator website. 

    There are now the multiplication and addition rules: mulExpr = mulExpr: x "*" primaryExpr: y -> (x * y)
    mulExpr: y -> (x + y)
      be encoded. 

    The addExpr is thus more general than
    mulExpr
     and more general than the 
    primaryExpr
    . You latter contains as special cases. The following rule = expr var: x "=" expr: r -> (self.vars [x] = r)
    Expression should be output.

  • doit = (expr: r) * spaces end -> r


follows Here - outside the OMeta grammar that already mentioned initialize () method by which the computer is put into the initial state: Calc.initialize = function () {this.vars = {};}

In my "Workbench" I can now rule on the input doit

x = 1 y = 2 x + y
 get and apply the output <: Parser {

3. This is because the doit rule several expressions separated by blank space identifies and evaluates. I would not have three Entries can be made in sequence: First x = 1 then y = 2

and finally
 x + y 

Even then I would get 3 the output. The reason is that my "workbench" in each dialog step with the same Parserinstanz works and has noticed so all variable assignments already made in the symbol table vars . It's just a stateful parser - and with this property is also suitable for OMeta REPL Shells (
read-evaluate-print
loops). You might want to OMeta my workbench in my OMeta implemented XML parser View: A well-read in my view, only 42 lines of code allows the transformation of an XML document into a JavaScript object modeled tree. These are the variables, with which one can work in OMeta, it is also one of the issues with which the Viewpoints Research Institute is taken up, developed under the roof also OMeta was: To how many orders of magnitude can consist of source code with the use of appropriate, possible expressive programming environments will be reduced to still achieve that for which it now uses computer (word processing, graphics, Internet, etc.)?

This may be a first insight into a very interesting parser Comply with the I can only recommend to closer study. The fields, around the area of domain-specific languages (DSL) seem to me to be very promising. The language is compact and expressive. Its powerful features, such as object orientation, and parameterized rules of recursion is to thank that grammars can be written much more elegantly than previous tools of this kind
 


[1] Thus, in Thomas Stahl, Markus Volter , Sven Efftinge, Arno Haase:
Model-Driven Software Development
, 2 Dpunkt edition, Verlag, Heidelberg 2007th
 Jan Köhnlein and Sebastian Zarnekow, 
Xtext practice
, eclipse-Magazin 1.2010, p. 50
[2] The following ideas of the thesis by Alessandro Warth Experimenting with Programming Languages are removed.
[3] It is, however, even without the visitor design pattern, which unfortunately is intrusive and in the implementation of a certain awkwardness attached (as I have shown in my blog

).
 [4] Here, the application of a method m is 

of the object o with the argument array [args ...] by the construct [APPLY, o, m, [args ...]] modeled.

0 comments:

Post a Comment