1+ ---
2+ uid : Kaleidoscope-ch2
3+ ---
4+
15# 2. Kaleidoscope: Implementing the parser
26The chapter 2 sample doesn't actually generate any code. Instead it focuses on the general
37structure of the samples and parsing of the language. The sample for this chapter enables all
48language features to allow exploring the language and how it is parsed, which helps in understanding
59the rest of the chapters. It is hoped that users of this library find this helpful.
610
7- The LUbiquity .NET.Llvm version of Kaleidoscope leverages ANTLR4 to parse the language into a parse tree.
11+ The Ubiquity.NET.Llvm version of Kaleidoscope leverages ANTLR4 to parse the language into a parse tree.
812This has several advantages including logical isolation of the parsing and code generation.
913Additionally, it provides a single formal definition of the grammar for the language. Understanding
1014the language grammar from reading the LLVM tutorials and source was a difficult task since it isn't
@@ -262,7 +266,7 @@ This is a simple rule for sub-expressions within parenthesis for example: `(1+2)
262266the addition so that it occurs before the division since, normally the precedence of division is higher.
263267The parse tree for that expression looks like this:
264268
265- ![Parse Tree](parsetree-paren-expr.svg)
269+ ![Parse Tree](./parsetree-paren-expr.svg)
266270
267271### FunctionCallExpression
268272``` antlr
@@ -271,7 +275,7 @@ Identifier LPAREN (expression[0] (COMMA expression[0])*)? RPAREN
271275This rule covers a function call which can have 0 or more comma delimited arguments. The parse tree
272276for the call ` foo(1, 2, 3); ` is:
273277
274- ![Parse Tree](parsetree-func-call.svg)
278+ ![Parse Tree](./parsetree-func-call.svg)
275279
276280### VarInExpression
277281``` antlr
@@ -366,20 +370,21 @@ classes so they are extensible from the parser assembly without needing to deriv
366370methods etc. Thus, the Kaleidoscope.Grammar assembly contains partial class extensions that provide simpler
367371property accessors and support methods to aid in generating the AST.
368372
369- See [Kaleidoscope Parse Tree Examples](Kaleidoscope-Parsetree-examples.md) for more information and example
373+ See [Kaleidoscope Parse Tree Examples](xref:Kaleidoscope-Parsetree-examples) for more information and example
370374diagrams of the parse tree for various language constructs.
371375
372376## Abstract Syntax Tree (AST)
373377To further simplify code generators the Kaleidoscope.Runtime library contains the AstBuilder type that is
374- an ANTLR parse tree visitor. AstBuilder will convert a raw ANTLR IParseTree into an ` IEnumerable<IFunctionNode> ` .
375- That is, it visits the declarations and definitions in the parse tree to produce an ordered sequence of declarations
376- and definitions as they appeared in the source. For interactive modes - the sequence will have only a single element.
377- However, when parsing a whole source file, the parse tree may contain multiple declarations and definitions.
378+ an ANTLR parse tree visitor. AstBuilder will convert a raw ANTLR IParseTree into a tree of `IAstNode` elements.
379+ That is, it visits the declarations and definitions in the parse tree to produce a full tree of declarations
380+ and definitions as they appeared in the source. For interactive modes - the tree will have only one top level node.
381+ However, when parsing a whole source file, the parse tree may contain multiple declarations and definitions under
382+ a RootNode.
378383
379- The [ Kaleidoscope AST] ( Kaleidoscope-AST.md ) is a means of simplifying the original parse tree into
380- constructs that are easy for the code generation to use directly. In the case of Kaleidoscope there are
381- a few types of nodes that are used to generate LLVM IR. The AstBuilder class is responsible for
382- generating an AST from an ANTLR4 parse tree.
384+ The [Kaleidoscope AST](xref:Kaleidoscope-AST) is a means of simplifying the original parse tree into
385+ constructs that are easy for the code generation to use directly and to validate the syntax of the input source.
386+ In the case of Kaleidoscope there are a few types of nodes that are used to generate LLVM IR. The AstBuilder class
387+ is responsible for generating an AST from an ANTLR4 parse tree.
383388
384389The major simplifying transformations performed in building the AST are:
385390 * Convert top-level functions to a pair of FunctionDeclaration and FunctionDefinition
@@ -391,52 +396,60 @@ The major simplifying transformations performed in building the AST are:
391396> operators no longer exists in the AST! The AST only deals in function declarations, definitions and the built-in
392397> operators. All issues of precedence are implicitly resolved in the ordering of the nodes in the AST.
393398> Thus, the code generation doesn't need to consider the issue of user defined operators or operator
394- > precedence at all. ([ Chapter 6] ( Kaleidoscope-ch6.md ) covers the details of user defined operators)
395- >
396-
399+ > precedence at all. ([Chapter 6](xref:Kaleidoscope-ch6) covers the details of user defined operators and how
400+ > the Kaleidoscope sample language uses ANTLR to implement them.)
401+
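As a rough illustration of how precedence can be implied purely by the shape of the tree, here is a minimal sketch. The `Expr`, `Constant`, `BinaryOp`, and `AstSketch` names are hypothetical and are not the actual Kaleidoscope AST node types; the sketch only assumes that each operator node owns its operands as child nodes.

```csharp
using System;

// Hypothetical node types for illustration only; not the Kaleidoscope.Runtime AST.
public abstract record Expr;
public sealed record Constant( double Value ) : Expr;
public sealed record BinaryOp( char Op, Expr Left, Expr Right ) : Expr;

public static class AstSketch
{
    // A consumer evaluates children before the parent operator, so the
    // nesting of the nodes alone determines the order of operations.
    public static double Evaluate( Expr e ) => e switch
    {
        Constant c => c.Value,
        BinaryOp { Op: '+' } b => Evaluate( b.Left ) + Evaluate( b.Right ),
        BinaryOp { Op: '/' } b => Evaluate( b.Left ) / Evaluate( b.Right ),
        _ => throw new NotSupportedException( e.ToString( ) )
    };

    // (1+2)/3 : the addition is a child of the division, so it is evaluated first.
    public static Expr ParenthesizedSample( )
        => new BinaryOp( '/', new BinaryOp( '+', new Constant( 1 ), new Constant( 2 ) ), new Constant( 3 ) );
}
```

A code generator walking such a tree never consults a precedence table; by the time the tree exists, the parser has already encoded the answer in the node ordering.
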
397402## Basic Application Architecture
398403
399- Generally speaking there are four main components to all of the sample chapter applications.
404+ Generally speaking, there are four main components to most of the sample chapter applications.
400405
401406 1. The main driver application (e.g. program.cs)
402- 2. The parser (e.g. Kaleidoscope.Grammar assembly)
403- 3. Runtime support (e.g. Kaliedoscope.Runtime)
407+ 2. The Read-Evaluate-Print-Loop (e.g. ReplEngine.cs)
408+ 3. Runtime support (e.g. Kaleidoscope.Runtime and Kaleidoscope.Parser libraries)
404409 4. The code generator (e.g. CodeGenerator.cs)
405410
406411### Driver
407412While each chapter is a bit different from the others, many of the chapters are virtually identical for
408- the driver. In particular Chapters 3-7 only really differ in the language level support.
413+ the driver. In particular, Chapters 3-7 only really differ in the name of the app, the window title, etc.
414+
415+ [!code-csharp[Program.cs](Program.cs)]
416+
417+ ### Read, Evaluate, Print loop
418+ The Kaleidoscope.Runtime library contains an abstract base class for building a standard REPL engine from an
419+ input TextReader. The base class handles converting the input reader into a sequence of statements and
420+ parsing them into AST nodes. The nodes are passed to an application-provided generator that produces the
421+ output result. The REPL engine base class calls the abstract ShowResults method to display the results.
409422
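The general shape of such a base class might look like the following minimal sketch. Everything here is an assumption made for illustration: `ReplSketch`, `EchoRepl`, `ReadStatements`, `Parse`, and `Generate` are hypothetical names, and splitting statements on `;` is a simplification. Only the `TextReader` input and the abstract `ShowResults` method come from the description above; the real Kaleidoscope.Runtime types may differ substantially.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch of a REPL base class; not the Kaleidoscope.Runtime API.
public abstract class ReplSketch<TNode, TResult>
{
    // Pieces supplied by the application.
    protected abstract TNode Parse( string statement );
    protected abstract TResult Generate( TNode node );
    protected abstract void ShowResults( TResult result );

    // Lazily split the input into statements terminated by ';' (an assumption).
    // Trailing text without a terminator is ignored in this sketch.
    public static IEnumerable<string> ReadStatements( TextReader input )
    {
        var current = new StringBuilder( );
        int ch;
        while( ( ch = input.Read( ) ) != -1 )
        {
            current.Append( (char)ch );
            if( ch == ';' )
            {
                string statement = current.ToString( ).Trim( );
                current.Clear( );
                yield return statement;
            }
        }
    }

    // Read, evaluate, and print each statement in turn.
    public void Run( TextReader input )
    {
        foreach( string statement in ReadStatements( input ) )
        {
            ShowResults( Generate( Parse( statement ) ) );
        }
    }
}

// Chapter 2 style usage: the "generator" just hands back the node it was given.
public sealed class EchoRepl : ReplSketch<string, string>
{
    public List<string> Shown { get; } = new List<string>( );

    protected override string Parse( string statement ) => statement.TrimEnd( ';' );
    protected override string Generate( string node ) => node;
    protected override void ShowResults( string result ) => Shown.Add( result );
}
```

Keeping `ShowResults` abstract lets each chapter decide how output is presented while the read and parse loop stays shared.
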
410- [!code-csharp[Program.cs](../../../Samples/Kaleidoscope/Chapter2/Program.cs#generatorloop)]
423+ [!code-csharp[Program.cs](ReplEngine.cs)]
411424
425+ ### Runtime Support
412426The Parser contains the support for parsing the Kaleidoscope language from the REPL loop interactive
413427input. The parser stack also maintains the global state of the runtime, which controls the language features
414428enabled, and if user defined operators are enabled, contains the operators defined along with their
415429precedence.
416430
417- After the parser is created an async enumerable sequence of statements is created for the parser to process.
431+ After the parser is created, an enumerable sequence of statements is created for the parser to process.
418432This results in a sequence of AST nodes. After construction, the sequence is used to iterate over all of
419433the nodes generated from the user input.
420434
421- This use of an Async enumerator sequences is a bit of a different approach to things for running an interpreter Read,
435+ This use of an enumerable sequence is a somewhat different approach to running an interpreter Read,
422436Evaluate, Print Loop, but once you get your head around it, the sequence provides a nice clean and flexible
423437mechanism for building a pipeline of transformations from the text input into the result output.
424438
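To make the pipeline idea concrete, here is a minimal sketch using deferred `IEnumerable<T>` stages. The stage names (`ToStatements`, `ToNodes`, `ToResults`) and the tuple used as a stand-in for an AST node are hypothetical and do not come from the samples; the sketch also assumes one statement per non-empty line, which is a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pipeline stages; not the actual sample code.
public static class PipelineSketch
{
    // Stage 1: raw text lines -> trimmed, non-empty statements.
    public static IEnumerable<string> ToStatements( IEnumerable<string> lines )
        => lines.Select( l => l.Trim( ) ).Where( l => l.Length > 0 );

    // Stage 2: statements -> placeholder "nodes" (a tuple stands in for a real AST node).
    public static IEnumerable<(string Kind, string Text)> ToNodes( IEnumerable<string> statements )
        => statements.Select( s => ( s.StartsWith( "def ", StringComparison.Ordinal ) ? "definition" : "expression", s ) );

    // Stage 3: nodes -> displayable results.
    public static IEnumerable<string> ToResults( IEnumerable<(string Kind, string Text)> nodes )
        => nodes.Select( n => $"{n.Kind}: {n.Text}" );
}
```

Because each stage is lazily evaluated, nothing is processed until the final sequence is enumerated, and each input statement flows through all of the stages before the next one is read. That is what makes the approach a natural fit for an interactive loop.
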
425- ### Processing generated results
426- The calling application will generally subscribe to the observable sequence with a ` ShowResults ` function to show the
427- results of the generation in some fashion. For the basic samples (Chapter 3-7) it indicates the value of any JITed
428- and executed top level expressions, or the name of any functions defined. Chapter 2 has additional support for
429- showing an XML representation of the tree but the same basic pattern applies . This, helps to keep the samples
439+ ### CodeGenerator
440+ The code generator will transform the AST node into the final output for the program. For the basic samples
441+ (Chapter 3-7) it indicates the value of any JITed and executed top level expressions, or the name of any functions
442+ defined. Chapter 2 uses a generator that simply produces the node it was given as the app doesn't actually use LLVM
443+ (it focuses on parsing the language only and the REPL infrastructure). This helps to keep the samples
430444consistent and as similar as possible to allow direct file comparisons to show the changes for a particular feature.
431445The separation of concerns also aids in making the grammar, runtime and code generation unit-testable without the
432- driver. (Although that isn't implemented yet - it is intended for the future to help broaden testing of Ubiquity.NET.Llvm to
433- more scenarios and catch breaking issues quicker.)
446+ driver.
434447
435- [!code-csharp[ShowResults](../../../Samples/Kaleidoscope/Chapter2/Program.cs#ShowResults)]
448+ [!code-csharp[ShowResults](CodeGenerator.cs)]
436449
437450### Special case for Chapter 2
438451Chapter 2 sample code, while still following the general patterns used in all of the chapters, is a bit
439- unique, it doesn't actually use LUbiquity .NET.Llvm at all! Instead, it is only focused on the language and parsing.
452+ unique: it doesn't actually use Ubiquity.NET.Llvm at all! Instead, it is only focused on the language and parsing.
440453This helps in understanding the basic patterns of the code. Furthermore, this chapter serves as an aid in
441454understanding the language itself. Of particular use is the ability to generate DGML and [ blockdiag] ( http://blockdiag.com )
442455representations of the parse tree for a given parse.