Rapid Programming Language Prototypes with Ruby RACC


Summarized using AI

Rapid Programming Language Prototypes with Ruby RACC

Tom Lee • November 01, 2012 • Denver, Colorado • Talk

Summary of "Rapid Programming Language Prototypes with Ruby RACC"

Tom Lee, speaking at RubyConf 2012, presents on rapid programming language prototypes using Ruby and RACC, a parser generator for Ruby. The presentation aims to help attendees understand compiler construction, focusing on the differences between using mocking vs. testing in isolation.

Main Topics Discussed:

  • Introduction to Compilers:

    • Compilers take source code, process it, and output target code through a series of transformations, primarily consisting of a scanner and a parser.
  • Compiler Construction Fundamentals:

    • A brief overview of compiler theory, including the basic architecture of a compiler. Key concepts relevant to various programming languages, such as Ruby and Python, are discussed.
  • The Importance of RACC:

    • RACC, resembling Yacc and Bison, is highlighted as a user-friendly option for creating parsers in Ruby, emphasizing ease of use for experimentation and rapid development compared to its counterparts.
  • Live Coding Demonstration:

    • The presentation includes a two-thirds segment of live coding where Lee implements a simple compiler using RACC, demonstrating the flow from writing a grammar through token generation to code generation.
    • Initial examples include creating a DSL named "suck_langang" for function calls with no arguments and simple grammar structures in EBNF format.
  • Constructing a Compiler Pipeline:

    • Lee illustrates how to create a scanner to process input code into tokens, then shows how to pass those tokens through a grammar using RACC, and generate a structured output as an Abstract Syntax Tree (AST).
  • Code Generation:

    • The final step includes generating target code in C from the AST, showing how to compile high-level language code down to low-level implementation.
  • Conclusion and Q&A:

    • The talk concludes with a Q&A session where Lee addresses questions about creating complex languages, testing methodologies, and alternative representations of ASTs.

Rapid Programming Language Prototypes with Ruby RACC
Tom Lee • Denver, Colorado • Talk

Date: November 01, 2012
Published: March 19, 2013
Announced: unknown

Some people test in isolation, mocking everything except the class under test. We'll start with that idea, quickly examine the drawbacks, and ask how we might fix them without losing the benefits. This will send us on a trip through behavior vs. data, mutation vs. immutability, interface vs. data dependencies, how data shape affords parallelism, and what a system optimizing each of these for natural isolation might look like.

RubyConf 2012

00:00:15.160 has everybody had a good Ruby comp so far
00:00:21.320 yes okay so I know it's uh Saturday afternoon um so you know I'll try to
00:00:27.080 keep it light um so yesterday you know hi I my name's Tom uh
00:00:36.399 I work for neic uh I uh work on the mobile monitoring and Linux Ser monitoring products uh we're hiring so
00:00:43.520 if you're looking for a job uh please feel free to spam us with your RS um I'm here today to talk to you
00:00:50.199 about rapid programming Lang rapid programming language prototypes using Ruby and rack um which is essentially
00:00:57.760 you know compiler construction using Ruby uh now before I get started I just want
00:01:02.960 to make it clear that I'm talking about rack and not rack um so rack with a K is web server
00:01:09.000 middleware so I'm not going to be speaking about uh TH or unicorn or Sinatra or rails or anything like that
00:01:16.280 um I'm talking about rack which is past generator for uh Ruby okay so when I first sort of put
00:01:24.159 the proposal for this presentation together um I kind of had it in my head that my target audience would sort of be
00:01:29.640 people who have you know maybe toyed with uh compiler Construction in the past and had some experience using uh
00:01:36.880 yak and bison and and all that sort of stuff but mat's uh keynote on Thursday
00:01:43.079 sort of made me realize that you know maybe that was a bit too ambitious and you know I I should sort of scale it
00:01:48.520 back a little bit to uh sort of give people who had that experience um some exposure to uh you know compiler
00:01:55.200 construction fundamentals um and so the first part of this presentation probably the first 10 15 minutes is uh basically uh an intro
00:02:04.079 to compiler construction basic Theory um the architecture behind a a generalized
00:02:10.599 compiler uh and and this this is great information because you know this this stuff can be
00:02:17.640 applied to pretty much any you know modern uh uh compiler in you know
00:02:24.239 compiler that you that you care to look at so if you go digging inside the uh Ruby compiler these Tech these Concepts
00:02:32.680 apply uh likewise python PHP even um all of this stuff you know it sort of
00:02:38.680 applies across the board once I've sort of covered off the fundamentals uh I'm going to dive into uh rack and you know
00:02:46.800 when you might want to use it instead of yak or bison uh and I'm going to finish up with some live coding so when I say
00:02:53.280 finish up uh probably 2/3 of this presentation is is live coding so I'm going to sort of try and Implement a a
00:02:59.840 compiler here on the stage in front of you um and you all get to laugh at my horrible horrible Ruby
00:03:07.519 code okay so first up what is a compiler a compiler takes code in a source language
00:03:15.560 uh then magic happens and then out the other end uh the target code comes out
00:03:20.840 and so the way I like to think of a compiler is is sort of as a uh as a pipeline uh with several sort of steps
00:03:26.599 along the way so there's several Transformations applied to your source code along the way before it gets dumped out as you know your your target
00:03:34.439 code uh and the first step in that process is uh the scanner so once again
00:03:39.920 if you think of your compiler as a pipeline the scanner is like the first component in that pipeline that sort of
00:03:45.000 takes your raw source code and tries to identify uh you know interesting tokens
00:03:51.439 within that source code like this is an if this is an if keyword this is a bracket uh this is a string semicolon
00:03:58.959 that sort of thing uh so you know it might strip out all the white space and comments and all that sort of stuff um
00:04:04.599 so it's really just sort of chunking up your input um to make it easier to digest um so those tokens uh you know
00:04:11.920 sort of form a stream or an array or however you want to sort of conceptualize it and that feeds into the
00:04:17.440 passer the passer takes those tokens and sort of tries to make make sense of of that raw input uh using a set of rules
00:04:25.680 and I've got up on the on the projector there you know it it outputs an intermediate representation um which is so you generic is
00:04:32.520 meaningless um so I've given the example of an abstract syntax tree which you may have heard of uh if you haven't I'll
00:04:39.039 sort of cover that more detail in a moment um so I I'm going to treat the rest of
00:04:45.800 this presentation as though the passer uh always outputs an abstract syntax tree but it's probably important to note
00:04:51.800 that it doesn't necessarily have to do that it can even you know directly generate bite code uh now passes you can you can
00:04:58.080 manually write passes so it's actually pretty trivial to to write like a top down recursive descent Passa uh by hand
00:05:05.639 uh but generally you know what you tend to see out in the wild is people will use things called pass generators uh to
00:05:12.440 take a a grammar and automatically generate a pass of for the the language
00:05:18.280 described in the grammar and so a grammar as you might suspect describes the complete syntax of
00:05:25.160 a programming language um has everybody sort of heard of uh NF sort of maybe no
00:05:32.240 kind of uh that's that's okay it it doesn't really matter too much um you'll sort of see an example later on in the presentation very very simple
00:05:38.960 example uh and essentially variants of ebnf are used for uh the dsls that
00:05:45.280 serves inputs to Passa generators uh so an example of a pass generator would be Yak or bison or even
00:05:55.319 rack uh so before I mentioned an abstract syntax tree um so this is really just you know sort
00:06:03.440 of giving you more detail about what an abstract syntax tree actually is um so it's an inmemory logical representation
00:06:10.120 of your Source program um so it's just a data structure basically so you start with this
00:06:15.800 raw uh you know blob of source code it passes through the scanner the scanner converts it into Tokens The Tokens are
00:06:22.720 fed into the passer the passor structures those tokens into uh
00:06:28.039 an um so it's it's abstract in the sense that uh we emit some detail as well so
00:06:34.280 by the time we get to an abstract syntax tree we don't care that our if statement uh requires brackets around the test
00:06:41.479 expression for example uh contrast to a concrete syntax tree which does include that that sort
00:06:47.479 of detail Okay so we've got our or you know
00:06:53.880 quote unquote intermediate representation uh and now we want to generate code uh from that intermediate
00:06:59.919 representation so in the case of an we Traverse the tree and generate the appropriate code for each node along the
00:07:07.160 way so if you're working with uh an if node for example you would the the code generator would look at the if node and
00:07:13.280 go aha this is an if node so I know to generate uh code for the test expression
00:07:19.280 associated with that if node uh and I know to generate code to test whether that expression is true and I know I
00:07:26.919 need to generate a jump to jump over the body of the the if statement if uh uh if
00:07:32.599 the expression evaluates to false and I need to generate the code for the body itself and so there you know that's all
00:07:39.400 done sort of recursively like the the code gener generator can recurse recurse into the various nodes in
00:07:48.960 the okay so I kind of spoke about the compiler essentially being a uh a
00:07:55.240 pipeline and this is kind of the the diagram you know that sort of shows what's going on so the ra code raw
00:08:01.400 source code feeds into the scanner the scanner produces Tokens The Tokens feed into the passer the passer constructs an
00:08:08.400 the feeds into the code generator and the code generator finally dumps out your target
00:08:15.120 code okay so that kind of concludes the sort of you know 10,000 foot view of you
00:08:20.479 know compiler Construction in general uh so now for a little bit about rack so
00:08:25.840 rack as you probably might have guessed by now is a pass generator uh written by Tender
00:08:31.560 Love uh you write your grammar using uh a sort of DSL which I'll sort of
00:08:37.760 demonstrate in a minute you run uh rack over your grammar uh and that will generate Ruby
00:08:44.560 code that can pass the language that you describe in your Lang in your grammar so why would you use rack over
00:08:53.920 something like yakob bison um well I I tend to use rack when I'm sort of uh
00:09:00.720 experimenting you know with crazy ideas or trying to trying to learn basically I'm I'm relatively new to you know the
00:09:06.720 whole compiler construction thing myself and so whenever I want to spin up a new project or something like that
00:09:12.800 essentially something that I'm probably going to throw away uh I get really frustrated when I have to you know do the magical dance uh required to get uh
00:09:20.720 Yak bison flex and c and everything working together uh not to mention you
00:09:25.920 know the fact that Ruby gives you hashes and arrays and all that sort of stuff
00:09:31.120 for free uh so really for me it's it's about you know getting from uh 0 to 100
00:09:38.480 as quickly as possible okay this is suck
00:09:46.480 langang um suck langang is essentially a DSL for calling functions um that
00:09:52.560 doesn't take any arguments at this point so this is a really really simple example um if I have time towards the
00:09:58.399 end which it looks like I um I'll try to uh sort of expand on this and uh make it more
00:10:05.240 interesting so up there you can see we've got so this is an example of ebnf if you've never seen it before so very
00:10:11.399 very simple example of ebnf um so on the left hand side you can see we've got uh a call uh it's called a
00:10:19.000 non-terminal um and on the right hand side we've got a sequence of tokens
00:10:24.120 so ID is essentially a token the bracket is a token the closing bracket is a
00:10:29.800 token uh which is all fed into the passer from the scanner um so I've also got a note up
00:10:36.120 there you know that big scary reject basically just means you know an ID is basically you know string that starts
00:10:42.120 with an alphabetical character or an underscore followed by zero or more alpha numeric or underscore
00:10:49.320 characters okay so this is an example of suck langang it's exciting
00:10:57.320 right okay so if you recall that the scanner takes raw source code generates
00:11:03.040 tokens passes into the passer that means there needs to be some sort of agreement between the scanner
00:11:10.279 and the passer as to how that data kind of comes in and so rack kind of sets the
00:11:16.279 sets the rule so to speak by sort of saying okay well I expect Two element sequences uh from the scanner and so the
00:11:22.800 first element in you know the Two element sequence that represents a token is the type of the token
00:11:29.920 uh and the second value uh the second element in the token is the value and what I mean by that is if you're passing
00:11:36.079 uh say string uh if you're scanning sorry if you're scanning a string uh you
00:11:41.519 would say okay well I I see a quote there's the contents of the string and another quote
00:11:47.279 okay so this is a string token so the type of the token is a string but the value of the token would be the contents
00:11:52.920 of the string um and that's important to sort of uh expose uh this that sort of like
00:12:00.600 you know that sort of information to the passer when you're constructing like the for
00:12:06.480 example okay so I think at this point uh we can sort of get started with the fun
00:12:13.000 part which is you know writing a compiler a very very simple
00:12:18.920 compiler okay so oh you can't see
00:12:37.959 uh okay so we'll open up lib sulang scanner.
00:12:45.000 RB scanner whoops the scanner will live in a module called
00:12:50.360 sulang scanner and we don't really tend to need
00:12:55.639 uh to keep State around when we're writing a scanner so I'm just making this a module level function uh scan so
00:13:02.920 the scanner takes raw source code AS
00:13:07.959 input and emits an array of tokens as
00:13:14.480 output so if we now look back at our grammar uh way back
00:13:21.240 here we can see that we've basically got basically got three tokens so we've got an ID opening bracket and a closing
00:13:27.880 bracket so let's start out by passing that ID token
00:13:36.320 so while our input so we want to consume all the input uh which we'll do by just
00:13:41.760 sort of stripping the front off the the string as we're scanning it uh
00:13:50.639 so when the start of the string contains alphabetical character or an
00:13:57.320 underscore followed by uh 0 or n alpha numeric characters or
00:14:05.279 underscores so our language isn't Unicode compliant unfortunately um okay so we need to
00:14:12.480 generate a token so this is an ID token we want to set the value of this token
00:14:19.040 to the pattern that we just matched here uh oh what am I
00:14:27.639 doing so when we match that pattern this magic variable sort of just evaluates to
00:14:32.720 the text that was matched by that Rex Okay so we've now matched an ID but
00:14:39.279 we need to strip off the front of our source string so I'll use slice
00:14:49.959 oops um and that's all we need to do to sort of scan a token um so we also need
00:14:56.279 to handle the open and close brackets
00:15:02.759 so once again when the string starts with an opening bracket or a closing
00:15:11.000 bracket we can just use the token itself so in in the case of a bracket there's
00:15:16.199 no value associated with a bracket right it's just punctu punctuation um so we don't really care about this value on
00:15:22.839 the right hand side it's just kind of a throwaway thing um I'm just throwing it in there you know more by habit than
00:15:28.120 anything else um and since we're only sort of scanning a single token in this case we can sort
00:15:33.360 of just use something like that to strip off the the first character in the in the
00:15:39.360 string okay now we don't want to accept anything else as input so anything else
00:15:44.880 that we see causes an
00:15:56.079 error and we'll just you know flat out
00:16:01.959 exit and that should pretty much be our scanner uh so let's write a quick little
00:16:07.759 driver program um
00:16:13.319 Ruby oh started writing C all right we'll include lib suang
00:16:23.240 scanner uh so the scanner emits an array of tokens
00:16:32.199 and we'll just pass the you know uh sample Source program in
00:16:42.160 directly all
00:16:50.959 righty all right so let's see how it goes okay so we're we're sort of
00:16:57.639 chunking that you know raw source code up into into tokens and this is in a format that that rack can kind of start
00:17:04.799 to work with um and sort of just to demonstrate you know at this at this point we're not applying any strict
00:17:12.039 rules to to the input so you know it doesn't have to be a legal uh program so
00:17:17.880 long as it contains characters that the scanner kind of knows about so you know that sort of stuff works as
00:17:26.160 well um but if we you know if we for example put in a character that the
00:17:31.919 scanner doesn't like then it will complain that sort of
00:17:37.679 thing um so so the scanner in a in a sense sort of enforces um some level of
00:17:42.840 of syntax but you know it's more in terms of you know weird characters and and stuff that you don't expect to find
00:17:48.320 on the input at all uh so let's let's modify this so that we can pass in
00:17:55.559 uh programs from the command line uh now this isn't going to work
00:18:04.039 because uh you'll get an implicit uh carriage return uh sorry Line Feed so we
00:18:09.760 try to pass that in Unown input blah uh so we'll go back to our
00:18:16.520 scanner and we'll strip out whites space so when the string starts with Whit
00:18:25.480 space get rid of it
00:18:33.400 okay so oh jeez we're making good time that's excellent uh okay so now that we've got
00:18:41.080 our scanner uh if you think back to that diagram uh the next thing we need to do
00:18:46.880 is Implement a passer and the way we're going to do that is by writing a rack grammar um so the rack grammar is
00:18:52.880 basically you know it's sort of a sort of pseudo ruby/ um
00:18:59.720 ebnf is DSL thing um so I guess to start with make sure you've got rack installed
00:19:05.960 um once you've done that you're pretty much ready to go so what we need to do next is ADD lib suck langang pass.
00:19:15.360 rack now pass. rack whoops yep class uh pass. rack uh starts out like this so
00:19:22.880 the name of the module followed by the passer class
00:19:30.159 uh so we want this class uh which we're going to call passer to be in this module which we're
00:19:37.880 going to call suckl so when rack generates uh the pass
00:19:43.600 class it's going to dump it out in inside the the cyang module so this rule section is where we
00:19:51.799 write our uh crazy ebnf DSL thing uh so
00:19:57.200 if we go back to our grammar we can almost use this exactly as is so
00:20:02.960 we can say okay well the call non- terminal consists of so in ebnf we use
00:20:10.080 this weird you know colon colon equals operator but uh in the in the rrar and yak and Bice and all that sort of thing
00:20:16.960 uh you only need to use a colon uh followed by ID Open Bracket
00:20:22.039 close bracket and we can literally just write it in like that
00:20:29.559 so once again just to make it clear these tokens are coming from the scanner
00:20:34.960 so these These are the the the token types in your from your from tokens from
00:20:41.000 your scanner um okay now the other thing we want to do is um you know make it easy
00:20:47.320 to call the passer without sort of you know uh messing around with it too much
00:20:52.720 so we'll use this thing called an intersection and the intersection will basically you know it basically lets you
00:20:57.919 add whatever code you want to the inside of the passer class
00:21:02.960 so pass and we take you know a sequence of tokens as
00:21:09.080 input uh and we'll just save those
00:21:14.840 tokens and then we'll call the the magical rack function uh well rack meth uh rack method called do pass uh which
00:21:22.240 will actually do the the heavy lifting of of passing uh our tokens uh now we
00:21:27.960 need to tell rack how to get those tokens so we need to define a method called Next token uh which is something that that
00:21:34.880 rack is internally aware of it'll it'll call this on your behalf so all we need to do is shift the
00:21:41.880 next token off the front of our token array and so we'll just keep doing that until we run out of tokens and you know
00:21:47.880 eventually return nil and rack will go oh I'm finished I don't need to do any anything
00:21:53.240 more okay and that's pretty much the uh passer so
00:21:59.039 now we need to make this kind of evil command line invocation
00:22:04.400 here um so basically all this is doing is running uh rack via
00:22:11.520 RBM um specifying the output file so we want to store the output file in libang passer. RB and the input file you know
00:22:19.960 libang passer. rack uh okay so let's do that
00:22:29.679 okay seems to have worked uh actually we'll we'll open that up and edit it lib suck langang pass.
00:22:40.080 RB okay so this is uh the Ruby code that that uh Yak uh Yak rack generates on
00:22:46.880 your behalf um you can see our uh intersection here so this is dumped in
00:22:52.559 you know literally you know in place uh and then everything else is
00:22:58.120 sort of you know it's it's generated based upon the the rules that we specify in our uh in our rule section so over
00:23:05.880 here um you don't have to worry too much about what this actually does this is the point of using a passive generator you don't have to worry about you know
00:23:11.919 the magic going on on the
00:23:17.400 inside uh okay now since I'm going to have to do that a few times
00:23:23.200 um I'm going to dump it into a rake file
00:23:34.720 so when we uh sorry I can't type and think at this
00:23:41.840 point um so we want to generate uh suing pass.
00:23:47.960 RB and it depends on pass. R so just execute
00:23:55.240 that and we'll set the default task to lib suang pass.
00:24:01.919 RB so when we want when we run rake uh that'll execute uh rack on our
00:24:11.400 behalf Okay so we've implemented a Passa but what does it do at this point so we
00:24:17.279 we want to construct an during the pass but at this point like R rack doesn't
00:24:22.640 rack doesn't build an as for you it it sort of doesn't Define um the output of of the past phase is so much it kind of
00:24:29.480 leaves it up to you to to to do that um so just for science uh let's open up our
00:24:37.399 driver program again and we're not returning an as at the moment but let's see what R does by
00:24:44.120 default so we'll pass in our
00:24:50.480 tokens and see what comes with that
00:24:58.440 oh forgot to require
00:25:06.600 it okay so the interesting thing is from our pass here we get this string oh high
00:25:12.360 which sort of corresponds with the identifier value um now if we go into our grammar
00:25:21.360 once again uh you can see that you know the
00:25:28.360 the IID token is the the first value uh in you know well the first token uh in
00:25:35.840 in the sequence of tokens that make up a call um and what you can actually do in rack is specify you know custom actions
00:25:43.080 and the default action for you know handling this rule is to set the result
00:25:48.679 equal to the value of the first token um
00:25:55.399 so this and this or functionally equivalent it's the
00:26:00.760 same thing um so just to prove that uh I'll run rake again because I've modified the grammar so we need to
00:26:06.399 regenerate the Passa uh and we'll run that example program again it's the exact same
00:26:15.440 thing uh now we need to construct an so so what does an look like I mean I've
00:26:21.640 spoken abouts in you know sort of a really abstract sense um and I guess the
00:26:27.159 best way to sort of demonstrate what an might look like is to sort of ask
00:26:32.640 something like rinus um so we'll say okay well oops RBM shell oh I'm already
00:26:39.520 using here so we'll say RBX compile s
00:26:45.200 e and we'll pass in you know because this is sort of legal Ruby syntax as
00:26:50.360 well we can use reinius to sort of give us a hint as to what we're doing here okay so this is giving us a SE
00:26:59.799 which is you know sort of a representation of an as uh as a series of nessed arrays and
00:27:08.039 the node type is the first value in the array and you know sort of children of
00:27:15.080 that node kind of follow and so here we've got a call and a call has a Target
00:27:23.279 so you you can specify a Target on which to call a method right so if you if you call
00:27:29.000 fu. High it sets the target of the method to
00:27:34.600 F so hopefully that makes sense it's not really important for what we're doing here but uh yeah so then then we get this
00:27:43.159 symbol oh high which is you know the the method that we're trying to call followed by uh you know an empty
00:27:49.320 argument list so let's do something
00:27:54.519 evil and blatantly copy this
00:28:03.480 ah trackpads are the bane of my existence uh okay
00:28:10.159 so we we we can really dump anything in here okay so just to make it clear you can do like 1 2 3 4 1 2 3 and set that
00:28:18.039 to the result of your pass and you'll get that as the output so here we're
00:28:23.399 going to literally just dump in that from uh
00:28:35.440 reinus and you know without modifying it at the moment run it again okay we get
00:28:41.360 our now we don't want to hard code uh that oh high uh we want to set that to
00:28:47.799 the value of the function that we're trying to call back here
00:28:54.080 so we'll set it we'll set the the name of the method that we're trying
00:28:59.640 to call to the value of that ID
00:29:04.960 token uh and I've modified the grammar regenerate the Ruby code okay so there's
00:29:12.679 our oh High um and just to prove that it is actually
00:29:19.840 working you know so you can see the value of our Target changing in
00:29:25.760 the okay so we don't need all this junk so
00:29:31.799 I'm just going to strip out the target uh we're not using it uh for now we're going to strip out the argument list as
00:29:37.120 well we can get rid of that um and so you know we're just sort of simplifying the at this point so we we don't need
00:29:43.600 any of that other junk that's just stuff that reinius gave to us
00:29:51.799 okay so there's our simplified so we've got an we've got a
00:29:58.519 Passa returning the uh all that's left for us to do is
00:30:04.399 to implement a code generator so I sort of umed andard for
00:30:09.840 for a while about you know what language to use uh in the code generation side of things um and I settled on C but you
00:30:16.399 could really use like for this sort of thing it's so simple you could use you know Ruby or or JavaScript or you know
00:30:22.440 you could probably generate you know the equivalent jbm bite code pretty easily as well um if you wanted to um but we're
00:30:28.640 going to use C so let's get started and and just go ahead and implement the code
00:30:39.799 generator okay so once again module s module code
00:30:45.480 generator I'm going to call it Class C just in case we decide to add extra backends later
00:30:52.279 on so our code generator takes a compile function which takes an as as input
00:30:58.880 so the input will be this
00:31:04.559 guy and we want to make sure that you know we're getting the correct node passed in as well so we'll match on uh
00:31:11.760 we'll match on the first item in the
00:31:17.440 array and make sure that it's it's a script node otherwise it's some sort of
00:31:22.600 error I won't bother implementing the error function for now uh so when we know we've got a
00:31:29.039 script uh we know our language only supports a single function call so you
00:31:34.679 know the child node here is going to be this call uh so what we're going to do is add
00:31:41.679 a method called compile call which is the second element in the
00:31:47.360 array so that guy and then down
00:31:52.600 here we'll add compile call it takes a call node once again we want to make sure that we do definitely have a call
00:31:59.320 node being passed in otherwise it's some sort of
00:32:06.519 error okay so we're generating C code so you know unlike Ruby you can't just dump
00:32:13.480 uh you know function calls in the mid in the middle of a source file um so what we're going to do is generate some uh
00:32:21.639 you know like a Prelude and a and a and a and a footer for the uh for the Target
00:32:27.919 source file so what we're going to do is say our
00:32:35.799 output starts with uh int
00:32:42.039 Main and then we compile our call and the
00:32:47.840 output ends with return
00:32:53.279 zero close bracket EF
00:32:58.440 all right so without actually generating the function call we should get a valid C file at this point so we'll open up
00:33:07.320 our driver program and this time we won't forget to require the code generator
00:33:22.480 module and we'll basically call the code generator directly
00:33:31.639 without and make sure we get some code at the end so I'm not really interested in these calls anymore I've I've taken
00:33:36.760 it all the way but the interesting thing you can see here is that we're taking you know our rural source source code
00:33:43.399 all the way through to our Target code so um you know this this is essentially
00:33:48.559 the pipeline the compiler pipeline in source code form um which you know is kind of
00:33:54.639 nice so if we run this guy what have I
00:34:01.000 done did I uh uh back in the uh code generator
00:34:08.200 right uh yep
00:34:13.720 gotcha okay so there's our Dum C file without any function
00:34:19.359 call uh so now we can append okay so remember that
00:34:26.159 our wherever that's gone now so remember our for our node for
00:34:34.119 call uh just contains the identifier of the method that we're trying to call uh
00:34:39.800 as its second element so we can say call one Open Bracket close bracket
00:34:47.599 semicolon sln and that's you know how we generate code for a call ASO uh in our target
00:34:55.720 language okay so we're calling ohim man here uh
00:35:03.680 but we haven't defined that you know in our code so what I'm going to
00:35:11.839 do is have main call
00:35:17.880 itself I'm glad you're all following along uh
00:35:33.359 and Kaboom so we have SE successfully written a a passer uh a compiler um
00:35:40.359 whose sole function is to crash itself um which is kind of neat um but you know
00:35:47.839 maybe we should do something a little bit more interesting uh so we're going to add a
00:35:53.680 bit of a runtime to our our uh hypothetical made crappy language um so
00:35:59.920 we'll Define uh a function called oh high and oh high we'll call Print
00:36:10.359 F and remember we're generating C code here so we need to escape that carriage
00:36:15.440 return now carriage return the backlash and include standard i.h and
00:36:21.640 all that sort of fun stuff okay
00:36:27.880 I did it again okay compile it again and run it
00:36:36.720 and there's our output so that's taking us all the way from you know uh our very
00:36:42.000 simple Beginnings in in the source code all the way to the end generating Target code and uh you know executing it so we
00:36:50.000 can take that you know maybe one step further and you know dump that out to a
00:36:57.200 file so a. out. C uh open up for
00:37:03.400 writing write code and then we'll use system to invoke
00:37:08.720 GCC because we're bad
00:37:14.560 people okay so now when we run our
00:37:21.000 compiler we get no output uh but we should have an A do
00:37:26.319 out which you know so you can kind of handwave and pretend like you're you're generating native code um which is kind
00:37:34.400 of fun um now so I've got I've got maybe
00:37:41.440 eight minutes left I think um so if anybody has any questions at this point
00:37:46.560 feel free to ask otherwise I can sort of you know expand upon what we've already got here and um keep going forwards yes
00:37:54.160 what would your recommendation be for writing sort of uh sophisticated scan so you have like more complex tokens you
00:38:02.720 want whatever is there a tool you use yeah yeah um I I mean I sort of chose to
00:38:09.119 write this one by hand because it was you know so simple that you know it wasn't worth bringing in a new tool but
00:38:16.079 um there's there's gems out there called I think like Lex R and Ruby Lex and
00:38:21.920 things like that um which let you do that kind of scanning with sort of less
00:38:27.040 effort and you know I'm I'm assuming you know uh more comp more complicated tokens as
00:38:33.280 well I think I did a like a gem search for uh Lex and that came up with a whole
00:38:38.520 bunch of useful stuff yep can you uh compose grammar so that you can have one kind of you know
00:38:46.160 language and then say have another SQL language over here and say this one includes that one so you can do in line
00:38:51.760 sequel sort of in the first one sure um I I think I've heard murmuring of things
00:38:58.400 that have been able to do that um you to the best of my knowledge you can't do it with something like rack
00:39:04.760 um I I mean I could be wrong I I I honestly don't know to to be honest uh but uh yeah I I do know that some
00:39:12.640 research has been done into it because it's the next obvious step right you know uh when you've got like SQL in your
00:39:19.079 in your code it just makes sense to kind of you know have it in there rather than you know encode it in the string or
00:39:24.960 something like that e only cont PR grammar so if you're changing your parsing rules based on where you are in
00:39:31.160 the program you can't do that with a context grammar so so like you can you can use multiple parsers so switch
00:39:38.160 between them based on what you're doing but you can't write like an ebnf that describes something where the rules are
00:39:44.160 different based on what the last
00:39:49.839 thing uh so uh could you repeat that um so
00:39:55.400 essentially um sorry what's your name GL Glenn so what Glenn was saying was um
00:40:01.200 that you you could potentially chunk it up into uh you know different different
00:40:07.319 sections of source code and use different passes for each of those sections um but you know comp composing
00:40:13.599 ebnf grammar doesn't really work does that kind of sum it up pretty much cool
00:40:19.040 yeah could you talk a little bit about the tools provided by rubinus to write your own implementation or your own
00:40:25.000 parser and own language sorry I can't quite could you could you talk a little bit about the tools provided by rubinus
00:40:31.280 to do some of that into pretty much your language uh honestly I I don't know a whole lot about rinus um I just kind of
00:40:37.440 Ed it here to you know as as an example uh to generate uh the you just you're
00:40:43.520 taking a peek under the hood at what reinius is doing internally it's not a tool for like writing your own gra right
00:40:52.440 right yeah I'm I'm not using reinius you know any as anything more just to than just to demonst what you know an as
00:40:58.560 looks like essentially y just on that topic the ridus does have numerous easy entry
00:41:05.520 points into building your own language on top of it the Ruby support Rini is just one plug in basically build using
00:41:13.119 its tooling it's uh rack is really cool but if you're doing this sort of thing you should definitely look at at the
00:41:18.760 rious tools as well the first day is EAS like the E day is
00:41:24.680 EAS cool no that's awesome I'll have to check that out uh yeah uh can you talk a little bit
00:41:31.960 about alternate
00:41:41.079 representations uh I I don't know what else uh Rous provides um but uh you know
00:41:47.880 another example is like a I think I mentioned like a concrete syntax tree where you know it doesn't it you know
00:41:53.680 all the extra syntax that's kind of included um uh you know you can directly
00:42:00.160 generate uh bu code or n code or whatever directly from within your passw
00:42:06.319 ifcl um off the top of my head I can't think of anything else
00:42:28.119 soorry how does that how does that differ from like an ASD or comprensive text you can have an ASD object
00:42:35.880 instances rather than arrays with a symbol as the first so like an array with a symbol as the first element sort
00:42:42.040 of like aor man's
00:42:47.920 tag yeah actually actually I kind of prefer that style um to be honest it's
00:42:53.040 just one of those things where you know it's so simple that it's just so much easier to to use
00:43:00.119 arise yeah you test sorry testing how do you test
00:43:06.680 something like this how do you test something like this um you write
00:43:12.000 programs um you know you can you can test all of the individual components in isolation so you can verify that your
00:43:19.119 your scanner breaks up tokens in a way that you would expect um you can verify
00:43:24.280 that given a set of tokens that your passer cons constructs uh an that you
00:43:29.359 would expect um and you know given uh you know some as you could verify that
00:43:36.240 you get the source code at the end that you would expect um so each each of those individual you know uh sections
00:43:42.040 can be you know tested in isolation sure the risk of chilling for Rus the uh the compiler test Suite in
00:43:49.480 Rus is a good example of how you can get from point you know from the starting
00:43:54.640 point to where you want to be you know check out the test from compiler and then it has mpec which is this easily
00:44:00.200 good strap uh sort of Cort of rpcs more or less and there test for that you can
00:44:05.640 see how once you those you can prove the next thing and move on okay so I I think the answer to that question was uh use
00:44:13.920 reinus Che out kinds of things it tests and it's the power test show you the
00:44:19.119 style yeah definitely I mean I want to look at rinus now so um any more questions oh yes you
00:44:27.240 built any languages of your own using this kind of process that you've open sourced uh yeah I have actually if uh I
00:44:33.960 mean if you're Keen to see a specific example come and see me afterwards and and I can give you some links um it
00:44:39.880 would be good to see a more concrete right right um and I'm I'm kind of just
00:44:44.920 about out of time but um yeah I I understand I I was kind of disappointed with the sort of simple example that you
00:44:51.920 know I I kind of had to stick with in the end mainly due to time constraints right um
00:44:57.960 so I guess are there any more questions before I sort of wind things up yeah sure just um from like an educational
00:45:05.559 standpoint do you know any tool that converts a formal grammar like
00:45:11.200 definition into
00:45:20.079 this you have a formal definition you want to see what that looks like that to
00:45:26.640 the sorry do do you mean like going back from backwards from a language to a
00:45:33.160 grammar kind of thing ormar to
00:45:38.480 language so basically just some examples kind of thing like some example grammars
00:45:43.640 yeah yeah um once once again if if you want to come and speak to me afterwards I can give you some links to to stuff
00:45:49.800 like that um
00:45:57.240 like you can take like back s form stuff and like something and get something out I don't know if there's a ruby thing
00:46:03.359 like that be cool Oh you mean like something that isn't like necessarily a DSL or yeah
00:46:10.280 like so if you just take like BNF format one of the standard grammar format there's nothing that can just take that
00:46:15.559 as input and spit out you know effectively the starting point for what you did okay sorry I think I think I
00:46:22.079 misunderstood your question um yeah I I personally don't know of anything that does exactly that
00:46:28.680 um the best I mean the best I could offer is sort of you know example Rack grammars or or Yak grammar or something
00:46:35.079 like that um all right so just winding up quickly um stuff I didn't cover more
00:46:41.440 complex grammars um being the most obvious one um and there's a whole bunch of other stuff um that kind of comes
00:46:47.359 into compilers but isn't sort of core to you know implementing a very basic
00:46:52.800 compiler if you're just sort of starting out or looking to experiment um so that's it uh once again we're hiring
Explore all talks recorded at RubyConf 2012
+46