00:00:15.160
has everybody had a good Ruby comp so far
00:00:21.320
yes okay so I know it's uh Saturday afternoon um so you know I'll try to
00:00:27.080
keep it light um so yesterday you know hi I my name's Tom uh
00:00:36.399
I work for neic uh I uh work on the mobile monitoring and Linux Ser monitoring products uh we're hiring so
00:00:43.520
if you're looking for a job uh please feel free to spam us with your RS um I'm here today to talk to you
00:00:50.199
about rapid programming Lang rapid programming language prototypes using Ruby and rack um which is essentially
00:00:57.760
you know compiler construction using Ruby uh now before I get started I just want
00:01:02.960
to make it clear that I'm talking about rack and not rack um so rack with a K is web server
00:01:09.000
middleware so I'm not going to be speaking about uh TH or unicorn or Sinatra or rails or anything like that
00:01:16.280
um I'm talking about rack which is past generator for uh Ruby okay so when I first sort of put
00:01:24.159
the proposal for this presentation together um I kind of had it in my head that my target audience would sort of be
00:01:29.640
people who have you know maybe toyed with uh compiler Construction in the past and had some experience using uh
00:01:36.880
yak and bison and and all that sort of stuff but mat's uh keynote on Thursday
00:01:43.079
sort of made me realize that you know maybe that was a bit too ambitious and you know I I should sort of scale it
00:01:48.520
back a little bit to uh sort of give people who had that experience um some exposure to uh you know compiler
00:01:55.200
construction fundamentals um and so the first part of this presentation probably the first 10 15 minutes is uh basically uh an intro
00:02:04.079
to compiler construction basic Theory um the architecture behind a a generalized
00:02:10.599
compiler uh and and this this is great information because you know this this stuff can be
00:02:17.640
applied to pretty much any you know modern uh uh compiler in you know
00:02:24.239
compiler that you that you care to look at so if you go digging inside the uh Ruby compiler these Tech these Concepts
00:02:32.680
apply uh likewise python PHP even um all of this stuff you know it sort of
00:02:38.680
applies across the board once I've sort of covered off the fundamentals uh I'm going to dive into uh rack and you know
00:02:46.800
when you might want to use it instead of yak or bison uh and I'm going to finish up with some live coding so when I say
00:02:53.280
finish up uh probably 2/3 of this presentation is is live coding so I'm going to sort of try and Implement a a
00:02:59.840
compiler here on the stage in front of you um and you all get to laugh at my horrible horrible Ruby
00:03:07.519
code okay so first up what is a compiler a compiler takes code in a source language
00:03:15.560
uh then magic happens and then out the other end uh the target code comes out
00:03:20.840
and so the way I like to think of a compiler is is sort of as a uh as a pipeline uh with several sort of steps
00:03:26.599
along the way so there's several Transformations applied to your source code along the way before it gets dumped out as you know your your target
00:03:34.439
code uh and the first step in that process is uh the scanner so once again
00:03:39.920
if you think of your compiler as a pipeline the scanner is like the first component in that pipeline that sort of
00:03:45.000
takes your raw source code and tries to identify uh you know interesting tokens
00:03:51.439
within that source code like this is an if this is an if keyword this is a bracket uh this is a string semicolon
00:03:58.959
that sort of thing uh so you know it might strip out all the white space and comments and all that sort of stuff um
00:04:04.599
so it's really just sort of chunking up your input um to make it easier to digest um so those tokens uh you know
00:04:11.920
sort of form a stream or an array or however you want to sort of conceptualize it and that feeds into the
00:04:17.440
passer the passer takes those tokens and sort of tries to make make sense of of that raw input uh using a set of rules
00:04:25.680
and I've got up on the on the projector there you know it it outputs an intermediate representation um which is so you generic is
00:04:32.520
meaningless um so I've given the example of an abstract syntax tree which you may have heard of uh if you haven't I'll
00:04:39.039
sort of cover that more detail in a moment um so I I'm going to treat the rest of
00:04:45.800
this presentation as though the passer uh always outputs an abstract syntax tree but it's probably important to note
00:04:51.800
that it doesn't necessarily have to do that it can even you know directly generate bite code uh now passes you can you can
00:04:58.080
manually write passes so it's actually pretty trivial to to write like a top down recursive descent Passa uh by hand
00:05:05.639
uh but generally you know what you tend to see out in the wild is people will use things called pass generators uh to
00:05:12.440
take a a grammar and automatically generate a pass of for the the language
00:05:18.280
described in the grammar and so a grammar as you might suspect describes the complete syntax of
00:05:25.160
a programming language um has everybody sort of heard of uh NF sort of maybe no
00:05:32.240
kind of uh that's that's okay it it doesn't really matter too much um you'll sort of see an example later on in the presentation very very simple
00:05:38.960
example uh and essentially variants of ebnf are used for uh the dsls that
00:05:45.280
serves inputs to Passa generators uh so an example of a pass generator would be Yak or bison or even
00:05:55.319
rack uh so before I mentioned an abstract syntax tree um so this is really just you know sort
00:06:03.440
of giving you more detail about what an abstract syntax tree actually is um so it's an inmemory logical representation
00:06:10.120
of your Source program um so it's just a data structure basically so you start with this
00:06:15.800
raw uh you know blob of source code it passes through the scanner the scanner converts it into Tokens The Tokens are
00:06:22.720
fed into the passer the passor structures those tokens into uh
00:06:28.039
an um so it's it's abstract in the sense that uh we emit some detail as well so
00:06:34.280
by the time we get to an abstract syntax tree we don't care that our if statement uh requires brackets around the test
00:06:41.479
expression for example uh contrast to a concrete syntax tree which does include that that sort
00:06:47.479
of detail Okay so we've got our or you know
00:06:53.880
quote unquote intermediate representation uh and now we want to generate code uh from that intermediate
00:06:59.919
representation so in the case of an we Traverse the tree and generate the appropriate code for each node along the
00:07:07.160
way so if you're working with uh an if node for example you would the the code generator would look at the if node and
00:07:13.280
go aha this is an if node so I know to generate uh code for the test expression
00:07:19.280
associated with that if node uh and I know to generate code to test whether that expression is true and I know I
00:07:26.919
need to generate a jump to jump over the body of the the if statement if uh uh if
00:07:32.599
the expression evaluates to false and I need to generate the code for the body itself and so there you know that's all
00:07:39.400
done sort of recursively like the the code gener generator can recurse recurse into the various nodes in
00:07:48.960
the okay so I kind of spoke about the compiler essentially being a uh a
00:07:55.240
pipeline and this is kind of the the diagram you know that sort of shows what's going on so the ra code raw
00:08:01.400
source code feeds into the scanner the scanner produces Tokens The Tokens feed into the passer the passer constructs an
00:08:08.400
the feeds into the code generator and the code generator finally dumps out your target
00:08:15.120
code okay so that kind of concludes the sort of you know 10,000 foot view of you
00:08:20.479
know compiler Construction in general uh so now for a little bit about rack so
00:08:25.840
rack as you probably might have guessed by now is a pass generator uh written by Tender
00:08:31.560
Love uh you write your grammar using uh a sort of DSL which I'll sort of
00:08:37.760
demonstrate in a minute you run uh rack over your grammar uh and that will generate Ruby
00:08:44.560
code that can pass the language that you describe in your Lang in your grammar so why would you use rack over
00:08:53.920
something like yakob bison um well I I tend to use rack when I'm sort of uh
00:09:00.720
experimenting you know with crazy ideas or trying to trying to learn basically I'm I'm relatively new to you know the
00:09:06.720
whole compiler construction thing myself and so whenever I want to spin up a new project or something like that
00:09:12.800
essentially something that I'm probably going to throw away uh I get really frustrated when I have to you know do the magical dance uh required to get uh
00:09:20.720
Yak bison flex and c and everything working together uh not to mention you
00:09:25.920
know the fact that Ruby gives you hashes and arrays and all that sort of stuff
00:09:31.120
for free uh so really for me it's it's about you know getting from uh 0 to 100
00:09:38.480
as quickly as possible okay this is suck
00:09:46.480
langang um suck langang is essentially a DSL for calling functions um that
00:09:52.560
doesn't take any arguments at this point so this is a really really simple example um if I have time towards the
00:09:58.399
end which it looks like I um I'll try to uh sort of expand on this and uh make it more
00:10:05.240
interesting so up there you can see we've got so this is an example of ebnf if you've never seen it before so very
00:10:11.399
very simple example of ebnf um so on the left hand side you can see we've got uh a call uh it's called a
00:10:19.000
non-terminal um and on the right hand side we've got a sequence of tokens
00:10:24.120
so ID is essentially a token the bracket is a token the closing bracket is a
00:10:29.800
token uh which is all fed into the passer from the scanner um so I've also got a note up
00:10:36.120
there you know that big scary reject basically just means you know an ID is basically you know string that starts
00:10:42.120
with an alphabetical character or an underscore followed by zero or more alpha numeric or underscore
00:10:49.320
characters okay so this is an example of suck langang it's exciting
00:10:57.320
right okay so if you recall that the scanner takes raw source code generates
00:11:03.040
tokens passes into the passer that means there needs to be some sort of agreement between the scanner
00:11:10.279
and the passer as to how that data kind of comes in and so rack kind of sets the
00:11:16.279
sets the rule so to speak by sort of saying okay well I expect Two element sequences uh from the scanner and so the
00:11:22.800
first element in you know the Two element sequence that represents a token is the type of the token
00:11:29.920
uh and the second value uh the second element in the token is the value and what I mean by that is if you're passing
00:11:36.079
uh say string uh if you're scanning sorry if you're scanning a string uh you
00:11:41.519
would say okay well I I see a quote there's the contents of the string and another quote
00:11:47.279
okay so this is a string token so the type of the token is a string but the value of the token would be the contents
00:11:52.920
of the string um and that's important to sort of uh expose uh this that sort of like
00:12:00.600
you know that sort of information to the passer when you're constructing like the for
00:12:06.480
example okay so I think at this point uh we can sort of get started with the fun
00:12:13.000
part which is you know writing a compiler a very very simple
00:12:18.920
compiler okay so oh you can't see
00:12:37.959
uh okay so we'll open up lib sulang scanner.
00:12:45.000
RB scanner whoops the scanner will live in a module called
00:12:50.360
sulang scanner and we don't really tend to need
00:12:55.639
uh to keep State around when we're writing a scanner so I'm just making this a module level function uh scan so
00:13:02.920
the scanner takes raw source code AS
00:13:07.959
input and emits an array of tokens as
00:13:14.480
output so if we now look back at our grammar uh way back
00:13:21.240
here we can see that we've basically got basically got three tokens so we've got an ID opening bracket and a closing
00:13:27.880
bracket so let's start out by passing that ID token
00:13:36.320
so while our input so we want to consume all the input uh which we'll do by just
00:13:41.760
sort of stripping the front off the the string as we're scanning it uh
00:13:50.639
so when the start of the string contains alphabetical character or an
00:13:57.320
underscore followed by uh 0 or n alpha numeric characters or
00:14:05.279
underscores so our language isn't Unicode compliant unfortunately um okay so we need to
00:14:12.480
generate a token so this is an ID token we want to set the value of this token
00:14:19.040
to the pattern that we just matched here uh oh what am I
00:14:27.639
doing so when we match that pattern this magic variable sort of just evaluates to
00:14:32.720
the text that was matched by that Rex Okay so we've now matched an ID but
00:14:39.279
we need to strip off the front of our source string so I'll use slice
00:14:49.959
oops um and that's all we need to do to sort of scan a token um so we also need
00:14:56.279
to handle the open and close brackets
00:15:02.759
so once again when the string starts with an opening bracket or a closing
00:15:11.000
bracket we can just use the token itself so in in the case of a bracket there's
00:15:16.199
no value associated with a bracket right it's just punctu punctuation um so we don't really care about this value on
00:15:22.839
the right hand side it's just kind of a throwaway thing um I'm just throwing it in there you know more by habit than
00:15:28.120
anything else um and since we're only sort of scanning a single token in this case we can sort
00:15:33.360
of just use something like that to strip off the the first character in the in the
00:15:39.360
string okay now we don't want to accept anything else as input so anything else
00:15:44.880
that we see causes an
00:15:56.079
error and we'll just you know flat out
00:16:01.959
exit and that should pretty much be our scanner uh so let's write a quick little
00:16:07.759
driver program um
00:16:13.319
Ruby oh started writing C all right we'll include lib suang
00:16:23.240
scanner uh so the scanner emits an array of tokens
00:16:32.199
and we'll just pass the you know uh sample Source program in
00:16:42.160
directly all
00:16:50.959
righty all right so let's see how it goes okay so we're we're sort of
00:16:57.639
chunking that you know raw source code up into into tokens and this is in a format that that rack can kind of start
00:17:04.799
to work with um and sort of just to demonstrate you know at this at this point we're not applying any strict
00:17:12.039
rules to to the input so you know it doesn't have to be a legal uh program so
00:17:17.880
long as it contains characters that the scanner kind of knows about so you know that sort of stuff works as
00:17:26.160
well um but if we you know if we for example put in a character that the
00:17:31.919
scanner doesn't like then it will complain that sort of
00:17:37.679
thing um so so the scanner in a in a sense sort of enforces um some level of
00:17:42.840
of syntax but you know it's more in terms of you know weird characters and and stuff that you don't expect to find
00:17:48.320
on the input at all uh so let's let's modify this so that we can pass in
00:17:55.559
uh programs from the command line uh now this isn't going to work
00:18:04.039
because uh you'll get an implicit uh carriage return uh sorry Line Feed so we
00:18:09.760
try to pass that in Unown input blah uh so we'll go back to our
00:18:16.520
scanner and we'll strip out whites space so when the string starts with Whit
00:18:25.480
space get rid of it
00:18:33.400
okay so oh jeez we're making good time that's excellent uh okay so now that we've got
00:18:41.080
our scanner uh if you think back to that diagram uh the next thing we need to do
00:18:46.880
is Implement a passer and the way we're going to do that is by writing a rack grammar um so the rack grammar is
00:18:52.880
basically you know it's sort of a sort of pseudo ruby/ um
00:18:59.720
ebnf is DSL thing um so I guess to start with make sure you've got rack installed
00:19:05.960
um once you've done that you're pretty much ready to go so what we need to do next is ADD lib suck langang pass.
00:19:15.360
rack now pass. rack whoops yep class uh pass. rack uh starts out like this so
00:19:22.880
the name of the module followed by the passer class
00:19:30.159
uh so we want this class uh which we're going to call passer to be in this module which we're
00:19:37.880
going to call suckl so when rack generates uh the pass
00:19:43.600
class it's going to dump it out in inside the the cyang module so this rule section is where we
00:19:51.799
write our uh crazy ebnf DSL thing uh so
00:19:57.200
if we go back to our grammar we can almost use this exactly as is so
00:20:02.960
we can say okay well the call non- terminal consists of so in ebnf we use
00:20:10.080
this weird you know colon colon equals operator but uh in the in the rrar and yak and Bice and all that sort of thing
00:20:16.960
uh you only need to use a colon uh followed by ID Open Bracket
00:20:22.039
close bracket and we can literally just write it in like that
00:20:29.559
so once again just to make it clear these tokens are coming from the scanner
00:20:34.960
so these These are the the the token types in your from your from tokens from
00:20:41.000
your scanner um okay now the other thing we want to do is um you know make it easy
00:20:47.320
to call the passer without sort of you know uh messing around with it too much
00:20:52.720
so we'll use this thing called an intersection and the intersection will basically you know it basically lets you
00:20:57.919
add whatever code you want to the inside of the passer class
00:21:02.960
so pass and we take you know a sequence of tokens as
00:21:09.080
input uh and we'll just save those
00:21:14.840
tokens and then we'll call the the magical rack function uh well rack meth uh rack method called do pass uh which
00:21:22.240
will actually do the the heavy lifting of of passing uh our tokens uh now we
00:21:27.960
need to tell rack how to get those tokens so we need to define a method called Next token uh which is something that that
00:21:34.880
rack is internally aware of it'll it'll call this on your behalf so all we need to do is shift the
00:21:41.880
next token off the front of our token array and so we'll just keep doing that until we run out of tokens and you know
00:21:47.880
eventually return nil and rack will go oh I'm finished I don't need to do any anything
00:21:53.240
more okay and that's pretty much the uh passer so
00:21:59.039
now we need to make this kind of evil command line invocation
00:22:04.400
here um so basically all this is doing is running uh rack via
00:22:11.520
RBM um specifying the output file so we want to store the output file in libang passer. RB and the input file you know
00:22:19.960
libang passer. rack uh okay so let's do that
00:22:29.679
okay seems to have worked uh actually we'll we'll open that up and edit it lib suck langang pass.
00:22:40.080
RB okay so this is uh the Ruby code that that uh Yak uh Yak rack generates on
00:22:46.880
your behalf um you can see our uh intersection here so this is dumped in
00:22:52.559
you know literally you know in place uh and then everything else is
00:22:58.120
sort of you know it's it's generated based upon the the rules that we specify in our uh in our rule section so over
00:23:05.880
here um you don't have to worry too much about what this actually does this is the point of using a passive generator you don't have to worry about you know
00:23:11.919
the magic going on on the
00:23:17.400
inside uh okay now since I'm going to have to do that a few times
00:23:23.200
um I'm going to dump it into a rake file
00:23:34.720
so when we uh sorry I can't type and think at this
00:23:41.840
point um so we want to generate uh suing pass.
00:23:47.960
RB and it depends on pass. R so just execute
00:23:55.240
that and we'll set the default task to lib suang pass.
00:24:01.919
RB so when we want when we run rake uh that'll execute uh rack on our
00:24:11.400
behalf Okay so we've implemented a Passa but what does it do at this point so we
00:24:17.279
we want to construct an during the pass but at this point like R rack doesn't
00:24:22.640
rack doesn't build an as for you it it sort of doesn't Define um the output of of the past phase is so much it kind of
00:24:29.480
leaves it up to you to to to do that um so just for science uh let's open up our
00:24:37.399
driver program again and we're not returning an as at the moment but let's see what R does by
00:24:44.120
default so we'll pass in our
00:24:50.480
tokens and see what comes with that
00:24:58.440
oh forgot to require
00:25:06.600
it okay so the interesting thing is from our pass here we get this string oh high
00:25:12.360
which sort of corresponds with the identifier value um now if we go into our grammar
00:25:21.360
once again uh you can see that you know the
00:25:28.360
the IID token is the the first value uh in you know well the first token uh in
00:25:35.840
in the sequence of tokens that make up a call um and what you can actually do in rack is specify you know custom actions
00:25:43.080
and the default action for you know handling this rule is to set the result
00:25:48.679
equal to the value of the first token um
00:25:55.399
so this and this or functionally equivalent it's the
00:26:00.760
same thing um so just to prove that uh I'll run rake again because I've modified the grammar so we need to
00:26:06.399
regenerate the Passa uh and we'll run that example program again it's the exact same
00:26:15.440
thing uh now we need to construct an so so what does an look like I mean I've
00:26:21.640
spoken abouts in you know sort of a really abstract sense um and I guess the
00:26:27.159
best way to sort of demonstrate what an might look like is to sort of ask
00:26:32.640
something like rinus um so we'll say okay well oops RBM shell oh I'm already
00:26:39.520
using here so we'll say RBX compile s
00:26:45.200
e and we'll pass in you know because this is sort of legal Ruby syntax as
00:26:50.360
well we can use reinius to sort of give us a hint as to what we're doing here okay so this is giving us a SE
00:26:59.799
which is you know sort of a representation of an as uh as a series of nessed arrays and
00:27:08.039
the node type is the first value in the array and you know sort of children of
00:27:15.080
that node kind of follow and so here we've got a call and a call has a Target
00:27:23.279
so you you can specify a Target on which to call a method right so if you if you call
00:27:29.000
fu. High it sets the target of the method to
00:27:34.600
F so hopefully that makes sense it's not really important for what we're doing here but uh yeah so then then we get this
00:27:43.159
symbol oh high which is you know the the method that we're trying to call followed by uh you know an empty
00:27:49.320
argument list so let's do something
00:27:54.519
evil and blatantly copy this
00:28:03.480
ah trackpads are the bane of my existence uh okay
00:28:10.159
so we we we can really dump anything in here okay so just to make it clear you can do like 1 2 3 4 1 2 3 and set that
00:28:18.039
to the result of your pass and you'll get that as the output so here we're
00:28:23.399
going to literally just dump in that from uh
00:28:35.440
reinus and you know without modifying it at the moment run it again okay we get
00:28:41.360
our now we don't want to hard code uh that oh high uh we want to set that to
00:28:47.799
the value of the function that we're trying to call back here
00:28:54.080
so we'll set it we'll set the the name of the method that we're trying
00:28:59.640
to call to the value of that ID
00:29:04.960
token uh and I've modified the grammar regenerate the Ruby code okay so there's
00:29:12.679
our oh High um and just to prove that it is actually
00:29:19.840
working you know so you can see the value of our Target changing in
00:29:25.760
the okay so we don't need all this junk so
00:29:31.799
I'm just going to strip out the target uh we're not using it uh for now we're going to strip out the argument list as
00:29:37.120
well we can get rid of that um and so you know we're just sort of simplifying the at this point so we we don't need
00:29:43.600
any of that other junk that's just stuff that reinius gave to us
00:29:51.799
okay so there's our simplified so we've got an we've got a
00:29:58.519
Passa returning the uh all that's left for us to do is
00:30:04.399
to implement a code generator so I sort of umed andard for
00:30:09.840
for a while about you know what language to use uh in the code generation side of things um and I settled on C but you
00:30:16.399
could really use like for this sort of thing it's so simple you could use you know Ruby or or JavaScript or you know
00:30:22.440
you could probably generate you know the equivalent jbm bite code pretty easily as well um if you wanted to um but we're
00:30:28.640
going to use C so let's get started and and just go ahead and implement the code
00:30:39.799
generator okay so once again module s module code
00:30:45.480
generator I'm going to call it Class C just in case we decide to add extra backends later
00:30:52.279
on so our code generator takes a compile function which takes an as as input
00:30:58.880
so the input will be this
00:31:04.559
guy and we want to make sure that you know we're getting the correct node passed in as well so we'll match on uh
00:31:11.760
we'll match on the first item in the
00:31:17.440
array and make sure that it's it's a script node otherwise it's some sort of
00:31:22.600
error I won't bother implementing the error function for now uh so when we know we've got a
00:31:29.039
script uh we know our language only supports a single function call so you
00:31:34.679
know the child node here is going to be this call uh so what we're going to do is add
00:31:41.679
a method called compile call which is the second element in the
00:31:47.360
array so that guy and then down
00:31:52.600
here we'll add compile call it takes a call node once again we want to make sure that we do definitely have a call
00:31:59.320
node being passed in otherwise it's some sort of
00:32:06.519
error okay so we're generating C code so you know unlike Ruby you can't just dump
00:32:13.480
uh you know function calls in the mid in the middle of a source file um so what we're going to do is generate some uh
00:32:21.639
you know like a Prelude and a and a and a and a footer for the uh for the Target
00:32:27.919
source file so what we're going to do is say our
00:32:35.799
output starts with uh int
00:32:42.039
Main and then we compile our call and the
00:32:47.840
output ends with return
00:32:53.279
zero close bracket EF
00:32:58.440
all right so without actually generating the function call we should get a valid C file at this point so we'll open up
00:33:07.320
our driver program and this time we won't forget to require the code generator
00:33:22.480
module and we'll basically call the code generator directly
00:33:31.639
without and make sure we get some code at the end so I'm not really interested in these calls anymore I've I've taken
00:33:36.760
it all the way but the interesting thing you can see here is that we're taking you know our rural source source code
00:33:43.399
all the way through to our Target code so um you know this this is essentially
00:33:48.559
the pipeline the compiler pipeline in source code form um which you know is kind of
00:33:54.639
nice so if we run this guy what have I
00:34:01.000
done did I uh uh back in the uh code generator
00:34:08.200
right uh yep
00:34:13.720
gotcha okay so there's our Dum C file without any function
00:34:19.359
call uh so now we can append okay so remember that
00:34:26.159
our wherever that's gone now so remember our for our node for
00:34:34.119
call uh just contains the identifier of the method that we're trying to call uh
00:34:39.800
as its second element so we can say call one Open Bracket close bracket
00:34:47.599
semicolon sln and that's you know how we generate code for a call ASO uh in our target
00:34:55.720
language okay so we're calling ohim man here uh
00:35:03.680
but we haven't defined that you know in our code so what I'm going to
00:35:11.839
do is have main call
00:35:17.880
itself I'm glad you're all following along uh
00:35:33.359
and Kaboom so we have SE successfully written a a passer uh a compiler um
00:35:40.359
whose sole function is to crash itself um which is kind of neat um but you know
00:35:47.839
maybe we should do something a little bit more interesting uh so we're going to add a
00:35:53.680
bit of a runtime to our our uh hypothetical made crappy language um so
00:35:59.920
we'll Define uh a function called oh high and oh high we'll call Print
00:36:10.359
F and remember we're generating C code here so we need to escape that carriage
00:36:15.440
return now carriage return the backlash and include standard i.h and
00:36:21.640
all that sort of fun stuff okay
00:36:27.880
I did it again okay compile it again and run it
00:36:36.720
and there's our output so that's taking us all the way from you know uh our very
00:36:42.000
simple Beginnings in in the source code all the way to the end generating Target code and uh you know executing it so we
00:36:50.000
can take that you know maybe one step further and you know dump that out to a
00:36:57.200
file so a. out. C uh open up for
00:37:03.400
writing write code and then we'll use system to invoke
00:37:08.720
GCC because we're bad
00:37:14.560
people okay so now when we run our
00:37:21.000
compiler we get no output uh but we should have an A do
00:37:26.319
out which you know so you can kind of handwave and pretend like you're you're generating native code um which is kind
00:37:34.400
of fun um now so I've got I've got maybe
00:37:41.440
eight minutes left I think um so if anybody has any questions at this point
00:37:46.560
feel free to ask otherwise I can sort of you know expand upon what we've already got here and um keep going forwards yes
00:37:54.160
what would your recommendation be for writing sort of uh sophisticated scan so you have like more complex tokens you
00:38:02.720
want whatever is there a tool you use yeah yeah um I I mean I sort of chose to
00:38:09.119
write this one by hand because it was you know so simple that you know it wasn't worth bringing in a new tool but
00:38:16.079
um there's there's gems out there called I think like Lex R and Ruby Lex and
00:38:21.920
things like that um which let you do that kind of scanning with sort of less
00:38:27.040
effort and you know I'm I'm assuming you know uh more comp more complicated tokens as
00:38:33.280
well I think I did a like a gem search for uh Lex and that came up with a whole
00:38:38.520
bunch of useful stuff yep can you uh compose grammar so that you can have one kind of you know
00:38:46.160
language and then say have another SQL language over here and say this one includes that one so you can do in line
00:38:51.760
sequel sort of in the first one sure um I I think I've heard murmuring of things
00:38:58.400
that have been able to do that um you to the best of my knowledge you can't do it with something like rack
00:39:04.760
um I I mean I could be wrong I I I honestly don't know to to be honest uh but uh yeah I I do know that some
00:39:12.640
research has been done into it because it's the next obvious step right you know uh when you've got like SQL in your
00:39:19.079
in your code it just makes sense to kind of you know have it in there rather than you know encode it in the string or
00:39:24.960
something like that e only cont PR grammar so if you're changing your parsing rules based on where you are in
00:39:31.160
the program you can't do that with a context grammar so so like you can you can use multiple parsers so switch
00:39:38.160
between them based on what you're doing but you can't write like an ebnf that describes something where the rules are
00:39:44.160
different based on what the last
00:39:49.839
thing uh so uh could you repeat that um so
00:39:55.400
essentially um sorry what's your name GL Glenn so what Glenn was saying was um
00:40:01.200
that you you could potentially chunk it up into uh you know different different
00:40:07.319
sections of source code and use different passes for each of those sections um but you know comp composing
00:40:13.599
ebnf grammar doesn't really work does that kind of sum it up pretty much cool
00:40:19.040
yeah could you talk a little bit about the tools provided by rubinus to write your own implementation or your own
00:40:25.000
parser and own language sorry I can't quite could you could you talk a little bit about the tools provided by rubinus
00:40:31.280
to do some of that into pretty much your language uh honestly I I don't know a whole lot about rinus um I just kind of
00:40:37.440
Ed it here to you know as as an example uh to generate uh the you just you're
00:40:43.520
taking a peek under the hood at what reinius is doing internally it's not a tool for like writing your own gra right
00:40:52.440
right yeah I'm I'm not using reinius you know any as anything more just to than just to demonst what you know an as
00:40:58.560
looks like essentially y just on that topic the ridus does have numerous easy entry
00:41:05.520
points into building your own language on top of it the Ruby support Rini is just one plug in basically build using
00:41:13.119
its tooling it's uh rack is really cool but if you're doing this sort of thing you should definitely look at at the
00:41:18.760
rious tools as well the first day is EAS like the E day is
00:41:24.680
EAS cool no that's awesome I'll have to check that out uh yeah uh can you talk a little bit
00:41:31.960
about alternate
00:41:41.079
representations uh I I don't know what else uh Rous provides um but uh you know
00:41:47.880
another example is like a I think I mentioned like a concrete syntax tree where you know it doesn't it you know
00:41:53.680
all the extra syntax that's kind of included um uh you know you can directly
00:42:00.160
generate uh bu code or n code or whatever directly from within your passw
00:42:06.319
ifcl um off the top of my head I can't think of anything else
00:42:28.119
soorry how does that how does that differ from like an ASD or comprensive text you can have an ASD object
00:42:35.880
instances rather than arrays with a symbol as the first so like an array with a symbol as the first element sort
00:42:42.040
of like aor man's
00:42:47.920
tag yeah actually actually I kind of prefer that style um to be honest it's
00:42:53.040
just one of those things where you know it's so simple that it's just so much easier to to use
00:43:00.119
arise yeah you test sorry testing how do you test
00:43:06.680
something like this how do you test something like this um you write
00:43:12.000
programs um you know you can you can test all of the individual components in isolation so you can verify that your
00:43:19.119
your scanner breaks up tokens in a way that you would expect um you can verify
00:43:24.280
that given a set of tokens that your passer cons constructs uh an that you
00:43:29.359
would expect um and you know given uh you know some as you could verify that
00:43:36.240
you get the source code at the end that you would expect um so each each of those individual you know uh sections
00:43:42.040
can be you know tested in isolation sure the risk of chilling for Rus the uh the compiler test Suite in
00:43:49.480
Rus is a good example of how you can get from point you know from the starting
00:43:54.640
point to where you want to be you know check out the test from compiler and then it has mpec which is this easily
00:44:00.200
good strap uh sort of Cort of rpcs more or less and there test for that you can
00:44:05.640
see how once you those you can prove the next thing and move on okay so I I think the answer to that question was uh use
00:44:13.920
reinus Che out kinds of things it tests and it's the power test show you the
00:44:19.119
style yeah definitely I mean I want to look at rinus now so um any more questions oh yes you
00:44:27.240
built any languages of your own using this kind of process that you've open sourced uh yeah I have actually if uh I
00:44:33.960
mean if you're Keen to see a specific example come and see me afterwards and and I can give you some links um it
00:44:39.880
would be good to see a more concrete right right um and I'm I'm kind of just
00:44:44.920
about out of time but um yeah I I understand I I was kind of disappointed with the sort of simple example that you
00:44:51.920
know I I kind of had to stick with in the end mainly due to time constraints right um
00:44:57.960
so I guess are there any more questions before I sort of wind things up yeah sure just um from like an educational
00:45:05.559
standpoint do you know any tool that converts a formal grammar like
00:45:11.200
definition into
00:45:20.079
this you have a formal definition you want to see what that looks like that to
00:45:26.640
the sorry do do you mean like going back from backwards from a language to a
00:45:33.160
grammar kind of thing ormar to
00:45:38.480
language so basically just some examples kind of thing like some example grammars
00:45:43.640
yeah yeah um once once again if if you want to come and speak to me afterwards I can give you some links to to stuff
00:45:49.800
like that um
00:45:57.240
like you can take like back s form stuff and like something and get something out I don't know if there's a ruby thing
00:46:03.359
like that be cool Oh you mean like something that isn't like necessarily a DSL or yeah
00:46:10.280
like so if you just take like BNF format one of the standard grammar format there's nothing that can just take that
00:46:15.559
as input and spit out you know effectively the starting point for what you did okay sorry I think I think I
00:46:22.079
misunderstood your question um yeah I I personally don't know of anything that does exactly that
00:46:28.680
um the best I mean the best I could offer is sort of you know example Rack grammars or or Yak grammar or something
00:46:35.079
like that um all right so just winding up quickly um stuff I didn't cover more
00:46:41.440
complex grammars um being the most obvious one um and there's a whole bunch of other stuff um that kind of comes
00:46:47.359
into compilers but isn't sort of core to you know implementing a very basic
00:46:52.800
compiler if you're just sort of starting out or looking to experiment um so that's it uh once again we're hiring