00:00:17.000
good afternoon everyone uh thank you all for coming uh my name is uh Mike Edgar and today I'm going to be talking about
00:00:22.400
laser uh which is static analysis for Ruby in Ruby um so a little bit about me
00:00:28.160
um y just told you my name um the website is carbon. uh I am not Canadian I just like
00:00:35.040
the name um and Laser which is going to be talking about today was my undergraduate thesis uh Dartmouth
00:00:40.879
College and nowadays I just started as a Google engineer uh so what is laser laser is a
00:00:48.280
uh attempt at uh building real static analysis uh linting for Ruby and it has
00:00:53.760
two goals which are pretty actually quite distinct which is to find errors in your code or potential errors and
00:01:00.199
just general style issues uh a key part of that is that because Ruby has such a complex grammar you need a pretty
00:01:05.840
Advanced tool to find style issues so what's static analysis really about static analysis at its core is
00:01:12.880
finding properties of a program any kind of uh non-trivial properties without running it this has a lot of uses uh
00:01:18.920
biggest one the most uh important one today has been optimization um static analysis is why uh um all of our
00:01:26.840
mainstream programming languages have gotten so fast uh it's very useful for error Discovery this is where it's really becoming uh where it's been more
00:01:33.240
explored these days uh the clang team has an awesome static analyzer for CN C++ code and just for General
00:01:40.360
statistical analysis finding out things about large corpuses of uh code in a given language and static analysis is
00:01:47.479
for the most part quite hard um these are a few of the problems that we might try to solve using static analysis um uh
00:01:54.880
finding Loops finding what constants are in a program doing type inference which has been a you know some focus in Ruby
00:02:00.880
but not too much exceptions um you see this in Java actually it's part of the spec you need to do uh exception
00:02:06.920
inference um dead code elimination per inference these are all examples of static analysis and uh for the most part
00:02:13.560
they're all undecidable uh with you know without any Aid to the compiler um which
00:02:19.800
means that we are doomed to fail in terms of actually uh doing a perfect job
00:02:25.120
of all these things now a lot of you probably wondering why are we talking about static analysis and in Ruby it's
00:02:30.599
an incredibly Dynamic language anything can change at runtime um and of course
00:02:36.040
we have no types anywhere so how do we even know anything but as I said we already aren't going to figure
00:02:41.519
everything out all these problems are already undecidable so the real question and the question that I've been you know
00:02:48.120
focusing on for the last year is what can we do reliably what can we figure out and the answer is actually quite a
00:02:54.640
lot um my first motivating example uh the one that I first came up with about a year ago when I decided to start this
00:03:00.159
project was I wanted to write a tool that given a ruby method or Ruby program
00:03:05.239
find out for a given method whether it requires a block um when you call it
00:03:11.280
whether it will use one optionally well whether it will ignore one completely and when I actually formalized this I
00:03:16.959
found another uh group of methods that actually make no sense which I call block foolish ones those are the ones
00:03:22.879
that will try to yield if you don't give it a block and will not call a block if you give it one um those are an error
00:03:31.360
um so just to make it clear what I mean here's a block required method it doesn't always called The Block um but
00:03:38.680
you cannot call it safely as in you cannot call it without generating exception unless you provide a block um
00:03:45.439
a block ignored method just something that adds I mean yeah something that adds to numbers something that definitely does not use a block no
00:03:51.480
matter what block optional we know and love these These are especially common in our file operations all they're all
00:03:57.120
over the standard Library all of them numerable and 1.9 they're all black optional uh and this is just a quick
00:04:02.840
example um and this is a black foolish one which is just the inverse and this
00:04:08.920
seems incredibly foolish but that's why I call it that but you'll notice the only difference is that we screwed up
00:04:14.599
the uh the order uh of the guard on on a block optional
00:04:20.359
method so these uh are all very hard problems
00:04:27.160
um checking if a method uh determining if a method is block required is equivalent to the halting halting
00:04:32.199
problem and all the other ones are even harder um but the truth they are they
00:04:39.360
are but the truth is and before I go any further Microsoft's done a lot of great work on the halting problem um
00:04:47.759
and and they're all highly tractable um and I knew this from the start because
00:04:53.240
when we as Ruby programmers look at our me look at our methods we all know instantly whether uh which category
00:04:59.039
these methods follow into we can tell it doesn't take a lot of difficult work and that's because we use for one we use
00:05:05.880
Ruby syntax we use very common idioms and it's it's just very simple to
00:05:11.720
actually figure these things out for the mosta for uh for most cases now because
00:05:16.800
these are undecidable and unrecognizable you can always write a method that it can't figure out and that's okay
00:05:24.039
because um how are we going to do this yeah uh no one actually writes yield if
00:05:30.479
uh the halting problem is true for a given thing that just you don't write that code if you do then laser is not
00:05:36.120
the tool for you um and I'm sorry but the truth is real code is tractable for a lot of these problems and that's why
00:05:41.800
we write these tools that's why we're interested so when I started I found that two major challenges uh face Ruby
00:05:49.240
when it comes to this the first thing is metaprogramming metaprogramming is what makes Ruby so fun to use in my opinion
00:05:55.319
it's what makes it you know fascinating it lets you write incredible code but it also makes it very hard to uh for a tool
00:06:01.360
to look at a bit of code and see what it's doing for example a lot of uh active support and whatnot uses uh
00:06:07.440
string class eval just to create methods you know uh Alias method chain for example I don't know if the current
00:06:13.440
implementation uses string eval or defined method but either way it could work and a tool needs to be prepared for
00:06:18.720
both of these if you aren't then you're not going to know what methods are there and you're already as soon as you see a call to one of those methods you're
00:06:24.759
going to have no idea what's going on so that is something that you have to solve
00:06:30.160
and then the other thing is that we need flow analysis and this is something that hasn't really been done uh in a strong
00:06:35.840
General way for Ruby yet um most of our tools just look at the abstract syntax tree you can do flow analysis on a
00:06:41.479
syntax tree it's just a lot more cumbersome and a lot less fun um and as we saw in the in the block usages
00:06:47.960
example you need flow analysis you need to be able to tell which branch is taken in a given context um if you can't then
00:06:54.199
you're done from the beginning so for metaprogramming um I uh I struggled with
00:07:00.400
this for quite a bit I tried to find a way that didn't involve just running the code but the truth is that's pretty much what you got to do um and you may ask is
00:07:08.800
that still static analysis well it is static analysis unless you're as long as your program does something other than
00:07:15.360
create classes and methods um if as soon as you hit IO then um yeah as soon as
00:07:22.560
you hit IO then all of a sudden you know I we're not going to keep running the code we're going to switch to more traditional techniques but uh I note on
00:07:30.120
the slide here is this enough I'm not sure I'm not sure if this is actually enough for a full for example laser I'm
00:07:36.000
going say right now doesn't work on Rails uh probably going to be a while before it does um and I don't and so I
00:07:42.520
don't know if this approach yet is going to work just right for that but the truth is when you think about your programs how many methods do you add to
00:07:49.240
your program after a given point in time um I call that point in time the actual load time of your program up until then
00:07:55.919
you're still constructing classes through actual runtime code you're still adding method but after a certain point of time you
00:08:01.720
aren't creating new methods um method missing and the idiom of creating a method dynamically on that is an
00:08:06.879
interesting challenge one I haven't uh focused on yet I don't think that's actually too tough of a technical problem but in real world code we don't
00:08:13.960
add lots of methods in classes at runtime um and this is uh again for
00:08:19.240
applications testing Frameworks those are going to do some interesting things um but I'm focusing primarily on
00:08:25.520
applications uh so yeah um so the for the the first prong of lasers approach
00:08:32.039
is just run all the code you see in the program until you hit IO and while you do that capture all the global State
00:08:37.919
changes um and it does that using a pretty interesting technique we'll see in a sec but and then there's flow
00:08:44.000
analysis um you can do static analysis that is flow and sensitive um when I started doing research on this there
00:08:49.480
actually was uh an implementation of the type inference algorithm I use which we'll talk about in a sec which was flow
00:08:55.760
insensitive and that had some limitations rubies DW is uh for the most part flow insensitive um and that is
00:09:03.040
part of the reason why you know some people have expressed frustration with it is that it is limited in that sense
00:09:08.720
and flow sensitivity for the most part makes pretty much all analyses better it's just an extra signal in when you are trying to figure something out so
00:09:16.560
flow analysis really is Laser's major contribution um to Ruby and to the uh to the literature on this so far and we do
00:09:23.399
that by trying to build a control flow graph for Ruby coat we then put it in what's called Static single assignment
00:09:28.680
form which uh all the compiler Buffs in the room will know about um I did cons uh consider continuation passing style
00:09:35.360
but I had no idea what that would look like with Ruby so I went with static single assignment form and after we
00:09:40.920
build these graphs we all just do all the analyses on these graphs not everything laser does is on the graph
00:09:46.839
some of it is done before that really trivial stuff like seeing if you Shadow a variable with a block but the interesting stuff happens on the
00:09:53.519
graphs so what's the control flow graph it's just a representation of a program it's a graph of things called basic
00:09:59.680
blocks a basic block is a sequence of instructions which when run always starts at the beginning and always
00:10:05.959
terminates with the last instruction you can't jump into the middle of a basic block and you can't leave early this is a simple control flow
00:10:12.760
graph uh this uh yeah it for some abstract language whatever and as you
00:10:18.720
can see it starts on B1 goes into B2 this is just a simple Loop uh and at the end it exits you can't leave after the
00:10:26.440
assignment to comparison and that's just a fundamental part of basic box box um these are fundamental to how most
00:10:32.200
optimizing compilers work what instructions do we use um now
00:10:37.560
this is something that takes a lot of thought um the rinus people will tell you they put a lot of work into deciding
00:10:42.800
what bik code they use well we put a lot of work into I put uh a lot of thought
00:10:47.959
into uh picking the instruction set to use for laser um and your choices there are going to frame all of your analyses
00:10:55.040
and how much work you have to do to support those analyses lasers instruction set is is uh extremely
00:11:00.320
simple and sometimes that's a bit annoying but here's what it is um in those basic blocks uh which is what we
00:11:06.880
end up turning every single Ruby program into is just these basic blocks pointing to each other you can assign a local
00:11:12.440
variable you can call a method with Dynamic dispatch call Super raise return
00:11:18.880
jump or branch and uh while I don't support all of the Ruby language so far I've managed to P pigeon hole everything
00:11:25.279
into just those instructions um your uh
00:11:30.959
don't do throw yet uh that's uh one of the limitations that I'll be talking about later but yes catch and throw is
00:11:36.279
something I'm still trying to figure out but I do suspect it'll end up turning into a raisik uh situation so first
00:11:43.240
question is probably well there's a whole lot of features that don't seem to make sense in that context uh most obvious one is instance variables um
00:11:49.839
well I turn those into calls to instance variable and set while ignoring overrides um and there's two reasons for
00:11:56.120
that one I didn't want to add another instruction uh just because I didn't want to have to do more work later two
00:12:02.959
uh it forced me to also simultaneously solve the problem of instance variable set with a constant string and um for
00:12:11.120
example calls to that are very commonly generated through metaprogramming and sometimes they're just used to break
00:12:16.519
encapsulation so I wanted to be able to solve that and I figured by just doing this for the short term it would force
00:12:22.079
me to actually solve that problem I do the same thing for class variables as well keywords like Alias or defa or
00:12:28.760
class those are all method calls uh class turns into a lot of method calls uh and branches and whatnot blocks uh
00:12:36.120
create a a constant proc value and stick it in a temporary
00:12:41.279
variable and this doesn't really always work um there are some things that there aren't actually existing method calls
00:12:46.959
for in Ruby so I use a special class called Laser Magic um if see rubby folks
00:12:53.560
will know the Frozen VM object that's pretty much my frozen VM so for example when you set or uh get a global variable
00:13:00.560
it turns into a method called to Laser Magic um and I I mean maybe someday I'll
00:13:06.160
change this for now it's actually worked out so real difficulties um Ruby control
00:13:12.800
flow is complex and unpredictable exceptions come from pretty much every method call at least until we do further
00:13:17.839
analysis and figure out what things can raise what methods so when we first build this control flow graph every method call is going to possibly raise
00:13:23.920
an exception uh and also currently it does not model all control flows yet which is a big problem
00:13:29.680
um and this just results in very ugly graphs so here is the graph you can sort of see what's going on there for that
00:13:36.079
simple while loop um I've got it hooked up to graph viz to automatically spit these out in PNG which is pretty cool
00:13:42.839
but yeah uh and all those little red ones that are solid those are possible exceptions the dotted ones are ones that
00:13:48.680
has determined to not exist and this is just a two nested classes um so luckily uh I I'm the only
00:13:57.759
one who has to look at these while I'm Deb laser uh you don't have to worry about it but the size of them is a problem for Speed and efficiency all
00:14:04.000
right so static single assignment is a special form of a CFG uh every in this every variable is assigned only once
00:14:10.480
local variables and the compiler made Temporaries this requires rewriting the control flow graph renaming variables um
00:14:16.519
in Ruby it involves changing the structures a little bit in the situation of unassigned variables you also need to
00:14:22.360
add a new instruction type but you don't need to know about
00:14:27.639
that um just don't uh so yeah the key point is that it distinguishes variables with the same name that are assigned on
00:14:34.079
different code paths for every variable it calculates it also simultaneously calculates where it's used with respect
00:14:39.639
to flow which gives us the flow sensitivity which makes us powerful and in Ruby while I'm doing this it just so
00:14:45.839
happens that's the perfect time to uh pretty much the only time when you easily figure out when a v uh local
00:14:52.160
variable can be assigned um uh conditionally on One path so that's how
00:14:58.839
I actually surface that information all right so now we're ready to start analyzing stuff but first we
00:15:04.680
need to talk a little bit about how rubby how laser represents the Ruby object model um we pretty I pretty much
00:15:10.120
reimplemented the whole thing in laser there's a laser object a class module proc method um there's big overlap here
00:15:16.279
with rubinius and where I think uh there could be a lot of good work uh either me stealing everything they did or vice
00:15:21.759
versa we'll see how that works out um but an example of this is laser method there's one instance of these for every
00:15:28.120
method in the analyze program it implements the basic meth method functionality and it also has meth uh
00:15:33.920
has methods on the method for answering analytical questions yield type all you do is you type uh whatever method. yield
00:15:41.160
type hit enter and it'll tell you is it a block optional foolish whatever um return type for types for example and
00:15:48.639
what's interesting is because of this we can subclass laser method if we want to add more add more information now by
00:15:54.519
default uh a laser method just uses the built-in uh techniques that I'm going to going to illustrate to figure out the
00:16:00.440
answers to these questions but if you want and I do this for the send method in particular and allocate um I actually
00:16:08.440
subclass laser method and say this is actually a s method and this is really what happens when you call it that
00:16:13.959
should only be necessary for special things that are really low level in Ruby but we'll see all right so let's get some answers
00:16:20.560
the first stumbling block is dynamic dispatch with Dynamic types and yeah so we I decided everything's going to be a
00:16:26.880
method calling Ruby um now I actually have have to uh own up to that and figure out how to handle all these
00:16:32.480
different method calls so I use What's called the cartisian product algorithm this was uh uh the first paper on this I
00:16:39.279
think was in 91 or 92 for small talk and self a guy named adeson hope I'm saying that right and the real question is how
00:16:46.000
do I know what foo's type is here in the case of bar. run XYZ obviously that
00:16:51.040
depends on Bar's type we all know that but it also depends on X Y and Z's type
00:16:56.360
and by type I mean the concrete class here um I don't have any more advanced type system excuse me in laser besides
00:17:03.039
that so here's a quick simple example here we want to find out what uh the
00:17:08.880
bottommost X's type is uh in this horrible little line of code um and you
00:17:14.240
can see here it does depend on both the receiver of go and the argument um if it
00:17:20.039
goes to the double class it'll return an integer if it goes to the string class it'll return a string or if you put a
00:17:26.480
string into double. go it'll get a a but that doesn't happen I lied um so
00:17:32.160
yeah and the answer is to try all of the types the cartisian product of all the argument types um so that means every
00:17:39.160
possible combination of Bar's class X's class y's class all need to be tried and then you Union the result of these all
00:17:44.799
together now if one of these raises its type is empty so that actually handles the that um sort of handles that case uh
00:17:52.880
silently um and you just end up with all the possible actual return types this is really nice because for
00:17:59.440
one it's conservative um it is actually going to get every possible type it may
00:18:04.640
get some extras uh that aren't in the real program but it will get all the correct ones it only requires one pass
00:18:10.360
through the program an important part of this is that uh if we go back to this example every time we discover a new
00:18:16.120
type for X we have to go and recalculate the new parts of the cartisian product that's how you get the one pass part of
00:18:21.919
it uh it does require a finite number of types uh this is not uh I I tried to add
00:18:28.520
a tupal types to the language to laser and uh I was able to add some tupal types to it but um some instances
00:18:36.600
resulted in infinite types which kind of ruins everything um all right so lasers
00:18:43.000
implementation um for every single one of these combinations of the cartisian product it just duplicates the entire
00:18:48.159
control flow graph um the SSA uh the fact that it's SSA means that we're uh
00:18:53.760
flow sensitive um and this may be one of the first times someone's combined a uh
00:18:59.919
a flow sensitive cartisian product algorithm um come up with one I don't
00:19:05.280
really know um but yeah so you end up with this very nice way of uh expressing
00:19:10.400
the question laser method what is your return type for types what it does is it just goes and says well what's the
00:19:16.559
control flow graph for those types give me that's return type uh and then it checks it against expectations and
00:19:22.480
returns it now some interesting stuff happens in the control flow graph but um
00:19:27.760
it's the questions that we ask our classes and our methods they all go through the control flow graphs which lets us isolate all these things to very
00:19:35.080
relatively simple uh old algorithms um I don't use any algorithms in this that
00:19:40.360
are not at least 20 years old uh static single assignment comes from the late 80s out of IBM um control flow graphs go
00:19:47.000
back to like the 50s or 60s um so yeah if all of our analyses use cfgs um and
00:19:53.159
we need to parameterize the cfgs by the argument types and all of our uh all of
00:19:58.840
our analyses can be parameterized by by the argument types for example if we want to ask if something um yields uh we
00:20:06.400
can actually check if that depends on say the uh type of some argument that's a bad example that probably doesn't
00:20:11.520
actually happen too much but yeah that actually gives us more accuracy whenever we want to do an analysis it just makes
00:20:17.280
it a lot slower um only two analyses aren't currently parameterized by that right now and that's yield type because
00:20:22.720
that doesn't really make any sense and uh I forget the other all right so let's find some bugs I already showed you some
00:20:28.240
error disc actually back here when I said check return type against the expectations well well all that does is
00:20:36.440
uh it goes and says all right what is the expectation of uh what do I expect this method to return now you may be
00:20:42.760
thinking well that doesn't make any sense well in my opinion it does because if something's named 2 s I expect it to
00:20:48.080
return a string or a subass of that if it's to R I expect an array or subass of
00:20:53.480
that and if it's bang you really better return a Boolean um and and so these are
00:21:00.240
and so right here we just see it checks against that and adds an error of uh a warning level error it isn't a uh an um
00:21:08.600
a fatal err but yeah there's about 10 different levels of Precedence of errors in laser and so yeah it just tells you
00:21:14.240
hey that Pro that may be a mistake that you did there and it it took 16 lines of
00:21:19.279
code to add this feature to laser um and that's not much of an anomaly the hard part is building the control flow graph
00:21:25.480
uh it is built in pretty much one file which is over 2,500 lines long um and I
00:21:31.440
really got to work on that but that that's really the hard part uh after that you're writing algorithms to work with the general data structure with
00:21:37.960
very simple instructions so the analyses end up quite compact another one is unused methods this one took two days to
00:21:45.159
add reliably and I real and the Insight was just we're already calculating dispatch every time we see a method call
00:21:51.600
we're using this cartisian product algorithm to say okay what methods could this call what are the receiver types
00:21:57.039
and how about we just mark we're doing that why don't we just Mark every method we see uh that's really quite that's
00:22:02.440
pretty cheap compared to the cost of doing all this other stuff and at the end of The Run of laser we'll just see what isn't
00:22:07.880
used um and this pretty you can see right here that involved adding that one line method. Ben used I have it
00:22:13.799
highlighted it's kind of hard to see um and then this to the runner uh of
00:22:19.840
uh like the actual entry point to laser at the very end it just says you know all unused methods unused methods
00:22:25.279
iterates over every class and module every and then then over every method checks if it's had that Ben used flag
00:22:31.720
set on it and then returns it and then it creates a warning um and and because
00:22:38.120
we already model send and public sends uh dispatch um we already account for that that comes out to around 15 to 20
00:22:44.919
lines of code just to add this one feature um unused variables um now
00:22:50.720
there's some easy cases uh that are variables that are assigned but never read um and Ruby already does this it's
00:22:56.039
in the parser which is a little you know a bit of it's a place to do it um well I
00:23:02.400
no no anyway uh but it can't handle even
00:23:08.200
trivially dead code if you write if true or if you know uh something that only returns a string um it can't really uh
00:23:15.600
or yeah it can't tell which branches aren't taken um and also it can't handle Vari variables that are only used uh to
00:23:23.039
compute than unused variables um so for lasers approach we just go through every variable in the control flow graph um
00:23:30.640
find all the transitive uses and something's considered unused if uh if
00:23:35.760
it's not returned uh if not if none of its transitive uses are returned or uh
00:23:41.400
used in an impure method call so if you um add a bunch of uh numbers together and I can prove that they're numbers but
00:23:47.600
you never return any of those numbers then all of them are considered unused this also transparently works for method
00:23:53.159
arguments because those are just local variables uh as far as the control flow graph is concerned by lowering the
00:23:59.559
repres representation so far a lot of things just fall into place multiple assignments for example those are just
00:24:04.840
desugared into single assignments and branches and whatnot so yeah this turns into a dead
00:24:11.080
simple work list algorithm um and it's O of V which is the the number of
00:24:16.320
variables in the SSA control flow graph it's usually linear with the with the code size I haven't uh checked this with
00:24:23.200
my implementation um but for the most part in other languages it's shown to be uh Tri typically linear and yeah this
00:24:31.000
entire algorithm was just 55 lines of code on top of what was already there and we get flow sensitivity and dead
00:24:36.679
code does not affect uh how it sees this and so here's here's just a quick example I cheated and just ripped this
00:24:43.360
from the re the the rpcs for laser but yeah you see this here and it calculates
00:24:49.559
a constant which is just some number in like base five uh and checks it and Che
00:24:54.760
and pro and Laser here figures out one of those branches because it has some basic Purity analysis it can tell that
00:25:01.279
you know turning a number into a string is a pure operation and uh it trims off
00:25:06.720
the lines at seven 8 and 10 I'm sorry it
00:25:12.520
Tri it prunes the El Branch so it knows those lines are dead and you see that the second and third spec check that it
00:25:19.240
also sees that the variable D on line four it calculates it it even assigns it to a variable J later but then J does
00:25:26.240
nothing so it manages to kill all those very cool um I should probably add
00:25:32.399
something so it checks if x is unused because it isn't but I don't have a spec for that all right so block usage this
00:25:38.080
was the whole motiva this is what I really started out to do and obviously it's going to be harder than what I've already shown you um unused variables
00:25:44.000
are cool but I really didn't have a thesis if I could find unused variables in Ruby I mean I did I just would like a
00:25:49.600
better one um so why do we care well for one we can find errors um if you give a
00:25:54.960
block to a method that's never going to use it so you you probably are a bit confused about what you think that method is going to do it's not going to
00:26:01.279
kill you it's not going to you know raise an exception but you probably messed up um that's a silent error that
00:26:06.840
laser can find the other case is when you don't provide a method that very well could use it you may not even notice the first time because like that
00:26:13.240
silly Rand example I gave before it may not raise the first time you do it it may be that only at you know after
00:26:19.679
you've deployed your application oh you hit the edge case where it needs a block so that that that's a big thing finding
00:26:26.960
errors is a big thing and that's what lasers focused on you know and block foolish methods you already you already screwed that up um another idea is that
00:26:34.279
we could autogenerate documentation you know if we get really if I get really good at this then maybe I could hook this up to yard and uh have it
00:26:41.000
automatically spit out um whether it yields or not that'd be pretty cool and
00:26:46.039
maybe someday we can help the the um the compiler writers and the audience that
00:26:51.159
um work so tirelessly to help us all out so as Ruby programmers how do we
00:26:56.840
tell if a method needs its block during its execution when we look at a when we look at a method how do we tell that
00:27:02.240
well if it calls yield um either not in a block given uh type conditional and
00:27:07.960
without catching a local jump error or one of its super types yeah that's when
00:27:13.000
I when I first discussed this idea someone said oh what about catching local jump air well I had to account for that um you do um yeah exactly yeah I
00:27:22.399
mean there's again there's things that it can't do that's proven um but uh the other thing you could introduce the
00:27:27.840
explicit it you know Amper sand block variable in the method signature and then call it you could use procon colon
00:27:33.679
new to get get a hold of it you could then or you could not even call it right
00:27:38.960
there but you could delegate it to uh a method that does call it um and as long
00:27:44.279
as none of these are guarded by block given defined yield any user Define oh
00:27:50.440
yes uh yeah you mean like Alias method and the Alias
00:27:55.640
keyword oh yeah yeah yeah yeah absolutely um yeah and user defined aliases to those methods um but yeah or
00:28:04.120
you could do you can define an alias to block given um and there's also did
00:28:09.240
anyone here know that iterator question mark is a uh is an alias to block given
00:28:14.360
yeah that that's also one uh I did not know that um but it is
00:28:20.240
so um yeah so uh or someone could grab a hold of one of those and do is that not
00:28:25.840
n so there's a whole lot of cases that you could do to construct one of these and they all exist in even in the Ruby
00:28:31.519
core standard library though I don't think uh it um some of those do but the main ones do so we need constant
00:28:37.919
propagation and we have constant propagation um we have the ability to prune those branches as I showed you
00:28:44.000
earlier during the undefined uh the uh not sorry the unused variable case so
00:28:50.080
what we do is uh there's a little bit of uh uh support in the control flow graph Builder to make this easier to analyze
00:28:56.440
but for the most part we just copy the parameterized uh CFG twice there's no reason I mean there are reasons to
00:29:02.559
parameterize this but it's uh on for the most part it's not worth it and it would save a lot of memory and time um on one
00:29:08.880
we assume there's no block we pass a fake in Block argument to the constant propagation system um and on the other
00:29:16.320
we assume we tell it there is a block we tell it always returned true for Block given and all those things pass in a
00:29:21.799
dummy laser proc and then we track the aliases looking for attempted calls through by aliases here I mean Alias is
00:29:29.080
through the method so if you capture proc. new into X and then set y to X or
00:29:34.720
call a method with it it's a very conservative analysis but it tries to make sure it can see if you've possibly
00:29:41.000
called that block um if we see delegation if you call a method with that as the block we recursively analyze
00:29:48.279
uh at that point we then ask that method hey I'm passed you my block are you going to use it because I need to
00:29:54.120
know and this whole thing was 79 lines of code um there's uh 19 lines of a
00:30:00.320
conservative Alias analysis I mentioned it here because I don't use it for anything else yet um but and it handles
00:30:06.080
almost any uh real Ruby code you throw at it um again there's always cases that it won't be able to handle including
00:30:11.799
real world cases I'm sure um but this was one I mean yeah this is turning unrecognizable that's harder than the
00:30:17.559
halting problem um Ruby's not as hard to analyze as we've been led to believe um
00:30:23.519
and a big part of this is the syntax and the idioms for example block given uh everyone uses block given um or everyone
00:30:31.320
and uh that that's just one example yield by being an actual keyword makes it a lot easier for the control flow
00:30:37.720
graph Builder to to create separate code paths for that to make it easier for later
00:30:43.519
analyses um so yeah now simulation we talked about this at the at the start I said that we were just going to run all
00:30:49.399
the top level code um cfgs are representations of a program they're clearly executable so when laser sees
00:30:56.159
top level code it runs it uh and it gives up after it runs for either a long time either because you wrote an infinite Loop before while you're
00:31:03.039
defining classes or um or just that you wrote really long code and Laser doesn't want to waste your time um or when it
00:31:10.399
hits IO okay we turning you off um when it gives up it switches to all the
00:31:15.679
analyses that I've already told you even for the top level code so this is literally interpreting
00:31:21.760
Ruby in Ruby and you're probably thinking that must be really really really slow um but you're wrong cuz it's
00:31:27.720
way way slower uh I've got a I've got a bug actually in
00:31:33.320
the GitHub uh repository for someone who just like wrote a top level Loop to like
00:31:38.440
100,000 and it ran for like 20 minutes and they just like gave up um and that that is a problem I'll talk about it in
00:31:44.600
a sec but uh but yeah it's really slow uh luckily it only uses it again defining classes not uh does not take
00:31:52.159
that long that graph looked big but it's actually not that many instructions and this gives us some benefits while we're
00:31:58.120
doing this it tracks which edges of the CFG are actually taken whenever something's simulated uh and during
00:32:03.279
constant propagation it checks to see what things are clearly never run this uh makes analysis later faster and it
00:32:09.639
could uh possibly optimize startup time someday um there's something called the theory of uh partial evaluation which
00:32:15.519
you may want to look up um it's way too headyy to talk about now but it's all about this idea of partially running a
00:32:22.320
program and figuring out what things we can uh ahead of time ignore and a large part of the overhead of loading 300
00:32:30.000
classes into a ruby program there's tons of checks in there that can be completely all lighted so that's another
00:32:35.519
possible path for this there's not enough time to keep talking about different analyses there's some other cool things I hit most of the good stuff
00:32:41.919
um but it's time to talk for a bit about the challenges that laser faces uh the biggest challenge is that
00:32:48.080
uh there's a bust factor of one and I have a job now and I don't write Ruby um so it's maybe a little less than one um
00:32:57.320
and there's also a bit of technical debt that makes this a bit harder for other people to jump in um I mean I'm proud
00:33:03.919
that I got an academic project actually uh the source out there that doesn't usually happen um but that said there's
00:33:10.159
uh there's good corners and there's bad Corners um and you know I mentioned the control photograph Builder that's 2500
00:33:15.519
lines and it actually needs to grow um so that whole thing needs to be
00:33:20.760
modularized and made better and some of the internal apis I showed you some code Snippets before and I'm not terribly
00:33:26.120
proud of how they look because they could actually be cleaned up quite a bit to be more clear the arguments could be better and that also makes it harder for
00:33:32.559
people to jump in it's hard to analyze a control flow graph when there's so many different things that you can do with it and it isn't really
00:33:39.159
clear also laser understands Ruby it does not know c um I've been developing it using uh actually 1
00:33:46.760
193 uh and that's mainly because I've fixed about a dozen Ripper bugs that are actually now going to be in 193 making
00:33:53.159
it good enough to actually yeah
00:33:58.440
um I actually yeah uh laser does use Ripper it's good enough to actually you know power a
00:34:05.960
um sort of Ruby implementation I think it I think it's finally good enough that it could actually power Ruby implementation um and so right now it
00:34:13.679
only works with Ruby 1.9.3 my continuous build server just can't actually run laser yet um until I get 193 on it um
00:34:22.079
but yeah thing is lots of MRI is in see um it's laser does awesome when it sees just Ruby calls but eventually you're
00:34:28.000
going to get to C calls if you're using uh um yarv or MRI and the current solution is uh to have stubs for all of
00:34:35.760
these uh and not all because I don't have that much time but for lots of these methods with annotations and yeah
00:34:42.280
there are this is just a few lines from string. RB in my in the laser repository
00:34:49.200
and there's there there's a couple thousand lines of those and that's just not going to work uh the um and the
00:34:55.240
obvious solution is to use rubinus libraries I really really like this and I think this is really where it's got to go um I haven't done it yet there's uh
00:35:02.839
they need a little bit of extra support to analyze well because rubinus does use well a lot of the type checks that
00:35:08.960
happen in uh normal C Ruby um they expose in real Ruby and those are going
00:35:14.040
to need some extra help from laser to really to understand as accurately as it needs to to analyze these appropriately
00:35:20.599
there's no technical barriers to this though uh it really is you know a week or two of work and we could I I could
00:35:27.040
drop in you know 95% of rubinus libraries and um drastically improve
00:35:34.160
both the state I mean get rid of all these and I mean these aren't even completely correct
00:35:43.760
yes Diamond Back Ruby I believe they're using o camel but uh yeah uh I'll talk
00:35:53.119
about yeah well I did I I the thing is I mean that was for static I mean that's a good that's a very good idea um they
00:36:00.319
that was I didn't even think of that talk to me after I like that all right but yeah so uh there's also some good
00:36:06.400
news which is that uh a lot of the things that laser could do um I mean for example it already finds guaranteed
00:36:12.240
rises in your methods uh for example if you try to call a method that doesn't exist and it can actually prove that um
00:36:17.640
incorrect area type errors stuff like that um and I figured I'd put only one meme in this entire thing but it doesn't
00:36:24.560
tell you uh and some of that is uh for one like some method calls are supposed to always raise if you called the method
00:36:30.880
raise I'm not going to yell at you for always raising in that method because you called raise or you may write a
00:36:37.160
rapper function that does some formatting for a string then calls raise well I shouldn't yell at you for calling that um I haven't figured out exactly
00:36:43.800
what that hero stick is supposed to be um and it's not going to be just as good as did you call Rays um aity errors
00:36:51.400
should be reported uh I don't actually think that would be that much work I just haven't gotten around to it but
00:36:56.520
that is something that does work um the other thing is laser is pretty slow um we'll never get rubyist to Value
00:37:03.560
static analysis if it's slow um the compiler writers aren't aren't going to like it if they have to add if um none
00:37:10.880
of their users want to use the static analysis that can make things faster because it turns their build uh to a
00:37:16.160
couple hours um it's like it's it's a lot like testing people don't run their tests if they take half an hour to run
00:37:23.599
uh laser isn't that bad I mean uh I haven't tested I haven't had it it isn't
00:37:29.040
good enough to test on enormous code bases yet um but I suspect it is too slow um and a good a good amount of the
00:37:36.160
low hanging fruit has been picked all the algorithms used are um at most o of N squared uh there's only one that's
00:37:43.520
over squared um and that one can't really be made too much better um so it's not they aren't really algorithmic
00:37:49.319
problems I'm using algorithms that much smarter people came up with 20 years ago um so there are a few things the CFG
00:37:55.640
encoding well because I use so few instru CS that could be made a lot more compact that could improve things a lot
00:38:01.280
um it would also mean analyses could be made smarter and avoid doing extra work um I only have I I say laser is Ruby in
00:38:08.280
Ruby there are two parts of it right now that are not in Ruby um one is Ripper um
00:38:13.760
and that may actually change soon I've been working on a new parsing thing but we'll see uh and the other is that I
00:38:19.839
replaced the basic block class I talked about basic blocks earlier that got Rewritten in C++ because it just made
00:38:24.920
things so much faster it can it is still still fallback in Ruby but yeah um the
00:38:30.680
other thing is that a lot of this can be cached on disk the whole standard library for example say I drop in all of rubin's libraries well all of that stuff
00:38:38.240
can be analyzed once and just saved on disk um serialize I try the main problem
00:38:43.640
right now is I use too many uh hashes with default procs and those can't be serialized um so yeah the one m and and
00:38:52.520
so everything I've talked about so far they aren't really technical issues um the main thing right now is is that the
00:38:57.920
control flow graph is not conservative now conservative we hear that mostly uh about um you know Mr garbage collector
00:39:04.359
and being conservative is a bad thing there we all you know rubinius has the precise garbage collector well that's because garbage collection is not
00:39:10.400
undecidable um and building a accurate control flow a precise control flow graph is um unless you have a very bad
00:39:17.920
language um a non-t complete one um so yeah there's three categories of solutions precise ones which is rubinius
00:39:24.760
uh garbage collector and there's conservative things like the typical optimizing compiler CFG and the rfgc and
00:39:31.319
then there's wrong things which is what we have um it's true there's there's no
00:39:36.920
wiggle room there are things that laser will be just wrong about um and that sucks uh and the main problem is and
00:39:45.720
what it comes down to is that not every possible control flow flow through a ruby program is represented by edges in the graph um the easy solution is to add
00:39:52.880
an edge from every block to every basic block but that makes it useless and slower so we're not going to go with that
00:39:58.280
uh and the real solution is just to actually figure out just uh how to actually model these uh these
00:40:05.760
problems um as minimally as possible currently addresses catch and throw this was brought up earlier um right now
00:40:13.000
laser does model exceptions uh quite I'm pretty sure we don't do I don't do retry
00:40:18.400
but uh I haven't done that yet but it does do all the other uh uh other stuff with exceptions really well it even
00:40:23.680
handles weird cases like next and through loops and all that stuff um um and ensure but catch and throw adds
00:40:29.560
another level to the equation because any method can uh also catch or throw
00:40:34.599
and I don't know if that's going to be too hard or not but fibers and continuations not too worried about that
00:40:40.119
just yet but those are control flow constructs that have to be considered um prop cases uh bind yeah uh bindings make
00:40:48.400
things horrible um because you yeah I mean you could pass a uh uh just Callum
00:40:54.359
method with a block and that method could change a local variable well a control flow graap needs to be able to see when it local variables change or
00:40:59.720
things are going to get messed up that may not be solvable um we'll see luckily
00:41:05.079
most people don't do such a horrible thing um and the big part of laser is that I need you guys to not do the worst
00:41:10.960
things in the world um the I mean uh um
00:41:16.119
none of the compiler writers are happy because they are laughing because they have to support the worst things in the world um when you uh call a method with
00:41:25.160
with a block and they have to be ready for the fact that you're going to change all the local variables in that scope um
00:41:31.040
and that inhibits enormous optimizations that could be made um so I mean this control flow graph is
00:41:37.560
is isn't conservative yet that's not the worst thing in the world because I'm not a compiler I'm not going to generate the wrong program for you uh
00:41:45.520
but I so yeah I mean we want static I want static analysis to succeed that's
00:41:51.880
uh the more spous warnings you get or the lack of warnings that could be given that depends on accuracy that needs to
00:41:57.760
be improved the other thing is the compiler writers do want these do want these analyses I'll put words in their mouth and say that you know a lot of
00:42:03.920
these things and there's and the things I've talked about mostly so far they're not that big a deal for compiler writers but there's a lot more that could be
00:42:09.800
done Escape analyses Alias analyses um and we all want those things to be made
00:42:15.920
that much faster so yeah lasers had a lot of successes it's the first for into deep static analysis for Ruby it does a
00:42:22.280
lot more than I've even talked about today I didn't even talk about the style stuff uh it can do some pretty cool style things and it'll write your code
00:42:27.720
to fix them too um and I've just uh I'm working on a new par a new parsing
00:42:33.960
library that exposes the full par tree including white spaces comments all that stuff so that it can be even better at
00:42:40.359
that and so that it can uh know exact column numbers and give Clank style error reporting that's another big thing
00:42:46.960
um it can it can it models exception frequency and type inference could be a little bit better at that with respect
00:42:52.200
to rescue Clauses but um it does successfully model that it handle it can see when you have aity airs all that
00:42:58.280
stuff too um does Conant propagation and Branch printing it may not help programs today but that's okay so laser is a year
00:43:05.599
old uh but extremely young I truly believe it is vital to Ruby's growth I think this is a huge opportunity and I
00:43:11.680
think this is really what ruby needs uh in many ways to help stay relevant uh in a world where people are starting to
00:43:17.559
look for other tools um I think that this can really make things uh make Ruby a much more attractive to have this sort
00:43:24.559
of support uh in a year or two I could could see see laser actually supporting rails um it would need an extra plugin
00:43:31.640
uh not going to lie on that one but it could it could absolutely do that um how much time we
00:43:38.400
got my phone oh yeah I turned off my phone because it people keep kept tweeting me 254 how much time that oh
00:43:46.960
some guy tweeted while he was here that's okay thanks all right so here's just a quick
00:43:52.200
thing here let's take a look at uh V uh temp. RB um got two classes here and I'm this
00:43:59.319
is this shows off just a couple of features uh not too many but
00:44:05.240
um but yeah so I mean I I don't know if people are laughing at the horrible code or people
00:44:11.200
tweeting but um but yeah so yeah this is something that uh I I mean I wanted to find an easy way to I could have used an
00:44:17.800
if or something but this um yeah it uses gets. 2i to pick something out of there
00:44:23.040
and so recv is going to be either a foo a bar hello or nil because you could
00:44:28.839
return 10 or something like that um and then it calls do stuff with three so if
00:44:34.240
it goes to Foo it'll return three values if it goes to um bar return three values
00:44:41.640
we know it's not going to return two because we're good where we programmers and we know that three is an integer so there's nothing wrong with this code so
00:44:48.720
let's run and uh most of this time it actually is taking right now uh in my defense is
00:44:54.559
it loading the standard library and analyzing it uh because as I said I'm not cashing that yet uh but it should
00:45:00.319
only take about five more seconds 10 seconds oh there we go okay so there a couple things uh yeah um there's a blank
00:45:06.920
line at the end there's a little bit of dead code I showed you yeah you saw the dead code um oh string can be wrapped in
00:45:13.760
double my bad um I didn't even do that on purpose uh so now let's make this
00:45:21.400
well let's remove one h no we'll make an extra one so yeah now we're updating our
00:45:26.480
bar AP uh to return four things and uh the line below that's consuming that API has not
00:45:32.920
been updated it isn't prepared to have a fourth value and it's only unpacking that tupal into three things um so let's
00:45:40.400
see if laser how's it doing when you say F do youan aut fixes yes yes those the the
00:45:48.359
style thing I mean yeah it only even tries to fix style things but yeah there
00:45:53.760
we go um see that second to last line there uh r h s value being discarded um
00:45:59.119
that's it seeing that um we have the potential here to have uh four things on
00:46:05.640
the right hand side and only unpacking into three things you can see that it's analyzing all of the possible types and
00:46:11.160
let's see if it hopefully works when I do this and if it only return two which means one of those things is going to be
00:46:17.079
nil two good things one thing is that it'll see that it's nil and if you accidentally try to call something on it
00:46:22.720
then it'll notice the exception which as I said it won't tell you about um and it'll also let you know that that left
00:46:29.040
hand side is never assigned defaults to know very cool yes your com
00:46:34.960
about what uh I mean yeah I mean it's pretty it's
00:46:46.520
pretty it's extremely minimal but I like it so uh no but the good thing is that
00:46:52.680
there's already I'm sorry there you go I like that uh no but most importantly is that
00:46:59.920
laser does allow both uh tool wide settings for which things you do want to be warned about um and it also allows uh
00:47:08.480
it doesn't have file wide settings yet but it does have line settings you can add a comment to a line and say ignore
00:47:13.680
in this case uh let's see it didn't print the string that you
00:47:19.880
actually specify but I'm probably over time anyway
00:47:29.280
um because as I said it doesn't tell you what uh remember the the scumbag steep thing yeah uh that's part of how it
00:47:35.880
needs to it needs to be better at uh the other thing is that I mean because the
00:47:41.119
type analysis is cons type analysis is conservative it's going to pick up on things that aren't even in the code and
00:47:48.079
that's part of an example where it may actually have picked up a type that never will happen so I mean that yeah um
00:47:55.880
it doesn't even tell you about guaranteed exceptions it finds yet the possible exceptions that's going to be
00:48:01.359
more interesting to figure out where that goes but yeah all right so I I'm I think that's pretty much it uh any
00:48:07.280
questions
00:48:19.760
yes okay cool
00:48:34.760
cool schemes had a lot of great work on it the one one of the most fascinating projects is a a project called Stalin um
00:48:41.800
which is called that because it brutally optimizes uh and it has very slow compilation times but it generates
00:48:47.599
incredibly fast scheme but anyway yeah next anyone I thought I saw a couple other hands
00:48:53.240
yes uh re oh yeah yeah I mean part of the reason I got interested in this
00:48:58.319
because I was interested in code quality tools and yeah there's things like reek there's Rudy there's all those different
00:49:03.680
tools for uh for that sort of thing um I think that I mean for for things like
00:49:09.119
finding coat smells and stuff like that I think that laser could be a big benefit to that a big part of what it does I mean this finds you know if you
00:49:15.040
do assignment in an if it'll let you know about that um I think there's a a
00:49:20.440
lot of um what's the business word Synergy there we go a lot of synergy for a lot
00:49:26.440
of projects out there with what I've been doing so any else all right cool
00:49:31.839
thank you