00:00:15.400
all right did everybody uh make it to Jim wri's great keynote this morning all right good because uh your
00:00:22.160
mind needs to be kind of limber for uh the adventure we're about to set off
00:00:28.720
on okay so uh I titled my talk there and back again or how I set out to Benchmark
00:00:35.200
an algorithm and ended up fixing Ruby because that's basically exactly what I did and I would like to share the story
00:00:42.480
of that adventure with you okay so first of all my name is
00:00:47.840
Joshua I live and work from home in Ankara Turkey which is actually a very
00:00:53.600
lovely place and I work for a company that used to be called ELC but we're actually in
00:01:00.640
the process uh right in the middle of the process of rebranding is Burnside digital so uh if you go home from Ruby
00:01:07.960
conf and want to find out more information Burnside digital is uh the place you want to go
00:01:13.759
looking all right so like any good story this one comes with a warning this is an advanced
00:01:20.119
presentation while I do not expect you to be an expert on the inner workings of Ruby I do hope that you have at least
00:01:27.079
some familiarity with pointers memory allocation memory Pages if you're prone to motion sickness
00:01:34.200
or become nauseous at the S of heximal memory addresses this talk may not be for you and um if you want to take a
00:01:41.200
moment now to check out one of the others I would not be offended um I realized when I proposed this talk that
00:01:47.079
it was a little ambitious uh but I figured what the heck maybe uh maybe somebody else out there has as much fun
00:01:53.439
doing this kind of stuff as I do okay I I do want to make sure that
00:02:00.399
everybody gets a little something out of this talk though so I want to before we jump into the actual adventure story uh
00:02:06.159
do a little bit of talking about um contributing to open source uh Ruby or
00:02:11.840
any open source project really I'm I'm going to be talking in the context of Ruby but this applies uh pretty
00:02:17.519
ubiquitously so I've been hanging out on the Ruby cor mailing list for a while and there's a
00:02:24.160
question that pops up occasionally you get some uh gung-ho programmer discovers Ruby or you know gets really excited
00:02:31.400
wants to contribute how can I contribute to Ruby they say right let me let me at those bugs well one thing that I found
00:02:39.800
is the best way that you can contribute to Ruby or any open source project for that matter use
00:02:46.440
it don't go looking for the bugs use Ruby and the bugs will find
00:02:53.480
you and that's actually a a sort of a more efficient way of going about contributing to a project because after
00:03:00.480
all we write software not to write software we write software to run software so the more you run it the
00:03:06.120
better it is going to be now let's say you've been using Ruby
00:03:12.080
and you happen to stumble across a bug what next well I want to introduce you
00:03:17.440
to the four RS of bug reporting these are things that you should do if you want to be a good open-source
00:03:24.280
citizen so you want to reproduce report reduce and regress
00:03:31.879
okay let's look at those a little closer so this is an often overlooked uh step
00:03:38.599
in in reporting bugs but the first thing that you need to do if you think you found a bug is make sure you found a
00:03:46.120
bug a bug that happens once is not a bug that you can really go about
00:03:51.760
fixing so the first thing you want to do is run the code again and see if it
00:03:58.400
crashes or if whatever the bug happens to be reproduces can you make it occur every time now not every bug is 100%
00:04:06.439
reproducible once you get into things like threading and memory allocation maybe it's only 60% 50% it should be
00:04:13.360
more than zero though if you can't reproduce it you're not going to be able to fix it in the process of reproducing it
00:04:20.680
there are a number of things that you need to be aware of Ruby can be uh influenced by a number of different
00:04:26.280
configuration files an irbr RC a gem RC very ious other RC's the environment
00:04:32.240
that you're running it in the particular Shell Shell configurations environment variables all of these things can have
00:04:37.960
subtle effects on the way that Ruby is running so make sure when you're trying to reproduce this that you pay attention
00:04:44.880
especially uh lately a lot of people have been using rvm and rbn and these
00:04:50.800
also do a lot to the environment that may alter it in such a way that uh it
00:04:56.160
affects whether or not the bug is going to occur the best thing that you can do if you're trying to reproduce a bug is
00:05:02.280
ask a friend get one of your buddies on chat or whatever and say hey I ran this
00:05:07.720
code I think there's a bug can you run it do you see the bug if two people see the bug it's
00:05:13.960
probably a bug okay so once you're sure you have a bug The Next Step report it
00:05:20.240
again something that is uh quite frequently overlooked you know you have newbie programmers come into something
00:05:27.720
like Ruby or a library that out there and they're using it and they say oh that looks like a bug but I don't know
00:05:34.960
somebody else must have found it right I don't need to report it no if you find
00:05:40.800
the bug report the bug duplicate bug reports are not a problem
00:05:46.840
actually H some people might complain about them but honestly duplicate bug reports are a good thing they're a good
00:05:51.919
thing for two reasons one if a particular bug is reported and again and
00:05:58.120
again then that means means that people are obviously encountering this bug this is an important bug this is how open
00:06:04.000
source projects prioritize their fixes the bugs that get reported more often are fixed
00:06:10.440
sooner the other reason duplicate bug reports are good is because not everybody is going to hit the bug the same way and one of the most challenging
00:06:18.080
things about fixing any bug is narrowing down the specific set of criteria that are needed to trigger the bug so if you
00:06:26.080
have multiple bug reports you can look for the commonalities between them to help zero in on what the actual bug
00:06:33.199
is okay so Ruby has a bug tracker bugs. Ruby d.org when you're reporting it make sure
00:06:40.759
you include all of the steps it's really good if you can actually just attach a little script and say run this script
00:06:47.479
you'll see the bug that's great make sure you make a note of what Ruby version you're using it should go
00:06:54.400
without saying that not not every Ruby version has the same set of bugs so if you found bug in say 193 it may already
00:07:02.599
be fixed in 2.0 so make sure that you report the version and if the bug that you're
00:07:08.560
reporting happens to be a crash make sure you attach the crash log
00:07:13.759
now if you're on OS 10 oh and by the way I should mention that this talk presumes you are on OS 10 a number of the tools
00:07:20.240
that we're will'll encounter later are on OS 10 there are equivalance on other operating systems but just so that
00:07:26.879
caveat is out there um so on OS 10 there's this app console. apppp uh if
00:07:32.720
you don't find your crash log in your terminal you can go to console app and there's this little section it's kind of
00:07:38.879
hard to read but uh it says user diagnostic reports and there you will see all the various programs that have
00:07:44.360
crashed and their crash logs and you should see one that says Ruby uncore and then it'll give a date so you just look
00:07:50.039
for the one that corresponds to the date that the crash actually happened and you get the full output of everything that happened there
00:07:57.479
okay next step now those first two steps those are kind of the bare minimum for
00:08:03.199
reporting a bug right make sure it can be reproduced and then actually go and report
00:08:08.720
it now if you want to be a really good open source citizen the best thing you can do is to reduce the
00:08:15.639
bug so if you find if you find a bug running a rails app right and you go and
00:08:21.199
you say well here's a 100K line rails app and I was running it and I did these 30 things and then I found a bug oh okay
00:08:29.840
that's a bug but it's not going to be horribly easy to reproduce right so try
00:08:35.200
as much as you can to eliminate the need to include any extraneous libraries try and zero down on just the code that's
00:08:42.399
crashing simple scripts are the best thing that you can have five or fewer lines I know that sounds really really
00:08:48.880
small but seriously if you can get it down to five lines of Ruby or less that's the ideal case and aside from
00:08:55.920
making it easier for whoever's going to try and fix this bug to understand we'll see a little bit later that some of the
00:09:01.839
tools that you're going to use to try and fix the bug depend on the bug being
00:09:07.240
a small test case and finally if you're feeling
00:09:13.320
really gung-ho and you really want to you know be a really really good open source citizen try and regress it so
00:09:20.560
Ruby has multiple different versions multiple different patch levels of those versions uh you can try all of those and
00:09:27.279
see does it occur on 187 does it occur on 193 does it occur on 20 what patch
00:09:32.600
level of 1 n if you're really ambitious get bisect is an amazing tool that you
00:09:38.839
can actually just set to run automatically and find the exact commit that caused your bug that's really
00:09:44.560
helpful doesn't always work but if it if it does um and finally if you really are
00:09:50.800
ambitious you can also try different build settings try different optimization levels um but that's that's
00:09:56.720
kind of almost uh going a little too far at this point okay I said four there's actually five
00:10:04.640
if you want to be a really good open source citizen try and fix
00:10:10.200
it and so that brings us to our story so a long time ago about six
00:10:17.640
months maybe a little longer in a far away land called turkey uh I was working
00:10:23.480
on a delegation library now yes I know this isn't technically what delegation is we've
00:10:29.279
you've all been lectured on you know the delegation pattern is not what you're doing in Ruby n all right when I say delegation what I mean
00:10:36.279
is I have an object that I'm going to pass to the initializer to a second object and I want methods on that first
00:10:41.880
object to be called when I attempt to call them on the second object right
00:10:47.000
there's a number of ways that you can achieve this in Ruby so there's what I was calling pre-
00:10:53.240
delegation where as soon as I initialize the object I'm going to take all of the methods on the second object and hoist
00:10:59.920
them into the object that I'm creating using defined Singleton
00:11:05.240
method then there's what I was calling post delegation where I don't Hoist the
00:11:10.399
methods until the first time they're called so I rely on method missing the first time you call a method and then
00:11:15.920
after that I'm going to Hoist the method into my object and from then on it'll be called as if it was a method on this
00:11:24.440
object then there's on demand where it's basically just going through method missing every single
00:11:30.839
time and then there's this complicated Beast that um I was actually interested in a delegation library that would allow
00:11:36.680
me to do quick calls to the delegate but also swap out the delegate at runtime so
00:11:42.519
I came up with this crazy scheme where I undf methods when I decide that I want to reset the delegate and all the rest of that right so uh that
00:11:50.560
was uh and then the next step bench market right how do I know which of
00:11:56.360
these strategies to choose well run through a couple of different scenarios find out what the runtime looks like
00:12:03.000
that'll help me make an informed evidence-based decision so that's what I did um now I
00:12:11.639
ran through those kind of quickly I uh put up a couple of uh blog posts on our company blog describing the backstory in
00:12:17.120
a little bit more detail why you want to Benchmark and how I came up with those various
00:12:22.720
tests but as I was running my delegation Benchmark when I got to one of the later
00:12:30.240
benchmarks that was putting a lot of stress on the system all of a sudden bang
00:12:36.920
crash Ruby crashed I found a bug now I don't know about you guys
00:12:44.480
maybe you get annoyed when you find a bug I get excited I think yes all right
00:12:52.760
so okay so going through those four RS right before I reported it even um I I
00:12:58.639
was able to produce it that was not a problem every time I ran that Benchmark it crashed uh before I reported it I
00:13:04.360
wanted to reduce it because this is actually I know you can't read this uh you're not supposed to this is the entire Benchmark that I was running
00:13:10.560
right and and that's a lot of code uh so the first thing I did is uh those blocks at the bottom or all the different
00:13:15.680
benchmarking runs that I was doing it was only the last one uh that was finally causing the crash so I said okay
00:13:22.680
what if I take out all of the Benchmark runs before that right will that last Benchmark run on its own still trigger
00:13:28.480
the crash and yep it did so then I said okay well it's crashing on the first of
00:13:35.519
the methods that I'm benchmarking in that last Benchmark method so let's let's eliminate all the other delegation
00:13:41.399
strategies that I was trying let's just go with this pre- delegation one and see if just running that in that last
00:13:47.160
Benchmark causes a crash and yep it certainly does okay so at this point I don't need
00:13:52.639
the Benchmark library or I suspected I didn't need the Benchmark Library anymore I could just Hoist the code out
00:13:57.720
and run the uh the method that was actually being benchmarked on its own see if it still crashed and yep it
00:14:04.480
did now when I was doing this Benchmark in this inner loop here right so I've got the 10,000 times do this
00:14:10.560
initialization and then 10,000 times call a method right I was actually calling three different methods in The
00:14:15.800
Benchmark that was triggering the crash and I wanted to know do I actually need to call all three of those what if I
00:14:20.880
just call the first one so the first method I called in The Benchmark was the length method so let's see if I do this
00:14:27.720
is it still going to crash nope that didn't crash and this is kind
00:14:33.920
of the art of creating a reduced test case you sort of vector towards the minimum amount of code that will trigger
00:14:40.720
the crash and as you do you'll probably end up eliminating the crash at some point so then you got to backtrack and
00:14:46.399
say okay what can I do differently right so the second method that I called of those three was reverse and I said what
00:14:54.160
if I call just reverse right do I actually need to call all three or is this one enough and yep that one did
00:15:00.519
cause a crash so the next thing I wanted to know is do I actually need to hoist all of
00:15:07.360
the methods from my delegate onto the object that I'm created or is it enough for me to Simply Hoist the reverse
00:15:13.279
method the one that I'm calling right and so I eliminated that iteration through all of the methods and just did
00:15:20.320
that single defined Singleton method with reverse and yep that still crashed so at that point um I was pretty
00:15:28.279
much at the minimum amount of code that I I could have that would still trigger the crash the last thing to do is to
00:15:33.720
play with the number of times of iteration and so you'll notice that I only needed to run the outside Loop a
00:15:39.240
thousand times the inside loop I still needed to run 10,000 times in order to trigger the crash and so now I have my
00:15:46.319
minimally reduced test case that this is the minimum am amount of code that I
00:15:52.040
need to trigger the crash at this point I went and I filed the bug and then I
00:15:57.800
went to sleep because it was probably about 2: a.m. on a Friday night but then I woke up on Saturday and
00:16:03.480
I couldn't stop thinking about it I said I want to fix
00:16:09.040
this all right so before we set off on this adventure
00:16:15.160
we need to pack our bags if you're going to debug Ruby you need Ruby so going at
00:16:20.680
the source code conveniently it's mirrored to GitHub so it's actually really easy for you to get now you don't
00:16:26.519
have to worry with subversion checkout commands and go ahead and build it uh over the years building Ruby has
00:16:33.720
become incredibly easy in fact I do believe you don't even need that first configure step anymore you can just
00:16:39.440
simply type make and it'll run configure for you before you actually uh make um
00:16:45.240
if you want to play around with build settings configure d-el has uh a lot of information about different build
00:16:50.600
settings uh we'll see some of them a little bit later all right so make sure that you've got the Ruby source code and
00:16:56.920
you can build the Ruby source code then study the
00:17:02.240
map again I don't expect you all to know uh Ruby internals but there are a couple of things that are useful to know if
00:17:08.600
you're going to head off and start debugging Ruby um I I do suggest that um
00:17:14.600
I'm forgetting the name of the book but uh the book that uh was discussed in the session yesterday uh Ruby internals uh
00:17:22.559
Ruby under microscope thank you yes uh that sounds like it it would be an excellent resource if this is the kind
00:17:27.760
of thing that you want to get into but at a bare minimum things that you should be familiar with uh inside of the Ruby
00:17:32.960
Source directory you'll find this file include Ruby ruby. this is the public cap API but it's also a lot of
00:17:38.919
convenient methods that you can use when you're attempting to debug things when you look at Ruby Source
00:17:45.720
you're going to see a couple of uh values that are repeated over and over again one is ID and an ID is essentially
00:17:52.600
just a symbol uh then value which is essentially everything else Ruby is is
00:17:59.039
dynamically typed and C is not so Ruby sort of fakes out C by calling
00:18:04.840
everything a value uh then there are a couple of methods good methods to know RB id2 name
00:18:10.960
this actually will take an ID and it will dump it out as a C string it's important that you have a C string
00:18:16.679
because we're going to be in a second uh using GDB and GDB only knows about C
00:18:22.480
strings it can't print a ruby string but it can print a C string so uh RB id2 name turns an ID into a C string so you
00:18:29.039
can print it out in GDB uh likewise RB string value seaster uh is essentially
00:18:34.799
like two string um but it's going to as the name apply implies uh output a C
00:18:40.360
string so that you can view that in GDB and finally as we're going to be looking at some uh back traces VM call
00:18:47.640
method is an important one to pay attention to because this is how methods get called in
00:18:53.400
Ruby okay and finally make sure you bring your Gardener because what an
00:18:59.000
adventure without your Gardener uh and in this case that's mini Ruby so if you build Ruby and pay
00:19:07.280
attention to the build log you'll notice the first thing that happens when you build Ruby is that you actually build mini Ruby mini Ruby is a sort of a
00:19:14.760
bootstrap uh although it's not exactly what you would consider a traditional bootstrap it's actually a very capable
00:19:20.520
Ruby executable just without a lot of the extensions and uh standard Library parts so it's smaller uh it still has
00:19:28.120
the lecture part parser interpreter VM runtime all the core library or um core classes uh so if so so long as your bug
00:19:36.760
is in one of those core Library classes you actually don't need to run Ruby to debug you can run mini Ruby and this is
00:19:43.640
convenient because mini Ruby builds a heck of a lot quicker than Ruby does and if you're going to be you know putting in printf statements and all the rest of
00:19:50.159
that it's nice to be able to rebuild quickly okay so we're ready let's set
00:19:57.760
off the first weapon that you need to get proficient with if you're going to be debugging anything in Sealand is
00:20:05.240
GDB and the way that this works is you simply give GDB the executable to run
00:20:10.720
which in this case is going to be mini Ruby what what happen is GDB will start
00:20:16.760
up and then it'll stop and it'll wait and that's so that you can do any sort
00:20:21.919
of uh breakpoint setting that you need to do before you you uh begin running the the actual program uh enironment
00:20:28.960
variables we we'll see all of this in a second then the arguments that you would normally pass to mini Ruby right so uh
00:20:36.760
if you're going to have a script that exhibits the bug normally you would do
00:20:42.200
mini Ruby and then pass the name of the script on the command line right well because we're inside of GDB instead of
00:20:47.799
passing it on the command line you actually pass it as an argument to the Run command in GDB and that'll pass it
00:20:53.320
through as an argument to mini Ruby and then in this case because we had a crash we wait for for it to
00:20:59.559
crash so here's what that looks like okay so we're starting up GDB on Mini
00:21:04.960
Ruby it's going to give you some introductory information copyright notices you know this is gdbs compiled
00:21:10.799
by blah blah blah none of that really matters and then for a a first go let's
00:21:18.000
just run our program now I I have this tradition I like to name my bug scripts bug. RB so I don't have to remember the
00:21:24.400
name I just always do a run bug. RB and so that's now going to tell us
00:21:30.000
it's going to start the mini Ruby program executing with the argument bug. RB it's loading all of our libraries
00:21:36.159
shared symbols and there is our crash so we crash in tiny free list
00:21:42.840
remove pointer what the heck is that and how the heck did we get there all right well the back Trace
00:21:50.000
command or BT for short will show us the stack Trace at the location of the
00:21:55.480
crash and in this case we're this is actually much longer so I I like I said I tried to fit as much as I could onto
00:22:02.360
these slides without making it crazy so the the output is actually way longer uh if you do this on your own um but if we
00:22:10.000
look backwards through our stack frames a little bit back back back eventually we'll see there is a VM call
00:22:16.360
method and if you've looked at the call Signature for VM call method you're going to see that uh one of the
00:22:22.120
arguments that it takes is uh Rec V which is the the target of the call and
00:22:27.360
so uh G wouldn't it be nice if we could figure out what uh what object we were actually calling a method on right this
00:22:33.200
is just you know trying to figure out which direction we're going right what what happened at the point of the crash
00:22:38.600
so we use our RB string value sear with the memory address of the receiver
00:22:46.279
and crash we got a crash inside of our
00:22:53.799
crash So This Time ruby is telling us that we can't object allocation during
00:22:59.559
garbage collection what well go back to the back Trace that we had before and
00:23:06.120
you'll notice between the method that actually triggered the crash and the VM call method at some point we were
00:23:12.520
triggering a GC lazy Sweep All right so this crash is
00:23:17.600
actually happening inside of the GC routines and if you study some of Ruby's internals you'll know that once you're
00:23:23.919
inside of the GC routines there's not a lot that you can get away with right so you could get around this you
00:23:32.320
know the the RB string value Seer method actually goes in and and looks at the structures and uh you know figures out
00:23:40.400
uh what the name is based on values stored in in different structures and pointers and you could trace that down
00:23:45.640
on your own but uh you know and I did but uh it didn't really actually tell me
00:23:51.919
anything all right so let's go back to the message that we got in our terminal the first time that we saw the crash
00:24:00.320
it said incorrect check some for freed object object was probably modified after being freed set a break point in
00:24:06.679
Malo error break to debug now this is something that os10 does uh Malo error break is a method
00:24:14.159
that doesn't do anything it simply exists as a method for you to set a break point on if you're attempting to
00:24:20.480
debug something that is a error in malok right so so let's go back to GDB start
00:24:27.320
it up again this time before we run the program let's set the break point now you're notice GDB does this weird thing
00:24:33.600
it tells us you're breaking on some method I don't know what this is right it's because the libraries haven't been loaded yet so it says uh should I should
00:24:40.679
I wait and see if this method shows up and you say yeah okay go ahead and then you run the program right and then it
00:24:46.200
says oh yeah there's maloc error break okay so the break point is set and then we're going to run run run and crash in
00:24:54.120
the exact same place so setting that uh breakpoint didn't exactly help
00:25:03.679
anything okay enough with GDB time to turn to some more advanced
00:25:09.520
Weaponry so if you go into the os10 man page for malok you'll find that there
00:25:15.840
are a whole collection of really interesting environment variables that you can set that modify the way that
00:25:21.120
malok works one of those is called Malo stack logging no compact all right it's fun to say what
00:25:28.960
happens when you set Malo stack logging no compact is that every time there is an allocation or a free of memory right so
00:25:37.480
every time memory is allocated or freed the system is going to look at the stack
00:25:43.200
at the point that free or allocation happened and record it to disk and then
00:25:48.880
there's a tool that you can use Malo history you can go in and you can retrieve those stack traces right and
00:25:55.440
this is the history for that memory address for the entire run time of the program now first thing we're going to
00:26:01.159
need to do before we uh set off in GDB again and actually probably could have done this a little bit sooner is go back
00:26:07.120
to that configure make step and make sure that we are making with our optimizations set to zero and our
00:26:13.799
debugging symbols included so that we can actually see all the memory addresses because if you go back and
00:26:19.360
look at those uh back traces before you'll see a number of them it said inlin or you know value not available
00:26:24.559
because of optimizations and stuff like that so recompiling at optimization level Z will allow us to see all of the
00:26:30.480
memory addresses that we need to see okay so start up GDB again before
00:26:37.360
the program runs set the environment variable Malo stack logging no compact equals true right and in GDB it's
00:26:43.360
confusingly set environment instead of Set n because whatever um
00:26:49.520
run and you'll notice as soon as we start running we start getting some logging messages Malik is telling me
00:26:54.679
okay I turned on your stack logging I'm not going to compact the stack logs but your dis is going to fill up really
00:27:01.559
fast and this is why it's so important to have a reduced test case because as soon as we get into these debugging
00:27:07.600
tools they stress the system pretty significantly okay so we let it run and
00:27:14.799
pay attention to the process ID that it reports to us for what we're running because we're going to need that when we
00:27:20.760
go to use Malo history and there's our crash again same crash as always okay so now when we look at the
00:27:27.600
back Trace we going to go and and these methods appear because we're optimization level zero now right before
00:27:33.760
the Ruby x free and vmx free were all in line so we didn't see them so we can actually see the pointer that Ruby is
00:27:40.159
attempting to free when this crashes and so now we can take that pointer we can take the process ID from the screen
00:27:45.679
before we can go Malo history process ID memory address and Bam we get this whole
00:27:52.080
and I mean by the way this is like 20 30 pages of this I just showed the very end of it right every time this memory
00:27:58.480
address was either allocated or freed anywhere in the Ruby running process it
00:28:03.679
records the stack right and and it reports the stack with pipes so you can see that this is the last time that it
00:28:09.760
was allocated in freed you see the last time that it was allocated and freed it was a start main Ruby run node all the
00:28:14.799
rest of that that that and actually if you look the free one actually matches the back Trace that we got for our crash
00:28:20.640
because this is our crash right we're crashing as we're freeing this block now
00:28:26.200
what you can do is you can go back um so at some point there was a problem with an allocation or a free right the the
00:28:32.080
bug that told us that somebody was touching our memory inappropriately right probably after it was freed so we
00:28:38.840
can look through this log and we can look for things like for example
00:28:44.279
um you can make sure that every allocation in free is matched up right so there are different things that Ruby
00:28:50.080
allocates in freeze there are procs and blocks and all the rest of that right so uh you can see here the last allocation
00:28:56.120
was a proc new that alloc it and then a proc free that freed it right so this matches up and actually if you go back
00:29:02.080
through the history um it matches up the whole way back H that didn't actually tell us much
00:29:08.720
of anything it's kind of cool to look at but we need something else we need
00:29:16.120
gallo and you can read the man page for G Malo I suggest you do it's actually kind of entertaining but I think the
00:29:21.880
first line summary is enough to give you an idea says guard malok is an
00:29:27.120
aggressive debugging malok Library let's see how it looks okay so
00:29:33.640
this is your memory you've got pages and normally what Malik is going to do is it's going to give you a little bit of
00:29:39.080
memory from a page and another page and then it's going to go to the next page and so it's allocating your memory and
00:29:44.320
when you free it's going to Mark something as freed and then it's going to allocate some more and then maybe you come along and you want another uh a
00:29:51.159
pointer and it's got its free list of uh pointers that have been freed so it recycles that memory and gives it back
00:29:56.399
to you right this is how Malo normally works this is how your system runs every day Galo does something interesting when
00:30:05.399
Galo allocates memory it allocates it on the edge of a page boundary and then it
00:30:10.960
marks the next page as unusable so what this does is for example if you were working in C and you
00:30:18.519
allocated enough space to have say an array of 10 inss and then you went and attempted to access the 11th member of
00:30:25.480
the array right that's a bug but but normally c will not complain it'll just
00:30:30.760
give you whatever is at that memory address right C doesn't care so what GMA does is to catch those
00:30:37.600
kinds of bugs it ensures that you cannot access the memory after the end of the
00:30:43.600
memory that you've allocated right so the next allocation is going to happen on the next page boundary and then the
00:30:50.399
next page after that is going to be marked off do you see how quickly we're using up memory again this is why it's so
00:30:57.399
important to have a reduced test case G Malo really stresses the system and on top of that the other thing that g malok
00:31:03.960
does that's particularly important for our case is that when you free memory it
00:31:09.080
doesn't return it to the system it marks that page as unusable as well so if you attempt to use memory after you freed it
00:31:15.840
you will crash so the whole point of G malok and the man page explain this explains this
00:31:21.760
right is your program Should Crash It Should Crash immediately where the bug
00:31:27.519
occurs right the reason that we are having such a hard time finding this bug is because
00:31:32.639
somebody allocated memory somebody freed memory somebody touched the memory inappropriately then somebody else allocated the memory somebody else freed
00:31:38.320
the memory and that's where the crash happened right the crash happened all the way over here but the bug was somewhere in the
00:31:44.919
past so G should help us catch that so here's what we do we start up GDB again
00:31:51.720
we have to set our uh Malo stack logging no compact we want to know what happens on all of these allocations we said uh
00:31:58.399
this dial insert libraries environment variable which on OS 10 tells the dynamic loader use this Library not the
00:32:03.760
one you normally use and we tell it to use Li G Malo and then we run and again we're getting all these
00:32:10.519
nice wonderful errors about or warnings about how how much our program is going to suck up all the resources of our
00:32:16.559
machine and now when we crash it's a completely different
00:32:21.720
crash this is something this looks like it could be the location of our bug
00:32:27.919
we're setting a mark on this me which
00:32:33.039
stands for method entry on cfp which is the not current frame pointer Matt
00:32:39.480
corrected me yesterday but I forget it was anyway so we can print out what is
00:32:45.440
cfme and there's the memory address okay so we have a memory address that's been
00:32:50.679
inappropriately touched we have our process ID which we recorded before so now we go in we do Malo history we give
00:32:57.440
the process ID we give the memory location and it tells us the last time that it was freed right so somebody
00:33:03.960
freed this memory before they should have because somebody else is attempting to mark it the GC Mark phase is
00:33:09.039
attempting to mark this memory is still in use but it's already been freed who
00:33:14.120
was it turns out it's that guy free method entry eye now if you go and you look at
00:33:20.960
the source for free method entry ey turns out it's one line should have been three lines because it didn't check to
00:33:29.320
see whether or not the method entry that it was freeing was already marked so all you have to do is add that
00:33:36.720
guard so that we don't free method entries that have already been marked by the GC and the bug goes
00:33:43.600
away all right and that's it got a little bit of time for questions so
00:33:56.480
thanks yes
00:34:05.320
um how did I well because I actually spent a good 48 hours straight looking at almost every method in the stack and
00:34:12.760
uh trying to figure out who's marking the method entry and when are they marking it and when are they freeing it and tracing through all of the methods
00:34:18.800
so no I mean it's a fun adventure um if I told the whole story we'd be here till Nightfall so uh um yeah I I kind of
00:34:27.040
shortcut that that but took a little bit of our what's that did you get blame it to find out who uh yeah the same person that fixed
00:34:34.440
it is the one who broke it so yeah if we were sistic enough to want
00:34:42.119
to reproduce this is the blog post and everything you mentioned before list the Ruby version all that um well not uh
00:34:50.480
that blog post isn't up yet I'll actually be posting my slides yeah I forgot to mention I'll post my slides so
00:34:55.560
that uh because I know the text is kind of small so you can uh go through it at a more leisurely Pace um turns out as
00:35:02.280
far as Ruby version you can actually run the most recent 193 because it didn't get back ported to the most recent patch
00:35:08.800
level so it's only fixed in Ruby 2.0
00:35:16.640
whoops any other questions all right well thank you for