00:00:17.320
we're going to talk about some of the Ruby concepts that I've tried to simplify as much as I could, because I think it's
00:00:23.160
very important that we have a good understanding of things.
00:00:28.640
So this is a cross-section of a Ruby app. You basically have
00:00:35.040
the crust, which is what people see from the outside, and as you go deeper and deeper, people have less and less understanding
00:00:40.480
of what's going on. My goal is not to go and talk about the core itself,
00:00:45.520
what's going on, how it's made, and why you should use this implementation instead of another one. My goal is really to explain why things
00:00:53.039
work the way they do, so we can have a better discussion about the different possibilities and the
00:00:59.199
solutions we could implement. So, 90% of our work is done on the surface: we
00:01:04.400
probably all write Ruby code, and we don't all work on the different Ruby implementations, and that's totally fine.
00:01:11.600
That's actually really good; that's what Ruby was designed for, so we can focus on the core of the application and we can
00:01:19.600
release business value and make money, which is probably why we have a job. So why should we care, and why should
00:01:26.960
you come and listen to this talk, if really that's not so important for us?
00:01:32.200
Well, the reality is, even though you're on top, you need to know what's going on underneath, so you can have an overall
00:01:38.399
understanding, and you can follow when people are arguing and saying: well, we should have better concurrency. What does that mean? We need to remove
00:01:45.520
the global lock. Well, what does that mean? Why isn't it removed? Is Matz just somebody who doesn't want to remove the
00:01:51.320
global lock? Is there a reason why we added it in the first place? What's a green thread? What's the difference with
00:01:56.880
a native thread? People told us it's better, but why is it better? So my goal is really to try to
00:02:04.479
explain some of these concepts as simply as I can, so you can understand the dependencies between
00:02:10.280
them. We're also all craftsmen: we really try to do something we care about, we
00:02:16.519
believe it's almost an art, and we get together and we want to be motivated by the Ruby spirit. We want to work, and
00:02:25.760
we really want to do something together. And it's interesting, because as people are fighting for changes in the
00:02:31.560
Ruby language or in the Ruby implementation, there's a desire for all of us to do a better
00:02:38.040
job. So, there's a lot of stuff I won't be covering, stuff you should know;
00:02:43.640
other people are talking about it, so I don't really care too much about that, at least for this talk. On the other hand,
00:02:50.040
I will cover a few things that you don't have to know, but that would be good for you to know. So we're going to
00:02:56.560
start at the beginning: we're going to talk about how source code gets parsed to
00:03:01.720
become something, and then gets executed, and it's all magical. And
00:03:07.440
then we have the C extensions that we sometimes have to use, like Nokogiri or the MySQL gems, these things that
00:03:13.720
are a bit annoying sometimes; we'll talk about that and see how they relate to all of this. And then we're going to
00:03:19.159
talk about concurrency, because concurrency is related to all of this, and it's a really hot topic nowadays, and
00:03:24.440
everybody has their own opinion and thinks, you know, they have a solution for it, and different implementations
00:03:30.760
approach it differently. And finally we'll talk about memory management, because memory management, we don't have to do it,
00:03:37.680
but somebody does it, namely the implementation, and if you don't understand how that works, you might not
00:03:43.080
realize you're doing something wrong, or you might not understand the arguments for one solution against another. So,
00:03:49.439
let's get started. Oh, before we get started: this is Ruby 1.9 only, and it's actually MRI,
00:03:55.040
so I will quickly mention other implementations, but I'm going to focus mainly on 1.9, with some discussion
00:04:02.360
about what changed, but it's really 1.9 only. If you're still using 1.8, I'm sorry for you; it's time you moved
00:04:08.239
on. So it might sound very boring to you. Yes, it might be boring; my goal is to
00:04:13.400
make it less boring. If it's really boring, you have Café du Monde over there, and you can get a good coffee with
00:04:19.040
beignets, or you can just wait a little bit, sit through this, and then you can think about what we
00:04:25.400
discussed. So let's start at the beginning. We have a lot to cover, so I'll try to be quick.
00:04:31.280
So let's see what happens. I write my own source code, and it's just a simple
00:04:36.639
hello world. What happens when I write this code? Well, what happens is that
00:04:42.560
first there will be a tokenization process. What that means is we're going to cut this text that comes from the source
00:04:48.720
code, and it's going to be broken down into tokens, and here you have a representation of the different
00:04:53.960
tokens. Now, after the tokenization process, there's a lexing that
00:05:00.080
will happen based on a grammar, and basically this is how we will break down
00:05:05.320
the source code into things that might make sense for somebody to interpret. So you can see how things are broken down
00:05:11.960
into smaller parts, and then, as you can see here, I explain a little bit how
00:05:18.199
that works: you have basically the line number where it starts, the column it's on, the type of token, and the token
00:05:26.400
itself. So once we have that, the parser can do its job.
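You can actually see this token list from Ruby itself, using Ripper (which comes up again later in the talk); each entry carries the [line, column] position, the token type, and the token text:

```ruby
require 'ripper'

# Tokenize a tiny hello-world; each element looks like
# [[lineno, column], token_type, token_text, ...]
tokens = Ripper.lex("puts 'hello'")
tokens.each { |t| p t }
# The first token is the identifier `puts` at line 1, column 0.
```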
00:05:31.840
It takes this lexed representation and converts it into an AST, an abstract syntax tree, and it's
00:05:38.800
basically a bunch of nodes put together that explain how the language will be executed, based on the grammar that was
00:05:44.400
defined. So here we say: we have a program that starts; we have a first command, which is a puts, and it's on
00:05:51.800
column zero, line one. And then we're going to pass some
00:05:58.639
arguments, this string, blah blah blah, and you can see all the information. Now, this AST is a representation of the
00:06:04.000
program. It's not enough to do much, but it's enough to have the language defined. So this is implemented in
00:06:11.479
MRI using a lexer and a parser, and you can actually go and look at the source
00:06:16.520
code and see how that works, if you're interested, and you can see how the language is basically being
00:06:22.880
explained. And between the different Ruby implementations, that's what's really shared: everybody shares the same
00:06:27.960
approach to parsing source code. So if you want to look at how it works in Ruby, you
00:06:33.440
can actually use Ripper, and Ripper is a nice tool, provided in Ruby 1.9, that allows you to see the
00:06:41.680
underlying lexing and parsing that's being done. Ruby MRI uses Lex and
00:06:48.000
Bison, but with Ripper you can actually get an idea of how that works: you can take your code, look at it,
00:06:54.400
and see the end result. There are also some smart people who did it the other way around, where you
00:07:00.000
can basically generate the AST. So the AST I showed you here, this is actually
00:07:06.039
an S-expression; that's not really how it is inside Ruby, but you can actually take this AST, modify it on the fly, and
00:07:13.479
evaluate it. But anyway, this is how you can play with the AST to understand the language better and see how it works.
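For the AST side, Ripper.sexp gives you the S-expression form mentioned above; it's not MRI's internal node structure, but it's close enough to explore:

```ruby
require 'ripper'
require 'pp'

# Parse hello-world into an S-expression: nested arrays of
# [node_type, children...] rooted at :program
sexp = Ripper.sexp("puts 'hello'")
pp sexp
```

The output mirrors the tree from the slide: a program node containing a command (`puts`) and its string argument, each with line and column information attached.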
00:07:19.160
and see how Ruby interprets your code. Now, once you have this AST, you have the virtual machine, which will do two things. This
00:07:26.440
runtime, or virtual machine, does the compilation and the interpretation, and the compiler and the virtual machine
00:07:33.039
were written by Koichi Sasada, who's here, and I'm sure you know him, and it was
00:07:39.319
replaced in 1.9. Are you here, Koichi? Maybe he's not here after all. That's sad,
00:07:45.240
but he did a great job, and he's still doing a great job. So the job of the compiler is to take the AST
00:07:52.199
we saw and compile it into bytecode, and bytecode is an optimized representation of the AST.
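In MRI you can watch this step with RubyVM::InstructionSequence (available since 1.9): compile a snippet to bytecode, look at the disassembly, and run it:

```ruby
# Compile Ruby source to YARV bytecode, inspect it, and execute it
iseq = RubyVM::InstructionSequence.compile("1 + 2")
puts iseq.disasm   # human-readable listing of the bytecode instructions
result = iseq.eval # hand the bytecode to the VM to interpret
```

`disasm` prints the instruction listing the interpreter will run, and `eval` executes it, so `result` here is 3.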
00:08:00.240
Now, once we have this bytecode, we can actually interpret it, and we can execute it by running it, and that's
00:08:07.120
what's basically going to run your program. This VM also handles the
00:08:13.759
concurrency and the extension libraries, and we're going to talk about that in a minute, so don't worry too much about it
00:08:18.840
for now. So, for the VM implementation, you have a bunch of C files. I would recommend, if you're interested
00:08:25.080
in that, to look at the code and see how it's done. It's not really easy if you don't know C, but really, if
00:08:32.080
you want to become a better developer, I would encourage you to learn a bit of C; it would just make you better. You
00:08:38.039
don't have to be a C expert, and you don't have to contribute to MRI core, but I think it would really help you
00:08:43.959
understand a different world of programming. So, these are the rest of the VM
00:08:49.720
files. Now, to understand how an implementation of Ruby works, these are the different parts of a Ruby
00:08:56.360
implementation. You have the garbage collector, which is what's going to manage the memory, and we're going to
00:09:01.399
talk about that again in a minute. Then you have the built-in classes, like Hash, Array, BasicObject, all these
00:09:08.959
classes that exist. Then you have the standard libraries, like net/http,
00:09:14.200
OpenSSL; these are the standard libraries. Then you have the string encoding and transcoding, which was modified for 1.9, and which is
00:09:20.959
very, very important, because it allows us to deal with different types of encoding, like UTF-8 versus ASCII, or binary
00:09:28.800
encoding. Then you have the regexp engine, which is what allows us to use regular expressions; you need to realize
00:09:34.560
some languages don't have regular expressions, and it can be a pain. Then you have a bunch of small utilities that
00:09:40.680
allow us to debug how things are working inside, to do time formatting, and things like that. Then you have the
00:09:46.600
parser, and then you have the VM. So this is what makes a Ruby implementation. So let's talk about C
00:09:52.480
extensions, because they're really critical to the discussion and to everything else. Interestingly enough, you can
00:09:58.360
hear people say: well, we cannot change this because of the C extensions, we don't want to break them. What does that
00:10:03.440
really mean? We first need to understand what a C extension is and how it works. So, a C extension is usually used when
00:10:10.200
you want to wrap an existing C library; you can think of Nokogiri, or the MySQL
00:10:15.560
gem, or a lot of different C extensions that actually have C code and then expose this C code. It could be used
00:10:22.760
for performance, but usually it's because people reuse libraries that already exist, that already perform well, that you don't
00:10:28.200
want to rewrite. So how does it work? Well, in C, you can actually define objects in
00:10:34.880
the Ruby world. So this is some code I took from Aaron's Nokogiri, and you
00:10:40.720
can see he declares a bunch of VALUEs, and then he uses the C API
00:10:45.959
that's provided by the Ruby headers, and he can say: Ruby, define a module called
00:10:51.120
Nokogiri and assign that to the VALUE; and then I also want to define another
00:10:57.519
module underneath Nokogiri; and then I would define another module underneath
00:11:02.760
XML, which is under Nokogiri. And then you can see another example of the C API,
00:11:08.120
where we define a constant and set it inside a different module. So this is how you write C extensions; that seems
00:11:14.560
pretty easy. Now, you also need to expose C functions, and a C function is not really hard either: you
00:11:21.760
can see we're doing the same thing. We're basically creating a module; in this module we define a singleton method, and
00:11:28.200
this method points to a function that's defined underneath. So whenever we call bonjour from Ruby, it's
00:11:34.480
going to call the C function defined underneath, and the C function returns a Ruby string. So that's the
00:11:40.519
extension. Now, if you stop here, you might think: okay, so what's the big deal?
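From the Ruby side, the module and singleton method the C code defines behave just like a plain-Ruby definition. Roughly like this sketch (the module name `Hello` and the returned string are my guesses, not the slide's exact code):

```ruby
# A plain-Ruby sketch of what the C extension defines; in the real
# extension the method body runs compiled C code instead.
module Hello
  def self.bonjour
    "bonjour!"  # the C function builds and returns a Ruby string like this
  end
end

Hello.bonjour  # => "bonjour!"
```

To the caller there is no visible difference: the C extension just installs a module and a method into the Ruby object space.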
00:11:48.320
What you usually do is wrap existing structures in C, and for that, that means
00:11:53.360
you need to manage memory. And that means you need to understand the garbage collector, because you will create
00:11:58.480
objects in Ruby that wrap structures in C, and you need to make sure you don't leak memory, and you need to make sure
00:12:04.399
things work properly. So when you define a class, and in this case I took this from the mysql2 client, when you define
00:12:10.880
the class, you need to say: when my class is allocated, this is what you're going to do in C, and you're going to allocate
00:12:16.440
a few things. And then we have the C function that basically defines how the memory is allocated, and you can see
00:12:22.839
that the author first defines a VALUE object; then we have a pointer to a wrapper, which is a data
00:12:29.320
type in C, and we don't really know what it is yet. And then you have this line that says Data_Make_Struct, and this is
00:12:34.880
basically how, in a C extension, you're going to create a class or an object or
00:12:40.040
whatever you want that will point to your own object in C. And when you do that, you will say: hey, Ruby, by the way,
00:12:46.760
when you run the garbage collector, please run these other
00:12:52.600
functions for me, so I can maintain my own memory and we don't leak memory. And what's interesting to see is that this
00:12:58.320
data structure has a name, like everything else, it has these different functions, and then it has a pointer to the data, which
00:13:03.959
is the wrapper we defined above. And you can see that the wrapper is then set up, all in C, and then some memory is
00:13:10.360
allocated directly to this wrapper object, and then we return the new Ruby object that was mapped, so Ruby can
00:13:17.720
use it. So what happens is, when the garbage collector goes through and wants to check if the object is
00:13:24.120
being used, it will call one of these functions, which is rb_mysql_client_mark, that was defined here; and here
00:13:30.959
we check if the object that we're pointing to still exists, and if it does, we're going to mark it, to say it's
00:13:37.639
a live object, don't do anything about it. And if the memory needs to be cleared, so if we need to free the
00:13:44.000
object, in this case we're going to call the C code and say: hey, by the way, get this wrapper, close this object,
00:13:51.240
which we need to define, and that will free the memory that was allocated down here, almost at the last line, and then
00:13:58.040
also free the pointer that we defined, and that's how we free the object. So you can see it's actually not that easy, and
00:14:03.600
there's a lot of work to do to maintain the memory, because the C code has to work with Ruby itself. So
00:14:11.639
basically, you have a few challenges when you write C extensions: you have to deal with the memory and the fact that
00:14:17.160
you have the garbage collector to deal with. You also have to deal with something that's a problem with C in general:
00:14:23.360
you need to make sure your code is cross-platform. We have a tendency, as a Ruby community, to target mainly
00:14:30.399
Unix, but a lot of people really work hard to make sure Windows works, and when you write a C extension, that's part of
00:14:35.920
the challenge. And then you have the problem of thread safety, which is not a problem with MRI, because, and we're going
00:14:42.120
to talk about that, but basically, for thread safety, a C extension cannot run on multiple threads at the same time, so you
00:14:48.199
don't have this problem. But I'm going to get to that in a minute, because that's the next topic: let's talk about
00:14:54.720
concurrency. So, concurrency is a big deal; everybody wants to be concurrent, we all want to run a lot of code in
00:15:00.600
parallel, and I just want to explain it a little bit by showing an example. So, we would like to handle a lot of
00:15:06.800
web requests at the same time, and I will write a bit of Ruby code just to show you how that
00:15:11.959
works, just to illustrate the concept of concurrency in a very simple
00:15:17.279
way. So we have a simple client, a dummy client, and this client is a class with two functions: one makes a query, the
00:15:23.959
other one gets the reply and will print the response that comes back. In the query method, we call the server, and
00:15:30.600
we dispatch ourselves with an ID. I mean, it doesn't matter how the client works; it's just an example. Now, the server
00:15:37.319
implementation, that would be a very simple server: it's a module with two functions. One is dispatch, which gets
00:15:43.079
called by the client, and it will take the client and call reply on it, and it will pass a response, which is a fake
00:15:49.959
response. And in this case, what I did is my fake response will be random,
00:15:55.120
and it will take more time if the ID is even. And my point, I will show
00:16:02.319
why I did that in a minute. So if we start 10 clients and we make them query
00:16:07.600
the server 10 times, we'll see the responses come back as response 0, 1, 2, 3, 4, 5, 6, 7,
00:16:15.279
8, 9. So they all come back in sequential order, which is not good, because, as I told you,
00:16:21.079
some of the responses will be slower, and if they're slower, that means the faster ones behind will have
00:16:26.480
to wait for them, and that's not really good. The throughput will be low, because it will depend on the
00:16:31.839
queue. So if you only make one call you will always get the same
00:16:37.000
speed, but if you make a lot of calls you're going to get different response times. The problem with this
00:16:42.880
approach is that it cannot scale as you get more load. So everybody will tell you: well, this is not a good solution.
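The example described above might look roughly like this sketch (the class names, the "response N" format, and the even-ID slowdown are my reconstruction, not the speaker's exact slide code):

```ruby
# Sketch of the sequential client/server example: even IDs are
# artificially slower, yet responses still arrive in request order.
class DummyClient
  def initialize(id, log)
    @id  = id
    @log = log
  end

  def query
    DummyServer.dispatch(self, @id)
  end

  def reply(response)
    @log << response
  end
end

module DummyServer
  def self.dispatch(client, id)
    sleep(0.01) if id.even?  # pretend these requests take longer to serve
    client.reply("response #{id}")
  end
end

log = []
clients = (0...10).map { |i| DummyClient.new(i, log) }
clients.each(&:query)
# Responses come back strictly in call order: fast requests wait
# behind slow ones, so throughput depends on the whole queue.
```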
00:16:48.800
Let's use threads, because that's what we all know. So let's talk about threads. What's a thread? Well, a thread
00:16:55.079
is very simple: it's basically code that gets executed in parallel and shares the same memory as everything
00:17:00.279
else. So we have a main thread, which would be your Ruby application, and then you can branch off and say: hey, I want to
00:17:06.000
run this code in parallel. And you can do that many times, and you can share the memory. So that sounds really good. We
00:17:13.559
could rewrite our code to make it into a threaded server. The only thing I need to do for that is wrap
00:17:20.079
the dispatch function body inside a new thread, and now if I query 10 times
00:17:27.480
I will see the responses come back in a different order, because the faster responses come back first.
00:17:33.520
That sounds really good. All right, so the threaded responses will allow us to get
00:17:40.200
better throughput, because a response will not depend on everybody else in the queue before it. Now, there's one problem, though.
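Wrapping dispatch in a thread, as described, might look like this sketch (again with hypothetical names); now slow responses no longer hold up fast ones:

```ruby
# Threaded variant: each dispatch runs in its own thread, so fast
# responses overtake slow ones instead of waiting in the queue.
class DummyClient
  attr_reader :id

  def initialize(id, log)
    @id, @log = id, log
  end

  def reply(response)
    @log << response
  end
end

module ThreadedServer
  def self.dispatch(client)
    Thread.new do
      sleep(0.2) if client.id.even?  # pretend even IDs are slow to serve
      client.reply("response #{client.id}")
    end
  end
end

log = []
threads = (0...10).map { |i| ThreadedServer.dispatch(DummyClient.new(i, log)) }
threads.each(&:join)
# Fast (odd-ID) responses arrive first; slow (even-ID) ones trail behind.
```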
00:17:47.559
The problem is, if we have too many threads, we can actually slow down the server, and we're going to talk about that really
00:17:52.679
soon. But threads are not magical; there's not something magical that says: hey, when I have threads, I can execute all my
00:17:57.799
code in parallel, just like that. The reality is that a CPU can only execute one instruction at a time, so there's
00:18:03.440
something happening behind the scenes to do this thread work, and that's called
00:18:08.919
context switching. So, context switching: the OS is
00:18:14.000
not always in charge, but context switching means you go from one execution path to the other, and you go
00:18:20.159
back and forth. It seems like it's concurrent; you don't really see what's going on. I mean, you run a bunch of
00:18:25.640
applications on your computer; they don't seem like they're running one after the other, right? But the CPU can
00:18:30.679
only handle one thing at a time; it just switches back and forth. Now, a thread context switch, which is
00:18:36.280
basically when you switch from one thread to the other, is still faster than when we switch from one process to the other. Finally, the context switch happens
00:18:43.679
per CPU, so if you have two CPUs, you should be able to run two pieces of code exactly
00:18:49.200
at the same time, in parallel. So this is an example of how the scheduler works in Ruby, in MRI.
00:18:56.280
Ruby uses a fair scheduler, which means it will go back and forth between the
00:19:02.280
threads with a time slice of 10 milliseconds: 10 milliseconds on one thread, 10 milliseconds on the other, and so on. Now, there's
00:19:09.480
one thing that a lot of people got confused about, especially because it changed in 1.9:
00:19:15.640
what happens if you have a thread and there's a blocking operation? Are we wasting the rest of the time slice? Well, no.
00:19:21.120
What happens is, when you have a blocking operation in a thread, the thread is not called by the scheduler anymore;
00:19:27.679
there's no polling to check if the thread is still blocked or not. What happens is, the other threads are
00:19:33.080
going to be scheduled, but the OS, for a native thread, will come back and say:
00:19:38.559
okay, my thread is back from the blocking operation, schedule it again. And that's how the scheduling is handled.
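A sketch of that behavior: a thread sitting in a blocking call (here just a sleep) simply stops being scheduled, while the other threads keep running:

```ruby
order = []

# This thread blocks; the scheduler drops it until the OS reports
# that the blocking call has returned.
blocked = Thread.new do
  sleep 0.2
  order << :was_blocked
end

# Meanwhile the other thread gets scheduled and finishes first.
runner = Thread.new do
  order << :kept_running
end

[blocked, runner].each(&:join)
# order is [:kept_running, :was_blocked]
```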
00:19:46.159
Now, there's a lot of discussion about green threads and native threads. Ruby 1.8 used to have green threads; Ruby
00:19:52.520
1.9 has native threads; JRuby has native threads, MacRuby native
00:19:58.960
threads. So what's the big deal? Well, a green thread is basically a thread
00:20:04.080
that's handled by the runtime itself. It doesn't go to the OS; the runtime basically has one native thread and does the
00:20:09.640
scheduling itself. So the pros of green threads are that everything is managed by
00:20:14.919
the VM, so it should technically be cross-platform; at least you get unified behavior and control at the VM level,
00:20:22.840
because you know exactly how the threads will work. You also get lightweight threads, because you don't
00:20:29.799
have to use native threads so much; they're lighter to start, faster to start, and the memory footprint should, in theory, be
00:20:35.720
smaller. Now, there's one major problem: it's not
00:20:40.919
concurrent. You're limited to one CPU; you have one thread and that's it, you can only use that. So it's a big
00:20:47.360
problem if you want to use multiple cores, and also, if you have a blocking IO within one of these green threads,
00:20:53.840
the other threads cannot run; they're basically blocked. So it seems natural that, you
00:21:00.000
know, 14 years ago, when we only had one core, that was not a big deal; it was actually probably a good idea to use green threads, and that's what Matz did.
00:21:06.679
It was a good idea. But now that we have multiple cores, native threads are actually quite interesting, because they
00:21:12.799
allow us to run on multiple processors. They also get scheduled by
00:21:18.240
the OS, so you have a bit less work to do. Now, you do still have a lot of work to do, because different OSes work
00:21:23.679
differently, so threads are implemented differently, but at least you don't have to deal with some of the problems. The
00:21:29.080
other thing is, a blocking IO happening on a native thread won't block the other threads, so that's good. So the problem
00:21:36.200
with threads in general, not just native threads, is that you need to communicate between these threads using shared
00:21:42.760
memory. What that means is you need to use mutexes and locks, because you have a shared resource, and you
00:21:48.840
have two threads trying to read it and modify it; you don't want that to happen, so you need to put a lock around it that says: you can access it,
00:21:55.159
then when you're done, the other one can access it. That's actually a lot of work, and if you don't do it
00:22:00.440
properly you can actually corrupt your data, because you now have two threads accessing and modifying the same data structure, and you get data
00:22:07.760
corruption. You could also have deadlocks: you have your lock, somebody acquired it and never released it,
00:22:14.039
everybody else wants to access the data, you get contention, you get locked up, and you're in a bad situation. Java
00:22:19.960
developers would know it's not fun to deal with. It also adds a lot of code
00:22:25.080
complexity, because you have to worry about all these things, and that's kind of the challenge with
00:22:31.320
threads. Also, if you have too many threads, you have to do context switching, and you saw how you
00:22:37.039
have to go back and forth. It's a bit complicated to explain, because it
00:22:43.520
depends on the system and how it's being used, but if the number of threads is not appropriate, the context switching will
00:22:49.480
be quite expensive. Now, context switching on a native thread is much faster, as we saw, than on a
00:22:55.400
green thread. Finally, you get nondeterministic behaviors with threads, which is a bit scary.
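The mutex-and-lock work described above looks like this in Ruby; without the synchronize block, the read-modify-write on the shared counter could interleave between threads and corrupt the result:

```ruby
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Only one thread at a time may run this read-modify-write;
      # the others block until the lock is released.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
counter # => 10000
```

This is the cost the talk is pointing at: every shared structure needs this kind of discipline, and a lock taken but never released is exactly the deadlock scenario just described.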
00:23:02.320
Ruby 1.9 also added something else: fibers, or coroutines; they're called both ways. What
00:23:10.159
that means is actually very simple. People seem to be confused, but if I go back to my graph here: the
00:23:16.400
scheduler will switch between each thread. When you have a fiber, it's not the scheduler doing that; it's you, as a
00:23:22.080
developer. So you say: start this fiber, this lightweight thread, and then you
00:23:28.080
tell it when to stop and when to come back and run again. So basically you're handling your own scheduling on
00:23:35.400
your own thread, with a bunch of fibers. It doesn't really help with blocking IOs,
00:23:41.279
but it could help if you know you don't need to go back and forth between threads, because you know it's going to take a certain amount of time. Fibers are
00:23:47.600
also fairly lightweight, so they might save you some memory. So that's fibers.
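A minimal fiber: unlike a thread, it runs only when you resume it, and it hands control back exactly where you tell it to:

```ruby
# The fiber pauses itself at each Fiber.yield; resume restarts it
# from where it left off. No scheduler is involved.
gen = Fiber.new do
  Fiber.yield 1
  Fiber.yield 2
  3
end

values = [gen.resume, gen.resume, gen.resume]
# values is [1, 2, 3]
```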
00:23:53.320
The big question is: why are threads not popular with Ruby developers?
00:23:58.600
And there are a few reasons. First, we used to have only green threads, and people thought: well, you know, it's not
00:24:05.120
worth it. Then we had a history of a lot of blocking C extensions. So if I go
00:24:11.360
back again to my graph: you see, when I said blocking IO, the thread gets basically taken off the scheduler.
00:24:18.600
Well, when you write a C extension, you actually need to tell it; you need to write it in the code, you need to say: hey,
00:24:24.400
this is a blocking operation. It doesn't matter exactly how; you need to do something, and
00:24:29.679
that lets it go. If you don't do that, then it will actually block until the end of the 10 milliseconds before it
00:24:35.279
switches back to the other thread. So we had this issue with some C extensions that were not doing that, and that was
00:24:42.360
fixed. So that means that blocking IOs should not happen with DB drivers and
00:24:47.399
other types of code. Now, the other issue was Rails: Rails 2.2 or 2.3
00:24:54.279
was not thread-safe, which meant that if you were using threads, you would have really weird behaviors, and even to this day,
00:25:00.720
if you start a new Rails project, by default it will use a big lock around every single request, so it's not
00:25:07.840
going to handle all the requests in different threads. There's also a lack of
00:25:13.360
knowledge and understanding; people are a bit confused sometimes about what's going on with
00:25:19.039
the threads. And finally, we have a lot of multicore machines now; we actually want to take advantage of
00:25:26.679
the different cores, so we want to use threads. Now, there's one problem, a big problem, that Dr Nic
00:25:32.640
loves to talk about, which is the global interpreter lock. So what is this GIL that people are talking
00:25:38.320
about? What does it mean? Why is it so annoying? Why do people hate it? Why did Matz put a GIL in Ruby to start with?
00:25:45.520
Well, the global interpreter lock is actually something very simple to understand: to avoid data corruption,
00:25:52.720
and for the reasons I'm going to address in a second, every single thread can talk to the VM only one at a time; you
00:25:59.559
cannot have two threads talking to the VM at the same time. So, if you remember the bytecode: when the
00:26:05.799
bytecode is interpreted, you can only have bytecode from one thread being interpreted at a time, to put it simply.
00:26:12.039
So that's really not a problem if you have one CPU, because anyway the CPU can only handle one instruction at a
00:26:17.520
time. The problem is if you have more CPUs: if you have two CPUs, you can still only handle one code execution at
00:26:24.840
a time. That's kind of a problem; it's not concurrent anymore. So the
00:26:30.320
reasons why there's a global interpreter lock: first, to make
00:26:35.840
developers' lives easier; it's really harder to corrupt data, because you don't have access to the data at the same
00:26:41.600
time. It's also to avoid race conditions with C extensions; at the C extension
00:26:46.720
level, it's much easier to run into race conditions, and I'll talk about that
00:26:52.919
in a sec. So you make C extension development much easier, because, if you don't
00:26:58.559
have a global interpreter lock, you need to write a lot of code around your C code to make it thread-safe, and if
00:27:04.240
you ever wrote a Python extension, for instance, a C extension, you know that the C API is much harder to use and you
00:27:10.039
need to do a lot more work, and if you don't do it right, you're going to have a lot of problems, like memory leaks and data corruption. Most of the C libraries
00:27:17.039
out there that people are wrapping are not thread-safe, starting with the regular expression engine that's used in
00:27:23.240
CRuby. Now, there's another big problem: part of the Ruby implementation
00:27:29.480
itself, like the hash implementation, is not thread-safe, and that means that if
00:27:34.679
we remove the global interpreter lock, we need to go back and fix all these things, which will create some problems, like
00:27:41.320
making Ruby itself slower. So, should we remove the GIL? That's Dr Nic's big
00:27:48.120
question. There are a lot of implementations that don't have a global
00:27:54.559
interpreter lock, so should we remove it in MRI? Well, there are a few answers, and I'm not going to argue one
00:28:00.960
way or the other; I just want to expose the arguments to you, so you can understand. Well, first, if we remove the
00:28:07.360
global interpreter lock, it will make Ruby code unsafe. It is a
00:28:13.120
fact that you need to understand how the data works, otherwise you're going to corrupt it; you also need to worry much more about mutexes and
00:28:22.039
locks. If we do that, it will also break the C extensions, and that's something people hear all the time, like: so who cares,
00:28:27.600
right? You always tell me, Dr Nic: oh, I don't care, let's rewrite all the extensions, you know, who cares?
00:28:36.480
well I don't think I've ever done that you didn't say that okay so last
00:28:42.919
night okay yes that's correct he never said
00:28:49.600
that with my accent it was more with an Australian accent so the C extensions as I showed you at the beginning
00:28:55.159
if we were to remove the global interpreter lock it's not as simple as changing the C API and
00:29:00.240
the way the calls are made we actually need to change the way memory is handled and you need to
00:29:06.679
handle a lot of the challenges at the C extension level unless we make even more changes so that C
00:29:13.080
extensions stay compatible but that's a big one because it's not as simple as just doing you know a find and
00:29:19.480
replace and then we're good it would also make writing C extensions much harder so for instance
00:29:26.679
depending on the solution that's being used you need to use write barriers and there's a lot of work that
00:29:33.120
would need to happen at the C extension layer to make these things
00:29:38.799
happen it wasn't you I know oh anyway let's move on it's
00:29:45.360
also a lot of work so that might not be a good argument for a lot of you it's like well who cares just do
00:29:51.240
it that's actually what Dr Nic was saying yesterday right do you agree with that it's too much work let's just do it that's
00:29:56.760
what you were saying no okay so it's a lot of work to go into an implementation and
00:30:03.559
change it when you have a new implementation it's much easier but going back will take a lot of time you can actually break a lot of things and
00:30:09.600
it's a big big change that might or might not be worth it a lot of the Ruby applications out there actually are not
00:30:15.440
affected by this thing they're not CPU bound and they can deal with workarounds I believe that most
00:30:21.799
people could deal with the global interpreter lock Python users have been dealing with that we've been
00:30:27.399
dealing with it it's not the best situation but it still works it's not the worst case in the world
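The "most apps are not CPU bound" argument can be made concrete with a small sketch that is not from the talk: MRI releases the GIL around blocking calls, so IO-bound threads still overlap in time. Here `sleep` stands in for a blocking IO call:

```ruby
require "benchmark"

# Even with a GIL, MRI releases the lock around blocking operations
# (IO, sleep, ...), so IO-bound threads still overlap in time.
elapsed = Benchmark.realtime do
  threads = 5.times.map do
    Thread.new { sleep 0.2 } # stands in for a blocking IO call
  end
  threads.each(&:join)
end

# Five 0.2s waits overlap instead of running back to back,
# so the total is close to 0.2s rather than 1.0s.
puts format("5 x 0.2s waits finished in %.2fs", elapsed)
```

This is why web apps that spend most of each request waiting on the database or the network can live with the GIL.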
00:30:34.120
and a lot of people would say well instead of focusing on this concurrency issue why don't we focus more on the memory usage on the garbage
00:30:40.880
collector all these things that also slow down Ruby that could really improve without having to break the C
00:30:46.440
extensions without having to break the way things work and without putting the Ruby code in
00:30:52.559
danger also as I mentioned earlier if we remove the global interpreter lock we would have to go back into the C
00:30:59.000
code in MRI and make sure everything is thread safe and it would make the C code just run
00:31:06.559
slower so what are the arguments for removal well first we really need better concurrency I think that's the
00:31:11.919
main reason we really want concurrency so that's a good argument the other argument comes from
00:31:17.919
the Python community and they have this analogy of the rubber boots and they're saying well it's kind of stupid
00:31:24.080
to wear rubber boots every day just because it might rain well the reality is even if it
00:31:30.320
rains not everybody will be outside in the rain and boots will not solve all the problems in the
00:31:35.960
world so maybe we should not pay this price and we should just deal with the real challenge of working with
00:31:42.919
threads and that's the approach that some people are taking so these are basically the arguments I'm
00:31:48.519
sure you can hear other arguments but these are the main ones you'll hear and this is probably why MRI is not
00:31:54.120
going to lose its global interpreter lock right away
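To see what the GIL actually costs, here is a minimal sketch (my example, not the speaker's; the `burn` helper is made up for illustration) comparing CPU-bound work done sequentially versus in threads:

```ruby
require "benchmark"

# CPU-bound busy work: with a global interpreter lock only one
# thread can execute Ruby code at a time, so threads give no
# speedup here -- they just interleave.
def burn(n)
  i = 0
  i += 1 while i < n
  i
end

N = 2_000_000

sequential = Benchmark.realtime { 2.times { burn(N) } }
threaded   = Benchmark.realtime do
  2.times.map { Thread.new { burn(N) } }.each(&:join)
end

# On MRI both timings come out roughly equal; on a GIL-free VM
# such as JRuby the threaded version can approach half the time.
puts format("sequential: %.2fs  threaded: %.2fs", sequential, threaded)
```

Run it on MRI and then on JRuby and the difference the whole debate is about becomes visible in two numbers.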
00:31:59.200
so how do you achieve concurrency if we keep this global lock well you have
00:32:04.360
multiple ways the easiest one is to say well let's start multiple processes if you have multiple processes each process
00:32:09.840
can work on one core and then the threads will basically use the cores we're good to go the problem with that
00:32:14.880
is it uses twice the amount of memory it's not that great well what you could do is you could fork the process which
00:32:20.480
means you start the process you fork the process and now you have two processes and they can basically share the
00:32:25.880
same memory well I'll make it simple you start the fork and the fork
00:32:32.559
would not need to copy the entire memory into the child sounds good the only problem is that MRI is not copy-on-write
00:32:39.039
friendly or rather the garbage collector of MRI is not copy-on-write friendly which means that when you fork
00:32:44.360
the process you don't have a lot of memory used in the fork but when the garbage collector comes it will go
00:32:50.639
and check on all the objects and basically the memory will increase to be exactly the same amount as on the
00:32:55.840
master process and that's why Ruby Enterprise Edition changed that they changed the
00:33:01.360
implementation they patched the garbage collector to be copy-on-write friendly so now you only pay for the
00:33:06.559
allocations done in the fork now Ruby MRI is actually going to fix this
00:33:12.720
problem they've been working on it for quite a while now to have the bitmap-marking GC so what that means is
00:33:22.480
instead of marking every single object what will happen is that they will keep a bitmap like a big table of all the
00:33:29.559
different slots and which ones are being used and the marks are being put there so when you fork the process
00:33:35.519
this bitmap is copied over to the forked process and now when the garbage collector runs you don't have to
00:33:40.760
actually double the memory you basically keep the entire memory the same as the master so that's going to be
00:33:47.159
fixed there's already a patch for it and it's being worked on so it
00:33:52.279
might be implemented in 1.9.4 or 2.0 so that's coming hopefully soon then you can do evented programming
00:34:00.919
which is done for example using EventMachine or something like that you can use the message-passing actor model which is
00:34:07.080
an interesting approach that a lot of people are talking about where instead of communicating by
00:34:12.119
sharing memory you communicate by sending messages between different objects and this way you can get
00:34:18.879
better concurrency and then the approach that Koichi presented for the future
00:34:24.760
is to run multiple VMs and the way that would work is that within one process you
00:34:29.839
would have two VMs if you have two cores for instance and each VM would talk directly to its own core and now you
00:34:36.879
actually get full concurrency even though you have a global interpreter lock on each VM so that's one way of solving the
00:34:44.200
problem so let's talk about memory management because it's also related to all of these issues so when object
00:34:52.119
allocation happens every time you declare an object you actually allocate something so as an example if you create
00:34:59.280
a string 100 times it will actually allocate 100 string objects if you create a hash that will
00:35:06.200
actually create four objects you have the hash you have the two values and you have the key if you define a class
00:35:14.040
and you create an instance of this class that will allocate one node which is basically the code itself of the
00:35:20.680
class then you have two classes probably the class and the singleton class and then you have the instance object
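You can watch these allocations happen with `ObjectSpace.count_objects`, along the lines of the example the speaker shows later in the talk; this sketch disables the GC so the freshly allocated strings are not collected between the two measurements:

```ruby
# Rough allocation counting: disable GC so freshly allocated
# strings survive until we take the second measurement.
GC.disable
before = ObjectSpace.count_objects[:T_STRING]
100.times { "hello" }            # each literal allocates a new String
after  = ObjectSpace.count_objects[:T_STRING]
GC.enable

allocated = after - before
puts "#{allocated} strings allocated"
```

The count can come out slightly above 100 because other code allocates strings too, but the point stands: every literal in a loop is a fresh object the garbage collector will later have to deal with.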
00:35:28.960
so this is how garbage collection worked prior to 1.9.3 so basically everything that's currently
00:35:34.560
released 1.9.2 and everything before this is the way it was working you have a memory area that's the Ruby heap we have
00:35:41.640
basically a bunch of slots and the ones that are dotted are the live objects they're marked as objects that are being used so think
00:35:47.720
about a variable that's actually being used inside your code you cannot get rid of it right away and when you
00:35:53.640
try to allocate a new object we'll grab from the free list which holds the available slots
00:36:00.280
in the Ruby heap that works great so what happens if you don't have an available slot so if the free list is empty the
00:36:06.839
garbage collector is called and the garbage collector will come and I have a slide for that it will go
00:36:13.040
through the list and check every single object in the entire Ruby heap and say
00:36:18.400
are you free or are you live so it goes and works through all these objects until it has marked each of them so it goes to
00:36:25.359
an object and says are you live if the object says yes it will mark it if it doesn't say anything or if it's not
00:36:31.000
sure it won't mark it at the end it will sweep so it will take all the objects that were not marked and
00:36:38.160
it will free them well it will basically put them into a free list and it will reuse the slots for the new
00:36:44.720
objects being allocated so what happens if you go through the entire set of slots
00:36:51.040
and they're all still in use they all hold Ruby objects so what happens in this case the garbage collector comes scans
00:36:57.040
everything all the objects are marked and there's nothing to do well if the garbage collector cannot find anything to reclaim it will allocate another Ruby
00:37:04.440
heap with more space and these slots can now be used for new objects
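The allocate/mark/sweep/grow cycle just described can be sketched as a toy heap (purely illustrative -- `ToyHeap` is my invention, not MRI's implementation, and it only marks the root slots themselves where a real GC traverses the whole object graph):

```ruby
# Toy mark-and-sweep: allocate from a free list; when it is empty,
# mark the roots, sweep everything unmarked back onto the free
# list, and grow the heap only if nothing could be reclaimed.
class ToyHeap
  Slot = Struct.new(:object, :marked)

  def initialize(size)
    @slots = Array.new(size) { Slot.new(nil, false) }
    @free  = @slots.dup          # free list: slots available for reuse
  end

  attr_reader :slots

  def allocate(object, roots)
    collect(roots) if @free.empty?
    grow if @free.empty?         # everything was live: add a new heap
    slot = @free.pop
    slot.object = object
    slot
  end

  private

  def collect(roots)
    roots.each { |slot| slot.marked = true }   # mark phase
    @slots.each do |slot|                      # sweep phase
      if slot.marked
        slot.marked = false                    # unmark for next cycle
      else
        slot.object = nil
        @free << slot
      end
    end
  end

  def grow
    fresh = Array.new(@slots.size) { Slot.new(nil, false) }
    @slots.concat(fresh)
    @free.concat(fresh)
  end
end
```

With a two-slot heap, allocating a third object triggers a collection that reuses the slot of whichever object is no longer reachable from the roots; if every slot is live, the heap doubles instead, which is exactly the behavior described above.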
00:37:10.920
now in 1.9.3 things change and this is quite important to understand because it will really affect performance in
00:37:17.400
a lot of your applications so this is what happens you want to allocate a new object we have a free list it will take the object the
00:37:23.839
same as before now what happens if the free list is empty well if the free list is empty and you want to
00:37:29.599
allocate a new object in this case the garbage collector will just take one row and will mark
00:37:36.560
it it will go through and find the objects that are used and if they are not used it will sweep only this row and
00:37:43.599
use that to allocate the new object which means that the garbage collection time will be much much faster because it
00:37:48.800
doesn't have to scan everything so what happens if there's still nothing at the end
00:37:56.920
if you basically go through all the rows and there's still nothing what happens is we do the same thing we allocate a new
00:38:02.160
Ruby heap to get more space so what you need to
00:38:08.119
understand is with this approach we might run the garbage collector a bit more often but the time spent in the garbage collector will be much
00:38:15.200
smaller and if you run some benchmarks and I did on some of my applications you actually see quite a
00:38:21.920
lot of performance improvement depending on your application obviously so there are different types of garbage collectors
00:38:27.560
you know we hear people talking about the conservative garbage collector like it's the awful garbage collector but you have the precise one
00:38:33.839
that sounds like the good one right but how come MRI doesn't use a precise garbage collector well in the case of
00:38:41.040
CRuby the garbage collector is a conservative garbage collector and the reality is if you use
00:38:48.200
a C implementation you have to use a conservative garbage collector that's the case also for MacRuby and the reason
00:38:54.000
for that is because you deal with pointers and you're not always sure if
00:38:59.119
a pointer points to an object and you're also not sure about the length of the arrays so using a
00:39:06.240
conservative GC is actually normal now CRuby is a stop-the-
00:39:12.480
world garbage collector it's the same thing for Rubinius what that means is that when the garbage collector runs nothing
00:39:19.040
else can run at the same time so everything is stopped while the garbage collector runs which is why it's important that the garbage collector
00:39:26.000
doesn't run for too long because in the meantime nothing happens so the lazy sweep allows the stop-the-world pause to
00:39:31.680
actually be shorter so more code can run within the same
00:39:38.040
time and then there's the lazy sweep which I just explained which means that the time spent in the garbage collector at once is
00:39:45.079
now lower than it was before so if before a garbage collector run
00:39:50.560
might take let's say 20 milliseconds which is quite a lot now
00:39:55.800
it would probably spend only 2 milliseconds but it would run more often so the time spent in the garbage
00:40:00.920
collector overall within your program will be the same or maybe even longer but the time spent every single time
00:40:06.440
will be shorter so we have other examples of garbage collectors in the Ruby world
00:40:12.319
we have the case of MacRuby which I work on so I know it a little bit it has a multithreaded generational GC so what
00:40:19.560
does that mean well that means that in the case of MacRuby we run multiple VMs and we don't have locks they're
00:40:26.079
re-entrant VMs and they don't have locks the locking happens in
00:40:31.200
the core which I'm not going to explain right now but the GC runs on a different thread connected to the
00:40:36.880
VM and every single thread is registered to the GC generational means that objects
00:40:43.839
are organized by the time they've been spending in the garbage
00:40:50.160
collector or in memory so you have the youngest objects that get allocated in one place and then you have the
00:40:55.400
young objects which are the ones that survive a few GC cycles and basically your data
00:41:00.920
is organized in different ways depending on how long it lives and the reason why it's done like that is that in the first
00:41:06.560
generation objects are usually allocated and deallocated right away so you want to have them in one place
00:41:12.920
this is also the case for Rubinius which uses a generational garbage collector
00:41:18.760
the difference is in their case because it's a precise garbage collector and I will explain that in a second they can
00:41:23.839
actually move the objects from one memory location to another so Rubinius
00:41:31.400
uses a stop-the-world precise moving generational GC so stop-the-world is a difference from MacRuby where MacRuby
00:41:37.359
does the garbage collection on the side Rubinius stops the world it doesn't stop it for a long time so it's not a big deal
00:41:43.839
but it still stops the world it's precise and what that means is that unlike CRuby the implementation
00:41:52.920
itself handles all the objects and knows exactly what's going on with them so it doesn't have to guess it doesn't
00:41:59.319
have to be conservative and conservative I didn't really explain what it is conservative means that
00:42:05.400
when it's time to mark and release objects if
00:42:10.960
it's not sure it will basically keep the object in memory in the case of a precise garbage collector we know exactly how the memory is
00:42:18.599
allocated so we can actually take care of that we can even move the objects in
00:42:24.119
memory so also because it's precise which I just explained you can actually move these things around it avoids
00:42:30.240
fragmentation because as you free slots they don't always have the same size and when you want to allocate new objects they might not fit by moving things
00:42:37.720
you can actually allocate the memory in a better way and you avoid fragmentation by quite a lot
00:42:44.400
potentially Rubinius could also release the memory that's not being used so let's say you allocate a lot of
00:42:51.000
objects the memory grows a lot and then you actually end up not using that much Rubinius could clear this
00:42:59.400
memory so you would see the usage going down when you use MRI once you hit
00:43:04.559
you know 100 megabytes which is probably what happens when you start Rails you will never go
00:43:10.960
down so there you have another type of garbage collector which is reference counting and this is what's used by
00:43:16.040
CPython and other solutions like Objective-C on iOS if you don't use
00:43:21.839
a garbage collector well on iOS you don't have a garbage collector yet so each object will keep track of its own
00:43:29.119
state what that means is I create my object and I tell the runtime this is who depends on
00:43:34.200
me this is what I do and it keeps counting what's going on and when it's time to be released the object
00:43:40.040
itself will basically say hey I'm free to be deleted this is great because it's decentralized you don't have one place
00:43:46.200
that deals with that every single object does it the problem is that it's really hard for everybody else because now C extension developers need to do
00:43:52.800
that for every single allocation they do you have a lot of opportunities for memory leaks and for a lot of
00:43:59.520
other problems you also have the hybrid solution which is the case of the JVM and I'm not really familiar with the
00:44:06.200
details of the JVM but the JVM is a VM that you can configure when it comes to the garbage collectors so you
00:44:12.240
can decide how you run the garbage collector but what it can do is it can be generational like we explained
00:44:18.640
before it can also do the occasional mark and sweep where it goes through the entire memory and it can be copying
00:44:26.200
the data and copying the memory which I will not explain now because I'm running out of
00:44:31.960
time so what are the tricks well there are a bunch of tricks and the
00:44:37.400
MRI team is working on a lot of things to try to optimize garbage collection lazy sweep was one step but they
00:44:42.800
want to do much more than that so the bitmap marking is what I explained
00:44:48.240
before and that allows for better forking which means that if you start
00:44:53.359
let's pretend you start one Rails app and you have the same code you could fork this process and have two or three
00:44:59.599
other Rails apps and it would not be four times the cost in memory you would basically share what
00:45:05.359
was loaded at the beginning and if you use Ruby Enterprise that's exactly what you get then there's the parallel
00:45:11.400
marking where every single thread would basically talk to the GC
00:45:17.240
and would mark its own objects in every single thread it's still stop-the-world but it's done differently Nari did a
00:45:23.760
talk on that you can see the slides online if you missed it or you can watch the video when it's released then
00:45:29.240
you have the mostly-copying GC so with the mostly-copying GC the problem in
00:45:35.040
memory is fragmentation in this case that's the problem you're trying to solve so the problem is we have these objects we allocate we
00:45:41.160
deallocate and when you have new objects they don't always fit in the old object spaces so what you do is you copy
00:45:48.079
the memory you basically copy the objects that you know are safe and you move them somewhere else so you can
00:45:53.680
reallocate the other ones the point with that is that you end up using more memory but the fragmentation issue
00:46:00.280
almost goes away then there's what Twitter did with its own tweaks on the garbage collector and
00:46:06.119
you can read about it online there's a link in my slides basically they did a
00:46:11.240
few things related to what Ruby Enterprise Edition did plus they applied some sort of long-life GC patch which
00:46:16.559
means that it's almost generational so they look at the objects that are around for a long time so when you start Rails
00:46:22.640
you load a lot of code and a lot of objects that will never go away so what they do is they take these objects and
00:46:28.280
put them in a different place so they don't get garbage collected or not as often there was a long-life GC patch
00:46:35.559
that was almost applied to 1.9 but it turned out it was not the best solution ever and Matz can explain
00:46:42.960
that in more detail they're really working on the GC so if you have the opportunity to talk to them I'm sure
00:46:48.800
they would be glad to explain then the last thing is something that was in Ruby Enterprise and that's back in Ruby 1.9.3
00:46:55.520
it's GC tuning and GC tuning allows you to set a lot of settings on the garbage collector and you need to
00:47:02.000
understand a little bit how that works but by doing that you can make sure the garbage collector usage is optimal
00:47:08.640
so it will go faster so what's the big deal with the garbage collector why do I care that much and why do I get
00:47:15.200
upset when people allocate too many objects and I tell them they're making a mistake well it's like if you
00:47:20.280
have a car if you have a car and it's a small car let's say you have a small Fiat and J.Lo is not in the car and you
00:47:26.040
want to do something with this car if you put 15 people in this car it's not going to go fast so you need to think
00:47:31.359
about the type of car you have if you understand how things work you'll be able to use the code the best way you can
00:47:37.640
and the reality is currently the garbage collector makes things a bit slow if you generate the RDoc at
00:47:45.800
least for Nari what happened for him is that he spends 80 seconds to generate the entire RDoc and 30% of this
00:47:52.960
time is spent in the garbage collector at Twitter in 2009 they did some
00:47:59.440
tests and they realized that they were spending 20% of the front-end CPU on garbage collection I did some
00:48:05.800
tests myself when I was working at Sony and I was seeing the same number and sometimes even higher depending on the quality of
00:48:11.200
the code that was written and the amount of objects being allocated per request so there is a concrete effect on
00:48:18.440
the performance of your code especially with Rails where a lot of objects are allocated on every single
00:48:24.559
request you need to realize the cost of the garbage collector because it will actually slow down every response
00:48:31.280
when you get a lot of requests coming in and what you can do is use the tools
00:48:37.559
that are provided to you by Ruby 1.9 to see how the garbage collection is being
00:48:43.000
used so this is a simple example to show you how that works at the beginning of the code you turn on the
00:48:49.559
garbage collector profiler and whenever you want you output the result of the garbage
00:48:55.520
collector profiler and when you do that it will basically show you
00:49:00.680
all the different collections that happened how long they took what's the total memory
00:49:06.160
that's allocated and what's actually used within this memory when was it
00:49:13.240
invoked in the timeline of the execution when was it executed how many objects are
00:49:20.119
available how long did it take in milliseconds now by itself that might not be very useful but you can
00:49:27.040
use it with ObjectSpace where ObjectSpace allows you to see how the memory is being used what's in memory
00:49:34.440
and here's a simple example where I disabled the garbage collector because I
00:49:41.000
didn't want objects to be freed so I turn off the garbage collector I
00:49:46.160
take a reference which is at this point in time how many strings do I have in memory and then I allocate 10,000
00:49:54.680
strings and then I get the new count of
00:50:00.520
strings and I remove the reference and then I print the number of strings and you can see ObjectSpace.count_objects
00:50:06.640
is basically a big hash representing the memory the total is the number of objects that are available
00:50:13.880
free is the length of the free list and then you have the allocations per object type so this tool can
00:50:21.319
actually really give you an idea of how the garbage collector is working and what you have in memory and what you're doing and I wrote a very simple Rack
00:50:28.280
middleware to show how that works to do some of the work I was doing to see the optimization of the garbage collector so
00:50:33.799
I wrote something very very simple it's not fancy but it basically gives you that as a Rack middleware it runs at
00:50:40.440
least when you enable it and as garbage collections happen it
00:50:47.559
will tell you the last GC cycle happened x requests ago four
00:50:54.440
GC cycles were run in this case and this is the amount of time spent in each of them and you can see
00:51:01.119
in this case I'm spending 14 to 15 milliseconds in the garbage collector
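This is not the middleware from the talk, but the idea can be sketched in a few lines: a Rack-style middleware (the `GCStats` class name and header are made up here) that reports per request how many GC cycles ran and how much time they took, using `GC.count` and `GC::Profiler`:

```ruby
# Minimal sketch of a Rack-style middleware reporting per-request
# GC activity: number of GC cycles and time spent in them.
class GCStats
  def initialize(app)
    @app = app
    GC::Profiler.enable
  end

  def call(env)
    cycles_before = GC.count
    time_before   = GC::Profiler.total_time

    status, headers, body = @app.call(env)

    cycles = GC.count - cycles_before
    ms     = (GC::Profiler.total_time - time_before) * 1000.0
    headers["X-GC-Report"] = format("%d cycle(s), %.2fms", cycles, ms)
    [status, headers, body]
  end
end

# Usage: wrap any Rack app, e.g. a bare lambda endpoint.
app = GCStats.new(->(env) { [200, {}, ["ok"]] })
```

Wrapping an allocation-heavy endpoint with something like this makes the cost discussed here visible request by request.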
00:51:07.000
my goal working on APIs that had to be fast was to get a response time of
00:51:13.559
30 milliseconds anything above 100 milliseconds was really bad and the goal is really 30 milliseconds so if I spend
00:51:20.680
that much time in the garbage collector I will have major problems and at the end I basically use ObjectSpace to show
00:51:26.839
you some stats about how the memory changed between the last GC cycle and this GC cycle so that gives
00:51:33.200
you an idea of how the memory is used and I would really want to encourage people to pay attention to
00:51:38.319
memory allocation because it really costs you a lot of time on request
00:51:44.400
time that's it any
00:51:50.680
questions oh yes it's merbist I forgot to mention it thank you yes it's the same as my
00:51:56.160
Twitter handle so I put the slides there in HTML you can see them any questions Nick do you want
00:52:04.799
to come and talk about your problem with the global
00:52:10.880
lock okay so any
00:52:16.920
questions okay a question over there oh no questions I see a question over
00:52:23.480
there I do not believe
00:52:30.559
that that was not a real question you are absolutely right I was
00:52:36.119
wrong about that the photo was not taken by the person I mentioned in the source any other question
00:52:53.920
question so let me repeat the question the question is a lot of things changed in the past with MRI for
00:53:02.200
instance the strings and the encoding that broke things in 1.9 so how come we cannot change the
00:53:08.680
global interpreter lock is that what you're saying or the garbage
00:53:16.319
collector what makes the GIL special would you like to answer Matz
00:53:29.720
the main concern we have for the GIL is not the compatibility but its unsafeness so if we allow C
00:53:38.359
Ruby to crash when you use threads we can remove the GIL but I don't want
00:53:45.319
Ruby to be like that thank you
00:53:58.839
any other questions that's it well thank you very
00:54:03.920
much and see you next year