Summarized using AI

Complex Ruby concepts dummified

Matt Aimonetti • September 29, 2011 • New Orleans, Louisiana • Talk

In the talk "Complex Ruby Concepts Dummified" by Matt Aimonetti at RubyConf 2011, the speaker aims to simplify complex Ruby concepts to improve developers' understanding of the language. He emphasizes that while Ruby allows developers to focus on application logic, a deeper understanding of its underlying mechanisms can enhance a developer's skill set.

Key points discussed in the talk include:

  • Understanding Ruby Internals: Aimonetti showcases how the various components of a Ruby application interact. He discusses the importance of grasping backend mechanisms, such as the parsing process, execution through the virtual machine (VM), and how code is transformed into bytecode.

  • C Extensions and Their Role: The use of C extensions is explained as a way to improve performance in Ruby applications by leveraging existing C libraries. Aimonetti highlights the necessity of managing memory carefully when using C extensions to avoid leaks.

  • Concurrency Challenges: A key focus on concurrency highlights the need for Ruby programmers to understand threading concepts. Aimonetti differentiates between green threads (managed by Ruby) and native threads (managed by the OS), emphasizing the limitations and advantages of each.

  • Global Interpreter Lock (GIL): A significant portion of the talk is dedicated to discussing the GIL, which affects Ruby's concurrency capabilities. Aimonetti explains that while the GIL simplifies development by ensuring thread safety, it also limits concurrency, especially on multi-core systems.

  • Memory Management: Finally, Aimonetti covers garbage collection processes in Ruby and their impact on performance. He introduces various garbage collection strategies and stresses the importance of managing memory allocations to improve application efficiency.

In conclusion, Aimonetti encourages Ruby developers to gain a deeper insight into these complex concepts, as they are crucial for producing efficient and scalable applications. By demystifying these topics and encouraging further exploration, he aims to empower developers to better grasp Ruby's elegance under the hood, ultimately improving their coding practices.


Date: September 29, 2011
Published: December 13, 2011

Programming languages, such as Ruby, are natural and elegant. But to achieve this elegance, things have to happen under the hood. Garbage Collection, concurrency, Global Interpreter Lock, metaprogramming, C extensions are just some of the things happening with or without our knowledge. Trying to understand these concepts, their implementations and their implications in daily coding might seem daunting. However, having a good understanding of these topics will make you a better developer. No CS degree or PhD required to attend this talk.

RubyConf 2011

00:00:17.320 we're going to talk about some of the Ruby concepts that I tried to simplify as much as I could because I think it's
00:00:23.160 very important that we have a good understanding of things
00:00:28.640 and so this is a cross-section of a Ruby app you basically have the
00:00:35.040 crust which is what people see from the outside and as you go deeper and deeper people have less and less understanding
00:00:40.480 of what's going on and my goal is not to go and talk about the core and
00:00:45.520 how it's made and why you should use this implementation instead of another my goal is really to explain why things
00:00:53.039 work the way they do so we can have a better discussion about the different possibilities and the
00:00:59.199 solutions we could implement so 90% of our work is done on the surface we
00:01:04.400 probably all write Ruby code and we don't all work on different Ruby implementations and that's totally fine
00:01:11.600 that's actually really good that's what Ruby was designed for so we can focus on the core of the application and we can
00:01:19.600 deliver business value and make money which is probably why we have a job so why should we care and why should
00:01:26.960 you come and listen to this talk if that's not so important for us
00:01:32.200 well the reality is even though you're on top you need to know what's going on underneath so you can have an overall
00:01:38.399 understanding and you can understand when people are arguing and they're saying well we should have better concurrency what does that mean we need to remove
00:01:45.520 the global lock well what does that mean why isn't it removed is Matz just somebody who doesn't want to remove the
00:01:51.320 global lock is there a reason why it was added in the first place what's a green thread what's the difference with
00:01:56.880 a native thread people told us it's better but why is it better so my goal is really to try to
00:02:04.479 explain some of these concepts as simply as I can so you can understand the dependencies between each of
00:02:10.280 them and we're also all craftsmen we really try to do something we care about we
00:02:16.519 believe it's almost an art and we get together and we want to be motivated by the Ruby spirit we want to work and
00:02:25.760 we really want to do something together and it's interesting because as people are fighting for changes in the
00:02:31.560 Ruby language or in the Ruby implementation there's a desire for all of us to do a better
00:02:38.040 job so there's a lot of stuff I won't be covering stuff you should know
00:02:43.640 that other people are talking about so I don't really care too much about that at least for this talk on the other hand
00:02:50.040 I will cover a few things that you don't have to know but that would be good to know so we're going to
00:02:56.560 start at the beginning we're going to talk about how source code gets parsed to
00:03:01.720 become something and then gets executed and it's all magical and
00:03:07.440 then we have the C extensions that sometimes we have to use like Nokogiri or the mysql gems and these things that
00:03:13.720 are a bit annoying sometimes and we'll talk about that and see how they relate to all of this and then we're going to
00:03:19.159 talk about concurrency because concurrency is related to all of this and it's a really hot topic nowadays and
00:03:24.440 everybody has their own opinion and they think they have a solution for it and different implementations
00:03:30.760 approach it differently and finally we'll talk about memory management because memory management we don't have to do it
00:03:37.680 but somebody does it which is the implementation and if you don't understand how that works you might not
00:03:43.080 realize you're doing something wrong or you might not realize the arguments for one solution against another one so
00:03:49.439 let's get started oh before we get started this is Ruby 1.9 only and it's actually MRI
00:03:55.040 so I will quickly mention other implementations but I'm going to focus mainly on 1.9 with some discussion
00:04:02.360 about what changed but it's really 1.9 only if you're still using 1.8 I'm sorry for you it's time you moved
00:04:08.239 on so it might sound very boring to you um yes it might be boring my goal is to
00:04:13.400 make it less boring if it's really boring uh you have Cafe Deon over there and you can get a good coffee with with
00:04:19.040 beign or you can just wait a little bit go through that and um then you can think about what we
00:04:25.400 discussed so let's start at the beginning uh we have a lot to cover so I'll be try to be I'll try to be quick
00:04:31.280 so let's see what happens I write my own source code and it's just a simple
00:04:36.639 hello world what happens when I write this code well what happens is that
00:04:42.560 first there will be a tokenization process what that means is we're going to cut this text that comes from the source
00:04:48.720 code and it's going to be broken down into tokens and here you have a representation of the different
00:04:53.960 tokens now after the tokenization process there's a lexing step that
00:05:00.080 will happen based on a grammar and basically this is how we will break down
00:05:05.320 the source code into things that might make sense for somebody to interpret so you can see how things are broken down
00:05:11.960 into smaller parts and then as you can see here I explain a little bit how
00:05:18.199 that works you have basically the line number where it starts the column it's on the type of token and the token
00:05:26.400 itself so once we have that the parser can do its job so it will take this
00:05:31.840 lexed representation and convert it into an AST which is an abstract syntax tree and it's
00:05:38.800 basically a bunch of nodes put together that explain how the language will be executed based on the grammar that was
00:05:44.400 defined so here we say we have a program that starts we have a first command that's a puts and it's on
00:05:51.800 column zero line one and then we're going to pass some
00:05:58.639 arguments in this string blah blah blah and you can see all the information now this AST is a representation of the
00:06:04.000 program now it's not enough to do much but it's enough to have the language defined so this is implemented in
00:06:11.479 MRI using a lexer and a parser and you can actually go and look in the source
00:06:16.520 code and see how that works if you're interested and you can see how the language is basically being
00:06:22.880 explained and between the different Ruby implementations that's what's really shared everybody shares the same
00:06:27.960 approach to parsing source code so if you want to look at how it works in Ruby you
00:06:33.440 can actually use Ripper and Ripper is a nice tool that was provided in Ruby 1.9 that allows you to see the
00:06:41.680 underlying lexing and parsing that's being done and Ruby MRI uses lex and
00:06:48.000 Bison but with Ripper you can actually get an idea of how that works so you can take your code and look at it
00:06:54.400 and see the end result there are also some smart people that did it the other way around where they
00:07:00.000 basically generate an AST so the AST I showed you here
00:07:06.039 is actually an s-expression that's not really how it is inside Ruby but you can actually take this AST modify it on the fly and
00:07:13.479 evaluate it but anyways this is how you can play with the AST to understand the language better and see
00:07:19.160 how Ruby interprets your code
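As the speaker notes, Ripper (in the standard library since Ruby 1.9) exposes both the token stream and an s-expression view of the AST; a minimal sketch:

```ruby
require 'ripper'

src = "puts 'hello world'"

# Tokenization: each entry starts with [[line, column], token_type, token]
tokens = Ripper.lex(src)
tokens.each { |pos, type, tok, *| puts "#{pos.inspect} #{type} #{tok.inspect}" }

# Parsing: Ripper.sexp returns an s-expression view of the AST,
# rooted at :program, built from the same grammar MRI uses
ast = Ripper.sexp(src)
p ast
```

Walking the nested arrays of `Ripper.sexp` is a handy way to see exactly which grammar rules your code triggers.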
00:07:26.440 now once you have this AST you have the virtual machine that will do two things this runtime or virtual machine does the compilation and the interpretation and the compiler and the virtual machine
00:07:33.039 were written by Koichi Sasada who's here and I'm sure you know him and it was
00:07:39.319 replaced in 1.9 are you here Koichi maybe he's not here after all that's sad
00:07:45.240 but he did a great job and he's still doing a great job so the job of the compiler is to take the AST
00:07:52.199 we saw and compile it into bytecode and the bytecode is an optimized representation of the AST
00:08:00.240 now once we have this bytecode we can actually interpret it and execute it by running it and that's
00:08:07.120 what's basically going to run your program so this VM also handles the
00:08:13.759 concurrency and the extension libraries and we're going to talk about that in a minute so don't worry too much about it
00:08:18.840 for now so for the VM implementation you have a bunch of C files I would recommend if you're interested
00:08:25.080 in that to look at the code and see how it's done it's not really easy if you don't know C but really if
00:08:32.080 you want to become a better developer I would encourage you to learn a bit of C it would just make you better you
00:08:38.039 don't have to be a C expert and you don't have to contribute to MRI core but I think it would really help you
00:08:43.959 understand a different world of programming so these are the rest of the VM
00:08:49.720 files now to understand how an implementation of Ruby works these are the different parts of a Ruby
00:08:56.360 implementation you have the garbage collector which is what's going to manage the memory and we're going to
00:09:01.399 talk about that again in a minute then you have the built-in classes like Hash Array Object all these
00:09:08.959 classes that exist then you have the standard libraries like Net::HTTP
00:09:14.200 OpenSSL these are the standard libraries then you have the string encoding and transcoding which was modified for 1.9 which is
00:09:20.959 very very important because it allows us to deal with different types of encoding like UTF-8 versus ASCII or binary
00:09:28.800 encoding then you have the regexp engine which is what allows us to use regular expressions you need to realize
00:09:34.560 some languages don't have regular expressions and it can be a pain then you have a bunch of small utilities that
00:09:40.680 allow us to debug how things are working inside to do time formatting and things like that then you have the
00:09:46.600 parser and then you have the VM so this is what makes a Ruby implementation so let's talk about C
00:09:52.480 extensions because they're really critical to the discussion and to everything else interestingly enough you can
00:09:58.360 hear people say well we cannot change this because of the C extensions we don't want to break them what does that
00:10:03.440 really mean we first need to understand what a C extension is and how it works so a C extension is usually used when
00:10:10.200 you want to wrap an existing C library so you can think of Nokogiri or the mysql
00:10:15.560 gem or a lot of different C extensions that actually have C code and then expose this C code it could be used
00:10:22.760 for performance but usually it's because people reuse libraries that already exist and already perform well that you don't
00:10:28.200 want to rewrite so how does it work well in C you can actually define objects in
00:10:34.880 the Ruby world so this is some code I took from Aaron's Nokogiri and you
00:10:40.720 can see he declares a bunch of VALUEs and then he will use the C API
00:10:45.959 that's provided by the Ruby headers and he can say Ruby define a module called
00:10:51.120 Nokogiri and assign that to the VALUE and then I also want to define another
00:10:57.519 module underneath Nokogiri and then I would define another module underneath
00:11:02.760 XML which is under Nokogiri and then you can see another example of the C API
00:11:08.120 where we define a constant and we set it inside a different module so this is how you write C extensions that seems
00:11:14.560 pretty easy now the problem is you also need to expose C functions so a C function is not really hard you
00:11:21.760 can see we're doing the same thing we're basically creating a module and in this module we define a singleton method and
00:11:28.200 this method points to a function that's defined underneath so whenever we call bonjour on Ruby it's
00:11:34.480 going to call the C function that's defined underneath and the C function returns a Ruby string so that's the
00:11:40.519 extension now if you stop here you think well okay so what's the big deal well the big deal is that usually
00:11:48.320 what you do is you actually wrap structures in C and that means
00:11:53.360 you need to manage memory now that means you need to understand the garbage collector because you will create
00:11:58.480 objects in Ruby that wrap structs in C and you need to make sure you don't leak memory and you need to make sure
00:12:04.399 things work properly so when you define a class in this case I took that from the mysql2 client when you define
00:12:10.880 the class you need to say when my class is allocated this is what you're going to do in C and you're going to allocate
00:12:16.440 a few things and then we have the C function that basically defines how the memory is allocated and you can see
00:12:22.839 that the author first defines a VALUE object then we have a pointer to a wrapper it's a data
00:12:29.320 type in C we don't really know what it is yet and then you have this line that says Data_Make_Struct and this is
00:12:34.880 basically how in a C extension you're going to create a class or an object or
00:12:40.040 whatever you want that will point to your own object in C and when you do that you say hey Ruby by the way
00:12:46.760 when you run the garbage collector please run these other
00:12:52.600 functions for me so I can maintain my own memory and we don't leak memory and what's interesting to see is this
00:12:58.320 data structure has a name has these different functions and then it has a pointer to the data which
00:13:03.959 is the wrapper we defined above and you can see that the wrapper is then set all in C and then some memory is
00:13:10.360 allocated directly to this wrapper object and then we return the new Ruby object that was mapped so Ruby can
00:13:17.720 use it so what happens is when the garbage collector goes through and wants to check if the object is
00:13:24.120 being used it will call one of these functions which is rb_mysql_client_mark that was defined here and here
00:13:30.959 we'll check if this object that we're pointing to still exists and if it does we're going to mark it to say it's
00:13:37.639 a live object don't do anything about it and if the memory needs to be cleared so if we need to free the
00:13:44.000 object in this case we're going to call the C code and say hey by the way take this wrapper close this object
00:13:51.240 which we need to define and that will free the memory that was allocated down here almost at the last line and then
00:13:58.040 also free the pointer that we defined and that's how we free the object so you can see it's actually not that easy and
00:14:03.600 there's a lot of work to do to maintain the memory because the C code has to work with Ruby itself so you
00:14:11.639 basically have a few challenges when you write C extensions you have to deal with the memory and the fact that
00:14:17.160 you have the garbage collector to deal with you also have to deal with portability
00:14:23.360 you need to make sure your code is cross-platform we have a tendency as a Ruby community to target mainly
00:14:30.399 Unix but a lot of people really work hard to make sure Windows works and when you write a C extension that's part of
00:14:35.920 the challenge and then you have the problem of thread safety which is not a problem with MRI because we're going
00:14:42.120 to talk about that but basically for thread safety a C extension cannot run on multiple threads at the same time so you
00:14:48.199 don't have this problem but I'm going to get to that in a minute because that's the next topic let's talk about
00:14:54.720 concurrency so concurrency is a big deal everybody wants to be concurrent we all want to run a lot of code in
00:15:00.600 parallel and I just want to explain it a little bit by showing an example so we would like to handle a lot of
00:15:06.800 web requests at the same time and I will write a bit of Ruby code just to show you how that
00:15:11.959 works just to illustrate the concept of concurrency in a very simple
00:15:17.279 way so we have a simple client we have a dummy client and this client is a class with two functions one makes a query the
00:15:23.959 other one gets the reply and will print the response that comes back in the query method we call the server and
00:15:30.600 we dispatch ourselves with an ID it doesn't matter how the client works it's just an example now the server
00:15:37.319 implementation would be a very simple server it's a module with two functions one is dispatch which gets
00:15:43.079 called by the client and it will take the client and call reply on it and it will pass a response which is a fake
00:15:49.959 response and in this case what I did is my fake response will be random
00:15:55.120 and it will take more time if the ID is even and I will show
00:16:02.319 why I did that in a minute so if we start 10 clients and we make them query
00:16:07.600 the server 10 times we'll see the responses come back as response 0 1 2 3 4 5 6 7
00:16:15.279 8 9 so they all come back in sequential order which is not good because as I told you
00:16:21.079 some of the responses will be slower and if they're slower that means the faster ones behind will have
00:16:26.480 to wait for them and that's not really good the throughput will be low because it will depend on the
00:16:31.839 queue so if you only test one call you will always get the same
00:16:37.000 speed but if you make a lot of calls you're going to get different response times so the problem with this
00:16:42.880 approach is that it cannot scale as you get more load so everybody will tell you well this is not a good solution so
00:16:48.800 let's use threads because that's what we all know so let's talk about threads so what's a thread well a thread
00:16:55.079 is very simple it's basically code that gets executed in parallel and shares the same memory as everything
00:17:00.279 else so we have a main thread which would be your application and then you can branch off and say hey I want to
00:17:06.000 run this code in parallel and you can do that many times and you can share the memory so that sounds really good we
00:17:13.559 could rewrite our code to make it into a threaded server so the only thing I need to do for that is to wrap
00:17:20.079 the dispatch function body inside a new thread and now if I query 10 times
00:17:27.480 I will see that the responses come back in a different order because the faster responses come back first
00:17:33.520 that sounds really good all right so the threaded responses will allow us to get a
00:17:40.200 better throughput because they will not depend on everybody else in the queue before them
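The dummy client/server just described can be sketched roughly like this (the names, sleep durations, and queue-based reply collection are my reconstruction, not the actual slide code):

```ruby
RESPONSES = Queue.new  # thread-safe collector for client replies

module Server
  # Threaded dispatch: each request runs in its own thread, so a slow
  # (even-ID) request no longer makes the fast (odd-ID) ones wait.
  def self.dispatch(client, id)
    Thread.new do
      sleep(id.even? ? 0.05 : 0.005)  # fake work: even IDs are slower
      client.reply("response #{id}")
    end
  end
end

class Client
  def initialize(id)
    @id = id
  end

  def query
    Server.dispatch(self, @id)
  end

  def reply(response)
    RESPONSES << response
  end
end

threads = (0..9).map { |i| Client.new(i).query }
threads.each(&:join)

results = []
results << RESPONSES.pop until RESPONSES.empty?
puts results  # the fast (odd-ID) responses tend to come back first
```

Removing the `Thread.new` wrapper makes dispatch sequential again, and the responses come back strictly in queue order.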
00:17:47.559 now there's one problem if we have too many threads we can actually slow down the server and we're going to talk about that really
00:17:52.679 soon but threads are not magical there's nothing magical that just says hey when I have threads I can execute all my
00:17:57.799 code in parallel the reality is that a CPU can only execute one instruction at a time so there's
00:18:03.440 something happening behind the scenes to do this thread work and that's called
00:18:08.919 context switching so context switching it's not always the OS
00:18:14.000 that's in charge but context switching means you go from one execution context to the other and you go
00:18:20.159 back and forth it seems like it's concurrent you don't really see what's going on I mean you run a bunch of
00:18:25.640 applications on your computer they don't seem like they're running one after the other right but the CPU can
00:18:30.679 only handle one thing at a time it just switches back and forth now a thread context switch which is
00:18:36.280 when you switch from one thread to the other is still faster than when you switch from one process to the other finally the context switch happens
00:18:43.679 per CPU so if you have two CPUs you should be able to run two pieces of code exactly
00:18:49.200 at the same time in parallel so this is an example of how the scheduler works in Ruby in MRI so
00:18:56.280 Ruby uses a fair scheduler which means that it will go back and forth between the
00:19:02.280 threads with a time slice of 10 milliseconds so you spend 10 milliseconds on one thread 10 milliseconds on the other and so on now there's
00:19:09.480 one thing that I think a lot of people got confused about especially because it changed in 1.9
00:19:15.640 what happens if you have a thread and there's a blocking operation are we wasting the rest of the time slice well no
00:19:21.120 what happens is when you have a blocking operation in a thread the thread is not called by the scheduler anymore
00:19:27.679 there's no polling to check if the thread is still blocked or not what happens is the other threads are
00:19:33.080 going to be scheduled but the OS on a native thread will come back and say
00:19:38.559 okay my thread is back from the blocking operation schedule it again so this is how the scheduling works
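That scheduling behavior is easy to observe: `sleep` is a blocking operation that takes its thread off the scheduler, so two threads sleeping concurrently finish in roughly the time of one (a small sketch, timings illustrative):

```ruby
start = Time.now

# Two threads each block for ~0.2s. A blocked thread is simply parked
# and the others keep getting scheduled, so the total elapsed time
# is ~0.2s, not ~0.4s.
threads = 2.times.map { Thread.new { sleep 0.2 } }
threads.each(&:join)

elapsed = Time.now - start
puts "elapsed: #{elapsed.round(2)}s"
```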
00:19:46.159 now there's a lot of discussion about green threads and native threads Ruby 1.8 used to have green threads Ruby
00:19:52.520 1.9 has native threads JRuby has native threads MacRuby has native
00:19:58.960 threads so what's the big deal so a green thread is basically a thread
00:20:04.080 that's handled by the runtime itself it doesn't go to the OS it basically has one thread and does the
00:20:09.640 scheduling itself so the pro of a green thread is that everything is managed by
00:20:14.919 the VM so it should technically be cross-platform at least you get unified behavior and control at the VM level
00:20:22.840 because you know exactly how the threads will work you also get lightweight threads because you don't
00:20:29.799 have to use native threads so they're lighter to start faster to start and the memory footprint should in theory be
00:20:35.720 smaller now there's one major problem which is that it's not
00:20:40.919 concurrent so you're limited to one CPU you have one thread and that's it you can only use that so it's a big
00:20:47.360 problem if you want to use multiple cores and also if you have a blocking IO within one of these green threads
00:20:53.840 the other threads cannot run they are basically blocked so it seems natural that you
00:21:00.000 know 14 years ago when we only had one core it was not a big deal it was actually probably a good idea to use green threads and that's what Matz did
00:21:06.679 it was a good idea but now that we have multiple cores native threads are actually quite interesting because they
00:21:12.799 allow us to run on multiple processors they also get scheduled by
00:21:18.240 the OS so you have a bit less work to do now you do have a lot of work to do because different OSes work
00:21:23.679 differently so threads are implemented differently but at least you don't have to deal with some of the problems the
00:21:29.080 other thing is a blocking IO happening on a native thread won't block the other threads so that's good so the problem
00:21:36.200 with threads in general not just native threads is that you need to communicate between these threads using
00:21:42.760 shared memory what that means is you need to use mutexes and locks because you have a shared resource and it's like you
00:21:48.840 have two people trying to communicate and modify it you don't want that to happen so you need to put a lock around it that says you can access it
00:21:55.159 then when you're done the other person can access it and that's actually a lot of work and if you don't do that
00:22:00.440 properly you can actually corrupt the data because you now have two threads accessing the same data structure modifying it and you have data
00:22:07.760 corruption happening you could also have deadlocks you have your lock somebody took it and never released the lock
00:22:14.039 everybody else wants to access the data you get contention you get a deadlock and you're in a bad situation and Java
00:22:19.960 developers will know it's not fun to deal with so it also adds a lot of code
00:22:25.080 complexity because you have to worry about all these things and that's kind of the challenge with
00:22:31.320 threads also if you have too many threads you have to do context switching and you saw how basically you
00:22:37.039 have to go back and forth if you have more threads than needed it's a bit complicated to explain because it
00:22:43.520 depends on the system and how it's being used but if the number of threads is not appropriate the context switching will
00:22:49.480 be quite expensive now context switching on a native thread is much faster as we saw than on a
00:22:55.400 green thread finally you get non-deterministic behaviors with threads which is a bit scary sometimes
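A minimal sketch of the mutex discipline described above, with a counter standing in for any shared resource:

```ruby
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Without the lock the read-increment-write below could interleave
      # across threads and lose updates, corrupting the shared counter.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter
```

Forget to release the lock (or take two locks in opposite orders in two threads) and you get exactly the contention and deadlock scenarios the speaker warns about.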
00:23:02.320 1.9 also added something else which are fibers also called coroutines or continuations what
00:23:10.159 that means is actually very simple people seem to be confused but if I go back to my graph here the
00:23:16.400 scheduler will switch between each thread when you have a fiber it's not the scheduler doing that it's you as a
00:23:22.080 developer so you say start this fiber this light thread and then you
00:23:28.080 tell it when to stop and when to come back and do it again so basically you're handling your own scheduling on
00:23:35.400 your own thread with a bunch of fibers it doesn't really help with blocking IOs
00:23:41.279 but it could help if you know you don't need to go back and forth between the threads because you know it's going to take a certain amount of time it's
00:23:47.600 also a bit lightweight so it might save you some memory so these are fibers
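A tiny sketch of that manual scheduling with a fiber: nothing runs until you resume it, and the fiber decides when to hand control back:

```ruby
# The fiber runs only when resumed, and pauses itself with Fiber.yield:
# the scheduling is done by us, not by the VM's thread scheduler.
fiber = Fiber.new do
  puts "step 1"
  Fiber.yield        # hand control back to the caller
  puts "step 2"
  :done              # value returned by the final resume
end

fiber.resume          # runs up to Fiber.yield, prints "step 1"
puts "caller runs in between"
result = fiber.resume # resumes after the yield, prints "step 2"
p result
```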
00:23:53.320 so the big question is why threads are not popular with Ruby developers
00:23:58.600 and there are a few reasons first we used to have only green threads and people thought well you know it's not
00:24:05.120 worth it then we had a history of a lot of blocking C extensions so if I go
00:24:11.360 back again to my graph you see when I say blocking IO the thread gets basically taken off the scheduler
00:24:18.600 well when you do a C extension you actually need to tell it you need to write it in the code you need to say hey
00:24:24.400 this is a blocking operation and I want you to treat it as such you need to do something
00:24:29.679 that lets the other threads run if you don't do that then it will actually block until the end of the 10 milliseconds before it
00:24:35.279 switches back to the other thread so we had this issue with some C extensions that were not doing that and that was
00:24:42.360 fixed so that means that blocking IOs should not happen with DB drivers and
00:24:47.399 other types of code now the other issue was Rails Rails 2.2 or 2.3
00:24:54.279 was not thread safe which meant that if you were using threads you would have really weird behaviors and even to this day
00:25:00.720 if you start a new Rails project by default it will use a big lock around every single request so it's not
00:25:07.840 going to handle all the requests in different threads there's also a lack of
00:25:13.360 knowledge and understanding people are a bit confused sometimes about what's going on with the
00:25:19.039 threads and finally we have a lot of multicore machines now and we actually want to take advantage of
00:25:26.679 the different cores so we want to use threads now there's one problem a big problem that Dr Nic
00:25:32.640 loves to talk about, which is the global interpreter lock. So what is this GIL that people are talking
00:25:38.320 about? What does it mean, why is it so annoying, why do people hate it, why did Matz put a GIL in Ruby to start with?
00:25:45.520 Well, the global interpreter lock is actually something very simple to understand. To avoid data corruption,
00:25:52.720 and for the reasons I'm going to address in a second, threads can only talk to the VM one at a time. You
00:25:59.559 cannot have two threads talking to the VM at the same time. So if you remember the bytecode: when the bytecode
00:26:05.799 is interpreted, you can only have bytecode from one thread being interpreted at a time, to keep it simple.
00:26:12.039 So that's really not a problem if you have one CPU, because the CPU can only handle one instruction at a
00:26:17.520 time anyway. The problem is when you have more CPUs: if you have two CPUs you can still only have one code execution at
00:26:24.840 a time. That's kind of a problem; it's not concurrent anymore. So the
00:26:30.320 reasons why there's a global interpreter lock are, first, to make
00:26:35.840 developers' lives easier: it's much harder to corrupt data when two threads can't touch the data at the same
00:26:41.600 time. It's also there to avoid race conditions with C extensions; at the C extension
00:26:46.720 level it's much easier to hit race conditions, and I'll talk about that
00:26:52.919 in a sec. So it makes C extension development much easier, because without
00:26:58.559 a global interpreter lock you need to write a lot of code around your C code to make it thread safe. If
00:27:04.240 you ever wrote a Python C extension, for instance, you know that their C API is much harder to use: you
00:27:10.039 need to do a lot more work, and if you don't do it right you're going to have a lot of problems, like memory leaks and data corruption. Most of the C libraries
00:27:17.039 out there that people are wrapping are not thread safe, starting with the regular expression engine that's used
00:27:23.240 in CRuby. Now, the other big problem is that part of the Ruby implementation
00:27:29.480 itself, like the hash implementation, is not thread safe, and that means that if
00:27:34.679 we remove the global interpreter lock we need to go back and fix all these things, which will create some problems, like
00:27:41.320 making Ruby itself slower. So, should we remove the GIL? That's Dr Nic's big
00:27:48.120 question. There are a lot of implementations that don't have a global
00:27:54.559 interpreter lock, so should we remove it in MRI? Well, there are a few answers, and I'm not going to argue one
00:28:00.960 way or the other; I just want to expose the arguments to you so you can understand them. Well, first, if we remove the
00:28:07.360 global interpreter lock, it will make Ruby code unsafe. It is a
00:28:13.120 fact that you then need to understand how your data is shared, otherwise you're going to corrupt it, and you also need to worry much more about mutexes and
00:28:22.039 locks. If we do that, it will also break the C extensions, and that's something people hear all the time, like, so who cares,
00:28:27.600 right? You always tell me that, Dr Nic, like, oh, I don't care, let's rewrite all the extensions, who cares?
00:28:36.480 [Dr Nic:] Well, I don't think I've ever said that. You didn't say that? Okay, so last
00:28:42.919 night... okay, yes, that's correct, he never said
00:28:49.600 that; with my accent it came out more with an Australian accent. So, the C extensions, as I showed you at the beginning:
00:28:55.159 if we remove the global interpreter lock, it's not as simple as changing the C API and
00:29:00.240 the way the calls are made; we actually need to change the way memory is handled, and you need to
00:29:06.679 handle a lot of new challenges at the C extension level, unless we make even more changes so that C
00:29:13.080 extensions stay compatible. And that's a big one, because it's not as simple as doing, you know, a find and
00:29:19.480 replace and then we're good. It would also make writing C extensions much harder: for instance,
00:29:26.679 depending on the solution being used, you need to use write barriers, and there's a lot of work that
00:29:33.120 would need to happen at the C extension layer to make these things
00:29:38.799 happen. It was not you, I know. Oh, anyway, let's move on. It's
00:29:45.360 also a lot of work. That might not be a good argument for a lot of you, like, well, who cares, just do
00:29:51.240 it. That's actually what Dr Nic was saying yesterday, right? Do you agree with that? It's too much work, let's just do it,
00:29:56.760 that's what you were saying? [Dr Nic: No, I think we should have a go.] Okay. So it's a lot of work to go into an existing implementation and
00:30:03.559 change it; with a new implementation it's much easier, but going back will take a lot of time, you can actually break a lot of things, and
00:30:09.600 it's a big, big change that might or might not be worth it. A lot of the Ruby applications out there are actually not
00:30:15.440 affected by this; they're not CPU bound, and they can deal with workarounds. I believe that most
00:30:21.799 people can deal with the global interpreter lock. Python users have been dealing with it, we've been
00:30:27.399 dealing with it; it's not the best situation, but it still works, it's not the worst thing in the world.
00:30:34.120 And a lot of people would say, well, instead of focusing on this concurrency issue, why don't we focus more on memory usage, on the garbage
00:30:40.880 collector, all these things that also slow down Ruby and that we could really improve without having to break the C
00:30:46.440 extensions, without having to break the way things work, and without putting Ruby code in
00:30:52.559 danger. Also, as I mentioned earlier, if we remove the global interpreter lock we have to go back into the C
00:30:59.000 code of MRI and make sure everything is thread safe, and that would make the C code just run
00:31:06.559 slower. So what are the arguments for removal? Well, first, we really need better concurrency. I think that's the
00:31:11.919 main reason: we really want concurrency, so that's a good argument. The other argument comes from
00:31:17.919 the Python community, and they have this analogy of the rubber boots. They're saying, well, it's kind of stupid
00:31:24.080 to wear rubber boots every day just because it might rain. Well, the reality is, even if it
00:31:30.320 rains, not everybody will be outside in the rain, and boots will not solve all the problems in the
00:31:35.960 world. So maybe we should not pay this price, and we should just deal with the real challenge of working with
00:31:42.919 threads. That's the approach some people are taking. So these are basically the arguments; I'm
00:31:48.519 sure you'll hear others, but these are the main ones, and this is probably why MRI is not
00:31:54.120 going to lose its global interpreter lock right away.
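Whichever side of that debate you take, the hazard the GIL papers over is easy to demonstrate. Even with the GIL, a compound operation like `counter += 1` (read, add, write) is not atomic across threads, so shared mutable state needs a `Mutex`; without a GIL you would need this kind of explicit locking far more often. A minimal sketch (mine, not the speaker's):

```ruby
# Four threads increment a shared counter. The Mutex serializes the
# read-modify-write so the final count is deterministic; without it,
# updates could interleave and increments could be lost.
counter = 0
lock = Mutex.new

workers = 4.times.map do
  Thread.new do
    1_000.times do
      lock.synchronize { counter += 1 }  # one thread in here at a time
    end
  end
end
workers.each(&:join)
```

This is exactly the kind of bookkeeping that removing the GIL would push onto every piece of threaded Ruby code.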
00:31:59.200 So how do you achieve concurrency if we keep this global lock? Well, you have
00:32:04.360 multiple options. The easiest one is to say, let's start multiple processes. If you have multiple processes, each process
00:32:09.840 can work on one core, and then the threads will basically use that core; we're good to go. The problem with that
00:32:14.880 is that it uses twice the amount of memory, so it's not that great. What you could do instead is fork the process, which
00:32:20.480 means you start the process, you fork it, and now you have two processes that can basically share the
00:32:25.880 same memory. Well, to make it simple: you start the fork, and the fork
00:32:32.559 does not need to copy the entire memory of the parent. Sounds good. The only problem is that MRI is not copy-on-write
00:32:39.039 friendly, or rather the garbage collector of MRI is not copy-on-write friendly, which means that when you fork
00:32:44.360 the process you don't have a lot of memory used in the fork at first, but when the garbage collector comes, it will go
00:32:50.639 and check on all the objects, and basically the memory will increase to be exactly the same amount as on the
00:32:55.840 master process. That's why Ruby Enterprise Edition changed that: they changed the
00:33:01.360 implementation, they patched the garbage collector to be copy-on-write friendly, so now you only pay for the
00:33:06.559 allocations done in the fork. Now, Ruby MRI is actually going to fix this
00:33:12.720 problem; they've been working for quite a while now on the bitmap-marking GC. What that means is that
00:33:22.480 instead of marking every single object in place, they keep a bitmap, like a big table of all the
00:33:29.559 different slots and which ones are in use, and the marks get put there. So when you fork the process,
00:33:35.519 this bitmap is copied over to the forked process, and now when the garbage collector runs you don't
00:33:40.760 actually double the memory; you basically keep sharing the memory with the master. So that's going to be
00:33:47.159 fixed; there's already a patch for it and it's being worked on, so it
00:33:52.279 might be implemented in 1.9.4 or 2.0, so that's coming, hopefully soon. Then you can do evented programming,
00:34:00.919 which is done for example with EventMachine or something like that. You can use the messaging/actor model, which is
00:34:07.080 an interesting approach that a lot of people are talking about, where instead of communicating by
00:34:12.119 sharing memory, you communicate by sending messages between different objects, and this way you can get
00:34:18.879 better concurrency. And then there's the approach that Koichi presented for the future,
00:34:24.760 which is to run multiple VMs. The way that would work is that within one process you
00:34:29.839 would have two VMs if you have two cores, for instance, and each VM would talk directly to its own core, and now you
00:34:36.879 actually get full concurrency, even though you have a global interpreter lock on each VM. So that's one way of solving the
00:34:44.200 problem. So let's talk about memory management, because it's also related to all of these issues. Object
00:34:52.119 allocation: every time you declare an object, you actually allocate something. As an example, if you create
00:34:59.280 a string 100 times, it will actually allocate 100 string objects. If you create a hash, that will
00:35:06.200 actually create four objects: you have the hash, you have the two values, and you have the key. If you define a class
00:35:14.040 and you create an instance of this class, that will allocate one node, which is basically the code itself of the
00:35:20.680 class, then you have two classes, probably the class and the singleton class, and then you have the instance object.
00:35:28.960 So, garbage collection prior to 1.9.3, so basically everything that's currently
00:35:34.560 released, 1.9.2 and everything before, worked this way. You have memory, the Ruby heap: we have
00:35:41.640 basically a bunch of slots, and the dotted ones are the live objects; they're marked as objects that are in use. Think
00:35:47.720 of a variable that's actually used inside your code: you cannot get rid of it right away. When you
00:35:53.640 try to allocate a new object, we grab a slot from the free list, which holds the available slots
00:36:00.280 in the Ruby heap. That works great. So what happens if there's no available slot? If the free list is empty, the
00:36:06.839 garbage collector is called, and the garbage collector will come and, I have a slide for that, will go
00:36:13.040 through the list and check every single object in the entire Ruby heap and ask:
00:36:18.400 are you free, or are you live? It works through all these objects until it has marked each of them: it goes to
00:36:25.359 an object and asks, are you live? If the object says yes, it marks it; if it doesn't say anything, or if the collector is not
00:36:31.000 sure, it won't mark it. At the end it sweeps: it takes all the objects that were not marked and
00:36:38.160 frees them. Well, it basically puts them into a free list, and it reuses those slots for the new
00:36:44.720 objects being allocated. So what happens if you go through the entire heap
00:36:51.040 and all the slots still hold live Ruby objects? What happens in this case? The garbage collector comes, scans
00:36:57.040 everything, all the objects are marked, and it cannot find anything to reclaim. Then it allocates another Ruby
00:37:04.440 heap with more space, and these new slots can be used for new objects.
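This allocate-until-full, then mark-and-sweep cycle can be observed from Ruby itself via `GC.stat` (available since 1.9.2; the exact set of keys varies between Ruby versions). A small sketch of my own:

```ruby
# GC.stat[:count] is the total number of GC runs so far in this process.
runs_before = GC.stat[:count]

100_000.times { "garbage" }  # churn out short-lived objects

GC.start  # force at least one full mark-and-sweep right now
runs_after = GC.stat[:count]
```

Watching `GC.stat` before and after a hot code path is a cheap way to see how much collection work that path is causing.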
00:37:10.920 Now, in 1.9.3 things change, and this is quite important to understand, because it can really affect performance in
00:37:17.400 a lot of your applications. This is what happens: you want to allocate a new object, we have a free list, it takes a slot, the
00:37:23.839 same as before. Now, what happens if the free list is empty? Well, if the free list is empty and you want to
00:37:29.599 allocate a new object, the garbage collector will just take one row of the heap, mark
00:37:36.560 it, go through it and find the objects that are in use; the ones that are not used, it will sweep, only in this row, and
00:37:43.599 use that to allocate the new object. Which means the garbage collection pause will be much, much faster, because it
00:37:48.800 doesn't have to scan everything. So what happens if there's still nothing free at the end,
00:37:56.920 if you basically go through all the rows and there's still nothing? The same thing as before: we allocate a new
00:38:02.160 Ruby heap to get more space. So what you need to
00:38:08.119 understand is that with this approach we might run the garbage collector a bit more often, but the time spent in each run will be much
00:38:15.200 smaller. If you run some benchmarks, and I did on some of my applications, you actually see quite a
00:38:21.920 lot of performance improvement, depending on your application obviously. So, there are different types of garbage collectors:
00:38:27.560 you hear people talking about the conservative garbage collector like it's the awful one, and then you have the precise one,
00:38:33.839 which sounds like the good one, right? But how come MRI doesn't use a precise garbage collector? Well, in the case of
00:38:41.040 CRuby, the garbage collector is a conservative garbage collector, and the reality is, if you expose
00:38:48.200 a C API you pretty much have to use a conservative garbage collector. That's also the case for MacRuby, and the reason
00:38:54.000 for that is that you deal with raw pointers and you're not always sure whether
00:38:59.119 a pointer points to an object, or what it points to, and you're also not sure about the length of arrays. So using a
00:39:06.240 conservative GC there is actually normal. Now, CRuby is a stop-the-
00:39:12.480 world garbage collector; it's the same for Rubinius. What that means is that when the garbage collector runs, nothing
00:39:19.040 else can run at the same time. Everything is stopped while the garbage collector runs, which is why it's important that the garbage collector
00:39:26.000 doesn't run for too long, because in the meantime nothing happens. The lazy sweep allows those stop-the-world pauses to
00:39:31.680 be shorter, so more of your code can run within the same
00:39:38.040 time. And that's the lazy sweep I just explained, which means the time spent in the garbage collector per run is
00:39:45.079 now lower than it was before. So if before, one run of the garbage collector
00:39:50.560 might take, I don't know, let's say 20 milliseconds, which is quite a lot, now
00:39:55.800 it would probably spend only 2 milliseconds, but run more often. So the time spent in the garbage
00:40:00.920 collector overall within your program will be the same, or maybe even longer, but the time spent on every single run will be shorter.
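Ruby 1.9 ships a profiler that reports exactly these per-run times; the speaker comes back to it near the end of the talk. A minimal sketch of my own (assuming 1.9.2+, where `GC::Profiler` exists):

```ruby
# Record per-run GC timings while allocating a burst of throwaway objects.
GC::Profiler.enable

50_000.times { Object.new }  # enough churn to trigger some collections
GC.start                     # make sure at least one run is recorded

report = GC::Profiler.result      # multi-line table, one row per GC run
total  = GC::Profiler.total_time  # seconds spent in GC since enable

GC::Profiler.disable
```

Each row of the report shows when the run was invoked and how long it took, which is how you'd verify the "more runs, shorter pauses" trade-off on your own workload.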
00:40:06.440 will be shorter so we have other examples of garbage collectors um in the Ruby world
00:40:12.319 we have um the case of micro which which I work on so I know it a little bit uh it's a multi3 generational GC so what
00:40:19.560 does that mean well that means that in the case of microbe uh we run multiple VMS and um we don't have locks they're
00:40:26.079 multi- entrance uh re-entrant VMS and um they don't have locks the lock happen in
00:40:31.200 the core which is I'm not going to explain that right now but the GC runs on a different thread connected to the
00:40:36.880 VM and every single thread registered to the GC um generational means that um objects
00:40:43.839 are organized by um time that they've been spending in uh the garbage
00:40:50.160 collector or in memory so you have the youngest objects that get allocated in one place and then you have the the
00:40:55.400 young object which the ones that survive few GC cycles and basically um your data
00:41:00.920 is organized in different ways depending on how long it lives and the reason why it's done like that is so uh the first
00:41:06.560 generation usually this object are being allocated and they allocated right away so you want to have them in one place uh
00:41:12.920 This is also the case for Rubinius, which uses a generational garbage collector.
00:41:18.760 The difference is that in their case, because it's a precise garbage collector, and I will explain that in a second, they can
00:41:23.839 actually move objects from one memory location to another. So Rubinius
00:41:31.400 uses a stop-the-world, precise, moving, generational GC. Stop-the-world is a difference from MacRuby, where MacRuby
00:41:37.359 does the garbage collection on the side. Rubinius stops the world; it doesn't stop it for a long time, so it's not a big deal,
00:41:43.839 but it still stops the world. It's precise, and what that means is that, unlike CRuby, the implementation
00:41:52.920 itself handles all the objects and knows exactly what's going on with each of them, so it doesn't have to guess, it doesn't
00:41:59.319 have to be conservative. And conservative, I didn't really explain what it is: conservative means that
00:42:05.400 when it's time to mark and release objects, if the collector
00:42:10.960 is not sure about something, it will just keep the object in memory. In the case of a precise garbage collector, we know exactly how the memory is
00:42:18.599 allocated, so we can actually take care of that; we can even move objects in
00:42:24.119 memory. Also, because it's precise, as I just explained, you can actually move things around, which avoids
00:42:30.240 fragmentation: as you free slots, they don't always have the same size, and when you want to allocate new objects they might not fit. By moving things,
00:42:37.720 you can allocate the memory in a better way, and you avoid fragmentation by quite a lot.
00:42:44.400 Potentially, Rubinius could also release memory that's not being used. Let's say you allocate a lot of
00:42:51.000 objects, the memory grows a lot, and then you end up not using that much: Rubinius could clear this
00:42:59.400 memory, so you would see the usage going down. When you use MRI, once you hit,
00:43:04.559 you know, 100 megabytes, which is probably what happens when you start Rails, you will never go
00:43:10.960 down. Then you have another type of garbage collector, which is reference counting, and this is what's used by
00:43:16.040 CPython and other solutions, like Objective-C on iOS if you don't use
00:43:21.839 a garbage collector; well, on iOS you don't have a garbage collector yet. Each object keeps track of its own
00:43:29.119 state. What that means is: I create my object, and I tell the runtime, this is who depends on
00:43:34.200 me, this is what I do, and it keeps counting what's going on, and when it's time to be released, the object
00:43:40.040 itself basically says, hey, I'm free to be deleted. This is great because it's decentralized: you don't have one place
00:43:46.200 that deals with all of it, every single object does it. The problem is that it's really hard on everybody else, because now C extension developers need to do
00:43:52.800 that bookkeeping for every single allocation; you have a lot of opportunities for memory leaks and a lot of
00:43:59.520 other problems. You also have the hybrid solution, which is the case of the JVM, and I'm not really familiar with the
00:44:06.200 details of the JVM, but the JVM, first, is a VM that you can configure when it comes to the garbage collector: you
00:44:12.240 can decide how the garbage collector runs. What it can do is be generational, like we explained
00:44:18.640 before; it can also do the occasional mark and sweep, where it goes through the entire memory; and it can copy
00:44:26.200 the data, copy the memory, which I will not explain now because I'm running out of
00:44:31.960 time. So what are the tricks? Well, there are a bunch of tricks, and the
00:44:37.400 MRI team is working on a lot of things to try to optimize garbage collection. Lazy sweep was one step, but they
00:44:42.800 want to do much more than that. The bitmap marking is what I explained
00:44:48.240 before, and that allows for better forking, which means that if you start,
00:44:53.359 let's pretend, one Rails app and you have the same code, you could fork this process and have two or three
00:44:59.599 other Rails apps, and it would not cost four times the memory; you would basically share what
00:45:05.359 was loaded at the beginning. If you use Ruby Enterprise Edition, that's exactly what you get. Then there's parallel
00:45:11.400 marking, where every single thread talks to the GC
00:45:17.240 and marks its own objects, in every single thread. It's still stop-the-world, but it's done differently. Nari did a
00:45:23.760 talk on that; you can see the slides online if you missed it, or you can watch the video when it's released. Then
00:45:29.240 you have the mostly-copying GC. What the mostly-copying GC means is that in
00:45:35.040 memory, the problem is fragmentation; in this case that's the problem you're trying to solve. The problem is we have these objects, we allocate, we
00:45:41.160 deallocate, and new objects don't always fit in the old objects' spaces. So what you do is you copy
00:45:48.079 the objects that you know are safe to move and you move them somewhere else, so you can
00:45:53.680 reallocate the other ones. The point is that you end up using a bit more memory, but the fragmentation issue
00:46:00.280 almost goes away. Then Twitter did its own tweaks on the garbage collector, and
00:46:06.119 you can read about it online; there's a link in my slides. Basically they did a
00:46:11.240 few things related to what Ruby Enterprise Edition did, plus they applied some sort of long-life GC patch, which
00:46:16.559 means it's almost generational: they look at the objects that stay around for a long time. When you start Rails,
00:46:22.640 you load a lot of code and a lot of objects that will never go away, so what they do is take those objects and
00:46:28.280 put them in a different place, so they don't get garbage collected, or not as often. There was a long-life GC patch
00:46:35.559 that was almost applied to 1.9, but it turned out it was not the best solution ever, and Matz can explain
00:46:42.960 that in more detail. They're really working on the GC, so if you have the opportunity to talk to them, I'm sure
00:46:48.800 they'd be glad to explain. Then the last thing is something that was in Ruby Enterprise Edition and is back in Ruby 1.9.3:
00:46:55.520 GC tuning. GC tuning lets you set a lot of settings on the garbage collector; you need to
00:47:02.000 understand a little bit how it works, but by doing that you can make sure the garbage collector usage is optimal,
00:47:08.640 so it will go faster. So what's the big deal with the garbage collector, why do I care that much, and why do I get
00:47:15.200 upset when people allocate too many objects and I tell them they're making a mistake? Well, it's like if you
00:47:20.280 have a car, and it's a small car, let's say a small Fiat, and JLo is not in the car, and you
00:47:26.040 want to do something with this car: if you put 15 people in this car, it's not going to go fast. So you need to think
00:47:31.359 about the type of car you have. If you understand how things work, you'll be able to use your code the best way you can,
00:47:37.640 and the reality is that currently the garbage collector makes things a bit slow. If you generate the RDoc, at
00:47:45.800 least for Nari, what happened for him is that he spends 80 seconds generating the entire RDoc, and 30% of that
00:47:52.960 time is spent in the garbage collector. At Twitter, in 2009, they did some
00:47:59.440 tests and realized they were spending 20% of the front-end CPU on garbage collection. I did some
00:48:05.800 tests myself when I was working at Sony, and I was seeing the same number, and sometimes even higher, depending on the quality of
00:48:11.200 the code that was written and the number of objects being allocated per request. So there is a concrete effect on
00:48:18.440 the performance of your code, especially with Rails, where a lot of objects are allocated on every single
00:48:31.280 request. You need to realize the cost of the garbage collector, because it will actually slow down every response when you get a lot of requests coming in. What you can do is use the tools
00:48:37.559 provided by Ruby 1.9 to see how the garbage collector is being
00:48:43.000 used. This is a simple example to show you how that works: at the beginning of the code you turn on the
00:48:49.559 garbage collector profiler, and whenever you want, you output the result of the garbage collector
00:48:55.520 profiler. When you do that, it basically shows you
00:49:00.680 all the different collections that happened, how long each one took, the total memory
00:49:06.160 allocated, what's actually used within that memory, when the collector was
00:49:13.240 invoked, that is, where in the timeline of the execution it ran, how many objects are
00:49:20.119 available, and how long it took in milliseconds. Now, by itself that might not be very useful, but you can
00:49:27.040 use it with ObjectSpace, which can show you how the memory is being used, what's in memory.
00:49:34.440 Here's a simple example where I disabled the garbage collector, because I
00:49:41.000 didn't want objects to be freed: I turn off the garbage collector, I
00:49:46.160 take a reference count, that is, how many strings I have in memory at this point in time, then I allocate 10,000
00:49:54.680 strings, and then I get the new count of
00:50:00.520 strings and subtract the reference count,
00:50:06.640 and then I print the number of strings. You can see that ObjectSpace.count_objects
00:50:13.880 basically returns a big hash representing the memory: total is the number of object slots, free is the length of the free list, and then you have the allocations per object type.
00:50:21.319 This tool can really give you an idea of how the garbage collector is working, what you have in memory, and what you're doing. I wrote a very simple Rack
00:50:28.280 middleware to show how that works, to do some of the work I was doing when looking at garbage collector optimization.
00:50:33.799 I wrote something very, very simple, nothing fancy, but it basically gives you this: as a Rack middleware it runs, at
00:50:40.440 least when you enable it, and as garbage collections happen it
00:50:47.559 tells you that the last GC cycle happened X requests ago, that four
00:50:54.440 GC cycles were run, four in this case, and the amount of time spent in each of them. You can see that
00:51:01.119 in this case I'm spending 14 or 15 milliseconds in the garbage collector.
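A sketch in the spirit of that middleware (my own reconstruction, not the speaker's actual code; `GcReporter` is a made-up name). A Rack middleware is just an object with a `call(env)` method, so the sketch runs without Rack installed, using a stubbed inner app:

```ruby
# Minimal GC-reporting middleware sketch: counts how many GC runs
# happened while the inner app handled the request.
class GcReporter
  def initialize(app)
    @app = app
  end

  def call(env)
    runs_before = GC.stat[:count]
    status, headers, body = @app.call(env)
    runs = GC.stat[:count] - runs_before
    warn "GC ran #{runs} time(s) during this request"  # report on stderr
    [status, headers, body]
  end
end

# Stand-in for a real Rack app, so the sketch is self-contained:
inner = lambda { |env| [200, { "Content-Type" => "text/plain" }, ["ok"]] }
response = GcReporter.new(inner).call({})
```

In a real app you'd add it with `use GcReporter` in `config.ru` and extend the reporting with `GC::Profiler` timings, as the talk describes.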
00:51:07.000 My goal, working on APIs that had to be fast, was to get a response time of
00:51:13.559 30 milliseconds; anything above 100 milliseconds was really bad, and the goal was really 30 milliseconds. If I spend
00:51:20.680 that much time in the garbage collector, I will have major problems. And at the end I basically use ObjectSpace to show
00:51:26.839 some stats about how the memory changed between the last GC cycle and this GC cycle. That gives
00:51:33.200 you an idea of how the memory is used, and I would really like to encourage people to pay attention to
00:51:38.319 memory allocation, because it really costs you a lot of time on request
00:51:44.400 time. That's it. Any
00:51:50.680 questions? Oh yes, it's merbist, I forgot to mention it, thank you. Yes, it's the same as my
00:51:56.160 Twitter handle, and I put the slides there in HTML so you can see them. Any questions? Nic, do you want
00:52:04.799 to come and talk about your problem with the global
00:52:10.880 lock? ... So, any
00:52:16.920 questions? Okay, a question over there... oh, no questions... I see a question over
00:52:23.480 there... I do not believe
00:52:30.559 that was a real question. You are absolutely right, I was
00:52:36.119 wrong about that: the photo was not taken by the person I mentioned in the source. Any other questions?
00:52:53.920 [Audience question, repeated by the speaker:] So the question is: a lot of things changed in the past with MRI, for
00:53:02.200 instance the strings and the encodings, and that broke things, that changed things in 1.9; how come we cannot change the
00:53:08.680 global interpreter lock, is that what you're saying? Or the garbage
00:53:16.319 collector? What makes the GIL special? Would you like to answer, Matz?
00:53:29.720 [Matz:] The main concern we have with the GIL is not the compatibility, it's the unsafety. If we allowed
00:53:38.359 CRuby to crash when you use threads, we could remove the GIL, but I don't want
00:53:45.319 Ruby to be like that. Thank you.
00:53:58.839 Any other questions? That's it then, well, thank you very
00:54:03.920 much, and see you next year.