Summarized using AI

Advanced EventMachine

Jonathan Weiss • September 29, 2011 • New Orleans, Louisiana • Talk

Summary of Advanced EventMachine

In this talk presented at RubyConf 2011, Jonathan Weiss explores advanced topics in EventMachine, a powerful event processing library for Ruby. He shares insights from real-world applications where EventMachine is extensively used, particularly in the context of deploying and scaling applications in a production environment.

Key Points Discussed:

  • Introduction to EventMachine:

    Weiss compares EventMachine with Node.js, highlighting its non-blocking I/O capabilities that allow multiple I/O streams to be handled simultaneously through callbacks, making it a popular choice for applications requiring high concurrency.

  • EventMachine's Use Cases:

    One significant use case presented is handling a large volume of log files and uploading them to Amazon S3. Weiss illustrates the inefficiency of sequentially uploading files and contrasts it with the efficiency achieved using EventMachine, which allows multiple files to be uploaded concurrently.

  • Event Loop Mechanics:

    Understanding the event loop is crucial when using EventMachine. Weiss emphasizes that blocking the event loop, for instance by including synchronous calls, leads to performance degradation. He stresses that all I/O operations should be non-blocking to maintain responsiveness in the application.

  • Handling Long-Running Operations:

    The talk covers strategies for managing lengthy operations without blocking the event loop, advocating for the use of constructs like EM.next_tick to break work into smaller segments, or EM.defer for offloading long computations to background threads.

  • Deferrables and Structuring Code:

    Weiss discusses the use of deferrables in EventMachine, which provide a way to manage complex callback scenarios. This mechanism allows callbacks to be registered in a more organized and cleaner manner, thereby enhancing maintainability.

  • Error Handling and Testing:

    The speaker touches on the challenges of testing applications built with EventMachine, suggesting strategies for isolating domain logic and integration tests that simulate event behaviors to verify expected outcomes. He provides insights into handling exceptions effectively to prevent crashes in the event loop.

  • Optimization Techniques:

    The use of queues, channels, and iterators for managing tasks and inter-thread communication is highlighted as a way to keep the code organized and effective. These structures help streamline code execution across different threads.

Conclusion:

Weiss concludes that while EventMachine offers powerful capabilities, developers must remain vigilant to avoid blocking the event loop and ensure that all operations are efficiently structured. The potential of EventMachine can lead to significant performance improvements when applied thoughtfully in scalable architectures.


Advanced EventMachine
Jonathan Weiss • New Orleans, Louisiana • Talk

Date: September 29, 2011
Published: December 12, 2011

We use EventMachine heavily in production. It handles uploads to S3, manages thousands of messages a second, and distributes agent workload. This taught us a lot about EventMachine and some weird corner cases. I want to talk about such advanced EventMachine topics and share some use-cases and experiences from the trenches.

RubyConf 2011

00:00:17.199 so welcome to my talk about advanced event machine um i'm from from berlin germany we do a lot
00:00:24.320 of work around deployment and scaling because we work on scalarium which is um
00:00:30.720 an amazon ec2 cluster management solution where we handle lots of lots of connections and manage
00:00:37.120 uh thousands of machines and this is like the way we discovered event machine and and
00:00:42.480 started to use it and the idea of the talk is to to to describe a couple of patterns we
00:00:47.680 came across in a couple of situations that you should aware of um if you use event machine
00:00:53.199 um before we dig into event machine um who already uses event machine
00:00:58.399 oh so quite a few good so i have to i can keep the introduction
00:01:04.320 uh pretty short um yeah so event machine as it seems a lot of you
00:01:09.600 guys already know is a very simple and and event processing library for ruby um
00:01:16.720 kind of uh yeah the node.js for for ruby if you want to so the idea is that i can
00:01:21.920 have a lot of multiple io streams without blocking on them so that i can execute them what looks like in parallel
00:01:29.759 and and whole programming of your machine is done via callbacks so very similar to node.js and the cool
00:01:36.079 thing about event machine is that it already brings a lot of a lot of protocol implementations for
00:01:42.640 the typical stuff like udp tcp http smtp and so on so it's very simple
00:01:48.159 to use and because it's ruby it's very easy to extend and to reuse your existing libraries which is also
00:01:54.479 kind of the culprit of it because it's very easy to uh to break it if you use existing libraries it's very
00:02:00.479 tempting as we'll see in a couple of minutes to to just require a gem and without knowing it every everything
00:02:07.680 gets slow and a lot of unexpected things happen yeah um so this is what you should definitely
00:02:14.480 have a look at um you have to think before you use or require a new library um so the the basic idea is turning like
00:02:23.040 synchronous step-by-step sequential code like this one into um callback a
00:02:29.680 callback oriented code where i um i execute i call
00:02:35.040 command and register callback that will be called once the the command is finished
00:02:40.480 and my code can move on so it doesn't block on the load data call and i get notified once
00:02:46.640 the the process is finished and this this idea is pretty pretty simple i mean every
00:02:52.160 people especially in the the rails community know it like if you program javascript uh in the browser with with uh like dom
00:02:58.800 um notifications and stuff like that so it's it's nothing too unusual um but it's pretty pretty uh
00:03:05.599 it's it's pretty easy to to get confused once you have a very deep nesting of callbacks um but the the the simple ideas is very
00:03:12.879 simple so let's maybe like step back and explain to you how we came to across event
00:03:19.040 machine and why we started to using it our simple use case was we
00:03:24.400 process a lot of events and all of those events we generate log files and we store them on amazon s3 so the basic idea is somehow
00:03:31.680 we generate a log file and we have millions of those and we then want to upload those to s3 so
00:03:36.799 the typical the typical step would be just a sequential call you have a log file you post it to amazon s3
00:03:43.680 you wait while you're uploading you wait while amazon is processing the the data and then you get a response back that
00:03:49.760 says i've got the file so nothing nothing's fancy here but if you if you have a lot of log files and
00:03:56.239 you and you do it with a lot of files it looks like this right so i upload a file i wait i wait for the
00:04:02.319 upload i wait for the processing it's done okay let's take the next one yeah upload the next file waiting uh i get
00:04:09.840 the response so i can upload the next one and so on so the time it takes is is the sum of the individual times yeah so it takes me a
00:04:16.720 long time and for most of the time my process is just sitting there waiting for networking waiting for amazon to
00:04:22.240 to respond if you do the stuff like this with event machine um the the picture is a very different
00:04:28.960 one because i can now upload multiple ones at at once yeah i can maybe then get the first response back i can
00:04:35.040 upload the next one and then i get they get the responses for all the other ones back so the the time that that i'm spending is
00:04:41.919 is like roughly um like the the maximum number for the individual like the long i have to wait for the longest request
00:04:48.160 yeah it's it's only roughly because it depends like on how how many parallel processors you're doing it
00:04:53.280 and on other things but you can you can optimize this a lot with stuff like that so for
00:04:58.560 for example in our case we had like i think a six times uh
00:05:04.880 increasing throughput of of managing all of those log files just by doing it with event machine
00:05:11.520 so this is uh the theory behind it um we also wrote a small uh library to do it so um this is like the code if you would do
00:05:18.720 it uh first with right_aws and then with happening which is a small
00:05:24.720 library that we wrote um and as you can see you have a little bit more boilerplate code in order to
00:05:30.000 get event machine running so we have em run and then at the end we have em stop but the actual code is not so much
00:05:35.680 different apart from the the callback part where i wait for the response to to come back to
00:05:40.960 me if you're interested in happening it's as i said a small library um
00:05:47.600 if so apparently the important thing about about event machine is the event loop right uh
00:05:53.199 it's in the name um and this is was for me the most important thing in order to
00:05:59.120 to really understand how event machine works and where the culprits are is to really understand how the event loop
00:06:04.560 and the the corresponding threads are working so the setting up
00:06:09.919 the event loop is pretty easy you just call em.run and at some point you stop it again
00:06:15.520 and em run is is an endless loop it will start the the event machine loop
00:06:21.520 and it will um not return unless you stop it um so so if you have like small parts small
00:06:27.919 agents small demons that are running event machines usually this is like the first and the last call that you're
00:06:33.919 doing yeah you're calling event machine and inside the loop you're setting up your your callbacks and then you do nothing
00:06:39.360 and wait for event machine to process all events and come back to you um what you don't see is what what
00:06:45.919 happens in the background is so your code is actually a very small part of what is what is happening
00:06:51.120 um but your code can break anything else and and this is uh unfortunately the the thing that you
00:06:56.560 always have to have in mind if you if you program with event machine is you have to be very careful what your code does and where it does it because
00:07:02.479 it's very easy to break um to break the loop so what you do is your code your register
00:07:09.120 your callbacks and during such a loop run what event machine is doing is it's running all timers that
00:07:15.759 you registered it's uh checking uh file descriptors it's uh checking sockets networking stuff like that
00:07:21.120 and then it calls your code if necessary and then the next iteration starts so the whole idea is this loop is
00:07:27.919 endlessly running and always um and always in this line so checking file descriptors checking networking
00:07:33.680 your code if any callbacks that you registered and next round and um what what you have to keep in
00:07:40.720 mind is that you you you should not um disturb this this uh cycle and unfortunately it's
00:07:48.000 very easy to do so but if you disturb it if you somehow break it by by your code being too slow
00:07:53.759 everything falls apart so um how can you break it and what you
00:07:59.280 shouldn't do is stuff like that yeah putting in the in the main loop like a sleep yeah of course you nobody
00:08:04.800 puts a sleep in there but uh doing any synchronous calls is pretty much the same thing
00:08:10.960 um we're now blocking the main event loop because we're waiting for an http call and if this if this server that we're
00:08:17.759 getting the the response from is very slow everything is waiting on on this um on the server so so you shouldn't do
00:08:26.479 any synchronous calls inside the event loop
00:08:31.919 and this is the part where i said in the beginning it's very easy to break it even by not by not knowing because if
00:08:38.399 you if you like you have a class that is running somewhere inside the event loop and you just need a small new
00:08:43.599 functionality and you just require a gem somewhere that does a couple of http calls that
00:08:48.880 uh does some networking whatever it's very easy to to like for this code to sneak into the event
00:08:54.480 loop and then you're wondering why is everything processing so slow why are the messages piling up um because you just yeah you blocked the
00:09:01.360 loop so um an example case how we did it for example is
00:09:06.800 we use nanite which is an agent framework on top of amqp event machine and rabbitmq
00:09:12.880 and what it does is it distributes messages across agents and we pretty much
00:09:20.080 broke like the whole setup by just having the the default um behavior of the amqp gem which is if there
00:09:26.880 is data on the socket it reads it and if you have a lot of messages a lot of data it reads a lot which means
00:09:32.480 um if you look back to the to the loop if you're if you're if the reading part becomes too big
00:09:37.600 because too much work you're slowing down the whole loop so so our code was essentially like reading
00:09:43.839 thousands of messages then we had one iteration which means like callbacks were notified once and then we're again reading thousands
00:09:49.920 of messages which means we slow down the whole cycle which means effectively everything is is
00:09:56.480 running to a halt um and we did it by just like using the
00:10:01.519 amqp gem which is already evented and stuff like that so it's it's very easy to do it so what what we need to do
00:10:06.560 is to somehow split the work and make sure that our code is not running for to uh for too long
00:10:14.240 um and you can you can um how you can do that we'll show in a minute but the main
00:10:19.839 idea is that inside the main reactor loop which is the the main thread you shouldn't do
00:10:25.680 anything that that is slow you shouldn't do anything that could potentially block it you should also if you're not calling
00:10:31.839 any synchronous code only asynchronous code you still shouldn't do anything that takes too long because
00:10:37.279 as i just saw in our case we didn't do anything synchronously but because it was so much work we still
00:10:42.640 blocked the loop um so the main idea is to try to keep everything as short as possible also to do all i o handling
00:10:48.480 inside uh the main loop and uh avoid yeah avoid non-evented libraries
00:10:54.880 um i can i cannot stretch this point too much because it's so easy to just
00:11:00.160 require a library this is also like why i think node.js is like more successful in
00:11:05.600 inventors applications it's not because they have like a better programming model better libraries whatever it's just because
00:11:11.519 there fortunately everything is evented so there is no like synchronous library that you can load
00:11:17.760 while on ruby if you just like do uh like a file.open um operation yeah you just use the the basic libraries
00:11:24.399 that are part of the of the standard library um you're already using some
00:11:29.519 something synchronous that is blocking you um so yeah take care and not do not use anything
00:11:35.279 that is uh not evented and the other thing is as i said the um
00:11:41.440 your callbacks are running as long as they take so make sure your callbacks are not taking too long
00:11:46.880 um because even even if you use evented callbacks or event applications and libraries inside your
00:11:52.320 your callbacks if they take too long you're still blocking everything and one nice way to do it how you can
00:11:59.440 break up those in smaller steps of work is to use em next tick
00:12:04.880 so em next tick um you give it a block and what it does is it schedules it to run in the next iteration of the
00:12:11.600 of the loop inside the main the main reactor thread and what you can do with it is for once
00:12:17.200 you can like come back from a background thread or the other thing is you can if you have like a very big piece of work that you need to be doing
00:12:23.680 like reading thousands of messages on a socket you can just like read a couple of messages and then
00:12:29.200 reschedule yourself to run in the next iteration this is an example how you can do it
00:12:34.639 with with a simple proc and a method so let's assume do something is potentially slow or like potentially
00:12:41.200 processes a lot of a lot of work so we can just re just call it um every couple of um only process a couple of messages and
00:12:48.560 then reschedule ourselves to call to be run inside the next iteration um
00:12:54.000 yeah so this is what em next tick does the other thing um is it brings you back from a background thread
00:13:00.880 and what it means um we will have we'll see in a second but um this is like one
00:13:06.639 of the of the important concept is every time you use i o like wrap it in next tick
00:13:12.560 this will ensure that that you run inside the main loop inside the main the reactor thread
00:13:19.040 there are cases where you don't want to you do this or there are cases where you want to to have to do something slowly or you know
00:13:24.880 i have to compute like the fibonacci number of 2 billion or something and it will be slow so
00:13:31.760 what you should do is wrap it in an em defer block em defer does the opposite of em next tick
00:13:37.519 so it brings you out of the main reactor thread into a background thread by default there are 20 threads
00:13:43.199 in a thread pool that you can use for it and em defer will make sure that this code that you give it
00:13:48.320 runs in such a background thread and so in our example case here we will
00:13:54.399 not block the the main loop if i wouldn't wrap wrap it in em defer we will never
00:14:00.000 see like the the the continuous printing of the of the timer because we will be blocking
00:14:05.920 the main loop because the sleep will block the loop and in this case we've wrapped it in em defer so now
00:14:11.120 um we're blocking a background thread which means the main loop can still run
00:14:16.720 so what em defer is good for is if you have long running computations if you have a long running process
00:14:22.320 if you do something that is potentially blocking the main loop wrap it in the em defer block the only
00:14:28.480 thing that you have to remember is if you're doing i o in it the io has again to be rescheduled in the main in the main thread
00:14:35.199 so uh sometimes your code will will like ping-pong between uh deferred blocks and next ticks because you you're doing computation in
00:14:41.839 a deferred block then you you then have it have having some i o so
00:14:46.880 you do it in the next tick block and then going back again with a result to a background thread
00:14:58.000 and if i would stop here um those are pretty much the basics of event machine
00:15:03.120 next tick deferring and then the typical callback stuff so this is like the the simplest
00:15:08.959 thing that you can do with event machine but um also like for me what i found myself is
00:15:14.000 like the hardest part is finding out where your code is running and
00:15:19.199 where it should be running yeah sometimes if you have um as i said a library a class that you're requiring somewhere and calling
00:15:25.760 out of event machine loop it's hard to find out now am i now in the main loop am i in a background thread where should it be
00:15:31.040 running and it took me a while to like really get the feeling where i should be going and so the main thing for you guys to remember
00:15:37.440 is i o in the main in the main uh reactor thread anything that is potentially slow in a deferred
00:15:43.920 uh thread and if you are unsure wrap it in a defer or a next tick call
00:15:51.920 but there are a couple of um syntactic sugar a couple of helper
00:15:57.040 classes that makes it a lot of easier to to not get tangled in such a like a callback code so one of a couple of those
00:16:03.759 are uh deferrables queues um channels and iterators that i want to talk about and they help you to structure code so
00:16:11.600 that you don't get tangled up in all of those code and it's very easy to make sure that you're calling the right code from the right place
00:16:18.880 so the first one is deferrable deferrable um at first it looks a little bit
00:16:24.880 complicated and it's it's hard to make out why you need it but it allows you to write your your
00:16:30.000 libraries in a way that that they feel like an event machine library and that makes them very very easy to
00:16:36.560 use what they are basically is like a mixture between a state machine
00:16:42.320 that that and uh like a simple concurrency mechanism that allows you your own code to register callbacks and execute them when
00:16:48.959 it's ready so the idea is that i can register callbacks again
00:16:54.639 just like on the built-in event machine classes so for example i can register callbacks
00:17:01.120 on the success case and i can register callbacks on an error case and
00:17:06.319 um the cool thing is that callbacks can be edited at any time yes i can register i currently create an
00:17:12.480 object where i can register callbacks for successful error cases it doesn't sound too interesting so let's yeah look at an
00:17:18.559 example this is a very not too useful example so i have a class i include the deferrable
00:17:24.000 module and then i create an instance for of it inside the run inside the m loop
00:17:29.520 and i i just adds a couple of callbacks yeah a callback for success a callback for error case
00:17:35.200 and then after five seconds i uh make this deferrable object succeed
00:17:41.360 so that my successful callback will be called so those are like the build the basic primitives that the deferrable
00:17:47.039 allows you to do so the interesting question is why do i need stuff like that yeah and the answer is to to um to build
00:17:55.200 classes where you that use again um internally callbacks in order for your user to make
00:18:01.120 it very easy to use them themselves so um let's assume i have
00:18:06.880 i have using the google spellchecker api and because i of course use the
00:18:13.039 asynchronous http library to use it if i would use it
00:18:18.080 myself i would like to to um to do a call to the to to google and then register my
00:18:24.080 callback to be running and then i pause the result and stuff like that so this class is abstracting all this logic away from us and using
00:18:31.039 deferrables i can use it like a built-in event machine library so i can i can instantiate it i can then call the
00:18:38.000 method that is doing the work as currency in our case check and then register callbacks that should be running once i get the result back
00:18:44.559 so my callback is getting all the suggestions and then printing them out so very very boring but the code feels right the
00:18:50.960 code looks like it should be looking in event machine case and it's implemented using deferrables
00:18:56.400 so the the implementation is also very simple i have my class that
00:19:02.880 includes deferrable then i have the the check method that does the actual asynchronous call to
00:19:08.080 to the http um library of the http service of google and um the interesting part is
00:19:15.919 that this that on a successful http call i called succeed on myself yeah i called succeed on the on the google spellchecker
00:19:22.720 class with the the the result that i got from the from the google servers and by doing so
00:19:29.840 all callbacks that um that the user registered on this google spellchecker instance will be
00:19:35.520 called with the with the result
00:19:40.799 and the same would be true for the error case and so on but this allows me to um so if my class uses like
00:19:47.679 very complex nested um structure of event machine calls of like a callback of callback of callback
00:19:54.320 i can i can still make it look like very nice and just like notify the user when i'm ready
00:19:59.840 without the user having to keep track of um all the all my internal state and to know where where where i'm in
00:20:06.080 like did i parse the xml already or not uh and i'm still asynchronously of
00:20:11.120 course so um i hope that you understood what like defer deferable is all about
00:20:17.840 so deferrables are just like syntactic sugar that makes it very easy for you for for you to use a class
00:20:24.240 that that uses asynchronous methods like in this case an http request
00:20:30.880 again so this is the the code that uses it so for um so as a user of this class i can
00:20:36.080 just register a callback myself i can just say if the once you get back the suggest the
00:20:41.600 suggestions from the google spellchecker api this is what i would do with them and i don't have to care how it's implemented
00:20:46.840 internally so deferrables are also heavily used by by
00:20:52.480 event machine itself and by other libraries so for example the em http request library that
00:20:58.240 we're using right here is returning you a deferrable so the http variable is a deferrable um actually
00:21:06.240 so yeah most of the event machine libraries and protocol implementations actually return deferrables and this is what's
00:21:12.240 it's nice about them is that you can just like put register callbacks on them
00:21:17.600 and be called when they're ready uh yeah another another couple of things on
00:21:23.039 deferables is they have timeouts so you can say you can give a deferable timeout if it's
00:21:29.440 not succeeding within a specified amount of seconds it will automatically fail and you can also reset them which um is
00:21:36.080 a nice a nice um way to to to like schedule work so for example
00:21:41.760 um the the em-jack library which is the beanstalkd um client implementation for event machine
00:21:49.039 they use deferrables in order for you to be able to all to like instantly use the class without even
00:21:55.600 having a connection established because you're just registering callbacks on the class and
00:22:00.640 once the connection is registered all callbacks will fire so the commands will be executed you get the result back
00:22:06.000 and if the connection should like break or fail or time out they reset the deferrable which means
00:22:12.400 all callbacks are cleared again and then you can again register ones of them and once the connection comes back uh
00:22:18.080 all commands will be flushed to the to the server so for you as a user um the
00:22:23.200 api is very simple very nice but the the um the behavior is very complex
00:22:28.240 because you're stacking up commands you you're um firing them once you get the connection back you're clearing them
00:22:33.520 when the connection is lost and stuff like that and it's very easy to implement those with deferrables so
00:22:39.360 so if you're having a more a complex class that is doing a lot of um callbacks behavior with with event
00:22:45.440 machine you should definitely um think about um using deferables in order to make the implementation a lot
00:22:50.720 nicer another thing that helps you structure your code if you if you're coding with
00:22:56.080 event machine are queues so um queues are exactly that what the name says a very simple
00:23:01.440 queue which is like in memory so inside your process um so and but the the the um the nice
00:23:08.559 thing about them is that they help you to to make sure that the code runs in the correct place
00:23:14.159 so for example um it helps you to um to be sure that you're running in the in the reactor thread so everybody can
00:23:20.960 push the queue and you can pop up it and you know the the popping part will always be in the main reactor thread and
00:23:26.240 can respond to to the messages um the queue is very simple
00:23:31.520 so you just instantiate it you can register a callback that will be run on pop and then you can push messages onto
00:23:37.919 it the important part to know is that the pop part will only be run once so if you if you want to have some kind of a
00:23:44.559 worker pattern where you continuously like execute something on items on the queue
00:23:50.640 you need um like to reschedule yourself just unlike in the example we we saw before where you're just having a proc and at
00:23:57.440 the end you're just um executing the proc again so in this case we will be now
00:24:02.840 continuously working off messages of the queue and the the um the pop part will be
00:24:08.880 inside the main loop uh inside the reactor thread so you don't have again to worry as i said
00:24:14.320 before because you usually have to worry am i running in the reactor thread or in a deferred thread where should
00:24:19.600 i be running and stuff like that so with it with a queue it's very simple so that you can schedule work from from
00:24:26.400 multiple threads don't have to care where those threads are and then work them inside the main reactor loop
00:24:38.960 another thing that helps you to structure your code are channels so channels are yeah a simple pub sub
00:24:45.440 implementation again in memory inside event machine so they are not a replacement um um for for like a real messaging um
00:24:53.520 uh application like uh mqp and rabbit and q and whatever yeah they're just in process but they help you again to
00:24:59.440 structure your code they they make sure that that um subscribers and publishers
00:25:05.919 can can like safely communicate um cross thread with with each other so the the idea is again
00:25:12.720 um i have a channel and i can just subscribe to it so i can subscribe for
00:25:18.080 multiple threads so i can subscribe out of the main thread i can subscribe out of a deferred
00:25:23.279 thread and then i can of course can push messages onto it
00:25:28.799 and as you would expect i can once i subscribe to a channel i get the messages and
00:25:36.559 and can unsubscribe again the the um the way you usually use um channels is
00:25:42.240 if you want to um have a long running deferred thread that is doing um like work that takes a
00:25:48.480 long time and you want to to be able to like communicate your your intermediate results
00:25:53.840 to the main reactor thread and do something with with them so usually um you would then just just
00:26:00.159 publish out of the of the deferred long-running thread you would publish to the queue to the channel and then you would have
00:26:07.200 inside your reactor loop you would you would respond to the messages and process them
00:26:14.240 Another very useful thing is the iterator. EM::Iterator allows you
00:26:22.559 to iterate over a set in parallel.
00:26:28.000 If you use the normal built-in iterator in Ruby: let's assume we have an array
00:26:33.279 of a couple of URLs and want to find out which is the biggest site hiding
00:26:39.279 behind those URLs. For every URL I load the page and count
00:26:44.880 the byte size of the response body.
00:26:50.320 In a sequential implementation I would have to
00:26:55.440 call every URL sequentially, wait for each, and only at the end could I find out which is the
00:27:01.440 biggest. We already learned that with EventMachine
00:27:06.640 it's a lot easier to do this in parallel, but you still have a lot of coordination work to do: you have to keep track
00:27:12.480 of the number of parallel requests, somehow store the intermediate results, and then at the end
00:27:19.279 iterate over those. This is exactly the shortcut that EM::Iterator gives you.
00:27:24.559 EM::Iterator is built-in syntactic sugar, a
00:27:29.679 wrapper to do this: you give it a collection, the number of concurrent workers that you want to use, and a proc
00:27:38.080 that will work on each individual item in the collection, and then
00:27:43.520 optionally you can give it a second proc that will work on the results once
00:27:48.640 they are all there. The same example with the iterator looks like this: I give
00:27:55.840 it the array with the URLs, I say that I want it processed by 10 parallel workers, and
00:28:01.760 then I give it a proc that specifies what to do with each individual item in the set; in our case we do the async
00:28:08.320 HTTP call. The important part is that we have to manually signal EventMachine that
00:28:13.600 we're done processing this item; that is the iter.return(result) call,
00:28:18.960 because EventMachine doesn't know when we are finished, so our callback has to signal that we finished processing this
00:28:25.600 item. The second proc gets all the responses once they are ready, so I don't block waiting for
00:28:32.159 all responses; I just register a callback again and don't have to do all the coordination work
00:28:37.360 myself: coordinating workers, instantiating the deferred threads, storing the intermediate
00:28:43.440 results somewhere, and then rechecking whether all of my URLs are done or not.
00:28:50.080 So EM::Iterator helps you to work over a range or
00:28:56.159 a set of items in parallel.
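The coordination that EM::Iterator hides can be modeled in plain Ruby with threads. `ToyIterator` and its `Slot` handle below are invented names for this sketch; the real EM::Iterator does the same bookkeeping on the reactor, without threads, but with the same shape: a `map(each_proc, done_proc)` call where each worker must signal completion via `iter.return`.

```ruby
require "thread"

# Toy, thread-based model of EM::Iterator's map(each_proc, done_proc) shape.
class ToyIterator
  # Handle passed to the worker proc; `return` mirrors EM::Iterator's
  # iter.return(result) completion signal.
  Slot = Struct.new(:results, :index) do
    def return(value)
      results[index] = value
    end
  end

  def initialize(items, concurrency)
    @items = items
    @concurrency = concurrency
  end

  # each_proc handles one item and must call iter.return;
  # done_proc fires once all results are in.
  def map(each_proc, done_proc)
    results = Array.new(@items.size)
    queue = Queue.new
    @items.each_with_index { |item, i| queue << [item, i] }
    workers = Array.new(@concurrency) do
      Thread.new do
        loop do
          begin
            item, i = queue.pop(true)  # non-blocking pop
          rescue ThreadError
            break                      # queue drained, worker done
          end
          each_proc.call(item, Slot.new(results, i))
        end
      end
    end
    workers.each(&:join)
    done_proc.call(results)
  end
end

urls = ["http://a.example/", "http://bb.example/", "http://ccc.example/"]
sizes = nil
ToyIterator.new(urls, 2).map(
  proc { |url, iter| iter.return(url.length) },  # stand-in for the async HTTP call
  proc { |results| sizes = results }             # runs once everything has returned
)
sizes  # => [17, 18, 19]
```

The `url.length` worker is a stand-in for the HTTP request plus byte count from the talk's example; the point is the two-proc shape and the explicit completion signal.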
00:29:02.240 Once you have worked with EventMachine for some time, you should
00:29:08.159 definitely start to look into fibers and em-synchrony.
00:29:13.760 Once you understand all of this and you start to write complex applications, the problem is
00:29:20.480 that it's very easy to get into callback hell. Let's assume
00:29:25.840 we have a very simple task: we want to search for a word on Google.
00:29:32.880 This is a very simple implementation of an API client for the Google search API:
00:29:38.880 we do an async request, we load the URL, and then we do a JSON
00:29:44.159 parse on the result. But what happens if we now want to
00:29:49.440 load the first result? Inside the success callback we now have to
00:29:54.559 schedule a second request that loads this first URL and then prints out the size of the
00:30:02.080 response, for example. And once we have the size we don't
00:30:07.440 want to just print it, we want to store it somewhere; let's store it in memcached. So again we have an async call, so
00:30:13.520 we have to call memcached and register a callback so that we get notified once the save is done.
00:30:21.120 So it's very easy to get callback nested in callback nested in callback,
00:30:26.240 which at some point is very hard to read, very hard to debug,
00:30:31.600 and hard to understand. Especially since I'm only showing the success case here, only the success callbacks; a good
00:30:39.120 implementation would of course have to handle all the errors in between, so you get very deep nesting
00:30:44.320 of success callbacks, error callbacks, and so on.
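The nesting problem can be seen in miniature with a toy helper. `async` here is a hypothetical stand-in, not an EventMachine API: a real client would invoke the callback later, from the reactor, but invoking it immediately is enough to show the shape, where each step can only continue inside the previous step's success callback.

```ruby
# Toy stand-in for an async client call that takes a success callback.
def async(result, &callback)
  callback.call(result)
end

log = []
async("google results") do |results|
  log << "got #{results}"
  async("first page body") do |body|   # second request, nested
    log << "size #{body.length}"
    async("memcached ok") do |_ack|    # store the size, nested again
      log << "stored"
    end
  end
end
log  # => ["got google results", "size 15", "stored"]
```

Add an error callback at each of the three levels and the pyramid roughly doubles, which is exactly the readability problem fibers address next.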
00:30:49.360 What you can do in order to avoid this is to use fibers. Fibers are part of Ruby 1.9 and they
00:30:54.960 are also implemented in JRuby and Rubinius, and what they allow you to
00:31:01.120 do is to make this code look like it's synchronous, so it's a lot
00:31:07.200 easier to read and understand, but the behavior is the same as
00:31:12.320 before; it just reads nicer, which means it's nicer to understand and maintain.
00:31:19.039 What you do, if you use fibers to make this code look synchronous,
00:31:26.240 is basically resume the fiber in a
00:31:32.880 callback, and yield before you get your response. It sounds complicated, but as you will see
00:31:39.120 in a minute, using it is very simple. Done manually it would look like this: you instantiate a
00:31:44.480 new fiber, then you do the asynchronous call, and
00:31:50.320 as the callback you register a resume of your fiber, which means "I want to take back control
00:31:58.480 in the fiber now." What the fiber allows you to do is voluntarily yield the control flow,
00:32:04.799 in contrast to threads, where there is a fixed amount of time that each thread gets to run,
00:32:10.000 after which it is put to sleep and another thread runs. With fibers you can schedule
00:32:15.519 when those switches happen. So in our case, on the last line,
00:32:20.960 we yield control back to the fiber that
00:32:26.640 controls us, or to the main thread, because we're waiting for the response, and inside the
00:32:34.640 callback we say that we want to resume, because now we got what we waited for. So the actual line of code that does this is the
00:32:42.640 last one. This is the complex implementation; if you
00:32:48.640 hide it in a method, it looks like this. Let's say we
00:32:53.919 want to get a URL: we put all the fiber
00:32:59.360 code inside the method so that the actual API for the user looks a lot nicer; it just says content
00:33:05.679 = get(url). What it will do is behave exactly like our asynchronous example before,
00:33:12.880 because as soon as you call the get method, your fiber will yield to the
00:33:19.519 fiber that controls it, or to the main thread, and this code will suspend
00:33:25.279 until the result of the HTTP call is there.
00:33:30.399 Then your code is in control again, which means you can process the result. The line
00:33:37.679 content = get(url), and whatever other processing follows, looks like synchronous code, but in
00:33:43.279 reality you are always yielding control to the main fiber and getting it back once a result
00:33:48.880 is there. Because it's very ugly to have all of
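The manual pattern just described can be sketched in plain Ruby, without EventMachine. `fake_http_get` is a hypothetical async call that parks its callback in a pending list so a pretend "reactor" can fire it later; `get` registers a callback that resumes the current fiber, then yields until that callback delivers the body.

```ruby
pending = []

# Hypothetical async call: remembers its callback so the "reactor"
# (the pending list) can fire it later with a response body.
def fake_http_get(url, pending, &callback)
  pending << proc { callback.call("<html>#{url}</html>") }
end

# Synchronous-looking wrapper: register a callback that resumes this
# fiber, then yield; resume(body) makes Fiber.yield return the body.
def get(url, pending)
  fiber = Fiber.current
  fake_http_get(url, pending) { |body| fiber.resume(body) }
  Fiber.yield
end

result = nil
Fiber.new do
  content = get("http://example.com/", pending)  # reads like sync code
  result = "got #{content.length} bytes"
end.resume

pending.shift.call  # the "reactor" delivers the response, resuming the fiber
result              # => "got 32 bytes"
```

The body of the fiber block reads top to bottom like blocking code, yet control actually leaves it at `Fiber.yield` and only comes back when the callback fires, which is the same flow as the nested-callback version.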
00:33:54.000 those fibers everywhere in your code; to abstract away all of this functionality there's a very nice library called
00:33:59.200 em-synchrony, by Ilya Grigorik, that does exactly all of this for you.
00:34:05.600 It already wraps the built-in clients
00:34:10.720 of EventMachine to do exactly this. Instead of EM.run you say EM.
00:34:17.440 synchrony, and you can write the code as if it were synchronous, but in reality it
00:34:24.480 will do exactly the same callback-and-yield work that we did
00:34:29.919 manually before. The difference is that you don't have to see it, and you don't have to
00:34:35.440 maintain this deeply nested async code.
00:34:43.280 The cool thing is, as I said, it's implemented for
00:34:48.560 the built-ins and for most of the big EventMachine libraries:
00:34:53.679 for mysql2 and ActiveRecord it works out of the
00:34:58.720 box, for em-http-request, for a couple of the CouchDB
00:35:05.440 mapper libraries, and for em-jack. So you don't have to do it manually: just by requiring em-
00:35:12.240 synchrony and calling EM.synchrony instead of EM.run you can use the
00:35:17.920 synchronous-looking code, but in reality it runs asynchronously.
00:35:26.000 What you can do with this is go a step further and use, for example, Goliath, which is an evented web framework by the same
00:35:32.800 people, Ilya and PostRank, that leverages fibers and gives you a
00:35:39.280 Rack API on top of it, so you get a web framework that is evented
00:35:44.800 in its implementation but looks like it is synchronous. In this case we have a very simple
00:35:50.320 application that just responds with "hello world",
00:35:55.760 but we can of course have one that is a little bit more complex. In our case we're doing
00:36:03.040 what looks like a blocking MySQL query in it, but the cool thing is, it's not blocking.
00:36:08.960 If this were a Rails action, the
00:36:14.320 Rails process would be hanging for five seconds; with Goliath you can just hammer it
00:36:20.000 in parallel and it will still respond to you. What Goliath is
00:36:25.599 doing is implicitly wrapping every action call in a fiber, so that
00:36:30.880 whenever you do something that is blocking, whenever you hand control back because
00:36:37.040 you're waiting on a response, it will yield and another fiber can run. So in
00:36:42.160 this case I can still answer other requests while waiting for this five-second block. In contrast, if you used Rails here,
00:36:48.160 Rails would block and no other requests would be handled, because we're waiting for this MySQL
00:36:53.200 query.
00:36:59.839 So if you have an application where you need very high throughput and a lot of small requests, and you want to do a lot
00:37:05.839 of them in parallel, you should definitely check out Goliath.
00:37:12.720 That's basically it.
00:37:18.079 To wrap up, what I hope you definitely remember from this talk is that
00:37:24.800 EventMachine is a great library, but you always have to make sure, as a colleague of mine,
00:37:30.000 Matthias, keeps saying: do not block the event loop. He's constantly yelling this, but
00:37:35.280 unfortunately it's very easy to do, so you have to be very careful not to block it. And sometimes,
00:37:40.640 even if you do everything correctly, if you use only evented libraries and you make sure to use deferrables and
00:37:48.000 things like that, it's still very easy to break it just by doing too much work, because you didn't test it with thousands
00:37:53.760 of messages being processed at once.
00:37:58.800 So you always have to remember not to have any code that runs too long: limit yourself, do an EM.next_tick,
00:38:06.640 yield, and try again in the next iteration. The two important things
00:38:13.520 are to know what EM.next_tick and EM.defer do, and to know where to use one or the
00:38:18.640 other, which means I/O and very small, fast operations in the main reactor loop with next_
00:38:24.160 tick, and anything that is potentially longer-running in background threads with defer. And
00:38:31.440 all the libraries, like the iterator and the queues and channels, help you to structure your code like this so that
00:38:36.800 it's a lot easier to use. And I think with fibers
00:38:43.760 and em-synchrony it becomes a lot easier still, and you will see more and more applications
00:38:49.040 using EventMachine behind the scenes while the code looks synchronous, thanks to fibers.
00:38:56.560 Are there any questions? Yes, please. [Audience question: any
00:39:02.560 advice about testing these applications?] Yes, testing. Testing
00:39:07.920 can be a little bit of a challenge, because testing means you have to have a running loop.
00:39:15.440 What we usually end up doing is test all the domain logic, which
00:39:21.680 is hopefully hidden well away in classes, in normal unit tests that don't rely on the
00:39:26.880 event loop running. There we sometimes just stub, for example, defer calls or timer calls to
00:39:33.839 fire immediately. Then, in the tests that verify you're calling the correct methods
00:39:39.359 of the correct classes in the correct cases, we
00:39:44.560 actually fire up the event loop, which means that usually you
00:39:50.400 would set up assertions to run with a timer. For example, my test would
00:39:58.320 wrap everything in EM.run, then you would call the methods under test on
00:40:04.160 the instantiated class, call the message that you're interested in, and have a timer scheduled
00:40:09.200 to run in a second or so that contains the assertions. With an
00:40:15.920 approach like this you can test most of the stuff.
00:40:21.040 For more complex interactions, what we do is we have
00:40:27.440 a real integration suite that actually exercises the system from the outside, so you're observing the behavior: for
00:40:34.240 example, are those messages processed correctly, and do I get the expected thing in my database?
00:40:40.160 There you're not interested in the implementation of the agent or the
00:40:46.079 daemon that you're using; you're just testing its responses. So I fire a message into RabbitMQ and I
00:40:52.160 say: five seconds later there should be something in the database that looks like this,
00:40:57.920 and how it's implemented, or how it got there, I don't care. With an approach like that, which
00:41:02.960 is the broader, integration-testing approach, we're pretty happy; it works
00:41:09.280 very nicely. The only drawback is of course that at some point it takes a long time to run your whole test suite.
00:41:16.160 But if you nicely separate the domain logic, and heavily use, for example, deferrables,
00:41:22.079 which make it very easy to have a clear interface that you can mock, it's very easy to have unit tests for the domain models and for
00:41:28.720 all the domain logic, and then have a very lightweight test suite that verifies that the EventMachine setup code does the
00:41:35.040 right thing in the right places. Yes, another question up here?
00:41:45.920 [The question was: how do we handle exceptions?] Unfortunately, error handling in general
00:41:51.920 in EventMachine is not the greatest, because the code is sometimes deeply nested,
00:41:58.560 and you have to make sure that you're not killing your loop by having an
00:42:03.680 exception that you're not catching. In general, the code that we run inside EventMachine is very narrow, so what we tend to
00:42:10.079 do is have small agents that have a very narrow functionality
00:42:15.200 or responsibility and that ping-pong the work through a message bus, for example, so
00:42:21.680 that the part we have to test and handle is very small. Those agents then have a pretty
00:42:28.480 rough, general error handling, where in the worst case the agent dies and we respawn
00:42:34.480 another one. In general, especially if you use fibers, the error
00:42:41.760 handling becomes a lot nicer, because the code is not as deeply nested as
00:42:46.800 before, so you can check return values and so on. So we started to use fibers more
00:42:54.079 and more just for the nicer error handling. But error handling is still a case that is not very great,
00:43:00.480 especially if you get an exception from somewhere deep down in the reactor, because, for example, a socket closed.
00:43:07.520 An example is if you use a library somewhere that is not thread-safe, for
00:43:13.599 instance, and you call it from different threads: you can get a weird exception out of
00:43:18.640 the reactor because one thread tried to write on a closed socket or something like that. So
00:43:24.000 this is definitely a case you have to take care of; it's not so easy to do
00:43:29.760 with EventMachine if you have very deeply nested callbacks, and you have to take care of error handling on
00:43:35.440 all levels. So my advice would be to limit the nesting and slice responsibilities: have very
00:43:40.640 narrow responsibilities so that you have defined interfaces. Yes, another question?
00:43:49.440 No other questions? Okay, thank you very much.