Summarized using AI

DRb vs. RabbitMQ Showdown

Davy Stevenson • November 01, 2012 • Denver, Colorado • Talk

The video titled "DRb vs. RabbitMQ Showdown" features Davy Stevenson discussing Elemental Technologies' experience with communication layers in their video transcoding software. For years the company used DRb (Distributed Ruby) for inter-process communication but hit its limits as demands on their product grew. The talk outlines the evaluation process undertaken to determine whether to switch to RabbitMQ, a message broker that uses AMQP (the Advanced Message Queuing Protocol).

Here are the key points discussed in the talk:

  • Company Background: Elemental Technologies is a Portland-based startup providing video transcoding solutions, leveraging Nvidia GPUs for superior speed. Their clients include HBO, ABC, and others.
  • Use of Ruby: Ruby and Rails are employed for the user interface and workflow automation, allowing rapid development and integration into customer environments.
  • Problems with DRb: Although DRb was effective initially for communication between services, it posed reliability challenges in more complex workflows, particularly when communication had to survive failover across multiple servers.
  • Defining Problems: Stevenson emphasized the importance of accurately assessing customer needs and defining what success means before making technology changes. The team recognized issues with DRb's point-to-point communication model, which requires servers to know their communication partners directly.
  • Switch to RabbitMQ: After investigating different options, the team decided on RabbitMQ for its robust features, including:
    • Decoupling of Producers and Consumers: Producers do not need to know the identity of consumers, allowing for more flexible communication.
    • Message Durability and Reliability: RabbitMQ's durable queues persist messages to disk so they are not lost even if the broker fails, overcoming the unrecoverable message loss that could occur with DRb.
  • Implementation Plan: The transition to RabbitMQ will be rolled out gradually, first addressing internal communication before expanding to other components as needed.
  • Results and Future Plans: Preliminary results indicated that RabbitMQ performed reliably, allowing Elemental to prepare for increased growth and more robust customer demands without compromising system reliability.

In conclusion, the transition from DRb to RabbitMQ enables Elemental Technologies to enhance the reliability and scalability of their transcoding services, supporting increased client expectations while ensuring system stability. This change positions them favorably for future expansion.


Date: November 01, 2012
Published: March 19, 2013

Our enterprise product has used DRb as a communication layer for a number of years now. With the growth in popularity of EventMachine, it is time for us to evaluate whether a switch is right. This talk will discuss our current DRb setup, our process for comparing DRb and EventMachine and our comparison results, the decision on whether to switch, and either our reasoning to stay with DRb or how the resulting switch to EventMachine was handled.

RubyConf 2012

00:00:15.679 thank you very much for being here to listen to me talk you may have noticed that I have changed the title of my talk
00:00:22.519 slightly which is a cautionary tale of don't submit your talk proposal before
00:00:27.679 you actually know what you were going to talk about um I think the moral hopefully the moral of the story will be everything will
work out okay anyway but you'll have to let me know how it goes uh my name is Davy Stevenson I'm a
00:00:41.399 software engineer for Elemental Technologies in Portland Oregon um I take a little aside here to say that I
00:00:47.960 think that Ruby conf has been really amazing so far I think everyone has just been it's been a friendly group of
people um I'm absolutely in love with the Ruby friends thing I think it's a great way to help um you know break the ice
00:01:01.440 between some programmers who oftentimes are a little bit shy and don't really want to talk to other people so if
anyone wants to be a Ruby friend with me find me later so what is this talk all about so
00:01:15.400 um one type of talk that I really really enjoy but I don't think um I don't see that many talks of this category is kind
00:01:22.400 of the talks that talk about what companies are actually doing with Ruby
00:01:27.479 with rails um what problems are they solving solving um you know what what are they using this technology for in a
00:01:33.840 really explicit way and so um I'm hoping in this talk to give you guys a little bit of a inside scoop as to how
00:01:41.439 Elemental uses Ruby um I think we use it in a little bit of an interesting unique way and I'm doing that by uh talking
00:01:49.920 giving you guys a story about a recent problem that we encountered with some of the technology that we were using uh a a
a use case that we were using drb for as our product grew it ended up not
00:02:04.719 really fitting our needs anymore and so we needed to figure out how to solve this problem um and what to replace drb
00:02:12.400 with in that context so Elemental Technologies we're
00:02:20.480 a startup company in Portland Oregon we do video transcoding software um and our
00:02:26.239 big claim to fame is that we use Nvidia gpus to parallelize the video
00:02:31.519 transcoding process which allows us to get much faster uh video transcoding
00:02:36.720 throughput as compared to all of our competitors and so um the speed of our products is vastly superior to all of
00:02:43.239 our competitor products which has allowed us even though we are a very small startup to be very competitive with a lot of the bigger names in the
00:02:50.560 space so we actually sell server appliances we sell these appliances to
00:02:55.840 large companies uh to solve their video transcoding needs these servers are installed in
00:03:02.120 oftentimes in their data centers and uh we need to be able to integrate these servers into their workflow
00:03:09.400 environments so who are some of our customers uh HBO uses Elemental servers
00:03:15.480 to um to transcode all of their content that they have for broadcast delivery uh
00:03:21.640 and they transcode that content for uh online adaptive bit rate uh consumption
00:03:27.400 to power HBO GO so HBO has you know a vast uh library of
broadcast television and this is great if you're trying to transmit this
00:03:39.439 uh this video uh to you know cable subscribers but you know in the age of Netflix and Hulu they needed to be
00:03:45.959 competitive and be able to transmit their video over the internet as well this is a big problem for companies
00:03:51.840 because the number of different devices that their consumers are going to be using to consume this video is very very
00:03:57.879 Broad and the transcoding is needed to to create the video specifically for
00:04:04.439 each type of device so the videos needed to be displayed on an iPhone or an iPad are very different than the video
00:04:10.959 displayed on your computer or your TV hooked up to HBO GO or your Android
00:04:17.040 device so companies like HBO need lots of servers to handle all this transcoding we also power Comcast
00:04:24.320 Xfinity application and uh the ABC News iPad app among others
so another big win for Elemental was that we uh helped power the online coverage of the London Olympics in
multiple continents uh and many many countries uh the BBC used Elemental's
gear to live stream 24 channels 24/7
throughout the entire London Olympic Games so for customers like the BBC having these live
streams up 24/7 is of the utmost importance we need to make sure that these servers are stable and uh able to
00:05:02.720 continue transcoding uh for weeks or months on end so you may be asking yourself you
00:05:09.720 know this is Ruby conf and all video transcoding I'm pretty sure you guys aren't doing video transcoding using
Ruby and if you thought that then yes you are correct we don't use Ruby to do any video transcoding however Ruby has
00:05:22.199 helped Elemental in a lot of ways to basically grow from a very small startup into a very competitive player in this
00:05:28.840 market and so how has Ruby helped Elemental so we use a rails application
00:05:35.319 uh running on each server to handle the user interface of these uh the of our
00:05:40.479 servers and handle all the the transcoding events uh this has allowed us to have a best-in-class user
interface that's um much more usable and uh much easier to integrate into
00:05:53.120 these workflow environments as compared to a lot of our competitors Ruby allows for a lot of
00:05:58.759 fast-paced development you can get a lot done really quickly with Ruby this has allowed us to scale very quickly as a
00:06:04.720 startup and to develop our products and add the features that we needed in a very short time
00:06:10.479 frame not only that but I'm really happy that I get to use Ruby on a daily basis
00:06:16.520 uh in order to solve these really interesting problems we all love the Ruby language it's it's a joy to to use
00:06:22.639 and it helps us attract worldclass developers to work on our products so let's go a little bit deeper
00:06:30.800 into an example uh server node that we have in our uh in our use case so we I
00:06:36.560 already mentioned that we use the rails to provide a user interface for these servers they can uh our customers can
00:06:43.000 access uh this through the HTML interface or use the restful API to uh
00:06:48.680 automate their workflows we also run a multi-threaded ruby script as a service demon on each
of our servers this service demon is responsible for managing the vast majority of uh the actual
00:07:03.599 transcoding that happens uh in our file to- file case it needs to do a lot of load balancing to make sure that it's
00:07:10.319 consuming the maximum number of resources and thus getting the maximum number of throughput for our uh encoding
00:07:16.879 processes that are running uh for our live streaming uh products it needs to
00:07:22.080 be able to manage the the resources and handle starting and stopping these live
00:07:27.440 streams and making sure that all the information is Flowing appropriately throughout the entire
00:07:34.680 system so let's step back into this our story so in the beginning we are trying
00:07:39.879 to build a product and oftentimes you know the initial requirements and feature sets are what Drive the
00:07:45.280 technology decisions and this makes sense because you know none of us have crystal balls none of us know definitively what the future is going to
00:07:51.759 hold or where uh our products are going to evolve into uh we're not sure what
00:07:57.319 new technology is going to be created in future that may solve a problem better we don't know necessarily what our
00:08:03.960 customers needs are going to be in the future as we continue developing our
00:08:10.240 products there are new features uh new use cases that we are developing and often times it's easy to reuse the
00:08:16.280 technology that we're already using uh to solve these new similar problems often times the problems are very
00:08:22.400 similar maybe sometimes they aren't quite as a perfect match but even then
00:08:27.599 sometimes it's still useful to use the the known technology as opposed to spend the time to learn something
00:08:34.519 new and sometimes it can be a good business decision to reuse existing
00:08:39.599 technology in a close enough match as opposed to spending the time to to explore new options if you're in a
00:08:46.480 fast-paced startup environment sometimes getting the product out the door and in your customer's hands is more important
00:08:52.760 than uh than making sure that the technology you're using is a perfect match
00:09:03.640 so Elemental has been using drb as a communication uh method for uh about
00:09:08.920 three years now drb is really easy to use it is fast and it is reliable it's a
00:09:15.480 really easy way to uh allow two different Ruby processes to communicate to each
00:09:21.640 other here's a quick little code example to uh uh show kind of what the power of
00:09:27.279 what drb can do so uh we have this server class and it exposes two methods you can get data from the server which
00:09:34.279 returns a simple array or you can send data to the server uh and the server just prints that data out uh now we're
going to create a drb service uh on a specific URI and we're going to pass in
00:09:47.519 this instance of This Server class that we've created so now anyone who's connecting to this drb service has
00:09:53.880 access to those methods in our server class and to keep the this uh this
process running we're going to call drb thread join so that this service is running continuously if we're going
00:10:06.640 to run it from the command line so now we can open up IRB require drb again uh
00:10:12.800 create a new drb object to connect to that URI from our previous from our server and now we have an object that
00:10:20.399 can communicate directly with that server object we can call that get data method from a completely different Ruby
00:10:26.360 process and get that array back and we can send data to that server as well pass in the string hello and then over
00:10:32.720 on the service as it's running it's going to be printing out that uh in that other
00:10:39.440 process so you can use drb to send data and messages you can use it to request data which can be very powerful if you
00:10:46.040 know exactly what data you want you can ask the service to give you a very specific set of
information you can pass entire Ruby objects back and forth using Marshal which can be incredibly powerful um if
00:10:58.600 both both Ruby processes have the same concept of those objects one thing about drb is it is a
00:11:06.040 point-to-point communication method so you have to know exactly who you're talking to uh you open up a connection
00:11:12.120 to a direct service particularly in order to to perform that
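The DRb example walked through above can be reconstructed roughly like this; the class, method names, and port are my reconstruction from the description, not the speaker's exact code:

```ruby
require 'drb/drb'

# Reconstruction of the server class from the talk: it exposes two
# methods that other Ruby processes can call over DRb.
class Server
  def get_data
    [1, 2, 3]   # return a simple array
  end

  def send_data(data)
    puts data   # the server prints whatever it receives
  end
end

uri = 'druby://localhost:8787'

# Expose an instance of Server over DRb at that URI.
DRb.start_service(uri, Server.new)

# A client (normally a separate process, e.g. an IRB session) connects
# to the same URI and calls the exposed methods as if they were local.
client = DRbObject.new_with_uri(uri)
client.get_data           # => [1, 2, 3]
client.send_data('hello') # the server side prints "hello"

# In a standalone server script you would block here instead:
# DRb.thread.join
```

Running the client in the same process keeps the sketch self-contained; the calls still go through the DRb service exactly as they would from IRB.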
00:11:21.360 communication so how does Elemental use drb so looking back to our little State
00:11:27.160 diagram from before we can see that we use RB to to communicate between our rails user interface and the Ruby
00:11:33.720 service demon that's running on all these servers so our first product was a file
00:11:40.120 to- file transcoding server uh this would basically transcode files to multiple formats as fast as possible
00:11:46.760 take up as much GPU resources as possible get those files transcoded you know High frame rate uh all of those
00:11:54.639 encoding processes are returning a lot of statistic information back to the service demon uh very continuously this
00:12:01.680 is information such as the current frames per second that's being encoded and certain quality metrics to describe
00:12:07.720 you know is is the current encode is the quality good or is it you know having to reduce the quality in order to you know
00:12:14.440 for some other reason of some sort this is information that we don't necessarily want to store permanently there's no
00:12:21.160 reason for us to have a historical um you know we don't need to run statistics
00:12:26.240 on this information uh later so we don't really want to store this data in the database we don't want to waste the
00:12:31.639 Cycles to submit that uh work to the database when we could be using those
00:12:36.720 CPU Cycles to transcode files faster um but our customers still want to be able
00:12:43.120 to query that information if they're seeing some issues with you know you know they want to see what the current
00:12:48.160 frames per second are they want to see what uh some of that data is they still want to be able to query that information and so we can have the rails
UI uh ask the demon what is the current stats for a particular encode get
00:13:00.880 that data back display it in the UI and that's a really great use of
00:13:07.600 drb so uh our next product was our live streaming uh uh server which is uh you
00:13:14.279 know this is a kind of a different use case uh they're getting live data in and we have to transcode it in real time to
00:13:21.000 multiple output formats uh and keep that running as long as possible so in this use case now the UI needs to actually
00:13:27.880 send some commands to the service demon we need to be able to tell it when exactly to start the event maybe to
00:13:33.760 switch inputs if one of the inputs goes down maybe our output CDN is down maybe
one is down but some of the other cdns are up so we want to tell it to stop encoding certain outputs but keep other
00:13:44.639 outputs up now we need a way to to communicate from the rails UI to the
00:13:49.760 demon send commands and drb is a pretty good fit for this use case um you know
00:13:55.120 you may say there's other ways to send commands and that may be right but in this case using drb for this met our
00:14:02.399 needs very well there was no need to try anything
00:14:07.440 else so as uh our customers had more and more uh need for transcoding they needed
00:14:14.320 U more and more servers and they wanted a central way to communicate with all of these servers and manage these servers
00:14:20.240 to manage failover of the servers uh in case uh components die because a lot of
these people they are running encodes for months on end and we all know that server components
00:14:31.519 die that happens we have to plan for that so um we have um now a management node it's not doing any transcoding
00:14:37.920 itself it's just managing the state of all these other servers it's also running a ruby service script uh instead
of managing the live encodes it's managing these other servers and so in order to do that you know we those that
00:14:52.040 service needs to be able to communicate to the services on all the live streaming nodes so we can use drb for
00:14:58.000 that too this use case was a little bit different though especially as uh the needs of our
00:15:05.000 customers changed over time um we need to make sure that the those live nodes
are up no matter what it doesn't matter if other components are failing if the external database server goes
00:15:16.360 down we don't care we still want our live streams to go out and so we may need to make sure that our service
00:15:23.000 demons are still running on all those nodes however there's still data that's being produced that we need to make sure
00:15:29.040 is going to be eventually persisted to the database so that the customer can tell if something went wrong uh during
00:15:36.199 this sort of downtime you know things like did one of the cdns go down and did an alert pop up did um you know um
00:15:44.680 issues happen with a transcoding where certain outputs were were failing for certain parts of time they want to know
00:15:49.839 that information so now we have the use case where drb is not only just sending
00:15:56.040 messages but it has to be ensuring the receipt of that message ensuring that the database is getting that information
00:16:02.720 and that's not something that drb really is suited for so now we came to to our problem we
00:16:10.800 have our technology that we're using drb for communication and it's not fitting our needs anymore we have things that we
00:16:16.839 need to solve that drb can't provide for
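The gap can be seen directly in code: with DRb, if the receiving service is down, the call raises and the message is simply gone. A minimal sketch, assuming nothing is listening on the (arbitrary) port:

```ruby
require 'drb/drb'

# A client pointing at a URI where no service is running -- say the
# management node has crashed. Port 59991 is illustrative and assumed free.
client = DRbObject.new_with_uri('druby://localhost:59991')

begin
  client.send_data('alert: CDN output failed')
rescue DRb::DRbConnError
  # The call fails immediately and the message is lost; DRb has no
  # broker that could buffer it for delivery once the node comes back.
  puts 'message lost'
end
```

A message queue inverts this: the producer hands the message to the broker, which holds it until a consumer is available.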
00:16:22.240 us so if you find yourself in this sort of situation what are you going to do the first important part is to of course
00:16:28.920 Define the exact problem that you have so uh I outlined the three use cases
00:16:34.000 that we have for drb two of those use cases are still fine drb still fits our needs in those use cases we need to
00:16:41.480 define the problem as the exact portion of the code that is actually um not fitting our needs
00:16:47.680 anymore and then we also have to think about what our customer needs are what do they actually want out of our product
00:16:53.240 what is most important to them and that's going to help determine our decision-making process for what our
00:16:59.680 solution is going to be and most importantly before embarking
on a project like this is to Define what you mean by success beforehand
00:17:12.039 that's a good way for you to be able to then meet your goal later you know and so in this case uh for Success we want
00:17:18.559 to be replacing that inner demon communication with something that's much more reliable and robust can manage um
00:17:25.559 saving messages even when other components are not available to to receive those messages and be able to
00:17:31.919 store those messages and um and we can be guaranteed that data is not going to be
00:17:37.919 lost for us speed isn't a primary concern um for our
00:17:44.440 users what they really care about is the reliability stability and the resiliency of our servers um as far as the Ruby
00:17:52.080 layer goes if if it's working that's great what they really care about is those encoding processes continuing to
00:17:59.159 run all the time and so also when you're embarking
00:18:05.720 on a project like this it's uh very uh useful to limit the scope of the
00:18:12.080 project replace one single component at a time again the inner demon communication layer is what we're going
00:18:18.159 to focus on here uh replacing a single component helps make sure that you're uh not trying to do too much too fast and
00:18:26.360 then once you've replaced one portion you can work on incremental development maybe then you can find other areas
00:18:32.480 where using a new technology is going to be better than the existing technology and move on from
that so why would we start looking into a message queue and uh it became really
obvious to me that that a message queue is the right solution for this problem because I noticed that I was beginning
to reinvent the wheel I was starting to develop what was kind of a pseudo message queue around and in the middle of
our drb code except my message queue code is of course not going to be nearly as stable and robust as the many many
different message queue options that are available uh to
use the other great thing about using a message queue is it's decoupling the producers from the consumers uh in this
00:19:22.600 sort of case where we have these multiple live uh live encoding servers and one management server the the live
00:19:30.039 encoding servers don't actually care who receives the messages that they're sending out um they don't need to know
00:19:36.440 who the consumer the management node is and um and so drb with its
00:19:43.200 point-to-point communication uh adds this limitation to the producer the
producer has to know exactly who it's communicating to and a message queue also helps reduce the
00:19:56.039 responsibility of the various components our service demon was responsible for a lot of things and if it's responsible
00:20:03.240 for a lot of things then that means that failures in one area are going to trickle down and cause uh impacts that's
00:20:09.200 unintended consequences in other areas of our code um we want to make sure that
you know other areas in the service demon aren't going to mean that we're losing data in the messaging
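The decoupling idea can be sketched with nothing but Ruby's thread-safe Queue standing in for the broker: producers push messages without knowing who, or how many, consumers exist. This is only a conceptual sketch; RabbitMQ adds durability, routing, and cross-process delivery on top of this basic shape.

```ruby
queue = Thread::Queue.new  # stands in for the message broker

# Two producers (think: live encoding nodes) publish status messages;
# they never reference the consumer directly.
producers = 2.times.map do |i|
  Thread.new do
    3.times { |n| queue << "node#{i}: status #{n}" }
  end
end

# A single consumer (think: the management node) drains the queue.
received = []
consumer = Thread.new { 6.times { received << queue.pop } }

producers.each(&:join)
consumer.join
puts received.size
```

Swapping the consumer out, or adding more producers, requires no change to the producer code — the same property the talk attributes to the broker.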
section so we decided to use rabbit mq as our message broker uh system it's an
00:20:28.120 asynchronous message broker it uses amqp as its messaging protocol however it can also use a lot of different other
00:20:34.280 message protocols which was very attractive to us so if we do determine that amqp is not the right fit for us in
the future we would be able to swap that out another great benefit of rabbit mq is
00:20:45.200 it has high availability clustering available so as we move towards uh
wanting to ensure the availability of all of our messages we can use the multiple servers themselves to enable
high availability
clustering so quick overview of amqp the Advanced Message Queuing Protocol this is
simply a protocol used to receive route queue and deliver messages again there's other protocols available that
00:21:14.320 do very similar things um the one good thing about amqp is that there is a lot
00:21:19.640 of implementations in multiple other languages which for us was a really uh good benefit because we do have major
components written in other languages our main encoding processes are written in C++ so being able to easily uh be able
00:21:34.279 to use amqp in our other areas is going to be a great strength for us as
00:21:39.720 well so in order to use uh amqp with Ruby you use the ruby amqp gem this uses
EventMachine coming back to my original talk so I wasn't completely out of the woods there so uh you use the
00:21:53.440 amqp gem to communicate with the rabbit mq broker that's running on your server
00:21:58.679 and so here's a quick example so you require amqp and um it
00:22:04.000 require it requires event machine to be running and so we uh call event machine
00:22:09.400 run and run everything inside that that Loop um we create a connection to the
amqp um the rabbit mq broker and create a channel that we're going to communicate
on uh we're going to uh specify the queue that we want our
messages to to uh to go to and then rabbit mq uses it's an exchange system so
you actually publish to exchanges not to the queue specifically this is actually kind of interesting because it means that you
can Define different message routing uh protocols as necessary um you could
have multiple different queues and the exchange could fan out the message to multiple queues or do other Advanced
routing this is just a really simple example where the exchange is just tied directly to this one queue
00:22:58.880 now we're going to use event machine to add a periodic timer every 5 seconds it's going to publish the current time
00:23:04.600 to The Exchange and then we are going to subscribe to that queue and uh whenever
we receive something we're going to print out to the screen so if you run this you get a little script and every 5
00:23:17.000 Seconds it'll be printing out the
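The script described here can be sketched roughly as follows. It is a reconstruction, not the speaker's exact code: it assumes the amqp gem is installed, a RabbitMQ broker is running on localhost, and the queue name is illustrative.

```ruby
require 'amqp'  # the ruby amqp gem, which runs on top of EventMachine

# Everything happens inside the EventMachine event loop.
EventMachine.run do
  connection = AMQP.connect('amqp://localhost')
  channel    = AMQP::Channel.new(connection)

  # Declare the queue our messages should end up in.
  queue = channel.queue('time.updates')

  # You publish to an exchange, not to the queue directly; the default
  # exchange routes to the queue whose name matches the routing key.
  exchange = channel.default_exchange

  # Every 5 seconds, publish the current time to the exchange.
  EventMachine.add_periodic_timer(5) do
    exchange.publish(Time.now.to_s, routing_key: queue.name)
  end

  # Subscribe to the queue and print each message as it arrives.
  queue.subscribe do |metadata, payload|
    puts "Received: #{payload}"
  end
end
```

The default exchange is the simplest routing setup; as noted above, a fanout or topic exchange would instead let one published message reach several queues.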
00:23:22.360 time so now uh we're going to Def we've defined the scope of uh of our
00:23:29.760 solution so here's our main our main uh issues that we want to make sure that rabit mq is solving for us we want to
make sure that rabbit mq is going to add reliability to our system rabbit mq provides this concept of
durable queues these are queues where if the message goes in the queue you're not going to ever lose that message it will
00:23:49.520 handle persisting that message to the disk and making sure that that message is never lost even if the rabbit mq
broker goes down rabbit mq also ensures the order of
the messages it receives which is really good for us to make sure that uh we're getting a correct um uh state change
messaging system uh from what's happening on our Live Events we're knowing that the order that the
management node is receiving the messages is the correct order of the events that happened on those live
nodes so stability we want to make sure things are more stable and this is
00:24:27.120 really accomplished by the decoupling of the messaging from the service demon um
into a different component rabbit mq is going to handle all of that messaging um
00:24:39.760 failurea failure case from the service demon Itself by making sure each of these two different components is
00:24:45.720 focusing on the things that are important to it it means that the failure in one component is not going to cause unintended consequences and
00:24:52.520 Trickle and affect other areas of our system and resiliency want to make sure
00:24:58.919 our um our entire cluster is very resilient to errors and that is going to
be accomplished by enabling the high availability of the rabbit mq brokers so that the messages are shared amongst
00:25:11.600 all the different servers and so that way if one particular server has some sort of issue and crashes we're not
00:25:17.440 going to be losing that data and we're still going to be using
00:25:22.480 drb in some of our use cases drb is still the right tool for the job it is very simple which I really really like
00:25:28.679 it's very easy to use the syntax is very Ruby esque um and not only that but we
00:25:35.120 still have that use case where we need to request specific data from our service demon and in that uh drb is
00:25:40.720 still uh the real solution for that
00:25:46.960 task so a little bit about our final implementation so we have these multiple
00:25:52.919 producers which are the live streaming nodes and the single consumer which is the management node
00:25:59.039 and uh even when we expand out to have redundant management nodes there's still
00:26:04.440 always going to be one consumer and that also means the producers don't actually have to worry about who their single
00:26:09.480 consumer are the management nodes themselves are going to manage making sure that there is always someone there
to consume those messages it's not something that the live encodes have to worry about and we do have our two main
00:26:22.640 different classes of data we have uh the state change information that we need to be durable that's so sort of things like
00:26:28.919 if an alert happened on one of those nodes a change input happened um
00:26:35.600 encoding failed for some reason all these sorts of things um we need to make sure are stored and always persisted in
the database those can go along durable queues and then we have the more transient
00:26:46.520 status information that that information I was telling you about earlier where you know what's the current frames per second what's the current quality
00:26:52.919 metrics the sort of information where we want the real-time information but we don't necessarily need to store the historical
um information from that those can be stored in a different way in rabbit mq
00:27:03.799 in a way that's not going to tax the
00:27:09.120 system so here's kind of a overview of what our final imp implementation is
00:27:14.720 going to look like we have you know our main management node and again this is kind of simplistic eventually we're
00:27:21.159 going to have redundant management nodes but that's you know the management nodes will deal with that sort of thing um and
then the multiple worker nodes and then we have the clustered message brokers on all of those servers and rabbit mq will
00:27:34.279 handle the high availability clustering of uh the Brokers between all these different
servers we still will have a drb connection between the UI and the ruby service on each of those nodes
00:27:46.880 individually um but each of the service uh each of the different components can send messages to the to the message
broker as necessary so that allows us to do incremental development if we determine that sending messages from the
00:27:59.960 rails UI is going to be a better uh solution for some of our use cases we can do that and uh another great benefit
00:28:07.480 is in the future we're hoping to uh have our communication from the encoder process processes themselves also use
00:28:15.480 amqp um and be able to send the messages uh into the message broker uh from that
00:28:20.919 from those processes as well which again are C++ applications
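The two classes of data described earlier, durable state changes versus transient status, could be sketched in Ruby roughly like this. The message types, queue names, and options here are illustrative, not Elemental's actual code:

```ruby
# Hypothetical sketch of the durable vs. transient split described above.
# All names here are illustrative, not Elemental's actual code.

# State changes (alerts, input changes, encode failures) must survive a
# broker restart, so they belong on durable queues and get persisted.
DURABLE_TYPES = %i[alert input_change encode_failed].freeze

# Real-time status (frames per second, quality metrics) is transient:
# useful right now, not worth storing historically.
TRANSIENT_TYPES = %i[fps quality_metrics].freeze

# Choose queue options the way the talk describes: durable queues for
# state changes, lightweight non-durable queues for live status.
def queue_options_for(type)
  if DURABLE_TYPES.include?(type)
    { name: 'state_changes', durable: true, auto_delete: false }
  elsif TRANSIENT_TYPES.include?(type)
    { name: 'live_status', durable: false, auto_delete: true }
  else
    raise ArgumentError, "unknown message type: #{type}"
  end
end
```

With a real client these options would be passed when declaring the queue, so a broker restart preserves the state-change queue but is free to drop the live-status one.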
00:28:29.279 so what are some of the results we've seen after doing some of this refactoring? the performance of RabbitMQ
00:28:34.559 has been really, really impressive to us. the speed and throughput that RabbitMQ provides is more than sufficient for
00:28:43.200 anything that Elemental will need it for in the near future. the broker can handle
00:28:48.320 millions and millions of messages and we're not sending anywhere near that amount of information, so for us the performance has been
00:28:55.440 great. the decoupling of the communication was something even
00:29:00.519 more satisfying than I had originally anticipated. we all know that we have code that we want to
00:29:05.640 refactor, and we have an idea of how we're going to refactor that code, and then once that
00:29:11.559 refactoring takes place and you see the new structure, sometimes you're like, man, that was satisfying, this is just a
00:29:18.039 beautiful implementation. the way that this information is now structured, very cleanly and separated, it
00:29:24.159 really helped drive home to us that a message queue was the right solution to this problem for
00:29:31.880 us. we haven't actually released this code yet, it's slated for our next major
00:29:37.279 release, which is going to happen by the end of this year. but I'm looking forward to
00:29:44.159 that because I think it's going to be positioning our products for a lot of growth. our customers are asking for a
00:29:49.640 lot more interesting reliability and resiliency in our clustered systems,
00:29:55.640 and I think RabbitMQ is going to be able to provide a lot of these features for us as we grow
00:30:02.640 and I'd like to give a quick shout-out to my co-worker Matt, who did a lot of the implementation work and also
00:30:09.440 made those great diagrams for me. thanks Matt. and I wanted to know if anyone
00:30:14.799 had any questions for
00:30:20.039 me. [audience member asks two questions; the first is inaudible.
00:30:30.960 the second question:] you
00:30:36.279 mentioned that DRb was
00:30:41.919 not a good fit. was it just that it wasn't reliable, or something else? well, part of the
00:30:48.399 daemon-to-daemon thing was that, oh, so he was asking why exactly was DRb not
00:30:54.000 a good fit for the daemon-to-daemon case, was it the reliability? and it was kind
00:30:59.080 of a little bit of the reliability, but one of the main things was actually the point-to-point communication, in that
00:31:05.880 in order to use DRb you actually have to know who you're talking to, the two servers have to
00:31:11.399 know exactly who each other are. and in some of the cases where the live
00:31:17.120 node is sending its information back to the master, for example if we have one master node and a backup
00:31:23.440 master, if the master node goes down then all the live nodes need to know that now
00:31:29.480 they need to be sending all their messages not to the master node but to the backup master. that's information that those servers don't need to know
00:31:35.799 about, they don't need to know who they're sending their information to, they just need to know that the master is going to be getting that information.
00:31:42.080 so that's one reason why DRb wasn't really a good fit for that use case.
00:31:47.200 and then additionally, with DRb, when we found we were using it
00:31:53.840 to send the messages to the master, if the master wasn't up for some reason
00:31:59.000 then the live node would have to be storing that information locally in memory, because DRb is part
00:32:06.399 of that Ruby process itself. but there you get into a challenging
00:32:11.639 situation where if the master is down for some reason, let's say we're upgrading the software on those master nodes, now all these live nodes
00:32:18.600 are kind of queuing up all this information that they need to send to the master node once it comes back up.
00:32:24.159 what if the service daemon crashes at that point? because all that data is in memory we're going to be losing all
00:32:30.399 that information, and so in order to avoid that problem we're going to have to implement some way
00:32:36.880 to somehow persist that data so that DRb can then re-access it. and
00:32:42.679 that's kind of where you're getting to the point where the message queue has already implemented that, the message queue handles that case for us, so we don't
00:32:50.279 necessarily want to be coding that up ourselves just to keep using DRb. anything else?
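The point-to-point coupling described in this answer is easy to see with DRb from the Ruby standard library. This is a minimal sketch; the class name, URI, and message shape are illustrative, not Elemental's code:

```ruby
require 'drb/drb'

# A stand-in for the master node's service daemon.
class Master
  attr_reader :messages

  def initialize
    @messages = []
  end

  def report(msg)
    @messages << msg
  end
end

master = Master.new
# Port 0 picks a free port for this demo; in a real deployment the URI is
# fixed configuration, which is exactly the coupling problem: every live
# node must know the master's address, and a failover means updating them all.
DRb.start_service('druby://localhost:0', master)

# The live-node side holds a reference to one specific endpoint.
remote = DRbObject.new_with_uri(DRb.uri)
remote.report(type: :alert, node: 'live-01')

# If the master were down, the live node could only buffer in memory,
# and a crash at that point would lose everything queued up.
DRb.stop_service
```

A broker inverts this: the live node publishes to a well-known queue, and whichever management node is currently master consumes from it.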
00:33:01.720 [inaudible question] what volume of messages are we seeing? so, each of our encoding processes
00:33:08.039 usually sends one message a second, approximately, and our main live
00:33:14.080 servers can run about six to ten live events
00:33:19.559 concurrently, and then our current largest cluster is about 20 nodes. so
00:33:26.880 that's kind of what we're seeing for our current live encoding servers. however,
00:33:32.639 we have a new product that we're releasing which is just a packager, so that's not
00:33:39.200 actually doing any transcoding, it's just repackaging for a different delivery mechanism, and that server can run many,
00:33:46.039 many more concurrent live packaging events at once. that's
00:33:52.000 something where we're seeing something along the lines of 60 to 100, maybe 120, and so then you're getting
00:33:57.960 into the case where each node is sending, you know, 120 messages a second, plus
00:34:03.240 you're clustering them all together. so that's kind of the scale that we're seeing, so it's not really that much, not
00:34:08.440 really that many messages. you can talk to lots of other companies who are dealing with high-volume messaging
00:34:13.639 that blows us out of the water.
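Back-of-the-envelope, the figures in this answer work out as follows; the packager cluster size is an assumption for illustration, not a number stated in the talk:

```ruby
# Figures quoted in the talk for the live-encoding product.
msgs_per_encode_per_sec = 1   # roughly one message per second per encode
encodes_per_server      = 10  # "six to ten live events concurrently"
cluster_size            = 20  # "our current largest cluster is about 20 nodes"

live_cluster_rate = cluster_size * encodes_per_server * msgs_per_encode_per_sec
# around 200 messages per second across the whole cluster, worst case

# The new packager product runs far more concurrent events per server.
packager_events_per_server = 120  # "60 to 100, maybe 120"

# Assuming a same-sized 20-node cluster (an assumption for illustration):
packager_cluster_rate = cluster_size * packager_events_per_server *
                        msgs_per_encode_per_sec
# around 2,400 messages per second, still far below what a RabbitMQ
# broker can sustain
```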
00:34:30.560 anything
00:34:41.760 else? so, did it increase the complexity of our platform? I think what you're saying is, because
00:34:48.359 DRb is included within Ruby we kind of get that for free, and now with using RabbitMQ there's a whole bunch of
00:34:55.119 other things that we need to package up and distribute on our servers, is that kind of the question you're asking?
00:35:01.119 and yeah, that definitely complicates our packaging. we're going to have to now package up not only RabbitMQ
00:35:08.599 itself but all the new gems, EventMachine and amqp, in order to handle this,
00:35:14.280 so there's definitely overhead that we had to deal with there to handle
00:35:19.680 making that switch
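The packaging overhead mentioned here amounts to shipping the broker plus the client-side gems. As a rough sketch (versions omitted, purely illustrative), the Ruby side might look like:

```ruby
# Gemfile sketch of the new client-side dependencies (illustrative only).
source 'https://rubygems.org'

gem 'eventmachine' # reactor loop the amqp gem runs on top of
gem 'amqp'         # AMQP 0.9.1 client used to talk to RabbitMQ
```

The RabbitMQ broker itself is an Erlang application, so it gets packaged and distributed separately from the gems.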
00:35:29.079 well thank you everyone I'm really glad that you guys came and hopefully you learned something or at
00:35:34.839 least discovered something new that you can do with Ruby so thanks again
Explore all talks recorded at RubyConf 2012