00:00:15.679
Thank you very much for being here to listen to me talk. You may have noticed that I have changed the title of my talk
00:00:22.519
slightly, which is a cautionary tale of "don't submit your talk proposal before
00:00:27.679
you actually know what you're going to talk about." I think the moral of the story will hopefully be that everything will
00:00:33.879
work out okay anyway, but you'll have to let me know how it goes. My name is Davey Stevenson; I'm a
00:00:41.399
software engineer for Elemental Technologies in Portland, Oregon. I'll take a little aside here to say that I
00:00:47.960
think that RubyConf has been really amazing so far; it's been a friendly group of
00:00:53.960
people. I am absolutely in love with the Ruby Friends thing; I think it's a great way to help break the ice
00:01:01.440
between programmers who oftentimes are a little bit shy and don't really want to talk to other people, so if
00:01:07.200
anyone wants to be a Ruby friend with me, find me later. So what is this talk all about?
00:01:15.400
One type of talk that I really enjoy, but don't see that many of, is the kind
00:01:22.400
of talk about what companies are actually doing with Ruby
00:01:27.479
and Rails: what problems they are solving, and what they are using this technology for, in a
00:01:33.840
really explicit way. So I'm hoping in this talk to give you a little bit of an inside scoop as to how
00:01:41.439
Elemental uses Ruby. I think we use it in an interesting, somewhat unique way, and I'm doing that by
00:01:49.920
telling you a story about a recent problem that we encountered with some of the technology that we were using: a
00:01:58.759
use case that we were using DRb for. As our product grew, it ended up not
00:02:04.719
really fitting our needs anymore, and so we needed to figure out how to solve this problem and what to replace DRb
00:02:12.400
with in that context. So, Elemental Technologies: we're
00:02:20.480
a startup company in Portland, Oregon. We make video transcoding software, and our
00:02:26.239
big claim to fame is that we use NVIDIA GPUs to parallelize the video
00:02:31.519
transcoding process, which allows us to get much faster video transcoding
00:02:36.720
throughput compared to all of our competitors. The speed of our products is vastly superior to
00:02:43.239
our competitors' products, which has allowed us, even though we are a very small startup, to be very competitive with a lot of the bigger names in the
00:02:50.560
space. We actually sell server appliances; we sell these appliances to
00:02:55.840
large companies to solve their video transcoding needs. These servers are installed,
00:03:02.120
oftentimes in their data centers, and we need to be able to integrate these servers into their workflow
00:03:09.400
environments. So who are some of our customers? HBO uses Elemental servers
00:03:15.480
to transcode all of the content that they have for broadcast delivery,
00:03:21.640
and they transcode that content for online adaptive bitrate consumption
00:03:27.400
to power HBO GO. HBO has a vast library of
00:03:34.400
broadcast television, and that's great if you're trying to transmit this
00:03:39.439
video to cable subscribers, but in the age of Netflix and Hulu they needed to be
00:03:45.959
competitive and be able to transmit their video over the internet as well. This is a big problem for companies,
00:03:51.840
because the range of different devices that their consumers are going to be using to consume this video is very
00:03:57.879
broad, and transcoding is needed to create the video specifically for
00:04:04.439
each type of device. The videos that need to be displayed on an iPhone or an iPad are very different from the video
00:04:10.959
displayed on your computer, or your TV hooked up to HBO GO, or your Android
00:04:17.040
device. So companies like HBO need lots of servers to handle all this transcoding. We also power the Comcast
00:04:24.320
Xfinity application and the ABC News iPad app, among others.
00:04:31.240
Another big win for Elemental was that we helped power the online coverage of the London Olympics on
00:04:38.080
multiple continents and in many, many countries. The BBC used Elemental's
00:04:44.280
gear to live stream 24 channels, 24/7,
00:04:49.520
throughout the entire London Olympic Games. For customers like the BBC, having these live
00:04:56.199
streams up 24/7 is of the utmost importance. We need to make sure that these servers are stable and able to
00:05:02.720
continue transcoding for weeks or months on end. So you may be asking yourself: you
00:05:09.720
know, this is RubyConf and all; video transcoding? I'm pretty sure you guys aren't doing video transcoding using
00:05:15.240
Ruby. And if you thought that, then yes, you are correct: we don't use Ruby to do any video transcoding. However, Ruby has
00:05:22.199
helped Elemental in a lot of ways to grow from a very small startup into a very competitive player in this
00:05:28.840
market. So how has Ruby helped Elemental? We use a Rails application
00:05:35.319
running on each server to handle the user interface of our
00:05:40.479
servers and handle all the transcoding events. This has allowed us to have a best-in-class user
00:05:46.440
interface that's much more usable and much easier to integrate into
00:05:53.120
these workflow environments compared to a lot of our competitors. Ruby allows for a lot of
00:05:58.759
fast-paced development; you can get a lot done really quickly with Ruby. This has allowed us to scale very quickly as a
00:06:04.720
startup, and to develop our products and add the features that we needed in a very short time
00:06:10.479
frame. Not only that, but I'm really happy that I get to use Ruby on a daily basis
00:06:16.520
to solve these really interesting problems. We all love the Ruby language; it's a joy to use,
00:06:22.639
and it helps us attract world-class developers to work on our products. So let's go a little bit deeper
00:06:30.800
into an example server node that we have in our use case. I
00:06:36.560
already mentioned that we use Rails to provide a user interface for these servers; our customers can
00:06:43.000
access this through the HTML interface or use the RESTful API to
00:06:48.680
automate their workflows. We also run a multi-threaded Ruby script as a service daemon on each
00:06:56.639
of our servers. This service daemon is responsible for managing the vast majority of the actual
00:07:03.599
transcoding that happens. In our file-to-file case, it needs to do a lot of load balancing to make sure that it's
00:07:10.319
consuming the maximum amount of resources, and thus getting the maximum throughput, for the encoding
00:07:16.879
processes that are running. For our live streaming products, it needs to
00:07:22.080
be able to manage the resources and handle starting and stopping these live
00:07:27.440
streams, making sure that all the information is flowing appropriately throughout the entire
00:07:34.680
system. So let's step back into our story. In the beginning, we were trying
00:07:39.879
to build a product, and oftentimes the initial requirements and feature sets are what drive the
00:07:45.280
technology decisions. This makes sense, because none of us have crystal balls; none of us know definitively what the future is going to
00:07:51.759
hold or what our products are going to evolve into. We're not sure what
00:07:57.319
new technology is going to be created in the future that may solve a problem better, and we don't necessarily know what our
00:08:03.960
customers' needs are going to be in the future. As we continue developing our
00:08:10.240
products, there are new features and new use cases that we are developing, and oftentimes it's easy to reuse the
00:08:16.280
technology that we're already using to solve these new, similar problems. Oftentimes the problems are very
00:08:22.400
similar; maybe sometimes they aren't quite a perfect match, but even then
00:08:27.599
it's sometimes still useful to use the known technology as opposed to spending the time to learn something
00:08:34.519
new. And sometimes it can be a good business decision to reuse existing
00:08:39.599
technology that's a close enough match, as opposed to spending the time to explore new options. If you're in a
00:08:46.480
fast-paced startup environment, sometimes getting the product out the door and into your customers' hands is more important
00:08:52.760
than making sure that the technology you're using is a perfect match.
00:09:03.640
So, Elemental has been using DRb as a communication method for about
00:09:08.920
three years now. DRb is really easy to use, it is fast, and it is reliable. It's a
00:09:15.480
really easy way to allow two different Ruby processes to communicate with each
00:09:21.640
other. Here's a quick little code example to show the power of
00:09:27.279
what DRb can do. We have this server class, and it exposes two methods: you can get data from the server, which
00:09:34.279
returns a simple array, or you can send data to the server, and the server just prints that data out. Now we're
00:09:41.560
going to create a DRb service on a specific URI, and we're going to pass in
00:09:47.519
this instance of the server class that we've created. So now anyone who's connecting to this DRb service has
00:09:53.880
access to those methods in our server class. And to keep this
00:09:59.560
process running, we're going to call DRb.thread.join, so that this service runs continuously if we're going
00:10:06.640
to run it from the command line. So now we can open up IRB, require drb again,
00:10:12.800
create a new DRbObject to connect to that URI from our server, and now we have an object that
00:10:20.399
can communicate directly with that server object. We can call that get_data method from a completely different Ruby
00:10:26.360
process and get that array back, and we can send data to that server as well: pass in the string "hello", and then over
00:10:32.720
on the service as it's running, it's going to be printing that out in the other process.
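For reference, the example described above looks roughly like this; the class name, port, and return values are reconstructed from the description rather than copied from the actual slides.

    require 'drb/drb'

    # Server process: expose an object with two methods over DRb.
    class Server
      def get_data
        [1, 2, 3]          # returns a simple array
      end

      def send_data(data)
        puts data          # just prints whatever the client sends
      end
    end

    DRb.start_service('druby://localhost:8787', Server.new)
    DRb.thread.join        # keep the process alive so the service stays up

    # Client process (for example, from irb): connect to the same URI and call
    # the server's methods as if they were local.
    require 'drb/drb'
    server = DRbObject.new_with_uri('druby://localhost:8787')
    server.get_data            #=> [1, 2, 3]
    server.send_data('hello')  # "hello" is printed in the server process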
00:10:39.440
So you can use DRb to send data and messages, and you can use it to request data, which can be very powerful: if you
00:10:46.040
know exactly what data you want, you can ask the service to give you a very specific set of
00:10:51.440
information. You can pass entire Ruby objects back and forth using Marshal, which can be incredibly powerful if
00:10:58.600
both Ruby processes have the same concept of those objects.
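As a rough illustration of that point (the struct and values here are invented for the example, not taken from the talk): if both processes load the same class definition, a DRb call can hand back a real local copy of an object rather than a proxy.

    # Loaded by BOTH the server and the client process.
    EncodeStats = Struct.new(:fps, :quality)

    # Server side: because EncodeStats instances are marshalable and the client
    # also knows the EncodeStats class, DRb serializes the return value with
    # Marshal and the client receives a plain local copy.
    class StatsFront
      def current_stats
        EncodeStats.new(59.94, 0.97)
      end
    end

    # Client side:
    #   stats = DRbObject.new_with_uri(uri).current_stats
    #   stats.fps   #=> 59.94 on a local object, not a remote proxy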
00:11:06.040
One thing about DRb is that it is a point-to-point communication method, so you have to know exactly who you're talking to; you open up a connection
00:11:12.120
to one particular service in order to perform that
00:11:21.360
communication. So how does Elemental use DRb? Looking back at our little state
00:11:27.160
diagram from before, we can see that we use DRb to communicate between our Rails user interface and the Ruby
00:11:33.720
service daemon that's running on all these servers. Our first product was a
00:11:40.120
file-to-file transcoding server. This would basically transcode files to multiple formats as fast as possible:
00:11:46.760
take up as many GPU resources as possible and get those files transcoded at a high frame rate. All of those
00:11:54.639
encoding processes are returning a lot of statistics back to the service daemon, very continuously. This
00:12:01.680
is information such as the current frames per second that's being encoded, and certain quality metrics to describe
00:12:07.720
whether the quality of the current encode is good, or whether it's having to reduce the quality
00:12:14.440
for some reason. This is information that we don't necessarily want to store permanently; there's no
00:12:21.160
reason for us to keep a historical record, and we don't need to run statistics
00:12:26.240
on this information later. So we don't really want to store this data in the database; we don't want to waste the
00:12:31.639
cycles to submit that work to the database when we could be using those
00:12:36.720
CPU cycles to transcode files faster. But our customers still want to be able
00:12:43.120
to query that information if they're seeing some issues: they want to see what the current
00:12:48.160
frames per second is, they want to see what some of that data is, and they still want to be able to query that information. So we can have the Rails
00:12:55.440
UI ask the daemon what the current stats are for a particular encode, get
00:13:00.880
that data back, and display it in the UI, and that's a really great use of DRb.
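In Rails terms, that pattern can look something like the sketch below; the controller, URI, and current_stats method are hypothetical names for illustration, not Elemental's actual code.

    require 'drb/drb'

    class EncodeStatsController < ApplicationController
      # Ask the service daemon over DRb for the transient stats of one encode
      # and render them, without ever touching the database.
      def show
        daemon = DRbObject.new_with_uri('druby://localhost:9000')
        render :json => daemon.current_stats(params[:id])
      end
    end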
00:13:07.600
So our next product was our live streaming server, which is
00:13:14.279
kind of a different use case: it's getting live data in, and we have to transcode it in real time to
00:13:21.000
multiple output formats and keep that running as long as possible. In this use case, the UI now needs to actually
00:13:27.880
send some commands to the service daemon: we need to be able to tell it exactly when to start the event, maybe to
00:13:33.760
switch inputs if one of the inputs goes down; maybe our output CDN is down, maybe
00:13:38.800
one CDN is down but some of the other CDNs are up, so we want to tell it to stop encoding certain outputs but keep other
00:13:44.639
outputs up. Now we need a way to communicate from the Rails UI to the
00:13:49.760
daemon and send commands, and DRb is a pretty good fit for this use case.
00:13:55.120
You may say there are other ways to send commands, and that may be right, but in this case using DRb for this met our
00:14:02.399
needs very well; there was no need to try anything
00:14:07.440
else. As our customers had more and more need for transcoding, they needed
00:14:14.320
more and more servers, and they wanted a central way to communicate with all of these servers and manage them,
00:14:20.240
to manage failover of the servers in case components die, because a lot of
00:14:26.160
these people are running encodes for months on end, and we all know that server components
00:14:31.519
die; that happens, and we have to plan for it. So we now have a management node. It's not doing any transcoding
00:14:37.920
itself; it's just managing the state of all these other servers. It's also running a Ruby service script, but instead
00:14:45.079
of managing the live encodes, it's managing these other servers. In order to do that, that
00:14:52.040
service needs to be able to communicate with the services on all the live streaming nodes, so we can use DRb for
00:14:58.000
that too. This use case was a little bit different, though, especially as the needs of our
00:15:05.000
customers changed over time. We need to make sure that those live nodes
00:15:10.720
are up no matter what; it doesn't matter if other components are failing. If the external database server goes
00:15:16.360
down, we don't care; we still want our live streams to go out, and so we need to make sure that our service
00:15:23.000
daemons are still running on all those nodes. However, there's still data being produced that we need to make sure
00:15:29.040
is eventually going to be persisted to the database, so that the customer can tell if something went wrong during
00:15:36.199
this sort of downtime: things like, did one of the CDNs go down and did an alert pop up? Did
00:15:44.680
issues happen with the transcoding where certain outputs were failing for certain periods of time? They want to know
00:15:49.839
that information. So now we have a use case where DRb is not only sending
00:15:56.040
messages, but it has to be ensuring the receipt of those messages, ensuring that the database is getting that information,
00:16:02.720
and that's not something that DRb is really suited for. So now we came to our problem: we
00:16:10.800
have our technology, we're using DRb for communication, and it's not fitting our needs anymore; we have things that we
00:16:16.839
need to solve that DRb can't provide for
00:16:22.240
us. So if you find yourself in this sort of situation, what are you going to do? The first important part, of course, is to
00:16:28.920
define the exact problem that you have. I outlined the three use cases
00:16:34.000
that we have for DRb; two of those use cases are still fine, and DRb still fits our needs there. We need to
00:16:41.480
define the problem as the exact portion of the code that is actually not fitting our needs
00:16:47.680
anymore. Then we also have to think about what our customers' needs are: what do they actually want out of our product,
00:16:53.240
and what is most important to them? That's going to help drive our decision-making process for what our
00:16:59.680
solution is going to be. And most importantly, before embarking
00:17:05.400
on a project like this, you should define what you mean by success beforehand;
00:17:12.039
that's a good way for you to be able to meet your goal later. So in this case, for success, we want
00:17:18.559
to be replacing that inter-daemon communication with something that's much more reliable and robust, that can manage
00:17:25.559
saving messages even when other components are not available to receive those messages, and be able to
00:17:31.919
store those messages, so we can be guaranteed that data is not going to be
00:17:37.919
lost. For us, speed isn't a primary concern. For our
00:17:44.440
users, what they really care about is the reliability, stability, and resiliency of our servers. As far as the Ruby
00:17:52.080
layer goes, if it's working, that's great; what they really care about is those encoding processes continuing to
00:17:59.159
run all the time. Also, when you're embarking
00:18:05.720
on a project like this, it's very useful to limit the scope of the
00:18:12.080
project: replace one single component at a time. Again, the inter-daemon communication layer is what we're going
00:18:18.159
to focus on here. Replacing a single component helps make sure that you're not trying to do too much too fast, and
00:18:26.360
then once you've replaced one portion, you can work on incremental development; maybe then you can find other areas
00:18:32.480
where using the new technology is going to be better than the existing technology, and move on from
00:18:41.720
there. So why would we start looking into a message queue? It became really
00:18:48.840
obvious to me that a message queue is the right solution for this problem, because I noticed that I was beginning
00:18:55.000
to reinvent the wheel. I was starting to develop what was kind of a pseudo message queue around and in the middle of
00:19:01.520
our DRb code, except my message queue code is of course not going to be nearly as stable and robust as the many, many
00:19:08.919
different message queue options that are available to
00:19:15.000
use. The other great thing about using a message queue is that it decouples the producers from the consumers. In this
00:19:22.600
sort of case, where we have multiple live encoding servers and one management server, the live
00:19:30.039
encoding servers don't actually care who receives the messages that they're sending out; they don't need to know
00:19:36.440
who the consumer, the management node, is. And so DRb, with its
00:19:43.200
point-to-point communication, adds this limitation to the producer: the
00:19:48.280
producer has to know exactly who it's communicating with. A message queue also helps reduce the
00:19:56.039
responsibility of the various components. Our service daemon was responsible for a lot of things, and if it's responsible
00:20:03.240
for a lot of things, that means that failures in one area are going to trickle down and cause impacts and
00:20:09.200
unintended consequences in other areas of our code. We want to make sure that
00:20:14.799
failures in other areas of the service daemon aren't going to mean that we're losing data in the messaging
00:20:22.960
section. So we decided to use RabbitMQ as our message broker system. It's an
00:20:28.120
asynchronous message broker; it uses AMQP as its messaging protocol, but it can also use a lot of other
00:20:34.280
messaging protocols, which was very attractive to us: if we do determine that AMQP is not the right fit for us in
00:20:40.159
the future, we would be able to swap that out. Another great benefit of RabbitMQ is that
00:20:45.200
it has high-availability clustering available, so as we move towards
00:20:50.600
wanting to ensure the availability of all of our messages, we can use the multiple servers themselves to enable
00:20:57.919
high-availability
00:21:02.960
clustering. So, a quick overview of AMQP, the Advanced Message Queuing Protocol: this is
00:21:08.440
simply a protocol used to receive, route, queue, and deliver messages. Again, there are other protocols available that
00:21:14.320
do very similar things. One good thing about AMQP is that there are a lot
00:21:19.640
of implementations in multiple other languages, which for us was a really good benefit, because we do have major
00:21:26.279
components written in other languages; our main encoding processes are written in C++. So being able
00:21:34.279
to easily use AMQP in our other areas is going to be a great strength for us as
00:21:39.720
well. So, in order to use AMQP with Ruby, you use the Ruby amqp gem. This uses
00:21:46.440
EventMachine, coming back to my original talk topic, so I wasn't completely out of the woods there. You use the
00:21:53.440
amqp gem to communicate with the RabbitMQ broker that's running on your server,
00:21:58.679
and here's a quick example. You require amqp, and it
00:22:04.000
requires EventMachine to be running, so we call EventMachine
00:22:09.400
run and run everything inside that loop. We create a connection to
00:22:15.480
the RabbitMQ broker and create a channel that we're going to communicate
00:22:21.480
on. We're going to specify the queue that we want our
00:22:26.720
messages to go to. RabbitMQ uses an exchange system, so
00:22:33.240
you actually publish to exchanges, not to the queue specifically. This is actually kind of interesting, because it means that you
00:22:39.600
can define different message routing schemes as necessary: you could
00:22:45.840
have multiple different queues, and the exchange could fan out the message to multiple queues or do other advanced
00:22:52.159
routing. This is just a really simple example where the exchange ties directly to this one queue.
00:22:58.880
Now we're going to use EventMachine to add a periodic timer: every 5 seconds it's going to publish the current time
00:23:04.600
to the exchange. And then we are going to subscribe to that queue, and whenever
00:23:10.760
we receive something, we're going to print it out to the screen. So if you run this, you get a little script, and every 5
00:23:17.000
seconds it'll be printing out the time.
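Reconstructed from that description, the script looks roughly like this; the queue name and connection details are placeholders.

    require 'amqp'

    # The amqp gem is built on EventMachine, so everything runs inside EM.run.
    EventMachine.run do
      connection = AMQP.connect(:host => 'localhost')
      channel    = AMQP::Channel.new(connection)

      # Declare the queue we want messages to end up in. Publishing goes
      # through an exchange; the default exchange routes by queue name, but
      # another exchange could fan a message out to several queues instead.
      queue    = channel.queue('time')
      exchange = channel.default_exchange

      # Every 5 seconds, publish the current time to the exchange...
      EventMachine.add_periodic_timer(5) do
        exchange.publish(Time.now.to_s, :routing_key => queue.name)
      end

      # ...and print every message that shows up on the queue.
      queue.subscribe do |payload|
        puts "Received: #{payload}"
      end
    end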
00:23:22.360
So now we've defined the scope of our
00:23:29.760
solution. Here are the main issues that we want to make sure RabbitMQ is solving for us. We want to
00:23:36.600
make sure that RabbitMQ is going to add reliability to our system. RabbitMQ provides this concept of
00:23:43.559
durable queues: these are queues where, if a message goes into the queue, you're never going to lose that message. It will
00:23:49.520
handle persisting that message to disk and making sure that the message is never lost, even if the RabbitMQ
00:23:54.760
broker goes down. RabbitMQ also ensures the order of
00:23:59.799
the messages it receives, which is really good for us: it makes sure that we're getting a correct state-change
00:24:08.799
message stream from what's happening on our live events. We know that the order in which the
00:24:16.320
management node receives the messages is the correct order of the events that happened on those live nodes.
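With the amqp gem, and continuing from the connection and channel set up in the earlier example, a durable queue looks something like this (the queue and message names are illustrative); note that the message itself also has to be published as persistent to survive a broker restart, a detail the talk glosses over.

    # Durable queue: the queue itself survives a broker restart.
    queue    = channel.queue('state_changes', :durable => true)
    exchange = channel.default_exchange

    # :persistent => true asks the broker to write the message to disk, so a
    # message accepted into the durable queue isn't lost if the broker dies.
    exchange.publish('input_switched', :routing_key => queue.name,
                                       :persistent  => true)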
00:24:21.679
Next, stability: we want to make sure things are more stable, and this is
00:24:27.120
really accomplished by decoupling the messaging from the service daemon
00:24:32.240
into a different component. RabbitMQ is going to handle all of that messaging, decoupling that possible
00:24:39.760
failure case from the service daemon itself. By making sure each of these two different components is
00:24:45.720
focusing on the things that are important to it, a failure in one component is not going to cause unintended consequences and
00:24:52.520
trickle out and affect other areas of our system. And resiliency: we want to make sure
00:24:58.919
our entire cluster is very resilient to errors, and that is going to
00:25:04.559
be accomplished by enabling high availability for the RabbitMQ brokers, so that the messages are shared amongst
00:25:11.600
all the different servers. That way, if one particular server has some sort of issue and crashes, we're not
00:25:17.440
going to be losing that data.
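How a queue is opted into mirroring depends on the RabbitMQ version; older 2.x brokers accept it as a queue argument, while newer releases use broker-side policies instead. A sketch under that assumption, again with the amqp gem:

    # Declare a durable queue that RabbitMQ mirrors across the nodes in the
    # cluster, so another broker can take over if this one crashes.
    # (On RabbitMQ 3.x the equivalent is a broker-side policy, e.g.
    #  rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'.)
    queue = channel.queue('state_changes',
                          :durable   => true,
                          :arguments => { 'x-ha-policy' => 'all' })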
00:25:22.480
And we're still going to be using DRb in some of our use cases. DRb is still the right tool for the job: it is very simple, which I really like;
00:25:28.679
it's very easy to use; the syntax is very Ruby-esque. And not only that, but we
00:25:35.120
still have that use case where we need to request specific data from our service daemon, and for that, DRb is
00:25:40.720
still the right solution for that
00:25:46.960
task. So, a little bit about our final implementation. We have these multiple
00:25:52.919
producers, which are the live streaming nodes, and the single consumer, which is the management node,
00:25:59.039
and even when we expand out to have redundant management nodes, there's still
00:26:04.440
always going to be one consumer. That also means the producers don't actually have to worry about who their single
00:26:09.480
consumer is; the management nodes themselves are going to manage making sure that there is always someone there
00:26:15.720
to consume those messages. It's not something that the live nodes have to worry about. And we do have our two main
00:26:22.640
classes of data. We have the state-change information that we need to be durable; that's things like
00:26:28.919
whether an alert happened on one of those nodes, whether an input change happened, or whether
00:26:35.600
encoding failed for some reason. All these sorts of things we need to make sure are stored and always persisted in
00:26:41.480
the database; those can go along durable queues. And then we have the more transient
00:26:46.520
status information, the information I was telling you about earlier: what's the current frames per second, what are the current quality
00:26:52.919
metrics, the sort of information where we want the real-time values but we don't necessarily need to store the historical
00:26:58.760
record. Those can be handled in a different way in RabbitMQ,
00:27:03.799
in a way that's not going to tax the system.
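One plausible way to express that split with the amqp gem (the queue names and options are an illustration, not necessarily how Elemental configured it):

    # State changes: must survive broker restarts and always reach the database.
    state_queue  = channel.queue('node.state_changes', :durable => true)

    # Transient status (current fps, quality metrics): only the latest values
    # matter, so a non-durable, auto-deleted queue keeps them cheap and out of
    # persistent storage.
    status_queue = channel.queue('node.status',
                                 :durable     => false,
                                 :auto_delete => true)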
00:27:09.120
So here's an overview of what our final implementation is
00:27:14.720
going to look like. We have our main management node, and again this is kind of simplistic; eventually we're
00:27:21.159
going to have redundant management nodes, but the management nodes will deal with that sort of thing. And
00:27:27.159
then the multiple worker nodes, and then we have the clustered message brokers on all of those servers, and RabbitMQ will
00:27:34.279
handle the high-availability clustering of the brokers between all these different
00:27:39.799
servers. We will still have a DRb connection between the UI and the Ruby service on each of those nodes
00:27:46.880
individually, but each of the different components can send messages to the message
00:27:53.480
broker as necessary. That allows us to do incremental development: if we determine that sending messages from the
00:27:59.960
Rails UI is going to be a better solution for some of our use cases, we can do that. Another great benefit
00:28:07.480
is that, in the future, we're hoping to have the communication from the encoder processes themselves also use
00:28:15.480
AMQP and be able to send messages into the message broker
00:28:20.919
from those processes as well, which, again, are C++ applications.
00:28:29.279
So what are some of the results we've seen after doing some of this refactoring? The performance of RabbitMQ
00:28:34.559
has been really impressive to us; the speed and throughput that RabbitMQ provides is more than sufficient for
00:28:43.200
anything that Elemental will need it for in the near future. The broker can handle
00:28:48.320
millions and millions of messages, and we're not sending anywhere near that amount of information, so for us the performance has been
00:28:55.440
great. The decoupling of the communication was something even
00:29:00.519
more satisfying than I had originally expected. We all know that we have code that we want to
00:29:05.640
refactor, and we have an idea of how we're going to refactor that code, and then once that
00:29:11.559
refactoring actually takes place and you see the new structure, sometimes you're like, "man, that was satisfying; this is just a
00:29:18.039
beautiful implementation." The way that this information is now structured very cleanly and separated
00:29:24.159
really helped drive home to us that a message queue was the right solution to this problem for
00:29:31.880
us. We haven't actually released this code yet; it's slated for our next major
00:29:37.279
release, which is going to happen by the end of this year, and I'm looking forward to
00:29:44.159
that, because I think it's going to position our products for a lot of growth. Our customers are asking for a
00:29:49.640
lot more interesting reliability and resiliency features in our clustered systems,
00:29:55.640
and I think RabbitMQ is going to be able to provide a lot of these features for us as we grow.
00:30:02.640
I'd like to give a quick shout-out to my co-worker Matt, who did a lot of the implementation work and also
00:30:09.440
made those great diagrams for me. Thanks, Matt. And I wanted to know if anyone
00:30:14.799
had any questions for
00:30:20.039
me. [Audience question, partly inaudible:] Two questions. First, I'm [inaudible] who actually made [inaudible], and
00:30:30.960
I would love to [inaudible]. The second question is: you
00:30:36.279
mentioned that DRb was
00:30:41.919
not [inaudible]; was it just that it wasn't reliable, or something? Well, part of
00:30:48.399
the daemon-to-daemon thing was that... oh, so he was asking why exactly DRb was not
00:30:54.000
a good fit for the daemon-to-daemon case, and was it the reliability? It was partly
00:30:59.080
the reliability, but one of the main things was actually the point-to-point communication, in that
00:31:05.880
in order to use DRb you have to know who you're actually talking to; the two servers have to
00:31:11.399
know exactly who each other are. And in some of the cases where the live
00:31:17.120
node is sending its information back to the master, for example if we have one master node and a backup
00:31:23.440
master, if the master node goes down then all the live nodes need to know that now
00:31:29.480
they need to be sending all their messages not to the master node but to the backup master. That's information that those servers don't need to know
00:31:35.799
about; they don't need to know who they're sending their information to, they just need to know that the master is going to be getting that information.
00:31:42.080
So that's one reason why DRb wasn't really a good fit for that use case.
00:31:47.200
Additionally, with DRb, when we were using it
00:31:53.840
to send the messages to the master, if the master wasn't up for some reason,
00:31:59.000
then the live node would have to be storing that information locally in memory, because DRb is part
00:32:06.399
of that Ruby process itself. But there you get into a challenging
00:32:11.639
situation where, if the master is down for some reason (let's say we're upgrading the software on those master nodes), now all these live nodes
00:32:18.600
are queuing up all this information that they need to send to the master node once it comes back up.
00:32:24.159
What if the service daemon crashes at that point? Because all that data is in memory, we're going to be losing all
00:32:30.399
that information. So in order to avoid that problem we would have to implement some mechanism
00:32:36.880
to somehow persist that data so that DRb can then re-access it, and
00:32:42.679
that's where you get to the point where the message queue has already implemented that; the message queue handles that case for us, so we don't
00:32:50.279
necessarily want to be coding that up ourselves just so we can keep using DRb. Anything else?
00:33:01.720
[Audience question, inaudible.] What volume of messages are we seeing? Each of our encoding processes
00:33:08.039
usually sends approximately one message a second, and our main live
00:33:14.080
servers can run about six to ten live events
00:33:19.559
concurrently, and then our current largest cluster is about 20 nodes. So
00:33:26.880
that's kind of what we're seeing for our current live encoding servers. However,
00:33:32.639
we have a new product that we're releasing which is just a packager, so that's not
00:33:39.200
actually doing any transcoding; it's just repackaging for a different delivery mechanism, and that server can run many,
00:33:46.039
many more concurrent live packaging events at once;
00:33:52.000
we're seeing something along the lines of it being able to run 60 to 100, maybe 120, and so then you're getting
00:33:57.960
into the case where each node is sending 120 messages a second, plus
00:34:03.240
you're clustering them all together. So that's kind of the scale that we're seeing; it's not really that much, not
00:34:08.440
really that many messages. You can talk to lots of other companies who are dealing with high-volume messaging
00:34:13.639
that blows us out of the water. [Inaudible audience exchange.]
00:34:30.560
Anything
00:34:41.760
else? So, did it increase the complexity of our platform? I think what you're asking is: because
00:34:48.359
DRb is included within Ruby, we kind of get that for free, and now, with using RabbitMQ, there's a whole bunch of
00:34:55.119
other things that we need to package up and distribute on our servers. Is that the question you're asking?
00:35:01.119
Yeah, that definitely complicates our packaging. We're now going to have to package up not only RabbitMQ
00:35:08.599
itself, but all the new gems, EventMachine and amqp, in order to handle this,
00:35:14.280
so there's definitely overhead that we had to deal with there to handle
00:35:19.680
making that switch.
00:35:29.079
Well, thank you, everyone. I'm really glad that you came, and hopefully you learned something, or at
00:35:34.839
least discovered something new that you can do with Ruby. So thanks again.