
Asynchronous Processing for Fun and Profit

Mike Perham • November 01, 2012 • Denver, Colorado • Talk

In his talk "Asynchronous Processing for Fun and Profit" at RubyConf 2012, Mike Perham discusses the advantages and strategies of implementing asynchronous processing in Ruby applications. He emphasizes that effective asynchronous processing is crucial for improving user experience by avoiding unnecessary waiting times and enhancing the reliability of applications. The talk is structured into three parts: the basics of asynchronous processing, essential design principles, and an exploration of Sidekiq, Perham's own asynchronous processing library.

Key points of the talk include:

  • Understanding Asynchronous Processing: Perham highlights the importance of respecting user time by minimizing wait times through asynchronous operations, especially in I/O operations which can significantly slow down processes.

  • Practical Design Tips: He recommends processing any non-essential tasks asynchronously to prevent users from encountering errors and to improve overall application speed. This involves placing messages on a queue to be processed separately by workers.

  • Framework and Library Insights: Perham introduces Sidekiq as an efficient solution for asynchronous processing, built on a multi-threaded architecture in contrast to the single-threaded systems that preceded it. He explains how Sidekiq efficiently executes class methods in a separate process by using simple, stateless messages that contain only action identifiers, not full objects.

  • Best Practices:

    • Use Small Stateless Messages: Messages referring to identifiers rather than entire objects maintain data integrity and prevent stale data issues.
    • Design for Idempotency: Developing tasks that can be safely retried (like canceling an order) helps prevent negative repercussions from failures.
    • Embrace Concurrency: Perham encourages Ruby developers to utilize multi-threading to improve processing speed and efficiency, contrasting it with older, single-threaded models.
  • Real-World Implications: Through anecdotes, Perham shares experiences of implementing Sidekiq, including its significant memory efficiency over traditional methods and its ability to handle large volumes of background jobs without failure.

He concludes with a firm belief that correct design and understanding of asynchronous processing, particularly through practices applied in Sidekiq, can lead to reduced hosting costs and improved application performance. The overarching message is about embracing asynchronous processing as a means not only to improve user experience but also to optimize server resources effectively.

By the end of the session, attendees are encouraged to incorporate these best practices into their applications to optimize performance, reliability, and scalability in their Ruby projects.

Asynchronous Processing for Fun and Profit
Mike Perham • Denver, Colorado • Talk

Date: November 01, 2012
Published: March 19, 2013
Announced: unknown

Learn the gotchas and secrets to successful message processing. The naive approach is to just throw a message on a queue, but there are many different trade-offs which can make your life miserable if not accounted for when designing your application. This session will show you those gotchas and secrets.

RubyConf 2012

00:00:15.679 fun fact there's two other talks here that end with for fun and profit so the lesson learned
00:00:23.119 is no matter how witty you think you are someone else is probably gonna tell the same joke
00:00:30.080 what's that oh see i'm i'm not yeah
00:00:37.600 i'm all about the fun and profit so um hey everybody i'm mike
00:00:45.440 so a little about me my day job is i'm director of engineering at theclimb.com
00:00:51.360 we are based in portland oregon we're an e-commerce vendor we have a refreshingly old-fashioned
00:00:56.480 business model we put stuff in a warehouse we put the inventory online people see it buy it with a credit card and we ship it to
00:01:02.320 them it works out pretty well and we're hiring if that interests you
00:01:07.360 um so during the evenings though i uh i do a lot of ruby open source and
uh so i've written a lot of gems over the years my latest project is called Sidekiq i hope a lot of you have
00:01:20.400 heard of it before if not we're going to be going into it in detail
00:01:26.479 so this talk is broken into three parts i'm going to talk about the basics of async processing why you want to do it
00:01:33.360 how we do it um maybe some design we'll talk about a little bit of design then i'll give you some pro tips that uh
00:01:41.280 you may not think of when you're writing your your rails applications not your application something like that
that can influence uh how well the async processing works for you and then the last third i'm
00:01:55.520 going to go into Sidekiq and dive into its design a little bit and talk about ruby in general
so why do we want to asynchronously process the answer is simple in a word it's
00:02:08.319 the user you respect the user's time
00:02:13.360 by making things work as quickly as possible you don't want to make the user wait
user gets bored he goes away you lose money
00:02:25.760 the other reason is that modern i/o is very slow
00:02:31.920 it's orders of magnitude slower than ram so anywhere we do i/o we can spin it
00:02:39.200 off into the background and save a lot of time and uh and save the the user a lot of
impatience and also i/o is unreliable disks die all the time the network dies all the time so
00:02:53.680 it's a good thing if we can do this in the background in a way that's repeatable where we can
retry if things fail so what do we want to asynchronously process basically anything optional
00:03:06.480 uh for us rails guys that basically means anything that's not required to generate the http response
00:03:12.800 throw it into the background it's safer that way so that the user has less potential for seeing 500 errors
00:03:19.519 and it's also going to be faster so this is the general idea behind
00:03:27.040 asynchronous processing your code is the client you put some sort of message
00:03:32.720 on a queue that you want to be processed on the other side the other end of that queue there's the
00:03:38.239 server which pulls the message off and then hands it to some sort of worker to be executed
00:03:43.680 really simple right so how do we implement this
00:03:52.239 this is the simplest most straightforward ruby i could come up with
00:03:57.760 which is basically here's a method it just asynchronously executes that block
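the method on that slide isn't reproduced in this transcript, but a minimal sketch of the idea (my reconstruction, with the method name `async` assumed) is just a block handed to a thread:

```ruby
# a minimal sketch of in-process async execution: run the given block
# on a background thread (method name "async" is assumed, not Perham's slide)
def async(&block)
  Thread.new(&block)
end

t = async { 21 * 2 }  # the work happens off the calling thread
t.join                # wait for the background work to finish
```

the calling thread can keep going and only join when it needs the result.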
00:04:04.480 and in fact the go language basically does something identical to this it has a
00:04:09.840 keyword go and you just pass it a function and that function happens in the background
and that's exactly how go routines work however there's a problem here with
00:04:21.840 how we want to do things we want to execute this block in another process we want to
00:04:28.880 spin off that work to some other machine possibly and if
00:04:33.919 how many of you were at pat's talk yesterday you you learned all about what a block is a block is a closure right so basically
00:04:41.280 that closure can access any variables outside of that block that are in scope
00:04:47.840 so if we want to execute that block in another process we have to serialize the entire closure
over to that process and ruby can't do that it's just not possible
00:05:03.120 so that's effectively impossible and even if we could that could potentially be hundreds of k
00:05:09.680 or megabytes of data just not very efficient so there needs to be a simpler way of saying
00:05:15.840 execute this block in another process well what if we could what if we could
00:05:21.520 reference a block by name instead of an anonymous block if we just had a name for a block and that
00:05:28.479 that server process knew what that block was we could just call it by name and in fact ruby has something like this
00:05:35.840 you have a named block which accepts arguments does anyone know what that is
00:05:42.000 it's a method you've probably used them before
00:05:47.199 so we can do this instead of async processing an anonymous block we
00:05:54.320 give it an instance the name of the method we want to call and a set of arguments to pass
to that instance now we've got a much simpler problem to solve we need to marshal that instance over the name
00:06:05.840 of the method and marshal the arguments ruby can do that and in effect
00:06:11.840 that's how the rails 4 queuing api is going to work you give it an instance it marshals the instance over to a
00:06:17.759 separate process and then calls a method on it
however the problem with this is you still have a lot of data that you're going to marshal that instance could have a lot of data in it
00:06:28.720 you just don't know also the arguments could be large they could be full objects themselves so
00:06:34.880 there could be a lot of data associated with this is there some way we can simplify this a
00:06:40.080 bit more well yeah there is we could do class methods instead that takes away the instance so now
we're just calling a method on a class and passing in some simple arguments that solves half of the problem now we
00:06:53.199 only need to worry about marshalling the arguments
i just said that so congratulations we've just derived how Resque and Sidekiq work
00:07:08.880 you are essentially calling class methods uh in a separate process and passing it
arguments and so this is an example of
00:07:21.280 Sidekiq how you can spin off a method into the background you just have a
00:07:26.560 class method which takes an argument and then you call dot delay on it and that actually sends the method
00:07:32.080 invocation to the Sidekiq process to be executed in the background pretty simple
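the `.delay` call is Sidekiq's API; what travels over the wire is roughly a class name, a method name, and the arguments. a toy sketch of that round trip (an in-memory array stands in for redis, and the names here are assumptions, not Sidekiq internals):

```ruby
require "json"

QUEUE = []  # stands in for redis

# client side: record the class-method call as a small json message
def enqueue_call(klass, method, *args)
  QUEUE << JSON.generate("class" => klass.name, "method" => method.to_s, "args" => args)
end

class UserMailer
  def self.delayed_welcome(user_id)
    "welcome sent to user #{user_id}"
  end
end

enqueue_call(UserMailer, :delayed_welcome, 42)

# server side: pop the message, look the class up by name, invoke the method
msg = JSON.parse(QUEUE.shift)
result = Object.const_get(msg["class"]).public_send(msg["method"], *msg["args"])
```

the message is a few dozen bytes of json no matter how big the objects behind those ids are.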
Sidekiq has another way of doing it which is through a worker so if you want to encapsulate a
00:07:47.280 particular job within a class you just provide a perform method which performs that job
and then you call a class method that is the Sidekiq api which is called perform
00:07:58.240 async pass it the arguments and it effectively does the same thing as the previous slide
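the worker pattern just described is Sidekiq's real API (a `perform` instance method plus a `perform_async` class method); here is a toy stand-in with the same shape, backed by an in-memory array instead of redis so the whole flow can run anywhere:

```ruby
require "json"

# toy stand-in for Sidekiq's worker pattern (ToyWorker is an assumed name):
# a perform instance method plus a perform_async class method
module ToyWorker
  QUEUE = []

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # client side: push a small json message, never the object itself
    def perform_async(*args)
      ToyWorker::QUEUE << JSON.generate("class" => name, "args" => args)
    end
  end

  # server side: pop each message and execute it on a fresh worker instance
  def self.drain
    results = []
    until QUEUE.empty?
      msg = JSON.parse(QUEUE.shift)
      results << Object.const_get(msg["class"]).new.perform(*msg["args"])
    end
    results
  end
end

class ImageResizer
  include ToyWorker

  def perform(image_id)
    "resized image #{image_id}"
  end
end

ImageResizer.perform_async(7)
```

in real Sidekiq the drain loop is the server process pulling from redis; the shape of the message is the same.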
that actually turns out to be very efficient now instead of serializing possibly
00:08:13.120 megabytes of closure for that anonymous block now we've simplified things so that we
00:08:18.720 can efficiently just pass in the class name and a simple set of arguments and that message is very efficient very
00:08:25.360 small so that brings me to my first pro tip
00:08:31.520 when doing asynchronous processing small stateless messages
00:08:41.279 stateless means that the queue is not where your data belongs your data belongs in your database
that's why it's called a database it's a base for your data your queue contains actions that you
00:08:56.320 want to perform on that data and so your message really should be perform action x
on object y that's it so your messages should
00:09:09.360 contain object identifiers that you then look up using activerecord
00:09:16.160 and so that you can then perform that action on that object you don't want to serialize that entire object onto the queue
because then your object is sitting in two places your user object is sitting in the
00:09:30.399 database but it's also serialized into the queue well what's the problem there the problem is you get
00:09:36.000 stale data this is being executed asynchronously you don't know if it's going to execute
00:09:41.120 five seconds from now or five hours from now in the meantime the user could go and update their data and change things
00:09:48.800 such that when the job is actually executed off of the queue that data could be out of sync
00:09:54.320 if the message just contains the identifier you know that you're always going to be using the latest version of the data
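a sketch of that pro tip, with a hash standing in for the database: the message carries only an id, so the worker always reads the freshest row at execution time:

```ruby
DB = { 1 => { email: "old@example.com" } }  # stands in for your database

message = { "class" => "EmailWorker", "args" => [1] }  # identifier only, no object

# the user updates their record while the job sits on the queue
DB[1][:email] = "new@example.com"

# worker side: look the object up by id when the job finally runs
user = DB[message["args"].first]
# user[:email] is the current value, not whatever was true at enqueue time
```

had the whole user hash been serialized into the message, the worker would have emailed the stale address.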
simple types simple types are easy to read they're cross-platform and that's why Resque and
00:10:09.040 Sidekiq use json because that forces you to use those simple types json does not allow you to
00:10:14.800 serialize a full ruby object by doing that
00:10:22.800 tooling becomes much simpler you can now inspect the contents of your queue
00:10:28.399 visually and see exactly what arguments you're passing into that method
00:10:34.640 instead of passing in a kilobyte or a few kilobytes of binary data you're just passing in a few simple
00:10:40.000 integers now i can pop into rails console and execute that job myself manually
you can see at the top there i'm having to use yaml because i'm actually delaying that class method and
00:10:54.240 technically Sidekiq serializes the entire class over
00:11:00.000 unfortunately because you might have class state but it still is reasonably
00:11:05.920 readable as is tip number two we've talked about the
client and passing messages now on the server side when you're doing your work you want your work to be
00:11:18.240 what's known as idempotent and transactional idempotent is a fancy computer science
00:11:24.320 term it basically means performing the action many times
00:11:31.760 will not harm your system examples of things that are idempotent
00:11:38.079 canceling an order once you've canceled that order you can cancel it as many more times as you want
nothing bad is going to happen if you update your email address you change it to bob at example.com you
00:11:50.639 can continue to update it to bob at example.com it's not going to harm anything
00:11:56.720 things that are not idempotent charging a credit card you ever gone to the place order screen
00:12:02.800 and it says don't double click they're not idempotent that's
00:12:10.000 exactly why they put that warning there is because that developer is lazy and does not know what the word idempotent
00:12:15.279 means sending an email is not idempotent you send an email once
00:12:20.800 gmail will happily deliver a second email to that same user with the exact same contents
so why does this matter this matters because your code has bugs and Sidekiq will retry
00:12:36.079 your code if it fails if it raises an error Sidekiq will retry it with an exponential backoff
00:12:42.480 the reason for that is simple like i said at the very beginning i o is unreliable networks die
00:12:49.360 third-party apis go down all the time so it's nice to have an asynchronous
00:12:56.399 processor that recognizes that and just will retry it for you and so if an error is thrown you see the
email and you don't have to get up and do anything because you know five minutes from now Sidekiq's going to retry it and it's going to succeed
00:13:10.959 so that means that you need to design your jobs with the idea in mind that this job
could be executed many times so if you're doing a non-idempotent action
00:13:22.399 you need to check whether you actually need to perform that action first
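that check can be as simple as a guard clause; a sketch of an idempotent cancel (a hash stands in for the order record):

```ruby
# idempotent cancel: running it again is harmless, so retries are safe
def cancel_order(order)
  return order if order[:state] == :canceled  # already done, nothing to redo
  order[:refunds] += 1                        # the side effect happens at most once
  order[:state] = :canceled
  order
end

order = { state: :placed, refunds: 0 }
cancel_order(order)
cancel_order(order)  # a retry: no second refund is issued
```

with that guard in place, Sidekiq's automatic retries can fire any number of times without double-refunding.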
idempotent and transactional are sort of similar ideas
this is an example of an asynchronous worker if i'm returning an order i issue credit
00:13:44.880 for that order and then i send them an email saying that their credit was issued if i issue the credit successfully but
00:13:50.480 then the email blows up because gmail is down Sidekiq's going to keep retrying
00:13:55.839 that so are they going to continue to get more and more credit until you realize it
00:14:04.160 that's up to you as a developer to make sure that that doesn't happen
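one way to keep that from happening, which Perham returns to in the Q&A, is to guard the credit and spin the email off as its own job. a sketch with assumed names and an in-memory queue:

```ruby
EMAIL_QUEUE = []  # stands in for a real job queue

def issue_credit(order)
  order[:credited] = true
end

# the worker does one guarded transactional thing, then enqueues the rest
def return_order(order)
  issue_credit(order) unless order[:credited]  # idempotent guard: credit once
  # email delivery becomes a separate atomic job with its own retries
  EMAIL_QUEUE << { to: order[:email], body: "your credit was issued" }
end

order = { email: "bob@example.com", credited: false }
return_order(order)
```

if the mail job later fails, only the mail job retries; the credit is never issued twice.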
00:14:11.519 so tip number three embrace concurrency
00:14:21.279 the standard for ruby until recently i think has been the single threaded model
00:14:28.000 i've been trying the last couple of years to get people to embrace threads i know the jruby guys have been trying
00:14:33.360 to get people to embrace threads the rubinius people it seems like everyone but mri is trying to get people
to embrace threads and so people who have been using Resque and delayed job which are pretty
00:14:44.639 much the standards for asynchronous processing these days it's common to have a handful maybe
00:14:51.440 a dozen workers and that's that's not terribly concurrent so concurrency is not something you really need to think about
00:14:57.279 too much you're not going to crush your database by having 10 extra processes 10 extra connections
hitting it but Sidekiq is heavily multi-threaded every process has 25 threads in it
00:15:09.440 by default you can you can up that to a lot more so if you're running four processes you've got 100 workers
00:15:16.240 and you've got potentially 100 threads hitting your database all at once
same thing goes with any third-party apis you're hitting so Sidekiq in fact goes from being a nice
00:15:28.880 polite character to being conan the barbarian
00:15:35.040 and Sidekiq will crush servers it'll take down servers or services that you
00:15:41.360 haven't prepared for we rolled out Sidekiq in production at TheClimb about six months ago and
00:15:48.160 within 48 hours we had taken down a third party api that we were calling
00:15:54.639 because we just we pushed 500 jobs onto the queue and it processed them all at once and
00:16:01.600 that server was not prepared for 500 connections so uh so concurrency is something you
00:16:07.440 need to think about when you have that volume of work that number of workers and that volume of jobs
00:16:12.560 there's a couple things you can do you can use a connection pool so that your processes are limited in
00:16:19.920 the number of connections they can make to that service you can also mix parallel and serial
execution and blend them together if you've got 100 items to process instead
00:16:30.720 of creating 100 jobs you can create 10 jobs which each process 10 items serially
00:16:36.959 and that way you've only got 10 workers hitting that server
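ruby's `each_slice` makes that blend of parallel and serial a one-liner; a sketch:

```ruby
items = (1..100).to_a

# instead of enqueuing 100 jobs, enqueue 10 jobs of 10 items each; every job
# then walks its slice serially, so at most 10 workers hit the service at once
jobs = items.each_slice(10).to_a
```

each element of `jobs` would become the argument list of one enqueued job.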
in my experience over the last year dealing with Sidekiq is that
00:16:50.880 thread safety has not been much of an issue i've found three gems i think that had thread safety
00:16:57.600 issues and we use about 150 gems at TheClimb so it's a
00:17:02.800 very small percentage um Cocaine and Typhoeus were like
00:17:08.160 the last ones that had thread safety fixes put in the basecamp gem is also not thread
00:17:14.799 safe but i was able to fix that myself but for the most part gem maintainers are responsive and if they're not
00:17:20.079 responsive that's a sign you probably shouldn't be using that gem at all anyways
00:17:28.240 so that's a little bit of the theory and some pro tips around asynchronous processing
let's talk about Sidekiq and ruby and the innards
00:17:39.120 so Sidekiq's mantra is simple efficient message processing it's multi-threaded rather than single
00:17:44.320 threaded so it is 10 to 20 times faster than something like delayed job or Resque
00:17:53.360 and that's that's the reality is that modern ruby isn't terribly slow you can do a lot of work if you have the
00:18:00.160 right design it's it's single threading that's killing us
00:18:05.679 as a community
00:18:12.000 to scale a single threaded model you need to create lots of processes that works fine when you're gluing
00:18:18.799 together little command line unix utilities as was ruby's history 10 years ago
00:18:24.720 it does not work fine today in the era of 300 megabyte rails processes
okay this is an example from TheClimb's production server we've got five unicorns running at about
00:18:39.039 250 to 300 megabytes each each one of those can handle one request at a time
that's pathetic at the same time right next to it i've got a Sidekiq process running with 10 workers
00:18:49.840 10 worker threads taking 250 megs of memory so Sidekiq can effectively do
00:18:55.679 twice as much work while being the same size as one of those unicorns
to put some numbers in front of you at 400 megabytes per process if you want 25 requests processed at a time
00:19:12.080 that's 10 gigs of ram that's an ec2 extra large instance that's $480 a month
if on the other hand you use threads one gigabyte of ram let's say that fits onto an ec2 small for 60
00:19:25.200 dollars a month one of the earliest Sidekiq customers
00:19:30.720 was on heroku and had 160 dynos of Resque and he switched to Sidekiq and went down
00:19:36.559 to 10 dynos and a dyno costs $35 a month he's saving
00:19:41.760 $5,000 a month in hosting costs alone just by switching so that's the inefficiency i'm talking
00:19:47.679 about that's why i say Sidekiq is efficient because it saves real money
00:19:53.120 at scale so i was a little bit disappointed yesterday
00:20:00.160 at ruby 2.0 feature set when i think of 2.0 i think of a major version increment that breaks
00:20:07.679 things for the better and i was really hoping to see
00:20:12.720 some concurrency changes take place specifically we need to get rid of the
GIL and we've got terrible gc and all this i think in my opinion
00:20:23.039 and my understanding is a lot of this is just simply hampered by the c extension api that continues to linger
00:20:29.200 on and so the gc and the gil are there to deal with c extensions which is is kind of funny
00:20:36.240 because the c extensions are there to speed up performance
right with a c extension i can take my well-tuned ruby and maybe make it 50% faster
00:20:49.360 but now i can't use seven other cores
so what's bigger fifty percent faster or eight hundred percent faster
00:21:01.600 that's that's the sad state is uh of where we are today i think
so Sidekiq internals on the client we've got your rails process with
00:21:14.240 your app code you call the Sidekiq client api Sidekiq has a concept of middleware there's a pipeline
00:21:20.320 that messages flow through from your code before they're pushed onto redis this
00:21:25.919 in the same way that rack middleware allows us to add features to rails
00:21:31.440 allows you to add features to Sidekiq so there's a client middleware pipeline which is executed
00:21:38.080 and then once it flows through the middleware it's serialized into json and placed into redis
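a toy sketch of such a pipeline (names and structure assumed, not Sidekiq's internals): each middleware gets the message plus the next step, and can modify the message or halt it by not calling onward:

```ruby
require "json"

# run a message through a chain of middlewares; the final step "pushes to redis"
def run_pipeline(middlewares, message, &push)
  chain = middlewares.reverse.reduce(push) do |nxt, mw|
    ->(msg) { mw.call(msg, nxt) }
  end
  chain.call(message)
end

add_timestamp = ->(msg, nxt) { nxt.call(msg.merge("enqueued_at" => 12_345)) }
reject_empty  = ->(msg, nxt) { nxt.call(msg) unless msg["args"].empty? }

pushed = nil
run_pipeline([add_timestamp, reject_empty],
             { "class" => "HardWorker", "args" => [1] }) do |msg|
  pushed = JSON.generate(msg)  # final step: serialize and store
end
```

the same wrapping trick gives rack its middleware stack, which is the analogy Perham draws.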
on the server side the Sidekiq server uses celluloid i know celluloid's
00:21:55.120 gotten a lot of love so far and i'm the first to admit to being
00:22:00.240 another fanboy of it but celluloid really makes it easy to deal with the multithreading here
i've got three actors in Sidekiq there's a fetcher which simply listens
00:22:13.440 on queues and fetches messages off of those queues he hands the messages to the manager and
00:22:20.159 the manager is who keeps track of all the processors so the processors are
00:22:25.600 the guys that do all the heavy lifting and execute all your work so you're going to have
00:22:30.760 25 to 50 processors within your Sidekiq process they each execute the server middleware
00:22:38.320 with your message again that server middleware can can modify the message
00:22:44.080 or simply stop the message from getting processed but once it flows through the middleware
00:22:49.440 pipeline it's passed to a worker whereupon it gets executed and that's that's where
00:22:56.720 your code is called and the work is actually done conceptually it's pretty simple
so there's two versions of Sidekiq there is the base Sidekiq which is free open
00:23:08.960 source LGPL licensed there's also Sidekiq Pro it has more
00:23:14.240 features and it costs money i charge for it and Matz yesterday talked about motivation
00:23:20.159 and he talked about love and altruism love and altruism only gets you so far
00:23:27.120 um i think money is a great motivator too so i like to have customers so
00:23:34.080 as long as the money keeps flowing in i'm going to be happy to support Sidekiq
in terms of feature sets in terms of concurrency models Sidekiq uses threads the other two use processes
00:23:48.240 i like Resque's use of redis i love redis as a data store um one of
the most frequent questions i get is can i use cassandra or
00:24:00.480 riak or mysql or postgres to store my data and my answer is always
00:24:05.760 the same no no i don't want to make a lowest common
00:24:12.000 denominator storage api and abstract all that just so a few people can run on whatever redis is awesome it's got
all sorts of cool data structures and i leverage pretty much all of them for the features in Sidekiq
so to port to something like a database would be very difficult i like the middleware
00:24:33.440 design rather than callbacks so that's what Sidekiq uses and as i said before Sidekiq aims to be
00:24:39.360 simple and efficient part of that simplicity is gathering together all the features i
00:24:46.000 think 90% of us use all the time one of the things that
disappoints me about Resque is i think the feature set is a little bit bare bones out of the box a lot of
00:24:57.120 people how many people here use Resque okay so about half the audience um how many of
00:25:04.880 you don't use add-on gems that add additional functionality to Resque
00:25:11.440 and nobody um there is a lot missing from Resque
that i really missed delayed job has a lot of functionality built into it a scheduler the retry
00:25:23.520 mechanism the web ui the delaying of class
00:25:28.559 methods and that sort of thing i use that stuff every day and that's why in terms of simplicity i
00:25:35.919 build it all in so that it's well designed and all works together well
so the future of Sidekiq i actually just released 2.5
00:25:48.720 which has a nicer web ui i still think it can be cleaned up a little bit but i want to add more functionality to it
00:25:56.080 2.5 also released apis for managing queues and retries so
00:26:01.120 that's there but the big one on the horizon is the rails 4 queue api
00:26:07.279 the day that api is solidified and supported there'll be a Sidekiq release which
00:26:13.200 supports that api also so rest assured Sidekiq will move forward with rails 4
Sidekiq Pro i'm just going to add enterprise features Sidekiq is where functionality for 90%
00:26:27.039 of you goes Sidekiq Pro is where functionality for 30 percent of you might go
00:26:32.480 that's the optional stuff that's the enterprisey type stuff stuff that maybe hobbyists don't necessarily care about
so to wrap up i gave you those three tips small stateless messages idempotent
00:26:46.640 transactional work and embrace that concurrency and then Sidekiq is a different design from
00:26:52.880 i think what ruby has been offering in the past and i hope more people learn it
00:26:58.960 understand it and embrace it and that is it for me thank you
00:27:14.000 any questions right there like yep
00:27:23.120 do you have any plans to like be able to remove a job from the scheduled queue the api allows
00:27:29.120 you to do that now yeah programmatically
so for asynchronous processes you recommend hitting activerecord or your data store
00:27:50.480 to get the associated data yes um what have you seen or what are your
00:27:56.000 with your knowledge of like any sort of performance impact that has um on
00:28:03.440 well typically typically you're passing identifiers over and so that's going to be a lookup by primary key and as long as you've got a
00:28:10.000 buffer pool um everything cached in in your buffer pool that's going to be super fast um so i
00:28:17.360 i wouldn't think that there's going to be much of a performance impact i would be much more worried um about about stale state than i would be
00:28:25.279 about the performance impact i have a real-time problem processes
00:28:33.120 and those those jobs are maybe you know
00:28:39.360 synchronized but i'm i'd
there's nothing that i do that is fiber-averse you can use fibers
00:28:57.919 Sidekiq jobs are executed within a thread and a thread can run as many fibers as it wants so there's
00:29:04.320 no reason to think why you couldn't use fibers with whatever functionality you want shouldn't be a compatibility
issue or anything like that back there yep yes um just like you uh we really
00:29:16.080 like concurrency and we're getting ready to productionize our rails app we use a lot of gems is there a way
00:29:23.440 to know that
is there a way to determine if all my gems are thread safe um i'd say that's pretty much equivalent
00:29:37.440 to the halting problem that is to say it's not really computable no i mean
my experience has been if you're using popular well-supported gems you're not
00:29:49.760 gonna have much of a problem Sidekiq actually did find a rails threading problem
00:29:55.120 in 3.2.8 that is fixed in 3.2.9 um but for the most part it's it's very
00:30:02.480 it's very rare kind of like unit testing you can unit test your code all you want but until
00:30:07.600 you actually click through stuff you don't know it's actually going to work right so
00:30:12.640 you've got to still load test your your your subsystems and only that way are you going to be
00:30:18.720 able to determine if if this thing is reasonably stable yeah i've often found uh when i'm using
both Resque and Sidekiq that i'm breaking a lot of big jobs down into smaller more atomic ones um i just want to ask
00:30:33.360 practically do you have any advice for how to track those so the operation can report back um
00:30:41.360 uh so in other words you've got one process which is creating a thousand jobs and then you want to report back when
all thousand are done that's exactly what Sidekiq Pro does it has the concept of a batch
00:30:54.240 where you create those thousand jobs within a batch and then when the batch is done you can be notified
but again that's one of those pro features that i charge for anybody else yeah
yeah not saying it's been a big problem but you mentioned sending email yeah
00:31:15.840 you could send the same email several times what's your pro tip to avoid that
00:31:24.399 That's a great question. One of my slides, I'll show you real
00:31:30.840 quick, had a comment that showed a safer way to do this. When you're
00:31:37.279 issuing credit and then you actually try to deliver that email, the network could be down, right? And so the
00:31:43.360 email will raise an exception, and you might issue them credit over and over and over. What is safer is to spin off that email
00:31:49.919 delivery as its own separate job, so you're doing that email delivery as an atomic job where that's all that
00:31:56.480 job does. If it fails, no email was sent; if it succeeds, you know the email was sent.
00:32:01.840 And so by doing that, you take away a lot of the risk. Yeah?
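That split, issuing the credit and then enqueuing the delivery as its own job rather than sending inline, can be sketched like this. A plain array stands in for Redis/Sidekiq here, and all class and method names are hypothetical:

```ruby
# Split "issue credit" and "send email" into separate atomic jobs, so a
# failed, retried email delivery never re-issues the credit.
QUEUE = []  # in-memory stand-in for the Redis-backed job queue

class CreditWorker
  def perform(user_id, amount)
    Ledger.issue_credit(user_id, amount)  # side effect 1: runs exactly once here
    QUEUE << [EmailWorker, [user_id]]     # enqueue delivery instead of sending inline
  end
end

class EmailWorker
  def perform(user_id)
    Mailer.deliver_credit_notice(user_id) # side effect 2: retryable on its own
  end
end

# Stand-in collaborators so the sketch runs:
module Ledger
  def self.issue_credit(user_id, amount) = (@credits ||= []) << [user_id, amount]
  def self.credits = @credits || []
end

module Mailer
  def self.deliver_credit_notice(user_id) = (@sent ||= []) << user_id
  def self.sent = @sent || []
end

CreditWorker.new.perform(42, 10_00)
# Drain the queue the way a separate worker process would:
klass, args = QUEUE.shift
klass.new.perform(*args)
```

If `EmailWorker#perform` raises, the queue retries only the delivery job; the credit issued in `CreditWorker` is never re-run.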
00:32:08.480 Yeah, and I agree it takes away a lot of the risk, and I would do it that way, but it doesn't eliminate the risk, because
00:32:15.039 you can send an email through an SMTP connection and then botch the hangup,
00:32:22.399 and you decide, oh no, I didn't manage to do that, whereas the receiving side says, oh yeah, I've received that. Yeah,
00:32:29.120 you know, at that point all you can really do is store some state saying, yes, I sent this email. That's really
00:32:36.080 all I can think of you can do offhand: just mark a flag that says this was done.
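The flag he mentions is the standard dedup guard on top of at-least-once delivery: persist a "sent" marker and make a retried job a no-op. A minimal in-memory sketch, with hypothetical names:

```ruby
# "Mark a flag" guard: record that this email was handled, so a retried
# job skips the send. The hash stands in for a DB column or Redis SETNX.
SENT_FLAGS = {}
DELIVERIES = []  # stand-in for actual SMTP sends

def deliver_once(email_id)
  return :skipped if SENT_FLAGS[email_id]  # a retry becomes a no-op
  SENT_FLAGS[email_id] = true              # flag first: a crash between flag
  DELIVERIES << email_id                   # and send loses the email; flag
  :sent                                    # after send risks a duplicate
end

p deliver_once(7)  # => :sent
p deliver_once(7)  # => :skipped
```

Note the ordering trade-off in the comments: neither order is fully safe, because as the questioner points out, SMTP gives you no way to make the flag write and the send one atomic step.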
00:32:42.159 Any other questions? Yeah, back there.
00:32:47.919 I'm sorry? [inaudible question]
00:32:56.960 Right, someone actually asked for that feature and I started implementing it, but there's an
00:33:02.399 implementation detail that I have not worked out how to deal with yet, so currently the answer is no.
00:33:12.000 Yeah. [inaudible question]
00:33:23.360 That's not correct, right? You still might be able to crank some more, reduce process size.
00:33:31.840 Do you mean Sidekiq? Okay.
00:33:37.440 I did do some basic performance testing on JRuby, to test the number of client
00:33:45.279 messages I could push onto a queue in JRuby versus 1.9.3. JRuby was something
00:33:50.399 like twice as fast; you could push about twice as many messages. So
00:33:55.679 the performance is definitely there, but unfortunately I haven't done it
00:34:01.039 on the multi-threaded side, no. It's so hard to get a real-world benchmark;
00:34:06.799 you know, there's no such thing as a micro-benchmark for a background worker system, unfortunately. Way in the back?
00:34:15.919 [inaudible question]
00:34:56.240 Well, Rails is not thread-averse, I should say; it's
00:35:01.520 not Rails' job to create the threads and execute things. That's your application server.
00:35:07.359 So things like Unicorn versus Rainbows: Unicorn is single-threaded; with Rainbows you can choose to use a thread
00:35:13.839 pool to execute Rails. If you're using JRuby, you can use
00:35:19.200 Trinidad and it'll spin up a thread pool to handle requests. And again, Rails doesn't care about that;
00:35:25.280 all Rails knows is it's handling a request on this thread.
00:35:31.119 So yeah, the issue is that you do have some trade-offs there. There are some developer happiness
00:35:36.320 trade-offs. JRuby is super fast in production, and I think
00:35:41.920 it's a great project, but there still remain some developer happiness problems. You know, they've got
00:35:46.960 startup time issues, as always, and that's just due to the JVM;
00:35:52.640 there's not much they can do about that. I'm sure they've done as much as they can to minimize it.
00:35:59.760 But at the end of the day, thread safety also requires you to turn off
00:36:05.200 autoloading, so you don't get the benefit of hitting Ctrl-R and reloading your
00:36:11.760 browser on every request. So, you know, it's a trade-off; people
00:36:17.920 just need to be aware of those trade-offs and make their own decision. Yeah? Sidekiq Pro, it's about five
00:36:26.000 hundred dollars per month, I think? That makes sense for a business. It's not per month, just a flat price. All right. But is there
00:36:33.040 also a trial for that? Or refunds?
00:36:38.640 No, no trials, no refunds. You know, I'm selling a
00:36:44.320 product, and I'm giving people the opportunity to support me in my efforts and my open source, and if
00:36:52.160 people choose to avail themselves of that, great, I'm going to do my best to support you. Sidekiq is still there: it's free,
00:36:57.839 it's open source, it's LGPL. If you're happy with that and you just want to use that, that's perfectly fine.
00:37:12.240 So the retry is almost identical to Delayed Job. Delayed Job has the same sort of idea; in
00:37:18.720 fact, I stole the exact formula they use to calculate how often to retry. It basically retries with an
00:37:24.480 exponential backoff: if it blows up, it'll retry 15 seconds from now, and with every additional iteration
00:37:32.079 it lengthens the time until the next retry. It retries for about three weeks,
00:37:37.760 or 25 total retries. And what's great about that system is
00:37:42.800 that if it blows up, and I assume everyone's got Airbrake or Exceptional
00:37:48.240 or something integrated into their Rails app, I hope you do at least, you get an email saying that there
00:37:54.240 was a bug and your worker blew up. You just go in, you fix the bug, you push a new release, and
00:37:59.520 Sidekiq will retry it eventually. You don't care when, right? You don't have any state in that message, because you
00:38:05.839 followed my pro tips, so the problem just sort of goes away without having to rerun that job by hand.
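The backoff schedule he describes (first retry about 15 seconds out, exponential growth, roughly three weeks across 25 retries) works out like this. Sidekiq's actual formula also adds a small random jitter per retry, omitted here:

```ruby
# Exponential backoff as described in the talk: the delay before retry
# number `count` grows polynomially, starting at 15 seconds.
delays = (0...25).map { |count| count**4 + 15 }

puts delays.first(4).inspect  # => [15, 16, 31, 96]

total_days = delays.sum / 86_400.0
puts total_days.round(1)      # => 20.4, i.e. about three weeks
```

The fourth-power growth means early retries are nearly immediate (good for transient network blips), while later ones are hours apart, leaving plenty of time to ship a fix before the retries are exhausted.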
00:38:15.359 Any other questions? All right, thank you.