
Abstracting Features Into Custom Reverse Proxies

Nick Muerdter • November 01, 2012 • Denver, Colorado • Talk

The video, titled "Abstracting Features Into Custom Reverse Proxies" by Nick Muerdter, discusses innovative ways to utilize custom reverse proxies to standardize functionality across diverse applications and programming languages. Muerdter, speaking at RubyConf 2012, outlines the concept of reverse proxies and how they can be leveraged to tackle challenges arising from organizational silos and varied technology stacks.

Key Points Discussed in the Video:

  • Definition of Reverse Proxy: Muerdter explains that reverse proxies serve as an intermediary between users and backend servers, managing tasks such as static file serving and slow client handling.
  • Diverse Development Needs: At NREL, where Muerdter works, different groups were developing web services using various programming languages. This diversity led to a need for unification and standardization in exposing these services to end users.
  • Implementation of Custom Reverse Proxies: Instead of modifying each API individually for features like authentication, request throttling, and analytics, Muerdter advocates for implementing these features at the reverse proxy level, allowing existing APIs to remain untouched. This method leads to:
    • A single entry point for users to access multiple APIs, simplifying the user experience.
    • Less duplicated implementation code and easier collaboration among developers, enhancing efficiency and code manageability.
  • Scalability and Flexibility: Reverse proxies can act as load balancers, improving scalability. Muerdter highlights the flexibility of using Ruby and the EventMachine library to implement custom features efficiently.
  • Federal Government Strategy on APIs: The talk also connects the benefits of reverse proxies to a broader effort by the U.S. government to standardize API access, making services more user-friendly and easier for agencies to manage.

Conclusion and Takeaways:

  • Reverse proxies present an effective architectural solution for dealing with diverse backend services without necessitating major changes to existing systems.
  • Custom proxies can implement a range of common features across various languages and applications, ultimately enhancing both developer and user experiences.
  • Muerdter encourages viewers to rethink their application architecture, considering how reverse proxies can simplify the complexities of multiple integrations.

The video encapsulates the potential of using custom reverse proxies to abstract and standardize functionalities, ultimately aiming to improve both user and developer satisfaction across various applications.

Published: March 19, 2013

or Making Better Lemonade From Chaos

Life isn't always simple. We often have to deal with a mishmash of applications, languages, and servers. How can we begin to standardize functionality across this chaos? Custom reverse proxies to the rescue! Using Ruby and EventMachine, learn how you can abstract high-level features and functionality into fast reverse proxies that can improve scalability, save time, and make the world happy.

See how we've applied this across a diverse set of web service APIs to standardize the implementation of authentication, request throttling, analytics, and more. See how this can save development time, eliminate code duplication, make your team happy, make the public happy, and make you a hero. See how this can be applied to any TCP-based application for a wide-variety of use cases. Still think your situation is complicated? Learn about the U.S. Government's plans to standardize API access across the entire federal government. With some reverse proxy magic, this isn't quite as difficult or as foolhardy as it may first sound. It also comes with some nice benefits for both the public audience and government developers.

RubyConf 2012

00:00:14.389 thumbs up excellent okay well it's about time and it looks like it's a pretty packed house so uh
00:00:20.779 Wow hopefully I don't disappoint all of you going after Matz is pretty intimidating but uh I guess I'll launch
00:00:28.019 right into it since it's about time to start so I'm here to talk about
00:00:33.140 abstracting features into custom reverse proxies aka making better lemonade from
00:00:38.399 chaos so what am I actually talking about I don't know if that title makes
00:00:43.800 sense to anybody else but what I'm here to talk about are reverse proxies to start with so just to make sure
00:00:49.500 everybody's sort of on the same page with knowledge and you know familiarity with everything I first just want to lay
00:00:56.129 out some basics of you know what is a reverse proxy in case you're not familiar with what a reverse proxy is basically a reverse proxy is a server
00:01:02.100 that sits in front of your other servers inside your internal network that sort of does stuff it does something in
00:01:10.260 between your main server and the end users on the internet so what kind of
00:01:16.890 stuff does that reverse proxy do a lot of you in the Ruby community might
00:01:22.170 already be familiar with reverse proxies you might have already used one a lot of you might use Unicorn to deploy your
00:01:27.750 apps and in that case you oftentimes put nginx in front of unicorn and that acts
00:01:33.150 as a reverse proxy so in that case nginx's role is to sort of serve static files and to deal with slow
00:01:39.329 clients so there that's the role of the reverse proxy but what I'm here to talk
00:01:44.369 about is sort of taking that reverse proxy layer and doing other fun things inside of it you can sort of implement
00:01:51.000 your own layer there in Ruby you can implement custom features you can use
00:01:56.130 EventMachine to do this to do highly scalable reverse proxies and that's a lot of fun so that sort of is the
00:02:04.110 outline of reverse proxies but you might still be confused as to what this is actually all about
00:02:10.800 so first I'm sort of gonna go into the why of why
00:02:15.960 we've sort of built these reverse proxies and why they might be suitable for certain use cases and then also get into
00:02:21.870 a little bit of how you would actually implement this kind of thing so the easiest way to get started is to start
00:02:28.020 with a story and that story is sort of how we're using custom reverse proxies
00:02:33.540 and I think it's perhaps a story that you know some of you might be able to relate to in some way so several years
00:02:40.620 ago we started building a lot of web services and we wanted to expose all of
00:02:48.630 our web services to the world but another part of this story is silos and I'm not talking agricultural silos with
00:02:55.410 delicious coffee or sugar I'm talking about organizational silos so I work at
00:03:00.930 NREL the National Renewable Energy Lab it's just down the road in Golden so I'm local yeah renewables
00:03:14.950 so NREL is a big company it's uh I think I don't know what the latest count is
00:03:20.150 it's grown a lot it might be around 2,000 employees something like that not all of us are developers there's a lot
00:03:25.520 of very smart scientists they're doing very cool research on renewable energy stuff but we do have a lot of sort of
00:03:31.760 separate development groups within our organization so you know our group that
00:03:36.770 I work in you know sort of has historically done vehicle stuff while other groups maybe do stuff related to
00:03:42.860 buildings and another group does stuff related to solar so we sort of have these different departments that all
00:03:48.980 sort of work on different things but we were all sort of wanting to build web services at the same time and sort of present those to the public in a
00:03:56.090 standard way and another way to look at this is because we sort of have this spread of groups and different you know
00:04:04.160 different development groups throughout the lab you know our group happens to do Ruby stuff we have some other groups
00:04:10.250 that do Python stuff we have other others do Java PHP etc so there's a big
00:04:15.260 diversity there so even if you're not working in big organization you might be able to relate to this from the
00:04:20.750 perspective of you might have legacy apps that you're you you deal with you might have you know a lot of there might
00:04:27.710 be different languages in your stack so there can be a lot of aspects to your stack even in a smaller company but from
00:04:33.979 our perspective we were dealing with sort of different departments and sort of we wanted to make all of our Web
00:04:39.620 Services sort of come together and so
00:04:45.169 and we wanted to present those to users as sort of a single offering and really
00:04:50.990 users don't care about this kind of thing I mean they don't care that our group happens to do transportation stuff
00:04:57.200 and another group happens to do solar stuff they're just interested in hey what does NREL have to offer as far as
00:05:03.290 web services go and how can I most easily find them so what we really wanted out of this platform that we were
00:05:09.830 building to expose our web services is we wanted one entry point for users to find all of our APIs so we sort of wanted
00:05:16.280 to bring together all these api's that were being developed in separate groups in different ways and bring them
00:05:22.880 together and we wanted to make it easy for those users to access the APIs
00:05:29.380 we basically wanted to have the user sign up for one account and then they can access all of our APIs so they don't
00:05:35.419 have to get an account from our transportation group to access the Ruby services and the solar group to access
00:05:41.300 the Python services and so on and then from our perspective on the development
00:05:46.789 side of you know we have all these different groups building web services we we wanted sort of API key based
00:05:53.509 management to access all of our APIs across the board we sort of decided that's okay for what we're doing we were
00:06:00.229 doing pretty public stuff we want a rate limiting that was something that a lot of our groups hadn't really explored or
00:06:06.470 implemented within their individual APIs but we wanted that there was a
00:06:13.009 need across the lab and we also wanted analytics again another thing that you know other groups hadn't really tackled
00:06:19.039 and it was something that we really thought we could you know standardize and there would be benefits to doing
00:06:24.229 that so what we didn't want out of this were requiring changes to all of these
00:06:29.240 different groups you know it's a it's a big uphill battle to go into you know I mean the one option is sort of
00:06:35.030 standardizing it for all of our APIs and saying it must be written in Ruby which I wouldn't mind but other people
00:06:40.639 would and so but we didn't want to have that mentality of saying you have to do
00:06:46.130 it this way if you want to be part of our cool club of APIs it's a big battle to make that change
00:06:52.280 within different groups and for historical reasons you know there are the separate groups we're trying to do
00:06:58.610 better to work closer together but I'm sure you guys can empathize hopefully with sort of those big organizational
00:07:04.430 structures or just again even if you're not in a big organization sort of dealing with legacy apps and
00:07:09.919 sort of this diversity even within your codebase and it's also time consuming to have to go back and touch all these
00:07:16.219 other apps so that's where custom reverse proxies come in to the rescue and so
00:07:22.460 what does this look like so if we look at our diagram again it's at that same
00:07:28.940 stack at that level of the reverse proxy we were able to implement custom features like authentication of rate
00:07:36.380 limiting analytics those are the things we decided we needed across all APIs and then they can be shared among
00:07:43.050 all the back-end servers so it sort of just slips in there without having to make any changes on any of the existing
00:07:49.560 APIs that were already built and it implements that common functionality the
00:07:54.870 existing APIs don't have to change they can still exist wherever they wanted to exist and the proxy is just agnostic to
00:08:02.310 what sort of back-end technology is happening but it sort of provides that unification that single entry point to
00:08:09.330 all of our APIs that makes it a lot easier to apply a standard authentication scheme across all of our
00:08:15.000 APIs do rate limiting across our APIs and analytics so that sort of is
00:08:24.330 the basics of what we've done so how did this really help us did this really make better lemonade from chaos so from the
00:08:31.710 user perspective I think it did it helped our users we were able to build one website where users can go to and
00:08:38.760 find all of our web services and they can they only have to sign up for one
00:08:44.159 API key and they can get access to all of our APIs and so they're sort of shielded from sort of the internal you
00:08:50.820 know they don't need to know what department does what and you know we have different web presences
00:08:56.220 throughout the internet on various gov domains so they don't have to know where to find stuff there's just a single
00:09:02.190 website for renewable energy APIs that they can go to and start to find this stuff and it's easy for them to just
00:09:07.709 dive in and start using any of them no matter what all is happening on the backend and for our developers the real
00:09:16.110 advantage is that for old APIs they have to do absolutely nothing you know so we had an existing suite of APIs
00:09:22.920 already out there but they were all sort of all over the place but they did absolutely nothing but just sort of by
00:09:28.680 existing we were able to put them behind this reverse proxy and then we were immediately able to start layering authentication rate limiting and
00:09:35.250 analytics on top of it and the same goes for new APIs when somebody is building
00:09:42.120 a new API those are just things they don't have to worry about they just sort of assume that all of that is taken care
00:09:47.279 of when they're building a new API and it's just outside the scope of it they just assume if I'm being accessed
00:09:53.829 the user is fully authenticated they haven't exceeded rate limits and so on so that's it I think
00:10:01.959 it's been also advantageous for us just from a development perspective speeding up and not having to re implement those
00:10:07.360 same details and so forth so yeah so it's led to reduced implementation code
00:10:15.069 because individual APIs don't sort of have to implement
00:10:20.889 the same sort of logic over and over again and you know there are definitely ways you can abstract this and there's definitely ways you can you know reuse
00:10:27.910 code in a clean way but it just reduces the need to do any of that there's
00:10:33.850 really no code involved to implement these features at the individual API level there is code obviously at our
00:10:39.850 custom reverse proxy level but that's a little easier to maintain and the other nice
00:10:44.860 thing is that standardization is enforced across the board you know because we could abstract this into some
00:10:50.829 sort of library and things could reuse it but you know we don't run the risk of somebody messing up authentication
00:10:57.639 within their individual API by putting this sort of at a higher level
00:11:02.949 it sort of enforces that that's going to happen so the other advantages is that
00:11:10.509 because of how this operates at the layer that it operates any new features we add to this reverse proxy layer
00:11:16.149 benefit everybody so you know whether or not they're Python services Ruby services PHP services Java services this
00:11:24.009 architecture benefits everyone when we decide to implement new functionality obviously not all functionality is
00:11:29.769 suitable for this kind of thing but it could be a powerful mechanism for certain types of functionality that can
00:11:35.649 be layered like this and another thing is that just reverse proxies in general are a nice scaling mechanism you know a
00:11:42.759 lot of times they're just used you use reverse proxies as load balancers so sort of having this in place and sort of
00:11:49.569 getting everybody on board with this architecture of having reverse proxies upfront allows us a lot more
00:11:55.420 flexibility on the backend to sort of scale things independently so that's
00:12:00.579 sort of the basics of what we did on to how we actually built these things
00:12:06.269 so you know so yeah how would you actually take Ruby and do some custom
00:12:13.050 magic stuff at that layer and do it fast and efficiently so currently we're using
00:12:19.079 EM-Proxy it's a nice EventMachine proxy library and I certainly can't take
00:12:24.959 credit for any of this stuff I'm about to show code wise we're just users of it but it's open source it's nice it's easy
00:12:31.970 and it's out there so EM-Proxy is Ruby and EventMachine and if you aren't familiar with EventMachine it's
00:12:38.509 just sort of an event-based system for writing Ruby code in an evented
00:12:45.420 way so if you've heard all the stuff about nodejs it's similar in architecture to that but Ruby so it has
00:12:53.699 some nice advantages of being blazing fast it's also flexible and it's low-level that low-level aspect has its
00:13:00.149 pros and its cons but I'll get into that in a bit so at a very basic example
00:13:06.089 this is sort of what an EM-Proxy looks like so it's pretty basic but you sort
00:13:11.790 of have you set up a server so this when you run this script you basically start up a server in this case it would be
00:13:18.360 running on your current server listening on all the IPs on port 80 and then in here we say we're going to proxy
00:13:25.079 to the same server 127.0.0.1 on port 81 and really that's all you need to do to
00:13:31.529 do sort of a transparent proxy but the real power of this is these callbacks that you can do so you have
00:13:38.250 on_data on_response on_finish and I think there's even one other one but basically it gives you a lot of flexibility as far
00:13:44.009 as intercepting of chunks of data as they stream through the proxy and doing
00:13:49.199 stuff with that so you can do something with as the data for the request comes
00:13:54.389 in and then you can also do something to the response as it goes out and then you can also do clean up stuff on finish so
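The slide code itself isn't visible in the transcript, but the transparent proxy being described can be reconstructed following the conventions in the em-proxy gem's README; treat this as a sketch (it needs the em-proxy gem installed, and binding port 80 typically requires elevated privileges):

```ruby
require "em-proxy"

# Listen on all interfaces on port 80 and forward everything,
# untouched, to a backend on the same machine at port 81.
Proxy.start(:host => "0.0.0.0", :port => 80) do |conn|
  conn.server :backend, :host => "127.0.0.1", :port => 81

  conn.on_data do |data|                # request chunks from the client
    data                                # return them unmodified
  end

  conn.on_response do |backend, resp|   # response chunks from the backend
    resp                                # stream them straight back
  end

  conn.on_finish do |backend|           # backend closed its connection
    unbind                              # so close the client side too
  end
end
```

Whatever the `on_data` block returns is what gets sent to the backend, and likewise `on_response` for the client, which is what makes the interception examples that follow possible.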
00:14:02.370 as a quick example of what this might look like this is an on_data callback
00:14:08.100 where I'm modifying the user agent so the user agent comes in on the request
00:14:13.709 and basically I'm just doing a search and replace any existing user agent with the user
00:14:19.930 agent of em-proxy and you'll note that I'm doing a search and replace but again
00:14:25.779 you're dealing with sort of chunks of data you're dealing with sort of the raw HTTP at the raw HTTP level here so you
00:14:33.610 have to be a little careful but you'll see here that I'm actually searching for User-Agent and then two newline
00:14:41.769 characters that is how the HTTP headers work see there's some things you have to
00:14:47.290 be aware of it's not quite all simple but the other thing to note is that
00:14:52.420 you're dealing with chunks of data in this case so you're sort of you're getting a stream of data as it comes through in chunks so you can't always
00:14:58.420 assume that you have like say the full request in here so you can't just do a search and replace and assume you have
00:15:03.879 all the data you sort of have to deal with there's other things and I'll get in a bit of that as far as buffering and
00:15:10.420 things if you need to have the full request so that's all well and good but you could do that kind of thing with you
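The user-agent rewrite described above can be exercised as a plain string transformation on a raw HTTP chunk. This is a hypothetical helper, not the speaker's exact code, and it assumes the whole header block happens to arrive in one chunk, which, as just noted, a real proxy can't rely on:

```ruby
# Swap whatever User-Agent header is present for our own.
# HTTP header lines end in "\r\n", which the pattern anchors on.
def rewrite_user_agent(chunk, agent = "em-proxy")
  chunk.sub(/^User-Agent:[^\r\n]*\r\n/i, "User-Agent: #{agent}\r\n")
end

raw = "GET / HTTP/1.1\r\n" \
      "Host: example.com\r\n" \
      "User-Agent: curl/7.64.1\r\n" \
      "Accept: */*\r\n\r\n"

rewritten = rewrite_user_agent(raw)
```

Inside an `on_data` callback you'd return the rewritten chunk so the modified request is what reaches the backend.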
00:15:16.540 know any sort of reverse proxy or most of them to do sort of a change the user
00:15:21.819 agent that's pretty basic change a header or do something like that at the reverse proxy layer here's something that you
00:15:28.300 know now because this was written in Ruby you can start to tap into all these Ruby libraries so that's the real
00:15:34.629 advantage of sort of this approach and implementing custom stuff so here's an example of I'm setting up I'm connecting
00:15:42.579 to Redis and every time I get a chunk of data I'm incrementing the IP address I
00:15:49.059 have a counter for that IP address in Redis so again this is a little taste of what you can start to do because it is
00:15:55.569 Ruby you can just sort of start to write things and access your Ruby libraries and it can be a lot of fun and as I
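The Redis counter idea can be modeled without a Redis server; here a plain Hash stands in for the redis gem's INCR command, and the key naming is purely illustrative:

```ruby
# Per-client request counter, as in the on_data callback described
# above. A Hash stands in for Redis; with the redis gem this would
# be redis.incr("hits:#{ip}") inside the callback.
counters = Hash.new(0)

count_hit = lambda do |ip|
  counters["hits:#{ip}"] += 1
end

count_hit.call("203.0.113.7")
count_hit.call("203.0.113.7")
count_hit.call("198.51.100.2")
```

The point is only that the callback body is ordinary Ruby, so any Ruby client library can be pulled in at the proxy layer.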
00:16:04.420 mentioned you're sort of dealing at a low level here you know you basically have raw HTTP strings so if you want a
00:16:12.939 higher level interface into HTTP it's sort of is up to you to do that yourself and there's libraries to do that and
00:16:19.269 here's a very quick example of sort of as you get data on data so chunks of
00:16:25.929 data are coming in you pass that to this HTTP parser library and then once it's
00:16:32.190 determined that all the headers have been read and there's other callbacks on the HTTP parser library you can then once the
00:16:38.670 headers are completely read in then I am accessing that user agent header as a
00:16:44.279 ruby hash so this is sort of if you want that higher-level interface into the
00:16:49.769 HTTP requests you sort of have to do that yourself but there are libraries to
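A simplified, hand-rolled version of what such an HTTP parser gives you (the talk uses a parser library; this sketch only handles the happy path, splitting a fully buffered header block into a Ruby hash):

```ruby
# Split a raw request into its request line and a hash of headers,
# keyed by lowercased header name. Returns nil until the blank line
# that terminates the headers has arrived in the buffer.
def parse_request_head(buffer)
  head, sep, _body = buffer.partition("\r\n\r\n")
  return nil if sep.empty?            # headers not complete yet
  request_line, *header_lines = head.split("\r\n")
  headers = header_lines.each_with_object({}) do |line, h|
    name, value = line.split(":", 2)
    h[name.strip.downcase] = value.to_s.strip
  end
  [request_line, headers]
end

raw = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: curl/7.64.1\r\n\r\n"
request_line, headers = parse_request_head(raw)
```

A real parser also handles chunked bodies, header continuations, and malformed input, which is why using a library is the sane choice at this layer.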
00:16:55.230 do that kind of stuff so you might be asking why would I do this you know this sounds sort of like a pain to be dealing
00:17:02.550 with you know things at this low of level and having having to you know deal
00:17:09.390 with stuff like parsing HTTP yourself you know where a lot of us might do web development a lot of us might be used to
00:17:15.089 you know nice high-level frameworks like rails and Sinatra and things like that and so there's a few reasons why you
00:17:21.750 would go down this path a big reason is transparency at this reverse proxy level
00:17:29.250 at this proxy level implementing it this way you are dealing with the raw HTTP
00:17:34.260 request but that gives you a lot more flexibility to sort of pass that request on to the back end in a completely
00:17:40.950 transparent manner so it's not apparent that that there's something in there in
00:17:46.080 between doing something because if you if you try to do this with something like rack or rails or Sinatra by that
00:17:55.200 point by the time your application has been hit the web server has already taken in the HTTP request so it becomes difficult
00:18:02.429 if you then want to recreate that HTTP request to send it to a back-end because
00:18:08.370 by that time you don't really have access to that raw low-level stuff so it can be hard to sort of you sort of have
00:18:14.610 to manually try to reconstruct the request and there's a lot of you know edge cases with HTTP stuff that it makes
00:18:20.880 that hard to deal with so the other
00:18:26.160 reason why you would do this is purely just speed and efficiency you know higher level frameworks are great we use
00:18:31.410 them a lot but at this kind of level you know we really want things to be very fast
00:18:37.320 and very efficient and scalable and EventMachine is very fast and evented
00:18:43.980 systems are very suitable for proxies that's why you see a lot of proxies being built in nodejs this is similar in concept they scale
00:18:51.809 nicely and efficiently and there's a few reasons for that I'll throw up some
00:18:56.970 terribly unscientific benchmarks just to give you a sense
00:19:03.809 I just ran these on my computer there's probably lots of things going on here but the basic thing is that basically I
00:19:11.129 benchmarked making a direct request to a back-end server and then making it through EM-Proxy the EventMachine proxy and
00:19:17.399 then making it through a proxy I just found I didn't know much about it called Rack Reverse Proxy that basically does
00:19:23.429 take a higher-level approach to proxy and you have a lot of nice access to sort of the parse HTTP requests so in
00:19:30.330 this case I mean it's already fast it's one millisecond EM-Proxy adds 0.5
00:19:35.639 milliseconds Rack Reverse Proxy adds almost 3 milliseconds so I mean 3
00:19:40.679 milliseconds maybe not the end of the world who really cares but
00:19:46.200 the picture becomes a little more complicated once you start to get into bigger requests and this is partially
00:19:51.659 determined by your needs and so here's an example of a larger request where
00:19:56.970 there's more data involved and here EM-Proxy adds 150 milliseconds and Rack
00:20:02.820 Reverse Proxy adds 800 milliseconds so why this is the case is that Rack sort
00:20:10.350 of deals with a complete request and a complete response so it sort of takes in the full request then it tries to
00:20:17.129 recreate that and send it to the back-end server gets the response from the backend server and then sends it along to the original client so it sort
00:20:24.870 of has to buffer all of those in memory so you know if you can imagine that
00:20:30.210 you're uploading a 1 gigabyte file through some web service and you need to download 5 gigabytes I mean that's a lot
00:20:35.970 of data but there are cases where this happens with rack reverse proxy that
00:20:41.519 sort of is bottlenecked at that and it would read that into full memory at each step and again there are ways around
00:20:47.070 this but the difference is EM-Proxy deals with things at a chunk level so you're sort
00:20:54.150 of dealing with just chunks of data at a time and you can just stream them on to the backend as fast as you're receiving
00:21:00.270 them so as soon as you have enough information to make the decisions that you want to do
00:21:06.840 inside your custom reverse proxy you can just start streaming that data very quickly and then stream the data back so
00:21:12.660 there isn't that required buffering and so that sort of gets to another aspect of this which is just flexibility of
00:21:19.860 EM-Proxy and in line with that you know it's low level and it's up to you to
00:21:27.180 implement more but it's up to you to decide if you do want to implement something like buffering or if you want
00:21:32.670 to stream everything live and another thing you can use a proxy for is non
00:21:37.710 HTTP things so you could use it for sort of any TCP level thing so you could do
00:21:43.650 this with WebSockets mail servers database protocols you can sort of put
00:21:50.610 it in the middle of anywhere and any of those sort of types of connections and
00:21:56.130 potentially do custom things so what else could you sort of do with these
00:22:01.530 custom reverse proxies I've talked a bit about you know what we did as far as tackling some of our issues with APIs
00:22:08.090 and you know implementing authentication rate limiting analytics and you know I
00:22:14.280 think those are all really good candidates for something that could be implemented at that level you know because you do have to be careful here you know
00:22:19.680 it's not suitable to implement your whole application at this level but it's sort of those high level features that you
00:22:26.700 know you might have potentially diverse backends or even if you don't have diverse backends it could be an interesting different way to structure
00:22:33.180 your application so what else could you do with this I don't really have any
00:22:38.730 concrete examples but I'm just throwing out ideas here you could do error handling so you know you could monitor
00:22:45.000 all requests coming back for a non-success HTTP code and then you could do
00:22:50.280 something with that you could log that you could send emails and you know a lot of us do that within our rails apps but
00:22:56.340 you know if you're maybe dealing with we have legacy Perl applications I'm ashamed to admit but we have to deal with that and you
00:23:04.169 know we don't sometimes have those mechanisms to deal with sort of that error handling in the same way in those
00:23:09.389 applications but you could sort of slip in something that deals with all your errors across all of your applications
00:23:15.809 no matter how they're written and again their pros and cons you might not have access to all the debug details at that level but you could certainly you know
00:23:22.380 be notified about requests that failed you could even do this to sort of manipulate web pages you know say you
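This error-monitoring idea could start as little more than reading the status line of each raw response in an `on_response` callback; a minimal sketch (the helper name is hypothetical):

```ruby
# Pull the three-digit status code out of a raw HTTP response's
# status line and flag anything outside the 2xx range.
def failed_response?(raw_response)
  code = raw_response[/\AHTTP\/\d\.\d (\d{3})/, 1]
  code.nil? || !code.start_with?("2")
end

ok  = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
bad = "HTTP/1.1 502 Bad Gateway\r\n\r\n"
```

Flagged responses could then be logged or emailed, regardless of what language the backend behind the proxy is written in.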
00:23:30.929 wanted to insert your Google Analytics snippet on all of your pages you could actually do that with a custom reverse
00:23:36.779 proxy you could sort of take the response as it's going back sneak something in there you know you could
00:23:42.450 even do crazier stuff with like taking all your JavaScript files compressing
00:23:48.179 them on the fly and then returning it as one javascript file to the client and we do that in rails but again if you have
00:23:54.029 diverse backends and maybe don't have those capabilities you could do that at a higher level you could you know maybe do
00:24:00.059 your templating in your web page template put in a header and a footer I don't know why you'd do it at this level but you
00:24:05.580 could you could also do things like say you had a bunch of JSON API is already
00:24:11.789 and you know maybe you can't touch them for a variety of reasons but you wanted JSONP you could sort of add that at
00:24:17.669 this level and then because JSONP is just sort of a matter of wrapping the existing JSON in a callback you
00:24:24.899 could implement that at this level by altering the response as it goes back and it would apply for all of your existing services you could do things
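Since JSONP is just wrapping the JSON body in the caller's callback, the response-side transformation is nearly a one-liner (the callback name is illustrative; a real proxy would also need to adjust the Content-Length header after changing the body):

```ruby
require "json"

# Wrap an existing JSON response body in a JSONP callback when the
# client asked for one; pass it through untouched otherwise.
def jsonp_wrap(json_body, callback)
  callback ? "#{callback}(#{json_body});" : json_body
end

body    = JSON.generate({"status" => "ok", "count" => 3})
wrapped = jsonp_wrap(body, "handleData")
plain   = jsonp_wrap(body, nil)
```

Applied in the proxy's response callback, every existing JSON API behind it gains JSONP without any backend changes.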
00:24:32.370 like check for security things see if incoming requests look malicious in any way for all of your servers do stuff
00:24:39.330 like that and again you can sort of do more than HTTP with this you can intercept and sort of manipulate other
00:24:46.080 TCP things so you could do email database calls all sorts of fun stuff and there's a lot of great examples in
00:24:52.889 EM-Proxy's GitHub repo there's an examples folder just filled with sort of
00:24:58.080 interesting ideas of what you could sort of do so again I don't want to take credit for any of this stuff this is
00:25:03.600 sort of us just repurposing a lot of this stuff so there are a few things to
00:25:09.929 be aware of when you're building these as I've sort of hinted one of them is
00:25:15.300 buffering so and again I've already sort of talked about this but imagine you have you know one gigabyte upload and a
00:25:21.390 five gigabyte download if your proxy layer buffers that request that becomes sort of a sort of a bottleneck but it
00:25:29.970 also becomes a place where as the request gets uploaded it all has to halt until it's fully uploaded and then it
00:25:35.640 gets sent on to your back-end server and in the case of a one gigabyte upload or a 5 gigabyte download that buffering can add
00:25:41.940 significant delays and so sometimes buffering is desirable though and
00:25:47.010 sometimes you can't achieve stuff without buffering and you know for example a unicorn actually wants
00:25:52.830 buffering to deal with slow clients so it can be advantageous but other times it's not and in our case we're dealing
00:25:59.130 with a diverse set of API is that we don't really know the use case of all those api's we opted not to buffer
00:26:05.490 because we just don't know what all those backends are doing or if they want to stream data and we didn't want to prevent that streaming from happening at
00:26:12.360 our proxy level because we just wanted to be as transparent as possible some other things to be aware of at this
00:26:18.240 reverse proxy layer: if you are going to modify the response going back
00:26:23.520 to the client, you can do that, but it can be a little tricky. In the
00:26:30.960 headers going back to the client there's usually a Content-Length header, and you have to adjust it accordingly if
00:26:37.200 you're going to do any manipulation. So this isn't the easiest way to alter your website, but for
00:26:43.710 certain use cases I think it can be powerful. In line with that, another thing to be careful of is gzipped
00:26:50.160 responses going back to the client. If your back-end server decides to gzip something up and send it back to the client,
00:26:56.280 and you want to alter the response body, it gets tricky, because in that case you do have to
00:27:01.800 buffer: you can't just unzip the individual chunks. You have to get the full response
00:27:08.130 body, buffer it, un-gzip it,
00:27:13.680 do whatever manipulation you wanted, re-gzip it, and send it back to the client.
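A minimal sketch of that buffer, un-gzip, edit, re-gzip cycle, using only Ruby's standard library and including the Content-Length fix mentioned earlier; the URL rewrite is just a stand-in for whatever manipulation you actually need:

```ruby
require 'zlib'
require 'stringio'

# Once the full gzipped body has been buffered, unzip it, edit it,
# re-gzip it, and fix Content-Length to match the new body.
def rewrite_gzipped_body(gzipped, headers)
  plain = Zlib::GzipReader.new(StringIO.new(gzipped)).read
  plain = plain.gsub('http://', 'https://') # stand-in edit
  out = StringIO.new
  gz = Zlib::GzipWriter.new(out)
  gz.write(plain)
  gz.close
  body = out.string
  headers['Content-Length'] = body.bytesize.to_s # must match the new body
  [body, headers]
end

# Build a fake gzipped upstream response and rewrite it.
upstream = StringIO.new
w = Zlib::GzipWriter.new(upstream)
w.write('see http://example.com')
w.close
body, headers = rewrite_gzipped_body(upstream.string, {})
puts Zlib::GzipReader.new(StringIO.new(body)).read # => see https://example.com
```

Note that this only works because the whole body was buffered first; you cannot decompress and re-edit arbitrary mid-stream chunks independently.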
00:27:20.250 Those are just some things to be aware of that we've learned as we've gone through this process. So now I'm gonna
00:27:27.570 talk a little bit about some other stuff. This is all well and good, but I don't
00:27:33.899 know if you're interested in bigger stuff. This digital strategy for the
00:27:40.109 federal government came out earlier this year, on May 23rd, 2012, and I'm
00:27:49.320 involved in the API portion of this digital strategy. A big
00:27:54.600 part of the strategy for the entire federal government is just a web services bonanza. It's like,
00:28:01.200 everybody should be doing web services, web services are great, all agencies should be delivering data to
00:28:08.849 people in the form of web services. And I would tend to agree with that; as app developers, it's
00:28:15.809 always nice to find data out there in an open web format, in an API
00:28:21.299 that's easy to use and get access to. The government has a lot of data, but
00:28:27.720 it's not always in the most easily accessible format. I've seen stuff where it's a printed
00:28:33.539 Excel spreadsheet that's been scanned in and then put into a Word document, and
00:28:38.849 then they just give that to you, and it's like, that's not really providing data. So a
00:28:45.119 big part of the push in this federal strategy is to encourage agencies to
00:28:50.220 develop a lot more APIs and web services, so expect within the coming year to see a lot more
00:28:56.720 government data out there. That's exciting in and of itself, but the portion
00:29:03.599 that I'm involved in is the API part, and the main objectives
00:29:08.820 there: there is this big push for web services, but they want
00:29:14.399 to tackle two things. On the one hand, they want to make it easier for users, like a lot of you, to find and consume
00:29:21.599 federal APIs. There are some out there already, but a lot of times they're not easy to find; you might be
00:29:27.749 interested in some data, but it's just hard to find through bureaucratic government websites.
00:29:35.500 The other aspect is that if they're pushing agencies to develop
00:29:40.990 more APIs, they need to make it easier for agencies to develop and
00:29:47.409 deploy more APIs. If there's this big push for APIs, it just needs to
00:29:53.769 become easier. A lot of agencies are ahead of the game and are already building APIs and doing great work, but some of them
00:30:00.009 need help with this kind of thing. So in a lot of ways this is the exact same problem we had within our
00:30:06.129 organization at NREL, just on a much bigger scale. There are silos of
00:30:12.879 organizations, different agencies that are all doing things independently, but a lot of the stuff that needs to be
00:30:20.139 addressed involves similar issues, so it
00:30:26.049 sort of mirrors our same problems. So what we're looking at right now is basically the same solution: we're
00:30:33.279 currently evaluating using the software stack we developed at NREL
00:30:39.370 to proxy to all the agencies within the federal government, or possibly
00:30:45.580 other solutions that do similar things. We've been talking to a lot of different agencies, and the consensus
00:30:52.629 was that they want something like this. They don't want to have to deal with authentication on their own, and at the federal level and
00:31:00.220 from a user level, they want to make it easy for users to get one API key and be able to access all
00:31:07.120 the federal APIs, so you don't need a bajillion accounts for all the different agencies. So yeah,
00:31:15.549 it's sort of the same issue and perhaps the same solution, and
00:31:21.309 we're currently involved in getting something like this up and running, perhaps within the next six months, for
00:31:27.070 federal agencies to start taking advantage of. So that's all exciting; there's lots of
00:31:35.200 web service action going on in the federal government over the next year, so stay tuned if you're at all interested in that.
00:31:44.630 I'm starting to wind down here, but I have some more slides to go through. So what has all this been about, really?
00:31:51.470 To summarize, what I really want to encourage here is
00:31:56.810 a different way of thinking about some of your architecture. Again,
00:32:02.830 reverse proxies can't solve all problems and are not suitable for all problems, but I think they can be an interesting,
00:32:08.720 different approach that I don't see utilized as much to solve certain problems, because they're fun for the
00:32:16.730 whole family: anything you do within the reverse proxy layer affects all of your back-end applications, so as
00:32:22.670 you start to add new features to your reverse proxy layer, it can be advantageous for everyone. It's an
00:32:28.550 interesting way to abstract things completely outside of the application level, into a higher level
00:32:33.770 that sits in front of all of your applications. And what I want to encourage is that you might be able
00:32:39.470 to do more with reverse proxies than you realize. You might think of a reverse proxy just as software like nginx
00:32:46.190 or HAProxy that just does proxying, without a lot of logic or
00:32:53.570 implementation details happening in there, but since you can write these in Ruby, you can start to leverage all
00:32:59.510 these libraries: you can connect to databases, you can do all sorts of crazy fun stuff at that level. So again,
00:33:05.480 just think about it as a different way to perhaps architect some of your applications. I'll go through
00:33:11.480 some random resources that might be useful. One of them is API Umbrella; this
00:33:18.590 is the full API management system we've built at NREL, and we've just recently,
00:33:24.020 finally, gotten approval to open-source it, so it's all up on GitHub. We're super excited to finally open-source the
00:33:29.390 project, and it includes our custom EventMachine-based proxy. So even if
00:33:36.860 you're not interested in APIs, you might check it out to see what you can do
00:33:46.250 at that reverse proxy level and how you would implement some of those details. It's at github.com/NREL/api-umbrella,
00:33:53.090 and it's a new open-source project for us, so we're sadly behind the times as
00:33:58.300 far as getting it all documented, but definitely reach out to me if you have questions about any of
00:34:03.310 that. As far as just Ruby and EventMachine low-level proxies, there's em-proxy, which I talked about
00:34:10.030 and whose examples I showed; it's a simple, bare-bones, but very capable reverse proxy.
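For reference, a bare-bones em-proxy setup looks roughly like this; the callback names follow em-proxy's README, while the ports and the X-Api-Key header stripping are purely illustrative, not NREL's code:

```ruby
# Pure-Ruby request transform, kept separate so it's easy to test:
# strip a hypothetical X-Api-Key header before forwarding upstream.
STRIP_KEY = ->(data) { data.sub(/^X-Api-Key:[^\r\n]*\r\n/i, '') }

# Call run_proxy to actually start it (requires the em-proxy gem and a
# backend listening on port 3000).
def run_proxy
  require 'em-proxy'
  Proxy.start(host: '0.0.0.0', port: 9889) do |conn|
    conn.server :srv, host: '127.0.0.1', port: 3000  # single backend
    conn.on_data     { |data| STRIP_KEY.call(data) } # edit the request
    conn.on_response { |backend, resp| resp }        # pass response through
    conn.on_finish   { |backend, name| unbind if backend == :srv }
  end
end

puts STRIP_KEY.call("GET / HTTP/1.1\r\nX-Api-Key: abc\r\nHost: x\r\n\r\n")
```

The `on_data` / `on_response` hooks are where features like key validation, rate limiting, or response rewriting would slot in.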
00:34:16.570 There's ProxyMachine, which is actually what our current production system is based on; it's a GitHub project that's
00:34:24.790 simpler and can perhaps be easier to get started with, but the one
00:34:31.390 disadvantage is that it can only act on incoming requests; it doesn't track the response going outward. It
00:34:36.640 just handles the incoming side and then does routing; since it doesn't keep track of the response coming out, you can't
00:34:42.100 track some of the stuff you can in em-proxy, which is why we're currently in the process of switching over to em-proxy. And then
00:34:48.820 there's Goliath, which is another thing that's based on all this EventMachine magic; it's more of a
00:34:57.430 higher-level framework. It's still pretty low-level, but it uses em-synchrony, which uses fibers, so
00:35:04.480 it actually hides all the event stuff from you. That's an interesting project to also check out. As far as
00:35:10.810 general reverse proxies you might be interested in: there are a lot of them out there, and this certainly isn't
00:35:16.570 exhaustive; these are just the ones we use in a variety of capacities. There's HAProxy, which is
00:35:22.500 amazingly fast and scalable; it can do all sorts of fun load balancing stuff, and it acts as a great general proxy.
00:35:29.760 Varnish Cache is an interesting one: it's a reverse proxy, but it's also a caching layer, and that gets back to
00:35:36.490 what I mentioned earlier. We're actually going to be adding this to our stack; we're going to slip it in as another reverse
00:35:42.550 proxy, and the advantage there is that some of our older APIs, which maybe don't have as
00:35:48.640 good caching capabilities as the stuff built into Rails, can start to use the Varnish caching server
00:35:54.550 as a caching layer. Varnish is nice for caching all sorts of stuff. And
00:36:00.220 then there's nginx, which is really more of a web server, but it also has some pretty nice proxying capabilities.
00:36:07.220 It isn't as exhaustive as something like HAProxy, but if you're already using it, it can do quite a bit. And if
00:36:15.710 you happen to be interested in renewable energy APIs, this is the site we built, and this is what
00:36:23.720 this was all about: making one website for users to find all of our APIs, even though they happen to live on
00:36:29.060 all sorts of different servers within our organization. You can check that out at developer.nrel.gov, and
00:36:35.810 there are lots more APIs coming soon; some of my colleagues sitting down here in the front are working hard away at
00:36:41.690 building new APIs, so there's a lot of cool stuff in the pipeline. And those
00:36:47.630 are some contact details for me. I don't really use Twitter, I'm sort of a curmudgeon, but feel free to contact me;
00:36:54.320 maybe I'll start using it. And finally, this is completely off topic and a
00:37:00.109 shameless plug, but if you've been wondering about this ridiculous thing on my upper lip: we just finished our
00:37:08.570 mustache competition at work this week. We have a mustache competition every year for charity, and we haven't met
00:37:17.450 last year's goal, but if you're interested, it's a local Denver-based charity that helps support kids and
00:37:23.750 education resources. If you're interested, it's bitly slash rubies, and you can
00:37:31.310 donate. But anyway, I think there's actually some time for questions, because I was cooking. Yeah?
00:37:42.589 [inaudible audience question] So the question
00:37:56.569 was: how do we deal with inter-service communication? If one
00:38:02.839 web service needs to call another service, does that travel through the reverse proxy as well? The answer is,
00:38:08.209 in our current system, yes. It isn't strictly necessary; on the backend we could definitely communicate
00:38:13.910 directly server to server if we wanted to save the overhead, but I will say this reverse proxy is very fast. In
00:38:21.049 benchmarks, I think it adds about four milliseconds to deal with the rate
00:38:27.109 limiting, analytics, and user authentication, and it utilizes Redis and MongoDB to do all that. So it's fast,
00:38:34.130 and for the time being we haven't really seen any problems with routing those inter-service communications
00:38:39.979 through the proxy, and that gains us some advantages as far as analytics, mainly because we're
00:38:46.279 interested ourselves to know internally how we're using our APIs.
00:38:57.110 So the question was how we deal with authentication. We still use
00:39:03.510 the same approach; our authentication is admittedly very simple. It's all API key based, so it's just a big, long API key,
00:39:10.050 and on the back end we have the ability to remove the rate limits
00:39:15.450 from certain API keys. So basically we set those backend keys up as unlimited API keys; we have API keys that we
00:39:23.070 send back and forth, and that's what we use for authentication, and that identifies each app, again for analytics.
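A hypothetical sketch of that kind of key check: a key store where internal keys are simply flagged unlimited so the proxy never throttles them. Everything here is illustrative; the real system keeps its keys and counters in MongoDB and Redis behind the proxy.

```ruby
# Toy key store; in practice this would be a database lookup.
KEYS = {
  'PUBLIC-KEY-123'   => { rate_limit: 1000 },        # ordinary consumer
  'INTERNAL-KEY-456' => { rate_limit: :unlimited },  # backend-to-backend key
}

# Decide whether a request may proceed, given the caller's key and how
# many requests that key has already made this hour.
def authorize(api_key, requests_this_hour)
  record = KEYS[api_key]
  return [403, 'invalid key'] unless record
  limit = record[:rate_limit]
  return [200, 'ok'] if limit == :unlimited
  requests_this_hour < limit ? [200, 'ok'] : [429, 'over rate limit']
end

puts authorize('INTERNAL-KEY-456', 999_999).inspect # => [200, "ok"]
```

Because the check lives at the proxy, every backend gets authentication and per-key rate limiting without any of them implementing it.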
00:39:43.690 [Question: how do you deal with the fact that the pattern of text you're looking for may be split between chunks?
00:39:50.260 Do you have a facility for reassembling chunks?] Yeah, that's a good question. The first part of that
00:39:56.920 question, which you might have missed, was: since things are coming into this proxy as chunks,
00:40:02.200 if we're doing something like a regular expression, how do we make sure it doesn't span chunks whose data we don't have yet? The answer there is,
00:40:09.359 first of all, I wouldn't really recommend doing regular expression searches, for exactly that reason; that was a
00:40:14.890 very simple example. That is why you would use a higher-level library like
00:40:20.710 that HTTP parser library, which is capable of knowing when all of your
00:40:26.559 headers are read in, so you can take some action on all the headers, or once the whole body is read in. But you're
00:40:32.559 right, in that case you do need to buffer, and that's why, if you're doing anything with the response,
00:40:38.819 there are some ways you can get around it for very specific cases, but for the most part, if you're dealing with the
00:40:44.289 response, you have to buffer. And for incoming requests it's the same deal if you want to do anything with the
00:40:51.339 body.
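The chunk-boundary problem can be seen in a tiny pure-Ruby sketch: a pattern split across two chunks is only found after buffering up to a known boundary (here, the blank line that ends HTTP headers). This is the same bookkeeping an HTTP parser library handles for you; the class and protocol details here are illustrative.

```ruby
# Accumulate chunks until a known boundary appears, then scan the whole
# buffered region at once so no pattern can be missed at a chunk edge.
class HeaderScanner
  def initialize
    @buffer = +''
  end

  # Feed chunks as they arrive; returns the full header block once the
  # terminating blank line has been seen, nil until then.
  def <<(chunk)
    @buffer << chunk
    i = @buffer.index("\r\n\r\n")
    i && @buffer[0, i]
  end
end

s = HeaderScanner.new
p(s << "GET / HTTP/1.1\r\nX-Api-") # => nil (the header is split mid-pattern)
p(s << "Key: abc\r\n\r\n")         # boundary seen: full headers returned
```

Scanning each chunk independently would never match `X-Api-Key`, because the name straddles the two chunks; buffering to the boundary makes the match reliable.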
00:41:02.500 [inaudible audience question] Yep, you can take a look at our GitHub repo; it's probably not as well tested as it should be. So yes, I would say
00:41:11.109 slightly, yes. I would say the testing right now tries to break things down;
00:41:16.180 it doesn't really get involved at the EventMachine level, it
00:41:22.000 just tests the more basic blocks in there. So we're probably lacking tests for the whole
00:41:28.450 thing running as a whole with all the events going crazy. Testing is more of a challenge with the EventMachine
00:41:34.569 stuff.
00:41:47.010 Apache does do it, yeah, and Apache can be used. I would say it's probably not as
00:41:52.990 scalable as a lot of these other solutions. It depends on how you're running Apache;
00:41:58.600 it's been a while since I've used it, but with the worker model versus the prefork model, by
00:42:07.390 default Apache will spin up new processes for every single request, whereas I think all of these other ones,
00:42:14.220 HAProxy and nginx at least, are event-based, so they're a lot lighter weight. So Apache will work; the other
00:42:22.930 ones are possibly just a little more scalable. But again, if you're already using Apache, go for it; I don't
00:42:28.900 want to tell you not to use it.
00:42:37.210 Yes, so we... oh, sorry. The previous question,
00:42:43.999 which I don't know if I repeated, was about using Apache, in case you couldn't pick that up. This question
00:42:49.309 was about how we deploy our proxy and whether we deal with any sort of missed requests
00:42:57.589 or any of that. To deal with deploying: all of our deployment stuff runs
00:43:04.249 through Capistrano, but within that, how we really deal with that issue is we do a
00:43:11.329 rolling restart. We basically run three or four of these
00:43:17.359 processes; it's the same model as nginx, where you run as many as you have cores on your CPU. So we have
00:43:24.920 several of these processes running, and then we do a rolling restart. We do that through a Python
00:43:29.930 project called Supervisor. It's a pretty nice library; it's not really Python-related
00:43:35.390 at all, it just happens to be written in Python, but it's a nice process management library that
00:43:41.599 lets you treat things as process groups. I know a lot of people might use Monit, and I think
00:43:48.349 there are a few others, Bluepill and God; we've been
00:43:54.950 happier with Supervisor. So we basically did a custom
00:44:00.109 implementation of a rolling restart with Supervisor, where we stop one and start another. I think there might be some edge
00:44:06.140 cases where, if somebody's in the middle of a request, I'm not sure we're completely gracefully shutting down right now, but I think part
00:44:12.289 of what ProxyMachine and em-proxy do should handle graceful shutdown, so theoretically that should happen without
00:44:18.589 any lost requests. Any other questions?
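For reference, a Supervisor process group along these lines might look like the following; the program name, command, and ports are illustrative, not NREL's actual config:

```ini
; Four copies of the proxy, managed as one process group.
[program:proxy]
command=ruby proxy.rb --port=90%(process_num)02d
process_name=%(program_name)s_%(process_num)02d
numprocs=4
autorestart=true
```

A rolling restart then becomes restarting one process at a time, e.g. `supervisorctl restart proxy:proxy_00`, waiting for it to come back, then moving to the next.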
00:44:24.160 Yeah, what perfect timing. Well, thank you guys very much; I hope this was okay.
Explore all talks recorded at RubyConf 2012