
Abstracting Features Into Custom Reverse Proxies

Nick Muerdter • November 01, 2012 • Denver, Colorado • Talk

The video, titled "Abstracting Features Into Custom Reverse Proxies" by Nick Muerdter, discusses innovative ways to utilize custom reverse proxies to standardize functionality across diverse applications and programming languages. Muerdter, speaking at RubyConf 2012, outlines the concept of reverse proxies and how they can be leveraged to tackle challenges arising from organizational silos and varied technology stacks.

Key Points Discussed in the Video:

  • Definition of Reverse Proxy: Muerdter explains that reverse proxies serve as an intermediary between users and backend servers, managing tasks such as static file serving and slow client handling.
  • Diverse Development Needs: At NREL, where Muerdter works, different groups were developing web services using various programming languages. This diversity led to a need for unification and standardization in exposing these services to end users.
  • Implementation of Custom Reverse Proxies: Instead of modifying each API individually for features like authentication, request throttling, and analytics, Muerdter advocates for implementing these features at the reverse proxy level, allowing existing APIs to remain untouched. This method leads to:
    • A single entry point for users to access multiple APIs, simplifying the user experience.
    • Less duplicated implementation code and easier collaboration among developers, enhancing efficiency and code manageability.
  • Scalability and Flexibility: Reverse proxies can act as load balancers, improving scalability. Muerdter highlights the flexibility of using Ruby and the EventMachine library to implement custom features efficiently.
  • Federal Government Strategy on APIs: The talk also connects the benefits of reverse proxies to a broader effort by the U.S. government to standardize API access, making services more user-friendly and easier for agencies to manage.

Conclusion and Takeaways:

  • Reverse proxies present an effective architectural solution for dealing with diverse backend services without necessitating major changes to existing systems.
  • Custom proxies can implement a range of common features across various languages and applications, ultimately enhancing both developer and user experiences.
  • Muerdter encourages viewers to rethink their application architecture, considering how reverse proxies can simplify the complexities of multiple integrations.

The video encapsulates the potential of using custom reverse proxies to abstract and standardize functionalities, ultimately aiming to improve both user and developer satisfaction across various applications.

Published: March 19, 2013

or Making Better Lemonade From Chaos

Life isn't always simple. We often have to deal with a mishmash of applications, languages, and servers. How can we begin to standardize functionality across this chaos? Custom reverse proxies to the rescue! Using Ruby and EventMachine, learn how you can abstract high-level features and functionality into fast reverse proxies that can improve scalability, save time, and make the world happy.

See how we've applied this across a diverse set of web service APIs to standardize the implementation of authentication, request throttling, analytics, and more. See how this can save development time, eliminate code duplication, make your team happy, make the public happy, and make you a hero. See how this can be applied to any TCP-based application for a wide-variety of use cases. Still think your situation is complicated? Learn about the U.S. Government's plans to standardize API access across the entire federal government. With some reverse proxy magic, this isn't quite as difficult or as foolhardy as it may first sound. It also comes with some nice benefits for both the public audience and government developers.

RubyConf 2012

00:00:14.389 thumbs up excellent okay well it's about time and it looks like it's a pretty packed house so uh
00:00:20.779 Wow hopefully I don't disappoint all of you going after Matz is pretty intimidating but uh I guess I'll launch
00:00:28.019 right into it since it's about time to start so I'm here to talk about
00:00:33.140 abstracting features into custom reverse proxies aka making better lemonade from
00:00:38.399 chaos so what am I actually talking about I don't know if that title makes
00:00:43.800 sense to anybody else but what I'm here to talk about are reverse proxies to start with so just to make sure
00:00:49.500 everybody's sort of on the same page with knowledge and you know familiarity with everything I first just want to lay
00:00:56.129 out some basics of you know what is a reverse proxy in case you're not familiar with what a reverse proxy is basically a reverse proxy is a server
00:01:02.100 that sits in front of your other servers inside your internal network that sort of does stuff it does something in
00:01:10.260 between your main server and the end users on the internet so what kind of
00:01:16.890 stuff does that reverse proxy do a lot of you in the Ruby community might
00:01:22.170 already be familiar with reverse proxies you might have already used one a lot of you might use Unicorn to deploy your
00:01:27.750 apps and in that case you oftentimes put nginx in front of unicorn and that acts
00:01:33.150 as a reverse proxy so in that case nginx's role is to sort of serve static files and to deal with slow
00:01:39.329 clients so there that's the role of the reverse proxy but what I'm here to talk
00:01:44.369 about is sort of taking that reverse proxy layer and doing other fun things inside of it you can sort of implement
00:01:51.000 your own layer there in Ruby you can implement custom features you can use
00:01:56.130 EventMachine to do this to do highly scalable reverse proxies and that's a lot of fun so that sort of is the
00:02:04.110 outline of reverse proxies but you might still be confused as to what this is actually all about
00:02:10.800 so first I'm sort of gonna go into the why of why
00:02:15.960 we've sort of built these reverse proxies and why they might be suitable for certain use cases and then also get into
00:02:21.870 a little bit of how you would actually implement this kind of thing so the easiest way to get started is to start
00:02:28.020 with a story and that story is sort of how we're using custom reverse proxies
00:02:33.540 and I think it's perhaps a story that you know some of you might be able to relate to in some way so several years
00:02:40.620 ago we started building a lot of web services and we wanted to expose all of
00:02:48.630 our web services to the world but another part of this story is silos and I'm not talking agricultural silos with
00:02:55.410 delicious coffee or sugar I'm talking about organizational silos so I work at
00:03:00.930 NREL the National Renewable Energy Lab it's just down the road in Golden so I'm local yeah renewables
00:03:14.950 so NREL is a big company it's uh I think I don't know what the latest count is
00:03:20.150 it's grown a lot it might be around 2,000 employees something like that not all of us are developers there's a lot
00:03:25.520 of very smart scientists they're doing very cool research on renewable energy stuff but we do have a lot of sort of
00:03:31.760 separate development groups within our organization so you know our group that
00:03:36.770 I work in you know sort of has historically done vehicle stuff while other groups maybe do stuff related to
00:03:42.860 buildings and another group does stuff related to solar so we sort of have these different departments that all
00:03:48.980 sort of work on different things but we were all sort of wanting to build web services at the same time and sort of present those to the public in a
00:03:56.090 standard way and another way to look at this is because we sort of have this spread of groups and different you know
00:04:04.160 different development groups throughout the lab you know our group happens to do Ruby stuff we have some other groups
00:04:10.250 that do Python stuff we have other others do Java PHP etc so there's a big
00:04:15.260 diversity there so even if you're not working in big organization you might be able to relate to this from the
00:04:20.750 perspective of you might have legacy apps that you're you you deal with you might have you know a lot of there might
00:04:27.710 be different languages in your stack so there can be a lot of aspects to your stack even in a smaller company but from
00:04:33.979 our perspective we were dealing with sort of different departments and sort of we wanted to make all of our Web
00:04:39.620 Services sort of come together and so
00:04:45.169 and we wanted to present those to users as sort of a single offering and really
00:04:50.990 users don't care about this kind of thing I mean they don't care that our group happens to do transportation stuff
00:04:57.200 and another group happens to do solar stuff they're just interested in hey what does NREL have to offer as far as
00:05:03.290 web services go and how can I most easily find them so what we really wanted out of this platform that we were
00:05:09.830 building to expose our web services is we wanted one entry point for users to find all of our APIs so we sort of wanted
00:05:16.280 to bring together all these api's that were being developed in separate groups in different ways and bring them
00:05:22.880 together and we wanted to make it easy for those users to access the APIs
00:05:29.380 we basically wanted to have the user sign up for one account and then they can access all of our APIs so they don't
00:05:35.419 have to get an account from our transportation group to access the Ruby services and the solar group to access
00:05:41.300 the Python services and so on and then from our perspective on the development
00:05:46.789 side of you know we have all these different groups building web services we we wanted sort of API key based
00:05:53.509 management to access all of our APIs across the board we sort of decided that's okay for what we're doing we were
00:06:00.229 doing pretty public stuff we want a rate limiting that was something that a lot of our groups hadn't really explored or
00:06:06.470 implemented within their individual APIs but we wanted that there was a
00:06:13.009 need across the lab and we also wanted analytics again another thing that you know other groups hadn't really tackled
00:06:19.039 and it was something that we really thought we could you know standardize and there would be benefits to doing
00:06:24.229 that so what we didn't want out of this were requiring changes to all of these
00:06:29.240 different groups you know it's a it's a big uphill battle to go into you know I mean the one option is sort of
00:06:35.030 standardizing it for all of our APIs and saying it must be written in Ruby which I wouldn't mind but other people
00:06:40.639 would and so but we didn't want to have that mentality of saying you have to do
00:06:46.130 it this way if you want to be part of our cool club of APIs it's a big battle to make that change
00:06:52.280 within different groups and for historical reasons you know there are the separate groups we're trying to do
00:06:58.610 better to work closer together but I'm sure you guys can empathize hopefully with sort of those big organizational
00:07:04.430 structures or just again even if you're not in a big organization sort of dealing with legacy apps and
00:07:09.919 sort of this diversity even within your codebase and it's also time consuming to have to go back and touch all these
00:07:16.219 other apps so that's where custom reverse proxies come in to the rescue and so
00:07:22.460 what does this look like so if we look at our diagram again it's at that same
00:07:28.940 stack at that level of the reverse proxy we were able to implement custom features like authentication of rate
00:07:36.380 limiting analytics those are the things we decided we needed across all APIs and then they can be shared among
00:07:43.050 all the back-end servers so it sort of just slips in there without having to make any changes on any of the existing
00:07:49.560 APIs that were already built and it implements that common functionality the
00:07:54.870 existing APIs don't have to change they can still exist wherever they wanted to exist and the proxy is just agnostic to
00:08:02.310 what sort of back-end technology is happening but it sort of provides that unification that single entry point to
00:08:09.330 all of our APIs that makes it a lot easier to apply a standard authentication scheme across all of our
00:08:15.000 APIs do rate limiting across our APIs and analytics so that sort of is
00:08:24.330 the basics of what we've done so how did this really help us did this really make better lemonade from chaos so from the
00:08:31.710 user perspective I think it did it helped our users we were able to build one website where users can go to and
00:08:38.760 find all of our web services and they can they only have to sign up for one
00:08:44.159 API key and they can get access to all of our APIs and so they're sort of shielded from sort of the internal you
00:08:50.820 know they don't need to know what department does what and you know we have different web presences
00:08:56.220 throughout the internet on various gov domains so they don't have to know where to find stuff there's just a single
00:09:02.190 website for renewable energy APIs that they can go to and start to find this stuff and it's easy for them to just
00:09:07.709 dive in and start using any of them no matter what all is happening on the backend and for our developers the real
00:09:16.110 advantage is that for old APIs they have to do absolutely nothing you know so we had an existing suite of APIs
00:09:22.920 already out there but they were all sort of all over the place but they did absolutely nothing but just sort of by
00:09:28.680 existing we were able to put them behind this reverse proxy and then we were immediately able to start layering authentication rate limiting and
00:09:35.250 analytics on top of it and the same goes for new APIs when somebody is building
00:09:42.120 a new API those are just things they don't have to worry about they just sort of assume that all of that is taken care
00:09:47.279 of when they're building a new API and it's just outside the scope of it they just assume if I'm being accessed
00:09:53.829 the user is fully authenticated they haven't exceeded rate limits and so on so that's it I think
00:10:01.959 it's been also advantageous for us just from a development perspective speeding up and not having to re implement those
00:10:07.360 same details and so forth so yeah so it's led to reduced implementation code
00:10:15.069 because individual APIs don't sort of have to implement
00:10:20.889 the same sort of logic over and over again and you know there are definitely ways you can abstract this and there's definitely ways you can you know reuse
00:10:27.910 code in a clean way but it just reduces the need to do any of that there's
00:10:33.850 really no code involved to implement these features at the individual API level there is code obviously at our
00:10:39.850 custom reverse proxy level but that's a little easier to maintain and the other nice
00:10:44.860 thing is that standardization is enforced across the board you know because we could abstract this into some
00:10:50.829 sort of library and things could reuse it but you know we don't run the risk of somebody messing up authentication
00:10:57.639 within their individual API by putting this sort of at a higher level
00:11:02.949 it sort of enforces that that's going to happen so the other advantages is that
00:11:10.509 because of how this operates at the layer that it operates any new features we add to this reverse proxy layer
00:11:16.149 benefit everybody so you know whether or not they're Python services Ruby services PHP services Java services this
00:11:24.009 architecture benefits everyone when we decide to implement new functionality obviously not all functionality is
00:11:29.769 suitable for this kind of thing but it could be a powerful mechanism for certain types of functionality that can
00:11:35.649 be layered like this and another thing is that just reverse proxies in general are a nice scaling mechanism you know a
00:11:42.759 lot of times they're just used you use reverse proxies as load balancers so sort of having this in place and sort of
00:11:49.569 getting everybody on board with this architecture of having reverse proxies upfront allows us a lot more
00:11:55.420 flexibility on the backend to sort of scale things independently so that's
00:12:00.579 sort of the basics of what we did on to how we actually built these things
00:12:06.269 so you know so yeah how would you actually take Ruby and do some custom
00:12:13.050 magic stuff at that layer and do it fast and efficiently so currently we're using
00:12:19.079 EM-Proxy it's a nice EventMachine proxy library and I certainly can't take
00:12:24.959 credit for any of this stuff I'm about to show code wise we're just users of it but it's open source it's nice it's easy
00:12:31.970 and it's out there so EM-Proxy is Ruby and EventMachine and if you aren't familiar with EventMachine it's
00:12:38.509 just sort of an event-based system for writing Ruby code in an evented
00:12:45.420 way so if you've heard all the stuff about nodejs it's similar in architecture to that but Ruby so it has
00:12:53.699 some nice advantages of being blazing fast it's also flexible and it's low-level that low-level aspect has its
00:13:00.149 pros and its cons but I'll get into that in a bit so at a very basic example
00:13:06.089 this is sort of what an EM-Proxy looks like so it's pretty basic but you sort
00:13:11.790 of have you set up a server so this when you run this script you basically start up a server in this case it would be
00:13:18.360 running on your current server listening on all the IPs on port 80 and then in here we say we're going to proxy
00:13:25.079 to the same server 127.0.0.1 on port 81 and really that's all you need to do to
00:13:31.529 do sort of a transparent proxy but the real power of this is these callbacks that you can do so you have
00:13:38.250 on_data on_response on_finish and I think there's even one other one but basically it gives you a lot of flexibility as far
00:13:44.009 as intercepting of chunks of data as they stream through the proxy and doing
00:13:49.199 stuff with that so you can do something with as the data for the request comes
00:13:54.389 in and then you can also do something to the response as it goes out and then you can also do clean up stuff on finish so
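The slide code itself isn't visible in the transcript, but the transparent proxy being described can be reconstructed following the conventions in the em-proxy gem's README; treat this as a sketch (it needs the em-proxy gem installed, and binding port 80 typically requires elevated privileges):

```ruby
require "em-proxy"

# Listen on all interfaces on port 80 and forward everything,
# untouched, to a backend on the same machine at port 81.
Proxy.start(:host => "0.0.0.0", :port => 80) do |conn|
  conn.server :backend, :host => "127.0.0.1", :port => 81

  conn.on_data do |data|                # request chunks from the client
    data                                # return them unmodified
  end

  conn.on_response do |backend, resp|   # response chunks from the backend
    resp                                # stream them straight back
  end

  conn.on_finish do |backend|           # backend closed its connection
    unbind                              # so close the client side too
  end
end
```

Whatever the `on_data` block returns is what gets sent to the backend, and likewise `on_response` for the client, which is what makes the interception examples that follow possible.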
00:14:02.370 as a quick example of what this might look like this is an on_data callback
00:14:08.100 where I'm modifying the user agent so the user agent comes in on the request
00:14:13.709 and basically I'm just doing a search and replace any existing user agent with the user
00:14:19.930 agent of em-proxy and you'll note that I'm doing a search and replace but again
00:14:25.779 you're dealing with sort of chunks of data you're dealing with sort of the raw HTTP at the raw HTTP level here so you
00:14:33.610 have to be a little careful but you'll see here that I'm actually searching for User-Agent and then two newline
00:14:41.769 characters that is how the HTTP headers work see there's some things you have to
00:14:47.290 be aware of it's not quite all simple but the other thing to note is that
00:14:52.420 you're dealing with chunks of data in this case so you're sort of you're getting a stream of data as it comes through in chunks so you can't always
00:14:58.420 assume that you have like say the full request in here so you can't just do a search and replace and assume you have
00:15:03.879 all the data you sort of have to deal with there's other things and I'll get in a bit of that as far as buffering and
00:15:10.420 things if you need to have the full request so that's all well and good but you could do that kind of thing with you
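The user-agent rewrite described above can be exercised as a plain string transformation on a raw HTTP chunk. This is a hypothetical helper, not the speaker's exact code, and it assumes the whole header block happens to arrive in one chunk, which, as just noted, a real proxy can't rely on:

```ruby
# Swap whatever User-Agent header is present for our own.
# HTTP header lines end in "\r\n", which the pattern anchors on.
def rewrite_user_agent(chunk, agent = "em-proxy")
  chunk.sub(/^User-Agent:[^\r\n]*\r\n/i, "User-Agent: #{agent}\r\n")
end

raw = "GET / HTTP/1.1\r\n" \
      "Host: example.com\r\n" \
      "User-Agent: curl/7.64.1\r\n" \
      "Accept: */*\r\n\r\n"

rewritten = rewrite_user_agent(raw)
```

Inside an `on_data` callback you'd return the rewritten chunk so the modified request is what reaches the backend.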
00:15:16.540 know any sort of reverse proxy or most of them to do sort of a change the user
00:15:21.819 agent that's pretty basic change a header or do something like that at the reverse proxy layer here's something that you
00:15:28.300 know now because this was written in Ruby you can start to tap into all these Ruby libraries so that's the real
00:15:34.629 advantage of sort of this approach and implementing custom stuff so here's an example of I'm setting up I'm connecting
00:15:42.579 to Redis and every time I get a chunk of data I'm incrementing the IP address I
00:15:49.059 have a counter for that IP address in Redis so again this is a little taste of what you can start to do because it is
00:15:55.569 Ruby you can just sort of start to write things and access your Ruby libraries and it can be a lot of fun and as I
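The Redis counter idea can be modeled without a Redis server; here a plain Hash stands in for the redis gem's INCR command, and the key naming is purely illustrative:

```ruby
# Per-client request counter, as in the on_data callback described
# above. A Hash stands in for Redis; with the redis gem this would
# be redis.incr("hits:#{ip}") inside the callback.
counters = Hash.new(0)

count_hit = lambda do |ip|
  counters["hits:#{ip}"] += 1
end

count_hit.call("203.0.113.7")
count_hit.call("203.0.113.7")
count_hit.call("198.51.100.2")
```

The point is only that the callback body is ordinary Ruby, so any Ruby client library can be pulled in at the proxy layer.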
00:16:04.420 mentioned you're sort of dealing at a low level here you know you basically have raw HTTP strings so if you want a
00:16:12.939 higher level interface into HTTP it's sort of is up to you to do that yourself and there's libraries to do that and
00:16:19.269 here's a very quick example of sort of as you get data on data so chunks of
00:16:25.929 data are coming in you pass that to this HTTP parser library and then once it's
00:16:32.190 determined that all the headers have been read and there's other callbacks on the HTTP parser library you can then once the
00:16:38.670 headers are completely read in then I am accessing that user agent header as a
00:16:44.279 ruby hash so this is sort of if you want that higher-level interface into the
00:16:49.769 HTTP requests you sort of have to do that yourself but there are libraries to
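A simplified, hand-rolled version of what such an HTTP parser gives you (the talk uses a parser library; this sketch only handles the happy path, splitting a fully buffered header block into a Ruby hash):

```ruby
# Split a raw request into its request line and a hash of headers,
# keyed by lowercased header name. Returns nil until the blank line
# that terminates the headers has arrived in the buffer.
def parse_request_head(buffer)
  head, sep, _body = buffer.partition("\r\n\r\n")
  return nil if sep.empty?            # headers not complete yet
  request_line, *header_lines = head.split("\r\n")
  headers = header_lines.each_with_object({}) do |line, h|
    name, value = line.split(":", 2)
    h[name.strip.downcase] = value.to_s.strip
  end
  [request_line, headers]
end

raw = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: curl/7.64.1\r\n\r\n"
request_line, headers = parse_request_head(raw)
```

A real parser also handles chunked bodies, header continuations, and malformed input, which is why using a library is the sane choice at this layer.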
00:16:55.230 do that kind of stuff so you might be asking why would I do this you know this sounds sort of like a pain to be dealing
00:17:02.550 with you know things at this low of level and having having to you know deal
00:17:09.390 with stuff like parsing HTTP yourself you know where a lot of us might do web development a lot of us might be used to
00:17:15.089 you know nice high-level frameworks like rails and Sinatra and things like that and so there's a few reasons why you
00:17:21.750 would go down this path a big reason is transparency at this reverse proxy level
00:17:29.250 at this proxy level implementing it this way you are dealing with the raw HTTP
00:17:34.260 request but that gives you a lot more flexibility to sort of pass that request on to the back end in a completely
00:17:40.950 transparent manner so it's not apparent that that there's something in there in
00:17:46.080 between doing something because if you if you try to do this with something like rack or rails or Sinatra by that
00:17:55.200 point by the time your application has been hit the web server has already taken in the HTTP request so it becomes difficult
00:18:02.429 if you then want to recreate that HTTP request to send it to a back-end because
00:18:08.370 by that time you don't really have access to that raw low-level stuff so it can be hard to sort of you sort of have
00:18:14.610 to manually try to reconstruct the request and there's a lot of you know edge cases with HTTP stuff that it makes
00:18:20.880 that hard to deal with so the other
00:18:26.160 reason why you would do this is purely just speed and efficiency you know higher level frameworks are great we use
00:18:31.410 them a lot but at this kind of level you know we really want things to be very fast
00:18:37.320 and very efficient and scalable and EventMachine is very fast and evented
00:18:43.980 systems are very suitable for proxies that's why you see a lot of proxies being built in nodejs this is similar in concept they scale
00:18:51.809 nicely and efficiently and there's a few reasons for that I'll throw up some
00:18:56.970 terribly unscientific benchmarks just to give you a sense
00:19:03.809 I just ran these on my computer there's probably lots of things going on here but the basic thing is that basically I
00:19:11.129 benchmarked making a direct request to a back-end server and then making it through EM-Proxy the EventMachine proxy and
00:19:17.399 then making it through a proxy I just found I didn't know much about it called Rack Reverse Proxy that basically does
00:19:23.429 take a higher-level approach to proxy and you have a lot of nice access to sort of the parse HTTP requests so in
00:19:30.330 this case I mean it's already fast it's one millisecond EM-Proxy adds 0.5
00:19:35.639 milliseconds Rack Reverse Proxy adds almost 3 milliseconds so I mean 3
00:19:40.679 milliseconds maybe not the end of the world who really cares but
00:19:46.200 the picture becomes a little more complicated once you start to get into bigger requests and this is partially
00:19:51.659 determined by your needs and so here's an example of a larger request where
00:19:56.970 there's more data involved and here EM-Proxy adds 150 milliseconds and Rack
00:20:02.820 Reverse Proxy adds 800 milliseconds so why this is the case is that Rack sort
00:20:10.350 of deals with a complete request and a complete response so it sort of takes in the full request then it tries to
00:20:17.129 recreate that and send it to the back-end server gets the response from the backend server and then sends it along to the original client so it sort
00:20:24.870 of has to buffer all of those in memory so you know if you can imagine that
00:20:30.210 you're uploading a 1 gigabyte file through some web service and you need to download 5 gigabytes I mean that's a lot
00:20:35.970 of data but there are cases where this happens with rack reverse proxy that
00:20:41.519 sort of is bottlenecked at that and it would read that into full memory at each step and again there are ways around
00:20:47.070 this but the difference is EM-Proxy deals with things at a chunk level so you're sort
00:20:54.150 of dealing with just chunks of data at a time and you can just stream them on to the backend as fast as you're receiving
00:21:00.270 them so as soon as you have enough information to make the decisions that you want to do
00:21:06.840 inside your custom reverse proxy you can just start streaming that data very quickly and then stream the data back so
00:21:12.660 there isn't that required buffering and so that sort of gets to another aspect of this which is just flexibility of
00:21:19.860 EM-Proxy and in line with that you know it's low level and it's up to you to
00:21:27.180 implement more but it's up to you to decide if you do want to implement something like buffering or if you want
00:21:32.670 to stream everything live and another thing you can use a proxy for is non
00:21:37.710 HTTP things so you could use it for sort of any TCP level thing so you could do
00:21:43.650 this with WebSockets mail servers database protocols you can sort of put
00:21:50.610 it in the middle of anywhere and any of those sort of types of connections and
00:21:56.130 potentially do custom things so what else could you sort of do with these
00:22:01.530 custom reverse proxies I've talked a bit about you know what we did as far as tackling some of our issues with APIs
00:22:08.090 and you know implementing authentication rate limiting analytics and you know I
00:22:14.280 think those are all really good candidates for something that could be implemented at that level you know because you do have to be careful here you know
00:22:19.680 it's not suitable to implement your whole application at this level but it's sort of those high level features that you
00:22:26.700 know you might have potentially diverse backends or even if you don't have diverse backends it could be an interesting different way to structure
00:22:33.180 your application so what else could you do with this I don't really have any
00:22:38.730 concrete examples but I'm just throwing out ideas here you could do error handling so you know you could monitor
00:22:45.000 all requests coming back for a non-success HTTP code and then you could do
00:22:50.280 something with that you could log that you could send emails and you know a lot of us do that within our rails apps but
00:22:56.340 you know if you're maybe dealing with we have legacy Perl applications I'm ashamed to admit but we have to deal with that and you
00:23:04.169 know we don't sometimes have those mechanisms to deal with sort of that error handling in the same way in those
00:23:09.389 applications but you could sort of slip in something that deals with all your errors across all of your applications
00:23:15.809 no matter how they're written and again their pros and cons you might not have access to all the debug details at that level but you could certainly you know
00:23:22.380 be notified about requests that failed you could even do this to sort of manipulate web pages you know say you
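This error-monitoring idea could start as little more than reading the status line of each raw response in an `on_response` callback; a minimal sketch (the helper name is hypothetical):

```ruby
# Pull the three-digit status code out of a raw HTTP response's
# status line and flag anything outside the 2xx range.
def failed_response?(raw_response)
  code = raw_response[/\AHTTP\/\d\.\d (\d{3})/, 1]
  code.nil? || !code.start_with?("2")
end

ok  = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
bad = "HTTP/1.1 502 Bad Gateway\r\n\r\n"
```

Flagged responses could then be logged or emailed, regardless of what language the backend behind the proxy is written in.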
00:23:30.929 wanted to insert your Google Analytics snippet on all of your pages you could actually do that with a custom reverse
00:23:36.779 proxy you could sort of take the response as it's going back sneak something in there you know you could
00:23:42.450 even do crazier stuff with like taking all your JavaScript files compressing
00:23:48.179 them on the fly and then returning it as one javascript file to the client and we do that in rails but again if you have
00:23:54.029 diverse backends and maybe don't have those capabilities you could do that at a higher level you could you know maybe do
00:24:00.059 your templating in your web page template put in a header and a footer I don't know why you'd do it at this level but you
00:24:05.580 could you could also do things like say you had a bunch of JSON API is already
00:24:11.789 and you know maybe you can't touch them for a variety of reasons but you wanted JSONP you could sort of add that at
00:24:17.669 this level and then because JSONP is just sort of a matter of wrapping the existing JSON in a callback you
00:24:24.899 could implement that at this level by altering the response as it goes back and it would apply for all of your existing services you could do things
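Since JSONP is just wrapping the JSON body in the caller's callback, the response-side transformation is nearly a one-liner (the callback name is illustrative; a real proxy would also need to adjust the Content-Length header after changing the body):

```ruby
require "json"

# Wrap an existing JSON response body in a JSONP callback when the
# client asked for one; pass it through untouched otherwise.
def jsonp_wrap(json_body, callback)
  callback ? "#{callback}(#{json_body});" : json_body
end

body    = JSON.generate({"status" => "ok", "count" => 3})
wrapped = jsonp_wrap(body, "handleData")
plain   = jsonp_wrap(body, nil)
```

Applied in the proxy's response callback, every existing JSON API behind it gains JSONP without any backend changes.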
00:24:32.370 like check for security things see if incoming requests look malicious in any way for all of your servers do stuff
00:24:39.330 like that and again you can sort of do more than HTTP with this you can intercept and sort of manipulate other
00:24:46.080 TCP things so you could do email database calls all sorts of fun stuff and there's a lot of great examples in
00:24:52.889 EM-Proxy's GitHub repo there's an examples folder just filled with sort of
00:24:58.080 interesting ideas of what you could sort of do so again I don't want to take credit for any of this stuff this is
00:25:03.600 sort of us just repurposing a lot of this stuff so there are a few things to
00:25:09.929 be aware of when you're building these as I've sort of hinted one of them is
00:25:15.300 buffering so and again I've already sort of talked about this but imagine you have you know one gigabyte upload and a
00:25:21.390 five gigabyte download if your proxy layer buffers that request that becomes sort of a sort of a bottleneck but it
00:25:29.970 also becomes a place where as the request gets uploaded it all has to halt until it's fully uploaded and then it
00:25:35.640 gets sent on to your back-end server and in the case of a one gigabyte upload or a 5 gigabyte download that buffering can add
00:25:41.940 significant delays and so sometimes buffering is desirable though and
00:25:47.010 sometimes you can't achieve stuff without buffering and you know for example a unicorn actually wants
00:25:52.830 buffering to deal with slow clients so it can be advantageous but other times it's not and in our case we're dealing
00:25:59.130 with a diverse set of API is that we don't really know the use case of all those api's we opted not to buffer
00:26:05.490 because we just don't know what all those backends are doing or if they want to stream data and we didn't want to prevent that streaming from happening at
00:26:12.360 our proxy level because we just wanted to be as transparent as possible some other things to be aware of at this
00:26:18.240 reverse proxy layer: if you are going to modify the response going back
00:26:23.520 to the client, you can do that, but it can be a little tricky. In the
00:26:30.960 headers going back to the client there's usually a Content-Length header, and you have to adjust it accordingly if
00:26:37.200 you're going to do any manipulation. So this isn't the easiest way to alter your website, but for
00:26:43.710 certain use cases I think it can be powerful. In line with that, another thing to be careful of is gzipped
00:26:50.160 responses going back to the client. If your back-end server decides to gzip something up and send it back to the client,
00:26:56.280 and you want to alter the response body, it gets tricky, because in that case you do have to
00:27:01.800 buffer: you can't just unzip the individual chunks. You have to get the full response
00:27:08.130 body, buffer it, un-gzip it,
00:27:13.680 do whatever manipulation you wanted, re-gzip it, and send it back to the client.
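A minimal sketch of that buffer, un-gzip, edit, re-gzip cycle, using only Ruby's standard library and including the Content-Length fix mentioned earlier; the URL rewrite is just a stand-in for whatever manipulation you actually need:

```ruby
require 'zlib'
require 'stringio'

# Once the full gzipped body has been buffered, unzip it, edit it,
# re-gzip it, and fix Content-Length to match the new body.
def rewrite_gzipped_body(gzipped, headers)
  plain = Zlib::GzipReader.new(StringIO.new(gzipped)).read
  plain = plain.gsub('http://', 'https://') # stand-in edit
  out = StringIO.new
  gz = Zlib::GzipWriter.new(out)
  gz.write(plain)
  gz.close
  body = out.string
  headers['Content-Length'] = body.bytesize.to_s # must match the new body
  [body, headers]
end

# Build a fake gzipped upstream response and rewrite it.
upstream = StringIO.new
w = Zlib::GzipWriter.new(upstream)
w.write('see http://example.com')
w.close
body, headers = rewrite_gzipped_body(upstream.string, {})
puts Zlib::GzipReader.new(StringIO.new(body)).read # => see https://example.com
```

Note that this only works because the whole body was buffered first; you cannot decompress and re-edit arbitrary mid-stream chunks independently.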
00:27:20.250 Those are just some things to be aware of that we've learned as we've gone through this process. So now I'm gonna
00:27:27.570 talk a little bit about some other stuff. This is all well and good, but I don't
00:27:33.899 know if you're interested in bigger stuff. This digital strategy for the
00:27:40.109 federal government came out earlier this year, on May 23rd, 2012, and I'm
00:27:49.320 involved in the API portion of this digital strategy. A big
00:27:54.600 part of the strategy for the entire federal government is just a web services bonanza. It's like,
00:28:01.200 everybody should be doing web services, web services are great, all agencies should be delivering data to
00:28:08.849 people in the form of web services. And I would tend to agree with that; as app developers, it's
00:28:15.809 always nice to find data out there in an open web format, in an API
00:28:21.299 that's easy to use and get access to. The government has a lot of data, but
00:28:27.720 it's not always in the most easily accessible format. I've seen stuff where it's a printed
00:28:33.539 Excel spreadsheet that's been scanned in and then put into a Word document, and
00:28:38.849 then they just give that to you, and it's like, that's not really providing data. So a
00:28:45.119 big part of the push in this federal strategy is to encourage agencies to
00:28:50.220 develop a lot more APIs and web services, so expect within the coming year to see a lot more
00:28:56.720 government data out there. That's exciting in and of itself, but the portion
00:29:03.599 that I'm involved in is the API part, and the main objectives
00:29:08.820 there: there is this big push for web services, but they want
00:29:14.399 to tackle two things. On the one hand, they want to make it easier for users, like a lot of you, to find and consume
00:29:21.599 federal APIs. There are some out there already, but a lot of times they're not easy to find; you might be
00:29:27.749 interested in some data, but it's just hard to find through bureaucratic government websites.
00:29:35.500 The other aspect is that if they're pushing agencies to develop
00:29:40.990 more APIs, they need to make it easier for agencies to develop and
00:29:47.409 deploy more APIs. If there's this big push for APIs, it just needs to
00:29:53.769 become easier. A lot of agencies are ahead of the game and are already building APIs and doing great work, but some of them
00:30:00.009 need help with this kind of thing. So in a lot of ways this is the exact same problem we had within our
00:30:06.129 organization at NREL, just on a much bigger scale. There are silos of
00:30:12.879 organizations, different agencies that are all doing things independently, but a lot of the stuff that needs to be
00:30:20.139 addressed involves similar issues, so it
00:30:26.049 sort of mirrors our same problems. So what we're looking at right now is basically the same solution: we're
00:30:33.279 currently evaluating using the software stack we developed at NREL
00:30:39.370 to proxy to all the agencies within the federal government, or possibly
00:30:45.580 other solutions that do similar things. We've been talking to a lot of different agencies, and the consensus
00:30:52.629 was that they want something like this. They don't want to have to deal with authentication on their own, and at the federal level and
00:31:00.220 from a user level, they want to make it easy for users to get one API key and be able to access all
00:31:07.120 the federal APIs, so you don't need a bajillion accounts for all the different agencies. So yeah,
00:31:15.549 it's sort of the same issue and perhaps the same solution, and
00:31:21.309 we're currently involved in getting something like this up and running, perhaps within the next six months, for
00:31:27.070 federal agencies to start taking advantage of. So that's all exciting; there's lots of
00:31:35.200 web service action going on in the federal government over the next year, so stay tuned if you're at all interested in that.
00:31:44.630 I'm starting to wind down here, but I have some more slides to go through. So what has all this been about, really?
00:31:51.470 To summarize, what I really want to encourage here is
00:31:56.810 a different way of thinking about some of your architecture. Again,
00:32:02.830 reverse proxies can't solve all problems and are not suitable for all problems, but I think they can be an interesting,
00:32:08.720 different approach that I don't see utilized as much to solve certain problems, because they're fun for the
00:32:16.730 whole family: anything you do within the reverse proxy layer affects all of your back-end applications, so as
00:32:22.670 you start to add new features to your reverse proxy layer, it can be advantageous for everyone. It's an
00:32:28.550 interesting way to abstract things completely outside of the application level, into a higher level
00:32:33.770 that sits in front of all of your applications. And what I want to encourage is that you might be able
00:32:39.470 to do more with reverse proxies than you realize. You might think of a reverse proxy just as software like nginx
00:32:46.190 or HAProxy that just does proxying, without a lot of logic or
00:32:53.570 implementation details happening in there, but since you can write these in Ruby, you can start to leverage all
00:32:59.510 these libraries: you can connect to databases, you can do all sorts of crazy fun stuff at that level. So again,
00:33:05.480 just think about it as a different way to perhaps architect some of your applications. I'll go through
00:33:11.480 some random resources that might be useful. One of them is API Umbrella; this
00:33:18.590 is the full API management system we've built at NREL, and we've just recently,
00:33:24.020 finally, gotten approval to open-source it, so it's all up on GitHub. We're super excited to finally open-source the
00:33:29.390 project, and it includes our custom EventMachine-based proxy. So even if
00:33:36.860 you're not interested in APIs, you might check it out to see what you can do
00:33:46.250 at that reverse proxy level and how you would implement some of those details. It's at github.com/NREL/api-umbrella,
00:33:53.090 and it's a new open-source project for us, so we're sadly behind the times as
00:33:58.300 far as getting it all documented, but definitely reach out to me if you have questions about any of
00:34:03.310 that. As far as just Ruby and EventMachine low-level proxies, there's em-proxy, which I talked about
00:34:10.030 and whose examples I showed; it's a simple, bare-bones, but very capable reverse proxy.
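For reference, a bare-bones em-proxy setup looks roughly like this; the callback names follow em-proxy's README, while the ports and the X-Api-Key header stripping are purely illustrative, not NREL's code:

```ruby
# Pure-Ruby request transform, kept separate so it's easy to test:
# strip a hypothetical X-Api-Key header before forwarding upstream.
STRIP_KEY = ->(data) { data.sub(/^X-Api-Key:[^\r\n]*\r\n/i, '') }

# Call run_proxy to actually start it (requires the em-proxy gem and a
# backend listening on port 3000).
def run_proxy
  require 'em-proxy'
  Proxy.start(host: '0.0.0.0', port: 9889) do |conn|
    conn.server :srv, host: '127.0.0.1', port: 3000  # single backend
    conn.on_data     { |data| STRIP_KEY.call(data) } # edit the request
    conn.on_response { |backend, resp| resp }        # pass response through
    conn.on_finish   { |backend, name| unbind if backend == :srv }
  end
end

puts STRIP_KEY.call("GET / HTTP/1.1\r\nX-Api-Key: abc\r\nHost: x\r\n\r\n")
```

The `on_data` / `on_response` hooks are where features like key validation, rate limiting, or response rewriting would slot in.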
00:34:16.570 There's ProxyMachine, which is actually what our current production system is based on; it's a GitHub project that's
00:34:24.790 simpler and can perhaps be easier to get started with, but the one
00:34:31.390 disadvantage is that it can only act on incoming requests; it doesn't track the response going outward. It
00:34:36.640 just handles the incoming side and then does routing; since it doesn't keep track of the response coming out, you can't
00:34:42.100 track some of the stuff you can in em-proxy, which is why we're currently in the process of switching over to em-proxy. And then
00:34:48.820 there's Goliath, which is another thing that's based on all this EventMachine magic; it's more of a
00:34:57.430 higher-level framework. It's still pretty low-level, but it uses em-synchrony, which uses fibers, so
00:35:04.480 it actually hides all the event stuff from you. That's an interesting project to also check out. As far as
00:35:10.810 general reverse proxies you might be interested in: there are a lot of them out there, and this certainly isn't
00:35:16.570 exhaustive; these are just the ones we use in a variety of capacities. There's HAProxy, which is
00:35:22.500 amazingly fast and scalable; it can do all sorts of fun load balancing stuff, and it acts as a great general proxy.
00:35:29.760 Varnish Cache is an interesting one: it's a reverse proxy, but it's also a caching layer, and that gets back to
00:35:36.490 what I mentioned earlier. We're actually going to be adding this to our stack; we're going to slip it in as another reverse
00:35:42.550 proxy, and the advantage there is that some of our older APIs, which maybe don't have as
00:35:48.640 good caching capabilities as the stuff built into Rails, can start to use the Varnish caching server
00:35:54.550 as a caching layer. Varnish is nice for caching all sorts of stuff. And
00:36:00.220 then there's nginx, which is really more of a web server, but it also has some pretty nice proxying capabilities.
00:36:07.220 It isn't as exhaustive as something like HAProxy, but if you're already using it, it can do quite a bit. And if
00:36:15.710 you happen to be interested in renewable energy APIs, this is the site we built, and this is what
00:36:23.720 this was all about: making one website for users to find all of our APIs, even though they happen to live on
00:36:29.060 all sorts of different servers within our organization. You can check that out at developer.nrel.gov, and
00:36:35.810 there are lots more APIs coming soon; some of my colleagues sitting down here in the front are working hard away at
00:36:41.690 building new APIs, so there's a lot of cool stuff in the pipeline. And those
00:36:47.630 are some contact details for me. I don't really use Twitter, I'm sort of a curmudgeon, but feel free to contact me;
00:36:54.320 maybe I'll start using it. And finally, this is completely off topic and a
00:37:00.109 shameless plug, but if you've been wondering about this ridiculous thing on my upper lip: we just finished our
00:37:08.570 mustache competition at work this week. We have a mustache competition every year for charity, and we haven't met
00:37:17.450 last year's goal, but if you're interested, it's a local Denver-based charity that helps support kids and
00:37:23.750 education resources. If you're interested, it's bitly slash rubies, and you can
00:37:31.310 donate. But anyway, I think there's actually some time for questions, because I was cooking. Yeah?
00:37:42.589 [inaudible audience question] So the question
00:37:56.569 was: how do we deal with inter-service communication? If one
00:38:02.839 web service needs to call another service, does that travel through the reverse proxy as well? The answer is,
00:38:08.209 in our current system, yes. It isn't strictly necessary; on the backend we could definitely communicate
00:38:13.910 directly server to server if we wanted to save the overhead, but I will say this reverse proxy is very fast. In
00:38:21.049 benchmarks, I think it adds about four milliseconds to deal with the rate
00:38:27.109 limiting, analytics, and user authentication, and it utilizes Redis and MongoDB to do all that. So it's fast,
00:38:34.130 and for the time being we haven't really seen any problems with routing those inter-service communications
00:38:39.979 through the proxy, and that gains us some advantages as far as analytics, mainly because we're
00:38:46.279 interested ourselves to know internally how we're using our APIs.
00:38:57.110 So the question was how we deal with authentication. We still use
00:39:03.510 the same approach; our authentication is admittedly very simple. It's all API key based, so it's just a big, long API key,
00:39:10.050 and on the back end we have the ability to remove the rate limits
00:39:15.450 from certain API keys. So basically we set those backend keys up as unlimited API keys; we have API keys that we
00:39:23.070 send back and forth, and that's what we use for authentication, and that identifies each app, again for analytics.
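A hypothetical sketch of that kind of key check: a key store where internal keys are simply flagged unlimited so the proxy never throttles them. Everything here is illustrative; the real system keeps its keys and counters in MongoDB and Redis behind the proxy.

```ruby
# Toy key store; in practice this would be a database lookup.
KEYS = {
  'PUBLIC-KEY-123'   => { rate_limit: 1000 },        # ordinary consumer
  'INTERNAL-KEY-456' => { rate_limit: :unlimited },  # backend-to-backend key
}

# Decide whether a request may proceed, given the caller's key and how
# many requests that key has already made this hour.
def authorize(api_key, requests_this_hour)
  record = KEYS[api_key]
  return [403, 'invalid key'] unless record
  limit = record[:rate_limit]
  return [200, 'ok'] if limit == :unlimited
  requests_this_hour < limit ? [200, 'ok'] : [429, 'over rate limit']
end

puts authorize('INTERNAL-KEY-456', 999_999).inspect # => [200, "ok"]
```

Because the check lives at the proxy, every backend gets authentication and per-key rate limiting without any of them implementing it.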
00:39:43.690 [Question: how do you deal with the fact that the pattern of text you're looking for may be split between chunks?
00:39:50.260 Do you have a facility for reassembling chunks?] Yeah, that's a good question. The first part of that
00:39:56.920 question, which you might have missed, was: since things are coming into this proxy as chunks,
00:40:02.200 if we're doing something like a regular expression, how do we make sure it doesn't span chunks whose data we don't have yet? The answer there is,
00:40:09.359 first of all, I wouldn't really recommend doing regular expression searches, for exactly that reason; that was a
00:40:14.890 very simple example. That is why you would use a higher-level library like
00:40:20.710 that HTTP parser library, which is capable of knowing when all of your
00:40:26.559 headers are read in, so you can take some action on all the headers, or once the whole body is read in. But you're
00:40:32.559 right, in that case you do need to buffer, and that's why, if you're doing anything with the response,
00:40:38.819 there are some ways you can get around it for very specific cases, but for the most part, if you're dealing with the
00:40:44.289 response, you have to buffer. And for incoming requests it's the same deal if you want to do anything with the
00:40:51.339 body.
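The chunk-boundary problem can be seen in a tiny pure-Ruby sketch: a pattern split across two chunks is only found after buffering up to a known boundary (here, the blank line that ends HTTP headers). This is the same bookkeeping an HTTP parser library handles for you; the class and protocol details here are illustrative.

```ruby
# Accumulate chunks until a known boundary appears, then scan the whole
# buffered region at once so no pattern can be missed at a chunk edge.
class HeaderScanner
  def initialize
    @buffer = +''
  end

  # Feed chunks as they arrive; returns the full header block once the
  # terminating blank line has been seen, nil until then.
  def <<(chunk)
    @buffer << chunk
    i = @buffer.index("\r\n\r\n")
    i && @buffer[0, i]
  end
end

s = HeaderScanner.new
p(s << "GET / HTTP/1.1\r\nX-Api-") # => nil (the header is split mid-pattern)
p(s << "Key: abc\r\n\r\n")         # boundary seen: full headers returned
```

Scanning each chunk independently would never match `X-Api-Key`, because the name straddles the two chunks; buffering to the boundary makes the match reliable.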
00:41:02.500 [inaudible audience question] Yep, you can take a look at our GitHub repo; it's probably not as well tested as it should be. So yes, I would say
00:41:11.109 slightly, yes. I would say the testing right now tries to break things down;
00:41:16.180 it doesn't really get involved at the EventMachine level, it
00:41:22.000 just tests the more basic blocks in there. So we're probably lacking tests for the whole
00:41:28.450 thing running as a whole with all the events going crazy. Testing is more of a challenge with the EventMachine
00:41:34.569 stuff.
00:41:47.010 Apache does do it, yeah, and Apache can be used. I would say it's probably not as
00:41:52.990 scalable as a lot of these other solutions. It depends on how you're running Apache;
00:41:58.600 it's been a while since I've used it, but with the worker model versus the prefork model, by
00:42:07.390 default Apache will spin up new processes for every single request, whereas I think all of these other ones,
00:42:14.220 HAProxy and nginx at least, are event-based, so they're a lot lighter weight. So Apache will work; the other
00:42:22.930 ones are possibly just a little more scalable. But again, if you're already using Apache, go for it; I don't
00:42:28.900 want to tell you not to use it.
00:42:37.210 Yes, so we... oh, sorry. The previous question,
00:42:43.999 which I don't know if I repeated, was about using Apache, in case you couldn't pick that up. This question
00:42:49.309 was about how we deploy our proxy and whether we deal with any sort of missed requests
00:42:57.589 or any of that. To deal with deploying: all of our deployment stuff runs
00:43:04.249 through Capistrano, but within that, how we really deal with that issue is we do a
00:43:11.329 rolling restart. We basically run three or four of these
00:43:17.359 processes; it's the same model as nginx, where you run as many as you have cores on your CPU. So we have
00:43:24.920 several of these processes running, and then we do a rolling restart. We do that through a Python
00:43:29.930 project called Supervisor. It's a pretty nice library; it's not really Python-related
00:43:35.390 at all, it just happens to be written in Python, but it's a nice process management library that
00:43:41.599 lets you treat things as process groups. I know a lot of people might use Monit, and I think
00:43:48.349 there are a few others, Bluepill and God; we've been
00:43:54.950 happier with Supervisor. So we basically did a custom
00:44:00.109 implementation of a rolling restart with Supervisor, where we stop one and start another. I think there might be some edge
00:44:06.140 cases where, if somebody's in the middle of a request, I'm not sure we're completely gracefully shutting down right now, but I think part
00:44:12.289 of what ProxyMachine and em-proxy do should handle graceful shutdown, so theoretically that should happen without
00:44:18.589 any lost requests. Any other questions?
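For reference, a Supervisor process group along these lines might look like the following; the program name, command, and ports are illustrative, not NREL's actual config:

```ini
; Four copies of the proxy, managed as one process group.
[program:proxy]
command=ruby proxy.rb --port=90%(process_num)02d
process_name=%(program_name)s_%(process_num)02d
numprocs=4
autorestart=true
```

A rolling restart then becomes restarting one process at a time, e.g. `supervisorctl restart proxy:proxy_00`, waiting for it to come back, then moving to the next.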
00:44:24.160 Yeah, what perfect timing. Well, thank you guys very much; I hope this was okay.
Explore all talks recorded at RubyConf 2012