00:00:14.389
thumbs up excellent okay well it's about time and it looks like it's a pretty packed house so uh
00:00:20.779
wow hopefully I don't disappoint all of you going after Matz is pretty intimidating but uh I guess I'll launch
00:00:28.019
right into it since it's about time to start so I'm here to talk about
00:00:33.140
abstracting features into custom reverse proxies aka making better lemonade from
00:00:38.399
chaos so what am I actually talking about I don't know if that title makes
00:00:43.800
sense to anybody else but what I'm here to talk about are reverse proxies to start with so just to make sure
00:00:49.500
everybody's sort of on the same page with knowledge and you know familiarity with everything I first just want to lay
00:00:56.129
out some basics of you know what a reverse proxy is in case you're not familiar with it basically a reverse proxy is a server
00:01:02.100
that sits in front of your other servers inside your internal network that sort of does stuff it does something in
00:01:10.260
between your main server and the end users on the internet so what kind of
00:01:16.890
stuff does that reverse proxy do a lot of you in the Ruby community might
00:01:22.170
already be familiar with reverse proxies you might have already used one a lot of you might use Unicorn to deploy your
00:01:27.750
apps and in that case you oftentimes put nginx in front of Unicorn and that acts
00:01:33.150
as a reverse proxy so in that case nginx's role is to sort of serve static files and to deal with slow
00:01:39.329
clients so there that's the role of the reverse proxy but what I'm here to talk
00:01:44.369
about is sort of taking that reverse proxy layer and doing other fun things inside of it you can sort of implement
00:01:51.000
your own layer there in Ruby you can implement custom features you can use a
00:01:56.130
EventMachine to do this to build highly scalable reverse proxies and that's a lot of fun so that sort of is the
00:02:04.110
outline of reverse proxies but you might still be confused as to what this is actually all about
00:02:10.800
so first I'm sort of gonna go into the why of why
00:02:15.960
we've sort of built these reverse proxies and why they might be suitable for certain use cases and then also get into
00:02:21.870
a little bit of how you would actually implement this kind of thing so the easiest way to get started is to start
00:02:28.020
with a story and that story is sort of how we're using custom reverse proxies
00:02:33.540
and I think it's perhaps a story that you know some of you might be able to relate to in some way so several years
00:02:40.620
ago we started building a lot of web services and we wanted to expose all of
00:02:48.630
our web services to the world but another part of this story is silos and I'm not talking agricultural silos with
00:02:55.410
delicious coffee or sugar I'm talking about organizational silos so I work at
00:03:00.930
NREL at the National Renewable Energy Lab it's just down the road in Golden so I'm local yeah renewables
00:03:14.950
so NREL is a big company it's uh I think I don't know what the latest count is
00:03:20.150
it's grown a lot it might be around 2,000 employees something like that not all of us are developers there's a lot
00:03:25.520
of very smart scientists there doing very cool research on renewable energy stuff but we do have a lot of sort of
00:03:31.760
separate development groups within our organization so you know our group that
00:03:36.770
I work in you know sort of has historically done vehicle stuff while other groups maybe do stuff related to
00:03:42.860
buildings and another group does stuff related to solar so we sort of have these different departments that all
00:03:48.980
sort of work on different things but we were all sort of wanting to build web services at the same time and sort of present those to the public in a
00:03:56.090
standard way and another way to look at this is because we sort of have this spread of groups and different you know
00:04:04.160
different development groups throughout the lab you know our group happens to do Ruby stuff we have some other groups
00:04:10.250
that do Python stuff we have others that do Java PHP etc so there's a big
00:04:15.260
diversity there so even if you're not working in big organization you might be able to relate to this from the
00:04:20.750
perspective of you might have legacy apps that you deal with you know there might
00:04:27.710
be different languages in your stack so there can be a lot of aspects to your stack even in a smaller company but from
00:04:33.979
our perspective we were dealing with sort of different departments and sort of we wanted to make all of our Web
00:04:39.620
Services sort of come together
00:04:45.169
and we wanted to present those to users as sort of a single offering and really
00:04:50.990
users don't care about this kind of thing I mean they don't care that our group happens to do transportation stuff
00:04:57.200
and another group happens to do solar stuff they're just interested in hey what does NREL have to offer as far as
00:05:03.290
web services go and how can I most easily find them so what we really wanted out of this platform that we were
00:05:09.830
building to expose our web services is we wanted one entry point for users to find all of our APIs so we sort of wanted
00:05:16.280
to bring together all these APIs that were being developed in separate groups in different ways and bring them
00:05:22.880
together and we wanted to make it easy for those users to access the APIs
00:05:29.380
we basically wanted to have the user sign up for one account and then they can access all of our APIs so they don't
00:05:35.419
have to get an account from our transportation group to access the Ruby services and the solar group to access
00:05:41.300
the Python services and so on and then from our perspective on the development
00:05:46.789
side of you know we have all these different groups building web services we wanted sort of API-key-based
00:05:53.509
management to access all of our APIs across the board we sort of decided that's okay for what we're doing we were
00:06:00.229
doing pretty public stuff we wanted rate limiting that was something that a lot of our groups hadn't really explored or
00:06:06.470
implemented within their individual APIs but we wanted that there was a
00:06:13.009
need across the lab and we also wanted analytics again another thing that you know other groups hadn't really tackled
00:06:19.039
and it was something that we really thought we could you know standardize and there would be benefits to doing
00:06:24.229
that so what we didn't want out of this was requiring changes to all of these
00:06:29.240
different groups you know it's a it's a big uphill battle to go into you know I mean the one option is sort of
00:06:35.030
standardizing it for all of our APIs and saying it must be written in Ruby which I wouldn't mind but other people
00:06:40.639
would and so but we didn't want to have that mentality of saying you have to do
00:06:46.130
it this way if you want to be part of our cool club of APIs it's just a big battle to make that change
00:06:52.280
within different groups and for historical reasons you know there are the separate groups we're trying to do
00:06:58.610
better to work closer together but I'm sure you guys can empathize hopefully with sort of those big organizational
00:07:04.430
structures or just again even if you're not in a big organization sort of dealing with legacy apps and
00:07:09.919
sort of this diversity even within your codebase and it's also time consuming to have to go back and touch all these
00:07:16.219
other apps so that's where custom reverse proxies come in to the rescue and so
00:07:22.460
what does this look like so if we look at our diagram again it's at that same
00:07:28.940
stack at that level of the reverse proxy we were able to implement custom features like authentication rate
00:07:36.380
limiting analytics those are the things we decided we needed across all APIs and then they can be shared among
00:07:43.050
all the back-end servers so it sort of just slips in there without having to make any changes on any of the existing
00:07:49.560
APIs that were already built and it implements that common functionality the
00:07:54.870
existing APIs don't have to change they can still exist wherever they wanted to exist and the proxy is just agnostic to
00:08:02.310
what sort of backend technology is happening but it sort of provides that unification that single entry point to
00:08:09.330
all of our APIs that makes it a lot easier to apply a standard authentication scheme across all of our
00:08:15.000
APIs do rate limiting across our APIs and analytics so that sort of is
00:08:24.330
the basics of what we've done so how did this really help us did this really make better lemonade from chaos so from the
00:08:31.710
user perspective I think it did it helped our users we were able to build one website where users can go to and
00:08:38.760
find all of our web services and they only have to sign up for one
00:08:44.159
API key and they can get access to all of our APIs and so they're sort of shielded from sort of the internal you
00:08:50.820
know they don't need to know what department does what and you know we have different web presences
00:08:56.220
throughout the internet on various gov domains so they don't have to know where to find stuff there's just a single
00:09:02.190
website for renewable energy APIs that they can go to and start to find this stuff and it's easy for them to just
00:09:07.709
dive in and start using any of them no matter what all is happening on the backend and for our developers the real
00:09:16.110
advantage is that for old APIs they have to do absolutely nothing you know so we had an existing suite of APIs
00:09:22.920
already out there but they were all sort of all over the place and they did absolutely nothing but just sort of by
00:09:28.680
existing we were able to put them behind this reverse proxy and then we were immediately able to start layering authentication rate limiting and
00:09:35.250
analytics on top of it and the same goes for new APIs when somebody is building
00:09:42.120
a new API those are just things they don't have to worry about they just sort of assume that all of that is taken care
00:09:47.279
of when they're building a new API and it's just outside the scope of it they just think if I'm being accessed I'm
00:09:53.829
assuming the user is fully authenticated and they haven't exceeded rate limits and so on so that's it I think
00:10:01.959
it's also been advantageous for us just from a development perspective speeding up and not having to re-implement those
00:10:07.360
same details and so forth so yeah so it's led to reduced implementation code
00:10:15.069
because individual APIs don't have to implement
00:10:20.889
the same sort of logic over and over again and you know there are definitely ways you can abstract this and there's definitely ways you can you know reuse
00:10:27.910
code in a clean way but it just reduces the need to do any of that there's
00:10:33.850
really no code involved to implement these features at the individual API level there is code obviously at our
00:10:39.850
custom reverse proxy level but that's a little easier to maintain and the nice
00:10:44.860
thing also is that standardization is enforced across the board you know because we could abstract this into some
00:10:50.829
sort of library and things could reuse it but you know we don't run the risk of somebody messing up authentication
00:10:57.639
within their individual API by putting this sort of at a higher level
00:11:02.949
it sort of enforces that that's going to happen so the other advantage is that
00:11:10.509
because of how this operates at the layer that it operates any new features we add to this reverse proxy layer
00:11:16.149
benefit everybody so you know whether or not they're Python services Ruby services PHP services Java services this
00:11:24.009
architecture benefits everyone when we decide to implement new functionality obviously not all functionality is
00:11:29.769
suitable for this kind of thing but it could be a powerful mechanism for certain types of functionality that can
00:11:35.649
be layered like this and another thing is that just reverse proxies in general are a nice scaling mechanism you know a
00:11:42.759
lot of times you just use reverse proxies as load balancers so sort of having this in place and sort of
00:11:49.569
getting everybody on board with this architecture of having reverse proxies upfront allows us a lot more
00:11:55.420
flexibility on the backend to sort of scale things independently so that's
00:12:00.579
sort of the basics of what we did on to how we actually built these things
00:12:06.269
so you know so yeah how would you actually take Ruby and do some custom
00:12:13.050
magic stuff at that layer and do it fast and efficiently so currently we're using
00:12:19.079
em-proxy it's a nice EventMachine proxy library and I certainly can't take
00:12:24.959
credit for any of this stuff I'm about to show code wise we're just users of it but it's open source it's nice it's easy
00:12:31.970
and it's out there so em-proxy is Ruby and EventMachine and if you aren't familiar with EventMachine it's
00:12:38.509
just sort of an event-based system for writing Ruby code in an evented
00:12:45.420
way so if you've heard all the stuff about Node.js it's similar in architecture to that but Ruby so it has
00:12:53.699
some nice advantages of being blazing fast it's also flexible and it's low-level and that low-level aspect has its
00:13:00.149
pros and its cons but I'll get into that in a bit so at a very basic example
00:13:06.089
this is sort of what an em-proxy script looks like so it's pretty basic but you sort
00:13:11.790
of have you set up a server so when you run this script you basically start up a server in this case it would be
00:13:18.360
running on your current server listening on all IPs on port 80 and then in here we say we're going to proxy
00:13:25.079
to the same server 127.0.0.1 on port 81 and really that's all you need to do to
00:13:31.529
do sort of a transparent proxy but the real power of this is these callbacks that you can do so you have
00:13:38.250
on_data on_response on_finish and I think there's even one other one but basically it gives you a lot of flexibility as far
00:13:44.009
as intercepting of chunks of data as they stream through the proxy and doing
00:13:49.199
stuff with that so you can do something with as the data for the request comes
00:13:54.389
in and then you can also do something to the response as it goes out and then you can also do clean up stuff on finish so
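To make that concrete, here is a minimal sketch of the kind of em-proxy script being described, following the shape of the examples in em-proxy's README (the gem, the port numbers, and the `:srv` backend name are assumptions on my part; the wiring lives in a method so nothing starts unless you call it):

```ruby
# Minimal em-proxy sketch: a transparent proxy on port 80 forwarding to
# port 81, with the callbacks described in the talk. Assumes the
# em-proxy gem (https://github.com/igrigorik/em-proxy).
def start_transparent_proxy
  require "em-proxy" # gem install em-proxy

  Proxy.start(:host => "0.0.0.0", :port => 80) do |conn|
    conn.server :srv, :host => "127.0.0.1", :port => 81

    # Called with each chunk of request data; whatever you return is
    # forwarded on to the backend.
    conn.on_data { |data| data }

    # Called with each chunk of response data from a named backend.
    conn.on_response { |backend, resp| resp }

    # Called when a backend connection finishes; good for cleanup.
    conn.on_finish { |backend, name| unbind if backend == :srv }
  end
end
```

The three callbacks are where the custom features described in the rest of the talk get layered in.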
00:14:02.370
as a quick example of what this might look like this is an on_data callback
00:14:08.100
where I'm modifying the user agent so the user agent comes in on the request
00:14:13.709
and basically I'm just doing a search and replace any existing user agent with the user
00:14:19.930
agent of em-proxy and you'll note that I'm doing a search and replace but again
00:14:25.779
you're dealing with sort of chunks of data you're dealing with sort of the raw HTTP at the raw HTTP level here so you
00:14:33.610
have to be a little careful but you'll see here that I'm actually searching for User-Agent and then two newline
00:14:41.769
characters that is how the HTTP headers work so there's some things you have to
00:14:47.290
be aware of it's not quite that simple but the other thing to note is that
00:14:52.420
you're dealing with chunks of data in this case so you're sort of you're getting a stream of data as it comes through in chunks so you can't always
00:14:58.420
assume that you have like say the full request in here so you can't just do a search and replace and assume you have
00:15:03.879
all the data there's other things you sort of have to deal with and I'll get into that in a bit as far as buffering and
00:15:10.420
things if you need to have the full request so that's all well and good but you could do that kind of thing with you
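As a sketch of that on_data user-agent rewrite, here the substitution is pulled into a plain method (names are mine, and the em-proxy wiring is shown uncalled, assuming the gem); note it only handles the case where the whole header happens to arrive in one chunk:

```ruby
# Replace whatever User-Agent the client sent, working on the raw HTTP
# request bytes. Header lines end in \r\n, which is why the pattern
# anchors on that rather than treating the chunk as parsed HTTP.
def rewrite_user_agent(chunk)
  chunk.sub(/User-Agent:[^\r\n]*\r\n/i, "User-Agent: em-proxy\r\n")
end

# Hypothetical wiring (requires the em-proxy gem; not run here). As the
# talk warns, on_data sees chunks, so real code would buffer until the
# end of the headers before doing this substitution.
def start_user_agent_proxy
  require "em-proxy"
  Proxy.start(:host => "0.0.0.0", :port => 80) do |conn|
    conn.server :srv, :host => "127.0.0.1", :port => 81
    conn.on_data { |data| rewrite_user_agent(data) }
  end
end
```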
00:15:16.540
know any sort of reverse proxy or most of them to do sort of a change the user
00:15:21.819
agent that's pretty basic change a header or do something like that at the reverse proxy layer here's something that you
00:15:28.300
know now because this was written in Ruby you can start to tap into all these Ruby libraries so that's the real
00:15:34.629
advantage of sort of this approach and implementing custom stuff so here's an example of I'm setting up I'm connecting
00:15:42.579
to Redis and every time I get a chunk of data I'm incrementing the IP address I
00:15:49.059
have a counter for that IP address in Redis so again this is a little taste of what you can start to do because it is
00:15:55.569
Ruby you can just sort of start to write things and access your Ruby libraries and it can be a lot of fun and as I
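A sketch of that Redis counter idea: the counting logic takes anything with a Redis-style `incr`, so it can be exercised without a Redis server, while the proxy wiring (em-proxy and redis gems, plus EventMachine's `get_peername` for the client address) is shown uncalled and is an assumption on my part:

```ruby
require "socket"

# Bump a per-IP hit counter in a Redis-like store (anything with incr).
def count_hit(store, ip)
  store.incr("hits:#{ip}")
end

# In-memory stand-in for Redis's INCR, handy for testing the logic.
class HashCounter
  def initialize
    @counts = Hash.new(0)
  end

  def incr(key)
    @counts[key] += 1
  end
end

# Hypothetical wiring (em-proxy + redis gems; not run here).
def start_counting_proxy
  require "em-proxy"
  require "redis"
  redis = Redis.new
  Proxy.start(:host => "0.0.0.0", :port => 80) do |conn|
    conn.server :srv, :host => "127.0.0.1", :port => 81
    conn.on_data do |data|
      # get_peername comes from EventMachine::Connection.
      _port, ip = Socket.unpack_sockaddr_in(conn.get_peername)
      count_hit(redis, ip)
      data
    end
  end
end
```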
00:16:04.420
mentioned you're sort of dealing at a low level here you know you basically have raw HTTP strings so if you want a
00:16:12.939
higher level interface into HTTP it sort of is up to you to do that yourself and there's libraries to do that and
00:16:19.269
here's a very quick example of sort of as you get data on data so chunks of
00:16:25.929
data are coming in you pass that to this HTTP parser library and then once it's
00:16:32.190
determined that all the headers have been read and there's other callbacks on the HTTP parser library you can then once the
00:16:38.670
headers are completely read in I'm accessing that user agent header as a
00:16:44.279
Ruby hash so this is sort of if you want that higher-level interface into the
00:16:49.769
HTTP requests you sort of have to do that yourself but there are libraries to
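The talk uses an HTTP parser gem for this; as a self-contained illustration of the same idea (all names here are mine), here is a tiny hand-rolled version that turns a raw request's header block into a Ruby hash once the blank line ending the headers has arrived:

```ruby
# Split a raw HTTP request into its request line and a headers hash.
# Returns nil until the \r\n\r\n terminating the headers has arrived,
# mirroring the "headers complete" callback of a real parser library.
def parse_headers(buffer)
  return nil unless buffer.include?("\r\n\r\n")

  head = buffer.split("\r\n\r\n", 2).first
  request_line, *header_lines = head.split("\r\n")
  headers = {}
  header_lines.each do |line|
    name, _, value = line.partition(":")
    headers[name.strip] = value.strip
  end
  { :request_line => request_line, :headers => headers }
end
```

You would append successive chunks into `buffer`; once this returns non-nil you can read, say, `result[:headers]["User-Agent"]`.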
00:16:55.230
do that kind of stuff so you might be asking why would I do this you know this sounds sort of like a pain to be dealing
00:17:02.550
with you know things at this low of a level and having to you know deal
00:17:09.390
with stuff like parsing HTTP yourself you know where a lot of us might do web development a lot of us might be used to
00:17:15.089
you know nice high-level frameworks like rails and Sinatra and things like that and so there's a few reasons why you
00:17:21.750
would go down this path a big reason is transparency at this reverse proxy level
00:17:29.250
at this proxy level implementing it this way you are dealing with the raw HTTP
00:17:34.260
request but that gives you a lot more flexibility to sort of pass that request on to the back end in a completely
00:17:40.950
transparent manner so it's not apparent that there's something in there in
00:17:46.080
between doing something because if you try to do this with something like Rack or Rails or Sinatra by that
00:17:55.200
point by the time your application has been hit the web server has already taken in the HTTP request so it becomes difficult
00:18:02.429
if you then want to recreate that HTTP request to send it to a back-end because
00:18:08.370
by that time you don't really have access to that raw low-level stuff so it can be hard you sort of have
00:18:14.610
to manually try to reconstruct the request and there's a lot of you know edge cases with HTTP stuff that it makes
00:18:20.880
that hard to deal with so the other
00:18:26.160
reason why you would do this is purely just speed and efficiency you know higher level frameworks are great we use
00:18:31.410
them a lot but at this kind of level you know we really want things to be very fast
00:18:37.320
and very efficient and scalable and EventMachine is very fast and evented
00:18:43.980
systems are very suitable for proxies that's why you see a lot of proxies being built in Node.js this is similar in concept they scale
00:18:51.809
nicely and efficiently and there's a few reasons for that I'll throw up some
00:18:56.970
terribly unscientific benchmarks just to give you a sense
00:19:03.809
I just ran these on my computer there's probably lots of things going on here but the basic thing is that basically I
00:19:11.129
benchmarked making a direct request to a back-end server and then making it through em-proxy the EventMachine proxy and
00:19:17.399
then making it through a proxy I just found and didn't know much about called rack-reverse-proxy that basically does
00:19:23.429
take a higher-level approach to proxying and you have a lot of nice access to sort of the parsed HTTP requests so in
00:19:30.330
this case I mean it's already fast it's one millisecond em-proxy adds 0.5
00:19:35.639
milliseconds rack-reverse-proxy adds almost 3 milliseconds so I mean 3
00:19:40.679
milliseconds maybe not the end of the world who really cares but
00:19:46.200
the picture becomes a little more complicated once you start to get into bigger requests and this is partially
00:19:51.659
determined by your needs and so here's an example of a larger request where
00:19:56.970
there's more data involved and here em-proxy adds 150 milliseconds and rack
00:20:02.820
reverse proxy adds 800 milliseconds so why this is the case is that rack sort
00:20:10.350
of deals with a complete request and a complete response so it sort of takes in the full request then it tries to
00:20:17.129
recreate that and send it to the back-end server gets the response from the backend server and then sends it along to the original client so it sort
00:20:24.870
of has to buffer all of those in memory so you know if you can imagine that
00:20:30.210
you're uploading a 1 gigabyte file through some web service and you need to download 5 gigabytes I mean that's a lot
00:20:35.970
of data but there are cases where this happens with rack reverse proxy that
00:20:41.519
sort of is bottlenecked at that and it would read that into full memory at each step and again there are ways around
00:20:47.070
this but the difference is em-proxy deals with things at a chunk level so you're sort
00:20:54.150
of dealing with just chunks of data at a time and you can just stream them on to the backend as fast as you're receiving
00:21:00.270
them so as soon as you have enough information to make the decisions that you want to do
00:21:06.840
inside your custom reverse proxy you can just start streaming that data very quickly and then stream the data back so
00:21:12.660
there isn't that required buffering and so that sort of gets to another aspect of this which is just the flexibility of em-
00:21:19.860
proxy and in line with that you know it's low level and it's up to you to
00:21:27.180
implement more but it's up to you to decide if you do want to implement something like buffering or if you want
00:21:32.670
to stream everything live and another thing you can use a proxy for is non
00:21:37.710
HTTP things so you could use it for sort of any TCP level thing so you could do
00:21:43.650
it with WebSockets mail servers database protocols you can sort of put
00:21:50.610
it in the middle of anywhere and any of those sort of types of connections and
00:21:56.130
potentially do custom things so what else could you sort of do with these
00:22:01.530
custom reverse proxies I've talked a bit about you know what we did as far as tackling some of our issues with APIs
00:22:08.090
and you know implementing authentication rate limiting analytics and you know I
00:22:14.280
think those are all really good candidates for something that could be implemented at that level you know because you do have to be careful with this you know
00:22:19.680
it's not really suitable to implement your whole application at this level but it's sort of those high level features that you
00:22:26.700
know you might have potentially diverse backends or even if you don't have diverse backends it could be an interesting different way to structure
00:22:33.180
your application so what else could you do with this I don't really have any
00:22:38.730
concrete examples but I'm just throwing out ideas here you could do error handling so you know you could monitor
00:22:45.000
all requests coming back for a non-success HTTP code and then you could do
00:22:50.280
something with that you could log that you could send emails and you know a lot of us do that within our rails apps but
00:22:56.340
you know if you're maybe dealing with we have legacy Perl applications I'm ashamed to admit but we have to deal with that and you
00:23:04.169
know we sometimes don't have those mechanisms to deal with sort of that error handling in the same way in those
00:23:09.389
applications but you could sort of slip in something that deals with all your errors across all of your applications
00:23:15.809
no matter how they're written and again there are pros and cons you might not have access to all the debug details at that level but you could certainly you know
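A sketch of that error-monitoring idea (helper names are mine): watch the status line of each response as it streams back and flag non-success codes, which is enough to log or send a notification without touching any backend:

```ruby
# Pull the status code out of the first chunk of a raw HTTP response.
def response_status(resp_chunk)
  m = resp_chunk.match(%r{\AHTTP/\d\.\d (\d{3})})
  m && m[1].to_i
end

# Treat 4xx/5xx as failures worth notifying about.
def failed_response?(resp_chunk)
  status = response_status(resp_chunk)
  !status.nil? && status >= 400
end
```

In an on_response callback you would call `failed_response?` on the first chunk of each response and log or email when it returns true.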
00:23:22.380
be notified about requests that failed you could even do this to sort of manipulate web pages you know say you
00:23:30.929
wanted to insert your Google Analytics snippet on all of your pages you could actually do that with a custom reverse
00:23:36.779
proxy you could sort of take the response as it's going back sneak something in there you know you could
00:23:42.450
even do crazier stuff with like taking all your JavaScript files compressing
00:23:48.179
them on the fly and then returning it as one JavaScript file to the client and we do that in Rails but again if you have
00:23:54.029
diverse backends and maybe don't have those capabilities you could do that at a higher level you could you know maybe do
00:24:00.059
your templating in your web page template put in a header and a footer I don't know why you'd do it at this level but you
00:24:05.580
could you could also do things like say you had a bunch of JSON APIs already
00:24:11.789
and you know maybe you can't touch them for a variety of reasons but you wanted JSONP you could sort of add that at
00:24:17.669
this level and because JSONP is just sort of a matter of wrapping an existing JSON response in a callback you
00:24:24.899
could implement that at this level by altering the response as it goes back and it would apply for all of your existing services you could do things
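A sketch of that JSONP wrapping (method name mine): the body rewrite itself is one line, with a guard on the callback name since it comes from the query string; a real proxy would also have to fix up Content-Length and Content-Type on the way out:

```ruby
# Wrap an existing JSON response body in a JSONP callback.
def jsonp_wrap(json_body, callback)
  # Whitelist the callback name -- it arrives via ?callback=, so don't
  # let arbitrary script through.
  raise ArgumentError, "bad callback name" unless callback =~ /\A[\w.]+\z/

  "#{callback}(#{json_body});"
end
```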
00:24:32.370
like check for security things see if incoming requests look malicious in any way for all of your servers do stuff
00:24:39.330
like that and again you can sort of do more than HTTP with this you can intercept and sort of manipulate
00:24:46.080
other TCP things so you could do email database calls all sorts of fun stuff and there's a lot of great examples in
00:24:52.889
em-proxy's GitHub repo there's an examples folder just filled with sort of
00:24:58.080
interesting ideas of what you could sort of do so again I don't want to take credit for any of this stuff this is
00:25:03.600
sort of us just repurposing a lot of this stuff so there are a few things to
00:25:09.929
be aware of when you're building these as I've sort of hinted one of them is
00:25:15.300
buffering so and again I've already sort of talked about this but imagine you have you know one gigabyte upload and a
00:25:21.390
five gigabyte download if your proxy layer buffers that request that becomes sort of a bottleneck but it
00:25:29.970
also becomes a place where as the request gets uploaded it all has to halt until it's fully uploaded and then it
00:25:35.640
gets sent on to your back-end server and in the case of a one gigabyte upload or a 5 gigabyte download that buffering can add
00:25:41.940
significant delays and so sometimes buffering is desirable though and
00:25:47.010
sometimes you can't achieve stuff without buffering and you know for example Unicorn actually wants
00:25:52.830
buffering to deal with slow clients so it can be advantageous but other times it's not and in our case we're dealing
00:25:59.130
with a diverse set of APIs and we don't really know the use case of all those APIs we opted not to buffer
00:26:05.490
because we just don't know what all those backends are doing or if they want to stream data and we didn't want to prevent that streaming from happening at
00:26:12.360
our proxy level because we just wanted to be as transparent as possible some other things to be aware of at this
00:26:18.240
reverse proxy layer is that if you are going to modify the response going back
00:26:23.520
to the client you can do that but it could be a little tricky in the
00:26:30.960
header going back to the client there's usually a Content-Length header and you have to adjust it accordingly if
00:26:37.200
you're going to do any manipulation so again this isn't the easiest way to perhaps alter your website but for
00:26:43.710
certain use cases I think it can be powerful and in line with that another thing to be careful of is gzipped
00:26:50.160
responses going back to the client so if your back-end server decides to gzip something up send it back to the client
00:26:56.280
if you want to alter the response body in that case it gets tricky again because you sort of you do have to
00:27:01.800
buffer in that case because you sort of have to get the full gzip stream you can't just unzip
00:27:08.130
the different chunks you have to get the full response body buffer it un-gzip it
00:27:13.680
do whatever you were wanting to manipulate and then re-gzip it and send it back to the client so those are
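That gzip caveat can be sketched with Ruby's stdlib zlib (method name mine): buffer the whole compressed body, gunzip, edit, re-gzip, and remember the rewritten response needs a Content-Length matching the new compressed size:

```ruby
require "zlib"
require "stringio"

# Buffer a full gzipped body, edit the decompressed text, and re-gzip.
def rewrite_gzipped_body(gzipped, pattern, replacement)
  body = Zlib::GzipReader.new(StringIO.new(gzipped)).read
  body = body.gsub(pattern, replacement)

  out = StringIO.new
  gz = Zlib::GzipWriter.new(out)
  gz.write(body)
  gz.close # finalizes the gzip stream; out.string remains readable

  new_body = out.string
  # The proxied headers would then need Content-Length: new_body.bytesize
  # before sending it on to the client.
  new_body
end
```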
00:27:20.250
just some things to be aware of that we've just sort of learned as we've gone through this process so now I'm gonna
00:27:27.570
talk a little bit about some other stuff so this is all well and good but I don't
00:27:33.899
know if you're interested in bigger stuff so this digital strategy for the
00:27:40.109
federal government came out earlier this year May 23rd 2012 apparently and so I'm
00:27:49.320
involved in the API portion of sort of some of this digital strategy and a big
00:27:54.600
part of this digital strategy for the entire federal government is just a web services bonanza I mean it's just like
00:28:01.200
everybody should be doing Web Services web services are great all agencies should be you know delivering data to
00:28:08.849
people in the form of web services and I would tend to agree with that I mean I think you know as app developers it's
00:28:15.809
always nice to find data out there it's just in an open web format in an API
00:28:21.299
that's easy to use and get access to now the government has a lot of data but
00:28:27.720
it's not always in the most easily accessible format you know I've seen stuff where it's like it's a printed
00:28:33.539
Excel spreadsheet it's been scanned in and then put into a Word document and
00:28:38.849
then they just give that to you and it's like I'm providing data not really so a
00:28:45.119
big part of this push for this federal strategy is to encourage agencies to
00:28:50.220
develop a lot more api's and web services so expect that sort of within the coming year to see a lot more
00:28:56.720
government data out there so that's exciting and of itself but the portion
00:29:03.599
that I'm involved in is sort of the API thing and sort of the main objectives
00:29:08.820
there is you know there is this big push for web services but they sort of want
00:29:14.399
to tackle two things on the one hand they want to make it easier for users like a lot of you to find and consume
00:29:21.599
federal APIs there are some out there already but a lot of times they're not easy to find you know you might be
00:29:27.749
interested in some data but it's just hard to find through the bureaucratic websites that are government websites
00:29:35.500
and the other aspect of this is that if they are pushing agencies to develop
00:29:40.990
more APIs they need to make it easier for agencies to develop and
00:29:47.409
deploy more APIs so you know if there's this big push for APIs it just needs to
00:29:53.769
become easier a lot of agencies are ahead of the game and they're already building APIs that are doing great work but some of them
00:30:00.009
need help with this kind of thing so in a lot of ways this is the exact same problem we had within our
00:30:06.129
organization at NREL just on a much bigger scale you know there are silos of
00:30:12.879
organizations there's different agencies that are all doing things independently but a lot of the stuff that needs to be
00:30:20.139
addressed are similar issues so it
00:30:26.049
sort of mirrors our same problems so what we're looking at right now is sort of the same solution we're
00:30:33.279
currently evaluating using basically our software stack that we developed at NREL
00:30:39.370
to sort of proxy to all the agencies within the federal government or possibly
00:30:45.580
other solutions that sort of do similar things but we've been talking to a lot of different agencies and the consensus
00:30:52.629
was that they want something like this they don't want to have to deal with authentication on their own and from the federal level and
00:31:00.220
from a user level they want to make it easy for users to just get one API key and to be able to generally access all
00:31:07.120
the federal APIs so you don't need to have a bajillion accounts for all the different agencies so yeah I mean
00:31:15.549
it's sort of the same issue and perhaps the same solution and so yeah
00:31:21.309
we're currently involved in getting something like this up and running perhaps within the next six months for
00:31:27.070
federal agencies to start taking advantage of so that's all exciting there's lots of
00:31:35.200
web service action going on in the federal government over the next year so stay tuned if you're at all interested in all that so
00:31:44.630
I'm starting to wind down here but I have some more slides to go through so what has all this been about really so
00:31:51.470
to sort of summarize you know what I really want to encourage here is sort of
00:31:56.810
a different way of perhaps thinking about some of your architecture again
00:32:02.830
reverse proxies can't solve all problems they're not suitable for all problems but I think they can be an interesting
00:32:08.720
different approach that I don't see utilized as much to solve certain problems because they're fun for the
00:32:16.730
whole family you know anything you do within the reverse proxy layer affects all of your back-end applications so as
00:32:22.670
you start to add new features to your reverse proxy layer it can be advantageous for everyone so it's an
00:32:28.550
interesting way to sort of abstract things completely outside of the application level into a higher level
00:32:33.770
that sits in front of all of your applications and you know what I want to encourage you here is you might be able
00:32:39.470
to do more with reverse proxies than you realize you might think of a reverse proxy just as you know software like nginx
00:32:46.190
or HAProxy that sort of just does proxying where there's not a lot of logic that can happen in there or rather like
00:32:53.570
implementation details that can happen in there but since you can write these in Ruby you can start to leverage all
00:32:59.510
these libraries you can connect to databases you can do all sorts of crazy fun stuff at that level so yes again
00:33:05.480
just sort of think about it as a different way to perhaps architect some of your applications so I'll go through
00:33:11.480
some just random resources that might be useful one of them is API Umbrella this
00:33:18.590
is our full API management system we've built at NREL and we've just recently
00:33:24.020
finally got approval to open-source it so it's all up on GitHub we're super excited to finally open-source the
00:33:29.390
project and it includes our custom EventMachine-based proxy so you know even if
00:33:36.860
you're not interested in APIs you might check it out to see what you can do
00:33:46.250
at that reverse proxy level and how you would implement some of those details so it's github.com/NREL/api-umbrella
00:33:53.090
and it's a new open-source project for us so we're sadly behind the times as
00:33:58.300
far as getting it all documented and everything but definitely reach out to me if you have questions about any of
00:34:03.310
that as far as just Ruby and EventMachine low-level proxies there's em-proxy which I talked about
00:34:10.030
and those are the examples I showed it's just sort of a simple bare-bones but very capable reverse proxy
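The em-proxy style described here is callback-based: you register blocks that run as request and response data flows through the connection. As a rough plain-Ruby sketch of the pattern (the class below is hypothetical and uses neither EventMachine nor the real gem), you accumulate chunks until the headers are complete and then hand them to a callback:

```ruby
# Plain-Ruby simulation of the em-proxy callback style: a tiny stand-in
# class (hypothetical, not the real gem) that buffers incoming chunks
# until the HTTP header block is complete, then lets a callback rewrite it.
class FakeProxyConnection
  def initialize
    @buffer = +""       # mutable string to accumulate network chunks
    @on_headers = nil
  end

  # Register a callback to run once the full header block has arrived.
  def on_headers(&block)
    @on_headers = block
  end

  # Feed one network chunk in; returns nil until the headers are complete,
  # then returns the (possibly rewritten) headers plus any body bytes.
  def receive_chunk(chunk)
    @buffer << chunk
    split = @buffer.index("\r\n\r\n")
    return nil unless split   # still waiting for the blank line

    headers = @buffer[0...split]
    rest    = @buffer[(split + 4)..-1]
    rewritten = @on_headers ? @on_headers.call(headers) : headers
    "#{rewritten}\r\n\r\n#{rest}"
  end
end

conn = FakeProxyConnection.new
conn.on_headers { |h| h.sub("/old-path", "/new-path") }

conn.receive_chunk("GET /old-path HTTP/1.1\r\n")          # headers incomplete: nil
result = conn.receive_chunk("Host: example.com\r\n\r\n")  # headers complete
# result => "GET /new-path HTTP/1.1\r\nHost: example.com\r\n\r\n"
```

The real em-proxy wires callbacks like this into EventMachine's non-blocking sockets, but the buffer-then-act idea is the same.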
00:34:16.570
there's ProxyMachine which is actually what our current production system is based on it's a GitHub project it's
00:34:24.790
simpler and it can be easier to get started with perhaps but the one
00:34:31.390
disadvantage is it can only act on incoming requests it doesn't track the response going outwards so it sort of
00:34:36.640
just does incoming stuff and then does routing it doesn't keep track of the response coming out so you can't keep
00:34:42.100
track of some of the stuff you can in em-proxy so we're currently in the process of switching over to em-proxy and then
00:34:48.820
there's Goliath which is another thing that's based on all this EventMachine magic and it's more of a sort of
00:34:57.430
higher-level framework it still is pretty low-level but it then uses EM-Synchrony which uses some fiber stuff so
00:35:04.480
it actually hides all the event stuff from you so that's an interesting project to also check out as far as just
00:35:10.810
general reverse proxies you might be interested in there's a lot of them out there this certainly isn't an
00:35:16.570
exhaustive list these are just sort of the ones we use in a variety of capacities there's HAProxy HAProxy is
00:35:22.500
amazingly fast and scalable and it can do all sorts of fun load balancing stuff and it acts as a great general proxy
00:35:29.760
Varnish Cache is an interesting one it's a reverse proxy but it also is a caching layer and
00:35:36.490
we're actually going to be adding this to our stack we're gonna slip it in as another reverse
00:35:42.550
proxy and then the advantage there is that some of our older APIs that maybe don't have as
00:35:48.640
good caching capabilities as the stuff built into Rails can start to use the Varnish caching server
00:35:54.550
as a caching layer and Varnish is nice just for caching all sorts of stuff and
00:36:00.220
then there's nginx which is really more of a web server but it also has some pretty nice proxying capabilities
00:36:07.220
it isn't as exhaustive as something like HAProxy but if you're already using it it can do quite a bit of stuff and so if
00:36:15.710
you happen to be interested in renewable energy APIs this is our site that we built this is what
00:36:23.720
this was all about sort of making one website for users to find all of our APIs even though they happen to be on
00:36:29.060
all sorts of different servers within our organization and so you can check that out at developer.nrel.gov and
00:36:35.810
there's lots more APIs coming soon some of my colleagues sitting down here in the front are working hard away at
00:36:41.690
building new APIs so there's a lot of cool stuff in the pipeline and so those
00:36:47.630
are some contact details for myself I don't really use Twitter I'm sort of a curmudgeon but feel free to contact me
00:36:54.320
maybe I'll start using it and finally this is completely off topic and a
00:37:00.109
shameless plug but if you've been wondering about this ridiculous thing on my upper lip we just finished our
00:37:08.570
mustache competition at work this week and we have a mustache competition every year for charity and so we haven't met
00:37:17.450
last year's goal but if you're interested it's a local Denver-based charity to help support kids and
00:37:23.750
education resources so if you're interested bitly slash rubies - you can
00:37:31.310
donate but anyway I think there's actually some time for some questions because I was quick yeah
00:37:42.589
[inaudible audience question]
00:37:56.569
was how do we deal with sort of inter-service communication you know so if one
00:38:02.839
web service needs to call another service does that travel through the reverse proxy as well and the answer is
00:38:08.209
in our current system yes it isn't strictly necessary on the backend we definitely could communicate
00:38:13.910
directly server to server if we wanted to save the overhead but I will say this reverse proxy is very fast I mean in
00:38:21.049
benchmarks I think it adds like four milliseconds to deal with the rate
00:38:27.109
limiting and analytics and user authentication and that utilizes Redis and MongoDB to do all that so it's fast
00:38:34.130
so for the time being we haven't really seen any problems with routing those sort of inner service communications
00:38:39.979
through the proxy and that sort of just gains us some advantages as far as analytics mainly just because we are
00:38:46.279
interested ourselves to know internally how we're using our API
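For illustration, the rate limiting described here could be sketched as a fixed-window counter. This is not API Umbrella's actual implementation, and a plain Hash stands in for the Redis counters it mentions (with real Redis you would INCR a key like this and EXPIRE it):

```ruby
# Fixed-window rate limiter sketch (illustrative only, not API Umbrella's
# actual code). A Hash stands in for Redis counters keyed per API key
# per time window.
class RateLimiter
  def initialize(limit:, window_seconds: 60)
    @limit = limit
    @window = window_seconds
    @counters = Hash.new(0)   # stand-in for Redis counters
  end

  # Returns true if this request is allowed, false once over the limit
  # for the current time window.
  def allow?(api_key, now = Time.now.to_i)
    key = "rate:#{api_key}:#{now / @window}"  # window id buckets requests
    @counters[key] += 1
    @counters[key] <= @limit
  end
end

limiter = RateLimiter.new(limit: 3, window_seconds: 60)
limiter_results = 4.times.map { limiter.allow?("abc123", 1_000_000) }
# => [true, true, true, false]
```

A lookup plus increment like this is cheap, which fits the few-milliseconds overhead mentioned above.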
00:38:57.110
so the question was how to deal with authentication then yeah so we still use
00:39:03.510
the same system our authentication is admittedly very simple it's all API key based so it's just a big long API key
00:39:10.050
and then on the back end we have capabilities to remove the rate limits
00:39:15.450
from certain API keys so basically we set up those backend services with unlimited API keys so we have API keys that we
00:39:23.070
send back and forth and that's what we use for authentication and that sort of identifies each app so again for analytics yep
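A minimal sketch of that API-key scheme: each key identifies an app, and certain keys are flagged unlimited so the proxy can skip rate limiting for internal services. The field names and keys below are made up (the real system stores key records in MongoDB):

```ruby
# Sketch of API-key lookup as described: keys identify the calling app,
# and some keys are flagged unlimited so internal services bypass rate
# limits. Field names and keys are hypothetical.
KEYS = {
  "a" * 40 => { app: "public-dashboard", unlimited: false },
  "b" * 40 => { app: "internal-service", unlimited: true  },
}

def authenticate(api_key)
  record = KEYS[api_key]
  return { allowed: false } unless record

  { allowed: true, app: record[:app], skip_rate_limit: record[:unlimited] }
end

authenticate("b" * 40)  # => {allowed: true, app: "internal-service", skip_rate_limit: true}
authenticate("nope")    # => {allowed: false}
```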
00:39:43.690
well how do you deal with the fact that the pattern of the text you're looking for may be split between chunks
00:39:50.260
do you have a facility for buffering chunks yeah that's a good question so for the first part of that
00:39:56.920
question that you might have missed was how do we deal with sort of since things are coming into this proxy as chunks and
00:40:02.200
you know if we're doing something like a regular expression how do we make sure that it doesn't span chunks and we don't have that data so the answer there is
00:40:09.359
first of all I wouldn't really recommend doing regular expression searches sort of for that reason that was sort of a
00:40:14.890
very simple example but that is why you would use a higher-level library like
00:40:20.710
that http_parser library that then is capable of knowing when all of your
00:40:26.559
headers are read in and then you can take some sort of action on all the headers or once the whole body is read in but you're
00:40:32.559
right I mean in that case you do sort of need to buffer and that's why if you're doing anything with the response
00:40:38.819
there are some ways you can get around it for very specific cases but for the most part if you're dealing with the
00:40:44.289
response you have to buffer and then with incoming requests it's sort of the same deal if you want to do anything with the
00:40:51.339
body yeah

[inaudible audience question]

yep you can take a look at
00:41:02.500
our GitHub repo it's probably not as well tested as it should be so yes there are some I would say
00:41:11.109
slightly yes I mean the testing right now tries to break things down
00:41:16.180
it doesn't really get involved at the EventMachine sort of level it sort
00:41:22.000
of is just testing the more basic blocks in there so yeah we're probably lacking on tests for the whole
00:41:28.450
thing running as a whole with all the events going crazy so yeah testing is more of a challenge with the EventMachine
00:41:34.569
stuff
00:41:47.010
Apache does do it yeah and Apache can be used I would say it's probably not as
00:41:52.990
scalable as a lot of these other solutions it depends on how you're running Apache whether or not you're
00:41:58.600
sort of it's been a while since I've used Apache but the worker model versus the prefork model by
00:42:07.390
default Apache will sort of spin up new processes for every single request whereas I think all of these other ones
00:42:14.220
HAProxy and nginx at least are event-based so they're a lot lighter weight so Apache will work the other
00:42:22.930
ones possibly just a little more scalable but again if you're already using Apache go for it I mean I don't
00:42:28.900
want to tell you not to use it
00:42:37.210
yes so oh sorry yeah the previous question
00:42:43.999
I don't know if I repeated it was about using Apache if you couldn't pick that up this question
00:42:49.309
was about how we deploy our proxy and if we deal with any sort of missed requests
00:42:57.589
or any of that so to deal with deploying all of our deployment stuff runs
00:43:04.249
through Capistrano but within that how we're really deploying and dealing with that issue is we basically do a
00:43:11.329
rolling restart all of these we basically run I think three of these
00:43:17.359
processes or four of these processes it's sort of the same model as nginx where you run as many as you know cores you have on your CPU so we have
00:43:24.920
you know several of these processes running and then we do a rolling restart we do that through a Python
00:43:29.930
project called Supervisor it's a pretty nice library it's not really Python
00:43:35.390
related at all it just happens to be written in Python but it's a nice process management library that lets you
00:43:41.599
deal with sort of treating things as process groups I know a lot of people might use like Monit and I think
00:43:48.349
there's some others like Bluepill or God but we've been I think probably more happy with
00:43:54.950
Supervisor so we basically did a custom
00:44:00.109
implementation of a rolling restart with Supervisor so we stop one and start another I think there might be some edge
00:44:06.140
case stuff where if somebody's in the middle of a request I'm not sure if we're completely gracefully shutting down right now but I think actually part
00:44:12.289
of ProxyMachine and em-proxy should handle graceful shutdown so theoretically that should happen without
00:44:18.589
any lost requests any other questions
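The rolling-restart setup described in that answer, with several proxy processes managed as a group by Supervisor, might be configured along these lines (program name, command, and ports are hypothetical, not the actual NREL config):

```ini
; Hypothetical supervisord sketch: several proxy processes managed as one
; group, so a deploy script can restart them one at a time.
[program:proxy]
command=ruby proxy.rb --port=90%(process_num)02d
process_name=proxy-%(process_num)02d
numprocs=4
autorestart=true

[group:proxies]
programs=proxy
```

A deploy script can then restart the processes one at a time, e.g. `supervisorctl restart proxies:proxy-00`, waiting for each to come back before moving on to the next.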
00:44:24.160
yeah what perfect timing well thank you guys very much I hope this was okay