00:00:16.960
Hi everyone, thank you for coming here this evening. I'm actually surprised to see so many
00:00:23.359
people at five in the evening, in the last session, but thank you for coming. Our talk is going to be
00:00:29.359
about Rails services in the walled garden. There are two of us presenting here. This is Niranjan, my
00:00:35.840
colleague; he's achamian on GitHub and niranjan_p on Twitter. And this is me, Sidu; I'm kaiwren on
00:00:43.200
GitHub and ponnappa on Twitter. We both work at C42 Engineering,
00:00:49.280
a small Ruby consultancy based out of Bangalore, India.
00:00:54.640
Let's get started with a little bit of background about this talk, just to set some context.
00:01:01.039
This talk is based primarily on our work building a suite of nine-plus
00:01:06.320
services at a fairly large business. These services were a bunch of Rails
00:01:11.600
applications that today automate entire data centers; this is one of the largest hosting companies in the world,
00:01:17.040
so it's probably one of the biggest deployments of Rails services that
00:01:22.560
I've had a chance to work on. We've built a bunch of other stuff, but this was the
00:01:28.479
most interesting, because we were replacing a monolithic legacy application
00:01:34.479
that was causing a lot of pain to the business, was pretty old, and was mission critical. Since then we've done other things,
00:01:40.960
a lot of API work, but I don't think any of it has come close to the scale and complexity of that
00:01:47.119
particular project. But we've done a bunch of work with two or three APIs at that scale, so
00:01:53.439
almost without intending to, we became something of a focus group for APIs back home.
00:02:01.360
So this talk is about Rails and service-oriented architectures in the walled garden. Let me take
00:02:08.160
these things in bits and pieces, break them down, and talk about why they're relevant.
00:02:14.800
But I think the single most important thing here is: why do we say walled garden? By
00:02:20.879
walled garden I mean services that run within a business and are not consumed by people
00:02:26.720
outside the business. This allows us a certain amount of leeway in how we structure and write these APIs, so
00:02:34.080
that we save on multiple things, primarily effort and cost. We don't have to go the full REST,
00:02:40.640
HATEOAS, full-blown consumer-API way, because we have far more control.
00:02:45.840
So these are the assumptions we make. There is a certain structure to
00:02:52.400
this talk as well: it is going to be structured around the problems and the pain points we ran into when
00:02:58.239
building APIs inside the walled garden. So I think the first question is:
00:03:04.400
why? Okay, so let's quickly summarize the advantages of using SOA versus building
00:03:12.319
a monolithic application. First of all, it allows us to break our application up along the multiple business verticals we have
00:03:18.800
in our organization, where an individual service provides an API consumed by that vertical,
00:03:25.040
while also providing integration points for other services, enabling organization-wide workflows.
00:03:31.519
Every application, or rather every service, is self-contained, which means that whatever development I'm doing in that
00:03:38.000
service doesn't cause ripple effects across all the other services, as long as my API remains constant. The way
00:03:45.040
this helps is that whenever we are building a complex workflow for a particular vertical, we don't have to worry about
00:03:51.440
all the other business domains which we don't understand or don't care about. Essentially it means that we can
00:03:57.360
independently evolve a particular vertical while allowing other verticals to integrate with it.
00:04:02.799
It also allows us to independently deploy a particular service. If, for example, I'm working on one
00:04:08.640
particular business vertical and I'm done in two weeks, but there is this other service which is being
00:04:14.959
developed and takes four weeks, I need not wait four weeks before I can deploy my service. I just have to make
00:04:20.239
sure that I'm not breaking the API, and I can push things to production as soon as I'm done with them.
00:04:26.800
It also allows us to monitor which services are being hit more frequently,
00:04:32.720
and scale those services instead of scaling the entire suite of services. They are easier to maintain:
00:04:40.639
every codebase is small, and of course each codebase is independent, so the code quality is generally higher.
00:04:46.080
The bigger your codebase gets, the messier it becomes and the more broken windows it will have, so this definitely helps.
00:04:52.400
We all know that smaller teams working on individual codebases are much better than one massive team of, say, 8, 10,
00:05:00.240
15, or 20 people working on a shared codebase. So these are all advantages, and the list
00:05:06.880
continues. But there are some disadvantages to building a
00:05:12.320
service-oriented architecture. So what are these disadvantages? First and foremost, services like to talk
00:05:17.680
a lot. They talk to each other, and this can lead to performance bottlenecks, because every service call
00:05:24.080
you make comes with its own HTTP and framework overheads, which
00:05:30.800
we cannot avoid. We have to provide some sort of transparent authorization and authentication across all services; we
00:05:37.360
cannot just do authorization per service, because we want to implement workflows which span multiple services.
00:05:44.880
It is difficult to implement database transactions, i.e. ACID transactions, across distributed databases; when it
00:05:50.960
comes to distributed services it's an even bigger problem, because we have to manage it over HTTP.
00:05:58.319
Then, any time we talk about building services, we always talk about how we are going to version the
00:06:03.440
API, and how to make sure that our APIs are backward compatible: if there
00:06:08.479
are old clients, they should still be able to talk to new services.
00:06:14.000
And last but not least, continuous integration: making sure that there is an integration
00:06:20.160
build running across multiple services, which does actual integration testing by spinning up all the services
00:06:25.919
and making calls which span all those workflows, is difficult.
00:06:34.160
Okay, so we've gone through some sort of justification for why service-
00:06:39.919
oriented architecture is important, but why is Rails important in this context? Well, frankly, it's the most
00:06:45.759
awesome thing I've seen for creating APIs. It's ridiculously easy. It gives you powerful routing (I'm sorry, I was about
00:06:51.759
to say "rooting", I'm Indian), powerful routing, which means it is probably the most
00:06:58.240
convenient thing I've seen for mapping an HTTP request to a block of code which then creates a resource for you. It gives you,
00:07:05.039
again, what's probably among the best MIME type negotiation frameworks out of the box; if you try to do this with
00:07:11.360
something else, it's not that trivial. So dealing with different kinds of representations and content types is really
00:07:17.120
easy with Rails. You have less boilerplate code; there is still some, but if you've ever seen a Spring codebase, believe me, Rails is so much
00:07:23.680
better. And then there's the bottom line, which is that Ruby is nice. Given a choice between writing code in Ruby
00:07:29.440
and writing code in Java, there's no reason why I wouldn't pick Ruby. But this begs the question: what about the other
00:07:34.960
popular frameworks? What about Sinatra? What about Padrino? What about other similar frameworks?
00:07:40.560
This talk doesn't discount those; much of what we're going to cover is relevant to those frameworks as well.
00:07:46.000
It's just that Rails is extremely popular and gives you a lot of the things you need out of the box. So we're going to talk about Rails, but
00:07:51.919
most of what we're covering is just as relevant for Sinatra and friends as it is for Rails.
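As an aside, the MIME negotiation that Rails does for you with respond_to can be illustrated with a toy sketch. This is not Rails code, just a minimal, hand-rolled picture of the underlying idea (and it ignores q-values and wildcards that real negotiation handles):

```ruby
require "json"

# Media types our hypothetical service can produce.
SUPPORTED = ["application/json", "application/xml"]

# Return the first supported media type named in the Accept header,
# or nil if none match.
def negotiate(accept_header)
  accept_header.split(",")
               .map { |t| t.split(";").first.strip }
               .find { |t| SUPPORTED.include?(t) }
end

# Render the same resource in the negotiated representation.
def render(user, media_type)
  case media_type
  when "application/json" then JSON.generate(user)
  when "application/xml"  then "<user><name>#{user[:name]}</name></user>"
  end
end

user = { name: "niranjan" }
puts render(user, negotiate("application/xml, text/html"))
# => <user><name>niranjan</name></user>
```

Rails does all of this, plus format extensions in the URL, for free; frameworks that lack it force you to write exactly this kind of dispatch by hand.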
00:07:57.599
And walled gardens: now let's go a little further into what I mean by
00:08:03.039
walled gardens, beyond my original explanation. The first premise I'm
00:08:10.000
making here is that being inside a walled garden is easier. You don't have thousands of consumers, you don't have
00:08:16.400
your APIs being consumed in unpredictable ways, and you have full visibility into the evolution of the clients that consume your APIs,
00:08:23.039
because this is all within your organization, so it's easier to form a roadmap. What this means is that you can
00:08:28.319
avoid certain costs that you otherwise have no choice but to incur with public, general consumers.
00:08:35.039
I'm also going to be using certain REST-related terminology here which, if it's unfamiliar for whatever
00:08:42.159
reason, we can talk about in the Q&A, but I'm going to assume here that people are familiar with it in
00:08:48.399
passing. One thing is that if you're going to try to build a full HATEOAS architecture, it's extremely expensive:
00:08:53.920
expensive in terms of implementation effort, expensive in terms of performance, and quite frankly I
00:08:59.120
haven't seen a full Richardson Maturity Model level three API in the
00:09:04.640
wild yet. I'd love to see one; if anybody's run into one and can point me at it, that would be awesome. But I haven't seen one; I've tried to build one,
00:09:11.279
and it's pretty hard. So there is no reason to build a full HATEOAS architecture within the walled garden. HATEOAS
00:09:17.440
exists to solve a lot of issues that crop up when outside consumers are using your APIs. Rails does Richardson Maturity
00:09:23.519
Model level 2 to 2.5, somewhere between two and three, pretty much out of the box. You can get two just like that; with a little
00:09:29.519
bit of effort you get a little beyond that, and this is free. The Richardson Maturity Model, for those of you who are not familiar, is a model which describes
00:09:36.880
levels of RESTfulness of APIs, the highest level being three. So Rails gets pretty close for free, out of the
00:09:43.279
box. And while we're talking about this, let me quickly call out one glaringly obvious thing, which is
00:09:49.360
that Rails is not REST. "RESTful routes" is marketing: having
00:09:55.120
RESTful routes doesn't mean your API is RESTful in the least, and using Rails, again, doesn't mean that your API is REST-
00:10:00.560
ful. It does take a certain amount of work. Now, as I mentioned earlier, this talk is broken down into a series
00:10:06.480
of areas of interest that we found were bottlenecks while developing services, and that's what we're going to structure
00:10:12.640
this talk around going forward. That was the background, a little bit of context; now we're going to start digging
00:10:17.839
into problems and solutions. So, first off: the first and foremost
00:10:22.959
thing, let's talk about authentication. Whenever we're building services, we do want to
00:10:28.000
guard those services, to make sure that only a certain set of people are able to consume those APIs, or
00:10:34.320
whatever. Now, the interesting thing is that when we are building authentication across services,
00:10:40.720
we need to make sure that we implement it in such a way that once I log into one service,
00:10:46.480
I can actually go through a workflow which spans multiple services without re-authenticating
00:10:52.640
against each of the different services.
00:10:57.760
Now, when we are building RESTful services, obviously whatever authentication mechanism we
00:11:03.040
come up with has to be stateless, which means I cannot use cookies, which Rails generally uses for
00:11:09.839
managing authentication. Now, the quickest, simplest, and cheapest way to get authentication done is to
00:11:17.279
just put a firewall in front of all your APIs, define a range of IPs which can access those APIs, and you're pretty much done: make
00:11:23.360
sure that every consumer is sitting within those IP ranges, and no one else will be able to access those APIs
00:11:28.640
anymore. This will work in a small scenario.
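In practice you would enforce this at the network layer rather than in application code, but as a sketch of the idea, the membership check itself is a few lines with Ruby's stdlib IPAddr (the ranges below are made-up internal networks):

```ruby
require "ipaddr"

# Hypothetical internal ranges that are allowed through the "firewall".
ALLOWED_RANGES = [IPAddr.new("10.0.0.0/8"), IPAddr.new("192.168.1.0/24")]

# True if the caller's address falls inside any allowed range.
def internal?(remote_ip)
  ip = IPAddr.new(remote_ip)
  ALLOWED_RANGES.any? { |range| range.include?(ip) }
end

puts internal?("10.2.3.4")    # => true
puts internal?("203.0.113.9") # => false
```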
00:11:34.560
But as your services grow more and more complex, you want a better solution. What you want is essentially some sort of centralized
00:11:40.079
authentication system, so that a user can authenticate against that central authentication system and then
00:11:45.440
start using the APIs which sit behind the authentication firewall. Now, there are multiple services we are already
00:11:52.399
consuming which do this: for example, Facebook does it, GitHub does it, Twitter does it. They all allow us to
00:11:59.440
authenticate against them as a service, and we could simply use them. But
00:12:05.680
as we are inside a walled garden, we might not have access to those services, or we might not need those services for
00:12:11.920
authentication at all. So what do we do? We can simply create our own authentication
00:12:17.839
provider, like an OAuth 2 provider. There are multiple OAuth 2 providers out there, and we can
00:12:22.880
just use one of them, tweaking it as per our necessity. Now, why OAuth 2?
00:12:29.440
OAuth 2 is pretty much becoming the industry standard nowadays. It gives you very definite guidelines in terms of what you
00:12:35.440
need to do to achieve authentication across services. It's also very simple to implement: if
00:12:41.440
you have a good net/http client, you can implement an OAuth 2 client in a matter of, say, 30 minutes. A simple
00:12:48.399
sample client is available in one of our open source projects; you can take a look at that if you want to.
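To give a flavour of why it's quick: the two moving parts of an OAuth 2 client are exchanging credentials for a token and then sending that token on every call. A minimal sketch using only net/http follows; the endpoint, client id, and token value are all made up, and the actual network send is shown only in a comment:

```ruby
require "net/http"
require "uri"

# Step 1: build the access-token request (resource-owner password
# credentials grant, one of the simpler OAuth 2 flows).
token_uri = URI("https://auth.example.com/oauth/token") # hypothetical provider
token_req = Net::HTTP::Post.new(token_uri)
token_req.set_form_data(
  "grant_type"    => "password",
  "client_id"     => "my-client",   # assumed client registration values
  "client_secret" => "s3cret",
  "username"      => "alice",
  "password"      => "wonderland"
)

# Step 2: present the token (from the JSON body of the response above)
# as a Bearer header on every subsequent API call.
api_req = Net::HTTP::Get.new(URI("https://service.example.com/projects"))
api_req["Authorization"] = "Bearer abc123" # token obtained in step 1

puts token_req.body           # URL-encoded grant parameters
puts api_req["Authorization"] # => Bearer abc123
```

Actually sending either request would be `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }`; it's omitted here so the sketch stays self-contained.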
00:12:53.839
But what happens if you are using ActiveResource as your client? That's an interesting question, because
00:13:00.079
ActiveResource by default doesn't give you any access to response headers or request headers, and OAuth 2 primarily
00:13:06.480
works on information passed through headers. So essentially, if we are not willing to monkey-patch
00:13:12.160
ActiveResource, ActiveResource is out the door. So we have to consider two things: either we use ActiveResource and
00:13:18.000
open it up, making sure that we have access to response headers and can set headers on every outbound request, or we
00:13:24.639
can simply reconsider ActiveResource itself.
00:13:30.560
Now, while we are building all these services inside a walled garden, we might have some
00:13:36.000
kind of external client: a client that is not external in the sense of being developed by someone else, but developed by us and accessed by
00:13:41.920
external users. Now, since we are already doing OAuth, we can essentially use any of the
00:13:47.920
external OAuth 2 providers and just get authentication done that way: we can use Google OAuth, we can use Facebook
00:13:54.639
OAuth, and let users authenticate against those to access our services.
00:14:00.560
Okay, so next up, right after authentication, the next thing is authorization. This is pretty much a done deal in
00:14:07.279
Rails if you're looking at standalone apps, right? User roles: we have dozens of gems that do this, and do it pretty well. Now, the hard part
00:14:13.920
is when roles have to work over HTTP, that is, authorization at the service level. Now, there are several ways to look at
00:14:19.519
this problem. You could potentially look at having centralized rules: that is, you have an authorization service which
00:14:26.160
contains all the roles, the users, and the rules that apply to them. Now, the problem with this is that it gets messy and
00:14:32.320
fragmented. The whole point of having services is that services can evolve independently; if you impose rules on
00:14:38.240
them from the outside, you're already constraining their ability to evolve independently. So what's the solution to
00:14:44.160
this? What we recommend, and what we've seen work for us, is to have centralized roles but federated rules. Just have a
00:14:50.959
very clear set of responsibilities for your authorization service, which is that it simply maps users to roles and nothing
00:14:56.480
more. Beyond that, what those roles mean for particular resources in particular services is left to those services.
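A tiny sketch of the split, with all names invented: the central service answers only "which roles does this user have?", while each service keeps its own local table of what a role may do to its resources:

```ruby
# What the central authorization service knows: users -> roles.
# (In the real system this would be an HTTP call, not a constant.)
CENTRAL_ROLES = { "alice" => [:admin], "bob" => [:developer] }

# What one particular service (say, the project service) decides locally:
# roles -> actions permitted on its own resources. Another service is
# free to interpret the same roles completely differently.
PROJECT_SERVICE_RULES = {
  admin:     [:read, :write, :delete],
  developer: [:read, :write]
}

def allowed?(user, action)
  roles = CENTRAL_ROLES.fetch(user, [])
  roles.any? { |role| PROJECT_SERVICE_RULES.fetch(role, []).include?(action) }
end

puts allowed?("alice", :delete) # => true
puts allowed?("bob", :delete)   # => false (his role means less here)
```

The point of the design: changing what :developer may do inside the project service is a local decision, requiring no change to the central service or to any other service.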
00:15:03.440
Then we have the next thing. While talking about problems
00:15:10.560
with service-oriented architecture, I mentioned that there is chattiness between services. Now,
00:15:16.160
let's just take an example. Assume we have one client and two services
00:15:22.000
which we are developing, and one central auth server. The client makes a request saying, hey, I want this
00:15:28.639
information. The first thing the service has to do is validate with the auth server whether the requesting user is valid or not. Once it
00:15:36.000
has gotten confirmation from the auth server to go ahead and process the request, then, to complete the workflow, service
00:15:41.920
one internally calls service two, passing along the user information. Now, service two cannot depend on
00:15:48.000
the fact that service one has authenticated the user, so it has to authenticate again. Once it gets the go-ahead from the auth
00:15:54.079
server, it will actually generate the response and send it back, and then service one will process it further and send it
00:15:59.600
back to the client. This is going to grow: the more services
00:16:04.880
you add to this workflow, the more your HTTP graph is going to shoot up like crazy, and there is no real silver
00:16:11.759
bullet to solve this problem: if you are building independent services talking over HTTP, they are going to talk. But there are certain
00:16:18.399
things we can do to minimize the problems we will face with the chattiness of the application.
00:16:24.720
First and foremost, you want to make sure that your requests are fast. So, early in a project, you want to set up some kind
00:16:31.279
of performance build which makes sure that all the requests satisfy
00:16:36.399
certain criteria: say, for example, that every GET request should take at
00:16:42.959
most roughly 40 milliseconds, and shouldn't grow beyond that.
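A sketch of the kind of assertion such a performance build might make, using stdlib Benchmark: time the request and fail the build when it overshoots the budget. The 40 ms figure is the one from the talk; fetch_projects here is just a stand-in for an actual GET against the running service:

```ruby
require "benchmark"

BUDGET_SECONDS = 0.040 # the agreed per-GET budget

# Stand-in for hitting GET /projects on the service under test.
def fetch_projects
  (1..1000).map { |i| { id: i } }
end

elapsed = Benchmark.realtime { fetch_projects }
if elapsed > BUDGET_SECONDS
  raise "GET /projects took #{(elapsed * 1000).round}ms, budget is 40ms"
end
puts "within budget"
```

In a real suite this would live in CI next to the functional tests, so a slow endpoint breaks the build the day it becomes slow, not six months later.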
00:16:48.480
Once this performance build is in place, you should monitor the trends: as your services grow and your applications become more and more
00:16:54.720
complex, how are these response times changing, and what kind of tweaking do you need to do to make sure those response
00:16:59.920
times are brought down again? Now, most of the requests you are making
00:17:05.120
across your services are going to return small payloads, and you need to make sure that you are
00:17:11.760
optimizing your app servers and your web servers to handle these small payloads. Say, for example, Apache versus nginx:
00:17:17.039
which one do you want to choose as your web server? You need to consider that. And obviously
00:17:23.839
you want to keep the payload small, because if you try to return, say, 50 MB of XML, it is going to slow your entire
00:17:30.720
network down and potentially drag down the performance of the entire service suite.
00:17:36.400
Now, while all that is possible, another thing you can
00:17:41.919
start introducing is caching. HTTP 304 Not Modified is the best thing
00:17:48.480
you can do: you can implement server-side caching and you can implement client-side caching.
00:17:54.000
Rails comes with fragment caching and action caching. If you have resources which need authorization and authentication, you can
00:18:00.799
just action-cache them, and it will take care of the before filters before actually returning a response.
00:18:06.720
If you have resources which don't need authentication, you can simply page-cache them, and your web server will take
00:18:13.679
care of serving them directly out of your public directory. Another thing you can do is
00:18:20.160
start using ETags. But the catch there is that to use ETags you
00:18:25.440
want a client which supports caching. Essentially, the client sends an ETag in the request, and the server responds
00:18:32.559
saying: hey, this was not modified, just load it from your cache.
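That conditional GET exchange can be sketched in a few lines. This is a simulation of the server side, not Rails code (in Rails, fresh_when and stale? do this for you); the ETag here is just a digest of the representation:

```ruby
require "digest"

# Answer a GET, honouring the If-None-Match header if the client sent one.
def respond(body, if_none_match)
  etag = %("#{Digest::MD5.hexdigest(body)}") # ETags are quoted strings
  if if_none_match == etag
    { status: 304, body: nil, etag: etag }  # unchanged: client reuses its cache
  else
    { status: 200, body: body, etag: etag } # full response plus the new tag
  end
end

first  = respond("<projects/>", nil)           # cold cache: 200 with body
second = respond("<projects/>", first[:etag])  # revalidation: 304, no body
puts second[:status] # => 304
```

The win is the empty 304 body: for chatty internal services, most calls collapse into tiny "nothing changed" responses.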
00:18:38.559
The catch here, again, is that if you're using ActiveResource as your client, it neither supports caching on the client side nor allows you to
00:18:45.520
open up the request headers to put an ETag in every request. So what can you do?
00:18:50.960
There is an open source library you can use which actually supports client-side caching, and if you slap
00:18:56.960
ActiveModel on top of it, you can build your own client instead of ActiveResource and use that instead.
00:19:02.960
The only thing is, obviously, ActiveResource comes with a whole plethora of functionality; you might not
00:19:09.280
have all of it, but you can introduce pieces as and when you need them.
00:19:14.400
Now, we have covered client-side caching and server-side caching, but there is this whole layer of caching which
00:19:19.600
happens in the middle, over the network. You might not need that if you are developing
00:19:25.039
a set of services which are local to one installation, but if your services actually span the globe, you
00:19:31.760
might want to consider some kind of caching in the middle: say, for example, you might want to look at Varnish or Squid to cache your resources
00:19:39.039
in between. Now, whenever we talk about caching, you also have to take care of when and how
00:19:45.280
you expire the cache. Now, for server-side and client-side caching there are known solutions to expire those
00:19:51.760
caches, but what you might want to look at is, if you are putting a cache in the middle,
00:19:57.360
how to expire those Squid or Varnish caches, because otherwise that's going to come back and hit you.
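Expiring an intermediary cache typically means sending it an explicit purge request when a resource changes. Whether and from which hosts the cache honours a PURGE is configuration on the Varnish/Squid side, so treat this as a sketch; net/http has no PURGE verb built in, but defining one is a few lines:

```ruby
require "net/http"
require "uri"

# net/http request classes are driven by these three constants,
# so a custom verb is just a small subclass.
class Purge < Net::HTTPRequest
  METHOD = "PURGE"
  REQUEST_HAS_BODY  = false
  RESPONSE_HAS_BODY = true
end

uri = URI("http://cache.internal/projects/42") # hypothetical cached resource
req = Purge.new(uri)
puts req.method # => PURGE

# After updating project 42, you'd send this to the cache with:
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
```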
00:20:02.400
So yeah, that's about caching. Okay, next up is something that doesn't usually get a lot of attention when
00:20:07.919
we're talking about services, which is pagination. Let me just jump in with an example, and I hope that will
00:20:14.400
illustrate it. The default index action in Rails, which I'm sure you're all familiar with, is basically an ActiveRecord .all call;
00:20:20.720
this is what the scaffold template generates. This means that when you're doing a search, looking something up through the API, it's
00:20:26.159
going to return everything the database has. If you have 50k records, that's pretty much what you get.
00:20:32.480
So pagination is very important, and we do have a fairly nice solution for it
00:20:38.960
already, which is will_paginate. Now, will_paginate runs off three pieces of pagination metadata: which page
00:20:45.120
are we on, how many records do we have in that page, and how many records do we have in total. These three pieces of
00:20:50.240
metadata, we need to start mapping to HTTP requests, so that we can start paginating over
00:20:56.559
APIs. So there are a few possible places, or ways, in which we can tackle this problem.
00:21:02.640
The first one, right off, is HTTP headers: we can put these three pieces of metadata into the headers. It's a
00:21:08.000
little aesthetically displeasing, because this is not really what headers are meant for, and
00:21:13.440
it isn't really modeling the resource: the resource you're modeling is really a collection, so these are intrinsic parts of the resource
00:21:19.919
itself; they're not metadata sitting in the headers. You could put them into XML tag attributes, so the root tag carries these
00:21:27.200
three attributes: page, per-page, and total count. That's another way to look at it. Now, again, this is somewhat
00:21:33.039
displeasing because it isn't modeling the resource, same as before; second, it depends on the fact
00:21:38.080
that you can have attributes in the first place: what if you're using JSON? And then, of course, there is what I was
00:21:43.360
talking about, which is to model this as it should be modeled, which is that you have a resource that is a collection. If you're going full HATEOAS,
00:21:49.440
you basically have these three attributes and then a bunch of URIs. But again we come back to Active-
00:21:55.039
Resource, which does not do this. So if you're using ActiveResource... I mean, stepping back, if you're using some
00:22:00.720
other client, I would probably go with collection resources. But if you're using ActiveResource, what
00:22:07.520
you want is something that's a combination of ActiveResource and will_paginate, just like in a normal Rails app
00:22:12.640
we have ActiveRecord and will_paginate. So what we did is pick option number two, the XML tag attributes, because
00:22:19.440
we found that this was pretty easy to implement across the board, and we put that together to create
00:22:24.480
something called box_paginate, which you can find on GitHub. This is something that we wrote to allow ActiveResource
00:22:29.840
to transparently paginate, so it's something you could potentially look at as a solution.
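To make the option we picked concrete, here is a hand-rolled sketch of a collection whose root tag carries the three pieces of pagination metadata as attributes (a real app would use a proper XML builder; element and attribute names are illustrative):

```ruby
# Serialize a page of a collection, carrying page / per_page /
# total_entries as attributes on the root tag.
def paginated_xml(items, page:, per_page:, total_entries:)
  root = %(<projects page="#{page}" per_page="#{per_page}" total_entries="#{total_entries}">)
  rows = items.map { |name| "  <project><name>#{name}</name></project>" }
  ([root] + rows + ["</projects>"]).join("\n")
end

xml = paginated_xml(["api", "billing"], page: 2, per_page: 2, total_entries: 9)
puts xml
```

A client reads the root attributes to know where it is: with total_entries 9 and per_page 2, it knows there are five pages and can keep fetching without any out-of-band knowledge.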
00:22:37.039
On to the next area. Let's talk about a slightly more complex problem we might face while building services,
00:22:43.280
and that is this: we now have caching on the server side and caching on the client side, but sometimes what we want is not
00:22:49.280
cached responses but the cached objects themselves. Let's take a look at a scenario. For example, we have a user
00:22:55.840
management service which manages multiple users under different companies, and we have a project
00:23:02.000
management service which manages projects for a particular user. Now we get a new requirement saying that,
00:23:09.120
as an admin user, I want all projects for a given company.
00:23:14.240
Now there's a problem, because in a normal monolithic application, where we have everything available in one
00:23:20.159
place, we could just fire a database query and get the result out. But in this model,
00:23:26.400
we have company information here and project information in some other service, so we have to make sure
00:23:31.919
that we somehow get that information from the other service and create the join ourselves. What we are
00:23:37.919
essentially doing is making an HTTP call to an external resource to get all the user ids
00:23:43.840
belonging to a particular company, and then firing a database query to get all the projects
00:23:50.799
where the user id is in that list of user ids. Now, that is the trivial way to do it. The
00:23:58.559
problem with it is that if your user management service is
00:24:03.600
returning a paginated view of users, then you have to fire multiple calls before you can get the
00:24:09.520
whole list of users belonging to a particular company. And if that list of user ids
00:24:15.679
is really large, then when you fire the database query you'll probably hit the SQL query size
00:24:22.240
limit; it will go beyond 4 KB and you won't be able to fire that query at all.
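One workaround, before reaching for bigger machinery, is to batch the IN clause: page through the remote user ids, then fire the local query in slices rather than as one giant statement. The SQL below is illustrative; the batching logic is the point, and with a batch of 500 five-digit ids each statement stays comfortably under the 4 KB figure mentioned above:

```ruby
# Ids fetched (page by page) from the user management service.
user_ids = (1..10_000).to_a

BATCH = 500
queries = user_ids.each_slice(BATCH).map do |slice|
  "SELECT * FROM projects WHERE user_id IN (#{slice.join(',')})"
end

puts queries.size # => 20 batched queries instead of one oversized one
puts queries.first[0, 60]
```

You then concatenate the result sets in application code. It's more round trips, but every individual query stays within limits.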
00:24:27.440
So how do we solve this problem? One way is essentially for services to share their data
00:24:33.840
with other services directly at the database level, but I seriously wouldn't recommend that. What
00:24:40.320
happens if service one starts writing into the database of service two?
00:24:45.600
Essentially it defeats the whole purpose of going service-oriented in the first place, because now
00:24:51.760
every service knows about every other service's internals. Another way to do it, to prevent actual
00:24:58.320
data writes, is to give read-only access to the connection: create a separate
00:25:03.600
database user which only has read privileges on the schema, and share that, or maybe create a master-slave database
00:25:09.840
configuration with a read-only slave copy. Now, going by this approach definitely gives
00:25:17.039
you some benefits. First of all, everything is immediately consistent: the moment I update a user in my user
00:25:22.400
management system and pull up my report which gives me all the projects in the company, that report will automatically
00:25:28.960
reflect the change immediately, without my doing anything. It is also fairly easy to implement: I just
00:25:34.559
have to drop another configuration into database.yml, configure ActiveRecord to talk to yet another database
00:25:40.799
connection, and I'm pretty much done. But there are problems with this kind of approach.
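For concreteness, the "drop another configuration in" step might look like this hypothetical database.yml fragment: a second entry pointing at the other service's database, using a DB user granted only SELECT. All names and keys here are invented for illustration:

```yaml
production:
  adapter: mysql2
  database: projects_production
  username: projects_app
  password: <%= ENV["DB_PASSWORD"] %>

users_readonly:            # the other service's schema, via a read-only user
  adapter: mysql2
  database: users_production
  username: reporting_ro   # granted SELECT only, so writes are impossible
  password: <%= ENV["USERS_RO_PASSWORD"] %>
```

An ActiveRecord model in the consuming service would then point at it with something like establish_connection :users_readonly, which is exactly the convenience, and exactly the coupling, being discussed.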
00:25:46.480
First of all, we are integrating different services at the database level,
00:25:51.600
so essentially we are breaking the encapsulation which a service otherwise provides over its internal data, and we
00:25:57.679
don't want to break that encapsulation. What if there are computed fields present in the external
00:26:03.440
representation of a particular resource provided by the service? Those computed fields won't be
00:26:09.360
available when you are hitting the database directly. We are also assuming that there is a one-to-one relationship between the
00:26:16.159
resources exposed by the API and the tables in the database, and that might not be
00:26:22.000
true. Essentially you are exposing the internals of one service to another service,
00:26:27.600
which means that I can no longer independently evolve and modify my second service, even though I'm
00:26:33.200
maintaining my APIs properly.
00:26:38.400
Case in point: my user management system is currently using MySQL as its data store, and I find that it's becoming really
00:26:45.840
expensive and I want to shift to some kind of NoSQL database. I can no longer do it, because some other
00:26:51.440
service talks directly to my database, and hence I'm tied into that solution.
00:26:56.640
So, all in all, try to avoid it. It might work in certain scenarios, but stay away from it if
00:27:03.120
possible. So what is the other solution for implementing the same thing? Before we talk about that,
00:27:09.039
let's talk about another problem we have, which has similar solutions.
00:27:14.559
Something that comes up pretty often in most systems is an implementation of the observer pattern: when
00:27:20.640
an event occurs somewhere, somebody else wants to know about it, and you want to decouple the two. A trivial example
00:27:26.559
of this in the case of services is a post-logout event. I have a user service; my user logs out; there are a
00:27:33.039
bunch of other services that want to know about this, either simply to expire their sessions, if they have one, or maybe
00:27:39.120
they have other things that trigger off that event. Now, there are a couple of ways we can do this. The simplest thing we can do is have a bunch of
00:27:45.279
callback URIs: my client services register with the parent service, saying call
00:27:52.159
me back at this URI when a user logs out, or when any other event happens. That's it: when the event occurs,
00:27:59.039
the main service does the callback. And as you can tell from that convoluted description, you get callback hell: it's really hard
00:28:04.960
to explain, and therefore probably hard to understand; your configurations also get complex. And then, with all of these
00:28:10.799
callbacks, your response times, if you're doing this within the request-response cycle, get really, really long. You could, of
00:28:17.279
course solve this by having async callbacks right just throw backgroundrb or something similar at it and have these callbacks done in the
00:28:22.960
background another way to look at this is potentially introduce an mq system like something like rabbit mq or
00:28:28.799
some similar tool this gives us a centralized bus which is an interesting difference from the
00:28:34.960
previous one in the previous case you had to know every single service that you were interested in and explicitly go
00:28:40.480
register yourself with that now here you just have a centralized bus and then you register listeners to that and then you
00:28:46.320
get all your events coming to you right there so you just have one authoritative source
00:28:51.360
it's async out of the box which is an interesting difference from the previous approach where you consciously have to make the decision to
00:28:57.440
make it async and you probably want to do some sort of convention over configuration with
00:29:02.559
regards to your topics or you know whatever names you use for your events
00:29:08.720
but otherwise these two approaches are broadly similar and you can pick and choose between them
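Sketched in plain Ruby, the two approaches discussed above might look like this. All class, event, and URI names here are made up for illustration; a real system would make async HTTP calls (via something like backgroundrb) or go through a broker like RabbitMQ instead of in-process calls.

```ruby
# Approach 1: callback URIs -- each client registers itself with the
# parent service and gets called back when the event fires.
class UserService
  def initialize
    @callbacks = Hash.new { |h, k| h[k] = [] } # event name => callback URIs
  end

  def register_callback(event, uri)
    @callbacks[event] << uri
  end

  def logout(user_id)
    # A real system would POST to each URI in the background;
    # here we just return the calls that would be made.
    @callbacks[:logout].map { |uri| "POST #{uri} user_id=#{user_id}" }
  end
end

# Approach 2: a centralized bus -- clients subscribe to the bus,
# not to each individual service, and it is async out of the box.
class EventBus
  def initialize
    @listeners = Hash.new { |h, k| h[k] = [] }
  end

  def subscribe(topic, &handler)
    @listeners[topic] << handler
  end

  def publish(topic, payload)
    @listeners[topic].each { |h| h.call(payload) }
  end
end
```

With the bus, a service only needs to know one authoritative source and a topic-naming convention, which is the convention-over-configuration point made above.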
00:29:14.559
but they're interesting because they lead up to a solution to what we were talking about that niranjan was talking
00:29:19.600
about earlier so i'll let him continue with that yeah so
00:29:25.200
again going back to the problem we had and now going to the solution of that problem which is neater than actually
00:29:30.799
sharing the database across multiple services now assume that we actually set up some kind of event system which triggers the
00:29:37.919
event when resources are created updated or deleted and my second service
00:29:44.000
which actually wants a local cache of the first service which is actually triggering those events can listen to
00:29:50.559
those events and create resources in database in this way i'm not depending
00:29:55.840
on the database integration but what i'm depending on is whatever
00:30:03.520
events which are triggered whatever payload is in that event i take that payload modify it the way i want it for
00:30:09.279
my local cache and store it so that i can actually fire the joins which we could have fired in a
00:30:15.120
monolithic application in these distributed services as well but
00:30:20.640
you have to keep one thing in mind that whatever you're storing in your local database is still a cache
00:30:26.399
it doesn't have any actual connection with the resource which is sitting behind some other service so we should treat it like a
00:30:32.720
cache we shouldn't modify it locally because those changes are not going to get reflected anywhere else in the system and essentially
00:30:42.159
the reports you are generating are going to be bad if we are going with an event system
00:30:47.919
which is asynchronous another thing you need to keep in mind is whatever local cache we have
00:30:53.360
might not reflect in real time the actual data which is present behind some other service
00:30:58.399
and as it is not real time you need to pick and choose all those resources which you want to cache locally
00:31:04.480
keeping this in mind that whatever i'm getting there might be some lag
00:31:09.519
in the reflection now whenever we are caching anything we have to talk about how we are going to expire
00:31:15.679
cache so obviously if i'm listening to creation of users in user management system i also have to listen
00:31:22.559
to things like update event and delete event and accordingly modify my local
00:31:28.320
cache so that it actually reflects what is there on the server and obviously the server has to trigger those events with
00:31:35.279
payload which allows consumers to modify their local caches
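A minimal sketch of that cache-maintenance idea, assuming the hypothetical bus and event names below; in a real Rails service the handlers would upsert an ActiveRecord model rather than a Hash:

```ruby
# Tiny in-process event bus, just to keep the sketch self-contained.
class EventBus
  def initialize; @listeners = Hash.new { |h, k| h[k] = [] }; end
  def subscribe(topic, &handler); @listeners[topic] << handler; end
  def publish(topic, payload); @listeners[topic].each { |h| h.call(payload) }; end
end

# Local, read-only cache of users owned by another service,
# maintained purely by listening to create/update/delete events.
class UserCache
  def initialize(bus)
    @users = {}
    bus.subscribe("user.created") { |p| @users[p[:id]] = p }
    bus.subscribe("user.updated") { |p| @users[p[:id]] = p }
    bus.subscribe("user.deleted") { |p| @users.delete(p[:id]) }
  end

  # Read it for local joins and reports; never write to it directly --
  # the service behind the events owns the real data, and the copy here
  # may lag it, since the events arrive asynchronously.
  def fetch(id)
    @users[id]
  end
end
```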
00:31:43.679
so this is something that we again mentioned earlier api versioning unfortunately there doesn't seem to be
00:31:50.240
any um you know holy grail here like the pure rest guys talk about uh apis that
00:31:55.919
practically never need versioning but in practice on the internet we see uris with v1 v2 v3 so on and so forth now
00:32:03.360
there are a few ways to deal with this problem
00:32:08.559
inside of a walled garden and a few guidelines one thing is to just ensure that your resource representations
00:32:14.799
remain as far as possible backward compatible and that when you're changing them or especially when you're removing
00:32:20.000
fields from your resources you're notifying your clients which you can do because you know they're all in the same
00:32:25.840
organization as you are you may also want to consider having a
00:32:31.200
front-end server which routes to different versions of the application entirely so
00:32:36.399
you actually let version one run for a while when version 1.1 is out there so that clients have time even though
00:32:42.799
they're within your organization they have time to migrate and change and uh yeah that's pretty much it just a little
00:32:48.480
bit of common sense a little bit of communication really within the organization can work wonders can remove api versioning as a pain point
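The front-end-server idea above can be sketched as a tiny Rack-style dispatcher that keeps both versions running side by side; the paths and response bodies are hypothetical:

```ruby
# Two versions of the same service kept running in parallel, modeled
# here as Rack-style apps (env in, [status, headers, body] out).
V1_APP = ->(env) { [200, { "Content-Type" => "application/json" }, ['{"version":1}']] }
V2_APP = ->(env) { [200, { "Content-Type" => "application/json" }, ['{"version":2}']] }

# The front-end server: route on the version prefix so v1 clients
# keep working while they take their time migrating to v2.
VERSION_ROUTER = lambda do |env|
  case env["PATH_INFO"]
  when %r{\A/v1/} then V1_APP.call(env)
  when %r{\A/v2/} then V2_APP.call(env)
  else [404, {}, ["unknown api version"]]
  end
end
```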
00:32:56.559
now the transactionality of various calls we are making specifically for those
00:33:02.240
workflows which are spanning across multiple services even in distributed databases it is hard
00:33:07.840
to achieve there is something called two-phase commit and databases actually implement two-phase
00:33:13.440
commit depending on which database you are using to achieve transactionality across distributed
00:33:19.760
databases it is even harder when we are talking over http and honestly speaking
00:33:25.360
though we have seen one such implementation working we haven't worked on it and that would be a talk in itself
00:33:31.360
like a 40 45 minute talk so i'm just going to take it offline and if anyone is interested we can talk about it but
00:33:37.519
yeah i'm going to keep that aside for the time being okay and then we have one of the
00:33:43.440
hardest things to deal with in these systems which is just the core software engineering aspects of the whole
00:33:49.519
development process this is something that i picked up on one of my projects which is
00:33:56.559
standardize like crazy right standardize everything if you have any tooling any libraries that are part of
00:34:03.519
one service make sure that all services use them versions of libraries should be the same as far as possible ensure that
00:34:09.919
every service every code base uses the same tool same versions across the board if you're doing an upgrade calculate the
00:34:16.079
cost and upgrade across the board because engineers working on different code bases shouldn't have any surprising
00:34:21.359
things to deal with or different ways of implementing the exact same things in different services within an
00:34:26.399
organization if you have to deal with that it can be quite a pain treat shared code between services like
00:34:32.320
any library like you may want to go as far as to spin up your own gem server create gems that are purely local not
00:34:37.760
published out there and have bundler use those gems like go that far
00:34:43.200
because this is an important aspect of standardizing shared code between all the different services
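A hypothetical Gemfile for that private-gem-server setup might look like this; the server URL and gem names are made up for illustration:

```ruby
# Gemfile -- shared internal code pulled from the organization's own
# gem server rather than copied between services.
source "https://rubygems.org"

# Internal gems published only to the private gem server,
# never to rubygems.org.
source "https://gems.internal.example" do
  gem "acme-service-auth"    # hypothetical shared auth client
  gem "acme-event-payloads"  # hypothetical shared event payload definitions
end

gem "rails"
```

Pinning every service to the same versions of these shared gems is what keeps the standardization point above enforceable.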
00:34:49.919
another smell to keep an eye out for is conflation of human consumption of apis
00:34:55.119
with machine consumption that is you know are you really producing an api or are you producing html so this kind of
00:35:01.200
code i'm not sure how clear this is but it's a smell if you see an action that
00:35:06.560
has both format xml or format json or something similar with format html you know in the same action this is a smell
00:35:12.640
watch out for this as far as possible you want to try to segment your actions so that they either deal purely
00:35:19.280
with machine consumable apis or with people the two don't fit well together there is a mismatch there
00:35:24.880
there is a whole spectrum of separation that you can create starting from simply saying all right i'm going to make sure
00:35:30.800
that some controllers deal purely with apis and some controllers purely deal with people to all the way to the other
00:35:36.800
extreme which is to say i will build entire applications that have nothing but apis and no html at all and i will
00:35:43.040
build a separate application that is for people and that will act as a front end to this one the new
00:35:48.320
twitter ui is probably a good example of something like this even though it's for public consumption
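One way to picture the separation being described is to dispatch on the Accept header so each handler serves exactly one audience; the handler names are made up, and in Rails the same split would be separate controllers or a separate api-only application:

```ruby
# One handler per audience, instead of one action mixing
# format.html, format.json and format.xml together (the smell above).
API_HANDLER  = ->(name) { %({"user":"#{name}"}) } # machines: JSON only
HTML_HANDLER = ->(name) { "<h1>#{name}</h1>" }    # people: HTML only

def handle_request(accept_header, name)
  if accept_header.include?("application/json")
    API_HANDLER.call(name)
  else
    HTML_HANDLER.call(name)
  end
end
```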
00:35:53.440
configuration management can become something of a nightmare across multiple services so don't commit configuration
00:35:58.880
into version control use something like chef or puppet to manage it so if you're looking at multiple servers
00:36:05.119
like if you have like 10 services you're probably looking at a minimum of 30 or 40 servers deal with it using chef and
00:36:10.960
puppet standardize your configurations across the board you know have the same web servers same you
00:36:16.560
know same caches same everything so yeah that's pretty much our talk that's it
00:36:22.320
and uh we're open for questions and answers
00:36:41.760
awesome i'll check that out anybody else any questions
00:36:47.200
sure
00:37:00.560
uh we we actually started off with a system that was monolithic and already handled a certain volume of transactions
00:37:06.640
so we were looking at i don't remember was it 30 000 requests per hour something in that range and that was only two so we were
00:37:13.440
a team that was working on one service so our service got 30 000 requests uh there were other services that probably
00:37:19.440
got a lot more sure
00:37:36.640
yes
00:37:51.520
right so that's exactly what i was saying earlier that uh you know most of what we're saying applies to everything
00:37:56.800
in the ruby ecosystem so my example really was in contrast not rails versus sinatra but it was more
00:38:03.520
rails versus spring or cake php or django or something like that
00:38:13.440
what about
00:38:18.480
sure so we didn't really address that but one of the things that you want to look at is unless you have your
00:38:25.040
system actually go down you want to have performance monitoring in place monitoring your
00:38:30.560
entire call graph if that's the right phrase to use so that at any
00:38:35.599
given point in time you don't have timeouts now i was on one story that looked at exactly this where
00:38:42.720
one request fanned out into about 200 requests and took about i don't know it was like three minutes or something crazy like that and you just don't want
00:38:49.200
to get there right the simple solution to that is don't get there like track these things
00:38:54.720
up front because there is no good answer the other thing to do is to go async and async is a whole different ballgame if
00:39:00.800
you're looking at an async system where your request response cycle has a bunch of stuff happening in the background that's a whole different ballgame from
00:39:06.400
this so this this pretty much says that okay if you're going synchronous from request to response just manage your
00:39:12.240
time so that your entire request response cycle happens in a sane manner now we know this is possible because i
00:39:17.599
think one article i read said that every amazon page in the store
00:39:23.040
has 135 services behind it or something similar so we know it's doable i haven't done it
00:39:28.960
for 135 i've done it for a dozen so yeah no good answer just track those trends
00:39:34.079
and make sure that it isn't going wild sure
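The "track your call times and don't let them go wild" advice usually starts with explicit timeouts on every inter-service call; a sketch with Net::HTTP, where the host name and time limits are hypothetical:

```ruby
require "net/http"

# Build an HTTP client with a hard time budget, so a request that fans
# out to other services fails fast instead of silently taking minutes.
def service_client(host)
  http = Net::HTTP.new(host, 443)
  http.use_ssl      = true
  http.open_timeout = 1 # seconds allowed to open the connection
  http.read_timeout = 2 # seconds allowed to wait for a response
  http
end
```

Note that `Net::HTTP.new` only builds the client object; nothing touches the network until a request is actually made.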
00:39:53.839
sure i'm not sure people at the back got that the question was what's the reasoning behind going to
00:39:59.359
oauth as opposed to something like ssh tunneling or any other solution that we could come up with you want to take this
00:40:05.200
one yeah so essentially as i said oauth is pretty much standard and it's easy to
00:40:10.319
implement and it gives you a standard set of guidelines that hey put this in your
00:40:15.520
http headers and here you're dealing with http stack you're not going for your authentication and authorization
00:40:21.920
beyond http stack so if you actually want to send a request okay now imagine that you have
00:40:29.359
some servers deployed on ec2 some are in rackspace cloud everything is under your control
00:40:35.520
but you want to maintain them at separate places just for maintaining the redundancy and making sure nothing goes down and everything is behind the
00:40:42.240
firewall how are we going to open ssh tunnels across different services
00:40:47.440
so there can be a bunch of such problems which will crop up again do you want to have ssh access to your
00:40:53.200
production box or not is the first question so the other angle like my
00:40:58.560
like the other response i would have is that again it comes back to standardization like oauth 2 is understood respected and
00:41:05.599
it isn't surprising and whatever performance losses you may have
00:41:10.640
using oauth 2 clearly there are a bunch of businesses that are working despite
00:41:15.680
those so it is potentially doable it's just a question of implementing it in the right way and that again is just
00:41:21.920
very very implementation specific it depends on your setup so on that side do i have a good formula to say can you
00:41:28.800
do this and make sure your oauth 2 is going to be really fast no i don't but you're just going to have to profile
00:41:34.720
it and make sure that you're doing certain things right um anybody else
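The "put this in your http headers" point from that answer boils down to a bearer token riding in the Authorization header of an ordinary HTTPS request; a sketch where the token value and host are made up, with the actual network call left commented out:

```ruby
require "net/http"
require "uri"

# An OAuth 2 protected call between services is still a plain HTTPS
# request; the access token travels in a standard header.
uri = URI("https://users.internal.example/v1/users/42")
request = Net::HTTP::Get.new(uri)
request["Authorization"] = "Bearer hypothetical-access-token"
request["Accept"]        = "application/json"

# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(request) }
```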
00:41:42.000
yeah i see one hand there
00:41:47.119
approach that we um sure sure we just added a representation to
00:41:54.160
all of our resources
00:42:04.160
right that's pretty interesting so you basically have a subset of information plus a pointer to whatever
00:42:11.599
else somebody might need
00:42:16.960
right so somebody wants it they go ask for it that makes a lot of sense okay great anybody any other questions
00:42:26.640
no great awesome so thank you everyone thank you very much