00:00:16.960
Thank you, everyone that made it to the last talk, all 35 of you.
00:00:24.000
Feel free to move up and heckle; I welcome questions at any point.
00:00:29.599
My name's Tim Connor, obviously, since it's been up there for a while. I run a small consultancy called Cloud City
00:00:35.600
Development. Like anyone else I'm always hiring, and I live in San Francisco so I know a
00:00:41.280
bunch of people, so if you don't want to work for me, there are some other people I can introduce you to.
00:00:46.960
I do a pairing interview and based on that I can sort of say, well, you might be a good fit.
00:00:54.559
But that's not what the talk's about, obviously. It's about MongoMapper, sort of,
00:01:00.079
which is a lot of fun to use, I'll admit, but I generally advise people against Mongo for most projects,
00:01:07.920
particularly if they think, oh, it's web scale. Which it isn't.
00:01:13.760
Use a Dynamo or Bigtable implementation: Riak is a Dynamo implementation, Cassandra is too, or
00:01:18.799
just write plain log files and then do a MapReduce. Not being able to figure out
00:01:24.400
how to scale MySQL is a bad reason to use Mongo. You pay a bleeding edge tax:
00:01:30.720
you're going to end up with a lot of people who don't know what they're doing, versus how well they would know MySQL, and as developers they'll make a lot
00:01:37.119
of mistakes, and the tooling is going to change. There's a whole bunch of reasons. Ask Carbon Five, there's one guy in here,
00:01:42.640
about the bleeding edge tax they've paid on a couple of projects; they will gladly tell you.
00:01:49.840
And it does have a schema. Cliff Moon enlightened me about this: there's sort of a hidden schema Mongo uses to determine
00:01:56.720
how big the rows should be, and unfortunately, when you exceed it and you don't realize it, all of a sudden
00:02:01.920
your gigabytes upon gigabytes upon gigabytes of data all have to be rewritten
00:02:07.439
and resized, and then your website falls over. So I don't count that as web scale.
00:02:14.640
While I was using it on some projects, replication changed twice in a year, to replica sets
00:02:19.760
and I don't remember what to after that, which again is that sort of bleeding edge tax.
00:02:25.280
That's one of the reasons I didn't recommend Postgres for a long time: the replication wasn't quite as out of the box as MySQL's. I mean, we know how MySQL
00:02:32.319
replication works, and it just does work. It's kind of important for building big sites, call me crazy.
00:02:40.400
Oh yeah, and they revamped MapReduce. So of course, if you have a lot of data and you're doing big data, you
00:02:45.920
need to do MapReduce on it, and to have them say, oh by the way, you can't use it that way anymore, rewrite all your
00:02:51.920
stuff: again, bleeding edge tax. Using the coolest oh-my-god-we-all-have-to-use-it tech sometimes sucks, for a couple of
00:02:59.200
reasons. So why would you use it? Well, 10gen, in their defense, are awesome
00:03:06.000
and responsive. You jump on IRC or somewhere and they will tell you why Mongo does something the way it does,
00:03:11.200
and possibly fix it immediately. So that's kind of big. And it's powerful; I mentioned that
00:03:16.959
before in my slides but didn't say it out loud. What do I mean by powerful? Well,
00:03:24.239
not having a schema is huge. Obviously we hear about "no schema", and why is this such a big thing? Because you
00:03:30.239
can dynamically define your schema in the code, and this is important because it leads to a really cool
00:03:35.519
pattern of development that, yeah, I'll get to in a second,
00:03:41.040
which is a combination of embedded documents, that's another part of it, and what you can do with mixins.
00:03:47.680
This pattern that I'm about to get to, I mislabeled it originally as composition over inheritance,
00:03:54.080
and I was wrong, and Brianary corrected me and said, no, that's not composition, that's just mixins, you
00:03:59.439
dumbass. And thank you, he was right. He said it much nicer than that.
00:04:04.879
So mixins, which I'll get back around to, are very important, because inheritance,
00:04:10.640
usually when you're building a Rails app, kind of sucks. Single table inheritance, I think everyone knows, is
00:04:16.560
a really ugly approach to solving this problem, but it's your only real choice if you're doing ActiveRecord.
00:04:21.919
MongoMapper has the same thing, called single collection inheritance, and again it's less than ideal.
00:04:30.160
So, as I said, mixins. What mixins enable... I would say the active record pattern
00:04:36.800
is a good fit for SQL, for having a schema stored in the database; that's what the active record pattern is
00:04:42.400
for: hey, look at your database, here's the schema, let's create a domain model around it. A data-mapper-style model is not a good fit for ActiveRecord,
00:04:50.000
or for SQL. DataMapper was a cool project, but to define your schema in the code
00:04:57.360
and then again have it defined in the database is not such a good fit; it doesn't really fit the dynamic nature of data mapper,
00:05:03.919
which NoSQL does.
00:05:09.199
Embedded documents, I mentioned. If you don't know about them by now... I think everyone's heard so much about Mongo
00:05:14.479
that maybe everyone in the room already does. They're kind of like the serialized hash from ActiveSupport, but
00:05:20.080
like seriously plus-plus. If you're using Postgres, people will say, hey, we have native arrays, which are cool
00:05:27.120
and powerful. You're right, they're just not quite as cool and powerful as
00:05:32.560
embedded documents. Here's what embedded documents do:
00:05:38.560
they let you have a whole object that is embedded in another one. So it's
00:05:45.199
like a collection, but it lives on the parent document. And
00:05:50.320
why would you want to do that when you could just have a relation like in a database? It cuts out a bunch of joins, so if you have a truly contained
00:05:57.039
sub-object there are some serious advantages. All the associations work the same, but
00:06:02.880
you can do stuff like this without even a join: hey, find me all of the
00:06:08.319
people whose address city is Chicago, and that's blindingly fast if you have things indexed right.
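Roughly what I mean, as a minimal sketch; the model and field names here are made up, and I'm assuming MongoMapper's embedded documents and its where-style finders:

    class Address
      include MongoMapper::EmbeddedDocument
      key :city,   String
      key :street, String
    end

    class Person
      include MongoMapper::Document
      key :name, String
      many :addresses            # embedded, since Address is an EmbeddedDocument
    end

    # No join: reach into the embedded documents with dot notation.
    Person.where('addresses.city' => 'Chicago').all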
00:06:13.759
If you use them a lot, you discover there are some cases where,
00:06:19.680
when it's not a one-to-many in the usual sense but a truly contained object, it just works a lot better, and it's kind of
00:06:24.880
cool having support for that built into the data store. But again, mixins,
00:06:31.120
and these sort of get related, you'll see in a bit. So, yeah, I know the instance-method,
00:06:36.720
class-method pattern from ActiveSupport is dumb and bad and we hate the pattern, but it's pretty ubiquitous and it is
00:06:43.280
convenient. So everything in MongoMapper is implemented as a plugin, which is just an
00:06:48.960
ActiveSupport concern, which means you have class methods, instance methods, and a configure hook to plug it in.
00:06:54.720
So what this lets you start doing is, instead of using single inheritance, you can have a mixin. You can say, hey, let's
00:07:01.440
define addresses and a couple of related fields and dynamically mix them in
00:07:08.319
to any other model, so you can have companies and people both have addresses.
00:07:13.680
So instead of having some weird glommed-together single-inheritance chain, you can say, hey, there's a whole bunch of
00:07:19.520
functionality grouped around has-addresses that we can define as a well-encapsulated module, which includes,
00:07:26.240
as you know about encapsulation, both the data on the object and the behavior. And that's something that's really
00:07:31.840
hard to do, honestly, in a SQL database and ActiveRecord, but it's trivial in MongoMapper, and it's about the coolest thing you can do in MongoMapper.
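Something like this sketch is what I mean, written the way the plugins are, as an ActiveSupport::Concern; the module and method names are just illustrative:

    module HasAddresses
      extend ActiveSupport::Concern

      included do                  # the configure hook: declare the data
        many :addresses
      end

      module ClassMethods          # behavior that travels with the data
        def in_city(city)
          where('addresses.city' => city)
        end
      end

      def primary_address          # instance-level behavior
        addresses.first
      end
    end

    class Person
      include MongoMapper::Document
      include HasAddresses
    end

    class Company
      include MongoMapper::Document
      include HasAddresses
    end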
00:07:37.280
The downside of this power:
00:07:43.840
you have like 50 ways you can do any domain model. In SQL, really, there's like one true way; if you've been building
00:07:50.160
sites for a while you sort of know what third normal form is and a good way to abstract things, and there is sort of almost
00:07:55.680
one proper way. With the trade-off of embedded documents versus a relation, you end up having a lot of
00:08:02.319
choices. Deciding when to embed is complicated; it'll take you a while to figure out where the trade-offs are,
00:08:09.599
and if there are a lot of trade-offs, that means you're going to get it wrong a lot and you're going to refactor.
00:08:15.520
And what does lots of refactoring mean when you're dealing with big amounts of data in a non-SQL data store?
00:08:23.759
Well, it means you have a lot of different data versions, and I hate to break it to you, but there are no
00:08:30.560
magic data validity unicorns. If your data exists in six different versions you're stuck,
00:08:36.880
because you can't really code against an undefined data schema. It's hard to write code where you're like, well, if it was
00:08:43.279
written like a year ago it probably matches this pattern, but if it's written now it sort of fits like this, and that
00:08:49.440
leads to completely unmaintainable code; trust me, I've seen it. You can do long-running feature
00:08:55.440
switches, where you're like, oh hey, if it matches this pattern then we have to act this way on it, and
00:09:01.279
if it matches this new structure we have to have this other implementation. But if you start having like 50 feature
00:09:07.360
switches in your code that are all there for two years, you have a giant mess where every method takes like six flags
00:09:13.680
to know how to behave. So there's a solution, which is:
00:09:19.040
hey, maybe we should have our schema design defined in one place, which is the code,
00:09:24.320
and there should be one version of that. Which means you're going to have to come up with a way to migrate your
00:09:29.839
MongoMapper data. And please, if you disagree or have a
00:09:34.959
question, shout it out; this is a small group, so we can afford a little heckling.
00:09:42.320
Yeah, you actually need migrations. You don't just need them for
00:09:48.080
MongoMapper, the title's misleading; you need them any time you're using Mongo,
00:09:53.279
which really is any time you're using NoSQL. You have this problem; people say, well, what about Riak? Same problem.
00:09:58.320
If you don't have a defined schema in your database, you're going to have the issue of your data having different schemas
00:10:04.399
at different points in time, and that's really a no-no when you're trying to figure out what the hell your code is doing.
00:10:10.800
Turns out, though, that's really just a problem of migrating data in general, and
00:10:16.079
even in SQL it's a problem when you have big data. I mean, I think Rails has done a bad thing with rake db:migrate in
00:10:22.880
a way, and that's made everyone think, oh hey, this magically happens, we can just change our schema
00:10:28.160
and everything takes care of itself. If you start dealing with a lot of data you realize it doesn't work that well. I
00:10:34.000
mean, if you're on the exact right version of MySQL 5.1, before they toggled the switch back and forth of whether
00:10:40.959
column renames are fast, it sometimes works, but in reality, if you're transforming your data much at all, you need to take
00:10:47.440
your site down and put up a maintenance page, and that sucks. Otherwise you're in a state where the database is either locked up or halfway through
00:10:54.079
transforming a bunch of stuff. So doing in-band large migrations of data
00:10:59.519
is just a horrible pattern that the simple approach to building a vanilla Rails app leads you to: oh hey,
00:11:05.680
migrations are free, we just do them and they run. Well,
00:11:10.959
I think there's a silver lining that I discovered to having to solve this problem a couple of times in MongoMapper,
00:11:17.360
and that's that you end up doing it right. Honestly, there's one way to do it, and
00:11:22.720
y'all could think about it for about five minutes and come up with what it is.
00:11:28.640
Whoa, that slide was supposed to be broken down like that; hey, I forgot to delete one.
00:11:35.360
So: you have to deploy code that writes to both structures, your old one and your new one.
00:11:41.680
You've got to use finders against the old one, because until you have all of the new structure written, your finders
00:11:48.399
against the new one won't find all the records. You get to update records over time, because you have a lot of data, and if you do one
00:11:54.720
big migration all in one go it's going to bring your site to its knees.
00:12:00.079
When they're all updated, you can switch to using finders against the new structure, because in theory now all of your data looks
00:12:07.120
one way, a way that the new finder can find. And then you clean up the old data structure whenever the hell you get
00:12:12.639
around to it, whenever disk space gets expensive or something. Yep?
00:12:23.200
Well, that's a good question; those finders get helpful for that. A little later I'll mention that it is very
00:12:29.920
useful to have a finder for that purpose, to be able to tell what has been updated. If you have an intelligent finder
00:12:36.320
that says, look for this old structure, or look for the lack of the new one, then you can do a count on that, and
00:12:42.480
that's reasonably fast if you have things indexed right, and when it hits zero, you're done.
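For example, a sketch of that kind of finder, carrying on with the single-address-to-many-addresses example; the field names are assumptions:

    class Person
      include MongoMapper::Document
      many :addresses              # the new structure

      # "Look for the old structure": anything still carrying the old single
      # `address` key has not been migrated yet.
      def self.untransformed
        where(:address.exists => true)
      end
    end

    Person.untransformed.count     # reasonably fast if indexed; zero means you're done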
00:12:48.959
The later slides will get into the processing of that a little bit. But a version flag, for example,
00:12:56.560
that's one of the things you can do. I don't like the version column, because just like the old version column
00:13:03.200
in Rails, it sort of doesn't work with feature development and then getting versions out. It's a little better if
00:13:08.880
it's on each model, but at most I'd do it per collection.
00:13:14.639
Yeah, you have to touch each record. Yeah, you could
00:13:20.720
also, if you really are obsessive, try to drop your old data at the same time, but that means again you have to have a finder
00:13:26.800
that finds based on the new schema and the old schema and do sort of a cross-union-join
00:13:32.160
thing, a cross-engine join in your own data store, and just don't try it. You can do whatever you
00:13:37.600
want, but it's a bad idea. So how do we actually implement this thing?
00:13:44.880
Typically, as a Rails person, we're going to say, well, we have a before_save, and when we load up the record we do
00:13:51.279
it: before save we transform it into the new version and then we save it.
00:13:56.399
Turns out you don't need to do that, because of a kind of cool trick. It's a combination of how MongoMapper
00:14:01.600
is implemented and the fact that Mongo is a document object store, so you're storing
00:14:06.800
the whole object each time.
00:14:12.720
MongoMapper always uses the setter when you initialize an object. Rails does some different stuff, but in MongoMapper, if you define an
00:14:19.120
attribute setter, then just by loading the object the attribute will be run through it.
00:14:25.120
So if you have your old way... there's a bug
00:14:30.480
in this code, points if you can find it. If your old way of doing it is a single address
00:14:37.360
and you decide you're adding multiple addresses, if you just have an address setter, then just by loading the record this will run in the
00:14:44.639
middle here, and then of course you have your code to make it work with the old way,
00:14:51.199
and then if you save, you're done. There's no before_save: just because you defined a setter, and because of the way MongoMapper loads
00:14:57.440
all attributes through the setters, you don't even need a before_save. All you need is code to make sure that, hey, if I'm loaded, handle it.
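The slide itself isn't in the transcript, so here's a rough sketch of the kind of setter I mean, relying on the load-through-setters behavior described above; the field names are assumed:

    class Person
      include MongoMapper::Document
      many :addresses                      # the new structure

      # Legacy writer for the old single-address structure. Because loading a
      # document assigns each stored attribute through its setter, any record
      # still saved the old way gets upgraded in memory the moment it's read;
      # saving it then persists the new structure. No before_save needed.
      def address=(old_address)
        addresses << Address.new(old_address) unless old_address.blank?
      end
    end

    # Touching and saving a record is now the whole migration for that record.
    Person.find(some_id).save   # some_id stands in for any record's id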
00:15:04.720
Which means migrate becomes: you just have to touch and save every record in your data store.
00:15:09.839
Which is, you know, not a problem with billions of records at all, ever.
00:15:14.880
Every untouched record, that is, because it's a little annoying if you're dealing with billions and you've
00:15:21.120
updated half the records over time, just letting the stuff run, to then go through all the billions again.
00:15:28.160
Which is the answer to your question: you really, really, really want a finder for the untransformed records,
00:15:34.720
because you want to be able to say, hey, let's update only half of the billions, not all of them.
00:15:42.320
So, the actual migration process I went through on a project where I had to
00:15:48.000
build something. Unfortunately, sorry guys, I didn't have time to abstract it out into an open source thing, it was a client
00:15:53.199
project. You could touch every record in something like a Rails migration and make it work
00:15:58.720
how Rails people expect and be all seamless. Which is going to be slow, it's going to
00:16:04.639
kill your deploy, it's going to take a bunch of work to fake, and it's not going to have any advantages other than making people feel comfortable, which isn't
00:16:11.199
always such a bonus. Then you say, hey, why don't we do that other hack, which is we make a rake task
00:16:17.759
called rake migrate-something-or-other that we run out of band.
00:16:23.279
At least it's out of band, so now you can do a deploy and then run it. It's still going to be slow, you're not going to have good error tracking, you're
00:16:30.000
not going to know what the status is, it's not going to automatically be timed
00:16:35.199
for you, and it's going to be impossible to add additional workers. So after you've implemented that and it
00:16:40.800
doesn't work very well, you say, hey, we all use Resque anyway, or some sort of background job, why don't we
00:16:46.160
just use Resque? That sounds like a perfect solution. You still have a lot of rows, though, and you
00:16:52.160
need a way to track progress, so just a plain Resque worker isn't the solution. I ended up finding a pretty good
00:16:58.160
solution, which was to subclass resque-status and add in
00:17:04.640
some custom logging and outputting of errors. Because you could just do the
00:17:10.319
standard Redis thing and shove your errors into Redis, and be like, hey, we have Redis and there's an error, why don't we shove it in there?
00:17:16.000
But... oh, wrong order, damn it. Oh well.
00:17:21.520
What the slide that isn't there, the one I got in the wrong order, says is: if you shove all of your errors into Redis, your Redis
00:17:27.760
is going to blow up and fall over if you're talking about billions of records and you make one mistake ever, or you have a little
00:17:33.919
inconsistent data. And personally I don't like making Redis fall over when I'm counting on it to keep track of which
00:17:39.360
records I'm migrating. So you need to track things a little better. So come up with
00:17:45.200
some sort of cool subclass of resque-status; I found it useful to output a log, since, hey,
00:17:50.400
disk space is cheap: just a log file of, say, all the records that errored, or just keep track of the ids.
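A sketch of what that worker could look like, assuming the resque-status gem and the untransformed finder from earlier; the class name, batch size, and log path are all made up:

    class MigrateAddressesJob
      include Resque::Plugins::Status       # resque-status: gives at() and status tracking

      def perform
        log   = File.open('log/address_migration_errors.log', 'a')
        total = Person.untransformed.count

        Person.untransformed.limit(1_000).all.each_with_index do |person, i|
          begin
            person.save!                    # the setter does the actual transform
          rescue => e
            # errors go to cheap disk, not into Redis
            log.puts "#{person.id} #{e.class}: #{e.message}"
          end
          at(i + 1, total, 'migrating addresses')   # progress shows up in resque-status
        end
      ensure
        log.close if log
      end
    end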
00:17:57.039
You're going to hit another problem. You're like, sweet, okay, so we have this cool Resque thing that's keeping track of what's going on.
00:18:03.120
Timeout false: there are some weird timeout issues with
00:18:08.799
long-running cursors in Mongo and MongoMapper, and I don't remember exactly, but it might
00:18:15.200
have been a case where it wasn't even accepting passing in the timeout option, like setting timeout not to be true, which
00:18:21.200
meant you would have something running for some number of hours and then it just decides, oh hey, I timed out, too bad. So
00:18:29.600
if you implement this yourself you may have to invent your own batching process, which isn't that hard,
00:18:35.120
but don't count on MongoMapper's or Mongo's timeout option, or each-without-timeout, or
00:18:40.400
whatever, to work quite how you expect.
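By inventing your own batching I mean something like this sketch, walking the collection in _id order so no single cursor stays open long enough to hit those timeouts; the batch size is arbitrary:

    last_id = nil
    loop do
      query = Person.untransformed.sort(:_id.asc).limit(500)
      query = query.where(:_id.gt => last_id) if last_id
      batch = query.all                     # a fresh, short-lived query each time
      break if batch.empty?

      batch.each { |person| person.save }   # touch-and-save runs the transform
      last_id = batch.last.id
    end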
00:18:45.760
But one pass through, I swear to you, is not going to fix all your data.
00:18:50.960
If you hit billions of rows and you try to do a pass through, you're going to discover that, oh, you forgot a few cases. So you're going to need to be able to do
00:18:56.640
a multi-pass process. Oh, and here's a slide that I thought was supposed to be earlier.
00:19:04.559
Oh yeah, and a call-out to the thoughtbot people: Hoptoad is another possibility, because, hey,
00:19:09.760
there's nothing quite like tracking errors in Mongo and shoving them into someone else's Mongo instance and
00:19:14.880
seeing which wins first. What was that?
00:19:21.440
Yeah, when it works. Well, hey, they're on Mongo too, so, you know, what can you say?
00:19:28.240
Yeah, so this is great: now we're not overloading our Redis, we
00:19:34.000
have one instance running to try to transform billions of records, which,
00:19:39.200
you might guess, could take a while. So you need some way to slice it up.
00:19:45.360
The first answer people come up with is, hey, take a mod of the id, and then we can have 10 workers, and so
00:19:51.520
we divide it up into 10 chunks and then run workers against that. (God, why do I have to keep coughing,
00:19:56.960
sorry, I'm getting over a cold.)
00:20:02.960
Ten workers, a hundred, however many. It's not very flexible, but it'll get the job done.
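That naive split looks something like this sketch; WORKER_COUNT and WORKER_INDEX are assumed settings, and I use a stable digest of the id since plain Ruby hashes differ between processes:

    require 'digest/md5'

    workers = Integer(ENV.fetch('WORKER_COUNT', 10))
    slot    = Integer(ENV.fetch('WORKER_INDEX', 0))

    # Each worker walks everything but only touches its own slice, which is
    # part of why this is inflexible: the worker count is baked in up front.
    Person.untransformed.all.each do |person|
      bucket = Digest::MD5.hexdigest(person.id.to_s).to_i(16) % workers
      next unless bucket == slot
      person.save                           # touch-and-save transform
    end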
00:20:08.320
But there have got to be better ways to do that. One of the other cool things about Mongo
00:20:14.640
is that it has findAndModify, where you can find a record and at the same time atomically set a flag or modify a
00:20:22.720
value. So you can do something like: if you have your version flag, or some under-transform bit, you can find a
00:20:30.320
record at the same time as you're setting it to being-transformed. So given that, you can make sure that
00:20:36.720
you're not finding a record that a different worker is already transforming. This lets you spin up, and this is my
00:20:43.360
ideal final solution, spin up N workers with a randomized finder, so at any point each worker takes like a thousand out
00:20:50.960
of the chunk and processes it, and since you're setting a flag you're kind of safe that you're not
00:20:58.000
stepping on your own toes, and because of how it works, the worst you do is end up re-transforming a record if you get it a little wrong.
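A sketch of that claim step, dropping down to the driver collection MongoMapper exposes; the in_progress flag and the old-structure check are assumptions, and Person would also need something like key :in_progress, Boolean:

    # Atomically grab one not-yet-claimed, not-yet-transformed record and flag
    # it so no other worker picks it up.
    raw = Person.collection.find_and_modify(
      :query  => { 'address'     => { '$exists' => true },  # old structure still present
                   'in_progress' => { '$ne'     => true } },
      :update => { '$set' => { 'in_progress' => true } }
    )

    if raw                                  # nothing left to claim when this is empty
      person = Person.find(raw['_id'])      # loading runs the setter transform
      person.in_progress = false
      person.save
    end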
00:21:03.280
Yeah, I mean, that'll work for Mongo;
00:21:09.840
elsewhere you're going to have to find your own way of doing it. Yeah, there's another solution. But this
00:21:17.200
is cool, and supposedly this is a MongoMapper talk, so that is one cool thing about Mongo, that operation, because then you can say,
00:21:24.080
hey, add 10 more workers when it's off hours. And so this allows you to transform a large data set
00:21:29.840
where you're like, hey, hit it only so hard during the hours that matter, and at night put 100 more workers on it. Which
00:21:37.200
is actually a pretty cool thing, and how we should probably generally be transforming big data: not trying to pretend it's just
00:21:43.760
the same Rails solution. One of the other options, yeah, depending on
00:21:49.520
how big your Redis instance is, is you could just shove all the ids into a queue, since you're using Redis anyway,
00:21:55.440
and then spin workers off that queue. There will be some interesting timing questions of how many you want to put in there, and it
00:22:01.280
could get complicated, but it would work in the Riak case, where we don't have findAndModify.
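That alternative is roughly this sketch: pre-load the ids into Resque jobs, which live in Redis anyway, and let workers drain the queue. The job class and batch size are made up, and this is the shape you'd fall back to on Riak without findAndModify:

    # A plain Resque worker that processes one batch of ids.
    class MigrateBatchJob
      @queue = :migrations

      def self.perform(ids)
        ids.each do |id|
          person = Person.find(id)
          person.save if person             # touch-and-save transform
        end
      end
    end

    # One-off enqueue step: slice the untransformed ids into jobs.
    Person.untransformed.fields(:_id).all.each_slice(1_000) do |batch|
      Resque.enqueue(MigrateBatchJob, batch.map { |p| p.id.to_s })
    end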
00:22:08.400
Personally, I like to be safe, so I still have an instance method that can tell if that flag is set, in case my timing is weird,
00:22:15.919
just to not do the work, so I can escape early. Because it's really slow to transform billions of
00:22:21.280
records in Ruby land, so I kind of like to not waste time doing it.
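That guard is just something like this on the model; the method and flag names are assumed, and it relies on the same in_progress key as the findAndModify sketch:

    class Person
      # Belt and braces: skip records another worker already claimed or
      # transformed, so a worker holding stale ids can bail out cheaply
      # instead of redoing slow Ruby-land work.
      def needs_migration?
        !in_progress? && addresses.empty?
      end
    end

    # in the worker:
    person.save if person.needs_migration?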
00:22:27.039
So now you have awesome migrations. I mean, if you do implement this, it's not that hard a general approach;
00:22:32.960
hey, ping me if you have any questions, because I have done it once or twice.
00:22:39.600
But it's a pain in the ass, because you're still tracking multiple
00:22:46.000
versions of a lot of data and having code that works around it, and it turns out that
00:22:51.360
you should just avoid it. I wouldn't have more than one or two big transforms going at once;
00:22:57.200
it starts getting hard to keep track of what's going on, and then something falls over because one of them
00:23:02.960
is interfering with the others. Which means, god forbid, you actually have to suddenly be careful
00:23:10.000
about your upfront design, that you're getting the modeling of your data right. The funny thing about this is, the reason you went with MongoMapper, supposedly,
00:23:16.480
was that you didn't have to worry about the schema, and now, wait, I have to worry about the schema more, because it's going to be
00:23:22.960
a pain to transform, and now I actually do have to worry about the schema. So I think it's a little bit of a not-such-a-
00:23:28.799
great trade-off situation there. If you're not dealing with giant amounts of data it's not as much of an issue, but
00:23:34.720
then you still don't have the hey-let's-just-do-a-Rails-transform approach. So,
00:23:39.760
I have projects I would use Mongo on, but some of this don't-worry-about-it cost is totally illusory; you're just
00:23:45.919
pushing cost to a different point in the cycle, which is three months in, when you try to refactor something and you discover,
00:23:52.240
oh, this is really hard to keep track of. As I said, talk to the Carbon Five guys; they've moved a couple of projects off of Mongo
00:23:57.679
back to MySQL. When I was at Pivotal I saw a client there do the exact same thing.
00:24:05.039
Rails people are so fast... well, that's the end of the slides, by the way. Rails people are so fast working against MySQL, we
00:24:11.840
know what we're doing, that you actually end up losing productivity, I think, honestly,
00:24:16.880
generally, by using Mongo. There are some specific cases where, yes, this is actually a document, it
00:24:22.159
makes perfect sense, and I really need the flexible schema, and that makes it worth it. But generally I've found it's a drain as
00:24:29.679
well as a benefit, so really think twice about it. I still talk too fast and need more
00:24:36.240
slides, so we all get to go drinking early, unless there are many questions. Does it make sense to actually
00:24:43.120
use Mongo for some parts of an application, just in small parts,
00:24:48.799
for example? Yeah, I've considered it. I have a project now where,
00:24:54.960
if I went back in time, I would, despite my aversion to new and shiny, use Node and Mongo
00:25:01.120
for one piece and MySQL for the rest. The problem is, then you get into that
00:25:07.039
dreaded ground of cross-engine joins, and it is really nice to have as much
00:25:12.400
data as possible in one data store, because then whatever that data store's approach is for handling joining
00:25:18.559
models, or joining tables as it were, you can use that, instead of having to get a group of
00:25:24.320
ids against one and then pull it over to another one and select against that. And we know how to tune MySQL pretty
00:25:30.799
well by now, even aggregate queries; we can do some cool stuff with that. Or if you're using a giant data
00:25:37.200
store like a Dynamo implementation, like Riak, there are ways to work with your data. I've found it's
00:25:44.159
usually not, for me, about the breakdown of the data looking a little different;
00:25:50.159
the use case has to be different enough. Such as: Redis is great as a data store for semi-persistent or non-persistent
00:25:56.480
data, not necessarily as a primary data store. Or I'd use a Bigtable, use HBase
00:26:02.640
or something, roll stuff up and put it back into MySQL so my main app can use it. There are always trade-offs in
00:26:08.799
your data store choice, and all the proponents of one or the other make it sound like theirs is the be-all end-all,
00:26:13.840
but enough people write about all the trade-offs. Look at the trade-offs for each one and see, does it fit my use
00:26:20.799
case, not does it fit my data model. Because honestly, if you've been
00:26:26.480
writing MySQL apps for a while, you can figure out how to model just about anything in a database.
00:26:32.480
When you don't have experience with whatever new data store, you actually know that the first time you're
00:26:39.279
going to have a lot of stuff to fight with. Yep. Fine.
00:26:44.480
So do it on a toy project first; experiment with data stores on your own personal project, not
00:26:50.080
necessarily on a client's time if you're a consultant. Or find clients who say, yes, we want to do Mongo, you're
00:26:55.840
sure? I guess, like, okay, sweet, I'll learn something new; I
00:27:01.360
won't at all be frustrated when I hit my head against the wall repeatedly.
00:27:06.720
Yeah, it's hard, if you haven't used it, to know the trade-offs, but at this point Mongo's been around a while, so you can sort of
00:27:13.279
Google and find a lot of people ranting about it in both directions. It's a lot of fun to have a fully
00:27:19.679
dynamic schema that is determined by your code, that can even be determined at run time.
00:27:25.120
That fun has to be tempered with the costs of that sort of lack of constraint. I mean, we talk a lot in Rails
00:27:30.159
about constraints being creative and making us solve the problem right, and Mongo was the other
00:27:36.080
direction, of, hey, let's have no constraints and do whatever we want. I had one project on it; it was a
00:27:41.760
blast, but yeah, it was a nightmare dealing with different data versions, which is why I said, hey, maybe I should give a talk on this. I meant to abstract
00:27:48.240
it into a gem, but with proprietary code you can't always get permission to do that, particularly when you're
00:27:53.919
done with the project and you can't remember it perfectly. And I don't know that I'm going to write
00:27:59.039
another Mongo app until I have a client that's like, yes, let's put it in. At this point I would lean more towards
00:28:04.320
Riak, Redis, and MySQL or Postgres. Because, I mean,
00:28:10.240
Mongo does do a lot of cool stuff, but I don't know if, at terabytes, it's the
00:28:15.440
answer. I know some people that are pulling terabytes out of Mongo to just writing flat log files and
00:28:21.520
then using MapReduce to roll that up, because Mongo's just not holding up for them anymore.
00:28:26.640
I mean, Mongo works great when either your index or your data fits in memory;
00:28:32.399
it is blindingly fast, because it's designed to be better than MySQL at sort of the it-fits-in-memory case. You can do
00:28:38.880
even unindexed queries against small enough data sets and you can't believe how fast they return. You're like, there's not an index on here, how can you do this
00:28:45.440
complicated query against all these sub-objects and just return it right away? It's
00:28:51.039
blazingly fast at that, which is kind of cool. But that doesn't work,
00:28:56.399
that doesn't work when you have like a terabyte of data anymore, and then I think you need a solution that's built more around replication at its core, such
00:29:03.600
as a Dynamo sort of thing. Cassandra supposedly works now, but I don't know; I've watched how long people have
00:29:10.159
been trying to get Cassandra working, so I'm a little nervous about it. And there are a lot of people I've talked to who are smarter than me and
00:29:16.240
know a lot more about databases, and every one of them keeps saying, well, why don't you use Riak? So
00:29:21.840
that's kind of why I'm leaning that way. Was there another hand, or does everyone
00:29:27.039
want to go get drinks? Okay, go ahead, but when people leave, then it's done.
00:29:33.840
How about with embedded documents, are they equally fast,
00:29:39.279
for example when you don't have an index? Yeah, I was surprised at how fast queries
00:29:45.600
against embedded documents worked when they weren't indexed, and that's what I was saying: I talked to the 10gen guys, and they were like, well, yeah, if your data set's
00:29:51.840
small enough to fit in memory, in a dismissive tone, and then I was like, oh yeah, that makes sense.
00:29:58.480
Mongo does some interesting things with how it loads things into memory when you access the collection, and if it's small enough it's going to do incredibly fast
00:30:04.799
queries against it. So yeah, throw Mongo on your beefy server for small data sets, where small is however
00:30:11.600
many gigabytes of RAM you can afford, and it'll work, it'll be great. You won't even have to worry about it; it'll just be
00:30:17.600
like, sweet, I type away at this dumb query and it returns, and I don't have to think about my joins, and I don't have any problems. And then your data set
00:30:24.240
grows larger than your memory, and it falls over and blows up in your face.
00:30:29.840
It's not really your data set growing beyond memory, it's your active data set. Active, yes.
00:30:34.960
Like, we have way more memory usage, way larger index size than RAM, but we
00:30:42.159
have no performance issues, because that's not the active data set; typically the latest partition is
00:30:48.320
what's actually active. Yeah, so it's the active data set, not all of it. Could everyone hear that?
00:30:53.440
He basically said your active data set is what matters, not your whole data set; their indexes even are far
00:30:59.039
larger than their memory, but they're not accessing all of it, which is a good point. If you fit within the profile of Mongo, it is sort of nice how
00:31:05.919
fast and easy it is. I think there are some costs that are trade-offs there, but
00:31:11.360
I was very impressed at how performant it was for what would be, in MySQL, stupid-hard queries, where it's
00:31:18.880
just like, oh, here you go, here are all those records you wanted. Like, but how are you doing that?
00:31:24.159
They'd put some thought into it. Couldn't you, so for example, I don't know, start the
00:31:31.279
data series with, I don't know, 100 gigabytes of memory,
00:31:37.360
and then we...
00:31:42.480
I didn't quite follow your question. Yeah, so if
00:31:48.640
scanning of in-memory objects is fast enough, couldn't you just
00:31:54.240
shard the data among instances? You could.
00:31:59.360
I don't know, he might know more, because it's been three to six months for me. He was asking, well, how about just sharding your Mongo
00:32:05.919
then? I was just burned by dealing with Mongo
00:32:12.159
replication a couple of times, and I sort of avoided it and treated it like... I mean, sharding's a different problem,
00:32:17.919
but if you're going to the effort to shard, then you could use data stores that
00:32:23.679
sort of work better with large data. And large data in this case, with how big memory is getting, can mean seriously
00:32:30.240
large data; I mean, a lot of the websites we build are going to fit fine in Mongo. I personally don't think it's worth the
00:32:36.320
cost of dealing with a fully undefined schema and not knowing what my data is. I mean, it is really nice to know that, hey,
00:32:44.080
at least within certain constraints the data store is enforcing, I know what the data looks like, because I can write my
00:32:49.200
code against it; you can't really write code against an undefined schema. So
00:32:54.559
if you spend the effort to guarantee you know what the data is, it's totally worth it. If you're hiring junior,
00:33:01.039
intermediate, or even advanced Rails people that haven't touched Mongo, they're going to be way faster writing code against MySQL, often, so why pay that
00:33:08.559
cost? But I'm not saying don't ever use it; I've used it, it was fun.
00:33:14.080
It's just, fun is a thing where it keeps developers productive to some degree, but you don't
00:33:20.480
necessarily want to pay too many costs for it. There are plenty of shops that...
00:33:27.039
well, John Nunemaker, who wrote MongoMapper, is obviously using it with great success, otherwise he wouldn't have
00:33:32.799
written it and kept maintaining it. That's part of why I like MongoMapper: my experience with Mongoid on my first Mongo
00:33:39.679
project kind of hurt. Sorry, whoever writes Mongoid, but I didn't like it nearly as much.
00:33:46.159
I think it was the plugin, sort of data mapper, pattern that's so baked into MongoMapper, which Ripple, the
00:33:53.679
Riak client, was inspired by. So I haven't played with Ripple, but it sounds like you should have a similar
00:33:59.840
experience using Riak if you use the Ruby client, Ripple.
00:34:05.120
Yeah, you don't have findAndModify, but you have a bunch of other cool stuff too. So I think that looks like about it, so we
00:34:11.679
can get out of here early. Anyone? Nope? Cool.