Boundaries

Gary Bernhardt

#functional-programming

#software-design-patterns

Gary Bernhardt • November 01, 2012 • Denver, CO • Talk

The talk titled "Boundaries" by Gary Bernhardt at RubyConf 2012 explores the concept of isolating unit tests in software development and the implications of using mocks and stubs. Bernhardt highlights the tension between the benefits of isolated testing and the issues that arise when production code diverges from test configurations. He emphasizes the importance of distinguishing between value and mutation in programming paradigms, contrasting procedural, object-oriented, and functional programming approaches.

Key Points Discussed:
- Isolation in Testing: Bernhardt explains the advantages of isolated unit tests, including improved test-driven design and faster tests. However, he warns that reliance on mocks can lead to passing tests that do not reflect actual production scenarios, resulting in boundary problems.
- Testing Approaches: Several strategies for the boundary problem are presented, including contract and collaboration tests, tools like rspec-fire (which checks that mocked methods actually exist on the real class), and static typing. He also discusses the limitations of integration tests, arguing that they become unmanageable as code complexity grows.
- Working with Values: The focus shifts to how adopting a functional programming style, where data is immutable and communicated through values rather than methods, can mitigate boundary problems. Bernhardt illustrates this with examples, including the transformation of a traditional class structure into one reliant on value-based communication.
- Three Programming Paradigms: Bernhardt examines procedural, object-oriented, and functional paradigms, explaining how they differ in terms of mutation and in whether data and code are bound together. He introduces a fourth, hybrid approach he calls "FauxO" (functional OO), which combines immutable data with the encapsulation typical of object-oriented designs.
- Concurrency: The talk concludes with insights on leveraging concurrency through the actor model, highlighting how values allow for better distribution of processes within a program.

The central takeaway is that designing systems with a strong emphasis on functional principles—especially data encapsulation and immutability—enables easier testing and concurrency management, ultimately leading to more robust software architectures.

Date: November 01, 2012
Published: Tue, 19 Mar 2013 00:00:00 +0000
Announced: unknown

Some people test in isolation, mocking everything except the class under test. We'll start with that idea, quickly examine the drawbacks, and ask how we might fix them without losing the benefits. This will send us on a trip through behavior vs. data, mutation vs. immutability, interface vs. data dependencies, how data shape affords parallelism, and what a system optimizing each of these for natural isolation might look like.

RubyConf 2012

00:00:14.269 the title of this talk is boundaries this is the only one word talk title at this conference which I'm very proud of

00:00:20.189 the next shortest is three words thank you this is some of the stuff in this

00:00:28.199 talk is going to be very familiar to anyone who comes from certain functional programming backgrounds but this is a

00:00:33.809 story of me approaching some ideas that they have from a very different direction and from a very different history so I am Gary Bernhardt so I

00:00:42.270 would look like this on the internet where sometimes I get mad and my

00:00:48.510 Bluetooth is not working very well I might have to forego it I own a company

00:00:54.750 called Destroy All Software that produces screencasts on various advanced software development topics and to start

00:01:02.070 us off in this talk we're going to start with test doubles there are a couple talks about test doubles mocking and

00:01:08.159 stubbing at this conference this is not a talk about test doubles but they are going to be part of my motivation just

00:01:14.970 to make sure everyone's on the same page let's go through a quick example of what an isolated unit test might look like I

00:01:20.009 have a sweeper class this is in some kind of recurring billing situation and if I have a user who is subscribed but

00:01:26.100 has not paid in the last month I want to tell him that something's wrong and disable his access so when his

00:01:32.159 subscription when a subscription is expired we will make a user Bob he's going to be a stub he's an active user

00:01:38.280 and he last paid two months ago we will have an array of users that's just Bob

00:01:43.439 for convenience and before every test we're going to stub out the User.all method to return that array of Bob so

00:01:50.850 this is one of the ways in which we're isolating ourselves from third parties from other classes like user we want to

00:01:57.689 email the user when the subscription is expired so we will invoke the sweeper and we expect it to call UserMailer

00:02:05.130 billing_problem to send an email to this user telling him things are bad so this is an isolated unit test it's isolated

00:02:11.700 because it removes its dependencies like User and like the UserMailer hopefully my phone

00:02:18.690 is back now awesome okay the implementation of this is very

00:02:24.420 simple we will pull out all the users from the database we will select only the ones who are active users but have

00:02:30.659 not paid recently enough and then for each of those we will send the email right so very straightforward stuff what

00:02:38.340 we have here is a three class system these three classes integrate in

00:02:43.530 production but in tests we're removing two of the dependencies replacing them with stubs and mocks giving us this

00:02:49.019 as our testing world so everything is nice and isolated there are several good

00:02:55.560 reasons to do this several very big benefits that come out of it but there's also one really terrible thing that

00:03:01.530 happens when you do this so let's go through those this allows you to do real test-driven design looking at your tests

00:03:07.500 seeing that you have mocked six things and two of them are mocked three method calls deep this tells you that your

00:03:13.140 design is not so good for this class so it gives you a form of feedback that you can't get without isolated tests at

00:03:18.630 least I don't know how to it allows you to do outside in TDD where you actually build the higher-level pieces before the

00:03:24.930 low-level pieces exist so we could TDD the sweeper using the user using the user mailer before those classes exist

00:03:31.349 because we're just stubbing them out anyway then when we want to write the user class for real we can look at what

00:03:36.510 we stubbed and that tells us the interfaces it needs and finally this gives you very fast tests this is one of

00:03:43.410 the main things in the whole fast Rails tests meme or I don't want to call it a movement but people getting excited

00:03:48.480 about fast tests in the Rails world and we're talking about the difference between a 200 millisecond time from

00:03:54.239 hitting return to seeing the prompt back versus a 30-second time to run a very small test it's a very big difference

00:04:00.090 when you're when you're really isolating so these are all very good things that you want but they are balanced out by a

00:04:07.709 very bad thing and that bad thing is that in tests you're running against a mock and a stub and in production you're

00:04:14.010 running against real classes and if you don't stub the boundary correctly your tests will pass and your production system will be wrong and this is this is

00:04:22.770 such a big problem that for most people I think it overshadows all those benefits even if you explain them

00:04:28.420 to them they're going to look at this problem and say it's not worth it now

00:04:33.760 there have been attempts to fix this various various approaches to try to solve this problem in one way or another

00:04:39.750 one of which is to solve it with more testing contract and collaboration tests this is an idea sort of most closely

00:04:48.460 associated with J. B. Rainsberger who is one of the people who is most influential on my understanding of isolated unit testing I've not actually

00:04:55.540 done this and something about it doesn't resonate well with me but it is one attempt to fix this there's also the

00:05:01.900 tools approach rspec-fire is a tool in Ruby that tries to solve this problem if you mock a class with

00:05:07.660 rspec-fire it will make sure that you only mock methods that actually exist so it makes sure that you don't cause

00:05:13.780 these boundary problems or at least you don't cause simple boundary problems and finally you can solve this with static

00:05:19.150 typing like so many things in life it comes with all the same costs you pay to solve anything with a powerful static

00:05:25.510 type system but if you think about your mocks as being subclasses of the real class they just remove all the actual

00:05:31.240 implementations that gives you an idea of how static typing can solve this boundary problem all of these only solve

00:05:39.610 simple kinds of mismatches between objects they solve things like I called the method with the wrong name like I

00:05:44.800 passed the wrong number of arguments they don't solve deeper things like my two algorithms that need to cooperate

00:05:49.900 don't actually cooperate correctly the way that you can solve that and the most

00:05:55.570 common way people try to fix this problem is by just not doing isolated unit testing by just integrating right

00:06:01.740 the problem with solving the isolation problem with integration is that integration tests are a scam I can't

00:06:10.900 take credit for this sentence this is once again J. B. Rainsberger there's a talk called Integration Tests Are a Scam

00:06:16.270 which you should all watch it's a really good talk that really lays out the argument for why integration testing doesn't work on a long enough time scale

00:06:22.630 and he nowadays uses the terminology integrated test meaning any test that's integrating multiple pieces I'll give

00:06:30.370 you the really quick and dirty argument for why integration tests don't work the number of paths through your program

00:06:36.010 goes like 2 to the n where n is the number of branches or conditionals and that includes try/except that includes a

00:06:42.340 short-circuiting boolean expression that includes a loop every time a branch is

00:06:47.470 happening if you have n of those you have 2 to the N paths and if you're trying to test the whole thing you have a space of 2 to the N to decide to

00:06:54.130 choose from if you have 500 conditionals in your program this is a number with about 150 digits in it it's a very large

00:07:00.880 space it's very difficult to effectively choose which which paths matter because they're effectively uncountable to you
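As a quick check of the arithmetic above, here is a minimal sketch that counts the digits of 2 to the 500 using Ruby's arbitrary-precision integers. The variable names are illustrative only, and the 2**n model assumes every branch is independent, as in the talk's rough argument.

```ruby
# Rough upper bound on paths: each of n independent branches doubles the count.
branches = 500
paths = 2**branches

# Ruby integers are arbitrary precision, so the digits can be counted directly.
digit_count = paths.to_s.length
puts digit_count # 151, in line with "about 150 digits"
```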

00:07:06.720 the other problem is that suite runtime in an integration suite is super linear whenever you add a unit test or whatever

00:07:14.440 kind of test you're writing you're also adding a little bit of code so your number of tests goes up by one and you

00:07:19.870 make the system a little bit bigger which means all your existing integration tests get a little bit slower so every time you add a test

00:07:25.960 there are two sources of slowness one of which is linear and one of which is something else I'm not sure of but together it's definitely a super linear

00:07:32.680 runtime and anyone who has a three-hour Rails test suite will be able to tell

00:07:38.410 you that this is in fact the case and they will probably not like their lives very much either so that's all

00:07:45.820 background this is this is how I came to the ideas of this that I'm going to talk about for the rest of this talk this was

00:07:51.450 this has been a large focus in my software development career for the last five years is isolated testing and

00:07:58.030 figuring out how to do it well so now let's shift gears entirely and talk about values values meaning the pieces of

00:08:07.120 data inside of a program if you want to test the plus method and let's just

00:08:12.610 think about + on machine integers and for whatever reason you decide you want to test it in isolation so you don't

00:08:18.910 want any other dependencies involved in the testing what do you have to do to isolate plus nothing it isolates for

00:08:27.640 free plus doesn't have any dependencies there's nothing to mock out there's nothing to stub it's totally local and

00:08:33.750 why is that the case it's not just because + whoops it's not just because plus is simple it's it's tempting to say

00:08:41.349 oh plus is simple so of course it isolates for free that is not what's happening it has two properties that are necessary to be naturally isolated with

00:08:49.030 no stubs or mocks the first is that it takes values as arguments and it returns new values and it doesn't mutate those values it just

00:08:55.510 gives you a new value right it takes an integer and an integer and it gives you an integer the second property

00:09:00.760 is that it doesn't have any dependencies there's nothing to mock it doesn't it doesn't need anything else it's a local

00:09:06.850 computation that just produces a new value so how could we apply that to more

00:09:11.950 complex code that we work with all the time stuff like the sweeper well let's go through this and just impose both of

00:09:18.370 these constraints and see what happens starting with the Bob stub we can't use a stub because we're not faking out any

00:09:24.340 boundaries so let's replace that with a user object but not like an active record object but like just a struct

00:09:30.220 some kind of a piece of data even a hash I wouldn't use a hash but you could just use a hash we can't do the User.all

00:09:38.470 stub because we're not allowed to so we'll just delete that and then the actual body of the test instead of doing

00:09:43.630 a mock expectation we can just call the method and get back the array of users who are expired now this does less than

00:09:51.730 the original code we're going to get to that later the the implementation changes we basically lose the second

00:09:58.930 half we now have a method that goes through all the users and returns only the expired ones this difference is huge

00:10:08.320 the difference between the original code and this is huge the nature of the communication between the components has changed instead of having synchronous

00:10:15.790 method calls as the boundaries between things we now have values as boundaries the value returned or taken by the

00:10:22.570 method is the boundary between it and another object now just as a quick

00:10:28.630 digression when I talk about values I often mean things like this may be a

00:10:33.790 class that is a struct it has two fields title and body and it has a slug computed from the title but behaviorally

00:10:40.660 this is equivalent to a class that has a title body and slug and computes the slug at creation time they're basically

00:10:46.090 the same thing right the only way to tell the difference from the outside is timing properties on the method calls so

00:10:51.730 I'm going to use these two ideas interchangeably but really they're basically the same
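A minimal sketch of the value-based sweeper test described above, assuming a plain Struct as the user value. The names `User`, `ExpiredUsers.for_users`, the field names, the second user Carol, and the one-month cutoff computed with `Date#<<` are all illustrative assumptions, not the code shown in the talk.

```ruby
require "date"

# A plain value standing in for a user record (not an ActiveRecord object).
# The field names are assumptions for this sketch.
User = Struct.new(:name, :active, :paid_at)

# Hypothetical module name; the talk only says the method returns the expired users.
module ExpiredUsers
  # Returns the active users whose last payment is more than a month before `today`.
  # Takes values in and returns a new array; nothing is mutated, nothing is stubbed.
  def self.for_users(users, today)
    cutoff = today << 1 # Date#<< steps back by one month
    users.select { |user| user.active && user.paid_at < cutoff }
  end
end

today = Date.new(2012, 11, 1)
bob   = User.new("Bob", true, today << 2)   # last paid two months ago
carol = User.new("Carol", true, today - 1)  # hypothetical user who paid yesterday

expired = ExpiredUsers.for_users([bob, carol], today)
puts expired.map(&:name).inspect
```

The test is just a method call and a comparison of returned values, which is the sense in which the value becomes the boundary.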

00:10:57.060 so we've seen isolated testing as a bit of background the idea of converting the

00:11:03.040 code in the system to communicate via values at the boundaries instead of via message sends or method calls at

00:11:09.070 boundaries and now I want to look at how this fits into the three dominant

00:11:14.890 programming paradigms putting aside logic programming but how does this relate to procedural OO and

00:11:20.710 functional programming here's a small piece of procedural code we want to feed

00:11:25.930 some walruses so for each of the walruses we shovel some food into its stomach we shovel some cheese into

00:11:31.300 the walrus's stomach there are two properties of this code that make it very obvious that it's procedural the

00:11:37.900 first is the each whenever you see each in Ruby there's something destructive going on each with a non

00:11:43.960 destructive body is a no-op so there's something destructive happening and we know the structure of the walrus and the

00:11:50.200 structure of its stomach we know it has a stomach we know the stomach can have things shoveled into it we have knowledge of the internals contrast this

00:11:57.820 with the OO solution where you still have an each it's still destructive in most OO code but now we tell the walrus

00:12:03.880 to eat something he knows how to eat instead of us knowing about his stomach and then the eat method will shovel

00:12:09.190 things into the stomach same code as before just encapsulated and my

00:12:14.470 Bluetooth is dying again so we have two two paradigms here both of them involve

00:12:21.040 mutation one of them separates data and code that's procedural one of them combines them into units called objects if we add functional to this instead of

00:12:30.220 doing an each we do a map we're going to take all the walruses and produce new walruses that are slightly different so for each of them we're going to call eat

00:12:36.820 on the walrus and some food some cheese and I'm going to use a hash for the walrus and array for the stomach and

00:12:42.520 strings for the food so in the eat function it's kind of weird but we build a new stomach that's the old stomach

00:12:47.920 plus the new food and then we build a new walrus that's the old

00:12:54.370 walrus with the new stomach you can see why OO models real-world things a little better than functional

00:13:00.130 programming does okay so that's that's functional nothing is being mutated
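The three walrus-feeding styles described so far can be sketched side by side. This is a reconstruction from the spoken description, not the slides: the hash-with-array shape for the functional version follows the talk, while using the same shape for the procedural version, the `Walrus` class layout, and the variable names are assumptions.

```ruby
# Procedural: each plus mutation, and the caller reaches into the walrus's stomach.
procedural_walruses = [{ stomach: [] }]
procedural_walruses.each { |walrus| walrus[:stomach] << "cheese" }

# OO: still each plus mutation, but the walrus encapsulates its own stomach.
class Walrus
  attr_reader :stomach

  def initialize
    @stomach = []
  end

  def eat(food)
    @stomach << food
  end
end

oo_walruses = [Walrus.new]
oo_walruses.each { |walrus| walrus.eat("cheese") }

# Functional: map instead of each; eat builds a new walrus and leaves the old one alone.
def eat(walrus, food)
  new_stomach = walrus[:stomach] + [food]
  walrus.merge(stomach: new_stomach)
end

hungry_walruses = [{ stomach: [].freeze }.freeze]
fed_walruses = hungry_walruses.map { |walrus| eat(walrus, "cheese") }
```

Freezing the input in the functional version is only there to demonstrate that nothing gets mutated.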

00:13:06.550 right so we have no mutation but data and code are separate they are not

00:13:11.980 combined into single things now if you look at this table obviously I've left a row there's one more row to go

00:13:17.530 but even just looking at the variables we have two variables does it mutate or not does it bind data and code together

00:13:22.750 or not they clearly vary independently which means we have four possibilities so what is the fourth possibility it's

00:13:29.050 not logic programming by the way here's what the fourth possibility looks like we map like in functional

00:13:36.790 programming so we're producing new walruses but we're telling the walrus to eat something and that's not a

00:13:42.460 destructive eat instead the eat method constructs a new walrus that is the old walrus with a new stomach that contains

00:13:49.030 the new food so it combines the immutability of the functional code but

00:13:54.340 it combines the merging of data and code together like OO does and that is the

00:14:00.070 fourth entry and I call it lovingly FauxO because it's not real OO now there's a

00:14:08.590 problem with programming this way and that problem is that you lose the ability to do anything destructive to

00:14:14.020 talk to network to talk to disk to do any kind of i/o you lose the ability to maintain state over time so to to

00:14:22.690 reintroduce the idea of state we have to add imperative programming back into

00:14:27.790 this sort of FauxO style of programming we have to figure out how to compose the user database the expired users class

00:14:34.270 and the mailer together even though the expired users class is functional in nature so we have our expired users it

00:14:42.160 returns an array of users who we need to notify and what we need to do is reintroduce the imperative layer around

00:14:47.950 it an imperative shell which surrounds the functional core it talks to the database it uses expired users to filter

00:14:54.760 those users and then it emails each of the ones that comes out so the

00:14:59.800 imperative shell is a layer that surrounds the functional core the functional core is the bulk of the application it has all the intelligence

00:15:06.190 and the imperative shell is sort of a glue layer between the functional pieces of the system and the nasty external

00:15:12.430 world of disks and networks and other things that fail and are slow if we if

00:15:18.580 we look at what's actually happening in these two things it's not an arbitrary distinction even though all I did was

00:15:23.710 cut the original method in half this division runs very deep if you look at

00:15:28.780 what these things do the expired users class makes all the decisions and the sweeper class has all

00:15:35.570 the dependencies so if we look at the

00:15:40.610 way that that relates to testing the functional core is heavy on paths heavy on decisions light on dependencies which

00:15:46.339 is exactly what unit testing is good at especially isolated unit testing when you take away the need to stub out the

00:15:52.430 dependencies you can just focus on the logic and the tests become very simple and exactly the same thing is true for

00:15:57.649 the shell lots of dependencies few paths is exactly what an integration test is

00:16:02.690 good at it because it makes sure all the boundaries are lining up all the pieces are communicating correctly but you

00:16:07.880 don't have a lot of test cases which means you don't end up with a 30 minute or a 3 hour test suite just to get a

00:16:16.730 sense of what that integration test might look like since we already saw the unit test maybe I create two users in

00:16:21.980 the database actually create them in an actual database I invoke the sweeper I pull out all the mails that were

00:16:28.399 delivered by ActionMailer and I make sure that only Alice was mailed she's the only one who's expired she

00:16:33.709 paid two months ago Bob paid yesterday but I only have to write one of these

00:16:38.779 whereas I'm going to have to write a bunch of the isolated tests on the functional core so now we have a a

00:16:47.300 solution to the isolation problem for most code in the system because we can build

00:16:52.370 it all as functional pieces in this sort of FauxO style where there are still objects but they're not mutating and

00:16:57.740 they're just taking values in and out and we have a way to reintroduce the imperative part around it so we can

00:17:03.410 actually talk to the outside world and it turns out that this leads to all

00:17:09.049 kinds of amazing benefits not just the testing benefit not just the fact that functional code is easier to reason about over time but it even makes

00:17:16.579 certain types of concurrency much easier if we think about the actor model of concurrency which is the one that I have

00:17:22.699 the most faith in as something sort of approaching a general-purpose concurrency style or concurrency

00:17:27.980 programming method let me quickly explain it to you just in case

00:17:33.290 everyone's not familiar I'm going to do it with just threads and queues so we have a queue and this is going to be the

00:17:38.450 communication mechanism between two processes it is the inbox of process two process one is going to send

00:17:44.300 to it for process one I'm just going to fork off a thread that is going to infinitely loop reading from standard in

00:17:50.300 and pushing into the queue process two is going to infinitely loop reading from the queue and writing to standard out so

00:17:55.370 this is an echo program that's communicating through a queue where the queue is the inbox for process two if I just

00:18:02.930 run this at the shell and start typing things into it it's just going to print out whatever I sent in this is the the

00:18:10.130 simplest way I know to explain the actor model you have independent processes each of them has an inbox it is only

00:18:16.160 readable by that process and they communicate by sending messages to each other into each other's inboxes
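The echo program described above can be sketched with Ruby's built-in `Thread` and `Queue`. Two things differ from the spoken description and are assumptions of this sketch: the helper `echo_through_queue` takes its input and output as parameters instead of using standard in and standard out directly, and a `DONE` sentinel is added so the example terminates, whereas the talk's version loops forever.

```ruby
require "stringio"

# Assumed sentinel so the sketch can finish; not part of the talk's version.
DONE = :done

# Hypothetical helper: two threads ("processes") that talk only through a queue.
def echo_through_queue(input, output)
  inbox = Queue.new # the inbox of process two

  # Process one: read lines and send each one as a message.
  process_one = Thread.new do
    input.each_line { |line| inbox << line }
    inbox << DONE
  end

  # Process two: pop messages off its own inbox and write them out.
  process_two = Thread.new do
    loop do
      message = inbox.pop # blocks until a message arrives
      break if message == DONE
      output.write(message)
    end
  end

  [process_one, process_two].each(&:join)
  output
end

out = echo_through_queue(StringIO.new("hello\nworld\n"), StringIO.new)
puts out.string
```

Passing `$stdin` and `$stdout` instead of the `StringIO` objects gives the interactive echo behavior described in the talk, ending when standard in is closed.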

00:18:22.990 the reason the way that this relates back to functional core imperative shell to FauxO to the idea of having lots of

00:18:29.540 values is that every value in your system is a potential message a possible message between two processes every

00:18:36.740 value that is struct like and can be easily serialized can also be easily sent over the wire and this is a special

00:18:44.360 case of the value is the boundary between the components so if we rewrite our sweeper in a slightly different way

00:18:50.210 so we have a sweep method it calls expired users on User.all so it pulls everything out of the database finds only the expired ones

00:18:57.200 and then for each of those emails this is the imperative shell that you're looking at right now the functional core

00:19:02.630 is the expired users class it's going to do what it did before or the expired users method excuse me it's just going

00:19:08.450 to filter out expired users and then we have this very trivial notify a billing problem thing that just delegates to the

00:19:14.150 mailer let's translate this into the actor model for the first one I'm going

00:19:20.360 to make an actor that pulls everything out of the database and just sends them one by one into the expired users actor

00:19:25.640 and then dies if I didn't do die then this would loop infinitely the expired

00:19:31.610 users actor is just going to pop a user off of its inbox it's going to decide whether that user

00:19:37.880 is late and if it is late it's going to forward that user on to the mailer process and the mailer process is just

00:19:44.330 going to invoke the mailer so the imperative shell is sort of a bigger process it takes a little while to run

00:19:50.750 it fires off all these messages to the smaller processes and what we've just done is converted a program that could

00:19:56.930 only use one core into a program that can use three cores not on MRI but on other VMs we've

00:20:03.970 parallelized this by doing very little work because we had the values available to send over the wire oh I forgot to

00:20:12.140 actually translate that there's the new version it's the same thing as the old basically values in your system afford

00:20:19.250 shifting process boundaries but really in general values in your system afford shifting boundaries between anything

00:20:25.370 between a class arrangement between subsystem arrangement between the way

00:20:31.190 you're building your program whether it's serial or parallel so this has

00:20:37.480 programming in this style has surprisingly deep effects on the things

00:20:42.920 you can do in the way that you can do them that was a lot of stuff so now I'm going to try to reset it in like three

00:20:49.430 minutes to make it all tie together in this style you design your program as a

00:20:56.360 core of independent functional pieces that take values and return values the imperative shell orchestrates the

00:21:01.520 relationships between those interfaces them to the network the disk other nasty

00:21:06.530 systems like that and maintains state for example I wrote a Twitter client in this style it's a terminal

00:21:13.640 program but it's interactive like Vim would be so you hit J to go down to the next tweet the imperative shell sees the

00:21:20.120 J calls into the functional core to generate a new cursor position the new cursor is generated and returned and

00:21:25.550 then the imperative shell updates the instance variable holding the cursor to be the new cursor the functional core

00:21:30.800 built the new cursor and it was a purely functional operation the imperative shell just updates references to these

00:21:36.950 new objects as they're constructed what you get from this is easy testing

00:21:43.280 especially isolated you also get easy integration testing and the distinction between which one happens where is a lot

00:21:49.610 more obvious than it is if you just start throwing things against the wall and try to figure out what gets tested how later you get fast tests you don't

00:21:57.650 have to do any weird stuff to get fast tests they're just inherently fast because they're functional and working on small pieces of code you have no call boundary

00:22:05.150 risks you don't have to stub or mock you have easier concurrency at least in the

00:22:10.220 actor model and you have more fluid transition between concurrent and serial computation and that's all

00:22:16.580 just a special case of having higher code mobility in general moving code between components moving code between

00:22:22.460 processes so that is the end of the

00:22:27.470 actual talk once again I am Gary Bernhardt I run Destroy All Software which produces screencasts and if you

00:22:33.740 are a subscriber or want to become one it is not free but there is a screencast on Destroy All Software called

00:22:39.320 Functional Core, Imperative Shell which is the first time I ever talked about this in public and the one that's coming

00:22:44.990 out two weeks from now is also about this topic expanded a little more and in

00:22:51.500 that screencast I give a much larger example that I can't really give here but I show you the Twitter client and how it's arranged and how the

00:22:58.190 different parts of the system are segregated in this way so with that

00:23:03.230 thank you guys very much for listening to me for half an hour
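To make the Twitter client recap from the talk concrete, here is a minimal sketch of a functional core and imperative shell for the cursor example. It is not the actual client: the `Cursor` and `Shell` classes, the `down` and `handle_key` methods, and the clamping at the last tweet are all assumptions made for illustration.

```ruby
# Functional core: an immutable cursor value (hypothetical design).
class Cursor
  attr_reader :index

  def initialize(index, tweet_count)
    @index = index
    @tweet_count = tweet_count
    freeze # the cursor never changes after construction
  end

  # Returns a NEW cursor one tweet further down, clamped to the last tweet.
  def down
    Cursor.new([@index + 1, @tweet_count - 1].min, @tweet_count)
  end
end

# Imperative shell: owns the state and only swaps references to new values.
class Shell
  attr_reader :cursor

  def initialize(tweet_count)
    @cursor = Cursor.new(0, tweet_count)
  end

  # Sees the key, asks the core for a new cursor, then updates its reference.
  def handle_key(key)
    @cursor = @cursor.down if key == "j"
  end
end

shell = Shell.new(3)
first_cursor = shell.cursor
shell.handle_key("j")
puts shell.cursor.index
```

The only mutation is the assignment to `@cursor` in the shell; the cursor logic itself can be tested by calling `down` and comparing values, with no stubs or mocks.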

00:23:16.930 that actually went way faster than I expected so I would be happy to to take

00:23:24.050 comments or questions or yeah do you think there's any

00:23:29.740 useful distinctions besides the functional bit between like a ports and

00:23:35.860 adapters architecture right that's a wonderful question the question is about the relationship to ports and

00:23:41.680 adapters or hexagonal architecture or these kinds of things yeah so if you're building a large

00:23:50.890 system that's going to be 30,000 lines of code you don't want to have one functional core and one imperative shell if you ask a Haskell programmer about

00:23:58.210 doing this they will tell you that that it just becomes a nightmare I think that the the ideal large system is actually

00:24:04.000 many smaller systems built out of this in a sort of way you you have the functional pieces you wrap them in a

00:24:10.270 layer of scar tissue to interface them to the nasty outside world and then you build a bunch of those that communicate

00:24:15.940 in destructive ways is it does that answer the question

00:24:22.390 sure there's no adapter in that explanation but it's sort of the

00:24:29.050 adapters are the scars exactly that's true I guess yeah to some extent the imperative shell is just an

00:24:34.540 adapter fair observation yeah over on the side

00:24:44.910 the question is have I found success in using actors with

00:24:50.040 Ruby the answer is no I have so this

00:24:56.610 this Twitter client that I that I wrote to as I was figuring this out does use

00:25:01.680 the actor model but it's just threads and queues I just built a little actor library it's like 35 lines of code a

00:25:07.500 simple actor library is easy a more complex one I see diminishing returns if

00:25:13.500 your VM isn't built for it you can't spawn half a million processes in Ruby your machine's just going to

00:25:19.170 explode into smoke so use Erlang yeah

00:25:30.770 paradise bringing in other gems of libraries the same time like let's see a

00:25:37.100 traditional right

00:25:42.549 so the question is how suited with a rails app be to this style development the answer once again is no it's not

00:25:51.470 going to work very well you could I mean it depends on how large your rails app is the thing about a rails app is if

00:25:58.159 your rails app is a hundred thousand lines you don't have a rails app you have ninety five thousand lines of your

00:26:03.860 application and you have five thousand lines of rails glue code and probably what you've done is dumped those ninety

00:26:09.500 five thousand lines into models controllers and helpers and failed to actually design your system if you have

00:26:16.429 designed a system and treated rails as a small component of it that you want to mostly protect yourself from then you

00:26:22.549 might be able to do this but to be honest I've I've never even thought hard about how you would do that I guarantee

00:26:30.320 it's possible but but you're not going to transition your large rails app into this easily by would be like we do

00:26:37.970 letters a lot written software and

00:26:51.090 you if with the imperative shell wrapped

00:26:57.159 around the functional core you can do whatever you want out there right so you can use I mean like my Twitter my

00:27:02.559 Twitter client uses tons of gems well not tons it uses like six or eight gems normal gems that are just you know

00:27:10.210 work like anything else and they're they're imperative in nature as oo programs and Ruby programs tend to be

00:27:15.220 and I just put them out in the scar tissue layer and I let that be as big as it needs to be to reasonably allow me to

00:27:22.270 use it and then in the functional core it doesn't it doesn't have to see any of that stuff it is this is exactly this is

00:27:30.490 the difference between just thinking about that sort of FauxO style programming the functional oo style and then actually adding the imperative

00:27:38.110 shell the imperative shell is what allows you to build real software that actually does work in this way so when

00:27:47.470 you give the functional example you create Larson

00:27:54.750 and sort of a true functional standpoint right just returned data sure you feel about

00:28:02.020 the next level like you know there's nothing special on Wohlers that's stomach return data

00:28:11.550 and so you know like how do you feel like going next level

00:28:17.390 so I missed the last sentence going even more functional I guess

00:28:22.490 well I wouldn't consider changing from returning walruses to returning stomachs as more functional unless I'm

00:28:30.020 misunderstanding right well if you look at the

00:28:42.230 code I used the word walrus but really there's nothing especially walrus-y about the code you could replace

00:28:47.630 that with animal and it wouldn't know the difference right it just knows that there is a stomach key and there is inside of the stomach is an array of

00:28:54.260 various foods so it's not tied to the to the walrus nature of the walrus the the

00:29:04.460 user has lost right the class of the boys it's not well no I never mentioned

00:29:12.500 a walrus class I could have used the word animal it would have been the same thing right as long as it as long as it has a stomach that code will work on it
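The point that the code only depends on a stomach key can be sketched as a pure function over plain hashes. The function name `eat` and the `:name` and `:stomach` keys are hypothetical, chosen to mirror the description rather than reproduce the slide.

```ruby
# Hypothetical pure function: works on any hash that has a :stomach key.
def eat(animal, food)
  # fetch raises KeyError if the value has no :stomach key at all.
  stomach = animal.fetch(:stomach) + [food] # Array#+ builds a new array
  animal.merge(stomach: stomach)            # non-bang merge returns a new hash
end

walrus = { name: "walrus", stomach: [] }.freeze
snail  = { name: "snail", stomach: ["leaf"] }.freeze

fed_walrus = eat(walrus, "cheese")
fed_snail  = eat(snail, "lettuce")
```

Nothing in `eat` mentions walruses, so the same function applies to both values, and the original hashes are left unchanged.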

00:29:18.470 I just used walrus to make it more concrete give it like whether it's how you build up the wing just for 30 days

00:29:24.590 eg if values are the neighbors then like different values or boundaries

00:29:33.550 objects are data if they're if all if all the methods on an object are pure

00:29:39.560 functions then the object is data and it's indistinguishable from an object that's a struct that has everything early

00:29:45.560 bound right late binding only matters in a in a system with mutation in it this

00:29:53.210 is why for example Haskell is lazy well Haskell is weird yeah I don't know

00:30:00.620 how else to say that I feel like I'm failing to understand some part of your question okay yeah

00:30:25.310 yeah so well if you go if we go back to the place where I actually did that where I merged way back here there it is

00:30:34.050 this was actually the functional example right if you look at the functional oo example I just did Walrus.new which

00:30:41.730 is a little more natural there's not an easy way to say I want a new object with only this field changed because Ruby's

00:30:46.800 not designed for that but that is easier to build in than it would be to build in to replace all your core types the nice

00:30:55.440 thing about the Ruby core types is that the the really scary things have bangs

00:31:00.480 on them usually the mutation it's not true for like delete but but the names

00:31:05.850 are usually very obvious that they're mutating or they have a bang on them I've actually not found a problem maintaining functional data

00:31:14.640 structure manipulation code in Ruby your mileage may vary yeah in the back
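One way to approximate "a new object with only this field changed" is a small copy helper on an immutable class. The sketch below assumes a hypothetical `Walrus` class with a hand-written `with` method; it is not the code from the slides.

```ruby
# Hypothetical immutable value object for the functional oo style.
class Walrus
  attr_reader :name, :stomach

  def initialize(name:, stomach: [])
    @name = name
    @stomach = stomach.dup.freeze
    freeze # no instance variable can be reassigned after construction
  end

  # Assumed helper: build a copy with only the given fields replaced.
  def with(**changes)
    self.class.new(**{ name: name, stomach: stomach }.merge(changes))
  end

  # Returns a new Walrus; the receiver is left as it was.
  def eat(food)
    with(stomach: stomach + [food])
  end
end

hungry = Walrus.new(name: "Wally")
fed = hungry.eat("cheese")

# Core types: the non-bang method returns a new value.
letters = %w[b a c].freeze
sorted = letters.sort
# letters.sort! would raise here (FrozenError in modern Ruby),
# because the bang version mutates the receiver in place.
```

As noted in the answer, the bang convention is not universal: `Array#delete` mutates its receiver without a bang.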

00:31:30.010 despite your cultural important there's a whole bunch of these exactly

00:31:37.320 what they're doing it certainly could you have to spend

00:31:44.990 what I've found is that the the choice of which classes you have in the core is

00:31:50.400 extremely important the names of them and the way that the responsibilities are divided up so actually I could pull

00:31:56.400 up part of the Twitter client and show you guys a larger example let's see wait

00:32:05.190 where am i yep so for example the cursor cursor this is one of this is a piece of

00:32:12.390 the functional core it has state that includes the tweets in a list of tweets and then selection is the currently

00:32:18.510 selected tweet all right so this encapsulates all the behavior of the cursor actually why is my keyboard not

00:32:24.300 working part of some of this is gross like it's actually quite a large class this is one of the largest classes I've written since I started programming Ruby

00:32:30.780 it's almost a hundred lines but that's because it's really like a very small module dude laughs it's like a very

00:32:38.430 small module of functional code it's just sort of self-contained and then if we look at the actual imperative shell

00:32:44.940 this is the entire shell it's 153 lines is that what that says can you guys read that

00:32:50.210 there sorry about that so let's see where cursor dot something with tweets

00:32:59.070 no starting at index the shell is sometimes a little bit awkward here we

00:33:04.080 go so here is the cursor actually being manipulated when you hit J it just reassigns the current cursor in the

00:33:10.710 shell to the result of doing cursor down and if we look at cursor dot down all it does is construct a new cursor so the

00:33:18.630 fact that I chose cursor to be one of the boundaries in the functional core is very important if I had had if I had a

00:33:24.480 tweet list and then was maintaining a selection separate from that this would have been awful it's very important to

00:33:30.240 find those boundaries that make very small cohesive functional components but not too small I mean I showed you like

00:33:36.540 three line examples in the talk but that's because it's a talk really you want pieces larger than that but smaller

00:33:42.600 than a whole subsystem does that answer your question at all I was Chuck right yeah I can't see but I can hear
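The Twitter client's source is not shown in this transcript, so the following is only a sketch of the shape described: an immutable cursor in the functional core whose `down` builds a new cursor, and a shell that reassigns its current cursor when a key is pressed. The class names, the `handle_key` method, and the `k` binding are assumptions.

```ruby
# Functional core (sketch): a value pairing the tweet list with the selection.
class Cursor
  attr_reader :tweets, :index

  def initialize(tweets, index = 0)
    @tweets = tweets.dup.freeze
    @index = index
    freeze
  end

  # The currently selected tweet.
  def selection
    tweets[index]
  end

  # Moving never mutates; it constructs a new Cursor, clamped to the list.
  def down
    Cursor.new(tweets, [index + 1, tweets.length - 1].min)
  end

  def up
    Cursor.new(tweets, [index - 1, 0].max)
  end
end

# Imperative shell (sketch): the only place where state is reassigned.
class Shell
  attr_reader :cursor

  def initialize(tweets)
    @cursor = Cursor.new(tweets)
  end

  def handle_key(key)
    case key
    when "j" then @cursor = @cursor.down
    when "k" then @cursor = @cursor.up # "k" is an assumed binding
    end
    self
  end
end

shell = Shell.new(["first tweet", "second tweet", "third tweet"])
before = shell.cursor
shell.handle_key("j").handle_key("j").handle_key("j")
```

Keeping the tweets and the selection in one value means the clamping logic lives in a single pure class that can be tested without the shell.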

00:33:53.760 hidey-ho

00:34:00.820 yeah that's the hard part I mean that's always the hard part right but separating separating things that do

00:34:06.970 mutation from things that don't gives you a starting point and I it's the best starting point I've found it's not an absolute rule but if you start there as

00:34:13.750 opposed to some other arbitrary rule I found much better results or design

00:34:19.470 other questions there is no library the Twitter client

00:34:27.220 is not online because I stopped working on it because it turned out that Twitter Twitter's evilness is growing much like

00:34:32.380 test run time of an integration suite and I lost confidence that I should

00:34:38.170 build software that interacts with it sorry Twitter employees I assume there's something here yeah so nothing oh if

00:34:43.240 it's not sorry fair enough fair enough

00:34:51.060 at least it scales okay pretty much new

00:34:58.990 objects while you see because you like to start as well yeah I

00:35:04.680 mean the Twitter client doesn't really have many performance concerns I mean it does when it comes up it's

00:35:12.180 sorting through thousands of tweets it remembers everything all the way back and so it has to do a merge of like what it has versus what it sees from the API

00:35:18.390 but it's not doing anything really big in MRI your life may not be

00:35:24.509 especially good if you're doing tons and tons of allocation if you're in the JVM it's much better right and if you're in

00:35:30.569 a VM that's designed to have constant object creation and destruction it's going to be even better than that a VM

00:35:36.900 designed for functional programming I would guess that the Erlang VM would handle this very well for example

00:35:42.180 because in Erlang you're constantly making small objects and letting them be freed so yes doing doing this on MRI if

00:35:50.880 you have performance concerns is probably going to be a little difficult but you can do certain types of caching

00:35:57.480 right if everything is a value and immutable you can always cache things because they don't change so there's

00:36:02.880 there are ways there ways to work around the the unfortunate nature of your VM I saw a hand back there yeah what's the

00:36:10.259 what's the biggest thing you've built using this style and do you have any concerns that as it gets

00:36:17.079 big the ability to organize those both

00:36:22.450 good questions what's the biggest thing I built and do I have concerns about scaling this into larger projects the biggest thing I built is the Twitter

00:36:28.210 client it's not that big it's about 600 lines and I would not be up here talking about this if that were why I thought

00:36:34.690 this is good the reason I think that this is good is that it it has it has

00:36:41.710 shades of both the actor model built into it the idea of functional pieces

00:36:47.890 that are communicating by passing values back and forth and it also is a lot like

00:36:53.589 the Haskell idea of using the i/o monad to encapsulate state which is a

00:36:58.750 wonderful idea that scales wonderfully up to about 500 lines of code and then everything falls apart right you look at a 20,000 line Haskell

00:37:05.890 program that does a lot of i/o and you're not going to like life that this is why I say I think that the larger

00:37:10.990 program is is smaller ones built in this way communicating via via channels external to the process but what I'm

00:37:19.180 really trying to do is merge this idea of actors merge this idea of the i/o monad and bring them into the oo

00:37:25.780 world using our terminology right I didn't talk about monads I only talked

00:37:30.849 about actors at the end as an example I'm trying to rephrase that stuff in terminology that we use so that it seems

00:37:37.780 more directly accessible but to get back to your question about about larger

00:37:43.060 systems some of the largest most well some of the most reliable large systems in the world are written in Erlang and

00:37:49.119 probably probably most of the reliable large systems in the world are written in Erlang lots of lots and lots of nines

00:37:55.319 not not like Twitter's three 9s we're talking about like eight nines right and the fact that they can build large

00:38:02.140 systems that are that reliable using the actor model even not even knowing what those words mean tells you that there's

00:38:08.950 something there right so that was a long-winded answer to a very simple

00:38:14.260 question yeah very uh if there's an approach that you might recommend

00:38:19.320 when creating a new rails app and let's say they were hitting a user model that was subclassing ActiveRecord::Base

00:38:26.060 is there an approach that one might take to try to expand it with the techniques you're talking about and isolate right

00:38:34.700 what sorry if you're building a new rails application and you're doing

00:38:40.290 things like you have a user that subclasses ActiveRecord::Base how do you how do you go about doing this I haven't

00:38:47.310 gotten that far yet I have opinions about how you should be building that application but they don't involve this

00:38:52.380 that's a different talk called deconstructing the framework but yeah

00:38:58.320 it's not clear to me yeah give me give me a year or two others yeah deal with

00:39:05.640 the case where the extraction starts exactly be in the sweeper capacity

00:39:11.760 featured at all but they turned out the database is really fast that yep one of the reasons that my

00:39:20.819 talks tend to take half as long as when I practice is I forget to give all the qualifications like for example you

00:39:26.910 don't want to call User.all then filter in Ruby most of the

00:39:36.180 complexity of your application is not database querying right I mean there's plenty of querying in a complex app but

00:39:41.999 but it is not the 50% of your application it's a fairly small percentage and I think that probably if

00:39:50.279 you're using Postgres or MySQL or SQLite that goes in the shell if you're using something like

00:39:56.249 Datomic which is a database where everything is immutable that can go in the functional core it's just data

00:40:02.339 structures Datomic is just data structures so it depends on the nature of your database and the more your

00:40:08.400 components are designed to work in this way the more can move into the core but it doesn't mean that pieces it doesn't

00:40:13.799 mean you can't do this if you have Postgres it just means Postgres has to be relegated to the scar tissue which i

00:40:20.069 think is fine 80% functional is a heck of a lot better than 0% you don't have to get to 99% yeah front row keeps you

00:40:34.940 what keeps me in Ruby inertia to a small

00:40:40.490 extent also I just don't like any of those languages I have this this problem

00:40:48.470 where I can't I can't not care about syntax I really like syntax and I've

00:40:55.069 written I wrote a lot of Lisp in college and I just never really enjoyed it that much Python and Ruby are what I like

00:41:02.000 syntactically this is why I want to go live on a cruise ship and write a new language how are we doing on time it's

00:41:10.970 3:10 we can do a couple more I guess yeah

00:41:19.109 ways to fix these problems contribute to

00:41:24.329 Rubinius for about a year until you know how it works and then fork a Ruby language i'll tell you how to do it i

00:41:31.440 mean you want persistent core types all right you want to you want core types that are designed to be used in this way

00:41:37.650 and from that most of this will fall out pretty pretty naturally you probably

00:41:43.589 want actors and lightweight processes and you're going to have to build a user land scheduler but it's not that hard that's what Erlang has and if you have

00:41:50.249 a user land scheduler with lightweight processes if you can fork ten thousand hundred thousand processes easily and

00:41:55.259 you have immutable core types you're most of the way towards doing this ninety-nine percent of the time or

00:42:00.690 ninety-five percent of the time yeah back right RT sort of ask the question

00:42:06.660 before is someone who's ah very well acquainted with Eagles Twitter guys I

00:42:13.440 would still encourage you to go execute this is an example this style

00:42:20.640 yeah the question is why won't you answer my question no that's legitimate

00:42:28.560 I do I do plan on putting this up eventually even though I kind of am NOT

00:42:33.990 happy about Twitter I just I struggle with the idea of encouraging people to write software that interacts with something I don't like versus

00:42:40.010 demonstrating something that I think is good so yeah also it's a little bit

00:42:45.780 embarrassing like the shell is not actually tested at all there are zero tests around it even though it's 250 lines long which I think will give people the

00:42:52.590 wrong idea I mean I have reasons that I did that but but they're very hard to articulate in like a readme that anyone

00:42:58.080 will actually read so I'm a little torn about encouraging bad things metal right

00:43:04.340 I think that this is a little bit more of a minute but since I'm interested in a lot of

00:43:10.780 your ideas here I just wonder if you look at array languages and their

00:43:16.090 approaches to concurrency right versus thinking about these are thread levels and I'm wondering because I have a

00:43:22.660 thinkable idea working for the long part from future

00:43:27.970 concurrency just wondering what your needs are right so the question is have

00:43:33.130 I looked at array languages and and dude believes that thinking explicitly about threads or I assume you mean processes

00:43:38.980 as well like any kind of explicit yeah thread of control think about those explicitly isn't is not the the right long-term thing

00:43:46.210 the first the first answer is no so that's easy I mean I'm familiar with like J and all those languages I don't

00:43:52.180 actually know any of them I've seen small snippets but I don't understand them the to the second part about

00:43:59.770 threads and arrays or threads and processes not being the right primitive I guess would be the word right then

00:44:05.290 right primitive to build on I'm not convinced that that's actually true I'm not I'm not convinced that they're the

00:44:11.260 wrong thing I assume the alternative you're thinking of is things like a parallel map right like like implicit

00:44:16.300 parallelism that if you're still writing

00:44:22.030 sequential programs to just have parallel pockets whereas in the actor model everything is inherently parallel

00:44:28.720 I mean if it's even remotely reasonably decomposed right as long as you don't have one process that's doing a ton of work so I'm not totally convinced

00:44:35.980 that that the threads and processes are wrong well I'm convinced that threads are wrong if you're sharing the state but I'm not convinced that independent

00:44:42.460 threads of control independent processes of control are the wrong thing yeah have you

00:44:49.210 thought about writing your twitter application using a more open protocol like

00:44:55.720 writing the Twitter app against a more open protocol like OStatus I guess I

00:45:01.790 could doesn't sound very interesting that's the problem I I already wrote it

00:45:08.180 once I don't want to write it again maybe I'll put it on github and I'll accept pull requests that put it on a more open

00:45:15.050 protocol I think yeah that it is time thank you guys very much

