00:00:16.880
Uh, thank you all for coming to my
00:00:18.560
talk, Silent Killers, Lessons from the
00:00:20.960
Brink.
00:00:22.960
I'm Joe Leo. This is me looking very
00:00:24.960
serious because we are going to discuss
00:00:28.640
very serious issues that arise in
00:00:31.039
successful and growing Rails
00:00:33.200
applications.
00:00:35.040
So who am I? I uh I'm the CEO of Def
00:00:37.840
Method. We offer Def Method services
00:00:40.559
where we pour love into your Ruby on
00:00:42.559
Rails applications until those
00:00:43.920
applications love you back. We are the
00:00:46.879
authors of Phoenix which is the first
00:00:48.800
ever AI platform for Rails. This is
00:00:53.360
Phoenix. It's code that loves you back.
00:00:57.840
I'm the co-author of The Well-Grounded
00:00:59.199
Rubyist. I'm the co-host of the Ruby AI
00:01:02.399
podcast. I've been doing this for a
00:01:04.000
really long time. I really do love Ruby,
00:01:06.799
like all of you.
00:01:10.080
And before I jump in, I want to give a
00:01:11.680
special thanks to Steve, Jonathan, and
00:01:15.119
Christine. They dove in and really lived
00:01:19.439
these stories that I'm about to share
00:01:21.200
with you. And they did go to the brink
00:01:24.080
with our customers and came back to tell
00:01:26.479
the tales.
00:01:29.439
So, I'm going to talk about the Tangle
00:01:31.360
Deep and I chose Deep Waters
00:01:34.640
because no app is without its obstacles,
00:01:38.000
challenges to navigate
00:01:40.640
as you sail the seas of development.
00:01:43.439
Over time, even with great care and the
00:01:46.240
best of intentions, some seemingly small
00:01:48.880
obstacles are silently growing into
00:01:51.119
existential threats in your application.
00:01:54.000
They're silent killers. And today, we
00:01:56.720
are going to name them. We're going to
00:01:58.799
identify them. We're going to teach you
00:02:00.560
how to avoid them. But if you come into
00:02:03.439
these challenges, we are going to teach
00:02:04.880
you how to get out.
00:02:08.560
So, here are some of the anti-patterns
00:02:10.319
that we're going to come across along
00:02:11.840
the way. Some of these you're going to
00:02:13.920
recognize, some of them you're not
00:02:15.680
because I made them up. And I had a lot
00:02:17.840
of fun coming up with names for the
00:02:19.200
anti-patterns. But the important thing is
00:02:21.120
that you're going to recognize them
00:02:22.720
immediately because they are born out of
00:02:24.560
simple decisions that are made in the
00:02:26.160
course of everyday development that
00:02:28.000
accrete over time.
00:02:30.959
But fear not, you and your crew will
00:02:34.080
navigate to safety.
00:02:36.480
You will understand the hidden
00:02:37.760
complexity.
00:02:39.280
You'll diagnose the killers before
00:02:40.879
crisis hits.
00:02:42.720
and you will implement strategies for
00:02:44.640
Rails app survival. It's true that many
00:02:47.200
of the stories that I'm going to share
00:02:48.560
today involve the kinds of
00:02:50.720
complexities that could happen in any
00:02:53.200
application in any stack, but this isn't
00:02:55.760
any application in any stack. This is
00:02:57.519
Rails. And so there are particular ways
00:02:59.440
that we can get ourselves into and out
00:03:01.680
of trouble.
00:03:05.920
One reminder before we set sail,
00:03:09.280
identify and don't compare. I'm going to
00:03:12.319
share some pretty extreme cases, but
00:03:15.280
we've all been through one or more
00:03:16.640
situations like this before. If we are
00:03:19.200
seasoned sailors, we've all gone through
00:03:21.200
a squall now and again.
00:03:24.480
But we learn from our mistakes and from
00:03:26.800
the tales of others who have gone to the
00:03:28.720
brink and have lived to tell about it.
00:03:34.239
Okay, chapter one, the modularity
00:03:37.519
mirage. It's time if you've got a hat.
00:03:40.720
Our seafaring comrades,
00:03:44.000
lured by the promise of clear waters and
00:03:46.400
separate shores, charted their course
00:03:48.480
upon twin vessels, only to find beneath
00:03:51.440
the surface their hulls bound fast by
00:03:54.319
tangled rigging and unseen currents.
00:03:59.120
Above the water,
00:04:01.360
a real estate application with two user
00:04:03.200
types, tenants and landlords.
00:04:06.560
In the course of doing business, the
00:04:08.159
tenant app was built first and then as
00:04:10.400
the application and the business grew,
00:04:12.000
the landlord app was created.
00:04:14.720
So the applications were separate, two
00:04:16.639
separate Rails apps, which was a little
00:04:19.359
suspicious, but the architecture was
00:04:20.880
modular, at least on paper.
00:04:23.520
Each app had its own GitHub
00:04:25.280
repository, Heroku pipeline, React front
00:04:27.680
end. So it gave the appearance of a
00:04:29.360
clean separated system, a modern enough
00:04:32.800
stack. I know Heroku and Semaphore aren't
00:04:34.639
so modern anymore, but uh but this
00:04:36.960
business has been around for a while.
00:04:39.759
Clean code, CI/CD is running. There were
00:04:42.479
review apps configured. Seemingly all
00:04:45.199
was well.
00:04:47.199
Our lead engineer Steve dove in first.
00:04:50.800
Ran RubyCritic, as he is wont to do,
00:04:52.720
against the application and the results
00:04:55.120
came out pretty good. 91 score,
00:04:58.720
a few scattered D's, some C's, mostly
00:05:01.840
A's and B's.
00:05:04.960
And the Landlord app, though smaller,
00:05:06.639
was even better. No D's, a few C's, A's
00:05:10.240
and B's all around.
00:05:12.880
But trouble was spotted.
00:05:15.360
First, the customer came to us because
00:05:17.440
they wanted to implement DORA metrics
00:05:19.199
and they wanted our help, which is
00:05:21.120
interesting because you typically don't
00:05:23.120
want to implement DORA metrics until
00:05:24.720
something has gone wrong. And in this
00:05:26.080
case, something had gone wrong. What had
00:05:28.800
gone wrong was over the course of four
00:05:30.400
months, the production application that
00:05:32.560
was deployed needed to be rolled back
00:05:34.160
four separate times. Once one month,
00:05:36.720
then twice the next month, then zero the
00:05:38.880
month after that, one the month after
00:05:40.479
that. Depending on the industry you're
00:05:43.039
in and the number of times you deploy,
00:05:44.800
that might not seem like a big deal. But
00:05:46.800
in the highly regulated real estate
00:05:48.560
industry,
00:05:50.080
these kinds of production bugs when left
00:05:52.720
unchecked or before being rolled back
00:05:55.280
can result in regulatory fines. They can
00:05:57.600
result in regulatory penalties and even
00:05:59.440
shutdowns in certain states.
00:06:04.080
When we took a look at the main
00:06:05.919
branches, we found that the tenant app
00:06:08.080
main branch had failed seven times in
00:06:09.840
the last 25 builds. And the landlord app
00:06:12.240
was even worse. It had failed 11 times
00:06:14.400
out of the last 25 builds.
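To put those build numbers in proportion, here is a minimal Ruby sketch that turns the counts quoted above into failure rates. The method name `failure_rate` is an assumption made for illustration; only the counts (7 of 25 and 11 of 25) come from the talk.

```ruby
# Percentage of failed builds on a main branch, rounded to one decimal place.
def failure_rate(failed, total)
  (failed.to_f / total * 100).round(1)
end

tenant_rate   = failure_rate(7, 25)   # tenant app: 7 failures in the last 25 builds
landlord_rate = failure_rate(11, 25)  # landlord app: 11 failures in the last 25 builds

puts "tenant: #{tenant_rate}%, landlord: #{landlord_rate}%"
```

Roughly one main-branch build in four was failing on the tenant side, and closer to one in two on the landlord side.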
00:06:17.280
And what was worse is that there were no
00:06:19.120
flaky tests making this fail; they were
00:06:20.960
failing for inconsistent reasons. And
00:06:23.600
finally, there was no automated test
00:06:25.360
coverage being collected. So although we
00:06:27.120
found the application to be generally
00:06:29.199
well tested, we couldn't tell which
00:06:31.520
parts were really well tested and which
00:06:33.120
weren't because there was nothing like
00:06:34.639
Coveralls, uh, or Code Climate that was
00:06:37.039
telling us the uh test coverage metrics.
00:06:41.520
So we dove in
00:06:44.960
and the first thing we spotted was that
00:06:46.560
there was direct database coupling. So
00:06:48.319
despite the repo and deployment
00:06:50.560
separation, the tenant app had direct
00:06:52.479
dependencies on the landlord database
00:06:54.880
and schema changes in the landlord
00:06:56.880
database necessitated immediate changes
00:06:59.520
in the tenant application
00:07:01.759
and some models in the tenant
00:07:03.199
application directly called tables in
00:07:06.240
the landlord application. So this
00:07:08.400
violated service boundaries and made
00:07:10.639
isolated testing
00:07:12.560
almost impossible. What's more, those
00:07:14.960
Heroku review apps that we were happy to
00:07:17.280
see at first actually required a
00:07:19.840
matching companion app for the other
00:07:21.680
repo in order to function. And if a
00:07:23.759
direct match didn't exist, then a
00:07:26.080
standalone version was bootstrapped by
00:07:28.639
copying over a schema file and doing
00:07:30.560
some manual changes. And so the whole
00:07:32.960
thing was manual and brittle and it was
00:07:34.639
held together with some custom Heroku
00:07:36.400
API glue code.
00:07:40.080
So let's pause for a moment and take a
00:07:41.520
look at the anti-patterns at play. The
00:07:44.319
first: superficial modularity. And I want
00:07:46.880
to stress that the anti-pattern I'm
00:07:48.960
talking about is not tight coupling. We
00:07:51.440
know that tight coupling is something to
00:07:53.520
avoid, but really in the real world it's
00:07:56.160
a system of trade-offs. And it might be
00:07:58.479
okay to say okay these applications are
00:08:00.879
coupled and we should treat them as
00:08:02.160
such. The anti-pattern here is that
00:08:04.560
although the developers were
00:08:06.400
obviously intelligent and could say that
00:08:08.639
yes it's not really modular because
00:08:10.879
there's a database dependency the entire
00:08:13.199
team was still behaving as if they had
00:08:15.199
modular applications. That is
00:08:17.280
superficial modularity.
00:08:20.800
Schrödinger's deploy. Were the deploys
00:08:23.440
alive or were they dead? I have Mark for
00:08:25.680
inspiration on this.
00:08:28.160
So when the review apps are tested, they
00:08:30.000
seem to work, but the manual effort
00:08:31.680
required to set up the review apps into
00:08:34.000
a staging environment or review app
00:08:35.680
environment was not replicated in
00:08:37.680
production. So we couldn't have
00:08:39.279
confidence that what we were seeing in
00:08:41.120
the review environment was actually
00:08:42.800
going to be there in production. And of
00:08:44.240
course, we had evidence that it was not
00:08:45.760
the same. And finally, broken windows.
00:08:50.080
just charting along the course, seeing
00:08:52.880
unreliable main builds, features that
00:08:55.680
work locally but fail in staging and an
00:08:57.920
inability to track test coverage made
00:09:01.440
it so that the code that I write on my
00:09:03.519
computer, I could not have any trust
00:09:05.279
that it would make it to production
00:09:06.480
without failing somewhere and I didn't
00:09:08.240
know if those failures were real or if
00:09:10.240
they were a product of the environment.
00:09:13.279
Okay,
00:09:15.040
so the consequence here was really that
00:09:18.320
what started as an engagement to
00:09:20.080
implement DORA metrics quickly became a
00:09:22.080
remediation engagement for an
00:09:23.920
existential threat to the business.
00:09:28.240
The schema changes required
00:09:29.600
synchronized deployments. Features
00:09:31.279
couldn't be tested in isolation and of
00:09:33.360
course confidence was lost among the
00:09:34.880
product and engineering team.
00:09:37.519
So we set to work to untangle the
00:09:40.959
vessels
00:09:42.560
and we started by breaking the direct
00:09:44.959
database coupling. We removed the
00:09:47.279
database dependency between the tenant
00:09:48.880
app and the landlord app and we restored
00:09:50.959
true modularity. We used Active Resource
00:09:53.760
and I think it's interesting that you
00:09:55.920
know, Active Resource has kind of, uh,
00:09:58.399
faded. It's not really used, uh, as much.
00:10:01.040
It's not included in Rails. But I
00:10:04.240
think the main reason that people don't
00:10:06.399
use Active Resource is because it
00:10:08.480
behaves too much like Active Record. In
00:10:10.640
this case, people were not using Active
00:10:12.560
Record enough, and so we thought that
00:10:14.320
this was a nice step in the right
00:10:15.680
direction to allow the tenant app to
00:10:17.760
fetch data over the API rather than
00:10:20.399
reaching into the landlord's database.
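The shape of that change can be sketched in a few lines of Ruby. This is a plain-Ruby illustration of "ask the other app's API instead of querying its tables", not the Active Resource API itself. The class `LandlordApi`, the `Property` struct, the `/properties/:id.json` path, and the injected `fetcher` callable are all hypothetical names chosen for the example.

```ruby
require "json"

# Hypothetical value object for a record owned by the landlord app.
Property = Struct.new(:id, :address, keyword_init: true)

# The tenant app asks the landlord app's API for data rather than querying
# the landlord database. `fetcher` is any callable that takes a path and
# returns a JSON string; in production it would wrap an HTTP client.
class LandlordApi
  def initialize(fetcher)
    @fetcher = fetcher
  end

  def find_property(id)
    payload = JSON.parse(@fetcher.call("/properties/#{id}.json"))
    Property.new(id: payload.fetch("id"), address: payload.fetch("address"))
  end
end

# A canned response stands in for the network in this sketch.
stub_fetcher = ->(_path) { JSON.generate("id" => 42, "address" => "1 Harbor Way") }
property = LandlordApi.new(stub_fetcher).find_property(42)
puts "#{property.id}: #{property.address}"
```

Because the tenant side now depends only on the response shape, a schema change in the landlord database no longer forces an immediate tenant deploy, provided the API response stays the same.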
00:10:25.200
We added API contract tests. One thing
00:10:27.200
you're going to see over and over at
00:10:28.880
Def Method: we really love tests.
00:10:30.959
So we used VCR, which is well known. We
00:10:33.279
used Pact, which is slightly less well known,
00:10:35.279
but that's consumer-driven contract
00:10:37.040
testing to capture and validate real API
00:10:40.000
interactions and this protects both
00:10:42.399
sides of the boundary from unintended
00:10:44.240
changes.
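The idea behind a consumer contract can be sketched without either tool. The following is a plain-Ruby illustration, not the Pact or VCR API. The contract fields (`id`, `address`, `rent_cents`) and the method name `contract_violations` are assumptions made up for the example.

```ruby
# Hypothetical contract: the fields and types the tenant app (the consumer)
# relies on in the landlord app's property response.
PROPERTY_CONTRACT = {
  "id" => Integer,
  "address" => String,
  "rent_cents" => Integer
}.freeze

# Returns a list of human-readable violations. An empty list means the
# response still satisfies what the consumer expects.
def contract_violations(response, contract)
  contract.each_with_object([]) do |(field, type), problems|
    if !response.key?(field)
      problems << "missing field: #{field}"
    elsif !response[field].is_a?(type)
      problems << "#{field} should be #{type}, got #{response[field].class}"
    end
  end
end

good = { "id" => 1, "address" => "1 Harbor Way", "rent_cents" => 250_000, "extra" => true }
bad  = { "id" => "1", "address" => "1 Harbor Way" }

puts contract_violations(good, PROPERTY_CONTRACT).inspect
puts contract_violations(bad, PROPERTY_CONTRACT).inspect
```

Extra fields are allowed on purpose: the provider can add data freely, but removing or retyping a field the consumer depends on shows up as a violation before it reaches production.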
00:10:47.519
Then we simplified the review app setup
00:10:50.079
and I really liked what we did here. We
00:10:52.160
extracted shared logic into a private
00:10:54.000
gem which helped to remove some of the
00:10:55.839
duplication across both environments for
00:10:57.839
setup and it helped us to fully automate
00:11:00.959
the review app environment with Heroku
00:11:03.760
rather than having manual steps.
00:11:06.800
And finally, we did what they brought us
00:11:09.600
in there to do and we did implement some
00:11:12.000
DORA metrics. In fact, Steve created an
00:11:14.399
open source library for calculating, uh,
00:11:17.040
DORA metrics using GitHub Actions. Now,
00:11:19.440
it's since been archived in favor of
00:11:22.399
services like Sleuth and LinearB, but
00:11:25.120
both Def Method and the customer were
00:11:26.880
really happy to be able to make that
00:11:28.160
contribution at the time because there
00:11:29.600
was no solution that existed.
00:11:32.720
After that, fair winds and following
00:11:35.279
seas.
00:11:36.959
And we learned some things that
00:11:38.959
superficial modularity is dangerous
00:11:42.160
that shared databases are hidden anchors
00:11:44.320
and they need to be cut early.
00:11:47.040
And this one I think is important.
00:11:48.880
DevOps metrics tell the story of
00:11:50.959
architectural health. And it's easy for
00:11:53.680
me I know in my own work to focus
00:11:56.079
exclusively on the health of the
00:11:57.760
application that I'm working on. And I
00:11:59.360
do that with Code Climate and, uh, and
00:12:01.760
RubyCritic and whatever CI/CD metrics I
00:12:04.320
may have in place and I think those are
00:12:05.839
very useful but it does not tell me the
00:12:08.079
story of how that app behaves in the
00:12:10.160
architectural environment in production.
00:12:12.160
And we need DevOps metrics for that.
00:12:15.200
And of course, real modularity is tested
00:12:17.760
in how things fail and scale. It is fine
00:12:20.880
to say that your application is modular.
00:12:23.200
It's fine to behave as if the
00:12:25.040
application is modular. But if you can't
00:12:27.600
recover quickly from failures and if you
00:12:29.519
can't scale at the speed of your
00:12:30.800
business, then you do not have a modular
00:12:32.639
application.
00:12:35.360
All right,
00:12:36.880
nice work on the first one. Let's go to
00:12:39.040
chapter two. The Leviathan beneath.
00:12:42.079
Excuse me, I need my hat.
00:12:45.440
Sailing a vessel built with confidence
00:12:47.680
and craft, the captain believed her
00:12:50.079
charts true and her keel steady, unaware
00:12:53.279
that far below a great beast had coiled
00:12:56.560
itself around the ship's spine, grown
00:12:58.880
fat on years of unchecked complexity.
00:13:02.480
Thank you, Colin. Appreciate it putting
00:13:04.800
the hat on.
00:13:08.399
Above the water, we have a marketplace
00:13:10.560
for colocation and networking buyers
00:13:12.480
and sellers.
00:13:14.959
There's a clear separation between the
00:13:16.480
Rails back end and the React front end.
00:13:18.720
It's a modular front end with Redux,
00:13:20.560
Webpack, SPAs. It was fine. There are
00:13:23.440
reasonable metrics in place. A Code
00:13:25.040
Climate rating of a B. Okay. Clean
00:13:27.839
controllers, organized UI.
00:13:30.399
The team came to us or the business came
00:13:32.240
to us because they needed to add new
00:13:34.079
service categories, but they feared the
00:13:35.920
cost and complexity of extending the
00:13:37.760
system, which is interesting. At first
00:13:40.399
glance,
00:13:42.399
there is not evidence for that. But a
00:13:44.959
closer look revealed, nope, not up to
00:13:47.120
that part.
00:13:48.880
This is just to show you that it's a
00:13:50.800
not small but not very large
00:13:52.320
application.
00:13:54.399
140, 160,000 lines of code. Some trouble
00:13:57.600
was spotted early.
00:14:00.399
So, the CI build had been broken for
00:14:03.440
months due to flaky tests. On one hand,
00:14:06.480
the developers didn't just delete the
00:14:08.240
flaky tests or mark them as pending,
00:14:09.839
which is even worse. But on the other
00:14:11.600
hand, they didn't let it stop them from
00:14:13.040
deploying. And so, they were deploying
00:14:14.480
with red builds.
00:14:17.440
Adding to that, there was no JS
00:14:19.120
integration or unit test. So, everything
00:14:20.880
was being tested in the front end at
00:14:22.560
least with feature specs. And while I
00:14:24.639
have a soft spot in my heart for feature
00:14:26.160
specs, they are very difficult to
00:14:27.920
maintain. They are very flaky and they
00:14:30.880
take forever to run. And to top
00:14:34.160
things off, we saw that actually it was
00:14:35.760
only covering about 40% of the front-end
00:14:38.320
code anyway.
00:14:40.880
But this is the real smoking gun, if
00:14:43.199
you'll allow me to mix metaphors. The
00:14:45.680
technical debt over the course of one
00:14:47.680
month dropped from about 16,000
00:14:49.920
estimated hours of remediation time to
00:14:52.320
about 2,000 hours, which is 14,000 hours
00:14:55.760
and is really impressive because it was
00:14:57.440
a team of three that was working on
00:14:58.959
this.
00:15:00.720
Of course, the Code Climate warnings
00:15:02.160
were being ignored. And probably if
00:15:06.000
you're using Code Climate, you're also
00:15:08.000
ignoring one or two warnings. 1,500
00:15:10.639
warnings are being ignored in this case,
00:15:12.959
which is a really extreme example and
00:15:15.040
was masking some really extreme
00:15:17.279
complexity.
00:15:20.880
Looking at the dependency chart for the
00:15:22.480
front end, things are not great, but
00:15:24.560
they're not terrible. This is a
00:15:26.560
manageable amount of dependencies. The
00:15:29.040
real issue we found on the front end was
00:15:30.880
that client side components like project
00:15:32.880
view.jsx and requirements.jsx each
00:15:35.920
exceeded 1600 lines and had some really
00:15:38.480
nasty conditional logic.
00:15:41.920
But this is what the Ruby code
00:15:44.240
dependency graph looked like. And
00:15:46.160
shameless plug, if you sign up for
00:15:48.240
Phoenix, you will get a directed
00:15:49.839
interactive 3D graph of your code and
00:15:52.320
you can actually look through it and
00:15:53.759
find all of the really gnarly
00:15:55.839
dependencies. But I digress. This was
00:15:58.079
pre Phoenix. And what we quickly found
00:16:00.880
with this dependency graph
00:16:04.320
was that we had Project and Quote
00:16:06.240
classes that each had more than 20
00:16:08.480
responsibilities. So single
00:16:10.160
responsibility principle is out the
00:16:11.600
window. Half of the quote code was
00:16:14.399
devoted to pending quote generation and
00:16:17.680
automation. So state management
00:16:19.199
happening in the file, and Active Support
00:16:21.680
concerns were burying yet more
00:16:23.440
complexity. We'll get to that in a
00:16:25.120
moment. I already mentioned the code
00:16:27.199
climate warnings and there were an
00:16:28.720
intense number of model dependencies.
00:16:33.199
The anti-patterns at play here. First,
00:16:35.920
Leviathan classes which you may call God
00:16:38.800
classes, but I'm trying to stay on
00:16:40.079
theme.
00:16:42.320
The god classes or the Leviathan classes
00:16:44.480
are usually giant classes in your
00:16:46.480
application that have tons of
00:16:48.079
responsibilities, often tons of
00:16:49.920
dependencies. They are very difficult to
00:16:52.000
test. They are very difficult to
00:16:53.279
maintain. and they're even more
00:16:54.880
difficult to add new features or to
00:16:57.040
actually iterate upon. I see some people
00:16:59.839
shaking their heads and smiling. These
00:17:02.079
are really common god classes.
00:17:05.120
Warning fatigue. I know I'm citing a
00:17:07.360
really extreme example where a team
00:17:09.120
decided unilaterally to ignore 1500
00:17:11.600
warnings. But the truth is that I've
00:17:13.760
never worked on or seen or consulted for
00:17:17.039
a team that didn't experience some kind
00:17:19.280
of warning fatigue. And you could think
00:17:21.600
about this right now. you're all at a
00:17:23.199
conference. Did anybody
00:17:25.919
ignore a Slack warning about an issue
00:17:28.880
that might have happened? Some kind of
00:17:31.840
uh error that was caught, some kind of
00:17:34.080
notification from your CI. It happens
00:17:36.400
all the time. It's actually really
00:17:38.080
difficult to be very judicious about
00:17:40.720
what warnings we allow. And if you
00:17:42.720
ignore a warning, it really is kind of
00:17:44.559
the same thing as turning it off.
00:17:47.679
Finally, concern creep. Active Support
00:17:50.559
concerns serve a valid purpose and they
00:17:53.840
really are interesting, um, code and
00:17:56.400
nice to implement. However, they can
00:17:58.799
also add invisible complexity to already
00:18:01.200
bloated classes.
00:18:03.440
So they can quickly become a problem
00:18:04.880
when not used sparingly and in this case
00:18:07.200
they really were a problem.
00:18:10.400
So the consequences are the lost trust
00:18:12.320
in the build process. Engineering plans
00:18:14.799
and estimates were off by orders of
00:18:16.640
magnitude
00:18:18.480
and most importantly the business can't
00:18:21.039
break into new service categories. So it
00:18:23.440
could not advance because of what had
00:18:26.000
been built.
00:18:28.720
We needed to chart a new course. We
00:18:30.640
could not slay the Leviathan. Or maybe
00:18:32.880
it's more accurate to say that we would
00:18:34.480
not slay it. At Def Method, we
00:18:36.960
generally work with thriving businesses
00:18:39.200
which means that rewriting code is
00:18:41.440
generally off the table. the business
00:18:43.039
needs to continue which means the
00:18:44.400
application needs to continue. So
00:18:46.559
instead of trying to rewrite it or
00:18:48.480
trying to do some major surgery we
00:18:50.640
instead tried to chart a new course and
00:18:52.480
we started with some major backend
00:18:54.240
refactoring.
00:18:56.000
We decoupled the responsibilities by
00:18:57.760
extracting pricing, export, and project
00:19:00.240
life cycle logic from models into
00:19:02.320
discrete service classes and
00:19:04.080
interactors. So the Interactor gem, if
00:19:06.480
you've not used it, is designed for the
00:19:08.559
specific purpose of encapsulating
00:19:10.160
business logic. It's quite useful.
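As a rough picture of what such an extraction looks like, here is a minimal plain-Ruby service object. It does not use the Interactor gem's API. The names `Quote`, `LineItem`, `CalculateQuoteTotal`, and the discount rule are hypothetical stand-ins, not the customer's actual code.

```ruby
# Hypothetical slimmed-down models: they only hold data.
LineItem = Struct.new(:unit_price_cents, :quantity, keyword_init: true)
Quote = Struct.new(:line_items, :discount_percent, keyword_init: true)

# Pricing logic lives in its own class with a single responsibility, so it
# can be tested without loading the rest of the model.
class CalculateQuoteTotal
  def self.call(quote)
    new(quote).call
  end

  def initialize(quote)
    @quote = quote
  end

  # Sums the line items, then subtracts a whole-percent discount (integer cents).
  def call
    subtotal = @quote.line_items.sum { |item| item.unit_price_cents * item.quantity }
    subtotal - (subtotal * @quote.discount_percent / 100)
  end
end

quote = Quote.new(
  line_items: [
    LineItem.new(unit_price_cents: 10_000, quantity: 2),
    LineItem.new(unit_price_cents: 5_000, quantity: 1)
  ],
  discount_percent: 10
)
puts CalculateQuoteTotal.call(quote)
```

Each responsibility pulled out this way is one fewer reason for the large model to change, which is the point of the refactoring described above.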
00:19:13.840
The Statesman gem was used for state
00:19:15.840
management. And there are actually many
00:19:17.440
libraries out there for state
00:19:19.039
management. I recommend using one. Um,
00:19:21.600
and in this case, we were able to move,
00:19:23.600
you know, more than half of the code out
00:19:25.440
of the quote class and do life cycle
00:19:27.919
management outside of that class.
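What "lifecycle management outside of the class" means can be shown with a small hand-rolled state machine. This is a plain-Ruby sketch and deliberately not the Statesman API. The class name `QuoteLifecycle` and the states and transitions in the table are invented for illustration.

```ruby
# Hypothetical quote lifecycle, kept outside the Quote model.
class QuoteLifecycle
  class InvalidTransition < StandardError; end

  # Allowed next states for each state. Terminal states map to an empty list.
  TRANSITIONS = {
    pending: [:sent, :cancelled],
    sent: [:accepted, :rejected, :cancelled],
    accepted: [],
    rejected: [],
    cancelled: []
  }.freeze

  attr_reader :state, :history

  def initialize(state = :pending)
    @state = state
    @history = [state]
  end

  def can_transition_to?(next_state)
    TRANSITIONS.fetch(@state).include?(next_state)
  end

  # Moves to the next state or raises if the transition table forbids it.
  def transition_to!(next_state)
    raise InvalidTransition, "#{@state} -> #{next_state}" unless can_transition_to?(next_state)

    @state = next_state
    @history << next_state
    self
  end
end

lifecycle = QuoteLifecycle.new
lifecycle.transition_to!(:sent).transition_to!(:accepted)
puts lifecycle.history.inspect
```

With the rules in one table, the model no longer needs scattered conditionals to decide what a quote may do next, and the transitions can be tested on their own.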
00:19:31.440
The front end needed just as much
00:19:33.120
restructuring
00:19:35.840
which we streamlined by modularizing the
00:19:37.919
Redux connections, eliminating
00:19:40.320
unused Redux code, and introducing code
00:19:43.039
splitting to improve performance and
00:19:44.960
load times.
00:19:46.880
And finally, we implemented React
00:19:48.640
Testing Library for unit and integration
00:19:50.720
tests so that we weren't relying
00:19:52.559
exclusively on feature specs.
00:19:58.400
And finally,
00:20:00.880
we reestablished trust in the metrics.
00:20:02.960
This is a negotiation.
00:20:05.200
It was pretty easy to say, "Hey, look,
00:20:06.880
the green build really needs to be
00:20:08.240
reinforced."
00:20:09.919
Engineers generally agree we shouldn't
00:20:12.000
be deploying on uh on a flaky build. We
00:20:14.640
shouldn't be deploying when things are
00:20:15.760
red. But unignoring Code Climate
00:20:18.080
complexity um first is not generally
00:20:21.520
well liked by the team because they
00:20:22.799
don't want to see all these warnings
00:20:23.919
come back and second is not really
00:20:25.919
doable for a team of three to remediate
00:20:27.840
all of that code. So instead it was more
00:20:30.720
of a judicious discussion to say okay
00:20:33.120
well what warnings are really important
00:20:35.039
and are masking the highest amount of
00:20:36.799
complexity so that we can move in a
00:20:40.080
better direction than where we've been
00:20:41.679
going.
00:20:43.840
Finally, a commitment to 80% test
00:20:45.600
coverage for new and refactored code. If
00:20:47.919
you're a testing zealot like me, 80% is
00:20:50.400
an extremely low bar for Ruby. However,
00:20:53.360
if you're not, that's okay. Just know
00:20:56.080
that it is easy to get to an 80% metric
00:20:59.440
by simply creating a new file with RSpec
00:21:02.720
or Minitest or Test::Unit or whatever you
00:21:04.559
like and simply exercising all of the
00:21:07.039
public methods in your file. You'll hit
00:21:10.080
80% no sweat. After
00:21:14.400
that, fair winds and following seas. Of
00:21:16.960
course, this took some time. Uh it's not
00:21:19.760
like this was able to be done in a
00:21:21.520
weekend. But we did learn some things.
00:21:23.840
We learned that complex systems fail
00:21:25.760
slowly under layers of accreted
00:21:27.760
responsibility. There was no big outage,
00:21:31.120
no big um threat to the business except
00:21:36.159
that when the business decided it wanted
00:21:38.640
to move into new categories, which is
00:21:40.400
what businesses tend to do, they found
00:21:42.400
that they couldn't. So they were
00:21:44.640
stagnated under the layers of accreted
00:21:47.360
responsibility.
00:21:49.120
Complexity ignored is complexity
00:21:51.039
multiplied. If you take nothing else
00:21:53.520
from my talk today, understand that
00:21:56.000
complexity ignored is complexity
00:21:58.480
multiplied. Entire businesses have been
00:22:01.360
built
00:22:03.360
because developers ignore this to their
00:22:06.000
detriment. Complexity ignored is
00:22:08.480
complexity.
00:22:10.320
Yeah. Multiplied. All right.
00:22:14.240
Uh finally, the true measure of a
00:22:15.760
healthy system is safe change at speed.
00:22:17.520
Of course, this is not a system that
00:22:19.520
could change at the speed that the
00:22:21.120
business needed to, but it eventually
00:22:23.120
got there, I am happy to say.
00:22:27.280
Finally, chapter three,
00:22:30.480
the silt below.
00:22:34.080
Long at anchor and trusting in familiar
00:22:37.120
charts, our comrades gave little thought
00:22:39.600
to the slow accumulation beneath them.
00:22:42.720
Until the day came when the ship would
00:22:44.720
not turn, and they found its keel mired
00:22:48.159
in deep layers of silted logic and
00:22:51.679
forgotten depths.
00:22:54.159
Hope the photographer got me with the
00:22:55.440
hat on.
00:22:57.760
Above the water, one of the longest
00:22:59.520
running and successful public-facing
00:23:01.360
Rails applications. I'll say no more
00:23:03.760
because I don't want to identify them.
00:23:06.320
Conventional methods were used for MVC.
00:23:09.520
Simple controllers, simple models,
00:23:12.240
complexity was split out into helpers
00:23:14.080
and concerns. And in this case, those
00:23:15.840
concerns were not a concern. A well-tested
00:23:19.120
application and in very good
00:23:21.520
shape for a Rails application that's
00:23:22.960
almost two decades old.
00:23:25.520
So, what did we spot
00:23:30.720
first?
00:23:32.320
What is this model doing? This comment
00:23:35.360
is not mine. We actually found this in
00:23:37.520
the code. This is just a join model and
00:23:39.520
usually shouldn't need to be used
00:23:40.960
directly.
00:23:42.559
Well, then why is it there?
00:23:45.360
So, we take another look beneath the
00:23:47.120
surface. And in the structure.sql file,
00:23:50.080
we found that first there's a
00:23:52.080
structure.sql file. Second,
00:23:56.080
we found that there were dozens of these
00:24:00.080
complex Postgres functions in that
00:24:02.880
file.
00:24:06.320
Oh, skip that. And finally, in the
00:24:09.360
membership controller, we found this:
00:24:11.600
multiple joins and raw SQL in the
00:24:14.480
controller. Added to this is the fact
00:24:17.120
that what we are trying to do here is
00:24:19.279
cull a list of admin users. And how
00:24:21.840
many admin users could there possibly
00:24:23.679
be? 10, 20? To use multiple joins in raw
00:24:27.919
SQL in this manner felt like it was
00:24:30.080
hiding something really deeply wrong.
00:24:35.039
So to summarize, hard-coded SQL
00:24:36.799
functions in structure.sql, dozens of
00:24:39.279
joins, dates, filters, and embedded
00:24:41.600
business logic, raw SQL in the
00:24:44.640
controllers.
00:24:47.200
The anti-patterns here: logic in the
00:24:50.080
depths.
00:24:51.760
So what that structure.sql file revealed
00:24:54.880
was application logic buried in stored
00:24:57.360
procedures. This affects everything.
00:25:01.440
Stored procedures are hard to
00:25:03.840
maintain. They're hard to test. They're
00:25:05.840
hard to reason about. You name it.
00:25:09.679
Also overnormalization. So a join model
00:25:12.960
in and of itself is not an anti-pattern.
00:25:16.320
They are sometimes necessary. But what
00:25:18.640
we find when we see multiple join models
00:25:21.520
with comments like do not use this
00:25:23.600
model. We start to suspect that
00:25:26.480
something is at play: this
00:25:27.840
over-normalization,
00:25:29.440
or that at some point the engineers
00:25:32.080
decided to prioritize third normal form
00:25:35.440
or beyond over being able to easily work
00:25:38.799
within the application layer. This is an
00:25:41.520
anti-pattern and I'll tell you why. The
00:25:44.640
consequences here are that each change
00:25:46.960
requires migrations and manual SQL
00:25:49.760
edits. That in itself slows down
00:25:52.080
development and makes things brittle.
00:25:54.159
But I am more concerned with the second
00:25:56.720
bullet point which is that maintenance
00:25:58.880
and iteration require specialists. So
00:26:01.440
stay with me here. You may not think
00:26:03.679
that needing to know a bunch of SQL
00:26:05.760
makes you a specialist. And indeed we're
00:26:07.520
at a developer conference where we all
00:26:09.679
want to say okay well we are Rails
00:26:11.360
developers. Of course, we know how to
00:26:13.039
write SQL or we should know how to write
00:26:15.120
SQL. But if you stop and think for a
00:26:17.279
moment about the kinds of developers
00:26:19.679
that can modify or maintain or iterate
00:26:23.120
upon
00:26:24.880
oh
00:26:28.159
this
00:26:30.080
I guarantee you are thinking about your
00:26:32.159
more experienced engineers. So to the
00:26:34.720
detriment of your less experienced
00:26:36.559
engineers which means that all of a
00:26:38.159
sudden you are creating this uh this
00:26:40.799
dynamic or this dichotomy where only
00:26:43.279
maybe half or fewer of your developers
00:26:45.600
can actually contribute to the entire
00:26:47.200
application meaningfully. And let's face
00:26:48.960
it, it's just a database. Everybody
00:26:51.360
should be able to participate and add
00:26:53.520
features.
00:26:57.600
In full disclosure, we have not
00:26:59.760
excavated this hole just yet.
00:27:02.559
This customer has actually a lot of
00:27:04.400
issues and we have not yet gotten to the
00:27:06.400
point where we can start to separate the
00:27:09.279
structure.sql file or break it up. But
00:27:11.440
this is what we will do when given the
00:27:13.360
chance. First, surface and catalog.
00:27:17.600
Find every Postgres function in that
00:27:19.520
structure file. Classify each by
00:27:21.679
complexity, frequency of use, and
00:27:23.919
dependency on application tables. I'm
00:27:26.159
betting that there are some stored
00:27:27.679
procedures there that are no longer
00:27:29.120
being used. And wouldn't that be nice to
00:27:30.799
just delete them? For the rest, we want
00:27:34.000
to detach from structure.sql if
00:27:37.120
possible. I understand that there can be
00:27:39.279
really fine-grained database tuning
00:27:41.039
happening in the structure.sql file, but
00:27:43.120
barring that, we really should all get
00:27:44.720
on to the schema.rb.
00:27:48.000
We'll surround with tests.
00:27:50.799
Um, and for critical or interconnected
00:27:52.720
functions, we would add contract tests
00:27:54.399
to ensure that downstream logic remains
00:27:56.799
stable throughout the transition.
00:27:59.600
and incrementally replace. So we can
00:28:01.840
rewrite that SQL in Active Record scopes
00:28:03.919
or in Arel, and we can extract the service
00:28:06.640
objects and cache as needed for
00:28:08.720
performance reasons.
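The "surface and catalog" step in the plan above lends itself to a small script. The following is a minimal Ruby sketch under stated assumptions: the regular expression only recognizes plainly formatted `CREATE [OR REPLACE] FUNCTION name(` statements, the usage count is a naive text search, and the function names in the sample are invented. It is a starting point for an inventory, not a complete SQL parser.

```ruby
# Matches "CREATE FUNCTION name(" and "CREATE OR REPLACE FUNCTION name(".
# A simplification: quoted identifiers and unusual formatting are missed.
FUNCTION_PATTERN = /CREATE\s+(?:OR\s+REPLACE\s+)?FUNCTION\s+([\w.]+)\s*\(/i

# Lists the names of functions defined in a structure.sql dump.
def function_names(sql)
  sql.scan(FUNCTION_PATTERN).flatten.uniq
end

# Counts how often each function's unqualified name appears in application
# source text, as a rough signal for "is anything still calling this?".
def usage_counts(names, source)
  names.to_h do |name|
    bare = name.split(".").last
    [name, source.scan(/\b#{Regexp.escape(bare)}\b/).size]
  end
end

sample_sql = <<~SQL
  CREATE FUNCTION public.active_admins(org_id integer) RETURNS integer AS $$ SELECT 1 $$ LANGUAGE sql;
  CREATE OR REPLACE FUNCTION public.legacy_rollup() RETURNS integer AS $$ SELECT 1 $$ LANGUAGE sql;
SQL
sample_source = 'User.find_by_sql("SELECT * FROM active_admins(1)")'

names = function_names(sample_sql)
puts names.inspect
puts usage_counts(names, sample_source).inspect
```

A function with zero references in the application code is only a candidate for deletion; triggers, views, and other functions may still call it, so the count is a prompt for investigation rather than proof.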
00:28:11.360
After that we would of course have fair
00:28:14.080
winds and following seas.
00:28:16.559
The thing to learn here is that a
00:28:17.840
structure file deserves scrutiny. It
00:28:20.080
doesn't mean that it's bad, but it can
00:28:22.399
sometimes
00:28:23.919
uh be done because the team is working
00:28:26.799
against what Rails gives you. And
00:28:28.960
certainly in an application that's been
00:28:30.559
running for almost 20 years, it could be
00:28:33.120
that Rails didn't always give us
00:28:35.760
the means to get around that
00:28:37.279
structure file, but it does today.
00:28:40.559
Rails conventions mixed with embedded
00:28:42.159
Postgres logic create split maintenance
00:28:44.240
paths and thus specialists and join
00:28:47.039
models may be a sign of
00:28:49.120
overnormalization.
00:28:53.520
Okay, the storm has passed. The sails
00:28:57.120
are mended and for now the waters are
00:28:59.840
calm. The crew stands at the helm, not
00:29:02.640
unscarred but wiser. Their hands I
00:29:06.159
already lost my place. Their hands
00:29:07.360
guided by hard-won knowledge of what
00:29:09.360
lurks below. They know that even still
00:29:12.640
seas conceal silt, and fair winds may
00:29:15.760
yet lead toward hidden beasts, but with
00:29:18.240
eyes sharpened and a vessel made ready,
00:29:20.960
they sail on.
00:29:24.640
Thank you everybody for coming. I really
00:29:27.440
appreciate it. You have made this an
00:29:28.799
incredible experience.