Summarized using AI

Silent Killers: Lessons from the Brink

Joe Leo • July 10, 2025 • Philadelphia, PA • Talk

Introduction

Silent Killers: Lessons from the Brink is a RailsConf 2025 talk by Joe Leo addressing the gradual degradation of Rails applications as they evolve in complexity. Joe shares stories from real-world projects, illustrating how seemingly minor issues can become existential threats, and offers actionable strategies for diagnosing and remedying deep-rooted problems without resorting to full rewrites.

Main Theme

The talk focuses on the hidden, accumulating issues—'silent killers'—that undermine the maintainability and scalability of mature Ruby on Rails applications. It highlights how unaddressed architectural and codebase issues silently compound over time, impacting development velocity, product confidence, and team effectiveness.

Key Points

  • Identification of Anti-patterns:
    • Joe describes three major anti-patterns using nautical metaphors:
      • The Modularity Mirage (superficially modular architectures)
      • The Leviathan Beneath (god classes and warning fatigue)
      • The Silt Below (buried business logic and overnormalization)

Case Study 1: The Modularity Mirage

  • Two Rails apps (tenant and landlord) appeared separate but were tightly coupled through shared databases.
  • Superficial modularity created fragile review/deployment processes and unreliable testing.
  • Resolutions included breaking database dependencies, using APIs (Active Resource), introducing contract and API tests, automating review app setup, and implementing DORA metrics.

Case Study 2: The Leviathan Beneath

  • A marketplace product with clear frontend/backend separation had nonetheless grown massive god (Leviathan) classes and complex dependencies.
  • Warning fatigue led the team to ignore roughly 1,500 code quality warnings, which masked accumulated technical debt.
  • Solutions involved extracting service classes, using the Interactor and Statesman gems for logic/state, modularizing React components, improving test coverage, and restoring trust in CI metrics.

Case Study 3: The Silt Below

  • A legacy, public-facing Rails app had hidden complexity in numerous PostgreSQL stored procedures and excessive join models, leading to specialist-dependent maintenance and slow iteration.
  • Recommendations were to catalog and incrementally migrate business logic from SQL to Rails, surround logic with tests, and reduce overnormalization.

Important Takeaways

  • Superficial architecture decisions (like fake modularity or misplaced logic) become critical technical debt if unaddressed.
  • Metrics and monitoring should go beyond simple test coverage, incorporating DevOps and deployment indicators to accurately gauge codebase health.
  • Large, complex classes and warning fatigue multiply risk and slow innovation.
  • Overuse of database logic and excessive normalization can limit the team's ability to iterate and onboard engineers.
  • Remediation focuses on restoring confidence via modularity, robust testing, judicious use of metrics, and incremental refactoring—avoiding full rewrites unless absolutely necessary.
  • The core measure of system health is the ability to make safe changes at speed.

Conclusion

By recognizing and addressing silent killers before they manifest as acute problems, teams can keep mature Rails applications flexible, maintainable, and ready for new business challenges, all without the need for painful rewrites.

Date: July 10, 2025
Published: July 23, 2025

Every Rails app starts with a solid foundation, but as features evolve and teams change, even the best projects can drift toward complexity. Over time, code that once felt effortless can become tangled, slow, and hard to maintain. This talk dives deep into what truly degrades a Rails application over time. We’ll explore practical ways to diagnose the health of a codebase, including underutilized metrics beyond test coverage and churn. Along the way, I’ll share battle-tested strategies for bringing struggling Rails apps back to life without a painful rewrite.

RailsConf 2025

00:00:16.880 Uh, thank you all for coming to my
00:00:18.560 talk, Silent Killers: Lessons from the
00:00:20.960 Brink.
00:00:22.960 I'm Joe Leo. This is me looking very
00:00:24.960 serious because we are going to discuss
00:00:28.640 very serious issues that arise in
00:00:31.039 successful and growing Rails
00:00:33.200 applications.
00:00:35.040 So who am I? I'm the CEO of Def
00:00:37.840 Method. We offer Def Method services
00:00:40.559 where we pour love into your Ruby on
00:00:42.559 Rails applications until those
00:00:43.920 applications love you back. We are the
00:00:46.879 authors of Phoenix which is the first
00:00:48.800 ever AI platform for Rails. This is
00:00:53.360 Phoenix. It's code that loves you back.
00:00:57.840 I'm the co-author of The Well-Grounded
00:00:59.199 Rubyist. I'm the co-host of the Ruby AI
00:01:02.399 podcast. I've been doing this for a
00:01:04.000 really long time. I really do love Ruby,
00:01:06.799 like all of you.
00:01:10.080 And before I jump in, I want to give a
00:01:11.680 special thanks to Steve, Jonathan, and
00:01:15.119 Christine. They dove in and really lived
00:01:19.439 these stories that I'm about to share
00:01:21.200 with you. And they did go to the brink
00:01:24.080 with our customers and came back to tell
00:01:26.479 the tales.
00:01:29.439 So, I'm going to talk about the Tangled
00:01:31.360 Deep, and I chose deep waters
00:01:34.640 because no app is without its obstacles,
00:01:38.000 challenges to navigate
00:01:40.640 as you sail the seas of development.
00:01:43.439 Over time, even with great care and the
00:01:46.240 best of intentions, some seemingly small
00:01:48.880 obstacles are silently growing into
00:01:51.119 existential threats in your application.
00:01:54.000 They're silent killers. And today, we
00:01:56.720 are going to name them. We're going to
00:01:58.799 identify them. We're going to teach you
00:02:00.560 how to avoid them. But if you come into
00:02:03.439 these challenges, we are going to teach
00:02:04.880 you how to get out.
00:02:08.560 So, here are some of the anti-patterns
00:02:10.319 that we're going to come across along
00:02:11.840 the way. Some of these you're going to
00:02:13.920 recognize, some of them you're not
00:02:15.680 because I made them up. And I had a lot
00:02:17.840 of fun coming up with names for the
00:02:19.200 anti-patterns. But the important thing is
00:02:21.120 that you're going to recognize them
00:02:22.720 immediately because they are born out of
00:02:24.560 simple decisions that are made in the
00:02:26.160 course of everyday development that
00:02:28.000 accrete over time.
00:02:30.959 But fear not, you and your crew will
00:02:34.080 navigate to safety.
00:02:36.480 You will understand the hidden
00:02:37.760 complexity.
00:02:39.280 You'll diagnose the killers before
00:02:40.879 crisis hits.
00:02:42.720 and you will implement strategies for
00:02:44.640 Rails app survival. It's true that many
00:02:47.200 of the stories that I'm going to share
00:02:48.560 today involve the kinds of
00:02:50.720 complexities that could happen in any
00:02:53.200 application in any stack, but this isn't
00:02:55.760 any application in any stack. This is
00:02:57.519 Rails. And so there are particular ways
00:02:59.440 that we can get ourselves into and out
00:03:01.680 of trouble.
00:03:05.920 One reminder before we set sail,
00:03:09.280 identify and don't compare. I'm going to
00:03:12.319 share some pretty extreme cases, but
00:03:15.280 we've all been through one or more
00:03:16.640 situations like this before. If we are
00:03:19.200 seasoned sailors, we've all gone through
00:03:21.200 a squall now and again.
00:03:24.480 But we learn from our mistakes and from
00:03:26.800 the tales of others who have gone to the
00:03:28.720 brink and have lived to tell about it.
00:03:34.239 Okay, chapter one: The Modularity
00:03:37.519 Mirage. It's time, if you've got a hat.
00:03:40.720 Our seafaring comrades,
00:03:44.000 lured by the promise of clear waters and
00:03:46.400 separate shores, charted their course
00:03:48.480 upon twin vessels, only to find beneath
00:03:51.440 the surface their hulls bound fast by
00:03:54.319 tangled rigging and unseen currents.
00:03:59.120 Above the water,
00:04:01.360 a real estate application with two user
00:04:03.200 types, tenants and landlords.
00:04:06.560 In the course of doing business, the
00:04:08.159 tenant app was built first and then as
00:04:10.400 the application and the business grew,
00:04:12.000 the landlord app was created.
00:04:14.720 So the applications were separate, two
00:04:16.639 separate Rails apps, which was a little
00:04:19.359 suspicious, but the architecture was
00:04:20.880 modular, at least on paper.
00:04:23.520 Each app had its own GitHub
00:04:25.280 repository, Heroku pipeline, React front
00:04:27.680 end. So it gave the appearance of a
00:04:29.360 clean separated system, a modern enough
00:04:32.800 stack. I know Heroku and Semaphore aren't
00:04:34.639 so modern anymore, but this
00:04:36.960 business has been around for a while.
00:04:39.759 Clean code, CI/CD is running. There were
00:04:42.479 review apps configured. Seemingly all
00:04:45.199 was well.
00:04:47.199 Our lead engineer Steve dove in first.
00:04:50.800 He ran RubyCritic, as he is wont to do,
00:04:52.720 against the application and the results
00:04:55.120 came out pretty good. 91 score,
00:04:58.720 a few scattered D's, some C's, mostly
00:05:01.840 A's and B's.
00:05:04.960 And the Landlord app, though smaller,
00:05:06.639 was even better. No D's, a few C's, A's
00:05:10.240 and B's all around.
00:05:12.880 But trouble was spotted.
00:05:15.360 First, the customer came to us because
00:05:17.440 they wanted to implement DORA metrics
00:05:19.199 and they wanted our help, which is
00:05:21.120 interesting because you typically don't
00:05:23.120 want to implement DORA metrics until
00:05:24.720 something has gone wrong. And in this
00:05:26.080 case, something had gone wrong. What had
00:05:28.800 gone wrong was over the course of four
00:05:30.400 months, the production application that
00:05:32.560 was deployed needed to be rolled back
00:05:34.160 four separate times. Once one month,
00:05:36.720 then twice the next month, then zero the
00:05:38.880 month after that, one the month after
00:05:40.479 that. Depending on the industry you're
00:05:43.039 in and the number of times you deploy,
00:05:44.800 that might not seem like a big deal. But
00:05:46.800 in the highly regulated real estate
00:05:48.560 industry,
00:05:50.080 these kinds of production bugs when left
00:05:52.720 unchecked or before being rolled back
00:05:55.280 can result in regulatory fines. They can
00:05:57.600 result in regulatory penalties and even
00:05:59.440 shutdowns in certain states.
00:06:04.080 When we took a look at the main
00:06:05.919 branches, we found that the tenant app
00:06:08.080 main branch had failed seven times in
00:06:09.840 the last 25 builds. And the landlord app
00:06:12.240 was even worse. It had failed 11 times
00:06:14.400 out of the last 25 builds.
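As a quick worked example, the two build failure rates quoted above can be computed directly. The helper name `failure_rate` is an assumption made for illustration; only the counts come from the talk.

```ruby
# Percentage of failed builds among the most recent builds on main.
def failure_rate(failed, total)
  (failed.to_f / total * 100).round(1)
end

tenant_rate   = failure_rate(7, 25)  # tenant app: 7 of the last 25 builds failed
landlord_rate = failure_rate(11, 25) # landlord app: 11 of the last 25 builds failed

puts "tenant: #{tenant_rate}%, landlord: #{landlord_rate}%"
# prints "tenant: 28.0%, landlord: 44.0%"
```

In other words, more than a quarter of recent tenant builds and close to half of recent landlord builds were red.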
00:06:17.280 And what was worse is that there were no
00:06:19.120 flaky tests making this fail; the builds were
00:06:20.960 failing for inconsistent reasons. And
00:06:23.600 finally, there was no automated test
00:06:25.360 coverage being collected. So although we
00:06:27.120 found the application to be generally
00:06:29.199 well tested, we couldn't tell which
00:06:31.520 parts were really well tested and which
00:06:33.120 weren't because there was nothing like
00:06:34.639 Coveralls or Code Climate that was
00:06:37.039 telling us the test coverage metrics.
00:06:41.520 So we dove in
00:06:44.960 and the first thing we spotted was that
00:06:46.560 there was direct database coupling. So
00:06:48.319 despite the repo and deployment
00:06:50.560 separation, the tenant app had direct
00:06:52.479 dependencies on the landlord database
00:06:54.880 and schema changes in the landlord
00:06:56.880 database necessitated immediate changes
00:06:59.520 in the tenant application
00:07:01.759 and some models in the tenant
00:07:03.199 application directly called tables in
00:07:06.240 the landlord application. So this
00:07:08.400 violated service boundaries and made
00:07:10.639 isolated testing
00:07:12.560 almost impossible. What's more, those
00:07:14.960 Heroku review apps that we were happy to
00:07:17.280 see at first actually required a
00:07:19.840 matching companion app for the other
00:07:21.680 repo in order to function. And if a
00:07:23.759 direct match didn't exist, then a
00:07:26.080 standalone version was bootstrapped by
00:07:28.639 copying over a schema file and doing
00:07:30.560 some manual changes. And so the whole
00:07:32.960 thing was manual and brittle and it was
00:07:34.639 held together with some custom Heroku
00:07:36.400 API glue code.
00:07:40.080 So let's pause for a moment and take a
00:07:41.520 look at the anti-patterns at play. The
00:07:44.319 first: superficial modularity. And I want
00:07:46.880 to stress that the anti-pattern I'm
00:07:48.960 talking about is not tight coupling. We
00:07:51.440 know that tight coupling is something to
00:07:53.520 avoid, but really in the real world it's
00:07:56.160 a system of trade-offs. And it might be
00:07:58.479 okay to say okay these applications are
00:08:00.879 coupled and we should treat them as
00:08:02.160 such. The anti-pattern here is that
00:08:04.560 although the developers were
00:08:06.400 obviously intelligent and could say that
00:08:08.639 yes it's not really modular because
00:08:10.879 there's a database dependency the entire
00:08:13.199 team was still behaving as if they had
00:08:15.199 modular applications. That is
00:08:17.280 superficial modularity.
00:08:20.800 Schrödinger's deploy. Were the deploys
00:08:23.440 alive or were they dead? I have Mark for
00:08:25.680 inspiration on this.
00:08:28.160 So when the review apps are tested, they
00:08:30.000 seem to work, but the manual effort
00:08:31.680 required to set up the review apps into
00:08:34.000 a staging environment or review app
00:08:35.680 environment was not replicated in
00:08:37.680 production. So we couldn't have
00:08:39.279 confidence that what we were seeing in
00:08:41.120 the review environment was actually
00:08:42.800 going to be there in production. And of
00:08:44.240 course, we had evidence that it was not
00:08:45.760 the same. And finally, broken windows.
00:08:50.080 Just charting along the course, seeing
00:08:52.880 unreliable main builds, features that
00:08:55.680 work locally but fail in staging and an
00:08:57.920 inability to track test coverage made
00:09:01.440 it so that the code that I write on my
00:09:03.519 computer, I could not have any trust
00:09:05.279 that it would make it to production
00:09:06.480 without failing somewhere and I didn't
00:09:08.240 know if those failures were real or if
00:09:10.240 they were a product of the environment.
00:09:13.279 Okay,
00:09:15.040 so the consequence here was really that
00:09:18.320 what started as an engagement to
00:09:20.080 implement DORA metrics quickly became a
00:09:22.080 remediation engagement for an
00:09:23.920 existential threat to the business.
00:09:28.240 The schema changes required
00:09:29.600 synchronized deployments. Features
00:09:31.279 couldn't be tested in isolation and of
00:09:33.360 course confidence was lost among the
00:09:34.880 product and engineering team.
00:09:37.519 So we set to work to untangle the
00:09:40.959 vessels
00:09:42.560 and we started by breaking the direct
00:09:44.959 database coupling. We removed the
00:09:47.279 database dependency between the tenant
00:09:48.880 app and the landlord app and we restored
00:09:50.959 true modularity. We used Active Resource,
00:09:53.760 and I think it's interesting that, you
00:09:55.920 know, Active Resource has kind of
00:09:58.399 faded. It's not really used as much.
00:10:01.040 It's not included in Rails. But I
00:10:04.240 think the main reason that people don't
00:10:06.399 use Active Resource is because it
00:10:08.480 behaves too much like Active Record. In
00:10:10.640 this case, people were not using Active
00:10:12.560 Record enough, and so we thought that
00:10:14.320 this was a nice step in the right
00:10:15.680 direction to allow the tenant app to
00:10:17.760 fetch data over the API rather than
00:10:20.399 reaching into the landlord's database.
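A minimal sketch of the idea, assuming a hypothetical `/properties/:id.json` endpoint on the landlord app: the tenant app asks the landlord API for a record instead of querying the landlord's tables. The team in the talk used Active Resource for this. The class below is plain Ruby standing in for that role, and `LandlordProperty`, the `fetcher` argument, and the JSON fields are all illustrative names rather than anything taken from the project.

```ruby
require "json"

# Hypothetical tenant-side wrapper around a landlord API endpoint.
class LandlordProperty
  attr_reader :id, :address

  # `fetcher` is any callable that takes a path and returns a JSON string.
  def self.find(id, fetcher:)
    payload = JSON.parse(fetcher.call("/properties/#{id}.json"))
    new(id: payload.fetch("id"), address: payload.fetch("address"))
  end

  def initialize(id:, address:)
    @id = id
    @address = address
  end
end

# Stand-in for a real HTTP call, so the sketch runs offline.
FAKE_LANDLORD_API = lambda do |path|
  raise ArgumentError, "unexpected path #{path}" unless path == "/properties/42.json"

  JSON.generate("id" => 42, "address" => "1 Example Street")
end

property = LandlordProperty.find(42, fetcher: FAKE_LANDLORD_API)
puts property.address
# prints "1 Example Street"
```

Injecting the fetcher keeps the sketch runnable without a network. In a real app the callable would wrap an HTTP client, or the whole class would be replaced by an Active Resource model (a subclass of `ActiveResource::Base` with `self.site` pointing at the landlord app).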
00:10:25.200 We added API contract tests. One thing
00:10:27.200 you're going to see over and over: at
00:10:28.880 Def Method, we really love tests.
00:10:30.959 So we used VCR, which is well known. We
00:10:33.279 used Pact, which is slightly less well known,
00:10:35.279 but that's consumer-driven contract
00:10:37.040 testing to capture and validate real API
00:10:40.000 interactions and this protects both
00:10:42.399 sides of the boundary from unintended
00:10:44.240 changes.
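The kind of check a consumer contract test performs can be sketched in plain Ruby. This is not the Pact or VCR API, just an illustration of the principle: the consumer (the tenant app) writes down the fields and types it relies on, and the provider's response is verified against that list. The contract contents and the `contract_violations` helper are assumptions made for the example.

```ruby
# Hypothetical contract: the fields and types the tenant app depends on.
PROPERTY_CONTRACT = {
  "id" => Integer,
  "address" => String,
  "available" => [TrueClass, FalseClass]
}.freeze

# Returns human-readable violations; an empty list means the response honors the contract.
def contract_violations(response, contract)
  contract.each_with_object([]) do |(field, types), errors|
    if !response.key?(field)
      errors << "missing field: #{field}"
    elsif Array(types).none? { |type| response[field].is_a?(type) }
      errors << "wrong type for #{field}: #{response[field].class}"
    end
  end
end

good = { "id" => 42, "address" => "1 Example Street", "available" => true }
bad  = { "id" => "42", "address" => "1 Example Street" }

p contract_violations(good, PROPERTY_CONTRACT)
# prints []
p contract_violations(bad, PROPERTY_CONTRACT)
# prints ["wrong type for id: String", "missing field: available"]
```

Running a check like this in the provider's CI is what protects the consumer: a landlord-side change that renames or retypes a field fails the build before it reaches the tenant app.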
00:10:47.519 Then we simplified the review app setup
00:10:50.079 and I really liked what we did here. We
00:10:52.160 extracted shared logic into a private
00:10:54.000 gem which helped to remove some of the
00:10:55.839 duplication across both environments for
00:10:57.839 setup and it helped us to fully automate
00:11:00.959 the review app environment with Heroku
00:11:03.760 rather than having manual steps.
00:11:06.800 And finally, we did what they brought us
00:11:09.600 in there to do and we did implement some
00:11:12.000 DORA metrics. In fact, Steve created an
00:11:14.399 open source library for calculating
00:11:17.040 DORA metrics using GitHub Actions. Now,
00:11:19.440 it's since been archived in favor of
00:11:22.399 services like Sleuth and LinearB, but
00:11:25.120 both Def Method and the customer were
00:11:26.880 really happy to be able to make that
00:11:28.160 contribution at the time because there
00:11:29.600 was no solution that existed.
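For readers unfamiliar with DORA metrics, here is a small sketch of how two of the four (deployment frequency and change failure rate) could be derived from a list of deploy records. It is not the archived library mentioned above; the `Deploy` struct, the helper names, and the sample data are all hypothetical.

```ruby
require "date"

# One record per production deploy; `rolled_back` marks a deploy that had to be reverted.
Deploy = Struct.new(:date, :rolled_back)

# Average number of deploys per day over the observed window.
def deployment_frequency(deploys, days:)
  (deploys.size.to_f / days).round(2)
end

# Percentage of deploys that were rolled back.
def change_failure_rate(deploys)
  return 0.0 if deploys.empty?

  (deploys.count(&:rolled_back).to_f / deploys.size * 100).round(1)
end

# Made-up sample: five deploys in a 30-day window, two of them rolled back.
SAMPLE_DEPLOYS = [
  Deploy.new(Date.new(2025, 1, 6), false),
  Deploy.new(Date.new(2025, 1, 9), true),
  Deploy.new(Date.new(2025, 1, 14), false),
  Deploy.new(Date.new(2025, 1, 21), false),
  Deploy.new(Date.new(2025, 1, 28), true)
].freeze

puts deployment_frequency(SAMPLE_DEPLOYS, days: 30) # prints 0.17
puts change_failure_rate(SAMPLE_DEPLOYS)            # prints 40.0
```

The other two DORA metrics, lead time for changes and time to restore service, need commit and incident timestamps and are left out of this sketch.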
00:11:32.720 After that, fair winds and following
00:11:35.279 seas.
00:11:36.959 And we learned some things:
00:11:38.959 that superficial modularity is dangerous,
00:11:42.160 that shared databases are hidden anchors,
00:11:44.320 and that they need to be cut early.
00:11:47.040 And this one I think is important.
00:11:48.880 DevOps metrics tell the story of
00:11:50.959 architectural health. And it's easy for
00:11:53.680 me I know in my own work to focus
00:11:56.079 exclusively on the health of the
00:11:57.760 application that I'm working on. And I
00:11:59.360 do that with Code Climate and
00:12:01.760 RubyCritic and whatever CI/CD metrics I
00:12:04.320 may have in place, and I think those are
00:12:05.839 very useful, but they do not tell me the
00:12:08.079 story of how that app behaves in the
00:12:10.160 architectural environment in production.
00:12:12.160 And we need DevOps metrics for that.
00:12:15.200 And of course, real modularity is tested
00:12:17.760 in how things fail and scale. It is fine
00:12:20.880 to say that your application is modular.
00:12:23.200 It's fine to behave as if the
00:12:25.040 application is modular. But if you can't
00:12:27.600 recover quickly from failures and if you
00:12:29.519 can't scale at the speed of your
00:12:30.800 business, then you do not have a modular
00:12:32.639 application.
00:12:35.360 All right,
00:12:36.880 nice work on the first one. Let's go to
00:12:39.040 chapter two: The Leviathan Beneath.
00:12:42.079 Excuse me, I need my hat.
00:12:45.440 Sailing a vessel built with confidence
00:12:47.680 and craft, the captain believed her
00:12:50.079 charts true and her keel steady, unaware
00:12:53.279 that far below a great beast had coiled
00:12:56.560 itself around the ship's spine, grown
00:12:58.880 fat on years of unchecked complexity.
00:13:02.480 Thank you, Colin. Appreciate it putting
00:13:04.800 the hat on.
00:13:08.399 Above the water, we have a marketplace
00:13:10.560 for colocation and networking buyers
00:13:12.480 and sellers.
00:13:14.959 There's a clear separation between the
00:13:16.480 Rails back end and the React front end.
00:13:18.720 It's a modular front end with Redux,
00:13:20.560 Webpack, SPAs. It was fine. There are
00:13:23.440 reasonable metrics in place. A Code
00:13:25.040 Climate rating of B. Okay. Clean
00:13:27.839 controllers, organized UI.
00:13:30.399 The team came to us or the business came
00:13:32.240 to us because they needed to add new
00:13:34.079 service categories, but they feared the
00:13:35.920 cost and complexity of extending the
00:13:37.760 system, which is interesting. At first
00:13:40.399 glance,
00:13:42.399 there is not evidence for that. But a
00:13:44.959 closer look revealed, nope, not up to
00:13:47.120 that part.
00:13:48.880 This is just to show you that it's a
00:13:50.800 not small but not very large
00:13:52.320 application:
00:13:54.399 140,000 to 160,000 lines of code. Some trouble
00:13:57.600 was spotted early.
00:14:00.399 So, the CI build had been broken for
00:14:03.440 months due to flaky tests. On one hand,
00:14:06.480 the developers didn't just delete the
00:14:08.240 flaky tests or mark them as pending,
00:14:09.839 which is even worse. But on the other
00:14:11.600 hand, they didn't let it stop them from
00:14:13.040 deploying. And so, they were deploying
00:14:14.480 with red builds.
00:14:17.440 Adding to that, there were no JS
00:14:19.120 integration or unit tests. So, everything
00:14:20.880 was being tested in the front end at
00:14:22.560 least with feature specs. And while I
00:14:24.639 have a soft spot in my heart for feature
00:14:26.160 specs, they are very difficult to
00:14:27.920 maintain. They are very flaky and they
00:14:30.880 take forever to run. And to top
00:14:34.160 things off, we saw that actually it was
00:14:35.760 only covering about 40% of the front-end
00:14:38.320 code anyway.
00:14:40.880 But this is the real smoking gun, if
00:14:43.199 you'll allow me to mix metaphors. The
00:14:45.680 technical debt over the course of one
00:14:47.680 month dropped from about 16,000
00:14:49.920 estimated hours of remediation time to
00:14:52.320 about 2,000 hours, which is 14,000 hours
00:14:55.760 and is really impressive because it was
00:14:57.440 a team of three that was working on
00:14:58.959 this.
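A quick sanity check shows why a drop of that size cannot be real remediation work. The figures are the ones quoted above; the helper name and the 31-day month are assumptions for the arithmetic.

```ruby
HOURS_IN_31_DAY_MONTH = 31 * 24 # 744 wall-clock hours

# Hours of estimated debt each engineer would have had to retire in the month.
def hours_per_engineer(debt_before, debt_after, team_size)
  (debt_before - debt_after) / team_size.to_f
end

needed = hours_per_engineer(16_000, 2_000, 3)
puts "needed per engineer: #{needed.round} h, available: #{HOURS_IN_31_DAY_MONTH} h"
# prints "needed per engineer: 4667 h, available: 744 h"
```

Each of the three engineers would have needed roughly six times as many hours as the month contains, so the metric must have moved for a reason other than refactoring.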
00:15:00.720 Of course, the Code Climate warnings
00:15:02.160 were being ignored. And probably if
00:15:06.000 you're using Code Climate, you're also
00:15:08.000 ignoring one or two warnings. 1,500
00:15:10.639 warnings are being ignored in this case,
00:15:12.959 which is a really extreme example and
00:15:15.040 was masking some really extreme
00:15:17.279 complexity.
00:15:20.880 Looking at the dependency chart for the
00:15:22.480 front end, things are not great, but
00:15:24.560 they're not terrible. This is a
00:15:26.560 manageable amount of dependencies. The
00:15:29.040 real issue we found on the front end was
00:15:30.880 that client side components like project
00:15:32.880 view.jsx and requirements.jsx each
00:15:35.920 exceeded 1600 lines and had some really
00:15:38.480 nasty conditional logic.
00:15:41.920 But this is what the Ruby code
00:15:44.240 dependency graph looked like. And
00:15:46.160 shameless plug, if you sign up for
00:15:48.240 Phoenix, you will get a directed
00:15:49.839 interactive 3D graph of your code and
00:15:52.320 you can actually look through it and
00:15:53.759 find all of the really gnarly
00:15:55.839 dependencies. But I digress. This was
00:15:58.079 pre-Phoenix. And what we quickly found
00:16:00.880 with this dependency graph
00:16:04.320 was that we had project and quote
00:16:06.240 classes that each had more than 20
00:16:08.480 responsibilities. So single
00:16:10.160 responsibility principle is out the
00:16:11.600 window. Half of the quote code was
00:16:14.399 devoted to pending quote generation and
00:16:17.680 automation. So state management was
00:16:19.199 happening in the file, and Active Support
00:16:21.680 concerns were burying yet more
00:16:23.440 complexity. We'll get to that in a
00:16:25.120 moment. I already mentioned the Code
00:16:27.199 Climate warnings, and there were an
00:16:28.720 intense number of model dependencies.
00:16:33.199 The anti-patterns at play here. First,
00:16:35.920 Leviathan classes which you may call God
00:16:38.800 classes, but I'm trying to stay on
00:16:40.079 theme.
00:16:42.320 The god classes or the Leviathan classes
00:16:44.480 are usually giant classes in your
00:16:46.480 application that have tons of
00:16:48.079 responsibilities, often tons of
00:16:49.920 dependencies. They are very difficult to
00:16:52.000 test. They are very difficult to
00:16:53.279 maintain. and they're even more
00:16:54.880 difficult to add new features or to
00:16:57.040 actually iterate upon. I see some people
00:16:59.839 shaking their heads and smiling. These
00:17:02.079 are really common god classes.
00:17:05.120 Warning fatigue. I know I'm citing a
00:17:07.360 really extreme example where a team
00:17:09.120 decided unilaterally to ignore 1,500
00:17:11.600 warnings. But the truth is that I've
00:17:13.760 never worked on or seen or consulted for
00:17:17.039 a team that didn't experience some kind
00:17:19.280 of warning fatigue. And you could think
00:17:21.600 about this right now. you're all at a
00:17:23.199 conference. Did anybody
00:17:25.919 ignore a Slack warning about an issue
00:17:28.880 that might have happened? Some kind of
00:17:31.840 uh error that was caught, some kind of
00:17:34.080 notification from your CI. It happens
00:17:36.400 all the time. It's actually really
00:17:38.080 difficult to be very judicious about
00:17:40.720 what warnings we allow. And if you
00:17:42.720 ignore a warning, it really is kind of
00:17:44.559 the same thing as turning it off.
00:17:47.679 Finally, concern creep. Active Support
00:17:50.559 concerns serve a valid purpose and they
00:17:53.840 really are interesting code and
00:17:56.400 nice to implement. However, they can
00:17:58.799 also add invisible complexity to already
00:18:01.200 bloated classes.
00:18:03.440 So they can quickly become a problem
00:18:04.880 when not used sparingly and in this case
00:18:07.200 they really were a problem.
00:18:10.400 So the consequences: lost trust
00:18:12.320 in the build process. Engineering plans
00:18:14.799 and estimates were off by orders of
00:18:16.640 magnitude
00:18:18.480 and most importantly the business can't
00:18:21.039 break into new service categories. So it
00:18:23.440 could not advance because of what had
00:18:26.000 been built.
00:18:28.720 We needed to chart a new course. We
00:18:30.640 could not slay the Leviathan. Or maybe
00:18:32.880 it's more accurate to say that we would
00:18:34.480 not slay it. At Def Method, we
00:18:36.960 generally work with thriving businesses
00:18:39.200 which means that rewriting code is
00:18:41.440 generally off the table. The business
00:18:43.039 needs to continue which means the
00:18:44.400 application needs to continue. So
00:18:46.559 instead of trying to rewrite it or
00:18:48.480 trying to do some major surgery we
00:18:50.640 instead tried to chart a new course and
00:18:52.480 we started with some major backend
00:18:54.240 refactoring.
00:18:56.000 We decoupled the responsibilities by
00:18:57.760 extracting pricing, export, and project
00:19:00.240 lifecycle logic from models into
00:19:02.320 discrete service classes and
00:19:04.080 interactors. So the Interactor gem, if
00:19:06.480 you've not used it is designed for the
00:19:08.559 specific purpose of encapsulating
00:19:10.160 business logic. It's quite useful.
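To make the extraction concrete, here is a plain-Ruby sketch of the pattern: one class, one business operation, one result object. It deliberately does not use the Interactor gem's API, and `CalculateQuotePrice`, `ServiceResult`, and the pricing rule are hypothetical names and logic invented for the illustration.

```ruby
# Result object returned by every service call.
ServiceResult = Struct.new(:success, :value, :error) do
  def success?
    success
  end
end

# Hypothetical pricing service pulled out of a bloated Quote model.
class CalculateQuotePrice
  def self.call(line_items:, discount_percent: 0)
    new(line_items, discount_percent).call
  end

  def initialize(line_items, discount_percent)
    @line_items = line_items
    @discount_percent = discount_percent
  end

  def call
    return ServiceResult.new(false, nil, "no line items") if @line_items.empty?

    subtotal_cents = @line_items.sum { |item| item.fetch(:unit_cents) * item.fetch(:quantity) }
    total_cents = subtotal_cents * (100 - @discount_percent) / 100
    ServiceResult.new(true, total_cents, nil)
  end
end

items = [{ unit_cents: 10_000, quantity: 2 }, { unit_cents: 5_000, quantity: 1 }]
result = CalculateQuotePrice.call(line_items: items, discount_percent: 10)
puts result.value
# prints 22500
```

Because the service takes plain data and returns a result, it can be unit tested without loading the model or the database, which is the main payoff of moving logic out of a god class.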
00:19:13.840 The Statesman gem was used for state
00:19:15.840 management. And there are actually many
00:19:17.440 libraries out there for state
00:19:19.039 management. I recommend using one.
00:19:21.600 And in this case, we were able to move
00:19:23.600 more than half of the code out
00:19:25.440 of the quote class and do lifecycle
00:19:27.919 management outside of that class.
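The sketch below shows what moving lifecycle management out of the model can look like: an explicit transition table in its own class. It is plain Ruby rather than the Statesman API, and the states and transitions are invented for the example, since the talk does not list the real quote lifecycle.

```ruby
# Hypothetical quote lifecycle kept outside the Quote model.
class QuoteStateMachine
  # Allowed transitions: current state => states it may move to.
  TRANSITIONS = {
    pending: [:submitted, :cancelled],
    submitted: [:accepted, :rejected],
    accepted: [],
    rejected: [],
    cancelled: []
  }.freeze

  class InvalidTransition < StandardError; end

  attr_reader :state, :history

  def initialize(state = :pending)
    @state = state
    @history = [state]
  end

  def can_transition_to?(new_state)
    TRANSITIONS.fetch(state).include?(new_state)
  end

  # Moves to the new state or raises if the table does not allow it.
  def transition_to!(new_state)
    raise InvalidTransition, "#{state} -> #{new_state}" unless can_transition_to?(new_state)

    @state = new_state
    @history << new_state
    self
  end
end

machine = QuoteStateMachine.new
machine.transition_to!(:submitted).transition_to!(:accepted)
p machine.history
# prints [:pending, :submitted, :accepted]
```

With the rules in one table, an illegal jump such as pending to accepted fails loudly instead of being scattered across conditionals in the model.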
00:19:31.440 The front end needed just as much
00:19:33.120 restructuring,
00:19:35.840 which we streamlined by modularizing the
00:19:37.919 Redux connections, eliminating
00:19:40.320 unused Redux code, and introducing code
00:19:43.039 splitting to improve performance and
00:19:44.960 load times.
00:19:46.880 And finally, we implemented React
00:19:48.640 Testing Library for unit and integration
00:19:50.720 tests so that we weren't relying
00:19:52.559 exclusively on feature specs.
00:19:58.400 And finally,
00:20:00.880 we reestablished trust in the metrics.
00:20:02.960 This is a negotiation.
00:20:05.200 It was pretty easy to say, "Hey, look,
00:20:06.880 the green build really needs to be
00:20:08.240 reinforced."
00:20:09.919 Engineers generally agree we shouldn't
00:20:12.000 be deploying on a flaky build. We
00:20:14.640 shouldn't be deploying when things are
00:20:15.760 red. But unignoring Code Climate
00:20:18.080 complexity, first, is not generally
00:20:21.520 well liked by the team because they
00:20:22.799 don't want to see all these warnings
00:20:23.919 come back and, second, is not really
00:20:25.919 doable for a team of three to remediate
00:20:27.840 all of that code. So instead it was more
00:20:30.720 of a judicious discussion to say okay
00:20:33.120 well what warnings are really important
00:20:35.039 and are masking the highest amount of
00:20:36.799 complexity so that we can move in a
00:20:40.080 better direction than where we've been
00:20:41.679 going.
00:20:43.840 Finally, a commitment to 80% test
00:20:45.600 coverage for new and refactored code. If
00:20:47.919 you're a testing zealot like me, 80% is
00:20:50.400 an extremely low bar for Ruby. However,
00:20:53.360 if you're not, that's okay. Just know
00:20:56.080 that it is easy to get to an 80% metric
00:20:59.440 by simply creating a new file with RSpec
00:21:02.720 or Minitest or Test::Unit or whatever you
00:21:04.559 like and simply exercising all of the
00:21:07.039 public methods in your file. You'll hit
00:21:10.080 80% no sweat. After
00:21:14.400 that, fair winds and following seas. Of
00:21:16.960 course, this took some time. It's not
00:21:19.760 like this was able to be done in a
00:21:21.520 weekend. But we did learn some things.
00:21:23.840 We learned that complex systems fail
00:21:25.760 slowly under layers of accreted
00:21:27.760 responsibility. There was no big outage,
00:21:31.120 no big threat to the business except
00:21:36.159 that when the business decided it wanted
00:21:38.640 to move into new categories, which is
00:21:40.400 what businesses tend to do, they found
00:21:42.400 that they couldn't. So they were
00:21:44.640 stagnated under the layers of accreted
00:21:47.360 responsibility.
00:21:49.120 Complexity ignored is complexity
00:21:51.039 multiplied. If you take nothing else
00:21:53.520 from my talk today, understand that
00:21:56.000 complexity ignored is complexity
00:21:58.480 multiplied. Entire businesses have been
00:22:01.360 built
00:22:03.360 because developers ignore this to their
00:22:06.000 detriment. Complexity ignored is
00:22:08.480 complexity.
00:22:10.320 Yeah. Multiplied. All right.
00:22:14.240 Finally, the true measure of a
00:22:15.760 healthy system is safe change at speed.
00:22:17.520 Of course, this is not a system that
00:22:19.520 could change at the speed that the
00:22:21.120 business needed to, but it eventually
00:22:23.120 got there, I am happy to say.
00:22:27.280 Finally, chapter three:
00:22:30.480 The Silt Below.
00:22:34.080 Long at anchor and trusting in familiar
00:22:37.120 charts, our comrades gave little thought
00:22:39.600 to the slow accumulation beneath them.
00:22:42.720 Until the day came when the ship would
00:22:44.720 not turn, and they found its keel mired
00:22:48.159 in deep layers of silted logic and
00:22:51.679 forgotten depths.
00:22:54.159 Hope the photographer got me with the
00:22:55.440 hat on.
00:22:57.760 Above the water, one of the longest
00:22:59.520 running and successful public-facing
00:23:01.360 Rails applications. I'll say no more
00:23:03.760 because I don't want to identify them.
00:23:06.320 Conventional methods were used for MVC.
00:23:09.520 Simple controllers, simple models,
00:23:12.240 complexity was split out into helpers
00:23:14.080 and concerns. And in this case, those
00:23:15.840 concerns were not a concern. A well-tested
00:23:19.120 application and in very good
00:23:21.520 shape for a Rails application that's
00:23:22.960 almost two decades old.
00:23:25.520 So, what did we spot
00:23:30.720 first?
00:23:32.320 What is this model doing? This comment
00:23:35.360 is not mine. We actually found this in
00:23:37.520 the code: "This is just a join model and
00:23:39.520 usually shouldn't need to be used
00:23:40.960 directly."
00:23:42.559 Well, then why is it there?
00:23:45.360 So, we take another look beneath the
00:23:47.120 surface. And in the structure.sql file,
00:23:50.080 we found that first there's a
00:23:52.080 structure.sql file. Second,
00:23:56.080 we found that there were dozens of these
00:24:00.080 complex Postgres functions in that
00:24:02.880 file.
00:24:06.320 Oh, skip that. And finally, in the
00:24:09.360 membership controller, we found this:
00:24:11.600 multiple joins and raw SQL in the
00:24:14.480 controller. Added to this is the fact
00:24:17.120 that what we are trying to do here is
00:24:19.279 cull a list of admin users. And how
00:24:21.840 many admin users could there possibly
00:24:23.679 be? 10, 20? To use multiple joins in raw
00:24:27.919 SQL in this manner felt like it was
00:24:30.080 hiding something really deeply wrong.
00:24:35.039 So to summarize: hard-coded SQL
00:24:36.799 functions in structure.sql; dozens of
00:24:39.279 joins, dates, filters, and embedded
00:24:41.600 business logic; raw SQL in the
00:24:44.640 controllers.
00:24:47.200 The anti-patterns here: logic in the
00:24:50.080 depths.
00:24:51.760 So what that structure.sql file revealed
00:24:54.880 was application logic buried in stored
00:24:57.360 procedures. This affects everything.
00:25:01.440 Stored procedures are hard to
00:25:03.840 maintain. They're hard to test. They're
00:25:05.840 hard to reason about. You name it.
00:25:09.679 Also overnormalization. So a join model
00:25:12.960 in and of itself is not an anti-pattern.
00:25:16.320 They are sometimes necessary. But what
00:25:18.640 we find when we see multiple join models
00:25:21.520 with comments like do not use this
00:25:23.600 model. We start to suspect that
00:25:26.480 something is at play:
00:25:27.840 overnormalization,
00:25:29.440 or that at some point the engineers
00:25:32.080 decided to prioritize third normal form
00:25:35.440 or beyond over being able to easily work
00:25:38.799 within the application layer. This is an
00:25:41.520 anti-pattern and I'll tell you why. The
00:25:44.640 consequences here are that each change
00:25:46.960 requires migrations and manual SQL
00:25:49.760 edits. That in itself slows down
00:25:52.080 development and makes things brittle.
00:25:54.159 But I am more concerned with the second
00:25:56.720 bullet point which is that maintenance
00:25:58.880 and iteration require specialists. So
00:26:01.440 stay with me here. You may not think
00:26:03.679 that needing to know a bunch of SQL
00:26:05.760 makes you a specialist. And indeed we're
00:26:07.520 at a developer conference where we all
00:26:09.679 want to say okay well we are Rails
00:26:11.360 developers. Of course, we know how to
00:26:13.039 write SQL or we should know how to write
00:26:15.120 SQL. But if you stop and think for a
00:26:17.279 moment about the kinds of developers
00:26:19.679 that can modify or maintain or iterate
00:26:23.120 upon
00:26:24.880 oh
00:26:28.159 this
00:26:30.080 I guarantee you are thinking about your
00:26:32.159 more experienced engineers. So to the
00:26:34.720 detriment of your less experienced
00:26:36.559 engineers which means that all of a
00:26:38.159 sudden you are creating this uh this
00:26:40.799 dynamic or this dichotomy where only
00:26:43.279 maybe half or fewer of your developers
00:26:45.600 can actually contribute to the entire
00:26:47.200 application meaningfully. And let's face
00:26:48.960 it, it's just a database. Everybody
00:26:51.360 should be able to participate and add
00:26:53.520 features.
00:26:57.600 In full disclosure, we have not
00:26:59.760 excavated this hole just yet.
00:27:02.559 This customer has actually a lot of
00:27:04.400 issues and we have not yet gotten to the
00:27:06.400 point where we can start to separate the
00:27:09.279 structure.sql file or break it up. But
00:27:11.440 this is what we will do when given the
00:27:13.360 chance. First, surface and catalog.
00:27:17.600 Find every Postgres function in that
00:27:19.520 structure file. Classify each by
00:27:21.679 complexity, frequency of use, and
00:27:23.919 dependency on application tables. I'm
00:27:26.159 betting that there are some stored
00:27:27.679 procedures there that are no longer
00:27:29.120 being used. And wouldn't that be nice to
00:27:30.799 just delete them? For the rest, we want
00:27:34.000 to detach from structure.sql if
00:27:37.120 possible. I understand that there can be
00:27:39.279 really fine-grained database tuning
00:27:41.039 happening in the structure.sql file, but
00:27:43.120 barring that, we really should all get
00:27:44.720 on to the schema.rb.
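The surface-and-catalog step described above could be sketched roughly as follows. This is a minimal illustration, not the tooling from the talk: `FunctionEntry`, `catalog_functions`, `unused_functions`, and the sample SQL and source strings are all hypothetical, and the regular expression assumes dollar-quoted function bodies of the kind `pg_dump` normally writes into structure.sql.

```ruby
# One catalog row per Postgres function found in the schema dump (hypothetical shape).
FunctionEntry = Struct.new(:name, :line_count, :tables, :call_sites, keyword_init: true)

# Assumes dollar-quoted bodies: CREATE FUNCTION schema.name(...) ... AS $$ ... $$;
FUNCTION_PATTERN = /CREATE (?:OR REPLACE )?FUNCTION\s+(?:\w+\.)?(\w+)\s*(\(.*?\$\w*\$;)/mi

# structure_sql: contents of structure.sql
# table_names:   application table names to look for inside each function
# app_sources:   { "path/to/file.rb" => "file contents" } for the application code
def catalog_functions(structure_sql, table_names:, app_sources:)
  structure_sql.scan(FUNCTION_PATTERN).map do |name, definition|
    call_pattern = /\b#{Regexp.escape(name)}\s*\(/i
    FunctionEntry.new(
      name: name,
      line_count: definition.lines.count, # crude proxy for complexity
      tables: table_names.select { |table| definition.match?(/\b#{Regexp.escape(table)}\b/i) },
      call_sites: app_sources.select { |_path, source| source.match?(call_pattern) }.keys
    )
  end
end

# Functions with no call site in the application code are only candidates for
# deletion: triggers or other SQL functions may still call them.
def unused_functions(entries)
  entries.select { |entry| entry.call_sites.empty? }
end

SAMPLE_SQL = <<~SQL
  CREATE FUNCTION public.admin_member_ids(org integer) RETURNS SETOF integer
      LANGUAGE sql
      AS $$
    SELECT m.user_id FROM memberships m
    JOIN roles r ON r.id = m.role_id
    WHERE m.org_id = org AND r.name = 'admin';
  $$;

  CREATE FUNCTION public.legacy_discount(total numeric) RETURNS numeric
      LANGUAGE sql
      AS $$ SELECT total * 0.9; $$;
SQL

SAMPLE_SOURCES = {
  "app/controllers/memberships_controller.rb" =>
    'User.where("id IN (SELECT admin_member_ids(?))", org.id)'
}.freeze

entries = catalog_functions(SAMPLE_SQL, table_names: %w[memberships roles orders],
                                        app_sources: SAMPLE_SOURCES)
entries.each do |entry|
  puts format("%-20s lines=%d tables=%s call_sites=%d",
              entry.name, entry.line_count, entry.tables.join(","), entry.call_sites.size)
end
puts "Deletion candidates: #{unused_functions(entries).map(&:name).join(', ')}"
```

Line count and text matching are rough stand-ins for the complexity and frequency-of-use classification; in a real codebase, call frequency would more likely come from database statistics or logs.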
00:27:48.000 We'll surround with tests.
00:27:50.799 Um, and for critical or interconnected
00:27:52.720 functions, we would add contract tests
00:27:54.399 to ensure that downstream logic remains
00:27:56.799 stable throughout the transition.
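One way to picture such a contract test, as a sketch under stated assumptions: run the legacy function and its Ruby replacement over the same inputs and report any disagreement. Everything named here (`ContractMismatch`, `contract_mismatches`, the sample data, and the two lambdas) is hypothetical. In a real application the legacy side would query the Postgres function through the database connection; here it is stubbed with recorded results so the example stays self-contained.

```ruby
# Records one input on which the two implementations disagree.
ContractMismatch = Struct.new(:input, :legacy, :replacement, keyword_init: true)

# legacy and replacement are callables that take one input and return a result.
def contract_mismatches(legacy:, replacement:, inputs:)
  inputs.filter_map do |input|
    old_result = legacy.call(input)
    new_result = replacement.call(input)
    next if old_result == new_result

    ContractMismatch.new(input: input, legacy: old_result, replacement: new_result)
  end
end

# Hypothetical in-memory data standing in for application tables.
MEMBERSHIPS = [
  { org_id: 1, user_id: 10, role: "admin" },
  { org_id: 1, user_id: 11, role: "member" },
  { org_id: 2, user_id: 12, role: "admin" }
].freeze

# Stand-in for the stored procedure: results recorded per organization id.
LEGACY_RESULTS = { 1 => [10], 2 => [12], 3 => [] }.freeze
LEGACY_ADMIN_IDS = ->(org_id) { LEGACY_RESULTS.fetch(org_id) }

# Candidate replacement written in plain Ruby.
REPLACEMENT_ADMIN_IDS = lambda do |org_id|
  MEMBERSHIPS
    .select { |m| m[:org_id] == org_id && m[:role] == "admin" }
    .map { |m| m[:user_id] }
    .sort
end

mismatches = contract_mismatches(
  legacy: LEGACY_ADMIN_IDS,
  replacement: REPLACEMENT_ADMIN_IDS,
  inputs: [1, 2, 3]
)

if mismatches.empty?
  puts "Replacement honors the legacy contract for all sampled inputs"
else
  mismatches.each { |m| puts "org #{m.input}: legacy=#{m.legacy} new=#{m.replacement}" }
end
```

The same comparison can be wrapped in an RSpec or Minitest case so that it runs in CI for as long as both implementations exist.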
00:27:59.600 and incrementally replace. So we can
00:28:01.840 rewrite that SQL in Active Record scopes
00:28:03.919 or in Arel and we can extract the service
00:28:06.640 objects and cache as needed for
00:28:08.720 performance reasons.
00:28:11.360 After that we would of course have fair
00:28:14.080 winds and following seas.
00:28:16.559 The thing to learn here is that a
00:28:17.840 structure file deserves scrutiny. It
00:28:20.080 doesn't mean that it's bad, but it can
00:28:22.399 sometimes
00:28:23.919 uh be done because the team is working
00:28:26.799 against what Rails gives you. And
00:28:28.960 certainly in an application that's been
00:28:30.559 running for almost 20 years, it could be
00:28:33.120 that Rails didn't always give us
00:28:35.760 the means to get around that
00:28:37.279 structure file, but it does today.
00:28:40.559 Rails conventions mixed with embedded
00:28:42.159 Postgres logic create split maintenance
00:28:44.240 paths, and thus specialists. And join
00:28:47.039 models may be a sign of
00:28:49.120 overnormalization.
00:28:53.520 Okay, the storm has passed. The sails
00:28:57.120 are mended and for now the waters are
00:28:59.840 calm. The crew stands at the helm, not
00:29:02.640 unscarred but wiser. Their hands I
00:29:06.159 already lost my place. Their hands
00:29:07.360 guided by hard-won knowledge of what
00:29:09.360 lurks below. They know that even still
00:29:12.640 seas conceal silt, and fair winds may
00:29:15.760 yet lead toward hidden beasts, but with
00:29:18.240 eyes sharpened and a vessel made ready,
00:29:20.960 they sail on.
00:29:24.640 Thank you everybody for coming. I really
00:29:27.440 appreciate it. You have made this an
00:29:28.799 incredible experience.