10 Costly Database Performance Mistakes (and How to Fix Them)


Summarized using AI


Andrew Atkinson • July 08, 2025 • Philadelphia, PA • Talk

Introduction

In the talk "10 Costly Database Performance Mistakes (and How to Fix Them)", Andrew Atkinson shares insights from working with Rails and Postgres applications, drawing on real-world experience to identify common pitfalls that can cause significant performance issues and financial costs in production environments. Delivered at RailsConf 2025, this session guides developers through 10 major mistakes, explaining why they arise, their impacts, and actionable strategies to resolve them.

Key Points and Mistakes Discussed

  • Infrequent Releases:

    • Slow release cycles decrease team agility and extend incident resolution times. Solutions include adopting trunk-based development, using feature flags, tracking key DevOps metrics (DORA/SPACE), and considering tools like the anchor migrations gem for safer, more efficient schema changes.
  • Database Inexperience:

    • Many Rails teams lack deep database expertise, leading to inefficient queries and poor schema choices. Recommendations include hiring or developing DBA talent, leveraging production clones for safe experimentation, and using educational resources and communities.
  • Speculative Database Design:

    • Avoiding constraints or normalization due to speculative future needs can result in data bugs and maintenance headaches. Adopt constraint-driven design (the "CORE" principles), align database constraints with application validations, and normalize by default.
  • Missing Database Monitoring:

    • A lack of focused database monitoring makes it harder to triage and resolve performance incidents. Solutions include logging detailed query metadata, tracking query execution plans, and using tools/extensions like query log tags, Marginalia, PgHero, pganalyze, and pgBadger.
  • ORM Pitfalls:

    • Overuse or misuse of ORM features can generate unnecessary or inefficient queries (e.g., excessive COUNTs, huge IN lists, SELECT *). Auditing queries, limiting columns, restructuring queries, using cache features, and adopting more scalable approaches to pagination are recommended.
  • DDL Fear:

    • Hesitance to make schema changes, especially at scale, leads to code workarounds and technical debt. Practice DDLs on production clones, use linting tools for safe migrations, take advantage of ignored columns, and set appropriate lock timeouts.
  • Excessive Data Access:

    • Retrieving or aggregating excessive data degrades performance. Analyze and reduce data access patterns, implement advanced indexing, and consider data partitioning or precalculating aggregates (using gems like rollup and scenic).
  • Missing Data Archival:

    • Failure to archive old, seldom-accessed data increases latency and server costs. Archive unneeded data, migrate large tables to partitions, and prefer detaching partitions over massive deletes.
  • Missing Database Maintenance:

    • Running outdated software, neglecting bloat cleanup, and not tuning auto-maintenance features can harm performance and security. Regularly upgrade databases, tune maintenance parameters, prune obsolete objects, and rebuild or reindex as needed.
  • Rejecting Mechanical Sympathy:

    • Not understanding the database’s internal workings leads to suboptimal design and resource misuse. Developers should inspect SQL generated by ORMs, use database-specific optimizations, prevent lazy loading, and set prudent operational boundaries.

Illustrative Examples

  • Use of real-world Rails migrations and SQL linting tools (Squawk, anchor migrations)
  • Case references to poorly designed index strategies, inefficient aggregates, and massive delete scenarios handled by partitioning

Conclusion and Takeaways

The talk emphasizes proactive monitoring, regular maintenance, a willingness to evolve schemas, and developing a deeper understanding of how relational databases operate. By addressing these 10 pitfalls with practical solutions, Rails teams can significantly boost performance, reduce costs, and foster more maintainable applications.

10 Costly Database Performance Mistakes (and How to Fix Them)
Andrew Atkinson • Philadelphia, PA • Talk

Date: July 08, 2025
Published: July 23, 2025

After working with countless Rails and Postgres applications as a consultant and backend engineer, I’ve seen firsthand how database mistakes can cause big costs and headaches. Poor data types, inefficient queries, and flawed schema designs slow down operations, and result in excessive costs through over-provisioned servers, downtime, lost users, and engineering hours spent restructuring features.

How do we prevent these pitfalls? Awareness is the first step. And if your database is already serving tons of woefully inefficient queries, where should you focus for the biggest wins?

In this talk, we'll break down 10 real-world Rails database mistakes, including how they happened, the impact they had, and most importantly, how to fix them. Topics include query design, indexing, schema optimization, and how the CPU, memory, and IOPS resources tie into Active Record SQL performance.

Expect practical takeaways, real examples, and solutions you can apply immediately. Slides and blog posts will be available at GitHub.

RailsConf 2025

00:00:18.640 thanks for uh sticking with it uh for this last session of the day. Um okay.
00:00:25.519 So um in this presentation it's you'll see that the format is basically kind of
00:00:31.519 mistake area and then solutions for that and so I have a lot of links in here more than 60. So if you want to check
00:00:39.040 any of these links go ahead and scan this QR code or go to bit.ly/rc10m. The slide deck is online already and you
00:00:46.160 can find all the links there. Um so yeah and before I get into it really briefly uh RailsConf has been a
00:00:52.960 big part of my career. My first one was in 2010 back in Baltimore and in the last few years I've been able to present
00:00:58.719 which has been a real honor. Um and so I did want to if you wouldn't all mind um
00:01:03.840 you might have done this before today outside of this morning session but we could all just do a quick round of applause to thank all the organizers and
00:01:10.080 attendees that have made RailsConf.
00:01:18.159 We'll miss you RailsConf. Uh okay so let's get into it. So, we're going to look at costly database mistakes. And
00:01:25.200 some of these are going to be direct costs that are sort of obvious. You might overprovision your instance to handle certain high load periods. Some
00:01:32.799 will be more indirect costs like uh bad user experience that leads to customers downgrading or leaving your platform or
00:01:39.759 maybe an excessive amount of time spent by devs triaging an incident or resolving it, that sort of thing. So,
00:01:45.200 I'm going to have these these little bubbles in the bottom right that will show some of the costs for each of the problem areas.
00:01:52.399 So, we're going to look at 10 uh items in three categories. Uh the forming
00:01:57.520 category would be kind of intentionally attacking your database performance challenges um related to improving your
00:02:04.240 agility uh performance, reducing latency, that sort of thing. And then scaling and optimizing. Those are
00:02:09.920 probably more clear. And we'll go from 10 down to one. And each of these have their own color. And then these costs
00:02:15.440 are in the bottom right here. So far so good. Yeah.
00:02:20.640 Okay. So, uh, first up, mistake number 10. And we have to cook through these because I have a lot of content for 30 minutes. So, uh, um, I will talk at a
00:02:27.920 reasonable clip. Apologies to non-native English speakers. Please watch it at 0.75x later on if that helps you or
00:02:35.840 um, watch it at 2x if I'm talking too slowly. Uh, okay. So um mistake 10 would
00:02:42.160 be infrequent releases. As we know we release most of our DDL changes our schema or structural changes to our
00:02:48.480 database uh through Rails migrations. So having an infrequent release process can really reduce your team's agility and
00:02:54.959 ability to evolve your schema. So using something like Git Flow or a similar sort of legacy software delivery
00:03:01.040 process, not using feature flags to decouple your releases from feature visibility, not tracking or using
00:03:07.519 DevOps metrics in a metrics based approach, and then performing DDL changes exclusively using Rails
00:03:13.040 migrations. We'll look at one alternative to that briefly. So this can increase your cycle time and increase
00:03:18.879 your time spent uh responding to and resolving incidents. So to fix that, I recommend trying to
00:03:25.280 move towards trunk-based development if you're not already doing that. Using feature flags to help decouple your releases from your feature availability.
00:03:32.480 And there are teams out there using Rails that aren't doing trunk-based development. Last year's Rails survey,
00:03:37.519 uh 20% of the respondents released multiple times per month. 2% of the respondents released multiple times per
00:03:43.760 quarter. So that's pretty infrequent and it's going to reduce that team's agility. Uh tracking DevOps metrics. uh
00:03:50.799 a couple frameworks. If you're not familiar, Dora and Space are two frameworks that have a few key metrics
00:03:56.400 related to your team's ability to deliver software. Uh the book Accelerate covers Dora and Dora has a two-minute uh
00:04:03.280 quick check that is useful on teams to take and look at a few metrics. I'll show an example of that. As developers,
00:04:09.040 we can help um raise our test suite's coverage, increase its speed and its reliability. Um and then one other
00:04:15.200 option you could look at if you are in this position now where you're not doing trunk-based development um I've released
00:04:20.639 a new gem called anchor migrations where the idea is to do uh SQL migrations that
00:04:25.759 can still be safety-linted using a tool called Squawk which I'll show and uh we can write non-blocking, idempotent DDL
00:04:33.199 changes and then generate Rails migrations from that SQL to keep everything in sync.
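For context, here is a minimal sketch (not the anchor migrations gem itself, just the plain Rails-migration equivalent of a non-blocking index change) using standard Active Record APIs; the table and column names are hypothetical:

  # Concurrent index builds can't run inside a transaction, so opt out.
  class AddIndexToUsersEmail < ActiveRecord::Migration[7.1]
    disable_ddl_transaction!

    def change
      # algorithm: :concurrently avoids holding a blocking lock on the table,
      # which is what a linter like Squawk nudges you toward on Postgres.
      add_index :users, :email, algorithm: :concurrently, if_not_exists: true
    end
  end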
00:04:38.639 So on the top here is just showing squawk briefly within anchor migrations using the lint command. So I've created
00:04:45.199 a DDL that's creating an index but not using the concurrently keyword. This would be for postgres and uh squawk is
00:04:52.560 the SQL uh linting tool that I mentioned. So it's detected hey you should use concurrently here to avoid
00:04:58.000 this being a blocking DDL operation. This is that accelerate book I
00:05:04.000 mentioned. And then here's a couple screenshots from the DORA two-minute quick check. and it covers their four primary DevOps metrics. Lead time,
00:05:10.560 deployment frequency, change fail percentage, and failed deployment recovery time.
00:05:16.240 Okay, so mistake number nine, database inexperience. So as Rails developers, we're usually
00:05:23.039 working with Active Record uh and we're working at the ORM layer. So we're dealing with object orientation, inheritance, classes, that sort of
00:05:30.160 thing. And relational databases are quite different. you know we have data access to think about SQL relations
00:05:36.080 indexes execution plans so there's kind of this mismatch and it is a different set of skills
00:05:42.720 so often teams might get stuck where they're not hiring database specialists or DBAs on their team they're not using
00:05:49.360 SQL in application code or BI types of code, they're not reading and using query execution plans as part of their designs
00:05:56.000 or using terms like cardinality, selectivity, and execution plan buffers from Postgres execution plans for
00:06:02.080 example uh they might be adding indexes kind of haphazardly or choosing schema designs
00:06:07.199 that have poor performance and this again can increase your server costs and increase your uh developer time.
00:06:16.160 So to fix that you can uh directly hire experience. You can look to hire database specialists, DBAs and uh
00:06:23.199 consultants. You can grow experience on your team with books and courses and conferences like this one and
00:06:29.280 communities. Um I'm particularly, or I'm involved in the Postgres Slack
00:06:35.039 community and that's a good one that I'd recommend if you use Postgres. Um, a specific tactic at companies that
00:06:41.840 I've seen work well is to create a production clone if you can afford that in your uh infrastructure budget
00:06:47.840 where you've got a similar sized instance and data to work with and then you can really integrate that into your workflow and run all of your DDL changes
00:06:54.080 there. And as you gain more experience, you can use these concepts of
00:06:59.280 pages and buffers and selectivity and cardinality in your designs to help improve your designs and reduce latency
00:07:04.639 in your queries. Uh, and you can learn to avoid performance unfriendly designs like random UUID primary keys. I just
00:07:10.960 wrote a blog post on that that's linked up here you can check out if you're interested. Um, so you'll learn about uh, you know,
00:07:17.360 row versions and you can figure out which Spider-Man is the live uh, Spider-Man and which one is the dead Spider-Man.
00:07:24.160 Um, you learn about fixed-size pages where data and index entries are stored and um
00:07:30.880 how dense or sparse your uh your storage is and how that affects latency.
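As one illustration of those concepts, a quick way to see roughly how many fixed-size pages a table occupies in Postgres is to read the planner statistics in pg_class; this is a small sketch using raw SQL from a Rails console, with a hypothetical table name:

  # relpages  = number of 8 kB pages the table currently occupies
  # reltuples = the planner's estimate of live rows
  # rows per page gives a rough sense of how dense or sparse storage is
  sql = <<~SQL
    SELECT relname,
           relpages,
           reltuples,
           round(reltuples / NULLIF(relpages, 0)) AS rows_per_page
    FROM pg_class
    WHERE relname = 'users'
  SQL

  puts ActiveRecord::Base.connection.select_one(sql)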
00:07:37.599 Um, here are some books I recommend. Uh, these are Postgres books. So, this is meant to be a talk that would cover
00:07:45.199 MySQL and SQLite as well with Rails, although it's mostly um I've got more content in here related to Postgres. And
00:07:51.199 then the last book is my own book which I'd recommend if you aren't familiar with it. Um these other books are more
00:07:56.879 uh postgres oriented directly and they're all uh great books. And then a
00:08:02.479 couple courses Aaron Francis has produced um a couple great video courses on Postgres and SQLite. And then Cresten
00:08:09.680 Jameson has produced a couple of great Postgres courses as well including a couple free ones. And uh Ludicrous speed
00:08:16.319 is one that I went through and uh would recommend. Okay. So next up, uh I'm calling it
00:08:22.720 speculative DB design. So this would be something like avoiding a beneficial database constraint today in your schema
00:08:29.520 design based on speculation about your needs that you might have in the future or tomorrow. So you might doubt your
00:08:35.440 ability to evolve the schema design in the future. Uh you might not be using third normal form normalization by
00:08:41.440 default. Um and then on the opposite side, you might be avoiding all forms of denormalization where you want to gain
00:08:48.560 an advantage on your uh query side at the cost of some data duplication. And
00:08:54.399 so there are cases like multi-tenancy where that can be beneficial. So this could show up as uh data bugs and
00:09:00.399 increased um maintenance costs. So to fix that, I recommend trying to
00:09:06.480 use all available uh constraints that are available in your database for um improved consistency, referential
00:09:12.720 integrity, and data quality. And uh I created this uh core acronym to uh
00:09:20.080 represent a few different design principles for schema design where the C is uh constraint driven. And I've got a
00:09:25.920 blog post on that if you're curious. Uh you can create I recommend trying to create matching database constraints for
00:09:32.399 your active record validations. Uh doing things like matching your primary key and foreign key types. So there's a tool
00:09:38.399 called database_consistency that's a Ruby gem which I'll show that helps you achieve some of that. I recommend using
00:09:45.200 data normalization practices by default working to eliminate duplication
00:09:50.240 um designing for today but also anticipating growth in your data and query volume. And then I think
00:09:56.640 denormalization is a useful tactic some of the time. Uh for example, if you've got a multi-tenant app and you want to
00:10:03.440 denormalize your tenant identifier column. So this is a SQL query that's just
00:10:09.760 showing um the number of foreign keys uh per table as a ratio. And this uh is
00:10:15.920 useful if you just want to do a quick check on your own database for your usage of constraints. Here's a quick
00:10:22.480 screenshot from that database consistency command line tool. It's got a lot of built-in checkers. It's a
00:10:27.760 really nice tool. So, in this case, it's checking that there's a foreign key constraint defined that matches an Active
00:10:33.200 Record validation and that the uh foreign key and primary key data types are matching.
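As a small sketch of that pairing (hypothetical table and column names, standard Rails APIs), a database constraint and its matching Active Record validation might look like this:

  # Migration: enforce presence and referential integrity at the database level.
  class AddUserConstraintsToPosts < ActiveRecord::Migration[7.1]
    def change
      # NOT NULL constraint matching the application-level presence check
      change_column_null :posts, :user_id, false
      # Foreign key whose type should match the users primary key type
      add_foreign_key :posts, :users
    end
  end

  # Model: the validation that mirrors those constraints.
  class Post < ApplicationRecord
    belongs_to :user # adds a presence validation by default in Rails
  end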
00:10:39.839 Okay, next up is uh missing DB monitoring. So a lot of folks um
00:10:44.880 probably use APM tools with their applications but they might have less monitoring and visibility on their
00:10:50.560 database activity. That could be both foreground client application activity and background processes.
00:10:57.839 Um so not logging slow queries or collecting query statistics and execution plans. Not using the buffers
00:11:03.680 information in Postgres execution plans. Uh spending time finding application source code for where SQL queries are
00:11:09.839 generated. not monitoring critical background processes like autovacuum. So again, these can increase your time
00:11:15.360 spent triaging an incident and resolving an incident. So to fix that, I recommend uh logging
00:11:21.360 and storing uh all uh source code line number locations for your your queries.
00:11:27.760 And you can do that using query logs in Rails. Uh there's also a gem called Marginalia. Query logs uh formats the
00:11:34.800 annotations using the SQL commenter standard, which is nice. And if you enable that, you'll see it right away in
00:11:40.880 your Rails log. And if you use uh Postgres, you can use those annotations in pg_stat_statements as well, which will
00:11:47.680 capture normalized query statistics and then give you your comments to go back into your source code. Uh I recommend
00:11:54.560 collecting query execution plans and using those in your designs and your pull requests and that sort of thing
00:12:00.320 manually or automatically with um the auto explain extension in Postgres. Um
00:12:05.760 our goal is to reduce the buffers counts in our execution plan. So first we got to take a look at how many buffers are
00:12:13.120 being used, whether they're coming from the buffer cache or outside of the buffer cache. Our goal is to reduce
00:12:18.480 latency. And um there's some other database observability tools here besides
00:12:24.720 pg_stat_statements. PgHero, pganalyze, which is a sponsor of RailsConf. pgBadger is a
00:12:30.959 Postgres log analysis tool. And then I have a few more tools here for MySQL and SQLite listed.
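Here is a rough sketch of turning on those query log annotations (assuming a Rails 7.1-era app; tag and option names can vary by Rails version):

  # config/application.rb (or an environment file)
  # Annotate generated SQL with comments in the sqlcommenter style.
  config.active_record.query_log_tags_enabled = true
  config.active_record.query_log_tags_format = :sqlcommenter
  # Which tags to include; :source_location adds the file and line that
  # built the query (it has some overhead, so weigh that for production).
  config.active_record.query_log_tags = [
    :application, :controller, :action, :source_location
  ]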
00:12:38.800 So here's an example if you haven't used those query logs of what that could look like. So here we see the controller
00:12:44.480 action and application name for where this query was generated. The query text is just above the yellow highlighted
00:12:50.720 area. And then uh more usefully we have the source location. So we can see the model where the query was generated, the
00:12:57.440 line number and the method name. Okay, next up, ORM pitfalls.
00:13:05.440 So, um, the ORM is what we're likely using for most of our applications queries, but there can be some pitfalls.
00:13:12.320 So, performing unnecessary or costly ORM queries
00:13:17.760 like count queries or order by that aren't really needed by the application, but are part of what is being generated.
00:13:24.240 using non-scalable query patterns like uh the huge IN lists. I wrote a blog post
00:13:29.279 about that which can happen either through direct mechanisms or through eager loading. So check that out if
00:13:35.600 you're curious. Um not restricting column access. So all of your queries are SELECT * queries. We'll talk
00:13:41.040 about that a little bit. Using uh the pagination that comes by default
00:13:46.079 with the ORM and then not using some of the caches that are available with the ORM, with Active Record. So uh this could
00:13:53.360 result in overprovisioned instances needing to scale up further than what's needed um
00:14:00.240 and inefficient queries. So to fix that um I recommend uh you
00:14:05.839 know this this would be more for a mature application. It's been around for a while. It's likely that there are some redundant or um unnecessary queries. So,
00:14:13.839 I wrote a post called putting your app on a SQL query diet that um is intended to help folks go and find uh queries and
00:14:21.199 study and make sure that they're actually needed or useful. If not, we can remove those. Um we can try to limit
00:14:26.880 our columns to what's needed uh using select and pluck in Active Record and that can help us with um better use of
00:14:33.440 our indexes, particularly multicolumn indexes. If we do have the IN list
00:14:39.279 types of queries and if we have big tables and we're working with hundreds or thousands of values in these in
00:14:45.279 clauses uh it's likely that you're going to see bad performance from those types of queries. So those are opportunities
00:14:51.279 to restructure those and you've got a few options. You you might be able to uh
00:14:56.720 restructure into a join or use a VALUES clause or use the ANY operator with an array in Postgres.
00:15:03.760 For pagination, uh you could look at endless, also called keyset pagination, and the Pagy gem is one
00:15:09.680 that I've used that can help you set that up and then with Active Record we have a lot of different cache options we
00:15:15.360 have the prepared statements cache if you're using prepared statements, and prepared statements allow us
00:15:20.959 to skip the repeated parsing and planning stage as the query arrives in the database engine and uh we can use
00:15:27.760 the counter cache to avoid unnecessary count queries, we can use the size method over count or length if we have a loaded
00:15:33.760 association to count how many items are in it. So on the top here is um this is
00:15:39.839 actually from the Rails guides. So we see that um there's a book uh model and
00:15:45.839 then there's an eager load of the author association and that's going to generate in the next block these two SQL queries.
00:15:52.000 One for the books table, one for the authors table with this IN clause. And this is a small set of values. So this
00:15:58.480 isn't going to be a performance problem, but this can grow into hundreds or thousands of values. And um that's where
00:16:03.920 you run into issues. So we can restructure that. Here's one example in Postgres that uses the ANY and ARRAY
00:16:09.360 structure to restructure the query and give us an equivalent result.
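A rough sketch of that restructuring in Active Record (hypothetical models; the exact SQL shown on the slide may differ):

  # Imagine this has grown to hundreds or thousands of ids.
  author_ids = Book.limit(1_000).pluck(:author_id).uniq

  # Default form: generates ... WHERE "authors"."id" IN (1, 2, 3, ...)
  Author.where(id: author_ids).to_a

  # Restructured form for Postgres: one array parameter instead of a huge
  # IN list, returning an equivalent result for this query.
  Author.where("id = ANY(ARRAY[?]::bigint[])", author_ids).to_a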
00:16:14.959 All right. So mistake five. Uh I'm calling it DDL fear. So this is something I've seen on teams where um a team might
00:16:23.199 create kind of a code workaround to avoid evolving the schema and doing a data backfill. Data backfills can
00:16:28.959 often be quite intensive but um a mistake here might be avoiding it
00:16:36.079 when it might be a better design long term; um using blocking DDL operations
00:16:41.199 due to not understanding locks and how uh operations requiring the same
00:16:46.959 lock get queued up and that sort of thing. We'll look at some ways to get a better understanding of that. Not
00:16:52.160 linting DDL migrations for safety or for non-blocking uh designs. not practicing
00:16:57.839 big DDL changes on like a production DB clone like we discussed earlier and uh
00:17:03.600 not automatically cancelling uh contending DDL operations and so um we're going to look at the
00:17:10.480 lock timeout which exists in all of the supported database engines so again this could be longer cycles higher
00:17:15.760 maintenance costs so I recommend um if you have that
00:17:20.959 production clone trying to run all of your DDL changes through that first collecting timing understanding what's locked while those are running. Um using
00:17:29.520 tools like uh strong migrations in Postgres or online migrations for MySQL
00:17:34.559 Maria DB or squawk for SQL uh helps us detect blocking or unsafe DDL changes
00:17:41.600 and then often those tools you know particularly strong migrations and online migrations will suggest um safer
00:17:48.640 alternatives to us so we can perform those equivalent operations but usually with multiple steps. Uh, and then
00:17:55.039 there's other tactics like using the ignored columns feature in Rails to detach our code from the schema first,
00:18:01.520 deploying that. Um, tactics like using an invalid check constraint before we add a NOT NULL constraint.
00:18:07.840 Um, there's a nice tool called pglocks.org that shows which locks operations acquire, whether
00:18:14.720 they're table locks or whether they're exclusive locks or shared locks. And that's a Postgres-specific tool. And
00:18:21.200 then um you know I recommend trying to set a low lock timeout for any DDL changes. And if uh the lock that's
00:18:28.320 needed can't be acquired then we would retry that DDL change at a lower activity period.
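Putting a couple of those tactics together, here is a hedged sketch of a migration that sets a low lock_timeout and adds a NOT NULL check constraint as NOT VALID first (hypothetical table and constraint names; tools like Strong Migrations document similar multi-step patterns):

  class AddNotNullToOrdersUserId < ActiveRecord::Migration[7.1]
    def up
      # Give up quickly instead of queuing behind other locks; if this
      # fails, retry the change at a lower-activity period.
      execute "SET lock_timeout = '5s'"

      # Step 1: add the constraint without validating existing rows,
      # which only needs a brief lock.
      add_check_constraint :orders, "user_id IS NOT NULL",
                           name: "orders_user_id_null", validate: false
    end

    def down
      remove_check_constraint :orders, name: "orders_user_id_null"
    end
  end

  # Step 2, in a separate migration: validate the constraint, which scans
  # the table but does not block writes.
  #   validate_check_constraint :orders, name: "orders_user_id_null"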
00:18:34.240 Okay. So mistake number four uh I'm calling excessive data access. So imagine you just got back from the
00:18:39.679 grocery store. You've got like eight grocery bags and you can either bring them all in with like four bags on each
00:18:44.720 arm or you can make two trips of four bags each. So if you're a database, uh the second approach is going to be
00:18:51.200 better. Uh we want to try to work on small result sets. So a mistake here could be querying and retrieving huge
00:18:58.880 sets of data, you know, 10,000 plus rows and doing that while the user is waiting. And this is usually a symptom
00:19:05.440 of something that initially was fine, but now the application is four or five years old and it's no longer uh an
00:19:11.679 acceptable amount of time, you know, so it's taking multiple seconds to load data. So there could be ineffective filtering and indexing on low cardinality
00:19:18.799 columns or columns with a small amount of unique values. Um missing indexes on high cardinality columns or foreign keys.
00:19:25.919 Uh and missed opportunities for filtering and reducing our result set sizes. Not using advanced indexing
00:19:32.799 strategies or index types. Performing slow aggregate queries like sums and counts uh making users wait. And not
00:19:40.080 breaking up big tables using table partitioning which we'll touch on briefly. So again, this could increase
00:19:45.280 your server costs and worsen your user experience. So yeah, we want to try to work with
00:19:51.039 small sets of data. And this is often going to involve taking a look at your existing queries and restructuring them,
00:19:56.240 looking for additional opportunities to select fewer rows through extra filtering, select fewer columns,
00:20:02.400 performing fewer joins. Um, we can look for missing index opportunities on high
00:20:07.600 cardinality columns. So I've got some queries that you can use to try that out or we'll look briefly at the pganalyze
00:20:14.080 lint command line tool. Uh it uses this extension called um HypoPG, or hypothetical indexes, basically to try
00:20:21.679 out different index opportunities and see what uh would result in a reduced latency
00:20:28.080 and then we want to try to use advanced index strategies like multicolumn indexes or, in Postgres, partial indexes
00:20:34.559 and then other index types besides B-tree, like GIN and GiST. Um we want to, you know, one option is to
00:20:41.360 pre-calculate some of those aggregates as they become slow. So one option there is the rollup gem and I've got an
00:20:46.480 example with the Rideshare app that shows how to use that. It's like a kind of a generic mechanism to create um
00:20:53.039 different time-based rollups of calculations so that at query time they're very fast. Uh we can also
00:20:59.280 denormalize data from a bunch of different tables and create a materialized view and we can manage that
00:21:05.200 using the scenic gem in Ruby. Um, for our huge tables, maybe 100 gigabytes or more, we can look at migrating those to
00:21:12.159 partition tables to improve performance and maintenance. So, here's a quick look at a command
00:21:18.400 line tool called pganalyze lint, and I'm running it on the Rideshare app with the Rideshare users table, and it found
00:21:25.440 an opportunity for a missing index on the users table and the type column.
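As a rough illustration of the precalculated-aggregates idea mentioned above (the scenic gem manages these through versioned SQL files; this sketch just uses raw SQL in a migration, with hypothetical tables):

  class CreateDailyTripCounts < ActiveRecord::Migration[7.1]
    def up
      # Precalculate a slow aggregate once, instead of running it while
      # a user waits.
      execute <<~SQL
        CREATE MATERIALIZED VIEW daily_trip_counts AS
        SELECT date_trunc('day', created_at) AS day,
               count(*) AS trips_count
        FROM trips
        GROUP BY 1
      SQL
    end

    def down
      execute "DROP MATERIALIZED VIEW daily_trip_counts"
    end
  end

  # Refreshed periodically, for example from a scheduled job:
  #   ActiveRecord::Base.connection.execute(
  #     "REFRESH MATERIALIZED VIEW daily_trip_counts"
  #   )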
00:21:32.640 All right, mistake number three, missing data archival. Uh so this is something I've seen in most of the consulting work
00:21:39.679 I did. Um I think it's just more of a phenomenon of over time data is stored in the database and um what can happen
00:21:46.720 is data exists and it's not queried. So storing a significant proportion of data in tables and indexes and that's going
00:21:52.960 to increase latency. Um so there's an opportunity there to archive that which we'll look at. uh capturing high growth
00:22:00.000 data. Maybe you've used one of these gems before, PublicActivity, PaperTrail, Audited, Ahoy, and not archiving
00:22:06.400 any of that data once it's no longer needed in the database. Uh it could be first-party data that your application
00:22:12.559 creates, uh but it's from customers that have left your platform, features that you've retired, or soft deleted data.
00:22:19.520 Another thing a lot of teams do is perform, um massive delete operations. So they could work with lots of rows on
00:22:26.480 big tables and uh at least in Postgres that's pretty resource intensive. So we'll look at an alternative there. Same
00:22:33.440 sort of thing. It's going to this could increase your server costs unnecessarily um to be able to handle high load
00:22:38.960 possibly even from a delete operation and worsen uh user experience from increased latency due to just having
00:22:45.200 more data that's not needed for querying purposes. So here's an example
00:22:51.280 of a pattern. This is um from pganalyze where the reds are inserts and then
00:22:57.520 the big spikes of greens are delete operations that run, probably from a cron job.
00:23:04.400 So I recommend trying to archive all data that's not regularly queried. Um I wrote up a post called uh copy swap drop
00:23:12.159 which is a tactic that I used with one client where uh essentially we create a
00:23:17.919 a cloned structure of the table and copy in a subset of the rows and then swap the table names around.
00:23:24.400 Um we can also, if we know our data fits well, it's high-growth data and it would fit the characteristics of a
00:23:30.640 partitioned table well, we can use partition-friendly gems at the outset or we can migrate our big tables to be
00:23:37.679 partitioned tables and make any necessary Rails code changes and um as customers leave the
00:23:44.640 platform or we have uh soft deleted data or features that we've retired we can archive that data from our application.
00:23:52.320 I'm a fan of the open source tool called CoverBand that tracks your code execution and you can use that to help
00:23:58.240 discover features that are no longer used that could be retired and thus the data from them could be retired or
00:24:03.760 archived. And for the massive delete operations, those are that's there's a possible um
00:24:10.880 benefit there to migrating that table data into a partition table and then moving to a detach operation of old
00:24:17.200 partitions that are unneeded which is much less resource intensive and can be done concurrently in Postgres as an
00:24:23.279 alternative to doing delete operations.
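A minimal sketch of that alternative, assuming a hypothetical time-range-partitioned events table (a partitioning gem or hand-rolled DDL would set up the partitions themselves):

  # Instead of a huge DELETE that churns through millions of rows:
  #   DELETE FROM events WHERE created_at < '2024-01-01'
  #
  # detach the old partition, which is far cheaper. DETACH ... CONCURRENTLY
  # (Postgres 14+) must run outside a transaction block.
  ActiveRecord::Base.connection.execute(
    "ALTER TABLE events DETACH PARTITION events_2023_12 CONCURRENTLY"
  )

  # Once detached, the partition is a plain table that can be archived
  # (dumped or copied elsewhere) and then dropped.
  ActiveRecord::Base.connection.execute("DROP TABLE events_2023_12")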
00:24:29.360 All right, last couple here. So thanks for uh hanging in there. So mistake number two um missing database maintenance. So this one's a little bit
00:24:35.360 more straightforward. Running unsupported versions of Postgres, MySQL or SQLite. Not monitoring or fixing
00:24:40.559 heavily fragmented tables or indexes, also referred to as bloat, but essentially unneeded objects in the
00:24:47.440 database. Uh or in this case, row data, and then row data that's captured in
00:24:52.799 index entries as well. Uh leaving autovacuum or other maintenance parameters untuned just sort of in their
00:24:58.880 stock configuration and then not removing unneeded database objects. And in this case, objects is triggers and
00:25:04.720 constraints and other things like that. So, this could result in poor performance, uh, security risks or
00:25:09.919 worsened user experiences. So, anytime I run across tables or columns or constraints or indexes or functions,
00:25:15.919 there's lots of things. Uh, I'm trying to remove them if they're no longer needed.
00:25:21.679 So, upgrade your database. Uh in Postgres, one of the community members created
00:25:27.520 this nice tool called Why Upgrade that shows your current version and your target version and all of the differences between them. That's really
00:25:33.120 nice. Um I recommend tuning autovacuum in Postgres for your workload. So in
00:25:39.039 particular look at your heavily updated and deleted tables, where there's a lot of updates and deletes, and that's
00:25:44.240 likely where you're going to want to um add more resources for autovacuum. Uh we want to constantly prune and tune our
00:25:50.480 indexes. So our indexes are designed to support our queries. Uh so they should be used by one or more queries and they
00:25:56.159 should provide a meaningful reduction in latency by being scanned. So I've got a couple tools here that are useful to
00:26:02.640 help you achieve that. Um dropping any unneeded objects from the database. And then for a heavily fragmented or bloated
00:26:09.440 tables and indexes, uh we can rebuild those tables and remove that bloat from
00:26:14.480 those no longer referenced row versions. And the same thing for indexes in Postgres. We can reindex and we can do
00:26:20.240 that concurrently. And I spoke on the maintainable podcast about how to maintain our databases. Sort of like our
00:26:25.919 applications. A lot of us are probably familiar with application maintenance tasks. We can apply some of that thinking to our databases.
00:26:33.120 Okay. So, we're at the last one. Got about three minutes left. Um, if you haven't seen this term
00:26:39.440 before, I liked this term and I wanted to use it as the last one that kind of uh brings together all of all of the
00:26:45.279 items in this presentation. The term is mechanical sympathy. So mechanical sympathy is when you use a tool or
00:26:51.600 system with an understanding of how it operates best. And it was originally um coined by a race car driver and
00:27:03.200 referring to a race car. But um on the AWS Well-Architected framework blog or
00:27:09.279 Well-Architected blog they sort of, you know, used it to refer to different technology systems.
00:27:09.279 And in this case we're going to look at it with uh a database in mind. So the mistake here would be rejecting
00:27:14.960 mechanical sympathy. um not learning how the database works and then not being able to play to its strengths and do
00:27:21.039 less of where it's not as good. So in Postgres, you know, using a lot of high
00:27:26.080 churn designs where there's a lot of updates and deletes as part of your core design, that doesn't work as well uh
00:27:31.520 with how tuples and MVCC and autovacuum work. Um overusing, you know, limited
00:27:37.600 resources on our database instance. So CPU, memory, and IO from inefficient reads and writes. um inefficient
00:27:44.080 generated queries, not restructuring those or kind of moving more to handcrafted queries from gems. You know,
00:27:50.240 some that I've seen that have caused trouble, JSONAPI Resources, GraphQL Ruby, Active Admin, allowing lazy
00:27:56.480 loading and N+1s and not preventing excessively long queries or excessively long idle transactions.
00:28:05.120 Oops. So to fix some of that, I recommend trying to take control of your SQL. So in Active Record relations we
00:28:10.880 can use to_sql to inspect the generated SQL and then uh work on improvements. So
00:28:16.399 that could be reducing again our columns that we're picking, removing any joins that can be removed, and adding more WHERE
00:28:22.399 filtering criteria, removing unnecessary ordering, things like that.
00:28:27.600 Um and then we can look at our execution plans and uh we can use the explain keyword in active record to do that if
00:28:33.120 you're not aware. Uh we can also replace our high update um or delete churn
00:28:39.279 designs with append mostly designs. So one example of that is the slotted counters pattern and there's a nice Ruby
00:28:44.640 gem for that. I've got an example showing how to use that with Rideshare to replace some counting. Uh and then in
00:28:50.080 Postgres there's this um type of update called a hot update which we're not going to get into but uh increasing our
00:28:56.000 proportion of our updates as hot updates is a write path optimization.
00:29:01.279 And then uh on the Rails side, we can prevent lazy loading by using the strict loading feature, which is relatively
00:29:06.799 new. And if you're not using that, you could start by um logging violations, where you haven't fully enabled
00:29:13.600 strict loading, which will raise exceptions, but you've just logged violations. And then uh lastly, we can
00:29:20.240 uh preserve our system stability by setting some limits. And this is going to be through different timeout parameters. So we can set max reasonable
00:29:27.120 durations for our queries, the number or length of our idle transactions, and our number of connections. So on the
00:29:34.000 top here, the top box is just showing uh enabling the logging of strict loading violations. The second box is
00:29:39.919 showing to_sql on an Active Record query and then the third box is showing explain with a couple arguments, analyze
00:29:45.919 and buffers in Postgres which will print out our execution plan and enable the buffers option. So if you look down into
00:29:52.159 the execution plan, we can see that there were um three uh buffers that were
00:29:57.679 in the buffer cache showing us shared hits.
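Pulling those pieces together, a hedged sketch of what those three steps look like in code (hypothetical model and columns; the explain options shown match what the slide describes for Postgres):

  # 1. Log strict loading violations instead of raising, as a first step.
  #    config/environments/production.rb
  config.active_record.action_on_strict_loading_violation = :log

  # 2. Inspect the SQL an Active Record relation will generate.
  puts User.where(active: true).select(:id, :email).to_sql

  # 3. Print the execution plan with ANALYZE and BUFFERS to see how many
  #    buffers were read and whether they were shared cache hits.
  puts User.where(active: true).select(:id, :email).explain(:analyze, :buffers)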
00:30:04.799 So our idea here is really to embrace mechanical sympathy. When you understand how a system is designed to be used, you can align uh with the design to gain
00:30:11.039 optimal performance. So that's what I've got for you today. And uh again, you can get this
00:30:16.240 presentation at the QR code or this bit.ly address. And uh thanks for your time today. I hope you enjoy the rest of RailsConf.