10 Costly Database Performance Mistakes (and How to Fix Them)


Summarized using AI


Andrew Atkinson • July 08, 2025 • Philadelphia, PA • Talk

Introduction

In the talk "10 Costly Database Performance Mistakes (and How to Fix Them)", Andrew Atkinson shares insights from working with Rails and Postgres applications, drawing on real-world experience to identify common pitfalls that can cause significant performance issues and financial costs in production environments. Delivered at RailsConf 2025, this session guides developers through 10 major mistakes, explaining why they arise, their impacts, and actionable strategies to resolve them.

Key Points and Mistakes Discussed

  • Infrequent Releases:

    • Slow release cycles decrease team agility and extend incident resolution times. Solutions include adopting trunk-based development, using feature flags, tracking key DevOps metrics (DORA/SPACE), and considering tools like the anchor migrations gem for safer, more efficient schema changes.
  • Database Inexperience:

    • Many Rails teams lack deep database expertise, leading to inefficient queries and poor schema choices. Recommendations include hiring or developing DBA talent, leveraging production clones for safe experimentation, and using educational resources and communities.
  • Speculative Database Design:

    • Avoiding constraints or normalization due to speculative future needs can result in data bugs and maintenance headaches. Adopt constraint-driven design (the "CORE" principles), align database constraints with application validations, and normalize by default.
  • Missing Database Monitoring:

    • A lack of focused database monitoring makes it harder to triage and resolve performance incidents. Solutions include logging detailed query metadata, tracking query execution plans, and using tools/extensions like query log tags, Marginalia, PgHero, pganalyze, and pgBadger.
  • ORM Pitfalls:

    • Overuse or misuse of ORM features can generate unnecessary or inefficient queries (e.g., excessive COUNTs, huge IN lists, SELECT *). Auditing queries, limiting columns, restructuring queries, using cache features, and adopting more scalable approaches to pagination are recommended.
  • DDL Fear:

    • Hesitance to make schema changes, especially at scale, leads to code workarounds and technical debt. Practice DDLs on production clones, use linting tools for safe migrations, take advantage of ignored columns, and set appropriate lock timeouts.
  • Excessive Data Access:

    • Retrieving or aggregating excessive data degrades performance. Analyze and reduce data access patterns, implement advanced indexing, and consider data partitioning or precalculating aggregates (using gems like rollup and scenic).
  • Missing Data Archival:

    • Failure to archive old, seldom-accessed data increases latency and server costs. Archive unneeded data, migrate large tables to partitions, and prefer detaching partitions over massive deletes.
  • Missing Database Maintenance:

    • Running outdated software, neglecting bloat cleanup, and not tuning auto-maintenance features can harm performance and security. Regularly upgrade databases, tune maintenance parameters, prune obsolete objects, and rebuild or reindex as needed.
  • Rejecting Mechanical Sympathy:

    • Not understanding the database’s internal workings leads to suboptimal design and resource misuse. Developers should inspect SQL generated by ORMs, use database-specific optimizations, prevent lazy loading, and set prudent operational boundaries.

Illustrative Examples

  • Use of real-world Rails migrations and SQL linting tools (Squawk, anchor migrations)
  • Case references to poorly designed index strategies, inefficient aggregates, and massive delete scenarios handled by partitioning

Conclusion and Takeaways

The talk emphasizes proactive monitoring, regular maintenance, a willingness to evolve schemas, and developing a deeper understanding of how relational databases operate. By addressing these 10 pitfalls with practical solutions, Rails teams can significantly boost performance, reduce costs, and foster more maintainable applications.

10 Costly Database Performance Mistakes (and How to Fix Them)
Andrew Atkinson • Philadelphia, PA • Talk

Date: July 08, 2025
Published: July 23, 2025

After working with countless Rails and Postgres applications as a consultant and backend engineer, I’ve seen firsthand how database mistakes can cause big costs and headaches. Poor data types, inefficient queries, and flawed schema designs slow down operations, and result in excessive costs through over-provisioned servers, downtime, lost users, and engineering hours spent restructuring features.

How do we prevent these pitfalls? Awareness is the first step. And if your database is already serving tons of woefully inefficient queries, where should you focus for the biggest wins?

In this talk, we'll break down 10 real-world Rails database mistakes, including how they happened, the impact they had, and most importantly, how to fix them. Topics include query design, indexing, schema optimization, and how the CPU, memory, and IOPS resources tie into Active Record SQL performance.

Expect practical takeaways, real examples, and solutions you can apply immediately. Slides and blog posts will be available at GitHub.

RailsConf 2025

00:00:18.640 thanks for uh sticking with it uh for this last session of the day. Um okay.
00:00:25.519 So um in this presentation it's you'll see that the format is basically kind of
00:00:31.519 mistake area and then solutions for that and so I have a lot of links in here more than 60. So if you want to check
00:00:39.040 any of these links go ahead and scan this QR code or go to bit.ly/rc10m. The slide deck is online already and you
00:00:46.160 can find all the links there. Um so yeah and before I get into it really briefly uh RailsConf has been a
00:00:52.960 big part of my career. My first one was in 2010 back in Baltimore and in the last few years I've been able to present
00:00:58.719 which has been a real honor. Um and so I did want to if you wouldn't all mind um
00:01:03.840 you might have done this before today outside of this morning session but we could all just do a quick round of applause to thank all the organizers and
00:01:10.080 attendees that have made RailsConf.
00:01:18.159 We'll miss you RailsConf. Uh okay so let's get into it. So, we're going to look at costly database mistakes. And
00:01:25.200 some of these are going to be direct costs that are sort of obvious. You might overprovision your instance to handle certain high load periods. Some
00:01:32.799 will be more indirect costs like uh bad user experience that leads to customers downgrading or leaving your platform or
00:01:39.759 maybe an excessive amount of time spent by devs triaging an incident or resolving it, that sort of thing. So,
00:01:45.200 I'm going to have these these little bubbles in the bottom right that will show some of the costs for each of the problem areas.
00:01:52.399 So, we're going to look at 10 uh items in three categories. Uh the forming
00:01:57.520 category would be kind of intentionally attacking your database performance challenges um related to improving your
00:02:04.240 agility uh performance, reducing latency, that sort of thing. And then scaling and optimizing. Those are
00:02:09.920 probably more clear. And we'll go from 10 down to one. And each of these have their own color. And then these costs
00:02:15.440 are in the bottom right here. So far so good. Yeah.
00:02:20.640 Okay. So, uh, first up, mistake number 10. And we have to cook through these because I have a lot of content for 30 minutes. So, uh, um, I will talk at a
00:02:27.920 reasonable clip. Apologies to non-native English speakers. Please watch it at 0.75x later on if that helps you or
00:02:35.840 um, watch it at 2x if I'm talking too slowly. Uh, okay. So um mistake 10 would
00:02:42.160 be infrequent releases. As we know we release most of our DDL changes our schema or structural changes to our
00:02:48.480 database uh through Rails migrations. So having an infrequent release process can really reduce your team's agility and
00:02:54.959 ability to evolve your schema. So using something like Git Flow or a similar sort of legacy software delivery
00:03:01.040 process, not using feature flags to decouple your releases from feature visibility, not tracking or using
00:03:07.519 DevOps metrics in a metrics based approach, and then performing DDL changes exclusively using Rails
00:03:13.040 migrations. We'll look at one alternative to that briefly. So this can increase your cycle time and increase
00:03:18.879 your time spent uh responding to and resolving incidents. So to fix that, I recommend trying to
00:03:25.280 move towards trunk-based development if you're not already doing that. Using feature flags to help decouple your releases from your feature availability.
00:03:32.480 And there are teams out there using Rails that aren't doing trunk-based development. Last year's Rails survey,
00:03:37.519 uh 20% of the respondents released multiple times per month. 2% of the respondents released multiple times per
00:03:43.760 quarter. So that's pretty infrequent and it's going to reduce that team's agility. Uh tracking DevOps metrics. uh
00:03:50.799 a couple frameworks. If you're not familiar, Dora and Space are two frameworks that have a few key metrics
00:03:56.400 related to your team's ability to deliver software. Uh the book Accelerate covers Dora and Dora has a two-minute uh
00:04:03.280 quick check that is useful on teams to take and look at a few metrics. I'll show an example of that. As developers,
00:04:09.040 we can help um raise our test suite's coverage, increase its speed and its reliability. Um and then one other
00:04:15.200 option you could look at if you are in this position now where you're not doing trunk-based development um I've released
00:04:20.639 a new gem called anchor migrations where the idea is to do uh SQL migrations that
00:04:25.759 can still be safety-linted using a tool called Squawk which I'll show and uh we can write non-blocking, idempotent DDL
00:04:33.199 changes and then generate Rails migrations from that SQL to keep everything in sync.
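For context, here is a minimal sketch (not the anchor migrations gem itself, just the plain Rails-migration equivalent of a non-blocking index change) using standard Active Record APIs; the table and column names are hypothetical:

  # Concurrent index builds can't run inside a transaction, so opt out.
  class AddIndexToUsersEmail < ActiveRecord::Migration[7.1]
    disable_ddl_transaction!

    def change
      # algorithm: :concurrently avoids holding a blocking lock on the table,
      # which is what a linter like Squawk nudges you toward on Postgres.
      add_index :users, :email, algorithm: :concurrently, if_not_exists: true
    end
  end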
00:04:38.639 So on the top here is just showing squawk briefly within anchor migrations using the lint command. So I've created
00:04:45.199 a DDL that's creating an index but not using the concurrently keyword. This would be for postgres and uh squawk is
00:04:52.560 the SQL uh linting tool that I mentioned. So it's detected hey you should use concurrently here to avoid
00:04:58.000 this being a blocking DDL operation. This is that accelerate book I
00:05:04.000 mentioned. And then here's a couple screenshots from the DORA two-minute quick check. and it covers their four primary DevOps metrics. Lead time,
00:05:10.560 deployment frequency, change fail percentage, and failed deployment recovery time.
00:05:16.240 Okay, so mistake number nine, database inexperience. So as Rails developers, we're usually
00:05:23.039 working with Active Record uh and we're working at the ORM layer. So we're dealing with object orientation, inheritance, classes, that sort of
00:05:30.160 thing. And relational databases are quite different. you know we have data access to think about SQL relations
00:05:36.080 indexes execution plans so there's kind of this mismatch and it is a different set of skills
00:05:42.720 so often teams might get stuck where they're not hiring database specialists or DBAs on their team they're not using
00:05:49.360 SQL in application code or BI types of code, they're not reading and using query execution plans as part of their designs
00:05:56.000 or using terms like cardinality, selectivity, and execution plan buffers from Postgres execution plans for
00:06:02.080 example uh they might be adding indexes kind of haphazardly or choosing schema designs
00:06:07.199 that have poor performance and this again can increase your server costs and increase your uh developer time.
00:06:16.160 So to fix that you can uh directly hire experience. You can look to hire database specialists, DBAs and uh
00:06:23.199 consultants. You can grow experience on your team with books and courses and conferences like this one and
00:06:29.280 communities. Um I'm particularly, or I'm involved in the Postgres Slack
00:06:35.039 community and that's a good one that I'd recommend if you use Postgres. Um, a specific tactic at companies that
00:06:41.840 I've seen work well is to create a production clone if you can afford that in your uh infrastructure budget
00:06:47.840 where you've got a similar sized instance and data to work with and then you can really integrate that into your workflow and run all of your DDL changes
00:06:54.080 there. And as you gain more experience, you can use these concepts of
00:06:59.280 pages and buffers and selectivity and cardinality in your designs to help improve your designs and reduce latency
00:07:04.639 in your queries. Uh, and you can learn to avoid performance unfriendly designs like random UUID primary keys. I just
00:07:10.960 wrote a blog post on that that's linked up here you can check out if you're interested. Um, so you'll learn about uh, you know,
00:07:17.360 row versions and you can figure out which Spider-Man is the live uh, Spider-Man and which one is the dead Spider-Man.
00:07:24.160 Um, you learn about fixed-size pages where data and index entries are stored and um
00:07:30.880 how dense or sparse your uh your storage is and how that affects latency.
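As one illustration of those concepts, a quick way to see roughly how many fixed-size pages a table occupies in Postgres is to read the planner statistics in pg_class; this is a small sketch using raw SQL from a Rails console, with a hypothetical table name:

  # relpages  = number of 8 kB pages the table currently occupies
  # reltuples = the planner's estimate of live rows
  # rows per page gives a rough sense of how dense or sparse storage is
  sql = <<~SQL
    SELECT relname,
           relpages,
           reltuples,
           round(reltuples / NULLIF(relpages, 0)) AS rows_per_page
    FROM pg_class
    WHERE relname = 'users'
  SQL

  puts ActiveRecord::Base.connection.select_one(sql)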
00:07:37.599 Um, here are some books I recommend. Uh, these are Postgres books. So, this is meant to be a talk that would cover
00:07:45.199 MySQL and SQLite as well with Rails, although it's mostly um I've got more content in here related to Postgres. And
00:07:51.199 then the last book is my own book which I'd recommend if you aren't familiar with it. Um these other books are more
00:07:56.879 uh postgres oriented directly and they're all uh great books. And then a
00:08:02.479 couple courses Aaron Francis has produced um a couple great video courses on Postgres and SQLite. And then Cresten
00:08:09.680 Jameson has produced a couple of great Postgres courses as well including a couple free ones. And uh Ludicrous speed
00:08:16.319 is one that I went through and uh would recommend. Okay. So next up, uh I'm calling it
00:08:22.720 speculative DB design. So this would be something like avoiding a beneficial database constraint today in your schema
00:08:29.520 design based on speculation about your needs that you might have in the future or tomorrow. So you might doubt your
00:08:35.440 ability to evolve the schema design in the future. Uh you might not be using third normal form normalization by
00:08:41.440 default. Um and then on the opposite side, you might be avoiding all forms of denormalization where you want to gain
00:08:48.560 an advantage on your uh query side at the cost of some data duplication. And
00:08:54.399 so there are cases like multi-tenancy where that can be beneficial. So this could show up as uh data bugs and
00:09:00.399 increased um maintenance costs. So to fix that, I recommend trying to
00:09:06.480 use all available uh constraints that are available in your database for um improved consistency, referential
00:09:12.720 integrity, and data quality. And uh I created this uh core acronym to uh
00:09:20.080 represent a few different design principles for schema design where the C is uh constraint driven. And I've got a
00:09:25.920 blog post on that if you're curious. Uh you can create I recommend trying to create matching database constraints for
00:09:32.399 your active record validations. Uh doing things like matching your primary key and foreign key types. So there's a tool
00:09:38.399 called database_consistency that's a Ruby gem which I'll show that helps you achieve some of that. I recommend using
00:09:45.200 data normalization practices by default working to eliminate duplication
00:09:50.240 um designing for today but also anticipating growth in your data and query volume. And then I think
00:09:56.640 denormalization is a useful tactic some of the time. Uh for example, if you've got a multi-tenant app and you want to
00:10:03.440 denormalize your tenant identifier column. So this is a SQL query that's just
00:10:09.760 showing um the number of foreign keys uh per table as a ratio. And this uh is
00:10:15.920 useful if you just want to do a quick check on your own database for your usage of constraints. Here's a quick
00:10:22.480 screenshot from that database consistency command line tool. It's got a lot of built-in checkers. It's a
00:10:27.760 really nice tool. So, in this case, it's checking that there's a foreign key constraint defined that matches an Active
00:10:33.200 Record validation and that the uh foreign key and primary key data types are matching.
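As a small sketch of that pairing (hypothetical table and column names, standard Rails APIs), a database constraint and its matching Active Record validation might look like this:

  # Migration: enforce presence and referential integrity at the database level.
  class AddUserConstraintsToPosts < ActiveRecord::Migration[7.1]
    def change
      # NOT NULL constraint matching the application-level presence check
      change_column_null :posts, :user_id, false
      # Foreign key whose type should match the users primary key type
      add_foreign_key :posts, :users
    end
  end

  # Model: the validation that mirrors those constraints.
  class Post < ApplicationRecord
    belongs_to :user # adds a presence validation by default in Rails
  end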
00:10:39.839 Okay, next up is uh missing DB monitoring. So a lot of folks um
00:10:44.880 probably use APM tools with their applications but they might have less monitoring and visibility on their
00:10:50.560 database activity. That could be both foreground client application activity and background processes.
00:10:57.839 Um so not logging slow queries or collecting query statistics and execution plans. Not using the buffers
00:11:03.680 information in Postgres execution plans. Uh spending time finding application source code for where SQL queries are
00:11:09.839 generated. not monitoring critical background processes like autovacuum. So again, these can increase your time
00:11:15.360 spent triaging an incident and resolving an incident. So to fix that, I recommend uh logging
00:11:21.360 and storing uh all uh source code line number locations for your your queries.
00:11:27.760 And you can do that using query logs in Rails. Uh there's also a gem called Marginalia. Query logs uh formats the
00:11:34.800 annotations using the SQL commenter standard, which is nice. And if you enable that, you'll see it right away in
00:11:40.880 your Rails log. And if you use uh Postgres, you can use those annotations in pg_stat_statements as well, which will
00:11:47.680 capture normalized query statistics and then give you your comments to go back into your source code. Uh I recommend
00:11:54.560 collecting query execution plans and using those in your designs and your pull requests and that sort of thing
00:12:00.320 manually or automatically with um the auto explain extension in Postgres. Um
00:12:05.760 our goal is to reduce the buffers counts in our execution plan. So first we got to take a look at how many buffers are
00:12:13.120 being used, whether they're coming from the buffer cache or outside of the buffer cache. Our goal is to reduce
00:12:18.480 latency. And um there's some other database observability tools here besides
00:12:24.720 pg_stat_statements. PgHero, pganalyze, which is a sponsor of RailsConf. pgBadger is a
00:12:30.959 Postgres log analysis tool. And then I have a few more tools here for MySQL and SQLite listed.
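Here is a rough sketch of turning on those query log annotations (assuming a Rails 7.1-era app; tag and option names can vary by Rails version):

  # config/application.rb (or an environment file)
  # Annotate generated SQL with comments in the sqlcommenter style.
  config.active_record.query_log_tags_enabled = true
  config.active_record.query_log_tags_format = :sqlcommenter
  # Which tags to include; :source_location adds the file and line that
  # built the query (it has some overhead, so weigh that for production).
  config.active_record.query_log_tags = [
    :application, :controller, :action, :source_location
  ]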
00:12:38.800 So here's an example if you haven't used those query logs of what that could look like. So here we see the controller
00:12:44.480 action and application name for where this query was generated. The query text is just above the yellow highlighted
00:12:50.720 area. And then uh more usefully we have the source location. So we can see the model where the query was generated, the
00:12:57.440 line number and the method name. Okay, next up, ORM pitfalls.
00:13:05.440 So, um, the ORM is what we're likely using for most of our applications queries, but there can be some pitfalls.
00:13:12.320 So, performing unnecessary or costly ORM queries
00:13:17.760 like count queries or order by that aren't really needed by the application, but are part of what is being generated.
00:13:24.240 using non-scalable query patterns like uh the huge IN lists. I wrote a blog post
00:13:29.279 about that which can happen either through direct mechanisms or through eager loading. So check that out if
00:13:35.600 you're curious. Um not restricting column access. So all of your queries are SELECT * queries. We'll talk
00:13:41.040 about that a little bit. Using uh the pagination that comes by default
00:13:46.079 with the ORM and then not using some of the caches that are available with the ORM, with Active Record. So uh this could
00:13:53.360 result in overprovisioned instances needing to scale up further than what's needed um
00:14:00.240 and inefficient queries. So to fix that um I recommend uh you
00:14:05.839 know this this would be more for a mature application. It's been around for a while. It's likely that there are some redundant or um unnecessary queries. So,
00:14:13.839 I wrote a post called putting your app on a SQL query diet that um is intended to help folks go and find uh queries and
00:14:21.199 study and make sure that they're actually needed or useful. If not, we can remove those. Um we can try to limit
00:14:26.880 our columns to what's needed uh using select and pluck in Active Record and that can help us with um better use of
00:14:33.440 our indexes, particularly multicolumn indexes. If we do have the IN list
00:14:39.279 types of queries and if we have big tables and we're working with hundreds or thousands of values in these in
00:14:45.279 clauses uh it's likely that you're going to see bad performance from those types of queries. So those are opportunities
00:14:51.279 to restructure those and you've got a few options. You you might be able to uh
00:14:56.720 restructure into a join or use a VALUES clause or use the ANY operator with an array in Postgres.
00:15:03.760 For pagination, uh you could look at endless, also called keyset pagination, and the Pagy gem is one
00:15:09.680 that I've used that can help you set that up and then with Active Record we have a lot of different cache options we
00:15:15.360 have the prepared statements cache if you're using prepared statements, and prepared statements allow us
00:15:20.959 to skip the repeated parsing and planning stage as the query arrives in the database engine and uh we can use
00:15:27.760 the counter cache to avoid unnecessary count queries, we can use the size method over count or length if we have a loaded
00:15:33.760 association to count how many items are in it. So on the top here is um this is
00:15:39.839 actually from the Rails guides. So we see that um there's a book uh model and
00:15:45.839 then there's an eager load of the author association and that's going to generate in the next block these two SQL queries.
00:15:52.000 One for the books table, one for the authors table with this IN clause. And this is a small set of values. So this
00:15:58.480 isn't going to be a performance problem, but this can grow into hundreds or thousands of values. And um that's where
00:16:03.920 you run into issues. So we can restructure that. Here's one example in Postgres that uses the ANY and ARRAY
00:16:09.360 structure to restructure the query and give us an equivalent result.
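A rough sketch of that restructuring in Active Record (hypothetical models; the exact SQL shown on the slide may differ):

  # Imagine this has grown to hundreds or thousands of ids.
  author_ids = Book.limit(1_000).pluck(:author_id).uniq

  # Default form: generates ... WHERE "authors"."id" IN (1, 2, 3, ...)
  Author.where(id: author_ids).to_a

  # Restructured form for Postgres: one array parameter instead of a huge
  # IN list, returning an equivalent result for this query.
  Author.where("id = ANY(ARRAY[?]::bigint[])", author_ids).to_a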
00:16:14.959 All right. So mistake five. Uh I'm calling it DDL fear. So this is something I've seen on teams where um a team might
00:16:23.199 create kind of a code workaround to avoid evolving the schema and doing a data backfill. Data backfills can
00:16:28.959 often be quite intensive but um a mistake here might be avoiding it
00:16:36.079 when it might be a better design long term; um using blocking DDL operations
00:16:41.199 due to not understanding locks and how uh operations requiring the same
00:16:46.959 lock get queued up and that sort of thing. We'll look at some ways to get a better understanding of that. Not
00:16:52.160 linting DDL migrations for safety or for non-blocking uh designs. not practicing
00:16:57.839 big DDL changes on like a production DB clone like we discussed earlier and uh
00:17:03.600 not automatically cancelling uh contending DDL operations and so um we're going to look at the
00:17:10.480 lock timeout which exists in all of the supported database engines so again this could be longer cycles higher
00:17:15.760 maintenance costs so I recommend um if you have that
00:17:20.959 production clone trying to run all of your DDL changes through that first collecting timing understanding what's locked while those are running. Um using
00:17:29.520 tools like uh strong migrations in Postgres or online migrations for MySQL
00:17:34.559 Maria DB or squawk for SQL uh helps us detect blocking or unsafe DDL changes
00:17:41.600 and then often those tools you know particularly strong migrations and online migrations will suggest um safer
00:17:48.640 alternatives to us so we can perform those equivalent operations but usually with multiple steps. Uh, and then
00:17:55.039 there's other tactics like using the ignored columns feature in Rails to detach our code from the schema first,
00:18:01.520 deploying that. Um, tactics like using an invalid check constraint before we add a NOT NULL constraint.
00:18:07.840 Um, there's a nice tool called pglocks.org that shows which locks operations acquire, whether
00:18:14.720 they're table locks or whether they're exclusive locks or shared locks. And that's a Postgres-specific tool. And
00:18:21.200 then um you know I recommend trying to set a low lock timeout for any DDL changes. And if uh the lock that's
00:18:28.320 needed can't be acquired then we would retry that DDL change at a lower activity period.
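Putting a couple of those tactics together, here is a hedged sketch of a migration that sets a low lock_timeout and adds a NOT NULL check constraint as NOT VALID first (hypothetical table and constraint names; tools like Strong Migrations document similar multi-step patterns):

  class AddNotNullToOrdersUserId < ActiveRecord::Migration[7.1]
    def up
      # Give up quickly instead of queuing behind other locks; if this
      # fails, retry the change at a lower-activity period.
      execute "SET lock_timeout = '5s'"

      # Step 1: add the constraint without validating existing rows,
      # which only needs a brief lock.
      add_check_constraint :orders, "user_id IS NOT NULL",
                           name: "orders_user_id_null", validate: false
    end

    def down
      remove_check_constraint :orders, name: "orders_user_id_null"
    end
  end

  # Step 2, in a separate migration: validate the constraint, which scans
  # the table but does not block writes.
  #   validate_check_constraint :orders, name: "orders_user_id_null"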
00:18:34.240 Okay. So mistake number four uh I'm calling excessive data access. So imagine you just got back from the
00:18:39.679 grocery store. You've got like eight grocery bags and you can either bring them all in with like four bags on each
00:18:44.720 arm or you can make two trips of four bags each. So if you're a database, uh the second approach is going to be
00:18:51.200 better. Uh we want to try to work on small result sets. So a mistake here could be querying and retrieving huge
00:18:58.880 sets of data, you know, 10,000 plus rows and doing that while the user is waiting. And this is usually a symptom
00:19:05.440 of something that initially was fine, but now the application is four or five years old and it's no longer uh an
00:19:11.679 acceptable amount of time, you know, so it's taking multiple seconds to load data. So there could be ineffective filtering and indexing on low cardinality
00:19:18.799 columns or columns with a small amount of unique values. Um missing indexes on high cardinality columns or foreign keys.
00:19:25.919 Uh and missed opportunities for filtering and reducing our result set sizes. Not using advanced indexing
00:19:32.799 strategies or index types. Performing slow aggregate queries like sums and counts uh making users wait. And not
00:19:40.080 breaking up big tables using table partitioning which we'll touch on briefly. So again, this could increase
00:19:45.280 your server costs and worsen your user experience. So yeah, we want to try to work with
00:19:51.039 small sets of data. And this is often going to involve taking a look at your existing queries and restructuring them,
00:19:56.240 looking for additional opportunities to select fewer rows through extra filtering, select fewer columns,
00:20:02.400 performing fewer joins. Um, we can look for missing index opportunities on high
00:20:07.600 cardinality columns. So I've got some queries that you can use to try that out or we'll look briefly at the pganalyze
00:20:14.080 lint command line tool. Uh it uses this extension called um HypoPG, or hypothetical indexes, basically to try
00:20:21.679 out different index opportunities and see what uh would result in a reduced latency
00:20:28.080 and then we want to try to use advanced index strategies like multicolumn indexes or, in Postgres, partial indexes
00:20:34.559 and then other index types besides B-tree, like GIN and GiST. Um we want to, you know, one option is to
00:20:41.360 pre-calculate some of those aggregates as they become slow. So one option there is the rollup gem and I've got an
00:20:46.480 example with the Rideshare app that shows how to use that. It's like a kind of a generic mechanism to create um
00:20:53.039 different time-based rollups of calculations so that at query time they're very fast. Uh we can also
00:20:59.280 denormalize data from a bunch of different tables and create a materialized view and we can manage that
00:21:05.200 using the scenic gem in Ruby. Um, for our huge tables, maybe 100 gigabytes or more, we can look at migrating those to
00:21:12.159 partition tables to improve performance and maintenance. So, here's a quick look at a command
00:21:18.400 line tool called pganalyze lint, and I'm running it on the Rideshare app with the Rideshare users table, and it found
00:21:25.440 an opportunity for a missing index on the users table and the type column.
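As a rough illustration of the precalculated-aggregates idea mentioned above (the scenic gem manages these through versioned SQL files; this sketch just uses raw SQL in a migration, with hypothetical tables):

  class CreateDailyTripCounts < ActiveRecord::Migration[7.1]
    def up
      # Precalculate a slow aggregate once, instead of running it while
      # a user waits.
      execute <<~SQL
        CREATE MATERIALIZED VIEW daily_trip_counts AS
        SELECT date_trunc('day', created_at) AS day,
               count(*) AS trips_count
        FROM trips
        GROUP BY 1
      SQL
    end

    def down
      execute "DROP MATERIALIZED VIEW daily_trip_counts"
    end
  end

  # Refreshed periodically, for example from a scheduled job:
  #   ActiveRecord::Base.connection.execute(
  #     "REFRESH MATERIALIZED VIEW daily_trip_counts"
  #   )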
00:21:32.640 All right, mistake number three, missing data archival. Uh so this is something I've seen in most of the consulting work
00:21:39.679 I did. Um I think it's just more of a phenomenon of over time data is stored in the database and um what can happen
00:21:46.720 is data exists and it's not queried. So storing a significant proportion of data in tables and indexes and that's going
00:21:52.960 to increase latency. Um so there's an opportunity there to archive that which we'll look at. uh capturing high growth
00:22:00.000 data. Maybe you've used one of these gems before, PublicActivity, PaperTrail, Audited, Ahoy, and not archiving
00:22:06.400 any of that data once it's no longer needed in the database. Uh it could be first-party data that your application
00:22:12.559 creates, uh but it's from customers that have left your platform, features that you've retired, or soft deleted data.
00:22:19.520 Another thing a lot of teams do is perform, um massive delete operations. So they could work with lots of rows on
00:22:26.480 big tables and uh at least in Postgres that's pretty resource intensive. So we'll look at an alternative there. Same
00:22:33.440 sort of thing. It's going to this could increase your server costs unnecessarily um to be able to handle high load
00:22:38.960 possibly even from a delete operation and worsen uh user experience from increased latency due to just having
00:22:45.200 more data that's not needed for querying purposes. So here's an example
00:22:51.280 of a pattern. This is um from pganalyze where the reds are inserts and then
00:22:57.520 the big spikes of greens are delete operations that run, probably from a cron job.
00:23:04.400 So I recommend trying to archive all data that's not regularly queried. Um I wrote up a post called uh copy swap drop
00:23:12.159 which is a tactic that I used with one client where uh essentially we create a
00:23:17.919 a cloned structure of the table and copy in a subset of the rows and then swap the table names around.
00:23:24.400 Um we can also, if we know our data fits well, it's high-growth data and it would fit the characteristics of a
00:23:30.640 partitioned table well, we can use partition-friendly gems at the outset or we can migrate our big tables to be
00:23:37.679 partitioned tables and make any necessary Rails code changes and um as customers leave the
00:23:44.640 platform or we have uh soft deleted data or features that we've retired we can archive that data from our application.
00:23:52.320 I'm a fan of the open source tool called CoverBand that tracks your code execution and you can use that to help
00:23:58.240 discover features that are no longer used that could be retired and thus the data from them could be retired or
00:24:03.760 archived. And for the massive delete operations, those are that's there's a possible um
00:24:10.880 benefit there to migrating that table data into a partition table and then moving to a detach operation of old
00:24:17.200 partitions that are unneeded which is much less resource intensive and can be done concurrently in Postgres as an
00:24:23.279 alternative to doing delete operations.
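A minimal sketch of that alternative, assuming a hypothetical time-range-partitioned events table (a partitioning gem or hand-rolled DDL would set up the partitions themselves):

  # Instead of a huge DELETE that churns through millions of rows:
  #   DELETE FROM events WHERE created_at < '2024-01-01'
  #
  # detach the old partition, which is far cheaper. DETACH ... CONCURRENTLY
  # (Postgres 14+) must run outside a transaction block.
  ActiveRecord::Base.connection.execute(
    "ALTER TABLE events DETACH PARTITION events_2023_12 CONCURRENTLY"
  )

  # Once detached, the partition is a plain table that can be archived
  # (dumped or copied elsewhere) and then dropped.
  ActiveRecord::Base.connection.execute("DROP TABLE events_2023_12")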
00:24:29.360 All right, last couple here. So thanks for uh hanging in there. So mistake number two um missing database maintenance. So this one's a little bit
00:24:35.360 more straightforward. Running unsupported versions of Postgres, MySQL or SQLite. Not monitoring or fixing
00:24:40.559 heavily fragmented tables or indexes, also referred to as bloat, but essentially unneeded objects in the
00:24:47.440 database. Uh or in this case, row data, and then row data that's captured in
00:24:52.799 index entries as well. Uh leaving autovacuum or other maintenance parameters untuned just sort of in their
00:24:58.880 stock configuration and then not removing unneeded database objects. And in this case, objects is triggers and
00:25:04.720 constraints and other things like that. So, this could result in poor performance, uh, security risks or
00:25:09.919 worsened user experiences. So, anytime I run across tables or columns or constraints or indexes or functions,
00:25:15.919 there's lots of things. Uh, I'm trying to remove them if they're no longer needed.
00:25:21.679 So, upgrade your database. Uh in Postgres, one of the community members created
00:25:27.520 this nice tool called Why Upgrade that shows your current version and your target version and all of the differences between them. That's really
00:25:33.120 nice. Um I recommend tuning autovacuum in Postgres for your workload. So in
00:25:39.039 particular look at your heavily updated and deleted tables, where there's a lot of updates and deletes, and that's
00:25:44.240 likely where you're going to want to um add more resources for autovacuum. Uh we want to constantly prune and tune our
00:25:50.480 indexes. So our indexes are designed to support our queries. Uh so they should be used by one or more queries and they
00:25:56.159 should provide a meaningful reduction in latency by being scanned. So I've got a couple tools here that are useful to
00:26:02.640 help you achieve that. Um dropping any unneeded objects from the database. And then for a heavily fragmented or bloated
00:26:09.440 tables and indexes, uh we can rebuild those tables and remove that bloat from
00:26:14.480 those no longer referenced row versions. And the same thing for indexes in Postgres. We can reindex and we can do
00:26:20.240 that concurrently. And I spoke on the maintainable podcast about how to maintain our databases. Sort of like our
00:26:25.919 applications. A lot of us are probably familiar with application maintenance tasks. We can apply some of that thinking to our databases.
00:26:33.120 Okay. So, we're at the last one. Got about three minutes left. Um, if you haven't seen this term
00:26:39.440 before, I liked this term and I wanted to use it as the last one that kind of uh brings together all of all of the
00:26:45.279 items in this presentation. The term is mechanical sympathy. So mechanical sympathy is when you use a tool or
00:26:51.600 system with an understanding of how it operates best. And it was originally um coined by a race car driver and
00:27:03.200 referring to a race car. But um on the AWS Well-Architected framework blog or
00:27:09.279 Well-Architected blog they sort of, you know, used it to refer to different technology systems.
00:27:09.279 And in this case we're going to look at it with uh a database in mind. So the mistake here would be rejecting
00:27:14.960 mechanical sympathy. um not learning how the database works and then not being able to play to its strengths and do
00:27:21.039 less of where it's not as good. So in Postgres, you know, using a lot of high
00:27:26.080 churn designs where there's a lot of updates and deletes as part of your core design, that doesn't work as well uh
00:27:31.520 with how tuples and MVCC and autovacuum work. Um overusing, you know, limited
00:27:37.600 resources on our database instance. So CPU, memory, and IO from inefficient reads and writes. um inefficient
00:27:44.080 generated queries, not restructuring those or kind of moving more to handcrafted queries from gems. You know,
00:27:50.240 some that I've seen that have caused trouble, JSONAPI Resources, GraphQL Ruby, Active Admin, allowing lazy
00:27:56.480 loading and N+1s and not preventing excessively long queries or excessively long idle transactions.
00:28:05.120 Oops. So to fix some of that, I recommend trying to take control of your SQL. So in Active Record relations we
00:28:10.880 can use to_sql to inspect the generated SQL and then uh work on improvements. So
00:28:16.399 that could be reducing again our columns that we're picking, removing any joins that can be removed, and adding more WHERE
00:28:22.399 filtering criteria, removing unnecessary ordering, things like that.
00:28:27.600 Um and then we can look at our execution plans and uh we can use the explain keyword in active record to do that if
00:28:33.120 you're not aware. Uh we can also replace our high update um or delete churn
00:28:39.279 designs with append mostly designs. So one example of that is the slotted counters pattern and there's a nice Ruby
00:28:44.640 gem for that. I've got an example showing how to use that with Rideshare to replace some counting. Uh and then in
00:28:50.080 Postgres there's this um type of update called a hot update which we're not going to get into but uh increasing our
00:28:56.000 proportion of our updates as hot updates is a write path optimization.
00:29:01.279 And then uh on the Rails side, we can prevent lazy loading by using the strict loading feature, which is relatively
00:29:06.799 new. And if you're not using that, you could start by um logging violations, where you haven't fully enabled
00:29:13.600 strict loading, which will raise exceptions, but you've just logged violations. And then uh lastly, we can
00:29:20.240 uh preserve our system stability by setting some limits. And this is going to be through different timeout parameters. So we can set max reasonable
00:29:27.120 durations for our queries, the number or length of our idle transactions, and our number of connections. So on the
00:29:34.000 top here, the top box is just showing uh enabling the logging of strict loading violations. The second box is
00:29:39.919 showing to_sql on an Active Record query and then the third box is showing explain with a couple arguments, analyze
00:29:45.919 and buffers in Postgres which will print out our execution plan and enable the buffers option. So if you look down into
00:29:52.159 the execution plan, we can see that there were um three uh buffers that were
00:29:57.679 in the buffer cache showing us shared hits.
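Pulling those pieces together, a hedged sketch of what those three steps look like in code (hypothetical model and columns; the explain options shown match what the slide describes for Postgres):

  # 1. Log strict loading violations instead of raising, as a first step.
  #    config/environments/production.rb
  config.active_record.action_on_strict_loading_violation = :log

  # 2. Inspect the SQL an Active Record relation will generate.
  puts User.where(active: true).select(:id, :email).to_sql

  # 3. Print the execution plan with ANALYZE and BUFFERS to see how many
  #    buffers were read and whether they were shared cache hits.
  puts User.where(active: true).select(:id, :email).explain(:analyze, :buffers)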
00:30:04.799 So our idea here is really to embrace mechanical sympathy. When you understand how a system is designed to be used, you can align uh with the design to gain
00:30:11.039 optimal performance. So that's what I've got for you today. And uh again, you can get this
00:30:16.240 presentation at the QR code or this bit.ly address. And uh thanks for your time today. I hope you enjoy the rest of RailsConf.