00:00:18.640
Thanks for sticking with it for this last session of the day. Okay.
00:00:25.519
So in this presentation, you'll see that the format is basically a
00:00:31.519
mistake area and then solutions for it. I have a lot of links in here, more than 60. So if you want to check
00:00:39.040
any of these links, go ahead and scan this QR code or go to bit.ly/rc10m. The slide deck is online already and you
00:00:46.160
can find all the links there. Before I get into it, really briefly: RailsConf has been a
00:00:52.960
big part of my career. My first one was in 2010 back in Baltimore, and in the last few years I've been able to present,
00:00:58.719
which has been a real honor. And so I did want to, if you wouldn't all mind,
00:01:03.840
you might have done this before today, outside of this morning session, but we could all just do a quick round of applause to thank all the organizers and
00:01:10.080
attendees that have made RailsConf.
00:01:18.159
We'll miss you, RailsConf. Okay, so let's get into it. We're going to look at costly database mistakes. And
00:01:25.200
some of these are going to be direct costs that are sort of obvious: you might overprovision your instance to handle certain high-load periods. Some
00:01:32.799
will be more indirect costs, like bad user experience that leads to customers downgrading or leaving your platform, or
00:01:39.759
maybe an excessive amount of time spent by devs triaging an incident or resolving it, that sort of thing. So,
00:01:45.200
I'm going to have these little bubbles in the bottom right that will show some of the costs for each of the problem areas.
00:01:52.399
So, we're going to look at 10 items in three categories. The forming
00:01:57.520
category would be intentionally attacking your database performance challenges, related to improving your
00:02:04.240
agility and performance, reducing latency, that sort of thing. And then scaling and optimizing; those are
00:02:09.920
probably more clear. We'll go from 10 down to one, and each of these has its own color. And then these costs
00:02:15.440
are in the bottom right here. So far so good. Yeah.
00:02:20.640
Okay. So, first up, mistake number 10. We have to cook through these because I have a lot of content for 30 minutes. So I will talk at a
00:02:27.920
reasonable clip. Apologies to non-native English speakers. Please watch it at 0.75x later on if that helps you, or
00:02:35.840
watch it at 2x if I'm talking too slowly. Okay. So mistake 10 would
00:02:42.160
be infrequent releases. As we know, we release most of our DDL changes, our schema or structural changes to our
00:02:48.480
database, through Rails migrations. So having an infrequent release process can really reduce your team's agility and
00:02:54.959
ability to evolve your schema. Using something like Git Flow or a similar legacy software delivery
00:03:01.040
process, not using feature flags to decouple your releases from feature visibility, not tracking or using
00:03:07.519
DevOps metrics in a metrics-based approach, and then performing DDL changes exclusively using Rails
00:03:13.040
migrations. We'll look at one alternative to that briefly. So this can increase your cycle time and increase
00:03:18.879
your time spent responding to and resolving incidents. To fix that, I recommend trying to
00:03:25.280
move towards trunk-based development if you're not already doing that, and using feature flags to help decouple your releases from feature availability.
00:03:32.480
There are teams out there using Rails that aren't doing trunk-based development. In last year's Rails survey,
00:03:37.519
20% of the respondents released multiple times per month, and 2% released multiple times per
00:03:43.760
quarter. That's pretty infrequent, and it's going to reduce those teams' agility. On tracking DevOps metrics:
00:03:50.799
if you're not familiar, DORA and SPACE are two frameworks that have a few key metrics
00:03:56.400
related to your team's ability to deliver software. The book Accelerate covers DORA, and DORA has a two-minute
00:04:03.280
quick check that is useful for teams to take and look at a few metrics. I'll show an example of that. As developers,
00:04:09.040
we can help raise our test suite's coverage and increase its speed and reliability. And then one other
00:04:15.200
option you could look at, if you are in this position now where you're not doing trunk-based development: I've released
00:04:20.639
a new gem called anchor migrations, where the idea is to do SQL migrations that
00:04:25.759
can still be safety-linted using a tool called Squawk, which I'll show. We can write non-blocking, idempotent DDL
00:04:33.199
changes and then generate Rails migrations from that SQL to keep everything in sync.
00:04:38.639
The top here is just showing Squawk briefly, within anchor migrations, using the lint command. I've created
00:04:45.199
a DDL change that's creating an index but not using the CONCURRENTLY keyword. This would be for Postgres, and Squawk is
00:04:52.560
the SQL linting tool that I mentioned. It's detected: hey, you should use CONCURRENTLY here to avoid
00:04:58.000
this being a blocking DDL operation. This is that Accelerate book I
00:05:04.000
mentioned. And then here are a couple screenshots from the DORA two-minute quick check; it covers their four primary DevOps metrics: lead time,
00:05:10.560
deployment frequency, change fail percentage, and failed deployment recovery time.
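As a quick sketch of what a non-blocking index migration looks like in Rails (a minimal example; the table and column names are hypothetical):

  class AddIndexToUsersEmail < ActiveRecord::Migration[7.1]
    # CONCURRENTLY can't run inside a transaction, so disable the wrapper.
    disable_ddl_transaction!

    def change
      # Builds the index without taking a write-blocking lock (Postgres).
      add_index :users, :email, algorithm: :concurrently
    end
  end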
00:05:16.240
Okay, so mistake number nine, database inexperience. So as Rails developers, we're usually
00:05:23.039
working with Active Record, at the ORM layer. So we're dealing with object orientation, inheritance, classes, that sort of
00:05:30.160
thing. And relational databases are quite different: we have data access to think about, SQL, relations,
00:05:36.080
indexes, execution plans. So there's this mismatch, and it is a different set of skills.
00:05:42.720
Often teams might get stuck where they're not hiring database specialists or DBAs on their team, they're not using
00:05:49.360
SQL in application code or BI tools, they're not reading and using query execution plans as part of their designs,
00:05:56.000
or using terms like cardinality, selectivity, and buffers from Postgres execution plans, for
00:06:02.080
example. They might be adding indexes haphazardly or choosing schema designs
00:06:07.199
that have poor performance. This again can increase your server costs and your developer time.
00:06:16.160
To fix that, you can directly hire experience: you can look to hire database specialists, DBAs, and
00:06:23.199
consultants. You can grow experience on your team with books and courses and conferences like this one and
00:06:29.280
communities. I'm involved in the Postgres Slack
00:06:35.039
community, and that's a good one that I'd recommend if you use Postgres. A specific tactic at companies that
00:06:41.840
I've seen work well is to create a production clone, if you can afford that in your infrastructure budget,
00:06:47.840
where you've got a similar-sized instance and data to work with. Then you can really integrate that into your workflow and run all of your DDL changes
00:06:54.080
there. And as you gain more experience, you can use these concepts of
00:06:59.280
pages and buffers and selectivity and cardinality to help improve your designs and reduce latency
00:07:04.639
in your queries. And you can learn to avoid performance-unfriendly designs like random UUID primary keys. I just
00:07:10.960
wrote a blog post on that, linked up here, that you can check out if you're interested. You'll learn about
00:07:17.360
row versions, and you can figure out which Spider-Man is the live Spider-Man and which one is the dead Spider-Man.
00:07:24.160
You learn about fixed-size pages, where data and index entries are stored, and
00:07:30.880
how dense or sparse your storage is and how that affects latency.
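If you want to peek at live versus dead row versions yourself, here's a minimal sketch using Postgres's pg_stat_user_tables statistics view from a Rails console:

  # Quick look at live vs. dead row versions (tuples) per table.
  rows = ActiveRecord::Base.connection.select_all(<<~SQL)
    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10
  SQL
  rows.each do |r|
    puts "#{r['relname']}: live=#{r['n_live_tup']} dead=#{r['n_dead_tup']}"
  end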
00:07:37.599
Here are some books I recommend; these are Postgres books. This is meant to be a talk that covers
00:07:45.199
MySQL and SQLite with Rails as well, although I've got more content in here related to Postgres. And
00:07:51.199
then the last book is my own book, which I'd recommend if you aren't familiar with it. These other books are more
00:07:56.879
directly Postgres-oriented, and they're all great books. And then a
00:08:02.479
couple courses: Aaron Francis has produced a couple great video courses on Postgres and SQLite. And then Cresten
00:08:09.680
Jameson has produced a couple of great Postgres courses as well, including a couple free ones. Ludicrous Speed
00:08:16.319
is one that I went through and would recommend. Okay. So next up, I'm calling it
00:08:22.720
speculative DB design. So this would be something like avoiding a beneficial database constraint today in your schema
00:08:29.520
design based on speculation about needs you might have in the future. You might doubt your
00:08:35.440
ability to evolve the schema design later. You might not be using third normal form normalization by
00:08:41.440
default. And then on the opposite side, you might be avoiding all forms of denormalization, where you want to gain
00:08:48.560
an advantage on your query side at the cost of some data duplication. And
00:08:54.399
there are cases like multi-tenancy where that can be beneficial. This could show up as data bugs and
00:09:00.399
increased maintenance costs. To fix that, I recommend trying to
00:09:06.480
use all the constraints that are available in your database for improved consistency, referential
00:09:12.720
integrity, and data quality. I created this CORE acronym to
00:09:20.080
represent a few different principles for schema design, where the C is constraint-driven. And I've got a
00:09:25.920
blog post on that if you're curious. I recommend trying to create matching database constraints for
00:09:32.399
your Active Record validations, doing things like matching your primary key and foreign key types. There's a tool
00:09:38.399
called database_consistency, a Ruby gem which I'll show, that helps you achieve some of that. I recommend using
00:09:45.200
data normalization practices by default, working to eliminate duplication,
00:09:50.240
designing for today but also anticipating growth in your data and query volume. And then I think
00:09:56.640
denormalization is a useful tactic some of the time; for example, if you've got a multi-tenant app and you want to
00:10:03.440
denormalize your tenant identifier column. This is a SQL query that's just
00:10:09.760
showing the number of foreign keys per table as a ratio. And this is
00:10:15.920
useful if you just want to do a quick check on your own database for your usage of constraints. Here's a quick
00:10:22.480
screenshot from that database_consistency command line tool. It's got a lot of built-in checkers. It's a
00:10:27.760
really nice tool. In this case, it's checking that there's a foreign key constraint defined that matches an Active
00:10:33.200
Record validation, and that the foreign key and primary key data types match.
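Here's a small sketch of the idea of matching an Active Record validation with database constraints (the rides/drivers tables are hypothetical):

  # Validation in the model...
  class Ride < ApplicationRecord
    belongs_to :driver   # presence is validated by default in Rails
  end

  # ...and matching constraints in the database, so bad data can't sneak in
  # through raw SQL, race conditions, or other database clients.
  class AddRideDriverConstraints < ActiveRecord::Migration[7.1]
    def change
      change_column_null :rides, :driver_id, false  # matches the presence validation
      add_foreign_key :rides, :drivers              # referential integrity
    end
  end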
00:10:39.839
Okay, next up is missing DB monitoring. A lot of folks
00:10:44.880
probably use APM tools with their applications but they might have less monitoring and visibility on their
00:10:50.560
database activity. That could be both foreground client application activity and background processes.
00:10:57.839
So: not logging slow queries or collecting query statistics and execution plans. Not using the buffers
00:11:03.680
information in Postgres execution plans. Spending time finding the application source code where SQL queries are
00:11:09.839
generated. Not monitoring critical background processes like autovacuum. Again, these can increase your time
00:11:15.360
spent triaging and resolving incidents. To fix that, I recommend logging
00:11:21.360
and storing the source code line number locations for your queries.
00:11:27.760
You can do that using Query Logs in Rails; there's also a gem called Marginalia. Query Logs formats the
00:11:34.800
annotations using the sqlcommenter standard, which is nice. If you enable that, you'll see it right away in
00:11:40.880
your Rails log. And if you use Postgres, you can use those annotations with pg_stat_statements as well, which will
00:11:47.680
capture normalized query statistics and then give you your comments to trace back into your source code.
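Enabling that might look like this (a minimal sketch; the available tags and the :sqlcommenter format option depend on your Rails version):

  # config/application.rb -- annotate generated SQL with its source location.
  config.active_record.query_log_tags_enabled = true
  config.active_record.query_log_tags = [:application, :controller, :action, :job]
  # Emit comments in the sqlcommenter key=value format (Rails 7.1+),
  # so they survive into pg_stat_statements and log analyzers.
  config.active_record.query_log_tags_format = :sqlcommenter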
00:11:54.560
I recommend collecting query execution plans and using those in your designs and your pull requests, that sort of thing,
00:12:00.320
manually or automatically with the auto_explain extension in Postgres.
00:12:05.760
Our goal is to reduce the buffer counts in our execution plans. So first we've got to take a look at how many buffers are
00:12:13.120
being used, whether they're coming from the buffer cache or outside of the buffer cache. Our goal is to reduce
00:12:18.480
latency. And there are some other database observability tools here besides pg_stat_statements:
00:12:24.720
PgHero; pganalyze, which is a sponsor of RailsConf; and pgBadger, which is a
00:12:30.959
Postgres log analysis tool. And then I have a few more tools here for MySQL and SQLite listed.
00:12:38.800
Here's an example, if you haven't used those query logs, of what that could look like. Here we see the controller
00:12:44.480
action and application name for where this query was generated. The query text is just above the yellow highlighted
00:12:50.720
area. And then, more usefully, we have the source location. So we can see the model where the query was generated, the
00:12:57.440
line number, and the method name. Okay, next up: ORM pitfalls.
00:13:05.440
The ORM is what we're likely using for most of our application's queries, but there can be some pitfalls.
00:13:12.320
Performing unnecessary or costly ORM queries,
00:13:17.760
like count queries or ORDER BY clauses that aren't really needed by the application but are part of what's generated.
00:13:24.240
Using non-scalable query patterns like huge IN lists; I wrote a blog post
00:13:29.279
about that, which can happen either through direct mechanisms or through eager loading, so check that out if
00:13:35.600
you're curious. Not restricting column access, so all of your queries are SELECT * queries; we'll talk
00:13:41.040
about that a little bit. Using the pagination that comes by default
00:13:46.079
with the ORM, and then not using some of the caches that are available with Active Record. This could
00:13:53.360
result in overprovisioned instances, scaled up further than what's needed,
00:14:00.240
and inefficient queries. To fix that, I recommend, you
00:14:05.839
know, this would be more for a mature application that's been around for a while: it's likely that there are some redundant or unnecessary queries. So,
00:14:13.839
I wrote a post called putting your app on a SQL query diet that's intended to help folks go and find queries and
00:14:21.199
study them and make sure that they're actually needed or useful. If not, we can remove those. We can try to limit
00:14:26.880
our columns to what's needed, using select and pluck in Active Record, and that can help us with better use of
00:14:33.440
our indexes, particularly multicolumn indexes. If we do have the IN list
00:14:39.279
types of queries, and if we have big tables and we're working with hundreds or thousands of values in these IN
00:14:45.279
clauses, it's likely that you're going to see bad performance from those types of queries. Those are opportunities
00:14:51.279
to restructure, and you've got a few options. You might be able to
00:14:56.720
restructure into a join, or use a VALUES clause, or use the ANY operator with an array in Postgres.
00:15:03.760
For pagination, you could look at endless pagination, also called keyset pagination, and the Pagy gem is one
00:15:09.680
that I've used that can help you set that up. And then with Active Record we have a lot of different cache options. We
00:15:15.360
have the prepared statements cache, if you're using prepared statements; prepared statements allow us
00:15:20.959
to skip the repeated parsing and planning stages as the query arrives in the database engine. We can use
00:15:27.760
the counter cache to avoid unnecessary count queries, and we can use the size method over count or length if we have a loaded
00:15:33.760
association, to count how many items are in it.
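A couple of those in code (a minimal sketch; it assumes an authors.books_count column added via a migration):

  class Book < ApplicationRecord
    # Maintains the authors.books_count counter cache column.
    belongs_to :author, counter_cache: true
  end

  author = Author.first
  author.books.size   # uses the counter cache or loaded records -- no COUNT(*) query
  author.books.count  # always issues SELECT COUNT(*), even if the records are loaded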
00:15:39.839
On the top here is an example that's actually from the Rails Guides. We see that there's a Book model,
00:15:45.839
then an eager load of the author association, and that's going to generate, in the next block, these two SQL queries:
00:15:52.000
one for the books table, one for the authors table with this IN clause. This is a small set of values, so this
00:15:58.480
isn't going to be a performance problem, but this can grow into hundreds or thousands of values. And that's where
00:16:03.920
you run into issues. So we can restructure that. Here's one example in Postgres that uses the ANY and ARRAY
00:16:09.360
structure to restructure the query and give us an equivalent result.
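In Active Record, that kind of restructuring might look like this (a minimal sketch; whether it helps depends on your data sizes and Postgres version, so measure with execution plans):

  author_ids = [1, 2, 3]  # imagine hundreds or thousands of IDs here

  # Default form: WHERE "authors"."id" IN (1, 2, 3)
  Author.where(id: author_ids)

  # Restructured form: WHERE id = ANY(ARRAY[1,2,3]) -- Postgres only.
  # Rails expands the bound array inside the ARRAY[] constructor.
  Author.where("id = ANY(ARRAY[?])", author_ids)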
00:16:14.959
All right, mistake five: I'm calling it DDL fear. This is something I've seen on teams, where a team might
00:16:23.199
create a code workaround to avoid evolving the schema and doing a data backfill. Data backfills can
00:16:28.959
often be quite intensive, but a mistake here might be avoiding one
00:16:36.079
when it might be a better design long term. Using blocking DDL operations
00:16:41.199
due to not understanding locks and how operations that require the same
00:16:46.959
lock get queued up and that sort of thing. We'll look at some ways to get a better understanding of that. Not
00:16:52.160
linting DDL migrations for safety or for non-blocking designs. Not practicing
00:16:57.839
big DDL changes on a production DB clone like we discussed earlier. And
00:17:03.600
not automatically cancelling contending DDL operations; we're going to look at the
00:17:10.480
lock timeout, which exists in all of the supported database engines. Again, this could mean longer cycles and higher
00:17:15.760
maintenance costs. I recommend, if you have that
00:17:20.959
production clone, trying to run all of your DDL changes through that first, collecting timings, and understanding what's locked while those are running. Using
00:17:29.520
tools like Strong Migrations in Postgres, or Online Migrations for MySQL and
00:17:34.559
MariaDB, or Squawk for SQL helps us detect blocking or unsafe DDL changes.
00:17:41.600
Often those tools, particularly Strong Migrations and Online Migrations, will suggest safer
00:17:48.640
alternatives, so we can perform equivalent operations, usually with multiple steps. And then
00:17:55.039
there are other tactics, like using the ignored_columns feature in Rails to detach our code from the schema first,
00:18:01.520
deploying that; or using an invalid check constraint before we add a NOT NULL constraint.
00:18:07.840
There's a nice tool called pglocks.org that shows operations and which locks they acquire, whether
00:18:14.720
they're table locks, exclusive locks, or shared locks. That's a Postgres-specific tool. And
00:18:21.200
then I recommend trying to set a low lock timeout for any DDL changes. If the lock that's
00:18:28.320
needed can't be acquired, then we would retry that DDL change at a lower-activity period.
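A minimal sketch of that lock_timeout idea in a Rails migration (Postgres; the table, column, and timeout value are arbitrary examples):

  class AddStatusToRides < ActiveRecord::Migration[7.1]
    def up
      # Fail fast if the ALTER TABLE can't acquire its lock, instead of
      # queueing behind long-running queries and blocking other sessions.
      # Rails raises ActiveRecord::LockWaitTimeout; retry at a quieter time.
      execute "SET lock_timeout = '2s'"
      add_column :rides, :status, :text
    end

    def down
      remove_column :rides, :status
    end
  end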
00:18:34.240
Okay. Mistake number four, I'm calling excessive data access. Imagine you just got back from the
00:18:39.679
grocery store. You've got like eight grocery bags and you can either bring them all in with like four bags on each
00:18:44.720
arm, or you can make two trips of four bags each. If you're a database, the second approach is going to be
00:18:51.200
better. We want to try to work on small result sets. A mistake here could be querying and retrieving huge
00:18:58.880
sets of data, you know, 10,000 plus rows and doing that while the user is waiting. And this is usually a symptom
00:19:05.440
of something that initially was fine, but now the application is four or five years old and it's no longer an
00:19:11.679
acceptable amount of time; it's taking multiple seconds to load data. There could be ineffective filtering and indexing on low-cardinality
00:19:18.799
columns, or columns with a small number of unique values. Missing indexes on high-cardinality columns or foreign keys.
00:19:25.919
And missed opportunities for filtering and reducing our result set sizes. Not using advanced indexing
00:19:32.799
strategies or index types. Performing slow aggregate queries like sums and counts, making users wait. And not
00:19:40.080
breaking up big tables using table partitioning, which we'll touch on briefly. Again, this could increase
00:19:45.280
your server costs and worsen your user experience. So yeah, we want to try to work with
00:19:51.039
small sets of data. And this is often going to involve taking a look at your existing queries and restructuring them,
00:19:56.240
looking for additional opportunities to select fewer rows through extra filtering, select fewer columns,
00:20:02.400
performing fewer joins. We can look for missing index opportunities on high-
00:20:07.600
cardinality columns. I've got some queries that you can use to try that out, or we'll look briefly at the pganalyze
00:20:14.080
lint command line tool. It uses an extension called HypoPG, hypothetical indexes basically, to try
00:20:21.679
out different index opportunities and see what would result in reduced latency.
00:20:28.080
Then we want to try to use advanced index strategies like multicolumn indexes or, in Postgres, partial indexes,
00:20:34.559
and other index types besides B-tree, like GIN and GiST.
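For example, a partial index and a multicolumn index in a migration (a minimal sketch; the table, columns, and predicate are hypothetical):

  class AddTargetedIndexes < ActiveRecord::Migration[7.1]
    disable_ddl_transaction!

    def change
      # Partial index (Postgres): index only the rows queries filter on,
      # keeping the index small and dense.
      add_index :rides, :created_at,
                where: "status = 'open'", algorithm: :concurrently

      # Multicolumn index: supports WHERE driver_id = ? ORDER BY created_at.
      add_index :rides, [:driver_id, :created_at], algorithm: :concurrently
    end
  end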
00:20:41.360
As aggregate queries become slow, one option is to pre-calculate them. The Rollup gem can help there, and I've got an
00:20:46.480
example with the Rideshare app that shows how to use it. It's a generic mechanism to create
00:20:53.039
different time-based rollups of calculations, so that at query time they're very fast. We can also
00:20:59.280
denormalize data from a bunch of different tables into a materialized view, and we can manage that
00:21:05.200
using the Scenic gem in Ruby. For our huge tables, maybe 100 gigabytes or more, we can look at migrating those to
00:21:12.159
partitioned tables to improve performance and maintenance. Here's a quick look at a command
00:21:18.400
line tool called pganalyze lint, and I'm running it on the Rideshare app with the Rideshare users table, and it found
00:21:25.440
an opportunity for a missing index on the users table and the type column.
00:21:32.640
All right, mistake number three: missing data archival. This is something I've seen in most of the consulting work
00:21:39.679
I did. I think it's just a phenomenon of: over time, data is stored in the database, and what can happen
00:21:46.720
is data exists and it's not queried. So you're storing a significant proportion of data in tables and indexes, and that's going
00:21:52.960
to increase latency. There's an opportunity there to archive that, which we'll look at. Capturing high-growth
00:22:00.000
data. Maybe you've used one of these gems before: public_activity, PaperTrail, Audited, Ahoy. And not archiving
00:22:06.400
any of that data once it's no longer needed in the database. It could be first-party data that your application
00:22:12.559
creates, but it's from customers that have left your platform, features that you've retired, or soft-deleted data.
00:22:19.520
Another thing a lot of teams do is perform massive delete operations. They could work with lots of rows on
00:22:26.480
big tables, and at least in Postgres that's pretty resource-intensive. We'll look at an alternative there. Same
00:22:33.440
sort of thing: this could increase your server costs unnecessarily, to be able to handle high load,
00:22:38.960
possibly even from a delete operation, and worsen user experience from increased latency, due to just having
00:22:45.200
more data that's not needed for querying purposes. Here's an example
00:22:51.280
of a pattern. This is from pganalyze, where the reds are inserts and then the
00:22:57.520
big spikes of green are delete operations that probably run from a cron job.
00:23:04.400
I recommend trying to archive all data that's not regularly queried. I wrote up a post called Copy Swap Drop,
00:23:12.159
which is a tactic that I used with one client where essentially we create a
00:23:17.919
cloned structure of the table, copy in a subset of the rows, and then swap the table names around.
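Sketched out, those steps might look roughly like this (a simplified illustration with a hypothetical events table; the real tactic needs care around concurrent writes, sequences, and foreign keys):

  conn = ActiveRecord::Base.connection

  # 1. Copy: clone the structure, then copy in only the rows to keep.
  conn.execute("CREATE TABLE events_new (LIKE events INCLUDING ALL)")
  conn.execute(<<~SQL)
    INSERT INTO events_new
    SELECT * FROM events
    WHERE created_at >= now() - interval '90 days'
  SQL

  # 2. Swap: rename so the trimmed table takes over for new queries.
  conn.execute("ALTER TABLE events RENAME TO events_retired")
  conn.execute("ALTER TABLE events_new RENAME TO events")

  # 3. Drop: archive events_retired (e.g. dump to cold storage), then drop it.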
00:23:24.400
We can also, if we know our data is high-growth data and would fit the characteristics of a
00:23:30.640
partitioned table well, use partition-friendly gems at the outset, or we can migrate our big tables to be
00:23:37.679
partitioned tables and make any necessary Rails code changes. And as customers leave the
00:23:44.640
platform, or we have soft-deleted data or features that we've retired, we can archive that data from our application.
00:23:52.320
I'm a fan of the open source tool called Coverband that tracks your code execution, and you can use that to help
00:23:58.240
discover features that are no longer used that could be retired and thus the data from them could be retired or
00:24:03.760
archived. And for the massive delete operations, there's a possible
00:24:10.880
benefit to migrating that table data into a partitioned table and then moving to detach operations on old
00:24:17.200
partitions that are unneeded, which is much less resource-intensive and can be done concurrently in Postgres, as an
00:24:23.279
alternative to doing delete operations. All right, last couple here. So thanks
00:24:29.360
for hanging in there. Mistake number two: missing database maintenance. This one's a little bit
00:24:35.360
more straightforward: running unsupported versions of Postgres, MySQL, or SQLite. Not monitoring or fixing
00:24:40.559
heavily fragmented tables or indexes, also referred to as bloat, essentially unneeded objects in the
00:24:47.440
database, or in this case, row data, and then row data that's captured in
00:24:52.799
index entries as well. Leaving autovacuum or other maintenance parameters untuned, just in their
00:24:58.880
stock configuration. And then not removing unneeded database objects, and in this case, objects means triggers and
00:25:04.720
constraints and other things like that. This could result in poor performance, security risks, or
00:25:09.919
worsened user experiences. So, anytime I run across tables or columns or constraints or indexes or functions,
00:25:15.919
there are lots of things, I'm trying to remove them if they're no longer needed.
00:25:21.679
So, upgrade your database. One of the Postgres community members created
00:25:27.520
this nice tool called Why Upgrade that shows your current version and your target version and all of the differences between them. That's really
00:25:33.120
nice. I recommend tuning autovacuum in Postgres for your workload. In
00:25:39.039
particular, look at your heavily updated and deleted tables, where there's a lot of updates and deletes; that's
00:25:44.240
likely where you're going to want to add more resources for autovacuum. We want to constantly prune and tune our
00:25:50.480
indexes. Our indexes are designed to support our queries, so they should be used by one or more queries, and they
00:25:56.159
should provide a meaningful reduction in latency by being scanned. So I've got a couple tools here that are useful to
00:26:02.640
help you achieve that. Drop any unneeded objects from the database. And then for heavily fragmented or bloated
00:26:09.440
tables and indexes, we can rebuild those tables and remove the bloat from
00:26:14.480
those no-longer-referenced row versions. And the same thing for indexes in Postgres: we can reindex, and we can do
00:26:20.240
that concurrently. I spoke on the Maintainable podcast about how to maintain our databases, sort of like our
00:26:25.919
applications. A lot of us are probably familiar with application maintenance tasks. We can apply some of that thinking to our databases.
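Two of those maintenance pieces as commands (a minimal sketch; the index name is hypothetical, and REINDEX CONCURRENTLY requires Postgres 12+):

  conn = ActiveRecord::Base.connection

  # Indexes that have never been scanned -- candidates for pruning.
  conn.select_all(<<~SQL).each { |r| puts r["indexrelname"] }
    SELECT indexrelname
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
  SQL

  # Rebuild a bloated index without blocking reads and writes.
  conn.execute("REINDEX INDEX CONCURRENTLY index_rides_on_driver_id")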
00:26:33.120
Okay. So, we're at the last one; got about three minutes left. If you haven't seen this term
00:26:39.440
before: I liked this term and I wanted to use it as the last one, since it brings together all of the
00:26:45.279
items in this presentation. The term is mechanical sympathy. So mechanical sympathy is when you use a tool or
00:26:51.600
system with an understanding of how it operates best. It was originally coined by a race car driver,
00:26:57.279
referring to a race car. But on the AWS Well-Architected
00:27:03.200
blog, they used it to refer to different technology systems.
00:27:09.279
In this case we're going to look at it with a database in mind. So the mistake here would be rejecting
00:27:14.960
mechanical sympathy: not learning how the database works, and then not being able to play to its strengths and do
00:27:21.039
less of where it's not as good. In Postgres, using a lot of high-
00:27:26.080
churn designs, where there's a lot of updates and deletes as part of your core design, doesn't work as well
00:27:31.520
with how tuples, MVCC, and autovacuum work. Overusing limited
00:27:37.600
resources on our database instance: CPU, memory, and I/O, from inefficient reads and writes. Inefficient
00:27:44.080
generated queries from gems, and not restructuring those or moving more to handcrafted queries. Some
00:27:50.240
that I've seen cause trouble: JSONAPI Resources, GraphQL Ruby, Active Admin. Allowing lazy
00:27:56.480
loading and N+1s, and not preventing excessively long queries or excessively long idle transactions.
00:28:05.120
To fix some of that, I recommend trying to take control of your SQL. On Active Record relations we
00:28:10.880
can use to_sql to inspect the generated SQL and then work on improvements. That
00:28:16.399
could be reducing the columns that we're selecting, removing any joins that can be removed, and adding more WHERE
00:28:22.399
filtering criteria, removing unnecessary ordering, things like that.
00:28:27.600
And then we can look at our execution plans; we can use the explain method in Active Record to do that, if
00:28:33.120
you weren't aware. We can also replace our high update- or delete-churn
00:28:39.279
designs with append-mostly designs. One example of that is the slotted counters pattern, and there's a nice Ruby
00:28:44.640
gem for that. I've got an example showing how to use that with Rideshare to replace some counting. And then in
00:28:50.080
Postgres there's a type of update called a HOT (heap-only tuple) update, which we're not going to get into, but increasing the
00:28:56.000
proportion of our updates that are HOT updates is a write-path optimization.
00:29:01.279
And then on the Rails side, we can prevent lazy loading by using the strict_loading feature, which is relatively
00:29:06.799
new. And if you're not using that, you could start by logging violations: rather than enabling
00:29:13.600
strict loading in the mode that raises exceptions, you just log the violations. And then lastly, we can
00:29:20.240
preserve our system stability by setting some limits, through different timeout parameters. We can set maximum reasonable
00:29:27.120
durations for our queries, the length of our idle transactions, and our number of connections.
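In Postgres with Rails, those limits might be set like this (a minimal sketch; the values are arbitrary examples):

  # config/database.yml -- per-connection limits via session variables.
  production:
    adapter: postgresql
    pool: 10                                      # cap app-side connections
    variables:
      statement_timeout: 5000                     # cancel queries over 5s (ms)
      idle_in_transaction_session_timeout: 10000  # end idle transactions (ms)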
00:29:34.000
On the top here, the first box is just showing enabling the log mode for strict loading violations. The second box is
00:29:39.919
showing to_sql on an Active Record query, and then the third box is showing explain with a couple arguments, analyze
00:29:45.919
and buffers, in Postgres, which will print out our execution plan and enable the buffers option. If you look down into
00:29:52.159
the execution plan, we can see that there were three buffers that were
00:29:57.679
in the buffer cache, showing us shared hits.
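Those three boxes might look roughly like this in code (a minimal sketch; the model is hypothetical, and option-taking explain requires Rails 7.1+ on Postgres):

  # Box 1: log strict_loading violations instead of raising.
  config.active_record.action_on_strict_loading_violation = :log

  # Box 2: inspect the SQL a relation will generate.
  Ride.where(driver_id: 1).order(created_at: :desc).to_sql

  # Box 3: print the execution plan with timing and buffer counts.
  Ride.where(driver_id: 1).explain(:analyze, :buffers)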
00:30:04.799
Our idea here is really to embrace mechanical sympathy: when you understand how a system is designed to be used, you can align with that design to gain
00:30:11.039
optimal performance. That's what I've got for you today. Again, you can get this
00:30:16.240
presentation at the QR code or this bit.ly address. Thanks for your time today. I hope you enjoy the rest of RailsConf.