00:00:16.880
Um, hi everybody. Thank you so much for
00:00:19.199
coming today. My talk is An Active
00:00:21.760
Record Rewrite: The Story Behind the
00:00:23.920
Attributes API. My name is Tess.
00:00:27.920
Today we have a story about refactoring
00:00:30.640
a large part of Active Record to make it
00:00:33.280
more stable.
00:00:35.440
First, we have this User class.
00:00:38.960
This is its table. It has an attribute,
00:00:42.480
id, whose type is integer.
00:00:46.320
Now we have an instance of User. We
00:00:49.120
are setting its id attribute to the string
00:00:51.840
"1".
00:00:53.360
Now let's call id on the instance of
00:00:55.520
User and see what's returned. You might
00:00:58.000
expect that calling it would
00:00:59.520
return the same thing, the string "1". I
00:01:02.000
mean, we just set it to be a string. But
00:01:04.559
we get back an integer. Its previous
00:01:07.040
type, string, was cast to integer.
00:01:11.200
This is type casting: turning the data
00:01:13.520
from one type to another and vice versa.
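A minimal sketch of that behavior in plain Ruby, not Rails' actual code. IntegerType and TinyUser are hypothetical names made up for illustration, and the cast is deliberately simplified. The point is only that the value written is a string and the value read back is an integer.

```ruby
# Hypothetical stand-in for a column type; this is not a Rails class.
class IntegerType
  # Convert raw input (often a string) into an Integer.
  def cast(value)
    value.nil? ? nil : value.to_i
  end
end

# Hypothetical model with a single integer column, id.
class TinyUser
  ID_TYPE = IntegerType.new

  def id=(value)
    @raw_id = value          # keep exactly what the caller passed in
  end

  def id
    ID_TYPE.cast(@raw_id)    # cast on the way out
  end

  def id_before_type_cast
    @raw_id
  end
end

user = TinyUser.new
user.id = "1"
p user.id                    # => 1
p user.id_before_type_cast   # => "1"
```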
00:01:17.200
When we look back at our table we can
00:01:19.200
see that the type for the id attribute
00:01:22.479
is integer. That's why, when we pass it
00:01:24.960
the string, Rails casts its type to an
00:01:28.400
integer. Now, how does Rails handle type
00:01:31.759
casting?
00:01:33.280
In Rails, the attributes API handles
00:01:35.680
type casting. Today, the
00:01:38.240
attributes API has been extracted to
00:01:40.560
Active Model, but it started off in
00:01:42.799
Active Record. For today's dive into the
00:01:45.360
attributes API, I sat down for a
00:01:47.840
conversation with its author, Sage. Um,
00:01:51.759
well, they're actually here and they're
00:01:53.119
not a plant. So, uh, Sage Griffin is my
00:01:56.479
wife and during the early years of our
00:01:58.560
marriage, they got into open source
00:02:00.640
contributing and refactored so much of
00:02:02.960
Active Record that I wanted to talk
00:02:04.799
about it. For context, this was like 10
00:02:08.879
years ago. Um, it took place in general
00:02:11.039
between 2014 and 2016. Um, the major
00:02:15.440
internals refactoring was in Rails 4.2
00:02:19.680
and then the public API was released in
00:02:22.080
Rails 5. So my first question to Sage
00:02:24.800
was how would you describe the
00:02:27.280
attributes API? And they said it's the
00:02:30.640
code that does the types.
00:02:33.920
Like, okay, pretty straightforward. Um, and
00:02:37.280
then I asked them well what is type
00:02:38.800
casting? And they said, when Active
00:02:41.599
Record receives your data, it's almost
00:02:43.760
certainly going to be as a string. You
00:02:46.080
as the developer don't want a string.
00:02:48.560
You want an integer or date or whatever
00:02:51.519
useful Ruby object that data is
00:02:53.840
representing. The attributes API and
00:02:56.640
Active Record will deal with converting
00:02:58.720
from your Ruby objects to those string
00:03:01.280
representations and vice versa.
00:03:04.480
So I asked them where did the
00:03:06.319
inspiration come from? And they said
00:03:09.040
well we were on a project and had an
00:03:11.519
attribute that needed to be encrypted.
00:03:14.000
We wanted it to be that whenever you
00:03:16.239
accessed it, you got the decrypted
00:03:18.400
version and vice versa. We were using a
00:03:21.280
gem and I looked at the gem's code to
00:03:23.519
see how it was doing that. The gem was
00:03:26.239
pretty buggy, but looking at the code,
00:03:28.080
I discovered how much of the
00:03:30.159
Rails internals it had to monkey patch
00:03:32.560
to do this work. And that's not the
00:03:34.400
gem's fault.
00:03:36.560
The gem had to monkey patch all these
00:03:38.799
different code paths. These are Rails
00:03:41.519
internals, so they are undocumented and
00:03:44.319
interact in subtle ways. This is very
00:03:46.879
difficult to get right.
00:03:49.519
Supporting multiple versions of Rails
00:03:51.680
for gem authors becomes difficult or
00:03:54.560
impossible when you're monkey patching
00:03:56.799
internals that aren't stable. It's like
00:03:59.439
building on top of shifting sand.
00:04:01.680
Because they are internal, they can
00:04:04.000
change without deprecation warnings in
00:04:06.480
between versions. Sage said, "I thought
00:04:09.200
that there had to be a better way." I
00:04:11.360
said that they sounded like an
00:04:13.200
infomercial.
00:04:14.879
They said, "Yes, exactly. I was the
00:04:17.199
person in grayscale doing things the old
00:04:19.519
hard way. I was the person who had to..."
00:04:22.240
This is a person cutting up bread with
00:04:25.600
a doorstop. And, uh, um, yeah,
00:04:30.000
infomercials are weird. Um,
00:04:32.800
there had to be a better way.
00:04:35.919
Rails should make writing this gem
00:04:38.000
easier. Gem authors shouldn't have to
00:04:40.880
monkey patch all of these undocumented
00:04:43.759
Rails internals to get type casting of
00:04:47.120
encrypted attributes to work. So how did
00:04:50.160
type casting work before? Sage said, "Well,
00:04:54.560
poorly."
00:04:56.639
Um, this was our example of type casting,
00:04:59.600
taking the string "1" and casting it to
00:05:02.000
the integer 1. Type casting has three
00:05:05.280
general areas of work that it does:
00:05:07.600
writing, reading, and saving. To show
00:05:10.720
how Active Record handled type casting
00:05:12.720
before Sage's refactor, we're going to
00:05:15.199
try to follow the code path for writing
00:05:17.360
the ID attribute. It's going to be
00:05:19.360
totally fine. Trust me.
00:05:22.479
Okay, so don't mind the module counter
00:05:26.080
here. Um, we're starting off in
00:05:28.080
write.rb, our first module. Um, write.rb takes
00:05:32.000
the attribute name and then the value,
00:05:36.639
which would be id and the string
00:05:38.800
"1". Oh, wait, hang on. Sorry. Our
00:05:41.440
write_attribute gets overridden immediately
00:05:43.600
in a different module, Dirty. So in
00:05:46.720
dirty.rb, our second module, we have
00:05:49.440
write_attribute again. Uh, we do some things,
00:05:52.080
call super. Cool. So that takes us back
00:05:55.520
to write.rb.
00:05:57.600
Now we have to resolve
00:05:59.840
type_cast_attribute_for_write. Okay, let's see its
00:06:02.400
definition. Oh wait, sorry, hang on. Um,
00:06:05.520
type_cast_attribute_for_write
00:06:08.479
also gets overridden in a
00:06:10.639
separate module, Serialization.
00:06:13.280
Um, so we're just running you
00:06:15.919
through three different modules, and we
00:06:17.919
go back to write again, and then, just,
00:06:21.520
yeah, no, it's not great. Um, to
00:06:24.960
be clear, we're not even close to being
00:06:26.639
done with writing yet, and we've already
00:06:29.039
gone through three different modules.
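The hop between modules can be sketched in plain Ruby. This is a simplified illustration, not the Rails 4.1 source: the module bodies, the Record class, and the trace array are hypothetical, and only the overall shape (the same method defined in several modules, chained with super) follows the walkthrough above.

```ruby
# Hypothetical, heavily simplified stand-ins for the modules in the walkthrough.
module Write
  def write_attribute(name, value)
    trace << "Write#write_attribute"
    @attributes[name.to_s] = type_cast_attribute_for_write(name, value)
  end

  def type_cast_attribute_for_write(name, value)
    trace << "Write#type_cast_attribute_for_write"
    value
  end
end

module Serialization
  # Overrides the method of the same name defined in Write.
  def type_cast_attribute_for_write(name, value)
    trace << "Serialization#type_cast_attribute_for_write"
    super
  end
end

module Dirty
  # Overrides the method of the same name defined in Write.
  def write_attribute(name, value)
    trace << "Dirty#write_attribute"
    super
  end
end

class Record
  include Write
  include Serialization
  include Dirty

  attr_reader :trace

  def initialize
    @attributes = {}
    @trace = []
  end
end

record = Record.new
record.write_attribute(:id, "1")
puts record.trace
```

Running it prints four entries for a single write, alternating between modules, which is the kind of back-and-forth a reader has to reconstruct by hand when the same method name lives in several files.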
00:06:32.160
This is what debugging in 4.1 felt like
00:06:35.039
when trying to debug this type casting
00:06:39.039
behavior.
00:06:40.880
So, these are just all of the modules
00:06:43.600
that could possibly contain overrides
00:06:46.160
and methods we're working with.
00:06:48.560
The logic was so spread out in these
00:06:50.400
modules, the behavior leaked everywhere.
00:06:52.639
It was way too much to hold in your
00:06:54.160
head. Sage said, "Which of the six
00:06:57.199
places that define the same method is
00:06:59.440
the one that actually has the bug?"
00:07:02.880
These methods were a nightmare to deal
00:07:04.560
with. Sage said that they were all sort
00:07:06.720
of ad hoc and spread across the modules.
00:07:10.560
You would have like six different
00:07:12.560
modules all define the same method. It
00:07:16.000
was spaghetti code, if each noodle was a
00:07:19.440
module and they're all lumped together.
00:07:22.400
Eventually you resolved to two hashes:
00:07:27.120
before type cast and after type cast. You
00:07:29.919
can see that in the before type cast hash,
00:07:33.360
the key is the string id. Sage really
00:07:35.440
wanted me to let you know that the
00:07:37.280
keys are strings. Um, that's very
00:07:40.080
important, apparently. Uh, and the value
00:07:42.240
is the string "1", and then after type
00:07:44.880
cast it's the integer 1. So it boiled
00:07:48.560
down to just two hashes but the road to
00:07:50.319
get there was pretty hard to parse.
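As a rough picture of those two hashes, here is a plain Ruby sketch. The variable names are hypothetical, and the "id" key is an assumption based on the running example; only the shape (string keys, raw value in one hash, cast value in the other) comes from the talk.

```ruby
# Hypothetical illustration of the two hashes; names are made up.
attributes_before_type_cast = { "id" => "1" }  # string key, raw string value
attributes_after_type_cast  = { "id" => 1 }    # string key, Integer value

# Because the keys are strings, a symbol lookup misses.
p attributes_after_type_cast["id"]   # => 1
p attributes_after_type_cast[:id]    # => nil
```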
00:07:54.400
This is Sage's first pull request to
00:07:56.560
Rails in the spring of 2014. There were
00:08:00.160
37 files changed. Their intention was to
00:08:04.879
just
00:08:06.400
refactor type casting behavior. The
00:08:09.039
problem was another person was already
00:08:11.280
doing this work in active record. This
00:08:14.240
is what it felt like to review,
00:08:18.319
just a bit overwhelming.
00:08:21.440
Um, the maintainer of Active Record
00:08:23.280
wrote, "Please don't add more stuff to
00:08:25.039
this PR. Please don't." Um, Sage was
00:08:28.400
only trying to refactor how type casting
00:08:30.479
worked in Active Record, but like we
00:08:32.959
talked about earlier, its behavior
00:08:34.880
leaked everywhere. This first pull
00:08:37.200
request was way too big and way too
00:08:39.200
complex.
00:08:41.039
Soon after, Sage was invited to Basecamp,
00:08:43.200
where the maintainers of Rails
00:08:44.880
chatted about working on the project.
00:08:46.959
Sage worked through that summer in 2014
00:08:49.600
and into the fall making small refactor
00:08:52.080
after small refactor.
00:08:54.800
And I asked Sage, what was their
00:08:57.200
north star? What were they trying to
00:08:59.680
refactor all this leaky code into? And
00:09:02.959
they said that they were refactoring to
00:09:05.680
something that managed state internally
00:09:08.959
in a way that you would expect in Ruby.
00:09:11.920
This means that instead of code spread
00:09:14.160
out all over the place, they wanted
00:09:16.399
objects, instance variables and methods
00:09:18.959
that encapsulated the behavior.
00:09:22.000
So remember our user example?
00:09:25.200
In the refactor, you have the new
00:09:28.160
attribute object and new type objects.
00:09:31.440
You can see here in from_user we're
00:09:33.839
taking the value, which is the string "1",
00:09:36.320
and this new type object, the type
00:09:40.080
ActiveModel::Type::Integer. The attribute object
00:09:43.920
holds the state and then it asks the
00:09:47.519
type object given to it how to do the
00:09:50.399
conversion to and from its type.
00:09:54.080
Put simply, the spaghetti code got put
00:09:56.240
into a box. The box being the attribute
00:09:59.120
object.
00:10:01.200
The attribute object can now ask, "Hey,
00:10:03.519
DateTime, turn my string into one of
00:10:05.760
you." These type objects now hold the
00:10:08.640
logic for casting, and the
00:10:11.680
attribute object delegates to them.
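The split between an attribute object that holds state and a type object that knows how to cast can be sketched in plain Ruby. This is an illustration under assumptions, not the Rails implementation: the Attribute, IntegerType, and DateType classes below are hypothetical stand-ins written for this example, and only the from_user name and the delegation idea come from the talk.

```ruby
require "date"

# Hypothetical type objects: each one only knows how to cast to its own type.
class IntegerType
  def cast(value)
    value.nil? ? nil : value.to_i
  end
end

class DateType
  def cast(value)
    value.is_a?(String) ? Date.parse(value) : value
  end
end

# Hypothetical attribute object: holds the raw value and delegates casting.
class Attribute
  attr_reader :name, :value_before_type_cast

  def self.from_user(name, value, type)
    new(name, value, type)
  end

  def initialize(name, value_before_type_cast, type)
    @name = name
    @value_before_type_cast = value_before_type_cast
    @type = type
  end

  # Ask the type object to do the conversion, once, and remember the result.
  def value
    return @value if defined?(@value)
    @value = @type.cast(@value_before_type_cast)
  end
end

id = Attribute.from_user("id", "1", IntegerType.new)
p id.value_before_type_cast   # => "1"
p id.value                    # => 1

born_on = Attribute.from_user("born_on", "2014-05-01", DateType.new)
p born_on.value.year          # => 2014
```

With this shape, a wrong cast points at one place to look: the type object that was handed to the attribute.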
00:10:15.120
Debugging went from digging through
00:10:17.120
spaghetti to, "Oh, we probably just used
00:10:19.839
the wrong type object. Cool."
00:10:22.880
Sage said that the code got
00:10:24.720
rearchitected into this more generic
00:10:27.120
way. Entire classes of bugs became just
00:10:30.000
impossible.
00:10:31.839
So for example, this one relatively
00:10:34.399
small PR just closed all of these issues
00:10:37.839
just by itself from this one refactor.
00:10:41.120
In another example of bug fixing, Sage
00:10:43.839
mentions how this feature just like
00:10:45.839
works now. So, if you were upgrading
00:10:49.200
your Rails project from 4.1 to 4.2 or
00:10:53.120
the beta for 5.0, you might have just
00:10:55.920
started having things work that you
00:10:57.360
didn't even know were broken.
00:11:00.160
I asked Sage, "What do you think was the
00:11:02.720
hardest trade-off you had to make in
00:11:04.160
this refactor?"
00:11:06.000
Initially, making the code maintainable
00:11:08.880
came at a performance cost.
00:11:11.839
So I previously mentioned how the
00:11:13.760
initial code resolved to two hashes. The
00:11:16.720
work in memory of allocating two hashes
00:11:19.200
is computationally small.
00:11:21.920
Depending on how many records you are
00:11:23.680
reading from the database, these
00:11:25.360
allocations can add up pretty quickly
00:11:27.440
when you're allocating Ruby objects
00:11:29.760
instead of just two hashes.
00:11:32.480
Allocating a lot of Ruby objects is
00:11:34.480
objectively going to do more work than
00:11:37.120
two hashes. So how many are we talking
00:11:40.000
about? So it's one attribute per column
00:11:43.839
per record. In the real world, your user
00:11:46.800
table is going to have a lot more than
00:11:48.480
one attribute. So let's say we have a
00:11:52.160
users table
00:11:54.480
and we have an admin page and we're
00:11:57.120
displaying 20 users and let's say you as
00:12:00.959
a developer are making a query and
00:12:02.880
you're showing three attributes but
00:12:05.040
maybe you don't select just those three.
00:12:07.279
Active Record will grab all of them. Our
00:12:09.760
user has 40 columns.
00:12:12.560
So 20 users times 40 columns is 800 Ruby
00:12:16.639
allocations, which is objectively a lot
00:12:20.240
more than two hashes.
00:12:22.880
Um, multiply this by all of the
00:12:25.040
different records you're having to query
00:12:26.399
for. This adds up quickly.
00:12:30.720
So one of the low-hanging fruits for
00:12:32.800
performance gains was, instead of
00:12:34.399
allocating all the objects, changing the
00:12:37.040
code to only instantiate the objects
00:12:39.600
as needed.
00:12:41.519
For example, this would reduce our
00:12:43.440
example from 800 Ruby allocations to 60,
00:12:47.600
which is a lot better. This is that PR.
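The 800 versus 60 arithmetic can be checked with a small plain Ruby sketch. Everything here is hypothetical (CountingAttribute, EagerRow, LazyRow, and the column names are made up), and it assumes the 60 comes from 20 users times the 3 attributes actually read. It is not how Rails implements lazy attributes, only an illustration of instantiating objects on first access.

```ruby
# Hypothetical attribute object that counts how many times it is instantiated.
class CountingAttribute
  @allocations = 0
  class << self
    attr_accessor :allocations
  end

  attr_reader :name, :raw

  def initialize(name, raw)
    self.class.allocations += 1
    @name = name
    @raw = raw
  end
end

# Eager: build one attribute object per column as soon as the row is loaded.
class EagerRow
  def initialize(raw_row)
    @attributes = raw_row.to_h { |name, raw| [name, CountingAttribute.new(name, raw)] }
  end

  def [](name)
    @attributes.fetch(name)
  end
end

# Lazy: keep the raw row and build an attribute object only on first access.
class LazyRow
  def initialize(raw_row)
    @raw_row = raw_row
    @attributes = {}
  end

  def [](name)
    @attributes[name] ||= CountingAttribute.new(name, @raw_row.fetch(name))
  end
end

# Load 20 rows of 40 columns each and read 3 columns from every row.
def allocations_for(row_class)
  CountingAttribute.allocations = 0
  columns = (1..40).map { |i| "col#{i}" }
  20.times do
    row = row_class.new(columns.to_h { |column| [column, "value"] })
    %w[col1 col2 col3].each { |column| row[column] }
  end
  CountingAttribute.allocations
end

p allocations_for(EagerRow)   # => 800
p allocations_for(LazyRow)    # => 60
```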
00:12:50.959
Sage said, "I remember spending
00:12:53.120
several weeks just working on
00:12:54.480
performance," and they clawed it back to the
00:12:56.720
point where they don't think that 4.2
00:12:58.720
ended up having significant performance
00:13:00.959
regressions. And they said that
00:13:04.959
working on open source projects was hard
00:13:07.680
because Rails is depended on by a lot
00:13:10.720
of different
00:13:12.720
users, and it's tricky
00:13:15.040
because there are various benchmarks
00:13:16.560
that they can run, but every application
00:13:18.720
is going to be different, so covering all
00:13:21.360
of those is really hard. And so I'm
00:13:24.399
pretty sure at one of the RailsConfs
00:13:27.200
before this was released, they had the
00:13:29.120
beta out and Sage was literally
00:13:30.720
running around telling everybody, "Please try the
00:13:32.800
beta. I will pair with you if your app
00:13:35.920
gets slower," and they ended up pairing
00:13:37.920
with like a lot of people.
00:13:41.920
Um, I asked Sage for their thoughts on
00:13:44.000
the importance of maintainability.
00:13:46.560
They said, "I'm of the opinion that
00:13:49.120
if the code is buggy and unmaintainable,
00:13:51.760
it doesn't matter how performant it is.
00:13:54.160
If you're doing the wrong thing fast,
00:13:56.160
you're still doing the wrong thing."
00:13:59.600
Code that is hard to follow will attract
00:14:02.160
more bugs. When you make the code easy
00:14:04.880
to reason about, you make it more
00:14:06.639
stable. In open source projects, you
00:14:09.519
have a lot of hands touching the
00:14:11.680
project. And the more hands, the more
00:14:14.720
bugs it'll attract. So in
00:14:17.600
general, Sage
00:14:19.519
said you should try to leave the code
00:14:21.680
better than how you found it, for
00:14:24.000
yourself and for other people, and other
00:14:26.480
people includes you six months from now,
00:14:29.279
when you do git blame
00:14:31.279
and you're like, "Who wrote this? Oh, it
00:14:33.360
was me. I'm the problem, it's me."
00:14:37.920
Um
00:14:40.240
Sage is a big proponent of commit
00:14:42.560
messages as documentation.
00:14:45.040
Sage did this a lot by making
00:14:47.920
their commit messages have all of the
00:14:50.079
context that they were thinking of at
00:14:51.760
the time that they wrote their pull
00:14:53.360
requests.
00:14:54.880
Um, this isn't like an example of that.
00:14:57.920
This is like way too large to fit on a
00:15:00.560
screen. Uh, but famously they had a code
00:15:04.000
change that was like two lines and it
00:15:06.399
was like 19 paragraphs. So the ratio
00:15:11.279
of context did not always meet the code,
00:15:15.279
but writing a talk like this was really
00:15:18.240
only possible given the context that past Sage
00:15:21.199
left behind at the time as
00:15:22.959
breadcrumbs.
00:15:25.279
Finally, I asked Sage for their thoughts
00:15:27.040
about contributing to open source
00:15:28.639
projects, especially ones like
00:15:30.800
Rails, but they had contributed to other
00:15:32.480
open source projects such as crates.io.
00:15:36.959
For context, Sage contributed a ton to Rails
00:15:39.680
over the years that this refactor
00:15:41.279
took place. I took the screenshot
00:15:43.360
recently and even though they haven't
00:15:45.600
contributed in a while, they're still
00:15:47.279
number 15 of all time. So, I'm pretty
00:15:50.160
proud of that for them.
00:15:52.720
Um, so stability is essential,
00:15:57.120
especially for open source projects that
00:15:59.279
a lot of people use. You don't want
00:16:02.000
upgrading Rails versions to be painful.
00:16:04.240
I think we've heard quite a few people
00:16:06.160
at this talk looking back on Rails
00:16:08.480
mentioning how painful it was to upgrade
00:16:11.440
projects from 2 to 3. And
00:16:14.800
around the time 5 came out, the core
00:16:17.120
team started taking stability much more
00:16:19.680
seriously than in previous versions. If
00:16:22.560
your public APIs aren't stable and
00:16:25.199
upgrading is painful, people tend to
00:16:27.680
stay on the old versions. Open source
00:16:30.160
maintainers want people to upgrade. If
00:16:32.399
there's an important security fix, you
00:16:34.399
don't want to have to go back to older
00:16:36.320
versions and fix it when like you've
00:16:39.279
already fixed it in the current version.
00:16:43.199
And then I asked Sage, did they have any
00:16:46.079
tips for how to be a
00:16:47.839
helpful contributor? And they said,
00:16:51.040
"Please don't just put an open issue into
00:16:54.079
Claude and then just open a PR with
00:16:56.320
the results. It's gonna get closed.
00:16:59.279
I'm sorry."
00:17:01.279
Um, please read the contribution guide.
00:17:03.839
They've written it for a reason and
00:17:05.839
they're usually pretty great to help you
00:17:07.839
get started.
00:17:09.919
Contribute how the project would like
00:17:11.600
you to contribute because oftentimes
00:17:13.520
contributing to open source can be
00:17:15.760
things like triaging issues and
00:17:18.079
reproducing issues. Um, people working
00:17:20.959
on open source projects are just people.
00:17:23.600
So, it's really good to just talk to
00:17:25.280
them if you're interested.
00:17:28.000
Don't be like Sage. Don't come out of
00:17:30.160
nowhere with a giant pull request and
00:17:32.400
expect it to be merged because it
00:17:34.720
probably won't. Thank you, Ruby.
00:17:38.720
Write code that's meant to be read by
00:17:40.480
another human, because it will be. Right.
00:17:43.919
Now, um,
00:17:46.880
shout out to our friend's project, uh,
00:17:49.600
CodeTriage. You can sign up for it, pick
00:17:52.240
some open source projects and it will
00:17:54.000
send you issues that are beginner
00:17:56.160
friendly to work on if you want to get
00:17:57.600
started contributing to open source.
00:18:00.240
So I asked Sage if they had any final
00:18:02.240
thoughts and they said that they think
00:18:04.799
the most impactful work was refactoring
00:18:07.200
Active Record to make it easier to use
00:18:09.600
and they spent years of their life
00:18:11.760
working on this and I think it's really
00:18:13.760
cool to be able to celebrate that and
00:18:16.480
making the code more maintainable at the
00:18:19.039
last RailsConf. So I'm currently at
00:18:22.559
thoughtbot. Um, you can find me, Tess, at
00:18:25.520
thoughtbot. Um, I just want to thank
00:18:27.520
them for letting me come here today and
00:18:29.360
talk about this. It feels full circle
00:18:31.280
cuz when Sage, uh, actually worked on
00:18:34.080
this, uh, they actually worked at
00:18:35.360
thoughtbot and they used thoughtbot's open
00:18:38.720
source, um, their investment time to
00:18:40.880
actually do this rewrite and so it feels
00:18:43.120
really special. So, thank you everybody.