I'm Alex Cumins, a software engineer on the infrastructure team at Persona. For the past several years, I've been part of the team responsible for scaling and evolving the architecture behind our identity platform. These days, I help design and maintain the globally distributed, multi-cluster Kubernetes setup we operate today. Although I wasn't around for the very first commits by our founders, Rick and Charles, I know the challenges they faced, like many of y'all, in taking this from an initial idea to a fully functional Rails application and product. But before we dive in, let me first give you a bit of context on who we are at Persona and the kind of problems our platform is built to solve. Persona is an all-in-one identity platform: think onboarding, compliance, fraud prevention. We power the behind-the-scenes workflows that ensure individuals and businesses are who they say they are. Our platform's flexibility is what makes us stand out and applicable to so many different use cases.
We work with customers across a wide range of industries: fintech, healthcare, marketplaces, crypto, travel, government, and more. And each one has their own unique set of compliance requirements, user flows, risk tolerances, and regional regulations. That means almost every customer uses our platform in a slightly different way, and as such, we've had to architect for adaptability, a choice that has influenced almost every part of our stack. And with that context, let's rewind the clock and walk through how our Rails architecture has had to evolve to support that level of flexibility, scale, and diversity of use.
Persona started the same way so many other companies have: with an idea and a command, one we've all used to bootstrap our grand visions. But that simple command comes with a plethora of options behind the scenes, some of which are innocuous early on but carry deep implications for how your system will scale, evolve, and operate years down the line. In our case, we launched on Google App Engine, which gave us just enough infrastructure to move fast without worrying too much about provisioning or deployments. If you're not familiar, it's a fully managed platform that abstracts away most of the operational complexity, similar to Heroku or a lightweight subset of what Kubernetes offers out of the box.
We started on Rails 5.2, well into the maturity of many Rails features. And given that was a little over seven years ago, here's a refresher on some of the major features that launched then. We'll come back to a few of these and how we manage scaling with them. But first, let's talk about one that hit early and often: the asset pipeline.
Let's time travel back to 2010, a time when jQuery was your best friend. These are screenshots of some actual code I wrote in a Rails 2.3 application. You'd manually include JavaScript files in your templates. And yes, your actual JavaScript logic would often live right in your view templates, tightly coupled to the markup it was enhancing, a pattern that's come back in style. In 2011, Rails 3.1 introduced the integrated asset pipeline, and it was a game-changer. It gave us a structured way to organize, bundle, and minify assets. In the lower example, we include a file named user_tabs.js to be executed on the page. But what happens when that file changes? We generally want browsers to cache script content for performance reasons, but if the file name stays the same, users might keep getting the old version even after we've deployed new code. The asset pipeline solved this with fingerprinting: appending a unique hash to the file name. With a new file name, the browser requests the file, giving us the best of both worlds: long-term caching when things don't change and instant updates when they do.
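For anyone who hasn't touched Sprockets in a while, here's a minimal sketch of what that looks like in practice; the user_tabs name comes from the example above, and the fingerprint hash shown is purely illustrative.

```erb
<%# app/views/users/show.html.erb: the helper resolves the logical asset name %>
<%= javascript_include_tag "user_tabs" %>

<%# Rendered output: the fingerprint changes whenever the file's contents change,
    so browsers can cache aggressively and still pick up new deploys immediately.
    <script src="/assets/user_tabs-2f5173db2b9d.js"></script> %>
```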
The integrated asset pipeline has evolved significantly, but it originally meant Sprockets, with defaults of CoffeeScript and Sass, languages that compile to JavaScript and CSS, respectively. And of course, as we already saw, a healthy dose of jQuery, which was the go-to solution for modern front-end interactivity at the time. Options for what we now consider a full-fledged front-end framework were limited. The first release of what would eventually become Ember came later in 2011, led by Yehuda Katz, a member of the Ruby on Rails core team. The front-end landscape quickly exploded in the following years: new frameworks, modern JavaScript features (many of which were influenced by CoffeeScript), richer UIs, and rising expectations from users and product teams alike. Ember, along with other fledgling frameworks like AngularJS and React, shifted more of the UI into the browser, and with that, Rails increasingly took on the role of an API provider rather than a full-page renderer.
With that explosion, the other major responsibility left on Rails was coordinating with a front-end build system in the asset pipeline. That resulted in the Webpacker gem, a wrapper around the npm webpack package, which was added as an option with Rails 5.1 in 2017 and became the default with Rails 6.0 in 2019. Persona started squarely in the middle of that evolution, starting with React and TypeScript through Webpacker. Webpacker set out to connect Rails with modern front-end tooling, and for a while it did. But as complexity grew, we started to see the cost: hard-to-debug configurations, slow feedback loops, and lagging support for emerging tools. The breakneck pace of front-end innovation made it nearly impossible for Webpacker to keep up, turning what was meant to be a bridge into a constant game of catch-up, all the while struggling to reconcile Rails's opinionated defaults and focus on quick implementation with the extensibility and configurability of modern build systems. In the end, we moved from Webpacker to Shakapacker, and now we've adopted Vite, a modern native JavaScript build tool that's fast, flexible, and designed for today's front-end workflows. And that's been a recurring theme for us: Rails gives you great defaults, but you're not locked in. When the built-in tools no longer fit your scale, your team, or your workflows, it's okay to step outside the box and bring in what works.
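To make the end state concrete, here's a minimal sketch of what the view layer looks like with the vite_rails gem; the helper names come from that gem, while the entrypoint name is just an illustration, not our actual configuration.

```erb
<%# app/views/layouts/application.html.erb %>
<%= vite_client_tag %>                    <%# enables Vite's dev server / HMR in development %>
<%= vite_javascript_tag "application" %>  <%# points at an entrypoint such as app/frontend/entrypoints/application.ts %>
```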
As we started to grow, we started to hit the natural limits of Google App Engine. It had served us well during our early days, giving us speed and simplicity when we needed it most. But eventually, the trade-offs became too hard to ignore. We needed more flexibility around how we structured services, or really, just to easily deploy multiple services at all. To set the stage, picture what scaling looked like in our early days: a service would spike, an alert would fire, and someone would jump in and manually scale things up or down. This is a slide from one of our all-hands meetings in April of 2020, showing five straight days of manual scaling operations, scaling services up and down, sometimes multiple times a day, just to keep things running smoothly. That moment, looking at this slide and realizing how much of our energy was going into just keeping the lights on, was a clear signal it was time to grow into something more sustainable.
So, we knew we had to move off of Google App Engine. But what options were actually viable? The most basic option would be to run raw virtual machines, maybe wrapped in Terraform and managed with Ansible or some other homegrown tooling. Technically, that probably would have worked, but it would have meant taking on a ton of operational complexity ourselves, solving problems that much more mature tools had already solved. Another option was to leave GCP entirely and move to AWS or Azure for a different platform as a service. But realistically, that wouldn't have guaranteed a solution to any of our core issues, and it would have added a massive migration on top of an already complex problem. After weighing the options, we decided on something that gave us the control we needed without starting from scratch: Kubernetes via GKE. GKE is Google Cloud's managed Kubernetes offering. It handles the heavy lifting of cluster provisioning, upgrades, and node management while still giving us substantial operational control.
Making the jump from Google App Engine to Kubernetes wasn't just a change in deployment systems. It was a fundamental shift in how we thought about infrastructure. App Engine handled most of the heavy lifting for us: provisioning, scaling, networking, even deployment, all abstracted behind a few CLI commands. But that simplicity came at the cost of control. Migrating to Kubernetes gave us flexibility, observability, and granular control, but required a maturity leap in tooling and practices, because it asked us to take ownership of every part of the stack. From networking and observability to deploy workflows and access control, we suddenly had a lot more flexibility and a lot more responsibility. Let's take a side-by-side look at how each platform handled the key components of our infrastructure and what changed when we made the switch. On App Engine, you simply push code and Google takes care of the compute; no servers to provision or orchestration tools to configure. In Kubernetes, you manage the full life cycle of containers and the nodes they run on. Depending on the cloud vendor, or on-prem, that can vary from relatively easy with managed services like EKS and GKE to fully hands-on if you're running your own control plane and node infrastructure.
App Engine also doesn't support GPUs or other specialized compute resources, which have become increasingly common as modern workloads have exploded in popularity and utility. Those now power critical parts of Persona's platform, like document analysis, biometric matching, and real-time image processing.
When it comes to controlling your application's scaling behavior, App Engine allows you to define targets for CPU utilization and concurrent requests, but that's about it. Kubernetes, on the other hand, gives you fine-grained control, with the ability to look at both system and custom metrics, in addition to being able to scale both horizontally and vertically. It even supports custom scaling logic through integrations with external metrics APIs and other controllers, making it highly extensible, whether you're scaling based on queue depth, request latency, webhooks, or any other signal relevant to your application. As we alluded to earlier, this flexibility was a key driver in our migration to Kubernetes. App Engine's scaling model is heavily geared towards request-response web traffic, and it didn't handle background job processing, like what we do with Sidekiq, very well. We needed more control over how and when workers scaled, especially under our very bursty workloads, and Kubernetes gave us the tools to do that. I'll be honest, though: our move to Kubernetes didn't immediately eliminate the manual scaling. It took time, experience, and a bit of patience to craft horizontal pod autoscalers that met our needs. But once we got there, it changed the game. The system finally started working with us, not waiting for us to catch up.
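As an illustration of the kind of signal that matters for Sidekiq-style workloads, here's a minimal Ruby sketch, not our actual setup, that reads queue depth and latency from Sidekiq's API so that something like a custom-metrics-fed autoscaler or KEDA could scale workers on job pressure instead of CPU; the queue names are made up.

```ruby
require "sidekiq/api"

# Queue names below are illustrative; swap in whatever your workers consume.
QUEUES = %w[default critical low].freeze

def sidekiq_scaling_metrics
  QUEUES.map do |name|
    queue = Sidekiq::Queue.new(name)
    {
      queue: name,
      depth: queue.size,              # jobs currently enqueued
      latency_seconds: queue.latency  # age of the oldest job waiting in the queue
    }
  end
end

# Emit in whatever format your metrics pipeline expects; plain text for brevity.
sidekiq_scaling_metrics.each do |metric|
  puts format("sidekiq_queue_depth{queue=%p} %d", metric[:queue], metric[:depth])
  puts format("sidekiq_queue_latency_seconds{queue=%p} %.1f", metric[:queue], metric[:latency_seconds])
end
```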
Networking, like compute, is fully managed by App Engine. There are no load balancers to configure unless you'd explicitly like to do so, and simply deploying your application gives you a publicly accessible endpoint out of the box. With Kubernetes, you're empowered with services, ingresses, gateways, and a host of related objects and configuration knobs. You can build complex load balancing strategies with traffic routed across multiple services, paths, or backends, all without leaving the Kubernetes ecosystem. But with that power comes responsibility. You now have to manage DNS, TLS, health checks, firewall rules, and more, all of which can add operational overhead if not carefully designed and properly configured. It's not uncommon to see deployments missing critical pieces like readiness probes or ingress annotations, leading to flaky traffic routing, failed rollouts, or subtle production issues.
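On the Rails side, the minimum those probes need is an endpoint that tells Kubernetes whether the app can take traffic. A rough sketch, assuming illustrative paths rather than anything Persona actually runs (newer Rails versions also ship a built-in `rails/health#show` route at `/up`):

```ruby
# config/routes.rb
Rails.application.routes.draw do
  # Liveness: the process is up and can serve a trivial response.
  get "/healthz", to: ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }

  # Readiness: only report ready once critical dependencies respond,
  # so Kubernetes won't route traffic to a pod that can't reach the database.
  get "/readyz", to: lambda { |_env|
    begin
      ActiveRecord::Base.connection.execute("SELECT 1")
      [200, { "content-type" => "text/plain" }, ["ready"]]
    rescue StandardError
      [503, { "content-type" => "text/plain" }, ["not ready"]]
    end
  }
end
```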
App Engine makes observability effortless. Logs and metrics are automatically captured and integrated into Google Cloud's monitoring tools with minimal setup. It's simple, consistent, and good enough for many use cases right out of the box. In contrast, Kubernetes gives you a blank slate. You have the freedom to plug in managed observability platforms like Datadog or build out your own stack with tools like Prometheus. That freedom is powerful, but it also means you're responsible for wiring it all together, deciding what to measure, and making sure nothing falls through the cracks.
So, while the move to Kubernetes gave us the control and flexibility we needed to scale our infrastructure, it also came with new complexity that we had to learn to manage carefully. Our deployment infrastructure wasn't the only thing that had to evolve. As our usage grew, one of the next places we felt real pressure was in our database layer. As our product and customer base evolved, so did our data, in volume, structure, and complexity.
In mid-2022, we began sharding our application to address the growing pressure on our primary MySQL cluster, starting by adding a second shard in the same compute location. And just to keep things interesting, we kicked off work at the same time to add a third shard, this time in Europe, driven by data residency requirements that called for isolating customer data within specific jurisdictions. In the span of just six months, we went from one database and one Kubernetes cluster in one region to three shards across two regions and an additional Kubernetes cluster to support it all.
Rather than relying upon a single database, we use a combination of MySQL and MongoDB as our primary data stores, along with Elasticsearch for search and indexing workloads and Redis for caching, Sidekiq queues, and other ephemeral data. Each of these systems brings its own strengths and its own operational challenges, especially in cloud-managed environments. Choosing the right one is only half the battle. Scaling, tuning, and managing them in production is where the real work begins. While MongoDB offers native support for sharding, making horizontal scaling more straightforward, MySQL posed a much harder challenge. Sharding our relational data meant untangling assumptions deeply buried in our application code and schema. And that's where we'll focus next.
Rails has only recently started offering official support for sharding, but applications have been hacking around that limitation for years. Rails 6.0 added support for configurable database connections by model, allowing applications to route specific models, or even reads versus writes, to different database instances using the connects_to and connected_to APIs. This is effectively what's known as vertical sharding, where you split entire tables or domains across databases. One database might handle user data, another might handle payments or audit logs. It's relatively straightforward because the location of each type of data is fixed; you always know which database to query based on the model. Importantly, though, this laid the groundwork for what we typically think of when we say sharding: horizontal sharding, splitting rows of the same table across multiple databases, each holding a different slice of the data but sharing the same schema.
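As a rough sketch of that vertical split (the database and model names here are illustrative, not Persona's actual schema), the Rails 6.0 API looks like this:

```ruby
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  connects_to database: { writing: :primary, reading: :primary_replica }
end

# Payment-related tables live on their own database instance.
class PaymentsRecord < ApplicationRecord
  self.abstract_class = true
  connects_to database: { writing: :payments, reading: :payments_replica }
end

class Payment < PaymentsRecord
end

# Reads can be routed to the replica role explicitly:
ActiveRecord::Base.connected_to(role: :reading) do
  Payment.where(status: "settled").count
end
```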
Long awaited, Rails 6.1 introduced native horizontal sharding. It was finally relatively easy to support multiple shards of the same model in your application. When we set out to shard our MySQL cluster, we weren't starting with a clean slate. We were adapting a growing, rapidly changing Rails application to a pattern the framework had just recently begun to support, and as you can imagine, that came with its own set of surprises.
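For reference, here's a minimal sketch of the Rails 6.1 horizontal sharding setup; the shard names and the Account model are made up, but the shape is the same: the same schema on every shard, selected per request or per job.

```ruby
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    default:   { writing: :primary,   reading: :primary_replica },
    shard_two: { writing: :shard_two, reading: :shard_two_replica },
    shard_eu:  { writing: :shard_eu,  reading: :shard_eu_replica }
  }
end

# Every query inside the block runs against the chosen shard.
ActiveRecord::Base.connected_to(shard: :shard_eu, role: :writing) do
  Account.find_by!(token: "act_123").touch(:last_seen_at)
end
```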
Rails's connected_to helper is an essential building block for sharding. It lets you swap the database connection on the fly based on context. Think of it like a railway system: each shard is a different destination, and connected_to is the track switch. Before the train (your request, job, or rake task) leaves the station, you need to flip the switch to send it down the right track. If you forget, or flip the wrong one, your data ends up at the wrong terminal, or worse, on a collision course with something else. And just like in a real railway system, you can't expect the train to figure it out mid-route. This context has to be set up front. In practice, this means your codebase needs to use that building block everywhere you're interacting with data. No small feat, even in a midsize Rails application.
Threading shard context through an app isn't necessarily hard in isolated cases, but it requires discipline and consistency. For job processing, it's relatively straightforward, since you've already looked up the shard by querying the record. In our case, we added a query parameter to the GlobalIDs of objects passed to jobs which indicated the correct shard, allowing the job to reconnect to the right database when it runs.
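A rough sketch of that pattern, assuming hypothetical job and model names rather than our actual code: GlobalID lets you attach extra params to the URI, and the job reads the shard back out before it loads anything.

```ruby
class VerifyDocumentJob < ApplicationJob
  def perform(gid_string)
    gid = GlobalID.parse(gid_string)
    shard = (gid.params["shard"] || "default").to_sym

    # Switch to the right database before touching the record.
    ActiveRecord::Base.connected_to(shard: shard, role: :writing) do
      document = GlobalID::Locator.locate(gid)
      document.verify! # illustrative domain method
    end
  end
end

# Enqueueing: the caller already knows which shard it loaded the record from.
gid = document.to_global_id(shard: ActiveRecord::Base.current_shard.to_s)
VerifyDocumentJob.perform_later(gid.to_s)
```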
For web requests, though, it's a bit trickier. You're now forced to rethink what information is required to route a request and make sure that shard context is both available and trustworthy by the time the controller code runs. Take a public API as an example. You're probably identifying requests with an API key. That becomes your routing key, a piece of context that tells you which shard the request should go to. But that's only half of the equation. You now need some kind of directory or lookup table, a centralized way to map that routing key (API key, object token) to the correct shard. Notice the find_shard call in this example from our application. Without that layer of indirection, you're left hard-coding assumptions into your app, and that just doesn't scale. And here's where things get even more interesting. That lookup table might need to scale far more than you'd initially expect. In many cases, you're supporting APIs with unchangeable contracts. Maybe they're embedded in physical hardware, IoT devices, or distributed SDKs in mobile apps that can't be easily updated. That means every request to your platform, even the very first one, needs to hit the right shard, with no opportunity for client-side logic to help.
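Stitched together, the request path looks roughly like this sketch; the find_shard name echoes the call on the slide, but the directory model and key handling here are assumptions, not our actual implementation.

```ruby
class Api::BaseController < ActionController::API
  around_action :route_to_shard

  private

  def route_to_shard(&block)
    shard = find_shard(request.headers["Authorization"])
    ActiveRecord::Base.connected_to(shard: shard, role: :writing, &block)
  end

  def find_shard(api_key)
    # Centralized directory mapping routing keys (API keys, object tokens) to shards.
    entry = ShardDirectory.lookup(api_key) # hypothetical lookup-table client
    raise ActionController::BadRequest if entry.nil?

    entry.shard_name.to_sym
  end
end
```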
As a result, what looks like a simple lookup turns into a high-throughput, low-latency critical code path, one that needs to be highly available, globally accessible, and fast enough to sit in front of any user-facing request. At Persona, we solved this by backing our lookup table with MongoDB, which makes it very easy to support globally distributed read replicas with minimal operational overhead. That allowed us to serve shard routing lookups close to the user no matter where the request is processed. Because the routing logic sits in the critical path of almost every request, especially in our public and SDK-facing APIs, having that data available fast and everywhere was non-negotiable. As of last week, we had just over a billion entries in that lookup table.
Long before we even needed to shard, we were already feeling the pressure of working with large MySQL tables. As usage grew, certain tables ballooned in size, and that brought a new class of problems: slow queries, painful migrations, unpredictable query plans, and operational risk from even simple schema changes. And while sharding helps you scale horizontally, it doesn't eliminate those problems. In fact, it can make them even harder to manage. Now you're not just maintaining one large table. You're maintaining that same large table across multiple shards. Every schema change, every index tweak, and every performance fix now has to be repeated across N databases.
Schema changes on large MySQL tables can be deceptively dangerous. By default, operations like ADD COLUMN, MODIFY, or DROP INDEX can lock the table, block reads or writes, and introduce unexpected performance regressions, especially if that table is in the critical path. For a long time, tools like Percona's pt-online-schema-change and LHM, the Large Hadron Migrator originally open-sourced by SoundCloud and now maintained by Shopify, have sought to bridge that gap. For extremely large tables, though, that can result in migrations taking weeks, potentially stalling the work of other engineers, or changing query planning results and slowing down unrelated parts of your application. Recent versions of MySQL, particularly 8.0, support more instant DDL operations, like adding and removing some columns or modifying default values, without requiring full table rebuilds.
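One way to take advantage of that, sketched here with an illustrative table and column rather than anything from our schema, is to request the algorithm explicitly so the migration fails fast instead of silently rebuilding (and locking) a multi-billion-row table:

```ruby
class AddReviewedAtToDocuments < ActiveRecord::Migration[7.0]
  def up
    # MySQL 8.0 applies ADD COLUMN as an instant metadata change when allowed;
    # if INSTANT isn't possible, this statement errors instead of falling back
    # to a long, locking table copy.
    execute <<~SQL
      ALTER TABLE documents
        ADD COLUMN reviewed_at DATETIME NULL,
        ALGORITHM=INSTANT
    SQL
  end

  def down
    remove_column :documents, :reviewed_at
  end
end
```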
That still leaves things like index creation requiring full rebuilds, so that's where access pattern design becomes essential. One of the biggest challenges with large MySQL tables is that small inefficiencies at scale really start to hurt. A missing index, a poorly chosen primary key, or an unexpected query plan might be invisible with 100,000 rows, but with 100 million, it becomes a problem you can't ignore. It's not enough to model your schema around the shape of your data. You have to model around how that data will be queried, filtered, and joined in real application usage. If you've worked with NoSQL systems like DynamoDB or MongoDB, this mindset may already be familiar: you have to design your schema around your queries, not your entities. In relational databases, that kind of upfront thinking is often overlooked, partly because you can get away with it, especially early on, and it lets you build faster. But as tables grow and usage scales, those early shortcuts start turning into real pain.
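As a small, hypothetical example of designing for the query rather than the entity: if the hot path is "recent records for an account", a composite index that covers both the filter and the sort keeps the plan stable as the table grows, where an index on the foreign key alone would not. The table and column names below are made up.

```ruby
class AddAccountRecencyIndexToVerifications < ActiveRecord::Migration[7.0]
  def change
    add_index :verifications, [:account_id, :created_at],
              name: "index_verifications_on_account_and_recency"
  end
end

# The access pattern the index is shaped around:
Verification.where(account_id: account.id)
            .order(created_at: :desc)
            .limit(50)
```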
Large tables also introduce challenges when teams need to run backfills. Not just because they take a long time, but because they can unintentionally impact performance. Depending on how the backfill is executed, it can evict hot pages from the MySQL buffer pool, alter index statistics, or disrupt caching behavior, all of which can degrade query performance in unpredictable ways.
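The usual mitigation, sketched here with an illustrative model and column, is to batch the writes and throttle between batches so the buffer pool and replicas get room to breathe:

```ruby
require "digest"

class BackfillDocumentChecksums
  BATCH_SIZE = 1_000
  PAUSE_BETWEEN_BATCHES = 0.5 # seconds; tune against replica lag and buffer pool churn

  def call
    Document.where(checksum: nil).in_batches(of: BATCH_SIZE) do |batch|
      batch.each { |doc| doc.update_columns(checksum: compute_checksum(doc)) }
      sleep PAUSE_BETWEEN_BATCHES
    end
  end

  private

  def compute_checksum(doc)
    Digest::SHA256.hexdigest(doc.storage_key.to_s) # placeholder computation
  end
end
```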
Of all the large tables at Persona, the top two come from a familiar Rails component.
Active Storage was introduced to simplify interactions with files stored in cloud object stores like S3 or GCS, and it provides a flexible attachment system that makes it very easy to associate files with any Active Record model. It's built around two main tables: blobs, which represent the actual files in the object store, and attachments, a polymorphic join table that connects those blobs to an application record. At Persona, these two tables are the top two by row count in our application. Since each file is attached to exactly one record, their row counts are nearly identical, at around 3.4 billion. That makes them the perfect storm: they're huge, they're hot, and they're hard to touch, which becomes especially painful when you need to backfill metadata, migrate attachments, or optimize access patterns. And modifying models outside of your application, whether they're Rails components or external gems, is particularly tricky. Given these challenges, we've started migrating to Shrine, which takes a more lightweight, modular approach, using fields directly on individual records to track file attachments instead of requiring a separate join model.
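The contrast, with illustrative model and uploader names (the two Document definitions are alternatives, not code you'd keep side by side), looks roughly like this: Active Storage routes every file through the shared blobs and attachments tables, while Shrine keeps the metadata in a column on the owning record, so there's no multi-billion-row join table to migrate.

```ruby
# Active Storage: rows accumulate in active_storage_blobs and active_storage_attachments.
class Document < ApplicationRecord
  has_one_attached :scan
end

# Shrine: a `scan_data` JSON/text column on documents holds the file metadata instead.
class ScanUploader < Shrine
  # plugins (validation, derivatives, etc.) would be configured here
end

class Document < ApplicationRecord
  include ScanUploader::Attachment(:scan) # expects a `scan_data` column on documents
end
```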
Sometimes, though, it's not the core of your system that causes the most pain. It's the abstractions you thought you didn't need to think about.
We've been talking a lot about MySQL, and that's intentional. It's the database powering many Rails applications. But many of these lessons aren't unique to MySQL; you'll encounter them with any relational database. At scale, every data store brings its own set of challenges, and we've faced them all. We've dealt with hot shards on MongoDB, fought through index tuning and cluster pressure on Elasticsearch, and even had to shard Redis to keep up with Sidekiq throughput. The reality is no database stays easy forever. Once you're operating at scale, even the managed parts of your stack demand careful planning, constant tuning, and a good dose of humility. While I'd love to unpack all those war stories, we simply don't have time today. You might be wondering, though: where's the simplicity in all this? That leads me into what we're working on now, an intentional return to simplicity.
You might be wondering why this slide says one Kubernetes cluster and one MySQL cluster. After everything we've talked about, that probably sounds backwards. We didn't shrink. We didn't suddenly stop needing to scale. What we did was simplify. We took everything we learned, the patterns, the guardrails, the wins, the pain points, and restructured it into a single consolidated deployment model designed for reuse, with strong tenancy boundaries and predictable growth curves. It's not a step backwards. It's the product of years of experience teaching us that for the way we scale, simplicity might be the only way to do it without losing your mind.
This architectural shift is a project we call Stacks. The idea was simple: instead of scattering complexity across multiple clusters, databases, and other systems, we define a single self-contained unit that could run our full platform. And then we replicate it, again and again and again.
In some ways, this architecture resembles what was often called single tenancy, where each customer gets their own isolated environment. In our case, though, it's a bit more nuanced. Each stack isn't necessarily tied to one customer. It's more like a self-contained runtime that can host many tenants, but with strong boundaries between the stacks themselves. So, while we borrow some of the benefits of single tenancy, like isolation, blast radius reduction, and operational flexibility, we don't take on the overhead of spinning up a new environment for every single customer; it's essentially a middle ground. While the main components of each stack are isolated, including their own databases, there are still a few database-backed services that we share across all customers. Chief among them is the lookup table we discussed earlier, which helps route requests to the correct stack. We call these "cores": centralized systems that power critical functionality across all our environments, while everything else remains stack-specific.
Routing a request to the right stack isn't all that different from what we had to do with sharding. It has to be correct from the very beginning. Just like with database sharding, there's no room for ambiguity. Once a request hits our edge, we need to know exactly which stack should handle it. If we get it wrong, the request simply fails. So, this isn't just a routing concern; it's a critical correctness boundary. Let's walk through a real-world example to see how all this comes together. Though, just to be clear, we're pretending this is the actual map of all of our edge and main compute locations. The real one is a bit too dense and not nearly as slide-friendly, but this gives you the general idea, hopefully. Consider the green triangles to be our edge locations and the coral cans as our main compute. Say you were to make a request from here in Philadelphia.
That request would get routed to the nearest edge location. That might be as close as down the street or a few hundred miles away; it kind of depends on how the internet's behaving that day. Let's say that happens to be in Virginia, a pretty short hop at light speed. To make stack routing work, we run code at the edge, as close to the user as possible. This layer inspects each incoming request and parses out the key routing metadata: things like object tokens, API keys, and sessions. We'll first attempt to look that key up in a local cache. For things like API keys, where callers are typically isolated to one or two locations, we see a really high cache hit rate, which means we can route that request to the correct stack almost instantly. In the roughly 7% of the time that we miss, like for routing keys that have callers spread across many locations and don't frequently repeat, we'll query the lookup table in MongoDB. We have read replicas distributed across the globe, so we're able to make decisions in under 150 milliseconds for 95% of those cache-miss requests. Now that we've determined where your request should go, we'll actually route it there.
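In pseudocode terms (sketched in Ruby for readability; our edge layer isn't necessarily Ruby, and the collaborators here are hypothetical), the decision is just cache, then directory, then forward:

```ruby
class EdgeRouter
  CACHE_TTL = 300 # seconds; illustrative

  def initialize(cache:, directory:)
    @cache = cache          # local, per-edge-location cache
    @directory = directory  # client for the globally replicated lookup table
  end

  # Returns the stack that should process this request.
  def stack_for(request)
    routing_key = extract_routing_key(request)
    @cache.fetch(routing_key, expires_in: CACHE_TTL) do
      @directory.find_stack(routing_key) # the ~7% cache-miss path
    end
  end

  private

  def extract_routing_key(request)
    # API key, object token, or session, depending on the endpoint.
    request.headers["Authorization"] || request.params["token"]
  end
end
```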
This time, that's to a stack in Europe, where the request will be processed. All in all, this approach allows us to introduce this architecture change with zero modifications to customer implementations. No SDK updates, no endpoint changes, no new headers, just a cleaner, more scalable back-end infrastructure that works exactly the same from the outside. That kind of seamless evolution is hard to pull off, but when it works, it's one of the clearest signs that your abstractions are holding up.
While these stories have been about how we've managed the last seven years, scaling Rails, taming complexity, and evolving our architecture, they're really about something bigger. And we've learned a few lessons along the way. Complexity is inevitable, but if you're intentional, you can choose where that complexity lives. Rails gives us great defaults, and we've embraced them. But we've also learned not to treat those defaults as constraints. Simplify where you can, scale where you must, and don't be afraid to step outside the box when necessary. I appreciate you all joining me today to hear some of our war stories, and I hope that some of these lessons help you on your own journey scaling Rails, whether you're just starting out or deep in the trenches.
On a final note, this is the Persona team we have at RailsConf this week. You'll be able to find us at the Persona booth on the floor above, in the Liberty foyer, or around at sessions. We'd love to chat with you. We are hiring. Thank you very much.