00:00:16.640
Thank you everyone for coming to my
00:00:18.160
talk. Uh I wasn't nervous before but
00:00:20.720
looking at the size of the crowd now I
00:00:22.240
am. Uh yeah. So let's get into it.
00:00:29.359
So why care about your web server?
00:00:32.239
Because choosing your web server is one
00:00:34.399
of the highest leverage decisions you
00:00:36.559
can make. It impacts performance, the
00:00:39.760
infrastructure cost, and the kind of
00:00:41.840
production issues you see. But there is
00:00:44.320
another lesser known reason this choice
00:00:47.360
is critical. The right web server can
00:00:50.800
provide an elegant solution to problems
00:00:53.680
that are otherwise complex to solve in
00:00:56.320
your application code. My goal today is
00:00:59.840
to give you a framework to navigate
00:01:02.079
these choices so you can confidently
00:01:04.559
select the right server for your needs.
00:01:09.040
We will start with the foundational
00:01:10.960
principles that govern the behavior of
00:01:13.680
modern Ruby web servers. Then we will
00:01:17.040
analyze Puma to understand its
00:01:19.759
architectural constraints. From there we
00:01:22.960
will examine two specialized servers,
00:01:25.520
Falcon and Pitchfork,
00:01:28.000
each addressing one of those
00:01:30.400
constraints. We will conclude with a
00:01:33.040
framework for making a decision. Let's
00:01:36.000
begin with the fundamentals.
00:01:40.799
To understand Ruby web servers, we have
00:01:43.360
to start with the global VM lock or GVL.
00:01:46.880
The GVL ensures that within a single
00:01:49.680
process, only one thread can execute
00:01:52.640
Ruby code at a time. This has a direct
00:01:55.360
consequence. If you have a four core
00:01:57.840
server and you run only one Rails
00:02:00.079
process, you are utilizing only one of
00:02:02.719
those cores. To achieve parallelism and
00:02:06.079
to utilize all of your hardware, you
00:02:08.959
need to run at least four Rails
00:02:10.560
processes.
00:02:13.280
This multiprocess approach isn't new.
00:02:16.959
Early servers like Mongrel required
00:02:19.360
running independent processes, which were
00:02:22.400
slow to start and hard to manage. The
00:02:25.680
major evolution was a pre-forking model
00:02:28.800
popularized by Unicorn.
00:02:31.200
The idea is to start one master process
00:02:34.959
that boots the application and it then
00:02:37.599
forks cheap, fast-starting worker
00:02:40.560
processes. This master-worker model is the
00:02:43.920
foundation for the servers we use today.
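To make the master-worker shape concrete, here is a minimal, hypothetical sketch of pre-forking in plain Ruby. It is not how Unicorn or Puma is actually implemented, just the idea of one master booting the app and forking workers that share a listening socket.

```ruby
# Minimal pre-forking sketch (illustrative only, Unix-only).
require "socket"

server = TCPServer.new(9292)          # the master opens the listening socket once
# "Boot the app" in the master so every forked worker shares this memory.
app = ->(client) { client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok") }

4.times do
  fork do                             # each worker is a cheap copy of the master
    loop do
      client = server.accept          # workers accept from the inherited socket
      app.call(client)
      client.close
    end
  end
end

Process.waitall                       # the master sticks around to supervise
```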
00:02:47.680
This design provides two fundamental
00:02:49.680
advantages. First, centralized
00:02:52.640
management. The master process can
00:02:55.200
monitor, restart, and manage its workers.
00:02:58.720
Restarting a failed worker is nearly
00:03:00.959
instantaneous.
00:03:02.800
Second, and more importantly, is memory
00:03:05.760
efficiency enabled by an operating
00:03:08.400
system feature called copy on write.
00:03:12.319
Copy on write is an OS level
00:03:14.239
optimization.
00:03:15.840
When a master process forks a worker, the
00:03:18.800
OS doesn't immediately duplicate all the
00:03:22.159
memory. Instead, they all share the same
00:03:26.159
copy. So if our parent process uses 300
00:03:30.239
MB to load the app after forking two
00:03:33.360
workers, the total memory usage is still
00:03:37.040
just 300 MB. This is what makes
00:03:40.239
multiprocess architecture viable.
00:03:44.239
When either the parent or the child
00:03:47.040
modifies a memory page, that page becomes private
00:03:50.080
memory. Any new memory allocations done
00:03:53.599
by the processes are also private.
00:03:58.000
Because of the sharing, measuring real
00:04:00.799
memory usage can be tricky. Tools like
00:04:04.239
top often show RSS or resident set size,
00:04:08.400
which overstates the memory usage by not
00:04:11.200
accounting for shared memory. A more
00:04:13.760
accurate metric is PSS or proportional
00:04:16.880
set size.
00:04:18.799
We won't go into the details, but it's
00:04:20.959
important to know that this complexity
00:04:22.960
exists when you're analyzing memory
00:04:25.040
usage.
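If you want to see the difference yourself, here is a hedged, Linux-only sketch that reads both numbers from /proc/<pid>/smaps_rollup; PSS divides each shared page among the processes sharing it, so it is the fairer figure for forked workers.

```ruby
# Hedged sketch (Linux only): compare RSS and PSS for a given PID.
def smaps_value_kb(pid, field)
  line = File.readlines("/proc/#{pid}/smaps_rollup").find { |l| l.start_with?(field) }
  line.split[1].to_i                  # lines look like "Rss:  123456 kB"
end

pid = (ARGV[0] || Process.pid).to_i
puts "RSS: #{smaps_value_kb(pid, 'Rss:')} kB"   # counts shared pages in full
puts "PSS: #{smaps_value_kb(pid, 'Pss:')} kB"   # shared pages split among sharers
```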
00:04:27.440
That brings us to Puma.
00:04:31.280
Puma builds on the pre-forking model by
00:04:33.919
adding another layer of concurrency
00:04:36.400
threads. It's a hybrid, multiprocess,
00:04:39.919
multi-threaded architecture. As a
00:04:42.560
generalist, it performs well in many
00:04:44.960
scenarios, but it faces challenges at
00:04:48.400
two extremes.
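As a reference point, a typical hybrid Puma setup looks something like this; the values are illustrative, not recommendations.

```ruby
# config/puma.rb: hybrid multi-process, multi-threaded setup (illustrative values).
workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))          # processes for CPU parallelism
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
threads threads_count, threads_count                      # threads per worker for IO overlap
preload_app!   # boot the app once in the master, then fork workers (copy-on-write)
```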
00:04:50.720
Let's examine its first major constraint
00:04:53.440
which appears under heavily IO-bound
00:04:56.560
workloads.
00:04:58.960
When threads block waiting for database
00:05:01.199
queries,
00:05:03.199
API calls or file operations, they
00:05:06.479
become unavailable to serve new
00:05:09.120
requests. This effectively reduces your
00:05:13.199
system's capacity even when your CPU
00:05:16.240
resources remain idle.
00:05:21.039
When your system operates near capacity,
00:05:23.919
this leads to a cascade of performance
00:05:26.160
degradation.
00:05:27.680
CPU utilization drops as threads wait,
00:05:30.800
throughput plummets, and the request
00:05:32.960
backlog grows. Your server becomes
00:05:35.440
paralyzed, not from being overworked,
00:05:38.720
but from being stuck in a waiting state.
00:05:42.960
A seemingly intuitive response to IO
00:05:45.840
bottlenecks is to increase the thread
00:05:47.840
count.
00:05:49.280
This can provide a temporary increase in
00:05:51.840
capacity. However, for a typical mixed
00:05:55.759
workload application, this strategy
00:05:58.400
introduces a new problem. It increases
00:06:01.759
the contention for the GVL,
00:06:04.479
which brings us to constraint number
00:06:06.479
two, the GVL.
00:06:10.080
As a quick refresher, Ruby's threading
00:06:12.960
model has three key characteristics.
00:06:16.080
First, within a process, only one thread
00:06:19.039
can hold the GVL and execute Ruby code
00:06:21.520
at a time. Second, IO operations are the
00:06:25.199
exception.
00:06:26.960
A thread releases the GVL before a
00:06:29.840
blocking IO call.
00:06:32.319
And third, Ruby implements preemptive
00:06:35.199
scheduling. So, a thread can't hold the
00:06:37.280
GVL for more than 100 milliseconds if
00:06:40.160
another thread is waiting for the GVL.
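You can see these rules in a few lines of plain Ruby; this is a hedged benchmark sketch, and the exact timings will vary by machine.

```ruby
# Hedged demo: threads overlap IO waits but not CPU work, because of the GVL.
require "benchmark"

cpu = -> { 5_000_000.times { Math.sqrt(rand) } }   # pure Ruby work, holds the GVL
io  = -> { sleep 0.5 }                             # stands in for a blocking IO call

Benchmark.bm(14) do |x|
  x.report("cpu x2 threads") { Array.new(2) { Thread.new(&cpu) }.each(&:join) }
  x.report("io  x2 threads") { Array.new(2) { Thread.new(&io) }.each(&:join) }
end
# Expect the CPU case to take roughly twice a single run; the IO case stays near 0.5s.
```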
00:06:44.080
Let's observe how this GVL contention
00:06:46.639
manifests. We will trace two requests
00:06:49.440
that in isolation each take 270
00:06:53.520
milliseconds to complete. Now we'll see
00:06:56.160
what happens when they are processed
00:06:58.160
concurrently by two threads in the same
00:07:00.960
Puma worker.
00:07:04.080
Despite both requests having the same
00:07:06.560
individual completion time, the GVL
00:07:09.199
contention creates additional
00:07:11.599
non-deterministic latency for both.
00:07:15.759
The result is a 48% increase in latency
00:07:18.639
for request one and a 37% increase for
00:07:21.759
request two. This is the GVL tax, and it's
00:07:25.759
not a theoretical problem. It has driven
00:07:28.400
major real world architectural
00:07:30.080
decisions. For instance, the Rails team
00:07:33.199
lowered the default Puma thread count from five
00:07:35.599
to three, directly citing this latency
00:07:38.160
impact. Basecamp, prioritizing
00:07:41.120
predictable performance, runs Puma with
00:07:43.520
just a single thread. And Shopify, to
00:07:46.639
optimize for latency, runs process-based
00:07:49.280
servers like Unicorn and Pitchfork. These
00:07:52.720
are major players in our ecosystem all
00:07:55.199
making a clear trade-off. They're
00:07:57.759
sacrificing some of the benefits of the
00:07:59.840
threaded model to avoid the GVL tax.
00:08:03.280
This is a core constraint of Puma's
00:08:06.000
hybrid architecture.
00:08:08.960
So we have established Puma's two
00:08:11.120
challenges: IO blocking and GVL
00:08:14.240
contention.
00:08:15.840
Let's address the IO problem directly.
00:08:19.039
What if we used a different concurrency
00:08:21.360
primitive than threads? This leads us to
00:08:24.960
an entirely different architecture in
00:08:27.039
Falcon, which is built on fibers and an
00:08:29.840
event loop.
00:08:32.000
Like Puma, Falcon is a multi-process,
00:08:35.519
pre-forking server for CPU parallelism.
00:08:38.719
The key difference is inside each
00:08:40.800
worker. There is no thread pool.
00:08:43.839
Instead, an event loop manages tasks.
00:08:47.680
When a request arrives, it is assigned
00:08:50.000
to a lightweight fiber.
00:08:52.560
There are no limits on the number of
00:08:54.800
fibers spawned. Falcon will spawn as
00:08:57.839
many fibers as there are requests.
00:09:01.040
The power of this model is how it
00:09:03.200
handles IO.
00:09:05.120
When a fiber encounters a blocking
00:09:07.440
operation, the fiber scheduler
00:09:09.839
automatically yields control, but the
00:09:12.240
worker process itself does not block.
00:09:15.040
The event loop is free to immediately
00:09:17.519
run another fiber. This allows a single
00:09:20.959
worker to handle thousands of concurrent
00:09:23.600
IO operations.
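A hedged sketch with the async gem, which Falcon is built on, shows the effect: many IO waits overlap inside a single process.

```ruby
# Hedged sketch: one process overlapping many IO waits with fibers (async gem).
require "async"

Async do |task|
  children = 100.times.map do
    task.async do
      sleep 1     # a stand-in for blocking IO; the fiber scheduler yields here
    end
  end
  children.each(&:wait)
end
# Wall time is about 1 second, not 100, because the fibers wait concurrently.
```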
00:09:26.399
However, this introduces a critical
00:09:28.800
trade-off. Threads are preempted by the
00:09:31.600
runtime. Fibers, on the other hand, are
00:09:34.560
cooperative. They must explicitly yield.
00:09:38.240
A fiber will continue running until it
00:09:40.959
completes, yields, or performs a
00:09:43.760
blocking IO operation. The consequence
00:09:47.040
is that a single long-running CPU-bound
00:09:50.480
fiber will block the event loop and
00:09:53.120
starve all other fibers.
00:09:55.680
This makes Falcon a highly specialized
00:09:58.160
tool. It is exceptionally efficient for
00:10:00.959
IO-bound workloads, but it is not
00:10:03.440
designed for CPU-bound workloads.
00:10:06.800
So when should we use Falcon?
00:10:10.080
The obvious answer is for any
00:10:12.480
application with very high IO usage.
00:10:15.760
Anything that spends most of its time
00:10:18.079
waiting for the network or the disk.
00:10:21.360
But the more interesting answer is that
00:10:23.200
Falcon gives us a fundamentally new way
00:10:25.760
to write concurrent code in Ruby. If you
00:10:28.399
have ever looked at event-driven
00:10:30.000
concurrency in NodeJS and wished you had
00:10:33.200
something similar in Ruby, Falcon is
00:10:35.519
your answer. This model, powered by Falcon
00:10:39.360
and the Async gem, is exceptionally
00:10:41.920
powerful. It allows you to break
00:10:45.120
long-held Rails rules that were created to
00:10:47.760
work around blocking IO. For example,
00:10:50.800
the rule "never make a slow API call in
00:10:54.320
the controller" exists because it would
00:10:57.360
tie up a precious Puma thread. With
00:11:00.320
Falcon, that rule no longer exists. You
00:11:03.839
can make those slow API calls directly
00:11:06.320
and with much simpler code. This
00:11:09.519
approach has tremendous potential. If
00:11:12.079
this interests you, I did a deep dive on
00:11:14.320
both Falcon and the Async gem last year.
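As a hedged illustration (the controller and URL here are hypothetical), a slow external call under Falcon parks only the current request's fiber rather than a whole server thread.

```ruby
require "net/http"

# Hypothetical controller: under Falcon, this wait yields the fiber, so the
# worker keeps serving other requests while the external API responds.
class QuotesController < ApplicationController
  def show
    body = Net::HTTP.get(URI("https://slow-api.example.com/quote"))
    render json: { quote: body }
  end
end
```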
00:11:19.360
So let's reassess. Falcon provides an
00:11:23.279
effective solution for heavily IO bound
00:11:25.839
systems. However, it does not address
00:11:29.040
the GVL contention that impacts CPU-bound
00:11:32.000
work. It also doesn't solve certain
00:11:34.880
memory inefficiencies
00:11:36.959
inherent in the standard copy on write
00:11:39.120
model. To address those remaining
00:11:41.360
issues, we need to look at an evolution
00:11:44.079
of the pure process model. That brings
00:11:46.560
us to Pitchfork.
00:11:50.079
At its heart, Pitchfork is a direct
00:11:52.399
descendant of Unicorn. It fully embraces
00:11:55.920
the one request per process model. This
00:11:58.959
provides true GVL-free parallelism and
00:12:02.000
outstanding resiliency. But Pitchfork's
00:12:04.720
creator wanted to solve the biggest
00:12:07.040
drawback of this model, memory usage.
00:12:11.519
Let's revisit our copy on write example.
00:12:15.279
We have a parent and two fork children
00:12:18.079
with both shared and private memory.
00:12:22.399
The key to Pitchfork's optimization is
00:12:25.040
realizing that private memory isn't
00:12:28.160
monolithic. It has two parts. There is
00:12:32.079
private processing memory for objects in
00:12:34.959
a single request, which gets garbage
00:12:36.880
collected.
00:12:38.639
There is also private static memory used
00:12:41.680
by the VM for things like inline caches
00:12:44.720
and JIT-compiled code. This is the warm-up
00:12:48.399
data that makes a worker fast.
00:12:52.079
And this reveals the critical
00:12:53.839
inefficiency of a standard server. As
00:12:57.120
each worker warms up, it builds this
00:12:59.440
private static block of JIT caches and
00:13:02.240
VM optimizations.
00:13:04.240
Now the crucial point is this. While the
00:13:07.040
content of these caches will be nearly
00:13:09.920
identical across all warmed up workers,
00:13:12.880
the physical memory pages they occupy
00:13:15.920
are not shared. Each worker has to build
00:13:19.360
and store its own private copy. You end
00:13:22.880
up with dozens of processes all holding
00:13:26.560
identical but separate copies of the
00:13:29.200
same warm-up data. This is a significant
00:13:32.959
waste of memory. It is this precise
00:13:35.680
inefficiency, the duplication of
00:13:38.480
identical but non-shared data that
00:13:41.279
Pitchfork's reforking is designed to
00:13:44.320
eliminate.
00:13:46.959
Reforking is a core idea of Pitchfork.
00:13:50.560
Once a worker is fully warmed up, it's a
00:13:53.760
perfect template for new workers.
00:13:56.720
Instead of forking from the cold master,
00:13:59.760
pitchfork can promote a hot worker to
00:14:02.639
become a new mold and fork new
00:14:06.160
pre-warmed workers from it.
00:14:10.000
When reforking is enabled, Pitchfork forks
00:14:13.440
a gen zero mold which boots your
00:14:15.760
application.
00:14:17.360
Then the workers are spawned from the
00:14:19.920
mold.
00:14:22.160
Reforking is triggered when a
00:14:24.399
configured number of requests have been
00:14:26.240
processed. When it is triggered, the
00:14:29.279
monitor spawns a gen one mold from one
00:14:32.320
of the workers. Once the gen one mold
00:14:35.440
has started, the gen zero mold is
00:14:37.760
terminated.
00:14:40.240
Then one by one each worker is replaced
00:14:44.000
by a fork from the new mold. This
00:14:47.040
process repeats for several generations.
00:14:51.199
The result is a cluster where every
00:14:53.839
single worker shares the exact same
00:14:56.560
fully populated caches and jitted code
00:14:59.440
from the moment it starts.
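For reference, a hedged Pitchfork configuration sketch might look like this; the option names follow Pitchfork's documentation as I recall it, and the numbers are purely illustrative.

```ruby
# pitchfork.conf.rb: illustrative values only.
worker_processes 8             # one process per core; no in-process threads
# Promote a warmed-up worker to a new mold after it has served this many
# requests, generation by generation.
refork_after [500, 750, 1000]
```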
00:15:03.440
Due to an operating system bias in how
00:15:06.320
traffic is distributed, some workers
00:15:08.959
will handle significantly more traffic
00:15:11.360
than others. Since the criteria for
00:15:14.560
promoting a worker into the new mold is
00:15:17.600
the number of requests it has handled,
00:15:20.000
it's almost always the most warmed up
00:15:22.480
worker that ends up being used as a
00:15:24.880
mold.
00:15:27.440
The result is dramatic. The memory that
00:15:30.480
was previously private static is now
00:15:33.199
shared across all workers. The only
00:15:36.399
private memory each worker needs is for
00:15:38.800
the request it is actively processing.
00:15:42.800
This makes adding more workers
00:15:45.199
remarkably cheap from a memory usage
00:15:47.680
perspective.
00:15:51.680
So why would you choose pitchfork?
00:15:54.800
First, it's built on a simple and robust
00:15:58.320
process-based foundation. This gives you
00:16:01.360
powerful operational benefits like
00:16:03.759
out-of-band GC, simpler performance
00:16:06.399
debugging without GVL contention and
00:16:09.279
outstanding resiliency. If a worker
00:16:11.920
misbehaves, you can simply terminate
00:16:14.000
that one process and it doesn't affect
00:16:16.000
any other request.
00:16:19.519
Second, pitchfork builds on that
00:16:21.680
foundation with its unique innovation,
00:16:24.240
reforking.
00:16:25.759
This provides two key advantages. It
00:16:29.199
achieves significant memory efficiency
00:16:31.839
by sharing the JIT cache and other VM
00:16:34.320
artifacts that are normally private. And
00:16:37.519
critically, it delivers a low and
00:16:40.320
consistent latency because reforking
00:16:42.959
ensures every new worker is a perfect
00:16:46.079
pre-warmed copy. Here's
00:16:50.240
publicly shared data from Pitchfork's
00:16:52.399
deployment on the main Shopify
00:16:54.240
monolith. They reported a 30% reduction
00:16:57.600
in memory usage. More importantly, they
00:17:00.880
achieved a 9% reduction in P99 latency.
00:17:05.120
These are significant measurable
00:17:06.959
improvements at a massive scale.
00:17:11.760
Of course, there are some caveats.
00:17:13.919
Reforking isn't enabled by default, and
00:17:16.720
there are a few gems which are
00:17:18.079
incompatible.
00:17:19.919
You also have to ensure that your
00:17:21.839
application is fork safe.
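A common fork-safety pattern, shown here as a hedged sketch with a hypothetical Redis client, is to rebuild anything that holds a socket or background thread whenever the owning process changes.

```ruby
require "redis"   # assumes the redis gem is available

# Hedged sketch: guard a shared client by the PID that created it, so a
# forked child rebuilds its own connection instead of reusing the parent's.
module MyApp
  def self.redis
    if @redis.nil? || @owner_pid != Process.pid
      @owner_pid = Process.pid
      @redis = Redis.new(url: ENV["REDIS_URL"])
    end
    @redis
  end
end
```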
00:17:26.559
So to bring everything together, let's
00:17:28.880
look at a decision framework.
00:17:32.400
Each server has its strengths and a set of
00:17:35.600
trade-offs.
00:17:37.120
Puma solves for general purpose
00:17:39.120
concurrency. Its trade-offs are the GVL
00:17:42.320
contention and IO blocking we analyzed.
00:17:46.080
Falcon solves for high IO throughput.
00:17:49.039
Its trade-off is a cooperative
00:17:50.799
scheduler, which makes it unsuitable
00:17:53.280
for CPU heavy workloads.
00:17:56.320
And pitchfork solves for lowest latency
00:17:59.200
and memory usage. Its trade-offs are
00:18:02.240
potentially lower throughput on mixed
00:18:04.400
workloads and the engineering burden of
00:18:07.440
ensuring fork safety.
00:18:10.240
This means that the decision comes down
00:18:12.320
to two questions. First, what is the
00:18:15.919
nature of your workload? Analyze your
00:18:18.720
system's telemetry to understand your
00:18:21.280
actual bottlenecks.
00:18:23.280
And second, what do you value? Is your
00:18:26.799
priority raw throughput, predictable low
00:18:29.919
latency, or absolute resiliency?
00:18:33.679
There isn't a single best choice, but
00:18:36.640
there is a right choice. It's the one
00:18:39.840
whose trade-offs you're most willing to
00:18:41.760
accept to solve your specific problems.
00:18:45.679
Thank you.
00:18:56.559
Sorry, I ran through it faster than I
00:18:58.400
practiced.
00:19:02.960
So, uh, since we have a ton of time,
00:19:05.280
like if you have questions, I'm glad to
00:19:07.200
answer.
00:19:08.720
So you went through your three favorite
00:19:12.160
or the three that you think are
00:19:15.039
appropriate in 2025. There's obviously
00:19:17.600
others around. We've got Unicorn, or
00:19:21.760
the team I'm on is still
00:19:24.000
using Passenger. Uh how do those fit
00:19:26.960
into this scene?
00:19:29.840
I think
00:19:31.919
uh, like, if you're running Unicorn, uh,
00:19:35.360
even its creator has said that it is not
00:19:37.679
actively maintained uh if I remember
00:19:40.480
right, like you can run Pitchfork without
00:19:43.360
uh, reforking enabled and you get a
00:19:46.240
modern web server which is actively
00:19:48.559
deployed, uh, and maintained. And I
00:19:52.960
don't know anybody who uses Passenger
00:19:54.720
these days, so no answer to that.
00:19:59.039
Yeah, you gave some numbers around uh
00:20:01.919
memory gains, efficiency gains. Uh were
00:20:04.240
those versus base Rails or were they
00:20:07.840
against uh Unicorn or
00:20:10.400
so? Uh the Shopify numbers they were
00:20:13.039
running Unicorn before, and uh the memory
00:20:16.400
reduction and the latency reduction is
00:20:18.720
from running Pitchfork.
00:20:22.320
Yeah.
00:20:22.960
Yes. Thank you. Um, you mentioned there
00:20:25.120
are some gems that are not compatible
00:20:26.559
with reforking. Is there a way
00:20:30.080
for gems to know which are compatible or
00:20:32.240
is there like a list or is there
00:20:34.320
something they do specifically that
00:20:36.480
makes them incompatible?
00:20:37.840
Yeah, there is a list. Uh, in fact, uh,
00:20:40.080
Pitchfork has amazing documentation. Uh,
00:20:43.600
in fact, uh, I think it's one of the
00:20:45.520
best that I've come across among all
00:20:47.039
gems because, specifically,
00:20:49.520
it doesn't try to sell you Pitchfork. Uh,
00:20:51.840
it tells you why; uh, if you're happy with
00:20:54.320
what you have, continue with it, and if
00:20:56.720
you only have these specific problems,
00:20:58.159
proceed with it. And uh, it has an
00:21:01.280
exhaustive, I mean, it has a list of all
00:21:03.280
the gems that they have found
00:21:04.640
incompatible and which they are also
00:21:07.760
working on making compatible. So one
00:21:09.760
of the most common ones is gRPC, uh, but
00:21:13.120
there exists a fork of gRPC which is
00:21:16.159
reforking-safe.
00:21:20.640
How do you check if your application
00:21:23.039
is fork safe?
00:21:25.440
Uh if it doesn't have any issues in
00:21:27.679
production, I guess.
00:21:33.840
And what happens when there is a
00:21:36.000
memory leak and then that process
00:21:38.799
keeps getting forked afterwards?
00:21:42.240
Um so one of the reasons why uh the
00:21:46.240
creator of Pitchfork started this was
00:21:48.320
a memory leak in their uh monolithic
00:21:51.440
application in the sense that they had
00:21:53.840
already configured to uh terminate a
00:21:56.880
worker in Unicorn when it breached a
00:21:59.120
certain memory limit, and with Pitchfork, since
00:22:02.080
everything is shared, the memory usage
00:22:04.000
itself has come down. So if there is a
00:22:05.760
memory leak, it is purely due to that
00:22:08.240
particular request that it has handled.
00:22:10.320
And you can apply everything that
00:22:12.559
you apply in Unicorn to Pitchfork, like
00:22:14.240
you can configure timeouts or uh kill a
00:22:17.840
worker if it breaches a certain memory
00:22:19.760
and all that.
00:22:21.679
Yeah.
00:22:23.280
One slightly more tactical question:
00:22:25.760
like, how do you, like, let's say we want
00:22:28.080
to use Pitchfork or Falcon in our
00:22:30.480
current, like, Puma, you know, servers or
00:22:33.200
whatever.
00:22:34.000
Yeah. Yeah. Have you, like, how do
00:22:35.760
you even test that because like it's
00:22:37.919
like you just deploy Puma and see what
00:22:39.600
happens or like deploy Falcon and see
00:22:41.120
what happens. I'm certain that's
00:22:43.520
not the answer. Like, is there tooling
00:22:45.760
around like have you guys tactically
00:22:47.360
tried deploying like the same
00:22:50.480
application on two and comparing like
00:22:52.000
how did you guys do that?
00:22:54.159
Yeah, I have deployed uh Puma and Falcon
00:22:58.559
together, not Pitchfork though. Uh, like
00:23:01.200
what I have done is I set up a proxy
00:23:05.039
in front of it and forwarded only
00:23:06.960
certain requests to Falcon which I knew
00:23:09.039
were IO-heavy because they were calling
00:23:11.520
some external APIs and I didn't want to
00:23:14.080
handle that through Sidekiq when
00:23:16.000
I knew that it was much easier to do in
00:23:17.840
Falcon. Uh, but I don't think in that
00:23:22.159
particular case it would have mattered
00:23:23.679
if the entirety of the application was
00:23:26.559
served by Falcon because we were not
00:23:28.480
really CPU-bound. But you can always, like,
00:23:32.720
deploy the same Rails application in
00:23:34.320
multiple servers and just pass a certain
00:23:37.039
percentage of traffic to it and observe
00:23:39.120
how it behaves.
00:23:40.720
Uh so thanks for a nice talk. Um and
00:23:43.440
since Puma is probably fine for most
00:23:46.000
Rails applications uh unless there is a
00:23:48.559
GVL contention um issue or problem, how
00:23:52.080
would I go about um finding out if my
00:23:54.960
application um is prone to those kinds
00:23:58.480
of things?
00:24:01.440
A lot of the APM frameworks give you uh
00:24:04.480
data on that. Uh but sometimes it's it's
00:24:07.360
tricky. Uh like I have seen
00:24:11.279
uh misattributed uh data like that the
00:24:15.360
database is being very slow which I then
00:24:17.840
later realized that it wasn't
00:24:19.600
actually slow but it was just because of
00:24:21.679
GVL contention that it appeared to be
00:24:23.520
slow. So uh one of the things that I've
00:24:26.799
done is try reducing the Puma uh thread
00:24:29.679
counts and see how the application
00:24:31.520
behaves. Uh so if you see a
00:24:35.120
significant increase in performance,
00:24:37.760
that is, a
00:24:38.960
decrease in latency, uh, then that means
00:24:41.840
you have GVL contention which is
00:24:43.520
affecting your application. So you can
00:24:46.240
just start by reducing your Puma thread count.
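One hedged way to run that experiment is to make the thread count an environment knob and compare latency across deployments.

```ruby
# config/puma.rb: make the thread count easy to vary per deployment.
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
threads threads_count, threads_count
# Deploy with RAILS_MAX_THREADS=1, then 3, then 5, and compare p95/p99 latency
# in your APM; a sharp drop at lower counts points to GVL contention.
```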
00:24:49.520
Anyone else?
00:24:52.640
Okay. Thank you all.