Understanding Ruby Web Server Internals: Puma, Falcon, and Pitchfork Compared

Manu Janardhanan • July 08, 2025 • Philadelphia, PA • Talk

Introduction

This talk, delivered by Manu Janardhanan at RailsConf 2025, focuses on the internal architectures of three modern Ruby web servers: Puma, Falcon, and Pitchfork. The session aims to provide Ruby developers with a comprehensive framework to confidently choose, tune, and optimize a web server according to their application's unique workload and requirements.

Key Points Covered

  • Foundational Principles

    • The Global VM Lock (GVL) in Ruby restricts true parallelism within a single process, necessitating multi-process architectures for full hardware utilization.
    • Pre-forking and copy-on-write (CoW) mechanisms make forking multiple processes efficient by sharing memory until it is modified, though measuring memory usage accurately can be challenging.
  • Puma: The Hybrid Generalist

    • Puma combines a pre-forking master-worker model with multithreading, balancing memory and performance for general-purpose workloads.
    • Two main constraints:
      • IO-bound workloads: thread blocking on IO can reduce capacity and cause latency breakdowns.
      • GVL contention: increasing thread count can increase contention for the GVL, causing unpredictable latency and degraded throughput.
    • Real-world decisions:
      • Organizations like Basecamp and Shopify have adjusted thread counts, or moved to process-based servers, to minimize latency introduced by the GVL.
  • Falcon: Fiber and Event-Loop Driven

    • Falcon is a multi-process pre-forking server that uses fibers and an event loop instead of threads in each worker.
    • Each incoming request is managed by a lightweight fiber, enabling thousands of concurrent IO operations efficiently.
    • Fibers are cooperative; a CPU-bound fiber can block the event loop, making Falcon less suitable for CPU-heavy tasks.
    • Best suited for highly IO-bound applications, Falcon offers Ruby developers a new paradigm similar to NodeJS-style event-driven concurrency, removing constraints like "no slow API calls in controllers".
  • Pitchfork: Memory-Efficient Process Model

    • Pitchfork follows a pure one-request-per-process model like Unicorn but introduces innovations to improve memory efficiency:
      • Re-forking: uses a 'hot' (fully warmed-up) worker as a mold to spawn new workers, ensuring that memory such as JIT caches and VM artifacts is shared across workers, dramatically reducing redundant memory usage.
      • Shopify's deployment showed a 30% memory reduction and a 9% P99 latency reduction.
    • Pitchfork also provides resilience and easier debugging but requires fork-safe applications and may have compatibility issues with certain gems.

Practical Considerations and Decision Framework

  • Evaluate your application's true workload (CPU vs. IO-bound) with real telemetry.
  • Consider your priorities: throughput, predictable low latency, or resilience.
  • Use A/B deployments and traffic splitting to assess alternative servers in your environment.
  • There's no single best server—the right choice depends on which trade-offs match your application's needs.

Conclusion

The architecture of a Ruby web server—whether hybrid (Puma), fiber-based (Falcon), or process-oriented with memory optimizations (Pitchfork)—carries distinct trade-offs. Understanding these can give developers the leverage to improve performance, reduce latency, and optimize resource usage in production environments.

Understanding Ruby Web Server Internals: Puma, Falcon, and Pitchfork Compared
Manu Janardhanan • Philadelphia, PA • Talk

Date: July 08, 2025
Published: July 23, 2025

As Ruby developers, we often focus on our application code, but the choice of web server can significantly impact performance, scalability, and resource efficiency.

In this talk, we’ll explore Puma, Falcon, and Pitchfork - three modern Ruby web servers with distinct execution models. We’ll cover:

* How pre-forking and copy-on-write help minimize boot time and memory usage.
* How Falcon eliminates the traditional challenges of blocking I/O in Ruby, enabling new approaches to IO-bound workloads.
* Why Puma’s hybrid model balances concurrency, performance, and memory usage.
* How Pitchfork optimizes for latency and memory efficiency, and why it’s a good choice for CPU-bound applications.

By the end, you’ll understand how to choose, tune, and optimize your web server for your specific use case.

RailsConf 2025

00:00:16.640 Thank you everyone for coming to my
00:00:18.160 talk. Uh I wasn't nervous before but
00:00:20.720 looking at the size of the crowd now I
00:00:22.240 am. Uh yeah. So let's get into it.
00:00:29.359 So why care about your web server?
00:00:32.239 Because choosing your web server is one
00:00:34.399 of the highest leverage decisions you
00:00:36.559 can make. It impacts performance, the
00:00:39.760 infrastructure cost, and the kind of
00:00:41.840 production issues you see. But there is
00:00:44.320 another lesser known reason this choice
00:00:47.360 is critical. The right web server can
00:00:50.800 provide an elegant solution to problems
00:00:53.680 that are otherwise complex to solve in
00:00:56.320 your application code. My goal today is
00:00:59.840 to give you a framework to navigate
00:01:02.079 these choices so you can confidently
00:01:04.559 select the right server for your needs.
00:01:09.040 We will start with the foundational
00:01:10.960 principles that govern the behavior of
00:01:13.680 modern Ruby web servers. Then we will
00:01:17.040 analyze Puma to understand its
00:01:19.759 architectural constraints. From there we
00:01:22.960 will examine two specialized servers
00:01:25.520 Falcon and Pitchfork
00:01:28.000 each addressing one of those
00:01:30.400 constraints. We will conclude with a
00:01:33.040 framework for making a decision. Let's
00:01:36.000 begin with the fundamentals.
00:01:40.799 To understand Ruby web servers, we have
00:01:43.360 to start with the global VM lock or GVL.
00:01:46.880 The GVL ensures that within a single
00:01:49.680 process, only one thread can execute
00:01:52.640 Ruby code at a time. This has direct
00:01:55.360 consequence. If you have a four core
00:01:57.840 server and you run only one Rails
00:02:00.079 process, you are utilizing only one of
00:02:02.719 those cores. To achieve parallelism and
00:02:06.079 to utilize all of your hardware, you
00:02:08.959 need to run at least four Rails
00:02:10.560 processes.
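
To make the GVL's consequence concrete, here is a minimal sketch (not from the talk) timing the same CPU-bound work in threads versus forked processes; on a multi-core machine the forked version finishes in roughly a quarter of the time:

    require "benchmark"

    def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

    # Four threads in one process: the GVL serializes Ruby execution,
    # so this takes about as long as running the calls sequentially.
    threaded = Benchmark.realtime do
      4.times.map { Thread.new { fib(30) } }.each(&:join)
    end

    # The same work in four forked processes runs in true parallel.
    forked = Benchmark.realtime do
      4.times.map { fork { fib(30) } }.each { |pid| Process.wait(pid) }
    end

    puts "threads:   #{threaded.round(2)}s"
    puts "processes: #{forked.round(2)}s"
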
00:02:13.280 This multiprocess approach isn't new.
00:02:16.959 Early servers like Mongrel required
00:02:19.360 running independent processes which was
00:02:22.400 slow to start and hard to manage. The
00:02:25.680 major evolution was a pre-forking model
00:02:28.800 popularized by Unicorn.
00:02:31.200 The idea is to start one master process
00:02:34.959 that boots the application and it then
00:02:37.599 fks cheap fast starting worker
00:02:40.560 processes. This master worker model is a
00:02:43.920 foundation for the servers we use today.
00:02:47.680 This design provides two fundamental
00:02:49.680 advantages. First centralized
00:02:52.640 management. The master process can
00:02:55.200 monitor, restart, and manage its workers.
00:02:58.720 Restarting a failed worker is nearly
00:03:00.959 instantaneous.
00:03:02.800 Second and more importantly is memory
00:03:05.760 efficiency enabled by an operating
00:03:08.400 system feature called copy on write.
00:03:12.319 Copy on write is an OS level
00:03:14.239 optimization.
00:03:15.840 When a master process forks a worker, the
00:03:18.800 OS doesn't immediately duplicate all the
00:03:22.159 memory. Instead, they all share the same
00:03:26.159 copy. So if our parent process uses 300
00:03:30.239 MB to load the app after forking two
00:03:33.360 workers, the total memory usage is still
00:03:37.040 just 300 MB. This is what makes
00:03:40.239 multi-process architecture viable.
00:03:44.239 When either the parent or the child
00:03:47.040 modifies memory, it becomes private
00:03:50.080 memory. Any new memory allocations done
00:03:53.599 by the processes are also private.
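
The mechanics are easy to see in plain Ruby; a toy sketch of the pre-forking pattern (illustrative, not any server's actual code):

    # Toy master: boot once, then fork workers that share pages copy-on-write.
    puts "master #{Process.pid}: booting app..."
    app_data = "x" * 100_000_000 # stand-in for a loaded application

    pids = 4.times.map do
      fork do
        # Reading app_data keeps these pages shared with the master;
        # writing to it would trigger copy-on-write and make them private.
        puts "worker #{Process.pid}: serving (sharing #{app_data.bytesize} bytes)"
        sleep 1 # stand-in for accepting and serving requests
      end
    end

    pids.each { |pid| Process.wait(pid) }
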
00:03:58.000 Because of the sharing, measuring real
00:04:00.799 memory usage can be tricky. Tools like
00:04:04.239 top often show RSS or resident set size,
00:04:08.400 which overstates the memory usage by not
00:04:11.200 accounting for shared memory. A more
00:04:13.760 accurate metric is PSS or proportional
00:04:16.880 set size.
00:04:18.799 We won't go into the details, but it's
00:04:20.959 important to know that this complexity
00:04:22.960 exists when you're analyzing memory
00:04:25.040 usage.
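
On Linux you can compare the two metrics for a running process directly from /proc; a small sketch (Linux-only; field names per smaps_rollup):

    # RSS counts shared pages in full for every process that maps them;
    # PSS divides each shared page among the processes sharing it.
    rollup = File.read("/proc/self/smaps_rollup")
    rss = rollup[/^Rss:\s+(\d+) kB/, 1].to_i
    pss = rollup[/^Pss:\s+(\d+) kB/, 1].to_i
    puts "RSS: #{rss} kB, PSS: #{pss} kB"
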
00:04:27.440 That brings us to Puma.
00:04:31.280 Puma builds on the pre-forking model by
00:04:33.919 adding another layer of concurrency:
00:04:36.400 threads. It's a hybrid multi-process,
00:04:39.919 multi-threaded architecture. As a
00:04:42.560 generalist, it performs well in many
00:04:44.960 scenarios, but it faces challenges at
00:04:48.400 two extremes.
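
In config/puma.rb this hybrid maps to two dials; an illustrative configuration (values are examples to tune, not recommendations from the talk):

    # config/puma.rb
    workers 4          # pre-forked processes: one per core for parallelism
    threads 1, 3       # min/max threads per worker, all sharing one GVL
    preload_app!       # boot the app in the master to maximize CoW sharing
    port ENV.fetch("PORT", 3000)
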
00:04:50.720 Let's examine its first major constraint
00:04:53.440 which appears under heavily IO-bound
00:04:56.560 workloads.
00:04:58.960 When threads block waiting for database
00:05:01.199 queries,
00:05:03.199 API calls or file operations, they
00:05:06.479 become unavailable to serve new
00:05:09.120 requests. This effectively reduces your
00:05:13.199 system's capacity even when your CPU
00:05:16.240 resources remain idle.
00:05:21.039 When your system operates near capacity,
00:05:23.919 this leads to a cascade of performance
00:05:26.160 degradation.
00:05:27.680 CPU utilization drops as threads wait,
00:05:30.800 throughput plummets, and the request
00:05:32.960 backlog grows. Your server becomes
00:05:35.440 paralyzed, not from being overworked,
00:05:38.720 but from being stuck in a waiting state.
00:05:42.960 A seemingly intuitive response to IO
00:05:45.840 bottlenecks is to increase the thread
00:05:47.840 count.
00:05:49.280 This can provide a temporary increase in
00:05:51.840 capacity. However, for a typical mixed
00:05:55.759 workload application, this strategy
00:05:58.400 introduces a new problem. It increases
00:06:01.759 the contention for the GVL,
00:06:04.479 which brings us to constraint number
00:06:06.479 two, the GVL.
00:06:10.080 As a quick refresher, Ruby's threading
00:06:12.960 model has three key characteristics.
00:06:16.080 First, within a process, only one thread
00:06:19.039 can hold the GVL and execute Ruby code
00:06:21.520 at a time. Second, IO operations are the
00:06:25.199 exception.
00:06:26.960 A thread releases the GVL before a
00:06:29.840 blocking IO call.
00:06:32.319 And third, Ruby implements preemptive
00:06:35.199 scheduling. So, a thread can't hold the
00:06:37.280 GVL for more than 100 milliseconds if
00:06:40.160 another thread is waiting for the GVL.
00:06:44.080 Let's observe how this GVL contention
00:06:46.639 manifests. We will trace two requests
00:06:49.440 that in isolation each take 270
00:06:53.520 milliseconds to complete. Now we'll see
00:06:56.160 what happens when they are processed
00:06:58.160 concurrently by two threads in the same
00:07:00.960 Puma worker.
00:07:04.080 Despite both requests having the same
00:07:06.560 individual completion time, the GVL
00:07:09.199 contention creates additional
00:07:11.599 non-deterministic latency for both.
00:07:15.759 The result is a 48% increase in latency
00:07:18.639 for request one and a 37% increase for
00:07:21.759 request two. This is the GVL tax and it's
00:07:25.759 not a theoretical problem. It has driven
00:07:28.400 major real world architectural
00:07:30.080 decisions. For instance, the Rails team
00:07:33.199 lowered the default Puma threads from
00:07:35.599 five to three, directly citing this latency
00:07:38.160 impact. Basecamp, prioritizing
00:07:41.120 predictable performance, runs Puma with
00:07:43.520 just a single thread. And Shopify, to
00:07:46.639 optimize for latency, runs process-based
00:07:49.280 servers like Unicorn and Pitchfork. These
00:07:52.720 are major players in our ecosystem all
00:07:55.199 making a clear tradeoff. They're
00:07:57.759 sacrificing some of the benefits of the
00:07:59.840 threaded model to avoid the GVL tax.
00:08:03.280 This is a core constraint of Puma's
00:08:06.000 hybrid architecture.
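
The tax is easy to reproduce; a rough sketch (not the talk's benchmark, and exact numbers will vary by machine) timing one CPU-bound "request" alone and then two competing for the GVL:

    require "benchmark"

    work = -> { 5_000_000.times { |i| i * i } } # stand-in for a CPU-heavy request

    solo = Benchmark.realtime { work.call }

    # Two concurrent "requests" in one process: the GVL interleaves them,
    # so each one's wall-clock latency grows although total work is unchanged.
    t1 = t2 = nil
    [
      Thread.new { t1 = Benchmark.realtime { work.call } },
      Thread.new { t2 = Benchmark.realtime { work.call } }
    ].each(&:join)

    puts "solo:       #{(solo * 1000).round}ms"
    puts "concurrent: #{(t1 * 1000).round}ms and #{(t2 * 1000).round}ms"
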
00:08:08.960 So we have established Puma's two
00:08:11.120 challenges, IO blocking and GVL
00:08:14.240 contention.
00:08:15.840 Let's address the IO problem directly.
00:08:19.039 What if we used a different concurrency
00:08:21.360 primitive than threads? This leads us to
00:08:24.960 an entirely different architecture:
00:08:27.039 Falcon, which is built on fibers and an
00:08:29.840 event loop.
00:08:32.000 Like Puma, Falcon is a multi-process
00:08:35.519 pre-forking server for CPU parallelism.
00:08:38.719 The key difference is inside each
00:08:40.800 worker. There is no thread pool.
00:08:43.839 Instead, an event loop manages tasks.
00:08:47.680 When a request arrives, it is assigned
00:08:50.000 to a lightweight fiber.
00:08:52.560 There are no limits on the number of
00:08:54.800 fibers spawned. Falcon will spawn as
00:08:57.839 many fibers as there are requests.
00:09:01.040 The power of this model is how it
00:09:03.200 handles IO.
00:09:05.120 When a fiber encounters a blocking
00:09:07.440 operation, the fiber scheduler
00:09:09.839 automatically yields control, but the
00:09:12.240 worker process itself does not block.
00:09:15.040 The event loop is free to immediately
00:09:17.519 run another fiber. This allows a single
00:09:20.959 worker to handle thousands of concurrent
00:09:23.600 IO operations.
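
The async gem that Falcon builds on exposes this model directly; a standalone sketch (illustrative URLs, not Falcon's internals):

    require "async"      # gem "async"
    require "net/http"

    # Each task runs in a fiber; when one blocks on HTTP, the scheduler
    # resumes another, so the requests proceed concurrently in one thread.
    Async do
      %w[https://example.com/a https://example.com/b https://example.com/c].each do |url|
        Async do
          response = Net::HTTP.get_response(URI(url))
          puts "#{url}: #{response.code}"
        end
      end
    end
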
00:09:26.399 However, this introduces a critical
00:09:28.800 trade-off. Threads are preempted by the
00:09:31.600 runtime. Fibers, on the other hand, are
00:09:34.560 cooperative. They must explicitly yield.
00:09:38.240 A fiber will continue running until it
00:09:40.959 completes, yields, or performs a
00:09:43.760 blocking IO operation. The consequence
00:09:47.040 is that a single long-running CPU-bound
00:09:50.480 fiber will block the event loop and
00:09:53.120 starve all other fibers.
00:09:55.680 This makes Falcon a highly specialized
00:09:58.160 tool. It is exceptionally efficient for
00:10:00.959 IO-bound workloads, but it is not
00:10:03.440 designed for CPU-bound workloads.
00:10:06.800 So when should we use Falcon?
00:10:10.080 The obvious answer is for any
00:10:12.480 application with very high IO usage.
00:10:15.760 Anything that spends most of its time
00:10:18.079 waiting for the network or the disk.
00:10:21.360 But the more interesting answer is that
00:10:23.200 Falcon gives us a fundamentally new way
00:10:25.760 to write concurrent code in Ruby. If you
00:10:28.399 have ever looked at event-driven
00:10:30.000 concurrency in NodeJS and wished you had
00:10:33.200 something similar in Ruby, Falcon is
00:10:35.519 your answer. This approach, powered by Falcon
00:10:39.360 and the Async gem, is exceptionally
00:10:41.920 powerful. It allows you to break
00:10:45.120 long-held Rails rules that were created to
00:10:47.760 work around blocking IO. For example,
00:10:50.800 the rule never make a slow API call in
00:10:54.320 the controller exists because it would
00:10:57.360 tie up a precious Puma thread. With
00:11:00.320 Falcon, that rule no longer exists. You
00:11:03.839 can make those slow API calls directly
00:11:06.320 and with much simpler code. This
00:11:09.519 approach has tremendous potential. If
00:11:12.079 this interests you, I did a deep dive on
00:11:14.320 both Falcon and the Async gem last year.
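
Concretely, under Falcon a controller can fan out slow upstream calls itself; a hedged sketch (the API client classes are hypothetical):

    require "async"

    class QuotesController < ApplicationController
      # Two slow upstream calls made concurrently. Under Falcon only this
      # request's fiber waits; the worker keeps serving other requests.
      def show
        Sync do |task|
          rates  = task.async { RatesApi.fetch(params[:id]) }   # hypothetical client
          quotes = task.async { QuotesApi.fetch(params[:id]) }  # hypothetical client
          render json: { rates: rates.wait, quotes: quotes.wait }
        end
      end
    end
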
00:11:19.360 So let's reassess. Falcon provides an
00:11:23.279 effective solution for heavily IO-bound
00:11:25.839 systems. However, it does not address
00:11:29.040 the GVL contention that impacts CPU-bound
00:11:32.000 work. It also doesn't solve certain
00:11:34.880 memory inefficiencies
00:11:36.959 inherent in the standard copy on write
00:11:39.120 model. To address those remaining
00:11:41.360 issues, we need to look at an evolution
00:11:44.079 of the pure process model. That brings
00:11:46.560 us to Pitchfork.
00:11:50.079 At its heart, Pitchfork is a direct
00:11:52.399 descendant of Unicorn. It fully embraces
00:11:55.920 the one request per process model. This
00:11:58.959 provides true GVL-free parallelism and
00:12:02.000 outstanding resiliency. But Pitchfork's
00:12:04.720 creator wanted to solve the biggest
00:12:07.040 drawback of this model, memory usage.
00:12:11.519 Let's revisit our copy on write example.
00:12:15.279 We have a parent and two fork children
00:12:18.079 with both shared and private memory.
00:12:22.399 The key to pitchfork's optimization is
00:12:25.040 realizing that private memory isn't
00:12:28.160 monolithic. It has two parts. There is
00:12:32.079 private processing memory for objects in
00:12:34.959 a single request which gets garbage
00:12:36.880 collected.
00:12:38.639 There is also private static memory used
00:12:41.680 by the VM for things like inline caches
00:12:44.720 and JIT-compiled code. This is the warm-up
00:12:48.399 data that makes a worker fast.
00:12:52.079 And this reveals the critical
00:12:53.839 inefficiency of a standard server. As
00:12:57.120 each worker warms up, it builds this
00:12:59.440 private static block of JIT caches and
00:13:02.240 VM optimizations.
00:13:04.240 Now the crucial point is this. While the
00:13:07.040 contents of these caches will be nearly
00:13:09.920 identical across all warmed up workers,
00:13:12.880 the physical memory pages they occupy
00:13:15.920 are not shared. Each worker has to build
00:13:19.360 and store its own private copy. You end
00:13:22.880 up with dozens of processes all holding
00:13:26.560 identical but separate copies of the
00:13:29.200 same warm-up data. This is a significant
00:13:32.959 waste of memory. It is this precise
00:13:35.680 inefficiency, the duplication of
00:13:38.480 identical but non-shared data that
00:13:41.279 Pitchfork's reforking is designed to
00:13:44.320 eliminate.
00:13:46.959 Reforking is the core idea of Pitchfork.
00:13:50.560 Once a worker is fully warmed up, it's a
00:13:53.760 perfect template for new workers.
00:13:56.720 Instead of forking from the cold master,
00:13:59.760 pitchfork can promote a hot worker to
00:14:02.639 become a new mold and fork new
00:14:06.160 pre-warmed workers from it.
00:14:10.000 When reforking is enabled, Pitchfork forks
00:14:13.440 a gen zero mold which boots your
00:14:15.760 application.
00:14:17.360 Then the workers are spawned from the
00:14:19.920 mold.
00:14:22.160 Reforking is triggered when a
00:14:24.399 configured number of requests have been
00:14:26.240 processed. When it is triggered, the
00:14:29.279 monitor spawns a gen one mold from one
00:14:32.320 of the workers. Once the gen one mold
00:14:35.440 has started, the gen zero mold is
00:14:37.760 terminated.
00:14:40.240 Then one by one each worker is replaced
00:14:44.000 by a fork from the new mold. This
00:14:47.040 process repeats for several generations.
00:14:51.199 The result is a cluster where every
00:14:53.839 single worker shares the exact same
00:14:56.560 fully populated caches and jitted code
00:14:59.440 from the moment it starts.
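
In configuration this is the refork_after setting; an illustrative pitchfork.conf.rb (thresholds are examples):

    # pitchfork.conf.rb
    worker_processes 16

    # Per-generation request thresholds: promote a worker to the gen 1 mold
    # after 50 requests, gen 2 after 100, then 1000 (see Pitchfork's docs
    # for the exact semantics of the last entry).
    refork_after [50, 100, 1000]
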
00:15:03.440 Due to an operating system bias on how
00:15:06.320 traffic is distributed, some workers
00:15:08.959 will handle significantly more traffic
00:15:11.360 than others. Since the criteria for
00:15:14.560 promoting a worker into the new mold is
00:15:17.600 a number of requests it has handled,
00:15:20.000 it's almost always the most warmed up
00:15:22.480 worker that ends up being used as a
00:15:24.880 mold.
00:15:27.440 The result is dramatic. The memory that
00:15:30.480 was previously private static is now
00:15:33.199 shared across all workers. The only
00:15:36.399 private memory each worker needs is for
00:15:38.800 the request it is actively processing.
00:15:42.800 This makes adding more workers
00:15:45.199 remarkably cheap from a memory usage
00:15:47.680 perspective.
00:15:51.680 So why would you choose pitchfork?
00:15:54.800 First, it's built on a simple and robust
00:15:58.320 process-based foundation. This gives you
00:16:01.360 powerful operational benefits like
00:16:03.759 out-of-band GC, simpler performance
00:16:06.399 debugging without GVL contention and
00:16:09.279 outstanding resiliency. If a worker
00:16:11.920 misbehaves, you can simply terminate
00:16:14.000 that one process and it doesn't affect
00:16:16.000 any other request.
00:16:19.519 Second, pitchfork builds on that
00:16:21.680 foundation with its unique innovation:
00:16:24.240 reforking.
00:16:25.759 This provides two key advantages. It
00:16:29.199 achieves significant memory efficiency
00:16:31.839 by sharing the JIT cache and other VM
00:16:34.320 artifacts that are normally private. And
00:16:37.519 critically, it delivers a low and
00:16:40.320 consistent latency because reforking
00:16:42.959 ensures every new worker is a perfect
00:16:46.079 pre-warmed copy. Here's
00:16:50.240 publicly shared data from Pitchfork's
00:16:52.399 deployment on the main Shopify
00:16:54.240 monolith. They reported a 30% reduction
00:16:57.600 in memory usage. More importantly, they
00:17:00.880 achieved a 9% reduction in P99 latency.
00:17:05.120 These are significant measurable
00:17:06.959 improvements at a massive scale.
00:17:11.760 Of course, there are some caveats.
00:17:13.919 Reforking isn't enabled by default, and
00:17:16.720 there are a few gems which are
00:17:18.079 incompatible.
00:17:19.919 You also have to ensure that your
00:17:21.839 application is fork safe.
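
Fork safety mostly means recreating per-process state (connections, background threads) after each fork, and Pitchfork's configuration exposes hooks for this; a sketch (hook name per Pitchfork's docs; the body is illustrative):

    # pitchfork.conf.rb
    after_worker_fork do |server, worker|
      # Reopen anything that must not be shared across forks, e.g.
      # database connections, Redis clients, gems' background threads.
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
    end
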
00:17:26.559 So to bring everything together, let's
00:17:28.880 look at a decision framework.
00:17:32.400 Each server has strengths and a set of
00:17:35.600 trade-offs.
00:17:37.120 Puma solves for general purpose
00:17:39.120 concurrency. Its trade-offs are the GVL
00:17:42.320 contention and IO blocking we analyzed.
00:17:46.080 Falcon solves for high IO throughput.
00:17:49.039 Its trade-off is a cooperative
00:17:50.799 scheduler, which makes it unsuitable
00:17:53.280 for CPU heavy workloads.
00:17:56.320 And Pitchfork solves for the lowest latency
00:17:59.200 and memory usage. Its trade-offs are
00:18:02.240 potentially lower throughput on mixed
00:18:04.400 workloads and the engineering burden of
00:18:07.440 ensuring fork safety.
00:18:10.240 This means that the decision comes down
00:18:12.320 to two questions. First, what is the
00:18:15.919 nature of your workload? Analyze your
00:18:18.720 system's telemetry to understand your
00:18:21.280 actual bottlenecks.
00:18:23.280 And second, what do you value? Is your
00:18:26.799 priority raw throughput, predictable low
00:18:29.919 latency, or absolute resiliency?
00:18:33.679 There isn't a single best choice, but
00:18:36.640 there is a right choice. It's the one
00:18:39.840 whose trade-off you're most willing to
00:18:41.760 accept to solve your specific problems.
00:18:45.679 Thank you.
00:18:56.559 Sorry, I ran through it faster than I
00:18:58.400 practiced.
00:19:02.960 So, uh, since we have a ton of time,
00:19:05.280 like if you have questions, I'm glad to
00:19:07.200 answer.
00:19:08.720 So you went through your three favorite
00:19:12.160 or the three that you think are
00:19:15.039 appropriate in 2025. There's obviously
00:19:17.600 others around. We've got Unicorn, or
00:19:21.760 the team I'm on is still
00:19:24.000 using Passenger. Uh how do those fit
00:19:26.960 into this scene?
00:19:29.840 I think
00:19:31.919 uh like if you're running Unicorn uh
00:19:35.360 even its creator has said that it is not
00:19:37.679 actively maintained uh if I remember
00:19:40.480 right like you can run Pitchfork without
00:19:43.360 uh reforking enabled and you get a
00:19:46.240 modern web server which is actively
00:19:48.559 deployed uh and maintained so and I
00:19:52.960 don't know anybody who uses Passenger
00:19:54.720 these days, so no answer to that.
00:19:59.039 Yeah, you gave some numbers around uh
00:20:01.919 memory gains, efficiency gains. Uh were
00:20:04.240 those versus base rails or were they
00:20:07.840 against uh unicorn or
00:20:10.400 so? Uh the Shopify numbers, they were
00:20:13.039 running Unicorn before, and uh the memory
00:20:16.400 reduction and the latency reduction is
00:20:18.720 from running Pitchfork.
00:20:22.320 Yeah.
00:20:22.960 Yes. Thank you. Um, you mentioned there
00:20:25.120 are some gems that are not compatible
00:20:26.559 with reforking. Is there a way
00:20:30.080 for gems to know which are compatible or
00:20:32.240 is there like a list or is there
00:20:34.320 something they do specifically that
00:20:36.480 makes them incompatible?
00:20:37.840 Yeah, there is a list. Uh, in fact, uh,
00:20:40.080 Pitchfork has amazing documentation. Uh,
00:20:43.600 in fact, uh, I think it's one of the
00:20:45.520 best that I've come across across all
00:20:47.039 gems because it specifically tells you
00:20:49.520 it doesn't try to sell you Pitchfork. Uh
00:20:51.840 it tells you why uh if you're happy with
00:20:54.320 what you have continue with it and if
00:20:56.720 you have only these specific problems
00:20:58.159 proceed with it and uh it has an
00:21:01.280 exhaustive I mean it has a list of all
00:21:03.280 the gems that they have found
00:21:04.640 incompatible and which they have also
00:21:07.760 working on making it compatible so one
00:21:09.760 of the most common ones is gRPC, uh but
00:21:13.120 there exists a fork of gRPC which is
00:21:16.159 reforking-safe.
00:21:20.640 How do you check if your application
00:21:23.039 is fork safe?
00:21:25.440 Uh if it doesn't have any issues in
00:21:27.679 production I guess
00:21:33.840 and and what happens when there is a
00:21:36.000 memor memory leak and then that process
00:21:38.799 keeps getting forked afterwards.
00:21:42.240 Um so one of the reasons why uh the
00:21:46.240 creator of pitchfork started this was
00:21:48.320 memory leak in their uh monolithic
00:21:51.440 application in the sense that they had
00:21:53.840 already configured to uh terminate a
00:21:56.880 worker in Unicorn when it breached a
00:21:59.120 certain memory, and with Pitchfork, since
00:22:02.080 everything is shared the memory usage
00:22:04.000 itself has come down. So if there is a
00:22:05.760 memory leak that is purely due to that
00:22:08.240 particular request that it has handled.
00:22:10.320 So, and you can apply everything that
00:22:12.559 you apply in Unicorn to Pitchfork, like
00:22:14.240 you can configure timeouts or uh kill a
00:22:17.840 worker if it breaches a certain memory
00:22:19.760 and all that.
00:22:21.679 Yeah.
00:22:23.280 One one slightly more tactical question
00:22:25.760 like how do you like let's say we want
00:22:28.080 to use pitchfork or falcon in our
00:22:30.480 current like Puma you know servers or
00:22:33.200 whatever.
00:22:34.000 Yeah. Yeah. Do have you like how how do
00:22:35.760 you even test that because like it's
00:22:37.919 like you just deploy Puma and see what
00:22:39.600 happens or like deploy Falcon and see
00:22:41.120 what happens. I'm I'm I'm certain that's
00:22:43.520 not the answer like is there tooling
00:22:45.760 around like have you guys tactically
00:22:47.360 tried deploying like the same
00:22:50.480 application on two and comparing like
00:22:52.000 how do you guys did you guys do that?
00:22:54.159 Yeah, I have deployed uh Puma and Falcon
00:22:58.559 together, not Pitchfork though. Uh like
00:23:01.200 what I have done is I'd set up a proxy
00:23:05.039 in front of it and forwarded only
00:23:06.960 certain requests to Falcon which I knew
00:23:09.039 were IO-heavy because they were calling
00:23:11.520 some external APIs and I didn't want to
00:23:14.080 handle that through Sidekiq, and I knew
00:23:16.000 that it was much easier to do in
00:23:17.840 Falcon. Uh but I don't think in that
00:23:22.159 particular case it would have mattered
00:23:23.679 if the entirety of the application was
00:23:26.559 served by Falcon because we were not
00:23:28.480 really CPU-bound. But you can always like
00:23:32.720 deploy the same Rails application in
00:23:34.320 multiple servers and just pass a certain
00:23:37.039 percentage of traffic to it and observe
00:23:39.120 how it behaves.
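
The proxy-based traffic split he describes can be done with standard tooling; for example, an illustrative nginx sketch (ports and percentage are placeholders; directives go inside the http block) sending a slice of clients to a Falcon deployment:

    # nginx sketch: route ~10% of clients to Falcon, the rest to Puma
    upstream puma_app   { server 127.0.0.1:3000; }
    upstream falcon_app { server 127.0.0.1:3001; }

    split_clients "${remote_addr}" $app_backend {
        10%   falcon_app;
        *     puma_app;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://$app_backend;
        }
    }
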
00:23:40.720 Uh so thanks for a nice talk. Um and
00:23:43.440 since Puma is probably fine for most
00:23:46.000 Rails applications uh unless there is a
00:23:48.559 GVL contention um issue or problem, how
00:23:52.080 would I go about um finding out if my
00:23:54.960 application um is prone to to those kind
00:23:58.480 of things?
00:24:01.440 A lot of the APM frameworks give you uh
00:24:04.480 data on that. Uh but sometimes it's it's
00:24:07.360 tricky. Uh like I have seen
00:24:11.279 uh misattributed uh data like that the
00:24:15.360 database is being very slow which I then
00:24:17.840 later realized that it was it wasn't
00:24:19.600 actually slow but it was just because of
00:24:21.679 GVL contention that it appeared to be
00:24:23.520 slow. So uh one of the things that I've
00:24:26.799 done is try reducing the Puma uh thread
00:24:29.679 counts and see how the application
00:24:31.520 behaves. Uh so if you see a
00:24:35.120 significant increase in
00:24:37.760 performance, that's a
00:24:38.960 decrease in latency, uh then that means
00:24:41.840 you have GVL contention which is
00:24:43.520 affecting your application. So you can
00:24:46.240 just start by reducing your Puma thread count.
00:24:49.520 Anyone else?
00:24:52.640 Okay. Thank you all.