Understanding Ruby Web Server Internals: Puma, Falcon, and Pitchfork Compared

Manu Janardhanan • July 08, 2025 • Philadelphia, PA • Talk

Introduction

This talk, delivered by Manu Janardhanan at RailsConf 2025, focuses on the internal architectures of three modern Ruby web servers: Puma, Falcon, and Pitchfork. The session aims to provide Ruby developers with a comprehensive framework to confidently choose, tune, and optimize a web server according to their application's unique workload and requirements.

Key Points Covered

  • Foundational Principles

    • The Global VM Lock (GVL) in Ruby restricts true parallelism within a single process, necessitating multi-process architectures for full hardware utilization.
    • Pre-forking and copy-on-write (CoW) mechanisms make forking multiple processes efficient by sharing memory until it is modified, though measuring memory usage accurately can be challenging.
  • Puma: The Hybrid Generalist

    • Puma combines a pre-forking master-worker model with multithreading, balancing memory and performance for general-purpose workloads.
    • Two main constraints:
      • IO-bound workloads: thread blocking on IO can reduce capacity and cause latency breakdowns.
      • GVL contention: increasing thread count can increase contention for the GVL, causing unpredictable latency and degraded throughput.
    • Real-world decisions:
      • Organizations like Basecamp and Shopify have adjusted thread counts, or moved to process-based servers, to minimize latency introduced by the GVL.
  • Falcon: Fiber and Event-Loop Driven

    • Falcon is a multi-process pre-forking server that uses fibers and an event loop instead of threads in each worker.
    • Each incoming request is managed by a lightweight fiber, enabling thousands of concurrent IO operations efficiently.
    • Fibers are cooperative; a CPU-bound fiber can block the event loop, making Falcon less suitable for CPU-heavy tasks.
    • Best suited for highly IO-bound applications, Falcon offers Ruby developers a new paradigm similar to NodeJS-style event-driven concurrency, removing constraints like "no slow API calls in controllers".
  • Pitchfork: Memory-Efficient Process Model

    • Pitchfork follows a pure one-request-per-process model like Unicorn but introduces innovations to improve memory efficiency:
      • Re-forking: uses a 'hot' (fully warmed-up) worker as a mold to spawn new workers, ensuring that memory such as JIT caches and VM artifacts is shared across workers, dramatically reducing redundant memory usage.
      • Shopify's deployment showed a 30% memory reduction and a 9% P99 latency reduction.
    • Pitchfork also provides resilience and easier debugging but requires fork-safe applications and may have compatibility issues with certain gems.

Practical Considerations and Decision Framework

  • Evaluate your application's true workload (CPU vs. IO-bound) with real telemetry.
  • Consider your priorities: throughput, predictable low latency, or resilience.
  • Use A/B deployments and traffic splitting to assess alternative servers in your environment.
  • There's no single best server—the right choice depends on which trade-offs match your application's needs.

Conclusion

The architecture of a Ruby web server—whether hybrid (Puma), fiber-based (Falcon), or process-oriented with memory optimizations (Pitchfork)—carries distinct trade-offs. Understanding these can give developers the leverage to improve performance, reduce latency, and optimize resource usage in production environments.

Understanding Ruby Web Server Internals: Puma, Falcon, and Pitchfork Compared
Manu Janardhanan • Philadelphia, PA • Talk

Date: July 08, 2025
Published: July 23, 2025

As Ruby developers, we often focus on our application code, but the choice of web server can significantly impact performance, scalability, and resource efficiency.

In this talk, we’ll explore Puma, Falcon, and Pitchfork - three modern Ruby web servers with distinct execution models. We’ll cover:

* How pre-forking and copy-on-write help minimize boot time and memory usage.
* How Falcon eliminates the traditional challenges of blocking I/O in Ruby, enabling new approaches to IO-bound workloads.
* Why Puma’s hybrid model balances concurrency, performance, and memory usage.
* How Pitchfork optimizes for latency and memory efficiency, and why it’s a good choice for CPU-bound applications.

By the end, you’ll understand how to choose, tune, and optimize your web server for your specific use case.

RailsConf 2025

00:00:16.640 Thank you everyone for coming to my
00:00:18.160 talk. Uh I wasn't nervous before but
00:00:20.720 looking at the size of the crowd now I
00:00:22.240 am. Uh yeah. So let's get into it.
00:00:29.359 So why care about your web server?
00:00:32.239 Because choosing your web server is one
00:00:34.399 of the highest leverage decisions you
00:00:36.559 can make. It impacts performance, the
00:00:39.760 infrastructure cost, and the kind of
00:00:41.840 production issues you see. But there is
00:00:44.320 another lesser known reason this choice
00:00:47.360 is critical. The right web server can
00:00:50.800 provide an elegant solution to problems
00:00:53.680 that are otherwise complex to solve in
00:00:56.320 your application code. My goal today is
00:00:59.840 to give you a framework to navigate
00:01:02.079 these choices so you can confidently
00:01:04.559 select the right server for your needs.
00:01:09.040 We will start with the foundational
00:01:10.960 principles that govern the behavior of
00:01:13.680 modern Ruby web servers. Then we will
00:01:17.040 analyze Puma to understand its
00:01:19.759 architectural constraints. From there we
00:01:22.960 will examine two specialized servers
00:01:25.520 Falcon and Pitchfork
00:01:28.000 each addressing one of those
00:01:30.400 constraints. We will conclude with a
00:01:33.040 framework for making a decision. Let's
00:01:36.000 begin with the fundamentals.
00:01:40.799 To understand Ruby web servers, we have
00:01:43.360 to start with the global VM lock or GVL.
00:01:46.880 The GVL ensures that within a single
00:01:49.680 process, only one thread can execute
00:01:52.640 Ruby code at a time. This has direct
00:01:55.360 consequence. If you have a four core
00:01:57.840 server and you run only one Rails
00:02:00.079 process, you are utilizing only one of
00:02:02.719 those cores. To achieve parallelism and
00:02:06.079 to utilize all of your hardware, you
00:02:08.959 need to run at least four Rails
00:02:10.560 processes.
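
To make the GVL's consequence concrete, here is a minimal sketch (not from the talk) timing the same CPU-bound work in threads versus forked processes; on a multi-core machine the forked version finishes in roughly a quarter of the time:

    require "benchmark"

    def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

    # Four threads in one process: the GVL serializes Ruby execution,
    # so this takes about as long as running the calls sequentially.
    threaded = Benchmark.realtime do
      4.times.map { Thread.new { fib(30) } }.each(&:join)
    end

    # The same work in four forked processes runs in true parallel.
    forked = Benchmark.realtime do
      4.times.map { fork { fib(30) } }.each { |pid| Process.wait(pid) }
    end

    puts "threads:   #{threaded.round(2)}s"
    puts "processes: #{forked.round(2)}s"
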
00:02:13.280 This multiprocess approach isn't new.
00:02:16.959 Early servers like Mongrel required
00:02:19.360 running independent processes which was
00:02:22.400 slow to start and hard to manage. The
00:02:25.680 major evolution was a pre-forking model
00:02:28.800 popularized by Unicorn.
00:02:31.200 The idea is to start one master process
00:02:34.959 that boots the application and it then
00:02:37.599 fks cheap fast starting worker
00:02:40.560 processes. This master worker model is a
00:02:43.920 foundation for the servers we use today.
00:02:47.680 This design provides two fundamental
00:02:49.680 advantages. First centralized
00:02:52.640 management. The master process can
00:02:55.200 monitor, restart, and manage its workers.
00:02:58.720 Restarting a failed worker is nearly
00:03:00.959 instantaneous.
00:03:02.800 Second and more importantly is memory
00:03:05.760 efficiency enabled by an operating
00:03:08.400 system feature called copy on write.
00:03:12.319 Copy on write is an OS level
00:03:14.239 optimization.
00:03:15.840 When a master process forks a worker, the
00:03:18.800 OS doesn't immediately duplicate all the
00:03:22.159 memory. Instead, they all share the same
00:03:26.159 copy. So if our parent process uses 300
00:03:30.239 MB to load the app after forking two
00:03:33.360 workers, the total memory usage is still
00:03:37.040 just 300 MB. This is what makes
00:03:40.239 multi-process architecture viable.
00:03:44.239 When either the parent or the child
00:03:47.040 modifies memory, it becomes private
00:03:50.080 memory. Any new memory allocations done
00:03:53.599 by the processes are also private.
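
The mechanics are easy to see in plain Ruby; a toy sketch of the pre-forking pattern (illustrative, not any server's actual code):

    # Toy master: boot once, then fork workers that share pages copy-on-write.
    puts "master #{Process.pid}: booting app..."
    app_data = "x" * 100_000_000 # stand-in for a loaded application

    pids = 4.times.map do
      fork do
        # Reading app_data keeps these pages shared with the master;
        # writing to it would trigger copy-on-write and make them private.
        puts "worker #{Process.pid}: serving (sharing #{app_data.bytesize} bytes)"
        sleep 1 # stand-in for accepting and serving requests
      end
    end

    pids.each { |pid| Process.wait(pid) }
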
00:03:58.000 Because of the sharing, measuring real
00:04:00.799 memory usage can be tricky. Tools like
00:04:04.239 top often show RSS or resident set size,
00:04:08.400 which overstates the memory usage by not
00:04:11.200 accounting for shared memory. A more
00:04:13.760 accurate metric is PSS or proportional
00:04:16.880 set size.
00:04:18.799 We won't go into the details, but it's
00:04:20.959 important to know that this complexity
00:04:22.960 exists when you're analyzing memory
00:04:25.040 usage.
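
On Linux you can compare the two metrics for a running process directly from /proc; a small sketch (Linux-only; field names per smaps_rollup):

    # RSS counts shared pages in full for every process that maps them;
    # PSS divides each shared page among the processes sharing it.
    rollup = File.read("/proc/self/smaps_rollup")
    rss = rollup[/^Rss:\s+(\d+) kB/, 1].to_i
    pss = rollup[/^Pss:\s+(\d+) kB/, 1].to_i
    puts "RSS: #{rss} kB, PSS: #{pss} kB"
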
00:04:27.440 That brings us to Puma.
00:04:31.280 Puma builds on the pre-forking model by
00:04:33.919 adding another layer of concurrency:
00:04:36.400 threads. It's a hybrid multi-process,
00:04:39.919 multi-threaded architecture. As a
00:04:42.560 generalist, it performs well in many
00:04:44.960 scenarios, but it faces challenges at
00:04:48.400 two extremes.
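
In config/puma.rb this hybrid maps to two dials; an illustrative configuration (values are examples to tune, not recommendations from the talk):

    # config/puma.rb
    workers 4          # pre-forked processes: one per core for parallelism
    threads 1, 3       # min/max threads per worker, all sharing one GVL
    preload_app!       # boot the app in the master to maximize CoW sharing
    port ENV.fetch("PORT", 3000)
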
00:04:50.720 Let's examine its first major constraint
00:04:53.440 which appears under heavily IO-bound
00:04:56.560 workloads.
00:04:58.960 When threads block waiting for database
00:05:01.199 queries,
00:05:03.199 API calls or file operations, they
00:05:06.479 become unavailable to serve new
00:05:09.120 requests. This effectively reduces your
00:05:13.199 system's capacity even when your CPU
00:05:16.240 resources remain idle.
00:05:21.039 When your system operates near capacity,
00:05:23.919 this leads to a cascade of performance
00:05:26.160 degradation.
00:05:27.680 CPU utilization drops as threads wait,
00:05:30.800 throughput plummets, and the request
00:05:32.960 backlog grows. Your server becomes
00:05:35.440 paralyzed, not from being overworked,
00:05:38.720 but from being stuck in a waiting state.
00:05:42.960 A seemingly intuitive response to IO
00:05:45.840 bottlenecks is to increase the thread
00:05:47.840 count.
00:05:49.280 This can provide a temporary increase in
00:05:51.840 capacity. However, for a typical mixed
00:05:55.759 workload application, this strategy
00:05:58.400 introduces a new problem. It increases
00:06:01.759 the contention for the GVL,
00:06:04.479 which brings us to constraint number
00:06:06.479 two, the GVL.
00:06:10.080 As a quick refresher, Ruby's threading
00:06:12.960 model has three key characteristics.
00:06:16.080 First, within a process, only one thread
00:06:19.039 can hold the GVL and execute Ruby code
00:06:21.520 at a time. Second, IO operations are the
00:06:25.199 exception.
00:06:26.960 A thread releases the GVL before a
00:06:29.840 blocking IO call.
00:06:32.319 And third, Ruby implements preemptive
00:06:35.199 scheduling. So, a thread can't hold the
00:06:37.280 GVL for more than 100 milliseconds if
00:06:40.160 another thread is waiting for the GVL.
00:06:44.080 Let's observe how this GVL contention
00:06:46.639 manifests. We will trace two requests
00:06:49.440 that in isolation each take 270
00:06:53.520 milliseconds to complete. Now we'll see
00:06:56.160 what happens when they are processed
00:06:58.160 concurrently by two threads in the same
00:07:00.960 Puma worker.
00:07:04.080 Despite both requests having the same
00:07:06.560 individual completion time, the GVL
00:07:09.199 contention creates additional
00:07:11.599 non-deterministic latency for both.
00:07:15.759 The result is a 48% increase in latency
00:07:18.639 for request one and a 37% increase for
00:07:21.759 request two. This is the GVL tax and it's
00:07:25.759 not a theoretical problem. It has driven
00:07:28.400 major real world architectural
00:07:30.080 decisions. For instance, the Rails team
00:07:33.199 lowered the default Puma threads from
00:07:35.599 five to three, directly citing this latency
00:07:38.160 impact. Basecamp, prioritizing
00:07:41.120 predictable performance, runs Puma with
00:07:43.520 just a single thread. And Shopify, to
00:07:46.639 optimize for latency, runs process-based
00:07:49.280 servers like Unicorn and Pitchfork. These
00:07:52.720 are major players in our ecosystem all
00:07:55.199 making a clear tradeoff. They're
00:07:57.759 sacrificing some of the benefits of the
00:07:59.840 threaded model to avoid the GVL tax.
00:08:03.280 This is a core constraint of Puma's
00:08:06.000 hybrid architecture.
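
The tax is easy to reproduce; a rough sketch (not the talk's benchmark, and exact numbers will vary by machine) timing one CPU-bound "request" alone and then two competing for the GVL:

    require "benchmark"

    work = -> { 5_000_000.times { |i| i * i } } # stand-in for a CPU-heavy request

    solo = Benchmark.realtime { work.call }

    # Two concurrent "requests" in one process: the GVL interleaves them,
    # so each one's wall-clock latency grows although total work is unchanged.
    t1 = t2 = nil
    [
      Thread.new { t1 = Benchmark.realtime { work.call } },
      Thread.new { t2 = Benchmark.realtime { work.call } }
    ].each(&:join)

    puts "solo:       #{(solo * 1000).round}ms"
    puts "concurrent: #{(t1 * 1000).round}ms and #{(t2 * 1000).round}ms"
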
00:08:08.960 So we have established Puma's two
00:08:11.120 challenges, IO blocking and GVL
00:08:14.240 contention.
00:08:15.840 Let's address the IO problem directly.
00:08:19.039 What if we used a different concurrency
00:08:21.360 primitive than threads? This leads us to
00:08:24.960 an entirely different architecture:
00:08:27.039 Falcon, which is built on fibers and an
00:08:29.840 event loop.
00:08:32.000 Like Puma, Falcon is a multi-process
00:08:35.519 pre-forking server for CPU parallelism.
00:08:38.719 The key difference is inside each
00:08:40.800 worker. There is no thread pool.
00:08:43.839 Instead, an event loop manages tasks.
00:08:47.680 When a request arrives, it is assigned
00:08:50.000 to a lightweight fiber.
00:08:52.560 There are no limits on the number of
00:08:54.800 fibers spawned. Falcon will spawn as
00:08:57.839 many fibers as there are requests.
00:09:01.040 The power of this model is how it
00:09:03.200 handles IO.
00:09:05.120 When a fiber encounters a blocking
00:09:07.440 operation, the fiber scheduler
00:09:09.839 automatically yields control, but the
00:09:12.240 worker process itself does not block.
00:09:15.040 The event loop is free to immediately
00:09:17.519 run another fiber. This allows a single
00:09:20.959 worker to handle thousands of concurrent
00:09:23.600 IO operations.
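
The async gem that Falcon builds on exposes this model directly; a standalone sketch (illustrative URLs, not Falcon's internals):

    require "async"      # gem "async"
    require "net/http"

    # Each task runs in a fiber; when one blocks on HTTP, the scheduler
    # resumes another, so the requests proceed concurrently in one thread.
    Async do
      %w[https://example.com/a https://example.com/b https://example.com/c].each do |url|
        Async do
          response = Net::HTTP.get_response(URI(url))
          puts "#{url}: #{response.code}"
        end
      end
    end
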
00:09:26.399 However, this introduces a critical
00:09:28.800 trade-off. Threads are preempted by the
00:09:31.600 runtime. Fibers, on the other hand, are
00:09:34.560 cooperative. They must explicitly yield.
00:09:38.240 A fiber will continue running until it
00:09:40.959 completes, yields, or performs a
00:09:43.760 blocking IO operation. The consequence
00:09:47.040 is that a single long-running CPU-bound
00:09:50.480 fiber will block the event loop and
00:09:53.120 starve all other fibers.
00:09:55.680 This makes Falcon a highly specialized
00:09:58.160 tool. It is exceptionally efficient for
00:10:00.959 IO-bound workloads, but it is not
00:10:03.440 designed for CPU-bound workloads.
00:10:06.800 So when should we use Falcon?
00:10:10.080 The obvious answer is for any
00:10:12.480 application with very high IO usage.
00:10:15.760 Anything that spends most of its time
00:10:18.079 waiting for the network or the disk.
00:10:21.360 But the more interesting answer is that
00:10:23.200 Falcon gives us a fundamentally new way
00:10:25.760 to write concurrent code in Ruby. If you
00:10:28.399 have ever looked at event-driven
00:10:30.000 concurrency in NodeJS and wished you had
00:10:33.200 something similar in Ruby, Falcon is
00:10:35.519 your answer. This approach, powered by Falcon
00:10:39.360 and the Async gem, is exceptionally
00:10:41.920 powerful. It allows you to break
00:10:45.120 long-held Rails rules that were created to
00:10:47.760 work around blocking IO. For example,
00:10:50.800 the rule never make a slow API call in
00:10:54.320 the controller exists because it would
00:10:57.360 tie up a precious Puma thread. With
00:11:00.320 Falcon, that rule no longer exists. You
00:11:03.839 can make those slow API calls directly
00:11:06.320 and with much simpler code. This
00:11:09.519 approach has tremendous potential. If
00:11:12.079 this interests you, I did a deep dive on
00:11:14.320 both Falcon and the Async gem last year.
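
Concretely, under Falcon a controller can fan out slow upstream calls itself; a hedged sketch (the API client classes are hypothetical):

    require "async"

    class QuotesController < ApplicationController
      # Two slow upstream calls made concurrently. Under Falcon only this
      # request's fiber waits; the worker keeps serving other requests.
      def show
        Sync do |task|
          rates  = task.async { RatesApi.fetch(params[:id]) }   # hypothetical client
          quotes = task.async { QuotesApi.fetch(params[:id]) }  # hypothetical client
          render json: { rates: rates.wait, quotes: quotes.wait }
        end
      end
    end
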
00:11:19.360 So let's reassess. Falcon provides an
00:11:23.279 effective solution for heavily IO-bound
00:11:25.839 systems. However, it does not address
00:11:29.040 the GVL contention that impacts CPU-bound
00:11:32.000 work. It also doesn't solve certain
00:11:34.880 memory inefficiencies
00:11:36.959 inherent in the standard copy on write
00:11:39.120 model. To address those remaining
00:11:41.360 issues, we need to look at an evolution
00:11:44.079 of the pure process model. That brings
00:11:46.560 us to Pitchfork.
00:11:50.079 At its heart, Pitchfork is a direct
00:11:52.399 descendant of Unicorn. It fully embraces
00:11:55.920 the one request per process model. This
00:11:58.959 provides true GVL-free parallelism and
00:12:02.000 outstanding resiliency. But Pitchfork's
00:12:04.720 creator wanted to solve the biggest
00:12:07.040 drawback of this model, memory usage.
00:12:11.519 Let's revisit our copy on write example.
00:12:15.279 We have a parent and two fork children
00:12:18.079 with both shared and private memory.
00:12:22.399 The key to pitchfork's optimization is
00:12:25.040 realizing that private memory isn't
00:12:28.160 monolithic. It has two parts. There is
00:12:32.079 private processing memory for objects in
00:12:34.959 a single request which gets garbage
00:12:36.880 collected.
00:12:38.639 There is also private static memory used
00:12:41.680 by the VM for things like inline caches
00:12:44.720 and JIT-compiled code. This is the warm-up
00:12:48.399 data that makes a worker fast.
00:12:52.079 And this reveals the critical
00:12:53.839 inefficiency of a standard server. As
00:12:57.120 each worker warms up, it builds this
00:12:59.440 private static block of JIT caches and
00:13:02.240 VM optimizations.
00:13:04.240 Now the crucial point is this. While the
00:13:07.040 contents of these caches will be nearly
00:13:09.920 identical across all warmed up workers,
00:13:12.880 the physical memory pages they occupy
00:13:15.920 are not shared. Each worker has to build
00:13:19.360 and store its own private copy. You end
00:13:22.880 up with dozens of processes all holding
00:13:26.560 identical but separate copies of the
00:13:29.200 same warm-up data. This is a significant
00:13:32.959 waste of memory. It is this precise
00:13:35.680 inefficiency, the duplication of
00:13:38.480 identical but non-shared data that
00:13:41.279 Pitchfork's reforking is designed to
00:13:44.320 eliminate.
00:13:46.959 Reforking is the core idea of Pitchfork.
00:13:50.560 Once a worker is fully warmed up, it's a
00:13:53.760 perfect template for new workers.
00:13:56.720 Instead of forking from the cold master,
00:13:59.760 pitchfork can promote a hot worker to
00:14:02.639 become a new mold and fork new
00:14:06.160 pre-warmed workers from it.
00:14:10.000 When reforking is enabled, Pitchfork forks
00:14:13.440 a gen zero mold which boots your
00:14:15.760 application.
00:14:17.360 Then the workers are spawned from the
00:14:19.920 mold.
00:14:22.160 Reforking is triggered when a
00:14:24.399 configured number of requests have been
00:14:26.240 processed. When it is triggered, the
00:14:29.279 monitor spawns a gen one mold from one
00:14:32.320 of the workers. Once the gen one mold
00:14:35.440 has started, the gen zero mold is
00:14:37.760 terminated.
00:14:40.240 Then one by one each worker is replaced
00:14:44.000 by a fork from the new mold. This
00:14:47.040 process repeats for several generations.
00:14:51.199 The result is a cluster where every
00:14:53.839 single worker shares the exact same
00:14:56.560 fully populated caches and jitted code
00:14:59.440 from the moment it starts.
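
In configuration this is the refork_after setting; an illustrative pitchfork.conf.rb (thresholds are examples):

    # pitchfork.conf.rb
    worker_processes 16

    # Per-generation request thresholds: promote a worker to the gen 1 mold
    # after 50 requests, gen 2 after 100, then 1000 (see Pitchfork's docs
    # for the exact semantics of the last entry).
    refork_after [50, 100, 1000]
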
00:15:03.440 Due to an operating system bias on how
00:15:06.320 traffic is distributed, some workers
00:15:08.959 will handle significantly more traffic
00:15:11.360 than others. Since the criteria for
00:15:14.560 promoting a worker into the new mold is
00:15:17.600 a number of requests it has handled,
00:15:20.000 it's almost always the most warmed up
00:15:22.480 worker that ends up being used as a
00:15:24.880 mold.
00:15:27.440 The result is dramatic. The memory that
00:15:30.480 was previously private static is now
00:15:33.199 shared across all workers. The only
00:15:36.399 private memory each worker needs is for
00:15:38.800 the request it is actively processing.
00:15:42.800 This makes adding more workers
00:15:45.199 remarkably cheap from a memory usage
00:15:47.680 perspective.
00:15:51.680 So why would you choose pitchfork?
00:15:54.800 First, it's built on a simple and robust
00:15:58.320 process-based foundation. This gives you
00:16:01.360 powerful operational benefits like
00:16:03.759 out-of-band GC, simpler performance
00:16:06.399 debugging without GVL contention and
00:16:09.279 outstanding resiliency. If a worker
00:16:11.920 misbehaves, you can simply terminate
00:16:14.000 that one process and it doesn't affect
00:16:16.000 any other request.
00:16:19.519 Second, pitchfork builds on that
00:16:21.680 foundation with its unique innovation:
00:16:24.240 reforking.
00:16:25.759 This provides two key advantages. It
00:16:29.199 achieves significant memory efficiency
00:16:31.839 by sharing the JIT cache and other VM
00:16:34.320 artifacts that are normally private. And
00:16:37.519 critically, it delivers a low and
00:16:40.320 consistent latency because reforking
00:16:42.959 ensures every new worker is a perfect
00:16:46.079 pre-warmed copy. Here's
00:16:50.240 publicly shared data from Pitchfork's
00:16:52.399 deployment on the main Shopify
00:16:54.240 monolith. They reported a 30% reduction
00:16:57.600 in memory usage. More importantly, they
00:17:00.880 achieved a 9% reduction in P99 latency.
00:17:05.120 These are significant measurable
00:17:06.959 improvements at a massive scale.
00:17:11.760 Of course, there are some caveats.
00:17:13.919 Reforking isn't enabled by default, and
00:17:16.720 there are a few gems which are
00:17:18.079 incompatible.
00:17:19.919 You also have to ensure that your
00:17:21.839 application is fork safe.
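
Fork safety mostly means recreating per-process state (connections, background threads) after each fork, and Pitchfork's configuration exposes hooks for this; a sketch (hook name per Pitchfork's docs; the body is illustrative):

    # pitchfork.conf.rb
    after_worker_fork do |server, worker|
      # Reopen anything that must not be shared across forks, e.g.
      # database connections, Redis clients, gems' background threads.
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
    end
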
00:17:26.559 So to bring everything together, let's
00:17:28.880 look at a decision framework.
00:17:32.400 Each server has strengths and a set of
00:17:35.600 trade-offs.
00:17:37.120 Puma solves for general purpose
00:17:39.120 concurrency. Its trade-offs are the GVL
00:17:42.320 contention and IO blocking we analyzed.
00:17:46.080 Falcon solves for high IO throughput.
00:17:49.039 Its trade-off is a cooperative
00:17:50.799 scheduler, which makes it unsuitable
00:17:53.280 for CPU heavy workloads.
00:17:56.320 And Pitchfork solves for the lowest latency
00:17:59.200 and memory usage. Its trade-offs are
00:18:02.240 potentially lower throughput on mixed
00:18:04.400 workloads and the engineering burden of
00:18:07.440 ensuring fork safety.
00:18:10.240 This means that the decision comes down
00:18:12.320 to two questions. First, what is the
00:18:15.919 nature of your workload? Analyze your
00:18:18.720 system's telemetry to understand your
00:18:21.280 actual bottlenecks.
00:18:23.280 And second, what do you value? Is your
00:18:26.799 priority raw throughput, predictable low
00:18:29.919 latency, or absolute resiliency?
00:18:33.679 There isn't a single best choice, but
00:18:36.640 there is a right choice. It's the one
00:18:39.840 whose trade-off you're most willing to
00:18:41.760 accept to solve your specific problems.
00:18:45.679 Thank you.
00:18:56.559 Sorry, I ran through it faster than I
00:18:58.400 practiced.
00:19:02.960 So, uh, since we have a ton of time,
00:19:05.280 like if you have questions, I'm glad to
00:19:07.200 answer.
00:19:08.720 So you went through your three favorite
00:19:12.160 or the three that you think are
00:19:15.039 appropriate in 2025. There's obviously
00:19:17.600 others around. We've got Unicorn, or
00:19:21.760 the team I'm on is still
00:19:24.000 using Passenger. Uh how do those fit
00:19:26.960 into this scene?
00:19:29.840 I think
00:19:31.919 uh like if you're running Unicorn uh
00:19:35.360 even its creator has said that it is not
00:19:37.679 actively maintained uh if I remember
00:19:40.480 right like you can run Pitchfork without
00:19:43.360 uh reforking enabled and you get a
00:19:46.240 modern web server which is actively
00:19:48.559 deployed uh and maintained so and I
00:19:52.960 don't know anybody who uses Passenger
00:19:54.720 these days, so no answer to that.
00:19:59.039 Yeah, you gave some numbers around uh
00:20:01.919 memory gains, efficiency gains. Uh were
00:20:04.240 those versus base rails or were they
00:20:07.840 against uh unicorn or
00:20:10.400 so? Uh the Shopify numbers, they were
00:20:13.039 running Unicorn before, and uh the memory
00:20:16.400 reduction and the latency reduction is
00:20:18.720 from running Pitchfork.
00:20:22.320 Yeah.
00:20:22.960 Yes. Thank you. Um, you mentioned there
00:20:25.120 are some gems that are not compatible
00:20:26.559 with reforking. Is there a way
00:20:30.080 for gems to know which are compatible or
00:20:32.240 is there like a list or is there
00:20:34.320 something they do specifically that
00:20:36.480 makes them incompatible?
00:20:37.840 Yeah, there is a list. Uh, in fact, uh,
00:20:40.080 Pitchfork has amazing documentation. Uh,
00:20:43.600 in fact, uh, I think it's one of the
00:20:45.520 best that I've come across across all
00:20:47.039 gems because it specifically tells you
00:20:49.520 it doesn't try to sell you Pitchfork. Uh
00:20:51.840 it tells you why uh if you're happy with
00:20:54.320 what you have continue with it and if
00:20:56.720 you have only these specific problems
00:20:58.159 proceed with it and uh it has an
00:21:01.280 exhaustive I mean it has a list of all
00:21:03.280 the gems that they have found
00:21:04.640 incompatible and which they have also
00:21:07.760 working on making it compatible so one
00:21:09.760 of the most common ones is gRPC, uh but
00:21:13.120 there exists a fork of gRPC which is
00:21:16.159 reforking-safe.
00:21:20.640 How do you check if your application
00:21:23.039 is fork safe?
00:21:25.440 Uh if it doesn't have any issues in
00:21:27.679 production I guess
00:21:33.840 and and what happens when there is a
00:21:36.000 memor memory leak and then that process
00:21:38.799 keeps getting forked afterwards.
00:21:42.240 Um so one of the reasons why uh the
00:21:46.240 creator of pitchfork started this was
00:21:48.320 memory leak in their uh monolithic
00:21:51.440 application in the sense that they had
00:21:53.840 already configured to uh terminate a
00:21:56.880 worker in Unicorn when it breached a
00:21:59.120 certain memory, and with Pitchfork, since
00:22:02.080 everything is shared the memory usage
00:22:04.000 itself has come down. So if there is a
00:22:05.760 memory leak that is purely due to that
00:22:08.240 particular request that it has handled.
00:22:10.320 So, and you can apply everything that
00:22:12.559 you apply in Unicorn to Pitchfork, like
00:22:14.240 you can configure timeouts or uh kill a
00:22:17.840 worker if it breaches a certain memory
00:22:19.760 and all that.
00:22:21.679 Yeah.
00:22:23.280 One one slightly more tactical question
00:22:25.760 like how do you like let's say we want
00:22:28.080 to use pitchfork or falcon in our
00:22:30.480 current like Puma you know servers or
00:22:33.200 whatever.
00:22:34.000 Yeah. Yeah. Do have you like how how do
00:22:35.760 you even test that because like it's
00:22:37.919 like you just deploy Puma and see what
00:22:39.600 happens or like deploy Falcon and see
00:22:41.120 what happens. I'm I'm I'm certain that's
00:22:43.520 not the answer like is there tooling
00:22:45.760 around like have you guys tactically
00:22:47.360 tried deploying like the same
00:22:50.480 application on two and comparing like
00:22:52.000 how do you guys did you guys do that?
00:22:54.159 Yeah, I have deployed uh Puma and Falcon
00:22:58.559 together, not Pitchfork though. Uh like
00:23:01.200 what I have done is I'd set up a proxy
00:23:05.039 in front of it and forwarded only
00:23:06.960 certain requests to Falcon which I knew
00:23:09.039 were IO-heavy because they were calling
00:23:11.520 some external APIs and I didn't want to
00:23:14.080 handle that through Sidekiq, and I knew
00:23:16.000 that it was much easier to do in
00:23:17.840 Falcon. Uh but I don't think in that
00:23:22.159 particular case it would have mattered
00:23:23.679 if the entirety of the application was
00:23:26.559 served by Falcon because we were not
00:23:28.480 really CPU-bound. But you can always like
00:23:32.720 deploy the same Rails application in
00:23:34.320 multiple servers and just pass a certain
00:23:37.039 percentage of traffic to it and observe
00:23:39.120 how it behaves.
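
The proxy-based traffic split he describes can be done with standard tooling; for example, an illustrative nginx sketch (ports and percentage are placeholders; directives go inside the http block) sending a slice of clients to a Falcon deployment:

    # nginx sketch: route ~10% of clients to Falcon, the rest to Puma
    upstream puma_app   { server 127.0.0.1:3000; }
    upstream falcon_app { server 127.0.0.1:3001; }

    split_clients "${remote_addr}" $app_backend {
        10%   falcon_app;
        *     puma_app;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://$app_backend;
        }
    }
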
00:23:40.720 Uh so thanks for a nice talk. Um and
00:23:43.440 since Puma is probably fine for most
00:23:46.000 Rails applications uh unless there is a
00:23:48.559 GVL contention um issue or problem, how
00:23:52.080 would I go about um finding out if my
00:23:54.960 application um is prone to to those kind
00:23:58.480 of things?
00:24:01.440 A lot of the APM frameworks give you uh
00:24:04.480 data on that. Uh but sometimes it's it's
00:24:07.360 tricky. Uh like I have seen
00:24:11.279 uh misattributed uh data like that the
00:24:15.360 database is being very slow which I then
00:24:17.840 later realized that it was it wasn't
00:24:19.600 actually slow but it was just because of
00:24:21.679 GVL contention that it appeared to be
00:24:23.520 slow. So uh one of the things that I've
00:24:26.799 done is try reducing the Puma uh thread
00:24:29.679 counts and see how the application
00:24:31.520 behaves. Uh so if you see a
00:24:35.120 significant increase in
00:24:37.760 performance, that's a
00:24:38.960 decrease in latency, uh then that means
00:24:41.840 you have GVL contention which is
00:24:43.520 affecting your application. So you can
00:24:46.240 just start by reducing your Puma thread count.
00:24:49.520 Anyone else?
00:24:52.640 Okay. Thank you all.