00:00:00.000
All
00:00:16.960
right, let's get started. So, like
00:00:18.960
Rosa said, I'm a staff engineer at
00:00:20.560
Fullscript, which is a platform where we
00:00:23.600
help practitioners deliver holistic
00:00:25.359
medicine to their patients. Our
00:00:28.560
first commit was in 2011 and what we've
00:00:32.640
seen is that the business has grown a
00:00:34.079
lot since then. When I joined, which
00:00:35.760
was about seven years ago, we were a dozen
00:00:38.160
engineers; now engineering is more
00:00:40.719
than 150. And as we grow and scale, what
00:00:43.680
we're finding is the solutions that we
00:00:45.520
had in the past are no longer scaling
00:00:47.360
with the business. A good
00:00:49.440
example of this is our background
00:00:50.879
queuing system. So last year we came to
00:00:53.280
the conclusion that it was really time
00:00:55.600
to say goodbye to Resque. Resque had
00:00:58.960
served us very well for many years,
00:01:02.879
and when our app started, Resque was
00:01:04.720
kind of the clear choice. It was before
00:01:06.880
Sidekiq, which continues to be great
00:01:06.880
today. It was before Active Job. But in
00:01:11.280
the last couple of years, there's been
00:01:12.799
some really nice improvements in
00:01:14.159
rethinking what's possible in this
00:01:16.479
space. And I think it started with
00:01:18.640
GoodJob and some improvements to
00:01:20.640
databases themselves. Now using a
00:01:23.280
database is a viable alternative to
00:01:25.360
Redis. And after GoodJob, Solid Queue was
00:01:28.880
built as an alternative, and you can
00:01:31.360
use all the databases that Rails
00:01:33.680
supports: SQLite, MySQL, Postgres.
00:01:36.400
And in 2024, Solid Queue was incorporated
00:01:38.799
into Rails as the default queue adapter. So
00:01:41.600
around that time we were starting to
00:01:42.799
think really seriously about
00:01:44.960
replacing Resque with something else, and
00:01:47.520
it was because we had a couple of
00:01:49.360
problems.
00:01:51.439
The first was a dependency story. We
00:01:54.320
were maintaining a bunch of forks that
00:01:56.320
we never wrote and didn't understand.
00:02:00.000
We also started to have this issue where
00:02:03.200
our Resque workers would just
00:02:04.799
take a nap at various times.
00:02:08.000
They just stopped working, and
00:02:09.920
any jobs that were in the middle of
00:02:11.840
running would just get killed and
00:02:13.040
throw this error. Now, we
00:02:15.120
didn't lose any jobs because of this;
00:02:17.120
we could always manually retry them,
00:02:19.200
but it didn't give us a lot of
00:02:20.480
confidence that we had a reliable and
00:02:22.239
resilient system in place here, and it
00:02:24.640
was fairly mysterious as
00:02:26.239
to why exactly this was happening.
00:02:28.640
The error we were getting was pretty
00:02:31.040
low-level C code, and we had
00:02:33.760
a choice: do we invest more time and
00:02:36.560
energy into Resque? Fix the bug,
00:02:39.680
get it going again, maybe have to
00:02:41.280
fork Resque or whatever? Or do we invest
00:02:44.879
our time and energy into an alternative?
00:02:48.800
And so we started shopping around for
00:02:50.160
alternatives. And I think
00:02:52.720
Sidekiq is a great alternative; I think
00:02:54.959
we would have been happy had we chosen
00:02:56.720
Sidekiq. GoodJob is great, but it
00:02:59.519
supports only Postgres and we're on
00:03:01.599
MySQL, so that's not an option for us. And
00:03:04.080
Solid Queue at this time had been made the new
00:03:06.560
default in Rails, so we were very
00:03:08.080
curious about it. And there are some
00:03:10.959
nice things about Solid Queue. The main
00:03:12.640
one is that it's backed by a database that
00:03:15.120
supports MySQL, like I said. And while
00:03:17.280
it's possible to get metrics and
00:03:19.519
pull information out of Redis,
00:03:22.720
it's actually much easier and more
00:03:24.480
straightforward just to make a read
00:03:26.080
replica of your database; devs can
00:03:28.480
then go and write SQL queries and figure
00:03:30.159
out what's going on in there. That's
00:03:32.080
really nice; I like that a lot. The
00:03:35.200
other major incentive is that it's part
00:03:37.200
of Rails, which to me means it's going
00:03:39.440
to be a robust solution that's well
00:03:41.599
supported and maintained for the future.
00:03:43.599
And we're heavily invested in Rails and
00:03:45.200
the future of Rails, and it just made
00:03:47.440
sense to use the defaults wherever
00:03:49.920
possible.
00:03:52.480
The other thing is that a bunch of us
00:03:54.239
were lucky enough to go to Rails
00:03:55.920
World last year, and we watched Rosa's
00:03:58.239
talk about Solid Queue: how it was built,
00:04:00.319
how it works, and why Basecamp decided
00:04:02.239
to build it. I really recommend
00:04:04.400
watching one of Rosa's talks if you want
00:04:06.159
to deep-dive into how Solid Queue works
00:04:08.000
under the hood. And we came away from
00:04:10.159
that talk really wanting to give Solid
00:04:11.760
Queue a try and see if it would be a good
00:04:13.519
fit for us.
00:04:16.239
So early this year, we started to work
00:04:18.079
on it, and here's the plan. I'm
00:04:19.759
going to go through this throughout the
00:04:22.000
course of this presentation, so
00:04:24.960
let's get started. The first thing is
00:04:27.759
you need to make sure that all of
00:04:29.360
your jobs are using Active Job. That's a
00:04:32.320
requirement. Like I said, Resque was
00:04:33.840
written before Active Job ever existed,
00:04:36.560
so it has its own way of enqueuing jobs,
00:04:38.960
and you have to convert those
00:04:40.639
all over to Active Job. We had
00:04:42.800
luckily done this years ago, so
00:04:45.280
we were ready to go in this
00:04:46.639
scenario. The next thing you can do
00:04:48.880
is set up Mission Control Jobs.
00:04:50.639
And this is the dashboard app which
00:04:52.479
allows you to retry or discard failed
00:04:54.639
jobs. It supports both Resque and Solid
00:04:57.199
Queue.
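For reference, the setup for that dashboard is roughly this small (the mount path here is just an example):

```ruby
# Gemfile
gem "mission_control-jobs"

# config/routes.rb
Rails.application.routes.draw do
  # Dashboard for inspecting, retrying, and discarding jobs.
  mount MissionControl::Jobs::Engine, at: "/jobs"
end
```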
00:04:59.360
And setting up Solid Queue is fairly
00:05:01.600
simple; the README is really great.
00:05:03.680
But there are a couple of important
00:05:05.360
decisions that you need to make. And the
00:05:07.600
most important one is: do you use
00:05:10.800
your primary database or do you make a
00:05:13.440
separate one? And the answer is really
00:05:15.600
going to depend on the scale that you're
00:05:17.919
operating in. For us, our primary
00:05:20.720
database was not an option. Solid Queue is
00:05:23.759
going to add a ton of write-heavy
00:05:25.759
load to your DB, and this could be a
00:05:27.919
risk for us; we didn't want to do that.
00:05:29.120
So instead, we created a new MySQL queue
00:05:32.320
database for production. But in
00:05:34.880
development, review environments,
00:05:36.960
staging, etc., we use the same database
00:05:39.680
as our primary, just on a different
00:05:41.440
schema. And for those environments, it's
00:05:43.600
perfect: we save on some cost, and
00:05:46.720
we aren't worried about scale in those
00:05:48.240
environments.
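As a rough sketch of that split (the database names and migrations path are illustrative, not our exact config):

```yaml
# config/database.yml
production:
  primary:
    <<: *default
    database: app_production
  queue:
    <<: *default
    database: app_queue_production
    migrations_paths: db/queue_migrate
```

```ruby
# config/environments/production.rb
# Point Solid Queue at the dedicated queue database.
config.solid_queue.connects_to = { database: { writing: :queue } }
```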
00:05:50.880
All right, so pop quiz. What happens
00:05:54.080
if this transaction gets rolled back?
00:05:58.400
Well, the answer is: it depends. On
00:06:00.880
Resque, there's no transactional
00:06:03.039
integrity. The job is always going to
00:06:05.680
get fired, and this can be a big source
00:06:07.280
of bugs in your system. Say a
00:06:10.080
user never gets created, and then you try
00:06:11.600
to send them an email. Great. But
00:06:14.639
then this makes me ask a
00:06:16.160
question: if we have a
00:06:18.400
separate DB in production but the same
00:06:21.199
one in staging, would we have
00:06:22.960
transactional integrity in staging but
00:06:24.639
not in production? Like how's that going
00:06:25.919
to work?
00:06:27.600
And that would be kind of confusing. So
00:06:28.960
it's really nice that in Rails 8
00:06:31.280
we have this config option for Solid Queue,
00:06:33.039
and it allows us to toggle transactional
00:06:35.440
integrity on and off for each job if you
00:06:38.400
want. And this is false by default. So
00:06:41.600
in this example, when
00:06:44.319
enqueue_after_transaction_commit is true: transaction
00:06:47.120
completes, job fires; transaction rolls
00:06:49.840
back, job doesn't fire. So for us, while
00:06:53.039
we're migrating, we just leave it off
00:06:54.560
because we want to mimic how
00:06:56.479
everything behaves in Resque. We don't
00:06:58.080
want to change anything at this point,
00:06:59.919
but we can think about enabling
00:07:01.599
it in the future.
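Per job, that toggle is one line (SendWelcomeEmailJob is a made-up example):

```ruby
class SendWelcomeEmailJob < ApplicationJob
  # False by default in Rails 8: when true, enqueues issued inside a
  # transaction are deferred until commit, and dropped on rollback.
  self.enqueue_after_transaction_commit = true
end
```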
00:07:04.160
Okay, so to recap, we're using the same
00:07:06.560
type of database. For us, it's MySQL as
00:07:09.199
our primary, but we've configured Solid
00:07:12.080
Queue to be on a separate database in
00:07:14.319
production environments. We have no
00:07:16.639
transactional integrity. And for now, we
00:07:18.479
want to keep things the same as Resque.
00:07:21.120
And if you're curious about
00:07:25.039
figuring out transactional integrity, I
00:07:27.280
recommend looking at the Isolator gem.
00:07:29.039
It's going to help you find all the
00:07:30.400
places in your app where transactional
00:07:32.240
integrity with jobs might be a problem.
00:07:36.479
Now, the second most important thing to
00:07:38.080
think about is how you set up
00:07:39.919
your workers. And to talk about that, I
00:07:41.360
first need to talk about queues.
00:07:45.120
So, with Resque, we had some technical
00:07:47.680
debt around how we named our
00:07:50.080
queues. This is a list of all the queues
00:07:52.880
we had, and there are a lot of problems
00:07:55.919
with this setup. The big problem is
00:07:58.240
we had a mix of priority-based queues and
00:08:02.560
domain-specific queues. And domain
00:08:06.560
queues happen because
00:08:09.599
maybe you first put your job in the
00:08:12.960
critical queue because it's really
00:08:14.319
important, but then there's a bunch of
00:08:15.520
other stuff in the critical queue. So
00:08:17.680
either your job is slowing those jobs
00:08:19.840
down, or something's
00:08:22.240
slowing your job down. And so you're
00:08:23.759
like: I know what to do, I'm just going
00:08:25.199
to make my own queue. Right? But this
00:08:28.960
does not scale. Nothing else can go
00:08:31.599
into this queue. Ops has to set up new
00:08:34.479
infrastructure to support this new
00:08:36.240
queue. And what happens when there are some
00:08:38.320
notifications that need to go out
00:08:39.760
immediately, and some that can go
00:08:42.320
out in the next hour, and those are
00:08:43.839
mixed together? You still have the
00:08:45.760
potential problem where some less
00:08:47.279
important jobs are slowing down the
00:08:49.279
delivery of other jobs. And you also
00:08:51.680
have a language problem. So, what type
00:08:53.040
of notification can I put in the
00:08:54.880
notifications queue? I don't know.
00:08:59.680
And priority-based queues are no better.
00:09:01.839
The problem with this setup is no one
00:09:03.600
knows what any of these mean. As a
00:09:06.480
developer, you write a new job. Which
00:09:08.480
queue do you put it in?
00:09:11.040
I don't know. Maybe you know. For ops,
00:09:14.240
when do you need to autoscale the medium
00:09:16.240
queue? Do you need to, or is it okay if
00:09:19.040
this queue just fills up for most of the
00:09:20.800
day and then clears up by the next day?
00:09:23.040
I don't know. The problem is no one
00:09:25.519
knows. So, what do we actually want to
00:09:28.480
know? We want to know how long is my job
00:09:31.839
going to sit in the queue for.
00:09:34.959
There's a great talk from RubyConf
00:09:37.440
2022 called "What does high priority
00:09:40.000
mean? The secret to happy queues." It's
00:09:42.640
great. You should go watch it at some
00:09:44.080
point. I watched this a couple of years
00:09:46.399
ago, and it stuck in my
00:09:48.000
brain, and I thought: whenever we
00:09:49.600
touch our background queuing
00:09:50.880
system, I'm going to do that. And in
00:09:53.600
Daniel's talk, he suggests naming
00:09:56.000
your queues based on latency tolerances.
00:09:58.560
So, based on that idea,
00:10:01.360
our job queues went from this
00:10:04.240
to this.
00:10:11.839
This does a couple of things. It makes it
00:10:14.560
clear to developers and to ops what the
00:10:18.079
queues mean. It becomes a contract:
00:10:20.240
each queue has an implicit SLO. What
00:10:22.399
we're saying here is that jobs in the
00:10:24.880
within-one-minute queue are guaranteed
00:10:26.560
to run within 1 minute or less. That's
00:10:29.120
the longest tolerable latency that's
00:10:32.000
going to be acceptable for that queue.
00:10:33.519
You put a job in the within-one-minute
00:10:35.200
queue, it's going to get in and get out
00:10:36.880
in under a minute. That's what we're
00:10:38.160
promising. The beauty of this is we can
00:10:40.880
directly tie our alerting to this
00:10:43.040
expected SLO. If the queue latency is
00:10:46.160
taking longer than it should, we can
00:10:47.760
raise an alert. We can autoscale more
00:10:50.079
resources. And that's because we now
00:10:51.839
understand what each of these queues means.
00:10:54.880
And to get these names, we just made
00:10:57.440
a spreadsheet. We listed all the
00:10:59.440
jobs, we assigned the team that owned
00:11:02.079
each job, and we just asked: what's
00:11:04.560
the longest amount of time that this job
00:11:06.399
can sit in the queue before something
00:11:08.399
bad happens? Then we aggregated
00:11:11.440
that data and figured out these
00:11:13.519
new queue names. That's how we
00:11:15.600
got them.
00:11:17.839
So now we have our queue names, and what we
00:11:21.600
wanted was to have
00:11:22.800
dedicated workers for each queue,
00:11:25.200
meaning that they don't share resources.
00:11:27.519
Workers are basically just
00:11:29.839
Rails servers that we boot up. Each
00:11:32.079
one is in charge of running bin/jobs,
00:11:34.000
which runs a Solid Queue supervisor,
00:11:36.000
which forks a separate process for
00:11:37.680
each Solid Queue worker, and the workers
00:11:40.079
just churn through jobs. The idea
00:11:42.560
behind this is that we can autoscale up
00:11:44.320
and down depending on how busy we are.
00:11:46.560
If the within-one-hour queue is very
00:11:48.880
busy and the queue latency is getting a bit
00:11:51.120
undesirable, we'll just autoscale up
00:11:53.200
more workers to address that load.
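A sketch of what the worker side can look like in Solid Queue's config (queue names follow our latency scheme; the thread counts and intervals are illustrative):

```yaml
# config/queue.yml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    # One worker definition per queue, so queues don't share resources.
    - queues: within_1_minute
      threads: 5
      polling_interval: 0.1
    - queues: within_1_hour
      threads: 5
      polling_interval: 1
```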
00:11:55.600
But there is a bit of a caveat here.
00:11:59.200
It takes about a couple of minutes
00:12:00.720
for us to boot up a new Rails server, and
00:12:02.959
if we notice that the queue latency is
00:12:04.959
creeping up in the within-one-minute queue
00:12:07.839
and it's getting really bad and we
00:12:09.200
want to autoscale, by the time the
00:12:11.360
new servers come online we've already
00:12:13.120
blown through our SLO.
00:12:15.600
Now, to fix that problem, we just
00:12:19.120
overprovision the really fast queues,
00:12:21.760
and that's been working for us but it
00:12:23.440
does mean that we have some servers just
00:12:24.959
sitting idle a bunch of the time. Now,
00:12:27.200
this costs us some money,
00:12:28.959
but our jobs get delivered even when
00:12:32.959
there's an unexpected spike.
00:12:37.200
The other thing that we wanted to do
00:12:38.399
early on is just have some metrics. So
00:12:40.720
we needed to know the queue latency. And
00:12:42.639
what this means, in other words, is:
00:12:44.480
how long has the oldest job in the
00:12:46.560
queue been waiting?
00:12:48.720
And because Solid Queue is just a database,
00:12:50.480
you can just query the database and get
00:12:51.839
an answer, right? So we use the Yabeda
00:12:54.959
gems, which basically run
00:12:56.639
this query at some interval. They
00:12:58.880
collect those metrics and push
00:13:00.320
them up to Prometheus, and from
00:13:02.399
there we can create some dashboards.
00:13:04.880
These are an example of some
00:13:06.720
dashboards that we built. The top three rows
00:13:09.120
are queue latency: red is bad, green is
00:13:12.240
good. The bottom is the number of jobs in
00:13:15.040
the queue at that time. And with these,
00:13:17.120
we're able to monitor the health of our
00:13:18.560
queues and make sure whether we're
00:13:19.839
meeting our SLOs or not.
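The query itself is simple, and hooking it up is only a handful of lines. A minimal sketch, assuming the yabeda and yabeda-prometheus gems (the metric and group names are just ours to choose):

```ruby
# config/initializers/yabeda.rb
Yabeda.configure do
  group :solid_queue do
    gauge :queue_latency_seconds,
          tags: [:queue_name],
          comment: "Age of the oldest ready job in each queue"
  end

  # Runs on Yabeda's collection interval, e.g. on each Prometheus scrape.
  collect do
    oldest = SolidQueue::ReadyExecution.group(:queue_name).minimum(:created_at)
    oldest.each do |queue_name, created_at|
      Yabeda.solid_queue.queue_latency_seconds.set(
        { queue_name: queue_name },
        (Time.current - created_at).to_f
      )
    end
  end
end
```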
00:13:23.440
Okay, so that's the setup. That's all
00:13:25.839
the configuration and philosophy behind
00:13:28.240
what we're doing. And then how do we
00:13:30.560
actually migrate from Resque to
00:13:32.720
Solid Queue? And remember, ideally we
00:13:35.440
want no downtime and no lost jobs,
00:13:38.240
right? And the answer is pretty
00:13:40.560
simple. First of all, we just made
00:13:42.959
this constant in our ApplicationJob.
00:13:45.760
It just lists all the queues that we're
00:13:48.000
going to support going forward.
00:13:50.560
Then we override the queue_as method. Now,
00:13:53.839
this comes from Active Job. And what
00:13:55.279
we're doing is just looking at
00:13:57.360
the queue name. If the queue name is
00:14:01.279
within_ something, we just make sure
00:14:03.519
it's valid. And then if it matches,
00:14:05.680
we set the queue adapter to Solid Queue. If
00:14:08.639
it doesn't match, we just call super and
00:14:11.760
it does what it normally did, and
00:14:13.839
we enqueue the job in Resque. And that's
00:14:15.600
it.
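In outline, the override looks something like this (the queue list and validation are illustrative sketches of what was on the slide):

```ruby
class ApplicationJob < ActiveJob::Base
  # The latency-based queues we support going forward.
  QUEUES = %w[within_1_minute within_5_minutes within_1_hour within_24_hours].freeze

  # Jobs that declare a latency-based queue get routed to Solid Queue;
  # everything else falls through to the default (Resque) via super.
  def self.queue_as(queue = nil, &block)
    if queue.to_s.start_with?("within_")
      raise ArgumentError, "unknown queue #{queue}" unless QUEUES.include?(queue.to_s)

      self.queue_adapter = :solid_queue
    end
    super
  end
end
```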
00:14:17.440
Then your changes are really
00:14:18.959
simple. All you need to do is change
00:14:20.079
the queue_as value in your job, and then
00:14:22.399
when this is deployed, what happens is
00:14:24.880
any job that was previously enqueued in
00:14:26.800
Resque will stay in Resque, but new jobs
00:14:29.199
get enqueued with Solid Queue.
00:14:32.639
Pretty simple, and it mostly is
00:14:35.839
this simple. For 95% of our jobs,
00:14:38.000
this was the only change that we needed.
00:14:40.240
And what we did is we started by moving
00:14:42.320
a couple of jobs where it's kind of
00:14:44.160
okay if something bad were to happen. If
00:14:46.639
the worst-case scenario happens and
00:14:48.399
we lose a bunch of jobs, it's
00:14:50.160
okay for those ones. We started
00:14:52.800
with those, and then we started to
00:14:55.440
take on ones that were higher risk and
00:14:58.399
more complicated as we went, as we
00:15:00.320
ironed out the kinks in the system. And
00:15:02.480
I'll go through a few problems that we
00:15:04.000
encountered along the way and how we
00:15:06.160
solved them.
00:15:08.000
But here's a tip.
00:15:10.399
Pretty much all the problems we
00:15:12.240
encountered were because our
00:15:14.320
infrastructure was underprovisioned for
00:15:16.639
the needs of our system. And you
00:15:19.199
probably have a good guess of
00:15:21.279
what the needs of your system are. Well,
00:15:23.040
we did some napkin math to figure
00:15:24.959
out how many jobs we were running
00:15:26.480
and what size of database we were going
00:15:28.800
to need, and we got it completely wrong.
00:15:34.000
So if you have something like this
00:15:35.519
though, if you get it wrong,
00:15:37.040
everything's okay. What this does is:
00:15:39.040
it's just an around_enqueue block in your
00:15:41.040
ApplicationJob. If Solid Queue can't
00:15:43.440
enqueue your job, we just
00:15:46.800
rescue that error and re-enqueue the job
00:15:50.480
back into Resque. This way, you don't
00:15:53.199
lose anything. Nothing bad happens if
00:15:54.800
you make a mistake. You can roll
00:15:56.399
back that change. You have some time to
00:15:58.079
figure out what went wrong and you can
00:15:59.920
address it.
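A minimal sketch of that safety net (the rescued error class and the fallback call are assumptions; rescue whatever your setup actually raises):

```ruby
class ApplicationJob < ActiveJob::Base
  around_enqueue do |job, block|
    block.call
  rescue ActiveRecord::ActiveRecordError => error
    # Solid Queue couldn't take the job; log it and push the same job
    # through the Resque adapter instead, so nothing is lost.
    Rails.logger.warn("Solid Queue enqueue failed: #{error.class}")
    ActiveJob::QueueAdapters::ResqueAdapter.new.enqueue(job)
  end
end
```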
00:16:02.880
So, a couple of problems we ran into.
00:16:05.440
One was that we had too many arguments.
00:16:08.720
Solid Queue needs to store everything about
00:16:10.560
your job in a table, including the job's
00:16:12.480
arguments. And for whatever reason, we
00:16:16.480
had a couple of jobs with an argument
00:16:20.399
list that was insanely long. By
00:16:23.440
default, MySQL stores up to 65,535
00:16:25.839
bytes in a TEXT column, and
00:16:29.839
for one of our jobs it just wasn't
00:16:31.759
enough.
00:16:33.759
So the solution was just to make that
00:16:36.320
column larger. We bumped it up to
00:16:38.160
MEDIUMTEXT, which gives us 16 million
00:16:40.240
bytes in MySQL, and that just solved
00:16:42.880
the problem for us.
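The fix is a one-line migration against the queue database (the class name and Rails version tag are illustrative):

```ruby
class WidenSolidQueueJobArguments < ActiveRecord::Migration[8.0]
  def up
    # size: :medium maps to MEDIUMTEXT on MySQL (~16 MB instead of ~64 KB).
    change_column :solid_queue_jobs, :arguments, :text, size: :medium
  end

  def down
    change_column :solid_queue_jobs, :arguments, :text
  end
end
```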
00:16:45.839
But then we started to migrate some more jobs
00:16:47.759
in batches, and we saw in Mission
00:16:50.240
Control that some workers were just
00:16:52.079
starting to die. They would just
00:16:54.320
quit unexpectedly. What was going on
00:16:56.079
here? This was surprising to us because
00:16:58.800
we had configured our workers to have a
00:17:00.720
graceful shutdown, which is what you should do.
00:17:02.639
So instead of sending a SIGKILL,
00:17:04.640
which is that signal 9 in the error
00:17:06.720
here, you want to send a
00:17:09.360
SIGTERM or a SIGINT. That tells the
00:17:11.600
worker: hey, we're going to shut
00:17:12.799
you down. Finish what you're doing.
00:17:15.039
Don't take on any more work. Then
00:17:17.120
you give it a bit of time, and only
00:17:19.280
then, if the worker hasn't exited, do
00:17:21.120
you send a SIGKILL. And we were doing
00:17:22.880
that. So this error was surprising, and
00:17:25.120
it reminded us of the
00:17:26.959
problem we were having with Resque. But
00:17:29.440
then we dug a bit deeper, and
00:17:31.919
it turned out that we were just
00:17:33.120
running completely out of memory. We
00:17:35.200
added a few new jobs and those jobs were
00:17:37.600
just more memory intensive than the
00:17:39.360
previous ones and required more
00:17:40.799
resources to do what they were doing.
00:17:43.200
The solution for us: we just bumped up
00:17:45.440
the memory. We doubled it from two gigs
00:17:48.000
to four gigs. And this is kind of a
00:17:49.280
brute-force solution, right? If you're
00:17:51.520
really concerned with cost, you
00:17:53.760
could separate out the jobs that require
00:17:56.640
heavy use of memory into their own queues.
00:17:58.400
You could separate out jobs that have
00:18:00.320
high CPU into their own queues. For
00:18:02.160
example, you could do a within-one-
00:18:04.000
minute-high-memory or within-one-minute-
00:18:06.240
high-CPU queue if you really wanted to, but
00:18:08.480
for us it's not a big concern. We just
00:18:10.080
prefer to keep it simple. So bumping up
00:18:11.760
the memory made the most sense for us.
00:18:15.120
The next problem we encountered was that
00:18:17.440
we ran out of database connections. Why
00:18:20.480
was that happening? As it turns
00:18:22.799
out, we did this to ourselves. What
00:18:25.440
happened was we migrated a new job over
00:18:27.679
to Solid Queue, and it was set to run on a
00:18:29.840
cron. So at 1 p.m. Eastern, it
00:18:34.400
enqueued tens of thousands of new jobs at
00:18:37.280
the same time. And our autoscaler did
00:18:40.320
exactly what we told it to do. It
00:18:42.240
increased workers to meet demand. It
00:18:45.679
made so many workers that we just ran
00:18:48.240
out of database connections completely.
00:18:52.320
Now, the solution here was just that we
00:18:54.640
increased the size of our database.
00:18:56.080
Again, we had underprovisioned and
00:18:57.919
underestimated what we actually needed.
00:19:00.320
We use AWS for this; we
00:19:04.160
just did a blue/green deployment of
00:19:05.840
the database. To do this, AWS
00:19:08.400
basically creates
00:19:10.000
a replica of your instance,
00:19:12.240
and then when they're in sync, you just
00:19:14.080
swap the load balancer and then you're
00:19:15.600
on the new instance, essentially. And
00:19:19.760
now we have more than double the DB
00:19:21.600
connections that we had, and that
00:19:23.120
leaves us with a lot of room for any
00:19:24.720
spikes going forward. We also,
00:19:28.080
though, put a limit on the number of
00:19:30.160
workers that we can autoscale up to so
00:19:32.400
that we don't do this to ourselves
00:19:34.160
again.
00:19:37.360
Okay, so you might see a problem with
00:19:40.320
this setup, though. What happens,
00:19:42.559
for example, if someone
00:19:45.039
has a job that takes a really
00:19:47.280
long time to run? Imagine we have a job
00:19:49.120
that takes five minutes to run from
00:19:50.720
start to finish, and someone puts
00:19:52.640
that in the within-one-minute queue.
00:19:54.559
That's not going to work, right?
00:19:58.000
So the problem with this is if you have
00:20:00.240
these slow jobs, they start consuming
00:20:02.080
all the threads on your workers. And
00:20:03.760
what that means is the worker can't do
00:20:06.160
anything until those jobs finish
00:20:07.760
running. You end up with a backlog of
00:20:09.760
jobs just waiting in the queue to be
00:20:11.840
executed; it's essentially blocked.
00:20:16.160
So to ensure this doesn't happen, we
00:20:17.840
just want a mechanism in place
00:20:19.919
to enforce the idea that fast jobs need
00:20:22.480
to run on the fast queues; slow jobs can
00:20:24.960
run on the slower queues. We set an
00:20:26.720
objective of one tenth, so 10% of the
00:20:29.919
allowable time. For within 1 minute,
00:20:32.400
that's 6 seconds; within 5 minutes, 30
00:20:35.039
seconds; within 10 minutes, 1 minute. You get
00:20:37.440
the idea.
00:20:40.320
And to do this, we just set up a rule in
00:20:42.640
an around_perform action in our
00:20:44.880
ApplicationJob. All this
00:20:48.720
does is measure when the job starts and
00:20:50.559
when it finishes. If that takes longer than
00:20:52.880
10% of the queue's tolerance, we just
00:20:56.159
send an alert up to Sentry. This way, we
00:20:59.200
know if a job is too slow for a
00:21:01.039
given queue, and we'll either just move
00:21:02.799
the job to a slower queue, or we'll work
00:21:06.320
with that team to figure out
00:21:08.000
how we can make the job more performant
00:21:09.840
so it can stay in this queue.
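A sketch of that rule (the tolerance table and message are illustrative; we report rather than fail the job):

```ruby
class ApplicationJob < ActiveJob::Base
  # Longest tolerable queue latency, keyed by queue name.
  TOLERANCES = {
    "within_1_minute"  => 1.minute,
    "within_5_minutes" => 5.minutes,
    "within_1_hour"    => 1.hour
  }.freeze

  around_perform do |job, block|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    block.call
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    tolerance = TOLERANCES[job.queue_name]
    if tolerance && elapsed > tolerance.to_f / 10
      # Over 10% of the queue's budget: flag it, don't raise.
      Sentry.capture_message(
        "#{job.class} ran #{elapsed.round(1)}s, over 10% of #{job.queue_name}"
      )
    end
  end
end
```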
00:21:14.320
The other challenge we came across is
00:21:16.559
delayed jobs. These are
00:21:19.760
jobs that you scheduled to run at
00:21:22.480
some point in the future. Now, I've got to
00:21:24.960
be clear: I don't like this pattern.
00:21:27.039
But we actually had a lot of
00:21:30.320
jobs scheduled so far into the
00:21:33.280
future that, I'm talking, I will be
00:21:35.679
dead before they ever run.
00:21:38.720
And this is not a great pattern;
00:21:41.039
a lot can change from now until
00:21:42.960
then. A lot of the work was just
00:21:44.880
refactoring that code so that doesn't
00:21:46.799
happen; maybe we set up a cron job every
00:21:48.960
day that checks what jobs we need
00:21:50.400
to run today, and we do that. But some
00:21:53.679
jobs we did need to migrate over.
00:21:56.640
And to do that, we wrote this gnarly
00:21:58.400
script that doesn't fit on a slide.
00:22:01.039
I'll include a QR code at the end
00:22:02.880
of this talk so you can grab this
00:22:05.280
code if you need it. But it would
00:22:07.600
basically go into Redis, grab the
00:22:09.440
scheduled jobs, and then convert them
00:22:11.120
and enqueue them into Solid Queue, and then
00:22:12.880
delete them from Redis. That's what it
00:22:14.240
does. And it's fairly complicated:
00:22:16.480
some of these jobs were enqueued in
00:22:20.480
a version of Rails that was fairly old,
00:22:22.640
and the arguments are different
00:22:24.480
between different versions of Rails, and
00:22:26.240
we had to support all of those.
00:22:28.400
It was kind of gnarly, but
00:22:31.440
we did it.
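The full script is the one behind the QR code; the core idea, heavily simplified, is something like this (key names follow resque-scheduler's conventions, and this sketch skips all the legacy-argument handling):

```ruby
# One-off sketch: move resque-scheduler's delayed jobs into Solid Queue.
while (timestamp = Resque.redis.zrange("delayed_queue_schedule", 0, 0).first)
  while (payload = Resque.redis.lpop("delayed:#{timestamp}"))
    entry = JSON.parse(payload)
    # For Active Job jobs, args[0] is the serialized job payload.
    job = ActiveJob::Base.deserialize(entry["args"].first)
    job.enqueue(wait_until: Time.zone.at(timestamp.to_i))
  end
  Resque.redis.zrem("delayed_queue_schedule", timestamp)
end
```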
00:22:34.400
The other insight that we had is that
00:22:38.000
MySQL is just not the same as
00:22:39.840
Redis. My thinking is that with
00:22:44.799
Redis, you can just throw
00:22:46.640
anything at it and it's mostly
00:22:48.080
fine. It doesn't really seem to care.
00:22:50.159
But for a relational database,
00:22:51.840
that's probably not the best. And what
00:22:53.360
we noticed is that we had these
00:22:55.360
big peaks and valleys of CPU
00:22:58.240
and database connections and load, and
00:23:00.880
that's not ideal. And I
00:23:02.880
wanted to flatten this curve.
00:23:04.320
How can we make this more even?
00:23:06.240
What's actually going on here under
00:23:07.840
the hood?
00:23:09.679
We looked into it, and most
00:23:13.120
of that was just: you have some process, and
00:23:18.000
you enqueue thousands upon thousands of
00:23:20.960
jobs one after the next, and this is
00:23:25.039
pretty expensive, right? And I wanted
00:23:27.919
to decrease those peaks. A
00:23:31.200
really easy way to do this
00:23:32.960
is you can just do a find_in_batches.
00:23:35.440
What you can do is, in this
00:23:38.080
case, you initialize 500 jobs into memory
00:23:42.080
and you pass those to ActiveJob's
00:23:44.799
perform_all_later. So instead of 500 SQL
00:23:48.559
INSERT statements, one after the next, for
00:23:50.960
this batch (and there are probably more
00:23:52.320
batches to come, right?), it's just one
00:23:55.039
SQL INSERT statement per batch, which is
00:23:57.760
really nice. And the other thing you can
00:24:00.159
do, if you want to get a bit fancy, is
00:24:02.320
set a random wait
00:24:05.039
time for each job. This will spread
00:24:06.799
out when the jobs actually run,
00:24:10.400
say from 0 to 30 minutes. That'll
00:24:12.559
spread them out a bit more if
00:24:14.080
you want to do that.
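Putting those two together looks roughly like this (the model, job, and scope are made up for the example):

```ruby
# Bulk-enqueue one batch at a time: one INSERT per 500 jobs instead of 500.
User.where(needs_sync: true).find_in_batches(batch_size: 500) do |users|
  jobs = users.map do |user|
    job = SyncUserJob.new(user.id)
    # Optional jitter: spread the actual run times over the next 30 minutes.
    job.scheduled_at = rand(0..30).minutes.from_now
    job
  end
  ActiveJob.perform_all_later(jobs)
end
```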
00:24:18.159
Now, assuming you've done all this and
00:24:20.000
converted everything over, you're pretty
00:24:21.919
much ready to swap the queue adapter over to
00:24:24.480
Solid Queue. Just don't forget
00:24:27.279
there are gems that might be using it:
00:24:29.760
there are mailers, there's Active Storage.
00:24:32.400
Don't forget about those. But at this
00:24:34.880
point, once you deploy this and
00:24:37.200
everything's working, you're pretty much
00:24:38.640
ready to delete Resque from your
00:24:40.400
codebase.
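The final cutover itself is the one-line default (plus removing the queue_as override and the Resque gems):

```ruby
# config/environments/production.rb
# Everything (mailers, Active Storage jobs, etc.) now defaults to Solid Queue.
config.active_job.queue_adapter = :solid_queue
```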
00:24:43.600
All right, so that's what we did.
00:24:48.400
How did it go? So we had about two devs
00:24:52.240
working on this for a couple of months,
00:24:54.480
including myself, and we migrated all the
00:24:56.960
jobs over in that time. Most of the
00:24:58.960
work was really just setting up Solid Queue
00:25:00.480
and adjusting our infrastructure when we
00:25:02.559
got it wrong. But once we were
00:25:05.600
confident, we could just migrate
00:25:07.440
the jobs in big batches over a couple of
00:25:09.360
weeks. It was pretty fast once we
00:25:10.960
figured it all out. We had
00:25:13.039
one dev from ops to help us set up the
00:25:15.120
infrastructure and fine-tune it as we
00:25:17.360
went. And we also had a DBA making
00:25:19.200
sure that our DB was running
00:25:21.760
properly and we had good metrics for it.
00:25:24.080
Everything was running smoothly.
00:25:26.400
So this is probably the best
00:25:28.000
screenshot I have of the difference
00:25:29.279
between running jobs in Resque versus
00:25:31.039
Solid Queue. One of our team leads sent
00:25:33.760
me this message with this graph.
00:25:37.679
He thanked me for fixing the problem,
00:25:39.520
and all I did was switch the job from
00:25:41.360
running on Resque over to Solid Queue.
00:25:43.440
What these are is just
00:25:44.960
job failures for a high-volume job that
00:25:47.360
we run every day. On the left is the job
00:25:49.679
running with Resque, and on the right is
00:25:51.600
when we switched over to Solid Queue. All
00:25:53.600
those failures just stopped cold. We had
00:25:56.080
really achieved the reliability that we
00:25:57.840
were after.
00:26:04.960
And all in all, we're really happy with it.
00:26:07.679
We're processing about three
00:26:09.039
million jobs a day. The performance of
00:26:11.760
Solid Queue is amazing compared to Resque.
00:26:14.559
In fact, it's too good in a lot of
00:26:17.760
instances. For our within-24-hours
00:26:21.200
queue, the queue latency is like 30
00:26:24.000
seconds or something ridiculous. We need
00:26:26.159
to slow something down in
00:26:28.559
there, like reduce the resources or
00:26:31.200
something; I'm not sure yet. But
00:26:33.120
it's really easy to horizontally scale
00:26:35.520
more workers based on demand. That's
00:26:37.120
really nice. The observability is
00:26:39.279
great: being able to write SQL
00:26:41.200
queries and just see what's
00:26:42.720
going on. Awesome.
00:26:44.960
And I think renaming our queues was one of
00:26:47.200
the biggest benefits: just gaining a lot
00:26:48.880
of understanding as to what the queues are
00:26:50.880
and what they mean. And again, like I
00:26:52.880
said, it's reliable. I really like the
00:26:55.279
reliability of it. It's been running
00:26:57.360
smoothly, and I'm really happy with it.
00:27:00.640
Thank you very much.