LLM Evaluations & Reinforcement Learning for Shopify Sidekick on Rails
Andrew Mcnamara and Charlie Lee
Amsterdam, Netherlands
Talk
Date: September 05, 2025
Published: not published
Announced: May 20, 2025
This talk explores how to build production LLM systems, using the Rails architecture of Shopify Sidekick as a case study and covering orchestration patterns and tool integration strategies. We'll establish statistically rigorous LLM-based evaluation frameworks that move beyond subjective 'vibe testing.' Finally, we'll demonstrate how robust evaluation systems become critical infrastructure for reinforcement learning pipelines, show how RL can learn to hack evaluations, and discuss strategies to mitigate that.
Rails World 2025