00:00:07
Hi everyone. Welcome to The Rise of the Agent. We're going to be talking about agents today, specifically in Rails. I'm Kinsey. It's nice to meet you all. I'm from Denver, Colorado. I'm on the coding agent core team at GitHub, I'm the vice president of the board of directors at Ruby Central, and I'm also a mom. My two kids are here today watching me talk for the first time — if you see them, say hi.
00:00:42
Thank you. A little bit more about Ruby Central: we're behind RubyGems and many open-source initiatives. Marty, our head of open source, just gave a lightning talk about the CRA, which was great. We also put on RubyConf; we're about to announce where and when that will be in the U.S., and we'd love to see you there. We're also behind Exo Ruby and the small local conferences Jim is organizing across the U.S., so be sure to check those out.
00:01:10
Today we'll talk about what agents are, how to build them, best practices, the future of agents, and ethical considerations when building agents.
00:01:23
When I first thought about AI and Copilot at GitHub I was on a different team — I was on the deploys team and thought I would be tuning LLMs. I wasn't sure what I was getting into. When I joined the agent team it was very different from what I expected: I wasn't tuning LLMs. Agentic AI is the new frontier; it's where tooling and AI companies are headed. But what are agents? What is agentic AI? What are workflows? Before we go further, let's do a quick vocabulary lesson so we're all on the same page.
00:02:00
Agentic AI, at a high level, is the notion that AI has agency: it's able to act without being repeatedly prompted, takes steps to achieve a goal, and behaves in a more humanlike way. Agents are the software systems behind this: they can act autonomously, make decisions, understand context, and chain tasks without being hardcoded to do so. A modern agent typically has a goal or objective, tools, a planner or orchestrator, and short- and long-term memory. We'll dig into those building blocks when we talk about how to build agents.
00:02:42
What are workflows? Workflows are often confused with agents. They're a key concept in agent systems, but they're not the same thing. Workflows can be part of an agent: they turn model interactions into structured, more reliable processes, which is what separates these systems from simple chatbots.
00:03:03
Workflows are regimented and rule-based, which makes them great when you know exactly what steps are needed. Agents, by contrast, are model-based and better at adapting and solving open-ended problems. Workflows plus agents form agentic systems. I could talk about workflows all day, but we only have 30 minutes, so we'll focus on agents for open-ended problems where it's hard to predict the number of steps and where agents may take many turns and require some trust because of their autonomy. They're ideal for scaling tasks in trusted environments.
00:04:01
Because agents are autonomous, they come with higher costs and the potential for compounding errors, so extensive testing in sandbox environments is essential. Agents represent a paradigm shift from imperative programming — where we tell software exactly what to do — toward declarative goal setting, where we define objectives. There are different types of agents: learning agents, which improve over time from experience and from the tools they use; utility-based agents, which optimize actions based on tradeoffs to maximize performance or "happiness"; model-based agents, which maintain an internal model of the world to make decisions beyond immediate inputs; goal-based agents, which set a defined goal and work toward it; and simple reflex agents, which react only to current inputs without relying on memory.
00:05:03
There are many real-world uses for agents: customer support, security threat detection, travel concierges, and of course coding agents. Coding agents can take mundane tasks and do impressive things, like creating a pull request for you. I want to briefly show what we've been working on at GitHub with our coding agent team.
00:07:07
Okay, I promise that's as GitHub-y as I'm going to get, but I wanted to show you quickly: you can assign Copilot to a PR or an issue and it can create a PR for you, which you then review. This shows the difference between an agent actually writing code versus simply prompting Copilot with questions like "How do I do this?" — it's taking that next step forward.
00:07:35
Now let's talk about building agents, the meat of the talk. You can build single- or multi-agent systems; today we'll focus on building a single agent. Here are the essential building blocks we use when constructing agents. To solidify these concepts, I built a small Rails application: a dummy support agent (more than a chatbot, a full support agent), and we'll use it as our running example.
00:08:16
Tooling layer: I recommend using MCP, the Model Context Protocol, a standard from Anthropic that defines how models securely discover and call external tools and services through structured JSON interfaces. There's a whole talk on MCP tomorrow from Paul, so I won't deep-dive here. In my application I call a Zendesk MCP server to give the agent context about support tickets.
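To make that concrete, here's a minimal sketch of a tool call from Ruby. MCP is JSON-RPC 2.0 under the hood; this assumes an HTTP transport (real servers may also speak over stdio), and the endpoint and "get_ticket" tool name are hypothetical stand-ins for the Zendesk server in my app.

```ruby
require "net/http"
require "json"
require "uri"
require "securerandom"

# Hypothetical MCP endpoint; yours will differ.
MCP_URL = URI("http://localhost:3333/mcp")

# MCP tool calls are JSON-RPC 2.0 requests with method "tools/call".
def call_mcp_tool(name, arguments)
  payload = {
    jsonrpc: "2.0",
    id: SecureRandom.uuid,
    method: "tools/call",
    params: { name: name, arguments: arguments }
  }
  response = Net::HTTP.post(MCP_URL, payload.to_json, "Content-Type" => "application/json")
  JSON.parse(response.body).fetch("result")
end

# e.g. pull a support ticket so the planner has context to work with
ticket = call_mcp_tool("get_ticket", { "ticket_id" => 42 })
```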
00:09:06
Memory: agents need both short- and long-term memory. Long-term memory can be a database table with content and run IDs; after every planner response, tool call, or user input you append a row. This creates a timeline of the agent's thoughts and context. Before asking the LLM planner for the next step, you can recall recent context to prevent forgetfulness in long-running sessions.
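In Rails terms, that long-term memory can be as simple as an ActiveRecord table. The MemoryEntry model and its columns below are hypothetical, but the shape follows the append-then-recall pattern just described.

```ruby
# Hypothetical long-term memory table: one row per planner response,
# tool call, or user input, keyed by run_id.
class MemoryEntry < ApplicationRecord
  scope :for_run, ->(run_id) { where(run_id: run_id).order(:created_at) }
end

# Append a row after every step the agent takes...
MemoryEntry.create!(run_id: run.id, role: "tool", content: result.to_json)

# ...then recall recent context before asking the planner for the next
# step, so long-running sessions don't "forget" earlier work.
recent_context = MemoryEntry.for_run(run.id).last(20).map(&:content)
```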
00:09:42
State store: short-term memory is the single source of truth for an agent's current run, its working state. Think of it as the orchestrator's working memory. It's not logs or long-term memory; it tracks everything the planner and agent need to continue execution: run metadata, the current plan, tool outputs, retry state, gate decisions, and so on. This matters for resumability, concurrency, and knowing whether to continue, pause for human review, or stop. In our app the run's state store lives in a JSONB column holding run-scoped working data, with simple cursor pointers and statuses to control processing.
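A sketch of what that can look like in a Rails schema, assuming Postgres for the JSONB column; the table and column names are illustrative.

```ruby
# Hypothetical migration: one row per agent run, with a JSONB column
# holding the run-scoped working state the orchestrator reads and writes.
class CreateAgentRuns < ActiveRecord::Migration[7.1]
  def change
    create_table :agent_runs do |t|
      t.string :status, null: false, default: "running"  # running / paused_for_review / finished
      t.jsonb  :state,  null: false, default: {}         # current plan, tool outputs, retries, cursors
      t.timestamps
    end
  end
end

# After each step, merge new working data into the state store so the
# next planning pass (or a resumed run) picks up where it left off.
run.update!(state: run.state.merge(
  "last_tool_output" => output,
  "cursor" => run.state.fetch("cursor", 0) + 1
))
```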
00:10:18
Agent orchestration is a major part of building agents: it's the logic that plans, executes, and supervises multi-step agentic work. The orchestrator calls tools or the LLM, implements guardrails, persists state, handles retries and rollbacks, and provides observability and auditing; it's essentially the glue between LLMs and business outcomes. In code, we ask the planner for the next action in strict JSON, store the raw plan as a workflow step, gate the plan with policy and schema checks (for example: does the user have access, or have they exhausted their LLM-call budget?), act using MCP as our tool protocol, persist the step, and accumulate state so the next plan can see previous results. If the planner returns a finish action, we gate the final answer and return the run.
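Here's a condensed sketch of that loop. PlannerClient, PolicyGate, and the workflow_steps association are hypothetical stand-ins for the pieces just described (call_mcp_tool is from the earlier MCP sketch); the plan, gate, act, persist, accumulate sequence is the part that matters.

```ruby
MAX_TURNS = 10 # cap autonomy so a confused planner can't loop forever

def run_agent(run)
  MAX_TURNS.times do
    # 1. Ask the planner for the next action as strict JSON, and persist it.
    plan = PlannerClient.next_action(goal: run.goal, context: run.state)
    run.workflow_steps.create!(kind: "plan", payload: plan)

    # 2. Gate the plan with policy and schema checks (access, LLM budget, ...).
    PolicyGate.check!(run, plan)

    case plan.fetch("action")
    when "use_tool"
      # 3. Act via MCP, persist the step, and accumulate state so the
      #    next plan can see previous results.
      output = call_mcp_tool(plan.fetch("tool"), plan.fetch("arguments"))
      run.workflow_steps.create!(kind: "tool_result", payload: output)
      run.update!(state: run.state.merge("last_tool_output" => output))
    when "finish"
      # 4. Gate the final answer, then return the completed run.
      PolicyGate.check_final!(run, plan)
      run.update!(status: "finished", state: run.state.merge("answer" => plan["answer"]))
      return run
    end
  end
  run.update!(status: "paused_for_review") # out of turns: hand off to a human
  run
end
```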
00:11:04
Planning is the agent's executive function: it turns a goal into a sequence of verifiable steps and actions. In modern agentic AI we use four common tactics: subgoal decomposition (splitting a big goal into smaller steps so the planner proposes the next action), reflection (inspecting the latest output to decide how to improve), self-critique (scoring an intermediate or final result against criteria and revising or escalating), and chain-of-thought (internal reasoning about what to do next). The planner should return strict JSON with either a 'use_tool' or 'finish' action; we persist the planner's decision, not the full model reasoning (in Ruby we often use the Oj gem for fast JSON parsing), and then the orchestrator executes based on that decision.
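The contract side of that is small enough to sketch. The prompt wording below is illustrative, not the one from my app; the strict-JSON parse with Oj and the two allowed actions follow what's described above.

```ruby
require "oj"

# The planner must answer in strict JSON, in exactly one of two shapes.
SYSTEM_PROMPT = <<~PROMPT
  You are a planner. Reply with strict JSON only, in one of two shapes:
  {"action":"use_tool","tool":"<tool name>","arguments":{...}}
  {"action":"finish","answer":"<final answer>"}
PROMPT

# Parse and validate the planner's decision; this decision is what we
# persist, not the model's full reasoning.
def parse_plan(raw)
  plan = Oj.load(raw, mode: :strict)
  unless plan.is_a?(Hash) && %w[use_tool finish].include?(plan["action"])
    raise ArgumentError, "planner broke the JSON contract: #{raw.inspect}"
  end
  plan
end
```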
00:13:17
Now that we've covered the essential building blocks and what they look like in our support agent example, let's go over best practices we've learned on the agent team at GitHub. First: make your system modular and maintainable. Agent systems are complex and easily become tightly coupled; you don't want a small change to break everything. Tooling and strategies change rapidly, so iterate fast and keep components isolated. Second: add gates and guardrails, such as policy checks, rate limiting, or even prompting the human for confirmation (for example, a coding agent asking whether to keep or undo generated code). Use validation contracts (we use dry-schema) to ensure agent requests contain the required fields.
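Since dry-schema is the library named here, a minimal contract might look like the following; the field names are hypothetical, but this is the kind of gate that rejects a malformed agent request before it reaches a tool.

```ruby
require "dry/schema"

# Validation contract for planner output: a required action plus optional
# tool-call details. A failing result stops the run at the gate.
ToolRequestSchema = Dry::Schema.JSON do
  required(:action).filled(:string, included_in?: %w[use_tool finish])
  optional(:tool).filled(:string)
  optional(:arguments).filled(:hash)
end

result = ToolRequestSchema.call(plan)
raise "gate failed: #{result.errors.to_h}" if result.failure?
```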
00:14:16
Keep prompt changes minimal: a seemingly harmless tweak can cause the planner to stop using tools, hallucinate actions, or break JSON contracts. Extensive testing is essential: we run nightly evaluations and use test harnesses to lock in behavior, tweak prompts, and reduce the number of tool calls. There's a whole talk on LLM evaluations and reinforcement learning tomorrow by Andrew and Charlie from Shopify that I recommend. Observability is critical: everything the agent does needs to be tracked, inspected, and explainable. The building blocks we discussed, like memory and state stores, are essential for observability and for ensuring agents remain ethical and dependable. For example, with Copilot you can view a session after it creates a PR to see and track what it did and thought — that level of accountability matters.
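As a flavor of what "locking in behavior" can mean, here's a hypothetical nightly eval: replay a fixed set of scenarios against the planner and fail loudly if any response breaks the JSON contract. The scenario text is made up, and PlannerClient is the assumption carried over from the earlier orchestration sketch.

```ruby
# Hypothetical nightly eval: any scenario whose plan breaks the contract
# (wrong action, invalid JSON, raised error) lands in `failures`.
SCENARIOS = [
  "summarize ticket 42",
  "refund request over the approval limit"
]

failures = SCENARIOS.reject do |goal|
  plan = PlannerClient.next_action(goal: goal, context: {})
  %w[use_tool finish].include?(plan["action"])
rescue StandardError
  false
end

abort("contract broken for: #{failures.join(', ')}") unless failures.empty?
puts "all #{SCENARIOS.size} scenarios passed"
```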
00:16:36
What will the future of agents look like? Agents will likely become more humanlike — anthropomorphization is the attribution of human traits, emotions, or intentions to non-human entities — and it will matter how the agent feels to you: personality, voice, tone, values, and consistency will be as important as what the agent does. As memory and personalization improve, agents will learn how you write code and what you value, becoming more ethical, dependable, and aligned to a brand. Subagents and hierarchies will be important: we'll see agents with single responsibilities (SRP-style), where each subagent has its own role rather than one monolithic agent. Future systems will be workflow-native, modeling branching, retries, checkpoints, and rollbacks natively, with agent orchestration becoming central. I imagine blends of traditional orchestration tools (Airflow, Temporal) with LLM-aware adapters — workflow engines with LLM steps inside.
00:18:39
That said, agent solutions can be overengineered. It's easy to reach for the shiny new toy; sometimes it's better to use agents in side projects and not in production until you've learned what you need. Choose the right tool for the problem at hand and build the agent you actually need.
00:19:10
Ethical considerations: transparency is essential — be transparent about what your agents are doing and keep users included during development. Question the models you're using: are the LLMs biased? We dogfood our tools extensively to ensure agents do what we expect and to detect bias. Implement safeguards, gates, and policy checks — it's easy to build an agent without these, but they're essential for keeping behavior on track. Also enforce security and retention policies, make agent activity visible to developers, and perform regular audits and dogfooding.
00:20:24
Regarding Ruby and Rails tooling, we've been a bit slower to adopt major SDKs and tooling compared with the TypeScript and Go ecosystems. We should make sure Rails remains relevant in this space and discuss how to keep it a major player. We should also focus on positive societal impact: augmentation and acceleration rather than replacement. This is a fundamental evolution in how we build software, moving beyond traditional programming paradigms toward systems that reason, learn, and adapt. These capabilities can dramatically change our work and free up time for what matters; for me, that means more time with my kids.
00:21:24
How can we build agents for good? How can we make these powerful tools have positive impact rather than negative? How can agents make our world better — and, importantly, how can they make your world better? Thank you.