Summarized using AI

A side gig for RuboCop, the Bookworm code crawler

David T. Crosby • April 16, 2025 • Matsuyama, Ehime, Japan • Talk

In the presentation titled "A side gig for RuboCop, the Bookworm code crawler," David T. Crosby, a production engineer at Meta, discusses the development of Bookworm, an open-source code crawler that operates on top of RuboCop. RuboCop is primarily recognized as a linting and refactoring tool but is leveraged here for its NodePattern API to better analyze and understand the extensive Ruby codebase used for Chef at Meta.

Key Points Discussed:

  • Overview of Chef at Meta: Chef is a Ruby DSL (Domain Specific Language) utilized for configuration management, responsible for managing the setup of systems over many servers, a requirement at Meta where millions of servers operate.
  • Transition from Food Critic to RuboCop: The presentation highlights the limitations of Food Critic, a legacy linter specific to Chef, that has been replaced by RuboCop in combination with a plugin called Cookstyle. RuboCop introduced features like autocorrection which are advantageous for maintaining clean and safe code.
  • Importance of Linting: Crosby emphasizes the value of linting in software development, comparing it to various safety protocols in industrial jobs, advocating for the necessity of tools that assist in maintaining code quality.
  • Utilizing NodePattern API: He illustrates how to write custom RuboCop rules using the NodePattern API to catch errors in code such as incorrect constant assignments, thereby enhancing code safety and correctness through static analysis.
  • Challenges with Existing Tools: Despite the utility of RuboCop, Crosby points out its limitations such as operating on a single file and not being able to analyze project-wide dependencies efficiently. These limitations presented challenges in large-scale code bases.
  • Introduction of Bookworm: In response to these challenges, Bookworm was created to crawl Chef code and maintain a knowledge base about code dependencies and attributes of cookbooks. This tool allows teams to have a clearer view of the codebase and manage it better with the ability to delete unused code safely and effectively.
  • Future of Bookworm: Crosby concludes by inviting collaboration and exploration of Bookworm by the community, encouraging its use in various Ruby contexts beyond Chef, and hinting at potential expansion into other programming languages.

Conclusion:

The presentation effectively showcases how tools like RuboCop and Bookworm are indispensable for managing large codebases, emphasizes the evolution of linter tools in facilitating safer coding practices, and invites further development and exploration within the coding community.


Date: April 16, 2025
Published: May 27, 2025
Announced: unknown

RuboCop is typically thought of as 'just' a linting or refactoring tool. However, one of RuboCop's foundational features, the NodePattern API, is so useful for crawling Ruby AST that an open-source tool called Bookworm has been written that uses the NodePattern API to understand the large Chef Ruby codebase used at Meta.

https://rubykaigi.org/2025/presentations/dafyddcrosby.html

RubyKaigi 2025

00:00:04.640 uh hello Um so uh this talk is about
00:00:10.000 Bookworm It's a code crawler that uh we
00:00:12.880 built on top of RuboCop Uh I'm David
00:00:16.080 Crosby I'm a production engineer at Meta
00:00:18.880 We make Facebook WhatsApp Instagram
00:00:23.439 um Quest a bunch of stuff
00:00:27.599 and I primarily work on our Ruby tooling
00:00:30.320 Um specifically in particular uh the
00:00:32.960 tooling that we have around Chef Um so
00:00:36.719 you're probably uh if you don't already
00:00:39.200 know Chef is a Ruby DSL uh for
00:00:42.960 configuration management So you know it
00:00:44.879 gets the the files and um packages that
00:00:48.640 you need on the system onto the system
00:00:51.960 Um and Meta has been using uh Chef for
00:00:55.680 over a decade now Um Phil Dibowitz first
00:00:58.480 did a talk back in ChefConf 2013 Um and
00:01:03.280 it fundamentally hasn't changed It it's
00:01:05.680 basically been uh the same model since
00:01:07.920 then Um and the open source that we put
00:01:12.159 out and continue to put out um is still
00:01:15.040 being used by greenfield projects um by
00:01:17.920 other companies and internally So it's
00:01:20.799 it's it's kind of a niche API but it
00:01:23.520 scales gloriously
00:01:26.759 Um yeah so at Meta scale um if you're not
00:01:31.200 familiar it's sort of you're operating
00:01:32.640 at millions of servers So that's a
00:01:35.520 pretty decent Ruby footprint Um it it
00:01:38.240 we're not really known for that but it's
00:01:40.400 there
00:01:42.119 Um so to manage all of our Chef code uh
00:01:47.840 we were using Food Critic um which it
00:01:51.759 was a linter that it it predated
00:01:54.320 Rubocop by about a year thereabouts Um
00:01:57.600 and it it had some decent traction but
00:02:00.640 got to about 120 rules but um it was
00:02:03.840 Chef-specific and so in 2016 thereabouts
00:02:08.560 Food Critic was end of life and uh a new
00:02:12.160 plugin for RuboCop called Cookstyle
00:02:14.800 came out Um the problem is uh inside
00:02:19.760 Facebook at the time we still had a
00:02:22.160 whole bunch of internal Food Critic
00:02:24.080 rules So we largely just kept Rubocop
00:02:26.640 for style
00:02:28.200 Um and we we sort of kept Food Critic
00:02:32.480 alive internally but um Food Critic uh
00:02:37.120 was kind of missing an important feature
00:02:38.720 that's not terribly obvious if you're
00:02:40.640 just thinking about things as style
00:02:42.640 linters Um so real quick who here has
00:02:46.480 ever worked like an industrial job where
00:02:49.440 you got to like drive a forklift or work
00:02:51.440 with heavy machinery like not just a
00:02:54.040 keyboard okay there's a that's that's
00:02:57.519 great because then you probably haven't
00:02:58.640 seen this one
00:03:00.440 before So this is the hierarchy of
00:03:03.040 hazard controls Um and
00:03:06.680 uh it it's fairly fairly simple to
00:03:09.840 reason about basically the stuff that's
00:03:11.760 happening at the bottom it's kind of the
00:03:13.599 least effective way of keeping your
00:03:15.120 fingers Um you want the stuff that's at
00:03:17.680 the top Um so like one one example of
00:03:23.280 this uh there's this thing called the
00:03:25.920 band saw which is an awesome tool Um
00:03:29.440 it's just this blade that goes really
00:03:31.840 fast and it will just rip through wood
00:03:34.480 or whatever um like butter The thing is
00:03:38.159 though it is incredibly dangerous So
00:03:41.840 like at the bottom of this you're
00:03:43.840 wearing Kevlar gloves or you know safety
00:03:46.599 glasses Um but like that's not going to
00:03:50.080 protect you from a blade like that Um so
00:03:52.959 you could have a dead man switch So you
00:03:55.360 get pulled in or something it's going to
00:03:57.680 stop Uh you could also uh if you look at
00:04:01.840 the yellow one for isolating people from
00:04:03.840 the hazard you could put in safety
00:04:05.200 guards So that way you know maybe you
00:04:07.840 don't be directly interacting with the
00:04:09.920 blade because if that blade breaks it
00:04:12.319 flies into the air scares the pants off
00:04:14.239 of you and you decide you don't want to
00:04:15.680 work in industry anymore Um so um like
00:04:20.639 the the safest option with a band saw
00:04:23.520 don't use one Um so elimination you've
00:04:26.000 physically removed the hazard get
00:04:27.280 something pre-cut Um now this also can
00:04:31.440 apply to software So um I'm surprised I
00:04:36.720 still have to say this in 2025 Linters
00:04:39.280 are important
00:04:41.000 Um red at the bottom here um that's the
00:04:44.880 get good school That's using Vim No
00:04:48.120 linters no no syntax highlighting Like
00:04:51.120 you can do it It's awful Um and
00:04:56.240 uh you know you're it sort of assumes
00:04:58.240 that you're perfect which we're human
00:05:00.639 We're obviously not Um now you can go up
00:05:04.560 a level and you got pure code review but
00:05:07.199 now you're sort of relying on a Swiss
00:05:08.880 cheese defense So like okay you've got
00:05:12.320 some person who's an expert Well what if
00:05:14.000 the expert goes on vacation or something
00:05:16.639 well now you're back to red zone So
00:05:20.360 um you want something that's operating
00:05:22.720 at the very least at the yellow So
00:05:24.320 that's a linter that's in your
00:05:26.479 continuous integration system It's
00:05:28.400 happening on every single commit Food
00:05:30.520 Critic will just live in yellow um
00:05:33.720 because that crucial feature that it's
00:05:36.080 missing is autocorrection Um Rubocop
00:05:39.840 affords autocorrection So that actually
00:05:42.160 gets you to one layer up um where you
00:05:44.960 are now taking unsafe code and making it
00:05:48.560 safe because uh if all you're doing is
00:05:51.440 complaining about things um people don't
00:05:54.320 always know how to fix that you're
00:05:56.080 working with a broad range of expertise
00:05:58.479 in any given environment And if you're
00:06:00.880 operating with hundreds of engineers
00:06:02.479 like yeah there's going to be engineers
00:06:04.560 who they will do the barest minimum to
00:06:07.759 get code shipped
00:06:11.240 Um so
00:06:14.440 uh yeah let's real quick um I want to
00:06:20.319 take a look at a a real example that you
00:06:23.039 can see again with a large environment
00:06:25.759 You've got you are trying to assign a
00:06:28.639 constant to foo Um that is syntactically
00:06:32.639 valid but quite obviously wrong Um so
00:06:36.080 you're not going to see that explode
00:06:37.600 necessarily until runtime Um which is
00:06:41.039 going to be a problem So uh we can we
00:06:46.319 got a lot of smart folks in the room We
00:06:48.080 can actually just make a uh a rubocop
00:06:50.720 rule to deal with this exact problem
00:06:52.560 right now So let's do that Um so this is
00:06:58.160 a a real
00:06:59.880 um RuboCop cop We actually open
00:07:03.440 sourced this one We're open sourcing more
00:07:05.520 of what we've got
00:07:07.319 Um and all it's looking at um so we
00:07:12.240 we've got our class named Python caps
00:07:15.800 Um the on_const uh method here What
00:07:21.199 that's doing is it's a handler So you've
00:07:24.479 got this abstract syntax tree of your
00:07:27.520 source and um the different uh nodes of
00:07:32.080 the
00:07:33.240 AST as the
00:07:35.639 um as Rubocop's going traversing through
00:07:38.800 that tree You want in this case to say
00:07:41.919 okay we we see a constant let's fire on
00:07:44.720 that So we fire on the constant and the
00:07:47.759 def_node_matcher What this is doing is
00:07:49.680 it's actually creating a new method
00:07:50.960 called const caps true false um and the
00:07:54.800 really fancy string in there is the node
00:07:56.800 pattern API Now this thing this thing
00:08:00.000 feels like the band saw of uh rubocop
00:08:03.360 It's awesome Um but
00:08:06.919 um the uh probably overkill for this
00:08:11.280 particular example but I wanted to show
00:08:12.720 it off Um so we got a constant and we're
00:08:16.240 looking for if that constant's name is
00:08:17.840 false or true And if it is we're going
00:08:21.360 to add an offense Hooray We're now at
00:08:23.840 the yellow of that triangle So um this
00:08:27.360 is a good thing Now here's how we turn
00:08:30.000 it into a green We add the extend
00:08:33.599 autocorrector line And um we pass uh uh
00:08:39.680 we have a block where we have um the
00:08:42.719 replace method We're going to take that
00:08:44.720 particular node's range of uh source and
00:08:48.080 we're now going to take the short name
00:08:50.720 So either capital F false or capital T
00:08:53.680 true And we're just going to downcase it
00:08:56.320 There we've now replaced that with a
00:08:58.399 string that says true or false So we've
00:09:00.080 we've taken a constant and we've changed
00:09:01.760 it to a boolean literal
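
The detect-then-autocorrect idea described above can be sketched without any gems. This is not the speaker's cop: a real cop would use RuboCop's `def_node_matcher` and `on_const`, which need the rubocop gem. As a stand-in, the sketch below uses Ruby's bundled Ripper parser, and the helper names `each_sexp_node`, `python_caps_offenses`, and `autocorrect_python_caps` are made up for illustration. It assumes ASCII source, since Ripper reports byte columns.

```ruby
require 'ripper'

PYTHON_CAPS = %w[True False].freeze

# Yield every Array node of a Ripper S-expression, depth first.
def each_sexp_node(sexp, &block)
  return unless sexp.is_a?(Array)

  yield sexp
  sexp.each { |child| each_sexp_node(child, &block) }
end

# Detection step: collect [line, column, name] for each bare True/False
# constant reference (Ripper reports these as :var_ref wrapping :@const).
def python_caps_offenses(source)
  offenses = []
  each_sexp_node(Ripper.sexp(source)) do |node|
    next unless node[0] == :var_ref && node[1].is_a?(Array)

    type, name, (line, column) = node[1]
    offenses << [line, column, name] if type == :@const && PYTHON_CAPS.include?(name)
  end
  offenses
end

# Autocorrection step: replace each offending constant with its downcased
# form. The replacement has the same length, so other columns stay valid.
def autocorrect_python_caps(source)
  lines = source.lines
  python_caps_offenses(source).each do |line, column, name|
    lines[line - 1][column, name.length] = name.downcase
  end
  lines.join
end

puts autocorrect_python_caps("foo = True\n") # prints: foo = true
```

Scoped constants such as `Foo::False` are left alone here, because Ripper parses them as a different node type; only the bare constant is treated as a mistake.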
00:09:06.680 Um so ah apologies
00:09:11.000 uh writing code uh rewriters Um sorry
00:09:20.600 uh okay I I just came back from riding
00:09:23.920 the Shimanami Kaido for several days I'm
00:09:26.800 still still getting uh back to humanity
00:09:29.680 at this point
00:09:31.720 Uh so uh Facebook has code mods back
00:09:36.560 sort of in its DNA We we put out a code
00:09:39.120 mod tool for PHP back in
00:09:42.200 2007, 2008 Um and code mods are basically
00:09:46.320 how you do coding at scale Now there was
00:09:49.120 a really good talk about how to do this
00:09:51.440 in Ruby in the 2024 RubyKaigi So I'm not
00:09:54.480 really going to go over it but I mean if
00:09:57.120 you aren't already using Rubocop for
00:09:59.680 large scale refactors like I'd really
00:10:02.399 encourage it
00:10:04.680 Um
00:10:06.600 so like as I was saying uh writing code
00:10:09.920 rewriters it seems overkill at first Um
00:10:12.959 and I mean this this seems like a fairly
00:10:15.200 mathy slide I I won't try to bore you
00:10:17.839 with it but
00:10:19.959 basically the thing is when your N is
00:10:23.040 small it seems like overkill but you
00:10:25.120 want to be building that muscle for like
00:10:26.959 how to use the node pattern API because
00:10:28.640 again that sucker is the band saw of of
00:10:31.600 Rubocop That is the good tool Um so
00:10:37.640 um you've got this cool way of finding
00:10:41.360 code patterns
00:10:43.240 and how do you build that muscle because
00:10:46.320 like the node pattern API I it it
00:10:49.600 absolutely made me a better Ruby
00:10:51.120 programmer and I've been doing this
00:10:52.079 since Ruby
00:10:53.959 1.8.7
00:10:55.720 Um but like it's a muscle You got to use
00:10:58.399 it a lot
00:10:59.959 Um so I'm going to leave you with this
00:11:01.920 cliffhanger for just like a little bit
00:11:03.839 Take a sip of water
00:11:10.920 Um
00:11:13.800 so sometimes to think about
00:11:17.360 uh if if you want to come up with new
00:11:19.200 things you kind of have to assess the
00:11:21.600 current state of what you've got and
00:11:24.000 come up with all of the
00:11:26.760 um all of the things that really bug you
00:11:31.320 Um
00:11:33.000 so uh just to be to be totally
00:11:36.640 transparent I'm going to beat up a bit
00:11:38.000 on Rubocop and Chef So like don't don't
00:11:41.360 don't take this as like me being ma
00:11:43.839 nasty here Um so Rubocop it operates on
00:11:48.079 one file at a time Um so let's say you
00:11:52.399 wanted to um see if that cookbook was
00:11:55.040 referencing something else
00:11:57.320 Um well you can't really do that unless
00:11:59.920 you're like using global state within
00:12:01.560 Rubocop which you can
00:12:03.800 do Don't Um it's it's it's really gross
00:12:09.800 Um there's
00:12:12.279 um also one common thing So I I do a lot
00:12:17.040 of training for uh people to be writing
00:12:19.720 rubocop uh cops within meta and uh we
00:12:25.040 see this fairly often where someone
00:12:27.120 writes a working cop but they actually
00:12:30.000 haven't tuned it to the correct files
00:12:32.720 and so because in Chef you have this
00:12:35.920 concept of cookbooks which are just like
00:12:37.360 a collection of files you've got recipes
00:12:39.920 you've got attributes so you got
00:12:41.120 metadata.rb files you've got
00:12:43.600 definitions it's on and on and on Um if
00:12:46.480 you want to tune to
00:12:48.600 um to just what you want like you're
00:12:51.760 you're really using two different
00:12:53.240 languages And um I don't know I I I
00:12:57.839 would rather write Ruby than a whole
00:12:59.120 bunch of YAML
00:13:01.000 Um so uh then there's also again node
00:13:05.760 pattern API you can
00:13:08.360 do like as a querying language for a
00:13:12.079 whole bunch of code like it's
00:13:14.040 great how do you use rubocop
00:13:17.800 to show you neat stuff about the code
00:13:20.639 that's not like really a problem So
00:13:22.880 Rubocop you've got the info severity
00:13:25.600 which I mean that sort of scratches the
00:13:27.600 itch um because it's letting you know
00:13:30.000 what's
00:13:31.839 uh what what things it's seeing in an
00:13:34.079 individual file But let's say you want
00:13:36.399 to know a really cool code pattern
00:13:38.959 across a whole whack of files Now for
00:13:42.639 that you're going to need like a
00:13:44.880 post-processor Um you can write a
00:13:47.440 formatter for rubocop but like no you
00:13:50.160 know it's it's just not great Um and
00:13:53.519 like I want to be using the cool
00:13:56.680 thing Um and so now now I get to beat up
00:13:59.600 on chef a bit Um so chef it has issues
00:14:05.720 that you
00:14:08.279 know so one problem that you'll often
00:14:11.040 see is you have this uh concept of
00:14:15.519 resources So a resource can be like a
00:14:17.600 directory or a file or an RPM if you're
00:14:20.480 using a Red Hat system and you can have
00:14:24.720 uh Chef notify you when a particular
00:14:27.199 file has been written
00:14:29.079 Well if you are spreading those
00:14:31.519 notifications across a whole bunch of
00:14:33.480 cookbooks like over time that you get a
00:14:36.399 real
00:14:37.560 um it's like trying to pull a cord out
00:14:40.560 of a bundle of cords Um it's it's awful
00:14:44.000 work Um then you've got transitive
00:14:46.639 cookbook dependencies So you can be
00:14:50.079 using a cookbook um you think you're
00:14:52.800 done with it you go to delete it and it
00:14:54.880 turns out that that cookbook was used in
00:14:57.600 a dependency chain to pull in some
00:15:00.320 totally unrelated cookbook Um that's
00:15:03.519 just bad And can Rubocop tell you about
00:15:06.880 the absence of a thing not so much
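
The transitive dependency problem described above can be illustrated with a small graph walk. This is only a sketch: the cookbook names and the `transitive_deps` helper are hypothetical, and the graph is a plain hash standing in for what each cookbook's `depends` lines would declare.

```ruby
require 'set'

# Collect every cookbook reachable from `cookbook` through dependency edges.
# `graph` maps a cookbook name to the names it directly depends on.
def transitive_deps(cookbook, graph, seen = Set.new)
  graph.fetch(cookbook, []).each do |dep|
    # Set#add? returns nil when the element is already present, so each
    # cookbook is visited once even if the graph contains a cycle.
    transitive_deps(dep, graph, seen) if seen.add?(dep)
  end
  seen
end

graph = {
  'web'          => ['legacy_utils'],
  'legacy_utils' => ['sysctl'],
  'sysctl'       => []
}

before = transitive_deps('web', graph)

# Deleting legacy_utils silently drops sysctl from what 'web' pulls in.
after_delete = transitive_deps('web', 'web' => [], 'sysctl' => [])
lost = before - after_delete - ['legacy_utils']
puts "also lost: #{lost.to_a.join(', ')}"
```

A per-file linter never sees this, because the breakage is in what is no longer reachable rather than in any single file.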
00:15:10.519 Um so
00:15:13.360 uh we we end up with this problem where
00:15:15.760 it's a lot easier to keep adding code
00:15:19.040 over time and you end up with this cruft
00:15:22.480 and it's not like if the cruft isn't
00:15:26.360 used is it a problem um there's
00:15:30.320 certainly many situations in the past um
00:15:34.000 uh not just meta but like in many
00:15:36.720 companies where things that were left
00:15:38.720 behind and thought were no longer to be
00:15:40.320 active suddenly pop into use Um so we
00:15:43.199 want to uh clean up cruft Uh how do you
00:15:46.800 do that in a way that's not going to
00:15:48.480 blow up at runtime
00:15:50.920 um
00:15:53.160 so yeah
00:15:55.480 um as I mentioned we've been using Chef
00:15:58.320 for over a decade
00:16:01.240 and tens of thousands of recipes and
00:16:05.800 libraries I've yet to meet anyone who's
00:16:08.480 able to keep all of that in their head
00:16:10.480 And I I think that would actually be a
00:16:12.160 pretty awful existence if you did
00:16:15.480 Um
00:16:17.320 so yeah there's assumptions you can make
00:16:20.480 on a codebase the size um I I won't say
00:16:24.800 them outright so I can keep my job but
00:16:28.240 but
00:16:29.240 um right around the time that I was
00:16:33.279 doing the Cookstyle transition there
00:16:35.279 was this book I read called Kill It With
00:16:37.199 Fire which is an excellent book And a
00:16:39.759 lot of it came down to you need to be
00:16:41.279 able to keep the code base in your head
00:16:43.680 And so you want to start building tools
00:16:46.079 that allow you to do that And so um
00:16:51.120 that's where we came up with Bookworm Um
00:16:54.079 because it was tooling that just didn't
00:16:56.000 exist at the time Um as far as I know
00:16:58.240 there still isn't something that really
00:16:59.360 scratches the itch And Meta is a company
00:17:01.680 where we can really explore these
00:17:03.839 problem spaces because again like if you
00:17:06.400 can't buy your way out of the solution
00:17:07.919 you just have to build it
00:17:09.640 Um so uh yeah this makes our ability to
00:17:15.520 look at a codebase where we're talking
00:17:19.039 hundreds of cookbooks like if you have
00:17:20.559 10 cookbooks sure if you got hundreds
00:17:24.240 this this is a tool that you want
00:17:27.319 Um so there was a variety of problems
00:17:30.320 that we wanted to solve And I'm not
00:17:31.760 going to go into like each one of these
00:17:34.320 but we knew that in the course of
00:17:37.440 solving these problems we were going to
00:17:39.039 come up with new questions of like well
00:17:41.440 then how do we delete these things what
00:17:43.840 how does how does it get to be that we
00:17:46.400 can do this safely what is the order of
00:17:48.480 operations there and not just this but
00:17:50.720 it's like what if we wanted to change
00:17:54.160 one set of code into another set of code
00:17:56.080 or at least keep something in sync Um so
00:18:02.440 um basically the we come out with this
00:18:06.960 um situation where there's a pattern
00:18:09.120 that we want to follow Um that pattern
00:18:12.240 being that you're going to crawl files
00:18:14.720 So in this case we're crawling a
00:18:17.280 particular set of files We got
00:18:19.480 metadata.rb that gives you the metadata
00:18:21.600 of the cookbook So what things it
00:18:23.520 depends on what its name is the
00:18:25.440 description etc Then you got your recipe
00:18:27.679 code This is the code that actually like
00:18:29.760 drops the resources onto your system Um
00:18:34.320 and then you've got the attributes which
00:18:35.840 is how you configure a given cookbook Um
00:18:39.280 so we crawl all of those and we keep
00:18:41.440 them in little uh segments called keys
00:18:43.919 Now
00:18:45.480 I if I had known how long this thing was
00:18:48.160 going to have lasted um we'll give it a
00:18:50.400 different name We still can but why why
00:18:53.360 they're called keys is because they're
00:18:54.559 keys of a hash and the knowledge base is
00:18:57.039 just a hash of
00:18:58.280 hashes
00:18:59.960 Um and so you've got the crawler The
00:19:03.200 next part is you're going to take the
00:19:05.840 NodePattern API and you're going to do
00:19:08.799 pattern matching against this knowledge
00:19:10.960 base where we've taken all the AST we
00:19:13.360 extracted from the
00:19:15.240 code and just run over and over and over
00:19:18.000 and we're now filling up that knowledge
00:19:20.000 base with all this information uh
00:19:23.600 separate and extracted from that code
00:19:26.720 and then that knowledge base can be fed
00:19:28.799 to a report Now a report it can be
00:19:31.840 whatever you want it to be It can be as
00:19:33.760 simple as a list that tells you what the
00:19:36.000 cookbooks are It can also be a
00:19:37.720 transpiler that um like we we we left a
00:19:42.720 lot of room there because we didn't know
00:19:44.640 what we needed and we we knew that those
00:19:48.080 needs were going to change pretty wildly
00:19:50.000 over the next few years
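
The crawl, knowledge base, and report steps described above might look roughly like this. It is a sketch under stated assumptions, not Bookworm's actual code: the `KEYS` patterns, the `crawl` and `cookbook_list_report` helpers, and the entry layout are all invented for illustration, and Ruby's bundled Ripper stands in for the parser RuboCop uses.

```ruby
require 'ripper'

# Hypothetical key definitions: key name => pattern for the files it covers.
KEYS = {
  'metadata'  => %r{\Acookbooks/[^/]+/metadata\.rb\z},
  'recipe'    => %r{\Acookbooks/[^/]+/recipes/[^/]+\.rb\z},
  'attribute' => %r{\Acookbooks/[^/]+/attributes/[^/]+\.rb\z}
}.freeze

# Crawl step: turn a { path => source } map into a hash of hashes,
# knowledge_base[key][path] => { 'ast' => parsed source }.
def crawl(files)
  knowledge_base = Hash.new { |hash, key| hash[key] = {} }
  files.each do |path, source|
    key, = KEYS.find { |_name, pattern| pattern.match?(path) }
    next unless key # files outside every key are ignored

    knowledge_base[key][path] = { 'ast' => Ripper.sexp(source) }
  end
  knowledge_base
end

# Report step: the simplest report, a sorted list of cookbook names.
def cookbook_list_report(knowledge_base)
  knowledge_base['metadata'].keys.map { |path| path.split('/')[1] }.sort
end

files = {
  'cookbooks/nginx/metadata.rb'        => "name 'nginx'\n",
  'cookbooks/nginx/recipes/default.rb' => "package 'nginx'\n",
  'README.md'                          => "# not crawled\n"
}
kb = crawl(files)
puts cookbook_list_report(kb)
```

Keeping the knowledge base as plain nested hashes means it can be inspected in a debugger or dumped to a file without any special tooling.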
00:19:54.039 So there's a few design philosophies
00:19:56.400 that we've
00:19:59.720 taken And one of which is people don't
00:20:03.919 want to use a fancy tool if it's hard to
00:20:07.039 use Doesn't matter how good it is Um it
00:20:10.880 needs to be easy to explore to inspect
00:20:14.160 to fix Um as easy as a bicycle I would
00:20:17.600 say Um and if people can use grep and
00:20:20.400 shell they will So it needs to be better
00:20:22.400 than that Um so we put in strategic
00:20:26.000 break points at each part of that
00:20:28.440 process so that that way okay well
00:20:31.360 you've now crawled everything and you
00:20:32.960 want to inspect the AST to figure out
00:20:35.200 what's going on Cool Drops you into an
00:20:37.440 interactive debugger We've also got the
00:20:39.280 profiler uh ruby-prof built in so
00:20:41.840 that you got something slow Well that
00:20:44.960 that should be a quick step You
00:20:46.320 shouldn't have to be jamming these
00:20:48.159 things in And I mean this is the thing I
00:20:49.840 would love to see in more more Ruby
00:20:52.000 stuff um is the ability to um introspect
00:20:55.760 into what you're doing because like this
00:20:57.760 is like one of the superpowers of
00:21:01.000 Ruby
00:21:03.320 Um so yeah the other thing is again I've
00:21:07.679 trained lots of people um on node
00:21:09.919 pattern API and that's rad for rubocop
00:21:14.240 but um how do you use it for other
00:21:17.679 things because like there there's
00:21:19.120 definitely a learning curve there So
00:21:21.520 with this if you've got knowledge of how
00:21:24.159 to do pattern matching in Rubocop
00:21:26.640 congrats You've already figured out most
00:21:29.120 of what you're going to be doing in
00:21:30.919 Bookworm
00:21:33.640 Um and uh yeah so I mentioned that there
00:21:38.960 was a a rule um uh that kind of extracts
00:21:43.440 all that information Like here's a very
00:21:45.360 simple one Um and I can actually show a
00:21:48.960 bit of code right here So in this case
00:21:52.640 uh Chef has this concept of roles Roles
00:21:54.880 are basically okay this is an
00:21:57.440 nginx server So um the run list would
00:22:01.200 have like an nginx cookbook or
00:22:03.720 whatever But we got the name we got the
00:22:06.320 description we got the run list
00:22:10.400 Now what this rule is going to do is
00:22:15.039 we've created a uh role name method
00:22:18.720 using that again that really cool node
00:22:21.200 pattern API where um if we see the name
00:22:26.159 method we're going to take that string
00:22:28.000 literal we're going to extract it out
00:22:30.159 and then we're going to uh give the AST of
00:22:35.440 that particular file to the role name
00:22:38.720 method
00:22:40.000 And great So now if what we can reason
00:22:44.159 from this is the return value of uh
00:22:48.799 output is going to be
00:22:52.200 fake_role
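
A rough, gem-free approximation of the role-name rule just described: find a bare `name '...'` call in a role file's AST and return its string argument. With RuboCop this would be a `def_node_matcher` pattern along the lines of `(send nil? :name (str $_))`, which is my guess at the shape rather than the slide's exact text. The sketch uses Ripper instead, only handles the parenthesis-free call form, and the role file contents are made up.

```ruby
require 'ripper'

# Yield every Array node of a Ripper S-expression, depth first.
def each_node(sexp, &block)
  return unless sexp.is_a?(Array)

  yield sexp
  sexp.each { |child| each_node(child, &block) }
end

# Return the string literal passed to a bare `name '...'` call, or nil
# when the file has no such call.
def role_name(ast)
  each_node(ast) do |node|
    next unless node[0] == :command && node[1][0] == :@ident && node[1][1] == 'name'

    # The first string content token inside the arguments is the name.
    each_node(node[2]) do |arg|
      return arg[1] if arg[0] == :@tstring_content
    end
  end
  nil
end

ROLE_SOURCE = <<~ROLE
  name 'fake_role'
  description 'A role that only exists for this example'
  run_list 'recipe[nginx]'
ROLE

puts role_name(Ripper.sexp(ROLE_SOURCE)) # prints: fake_role
```

Because the match is on the call node rather than on text, a comment or string that merely contains the word `name` does not produce a false hit.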
00:22:53.880 Um now that that seems like a sort of
00:22:57.600 trivial thing but like the value of
00:22:59.440 rubocop is not that you've got one cop
00:23:01.760 The value of rubocop is that you've got
00:23:03.760 lots of cops Um and
00:23:09.120 So it so the same thing applies with
00:23:12.360 Bookworm Each each thing should unlock
00:23:15.200 the potential for more things And so
00:23:18.159 over time um as you keep building out
00:23:21.280 more and more stuff it gets easier and
00:23:23.919 that gives you a leg up on instead of
00:23:27.120 doing you know I wrote this hacky shell
00:23:29.840 script which just graps over things you
00:23:32.640 can keep reusing these things and build
00:23:35.120 on the work of others
00:23:39.320 Um
00:23:41.080 so this is a very simple report and
00:23:45.039 what's great about it is like again just
00:23:47.039 from the rules you can already see Okay
00:23:48.960 well we've got the role name that we
00:23:50.320 just looked at Role description You could
00:23:52.480 probably guess what that's doing Same
00:23:54.720 thing but just time with the description
00:23:57.799 And we can take that and we're going to
00:24:01.520 go over all the role files And we take the
00:24:04.640 role name and the role description and now we
00:24:06.480 have a nice tidy list Um and that ends
00:24:08.960 up being something which again with grep
00:24:11.760 and shell not as not as nice And the
00:24:15.120 benefits here are now if there's
00:24:16.880 anything else we want to be doing with
00:24:18.240 role name So like say you wanted to make
00:24:20.080 role name and show all of the
00:24:23.880 um say you wanted to do a role names for
00:24:27.440 anything that's using a particular
00:24:29.679 recipe inside of its set list or its run
00:24:33.440 list You can do something like that
00:24:35.760 pretty easily Now
00:24:39.679 um and another thing is because we
00:24:43.200 operate in this environment of hundreds
00:24:45.679 of engineers um some of whom have used
00:24:48.720 Ruby for years some of whom who have
00:24:50.640 used it for days we need to make sure
00:24:53.279 that one rule doesn't make the entire
00:24:55.760 thing slower So like when we look back
00:24:58.720 at this we see like okay we got the role
00:25:01.279 name rule and role description rule Well
00:25:03.120 we're not firing that on everything And
00:25:04.960 because both rules are only
00:25:07.039 specified for role files That means that
00:25:09.520 when you do that report it's only going
00:25:12.400 to go over the roles which again we're
00:25:15.960 talking when your N is huge you really
00:25:19.279 only want to be working on the things
00:25:20.880 that you actually care about
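
The scoping idea above, where each rule declares which keys it applies to so a report only touches the files it needs, can be sketched as follows. Everything here is hypothetical: the `Rule` struct, the `run_rules` and `role_report` helpers, and the knowledge base layout. The regular expressions are a stand-in for NodePattern matchers, used only to keep the example short.

```ruby
# Hypothetical rule shape: a name, the keys it applies to, and an extractor.
Rule = Struct.new(:name, :applies_to, :extract)

RULES = [
  Rule.new('RoleName', ['role'],
           ->(src) { src[/^name\s+['"]([^'"]+)['"]/, 1] }),
  Rule.new('RoleDescription', ['role'],
           ->(src) { src[/^description\s+['"]([^'"]+)['"]/, 1] })
].freeze

# Run each rule only over the keys it declares, and return the list of
# files that were actually visited.
def run_rules(knowledge_base, rules)
  visited = []
  rules.each do |rule|
    rule.applies_to.each do |key|
      knowledge_base.fetch(key, {}).each do |path, entry|
        entry[:rules][rule.name] = rule.extract.call(entry[:source])
        visited << path
      end
    end
  end
  visited.uniq
end

# Report: one "name<TAB>description" line per role file.
def role_report(knowledge_base)
  knowledge_base.fetch('role', {}).map do |_path, entry|
    "#{entry[:rules]['RoleName']}\t#{entry[:rules]['RoleDescription']}"
  end.sort
end

kb = {
  'role' => {
    'roles/web.rb' => { source: "name 'web'\ndescription 'Web tier'\n", rules: {} }
  },
  'recipe' => {
    'cookbooks/nginx/recipes/default.rb' => { source: "package 'nginx'\n", rules: {} }
  }
}

visited = run_rules(kb, RULES)
puts role_report(kb)
```

The recipe file is never handed to either rule, which is the property that matters when the number of files is large.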
00:25:25.279 So uh one last example just to show you
00:25:28.559 can do
00:25:30.200 um more interesting things with it than
00:25:32.880 just role names but it still fits in a
00:25:35.440 slide Uh is this one where okay let's
00:25:39.600 take all the dependencies that are in a
00:25:41.919 metadata.rb file and we can just put
00:25:46.559 them all into a set and we can actually
00:25:49.279 with this determine all the leaf
00:25:51.120 cookbooks that are within a given
00:25:55.640 codebase Why do we care about this well
00:25:59.120 this lets you know like these are
00:26:01.760 cookbooks that if we realize that it's
00:26:05.200 not actually being used anymore that
00:26:06.720 cookbook is safe to delete Which again
00:26:09.440 when you're talking really big dig you
00:26:12.640 want to be able to say "Yep this this
00:26:16.320 cookbook is safe to delete." You can do
00:26:18.159 more and more complex things We There's
00:26:20.240 a one rule we open source called the
00:26:22.799 cookbook dependency shaker and it's got
00:26:24.799 some really advanced logic It's almost
00:26:26.480 as big as the size of the engine itself
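
The leaf-cookbook report described above comes down to a set difference. In this sketch a "leaf" is assumed to mean a cookbook that no other cookbook lists in its `depends` lines; the input hash and the `leaf_cookbooks` helper are illustrative, and roles or run lists that reference a cookbook directly are not considered.

```ruby
require 'set'

# `dependencies` maps each cookbook name to the names in its metadata.rb
# `depends` lines. Returns the cookbooks nothing else depends on.
def leaf_cookbooks(dependencies)
  depended_on = Set.new
  dependencies.each_value { |deps| depended_on.merge(deps) }
  dependencies.keys.reject { |cookbook| depended_on.include?(cookbook) }.sort
end

dependencies = {
  'base'     => [],
  'nginx'    => ['base'],
  'web_app'  => ['nginx'],
  'old_tool' => ['base']
}

puts leaf_cookbooks(dependencies)
```

A leaf is only a candidate for deletion: the check says no cookbook would break, and whether anything still runs it has to be confirmed separately.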
00:26:29.720 Um
00:26:31.559 but that's easy to see because the
00:26:35.200 engine itself because it's reliant on
00:26:38.000 rubocop to do a lot of the heavy lifting
00:26:41.120 You can actually do the um read the
00:26:43.120 entire thing in a lunch break That's an
00:26:45.360 intentional uh plug cuz I know where
00:26:48.640 that's just coming up And um yeah the
00:26:51.600 entire thing's under you know a thousand
00:26:53.279 lines of code which um like to have you
00:26:57.679 know a a framework that allows you to
00:27:00.240 just sort of rip through a whole bunch
00:27:02.400 of Ruby code and come up with really
00:27:05.120 interesting information Um that's pretty
00:27:08.080 nice
00:27:10.039 Um yeah so uh the code for Bookworm is
00:27:15.120 open source Um we would love for more
00:27:17.440 people to play with it Um I think it
00:27:19.840 would actually be really useful for
00:27:21.039 Rails but we don't
00:27:25.000 um we're we're mostly using this for
00:27:27.360 Chef is right now So uh that that would
00:27:30.960 be a good thing for if anyone's looking
00:27:33.600 for a fun project to pick up
00:27:36.520 Um and we also want to get more language
00:27:39.919 support in there So like right now it's
00:27:41.679 just Ruby but like the NodePattern um
00:27:44.480 NodePattern API from what I can tell like
00:27:46.960 because it's just on the uh it's just
00:27:50.320 using the ast gem like that could
00:27:53.600 probably be abstracted to any other
00:27:55.200 language too Um which would be really
00:27:58.960 useful Um there is an unrelated gem
00:28:02.799 called bookworm in
00:28:04.679 rubygems.org There's um this is where
00:28:07.679 namespacing would be great
00:28:10.039 Um and yeah uh we keep the engine uh
00:28:14.399 source synced with GitHub So if you're
00:28:17.360 you if you're using the main branch so
00:28:19.200 are we Um so uh feel free to file any
00:28:22.960 pull requests you find So yeah Um a
00:28:26.559 little bit early so uh that's good for
00:28:28.320 lunch Uh if anyone has any questions um
00:28:32.080 I'll I'll try to wear this this cool
00:28:34.159 shirt And uh yeah thank you very much