Summarized using AI

API for docs

Soutaro Matsumoto • April 18, 2025 • Matsuyama, Ehime, Japan • Talk

In this presentation at RubyKaigi 2025, Soutaro Matsumoto discusses the documentation feature integrated within the Steep tool, which is built for Ruby's type checking via the Ruby type signature (RBS). This feature introduces an innovative approach to how documentation is accessed and interacted with in Ruby code, enhancing developers' productivity and understanding of API components with real-time support.

Key points covered include:
  • Updates on RBS and Steep: The latest versions are RBS 3.9 and Steep 1.10. Recent releases add default types for generic type parameters, comments that ignore selected type errors, and Steepfile DSL changes that may improve type checking performance on CI.

  • Documentation Features in Steep: The Steep tool provides integrated documentation visibility, which appears as hover text within the development environment. This means that users can access class and method documentation directly from their code editor, enhancing the coding experience.

  • Real-time Documentation Interaction: When writing Ruby code, developers can access documentation through hover actions, method completion suggestions, and signature help features. These features are derived from comments added to RBS files, formatted in Markdown for clarity and ease of use.

  • Internal Documentation Structure: Matsumoto explains the internal API structure that supports this documentation feature, noting how RBS files interact with the Language Server Protocol (LSP). This differs from traditional tools like RDoc or YARD that generate static documentation files. Instead, Steep dynamically generates documentation on the fly based on user interactions.

  • Challenges with Overloaded Methods: A significant challenge discussed is identifying and managing overloaded methods within RBS. The current identifier system poses complications, and the presentation proposes using method signatures combined with identifiers to improve clarity and usability.

  • Future Goals: Matsumoto expresses intentions to refine this documentation API further, allowing greater integration with other Ruby documentation tools. The goal is to create a robust index system that can store various documentation types effectively.

In conclusion, the implementation of the documentation feature within Steep marks a significant evolution in Ruby programming, promising to enhance the way developers interact with documentation directly from their coding environments. The prospect of collaborative API across different tools in the Ruby ecosystem invites ongoing development and validation of these approaches.


Date: April 18, 2025
Published: May 27, 2025

Steep provides documentation features integrated with editors. You can read Ruby code with documentations of classes and methods, which hovers near the cursor. You can write Ruby code with completion suggestions with documentation, helping you select the best option from the list.

The feature can be seen as a variant of traditional documentation tools, like RDoc and YARD. However, it is essentially different from other tools: instead of generating human-readable files, it provides an API. The API provides a data structure that allows retrieving the documentation associated with each component of Ruby programs.

In this talk, I will outline the implementation, discuss the requirements, and share design considerations behind the feature.

https://rubykaigi.org/2025/presentations/soutaro.html

RubyKaigi 2025

00:00:07.040 good morning everyone uh I'm Soutaro
00:00:13.440 uh I hope you find me as a on the stage
00:00:18.960 at the last session and yeah today I'll be talking about the API for docs about
00:00:27.599 the documentation that is implemented in RBS and
00:00:34.440 Steep I'm one of the Ruby committers you know I work on RBS it is a type
00:00:41.680 declaration language for Ruby programs and it is also the name of a gem
00:00:47.360 that parses the RBS files and does some work on them I develop Steep it is a static
00:00:56.480 type checker it is built on top of RBS I'm a software engineer at Timee and
00:01:06.400 yeah uh I've started writing a series of articles for a magazine called Software
00:01:11.840 Design it's about the types of Ruby of course so uh you will find it at the
00:01:21.080 bookstore on the second floor
00:01:45.759 okay so yeah uh let's start with the update of RBS and Steep since last year
00:01:51.799 RBS 3.9 and Steep 1.10 are the latest versions
00:01:59.840 uh generic type parameters can have default types in
00:02:05.960 RBS 3.6 uh you can make some types generic without breaking existing type
00:02:12.920 definitions Steep 1.7 allows ignoring some type errors if you cannot find the way
00:02:19.680 to fix that type error you can just ignore it by writing some comments Steep 1.9 updates the Steepfile DSL
00:02:28.480 so it now allows some better project organization and it may improve some uh
00:02:34.239 type checking performance on CI settings by running the uh the type checking of some
00:02:42.160 parts of your project distributed on the
00:02:47.959 computers the resolve-type-names magic comment is introduced in RBS 3.9 it
00:02:53.840 will make the RBS file loading faster you can make types methods and some
00:03:01.920 constants deprecated Steep 1.10 reports some diagnostics if you use those uh
00:03:09.440 deprecated ones in your Ruby code and RBS type definitions uh the fork option is available
00:03:16.800 in Steep 1.10 this is experimental and we hope that it reduces the memory
00:03:23.040 footprint this is by Pasan and another one is the
00:03:30.879 receiver type narrowing it is available in Steep 1.10
00:03:37.360 and yeah this is by Kasan and he had a presentation related to this in the
00:03:43.680 first day so I think you are wondering something
00:03:52.080 is missing uh yeah so inline RBS
00:03:57.519 declaration support is not finished yet I have been working on RBS 4 and Steep
00:04:04.159 2 and I hope these new versions will come with this feature so it means that
00:04:11.439 RBS files uh I'm sorry that RBS gem itself loads the Ruby code with inline
00:04:17.919 type declarations so you don't have to generate the RBS files using the rbs-inline
00:04:23.440 gem but yeah I'm sorry that uh it's not finished yet so you can find some progress
00:04:30.800 related to this on the GitHub repository but yeah it's not available yet
00:04:36.800 so uh these are some of the uh latest updates on RBS and
00:04:43.560 steep so moving on to the main topics the documentation feature of
00:04:49.160 Steep Steep provides uh some features to help programmers to help you
00:04:55.680 reading the documentation associated to the Ruby program components classes modules
00:05:01.160 methods some type definitions in RBS so let's see how these features are
00:05:10.039 working the first one is hover it shows the documentation when
00:05:16.400 you point at something in your source code with your cursor when your cursor is
00:05:22.639 on a class name uh it shows the documentation associated to the class so the screenshot is about the comment
00:05:29.840 block class it is the yeah part of the RBS gem
00:05:34.880 if your cursor is on a method call it shows the documentation of the
00:05:41.960 method completion pops up when you write the name of the method so this example
00:05:49.919 shows that uh when I write map and it shows the list of the methods that contain map map map! flat_map
00:05:57.919 filter_map and you also find that uh the documentation is the documentation of
00:06:05.360 the first uh selection the first item is shown that is the Array#map
00:06:12.759 method signature help is another feature to help you writing the method call
00:06:19.479 arguments it works when you write the arguments of the method call
00:06:26.400 and it shows the possible parameter list and it also shows the some of the method
00:06:32.400 some of the documentation of the method so the source of the
00:06:38.479 documentation displayed in the editor is the comment uh written in RBS files you
00:06:45.680 can put some comment of the class declaration comment on attribute definition comment of method uh you can
00:06:53.600 format them in markdown uh the because the language server protocol natively
00:06:59.039 supports the writing something in markdown so write like
00:07:06.120 this those features are implemented on top of an internal API in Steep
00:07:13.520 RBS files are loaded into some internal data structure and the editors
00:07:20.080 send some requests to show hover to show the completion and things via a protocol
00:07:26.479 called language server protocol the language server it is um implemented in Steep it receives some
00:07:34.080 request from the editor and does something to query the documentation and uh it
00:07:41.120 sends it back to the editor and the editor shows the content to the
00:07:50.680 user this is a bit different from some traditional documentation tools like RDoc and YARD
00:07:57.759 so they generate HTML PDF or some human readable
00:08:03.479 documents but yeah uh what is RDoc I
00:08:08.800 mean the word means some different aspects of the tool we call the syntax of documentation
00:08:17.440 embedded in Ruby and C code RDoc this is one of the most frequently
00:08:24.400 used user interface so you write when you are writing some Ruby code you will
00:08:30.879 you you sometimes write some comments in the RDoc syntax and yeah we also call the output
00:08:40.719 HTML files RDoc this is another user interface so
00:08:46.080 that we use those HTML files to understand the overview of the library
00:08:52.000 or to understand the behavior of a method
00:08:57.279 we also have some internal structure it is the implementation of
00:09:04.440 the RDoc gem it has some classes and modules and it loads the uh some documentation
00:09:11.920 from the Ruby source code and generates some uh output HTML files but this is implementation detail
00:09:20.080 and the end users don't need to understand
00:09:26.440 them RBS documentation system also has similar structure RBS files are the
00:09:33.440 input it has some internal structure the output is a bit different
00:09:39.600 uh we consider the documentation fragments that is sent to the editor is
00:09:45.040 the output the output is generated on the
00:09:50.399 fly that is the triggered by the users interaction on editors the editor
00:09:57.440 sends some request to the server and server generates something so the generation is
00:10:03.000 done just in time and yeah in this presentation uh we
00:10:09.360 focus on the internal structure i don't discuss the concrete documentation syntax i don't talk about the output
00:10:15.839 formats but I'm going to explain the implementation of the internal structure of RBS and Steve today
00:10:24.720 so these are the components that are related to the documentation features AST
00:10:30.560 parser and environment they are part of the RBS gem type checker and displaying the
00:10:36.720 docs are implemented in Steep so let's start with the
00:10:43.959 AST this is um some of the syntaxes to declare the RBS types
00:10:51.360 so um on the left we have class declaration and in the middle we have the
00:10:56.560 method definition so that is the def syntax in RBS so they have a comment attribute it
00:11:04.880 keeps the uh Comment Ruby object and uh we know that a comment is associated to
00:11:13.120 some uh some of the constructs in RBS
00:11:19.640 files the Comment class is very small it is on the right uh essentially it has only
00:11:25.600 one attribute called string so it contains the uh the the content string
00:11:32.480 of the comment the next one is the parser the RBS parser is
00:11:40.720 implemented in C as you may already have learned from the presentation by Alex from Shopify yesterday uh yeah this is um
00:11:50.480 an internal data structure called comment this is uh purely internal
00:11:55.600 data structure it's not exposed to the Ruby library so this is just to keep
00:12:01.279 track of the comment tokens during the parsing of the source
00:12:06.440 code it has start and end position and it keeps the list of the comment
00:12:13.839 tokens so this is one comment uh value uh one
00:12:19.360 comment value may have several comment tokens in this case there are five
00:12:28.680 tokens and the last one is next_comment this is a linked list so the next
00:12:35.880 comment points to another comment in the source code
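The comment record described here can be modeled in Ruby as a rough sketch. The real structure is internal to the parser and not exposed to Ruby, so the names `CommentToken`, `ParserComment`, and `collect_comments` below are hypothetical and only illustrate the shape: a start and end line, a list of tokens, and a link to the next comment.

```ruby
# Hypothetical Ruby model of the parser-internal comment record.
# The real structure is not exposed to Ruby; the names are illustrative only.
CommentToken = Struct.new(:line, :text)

ParserComment = Struct.new(:start_line, :end_line, :tokens, :next_comment) do
  # Joins the token texts, dropping the leading "#" of each line.
  def string
    tokens.map { |token| token.text.sub(/\A#\s?/, "") }.join("\n")
  end
end

# Groups consecutive comment lines into one comment and links the comments
# together, adding each new comment at the head of the list.
def collect_comments(source)
  head = nil
  source.each_line.with_index(1) do |line, lineno|
    text = line.strip
    next unless text.start_with?("#")

    token = CommentToken.new(lineno, text)
    if head && head.end_line == lineno - 1
      # The line directly follows the current comment, so extend it.
      head.tokens << token
      head.end_line = lineno
    else
      # Start a new comment that points to the previous one.
      head = ParserComment.new(lineno, lineno, [token], head)
    end
  end
  head
end
```

Walking `next_comment` from the returned head visits every comment found so far, which is all a later lookup by line number needs.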
00:12:42.000 the parsing implementation is something like this so this is a function that
00:12:48.320 parses the method definition syntax called the parse method def
00:12:53.639 function the get comment function here retrieves a comment in the source code
00:13:00.959 so the comment we want is identified by the line number of the current token
00:13:08.399 what is the current token here it is the def
00:13:13.800 token so the get function uh I'm sorry the get comment function finds a comment
00:13:21.279 ending just above the current line so this is the
00:13:28.600 comment the get comment function constructs a Comment Ruby object and
00:13:33.839 returns it so finally the value of the comment local variable is a Comment object or it may be nil if there is no
00:13:42.440 comment and then uh finally it is passed to a function that constructs the AST
00:13:50.000 object in this case it is the RBS::AST::Members::MethodDefinition
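The lookup described above can be sketched in plain Ruby. This is a simplified illustration and not the parser's actual code: it assumes the comment is matched when it ends on the line directly above the def token, and `Comment`, `MethodDefinition`, `scan_comments`, `get_comment`, and `parse_method_defs` are stand-in names. The sample RBS text also shows a Markdown-formatted doc comment of the kind mentioned earlier in the talk.

```ruby
# Hypothetical stand-ins for the comment record and the AST node.
Comment = Struct.new(:string, :start_line, :end_line)
MethodDefinition = Struct.new(:name, :line, :comment)

# Collects runs of consecutive "#" lines as comments.
def scan_comments(source)
  comments = []
  source.each_line.with_index(1) do |line, lineno|
    text = line.strip
    next unless text.start_with?("#")

    body = text.sub(/\A#\s?/, "")
    last = comments.last
    if last && last.end_line == lineno - 1
      last.string << "\n" << body
      last.end_line = lineno
    else
      comments << Comment.new(+body, lineno, lineno)
    end
  end
  comments
end

# Returns the comment that ends on the line right above `subject_line`,
# or nil when there is none.
def get_comment(comments, subject_line)
  comments.find { |comment| comment.end_line == subject_line - 1 }
end

# Builds one MethodDefinition per `def` line and attaches its comment.
def parse_method_defs(source)
  comments = scan_comments(source)
  defs = []
  source.each_line.with_index(1) do |line, lineno|
    match = line.match(/\A\s*def\s+([\w?!]+)\s*:/)
    next unless match

    defs << MethodDefinition.new(match[1].to_sym, lineno, get_comment(comments, lineno))
  end
  defs
end

RBS_SOURCE = <<~RBS
  class Greeter
    # Returns a greeting for `name`.
    #
    # The text is **Markdown**.
    def greet: (String name) -> String

    def reset: () -> void
  end
RBS

parse_method_defs(RBS_SOURCE).each do |method_def|
  puts "#{method_def.name}: #{method_def.comment ? method_def.comment.string.inspect : "no comment"}"
end
```

Here `greet` receives the three-line comment because it ends on the line above the def, while `reset` has no comment because a blank line precedes it.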
00:13:55.880 so this is how the parser attaches the comments to the AST
00:14:04.360 objects we have AST objects now and they have some associated comments on each
00:14:11.600 class declaration method declaration attribute declaration so let's see how
00:14:17.279 can we find the comment content using some class names or class name and
00:14:24.079 method name environment is a set of RBS type definitions and RBS gem parses the RBS
00:14:32.880 files and puts them into some environment
00:14:38.360 object we convert the set of RBS declarations into a data structure
00:14:43.920 called definition so uh definition has methods variables
00:14:49.199 some some ancestors the uh ancestor relations between class and
00:14:54.680 modules so this line calls a build_instance method it constructs a definition object for instance of the
00:15:01.440 String class uh there's a methods method that keeps the method object for each
00:15:08.920 method too many methods and the comments method returns an array of
00:15:15.839 comment objects we have type checker in Steep uh
00:15:24.240 type checker scans all of the Ruby programs and uh assigns a
00:15:31.600 type to each node the constant is String and the type
00:15:37.279 checker knows which class the constant is associated with it is um String class
00:15:44.959 so uh it looks up where the
00:15:50.240 String class is declared and finds the comment associated to the declaration it's almost the same with
00:15:57.360 the method calls the type checker detects which method overloads may be called on the method call so in this
00:16:05.320 case this one is the overload and again
00:16:11.120 we find the comment that is associated to the method
00:16:16.440 definition the last part is the communication with the language server protocol uh this is an example of hover
00:16:23.839 request it has a file name and the the cursor position line number and
00:16:30.000 character index the language server finds which node is located at the
00:16:37.040 position and it fetches some documentation that is related to the node and it
00:16:42.480 returns the JSON object that contains the formatted documentation in markdown
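The hover exchange can be illustrated with a small Ruby sketch. The message shapes follow the Language Server Protocol `textDocument/hover` request and its Markdown `contents` result. The file URI, the position, the documentation text, and the `DOCS` lookup table are made-up values, and `handle_hover` is a hypothetical stand-in for the step where the server finds the node at the position.

```ruby
require "json"

# A hover request as an editor might send it. Positions are zero-based.
HOVER_REQUEST = {
  "jsonrpc" => "2.0",
  "id" => 1,
  "method" => "textDocument/hover",
  "params" => {
    "textDocument" => { "uri" => "file:///path/to/app.rb" },
    "position" => { "line" => 2, "character" => 10 }
  }
}.freeze

# Hypothetical documentation table keyed by [uri, line, character].
# A real server would locate the AST node at the position instead.
DOCS = {
  ["file:///path/to/app.rb", 2, 10] => "Returns a **new** array with the results of the block."
}.freeze

# Builds the JSON-RPC response for a hover request.
# The result is null when no documentation is found.
def handle_hover(request, docs)
  params = request.fetch("params")
  position = params.fetch("position")
  key = [
    params.fetch("textDocument").fetch("uri"),
    position.fetch("line"),
    position.fetch("character")
  ]
  doc = docs[key]
  result = doc && { "contents" => { "kind" => "markdown", "value" => doc } }
  { "jsonrpc" => "2.0", "id" => request.fetch("id"), "result" => result }
end

puts JSON.pretty_generate(handle_hover(HOVER_REQUEST, DOCS))
```

The editor only sees the Markdown string in `contents`, so how the server finds that string is free to change without affecting the protocol.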
00:16:48.240 so this is an overview of the implementation of RBS and steep and yeah
00:16:53.839 it will help you understand what is happening under the hood and yeah it has a
00:17:02.920 problem Steep starts type checking when you edit comments when you only change
00:17:09.839 the comments in RBS files so this is a demonstration that
00:17:15.360 shows me editing some comments on the absolute? method and you
00:17:21.439 will find the at the bottom bar and it tells that Steep is type checking the
00:17:26.480 source code so this is weird so because this is just a comment we don't add any
00:17:34.320 type definition any type uh any new class or module no method is added we are
00:17:40.720 not changing the type of methods it just updates the docs so nothing will be changed on the type
00:17:47.440 checking result and yeah why we need to run the type check
00:17:56.120 again uh the for the type of constants it's fine
00:18:04.080 the type of the constant is a singleton of string and we can easily extract a
00:18:10.320 key to identify which document we we want we need to show it is a class name
00:18:16.320 the string is a class and we can look up the documentation from the RBS files
00:18:25.120 the problem is on the method call unlike the case of constant uh it
00:18:31.840 currently directly points to some internal overloading data
00:18:37.640 structure and the internal data structure can be resolved to the
00:18:42.960 comments this is the same so the problem is here when RBS file is changed the
00:18:48.880 internal overloading data is updated because it might change some type checking
00:18:53.919 result and uh we have to run the type check again so yeah I said the the feature is
00:19:03.120 something like an API in the previous slides but when we see the
00:19:08.400 implementation closely uh we found there is no clear boundary between the doc
00:19:13.919 system and the type checker we may solve the problem by
00:19:20.320 introducing some clear API between the doc system and the type checker so we
00:19:27.039 would call that API index so when Steep loads RBS files it also updates index
00:19:34.960 registers some documentations to the index if only comments are changed it updates the index but it doesn't need to
00:19:42.080 run the type checker the language server pulls the documentation from the index using some
00:19:48.520 identifier so it doesn't use some internal data structure but the uh the
00:19:54.320 communication between the index and the type checker is done through some
00:20:01.080 identifier another benefit of using identifier is that index may be reused
00:20:06.799 after the server restart if we can make the identifiers serializable like just a
00:20:13.919 string or something then we can save the index and the identifier to the disk or
00:20:20.400 maybe load it from uh the disk later or maybe we can distribute the index uh via
00:20:28.480 the internet so the index will be like this
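A minimal sketch of such an index, assuming plain string identifiers and string documentation. The class name `DocIndex` and its `register`, `query`, `dump`, and `load` methods are hypothetical and do not come from Steep or RBS. The JSON round trip illustrates why serializable identifiers matter: the index can be written out and read back after a restart.

```ruby
require "json"

# Hypothetical documentation index keyed by serializable identifiers.
class DocIndex
  def initialize(docs = {})
    @docs = docs
  end

  # Registers or replaces the documentation for an identifier.
  def register(identifier, doc)
    @docs[identifier] = doc
    self
  end

  # Returns the documentation for an identifier, or nil when it is unknown.
  def query(identifier)
    @docs[identifier]
  end

  # Serializes the whole index so it can be saved to disk or distributed.
  def dump
    JSON.generate(@docs)
  end

  # Rebuilds an index from its serialized form.
  def self.load(json)
    new(JSON.parse(json))
  end
end

index = DocIndex.new
index.register("::String", "A String object holds a sequence of bytes.")

# Simulates a server restart: the index is rebuilt from its serialized form.
restored = DocIndex.load(index.dump)
puts restored.query("::String")
```

With class names as keys, a comment-only edit just calls `register` again for the affected identifier and no type checking is involved.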
00:20:33.840 so we have for example the class documentation and method documentation we have the register and
00:20:40.240 query methods the docs of the class can be registered to the index the name of
00:20:45.919 class is the identifier so this is very
00:20:53.080 simple the docs of a method can also be registered to the index but the problem
00:20:59.039 is what is the identifier of a method so you may be thinking of String#encoding
00:21:05.760 or the Array.new so these notations are standard in the Ruby
00:21:11.400 community so yeah it it works fine for some
00:21:17.679 methods like String#encoding because it only has one
00:21:24.280 overload but how about the Array.new of the Array class there are three overloads
00:21:32.320 in the RBS declaration so it's not clear which overload is pointed to with that
00:21:38.600 notation but in fact this example is okay because there is only one
00:21:44.080 def syntax here so all of the overloads share one documentation
00:21:49.880 content the problem is that RBS allows multiple def syntaxes for
00:21:56.679 overloads this is an example the f method has two def syntaxes on the right
00:22:04.000 uh the first f call points to the definition in a.rbs because the argument
00:22:10.880 is a string and the second f call points to the definition in b.rbs because it's
00:22:17.440 called with an integer Steep currently handles the two
00:22:23.200 overloads correctly so we need something to identify the overload for
00:22:29.039 the index API the hash method name or dot method name notation is insufficient
00:22:35.200 for this so how we can find the good
00:22:41.799 identifier the how about using the file name and the line number so we have two definitions and
00:22:49.760 the first one is in a.rbs the second one is in b.rbs it's
00:22:55.880 good but the problem is that the line number may be changed when we edit some
00:23:01.520 comments so we may add some new line uh in the comments and it will change the line
00:23:08.400 number of the def syntax so we have the
00:23:13.600 uh the second one is b.rbs the fourth line and it doesn't have the def
00:23:19.440 syntax another idea is simply using the
00:23:25.919 index of overload we have some list of overloads internally so we can call them by index overload zero overload one
00:23:33.520 overload two so adding a number after the method name the first one it is overload index
00:23:42.799 zero it points to the integer case because the overload added with the
00:23:49.760 triple dot keyword comes first so the index one is the string
00:23:57.240 case and this doesn't work well again so because the overloading order depends on
00:24:03.679 the file loading order and the file loading order is unspecified in RBS so when a.rbs is loaded before b.rbs
00:24:12.080 the integer overload comes first if b.rbs is loaded before a.rbs the
00:24:21.200 string overload is the first I mean that the overload index is
00:24:29.120 not very stable it may change after server restart or if we if I change some
00:24:35.840 of the implementation of RBS gem so it will be broken another problem is that
00:24:42.000 it's not human readable it's not very clear which overload it is pointing
00:24:48.200 to so finally I think that using the method types directly is the solution
00:24:54.720 we can add some method types after the method name and it will solve the
00:24:59.760 problem so it's very clear which one is the which def syntax it's yeah it's very
00:25:08.039 straightforward but uh when the type of method is changed uh it means that the
00:25:13.679 identifiers should be invalidated it means that we should run the type checker again but this is fine
00:25:19.760 because we anyway need to run the type checker for the up-to-date type checking
00:25:28.440 results this is similar to method descriptor in JVM so JVM uses some
00:25:34.559 internal names qualified with parameter types and return type so this example is a uh hello
00:25:42.640 method definition in Java at the bottom the top is the internal uh JVM
00:25:49.760 method descriptor so it has uh type of
00:25:55.640 parameters in a blue and green part and the orange part is the return type this
00:26:01.840 is almost the same to what we will do but instead of using some cryptic but
00:26:08.000 compact type name notation we can simply use the RBS method type
00:26:14.600 notation uh we need some some normalization because the RBS
00:26:20.960 method types may have some redundancy uh we should drop the parameter names uh
00:26:29.120 it's just some documentation and it doesn't change the type checking result the keyword argument ordering
00:26:37.200 would cause some confusion too so we should sort the keyword arguments and yeah
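The normalization described here can be sketched on a simplified model of a method type. This is an illustration only: `MethodType`, `normalize`, and `overload_identifier` are hypothetical names, the model covers just required positional and keyword parameters, and the identifier format (class name, `#`, method name, then the normalized type) is an assumption, since the talk does not fix a concrete format.

```ruby
# Hypothetical, simplified model of one overload.
# positionals: array of [type, parameter name or nil]
# keywords:    hash of keyword name => type
MethodType = Struct.new(:positionals, :keywords, :return_type, keyword_init: true)

# Drops parameter names and sorts keyword arguments by name, so that two
# spellings of the same overload produce the same string.
def normalize(method_type)
  positional = method_type.positionals.map { |type, _name| type }
  keyword = method_type.keywords
                       .sort_by { |name, _type| name.to_s }
                       .map { |name, type| "#{name}: #{type}" }
  "(#{(positional + keyword).join(", ")}) -> #{method_type.return_type}"
end

# Combines the class name, the method name, and the normalized method type.
def overload_identifier(class_name, method_name, method_type)
  [class_name, "#", method_name, normalize(method_type)].join
end

with_names = MethodType.new(
  positionals: [["String", "text"]],
  keywords: { width: "Integer", align: "Symbol" },
  return_type: "String"
)
reordered = MethodType.new(
  positionals: [["String", nil]],
  keywords: { align: "Symbol", width: "Integer" },
  return_type: "String"
)

puts overload_identifier("::Formatter", :f, with_names)
puts overload_identifier("::Formatter", :f, with_names) ==
     overload_identifier("::Formatter", :f, reordered)
```

Because the identifier is built only from the type itself, it does not depend on the file loading order or on line numbers, which is the property the overload index and the file and line pair lacked.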
00:26:42.960 it will be fine so the normalized method types can be used to identify the overload of
00:26:49.360 methods uh it is a string so we can save to file or load that from files or we
00:26:57.120 can compare them as a string it will be super easy it's almost implementation agnostic and it will be stable after
00:27:04.880 server restarts after some uh gem updates also so using this identifier we can
00:27:12.559 separate the doc system from the type checkers so the two components can be
00:27:18.559 connected by the index API Okay uh let's see the constant example again so the
00:27:24.640 type of constant is a singleton string and the key to fetch the documentation
00:27:29.760 is a class name of string the method call is the same so the type checker detects the overload
00:27:36.480 that will be called and we now have an identifier of uh method call that
00:27:43.679 identifier of method overload and the documentation can be pulled from the
00:27:48.720 index using the identifier so yeah so okay so everything is fine and we can
00:27:56.240 start the implementation and yeah it's not finished again so I believe it works
00:28:02.320 but yeah it's not confirmed yet by
00:28:08.600 implementation another topic we can discuss is whether the index can be used by other gems like RDoc or YARD so it
00:28:16.240 will be really great if RDoc can register the documentation to the index
00:28:22.480 and Steep and maybe Ruby LSP can use the index to show the
00:28:30.760 documentation that is written in RDoc
00:28:36.279 directly but I'm not sure if this makes sense or not so we discussed the key of
00:28:41.840 the index and I think we we solved the problem but we
00:28:49.039 haven't discussed the content of the index so RDoc YARD RBS they have different documentation content so the problem is
00:28:56.880 what the
00:29:02.000 content format should be to cover all of them
00:29:07.360 another possibility is using the index to store the definition locations or
00:29:12.720 reference locations so you can store the location where a class or method is defined in the codebase or the opposite where the
00:29:21.200 class or method is used in your Ruby program and yeah then uh this index can
00:29:29.120 be used to implement the go to definition or go to reference features
00:29:37.080 so okay so we see the overview of the documentation feature of Steep and RBS
00:29:43.919 uh I also explained some of the uh some of the
00:29:51.720 implementation and discussed some problems and I also share the idea to
00:29:58.320 implement the documentation index the key is the design of the method overload
00:30:03.720 identifier and I hope the API might be used by other tools too so yeah that's
00:30:11.520 it so thank you for your attention