00:00:04.480
Hello everybody Welcome to Embracing Ruby
00:00:09.519
Magic statically analyzing DSLs My name is Vini I'm a member of
00:00:15.920
the Ruby developer experience team at Shopify We work with all sorts of
00:00:21.760
developer tooling One aspect that we work with a lot is static analysis and
00:00:27.359
type checking which is what I'm going to be talking to you about today in the context of the Ruby
00:00:34.120
LSP If you want to know a little bit more about me and my work you can find me at vinistock And if you would like
00:00:41.040
to know more about the work that our team does I highly recommend you check out the talks from my colleagues
00:00:48.399
Alex Momchilov is going to be talking about how we migrated the RBS parser to be made in pure C with no dependencies
00:00:57.039
regarding the internals of Ruby and the VM And then Alex Terrasa is going to
00:01:02.960
be talking about the RBS support that we are pushing forward in Sorbet via
00:01:08.880
comment annotations And if you would like to work with us we have positions open you
00:01:16.479
can scan the QR code and land on the link where you can apply or just come
00:01:21.759
talk to any of us at the conference at the afterparty
00:01:28.119
whenever So to start we need to first define what static analysis is and what DSLs are And in the context of Ruby the
00:01:36.560
best way that I can put static analysis in a single sentence is that it is a technique to try to predict what will
00:01:42.880
happen in a program without actually executing it There are several tools that are
00:01:50.159
enabled by static analysis many of them in an editor context like every single
00:01:55.759
LSP feature every single language server feature go to definition showing
00:02:00.799
information on hover or showing completion suggestions But it's not limited to the editor You can
00:02:08.239
also have static analysis related features that exist in other contexts
00:02:14.000
For example we have at Shopify a system that can detect dead code and then open
00:02:19.840
PRs automatically to remove it But you can also have other things such as codebase visualizations that will
00:02:26.959
allow you to see your codebase at a higher level Before we move on to defining DSLs
00:02:34.879
I also want to mention why we would try to predict what the code's going to do instead of just executing the code
00:02:41.519
and then actually seeing what is happening There are a few reasons for that but the most critical ones in my
00:02:48.480
opinion are the fact that executing code may have side effects So tooling that we
00:02:54.239
create for users is always running on code that we don't control And so if the
00:02:59.280
user is working on something that will perform destructive actions like removing files from their file system or
00:03:05.120
migrating a database and dropping tables or anything of that sort then we need to be very careful about the code that
00:03:11.280
we're going to execute because it may produce a side effect that they are not expecting But also executing code is
00:03:18.800
much slower than just analyzing it statically Even if your code is very
00:03:23.840
fast for the context where it executes in production for the constraints
00:03:28.879
that the editor imposes on tooling it may not be fast
00:03:36.000
enough So if you have a type checking cycle or an analysis cycle that's taking a few hundred milliseconds and the
00:03:43.599
user is typing really fast and changing files all over that may already cause some lag and lack of responsiveness
00:03:52.959
So what are DSLs DSL stands for domain specific language And the way that I
00:03:58.000
like to think about it is that it is a way for us to extend Ruby itself to change the way that Ruby works It allows
00:04:05.680
us to do things that Ruby has dedicated syntax for in other ways using
00:04:11.840
different types of syntax For example in this example coming from Rails
00:04:17.120
belongs_to is just a method call We are only invoking belongs_to in the Post
00:04:24.560
class and that's it But we know that from invoking belongs_to we are going to
00:04:29.600
have new methods defined in our Post objects even though we're not actually
00:04:34.639
manually defining them with the def keyword So that's why I feel like it's a way of extending the language a way of
00:04:39.840
doing the same things that Ruby allows us to do but differently
00:04:45.199
One way that we could implement belongs_to a super simplistic version of the implementation is by using class_eval
00:04:52.880
which evaluates any arbitrary string of Ruby code and we can use that string of Ruby code to then define the methods for
00:05:00.320
the reader of the association and the writer of the association based on the entity name that we are associating to
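A minimal sketch of the simplistic class_eval approach described here. The Model base class is a hypothetical stand-in, not Rails itself, and the generated method bodies are intentionally left empty, since the point is only that calling belongs_to adds new methods to the class.

```ruby
# Hypothetical stand-in for an ActiveRecord-style base class (not Rails).
class Model
  # Declaration DSL: evaluating a string of Ruby code defines new methods
  # on whichever class invokes belongs_to.
  def self.belongs_to(name)
    class_eval <<~RUBY, __FILE__, __LINE__ + 1
      def #{name}
        # association lookup logic would go here
      end

      def #{name}=(value)
        # association assignment logic would go here
      end
    RUBY
  end
end

class Post < Model
  belongs_to :user
end

# No def keyword was written for these in Post, yet both methods exist.
defined_methods = Post.instance_methods(false).sort
```

After the class body runs, defined_methods holds the reader and writer names that belongs_to generated.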
00:05:08.160
And of course the bodies of the methods here are empty we would need to actually implement the logic but it's just for illustration
00:05:15.400
purposes The key to the DSL component here is the metaprogramming that we get from class_eval
00:05:22.440
Invoking belongs_to will mutate the declarations that are available in our codebase It will change which methods
00:05:29.440
are available in the program We're going to call this type of DSL a declaration
00:05:34.919
DSL exactly because it will change the declarations that are available in the program And to contrast with this
00:05:42.479
here is another type of DSL another example coming from Rails
00:05:47.520
Validate takes the name of a method as a symbol and you register those
00:05:53.280
validations at the class level of your models and Rails will automatically invoke those validations for you on
00:06:00.720
objects of your model on objects of the Post class The way that we could
00:06:08.000
implement validate is to have the method store the validations that you register in some sort of class level
00:06:14.960
state and then whenever it is appropriate like when you save a new record for
00:06:20.240
Post we can then grab those validations that we registered and execute each one of
00:06:25.360
them with dynamic dispatch by using send and that will give the developer
00:06:30.880
an opportunity inside the validation definition to add errors to the
00:06:36.560
Post object or do whatever is appropriate and then the framework can decide what happens if there are errors
00:06:42.720
in the Post object that we're trying to save So we can raise return false or do whatever is
00:06:49.639
appropriate In this example the key to the DSL component is the dynamic
00:06:55.759
dispatch the fact that we are sending a method without initially knowing which one we're going to be invoking And
00:07:02.479
in this case we are not changing which declarations are available in the program but we are overloading the
00:07:08.960
meaning of a symbol From Ruby's perspective the symbol is just a little piece of data like a number or a string
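A minimal sketch of the validate implementation described above, assuming a hypothetical Model base class (not Rails) and the all_is_well validation name used later in the talk. The symbol is stored as plain data in class-level state and only gains meaning when save dispatches it with send.

```ruby
# Hypothetical stand-in for an ActiveRecord-style base class (not Rails).
class Model
  # Class-level state holding the registered validation method names.
  def self.validations
    @validations ||= []
  end

  # Call site DSL: no new declarations are created, the symbol is only stored.
  def self.validate(method_name)
    validations << method_name
  end

  def errors
    @errors ||= []
  end

  # Runs every registered validation through dynamic dispatch, then lets
  # the "framework" decide what to do: here it simply returns false on errors.
  def save
    errors.clear
    self.class.validations.each { |method_name| send(method_name) }
    errors.empty?
  end
end

class Post < Model
  attr_accessor :title

  validate :all_is_well

  def all_is_well
    errors << "title can't be blank" if title.nil? || title.empty?
  end
end

post = Post.new
first_attempt = post.save   # => false, the validation added an error
post.title = "Hello"
second_attempt = post.save  # => true
```

Nothing in the source text connects :all_is_well to the all_is_well method; the link only exists at runtime, which is what makes this kind of DSL hard to follow statically.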
00:07:16.479
But in this particular example this particular Rails API is overloading the meaning of the symbol to be method
00:07:23.759
names that are available in objects of the Post class We're going to call this one a
00:07:30.240
call site DSL And the distinction matters only because the Ruby LSP gives us different APIs to handle each one of the types of
00:07:37.759
DSLs So what is difficult about analyzing
00:07:43.360
these types of DSLs statically There are multiple reasons but I want to mention at least two here The first
00:07:51.039
one is the influence of runtime values and control flow in our code So if we
00:07:57.199
take another example here where we are defining a method dynamically with uh the name of the method being controlled
00:08:03.520
by a variable and then we are doing that conditionally on some other runtime value being equal to five Well then
00:08:11.759
it's very difficult for a type checker to be able to figure out what will happen in this case We not only do not
00:08:17.039
know the name of the method that will be created but we don't know when it's going to be created And this runtime
00:08:22.400
value can depend on external things like a database query or user input or some
00:08:28.080
sort of information in the file system So there's very little hope of being able to analyze this accurately
00:08:34.880
without extra annotations And in addition to that I
00:08:41.760
want to also remind everyone that we're always under the performance constraints of the tooling or the context where the tooling
00:08:48.160
is going to be running like the editor So even in cases where you can argue
00:08:53.279
that the piece of Ruby code we're analyzing can be fully understood statically it may simply not be worth
00:09:00.080
doing so So for example in this case you could argue that this piece of code can be analyzed statically in its
00:09:07.080
entirety You have create_method which always invokes define_method not
00:09:14.240
conditionally It's always guaranteed to happen And we know that the first required parameter is always used as the
00:09:20.480
method name without any modifications And then in create_hello we know that we're always going to invoke create_method
00:09:26.480
with a string literal So in theory you have all of the information available here to analyze this piece of
00:09:32.640
code statically But the cost of doing so may simply not be worth it You need to remember all of those bits of
00:09:40.720
information so that the type checker can actually understand what's going to happen on each step and even if you
00:09:47.440
can understand it the method hello is only going to be available after you invoke create_hello So there's a
00:09:53.519
lot of complexity that makes the performance cost grow The reality is that type checking
00:10:00.240
and statically analyzing Ruby is a constant tension between accuracy and
00:10:05.920
performance Even if we could create an analysis that's super accurate if
00:10:11.680
it takes 15 or 20 minutes to run a single cycle of analysis then the usefulness
00:10:17.600
of it becomes very small We need the tools to be performant So we're always
00:10:23.040
trying to find the balance between the usefulness that we can provide the accuracy that we can provide and the
00:10:28.800
performance cost So now let's move on to the Ruby LSP and
00:10:34.560
how we added new APIs to analyze these DSLs If you don't know the project
00:10:41.440
the Ruby LSP is actually two different things It is a language server for Ruby
00:10:46.800
that adds extra features in any editor that go beyond just the basic editing of
00:10:53.279
the code And it is also a VS Code extension that adds features that are
00:10:58.399
specific to VS Code So here I have a few examples The language server can show you
00:11:04.240
documentation on hover It can show you completion suggestions It can also allow you to
00:11:11.279
jump to definition on a constant method or variable And then the examples of VS
00:11:18.480
Code specific features I have here are our implementation of the dependency explorer that allows you to navigate all
00:11:24.959
of the code defined in the dependencies of your project And we are about to
00:11:30.000
release our reimplementation of the test explorer that allows you to navigate all
00:11:36.240
of the tests defined in your application and then you can run them in
00:11:41.279
four different modes of execution The one that you just saw in the video is the simple mode just run You can run
00:11:48.800
them in the terminal so that you can interact with the commands that we just used to execute your tests You can
00:11:54.720
launch an interactive debug session directly from the UI and you can run
00:12:00.399
them in coverage mode to see the results of coverage directly inside VS
00:12:07.560
Code We wanted the Ruby LSP to have static analysis that is as accurate as possible
00:12:13.680
and to understand DSLs like the ones that we just saw for Rails But the reality is
00:12:20.880
that creating new DSLs is something that's super common in the Ruby ecosystem And we have so many different
00:12:26.480
gems that do it We couldn't possibly try to handle every single type of DSL within the Ruby LSP itself It
00:12:34.079
would be way too complex So we wanted a way for the gems themselves to be able to teach the Ruby
00:12:40.639
LSP how to understand the DSLs that they define And this is done through our add-on API
00:12:48.880
which is an API for other gems to enhance the base features that we provide from the Ruby LSP language
00:12:55.399
server And the idea is for tool and framework specific features and integrations to be added to the editor
00:13:02.720
So the Ruby LSP provides the base features for Ruby for any type of Ruby code and then DSL specific things and
00:13:10.079
integrations that are related to other frameworks are all done via
00:13:15.160
add-ons The basic structure of an add-on is this add-on entry point class and
00:13:21.440
it has four mandatory methods that we ask to be implemented Activate is the
00:13:27.440
first one It is invoked as the language server is launching and it is
00:13:32.959
an opportunity for the add-on to start any sort of background process handle editor settings or anything that may
00:13:39.760
make sense during initialization We ask that they define the name method simply for
00:13:46.800
presentation purposes Very recently we started allowing
00:13:52.240
add-ons to export public APIs and to have one add-on depend on another add-on
00:13:58.880
And so in order to make that a little bit smoother we started requiring that the version be defined so that we can
00:14:05.040
then impose constraints on the add-ons And the counterpart to
00:14:11.040
activate is deactivate When we are shutting down the language server then we give add-ons the opportunity to
00:14:17.360
perform any sort of cleanup like shutting down child processes or anything that might make
00:14:23.800
sense Before we move on to the specific APIs for DSL handling we just need to take a look at how the Ruby LSP
00:14:31.360
implements every one of its features the architecture And it is essentially a
00:14:36.480
combination of two different patterns the visitor pattern and the observer pattern We essentially have this
00:14:43.519
entity called a dispatcher which traverses the AST and as it is
00:14:50.839
traversing it will emit events for all of the nodes that we discover in a piece
00:14:56.720
of Ruby code and then other objects the listeners can then subscribe to the
00:15:02.320
nodes that they are interested in handling and they will be notified whenever we find those types of nodes in
00:15:08.959
the AST So here's an example of a super simplistic listener implementation We
00:15:16.000
initialize it with the response builder This response builder concept is what
00:15:21.760
we use to collect contributions that are coming from multiple different listeners that will compose the final
00:15:29.120
response that we return to the editor And then with the dispatcher with that entity I mentioned we can register for
00:15:36.000
which events we are interested in handling So in this case we are registering the object itself the
00:15:42.880
first argument there self and we're interested in handling method definition nodes So whenever we find a new method
00:15:49.199
definition in the code And the last step is to define what happens when we find one and for this example we're just
00:15:56.720
pushing the node into the response builder The way that we can use this listener is
00:16:03.040
first we define the two top level entities that control the composing of the response and the execution flow So
00:16:11.199
in this case we're just going to be using an array for the response builder In the actual implementation it
00:16:17.519
is more complex than that And for the dispatcher this is an object that we upstreamed a while ago at this point
00:16:24.560
to Prism You can use it for whatever purposes directly from the gem and it
00:16:31.040
gives you that ability to do the visiting of the AST with the observer
00:16:36.240
functionality attached as well We can then attach our listener to
00:16:42.560
those two entities so that it registers for the events and has a handle for the response builder We parse any string of
00:16:50.959
Ruby code that we are interested in analyzing to get the AST back And then when we visit using the dispatcher we
00:16:58.399
are going to traverse the AST events are going to be fired which then invoke logic in the listener and by
00:17:06.559
the end of the traversal the response builder will have been mutated in this
00:17:11.679
case with the definitions that we found for methods This architecture allows us
00:17:19.199
to have pretty good separation of concerns All of the features are implemented as separate listeners that
00:17:25.919
have completely independent state and logic But we also get really good performance because we can attach all of
00:17:32.559
the listeners to the same round of traversal and then we don't have to go
00:17:37.600
through the same AST two three or four times depending on the number of features We only traverse it
00:17:44.640
once and capture all of the responses for every single feature
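The dispatcher and listener arrangement can be illustrated with a small self-contained sketch. Everything here is a toy stand-in written for illustration: ToyDispatcher, the Struct-based nodes and both listeners are hypothetical and are not the Prism or Ruby LSP classes, although the event naming is modeled on the on_def_node_enter style of event that the talk's listener subscribes to. Plain arrays play the role of the response builders.

```ruby
# Toy AST nodes (stand-ins; a real tree would come from a parser).
DefNode = Struct.new(:name, :children)
ClassNode = Struct.new(:name, :children)

# Visitor + observer: walks the tree once and notifies registered listeners.
class ToyDispatcher
  def initialize
    @listeners = Hash.new { |hash, event| hash[event] = [] }
  end

  # Subscribe a listener to one or more event names.
  def register(listener, *events)
    events.each { |event| @listeners[event] << listener }
  end

  # Emit the event for this node, then keep traversing its children.
  def dispatch(node)
    event = event_name(node)
    @listeners[event].each { |listener| listener.public_send(event, node) }
    node.children.each { |child| dispatch(child) }
  end

  private

  # DefNode becomes :on_def_node_enter, ClassNode becomes :on_class_node_enter.
  def event_name(node)
    base = node.class.name.split("::").last
    snake = base.gsub(/([a-z])([A-Z])/, '\1_\2').downcase
    :"on_#{snake}_enter"
  end
end

# Each feature is its own listener with independent state and logic.
class MethodDefinitionListener
  def initialize(response_builder, dispatcher)
    @response_builder = response_builder
    dispatcher.register(self, :on_def_node_enter)
  end

  def on_def_node_enter(node)
    @response_builder << node.name
  end
end

class ClassListener
  def initialize(response_builder, dispatcher)
    @response_builder = response_builder
    dispatcher.register(self, :on_class_node_enter)
  end

  def on_class_node_enter(node)
    @response_builder << node.name
  end
end

methods_found = []
classes_found = []
dispatcher = ToyDispatcher.new
MethodDefinitionListener.new(methods_found, dispatcher)
ClassListener.new(classes_found, dispatcher)

ast = ClassNode.new("Post", [DefNode.new("title", []), DefNode.new("save", [])])
dispatcher.dispatch(ast) # a single traversal serves both listeners
```

Both arrays are filled by one call to dispatch, which is the performance property described above: adding a feature adds a listener, not another pass over the tree.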
00:17:50.000
Okay so now we are ready to start taking a look at how to handle DSLs and the new APIs that we are exposing The first one
00:17:56.799
is going to be declaration DSLs We call this API indexing enhancements Indexing
00:18:03.840
is the process of discovering every single declaration in a codebase Since
00:18:10.000
this type of DSL mutates which declarations are available then it is contributing to the indexing process of
00:18:17.120
the Ruby LSP The API has two dimensions You define an
00:18:22.160
enhancement by inheriting from our base class and then you can handle method
00:18:27.360
call events both the enter and the leave Enter and leave is just so that you can accurately represent what will
00:18:35.440
happen if you for example enter a new namespace and leave that namespace using a
00:18:41.080
DSL And the other dimension of the API is that every enhancement has access
00:18:47.039
to our indexing listener which then exposes the methods that you can actually use to inform the LSP about
00:18:54.799
dynamically defined declarations So add_method will simply tell the LSP that hey
00:19:00.320
here's a method that you may have missed because it's defined dynamically You can do the same for modules and classes
00:19:06.799
But then one nuance is that adding a new dynamically defined namespace will
00:19:12.559
advance the context of indexing to be inside of that namespace so that if we
00:19:18.240
find constants inside of it or new methods they will be automatically attached to that dynamic namespace And
00:19:25.600
then the fourth and last piece of the API is being able to leave that dynamically defined namespace so that we
00:19:31.760
can have the correct flow for indexing in the end So let's take a look at how we would
00:19:39.039
handle the case of belongs_to and inform the Ruby LSP of the methods that will be defined there We can start by checking
00:19:46.799
if the method that we just encountered during indexing is named belongs_to and
00:19:52.160
if it's not we are not interested in handling that method We can then take
00:19:57.360
the first argument which is the entity that we are associating to And here we need to check if the argument is
00:20:03.840
actually there Language servers usually see the code in an intermediate state as
00:20:09.039
the user is typing So it's very likely that we will see the call to belongs_to without arguments multiple
00:20:15.120
times until the code is in its final state And then if the name is
00:20:20.880
already there if the symbol is already there we can take the association name by taking the value of that symbol and
00:20:26.880
we can then use the add_method API to inform the index that hey there will be a reader for the association and this is
00:20:34.480
the name of that dynamically defined method The only other part is adding
00:20:41.280
the writer for the association and there are only two differences here for
00:20:46.320
adding the writer Of course the name is different because we have the equals sign but also the writer accepts one argument
00:20:52.799
and so we have to define which signatures are accepted for that method The Ruby LSP allows you to have multiple
00:20:59.600
in case there are many different ways that you can invoke the same method But in this particular example there's just
00:21:05.840
one signature with one required parameter And that's already enough for
00:21:12.080
us to be able to handle belongs_to and tell the Ruby LSP that when we find that call two new methods are going to
00:21:19.440
be available inside of that Post class One for the reader of the association and the other one for the
00:21:25.960
writer Moving on to the other type of DSL that we talked about call site DSLs
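Before the call site case, the belongs_to enhancement just walked through can be sketched as follows. This is a self-contained illustration of the steps only: CallNode, SymbolNode, RecordingListener and the signature representation (an array of parameter-name arrays) are hypothetical stand-ins, not the Ruby LSP's real classes, and the real add_method takes more information than is modeled here.

```ruby
# Hypothetical stand-ins for the AST nodes an enhancement would receive.
CallNode = Struct.new(:name, :arguments)
SymbolNode = Struct.new(:value)

# Stand-in for the indexing listener: it only records what it is told.
class RecordingListener
  attr_reader :methods

  def initialize
    @methods = []
  end

  # signatures: one inner array of required parameter names per signature.
  def add_method(name, signatures)
    @methods << { name: name, signatures: signatures }
  end
end

class BelongsToEnhancement
  def initialize(listener)
    @listener = listener
  end

  def on_call_node_enter(call_node)
    # Only belongs_to is interesting to this enhancement.
    return unless call_node.name == :belongs_to

    # The user may still be typing, so the argument can be missing.
    first_argument = call_node.arguments.first
    return unless first_argument.is_a?(SymbolNode)

    association = first_argument.value

    # Reader: one signature with no parameters.
    @listener.add_method(association, [[]])
    # Writer: name ends in "=" and its single signature has one required parameter.
    @listener.add_method("#{association}=", [[:value]])
  end
end

listener = RecordingListener.new
enhancement = BelongsToEnhancement.new(listener)

enhancement.on_call_node_enter(CallNode.new(:belongs_to, []))  # mid-typing, ignored
enhancement.on_call_node_enter(CallNode.new(:has_many, [SymbolNode.new("comments")]))
enhancement.on_call_node_enter(CallNode.new(:belongs_to, [SymbolNode.new("user")]))

indexed_names = listener.methods.map { |entry| entry[:name] }
```

Only the complete belongs_to call contributes entries, so indexed_names ends up with the reader and the writer for user.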
00:21:31.200
These are not a part of the indexing process because they don't impact the declarations that are available in the
00:21:37.200
codebase So these are handled on a per-feature basis and not
00:21:42.240
every feature makes perfect sense for these call site DSLs So it's better that we handle them based on what
00:21:49.200
we want to augment in the LSP And the example we're going to see here is augmenting go to definition
00:21:56.480
The way that you can augment every single feature in the Ruby LSP is through these factory methods that are
00:22:02.640
defined in our add-on class And the idea of these methods is to create a new
00:22:08.240
listener that will be taken into account when we execute that request And you
00:22:13.760
receive from the Ruby LSP everything that you need to hook into the response
00:22:18.960
builder into the execution flow through the dispatcher so that your listener is taken into account as we're running this
00:22:25.280
request So what we want to do here is we want to be able to command-click to go to
00:22:31.120
definition on the all_is_well symbol on the symbol that represents a method and
00:22:36.960
end up in the declaration of that method To do that we're going to define
00:22:42.400
our extra definition listener here for our add-on In the constructor we're
00:22:47.760
going to essentially just save state that we received from the Ruby LSP itself We're saving the index which
00:22:55.039
includes all of the declarations that we discovered in your project We save the
00:23:00.240
response builder so that we can contribute more information to the final response And the node context is another
00:23:07.600
object that we get from the LSP which has information around the place where
00:23:13.360
the user is trying to go to definition for and it'll be useful for the
00:23:18.760
handler In this case we want to handle symbol nodes If the user is trying to go to definition on a symbol that's when
00:23:25.600
we want to actually perform logic And here's how we would define the
00:23:31.360
handler for this The first thing to note is that we are not interested in just any
00:23:36.400
symbol We are only interested in symbols that are being passed to a method call for validate So we can use the node
00:23:43.360
context to check whether there is a method call surrounding this
00:23:49.120
symbol node and whether that method call is a call to validate If that's the case
00:23:54.559
then we continue otherwise we return early We try to get the name of the
00:24:01.039
symbol the value of the symbol Of course here if the user didn't finish typing
00:24:06.080
we need to account for that as well So we return early if there's no name yet
00:24:11.440
If there is a name we know which method name we are trying to find And we also
00:24:17.840
know that because validate was invoked for example in the Post class we know exactly which namespace we need to
00:24:25.440
search for that method which is given to us by the node context in this case So we can use the index to resolve
00:24:32.799
a method based on its name and the fully qualified name of the namespace that
00:24:39.000
should own the method that you are looking for And we're going to get back
00:24:44.400
as the result an array of every declaration that we were able to find for that name within that
00:24:52.679
namespace Of course when you're developing you can do weird things like using a
00:24:59.279
symbol for a method that has not been defined yet So we need to account for that case as well and return early if
00:25:04.640
you didn't define the method The last step is to take the
00:25:09.679
declarations that we discovered that are related to the name of the symbol value that we are trying to jump
00:25:16.000
to and then format them for the editor and include them in the final
00:25:21.440
response using our response builder And in this case we're just transforming the location of the
00:25:27.679
declaration into a range which is what the editor expects And then we're creating this location object to push
00:25:34.000
it into the response builder And this is already enough for us to command-click on that symbol and land on
00:25:41.679
the method declaration for all_is_well And the API for the index of
00:25:46.960
the Ruby LSP already takes other things into account like inheritance If this method was defined
00:25:53.360
in a module or in a parent class that would also work
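The handler logic described in this walkthrough can be sketched in a self-contained way. All of the types below are hypothetical stand-ins written for illustration and are not the Ruby LSP's real classes: ToyIndex is a plain hash lookup that does not model inheritance, NodeContext only carries the surrounding call and the enclosing namespace, and locations are reduced to a file and a line.

```ruby
# Hypothetical stand-ins for what the language server would hand to a listener.
SymbolNode = Struct.new(:value)
CallNode = Struct.new(:name)
NodeContext = Struct.new(:call_node, :fully_qualified_name)
Declaration = Struct.new(:file, :line)

# Toy index: { "Namespace" => { "method_name" => [Declaration, ...] } }.
class ToyIndex
  def initialize(declarations)
    @declarations = declarations
  end

  # Returns the declarations for a method owned by the namespace, or nil.
  def resolve_method(name, owner_name)
    @declarations.dig(owner_name, name)
  end
end

class ValidateDefinitionListener
  def initialize(response_builder, node_context, index)
    @response_builder = response_builder
    @node_context = node_context
    @index = index
  end

  def on_symbol_node_enter(node)
    # Only symbols passed to a validate call are interesting.
    call = @node_context.call_node
    return unless call && call.name == :validate

    # The user may not have finished typing the symbol yet.
    name = node.value
    return if name.nil? || name.empty?

    # Look the method up in the namespace where validate was invoked.
    declarations = @index.resolve_method(name, @node_context.fully_qualified_name)
    return if declarations.nil? || declarations.empty?

    # Format each declaration as a location and contribute it to the response.
    declarations.each do |declaration|
      @response_builder << { file: declaration.file, line: declaration.line }
    end
  end
end

index = ToyIndex.new("Post" => { "all_is_well" => [Declaration.new("post.rb", 12)] })

inside_validate = []
context = NodeContext.new(CallNode.new(:validate), "Post")
ValidateDefinitionListener.new(inside_validate, context, index)
  .on_symbol_node_enter(SymbolNode.new("all_is_well"))

other_call = []
context = NodeContext.new(CallNode.new(:puts), "Post")
ValidateDefinitionListener.new(other_call, context, index)
  .on_symbol_node_enter(SymbolNode.new("all_is_well"))
```

The first listener contributes one location because the symbol sits inside a validate call, while the second returns early and contributes nothing. In the real add-on the handler would be registered with the dispatcher instead of being called directly.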
00:25:58.480
Since we announced the add-on API we have had considerable interest from the
00:26:03.919
community which has been super nice to see All of these add-ons with the exception of the Tapioca and the Rails
00:26:09.440
ones were created by the community and are not maintained by our team
00:26:15.600
In addition to contributing extra behavior to our base features all of the
00:26:21.360
intelligence that is contributed by add-ons is also available to any other type of integration that we use the
00:26:29.120
Ruby LSP for And I want to mention our latest prototype which is an implementation of an MCP server that
00:26:36.720
connects to the Ruby LSP to expose information about your codebase to LLM
00:26:42.480
and AI related tools If you're not familiar with MCP an MCP server is actually quite similar to language
00:26:49.200
servers in the way that the protocol is defined and it allows
00:26:55.120
language models to ask questions to that server to grab extra context So in this
00:27:01.039
case we are providing extra context about your codebase through the Ruby LSP
00:27:06.080
and it will take the contributions of all of the add-ons into account So here's an example in Cursor I
00:27:13.600
asked it to draw a diagram and it decides whenever it needs to ask for
00:27:20.159
more information from the Ruby LSP in order to fulfill the task that the user has requested And after it has grabbed
00:27:28.640
all of the information that it thought was relevant it will then perform the task that you
00:27:33.799
asked And so the intention here is to expose information that would not be
00:27:39.360
very easy for the LLM to discover like what is the hierarchy of classes
00:27:45.120
available or what is the type of a variable things that are not written down directly in the source code
00:27:53.679
Ruby is a very unique language and of course we can learn a lot from
00:27:58.960
looking at what other gradual typing systems have done but I don't think that it is enough even though it is super
00:28:06.000
helpful I think we should experiment with new types of ideas that can try
00:28:11.120
to account for everything that Ruby allows us to do And I have some crazy ideas here that I frankly don't know if
00:28:17.919
they are feasible but I'll share them anyway In the case of belongs_to could
00:28:24.000
we have some sort of annotation where we inform the type checker that calling
00:28:29.520
that method will produce new declarations that are dynamic So things like new methods or new constants or new
00:28:36.640
classes and could we pass some sort of type parameter that can be used to then
00:28:41.760
properly annotate the methods that will be defined dynamically So we could pass something like this informing the
00:28:48.000
type checker that belongs_to will produce methods that are related to the User class For the case of symbols that
00:28:55.520
represent other things like a constant or a method could we have some sort of
00:29:01.279
special type that represents that so that we can have all sorts of type-checking features like
00:29:08.000
type-checking errors or completion or go to definition automatically available for those constructs
00:29:15.440
And in the case of instance variables that are not initialized in an object's
00:29:20.559
constructor So for example in this case where the instance variable @user is always set by set_user and not in an
00:29:28.080
initialize method It is common for example if you're using Sorbet to see
00:29:33.440
errors related to the type checker thinking that the instance variable may be nil because it may indeed be nil
00:29:40.799
if you just create a new instance of the UsersController class But could we
00:29:46.000
try to inform the type checker that in this particular context show can only be
00:29:51.039
invoked after set_user has been invoked so that the type checker can essentially
00:29:56.559
take the two bodies of the methods and glue them together so that the analysis matches what happens when we
00:30:03.679
execute the code a little bit better I think to have a better
00:30:08.799
type-checking experience in Ruby correctness is of course important We always want to be as accurate as
00:30:15.039
possible but not at the expense of usefulness A gradual typing system is a tool not a requirement and we
00:30:21.600
should try to be as useful as possible to users And I also believe that we should optimize for the most common
00:30:27.520
cases that people use Ruby for providing escape hatches for the edge cases so that we can make optimizations for
00:30:34.799
better accuracy in the cases that are most common for Ruby programmers And with some innovation in our type
00:30:41.039
annotations I think we can get there I think we can account for the dynamic nature of Ruby Thank you very much