00:00:06.400
well if you don't mind I'm going to
00:00:08.240
stand here to the side of the podium
00:00:10.400
because I don't like feeling imprisoned
00:00:13.519
by the podium but nice to meet you all
00:00:16.000
my name is Stephen and I am really
00:00:20.320
honored to be here at Ruby Kai this is
00:00:22.560
my first time at a Kai this is my first
00:00:24.960
time in Japan and it has already been a
00:00:27.920
really lovely experience
00:00:30.640
you can find me basically everywhere on
00:00:33.040
the internet as
00:00:35.480
fractalmind and I write on my blog
00:00:40.040
fractalmind.github.io i'm here today
00:00:41.840
because I love Ruby and I love open
00:00:45.040
source i've had the privilege over the
00:00:47.120
last couple of years to be a regular
00:00:49.280
contributor to Rails as well as to the
00:00:53.360
SQLite Ruby driver
00:00:56.399
and if you do know my name it's likely
00:00:58.719
because of the work I have done in the
00:01:00.480
last couple of years to make the Ruby
00:01:02.760
ecosystem the single best ecosystem in
00:01:06.479
the world to build with the SQLite
00:01:08.799
database
00:01:09.720
engine i had the opportunity last
00:01:12.479
September to speak at Rails World about
00:01:15.680
how Rails 8 and SQLite together can
00:01:19.360
supercharge the experience of building
00:01:21.439
web applications as solo developers and
00:01:25.280
if That's a talk that you haven't seen
00:01:28.000
it is on YouTube in my completely
00:01:30.479
unbiased opinion it's great and you
00:01:33.360
should watch it and if it increases the
00:01:35.759
view count that's just a a nice side
00:01:38.360
effect today though I want to focus on
00:01:42.000
pure Ruby and and not web applications
00:01:44.560
specifically and I want to talk to you
00:01:46.000
about the next project in my crusade to
00:01:49.200
make Ruby the best ecosystem for working
00:01:51.360
with SQLite and that is a project I'm
00:01:53.520
calling Plume
00:01:56.159
plume is a tool set for working with
00:01:59.040
SQLite
00:02:00.520
elegantly and what I'm going to talk
00:02:02.479
about today is the heart of that project
00:02:04.560
which is a
00:02:06.840
parser you can find plume now publicly
00:02:09.520
available open source on GitHub here at
00:02:12.319
this URL
00:02:14.040
and what I want to do today is walk
00:02:17.239
through why I am writing a parser for
00:02:20.959
SQLite's dialect of SQL how you can use
00:02:24.000
Plume and how it works and then talk
00:02:27.120
about what's next and how you can get
00:02:30.040
involved so why would anyone write a SQL
00:02:34.480
parser let alone a SQL parser for just
00:02:36.959
SQLite's dialect of
00:02:39.000
SQL and it's a pretty fair question
00:02:41.760
because if you did do some research you
00:02:43.440
would see that there are already a fair
00:02:45.680
number of projects that are aiming to be
00:02:49.519
a SQLite specific parser across a
00:02:52.400
variety of languages and frameworks and
00:02:55.760
tool
00:02:56.840
sets and you would think that this
00:03:00.720
incredibly niche market is already
00:03:02.879
saturated
00:03:04.640
but the fact of the matter is that
00:03:07.440
nearly every single one of these
00:03:09.120
existing projects would fail to parse
00:03:11.599
this SQL
00:03:13.159
statement and the reason why is right
00:03:17.319
here in well- formatted SQL this would
00:03:20.720
be a
00:03:21.959
comma but SQLite is perfectly happy to
00:03:25.040
accept just blank spaces between table
00:03:27.760
constraints and if SQLite is able to
00:03:32.239
parse something but a third-party parser
00:03:35.080
cannot that is a problem and it's a
00:03:38.480
problem because it limits the potential
00:03:40.480
impact of that project it'll never be as
00:03:42.640
impactful as something like
00:03:44.680
PGQuery pgquery is used in over 1,000
00:03:48.720
applications and packages and the number
00:03:51.760
one reason why is that it is the parser
00:03:55.519
that Postcrest uses which means it has
00:03:57.519
100% compatibility and that
00:03:59.760
compatibility is the foundation that is
00:04:01.840
needed to build an ecosystem of tools on
00:04:04.720
top of a
00:04:07.080
parser when it comes to parsing
00:04:09.920
compatibility is
00:04:11.640
king if there is any SQL statement that
00:04:14.319
the SQLite CLI will happily parse but a
00:04:17.680
third party parser does not that's a
00:04:19.600
problem and if there is a statement that
00:04:21.600
SQLite would error on and the parser
00:04:23.919
doesn't error on it that's also a
00:04:25.440
problem we have to get to perfect
00:04:27.880
compatibility and so the question
00:04:30.280
is how do we get there and what's sort
00:04:32.720
of the state of things today there are a
00:04:35.759
couple of projects when I was doing
00:04:37.360
initial research here that were pretty
00:04:39.360
close this one is a rustbased parser
00:04:43.600
generator that uses the exact same
00:04:45.840
source grammar
00:04:49.240
as this one is a rustbased um parser
00:04:53.440
generator that uses the exact same
00:04:55.040
source grammar file as the SQLite parser
00:04:58.960
but it has uh a couple of really key
00:05:01.560
limitations the first is that not every
00:05:05.280
grammar declaration is supported and the
00:05:07.759
second even bigger one is that it
00:05:09.520
regularly panics so it is unable to be
00:05:14.320
this foundational tool for the ecosystem
00:05:17.280
to build on top of as we can see in in
00:05:19.280
the number of projects that are using it
00:05:22.080
this JavaScript
00:05:23.880
parser would actually successfully parse
00:05:26.400
the example we gave at the beginning and
00:05:28.400
it has good momentum there are over 300
00:05:30.320
projects that are using
00:05:32.199
this but it was archived nearly four
00:05:34.880
years
00:05:35.560
ago so I think that there's a real need
00:05:38.800
for this kind of project and that's why
00:05:42.479
I started Plume with this vision to
00:05:45.039
build a foundation for the next wave of
00:05:47.280
SQLite tooling but if I'm being honest
00:05:50.560
this project didn't start with a grand
00:05:52.400
vision for the next wave of tooling it
00:05:54.720
started with fixing a bug in Rails
00:05:58.320
i found a bug when I went to execute
00:06:01.600
this
00:06:03.000
migration and the problem was that I
00:06:06.000
wanted to define a virtual column with
00:06:09.440
an expression that had a comma inside of
00:06:12.600
it and everything in Rails blew up when
00:06:14.960
I went to run this migration so I I went
00:06:16.560
diving through the source and I found
00:06:19.039
this line rails's active record SQLite
00:06:22.639
adapter needs to understand your schema
00:06:25.600
and in order to understand SQLite schema
00:06:28.720
it has to do some parsing of the create
00:06:31.120
table statement and Rails uses some
00:06:32.960
reaxes and some basic stream
00:06:34.319
manipulation to try to make sense of
00:06:35.680
things so I found a pretty simple split
00:06:38.080
on commas probably that'll just be comma
00:06:40.240
definitions but if you have an
00:06:41.280
expression with a comma in it all of a
00:06:42.800
sudden everything breaks so to fix this
00:06:45.039
bug I wrote this incredibly fun reax i'm
00:06:48.800
not even going to spend a lot of time on
00:06:49.919
it don't read it it's horrible it fixed
00:06:52.400
that bug even as I wrote it I was aware
00:06:54.479
of the other possible edge cases that
00:06:57.360
would lead to very similar bugs this is
00:06:58.960
not resilient and when I put this pull
00:07:02.639
request in and got it merged I was happy
00:07:04.639
to have fixed that bug but I couldn't
00:07:06.080
help but daydream about what it could
00:07:07.919
have been if I could have fixed the bug
00:07:10.240
with code like this because if we did
00:07:13.440
have a parser and if we could inspect
00:07:15.919
the schema of our tables and get back
00:07:19.039
structured Ruby objects that were full
00:07:21.120
and correct it would make Rails' life so
00:07:24.240
much easier and these kinds of bugs
00:07:25.840
would
00:07:27.400
disappear so this is why I wanted to
00:07:31.680
start this project i really do want to
00:07:34.240
set a foundation for a new wave of tools
00:07:36.960
for SQLite it's obviously gained a lot
00:07:39.440
of momentum here in the last couple of
00:07:41.039
years and I think that there's so much
00:07:42.639
opportunity for tooling to come in the
00:07:45.520
future years and maybe just maybe we
00:07:48.800
will be able to use this tool to secure
00:07:51.440
Rails's introspection capabilities to
00:07:53.680
drive active record when working with
00:07:55.479
SQLite and if I'm being honest I wanted
00:07:59.039
to challenge myself uh I am a
00:08:01.280
self-taught programmer i never went to
00:08:03.039
computer science uh classes let alone
00:08:06.479
get a full degree i didn't really know
00:08:08.400
much about parsing and I wanted to and
00:08:10.560
this seemed like a great opportunity to
00:08:12.479
dive in and learn
00:08:14.280
more so let's transition now to like how
00:08:17.599
you can use Plume and how it
00:08:20.199
works and to do that we're going to look
00:08:22.240
at a simple create table statement like
00:08:24.280
this now if we parse this SQL and we get
00:08:27.759
the the statement
00:08:29.160
node it'll look like this and let's
00:08:32.560
break this down bit by bit and see how
00:08:35.279
the syntax tree is being formed and how
00:08:37.760
we can work with
00:08:39.000
it so the create table token sequence is
00:08:43.360
what is going to determine that we have
00:08:44.880
a create table statement
00:08:47.399
node we see that the table we are
00:08:50.399
creating is called users so our node
00:08:52.880
will have a name property and or a table
00:08:56.959
property that has a
00:08:59.600
uh let's look at this table name node
00:09:02.640
the table value is our string
00:09:05.160
users our first column is called ID so
00:09:09.760
inside of our columns array we're going
00:09:11.120
to have a column definition node it'll
00:09:12.800
have a name property that'll be the
00:09:14.240
string
00:09:15.399
ID that column is defined as an integer
00:09:18.480
so inside of that column definition we
00:09:20.240
will also have a type property that type
00:09:22.560
property will have a column type node
00:09:24.640
that will have the text integer and
00:09:26.720
it'll also tell us the affinity for that
00:09:29.279
particular type sqlite's loose typing
00:09:31.839
system um is fully supported here
00:09:36.000
we also see that we have a constraint on
00:09:37.839
this column so this is a primary key
00:09:39.680
constraint with the auto increment
00:09:41.760
keyword so inside of our constraints
00:09:44.000
array we see a primary key column
00:09:46.000
constraint node and the auto increment
00:09:48.320
option is indeed
00:09:49.959
true our second column is called email
00:09:52.720
so we'll have another column definition
00:09:54.320
node property name value email we see
00:09:57.760
the string is the type another type
00:10:00.560
another column type we see the affinity
00:10:03.680
this actually has two constraints it's a
00:10:05.200
notnull column so we'll see that node in
00:10:08.240
the column in the constraints array and
00:10:11.040
it is also a unique column and so we
00:10:14.399
have a second constraint node in our
00:10:16.560
constraints
00:10:18.200
array this a structure provides every
00:10:22.000
single bit of information that we had in
00:10:24.399
our SQL string structurally in a way
00:10:27.120
that is easy to use in any Ruby program
00:10:30.959
and the parser is able to handle fully
00:10:33.279
complicated create table statements this
00:10:35.040
is the most complicated create table
00:10:36.480
statement I could write and it would
00:10:39.519
create a syntax tree like
00:10:42.760
this pretty easy to
00:10:45.079
follow yeah I think we all get it
00:10:48.680
um in order to support parsing large
00:10:52.480
complex SQL statements as quickly and
00:10:54.240
efficiently as possible the architecture
00:10:56.399
is inspired by Aaron Patterson's talk
00:10:59.040
that he gave last year at Rails World
00:11:00.720
where he broke down three key principles
00:11:03.519
for parsing efficiently in Ruby so the
00:11:08.160
first is to write your parser in pure
00:11:10.480
Ruby you want to minimize the number of
00:11:11.920
calls going back and forth between C and
00:11:14.320
Ruby so if you keep everything in Ruby
00:11:16.160
and you allow YJet to optimize um you're
00:11:18.959
going to get speed improvements for free
00:11:22.399
from the just in time compiler you then
00:11:25.839
want to minimize allocations that means
00:11:27.680
leaning on integers and symbols which
00:11:30.800
are objects that aren't going to
00:11:32.000
allocate to sort of drive the primitives
00:11:33.839
of your parser and then when you do need
00:11:36.399
to allocate and obviously you are going
00:11:37.920
to need to allocate some strings to make
00:11:39.760
sense of what's there you want to do
00:11:42.079
that lazily so only when needed do you
00:11:45.519
provide the values that users ask for
00:11:47.440
and that's going to help keep everything
00:11:48.640
as efficient as possible so these
00:11:51.040
principles drove the architecture for
00:11:54.399
how plume is implemented it has a lexer
00:11:57.600
that lexer will work token by token
00:12:00.880
keeping track of the cursor offsets that
00:12:04.560
are driving the tokenization process
00:12:08.800
such that you can ask for a value if and
00:12:12.800
when you need it and only at that moment
00:12:14.959
will we go and allocate that string and
00:12:16.639
pull it out for you that lexer is used
00:12:18.720
internally by the parser the parser is
00:12:21.519
also going to work token by token we do
00:12:24.560
use a buffer so that we can peak ahead
00:12:26.959
so we keep track of the the token so the
00:12:29.839
lexer can sometimes be one two or three
00:12:32.639
tokens ahead of what the parser is
00:12:34.480
looking at but that's just managed
00:12:35.760
easily with our buffer array in order to
00:12:39.040
make the actual parsing logic relatively
00:12:41.959
maintainable there are a number of
00:12:43.839
helper methods that allow me to encode
00:12:46.959
the grammar and all of the different
00:12:48.800
flexibility that is logically required
00:12:52.160
and to sort of walk through how those
00:12:55.760
helper methods are being used and how
00:12:58.320
the parser itself is running I want to
00:13:00.000
walk through the create table statement
00:13:01.519
parsing um so that we all feel
00:13:04.000
comfortable with how this gem is working
00:13:06.240
because I am going to ask all of you to
00:13:08.560
come and help me and contribute so you
00:13:10.560
won't have any excuses by the end of
00:13:12.000
this talk
00:13:14.000
if we look at the SQLite documentation
00:13:16.079
we will see a railroad diagram like this
00:13:18.160
that really clearly explains the logic
00:13:21.040
of parsing the create table statement it
00:13:23.600
is worth noting you should be aware that
00:13:26.399
these diagrams are telling you how to
00:13:28.079
write SQL they are not telling you how
00:13:30.079
to parse SQL as we can see they say you
00:13:32.560
should put commas after table
00:13:34.320
constraints and SQLite is happy to allow
00:13:37.279
you to not put commas there this right
00:13:39.040
here I believe is the primary reason
00:13:41.200
that most parsers have a number of
00:13:43.680
incompatibilities because their
00:13:45.040
grammarss are defined on these diagrams
00:13:46.959
and these diagrams are filled with lies
00:13:50.880
so buy or
00:13:52.360
beware if we jump into the logic for how
00:13:56.079
we can convert this diagram into uh our
00:13:58.959
actual execution we will see starting
00:14:02.240
here at the beginning of the create
00:14:03.440
table statement this where the first
00:14:05.360
thing we're going to do is ensure that
00:14:07.440
we have a create to token right so if we
00:14:10.160
expect that we are parsing our create
00:14:11.440
table statement we must have the create
00:14:12.959
tok to token first so we use the require
00:14:15.440
helper that ensures that that token is
00:14:17.839
there and consumed or we error the
00:14:20.959
grammar next says that we might or might
00:14:23.040
not have either the temp token or the
00:14:25.279
temporary token that is why we use the
00:14:27.040
maybe one of helper to say maybe we'll
00:14:29.519
have a token maybe not and if we do
00:14:32.160
it'll be one of temp or temporary so if
00:14:34.480
either of those tokens are the current
00:14:35.680
token we consume them if not we just
00:14:37.519
return nil and everything's okay the
00:14:40.320
next point is requiring the table token
00:14:44.240
as a small note yes you can just define
00:14:46.399
a require method in your class or module
00:14:48.480
and use it um am I maybe writing not so
00:14:52.000
beautiful Ruby because I'm overwriting
00:14:54.320
uh the require method possibly but I
00:14:56.480
like the way it sounds
00:14:58.760
so we do it after you have a table you
00:15:02.160
might or might not have a sequence of
00:15:04.079
tokens if not exists so the maybe all of
00:15:08.399
helper uh allows us to check do we have
00:15:11.680
this exact sequence of tokens fully from
00:15:14.160
start to finish and if we do consume all
00:15:16.160
of them and grab the full span but if we
00:15:19.360
don't just return nil and continue
00:15:22.199
on next we get the table name and that
00:15:25.760
calls out to a separate rule definition
00:15:28.160
method that has all of the logic for
00:15:30.240
getting the table name that might be
00:15:31.920
qualified and structured in a couple of
00:15:34.639
different ways um structurally you might
00:15:37.519
expect that we would use a require
00:15:40.079
helper method with a block to include
00:15:43.040
the method uh and that's actually what
00:15:44.959
we do with with may if you want to check
00:15:47.199
if you can parse another rule you can
00:15:49.600
call maybe with a block with the rule
00:15:51.720
method but in the case of requiring this
00:15:54.160
just adds overhead for very little value
00:15:56.399
so for rules that are required we just
00:15:58.720
call them directly and I like to add a
00:16:01.360
lot of spaces so things line up in a
00:16:03.279
nice columner way if we shift our focus
00:16:06.160
into the actual parsing of the
00:16:08.320
definition of a table we'll see the rest
00:16:10.399
of our helpers here so we can use the
00:16:12.959
maybe helper to check whether or not the
00:16:14.720
current token is the opening parenthesis
00:16:16.720
if it is we know that we are parsing a
00:16:18.399
create table statement with column
00:16:19.959
definitions and you must have at least
00:16:22.320
one column defined so there we can use
00:16:24.399
the require sum helper which says
00:16:26.720
there's one or more uh rules in this
00:16:31.360
case we are looking at the column defaf
00:16:33.519
rule that is another helper method uh a
00:16:36.320
rule definition method we're not going
00:16:37.920
to look at its internals but uh it
00:16:40.800
includes all of the logic for parsing
00:16:43.040
out a full column definition and we tell
00:16:45.759
the require some helper that we are
00:16:48.240
going to have those column definitions
00:16:50.800
separated by commas that come after each
00:16:54.040
definition after we've collected all the
00:16:56.240
columns we look to the constraints here
00:16:58.880
there might be zero or more constraints
00:17:01.120
so we use maybe
00:17:02.759
some in this case the table constraints
00:17:07.439
only are used inside of create column so
00:17:09.760
we don't need to extract it into a
00:17:11.120
separate rule method we can just define
00:17:12.559
all the logic directly in the
00:17:15.000
block and we specify that the trailing
00:17:18.079
separator could be either a comma or nil
00:17:20.720
and this is of course the exact logic
00:17:23.039
that we need to be fully compatible with
00:17:25.679
SQL's handling of table constraints
00:17:28.559
being either uh separated by a comma or
00:17:31.679
not
00:17:33.840
so once we have collected all the
00:17:35.760
constraints we must have our closing
00:17:38.240
parenthesis so we use require one more
00:17:40.160
time and at the very end we might or
00:17:42.880
might not have a couple of table options
00:17:44.720
so we use maybe some once again and that
00:17:47.280
gives us everything we need to create
00:17:49.039
our create table statement node which as
00:17:51.679
we have seen looks like this now the
00:17:54.880
only other final point to make is a
00:17:58.000
small confession uh this slide is a bit
00:18:01.120
of a lie
00:18:02.799
this isn't actually exactly what the
00:18:05.120
parser will generate this is an abstract
00:18:07.919
tree the parser actually produces a
00:18:11.520
concrete tree uh that keeps track of the
00:18:14.559
locations and
00:18:15.960
spans uh but I personally find this a
00:18:20.360
bit aggressive on the eyes and so when
00:18:24.559
you inspect the column or any particular
00:18:28.480
node you get back a lazy representation
00:18:31.360
of the abstract version of that node uh
00:18:34.240
but you have full access to all of the
00:18:36.080
span and location information we just
00:18:38.720
pretty print it um for you so as you can
00:18:42.919
see even this simple statement has a lot
00:18:45.840
of information on it you have access to
00:18:48.320
all of that information and if you
00:18:49.760
wanted to or needed to use some of this
00:18:52.000
fine grain information that's all there
00:18:54.000
but for the most part we're going to
00:18:56.000
keep it simple just to break down
00:18:57.600
exactly how it's working we're going to
00:18:59.440
have these token spans that are composed
00:19:01.840
of four parts we have the tokens one or
00:19:05.679
more tokens the start offset the end
00:19:08.400
offset these are offsets into the
00:19:10.240
original source uh string and then the
00:19:13.200
type of token that this is this helps in
00:19:15.120
some of the the helper methods that are
00:19:16.880
there to extract the value
00:19:19.200
in order to use these offsets every
00:19:21.360
single node includes a reference to the
00:19:24.160
full SQL source string that was passed
00:19:26.080
into the parser and that is what allows
00:19:28.160
us to lazily produce the values that are
00:19:30.640
needed so what happens in an IRB console
00:19:33.360
um for example if I just look at the
00:19:35.919
node it's going to access it and at that
00:19:39.440
moment it goes and lazily grabs all of
00:19:41.440
these values to say okay what is the
00:19:43.600
table name what is the column name uh
00:19:46.240
the column type text but in reality you
00:19:48.880
could write your tools on top of this
00:19:51.360
and if you didn't need any of that stuff
00:19:52.960
and you didn't ask for it none of those
00:19:54.559
objects would be
00:19:56.280
allocated so the way that Plume works
00:20:00.080
this is a handwritten recursive descent
00:20:02.679
parser it is minimally allocating uh
00:20:06.919
objects mostly integers and symbols in
00:20:11.160
arrays it works token by token um so
00:20:15.360
there's never huge arrays in memory the
00:20:18.480
grammar is encoded with these hopefully
00:20:20.799
very literate helper methods and that
00:20:23.600
allows us to produce a concrete syntax
00:20:25.760
tree that we can then present to users
00:20:28.000
as an abstract syntax tree so that you
00:20:30.159
get the best of both worlds the
00:20:31.440
usability of the abstractness and the
00:20:33.600
fullness of the
00:20:36.200
concreteness
00:20:37.720
okay what is
00:20:39.880
next well I'll be honest i didn't 100%
00:20:45.200
finish it before today so there are more
00:20:49.039
statements to parse and I would love for
00:20:53.360
you to help
00:20:54.919
me i'm tracking progress here in the
00:20:57.600
readme now I confess that this looks
00:20:59.840
like I'm barely
00:21:02.120
started the truth is that create table
00:21:06.320
statement is one of the most complicated
00:21:08.720
statements that has a number of
00:21:10.440
dependencies so getting full expression
00:21:13.039
parsing and select uh statement parsing
00:21:16.400
uh those two are like the hardest two
00:21:20.000
and then create table statement is the
00:21:21.760
most useful one and it allowed me to get
00:21:24.159
all of the foundations right and if you
00:21:26.960
actually go and look at most of these
00:21:29.000
statements they're actually really easy
00:21:31.200
to parse so there are a lot of
00:21:33.360
opportunities for you to contribute
00:21:34.960
without having to go into very insane
00:21:38.720
difficult parts of the grammar a lot of
00:21:40.559
these statements are really
00:21:41.679
straightforward and would be very uh
00:21:43.760
easy to con to write the uh logic for
00:21:47.440
using the helper methods so I really
00:21:50.120
genuinely think that all of you
00:21:52.960
incredibly smart people could get
00:21:55.120
involved and contribute to this
00:21:57.240
project uh but even if I'm forced to do
00:22:01.280
it alone it won't take a ton of time uh
00:22:04.320
now that all of the foundations are set
00:22:07.120
but I'm not trying to hide the fact that
00:22:09.760
yeah there are more statements to handle
00:22:11.600
luckily create table statements I think
00:22:13.600
are the most important feature both for
00:22:16.480
Rails and for some of the tools that I
00:22:18.080
want to build on top of this um but if
00:22:20.240
you have needs for parsing other
00:22:22.120
statements let's do it
00:22:24.280
together next I would also love any
00:22:28.159
community help in hunting for
00:22:30.919
incompatibilities right so as I said if
00:22:33.840
there's any difference between how
00:22:35.440
SQLite parses a statement and how Plume
00:22:37.360
parses a statement that is a bug and I
00:22:39.120
would love that bug to re be reported so
00:22:41.360
that we can fix it put it in as a
00:22:43.600
regression test and ensure that we get
00:22:45.919
to full um compatibility now there
00:22:49.200
shouldn't be many bugs with create table
00:22:51.440
statement parsing because there are lots
00:22:52.880
and lots and lots and lots and lots and
00:22:54.880
lots of tests um I wrote a script that
00:22:59.120
scrapes every single SQL statement in
00:23:01.200
the public test suite of SQLite uh there
00:23:04.720
are I don't know 20,000 or so uh and we
00:23:08.159
parse all of those and then I'm also
00:23:10.880
handwriting tests for every single
00:23:12.640
logical permutation of the grammar to
00:23:15.760
get to 100% branch coverage um and so
00:23:20.200
hopefully when you go hunting for
00:23:22.240
incompatibilities you don't find many
00:23:24.240
but I hope you find some and we can fix
00:23:27.200
them because that makes the project
00:23:29.360
stronger and stronger and that is what
00:23:30.880
we need to treat this as a foundational
00:23:33.200
project for the ecosystem as it grows in
00:23:35.200
the coming
00:23:37.320
years as those of you who were here for
00:23:40.480
Marco's talk have already seen he's much
00:23:43.440
smarter and much faster than me so he
00:23:45.200
already built a web as explorer for his
00:23:47.919
parser uh that's something in the future
00:23:49.919
for me but if you saw his demo you would
00:23:53.039
see how incredibly useful that can be
00:23:55.120
you can put the SQL in you can see
00:23:57.440
exactly how the parser is going to make
00:23:58.880
sense of it and really get a feeling for
00:24:02.720
how to make use of the parser to start
00:24:04.960
building your tools on top of it how you
00:24:06.640
can get into the bits that you're going
00:24:08.720
to need um so I look forward to stealing
00:24:13.279
much of his code and porting it over to
00:24:16.320
this parser
00:24:19.520
the other thing that I have been working
00:24:22.000
on uh I didn't have a ton of code
00:24:25.760
samples i didn't want to get into the
00:24:27.200
implementation but I do want to talk
00:24:28.640
about um the results is using this
00:24:33.679
syntax tree as the foundation for SQL
00:24:36.679
generation um yes we have ARL uh no I
00:24:42.240
don't love it and so I have been
00:24:44.559
exploring some possibilities for
00:24:46.960
different ways to bring the full
00:24:49.520
expressive power of SQL to rubists
00:24:53.600
without some of the object-oriented
00:24:56.679
cumbersomeness of uh Arell so just to
00:25:00.799
give you some teasers of some of the
00:25:04.000
things that um I've been building this
00:25:07.039
is a fluid Ruby interface to produce SQL
00:25:11.600
um very much influenced by the SQL gem
00:25:15.840
um and the idea here is that we can
00:25:19.360
build the same syntax tree through Ruby
00:25:22.960
and then have that syntax tree produce
00:25:25.279
SQL for us that will be incredibly nice
00:25:28.159
but more exciting to me is work I've
00:25:30.720
been doing to define what I haven't yet
00:25:34.000
named perfectly but for lack of a sexy
00:25:37.440
term uh hashbased
00:25:40.360
SQL which is using structured Ruby
00:25:43.279
primitives to encode the full expressive
00:25:46.400
power of
00:25:47.480
SQL in Ruby and I think that this is
00:25:51.520
particularly exciting because this
00:25:53.760
bridges the gap for the biggest missing
00:25:56.960
feature in SQL in my opinion which is
00:26:01.000
composability it is quite difficult
00:26:04.080
unless you're doing select statements
00:26:05.520
and you use common table expressions to
00:26:08.000
compose parts of SQL together sql is
00:26:10.480
just strings and so you have to use the
00:26:13.520
primitive tools available to you in any
00:26:15.440
programming language that you're working
00:26:16.640
with to compose strings and it's it's
00:26:18.320
brittle and it's not particularly
00:26:21.840
um resilient to various kinds of needs
00:26:24.400
that we have as programmers but if we
00:26:27.200
were able to encode SQL in a way that is
00:26:31.440
literate in a way that isn't burdensome
00:26:34.080
or ownorous on the
00:26:36.039
eyes but used Ruby structures then you
00:26:39.360
could imagine taking that hash that is
00:26:41.760
the key of the exist the value of the
00:26:43.520
exist key there we could generate that
00:26:46.080
in a method we could store that in a
00:26:48.640
constant we could build that up piece by
00:26:51.679
piece across a module uh however it's
00:26:54.559
there it can live in in a completely
00:26:56.400
different part of our codebase used in
00:26:58.640
completely other SQL queries maybe we
00:27:00.880
just call it directly or we can inject
00:27:04.640
it as the value of this exist query
00:27:07.640
composing this smaller query into the
00:27:10.320
larger query as needed when needed
00:27:12.159
however we produce it um and the
00:27:16.240
generation step to take that into uh
00:27:19.120
full SQL using the a is um something I
00:27:23.120
have been building as well into plume
00:27:25.279
and one of the things I'm most excited
00:27:27.600
to push to all of the edges of
00:27:32.760
SQL and the last thing for what is
00:27:36.080
coming next is your contributions i
00:27:39.440
would love to see you in GitHub i would
00:27:41.440
love to see you open pull requests to
00:27:43.120
open issues to submit incompat
00:27:45.520
compatibilities that you have found this
00:27:47.840
is a project that has the opportunity to
00:27:52.559
be vital to the emerging next generation
00:27:55.440
ecosystem of SQLite tooling and it's
00:27:58.159
only going to be as strong as we all
00:28:01.039
make it and if it's just on my shoulders
00:28:03.279
it'll be mediocre but if we all come
00:28:05.919
together it could be great and so I hope
00:28:09.360
that I have given you enough initial
00:28:12.320
interest and confidence that you join me
00:28:14.880
in this longer term project of helping
00:28:18.159
to make Ruby the world's best ecosystem
00:28:22.480
for working with SQLite so find the
00:28:26.480
Plume repository give it a star numbers
00:28:29.679
matter and I'm vain
00:28:33.120
go to
00:28:34.039
GitHub and together let's make this big
00:28:39.200
and let's make it real and let's make it
00:28:42.000
elegant which is to say let's make it
00:28:44.360
Ruby thank you