
Krypt: The Next Level of Ruby Cryptography

Martin Bosslet • November 01, 2012 • Denver, Colorado • Talk

Summary of Krypt: The Next Level of Ruby Cryptography

In this talk by Martin Bosslet at RubyConf 2012, he introduces Krypt, a new cryptography framework aimed at replacing the existing OpenSSL extension used in Ruby. Krypt is designed to run without restrictions on any Ruby platform and aims to address several fundamental issues with OpenSSL’s implementation, particularly in certificate validation.

Key Points Discussed:

  • Overview of Krypt: Krypt is positioned as a platform-independent cryptography framework that promotes diversity by allowing users to choose different cryptographic libraries based on their needs.

  • Provider Layer: Each layer of Krypt functions as a separate gem, where the provider implements low-level cryptographic primitives. This allows flexibility in using libraries suited for different operating systems, such as C for MRI Ruby and Java for JRuby.

  • Core Layer: Acts as a bridge between native implementations and Ruby, offering an API for utilizing provider features in Ruby, thereby ensuring performance critical operations are optimized.

  • High-Level Cryptography: The top layer of Krypt consists of high-level protocols that abstract over low-level operations, simplifying usage for developers.

  • Performance and Stability: Bosslet emphasizes performance, presenting benchmarks indicating that Krypt's ASN.1/DER parser reportedly outperforms native OpenSSL implementations while keeping memory consumption bounded through a lazy caching mechanism.

  • Fuzz Testing: Krypt incorporates an extensive testing framework called FuzzBert, promoting random or fuzz testing, which helps uncover security vulnerabilities that traditional testing may miss. Bosslet argues that fuzz testing is crucial for security projects to ensure robustness against various input scenarios.

  • Future Directions with Binyo: He discusses the potential tool Binyo, designed for efficient binary IO and low-level byte manipulation in Ruby, thereby enhancing performance when dealing with bit-level protocols.

  • Hash Collision Issues: The talk also touches on hash functions in Ruby, addressing recent vulnerabilities regarding hash collisions and proposing solutions involving the use of cryptographic hash functions to mitigate exploitation risks.

  • Call to Action: Bosslet encourages developers to contribute to Krypt, providing an overview of its development progress and inviting the community to help drive forward the vision of replacing OpenSSL within the Ruby ecosystem.

Conclusion

The talk not only highlights the technological advancements of Krypt but also underlines the necessity for a more reliable, secure, and high-performance cryptographic tool for Ruby. By improving upon OpenSSL’s limitations and integrating rigorous testing methodologies, Krypt aims to set new standards in Ruby cryptography.

Krypt: The Next Level of Ruby Cryptography
Martin Bosslet • Denver, Colorado • Talk

Date: November 01, 2012
Published: March 19, 2013

Last year it was an idea, more of it in our heads than on github. This year, krypt is reality, it's growing quickly and its goal is to become the successor of the OpenSSL extension. Learn about why we need a successor at all, about the evils of OpenSSL certificate validation and how krypt will improve all this, running without restrictions on any Ruby platform. I'd like to contribute to putting an end to "Ruby is slow" by showing you how krypt's ASN.1/DER parser runs even faster than native OpenSSL C code or native Java crypto libraries. You'll learn about how you can use krypt today and how you can extend it to suit your needs by plugging in different "providers". Even if you are not particularly interested in cryptography, you might still be interested in how krypt takes testing to a new level, setting out to become one of the best-tested cryptography libraries out there. You probably know about RSpec, code coverage tools for Ruby, C and Java code, and Valgrind for sorting out memory issues. But krypt takes it one step further by making random testing an integral part of its test suite. Fuzzers are not for bad guys only - we all can benefit from random testing, with any application that accepts external input. Let me show you how krypt has spawned FuzzBert, a simple and extensible random testing framework that allows you to set up an effective random testing suite in no time. Finally, in the attempt to run as much of krypt as possible in plain Ruby, let me show you binyo, which allows dealing effectively with binary IO and low-level byte manipulation in particular, in all Rubies. If you implement protocols on the bit & byte level, need to do bit-level, exact-width operations on raw bytes, then binyo is what you have been looking for - speed up your code without having to deal with any of the implications that are inherent to Strings.

RubyConf 2012

00:00:14.639 so thanks eric okay let's talk real quick about
00:00:20.080 now first of all i'm a member of ruby core and what i do there mainly is
00:00:26.080 maintaining the openssl extension so in matz's keynote we all heard about
00:00:31.599 how diversity is the basis for innovation and i also think that diversity is also
00:00:37.840 what gives us the ability to choose to choose the right tool for the job
00:00:45.040 and that's what i think that ruby cryptography should also not be just
00:00:50.320 about using openssl as it is right now and that's where krypt enters the
00:00:56.960 picture because krypt is also about diversity
00:01:02.000 so in one sentence you can describe krypt as being a platform and library independent
00:01:07.840 cryptography framework for ruby and its ultimate goal is to replace
00:01:13.520 openssl so that's a bird's eye view of krypt in general as you can see there
00:01:20.479 are different layers and we're going to look into each layer now in detail
00:01:26.720 first of all each of those layers is a separate gem and
00:01:32.880 depending on the platform or library or operating system that you're on
00:01:39.200 you can combine those as you need so let's have a look at the provider layer well a provider is a native
00:01:46.799 implementation if you're on a c based ruby then it's written in c otherwise for
00:01:52.880 jruby it would be written in java and what it mainly does is
00:01:58.719 it will implement all those low-level primitives that you need in crypto such
00:02:04.000 as digests ciphers or signatures and so on
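Providers implement low-level primitives like digests. As a plain-Ruby sketch of what such a primitive does today, here is the stdlib OpenSSL digest API that a krypt provider would stand in for (krypt's own provider API is not shown; this is only an illustration of the operation):

```ruby
require "openssl"

# The kind of low-level primitive a provider implements: a message digest.
# Today this goes through the OpenSSL extension that krypt aims to replace.
digest = OpenSSL::Digest.new("SHA256")
digest << "hello, "   # digests can be fed incrementally
digest << "world"
puts digest.hexdigest
```

The incremental `<<` updates are what make streaming use possible: the whole input never has to sit in memory at once.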
00:02:09.599 what happens is that the provider defines an interface in c this would be a header file and in
00:02:16.080 java we have an interface that needs to be implemented by each implementation of such a
00:02:22.480 provider i try to keep those minimal so that people would be
00:02:28.800 encouraged to write their own providers so it's also possible to do just a
00:02:35.680 partial implementation for example if you have one very specific feature that's
00:02:41.040 just available in one particular library and you want to use this
00:02:46.879 then you could write a partial implementation of a provider and still be able to use this with the
00:02:52.959 default provider so anything else would be given to you by the default provider and
00:02:58.480 you can use your special feature in parallel
00:03:03.599 what i hope to achieve in the future is that to not only support openssl but to
00:03:09.599 support a lot of different crypto libraries that are specifically um
00:03:16.000 well suited for different operating systems so we should support windows we should support
00:03:22.480 osx and so on so in general you can think of provider
00:03:27.760 as writing an adapter for your favorite library
00:03:32.879 and in a hopefully not too distant future my plan is to also write something that
00:03:38.640 implements crypto entirely in ruby
00:03:44.319 in between all this is the core layer and you can think of it as the link between the native world and
00:03:51.200 the ruby world what it does is it offers a provider api in ruby
00:03:57.360 so you can access the provider features in ruby using such
00:04:02.400 a provider and there's also some performance critical things that have to
00:04:08.000 be implemented in krypt core for example stuff that's very intensive on io
00:04:15.760 and that's why we currently have three different implementations one for c based rubies one for java
00:04:23.360 and also one that's written entirely in ruby and whenever i have to reach out to
00:04:28.400 native code this would be done by using ffi which is pretty interesting because this
00:04:34.320 allows you to use openssl in jruby even if openssl is written in c
00:04:42.639 and on top of all of this is krypt itself you can think of krypt as the high level
00:04:48.479 cryptography in general it's written entirely in ruby no native code
00:04:54.560 and it implements a lot of those fancy acronyms higher level protocols that
00:05:01.919 use the lower level primitives to achieve some form of protocol
00:05:07.759 so let's talk about the design principles of krypt as you might have noticed my goal is to
00:05:13.360 use ruby as much as possible and also i want to have it run on each
00:05:19.280 ruby equally well which is not the case right now with openssl
00:05:25.039 and by being able to choose we will also be more independent so we're not as tightly bound to open
00:05:30.880 ssl anymore and what i also hope to achieve by this is more stability because right now
00:05:37.360 we're often surprised by what happens upstream with openssl without really
00:05:42.400 knowing what's going on so this is always yeah kind of sucks because we have to
00:05:48.880 react very fast to some security fixes or anything like that and if we
00:05:54.400 are in control this would help us to achieve more stability
00:06:00.800 and it would also probably be a bad thing if the design is nice and
00:06:06.479 everything but the performance is magnitudes slower than openssl so i think
00:06:12.000 if we want to replace it we should be comparable to its performance
00:06:17.600 this is something that's really important for me personally
00:06:23.120 because i think that many of you if you use crypto then you're probably not
00:06:28.800 really interested in whether you're using sha-256 or sha-3 or whatever padding scheme you're using i
00:06:35.520 think the only thing that you're probably interested in is that the whole thing is secure and it should be easy to use
00:06:42.400 right now with openssl you have so many options and it's easy to screw things up
00:06:49.520 and yeah it should be easy to integrate a new provider so whenever you have a new favorite crypto library then it
00:06:56.319 should be easy to support this yeah i also want to fix some of the
00:07:02.240 problems that currently exist with openssl it's not like there are none other vm
00:07:08.319 users probably know this if you're on windows you probably noticed it and the biggest problem that i currently
00:07:15.120 see is the way how openssl handles certificate validation there's just recently been this paper
00:07:22.639 where somebody analyzed how popular applications
00:07:28.000 deal with certificate validation and it's most of the time it's wrong because just the api is way too complicated
00:07:35.280 and openssl also they don't want to implement a proper http implementation so it's really hard
00:07:42.720 to use urls or ocsp and yeah my goal is once this is finally
00:07:48.479 done that we hopefully can kiss verify_none
00:07:56.840 goodbye so i think 10 years of trolling is enough
00:08:03.199 so you might ask why would you want to break with integrating battle tested c libraries when basically everybody else
00:08:10.479 who's c based does this and the argument is often that for
00:08:17.120 crypto you need to be in control you need to be able to wipe memory you need to
00:08:22.879 be able to squeeze every bit of performance out of it so you need c
00:08:28.319 but that's i don't think that's really the way to go i rather believe that
00:08:33.519 crypto in and by itself is hard enough as it is so if you add pointers and memory
00:08:40.560 management to the mix this is just asking for trouble because it's just way too complicated
00:08:45.920 and if you don't believe me you should look at recent vulnerabilities because i would bet that
00:08:52.480 probably half of them are related to implementation issues and not crypto issues
00:08:59.279 so i think that we would rather need something that's high level that takes care of all of this for us and gives us
00:09:05.600 just enough control to do the things that we want and of course this would be ruby
00:09:12.160 so i've discussed this with other people on the core team specifically with hiroshi
00:09:19.680 and we both agreed that ruby is probably currently not the best language to
00:09:24.720 implement crypto because there are some things that are hard to do but i think we can fix all those
00:09:31.120 problems while we're on the way and binyo which i want to talk about later
00:09:36.640 would be one of those things that fixes the problems
00:09:42.160 so now i want to show you some of the things that i implemented in krypt which i think are
00:09:47.920 worth investigating some interesting things happened there and the first i want to talk about is
00:09:54.080 the asn.1 parser that i implemented from scratch so asn.1 for those of you who don't know
00:10:00.959 it you can think of it as xml for crypto it defines
00:10:07.200 data structures in binary format and it's probably used yeah almost
00:10:13.040 everywhere in crypto and because of that it's also important that this better be fast
00:10:20.560 one of the problems that we currently have in openssl is that we can't process asn.1 data
00:10:27.440 in a streaming manner and this is usually fine because the data is
00:10:33.440 coming in little chunks but as soon as you start for example signing a database log or a web server
00:10:40.079 log then you're running into trouble because you can't parse the whole thing into memory anymore
00:10:46.480 so i looked into other parsers other parser technologies and because asn.1 and xml are so similar
00:10:54.720 i ended up looking into xml parsers and if you think about which language has
00:11:01.440 probably done everything with xml in the past it's java so
00:11:07.920 their xml parsers better be fast and once i looked into their parser
00:11:14.000 technologies i noticed that many of them were event-based but i don't really like event-based parsers
00:11:21.200 because all those callbacks they rip you out of the context and you don't know what's going on after a
00:11:28.079 time so i found pull parsers pull parsers
00:11:34.160 look like non-streaming parsers but the api is
00:11:40.160 very similar and so the principle is that you decide when
00:11:46.720 you want to pull the next token from the stream instead of that your token is being pushed to a callback and so you're
00:11:53.360 in control when you want to do things when you want to process your tokens
00:11:58.800 so the api is pretty simple you have this next token method that you call in your parser you get a token on that
00:12:05.920 and if you want to process this token you can call io on it and you would get a stream and
00:12:11.519 could process this stream so it's actually pretty easy to achieve
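The pull principle can be sketched in plain Ruby with a toy tag-length-value stream. This is an illustration of the idea only, not krypt's actual parser API: `Token`, `PullParser`, and the one-byte TLV layout are made up for the example.

```ruby
require "stringio"

# Toy pull parser: the caller decides when to pull the next token
# from the stream, instead of having tokens pushed into callbacks.
Token = Struct.new(:tag, :length, :value_io)

class PullParser
  def initialize(io)
    @io = io
  end

  # Pull the next tag-length-value token, or nil at end of stream.
  def next_token
    tag = @io.read(1)
    return nil if tag.nil?
    length = @io.read(1).unpack1("C")
    Token.new(tag.unpack1("C"), length, StringIO.new(@io.read(length)))
  end
end

# Two TLV records: tag 0x04 carrying "abc", tag 0x02 carrying "hi"
data = StringIO.new("\x04\x03abc\x02\x02hi".b)
parser = PullParser.new(data)
while (token = parser.next_token)
  # We control when each token's value stream is consumed.
  puts "tag=#{token.tag} value=#{token.value_io.read}"
end
```

Because the loop drives the parser, a consumer can skip tokens it does not care about, which is exactly what makes streaming over large inputs practical.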
00:12:19.040 the next thing that i implemented for asn.1 which i found important is
00:12:25.200 i wanted to have an easy way to create asn.1 data structures because they're used in a lot of places so
00:12:32.880 it should be easy to create them and if you look at this it looks very
00:12:39.279 familiar if you're familiar with active record and similar things you would define your fields
00:12:46.959 and yeah you can declare data data classes there and as long as you're
00:12:53.600 in ruby you can deal with normal ruby classes strings integers and so on and
00:12:58.639 only when you serialize this stuff it will be transferred into the format
00:13:04.000 that is expected by asn.1 and currently if you want to do something similar to this you
00:13:10.560 would have to do this manually and so you end up writing a lot of boilerplate code which is probably
00:13:17.279 really error prone because you end up copy pasting so much and with templates you get all of this
00:13:23.760 for free because this dsl already provides the parsing and serialization
00:13:28.800 methods for you and all you have to do is declare your classes and you can start parsing and
00:13:34.079 encoding right away another
00:13:39.600 specific feature of this parser is that it does lazy parsing and what i mean by this is that i cache the original
00:13:46.959 encoding once i start parsing data and there's a good reason for this
00:13:52.240 because bouncy castle not too long ago they adopted what's called indefinite
00:13:57.279 length encodings for signatures and they had a good reason to do this because this is what eventually enables you to
00:14:04.720 process signatures in a streaming fashion but there's a problem with this because
00:14:11.120 indefinite length encodings they're no longer unique they're what's called ber encodings
00:14:17.120 and as opposed to der encodings which are unique they are not and the problem is that if
00:14:23.680 you have such an encoding you parsed it and you want to re-encode it using openssl or any other library
00:14:30.800 what happens is they get re-encoded to der because that's the only thing the parser knows
00:14:36.480 what to do in that situation and this is really bad because what happened to me in the past is that
00:14:43.760 this potentially breaks signatures and that's something you don't want to happen
00:14:50.000 so the only way we can deal with this is actually to cache the original encoding
00:14:56.240 and now let's see how this works in practice let's consider we have a very simple data structure a
00:15:02.800 that consists of two elements b and c so if you just parse the data and
00:15:08.160 re-encode it right away again what happens is it will just cache the entire encoding and just dump it out again
00:15:14.480 that's all that happens only if you start accessing the fields
00:15:20.320 we will start to interpret the inner encodings and once we
00:15:25.360 for example um imagine we access c or b here then we will interpret b and c's
00:15:32.720 encodings and now that we got their encodings we can actually discard the outer encoding
00:15:38.560 because the outer encoding just consists of the encodings of b and c
00:15:44.560 so once we've done that we've discarded the outer encoding and
00:15:49.759 if we start writing it out again we can simply now write out the cached encodings of b
00:15:55.759 and c and things get really interesting once we start modifying those data structures
00:16:02.240 let's imagine we have an a and now we want to assign a new value to one of the fields in order to do this we will first
00:16:09.680 need to interpret the encodings of b and c then we can discard the encoding of c because we now assign a new value
00:16:17.199 and we haven't got an encoding yet and only if you start encoding this again then we
00:16:23.759 can compute the new encoding of c on the fly and in subsequent attempts to encode this
00:16:30.720 we can just write out the new cached encoding and that's yeah pretty much how this works and once
00:16:37.519 you start caching stuff you're always afraid of how
00:16:43.839 this will work with memory consumption because it can grow quite high but the cool fact about
00:16:50.880 this here is that since we're able to discard outer encodings when we go further inside the
00:16:57.040 data structure it's a fact that at all times we just stay
00:17:02.880 below two times the memory that we would need if we didn't cache anything at all and so i think this is pretty nice
00:17:08.880 because we know guaranteed that our memory consumption is bounded above
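A rough plain-Ruby sketch of the caching behavior described above, with made-up class and method names (this is not krypt's real implementation): the outer encoding is dumped verbatim until a field is touched, and a reassigned field loses its cached encoding until it is re-encoded on the fly.

```ruby
# Sketch of lazy parsing: keep the raw encoding cached, drop the outer
# cache once inner fields are decoded, and drop a field's cache when
# it is reassigned so it gets re-encoded lazily.
class LazyNode
  def initialize(encoding, field_encodings)
    @encoding = encoding          # cached outer encoding
    @raw_fields = field_encodings # cached inner encodings, per field
    @fields = nil                 # decoded values, produced on demand
  end

  # Accessing a field forces decoding and discards the outer cache.
  def [](name)
    decode_fields!
    @fields[name]
  end

  # Assigning invalidates only that field's cached encoding.
  def []=(name, value)
    decode_fields!
    @fields[name] = value
    @raw_fields[name] = nil
  end

  # Untouched data is dumped straight from the cache; changed fields
  # are re-encoded on the fly (here trivially: the value itself).
  def encode
    return @encoding if @encoding
    @raw_fields.map { |name, raw| raw || @fields[name] }.join
  end

  private

  def decode_fields!
    return if @fields
    @fields = @raw_fields.transform_values(&:dup) # "decoding" is a no-op here
    @encoding = nil # outer cache is now redundant
  end
end

node = LazyNode.new("XY", { b: "X", c: "Y" })
puts node.encode # parse + re-encode without touching fields: pure cache dump
node[:c] = "Z"   # touching and modifying discards the affected caches
puts node.encode # rebuilt from b's cached encoding plus c's new value
```

Since a field's raw encoding is dropped as soon as its decoded form (or a replacement value) exists, no byte is ever held twice for long, which is where the factor-of-two memory bound comes from.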
00:17:16.720 it's also nice that this approach is really lenient so whenever you write a parser everybody
00:17:22.640 recommends you to do it as lenient as possible and
00:17:28.000 i've had it happen in the past that i wanted to validate a signature of a certificate
00:17:33.280 but my parser rejected this because there was some date field wrong
00:17:38.400 and i wasn't really interested in the date field so i just wanted to validate the signature and you can do this with
00:17:44.240 this parser only the stuff that you're interested in will be validated
00:17:49.600 and of course all this caching has a huge impact on performance
00:17:54.880 i did some benchmarks there that i want to present to you so the red stuff is the parser and
00:18:03.039 the black stuff is the existing open ssl implementation
00:18:08.880 so yeah i like this one and if you look at jruby similar picture and
00:18:15.440 since this was so fast i was getting curious and i wanted to know how it would um stand up to native code
00:18:23.440 and this is what happened the only library that was able to keep up with this was
00:18:28.640 java's built-in security library but all the other libraries that i tested were actually
00:18:34.559 magnitude slower and so that's what i felt like
00:18:45.520 so but i think what's even more important than that is this stuff is so fast is the fact that
00:18:52.080 we have no outliers anymore like we had for rubinius in this one slide so we have similar numbers
00:18:58.880 although everything is written in a different language but because it follows the same design principles we get
00:19:05.760 comparable numbers there and that's pretty amazing okay so the second thing i want to
00:19:11.919 present to you is fuzzbert the background there is that i wanted krypt to be
00:19:18.160 really really well tested so i think testing should be priority anyways but
00:19:23.280 especially for a security project and that's something that i don't see
00:19:28.480 with other crypto libraries they have some tests but it's not really really well tested
00:19:34.080 so what i do is i have the usual suspects there
00:19:39.280 i try to include official test vectors in the test i do code coverage not only
00:19:44.320 for ruby but also for c in java code it's on travis of course and for c i to
00:19:51.760 find memory leaks and all those stuff i also included valgrind but the problem is with testing
00:19:59.120 we cannot test exhaustively it's an exponential exponentially hard problem
00:20:04.880 and to see why even with the most simple methods for example if you imagine that
00:20:11.520 this method here that arc could only be take on internship values we still couldn't test it exhaustively because
00:20:18.240 there are infinitely many integers so what we need is a heuristic something
00:20:24.160 that covers a lot of ground while not taking up too much time and one of those heuristics is random
00:20:31.360 testing or fussing as it's also called so random testing means you generate
00:20:36.400 random data shoot it at your app and see what happens and unfortunately although it's been
00:20:42.880 around for quite a while now people don't really seem to like it so
00:20:48.000 the arguments often yeah it crashes but come on nobody would ever
00:20:53.120 send such data and there's this general feeling of it's not fair because the machine generated
00:20:59.760 it but i actually think that's the real strength of random testing because it has no bias
00:21:06.080 so as a developer you're always biased when writing tests because you think
00:21:11.360 you know where to look for trouble but you probably omit places that would
00:21:16.480 be interesting too and that's also what random testing tends to find it tends to find those
00:21:22.559 weird cases and that's a good thing because hackers do exactly the same thing they will use
00:21:28.960 fuzzing to find vulnerabilities in your application so it's good if you can find them
00:21:34.320 before they do and yeah also your users might find
00:21:39.679 errors that you never thought of so just ask my mom she's found a lot of bugs and
00:21:44.960 windows that i never thought possible and because a lot of the stuff happens
00:21:52.240 in an automated way we can also cover a lot more ground than we could usually in
00:21:57.520 less time so in its most simple form we would just shoot completely random data at the app
00:22:04.559 and this is probably not what we want because it means we're scratching a lot
00:22:11.200 on the surface so we're wasting a lot of time with data that's that we already know will be rejected so
00:22:18.559 we probably don't get the edge cases that are further within the application so there's a trade-off between
00:22:24.720 completely random data and test cases that apply more structure to the data
00:22:29.919 and i believe that in order to have a good random testing suite we need both
00:22:36.240 and fuzzbert is something that aims to help you with this
00:22:41.520 so what it is it's something that looks probably familiar if you know rspec
00:22:46.960 you have this fuzz directive to declare your tests you have one of those deploy blocks that
00:22:53.520 tells fuzzbert how to send the data to your application so you're
00:22:59.520 really flexible there you it's not just for example targeted at
00:23:04.960 web applications then you have several of those data blocks that generate the data and there
00:23:11.200 you have you're free to choose how much structure you want to apply for example this first
00:23:16.640 line the first data block that's producing completely random data
00:23:22.480 and the other supply more structure so that's fine as long as you're working with binary protocols but as soon as you
00:23:29.760 want to fuzz let's say a web app or anything that's string based what you actually want is to have some
00:23:36.640 form of template support because you're dealing with strings that's also possible with fuzzbert i
00:23:43.520 included this very simplistic templating language you can see it there
00:23:48.559 in the middle in red that's a template for producing json data and you can assign variables using dollar
00:23:56.240 and curly braces and after that you can assign generators that generate random
00:24:02.080 data to those variables so what's nice about
00:24:08.080 the testing procedure in general is that it runs in a separate process because
00:24:14.000 you want to be able to deal with cases when your vm would crash entirely
00:24:19.679 and so threats were out of the picture we need to do this in separate processes
00:24:25.039 everything happens in memory so there's no marshalling or unmarshaling and this of course speeds up things a lot and
00:24:31.919 it's only when something fails that those particular cases will be persisted
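Stripped of FuzzBert's DSL, the core loop is just: generate random input, feed it to the target, treat controlled rejections as fine, and persist anything that crashes unexpectedly. A minimal plain-Ruby sketch of that loop (this is not FuzzBert's API, and the `parse_length_prefixed` target is invented for the example):

```ruby
# A toy target: expects a one-byte length prefix followed by that many
# bytes, and rejects malformed input with a controlled ArgumentError.
def parse_length_prefixed(data)
  len = data.unpack1("C")
  raise ArgumentError, "empty input" if len.nil?
  body = data.byteslice(1, len)
  raise ArgumentError, "truncated" if body.nil? || body.bytesize < len
  body
end

failures = []
200.times do
  input = Random.bytes(rand(0..8)) # completely random data, varying length
  begin
    parse_length_prefixed(input)
  rescue ArgumentError
    # controlled rejection of malformed input: fine
  rescue => e
    failures << [input, e] # unexpected crash: persist the failing case
  end
end
puts "unexpected failures: #{failures.size}"
```

Real suites mix in data blocks with more structure (valid prefixes, templated payloads) so the fuzzer reaches past the input validation and into the deeper edge cases.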
00:24:38.240 now that we've talked about all of this the question is does it actually work and oh yes it does
00:24:44.559 i can only recommend this course at udacity they cover a lot of random testing more
00:24:50.080 than i can do here and it's pretty interesting you should check that out
00:24:55.919 so some of the arguments against random testing are that it's not scientific there's no scientific foundation to it
00:25:02.559 but those people tend to forget that there's no scientific foundation for traditional testing either
00:25:08.799 and another argument is that you need to still need to know what you're doing but
00:25:15.200 i think you always should know what you're doing otherwise you're screwed anyway
00:25:20.720 and if you really want science then i can give you some ideas you could start
00:25:26.159 modeling failure arrival using for example an exponential distribution there
00:25:31.520 and then you start measuring the expected time until one of the tests fails
00:25:36.960 and you can dynamically update your tests by doing some hypothesis testing so if one test
00:25:43.279 takes too much time compared to the average time you could kick it out and take in a new one
00:25:49.200 and so you could start off with a basis of just a few tests and mutate them on the
00:25:56.080 way and keep updating them and yeah this is the dream of having completely automated tests with just a
00:26:02.880 small basis of tests so if there are some r people amongst you
00:26:08.720 please do this i want to encourage you to play with this start fuzzing whatever
00:26:15.760 you like you could for example start fuzzing the ruby parser you could start fuzzing rails you can
00:26:22.720 even use fuzzbert to fuzz command line tools so it's not just ruby
00:26:29.120 only and i really believe that fuzzing is really the next step of testing and we
00:26:34.720 should all be doing this so my next showcase would have been
00:26:41.200 binyo but um i've been working on something lately that i thought would probably
00:26:46.799 be even more interesting since binyo is just vaporware right now so
00:26:53.840 what i want to talk about is hashing not this hashing but hashes in ruby so
00:27:01.039 let's just think about where our hashes used in your everyday programming and
00:27:07.919 the question is rather where aren't they used i think every real world application uses hashes somewhere
00:27:15.120 there's been this blog post by charlie about how we should avoid hashes because
00:27:21.200 they're only trouble but i think we all started to love them and would like to use them
00:27:28.720 in our applications so i don't think we can really get rid of them now what i want to talk about is
00:27:36.320 maybe some of you know about this last year there was this hashdos thing presented at chaos
00:27:42.240 computer club where people were able to
00:27:48.480 mount denial of service attack on the hash implementations of programming languages
00:27:54.880 so the problem was it was quite easy to produce collisions for general purpose
00:27:59.919 hash functions and what was surprising is that this hasn't been a new
00:28:06.080 thing it's been around for quite a while back in 2003
00:28:11.600 mr crosby found this but at the time he only
00:28:17.360 targeted perl so that's why basically everybody else seemed to ignore this and it was only fixed for
00:28:23.760 perl at the time so but last year we decided to fix this for good
00:28:29.600 in every other language too and the fix that was proposed was to
00:28:34.640 randomize the hash function so why because this is something that was already inclined in
00:28:42.159 the book introduction to algorithms they call it universal hashing and universal hashing means you would
00:28:48.799 just pick a random hash function from a random family of hash functions and this will
00:28:54.720 give you an upper bound to the collision probability now that's all well but there's a
00:29:00.799 problem with this thinking because universal hashing explicitly assumes
00:29:07.520 the hash function to act like a pseudo random function pseudorandom function means that the output is not
00:29:14.159 distinguishable from a real random function so if there's somebody rolling a die
00:29:20.159 and your pseudo random function you shouldn't notice a difference the problem is that basically every
00:29:26.960 general purpose hash function is not pseudorandom and just randomizing the seed as we did
00:29:32.640 last year is not good enough and to see why imagine this very stupid hash function
00:29:38.960 that takes in a random seed but always outputs 42. so this will never be random
00:29:44.320 regardless of how good your seed is so it turns out that jean-philippe aumasson who's of sha-3 fame he invented
00:29:53.039 blake and daniel j bernstein you probably know him too they were working on a hash function and
00:30:00.080 by the way you should follow these guys they're really good and while they were working on this hash
00:30:05.760 function they found out how to produce multi-collisions for murmur hash
00:30:11.600 in its versions two and three and they were even able to produce this with the fix applied last year of randomizing the
00:30:19.120 seeds and the problem is those hash functions are used in c ruby j ruby and rubinius
00:30:26.000 in some form and it simply doesn't matter what the random seed is and how good
00:30:31.919 it is you can still produce collisions at will and yeah i would have loved to show you
00:30:38.880 this but you all know how it went but i will probably
00:30:43.919 yeah i have some announcement later on about this so
00:30:49.520 when i heard of this and talked to jean-philippe i was afraid that this would be turning out as a ruby thing and
00:30:56.240 that people would start claiming yeah it's a ruby problem so i
00:31:01.600 was looking into who else could we probably target maybe some language that always claimed to be also secure
00:31:08.880 and it's used in a lot of enterprise applications and you guessed right
00:31:14.880 so good news they're affected too
00:31:20.159 so okay we can produce collisions now but what good is that well the problem is
00:31:26.559 that if you think about web servers what they do is they create hashes from
00:31:31.760 user input so if you send form encoded data or json whatever
00:31:37.039 it causes a hash to be filled with this data and the problem is that the worst case
00:31:42.240 behavior of a hash table insertion is linear as opposed to its average time
00:31:47.840 which is constant so this means that if you insert n values you will end up
00:31:53.200 having quadratic time instead of linear time and this is bad because with very little
00:31:59.120 effort you can actually take down a web server now what can we do about this
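The degradation is easy to reproduce in plain Ruby with the "always outputs 42" hash from earlier: a key class whose `hash` is constant forces every entry into one bucket, so n insertions take quadratic time. A toy demonstration of the effect, not the actual attack payload:

```ruby
require "benchmark"

# A key whose hash is constant: every key collides, so each Hash
# insertion degrades from amortized O(1) to O(n), and n insertions
# to O(n^2) -- the core of the hashdos attack.
class CollidingKey
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def hash
    42 # constant: all keys land in the same bucket
  end

  def eql?(other)
    other.is_a?(CollidingKey) && other.id == @id
  end
end

n = 2_000
colliding = Benchmark.realtime do
  h = {}
  n.times { |i| h[CollidingKey.new(i)] = true }
end
normal = Benchmark.realtime do
  h = {}
  n.times { |i| h[i] = true }
end
puts "colliding keys: #{colliding.round(4)}s, normal keys: #{normal.round(4)}s"
```

With colliding keys every insert has to walk the one overfull bucket comparing with `eql?`, which is why the first timing grows quadratically while the second stays effectively linear.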
00:32:05.279 there have been some attempts and that's also i think the philosophy that some of the programming languages last
00:32:11.919 year took we could start restricting parameters we could start looking into libraries
00:32:18.480 each and everywhere but i think the chance that
00:32:24.000 somebody will get one of those fixes wrong is just too high so i think we should fix it where it happens which is in the
00:32:30.240 ruby hash function so one way out would be to use a
00:32:35.760 cryptographic hash function but as we all know md5 or sha-1 or whatever
00:32:41.279 they're just too slow so what we actually want is to have a really fast cryptographic hash function
00:32:48.320 that renders this attack infeasible and if you want to know more about how this works and if you
00:32:55.440 actually want to see the demo jean-philippe is going to give a presentation next week in switzerland
00:33:02.799 and i'm going to be there and i will demonstrate how to use this
00:33:08.320 attack on a real world rails application and afterwards we are also going to publish this code so siphash is what you get if you
00:33:16.399 apply real cryptography to hash functions and yeah my final words
00:33:23.039 let's make an effort to replace openssl thanks for having me here sorry for the
00:33:28.480 problems i want to thank you all please visit my code please have a look at krypt if you can bear it you can read my
00:33:35.600 blog and follow me on twitter write me an email thanks