Often, as programmers, we need to check whether a substring exists in a larger string. Many programmers instinctively reach for regex matching, but I have often wondered whether that is really the best approach, particularly in Ruby.
One day, in a Ruby-focused Skype group chat, a friend of mine asked the other members for a snippet of code that would take a hostname, check whether it contained the substring ‘qa’, and return ‘qa’ if so, or ‘prod’ (for production) otherwise. My knee-jerk reaction was the same as everyone else's: use a regex. But I wondered whether it would be better to use a compiled regex, an interpreted one, or even the built-in String#include? method. So I decided to do some benchmarking.
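For reference, here is a minimal sketch of the snippet my friend asked for, written both of the ways the chat was weighing. The method names environment_via_regex and environment_via_include are hypothetical, chosen only for illustration:

```ruby
# Hypothetical helpers (names are illustrative only): return "qa" if the
# hostname contains the substring "qa", otherwise "prod".

# The knee-jerk answer: a Regexp literal matched with =~.
def environment_via_regex(hostname)
  hostname =~ /qa/ ? "qa" : "prod"
end

# The alternative: the built-in String#include? method.
def environment_via_include(hostname)
  hostname.include?("qa") ? "qa" : "prod"
end

%w[someserverqa someserverprod].each do |host|
  puts "#{host}: #{environment_via_regex(host)} / #{environment_via_include(host)}"
end
```

Both are plain substring checks, so they give the same answer for any hostname; the question the benchmark tries to answer is only which one is cheaper.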
Ruby ships with a great built-in module that makes benchmarking very simple: Benchmark. I often use it to check whether something is performant, even at the Rails level! In this post I use only a very small subset of the module's functionality, so if you're interested, be sure to check out its documentation for the other options.
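For a quick one-off measurement, the module also offers Benchmark.realtime and Benchmark.measure. A minimal sketch, with an arbitrary iteration count:

```ruby
require "benchmark"

iterations = 100_000 # arbitrary iteration count for this sketch

# Benchmark.realtime returns the elapsed wall-clock time in seconds, as a Float.
elapsed = Benchmark.realtime do
  iterations.times { "someserverqa".include?("qa") }
end
puts "include?: #{elapsed.round(4)}s"

# Benchmark.measure returns a Benchmark::Tms holding user, system, total and
# real (wall-clock) times; printing it uses the same columns as bm/bmbm rows.
tms = Benchmark.measure do
  iterations.times { "someserverqa" =~ /qa/ }
end
puts tms
```

realtime is handy when only wall-clock time matters; measure also separates user and system CPU time, which is what the report tables below show.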
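Anyway, my goal was to benchmark a simple substring check three ways: with a "compiled" regex (a Regexp object created once with Regexp.new and reused), with an "interpreted" regex (a Regexp literal written inline and matched with =~), and with the String#include? method. So I wrote the following Ruby script: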
Code for benchmarking (simple_string_matching_benchmark.rb):
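```ruby
require "benchmark"

# "Compiled" regex: a Regexp object built once with Regexp.new and reused below.
regex_compiled = Regexp.new("qa")
regex_compiled.match("qa") # one call before timing starts; its result is discarded

Benchmark.bmbm(10) do |x|
  # "Interpreted" regex: an inline Regexp literal matched with =~.
  # (A literal without interpolation, such as /qa/, is also built only once,
  # when the script is parsed.)
  x.report("1000000 times positive regex") do
    1000000.times { "someserverqa" =~ /qa/ ? "qa" : "prod" }
  end

  x.report("1000000 times negative regex") do
    1000000.times { "someserverprod" =~ /qa/ ? "qa" : "prod" }
  end

  x.report("1000000 times positive regex compiled") do
    1000000.times { regex_compiled.match("someserverqa") ? "qa" : "prod" }
  end

  x.report("1000000 times negative regex compiled") do
    1000000.times { regex_compiled.match("someserverprod") ? "qa" : "prod" }
  end

  x.report("1000000 times positive #include?") do
    1000000.times { "someserverqa".include?("qa") ? "qa" : "prod" }
  end

  x.report("1000000 times negative #include?") do
    # Caution: receiver and argument are swapped relative to the other
    # negative cases; the like-for-like check would be
    # "someserverprod".include?("qa"). Here the 14-character argument is
    # longer than the 12-character receiver, so the search may end early.
    1000000.times { "someserverqa".include?("someserverprod") ? "qa" : "prod" }
  end
end
```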
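Note the use of .bmbm, which tells the module to perform a rehearsal run: every report is run twice, first as a rehearsal so that the runtime environment (garbage collection in particular) can settle, and then for real, with GC.start called before each real timing. The second pass is the one to compare. The 10 passed to bmbm is only a minimum label width; the column widens to fit the longest label.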
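bmbm also returns the timings of the real (second) pass as an Array of Benchmark::Tms objects, one per report, which is handy if you want to compare the numbers in code instead of reading them off the screen. A small sketch, with an arbitrary iteration count and illustrative labels:

```ruby
require "benchmark"

iterations = 100_000 # arbitrary iteration count for this sketch

# No width argument: bmbm sizes the label column to the longest label.
results = Benchmark.bmbm do |x|
  x.report("=~ literal") { iterations.times { "someserverqa" =~ /qa/ } }
  x.report("include?")   { iterations.times { "someserverqa".include?("qa") } }
end

# Each element is the Benchmark::Tms from the real pass, labelled as reported.
results.each do |tms|
  puts format("%-12s %.4fs real", tms.label, tms.real)
end
```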
The results were as follows:
```
Rehearsal -------------------------------------------------------------------------
1000000 times positive regex            0.400000   0.000000   0.400000 (  0.398280)
1000000 times negative regex            0.420000   0.000000   0.420000 (  0.424669)
1000000 times positive regex compiled   0.990000   0.020000   1.010000 (  1.005177)
1000000 times negative regex compiled   0.500000   0.000000   0.500000 (  0.496037)
1000000 times positive #include?        0.280000   0.000000   0.280000 (  0.277431)
1000000 times negative #include?        0.240000   0.000000   0.240000 (  0.248257)
---------------------------------------------------------------- total: 2.850000sec

                                            user     system      total        real
1000000 times positive regex            0.400000   0.000000   0.400000 (  0.408893)
1000000 times negative regex            0.450000   0.000000   0.450000 (  0.447820)
1000000 times positive regex compiled   1.010000   0.000000   1.010000 (  1.009081)
1000000 times negative regex compiled   0.500000   0.000000   0.500000 (  0.502232)
1000000 times positive #include?        0.280000   0.000000   0.280000 (  0.275711)
1000000 times negative #include?        0.250000   0.000000   0.250000 (  0.247974)
```
Conclusion
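It turned out that for a simple substring presence check, #include? beat everything else: it took roughly 30% to 45% less time than the regex literal (0.28s vs 0.40s in the positive case, 0.25s vs 0.45s in the negative case) and at most half the time of the compiled regex. More surprising was that the compiled regex was even slower than the interpreted one, especially in the positive case (1.01s vs 0.40s). A Regexp literal such as /qa/ is itself built only once, when the script is parsed, so the gap probably has less to do with compilation than with calling Regexp#match, which hands a MatchData object back to the caller; fittingly, the gap is largest in the positive case, the only one that produces a match.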
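If you are on Ruby 2.4 or later, Regexp#match? is worth adding to the comparison, since it answers true/false without building a MatchData object. The sketch below is an untested-by-me extension of the benchmark above, so no timings are claimed for it. It first checks that every variant classifies both hostnames the same way, always with the hostname as the receiver and "qa" as the search target, and it keeps the hostnames in variables so every variant works on the same String objects. The iteration count is arbitrary:

```ruby
require "benchmark"

# Sketch only: assumes Ruby 2.4+ for Regexp#match?, which returns true/false
# without building a MatchData object or setting $~.
iterations = 200_000 # arbitrary; raise it for steadier numbers
regex = Regexp.new("qa")
hosts = { "positive" => "someserverqa", "negative" => "someserverprod" }

# Sanity check before timing: all four variants must agree on each hostname.
hosts.each do |kind, host|
  expected = kind == "positive" ? "qa" : "prod"
  answers = [
    host =~ /qa/ ? "qa" : "prod",
    regex.match(host) ? "qa" : "prod",
    regex.match?(host) ? "qa" : "prod",
    host.include?("qa") ? "qa" : "prod",
  ]
  raise "unexpected answers for #{host}: #{answers.inspect}" unless answers.all? { |a| a == expected }
end

# bmbm returns the Benchmark::Tms results of the real (non-rehearsal) pass.
results = Benchmark.bmbm do |x|
  hosts.each do |kind, host|
    x.report("#{kind} =~ literal")      { iterations.times { host =~ /qa/ ? "qa" : "prod" } }
    x.report("#{kind} Regexp#match")    { iterations.times { regex.match(host) ? "qa" : "prod" } }
    x.report("#{kind} Regexp#match?")   { iterations.times { regex.match?(host) ? "qa" : "prod" } }
    x.report("#{kind} String#include?") { iterations.times { host.include?("qa") ? "qa" : "prod" } }
  end
end
```

Running the sanity check before timing guards against comparing expressions that are not actually equivalent, which is easy to do when every variant is written out by hand.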