As programmers, we often need to check whether a substring exists in a larger string. Many programmers instinctively reach for Regex matching, but I have often wondered whether that is really the best approach, particularly in Ruby.

One day, in a Ruby-focused Skype group chat, a friend of mine asked the other members for a snippet of code that would take a hostname, check whether it contained the substring ‘qa’, and return ‘qa’ if it did or ‘prod’ (for production) otherwise. My knee-jerk reaction was the same as the other members’: use Regex. But I wondered whether it would be better to use compiled Regex, interpreted Regex, or perhaps even the built-in String method include?. So I decided to do some benchmarking.
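For reference, the request itself fits in a one-line method. The following is only a minimal sketch of the String#include? variant; the method name environment_for and the sample hostnames are hypothetical and were not part of the original chat.

```ruby
# Hypothetical helper: the name environment_for is illustrative only.
# Returns "qa" when the hostname contains the substring "qa", else "prod".
def environment_for(hostname)
  hostname.include?("qa") ? "qa" : "prod"
end

puts environment_for("someserverqa")    # prints "qa"
puts environment_for("someserverprod")  # prints "prod"
```

Note that a plain substring check also matches "qa" anywhere in the name, so a production host whose name happens to contain those two letters would be classified as ‘qa’.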

Ruby ships with a great module that makes benchmarking very simple: Benchmark. I often use it to check whether something performs well, even at the Rails level! Here I use only a very small subset of the module’s functionality; if you’re interested, the Benchmark documentation covers the other options.
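As a quick illustration of how little code the module needs, here is a small sketch using two of its simpler entry points, Benchmark.realtime and Benchmark.measure. The iteration count and the strings being checked are arbitrary choices for the example, not values from the benchmark in this post.

```ruby
require "benchmark"

# Benchmark.realtime returns the elapsed wall-clock time in seconds as a Float.
elapsed = Benchmark.realtime do
  100_000.times { "someserverqa".include?("qa") }
end

# Benchmark.measure returns a Benchmark::Tms object that carries
# user, system, total, and real (wall-clock) times.
tms = Benchmark.measure do
  100_000.times { "someserverqa" =~ /qa/ }
end

puts format("include?: %.6f s wall clock", elapsed)
puts format("regex:    %.6f s user, %.6f s wall clock", tms.utime, tms.real)
```

The absolute numbers depend on the machine and the Ruby version, so they are only meaningful relative to each other within the same run.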

My goal was to benchmark a simple string pattern check using compiled Regex (a Regexp object built ahead of time with Regexp.new), interpreted Regex (an inline /qa/ literal), and the String #include? method. To do that, I wrote the following Ruby script:

Code for benchmarking (simple_string_matching_benchmark.rb):
require "benchmark"

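# Build the Regexp object once, outside the timed blocks.
regex_compiled = Regexp.new("qa")
# Run one match before any timing starts, as a warm-up.
regex_compiled.match("qa")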

Benchmark.bmbm(10) do |x|
  x.report("1000000 times positive regex"){1000000.times do
    "someserverqa" =~ /qa/ ? "qa" : "prod"
  end}

  x.report("1000000 times negative regex"){1000000.times do
    "someserverprod" =~ /qa/ ? "qa" : "prod"
  end}

  x.report("1000000 times positive regex compiled"){1000000.times do
    regex_compiled.match("someserverqa") ? "qa" : "prod"
  end}

  x.report("1000000 times negative regex compiled"){1000000.times do
    regex_compiled.match("someserverprod") ? "qa" : "prod"
  end}

  x.report("1000000 times positive #include?"){1000000.times do
    "someserverqa".include?("qa") ? "qa" : "prod"
  end}

  x.report("1000000 times negative #include?"){1000000.times do
    # Negative case: look for "qa" in a hostname that does not contain it.
    "someserverprod".include?("qa") ? "qa" : "prod"
  end}

end

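Note the use of .bmbm, which runs the whole benchmark twice: first as a rehearsal and then for real, so that one-time costs such as memory allocation and garbage collection are less likely to skew the reported numbers. The argument (10) is the minimum width of the label column.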

The results were as follows:

Rehearsal -------------------------------------------------------------------------
1000000 times positive regex            0.400000   0.000000   0.400000 (  0.398280)
1000000 times negative regex            0.420000   0.000000   0.420000 (  0.424669)
1000000 times positive regex compiled   0.990000   0.020000   1.010000 (  1.005177)
1000000 times negative regex compiled   0.500000   0.000000   0.500000 (  0.496037)
1000000 times positive #include?        0.280000   0.000000   0.280000 (  0.277431)
1000000 times negative #include?        0.240000   0.000000   0.240000 (  0.248257)
---------------------------------------------------------------- total: 2.850000sec

                                            user     system      total        real
1000000 times positive regex            0.400000   0.000000   0.400000 (  0.408893)
1000000 times negative regex            0.450000   0.000000   0.450000 (  0.447820)
1000000 times positive regex compiled   1.010000   0.000000   1.010000 (  1.009081)
1000000 times negative regex compiled   0.500000   0.000000   0.500000 (  0.502232)
1000000 times positive #include?        0.280000   0.000000   0.280000 (  0.275711)
1000000 times negative #include?        0.250000   0.000000   0.250000 (  0.247974)

Conclusion

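It turned out that for a simple substring check, #include? was the fastest option. In the final run, a million positive checks took about 0.28 seconds with #include?, compared with about 0.41 seconds for the Regex literal and about 1.01 seconds for the Regexp object. More surprising was the fact that the “compiled” Regex was slower than the “interpreted” one. A likely explanation is that the script is really comparing =~ with Regexp#match rather than two compilation strategies: Ruby compiles a literal such as /qa/ once when the code is parsed, while #match has to build and return a MatchData object on every successful match. That would also explain why the compiled case is only dramatically slower when the match succeeds.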
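One way to probe the compiled-versus-interpreted result is to hold the pattern constant and vary only the matching method. The sketch below does that under a few assumptions: it requires Ruby 2.4 or later for Regexp#match?, the iteration count is reduced to keep the run short, and the variable names are illustrative. It was not part of the original benchmark, and no timings are claimed for it here.

```ruby
require "benchmark"

host    = "someserverqa"
pattern = /qa/          # the same Regexp object is used by every regex case
n       = 200_000       # arbitrary iteration count for a quick run

# What each call returns on a successful match:
index = host =~ pattern        # Integer offset of the match (nil on failure)
data  = pattern.match(host)    # MatchData object (nil on failure)
found = pattern.match?(host)   # true or false, without setting $~

Benchmark.bmbm(10) do |x|
  x.report("=~")       { n.times { host =~ pattern } }
  x.report("match")    { n.times { pattern.match(host) } }
  x.report("match?")   { n.times { pattern.match?(host) } }
  x.report("include?") { n.times { host.include?("qa") } }
end
```

If the gap between =~ and match shows up here as well, with the same Regexp object on both sides, that would point at the matching method rather than at how the pattern was created.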
