Regex performance in Go and Rust. Which is faster? 🤔
Introduction
In the previous blog entry, we looked at the implementation details of PyRedactKit: improving performance, reducing code smells by refactoring, and using automated testing and code scanning with GitHub Actions. The heart of the CLI app is working with regular expressions to identify any sensitive data found in ASCII text files and mask it.
The initial constraints I had to work with were:
- The tool needed to run cross-platform
- New features needed to be prototyped rapidly and pushed out ASAP
- Most of the target audience have Python installed on their devices
The tool works great! It has been packaged and has started rolling into production. So what next? I had heard good things about Go and Rust being performant as compiled languages: Go being good at concurrency for web services and Rust being good at systems performance. However, there is very limited information available on the regular expression performance of each for my specific use case.
Thus, I decided to test things out myself and document my findings in this blog post. Hopefully this will help someone and give some insight for decisions on future projects that involve regex.
Baseline setup
The baseline setup for both languages is simple. We will be testing the following with the standard regex library of each language, Go and Rust:
- Regex Match
- Regex Replace
For these two operations, I tried to find the closest equivalent regex function in each language to benchmark. The setup is simple: there is a chunk of data, one function matches the IP addresses in it, and another function replaces each matched IP address with some other string. Each of the two functions is run in a loop and timed.
You can find the full code repository for it here. https://github.com/brootware/regex-performance-benchmark
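As a rough illustration of the setup described above, here is a minimal Go sketch using only the standard `regexp` package. It is not the code from the repository: the IP pattern, the sample data, the replacement string, the iteration count and the helper names (`countMatches`, `maskIPs`, `timeIt`) are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// ipPattern is a deliberately loose IPv4 matcher (assumption: the real
// benchmark may use a stricter pattern). It also accepts out-of-range
// octets such as 999.999.999.999.
var ipPattern = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)

// countMatches returns how many IP-like strings appear in data.
func countMatches(data string) int {
	return len(ipPattern.FindAllString(data, -1))
}

// maskIPs replaces every IP-like string in data with a placeholder.
func maskIPs(data string) string {
	return ipPattern.ReplaceAllString(data, "[REDACTED]")
}

// timeIt calls fn the given number of times and returns the total elapsed time.
func timeIt(iterations int, fn func()) time.Duration {
	start := time.Now()
	for i := 0; i < iterations; i++ {
		fn()
	}
	return time.Since(start)
}

func main() {
	// Hypothetical sample data, standing in for the "chunk of data".
	data := strings.Repeat("host 192.168.1.10 talked to 10.0.0.1 at noon\n", 1000)
	const iterations = 100 // arbitrary loop count for this sketch

	matchTime := timeIt(iterations, func() { countMatches(data) })
	replaceTime := timeIt(iterations, func() { maskIPs(data) })

	fmt.Printf("match:   %v for %d iterations\n", matchTime, iterations)
	fmt.Printf("replace: %v for %d iterations\n", replaceTime, iterations)
}
```

Compiling the pattern once, outside the timed loop, keeps the measurement focused on matching and replacing rather than on regex compilation.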
Running the benchmark
The scripts are first run five times without being compiled into a standalone binary, and the results are averaged to smooth out any outliers.
Next, I repeated the same process with compiled binaries for each language. Do note that these binaries are compiled for release, meaning they are optimized builds with debugging information and other development overhead stripped out.
Results
You can check out the benchmark results below. So far this has only been run on my MacBook. (I am looking to run it on some of my cloud instances when I have time.) In the meantime, you can also run it to see the results for yourself.
https://docs.google.com/spreadsheets/d/1Fg0r0emUTjItqlrCrVgJh6dgUwMJI0GhvSDsrgGpdD8/edit?usp=sharing
The results for running without compilation are fairly simple. Go performed better than Rust by a second. This was a bit surprising, as I thought Rust would do better according to the internet anecdotes. So I started rooting for Go.
However, Rust blew Go out of the water as a compiled binary. The regular expression matching and replacing functions took less than a second to run, which is really incredible, while Go as a binary still took around the same time as before!
I hope this gave you some insight into the regular expression libraries of each language. So for the next project you are going to work on that involves regular expressions, you might want to use Rust.