I am selling the source code for another of my private Regular Expression fuzzers. This is a much more advanced fuzzer than the previous one I put up for sale. It is also implemented in Python and can be used to generate random valid(-ish) regular expression strings in 6 different formats. It is designed to be easily extensible in case you want to add more formats or add missing features to the existing code. I've included Python scripts that use it to generate tests for fuzzing the regular expression engines in Perl, Ruby, Python, ECMAScript, JScript and VBScript.
This fuzzer is designed to allow generation of regular expressions that adhere to various different syntaxes. It is modular and easy to adjust and extend if you want to add a new syntax or create a variant of an existing syntax. It is perfect if you want to do some serious fuzzing of regular expression engines.
You can buy a license for the non-exclusive use of this fuzzer for as little as 250€ at https://license.
Once you have paid for the license, you can download the source in a .zip file from a link in the license details. This download contains the main Python class, its helper classes and the example Python scripts described below.
Please read the full license before downloading the fuzzer source.
The core fuzzer code is built around the main cRegExpFuzzer3 class. Various features of the regular expression syntax are implemented in separate classes. Only the cRegExpFuzzer3 class is meant to be instantiated directly; the rest are helper classes that are instantiated by cRegExpFuzzer3. Here's a list of all relevant files:
cRegExpFuzzer3.py - The main class
cRegExpAnchor.py - Implements "anchors": ^, $, \b, ...
cRegExpBranchReset.py - Implements "branch resets": (?|...)
cRegExpCharacterClasses.py - Implements "character classes": \d, \w, \s, ...
cRegExpComment.py - Implements "comments": (?# comment)
cRegExpFlagsModifier.py - Implements "flags modifiers": (?flags) and (?flags:...)
cRegExpGroup.py - Implements "groups": (...) and (?...)
cRegExpLookAroundAssertion.py - Implements "look ahead/behind": (?=...), (?<=...), ...
cRegExpSubExpression.py - Implements sub-expressions: (?>...)
cRegExpFlags.py - Implements flags: /.../gmi
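To illustrate the kind of modular layout described above, here is a minimal toy sketch. It is not the actual fuzzer code: each syntax feature lives in its own small class, and a main class decides which features are enabled for a given syntax. All names in it (cToyAnchor, cToyCharacterClass, cToyComment, cToyFuzzer, dSyntaxFeatures) and the "ToyA"/"ToyB" syntaxes are hypothetical and chosen only for illustration.

```python
import random

class cToyAnchor(object):
    # Hypothetical helper class: emits a single anchor token.
    asTokens = ["^", "$", "\\b", "\\B"]
    def fsGenerate(self, oRandom):
        return oRandom.choice(self.asTokens)

class cToyCharacterClass(object):
    # Hypothetical helper class: emits a single character class token.
    asTokens = ["\\d", "\\w", "\\s", "."]
    def fsGenerate(self, oRandom):
        return oRandom.choice(self.asTokens)

class cToyComment(object):
    # Hypothetical helper class: emits a fixed inline comment.
    def fsGenerate(self, oRandom):
        return "(?#x)"

# Hypothetical table: which feature classes each made-up syntax supports.
dSyntaxFeatures = {
    "ToyA": [cToyAnchor, cToyCharacterClass],
    "ToyB": [cToyAnchor, cToyCharacterClass, cToyComment],
}

class cToyFuzzer(object):
    # Only this class is instantiated directly; it creates the helpers itself.
    def __init__(self, sSyntax, uSeed=None):
        self.oRandom = random.Random(uSeed)
        self.aoFeatures = [cFeature() for cFeature in dSyntaxFeatures[sSyntax]]

    def fsGetPattern(self, uLength):
        # Append random feature output until at least uLength characters exist,
        # so the result is only approximately uLength long.
        sPattern = ""
        while len(sPattern) < uLength:
            oFeature = self.oRandom.choice(self.aoFeatures)
            sPattern += oFeature.fsGenerate(self.oRandom)
        return sPattern
```

In a layout like this, adding a new syntax comes down to adding one entry to the table, and adding a missing feature comes down to writing one more small class.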
When you instantiate cRegExpFuzzer3, you can tell it which syntax you want to use. You can then call the fsGetPattern(uLength) method of the created object to generate a regular expression string of approximately uLength bytes. You can also call the fsGetFlags(uLength, [bUsedInReplace]) method to generate a string of valid flags of approximately uLength bytes. bUsedInReplace is used to indicate that the flags will be used with a regular expression in a string replace operation; this can enable/disable specific flags depending on the syntax in use.
Here's an example:
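# Assumes cRegExpFuzzer3.py from the download is on the import path.
from cRegExpFuzzer3 import cRegExpFuzzer3;
oRegExpFuzzer = cRegExpFuzzer3("ECMAScript");
sPattern = oRegExpFuzzer.fsGetPattern(100);
sFlags = oRegExpFuzzer.fsGetFlags(2);
print("/%s/%s" % (sPattern, sFlags));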
The above code will output a randomly generated regular expression in ECMAScript (JavaScript) syntax that has approximately 100 characters of pattern and uses 2 flags.
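A fuzzing run usually needs many expressions of varying sizes rather than a single one. The sketch below shows one way to loop over the two methods described above. It accepts any object that offers fsGetPattern and fsGetFlags. The cStubFuzzer class is only a stand-in so the snippet runs without the licensed source, and the helper name fasGenerateLiterals, its parameters and its default size range are assumptions.

```python
import random

class cStubFuzzer(object):
    # Stand-in for the real fuzzer class, present only so this sketch runs
    # on its own; it returns fixed, non-random output.
    def fsGetPattern(self, uLength):
        return "a" * uLength
    def fsGetFlags(self, uLength):
        return "gim"[:uLength]

def fasGenerateLiterals(oFuzzer, uCount, uMinLength=10, uMaxLength=1000, uSeed=None):
    # Hypothetical helper: build uCount "/pattern/flags" strings with
    # randomly chosen pattern lengths and 0 to 3 flags.
    oRandom = random.Random(uSeed)
    asLiterals = []
    for _ in range(uCount):
        uLength = oRandom.randint(uMinLength, uMaxLength)
        sPattern = oFuzzer.fsGetPattern(uLength)
        sFlags = oFuzzer.fsGetFlags(oRandom.randint(0, 3))
        asLiterals.append("/%s/%s" % (sPattern, sFlags))
    return asLiterals
```

With the real class, the stub would be replaced by an instance such as cRegExpFuzzer3("ECMAScript").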
I've added seven Python scripts that serve as examples of how to use this fuzzer to test the regular expression engines of various scripting engines. When run, these scripts generate test code and write it to a script file. This script file can then be run in the target scripting engine to test the regular expression engine. Here's a list of commands to use each script to test a different engine:
python GeneratePerlTestCode.py & perl repro.pl
python GeneratePHPTestCode.py & php repro.php
python GeneratePythonTestCode.py & python repro.py
python GenerateRubyTestCode.py & ruby repro.rb
python GenerateSpiderMonkeyTestCode.py & js repro.js
python GenerateJScriptTestCode.py & cscript /nologo repro.js
python GenerateVBScriptTestCode.py & cscript /nologo repro.vbs
If you find a crash, you can run the repro again using BugId to automatically analyze the issue.
To show what kind of output you can expect from this fuzzer, I've generated example test code using the above commands, which you can download here.
If you run the tests, you will find that the Ruby test causes Ruby to crash with a NULL pointer reference; see this tweet for details. The ECMAScript, JScript and VBScript tests send those engines into infinite loops in which they use 100% CPU, which is a very common issue.
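Because hangs at 100% CPU are so common, it can help to run each generated test under a time limit. This is a minimal sketch using the standard subprocess module in Python 3. The helper name fsRunWithTimeout, its result strings and the time limits are assumptions and not part of the fuzzer.

```python
import subprocess
import sys

def fsRunWithTimeout(asCommand, nTimeoutInSeconds):
    # Hypothetical helper: returns "hang" if the command does not finish in
    # time, "error" if it exits with a non-zero code (which includes most
    # crashes), and "ok" otherwise.
    try:
        oResult = subprocess.run(
            asCommand,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=nTimeoutInSeconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return "hang"
    return "ok" if oResult.returncode == 0 else "error"

if __name__ == "__main__":
    # Demo using the current Python interpreter, so no other engine is needed.
    print(fsRunWithTimeout([sys.executable, "-c", "pass"], 30))
```

For one of the generated tests, the call might look like fsRunWithTimeout(["ruby", "repro.rb"], 60), assuming the engine is installed and on the path.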
The tests report any errors in parsing the regular expressions; this mostly happens when a generated regular expression is syntactically valid but triggers a code path that is not implemented, or when the engine detects that it is somehow nonsensical.
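For the Python target, this kind of check can be sketched with the standard re module: try to compile each expression, record the error if the engine rejects it, and count how many were accepted. The function name ftxCheckPatterns and its return shape are assumptions; the generated test code in the download may well differ.

```python
import re

def ftxCheckPatterns(asPatterns):
    # Hypothetical helper: returns (number of accepted patterns,
    # list of (pattern, error message) for the rejected ones).
    uValid = 0
    atsErrors = []
    for sPattern in asPatterns:
        try:
            re.compile(sPattern)
        except (re.error, OverflowError) as oError:
            # re.error covers syntax the engine rejects; OverflowError is
            # raised for repetition counts that are too large.
            atsErrors.append((sPattern, str(oError)))
        else:
            uValid += 1
    return uValid, atsErrors

uValid, atsErrors = ftxCheckPatterns(["a+b", "(", "[a-z]*", "[a"])
print("%d of %d expressions were considered valid" % (uValid, uValid + len(atsErrors)))
for sPattern, sError in atsErrors:
    print("rejected %r: %s" % (sPattern, sError))
```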
I generated data for various engines a number of times until I had a test that they could complete. This proved to be impossible for JScript and VBScript as they would invariably end up freezing with 100% CPU usage. At the end the test lets you know how many of the fuzzed regular expressions were considered valid. Below are the results for those engines that were able to complete a test at least once:
Considering that each test ran on 2808 regular expressions ranging in size from 10 to 1000 bytes, this indicates that the data the fuzzer generates is not too random, but sits right on the edge where it is random enough to potentially trigger issues.
If you have any further questions, please send an email to license@skylined.