January 15th, 2019

Reg­Exp­Fuzzer3 for sale

I am selling the source code for another of my private regular expression fuzzers. This is a much more advanced fuzzer than the previous one I put up for sale. It is also implemented in Python and can be used to generate random valid(-ish) regular expression strings in 6 different formats. It is designed to be easily extensible in case you want to add more formats or add missing features to the existing code. I've included Python scripts that use it to generate tests for fuzzing the regular expression engines in Perl, Ruby, Python, ECMAScript, JScript and VBScript.

This fuzzer is designed to allow generation of regular expressions that adhere to various different syntaxes. It is modular and easy to adjust and extend if you want to add a new syntax or create a variant of an existing syntax. It is perfect if you want to do some serious fuzzing of regular expression engines.

Where to get

You can buy a license for the non-exclusive use of this fuzzer for as little as 250€ at https://license.skylined.nl/. Simply create an account on the site, select the RegExpFuzzer3 license, and pay online with a credit card or request an invoice to pay through direct bank transfer.

Once you have paid for the license, you can download the source in a .zip file from a link in the license details. This download contains the main Python class, its helper classes, and the example Python scripts described below.

Please read the full license before downloading the fuzzer source.

The fuzzer

The core fuzzer code is built around the main cRegExpFuzzer3 class. Various features of the regular expression syntax are implemented in separate classes. Only the cRegExpFuzzer3 class is meant to be instantiated directly; the rest are helper classes that are instantiated by cRegExpFuzzer3. Here's a list of all relevant files:

  • cRegExpFuzzer3.py - The main class
  • cRegExpAnchor.py - Implements "anchors": ^, $, \b, ...
  • cRegExpBranchReset.py - Implements "branch resets": (?|...)
  • cRegExpCharacterClasses.py - Implements "character classes": \d, \w, \s, ...
  • cRegExpComment.py - Implements "comments": (?# comment)
  • cRegExpFlagsModifier.py - Implements "flags modifiers": (?flags) and (?flags:...)
  • cRegExpGroup.py - Implements "groups": (...) and (?...)
  • cRegExpLookAroundAssertion.py - Implements "look ahead/behind": (?=...), (?<=...), ...
  • cRegExpSubExpression.py - Implements sub-expressions: (?>...)
  • cRegExpFlags.py - Implements flags: /.../gmi

When you instantiate c­Reg­Exp­Fuzzer3, you can tell it which syntax you want to use. You can then call the fs­Get­Pattern(u­Length) method of the created object to generate a regular expression string of approximately u­Length bytes. You can also call the fs­Get­Flags(u­Length, [b­Used­In­Replace]) method to generate a string of valid flags of approximately u­Length bytes. b­Used­In­Replace is used to indicate that the flags will be used with a regular expression in a string replace operation; this can enable/disable specific flags depending on the syntax in use.

Here's an example:

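  # Assumption: cRegExpFuzzer3.py exports a class of the same name.
  from cRegExpFuzzer3 import cRegExpFuzzer3
  oRegExpFuzzer = cRegExpFuzzer3("ECMAScript")
  sPattern = oRegExpFuzzer.fsGetPattern(100)  # roughly 100 bytes of pattern
  sFlags = oRegExpFuzzer.fsGetFlags(2)        # roughly 2 bytes of flags
  print("/%s/%s" % (sPattern, sFlags))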

The above code will output a randomly generated regular expression in ECMAScript (JavaScript) syntax that has approximately 100 characters of pattern and uses 2 flags.

How to use

I've added seven Python scripts that serve as examples of how to use this fuzzer to test the regular expression engines of various scripting engines. When run, these scripts generate test code and write it to a script file. This script file can then be run in the target scripting engine to test the regular expression engine. Here's a list of commands to use each script to test a different engine:

  • Perl: python GeneratePerlTestCode.py & perl repro.pl
  • PHP: python GeneratePHPTestCode.py & php repro.php
  • Python: python GeneratePythonTestCode.py & python repro.py
  • Ruby: python GenerateRubyTestCode.py & ruby repro.rb
  • SpiderMonkey jsshell: python GenerateSpiderMonkeyTestCode.py & js repro.js
  • JScript: python GenerateJScriptTestCode.py & cscript /nologo repro.js
  • VBScript: python GenerateVBScriptTestCode.py & cscript /nologo repro.vbs
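
To give a rough idea of what "generate test code and write it to a script file" can look like, here is a minimal sketch for the Python target. It is not the code shipped with the fuzzer: the function name write_test_script, the file name repro_sketch.py and the layout of the generated file are all assumptions made for illustration, and the patterns are written by hand rather than produced by the fuzzer.

```python
def write_test_script(patterns, path):
    """Write a standalone Python test file that tries to compile each pattern.

    Hypothetical helper for illustration; the shipped generator scripts may
    produce very different test code.
    """
    lines = [
        "import re",
        # repr() of a list of strings is a valid Python literal, so every
        # pattern is escaped correctly in the generated file.
        "patterns = %r" % (list(patterns),),
        "errors = []",
        "for pattern in patterns:",
        "    try:",
        "        re.compile(pattern)",
        "    except re.error as error:",
        "        errors.append((pattern, str(error)))",
        "        print('error in %r: %s' % (pattern, error))",
    ]
    with open(path, "w") as test_file:
        test_file.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hand-written sample patterns; "(" is deliberately invalid.
    write_test_script(["a+", "(", "\\d{2}"], "repro_sketch.py")
```

Running the generated file with the target interpreter then exercises that interpreter's regular expression engine; in practice the list of patterns would come from repeated calls to fsGetPattern.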

If you find a crash, you can run the repro again using Bug­Id to automatically analyze the issue.

Sample fuzzed data

To show what kind of output you can expect from this fuzzer, I've generated example test code using the above commands, which you can download here.

If you run the tests, you will find that the Ruby test causes Ruby to crash with a NULL pointer reference; see this tweet for details. The ECMAScript, JScript and VBScript tests will send those engines into infinite loops where they use 100% CPU, which is a very common issue.
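
Because hangs like these are so common, it can help to run each generated test under a time limit so that a frozen engine does not stall the whole fuzzing run. The following is a minimal sketch, assuming Python 3.5 or later; run_with_timeout and its outcome labels are hypothetical names and are not part of the fuzzer or BugId.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command and classify the outcome as 'ok', 'failed' or 'hang'.

    'failed' covers any non-zero exit code, which includes crashes as well
    as ordinary script errors; telling those apart needs further analysis.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before re-raising.
        return "hang"
    return "ok" if completed.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in for a generated test: a script that never terminates.
    print(run_with_timeout([sys.executable, "-c", "while True: pass"], 2))
```

The same call could wrap any of the commands listed earlier, for example ["ruby", "repro.rb"], with a timeout chosen to be comfortably longer than a normal run.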

The tests report any errors in parsing the regular expressions; this mostly happens when a generated regular expression is syntactically valid but triggers a code-path that is not implemented, or the engine detects that it is somehow nonsensical.

I generated data for various engines a number of times until I had a test that they could complete. This proved to be impossible for JScript and VBScript as they would invariably end up freezing with 100% CPU usage. At the end the test lets you know how many of the fuzzed regular expressions were considered valid. Below are the results for those engines that were able to complete a test at least once:

  • Perl: 93% valid.
  • PHP: 89% valid.
  • Python: 99% valid.
  • Ruby: 88% valid.
  • Spider­Monkey: 77% valid.

Considering that each test ran on 2808 regular expressions ranging in size from 10 to 1000 bytes, this indicates that the fuzzer does not generate data that is too random, but is right on the edge where the data is random enough to potentially trigger issues.
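
For the Python engine, a "percent valid" figure like the ones above can be measured with a few lines of code. This is only a sketch of the idea, not the reporting code in the shipped tests; percent_valid is a hypothetical name, and a pattern counts as valid here simply when re.compile accepts it.

```python
import re


def percent_valid(patterns):
    """Return the percentage of patterns that Python's re module can compile."""
    patterns = list(patterns)
    if not patterns:
        raise ValueError("at least one pattern is required")
    valid = 0
    for pattern in patterns:
        try:
            re.compile(pattern)
            valid += 1
        except re.error:
            # The engine rejected the pattern as invalid.
            pass
    return 100.0 * valid / len(patterns)


if __name__ == "__main__":
    # Two of these four hand-written patterns are invalid: "(" and "[z-a]".
    print(percent_valid(["a", "(", "b+", "[z-a]"]))
```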

If you have any further questions, please send an email to license@skylined.nl.

© Copyright 2024 by SkyLined. Last updated on March 23rd, 2024. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.