Fuzzing is magic - Or how I found a panic in Rust's regex library

30 Mar 2017


Recently in the Rust world, a new tool called cargo fuzz was released. Fuzzing is a technique to intelligently generate arbitrary input for a program in order to find bugs in it.

cargo fuzz promises a very simple way to fuzz a cargo project using libFuzzer, a coverage-guided, evolutionary fuzzing engine.

Since I had just made a change in the parsing module of Rust’s regex crate, I thought I’d try fuzzing that and see what happens. This post goes through the steps to set it all up and what I found.

The first step is to install cargo fuzz:

cargo install cargo-fuzz

Then, in the crate directory (regex-syntax in my case) you create a subproject:

cargo fuzz init

This generates a fuzz directory with a Cargo.toml and some files. There’s a subfolder that holds fuzzer scripts. The generated one is in fuzz/fuzzers/fuzzer_script_1.rs and looks like this:

#![no_main]
extern crate libfuzzer_sys;
extern crate regex_syntax;
#[export_name="rust_fuzzer_test_input"]
pub extern fn go(data: &[u8]) {
    // fuzzed code goes here
}

The idea is that the script is invoked repeatedly, providing different data in the byte slice as input.
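To make "invoked repeatedly with different data" concrete, here is a toy driver loop. It is only a sketch under loud assumptions: `run_toy_fuzzer` is a hypothetical name, the mutation is a blind byte overwrite driven by a fixed-seed xorshift generator, and there is no coverage feedback at all, which is the part that libFuzzer adds on top.

```rust
/// Toy driver (hypothetical, not how libFuzzer works internally):
/// calls `target` once per round with a mutated copy of `seed`.
fn run_toy_fuzzer<F: FnMut(&[u8])>(seed: &[u8], rounds: usize, mut target: F) {
    // Fixed, non-zero PRNG state so every run is reproducible.
    let mut state: u32 = 0x9e37_79b9;
    let mut data = seed.to_vec();
    for _ in 0..rounds {
        // xorshift32 step
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        if !data.is_empty() {
            // Overwrite one pseudo-randomly chosen byte.
            let idx = (state as usize) % data.len();
            data[idx] = (state >> 8) as u8;
        }
        // This call plays the role of the generated `go` function.
        target(&data);
    }
}
```

A real fuzzing engine also changes the input length and keeps inputs that reach new code paths; this loop only shows the calling convention.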

So in our case, what we want to do is try to parse the input as a regex. The library is supposed to be able to handle any string as input and not crash. In case the input is an invalid regular expression, the parser should return an error result.

Since the library only accepts a string, we first have to convert the byte slice to a string. So let’s change the script to do that:

#![no_main]
extern crate libfuzzer_sys;
extern crate regex_syntax;

use std::str;
use regex_syntax::Expr;

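#[export_name="rust_fuzzer_test_input"]
pub extern fn go(data: &[u8]) {
    // Skip inputs that are not valid UTF-8; the parser only takes &str.
    if let Ok(s) = str::from_utf8(data) {
        // Only panics matter here, so the parse result is discarded.
        let _ = Expr::parse(s);
    }
}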

We pass the byte slice to the parser only if it is valid UTF-8.

Now we want to run the fuzzer. It currently requires nightly Rust, which we can select by adding +nightly to the command:
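The UTF-8 check can be tried on its own, outside the fuzzer. A small sketch using only the standard library; `as_parser_input` is a hypothetical helper name used for illustration, not part of cargo fuzz or regex-syntax:

```rust
use std::str;

/// Returns the bytes as `&str` when they are valid UTF-8, `None` otherwise.
/// (Hypothetical helper mirroring the `if let Ok(s)` check in the script.)
fn as_parser_input(data: &[u8]) -> Option<&str> {
    str::from_utf8(data).ok()
}
```

For example, `as_parser_input(b"a+b")` gives back the string, while a slice containing the byte 0xff is rejected, because 0xff never occurs in valid UTF-8. Inputs rejected here never reach the parser, so the fuzzer spends those runs for nothing.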

cargo +nightly fuzz run fuzzer_script_1

If we try running it on macOS, we get this (some parts omitted):

    Updating git repository `https://github.com/rust-fuzz/libfuzzer-sys.git`
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Compiling regex-syntax v0.4.0 (file:///Users/rstocker/Projects/rust/regex/regex-syntax)
   Compiling gcc v0.3.45
...
error[E0463]: can't find crate for `std`
  |
  = note: the `x86_64-unknown-linux-gnu` target may not be installed
...

Turns out it currently also requires Linux. But nowadays, even if you don’t run Linux, there’s Docker. So you can use one of the existing Rust images, which work just fine for cargo fuzz:

docker run -v $PWD:/volume -w /volume -t clux/muslrust:nightly \
  sh -c "cargo install cargo-fuzz && cargo fuzz run fuzzer_script_1"

This mounts the current directory into the Docker container as a volume, installs cargo fuzz and then runs it.

So after running for a few minutes, here’s what the output was:

...
thread '<unnamed>' panicked at 'valid octal number', /checkout/src/libcore/option.rs:785
note: Run with `RUST_BACKTRACE=1` for a backtrace.
==2160== ERROR: libFuzzer: deadly signal
    #0 0x56032e5e18f9  (/volume/fuzz/target/x86_64-unknown-linux-gnu/debug/fuzzer_script_1+0x29c8f9)
    ...

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 4 ShuffleBytes-ChangeByte-ChangeBinInt-ShuffleBytes-; base unit: b6b52807ed22997123cb048f286c450e5bb1b395
0x6d,0x3a,0x28,0x3f,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x6d,0x73,0x29,0x6d,0x6d,0x6d,0x0,0x1,0x0,0x2e,0x0,0x2b,0x40,0x2d,0x0,0xa,0x0,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x27,0x10,0x2,0x5b,0x2d,0x2d,0xa,0x5b,0x0,0x0,0x24,0x5c,0xa,0x33,0x3,0x5b,0x0,0x3a,0x36,0x3,0x44,0x0,
m:(?xxxxxxxxxxxxxms)mmm\x00\x01\x00.\x00+@-\x00\x0a\x00\x10\x10\x10\x10\x10\x10\x10\x10'\x10\x02[--\x0a[\x00\x00$\\\x0a3\x03[\x00:6\x03D\x00
artifact_prefix='artifacts/'; Test unit written to artifacts/crash-a855ca46b72a30e34db264fc4f9968df1e6b4ddb
Base64: bTooP3h4eHh4eHh4eHh4eHhtcyltbW0AAQAuACtALQAKABAQEBAQEBAQJxACWy0tClsAACRcCjMDWwA6NgNEAA==

Lots of output there. The interesting bit for us is the input that it used. The bytes are printed in hex, but also as a string with escapes:

m:(?xxxxxxxxxxxxxms)mmm\x00\x01\x00.\x00+@-\x00\x0a\x00\x10\x10\x10\x10\x10\x10\x10\x10'\x10\x02[--\x0a[\x00\x00$\\\x0a3\x03[\x00:6\x03D\x00

If you squint and ignore a lot of the \x, that kinda looks like a regex. After putting the string into a normal test and running it, it indeed panics with the valid octal number message! Note that there’s nothing unsafe about a panic: it ends the current thread and prints a message. But a user of the regex library would rather get an error result back from parsing.
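The difference between a panic and an error result can be checked in ordinary code. A minimal sketch using only the standard library; `panics` is a hypothetical helper name, not something provided by regex-syntax or cargo fuzz:

```rust
use std::panic;

/// Returns true if calling `f` panics, false if it returns normally.
/// (Hypothetical helper; the panic message is still printed to stderr.)
fn panics<F: FnOnce() + panic::UnwindSafe>(f: F) -> bool {
    panic::catch_unwind(f).is_err()
}
```

Calling `expect` on a `None` makes `panics` return true, while a function that hands back an `Err` returns normally, and the caller can decide what to do with it.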

So, looks like we found an actual bug! Isn’t it exciting to find a bug sometimes?

Then we can try to simplify the string to narrow down the problem, by removing parts of it and checking whether it still crashes. Doing that, I found a minimal test case for the problem:

#[test]
fn fuzz() {
    // Panicked with 'valid octal number' before the fix.
    let _ = Expr::parse("(?x)\\\x0a3");
}

Or using a space instead of a newline, and a raw string to not have to escape the backslash:

#[test]
fn fuzz() {
    let _ = Expr::parse(r"(?x)\ 3");
}
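The shrinking described above was done by hand, but the idea is easy to automate. Here is a rough sketch of a single-pass, greedy minimizer. The names `minimize` and `still_fails` are made up for illustration; in practice `still_fails` would run the parser and report whether it panicked, which is an assumption about how you would wire it up.

```rust
/// Greedily drops characters from `input` as long as `still_fails`
/// keeps returning true for the shortened text. One left-to-right pass.
fn minimize<F>(input: &str, still_fails: F) -> String
where
    F: Fn(&str) -> bool,
{
    let mut current: Vec<char> = input.chars().collect();
    let mut i = 0;
    while i < current.len() {
        // Candidate: the current input without the character at `i`.
        let mut candidate = current.clone();
        candidate.remove(i);
        let text: String = candidate.iter().collect();
        if still_fails(&text) {
            // Not needed for the failure: keep the shorter input,
            // and look at the character that moved into position `i`.
            current = candidate;
        } else {
            // Needed to reproduce the failure: keep it and move on.
            i += 1;
        }
    }
    current.into_iter().collect()
}
```

With a toy predicate that "fails" whenever the text contains both an a and a b, `minimize("xaybz", ...)` shrinks the input to "ab". A single pass is not guaranteed to reach the smallest possible input, but it is usually enough to make the cause visible.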

The bug can also be seen in tools that rely on the regex crate, such as ripgrep.

After that, I looked at the code, found the problem and a solution. You can see that in the pull request for regex.

Edit: A description of the bug: (?x) enables extended mode, which allows inserting whitespace and comments in some places of the regex. A \ followed by digits is an octal escape. The code which parses the octal escape checks that the next characters are digits between 0 and 7, skipping whitespace. But then it takes the whole string (a space followed by 3) and tries to parse that as an octal number. That returns an error, but the code didn’t handle that error case, because it assumed the digits had already been checked.

So that’s it, pretty simple! Let’s fuzz all the things and find all the bugs! Have a look at the libFuzzer page for details about how it works.
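The shape of that bug can be reproduced in a few lines. This is a simplified, hypothetical model and not the regex-syntax source: the function names are invented, `rest` stands for the text right after the backslash, and the "fixed" variant is just one possible way to handle it, not necessarily what the pull request does.

```rust
use std::num::ParseIntError;

/// Buggy shape: the validation skips whitespace, the conversion does not.
fn parse_octal_buggy(rest: &str) -> u32 {
    // Validation pass: accept whitespace (extended mode) and digits 0-7.
    let end = rest
        .find(|c: char| !(c.is_whitespace() || ('0'..='7').contains(&c)))
        .unwrap_or(rest.len());
    // Conversion pass: the slice still contains the skipped whitespace,
    // so `from_str_radix` returns Err and `expect` panics.
    u32::from_str_radix(&rest[..end], 8).expect("valid octal number")
}

/// One possible fix: drop the whitespace before converting and
/// return the error to the caller instead of panicking.
fn parse_octal_fixed(rest: &str) -> Result<u32, ParseIntError> {
    let digits: String = rest
        .chars()
        .take_while(|c| c.is_whitespace() || ('0'..='7').contains(c))
        .filter(|c| !c.is_whitespace())
        .collect();
    u32::from_str_radix(&digits, 8)
}
```

Calling `parse_octal_buggy(" 3")` panics, while `parse_octal_fixed(" 3")` returns `Ok(3)`. The point is that the check and the conversion have to agree on which characters they look at.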
