Yirt Grek

Zero Day Research

Binary Exploit Discovery Review


Notes on state-of-the-art research into the automated discovery of vulnerabilities in binaries.

Leveraging LLM and Crash Reuse for Embedded Bug Unearthing

The authors make two contributions. First, they use LLM-based seed generation to produce the initial input seeds for mutation-based, coverage-guided fuzzing. Second, they introduce a crash reuse strategy to speed up fuzzing across variants present in different targets. Notably, both techniques can be used without altering the target’s source code or compilation process, because AFL++ is run in QEMU mode.

The main takeaway from this paper is that using LLMs to generate initial seeds improves the fuzzer’s findings. AFL++ was used as the fuzzer and ChatGPT 4.0 was used to generate the seeds.
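
The crash reuse idea can be pictured as replaying crashing inputs found on one target against a related target before fuzzing it from scratch. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes the usual AFL/AFL++ output layout (crash files stored in a crashes directory next to a README.txt), and the helper names collect_crashes and replay are hypothetical.

```python
import subprocess
from pathlib import Path


def collect_crashes(afl_out):
    """Return crash inputs saved under an AFL or AFL++ output directory."""
    root = Path(afl_out)
    # AFL++ writes to <out>/<fuzzer name>/crashes, classic AFL to <out>/crashes.
    crash_dirs = [root / "crashes", *sorted(root.glob("*/crashes"))]
    found = []
    for crash_dir in crash_dirs:
        if crash_dir.is_dir():
            found += [
                p for p in sorted(crash_dir.iterdir())
                if p.is_file() and p.name != "README.txt"
            ]
    return found


def replay(target_argv, crash, timeout=5.0):
    """Run the target with '@@' replaced by the crash file path.

    Returns True if the process was killed by a signal (POSIX only).
    """
    argv = [str(crash) if arg == "@@" else arg for arg in target_argv]
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # On POSIX, a negative return code means death by signal (e.g. SIGSEGV).
    return proc.returncode < 0
```

Inputs for which replay returns True on the second target are candidates for triage there, without spending fuzzing time to rediscover them. A target built for a foreign architecture would need an emulator in front of the command, which this sketch leaves out.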

Generating Seeds

For example, the prompt for fuzzing the awk applet in BusyBox was:

"role": "system", 
"content": "You are an initial seed generator for a fuzzer that has to fuzz BusyBox awk applet. In response only provide the list of awk scripts"

"role": "user", 
"content": "Generate initial seed to fuzz BusyBox awk applet"

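The prompt above can be generalized to other targets and its reply turned into a seed corpus. The following is a minimal sketch under stated assumptions: the function names build_messages and write_seeds are hypothetical, the call to the LLM itself is deliberately left out, and the reply is assumed to separate seeds with blank lines, which the paper does not specify.

```python
from pathlib import Path


def build_messages(target, seed_kind):
    """Return chat messages that mirror the prompt quoted above."""
    return [
        {
            "role": "system",
            "content": (
                "You are an initial seed generator for a fuzzer that has to "
                f"fuzz {target}. In response only provide the list of {seed_kind}"
            ),
        },
        {"role": "user", "content": f"Generate initial seed to fuzz {target}"},
    ]


def write_seeds(response_text, out_dir):
    """Write one file per seed, splitting the reply on blank lines (assumed)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = [c.strip() for c in response_text.split("\n\n") if c.strip()]
    written = []
    for index, chunk in enumerate(chunks):
        path = out / f"seed_{index:03d}"
        path.write_text(chunk + "\n")
        written.append(path)
    return written
```

For the example in these notes, build_messages("BusyBox awk applet", "awk scripts") reproduces the two messages, and the directory filled by write_seeds can serve as the fuzzer's input corpus.
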
After gathering results from the LLM, run afl-cmin to minimize the corpus:

root@kali:~/# afl-cmin -i in -o out -- ./ @@
corpus minimization tool for afl-fuzz by <lcamtuf@google.com>
[*] Testing the target binary...
[+] OK, 82 tuples recorded.
[*] Obtaining traces for input files in 'in'...
    Processing file 1/1...
[*] Sorting trace sets (this may take a while)...
[+] Found 82 unique tuples across 1 files.
[*] Finding best candidates for each tuple...
    Processing file 1/1...
[*] Sorting candidate list (be patient)...
[*] Processing candidates and writing output files...
    Processing tuple 82/82...
[!] WARNING: All test cases had the same traces, check syntax!
[+] Narrowed down to 1 files, saved in 'out'.

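You can also run afl-tmin to trim unnecessary bytes from a test case. Note that afl-tmin works on one file at a time, so -i and -o each name a single file rather than a directory.

afl-tmin -i afl_in/<testcase> -o afl_out/<testcase> -- ./ @@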
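
Since afl-tmin minimizes one test case per invocation, a whole corpus needs one command per file. The sketch below only builds those command lines. The helper name tmin_commands is hypothetical, and the target command is passed in by the caller because the notes above do not name the binary.

```python
from pathlib import Path


def tmin_commands(in_dir, out_dir, target_argv, qemu=False):
    """Build one afl-tmin command per regular file in in_dir."""
    commands = []
    for case in sorted(Path(in_dir).iterdir()):
        if not case.is_file():
            continue
        cmd = ["afl-tmin"]
        if qemu:
            # -Q selects QEMU mode for binaries without AFL instrumentation.
            cmd.append("-Q")
        cmd += ["-i", str(case), "-o", str(Path(out_dir) / case.name)]
        cmd += ["--", *target_argv]
        commands.append(cmd)
    return commands
```

Each returned list can be handed to subprocess.run(cmd, check=True). For instance, tmin_commands("afl_in", "afl_min", ["./target", "@@"], qemu=True) would cover an uninstrumented binary, where ./target and afl_min are placeholder names.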

AFL QEMU mode

To fuzz binaries that were not instrumented by AFL, we can use QEMU mode: just add the -Q flag to the usual command.

afl-fuzz -Q -i afl_in -o afl_out -- <binary>



Binary Libification

Writing exploits often boils down to three steps: reaching, triggering, and exploiting. This paper focuses on the second step, triggering, by modifying an executable’s ELF headers to turn it into a shared library. This elegantly lets you call arbitrary functions within an ELF, or even treat the entire ELF as a callable API.

This paper introduces the Witchcraft Linker and the Witchcraft Shell. The linker transforms ET_EXEC or ET_DYN ELF binaries into shared libraries. The shell lets you invoke arbitrary C/C++ functions within libraries created by the linker. The main advantage of these tools is that researchers can manually invoke a possibly vulnerable function without having to craft user input that traverses the application’s call graph.

Installing

Make sure to install the prerequisites. On Debian-based systems run:

sudo apt install -y clang libbfd-dev uthash-dev libelf-dev libcapstone-dev  libreadline-dev libiberty-dev libgsl-dev build-essential git debootstrap file

Then clone the repository from GitHub and make sure the submodules are up to date.

git clone https://github.com/endrazine/wcc.git
cd wcc
git submodule init
git submodule update

Once the prerequisites and submodules are in place, use make to build and install:

make

sudo make install

WLD

WLD is the Witchcraft Linker. Using it is straightforward: copy the binary you want to convert, then pass the copy to wld.

cp /bin/ls /tmp/ls.so
wld --libify /tmp/ls.so

Currently wld only works on ELF binaries; however, the underlying architecture and operating system are irrelevant. This means wld can process executables from Android, Linux, or BSD and transform them into non-relocatable shared libraries.
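
At the header level, the notes here suggest the conversion centers on the ELF type field. As a simplified illustration (not wld's implementation, which may adjust more than this one field), the sketch below rewrites e_type to ET_DYN on an in-memory copy of an ELF image. The function name set_type_dyn is hypothetical; the offset and constants follow the ELF specification.

```python
import struct

ET_DYN = 3          # "Shared object file" in readelf output
E_TYPE_OFFSET = 16  # e_type follows the 16-byte e_ident array


def set_type_dyn(image):
    """Return a copy of an ELF image whose e_type field is set to ET_DYN."""
    if len(image) < E_TYPE_OFFSET + 2 or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little, 2 = big endian.
    byte_order = {1: "<", 2: ">"}.get(image[5])
    if byte_order is None:
        raise ValueError("unknown ELF data encoding")
    patched = bytearray(image)
    struct.pack_into(byte_order + "H", patched, E_TYPE_OFFSET, ET_DYN)
    return bytes(patched)
```

Because the byte order is read from the file itself, the same code handles little-endian and big-endian targets, which matches the point that the architecture does not matter.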

Keep in mind that you should only need to convert files of type EXEC (Executable file). You can check this by running readelf -h and looking at the Type field.

~$ file /bin/ls
~$ readelf -h /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file) <--------------- Needs wld
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x404890
  Start of program headers:          64 (bytes into file)
  Start of section headers:          108288 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         28
  Section header string table index: 27

A binary that doesn’t need wld would be Apache, for example:

~$ readelf -h /usr/sbin/apache2
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file) <----------- No need for wld
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x37156
  Start of program headers:          64 (bytes into file)
  Start of section headers:          635736 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         28
  Section header string table index: 27
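
The Type check from the two readelf listings above can also be scripted. This is a minimal sketch that reads only the first bytes of the ELF header. The names elf_type and needs_wld are hypothetical, and needs_wld simply encodes the rule of thumb from these notes (EXEC needs wld, DYN does not).

```python
import struct

ET_NAMES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def elf_type(header):
    """Return the ELF type name from the first 18 bytes of a file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little, 2 = big endian.
    byte_order = {1: "<", 2: ">"}.get(header[5])
    if byte_order is None:
        raise ValueError("unknown ELF data encoding")
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))


def needs_wld(path):
    """Apply the rule of thumb above: only EXEC files need converting."""
    with open(path, "rb") as handle:
        return elf_type(handle.read(18)) == "EXEC"
```

Calling needs_wld("/bin/ls") gives the same answer you would read off the Type line of readelf -h for that file.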

Future Research

  • Automatically load individual functions and scan for exploitable instructions.
