Binary Modification: Part 2
If you haven’t yet, read Part 1 here!
Welcome back! We will be picking up from Part 1 after having inspected the binary’s headers and symbol table. In particular, we found the location of our .text segment along with the address of main and our target compute_hash function. With this information, we can move forward attempting to modify a single bit of the mystery binary to print the “success” flag.
Binary Dump
We found that inspecting the binary with readelf will only get us so far. We need to use another tool to disassemble the RISC-V code in a format we can read. For that, we can use the GNU tools objdump.
>$ riscv64-unknown-linux-gnu-objdump
Display information from object <file(s)>.
At least one of the following switches must be given:
-a, --archive-headers Display archive header information
...
-x, --all-headers Display the contents of all headers
-d, --disassemble Display assembler contents of executable sections
-D, --disassemble-all Display assembler contents of all sections
--disassemble=<sym> Display assembler contents from <sym>
...
There are a variety of options. objdump -x, for example, will display the contents of our section headers in a similar (but slightly different) format of readelf. objdump -D is what we’re after as it will display the assembly of all the sections, something that readelf could not do! For our purposes, we only need to examine the .text segment (our assembly code) so objdump -d will do just fine. Let’s save the output to the file mystery.asm.
Upon inspecting mystery.asm, we can see the disassembled .text section:
>$ riscv64-unknown-linux-gnu-objdump -d mystery > mystery.asm
>$ cat mystery.asm
mystery: file format elf64-littleriscv
Disassembly of section .text:
0000000000010120 <exit>:
10120: 1141 addi sp,sp,-16
[...]
000000000001014e <_start>:
1014e: 0000c197 auipc gp,0xc
10152: da218193 addi gp,gp,-606 # 1bef0 <__global_pointer$>
10156: 23818513 addi a0,gp,568 # 1c128 <__malloc_max_mem>
[...]
10188: 4601 li a2,0
1018a: 04e000ef jal 101d8 <main>
1018e: bf49 j 10120 <exit>
As we talked about earlier, we find our _start label at address 0x1014e which aligns with the listed entry point in the ELF header! Although I omitted much of _start for brevity, we can generally see that _start_ sets up the environment and eventually calls main before jumping to the exit label.
If we scroll further, we find our main function! Here is the entire version (annotated for those who are rusty in assembly):
00000000000101d8 <main>:
/* Store saved registers to stack */
101d8: 1101 addi sp,sp,-32
101da: ec06 sd ra,24(sp)
101dc: e822 sd s0,16(sp)
101de: 1000 addi s0,sp,32
/* malloc() call with 1000 bytes */
101e0: 3e800513 li a0,1000
101e4: 0d0000ef jal 102b4 <malloc>
101e8: 87aa mv a5,a0
101ea: fef43423 sd a5,-24(s0)
101ee: fe843783 ld a5,-24(s0)
/* branch instruction? Interesting... */
101f2: e38d bnez a5,10214 <main+0x3c>
101f4: deadc7b7 lui a5,0xdeadc
101f8: eef78513 addi a0,a5,-273 # ffffffffdeadbeef
101fc: 028000ef jal 10224 <compute_hash>
10200: 87aa mv a5,a0
10202: 85be mv a1,a5
/* printf("Success! Hash is %d") */
10204: 67e9 lui a5,0x1a
10206: cf878513 addi a0,a5,-776 # 19cf8 <__clzdi2+0x32>
1020a: 763000ef jal 1116c <printf>
1020e: 4501 li a0,0
10210: f11ff0ef jal 10120 <exit>
/* printf("Nothing to see here...") */
/* Note the call to puts() instead of printf() */
10214: 67e9 lui a5,0x1a
10216: d4078513 addi a0,a5,-704 # 19d40 <__clzdi2+0x7a
1021a: 004010ef jal 1121e <puts>
1021e: 557d li a0,-1
10220: f01ff0ef jal 10120 <exit>
Original C Code
1 #include <stdlib.h>
2 #include <stdint.h>
3 #include <stdio.h>
4
5 #define BIN_FILE "mystery"
6 #define SEED 0xDEADBEEF
7
8 uint32_t compute_hash(uint32_t);
9
10 int main() {
11 char *c = malloc(1000);
12 if (c == NULL) {
13 printf("Congratulations, you successfully modified the file!\n"
14 "Solution hash: %x\n", compute_hash(SEED));
15 exit(0);
16 } else {
17 printf("Nothing to see here...\n");
18 exit(-1);
19 }
20 }
21
22 uint32_t compute_hash(uint32_t seed) {
23 FILE *fp = fopen(BIN_FILE, "r");
24 if (fp == NULL) {
25 printf("Error opening file %s\n", BIN_FILE);
26 exit(-1);
27 }
28 uint32_t hash = seed;
29 uint32_t byte;
30 while (fread((void *) &byte, 1, 1, fp) == 1) {
31 hash ^= (seed * byte);
32 }
33 fclose(fp);
34 return hash;
35 }
Notice how the compiler replaced the second call to printf with a call to puts. printf has additional overhead needed to replace the format specifiers with their arguments on the stack. GCC sees that we are printing a static string and optimized our code by substituting puts for printf!
For our purposes, the following lines are the most important:
00000000000101d8 <main>:
/* malloc result stored in a5 */
101f2: e38d bnez a5,10214 <main+0x3c>
101f4: deadc7b7 lui a5,0xdeadc
101f8: eef78513 addi a0,a5,-273 # ffffffffdeadbeef
101fc: 028000ef jal 10224 <compute_hash>
...
10210: f11ff0ef jal 10120 <exit>
...
/* printf("Nothing to see here...") */
10214: 67e9 lui a5,0x1a
10216: d4078513 addi a0,a5,-704 # 19d40 <__clzdi2+0x7a>
1021a: 004010ef jal 1121e <puts>
1021e: 557d li a0,-1
10220: f01ff0ef jal 10120 <exit>
The key is the branch instruction bnez (“branch if not equal to zero”) which will jump to the new address if our register a5 is not equal to zero. This new address is the instruction at 0x10214 which is the start of our printf("Nothing to see here") block. If the bnez condition evaluates to false (a5 == 0), the program will not branch and will continue to the compute_hash function and terminate early at exit. This is exactly what we need!
BNEZ skips the execution of `compute_hash` Remember that our original goal is to the print the “success” line which involves jumping to compute_hash. If we can modify the binary so that the branch is not taken, then we will accomplish our goal.
The Plan
There are various single-bit modifications we can make to the binary to print “success.” Some ideas are below:
- Change
a5inbnezto a register we know will equal zero.a5is encoded as0b01111, so the registers with a Hamming distance of 1 will bet6,s7,t2,a1, anda4. Note that some (possibly all) of the registers may be nonzero by the time of thebnezinstruction.
- Change program entry point from
0x101d8to0x101f8which is two instructions afterbnez.- This will certainly not work without heavy modification as the
_startbootstrapping will be skipped, but it would be a fun exercise to attempt.
- This will certainly not work without heavy modification as the
- Make
mallocfail and which putsNULLintoa5- One idea for causing
mallocto fail is by passing in a value that is too large. This may be done by replacingli a0, 1000with an immediate value of1000 | (1 << X)whereXis large enough to be encoded in a singleaddiinstruction (liis a pseudoinstruction for an (optional, not currently present)luifollowed byaddito form the immediate). I am not sure this can be accomplished by modifying a single bit, but please try and let me know!
- One idea for causing
- Negate the branch condition by converting
bnez a5tobeqz a5(“branch if equal to zero”)- This will be our approach!
Because this project originally came from a student while teaching instruction formats, we will focus on changing bnez to a bne instruction. Let’s do it!
If we know the address of compute_hash, why can we not just insert an unconditional jump instruction to jump straight to our printf of compute_hash?
We can but changing fully replacing an existing an instruction would likely require modifying 2-4 bytes (can be done but likely not with a single-bit change. Please try though!). Inserting a jump instruction is possible, but dangerous and must be done with care.
The reason why we can’t just insert random instructions into assembly is because compilers make use of Position-Independent Code (PIC). This means that branch and jump instructions use offsets relative to the current instruction (called “PC-relative addressing”). If we start inserting / removing instructions, those branch/jump offsets point to the wrong instructions and the rest code will not work!
Instruction Formats
BNEZskips the execution ofcompute_hash
In Part 1, we discussed how RISC-V instructions are encoded in 32-bit or 16-bit binary. In our case, the bnez instruction is encoded in 16-bits as a compressed instruction. To turn our “branch-if-not-equal-zero” to “branch-if-equal-zero,” let’s examine the encodings:
| Instruction | Hexadecimal | Binary | Name |
|---|---|---|---|
c.bnez a5, 10214 | 0xE38D | 1110_0011_1000_1101 | Branch if Not Equals Zero |
c.beqz a5, 10214 | 0xC38D | 1100_0011_1000_1101 | Branch if Equals Zero |
The only differing bit in the instruction is bit 13. Thus, changing the bit from 1 to 0 in the executable should cause our CPU to interpret the instruction as a beqz and validate the condition check, thus print the success message.
00000000000101d8 <main>:
/* malloc result stored in a5 */
101f2: c38d beqz a5,10214 <main+0x3c> // Changed!!
101f4: deadc7b7 lui a5,0xdeadc
101f8: eef78513 addi a0,a5,-273 # ffffffffdeadbeef
101fc: 028000ef jal 10224 <compute_hash>
...
10210: f11ff0ef jal 10120 <exit>
And saving mystery.asm as an executable file and running gives us:
>$ chmod +x mystery.asm
>$ qemu-riscv64-static mystery
Error while loading mystery.asm: Exec format error
We get an error because mystery.asm does not contain any actual instructions; it is a plain, human-readable text file. We have to go one step deeper and modify the bits ourselves.
Binary Modification
Manipulating raw binary data can be scary, but is my favorite part of embedded systems and hardware design (which I have had to do in a number of low-level projects).
Let’s open up our executable mystery in our favorite text editor:
>$ neovim mystery
^?ELF^B^A^@^@^@^@^@^@^@...
^@^@^@^@j^@^@^@^@^@^@^@...
^B^]3<9b>Ö&@<85>G^3<97>...
...
My editor is attempting to render every byte in the file as a UTF8 character. We know that our file contains headers and machine code, however, so this is largely unhelpful. There are a few specialty editors which can edit raw binary (even vim has a binary mode!), but we can leverage our existing tools and get creative.
Instead of attempting to render each byte as displayable character, let’s use the xxd tool which is a Linux tool for manipulating files as binary / hexadecimal dumps.
>$ man xxd
xxd - make a hexdump or do the reverse.
>$ xxd mystery
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0200 f300 0100 0000 4e01 0100 0000 0000 ........N.......
00000020: 4000 0000 0000 0000 d093 0100 0000 0000 @...............
00000030: 0500 0000 4000 3800 0400 4000 1800 1700 [email protected]...@.....
00000040: 0300 0070 0400 0000 49b1 0000 0000 0000 ...p....I.......
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 6a00 0000 0000 0000 0000 0000 0000 0000 j...............
...
xxd shows that the first byte of the file is 0x7f45, followed by 0x4c46, and so on which is equivalent to the (non)-characters displayed when opening the file in a standard text editor.
Notice how the first line has 0x45 4c46 which are the ASCII letters E L F for an ELF file!
Yet another problem, however. Directly saving this output to a file will annotate every line with its corresponding hexadecimal address. Although this is helpful (and necessary) for debugging, we need the raw binary makeup of our file.
Fortunately, xxd has the ps flag for outputting in “post-script plain hexdump style” which is our data stripped of all annotations. Let’s store it into mystery.bin.
>$ xxd -ps mystery > mystery.bin
>$ cat mystery.bin
7f454c460201010000000000000000000200f300010000004e0101000000
00004000000000000000d093010000000000050000004000380004004000
18001700030000700400000049b100000000000000000000000000000000
0000000000006a0000000000000000000000000000000100000000000000
010000000500000000000000000000000000010000000000000001000000
...
The data is the same, but is in a raw binary format which will allow us to make direct edits to the machine code. Let’s search for the instruction sequence bnez-->lui which corresponds to our main function. We should find the hexadecimal string c38ddeadc7b7 somewhere in mystery.bin.
>$ cat mystery.bin | grep -o "c38ddeadc7b7"
>$ cat mystery.bin | grep -o "deadc7b7"
>$ cat mystery.bin | grep -o "deadc7b7"
However, that string does not exist anywhere in the binary file. Not only that, but deadc7b7 NOR even dead appear ONCE in the hex data. Is the objdump lying about the existence of the 0xdeadc7b7 LUI instruction?
The answer is that we also have to consider the endianness our data. Note that it is the same value Aside: Little- vs. Big-Endian
Endianness refers to the internal ordering of the bytes within a word. For example, the 4-byte word 0xDEADC7B7 will have the most-significant-byte at the lowest address for big-endian, while the least-significant-byte will be stored at the lowest address for little-endian systems. Endianness Representation Byte @ Lowest Address Big 0xDEADC7B70x______B7Little 0xB7C7ADDE0x______DE0xDEADC7B7, just stored in a different order.
In Part 1, we found that “our data is represented in little-endian 2’s complement.” Thus, the 0xdeadc7b7 instruction actually looks like 0xb7c7adde in the raw hexadecimal data. Our BNEZ instruction will look like 0x8de3 in Little-Endian as well, so let’s confirm with one last grep search:
>$ cat mystery.bin | grep -o "b7c7adde"
b7c7adde
>$ cat mystery.bin | grep -o "8de3b7c7adde"
8de3b7c7adde
Assembling the Solution
We now have the tools to modify our BNEZ instruction in the raw binary file. Let’s open up our text editor and find where the BNEZ->LUI sequence occurs in our main function. From the above, let’s search for the little-endian sequence 8de3b7a7adde:
...f126a2600264...
...00828001110c...
...8de3b7c7adde...
...f030760145ef...
...2326f4fce967...
Changing 0xE3 --> 0xC3 is a single-bit edit, and is now representing the instruction sequence BEQZ->LUI-> in our main function.
...f126a2600264...
...00828001110c...
...8dc3b7c7adde...
...f030760145ef...
...2326f4fce967...
After saving the file, we can use xxd in the reverse direction with -r to convert post-script hex dump back to a binary file.
Is it going to work??
>$ xxd -ps -r mystery.bin > mystery.out
>$ chmod +x mystery.out
>$ qemu-static-riscv64 mystery.out
Congratulations, you successfully modified the file!
Solution hash: 8bc44862
Success! Let’s use the cmp tool to check the diff between the original and modified executable.
>$ cmp -b mystery.out mystery
mystery.out mystery differ: byte 1058, line 2 is 303 M-C 343 M-b
In other words, there is a single bit difference between both files! 🎉 As I was first working through the modification, QEMU would raise a “File Not Found”-type error at the final step. Originally, I was running with the Newlib RISC-V cross-compiler because I was used to compiling for baremetal distributions with no operating systems. The file occur would then occur because QEMU-static uses the Host CPU to execute Syscalls, and To solve, I had to build the Linux GNU Cross-Compiler on my system so that the correct Syscall interface is used.Aside: Newlib Cross-Compiler
x86-64 open() is Syscall 430 while riscv64-unknown-elf-gcc links it with 1024 by default (discovered by running QEMU with strace).
This is why checksum verification is so important for tamper prevention! Secure hash functions (ex: SHA256, SHA-3) will catch single-bit differences between files by outputting a completely different hash.
Takeaways
This was a very exciting idea given to me by my students, and was fun to implement! It used a lot of hardware and systems knowledge as well as leveraging Unix tools to complete the task.
This was a small snippet of code in ideal conditions, however. In practice, will often much more obfuscated and difficult to bit hack. All code is open source to those who can read assembly! So make sure your projects rely on security measures like tags (checksums) and digital certificates to protect against tampering.
Hope you enjoyed the read!