Binary Modification: Part 1

26th Mar 2025
Tags:
ELF,
Binary Exploitation,
GNU Binutils

Is it possible to change the execution of a computer program by modifying one single bit? Can one alter a binary file to run custom code?

The answer to question #1 is yes. All computer code–whether compiled or run through an interpreter–is eventually interpreted as machine code: a sequence of 1s and 0s which your CPU understands. With the right tools and motivation, we can modify these 1s and 0s to execute custom code!

How can that be done? Let’s talk about it.

Note: knowledge of compilers, operating systems, and low-level programming may help but is not necessary – this post will introduce these difficult concepts in a fun way!

Motivation

At university, I taught weekly discussion sections for our lower-division computer architecture class. One of these topics was learning how programs are run through the Compiler-Assembler-Linker-Loader (CALL) stack, and how the assembler translates assembly code (RISC-V) to machine code (binary 1s and 0s).

In class, we were discussing a worksheet problem I wrote that has students convert (by hand) assembly instructions between their RISC-V aliases and their 32-bit machine code encodings (ex: translate 0x0062c533 to its RISC-V instruction (xor a0, t0, t1) or convert jal ra -80 to its 32-bit hexadecimal representation (0xfb1ff0ef)). The goal is to get students comfortable with the RISC-V ISA binary encoding before their 3rd project where they build their own CPU in the Logisim framework.

This particular subpart had students convert bne a0, a1, -8 which translates to the 32-bit value 0xfeb51ce3. In particular, one of my students asked about the difference between bne a0, a1, -8 and beq a0, a1, -8. These instructions have opposite effects: bne branches if the register values are not equal, while beq branches if the register values are equal. However, their encodings are nearly identical.

bne a0, a1, -8: 0xfeb50ce3
beq a0, a1, -8: 0xfeb51ce3

Every aspect of the instructions are the same (the registers, -8 immediate, opcode) except for the funct3 field which changes from 0b000 to 0b001. A student asked me the following:

If you could change one bit such that bne was interpreted as beq, could you, in theory, change the entire execution of a computer program?

Yes, you can! It was such a good question that I challenged myself to demonstrate how it can be done.

Aside: Instruction Encoding

Note: this subsection is fairly dense, and does not need to be fully understood!

Every assembly instruction in an ISA has a one-to-one translation to an N-bit sequence of binary 1s and 0s. CPU are built to interpret these binary sequences as their specific operation – machine code is essentially the language that a CPU understands!

We teach our students the RISC-V ISA which has its own group of encodings. Taken from the RISC-V Instruction Set Manual Volume 1

For example, let’s encode the branch-if-equal instruction beq a0, a1, -8 using the above table. For the 32-bit instruction labeled as inst:

inst[6:0] (instruction bits [6, 0] inclusive) are reserved for the instruction’s opcode. B-type (branch) instructions share the opcode of 110 0011
inst[11:7] is part of the encoding of -8.
inst[14:12] is the funct3 field = 0b000 – this is a 3-bit field that differentiates beq from bne (branch-if-not-equal) and from all the other branch instructions. Changing this changes the type of branch comparison.
inst[19:15] is the encoding of the first register (a0 = x10 = 0b01010)
inst[24:20] is the encoding of the second register (a1 = x11 = 0b01011)
inst[31:25] is the rest of the -8.

Assembling all of these values, we get the final 32-bit instruction beq a0, a1, -8 = 0xfeb50ce3. Cool stuff!

The Challenge

After experimenting with a few iterations of planning the design, I came up with a simple main function that had either a success or failure. On a successful heap allocation, it will skip the “success” branch and exit with an code of -1. Additionally, I wrote a simple hash function to print a checksum representing the binary file’s solution.

One-Bit Challenge main.c | Last updated: Apr 21, 2025

After compiling main.c to an executable, can you modify a single bit in the binary to print the “success” message?

Try it here or follow along (spoilers below)!

Prerequisites

To complete the challenge with the RISC-V instruction set, you will need access to either a RISC-V distribution running gcc or a cross-compiler. I will be using the RISC-V GNU Toolchain installed inside a docker image. To debug with gdb, the gdb-multiarch package must be installed.

Additionally, I will use QEMU RISC-V User Mode emulation to execute of this program. For this reason, it is necessary to compile with the gcc -static flag to ensure all libraries are statically linked.

(QEMU System Emulation or Spike are overkill because we just need User Mode emulation)

Challenge Walkthrough

Let’s walk through my thought process of solving the challenge! I will explain fun and interesting concepts along the way.

Given our main.c, let’s compile the above for the RISC-V 64 architecture and name the executable mystery:

>$ riscv64-unknown-elf-gcc -static main.c -o mystery
>$ rm main.c
>$ ls
mystery

>$ qemu-riscv64-static mystery
Nothing to see here...

Well… nothing to see so far and it’s time to problem-solve. Let’s try a debugger to see if we can step through the binary.

>$ riscv64-unknown-elf-gdb mystery
GNU gdb (GDB) 15.2
...
Reading symbols from mystery...
(No debugging symbols found in mystery)

(gdb) breakpoint main
Breakpoint 1 at 0x119d

(gdb) run
Breakpoint 1, 0x000055555555519d in main ()

(gdb) next
Single stepping until exit from function main,
which has no line number information.

Nothing to see here...
[Inferior 1 exited with code 0377]

Not much to see here as the executable is (purposely) compiled with no debug information to mimic a real application… GDB will not be of little help. We could try single-stepping through the GDB disassembly, but instead we can inspect the executable mystery for information.

Linux Executables

In the early days of computing, executable files were simply a sequence of assembly instructions for computers to execute. However, that leaves little room for encoding other information about our executable such as additional data sections, debug information, and relocation information that help our program run!

In modern architectures, Windows uses the PE (Portable Executable) format while Unix-like operating systems often default to ELF (Executable and Linkable Format) files.

GCC compiles our program to the ELF format which contains an ELF Header that links to sections with various information (similar to an object file, we can be created with the gcc -c flag). To search for clues about our executable, we can use the readelf GNU tool to search for information:

>$ riscv64-unknown-elf-readelf
Usage: readelf <option(s)> elf-file(s)
 Display information about the contents of ELF format files
 Options are:
  -a --all               Equivalent to: -h -l -S -s -r -d -V -A -I
  -h --file-header       Display the ELF file header
  -l --program-headers   Display the program headers
     --segments          An alias for --program-headers
 ...
 ...

Let’s use readelf to examine what the ELF header looks like for our program:

>$ riscv64-unknown-elf-readelf -h mystery
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x1014e
  Start of program headers:          64 (bytes into file)
  Start of section headers:          103376 (bytes into file)
  Flags:                             0x5, RVC, double-float ABI
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         4
  Size of section headers:           64 (bytes)
  Number of section headers:         24
  Section header string table index: 2

This tells us a lot of information about our compiled program! We can see that we have an executable written for RVC (RISC-V + Compressed 16b Extension) supporting floats and doubles, and that our data is represented in little-endian 2’s complement (no surprise there). We also have the entry point address listed as 0x1014e which should correspond to a _start label in our executable file (why is it not main? That’s an excellent question to be answered on another day…). If your machine is struggling to execute a file, it may be helpful to crosscheck ELF headers with your machine!

Additionally, we can inspect the sections and segments present:

>$ riscv64-unknown-elf-readelf -l mystery
Elf file type is EXEC (Executable file)
Entry point 0x1014e
There are 4 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  RISCV_ATTRIBUT 0x000000000000b149 0x0000000000000000 0x0000000000000000
                 0x000000000000006a 0x0000000000000000  R      0x1
  LOAD           0x0000000000000000 0x0000000000010000 0x0000000000010000
                 0x000000000000a54c 0x000000000000a54c  R E    0x1000
  LOAD           0x000000000000a550 0x000000000001b550 0x000000000001b550
                 0x0000000000000bd8 0x00000000000011a0  RW     0x1000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10

 Section to Segment mapping:
  Segment Sections...
   00     .riscv.attributes
   01     .text .rodata .eh_frame
   02     .init_array .fini_array .data .sdata .sbss .bss
   03

Most important for us are the listed “Segment Sections.” These sections contain the data needed to run our program. In particular, main.c’s compiled assembly code is stored in the .data section. We can use readelf to disassemble (mystery.dump) so that we can inspect the assembly code! Remember that our goal is to modify one bit in the executable to print out the “success” message.

// readelf -a displays all available information
>$ riscv64-unknown-elf-readelf -a mystery
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
 ...

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type RISC-V is not currently 
supported.

Symbol table '.symtab' contains 492 entries:
 [omitted for brevity]

Welp, unfortunately readelf will not disassemble the RISC-V .text for us so we will need to use another tool. If we were inspecting the program totally blind (without knowing main.c), then this is a useful because readelf will show us the file’s symbol table!

Symbol Tables

Out-of-scope for this blog post (this is not meant to be a full-stack compiler→executable tutorial), but symbol tables contain information about program variables, function locations, debug information, etc. and will often have the names of the functions in your code.

In the symbol table from above, we see:

>$ riscv64-unknown-elf-readelf -a mystery
 ...
Symbol table '.symtab' contains 492 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000010120     0 SECTION LOCAL  DEFAULT    1 .text
     2: 0000000000019cf8     0 SECTION LOCAL  DEFAULT    2 .rodata
     3: 000000000001a548     0 SECTION LOCAL  DEFAULT    3 .eh_frame
     4: 000000000001b550     0 SECTION LOCAL  DEFAULT    4 .init_array
     5: 000000000001b560     0 SECTION LOCAL  DEFAULT    5 .fini_array
     6: 000000000001b568     0 SECTION LOCAL  DEFAULT    6 .data
    ...
   360: 0000000000010224   144 FUNC    GLOBAL DEFAULT    1 compute_hash
    ...
   391: 000000000001014e    66 FUNC    GLOBAL DEFAULT    1 _start
    ...
   421: 00000000000101d8    76 FUNC    GLOBAL DEFAULT    1 main

A few notable things:

We see the locations of our executable segments, with our .text segment (where our machine lives) beginning at virtual address 0x10120.
There are entries for our two functions main and compute_hash! The symbol table specifies that main begins at relative address 0x101d8 and our compute_hash function is placed at virtual address 0x10224 (we want it to eventually jump to 0x10224).

Mysterious _start

Remember that Entry point address: 0x1014e from the ELF header? That corresponds to a label called _start (entry 391 in the symbol table) which is where our code will begin executing before it eventually jumps to main. It contains the startup code for the C runtime environment which (among other things) populates argv and argc, calls main, and calls exit on return from main.

You can compile code for custom operating systems and baremetal hardware by defining your own _start function combined with gcc -nostartfiles (and -nostdlib for omitting a standard library) :)

If we didn’t know the contents of main.c already, this information is vital! Let’s use this information to disassemble our executable file and reconstruct a modified binary in Part 2…