In part 2 of this series we will focus much more on
just digging into examples and then breaking down what I learned when
writing them. This will go through examples asm2.s through
asm4.s and generally cover endianness, pseudoinstructions, debugging
instructions (with some light gdb), and why we can't easily load a 32-bit value into a register.
If you are a seasoned assembly veteran, I suggest just looking at the code and skipping to other parts in the series where we actually start deconstructing the instructions at the bit level and creating real payloads.
As a reminder the code and documentation for this project is mirrored on my git repos:
I attempted to annotate my assembly as best as possible to help follow along for those not fond of blog style posts. So feel free to dive in.
The example we are going to explore is multi-faceted and covers pseudoinstructions and endianness. The goal is simply to print the string literals:
    AA\n\0
    Hello MWR Labs\n\0
In order to get started we need to look at the write(2) system
call and figure out what arguments it needs. The function signature
looks like the following:
write(int fd, const void *buf, size_t count);
Unlike our previous examples we are going to need to do a bit more than just load integer values into function arguments before triggering the system call. We have to actually set one of the arguments to the memory address of the beginning of the buffer.
To do that, we store the values we want to print on the stack from registers, using their literal values in little-endian order, load the other arguments into their syscall registers, and trigger the system call like in the previous example.
RISC-V is little-endian, which means the system is going to load the
bytes differently than you would visually expect. All sequences load the
least significant byte at the lowest address and the most significant
byte at the highest address. So the literal hex string of AA\n\0 is
0x41 0x41 0x0a 0x00 in big-endian, but
0x00 0x0a 0x41 0x41 in little-endian, which is why the value we load into the register is 0x000a4141.
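The byte ordering described here can be checked on any host machine. The following is a minimal Python sketch, not part of the original project, that uses the standard `struct` module to interpret the four bytes of AA\n\0 as a 32-bit integer in both byte orders. The variable names are only for illustration.

```python
import struct

# The four bytes we want to end up in memory, lowest address first.
text = b"AA\n\x00"

# Interpret the same bytes as a 32-bit integer in each byte order.
(as_little,) = struct.unpack("<I", text)
(as_big,) = struct.unpack(">I", text)

print(hex(as_little))  # 0xa4141, i.e. 0x000a4141
print(hex(as_big))     # 0x41410a00

# Storing the little-endian integer reproduces the original byte sequence.
stored = struct.pack("<I", 0x000A4141)
print(stored)
```

The little-endian interpretation is the value the assembly loads into the register, so that a store places the bytes in memory in reading order.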
Another thing that can cause complications is when you realize that the
li pseudoinstruction generates different code depending on the value being
loaded. A value that fits in a signed 12-bit immediate becomes a single ADDI,
but a larger value needs LUI (load upper immediate). LUI is a U-type
instruction whose immediate can only hold 20 bits, which become the upper bits
of the value, and the remaining lower 12 bits have to be ADDI'd afterward.
Why is that? Remember that an instruction can only encode a fixed number of
immediate bits and cannot carry a full 32-bit value, since
the instruction itself is only 32 bits wide and also has to encode the opcode
and the destination register. The rest of the value has to be supplied somewhere else, in a second instruction.
Remember, the U-type format is
imm[31:12] | rd | opcode
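To make the U-type layout concrete, here is a small Python sketch that packs the three fields into a 32-bit instruction word. The helper name `encode_u_type` is hypothetical; the constants (opcode 0x37 for LUI, a1 being register x11) come from the RISC-V base ISA. It encodes `lui a1,0xa4`, an instruction that appears in the disassembly later in this post.

```python
LUI_OPCODE = 0x37  # major opcode for LUI in the RISC-V base ISA
A1 = 11            # a1 is the ABI name for register x11


def encode_u_type(imm20, rd, opcode):
    """Pack a U-type instruction: imm[31:12] | rd | opcode."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("U-type immediate must fit in 20 bits")
    if not 0 <= rd < 32:
        raise ValueError("rd must be a register number 0..31")
    if not 0 <= opcode < (1 << 7):
        raise ValueError("opcode must fit in 7 bits")
    return (imm20 << 12) | (rd << 7) | opcode


word = encode_u_type(0xA4, A1, LUI_OPCODE)
print(f"{word:08x}")  # 000a45b7
```

With 7 bits for the opcode and 5 for the register, only 20 bits are left for the immediate, which is the limit discussed above.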
As an example,
li a1,0xFFFFF000 will generate code as:
Whereas attempting to use
li a1,0x000a4141 will generate:
    lui a1,0xa4      # 0x000a4 - Upper 20 bits
    addiw a1,a1,321  # 0x141 - lower 12 bits
This can make the output of the assembler a bit hard to predict, so be
aware of it when writing assembly. Luckily RISC-V
G currently only has one of these variable pseudoinstructions and that is
li as discussed here.
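The split into an upper and a lower part can be sketched in Python. This is a simplified model under stated assumptions, not the assembler's actual algorithm: it only handles constants that fit in 32 bits and always produces the two-part form, whereas a real assembler picks a shorter sequence when one exists. The function name `split_li` is hypothetical.

```python
def split_li(value):
    """Split a 32-bit constant into (upper 20 bits, signed lower 12 bits).

    The lower part is sign-extended when it is added, so if bit 11 is set
    the upper part has to be bumped by one to compensate.
    """
    if not 0 <= value < (1 << 32):
        raise ValueError("this sketch only handles 32-bit constants")
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000  # the 12-bit immediate is signed
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


hi, lo = split_li(0x000A4141)
print(hex(hi), lo)  # 0xa4 321
```

The result matches the `lui a1,0xa4` and `addiw a1,a1,321` pair shown above. A value such as 0x12345FFF shows the compensation step: the lower part becomes -1 and the upper part becomes 0x12346.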
Interacting with the stack is incredibly simple in RISC-V because it is a
"load-store" architecture. This means that memory is only accessed through
specific instructions such as
ld for loading and
sd for storing, while
all other instructions operate only on registers.
So if we want to store our data in the memory of the program we will need to grow the stack by the amount of memory we need to store and insert it into that location.
In hardware the stack starts at a high portion of memory and "grows"
downward, which means that in order to add data to the stack we have to
actually move the stack pointer (which literally points to the last
referenced location) downwards before storing the data in a location.
This can be seen below when we shift the stack pointer
sp down 4 bytes
in order to fit our
AA\n\0, and then use
sd to store the value.
The second part of asm2, which prints
Hello MWR Labs, achieves the same thing as the
first but uses the assembler macros (%hi and %lo) to load the upper portion
of the text's address and then the lower. This is very useful when hand
writing asm and I
figured it was worth pointing out, but not very useful for shellcoding
since we can't access the macros ourselves.
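A final note on the file descriptor: on POSIX systems
STDOUT (normal output) is file descriptor 1, while
0 is STDIN. The examples here set the
first argument of the
write(2) system call to 0, which still prints when run from an interactive terminal because the same terminal device is usually opened read/write on all three standard descriptors. If the output is redirected or piped, use 1 instead.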
All of these concepts can be seen being put together below:
    .section .text
    .globl _start
    _start:
        li a0,0x0             #first argument: fd 0 (works on a terminal; STDOUT proper is 1)
        addi sp,sp,-4         #move stack pointer down 4 bytes
        li a1,0x000a4141      #Literal AA\n\0 in little-endian
        sd a1,4(sp)           #store AA\n\0 sp+4 (original sp value)
        li a1,0               #zero out the a1 register for the next instruction
        addi a1,sp,4          #second argument: address of sp+4 (original sp value)
        li a2,4               #third argument: 4 byte size of buffer
        li a7, 64             #64 is the __NR_write syscall
        ecall                 #system call

        li a0,0x0             #first argument: fd 0 (works on a terminal; STDOUT proper is 1)
        addi sp,sp,-16        #move stack pointer down 16 bytes
        lui a1, %hi(msg)      #load msg(upper 20 bits)
        addi a1, a1, %lo(msg) #load msg(lower 12 bits)
        li a2,17              #third argument: 17 byte size of buffer
        ecall                 #a7 is still loaded with write syscall

        li a0, 0x0            #exit status 0
        li a7, 93             #93 is the __NR_exit syscall
        ecall

    .section .rodata
    msg:
        .string "Hello MWR Labs\n\0"
Once compiled we can give this a good ol' execution:
    $ ./bin/asm2
    AA
    Hello MWR Labs
And then to verify that the assembler didn't do anything crazy and to view
the li behavior discussed earlier, we can look at the disassembled output of
objdump -D bin/asm2 and verify everything in there:
    asm2:     file format elf64-littleriscv

    Disassembly of section .text:

    0000000000010078 <_start>:
       10078: 00000513  li a0,0
       1007c: ffc10113  addi sp,sp,-4
       10080: 000a45b7  lui a1,0xa4
       10084: 1415859b  addiw a1,a1,321
       10088: 00b13223  sd a1,4(sp)
       1008c: 00000593  li a1,0
       10090: 00410593  addi a1,sp,4
       10094: 00400613  li a2,4
       10098: 04000893  li a7,64
       1009c: 00000073  ecall
       100a0: 00000513  li a0,0
       100a4: ff010113  addi sp,sp,-16
       100a8: 000105b7  lui a1,0x10
       100ac: 0c458593  addi a1,a1,196 # 100c4 <msg>
       100b0: 01100613  li a2,17
       100b4: 00000073  ecall
       100b8: 00000513  li a0,0
       100bc: 05d00893  li a7,93
       100c0: 00000073  ecall

    Disassembly of section .rodata:

    00000000000100c4 <msg>:
       100c4: 6548      ld a0,136(a0)
       100c6: 6c6c      ld a1,216(s0)
       100c8: 574d206f  j e263c <__global_pointer$+0xd0d67>
       100cc: 2052      fld ft0,272(sp)
       100ce: 614c      ld a1,128(a0)
       100d0: 7362      ld t1,56(sp)
       100d2: 000a      c.slli zero,0x2
       ...
Note that the .rodata section is being disassembled as if it were RISC-V instructions, but if you check out the byte values it is actually just the string that we were using for msg.
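One way to convince yourself of this is to take the hex words objdump printed for the data section and turn them back into bytes. The sketch below does that in Python. The helper name `words_to_bytes` is hypothetical, and it assumes each word is displayed as a little-endian integer, as objdump does for this target.

```python
# Hex words copied from the msg disassembly above, in address order.
words = ["6548", "6c6c", "574d206f", "2052", "614c", "7362", "000a"]


def words_to_bytes(hex_words):
    """Undo objdump's little-endian word display and recover raw bytes."""
    out = b""
    for w in hex_words:
        size = len(w) // 2  # two hex digits per byte
        out += int(w, 16).to_bytes(size, "little")
    return out


decoded = words_to_bytes(words)
print(decoded)  # b'Hello MWR Labs\n\x00'
```

The mnemonics next to those words are just the disassembler's best guess at decoding data as code.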
This example is pretty much the same as the last, but provides the
opportunity to discuss why we should structure assembly for shellcode in
a certain manner. One of the first concepts that we are going to need is
Position Independent Code (PIC). In our last example the long string
that we printed was read from the
.rodata section of the binary, which
means that the data is written into the binary itself and is
either referenced statically or assigned a virtual address at run time.
Having strings hardcoded in object sections means that we cannot reliably reuse them in a generic manner when injecting the shellcode into a vulnerable target without prior knowledge of the binary, how it was compiled, and how it is mapped into memory. This restricts how we write shellcode: it must not reference static locations, cannot use labels, can only use relative jumps, and has to be careful about using library functions. Up until this point we were functionally doing that anyway given the simple nature of our programs, but a real assembly programmer would most likely take more advantage of these features.
You might also notice the
ebreak instruction. This is an incredibly
useful instruction that will trigger a gdb trap and will allow you to
debug incredibly easily. I used this very heavily during development to
better understand the flow of instructions and their interactions on the
stack. If your VM supports gdb, I highly suggest exploring the binary
with some helpful gdb commands like:
Here is another example, with code that does not reference a hardcoded
section and includes an
ebreak for practice:
    .section .text
    .globl _start
    _start:
        li a0,0x0        #first argument: fd 0 (works on a terminal; STDOUT proper is 1)
        li a2,8          #third argument: sizeof(a1)
        li t0,0x0a414141 #example of using temporary registers
        li t1,0x42424242
        addi sp,sp,-8    #move the stack down sizeof(t0+t1)
        sd t1,0(sp)      #store 'BBBB'
        sd t0,4(sp)      #store 'AAA\n'
        addi a1,sp,0     #point a1 to the top of the stack
        li a7, 64        #64 is the __NR_write syscall
        ebreak           #breakpoint: traps into the debugger
        ecall
        li a7, 93        #exit with retval of previous syscall
        ecall
When run without gdb you will notice that the program exits with a
Trace/breakpoint trap (core dumped), meaning that programs containing
ebreak may not be able to be run directly and will require some
sort of debugging capability.
This is the last example of simple assembly that we will write. It is a
simple call to
execve(2) to run
/bin/sh, or in hacker parlance:
getting a local shell. At this point we are still not writing any
commonly useful shellcode or anything doing remote access, but are still
familiarizing ourselves with using position independent code and some
RISC-V assembly pain points.
We will use the same techniques as before to make a call to execve(2) in order to execute a shell. This combines all that we have learned so far and requires the following signature be matched from the execve documentation:
execve(const char *filename, char *const argv[], char *const envp[]);
While at first this might be intimidating, just like with the
write example all we need to do is set up the stack to contain the
/bin/sh filename, and since we are not setting any arguments to the
shell execution (argv) or the environment variables (envp) we can simply
point the arguments to NULL and move on.
Like in asm2.s we need to load the arguments in little-endian and store them on the stack since the first argument is a memory address of the file name.
    .section .text
    .globl _start
    _start:
        #execve(*filename, *argv, *envp)
        li a0,0x6e69622f #nib/
        addi sp,sp,-8    #set up the stack
        sd a0,0(sp)      #store '/bin'
        li a0,0x0068732f #\0hs/
        sd a0,4(sp)      #store '/sh\0'
        addi a0,sp,0     #set a0 to the top of the stack
        li a2,0x0        #set envp to NULL
        li a1,0x0        #set argv to NULL
        li a7, 221       #221 is the __NR_execve
        ecall
        li a7, 93        #exit value of execve is in a0
        ecall            #exit the program with the retval of execve
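The two constants loaded into a0 do not have to be worked out by hand. As a rough sketch, the Python below splits a byte string into 32-bit little-endian words, which is one way to produce such constants. The helper name `pack_words` and the 4-byte word width are illustrative assumptions, not something taken from the project.

```python
def pack_words(data, width=4):
    """Split bytes into little-endian integers of `width` bytes each."""
    # Pad with NUL bytes so the data fills a whole number of words.
    padded = data + b"\x00" * (-len(data) % width)
    return [
        int.from_bytes(padded[i:i + width], "little")
        for i in range(0, len(padded), width)
    ]


words = pack_words(b"/bin/sh\x00")
print([f"0x{w:08x}" for w in words])  # ['0x6e69622f', '0x0068732f']
```

The first word goes at the lower address and the second four bytes above it, so memory reads /bin/sh followed by a NUL terminator.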
Once compiled, we can compare the asm4.s code with what is actually generated at compile time:
    asm4:     file format elf64-littleriscv

    Disassembly of section .text:

    0000000000010078 <_start>:
       10078: 6e696537  lui a0,0x6e696
       1007c: 22f5051b  addiw a0,a0,559
       10080: ff810113  addi sp,sp,-8
       10084: 00a13023  sd a0,0(sp)
       10088: 00687537  lui a0,0x687
       1008c: 32f5051b  addiw a0,a0,815
       10090: 00a13223  sd a0,4(sp)
       10094: 00010513  mv a0,sp
       10098: 00000613  li a2,0
       1009c: 00000593  li a1,0
       100a0: 0dd00893  li a7,221
       100a4: 00000073  ecall
       100a8: 05d00893  li a7,93
       100ac: 00000073  ecall
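As expected, we see that each
li has actually been turned into multiple
instructions: a lui followed by an addiw. The
w in this immediate add marks the word variant used on 64-bit RISC-V: it
adds on the lower 32 bits and sign-extends the 32-bit result to fill the
64-bit register. This is another situation that can arise from the li pseudoinstruction.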
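To see what the lui and addiw pair computes on a 64-bit register, here is a small Python model of the two instructions. It is a sketch of their arithmetic only, with register values represented as unsigned 64-bit integers. The helper names `sext32`, `lui`, and `addiw` are chosen for illustration.

```python
MASK64 = (1 << 64) - 1


def sext32(x):
    """Sign-extend the low 32 bits of x to a Python integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def lui(imm20):
    """LUI on RV64: place imm20 in bits 31:12, then sign-extend to 64 bits."""
    return sext32(imm20 << 12) & MASK64


def addiw(rs1, imm12):
    """ADDIW: add, keep the low 32 bits, then sign-extend to 64 bits."""
    return sext32(rs1 + imm12) & MASK64


a0 = addiw(lui(0x6E696), 559)
print(hex(a0))  # 0x6e69622f
```

Running the same model on the second pair, `lui a0,0x687` and `addiw a0,a0,815`, gives 0x68732f, the /sh\0 constant. The 32-bit wraparound is what distinguishes addiw from a plain addi: in this model `addiw(0x7fffffff, 1)` yields 0xffffffff80000000.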
The final step to test this is to set the
PS1 shell variable to a
non-default value in order to visualize when the new shell is executed and then call
the compiled binary:
    $ PS1="ORIGINAL$ "
    ORIGINAL$ ./bin/asm4
    $ exit
    exit
    ORIGINAL$
Using the
strace command, if available, can also help visualize the
system calls that we made and how their arguments were set up:
    ORIGINAL$ strace ./bin/asm4 2>&1 | grep -i -e "execve" -e "(($(whoami))"
    execve("./bin/asm4", ["./bin/asm4"], 0x3fff96daf0 /* 29 vars */) = 0
    execve("/bin/sh", NULL, NULL) = 0
In part 3 we will begin to look at more realistic styles of payloads that have restrictions on the type of characters that they can use, how the code gets introduced into vulnerable binaries, and an introduction to shellcode testers.