⊕ a tessier-ashpool subsidiary

RISC-V ASM / Payloads Part 2

⊕ 2019-04-28

In part 2 of this series we will focus much more on digging into examples and then breaking down what I learned when writing them. This part goes through the examples asm2.s, asm3.s, and asm4.s and generally covers endianness, pseudoinstructions, debugging instructions (with some light gdb), and why we can't easily load a 32-bit value as an immediate.

If you are a seasoned assembly veteran, I suggest just looking at the code and skipping ahead to the later parts of the series, where we actually start deconstructing instructions at the bit level and creating real payloads.


As a reminder, the code and documentation for this project are mirrored on my git repos:

I annotated my assembly as thoroughly as possible to help readers who are not fond of blog-style posts follow along, so feel free to dive in.

asm2.s - Endianness and Stack

This example is multi-faceted and explores both pseudoinstructions and endianness. The goal is simply to print two string literals, a short AA\n\0 followed by:

Hello MWR Labs\n\0

To get started, we need to look at the write(2) system call and figure out what arguments it needs. Its signature looks like this:

write(int fd, const void *buf, size_t count);

Unlike in our previous examples, we need to do more than load integer values into the function arguments before triggering the system call: one of the arguments has to be set to the memory address of the beginning of the buffer.

To do this, we load the bytes we want to print into a register as a little-endian literal, store that register on the stack, load the remaining arguments into their syscall registers, and make the system call as in the previous example.

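RISC-V is little-endian, which means the system stores multi-byte values differently than you might visually expect: the least significant byte goes at the lowest address and the most significant byte at the highest address. We want the string AA\n\0 to sit in memory as the bytes 0x41 0x41 0x0a 0x00, in that order. To get that result from a 32-bit register store, the value we load must have those bytes reversed, so we write it as the literal 0x000a4141 (bytes 0x00 0x0a 0x41 0x41, most significant first). On a big-endian machine the literal would instead follow the string order, 0x41410a00.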
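The byte order can be double-checked with Python's struct module, used here purely as a calculator (`<I` means "little-endian unsigned 32-bit integer"). This is a minimal sketch, not part of the project's build:

```python
import struct

want = b"AA\n\x00"  # bytes we want in memory, lowest address first

# Read those bytes as a little-endian 32-bit integer: this is the literal
# to hand to li so that a 32-bit store writes them back in this order.
literal = struct.unpack("<I", want)[0]
print(f"{literal:#010x}")  # 0x000a4141

# Round trip: packing the literal little-endian gives the string bytes back.
assert struct.pack("<I", literal) == want
```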

Another thing that can cause complications is that the li pseudoinstruction generates different code depending on the value: a constant that fits in addi's 12-bit signed immediate (-2048 to 2047) becomes a single addi, but anything larger needs a lui first. This is because lui (load upper immediate) is a U-type instruction that can only carry 20 bits, which it places in bits 31:12 of the register; the remaining low 12 bits have to be added afterward with addi (addiw on RV64). Why is that? Every RISC-V instruction is itself 32 bits wide, and some of those bits are needed for the opcode and the destination register, so no single instruction has room for a full 32-bit immediate. The split has to happen somewhere.

Remember, the U-type format is imm[31:12] | rd | opcode, that is 20 + 5 + 7 = 32 bits.
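To make the format concrete, here is a minimal Python sketch that packs the fields by hand, with the I-type format (used by addi and addiw) alongside for comparison. The opcode values (0x37 for LUI, 0x13 for OP-IMM, 0x1B for OP-IMM-32, the RV64 group that addiw belongs to) come from the base opcode map; the helper names encode_u and encode_i are hypothetical. The expected encodings are taken from the objdump listings later in this post.

```python
REG = {"zero": 0, "a0": 10, "a1": 11}  # ABI register names -> x register numbers

LUI = 0x37        # U-type major opcode for lui
OP_IMM = 0x13     # I-type major opcode for addi (li small values use this)
OP_IMM_32 = 0x1B  # I-type major opcode for addiw (RV64 only)

def encode_u(opcode: int, rd: int, imm20: int) -> int:
    """U-type: imm[31:12] | rd | opcode  (20 + 5 + 7 bits)."""
    assert 0 <= imm20 < (1 << 20)
    return (imm20 << 12) | (rd << 7) | opcode

def encode_i(opcode: int, funct3: int, rd: int, rs1: int, imm12: int) -> int:
    """I-type: imm[11:0] | rs1 | funct3 | rd | opcode  (12 + 5 + 3 + 5 + 7 bits)."""
    assert -2048 <= imm12 <= 2047
    return ((imm12 & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

print(f"{encode_u(LUI, REG['a1'], 0xA4):08x}")                          # 000a45b7 (lui a1,0xa4)
print(f"{encode_i(OP_IMM_32, 0, REG['a1'], REG['a1'], 321):08x}")      # 1415859b (addiw a1,a1,321)
```

Only 20 bits of the U-type word are left for the immediate, which is exactly why lui cannot load a full 32-bit constant on its own.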

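As an example, li a1,0xFFFFF000, whose low 12 bits are all zero, will generate just a lui on RV32 (on RV64, lui sign-extends its 32-bit result to 64 bits, so this particular positive value needs a longer sequence there):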

lui a1,1048575

Whereas li a1,0x000a4141 will generate:

lui a1,0xa4 # 0x000a4 - Upper 20 bits
addiw a1,a1,321 # 0x141 - lower 12 bits

This can make the assembler's output hard to predict, so be aware of it when writing assembly by hand. Luckily, li is currently the only pseudoinstruction in RISC-V G whose expansion varies like this: the ISA manual's pseudoinstruction table lists it as expanding to "myriad sequences".
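The lui/addi split can be sketched in a few lines of Python. One detail the examples above don't show: addi sign-extends its 12-bit immediate, so when bit 11 of the low part is set, the low part becomes negative and the upper 20 bits must be rounded up by one to compensate. The names split_li and expand_li are hypothetical, and the sketch only models RV64-style lui/addiw output for constants in the signed 32-bit range; it is not the assembler's actual algorithm.

```python
def split_li(value: int):
    """Split a 32-bit constant into (hi20, lo12) so that
    (hi20 << 12) + lo12 == value, with lo12 a *signed* 12-bit immediate."""
    lo12 = value & 0xFFF
    if lo12 >= 0x800:     # bit 11 set: addi would sign-extend this to a negative number,
        lo12 -= 0x1000    # so treat it as negative...
    hi20 = ((value - lo12) >> 12) & 0xFFFFF  # ...and round the upper part up to compensate
    return hi20, lo12

def expand_li(value: int):
    """Rough model of li for signed 32-bit constants (larger RV64 values are not modelled)."""
    assert -(1 << 31) <= value < (1 << 31), "only signed 32-bit constants are modelled"
    if -2048 <= value <= 2047:
        return [f"addi rd,zero,{value}"]  # one instruction; objdump shows it as li
    hi20, lo12 = split_li(value)
    seq = [f"lui rd,{hex(hi20)}"]
    if lo12 != 0:
        seq.append(f"addiw rd,rd,{lo12}")
    return seq

print(expand_li(64))          # ['addi rd,zero,64']
print(expand_li(0x000A4141))  # ['lui rd,0xa4', 'addiw rd,rd,321']
```

With this sketch, split_li(0x800) returns (1, -2048), i.e. lui rd,0x1 followed by addiw rd,rd,-2048: not something you would guess from the hex digits alone.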

Interacting with the stack is simple in RISC-V because it is a "load-store" architecture: only dedicated memory-access instructions, such as ld for loading and sd for storing, touch memory, and all other instructions operate only on registers.

So to store our data in the program's memory, we grow the stack by the number of bytes we need and then store the data into that space.

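By convention the stack starts at a high address and "grows" downward, so to add data to the stack we first move the stack pointer sp (which points to the top of the stack, its lowest in-use address) down and then store the data in the space that frees up. Below, we move sp down 4 bytes to make room for AA\n\0 and then use sd to store the register at sp+4, which is the original value of sp. Note that sd stores a full 8-byte doubleword, so the string actually lands at and above the old sp rather than in the 4 bytes we reserved; sw a1,0(sp) would write exactly those 4 bytes. The program still works because nothing reads that memory afterward.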
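A small Python model of the two stores makes the difference visible. It is only a sketch: orig_sp = 0x1000 is a made-up address (only the offsets matter), and store_le is a hypothetical helper that mimics a little-endian store, where sd writes 8 bytes and sw writes 4.

```python
def store_le(addr: int, value: int, width: int) -> dict:
    """Model a little-endian store: map each address written to its byte.
    width is 8 for sd (doubleword) and 4 for sw (word)."""
    return {addr + i: b for i, b in enumerate(value.to_bytes(width, "little"))}

orig_sp = 0x1000                  # hypothetical starting sp; only offsets matter
sp = orig_sp - 4                  # addi sp,sp,-4
reserved = set(range(sp, orig_sp))

sd_write = store_le(sp + 4, 0x000A4141, 8)  # as in asm2.s: sd a1,4(sp)
sw_write = store_le(sp, 0x000A4141, 4)      # alternative: sw a1,0(sp)

print(sorted(a - orig_sp for a in sd_write))         # [0, 1, 2, 3, 4, 5, 6, 7]
print(any(a in reserved for a in sd_write))          # False: none of it is in the reserved bytes
print(set(sw_write) == reserved)                     # True: sw fills exactly the reserved bytes
print(bytes(sw_write[a] for a in sorted(sw_write)))  # b'AA\n\x00'
```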

The second part of asm2.s, which prints Hello MWR Labs, works the same way as the first, except that it gets the address of msg from the assembler's %hi() and %lo() operators: lui loads the upper 20 bits of the address and addi adds the lower 12 (in the disassembly, 0x10 << 12 plus 196 gives msg's address, 0x100c4). This is very useful when hand-writing asm, so I figured it was worth pointing out, but it is not very useful for shellcoding: the operators are resolved by the assembler and linker to a fixed address inside this binary, which injected shellcode cannot rely on.

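A final note: file descriptor 0 is actually standard input; STDOUT is file descriptor 1. Writing to fd 0 works here because in an interactive terminal session, descriptors 0, 1 and 2 usually all refer to the same terminal device, opened for both reading and writing. If output is redirected (for example ./bin/asm2 > out.txt), the text still goes to the terminal and not to the file, so a0 = 1 is the correct choice for real use.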

All of these concepts come together below:

.section .text
.globl _start
_start:
        li a0,0x0       #first argument: fd 0 (STDIN; works on a tty, STDOUT is fd 1)
        addi sp,sp,-4   #move stack pointer down 4 bytes
        li a1,0x000a4141 #literal AA\n\0 in little-endian
        sd a1,4(sp)     #store AA\n\0 at sp+4 (original sp value); sd writes 8 bytes
        li a1,0         #redundant: the next instruction overwrites a1 anyway
        addi a1,sp,4    #second argument: address of sp+4 (original sp value)
        li a2,4         #third argument: 4 byte size of buffer
        li a7, 64       #64 is the __NR_write syscall
        ecall           #system call
        li a0,0x0       #first argument: fd 0 again
        addi    sp,sp,-16 #move stack pointer down 16 bytes (not needed: msg is in .rodata)
        lui a1, %hi(msg)        #second argument: msg address (upper 20 bits)
        addi a1, a1, %lo(msg)   #add msg address (lower 12 bits)
        li a2,17        #third argument: 17 bytes (15 chars + "\0" + .string's own NUL)
        ecall           #a7 is still loaded with write syscall
        li a0, 0x0      #exit status 0
        li a7, 93       #93 is the __NR_exit syscall
        ecall           #exit

.section .rodata
msg:
            .string "Hello MWR Labs\n\0"

Once it is built, we can give it a good ol' execution:

$ ./bin/asm2
Hello MWR Labs

To verify that the assembler didn't do anything crazy, and to see the li behavior discussed earlier, we can look at the disassembly produced by objdump -D bin/asm2:

asm2:     file format elf64-littleriscv

Disassembly of section .text:

0000000000010078 <_start>:
   10078:	00000513          	li	a0,0
   1007c:	ffc10113          	addi	sp,sp,-4
   10080:	000a45b7          	lui	a1,0xa4
   10084:	1415859b          	addiw	a1,a1,321
   10088:	00b13223          	sd	a1,4(sp)
   1008c:	00000593          	li	a1,0
   10090:	00410593          	addi	a1,sp,4
   10094:	00400613          	li	a2,4
   10098:	04000893          	li	a7,64
   1009c:	00000073          	ecall
   100a0:	00000513          	li	a0,0
   100a4:	ff010113          	addi	sp,sp,-16
   100a8:	000105b7          	lui	a1,0x10
   100ac:	0c458593          	addi	a1,a1,196 # 100c4 <msg>
   100b0:	01100613          	li	a2,17
   100b4:	00000073          	ecall
   100b8:	00000513          	li	a0,0
   100bc:	05d00893          	li	a7,93
   100c0:	00000073          	ecall

Disassembly of section .rodata:

00000000000100c4 <msg>:
   100c4:	6548                	ld	a0,136(a0)
   100c6:	6c6c                	ld	a1,216(s0)
   100c8:	574d206f          	j	e263c <__global_pointer$+0xd0d67>
   100cc:	2052                	fld	ft0,272(sp)
   100ce:	614c                	ld	a1,128(a0)
   100d0:	7362                	ld	t1,56(sp)
   100d2:	000a                	c.slli	zero,0x2

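Interestingly, objdump -D disassembles every section, so the .rodata section is being decoded as (mostly compressed) RISC-V instructions. If you look at the actual byte values, though, it is just the string we used for msg: 6548, for example, is stored as the bytes 0x48 0x65, "He".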
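To confirm this, the numbers from the listing can be turned back into bytes. objdump prints each instruction as a little-endian number, so each one has to be written out least significant byte first; its width follows the RISC-V rule that an instruction whose low two bits are 11 is 32 bits long, while anything else is a 16-bit compressed instruction. A minimal Python sketch using the values copied from the listing above:

```python
# (value as printed by objdump, width in bytes) from the .rodata disassembly
chunks = [(0x6548, 2), (0x6C6C, 2), (0x574D206F, 4), (0x2052, 2),
          (0x614C, 2), (0x7362, 2), (0x000A, 2)]

# Low two bits == 0b11 marks a 32-bit instruction; otherwise it is 16-bit.
for value, width in chunks:
    assert (width == 4) == ((value & 0b11) == 0b11)

# Write each value out least significant byte first to recover memory contents.
raw = b"".join(value.to_bytes(width, "little") for value, width in chunks)
print(raw)  # b'Hello MWR Labs\n\x00'
```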

asm3.s - PIC and EBREAK

This example is much the same as the last, but it gives us the opportunity to discuss why shellcode needs to be structured in a certain way. One of the first concepts we need is Position Independent Code (PIC). In the last example, the long string we printed was read from the .rodata section, which means the data lives in the binary itself and is referenced either through a static address or through an address worked out at run time.

Keeping strings hardcoded in object sections means we cannot reliably reuse them when injecting shellcode into a vulnerable target without prior knowledge of the binary, how it was compiled, and how it is mapped into memory. This imposes new restrictions: the shellcode must not reference static locations, so we can't use labels that resolve to absolute addresses, we can only use relative jumps, and we can't rely on library functions. Up until this point our simple programs mostly followed these rules anyway (the .rodata msg in asm2.s being the exception), but a real assembly programmer would most likely take more advantage of these features.

You might also notice the ebreak instruction. It raises a breakpoint exception, which under gdb stops execution at that point and makes debugging much easier. I used it very heavily during development to better understand the flow of instructions and their interactions with the stack. If your VM supports gdb, I highly suggest exploring the binary with some helpful gdb commands like:

Here is another example that does not reference a hardcoded .rodata section, with a commented-out ebreak for practice:

.section .text
.globl _start
_start:
        li      a0,0x0  #first argument: fd 0 (see asm2.s; fd 1 is STDOUT)
        li      a2,8    #third argument: 8 bytes, 'BBBBAAA\n'
        li      t0,0x0a414141   #example of using temporary registers: 'AAA\n'
        li      t1,0x42424242   #'BBBB'
        addi    sp,sp,-8        #move the stack down sizeof(t0+t1)
        sw      t1,0(sp)        #store 'BBBB' (sw writes exactly 4 bytes)
        sw      t0,4(sp)        #store 'AAA\n'
        addi    a1,sp,0         #second argument: point a1 to the top of the stack
        li      a7, 64          #64 is the __NR_write syscall
        #ebreak                 #uncomment to trap into gdb before the write
        ecall                   #system call
        li      a7, 93          #exit with retval of previous syscall
        ecall                   #exit

When run without gdb, a program with an uncommented ebreak exits with Trace/breakpoint trap (core dumped): the kernel delivers SIGTRAP, and its default action terminates the process. Programs containing ebreak therefore generally cannot be run directly and need a debugger attached.

asm4.s - shells!

This is the last simple assembly example we will write: a call to execve(2) that runs /bin/sh, or in hacker parlance, gets us a local shell. At this point we are still not writing commonly useful shellcode or anything that provides remote access; we are still getting familiar with position independent code and some RISC-V assembly pain points.

We will use the same techniques as before to call execve(2) and execute a shell. This combines everything we have learned so far and requires matching the following signature from the execve documentation:

execve(const char *filename, char *const argv[], char *const envp[]);

This might be intimidating at first, but just as with the write example, all we need to do is set up the stack to contain the /bin/sh filename. Since we are not passing any arguments (argv) or environment variables (envp) to the shell, we can simply set those arguments to NULL and move on.

As in asm2.s, we need to load the filename as little-endian values and store them on the stack, since the first argument is the memory address of the filename.
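Reversing the bytes by hand gets tedious for longer strings, so a small helper can compute the li values and stack offsets. This is a hypothetical Python sketch (string_to_words is not part of the project's code). It pads with NULs to a whole number of 4-byte words and emits sw, a 4-byte store, rather than the sd used in asm4.s, so each store writes exactly one word. If the string length is already a multiple of 4, the caller must include the terminating NUL, as done here for /bin/sh.

```python
import struct

def string_to_words(s: bytes):
    """Pad s with NULs to a multiple of 4 bytes and return
    (stack offset, little-endian 32-bit value) pairs to load with li."""
    padded = s + b"\x00" * (-len(s) % 4)
    count = len(padded) // 4
    values = struct.unpack(f"<{count}I", padded)
    return [(4 * i, v) for i, v in enumerate(values)]

for offset, value in string_to_words(b"/bin/sh\x00"):
    print(f"li a0,{value:#010x}\nsw a0,{offset}(sp)")
```

The printed sequence still needs the addi sp,sp,-8 from asm4.s in front of it to reserve the space.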

.section .text
.globl _start
_start:
	#execve(*filename, *argv[], *envp[])
	li a0,0x6e69622f	#nib/ ('/bin' in little-endian)
	addi sp,sp,-8	#set up the stack
	sd a0,0(sp)		#store '/bin'
	li a0,0x0068732f 	#hs/ ('/sh\0' in little-endian)
	sd a0,4(sp)		#store '/sh\0' (sd writes 8 bytes, 4 of them above the reserved space)
	addi a0,sp,0	#set a0 to the top of the stack
	li a2,0x0		#set envp[] to NULL
	li a1,0x0		#set argv[] to NULL
	li a7, 221 		#221 is the __NR_execve
	ecall			#execve("/bin/sh", NULL, NULL)
	li a7, 93		#only reached if execve fails; its return value is in a0
	ecall		#exit the program with the retval of execve

Once compiled, we can compare the asm4.s source with what the assembler actually generated:

asm4:     file format elf64-littleriscv

Disassembly of section .text:

0000000000010078 <_start>:
   10078:	6e696537          	lui	a0,0x6e696
   1007c:	22f5051b          	addiw	a0,a0,559
   10080:	ff810113          	addi	sp,sp,-8
   10084:	00a13023          	sd	a0,0(sp)
   10088:	00687537          	lui	a0,0x687
   1008c:	32f5051b          	addiw	a0,a0,815
   10090:	00a13223          	sd	a0,4(sp)
   10094:	00010513          	mv	a0,sp
   10098:	00000613          	li	a2,0
   1009c:	00000593          	li	a1,0
   100a0:	0dd00893          	li	a7,221
   100a4:	00000073          	ecall
   100a8:	05d00893          	li	a7,93
   100ac:	00000073          	ecall

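As expected, each li has been turned into a lui followed by an addiw. The w suffix marks the 32-bit "word" variant that RV64 provides: addiw adds the immediate, keeps the low 32 bits of the result, and sign-extends them to 64 bits. This is another detail that the li pseudoinstruction hides from you. Note also that objdump shows addi a0,sp,0 as mv a0,sp, yet another pseudoinstruction.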
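The sign-extension rules can be modelled in Python to check the listing by hand. This sketch (helper names hypothetical) follows the RV64 definitions: lui places its 20-bit immediate in bits 31:12 and sign-extends the 32-bit result to 64 bits, and addiw adds, keeps the low 32 bits, and sign-extends them.

```python
MASK64 = (1 << 64) - 1

def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def lui(imm20: int) -> int:
    """RV64 lui: imm20 into bits 31:12, 32-bit result sign-extended to 64 bits."""
    return sext(imm20 << 12, 32) & MASK64

def addiw(rs1: int, imm12: int) -> int:
    """RV64 addiw: add, keep the low 32 bits, sign-extend them to 64 bits."""
    return sext(rs1 + imm12, 32) & MASK64

print(hex(addiw(lui(0x6E696), 559)))  # 0x6e69622f, the first li in asm4.s
print(hex(addiw(lui(0x687), 815)))    # 0x68732f, the second li
print(hex(addiw(0x7FFFFFFF, 1)))      # 0xffffffff80000000: carry into bit 31 sign-extends
print(hex(lui(0xFFFFF)))              # 0xfffffffffffff000 on RV64, not 0xfffff000
```

Because both values in asm4.s have bit 31 clear, the sign extension never changes them; it only matters for constants at or above 0x80000000.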

To test it, set the PS1 shell variable to a non-default prompt so that it is obvious when the new shell starts, then run the compiled binary:

$ PS1="ORIGINAL$ "
ORIGINAL$ ./bin/asm4
$ exit

If the strace command is available, it also helps to see the system calls we made and how their arguments were set up:

ORIGINAL$ strace ./bin/asm4 2>&1 | grep -i -e "execve" -e "(($(whoami))"
execve("./bin/asm4", ["./bin/asm4"], 0x3fff96daf0 /* 29 vars */) = 0
execve("/bin/sh", NULL, NULL) = 0

part 3

In part 3 we will begin looking at more realistic payloads that are restricted in the characters they can use, at how the code gets introduced into vulnerable binaries, and at shellcode testers.

A link to all the posts in this series can be found here.