In part 2 of this series we will focus much more on digging into examples and then breaking down what I learned while writing them. This part goes through examples asm2.s, asm3.s, and asm4.s and generally covers endianness, pseudoinstructions, debugging instructions (with some light gdb), and why we can't easily load a 32-bit immediate value.
If you are a seasoned assembly veteran, I suggest just looking at the code and skipping to the other parts in the series, where we actually start deconstructing the instructions at the bit level and creating real payloads.
As a reminder, the code and documentation for this project are mirrored on my git repos:
I attempted to annotate my assembly as well as possible to help those not fond of blog-style posts follow along, so feel free to dive in.
The first example we are going to explore, asm2.s, is multi-faceted and covers pseudoinstructions and endianness. The goal is simply to print the string literals:
AA\n\0
Hello MWR Labs\n\0
To get started we need to look at the write(2) system call and figure out what arguments it needs. Its function signature looks like the following:
write(int fd, const void *buf, size_t count);
Unlike our previous examples, we need to do a bit more than load integer values into the argument registers before triggering the system call: one of the arguments has to be set to the memory address of the beginning of the buffer.
To do that, we load the bytes we want to print into a register as a little-endian literal, store that register on the stack, load the other arguments into their syscall registers, and make the system call like in the previous example.
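Python's os.write() is a thin wrapper around write(2), so it offers a quick way to poke at the same (fd, buf, count) contract from a high-level language. This is only an illustrative sketch: it writes the 4-byte AA\n\0 buffer into a pipe (a choice made here so the example does not depend on a terminal) and reads it back. The return value is the number of bytes written, and the NUL byte goes out like any other byte.

```python
import os

# os.write(fd, data) calls write(2) with fd, the buffer, and count = len(data).
read_fd, write_fd = os.pipe()
buf = b"AA\n\0"                     # the same 4 bytes asm2.s stores on the stack
written = os.write(write_fd, buf)   # returns the number of bytes written
os.close(write_fd)

echoed = os.read(read_fd, 16)       # read back whatever went through the pipe
os.close(read_fd)
print(written, echoed)
```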
RISC-V is little-endian, which means the system stores multi-byte values in a different byte order than you would visually expect: the least significant byte goes at the lowest address and the most significant byte at the highest address. The string AA\n\0 is the byte sequence 0x41 0x41 0x0a 0x00 in memory. For a single register store to lay those bytes down in that order, the register value has to be written as 0x41410a00 on a big-endian machine, but as 0x000a4141 (the bytes reversed) on a little-endian machine like RISC-V.
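A quick way to double-check which literal to load is to let Python do the byte-order conversion. This minimal sketch uses only int.from_bytes and struct and is not part of the original tooling:

```python
import struct

s = b"AA\n\0"                         # bytes in the order we want them in memory
little = int.from_bytes(s, "little")  # value to li on little-endian RISC-V
big = int.from_bytes(s, "big")        # value a big-endian machine would need
print(hex(little), hex(big))          # 0xa4141 0x41410a00

# Storing the little-endian value as a 32-bit word restores the original byte order.
assert struct.pack("<I", little) == s
```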
Another thing that can cause complications is that the li pseudoinstruction expands into different code depending on the value being loaded. A value that fits in the 12-bit signed immediate of addi (-2048 to 2047) becomes a single instruction, but a larger value generally has to be built with lui (load upper immediate), a U-type instruction that can only hold 20 bits, with the remaining lower 12 bits ADDI'd afterward (addiw on RV64).
Why is that? Every RISC-V instruction is 32 bits wide, and some of those bits are needed for the opcode and register fields, so no single instruction has room for a full 32-bit immediate. The value has to be split up somewhere anyway.
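Remember, the U-type format is imm[31:12] | rd | opcode: a 20-bit immediate in bits 31-12, the destination register in bits 11-7, and the opcode in bits 6-0.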
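The field layout can be checked against real machine code. The sketch below packs a lui instruction from its fields, using the LUI opcode 0b0110111 and the register numbers a0 = x10 and a1 = x11 from the RISC-V base ISA; the helper name encode_lui is hypothetical, written just for this illustration. The results match the words objdump prints for asm2 and asm4 later in this post.

```python
LUI_OPCODE = 0b0110111        # major opcode for LUI (0x37)
REG = {"a0": 10, "a1": 11}    # ABI register names -> x register numbers

def encode_lui(rd: str, imm20: int) -> int:
    """Pack a U-type LUI: imm[31:12] | rd[11:7] | opcode[6:0]."""
    assert 0 <= imm20 < (1 << 20), "U-type immediate is only 20 bits"
    return (imm20 << 12) | (REG[rd] << 7) | LUI_OPCODE

print(f"{encode_lui('a1', 0xa4):08x}")     # 000a45b7
print(f"{encode_lui('a0', 0x6e696):08x}")  # 6e696537
```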
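As an example, on RV32 li a1,0xFFFFF000 has all-zero lower 12 bits, so it generates just:
lui a1,1048575
(On RV64, lui sign-extends its 32-bit result to 64 bits, so this particular value, which has bit 31 set, needs a longer sequence there.)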
Whereas attempting to use li a1,0x000a4141
will generate:
lui a1,0xa4 # 0x000a4 - Upper 20 bits
addiw a1,a1,321 # 0x141 - lower 12 bits
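The split itself is plain arithmetic. Below is a minimal sketch of how a 32-bit constant can be divided between lui and addi/addiw. The helper name split_hi_lo is hypothetical, and it assumes the constant fits in 32 bits (on RV64, one that is already a sign-extended 32-bit value). The one subtlety is that addi sign-extends its 12-bit immediate, so when bit 11 of the low part is set, the upper part has to be bumped by one to compensate.

```python
def split_hi_lo(value: int) -> tuple[int, int]:
    """Split a 32-bit constant into (lui imm20, signed addi/addiw imm12)."""
    lo = value & 0xFFF
    if lo >= 0x800:           # addi sign-extends its 12-bit immediate,
        lo -= 0x1000          # so a "negative" low part borrows from the upper part
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo

print(split_hi_lo(0x000a4141))  # (164, 321) -> lui a1,0xa4 ; addiw a1,a1,321
print(split_hi_lo(0x00000800))  # (1, -2048): 0x1000 - 2048 == 0x800
```

The results line up with the lui a1,0xa4 / addiw a1,a1,321 pair above, and with the lui/addiw pairs and the %lo(msg) value of 196 that show up in the disassembly later in this post.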
This can make the assembler's output a bit hard to predict, so be aware of it when writing assembly. Luckily, li is the main pseudoinstruction in RISC-V G whose expansion varies like this, as discussed here.
Interacting with the stack is incredibly simple in RISC-V because it is a "load-store" architecture. This means that only specific memory access instructions, such as ld for loading and sd for storing, touch memory, while all other instructions operate only on registers.
So to store our data in the program's memory, we grow the stack by the number of bytes we need and then store the data into that space.
By convention the stack starts at a high memory address and "grows" downward, which means that to add data to the stack we first move the stack pointer (which points to the current top of the stack, the lowest address in use) downward and then store the data in the freed-up space.
This can be seen below, where we shift the stack pointer sp down 4 bytes to fit our AA\n\0 and then use sd to store the register at sp+4. Note that sd stores a full 8-byte doubleword and sp+4 is the original sp value, so this store actually lands above the 4 bytes we just reserved; sw a1,0(sp) would keep the 4-byte string inside them.
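To make the byte positions concrete, here is a toy model of the stack in Python: a bytearray stands in for memory and an integer for sp. The 32-byte memory size and the entry sp of 16 are arbitrary assumptions, and the run helper is hypothetical. It replays addi sp,sp,-4 followed by the 8-byte sd at 4(sp) from asm2.s, and, for comparison, a 4-byte sw at 0(sp), which is an alternative rather than what the original code does.

```python
ENTRY_SP = 16            # hypothetical 16-byte-aligned sp at _start
MEM_SIZE = 32            # toy memory; index 0 is the lowest address

def run(store_offset: int, width: int) -> bytearray:
    """Model addi sp,sp,-4 and li a1,0x000a4141, then a `width`-byte store at store_offset(sp)."""
    mem = bytearray(MEM_SIZE)
    sp = ENTRY_SP - 4                       # addi sp,sp,-4 reserves [sp, sp+4)
    a1 = 0x000A4141                         # li a1,0x000a4141
    addr = sp + store_offset
    mem[addr:addr + width] = a1.to_bytes(width, "little")  # little-endian store
    return mem

as_written = run(store_offset=4, width=8)   # sd a1,4(sp), as in asm2.s
fits = run(store_offset=0, width=4)         # sw a1,0(sp), a 4-byte alternative

reserved = slice(ENTRY_SP - 4, ENTRY_SP)    # the 4 bytes that addi reserved
print(bytes(as_written[reserved]))          # b'\x00\x00\x00\x00'
print(bytes(as_written[ENTRY_SP:ENTRY_SP + 8]))  # b'AA\n\x00\x00\x00\x00\x00'
print(bytes(fits[reserved]))                # b'AA\n\x00'
```

In the sd version the reserved bytes are never written. write(2) still prints AA\n\0 because a1 is pointed at sp+4, which is exactly where the store began.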
The second part of asm2, which prints Hello MWR Labs, works the same way as what we have just discussed, but instead of building the string in a register it uses the assembler's %hi() and %lo() operators to load the upper and then the lower portion of the address of msg. This is very useful when hand-writing asm and I figured it was worth pointing out, but it is not very useful for shellcoding, since those operators resolve to msg's absolute link-time address, which injected shellcode can't rely on.
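A final note: on POSIX systems the file descriptor for STDOUT (normal output) is 1, while 0 is STDIN. The code below nevertheless passes 0 as the first argument of the write(2) system call and still prints, because when a program is started from an interactive terminal, descriptors 0, 1, and 2 usually all refer to the same read/write terminal device. Using 1 is the more robust choice, for example when output is redirected.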
All of these concepts can be seen being put together below:
.section .text
.globl _start
_start:
li a0,0x0 #first argument: fd 0 (STDIN; the same terminal as STDOUT when run interactively)
addi sp,sp,-4 #move stack pointer down 4 bytes
li a1,0x000a4141 #literal AA\n\0 in little-endian
sd a1,4(sp) #store AA\n\0 at sp+4 (the original sp value); sd writes all 8 bytes of a1
li a1,0 #zero out a1 (not strictly needed: the next instruction overwrites it)
addi a1,sp,4 #second argument: address sp+4 (original sp value)
li a2,4 #third argument: 4 byte size of buffer
li a7, 64 #64 is the __NR_write syscall
ecall #system call
li a0,0x0 #first argument: fd 0 (STDIN; the same terminal as STDOUT when run interactively)
addi sp,sp,-16 #move stack pointer down 16 bytes
lui a1, %hi(msg) #second argument: upper 20 bits of the address of msg
addi a1, a1, %lo(msg) #add the lower 12 bits of the address of msg
li a2,17 #third argument: 17 bytes (16 in the literal plus the NUL that .string appends)
ecall #a7 is still loaded with write syscall
li a0, 0x0
li a7, 93
ecall
.section .rodata
msg:
.string "Hello MWR Labs\n\0"
Once compiled, we can give this a good ol' execution:
$ ./bin/asm2
AA
Hello MWR Labs
And then, to verify that the assembler didn't do anything crazy and to see the li behavior discussed earlier, we can look at the disassembly produced by objdump -D bin/asm2 and check everything in there:
asm2: file format elf64-littleriscv
Disassembly of section .text:
0000000000010078 <_start>:
10078: 00000513 li a0,0
1007c: ffc10113 addi sp,sp,-4
10080: 000a45b7 lui a1,0xa4
10084: 1415859b addiw a1,a1,321
10088: 00b13223 sd a1,4(sp)
1008c: 00000593 li a1,0
10090: 00410593 addi a1,sp,4
10094: 00400613 li a2,4
10098: 04000893 li a7,64
1009c: 00000073 ecall
100a0: 00000513 li a0,0
100a4: ff010113 addi sp,sp,-16
100a8: 000105b7 lui a1,0x10
100ac: 0c458593 addi a1,a1,196 # 100c4 <msg>
100b0: 01100613 li a2,17
100b4: 00000073 ecall
100b8: 00000513 li a0,0
100bc: 05d00893 li a7,93
100c0: 00000073 ecall
Disassembly of section .rodata:
00000000000100c4 <msg>:
100c4: 6548 ld a0,136(a0)
100c6: 6c6c ld a1,216(s0)
100c8: 574d206f j e263c <__global_pointer$+0xd0d67>
100cc: 2052 fld ft0,272(sp)
100ce: 614c ld a1,128(a0)
100d0: 7362 ld t1,56(sp)
100d2: 000a c.slli zero,0x2
...
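Since the dump prints the raw instruction words, their fields can also be pulled apart by hand. This is a small Python sketch (decode_itype is a hypothetical helper written for this illustration) that splits an I-type word using the standard layout imm[11:0] | rs1 | funct3 | rd | opcode, applied to two of the addi words listed above.

```python
def decode_itype(word: int) -> dict:
    """Split a 32-bit I-type instruction word into its fields."""
    imm = word >> 20                 # bits 31-20
    if imm & 0x800:                  # the 12-bit immediate is signed
        imm -= 0x1000
    return {
        "imm": imm,
        "rs1": (word >> 15) & 0x1F,  # bits 19-15
        "funct3": (word >> 12) & 0x7,  # bits 14-12
        "rd": (word >> 7) & 0x1F,    # bits 11-7
        "opcode": word & 0x7F,       # bits 6-0
    }

print(decode_itype(0x0C458593))  # addi a1,a1,196  (a1 is x11)
print(decode_itype(0xFFC10113))  # addi sp,sp,-4   (sp is x2)
```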
Interestingly, the .rodata section is also shown as RISC-V instructions, because objdump -D disassembles every section, but if you check the bit values it is actually just the string that we were using for msg.
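To confirm that those "instructions" really are msg, the values objdump printed can be turned back into bytes. Each value is shown as a little-endian number of 2 or 4 bytes (the 2-byte ones are decoded as compressed instructions), so converting them back with to_bytes(..., "little") recovers the bytes in memory order. A small Python sketch, with the values copied from the dump:

```python
# (value, size in bytes) pairs copied from the .rodata dump above
chunks = [
    (0x6548, 2), (0x6C6C, 2), (0x574D206F, 4), (0x2052, 2),
    (0x614C, 2), (0x7362, 2), (0x000A, 2),
]
# objdump prints each chunk as a little-endian number, so to_bytes(..., "little")
# gives back the bytes in the order they sit in memory.
raw = b"".join(value.to_bytes(size, "little") for value, size in chunks)
print(raw)  # b'Hello MWR Labs\n\x00'
```

The extra NUL that .string appends is hidden in the trailing "...", which brings the total to the 17 bytes passed in a2.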
This example, asm3.s, is pretty much the same as the last, but provides the opportunity to discuss why we should structure assembly for shellcode in a certain manner. One of the first concepts we are going to need is Position Independent Code (PIC). In our last example, the long string that we printed was read from the .rodata section of the binary, which means that the data is written into the binary itself and is either referenced at a static address or assigned a virtual address at run time.
Having strings hardcoded in object sections means that we can't reliably reuse them when injecting the shellcode into a vulnerable target without prior knowledge of the binary, how it was compiled, and how it's mapped into memory. That gives us a new restriction: the shellcode must not reference static locations, which means no absolute references to labels, only relative jumps, and no library functions. Up until this point we were functionally doing that anyway because of the simple nature of our programs, but a real assembly programmer would most likely be taking more advantage of these features.
You might also notice the ebreak instruction. This is an incredibly useful instruction: it raises a breakpoint exception, which Linux delivers to the process as SIGTRAP, so under gdb execution stops right there and you can debug incredibly easily. I used this very heavily during development to better understand the flow of instructions and their interactions with the stack. If your VM supports gdb, I highly suggest exploring the binary with some helpful gdb commands like:
info registers
info stack
disassemble
stepi
Here is another example of code that doesn't reference a hardcoded .rodata section, with an ebreak included for practice:
.section .text
.globl _start
_start:
li a0,0x0 #first argument: fd 0 (STDIN; the same terminal as STDOUT when run interactively)
li a2,8 #third argument: 8 byte buffer size
li t0,0x0a414141 #example of using temporary registers: 'AAA\n'
li t1,0x42424242 #'BBBB'
addi sp,sp,-8 #move the stack down sizeof(t0+t1)
sw t1,0(sp) #store 'BBBB' (sw writes exactly 4 bytes)
sw t0,4(sp) #store 'AAA\n' right after it
addi a1,sp,0 #second argument: point a1 to the top of the stack
li a7, 64 #64 is the __NR_write syscall
ebreak #trap into the debugger here
ecall
li a7, 93 #exit with retval of previous syscall
ecall
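The 8 bytes that write(2) prints here can be predicted from the two li constants. A minimal Python sketch, assuming (as in the listing) that t1 lands at 0(sp) and t0 directly after it at 4(sp):

```python
# t1 and t0 as loaded by the li instructions in the listing above
t1, t0 = 0x42424242, 0x0A414141
# Two adjacent 32-bit little-endian words: t1 at 0(sp), t0 at 4(sp)
buf = t1.to_bytes(4, "little") + t0.to_bytes(4, "little")
print(buf)  # b'BBBBAAA\n'
```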
When run without gdb, you will notice that the program dies with an explicit Trace/breakpoint trap (core dumped), meaning that programs containing ebreak may not be able to run directly and will require some sort of debugging capability.
asm4.s is the last example of simple assembly that we will write: a simple call to execve(2) to run /bin/sh, or, in hacker parlance, getting a local shell. At this point we are still not writing any commonly useful shellcode or anything doing remote access, but we are still familiarizing ourselves with position independent code and some RISC-V assembly pain points.
We will use the same techniques as before to call execve(2) in order to execute a shell. This combines all that we have learned so far and requires matching the following signature from the execve documentation:
execve(const char *filename, char *const argv[], char *const envp[]);
This might be intimidating at first, but just like with the write example, all we need to do is set up the stack to contain the /bin/sh filename. Since we are not passing any arguments to the shell (argv) or any environment variables (envp), we can simply set those arguments to NULL and move on.
Like in asm2.s, we need to load the filename in little-endian and store it on the stack, since the first argument is the memory address of the filename.
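Working out those little-endian literals by hand gets tedious for longer strings. Here is a minimal Python sketch of a hypothetical helper, words_for, that chunks a byte string into 4-byte little-endian words ready for li. It pads the last chunk with NULs, but the caller is assumed to include the string's own NUL terminator (as with b"/bin/sh\0" here).

```python
def words_for(s: bytes) -> list[int]:
    """Chunk s into 4-byte little-endian words, NUL-padding the last one."""
    padded = s + b"\0" * (-len(s) % 4)
    return [int.from_bytes(padded[i:i + 4], "little") for i in range(0, len(padded), 4)]

print([hex(w) for w in words_for(b"/bin/sh\0")])  # ['0x6e69622f', '0x68732f']
```

These are the 0x6e69622f and 0x0068732f constants loaded in the listing below.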
.section .text
.globl _start
_start:
#execve(*filename, *argv[], *envp[])
li a0,0x6e69622f #nib/
addi sp,sp,-8 #set up the stack
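sd a0,0(sp) #store '/bin' (sd writes 8 bytes: '/bin' plus 4 zero bytes)
li a0,0x0068732f #\0hs/
sd a0,4(sp) #store '/sh\0' at sp+4 (this 8-byte store reaches past the 8 reserved bytes)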
addi a0,sp,0 #set a0 to the top of the stack
li a2,0x0 #third argument: envp[] = NULL
li a1,0x0 #second argument: argv[] = NULL
li a7, 221 #221 is the __NR_execve
ecall
li a7, 93 #93 is __NR_exit; only reached if execve fails (a0 then holds -errno)
ecall #exit the program with the retval of execve
Once compiled, we can compare the asm4.s source with the disassembly of what was actually generated:
asm4: file format elf64-littleriscv
Disassembly of section .text:
0000000000010078 <_start>:
10078: 6e696537 lui a0,0x6e696
1007c: 22f5051b addiw a0,a0,559
10080: ff810113 addi sp,sp,-8
10084: 00a13023 sd a0,0(sp)
10088: 00687537 lui a0,0x687
1008c: 32f5051b addiw a0,a0,815
10090: 00a13223 sd a0,4(sp)
10094: 00010513 mv a0,sp
10098: 00000613 li a2,0
1009c: 00000593 li a1,0
100a0: 0dd00893 li a7,221
100a4: 00000073 ecall
100a8: 05d00893 li a7,93
100ac: 00000073 ecall
As expected, each li has been turned into a lui and addiw pair. The w in addiw marks the RV64 word variant: it adds in 32 bits and sign-extends the 32-bit result into the 64-bit register. This is another situation that can arise from the li pseudoinstruction.
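To see what that sign-extension means in practice, here is a small Python model of the RV64 lui and addiw semantics. The function names mirror the instructions but are just illustrative helpers, and 64-bit registers are modeled as unsigned Python ints masked to 64 bits.

```python
MASK64 = (1 << 64) - 1

def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def lui(imm20: int) -> int:
    # RV64: the 32-bit result imm20 << 12 is sign-extended to 64 bits
    return sext(imm20 << 12, 32) & MASK64

def addiw(rs1: int, imm12: int) -> int:
    # RV64: add in 32 bits, then sign-extend the 32-bit sum to 64 bits
    return sext(rs1 + sext(imm12, 12), 32) & MASK64

a0 = addiw(lui(0x6E696), 559)        # lui a0,0x6e696 ; addiw a0,a0,559
print(hex(a0))                       # 0x6e69622f
print(hex(lui(0x80000)))             # 0xffffffff80000000
print(hex(addiw(0x7FFFFFFF, 1)))     # 0xffffffff80000000
```

The last line shows where addiw differs from a plain 64-bit addi: 0x7fffffff + 1 overflows the 32-bit word and is sign-extended, whereas addi would produce 0x80000000.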
The final step to test this is to set the PS1 shell variable to a non-default value, so we can see when the new shell is executed, and then run the compiled binary:
$ PS1="ORIGINAL$ "
ORIGINAL$ ./bin/asm4
$ exit
exit
ORIGINAL$
If the strace command is available, it can also help to visually confirm the system calls we made and how their arguments were set up:
ORIGINAL$ strace ./bin/asm4 2>&1 | grep -i -e "execve" -e "(($(whoami))"
execve("./bin/asm4", ["./bin/asm4"], 0x3fff96daf0 /* 29 vars */) = 0
execve("/bin/sh", NULL, NULL) = 0
In part 3 we will begin to look at more realistic styles of payloads that have restrictions on the characters they can use, how the code gets introduced into vulnerable binaries, and an introduction to shellcode testers.