Q. Why am I making a writeup of a challenge that was from three years ago now?
Annually, b01lers runs their own internal CTF for Purdue students who wants to practice their CTF skills. We meet every Friday and play live CTF together, but an asynchronous CTF that lasts for a year where enthusiasts can try out real CTF problems from previous b01lersCTFs without having to worry about the time limit and "spoiling" the answers (or getting spoiled) is certainly desirable.
I found this challenge while roaming around the internal CTF site sometime last year. At first, the challenge looked very easy, and the first part of it was indeed a piece of cake. The second part, however, was the actual roadblock. I had what-I-thought-was a perfectly working solution for almost five hours, wrongfully convinced that either something was wrong with Python or the challenge itself, until I figured out the solution using a famous result in my research area.
I always wanted to write a writeup about this challenge. I procrastinated until I forgot about it, and until the said famous result was mentioned in one of the classes I am auditing this semester.
Without further ado, let's get started!
crypto/Hardcore
Author: mtanghu
Part 1
There are a lot of things happening in the code, but fear not, let's read the "problem statement" carefully for the first part.
Let $f$ be a one-way function, which in this problem it is SHA1. You are given $f(x)$ for some secret bitstring $x$, which appears to be 256-bit long based on the code. When the user sends in a bitstring $r$, it takes a bit-wise inner product $\left<r,x\right>$ and returns it to the user. Based on $\left<r,x\right>$, we have to recover $x$ correctly.
This is a pretty typical problem in security that appears in many research fields like MPC and differential privacy. If, for example, $x$'s least significant bit is $x_n = 1$, then by feeding $r = 0\dots 00$ and $r = 0\dots 01$:
That is, we can determine $x_k$ by comparing $\left< r,x \right>$ and $\left< r,x_{\neg k} \right>$. If they are the same, then $x_k = 0$; otherwise, $x_k=1$.
Flag1: bctf{do_you_like_hardcore_chals}
Part 2
Things are now a little tricky.
Note that the 100% accuracy part has been changed to 90%.
Attempt 1
One may naively think (like I did) that, since the accuracy is 90%, we can just send the same bitstring $x$ multiple times and take the majority; basically, we repeat Part 1 multiple times.
But then you'd realize that this solution does not work at all. Why?!?
Attempt 2 (Flag!)
After inspecting the local variables in the code above, one should be able to realize that the outputs are deterministic as long as the input payload is fixed, the 'randomness' only comes from the input. For example, count is always 0 or equal to trial. This is due to how random variable chance is computed in the predictor.
Note that FLAG is indeed fixed, so x_r is fixed as long as r is fixed, hence the randomness from the input only, and we cannot solve this problem by just replicating the solution from Part 1. Here, x_r is an array of boolean variables that we cannot predict as we do not know FLAG.
Since the randomness is, roughly speaking, based on the randomness of the inputs, the most natural thing is to randomize the input by XORing with random strings (because it is basically the one-time pad). For some people, this may seem like something that came out of the blue or rather ad-hoc at best, but it actually has a deep theoretical implication called Goldreich-Levin hardcore predicate, which I will outline briefly below (but please feel free to skip).
Security definition of one-way functions requires that it should be very difficult for any (polynomial time) adversary to invert $f(x)$ and recover $x$. Note that the function that returns $0$ for all input ($f(x) \equiv 0$) is not a one-way function because we can trivially find $x$ that returns $f(x)$ (by just picking any element $x$ from the domain, yawn). In a similar spirit, one can realize that $f(x)$'s security only guarantees that it 'hides' $x$ itself, may not guarantee that it hides everything about $x$ (for example, a bit of $x$).
But given that $f$ is still a one-way function, there must exist a bit (or some bits) of $x$ that is hard to compute/guess just based on $f(x)$. More generally, there must be a (boolean) function $g(x)$ (that may reveal some information about $x$) that is hard to compute when you are just given $f(x)$. This is called hardcore predicate. It is known that, given a one-way function $f(x)$ and some (random) variable $r$, it is difficult to compute $g_r(x) = \left< r, x\right>$ without $x$, and this $g(x)$ is called Goldreich-Levin hardcore predicate.
In short, a hardcore predicate is a secret bit that is hard to compute. But, thinking ahead, this also means that, if there is an algorithm that predicts them efficiently, we may be able to use it to recover the secret $x$ efficiently. The server is running Hardcore.py which, upon input $r$, returns $b$ such that $b = \left<r,x\right>$ with probability $0.9$. So, our server is the oracle that predicts Goldreich-Levin hardcore predicate efficiently! And the algorithm that predicts $x$ based on $\left<r,x\right>$'s indeed exists already (by Goldreich and Levin, obviously, hence the name). As you would've guessed, the key is to feed ($e \oplus r$)'s where $e$'s are random strings, instead of $r$'s. This lecture note by Prof. Omkant Pandey at Stony Brook outlines the algorithm and proof very nicely.