A couple of weeks ago the folks over at Privasec RED posted a series of challenges on their official Twitter account. Normally I ignore this sort of thing, but the third challenge caught my attention: decrypt the contents of an encrypted tweet and win a fabulous prize, a one-month subscription to Hack The Box. Actually, the prize wasn’t really important; I just couldn’t resist a good crypto puzzle, so I decided to have a crack.
When I finally got the answer a few hours later, I kicked myself for taking so long. The tools to solve it quickly were at my fingertips from the start; it just didn’t occur to me until I was a couple of hours in how they could help. I wanted to share the process I went through, as I think it makes a neat lesson for anyone wanting to get started on code breaking, so after I claimed my prize (yay!) I asked @privasecred if they would mind me posting a write-up - they kindly obliged.
I should start by saying that my knowledge of how cryptography works is pretty basic. I’ve done the Stanford University “Cryptography I” course on Coursera (which I highly recommend), but that is about it. I am terrible at maths, and the content of that course was probably 50% over my head. Still, I did learn a thing or two about breaking ciphers.
So to start with, here’s the encrypted tweet:
ZmJrZ3t0bm1yIG1yIHRuZCBmYmtnIHlodSBrcWQgYmhoYW1lZyBmaHEhIHFkY2RjbGRxIHRoIGpoY2Qga2VpIGltcWRqdCBjZHJya2dkIHVyIGZocSB0bmQgcnVscmpxbW90bWhlISB9
When I first saw this tweet, I barely paid any attention. One glance was enough for my brain to say “trying to decrypt a random string without knowing the algorithm or parameters is a fool’s errand”. When it appeared in my feed again a bit later, I thought “hmm, maybe I should try Googling it”, but of course it wasn’t that easy. At this point I gave up; I didn’t really have the patience to plug away at something by trial and error, so I went back to the report I was supposed to be writing.
Five minutes later I realised something that should have been obvious: the string in the tweet wasn’t raw encrypted data. One thing I learned from the Stanford course is that the raw output of a modern cipher will almost never look like that. When you XOR the ASCII byte value of a printable character with a (pseudo)random key byte, the result can be any value from 0 to 255, and most of those values aren’t printable characters. The encrypted tweet, however, consisted entirely of printable alphanumeric ASCII characters, which meant it had been encoded into a format that was friendly for a tweet. What format is commonly used for rendering binary data as text?
$ base64 -d tweet.txt
fbkg{tnmr mr tnd fbkg yhu kqd bhhameg fhq! qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe! }
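The printable-versus-raw reasoning above can be turned into a quick check. The Python sketch below is only an illustration, not what was used in the original solve; the helper names `printable_fraction` and `looks_like_base64` are made up for this example, and the check assumes the standard base64 alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`, with optional `=` padding).

```python
import re
import secrets

# The encrypted tweet, copied verbatim.
TWEET = "ZmJrZ3t0bm1yIG1yIHRuZCBmYmtnIHlodSBrcWQgYmhoYW1lZyBmaHEhIHFkY2RjbGRxIHRoIGpoY2Qga2VpIGltcWRqdCBjZHJya2dkIHVyIGZocSB0bmQgcnVscmpxbW90bWhlISB9"

# Printable ASCII runs from 0x20 (space) to 0x7E (tilde): 95 of the 256 possible byte values.
PRINTABLE = range(0x20, 0x7F)


def printable_fraction(data: bytes) -> float:
    """Return the fraction of bytes in `data` that are printable ASCII characters."""
    if not data:
        return 1.0
    return sum(b in PRINTABLE for b in data) / len(data)


def looks_like_base64(s: str) -> bool:
    """Heuristic: only base64 alphabet characters, at most two '=' pads, length a multiple of 4."""
    return len(s) % 4 == 0 and re.fullmatch(r"[A-Za-z0-9+/]*={0,2}", s) is not None


# XOR a printable message with random key bytes, in the style of a one-time pad or stream cipher.
message = b"this is a secret message"
key = secrets.token_bytes(len(message))
ciphertext = bytes(m ^ k for m, k in zip(message, key))

print(f"plaintext printable:  {printable_fraction(message):.0%}")
print(f"ciphertext printable: {printable_fraction(ciphertext):.0%}")
print("tweet looks like base64:", looks_like_base64(TWEET))
```

The plaintext line always prints 100%. The ciphertext figure changes from run to run because the key is random, but with 95 printable values out of 256 it averages around 37%. The base64 check is only a heuristic: plenty of ordinary alphanumeric strings pass it too.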
OK! Now I was getting somewhere, and my interest was instantly locked in. The decoded tweet was gibberish, but the arrangement of letters and spaces was similar enough to regular English that I was almost certain I was looking at a substitution cipher. A closer inspection confirmed my suspicion through another tell-tale sign: some of the “word” patterns appeared more than once, indicating that the cipher used a fixed “key”, where the mapping between letters does not change throughout the ciphertext.
Armed with this knowledge (and ignoring for a moment that there are tools on the web that will break a substitution cipher for you in less than a second), I followed the substitution cipher playbook: I performed a letter frequency analysis on the ciphertext and compared it with the frequency distribution of normal English text:
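Spotting the repeated “words” is easy to automate. Here is a minimal sketch using the decoded text from above; `cipher_words` and `repeated_words` are hypothetical helper names.

```python
import re
from collections import Counter

# Output of `base64 -d` on the tweet.
DECODED = "fbkg{tnmr mr tnd fbkg yhu kqd bhhameg fhq! qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe! }"


def cipher_words(text: str) -> list[str]:
    """Split the text into runs of lowercase letters, discarding spaces and punctuation."""
    return re.findall(r"[a-z]+", text)


def repeated_words(text: str) -> dict[str, int]:
    """Return each cipher word that occurs more than once, with its count."""
    counts = Counter(cipher_words(text))
    return {word: n for word, n in counts.items() if n > 1}


print(repeated_words(DECODED))  # {'fbkg': 2, 'tnd': 2, 'fhq': 2}
```

With a fixed key, the same plaintext word always encrypts to the same cipher word, so repeats like these are what you would expect from a simple substitution. A cipher whose key shifts along the message (Vigenère, for example) would usually break them up.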
Ciphertext:
Letter | D | H | Q | R | M | T | K | C | F | G | B | E | J | N | U | I | L | A | O | Y | P | S | V | W | X | Z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Count | 10 | 8 | 7 | 7 | 6 | 6 | 5 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Freq% | 12.0 | 9.6 | 8.4 | 8.4 | 7.2 | 7.2 | 6.0 | 4.8 | 4.8 | 4.8 | 3.6 | 3.6 | 3.6 | 3.6 | 3.6 | 2.4 | 2.4 | 1.2 | 1.2 | 1.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
English text:
Letter | E | T | A | O | I | N | S | H | R | D | L | C | U | M | W | F | G | Y | P | B | V | K | J | X | Q | Z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Freq% | 12.7 | 9.1 | 8.2 | 7.5 | 7.0 | 6.7 | 6.3 | 6.1 | 6.0 | 4.3 | 4.0 | 2.8 | 2.8 | 2.4 | 2.4 | 2.2 | 2.0 | 2.0 | 1.9 | 1.5 | 1.0 | 0.8 | 0.15 | 0.15 | 0.10 | 0.07 |
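The ciphertext counts in the first table can be reproduced with a few lines of Python. This is a sketch, not necessarily how the table was originally produced: it only lists letters that actually occur (so the six zero-count letters are omitted), and letters with equal counts come out in order of first appearance rather than the order shown in the table.

```python
from collections import Counter

# Output of `base64 -d` on the tweet.
DECODED = "fbkg{tnmr mr tnd fbkg yhu kqd bhhameg fhq! qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe! }"


def letter_frequencies(text: str) -> list[tuple[str, int, float]]:
    """Return (letter, count, percent) for each letter in `text`, most frequent first."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter.upper(), n, 100 * n / total) for letter, n in counts.most_common()]


for letter, n, pct in letter_frequencies(DECODED):
    print(f"{letter}  {n:2d}  {pct:4.1f}%")
```

Percentages are relative to the 83 letters in the ciphertext, which is why D's 10 occurrences come out at 12.0%.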
From this it was pretty clear that D mapped to E, so I plugged that into the ciphertext:
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**** **** ** **E **** *** **E ******* *** *E*E**E* ** ***E *** ***E** *E****E ** *** **E ************
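Each of these trial substitutions is mechanical, so a tiny helper saves a lot of error-prone typing. In this sketch (`apply_partial_key` is a made-up name), the key is a dictionary from cipher letters to guessed plaintext letters; any letter without a guess is shown as `*`, and spaces pass through unchanged.

```python
CT = "fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe"


def apply_partial_key(ciphertext: str, key: dict[str, str]) -> str:
    """Replace cipher letters that have a guess with that (uppercase) guess, and others with '*'.

    Non-letters such as spaces and punctuation are copied through unchanged.
    """
    return "".join(key.get(ch, "*") if ch.isalpha() else ch for ch in ciphertext)


print(apply_partial_key(CT, {"d": "E"}))
print(apply_partial_key(CT, {"d": "E", "h": "T"}))
```

The first call reproduces the D = E line above, and adding `"h": "T"` gives the next attempt. Dropping a bad guess is just a matter of removing it from the dictionary, which makes the back-and-forth much less painful.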
Hmm, not really that helpful, but it looks like H might map to T, so let’s try that:
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**** **** ** **E **** *T* **E *TT**** *T* *E*E**E* *T *T*E *** ***E** *E****E ** *T* **E **********T*
The answer still isn’t leaping out at me… and now it gets harder, because Q and R have equal frequency. Let’s try Q = A:
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**** **** ** **E **** *T* *AE *TT**** *TA AE*E**EA *T *T*E *** **AE** *E****E ** *TA **E *****A****T*
Uh, nope. Maybe R = A:
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**** ***A *A **E **** *T* **E *TT**** *T* *E*E**E* *T *T*E *** ***E** *EAA**E *A *T* **E A**A******T*
At this point I start to feel like I’m playing a poor man’s Wheel of Fortune, so I decide I need to try something else. I’m pretty certain D = E, but everything else seems a bit iffy, so I take everything out except E and start looking at the letter combinations in the ciphertext:
tnd
**E
This combination occurs more than once and ends in E, so I think there’s a good chance it’s a common three-letter word; THE and ARE are good contenders. Let’s try THE:
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**** TH** ** THE **** *** **E ******* *** *E*E**E* T* ***E *** ***E*T *E****E ** *** THE ********T***
Turns out that wasn’t much help, those letters aren’t used enough elsewhere to make the rest of the puzzle any easier.
I continued with this kind of trial and error for quite a while, looking for patterns like double letters and recurring sequences, but every time I had substituted more than four letters it was clear that I was producing non-words. I kept having trouble with one particular combination:
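Hunting for double letters and recurring letter pairs is another job a computer does faster than eyeballing. A rough sketch (the helper names are hypothetical) that lists doubled letters inside words and counts two-letter sequences (bigrams) within words:

```python
import re
from collections import Counter

CT = "fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe"


def doubled_letters(text: str) -> list[str]:
    """Return the distinct doubled letters (like 'hh') found inside words, sorted."""
    return sorted({m.group(0) for m in re.finditer(r"([a-z])\1", text)})


def bigram_counts(text: str) -> Counter:
    """Count adjacent letter pairs within each word; pairs never span a space."""
    counts = Counter()
    for word in text.split():
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


print(doubled_letters(CT))  # ['hh', 'rr']
print(bigram_counts(CT).most_common(5))
```

The troublesome qdcdcldq word shows up here as a `dc` pair that occurs twice.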
qdcdcldq
*E*E**E*
That’s a pretty specific arrangement of Es. Not only that, but the combination “dc” appears twice in a row, AND the first and last letters of the word are the same. There can’t be more than a couple of 8-letter English words that fit that pattern. If only I had some way to search the dictionary using an arbitrary string pattern…
Oh wait, I do. Regular expressions to the rescue:
^([a-z])e([a-z])e\2.e\1$
REMEMBER
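Writing these patterns by hand works, but they can also be generated from the cipher word and whatever letters are already known. The sketch below is one way to do it: `pattern_regex` and `candidates` are hypothetical names, and `WORDS` is a tiny stand-in for a real dictionary. It uses `[a-z]` where the hand-written pattern used `.`, and like the hand-written version it does not force different cipher letters to map to different plaintext letters.

```python
import re

# Hypothetical stand-in for a real word list.
WORDS = ["december", "remember", "referee", "massage", "message", "messages", "passage"]


def pattern_regex(cipher_word: str, key: dict[str, str]) -> str:
    """Build an anchored regex for a cipher word.

    - letters already in `key` become lowercase literals
    - an unknown letter that repeats becomes a capture group on its first
      occurrence and a backreference (\\1, \\2, ...) on later occurrences
    - an unknown letter that occurs only once becomes [a-z]
    """
    groups: dict[str, int] = {}
    parts = []
    for ch in cipher_word:
        if ch in key:
            parts.append(key[ch].lower())
        elif ch in groups:
            parts.append(f"\\{groups[ch]}")
        elif cipher_word.count(ch) > 1:
            groups[ch] = len(groups) + 1
            parts.append("([a-z])")
        else:
            parts.append("[a-z]")
    return "^" + "".join(parts) + "$"


def candidates(cipher_word: str, key: dict[str, str], words: list[str]) -> list[str]:
    """Return the words that match the cipher word's pattern."""
    rx = re.compile(pattern_regex(cipher_word, key))
    return [w for w in words if rx.match(w)]


print(pattern_regex("qdcdcldq", {"d": "E"}))  # ^([a-z])e([a-z])e\2[a-z]e\1$
print(candidates("qdcdcldq", {"d": "E"}, WORDS))  # ['remember']
print(candidates("cdrrkgd", {"c": "M", "d": "E", "k": "A"}, WORDS))  # ['message']
```

On many Unix-like systems a word list can be found at `/usr/share/dict/words` and loaded into `WORDS` instead; whether that file exists, and how many matches it produces, depends on the system.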
Oh look, exactly one match. Let’s plug that into the cipher:
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**** **** ** **E **** *** *RE ******* **R REMEMBER ** **ME *** **RE** ME****E ** **R **E **B**R******
Well, that’s slightly better. I can now guess with fairly high confidence that K = A, because kqd is almost certainly ARE.
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**A* **** ** **E **A* *** ARE ******* **R REMEMBER ** **ME A** **RE** ME**A*E ** **R **E **B**R******
That didn’t help much though, so it’s time for another regex. The rr in cdrrkgd looks promising:
^me(.)\1a.e$
MESSAGE
One match again.
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**AG ***S *S **E **AG *** ARE ******G **R REMEMBER ** **ME A** **RE** MESSAGE *S **R **E S*BS*R******
Interesting: G = G, so not every letter in this cipher has been swapped for a different one. With S now solved, I can probably work out m and u with a bit of trial and error, thanks to those two-letter words ending in S:
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**AG **IS IS **E **AG **U ARE ****I*G **R REMEMBER ** **ME A** *IRE** MESSAGE US **R **E SUBS*RI**I**
OK, that pretty much nails it. The last word is clearly SUBSCRIPTION.
fbkg tnmr mr tnd fbkg yhu kqd bhhameg fhq qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe
**AG T*IS IS T*E **AG *OU ARE *OO*ING *OR REMEMBER TO COME AN* *IRECT MESSAGE US *OR T*E SUBSCRIPTION
With only a handful of letters left to solve, the solution becomes pretty obvious. Plug the punctuation back in for completeness:
fbkg{tnmr mr tnd fbkg yhu kqd bhhameg fhq! qdcdcldq th jhcd kei imqdjt cdrrkgd ur fhq tnd rulrjqmotmhe! }
FLAG{THIS IS THE FLAG YOU ARE LOOKING FOR! REMEMBER TO COME AND DIRECT MESSAGE US FOR THE SUBSCRIPTION! }
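As a final sanity check, the whole solve can be replayed end to end: decode the tweet, read the key off the aligned ciphertext and plaintext, confirm it is a consistent one-to-one substitution, and apply it. This sketch assumes the plaintext above is correct; `derive_key` is a made-up helper, and only the 20 letters that occur in the tweet end up in the key.

```python
import base64

TWEET = "ZmJrZ3t0bm1yIG1yIHRuZCBmYmtnIHlodSBrcWQgYmhoYW1lZyBmaHEhIHFkY2RjbGRxIHRoIGpoY2Qga2VpIGltcWRqdCBjZHJya2dkIHVyIGZocSB0bmQgcnVscmpxbW90bWhlISB9"
PLAINTEXT = "FLAG{THIS IS THE FLAG YOU ARE LOOKING FOR! REMEMBER TO COME AND DIRECT MESSAGE US FOR THE SUBSCRIPTION! }"


def derive_key(ciphertext: str, plaintext: str) -> dict[str, str]:
    """Map each cipher letter to the plaintext letter aligned with it, rejecting contradictions."""
    key: dict[str, str] = {}
    for c, p in zip(ciphertext, plaintext):
        if c.isalpha() and key.setdefault(c, p) != p:
            raise ValueError(f"{c!r} maps to both {key[c]!r} and {p!r}")
    if len(set(key.values())) != len(key):
        raise ValueError("two cipher letters map to the same plaintext letter")
    return key


ciphertext = base64.b64decode(TWEET).decode("ascii")
key = derive_key(ciphertext, PLAINTEXT)
# Letters are translated through the key; braces, spaces and '!' are left untouched.
decrypted = ciphertext.translate(str.maketrans(key))

print(decrypted)
print("letters left unchanged:", sorted(c for c, p in key.items() if c.upper() == p))
```

The last line lists f, g, t, u and y, so G = G was not a one-off: five of the letters in the message map to themselves.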
TA-DA!
Once I was done, I kicked myself for not thinking of regular expressions sooner. I wasted at least an hour plugging in substitutions by guesswork and getting nowhere, but in hindsight I feel like I was “tricked” into thinking the problem would be easy to solve by hand because it was “just” a substitution cipher. I guess the lesson here is that when you are cracking codes of any kind, remember that computers are way better at it than you are.