Hide

Problem A
Crazy Base64

/problems/nsachallenge21.crazybase64/file/statement/en/img-0001.jpg

Problem Statement

A government defense contractor has been compromised by a recent global wave of ransomware. You have been brought on because of your excellent programming and analysis skills. Your mission is to produce code which can decrypt the encrypted files before the malicious actors delete the keys. They are demanding a ransom to get the files decrypted. The CEO has made it clear that they are not at liberty to communicate with the malware developers or pay any ransom. A cyber security firm has also analyzed various versions of the same malware and you have obtained a draft copy of their upcoming report to assist you in your efforts.

The situation appears dire on the surface. The cyber security firm reversed engineered components of the ransomware and has provided some very useful information, but there are a few places where details are scarce and there is no code to support their analysis. According to the report, the malware takes a multi-layered approach to encrypting the files, with the goal of forcing the victims to pay a ransom for the decryption service. After initially browsing the report, it becomes apparent that the ransomware developers have limited cryptanalytic knowledge. This might just be the way in. You will need to complete a series of steps and develop code to decrypt all the files.

The malicious actors have demanded the ransom be paid by 2:30pm EDT or they will destroy all the keys needed to recover the files, which leaves you with less than 3 hours to complete the tasks and determine how to leverage your code to create a decryption solution. At this point, the future of the company is in your hands, so you better get to it! You have been provided with a small number of encrypted files to build your code and test against. If you can decrypt all of these, you will have proven to the CEO that you can decrypt all of their files and you will have saved the day.

An Encoding Layer

According to the malware report, one of the layers that the ransomware actors use is a modified version of base64 encoding. Base64 is a method of representing data as ASCII characters, useful in applications where representing data as raw binary would create an issue. It is easy to spot, since it uses a standard alphabet and will always be ASCII data. The standard base64 alphabet contains 65 characters: 26 uppercase letters, 26 lowercase letters, 10 digits, +, /, and =. The = is only used for padding and the other 64 actually encode the data. Programs will sometimes modify the standard base64 alphabet for various reasons. URLs, for example, will use a URL-safe variant that does not include "/"s. The ransomware developers, however, are just trying to make things harder to analyze.

It is best to think about base64 as a $6$-bit encoding ($2^6 = 64$) method, considering the data in groups of three bytes at a time. We can represent 3 bytes, or 24 bits, as four $6$-bit groups. To represent those bits as ASCII characters, use the alphabet below:

\[ {\tt ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/} \]

where ${\tt A = 000000}$, ${\tt B = 000001}$, …, ${\tt / = 111111}$.

When data cannot be evenly split into groups of $3$ bytes, we pad values out with $0$ bits and use an "=" sign to note this padding. See the examples below for a visual representation of how the encoding works.

\includegraphics[width=0.9\textwidth ]{base64.jpg}

Figure 1: A standard encoding of $3$ bytes into $4$ base64 characters.

\includegraphics[width=0.9\textwidth ]{base64_2.jpg}

Figure 2: An standard encoding of $2$ bytes into $4$ base64 characters with padding.

\includegraphics[width=0.9\textwidth ]{base64_3.jpg}

Figure 3: An standard encoding of $1$ byte into $4$ base64 characters with padding.

Modified Base64

The ransomware developers felt standard base64 was not adequate for them so they devised a modified version of it to attempt to obfuscate the encrypted data even more. The first change that is noted in the malware report is an altered alphabet. Instead of using the alphabet above, they are using:

\[ {\tt AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz9876543210*@ } \]

with “!” as a padding character instead of “=”.

They also rotate the alphabet after each group of $3$ bytes is encoded by circularly shifting the characters of the entire alphabet by the byte value ($0-255$) of the last byte from the previous group of $3$ encoded or decoded, shifting to the left for odd and to the right for even values. After the next group is encoded, the currently shifted alphabet shifts again in the same manner and this continues through the scope of the call to encode or decode. New calls to the routine start with the unshifted version again.

As an example, If we encode "The" as before but using the ransomware alphabet, we would get "kDqs" (instead of "VGhl" as with the standard alphabet) as the indexes now point to different letters. But if we encode "TheTheThe", we will get "kDqs3vDFpIvx". The first group of $3$ bytes will use the unshifted ransomware alphabet, then the alphabet circularly shifts to the left $101$ (e = $0x65 = 101$) resulting in the following alphabet for the second group of $3$ bytes:

\[ {\tt sTtUuVvWwXxYyZz9876543210*@AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrS } \]

and after another shift left of $101$ (e) we encode the last group with

\[ {\tt FfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz9876543210*@AaBbCcDdEe } \]

Since shifting the alphabet $64$ in either direction returns the same alphabet, you can also think about a shift of $101$ as a shift of $37$ ($101 \mod 64$).

Efficiently remove this layer by writing an encoder/decoder for this modified scheme.

Input

The input is two lines with the first line being the string ENCODE or DECODE signifying whether to encode or decode the data found on the second line. If encoding, the data will be in HEX and you need to convert it to raw data before encoding. If decoding the data, the input will be the ASCII string of characters.

Output

Your output should the string in ASCII characters if encoding or the HEX representation of data if decoding.

Sample Input 1 Sample Output 1
DECODE
kDqucnVJXCj6k01SAVkeiINXG8*!
54686973206973206578616d706c652074657874
Sample Input 2 Sample Output 2
ENCODE
416c6c20796f75722062617365206172652062656c6f6e6720746f207573
ILyWOn4bQNfr2eULxUl6joflx0WbDqSJNmraeEL9
Sample Input 3 Sample Output 3
DECODE
95gxYoYTLsIJqSIy*H@K2ns3ihiAba77JbbfWz*eEAUD@SJ!
d3836f06e077c3faac9ea5b735d3a39c74668e18d29e569a401049868cfeef36f9e5e3
Sample Input 4 Sample Output 4
ENCODE
79a042ca1aa11a913d79766c6fe3514e0ab10505492f47b6f05c27eab11ac7345720e71e
PNaBYpUpsErO3z8EfX0aKqvZzBd2DxmyaF8WTlYDbcruy6cd

Please log in to submit a solution to this problem

Log in