COMP10002 Foundations of Algorithms Awesome
Foundations of Algorithms Awesome
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
COMP10002 Foundations of Algorithms Awesome
Version 1.2
2
Changelog
Version 1.2
• Modification: We have modified Section 5 and our inputs to be consistent. There is now no trailing newline at
the end of any of the input files.
• Correction: We have updated the project inputs to conform to the spec. There was additional ciphertext included
in the assignment inputs.
• Correction: In Section 5, Line 4 contains all the timestamps through T19, a typo left out the digit 9. You can assume
there will always be 20 timestamps given, and all input files now match this.
• Addition: Section 8.1 provides guidance on how to test each stage of your program using some new tools we are
providing and new tests on Grok.
• Clarification: In Section 6.3, we added an explanation about the debug input given through the submit functions
has been added, matching the scaffold’s comments.
• Clarification: In Section 6.4 details about read_hex_line have been expanded to match the scaffold’s comments.
Version 1.1
• Typo: Capitalized some references where they were erroneously lowercase
• Typo: End of Section 4.1, “one call AES”→ “one called AES”.
• Typo: End of Section 4.3, “0xC.”→ “0xCC” since 204 in base ten is 0xCC.
• Typo: Locations that used the command ./assignment1 have been changed to use ./program1
• Clarification: You do not need to read the research paper for this assignment, it is present only for enrichment
and enjoyment. It also will not make the assignment easier if you do read it.
• Correction: Section 8, We did not previously provide sample output for the two input files. The latest version of
the package contains two additional test files and two matching output files, the same ones used for Grok.
3
1 Learning Outcomes
In this assignment you will demonstrate your abilities functions, pointers, arrays, loops, and conditionals. The as-
signment covers the following components of the textbook: Chapters 1 to 7 inclusive, along with material from pages
230-235.
The assignment is intended to be very challenging and push your programming skills significantly further than Grok,
as well as testing your algorithmic thinking. However, it is also intended to be fun, and a very close analogue to a real
world task. The first time I coded the full version of the attack for the research paper took me a long time! You can
anticipate that this assignment might also take quite a while, but the payoff will be in your new skills and learning.
THE FOLLOWING IS FOR INFORMATIONAL PURPOSES, YOU DO NOT HAVE TO READ THE RESEARCH PAPER:
Read all the instructions carefully and make sure you understand what is expected of you
2 Full Story (just for fun)
Watch the video here and read the accompanying description!
3 The (Short) Story
You have intercepted an encrypted message, and know some details about how
it was generated. The message was produced by taking the xor of the original
message with some key k1.
This key, k1, was produced using a pseudorandom number generator at times T11
to T19.
However, while you know all the times at which the generator was ever used (T0
to T19), you only have access to outputs from T9 and T10 (O9 and O10).
You reverse engineered the generator and found it used a second key, k2.
Your goal is to reconstruct k1, and use it to decrypt the original message.a
aWhen you succeed, you will know as the output will be a secret message in English.
4 Background Material
The following additional material provides context for the assignment itself. You can also watch the videos uploaded to
the LMS for a walkthrough of the assignment and the background material.
4.1 Generating Random Numbers
Computers don’t know how to be truly random. While one can purchase a device to measure very unpredictable phe-
nomena (like radioactive decays) and convert them into random numbers, most computers have no such hardware.
Instead, computers rely on algorithms that produce random looking output that is very hard for an outsider to distin-
guish from output that is truly random. How hard? We say that no algorithm that runs in polynomial time (Big O of
any polynomial) should be able to tell if the output is truly random or not with anymore than an minute probability.
We call these pseudorandom number generators.
Importantly, the sequence of random numbers should also not be guessable. If you see prior output from the algorithm,
they should be so “random” that you can’t guess what will come next. This is important because we typically use a
random generator to make random keys for encryption.
One well known algorithm to generate random numbers is the the following pseudorandom number generator (known
as the ‘ANSI X9.31 DRBG’).
The generator itself relies on having a cryptographic function, normally one called AES that we discuss in a bit more
detail in Section 4.5. Because of this choice, our assignment will process 128-bits at a time, the same amount as AES. We
represent the encryption of messagem under AES with k key as AESk(m) = c.
4
The X9.31 generator operates as follows:
Step 1: Initialize the state of the algorithm with some pre-generated random array V0 of size 128-bits (think of it as
an array of 16 characters). This can be done by taking a value such as the time the computer was turned on,
xor’d with other timing values from the startup process.
Then loop starting with i = 1 as many times as you need to generate sufficient output
Step 2: Generate an intermediate value Ii = AESk(Ti) where Ti is the current time
Step 3: Generate 128 bits of output by computing Oi = AESk(Ii ⊕ Vi−1)
Step 4: Calculate the value that will be used in the next iteration Vi = AESk(Oi ⊕ Ii)
Eventually you’ll end up with an array O that has as many blocks of random numbers as you need.
Note that the value Vi is the only variable from the previous iteration that is used in the next iteration. We therefore
call Vi the state of the pseudorandom number generator. In this design, if an attacker can figure out the state, they can
predict all future outputs.
The values we provide already incorporate the value of V0, so you will not need to worry about it in the assignment.
4.2 The DUHK Attack
Unfortunately the ANSI X9.31 random number generator has a deadly flaw. If the following conditions hold, an attacker
can guess the sequence of random numbers, and therefore know any key that’s made up of those numbers:
The attack works as follows for i = 1 without loss of generality, assuming the following notation for AES decryption
AES−1k (c) = m:
By rearranging Steps 2 to 4 for i and i+ 1 we can get1:
AES−1k (AES
−1
k (Oi+1)⊕AESk(Ti+1)) = Oi ⊕AESk(Ti) (1)
Now assume that our clever attacker:
1. Has a small number of good guesses for the key k
2. Has intercepted Oi and Oi+1
3. Knows the values of Ti and Ti+1—the times when the two outputs were generated
The attacker now makes lots of guesses for the key k. When they guess the right one, the left side and the right side of
the equations will be equal! They now know the key.2
Because the attacker now knows the key they can compute the secret state of the generator using:
Vi+1 = AESk(Oi+1 ⊕AESk(Ti+1)) (2)
an equation in which they know the value of all the variables on the right hand side.
And, once the attacker knows the state of the generator, they can generate all the subsequent outputs Oi+2, Oi+3 etc.,
assuming they also know the times at which those are generated.
4.3 Bits and Bytes
To carry out this attack you might want to learn a bit more about how we can store and manipulate all the values used
for this sort of cryptography.
Remember from lecture that all values are ultimately represented inside a computer’s memory as numbers of some
sort. Every number can be written in different ‘bases’, like decimal (base-10) or hexadecimal (base-16) which we use for
addresses, or binary (base-2).
1This works because we know the inverse of AES encryption is decryption so we can move an encryption term to the other side where it becomes
a decryption term
2In the real attack, the adversary learns k by reverse engineering the code, but has decent guesses for the two times. The key k is typically too
large to guess.
5
If we write a number in binary, it will consist of only zeroes and ones, with the rightmost digit corresponding to the
one-place, the second-rightmost digit corresponding to the twos-place, then fours-place, eights-place, etc. ascending in
powers of two.
This is directly analogous to the places in a base-10 number: the ones place, the tens place, the hundreds place.
For example in the number 204, the number is composed of 2 · 100+ 0 · 10+ 4 · 1 = 2 · 102+0 · 101+4 · 100. We could
also add a zero in the thousands-place, but the number wouldn’t change.
We can write 204 in binary as 11001100 = 1 · 128 + 1 · 64 + 0 · 32 + 0 · 16 + 1 · 8 + 1 · 4 + 0 · 2 + 0 · 1 =
1 · 27 + 1 · 26 + 0 · 25 + 0 · 24 + 1 · 23 + 1 · 22 + 0 · 21 + 0 · 20
Similarly the number 5 could be written as 101 = 1 · 4 + 0 · 2 + 1 · 1, or we could add some leading zeroes to the front
and write 00000101
Thinking back to what we know about the char type, we know it can store up to 256 values (including zero). To write
all these numbers in binary we would need eight ones and zeroes, each of which we call a ‘bit’, with eight bits making
up a ‘byte’. So, when we represent a char in binary, we will typically add zeroes to the front until all eight bits are
represented.
This assignmentwill not require you tomanipulate the bits of any individual number other than using the XOR operation
on two bytes at a time.
We can do the same thing, just with base-16, instead of base-2, where along with the digits 0-9 we add the letters A-F to
our repertoire. We call this the hexadecimal system, or hex for short.
This means after 9, we count A, B, C, D, E, F and then we’re out of symbols, so we add a 1 to the next column (the 161s),
and put a zero in the original column (the 160s). So 10 = 1 · 161 + 0 · 160 in hex is actually the number 16!
Likewise we could write3 204 as 12 · 161+12 · 160 = 0xC · 161+ 0xC · 160 = 0xCC. Note that here I prefix hexadecimal
numbers with 0x to indicate to the reader, by convention, that the number is in base-16.
4.4 XOR
Table 1: Bitwise XOR operation
ik jk (ik ⊕ jk)
0 0 0
0 1 1
1 0 1
1 1 0
One of the operations that we care about more in cryptography is called
exclusive-or.
Given two eight-bit numbers represented as a collection of eight bits i0...7 and
j0...7, we can define operations that change the bits in different ways. One
of these operations is XOR—short for exclusive-or—which we write using the
symbol ⊕. We can define XOR in the following manner:
Let l0...7 be the output of the ⊕ operation.
For each index k from 0 to 7, if ik = jk then lk = 0, otherwise lk = 1. We
illustrate this in Table 1.
In Table 2 we present the sample calculation of 00110011 ⊕ 01010101 =
01100110. To determine the result for each column, one can use Table 1 on the value in the same position in both
of the rows.
Table 2: Example of Bitwise XOR
00110011
⊕ 01010101
= 01100110
It’s called exclusive-or because applying the XOR operator bitwise, each bit
returns a true value (1) if only (exclusively) one of the two input bits was 1.
Note this works for numbers of any bit-length, with the definition adjusted
accordingly.
In C we take the XOR of two values with the bitwise XOR operator ^. For
example to compute the XOR of two numbers a and b and place the result in
c we write c=a^b, where a, b and c are of a type with an underlying integer
representation (such as an integer, but including a char—which we recall is just
a number in ASCII).
4.5 One-time pad and Basic Encryption
For the very last part of the assignment you’ll need to know just a tiny bit more cryptography which we present here.
3Here I’m abusing notation to mix two bases in one equation for the purpose of illustration
6
Symmetric encryption (where both sides start sharing a secret key) obeys the following property: Dk(Ek(m)) = m.
That is with amessagem, encryption algorithmE and decryption algorithmD, both ofwhich use a key k, the decryption-
of-the-encryption should be the original message.
A one-time pad is a way of encrypting a message. Given a key that is as many bits in length as the message to encipher,
this provides the best theoretical security guarantee of any cipher4. To encrypt a messagem into a ciphertext cwith key
k the process is simple: for the i-th byte of the message, XOR it with the i-th byte of the key. That is, for all i ∈ len(m):
c[i] = m[i]⊕ k[i]
The decryption process is similar: for the i-th byte of the ciphertext, XOR it with the i-th byte of the key. That is, for
all i ∈ len(c):
m[i] = c[i]⊕ k[i]
Why don’t we use this technique everywhere? For the size of the messages that we transmit daily, we’d need to have
already sent and stored keys just as long as the messages! This quickly gets unwieldy and infeasible.
Instead we typically use ciphers that aren’t as strong but that can use a small key to secure messages that are for all
practical purposes, secure. The most widely used (symmetric) cipher is called AES (the Advanced Encryption System).5
5 Your Assignment
Your solutions must be contained within a C file, program1.c.
Your code must compile as submitted or you may receive a zero on the assignment.
Your program is to read in input of the following format:
Input Line 1: Length of Encrypted Message in bytes, an integer l with a maximum value of 1024.
Input Line 2: Encrypted Message encrypted using a one time pad under key k1, which is l characters in length.
Each character is represented as a two-digit hexadecimal number in the file (see Lecture 10, minute
15). This corresponds to b = ⌈ l16⌉ groups of 16-characters (blocks), where ⌈x⌉ is the ceiling function.The numbers will not be prefixed with 0x.
Input Line 3: Outputs O9 and O10 from the random number generator, in groups of 16 characters, where each
character is represented as a two-digit hexadecimal number in the file.
Input Line 4: Timesteps T0 through T19 used to generateO0 toO19, in groups of 16-characters (blocks), where each
character is represented as a two-digit hexadecimal number in the file.
Input Line 5: The first 1284 characters in the text of Sherlock Holmes (or some other novel!), the cipher book.
There is no trailing newline at the end of your input.
Your job is to read in the text of the book, and find which 16-character block has been used as the key k2 for the random
number generator.
You will then generate enough output from the random number generator to produce k1, which is composed entirely
of this random output.
Finally, you will decrypt the original message using the key k1.
Along with this spec we are providing you with skeleton code to fill in, a set of libraries to accompany the scaffold and
files containing the input data. See Section 6 for more details.
We now provide information about the stages in greater detail.
4As discussed in lecture, it was Claude Shannon who provided the first formal proof of this fact!
5Please do not use the techniques in this assignment to encrypt data in real life. While the algorithms used are very similar, there are a large
number of caveats to learn about first.
7
Stage 0: Reading the Input File
Your first task is to read in the input file (from stdin, the standard input), and parse each line correctly, as described
above.
This is typically done (outside Grok) with a command similar to this one:
./program1 < assignment1-input1.txt
When the program tries to read input, it receives the contents of the file. You can always open up a file in your text
editor to see what’s inside, and we encourage you to do so for this assignment.6
Do not use the facilities in C to open the input file within your code. While this is often good practice, we have not
learned it yet in class. Therefore, your submission must take in input as described above.
You must store your results in the following variables, available as arguments in the function stage0:
• int *ciphertext_length which should be set by pointer to the length of the ciphertext (Line 1),
• msg_t ciphertext which should contain the ciphertext (Line 2),
• block_t outputs[] which should contains the two outputs from the random number we provide (Line 3),
• block_t timesteps[] which should contains all the timesteps (Line 4), and
• book_t cipherbook which should contain the text of the cipher book. (Line 5).
These values are passed to the submit_stage0 function called in main. You will not be able to pass the tests on Grok
if you leave extraneous printf statements in your code. This will present you with debugging output in the following
format (truncated for brevity):
Stage 0
==========
Length of encrypted ciphertext (bytes): 128
Encrypted ciphertext, as hexadecimal bytes and ASCII: (below)
0x0000: 0c 6a 03 63 c3 7c 93 d6 48 89 81 27 6b d3 86 bb ·j·c·|··H··'k···
...
0x0070: 00 74 cc 54 64 74 58 0e 5f 2f 86 4f 21 da 92 e7 ·t·TdtX·_/·O!···
Outputs, as hexadecimal bytes and ASCII: (below)
O_ 9: a9 90 e4 e7 f0 d7 1d 0f d3 67 62 ba 1e 65 8f 84 ·········gb··e··
O_10: 85 99 d6 a1 93 c5 30 01 e0 ec 1b 69 ed 0e 20 e8 ······0····i·· ·
Timesteps, as hexadecimal bytes and ASCII: (below)
T_ 0: 01 3c 6c 60 52 86 02 00 01 3c 6c 60 52 86 02 00 ·T_ 1: 01 3c 6c 60 37 9a 02 00 01 3c 6c 60 37 9a 02 00 ·...
T_19: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ················
Cipherbook: (below, wordwrapped to 80 chars)
To Sherlock Holmes she is always THE woman. I have seldom heard him mention her
under any other name. In his eyes she eclipses and predominates the whole of her
...
ifi
Your code’s output should exactly match the sample output given for the input file.
Stage 1: Stripping Punctuation
Produce an array that consists of only the alphanumeric values from the book’s text on Line 5.
You may not use any functions from string.h or ctype.h.
You must store your results in the following variables, available as arguments in the function stage1:
• book_t cipherbook[] which should (after your completed implementation) contain only the alphanumeric val-
ues from the book, and
6Some files store data intended to be read only by a computer, so a human may not be able to understand and read every file on a computer.
8
• int *book_len which should be set by pointer to be the length of the cipher book after you have removed all
non-alphanumeric values.
These values are passed to the submit_stage1 function called in main. This will submit your result for stage 1 for
grading, as well as giving you a debug output with the format (truncated for brevity):
Stage 1
==========
Punctuation stripped cipherbook length: 1024
Punctuation stripped cipherbook, as hexadecimal bytes and ASCII: (below)
0x0000: 54 6f 53 68 65 72 6c 6f 63 6b 48 6f 6c 6d 65 73 ToSherlockHolmes
0x0010: 73 68 65 69 73 61 6c 77 61 79 73 54 48 45 77 6f sheisalwaysTHEwo
...
0x03f0: 68 65 68 6f 6d 65 63 65 6e 74 72 65 64 69 66 69 hehomecentredifi
Your code’s output should exactly match the sample output given for the input file.
Stage 2: Guessing the key k2
You will need to implement a loop to iterate through the stripped book you generated in stage 1, in blocks of 16-bytes.
For each block, generate both sides of eq. (1). above and check if both sides match. When both sides match, you have
discovered the correct set of 16 bytes that comprises k2
You must store your result in the following variable, available as arguments in the function stage2:
• block_t key2 which should contain the 16 characters that make up k2.
This value is passed to the submit_stage2 function called in main. This will submit your result for stage 2 for grading,
as well as giving you a debug output with the format:
Stage 2
==========
Key k2: 76 65 72 65 78 63 65 6c 6c 65 6e 74 66 6f 72 64 verexcellentford
Your code’s output should exactly match the sample output given for the input file.
For this stage you will need to use the AES functions we provide, whose use we describe in section 6.2.
Stage 3: Generating the key k1
You will need to generate outputs O11, O12, ..., Ok until you have generated n bytes of output where n is the length in
characters of the encrypted message on Line 2. Remember each O is 16-bytes in length.
Write a function that implements the random number generator, and use it to generate output of length n. This output
is k1. To do so you will need to write code that performs Steps 1 to 4 listed in section 4.1.
This will require you to first calculate V10, the initial state of the generator, which you should be able to compute from
O10, I10 and k2 using eq. (2).
You must store your result in the following variable, available as arguments in the function stage3:
• byte_t key1[] which should contain the n characters that make up k1.
These values are passed to the submit_stage3 function called in main. This will submit your result for stage 3 for
grading, as well as giving you a debug output with the format:
Stage 3
==========
Key k1, as hexadecimal bytes and ASCII: (below)
0x0000: 58 2f 50 37 97 39 c0 82 1c cc d2 73 3f 96 d5 ef X/P7·9·····s?···
0x0010: 78 06 f3 fb d5 64 04 a3 5c 6f 2a bd 98 66 49 f1 x····d··\o*··fI·
...
0x0070: 54 31 9f 00 30 31 0b 5a 0b 6a d5 1b 75 9f c1 b3 T1··01·Z·j··u···
Your code’s output should exactly match the sample output given for the input file.
9
Stage 4: Decrypting the original message
Iterate over both k1 and c the encrypted message taking the xor to produce the original unencrypted message and solve
the caper!
You must store your result in the following variable, available as arguments in the function stage4:
• byte_t plaintext[] which should contains the decrypted message.
These values are passed to the submit_stage4 function called in main. This will submit your result for stage 4 for
grading, as well as giving you a debug output with the format:
Stage 4
==========
Decrypted ciphertext, as hexadecimal bytes and ASCII: (below)
0x0000: 54 45 53 54 54 45 53 54 54 45 53 54 54 45 53 54 TESTTESTTESTTEST
0x0010: 54 45 53 54 54 45 53 54 54 45 53 54 54 45 53 54 TESTTESTTESTTEST
...
0x0070: 54 45 53 54 54 45 53 54 54 45 53 54 54 45 53 54 TESTTESTTESTTEST
Decrypted ciphertext, as plaintext: (below)
TESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTT
ESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTESTTEST
Your code’s output should exactly match the sample output given for the input file.
6 Provided Code
We are providing you with a file that contains some pre-existing code to both aid in grading, and also to help you
structure your assignment.
You must use the scaffold code for the assignment, else you will have differences in your output and will face severe
mark deductions.
The scaffold begins with an authorship declaration. You must fill this in with your full name, student number and
date.
6.1 Type definitions
The scaffold provides a set of type definitions: book_t, byte_t, block_t and msg_t. These are aliases for their defined
types. For example, block_t, representing a block of 16 bytes of data is defined with:
#define BLOCKSIZE 16 // The length of a block
typedef unsigned char byte_t; // A byte (8 bits)
typedef byte_t block_t[BLOCKSIZE]; // A cipher bitset (block) (16 bytes)
block_t can be used anywhere where you would use a unsigned char[16]. For the rest of the implementations, see
the scaffold code.
6.2 The aes.h library
The scaffold includes the library aes.h with the line #include "aes.h". It provides the simple AES implementation
functions:
AES_encrypt(unsigned char message[16], unsigned char key[16], unsigned char output[16])
AES_decrypt(unsigned char ciphertext[16], unsigned char key[16], unsigned char output[16])
Clearly, a variable of type block_t could be used in the arguments of the AES functions.
Do not modify any code in aes.h or aes.c. It will be replaced with a fresh version in grading.
6.3 The a1grader.h library
The scaffold includes the library a1grader.h with the line #include "a1grader.h".
It provides access to the following functions:
10
void submit_stage0(int cipher_length, msg_t ciphertext, block_t outputs[N_OUTPUT_BLOCKS],
block_t timesteps[N_TIMESTEPS], book_t cipherbook);
void submit_stage1(book_t stripped_book, int book_len);
void submit_stage2(block_t key2);
void submit_stage3(byte_t *key1);
void submit_stage4(msg_t plaintext);
These are called automatically in the scaffold code at the end of each stage of the assignment, for printing the outputs
you saw in 5, as well as submitting those stages for assessment during grading. Make sure your values for submission
are stored in the variables that are passed to these functions.
The library also provides the helper function void hexdump(byte_t a[], int len), a debug function that outputs
a byte array in the format you saw in each of the stages in Section 5:
0x0010: 87 0f 7b c0 ce 4d 2e 10 cb 8c 3e b2 57 a9 bb 4f ··{··M.···>·W··O
On each line, the bit number of the first byte of each block is printed in hexadecimal. Then follows two sets of eight
bytes (a block of 16 bytes), with each byte printed as pairs of hexadecimal characters (00 to FF represents all values 0 to
255 that can be stored in a byte). Finally, an ASCII representation of these bytes is printed, with all non-printable ASCII
characters (control characters, newlines, invalid ASCII) being printed as · (centerdot). In the example above, the byte
0f is at address 0x0010 and is not a printable ASCII character, so is printed as a ·.
You are free to use this function in debugging your code, but we recommend removing those debug printouts when you
submit your final work.
We have provided you an implementation for this library, in a1grader.c for the development version of these functions.
For grading, we will use the grading version of a1grader.c, which you will not be able to see, specifically for grading
your work.
Do not modify any code in a1grader.h or a1grader.c. It will be replaced with a fresh, but modified version for
grading.