comp2022 Assignment 3 s2 2020
In this assignment you will implement an algorithm to convert an NFA to a DFA, and decide if strings are members of the corresponding language, by performing the following four tasks:
1. Compute the epsilon closure E(q) for each state q ∈ Q of the NFA. 2. Use the epilson closures to construct an equivalent epsilon-free NFA. 3. Use the epsilon-free NFA to construct an equivalent DFA.
4. Determine if the input strings are recognised by the DFA.
Input is read from standard input (stdin). The first line of input indicates which part of the algorithm to execute. The remaining input is the data to act on. Examples of usage, and of input and expected output are provided in the appendices.
Skeleton code is provided for Python on Ed. The code demonstrates how to read the input data. You are not required to use the skeleton code, and may modify it.
1
• •
2
• •
3
•
• •
Compute epsilon closures of the NFA
Input: epsilon-closure followed by an NFA (the syntax is described in the appendices). Output: One line per state, giving the epsilon closure of that state. You can output the lines
(and contents of the sets) in any order. E(q0) = {q0, q2, q3} would be output as q0:q0,q2,q3 Construct an equivalent epsilon-free NFA
Input: nfa-to-efnfa followed by an NFA and its epsilon closures (the syntax is described in the appendices).
Output: A string representing the equivalent epsilon-free NFA constructed using the algorithm given in lectures. The states in the new NFA should have the same names as the corresponding states in the original one.
Construct an equivalent DFA
Input: efnfa-to-dfa followed by an epsilon-free NFA (the syntax is described in the appen- dices).
Output: A string representing the equivalent DFA, in the format described in the appendices.
Caution: If followed exactly, the algorithm given in lectures will always result in a DFA with a state for every subset of states of the given NFA. This exponential growth is highly inefficient, and the test cases will likely fail because they took too long to compute. To avoid this “best case exponential” case, you should only construct the states actually reachable from the start state (i.e. add a state representing {q0} first, then a state corresponding to each set of states reachable from that, then the sets reachable from those, etc.), as was demonstrated in lectures.
1
comp2022 Assignment 3 s2 2020
4
5
• •
Decide if the strings are in the language
Input: compute-dfa followed by a DFA, then one or more strings to test (the syntax is described in the appendices).
Output: For each string, output 1 or 0 only, indicating whether that string was accepted by the DFA.
Submission and Marking
Submit your code on Ed. You do not need to submit anything anywhere else for this assignment. Each task (1–4) is worth 20 marks. Within each task, the marks are distributed as follows:
• 5 marks for passing the corresponding test case in appendix 6.3 of this document. It is permissable to hard-code the solution of this particular test case in your program (in case your program does not correctly implement the algorithm).
• 15 marks for passing the remaining test cases, some of which are hidden, and some of which we will provide the solutions to. It is not permitted to hard-code the solutions of these cases!
The remaining 20 marks are based on the overall quality of the code. This mark is based on factors such as:
• How readable is the code? e.g. commenting, variable naming, space, line length, etc.
• Are the data structures appropriate to the problem? e.g. how the automata are represented. • Are the algorithms implemented in a reasonably efficient way?
• How much of the assignment was attempted.
2
comp2022 Assignment 3 s2 2020 6 Appendices
6.1 Input/output formats
All inputs can be assumed to be well formed (i.e. you do not have to include error handling for malformed input, and you can assume that input strings will only include symbols from the input alphabet).
6.1.1 NFA/DFA
A plain text representation of the definition of a finite state automata, as a sequence of lines:
1. The set of states (as a comma separated list) e.g. q0,q1,q2,q3. State names should never
contain ,, :, or whitespace characters.
2. The set of alphabet symbols (as a comma separated list) e.g. 0,1
3. The start state e.g. q0
4. The set of final states (as a comma separated list) e.g. q1,q3
5. A sequence of lines representing transitions, each being a comma separated list of three values s,c,t denoting a transition from s to t on symbol c.
• DFA: s,c,t defines δ(s, c) = t. There should be exactly one such line per (state, symbol) pair.
• NFA: s,c,t defines t ∈ δ(s, c). There can be any number of such lines per (state, symbol) pair. ε is denoted by an empty string (i.e. s„t defines t ∈ δ(s, ε)).
6. Finally, the word end on a line by itself.
6.1.2 Epsilon Closures
Lines of the form q0:q0,q3,q4, in any order, followed by the word end on a line by itself. This example denotes E(q0) = {q0, q3, q4}.
6.1.3 Strings to test
• Input: One per line, followed by the word end on a line by itself.
• Output: Just a 1 or a 0 on each line, in the same order that the words were input.
3
comp2022 Assignment 3 s2 2020 6.2 Worked example
This example is included in the test cases on Ed, to help you verify that you are outputting data in the right format, although it is not worth any marks. The initial NFA is the one you would get by following the construction methods shown in class, for the following regular expression: (a(a ∪ b))⋆
6.2.1 Task 1
If the input is:
epsilon -closure q0,q1,q2,q3,q4,q5,q6,q7 a,b
q0
q0,q4,q6
q0,,q1
q1,a,q2
q2,,q7
q3,a,q4
q4,,q0
q5,b,q6
q6,,q0
q7,,q3
q7,,q5
end
Then the expected output is:
q0:q0,q1
q1:q1 q2:q2,q3,q5,q7 q3:q3 q4:q0,q1,q4 q5:q5 q6:q0,q1,q6 q7:q3,q5,q7
end
nfa-to-efnfa q0,q1,q2,q3,q4,q5,q6,q7 a,b
q0
q0,q4,q6
q0,,q1
q1,a,q2
q2,,q7
q3,a,q4
q4,,q0
q5,b,q6
6.2.2 Task 2
If the input was:
4
comp2022
Assignment 3
s2 2020
q6,,q0
q7,,q3
q7,,q5
end
q0:q0,q1
q1:q1
q2:q2,q3,q5,q7
q3:q3
q4:q0,q1,q4
q5:q5
q6:q0,q1,q6
q7:q3,q5,q7
end
Then the expected output is:
q0,q1,q2,q3,q4,q5,q6,q7 a,b
q0
q0,q4,q6
q0,a,q2 q1,a,q2 q2,a,q4 q2,b,q6 q3,a,q4 q4,a,q2 q5,b,q6 q6,a,q2 q7,a,q4 q7,b,q6
end
A,B,C,D,E a,b
B
B,D,E A,a,A A,b,A B,a,C B,b,A C,a,D C,b,E D,a,C D,b,A E,a,C E,b,A
6.2.3 Task 3
If the input was efnfa-to-dfa, followed by the output of Task 2 (above), then the expected output DFA follows. Note that the state names do not need to be the same as these.
5
comp2022 Assignment 3 s2 2020 end
6.2.4 Task 4
If the input was efnfa-to-dfa, followed by the DFA output by Task 3 (above), followed by:
ab
aaaaa abaa ba
end
Then the expected output is:
1 1 0 1
0
Because ab, abaa, and ε are all in the language, but aaaaa and ba are not.
6
comp2022 Assignment 3 s2 2020 6.3 Marked test cases
The tests in this section of the appendix are worth 20% of the overall grade. You are permitted to solve them by hand, and hard-code their answers in your program. This will enable you to demonstrate some understanding of the concepts even if your implementation is incomplete. You should not hard-code solutions to any of the other test cases.
6.3.1 Input for task 1:
q0,q1,q2,q3 0,1
q0
q1,q3 q0,0,q1 q0,1,q2 q0,,q1 q1,0,q1 q1,0,q2 q2,0,q0 q2,1,q3 q3,,q0 q3,,q2
end
q1 0 0
q2 ε1
ε
q3
q0
ε
ε
ε,0
1
q0
0
6.3.2 Input for task 2:
q0,q1,q2,q3,q4,q5,q6,q7 a,b
q0
q2,q3,q7
q0,,q1
q0,,q3
q1,a,q2
q3,,q4
q4,a,q5
q5,,q6
q6,b,q7
q7,,q4
end
q1:q1 q0:q0,q1,q3,q4 q3:q3,q4 q5:q5,q6
q4:q4 q6:q6 q2:q2 q7:q4,q7
end
q1 aε
q2 q4 a
q5 εε
q6
b q7
q3
7
comp2022
Assignment 3
s2 2020
6.3.3 Input for task 3:
q0,q1,q2,q3 a,b
q0
q1,q3 q0,a,q1 q0,b,q1 q0,b,q2 q1,a,q1 q1,a,q2 q2,a,q0 q2,b,q3 q3,a,q0 q3,a,q2
end
a,b
b
q1 a a
q2 a
q0
a
b
6.3.4 Input for task 4:
The DFA and input strings:
a
q3
error
a,b,c
start
good
error
error
error
start
start
start good,a,error good,b,good good,c,error end
aab
a acaaa abbbb aa
end
,start ,good
,a,error ,b,error ,c,error ,a,good ,b,error ,c,error
a
start
b,c
good b
a,c error
a,b,c
8