Assignment Questions
1In this exercise, we will restrict txt[1 . . . n] and pat[1 . . . m] to strings drawn from a two-letter alphabet, ℵ = {‘a’, ‘b’}. Your goal is to implement a specialized version of the Boyer-Moore algorithm that exclusively and effiffifficiently handles the search on such strings.
For this task, the (extended) bad-character rule becomes redundant, and hence in your implementation you can ignore the bad-character rule entirely and only consider shifts using the good-suffix and matched-prefix rules. (Reason to yourself why this is so.)
Finally, your implementation of this specialized version should be effiffifficient, and exploit any obvious optimizations that arise from dealing with a two-letter alphabet.
Strictly follow the specifification below to address this question:
Program name: special boyermoore.py
Arguments to your program: Two plain text fifilenames:
(a) name of an input fifile containing txt[1 . . . n] (without any line breaks), from an ℵ = {‘a’, ‘b’} alphabet.
(b) name of another input fifile containing pat[1 . . . m] (without any line breaks), also from an ℵ = {‘a’, ‘b’} alphabet.
Command line usage of your script:
python special boyermoore.py <text filename> <pattern filename>
Do not hard-code the fifilenames/input in your program. The pattern and text should be specifified as arguments. Penalties apply if you do.
Output fifilename: output special boyermoore.txt
❼ Each position where pat matches the txt should appear in a separate line. For example, when text = bbbbabbbbbbabb, and pattern = bbbb, the output should be:1678
Comments to marker document: A PDF document, commentsq1.pdf, that records the list of optimizations you have implemented in your script.
In this exercise, you will be generalizing the pattern matching problem to two dimensions (2D), where the goal is to search for all instances of a 2D (m × m) pattern, pat[1 . . . m][1 . . . m] within another 2D (n × n) text, txt[1 . . . n][1 . . . n], where n ≥ m.
For this task, assume all characters come from a subset of ascii alphabet (click), with characters in the ascii range [33, 126].
Importantly, this pattern matching is approximate in the sense that pat[1 . . . m][1 . . . m] matches a region starting at position (i, j), of the 2D text txt[1 . . . n][1 . . . n] (where i is a row index and j is a column index, both 1-based) if and only if each row of pattern pat[k][1 . . . m] has a Hamming distance (click) ≤ 1 with the corresponding row txt[i + k − 1][j . . . j + m − 1], ∀1 ≤ k ≤ m, in any region of the text where it appears.
2Below is an example of the approximate occurrences of a 3 × 3 pattern at specifific loci/regions in a 15 × 15 text:
(Notice in the above example, in regions of the text where the pattern appears (approximately), each row of the pattern has a Hamming distance of either 0 or 1 with the corresponding regions in the text.)
Strictly follow the specifification below to address this question:
Program name: hd1pm2D.py
Arguments to your program: Two plain text fifilenames:
(a) the fifilename of an input fifile containing an n × n 2D text (i.e., n lines, where each line is a string of n characters. You are free to assume that n ≥ 1
(b) the fifilename of another input fifile containing a m × m 2D pattern (i.e., m lines,where each line is a string of m characters. You are free to assume that m ≥ 1 and m ≤ n.
Command line usage of your script:
python h1pm2D.py <2D text filename> <2D pattern filename>
Do not hard-code the fifilenames/input in your program. The pattern and text should be specifified as arguments.
Penalties apply if you do.
Output fifilename: output hd1pm2D.txt
❼ Each (row index, column index) position in the 2D text (using 1-based indexes) where the 2D pattern appears (approximately). Using the example shown above,3the output fifile will contain:
(5,12)
(6,2)
(7,3)
(13,5)
Comments to marker document: A PDF document, commentsq2.pdf, that describes at a high-level the logic of the search you have implemented in your script.