Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
CSC148 Assignment 2 |Prefix Trees and Melodies
By this point in CSC148, we have studied several different abstract data types and data structures,and most recently looked at tree-based data structures that can be used to represent hierarchical data. On this assignment, you’ll explore a new tree-based data structure known as a prefix tree,which is a particular kind of tree used to store large quantities of data organized into a hierarchy by common “prefixes”.
You’ll be responsible first for implementing this new data structure, and then applying it to two different problem domains, one of which is a text-based autocompletion tool similar to what online search engines like Google use.
Task 1: Beginning with SimplePrefixTree
Open prefix_tree.py and read through the provided starter code for the Autocompleter and SimplePrefixTree classes (you can ignore CompressedPrefixTree for now).
Your first task is to implement __init__, __len__, and insert for the SimplePrefixTree class.
The first two are quite a bit easier than the third.
SimplePrefixTree.insert should be implemented recursively, but be aware that you’ll need to create new SimplePrefixTree objects to represent the different prefixes as you go down the tree.
Also, pay attention to representation invariants! In particular, the list of subtrees should always remain sorted in non-increasing order of weight, and your insertion algorithm should check for this.
Check your work!
We want you to start thinking about generating good test cases even at this point. Here are some ideas for test cases to get started:
Inserting one value with an empty prefix [] into a new prefix tree should result in a tree with two nodes: an internal node with an empty prefix [] , and then a leaf containing the inserted value. Note that __len__ should return 1 in this case, since we only count inserted values for the Autocompleter ADT.
Inserting one value with a length-one prefix [x] into a new prefix tree should result in a tree with three node: two internal nodes with prefixes [] and [x], and then a leaf containing the inserted value.
Inserting one value with a length-n prefix [x_1, …, x_n] into a new prefix tree should result in a tree with (n+2) nodes: internal nodes with prefixes [], [x_1], [x_1, x_2], etc., and then a leaf containing the inserted value.
Check that correct weights are calculated for both possible aggregate weight types. Note that averages should not be rounded here.
Task 2: Implementing autocompletion
After you are confident that your algorithm is inserting values correctly, the next step is to implement autocompletion.
You should do this in two parts:
1. First, implement an unlimited autocomplete, where you ignore the limit parameter and always return every value that matches the input prefix.
2. Then, implement the limited autocomplete, using the algorithm described in the Weights section.
You might find bugs in how your insertion algorithm updated tree weights. That’s totally normal, but if this happens to you, make sure to add to your Task 1 tests to check that weights are updated properly!
Note that because of the SimplePrefixTree representation invariants, each subtree list should already be sorted in non-increasing weight order when you call autocomplete; don’t re-sort the subtrees inside autocomplete (which is non-mutating), as this is very inefficient.
At this point, you should try inserting some values into a prefix tree, and then calling autocomplete to obtain some results. Here are some suggestions of input properties/conditions to help you design test cases (add to this!):
How many values in the prefix tree match the autocomplete prefix? (0? 1? 10?)
What is the relationship between the number of matches and the limit argument? (less than? equal to? greater than?)
If there are more matches than the specified limit, try different combinations of input weights to check that you’re returning the right matches.
Again, check all of the above for both aggregate weight types.
Task 3: Implementing removal
Your task in this section is to implement the SimplePrefixTree.remove method. The recursive structure is similar to SimplePrefixTree.autocomplete, but you’ll need to be careful about updating weights.
Note: similar to the Tree class, we have a representation invariant that says that self.subtrees cannot have any empty prefix trees (i.e., ones that do not contain any more values).
This means you need to check for this when implementing remove, since removing all values with a given prefix might cause a much larger tree (and even the whole prefix tree) to become empty.
For example, if we have a prefix tree that contains two only values ‘cat and ‘car’, and we remove every value with prefix [‘c’, ‘a’], then the resulting tree should become empty, even though the prefix looks much more specific.
Check your work!
We strongly recommend starting here with small trees containing just a few values, and then deleting one value at a time and checking at each step that the resulting tree has the right structure and weights.
After this task, you’re done with the SimplePrefixTree class. Onto some fun applications!
Task 4: Text-based autocompletion
(Updated Nov 19) Please check new starter code for autocomplete_engines.py for correction to documentation.
Next, open autocomplete_engines.py. Inside are the start of two classes
LetterAutocompleteEngine and SentenceAutocompleteEngine. Both of these are relatively simple classes that use an Autocompleter to perform some autocomplete actions, and both store strings as the values in the Autocompleter. Where they differ is in how they generate prefix sequences for each string when it is inserted.
The LetterAutocompleteEngine associates a string with a list of characters in that string, in the same way as the examples we’ve been using in the assignment handout so far. This enables it to finish a string given the first few letters.
The SentenceAutocompleteEngine associates a string with a list of the words in that string,where whitespace is used to separate words. For example, the sentence
‘how are you today’ would have the prefix sequence [‘how’, ‘are’, ‘you’, ‘today’].
Note that the LetterAutocompleteEngine can also take a string with spaces in it, and each space is treated just like any other letter. For example, the string ‘how are’ would have the prefix sequence [‘h’, ‘o’, ‘w’, ‘ ‘, ‘a’, ‘r’, ‘e’].
Your task is to complete the implementation of these two classes based on the public interface and some implementation instructions we’ve given you in the starter code. The hardest part of each one is their initializer; the other methods should be simple calls to Autocompleter methods.
Text sanitization
Both LetterAutocompleteEngine and SentenceAutocompleteEngine will read data from a text file to populate their Autocompleters. In the real world, this data is often messy, and we often go through the process of sanitizing the data before using them in the rest of our program.
Task 5: Let’s make some music
Our last application will be a fun one: performing autocomplete on song melodies! First, some background definitions.
A note represents a single sound. In Python, we represent a note as a Tuple[int, int],where the first integer represents the pitch of the note,1 and the second represents its duration in milliseconds.
A melody is a sequence of one or more notes. In melody.py, we’ve provided a Melody class that stores a list of notes and a name for the melody.
An interval is the difference in pitch between two notes.
Finally, the interval sequence of a melody is a list containing the differences between consecutive notes in the melody. For example, if a melody contains the sequence of notes
Task 6: Compressing prefix trees
You might notice that our implementation of SimplePrefixTree is a little bit inefficient: because we require that every internal value is a prefix that’s only one element longer than its parent, it is possible to get very long paths down a simple prefix tree that lead to just a single value. You can see a small example of this in the diagram of the sample prefix tree earlier in the handout with the path leading to the word “danger”.
Let’s formalize this idea. We say that an internal value in a prefix tree is compressible if it has exactly one child, and that child is also an internal value. Equivalently, we can say that an internal value is incompressible if it has more than one child or it is the parent of a leaf (or both). It turns out that it is possible to remove every compressible internal value in a prefix tree but still support the Autocompleter operations in an efficient manner!
We say that a prefix tree with no compressible internal values is fully compressed. For example,here is a fully compressed version of the prefix tree from the start of the handout.