COSC 2123/1285 Algorithms and Analysis
Algorithms and Analysis
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
COSC 2123/1285 Algorithms and Analysis
Assignment 1: Word Completion
Assessment Type (Group of 1 or 2) Assignment. Submit online via Canvas → Assignments → Assignment 1. Clarifications/Updates/FAQ: check
Canvas → Discussion → Assignment 1 Discussion.
1 Objectives
There are a number of key objectives for this assignment:
• Understand how a real-world problem can be implemented by different data structures and/or
algorithms.
• Evaluate and contrast the performance of the data structures and/or algorithms with respect to
different usage scenarios and input data.
In this assignment, we focus on the word completion problem.
2 Background
Word/sentence (auto-)completion is a very handy feature of nowadays text editors and email browsers
(you must have used it in your Outlook). While sentence completion is a much more challenging task
and perhaps requires advanced learning methods, word completion is much easier to do as long as
you have a dictionary available in the memory. In this assignment, we will focus on implementing a
dictionary comprising of words and their frequencies that allows word completion. We will try several
data structures and compare their performances, which are array (that is, Python list), linked list,
and trie, which are described one by one below. Please read them very carefully. Latest updates and
answers for questions regarding this assignment will be posted on the Discussion Forum. It is your
responsibility to check the post frequently for important updates.
Array-Based Dictionary
Python’s built-in ‘list’ is equivalent to ‘array’ in other language. In fact, it is a dynamic array in the
sense that its resizing operation (when more elements are inserted into the array than the original
size) is managed automatically by Python. You can initialize an empty array in Python, add elements
at the end of the array, remove the first instant of a given value by typing the following commands
(e.g., on Python’s IDLE Shell).
>>> array = []
>>> array.append(5)
>>> array.append(10)
>>> array.append(5)
>>> array
[5, 10, 5]
>>> array.remove(5)
>>> array
[10, 5]
In the array-based dictionary implementation, we use the Python list (a data structure) to implement common operations for a dictionary (an abstract data type). We treat each element of the
array as a pair (word, frequency) (defined as an object of the simple class WordFrequency), where
word is an English word (a string), e.g., ‘ant’, and frequency is its usage frequency (a non-negative
integer), e.g., 1000, in a certain context, e.g., in some forums or social networks.
The array must be sorted in the alphabetical order, i.e., ‘ant’ appears before ‘ape’, to facilitate
search. A new word, when added to the dictionary, should be inserted at a correct location that
preserves the alphabetical order (using the module bisect is allowed - but you need to know what
it does). An example of a valid array is [(‘ant’, 1000), (‘ape’, 200), (‘auxiliary’, 2000)].
Adding (‘app’, 500) to the array, we will have [(‘ant’, 1000), (‘ape’, 200), (‘app’, 500),
(‘auxiliary’, 2000)]. Note that the pair (‘ant’, 1000) in our actual implementation should be
an object of the class WordFrequency and not a tuple.
A Search for ‘ape’ from the array above should return its frequency 200. If the word doesn’t exist,
0 is returned.
A Delete for a word in the dictionary should return True and remove the word from the dictionary
if it exists, and return False otherwise.
An Autocompletion for a string should return a sorted list (most frequent words appear first) of
the three words (if any) of highest frequencies in the dictionary that have the given string as a prefix.
For the array above, an autocompletion for ‘ap’ should return the list [(‘app’, 500),(‘ape’, 200)].
Notice that while both ‘app’ and ‘ape’ have ‘ap’ as a prefix, ‘app’ has a larger frequency and appears
first in the returned list of autocompletion.
Linked-List-Based Dictionary
A linked list is precisely what it is called: a list of nodes linked together by references. In a singly
linked list, each node consists of a data item, e.g., a string or a number, and a reference that holds the
memory location of the next node in the list (the reference in the last node is set to Null). Each linked
list has a head, which is the reference holding memory location of the first node in the list. Once we
know the head of the list, we can access all nodes sequentially by going from one node to the next
using references until reaching the last node.