Introduction to Computer Science
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
CSC148H1S Introduction to Computer Science
Learning goals
After completing this assignment, you will be able to:
model different kinds of real-world hierarchical data using trees
implement recursive operations on trees (both non-mutating and mutating)
implement an algorithm to generate a geometric tree visualisation called a treemap
use the os library to write a program that interacts with your computer’s file system
use inheritance to design classes according to a common interface
Introduction
As we’ve discussed in class, trees are a fundamental data structure used to model hierarchical data.
Often, a tree represents a hierarchical categorization of a base set of data, where the leaves
represent the data values themselves, and internal nodes represent groupings of this data. Files on a
computer can be viewed this way: regular files (e.g., PDF documents, video files, Python source
code) are grouped into directories, which are then grouped into larger directories, etc.
Sometimes the data in a tree has some useful notion of size. For example, in a tree representing the
departments of a company, the size of a node could be the dollar amount of all sales for that4/3/23, 1:36 AM
A2 — Treemaps: CSC148H1S 20231 (All Sections): Introduction to Computer Science
https://q.utoronto.ca/courses/292974/pages/a2-treemaps
2/14
department, or the number of employees in that department. Or in a tree representing a computer’s
file system, the size of a node could be the size of the file.
A treemap
(http://en.wikipedia.org/wiki/Treemapping) is a visualisation technique to show a tree’s
structure according to the weights (or sizes) of its data values. It uses rectangles to show subtrees,
scaled to reflect the proportional sizes of each piece of data.
Treemaps are used in many contexts. Among the more popular uses are news headlines, various
kinds of financial information, and computer disk usage. Some free programs use treemaps to
visualize the size of files on your computer; for example:
WinDirStat
(http://portableapps.com/apps/utilities/windirstat_portable) for Windows
Disk Inventory X
(http://www.derlien.com/) for MacOS
KDirStat
(http://kdirstat.sourceforge.net/) for Linux
For this assignment, you will write an interactive treemap visualisation tool that you can use to
visualize hierarchical data.
It will have a general API (implemented with inheritance, naturally!) and you will define specific
subclasses that will allow you to visualize different kinds of data.
And with that, let’s get started on your final CSC148 assignment!
Task 0: Getting started
1. Download and unzip the starter code (a2_starter_files.zip
(https://q.utoronto.ca/courses/292974/files/25399282?wrap=1)
(https://q.utoronto.ca/courses/292974/files/25399282/download?download_frd=1) ) into your
directory for Assignment 2.
2. With tm_trees.py open, take the time to read through the tasks below to get a sense of what
code you’ll be writing throughout this assignment.
3. Familiarize yourself with the following coding guidelines:
You may NOT define any new public or private attributes.
You are always free to define private helpers (functions or methods).
Complete docstrings are not required for any helpers that you define or for methods that you
override or extend, but you are encouraged to take the time to write them.
Task 1: TMTree basics
In this task, you will implement:
TMTree.__init__
TMTree.is_displayed_tree_leaf
TMTree.get_path_string4/3/23, 1:36 AM
A2 — Treemaps: CSC148H1S 20231 (All Sections): Introduction to Computer Science
https://q.utoronto.ca/courses/292974/pages/a2-treemaps
3/14
In order to use a treemap to visualize data, we first need to define our tree structure and be able to
initialize it. For this program, you will develop a TMTree class to define the basic functionality required
by the provided treemap visualizer. To visualize any specific hierarchical dataset, we simply need to
define a subclass of TMTree , which we will do later.
The TMTree class is similar to the basic Tree class that you have already worked with, but with some
differences. A few of the most conceptually important ones are discussed here.
First, the class has a data_size instance attribute that is used by the treemap algorithm to
determine the size of the rectangle that represents this tree. data_size is defined recursively as
follows:
If the tree is a single leaf, its data_size is some measure of the size of the data being
modelled. For example, it could be the size of a file.
If the tree has one or more subtrees, its data_size is equal to the sum of the data_size s of its
subtrees, plus some additional size of its own.
Second, the class not only stores its subtrees, but it also stores its parent as an attribute. This
allows us to traverse both up and down the tree, which you will see is necessary when we want to
mutate the tree.
Third, we’ll also track some information inside our tree related to how to display it. More on this
shortly.
Fourth, we don’t bother with a representation for an empty TMTree , as we’ll always be visualizing
non-empty trees in this assignment.
To get started, you should do the following:
1. Complete the initializer of TMTree according to its docstring. This will get you familiar with the
attributes in this class.
2. Implement the is_displayed_tree_leaf method according to its docstring and the definition of the
displayed-tree (see below).
3. Implement the get_path_string method according to its docstring.
In the subsequent parts of this assignment, your program will allow the user to interact with the
visualizer to change the way the data is displayed to them. At any one time, parts of the tree will be
displayed and others not, depending on what nodes are expanded.
We’ll use the terminology displayed-tree to refer to the part of the data tree that is displayed in the
visualizer. It is defined as follows:
The root of the entire tree is in the displayed-tree.
For each “expanded” tree in the displayed-tree, its children are in the displayed-tree.4/3/23, 1:36 AM
A2 — Treemaps: CSC148H1S 20231 (All Sections): Introduction to Computer Science
https://q.utoronto.ca/courses/292974/pages/a2-treemaps
4/14
The trees whose rectangles are displayed in the visualization will be the leaves of the displayed-tree.
A tree is a leaf of the displayed-tree if it is not expanded, but its parent is expanded. Note that the root
node is a leaf of the displayed-tree if it is not expanded.
We will require that (1) if a tree is expanded, then its parent is expanded, and (2) if a tree is not
expanded, then its children are not expanded. Note that this requirement is naturally encoded as a
representation invariant in the TMTree class.
Note: the displayed-tree is not a separate tree! It is just the part of the data tree that is displayed.
Progress check!
After this step is done, you should be able to instantiate and print a TMTree , as is done in the
provided main block of tm_trees.py .
In the next task, you will implement the code required for the treemap algorithm.
Task 2: The Treemap algorithm
In this task, you will implement:
TMTree.update_rectangles
TMTree.get_rectangles
The next component of our program is the treemap algorithm itself. It operates on the data tree and a
2-D window to fill with the visualization, and generates a list of rectangles to display, based on the
tree structure and data_size attribute for each node.
For all rectangles in this program, we’ll use the pygame representation of a rectangle, which is a
tuple of four integers (x, y, width, height) , where (x, y) represents the upper-left corner of the
rectangle. Note that in pygame, the origin is in the upper-left corner and the y-axis points down. So,
the lower-right corner of a rectangle has coordinates (x + width, y + height). Each unit on both axes is
a pixel on the screen. Below, we show a grid representing the rectangle (0, 0, 8, 4) :
We are now ready to see the treemap algorithm.
Note: We’ll use “size” to refer to the data_size attribute in the algorithm below.4/3/23, 1:36 AM
A2 — Treemaps: CSC148H1S 20231 (All Sections): Introduction to Computer Science
https://q.utoronto.ca/courses/292974/pages/a2-treemaps
5/14
Input: An instance of TMTree , which is part of the displayed-tree, and a pygame rectangle (the
display area to fill).
Output: A list of pygame rectangles and each rectangle’s colour, where each rectangle corresponds
to a leaf in the displayed-tree for this data tree, and has area proportional to the leaf’s data_size .
Treemap Algorithm:
Note I: Unless explicitly written as “displayed-tree”, all occurrences of the word “tree” below refer to a
data tree.
Note II: (Very Important) The overall algorithm, as described, actually corresponds to the
combination of first calling TMTree.update_rectangles to set the rect attribute of all nodes in the
tree and then later calling TMTree.get_rectangles to actually obtain the rectangles and colours
corresponding to only the displayed-tree’s leaves.
1. If the tree is a leaf in the displayed-tree, return a list containing just a single rectangle that covers
the whole display area, as well as the colour of that leaf.
2. Otherwise, if the display area’s width is greater than its height: divide the display area into vertical
rectangles (by this, we mean the x-coordinates vary but the y-coordinates are fixed), one smaller
rectangle per subtree of the displayed-tree, in proportion to the sizes of the subtrees (don’t use
this tree’s data_size attribute, as it may actually be larger than the sizes of the subtrees because
of how we defined it!)
Example: suppose the input rectangle is (0, 0, 200, 100), and the displayed-tree for the input tree
has three subtrees, with sizes 10, 25, and 15.
The first subtree has 20% of the total size, so its smaller rectangle has width 40 (20% of 200):
(0, 0, 40, 100).
The second subtree should have width 100 (50% of 200), and starts immediately after the first
one: (40, 0, 100, 100).
The third subtree has width 60, and takes up the remaining space: (140, 0, 60, 100).
Note that the three rectangles’ edges overlap, which is expected.
3. If the height is greater than or equal to the width, then make horizontal rectangles (by this, we
mean the y-coordinates vary, but the x-coordinates are fixed) instead of vertical ones, and do the
analogous operations as in step 2.
4. Recurse on each of the subtrees in the displayed-tree, passing in the corresponding smaller
rectangle from step 2 or 3.
Important: To ensure that you understand the treemap algorithm, complete the A2 release activity
Note about rounding: because you’re calculating proportions here, the exact values will often be
floats instead of integers. For all but the last rectangle, always truncate the value (round down to the
nearest integer). In other words, if you calculate that a rectangle should be (0, 0, 10.9, 100), “round”
(or truncate) the width down to (0, 0, 10, 100). Then a rectangle below it would start at (0, 100), and a
rectangle beside it would start at (10, 0).
However, the final (rightmost or bottommost) edge of the last smaller rectangle must always be equal
to the outer edge of the input rectangle. This means that the last subtree might end up a bit bigger
than its true proportion of the total size, but that’s okay for this assignment.
Important followup to Note II above: You will implement this algorithm in the update_rectangles
method in TMTree . Note that rather than returning the rectangles for only the leaves of the
displayed-tree, the code will instead set the rect attribute of each node in the entire tree to
correspond to what its rectangle would be if it were a leaf in the displayed-tree. Later, we can
retrieve the rectangles for only the leaf nodes of the displayed-tree by using the
get_rectangles method.