In this project, you will write a program that uses natural language processing and machine
learning techniques to automatically identify the subject of posts from the EECS 280 Piazza. You
will gain experience with recursion, binary trees, templates, comparators, and the map data
structure. Another goal is to prepare you for future courses (like EECS 281) or your own
independent programming projects, so we have given you a lot of freedom to design the structure
of your overall application.
The correctness portion of the final submission is worth approximately 70%, with the remaining
approximately 30% based on the thoroughness of your BST test cases and style grading. Your
test cases and style will both be graded by the autograder.
Winter 2018: We will use the same automated style grading on this project that we did for
project 4. On this project, the automated style checks will be part of the grade. To run the style
checks on your own, check out the style checking tutorial.
You may work alone or with a partner. Please see the syllabus for partnership rules.
Table of Contents
Project Roadmap
Project Introduction
Project Essentials
The BinarySearchTree ADT
Testing BinarySearchTree
The Map ADT
Testing Map
The Piazza Datasets
Classifying Piazza Posts with NLP and ML
The Bag of Words Model
Training the Classifier
4/5/2018 EECS 280 Project 5: Machine Learning | p5-ml
https://eecs280staff.github.io/p5-ml/ 2/21
Predicting a Label for a New Post
Implementing Your Top-Level Classifier Application
Classifier Application Interface
Output
Results
Appendix A: Map Example
Appendix B: Splitting a Whitespace-Delimited String
Project Roadmap
1. Set up your IDE
Use the tutorial from project 1 to get your visual debugger set up. Use this wget link to download
the starter files: https://eecs280staff.github.io/p5-ml/starter-files.tar.gz
Before setting up your visual debugger, you’ll need to rename each .h.starter file to a .h
file.
$ mv BinarySearchTree.h.starter BinarySearchTree.h
$ mv Map.h.starter Map.h
You’ll also need to create the following new file and add function stubs.
$ touch main.cpp
These are the executables you’ll use in this project:
BinarySearchTree_compile_check.exe
BinarySearchTree_public_test.exe
BinarySearchTree_tests.exe
Map_compile_check.exe
Map_public_test.exe
main.exe
If you’re working in a partnership, set up version control for a team.
2. Read the Project Introduction and Project Essentials
See the first sections below for an introduction to the project as well as essential instructions
for successfully completing the project.
3. Test and implement the BinarySearchTree data structure
We’ve provided header files with comments describing each function. Test and implement those
functions, and be sure to use recursion and tail recursion wherever the comments require it.
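To make the distinction between plain recursion and tail recursion concrete, here is a minimal sketch built on a hypothetical Node struct that holds ints. The starter header defines its own node type and compares elements with a comparator rather than < and ==, so treat these names and signatures as illustrative only. size_impl and height_impl use ordinary recursion. find_impl is tail recursive.

```cpp
#include <algorithm>  // std::max
#include <cassert>
#include <cstddef>    // std::size_t

// Hypothetical node type for illustration; the starter code's Node may differ.
struct Node {
  int datum;
  Node *left;
  Node *right;
};

// Ordinary (non-tail) recursion: after both recursive calls return, there is
// still work left to do (adding 1), so the calls are not in tail position.
std::size_t size_impl(const Node *node) {
  if (!node) {
    return 0;
  }
  return 1 + size_impl(node->left) + size_impl(node->right);
}

// Also not tail recursive: std::max and + are applied to the results.
// In this sketch an empty tree has height 0 and a single node has height 1.
int height_impl(const Node *node) {
  if (!node) {
    return 0;
  }
  return 1 + std::max(height_impl(node->left), height_impl(node->right));
}

// Tail recursion: the recursive call is the last thing evaluated, and its
// result is returned unchanged. The BST ordering picks which side to search.
const Node *find_impl(const Node *node, int query) {
  if (!node || node->datum == query) {
    return node;
  }
  if (query < node->datum) {
    return find_impl(node->left, query);
  }
  return find_impl(node->right, query);
}
```

Because find_impl does nothing after its recursive call returns, a compiler can turn it into a loop. size_impl and height_impl cannot be converted this way.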
4. Test and implement the Map data structure
Implement and test a Map ADT that internally uses your BinarySearchTree to provide an
interface that works (almost) exactly like std::map from the STL! Appendix A has an example.
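For reference, here is the std::map behavior your Map is meant to imitate, shown as a short word-counting sketch. count_words is a hypothetical helper and is not part of the starter code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Counts occurrences of each word with std::map, the interface that the
// project's Map ADT is meant to mimic (almost) exactly.
std::map<std::string, int> count_words(const std::vector<std::string> &words) {
  std::map<std::string, int> counts;
  for (const std::string &word : words) {
    // operator[] inserts the key with a value-initialized int (0) if it is
    // missing, then returns a reference to the value so we can increment it.
    ++counts[word];
  }
  return counts;
}
```

A few std::map behaviors are worth noticing when you write tests:

- operator[] inserts a missing key.
- find returns end() when the key is absent.
- insert leaves an existing value unchanged and reports false in the .second of its result.
- Iteration visits keys in sorted order.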
5. Test and implement the Piazza Classifier Application
This specification describes the interface for the overall application, but it’s up to you how to
separate it into functions and data structures.
Appendix B has tips and tricks for this part.
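As one example of a helper you might factor out, here is a minimal sketch that splits a whitespace-delimited string into its unique words. It uses std::istringstream and std::set. The name unique_words is illustrative, not something the project requires.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Returns the set of unique whitespace-delimited words in str.
// std::set discards duplicates and keeps the words in sorted order.
std::set<std::string> unique_words(const std::string &str) {
  std::istringstream source(str);
  std::set<std::string> words;
  std::string word;
  // operator>> skips leading whitespace (spaces, tabs, newlines) and
  // stops reading at the next whitespace character.
  while (source >> word) {
    words.insert(word);
  }
  return words;
}
```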
Submit to the Autograder
Submit the following files to the autograder.
BinarySearchTree.h
Map.h
main.cpp
BinarySearchTree_tests.cpp
Project Introduction
The goal for this project is to write an intelligent program that can classify Piazza posts according
to topic. This task is easy for humans: we simply read and understand the content of the post,
and the topic is intuitively clear. But how do we compose an algorithm to do the same? We can’t
just tell the computer to “look at it” and understand. This is typical of problems in artificial
intelligence and natural language processing.
We know this is about Euchre, but how can we write an algorithm that “knows” that?
With a bit of introspection, we might realize each individual word is a bit of evidence for the topic
about which the post was written. Seeing a word like “card”, “spades”, or even “bob” leads us
toward the Euchre project. We judge a potential label for a post based on how likely it is given all
the evidence. Along these lines, information about how common each word is for each topic
essentially constitutes our classification algorithm.
But we don’t have that information (i.e. that algorithm). You could try to sit down and write out a
list of common words for each project, but there’s no way you’ll get them all. For example, the
word “lecture” appears much more frequently in posts about exam preparation. This makes
sense, but we probably wouldn’t come up with it on our own. And what if the projects change? We
don’t want to have to put in all that work again.
Instead, let’s write a program to comb through Piazza posts from previous terms (which are
already tagged according to topic) and learn which words go with which topics. Essentially, the
result of our program is an algorithm! This approach is called (supervised) machine learning. Once
we’ve trained the classifier on some set of Piazza posts, we can apply it to new ones written in the
future.
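Here is a minimal sketch of that "learning" step in its simplest form. It tallies, across previously tagged posts, how many posts with each label contain each word. The TaggedPost struct and the count_label_words function are hypothetical. Counting each word at most once per post is an assumption made for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one previously tagged post.
struct TaggedPost {
  std::string label;    // topic the post was tagged with
  std::string content;  // text of the post
};

// For each (label, word) pair, counts how many posts with that label
// contain the word at least once.
std::map<std::pair<std::string, std::string>, int>
count_label_words(const std::vector<TaggedPost> &posts) {
  std::map<std::pair<std::string, std::string>, int> counts;
  for (const TaggedPost &post : posts) {
    std::istringstream source(post.content);
    std::set<std::string> seen;  // each word counted once per post
    std::string word;
    while (source >> word) {
      seen.insert(word);
    }
    for (const std::string &w : seen) {
      ++counts[std::make_pair(post.label, w)];
    }
  }
  return counts;
}
```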
Authors
This project was developed for EECS 280, Fall 2016 at the University of Michigan. Andrew DeOrio
and James Juett wrote the original project and specification. Amir Kamil contributed to code
structure, style, and implementation details.
Project Essentials
The project consists of three main phases:
1. Implement and test the static _impl member functions in BinarySearchTree.
2. Implement and test Map by using the has-a pattern on top of BinarySearchTree.
3. Design, implement, and test the top-level classifier application.
The focus of part 1 is on working with recursive data structures and algorithms. The framework
and some of the implementation for BinarySearchTree are provided for you, but you must
implement the core functionality in several static member functions. Pay attention to the
requirements stating that certain functions must use a particular kind of recursion (such as tail
recursion).
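The sketch below shows the general shape of this design on a hypothetical, simplified IntTree class. It is not the starter code's BinarySearchTree. Each public member function is a one-line wrapper that passes root to a private static *_impl helper, and the helper does the recursive work on node pointers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified tree of ints illustrating the wrapper/_impl split.
class IntTree {
public:
  IntTree() : root(nullptr) {}
  ~IntTree() { destroy_nodes_impl(root); }
  // Copying is disabled to keep this sketch short.
  IntTree(const IntTree &) = delete;
  IntTree &operator=(const IntTree &) = delete;

  // Public wrappers: each just hands root to a static helper.
  void insert(int item) { root = insert_impl(root, item); }
  std::size_t size() const { return size_impl(root); }

private:
  struct Node {
    int datum;
    Node *left;
    Node *right;
  };
  Node *root;

  // Inserts item into the subtree and returns the subtree's (possibly new) root.
  static Node *insert_impl(Node *node, int item) {
    if (!node) {
      return new Node{item, nullptr, nullptr};
    }
    assert(item != node->datum);  // duplicates are not allowed in this sketch
    if (item < node->datum) {
      node->left = insert_impl(node->left, item);
    } else {
      node->right = insert_impl(node->right, item);
    }
    return node;
  }

  static std::size_t size_impl(const Node *node) {
    return node ? 1 + size_impl(node->left) + size_impl(node->right) : 0;
  }

  // Frees every node in the subtree, children before parent.
  static void destroy_nodes_impl(Node *node) {
    if (node) {
      destroy_nodes_impl(node->left);
      destroy_nodes_impl(node->right);
      delete node;
    }
  }
};
```

Because the helpers are static, they have no this pointer and cannot reach root directly. Everything they need arrives through their parameters, so each call works on exactly one subtree.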
Part 2 should not require a lot of additional implementation code. Make sure to reuse the
functionality already present in BinarySearchTree wherever possible.
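Here is a rough illustration of the has-a pattern. It assumes the map stores key-value pairs in an ordered tree whose comparator looks only at the key. The project's BinarySearchTree can't be included in a standalone snippet, so std::set stands in for it here. SimpleMap and PairKeyLess are hypothetical names, and this version of find requires Value to be default-constructible.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Orders key-value pairs by key only, so two pairs with the same key are
// treated as equivalent regardless of their values.
template <typename Key, typename Value>
struct PairKeyLess {
  bool operator()(const std::pair<Key, Value> &a,
                  const std::pair<Key, Value> &b) const {
    return a.first < b.first;
  }
};

// Has-a: the map contains an ordered tree of pairs and forwards work to it.
// std::set stands in for the project's BinarySearchTree in this sketch.
template <typename Key, typename Value>
class SimpleMap {
public:
  using Pair = std::pair<Key, Value>;
  using Tree = std::set<Pair, PairKeyLess<Key, Value>>;
  using Iterator = typename Tree::const_iterator;

  // Searches for a pair with this key and a default-constructed value;
  // the comparator ignores the value, so only the key matters.
  Iterator find(const Key &key) const { return entries.find(Pair(key, Value())); }

  // Inserts the pair if its key is absent; returns false if it was present.
  bool insert(const Pair &entry) { return entries.insert(entry).second; }

  std::size_t size() const { return entries.size(); }
  Iterator end() const { return entries.end(); }

private:
  Tree entries;  // the underlying tree does all the real work
};
```

This sketch omits operator[]. std::set elements can't be modified in place, which is one reason std::set is only a stand-in for your own tree here.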