General Instructions
The purpose of this tutorial is (1) to get you started with Hadoop and (2) to get you
acquainted with the code and homework submission system. Completing the tutorial is
optional, but students who hand in the results on time will earn 5 points. This tutorial is
to be completed individually.
Here you will learn how to write, compile, debug, and execute a simple Hadoop program.
The first part of the assignment serves as a tutorial, and the second part asks you to write your
own Hadoop program.
Section 1 describes the virtual machine environment. Instead of the virtual machine, you
are welcome to set up your own pseudo-distributed or fully distributed cluster if you prefer.
Any Hadoop version 1.0 or later will suffice. If you choose to set up your own cluster, you are responsible
for making sure the cluster is working properly. The TAs will be unable to help
you debug configuration issues in your own cluster.
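If you do run your own cluster, the version requirement above can be checked with a small helper. The following is only a sketch: the class name HadoopVersionCheck and its methods are hypothetical, and it assumes the version string has the usual dotted form (for example the value reported by the `hadoop version` command, or by org.apache.hadoop.util.VersionInfo.getVersion() when Hadoop is on the classpath).

```java
public class HadoopVersionCheck {

    // Returns true when a dotted version string such as "1.0.4" or
    // "2.6.0-cdh5.4.0" is at least minMajor.minMinor.
    public static boolean atLeast(String version, int minMajor, int minMinor) {
        String[] parts = version.trim().split("\\.");
        int major = leadingInt(parts[0]);
        int minor = parts.length > 1 ? leadingInt(parts[1]) : 0;
        return major > minMajor || (major == minMajor && minor >= minMinor);
    }

    // Parses the leading run of digits, ignoring suffixes such as "-cdh5".
    private static int leadingInt(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("No leading digits in: " + s);
        }
        return Integer.parseInt(s.substring(0, end));
    }

    public static void main(String[] args) {
        // The default value is only an illustration; pass your own version string.
        String version = args.length > 0 ? args[0] : "1.0.4";
        System.out.println(version + " is supported: " + atLeast(version, 1, 0));
    }
}
```

The check compares only the major and minor components, since the requirement is stated as "at least 1.0".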
Section 2 explains how to use the Eclipse environment in the virtual machine, including how
to create a project, how to run jobs, and how to debug jobs. Section 2.5 gives an end-to-end
example of creating a project, adding code, building, running, and debugging it.
Section 3 is the actual homework assignment. There are no deliverables for sections 1 and 2.
In section 3, you are asked to write and submit your own MapReduce job.
This assignment requires you to upload the code and hand in the output for Section 3.
All students should submit the output via Gradescope and upload the code via snap.
Gradescope: To register for Gradescope,
  - Create an account on Gradescope if you don’t have one already.
  - Join the CS246 course using Entry Code MBDY2M.
You must aggregate all the code in a single
file (one file per question), and it must be a text file.
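One possible way to produce such a file is to concatenate the source files for a question into one plain-text file. The sketch below is an illustration, not part of the official submission tooling: the class name SubmissionBundler, the header format, and the assumption that all code for a question lives in .java files under one directory are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubmissionBundler {

    // Concatenates every .java file under sourceDir (sorted by path)
    // into a single UTF-8 text file, with a header line before each file.
    public static void bundle(Path sourceDir, Path outputFile) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            sources = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        StringBuilder text = new StringBuilder();
        for (Path source : sources) {
            text.append("// ===== ").append(sourceDir.relativize(source)).append(" =====\n");
            text.append(new String(Files.readAllBytes(source), StandardCharsets.UTF_8));
            text.append("\n");
        }
        Files.write(outputFile, text.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SubmissionBundler <sourceDir> <outputFile.txt>");
            return;
        }
        bundle(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Run it once per question, pointing it at that question's source directory, so that each question ends up with its own text file.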
CS246: Mining Massive Datasets - Problem Set 0
Questions
1 Setting up a virtual machine
Download and install VirtualBox on your machine: http://virtualbox.org/wiki/Downloads
Start VirtualBox and click Import Appliance in the File dropdown menu. Click the
folder icon beside the location field. Browse to the uncompressed archive folder, select
the .ovf file, and click the Open button. Click the Continue button. Click the Import
button.