Project 3: Big Data Analytics
Objectives:
1. Understand the Hadoop ecosystem and data analytics
2. Become familiar with MapReduce programming and Spark
3. Gain experience with research on big data and data analytics
This is a semester-long group project for teams of 2 students. The main purpose of
this project is to become familiar with big data platforms, including the Hadoop
system, MapReduce programming, and cloud-based big data solutions (e.g., Google
BigQuery). Follow the instructions below to carry out the project.
Phase 1 (15%): Selecting Data Set - Due: March 27, 2024 (Wed)
• Each student researches a data set of interest and collects information about
it.
• Identify the characteristics of the data you select, and describe why you are
interested in it.
• If possible, prepare 3~4 data samples, which can be either real or manipulated
data.
• Prepare a 2~3 page PowerPoint file as a report.
• Submit the PPT file to Canvas.
o PPT, PPTX or PDF file format ONLY
Phase 2 (15%): Defining Problems – Due: April 3, 2024 (Wed)
• In this 2nd phase, research the following topics based on the data you selected
in Phase 1:
- What you can analyze with the selected data using Hadoop HDFS with Spark, and
using Google BigQuery on GCP.
o 1 using Spark
o 1 using Google BigQuery on GCP
- How you can collect at least 1GB of data. Your data MUST be uploaded to HDFS
using a VM in Phases 4-5.
• Prepare a 2~3 page PowerPoint file as a report.
• Submit the PPT file to Canvas.
o PPT, PPTX or PDF file format ONLY
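As a rough aid for the 1GB requirement above, the following Python sketch sums the
sizes of all files under a local directory before they are uploaded to HDFS. The
function names, and the choice to treat 1GB as 10^9 bytes, are assumptions made for
illustration; they are not part of the assignment.

```python
import os

# Assumption: "1GB" is taken as 10**9 bytes (decimal gigabyte).
ONE_GB = 10**9


def total_size_bytes(root):
    """Return the combined size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def meets_minimum(root, minimum=ONE_GB):
    """Return True if the files under root total at least `minimum` bytes."""
    return total_size_bytes(root) >= minimum
```

For example, meets_minimum("my_dataset") would report whether a hypothetical local
folder named my_dataset is already large enough.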
Phase 3 (20%): Preparing Proposal – Due: April 3, 2024 (Wed)
• Prepare a proposal using the MS Word template, which can be found on Canvas.
o DOC, DOCX or PDF file format ONLY
• Prepare and submit a 5~10 page PowerPoint file for the presentation.
o PPT, PPTX or PDF file format ONLY
• Then, submit a 10-minute presentation video to Canvas.
o Submit a link (e.g., YouTube), or record your presentation using Canvas
• In your proposal, you need to consider how to prepare the final deliverable,
which consists of the following outputs