Datafile: BreadBasket_DMS.zip
Solve: Show the total count of each item, per day and per hour
Example, given the input:
Bread, 2016-10-30, 09, 1
Bread, 2016-10-30, 10, 12
…
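Hint: a minimal sketch of one possible approach, not a reference solution. It assumes the archive has already been extracted to a CSV file with a header row and columns in the order Date, Time, Transaction, Item, and that "total number" means the number of rows per key; none of this is stated above, so check the file first. The function names are hypothetical, and `spark` is an existing SparkSession.

```python
from operator import add


def item_day_hour(line):
    """Map one CSV line 'Date,Time,Transaction,Item' to ((item, date, hour), 1).

    The column order is an assumption about the extracted file.
    """
    date, time, _transaction, item = line.split(",", 3)
    return ((item, date, time[:2]), 1)  # time[:2] is the hour, e.g. '09'


def totals_by_item_day_hour(spark, path):
    """Count rows per (item, date, hour) with the RDD API."""
    lines = spark.sparkContext.textFile(path)
    header = lines.first()  # assumes the first line is a header row
    return (lines.filter(lambda line: line != header)
                 .map(item_day_hour)
                 .reduceByKey(add)
                 .sortByKey())
```

Calling `totals_by_item_day_hour(spark, path).collect()` would return a list of `((item, date, hour), count)` pairs sorted by key.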
2. 15 Points
Dataset: Restaurants_in_Durham_County_NC.csv
NOTE: This file is colon-delimited (not comma-delimited). Do not preprocess it; read it
with spark.read…
Solve: Summarize the number of entities by “rpt_area_desc”
Example:
“Swimming Pools”, 13
“Tatoo Establishment”, 2
…
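Hint: a minimal sketch, assuming the file has a header row that contains the column rpt_area_desc. The function name and the `delimiter` parameter are hypothetical. The default follows the note above, and the parameter makes it easy to switch if the file turns out to use a different separator. `spark` is an existing SparkSession.

```python
def count_by_area(spark, path, delimiter=":"):
    """Count rows per rpt_area_desc, reading the raw file with a custom separator."""
    # header=True is an assumption: the first line is taken to hold column names.
    df = spark.read.csv(path, header=True, sep=delimiter)
    return df.groupBy("rpt_area_desc").count().orderBy("rpt_area_desc")
```

Calling `count_by_area(spark, path).show(truncate=False)` would print one row per rpt_area_desc value with its count.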
3. 25 Points
Dataset: populationbycountry19802010millions.csv
Solve: For each year and each region, compute the year-over-year percentage increase
in population. Note that 1980 has no preceding year.
Also show each region's yearly population increase as a percentage of the global
population increase for that year.
Display the top 10 in decreasing order of global growth
Example:
Year, Region, yearly increase, percent of global year increase (these results are
made up)
1981, North America, 1.30%, 1%
1982, Aruba, …
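Hint: a sketch of the arithmetic, followed by one way it might be expressed in Spark SQL. It makes several assumptions that are not stated above:

- the data has already been reshaped into a temporary view named `population` with columns `region`, `year`, `pop`;
- a row whose region is 'World' holds the global totals;
- "global growth" is read as the share of the global increase.

All names here are hypothetical.

```python
def yearly_growth(prev, curr, world_prev, world_curr):
    """Return (percent increase over the previous year,
    percent of the global increase) for one region and year.

    Either value is None when its denominator is zero.
    """
    increase = curr - prev
    world_increase = world_curr - world_prev
    pct_increase = 100.0 * increase / prev if prev else None
    pct_of_global = 100.0 * increase / world_increase if world_increase else None
    return pct_increase, pct_of_global


# The same computation as a query over the assumed `population` view.
# LAG() yields NULL for 1980, so that year drops out of the result.
TOP_GROWTH_SQL = """
WITH deltas AS (
    SELECT region, year,
           LAG(pop) OVER (PARTITION BY region ORDER BY year) AS prev_pop,
           pop - LAG(pop) OVER (PARTITION BY region ORDER BY year) AS increase
    FROM population
)
SELECT d.year, d.region,
       100 * d.increase / NULLIF(d.prev_pop, 0) AS yearly_increase_pct,
       100 * d.increase / NULLIF(w.increase, 0) AS pct_of_global_increase
FROM deltas d
JOIN deltas w ON w.year = d.year AND w.region = 'World'
WHERE d.region <> 'World' AND d.increase IS NOT NULL
ORDER BY pct_of_global_increase DESC
LIMIT 10
"""


def top_growth(spark):
    """Run the query; `spark` is an existing SparkSession."""
    return spark.sql(TOP_GROWTH_SQL)
```

For instance, a region growing from 100.0 to 102.0 million while the world grows from 4000.0 to 4080.0 million has a 2% yearly increase and accounts for 2.5% of the global increase (made-up numbers).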
4. 15 Points
Dataset: romeo-juliet-pg1777.txt
Solve: WordCount
Do a word count exercise using pyspark. Ignore punctuation, and normalize to
lower case. Accept only the characters in this set: [0-9a-zA-Z]
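Hint: a minimal sketch of one reading of these rules. Lines are split on whitespace, and every character outside [0-9a-zA-Z] is then removed from each token before lower-casing, so "don't" becomes "dont". Splitting on punctuation instead would be an equally defensible choice. The function names are hypothetical, and `spark` is an existing SparkSession.

```python
import re
from operator import add

# Anything outside the accepted character set is dropped.
_DISALLOWED = re.compile(r"[^0-9a-zA-Z]")


def tokenize(line):
    """Split a line on whitespace, strip disallowed characters, lower-case."""
    words = (_DISALLOWED.sub("", token).lower() for token in line.split())
    return [word for word in words if word]  # drop tokens that became empty


def word_count(spark, path):
    """Classic RDD word count over a text file."""
    return (spark.sparkContext.textFile(path)
                 .flatMap(tokenize)
                 .map(lambda word: (word, 1))
                 .reduceByKey(add))
```

Calling `word_count(spark, "romeo-juliet-pg1777.txt").takeOrdered(10, key=lambda kv: -kv[1])` would return the ten most frequent words.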