5. One of the social media billionaires is considering running for President.
They run a social media company named Quitter, and they have access to a lot of
data inside the company. As an intern in this campaign, you have the same social
network dataset (named D1) specified in the previous question ((a,b) directed
pairs indicating a follows b), but you also have an additional dataset (named
D2) with entries (a, start_time, end_time) indicating that user a was online
starting at start_time and ending at end_time. The data is only for one day. All
times are in hh:mm:ss format. However, each user a may have multiple entries in
D2 (since a user may log in several times, possibly simultaneously). Write a
MapReduce program that extracts all pairs of users (a,b) such that: (i) a and b
follow each other, and (ii) a and b were online simultaneously at least once
during that day. The same instructions as in the first MapReduce question in
this series apply. Please ensure that a Map stage reads data from only one input
dataset (i.e., if a Map reads directly from D2, don't use it to also read from
D1, and vice versa). This is consistent with good Map programming practice.
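
One possible way to structure an answer is a chain of three MapReduce stages, sketched below in plain Python rather than in any particular framework. This is only an illustration under stated assumptions, not the expected solution: the helper run_stage and all function names are hypothetical, two sessions are assumed to overlap when they share at least one instant (closed intervals), and the instructions from the first question in the series are not reproduced here. Each map function reads from exactly one dataset; stage 2 uses two separate map functions (one over the stage 1 output, one over D2) that feed the same reduce.

```python
from collections import defaultdict

def run_stage(inputs, reducer):
    """Simulate one MapReduce stage. `inputs` is a list of (records, mapper)
    pairs, so every mapper reads from exactly one dataset (assumed helper)."""
    groups = defaultdict(list)
    for records, mapper in inputs:
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

def to_seconds(hms):
    h, m, s = map(int, hms.split(":"))
    return 3600 * h + 60 * m + s

# Stage 1 (reads D1 only): keep pairs that follow each other.
def map_follows(record):
    a, b = record
    yield (min(a, b), max(a, b)), (a, b)

def reduce_mutual(pair, edges):
    a, b = pair
    if a != b and (a, b) in edges and (b, a) in edges:
        yield pair

# Stage 2: one mapper reads stage 1 output, the other reads D2 only.
def map_pair_to_users(pair):
    a, b = pair
    yield a, ("friend", b)
    yield b, ("friend", a)

def map_sessions(record):
    user, start, end = record
    yield user, ("online", (to_seconds(start), to_seconds(end)))

def reduce_attach_sessions(user, values):
    # Attach this user's sessions to every mutual pair the user belongs to.
    sessions = [v for tag, v in values if tag == "online"]
    for tag, friend in values:
        if tag == "friend" and sessions:
            yield (min(user, friend), max(user, friend)), (user, sessions)

# Stage 3 (reads stage 2 output only): test the two session lists for overlap.
def map_identity(record):
    yield record  # record is already a (key, value) pair

def reduce_overlap(pair, values):
    by_user = dict(values)  # user -> list of (start, end) in seconds
    if len(by_user) == 2:
        first, second = by_user[pair[0]], by_user[pair[1]]
        # Assumption: closed intervals, so touching endpoints count as overlap.
        if any(s1 <= e2 and s2 <= e1 for s1, e1 in first for s2, e2 in second):
            yield pair

def mutual_and_co_online(d1, d2):
    mutual = run_stage([(d1, map_follows)], reduce_mutual)
    keyed = run_stage([(mutual, map_pair_to_users), (d2, map_sessions)],
                      reduce_attach_sessions)
    return run_stage([(keyed, map_identity)], reduce_overlap)

# Made-up sample data for illustration only.
d1 = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("cat", "ann"), ("bob", "cat")]
d2 = [("ann", "09:00:00", "10:00:00"), ("bob", "09:30:00", "09:45:00"),
      ("cat", "11:00:00", "12:00:00"), ("ann", "20:00:00", "21:00:00")]
print(mutual_and_co_online(d1, d2))  # [('ann', 'bob')]
```

In the sample data, ann and cat follow each other but are never online at the same time, and bob follows cat without being followed back, so only (ann, bob) is reported. Each pair is emitted once, with the smaller user name first.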
