
In this project you will implement gradient descent for linear regression on Spark using Scala.

The gradient descent update for linear regression is:

w_{i+1} = w_i - α_i Σ_j (w_i^T x_j - y_j) x_j

Part 1 (20 points)

First, implement a function that computes the summand (w^T x - y) x, and test this function on two examples. Use Vectors to create a dense vector w and use LabeledPoint to create a training dataset with 3 features. You can also use Breeze to compute the dot product.
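A minimal sketch of the summand, using plain Scala Arrays in place of MLlib dense vectors so it runs without a Spark installation; in the assignment itself you would build w with Vectors.dense and could use Breeze for the dot product. The helper names here are illustrative, not part of any library.

```scala
// Stand-in for the Breeze/MLlib dot product (illustrative helper).
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (ai, bi) => ai * bi }.sum

// Part 1: the gradient summand (w^T x - y) x for one observation.
def gradientSummand(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
  val residual = dot(w, x) - y
  x.map(_ * residual)
}

// Example with 3 features: w = (1, 1, 1), x = (1, 2, 3), y = 5
// w^T x - y = 6 - 5 = 1, so the summand is (1, 2, 3).
val s = gradientSummand(Array(1.0, 1.0, 1.0), Array(1.0, 2.0, 3.0), 5.0)
```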

Part 2 (20 points)

Implement a function that takes in a vector w and an observation's LabeledPoint and returns a (label, prediction) tuple. Note that we can predict by computing the dot product between the weights and an observation's features. Test this function on a LabeledPoint RDD.
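One way this might look, again with a plain case class standing in for MLlib's LabeledPoint so the sketch runs without Spark; the function name is illustrative.

```scala
// Stand-in for org.apache.spark.mllib.regression.LabeledPoint.
case class LabeledPoint(label: Double, features: Array[Double])

// Part 2: return a (label, prediction) tuple, where the prediction is
// the dot product of the weights and the observation's features.
def getLabeledPrediction(w: Array[Double], lp: LabeledPoint): (Double, Double) = {
  val prediction = w.zip(lp.features).map { case (wi, xi) => wi * xi }.sum
  (lp.label, prediction)
}
```

On an actual RDD of LabeledPoint this would be applied as `trainData.map(lp => getLabeledPrediction(w, lp))`.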

Part 3 (20 points)

Implement a function to compute RMSE given an RDD of (label, prediction) tuples:

RMSE = √( (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)^2 )

Test this function on an example RDD.
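A sketch of the RMSE computation, with a Seq standing in for the RDD; on Spark the same map over (label, prediction) pairs works, with the average taken via mean() on the resulting RDD of squared errors.

```scala
// Part 3: RMSE over a collection of (label, prediction) tuples.
def calcRMSE(labelsAndPreds: Seq[(Double, Double)]): Double = {
  val n = labelsAndPreds.size.toDouble
  val sumSq = labelsAndPreds.map { case (y, yHat) => (y - yHat) * (y - yHat) }.sum
  math.sqrt(sumSq / n)
}

// Example: errors are 2, 1, 0  =>  RMSE = sqrt((4 + 1 + 0) / 3)
val rmse = calcRMSE(Seq((3.0, 1.0), (1.0, 2.0), (2.0, 2.0)))
```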

Part 4 (40 points)

Implement a gradient descent function for linear regression:

The function will take trainData (an RDD of LabeledPoint) as an argument and return a tuple of weights and training errors. Reuse the code that you have written in Parts 1 and 2.
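One way the full loop might look, with plain Scala collections standing in for the RDD and the Part 1–3 helpers repeated inline so the sketch is self-contained. It applies the assignment's update w_{i+1} = w_i - α_i Σ_j (w_i^T x_j - y_j) x_j with step size α_i = α / (n √i); the function name is illustrative.

```scala
case class LabeledPoint(label: Double, features: Array[Double])  // MLlib stand-in

def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (ai, bi) => ai * bi }.sum

def calcRMSE(labelsAndPreds: Seq[(Double, Double)]): Double =
  math.sqrt(labelsAndPreds.map { case (y, p) => (y - p) * (y - p) }.sum / labelsAndPreds.size)

// Part 4: gradient descent; returns (weights, per-iteration training RMSE).
def linregGradientDescent(trainData: Seq[LabeledPoint],
                          numIters: Int): (Array[Double], Array[Double]) = {
  val n = trainData.size
  val d = trainData.head.features.length
  var w = Array.fill(d)(0.0)       // w initialized to all zeros
  val alpha = 1.0                  // initial step size
  val trainingErrors = Array.fill(numIters)(0.0)
  for (i <- 0 until numIters) {
    // Training error under the current weights (reuses Parts 2 and 3).
    trainingErrors(i) = calcRMSE(trainData.map(lp => (lp.label, dot(w, lp.features))))
    // Full gradient: sum of the per-observation summands (w^T x_j - y_j) x_j.
    val gradient = trainData
      .map(lp => lp.features.map(_ * (dot(w, lp.features) - lp.label)))
      .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    val alphaI = alpha / (n * math.sqrt(i + 1))  // alpha_i = alpha / (n * sqrt(i)), 1-indexed
    w = w.zip(gradient).map { case (wj, gj) => wj - alphaI * gj }
  }
  (w, trainingErrors)
}
```

On Spark, the map/reduce over trainData would be the corresponding RDD operations, with w broadcast to the workers.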

Initialize the elements of vector w to 0 and set α = 1. In the i-th iteration, update the weights using:

w_{i+1} = w_i - α_i Σ_j (w_i^T x_j - y_j) x_j

and update the value of α using the formula:

α_i = α / (n √i)

Run the function for 5 iterations and print the results.

Bonus (20 points)

Implement the closed-form solution:

w = (X^T X)^{-1} X^T y

You can assume X is a DenseMatrix. Test the function on an example RDD.
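A plain-Scala sketch of the normal equations w = (X^T X)^{-1} X^T y: it forms X^T X and X^T y from the rows of X and solves the small d×d system by Gaussian elimination. In the assignment you would instead assemble X as a Breeze DenseMatrix and use its built-in solver; the function name here is illustrative.

```scala
// Bonus: closed-form linear regression via the normal equations.
def solveNormalEquations(xRows: Array[Array[Double]], y: Array[Double]): Array[Double] = {
  val d = xRows.head.length
  // Accumulate A = X^T X (d x d) and b = X^T y (d) in one pass over the rows.
  val a = Array.fill(d, d)(0.0)
  val b = Array.fill(d)(0.0)
  for ((row, i) <- xRows.zipWithIndex; j <- 0 until d) {
    b(j) += row(j) * y(i)
    for (k <- 0 until d) a(j)(k) += row(j) * row(k)
  }
  // Gaussian elimination with partial pivoting on the augmented system [A | b].
  for (col <- 0 until d) {
    val pivot = (col until d).maxBy(r => math.abs(a(r)(col)))
    val tmpRow = a(col); a(col) = a(pivot); a(pivot) = tmpRow
    val tmpB = b(col); b(col) = b(pivot); b(pivot) = tmpB
    for (r <- col + 1 until d) {
      val f = a(r)(col) / a(col)(col)
      for (k <- col until d) a(r)(k) -= f * a(col)(k)
      b(r) -= f * b(col)
    }
  }
  // Back substitution.
  val w = Array.fill(d)(0.0)
  for (r <- d - 1 to 0 by -1)
    w(r) = (b(r) - (r + 1 until d).map(k => a(r)(k) * w(k)).sum) / a(r)(r)
  w
}
```

For data generated exactly by y = 2·x1 + 3·x2, the solver recovers the weights (2, 3).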