This recommender system takes a user's survey data X and feeds it through a logistic regression model to predict the outcome Y.
Processing the Data
The user's data comes from their responses to the survey questions. The range of values an answer can take depends
on the type of question. Certain questions are binary and capture whether or not the item being asked about is relevant to the user. Other questions
range from -1 to 1. These capture more information, because a response can be positive, negative, or orthogonal (neutral, i.e. 0).
The data X is an n-length vector, where n is the number of survey questions.
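As an illustration of how raw answers could become the n-length vector X, here is a minimal sketch. It assumes that binary questions are encoded as 0 (not relevant) or 1 (relevant) and that scaled questions already arrive as numbers in [-1, 1]; the names `encode_responses`, `BINARY`, and `SCALED` are hypothetical and not part of the original system.

```python
# Hypothetical question-type labels; the real survey's question set is not specified.
BINARY = "binary"
SCALED = "scaled"


def encode_responses(responses, question_types):
    """Build the n-length input vector X from numeric survey answers.

    responses: one number per question (assumed already numeric).
    question_types: BINARY or SCALED for each question, same length as responses.
    """
    if len(responses) != len(question_types):
        raise ValueError("need exactly one response per question")
    x = []
    for value, qtype in zip(responses, question_types):
        if qtype == BINARY:
            # Assumed encoding: 1 = relevant, 0 = not relevant.
            if value not in (0, 1):
                raise ValueError(f"binary answer must be 0 or 1, got {value}")
        elif qtype == SCALED:
            # Negative = disagree, positive = agree, 0 = neutral (orthogonal).
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"scaled answer must be in [-1, 1], got {value}")
        else:
            raise ValueError(f"unknown question type {qtype!r}")
        x.append(float(value))
    return x
```

For example, `encode_responses([1, -0.5, 0], [BINARY, SCALED, SCALED])` yields `[1.0, -0.5, 0.0]`.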
The Model
X is multiplied by a k x n matrix of weights W, resulting in a k x 1 vector. This vector then goes element-wise through the logistic sigmoid
function to get the predicted output `h \in \mathbb{R}^k`, where `h_i \in [0, 1]` for every component `h_i` of h.
*k = number of classes, which in this case is the number of possible careers.
`h = \text{sigmoid}(W X)`
`\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}`
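The forward pass `h = sigmoid(W X)` can be sketched in plain Python as follows. This is an illustrative sketch, not the project's code: W is assumed to be a list of k rows of length n, and the function names `sigmoid` and `predict` are hypothetical. The sigmoid is written in a numerically stable form so that very negative inputs do not overflow `math.exp`.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), computed stably for a scalar z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def predict(W, x):
    """Compute h = sigmoid(W X).

    W: k x n weight matrix as a list of k rows, each of length n.
    x: n-length input vector.
    Returns a k-length list with every entry in [0, 1].
    """
    n = len(x)
    h = []
    for row in W:
        if len(row) != n:
            raise ValueError("each row of W must have length n = len(x)")
        z = sum(w_ij * x_j for w_ij, x_j in zip(row, x))  # (W X)_i
        h.append(sigmoid(z))
    return h
```

Plain lists keep the sketch dependency-free; a production version would more likely use a matrix library.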
Stochastic Gradient Descent
The weight matrix W is updated through Stochastic Gradient Descent, which estimates the gradient from a single training example (or a small batch) at a time. On each update iteration,
the weights take a step in the direction of the negative gradient ("downhill") in order to minimize the cost.
`W \leftarrow W - \alpha \frac{\partial J}{\partial W}`, where J is the cost and `\alpha` is the learning rate.
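A minimal sketch of the SGD update, assuming J is the logistic loss summed over the k outputs. For that loss with `h = sigmoid(W X)`, the gradient entry is `\frac{\partial J}{\partial W_{ij}} = (h_i - y_i) x_j`. The helpers `sgd_step` and `train`, the default learning rate, epoch count, and seed are all hypothetical choices for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic sigmoid for a scalar z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(W, x, y, alpha):
    """Apply one SGD update for a single (x, y) example, modifying W in place.

    Uses dJ/dW_ij = (h_i - y_i) * x_j. Row i of W only affects h_i,
    so each row can be updated independently.
    """
    for i, row in enumerate(W):
        z = sum(w * x_j for w, x_j in zip(row, x))
        error = sigmoid(z) - y[i]  # h_i - y_i
        for j in range(len(row)):
            row[j] -= alpha * error * x[j]


def train(W, data, alpha=0.1, epochs=100, seed=0):
    """Run several epochs of SGD over (x, y) pairs, shuffling each epoch."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            sgd_step(W, x, y, alpha)
    return W
```

Shuffling the examples each epoch is a common SGD practice, included here as an assumption rather than something the original specifies.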
Cost Function
The cost function J is the logistic loss (binary cross-entropy), summed over the k outputs.
`J = -\sum_{i=1}^{k} \left[ y_i \log(h_i) + (1 - y_i) \log(1 - h_i) \right]`, where y is the real output and h is the predicted output.
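The cost can be computed directly from the formula. In this sketch the name `logistic_loss` is hypothetical, and the `eps` clipping is an added safeguard (not in the original) that keeps `log` from receiving exactly 0 when a prediction saturates.

```python
import math


def logistic_loss(y, h, eps=1e-12):
    """J = -sum_i [y_i log(h_i) + (1 - y_i) log(1 - h_i)].

    y: k-length list of true labels (0 or 1).
    h: k-length list of predictions in [0, 1].
    eps: clips each h_i into [eps, 1 - eps] so both logs stay finite.
    """
    if len(y) != len(h):
        raise ValueError("y and h must have the same length k")
    total = 0.0
    for y_i, h_i in zip(y, h):
        h_i = min(max(h_i, eps), 1.0 - eps)
        total += y_i * math.log(h_i) + (1.0 - y_i) * math.log(1.0 - h_i)
    return -total  # the leading minus makes J non-negative
```

Because both log terms are non-positive, the outer negation makes J non-negative, and J shrinks as each `h_i` approaches its label `y_i`.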
Output
After enough training iterations, W should accurately map X to Y.
The predicted output h, parameterized by W, is a k-length vector with values between 0 and 1.
The careers corresponding to the 5 largest values in h are selected as the user's top 5 career choices.
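Selecting the top 5 careers amounts to finding the indices of the largest entries of h. The sketch below uses the standard library's `heapq.nlargest`; the function name `top_careers` and the assumption that career names are stored in a list aligned with h are hypothetical.

```python
import heapq


def top_careers(h, careers, count=5):
    """Return the `count` careers whose predicted values in h are largest.

    h: k-length list of predicted values.
    careers: k-length list of career names, in the same order as h (assumed layout).
    """
    if len(h) != len(careers):
        raise ValueError("h and careers must both have length k")
    best = heapq.nlargest(count, range(len(h)), key=lambda i: h[i])
    return [careers[i] for i in best]
```

For example, with `h = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7, 0.05]` and careers labeled `"a"` through `"g"`, the result is `["b", "d", "f", "c", "e"]`. If k is smaller than `count`, all k careers are returned in descending order.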