An alternative to the one-hot representation is to infer the meaning of a word from the words that frequently appear near it....
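One simple way to make "words that frequently appear nearby" concrete is to count co-occurrences within a fixed window. The sketch below is only an illustration under stated assumptions: the function name `cooccurrence_counts`, the window size of 2, whitespace tokenization, and the two-sentence toy corpus are all hypothetical choices, not something taken from the original text.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count, for every word, how often each other word appears within
    `window` positions of it (hypothetical helper, illustration only)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the word itself
                    counts[word][tokens[j]] += 1
    return counts


# Toy corpus, assumed for illustration only.
corpus = ["the cat drinks milk", "the dog drinks water"]
counts = cooccurrence_counts(corpus, window=2)

# Words that occur in similar contexts share neighbours.
shared = set(counts["cat"]) & set(counts["dog"])
print(sorted(shared))
```

In this toy corpus, "cat" and "dog" never appear together, yet they share the neighbours "the" and "drinks", which is the kind of signal a context-based representation relies on.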
The one-hot representation of a word is a vector whose length equals the size of the vocabulary, with all 0s except for a single 1 to represent that...
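As a minimal sketch of the idea, assuming the single 1 sits at the word's index in the vocabulary (the usual convention), a one-hot vector can be built as follows. The function name `one_hot` and the four-word toy vocabulary are hypothetical and used only for illustration.

```python
def one_hot(word, vocabulary):
    """Return a list of len(vocabulary) zeros with a single 1 at the
    position of `word` (hypothetical helper, illustration only)."""
    index = vocabulary.index(word)  # raises ValueError for unknown words
    vector = [0] * len(vocabulary)
    vector[index] = 1
    return vector


# Toy vocabulary, assumed for illustration only.
vocabulary = ["cat", "dog", "fish", "bird"]
dog_vector = one_hot("dog", vocabulary)
print(dog_vector)  # [0, 1, 0, 0]
```

Every pair of distinct words gets vectors that share no non-zero position, so this encoding by itself carries no notion of similarity between words.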
Let \(X\) be the vector of all your features (i.e. \(X=(X_{1}, X_{2},\dots, X_{p})\)); then the output is: \begin{equation*}...