Defining the network and feedforward function

import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        # Embedding layer: maps word indices to dense vectors
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # LSTM: takes word embeddings and outputs hidden states of size hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # Linear layer: maps hidden states to tag scores
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # (hidden state, cell state), both of shape (num_layers, batch, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        tag_outputs = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_outputs, dim=1)
        return tag_scores
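As a quick sanity check, a minimal usage sketch could look like the following; the vocabulary, tag set and dimensions here are made up for illustration:

# Hypothetical vocabulary and tag set
word_to_ix = {"the": 0, "dog": 1, "ate": 2, "apple": 3}
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}

model = LSTMTagger(embedding_dim=6, hidden_dim=6,
                   vocab_size=len(word_to_ix), tagset_size=len(tag_to_ix))

# Turn a sentence into a tensor of word indices
sentence = ["the", "dog", "ate", "the", "apple"]
inputs = torch.tensor([word_to_ix[w] for w in sentence], dtype=torch.long)

model.hidden = model.init_hidden()   # reset the hidden state before a new sequence
tag_scores = model(inputs)           # shape: (sentence length, tagset size)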

Basic LSTM Network

The first layer of an LSTM network should always be an embedding layer, which takes the vocabulary size as its input. Before we initialize the network we need to define the vocabulary, which is simply a dictionary of the unique words where each word gets a numerical index; to build it we can assign each unique word an index as it is first seen, as in the sketch below.
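A minimal sketch of building such a word-to-index dictionary and the matching embedding layer (the training sentences and sizes here are illustrative):

# Build a vocabulary: every unique word gets its own index (illustrative data)
training_data = [
    ("the dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("everybody read that book".split(), ["NN", "V", "DET", "NN"]),
]

word_to_ix = {}
for sentence, tags in training_data:
    for word in sentence:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

# The embedding layer takes the vocabulary size as its first argument
embedding = nn.Embedding(num_embeddings=len(word_to_ix), embedding_dim=6)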

LSTM in PyTorch

To define an LSTM:

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)

To initialize the hidden state (hidden state and cell state):

h0 = torch.randn(1, 1, hidden_dim)
c0 = torch.randn(1, 1, hidden_dim)

In older PyTorch versions we also need to wrap everything in Variable (the input is a tensor); in recent versions plain tensors work and Variable is no longer needed:

inputs = Variable(inputs)
h0 = Variable(h0)
c0 = Variable(c0)

Get the outputs and hidden state:

out, hidden = lstm(inputs, (h0, c0))
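Putting it together, a small self-contained sketch (dimensions are arbitrary, and plain tensors are used instead of Variable) could look like:

import torch
import torch.nn as nn

input_dim, hidden_dim, n_layers = 10, 20, 1
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)

# A sequence of 5 time steps, batch size 1, each step a vector of size input_dim
inputs = torch.randn(5, 1, input_dim)
h0 = torch.randn(n_layers, 1, hidden_dim)
c0 = torch.randn(n_layers, 1, hidden_dim)

out, (hn, cn) = lstm(inputs, (h0, c0))
print(out.shape)   # torch.Size([5, 1, 20]) -- one output per time step
print(hn.shape)    # torch.Size([1, 1, 20]) -- final hidden state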

LSTM Cells

LSTM cells replace the hidden layers of a Recurrent Neural Network, and they can be stacked so you can have multiple hidden layers, all of them LSTM cells. A cell is made up of 4 gates that take its two memory inputs (the long-term and short-term memory) together with the current event and produce two outputs (the updated long-term and short-term memory). Learn Gate: it takes the short-term memory and the event, combines them with a tanh function, and then ignores part of the result by multiplying it by an ignore factor.
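PyTorch exposes a single cell directly as nn.LSTMCell; a minimal sketch of stepping one cell through a sequence one time step at a time (sizes are arbitrary):

import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)

# The cell takes the event x_t plus the previous short-term memory (hx)
# and long-term memory (cx), and returns the updated memories
hx = torch.zeros(1, 20)   # short-term memory (hidden state)
cx = torch.zeros(1, 20)   # long-term memory (cell state)

sequence = torch.randn(5, 1, 10)   # 5 time steps, batch size 1
for x_t in sequence:
    hx, cx = cell(x_t, (hx, cx))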

Hyperparameters

A hyperparameter is a variable that we need to set to a value before we can train a Neural Network. There are no magic numbers: the right values depend on the architecture, the data, the problem to solve, etc.

Optimizer Hyperparameters

Learning rate: the most important hyperparameter of all; typical values are 0.1, 0.01, 0.001, 0.0001, 0.00001, …
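In PyTorch the learning rate is passed to the optimizer when it is created; for example (the values below are illustrative, not recommendations, and model refers to a network such as the LSTMTagger defined earlier):

import torch.optim as optim

# The learning rate is set before training starts
optimizer = optim.SGD(model.parameters(), lr=0.01)
# or, with Adam:
# optimizer = optim.Adam(model.parameters(), lr=0.001)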

Dropout

Dropout randomly turns nodes in a layer on and off (with some specified probability) on each forward pass during training. A node that is disabled does not contribute to the prediction and does not get its weights updated during backpropagation for that pass. This helps the model generalize better and increases accuracy on unseen data.
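A minimal sketch of adding dropout between layers in PyTorch (layer sizes and the dropout probability are illustrative); note that model.eval() disables dropout at inference time:

import torch.nn as nn

model_with_dropout = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # on each forward pass, zero out ~50% of these activations
    nn.Linear(256, 10),
)

model_with_dropout.train()   # dropout active during training
model_with_dropout.eval()    # dropout disabled for validation/inference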