Image Captioning Project

In this project I will train a network on the COCO dataset (Common Objects in Context). This dataset contains images, each paired with a set of 5 different captions. I will train a CNN-RNN model by feeding it the images and their captions so that the network learns to generate a caption given an image. Once trained…

NLTK

NLTK stands for Natural Language Toolkit. Tokenization is simply splitting text into a list of words (or sentences).

Word tokenization with Python's built-in functions:

words = text.split()

Word tokenization with NLTK:

from nltk.tokenize import word_tokenize
words = word_tokenize(text)

Sentence tokenization with NLTK:

from nltk.tokenize import sent_tokenize
sentences = sent_tokenize(text)

NLTK Documentation
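As a quick illustration (the sample sentence below is made up for this sketch), word_tokenize splits off punctuation as separate tokens, while str.split only splits on whitespace; the outputs are roughly as follows:

from nltk.tokenize import word_tokenize, sent_tokenize

text = "Dr. Smith loves NLP. It's fun!"

print(text.split())          # ['Dr.', 'Smith', 'loves', 'NLP.', "It's", 'fun!']
print(word_tokenize(text))   # ['Dr.', 'Smith', 'loves', 'NLP', '.', 'It', "'s", 'fun', '!']
print(sent_tokenize(text))   # ['Dr. Smith loves NLP.', "It's fun!"]

Note that word_tokenize and sent_tokenize rely on the punkt tokenizer models, which can be fetched with nltk.download('punkt').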

Embeddings

An embedding is a mapping from discrete objects, such as words, to vectors of real numbers. For example, a 300-dimensional embedding for English words could include:

blue:    (0.01359, 0.00075997, 0.24608, …, -0.2524, 1.0048, 0.06259)
blues:   (0.01396, 0.11887, -0.48963, …, 0.033483, -0.10007, 0.1158)
orange:  (-0.24776, -0.12359, 0.20986, …, 0.079717, 0.23865, -0.014213)
oranges: (-0.35609, 0.21854, 0.080944, …,…
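In PyTorch this mapping can be expressed with nn.Embedding. A minimal sketch (the vocabulary size and the example word indices below are made up for illustration):

import torch
import torch.nn as nn

# hypothetical vocabulary of 10 words, each mapped to a 300-dimensional vector
embedding = nn.Embedding(num_embeddings=10, embedding_dim=300)

# indices of two words in that made-up vocabulary
word_indices = torch.tensor([2, 7])

vectors = embedding(word_indices)
print(vectors.shape)  # torch.Size([2, 300])

The embedding weights start out random and are learned along with the rest of the network during training.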

Defining the network and feedforward function

import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        # word indices -> dense embedding vectors
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # the LSTM maps embedded words to hidden states
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # linear layer mapping hidden states to tag scores
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # (h0, c0), each with shape (num_layers, batch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        tag_outputs = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_outputs, dim=1)
        return tag_scores
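A minimal usage sketch of this class (the toy dimensions and the fake sentence of word indices are assumptions, just to show the expected shapes):

# toy setup: 10-word vocabulary, 6-dimensional embeddings, 6 hidden units, 3 tags
model = LSTMTagger(embedding_dim=6, hidden_dim=6, vocab_size=10, tagset_size=3)

# a fake "sentence" of 4 word indices from that vocabulary
sentence = torch.tensor([0, 4, 2, 7])

model.hidden = model.init_hidden()   # reset the hidden state before each sentence
tag_scores = model(sentence)
print(tag_scores.shape)              # torch.Size([4, 3]) – one score vector per word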

Basic LSTM Network

The first layer of an LSTM network should always be an embedding layer, which takes the size of the vocabulary dictionary as its input. Before we initialize the network we need to define the vocabulary, which is simply a dictionary of unique words where each word is assigned a numerical index. To do so we can use the…
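A small sketch of building such a word-to-index dictionary (the sample sentences are made up for illustration):

sentences = ["the dog ate the apple", "the cat read the book"]

word2idx = {}
for sentence in sentences:
    for word in sentence.split():
        if word not in word2idx:
            word2idx[word] = len(word2idx)   # assign the next free index

print(word2idx)
# {'the': 0, 'dog': 1, 'ate': 2, 'apple': 3, 'cat': 4, 'read': 5, 'book': 6}

The length of this dictionary is the vocab_size passed to the embedding layer, and each sentence is converted to a tensor of these indices before being fed to the network.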

LSTM in PyTorch

To define an LSTM:

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)

To initialize the hidden state:

h0 = torch.randn(1, 1, hidden_dim)
c0 = torch.randn(1, 1, hidden_dim)

We will need to wrap everything in Variable (the input is a tensor):

inputs = Variable(inputs)
h0 = Variable(h0)
c0 = Variable(c0)

Get the outputs and hidden state:

out, hidden = lstm(inputs, (h0, c0))
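A self-contained sketch putting this together (the dimensions and the random input sequence are assumptions; note that since PyTorch 0.4 plain tensors carry autograd information, so the Variable wrapper is no longer required):

import torch
import torch.nn as nn

input_dim, hidden_dim, n_layers = 4, 3, 1
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)

# a sequence of 5 time steps, batch size 1, 4 features per step
inputs = torch.randn(5, 1, input_dim)

# hidden and cell state: (num_layers, batch_size, hidden_dim)
h0 = torch.randn(n_layers, 1, hidden_dim)
c0 = torch.randn(n_layers, 1, hidden_dim)

out, (hn, cn) = lstm(inputs, (h0, c0))
print(out.shape)   # torch.Size([5, 1, 3]) – one hidden state per time step
print(hn.shape)    # torch.Size([1, 1, 3]) – final hidden state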

LSTM Cells

LSTM cells replace the hidden layers of a recurrent neural network, and they can be stacked so you can have multiple hidden layers, all of them LSTM cells. Each cell is composed of 4 gates, with 2 inputs and 2 outputs (see the sketch of the gate equations below): Learn Gate: it takes the short-term memory and the event, combines them with a tanh function and then ignores a…
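The gate behaviour described above can be written compactly with the standard LSTM equations. A minimal sketch of a single cell step (the learn/forget/remember/use naming maps onto the input, forget, cell and output terms here; all dimensions are made up):

import torch

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    # W projects the concatenated [h_prev, x_t] onto the 4 gates, b is the bias
    gates = torch.cat([h_prev, x_t]) @ W + b
    i, f, g, o = gates.chunk(4)

    i = torch.sigmoid(i)            # input gate   – how much new information to learn
    f = torch.sigmoid(f)            # forget gate  – how much old memory to keep
    g = torch.tanh(g)               # candidate    – the new information itself
    o = torch.sigmoid(o)            # output gate  – how much memory to expose

    c_t = f * c_prev + i * g        # remember: kept memory + learned information
    h_t = o * torch.tanh(c_t)       # use: new short-term memory / output
    return h_t, c_t

# made-up sizes: 3 input features, 2 hidden units
hidden, inp = 2, 3
W = torch.randn(hidden + inp, 4 * hidden)
b = torch.zeros(4 * hidden)
h, c = torch.zeros(hidden), torch.zeros(hidden)
h, c = lstm_cell_step(torch.randn(inp), h, c, W, b)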

Hyperparameters

A hyperparameter is a variable that we need to set to a value before we can train a neural network. There are no magic numbers; it all depends on the architecture, the data, the problem to solve, etc.

Optimizer Hyperparameters

Learning rate: the most important hyperparameter of all. Typical values are 0.1, 0.01, 0.001, 0.0001, 0.00001 and…
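In PyTorch the learning rate is passed to the optimizer when it is created. A small sketch (the choice of Adam, the value 0.001 and the model variable are placeholders):

import torch.optim as optim

# assumes `model` is any nn.Module defined earlier, e.g. the LSTMTagger above
optimizer = optim.Adam(model.parameters(), lr=0.001)
# or, with plain stochastic gradient descent:
# optimizer = optim.SGD(model.parameters(), lr=0.01)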

Dropout

Dropout randomly turns nodes in a layer on and off (with some specified probability) during each forward and backward pass of training. A node that is disabled does not contribute to the prediction and does not get its weights updated during backpropagation. This helps the model generalize better and increase accuracy on…
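In PyTorch dropout is just another layer. A minimal sketch (the 0.3 probability and the layer sizes are made-up values):

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.dropout = nn.Dropout(p=0.3)   # each activation zeroed with probability 0.3
        self.fc2 = nn.Linear(20, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)                # active in model.train(), a no-op in model.eval()
        return self.fc2(x)

net = SimpleNet()
net.train()                                # enable dropout while training
out = net(torch.randn(1, 10))

Remember to call model.eval() before validation or inference so that dropout is switched off and all nodes contribute to the prediction.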