GPU

The GPU performance state APIs are used to get and set performance levels on a per-GPU basis. P-States describe a GPU's active performance capability and power consumption. They range from P0 to P15, with P0 being the highest performance/power state and P15 the lowest. Each P-State maps to a performance level.…
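
As a minimal sketch, the current P-State can be read through NVML's Python bindings; this assumes the pynvml package and an NVIDIA driver are installed:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # first GPU
pstate = pynvml.nvmlDeviceGetPerformanceState(handle)  # integer 0..15
print("GPU 0 is in P%d" % pstate)                      # P0 = highest performance
pynvml.nvmlShutdown()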

Batch Size

Batch size refers to the number of training examples used in one step or iteration. One step or iteration is one step of gradient descent (one update of the weights and parameters). The batch size can be either: the same as the total number of samples, which makes one step = an epoch; this is called batch…
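
A minimal sketch of the relationship between batch size and steps per epoch; the dataset and batch sizes below are made-up illustration values:

import math

num_samples = 60000   # size of the training set (assumed)
batch_size = 128      # mini-batch size (assumed)

steps_per_epoch = math.ceil(num_samples / batch_size)  # one weight update per step
print(steps_per_epoch)  # 469 steps of gradient descent per epoch

# batch_size == num_samples -> one step per epoch (batch gradient descent)
# batch_size == 1           -> stochastic gradient descent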

Probability Basic Concepts

Independent Events: the probability of one event does not affect the probability of the next event. For example, tossing a coin does not affect the probability of the next flip.
Dependent Events: two events are dependent when the probability of one influences the likelihood of the other event.
Joint Probability: is the probability…
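
A minimal sketch of these definitions, using standard textbook examples (a fair coin and a 52-card deck) rather than anything from the excerpt above:

from fractions import Fraction

# Independent events: two fair coin tosses.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads             # joint probability = product
print(p_two_heads)                          # 1/4

# Dependent events: drawing two aces without replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace already removed from the deck
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)                           # 1/221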

Embeddings

An embedding is a mapping from discrete objects, such as words, to vectors of real numbers. For example, a 300-dimensional embedding for English words could include:
blue: (0.01359, 0.00075997, 0.24608, …, -0.2524, 1.0048, 0.06259)
blues: (0.01396, 0.11887, -0.48963, …, 0.033483, -0.10007, 0.1158)
orange: (-0.24776, -0.12359, 0.20986, …, 0.079717, 0.23865, -0.014213)
oranges: (-0.35609, 0.21854, 0.080944, …,…
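
A minimal sketch of such a lookup table in PyTorch; the four-word vocabulary matches the example above, but the vectors start out random rather than pre-trained:

import torch
import torch.nn as nn

vocab = {"blue": 0, "blues": 1, "orange": 2, "oranges": 3}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=300)

idx = torch.tensor([vocab["blue"]])
vec = embedding(idx)    # shape: (1, 300), one 300-dimensional vector
print(vec.shape)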

LSTM in PyTorch

To define an LSTM:
lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)
To initialize the hidden and cell states, which must be shaped (num_layers, batch, hidden_size):
h0 = torch.randn(n_layers, 1, hidden_dim)
c0 = torch.randn(n_layers, 1, hidden_dim)
In older versions of PyTorch everything had to be wrapped in Variable (the input is a tensor):
inputs = Variable(inputs)
h0 = Variable(h0)
c0 = Variable(c0)
Get the outputs and hidden state:
out, hidden = lstm(inputs,…
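
For reference, a hedged, self-contained version of the same steps in current PyTorch, where plain tensors replace Variable; all sizes here are illustrative:

import torch
import torch.nn as nn

input_dim, hidden_dim, n_layers = 10, 20, 2
seq_len, batch_size = 5, 1

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers)

# Hidden and cell states are shaped (num_layers, batch, hidden_size).
h0 = torch.randn(n_layers, batch_size, hidden_dim)
c0 = torch.randn(n_layers, batch_size, hidden_dim)

inputs = torch.randn(seq_len, batch_size, input_dim)  # (seq_len, batch, input_size)
out, (hn, cn) = lstm(inputs, (h0, c0))
print(out.shape)  # torch.Size([5, 1, 20])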

Hyperparameters

A hyperparameter is a variable that we need to set to a value before we can train a neural network. There are no magic numbers; the right values depend on the architecture, the data, the problem to solve, etc. Optimizer Hyperparameters: the learning rate is the most important hyperparameter of all; typical values are 0.1, 0.01, 0.001, 0.0001, 0.00001 and…
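
A minimal sketch of where the learning rate hyperparameter actually goes in PyTorch; the linear model, the data, and the 0.01 value are illustrative assumptions, not recommendations:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # learning rate set here

x, y = torch.randn(4, 10), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()  # one update of the weights, scaled by the learning rate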

Dropout

Dropout randomly turns nodes in each layer off (with some specified probability) on each forward pass during training. A node that is disabled does not contribute to the prediction and does not get its weights updated during backpropagation. This helps the model generalize better and increase accuracy on…
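
A minimal sketch of dropout in PyTorch; the layer sizes and p=0.5 are assumed for illustration. Note that dropout is only active in training mode:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # each forward pass, zero each activation with probability 0.5
    nn.Linear(64, 1),
)

model.train()                 # dropout active during training
out_train = model(torch.randn(8, 20))

model.eval()                  # dropout disabled at evaluation time
out_eval = model(torch.randn(8, 20))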