While trying to reconcile my understanding of LSTMs, as described in this post by Christopher Olah, with the Keras implementation, and following the blog written by Jason Brownlee for the Keras tutorial, I am confused about the following:

1. The reshaping of the data series into [samples, time steps, features], and
2. Stateful LSTMs.

Let us consider these two questions with reference to the code below:
import numpy
from keras.models import Sequential
from keras.layers import Dense, LSTM

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0], look_back, 1))

########################
# The IMPORTANT BIT
########################
# create and fit the LSTM network
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    # one pass over the data per iteration; epochs was called nb_epoch in older Keras
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
Note: create_dataset takes a sequence of length N and returns an array of length N - look_back, each element of which is a sequence of length look_back.
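For readers without the tutorial open, here is a minimal sketch of what such a create_dataset helper could look like (my own reconstruction from the note above, not the tutorial's exact code):

import numpy

def create_dataset(dataset, look_back=1):
    # Slide a window of length look_back over the series: X[t] is the window
    # dataset[t : t + look_back] and Y[t] is the next value dataset[t + look_back].
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:i + look_back])
        dataY.append(dataset[i + look_back])
    return numpy.array(dataX), numpy.array(dataY)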
As can be seen, trainX is a 3-D array, with time steps and features being the last two dimensions (3 and 1 respectively in this particular code). Looking at the image below, does this mean that we are considering the many-to-one case, where the number of pink boxes is 3? Or does it mean the chain length is 3?
Does the features argument become relevant when we consider multivariate series, e.g. modelling two financial stocks simultaneously? A sketch of what I mean follows below.
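To make the multivariate question concrete, this is the shape I have in mind (random placeholder data standing in for two stock series, so the array shapes are the only point):

import numpy

look_back = 3
stock_a = numpy.random.rand(100)                   # placeholder series 1
stock_b = numpy.random.rand(100)                   # placeholder series 2
series = numpy.stack([stock_a, stock_b], axis=1)   # shape (100, 2): 2 features per time step

# Each sample is a window of look_back time steps with 2 features per step,
# so the LSTM input becomes [samples, look_back, 2] instead of [samples, look_back, 1].
windows = numpy.array([series[t:t + look_back] for t in range(len(series) - look_back)])
print(windows.shape)                               # (97, 3, 2)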
Do stateful LSTMs mean that we save the cell-memory values between runs of batches? If so, batch_size is one, and the memory is reset between the training runs, so what was the point of saying that the network is stateful? I am guessing this is related to the fact that the training data is not shuffled, but I am not sure how.
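My current (possibly wrong) mental model of stateful, as a small sketch with assumed shapes: the state left over after one batch becomes the initial state for the next, until reset_states() is called explicitly.

import numpy
from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(4, batch_input_shape=(1, 3, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

x = numpy.random.rand(1, 3, 1)
p1 = model.predict(x)   # starts from a zero state, leaves its final state behind
p2 = model.predict(x)   # same input, but starts from the carried-over state, so it can differ
model.reset_states()    # memory cleared
p3 = model.predict(x)   # starts from a zero state again, so it matches p1
print(p1, p2, p3)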
Any thoughts? Image reference: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I am a bit confused about @van's comment about the red and green boxes being equal. Do the following API calls correspond to the unrolled diagrams? Especially note the second diagram (batch_size was arbitrarily chosen):
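For reference, the calls I mean are of the following form (a sketch; the units and input_shape values are arbitrary placeholders, not taken from the diagrams):

from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed

# Many-to-one: the LSTM is unrolled over 20 time steps but only the
# output at the last step is passed on.
many_to_one = Sequential()
many_to_one.add(LSTM(1, input_shape=(20, 1), return_sequences=False))
many_to_one.add(Dense(1))

# Many-to-many: return_sequences=True emits one output per time step, and
# TimeDistributed applies the same Dense layer at each of the 20 steps.
many_to_many = Sequential()
many_to_many.add(LSTM(1, input_shape=(20, 1), return_sequences=True))
many_to_many.add(TimeDistributed(Dense(1)))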
For people who have done Udacity's deep learning course and are still confused about the time_step argument, see the following discussion: https://discussions.udacity.com/t/rnn-lstm-use-implementation/163169
It turns out that model.add(TimeDistributed(Dense(vocab_len))) was what I was looking for. Here is an example: https://github.com/sachinruk/ShakespeareBot
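For context, a minimal sketch of how that layer fits into a character-level model (vocab_len, the sequence length and the layer sizes here are placeholder assumptions, not the ShakespeareBot code):

from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, TimeDistributed

vocab_len = 60   # number of distinct characters (assumed)
seq_len = 40     # length of each training sequence (assumed)

model = Sequential()
model.add(Embedding(vocab_len, 32, input_length=seq_len))
model.add(LSTM(128, return_sequences=True))                          # one output per time step
model.add(TimeDistributed(Dense(vocab_len, activation='softmax')))   # per-step character distribution
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()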
I have summarised most of my understanding of LSTMs here: https://www.youtube.com/watch?v=ywinX5wgdEU