I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not the error) with respect to the weights. I found the following code, which does something similar (from "Saliency maps of neural networks (using Keras)"):

import theano
import theano.tensor as T

get_output = theano.function([model.layers[0].input], model.layers[-1].output, allow_input_downcast=True)
fx = theano.function([model.layers[0].input], T.jacobian(model.layers[-1].output.flatten(), model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])

Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.

To get the gradients of the model output with respect to the weights in Keras, you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k

model = Sequential()
model.add(Dense(12, input_dim=8, init="uniform", activation='relu'))
model.add(Dense(8, init="uniform", activation='relu'))
model.add(Dense(1, init="uniform", activation='sigmoid'))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])

To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked about), we simply use model.output. We can also find the gradients of the outputs of other layers by using model.layers[index].output.

outputTensor = model.output #Or model.layers[index].output
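One thing worth knowing when the chosen output tensor has more than one element (a wider layer, or a batch of several examples): as far as I know, with the TensorFlow backend k.gradients hands off to tf.gradients, which differentiates the sum of all output elements rather than returning a full Jacobian. Here is a minimal pure-Python sketch of what that means. The toy model, the helper names (outputs, numeric_grad) and the numbers are made up for illustration and are not part of Keras.

```python
def outputs(w, x, a=(2.0, -0.5)):
    """Toy model: one shared hidden value h = w . x feeding two output elements."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [ai * h for ai in a]

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of the scalar function f at w."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grad

w = [0.1, -0.2, 0.3]   # hypothetical weights
x = [1.0, 2.0, 3.0]    # hypothetical input

# Gradient of each output element on its own ...
grad_y0 = numeric_grad(lambda v: outputs(v, x)[0], w)
grad_y1 = numeric_grad(lambda v: outputs(v, x)[1], w)
# ... and gradient of the summed output, which is what a sum-of-outputs
# gradient call would report for a two-element output tensor.
grad_sum = numeric_grad(lambda v: sum(outputs(v, x)), w)
```

If you need the gradient of one particular output element, slice the output tensor down to that element before asking for gradients.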

Then we need to choose the variables with respect to which the gradient is taken.

listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]

We can now calculate the gradients. It is as easy as the following:

gradients = k.gradients(outputTensor, listOfVariableTensors)
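To make it concrete what "gradient of the output (not the error) with respect to the weights" means, here is a small pure-Python sketch for a single sigmoid neuron, where the gradient can be written by hand and checked against finite differences. This is only an illustration under my own assumptions: the neuron, the helper names and the numbers are hypothetical and it does not use Keras at all.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(w, b, x):
    """Output of a single sigmoid neuron: sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def output_gradient(w, b, x):
    """Analytic d(output)/dw and d(output)/db for the sigmoid neuron."""
    y = neuron(w, b, x)
    slope = y * (1.0 - y)  # derivative of the sigmoid at the pre-activation
    return [slope * xi for xi in x], slope

w, b, x = [0.5, -0.25], 0.1, [1.0, 2.0]  # hypothetical values
grad_w, grad_b = output_gradient(w, b, x)

# Finite-difference check on the first weight
eps = 1e-6
numeric_w0 = (neuron([w[0] + eps, w[1]], b, x)
              - neuron([w[0] - eps, w[1]], b, x)) / (2 * eps)
```

The backend call does the same thing symbolically for every weight tensor in the list, using the chain rule through all layers.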

To actually evaluate the gradients for a given input, we need to use a bit of TensorFlow.

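import numpy as np
import tensorflow as tf

trainingExample = np.random.random((1, 8))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())  # a fresh session starts with uninitialized weights
evaluated_gradients = sess.run(gradients, feed_dict={model.input: trainingExample})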

And that's it!
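A note on reading the result: evaluated_gradients should come back as a list with one array per entry in model.trainable_weights, each with the same shape as the weight it belongs to. Assuming the usual Dense layout (kernel first, then bias, layer by layer), the shapes for the 8 -> 12 -> 8 -> 1 model above can be worked out with a small helper. dense_weight_shapes is a hypothetical function I wrote for this sketch, not a Keras API.

```python
def dense_weight_shapes(input_dim, layer_units):
    """Shapes of the kernel and bias of each Dense layer, in layer order."""
    shapes = []
    fan_in = input_dim
    for units in layer_units:
        shapes.append((fan_in, units))  # kernel: (inputs, units)
        shapes.append((units,))         # bias: (units,)
        fan_in = units
    return shapes

# The example model: 8 inputs, then Dense(12), Dense(8), Dense(1)
expected_shapes = dense_weight_shapes(input_dim=8, layer_units=[12, 8, 1])
for shape in expected_shapes:
    print(shape)
```

Comparing these against [g.shape for g in evaluated_gradients] is a quick sanity check that you are looking at the gradient for the weight you think you are.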

The answer below uses the binary cross-entropy loss; feel free to substitute your own loss function.

outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
# Keras losses are called as (y_true, y_pred): labels first, predictions second
loss = bce(labels, outputTensor)
gradients = k.gradients(loss, listOfVariableTensors)

sess = tf.InteractiveSession()
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
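For intuition about what the loss gradient looks like, here is a pure-Python sketch for a single sigmoid neuron trained with binary cross-entropy, where the gradient collapses to (prediction - label) * input. It is a hand-rolled illustration under my own assumptions: the helper names and numbers are hypothetical, and the real Keras loss additionally clips predictions and averages over the batch.

```python
import math

def predict(w, b, x):
    """Sigmoid neuron output for input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def bce(y_true, y_pred):
    """Binary cross-entropy for one example (label first, prediction second)."""
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))

def bce_gradient(w, b, x, y_true):
    """Analytic gradient of the loss with respect to the weights and the bias."""
    error = predict(w, b, x) - y_true
    return [error * xi for xi in x], error

w, b, x, y_true = [0.5, -0.25], 0.1, [1.0, 2.0], 1.0  # hypothetical values
grad_w, grad_b = bce_gradient(w, b, x, y_true)

# Finite-difference check on the second weight
eps = 1e-6
numeric_w1 = (bce(y_true, predict([w[0], w[1] + eps], b, x))
              - bce(y_true, predict([w[0], w[1] - eps], b, x))) / (2 * eps)
```

Unlike the gradient of the raw output, this one depends on the label, which is why the labels have to be supplied alongside the input data.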