I’m running into an issue where backprop fails with this message:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
The odd thing is that the first few loops work, and if I run this locally instead of on a cluster, it works as well. Has anyone run into something similar?
There is only a single .backward() call in my code.
Cluster GPU: Tesla K80
Local GPU: GTX 1660
I’ve narrowed it down to this troublesome function, but I’m having trouble understanding why it fails:
def pool_values(g_input, g_connector, g_out, x):
    # Input: input graph, g_connector
    # Output: pooled values of g_out
    # Description: pools a DGL graph
    n_input_nodes = len(g_input)
    n_output_nodes = len(g_out)
    # Propagate data: the first n_input_nodes nodes of g_connector are the
    # inputs, the next n_output_nodes nodes are the outputs
    in_nodes = list(range(n_input_nodes))
    out_nodes = list(range(n_input_nodes, n_input_nodes + n_output_nodes))
    g_connector.nodes[in_nodes].data['h'] = x
    # gcn_msg and gcn_reduce are defined elsewhere in my code
    g_connector.update_all(gcn_msg, gcn_reduce)
    x = g_connector.nodes[out_nodes].data['h']
    return x
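
For reference, as far as I understand it, this error can show up even with a single .backward() call per loop when a tensor that still carries autograd history from an earlier iteration gets reused in a later one. Below is a minimal sketch of that situation in plain PyTorch, without DGL. It is not my actual code: run_loop and state are hypothetical names, with state standing in for anything that persists between iterations (for example, node data left on a graph). I have not confirmed that this is what happens inside pool_values.

```python
import torch

def run_loop(detach_state, steps=3):
    """Run a few training-style iterations; return the error message, or None."""
    w = torch.ones(1, requires_grad=True)
    state = torch.zeros(1)  # hypothetical value carried across iterations
    try:
        for _ in range(steps):
            # New graph nodes are built on top of the previous iteration's state
            state = state * w + 1.0
            loss = (state ** 2).sum()
            loss.backward()  # the only backward() call in the loop
            if detach_state:
                # Cut the link to the graph whose buffers were just freed
                state = state.detach()
    except RuntimeError as err:
        return str(err)
    return None

print(run_loop(detach_state=False))  # message of the RuntimeError that was raised
print(run_loop(detach_state=True))   # None: no error was raised
```

In this sketch the first iteration succeeds and the failure only appears once an older graph is reached again, which is why detaching (or re-creating) the carried-over tensor avoids it.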