Rollout buffer of graphs for RL

Hi all,

I am working on a project implementing RL with observations that are encoded as graphs. In my original RL algorithm I fill a rollout buffer with tensor-shaped observations, which is then used for training a neural network.

I now want to move to a graph neural network (GNN). My rollout buffer should again be filled with observations - which are now graphs with varying topologies, node counts and features - to be used for training over a minibatch. However, I am struggling to find an efficient way to store these observations. Maybe some of you have ideas that could help me!

Two ideas/options I could maybe consider:

  • Store the observations as tensors and build a batched graph from these batched tensors just before applying my GNN?
  • Build the graph encodings from the original observations first, store the actual graph objects in the rollout buffer, and batch them before applying my GNN? Would storing the actual graphs be efficient?
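For the batching step in either option, the standard trick (what `dgl.batch` or PyG's `Batch.from_data_list` do under the hood) is to take the disjoint union of the graphs, shifting each graph's node indices by the total node count of the graphs before it. A minimal stdlib-only sketch, assuming each stored graph is a `(src, dst, num_nodes)` triple (a hypothetical storage format, not Erik's actual one):

```python
# Sketch: batch several small graphs into one big disconnected graph by
# offsetting node indices -- the disjoint-union batching used by GNN libraries.

def batch_graphs(graphs):
    """graphs: list of (src_list, dst_list, num_nodes) triples."""
    batched_src, batched_dst = [], []
    offset = 0  # running total of nodes already placed in the batched graph
    for src, dst, num_nodes in graphs:
        batched_src.extend(s + offset for s in src)
        batched_dst.extend(d + offset for d in dst)
        offset += num_nodes
    return batched_src, batched_dst, offset

# Two toy graphs: a 3-node triangle and a 2-node single edge.
g1 = ([0, 1, 2], [1, 2, 0], 3)
g2 = ([0], [1], 2)
src, dst, total_nodes = batch_graphs([g1, g2])
print(src, dst, total_nodes)  # [0, 1, 2, 3] [1, 2, 0, 4] 5
```

Because the result is one ordinary graph, a single GNN forward pass processes the whole minibatch; message passing never crosses the disconnected components.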

Any help, ideas or suggestions would be greatly appreciated!
Thanks!

Kind regards,
Erik

What’s an observation in your case? Is it a pair of tensors representing the source nodes and destination nodes of the edges in a graph?

Hi Mufeili,

Thank you so much for the quick reply! :slight_smile:

An observation is now a Dict observation (using OpenAI Gym) containing two Box spaces. From one of these observations I can construct a graph. I am still trying to figure out the smart way to convert it to a graph and store it in a buffer. So maybe what you are suggesting is actually the best way to do it: go from my original observation to a pair of tensors representing the graph for that observation, save these tensors in my buffer, and then, for each batch, build all the graphs and batch them?
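The conversion step might look like the following sketch. It assumes, purely for illustration, that the two Box entries are an adjacency matrix and a node-feature matrix under hypothetical keys `"adjacency"` and `"features"`; Erik's actual Dict layout may differ:

```python
import numpy as np

# Hypothetical sketch: turn a Gym Dict observation into the (src, dst) edge
# tensors plus node features. Keys and Box meanings are assumptions.

def obs_to_edge_tensors(obs):
    adj = np.asarray(obs["adjacency"])   # (N, N) 0/1 matrix -- assumed key
    feats = np.asarray(obs["features"])  # (N, F) node features -- assumed key
    src, dst = np.nonzero(adj)           # indices of nonzero entries = edges
    return src, dst, feats

obs = {"adjacency": [[0, 1], [0, 0]], "features": [[1.0], [2.0]]}
src, dst, feats = obs_to_edge_tensors(obs)
print(src.tolist(), dst.tolist())  # [0] [1]
```

Storing only `(src, dst, feats)` per observation keeps the buffer as plain arrays, at the cost of rebuilding graph objects at sampling time.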

Kind regards,
Erik

Graph construction can be costly. If you are going to randomly sample observations from the buffer and create batches from them, it will be better to store the graphs themselves in the buffer.
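A minimal sketch of that suggestion: build each graph once when the transition is collected and store the graph object itself, so sampling a minibatch only pays for batching, never for repeated construction. With DGL the stored objects would be `dgl.graph(...)` instances and the sampled list would go through `dgl.batch`; the buffer below is library-agnostic and the `capacity`/eviction policy is an assumption:

```python
import random

# Sketch of a rollout buffer that stores pre-built graph objects.
# With DGL: buf.add(dgl.graph((src, dst))) and dgl.batch(buf.sample(k)).

class GraphRolloutBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, graph):
        # graph is already constructed; evict the oldest entry when full
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(graph)

    def sample(self, batch_size):
        # uniform sampling without replacement, as in a standard rollout buffer
        return random.sample(self.storage, batch_size)

buf = GraphRolloutBuffer(capacity=100)
for i in range(5):
    buf.add({"id": i})  # stand-in for a real graph object
minibatch = buf.sample(3)
print(len(minibatch))  # 3
```

The per-step construction cost is paid exactly once per observation instead of once per epoch per sample.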