Efficient way of dealing with node features

If we have a small graph, it's easy to store the node features along with the graph.

I am dealing with a larger graph and have all the node features in Spark DataFrames. What I was doing is: spark_dataframe → pandas_dataframe → numpy_array → torch_tensor, but because the graph is large, I am running into driver memory issues with this approach.
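Concretely, the current pipeline looks roughly like this (with `features_df` standing in as a placeholder for the actual Spark DataFrame):

```python
import torch

# Illustrative sketch of the current pipeline: every step materializes
# the full dataset on the driver.
pandas_df = features_df.toPandas()        # Spark -> pandas, full copy on the driver
numpy_array = pandas_df.to_numpy()        # pandas -> NumPy, a second full copy
features = torch.from_numpy(numpy_array)  # NumPy -> torch tensor (shares memory)
```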

Is there an efficient method to convert a PySpark DataFrame to torch tensors?

Hi,

It's hard to suggest a solution without knowing your pipeline in detail. One suggestion I can give is to save the intermediate partitioned results (NumPy arrays) to disk, and then concatenate them into a torch tensor (e.g., save each partition as a separate NumPy file and concatenate the files later).
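A minimal sketch of that idea, assuming single-machine Spark (or a filesystem visible to both the executors and the driver). `features_df` is again a placeholder for your Spark DataFrame, here assumed to contain only numeric feature columns:

```python
import os
import numpy as np
import torch

out_dir = "/tmp/node_features"
os.makedirs(out_dir, exist_ok=True)

def save_partition(index, rows):
    # Runs on the executors: materialize one partition as a NumPy file on disk.
    arr = np.array([list(row) for row in rows], dtype=np.float32)
    if arr.size:  # skip empty partitions so concatenation stays clean
        np.save(os.path.join(out_dir, f"part_{index:05d}.npy"), arr)
    return iter([])  # return nothing back to the driver

# Force execution of the save on every partition.
features_df.rdd.mapPartitionsWithIndex(save_partition).count()

# Back on the driver: load the partition files and concatenate into one tensor.
paths = sorted(p for p in os.listdir(out_dir) if p.endswith(".npy"))
features = torch.cat(
    [torch.from_numpy(np.load(os.path.join(out_dir, p))) for p in paths]
)
```

Note that `torch.cat` still materializes the full tensor on the driver; the saving comes from skipping the intermediate `toPandas()` and pandas-to-NumPy copies. If even that is too tight, you could preallocate the output tensor and fill it one partition file at a time.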
