Efficient way of dealing with node features

If we have a small graph, it's easy to store the node features along with the graph.

I am dealing with a larger graph and have all the node features in Spark DataFrames. What I was doing is: spark_dataframe → pandas_dataframe → numpy_array → torch_tensor, but because the graph is large, I am running into driver memory issues with this approach.
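Concretely, the current pipeline looks roughly like this (with `features_df` standing in as a placeholder for the actual Spark DataFrame):

```python
import torch

# Illustrative sketch of the current pipeline: every step materializes
# the full dataset on the driver.
pandas_df = features_df.toPandas()        # Spark -> pandas, full copy on the driver
numpy_array = pandas_df.to_numpy()        # pandas -> NumPy, a second full copy
features = torch.from_numpy(numpy_array)  # NumPy -> torch tensor (shares memory)
```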

Is there an efficient method to convert a PySpark DataFrame to torch tensors?

Hi,

It's hard to suggest a solution without knowing your pipeline in detail. One suggestion I can give is to save the intermediate partitioned results (NumPy arrays) to disk, and then concatenate them into a torch tensor (e.g., save each partition as a separate NumPy file and concatenate the files later).
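A minimal sketch of that idea, assuming single-machine Spark (or a filesystem visible to both the executors and the driver). `features_df` is again a placeholder for your Spark DataFrame, here assumed to contain only numeric feature columns:

```python
import os
import numpy as np
import torch

out_dir = "/tmp/node_features"
os.makedirs(out_dir, exist_ok=True)

def save_partition(index, rows):
    # Runs on the executors: materialize one partition as a NumPy file on disk.
    arr = np.array([list(row) for row in rows], dtype=np.float32)
    if arr.size:  # skip empty partitions so concatenation stays clean
        np.save(os.path.join(out_dir, f"part_{index:05d}.npy"), arr)
    return iter([])  # return nothing back to the driver

# Force execution of the save on every partition.
features_df.rdd.mapPartitionsWithIndex(save_partition).count()

# Back on the driver: load the partition files and concatenate into one tensor.
paths = sorted(p for p in os.listdir(out_dir) if p.endswith(".npy"))
features = torch.cat(
    [torch.from_numpy(np.load(os.path.join(out_dir, p))) for p in paths]
)
```

Note that `torch.cat` still materializes the full tensor on the driver; the saving comes from skipping the intermediate `toPandas()` and pandas-to-NumPy copies. If even that is too tight, you could preallocate the output tensor and fill it one partition file at a time.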
