I just learned that graph convolution over multiple subgraphs can be carried out with a batched graph (dgl.batch). Thank you very much.
If multiple subgraphs are generated simultaneously from a heterogeneous graph, is there a fast method? I currently use a loop to create the subgraphs, and I find that it takes a lot of time.
Could you provide more details? For example, some code/pseudo-code would be appreciated.
```python
g = dgl.heterograph(graph_data)  # heterograph
all_sub = []  # store all subgraphs
# len(sub_ids) = 300
sub_ids = [[[type1], [type2], [type3]], ...]  # IDs required to generate each subgraph
for sub_id in sub_ids:
    # This call takes a lot of time
    sub_g = dgl.node_subgraph(g, {'xx': sub_id[0], 'yy': sub_id[1], 'zz': sub_id[2]})
    all_sub.append(sub_g)
# Batch the subgraphs
bg = dgl.batch(all_sub)
```
I don’t know any clever way to speed it up, so I’m confused.
I think depending on your scenario, there are multiple ways to accelerate it:

- If the `sub_ids` are the same across all iterations, you could pre-generate all the subgraphs and batch them before training. The cost is paid once and is usually small compared with the training time.
- If the `sub_ids` are the same across all iterations but at each iteration you wish to batch different subgraphs as mini-batch inputs to your model, you could create a `Dataset` with each data sample being a subgraph extracted from `sub_ids`. You can then use the `GraphDataLoader`, which speeds up data batching via multiprocessing.
- If the `sub_ids` are different at each training iteration, then it becomes a graph sampling procedure. You can write a custom sampler where you call `dgl.node_subgraph` in its `sample` method. Below is demo code that may not run as-is but shows the concept:
```python
class MySampler(dgl.dataloading.Sampler):
    def __init__(self):
        super().__init__()

    def sample(self, g, sub_id):
        return dgl.node_subgraph(g, sub_id)

g = dgl.heterograph(graph_data)  # heterograph
all_sub_ids = {'type1': ..., 'type2': ..., ...}
sampler = MySampler()
dataloader = dgl.dataloading.DataLoader(
    g,
    all_sub_ids,
    sampler,
    batch_size=xxx,  # num nodes of each subgraph
    ...,  # you can play with arguments such as num_workers to accelerate the process
)
subgs = []
for subg in dataloader:
    subgs.append(subg)
    if len(subgs) == 300:  # the number of subgraphs you'd like to batch
        batched_subg = dgl.batch(subgs)
        train_model(batched_subg)
        subgs = []
```
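To illustrate the first case above (identical `sub_ids` every iteration), here is a minimal runnable sketch of paying the extraction cost once before the training loop. `extract_subgraph` is a hypothetical stand-in for `dgl.node_subgraph` so the sketch runs without DGL; in real code you would call `dgl.node_subgraph(g, ids)` once per entry and `dgl.batch(cached)` on the result.

```python
# Sketch (assumed names): cache all subgraphs once, reuse them every epoch.
call_count = 0  # track how many expensive extractions actually happen

def extract_subgraph(g, ids):
    # Stand-in for dgl.node_subgraph(g, ids) -- the expensive call.
    global call_count
    call_count += 1
    return (g, tuple(ids))  # placeholder "subgraph" object

def precompute(g, sub_ids):
    # Pay the extraction cost once, before training starts.
    return [extract_subgraph(g, ids) for ids in sub_ids]

g = "heterograph"
sub_ids = [[0, 1], [2, 3], [4, 5]]
cached = precompute(g, sub_ids)  # exactly len(sub_ids) extractions

for epoch in range(10):
    batch = cached  # reuse the cache; dgl.batch(cached) in real code
```

After ten epochs only three extractions have run, which is the whole point: the one-time cost is amortized over training.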
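For the second case above (fixed `sub_ids`, but different mini-batches each iteration), the `Dataset` idea might look like the following. This is a schematic sketch: `extract_subgraph` is again a hypothetical stand-in for `dgl.node_subgraph` so the code runs without DGL, and the commented-out loader line shows where `dgl.dataloading.GraphDataLoader` would take over batching with multiprocessing.

```python
# Sketch (assumed names): a map-style dataset with one subgraph per sample.
def extract_subgraph(g, ids):
    # Real code: return dgl.node_subgraph(g, ids)
    return (g, tuple(ids))

class SubgraphDataset:
    """One data sample per entry of sub_ids; extraction is lazy, per sample."""

    def __init__(self, g, sub_ids):
        self.g = g
        self.sub_ids = sub_ids

    def __len__(self):
        return len(self.sub_ids)

    def __getitem__(self, i):
        # With num_workers > 0 this runs inside worker processes,
        # so extraction overlaps with training.
        return extract_subgraph(self.g, self.sub_ids[i])

ds = SubgraphDataset("heterograph", [[0, 1], [2, 3], [4, 5]])
# Real code:
# loader = dgl.dataloading.GraphDataLoader(ds, batch_size=..., num_workers=4)
```

The key design choice is that `__getitem__` does the expensive extraction, so the data loader's worker processes parallelize it instead of the training loop doing it serially.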
Thank you very much.
Mine should be the third case.