How to build custom graph classification datasets in DGL

annihi1ation · November 9, 2020, 8:02am

I have already encoded my graph data as DGLGraph objects, but I can't figure out how to pack them into a DGL data loader. Can somebody help me?

mufeili · November 9, 2020, 5:43pm

Can you show your code snippet? Have you checked the user guide?

annihi1ation · November 10, 2020, 6:44am

Sure, here is my code. I don't know how to process the data, since I have already represented it as DGLGraphs.

Here is what my data looks like:


Graph(num_nodes=188, num_edges=450,
      ndata_schemes={'glycine': Scheme(shape=(4,), dtype=torch.int32)}
      edata_schemes={})
tensor(0., device='cuda:0')

This is how I pack those data

import collections

import dgl
import torch

data_set = []
for i in range(len(rna_data)):
    cur_data = rna_data.iloc[i]

    seq = cur_data['seq']
    matching = cur_data['matching']
    label = cur_data['label']

    u = []
    v = []
    for idx in range(len(seq) - 1):
        u.append(idx)
        v.append(idx+1)

    par_dict = find_parentheses(matching)

    matching_list = (collections.OrderedDict(sorted(par_dict.items())))
    matching_list = list(matching_list.items())

    skip_u = []
    skip_v = []
    for item in matching_list:
        skip_u.append(item[0])
        skip_v.append(item[1])
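    try:
        g = dgl.graph((u, v))
        g.edata['bonds'] = torch.tensor([[1, 0]] * len(u))
        g.add_edges(skip_u, skip_v, {'bonds': torch.tensor([[0, 1]] * len(skip_u))})
        # Note: to_bidirected does not carry edge features over, so 'bonds'
        # is lost here (the printed graph above shows edata_schemes={})
        g = dgl.to_bidirected(g)
    except Exception:
        # Skip samples whose graph construction fails
        continue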
    g = g.to('cuda')
    glycine_one_hot_list = []
    for glycine in seq:
        one_hot = glycine_one_hot_dict[glycine]
        glycine_one_hot_list.append(one_hot)

    g.ndata['glycine'] = torch.tensor(glycine_one_hot_list).cuda()
    label = torch.tensor(label, dtype=torch.float32).cuda()

    data = (g, label)
    data_set.append(data)

thanks a lot again

annihi1ation · November 10, 2020, 7:36am

Plus, another question I want to ask: in graph classification models, why is there no softmax or sigmoid function at the output layer?

mufeili · November 10, 2020, 3:38pm

You can follow the custom dataset interface for PyTorch. You just need to define a class like the following:

class GraphData:
    def __init__(self):
        # A list of preprocessed DGLGraphs
        self.graphs = ...  
        # Labels corresponding to the DGLGraphs
        # self.labels[i] is the labels corresponding to self.graphs[i]
        self.labels = ...

    def __getitem__(self, i):
        return self.graphs[i], self.labels[i]

    def __len__(self):
        return len(self.graphs)

Once you have defined such a class, you can then use it as a normal PyTorch dataset.

mufeili · November 10, 2020, 3:43pm

Typically we call the values before sigmoid/softmax “logits” and the values after that “probabilities”. Some loss functions take logits and others take probabilities, so a model can output logits directly and leave the sigmoid/softmax to the loss function. In some cases, using logits in loss computation can be more numerically stable as that allows merging multiple operations into one operation.
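
To make that concrete, here is a small sketch for a binary label, like the float label in the question. It compares a loss that takes logits with the two-step version that applies sigmoid first. The example values are made up, and the choice of binary cross entropy is an assumption about the task.

```python
import torch
import torch.nn.functional as F

# Raw model outputs (logits) and binary targets; values are illustrative
logits = torch.tensor([-2.0, 0.5, 3.0])
targets = torch.tensor([0.0, 1.0, 1.0])

# Loss that takes logits: the sigmoid is applied inside the loss
loss_from_logits = F.binary_cross_entropy_with_logits(logits, targets)

# Equivalent two-step version: explicit sigmoid, then a loss on probabilities
probs = torch.sigmoid(logits)
loss_from_probs = F.binary_cross_entropy(probs, targets)

print(loss_from_logits.item(), loss_from_probs.item())

# At inference time, apply sigmoid yourself if you need probabilities
predictions = (probs > 0.5).float()
```

For moderate values like these the two losses should agree up to floating-point error. For multi-class problems, the analogous pairing is F.cross_entropy on logits versus F.nll_loss on log-softmax outputs.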