How to build custom graph classification datasets in DGL

I have already encoded my graph data as DGLGraph objects, but I cannot pack them into the DGL data loader. Can somebody help me?
:frowning:

Can you show your code snippet? Have you checked the user guide?

Sure, here is my code. I don't know how to process the data, since I have already represented it as DGLGraphs.

Here is what one sample (graph and label) looks like:


Graph(num_nodes=188, num_edges=450,
      ndata_schemes={'glycine': Scheme(shape=(4,), dtype=torch.int32)}
      edata_schemes={})
tensor(0., device='cuda:0')

This is how I pack the data:

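import collections

import dgl
import torch

# rna_data, find_parentheses and glycine_one_hot_dict are defined elsewhere (not shown)
data_set = []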
for i in range(len(rna_data)):
    cur_data = rna_data.iloc[i]

    seq = cur_data['seq']
    matching = cur_data['matching']
    label = cur_data['label']

    u = []
    v = []
    for idx in range(len(seq) - 1):
        u.append(idx)
        v.append(idx+1)

    par_dict = find_parentheses(matching)

    matching_list = (collections.OrderedDict(sorted(par_dict.items())))
    matching_list = list(matching_list.items())

    skip_u = []
    skip_v = []
    for item in matching_list:
        skip_u.append(item[0])
        skip_v.append(item[1])
    try:
        g = dgl.graph((u, v))
        g.edata['bonds'] = torch.tensor([[1, 0]] * len(u))
        g.add_edges(skip_u, skip_v, {'bonds': torch.tensor([[0, 1]] * len(skip_u))})
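        # Note: dgl.to_bidirected() does not carry over edge features,
        # so 'bonds' is dropped here (hence edata_schemes={} in the printout above)
        g = dgl.to_bidirected(g)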
    except Exception as e:
        continue
    g = g.to('cuda')
    glycine_one_hot_list = []
    for glycine in seq:
        one_hot = glycine_one_hot_dict[glycine]
        glycine_one_hot_list.append(one_hot)

    g.ndata['glycine'] = torch.tensor(glycine_one_hot_list).cuda()
    label = torch.tensor(label, dtype=torch.float32).cuda()

    data = (g, label)
    data_set.append(data)

Thanks a lot again :yum:

Another question: in graph classification models, why is there no softmax or sigmoid function at the output layer?

You can follow the custom dataset interface for PyTorch, as described here. You just need to define a class as follows:

class GraphData:
    def __init__(self):
        # A list of preprocessed DGLGraphs
        self.graphs = ...  
        # Labels corresponding to the DGLGraphs
        # self.labels[i] is the labels corresponding to self.graphs[i]
        self.labels = ...

    def __getitem__(self, i):
        return self.graphs[i], self.labels[i]

    def __len__(self):
        return len(self.graphs)

Once you have defined such a class, you can then use it as a normal PyTorch dataset.

Typically we call the values before sigmoid/softmax "logits" and the values after it "probabilities". There are different loss functions for logits and for probabilities. Models often output logits and leave the sigmoid/softmax to the loss function, because computing the loss from logits can be more numerically stable: it allows merging multiple operations into one.
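As a small illustration of the stability point, the sketch below compares the two routes for a binary label in PyTorch. The logit value 200 is an arbitrary extreme chosen for the demo, not something taken from the model in this thread.

```python
import torch
import torch.nn.functional as F

# One very confident (and wrong) prediction: large positive logit, true label 0.
logit = torch.tensor([200.0])
target = torch.tensor([0.0])

# Route 1: hand the raw logit to a loss that applies the sigmoid internally.
loss_from_logits = F.binary_cross_entropy_with_logits(logit, target)

# Route 2: apply the sigmoid first, then use a loss that expects probabilities.
# In float32 the sigmoid saturates to exactly 1.0, so the information about
# how large the logit was is already lost before the loss is computed.
prob = torch.sigmoid(logit)
loss_from_probs = F.binary_cross_entropy(prob, target)

print(prob.item(), loss_from_logits.item(), loss_from_probs.item())
```

With this setup the model itself ends in a plain linear layer, the loss takes the logits during training, and `torch.sigmoid` (or softmax for multi-class outputs) is applied only when probabilities are needed at prediction time.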