Several questions about link prediction

Maulpy · December 25, 2020, 1:49pm

Dear all members of the DGL community,

I have worked on this the whole day but haven't tested it at all; I am a little scared of what will come out of it, haha. I am asking all the gurus here for answers to several questions that I really can't figure out after many hours of browsing. I hope it doesn't deviate too much. Thanks in advance.

This post is the continuation of this

So my current goal is to get solid link prediction code, which I think is the simplest approach and the most compatible with what I seek. I really hope to get a preliminary result before I turn to another model, and to get results for several types of loss (cross-entropy, BPR, margin, etc.).

First of all, I set my node data differently from what is written in the guide: each node's data is loaded from a tensor file on my computer, and it is in rank-2 tensor format.

What I want to ask is the following.

1. Overall, do the modifications I made make sense?
2. Several sub-questions follow.
2.1. For the simplest mechanics I could think of, I define interaction strength by taking the difference between two nodes as the input of the edge features and then taking a dot product that is proportional to the distance (I put the distance in G.nodes[].data['linkdata']). Will it be refined iteratively during the computation? I still have some doubts about it.
2.2. The node data really differ in dimension, each loaded from a different tensor file. I hope broadcasting will work; I notice I don't do any masking or padding as some torch references suggest.
2.3. I really hope will find a better write the
2.4. A question about severed links that occur during the graph's evolution (e.g., when a new node appears in between existing ones). This just popped into my mind this noon. I don't know whether the code will handle it automatically without me defining too much; I did it one by one with only 9 nodes, and it was very exhausting. I hope there is a simpler way to do this.
2.5. I don't know how to build a (maybe dictionary) mapping of nodes to their types that is more compact and callable. I made the graph heterogeneous because I think the notation is more compatible than the homogeneous one (even though the graph is in fact homogeneous).
2.6. Related to Part 2: I don't know what to do with the SAGE class, along with RGCN, user_feats and item_feats. I changed anything that I thought was necessary; the last two definitions somewhat baffle me, even though I think all of the inputs are complete (I changed 'hetero_graph' into 'G' to fit the Part 1 code).

That's all. Thank you very much in advance.
Pardon the swear words; please pay no heed, they are all addressed to myself only.

So here is the code. I simply split it into two parts in a single Jupyter notebook file: one for developing the dataset, the other an attempt at adapting the link prediction code from the guide.

This one is the introduction, containing commentary and my line of thought

#0. UNDERLYING HYPOTHESIS
#1.DEVELOP THE DATASET
#2. ADAPT THE LINK PREDICTION FOR HETEROGENOUS GRAPH FIRST.
#0. UNDERLYING HYPOTHESIS
#This is a preliminary phase before exploring another model.
#The primary functions selected are: 'fn.v_sub_u = x', fn.e_dot_x. This is a simple yet strong definition of spatial interaction strength (I think)
#This may provide usefulness later https://docs.dgl.ai/api/python/dgl.data.html#edge-prediction-datasets

This one is for dataset development (PART ONE)

#1. DEVELOP THE DATASET
#TAKEN FROM 'DATA PREPARATION INTO TENSOR FORM', line 5.

#IT JUST OCCURRED TO ME: HOW TO ADDRESS SEVERED LINKS DURING GRAPH EVOLUTION? CAN WE JUST SET THEM TO UPDATED LINK FEATURES?
#Two areas are assumed to be connected if they are at most one row/column apart (skipping at most 1); updates take that information into account. Edges outside this assumption are not considered,
#therefore there are only 16 edge types.
#Can we assume the full configuration is set while the graph is added to one by one? Very tedious, too many assumptions. CAN WE MAKE IT TAKE ALL OF IT AT ONCE?
#A reversed return value from the assumption will be accepted

import torch
import pandas as pd
import dgl

#For this case I modified my base data to the border of the layout in Area X, under 2 conditions (all on the border):
#Adjacent, but not immediate :
#14-Nov : (Area-A1, Area-A2)
#19-Nov : (Area-B1, Area-B2)
#21-Nov : (Area-C1, Area-C2, Area-C3, Area-C4)
#23-Nov : (Area-D5)

#Adjacent, and relatively immediate :
#18-Dec : (Area-E1, Area-E2, Area-E3, Area-E4, Area-E5, Area-E6)
#20-Dec : (Area-F1, Area-F2, Area-F3, Area-F4)

#Modified, pd.read_csv --> pd.read_excel https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

#If we skip the centre (it will be positioned last), how do we account for it later? Too many peripheries would be overlooked. Actually, is this a good case for studying how the centre affects the others?
#The procedure described here (https://docs.dgl.ai/en/latest/api/python/dgl.dataloading.html) doesn't simplify loading from disk

#FOR TYPE 1
print("Loading xlsx…")
AreaA1 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A1.xlsx')
AreaA2 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A2.xlsx')
…
…

#Is it possible for the data type to be float32, as the guide suggests?
print("Converting to Tensor…")
Area_A1 = torch.tensor(AreaA1.values, dtype=torch.int64)
Area_A2 = torch.tensor(AreaA2.values, dtype=torch.int64)
…
…

torch.save(Area_A1, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A1.pt')
torch.save(Area_A2, 'C:/Users/Acer/DGLCONDA05/FDC Data Type 1/Area-A2.pt')

#IS THERE A SIMPLER WAY TO DEFINE A HETEROGRAPH THAN THE ONE BELOW?
#Build a dictionary of node objects from node type and edge type, in generalized form

graph_data_type1 = {

('Area_A1', '0dx-2y0', 'Area_A2'): (torch.tensor([0]), torch.tensor([1])),
('Area_A2', '5dx1y1', 'Area_B1'): (torch.tensor([1]), torch.tensor([2])),
…
}

#How to define nodes more conveniently? Does dict-ing like this work?

#How to define edge data more conveniently?
G.edges['0dx-2y0'].data['linkdata'] = torch.tensor([0,-2,0])
G.edges['5dx1y1'].data['linkdata'] = torch.tensor([5,1,1])
…
…

G = dgl.DGLHeterograph(graph_data_type1)
G.ndata['gabungan'][0] = Area_A1
G.ndata['gabungan'][1] = Area_A2
…
…

#How to define 'etype' here? FORTUNATELY everything is defined automatically, as in the 'Set/get Features for All Edges of a Single Edge Type' part https://docs.dgl.ai/en/latest/generated/dgl.DGLGraph.edges.html

This part is for incorporating the link prediction code (PART TWO)

#2. ADAPT THE LINK PREDICTION FOR HETEROGENOUS GRAPH FIRST.

    # h contains the node representations for each node type computed from
    # the GNN defined in the previous section (Section 5.1).
    # maybe iterate row by row over the particular features of the data? No? It is a matrix, so no need; the function already accommodates it.

#BAHH, HOW TO LOAD THE DATA AGAIN???
#DOES apply_edges REALLY ITERATE OVER NODES?
#for sub, see https://docs.dgl.ai/generated/dgl.function.v_sub_u.html#dgl.function.v_sub_u
#for dot, see https://docs.dgl.ai/generated/dgl.function.u_dot_e.html#dgl.function.u_dot_e

import dgl
import dgl.function as fn
import torch
import pandas
import numpy
import networkx
import dgl.nn as dglnn
import torch.nn as nn
import torch.nn.functional as F

class HeteroDotProductPredictor(nn.Module):

def forward(self, G, h, etype):
    with G.local_scope():
        for i in range(8) 
            x = fn.v_sub_u
            G.ndata['gabungan'][i] = h 
            G.apply_edges(fn.e_dot_x('h', 'h', 'linkdata'), etype=etype) 
            return G.edges[etype].data['linkdata']
    
def construct_negative_graph(G, k, etype):
    utype, _, vtype = etype
    src, dst = G.edges(etype=etype)
    neg_src = src.repeat_interleave(k)
    neg_dst = torch.randint(0, graph.number_of_nodes(vtype), (len(src) * k,))
    return dgl.heterograph(
        {etype: (neg_src, neg_dst)},
        num_nodes_dict={ntype: graph.number_of_nodes(ntype) for ntype in graph.ntypes})

#THE PART BELOW IS NOT YET CLEAR TO ME… PLEASE SHED SOME LIGHT ON IT… see the explanation in the homogeneous graph part.
#https://docs.dgl.ai/en/0.5.x/guide/training-node.html

#Define SAGE class first as per https://docs.dgl.ai/en/0.5.x/guide/training-node.html
#Construct a two-layer GNN model

class SAGE(nn.Module):

def __init__(self, in_feats, hid_feats, out_feats):
    super().__init__()

    self.conv1 = dglnn.SAGEConv(
        in_feats=in_feats, out_feats=hid_feats, aggregator_type='mean')
    self.conv2 = dglnn.SAGEConv(
        in_feats=hid_feats, out_feats=out_feats, aggregator_type='mean')

def forward(self, graph, inputs):
    # inputs are features of nodes
    h = self.conv1(G, inputs)
    h = F.relu(h)
    h = self.conv2(G, h)
    return h

#Need to explore RGCN more at https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn-hetero/entity_classify.py
#But it seems irrelevant.
#All 'hetero_graph' are changed into 'G'

class Model(nn.Module):

def __init__(self, in_features, hidden_features, out_features, rel_names):
    super().__init__()

    self.sage = RGCN(in_features, hidden_features, out_features, rel_names)
    self.pred = HeteroDotProductPredictor()
def forward(self, G, neg_g, j, etype):
    h = self.sage(G, j)
    return self.pred(G, h, etype), self.pred(neg_g, h, etype)

def compute_loss(pos_score, neg_score):
    # Margin loss
    n_edges = pos_score.shape[0]
    return (1 - neg_score.view(n_edges, -1) + pos_score.unsqueeze(1)).clamp(min=0).mean()
    
    k = 3 #i hope this is reasonable
    model = Model(5, 5, 5, G.etypes) #don't know how to adjust it, is it reasonable???
    #'feats' means feature size, i will replace user and item to source and destination
    #WTF is user and item stand for? just play along and change into source and destination, still nonsense though.
    source_feats = G.nodes[:].data['linkdata']
    destination_feats = G.nodes[:].data['linkdata']
    node_features = {'user': user_feats, 'item': item_feats}
    opt = torch.optim.Adam(model.parameters())
    #https://docs.dgl.ai/en/0.4.x/generated/dgl.DGLGraph.edges.html, ":" means all right?
    for epoch in range(10):
        negative_graph = construct_negative_graph(G, k, ('source', : , 'destination'))
        pos_score, neg_score = model(hetero_graph, negative_graph, node_features, ('source', : , 'destination'))
        loss = compute_loss(pos_score, neg_score)
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(loss.item())

Maulpy · December 25, 2020, 2:03pm

Not all of these lines are displayed in the correct form; I hope it is still understandable. I put in too many hash signs to write down my thoughts.

Maulpy · December 26, 2020, 4:43am

Oh, I actually tested it last night, and there is one problem in part 1: G.edges[].data['linkdata'] = torch.tensor([a,b,c]).

It returns an error saying that the number of features doesn't match the number of edges, or something like that. What I really intended to do is assign a vector (temporal-spatial data) to the edge features for later processing in message passing. Why does it seem to treat the tensor I assigned as edge IDs or something?

I also changed the dtype to dtype=torch.float32.

Thank you very much.

mufeili · December 28, 2020, 3:47am

1

Are you working with a homogeneous graph or a heterogeneous graph? G.nodes[].data[...] and G.edges[].data[...] are not valid usage. See the user guide and the docs for ndata and edata.

2

What do you mean by “defining difference between two nodes as input of edges features”?

3

The node data really differ in dimension, each loaded from a different tensor file. I hope broadcasting will work; I notice I don't do any masking or padding as some torch references suggest.

Do you mean features of different nodes have different dimension or do you mean there are multiple features for all nodes? For the former case, zero padding may still be a first thing you want to try. For the latter case, just concatenate them along the second dimension.
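The two cases above can be sketched as follows. This is only a minimal illustration with random data: the shapes (350 x 20 and 400 x 20) are taken from the example discussed in this thread, and the helper name `pad_rows` is hypothetical.

```python
import torch
import torch.nn.functional as F

def pad_rows(x: torch.Tensor, target_rows: int) -> torch.Tensor:
    """Zero-pad a (rows, cols) matrix at the bottom up to target_rows rows."""
    missing = target_rows - x.shape[0]
    if missing < 0:
        raise ValueError("target_rows is smaller than the input")
    # F.pad lists pads for the last dimension first: (left, right, top, bottom).
    return F.pad(x, (0, 0, 0, missing))

# Case 1: nodes whose feature matrices have different numbers of rows.
node_a = torch.randn(350, 20)
node_b = torch.randn(400, 20)
target = max(node_a.shape[0], node_b.shape[0])
stacked = torch.stack([pad_rows(node_a, target), pad_rows(node_b, target)])
print(stacked.shape)   # torch.Size([2, 400, 20])

# Case 2: several features per node, concatenated along the second dimension.
feat_x = torch.randn(2, 20)
feat_y = torch.randn(2, 6)
combined = torch.cat([feat_x, feat_y], dim=1)
print(combined.shape)  # torch.Size([2, 26])
```

After padding, every node has a tensor of the same shape, so the stacked result can be assigned as one node feature for all nodes at once.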

Maulpy · December 28, 2020, 5:10am

1

Actually, it is homogeneous, but this is based on what I understand from the examples in the documentation: if you set a feature 'x', for example, it applies to all nodes at once. Because in my case different nodes have feature matrices of different dimensions (only the number of rows differs) and I need to load each one individually from tensor files on my computer, I wrote it in a heterogeneous manner.

For example, node 'A' holds a feature matrix with 350 rows x 20 columns and node 'B' holds a feature matrix with 400 rows x 20 columns; they are connected by an edge that has a 3-dimensional feature vector, such as (0, -2, 0). Each node's feature data is loaded from a file '/…/A.pt', etc.

Yes, exactly, I have followed the guide, just like the last example for edges in this

hg.edges['follows'].data['h'] = torch.ones(2, 1)
hg.edges['follows'].data['h']

I set it similarly: for the edge '0dx-2y0' I assign the features (0,-2,0) for message passing.

G.edges['0dx-2y0'].data['linkdata'] = torch.tensor([0,-2,0])
G.edges['5dx1y1'].data['linkdata'] = torch.tensor([5,1,1])
etc.
But it returns an error: the features don't match the number of edges, which are 3 and 1 respectively. I also looked at the source code, but I think I got it right.

This is only the function that I define as the interaction strength parameter in message passing, as you suggested in the previous post, using a simple sub and dot. I want to know whether it will then be incorporated in DGL with increasing complexity as the computation proceeds (I know this is a stupid question).

As I replied in number 1, it is both: there are actually multiple features in each node, with different dimensions, so do I need to combine padding and concatenation?

mufeili · December 28, 2020, 6:07am

Actually, it is homogeneous, but this is based on what I understand from the examples in the documentation: if you set a feature 'x', for example, it applies to all nodes at once. Because in my case different nodes have feature matrices of different dimensions (only the number of rows differs) and I need to load each one individually from tensor files on my computer, I wrote it in a heterogeneous manner.

For example, node 'A' holds a feature matrix with 350 rows x 20 columns and node 'B' holds a feature matrix with 400 rows x 20 columns; they are connected by an edge that has a 3-dimensional feature vector, such as (0, -2, 0). Each node's feature data is loaded from a file '/…/A.pt', etc.

DGL expects nodes/edges of the same type to have features of the same shape and does not encourage feature assignment for a proper subset of nodes/edges of the same type. If different nodes/edges of the same type have features of different shapes, you can either do zero padding or treat them as different node/edge types.

G.edges['0dx-2y0'].data['linkdata'] = torch.tensor([0,-2,0])
G.edges['5dx1y1'].data['linkdata'] = torch.tensor([5,1,1])
etc.
But it returns an error: the features don't match the number of edges, which are 3 and 1 respectively. I also looked at the source code, but I think I got it right.

How many edges do you have for edge type 0dx-2y0 and 5dx1y1? The first dimension of the edge features need to be the same as the number of edges.

This is only the function that I define as the interaction strength parameter in message passing, as you suggested in the previous post, using a simple sub and dot. I want to know whether it will then be incorporated in DGL with increasing complexity as the computation proceeds (I know this is a stupid question).

The question is fine and you are right that the computation is in proportion to the number of edges.
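For intuition, the "sub then dot" idea can be written out per edge in plain PyTorch. This is a sketch on a hypothetical toy graph, not the poster's data; it assumes the node features and the edge features have the same length (3) so that the dot product is defined. The subtraction step mirrors what the DGL built-in `fn.v_sub_u` computes when used with `apply_edges`.

```python
import torch

# Hypothetical toy graph: 3 nodes, 2 directed edges (src -> dst).
src = torch.tensor([0, 1])
dst = torch.tensor([1, 2])
h = torch.tensor([[1., 0., 0.],
                  [1., 2., 0.],
                  [0., 2., 3.]])          # one 3-d feature per node
linkdata = torch.tensor([[0., -2., 0.],
                         [5., 1., 1.]])   # one 3-d feature per edge

# "sub": destination minus source, producing one row per edge.
diff = h[dst] - h[src]
# "dot": inner product of each edge's difference with its own edge feature.
score = (diff * linkdata).sum(dim=1, keepdim=True)
print(score)  # tensor([[-4.], [-2.]])
```

Both steps produce exactly one row per edge, which is why the cost grows with the number of edges. Neither step has trainable parameters, so the score itself is not refined during training; only the node representations fed into it would change.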

Maulpy · December 28, 2020, 6:24am

Okay, so I was correct to assume heterogeneous, and yes, it isn't applied only to a subset of nodes. So is the code from PDBbind suitable? And are all the masking and broadcasting also necessary? Could you give me a recommended link?

Ah yes, I was a bit inaccurate. Actually it may be better to define it individually from the nodes, as in the notes of the heterograph(?) source code. I assume only one edge per pair (not a multigraph).

OK, glad to hear it. I hope I can run the loss function and visualize it before the end of the year. Oh, and regarding the other post, yes, this is off topic.

What I mainly meant by 'severed link' in the other post is a new edge that appears when nodes arise in between existing nodes, and I feel an urge to define it manually (I did it with only 9 nodes and it was a bit nerve-wracking). I think defining it manually is superfluous, and there must be some way of writing the code that takes this situation into account. That is all; I hope it is understandable.

Maulpy · December 28, 2020, 6:27am

Oh, regarding the second response: how can I match the dimensions between the two? I clearly have only one edge, and I want to assign a vector of arbitrary dimension (3 dimensions) in general. Do you mean this padding technique also applies here?

Maulpy · December 28, 2020, 6:33am

It is incidental; I cannot define exactly how many. It is better to define it individually for each edge.

Maulpy · December 29, 2020, 3:21am

I tried it again with edata like in the note of this

G.edata['linkdata'] = {('Area-A1', '0dx-2y0', 'Area-A2'): torch.tensor([0,-2,0]),
                       ('Area A-2',  '5dx1y1', 'Area-B2'): torch.tensor([5,1,1]),
                       ('Area-A1', '5dx-1y1', 'Area-B2'): torch.tensor([5,-1,1]),
                       .....
                       .....

                      ('Area-X1', '2dx1y1', 'Area-X2'): torch.tensor([2,1,1])

}

But even though it still returns the DGL edge-feature number mismatch error, it differs from before: previously the error pointed to the first entry of edata, and now it points to the last one ('Area-X1', '2dx1y1', 'Area-X2').

I don't really understand what it means for them to be the same, but clearly the 'edata' example doesn't suit my needs, as it only defines an edge by its nodes. Only hg.edges[].data[] still seems suitable. And by the way, what do you mean by "first dimension"?

Maulpy · December 29, 2020, 6:53am

Oh yes, so no matter what, the tensor must have dimension (2, …). Okay, now I know that it is really different from what I had in mind. There must be something that I missed, but the guide says to "Set and get feature 'h' for a graph of multiple edge types.", so I thought it was somewhat flexible and didn't have to take the connecting nodes into account. I will look again, also at message passing.

I initially thought that:

The message function must be the tensor of edata (e.g., torch.tensor([2,1,1])), but it has the wrong dimension and I don't really grasp the reason yet, even though it reflects the minimum requirement of the spatial-temporal edge features in my case. In part 2.4 of message passing, 'affinity' also seems very relaxed.
The reduce function must be a combination of those binary functions, which I assume in this case to be the difference of node features.
The update function is the source node features plus the product of the message function, the reduce function and a certain scalar (I think the 'scalar' doesn't need to be defined and is learned automatically by DGL).

For zero padding (because I don't incorporate types in the nodes), is it enough to use a PyTorch class such as ZeroPad2d? In my mind I intended to use non-zero padding, because the data is akin to a continuous function, and I am afraid any implicit convolution in DGL will give a wrong result by taking zero as a product (am I wrong?). I have seen one in the pdbbind example and did not really grasp the logic behind it. It seems there is another technique in PyTorch that includes those, or this.

And lastly, the most important thing: I think the link prediction code doesn't need many modifications, but I can't be too sure before this feature preparation is over.

Btw, Happy new year @mufeili (if you have a holiday).

mufeili · December 29, 2020, 7:38pm

Thank you. Happy new year.

And are all the masking and broadcasting also necessary? Could you give me a recommended link?

Why do you need broadcasting? For zero padding, just search PyTorch zero padding.
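On the worry that padded zeros might distort later computations: one common option, which is not something proposed in this thread, is to remember how many real rows each node has and to average only over those rows. The sketch below assumes bottom zero padding; the function name `masked_row_mean` and the toy values are hypothetical.

```python
import torch

def masked_row_mean(padded: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Average only the real rows of each zero-padded matrix.

    padded:  (num_nodes, max_rows, cols), zero-padded at the bottom
    lengths: (num_nodes,), number of real rows per node
    """
    max_rows = padded.shape[1]
    # mask[n, r] is True where row r of node n holds real data.
    mask = torch.arange(max_rows).unsqueeze(0) < lengths.unsqueeze(1)
    summed = (padded * mask.unsqueeze(-1)).sum(dim=1)
    return summed / lengths.unsqueeze(1).to(padded.dtype)

padded = torch.zeros(2, 4, 3)
padded[0, :2] = torch.tensor([[1., 2., 3.], [3., 4., 5.]])  # node 0: 2 real rows
padded[1, :4] = 1.0                                          # node 1: 4 real rows
lengths = torch.tensor([2, 4])

pooled = masked_row_mean(padded, lengths)
print(pooled)  # rows: [2., 3., 4.] and [1., 1., 1.]
```

A plain `padded.mean(dim=1)` would give `[1.0, 1.5, 2.0]` for node 0, because the two padded rows are counted as data. The pooled result is also 2-D (one vector per node), which is the shape that layers such as SAGEConv expect as input.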

Ah yes, I was a bit inaccurate. Actually it may be better to define it individually from the nodes, as in the notes of the heterograph(?) source code. I assume only one edge per pair (not a multigraph).

It sounds like you treat each pair of nodes as an edge type. This will result in an extremely large number of edge types and is in general not the recommended way. How large is your graph? You should probably treat all nodes with Area-xx as a single node type and all edges as a single edge type unless there is a strong incentive.
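A sketch of what "a single node type and a single edge type" could look like in practice: area names are mapped to integer node IDs with a dictionary, and the per-edge vectors are collected into one tensor. The edge list below is hypothetical and only reuses naming from the posts above.

```python
import torch

# Hypothetical edge list: (source area, destination area, 3-d edge vector).
edges = [
    ('Area-A1', 'Area-A2', (0, -2, 0)),
    ('Area-A2', 'Area-B1', (5, 1, 1)),
    ('Area-A1', 'Area-B1', (5, -1, 1)),
]

# Assign each area name an integer node ID in order of first appearance.
node_id = {}
for s, d, _ in edges:
    for name in (s, d):
        node_id.setdefault(name, len(node_id))

u = torch.tensor([node_id[s] for s, _, _ in edges])   # source IDs
v = torch.tensor([node_id[d] for _, d, _ in edges])   # destination IDs
linkdata = torch.tensor([vec for _, _, vec in edges], dtype=torch.float32)

print(node_id)         # {'Area-A1': 0, 'Area-A2': 1, 'Area-B1': 2}
print(u, v)            # tensor([0, 1, 0]) tensor([1, 2, 2])
print(linkdata.shape)  # torch.Size([3, 3])
```

The tensors `u` and `v` can then be passed to `dgl.graph((u, v))`, and `linkdata` already has one row per edge, so it can be assigned as a single edge feature.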

I'm also a bit lost with the questions and wording. Can you try making them more succinct and precise? If you make a new discovery, you can edit the post rather than making another post.

Maulpy · January 1, 2021, 10:48am

Yes, now I have found what made it not match: the syntax wasn't valid, as you said. The following code is still crude. I still set the node and edge features individually. I copied the loss function exactly as in 5.3 except for changing 'h' etc., but the following error occurs, although the function is clearly in the definition of the DotProductPredictor class in 5.3:

NameError                                 Traceback (most recent call last)
<ipython-input-28-f9fc5552814f> in <module>
148 opt = th.optim.Adam(model.parameters())
149 for epoch in range(10):
--> 150     negative_graph = construct_negative_graph(GX, k) #NameError: name 'construct_negative_graph' is not defined
151     pos_score, neg_score = model(GX, negative_graph, node_features)
152     loss = compute_loss(pos_score, neg_score)

NameError: name 'construct_negative_graph' is not defined

I don't pretend that I understand everything. Here is the code, which includes 9 nodes and 22 edges, with zero padding at the bottom of all the node features, so the tensor dimension is now 26 columns x 487 rows.

import dgl
import dgl.function as fn  # needed for fn.u_dot_v below
import numpy as np
import torch as th
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
from dgl.nn import SAGEConv

class SAGE(nn.Module):
    def __init__(self, in_feats, hid_feats, out_feats):
        super().__init__()
        self.conv1 = dgl.nn.SAGEConv(
            in_feats=in_feats, out_feats=hid_feats, aggregator_type='mean')
        self.conv2 = dgl.nn.SAGEConv(
            in_feats=hid_feats, out_feats=out_feats, aggregator_type='mean')

    def forward(self, graph, inputs):
        # inputs are features of nodes
        XX = self.conv1(graph, inputs)
        XX = F.relu(XX)
        XX = self.conv2(graph, XX)
        return XX

class DotProductPredictor(nn.Module):
    def forward(self, graph, XX):
        # h contains the node representations computed from the GNN defined
        # in the node classification section (Section 5.1).
        with graph.local_scope():
            graph.ndata['XX'] = XX
            graph.apply_edges(fn.u_dot_v('XX', 'XX', 'score'))
            return graph.edata['score']
        
    def construct_negative_graph(graph, k):
        src, dst = graph.edges()
        
        neg_src = src.repeat_interleave(k)
        neg_dst = th.randint(0, graph.number_of_nodes(), (len(src) * k,))
        return dgl.graph((neg_src, neg_dst), num_nodes=graph.number_of_nodes())
    
class Model(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.sage = SAGE(in_features, hidden_features, out_features)
        self.pred = DotProductPredictor()
    def forward(self, graph, neg_g, x):
        h = self.sage(graph, x)
        return self.pred(graph, h), self.pred(neg_g, h)
    
    def compute_loss(pos_score, neg_score):
        # Margin loss
        n_edges = pos_score.shape[0]
        return (1 - neg_score.view(n_edges, -1) + pos_score.unsqueeze(1)).clamp(min=0).mean()


#FOR TYPE 1
print("Loading CSV...")
Area1_A1 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A1.xlsx')
Area1_A2 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A2.xlsx')
Area1_A3 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A3.xlsx')
Area1_A4 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A4.xlsx')
Area1_A5 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A5.xlsx')
Area1_A6 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A6.xlsx')
Area1_A7 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A7.xlsx')
Area1_A8 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A8.xlsx')
Area1_A9 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area1-A9.xlsx')

#Is it possible for data type to be float32?as the guide suggest.
print("Converting to Tensor...")
Area1_A1 = th.tensor(Area1_A1.values, dtype=th.float32)
Area1_A2 = th.tensor(Area1_A2.values, dtype=th.float32)
Area1_A3 = th.tensor(Area1_A3.values, dtype=th.float32)
Area1_A4 = th.tensor(Area1_A4.values, dtype=th.float32)
Area1_A5 = th.tensor(Area1_A5.values, dtype=th.float32)
Area1_A6 = th.tensor(Area1_A6.values, dtype=th.float32)
Area1_A7 = th.tensor(Area1_A7.values, dtype=th.float32)
Area1_A8 = th.tensor(Area1_A8.values, dtype=th.float32)
Area1_A9 = th.tensor(Area1_A9.values, dtype=th.float32)

th.save(Area1_A1, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A1.pt')
th.save(Area1_A2, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A2.pt')
th.save(Area1_A3, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A3.pt')
th.save(Area1_A4, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A4.pt')
th.save(Area1_A5, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A5.pt')
th.save(Area1_A6, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A6.pt')
th.save(Area1_A7, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A7.pt')
th.save(Area1_A8, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A8.pt')
th.save(Area1_A9, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area1_A9.pt')

u = th.tensor([0,1,0,1,2,0,0,2,0,3,1,2,1,3,0,1,2,3,4,5,6,7])
v = th.tensor([1,2,2,3,3,3,4,4,5,5,6,6,7,7,8,8,8,8,8,8,8,8])

graph_data_type1 = (u, v)

G = dgl.graph(graph_data_type1)

G.edata['linkdata'] = th.tensor([[0,-2,0],
                                    [5,1,1],
                                    [5,-1,1],
                                    [5,1,-1],
                                    [0,0,-2],
                                    [5,-1,-1],
                                    [7,0,1],
                                    [2,1,0],
                                    [7,0,-1],
                                    [2,1,-1],
                                    [7,0,1],
                                    [2,-1,0],
                                    [7,0,-1],
                                    [2,-1,0],
                                    [9,-1,0],
                                    [9,1,0],
                                    [4,0,-1],
                                    [4,0,1],
                                    [2,-1,-1],
                                    [2,-1,1],
                                    [2,1,-1],
                                    [2,1,1]], dtype=th.float32)

ai = nn.ZeroPad2d((0,0,0,34))
bi = nn.ZeroPad2d((0,0,0,56))
ci = nn.ZeroPad2d((0,0,0,20))
di = nn.ZeroPad2d((0,0,0,75))
ei = nn.ZeroPad2d((0,0,0,49))
fi = nn.ZeroPad2d((0,0,0,10))
gi = nn.ZeroPad2d((0,0,0,39))
hi = nn.ZeroPad2d((0,0,0,192))
ii = nn.ZeroPad2d((0,0,0,0))

#All of these are now 487 x 26 matrices (487 rows, 26 columns).
a = ai(Area1_A1)
b = bi(Area1_A2)
c = ci(Area1_A3)
d = di(Area1_A4)
e = ei(Area1_A5)
f = fi(Area1_A6)
g = gi(Area1_A7)
h = hi(Area1_A8)
i = ii(Area1_A9)

G.ndata['gabungan'] = th.stack([a, b, c, d, e, f, g, h, i])

GX = dgl.add_self_loop(G)

node_features = GX.ndata['gabungan']
n_features = node_features.shape[1]
k = 5
model = Model(n_features, 100, 100)
opt = th.optim.Adam(model.parameters())
for epoch in range(10):
    negative_graph = construct_negative_graph(GX, k) #NameError: name 'construct_negative_graph' is not defined
    pos_score, neg_score = model(GX, negative_graph, node_features)
    loss = compute_loss(pos_score, neg_score)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(loss.item())

Maulpy · January 1, 2021, 4:52pm

In the meantime, I changed the aggregator to 'lstm' to take order into account, and made some other revisions, but the NameError still persists.

Maulpy · January 2, 2021, 3:16am

Oh, and I am using autograd to wrap the node features into a Variable.

Maulpy · January 2, 2021, 5:30am

I need to explore 3.1 again; I will post again after I have modified the code.

mufeili · January 4, 2021, 5:01am

I copied the loss function exactly as in 5.3 except for changing 'h' etc., but the following error occurs, although the function is clearly in the definition of the DotProductPredictor class in 5.3

You need to define construct_negative_graph first as in 5.3.

u = th.tensor([0,1,0,1,2,0,0,2,0,3,1,2,1,3,0,1,2,3,4,5,6,7])
v = th.tensor([1,2,2,3,3,3,4,4,5,5,6,6,7,7,8,8,8,8,8,8,8,8])

Any rules for deciding the edges?

Oh, and I am using autograd to wrap the node features into a Variable

You mean Variable in PyTorch? This has not been needed for a long time.
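As a small illustration with hypothetical shapes: since PyTorch 0.4, plain tensors take part in autograd directly, and only tensors that should receive gradients need `requires_grad=True`.

```python
import torch

# Input features are ordinary tensors; no Variable wrapper is involved.
x = torch.randn(9, 26)
# A trainable weight asks for gradients explicitly.
w = torch.randn(26, 4, requires_grad=True)

loss = (x @ w).pow(2).mean()
loss.backward()

print(w.grad.shape)  # torch.Size([26, 4])
print(x.grad)        # None, because x does not require gradients
```

Parameters created inside `nn.Module` layers already have `requires_grad=True`, so node features can be passed to the model as they are.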

Maulpy · January 4, 2021, 12:38pm

Actually I have still not finished updating my code. I wonder why, for example, the forward() functions in the GNN instructions and in link prediction are a bit different. Among other things, I recall that in part 3 the forward is perhaps only for checking the function, while in part 5 it is something like a layer initialization method; I am not too sure.

I still need to investigate those kinds of things. I will update within three days at most. A clarification on negative graphs wouldn't hurt, though; maybe it will help me again.

Yes I did, as I have shown you in the code above.

Let me elaborate in a separate notebook exclusively for negative_graph; the output is good too.
I just made this for sample points. Later I will make a dedicated tensor file for the edge features and unsqueeze all the tensor files for the node features.

import dgl
import torch as th
import torch.nn as nn
import pandas as pd

def construct_negative_graph(graph, k):
    #print(graph.edges())
    #print(graph.node_attr_schemes())
    src, dst = graph.edges()
    neg_src = src.repeat_interleave(k)
    neg_dst = th.randint(0, graph.number_of_nodes(), (len(src) * k,))
    return dgl.graph((neg_src, neg_dst), num_nodes=graph.number_of_nodes())

k = 5
u = th.tensor([0,1,0,1,2,0,0,2,0,3,1,2,1,3,0,1,2,3,4,5,6,7])
v = th.tensor([1,2,2,3,3,3,4,4,5,5,6,6,7,7,8,8,8,8,8,8,8,8])
graph_data_type1 = (u, v)
graph = dgl.graph(graph_data_type1)
graph.edata['linkdata'] = th.tensor([[0,-2,0],
                                    [5,1,1],
                                    [5,-1,1],
                                    [5,1,-1],
                                    [0,0,-2],
                                    [5,-1,-1],
                                    [7,0,1],
                                    [2,1,0],
                                    [7,0,-1],
                                    [2,1,-1],
                                    [7,0,1],
                                    [2,-1,0],
                                    [7,0,-1],
                                    [2,-1,0],
                                    [9,-1,0],
                                    [9,1,0],
                                    [4,0,-1],
                                    [4,0,1],
                                    [2,-1,-1],
                                    [2,-1,1],
                                    [2,1,-1],
                                    [2,1,1]], dtype=th.float32)

print("Loading CSV...")
AreaA1 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A1.xlsx')
AreaA2 = pd.read_excel('C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A2.xlsx')
....
etc.

#Is it possible for data type to be float32?as the guide suggest.
print("Converting to Tensor...")
Area_A1 = th.tensor(AreaA1.values, dtype=th.float32)
Area_A2 = th.tensor(AreaA2.values, dtype=th.float32)
....
etc.

th.save(Area_A1, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area_A1.pt')
th.save(Area_A2, 'C:/Users/Acer/DGLCONDA05/Data Type 1/Area_A2.pt')
.....
etc.

ai = nn.ZeroPad2d((0,0,0,34))
bi = nn.ZeroPad2d((0,0,0,56))
....
etc.

#All of these are now 487 x 26 matrices (487 rows, 26 columns).
a = th.unsqueeze(ai(Area_A1), 0)
b = th.unsqueeze(bi(Area_A2), 0)
....
etc.

negative_graph = construct_negative_graph(graph, k)
negative_graph

And here is the output

Loading CSV...
Converting to Tensor...
Graph(num_nodes=9, num_edges=110,
      ndata_schemes={}
      edata_schemes={})

Is it normal that the schemes come back empty?

Pardon me, what rules? If you mean how I choose my edges while my graph is still evolving, I don't really know, to be honest, as I mentioned earlier about the need to handle severed links. For now I just plan to keep it as close to the sequential form as possible; it is just another spreadsheet problem, not really important.

mufeili · January 4, 2021, 12:45pm

This just means that your graph has no node and edge features.

Maulpy · January 13, 2021, 5:54am

Finally, with some squeezing in the node/edge feature definitions, I get the following negative graph:

negative_graph = construct_negative_graph(graph, k)
negative_graph.edata['linkdata'] = edata
negative_graph.ndata['gabungan'] = ndata
print(negative_graph)


Graph(num_nodes=9, num_edges=110,
      ndata_schemes={'gabungan': Scheme(shape=(487, 26), dtype=torch.float32)}
      edata_schemes={'linkdata': Scheme(shape=(3,), dtype=torch.float32)})

My next two questions are:

Since I have specified edge features for only some of the node pairs, most of the edges created in the negative graph aren't covered by them, so how will the negative edge features be inferred?
And for the application, I am using the SAGEConv snippet. Please confirm whether the list of modules and the flow below are correct.

a. Class DotProductPredictor (I want to call apply_edges twice, using 2 built-in functions)
b. Class construct_negative_graph
c. Class Model (which is derived from Class SAGEConv and SAGE)
d. From those 3 classes, I am able to compute the loss and optimize it with torch.optim.Adam,
and from there the link prediction values can be deduced.
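For the loss in step (d), a margin (hinge) loss over positive and negative edge scores can be sketched as below. This is a sketch under assumptions, not a confirmed fix: it uses the standard form max(0, 1 - pos + neg), the function name `margin_loss` and the toy scores are hypothetical, and the sign convention differs from the `compute_loss` posted earlier in this thread, so it is worth checking against the current DGL guide.

```python
import torch

def margin_loss(pos_score: torch.Tensor, neg_score: torch.Tensor) -> torch.Tensor:
    """Hinge loss: mean of max(0, 1 - pos + neg).

    pos_score: (E, 1), one score per real edge
    neg_score: (E * k, 1), k sampled negatives per real edge, grouped by edge
    """
    n_edges = pos_score.shape[0]
    neg = neg_score.view(n_edges, -1)   # (E, k); row i holds edge i's negatives
    return (1 - pos_score + neg).clamp(min=0).mean()

pos = torch.tensor([[3.0], [0.0]])                  # E = 2 real edges
neg = torch.tensor([[1.0], [2.0], [0.5], [-2.0]])   # k = 2 negatives per edge
print(margin_loss(pos, neg))  # tensor(0.3750)
```

The loss is zero for a pair whenever the real edge scores at least 1 higher than the sampled negative, so minimizing it pushes real edges above random ones. The grouping by edge matches negatives built with `src.repeat_interleave(k)`.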

Thank you very much @mufeili.