Dear All members of DGL community.

I have worked on this whole day, i haven’t tested it at all; i am little scared what will come out of it hha. I begging answers from all of gurus here, several questions which i really can’t figure out after many hours of browsing. i hope it doesn’t deviate too much. thanks in advance

This post is the continuation of this

So my current goal is to get a solid code of link prediction, which i think is the simplest and the most compatible to what i seek, i really hope later i will get preliminary result before i turn into another model. and get a result of many type of lost (Cross-entropy, BPR, Margin, etc).

First of all, i set my nodes data differently from what is written in guide, each nodes data is loaded from tensor file in my computer. and it is obviously in rank-2 tensor format.

What i want to ask is the following.

- Overall, The clarification the modification that i made is make sense
- Several sub-questions that follows

2.1 I assume for the simplest mechanics i could think of that strength definition is by defining difference between two nodes as input of edges features, and made a dot product out of it that is propotional to the distance (i put the distance in 'G.nodes[].data[‘linkdata’]). is it will be refined iteratively in the computation? i still have some doubt over it.

2.2. the nodes data is really different in dimension, each loaded from different tensor file, i hope the broadcasting will work, i notice i don’t do any masking or padding as some torch refer to.

2.3. I really hope will find a better write the

2.4. The question about severed link that happen in the graph evolution (e.g : if the nodes occur in between), this is really just pop out in my mind this noon, i don’t know if the code will define automatically without i define too much, i did it one by one with only 9 nodes, and it is very exhausting. i hope i will get a simpler way to do this.

2.5. i don’t know how to build a (maybe dictionary) of nodes and its type to be more compact and callable, i made it into heterogenous type because i think the writing is more compatible than in homogenous (even it is indeed homogenous).

2.6. Related to Part 2; i don’t know what to do with SAGE class, along with RGCN, user_feats and item_feats, i change anything that i think is necessary, the last two definition is somewhat baffle me, even though i think all of the input are complete (i change the ‘hetero_graph’ into ‘G’ to fit the part 1 code).

That’s All, Thank you very much in advance.

Pardon for the swear words , please pay no heed, it is all adressed to me only

So here is the code. i simply made it into two parts in single jupyter notebook file, one for developing dataset, the other is an effort in adapting the link prediction code from guidance.

**This one is introduction part, containing commentary and my lines of thought**

#0.UNDERLYING HTPOTHESIS

#1.DEVELOP THE DATASET

#2. ADAPT THE LINK PREDICTION FOR HETEROGENOUS GRAPH FIRST.

#0. UNDERLYING HYPOTHESIS

#This is a preliminary phase before exploring another model.

#Primary function that are selected are : ‘fn.v_sub_u = x’ , fn.e_dot_x. this is a simple yet strong definition of spatial interaction strength (i think)

#This may provide usefulness later https://docs.dgl.ai/api/python/dgl.data.html#edge-prediction-datasets

**This one for dataset development (PART ONE)**

#1. DEVELOP THE DATASET

#TAKEN FROM ‘DATA PREPARATION INTO TENSOR FORM’ Line 5.

#JUST OCCUR TO ME, HOW TO ADDRESS SEVERED LINK ALONG GRAPH EVOLUTION?CAN WE JUST SET IT TO UPDATED LINK FEATURES?

#Link assumed to be at most skip 1 row/column to be assumed connect, update taken into account those information. outside those assumption the edges is not taken into consideration

#therefore there are only 16 edge type.

#can we assume full configuration is set, but the graph added one by one. very tedious, too many assumption. CAN WE MAKE FOR IT TAKE ALL OF IT?

#reversed return value from assumption will be accepted

import torch

import pandas as pd

import dgl

#For this case i modified my base data to the border of the layout in Area X, in the 2 conditions (all in border):

#Adjacent, but not immediate :

#14-Nov : (Area-A1, Area-A2)

#19-Nov : (Area-B1, Area-B2)

#21-Nov : (Area-C1, Area-C2, Area-C3, Area-C4)

#23-Nov : (Area-D5)

#Adjacent, and relatively immediate :

#18-Dec : (Area-E1, Area-E2, Area-E3, Area-E4, Area-E5, Area-E6)

#20-Dec : (Area-F1, Area-F2, Area-F3, Area-F4)

#Modified, pd.read_csv --> pd.read_excel https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

#If we skip the centre (it will be positioned last), how we counter it later?too muny periphery that will be overlooked. Actually this is a good case for how te center will afflict the other?

#Procedure contained here (https://docs.dgl.ai/en/latest/api/python/dgl.dataloading.html) doesn’t simplify loading from disk

#FOR TYPE 1

print(“Loading xlsx…”)

AreaA1 = pd.read_excel(‘C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A1.xlsx’)

AreaA2 = pd.read_excel(‘C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A2.xlsx’)

…

…

#Is it possible for data type to be float32?as the guide suggest.

print(“Converting to Tensor…”)

Area_A1 = torch.tensor(AreaA1.values, dtype=torch.int64)

Area_A2 = torch.tensor(AreaA2.values, dtype=torch.int64)

…

…

torch.save(Area_A1, ‘C:/Users/Acer/DGLCONDA05/Data Type 1/Area-A1.pt’)

torch.save(Area_A2, ‘C:/Users/Acer/DGLCONDA05/FDC Data Type 1/Area-A2.pt’)

#IS THERE A SIMPLER WAY TO DEFINE HETEROGRAPH THAN THE BELOW???

#Build dictionary of node object from node type and edge type, make it in generalized form

graph_data_type1 = {

(‘Area_A1’, ‘0dx-2y0’, ‘Area_A2’): (torch.tensor([0]), torch.tensor([1])),

(‘Area_A2’, ‘5dx1y1’, ‘Area_B1’): (torch.tensor([1]), torch.tensor([2])),

…

}

#How to define node more conveniently, is dict-inng like this work?

#How to define edges data more conveniently?

G.edges[‘0dx-2y0’].data[‘linkdata’] = torch.tensor([0,-2,0])

G.edges[‘5dx1y1’].data[‘linkdata’] = torch.tensor([5,1,1])

…

…

G = dgl.DGLHeterograph(graph_data_type1)

G.ndata[‘gabungan’][0] = Area_A1

G.ndata[‘gabungan’][1] = Area_A2

…

…

#How to define ‘etype’ here?All is FORTUNATELY automatically defined as in ‘Set/get Features for All Edges of a Single Edge Type’ part https://docs.dgl.ai/en/latest/generated/dgl.DGLGraph.edges.html

**This part for incorporating Link Prediction Code (PART TWO)**

#2. ADAPT THE LINK PREDICTION FOR HETEROGENOUS GRAPH FIRST.

```
# h contains the node representations for each node type computed from
# the GNN defined in the previous section (Section 5.1).
# maybe use elrow iterate for the particular features of data?no?matrix, no need, function is accomadating already.
```

#BAHH, HOW TO LOAD THE DATA AGAIN???

#IS THIS apply_edges really does iterate over nodes?

#for sub, see https://docs.dgl.ai/generated/dgl.function.v_sub_u.html#dgl.function.v_sub_u

#for dot, see https://docs.dgl.ai/generated/dgl.function.u_dot_e.html#dgl.function.u_dot_e

import dgl

import pytorch

import pandas

import numpy

import networkx

import dgl.nn as dglnn

import torch.nn as nn

import torch.nn.functional as F

class HeteroDotProductPredictor(nn.Module):

```
def forward(self, G, h, etype):
with G.local_scope():
for i in range(8)
x = fn.v_sub_u
G.ndata['gabungan'][i] = h
G.apply_edges(fn.e_dot_x('h', 'h', 'linkdata'), etype=etype)
return G.edges[etype].data['linkdata']
def construct_negative_graph(G, k, etype):
utype, _, vtype = etype
src, dst = G.edges(etype=etype)
neg_src = src.repeat_interleave(k)
neg_dst = torch.randint(0, graph.number_of_nodes(vtype), (len(src) * k,))
return dgl.heterograph(
{etype: (neg_src, neg_dst)},
num_nodes_dict={ntype: graph.number_of_nodes(ntype) for ntype in graph.ntypes})
```

#THIS PART BELOW IS YET TO BE CLEAR TO ME… SHED SOME LIGHT UPON ME…see homogenous graph part explanation.

#https://docs.dgl.ai/en/0.5.x/guide/training-node.html

#Define SAGE class first as per https://docs.dgl.ai/en/0.5.x/guide/training-node.html

#Contruct a two-layer GNN model

class SAGE(nn.Module):

```
def __init__(self, in_feats, hid_feats, out_feats):
super().__init__()
self.conv1 = dglnn.SAGEConv(
in_feats=in_feats, out_feats=hid_feats, aggregator_type='mean')
self.conv2 = dglnn.SAGEConv(
in_feats=hid_feats, out_feats=out_feats, aggregator_type='mean')
def forward(self, graph, inputs):
# inputs are features of nodes
h = self.conv1(G, inputs)
h = F.relu(h)
h = self.conv2(G, h)
return h
```

#Need to explore RGCN more at https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn-hetero/entity_classify.py

#But it seems irrelevant.

#All ‘hetero_graph’ are changed into ‘G’

class Model(nn.Module):

```
def __init__(self, in_features, hidden_features, out_features, rel_names):
super().__init__()
self.sage = RGCN(in_features, hidden_features, out_features, rel_names)
self.pred = HeteroDotProductPredictor()
def forward(self, G, neg_g, j, etype):
h = self.sage(G, j)
return self.pred(G, h, etype), self.pred(neg_g, h, etype)
def compute_loss(pos_score, neg_score):
# Margin loss
n_edges = pos_score.shape[0]
return (1 - neg_score.view(n_edges, -1) + pos_score.unsqueeze(1)).clamp(min=0).mean()
k = 3 #i hope this is reasonable
model = Model(5, 5, 5, G.etypes) #don't know how to adjust it, is it reasonable???
#'feats' means feature size, i will replace user and item to source and destination
#WTF is user and item stand for? just play along and change into source and destination, still nonsense though.
source_feats = G.nodes[:].data['linkdata']
destination_feats = G.nodes[:].data['linkdata']
node_features = {'user': user_feats, 'item': item_feats}
opt = torch.optim.Adam(model.parameters())
#https://docs.dgl.ai/en/0.4.x/generated/dgl.DGLGraph.edges.html, ":" means all right?
for epoch in range(10):
negative_graph = construct_negative_graph(G, k, ('source', : , 'destination'))
pos_score, neg_score = model(hetero_graph, negative_graph, node_features, ('source', : , 'destination'))
loss = compute_loss(pos_score, neg_score)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```