In the __init__.py for data, there is a function load_dataset that works for Cora/Citeseer/Pubmed. However, that same file also imports many other datasets. Is there an analogous load_dataset function to access those datasets as well? If not, is there a particular reason why that function does not support all of the included datasets?
It’s just for historical reasons: the original GCN implementation used the load_dataset interface. We now recommend using ds = dgl.data.CoraDataset() instead, which follows PyTorch’s dataset interface.
Gotcha, so for Cora (and Pubmed, etc.) we should now use the dgl.data.MyDataSet() way of doing things, as with the other standard datasets in DGL. One more question: is there any way to programmatically get a list of supported datasets? Something like dgl.data.datasets?
You can find the list at https://docs.dgl.ai/en/latest/api/python/data.html#dataset-classes
Do you have an example of loading a NetworkX graph for use with the GCN example? I have found it very difficult to load my NetworkX graph for training and testing.
Thanks a lot
Do you have edge/node data now? I would suggest creating the DGL graph from the edge list.
    import dgl

    # G is a networkx graph
    edge_list = [e for e in G.edges]
    src, dst = zip(*edge_list)
    g = dgl.DGLGraph()
    g.add_edges(src, dst)
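One thing to watch for: DGL identifies nodes by consecutive integer IDs starting at 0, so a NetworkX graph whose nodes are strings or non-contiguous numbers needs remapping first. A minimal sketch in plain Python, where relabel_edges is a hypothetical helper (not part of DGL) and the toy edge list stands in for G.edges:

```python
def relabel_edges(edges):
    """Map arbitrary hashable node labels to consecutive integer IDs.

    Returns (src, dst, node_to_id), where src and dst are lists of
    integer IDs in the same order as the input edges.
    """
    node_to_id = {}
    src, dst = [], []
    for u, v in edges:
        for node in (u, v):
            # Assign the next free ID the first time a label is seen.
            if node not in node_to_id:
                node_to_id[node] = len(node_to_id)
        src.append(node_to_id[u])
        dst.append(node_to_id[v])
    return src, dst, node_to_id


# Stand-in for list(G.edges) of a NetworkX graph with string labels.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice")]
src, dst, node_to_id = relabel_edges(edges)

# src and dst can then be passed to g.add_edges(src, dst) as above.
```

Nodes without any edges never appear in the edge list, so they would have to be added to the mapping separately. Keeping node_to_id around also makes it possible to line up feature and label rows with the new IDs.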
I can convert my networkX graph into a dgl graph (which can be found here https://drive.google.com/file/d/1g9eJdyGuzDzEp_JES8yUXFaQV1Zt9iId/view?usp=sharing).
My question is: how can I load my DGL graph into the GCN example for training and testing?
It was suggested that I create a new data class (similar to Cora), but I simply don’t know how to do that without step-by-step instructions. I can make my dataset look like cora.content and cora.cites, but where to put those files and how to configure the code is confusing.
Thanks in advance.
Instead of looking directly at how the data class is implemented, I would recommend inspecting which members of the data object the GCN example uses. By looking at the code here we can see the concrete list of what is needed to make the GCN example work:
features: the node feature tensor.
labels: the node label tensor.
train_mask, val_mask, test_mask: 0-1 masks on the nodes indicating whether each node belongs to the training, validation, or test set.
num_labels: number of possible labels (or number of classes).
So you don’t have to implement a brand new data object as in CoraDataset in order to run the GCN example with your data. Instead, you can just prepare those variables by replacing all data.foo occurrences with the stuff from your data.
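As a rough illustration of preparing those variables by hand, the sketch below builds the three masks and num_labels in plain Python. The helper make_split_masks, the split fractions, and the toy labels are all assumptions for illustration; the GCN example expects tensors, so these lists would still need to be wrapped (for instance with torch.tensor) before use.

```python
import random


def make_split_masks(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly assign each node to exactly one of train/val/test.

    Returns three lists of 0/1 flags, each of length num_nodes.
    """
    order = list(range(num_nodes))
    random.Random(seed).shuffle(order)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)

    train_mask = [0] * num_nodes
    val_mask = [0] * num_nodes
    test_mask = [0] * num_nodes
    for rank, node in enumerate(order):
        if rank < n_train:
            train_mask[node] = 1
        elif rank < n_train + n_val:
            val_mask[node] = 1
        else:
            test_mask[node] = 1
    return train_mask, val_mask, test_mask


# Toy labels for 10 nodes; in practice these come from your own data.
labels = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
num_labels = len(set(labels))  # number of classes
train_mask, val_mask, test_mask = make_split_masks(len(labels))
```

The features tensor is not shown because it depends entirely on your data; the only requirement is one row per node, in the same order as the node IDs of the graph.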
Please feel free to follow up. Thanks.
It is becoming a bit clearer now. Let’s say I have trained the model and saved it with these commands
and loaded it again
If I have a new graph and want to classify its nodes using my trained model, how should I proceed? Does DGL have something like
model.infer(model, DGLgraph) for node classification in a new graph?
Thanks so much
If your model’s forward function is written as something that accepts a graph argument and your model’s parameter list does not depend on the graph itself:
    def forward(self, g, features):
        # g: graph
        # features: input features
        ...
Then you can just load in your new graph and call your loaded model with it
    # Say you trained with this:
    pred = model(g, features)
    loss = compute_loss(pred, ...)

    # Later on you can just run this:
    new_pred = model(new_g, new_features)
There are two caveats though:
- If your model’s parameter list depends on the graph itself (e.g. you specified some trainable parameters for every node), then you cannot do this. In fact, you may want to think about how to define the parameters of new, unseen nodes on the new graph.
- If the graph is too big to fit on a single GPU and you were training it with minibatch training and neighbor sampling, you may want to check this tutorial to see the difference between minibatch training and inference. The graph during inference doesn’t have to be the same as that during training, and you can just replace the graph argument with your new graph when calling the inference function.
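The first caveat can be illustrated without DGL at all. The sketch below is a simplified mean-aggregation layer in plain Python (an assumption for illustration, not DGL's GraphConv): its weight matrix is shaped by the feature sizes only, so the same weights apply to graphs with different numbers of nodes.

```python
def mean_aggregate_layer(neighbors, features, weight):
    """One simplified graph-convolution step (not DGL's GraphConv).

    neighbors: dict mapping node ID -> list of neighbor IDs
    features:  list of per-node feature vectors, each of length in_dim
    weight:    in_dim x out_dim matrix shared by all nodes
    """
    in_dim, out_dim = len(weight), len(weight[0])
    output = []
    for node in range(len(features)):
        # Average the node's own features with those of its neighbors.
        group = [node] + neighbors.get(node, [])
        mean = [sum(features[n][k] for n in group) / len(group)
                for k in range(in_dim)]
        # Apply the shared linear transform.
        output.append([sum(mean[k] * weight[k][j] for k in range(in_dim))
                       for j in range(out_dim)])
    return output


# "Trained" weight: its shape depends only on feature sizes (2 -> 1).
weight = [[1.0], [-1.0]]

# Graph seen during training: 3 nodes in a path 0 - 1 - 2.
train_neighbors = {0: [1], 1: [0, 2], 2: [1]}
train_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_out = mean_aggregate_layer(train_neighbors, train_features, weight)

# New graph at inference time: 2 nodes, same feature size, same weight.
new_neighbors = {0: [1], 1: [0]}
new_features = [[2.0, 0.0], [0.0, 0.0]]
new_out = mean_aggregate_layer(new_neighbors, new_features, weight)
```

By contrast, a model that stores one trainable vector per node (for example an embedding table indexed by node ID) has parameters tied to the training graph's node count, which is why it cannot be applied directly to a new graph.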
I am still using the GCN example, which I think uses forward(self, features) as its forward function. In this case, how can I prepare a graph whose nodes I want to classify, so that I can do training and inference separately?
Thanks, and sorry for asking some basic questions
For the GCN example, since the model doesn’t actually depend on the graph to instantiate, you can simply move the graph from the constructor to forward(). That is, rewrite the model so that the forward function uses g instead of self.g, and pass in g directly as an argument:
    import torch.nn as nn

    class GCN(nn.Module):
        def __init__(self, ...):
            # remove g from the arguments and don't store it as a member
            ...

        def forward(self, g, features):
            h = features
            for i, layer in enumerate(self.layers):
                if i != 0:
                    h = self.dropout(h)
                h = layer(g, h)
            return h

    model = GCN(...)
    pred = model(g, features)
In this case, you can then do
    train_pred = model(train_g, train_features)
    test_pred = model(test_g, test_features)
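How train_g and test_g are obtained is up to you. If both are to be carved out of one larger graph, one possible approach is to take node-induced subgraphs and renumber the nodes. The following is a plain-Python sketch under that assumption; induced_subgraph is a hypothetical helper, and the node split is made up for illustration.

```python
def induced_subgraph(src, dst, keep_nodes):
    """Keep only edges whose endpoints are both in keep_nodes.

    Returns (sub_src, sub_dst, old_ids), where node IDs are renumbered
    0..len(keep_nodes)-1 and old_ids[new_id] is the original node ID.
    """
    old_ids = sorted(keep_nodes)
    new_id = {old: new for new, old in enumerate(old_ids)}
    sub_src, sub_dst = [], []
    for u, v in zip(src, dst):
        if u in new_id and v in new_id:
            sub_src.append(new_id[u])
            sub_dst.append(new_id[v])
    return sub_src, sub_dst, old_ids


# Full graph: 5 nodes, edges 0-1, 1-2, 2-3, 3-4 (one direction each).
src = [0, 1, 2, 3]
dst = [1, 2, 3, 4]
features = [[0.0], [0.1], [0.2], [0.3], [0.4]]

# Hypothetical split: nodes 0-2 for training, nodes 3-4 held out.
train_src, train_dst, train_ids = induced_subgraph(src, dst, {0, 1, 2})
test_src, test_dst, test_ids = induced_subgraph(src, dst, {3, 4})

# Feature rows must follow the new node numbering of each subgraph.
train_features = [features[i] for i in train_ids]
test_features = [features[i] for i in test_ids]
```

Note that edges crossing the split (here 2-3) are dropped by this approach, so the test graph does not see its connections to training nodes. If your new graph is simply a separate graph, none of this is needed and it can be passed in as is.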
I also found that the example doesn’t consider edge features for node classification with GCN. As my graph has edge features, how should I incorporate them in class GCN?
Thanks a lot