Problem reproducing the results of DGL's gcn_concat

Thanks for making this awesome graph learning framework.

I’m trying to reproduce the results using Keras + TensorFlow, based on Kipf’s Keras version of GCN.

I tried to construct a model similar to your gcn_concat, with concatenation and 10 stacked GCN layers, as described in the README file.

However, I can never get a result similar to DGL’s. For Cora, the test accuracy gets stuck at roughly 80%, no matter how many GCN layers I use. I’m using the same hyperparameters: lr=0.01, dropout=0.5, 16 hidden features, etc.

Could anyone be so kind as to offer some help and take a brief look at my code, just to confirm that my network architecture is correct?

Here is my implementation:

from keras import activations, constraints, initializers, regularizers
from keras import backend as K
from keras.layers import Concatenate, Dense, Dropout, Input, Layer

class GraphConvolution(Layer):
    """Basic graph convolution layer as in https://arxiv.org/abs/1609.02907"""
    def __init__(self, units, support=1,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 **kwargs):
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super(GraphConvolution, self).__init__(**kwargs)
        self.units = units
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.bias_initializer = initializers.get(bias_initializer)
        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.kernel_constraint = constraints.get(kernel_constraint)
        self.bias_constraint = constraints.get(bias_constraint)
        self.supports_masking = True

        self.support = support
        assert support >= 1

    def compute_output_shape(self, input_shapes):
        features_shape = input_shapes[0]
        output_shape = (features_shape[0], self.units)
        return output_shape  # (batch_size, output_dim)

    def build(self, input_shapes):
        features_shape = input_shapes[0]
        assert len(features_shape) == 2
        input_dim = features_shape[1]

        self.kernel = self.add_weight(shape=(input_dim * self.support,
                                             self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.built = True

    def call(self, inputs, mask=None):
        features = inputs[0]
        basis = inputs[1]

        supports = K.dot(basis, features)
        output = K.dot(supports, self.kernel)
        if self.bias is not None:
            output += self.bias

        return self.activation(output)

N_FILTERS = 16

# A_ will be passed to G, which is the normalized adjacency matrix with self-loop
G = Input(shape=(None, None), batch_shape=(None, None), sparse=True)

# feature input
X_in = Input(shape=(F,))

# Define model architecture
# The model is similar to https://github.com/dmlc/dgl/blob/master/examples/mxnet/gcn/gcn_concat.py
# NOTE: We pass arguments for graph convolutional layers as a list of tensors.
# This is somewhat hacky, more elegant options would require rewriting the Layer base class.
H = GraphConvolution(N_FILTERS, support, activation='relu')([X_in, G])
H = Dropout(0.5)(H)

concatenate_list = [X_in, H]

if args.nlayers > 1:
    for i in range(args.nlayers - 1):
        H = Concatenate()(concatenate_list)
        H = GraphConvolution(N_FILTERS, support, activation='relu')([H, G])
        H = Dropout(0.5)(H)
        concatenate_list.append(H)

H = Concatenate()(concatenate_list)
H = Dropout(0.5)(H)

Y = Dense(n_classes, activation='softmax')(H)

Thanks in advance.

That means you should use DGL instead :slight_smile:.

I haven’t had time to dig into your model, but I would like to share some of my experience with matching the performance:

  • Verify data preprocessing. Make sure that the preprocessing is exactly the same as in DGL: for example, that the features are correctly normalized, the self-loops are added, and the train/val/test split is identical. According to this paper, GCN is actually not very stable when the split changes, so make sure that matches as well. (A rough preprocessing sketch is included at the end of this post.)
  • Verify hyperparameters such as learning rate, weight decay, dropout and so on.
  • Verify the model architecture by looking at the parameter shapes. This is a quick way to spot some easy mistakes.
  • Verify the parameter initializers.
  • Verify the loss curve. If you find that the loss values differ in magnitude, then something must be different.

Hope these tips help.
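In case it helps with the first item, here is a rough scipy/numpy sketch of what I believe the "sym" normalization with self-loops and the feature normalization correspond to (the function names are mine, not DGL's actual API):

import numpy as np
import scipy.sparse as sp

def row_normalize(features):
    # Row-normalize the feature matrix so that each row sums to 1.
    rowsum = np.asarray(features.sum(axis=1)).flatten()
    rowsum[rowsum == 0.0] = 1.0              # guard against all-zero rows
    return sp.diags(1.0 / rowsum).dot(features)

def sym_normalize_adj(adj):
    # D^{-1/2} (A + I) D^{-1/2}: add self-loops, then normalize symmetrically.
    adj = adj + sp.eye(adj.shape[0])
    deg = np.asarray(adj.sum(axis=1)).flatten()
    d_inv_sqrt = sp.diags(np.power(deg, -0.5))
    return d_inv_sqrt.dot(adj).dot(d_inv_sqrt).tocsr()

If your Keras pipeline differs in any of these steps, that alone could explain part of the gap.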

I will check the preprocessing and the other items.
In the source code of gcn_concat, I cannot find the specific name of the initializers; could you tell me which initializer is used?
Also, is the weight decay applied to the bias as well?

Thanks.

I just tried DGL + mxnet-mkl (all the latest nightly builds).
However, I still cannot reproduce the results in the README.
The running script is:

DGLBACKEND=mxnet python3 examples/mxnet/gcn/gcn_batch.py --dataset "cora" --n-epochs 200  --n-layers 10 --normalization 'sym' --self-loop

The only difference is that I’m using a CPU instead of a GPU, and mxnet-mkl instead of mxnet or mxnet-cu90; all the hyperparameters remain unchanged.
However, the reported test accuracy is slightly below 80%, which is far from 92.63%.

I do have a warning saying: “Initializer is not set. Use zero initializer instead. To suppress this warning, use set_initializer to explicitly specify which initializer to use.”

Is there any initializer I need to specify?

I also tried on a GPU with mxnet-cu80; the test accuracy is still below 80% with 10 layers.

It seems that you are using an old version? gcn_batch.py should already have been removed from the examples folder.

Oh, sorry, I attached the wrong script there.
The actual script I used was:

DGLBACKEND=mxnet python3 examples/mxnet/gcn/gcn_concat.py --dataset "cora" --n-epochs 200 --gpu 1 --n-layers 10 --normalization 'sym' --self-loop

which is from the latest master branch.
I tested on both CPU and GPU; both give a test accuracy close to 80%.

I attach part of the log here:

(mxnet-dgl) cx2@vision-gpu-2:~/src/gcn/dgl-gcn/dgl/gcn$ DGLBACKEND=mxnet python gcn_concat.py --dataset "cora" --n-epochs 200 --gpu 2 --n-layers 10 --normalization 'sym' --self-loop
Namespace(dataset='cora', dropout=0.5, gpu=2, lr=0.01, n_epochs=200, n_hidden=16, n_layers=10, normalization='sym', self_loop=True, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2, weight_decay=0.0005)
----Data statistics------
#Edges 13264
#Classes 7
#Train samples 140
#Val samples 300
#Test samples 1000
gcn0_ (
Parameter dense0_weight (shape=(16, 0), dtype=float32)
Parameter dense0_bias (shape=(16,), dtype=float32)
Parameter dense1_weight (shape=(16, 0), dtype=float32)
Parameter dense1_bias (shape=(16,), dtype=float32)
Parameter dense2_weight (shape=(16, 0), dtype=float32)
Parameter dense2_bias (shape=(16,), dtype=float32)
Parameter dense3_weight (shape=(16, 0), dtype=float32)
Parameter dense3_bias (shape=(16,), dtype=float32)
Parameter dense4_weight (shape=(16, 0), dtype=float32)
Parameter dense4_bias (shape=(16,), dtype=float32)
Parameter dense5_weight (shape=(16, 0), dtype=float32)
Parameter dense5_bias (shape=(16,), dtype=float32)
Parameter dense6_weight (shape=(16, 0), dtype=float32)
Parameter dense6_bias (shape=(16,), dtype=float32)
Parameter dense7_weight (shape=(16, 0), dtype=float32)
Parameter dense7_bias (shape=(16,), dtype=float32)
Parameter dense8_weight (shape=(16, 0), dtype=float32)
Parameter dense8_bias (shape=(16,), dtype=float32)
Parameter dense9_weight (shape=(16, 0), dtype=float32)
Parameter dense9_bias (shape=(16,), dtype=float32)
Parameter dense10_weight (shape=(16, 0), dtype=float32)
Parameter dense10_bias (shape=(16,), dtype=float32)
Parameter dense11_weight (shape=(7, 0), dtype=float32)
Parameter dense11_bias (shape=(7,), dtype=float32)
)
/home/nfs/cx2/virtualenv/mxnet-dgl/lib/python3.6/site-packages/dgl/frame.py:204: UserWarning: Initializer is not set. Use zero initializer instead. To suppress this warning, use set_initializer to explicitly specify which initializer to use.
dgl_warning('Initializer is not set. Use zero initializer instead.'
[19:54:33] src/operator/contrib/…/tensor/./…/…/common/utils.h:450:
Storage type fallback detected:
operator = add_n
input storage types = [row_sparse, default, ]
output storage types = [default, ]
params = {"num_args" : 2, }
context.dev_mask = gpu
The operator with default storage type will be dispatched for execution. You're seeing this warning message because the operator above is unable to process the given ndarrays with specified storage types, context and parameter. Temporary dense ndarrays are generated in order to execute the operator. This does not affect the correctness of the programme. You can set environment variable MXNET_STORAGE_FALLBACK_LOG_VERBOSE to 0 to suppress this warning.
Epoch 00003 | Time(s) 0.0694 | Loss 1.7922 | Accuracy 0.3500 | ETputs(KTEPS) 191.04
Epoch 00004 | Time(s) 0.0760 | Loss 1.7336 | Accuracy 0.3500 | ETputs(KTEPS) 174.53

Epoch 00178 | Time(s) 0.0682 | Loss 0.0606 | Accuracy 0.7767 | ETputs(KTEPS) 194.44
Epoch 00179 | Time(s) 0.0681 | Loss 0.0598 | Accuracy 0.7767 | ETputs(KTEPS) 194.69
Epoch 00180 | Time(s) 0.0680 | Loss 0.1376 | Accuracy 0.7900 | ETputs(KTEPS) 194.94
Epoch 00181 | Time(s) 0.0680 | Loss 0.1192 | Accuracy 0.7967 | ETputs(KTEPS) 195.17
Epoch 00182 | Time(s) 0.0682 | Loss 0.0688 | Accuracy 0.7567 | ETputs(KTEPS) 194.36
Epoch 00183 | Time(s) 0.0682 | Loss 0.0565 | Accuracy 0.7500 | ETputs(KTEPS) 194.38
Epoch 00184 | Time(s) 0.0682 | Loss 0.1576 | Accuracy 0.7800 | ETputs(KTEPS) 194.51
Epoch 00185 | Time(s) 0.0682 | Loss 0.0568 | Accuracy 0.8133 | ETputs(KTEPS) 194.49
Epoch 00186 | Time(s) 0.0682 | Loss 0.0739 | Accuracy 0.8267 | ETputs(KTEPS) 194.60
Epoch 00187 | Time(s) 0.0685 | Loss 0.0638 | Accuracy 0.8067 | ETputs(KTEPS) 193.75
Epoch 00188 | Time(s) 0.0685 | Loss 0.0923 | Accuracy 0.7933 | ETputs(KTEPS) 193.69
Epoch 00189 | Time(s) 0.0687 | Loss 0.0597 | Accuracy 0.7700 | ETputs(KTEPS) 193.08
Epoch 00190 | Time(s) 0.0686 | Loss 0.0911 | Accuracy 0.7733 | ETputs(KTEPS) 193.31
Epoch 00191 | Time(s) 0.0685 | Loss 0.0558 | Accuracy 0.7800 | ETputs(KTEPS) 193.52
Epoch 00192 | Time(s) 0.0686 | Loss 0.0451 | Accuracy 0.7667 | ETputs(KTEPS) 193.28
Epoch 00193 | Time(s) 0.0687 | Loss 0.0520 | Accuracy 0.7600 | ETputs(KTEPS) 193.09
Epoch 00194 | Time(s) 0.0687 | Loss 0.0356 | Accuracy 0.7633 | ETputs(KTEPS) 193.12
Epoch 00195 | Time(s) 0.0688 | Loss 0.0676 | Accuracy 0.7767 | ETputs(KTEPS) 192.93
Epoch 00196 | Time(s) 0.0687 | Loss 0.0648 | Accuracy 0.8000 | ETputs(KTEPS) 193.02
Epoch 00197 | Time(s) 0.0687 | Loss 0.0203 | Accuracy 0.8133 | ETputs(KTEPS) 193.14
Epoch 00198 | Time(s) 0.0686 | Loss 0.0224 | Accuracy 0.8067 | ETputs(KTEPS) 193.30
Epoch 00199 | Time(s) 0.0686 | Loss 0.0554 | Accuracy 0.8033 | ETputs(KTEPS) 193.23
Test accuracy 80.10%

Could you please post the log of a gcn_concat.py run that yields test accuracy better than 90%, so that I can check the difference myself?
Thanks in advance.

Hi,

Here I am attaching the log of that particular training job at the end. While I hope this is helpful, I also want to clarify a few details:

(1) The initial goal of this implementation was to benchmark against non-graph baselines, rather than to obtain better results. Since the DGL datasets are rather small, the case-by-case variance may be large.

(2) As pointed out by Ziyue in offline chats, this particular implementation considered both the training and the test accuracies when reporting the final result. This is a common practice in SSL (semi-supervised learning), but it does not seem to have been adopted by the original authors. Apologies for any confusion. (A small sketch contrasting the two reporting conventions follows right after these notes.)

(3) Adding depth may or may not improve accuracy. While adding depth is a clear way to mimic the power iterations of matrix factorization, training for more epochs to reach stationary points could equivalently solve the matrix factorization. Conclusions should not be drawn from these experiments alone.
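To make point (2) concrete, here is a tiny numpy sketch of the two reporting conventions; the array names (pred, labels, test_mask) are hypothetical, not the actual variables in the example scripts:

import numpy as np

def test_accuracy(pred, labels, test_mask):
    # Roughly the "Test accuracy" convention: score only the held-out test nodes.
    return float((pred[test_mask] == labels[test_mask]).mean())

def final_accuracy(pred, labels):
    # Roughly the "Final accuracy" convention discussed above: score every
    # labeled node (train + val + test), which looks much higher because the
    # training nodes are fit almost perfectly.
    return float((pred == labels).mean())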

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=10, normalization='sym', seed=None, self_loop=True, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2, wd=0.0005)
Finished data loading and preprocessing.
NumNodes: 2708
NumEdges: 10556
NumFeats: 1433
NumClasses: 7
NumTrainingSamples: 140
NumValidationSamples: 500
NumTestSamples: 1000
/home/ec2-user/yifeim-work/dgl/python/dgl/frame.py:204: UserWarning: Initializer is not set. Use zero initializer instead. To suppress this warning, use set_initializer to explicitly specify which initializer to use.
dgl_warning('Initializer is not set. Use zero initializer instead.'
Epoch 00003 | Loss 0.0992 | Time(s) 0.4409 | ETputs(KTEPS) 30.08
Epoch 00004 | Loss 0.0990 | Time(s) 0.4741 | ETputs(KTEPS) 27.98
Epoch 00005 | Loss 0.0987 | Time(s) 0.4949 | ETputs(KTEPS) 26.80
Epoch 00006 | Loss 0.0977 | Time(s) 0.4884 | ETputs(KTEPS) 27.16
Epoch 00007 | Loss 0.0965 | Time(s) 0.4901 | ETputs(KTEPS) 27.06
Epoch 00008 | Loss 0.0968 | Time(s) 0.4960 | ETputs(KTEPS) 26.74
Epoch 00009 | Loss 0.0956 | Time(s) 0.4964 | ETputs(KTEPS) 26.72
Epoch 00010 | Loss 0.0955 | Time(s) 0.4964 | ETputs(KTEPS) 26.72
Epoch 00011 | Loss 0.0945 | Time(s) 0.4991 | ETputs(KTEPS) 26.58
Epoch 00012 | Loss 0.0927 | Time(s) 0.5020 | ETputs(KTEPS) 26.42
Epoch 00013 | Loss 0.0918 | Time(s) 0.5041 | ETputs(KTEPS) 26.31
Epoch 00014 | Loss 0.0903 | Time(s) 0.4997 | ETputs(KTEPS) 26.54
Epoch 00015 | Loss 0.0861 | Time(s) 0.4996 | ETputs(KTEPS) 26.55
Epoch 00016 | Loss 0.0861 | Time(s) 0.4964 | ETputs(KTEPS) 26.72
Epoch 00017 | Loss 0.0876 | Time(s) 0.4966 | ETputs(KTEPS) 26.71
Epoch 00018 | Loss 0.0801 | Time(s) 0.4914 | ETputs(KTEPS) 26.99
Epoch 00019 | Loss 0.0736 | Time(s) 0.4881 | ETputs(KTEPS) 27.17
Epoch 00020 | Loss 0.0746 | Time(s) 0.4833 | ETputs(KTEPS) 27.45
Epoch 00021 | Loss 0.0705 | Time(s) 0.4787 | ETputs(KTEPS) 27.71
Epoch 00022 | Loss 0.0680 | Time(s) 0.4806 | ETputs(KTEPS) 27.60
Epoch 00023 | Loss 0.0696 | Time(s) 0.4816 | ETputs(KTEPS) 27.54
Epoch 00024 | Loss 0.0611 | Time(s) 0.4839 | ETputs(KTEPS) 27.41
Epoch 00025 | Loss 0.0614 | Time(s) 0.4852 | ETputs(KTEPS) 27.34
Epoch 00026 | Loss 0.0588 | Time(s) 0.4836 | ETputs(KTEPS) 27.43
Epoch 00027 | Loss 0.0487 | Time(s) 0.4846 | ETputs(KTEPS) 27.37
Epoch 00028 | Loss 0.0527 | Time(s) 0.4865 | ETputs(KTEPS) 27.27
Epoch 00029 | Loss 0.0456 | Time(s) 0.4864 | ETputs(KTEPS) 27.27
Epoch 00030 | Loss 0.0515 | Time(s) 0.4873 | ETputs(KTEPS) 27.22
Epoch 00031 | Loss 0.0509 | Time(s) 0.4869 | ETputs(KTEPS) 27.24
Epoch 00032 | Loss 0.0420 | Time(s) 0.4866 | ETputs(KTEPS) 27.26
Epoch 00033 | Loss 0.0458 | Time(s) 0.4887 | ETputs(KTEPS) 27.14
Epoch 00034 | Loss 0.0592 | Time(s) 0.4869 | ETputs(KTEPS) 27.24
Epoch 00035 | Loss 0.0585 | Time(s) 0.4877 | ETputs(KTEPS) 27.20
Epoch 00036 | Loss 0.0606 | Time(s) 0.4862 | ETputs(KTEPS) 27.28
Epoch 00037 | Loss 0.0520 | Time(s) 0.4840 | ETputs(KTEPS) 27.40
Epoch 00038 | Loss 0.0553 | Time(s) 0.4836 | ETputs(KTEPS) 27.43
Epoch 00039 | Loss 0.0370 | Time(s) 0.4823 | ETputs(KTEPS) 27.50
Epoch 00040 | Loss 0.0484 | Time(s) 0.4809 | ETputs(KTEPS) 27.58
Epoch 00041 | Loss 0.0473 | Time(s) 0.4837 | ETputs(KTEPS) 27.42
Epoch 00042 | Loss 0.0526 | Time(s) 0.4839 | ETputs(KTEPS) 27.41
Epoch 00043 | Loss 0.0453 | Time(s) 0.4840 | ETputs(KTEPS) 27.41
Epoch 00044 | Loss 0.0343 | Time(s) 0.4845 | ETputs(KTEPS) 27.38
Epoch 00045 | Loss 0.0483 | Time(s) 0.4837 | ETputs(KTEPS) 27.42
Epoch 00046 | Loss 0.0332 | Time(s) 0.4825 | ETputs(KTEPS) 27.49
Epoch 00047 | Loss 0.0268 | Time(s) 0.4828 | ETputs(KTEPS) 27.47
Epoch 00048 | Loss 0.0194 | Time(s) 0.4811 | ETputs(KTEPS) 27.57
Epoch 00049 | Loss 0.0358 | Time(s) 0.4829 | ETputs(KTEPS) 27.47
Epoch 00050 | Loss 0.0254 | Time(s) 0.4812 | ETputs(KTEPS) 27.56
Epoch 00051 | Loss 0.0246 | Time(s) 0.4807 | ETputs(KTEPS) 27.59
Epoch 00052 | Loss 0.0180 | Time(s) 0.4808 | ETputs(KTEPS) 27.59
Epoch 00053 | Loss 0.0173 | Time(s) 0.4797 | ETputs(KTEPS) 27.65
Epoch 00054 | Loss 0.0151 | Time(s) 0.4795 | ETputs(KTEPS) 27.66
Epoch 00055 | Loss 0.0130 | Time(s) 0.4788 | ETputs(KTEPS) 27.70
Epoch 00056 | Loss 0.0093 | Time(s) 0.4786 | ETputs(KTEPS) 27.71
Epoch 00057 | Loss 0.0132 | Time(s) 0.4778 | ETputs(KTEPS) 27.76
Epoch 00058 | Loss 0.0234 | Time(s) 0.4779 | ETputs(KTEPS) 27.75
Epoch 00059 | Loss 0.0079 | Time(s) 0.4790 | ETputs(KTEPS) 27.69
Epoch 00060 | Loss 0.0101 | Time(s) 0.4782 | ETputs(KTEPS) 27.73
Epoch 00061 | Loss 0.0103 | Time(s) 0.4779 | ETputs(KTEPS) 27.76
Epoch 00062 | Loss 0.0245 | Time(s) 0.4777 | ETputs(KTEPS) 27.76
Epoch 00063 | Loss 0.0208 | Time(s) 0.4768 | ETputs(KTEPS) 27.82
Epoch 00064 | Loss 0.0113 | Time(s) 0.4765 | ETputs(KTEPS) 27.83
Epoch 00065 | Loss 0.0142 | Time(s) 0.4768 | ETputs(KTEPS) 27.82
Epoch 00066 | Loss 0.0066 | Time(s) 0.4768 | ETputs(KTEPS) 27.82
Epoch 00067 | Loss 0.0119 | Time(s) 0.4760 | ETputs(KTEPS) 27.87
Epoch 00068 | Loss 0.0139 | Time(s) 0.4757 | ETputs(KTEPS) 27.88
Epoch 00069 | Loss 0.0151 | Time(s) 0.4760 | ETputs(KTEPS) 27.87
Epoch 00070 | Loss 0.0160 | Time(s) 0.4765 | ETputs(KTEPS) 27.83
Epoch 00071 | Loss 0.0117 | Time(s) 0.4766 | ETputs(KTEPS) 27.83
Epoch 00072 | Loss 0.0061 | Time(s) 0.4770 | ETputs(KTEPS) 27.81
Epoch 00073 | Loss 0.0255 | Time(s) 0.4773 | ETputs(KTEPS) 27.79
Epoch 00074 | Loss 0.0198 | Time(s) 0.4774 | ETputs(KTEPS) 27.78
Epoch 00075 | Loss 0.0112 | Time(s) 0.4770 | ETputs(KTEPS) 27.81
Epoch 00076 | Loss 0.0053 | Time(s) 0.4775 | ETputs(KTEPS) 27.78
Epoch 00077 | Loss 0.0024 | Time(s) 0.4782 | ETputs(KTEPS) 27.74
Epoch 00078 | Loss 0.0221 | Time(s) 0.4788 | ETputs(KTEPS) 27.70
Epoch 00079 | Loss 0.0045 | Time(s) 0.4791 | ETputs(KTEPS) 27.68
Epoch 00080 | Loss 0.0013 | Time(s) 0.4789 | ETputs(KTEPS) 27.70
Epoch 00081 | Loss 0.0033 | Time(s) 0.4800 | ETputs(KTEPS) 27.63
Epoch 00082 | Loss 0.0013 | Time(s) 0.4804 | ETputs(KTEPS) 27.61
Epoch 00083 | Loss 0.0009 | Time(s) 0.4813 | ETputs(KTEPS) 27.56
Epoch 00084 | Loss 0.0005 | Time(s) 0.4821 | ETputs(KTEPS) 27.51
Epoch 00085 | Loss 0.0016 | Time(s) 0.4829 | ETputs(KTEPS) 27.47
Epoch 00086 | Loss 0.0020 | Time(s) 0.4829 | ETputs(KTEPS) 27.47
Epoch 00087 | Loss 0.0019 | Time(s) 0.4838 | ETputs(KTEPS) 27.42
Epoch 00088 | Loss 0.0001 | Time(s) 0.4843 | ETputs(KTEPS) 27.39
Epoch 00089 | Loss 0.0051 | Time(s) 0.4869 | ETputs(KTEPS) 27.24
Epoch 00090 | Loss 0.0103 | Time(s) 0.4873 | ETputs(KTEPS) 27.22
Epoch 00091 | Loss 0.0007 | Time(s) 0.4873 | ETputs(KTEPS) 27.22
Epoch 00092 | Loss 0.0028 | Time(s) 0.4875 | ETputs(KTEPS) 27.21
Epoch 00093 | Loss 0.0001 | Time(s) 0.4876 | ETputs(KTEPS) 27.20
Epoch 00094 | Loss 0.0020 | Time(s) 0.4876 | ETputs(KTEPS) 27.20
Epoch 00095 | Loss 0.0006 | Time(s) 0.4872 | ETputs(KTEPS) 27.22
Epoch 00096 | Loss 0.0003 | Time(s) 0.4876 | ETputs(KTEPS) 27.20
Epoch 00097 | Loss 0.0001 | Time(s) 0.4880 | ETputs(KTEPS) 27.18
Epoch 00098 | Loss 0.0002 | Time(s) 0.4874 | ETputs(KTEPS) 27.21
Epoch 00099 | Loss 0.0006 | Time(s) 0.4872 | ETputs(KTEPS) 27.23
Epoch 00100 | Loss 0.0033 | Time(s) 0.4868 | ETputs(KTEPS) 27.24
Epoch 00101 | Loss 0.0013 | Time(s) 0.4873 | ETputs(KTEPS) 27.22
Epoch 00102 | Loss 0.0005 | Time(s) 0.4873 | ETputs(KTEPS) 27.22
Epoch 00103 | Loss 0.0011 | Time(s) 0.4874 | ETputs(KTEPS) 27.21
Epoch 00104 | Loss 0.0002 | Time(s) 0.4869 | ETputs(KTEPS) 27.24
Epoch 00105 | Loss 0.0001 | Time(s) 0.4866 | ETputs(KTEPS) 27.26
Epoch 00106 | Loss 0.0002 | Time(s) 0.4867 | ETputs(KTEPS) 27.25
Epoch 00107 | Loss 0.0009 | Time(s) 0.4869 | ETputs(KTEPS) 27.24
Epoch 00108 | Loss 0.0004 | Time(s) 0.4869 | ETputs(KTEPS) 27.24
Epoch 00109 | Loss 0.0000 | Time(s) 0.4876 | ETputs(KTEPS) 27.20
Epoch 00110 | Loss 0.0000 | Time(s) 0.4878 | ETputs(KTEPS) 27.19
Epoch 00111 | Loss 0.0023 | Time(s) 0.4872 | ETputs(KTEPS) 27.23
Epoch 00112 | Loss 0.0001 | Time(s) 0.4873 | ETputs(KTEPS) 27.22
Epoch 00113 | Loss 0.0001 | Time(s) 0.4870 | ETputs(KTEPS) 27.24
Epoch 00114 | Loss 0.0000 | Time(s) 0.4866 | ETputs(KTEPS) 27.26
Epoch 00115 | Loss 0.0003 | Time(s) 0.4860 | ETputs(KTEPS) 27.29
Epoch 00116 | Loss 0.0006 | Time(s) 0.4856 | ETputs(KTEPS) 27.32
Epoch 00117 | Loss 0.0001 | Time(s) 0.4853 | ETputs(KTEPS) 27.33
Epoch 00118 | Loss 0.0003 | Time(s) 0.4855 | ETputs(KTEPS) 27.32
Epoch 00119 | Loss 0.0001 | Time(s) 0.4858 | ETputs(KTEPS) 27.30
Epoch 00120 | Loss 0.0007 | Time(s) 0.4867 | ETputs(KTEPS) 27.25
Epoch 00121 | Loss 0.0002 | Time(s) 0.4864 | ETputs(KTEPS) 27.27
Epoch 00122 | Loss 0.0002 | Time(s) 0.4871 | ETputs(KTEPS) 27.23
Epoch 00123 | Loss 0.0014 | Time(s) 0.4872 | ETputs(KTEPS) 27.22
Epoch 00124 | Loss 0.0024 | Time(s) 0.4870 | ETputs(KTEPS) 27.24
Epoch 00125 | Loss 0.0002 | Time(s) 0.4865 | ETputs(KTEPS) 27.26
Epoch 00126 | Loss 0.0008 | Time(s) 0.4867 | ETputs(KTEPS) 27.25
Epoch 00127 | Loss 0.0003 | Time(s) 0.4868 | ETputs(KTEPS) 27.25
Epoch 00128 | Loss 0.0002 | Time(s) 0.4875 | ETputs(KTEPS) 27.21
Epoch 00129 | Loss 0.0036 | Time(s) 0.4875 | ETputs(KTEPS) 27.21
Epoch 00130 | Loss 0.0007 | Time(s) 0.4875 | ETputs(KTEPS) 27.21
Epoch 00131 | Loss 0.0081 | Time(s) 0.4874 | ETputs(KTEPS) 27.22
Epoch 00132 | Loss 0.0010 | Time(s) 0.4869 | ETputs(KTEPS) 27.24
Epoch 00133 | Loss 0.0007 | Time(s) 0.4866 | ETputs(KTEPS) 27.26
Epoch 00134 | Loss 0.0003 | Time(s) 0.4864 | ETputs(KTEPS) 27.27
Epoch 00135 | Loss 0.0002 | Time(s) 0.4859 | ETputs(KTEPS) 27.30
Epoch 00136 | Loss 0.0016 | Time(s) 0.4858 | ETputs(KTEPS) 27.31
Epoch 00137 | Loss 0.0020 | Time(s) 0.4851 | ETputs(KTEPS) 27.34
Epoch 00138 | Loss 0.0004 | Time(s) 0.4853 | ETputs(KTEPS) 27.33
Epoch 00139 | Loss 0.0011 | Time(s) 0.4850 | ETputs(KTEPS) 27.35
Epoch 00140 | Loss 0.0005 | Time(s) 0.4851 | ETputs(KTEPS) 27.34
Epoch 00141 | Loss 0.0003 | Time(s) 0.4846 | ETputs(KTEPS) 27.37
Epoch 00142 | Loss 0.0004 | Time(s) 0.4841 | ETputs(KTEPS) 27.40
Epoch 00143 | Loss 0.0000 | Time(s) 0.4840 | ETputs(KTEPS) 27.41
Epoch 00144 | Loss 0.0000 | Time(s) 0.4839 | ETputs(KTEPS) 27.41
Epoch 00145 | Loss 0.0002 | Time(s) 0.4842 | ETputs(KTEPS) 27.39
Epoch 00146 | Loss 0.0001 | Time(s) 0.4836 | ETputs(KTEPS) 27.42
Epoch 00147 | Loss 0.0001 | Time(s) 0.4832 | ETputs(KTEPS) 27.45
Epoch 00148 | Loss 0.0001 | Time(s) 0.4836 | ETputs(KTEPS) 27.43
Epoch 00149 | Loss 0.0001 | Time(s) 0.4837 | ETputs(KTEPS) 27.42
Epoch 00150 | Loss 0.0011 | Time(s) 0.4837 | ETputs(KTEPS) 27.42
Epoch 00151 | Loss 0.0004 | Time(s) 0.4838 | ETputs(KTEPS) 27.42
Epoch 00152 | Loss 0.0000 | Time(s) 0.4839 | ETputs(KTEPS) 27.41
Epoch 00153 | Loss 0.0053 | Time(s) 0.4839 | ETputs(KTEPS) 27.41
Epoch 00154 | Loss 0.0014 | Time(s) 0.4839 | ETputs(KTEPS) 27.41
Epoch 00155 | Loss 0.0000 | Time(s) 0.4836 | ETputs(KTEPS) 27.43
Epoch 00156 | Loss 0.0001 | Time(s) 0.4838 | ETputs(KTEPS) 27.42
Epoch 00157 | Loss 0.0003 | Time(s) 0.4837 | ETputs(KTEPS) 27.42
Epoch 00158 | Loss 0.0006 | Time(s) 0.4835 | ETputs(KTEPS) 27.44
Epoch 00159 | Loss 0.0001 | Time(s) 0.4835 | ETputs(KTEPS) 27.43
Epoch 00160 | Loss 0.0002 | Time(s) 0.4834 | ETputs(KTEPS) 27.44
Epoch 00161 | Loss 0.0001 | Time(s) 0.4832 | ETputs(KTEPS) 27.45
Epoch 00162 | Loss 0.0001 | Time(s) 0.4829 | ETputs(KTEPS) 27.47
Epoch 00163 | Loss 0.0001 | Time(s) 0.4826 | ETputs(KTEPS) 27.48
Epoch 00164 | Loss 0.0000 | Time(s) 0.4828 | ETputs(KTEPS) 27.47
Epoch 00165 | Loss 0.0012 | Time(s) 0.4823 | ETputs(KTEPS) 27.50
Epoch 00166 | Loss 0.0001 | Time(s) 0.4825 | ETputs(KTEPS) 27.49
Epoch 00167 | Loss 0.0000 | Time(s) 0.4826 | ETputs(KTEPS) 27.49
Epoch 00168 | Loss 0.0036 | Time(s) 0.4828 | ETputs(KTEPS) 27.47
Epoch 00169 | Loss 0.0019 | Time(s) 0.4831 | ETputs(KTEPS) 27.46
Epoch 00170 | Loss 0.0019 | Time(s) 0.4829 | ETputs(KTEPS) 27.47
Epoch 00171 | Loss 0.0018 | Time(s) 0.4831 | ETputs(KTEPS) 27.46
Epoch 00172 | Loss 0.0027 | Time(s) 0.4834 | ETputs(KTEPS) 27.44
Epoch 00173 | Loss 0.0002 | Time(s) 0.4835 | ETputs(KTEPS) 27.43
Epoch 00174 | Loss 0.0014 | Time(s) 0.4835 | ETputs(KTEPS) 27.43
Epoch 00175 | Loss 0.0006 | Time(s) 0.4833 | ETputs(KTEPS) 27.44
Epoch 00176 | Loss 0.0025 | Time(s) 0.4833 | ETputs(KTEPS) 27.44
Epoch 00177 | Loss 0.0009 | Time(s) 0.4834 | ETputs(KTEPS) 27.44
Epoch 00178 | Loss 0.0001 | Time(s) 0.4837 | ETputs(KTEPS) 27.42
Epoch 00179 | Loss 0.0007 | Time(s) 0.4836 | ETputs(KTEPS) 27.43
Epoch 00180 | Loss 0.0071 | Time(s) 0.4835 | ETputs(KTEPS) 27.43
Epoch 00181 | Loss 0.0003 | Time(s) 0.4835 | ETputs(KTEPS) 27.43
Epoch 00182 | Loss 0.0012 | Time(s) 0.4835 | ETputs(KTEPS) 27.43
Epoch 00183 | Loss 0.0006 | Time(s) 0.4833 | ETputs(KTEPS) 27.45
Epoch 00184 | Loss 0.0005 | Time(s) 0.4831 | ETputs(KTEPS) 27.45
Epoch 00185 | Loss 0.0001 | Time(s) 0.4831 | ETputs(KTEPS) 27.46
Epoch 00186 | Loss 0.0013 | Time(s) 0.4832 | ETputs(KTEPS) 27.45
Epoch 00187 | Loss 0.0015 | Time(s) 0.4832 | ETputs(KTEPS) 27.45
Epoch 00188 | Loss 0.0008 | Time(s) 0.4834 | ETputs(KTEPS) 27.44
Epoch 00189 | Loss 0.0011 | Time(s) 0.4831 | ETputs(KTEPS) 27.46
Epoch 00190 | Loss 0.0013 | Time(s) 0.4832 | ETputs(KTEPS) 27.45
Epoch 00191 | Loss 0.0020 | Time(s) 0.4835 | ETputs(KTEPS) 27.43
Epoch 00192 | Loss 0.0000 | Time(s) 0.4836 | ETputs(KTEPS) 27.43
Epoch 00193 | Loss 0.0000 | Time(s) 0.4833 | ETputs(KTEPS) 27.45
Epoch 00194 | Loss 0.0009 | Time(s) 0.4829 | ETputs(KTEPS) 27.47
Epoch 00195 | Loss 0.0002 | Time(s) 0.4825 | ETputs(KTEPS) 27.49
Epoch 00196 | Loss 0.0000 | Time(s) 0.4822 | ETputs(KTEPS) 27.51
Epoch 00197 | Loss 0.0005 | Time(s) 0.4820 | ETputs(KTEPS) 27.52
Epoch 00198 | Loss 0.0004 | Time(s) 0.4818 | ETputs(KTEPS) 27.53
Epoch 00199 | Loss 0.0011 | Time(s) 0.4822 | ETputs(KTEPS) 27.51
Final accuracy 92.63%

Btw, a coarse search shows that the learning rate may be different (0.01 vs 0.001).

Thanks a lot! That clears up a lot of my confusion.

Just to confirm, please correct me if I’m wrong:

  1. Your log, and the results reported in the README, actually come from the removed "gcn_batch", not the latest "gcn_concat". (I can tell because "gcn_concat" reports the validation accuracy in each epoch and reports "Test accuracy" at the end, instead of the "Final accuracy" used in "gcn_batch".)
  2. In the latest version of "gcn_concat", "Test accuracy" is reported on the test set, while in the removed "gcn_batch", "Final accuracy" is reported on the entire dataset, including the train, validation, and test sets.

If the above is true, could you update the README to match the latest version of "gcn_concat"? That would avoid a lot of confusion, since "gcn_batch" has been removed.

Thanks.

P.S.: I also tried the removed "gcn_batch" myself; the "Final accuracy" is always nearly 99%, no matter whether 2 or 10 layers are used. Is that normal?

Thank you for reporting the problem in the README. We’ll update it shortly.

“the “Final accuracy” is always nearly 99%”

– This appears to be the case. There was a silent mxnet bug on the line loss = loss_fcn(pred, labels, mask), which should be loss = loss_fcn(pred, labels, mask.reshape((-1,1))) (a small shape sketch is included at the end of this post). With the bug fixed, the accuracies become:

No-graph:

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.01, n_epochs=200, n_hidden=16, n_layers=0, normalization=None, self_loop=False, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 44.51%

Two-layers:

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=2, normalization=None, self_loop=False, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 73.21%

Or

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=2, normalization='sym', self_loop=True, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 53.65%

With 10 layers:

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=10, normalization=None, self_loop=False, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 75.76%

Or

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=10, normalization='sym', self_loop=True, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 75.12%

Btw, the implementation by Ziyue already fixed the bug.
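For anyone curious why the unreshaped mask fails silently: as far as I can tell, gluon keeps the per-sample loss as a column vector of shape (n, 1), so multiplying it by a 1-D mask of shape (n,) broadcasts to an (n, n) matrix instead of masking node by node. A minimal numpy illustration with toy shapes (not the actual training code):

import numpy as np

n = 5
per_node_loss = np.random.rand(n, 1)          # column vector, shape (n, 1)
mask = np.array([1., 1., 0., 0., 0.])         # 1-D mask, shape (n,)

bad = per_node_loss * mask                    # broadcasts to shape (n, n)
good = per_node_loss * mask.reshape((-1, 1))  # shape (n, 1), the intended masking

print(bad.shape, good.shape)                  # (5, 5) (5, 1)

Averaging the (n, n) result still produces a finite loss, which is presumably why the bug stayed silent.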

Thanks!
I guess I can close this question now.