Problem reproducing the results of DGL gcn_concat

xcgoner · January 31, 2019, 7:48pm

Thanks for making this awesome graph learning framework.

I’m trying to reproduce the results by using Keras+Tensorflow, based on Kipf’s Keras version of GCN.

I tried to construct a model similar to your gcn_concat, with the concatenation and 10 stacked GCN layers, as described in the README file.

However, I can never get results similar to DGL’s. For Cora, the test accuracy gets stuck at roughly 80%, no matter how many GCN layers I use. I’m using the same hyperparameters: lr=0.01, dropout=0.5, 16 hidden features, etc.

Could anyone be so kind as to take a brief look at my code, just to confirm that my network architecture is correct?

Here is my implementation:

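# Imports this snippet appears to rely on (standalone Keras 2.x assumed)
from keras import activations, constraints, initializers, regularizers
from keras import backend as K
from keras.layers import Layer, Input, Dropout, Dense, Concatenate

class GraphConvolution(Layer):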
    """Basic graph convolution layer as in https://arxiv.org/abs/1609.02907"""
    def __init__(self, units, support=1,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 **kwargs):
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super(GraphConvolution, self).__init__(**kwargs)
        self.units = units
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.bias_initializer = initializers.get(bias_initializer)
        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.kernel_constraint = constraints.get(kernel_constraint)
        self.bias_constraint = constraints.get(bias_constraint)
        self.supports_masking = True

        self.support = support
        assert support >= 1

    def compute_output_shape(self, input_shapes):
        features_shape = input_shapes[0]
        output_shape = (features_shape[0], self.units)
        return output_shape  # (batch_size, output_dim)

    def build(self, input_shapes):
        features_shape = input_shapes[0]
        assert len(features_shape) == 2
        input_dim = features_shape[1]

        self.kernel = self.add_weight(shape=(input_dim * self.support,
                                             self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None
        self.built = True

    def call(self, inputs, mask=None):
        features = inputs[0]
        basis = inputs[1]

        supports = K.dot(basis, features)
        output = K.dot(supports, self.kernel)
        if self.use_bias:  # test the flag, not the truth value of a tensor
            output += self.bias

        return self.activation(output)

N_FILTERS = 16

# A_, the normalized adjacency matrix with self-loops, will be passed to G
G = Input(shape=(None, None), batch_shape=(None, None), sparse=True)

# feature input
X_in = Input(shape=(F,))

# Define model architecture
# The model is similar to https://github.com/dmlc/dgl/blob/master/examples/mxnet/gcn/gcn_concat.py
# NOTE: We pass arguments for graph convolutional layers as a list of tensors.
# This is somewhat hacky, more elegant options would require rewriting the Layer base class.
H = GraphConvolution(N_FILTERS, support, activation='relu')([X_in, G])
H = Dropout(0.5)(H)

concatenate_list = [X_in, H]

if args.nlayers > 1:
    for i in range(args.nlayers - 1):
        H = Concatenate()(concatenate_list)
        H = GraphConvolution(N_FILTERS, support, activation='relu')([H, G])
        H = Dropout(0.5)(H)
        concatenate_list.append(H)

H = Concatenate()(concatenate_list)
H = Dropout(0.5)(H)

Y = Dense(n_classes, activation='softmax')(H)
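For anyone comparing preprocessing: the comment above says G receives the normalized adjacency matrix with self-loops. Here is a minimal sketch of that step, assuming the symmetric normalization D^{-1/2}(A + I)D^{-1/2} from the GCN paper linked in the layer's docstring and a small dense NumPy matrix. The function name `normalize_adjacency` is made up for illustration, and this is not the code from either repository. Whether DGL's `--normalization 'sym'` option computes exactly the same thing should be checked against its source.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense, non-negative adjacency matrix."""
    adj = np.asarray(adj, dtype=np.float64)
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_tilde.sum(axis=1)              # degree, counting the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)        # deg >= 1 thanks to the self-loop
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy 3-node path graph: 0 - 1 - 2 (hypothetical data)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A_hat = normalize_adjacency(A)
print(A_hat)
```

If two implementations disagree on this matrix (for example, self-loops added in one but not the other), their accuracies are not directly comparable.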

Thanks in advance.

minjie · January 31, 2019, 11:55pm

That means you should use DGL instead.

I haven’t had time to dig into your model, but I would like to share some of my experience in matching performance:

Verify data preprocessing. Make sure that the data preprocessing is exactly the same as in DGL. For example, make sure the features are correctly normalized, the self-loops are added, and the train/val/test split is the same. According to this paper, GCN is actually not very stable when the split changes, so make sure that is the same as well.
Verify hyperparameters such as learning rate, weight decay, dropout and so on.
Verify the model architecture by looking at the parameter shapes. This is a quick way to spot some easy mistakes.
Verify the parameter initializers.
Verify the loss curve. If the loss values differ in magnitude, then something must be different.
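As a concrete version of the parameter-shape check, here is a small sketch of the kernel shapes one would expect from the Keras model posted above. It is derived only from that code, and assumes support=1 and Cora's usual 1433 input features (the 7 classes match the data statistics in the log). The helper name `expected_kernel_shapes` is hypothetical.

```python
def expected_kernel_shapes(n_features, n_hidden, n_layers, n_classes):
    """Kernel shapes for the concatenating Keras model posted above (support=1)."""
    shapes = []
    for i in range(n_layers):
        # Layer i sees the raw features plus the outputs of all i earlier layers.
        shapes.append((n_features + i * n_hidden, n_hidden))
    # The final Dense layer sees the raw features plus all n_layers outputs.
    shapes.append((n_features + n_layers * n_hidden, n_classes))
    return shapes

shapes = expected_kernel_shapes(n_features=1433, n_hidden=16,
                                n_layers=10, n_classes=7)
for shape in shapes:
    print(shape)
```

In Keras these can be compared against `[w.shape for w in model.get_weights()]`, skipping the bias vectors. Any mismatch points to a difference in how the concatenation is wired.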

Hope these tips help.

xcgoner · January 31, 2019, 11:59pm

I will check the preprocessing and the other points.
In the source code of gcn_concat, I cannot find the name of the initializer. Could you tell me which one is used?
Also, is weight decay applied to the bias as well?

Thanks.

xcgoner · February 1, 2019, 1:08am

I just tried DGL + mxnet-mkl (both the latest nightly builds).
However, I still cannot reproduce the results in the README.
The command I ran is:

DGLBACKEND=mxnet python3 examples/mxnet/gcn/gcn_batch.py --dataset "cora" --n-epochs 200  --n-layers 10 --normalization 'sym' --self-loop

The only differences are that I’m using CPU instead of GPU, and mxnet-mkl instead of mxnet or mxnet-cu90. All the hyperparameters remain unchanged.
However, the reported test accuracy is slightly below 80%, which is far from 92.63%.

I do have a warning saying: “Initializer is not set. Use zero initializer instead. To suppress this warning, use set_initializer to explicitly specify which initializer to use.”

Is there any initializer I need to specify?

xcgoner · February 1, 2019, 1:51am

I also tried on GPU with mxnet-cu80; the test accuracy is still below 80% with 10 layers.

minjie · February 1, 2019, 2:10am

It seems that you are using an old version? gcn_batch.py should have been removed from the example folder.

xcgoner · February 1, 2019, 3:56am

Oh… sorry, I pasted the wrong command there.
The actual command I used was:

DGLBACKEND=mxnet python3 examples/mxnet/gcn/gcn_concat.py --dataset "cora" --n-epochs 200 --gpu 1 --n-layers 10 --normalization 'sym' --self-loop

which is from the latest master branch.
I tested on both CPU and GPU; both have test accuracy close to 80%.

I attach part of the log here:

(mxnet-dgl) cx2@vision-gpu-2:~/src/gcn/dgl-gcn/dgl/gcn$ DGLBACKEND=mxnet python gcn_concat.py --dataset "cora" --n-epochs 200 --gpu 2 --n-layers 10 --normalization 'sym' --self-loop
Namespace(dataset='cora', dropout=0.5, gpu=2, lr=0.01, n_epochs=200, n_hidden=16, n_layers=10, normalization='sym', self_loop=True, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2, weight_decay=0.0005)
----Data statistics------'
#Edges 13264
#Classes 7
#Train samples 140
#Val samples 300
#Test samples 1000
gcn0_ (
Parameter dense0_weight (shape=(16, 0), dtype=float32)
Parameter dense0_bias (shape=(16,), dtype=float32)
Parameter dense1_weight (shape=(16, 0), dtype=float32)
Parameter dense1_bias (shape=(16,), dtype=float32)
Parameter dense2_weight (shape=(16, 0), dtype=float32)
Parameter dense2_bias (shape=(16,), dtype=float32)
Parameter dense3_weight (shape=(16, 0), dtype=float32)
Parameter dense3_bias (shape=(16,), dtype=float32)
Parameter dense4_weight (shape=(16, 0), dtype=float32)
Parameter dense4_bias (shape=(16,), dtype=float32)
Parameter dense5_weight (shape=(16, 0), dtype=float32)
Parameter dense5_bias (shape=(16,), dtype=float32)
Parameter dense6_weight (shape=(16, 0), dtype=float32)
Parameter dense6_bias (shape=(16,), dtype=float32)
Parameter dense7_weight (shape=(16, 0), dtype=float32)
Parameter dense7_bias (shape=(16,), dtype=float32)
Parameter dense8_weight (shape=(16, 0), dtype=float32)
Parameter dense8_bias (shape=(16,), dtype=float32)
Parameter dense9_weight (shape=(16, 0), dtype=float32)
Parameter dense9_bias (shape=(16,), dtype=float32)
Parameter dense10_weight (shape=(16, 0), dtype=float32)
Parameter dense10_bias (shape=(16,), dtype=float32)
Parameter dense11_weight (shape=(7, 0), dtype=float32)
Parameter dense11_bias (shape=(7,), dtype=float32)
)
/home/nfs/cx2/virtualenv/mxnet-dgl/lib/python3.6/site-packages/dgl/frame.py:204: UserWarning: Initializer is not set. Use zero initializer instead. To suppress this warning, use set_initializer to explicitly specify which initializer to use.
dgl_warning('Initializer is not set. Use zero initializer instead.'
[19:54:33] src/operator/contrib/…/tensor/./…/…/common/utils.h:450:
Storage type fallback detected:
operator = add_n
input storage types = [row_sparse, default, ]
output storage types = [default, ]
params = {"num_args" : 2, }
context.dev_mask = gpu
The operator with default storage type will be dispatched for execution. You're seeing this warning message because the operator above is unable to process the given ndarrays with specified storage types, context and parameter. Temporary dense ndarrays are generated in order to execute the operator. This does not affect the correctness of the programme. You can set environment variable MXNET_STORAGE_FALLBACK_LOG_VERBOSE to 0 to suppress this warning.
Epoch 00003 | Time(s) 0.0694 | Loss 1.7922 | Accuracy 0.3500 | ETputs(KTEPS) 191.04
Epoch 00004 | Time(s) 0.0760 | Loss 1.7336 | Accuracy 0.3500 | ETputs(KTEPS) 174.53

…

xcgoner · February 1, 2019, 6:39pm

Could you please post the log of a gcn_concat.py run that yields test accuracy above 90%, so that I can check the differences myself?
Thanks in advance.

yifeim · February 1, 2019, 11:26pm

Hi,

Here, I am attaching the log of the particular training job at the end. While I hope this can be helpful, I also wanted to clarify a few details:

(1) The initial goal of this implementation was to benchmark against non-graph baselines, rather than obtaining better results. Since the DGL datasets are rather small, the case-by-case variance may be large.

(2) As pointed out by Ziyue in offline chats, this particular implementation considered both training and test accuracies when reporting the final result. This is a common practice in semi-supervised learning (SSL), but it does not seem to be what the original authors did. Apologies for any confusion.

(3) Adding depth may or may not improve accuracy. While adding depth is a clear way to mimic power iterations of matrix factorizations, training multiple epochs to obtain stationary points could equivalently solve matrix factorization. Conclusions should not be drawn from these experiments alone.
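Point (2) matters a lot for the reported number. Here is a toy sketch with made-up predictions (none of these values come from the actual runs) showing how accuracy over all nodes can sit well above accuracy on the held-out nodes when the training nodes are fit perfectly. The helper `accuracy` is hypothetical and not taken from the DGL example.

```python
def accuracy(pred, labels, mask=None):
    """Fraction of correct predictions, optionally restricted to masked nodes."""
    if mask is None:
        mask = [True] * len(labels)
    chosen = [(p, y) for p, y, m in zip(pred, labels, mask) if m]
    return sum(p == y for p, y in chosen) / len(chosen)

# Made-up toy data: 10 nodes; the first 6 are training nodes, all predicted correctly.
labels    = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
pred      = [0, 1, 2, 0, 1, 2, 0, 2, 1, 0]
test_mask = [False] * 6 + [True] * 4

acc_all = accuracy(pred, labels)              # every node, training nodes included
acc_test = accuracy(pred, labels, test_mask)  # held-out nodes only
print(acc_all, acc_test)
```

The two numbers answer different questions, so a result computed over the whole dataset should not be compared with a test-set accuracy.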

yifeim · February 1, 2019, 11:31pm

Btw, a coarse search shows that the learning rate may be different (0.01 vs 0.001).

xcgoner · February 2, 2019, 12:12am

Thanks a lot! That clears up a lot of my confusion.

Just to confirm, please correct me if I’m wrong:

Your log, and the results reported in README, actually result from the removed “gcn_batch”, not the latest version “gcn_concat”. (I know that because “gcn_concat” reports validation accuracy in each epoch, and at the end reports “Test accuracy” instead of the “Final accuracy” used in “gcn_batch”)
In the latest version of “gcn_concat”, “Test accuracy” is reported on the test set, while in the removed “gcn_batch”, “Final accuracy” is reported on the entire dataset, including train, validation, and test set.

If the above is true, could you update the README to match the latest version of “gcn_concat”? That would avoid a lot of confusion, since “gcn_batch” has been removed.

Thanks.

P.S.: I also tried the removed “gcn_batch” myself; the “Final accuracy” is always nearly 99%, whether 2 or 10 layers are used. Is that normal?

zhengda1936 · February 2, 2019, 12:22am

Thank you for reporting the problem in the README. We’ll update it shortly.

yifeim · February 2, 2019, 12:44am

“the “Final accuracy” is always nearly 99%”

– This appears to be the case. There was a silent mxnet bug on the line loss = loss_fcn(pred, labels, mask), which should be loss = loss_fcn(pred, labels, mask.reshape((-1,1))). With the bug fixed, the accuracies become:

No-graph:

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.01, n_epochs=200, n_hidden=16, n_layers=0, normalization=None, self_loop=False, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 44.51%

Two-layers:

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=2, normalization=None, self_loop=False, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 73.21%

Or

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=2, normalization='sym', self_loop=True, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 53.65%

With 10 layers:

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=10, normalization=None, self_loop=False, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 75.76%

Or

Namespace(dataset='cora', dropout=0.5, gpu=1, lr=0.001, n_epochs=200, n_hidden=16, n_layers=10, normalization='sym', self_loop=True, syn_gnp_n=1000, syn_gnp_p=0.0, syn_nclasses=10, syn_nfeats=500, syn_seed=42, syn_test_ratio=0.5, syn_train_ratio=0.1, syn_type='gnp', syn_val_ratio=0.2)
Final accuracy 75.12%
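For readers wondering why the mask shape matters, here is a sketch of the kind of silent broadcasting that the reshape avoids. It uses NumPy rather than MXNet, and it assumes that the per-node loss has shape (N, 1) and that the weighting multiplies it by the mask with NumPy-style broadcasting. I have not verified either assumption against MXNet's source, so treat this as an illustration of the mechanism only.

```python
import numpy as np

per_node_loss = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (N, 1), made-up values
mask = np.array([1.0, 0.0, 0.0, 0.0])                   # shape (N,): only node 0 is labeled

# Shapes (N, 1) and (N,) broadcast to (N, N): every node's loss is paired with
# every mask entry, so the unlabeled nodes still contribute to the total.
wrong = per_node_loss * mask

# Reshaping the mask to (N, 1) keeps the product elementwise.
right = per_node_loss * mask.reshape((-1, 1))

print(wrong.shape, wrong.sum())
print(right.shape, right.sum())
```

Under these assumptions the unmasked version trains on the labels of every node, which would be consistent with a "Final accuracy" near 99% over the whole dataset.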

yifeim · February 2, 2019, 12:52am

Btw, the implementation by Ziyue already fixed the bug.

xcgoner · February 2, 2019, 12:55am

Thanks!
I guess I can close this question now.