Stochastic Training on Batched Hetero-Graphs

TasmanGC · December 14, 2021, 6:07am

Overview

Sup! I’ve been working through the tutorial on node classification with neighborhood sampling with my own data. I generate a few thousand bidirectional heterographs (G_1, G_2, …, G_n) with two node types (a & b), which I batch using dgl.batch.

The output for one of these graphs is below.

  Graph(num_nodes={'a': 94, 'b': 500},
  num_edges={('a', 'link', 'b'): 2000, ('b', 'link', 'a'): 2000},
  metagraph=[('a', 'b', 'link'), ('b', 'a', 'link')])

I want to use the batched graph with MultiLayerFullNeighborSampler to generate the blocks for a stochastic 2-layer GCN. However, at the moment I’m just trying to get the sampler to work on a single graph. I’m still using the dgl.dataloading.MultiLayerFullNeighborSampler and dgl.dataloading.NodeDataLoader defaults from the tutorial, except I’m testing on just a single GNN layer.

The sampler and dataloader return without error and give the input_nodes, output_nodes, and blocks below.

  [
  {'a': tensor([19, 43, 67, 90], dtype=torch.int32),'b': tensor([389], dtype=torch.int32)},
  {'a': tensor([], dtype=torch.int32), 'b': tensor([389], dtype=torch.int32)},
  [Block(num_src_nodes={'a': 4, 'b': 1},num_dst_nodes={'a': 0, 'b': 1},num_edges={('a', 'link', 'b'): 4},metagraph=[('a', 'b', 'link')])]
  ]

Problem

All set to do some prediction, I set up a single layer to test on:

  test_layer = dglnn.HeteroGraphConv({rel: dglnn.GraphConv(1, 2, norm='right') for rel in ['link']})

Calling forward on this test layer with the block and srcdata gives the following error.

AssertionError: Current HeteroNodeDataView has multiple node types, can not be iterated.

What am I doing wrong?

Things I’ve tried that haven’t helped

uniquely identifying the two connections when generating the heterograph, i.e. {(a, link_a, b), (b, link_b, a)}
having just a one directional heterograph
different batch sizes greater than 1
did a deep trace to ensure all the pytorch tensors were formatted correctly

What I think is wrong

could be that I’ve structured my nid_training_dict incorrectly for the dataloader (see below; it’s just all nodes atm)
maybe I’ve built my graph incorrectly or I’m missing something there?

Potentially Useful Outputs

This is the format of my nid_training_dict

	{'a': tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
			 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
			 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
			 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
			 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
			 90, 91, 92, 93], dtype=torch.int32),
	 'b': tensor([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
			  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,
			  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,
			  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,
			  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,
			  70,  71,  72,  73,  74,  75,  76,  77,  78,  79,  80,  81,  82,  83,
			  84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96,  97,
			  98,  99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
			 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
			 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
			 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153,
			 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167,
			 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
			 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195,
			 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
			 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
			 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237,
			 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251,
			 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265,
			 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279,
			 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293,
			 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307,
			 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321,
			 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335,
			 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349,
			 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363,
			 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377,
			 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391,
			 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405,
			 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
			 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433,
			 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447,
			 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461,
			 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475,
			 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489,
			 490, 491, 492, 493, 494, 495, 496, 497, 498, 499], dtype=torch.int32)}

BarclayII · December 14, 2021, 7:43am

Could I see how you invoked the test_layer and the stacktrace of the raised error?

TasmanGC · December 21, 2021, 9:32am

Invoking the test_layer looks like this:

test_layer = dglnn.HeteroGraphConv({'link': dglnn.GraphConv(1, 2, norm='right')}, aggregate='mean')
test_layer(blocks[0], blocks[0].srcdata)

I’ve never had to do a stacktrace before but I took a swing. Is this what you need?

Traceback (most recent call last):
  File "c:/Users/22566465/Desktop/UWA_GIT/ginn/notebooks/Graph NB/testing.py", line 42, in <module>
    test_layer(blocks[0],blocks[0].srcdata)
  File "C:\Users\22566465\Anaconda3\envs\ginn\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\22566465\Anaconda3\envs\ginn\lib\site-packages\dgl\nn\pytorch\hetero.py", line 168, in forward
    dst_inputs = {k: v[:g.number_of_dst_nodes(k)] for k, v in inputs.items()}
  File "C:\Users\22566465\Anaconda3\envs\ginn\lib\site-packages\dgl\nn\pytorch\hetero.py", line 168, in <dictcomp>
    dst_inputs = {k: v[:g.number_of_dst_nodes(k)] for k, v in inputs.items()}
  File "C:\Users\22566465\Anaconda3\envs\ginn\lib\_collections_abc.py", line 743, in __iter__
    for key in self._mapping:
  File "C:\Users\22566465\Anaconda3\envs\ginn\lib\site-packages\dgl\view.py", line 100, in __iter__
    'Current HeteroNodeDataView has multiple node types, ' \
AssertionError: Current HeteroNodeDataView has multiple node types, can not be iterated.

BarclayII · December 27, 2021, 7:18am

blocks[0].srcdata is a dictionary of dictionaries of tensors (keyed by feature name first, then by node type), whereas HeteroGraphConv takes a dictionary of tensors keyed by node type. So you need something like:

test_layer(blocks[0], {'typeA': A, 'typeB': B})
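
To make the nesting difference concrete, here is a rough plain-Python sketch with lists standing in for tensors. It mimics the dictionary comprehension from hetero.py shown in the traceback, using the node counts from the Block in the first post. The feature name 'feat' and the helper slice_dst_inputs are made up for illustration; the thread does not say what the features are actually called.

```python
# Plain-Python stand-in for the two layouts; lists replace tensors.
# 'feat' is a hypothetical feature name, not taken from the thread.

# Layout of blocks[0].srcdata: feature name -> node type -> values.
srcdata = {
    "feat": {
        "a": [[0.1], [0.2], [0.3], [0.4]],  # 4 source nodes of type 'a'
        "b": [[0.5]],                        # 1 source node of type 'b'
    }
}

# Destination-node counts reported by the Block in the first post.
num_dst_nodes = {"a": 0, "b": 1}


def slice_dst_inputs(inputs, num_dst):
    """Hypothetical helper mirroring the hetero.py line in the traceback:
    keep the first num_dst[ntype] rows for every node type in `inputs`."""
    return {ntype: rows[: num_dst[ntype]] for ntype, rows in inputs.items()}


# Select one feature first, so the keys of `inputs` are node types.
inputs = srcdata["feat"]
dst_inputs = slice_dst_inputs(inputs, num_dst_nodes)
print(dst_inputs)  # {'a': [], 'b': [[0.5]]}

# Passing the whole srcdata instead would put feature names where node
# types are expected. In this stand-in that shows up as a KeyError.
try:
    slice_dst_inputs(srcdata, num_dst_nodes)
except KeyError as err:
    print("wrong nesting, unknown node type:", err)

# Assumed DGL equivalent, if the features were stored under 'feat':
#   test_layer(blocks[0], blocks[0].srcdata['feat'])
```

In DGL itself, indexing srcdata by a feature name on a graph with several node types should give back a dictionary keyed by node type, which is the shape HeteroGraphConv expects.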
