Question about the YelpChi dataset

In dgl.data we have FraudYelpDataset. However, going back to the original source of this dataset, we find that it contains the following elements:

{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Wed Aug 19 20:09:02 2020',
 '__version__': '1.0',
 '__globals__': [],
 'homo': <45954x45954 sparse matrix of type '<class 'numpy.float64'>'
 	with 7693958 stored elements in Compressed Sparse Column format>,
 'net_rur': <45954x45954 sparse matrix of type '<class 'numpy.float64'>'
 	with 98630 stored elements in Compressed Sparse Column format>,
 'net_rtr': <45954x45954 sparse matrix of type '<class 'numpy.float64'>'
 	with 1147232 stored elements in Compressed Sparse Column format>,
 'net_rsr': <45954x45954 sparse matrix of type '<class 'numpy.float64'>'
 	with 6805486 stored elements in Compressed Sparse Column format>,
 'features': <45954x32 sparse matrix of type '<class 'numpy.float64'>'
 	with 1469088 stored elements in Compressed Sparse Column format>,
 'label': array([[0, 0, 0, ..., 0, 0, 0]])}

net_rtr, net_rsr, and net_rur are integrated into our module. Those matrices are metapath-based neighborhoods, which means we keep only the center nodes reached by walking along those metapaths. However, why did we drop homo? I still don't know what the homo matrix means, and the author does not mention it either. My guess is that it is the adjacency matrix combining all of those metapaths.

You could try looking into the first paper that introduced this dataset: https://arxiv.org/pdf/2008.08692.pdf

You could check whether homo is the homogenized union of the net_xxx relations.
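One way to run that check is to compare sparsity patterns. The stored-element counts in the dump above are at least consistent with the guess: the three relations total 98630 + 1147232 + 6805486 = 8051348 entries, more than the 7693958 in homo, which is what you would expect if some node pairs are linked by more than one relation. The sketch below is only an illustration: the helper names `nonzero_pattern` and `is_union_of_relations` are made up here, and the file name in the usage comment is an assumption about where the .mat file lives.

```python
import numpy as np
import scipy.sparse as sp


def nonzero_pattern(matrix):
    """Return a 0/1 int32 CSR matrix marking the nonzero entries of `matrix`."""
    pattern = sp.csr_matrix(matrix) != 0  # boolean sparse matrix
    return pattern.astype(np.int32)


def is_union_of_relations(homo, relations):
    """Check whether `homo` has an edge exactly where at least one relation has one.

    Only the positions of nonzero entries are compared; edge weights are ignored.
    """
    homo_pattern = nonzero_pattern(homo)

    # Count, per position, how many relations contain that edge.
    counts = sp.csr_matrix(homo_pattern.shape, dtype=np.int32)
    for rel in relations:
        counts = counts + nonzero_pattern(rel)
    union_pattern = nonzero_pattern(counts)

    # Any entry left after subtraction is a position where the patterns differ.
    diff = homo_pattern - union_pattern
    diff.eliminate_zeros()
    return diff.nnz == 0


# Hypothetical usage on the real data (the path is an assumption):
# from scipy.io import loadmat
# data = loadmat("YelpChi.mat")
# print(is_union_of_relations(
#     data["homo"], [data["net_rur"], data["net_rtr"], data["net_rsr"]]))
```

If this returns False, it may still be worth checking whether the difference is confined to the diagonal (self-loops) before ruling the guess out.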

This has been solved. You could check from here.
