For example, I have already downloaded the Reddit dataset to my local disk. How can I efficiently change the
class RedditDataset(DGLBuiltinDataset) so that it doesn't need to download the files again? Thanks!
Hi, DGL's built-in RedditDataset class automatically downloads the files and skips the download if they already exist. Please see the API doc for usage: https://docs.dgl.ai/api/python/dgl.data.html#reddit-dataset
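A minimal sketch of how this could be used with files that are already on disk. The file names come from the process() code later in this thread; the "reddit" subdirectory under raw_dir and the helper names raw_files_present and load_reddit are assumptions for illustration, so check them against your DGL version.

```python
import os

# File names as used in this thread; the "reddit" subdirectory is an
# assumption about how DGL lays out its raw directory.
RAW_FILES = ("reddit_graph.npz", "reddit_data.npz")


def raw_files_present(raw_dir, name="reddit"):
    """Return True if every expected raw file exists under raw_dir/name."""
    return all(os.path.exists(os.path.join(raw_dir, name, f)) for f in RAW_FILES)


def load_reddit(raw_dir):
    """Build the dataset from raw_dir and return its single graph."""
    from dgl.data import RedditDataset  # imported lazily; requires DGL

    dataset = RedditDataset(raw_dir=raw_dir)
    return dataset[0]
```

If raw_files_present(raw_dir) is True, calling load_reddit(raw_dir) should not need to fetch anything, provided the layout assumption holds.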
Thank you @minjie. Please correct me if I am wrong:
I see that the has_cache() function checks whether a cache exists, and it only checks for the dgl_graph.bin file. Is there any config that lets me set the download file path and load from a .npz file instead? Thanks!
def has_cache(self):
    graph_path = os.path.join(self.save_path, 'dgl_graph.bin')
    if os.path.exists(graph_path):
        return True
    return False
For now I have rewritten some functions to load the local data:
import os

import numpy as np
import scipy.sparse as sp
from dgl import backend as F
from dgl import from_scipy
from dgl.data.utils import generate_mask_tensor

def process(raw_path):
    # raw_path: directory that holds the already-downloaded .npz files
    # graph
    coo_adj = sp.load_npz(os.path.join(raw_path, "reddit_graph.npz"))
    reddit_graph = from_scipy(coo_adj)
    # features and labels
    reddit_data = np.load(os.path.join(raw_path, "reddit_data.npz"))
    features = reddit_data["feature"]
    labels = reddit_data["label"]
    # train/val/test indices
    node_types = reddit_data["node_types"]
    train_mask = (node_types == 1)
    val_mask = (node_types == 2)
    test_mask = (node_types == 3)
    reddit_graph.ndata['train_mask'] = generate_mask_tensor(train_mask)
    reddit_graph.ndata['val_mask'] = generate_mask_tensor(val_mask)
    reddit_graph.ndata['test_mask'] = generate_mask_tensor(test_mask)
    reddit_graph.ndata['feat'] = F.tensor(features, dtype=F.data_type_dict['float32'])
    reddit_graph.ndata['label'] = F.tensor(labels, dtype=F.data_type_dict['int64'])
    return reddit_graph
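To make the masking step in process() concrete, here is a small NumPy-only sketch. The node_types values below are made up for illustration; only the 1/2/3 coding for train/val/test is taken from the code above. As far as I can tell, generate_mask_tensor then just converts each boolean array into a tensor for the active backend.

```python
import numpy as np

# Hypothetical node_types for six nodes: 1 = train, 2 = val, 3 = test.
node_types = np.array([1, 1, 2, 3, 1, 2])

# One boolean mask per split, each with one entry per node.
train_mask = node_types == 1
val_mask = node_types == 2
test_mask = node_types == 3

# In this toy array every node belongs to exactly one split.
assert train_mask.sum() + val_mask.sum() + test_mask.sum() == len(node_types)

# A mask can be turned into node indices when indices are more convenient.
train_idx = np.nonzero(train_mask)[0]
print(train_idx)  # [0 1 4]
```

So the masks do not filter by node type in a heterogeneous-graph sense; node_types here is simply the split assignment stored in the data file.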
Check out the download utility function. It has a path argument for specifying where to store the files.
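A rough sketch of a "download only if missing" wrapper around that utility. It assumes the function is dgl.data.utils.download and that it takes the destination as a path keyword, as described above; fetch_to and the downloader parameter are hypothetical names introduced here so the logic can be tried without DGL or a network connection.

```python
import os


def fetch_to(url, path, downloader=None):
    """Store url at path, skipping the transfer if the file already exists."""
    if os.path.exists(path):
        return path
    if downloader is None:
        # Assumed location of DGL's download helper; imported lazily.
        from dgl.data.utils import download as downloader
    # Make sure the target directory exists before downloading into it.
    os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
    downloader(url, path=path)
    return path
```

Calling fetch_to twice with the same path only triggers one transfer, which is the behavior asked for at the top of the thread.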
Just curious: why do you mask nodes of certain types for validation? I'm guessing you only want to validate that your trained model can generalize to nodes of type 2? I'm trying to understand the generate_mask_tensor function.