DGL metapath_reachable_graph: unable to allocate xxx GiB/TiB for an array with shape (xxxxxxxxxxxxxxxxx,) and data type int64

When I try to convert a big heterogeneous graph into smaller metapath-based homogeneous graphs, I get an error.

This is how I call it:

for mp in tqdm(meta_paths):
    homo_graph.append(dgl.metapath_reachable_graph(g, mp))

Here meta_paths are metapaths I defined myself; each one starts from the entity type I want to classify and ends at the same type. As in HAN, I want to get the metapath-based neighbors.

The error is:

File "data_transformation.py"
    homo_graph.append(dgl.metapath_reachable_graph(g, mp))
File "...dgl/transforms/functional.py", line 1520, in metapath_reachable_graph
    adj = adj * g.adj_external(
File "/usr/local/.../sparse/_base.py", line 590, in __mul__
    return self._mul_dispatch(other)
File "..._base.py", line 529, in _mul_sparse_matrix
    indices = np.empty(nnz, dtype=idx_dtype)
numpy.core._exceptions.MemoryError: Unable to allocate 61.0 GiB for an array with shape (8185718168,) and data type int64

Sometimes the process just prints Killed instead.
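
The 61.0 GiB in the traceback follows directly from the reported shape: the sparse product needs one int64 index per nonzero entry of the result. A quick arithmetic check is sketched below. It uses only the numbers from the error message; the helper name int64_array_gib is made up for illustration.

```python
def int64_array_gib(num_elements: int) -> float:
    """Size in GiB of a 1-D int64 array with `num_elements` entries."""
    bytes_per_int64 = 8
    return num_elements * bytes_per_int64 / 2**30


# Shape reported in the MemoryError above.
nnz = 8_185_718_168
size_gib = int64_array_gib(nnz)
print(f"{size_gib:.1f} GiB")  # prints "61.0 GiB"
```

This covers the index array alone. A sparse result with that many nonzeros also needs a value array of the same length, so the full product would need more than this single allocation.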

I tried to replace this function with a metapath-based random walk, but the result does not look right:
the random walk creates many more edges than dgl.metapath_reachable_graph, and I still cannot figure out why. The logic of the code is:

The random walk definitely produces duplicate edges; I remove them by storing the neighbors in a set. Assuming a pattern like author-paper-author, for example the trace APAPAPAP, I take the authors at positions 1, 3, 5, 7, then treat 3 as a neighbor of 1, 5 as a neighbor of 3, and so on.
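
The pairing rule described above can be tried on a plain Python list before running it on real traces. This is a minimal sketch, not the code from the post: pairs_from_trace and the toy trace are hypothetical, positions are 0-based (authors at 0, 2, 4, 6), and -1 stands for the padding that dgl.sampling.random_walk writes after a walk stops early.

```python
from collections import defaultdict


def pairs_from_trace(trace, step=2):
    """Yield (src, dst) for consecutive same-type nodes in one walk.

    `trace` alternates node types (A, P, A, P, ...); -1 marks padding
    after a walk that stopped early.
    """
    for i in range(step, len(trace), step):
        src, dst = trace[i - step], trace[i]
        if src == -1 or dst == -1:
            break  # everything after the first -1 is padding
        yield src, dst


# Toy trace for A-P-A-P-A-P-A: authors 10, 11, 12 and papers 7, 8.
trace = [10, 7, 11, 8, 12, 8, 10]
neighbors = defaultdict(set)
for src, dst in pairs_from_trace(trace):
    neighbors[src].add(dst)  # the set removes duplicate edges
    neighbors[dst].add(src)

print(dict(neighbors))
```

Every pair produced this way is one application of the metapath, so each pair is a genuine metapath-based neighbor; a walk only samples some of them.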


#%%
from dgl.data.rdf import AIFBDataset
from collections import defaultdict
import dgl
from tqdm import tqdm
import pickle
import torch

dataset = AIFBDataset()
g = dataset[0]
prefix = '../data/'


metapaths = [
[('Personen', 'ontology#name', '_Literal'),('_Literal', 'rev-ontology#name', 'Personen')],
[('Personen', 'rev-ontology#author', 'Publikationen'), ('Publikationen', 'ontology#author', 'Personen')],
[('Personen', 'rev-ontology#member', 'Projekte'), ('Projekte', 'ontology#member', 'Personen')]]

def build_and_update_neighbors_dict(traces, neighbors):
    """Add metapath-based neighbor pairs from random-walk traces to `neighbors`."""
    for path in traces:
        # nodes of the start type sit at positions 0, 2, 4, ... of each trace
        for i in range(2, len(path), 2):
            src, dst = path[i-2].item(), path[i].item()

            # ensure self-loop (-1 is the padding after a walk stopped early)
            if src != -1:
                neighbors[src].add(src)
            if dst != -1:
                neighbors[dst].add(dst)

            # add the pair in both directions
            if src != -1 and dst != -1:
                neighbors[src].add(dst)
                neighbors[dst].add(src)


final_neighbors_dict = defaultdict(set)
for i, metapath in enumerate(tqdm(metapaths)):
    metapath_neighbors_dict = defaultdict(set)
    # create subgraph
    # etypes_of_interest = metapath
    # sg = g.edge_type_subgraph(etypes_of_interest)
    srctype = g.to_canonical_etype(metapath[0])[0]
    nodes = g.nodes(srctype)

    traces, _ = dgl.sampling.random_walk(
        g,
        nodes=nodes,
        metapath=metapath * 100
    )
    build_and_update_neighbors_dict(traces, neighbors=metapath_neighbors_dict)
    build_and_update_neighbors_dict(traces, neighbors=final_neighbors_dict)
    sorted_dict_metapath = {k: metapath_neighbors_dict[k] for k in sorted(metapath_neighbors_dict)}

    # the with block closes the file on exit
    with open(prefix + '{}_adjlists.pickle'.format(i+1), 'wb') as f:
        pickle.dump(sorted_dict_metapath, f)

sorted_dict = {k: final_neighbors_dict[k] for k in sorted(final_neighbors_dict)}


# the with block closes the file on exit
with open(prefix + 'homo_adjlists.pickle', 'wb') as f:
    pickle.dump(sorted_dict, f)


The “Killed” message means the process ran out of memory. My suggestion is to down-sample the graph before calling dgl.metapath_reachable_graph. You can repeat the down-sampling a few times to bring the approximation closer to the exact result.
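
One way to read this suggestion is sketched below on plain edge lists rather than on a DGL graph. Everything here is an illustrative assumption: the function names, the keep_prob and rounds parameters, and the toy author-paper edges are not from DGL or from the thread. Each round keeps a random subset of the edges, computes the two-hop pairs on that smaller graph, and the rounds are merged with a set union.

```python
import random
from collections import defaultdict


def two_hop_pairs(edges_ab, edges_ba):
    """Exact A -> B -> A reachability on plain edge lists."""
    b_to_a = defaultdict(set)
    for b, a in edges_ba:
        b_to_a[b].add(a)
    pairs = set()
    for a, b in edges_ab:
        for a2 in b_to_a.get(b, ()):
            pairs.add((a, a2))
    return pairs


def sampled_two_hop_pairs(edges_ab, edges_ba, keep_prob, rounds, seed=0):
    """Union of two-hop pairs over several independently down-sampled graphs."""
    rng = random.Random(seed)
    result = set()
    for _ in range(rounds):
        sub_ab = [e for e in edges_ab if rng.random() < keep_prob]
        sub_ba = [e for e in edges_ba if rng.random() < keep_prob]
        result |= two_hop_pairs(sub_ab, sub_ba)
    return result


# Toy author -> paper edges and their reverse.
edges_ap = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 2)]
edges_pa = [(p, a) for a, p in edges_ap]

exact = two_hop_pairs(edges_ap, edges_pa)
approx = sampled_two_hop_pairs(edges_ap, edges_pa, keep_prob=0.5, rounds=5)
print(len(approx), "of", len(exact), "exact pairs recovered")
```

The sampled result is always a subset of the exact one, and more rounds can only add pairs. Each round works on a smaller product, but the merged result still has to fit in memory.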

Thanks for your reply. However, a down-sampled graph is not enough to improve my accuracy. Could you help me check my function that re-implements metapath_reachable_graph? I use a metapath-based random walk, but something goes wrong.

Could you paste your error messages?

@minjie Thanks for your time! There is no error message. You can see the code below:

The purpose of this code is to build metapath-based neighborhoods, as HAN does. However, this metapath-based random walk creates many more edges than HAN’s function.

HAN in DGL uses dgl.metapath_reachable_graph. I also define a new function, metapath_reachable_graphs, that can be used on a big graph.

from dgl.data.rdf import AIFBDataset
from collections import defaultdict
import dgl
from tqdm import tqdm

dataset = AIFBDataset()
g = dataset[0]
prefix = '../data/'


metapaths = [
[('Personen', 'ontology#name', '_Literal'),('_Literal', 'rev-ontology#name', 'Personen')],
[('Personen', 'rev-ontology#author', 'Publikationen'), ('Publikationen', 'ontology#author', 'Personen')],
[('Personen', 'rev-ontology#member', 'Projekte'), ('Projekte', 'ontology#member', 'Personen')]]

def build_and_update_neighbors_dict(traces, neighbors):
    """Add metapath-based neighbor pairs from random-walk traces to `neighbors`."""
    for path in traces:
        # nodes of the start type sit at positions 0, 2, 4, ... of each trace
        for i in range(2, len(path), 2):
            src, dst = path[i-2].item(), path[i].item()

            # ensure self-loop (-1 is the padding after a walk stopped early)
            if src != -1:
                neighbors[src].add(src)
            if dst != -1:
                neighbors[dst].add(dst)

            # add the pair in both directions
            if src != -1 and dst != -1:
                neighbors[src].add(dst)
                neighbors[dst].add(src)

def build_homograph(neighbors, num_nodes=None):
    # collect all edges as two parallel ID lists
    src_ids, dst_ids = [], []
    for src, nbrs in neighbors.items():
        for dst in nbrs:
            src_ids.append(src)
            dst_ids.append(dst)
    # create a dgl graph; num_nodes keeps nodes that have no edges
    return dgl.graph((src_ids, dst_ids), num_nodes=num_nodes)

def metapath_reachable_graphs(g, metapath, walk_times):
    # for i, metapath in tqdm(enumerate(metapaths)):
    metapath_neighbors_dict = defaultdict(set)
    srctype = g.to_canonical_etype(metapath[0])[0]
    nodes = g.nodes(srctype)

    traces, _ = dgl.sampling.random_walk(
        g,
        nodes=nodes,
        metapath=metapath * walk_times
    )
    build_and_update_neighbors_dict(traces, neighbors=metapath_neighbors_dict)
    # keep every node of the start type so node IDs match the original graph
    homograph = build_homograph(metapath_neighbors_dict, num_nodes=g.num_nodes(srctype))
    return homograph
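
One visible source of extra edges in build_and_update_neighbors_dict is the self-loop: it is added for every start node, including nodes whose walk stops immediately because they have no edge of the first type. An exact metapath-reachable graph would contain no edge for such a node. The sketch below replays the helper's logic on plain lists to show the effect; the traces and the exact pair set are made-up toy data, and this may not be the only cause of the difference.

```python
from collections import defaultdict


def neighbors_with_self_loops(traces):
    """Plain-list version of the helper's logic (self-loops included)."""
    neighbors = defaultdict(set)
    for path in traces:
        for i in range(2, len(path), 2):
            src, dst = path[i - 2], path[i]
            if src != -1:
                neighbors[src].add(src)
            if dst != -1:
                neighbors[dst].add(dst)
            if src != -1 and dst != -1:
                neighbors[src].add(dst)
                neighbors[dst].add(src)
    return neighbors


# Authors 0 and 1 share paper 5; author 3 has no paper, so its walk is padded.
traces = [[0, 5, 1], [1, 5, 0], [3, -1, -1]]
exact_pairs = {(0, 0), (0, 1), (1, 0), (1, 1)}  # what exact reachability gives

walk_neighbors = neighbors_with_self_loops(traces)
walk_pairs = {(s, d) for s, nbrs in walk_neighbors.items() for d in nbrs}
extra = walk_pairs - exact_pairs
print(extra)  # prints {(3, 3)}
```

Running the same set difference between the walk-based edges and the edges of dgl.metapath_reachable_graph on a small graph such as AIFB would show which edges are actually extra.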
