What happens in this code when the data is changed

asmaa · October 4, 2021, 12:00pm

I have two datasets: cora and citeseer. In the cora dataset, paper_id values look like

123, 345, 675

and in the citeseer dataset they take the form

435, der, 456

In addition, the citation data is a relation between paper_ids and takes the form

in the cora dataset, but in the citeseer dataset it takes the form 435 der der 456. The following code runs on the cora dataset but does not run on the citeseer dataset:

```python
import os

import pandas as pd

# data_dir must already be defined; it is assumed to point at the directory
# that holds cora.cites and cora.content.
citations = pd.read_csv(
    os.path.join(data_dir, "cora.cites"),
    sep="\t",
    header=None,
    names=["target", "source"],
)
print("Citations shape:", citations.shape)

# One ID column, 1433 term columns, and one label column per paper.
column_names = ["paper_id"] + [f"term_{idx}" for idx in range(1433)] + ["subject"]
papers = pd.read_csv(
    os.path.join(data_dir, "cora.content"),
    sep="\t",
    header=None,
    names=column_names,
)
print("Papers shape:", papers.shape)

# Map class names and paper IDs to zero-based integer indices.
class_values = sorted(papers["subject"].unique())
class_idx = {name: idx for idx, name in enumerate(class_values)}
paper_idx = {name: idx for idx, name in enumerate(sorted(papers["paper_id"].unique()))}

papers["paper_id"] = papers["paper_id"].apply(lambda name: paper_idx[name])
# These lookups raise KeyError if a cited ID is not a key of paper_idx.
citations["source"] = citations["source"].apply(lambda name: paper_idx[name])
citations["target"] = citations["target"].apply(lambda name: paper_idx[name])
papers["subject"] = papers["subject"].apply(lambda value: class_idx[value])
```

The KeyError raised on the citations lines is: 'ghani01hypertext'

Rhett-Ying · October 8, 2021, 3:50am

Are you trying to parse the cora and citeseer datasets on your own? Are they newer than, or different from, the ones DGL provides? If not, why not use dgl.data.CoraGraphDataset and dgl.data.CiteseerGraphDataset? DGL highly recommends processing graph data into a dgl.data.DGLDataset subclass. Please refer to Make Your Own Dataset (DGL 0.8 documentation).

As for the KeyError in your case, are you trying to parse both datasets with the same code? You may need to debug on citeseer and change the code if required.
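One way to start that debugging is sketched below. It assumes the KeyError means 'ghani01hypertext' appears in the citations file but not in the paper_id column of the content file, which is a guess from the traceback and has not been checked against the actual files. The in-memory strings are hypothetical stand-ins for citeseer.content and citeseer.cites, and n_terms is a placeholder for the real number of term columns, which may differ from cora's 1433.

```python
import io

import pandas as pd

# Hypothetical stand-ins for citeseer.content and citeseer.cites.
content_txt = "435\t0\t1\tAI\nder\t1\t0\tML\n456\t1\t1\tAI\n"
cites_txt = "435\tder\nder\t456\nghost01paper\t435\n"

n_terms = 2  # placeholder: set this to the term count of the real content file
column_names = ["paper_id"] + [f"term_{i}" for i in range(n_terms)] + ["subject"]

# Read IDs as strings so numeric IDs ("435") and textual IDs ("der") compare consistently.
papers = pd.read_csv(
    io.StringIO(content_txt),
    sep="\t",
    header=None,
    names=column_names,
    dtype={"paper_id": str},
)
citations = pd.read_csv(
    io.StringIO(cites_txt),
    sep="\t",
    header=None,
    names=["target", "source"],
    dtype=str,
)

# Find IDs that are cited but have no row in the content file.
known = set(papers["paper_id"])
endpoints = set(citations["source"]) | set(citations["target"])
missing_ids = sorted(endpoints - known)
print("IDs cited but absent from content:", missing_ids)

# Keep only citations whose two endpoints are both known papers.
valid = citations["source"].isin(known) & citations["target"].isin(known)
citations = citations[valid].reset_index(drop=True)

# With dangling citations removed, the index lookup can no longer fail.
paper_idx = {name: idx for idx, name in enumerate(sorted(known))}
papers["paper_id"] = papers["paper_id"].map(paper_idx)
citations["source"] = citations["source"].map(paper_idx)
citations["target"] = citations["target"].map(paper_idx)
```

If missing_ids comes back empty on the real files, the cause lies elsewhere, for example in mismatched ID types between the two files or a wrong column count in column_names. Dropping dangling citations is only one possible policy; adding placeholder rows for the missing papers would be another.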
