I have two datasets, Cora and CiteSeer. In the Cora dataset, the paper_id values are all numeric, for example:
123, 345, 675
In the CiteSeer dataset, they are a mix of numbers and strings:
435, der, 456
In addition, each dataset has citation data, which is a relation between paper_ids. In the Cora dataset it takes the form:
123 345
675 123
.
.
and in the CiteSeer dataset it takes the form:
435 der
der 456
The following code runs on the Cora dataset, but it does not run on the CiteSeer dataset:
```python
import os
import pandas as pd

# data_dir is assumed to be defined earlier and to point at the dataset folder.

# Citation edges: one tab-separated (target, source) pair per line.
citations = pd.read_csv(
    os.path.join(data_dir, "cora.cites"),
    sep="\t",
    header=None,
    names=["target", "source"],
)
print("Citations shape:", citations.shape)

# Paper rows: paper_id, 1433 term columns, then the subject label.
column_names = ["paper_id"] + [f"term_{idx}" for idx in range(1433)] + ["subject"]
papers = pd.read_csv(
    os.path.join(data_dir, "cora.content"),
    sep="\t",
    header=None,
    names=column_names,
)
print("Papers shape:", papers.shape)

# Map subjects and paper IDs to zero-based integer indices.
class_values = sorted(papers["subject"].unique())
class_idx = {name: idx for idx, name in enumerate(class_values)}
paper_idx = {name: idx for idx, name in enumerate(sorted(papers["paper_id"].unique()))}

papers["paper_id"] = papers["paper_id"].apply(lambda name: paper_idx[name])
citations["source"] = citations["source"].apply(lambda name: paper_idx[name])
citations["target"] = citations["target"].apply(lambda name: paper_idx[name])
papers["subject"] = papers["subject"].apply(lambda value: class_idx[value])
```
With the CiteSeer files, the lines that remap the citations columns raise:
KeyError: 'ghani01hypertext'
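
A possible explanation, assuming the commonly distributed CiteSeer files: the citations file mentions some paper IDs (such as 'ghani01hypertext') that have no row in the content file, so they never get an entry in paper_idx. Because the IDs mix digits and text, it also helps to read them as strings so the lookup keys have one consistent type. The following is only a minimal sketch on toy in-memory data. The helper name remap_ids and the toy rows are hypothetical and not taken from the real files.

```python
import io

import pandas as pd


def remap_ids(papers, citations):
    """Map string paper IDs to integer indices.

    Citations that mention an ID missing from `papers` are dropped
    instead of raising KeyError. Returns (papers, citations, n_dropped).
    """
    paper_idx = {
        name: idx for idx, name in enumerate(sorted(papers["paper_id"].unique()))
    }
    known_ids = set(paper_idx)

    # Keep only edges whose two endpoints both exist in the content table.
    mask = citations["source"].isin(known_ids) & citations["target"].isin(known_ids)
    citations = citations[mask].copy()
    citations["source"] = citations["source"].map(paper_idx)
    citations["target"] = citations["target"].map(paper_idx)

    papers = papers.copy()
    papers["paper_id"] = papers["paper_id"].map(paper_idx)
    return papers, citations, int((~mask).sum())


# Toy stand-ins for the .content and .cites files (illustrative rows only).
content_text = "435\t0\t1\tAI\n456\t1\t0\tML\nder\t1\t1\tAI\n"
cites_text = "435\tder\nder\t456\nghani01hypertext\t435\n"

papers = pd.read_csv(
    io.StringIO(content_text),
    sep="\t",
    header=None,
    names=["paper_id", "term_0", "term_1", "subject"],
    dtype={"paper_id": str},  # read IDs as strings, even the numeric ones
)
citations = pd.read_csv(
    io.StringIO(cites_text),
    sep="\t",
    header=None,
    names=["target", "source"],
    dtype=str,
)

papers_out, citations_out, dropped = remap_ids(papers, citations)
print("Dropped citations:", dropped)
print(citations_out)
```

With the real files, the term count in column_names would also have to match the CiteSeer content file rather than the 1433 used for Cora (it is commonly reported as 3703, which is worth checking against the file itself).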