Hi team,
I obtained the embeddings of all item nodes in the way shown below. Is that OK? I plan to build an embedding-based recall system on top of them later.
Here is the code I use to save the embeddings of all item nodes:
" # Evaluate
model.eval()
with torch.no_grad():
item_batches = torch.arange(g.num_nodes(item_ntype)).split(args.batch_size)
h_item_batches = []
for blocks in tqdm.tqdm(dataloader_test):
for i in range(len(blocks)):
blocks[i] = blocks[i].to(device)
h_item_batches.append(model.get_repr(blocks, item_emb))
h_item = torch.cat(h_item_batches, 0)
# torch.save(h_item, "embeddings.pth" + "_" + str(epoch_id))
h_pd = pd.DataFrame(h_item.cpu().numpy())
h_pd.to_csv("embeddings" + "_" + str(epoch_id) + ".csv")
print(evaluation.evaluate_nn(dataset, h_item, args.k, args.batch_size))"
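For context, `dataloader_test` is built essentially as in the DGL PinSAGE example (a sketch; the exact argument names in my script may differ):

```python
from torch.utils.data import DataLoader

# Sketch: iterate over all item node IDs; collate_test (quoted below)
# samples blocks for each batch.
dataloader_test = DataLoader(
    torch.arange(g.num_nodes(item_ntype)),
    batch_size=args.batch_size,
    collate_fn=collator.collate_test,
    num_workers=args.num_workers,
)
```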
My question: I noticed that in the `dataloader_test` dataloader, the neighbor sampler still runs before `get_repr()` is called:
" def collate_test(self, samples):
batch = torch.LongTensor(samples)
blocks = self.sampler.sample_blocks(batch)
assign_features_to_blocks(blocks, self.g, self.textset, self.ntype)
return blocks "
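So each pass through `dataloader_test` re-samples neighbors. A quick check (a sketch; I assume the collator instance is named `collator`, as in the PinSAGE example) shows the resulting representations are not deterministic:

```python
# Sketch: collate the same item batch twice; because sample_blocks draws
# random walks, the two representations usually differ.
batch = list(range(args.batch_size))
blocks_a = [b.to(device) for b in collator.collate_test(batch)]
blocks_b = [b.to(device) for b in collator.collate_test(batch)]
with torch.no_grad():
    h_a = model.get_repr(blocks_a, item_emb)
    h_b = model.get_repr(blocks_b, item_emb)
print(torch.allclose(h_a, h_b))  # typically False due to sampling noise
```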
Is it necessary to stop sampling for `get_repr()`, or to aggregate (e.g. sum) the embeddings of all linked neighbor nodes instead of a sampled subset?
Is there any other simple way to get all the item embeddings for the downstream scenario?
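One workaround I am considering (just a sketch, not sure it is the right approach): average the representations over several sampled passes, so the saved embeddings are less affected by sampling noise:

```python
# Sketch: average embeddings over n_runs stochastic passes to reduce the
# variance introduced by the random-walk sampler. n_runs is arbitrary.
n_runs = 5
h_runs = []
with torch.no_grad():
    for _ in range(n_runs):
        h_item_batches = []
        for blocks in dataloader_test:
            blocks = [b.to(device) for b in blocks]
            h_item_batches.append(model.get_repr(blocks, item_emb))
        h_runs.append(torch.cat(h_item_batches, 0))
h_item_avg = torch.stack(h_runs).mean(0)  # (num_items, hidden_dim)
```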
Any help would be appreciated. Thanks!