In PyTorch, I run inference with y = model(x), where x is a PyTorch Tensor and model is a DNN model. The code is as follows.

import time

import torch
import torch.nn.functional as F


class FCModel(torch.nn.Module):
    def __init__(self, in_dim, n_hidden_1, n_hidden_2, out_dim):
        super(FCModel, self).__init__()
        self.l1 = torch.nn.Linear(in_dim, n_hidden_1)
        self.l2 = torch.nn.Linear(n_hidden_1, n_hidden_2)
        self.l3 = torch.nn.Linear(n_hidden_2, out_dim)

    def forward(self, x):
        out = F.relu(self.l1(x))
        out = F.relu(self.l2(out))
        out = self.l3(out)
        return out


model = FCModel(4096, 2048, 1024, 22)
model.cuda()
...
x, label = next(train_loader)
x = x.cuda()  # Tensor.cuda() returns a copy, so the result must be assigned
t0 = time.time()
y = model(x)
#torch.cuda.synchronize()
print(time.time()-t0)

The inference is asynchronous: if torch.cuda.synchronize() is commented out, the printed time is very small, because model(x) returns as soon as the work is queued and the inference keeps running in the background.
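
To make the comparison concrete, here is a minimal sketch of timing the same forward pass with and without the synchronize call. The helper name timed_forward is my own, and a single Linear layer stands in for FCModel; the script falls back to the CPU when no GPU is present, in which case both timings should be about the same.

```python
import time

import torch


def timed_forward(model, x, sync):
    """Run one forward pass and return (output, elapsed seconds)."""
    on_gpu = x.is_cuda
    if on_gpu:
        torch.cuda.synchronize()  # drain work queued before the timer starts
    t0 = time.perf_counter()
    with torch.no_grad():
        y = model(x)
    if sync and on_gpu:
        torch.cuda.synchronize()  # wait until the forward pass has really finished
    return y, time.perf_counter() - t0


# Falls back to the CPU, where execution is synchronous anyway.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4096, 22).to(device)  # stand-in for FCModel
x = torch.randn(64, 4096, device=device)

timed_forward(model, x, sync=True)  # warm-up, so one-time CUDA setup is not timed
y, launch_time = timed_forward(model, x, sync=False)
y, total_time = timed_forward(model, x, sync=True)
print(f"without synchronize: {launch_time:.6f}s, with synchronize: {total_time:.6f}s")
```

On a GPU, the first number measures only how long it takes to queue the work, while the second includes the time the device needs to finish it.
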
When I use DGL with PyTorch as the backend, the code is as follows.

import time

import dgl
import torch
import torch.nn.functional as F

g = dgl.contrib.graph_store.create_graph_from_store(
    args.dataset, "shared_mem")
train_loader = iter(dgl.contrib.sampling.NeighborSampler(g, args.batch_size,
                                                         args.num_neighbors,
                                                         neighbor_type='in',
                                                         shuffle=True,
                                                         num_workers=16,
                                                         num_hops=args.n_layers+1,
                                                         seed_nodes=train_nid,
                                                         prefetch=False))
model = GCNSampling(in_feats,
                    args.n_hidden,
                    n_classes,
                    args.n_layers,
                    F.relu,
                    args.dropout)
model.cuda()
...
nf, label = next(train_loader)  # type(nf) is NodeFlow
nf.copy_from_parent()
t0 = time.time()
y = model(nf)
#torch.cuda.synchronize()
print(time.time()-t0)

Here the printed time stays the same whether or not torch.cuda.synchronize() is commented out, which suggests that the inference is synchronous. I would like it to be asynchronous in this case as well.
Is this because DGL itself synchronizes during inference?
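
I cannot tell from these snippets whether DGL itself synchronizes. One way to narrow it down, sketched below on the assumption that the forward pass can be split into separate callables, is to time each stage twice: once when the call returns and once after the device has finished. The names profile_stages and host_read are hypothetical, and host_read is only an example of an operation known to block (Tensor.item() copies a value to the host); it is not a claim about what DGL does internally.

```python
import time

import torch


def host_read(t):
    """Hypothetical stage that reads a value back on the host."""
    _ = t.sum().item()  # .item() copies a scalar to the CPU and waits for the GPU
    return t


def profile_stages(stages, x):
    """For each stage, record when the call returned and when the device finished."""
    on_gpu = x.is_cuda
    report = []
    with torch.no_grad():
        for name, fn in stages:
            if on_gpu:
                torch.cuda.synchronize()  # start each stage from an idle device
            t0 = time.perf_counter()
            x = fn(x)
            returned = time.perf_counter() - t0
            if on_gpu:
                torch.cuda.synchronize()
            finished = time.perf_counter() - t0
            report.append((name, returned, finished))
    return x, report


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
l1 = torch.nn.Linear(4096, 2048).to(device)
l2 = torch.nn.Linear(2048, 22).to(device)
x = torch.randn(8, 4096, device=device)

out, report = profile_stages([("l1", l1), ("host_read", host_read), ("l2", l2)], x)
for name, returned, finished in report:
    print(f"{name}: returned after {returned:.6f}s, finished after {finished:.6f}s")
```

On a GPU, a stage whose two timings are nearly equal is one that blocks the host. Applying the same pattern around the individual steps of the NodeFlow forward pass may show where the waiting happens.
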