Error in mixed precision training

kaoumrf · September 26, 2021, 3:44pm

I’m trying to implement end-to-end mixed precision training following the docs. I have dgl-cu111 installed.

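from torch.cuda.amp import autocast

# Forward pass and loss under autocast (active only when use_fp16 is True)
with autocast(enabled=use_fp16):
    enc, y_pred = model(G, input_features, X)
    loss = loss_fn(y_pred.to(device), y.float())

if use_fp16:
    # Scale the loss before backward, then step and update through the scaler
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
else:
    loss.backward()
    optimizer.step()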

It’s returning this error:

DGLError: [15:34:37] /opt/dgl/src/array/cuda/sddmm.cu:148: Data type not recognized with bits 16

BarclayII · September 27, 2021, 7:25am

Did you follow the documentation to build DGL with FP16 support? The pip package is not compiled with FP16 by default.

cc @zihao for visibility.
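
Until an FP16-enabled build is in place, one possible stopgap is to catch this specific error and rerun the step in FP32. The sketch below is only an illustration: `run_with_fp16_fallback` and `step_fn` are hypothetical names, not part of DGL or PyTorch, and `step_fn` is assumed to be your own function that runs one training step and takes a `use_fp16` flag. It matches on the error text from the post above rather than importing `DGLError`, so it stays standalone.

```python
# Hypothetical helper, not part of DGL or PyTorch.
# The marker is the message text reported in the DGLError above.
FP16_ERROR_MARKER = "Data type not recognized with bits 16"


def run_with_fp16_fallback(step_fn, use_fp16=True):
    """Run step_fn(use_fp16) and retry in FP32 if the FP16 kernel is missing.

    step_fn is assumed to be a user-supplied callable that performs one
    training step and accepts a single boolean argument.
    Returns a tuple (result, fp16_used).
    """
    if not use_fp16:
        return step_fn(False), False
    try:
        return step_fn(True), True
    except Exception as err:
        # Re-raise anything that is not the "bits 16" data type error.
        if FP16_ERROR_MARKER not in str(err):
            raise
        return step_fn(False), False


# Usage with a stand-in step that mimics a build without FP16 kernels.
def fake_step(use_fp16):
    if use_fp16:
        raise RuntimeError("sddmm.cu:148: " + FP16_ERROR_MARKER)
    return 0.5


result, fp16_used = run_with_fp16_fallback(fake_step)
```

This only hides the symptom. The retry repeats the whole step, so it is safest when the failure happens in the forward pass, as it does here, before any optimizer or scaler state has changed.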
