Hi guys.
I’m reading my data from CSV files with CSVDataset, and I’m using a custom parser for a categorical node and edge feature named type,
based on tutorial 4.6 Loading data from CSV files. The problem is that the parser’s result is a dict, which is then converted into a 1D torch.tensor.
I tried returning a NumPy array already in 2D format, but the function expects a dict. How can I parse the categorical feature so that it has shape (N, in_size)?
Thanks in advance.
Note: here is the parser code, in case it’s needed:
from enum import Enum

import numpy as np
import pandas as pd


class ParseCategoricalFeature:
    def __call__(self, df: pd.DataFrame):
        EnumType = Enum('EnumType', ['ADDRESS','ARGV','BLOCK','FILE','IATTR','LINK','MMAPED_FILE','PATH','PIPE','PROCESS_MEMORY','SHM','SOCKET','TASK','XATTR','ACCEPT',
            'ACCEPT_SOCKET','ARG','BIND','CLONE','CLONE_MEM','CONNECT','EXEC','EXEC_TASK','FILE_LOCK','FILE_RCV','FREE','GETATTR','GETXATTR','GETXATTR_INODE',
            'LISTXATTR','MEMORY_READ','MEMORY_WRITE','MMAP','MMAP_EXEC','MMAP_READ','MMAP_WRITE','MUNMAP','NAMED','OPEN','PERM_APPEND','PERM_EXEC','PERM_READ',
            'PERM_WRITE','READ','READ_IOCTL','READ_LINK','RECEIVE','RECEIVE_MSG','RECEIVE_UNIX','SEND','SEND_MSG','SEND_UNIX','SETATTR','SETATTR_INODE','SETUID',
            'SETXATTR','SETXATTR_INODE','SH_ATTACH_READ','SH_ATTACH_WRITE','SH_CREATE_READ','SH_CREATE_WRITE','SHMDT','SH_READ','SH_WRITE','SOCKET_CREATE','SOCKET_PAIR_CREATE',
            'TERMINATE_PROC','TERMINATE_TASK','UNLINK','VERSION_ACTIVITY','VERSION_ENTITY','WRITE','WRITE_IOCTL'], start=0)
        parsed = {}
        for header in df:
            dt = df[header].to_numpy().squeeze()
            if header == 'type':
                # Map each type name to its enum index, stored as a float.
                list_type = []
                for e in dt:
                    list_type.append((EnumType[str(e).upper()].value) * 1.0)
                dt = np.array(list_type)
            parsed[header] = dt
        return parsed
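
One possible way to get a (N, in_size) feature is to keep returning a dict, but make the value stored under 'type' a 2D one-hot array instead of a 1D array of indices. The sketch below assumes that CSVDataset accepts 2D NumPy arrays as dict values, which I have not verified against a specific DGL version. TYPE_NAMES, one_hot and ParseOneHotFeature are hypothetical names for illustration, and the category list is shortened; the real parser would use the full list from the post.

```python
import numpy as np

# Hypothetical, shortened category list for illustration only;
# replace it with the full list of type names used in the parser above.
TYPE_NAMES = ['ADDRESS', 'ARGV', 'BLOCK', 'FILE']
TYPE_TO_INDEX = {name: i for i, name in enumerate(TYPE_NAMES)}


def one_hot(values, type_to_index=TYPE_TO_INDEX):
    """Map category names to a (N, num_types) float32 one-hot array."""
    values = np.atleast_1d(values)  # also handles a single-row column
    idx = np.array([type_to_index[str(v).upper()] for v in values], dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for category i.
    return np.eye(len(type_to_index), dtype=np.float32)[idx]


class ParseOneHotFeature:
    """Same dict-returning interface as the parser above, with a 2D 'type' entry."""

    def __call__(self, df):
        parsed = {}
        for header in df:
            dt = df[header].to_numpy().squeeze()
            if header == 'type':
                dt = one_hot(dt)  # shape (N, num_types) instead of (N,)
            parsed[header] = dt
        return parsed
```

With this, in_size equals the number of categories. If the 2D array is still flattened somewhere downstream, an alternative is to keep the 1D integer indices and one-hot encode (or embed) them inside the model instead.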