Question about the tree-LSTM implementation


I am new to dgl and currently reading the tutorial. In the implementation of tree-lstm:

class TreeLSTMCell(nn.Module):
    def __init__(self, x_size, h_size):
        super(TreeLSTMCell, self).__init__()
        self.W_iou = nn.Linear(x_size, 3 * h_size, bias=False)
        self.U_iou = nn.Linear(2 * h_size, 3 * h_size, bias=False)
        self.b_iou = nn.Parameter(th.zeros(1, 3 * h_size))
        self.U_f = nn.Linear(2 * h_size, 2 * h_size)

I do not understand why the shape of the variables (i.e., U_iou, U_f) should be 2 * h_size ? If we compare it with the original math equations (i.e., Equations 1-4) Shouldn’t it be h_size?


Also, I have another question about the implementation of Equation 2:

    def reduce_func(self, nodes):
        # concatenate h_jl for equation (1), (2), (3), (4)
        h_cat = nodes.mailbox['h'].view(nodes.mailbox['h'].size(0), -1)
        # equation (2)
        f = th.sigmoid(self.U_f(h_cat)).view(*nodes.mailbox['h'].size())

For equation 2, it seems that you totally ignore its first term, i.e., w^f * x_j, could you explain why?


the shape of U_iou and U_f is (2 * h_size, *) because what we are dealing with is a binary tree.
You could use separate parameters for the left child and the right child as:

self.U_iou_left = nn.Linear(h_size, 3 * h_size, bias=False)
self.U_iou_right = nn.Linear(h_size, 3 * h_size, bias=False)
self.U_f_left = nn.Linear(h_size, 2 * h_size)
self.U_f_right = nn.Linear(h_size, 2 * h_size)

However, their is no difference between these two implementations.

For your second question,

note that in SST task, the input features x_j is not all zero if and only if j is a leaf node(this is because in constituency parse tree, only leaf node has word in it). Thus for non-leaf nodes, we could ignore the term W^{(f)} x_j. For leaf nodes, the f_{jk} does not affect the total result at all because in (5) there is no c_{jl}, in this case c_j = i_j \odot u_j.

In summary, i don’t think it’s necessary to take W^{(f)} x_j into account.

However, for tasks where x_j is not all-zero at non-leaf nodes, this term is required.


Got it. Zihao, thanks a lot for your detailed explanation and the excellent tutorial! Actually, I am trying to use dgl to implement graph-lstm (, where I cannot ignore the term w^{(f)} * x_j. Your explanation is quite helpful!


Thanks for your attention, you are also welcome to notify us if you have some progress in reproduce graph-lstm so that we could add it to our model zoo (examples).


Sure. Will do that later.