# Question about the tree-LSTM implementation

#1

I am new to dgl and currently reading the tutorial. In the implementation of tree-lstm:

https://docs.dgl.ai/en/latest/tutorials/models/2_small_graph/3_tree-lstm.html

``````class TreeLSTMCell(nn.Module):
def __init__(self, x_size, h_size):
super(TreeLSTMCell, self).__init__()
self.W_iou = nn.Linear(x_size, 3 * h_size, bias=False)
self.U_iou = nn.Linear(2 * h_size, 3 * h_size, bias=False)
self.b_iou = nn.Parameter(th.zeros(1, 3 * h_size))
self.U_f = nn.Linear(2 * h_size, 2 * h_size)
``````

I do not understand why the shape of the variables (i.e., U_iou, U_f) should be 2 * h_size ? If we compare it with the original math equations (i.e., Equations 1-4) Shouldn’t it be h_size?

#2

Also, I have another question about the implementation of Equation 2:

``````    def reduce_func(self, nodes):
# concatenate h_jl for equation (1), (2), (3), (4)
h_cat = nodes.mailbox['h'].view(nodes.mailbox['h'].size(0), -1)
# equation (2)
f = th.sigmoid(self.U_f(h_cat)).view(*nodes.mailbox['h'].size())
``````

For equation 2, it seems that you totally ignore its first term, i.e., w^f * x_j, could you explain why?

#3

Hi,
the shape of `U_iou` and `U_f` is `(2 * h_size, *)` because what we are dealing with is a binary tree.
You could use separate parameters for the left child and the right child as:

``````self.U_iou_left = nn.Linear(h_size, 3 * h_size, bias=False)
self.U_iou_right = nn.Linear(h_size, 3 * h_size, bias=False)
self.U_f_left = nn.Linear(h_size, 2 * h_size)
self.U_f_right = nn.Linear(h_size, 2 * h_size)
``````

However, their is no difference between these two implementations.

note that in SST task, the input features x_j is not all zero if and only if j is a leaf node(this is because in constituency parse tree, only leaf node has word in it). Thus for non-leaf nodes, we could ignore the term W^{(f)} x_j. For leaf nodes, the f_{jk} does not affect the total result at all because in (5) there is no c_{jl}, in this case c_j = i_j \odot u_j.

In summary, i don’t think it’s necessary to take W^{(f)} x_j into account.

However, for tasks where x_j is not all-zero at non-leaf nodes, this term is required.

#4

Got it. Zihao, thanks a lot for your detailed explanation and the excellent tutorial! Actually, I am trying to use dgl to implement graph-lstm (https://arxiv.org/abs/1708.03743), where I cannot ignore the term w^{(f)} * x_j. Your explanation is quite helpful!

#5

Thanks for your attention, you are also welcome to notify us if you have some progress in reproduce graph-lstm so that we could add it to our model zoo (examples).

#6

Sure. Will do that later.