Description
```python
Q = self.attention_layernorms[i](seqs)
mha_outputs, _ = self.attention_layers[i](Q, seqs, seqs,
                                          attn_mask=attention_mask)
                                          # key_padding_mask=timeline_mask
                                          # need_weights=False) this arg do not work?
seqs = Q + mha_outputs
seqs = torch.transpose(seqs, 0, 1)
seqs = self.forward_layernorms[i](seqs)
seqs = self.forward_layers[i](seqs)
seqs *= ~timeline_mask.unsqueeze(-1)
```

Two questions about this block:

1. Why is the query `Q` computed as `self.attention_layernorms[i](seqs)`? Shouldn't the query be `seqs` multiplied by a query projection matrix `W_q`?
2. The line `seqs = Q + mha_outputs` is not clear to me. Could someone explain what it is doing?
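To make the first question concrete, here is a minimal sketch of the two variants I am comparing (the tensor shapes, `hidden_units`, and the standalone `W_q` layer are my own illustration, not code from this repo):

```python
import torch

hidden_units = 50        # assumed hidden size for illustration (args.hidden_units in the repo)
seq_len, batch = 200, 8

# (seq_len, batch, hidden) layout, as in the repo after torch.transpose(seqs, 0, 1)
seqs = torch.randn(seq_len, batch, hidden_units)

# What I expected: an explicit query projection W_q applied to seqs before attention.
W_q = torch.nn.Linear(hidden_units, hidden_units, bias=False)
Q_expected = W_q(seqs)

# What the repo does: only LayerNorm on seqs, which is then passed as the query
# to nn.MultiheadAttention.
layernorm = torch.nn.LayerNorm(hidden_units, eps=1e-8)
attn = torch.nn.MultiheadAttention(hidden_units, num_heads=1, dropout=0.0)
Q_repo = layernorm(seqs)
mha_outputs, _ = attn(Q_repo, seqs, seqs)
print(mha_outputs.shape)  # torch.Size([200, 8, 50])
```

If I understand correctly, `nn.MultiheadAttention` already holds its own learned query/key/value projections (`in_proj_weight`), so is the explicit `W_q` simply applied inside the layer, with the extra `LayerNorm` being the pre-norm part of the Transformer block?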