
Calculation of Q in the code #35

@mengyangz86

Description

```python
Q = self.attention_layernorms[i](seqs)
mha_outputs, _ = self.attention_layers[i](Q, seqs, seqs,
                                          attn_mask=attention_mask)
                                          # key_padding_mask=timeline_mask
                                          # need_weights=False) does this arg not work?
seqs = Q + mha_outputs
seqs = torch.transpose(seqs, 0, 1)

seqs = self.forward_layernorms[i](seqs)
seqs = self.forward_layers[i](seqs)
seqs *= ~timeline_mask.unsqueeze(-1)
```

Two questions about this block:

- Why is `Q` calculated as `self.attention_layernorms[i](seqs)`? Shouldn't it be `seqs * W_q`?
- The line `seqs = Q + mha_outputs` is hard to understand.
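For context on the first question: `torch.nn.MultiheadAttention` owns the Q/K/V projection matrices itself, so whatever is passed as the query is still multiplied by a learned `W_q` inside the layer. A minimal sketch showing where that projection lives (dimensions here are made up for illustration; `W_q`/`b_q` are just local names for slices of the module's parameters):

```python
import torch

# Toy dimensions, purely for illustration.
embed_dim, num_heads, seq_len, batch = 8, 2, 5, 3

mha = torch.nn.MultiheadAttention(embed_dim, num_heads)

# in_proj_weight stacks W_q, W_k, W_v along dim 0: shape (3 * embed_dim, embed_dim).
print(mha.in_proj_weight.shape)  # torch.Size([24, 8])

# Slice out the query projection the question refers to.
W_q = mha.in_proj_weight[:embed_dim]
b_q = mha.in_proj_bias[:embed_dim]

# (L, N, E) layout, as in the snippet above (batch_first defaults to False).
x = torch.randn(seq_len, batch, embed_dim)

# Internally the layer computes roughly this on the query before attention,
# so the caller never multiplies by W_q explicitly:
q_projected = torch.nn.functional.linear(x, W_q, b_q)
print(q_projected.shape)  # torch.Size([5, 3, 8])
```

If that reading is right, `Q = self.attention_layernorms[i](seqs)` is just the pre-LayerNorm variant of the Transformer block (normalize before the sublayer instead of after), and `seqs = Q + mha_outputs` is that block's residual (skip) connection: the normalized input is added back onto the attention output.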
