
About topk in sparse attention #42

@lhao0301


Why is the top-k selection done across all query-key pairs together, rather than row-wise for every query separately? With the joint selection, only a part of the tokens may end up being updated in the attention module.

import torch
from einops import rearrange

# `scores` (h, it*s1, s2), `seqlen` (= it) and `topk` come from the surrounding code
attn_map = torch.softmax(scores, dim=-1)
# split each head's it*s1 query rows into `it` chunks of s1 queries
attn_map = rearrange(attn_map, 'h (it s1) s2 -> (h it) s1 s2', it=seqlen)
loop_num, s1, s2 = attn_map.shape
# flatten all (query, key) pairs of a chunk so top-k is taken over them jointly
flat = attn_map.reshape(loop_num, -1)
apply_topk = min(flat.shape[1] - 1, topk)
# the (k+1)-th largest value is the threshold; entries strictly above it form the top-k
thresholds = torch.topk(flat, k=apply_topk + 1, dim=1, largest=True).values[:, -1]
thresholds = thresholds.unsqueeze(1)
mask_new = (flat > thresholds).reshape(loop_num, s1, s2)
mask_new = rearrange(mask_new, '(h it) s1 s2 -> h (it s1) s2', it=seqlen)  # restore the original (h, it*s1, s2) shape
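
For comparison, here is a minimal sketch of the row-wise variant I have in mind (my own illustration, not code from the repo; `row_topk`, `row_thresholds`, `mask_rowwise` are names I made up). It assumes the rearranged `attn_map`, plus `s2`, `seqlen`, and `topk` from the snippet above, and takes top-k independently for every query row:

# per-query top-k: every row (query) keeps its own topk keys
row_topk = min(s2 - 1, topk)
# (loop_num, s1, 1): the (k+1)-th largest score in each row is that row's threshold
row_thresholds = torch.topk(attn_map, k=row_topk + 1, dim=-1, largest=True).values[..., -1:]
mask_rowwise = attn_map > row_thresholds  # (loop_num, s1, s2)
mask_rowwise = rearrange(mask_rowwise, '(h it) s1 s2 -> h (it s1) s2', it=seqlen)

With this version every query is guaranteed to attend to `topk` keys, whereas the joint selection can assign most of the budget to a few high-scoring rows and leave other queries with few or no selected keys.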
