Dear sir:
Looking at the code, each input batch of shape B x C is stored as B x N x M. The N weights sum to 1, so it effectively splits C into N pieces. This is called vague storage. But I don't understand why this kind of storage is used. What is the advantage of this storage?
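
To make the question concrete, here is a minimal NumPy sketch of how I read the storage step. It is only my assumption, not the actual code: the function name `soft_store`, the `scores` input, the softmax normalization, and the choice M = C are all hypothetical.

```python
import numpy as np

def soft_store(x, scores):
    """Spread each row of x over N slots using weights that sum to 1.

    x:      (B, C) input batch
    scores: (B, N) unnormalized slot scores (hypothetical; the real code
            may compute its weights differently)
    Returns (memory, weights), with memory of shape (B, N, M) and M == C.
    """
    # Softmax over the N slots so each row of weights sums to 1.
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each slot holds a weighted copy of the input vector.
    memory = weights[:, :, None] * x[:, None, :]
    return memory, weights

rng = np.random.default_rng(0)
B, C, N = 2, 4, 3
x = rng.normal(size=(B, C))
scores = rng.normal(size=(B, N))
memory, weights = soft_store(x, scores)
print(memory.shape)  # (2, 3, 4)
```

Under this reading, summing the N pieces along the slot axis gives back the original B x C input, which is what I mean by "splitting C into N pieces".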