Dear author:
Your NTM code is very nice work, and its structure is concise and easy to follow. However, I am a little confused about why each batch has its own memory. Why not use a single memory for every batch, just like an LSTM, only with a larger memory cell size? Could you help me clear up this confusion? Thank you very much!