Hi,
Very nice and clean code. However, as far as I can tell, the code uses only one positive sample to represent the future observations, whereas the paper uses one positive sample plus N-1 randomly sampled negative samples for the NCE loss. Is that correct?
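
To make the question concrete, here is a minimal sketch of the loss as I read it in the paper. It is not taken from this repository. It is a softmax cross-entropy over N scores, one for the positive sample and N-1 for the negatives. The function name `info_nce_loss` and the plain-list scores are my own assumptions for illustration.

```python
import math


def info_nce_loss(scores, positive_index=0):
    """Softmax cross-entropy over N scores (hypothetical helper).

    `scores` holds one score for the positive sample and N-1 scores
    for the negatives; `positive_index` marks the positive one.
    """
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    # -log( exp(s_pos) / sum_j exp(s_j) )
    return log_sum_exp - scores[positive_index]


# One positive (index 0) and three negatives, all scored equally.
loss_with_negatives = info_nce_loss([0.0, 0.0, 0.0, 0.0])

# One positive and no negatives.
loss_without_negatives = info_nce_loss([2.5])
```

With a single score and no negatives, this loss is always zero regardless of the score, which is why I am asking where the negatives come from.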
/Johan