Hi Guys,
Weirdly, I forgot the activation function (ReLU) in the former implementation, which has been fixed in c4e6230. As we know, without activation functions the whole network would essentially reduce to a chain of matrix multiplications. But when I double-checked the performance with vs. without ReLU, I found that, with the same hyperparameter settings and more training epochs, the model performs better without ReLU (even better than the paper's reported NDCG@10: 0.5701, HR@10: 0.8083):
epoch:540, time: 5452.976251(s), valid (NDCG@10: 0.6156, HR@10: 0.8421), test (NDCG@10: 0.5977, HR@10: 0.8220)
but after adding the ReLU back, I can only get:
epoch:600, time: 938.027581(s), valid (NDCG@10: 0.5907, HR@10: 0.8262), test (NDCG@10: 0.5658, HR@10: 0.7954)
So please feel free to revert the activation-function fix (i.e., drop the ReLU again) if you need better performance; a sketch of the toggle is shown below.
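For anyone who wants to experiment with this, here is a minimal sketch of a point-wise feed-forward block with the ReLU made optional. Names and shapes are illustrative and may not match the repo's module exactly:

```python
import torch

class PointWiseFeedForward(torch.nn.Module):
    """Point-wise feed-forward block with an optional ReLU (sketch only)."""

    def __init__(self, hidden_units, dropout_rate, use_relu=True):
        super().__init__()
        self.conv1 = torch.nn.Conv1d(hidden_units, hidden_units, kernel_size=1)
        self.dropout1 = torch.nn.Dropout(p=dropout_rate)
        # with use_relu=False the block degenerates to two stacked 1x1 convs
        self.act = torch.nn.ReLU() if use_relu else torch.nn.Identity()
        self.conv2 = torch.nn.Conv1d(hidden_units, hidden_units, kernel_size=1)
        self.dropout2 = torch.nn.Dropout(p=dropout_rate)

    def forward(self, inputs):
        # inputs: (batch, seq_len, hidden); Conv1d expects channels first
        x = inputs.transpose(-1, -2)
        x = self.dropout2(self.conv2(self.act(self.dropout1(self.conv1(x)))))
        x = x.transpose(-1, -2)
        return x + inputs  # residual connection

# e.g. the "no activation" setting compared above:
# ffn = PointWiseFeedForward(hidden_units=50, dropout_rate=0.2, use_relu=False)
# out = ffn(torch.randn(128, 200, 50))
```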
Moreover, replacing Adam with AdamW as the optimizer would also help the model converge a bit earlier:
Line 76 in e87342e:
adam_optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, betas=(0.9, 0.98))
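For reference, the swap is a one-liner. `model` and `args` are assumed to be the ones already defined in the training script; note that AdamW applies a default weight_decay of 0.01 unless you set it explicitly:

```python
import torch

# drop-in replacement for the Adam line above; lr and betas kept identical,
# `model` and `args` come from the surrounding training script
adam_optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, betas=(0.9, 0.98))
```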
Last point: since negative sampling introduces randomness into the training and testing phases, and I did not fix the random seeds, results are expected to be slightly different in your own experiments.
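If you want closer-to-reproducible numbers, a rough sketch for fixing the seeds would be something like the following, called once at the top of the training script before building the sampler and the model (exact reproducibility may still vary across hardware and CUDA versions):

```python
import random
import numpy as np
import torch

def fix_seeds(seed=42):
    # seed the RNGs involved in negative sampling, weight init and dropout
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines

# fix_seeds(2023)
```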
Regards,
Zan