Thank you for open-sourcing the code!
In the process of reproducing, the following problems appeared:
Error(s) in loading state_dict for BertForSequenceClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([4, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([2]).
I haven't solved it, can you help me? Thanks!