Reddit Dialogue Feedback Dataset

A dataset to learn which dialogue response gets better human feedback.

Go back Home View on Github EMNLP Paper

Leaderboard

Baselines

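We report pairwise accuracy: given two responses to the same context, a model is correct when it scores the response with better human feedback higher. A random guess is expected to reach 0.5 accuracy.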

Method                  updown   depth   width
Length baseline         0.531    0.543   0.591
Bag-of-words baseline   0.571    0.583   0.596
Dialog ppl.             0.488    0.508   0.513
Reverse dialog ppl.     0.560    0.557   0.571
DialogRPT               0.683    0.695   0.752
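
As a rough illustration of the metric in the table above, the sketch below computes pairwise accuracy for an arbitrary scoring function. The names `pairwise_accuracy`, `length_score`, and `toy_pairs` are hypothetical, the toy examples are made up and not taken from the dataset, and counting ties as incorrect is an assumption. The length scorer is only a stand-in and may differ from the length baseline reported here.

```python
from typing import Callable, Iterable, Tuple

# Each pair is (context, better_response, worse_response), where "better"
# means the response that received more human feedback (assumed layout).
Pair = Tuple[str, str, str]


def pairwise_accuracy(pairs: Iterable[Pair],
                      score: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the better response gets the strictly higher score.

    Ties count as incorrect here; that is an assumption of this sketch.
    """
    correct = 0
    total = 0
    for context, better, worse in pairs:
        total += 1
        if score(context, better) > score(context, worse):
            correct += 1
    if total == 0:
        raise ValueError("pairs must not be empty")
    return correct / total


def length_score(context: str, response: str) -> float:
    """Toy scorer: prefers longer responses (whitespace token count)."""
    return float(len(response.split()))


# Made-up examples for illustration only.
toy_pairs = [
    ("How was the game?", "It went to overtime and the crowd was wild.", "Fine."),
    ("Any book tips?", "Try the new one by that author, it is great.", "No."),
    ("Is it raining?", "Yes.", "Not here, but the forecast says later today."),
    ("Favorite food?", "Pizza.", "Tacos."),
]

print(pairwise_accuracy(toy_pairs, length_score))  # prints 0.5
```

Any model can be plugged in by passing a different `score` function with the same `(context, response)` signature.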

Submit new results!

Want to submit new results? Please create an issue!
