Reddit Dialogue Feedback Dataset

A dataset to learn which dialogue response gets better human feedback.


Leaderboard

Baselines

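We report pairwise accuracy for each feedback type (`updown`, `depth`, `width`): given two responses to the same context, a model must pick the one that received better human feedback. Random guessing is expected to score 0.5.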

| Model | `updown` | `depth` | `width` |
|---|---|---|---|
| Length baseline | 0.531 | 0.543 | 0.591 |
| Bag-of-words baseline | 0.571 | 0.583 | 0.596 |
| Dialog ppl. | 0.488 | 0.508 | 0.513 |
| Reverse dialog ppl. | 0.560 | 0.557 | 0.571 |
| DialogRPT | 0.683 | 0.695 | 0.752 |
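The pairwise accuracy metric used in the leaderboard can be sketched as follows. This is a minimal illustration, not the project's evaluation code. It assumes each test example is a pair of responses to the same context, labeled with which response received better feedback. The pair format, the `pairwise_accuracy` and `length_score` helpers, the tie handling, and the assumption that the length baseline prefers the longer response are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical pair format: (context, response_a, response_b, label),
# where label is 1 if response_a received better human feedback, else 0.
Pair = Tuple[str, str, str, int]


def pairwise_accuracy(pairs: Sequence[Pair], score: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the scorer ranks the preferred response higher.

    Ties count as half correct (an assumption of this sketch), so a scorer
    that cannot tell responses apart lands at the 0.5 random-guess level.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = 0.0
    for context, resp_a, resp_b, label in pairs:
        score_a, score_b = score(context, resp_a), score(context, resp_b)
        if score_a == score_b:
            correct += 0.5
        elif (score_a > score_b) == (label == 1):
            correct += 1.0
    return correct / len(pairs)


def length_score(context: str, response: str) -> float:
    # Hypothetical length baseline: longer responses (in words) score higher.
    return float(len(response.split()))


if __name__ == "__main__":
    # Toy pairs invented for illustration only; not taken from the dataset.
    toy_pairs = [
        ("any tips for learning go?", "read the tour of go and build a small cli tool", "lol", 1),
        ("best pizza topping?", "pineapple", "it depends on the crust, honestly", 0),
        ("cats or dogs?", "dogs", "cats", 1),
    ]
    print(pairwise_accuracy(toy_pairs, length_score))
```

Any model that assigns a score to a (context, response) pair, such as a perplexity-based scorer or a learned ranker, can be plugged in as `score` without changing the metric.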

Submit new results!

Want to submit new results? Please open an issue!