Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment

Corinna Cortes, Neil D. Lawrence, 2021.

Abstract

In this paper, we revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review. We determine that 50% of the variation in reviewer quality scores was subjective in origin. Further, seven years on from the experiment, we find that for accepted papers there is no correlation between quality scores and the impact of the paper as measured by citation count. We trace the fate of rejected papers, recovering the venues where they were eventually published. For these papers we find a correlation between quality scores and impact. We conclude that the reviewing process for the 2014 conference was good at identifying poor papers, but poor at identifying good papers. We give some suggestions for improving the reviewing process, but also warn against removing the subjective element. Finally, we suggest that the real conclusion of the experiment is that the community should place less emphasis on the notion of ‘top-tier conference publications’ when assessing the quality of individual researchers. For NeurIPS 2021, the programme chairs are repeating the experiment, as well as conducting new ones.