Peer review has long been the standard for quality assurance in scholarly research. But because it is inherently subjective and qualitative, its reliability as a method for evaluating research quality has often been questioned. Issues include biases in the selection of reviewers, the tendency of reviewers to evaluate according to their own interests, conflicts of interest, and biases in evaluating research (e.g. researcher age, university reputation) (Martin & Irvine, 1983; Smith, 1988; Langfeldt, 2001; Butler & McAllister, 2009). With the introduction of the Science Citation Index by Eugene Garfield in the 1960s, academics, administrators, and research policy experts began to ask whether bibliometric indicators, primarily citation-based, might provide an alternative, quantitative, and more objective measure of research quality that would not suffer from the drawbacks of peer review (Garfield, 1979). This trend gained momentum with the introduction of Scopus.
The academic and research policy communities have long debated the merits of peer review and quantitative citation-based metrics in the evaluation of research. Some have called for replacing peer review with metrics for some evaluation purposes, while others have called for the use of peer review informed by metrics. Whatever one’s position, a key question is the extent to which peer review and quantitative metrics agree. In this paper we study the relation between human expert judgment and three journal metrics: source normalized impact per paper (SNIP), raw impact per paper (RIP), and Journal Impact Factor (JIF). Using the journal rating system produced by the Excellence in Research for Australia (ERA) exercise, we examine the relationship over a set of more than 10,000 journals categorized into 27 subject areas.
We analyze the relationship along three dimensions: correlation, distribution of the metrics over the rating tiers, and ROC analysis. Our results show that, along every dimension measured, SNIP consistently has the strongest agreement with the ERA rating, followed by RIP and then JIF. The fact that SNIP has stronger agreement than RIP demonstrates clearly that the increase in agreement is due to SNIP’s database citation potential normalization factor. Our results suggest that SNIP may be a better choice than RIP or JIF for evaluating journal quality in situations where agreement with expert judgment is an important consideration.
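As a rough illustration of two of the analyses named above, the following minimal sketch computes a Spearman rank correlation and a ROC AUC between a journal metric and an ordinal rating. It is not the authors' code. The function names, the integer encoding of the tiers (higher means better), the choice of "top tier versus the rest" as the ROC target, and the toy values are all assumptions made for the example.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def roc_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one metric value and one ordinal tier per journal.
metric = [0.4, 0.9, 1.3, 2.1, 0.7, 3.0]
tier = [1, 2, 3, 4, 1, 4]  # assumed encoding: 4 is the top tier

print("Spearman:", spearman(metric, tier))
print("AUC (top tier vs rest):", roc_auc(metric, [t == 4 for t in tier]))
```

Running the same two functions once per metric (SNIP, RIP, JIF) against the same ratings would give directly comparable agreement scores, which is the kind of comparison the abstract describes.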
Haddawy, Saeed-Ul Hassan, A. Asghar, S. Amin, “A Comprehensive Examination of the Relation of Three Citation-Based Journal Metrics to Expert Judgment of Journal Quality”, Journal of Informetrics, vol. 10, issue 1, Elsevier, 2016.