### Publish or perish

Academics and academic administrators are always looking for ways to measure and quantify performance. That is, how can we tell whether someone is, or is not, doing excellent science? This is easy to assess in hindsight – excellent science is that which advances the field – but it is a surprisingly tough question to answer in real time, because the scientists doing the work are operating right at the boundaries of what is known. The scientific method provides the rules of the game, but it does not keep score: there is no infallible scorekeeper for the importance of just-discovered knowledge. This is why we enlist other active scientists to assess quality, a process called peer review. It is a cumbersome method. We use it for publishing scientific papers, for decisions on hiring and tenure, for preparing award nominations, and so forth. The real-time assessment of science and of scientists is a lot of work.

All of us would like short-cuts, and the seductive allure of short-cuts, whether to assess other scientists or to obtain greater recognition for ourselves, leads to fads.

The simplest short-cut is just to count the number of scientific papers someone has published. This is pretty crude, as it carries no measure of quality; indeed, I have known scientists who – instead of trying to discover, say, the photon or the phonon, the quanta of light and of sound – seem determined to discover the “publon”: the smallest quantum of new science necessary to give rise to a journal article.

Fundamentally, scientists write papers so that people will read them and understand the new science presented therein. While we have no readily available method for counting how many people read and are impressed by a paper, an indirect measure is available. Almost all scientific papers refer in their bibliographies to other scientific papers; these references are called citations. If someone cites a paper, they have (or should have!) read it. So, somewhat like the Google PageRank algorithm, we figure that if a paper is cited, it has value, and the more citations the better. Journals that publish highly cited papers are said to have high impact factors.

With this imperfect but not unreasonable way of keeping score, citations have become a blood sport for scientists. As a side note, scientists’ emphasis on and fascination with citations has had unintended consequences: one unfortunate example is the disappearance of books as a medium of scientific communication, since citations of books are not counted.

Immediately you will see one short-cut: simply publish in high-impact journals. This is a common strategy, sometimes carried to extremes. I recall a high-impact journal which would sort submissions by a letter-and-number code, e.g., LR6473, LJ3746, LA2938, and so on – always “L”, followed by a letter, followed by four digits. At one large conference a rumor swept the community: the second letter determined your fate! “A” through “J” meant your paper was sorted and consigned to rejection; “K” through “Z” was what you wanted – those papers got published! I asked an editor at the time. After a long, careful look he told me: “Martin, we start at A, then we go to B, and when we finish with Z, we start over again at A.”

Finally, let me review one last handy fad for measuring research impact, and show how it is related to other methods. And, for fun, please allow me to do a little mathematics in this blog.

The h-index was introduced by J. E. Hirsch in 2005 (see PNAS 102 (46), 16569 (2005)). A person has an h-index of h if h of their papers have each been cited at least h times. This is relatively easy to obtain, as it involves only a person’s most highly cited papers; by contrast, the total number of citations or the total number of papers requires more work, tracking down obscure papers. It is also more direct than counting papers in “high-impact journals”, since a journal’s impact factor reflects how heavily its previously published papers were cited, not necessarily the quality of the work under scrutiny.
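The definition translates into just a few lines of code. Here is a minimal sketch in Python (the function name and input format are my own illustration): given a list of per-paper citation counts, sort them in descending order and find the largest rank h whose paper has at least h citations.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 1]))  # prints 3: three papers have >= 3 cites
```

In the example, three papers (with 25, 8, and 5 citations) have at least 3 citations each, but the fourth-ranked paper has only 3 < 4, so h = 3.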

These days, it is a tedious reality of academic administration to hear about everyone’s h-index. But it is not a new thing, or a magic bullet. It is straightforward to relate analytically the h-index to the number of papers and/or to the total number of citations with a few assumptions.

Say the number of cites of a paper is c. If, in a given field, papers cite χ other papers on average, then χ is also the average number of cites received per paper. Let P(c) be the probability distribution of cites, which we will assume is normalized. The first and second moments are χ = ∫_{0}^{∞} dc c P(c) and σ^{2} = ∫_{0}^{∞} dc (c−χ)^{2} P(c). Using the normalization of P(c), assuming h is reasonably close to χ with h ≪ N, and noting that the magnitude of P scales as 1/σ, we obtain h ≈ χ + *O*(σ). Such a result is intuitively obvious as well.
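For those who like the step spelled out, here is one way to arrive at that estimate (a back-of-the-envelope sketch under the stated assumptions, not a derivation from Hirsch’s paper):

```latex
% By definition, h is fixed by requiring that h of the N papers
% have at least h cites:
N \int_{h}^{\infty} P(c)\, dc = h .
% Write h = \chi + z\sigma. For a distribution characterized by its
% mean \chi and width \sigma, the integral depends on h only through
% the scaled offset z. With h \ll N, the condition pins z at a number
% of order one, and hence
h \approx \chi + O(\sigma) .
```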

It remains to estimate σ. Arrange a person’s papers along a y axis according to their relationship to one another, so that related papers sit close together. Someone cites a paper, and is then more likely to cite a related paper, close by on the y axis. As time goes on, the subsequent citations trace out a one-dimensional random walk along the y axis. As is well known, the mean-square fluctuations in the y direction of such a walk are proportional to the time taken. Here the time taken is the total number of cites, estimated as Nχ. The square root of this, the standard deviation in y, will be proportional to the standard deviation of c. Hence σ = *O*(√N) and, as the data in Hirsch’s paper are observed to behave, h ≈ χ + *O*(√N), where χ and the proportionality constant in front of σ are field dependent.
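The √N scaling can be seen to emerge in a simulation of the random-walk picture (a toy model of my own construction, following the description above; the function names and parameter values are illustrative): lay out the papers along the y axis, let Nχ citation events random-walk along it, and compute the resulting h-index.

```python
import random

def h_index(citations):
    # Largest h such that h papers have at least h citations each.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def random_walk_h(num_papers, chi, seed=0):
    # Toy model: N*chi citation events perform a 1-D random walk over
    # the paper axis, so citations cluster on groups of related papers.
    rng = random.Random(seed)
    cites = [0] * num_papers
    pos = num_papers // 2  # start in the middle of the "topic axis"
    for _ in range(num_papers * chi):
        pos = min(num_papers - 1, max(0, pos + rng.choice((-1, 1))))
        cites[pos] += 1
    return h_index(cites)

# Averaged over a few random seeds, h grows roughly like sqrt(N*chi):
for n in (100, 400, 1600):
    avg = sum(random_walk_h(n, 10, seed=s) for s in range(10)) / 10
    print(n, avg)
```

Quadrupling the number of papers N roughly doubles the simulated h, consistent with h ≈ χ + *O*(√N) rather than growth linear in N.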

So, after all that, note that the h-index is roughly equivalent to either the number of papers N or the total number of cites Nχ. The advantage of the measure is convenience, but it is not a magic bullet.