Posted: 2015-07-20 in discussion, research
arXiv has become a main source of information for statistics and machine learning. Daily email digests tell me what papers have been uploaded since yesterday, including authors, abstracts and a link to the paper. For me, on the receiving side, this is invaluable.

However, on the producing/publishing side, not everybody thinks that uploading papers to arXiv is a good idea. And there are several good reasons for this:

  • A paper on arXiv basically eliminates the double-blind reviewing process. Many conferences (e.g., ICML, NIPS, AISTATS) have reacted to this and explicitly allow authors to post papers on arXiv. Nevertheless, the papers are then often no longer double blind. I do not want to defend the double-blind reviewing process, but merely to highlight an issue. And when it comes to some journals, it might be a real problem to have a version online that has not yet been accepted.
  • Some people think that a paper on arXiv reduces the chances of the paper getting accepted at a conference. This is controversial at best, and I have not seen any statistical analysis or strong evidence.
  • There is a danger that the work is not sufficiently mature. The missing peer-reviewing process makes it very easy to upload research that is not publication-ready.
  • The fact that every paper has a time stamp can lead to a rat race in research where different groups want to stake their claim to an idea or algorithm. This goes hand in hand with the point above, since the temptation to put up not-yet-ready research while claiming ownership of an idea can be great. And if you have the right press offices behind your research, it is even possible to accelerate this effect.

Having said this, there are also good reasons for uploading papers to arXiv. Here are some:

  • Availability. Papers on arXiv are simply available to the entire world for free. Freely available papers are not only good for the readers who cannot or do not want to pay ridiculous subscription fees to publishers, but they are also good for the authors since it gives them visibility, potentially leading to new collaborations.
  • Pace of research. By essentially shortcutting a lengthy reviewing process, arXiv lets research advance much faster. Ideas are floating around and may initiate some new research by completely different people.
  • In a few cases, you might get immediate feedback or questions regarding your paper on arXiv. When this happens, it is invaluable: First, it shows that somebody is interested in your research, which makes us feel good. Second, questions usually point either at some mistake, at things that need clarification, or at missing related work.
  • arXiv shows the full history of a paper, making it possible to track the evolution of the work.
  • arXiv is a central source of information, not only collecting papers from a single conference, but from all kinds of journals, conferences, and tech reports. It is not uncommon that a search engine provides an arXiv link when I’m looking for a particular paper.
  • The arXiv time stamp allows you to claim an idea/algorithm (note this is also in the negative points).
  • “arXiv uncouples scientific communication from academic brownie point collection.” (Ryan Adams)

I am sure I forgot many points in these non-exhaustive lists, and I will add them when they appear.

In summary, without self-regulation there are some weak points of the system that can be exploited. Nevertheless, arXiv offers a great opportunity for us to make research immediately and centrally available, free of charge.

