The Paper of the Year (PoTY) exercise is a tradition in the Jacobsen Group at Harvard Chemistry and Chemical Biology. It is the group’s celebration of the advances made in organic chemistry over the past year.

Per group policy, the process comprises three steps:

  1. collection of all candidates into a list
  2. individual study of the papers on the list and submission of votes for each group member’s five favorite papers
  3. the PoTY group meeting, where we discuss the top papers and ultimately select our Paper of the Year

Step 1 involves collating papers that are relevant to organic reaction chemistry, broadly defined. For key journals such as Science, Nature, Nature Chemistry, and J. Am. Chem. Soc., all papers that fit within our research interests are included in the list. Papers from other journals are included at group members’ discretion.

The ~1500 papers in the list from Step 1 represent a human-curated body of literature that is germane to our group’s research (and likely to that of other groups that also focus on the development of organic catalytic methodology).

The motivating question is:

What insights can we gain from treating this list of documents as a corpus and applying statistical and semantic analysis to it?

From a data science perspective, these texts form a reasonably sized corpus from which we may glean insights that may not be apparent by reading at a human chemist’s level—that is, reading deeply and within a rich conceptual framework, but being limited by memory.
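As a rough illustration of what treating the list as a corpus can mean in practice, the sketch below counts how many documents each word appears in (its document frequency), one of the simplest corpus statistics. It is only a minimal sketch and not the analysis pipeline itself: the three titles are invented placeholders rather than entries from the PoTY list, and the names `tokenize` and `document_frequency` are hypothetical helpers introduced for this example.

```python
import re
from collections import Counter

# Hypothetical placeholder titles; NOT entries from the actual PoTY list.
titles = [
    "Enantioselective catalysis of a model reaction",
    "A catalytic method for selective bond formation",
    "Mechanistic study of a catalytic reaction",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(docs):
    """Count, for each token, the number of documents that contain it."""
    counts = Counter()
    for doc in docs:
        # A set ensures each token is counted at most once per document.
        counts.update(set(tokenize(doc)))
    return counts


df = document_frequency(titles)
print(df.most_common(5))
```

Counting documents rather than raw occurrences keeps a single repetitive text from dominating the statistics. On a real corpus, one would likely also filter out common function words such as "a" and "of" before interpreting the counts.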

Acknowledgements

I am indebted to all Jacobsen Group members, present and past, for making the PoTY exercise possible. Particular thanks to Prof. Eric Jacobsen for heading this exercise and Martin Reiterer for organizing the paper list. I gratefully acknowledge suggestions and thoughts from Diego Diaz, Prof. Richard Liu, Gabriel Lovinger, Petra Vojackova, Corin Wagen, and Michi Yasunaga.

Primary content for this analysis is copyrighted by ACS Publications, Wiley/Wiley-VCH, the Royal Society of Chemistry, and the National Academy of Sciences. No primary content has been, or will be, hosted on this website.