Research Paper Management with Emacs, org-mode and RefTeX
Update 3-11-14: Nuno Salgueiro in the comments led me to a RefTeX change that broke the “jump to this entry in notes.org” behavior (it seems “reftex-citation” now returns a list, even when there is only one entry). This can be fixed by changing (reftex-citation t) to (first (reftex-citation t)).
Update 1-19-11: I’ve added a screencast of me demonstrating how I use this setup to work with my papers, I’ve also re-written the “Workflow” section (due to the fact it was kind of confusing…) Hope this all helps :]
Update 4-27-12: olberger (in the comments section) has added what I consider an incredibly clever and useful function to help when writing papers. I’ve just finished tweaking it slightly for my purposes, but please check out his post, here. I’ll be adding what I did to this post… when I get around to it.
My labmates and I have been searching for a while now for methods to organize the mountain of research papers we collect as graduate students. I’ve tried a handful of approaches, and was happy using zim-wiki for a while, but entering info became a chore, and finding a paper could sometimes be a hassle.
My recent attempts at working with lisp have led me to switch to emacs, and in what seems to be a common occurrence, I wanted to do everything in emacs. As silly as that sounds, I believe I’ve found my solution to organize my papers through emacs.
Managing papers and references in emacs is nothing new, and I actually followed a few guides on how other people used org-mode and reftex to do so, specifically this post and this email. My hope with this initial post is to pull the bits together, show what I built on top of them, and show how I set up my org files to facilitate my workflow. If you don’t know what emacs and org-mode are or how to use them, give a quick search; there is plenty of info out there.
Setting up RefTeX
First, we want to load RefTeX whenever we use org-mode. This is well documented, and mine only differs in the citation formats I pass to RefTeX, and my additional key binding.
(defun org-mode-reftex-setup ()
  (load-library "reftex")
  (and (buffer-file-name)
       (file-exists-p (buffer-file-name))
       (progn
         ;; enable auto-revert-mode to update reftex when bibtex file changes on disk
         (global-auto-revert-mode t)
         (reftex-parse-all)
         ;; add a custom reftex cite format to insert links
         (reftex-set-cite-format
          '((?b . "[[bib:%l][%l-bib]]")
            (?n . "[[notes:%l][%l-notes]]")
            (?p . "[[papers:%l][%l-paper]]")
            (?t . "%t")
            (?h . "** %t\n:PROPERTIES:\n:Custom_ID: %l\n:END:\n[[papers:%l][%l-paper]]")))))
  (define-key org-mode-map (kbd "C-c )") 'reftex-citation)
  (define-key org-mode-map (kbd "C-c (") 'org-mode-reftex-search))

(add-hook 'org-mode-hook 'org-mode-reftex-setup)
Jump to Entry
The other difference I added was the binding of “C-c (” to org-mode-reftex-search, which I defined earlier in my init.el. This is the command that will jump to the entry in my org-mode file, and it is defined as follows:
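;; Original version: works when reftex-citation returns a single key.
(defun org-mode-reftex-search ()
  ;; jump to the notes for the paper pointed to at from reftex search
  (interactive)
  (org-open-link-from-string (format "[[notes:%s]]" (reftex-citation t))))

;; Updated version: reftex-citation now returns a list, so take its first element.
;; (`first' comes from the cl library; the built-in `car' does the same job.)
(defun org-mode-reftex-search ()
  ;; jump to the notes for the paper pointed to at from reftex search
  (interactive)
  (org-open-link-from-string (format "[[notes:%s]]" (first (reftex-citation t)))))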
Simple. But I was happy with the results. Update: since this post was first written, RefTeX has changed so that reftex-citation returns a list. The second definition above is the updated function that handles this :P
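If you are not sure which RefTeX version you have, a variant that accepts either return type avoids having to pick between the two definitions. This is only a sketch: the names my/reftex-first-key and my/org-mode-reftex-search are made up for illustration, and the fallback between the two Org function names assumes that newer Org releases provide org-link-open-from-string while older ones only have org-open-link-from-string.

```elisp
;; Sketch only: the my/ names are hypothetical, not part of RefTeX or Org.
(require 'org)
(require 'reftex)

(defun my/reftex-first-key (result)
  "Return the first citation key in RESULT.
RESULT may be a single key (a string) or a list of keys."
  (if (listp result) (car result) result))

(defun my/org-mode-reftex-search ()
  "Jump to the notes entry for the paper selected via RefTeX."
  (interactive)
  (let ((key (my/reftex-first-key (reftex-citation t))))
    (when key
      (let ((link (format "[[notes:%s]]" key)))
        ;; Newer Org releases call this `org-link-open-from-string';
        ;; fall back to the older name used in this post otherwise.
        (if (fboundp 'org-link-open-from-string)
            (org-link-open-from-string link)
          (org-open-link-from-string link))))))
```

To try it, bind “C-c (” to my/org-mode-reftex-search in place of org-mode-reftex-search.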
Making Org-mode work with you
Lastly, org-mode needs a few things to pull all this together. The first and most important is importing the bibtex file. RefTeX looks for a LaTeX \bibliography tag anywhere in the file; I place mine in an org-mode comment at the start of the file.
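As a rough illustration, the comment is a single line starting with “# ” followed by the \bibliography tag. The helper below is hypothetical (its name and prompt are not from any package) and simply inserts such a line at the top of the current buffer; the example path ~/research/refs.bib is an assumption based on the link abbreviations used in this post.

```elisp
;; Hypothetical helper; the name and the interactive prompt are illustrative.
(defun my/org-insert-bibliography-comment (bib-file)
  "Insert an org-mode comment naming BIB-FILE at the top of the buffer.
The inserted line looks like:  # \\bibliography{~/research/refs.bib}"
  (interactive "fBibTeX file: ")
  (save-excursion
    (goto-char (point-min))
    (insert (format "# \\bibliography{%s}\n" bib-file))))
```

Typing the line by hand works just as well; the function is only there to show its exact shape.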
The other thing needed is link abbreviations. While you could hardcode this into your citation formats, I prefer to use abbreviations in the citation formats and define their defaults elsewhere in my init.el:
(setq org-link-abbrev-alist
      '(("bib" . "~/research/refs.bib::%s")
        ("notes" . "~/research/org/notes.org::#%s")
        ("papers" . "~/research/papers/%s.pdf")))
These can be easily overridden in an org-mode file, which I actually do for the org-mode file I store the actual entries in. If I left it as is, following a “notes” link in this org-mode file would open the same file in a new window and jump to the entry in that one. Not quite what we want. So I override it locally by adding this line to the top of that file:
#+LINK: notes #%s
Now, if I follow a “notes” link in the entries file, it jumps to that entry in the same frame, while following a “notes” link in another org-mode file (or using my new reftex search addition) will open this file in a new frame and jump to the entry.
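To see what the abbreviations turn into, you can ask org-mode to expand a few links by hand. This is a small sketch that uses the same abbreviation list as above; the key Smith2010 is made up for the example.

```elisp
(require 'org)

;; Same abbreviations as in the init.el snippet; "Smith2010" is a made-up key.
(let ((org-link-abbrev-alist
       '(("bib"    . "~/research/refs.bib::%s")
         ("notes"  . "~/research/org/notes.org::#%s")
         ("papers" . "~/research/papers/%s.pdf"))))
  (list (org-link-expand-abbrev "bib:Smith2010")
        (org-link-expand-abbrev "notes:Smith2010")
        (org-link-expand-abbrev "papers:Smith2010")))
;; => ("~/research/refs.bib::Smith2010"
;;     "~/research/org/notes.org::#Smith2010"
;;     "~/research/papers/Smith2010.pdf")
```

With the per-file “#+LINK: notes #%s” override, the same “notes” link would instead expand to “#Smith2010”, which org-mode treats as a link to a Custom_ID in the current file.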
Workflow
My setup for this involves two main files: refs.bib, the main bibtex file, and notes.org, the org-mode file I use to manage the papers and store notes for each.
In notes.org my overall workflow follows a typical org-mode hierarchical layout, the key parent being “Papers” with each child heading being either a category or an entry for a paper, each with the appropriate or useful org-mode tags. Each paper gets its own headline, and I write my notes about the paper under it.
The hierarchical layout has children inheriting their parents’ tags, which is quite nifty. This is my initial lookup method when I’m looking for a paper. For example, if I want to find a paper that describes how to couple EDOT using an iron catalyst, I can type “C-c \” to do a tag search, type in one or all of the relevant keywords, and org-mode will show the entries matching those tag[s]. I can then expand those entries, see what notes I’ve written on the papers, and when I find the one I’m looking for, I can open the link to the pdf I’ve placed there using “C-c C-o”.
When I find a new paper I need to add, I initially gather all the data I need to use org-mode: the bibtex entry and the paper itself. I modify the bibtex key to fit with my scheme (FirstAuthorYear) but you can use whatever suits you best. I then save the paper using that bibtex key as the filename in another folder.
Note: I manage my bibtex entries by first saving each new bibtex entry as a separate file in a collective folder (due to the fact I usually export them from the journal’s website when I find the paper) and then I concatenate all the files in that folder to make a new bibtex file using
$ cat bibtex/*.bib > refs.bib
This feels a little messy, but it is the easiest solution I could think of; I’m sure I could set up a command to do this for me from emacs, but this is a low priority. The one problem with this is that if you change the bibtex file while org-mode is running, RefTeX will not see the changes. To pick them up, you need to enable “global-auto-revert-mode” in emacs. Supposedly, this is automatically enabled in emacs 23, but it seems to be disabled by default for me (23.2.1).
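For what it’s worth, the concatenation step is short enough to do from emacs. The following is a minimal sketch, not a tested part of this setup: the command name my/rebuild-refs-bib and the default locations (~/research/bibtex/ for the individual entries, ~/research/refs.bib for the result) are assumptions, so adjust them to wherever your files actually live.

```elisp
;; Sketch: `my/rebuild-refs-bib' and its default paths are assumptions.
(defun my/rebuild-refs-bib (&optional source-dir target)
  "Concatenate every .bib file in SOURCE-DIR into TARGET.
Roughly equivalent to `cat bibtex/*.bib > refs.bib'."
  (interactive)
  (let ((source-dir (or source-dir "~/research/bibtex/"))
        (target (or target "~/research/refs.bib")))
    (with-temp-file (expand-file-name target)
      ;; `directory-files' returns full paths in alphabetical order.
      (dolist (file (directory-files source-dir t "\\.bib\\'"))
        (insert-file-contents file)
        ;; `insert-file-contents' leaves point before the inserted text,
        ;; so move to the end before appending the next file.
        (goto-char (point-max))))))

;; Re-read files that change on disk, so RefTeX sees the rebuilt refs.bib.
(global-auto-revert-mode 1)
```

Run it with “M-x my/rebuild-refs-bib” after saving a new entry into the bibtex folder. Note that it overwrites the target file each time, just like the shell redirect does.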
Adding a new headline in my notes.org file is simplified by using RefTeX. I place my cursor on a new line and hit “C-c )” which is bound to “reftex-citation”. The first prompt is for a citation format (if more than one) and I have a few for different purposes. I hit ‘h’ for heading, which contains all the formatting for a barebones paper headline. This puts a new entry with the title of the paper as the headline, a properties list with custom-id of the bibtex key (this allows linking to this entry by its bibtex key), and a body containing a link to the pdf. After selecting the format, RefTeX prompts for a regex to search the bibtex file with, presenting a list of matching entries. Selecting the desired entry inserts the citation, in this case, the new entry.
This is how we exploit RefTeX: we create custom citation formats that are really org-mode tags and formatting. A few other formats I have are all org-mode links: one that links to the entry in the bibtex file itself, one that links to the pdf, and another that links to the entry in the org-mode file. I use org-mode link abbreviations to get general behavior that can be changed on a per-file basis.
Another option I recently added to this is a way to search for other info I may not have placed in a tag, such as an author or journal name. Here I shamelessly take advantage of having reftex loaded again. I bound “C-c (” to a custom command (org-mode-reftex-search) that will jump to the notes entry for the bibtex entry you select from the reftex-citation prompt.
And that’s that! So far, this is the most powerful approach I have found, and I know I’ve spent less time searching than with any other method I’ve tried. What’s also great about this is that org-mode’s exporting allows me to export this as HTML to serve up on our group’s website for the rest of my group to use. An additional benefit is that because I’m already gathering bibtex entries, when it comes time to write a paper, I already have all my citation data, and I can easily search a key to retrieve all my notes on that paper as well.
There are some weaknesses I’m still trying to work out, such as manually scraping bibtex entries and making sure everything has the proper filename. The problem really is that the journals aren’t consistent with these things (some don’t even provide bibtex export! Luckily, there’s bibutils to handle the conversions) and entries need to be tweaked and/or pdfs named according to the key. Ideally, I would like to find a database that I could script a tool against to scrape the data I need and name and place the files for me, but that is for another day/entry.
I’ve been trying to see about using attachments to handle the papers instead, but I haven’t been able to tweak it to my satisfaction just yet. Still trying though. This should allow me to attach multiple files for an entry (such as supporting info, etc.).