Corante

About this Author

Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship on his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek email him directly: derekb.lowe@gmail.com Twitter: Dereklowe

In the Pipeline

August 21, 2007

Sorting Through the Piles

Posted by Derek

We're in the final stages of moving into the newest version of Stately Lowe Manor. And that means, among many other more useful things, that I'll soon see my literature files for the first time in a few months. I've missed them.

Of course, paper files are slowly turning anachronistic, in the same way that paper libraries of the scientific literature are. (I've mused on their disappearance before). I now accumulate piles of PDF files and the like, scattered among folders on my hard drive(s). How to organize them?

That's what I'd like to ask people. I've come across reviews of various programs that are supposed to help with this kind of thing, but there are surely others that I don't know about. What I need is something that will cross-reference papers and graphics (in any format) as well as things like Excel files and such, and allow me to draw and return to connections between them. (At this point, I'm being paid for ideas as much as for anything else, and this is how I seem to generate them). There are several tools that can be made to do this work, with varying degrees of ease and efficiency, but I'm looking for something that's built specifically for the purpose.

One possibility is Yojimbo, which I haven't tried out yet. The ability to write down notes and ideas, with the relevant papers appended to them, is appealing. Does anyone out there have experience with this one, or with its competition? (And for that matter, if there are other ways that people find useful for generating interesting ideas, I'd be glad to hear about those, too. . .)

Comments (43) + TrackBacks (0) | Category: The Scientific Literature


COMMENTS

1. Grubbs the cat on August 22, 2007 1:37 AM writes...

I have recently switched my private email account to Gmail and find the way it organises mail (a similar problem) very appealing. Gmail assigns tags (of which you may have any number, just like keywords) to mails rather than putting each mail in only ONE folder.

I think in the context of Derek's problem it would be very helpful if the same principle could be applied for files on your PC. Looking forward to other comments as I get to the stage at work where I could use something like that...
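
Tagging files the way Gmail labels mail amounts to a many-to-many index between items and tags, rather than the one-to-one placement of a folder tree. A minimal Python sketch, assuming the tags are kept in memory alongside ordinary file paths; the class and method names are hypothetical and are not taken from Gmail or from any of the programs mentioned in this thread:

```python
from collections import defaultdict


class TagIndex:
    """Many-to-many index between file paths and free-form tags.

    Unlike a folder tree, where each file lives in exactly one place,
    a file here can carry any number of tags (the Gmail-label idea).
    """

    def __init__(self) -> None:
        # Two dictionaries kept in sync so lookups work in both directions.
        self._tags_by_file = defaultdict(set)
        self._files_by_tag = defaultdict(set)

    def tag(self, path: str, *tags: str) -> None:
        """Attach one or more tags (case-insensitive) to a file."""
        for t in tags:
            t = t.lower()
            self._tags_by_file[path].add(t)
            self._files_by_tag[t].add(path)

    def untag(self, path: str, tag: str) -> None:
        """Remove a single tag from a file; unknown tags are ignored."""
        tag = tag.lower()
        self._tags_by_file[path].discard(tag)
        self._files_by_tag[tag].discard(path)

    def tags_of(self, path: str) -> set:
        """Return a copy of the tags on a file (empty set if untagged)."""
        return set(self._tags_by_file.get(path, ()))

    def files_with(self, *tags: str) -> set:
        """Return the files that carry *all* of the given tags."""
        if not tags:
            return set()
        result = set(self._files_by_tag.get(tags[0].lower(), ()))
        for t in tags[1:]:
            result &= self._files_by_tag.get(t.lower(), set())
        return result


if __name__ == "__main__":
    idx = TagIndex()
    idx.tag("papers/paper_a.pdf", "oxidation", "review")
    idx.tag("data/assay_results.xls", "oxidation")
    print(sorted(idx.files_with("oxidation")))
    print(idx.files_with("oxidation", "review"))
```

Because a query is just a set intersection, a single PDF or spreadsheet can turn up under every topic it touches, which is the cross-referencing the post asks for.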

2. Kay on August 22, 2007 4:04 AM writes...

I consider a good full-text indexing and retrieval system for PDFs far more important than good organization or tagging. This doesn't mean that tagging is not useful.
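
Full-text retrieval of the kind described here (and provided by tools such as Spotlight, Google Desktop, or Copernic) rests on an inverted index: a map from each word to the set of documents containing it. The sketch below is a toy version, assuming the text has already been extracted from each PDF by some other tool; it does simple AND matching with no ranking, stemming, or OCR, and the names are hypothetical:

```python
import re
from collections import defaultdict

# Lowercase alphanumeric runs count as words; punctuation splits them.
_TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens."""
    return _TOKEN.findall(text.lower())


class FullTextIndex:
    """Toy inverted index: word -> set of document ids containing it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Index every word of an already-extracted document text."""
        for word in tokenize(text):
            self._postings[word].add(doc_id)

    def search(self, query: str) -> set:
        """Return the documents that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self._postings.get(words[0], ()))
        for w in words[1:]:
            result &= self._postings.get(w, set())
        return result


if __name__ == "__main__":
    index = FullTextIndex()
    index.add("paper_a.pdf", "Swern oxidation of primary alcohols")
    index.add("paper_b.pdf", "Dess-Martin oxidation")
    print(index.search("oxidation alcohols"))
```

Once every word is a key, where a file sits on disk stops mattering for retrieval, which is the point several commenters below make about desktop search.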

3. Nico on August 22, 2007 5:51 AM writes...

Hi Derek,

If you are using a Mac, one thing you could consider is a little program by Mekentosj called Papers. Papers is a little bit like iTunes for PDFs, in that it allows you to organize them in folders, rate them, have quick previews, etc. You can also search PubMed from within the application, download papers, etc. Furthermore, it will automatically try to detect and store bibliographic information with the paper.
The application was written by two PhD students in Amsterdam (who have since graduated) with the needs of the scientist in mind. This is the link:

http://mekentosj.com/papers/

Another option is to use a reference manager such as Endnote. The more recent versions allow you to import the paper directly into your Endnote database (i.e. no static links anymore), which should keep your reference collection nice and tidy and associated with your bibliographic information.

Hope this helps a little.

4. Zak on August 22, 2007 5:52 AM writes...

Devonthink is good for that. Not sure how well it deals with Excel. You can import, but I'm not sure how well.

5. Thomas E. McEntee on August 22, 2007 6:02 AM writes...

In the agnostic environment of web browsers, we don't know or really care what OS someone is using to build content, but the mention of 'Yojimbo' suggests that Derek is a Mac OS X user. Having entered the brave new world of computers by way of the MS-DOS (Microsoft) and VMS (for all you young folks, from Digital Equipment Corporation) operating systems, we first looked at a Mac with the perspective of 'pretty neat UI, but where's the command line?'. Macs are great for graphics, the OS is pretty good, etc., etc., but it doesn't come close to Microsoft Windows for database applications, and this idea of linking files, graphics, notes, and comments is really a database application. In the Windows world, there are lots of apps to do what Derek is looking for. Some of these fall in the category of "mind-mapping", and I suggest that you look at FreeMind, MindRaider, PlanFacile, and VYM (View Your Mind); these are all cross-platform programs, largely Java-based. Wikipedia (search on mind mapping) has a link to mind-mapping software. Most of these apps allow you to attach files, but of course, the devil is in the details...

6. JS on August 22, 2007 8:39 AM writes...

Of course, paper files are slowly turning anachronistic, in the same way that paper libraries of the scientific literature are....I now accumulate piles of PDF files and the like, scattered among folders on my hard drive(s).

I still print anything I want to read thoroughly, and keep the printed copies. I can't absorb PDF's and I certainly can't skim them.

Macs are great for graphics, the OS is pretty good, etc., etc., but it doesn't come close to Microsoft Windows for database applications...

I'm not sure why you think that. Even if users needed a command-line to interact with a database-driven application (and the vast majority of Windows users have never seen a command-line), modern MacOS is Unix.

7. BioPhD on August 22, 2007 8:59 AM writes...

1) Get a tablet PC. It was made for annotating PDFs. Really. If you get one, PDF Annotator is a must. With these two you can take notes in the margins, highlight, whatever. Just like with a paper document. You can even turn the highlighting/notes off if you want to print a clean copy.

2) Get Zotero. With the 1.0RC release last week they've added full-text searching, even inside of PDFs. It has MS Word / OpenOffice integration, although this part could use some work in order to be on par with Endnote. Either way, Zotero is actively under development, which is more than I can say for Endnote, which hasn't had a major improvement since about v3.0.

8. BioPhD on August 22, 2007 9:08 AM writes...

I guess that I should've mentioned that, as Zotero is a Firefox plugin, it's cross-platform. It allows multiple libraries, tagging of documents, and will automatically save webpages + author info + PDFs in many cases.

My only question / possible caveat is the scalability of the application. I haven't built a huge library in it yet, so no idea if it'll handle 3000 PDFs. Although, it's being developed as the next-generation-open-source-endnote-replacement so *in theory* it should be built for heavy lifting.

Zotero is also free.

9. Boghog on August 22, 2007 9:21 AM writes...

Macs are great for graphics, the OS is pretty good, etc., etc., but it doesn't come close to Microsoft Windows for database applications

Some fairly sophisticated database tools are now built right into the Macintosh OS:
MacOSX CoreData

In fact, Yojimbo is based on Core Data, which in turn is based on SQLite, an industrial strength relational database engine.

Finally, most of the major relational database applications run on Mac OS X (MySQL, PostgreSQL, Oracle).

10. Jonathan on August 22, 2007 9:33 AM writes...

As Nico points out, if you have a mac I don't think you have an excuse for not trying out Papers. The user interface is beautiful, and it's the best scientific application I've come across since Prism.

Once you accept that it can organize your pdfs better than you can, and that you can just use Papers to browse them (much as one uses iTunes for music) wondering how to file things becomes a worry of the past.

http://arstechnica.com/journals/apple.ars/2007/03/18/minireview-papers-for-os-x

11. Thomas E. McEntee on August 22, 2007 10:33 AM writes...

OK, Macs clearly have come a long way since my VMS and pre-Windows MS-DOS days in terms of their ability to provide database capabilities...but they were so poor in those days that no one thought seriously then about using them for this type of computing application. I probably should have known better than to open up the Mac vs. PC can of worms...that wasn't Derek's topic.

12. molecularArchitect on August 22, 2007 11:02 AM writes...

I second Nico's comments on "Papers" and "EndNote". I've used EndNote since v1.0 and have a huge legacy literature database that covers almost 20 years. Inertia alone has kept me from trying to move this database to other programs. EndNote is great for creating bibliographies in proposals and papers, but its searching capabilities could be better. The newest version allows one to include pdf files. You can add multiple keywords, which can be used as tags for cross-referencing papers.

Papers looks very promising. It's obvious that it was created by scientists who understand how scientists use information. Once I'm working again, I plan to enter my new pdfs in Papers and give it a thorough evaluation. Although I'm a committed Mac user, I do wish they would release a Windows version so that the program could gain a wider audience and the files would be transportable if one changes jobs and has to change platforms.

On a related note, after 18 long months of unemployment, I'm happy to report that I just accepted a new job. It means leaving my beloved Bay Area for the East Coast but I'm excited about rejoining the game.

13. lone electron on August 22, 2007 12:12 PM writes...

I'd also suggest investing in a Fujitsu ScanSnap. Some of the papers I have now are no longer available to me in PDF form. The ScanSnap converts directly to a text-searchable PDF document at a blazingly fast 36 pages per minute (18 double-sided pages per minute). All the stacks in my office disappeared in one weekend thanks to this baby. I would also recommend using it with Devonthink Pro (which does OCR as well).

http://www.devon-technologies.com/products/devonthink/index.html

14. Chrispy on August 22, 2007 3:35 PM writes...

Try getting sued and let someone else do it for you! No, seriously: the Attenex software package was pointed out to me the other day. It is made for lawyers to troll through masses of documents and emails and make connections between them. It makes this cool bubble diagram which shows all the associations between documents -- is there anything like this which would provide this kind of ongoing feedback to the user?

Check out the video demonstration at the bottom of the page here:
http://www.attenex.com/

I don't work for these guys and avoid lawyers all I can, but I really like how this software works (or is claimed to).

15. Theodore Price on August 22, 2007 5:44 PM writes...

Another vote for Papers! I spent a few days getting a couple thousand pdfs into it and matching them with pubmed and I think it is the best application I have used in a long time. At 30 bucks you cannot beat it.

16. biohombre on August 22, 2007 6:31 PM writes...

You did not mention a price range. In the hundreds-of-dollars range is a "mind-mapping" program, MindManager; this is the type of program Thomas McEntee mentions.

But for something that may be a bit too familiar for keepers of lab notebooks, and is in the price range of Yojimbo, there is also Notebook (list $50). In the past I have used FileMaker Pro to make my own digital notebook (to help search the paper one, create expt. tracking #s, and manage related pdfs). This "Notebook" metaphor entices me. You can get a trial download (http://www.circusponies.com/). One aspect I am concerned with is searchability in the text of pdfs. Spotlight does this (sometimes too well); I do not know how well "Notebook" integrates this in the notebook hierarchy the user creates.

Apparently, lawyers also use this (http://www.apple.com/business/profiles/mancino/index2.html).

17. JW18 on August 22, 2007 7:36 PM writes...

molecularArchitect:

Congrats. I guess we'll see if you or this guy in the link below go longer. What is your tale?

www.boston.com/news/globe/magazine/articles/2007/08/05/unemployed_17_months_and_counting/?rss_id=Boston+Globe+--+Globe+Magazine

18. molecularArchitect on August 22, 2007 10:31 PM writes...

JW18,

Thanks for the congrats and the link. I can really empathize with the guy in the story. I don't want to hijack this thread now, but I posted about my job search previously on this blog, back when Derek lost his job. Word of advice for all you younger people: expand your skillset beyond med chem now; the market sucks for those 45+.

19. Just me on August 22, 2007 11:01 PM writes...

I collect huge amounts of files of various sorts for professional use. I've given up on systematic filing: Google Desktop (Microsoft has a less functional version) is the way to go. Where you file or store things becomes immaterial; I just search based on key words. I always find the most relevant materials (usually the specific pdf I had in mind is at the top of the list). I dump pdfs/documents into high-level folders, but leave it at that.

20. JW18 on August 22, 2007 11:25 PM writes...

molecularArchitect: Could you fire me up some of the more informative links to these old threads about your and DL's job searches, if you have a chance? Thanks.

21. Thomas E. McEntee on August 23, 2007 7:34 AM writes...

Biohombre: Mind-mapping software costs range from $129-399 (MindManager) to $0 (FreeMind via SourceForge)

22. Kay on August 23, 2007 8:30 AM writes...

Based on the comments above, it looks like the timing is good for a discussion on the new Chinese and Indian CROs. WuXi seems to have captured the imagination of Wall Street.

23. Cat on August 23, 2007 9:19 AM writes...

After several years in the lab, I found that I had collected hundreds of pdf’s and was unable to find anything. I then devised my own file system based on publisher DOI numbers. Basically, the downloaded files are assigned into 8 categories: total synthesis, reactions, named reactions, structure synthesis, small drug molecules, reagents, reviews, and process development and chemistry. The article title and DOI number are placed in a MS Word reference document assigned to each category. Using the [Edit/Find] function, the reference document can be searched for a character string. For example, if I am looking for an oxidation, I would go to the reaction list and search for “oxidation”. This would flag all references with that term in the title. If I chose to, I could group these reactions in one location in the document for easier later access. I can link to the pdf with a hyperlink, search the reaction file folder manually for the folder with the desired DOI (easy since the files are automatically sorted into alphanumeric order), or use the [Start/Search/For Files or Folders] function. In the latter case, one does a copy/paste of the DOI to the search engine. The files found are displayed in the search window and a click on the pdf file name will open the file. Some of the earlier DOI’s are long strings which are not suitable for naming files. In this case, I assign a DOD (digital object designator) derived from the journal title, volume, issue, and starting page of the article of interest. I define a set of rules for this to assure consistency.

I currently have 17K files, 7.5K folders and 10 GB of data. A file search takes

24. Cat on August 23, 2007 9:37 AM writes...

Continued from my previous comment:

A file search takes less than 2 seconds and files are quickly accessible. It may be primitive but it works well for me.

(It seems the less-than character is problematic!)
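
Cat's naming scheme has two parts: use the DOI as the file name when it is usable, and otherwise fall back to a "DOD" built from journal, volume, issue, and starting page. Cat does not spell out the actual rules, so everything specific below is an assumption for illustration: the 60-character limit, the set of characters treated as unsafe, and the `JOURNAL_vVOL_iISSUE_pPAGE` format are hypothetical, as are the function names.

```python
import re
from typing import Optional

# Characters that are illegal or awkward in Windows and Mac OS file names
# (assumed set); runs of them are collapsed to a single underscore.
_UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')


def doi_to_filename(doi: str, max_len: int = 60) -> Optional[str]:
    """Turn a DOI into a file-system-safe stem, or None if it is too long."""
    stem = _UNSAFE.sub("_", doi.strip())
    return stem if len(stem) <= max_len else None


def make_dod(journal: str, volume: int, issue: Optional[int], first_page: str) -> str:
    """Build a 'digital object designator' from citation data.

    The rule here (punctuation stripped from the journal name, issue 0 when
    unknown) is a hypothetical example, not Cat's actual rule set.
    """
    journal_code = re.sub(r"[^A-Za-z0-9]", "", journal)
    issue_code = issue if issue is not None else 0
    return f"{journal_code}_v{volume}_i{issue_code}_p{first_page}"


def file_stem(doi: str, journal: str, volume: int,
              issue: Optional[int], first_page: str, max_len: int = 60) -> str:
    """Prefer the DOI-based name; fall back to the DOD for unwieldy DOIs."""
    return doi_to_filename(doi, max_len) or make_dod(journal, volume, issue, first_page)


if __name__ == "__main__":
    # Placeholder citation data, not a real article.
    print(file_stem("10.1000/example.123", "J. Example Chem.", 12, 3, "456"))
```

Whatever the exact rules, the value lies in their being deterministic: the same article always maps to the same name, so a copy/paste of the DOI (or the DOD) into an ordinary file search finds it, as Cat describes.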

25. HelicalZz on August 23, 2007 9:45 AM writes...

I used to rely heavily on SciFinder searches dumped into an EndNote database. When I had the paper physically or in pdf, I'd make a comment in the EndNote file noting where it was. Extremely handy.

I don't do this now, because I'm with an underfunded operation. But I will again in the future, I'm sure.

I was just made aware of Science Direct (http://www.sciencedirect.com/), which is mostly literature search but seems to have storage features as well. I haven't played with it much, but I will.

26. paiute on August 23, 2007 11:31 AM writes...

I want to put in a vote for the free Copernic desktop search software (http://www.copernic.com/). Unfortunately it is PC-only, so I use it only at work, but that is where it is most useful. I used to spend time filing pdfs and other documents in creative ways so they could be found later. Copernic indexes everything on your hard drive (and can also index corporate shared volumes, which Google Desktop Search would not do up until the time IT banned it). You can then search everything, including emails, for any text contained. Every word becomes a key word.

I am tempted to just toss every document into a big virtual pile and let Copernic sort them. After all, this is more or less the way Google searches the Internet.

The only drawback is that a lot of people in the company now come to me to find documents for them.

27. JSinger on August 23, 2007 12:01 PM writes...

I guess that I should've mentioned that, as Zotero is a Firefox plugin, it's cross-platform. It allows multiple libraries, tagging of documents, and will automatically save webpages + author info + PDFs in many cases.

I downloaded it but am having trouble figuring out how to put it to good use. For the typical routine of browsing PubMed / downloading and reading PDFs / archiving them, how do you suggest doing it with Zotero?

28. Derek Lowe on August 23, 2007 12:22 PM writes...

I've been using Zotero for a couple of days now (after reading about it in these very comments), and I'll probably do a more detailed post next week. So far, I think it's going to be quite useful.

When I get a useful PubMed reference, I click on the icon next to it in the address bar, and Zotero scoops up all the bibliographic data. I've named several folders with my areas of interest, and I drop it into one or more of them. I put my own notes, if any, in with it with the Notes tab. If I can get a PDF (through my company's access), I download that, and then use the Attachment tab to link the hard drive PDF to the others. I've also been using tags, which I will persist in thinking of as "index terms", to further identify each one.

29. Derek Lowe on August 23, 2007 12:43 PM writes...

Oh, and I should mention that there's a tool in the latest release to index and search the text inside the PDFs themselves. I haven't tried that one out, but I'm looking forward to using it. . .

30. Jonathan on August 23, 2007 3:49 PM writes...

Derek, you should really give Papers a whirl: instead of using Firefox to browse PubMed, you do that inside Papers (which uses Safari's web engine, iirc) and then import the papers you like directly that way.

It also has a nifty feature that, once it knows a journal's website, it will keep up to date with the 50 most recent publications in that journal, so you don't even need to rely on eTOC emails to keep up to date with the literature, you can just do it all with one app. It's like a NetNewsWire and iTunes had a baby that went to work in a lab.

31. bitter pill on August 23, 2007 5:49 PM writes...

Check out Evernote (evernote.com) as a good storage and search product for ideas, clips, etc.

32. Yusuf Tanrikulu on August 23, 2007 8:16 PM writes...

I think Spotlight includes pdf text searching. Just try hitting apple+space and type in a word which might appear in one of your pdfs.

33. molecularArchitect on August 23, 2007 10:51 PM writes...

Jonathan:

Your statement "It's like a NetNewsWire and iTunes had a baby that went to work in a lab" is a perfect description of Papers. You should send that to the developers!

The more I use it, the more impressed I am. I just wonder how well it will work with search results from SciFinder; I wish there were a direct tie-in à la PubMed. I haven't had the opportunity to test this because I currently have no access to SciFinder.

34. tom bartlett on August 24, 2007 8:12 AM writes...

There are lots of products out there. I'm sure the Devon or the Bare Bones products will be at or near the top of the list of most useful. One comment:
I think the Mac handles PDFs WAY better than Windows, as it is a native file format. Printing to PDF is trivial. Preview, IMHO, is nicer in terms of performance than Adobe Reader.


"Devonthink is good for that. Not sure how well it deals with Excel. You can import, but I'm not sure how well."


Off-topic, but have you tried my favorite calculator, Devon's free "CalcService"? You can type an expression, like (3*2)+6 into, say, Stickies or TextEdit and return the answer. I prefer this approach to traditional calculators.

36. JSinger on August 25, 2007 8:04 PM writes...

When I get a useful PubMed reference, I click on the icon next to it in the address bar, and Zotero scoops up all the bibliographic data.

Ah, thanks! I get it now. Yep, that does look pretty useful.

37. biohombre on August 26, 2007 7:56 PM writes...

I have to agree with Derek, Zotero is impressive. If they implement the search mode IN the pdf documents it will surpass commercial software I have. It already uses an organizational style I use currently, and adds those relational features for notes and images etc. Why would I want to pay money to upgrade my bibliographic database? !!!!

39. TFox on August 26, 2007 11:41 PM writes...

I like it simple: Google Docs for the notes, EndNote for the database, and a directory of PDFs for the files. When I need it, Spotlight locates and does full-text searching on the PDFs (plus everything else on my computer), and Google Docs does full-text on the notes. I rarely search in EndNote, but it's nice for formal queries and bibliography generation.

40. lone electron on August 27, 2007 12:45 PM writes...

One last comment in support of Devonthink Office. If you import all of those old Tet.Lett. articles that are simply image files (and thus have no searchable text) it will do the OCR work for you. Voila, a 1975 Tet.Lett that you can search. I'm not sure if this is still a problem with the journal, but it was definitely an issue 5 years ago.

41. Adam on August 28, 2007 11:33 AM writes...

I also downloaded Zotero at the recommendation of this blog, and like it so far. I'm still working on how to make it useful.

See this page for use with pubs.acs.org

Zotero Forum

(http://forums.zotero.org/discussion/770/pubsacsorg-not-recognized/#Item_0)

42. HIV_vax on August 30, 2007 10:35 AM writes...

I have tried a lot of things, and settled on Connotea for now.

http://www.connotea.org/

Pros:

- Easy to import directly from the journal page (either by viewing the abstract in Pubmed, in HTML on the web, or by DOI)
- Can Tag with your own keywords
- Easy to search/retrieve, intuitive

Cons:

- It's a social bookmarking tool, so by default your citations are public. (The upside is you can find similar references by searching for those who have tagged the same ref.)
- Not really suitable for writing bibliographies. Endnote is better for that.

43. azmanam on August 30, 2007 1:39 PM writes...

I was having problems in Zotero attaching pdfs. I'd get a "file does not start with %pdf" error. Very frustrating.

See here for the answer

http://forums.zotero.org/discussion/217/pdfs-saved-from-jstor-are-corrupt/

When you navigate to a pdf in your browser, you can click on the icon to the left (not the Zotero icon) and drag it to Zotero. Place it on the parent reference, and it adds the pdf as an attachment. Very helpful for attaching pubs.acs.org supporting information.
