About this Author
Derek Lowe, an Arkansan by birth, got his BA from Hendrix College and his PhD in organic chemistry from Duke before spending time in Germany on a Humboldt Fellowship for his post-doc. He's worked for several major pharmaceutical companies since 1989 on drug discovery projects against schizophrenia, Alzheimer's, diabetes, osteoporosis and other diseases. To contact Derek, email him directly. Twitter: Dereklowe


In the Pipeline


March 19, 2012

Dealing with the Data

Posted by Derek

So how do we deal with the piles of data? A reader sent along this question, and it's worth thinking about. Drug research - even the preclinical kind - generates an awful lot of information. The other day, it was pointed out that one of our projects, if you expanded everything out, would be displayed on a spreadsheet with compounds running down the left, and over two hundred columns stretching across the page. Not all of those are populated for every compound, by any means, especially the newer ones. But compounds that stay in the screening collection tend to accumulate a lot of data with time, and there are hundreds of thousands (or millions) of compounds in a good-sized screening collection. How do we keep track of it all?

Most larger companies have some sort of proprietary software for the job (or jobs). The idea is that you can enter a structure (or substructure) of a compound and find out the project it was made for, every assay that's been run on it, all its spectral data and physical properties (experimental and calculated), every batch that's been made or bought (and from whom and from where, with notebook and catalog references), and the bar code of every vial or bottle of it that's running around the labs. You obviously don't want all of those every time, so you need to be able to define your queries over a wide range, setting a few common ones as defaults and customizing them for individual projects while they're running.
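
The cross-referencing described here is, at bottom, a set of joined tables. A minimal sketch of that relational backbone, using Python's built-in sqlite3 - every table name, compound ID, and value below is invented for illustration, not any vendor's actual schema:

```python
import sqlite3

# In-memory database standing in for a corporate compound registry.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE compounds (cpd_id TEXT PRIMARY KEY, smiles TEXT, project TEXT);
CREATE TABLE batches   (batch_id TEXT PRIMARY KEY, cpd_id TEXT REFERENCES compounds,
                        source TEXT, notebook_ref TEXT);
CREATE TABLE vials     (barcode TEXT PRIMARY KEY, batch_id TEXT REFERENCES batches,
                        location TEXT);
CREATE TABLE assay_results (cpd_id TEXT REFERENCES compounds, assay TEXT, value REAL);
""")
con.execute("INSERT INTO compounds VALUES ('CPD-0001', 'c1ccccc1CN', 'Kinase-X')")
con.execute("INSERT INTO batches VALUES ('CPD-0001-B1', 'CPD-0001', 'in-house', 'NB1234-56')")
con.execute("INSERT INTO vials VALUES ('BC998877', 'CPD-0001-B1', 'Lab 3, freezer 2')")
con.execute("INSERT INTO assay_results VALUES ('CPD-0001', 'hERG IC50 (uM)', 12.5)")

# One query then pulls everything known about a compound across the tables:
# batch provenance, vial barcode and location, and every assay result.
row = con.execute("""
    SELECT c.cpd_id, b.batch_id, v.barcode, a.assay, a.value
    FROM compounds c
    JOIN batches b       ON b.cpd_id = c.cpd_id
    JOIN vials v         ON v.batch_id = b.batch_id
    JOIN assay_results a ON a.cpd_id = c.cpd_id
    WHERE c.cpd_id = 'CPD-0001'
""").fetchone()
print(row)
```

The real systems add structure and substructure search on top of this, but the join logic - compound to batch to vial, compound to assay - is the same.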

Displaying all this data isn't trivial, either. The good old-fashioned spreadsheet is perfectly useful, but you're going to need the ability to plot and chart in all sorts of ways to actually see what's going on in a big project. How does human microsomal stability relate to the logP of the right-hand side chain in the pyrimidinyl-series compounds with molecular weight under 425? And how do those numbers compare to the dog microsomes? And how do either of those compare to the blood levels in the whole animal, keeping in mind that you've been using two different dosing vehicles along the way? To visualize these kinds of questions - perfectly reasonable ones, let me tell you - you'll need all the help you can get.
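
A toy version of that first question - microsomal stability against side-chain logP, for one series, under a molecular-weight cutoff - looks like this in plain Python. Every number and property name here is invented for illustration:

```python
import math

# Hypothetical per-compound records: series, MW, side-chain logP, and
# human liver microsome half-life (minutes). All values are made up.
compounds = [
    {"series": "pyrimidinyl", "mw": 410, "logp": 2.1, "hlm_t12": 34.0},
    {"series": "pyrimidinyl", "mw": 398, "logp": 3.0, "hlm_t12": 21.0},
    {"series": "pyrimidinyl", "mw": 520, "logp": 3.9, "hlm_t12": 12.0},  # over the MW cut
    {"series": "pyridyl",     "mw": 380, "logp": 1.5, "hlm_t12": 55.0},  # wrong series
    {"series": "pyrimidinyl", "mw": 415, "logp": 4.2, "hlm_t12": 9.0},
]

# The question from the post: stability vs. logP, pyrimidinyl series, MW < 425.
subset = [c for c in compounds if c["series"] == "pyrimidinyl" and c["mw"] < 425]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson([c["logp"] for c in subset], [c["hlm_t12"] for c in subset])
print(f"n={len(subset)}, r={r:.2f}")
```

The point is less the arithmetic than the filtering: every one of these questions is a slice-then-compare operation, and the visualization tools discussed in the comments below are essentially doing this at scale.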

You run into the problem of any large, multifunctional program, though: if it can do everything, it may not do any one thing very well. Or there may be a way to do whatever you want, if only you can memorize the magic spell that will make it happen. If it's one of those programs that you have to use constantly or run the risk of totally forgetting how it goes, there will be trouble.

So what's been the experience out there? In-house home-built software? Adaptations of commercial packages? How does a smaller company afford to do what it needs to do? Comments welcome. . .

Comments (66) + TrackBacks (0) | Category: Drug Assays | Drug Development | Life in the Drug Labs


1. PPedroso on March 19, 2012 7:35 AM writes...

I work in a small company, and so far it's Excel spreadsheets all the way, but we do not have that many projects (so far!) and things are still manageable...
From my perspective, more difficult than managing and processing the data is managing and processing all the different documents from different sources (like reports and preliminary results, etc.) regarding that data...


2. MattF on March 19, 2012 8:08 AM writes...

And I'd think you also need what computer people call 'version control'-- when does new data become reliable enough to supersede old data, and to what degree, and who should have access to the new data before it is deemed reliable?
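
One common way to get that kind of versioning is to append new results rather than overwrite old ones, with a released flag controlling who sees what. A hypothetical sketch (values and dates invented):

```python
import datetime

# Each result is appended with a timestamp and a 'released' flag; the
# "current" value is the newest released one, but unreleased reruns stay
# visible to whoever is allowed to see pre-QC data.
history = []

def record(value, released, when):
    history.append({"when": when, "value": value, "released": released})

def current(include_unreleased=False):
    rows = history if include_unreleased else [r for r in history if r["released"]]
    return max(rows, key=lambda r: r["when"])["value"] if rows else None

d = datetime.date
record(12.5, released=True,  when=d(2012, 3, 1))   # original IC50, vetted
record(8.9,  released=False, when=d(2012, 3, 15))  # rerun, not yet through QC
print(current())                          # the project team sees 12.5
print(current(include_unreleased=True))   # QC reviewers see the rerun, 8.9
```

Nothing is ever destroyed, so "when did we believe what?" stays answerable - which is the version-control property MattF is asking for.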


3. MarkE on March 19, 2012 8:29 AM writes...

We are able to use an integrated 'report-puller' at the small company where I work: a custom-built program which can pull user-defined data from our central data capture system. Creates large but (more) manageable Excel files.


4. Anonymous on March 19, 2012 8:30 AM writes...

There are excellent open source and free relational database management systems (RDBMS) out there, my favorite is PostgreSQL but there is also MySQL and SQLite. For visualizing and analyzing data, there is the trusty free and open source R statistics package which can plug in to the aforementioned RDBMSs. R is very widely used in the biomedical setting and has many excellent extensions. For general data handling, there are the widely used scripting languages like Perl and Python. Even though all the packages mentioned here are free, it takes time to develop proficiency but the payoff is well worth it, even for small companies. Spreadsheets are fine for data analysis up to a point but are no substitute for databases when it comes to data storage. My experience is based on bioinformatics but databases have an obvious general utility and I believe both Python and R have extensions and libraries for chemical data too.
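
As a concrete illustration of the database-plus-scripting split this commenter describes - with the stdlib's sqlite3 standing in for PostgreSQL and Python's statistics module standing in for R, and all data invented:

```python
import sqlite3
import statistics

# SQLite database holding replicate assay values per compound.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE results (cpd_id TEXT, assay TEXT, value REAL)")
con.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    ("CPD-1", "IC50", 0.12), ("CPD-1", "IC50", 0.18),
    ("CPD-2", "IC50", 1.40), ("CPD-2", "IC50", 1.10), ("CPD-2", "IC50", 1.20),
])

# SQL does the storage and selection; the scripting layer does the statistics.
summary = {}
for (cpd_id,) in con.execute("SELECT DISTINCT cpd_id FROM results"):
    vals = [v for (v,) in con.execute(
        "SELECT value FROM results WHERE cpd_id = ? AND assay = 'IC50'", (cpd_id,))]
    summary[cpd_id] = (round(statistics.mean(vals), 3), len(vals))
print(summary)
```

This is the division of labor the comment argues for: the database holds the data durably and queryably, and spreadsheets never become the system of record.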


5. ChrisS on March 19, 2012 8:38 AM writes...

I'd say that for both small and large pharma there are more similarities than differences in needs when the chemist asks "What should I make next?" and the biologist asks "How did those results turn out?". While small pharma may have smaller datasets, in either case once you get beyond a certain number of columns of data and rows of structures in a spreadsheet, it becomes difficult to make comparisons even with conditional formatting such as color/stoplighting to help out. In the beginning we wrote simple web-based tools that queried an Oracle database and returned grids of data, but it was quickly apparent that this was not enough.

There are now several platforms that are available for small to medium pharma which are cost-effective and help enormously with the task of selecting and arranging data in a form that eases decision making. I would say that something like this is a sine qua non for effective drug discovery so be prepared to make the pitch and spend the money.

We chose Dotmatics Browser, a tabbed forms and tabular data query/viewing platform which is very flexible and easy to maintain. We've configured each project with its own collection of forms, with a primary summary form that shows data for all the assays on the project's progression scheme. This form also shows structure and any other data the project deems critical for comparison purposes. Typical of forms-based applications, one queries the data by entering terms in any of the fields. Along with the main project summary form I've configured a set of standard template forms for the "usual suspects" - chem properties, in-vitro DMPK, in-vivo DMPK and so on. These appear in every project as well. To handle documents, I have an indexer on the fileshares containing the reports, and this is linked to a tab in Dotmatics so it returns documents containing the current compound ID as you move among the compounds in the query result. Query results can be pushed out to preconfigured tabular views easily.

What I don't want is my chemists and biologists spending time pushing data around to get it into a form they can use; better to present the data in a usable form and let them think about what to make next.


6. dmc on March 19, 2012 8:40 AM writes...

"...To visualize these kinds of questions - perfectly reasonable ones, let me tell you - you'll need all the help you can get."

This is the key point of the discussion. Regardless of the software solutions available, you need good scientists who will actively manage and analyze the loads of data coming in. In my experience this responsibility has always fallen on the med chemists, because we need to understand all of the data relationships and dependencies in order to design the next set of analogs. With the recent gutting of med chem talent in the industry, this is an 'art' that is being lost. You can outsource all the compound synthesis you want, but if no one is left around who understands how to properly interpret all the data being generated, you might as well go back to combichem!


7. Rajarshi on March 19, 2012 8:41 AM writes...

While RDBMSs + R are a potent combination, they definitely require informatics support and/or expertise. Certainly, from the informatics side of things, this is a great combination, and coupled with chemistry toolkits and modern vis frameworks, pretty much anything is possible.

But from the bench scientist's point of view, I'd argue that without appropriate (i.e. usable) interfaces, it's all useless.



9. HelicalZz on March 19, 2012 9:02 AM writes...

I too am in a small company with a modest number of data and compound sets. Excel is just barely manageable for this, though. It really requires a database, and people who know how to manage and use it. Document and data retention, organization, management, etc. is something small companies don't pay enough attention to, in my opinion (until forced to).

So the question becomes: how many chemists train in database management and use? Why not more? If you are at a large company, there is really little excuse not to get yourself at least a basic level of this training (speaking from the experience of not taking advantage of opportunities like that myself when I should have).



10. LionelC on March 19, 2012 9:04 AM writes...

I am new at a small chemical start-up. I have to select and implement a new system for managing the data, from the chemical structure (with exact stereochemistry) to the certificate of analysis and the biological data.
It is not easy to see what the solutions are, so any comments are welcome.

But I think it clearly depends on what your needs are. Is it to preserve your know-how, or is it to do med chem and select the next compound to make?


11. John Wayne on March 19, 2012 9:10 AM writes...

If you can afford it, Spotfire is an excellent program. You can import a spreadsheet and pretty easily visualize your data in many dimensions. You can easily save your work as a picture file, and they make handy tools in presentations.


12. Miramon on March 19, 2012 9:11 AM writes...

I'm curious who the industry leader is in this software category. Is the quality -- usability and power -- of the software generally satisfactory for most users? Do "semantic" features add value here?


13. Moses on March 19, 2012 9:11 AM writes...

# 10. LionelC:
You have a choice of several established companies.
The big fish is Accelrys/MDL/Symyx; there are also CambridgeSoft, ChemAxon and Dotmatics. I've worked for a forerunner of Accelrys and like their Accord software, but Dotmatics is probably a neater option these days.


14. SP on March 19, 2012 9:20 AM writes...

Schrodinger Seurat works pretty well and is cheaper than the "big fish." I've worked with Cambridgesoft in the past, was not very happy with it.


15. LionelC on March 19, 2012 9:27 AM writes...

Thanks #13 and #14 for the software suggestions.

I just have to add: OK for R or Spotfire and so on, but clearly I think the first and most important thing is to manage the chemical structures correctly. If your structures are wrong, the rest will be too...


16. AJ on March 19, 2012 9:40 AM writes...

Spotfire is great if you have "clean/organized" data, and one can be productive within a VERY short time - one of the best tools I have EVER worked with. To actually "work" with data, one shouldn't forget KNIME in combination with R... seen from a compound and HCS/imaging screening perspective. Matlab is a classic and very powerful anyhow, though one really needs to know the details...



17. CMCguy on March 19, 2012 9:49 AM writes...

My experience ranges from big to small to medium companies, and there was a commonality: individuals and groups/departments tended to have collections of their own Excel spreadsheets for info that was important or that they wanted readily accessible. Most also had, or moved to at some point, customized in-house Oracle databases; while these may have been more comprehensive overall, they were typically not very user-friendly and needed certain expertise to enter or extract anything.

I have heard it proposed that the inability of bioinformatics to keep up was the true reason combichem did not achieve many successful applications, because it generated more data than could be reasonably handled at the time. I wonder if better database tools exist today that would allow greater exploitation of those techniques, although they would still never be the panacea that was once proclaimed.


18. JTM on March 19, 2012 10:05 AM writes...

I once worked at a company that used Spotfire as the primary data visualisation tool for everyone - it was truly wonderful, but insanely expensive.

I'm currently at a much smaller company where we use the Cambridgesoft suite for collection of data (ELN, biology etc) and Excel / Dotmatics for visualisation. This works fine (it's difficult to beat Excel), although it's a little clumsy and unstable. We're also evaluating Dotmatics Vortex, which they have in development as an alternative to Spotfire. At the moment it looks pretty decent (structure visualisation, capable of handling plots of upwards of 100k datapoints).


19. DrSnowboard on March 19, 2012 10:06 AM writes...

Used to use Spotfire for HTS data - as one commenter says, the data needs to be quite clean.
Dotmatics usurped MDL / ISIS / whatever Symyx call it now, in my view. MDL got complacent and just threw in the towel, as far as I was concerned. Dotmatics is web-based and quick for medium-sized datasets. Reporting is getting better too, and it allows you to interface with legacy ActivityBase, which biologists love and chemists hate, because it's useless at chemistry.


20. JB on March 19, 2012 10:11 AM writes...

Re #15 - For chemical intelligence we use ChemAxon, which is really good: a supportive company, and they're very generous with academic collaborations. I'll also hint at some NIH-funded public tools that will be developed over the next year for assay management and data mining.
I love Spotfire (especially the newer version that replaced DecisionSite) and Pipeline Pilot, but as people mentioned, both are very expensive. I've heard of KNIME as a cheap alternative to PP but haven't personally tried it.


21. HTSguy on March 19, 2012 10:16 AM writes...

+1 to 18: Spotfire is very, very useful and insanely expensive.

I've used both Pipeline Pilot (while employed) and KNIME (while unemployed - it's free). Pipeline Pilot is both far faster (where I currently work we have it running on a small, several-year old server) and much better integrated (KNIME unfortunately acts like the Frankenstein monster it is - a jumble of disparate parts). I guess you get what you pay for.


22. anon2 on March 19, 2012 10:29 AM writes...

This is a topic analogous to a circular piece of string with no end. It comes down, all too often, to different strokes for different folks. Everyone wants an easy-to-use database, but the uses, and hence the objectives, are not always consistent - biologists, synthetic chemists, modeling scientists, clinical folks interfacing with preclinical data, geneticists. None are always "right". None are always "wrong". But no one system has proven (to me) to work for everyone.


23. Thomas on March 19, 2012 10:34 AM writes...

Pfizer has a great tool called RGate - does everything including list logic, registration of virtual compounds, and exports nicely to Spotfire. There were some talks to make it available to the public, and/or replace it with a commercial product ("cheaper" as not supported by in-house specialists).


24. AJ on March 19, 2012 10:39 AM writes...

@22 - that's what I liked about Spotfire - it didn't really care about the origin of the data - it could just handle EVERYTHING as long as it was in a structured table etc. ... though we never paid a lot for it (academia then) ...


25. cdsouthan on March 19, 2012 10:43 AM writes...

These comments are pertinent to dealing with your own data, but it's difficult to go it alone, because there is an increasing imperative to intersect structures and linked data (preferably quickly) with public sources such as PubChem and ChEMBL. Having been involved in a project that engaged with this (PMID:22024215), believe me, it gets even tougher than grappling with the in-house stuff.


26. exGlaxoid on March 19, 2012 10:44 AM writes...

I used to use ISIS/Base, then evaluated MDL Base for several years before it was stable enough to use.

Also tried a few other programs, and a few were OK, but nothing was as powerful as ISIS/Base. Spotfire is neat, but I didn't like it as much as ISIS once you had a nice project table definition set up. But of course that took IT support, so it was only good at larger companies.

Currently I use a mix of Excel, some ChemAxon products, some Cambridge software, and some other software. Not thrilled with the cobbled-together package, but it works for now. Excel is only doable when each spreadsheet holds a single project's data; otherwise it gets too big. Plus, a researcher now has to add the data manually, with all the attendant updating and data-integrity issues.


27. passionlessDrone on March 19, 2012 10:59 AM writes...

Hello friends -

I don't know squat from chemistry, but do know a little bit about technology. I've been playing around with a tool called Tableau that is priced very reasonably and can visualize data sets pretty easily.

- pD


28. RD on March 19, 2012 11:13 AM writes...

Some suggestions:
1.) Spotfire. You can plot anything using any criteria you want in many dimensions.
2.) have your rogue programmer write an app that will allow you to see your compounds as clusters on a grid, instead of entries in a spreadsheet. I have no idea why a chemist would want to scroll through a spreadsheet when all the information is staring you in the face in a grid. Have rogue programmer add filtering and color coding to the grid to make it easier to spot patterns and activity trends.
3.) hire a really good rogue programmer. There are a few out there. The IT and informatics departments tend to try to handcuff them.
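
Suggestion 2 amounts to binning compounds on a couple of properties and traffic-lighting them, which is only a few lines even without a rogue programmer. All cutoffs and data below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical compounds with two binning properties and a potency value.
compounds = [
    {"id": "CPD-1", "logp": 1.2, "mw": 350, "pic50": 7.8},
    {"id": "CPD-2", "logp": 3.4, "mw": 410, "pic50": 6.1},
    {"id": "CPD-3", "logp": 1.8, "mw": 360, "pic50": 8.2},
    {"id": "CPD-4", "logp": 4.9, "mw": 505, "pic50": 5.0},
]

def cell(c):
    # Grid coordinates: logP band (width 2) x MW band (width 100).
    return (int(c["logp"] // 2), int(c["mw"] // 100))

def color(c):
    # Traffic-light coding on potency, like conditional formatting.
    return "green" if c["pic50"] >= 7 else "amber" if c["pic50"] >= 6 else "red"

# Cluster compounds into grid cells instead of spreadsheet rows.
grid = defaultdict(list)
for c in compounds:
    grid[cell(c)].append((c["id"], color(c)))

for coords in sorted(grid):
    print(coords, grid[coords])
```

Structurally similar compounds land in the same cell, so patterns - a green corner of property space, say - jump out in a way a scrolled spreadsheet never shows.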


29. chemit on March 19, 2012 11:21 AM writes...

No magic here: the best tools, especially for chemistry data management, are often the most expensive ones. But they are usually worth it (for medium-to-large companies) if you need productivity, security, performance, reproducibility... Lucky people who have money can get Pipeline Pilot / Spotfire or Dotmatics to build a robust and powerful system that can be used by everyone.

Free tools like KNIME are nice alternatives for simple tasks, but are really not (yet?) in the same category. I don't know of any decent free alternative to Spotfire / Dotmatics (anyone?). ChemAxon is probably the best compromise in terms of money, features and quality.

Finally (but you'll need money too), a skilled IT team who know chemists' and biologists' needs will do the job too. Look at J&J's ABCD, which looks quite wonderful...

@25 indeed, so much to do in this area!!


30. anon2 on March 19, 2012 11:41 AM writes...

The lack of consensus from this one biased slice of data users (tending toward chemistry) simply emphasizes my previous point: no single stroke suits all folks. If it had been resolved across the scientific, data-driven community, then this discussion would not be taking place.

Sometimes, such "obvious" overly-simplistic questions have obvious, but maybe not so satisfying, answers. Realism hurts.


31. Publius The Lesser on March 19, 2012 12:04 PM writes...

The problem you're describing (finding meaningful patterns in large data sets) is common across all technical domains and drives the "big data" fad -- Google for it if you want to see the hype cycle in full swing. Behind the hype are some very useful tools and techniques for finding and visualizing patterns in large data sets. Because each domain is different, there really isn't a good canned solution, and these kinds of problems are solved by people called "data scientists" these days. A good data scientist is one part domain expert, one part statistician, one part machine learning expert, and one part coder. Although I work primarily on text documents of various sorts, about 5-6 years ago I applied some of these techniques to mass spectrometry data, back when I was a postdoc at Carnegie Mellon, with mixed results.


32. kissthechemist on March 19, 2012 12:09 PM writes...

As a small pharma drone: we started out with Excel, and it soon got very unwieldy. We were guided into Dotmatics Browser to handle all of our data by the talented computational chemists we had. We had some teething troubles, but the made-to-measure database we have now is very user-friendly and has sped things up enormously, both for the biologists (especially data entry) and the chemists (SAR is a joy, not a chore). The folks at Dotmatics have been pretty spot-on too; the only drawbacks are the need for our own IT people (which we fortunately already have) and of course the price.

Overall, I'd say it's an investment which makes sense for companies of a certain size. I'd certainly hate to be without it.


33. JB on March 19, 2012 12:27 PM writes...

Cheap Spotfire: when I was at a smaller company, we used a program called Miner3D (I think the original company is Hungarian) that had the basic graphing functions of Spotfire, with some funny shapes included in the icon set (we always thought they were various Hungarian peasants).


34. cbrs on March 19, 2012 12:29 PM writes...

A relatively new and cost-effective platform is Chembiography. It uses a web-based front end coupled to a Linux server with MySQL, so there's no cost overhead from Oracle and the like. It can be deployed locally or as a cloud model and is designed as a solution for small to medium companies. It provides a full registry, integrated biological data uploading, and flexible querying, with output as PDF or Excel reports.


35. C-me on March 19, 2012 12:52 PM writes...

CDD (Collaborative Drug Discovery) is a choice that is web-based and does not require internal IT people. It's chemical-structure based, calculates all the properties, and will not let you register the same compound under two ID#s. It's economical for a small player, and the price includes training and complete support.
The drawback is (still) that the rendering of structures is not great in the output files (Excel). I hope they work on this, because it is powerful and a workhorse, especially if you have people working at different sites, consultants, etc.


36. Assay Developer on March 19, 2012 1:09 PM writes...

@19: Biologists hate ActivityBase too. And from what I hear, so do the developers. I have a lot of experience with the MDL/Symyx package (ISIS, Assay Explorer), and there is no commercial solution out there that is priced reasonably and powerful enough to do the job. We finally went with an in-house system running off of Pipeline Pilot for data entry/analysis and Seurat for querying the Accord db. So far so good, and it requires a lower level of IT commitment/support.


37. Lab Monkey on March 19, 2012 1:13 PM writes...

Another vote for Spotfire - great for visualising lots of data, and you can build all sorts of widgets to increase its functionality.

+1 for #27's Tableau suggestion too. I was looking for a free/cheap Spotfire alternative for use outside work, and this fitted the bill. Tableau Public is free (although the data you upload isn't confidential), but there are desktop/commercial versions that don't seem unreasonably priced.


38. DataHogWild on March 19, 2012 1:22 PM writes...

Has anyone used a tool called Ensemble?


39. Cellbio on March 19, 2012 2:10 PM writes...

Spotfire for me as well.

#6: Though perhaps the responsibility fell on the med chemists in times past, I think it a requirement for biologists to become facile with, and hold responsibility for, everything from assay qualification and database interfacing through data interpretation. By this I mean: validate the assays to support robust throughput, and assure rapid and robust methodologies for data QC and release into the database. This is best supported by a policy that no data can be held in private worksheets, and by requiring teams to work only from published data. My favorite outcome after requiring this was having a biologist present results, with a confused chemist asking, "But who did your SAR?" when it was a simple extrapolation of biological data. This was a very productive change in the discourse that followed between chemist and biologist.

Final point: have meetings present data live from Spotfire (or equivalent) rather than PowerPoint. This reinforces the need for data to be imported into databases, and it will show the team how sparse most data sets really are. I have found the number one problem is not how to handle the thousands of rows by hundreds of columns, but how to make the sparse data sets fuller so that such analysis is useful.

OK, final final point: this effort will also yield insight into the number of redundant biological assays supporting different programs, which results in non-overlapping data sets and expansion of the number of columns. That makes coherent analysis tough, and reveals opportunities for improving efficiency.
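
The sparseness Cellbio describes is easy to quantify: just compute the fill rate per assay column across the project. A sketch with invented data, where None marks a compound never run in that assay:

```python
# Hypothetical project table: one row per compound, one column per assay.
rows = [
    {"cpd": "CPD-1", "enzyme_ic50": 0.2,  "cell_ec50": 1.1,  "rat_pk_auc": None},
    {"cpd": "CPD-2", "enzyme_ic50": 0.9,  "cell_ec50": None, "rat_pk_auc": None},
    {"cpd": "CPD-3", "enzyme_ic50": 0.05, "cell_ec50": 0.4,  "rat_pk_auc": 820.0},
    {"cpd": "CPD-4", "enzyme_ic50": None, "cell_ec50": None, "rat_pk_auc": None},
]

# Fraction of compounds with a value in each assay column.
assays = ["enzyme_ic50", "cell_ec50", "rat_pk_auc"]
fill = {a: sum(r[a] is not None for r in rows) / len(rows) for a in assays}
for assay, frac in fill.items():
    print(f"{assay}: {frac:.0%} populated")
```

Running a report like this over a real project is usually the fastest way to make the "how full is our data, really?" argument at a team meeting.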


40. MIMD on March 19, 2012 3:44 PM writes...

I would like to point readers to a series on how software for looking at large amounts of data should NOT be built - that is, presenting a markedly mission-hostile user experience.


41. Stonebits on March 19, 2012 4:10 PM writes...

I'd vote for what I guess is a high-end solution (I'm a developer): a solid chemistry db with good storage of the assay data, from which the data is pulled into a warehouse and then formatted by a custom program for display in a browser. It's not trivial, but everyone can then point to the same data, which has not only been vetted on input but calculated in the same way, and is comparable between chemists, projects, etc.


42. Anonymous on March 19, 2012 4:43 PM writes...

We (a lab at U. Michigan) use Collaborative Drug Discovery and love it. It's most used for data storage and searching (HTS, biological). Great for that and affordable. Still trails Spotfire in terms of visualization.


43. Martin on March 19, 2012 4:58 PM writes...

From an academic perspective, it's really a matter of choosing a computing platform first, sticking with it, and then choosing the software to fit. We, like most university departments, are platform-agnostic: we use Macs, Windows running everything from NT(!) to 7, Linuxes of various flavours, Irix, and the list goes on. As far as I have been able to ascertain, there is no truly cross-platform product out there that a smallish university department with one and a half IT support people can deploy at prices that the aforementioned department can afford on academic budgets. The same goes for ELNs.

Whilst I assume that the choice of desktop platform in pharma is a bit more "structured", making rollouts of such systems easier to maintain, that only comes at a higher cost. Open source solutions perversely find little favour in constrained IT departments, where the time costs ultimately outweigh the material costs, not to mention the typical churn in university IT departments, where such expertise is rapidly lost when that one crucial employee moves on to a better-paid job.


44. DubbleDonkey on March 19, 2012 6:08 PM writes...

As many others have said, Spotfire is a great tool but is too expensive for many. Dotmatics Vortex looks to be a much cheaper but worthy alternative. However, these tools are only as good as the underlying data, and getting good quality data is far more challenging. Tools such as ActivityBase are powerful, flexible and great at getting data into a database. Getting them set up in a way which allows for a minimum of developer maintenance can be a challenge, though. It's tempting to build new templates for every assay and let the biologists choose different analysis routines. Instead, it's worth investing some time up front standardizing data analysis and assay naming conventions. As well as minimizing maintenance, this makes it easier for chemists to navigate results and compare results from different assays. You can also reduce the number of columns if the data analysis is standardized.
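
The standardization step suggested here can be as simple as canonicalizing free-form assay names before results are loaded, so everything lands in the same column. The rules and names below are invented for illustration:

```python
import re

# Map of alias prefixes to canonical assay names (illustrative only).
ALIASES = {
    "herg": "hERG IC50 (uM)",
    "hlm": "Human microsome t1/2 (min)",
}

def standardize(raw_name):
    """Collapse a free-form assay name to its canonical form."""
    # Normalize: lowercase and strip everything but letters and digits.
    key = re.sub(r"[^a-z0-9]", "", raw_name.lower())
    for alias, canonical in ALIASES.items():
        if key.startswith(alias):
            return canonical
    # Unknown names fail loudly instead of silently creating a new column.
    raise ValueError(f"Unrecognised assay name: {raw_name!r}")

print(standardize("hERG (patch clamp)"))
print(standardize("HLM stability"))
```

The fail-loudly branch is the important design choice: every unmapped name surfaces at load time, rather than quietly spawning yet another near-duplicate column.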

Be wary of Excel spreadsheets. Things can quickly get out of hand with them, and you can end up in a real mess. Most informatics people I know would be happier if it were uninstalled from biologists' and chemists' desktops! Its use within ActivityBase can tame it somewhat, but if you can afford it, ActivityBase XE is the way to go for analyzing assay data. Biologists love the flexibility and extensive visualisations. Getting the data out of ActivityBase into chemist-friendly views is more challenging; Dotmatics Browser can help with this.

Pipeline Pilot is a must have for the Informatics people. You can very quickly put something together to process some data and the fact that it’s chemically aware makes it invaluable. It's an unusual day if I don't use it.


45. dvrvm on March 19, 2012 7:04 PM writes...

I've seen ChemFinder and a proprietary Access database in action - both in academic settings, however. ChemFinder can act as a solid foundation for a variety of different systems that are similar, but not identical.


46. TJMC on March 19, 2012 7:29 PM writes...

Two or three issues drive the above comments and stories. Their main focus is on how to gain an understanding of relationships and patterns across the diverse information types that Discovery spans. Excel and typical relational databases are everywhere, but they hit limits in scale, utility and ease of use for the typical scientist. Most chemists have excellent visualization skills in 3-D (and more dimensions); hence Spotfire is readily embraced. The problem is, the range of data TYPES is exploding, and the technology struggles to keep up, let alone make things easy for non-data-scientists. The third issue, besides utility and the creep of diverse kinds of data, is that "usual" tools prefer well-behaved (structured) data and relationships.

All of the above became apparent a while back from a large Pharma R&D survey. A MAJORITY of respondents noted that they “abandoned avenues of research” because “it seemed too difficult or impossible” to con