At many companies, this is performance review season. As I’ve written about before, this is a particularly hard thing to do right in a research organization. It’s so hard, actually, that never once have I heard of one where the scientists were satisfied with how people were being rated. I think it’s probably impossible for any organization, if you want to know the truth. It’s like trying to design a perfect voting system. No matter what happens, some people are going to feel, perhaps even with justification, as if they’ve been had.
But evaluating scientists is especially thankless. If you have a lot of really good ones, it’s a little like filling out yearly reports on poets. Hmmm... Mr. Larkin. I see you haven’t published anything so far this year, and still no collection since The Whitsun Weddings... wasn’t that on your goals statement for this period? I don’t really see how we can give you an “exceeds” rating given all that. And Mr. Lowell, it’s true that you produced a great number of sonnets during this review period, but I can’t help but believe that these were less of an effort than some of the work you’ve done for us before, and they certainly had less of an impact on our operations. No, I think that “meets expectations” is probably the correct category this year... And as for you, Mr. Housman, we need to ask ourselves just how long it has been since A Shropshire Lad...
Rating research productivity sends you into the same thickets. If someone hammered out a long list of analogs, but used pretty much the same chemistry to make each of them, how do you rate that compared to someone who had to hand-forge everything (and produced a correspondingly smaller pile)? How much should the sheer number of compounds count for, anyway – and how much should impact? What if the big bunch of compounds didn’t do much for the project, but one of the tough ones opened up a whole new area? (Or what if it was the reverse?) But isn’t that partly luck – what if the one that hit was totally unexpected, even by the person who made it? What if it became a great compound for reasons totally out of their hands?
And then you get to the people who aren’t necessarily cranking out analogs, the lab heads and such. They’re supposed to be leading projects, managing direct reports, coming up with ideas. How’d they do? How can you tell? Can you reliably distinguish a project that got lucky, or had a better starting point, from a well-managed one that has nonetheless been wandering around in the wilderness? Put your best people on, say, a protein-DNA interaction target, and pretty soon they won’t look so good, either.
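You can actually put a rough number on that luck problem. Here’s a minimal simulation sketch (Python, with every parameter invented for illustration, not drawn from any real data): each team gets an underlying skill level, each project outcome is that skill plus a large dose of luck, and we check how often ranking by outcome – which is all a review committee ever sees – picks out the genuinely best teams.

```python
import random

random.seed(0)

N_TEAMS, N_TRIALS = 100, 1000

overlap_total = 0
for _ in range(N_TRIALS):
    # True skill of each team, and a noisy project outcome. The luck
    # term is deliberately three times the size of the skill term --
    # an assumption, but not an outlandish one for drug discovery.
    skill = [random.gauss(0, 1) for _ in range(N_TEAMS)]
    outcome = [s + random.gauss(0, 3) for s in skill]

    # Rank by outcome (what the review sees) and by skill (the truth),
    # then count how many of the top ten by outcome are top ten in skill.
    by_outcome = sorted(range(N_TEAMS), key=lambda i: outcome[i], reverse=True)
    by_skill = sorted(range(N_TEAMS), key=lambda i: skill[i], reverse=True)
    overlap_total += len(set(by_outcome[:10]) & set(by_skill[:10]))

print(f"Average top-10 overlap: {overlap_total / N_TRIALS:.1f} out of 10")
```

Under these made-up numbers, only two or three of the ten “best” projects in a typical year turn out to be run by top-ten teams; the rest of the ranking is luck. Shrink the noise term and the overlap climbs, but nobody in this business gets to shrink the noise term.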
No, even with the best rating system in the world, it would be hard to fill out the reports on drug discovery projects. And you can take it as given that no one is using the best rating system in the world. (Some may in fact be experimenting with the worst.) The yearly frequency of ratings is one problem – anything tied to the calendar is suspect, since the compounds, the cells, and the rats never know what month it is. This has been a problem for a long, long time. I once quoted from Rayleigh’s biography of the physicist J. J. Thomson. You wouldn’t want to run a whole department on the following system, but you don’t want to ignore the man’s point, either:
"If you pay a man a salary for doing research, he and you will want to have something to point to at the end of the year to show that the money has not been wasted. In promising work of the highest class, however, results do not come in this regular fashion., in fact years may pass without any tangible results being obtained, and the position of the paid worker would be very embarrassing and he would naturally take to work on a lower, or at any rate, different plane where he could be sure of getting year by year tangible results which would justify his salary. The position is this: you want this kind of research, but if you pay a man to do it, it will drive him to research of a different kind. The only thing to do is to pay him for doing something else and give him enough leisure to do research for the love of it."
And the insistence of many HR departments that the ratings fall on a normal distribution is another problem. Sure, if you hired a few thousand people at random and turned them loose on the work, you could expect some sort of bell curve – assuming you’ve solved the problem of fairly evaluating them in the first place. But you didn’t hire your people at random, did you? Everyone’s supposed to be at some level of competence right from the start. Some of those performance distribution curves reflect the randomness of research or the defects in rating it, rather than any underlying truths about performance.
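That selection effect is easy enough to demonstrate, too. Here’s another small sketch under the same sort of invented assumptions: candidates are screened at hiring, so only the top slice of the ability distribution gets in, and a yearly review then measures ability plus a healthy dose of rating noise.

```python
import math
import random

random.seed(1)

def sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# A raw candidate pool, then a hiring bar: only people more than one
# standard deviation above the mean get hired. (The bar, like every
# other number here, is invented purely for illustration.)
pool = [random.gauss(0, 1) for _ in range(200_000)]
hired = [a for a in pool if a > 1.0]

# What the yearly review actually measures: true ability plus noise
# from the randomness of research and the defects of the rating itself.
measured = [a + random.gauss(0, 1) for a in hired]

print(f"Spread of true ability among hires: sd = {sd(hired):.2f}")
print(f"Spread of measured performance:     sd = {sd(measured):.2f}")
```

The true abilities of the hires cluster tightly – that was the whole point of hiring them – but the measured scores spread back out into a bell curve more than twice as wide, and most of that width is noise. Forcing the ratings onto that curve doesn’t uncover some underlying distribution of performance; it formalizes the scatter.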