March 22, 2010
Benford's Law, Revisited
I mentioned Benford's Law in passing in this post (while speculating on how long people report their reactions to have run when publishing their results). That's the rather odd result that many data sets don't show a uniform distribution of leading digits - rather, 1 is the first digit around 30% of the time, 2 leads off about 18% of the time, and so on down to 9, which leads off a bit less than 5% of the time.
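For reference, those percentages come from the standard statement of Benford's Law, under which the probability of leading digit d is log10(1 + 1/d). Here's a minimal sketch that tabulates it; the function name `benford_expected` is just an illustrative choice.

```python
import math

def benford_expected():
    """Return {digit: probability} for leading digits 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the expected share of each leading digit as a percentage.
for digit, p in benford_expected().items():
    print(f"{digit}: {p:.1%}")
```

The nine probabilities sum to one, and they fall off steadily from 1 to 9.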
For data that come from some underlying exponential-growth process, this actually makes some sense. In that case, the values spend more time in the "lag phase", when they're more likely to start with a 1, and proportionally less and less time out in the areas with higher leading digits. The law only holds up when looking at distributions that cover several orders of magnitude - but all the same, it also seems to apply to data sets where there's no obvious exponential growth driving the numbers.
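The growth argument is easy to check numerically. The sketch below assumes a made-up process growing 5% per step (both the rate and the 1,000-step length are arbitrary choices for illustration) and tallies the leading digits along the way.

```python
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number, read off its scientific notation."""
    return int(f"{x:e}"[0])

# Hypothetical growth process: 5% growth per step, sampled at every step.
# 1,000 steps cover roughly 21 orders of magnitude.
values = [1.05 ** n for n in range(1000)]

counts = Counter(leading_digit(v) for v in values)
fractions = {d: counts[d] / len(values) for d in range(1, 10)}

# Show how often each digit leads off.
for d, frac in fractions.items():
    print(f"{d}: {frac:.1%}")
```

The values take the same number of steps to get from 1 to 2 as they do from 2 to 4 or from 4 to 8, so the low leading digits get sampled far more often than the high ones.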
Lack of adherence to Benford's Law can be accepted as corroborative evidence of financial fraud. Now a group from Astellas reports that several data sets used in drug discovery (such as databases of water solubility values) obey the expected distribution. What's more, they're suggesting that modelers and QSAR people check their training data sets to make sure that those follow Benford's Law as well, as a way to make sure that the data have been randomly selected.
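One way such a check might look is a plain chi-square goodness-of-fit test on the leading digits. To be clear, this is my own sketch and not necessarily the procedure the Astellas group used; the function names and both example data sets are invented for illustration, and the 5% significance level is an arbitrary choice.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CHI2_CRITICAL_8DF_5PCT = 15.507

def leading_digit(x):
    """First significant digit of a nonzero number."""
    return int(f"{abs(x):e}"[0])

def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's Law."""
    usable = [v for v in values if v != 0 and math.isfinite(v)]
    if not usable:
        raise ValueError("no nonzero finite values to test")
    counts = Counter(leading_digit(v) for v in usable)
    n = len(usable)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

# Invented examples: one set spanning many orders of magnitude, one narrow range.
benford_like = [1.05 ** k for k in range(1000)]
narrow_range = list(range(100, 1000))

for name, data in [("benford_like", benford_like), ("narrow_range", narrow_range)]:
    stat = benford_chi_square(data)
    verdict = "consistent with" if stat < CHI2_CRITICAL_8DF_5PCT else "deviates from"
    print(f"{name}: chi-square = {stat:.1f}, {verdict} Benford's Law at the 5% level")
```

A large statistic only says that the digits don't look Benford-like. As the narrow-range example shows, that can happen simply because the values don't span enough orders of magnitude, so a failure here is a reason to look closer and not a verdict.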
Is anyone willing to try this out on a bunch of raw clinical data to see what happens? Could this be a way to check the integrity of reported data from multiple trial centers? You'd have to pick your study set carefully - a lot of the things we look for don't cover a broad range - but it's worth thinking about. . .
Category: Clinical Trials | In Silico | The Dark Side