I heard Leigh Anderson speak several years ago, and he drove home the point that it isn’t economically feasible in the long run to do proteomics manually, without the kind of automation we now use for so many markers that circulate at concentrations 100-fold greater. The clinical laboratory grinds out results that make up roughly 80% of the information a clinician uses.

How do you succeed going forward? Analyzing multiple analytes, accurately and at a good rate, is only part of the answer. The other requirement is to analyze the mass of data: each analyte has little or only a scaled value alone, and it is the combination of values that forms clusters. The clusters may aggregate within a distance of a centroid in n-dimensional space. While closely spaced clusters may have missing confirmatory values, the close clusters aggregate into a single cluster in three-dimensional space, with other clusters of similar nature separated by the Mahalanobis distance between their centroids. This multivariable vector would align with the different diseases at baseline. I say at baseline because you have to delineate where you are in the disease process. The fourth dimension is time, which introduces a stochastic factor, so you would eventually want to know about the differential changes in the clusters with treatment. Clinical trials aren’t designed to deal with this level of complexity.
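To make the geometry concrete, here is a minimal sketch of the Mahalanobis distance from an analyte-value vector to a cluster centroid. The two simulated clusters and all numeric values are hypothetical, chosen only to illustrate that points within a cluster sit close to its centroid while points from another cluster lie far away; this is not the authors' actual method.

```python
import numpy as np

def mahalanobis(x, centroid, cov):
    """Mahalanobis distance from point x to a cluster centroid."""
    diff = x - centroid
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Illustrative data: two clusters of 2-D analyte values (hypothetical numbers).
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[1.0, 1.0], scale=0.2, size=(50, 2))
cluster_b = rng.normal(loc=[4.0, 4.0], scale=0.2, size=(50, 2))

centroid_a = cluster_a.mean(axis=0)
cov_a = np.cov(cluster_a, rowvar=False)

# A point drawn from cluster A is close to centroid A;
# a point drawn from cluster B is far from it.
d_near = mahalanobis(cluster_a[0], centroid_a, cov_a)
d_far = mahalanobis(cluster_b[0], centroid_a, cov_a)
assert d_near < d_far
```

Unlike plain Euclidean distance, the covariance term rescales each axis, so analytes measured on very different scales (or correlated with one another) contribute comparably to the distance.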

This may not seem possible at first blush, but I have done what I think are proof-of-concept studies with Gil David, working under R. R. Coifman in Mathematics at Yale (a member of the NAS and a recipient of the National Medal of Science). The biggest problem that confronts us is that you cannot assume the distributional characteristics of each predictor are Gaussian, or that the distributions are the same. Looking at a small number of predictors, I examined the log transform and Tukey’s folded log in a study with Christos Tsokos in 1975. To break out the information, it is necessary to use the tools for finding “worms” in communication. Finally, when I thought you could handle perhaps five variables at most, Gil David classified the hemogram using 16 variables, and he obtained probabilities from a large database of 30,000 patients, sufficient for training and validation.
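The two transforms mentioned above can be sketched briefly. The log transform is standard for right-skewed positive analyte values; for Tukey’s folded log I show the basic half-logit form for proportions, which is one common statement of it (the 1975 study may have used a variant with small-sample adjustments, so treat this as an assumed illustration, not the study’s exact formula).

```python
import numpy as np

def log_transform(x):
    """Natural-log transform for right-skewed, positive analyte values."""
    return np.log(x)

def folded_log(p):
    """Tukey's folded log (half the logit) for proportions in (0, 1)."""
    return 0.5 * (np.log(p) - np.log(1.0 - p))

values = np.array([0.5, 2.0, 10.0])        # positive analyte values (illustrative)
proportions = np.array([0.1, 0.5, 0.9])    # e.g. differential-count fractions

print(log_transform(values))
print(folded_log(proportions))  # symmetric about 0; folded_log(0.5) == 0
```

Both transforms pull a skewed predictor toward symmetry before any distance or clustering step, which matters precisely because the predictors cannot be assumed Gaussian on their raw scales.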
