The Role of Informatics in The Laboratory

Larry H. Bernstein, M.D.

## Introduction

The clinical laboratory industry, as part of the larger healthcare enterprise, is in the midst of large changes that can be traced to the mid-1980s and that have accelerated in the last decade. These changes are associated with a host of dramatic events that require accelerated readjustments in the workforce, scientific endeavors, education, and the healthcare enterprise. They are highlighted by the following (not unrelated) events: globalization; a postindustrial information explosion driven by advances in computers and telecommunications networks; genomics and proteomics in drug discovery; and consolidation in the retail, communication, transportation, healthcare, and pharmaceutical industries. Let us consider some of these events. Globalization is driven by the principle that a manufacturer may seek to purchase labor, parts, or supplies from sources that are less costly than those available at home. The changes in the airline industry have been characterized by growth in travel, reductions in force, and the ability of customers to find the best fares. The discoveries in genetics that evolved from asking questions about replication, transcription, and translation of the genetic code have moved to functional genomics and to the elucidation of cell signaling pathways. All of these changes would have been impossible without the information explosion.

## The Laboratory as a Production Environment

The clinical laboratory produces about 60 percent of the information used by nurses and physicians to make decisions about patient care, yet the actual cost of the laboratory is only about 3–4 percent of the cost of the enterprise. The result is that requests for support of the laboratory do not receive attention without a proactive argument for how the laboratory contributes to realizing the goals of the organization. The key issues affecting laboratory performance are: staffing requirements, instrument configuration, workflow, what to send out, what to move to point-of-care, how to reconfigure workstations, and how to manage the information generated by the laboratory.

Staffing requirements, instrument configuration, and workflow are being addressed by industry automation. The first attempt was based on connecting instruments by tracks. This system proved unable to handle STAT specimens without noticeable degradation of turnaround time. The consequence of that failure was to drive the creation of a parallel system of point-of-care devices, connected in a data management network (such as RALS). Another adjustment was to build an infrastructure for pneumatic tube delivery of specimens and to redesign the laboratory. This had some success, but required capitalization; the pneumatic tube system could be justified on the basis of its value to the organization in supporting services besides the laboratory. The industry is moving in the direction of connected modules that share an automated pipettor and reduce the amount of specimen splitting. These are primarily PREANALYTICAL refinements.

There are other improvements that affect quality and cost that are not standard, and should be. These are: autoverification, embedded quality control rules and algorithms, and incorporation of the X-bar into standard quality monitoring. This can be accomplished using middleware between the enterprise computer and the instruments, designed to do more than just connect instruments with the medical information system. The most common problem encountered when installing a medical repository is the repeated slowdown of the system as more users are connected. The laboratory has to be protected from this phenomenon, which can be relieved considerably by an open architecture. Another function of middleware will be to track productivity by instrument and to establish the cost per reportable result.

## The Laboratory and Informatics

**A few informatics requirements for the processing of tests are:**

- Reject release of runs that fail Quality Control rules
- Flag results that fail clinical rules for automatic review
- Ability to construct a report that has correlated information for physician review, regardless of where the test is produced (RBC, MCV, reticulocytes and ferritin)
- Ability to present critical information in a production environment without technologist intervention (platelet count or hemoglobin in preparation of transfusion request)
- Ability to download 20,000 patients from an instrument for review of reference ranges
- Ability to look at quality control of results on more than one test on more than one instrument at a time
- Ability to present risks in a report for physicians for medical decisions as an alternative to a traditional cutoff value
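As a sketch of what rule-driven flagging of this kind might look like in middleware, the following fragment applies critical and reference ranges to a single result. The function, rule names, and thresholds are illustrative assumptions, not an actual vendor implementation:

```python
# Hypothetical sketch of rule-driven result flagging, as middleware might
# apply before autoverification. Rule names and thresholds are illustrative
# assumptions, not an actual vendor implementation.

def flag_result(test, value, critical_ranges):
    """Return a list of flags for a single result."""
    low, high = critical_ranges[test]["reference"]
    crit_low, crit_high = critical_ranges[test]["critical"]
    flags = []
    if value < crit_low or value > crit_high:
        flags.append("CRITICAL: hold for technologist review and call result")
    elif value < low or value > high:
        flags.append("ABNORMAL: flag for automatic review")
    else:
        flags.append("AUTOVERIFY: release without intervention")
    return flags

# Illustrative reference and critical ranges for potassium (mmol/L)
RANGES = {"K": {"reference": (3.5, 5.1), "critical": (2.8, 6.2)}}

print(flag_result("K", 4.2, RANGES))
print(flag_result("K", 6.5, RANGES))
```

In a real system the rule table would come from the laboratory's own policies, and a critical flag would also trigger the call-and-document workflow.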

I list the essential steps of the workload processing sequence, with informatics enhancements of the process shown in bold:

Prelaboratory (ER) 1:

Nurse draws specimens from patient (without specimen ID) and places tubes in bag labeled with name

Nurse prints labels after **patient is entered**.

Labels put on tubes

**Orders entered** into computer and labels put on tubes

Tubes sent to laboratory

**Lab test is shown as PENDING**

Prelaboratory 2:

Tubes in bags sent to lab (by pneumatic tube)

Time of arrival is not the same as time of order entry (10 minutes later)

If order entry is not done prior to sending the specimen, entry is done in the front processing area

Specimen is sent to the lab area 10 minutes later, after the test is entered into the computer

Preanalytical:

Centrifugation

Delivery to workareas (bins)

**Aliquoting** for serological testing

**Workstation assignment**

Dating and amount of reagents

Blood gas or co-oximetry – no centrifugation

Hematology – CBC – no centrifugation

Send specimen for Hgb A_{1c}

Send specimen for Hgb electrophoresis and Hgb F/Hgb A2

Specimen to Aeroset and then to Centaur

Analytical:

**Use of bar code to encode information**

Check alignment of bar code

**Quality control and calibration at required interval** – check before run

Run tests

Manual:

2 hrs per run

enter accession #

enter results 1 accession at a time

Post analytical:

Return to racks or send to another workarea

**Verify results**

Enter special comments

Special problems:

Calling results

Add-on tests

Misaligned bar code label

Inability to find specimen

Coagulation

Manual differentials

## Informatics and Information Technology

The traditional view of the laboratory environment has been that it is a manufacturing center, but the main product of the laboratory is information, and the environment is a knowledge business. This will require changes in the education of clinical laboratory professionals. Biomedical Informatics has been defined as the scientific field that deals with the storage, retrieval, sharing, and optimal use of biomedical information, data, and knowledge for problem solving and decision making. It touches on all basic and applied fields in biomedical science and is closely tied to modern information technologies, notably in the areas of computing and communication. The services supported by an informatics architecture include operations and quality management, clinical monitoring, data acquisition and management, and statistics supported by information technology.

The importance of a network architecture is clear. We are moving from computer-centric processing to a data-centric environment. We will soon manage a wide array of complex and inter-related decision-making resources. These resources, commonly referred to as objects and content, can now include voice, video, text, data, images, 3D models, photos, drawings, graphics, audio, and compound documents. The architectural features required to achieve this are shown in Fig 1.

According to Coiera and Dowton (Coiera E and Dowton SB. Reinventing ourselves: how innovations such as on-line ‘just-in-time’ CME may help bring about a genuinely evidence-based clinical practice. Medical Journal of Australia 2000;173:343-344), echoing Lawrence Weed, “Clinicians in the past were trained to master clinical knowledge and become experts in knowing why and how. Today’s clinicians have no hope of mastering any substantial portion of the medical knowledge base. Every time we make a clinical decision, we should stop to consider whether we need to access the clinical evidence-base. Sometimes that will be in the form of on-line guidelines, systematic reviews or the primary clinical literature.”

Fig 1. Architectural features:

- Interoperability across environments
- A representation for storage that is independent of implementation
- A representation of collections that is independent of the database (schema, table structures)

## Informatics and the Education of Laboratory Professionals

The increasing dependence on laboratory information and the incorporation of laboratory information into evidence-based guidelines necessitate a significant component of education in informatics. The public health service has mandated informatics as a component of competencies for health services professionals (the “Core Competencies for Public Health Professionals” compendium developed by the Council on Linkages Between Academia and Public Health Practice), and nursing informatics competencies have already been written. Coiera (Coiera E. Medical informatics meets medical education: there’s more to understanding information than technology. Medical Journal of Australia 1998;168:319-320) has suggested 10 essential informatics skills for physicians.

I have put together a list below with items taken from Coiera and the Public Health Service competencies for elaboration of competencies for Clinical Laboratory Sciences.

A. Personal Maintenance

1. Understands the dynamic and uncertain nature of medical knowledge and knows how to keep personal knowledge and skills up-to-date
2. Searches for and assesses knowledge according to the statistical basis of scientific evidence
3. Understands some of the logical and statistical models of the diagnostic process
4. Interprets uncertain clinical data and deals with artefact and error
5. Evaluates clinical outcomes in terms of risks and benefits

B. Effective Use of Information

Analytic Assessment Skills

6. Identifies and retrieves current relevant scientific evidence
7. Identifies the limitations of research
8. Determines appropriate uses and limitations of both quantitative and qualitative data
9. Evaluates the integrity and comparability of data and identifies gaps in data sources
10. Applies ethical principles to the collection, maintenance, use, and dissemination of data and information
11. Makes relevant inferences from quantitative and qualitative data
12. Applies data collection processes, information technology applications, and computer systems storage/retrieval strategies
13. Manages information systems for collection, retrieval, and use of data for decision-making
14. Conducts cost-effectiveness, cost-benefit, and cost-utility analyses

C. Effective Use of Information Technology

15. Selects and utilizes the most appropriate communication method for a given task (e.g., face-to-face conversation, telephone, e-mail, video, voice-mail, letter)
16. Structures and communicates messages in a manner most suited to the recipient, task, and chosen communication medium
17. Utilizes personal computers and other office information technologies for working with documents and other computerized files
18. Utilizes modern information technology tools for the full range of electronic communication appropriate to one’s duties and programmatic area
19. Utilizes information technology so as to ensure the integrity and protection of electronic files and computer systems
20. Applies all relevant procedures (policies) and technical means (security) to ensure that confidential information is appropriately protected

I expand on these recommended standards. The first item is personal maintenance. This requires continuing education to meet the changing needs of the profession, with expanding knowledge and access to knowledge that requires critical evaluation. The profession has been compensated for the technical, task-oriented contributions made by laboratory professionals, but not for their contribution as knowledge workers. This can be changed, but it can’t be realized through the usual baccalaureate education requirement. Most technologists want to get out into the workforce, but after they are in the workforce, what next? In many institutions, it falls to the laboratory to provide the expertise to drive the organization’s computer and information restructuring, drawing staff from the transfusion service, microbiology, and elsewhere. The laboratory is recognized for its information expertise, but there is still reason to do more. The fact is that the mindset of laboratory staff has been one of manufacturing productivity related to test production, but the data that the production represents is information. We have quality control of the test process, but we are required to manage the total process, including the quality of the information we generate. Another consideration is that the information we generate is used for clinical trials, and the huge variation in the way the information is used is problematic.

The first category for discussion is personal maintenance (items 1-5). These items involve keeping up with advances in medical knowledge, being critical about the quality of the evidence for current knowledge, and being aware of the statistical underpinnings of that thinking. It is not enough to keep up with changes in medical thinking using only the professional laboratory literature. A systematic review of problem topics using PubMed as a guide is also essential. This requires that the clinical laboratory scientist know how to access the internet and search for key studies concerning the questions being asked. The reading of abstracts and papers also requires an education in methods of statistical analysis, contingency tables, study design, and critical thinking. The most common methods used in clinical laboratory evaluation are linear regression, linear regression, and yes, linear regression. A discussion of distance learning among members of the American Statistical Association reveals that much of the statistical education of biologists, chemists, and engineers now comes from *software*. Knowledge workers in drug development and in molecular diagnostics are increasingly challenged with larger, more complicated data sets, and there is a need to interpret and report results quickly. This need is not confined to basic research or the clinical setting, and it may have to be met without consulting statisticians. Category A slides into category B, effective use of information.

Effective use of information requires skills that support the design of evaluations of laboratory tests, methods of statistical analysis, and the critical assessment of published work (items 6-9), as well as the processes for collecting data, using information technology applications, and interpreting the data (items 10-12). Items 13 and 14 address management issues.

There is a vocabulary that has to be mastered and certain questions that have to be answered whenever a topic is being investigated. I identify a number of these at this point in the discussion.

Contingency Table: A table of frequencies, usually two-way, with event type in columns and test results as positive or negative in rows. A multi-way table can be used for multivalued categorical analysis. The conventional 2X2 contingency table is shown below –

| | No disease | Disease | |
|---|---|---|---|
| Test negative | A (TN) | B (FN) | A+B |
| Test positive | C (FP) | D (TP) | C+D |
| | A+C | B+D | A+B+C+D |

PVN = TN/(TN+FN)

PVP = TP/(TP+FP)

Sensitivity = TP/(TP+FN)

Specificity = TN/(TN+FP)
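The quantities defined around this table follow directly from the four cells. A minimal sketch using the same labels (A = TN, B = FN, C = FP, D = TP), with made-up frequencies:

```python
# Sensitivity, specificity, and predictive values computed from a 2x2
# contingency table with cells labeled as in the text:
# A = TN, B = FN, C = FP, D = TP.

def two_by_two_metrics(A, B, C, D):
    return {
        "sensitivity": D / (B + D),   # TP / (TP + FN)
        "specificity": A / (A + C),   # TN / (TN + FP)
        "PVP": D / (C + D),           # TP / (TP + FP)
        "PVN": A / (A + B),           # TN / (TN + FN)
    }

# Made-up frequencies for illustration
m = two_by_two_metrics(A=90, B=10, C=20, D=80)
print(m["sensitivity"])  # 80/90, about 0.889
```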

Type I error: A finding is reported when none actually exists (false positive error).

Type II error: No finding is reported when one actually exists (missed diagnosis; false negative error).

Sensitivity: Percentage of true positive results. D/(B + D)

Specificity: Percentage of true negative results. A/(A + C)

False positive error rate: The percentage of results that are positive in the absence of disease (1 – specificity). C/(A + C)

ROC curve: Receiver operator characteristic curve is a plot of sensitivity vs 1 – specificity. Two methods can be compared in ROC analysis by the area under the curve. The optimum decision point can be identified as within a narrow range of coordinates on the curve.

Predictive value (+)(PVP): Probability that there is disease when a test is positive, D/(C + D), or the percentage of patients with disease given a positive test. The observed and expected probability may be the same or different.

Predictive value (-)(PVN): Probability of absence of disease given a negative test result, A/(A + B), or the percentage of patients without disease given a negative test. The observed and expected probability may be the same or different.

Power: When a statement is made that there is no effect, or a test fails to predict the finding of disease, were enough patients included in the study to see the effect if it exists? This applies to randomized controlled drug studies as well as to studies of tests. Adequate power protects against the error of finding no effect when one exists.

Selection Bias: It is common to find a high performance claimed for a test that is not later substantiated when the test is introduced and widely used. Why does this occur? A common practice in experimental design is to define inclusion and exclusion criteria so that the effect is very specific for the condition and to eliminate interference by “confounders”, unanticipated effects that are not intended. A common example is the removal of patients with acute renal failure and chronic renal insufficiency because of delayed clearance of analytes from the circulation. The result is that the test is introduced into a population different from the trial population, with claims based on performance in a limited population. The error introduced could be prediction of disease in an individual in whom the effect does not hold. This error is reduced by eliminating selection bias, which may require multiple studies using patients who have the confounding conditions (renal insufficiency, myxedema). Unanticipated effects often aren’t designed into a study. In many studies of cardiac markers, the study design included only patients who had Acute Coronary Syndrome (ACS). This is an example of selection bias. Patients who have ACS have chest pain of anginal nature that lasts at least 30 minutes, and usually have more than a single episode in 24 hours. That is not how the majority of patients suspected of having a myocardial infarct present to the emergency department. How then is one to evaluate the effectiveness of a cardiac marker?

Randomization: Randomization is the random assignment of study participants to either placebo (no treatment) or treatment. The investigator and the participant enrolled in the study are blinded; the analyst might also be blinded. A potential problem is selection bias from dropouts who skew the characteristics of the population.

Critical questions:

What is the design of the study that you are reading? Is there sufficient power or is there selection bias? What are the conclusions of the authors? Are the conclusions in line with the study design, or overstated?

Statistical tests and terms:

Normal distribution: Symmetrical bell-shaped curve (Gaussian distribution). The ±2 standard deviation limits correspond approximately to the 95% confidence interval.

Chi square test: Has a chi square distribution. Used for measuring probability from a contingency table. Non-parametric test.

Student’s t-test: Parametric measure of difference between two population means.

F-test: An F-test (Snedecor and Cochran, 1983) is used to test whether the standard deviations of two populations are equal. In comparing two independent samples of size *N*_{1} and *N*_{2}, the F-test provides a measure of the probability that they have the same variance. The estimators of the variance are *s*_{1}^{2} and *s*_{2}^{2}. We define as the test statistic their ratio *T* = *s*_{1}^{2}/*s*_{2}^{2}, which follows an F distribution with *f*_{1} = *N*_{1}-1 and *f*_{2} = *N*_{2}-1 degrees of freedom.
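The F statistic itself is straightforward to compute from two samples. A small sketch with illustrative data; a p-value would come from the F distribution with f1 and f2 degrees of freedom (e.g., scipy.stats.f.sf):

```python
# Computing the F statistic T = s1^2 / s2^2 for two independent samples,
# as defined above. Sample values are illustrative.
from statistics import variance

def f_statistic(sample1, sample2):
    T = variance(sample1) / variance(sample2)   # ratio of sample variances
    f1, f2 = len(sample1) - 1, len(sample2) - 1  # degrees of freedom
    return T, f1, f2

# Two potassium-like samples with visibly different spread
T, f1, f2 = f_statistic([4.1, 4.3, 4.0, 4.4, 4.2],
                        [4.0, 4.8, 3.6, 5.0, 4.1])
```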

F Distribution: The F distribution is the ratio of two chi-square distributions with *f*_{1} and *f*_{2} degrees of freedom, respectively, where each chi-square has first been divided by its degrees of freedom.

Z scores: Z scores are sometimes called “standard scores”. The z score transformation, z = (x - mean)/SD, is especially useful when seeking to compare the relative standings of items from distributions with different means and/or different standard deviations.
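The transformation can be sketched in a few lines; the data here are illustrative:

```python
# z score transformation: z = (x - mean) / SD, putting values from any
# distribution on a common scale. Data are illustrative.
from statistics import mean, stdev

def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

z = z_scores([10, 20, 30, 40, 50])  # mean 30, SD about 15.81
print(z[2])  # the mean itself transforms to 0.0
```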

Analysis of variance: Parametric comparison of two or more population means by the comparison of **variances** between the populations. Probability is measured by the F-test.

Linear Regression: A classic statistical problem is to try to determine the relationship between two random variables *X* and *Y*. For example, we might consider the height and weight of a sample of adults. Linear regression attempts to explain this relationship with a straight line fit to the data. The simplest case of regression, with one dependent and one independent variable that can be visualized in a scatterplot, is simple linear regression (see below). The linear regression model is the most commonly used model in Clinical Chemistry.
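A minimal least-squares sketch of simple linear regression, with illustrative height/weight pairs (numpy.polyfit or any statistics package would give the same fit):

```python
# Simple linear regression by least squares: fit y = a + b*x.
# Data are illustrative height (cm) / weight (kg) pairs.
def linreg(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx  # intercept passes through the means
    return a, b

a, b = linreg([150, 160, 170, 180], [50, 60, 70, 80])
print(a, b)  # slope 1.0, intercept -100.0 for this contrived data
```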

Multiple Regression: The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The general computational problem in multiple regression analysis is to fit a linear function to a number of points. A multiple regression fits two or more predictors to the dependent variable by a model of the form Y = a_{1}X_{1} + a_{2}X_{2} + … + b + e, where e is the error term.

Discriminant function: Discriminant analysis is a technique for classifying a set of observations into predefined classes. The purpose is to determine the class of an observation based on a set of variables known as predictors or input variables. The model is built from a set of observations for which the classes are known, sometimes referred to as the training set. Based on the training set, the technique constructs a set of linear functions of the predictors, known as discriminant functions, such that

L = b_{1}x_{1} + b_{2}x_{2} + … + b_{n}x_{n} + c, where the b’s are discriminant coefficients, the x’s are the input variables or predictors, and c is a constant.

These discriminant functions are used to predict the class of a new observation with unknown class. For a k class problem k discriminant functions are constructed. Given a new observation, all the k discriminant functions are evaluated and the observation is assigned to class i if the i^{th} discriminant function has the highest value.

**Nonparametric Methods:**

Logistic Regression: Researchers often want to analyze whether some event occurred or not. The outcome is binary. Logistic regression is a type of regression analysis where the dependent variable is a dummy variable (coded 0, 1). The linear probability model, expressed as Y = a + bX + e, is problematic because

- The variance of the dependent variable is dependent on the values of the independent variables.
- e, the error term, is not normally distributed.
- The predicted probabilities can be greater than 1 or less than 0.

The “logit” model has the form:

ln[p/(1-p)] = a + BX + e, or

p/(1-p) = exp(a) exp(BX) exp(e)

where:

- ln is the natural logarithm, log_{exp}, where exp = 2.71828…
- p is the probability that the event Y occurs, p(Y=1)
- p/(1-p) is the odds
- ln[p/(1-p)] is the log odds, or “logit”

The logistic regression model is simply a non-linear transformation of the linear regression. The logit distribution constrains the estimated probabilities to lie between 0 and 1.
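The logit transform and its inverse can be sketched directly; the input values here are illustrative:

```python
# The logit transform ln[p/(1-p)] and its inverse. The inverse (the
# logistic function) constrains estimated probabilities to (0, 1).
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inverse_logit(x):
    """Logistic function: maps any real x back to a probability."""
    return 1 / (1 + math.exp(-x))

print(inverse_logit(0.0))  # log odds of 0 correspond to probability 0.5
```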

Graphical Ordinal Logit Regression: The logistic regression fits a non-parametric solution to a two-valued event. The outcome in question might have 3 or more values.

For example, scaled values of a test – low, normal, and high – might have different meanings. This type of behavior occurs in certain classification problems. For example, the model has to deal with anemia, normal, and polycythemia, or similarly, neutropenia, normal, and systemic inflammatory response (sepsis). This model fits the data quite readily.

Clustering methods: There are a number of methods to classify data when the dependent variable is not known, but is presumed to exist. A commonly used method classifies data using geometric distance of the average point coordinates. A very powerful method used is Latent Class Cluster analysis.

**Data Extraction:**

Data can be extracted from databases, but then has to be worked with in a flat-file format. The easiest and most commonly used methods are to collect the data in a relational database, such as Access (if the format is predefined), or to convert the data into an Excel format. A common problem is the inability to extract certain data because it is not in an extractable or usable format.

Let us examine how these methods are actually used in a clinical laboratory setting.

The first example is a method introduced almost 30 years ago into quality control in hematology by Brian Bull at Loma Linda University, called the x-bar function (also the Bull algorithm). The method looks at the means of runs of the population data on the assumption that the mean MCV of a stable population doesn’t vary from day to day. This is a very useful method that can be applied to the evaluation of laboratory performance. It is a standard quality control approach used in industrial processes since the 1930s.
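Bull's actual algorithm uses a trimmed, smoothed moving average of batch means; the simplified sketch below captures only the core x-bar idea of flagging a run whose mean drifts from a stable population target. The target and tolerance here are illustrative assumptions:

```python
# Simplified x-bar sketch: flag any run (batch) of MCV results whose mean
# drifts more than `tolerance` from the stable population `target`.
# This is NOT the full Bull smoothing algorithm, just the core idea.
def xbar_flags(runs, target, tolerance):
    flags = []
    for run in runs:
        m = sum(run) / len(run)           # mean of this run
        flags.append(abs(m - target) > tolerance)
    return flags

# Two illustrative runs of MCV values (fL); the second has drifted upward
runs = [[88.1, 87.5, 88.4], [91.9, 92.3, 92.1]]
print(xbar_flags(runs, target=88.0, tolerance=3.0))  # [False, True]
```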

We next examine the chi square distribution. Review the formula for calculating chi square and the calculation of expected frequencies. Take a two-by-two table of the type:

| | Effect | No effect | Sum Row |
|---|---|---|---|
| Predictor positive | 87 | 12 | 99 |
| Predictor negative | 18 | 93 | 111 |
| Sum Column | 105 | 105 | 210 |

Experiment with the recalculation of chi square by changing the frequencies in the columns for effect and no effect, keeping the total frequencies the same. The result is a decrease in the chi square as predictor negative – effect and predictor positive – no effect both increase. The exercise can be carried out on the chi square calculator using Google to find the site. The chi square can be used to test the contingency table that is used to indicate the effectiveness of fetal fibronectin for assessing low risk of preterm delivery.

For example,

| | No Preterm Labor | Yes Preterm Labor | Sum Row |
|---|---|---|---|
| FFN – neg | 99 | 1 | 100 |
| FFN – pos | 35 | 65 | 100 |
| Sum Column | 134 | 66 | 200 |

PVN = 99/100 = 99%

99% observed probability that there will not be preterm delivery with a negative test.

Chi square goodness of fit:

Degrees of freedom: 1

Chi-square = 92.6277702397105

*p* is less than or equal to 0.001.

The distribution is significant.
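The chi-square values quoted in these examples can be reproduced with a short function that derives expected frequencies from the table margins (no Yates correction, matching the quoted values):

```python
# Chi-square for a 2x2 table (a, b / c, d): expected frequency for each
# cell is (row total * column total) / N; chi-square sums
# (observed - expected)^2 / expected over the four cells.
def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    cells = [(a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d)]
    return sum((obs - rt * ct / n) ** 2 / (rt * ct / n)
               for obs, rt, ct in cells)

print(round(chi_square_2x2(87, 12, 18, 93), 1))  # predictor table: 107.5
print(round(chi_square_2x2(99, 1, 35, 65), 4))   # fibronectin table: 92.6278
```

Applied to the fetal fibronectin table, the function reproduces the quoted chi-square of 92.6278 with 1 degree of freedom.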

Examine the effects of scaling continuous data from a heart attack study to obtain ordered intervals. Look at the chi square test for the heart attack test using an N×2 table with the table columns as heart attack or no heart attack. This allows us to determine the significance of the test in predicting heart attack. Look at the Student t-test for comparing the continuous values of the test between the heart attack and non-heart attack populations. The t-test is like a one-way analysis of variance with only two values for the factor variable. The t-test and one-way ANOVA compare the means between two populations. If the result is significant, then the null hypothesis that the data are taken from the same population is rejected; the alternative hypothesis is that they are different.
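A pooled-variance two-sample t-test of the kind reported further below can be sketched as follows; the sample values are illustrative:

```python
# Pooled-variance two-sample t-test: tests whether two group means come
# from the same population. Sample values are illustrative.
from statistics import mean, variance
import math

def pooled_t(x, y):
    nx, ny = len(x), len(y)
    # pooled variance weights each sample variance by its degrees of freedom
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

t, df = pooled_t([4.9, 5.1, 5.0, 5.2], [5.6, 5.8, 5.7, 5.9])
print(t, df)  # clearly separated means give a large |t| on 6 df
```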

One can visualize the difference by plotting the means and confidence intervals for the two groups.


We can plot a frequency distribution before we calculate the means and check the distribution around the means. The simplest way to do this is the histogram. The histogram for a large sample of potassium values is used to illustrate this. The mean is 4.2.

We can use a method for quality control called the X-bar (Beckman Coulter has it on the hematology analyzer) to test the deviation from the means of runs. I illustrate the validity of the X-bar by comparing the means of a series of runs.

Sample size = 958

Lowest value = 84.0000

Highest value = 90.7000

Arithmetic mean = 87.8058

Median = 87.8000

Standard deviation = 0.9362

————————————————————

Kolmogorov-Smirnov test for Normal distribution: accept Normality (P=0.353)

If I compare the means by the t-test, I am testing whether the samples are taken from the same or different populations. When we introduce a third group, we are asking whether the samples are taken from a single population, or whether to reject that hypothesis and take the alternative hypothesis that the samples are different. This is illustrated by sampling from a group of patients with stable cardiac disease and a group with normal cardiac status, neither of which has acute myocardial infarction, as shown below:

Two-sample t-test on CKMB grouped by OTHER against Alternative = ‘not equal’

| Group | N | Mean | SD |
|---|---|---|---|
| 0 | 660 | 1.396 | 3.085 |
| 1 | 90 | 4.366 | 4.976 |

Separate variance:

t = -5.518

df = 98.5

p-value = 0.000

Bonferroni adj p-value = 0.000

Pooled variance:

t = -7.851

df = 748

p-value = 0.000

Bonferroni adj p-value = 0.000

Two-sample t-test on TROP grouped by OTHER against Alternative = ‘not equal’

| Group | N | Mean | SD |
|---|---|---|---|
| 0 | 661 | 0.065 | 0.444 |
| 1 | 90 | 1.072 | 3.833 |

Separate variance:

t = -2.489

df = 89.3

p-value = 0.015

Bonferroni adj p-value = 0.029

Pooled variance:

t = -6.465

df = 749

p-value = 0.000

Bonferroni adj p-value = 0.000

Another example illustrates the application of this significance test. Beta thalassemia is characterized by an increase in hemoglobin A2. Thalassemia becomes more complicated when we consider delta-beta deletion and alpha thalassemia. Nevertheless, we measure hemoglobin A2 by liquid chromatography on the Bio-Rad Variant II. The comparison of hemoglobin A2 in affected and unaffected patients is shown below (with random resampling):

Two-sample t-test on A2 grouped by THALASSEMIA DIAGNOSIS against Alternative = ‘not equal’

| Group | N | Mean | SD |
|---|---|---|---|
| 0 | 257 | 3.250 | 1.131 |
| 1 | 61 | 6.305 | 2.541 |

Separate variance:

t = -9.177

df = 65.7

p-value = 0.000

Bonferroni adj p-value = 0.000

Pooled variance:

t = -14.263

df = 316

p-value = 0.000

Bonferroni adj p-value = 0.000

When we do a paired comparison of the Variant hemoglobin A2 versus quantitation by Helena isoelectric focusing, the t-test shows no significant difference.

Paired samples t-test on A2 vs A2E with 130 cases

Alternative = ‘not equal’

Mean A2 = 3.638

Mean A2E = 3.453

Mean difference = 0.185

SD of difference = 1.960

t = 1.074

df = 129

p-value = 0.285

Bonferroni adj p-value = 0.285

Consider overlay box plots of the troponin I means for normal, stable cardiac patients and AMI patients:

The means of two subgroups may be close, and the confidence intervals around the means may be wide, so that it is not clear whether to accept or reject the null hypothesis. I illustrate this by comparing the two groups that feature normal cardiac status and stable cardiac disease, neither having myocardial infarction. I use the nonparametric Kruskal-Wallis analysis of ranks between the two groups, and I increase the sample size to roughly 100,000 patients by a resampling algorithm. The results for CKMB and for troponin I are:

Kruskal-Wallis One-Way Analysis of Variance for 93538 cases; dependent variable is CKMB, grouping variable is OTHER

| Group | Count | Rank Sum |
|---|---|---|
| 0 | 83405 | 3.64937E+09 |
| 1 | 10133 | 7.25351E+08 |

Mann-Whitney U test statistic = 1.71136E+08, probability = 0.000, chi-square approximation = 9619.624 with 1 df

Kruskal-Wallis One-Way Analysis of Variance for 93676 cases; dependent variable is TROP, grouping variable is OTHER

| Group | Count | Rank Sum |
|---|---|---|
| 0 | 83543 | 3.59446E+09 |
| 1 | 10133 | 7.93180E+08 |

Mann-Whitney U test statistic = 1.04705E+08, probability = 0.000, chi-square approximation = 21850.251 with 1 df
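The Mann-Whitney U in this output is the rank sum of group 0 minus n0(n0+1)/2, and the chi-square approximation is the Kruskal-Wallis H statistic. A self-contained Python sketch of the mechanics, including tie-averaged ranks, run on small made-up values (not the troponin data):

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def mann_whitney_u(a, b):
    """U statistic and Kruskal-Wallis H (two groups, no tie correction)."""
    pooled = list(a) + list(b)
    r = ranks(pooled)
    n1, n2, n = len(a), len(b), len(pooled)
    r1, r2 = sum(r[:n1]), sum(r[n1:])
    u = r1 - n1 * (n1 + 1) / 2
    h = 12 / (n * (n + 1)) * (r1 ** 2 / n1 + r2 ** 2 / n2) - 3 * (n + 1)
    return u, h

# hypothetical marker values for two outcome groups
normal = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05]
disease = [0.06, 0.08, 0.20, 0.35]
u, h = mann_whitney_u(normal, disease)
print(u, round(h, 2))
```

In this toy example U = 0, i.e. the groups separate completely; H is then referred to the chi-square distribution with 1 df, as in the output above.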

Examine a unique data set in which a test is done on amniotic fluid to determine whether surfactant activity is adequate for good fetal lung compliance at delivery. Inadequate surfactant activity carries a risk of respiratory distress of the newborn soon after delivery. The data include the measure of surfactant activity (S/A), gestational age, and fetal status at delivery. This study emphasized the calculation of the odds-ratio and probability of RDS using the surfactant measurement with, and without, gestational age for infants delivered within 72 hours of the test. The statistical method (GOLDminer) has a graphical display with the factor variable as the abscissa and the scaled predictor and odds-ratio as the ordinate. The data acquisition required a multicenter study of the National Academy of Clinical Biochemistry led by John Chapman (Chapel Hill, NC) and Lawrence Kaplan (Bellevue Hospital, NY, NY), published in Clinica Chimica Acta (2002).

The table generated is as follows:

Probability and Odds-Ratios for Regression of S/A on Respiratory Outcomes

| S/A interval | Probability of RDS | Odds Ratio |
|---|---|---|
| 0 – 10 | 0.87 | 713 |
| 11 – 20 | 0.69 | 239 |
| 21 – 34 | 0.43 | 80 |
| 35 – 44 | 0.20 | 27 |
| 45 – 54 | 0.08 | 9 |
| 55 – 70 | 0.03 | 3 |
| > 70 | 0.01 | 1 |
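The odds-ratio column is approximately the odds p/(1 - p) of each interval divided by the odds of the > 70 reference interval. A Python sketch using the rounded probabilities from the table (the high-risk intervals deviate from the printed odds-ratios because those came from the model's unrounded estimates):

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# probabilities of RDS by S/A interval, from the table above
probs = {"0-10": 0.87, "11-20": 0.69, "21-34": 0.43, "35-44": 0.20,
         "45-54": 0.08, "55-70": 0.03, ">70": 0.01}
ref = odds(probs[">70"])  # the > 70 interval is the reference (odds ratio 1)
ors = {k: round(odds(p) / ref) for k, p in probs.items()}
print(ors)
```

For the low-risk intervals this reproduces the table (9, 3, 1); the conversion illustrates why odds-ratios grow much faster than the probabilities themselves.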

There is a plot corresponding to the table above, patented as GOLDminer (graphical ordinal logit display). As the risk increases, the odds-ratio (and probability of an event) increases. The calculation is an advantage when there are more than two values of the factor variable, such as heart attack, not heart attack, and something else. We look at the use of the GOLDminer algorithm, this time using the acute myocardial infarction and troponin T example. The ECG finding is scaled so that the result is normal (0), NSSTT (1), ST depression or T-wave inversion, ST elevation. The troponin T is scaled to: ≤ 0.03, 0.031-0.06, 0.061-0.085, 0.086-0.1, 0.11-0.2, > 0.20 ug/L. The GOLDminer plot is shown below with troponin T as the second predictor.

(Joint Y) DXSCALE

| X-profile | Average score | P(DX = 0) | P(DX = 4) |
|---|---|---|---|
| 4,5 | 3.64 | 0.00 | 0.68 |
| 4,4 | 3.51 | 0.00 | 0.59 |
| 4,3 | 3.35 | 0.00 | 0.48 |
| 3,5 | 3.07 | 0.01 | 0.34 |
| 4,1 | 2.87 | 0.02 | 0.27 |
| 3,4 | 2.79 | 0.02 | 0.24 |
| 4,0 | 2.54 | 0.04 | 0.17 |
| 3,3 | 2.43 | 0.06 | 0.15 |
| 3,2 | 2.00 | 0.12 | 0.08 |
| 2,5 | 1.88 | 0.15 | 0.07 |
| 3,1 | 1.55 | 0.23 | 0.04 |
| 2,4 | 1.42 | 0.26 | 0.03 |
| 3,0 | 1.12 | 0.36 | 0.01 |
| 2,3 | 1.02 | 0.40 | 0.01 |
| 2,2 | 0.70 | 0.53 | 0.00 |
| 2,1 | 0.47 | 0.65 | 0.00 |
| 2,0 | 0.32 | 0.74 | 0.00 |
| 1,3 | 0.29 | 0.77 | 0.00 |
| 1,2 | 0.20 | 0.83 | 0.00 |
| 1,1 | 0.13 | 0.88 | 0.00 |
| 1,0 | 0.09 | 0.91 | 0.00 |

The table above lists the probabilities from the GOLDminer program. Diagnosis scale 4 is MI; diagnosis 0 is baseline normal.

We return to a comparison of CKMB and troponin I. CKMB may be used as a surrogate test for examining the use of troponin I. We scale the CKMB to 3 intervals and the troponin to 6. We construct the 6-by-3 table shown below, with the chi-square analysis.

Frequencies: TNISCALE (rows) by CKMBSCALE (columns)

| | 0 | 1 | 2 | Total |
|---|---|---|---|---|
| 0 | 709 | 12 | 9 | 730 |
| 1 | 14 | 0 | 2 | 16 |
| 2 | 3 | 0 | 0 | 3 |
| 3 | 2 | 0 | 0 | 2 |
| 4 | 4 | 0 | 0 | 4 |
| 5 | 22 | 5 | 17 | 44 |
| Total | 754 | 17 | 28 | 799 |

Expected values: TNISCALE (rows) by CKMBSCALE (columns)

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 688.886 | 15.532 | 25.582 |
| 1 | 15.099 | 0.340 | 0.561 |
| 2 | 2.831 | 0.064 | 0.105 |
| 3 | 1.887 | 0.043 | 0.070 |
| 4 | 3.775 | 0.085 | 0.140 |
| 5 | 41.522 | 0.936 | 1.542 |

| Test statistic | Value | df | Prob |
|---|---|---|---|
| Pearson Chi-square | 198.580 | 10 | 0.000 |
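The Pearson chi-square can be verified directly from the frequency table: each expected count is (row total × column total) / N, and the statistic sums (observed - expected)² / expected over all cells. A Python check:

```python
# Pearson chi-square on the TNISCALE-by-CKMBSCALE frequency table above
observed = [
    [709, 12, 9],
    [14, 0, 2],
    [3, 0, 0],
    [2, 0, 0],
    [4, 0, 0],
    [22, 5, 17],
]
row_tot = [sum(r) for r in observed]
col_tot = [sum(r[j] for r in observed) for j in range(3)]
n = sum(row_tot)
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / n  # expected count under independence
        chi2 += (obs - exp) ** 2 / exp
df = (len(observed) - 1) * (3 - 1)
print(round(chi2, 2), df)  # ≈ 198.58 with 10 df, matching the output above
```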

How do we select the best decision value for a test? The standard accepted method is a ROC plot. We have seen how to calculate sensitivity, specificity, and error rates. The false positive error rate is 1 – specificity. The ROC curve plots sensitivity versus 1 – specificity. The ROC plot requires determination of the "disease" variable by some means other than the test being evaluated. What if the true diagnosis is not accurately known? That question introduces the concept of latent class models.
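A minimal sketch of the ROC calculation on hypothetical marker values (a useful check is that the trapezoidal area equals the Mann-Whitney probability that a diseased subject outranks a disease-free one):

```python
def roc_points(scores_pos, scores_neg, thresholds):
    """(1 - specificity, sensitivity) pairs; 'positive' means score >= threshold."""
    pts = []
    for t in sorted(thresholds, reverse=True):
        sens = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        pts.append((fpr, sens))
    return pts

def auc(pts):
    """Trapezoidal area under the ROC curve, anchored at (0,0) and (1,1)."""
    pts = [(0.0, 0.0)] + sorted(pts) + [(1.0, 1.0)]
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# hypothetical marker values in diseased and disease-free subjects
pos = [0.9, 0.8, 0.8, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.2, 0.1, 0.1]
pts = roc_points(pos, neg, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
print(auc(pts))  # 0.9 for this made-up data
```

Scanning the points for the threshold closest to the upper-left corner is one common way to pick the decision value the paragraph above asks about.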


A special nutritional study set was used in which the definition of the effect is not as clear as that for heart attack. The risk of malnutrition is assessed at the bedside by a dietitian using observed features (presence of a wound, a malnutrition-related condition, and poor oral intake) and by laboratory tests: serum albumin (protein), red cell hemoglobin, and lymphocyte count. The composite score was a value of 1 to 4. Data were collected by Linda Brugler, RD, MBA, at St. Francis Hospital (Wilmington, DE) on 62 patients to determine whether a better model could be developed using new predictors.

The new predictors were laboratory tests not used in the definition of the risk level, which could be problematic. The tests albumin, lymphocyte count, and hemoglobin were expected to be highly correlated with the risk level because they were used in its definition. The prealbumin, but not retinol binding protein or C reactive protein, was correlated with risk score and improved the prediction model.

The crosstable for risk level versus albumin is significant at p < 0.0001.

A GOLDminer plot showed scaled prealbumin versus risk levels 3 and 4. A value less than 5 is severe malnutrition and over 19 is not malnourished. Mild and moderate malnutrition fall between these values.

A method called latent class cluster analysis is used to classify the data. A latent class is invoked when the true classification isn't accurately known. The result of the analysis is shown in Table 4. The proportions of the variable subclasses within each class are shown, and within each class they total 1.00 (100%).

| Variable | Level | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|---|
| Cluster size | | 0.5545 | 0.3304 | 0.1151 |
| PAB1COD | 1 | 0.6841 | 0.0383 | 0.0454 |
| | 2 | 0.3134 | 0.6346 | 0.6662 |
| | 3 | 0.0024 | 0.1781 | 0.1656 |
| | 4 | 0.0001 | 0.1490 | 0.1227 |
| ALB0COD | 1 | 0.9491 | 0.4865 | 0.1013 |
| | 2 | 0.0389 | 0.1445 | 0.0869 |
| | 3 | 0.0117 | 0.3167 | 0.5497 |
| | 4 | 0.0003 | 0.0523 | 0.2621 |
| LCCOD | 1 | 0.1229 | 0.0097 | 0.7600 |
| | 2 | 0.3680 | 0.0687 | 0.2381 |
| | 4 | 0.2297 | 0.2383 | 0.0016 |
| | 5 | 0.2793 | 0.6832 | 0.0002 |
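Given the class sizes and within-class conditional probabilities in the table, a new case is assigned to a class by Bayes' rule under the local-independence assumption. A Python sketch scoring one hypothetical response pattern (PAB1COD = 1, ALB0COD = 1, LCCOD = 1), with the values read from the table:

```python
# Posterior class membership for one observed response pattern,
# using the class sizes and conditional probabilities reported above.
sizes = [0.5545, 0.3304, 0.1151]
# P(level | class) for the hypothetical pattern PAB1COD=1, ALB0COD=1, LCCOD=1
cond = {
    "PAB1COD=1": [0.6841, 0.0383, 0.0454],
    "ALB0COD=1": [0.9491, 0.4865, 0.1013],
    "LCCOD=1":   [0.1229, 0.0097, 0.7600],
}
joint = []
for c in range(3):
    p = sizes[c]
    for probs in cond.values():
        p *= probs[c]  # indicators are independent within a class
    joint.append(p)
total = sum(joint)
posterior = [round(p / total, 3) for p in joint]
print(posterior)  # this pattern lands almost entirely in Cluster 1
```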

There are other aspects of informatics that are essential for the educational design of the laboratory professional of the future. These include preparation of PowerPoint presentations, use of the Internet to obtain current information, quality control designed into the process of handling laboratory testing, evaluation of data from different correlated workstations, and instrument integration. An integrated open architecture will be essential for financial management of the laboratory as well. Continued improvement of the technology base of the laboratory will become routine over the next few years. The education of the CLS for a professional career in medical technology will require an individual who is adaptive and well prepared for a changing technology environment. The next section of this document describes the information structure needed just to carry out the day-to-day operations of the laboratory.

## Cost linkages important to define value

Traditional accounting methods do not take into account the cost relationships that are essential for economic survival in a competitive environment: the only items on the ledger are materials and supplies, labor and benefits, and indirect costs. This is the description of the business set forth in an NCCLS cost manual, but it is not sufficient to account for the dimensions of the business in relationship to its activities. The emergence of spreadsheets and, as importantly, the development of relational database structures have transformed how we can look at the costing of organizations in relationship to how individuals and groups within the organization carry out the business plan and realize the mission set forth by the governing body. In this sense, the traditional model was incomplete because it only accounted for the costs incurred by departments in a structure that allocates resources to each department based on the assessed use of resources in providing services. The model has to account for the allocation of resources to product lines of services (as in the DRG model developed by Dr. Eleanor Travers). A revised model has to take into account two new dimensions. The first is the allocation of resources to provide services that are distinct medical/clinical activities: in the laboratory service business there may be distinctive services as well as market sectors. That is, health care organizations view their markets as defined by service ZIP codes, which delineate the lines drawn between their market and the competition (in the absence of clear overlap).

We have to keep in mind the service groups defined by John Thompson and Robert Fetter in the development of the DRGs (Diagnosis Related Groups), which have a real relationship to resource requirements for pediatrics, geriatrics, obstetrics, gynecology, hematology, oncology, cardiology, medicine, and surgery. These groups are derived from bundles of ICD (International Classification of Diseases) codes that have comparable within-group use of laboratory, radiology, nutrition, pharmacy, and other resources. There was an early concern that there was too much variability within DRGs, which was addressed by severity-of-illness adjustment (Susan Horn). It is now clear that ICD codes don't capture a significant part of the content of the medical record. A method to correct this problem is being devised by Kaiser and Mayo using the SNOMED codes as a starting point. The point is that the activities, the resources required, and the payment must be aligned for the payment system to be valid. Of some interest is the association of severity of illness with more than two comorbidities, and with critical values of a few laboratory tests, e.g., albumin, sodium, potassium, hemoglobin, and white cell count. The actual linkage of these resources to the cost of the 10 or 20 most common diagnostic categories is only a recent event. As a rule, the top 25 categories account for a substantial share of the costs that it is of great interest to control. The improvement of database technology makes it conceivable that 100 categories of disease classification could be managed without difficulty in the next ten years.

## Quality cost synergism

What is traditionally described is only one dimension of the business of the operation. It is the business of the organization, but it is only one-third of the description of the organization and the costs that drive it. The second dimension of the organization's cost profile is only obtained by cost accounting for how the organization creates value. Value is simply the ratio of outputs to inputs. The traditional cost accounting model looks only at business value added. The value generated by an organization is attributable to a service or good produced that a customer is willing to purchase. We have to measure the value by measuring some variable that is highly correlated with the value created. That measure is partly accounted for by transaction times. We can borrow from the same model that is used in other industries; the transportation business is an example. A colleague has designed a surgical pathology information system on the premise that a report sitting in the pathology office, or a phone inquiry by a surgeon, is a failure of the service. This is analogous to the Southwest Airlines mission to have the lowest time on the ground in the industry. The growing complexity of service needs, the capital requirements to support those needs, and the contractual requirements are driving redesign of services in a constantly changing environment.

## Technology requirements

In the last 15 years we have gone from predominantly batch and large-scale production to predominantly random access and a growing point-of-care application, with pneumatic tube delivery systems in the acute care setting. The emphasis on population-based health and the increasing shift from acute care to ambulatory care have increased the pressure for point-of-care testing to reduce second visits for adjustment of medication. The laboratory, radiology and imaging services, and pharmacy information have to be directed to a medical record that may be accessed in the acute care or ambulatory setting. We not only have the proposition that faster is better, but that access is from almost anyplace and almost anytime – connectivity.

There has been a strategic discussion about the configuration of information services that is resolving itself through the needs of the marketplace. Large, self-contained organizations are short-lived, and with the emergence of networked provider organizations there will be no compelling interest in systems that are not tailored to the variety of applications and environments that are served. The migration from minicomputer to microcomputer client-server networks will move rapidly to N-tiered systems with distributed object-oriented features. The need for laboratory information systems as a separate application can be seriously challenged by the new paradigm.

## Utilization and Cost Linkages

Laboratory utilization has to be looked at from more than one perspective in relationship to costs and revenues. The redefinition of panels cuts the marginal added cost to produce an additional test, but it doesn't cut the largest cost, that of obtaining and processing the specimen. Unfortunately, there is a fixed cost of operations that has to be covered, which also drives the formation of laboratory consolidations to achieve sufficient volume. If one looks at the capital requirements and the labor to support a minimum volume of testing, the marginal cost of added tests decreases with large volume. The problem with the consolidation argument is that one has to remove testing from the local site in order to increase the volume, with an anticipated effect on cycle time for processing. There is also a significant resource cost for courier service, specimen handling, and reporting. Let's look at the reverse: what is the effect of decreasing utilization? One increases the marginal added cost per unit of testing on specimens or accessions. The basic fixed cost is the same, and once the volume of testing needed to break even is met, the advantage of additional volume is lost. Fixing the expected cost per patient or per accession becomes problematic if there is a requirement to reduce utilization.
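The break-even argument can be made concrete with a toy fixed-plus-variable cost model. The dollar figures below are hypothetical, not measured laboratory costs; the point is only that average cost per test approaches the variable cost at high volume, so reduced utilization raises the unit cost:

```python
# illustrative cost model with hypothetical figures
FIXED = 500_000.0   # annual capital, service contracts, minimum staffing
VARIABLE = 2.50     # reagents and consumables per test

def average_cost(volume):
    """Average cost per test at a given annual test volume."""
    return (FIXED + VARIABLE * volume) / volume

for v in (100_000, 250_000, 500_000):
    print(v, round(average_cost(v), 2))  # unit cost falls as volume rises
```

Running the model in reverse shows the utilization problem: cutting volume from 500,000 to 100,000 tests more than doubles the cost per test even though total spending falls.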

The key volume for processing in the service sense is the number of specimens processed, which has an enormous impact on the processing requirements (number of tests adds to reagent costs and turnaround time per accession). The result is that one might consider the reduction of testing that is done to monitor critical patients’ status more frequently than is needed. One can examine the frequency of the CBC, PT/APTT, panels, electrolytes, glucose, and blood gases in the ICUs. The use of the laboratory is expected to be more intense, reflecting severity of illness, in this setting. On the other hand, excess redundancy may reflect testing that makes no meaningful contribution to patient care. This may be suggested by repeated testing with no significant variation in the lab results.

## Intangible elements

Competitive advantage may have marginal costs with enormous value enhancement, for example in the manner of reporting results. My colleagues have proposed the importance of a scale-free representation of the laboratory data for presentation to the provider and the patient. This can be extended further by scaling the normalized data into intervals associated with expected risks for outcomes. This would move the laboratory into the domain of assisting in the management of population-adjusted health outcomes.

## References

Blume P. Design of a clinical laboratory computer system. Laboratory and Hospital Information Systems. In Clinics Lab Med 1991;11:83-104.

Didner RS. Back-to-front systems design: a guns and butter approach. Proc Intl Ergonomics Assoc 1982;–

Didner RS, Butler KA. Information requirements for user decision support: designing systems from back to front. Proc Intl Conf on Cybernetics and Society. IEEE. 1982;–:415-419.

Bernstein LH. An LIS is not all pluses. MLO 1986;18:75-80.

Bernstein LH, Sachs B. Selecting an automated chemistry analyzer: cost analysis. Amer Clin Prod Rev 1988;–:16-19.

Bernstein L, Sachs E, Stapleton V, Gorton J. Replacement of a laboratory instrument system based on workflow design. Amer Clin Prod Rev 1988; –: 22-24.

Bernstein LH. Computer-assisted restructuring services. Amer Clin Prod Rev 1986;9:–

Bernstein LH, Sachs B, Stapleton V, Gorton J, Lardas O. Implementing a laboratory information management system and verifying its performance. Informatics in Pathol 1986;1:224-233.

Bernstein LH. Selecting a laboratory computer system: the importance of auditing laboratory performance. Amer Clin Prod Rev 1985;–:30-33.

Castaneda-Mendez K, Bernstein LH. Linking costs and quality improvement to clinical outcomes through added value. J Healthcare Qual 1997;19:11-16.

Bernstein LH. The contribution of laboratory information systems to quality assurance. Amer Clin Prod Rev 1987;18:10-15.

Bernstein LH. Predicting the costs of laboratory testing. Pathologist 1985;39:–

Bernstein LH, Davis G, Pelton T. Managing and reducing lab costs. MLO 1984;16:53-56.

Bernstein LH, Brouillette R. The negative impact of untimely data in the diagnosis of acute myocardial infarction. Amer Clin Lab 1990;__:38-40.

Bernstein LH, Spiekerman AM, Qamar A, Babb J. Effective resource management using a clinical and laboratory algorithm for chest pain triage. Clin Lab Management Rev 1996;–:143-152.

Shaw-Stiffel TA, Zarny LA, Pleban WE, Rosman DD, Rudolph RA, Bernstein LH. Effect of nutrition status and other factors on length of hospital stay after major gastrointestinal surgery. Nutrition (Intl) 1993;9:140-145.

Bernstein LH. Relationship of nutritional markers to length of hospital stay. Nutrition (Intl)(suppl) 1995;11:205-209.

Bernstein LH, Coles M, Granata A. The Bridgeport Hospital experience with autologous transfusion in orthopedic surgery. Orthopedics 1997;20:677-680.

Bernstein LH. Realization of the projected impact of a chemistry workflow management system at Bridgeport Hospital. In Quality and Statistics: Total Quality Management. Kowalewski MJ, Ed. 1994;120-133. ASTM: STP 1209. Phila, PA.

Bernstein LH, Kleinman GM, Davis GL, Chiga M. Part A reimbursement: what is your role in medical quality assurance? Pathologist 1986;40:–.

Bernstein LH. What constitutes a laboratory quality monitoring program? Amer J Qual Util Rev 1990;5:95-99.

Mozes B, Easterling J, Sheiner LB, Melmon KL, Kline R, Goldman ES, Brown AN. Case-mix adjustment using objective measures of severity: the case for laboratory data. Health Serv Res 1994;28:689-711.

Bernstein LH, Shaw-Stiffel T, Zarny L, Pleban W. An informational approach to likelihood of malnutrition. Nutr (Intl) 1996;12:772-226.