
Medical Informatics View

Chapter 1 Statement of Inferential Second Opinion

Realtime Clinical Expert Support

Gil David and Larry Bernstein have developed, in consultation with Prof. Ronald Coifman of the Yale University Applied Mathematics Program, a software system that is the equivalent of an intelligent Electronic Health Records Dashboard that provides empirical medical reference and suggests quantitative diagnostic options.


Keywords: Entropy, Maximum Likelihood Function, separatory clustering, peripheral smear, automated hemogram, Anomaly, classification by anomaly, multivariable and multisyndromic, automated second opinion

Abbreviations: Akaike Information Criterion, AIC; Bayesian Information Criterion, BIC; Systemic Inflammatory Response Syndrome, SIRS.


Background: The current design of the Electronic Medical Record (EMR) is a linear presentation of portions of the record by services, by diagnostic method, and by date, to cite examples. This allows perusal through a graphical user interface (GUI) that partitions the information or necessary reports in a workstation entered by keying to icons. This requires that the medical practitioner find the history, medications, laboratory reports, cardiac imaging and EKGs, and radiology in different workspaces. The introduction of a DASHBOARD has allowed a presentation of drug reactions, allergies, primary and secondary diagnoses, and critical information about any patient to the caregiver needing access to the record. The advantage of this innovation is obvious. The startup problem is what information is presented and how it is displayed, which is a source of variability and a key to its success.

Intent: We are proposing an innovation that supersedes the main design elements of a DASHBOARD and utilizes the conjoined syndromic features of the disparate data elements. The important determinant of the success of this endeavor is that it facilitates both the workflow and the decision-making process with a reduction of medical error. Work is in progress to extend the capabilities with model datasets and sufficient data, because the extraction of data from disparate sources will, in the long run, further improve this process. An example is the finding of ST depression on EKG coincident with an elevated cardiac biomarker (troponin), particularly in the absence of substantially reduced renal function. The conversion of hematology-based data into useful clinical information requires the establishment of problem-solving constructs based on the measured data.

The most commonly ordered test used for managing patients worldwide is the hemogram, which often incorporates the review of a peripheral smear. While the hemogram has undergone progressive modification of the measured features over time, the subsequent expansion of the panel of tests has provided a window into the cellular changes in the production, release or suppression of the formed elements from the blood-forming organ to the circulation. In the hemogram one can view data reflecting the characteristics of a broad spectrum of medical conditions.

Progressive modification of the measured features of the hemogram has delineated characteristics expressed as measurements of size, density, and concentration, resulting in many characteristic features of classification. In the diagnosis of hematological disorders, proliferation of marrow precursors, the domination of a cell line, and features of suppression of hematopoiesis provide a two-dimensional model. Other dimensions are created by considering the maturity of the circulating cells. The application of rules-based, automated problem solving should provide a valid approach to the classification and interpretation of the data used to determine a knowledge-based clinical opinion. The exponential growth of knowledge since the mapping of the human genome has been enabled by parallel advances in applied mathematics that have not been a part of traditional clinical problem solving. As the complexity of statistical models has increased, the dependencies have become less clear to the individual. Contemporary statistical modeling has a primary goal of finding an underlying structure in studied data sets. The development of an evidence-based inference engine that can substantially interpret the data at hand and convert it in real time to a “knowledge-based opinion” could improve clinical decision-making by incorporating multiple complex clinical features as well as duration of onset into the model.

An example of a difficult area for clinical problem solving is found in the diagnosis of SIRS and associated sepsis. SIRS (and associated sepsis) is a costly diagnosis in hospitalized patients. Failure to diagnose sepsis in a timely manner creates a potential financial and safety hazard. The early diagnosis of SIRS/sepsis is made by the clinician through the application of defined criteria (temperature, heart rate, respiratory rate and WBC count). The application of those clinical criteria, however, defines the condition after it has developed and has not provided a reliable method for the early diagnosis of SIRS. The early diagnosis of SIRS may possibly be enhanced by the measurement of proteomic biomarkers, including transthyretin, C-reactive protein and procalcitonin. Immature granulocyte (IG) measurement has been proposed as a more readily available indicator of the presence of granulocyte precursors (left shift). The use of such markers, obtained by automated systems in conjunction with innovative statistical modeling, provides a promising approach to enhance workflow and decision making. Such a system utilizes the conjoined syndromic features of disparate data elements with an anticipated reduction of medical error. This study is only an extension of our approach to repairing a longstanding problem in the construction of the many-sided electronic medical record (EMR). In a classic study carried out at Bell Laboratories, R Didner found that information technologies reflect the view of the creators, not the users, and that Front-to-Back Design is needed.
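The defined criteria named above (temperature, heart rate, respiratory rate and WBC count) can be sketched as a simple rule check. This is a minimal illustration, not the inference engine described in this post. The numeric cutoffs are an assumption taken from the conventional consensus definition of SIRS (two or more criteria met) and are not stated in the text; the alternative PaCO2 and band-form criteria are omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    temperature_c: float     # core temperature, degrees Celsius
    heart_rate: float        # beats per minute
    respiratory_rate: float  # breaths per minute
    wbc_k_per_ul: float      # white blood cell count, thousands per microliter


def sirs_criteria_met(v: Vitals) -> int:
    """Count how many of the four conventional SIRS criteria are met.

    Cutoffs are assumed from the conventional consensus definition,
    not from the post itself.
    """
    checks = [
        v.temperature_c > 38.0 or v.temperature_c < 36.0,
        v.heart_rate > 90,
        v.respiratory_rate > 20,
        v.wbc_k_per_ul > 12.0 or v.wbc_k_per_ul < 4.0,
    ]
    return sum(checks)


def meets_sirs(v: Vitals) -> bool:
    """Conventional rule: SIRS is flagged when two or more criteria are met."""
    return sirs_criteria_met(v) >= 2


# Hypothetical patient: febrile and tachycardic, normal respiratory rate and WBC
example = Vitals(temperature_c=38.6, heart_rate=104,
                 respiratory_rate=18, wbc_k_per_ul=9.5)
print(sirs_criteria_met(example), meets_sirs(example))
```

A check of this kind only fires once the criteria are already present, which is the limitation the paragraph above points out and the motivation for adding biomarkers and IG measurement.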

Costs would be reduced, and accuracy improved, if the clinical data could be captured directly at the point it is generated, in a form suitable for transmission to insurers, or machine transformable into other formats. Such data capture could also be used to improve the form and structure of how this information is viewed by physicians, and could form the basis of a more comprehensive database linking clinical protocols to outcomes, improving knowledge of this relationship and hence clinical outcomes.



How we frame our expectations is so important that it determines the data we collect to examine the process.   In the absence of data to support an assumed benefit, there is no proof of validity at whatever cost.   This has meaning for hospital operations, for nonhospital laboratory operations, for companies in the diagnostic business, and for planning of health systems.

In 1983, a vision for creating the EMR was introduced by Lawrence Weed, as described by McGowan and Winstead-Fry (J J McGowan and P Winstead-Fry. Problem Knowledge Couplers: reengineering evidence-based medicine through interdisciplinary development, decision support, and research. Bull Med Libr Assoc. 1999 October; 87(4): 462–470. PMCID: PMC226622).





They introduce Problem Knowledge Couplers as a clinical decision support software tool that recognizes that functionality must be predicated upon combining unique patient information, obtained through relevant structured question sets, with the appropriate knowledge found in the world’s peer-reviewed medical literature. The premise is stated by LL Weed in “Idols of the Mind” (Dec 13, 2006): “a root cause of a major defect in the health care system is that, while we falsely admire and extol the intellectual powers of highly educated physicians, we do not search for the external aids their minds require”. HIT use has been focused on information retrieval, leaving the unaided mind burdened with information processing.



The data presented has to be comprehended in context with vital signs, key symptoms, and an accurate medical history. Consequently, the limits of memory and cognition are tested in medical practice on a daily basis. We deal with problems in the interpretation of data presented to the physician, and with how the situation could be improved through better design of the software that presents this data. The computer architecture that the physician uses to view the results is more often than not presented as the designer would prefer, and not as the end-user would like. In order to optimize the interface for the physician, the system would have a “front-to-back” design, with the call up for any patient ideally consisting of a dashboard design that presents the crucial information that the physician would likely act on in an easily accessible manner. The key point is that each item used has to be closely related to a corresponding criterion needed for a decision. Currently, improved design is heading in that direction. In removing this limitation, the output requirements have to be defined before the database is designed to produce the required output. The ability to see any other information, or to see a sequential visualization of the patient’s course, would be steps to home in on other views. In addition, the amount of relevant information, even when presented well, is a cognitive challenge unless it is presented in a disease- or organ-system structure. So the interaction between the user and the electronic medical record has a significant effect on practitioner time and on the ability to minimize errors of interpretation, facilitate treatment, and manage costs. The reality is that clinicians are challenged by the need to view a large amount of data, with only a few resources available to know which of these values are relevant, whether a result needs action, or how urgent it is.

The challenge then becomes how fundamental measurement theory can lead to the creation at the point of care of more meaningful actionable presentations of results. WP Fisher refers to the creation of a context in which computational resources for meeting the challenges will be incorporated into the electronic medical record. The one he chooses is a probabilistic conjoint (Rasch) measurement model, which uses scale-free standard measures and meets data quality standards. He illustrates this by fitting a set of data provided by Bernstein (19) (27 items for the diagnosis of acute myocardial infarction, AMI) to a Rasch multiple rating scale model, testing the hypothesis that items work together to delineate a unidimensional measurement continuum. The results indicated that highly improbable observations could be discarded, that data volume could be reduced based on internal, and that the ability of the care provider to interpret the data was increased.


Classified data a separate issue from automation

Feature Extraction. This further breakdown in the modern era is determined by genetically characteristic gene sequences that are transcribed into what we measure. Eugene Rypka contributed greatly to clarifying the extraction of features in a series of articles, which set the groundwork for the methods used today in clinical microbiology. The method he describes is termed S-clustering, and it will have a significant bearing on how we can view hematology data. He describes S-clustering as extracting features from endogenous data that amplify or maximize structural information to create distinctive classes. The method classifies by taking the number of features with sufficient variety to map into a theoretic standard. The mapping is done by a truth table, and each variable is scaled to assign a value to each message choice. The number of messages and the number of choices form an N-by-N table. He points out that the message choice in an antibody titer would be converted from 0 + ++ +++ to 0 1 2 3.
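The scaling step just described, in which titer readings 0 + ++ +++ become 0 1 2 3 and the encoded variables are mapped through a truth table, can be sketched as follows. This is only an illustration of the encoding idea and not Rypka's S-clustering procedure itself; the function names, feature names, and records are hypothetical.

```python
from collections import defaultdict

# Ordinal scale from the text: titer readings 0, +, ++, +++ become 0, 1, 2, 3.
TITER_SCALE = {"0": 0, "+": 1, "++": 2, "+++": 3}


def encode_titer(reading: str) -> int:
    """Convert one graded reading (a 'message choice') to its ordinal value."""
    return TITER_SCALE[reading.strip()]


def build_truth_table(records, features):
    """Group class labels by their encoded feature pattern.

    Each key is a tuple of ordinal values, one per feature; each value is the
    set of classes observed with that pattern.
    """
    table = defaultdict(set)
    for rec in records:
        pattern = tuple(encode_titer(rec[f]) for f in features)
        table[pattern].add(rec["label"])
    return dict(table)


# Hypothetical records, for illustration only
records = [
    {"test_a": "0", "test_b": "+++", "label": "class 1"},
    {"test_a": "++", "test_b": "0", "label": "class 2"},
    {"test_a": "++", "test_b": "0", "label": "class 2"},
    {"test_a": "+", "test_b": "+", "label": "class 3"},
]
table = build_truth_table(records, ["test_a", "test_b"])
for pattern, labels in sorted(table.items()):
    print(pattern, sorted(labels))
```

A pattern that maps to exactly one class separates that class from the others; a pattern shared by several classes signals that the chosen features do not yet carry enough variety.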

Even though there may be a large number of measured values, the variety is reduced by this compression, at some risk of loss of information. Yet the real issue is how a combination of variables falls into a table with meaningful information. We are concerned with accurate assignment into uniquely variable groups by information in test relationships. One determines the effectiveness of each variable by its contribution to information gain in the system. The reference or null set is the class having no information. Uncertainty in assigning to a classification is only relieved by providing sufficient information. The possibility for realizing a good model for approximating the effects of factors supported by data used for inference owes much to the discovery of Kullback-Leibler distance or “information”, and Akaike found a simple relationship between K-L information and Fisher’s maximized log-likelihood function. A solid foundation in this work was elaborated by Eugene Rypka. Of course, this was made far less complicated by the genetic complement that defines its function, which made more accessible the study of biochemical pathways. In addition, the genetic relationships in plant genetics were accessible to Ronald Fisher for the application of the linear discriminant function. In the last 60 years the application of entropy comparable to the entropy of physics, information, noise, and signal processing has been fully developed by Shannon, Kullback, and others, and has been integrated with modern statistics as a result of the seminal work of Akaike, Leo Goodman, Magidson and Vermunt, and unrelated work by Coifman. Dr. Magidson writes about Latent Class Model evolution:


The recent increase in interest in latent class models is due to the development of extended algorithms which allow today’s computers to perform LC analyses on data containing more than just a few variables, and the recent realization that the use of such models can yield powerful improvements over traditional approaches to segmentation, as well as to cluster, factor, regression and other kinds of analysis.

Perhaps the application to medical diagnostics had been slowed by limitations of data capture and computer architecture, as well as by lack of clarity in defining the most distinguishing features needed for diagnostic clarification. Bernstein and colleagues had a series of studies using Kullback-Leibler distance (effective information) for clustering to examine the latent structure of the elements commonly used for diagnosis of myocardial infarction (CK-MB, LD and the isoenzyme-1 of LD), protein-energy malnutrition (serum albumin, serum transthyretin, condition associated with protein malnutrition (see Jeejeebhoy and subjective global assessment), prolonged period with no oral intake), prediction of respiratory distress syndrome of the newborn (RDS), and prediction of lymph nodal involvement of prostate cancer, among other studies. The exploration of syndromic classification has made a substantial contribution to the diagnostic literature, but has only been made useful through publication on the web of calculators and nomograms (such as Epocrates and Medcalc) accessible to physicians through an iPhone. These are not an integral part of the EMR, and the applications require an anticipation of the need for such processing.
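The quantities referred to in this section (entropy, the information gain contributed by a variable, Kullback-Leibler distance, and Akaike's criterion built on the maximized log-likelihood) can be written down in a few lines. The sketch below uses the standard textbook definitions, with AIC taken as 2k minus twice the maximized log-likelihood. It is not the procedure used in the cited studies, and the toy data and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on one variable."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def kl_divergence(p, q):
    """Kullback-Leibler distance D(p || q) in bits for discrete distributions.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def aic(max_log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L_max)."""
    return 2 * n_parameters - 2 * max_log_likelihood


# Hypothetical toy data: one marker that separates the classes, one that does not
diagnosis = ["MI", "MI", "no MI", "no MI"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]
print(information_gain(informative, diagnosis))
print(information_gain(uninformative, diagnosis))
```

A variable whose information gain is near zero behaves like the null set described above: it contributes nothing toward relieving the uncertainty of classification.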

Gil David et al. introduced AUTOMATED processing of the data available to the ordering physician, and we can anticipate an enormous impact on the diagnosis and treatment of perhaps half of the top 20 most common causes of hospital admission that carry a high cost and morbidity. For example: anemias (iron deficiency, vitamin B12 and folate deficiency, and hemolytic anemia or myelodysplastic syndrome); pneumonia; systemic inflammatory response syndrome (SIRS) with or without bacteremia; multiple organ failure and hemodynamic shock; electrolyte/acid base balance disorders; acute and chronic liver disease; acute and chronic renal disease; diabetes mellitus; protein-energy malnutrition; acute respiratory distress of the newborn; acute coronary syndrome; congestive heart failure; disordered bone mineral metabolism; hemostatic disorders; leukemia and lymphoma; malabsorption syndromes; and cancer(s) [breast, prostate, colorectal, pancreas, stomach, liver, esophagus, thyroid, and parathyroid].

Extension of conditions and presentation to the electronic medical record (EMR)

We have published on the application of an automated inference engine to the Systemic Inflammatory Response Syndrome (SIRS), a serious infection, or emerging sepsis. We can report on this without going over previous ground. Of considerable interest is the morbidity and mortality of sepsis, and the hospital costs from a late diagnosis. If missed early, it could be problematic, and it could be seen as a hospital complication when it is not. Improving on previous work, we have the opportunity to look at the contribution of a fluorescence-labeled flow cytometric measurement of the immature granulocytes (IG), which is now widely used but has not been adequately evaluated from the perspective of diagnostic usage. We have done considerable work on protein-energy malnutrition (PEM), for which the automated interpretation is currently in review. Of course, the cholesterol, lymphocyte count, and serum albumin provide the weight of evidence with the primary diagnosis (emphysema, chronic renal disease, eating disorder), and serum transthyretin would be low and remain low for a week in critical care. This could be a modifier with age in providing discriminatory power.


Chapter  3           References


The Cost Burden of Disease: U.S. and Michigan. CHRT Brief. January 2010.

The National Hospital Bill: The Most Expensive Conditions by Payer, 2006. HCUP Brief #59.


Rudolph RA, Bernstein LH, Babb J. Information-Induction for the diagnosis of myocardial infarction. Clin Chem 1988; 34:2031-2038.

Bernstein LH (Chairman). Prealbumin in Nutritional Care Consensus Group. Measurement of visceral protein status in assessing protein and energy malnutrition: standard of care. Nutrition 1995; 11:169-171.

Bernstein LH, Qamar A, McPherson C, Zarich S, Rudolph R. Diagnosis of myocardial infarction: integration of serum markers and clinical descriptors using information theory. Yale J Biol Med 1999; 72: 5-13.


Kaplan L.A.; Chapman J.F.; Bock J.L.; Santa Maria E.; Clejan S.; Huddleston D.J.; Reed R.G.; Bernstein L.H.; Gillen-Goldstein J. Prediction of Respiratory Distress Syndrome using the Abbott FLM-II amniotic fluid assay. The National Academy of Clinical Biochemistry (NACB) Fetal Lung Maturity Assessment Project.  Clin Chim Acta 2002; 326(8): 61-68.


Bernstein LH, Qamar A, McPherson C, Zarich S. Evaluating a new graphical ordinal logit method (GOLDminer) in the diagnosis of myocardial infarction utilizing clinical features and laboratory data. Yale J Biol Med 1999; 72:259-268.


Bernstein L, Bradley K, Zarich SA. GOLDmineR: Improving models for classifying patients with chest pain. Yale J Biol Med 2002; 75:183-198.

Ronald Raphael Coifman and Mladen Victor Wickerhauser. Adapted Waveform Analysis as a Tool for Modeling, Feature Extraction, and Denoising. Optical Engineering, 33(7):2170–2174, July 1994.


R. Coifman and N. Saito. Constructions of local orthonormal bases for classification and regression. C. R. Acad. Sci. Paris, 319 Série I:191-196, 1994.


Chapter 4           Clinical Expert System

Realtime Clinical Expert Support and validation System

We have developed a software system that is the equivalent of an intelligent Electronic Health Records Dashboard that provides empirical medical reference and suggests quantitative diagnostic options. The primary purpose is to gather medical information, generate metrics, analyze them in realtime and provide a differential diagnosis, meeting the highest standard of accuracy. The system builds a unique characterization of each patient and provides a list of other patients that share this unique profile, thereby utilizing the vast aggregated knowledge (diagnosis, analysis, treatment, etc.) of the medical community. The main mathematical breakthroughs are provided by accurate patient profiling and inference methodologies in which anomalous subprofiles are extracted and compared to potentially relevant cases. As the model grows and its knowledge database is extended, the diagnosis and the prognosis become more accurate and precise. We anticipate that implementing this diagnostic amplifier would result in higher physician productivity at a time of great human resource limitations, safer prescribing practices, rapid identification of unusual patients, better assignment of patients to observation, inpatient beds, intensive care, or referral to clinic, and shortened ICU and bed days.

The main benefit is a real-time assessment, together with diagnostic options based on comparable cases and flags for risk and potential problems, as illustrated in the following case acquired on 04/21/10. The patient was diagnosed by our system with severe SIRS at a grade of 0.61.


The patient was treated for SIRS and the blood tests were repeated during the following week. The full combined record of our system’s assessment of the patient, as derived from the further Hematology tests, is illustrated below. The yellow line shows the diagnosis that corresponds to the first blood test (as also shown in the image above). The red line shows the next diagnosis that was performed a week later.









As we can see, following treatment the SIRS risk was eliminated as a major concern, and the system provides positive feedback on the physician’s treatment.


Method for data organization and classification via characterization metrics.

Our database is organized to enable linking a given profile to known profiles. This is achieved by associating a patient with a peer group of patients having an overall similar profile, where the similar profile is obtained through a randomized search for an appropriate weighting of variables. Given the selection of a patient’s peer group, we build a metric that measures the dissimilarity of the patient from its group. This is achieved through a local iterated statistical analysis in the peer group.
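The two steps in this paragraph, selecting a peer group under a weighting of variables and then measuring how the patient differs from that group, can be illustrated with a deliberately simple stand-in. The sketch below assumes a fixed weight vector, a weighted Euclidean distance, and a per-variable z-score against the peer group. The randomized search for weights and the local iterated statistical analysis described above are not reproduced, and all names and data are hypothetical.

```python
import math
import statistics


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two measurement vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def peer_group(patient, database, weights, k):
    """Return the k database rows closest to the patient under the given weights."""
    ranked = sorted(database, key=lambda row: weighted_distance(patient, row, weights))
    return ranked[:k]


def dissimilarity_profile(patient, peers):
    """Per-variable z-score of the patient relative to the peer group.

    A simple stand-in for the dissimilarity metric; large absolute values mark
    the variables on which the patient departs from otherwise similar patients.
    """
    profile = []
    for j, value in enumerate(patient):
        column = [row[j] for row in peers]
        mean = statistics.mean(column)
        spread = statistics.pstdev(column) or 1.0  # guard against zero spread
        profile.append((value - mean) / spread)
    return profile


# Hypothetical rows of (WBC in thousands per microliter, hemoglobin in g/dL)
database = [(7.0, 14.0), (8.0, 13.0), (6.0, 15.0), (20.0, 9.0)]
patient = (7.0, 10.0)

peers = peer_group(patient, database, weights=(1.0, 1.0), k=3)
profile = dissimilarity_profile(patient, peers)
print(peers)
print([round(z, 2) for z in profile])
```

In this toy case the patient matches the peer group on the first variable and departs strongly on the second, which is the kind of anomalous subprofile the method is meant to surface.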

We then use this characteristic metric to locate other patients with similar unique profiles, for each of whom we repeat the procedure described above. This leads to a network of patients with a similar risk condition. Then, the classification of the patient is inferred from the known medical condition of some of the patients in the linked network. Given a set of points (the database) and a newly arrived sample (point), we characterize the behavior of the newly arrived sample according to the database. Then, we detect other points in the database that match this unique characterization. This collection of detected points defines the characteristic neighborhood of the newly arrived sample. We use the characteristic neighborhood in order to classify the newly arrived sample. This process of differential diagnosis is repeated for every newly arrived point. The medical colossus we have today has become a system out of control and beset by the elephant in the room: an uncharted complexity. We offer a method that addresses the complexity and enables rather than disables the practitioner. The method identifies outliers and combines data according to commonality of features.
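The idea of a characteristic neighborhood can be sketched as follows: find the database points whose characterization matches that of the new sample, then infer the class from those neighbors whose condition is known. This is a hedged illustration only. It assumes each database point already has a characterization vector, uses a fixed radius as the matching rule, and takes a majority vote; none of these choices is stated in the text, and the names, labels, and numbers are hypothetical.

```python
import math
from collections import Counter


def characteristic_neighborhood(sample_profile, labeled_profiles, radius):
    """Return (profile, label) pairs whose profile lies within `radius` of the sample's."""
    return [
        (profile, label)
        for profile, label in labeled_profiles
        if math.dist(sample_profile, profile) <= radius
    ]


def classify(sample_profile, labeled_profiles, radius):
    """Majority vote over neighbors with a known condition.

    Returns (label, fraction of known neighbors sharing it), or (None, 0.0)
    when no neighbor has a known condition.
    """
    neighbors = characteristic_neighborhood(sample_profile, labeled_profiles, radius)
    known = [label for _, label in neighbors if label is not None]
    if not known:
        return None, 0.0
    label, count = Counter(known).most_common(1)[0]
    return label, count / len(known)


# Hypothetical characterization vectors; None marks a patient with no known condition
labeled_profiles = [
    ((0.1, -4.5), "anemia"),
    ((0.3, -5.2), "anemia"),
    ((0.0, -4.0), None),
    ((3.9, 0.2), "SIRS"),
    ((0.2, -4.9), "renal disease"),
]
label, support = classify((0.0, -4.9), labeled_profiles, radius=1.0)
print(label, round(support, 2))
```

Returning the supporting fraction alongside the label keeps the output a ranked suggestion for the physician rather than a single verdict, in keeping with the differential diagnosis described above.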



