
A Software Agent for Diagnosis of ACUTE MI

Authors: Isaac E. Mayzlin, Ph.D.¹, David Mayzlin¹, Larry H. Bernstein, M.D.²

¹MayNet, Carlsbad, CA; ²Department of Pathology and Laboratory Medicine, Bridgeport Hospital, Bridgeport, CT.

Agent-based decision support systems are designed to provide medical staff with the information needed for making critical decisions. We describe a Software Agent that evaluates multiple tests against a large database. It is especially useful when the time available for a decision is critical to the successful treatment of serious conditions such as stroke or acute myocardial infarction (AMI).

Goldman and others (1) developed a screening algorithm based on the characteristics of the chest pain, EKG changes, and key clinical findings. It separates high-risk from low-risk patients at the time of presentation using clinical features alone, without a serum marker. The Goldman algorithm was not widely used because of a 7 percent misclassification error, mostly false positives. In addition, a third of emergency room visits by patients evaluated to rule out AMI are not associated with chest pain. A related issue is that a significant number of high-risk patients have to be identified using a cardiac marker. Cardiac isoenzymes have been used to classify patients who meet the high-risk criteria, many of whom are subsequently found not to have AMI.

Software Agent for Diagnosis based on the Knowledge incorporated in the Trained Artificial Neural Network and Data Clustering

This Software Agent combines clustering by Euclidean distances in multi-dimensional space with non-linear discrimination performed by an Artificial Neural Network (ANN) trained on the cluster averages. Our studies indicate that, at an optimum clustering distance, the number of classes is minimized and the ANN trains efficiently while retaining a classification accuracy of 97%. The studies conducted involve training and testing on separate clinical data sets. We perform clustering using the geometrical (Euclidean) distance between two points in n-dimensional space, formed by n variables, including both input and output variables. Because this distance assumes that the different variables are on comparable scales, the values of all input variables are linearly transformed (scaled) to the range from 0 to 1.
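The scaling and distance-based clustering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not specify the clustering procedure, so the leader-style rule in `cluster_by_distance` is an assumption, as are all function names and the made-up numbers (they are not patient data). The table in this post expresses the maximum distance as D = F * R, but R is not defined in this excerpt, so the sketch takes the maximum distance directly as a parameter.

```python
import math

def min_max_scale(rows):
    """Linearly rescale every column of `rows` to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(p, q):
    """Geometrical (Euclidean) distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(points):
    """Coordinate-wise average of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def cluster_by_distance(points, max_distance):
    """Assumed rule: put each point into the first cluster whose current
    average lies within max_distance, otherwise start a new cluster.
    Returns the cluster averages, which would then be used to train the ANN."""
    clusters = []
    for p in points:
        for members in clusters:
            if euclidean(p, mean_point(members)) <= max_distance:
                members.append(p)
                break
        else:
            clusters.append([p])
    return [mean_point(members) for members in clusters]

# Hypothetical rows: two input variables plus the 0/1 output as the last column.
# The output is already in [0, 1], so scaling leaves it unchanged.
rows = [[10.0, 20.0, 0], [12.0, 22.0, 0], [90.0, 80.0, 1], [100.0, 85.0, 1]]
centers = cluster_by_distance(min_max_scale(rows), max_distance=0.3)
```

Because the output variable is one of the coordinates, points with different diagnoses are at least distance 1 apart after scaling, so any maximum distance below 1 keeps AMI and non-AMI records in separate clusters under this rule.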

For readers accustomed to classical statistics, the ANN technique can be viewed as an extension of multivariate regression analysis with new features such as non-linearity and the ability to process categorical data. Categorical (not continuous) variables represent two or more levels, groups, or classes of the corresponding features. In our case this concept is used to signify the patient's condition, for example the presence or absence of AMI.
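One way to picture this relationship is a small feed-forward pass. In the sketch below, a network with a single layer behaves like a logistic regression on the inputs, and each added hidden layer inserts another non-linear transformation. The sigmoid activation, the function names, and the weights are assumptions chosen for illustration; the source does not state which activation or architecture details the authors used.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass inputs through each (weights, biases) pair in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical network: 3 scaled inputs -> 2 hidden nodes -> 1 output node.
# All-zero weights are used only so the result is easy to check by hand.
zero_net = [
    ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]),  # hidden layer
    ([[0.0, 0.0]], [0.0]),                             # output layer
]
output = forward([0.2, 0.5, 0.1], zero_net)
```

With real, trained weights the single output value would lie between 0 and 1 and could be read as the network's score for the categorical outcome.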

Process description. We implemented the proposed algorithm for diagnosis of AMI. All the calculations were performed on the authors' unique Software Agent Maynet. First, using the automatic random extraction procedure, the initial data set (139 patients) was partitioned into two sets, training and testing. This randomization also determined the size of these sets (96 and 43, respectively), since the program was instructed to assign approximately 70% of the data to the training set.
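A random partition of this kind might look like the sketch below. It assumes each record is assigned independently with probability 0.7, which is one plausible reading of why the set sizes were determined by the randomization rather than fixed in advance; the actual extraction procedure is not described in the source, and `random_partition` and its parameters are hypothetical names.

```python
import random

def random_partition(records, train_fraction=0.7, seed=None):
    """Assign each record to the training set with probability
    train_fraction, otherwise to the testing set (assumed procedure)."""
    rng = random.Random(seed)
    train, test = [], []
    for rec in records:
        if rng.random() < train_fraction:
            train.append(rec)
        else:
            test.append(rec)
    return train, test

# 139 placeholder record ids standing in for the patients.
records = list(range(139))
train, test = random_partition(records, train_fraction=0.7, seed=0)
```

Under this scheme the training set size varies from run to run around 70% of the data, so a split such as 96 and 43 is an outcome of the draw and not a fixed setting.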

The main process consists of three successive steps:

(1) clustering performed on the training data set,
(2) training of the neural network on the clusters from the previous step, and
(3) evaluation of the classifier's accuracy on the testing data.

The classifier in this research is the ANN created in step 2. Its output lies in the range [0, 1] and is converted to a binary result (1 = AMI, 0 = not AMI) using a decision point of 0.5.
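The decision point and the accuracy evaluation of step 3 can be illustrated with a short sketch. The function names and the sample outputs are hypothetical and not study data, and treating an output of exactly 0.5 as AMI is an assumption, since the source does not say how ties are handled.

```python
def classify(output, decision_point=0.5):
    """Convert an ANN output in [0, 1] to a binary result:
    1 (AMI) at or above the decision point, otherwise 0 (not AMI)."""
    return 1 if output >= decision_point else 0

def misrecognized(outputs, labels, decision_point=0.5):
    """Count the patterns whose binary result differs from the known
    diagnosis and express the count as a percentage of the set."""
    errors = sum(
        1 for o, y in zip(outputs, labels)
        if classify(o, decision_point) != y
    )
    return errors, 100.0 * errors / len(labels)

# Made-up ANN outputs and true diagnoses for four test patterns.
outputs = [0.92, 0.08, 0.61, 0.35]
labels = [1, 0, 0, 1]
errors, percent = misrecognized(outputs, labels)
```

Applied to a testing set, the two returned numbers correspond to the count and the percentage of misrecognized patterns.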

In this paper we used the data from two previous studies (2,3), with three patients who were potential outliers removed (n = 139). The data contain three input variables (CK-MB, LD-1, and LD-1/total LD) and one output variable, the diagnosis, coded as 1 (for AMI) or 0 (non-AMI).

Table 1. Effect of selection of maximum distance on the number of classes formed and on the accuracy of recognition by ANN

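Columns: Clustering Distance Factor F (D = F * R) | Number of Classes | Number of Nodes in the Hidden Layers | Number of Misrecognized Patterns in the Testing Set of 43 | Percent Misrecognized

[Most table values were lost in extraction. Surviving entries, not assignable to rows with certainty: 1, 0; 2, 0; 3, 0; 1, 0; 2, 0; 3, 0; 3, 2; 3, 2]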
Abbreviations: creatine kinase MB isoenzyme: CK-MB; lactate dehydrogenase isoenzyme-1: LD1; LD1/total LD ratio: %LD1; acute myocardial infarction: AMI; artificial neural network: ANN
