A Software Agent for Diagnosis of ACUTE MI
Authors: Isaac E. Mayzlin, Ph.D.1, David Mayzlin1, Larry H. Bernstein, M.D.2
1MayNet, Carlsbad, CA, 2Department of Pathology and Laboratory Medicine, Bridgeport Hospital, Bridgeport, CT.
Agent-based decision support systems are designed to provide medical staff with the information needed to make critical decisions. We describe a Software Agent that evaluates multiple tests against a large database. It is especially efficient when the time available for the decision is critical to successful treatment of serious conditions, such as stroke or acute myocardial infarction (AMI).
Goldman and others (1) developed a screening algorithm based on the characteristics of the chest pain, EKG changes, and key clinical findings to separate high-risk from low-risk patients at presentation, using clinical features alone and no serum marker. The Goldman algorithm was not widely used because of a 7 percent misclassification error, mostly false positives. In addition, a third of emergency room visits by patients evaluated to rule out AMI are not associated with chest pain. A related issue is that a significant number of high-risk patients have to be identified using a cardiac marker. Cardiac isoenzymes have been used to classify patients who meet the high-risk criteria, many of whom are not subsequently found to have AMI.
Software Agent for Diagnosis based on the Knowledge incorporated in the Trained Artificial Neural Network and Data Clustering
This Software Agent combines clustering by Euclidean distances in multi-dimensional space with non-linear discrimination performed by an Artificial Neural Network (ANN) trained on the clusters' averages. Our studies indicate that at an optimum clustering distance the number of classes is minimized and the ANN trains efficiently, while the accuracy of classification by the ANN is retained at 97%. The studies conducted involve training and testing on separate clinical data sets. We perform clustering using the geometrical (Euclidean) distance between two points in n-dimensional space formed by n variables, including both input and output variables. Since this distance assumes compatibility of the different variables, the values of all input variables are linearly transformed (scaled) to the range from 0 to 1.
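The scaling and distance-based grouping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not state the exact clustering rule, so the sketch assumes a simple "leader" scheme in which a point joins the first cluster whose current average lies within a hypothetical `max_distance`, and the sample values are invented for illustration only.

```python
import math

def min_max_scale(rows):
    """Linearly rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(p, q):
    """Geometrical distance between two points in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_by_distance(points, max_distance):
    """Assumed rule: a point joins the first cluster whose average is
    within max_distance, otherwise it starts a new cluster.
    Returns the list of cluster averages."""
    clusters = []
    for p in points:
        for members in clusters:
            average = [sum(c) / len(members) for c in zip(*members)]
            if euclidean(p, average) <= max_distance:
                members.append(p)
                break
        else:
            clusters.append([p])
    return [[sum(c) / len(m) for c in zip(*m)] for m in clusters]

# Hypothetical rows: three input variables plus the 0/1 output variable.
raw = [[12.0, 40.0, 0.45, 1], [3.0, 20.0, 0.25, 0], [4.0, 22.0, 0.27, 0]]
averages = cluster_by_distance(min_max_scale(raw), max_distance=0.5)
```

Under these assumptions the cluster averages, rather than the individual patients, would serve as the training patterns for the network.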
For readers accustomed to classical statistics, the ANN technique can be viewed as an extension of multivariate regression analysis with such new features as non-linearity and the ability to process categorical data. Categorical (not continuous) variables represent two or more levels, groups, or classes of the corresponding features. In our case this concept is used to signify the patient's condition, for example the presence or absence of AMI.
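The regression analogy can be made concrete with a small feed-forward sketch. It assumes sigmoid units and uses made-up weights, neither of which is specified in the source: with no hidden layer, the output is a logistic-regression-like function of the inputs, and adding a hidden layer introduces the non-linearity mentioned above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer with sigmoid activation (an assumption)."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical scaled inputs and weights, for illustration only.
x = [0.2, 0.5, 0.8]
no_hidden = [([[1.0, -2.0, 3.0]], [0.0])]
one_hidden = [
    ([[1.0, -2.0, 3.0], [0.5, 0.5, -1.0]], [0.0, 0.1]),  # two hidden nodes
    ([[2.0, -1.0]], [-0.5]),                              # one output node
]
y_linear = forward(x, no_hidden)[0]
y_nonlinear = forward(x, one_hidden)[0]
```

Both outputs fall in the range (0, 1), which is what allows a single output node to encode a two-class categorical variable.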
Process description. We implemented the proposed algorithm for diagnosis of AMI. All the calculations were performed on the authors' unique Software Agent Maynet. First, using the automatic random extraction procedure, the initial data set (139 patients) was partitioned into two sets, training and testing. This randomization also determined the size of these sets (96 and 43, respectively), since the program was instructed to assign approximately 70% of the data to the training set.
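One way such a random extraction could work is sketched below. It assumes each record is assigned independently to the training set with probability 0.7, which would explain why the set sizes are an outcome of the randomization rather than fixed in advance. The function name and the seed are hypothetical, and this is not the Maynet procedure itself.

```python
import random

def random_partition(records, train_fraction=0.7, seed=None):
    """Assign each record to training with probability train_fraction."""
    rng = random.Random(seed)
    training, testing = [], []
    for record in records:
        if rng.random() < train_fraction:
            training.append(record)
        else:
            testing.append(record)
    return training, testing

# Stand-in for the 139 patient records (indices only).
patients = list(range(139))
training, testing = random_partition(patients, seed=0)
print(len(training), len(testing))
```

The split reported in the text, 96 and 43, corresponds to about 69% of the 139 patients in the training set.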
The main process consists of three successive steps:
(1) clustering performed on training data set,
(2) neural network’s training on clusters from previous step, and
(3) classifier’s accuracy evaluation on testing data.
The classifier in this study is the ANN created in step 2. Its output lies in the range [0,1] and is converted to a binary result (1 = AMI, 0 = not AMI) using a decision point of 0.5.
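The decision rule and the accuracy evaluation of step 3 can be sketched as follows. The treatment of an output exactly equal to 0.5 is an assumption (it is mapped to 1 here), and the function names are illustrative.

```python
def classify(output, decision_point=0.5):
    """Map a network output in [0, 1] to 1 (AMI) or 0 (not AMI)."""
    return 1 if output >= decision_point else 0

def count_misrecognized(outputs, diagnoses, decision_point=0.5):
    """Number of testing patterns whose binary result differs from the diagnosis."""
    return sum(
        classify(o, decision_point) != d for o, d in zip(outputs, diagnoses)
    )

def percent_misrecognized(errors, total):
    return 100.0 * errors / total

# With a testing set of 43 patients, one error is about 2.3%.
print(round(percent_misrecognized(1, 43), 1))
```

On a testing set of 43, each misrecognized pattern adds roughly 2.3 percentage points to the error rate.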
In this paper we used the data of two previous studies (2,3) with three patients, potential outliers, removed (n = 139). The data contain three input variables, CK-MB, LD-1, and LD-1/total LD, and one output variable, the diagnosis, coded as 1 (for AMI) or 0 (non-AMI).
Table 1. Effect of selection of maximum distance on the number of classes formed and on the accuracy of recognition by ANN
Clustering Distance Factor F (D = F * R) | Number of Classes | Number of Nodes in the Hidden Layers | Number of Misrecognized Patterns in the Testing Set of 43 | Percent Misrecognized |
1 0.9 0.8 0.7 |
241413
5 |
1, 0 2, 0 3, 0 3, 2 3, 2 |
1 2 1 1 1 |
2.3 4.6 2.3 2.3 2.3 |
Abbreviations: creatine kinase MB isoenzyme: CK-MB; lactate dehydrogenase isoenzyme-1: LD1; LD1/total LD ratio: %LD1; acute myocardial infarction: AMI; artificial neural network: ANN