Posts Tagged ‘Medical Informatics’


Yay! Bloomberg View Seems to Be On the Side of the Lowly Scientist!


Reporter: Stephen J. Williams, Ph.D.

Justin Fox at Bloomberg View has just published an article near and dear to the hearts of all #openaccess scientists, and to those of us at @Pharma_BI and @MozillaScience who feel strongly about #openscience, #opendata, and the movement to make scientific discourse freely accessible.

His article “Academic Publishing Can’t Remain Such a Great Business” discusses the history of academic publishing and how the consolidation of smaller publishers into large scientific publishing houses has produced a monopoly-like environment in which prices for journal subscriptions keep rising. He also discusses how the open access movement is challenging this model and may one day replace the big publishing houses.

A few tidbits from his article:

Publishers of academic journals have a great thing going. They generally don’t pay for the articles they publish, or for the primary editing and peer reviewing essential to preparing them for publication (they do fork over some money for copy editing). Most of this gratis labor is performed by employees of academic institutions. Those institutions, along with government agencies and foundations, also fund all the research that these journal articles are based upon.

Yet the journal publishers are able to get authors to sign over copyright to this content, and sell it in the form of subscriptions to university libraries. Most journals are now delivered in electronic form, which you would think would cut the cost, but no, the price has been going up and up:


This isn’t just inflation at work: in 1994, journal subscriptions accounted for 51 percent of all library spending on information resources. In 2012 it was 69 percent.

Who exactly is getting that money? The largest academic publisher is Elsevier, which is also the biggest, most profitable division of RELX, the Anglo-Dutch company that was known until February as Reed Elsevier.


RELX reports results in British pounds; I converted to dollars in part because the biggest piece of the company’s revenue comes from the U.S. And yes, those are pretty great operating-profit margins: 33 percent in 2014, 39 percent in 2013. The next biggest academic publisher is Springer Nature, which is closely held (by German publisher Holtzbrinck and U.K. private-equity firm BC Partners) but reportedly has annual revenue of about $1.75 billion. Other biggies that are part of publicly traded companies include Wiley-Blackwell, a division of John Wiley & Sons; Wolters Kluwer Health, a division of Wolters Kluwer; and Taylor & Francis, a division of Informa.

And gives a brief history of academic publishing:

The history here is that most early scholarly journals were the work of nonprofit scientific societies. The goal was to disseminate research as widely as possible, not to make money — a key reason why nobody involved got paid. After World War II, the explosion in both the production of and demand for academic research outstripped the capabilities of the scientific societies, and commercial publishers stepped into the breach. At a time when journals had to be printed and shipped all over the world, this made perfect sense.

Once it became possible to effortlessly copy and disseminate digital files, though, the economics changed. For many content producers, digital copying is a threat to their livelihoods. As Peter Suber, the director of Harvard University’s Office for Scholarly Communication, puts it in his wonderful little book, “Open Access”:

And How the NIH Tried to Force These Houses to Accept Open Access:

About a decade ago, the universities and funding agencies began fighting back. The National Institutes of Health in the U.S., the world’s biggest funder of medical research, began requiring in 2008 that all recipients of its grants submit electronic versions of their final peer-reviewed manuscripts when they are accepted for publication in journals, to be posted a year later on the NIH’s open-access PubMed depository. Publishers grumbled, but didn’t want to turn down the articles.

Big publishers are making money either by charging as much as they can or by focusing on new customers and services

For the big publishers, meanwhile, the choice is between positioning themselves for the open-access future or maximizing current returns. In its most recent annual report, RELX leans toward the latter while nodding toward the former:

Over the past 15 years alternative payment models for the dissemination of research such as “author-pays” or “author’s funder-pays” have emerged. While it is expected that paid subscription will remain the primary distribution model, Elsevier has long invested in alternative business models to address the needs of customers and researchers.

Elsevier’s extra services can add new avenues of revenue



but they may be seeing the light on OpenAccess (possibly due to online advocacy, an army of scientific curators and online scientific communities):

Elsevier’s Mendeley and Academia.edu – How We Distribute Scientific Research: A Case in Advocacy for Open Access Journals

SAME SCIENTIFIC IMPACT: Scientific Publishing – Open Journals vs. Subscription-based

e-Recognition via Friction-free Collaboration over the Internet: “Open Access to Curation of Scientific Research”

Indeed, we recently posted an authored paper, “A Patient’s Perspective: On Open Heart Surgery from Diagnosis and Intervention to Recovery” (free of charge), letting the community of science freely peruse and comment on it. It was well received by both the author and the community as a nice way to share academic discourse without the enormous fees, especially for opinion papers in which a rigorous peer review may not be necessary.

But it was very nice to see a major news outlet like Bloomberg View understand the lowly scientist’s aggravations.

Thanks Bloomberg!







Innovation in Laboratory Information Systems

Larry H. Bernstein, MD, FCAP, LPBI

It is not clear how much has been completed in advancing the functionality of the LIS in the following reports. Nevertheless, both describe groundbreaking leaps in functionality that we have not seen in decades. The difficulty of what has been accomplished is not easy to comprehend, but the functionality has to be extended to the EHR, and whether that will be accomplished in the near future is questionable. The EHR is still essentially a back-office billing-capture system, inadequate for users’ needs. The problem resides in the failure of these systems’ design to prioritize the data and to quickly delve through combinatorial information.

Source: http://www.dtu.dk/english | http://www.bio.dtu.dk/english/Nyheder/Nyhed

Groundbreaking computer program diagnoses cancer in two days

In the vast majority of cancer cases, the doctor can quickly identify the source of the disease, for example cancer of the liver, lungs, etc. However, in about one in 20 cases, the doctor can confirm that the patient has cancer but cannot find the source. These patients then face the prospect of a long wait, with numerous diagnostic tests and attempts to locate the origin of the cancer before starting any treatment.

By Mette Haagen Marcussen

Now, researchers at DTU Systems Biology have combined genetics with computer science and created a new diagnostic technology based on advanced self-learning computer algorithms which—on the basis of a biopsy from a metastasis—can with 85 per cent certainty identify the source of the disease and thus target treatment and, ultimately, improve the prognosis for the patient.

Each year, about 35,000 people are diagnosed with cancer in Denmark, and many of them face the prospect of a long wait until the cancer has been diagnosed and its source located. However, even after very extensive tests, there will still be 2-3 per cent of patients where it has not been possible to find the origin of the cancer. In such cases, the patient will be treated with a cocktail of chemotherapy instead of a more appropriately targeted treatment, which could be more effective and gentler on the patient.

Fast and accurate results
The newly developed method, which researchers are calling TumorTracer, is based on analyses of DNA mutations in cancer tissue samples from patients with metastasized cancer, i.e. cancer which has spread.

“We are very pleased that we can now use the same sequencing data together with our new algorithms to provide a much faster diagnosis for cancer cases that are difficult to diagnose.”

Associate Professor Aron Eklund from DTU Systems Biology

The pattern of mutations is analysed in a computer program which has been trained to find possible primary tumour localizations. The method has been tested on many thousands of samples where the primary tumour was already identified, and it has proven extremely precise. The next step will be to test the method on patients with unknown primary tumours. In recent years, researchers have discovered several ways of using genome sequencing of tumours to predict whether an individual cancer patient will benefit from a specific type of medicine.

This is a very effective method, and it is becoming increasingly common to conduct such sequencing for cancer patients. Associate Professor Aron Eklund from DTU Systems Biology explains:

“We are very pleased that we can now use the same sequencing data together with our new algorithms to provide a much faster diagnosis for cancer cases that are difficult to diagnose, and to provide a useful diagnosis in cases which are currently impossible to diagnose. At the moment, it takes researchers two days to obtain a biopsy result, but we expect this time to be reduced as it becomes possible to do the sequencing increasingly faster. And it will be straightforward to integrate the method with the methods already being used by doctors.”

Researchers expect that, in the long term, the method can also be used to identify the source of free cancer cells from a blood sample, and thus also as an effective and easy way of monitoring people who are at risk of developing cancer.

Read the scientific article TumorTracer in BMC Medical Genomics.

TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen

Andrea Marion Marquard, Nicolai Juul Birkbak, Cecilia Engel Thomas, Francesco Favero, Marcin Krzystanek, Celine Lefebvre, Charles Ferté, Mariam Jamal-Hanjani, Gareth A. Wilson, Seema Shafi, Charles Swanton, Fabrice André, Zoltan Szallasi and Aron Charles Eklund

BMC Medical Genomics 2015, 8:58  doi:10.1186/s12920-015-0130-0   http://www.biomedcentral.com/1755-8794/8/58

A substantial proportion of cancer cases present with a metastatic tumor and require further testing to determine the primary site; many of these are never fully diagnosed and remain cancer of unknown primary origin (CUP). It has been previously demonstrated that the somatic point mutations detected in a tumor can be used to identify its site of origin with limited accuracy. We hypothesized that higher accuracy could be achieved by a classification algorithm based on the following feature sets: 1) the number of nonsynonymous point mutations in a set of 232 specific cancer-associated genes, 2) frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, and 3) copy number profiles, if available.

We used publicly available somatic mutation data from the COSMIC database to train random forest classifiers to distinguish among those tissues of origin for which sufficient data was available. We selected feature sets using cross-validation and then derived two final classifiers (with or without copy number profiles) using 80 % of the available tumors. We evaluated the accuracy using the remaining 20 %. For further validation, we assessed accuracy of the without-copy-number classifier on three independent data sets: 1669 newly available public tumors of various types, a cohort of 91 breast metastases, and a set of 24 specimens from 9 lung cancer patients subjected to multiregion sequencing.

The cross-validation accuracy was highest when all three types of information were used. On the left-out COSMIC data not used for training, we achieved a classification accuracy of 85 % across 6 primary sites (with copy numbers), and 69 % across 10 primary sites (without copy numbers). Importantly, a derived confidence score could distinguish tumors that could be identified with 95 % accuracy (32 %/75 % of tumors with/without copy numbers) from those that were less certain. Accuracy in the independent data sets was 46 %, 53 % and 89 % respectively, similar to the accuracy expected from the training data.

Identification of primary site from point mutation and/or copy number data may be accurate enough to aid clinical diagnosis of cancers of unknown primary origin.

Cancer arises as a result of changes in the genomes of healthy cells; thus every tumor holds a set of mutations that reflect the transformational process as well as the selective pressure that shaped the tumor. Specific types of cancer are often driven by mutations, amplification, or deletions of specific oncogenes or tumor suppressor genes that are rarely or never observed in other types of cancer. For example, the proto-oncogene KRAS is found mutated in ~42 % of colorectal tumors but in less than 1 % of breast tumors; whereas amplification of ERBB2 is found in ~13 % of breast tumors but in only ~3 % of colorectal tumors [1]. With the increasing amount of cancer sequencing data available, we hypothesized that it may be possible to identify broad patterns in mutation or copy number profiles that can be used to distinguish among various cancer types.

A method to infer the tissue origin or site of a tumor could be useful in the diagnosis and treatment of metastatic cancer. Around 10–15 % of cancer patients present with metastatic cancer; in many of these cases the primary tumor cannot be readily located [2]. After histopathology and specialized investigations such as colonoscopy, CT scans, etc., 2–4 % of all cancers remain “cancers of unknown primary” (CUPs) [3]. If a genomic test could identify the most likely primary site of a metastatic tumor, this could enable more efficient treatment as well as improve patient outcomes. Indeed, early results suggest that exome sequencing can be used to suggest likely primary sites for CUPs [4].

A second prospective application of a genomic test to locate the origin of cancer is in the context of blood or urine screening programs for early detection of cancer. The detection and sequencing of cell-free circulating tumor DNA (ctDNA), as well as circulating tumor cells (CTCs), has recently been demonstrated for several cancer types [5]. As this technology develops, blood or urine sequencing may become standard to screen individuals at high risk of developing cancer. If cancer-implicated mutations are found in these fluids, a method to immediately deduce the location of the tumor directly from these mutations could enable quicker diagnosis and treatment of the disease.

Several genomic features have been systematically compared across, and found to differ between, various cancer types [6]. The pattern of gain or loss of specific chromosome regions, or copy number profile, has been explored by cytogenetic and hybridization-based methods [7]–[9]. Tumor-specific enrichment for mutations in certain genes, sometimes at specific positions within the gene, has been observed, and also used to infer tumor localization [10], [11]. The frequency of specific base substitutions, both alone and in the context of the two flanking bases, also seems to follow tissue-specific patterns [12], [13] and may reflect specific chemical or enzymatic mutational processes.

We aimed to determine how well the somatic mutations, here defined as a collective term for somatic point mutations and somatic copy number aberrations (SCNAs), found in a tumor can be used to infer its primary tissue of origin. The quality and quantity of data from tumor genome (or exome) sequencing can vary; therefore we developed and compared performance of classification algorithms utilizing various types and amounts of information. Specifically, we hypothesized that copy number profiles would add to the classifier performance. However, although tumor copy number profiles can be derived from whole genome or whole exome sequence data [14], the quality and reliability depends on adequate sequencing depth, and is therefore not available for all sequenced samples. Thus, we evaluated classifiers based on somatic point mutations only, here used as a collective term for single nucleotide substitutions, short insertions and deletions, and classifiers based on point mutations as well as SCNAs, separately.

Derivation of features

Non-synonymous mutations

Base substitution frequency

Trinucleotide base substitution frequency

Copy number aberrations

Machine learning

We considered four commonly used machine learning methods: stepwise additive logistic regression, artificial neural networks, support vector machines, and random forests. We anticipated that presence or absence of mutations in 232 genes recurrently mutated in cancer [10] along with the six single base substitution frequencies would allow fairly good discrimination between primary sites, and used these features to evaluate the performance of these four machine learning methods on the training data.

Validation data

SAFIR01 and MOSCATO trials

Mutation calls based on whole exome sequencing data for a cohort of 91 metastatic breast cancers were obtained from the Department of Medical Oncology, Gustave Roussy, Villejuif, France, from the trials SAFIR01 (NCT01414933) [20] and MOSCATO (NCT01566019).


NSCLC cohort

In the non-small cell lung cancer patient cohort study (UCLHRTB 10/H1306/42), tumor specimens were collected from patients who were eligible for surgical resection at the University College London Hospitals NHS Foundation Trust.


Development of a classifier based on somatic point mutations

We used the COSMIC version 68 Whole Genomes database to identify tumor specimens with genome-wide or exome-wide somatic point mutation data, and focused on solid non-CNS tumors of the ten primary sites for which at least 200 unique specimens were available (Table 1). CNS tumors were not included because extraneural metastases of these tumors are rare [33], and 200 specimens were required to allow for a reasonable number of tumors of each primary site within each cross-validation training and test set. The resulting 4,975 specimens were split randomly, while retaining proportionality of each class, into a training set of 3,982 specimens used to derive the classifier, and a test set of 993 specimens that was not used except to evaluate the final classifier. We used five-fold cross-validation on the training set to select the feature sets as described below. For each primary site a binary random forest classifier was trained to distinguish that site from all other sites. When these classifiers were applied to test samples, classifications were made for the primary site with the highest classification score (Fig. 1).
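The split-and-score scheme described above can be sketched in a few lines of Python (a minimal illustration with hypothetical labels and scores, not the authors' code): stratify the 80/20 split by primary site, then assign each test sample to the site whose binary classifier returns the highest score.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=0):
    """Split sample indices 80/20 while retaining the proportion of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for lab in by_class:
        idx = by_class[lab][:]
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

def predict_site(scores_by_site):
    """One-vs-rest decision: the site whose binary classifier scored highest."""
    return max(scores_by_site, key=scores_by_site.get)
```

With ten binary classifiers, `scores_by_site` would hold one classification score per primary site for a given sample.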

Table 1. Number of specimens available in the COSMIC whole genomes v68 database, with point mutations (PM) or with both point mutations and copy number aberrations (PM + CN), including those in the training set and those in the testing set. Categories with counts <200 were not analyzed and are omitted here



Fig. 1. Classifier outline. Somatic point mutation data is used to determine the mutation status of a set of cancer genes and to calculate the distributions of 96 classes of base substitutions. When copy number profiles are available, they are used to infer any SCNAs in the same set of cancer genes. These features are combined and provided to a set of random forest classifiers, one per primary site, each of which generates a classification score. The PM classifier does not use copy number profiles and is trained to distinguish between all 10 primary sites. The PM + CN classifier does use copy number profiles (orange), but can only distinguish between 6 primary sites (white) due to less training data. Thus, blue boxes are components of the PM classifier only, orange boxes are components of the PM + CN classifier only, and white boxes are components of both classifiers. These sites were selected based on the availability of sufficient training data (>200 cases)

Mutation status of recurrent cancer genes

For each sample, we determined the number of non-synonymous point mutations occurring within the coding regions of each of 232 genes that are recurrently mutated in cancer [10]. When training a model with these features alone we achieved a cross-validation accuracy of 55 % across the ten primary sites (Fig. 2a). Accuracy varied among primary sites, from 36 % for liver to 78 % for large intestine.
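This gene-panel feature set amounts to a fixed-order count vector per sample. A minimal sketch (the gene names, effect labels, and input format are illustrative assumptions, not the authors' pipeline):

```python
def gene_mutation_counts(sample_mutations, gene_panel):
    """Count non-synonymous point mutations per panel gene for one sample.

    sample_mutations: list of (gene, effect) tuples from a mutation caller.
    gene_panel: ordered list of recurrently mutated cancer genes (232 in the paper).
    Returns a fixed-order feature vector, one count per panel gene.
    """
    counts = {g: 0 for g in gene_panel}
    for gene, effect in sample_mutations:
        if gene in counts and effect == "non-synonymous":
            counts[gene] += 1
    return [counts[g] for g in gene_panel]
```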



Fig. 2. Cross-validation accuracy in the training data using various combinations of feature sets. Random forest ensembles were trained using the feature sets shown in the tables below each bar, and classification accuracy was evaluated by cross-validation. Sufficient SCNA data was available for only six of ten primary sites; thus we analyzed these six sites separately when including SCNAs. a Classification accuracy when excluding SCNAs and distinguishing between ten primary sites. b Classification accuracy when including SCNAs and distinguishing between six primary sites. Accuracies of individual sites are indicated by colored circles. The two combinations of feature sets selected for further analysis are indicated at the top; PM: point mutations only, PM + CN: point mutations and copy number aberrations

Single base substitution frequency

Single base substitutions are found at different frequencies across tumors, likely reflecting the mutational processes that shaped the tumor genome. For example, carcinogens in tobacco smoke cause C to A transversions, which are found frequently in lung tumors. For each tumor sample, we used all base substitution mutations, regardless of their effect, to calculate the relative frequencies of the six different classes of single base substitutions. This feature set alone classified primary site with an overall accuracy of 48 %, but when combined with the point mutation feature set described above accuracy increased to 65 % (Fig. 2a).
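Computing this six-class feature set is straightforward: by convention, substitutions from a purine reference base (A or G) are complemented so that every class is reported with a pyrimidine (C or T) reference, giving exactly six classes. A minimal sketch (the input format is an assumption):

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIX_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

def substitution_frequencies(mutations):
    """Relative frequencies of the six substitution classes for one tumor.

    mutations: list of (ref, alt) base pairs. Mutations from a purine
    reference (A/G) are complemented onto the pyrimidine strand.
    """
    counts = Counter()
    for ref, alt in mutations:
        if ref in "AG":  # fold onto the pyrimidine-reference strand
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        counts[f"{ref}>{alt}"] += 1
    total = sum(counts.values()) or 1
    return {c: counts[c] / total for c in SIX_CLASSES}
```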

Trinucleotide-context base substitution frequency

The imprint left by some mutational processes may not be fully discernible at the single-base resolution, and subclassification of the mutations by their trinucleotide sequence context has previously been used to decipher mutational signatures in cancer [34]. For each tumor sample, we used all single nucleotide substitution mutations and their flanking 5’ and 3’ bases to calculate the relative frequencies of the 96 possible trinucleotide mutations. This feature set alone identified primary site with an overall accuracy of 58 %, but when combined with the point mutation feature set described above accuracy increased to 66 % (Fig. 2a).
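The 96 classes arise as 6 substitution classes × 4 possible 5' bases × 4 possible 3' bases. A sketch of the counting, assuming each mutation arrives as a trinucleotide context plus the alternate base (this input format is an assumption, not the authors' code):

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"
SUBS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 6 substitution classes x 4 5' bases x 4 3' bases = 96 trinucleotide classes
TRINUC_CLASSES = [f"{f5}[{s}]{f3}" for s, f5, f3 in product(SUBS, BASES, BASES)]

def revcomp(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def trinuc_frequencies(mutations):
    """Relative frequencies of the 96 trinucleotide mutation classes.

    mutations: list of (trinucleotide_context, alt_base), where the context is
    the mutated base flanked by its 5' and 3' neighbors, e.g. ("ACG", "T") for
    A[C>T]G. Purine-reference contexts are reverse-complemented.
    """
    counts = Counter()
    for ctx, alt in mutations:
        if ctx[1] in "AG":
            ctx, alt = revcomp(ctx), COMPLEMENT[alt]
        counts[f"{ctx[0]}[{ctx[1]}>{alt}]{ctx[2]}"] += 1
    total = sum(counts.values()) or 1
    return {c: counts[c] / total for c in TRINUC_CLASSES}
```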

Development of a classifier based on somatic point mutations and copy number aberrations


Performance of PM and PM + CN classifiers on test data

We applied these two classifiers to the fraction of COSMIC data that had been set aside as test data, and achieved an overall accuracy of 69 % and 85 % with the PM and PM + CN classifiers, respectively (Figs. 3a and 4a).



Fig. 3. Performance of final PM classifier on the test data. a Confusion matrix of actual vs. predicted primary sites, with sensitivity, specificity, and marginal frequencies. b Performance of the final classifier in prioritizing primary sites. Each point indicates the cumulative accuracy when, for each sample, the top n highest-scoring sites are considered, or when sites are ranked by frequency or by random guess. c Classification accuracy increases with confidence score. Circles and bars indicate the accuracy and 95 % confidence interval for each bin of samples. Grey columns indicate the number of samples in each bin. d Accuracy vs. fraction of samples called. Accuracy (solid line) and 95 % confidence interval (grey region) of the corresponding fraction of tumors with highest confidence score. The fraction of tumors for which an accuracy of 95 % can be achieved is shown by a red circle with whiskers at the bottom



Fig. 4. Performance of final PM + CN classifier on the test data. a–d: see Fig. 3 legend

Table 2. Some clinical subgroups are associated with increased or decreased performance of the primary site classifiers PM and PM + CN

Performance of PM classifier on independent validation cohorts

Our classifiers were developed using the data in COSMIC version 68. As an independent validation set we downloaded COSMIC version 70 point mutation data, and filtered out any specimens that were already entered in v68. This data is reasonably independent from the training data, because all data analysis steps such as quality control, alignment, mutation calls, etc., which could have added a systematic bias, were performed by the authors of the original publications rather than by COSMIC. From this independent validation set of 1669 samples from 9 primary sites we could derive the point mutation and trinucleotide frequency feature sets, based on which our model achieved accuracy slightly lower than expected from the test set, yet still substantially higher than random classification (Fig. 5a).
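The filtering of previously seen specimens reduces to a set difference over specimen identifiers; a trivial sketch (the identifiers are hypothetical):

```python
def independent_specimens(new_release_ids, training_release_ids):
    """Specimen IDs present in a newer database release but absent from the
    release used for training, yielding an independent validation set."""
    return sorted(set(new_release_ids) - set(training_release_ids))
```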



Fig. 5. Performance of the PM classifier on independent validation data. a Tumors of various types from COSMIC v70 (n = 1669). b Metastatic breast tumors from the SAFIR01 trial (n = 91). c Multiregion-sequenced non-small cell lung cancer (n = 9). See Fig. 3b legend. For comparison, the expected performance of our method in each data set was estimated according to the distribution of primary sites and the site-specific accuracies on test data

Next, we applied the PM classifier to point mutation calls from 91 metastatic breast tumors from SAFIR01, a clinical trial to assess benefit of exome sequencing for metastatic breast cancer. These calls were derived from whole exome sequencing of metastasis biopsy specimens and matched blood samples. Our method correctly proposed breast as the primary site in 53 % of the samples (Fig. 5b). This is slightly lower than the breast-specific specificity of 61 % on the test set (Fig. 3a). After breast, the most commonly proposed sites were ovary (21 %) and prostate (11 %).

Finally, we applied the PM classifier to point mutation calls from whole exome sequencing of 24 specimens from 9 non-small cell lung cancer (NSCLC) patients in a cohort study in which multiple regions from the same lesion were sequenced to study intratumor heterogeneity. In addition, lymph node metastases had been analysed in some cases. When pooling the mutations called in all specimens of a lung tumor, our method correctly proposed lung as the primary site in eight out of nine tumors (Fig. 5c). When the 24 specimens were analysed individually, we found that the majority of the subregions and metastases were proposed to be of the same origin as the pooled specimens (Fig. 6).



Fig. 6. Consistency of the PM classifier on data from multiple samples from the same tumor. The classifier was applied to 24 specimens from 9 NSCLC patients, including primary regions (R) and lymph node metastases (L). The proposed primary site is indicated by color along with the confidence score

Comparison of the PM classifier with an existing method

To our knowledge there are no previously published studies that use copy number aberrations to infer the primary site of a tumor. However, there is one study aimed at inferring tumor primary site from point mutations [11].


We compared ICOMS calls to our calls obtained for cross-validation test sets, and compared both to the actual primary sites, and found that ICOMS made 125 correct calls, whereas our classifier made 232 correct calls (Additional file 3: Table S1).

The two algorithms deal with uncertainty in different ways: ICOMS in some cases proposes no primary site, whereas our classifiers always propose a site along with a corresponding confidence score. Therefore, we did a second analysis omitting the n samples with lowest confidence scores generated by our classifier, in which n was the number of samples for which ICOMS made no proposal, and compared the performance of each method on the 109 samples for which both methods proposed a primary site. Accuracy, defined as the percentage of samples for which the correct primary site was inferred, was significantly higher by our classifier than by ICOMS (96 % vs. 83 %, p = 0.003).
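The matched comparison described here, dropping the n lowest-confidence calls before scoring the rest, can be sketched as follows (the data layout is an assumption):

```python
def accuracy_at_coverage(predictions, n_omit):
    """Accuracy after omitting the n_omit lowest-confidence calls.

    predictions: list of (predicted_site, true_site, confidence) tuples.
    Mirrors the comparison in the text: drop as many low-confidence calls as
    the other method abstained on, then score what remains.
    """
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    kept = ranked[: len(ranked) - n_omit]
    correct = sum(1 for pred, true, _ in kept if pred == true)
    return correct / len(kept)
```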


We developed proof-of-concept classifiers designed to identify the primary site of a tumor from its genomic profile. Specifically, our most accurate classifier used the point mutation and copy number status of a set of 232 genes recurrently mutated in cancer, as well as the relative frequencies of 96 classes of base substitutions. As more mutation data becomes available, it will likely be possible to increase accuracy and to develop classifiers for additional primary sites, which may involve additional genes.

In many cases, tumor material as well as resources for sequencing may be limited, and we therefore evaluated how well our algorithms performed in the context of less extensive or fewer types of data. We found that the type of feature that best identifies primary site on its own is the copy number profile. Copy number profiles can be inferred along with point mutations from sequencing data of sufficient depth [14], and the use of assays such as SNP arrays that measure copy number but not point mutations may thus become less frequent as sequencing costs decrease. Also, even though SCNA data provides a notable increase in performance, using point mutation data alone still results in classification with an accuracy sufficiently high to be of clinical interest. A classifier using point mutations but not SCNAs could be preferred if sequencing depth or sample purity were not sufficient to infer copy numbers from sequencing data, or if point mutations were called from targeted sequencing of a restricted gene set.

Our classifiers were trained on data found in COSMIC, much of which comes from larger studies of many tumors of the same primary site. This introduces the possibility of bias resulting from confounding factors such as experimental or analytical protocols, which may explain why we observed slightly reduced performance in two of three independent validation data sets relative to what would be expected based on training data performance. The effect of this possible bias will be reduced as more data from multiple studies becomes available.

Our method does not use raw DNA sequence as input but instead relies on lists of point mutations, which are the output of algorithms designed to call mutations from sequence data. Several mutation calling algorithms exist, and there are extensive discrepancies between their output [35]. These discrepancies may influence the performance of our method, as well as any other method relying on point mutation calls.

see more at http://www.biomedcentral.com/1755-8794/8/58

It would be another step forward if the misclassified cases were compared with respect to their oncotherapy results; perhaps that would provide another measure of clarity to reduce future misclassification. The same comparison would be important for examining the therapeutic results of the correctly classified cancer diagnoses.

The next article also describes quite an improvement over what has been available for many years.

More Effective Cancer Diagnosis Through Tight Integration Of Sectra’s Digital Pathology Solution With Software Point’s LIMS


International medical imaging IT company Sectra has announced that its digital pathology solution can now be integrated with Software Point LIMS (Laboratory Information Management System). The integration enables a more efficient digital workflow, resulting in shortened lead times and improved precision in pathology diagnosis. Savings in pathologist time per case will be substantial. The integrated solution will be demonstrated at the 3rd Symposium on Digital Pathology on November 3-4 in Linköping, Sweden.

The integration between Sectra and the Software Point LIMS LabVantage Medical Suite creates a unique and seamless digital diagnostic workflow for the pathologist, including full access to lab data, requests, patient information and digital pathology images – fully synchronized between the two applications. This allows the pathologist to work fully digitally and to utilize digital tools to improve effectiveness.

“We see that tight integration with surrounding IT systems, such as different LIMS systems, is key in achieving efficient digital pathology workflows,” says Simon Häger, Product Manager for Sectra’s pathology solution.

About Sectra Digital Pathology Solution
Sectra provides a complete solution for primary diagnostics in pathology. The solution includes archiving and storage solutions together with the high-end review workstation. It allows pathologists to make their diagnoses and reports with higher precision and less time spent per case. Sectra’s solution for digital pathology is built on the same platform as Sectra’s radiology PACS, the solution for managing radiology images. With a shared technical platform, images from both of the diagnostic specialties can be stored and displayed in a single system. This enables deeper cooperation between radiologists and pathologists and facilitates, for example, multidisciplinary rounds, a step in so-called integrated diagnostics.

European and Scandinavian pathology departments are in the process of digitizing their work, but nonetheless, only a few hospitals have implemented full-scale digital pathology solutions. In the US, digital pathology for primary diagnostics is still pending FDA approval.

About Sectra
Sectra was founded in 1978 and has its roots in Linköping University in Sweden. The company’s business operation includes cutting-edge products and services within the niche segments of medical systems and secure communication systems. Sectra has offices in 12 countries and operates through partners worldwide. Sales in the 2014/2015 fiscal year totaled SEK 961 million. For more information, visit http://www.sectra.com

In the medical market, Sectra develops and sells IT systems and services for radiology and other image-intensive departments, orthopaedics and rheumatology. More than 1,700 hospitals, clinics and imaging centers worldwide use the systems daily. This makes Sectra one of the world-leading companies for handling digital radiology images. In Scandinavia, Sectra is the market leader with more than 50% of all film-free installations. Sectra’s systems have been installed in North America, Scandinavia and most major countries in Europe and the Far East.

SOURCE: Sectra

There is a third case that I have previously discussed. It is not an LIS in the sense I have presented, but it is worth review.
While blood tests can be used to detect some cancers, the FDA said a San Diego company has no proof its blood test works in patients who have not already been diagnosed with some form of the disease.

WASHINGTON, Sept. 25 (UPI) — A San Diego company selling an early cancer detection test was notified by the U.S. Food and Drug Administration it can find no evidence the test actually works, and is concerned it could prove to be harmful for some people.

Pathway Genomics debuted its CancerIntercept test in early September with claims it can detect cancer cell DNA in the blood, picking up mutations linked to as many as 10 different cancers. The goal is to catch cancer early in people who are “otherwise healthy” and not showing symptoms of the disease.

“Based on our review of your promotional materials and the research publication cited above, we believe you are offering a high risk test that has not received adequate clinical validation and may harm the public health,” said FDA Deputy Director James L. Woods in a letter to the company.

CancerIntercept is billed by the company as a blood test looking for DNA fragments in the bloodstream and testing them for 96 genomic markers it says are found in several specific tumor types.

The direct-to-consumer test can be purchased through the Pathway Genomics website, with programs ranging from a one-time test to a quarterly “subscription” for people who want regular testing.

The company states, in several sections of its website, “the presence of one or more of these genomic markers in a patient’s bloodstream may indicate that the patient has a previously undetected cancer. However, the test is not diagnostic, and thus, follow-up screening and clinical testing would be required to confirm the presence or absence of a specific cancer in the patient.”

The FDA is concerned that people may seek treatment for tumors that do not require medical attention, or spend money and possibly seek out treatment they do not need at all — in either case, unnecessary treatment for cancer is potentially harmful to people, the agency said.

CancerIntercept has not been approved by the FDA for use as a medical device, nor has it been subjected to peer review as most tests of its type would be. The company published a white paper on its website which outlines how the test works, supporting its efficacy with references to several clinical trials on detection of mutated DNA in the bloodstream.

Glenn Braunstein, Chief Medical Officer at Pathway Genomics, told The Verge that Pathway had validated its tests with “hundreds” of patients, though those patients had well-defined, often advanced cancers.

In the letter from the FDA, Woods requests the company provide a timeline for meeting with the agency to review plans for future longitudinal studies on the product and specific details on studies that have been conducted before it was made available to consumers.


The clinical laboratory is an essential player in the treatment of cancer, providing a diagnostic, potentially a prognostic, and follow-up treatment armamentarium.  The laboratory diagnostics industry has grown over the last half century into a highly accurate, well-regulated industry with highly automated and point-of-care technologies.  Tests have to be validated before they are put on the market.

How are they validated?

The most common approach is for the test to be used concomitantly with treatment in a clinical trial. Measurements may be made prior to surgical biopsy and treatment, and at a month or 6 months to a year later.  The pharmaceutical and diagnostics industries are independent, even though a large company may have both pharmaceutical and diagnostic divisions.  Consequently, the integration of diagnostics and therapeutics occurs on the front lines of patient care.

How this discrepancy between the FDA and the manufacturer could occur is not clear, because prior to introduction the test would have to be rigorously reviewed by the American Association for Clinical Chemistry, the largest and most competent organization to cover the scientific work through its industry-based committees.  The only problem is that companies may have patented products with competing claims or interests. This is perhaps most likely to be problematic in the competitive environment of genomics testing.

The company reported on here is Pathway Genomics, which offers Ingenuity for pathway and variant analysis.  There is no concern about the analysis methods, which are well studied.  The concern is the validation of such methods for screening patients without a prior diagnosis.

Model, analyze, and understand the complex biological and chemical systems at the core of life science research with IPA

QIAGEN’S Ingenuity Pathway Analysis (IPA) has been broadly adopted by the life science research community and is cited in thousands of peer-reviewed journal articles.



For the analysis and interpretation of ’omics data
Market Leading Pathway Analysis
Unlock the insights buried in experimental data by quickly identifying relationships, mechanisms, functions, and pathways of relevance.
Predictive Causal Analytics
Powerful causal analytics at your fingertips help you to build a more complete regulatory picture and a better understanding of the biology underlying a given gene expression study.
NGS/RNA-Seq Data Analysis
Get a better understanding of the isoform-specific biology resulting from RNA-Seq experiments.
Identify causal variants from human sequencing data


Rapidly Identify and Prioritize Variants

Ingenuity Variant Analysis combines analytical tools and integrated content to help you rapidly identify and prioritize variants by drilling down to a small, targeted subset of compelling variants based both upon published biological evidence and your own knowledge of disease biology. With Variant Analysis, you can interrogate your variants from multiple biological perspectives, explore different biological hypotheses, and identify the most promising variants for follow-up.
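As a hedged sketch of what such variant prioritization amounts to in practice (the field names, thresholds, and ranking rule below are invented for illustration, not Ingenuity's actual logic): keep rare, protein-altering variants, and rank those in known disease genes first.

```python
# Hypothetical variant records; real tools draw these fields from
# population databases and curated literature.
variants = [
    {"gene": "JAK2",  "pop_freq": 0.0001, "impact": "missense",   "known_disease_link": True},
    {"gene": "OR4F5", "pop_freq": 0.12,   "impact": "synonymous", "known_disease_link": False},
    {"gene": "TP53",  "pop_freq": 0.0,    "impact": "nonsense",   "known_disease_link": True},
]

def prioritize(vs, max_pop_freq=0.01):
    """Keep rare, protein-altering variants; rank known disease genes first,
    then rarest first."""
    kept = [v for v in vs
            if v["pop_freq"] <= max_pop_freq and v["impact"] != "synonymous"]
    return sorted(kept, key=lambda v: (not v["known_disease_link"], v["pop_freq"]))

for v in prioritize(variants):
    print(v["gene"], v["impact"])
```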

Variant Analysis used in NCI-60 Interpretation of Genomic Variants

The NCI-60 Data Set offers tremendous promise in the development and prescription of cancer drugs

97% of surveyed researchers are satisfied with the ease of use of Ingenuity Variant Analysis and we are honored that they chose to share the data through our Publish tool.

See the research verified by TechValidate

“Being a bioinformatician, I appreciated the speed and the complexity of analysis. Without Variant Analysis, I couldn’t have completed the analysis of 700 exomes in such a short time …. I found Variant Analysis very intuitive and easy to use.”

Francesco Lescai, Senior Research Associate in Genome Analysis, University College of London.

This appears to be the new, rocky road to verifying validity in diagnostic and treatment applications.

Hematological Malignancy Diagnostics


Author and Curator: Larry H. Bernstein, MD, FCAP

2.4.3 Computer-aided diagnostics

Robert Didner
Bell Laboratories

Decision-making in the clinical setting
Didner R. Amer Clin Lab, Mar 1999.

Mr. Didner is an Independent Consultant in Systems Analysis, Information Architecture (Informatics), Operations Research, and Human Factors Engineering (Cognitive Psychology), Decision Information Designs, 29 Skyline Dr., Morristown, NJ 07960, U.S.A.; tel.: 973-455-0489; fax/e-mail: bdidner@hotmail.com

A common problem in the medical profession is the level of effort dedicated to administration and paperwork necessitated by various agencies, which contributes to the high cost of medical care. Costs would be reduced and accuracy improved if the clinical data could be captured directly at the point they are generated in a form suitable for transmission to insurers or machine transformable into other formats. Such a capability could also be used to improve the form and the structure of information presented to physicians and support a more comprehensive database linking clinical protocols to outcomes, with the prospect of improving clinical outcomes. Although the problem centers on the physician’s process of determining the diagnosis and treatment of patients and the timely and accurate recording of that process in the medical system, it substantially involves the pathologist and laboratorian, who interact significantly throughout the information-gathering process. Each of the currently predominant ways of collecting information from diagnostic protocols has drawbacks. Using blank paper to collect free-form notes from the physician is not amenable to computerization; such free-form data are also poorly formulated, formatted, and organized for the clinical decision-making they support. The alternative of preprinted forms listing the possible tests, results, and other information gathered during the diagnostic process facilitates the desired computerization, but the fixed sequence of tests and questions they present impedes the physician from using an optimal decision-making sequence. This follows because:

  • People tend to make decisions and consider information in a step-by-step manner in which intermediate decisions are intermixed with data acquisition steps.
  • The sequence in which components of decisions are made may alter the decision outcome.
  • People tend to consider information in the sequence it is requested or displayed.
  • Since there is a separate optimum sequence of tests and questions for each cluster of history and presenting symptoms, there is no one sequence of tests and questions that can be optimal for all presenting clusters.
  • As additional data and test results are acquired, the optimal sequence of further testing and data acquisition changes, depending on the already acquired information.

Therefore, promoting an arbitrary sequence of information requests with preprinted forms may detract from outcomes by contributing to a non-optimal decision-making sequence. Unlike the decisions resulting from theoretical or normative processes, decisions made by humans are path dependent; that is, the outcome of a decision process may be different if the same components are considered in a different sequence.

Proposed solution

This paper proposes a general approach to gathering data at their source in computer-based form so as to improve the expected outcomes. Such a means must be interactive and dynamic, so that at any point in the clinical process the patient’s presenting symptoms, history, and the data already collected are used to determine the next data or tests requested. That determination must derive from a decision-making strategy designed to produce outcomes with the greatest value and supported by appropriate data collection and display techniques. The strategy must be based on the knowledge of the possible outcomes at any given stage of testing and information gathering, coupled with a metric, or hierarchy of values for assessing the relative desirability of the possible outcomes.

A value hierarchy

The numbered list below illustrates a value hierarchy. In any particular instance, the higher-numbered values should only be considered once the lower-numbered values have been satisfied. Thus, a diagnostic sequence that is very time or cost efficient should only be considered if it does not increase the likelihood (relative to some other diagnostic sequence) that a life-threatening disorder may be missed, or that one of the diagnostic procedures may cause discomfort.

  1. Minimize the likelihood that a treatable, life-threatening disorder is not treated.
  2. Minimize the likelihood that a treatable, discomfort-causing disorder is not treated.
  3. Minimize the likelihood that a risky procedure (treatment or diagnostic procedure) is inappropriately administered.
  4. Minimize the likelihood that a discomfort-causing procedure is inappropriately administered.
  5. Minimize the likelihood that a costly procedure is inappropriately administered.
  6. Minimize the time of diagnosing and treating the patient.
  7. Minimize the cost of diagnosing and treating the patient.

The above hierarchy is relative, not absolute; for many patients, a little bit of testing discomfort may be worth a lot of time. There are also some factors and graduations intentionally left out for expository simplicity (e.g., acute versus chronic disorders). This value hierarchy is based on a hypothetical patient. Clearly, the hierarchy of a health insurance carrier might be different, as might that of another patient (e.g., a geriatric patient). If the approach outlined herein were to be followed, a value hierarchy agreed to by a majority of stakeholders should be adopted.


Once the higher values are satisfied, the time and cost of diagnosis and treatment should be minimized. One way to do so would be to optimize the sequence in which tests are performed, so as to minimize the number, cost, and time of tests that need to be performed to reach a definitive decision regarding treatment. Such an optimum sequence could be constructed using Claude Shannon’s information theory.

According to this theory, the best next question to ask under any given situation (assuming the question has two possible outcomes) is that question that divides the possible outcomes into two equally likely sets. In the real world, all tests or questions are not equally valuable, costly, or time consuming; therefore, value (risk factors), cost, and time should be used as weighting factors to optimize the test sequence, but this is a complicating detail at this point.
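The Shannon principle just described can be sketched directly: among candidate yes/no tests, prefer the one whose positive-result probability over the remaining candidate diagnoses is closest to 1/2, i.e., whose single answer carries the most information. The diagnoses, priors, and test-to-diagnosis mappings below are hypothetical.

```python
# Pick the most informative next yes/no test by binary entropy.
import math

def entropy(p):
    """Binary entropy in bits; maximal (1.0) at p = 0.5."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# P(diagnosis) among the remaining candidates (hypothetical)
priors = {"iron_deficiency": 0.6, "thalassemia": 0.3, "b12_deficiency": 0.1}

# which diagnoses a positive result on each test is consistent with
tests = {
    "ferritin_low": {"iron_deficiency"},   # P(positive) = 0.6
    "hgb_a2_high":  {"thalassemia"},       # P(positive) = 0.3
    "b12_low":      {"b12_deficiency"},    # P(positive) = 0.1
}

def best_next_test(priors, tests):
    """Choose the test whose positive probability is nearest 1/2."""
    def info(test):
        p_pos = sum(priors[d] for d in tests[test])
        return entropy(p_pos)
    return max(tests, key=info)

print(best_next_test(priors, tests))  # prints "ferritin_low"
```

Weighting each test's information by its cost, time, and risk, as the text suggests, would simply scale the `info` score.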

A value scale

For dynamic computation of outcome values, the hierarchy could be converted into a weighted value scale so differing outcomes at more than one level of the hierarchy could be readily compared. An example of such a weighted value scale is Quality Adjusted Life Years (QALY).

Although QALY does not incorporate all of the factors in this example, it is a good conceptual starting place.
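A minimal sketch of how a QALY-style weighted value scale lets two outcomes be compared on a single number (all figures below are invented):

```python
# QALY = sum over health states of (years in state) x (utility 0..1).
def qaly(states):
    """states: list of (years, utility) pairs, utility on a 0-1 scale."""
    return sum(years * utility for years, utility in states)

surgery    = qaly([(1, 0.5), (9, 0.9)])  # a rough recovery year, then good health
medication = qaly([(10, 0.7)])           # stable but reduced quality of life
print(surgery, medication)               # 8.6 vs 7.0 -> surgery preferred here
```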

The display, request, decision-making relationship

For each clinical determination, the pertinent information should be gathered, organized, formatted, and formulated in a way that facilitates the accuracy, reliability, and efficiency with which that determination is made. A physician treating a patient with high cholesterol and blood pressure (BP), for example, may need to know whether or not the patient’s cholesterol and BP respond to weight changes to determine an appropriate treatment (e.g., weight control versus medication). This requires searching records for BP, certain blood chemicals (e.g., HDLs, LDLs, triglycerides, etc.), and weight from several sources, then attempting to track them against each other over time. Manually reorganizing this clinical information each time it is used is extremely inefficient. More important, the current organization and formatting defies principles of human factors for optimally displaying information to enhance human information-processing characteristics, particularly for decision support.

While a discussion of human factors and cognitive psychology principles is beyond the scope of this paper, following are a few of the system design principles of concern:

  • Minimize the load on short-term memory.
  • Provide information pertinent to a given decision or component of a decision in a compact, contiguous space.
  • Take advantage of basic human perceptual and pattern recognition facilities.
  • Design the form of an information display to complement the decision-making task it supports.

Figure 1 shows fictitious, quasi-random data from a hypothetical patient with moderately elevated cholesterol. This one-page display pulls together all the pertinent data from six years of blood tests and related clinical measurements. At a glance, the physician’s innate pattern recognition, color, and shape perception facilities recognize the patient’s steadily increasing weight, cholesterol, BP, and triglycerides as well as the declining high-density lipoproteins. It would have taken considerably more time and effort to grasp this information from the raw data collection and blood test reports as they are currently presented in independent, tabular time slices.

Design the formulation of an information display to complement the decision-making task.

The physician may wish to know only the relationship between weight and cardiac risk factors rather than whether these measures are increasing or decreasing, or are within acceptable or marginal ranges. If so, Table 1 shows the correlations between weight and the other factors in a much more direct and simple way using the same data as in Figure 1. One can readily see the same conclusions about relations that were drawn from Figure 1. This type of abstract, symbolic display of derived information also makes it easier to spot relationships when the individual variables are bouncing up and down, unlike the more or less steady rise of most values in Figure 1. This increase in precision of relationship information is gained at the expense of other types of information (e.g., trends). To display information in an optimum form then, the system designer must know what the information demands of the task are at the point in the task when the display is to be used.
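A Table 1-style correlation is straightforward to derive from the raw visit data; here is a minimal Pearson-correlation sketch with fictitious weight and cholesterol series:

```python
# Pearson correlation of weight against one risk factor across visits.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

weight      = [70, 72, 75, 77, 80, 83]        # kg, one visit per year (fictitious)
cholesterol = [190, 195, 210, 205, 220, 235]  # mg/dL (fictitious)
print(round(pearson(weight, cholesterol), 3))
```

Repeating this for each factor against weight yields the single-column table shown below.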

Present the sequence of information display clusters to complement an optimum decision-making strategy.

Just as a fixed sequence of gathering clinical, diagnostic information may lead to a far from optimum outcome, there exists an optimum sequence of testing, considering information, and gathering data that will lead to an optimum outcome (as defined by the value hierarchy) with a minimum of time and expense. The task of the information system designer, then, is to provide or request the right information, in the best form, at each stage of the procedure. For example, Figure 1 is suitable for the diagnostic phase since it shows the current state of the risk factors and their trends. Table 1, on the other hand, might be more appropriate in determining treatment, where there may be a choice of first trying a strict dietary treatment, or going straight to a combination of diet plus medication. The fact that Figure 1 and Table 1 have somewhat redundant information is not a problem, since they are intended to optimally provide information for different decision-making tasks. The critical need, at this point, is for a model of how to determine what information should be requested, what tests to order, what information to request and display, and in what form at each step of the decision-making process. Commitment to a collaborative relationship between physicians and laboratorians and other information providers would be an essential requirement for such an undertaking. The ideal diagnostic data-collection instrument is a flexible, computer-based device, such as a notebook computer or Personal Digital Assistant (PDA) sized device.

Barriers to interactive, computer-driven data collection at the source

As with any major change, it may be difficult to induce many physicians to change their behavior by interacting directly with a computer instead of with paper and pen. Unlike office workers, who have had to make this transition over the past three decades, most physicians’ livelihoods will not depend on converting to computer interaction. Therefore, the transition must be made attractive and the changes less onerous. Some suggestions follow:

  1. Make the data collection a natural part of the clinical process.
  2. Ensure that the user interface is extremely friendly, easy to learn, and easy to use.
  3. Use a small, portable device.
  4. Use the same device for collection and display of existing information (e.g., test results and history).
  5. Minimize the need for free-form written data entry (use check boxes, forms, etc.).
  6. Allow the entry of notes in pen-based free-form (with the option of automated conversion of numeric data to machine-manipulable form).
  7. Give the physicians a more direct benefit for collecting data, not just a means of helping a clerk at an HMO second-guess the physician’s judgment.
  8. Improve administrative efficiency in the office.
  9. Make the data collection complement the clinical decision-making process.
  10. Improve information displays, leading to better outcomes.
  11. Make better use of the physician’s time and mental effort.


The medical profession is facing a crisis of information. Gathering information is costing a typical practice more and more while fees are being restricted by third parties, and the process of gathering this information may be detrimental to current outcomes. Gathered properly, in machine-manipulable form, these data could be reformatted so as to greatly improve their value immediately in the clinical setting by leading to decisions with better outcomes and, in the long run, by contributing to a clinical data warehouse that could greatly improve medical knowledge. The challenge is to create a mechanism for data collection that facilitates, hastens, and improves the outcomes of clinical activity while minimizing the inconvenience and resistance to change on the part of clinical practitioners. This paper is intended to provide a high-level overview of how this may be accomplished, and start a dialogue along these lines.


  1. Tversky A. Elimination by aspects: a theory of choice. Psych Rev 1972; 79:281–99.
  2. Didner RS. Back-to-front design: a guns and butter approach. Ergonomics 1982; 25(6):2564–5.
  3. Shannon CE. A mathematical theory of communication. Bell System Technical J 1948; 27:379–423 (July), 623–56 (Oct).
  4. Feeny DH, Torrance GW. Incorporating utility-based quality-of-life assessment measures in clinical trials: two examples. Med Care 1989; 27:S190–204.
  5. Smith S, Mosier J. Guidelines for designing user interface software. ESD-TR-86-278, Aug 1986.
  6. Miller GA. The magical number seven plus or minus two. Psych Rev 1956; 65(2):81–97.
  7. Sternberg S. High-speed scanning in human memory. Science 1966; 153: 652–4.

Table 1

Correlation of weight with other cardiac risk factors

Cholesterol 0.759384
HDL 0.53908
LDL 0.177297
BP-syst. 0.424728
BP-dia. 0.516167
Triglycerides 0.637817

Figure 1  Hypothetical patient data.

(not shown)

Realtime Clinical Expert Support


Regression: A richly textured method for comparison and classification of predictor variables


Converting Hematology Based Data into an Inferential Interpretation

Larry H. Bernstein, Gil David, James Rucinski and Ronald R. Coifman
In: Lawrie CH, ed. Hematology – Science and Practice, Ch 22, pp 541–552.
InTech, Feb 2012. ISBN 978-953-51-0174-1

A model for Thalassemia Screening using Hematology Measurements

https://www.researchgate.net/profile/Larry_Bernstein/publication/258848064_A_model_for_Thalassemia_Screening_using_Hematology_Measurements/links/0c9605293c3048060b000000.pdf A model for automated screening of thalassemia in hematology (math study).

Kneifati-Hayek J, Fleischman W, Bernstein LH, Riccioli A, Bellevue R.
Lab Hematol. 2007; 13(4):119-23. http://dx.doi.org/10.1532/LH96.07003

The results of 398 patient screens were collected. Data from the set were divided into training and validation subsets. The Mentzer ratio was determined through a receiver operating characteristic (ROC) curve on the first subset, and screened for thalassemia using the second subset. HgbA2 levels were used to confirm beta-thalassemia.

RESULTS: We determined the correct decision point of the Mentzer index to be a ratio of 20. Physicians can screen patients using this index before further evaluation for beta-thalassemia (P < .05).

CONCLUSION: The proposed method can be implemented by hospitals and laboratories to flag positive matches for further definitive evaluation, and will enable beta-thalassemia screening of a much larger population at little to no additional cost.
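A screening rule of this kind is simple to implement. The sketch below uses the Mentzer index (MCV divided by RBC count) with the decision point of 20 reported above; the laboratory values are invented for illustration:

```python
# Mentzer index screening sketch; values below a cutoff are flagged
# for definitive beta-thalassemia evaluation (HgbA2 confirmation).
def mentzer_index(mcv_fl, rbc_millions_per_ul):
    """MCV (fL) divided by RBC count (millions per microliter)."""
    return mcv_fl / rbc_millions_per_ul

def flag_for_thalassemia_workup(mcv_fl, rbc, cutoff=20.0):
    """Flag when the index falls below the decision point."""
    return mentzer_index(mcv_fl, rbc) < cutoff

print(flag_for_thalassemia_workup(62.0, 5.5))  # True  (index ~ 11.3)
print(flag_for_thalassemia_workup(88.0, 4.0))  # False (index = 22.0)
```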

Measurement of granulocyte maturation may improve the early diagnosis of the septic state.
Bernstein LH, Rucinski J. Clin Chem Lab Med. 2011 Sep 21; 49(12):2089-95. http://dx.doi.org/10.1515/CCLM.2011.688

The automated malnutrition assessment.
David G, Bernstein LH, Coifman RR. Nutrition. 2013 Jan; 29(1):113-21. http://dx.doi.org/10.1016/j.nut.2012.04.017

Molecular Diagnostics

Genomic Analysis of Hematological Malignancies

Acute lymphoblastic leukemia (ALL) is the most common hematologic malignancy that occurs in children. Although more than 90% of children with ALL now survive to adulthood, those with the rarest and high-risk forms of the disease continue to have poor prognoses. Through the Pediatric Cancer Genome Project (PCGP), investigators in the Hematological Malignancies Program are identifying the genetic aberrations that cause these aggressive forms of leukemias. Here we present two studies on the genetic bases of early T-cell precursor ALL and acute megakaryoblastic leukemia.

  • Early T-Cell Precursor ALL Is Characterized by Activating Mutations
  • The CBFA2T3-GLIS2 Fusion Gene Defines an Aggressive Subtype of Acute Megakaryoblastic Leukemia in Children

Early T-cell precursor ALL (ETP-ALL), which comprises 15% of all pediatric T-cell leukemias, is an aggressive disease that is typically resistant to contemporary therapies. Children with ETP-ALL have a high rate of relapse and an extremely poor prognosis (i.e., 5-year survival is approximately 20%). The genetic basis of ETP-ALL has remained elusive. Although ETP-ALL is associated with a high burden of DNA copy number aberrations, none are consistently found or suggest a unifying genetic alteration that drives this disease.

Through the efforts of the PCGP, Jinghui Zhang, PhD (Computational Biology), James R. Downing, MD (Pathology), Charles G. Mullighan, MBBS(Hons), MSc, MD (Pathology), and colleagues analyzed the whole-genome sequences of leukemic cells and matched normal DNA from 12 pediatric patients with ETP-ALL. The identified genetic mutations were confirmed in a validation cohort of 52 ETP-ALL specimens and 42 non-ETP T-lineage ALLs (T-ALL).

In the journal Nature, the investigators reported that each ETP-ALL sample carried an average of 1140 sequence mutations and 12 structural variations. Of the structural variations, 51% were breakpoints in genes with well-established roles in hematopoiesis or leukemogenesis (e.g., MLH2, SUZ12, and RUNX1). Eighty-four percent of the structural variations either caused loss of function of the gene in question or resulted in the formation of a fusion gene such as ETV6-INO80D. The ETV6 gene, which encodes a protein that is essential for hematopoiesis, is frequently mutated in leukemia. Among the DNA samples sequenced in this study, ETV6 was altered in 33% of ETP-ALL but only 10% of T-ALL cases.

Next-generation sequencing in hematologic malignancies: what will be the dividends?

Jason D. Merker, Anton Valouev, and Jason Gotlib
Ther Adv Hematol. 2012 Dec; 3(6): 333–339.

The application of high-throughput, massively parallel sequencing technologies to hematologic malignancies over the past several years has provided novel insights into disease initiation, progression, and response to therapy. Here, we describe how these new DNA sequencing technologies have been applied to hematolymphoid malignancies. With further improvements in the sequencing and analysis methods as well as integration of the resulting data with clinical information, we expect these technologies will facilitate more precise and tailored treatment for patients with hematologic neoplasms.

Leveraging cancer genome information in hematologic malignancies.

Rampal R, Levine RL.
J Clin Oncol. 2013 May 20; 31(15):1885-92.

The use of candidate gene and genome-wide discovery studies in the last several years has led to an expansion of our knowledge of the spectrum of recurrent, somatic disease alleles, which contribute to the pathogenesis of hematologic malignancies. Notably, these studies have also begun to fundamentally change our ability to develop informative prognostic schema that inform outcome and therapeutic response, yielding substantive insights into mechanisms of hematopoietic transformation in different tissue compartments. Although these studies have already had important biologic and translational impact, significant challenges remain in systematically applying these findings to clinical decision making and in implementing new technologies for genetic analysis into clinical practice to inform real-time decision making. Here, we review recent major genetic advances in myeloid and lymphoid malignancies, the impact of these findings on prognostic models, our understanding of disease initiation and evolution, and the implication of genomic discoveries on clinical decision making. Finally, we discuss general concepts in genetic modeling and the current state-of-the-art technology used in genetic investigation.

p53 mutations are associated with resistance to chemotherapy and short survival in hematologic malignancies

E Wattel, C Preudhomme, B Hecquet, M Vanrumbeke, et al.
Blood, (Nov 1), 1994; 84(9): pp 3148-3157

We analyzed the prognostic value of p53 mutations for response to chemotherapy and survival in acute myeloid leukemia (AML), myelodysplastic syndrome (MDS), and chronic lymphocytic leukemia (CLL). Mutations were detected by single-stranded conformation polymorphism (SSCP) analysis of exons 4 to 10 of the p53 gene, and confirmed by direct sequencing. A p53 mutation was found in 16 of 107 (15%) AML, 20 of 182 (11%) MDS, and 9 of 81 (11%) CLL tested. In AML, three of nine (33%) mutated cases and 66 of 81 (81%) nonmutated cases treated with intensive chemotherapy achieved complete remission (CR) (P = .005), and none of five mutated cases and three of six nonmutated cases treated by low-dose Ara C achieved CR or partial remission (PR) (P = .06). Median actuarial survival was 2.5 months in mutated cases and 15 months in nonmutated cases (P < 10⁻⁵). In the MDS patients who received chemotherapy (intensive chemotherapy or low-dose Ara C), 1 of 13 (8%) mutated cases and 23 of 38 (60%) nonmutated cases achieved CR or PR (P = .004), and median actuarial survival was 2.5 and 13.5 months, respectively (P < 10⁻⁵). In all MDS cases (treated and untreated), the survival difference between mutated and nonmutated cases was also highly significant. In CLL, 1 of 8 (12.5%) mutated cases treated by chemotherapy (chlorambucil and/or CHOP and/or fludarabine) responded, as compared with 29 of 36 (80%) nonmutated cases (P = .02). In all CLL cases, survival from p53 analysis was significantly shorter in mutated cases (median 7 months) than in nonmutated cases (median not reached) (P < 10⁻⁵). In 35 of the 45 mutated cases of AML, MDS, and CLL, cytogenetic analysis or SSCP and sequence findings showed loss of the nonmutated p53 allele. Our findings show that p53 mutations are a strong prognostic indicator of response to chemotherapy and survival in AML, MDS, and CLL. The usual association of p53 mutations with loss of the nonmutated p53 allele in those disorders, i.e., with absence of normal p53 in tumor cells, suggests that p53 mutations could induce drug resistance, at least in part, by interfering with normal apoptotic pathways in tumor cells.

Genomic approaches to hematologic malignancies

Benjamin L. Ebert and Todd R. Golub
Blood. 2004; 104:923-932

In the past several years, experiments using DNA microarrays have contributed to an increasingly refined molecular taxonomy of hematologic malignancies. In addition to the characterization of molecular profiles for known diagnostic classifications, studies have defined patterns of gene expression corresponding to specific molecular abnormalities, oncologic phenotypes, and clinical outcomes. Furthermore, novel subclasses with distinct molecular profiles and clinical behaviors have been identified. In some cases, specific cellular pathways have been highlighted that can be therapeutically targeted. The findings of microarray studies are beginning to enter clinical practice as novel diagnostic tests, and clinical trials are ongoing in which therapeutic agents are being used to target pathways that were identified by gene expression profiling. While the technology of DNA microarrays is becoming well established, genome-wide surveys of gene expression generate large data sets that can easily lead to spurious conclusions. Many challenges remain in the statistical interpretation of gene expression data and the biologic validation of findings. As data accumulate and analyses become more sophisticated, genomic technologies offer the potential to generate increasingly sophisticated insights into the complex molecular circuitry of hematologic malignancies. This review summarizes the current state of discovery and addresses key areas for future research.

Realtime Clinical Expert Support


Medical Informatics View

Chapter 1 Statement of Inferential Second Opinion

Realtime Clinical Expert Support

Gil David and Larry Bernstein have developed, in consultation with Prof. Ronald Coifman, in the Yale University Applied Mathematics Program, a software system that is the equivalent of an intelligent Electronic Health Records Dashboard that provides empirical medical reference and suggests quantitative diagnostics options.

Keywords: Entropy, Maximum Likelihood Function, separatory clustering, peripheral smear, automated hemogram, Anomaly, classification by anomaly, multivariable and multisyndromic, automated second opinion

Abbreviations: AIC, Akaike Information Criterion; BIC, Bayes Information Criterion; SIRS, Systemic Inflammatory Response Syndrome.

Background: The current design of the Electronic Medical Record (EMR) is a linear presentation of portions of the record by services, by diagnostic method, and by date, to cite examples. This allows perusal through a graphical user interface (GUI) that partitions the information or necessary reports in a workstation entered by keying to icons. This requires that the medical practitioner find the history, medications, laboratory reports, cardiac imaging and EKGs, and radiology in different workspaces. The introduction of a DASHBOARD has allowed a presentation of drug reactions, allergies, primary and secondary diagnoses, and critical information about any patient to the caregiver needing access to the record. The advantage of this innovation is obvious. The startup problem is what information is presented and how it is displayed, which is a source of variability and a key to its success.

Intent: We are proposing an innovation that supersedes the main design elements of a DASHBOARD and utilizes the conjoined syndromic features of the disparate data elements. The important determinant of the success of this endeavor is that it facilitates both the workflow and the decision-making process with a reduction of medical error. Work is in progress to extend these capabilities with model datasets and sufficient data, because the extraction of data from disparate sources will, in the long run, further improve the process. Consider, for instance, the finding of ST depression on EKG coincident with an elevated cardiac biomarker (troponin), particularly in the absence of substantially reduced renal function. The conversion of hematology-based data into useful clinical information requires the establishment of problem-solving constructs based on the measured data.
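The kind of conjoined rule just described can be sketched as a minimal check. The function name and both cutoff values below are hypothetical placeholders for illustration, not clinical thresholds:

```python
def flag_possible_ischemia(st_depression: bool, troponin_ng_l: float, egfr: float) -> bool:
    """Flag when ST depression coincides with an elevated troponin.

    A substantially reduced eGFR can raise troponin nonspecifically,
    so the flag is suppressed in that case. Both cutoffs are
    hypothetical, for illustration only.
    """
    TROPONIN_CUTOFF = 14.0  # hypothetical 99th-percentile upper limit (ng/L)
    EGFR_FLOOR = 30.0       # hypothetical cutoff for "substantially reduced"
    return st_depression and troponin_ng_l > TROPONIN_CUTOFF and egfr >= EGFR_FLOOR

# Both abnormalities present, renal function preserved: flag raised.
print(flag_possible_ischemia(st_depression=True, troponin_ng_l=52.0, egfr=78.0))  # True
```

The point is not the particular cutoffs but the conjunction: neither data element alone raises the flag, which is what distinguishes a syndromic rule from a single-analyte critical value.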

The most commonly ordered test used for managing patients worldwide is the hemogram, which often incorporates the review of a peripheral smear. While the hemogram has undergone progressive modification of the measured features over time, the subsequent expansion of the panel of tests has provided a window into the cellular changes in the production, release, or suppression of the formed elements from the blood-forming organ to the circulation. In the hemogram one can view data reflecting the characteristics of a broad spectrum of medical conditions.

Progressive modification of the measured features of the hemogram has delineated characteristics expressed as measurements of size, density, and concentration, resulting in many characteristic features of classification. In the diagnosis of hematological disorders, proliferation of marrow precursors, the domination of a cell line, and features of suppression of hematopoiesis provide a two-dimensional model. Other dimensions are created by considering the maturity of the circulating cells. The application of rules-based, automated problem solving should provide a valid approach to the classification and interpretation of the data used to determine a knowledge-based clinical opinion. The exponential growth of knowledge since the mapping of the human genome has been enabled by parallel advances in applied mathematics that have not been part of traditional clinical problem solving. As the complexity of statistical models has increased, the dependencies have become less clear to the individual. Contemporary statistical modeling has a primary goal of finding an underlying structure in studied data sets. The development of an evidence-based inference engine that can substantially interpret the data at hand and convert it in real time to a "knowledge-based opinion" could improve clinical decision-making by incorporating multiple complex clinical features as well as duration of onset into the model.
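A rules-based classification of the kind described above can be sketched in a few lines. The thresholds here are hypothetical placeholders, since real reference intervals vary by laboratory and instrument:

```python
def classify_hemogram(hgb_g_dl: float, wbc_k: float, platelets_k: float) -> list:
    """Return rule-based flags from three hemogram values.

    All thresholds are hypothetical, for illustration only; units are
    g/dL (hemoglobin) and 10^9/L (WBC, platelets).
    """
    flags = []
    if hgb_g_dl < 12.0:
        flags.append("anemia")
    if wbc_k > 11.0:
        flags.append("leukocytosis")
    elif wbc_k < 4.0:
        flags.append("leukopenia")
    if platelets_k < 150:
        flags.append("thrombocytopenia")
    return flags or ["no rule triggered"]

print(classify_hemogram(hgb_g_dl=9.8, wbc_k=13.5, platelets_k=90))
# ['anemia', 'leukocytosis', 'thrombocytopenia']
```

Each rule fires independently, so the output is a combination of flags rather than a single label; that combinatorial character is what makes the hemogram multivariable and multisyndromic.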

An example of a difficult area for clinical problem solving is found in the diagnosis of SIRS and associated sepsis.  SIRS (and associated sepsis) is a costly diagnosis in hospitalized patients.   Failure to diagnose sepsis in a timely manner creates a potential financial and safety hazard.  The early diagnosis of SIRS/sepsis is made by the application of defined criteria (temperature, heart rate, respiratory rate and WBC count) by the clinician.   The application of those clinical criteria, however, defines the condition after it has developed and has not provided a reliable method for the early diagnosis of SIRS.  The early diagnosis of SIRS may possibly be enhanced by the measurement of proteomic biomarkers, including transthyretin, C-reactive protein and procalcitonin.  Immature granulocyte (IG) measurement has been proposed as a more readily available indicator of the presence of granulocyte precursors (left shift).  The use of such markers, obtained by automated systems in conjunction with innovative statistical modeling, provides a promising approach to enhance workflow and decision making.   Such a system utilizes the conjoined syndromic features of disparate data elements with an anticipated reduction of medical error.  This study is only an extension of our approach to repairing a longstanding problem in the construction of the many-sided electronic medical record (EMR).  In a classic study carried out at Bell Laboratories, Didner found that information technologies reflect the view of the creators, not the users, and Front-to-Back Design (R Didner) is needed.
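The four defined SIRS criteria mentioned above can be counted mechanically. This sketch uses the classic consensus cutoffs (two or more criteria present suggest SIRS) and, for brevity, omits the PaCO2 and band-count alternates of the respiratory and WBC criteria:

```python
def sirs_criteria_met(temp_c: float, heart_rate: int, resp_rate: int, wbc_k: float) -> int:
    """Count how many of the four classic SIRS criteria are met.

    Criteria: temperature > 38 or < 36 C; heart rate > 90/min;
    respiratory rate > 20/min; WBC > 12 or < 4 (x10^9/L).
    Two or more suggest SIRS.
    """
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,  # fever or hypothermia
        heart_rate > 90,                 # tachycardia
        resp_rate > 20,                  # tachypnea
        wbc_k > 12.0 or wbc_k < 4.0,     # leukocytosis or leukopenia
    ]
    return sum(criteria)

# Febrile, tachycardic patient with leukocytosis: 3 of 4 criteria.
print(sirs_criteria_met(temp_c=38.6, heart_rate=104, resp_rate=18, wbc_k=14.2))  # 3
```

As the paragraph notes, a checklist like this defines the condition after it has developed; the proposed improvement is to combine such criteria with biomarkers and immature granulocyte counts in a statistical model that can flag patients earlier.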

Costs would be reduced, and accuracy improved, if the clinical data could be captured directly at the point it is generated, in a form suitable for transmission to insurers or machine-transformable into other formats. Such data capture could also be used to improve the form and structure of how this information is viewed by physicians, and form the basis of a more comprehensive database linking clinical protocols to outcomes, which could improve knowledge of this relationship and hence clinical outcomes.

How we frame our expectations is so important that it determines the data we collect to examine the process.   In the absence of data to support an assumed benefit, there is no proof of validity at whatever cost.   This has meaning for hospital operations, for nonhospital laboratory operations, for companies in the diagnostic business, and for planning of health systems.

In 1983, a vision for creating the EMR was introduced by Lawrence Weed, as expressed by McGowan and Winstead-Fry (McGowan JJ, Winstead-Fry P. Problem Knowledge Couplers: reengineering evidence-based medicine through interdisciplinary development, decision support, and research. Bull Med Libr Assoc. 1999 Oct; 87(4): 462–470. PMCID: PMC226622).

A Personal View: Larry H Bernstein, MD, FCAP


A comment I made on 12/9/2013

I have 3 patents, and I flew to Washington to the Patent Office with my now deceased patent attorney, who had degrees in law and engineering; both his sons were in patent law. I didn't have the income to spend on my test for neonatal hyperbilirubinemia done by spectrophotometry. I spent about $100,000 on patents, and only 1 submission out of 10 passes. The next step is harder: getting it to the market. The market is blind to science; it is driven by market drivers. Perry Seamonds got his start as Medical Director of a computer company owned by Rolls Royce, installed at Mayo and at Buffalo General. He is not only an outstanding physician (ophthalmology), but also knows the laboratory, trained under a Nobel-nominated biochemist at Columbia, and held the patent on the bicarbonate enzymatic assay used on all instruments today.

He solved an automation problem that Technicon took for its own in creating the SMAC – removing the timing coils and using carryover correction. A decade before this, David Seligson had a falling out with Technicon because he had invented the multiphasic system. Yale became known for making its own instruments and reagents. But Michael Lehrer, one of the best, told him that their time had come and gone. There was a cost to developing the tests, a cost for quality control, and the problem of how to participate in a proficiency testing program. The regulation of laboratory operations, already very good, became more cumbersome, but was helped by advances in manufacturing.

He was a visionary a decade ahead of his time. When every laboratory manager was attending user group meetings and the blackboard was filled with instruments coming on-line that needed interfaces, he saw and understood the problem. His engineer produced a “black-box” for each instrument. Each instrument was tied into a desktop “middleware” interface because the configuration then had physicians competing with the laboratory that was putting out 2 million tests a year. He was funded significantly by the medical staff. I visited his first installation, and I was told that the support was as good as he expected. His system was copied by a well put together company that preceded the one developed by Harvard. Arthur Karmen, one of the great clinical chemists, had experience with the system and then had it taken away and replaced by the vastly inferior system out of Kansas City that Mas Chiga and I discussed. Mas had already linked his computer to all the physicians on the medical staff.

I also had a long association with some of the leaders who had good knowledge of the computer that NIH built using a special language, that became a Boston-based company – MediTech. The history of the laboratory experience with informatics is a good 40 years old. Many of the EHR staff came out of the laboratory, and amazingly, the Blood Bank. I spent maybe 10 years with the AACC Laboratory Information Systems Division. It is established that 80% of the information a physician uses comes from the laboratory.

I recently filed a provisional patent for the "Second Opinion" that we developed – Gil David from the Technion and R.R. Coifman at Yale. It was validated on a 30,000-patient database.

Kodak told me that I was always ahead of the market. The best representative, whose father was a computer scientist, came to my laboratory and showed me a program that would page the doctor on a critical value. We paged the surgeon who was in charge of the burn unit with a transthyretin value of 8. He loved it. Nursing had a stack of cellphones in a closet! It wouldn’t fly for security reasons. Four years later I was contacted by a company that had done the job completely.

