10/22/2024
Most GPT functions for biomedical analysis will rely on the proper construction of expert-curated knowledge graphs and expert-constructed ontologies. Below we explain one such knowledge graph for precision medicine, PrimeKG.
Building a knowledge graph to enable precision medicine
Source: Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Sci Data 10, 67 (2023). https://doi.org/10.1038/s41597-023-01960-3
Developing personalized diagnostic strategies and targeted treatments requires a deep understanding of disease biology and the ability to dissect the relationship between molecular and genetic factors and their phenotypic consequences. However, such knowledge is fragmented across publications, non-standardized repositories, and evolving ontologies describing various scales of biological organization between genotypes and clinical phenotypes. Here, we present PrimeKG, a multimodal knowledge graph for precision medicine analyses. PrimeKG integrates 20 high-quality resources to describe 17,080 diseases with 4,050,249 relationships representing ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action, considerably expanding previous efforts in disease-rooted knowledge graphs. PrimeKG contains an abundance of ‘indications’, ‘contraindications’, and ‘off-label use’ drug-disease edges that are lacking in other knowledge graphs and can support AI analyses of how drugs affect disease-associated networks. We supplement PrimeKG’s graph structure with language descriptions of clinical guidelines to enable multimodal analyses and provide instructions for continual updates of PrimeKG as new data become available.
Background & Summary
Precision medicine takes an approach to disease diagnosis and treatment that accounts for the variability in genetics, environment, and lifestyle across individuals. To be precise, medicine must revolve around data and learn from biomedical knowledge and health information. Nevertheless, many barriers to efficiently exploiting information across biological scales slow down the research and development of individualized care. While many have acknowledged the difficulties in linking biomedical knowledge to patient-level health information, few realize that biomedical knowledge is itself fragmented. Biomedical knowledge about complex diseases comes from different organizational scales, including genomics, transcriptomics, proteomics, molecular functions, intra- and inter-cellular pathways, phenotypes, therapeutics, and environmental effects. For any given disease, information from these organizational scales is scattered across publications, non-standardized data repositories, evolving ontologies, and clinical guidelines. Developing networked relationships between these sources can support research in precision medicine.
A resource that comprehensively describes the relationships of diseases to biomedical entities would enable systematic study of human disease. Understanding the connections between diseases, drugs, phenotypes, and other entities could open the doors for many types of research, including but not limited to the study of phenotyping, disease etiology, disease similarity [10], diagnosis [11-13], treatments [14], drug-disease relationships [15-17], mechanisms of drug action [18] and resistance, drug repurposing [19-21], drug discovery [22,23], adverse events, and combination therapies. Knowledge graphs developed for individual diseases have yielded insights into respective disease areas. Nevertheless, the costs and extended timelines of these individual efforts point to a need for a resource that would unify biomedical knowledge and enable the investigation of diseases at scale.
While many primary data resources contain information about diseases, consolidating them into a comprehensive, disease-rich, and functional knowledge graph presents three challenges. First, existing approaches to network analysis of diseases require expert review and curation of data in the knowledge graph. While incredibly detailed, such efforts require substantial manual labor and expensive expert input, making them difficult to scale. Second, there is no consistent representation of diseases across biomedical datasets and vocabularies. Rather than use a standardized disease ontology, database developers select the ontology that best suits their function from a multitude of biorepositories. Because each set of disease vocabulary was tailored to serve a unique purpose, their disease encodings overlap unsystematically and are often in conflict. For instance, International Classification of Diseases (ICD) codes are optimized for medical billing, whereas MedGen [53], PhenoDB, and Orphanet focus on rare and genetic diseases. Moreover, expertly curated disease descriptions in medical repositories do not follow any naming conventions. The lack of standardized disease representations and the multimodal nature of the datasets make it challenging to harmonize biomedical knowledge at scale. Third, the definition of diseases as discrete and distinct units of analysis remains medically and scientifically ambiguous. For instance, while autism spectrum disorder is considered a medical diagnosis, the condition has many subtypes linked to clinically divergent manifestations. Clinically studied disease subtypes often do not correlate clearly with those defined in disease ontologies. Although only three subtypes of autism have been clinically identified, the Unified Medical Language System (UMLS) [46] describes 192 subtypes, the Monarch Disease Ontology (MONDO) [44] describes 37 subtypes, and Orphanet contains 6 disease entries for autism. The challenge in reconciling disease entities is only exacerbated by the variety of synonyms and abbreviations available for any particular disease [58] and the difficulty in linking structured disease entities to unstructured names in text [59]. Meaningful disease entity resolution across multimodal, non-standardized datasets is critical for developing resources useful for precision medicine tasks.
Collectively, biomedical knowledge graphs have lent themselves to a variety of scientific discoveries, methodological innovations, and coordinated initiatives for model evaluation and benchmarking. Further, knowledge graphs have facilitated research across various problems faced by the biomedical community. Nevertheless, due to the medical heterogeneity of diseases, the multimodal nature of disease information, and the incompatibility of existing disease repositories, knowledge graphs focused on diseases have not yet achieved the scale or impact of broader biomedical efforts.
Precision Medicine Knowledge Graph (PrimeKG) is a knowledge graph providing a holistic and multimodal view of diseases. We integrate 20 high-quality resources, biorepositories, and ontologies to curate this knowledge graph. Across 129,375 nodes and 4,050,249 relationships, PrimeKG captures information on ten major biological scales, including disease-associated perturbations in the proteome, biological processes, molecular pathways, anatomical and phenotypic scales, environmental exposures, and the range of approved and experimental drugs together with their therapeutic action (Fig. 1a,b). We demonstrate that PrimeKG improves coverage of diseases, both rare and common, by one to two orders of magnitude compared to existing knowledge graphs. Moreover, disease nodes in PrimeKG are densely connected to many other node types, including phenotypes, exposures, and drugs. We tune PrimeKG specifically to support artificial intelligence analyses of how drugs target disease-associated molecular perturbations by including an abundance of ‘indications’, ‘contraindications’, and ‘off-label use’ drug-disease edges, which are usually missing or sparse in other knowledge graphs.
We supplement PrimeKG’s graph structure with textual descriptions of clinical guidelines for drug and disease nodes to enable multimodal analyses (Fig. 1c). Finally, we address the disease entity resolution challenge by improving the correspondence between diseases in PrimeKG and disease subtypes found in the clinic to enable PrimeKG-powered analyses in precision medicine.
02/19/2024
Microsoft has now incorporated AI features into its Bing browser, branded as the new Bing AI. The following video shows a medical student producing a case report from a chart.
PLEASE WATCH VIDEO
02/18/2024
A Small Primer on the Differences Between Artificial Intelligence, Machine Learning, and Large Language Models
Reporter: Stephen J. Williams, PhD
It is warranted, at this point, to post a small reminder of the differences between artificial intelligence (AI), machine learning (ML), and large language models (LLMs) in order to limit confusion, as these terms seem to be used interchangeably in the literature and news. A wonderful article on these differences, with explanations to distinguish between the three technologies, is reported below.
This excellent source can be found at: https://www.unite.ai/machine-learning-vs-deep-learning-key-differences/
Machine learning vs. Deep learning – Key Differences
by Haziqa Sajid (January 6, 2023)
Terminologies like Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning are hyped these days. People, however, often use these terms interchangeably. Although these terms correlate highly with each other, they also have distinctive features and specific use cases. AI deals with automated machines that solve problems and make decisions by imitating human cognitive capabilities. Machine learning and deep learning are subdomains of AI. Machine learning is an AI approach that can make predictions with minimal human intervention, whereas deep learning is the subset of machine learning that uses neural networks to make decisions by mimicking the neural and cognitive processes of the human mind.
Machine Learning in a Nutshell
Machine learning allows experts to “train” a machine by making it analyze massive datasets. The more data the machine analyzes, the more accurate results it can produce by making decisions and predictions for unseen events or scenarios. Machine learning models need structured data to make accurate predictions and decisions. If the data is not labeled and organized, machine learning models fail to comprehend it accurately, and it becomes a domain of deep learning.
The availability of gigantic data volumes in organizations has made machine learning an integral component of decision-making. Recommendation engines are the perfect example of machine learning models. OTT services like Netflix learn your content preferences and suggest similar content based on your search habits and watch history.
To understand how machine learning models are trained, let’s first look at the types of ML.
There are four types of methodologies in machine learning; a minimal code sketch contrasting two of them follows this list.
- Supervised learning – It needs labeled data to give accurate results and often requires more data and periodic adjustments to improve outcomes.
- Semi-supervised learning – It’s a middle tier between supervised and unsupervised learning that exhibits the functionality of both domains. It can give results on partially labeled data and doesn’t require ongoing adjustments to give accurate results.
- Unsupervised learning – It discovers patterns and insights in datasets without human intervention and gives accurate results. Clustering is the most common application of unsupervised learning.
- Reinforcement learning – The reinforcement learning model requires constant feedback or reinforcement as new information arrives to give accurate results. It also uses a “reward function” that enables self-learning by rewarding desired outcomes and penalizing wrong ones.
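The contrast between supervised and unsupervised learning is easy to see in code. Below is a minimal sketch using scikit-learn; the toy data and model choices are illustrative assumptions, not from the article.

```python
# Minimal sketch: supervised vs. unsupervised learning with scikit-learn.
# The four 2-D points and their labels are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])  # labels are available -> supervised setting

# Supervised: learn from labeled examples, then predict unseen points.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.2, 1.9], [5.5, 8.5]]))  # expected: [0 1]

# Unsupervised: no labels; the model discovers the grouping itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignment for each of the four points
```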
Machine learning models need human intervention to improve accuracy. On the contrary, deep learning models improve themselves after each result without human supervision, but they often require larger and more detailed volumes of data.
The deep learning methodology designs a sophisticated learning model based on neural networks inspired by the human mind. These models have multiple layers of algorithms called neurons. They continue to improve without human intervention, like the cognitive mind that keeps improving and evolving with practice, revisits, and time.
Deep learning models are mainly used for classification and feature extraction. For instance, in facial recognition, deep models feed on a dataset of faces. The model creates multidimensional matrices to memorize each facial feature as pixels. When asked to recognize a picture of a person it was not exposed to, it recognizes the person by matching facial features. Common deep learning architectures include the following; a minimal code sketch of the first appears after this list.
- Convolutional Neural Networks (CNN) – Convolution is the process of assigning weights to different regions of an image. Based on these learned weights, the CNN model recognizes the object; the results depend on how close the weights are to those learned from the training set.
- Recurrent Neural Network (RNN) – Unlike a CNN, the RNN model revisits previous results and data points to make more accurate decisions and predictions, loosely mirroring human cognitive functionality.
- Generative Adversarial Networks (GANs) – The two classifiers in GAN, the generator & discriminator, access the same data. The generator produces fake data by incorporating feedback from the discriminator. The discriminator tries to classify whether a given data is real or fake.
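To make the idea of layered neurons concrete, here is a minimal, hypothetical PyTorch sketch of a small CNN for single-channel images; the layer sizes are illustrative assumptions, not drawn from the article.

```python
# Minimal sketch of a CNN: a convolutional feature extractor followed
# by a linear classifier. Input and channel sizes are arbitrary choices.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  # learned convolution filters
            nn.ReLU(),
            nn.MaxPool2d(2),                            # 28x28 -> 14x14
        )
        self.classifier = nn.Linear(8 * 14 * 14, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
logits = model(torch.randn(1, 1, 28, 28))  # one fake 28x28 grayscale image
print(logits.shape)  # torch.Size([1, 10]): one score per class
```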
Salient Differences
Below are some notable differences.
| Differences | Machine Learning | Deep Learning |
|---|---|---|
| Human Supervision | Machine learning requires more supervision. | Deep learning models require almost no human supervision after development. |
| Hardware Resources | You build and run Machine learning programs on a powerful CPU. | Deep learning models require more powerful hardware, like dedicated GPUs. |
| Time & Effort | The time required to set up a machine learning model is less than for deep learning, but its functionality is limited. | Deep learning models require more time to develop and train. Once created, a model continues to improve its accuracy over time. |
| Data (structured/unstructured) | Machine learning models need structured data to give results (except unsupervised learning) and require continuous human intervention for improvement. | Deep learning models can process unstructured and complex datasets without compromising accuracy. |
| Use-cases | eCommerce websites and streaming services that use recommendation engines. | High-end applications like Autopilot in planes, self-driving vehicles, Rovers on the Martian surface, face recognition, etc. |
Large Language Models (LLM): A type of Deep Learning

Large language models (LLM) are a subset of Deep Learning
Source: https://aws.amazon.com/what-is/large-language-model/
Large language models (LLM) are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meanings from a sequence of text and understand the relationships between words and phrases in it. Transformer LLMs are capable of unsupervised training, although a more precise explanation is that transformers perform self-learning. It is through this process that transformers learn to understand basic grammar, languages, and knowledge.

Unlike earlier recurrent neural networks (RNN) that sequentially process inputs, transformers process entire sequences in parallel. This allows data scientists to use GPUs for training transformer-based LLMs, significantly reducing the training time. Transformer neural network architecture allows the use of very large models, often with hundreds of billions of parameters. Such large-scale models can ingest massive amounts of data, often from the internet, but also from sources such as the Common Crawl, which comprises more than 50 billion web pages, and Wikipedia, which has approximately 57 million pages.
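The self-attention idea above can be sketched in a few lines of NumPy. This is a simplified, illustrative rendering of scaled dot-product self-attention; random matrices stand in for learned projections, and real transformers add multiple heads, masking, and feed-forward layers.

```python
# Minimal sketch of scaled dot-product self-attention in NumPy.
# Every token attends to every other token, and the whole sequence is
# processed in parallel (one reason GPUs suit transformer training).
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    d = X.shape[-1]
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # queries, keys, values
    scores = Q @ K.T / np.sqrt(d)             # pairwise token affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # softmax over the sequence
    return w @ V                              # each token mixes info from all others

X = np.random.default_rng(1).standard_normal((5, 16))  # 5 tokens, 16-dim embeddings
print(self_attention(X).shape)  # (5, 16)
```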
02/17/2024
A Cross-Sectional Study Comparing ChatGPT and Bard on Immuno-Oncology Queries Suggests the Need for Expert-Driven Verification
Reporter: Stephen J. Williams, PhD
Large language models (LLMs) have seen great utility in evaluating large corpora of knowledge, and together with the recent developments in generative artificial intelligence (AI) and natural language processing (NLP), their use in a multitude of applications has dramatically increased, including in healthcare. As mentioned in previous postings on this site, uses of LLMs, NLP, and generative AI in oncology include providing recommendations to oncologists and physician boards, offering clinical advice, and providing educational and management information for cancer patients with regard to their treatment and prognosis. The promise of GPT-style chatbots to offer user-friendly tools able to provide detailed, accurate, and current information could be critical for the physician and patient in the management of cancer in the clinical setting.
The addition of immuno-oncology (IO) agents has revolutionized cancer therapy and has shown great success for a variety of malignancies. These agents have also been added to standard chemotherapy regimens, and the number of patients currently taking IO has increased substantially, raising the demand for patient education and for tools to assist physicians and patients in managing care and understanding the therapies patients are taking. The utility of LLMs and GPTs would therefore be significant in this regard. In the study “Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study” by Iannantuono et al. in The Oncologist, the authors assessed the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions within four domains of immuno-oncology: mechanisms, indications, toxicities, and prognosis.
Methodology: 60 open-ended questions were generated based on the authors’ combined clinical experience, covering mechanisms, indications, toxicities, and prognosis concerning immuno-oncology drugs. These questions were submitted to the aforementioned pre-trained LLMs, and responses were assessed on reproducibility, accuracy, relevance, and readability. Two reviewers submitted independent ratings of the answers generated by the LLMs and also submitted their own written responses to the questions. These reviews were blinded, and any disagreements were evaluated by a third expert reviewer. Importantly, for statistical analysis, a Cohen’s kappa coefficient was calculated to evaluate inter-rater reliability during the rating process. The criteria and definitions of successful outcomes are shown in Table 1. A Boolean response is defined as true or false, whereas the evaluators’ responses were numerically reported and statistically evaluated with parametric tests. These numerical evaluations from experts centered on agreement of expert opinions on the accuracy, relevance, and reproducibility of ChatGPT or Bard responses.

Results: ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (P < .0001). The number of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (P < .0001). In terms of accuracy, the number of answers deemed fully correct was 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .03). Furthermore, the number of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (P = .04). Regarding readability, the number of highly readable answers was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) compared to Google Bard (87.5%) (P = .02). Results are graphically summarized in Figures 2 and 3. Figure 2 makes it evident that ChatGPT shows superiority over Bard with respect to various metrics (full blue circles represent 100%), while orange represents the reproducibility of results. Figure 3 shows the results in bar chart format.

Figure 2. Spot matrix of the percentages of the answered questions (blue) and reproducible responses (orange) for each LLM. Color volume is directly proportional to percentage, with the outer black circle representing 100%. Figure from Giovanni Maria Iannantuono, Dara Bracken-Clarke, Fatima Karzai, Hyoyoung Choo-Wosoba, James L Gulley, Charalampos S Floudas, Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study, The Oncologist, 2024, oyae009, https://doi.org/10.1093/oncolo/oyae009

Figure 3. Bar plot of the results (accuracy, readability, and relevance) for all 3 LLMs. This plot was based only on the questions evaluable for accuracy, readability, and relevance. Figure from Giovanni Maria Iannantuono, Dara Bracken-Clarke, Fatima Karzai, Hyoyoung Choo-Wosoba, James L Gulley, Charalampos S Floudas, Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study, The Oncologist, 2024, oyae009, https://doi.org/10.1093/oncolo/oyae009
Conclusion: ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all 3 LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies. The authors note some limitations of the study. First, they compared only three LLMs (ChatGPT-3.5, ChatGPT-4, Bard), while other LLMs, such as Microsoft’s Bing AI, may be more useful for biomedical information. In addition, there were only three expert reviewers; more robust statistical analyses would be possible with a larger number of expert reviewers, whose ratings could also be used to retrain the LLMs.
Reference: Giovanni Maria Iannantuono, Dara Bracken-Clarke, Fatima Karzai, Hyoyoung Choo-Wosoba, James L Gulley, Charalampos S Floudas, Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study, The Oncologist, 2024, oyae009, https://doi.org/10.1093/oncolo/oyae009
01/14/2024
Recent Highlights from Literature on Utilization of ChatGPT in Medical Oncology Decision Making: the Verdict is Not Clear
Curator: Stephen J. Williams, PhD
Two recently published articles highlight the conflicting results as to whether ChatGPT and other generative pretrained transformers (GPTs) are useful in medical oncology clinical decision making. In previous posts, we discussed the successes and tribulations of some GPT queries on cancer care; while on the whole there has been relative agreement with cancer experts, two recurring issues stand out: lack of referencing and hallucinations, i.e., incorrect or outdated information. These articles discuss some of the usefulness of common GPT algorithms and some cautions that tumor boards should keep in mind when relying on such technology.
In “Pearls and Pitfalls of ChatGPT in Medical Oncology”, Blum et al. discuss how ChatGPT can be utilized by oncologists, highlight some caveats of the technology, and point out limitations that tumor boards should keep in mind when relying on GPTs for reliable information and clinical decision-making. The rapid development and adoption of ChatGPT has spurred applications in multiple fields, including clinical oncology, and there has been much excitement concerning this technology’s potential for transforming healthcare and cancer care. The popularity of ChatGPT may stem from its ability to perform effective large-scale language modeling on extremely large training datasets, using over 175 billion parameters to achieve various natural language processing tasks.
Blum J, Menta AK, Zhao X, Yang VB, Gouda MA, Subbiah V. Pearls and pitfalls of ChatGPT in medical oncology. Trends Cancer. 2023 Oct;9(10):788-790. doi: 10.1016/j.trecan.2023.06.007. Epub 2023 Jul 4. PMID: 37407364
As the authors note,
currently there are issues of inaccurately generated medical advice, making the current implementation of ChatGPT a risky tool to obtain definitive treatment advice from
However, they do note that this technology could be useful in the training of residents, who could benefit from case analysis. For example, the Glass AI application is note-taking software that has found use in training medical students, with the goal of teaching them differences in clinical diagnoses and treatment plans. In the clinical setting, physicians and office staff can use AI to write letters for insurance submissions or follow-ups to patient visits. In addition, providers can use the GPT API to quickly double-check clinical decision making. However, there are certain caveats and warnings about relying on ChatGPT for clinical decision making.
Warnings on usage
Some limitations that the authors see, which could complicate effective usage of GPTs, include:
- Identity of the training set
A large proportion of ChatGPT’s training set emanates from non-peer-reviewed sources, and peer-reviewed medical literature accounts for very little of ChatGPT’s overall training set. As shown below from the authors’ Table 1, Common Crawl, an open-source repository built from billions of web pages, comprises 60% of ChatGPT’s training set. Books1 and Books2 do contain some peer-reviewed medical literature, but this small amount (16%) is diluted by the much larger contribution of unverified and potentially incorrect material.
| Dataset | No. of tokens | Proportion within training |
|---|---|---|
| Common Crawl | 410 billion | 60% |
| WebText2 | 19 billion | 22% |
| Books1 | 12 billion | 8% |
| Books2 | 55 billion | 8% |
| Wikipedia | 3 billion | 3% |
What are Book1 and Books2?
AI Training Datasets: the Books1+Books2 that Big AI eats for breakfast
Source: https://gregoreite.com/drilling-down-details-on-the-ai-training-datasets/
Books1 & Books2 are two internet-based books corpora, containing a random sampling of a small subset of all the public domain books that are available online [? fact check ASAP!] .
UPDATE: Jan 2023: I, and many others, are starting to seriously question what the actual contents of Books1 & Books2 are; they are not well documented online — some (including me) might even say that given the significance of their contribution to the AI brains, their contents has been intentionally obfuscated.
My research here is ongoing. On the surface, the contents of Books1 & Books2 appears to be far less than my original supposition of “all public domain books ever.” A helpful researcher by the name of Shawn Presser has done a hero’s task and published a “Books3” dataset. See the @theshawwn twitter feed regarding that, including download links to the raw multi-GB dataset downloads.
Therefore, the authors suggest that ChatGPT should not be relied upon to provide the most up-to-date medical information.
- Inaccuracy of ChatGPT on standardized tests
- On the USMLE, ChatGPT was proficient enough to pass, but performance was highly variable, ranging from 54% to a passing 75%. This indicates that relying on it for high accuracy in complex clinical medical decisions is unwarranted.
However, even with current limitations, the authors do suggest a few areas where ChatGPT could be useful in the clinical oncology setting, including:
- Symptom onset and evaluation
- Diagnosis and screening
- Language translation tool
- Patient education tool
- Diagnostic and treatment planning tool
- Patient virtual assistant
- Personalized error checking tool
Large Language Model (ChatGPT) as a Support Tool for Breast Tumor Board
Sorin, V., Klang, E., Sklair-Levy, M. et al. Large language model (ChatGPT) as a support tool for breast tumor board. npj Breast Cancer 9, 44 (2023). https://doi.org/10.1038/s41523-023-00557-8
Large language models (LLMs) have gained much attention for their potential utility in healthcare. LLMs based on transformers with an attention mechanism can be trained on extremely large datasets but are also considered “few-shot learners,” meaning that once trained they can be adapted with only a small number of examples. Several examples and studies have shown encouraging uses in medicine. In this paper, Sorin et al. from Chaim Sheba Medical Center evaluated ChatGPT as a support tool for breast tumor board decision making. They used ChatGPT-3.5 and inserted the data of ten consecutive patients presented to a breast tumor board. They then asked the chatbot to recommend treatment management, and the results were graded by the tumor board and two senior radiologists. Eight of the patients had invasive ductal carcinoma, one patient had DCIS, and one patient had a phyllodes tumor.
Results: In seven out of ten cases, ChatGPT’s recommendations were similar to the tumor board’s recommendations, and agreement between raters was fair for summarization, substantial for clinical recommendation, and substantial for explanation (though this needs to be improved). In eight out of ten cases, surgery was the next recommended step, and only in two patients was neoadjuvant therapy recommended. However, the composition of the tumor board was not reported, so it is unclear whether the board was made up predominantly of surgeons, who might be apt to recommend surgery as the first step. Also of note was the low percentage of neoadjuvant recommendations, even though more patients presented with HER2/neu amplification and could have been treated with neoadjuvant therapy, which in many cases is preferred. The authors DID note that ChatGPT missed the HER2 FISH results. In addition, little in the way of additional diagnostic tests was suggested (note: in the case of the phyllodes tumor, were hypoglycemia or IGF-II levels measured?).
Another interesting point the authors noted was the lack of referral by GPT for additional imaging or consultation, especially a consult with a radiologist. This may be a result of the way GPT was trained, or an amplification of a hallucination or bias. Another issue is the question of legality and liability in cases where AI-driven decisions lead to a negative clinical outcome. Exposing the GPT to domain-specific knowledge may help alleviate some of the biases and hallucinations, as noted by the authors.
11/05/2023
Insilico Medicine discovers novel dual targets for aging and disease using Microsoft BioGPT
- Based on Microsoft BioGPT, Insilico Medicine’s R&D team proposed a novel approach for predicting therapeutic targets using a large language model (LLM) specifically trained for biomedical tasks;
- A total of 9 potential dual-purpose targets against aging and 14 major age-related diseases were discovered, with CCR5 and PTH nominated as novel targets for anti-aging;
- Apart from target selection, the method can be applied to extensive ranking tasks, even without clear criteria.
As the cornerstone of popular chatbots including GPT-4, large language models (LLMs) trained on vast amounts of text data have been contributing to advances in diverse fields including literature, art, and science, but their potential in the complex realms of biology and genomics has yet to be fully unlocked.
Insilico Medicine, a clinical-stage generative artificial intelligence (AI)-driven drug discovery company, announced that the company has utilized the connection retrieval ability of Microsoft BioGPT to identify 9 potential dual-purpose targets against both the aging process and 14 major age-related diseases. Two of the proposed genes have not been previously correlated to the aging process, indicating the potential of Transformer models in novel target prediction and other ranking tasks across the biomedical field. The findings were published in the journal Aging.
According to recent publications, the majority of LLMs are trained on the continuation of texts and work by suggesting the most probable next word based on the connections and probability distribution extracted from the context. Given a plausible prompt and adequate background data, scientists can now apply LLMs, especially specialized models, to the target prioritization process.
BioGPT, the domain-specific generative Transformer language model, was jointly proposed by Microsoft Research and Peking University in China. Pre-trained on millions of previously published biomedical research articles, the model outperformed previous models in multiple biomedical natural language processing tasks and demonstrated human parity in analyzing biomedical research to answer questions.
To further enhance the performance of BioGPT, Insilico researchers used a dataset of 900,000 grant proposals from the National Institutes of Health for training, and evaluated the effect through log fold change of enrichment (ELFC) and hypergeometric p-value (HGPV) scores. Next, the team established a target discovery pipeline including the prompt, retrieval probability of tokens, and gene probability calculation.
Using the final prompt sentence of “human gene targeted by a drug for treating {DISEASE} is the,” and the general tokenizer from BioGPT, the researchers proposed 9 potential targets after several cycles of probability retrieval. In the end, 5 targets were nominated as dual-purpose targets against aging and all 14 age-related diseases including Alzheimer’s disease, amyotrophic lateral sclerosis, and idiopathic pulmonary fibrosis. Both CCR5 and PTH are considered novel age-related targets.
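Although the paper’s full pipeline is more involved, the core step of retrieving next-token probabilities from BioGPT for such a prompt can be sketched with the public microsoft/biogpt checkpoint on Hugging Face; treat this as an illustrative approximation, not the authors’ code.

```python
# Minimal sketch: next-token probability retrieval from BioGPT via the
# Hugging Face transformers library (checkpoint: microsoft/biogpt).
import torch
from transformers import BioGptForCausalLM, BioGptTokenizer

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

prompt = "human gene targeted by a drug for treating idiopathic pulmonary fibrosis is the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # distribution over the next token

top = torch.topk(logits.softmax(-1), k=10)  # ten most probable continuations
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx):>15s}  {p:.3f}")
```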
“I am thrilled to see this breakthrough based on LLMs presented by the Insilico team, as it highlights the potential of a Transformer and generative AI approach combined with specific databases,” says Alex Zhavoronkov, PhD, founder and CEO of Insilico Medicine. “We hope to further accelerate drug R&D processes using our proprietary Pharma.AI platform in this era of biotech paradigm change.”
“BioGPT can learn and understand large amounts of medical literature, thereby empowering practical processes including novel drug research and development, medical knowledge graph development, precision medicine, and medical dialogue assistance systems, and driving new biotechnology developments,” said Tao Qin, PhD, Senior Principal Researcher at Microsoft Research AI4Science. “The research results released by Insilico Medicine shed light on new practical application scenarios for BioGPT and other LLM-based AI engines. We look forward to further real-world applications and more breakthroughs.”
A leader in generative AI for drug discovery, Insilico Medicine has established and validated its proprietary end-to-end Pharma.AI platform across target discovery, small molecule generation, and clinical trial design. Recently, the company published the validation results of inClinico in Clinical Pharmacology and Therapeutics, where the Transformer-based clinical trial prediction tool achieved 79% accuracy in prospective validation.
The full article can be found here: https://www.aging-us.com/article/205055/text
However, the article raises some ethical concerns as well. Most importantly, it states that Insilico Medicine was given access to over 900,000 grant applications from the NIH. These applications, if unfunded, are to be kept private and confidential, and the NIH does not have the authority to share pre- or non-funded application data with outside parties. It is curious who authorized this. This also goes to the heart of many of the issues related to GPT technology, namely the source of model training data and proper authorization and compensation. It begs the question of whether consent was given in this study, and if not, whether any HIPAA rules were broken.
9/12/2023
Use of Artificial Intelligence Chatbots for Cancer Treatment Information
In a Research Editorial Comment in JAMA Oncology, authors Drs. Chen, Kann, and Foote from Memorial Sloan Kettering and Mass General Brigham investigated the validity of ChatGPT outputs concerning treatment recommendations for breast, prostate, and lung cancer against the guidelines of the National Comprehensive Cancer Network (NCCN) and the judgment of a group of board-certified oncologists.
Methods: The authors developed 4 zero-shot prompt templates to query treatment recommendations. These templates did not provide the model with examples of correct responses. Templates were used to create 4 prompt variations for 26 diagnosis descriptions (cancer types with or without relevant extent-of-disease modifiers) for a total of 104 prompts. Prompts were input to the GPT-3.5-turbo-0301 model via the ChatGPT (OpenAI) interface. In accordance with the Common Rule, institutional review board approval was not needed since human participants were not involved. The authors benchmarked the chatbot’s recommendations against 2021 NCCN guidelines because the chatbot’s knowledge cutoff was September 2021. Five scoring criteria were developed to assess guideline concordance (as shown in the subsequent Table). The output did not have to recommend all possible regimens to be considered concordant; the recommended treatment approach needed only to be an NCCN option. Concordance of the chatbot output with NCCN guidelines was assessed by 3 of 4 board-certified oncologists, and majority rule was taken as the final score. In cases of complete disagreement, the oncologist who had not previously seen the output adjudicated. Data were analyzed between March 2 and March 14, 2023, using Excel, version 16.74 (Microsoft Corp).
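As a rough illustration of how 4 templates crossed with 26 diagnosis descriptions yield 104 prompts, the snippet below uses hypothetical template and diagnosis strings as stand-ins; the study’s actual wording is in its supplement.

```python
# Minimal sketch of zero-shot prompt generation: templates x diagnoses.
# The template and diagnosis strings here are hypothetical stand-ins.
templates = [
    "What is a recommended treatment for {dx}?",
    "How should {dx} be treated?",
    "What treatment regimen would you suggest for {dx}?",
    "List treatment options for {dx}.",
]
diagnoses = [
    "breast cancer",
    "localized prostate cancer",
    "metastatic non-small cell lung cancer",
    # ... the study used 26 diagnosis descriptions
]

prompts = [t.format(dx=d) for t in templates for d in diagnoses]
print(len(prompts))  # 12 here; 4 templates x 26 diagnoses = 104 in the study
```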
A graphical summary of the methods is shown below (taken from Supplemental Material from the paper)

Underlined text indicates where each diagnosis description was input into the LLM chatbot prompt template. Diagnosis descriptions consisted of cancer type (breast cancer, lung cancer, non–small cell lung cancer, small cell lung cancer, and prostate cancer) with and without extents of disease relevant for each cancer type. A total of 26 disease descriptions were input into the prompt templates, for a total of 104 unique prompts. NCCN indicates National Comprehensive Cancer Network.
Results: Outputs of 104 unique prompts were scored on 5 criteria for a total of 520 scores. All 3 annotators agreed for 322 of 520 (61.9%) scores. Disagreements tended to arise when the output was unclear (eg, not specifying which multiple treatments to combine). The chatbot provided at least 1 recommendation for 102 of 104 (98%) prompts. All outputs with a recommendation included at least 1 NCCN-concordant treatment, but 35 of 102 (34.3%) of these outputs also recommended 1 or more nonconcordant treatments. Responses were hallucinated (ie, were not part of any recommended treatment) in 13 of 104 (12.5%) outputs. Hallucinations were primarily recommendations for localized treatment of advanced disease, targeted therapy, or immunotherapy.
In summary, one-third of the chatbot’s answers were not in agreement with expert-curated NCCN guidelines.
As stated in the article
Disagreement among annotators highlighted the challenges of interpreting descriptive LLM output. Disagreements most often arose from unclear output, but differing interpretations of guidelines among annotators may have played a role. Clinicians should advise patients that LLM chatbots are not a reliable source of treatment information. … the chatbot did not perform well at providing accurate cancer treatment recommendations. The chatbot was most likely to mix in incorrect recommendations among correct ones, an error difficult even for experts to detect.
Source: Chen S, Kann BH, Foote MB, et al. Use of Artificial Intelligence Chatbots for Cancer Treatment Information. JAMA Oncol. Published online August 24, 2023. doi:10.1001/jamaoncol.2023.2954
For a description of the NCCN, please see the following:
The National Comprehensive Cancer Network® (NCCN®) is a not-for-profit alliance of 33 leading cancer centers devoted to patient care, research, and education. NCCN is dedicated to improving and facilitating quality, effective, equitable, and accessible cancer care so all patients can live better lives. Through the leadership and expertise of clinical professionals at NCCN Member Institutions, NCCN develops resources that present valuable information to the numerous stakeholders in the health care delivery system. By defining and advancing high-quality cancer care, NCCN promotes the importance of continuous quality improvement and recognizes the significance of creating clinical practice guidelines appropriate for use by patients, clinicians, and other health care decision-makers around the world.
World-renowned experts from NCCN Member Institutions diagnose and treat patients with a broad spectrum of cancers and are recognized for dealing with complex, aggressive, or rare cancers. NCCN Member Institutions pioneered the concept of the multidisciplinary team approach to patient care and conduct innovative research that contributes significantly to understanding, diagnosing, and treating cancer. NCCN programs offer access to expert physicians, superior treatment, and quality and safety initiatives that continuously improve the effectiveness and efficiency of cancer care globally.
The guidelines are proprietary but are formed by expert committees of national and international oncologists and reviewed periodically to provide updates on the latest standards and research in cancer diagnosis and treatment. The NCCN provides these guidelines to the 27 NCI-designated cancer centers as a service.
A little bit of extra information on the OpenAI model GPT-3.5 Turbo
Source: https://platform.openai.com/docs/models/continuous-model-upgrades
The OpenAI API is powered by a diverse set of models with different capabilities and price points. You can also make customizations to our models for your specific use case with fine-tuning.
| MODELS | DESCRIPTION |
|---|---|
| GPT-4 | A set of models that improve on GPT-3.5 and can understand as well as generate natural language or code |
| GPT-3.5 | A set of models that improve on GPT-3 and can understand as well as generate natural language or code |
| GPT base | A set of models without instruction following that can understand as well as generate natural language or code |
| DALL·E | A model that can generate and edit images given a natural language prompt |
| Whisper | A model that can convert audio into text |
| Embeddings | A set of models that can convert text into a numerical form |
| Moderation | A fine-tuned model that can detect whether text may be sensitive or unsafe |
| GPT-3 (Legacy) | A set of models that can understand and generate natural language |
| Deprecated | A full list of models that have been deprecated |
We have also published open source models including Point-E, Whisper, Jukebox, and CLIP.
With the release of gpt-3.5-turbo, some of our models are now being continually updated. gpt-3.5-turbo, gpt-4, and gpt-4-32k point to the latest model version. You can verify this by looking at the response object after sending a ChatCompletion request. The response will include the specific model version used (e.g. gpt-3.5-turbo-0613).
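With the legacy openai Python package of that period (the pre-1.0 interface), verifying the served model version looks roughly like the sketch below; the API key is a placeholder.

```python
# Minimal sketch: check which pinned model version served a request,
# using the legacy (pre-1.0) openai Python interface.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # alias that points to the latest model version
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["model"])  # e.g. "gpt-3.5-turbo-0613": the version actually used
```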
9/11/2023
ChatGPT Found Lacking Performance and Accuracy by Cancer Clinicians
Again, the jury is out concerning the usefulness and dangers of using ChatGPT to make clinical decisions, particularly in oncology. In a recent study from investigators at the Dana-Farber Cancer Institute and Mass General Brigham on the utility of ChatGPT to answer clinicians’ queries and supply recommendations, the investigators found ChatGPT to fall very short of their expectations.
The following is taken from an article on the American Hospital Association blog page.
In a recent column published by the Dana-Farber Cancer Institute, one of its physicians relayed his experience in using ChatGPT to provide statistics on a certain type of cancer. To his surprise, ChatGPT made up an equation and even gave it a name.
“It was an equation that does nothing, but it looked very convincing,” said Benjamin Schlechter, M.D., who specializes in gastrointestinal cancers. “In a way, it’s like talking to children: They start making up a story and continue the more you ask them about it. In this case, ChatGPT was adding detail after detail, none of it real, because I asked it to elaborate. It’s very confident for a computer.”
This casual observation is very much in line with the results of a study recently published in JAMA Oncology and highlighted here in earlier posts.
Researchers from Mass General Brigham found that one-third of GPT-3.5’s recommendations went at least partially against 2021 National Comprehensive Cancer Network guidelines. “Clinicians should advise patients that large language model chatbots are not a reliable source of information,” the study concluded. The chatbot was most likely to mix incorrect recommendations among correct ones, creating an error that’s difficult even for experts to detect. The study only evaluated one model at a snapshot in time, but the findings provide insight into areas of concern and future research needs.

Danielle Bitterman, M.D., of Mass General Brigham’s department of radiation oncology and the artificial intelligence (AI) in medicine program, said in a statement: “ChatGPT responses can sound a lot like a human and can be quite convincing. But, when it comes to clinical decision-making, there are so many subtleties for every patient’s unique situation. A right answer can be very nuanced, and not necessarily something ChatGPT or another large language model can provide.”

The chatbot did not purport to be a medical device, and need not be held to such standards, the study said. Patients, however, likely will use technologies like this to educate themselves, which may affect shared decision-making in the doctor-patient relationship. The investigators plan to explore how patients and physicians can distinguish between medical advice written by a physician and that written by AI. They also plan to prompt ChatGPT with more detailed clinical cases to evaluate AI’s clinical knowledge further.
8/27/2023
GeneGPT: A New Chat/Search AI Bot for Finding BioInformatic Tools and Information by searching NCBI APIs
GeneGPT is a large language model and generative AI search bot to help scientists find better and more appropriate bioinformatic tools and biomedical information.
A paper published as a pre-print on Cornell University’s arXiv server (under Computer Science) by Qiao Jin, Yifan Yang, Qingyu Chen, and Zhiyong Lu discusses a new tool for biomedical analysis. The authors aimed to teach LLMs to use the Web APIs of the National Center for Biotechnology Information (NCBI), including BLAST tools.
While large language models (LLMs) have been successfully applied to various tasks, they still face challenges with hallucinations. Augmenting LLMs with domain-specific tools such as database utilities can facilitate easier and more precise access to specialized knowledge. In this paper, we present GeneGPT, a novel method for teaching LLMs to use the Web APIs of the National Center for Biotechnology Information (NCBI) for answering genomics questions. Specifically, we prompt Codex to solve the GeneTuring tests with NCBI Web APIs by in-context learning and an augmented decoding algorithm that can detect and execute API calls. Experimental results show that GeneGPT achieves state-of-the-art performance on eight tasks in the GeneTuring benchmark with an average score of 0.83, largely surpassing retrieval-augmented LLMs such as the new Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as GPT-3 (0.16) and ChatGPT (0.12). Our further analyses suggest that: (1) API demonstrations have good cross-task generalizability and are more useful than documentations for in-context learning; (2) GeneGPT can generalize to longer chains of API calls and answer multi-hop questions in GeneHop, a novel dataset introduced in this work; (3) Different types of errors are enriched in different tasks, providing valuable insights for future improvements.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)
Cite as: arXiv:2304.09667 [cs.CL] (or arXiv:2304.09667v3 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2304.09667
Code and tools for GeneGPT are available using the link above. The following tweet also shows a screenshot of a usage example of GeneGPT. Notice that the user @razoralign is using GeneGPT to ask for the chromosomal location and function of a particular SNP. The answers that GeneGPT supplies are not only text based (as other GPTs offer); more importantly, the GPT returns a link to NCBI pages showing the exact location and function of the queried gene.
GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information https://t.co/K9r3pG6mAd pic.twitter.com/mc063htgj8
— antisense. (@razoralign) August 25, 2023
GeneGPT is a novel method that prompts Codex (Chen et al., 2021) to use NCBI Web APIs by in-context learning (Brown et al., 2020). GeneGPT consists of two main modules: (a) a specifically designed prompt that consists of documentation and demonstrations of API usage, and (b) an inference algorithm that integrates API calls into the Codex decoding process. We evaluate GeneGPT on GeneTuring (Hou and Ji, 2023), a question answering (QA) benchmark for genomics, and compare GeneGPT to a variety of other LLMs such as the new Bing [1], ChatGPT [2], and BioGPT (Luo et al., 2022). GeneGPT achieves the best performance on eight GeneTuring tasks with an average score of 0.83, which is remarkably higher than the previous SOTA (0.44 by New Bing). In addition, we systematically characterize GeneGPT and find that: (1) API demonstrations are more useful than documentation for in-context learning; (2) GeneGPT generalizes to longer chains of subquestion decomposition and API calls with simple demonstrations; (3) GeneGPT makes specific errors that are enriched for each task.
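For context, the NCBI Web APIs that GeneGPT learns to call are the public E-utilities endpoints. The sketch below shows the kind of esearch/esummary call chain involved, issued directly from Python rather than generated by a model; the gene symbol is an arbitrary example.

```python
# Minimal sketch: querying NCBI E-utilities (the Web APIs GeneGPT targets).
import requests

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Step 1: esearch -- look up the NCBI Gene ID for a gene symbol.
r = requests.get(f"{BASE}/esearch.fcgi",
                 params={"db": "gene", "term": "TP53[sym] AND human[orgn]",
                         "retmode": "json"})
gene_ids = r.json()["esearchresult"]["idlist"]

# Step 2: esummary -- fetch the record (name, chromosome, description).
r = requests.get(f"{BASE}/esummary.fcgi",
                 params={"db": "gene", "id": gene_ids[0], "retmode": "json"})
print(r.json()["result"][gene_ids[0]]["chromosome"])  # e.g. "17" for TP53
```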
[1] https://www.bing.com/new  [2] https://chat.openai.com/
8/13/2023
ChatGPT: friend or foe?
From the Lancet: Although not specifically on the topic of cancer, the following article in The Lancet Digital Health discusses the use of Chat Generative Pre-trained Transformer (ChatGPT) and AI in writing patient discharge summaries, seemingly a trite matter. However, as the editors propose, relying on AI rather than human input may raise a multitude of ethical issues, introduce misinformation into these summaries, and ultimately jeopardize the patient. I will leave the editorial in its original words, as it is important to realize there is much consternation about the use of ChatGPT in medical writing.
EDITORIAL| VOLUME 5, ISSUE 3, E102, MARCH 2023
The Lancet Digital Health
Open Access | Published: February 06, 2023 | DOI: https://doi.org/10.1016/S2589-7500(23)00023-7
You would have been hard-pressed to miss the storm surrounding ChatGPT (Chat Generative Pre-trained Transformer) over the past few months. News outlets and social media have been abuzz with reports on the chatbot developed by OpenAI. In response to a written prompt, ChatGPT can compose emails, write computer code, and even craft movie scripts. Researchers have also demonstrated its competency to pass medical licensing exams. But excitement has been matched by a swathe of ethical concerns that could—and perhaps should—limit its adoption. ChatGPT is powered by a refined version of the large language model (LLM) GPT-3.5. Its base model GPT-3 was trained on articles, websites, books, and written conversations, but a process of fine-tuning (including optimisation for dialogue) enables ChatGPT to respond to prompts in a conversational way.

In the realm of health care, Sajan B Patel and Kyle Lam illustrated ChatGPT’s ability to generate a patient discharge summary from a brief prompt. Automating this process could reduce delays in discharge from secondary care without compromising on detail, freeing up valuable time for doctors to invest in patient care and developmental training. A separate study also tested its ability to simplify radiology reports, with the generated reports being deemed overall factually correct, complete, and with low perceived risk of harm to patients. But in both cases, errors were evident. In the discharge summary example provided by Patel and Lam, ChatGPT added extra information to the summary that was not included in their prompt. Likewise, the radiology report study identified potentially harmful mistakes such as missing key medical findings. Such errors signal that if implemented in clinical practice, manual checks of automated outputs would be required.

The limitations of ChatGPT are known. By OpenAI’s own admission, ChatGPT’s output can be incorrect or biased, such as citing article references that do not exist or perpetuating sexist stereotypes. It could also respond to harmful instructions, such as to generate malware. OpenAI set up guardrails to minimise the risks, but users have found ways around these, and as ChatGPT’s outputs could be used to train future iterations of the model, these errors might be recycled and amplified. OpenAI have asked users to report inappropriate responses in order to help improve the model, but this has been met with criticism, as it’s often people disproportionately affected by algorithmic bias (such as those from marginalised communities) who are expected to help find solutions. Michael Liebrenz and colleagues opine that although ChatGPT could serve to democratise knowledge sharing as it can receive and output text in multiple languages (beneficial for non-native speakers publishing in English), inaccuracies in generated text could fuel the spread of misinformation.
These concerns have serious implications for the integrity of the scientific record, given the risk of introducing not only errors but also plagiarised content into publications. This could result in future research or health policy decisions being made on the basis of false information. Last month, the World Association of Medical Editors published its recommendations on the use of ChatGPT and other chatbots in scholarly publications, one of which is that journal editors need new tools to detect AI-generated or modified content. Indeed, an AI output detector was shown to be better at distinguishing between original and ChatGPT-generated research article abstracts than a plagiarism detector and human reviewers, but did falsely flag an original abstract as being “fake”.

Technology is evolving, and editorial policies need to evolve too. Elsevier has introduced a new policy on the use of AI and AI-assisted technologies in scientific writing, stipulating that use should be limited to improving readability and language of the work, and should be declared in the manuscript; authors should do manual checks of any AI-generated output; and these tools should not be listed or cited as an author or co-author as they cannot take on the responsibilities that authorship entails (such as being accountable for the published work). Widespread use of ChatGPT is seemingly inevitable, but in its current iteration careless, unchecked use could be a foe to both society and scholarly publishing. More forethought and oversight on model training are needed, as is investment in robust AI output detectors. ChatGPT is a game changer, but we’re not quite ready to play.
Source: https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00023-7/fulltext
7/10/2023
AI Tool Developed at Harvard Decodes Brain Cancer’s Genome During Surgery
Real-time tumor profiling can guide surgical, treatment decisions
At a glance:
- New AI tool enables in-surgery genomic profiling of gliomas, the most aggressive and most common brain tumors.
- This information offers critical clues about how aggressive a cancer is, its future behavior, and its likely response to treatment.
- The tool can provide real-time guidance to surgeons on the optimal surgical approach for removal of cancerous tissue.
Scientists have designed an AI tool that can rapidly decode a brain tumor’s DNA to determine its molecular identity during surgery — critical information that under the current approach can take a few days and up to a few weeks.
Knowing a tumor’s molecular type enables neurosurgeons to make decisions such as how much brain tissue to remove and whether to place tumor-killing drugs directly into the brain — while the patient is still on the operating table.
A report on the work, led by Harvard Medical School researchers, is published July 7 in the journal Med.
Accurate molecular diagnosis — which details DNA alterations in a cell — during surgery can help a neurosurgeon decide how much brain tissue to remove. Removing too much when the tumor is less aggressive can affect a patient’s neurologic and cognitive function. Likewise, removing too little when the tumor is highly aggressive may leave behind malignant tissue that can grow and spread quickly.
“Right now, even state-of-the-art clinical practice cannot profile tumors molecularly during surgery. Our tool overcomes this challenge by extracting thus-far untapped biomedical signals from frozen pathology slides,” said study senior author Kun-Hsing Yu, assistant professor of biomedical informatics in the Blavatnik Institute at HMS.
Knowing a tumor’s molecular identity during surgery is also valuable because certain tumors benefit from on-the-spot treatment with drug-coated wafers placed directly into the brain at the time of the operation, Yu said.
“The ability to determine intraoperative molecular diagnosis in real time, during surgery, can propel the development of real-time precision oncology,” Yu added.
The standard intraoperative diagnostic approach used now involves taking brain tissue, freezing it, and examining it under a microscope. A major drawback is that freezing the tissue tends to alter the appearance of cells under a microscope and can interfere with the accuracy of clinical evaluation. Furthermore, the human eye, even when using potent microscopes, cannot reliably detect subtle genomic variations on a slide.
The new AI approach overcomes these challenges.
The tool, called CHARM (Cryosection Histopathology Assessment and Review Machine), is freely available to other researchers. It still has to be clinically validated through testing in real-world settings and cleared by the FDA before deployment in hospitals, the research team said.
Cracking cancer’s molecular code
Recent advances in genomics have allowed pathologists to differentiate the molecular signatures — and the behaviors that such signatures portend — across various types of brain cancer as well as within specific types of brain cancer. For example, glioma — the most aggressive brain tumor and the most common form of brain cancer — has three main subvariants that carry different molecular markers and have different propensities for growth and spread.
The new tool’s ability to expedite molecular diagnosis could be particularly valuable in areas with limited access to technology to perform rapid cancer genetic sequencing.
Beyond the decisions made during surgery, knowledge of a tumor’s molecular type provides clues about its aggressiveness, behavior, and likely response to various treatments. Such knowledge can inform post-operative decisions.
Furthermore, the new tool enables during-surgery diagnoses aligned with the World Health Organization’s recently updated classification system for diagnosing and grading the severity of gliomas, which calls for such diagnoses to be made based on a tumor’s genomic profile.
Training CHARM
CHARM was developed using 2,334 brain tumor samples from 1,524 people with glioma from three different patient populations. When tested on a never-before-seen set of brain samples, the tool distinguished tumors with specific molecular mutations at 93 percent accuracy and successfully classified three major types of gliomas with distinct molecular features that carry different prognoses and respond differently to treatments.
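To make the evaluation step concrete, the sketch below scores a classifier's held-out predictions the way such studies report accuracy. It is a minimal illustration with hypothetical labels and predictions, not CHARM's actual pipeline (a deep network over whole-slide cryosection images); the subtype names follow the 2021 WHO glioma classes.

```python
# Minimal sketch of held-out evaluation: score a trained classifier on
# never-before-seen samples. Labels and predictions here are hypothetical.
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground-truth molecular subtypes (2021 WHO glioma classes)
y_true = ["IDH-mutant astrocytoma", "oligodendroglioma", "IDH-wildtype glioblastoma",
          "IDH-wildtype glioblastoma", "IDH-mutant astrocytoma", "oligodendroglioma"]
# Hypothetical model predictions for the same held-out samples
y_pred = ["IDH-mutant astrocytoma", "oligodendroglioma", "IDH-wildtype glioblastoma",
          "IDH-wildtype glioblastoma", "IDH-mutant astrocytoma", "IDH-wildtype glioblastoma"]

print(f"Held-out accuracy: {accuracy_score(y_true, y_pred):.0%}")
print(classification_report(y_true, y_pred))  # per-class precision/recall
```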
Going a step further, the tool successfully captured visual characteristics of the tissue surrounding the malignant cells. It was capable of spotting telltale areas with greater cellular density and more cell death within samples, both of which signal more aggressive glioma types.
The tool was also able to pinpoint clinically important molecular alterations in a subset of low-grade gliomas, a subtype of glioma that is less aggressive and therefore less likely to invade surrounding tissue. Each of these changes also signals different propensity for growth, spread, and treatment response.
The tool further connected the appearance of the cells — the shape of their nuclei, the presence of edema around the cells — with the molecular profile of the tumor. This means that the algorithm can pinpoint how a cell’s appearance relates to the molecular type of a tumor.
This ability to assess the broader context around the image renders the model more accurate and closer to how a human pathologist would visually assess a tumor sample, Yu said.
The researchers say that while the model was trained and tested on glioma samples, it could be successfully retrained to identify other brain cancer subtypes.
Scientists have already designed AI models to profile other types of cancer — colon, lung, breast — but gliomas have remained particularly challenging due to their molecular complexity and huge variation in tumor cells’ shape and appearance.
The CHARM tool would have to be retrained periodically to reflect new disease classifications as they emerge from new knowledge, Yu said.
“Just like human clinicians who must engage in ongoing education and training, AI tools must keep up with the latest knowledge to remain at peak performance.”
For reference, the research paper can be found here:
https://www.cell.com/med/fulltext/S2666-6340(23)00189-7
Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma
Nasrallah MP, Zhao J, Tsai CC, Meredith D, Marostica E, Ligon KL, Golden JA, Yu KH. Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma. Med. 2023 Jun 29:S2666-6340(23)00189-7. Epub ahead of print. PMID: 37421953. DOI: https://doi.org/10.1016/j.medj.2023.06.002
6/12/2023
Use of ChatGPT in BioInformatics and How a Collaboration between MIT and Microsoft called BioGPT may Revolutionize Bioinformatic Analysis including Medical Text Analysis
Reporter: Stephen J. Williams, Ph.D.
In past updates on this page, a few use cases of ChatGPT in oncology have been discussed. These examples mainly referred to using ChatGPT either for patient information or in diagnosis and medical case reporting. However, there are multiple potential uses in cancer research as well. One of the most glaring is in the field of bioinformatics, which is highlighted in this curated post with the link below:
In this curation, the usefulness of ChatGPT as a bioinformatic tool to perform everything from DNA and protein sequence analysis to structural motif analysis is discussed. A few pros and cons of the ChatGPT algorithm are discussed, including, as highlighted below, weaknesses in reporting and referencing sources. Also discussed is BioGPT, the new collaboration between MIT and Microsoft, which may replace ChatGPT for medical text analysis and many bioinformatic tasks, including bioinformatic tool programming.
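As a flavor of the routine sequence-analysis tasks the curation refers to, below is the kind of short snippet one might prompt ChatGPT to produce. The sequence is arbitrary, and the example assumes Biopython 1.80 or later is installed.

```python
# Illustrative bioinformatics snippet of the kind discussed above.
# Hypothetical sequence; requires Biopython >= 1.80 (pip install biopython).
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction

dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
print(f"GC content: {gc_fraction(dna):.1%}")      # fraction of G+C bases
print(f"Protein: {dna.translate(to_stop=True)}")  # translate up to the first stop codon
```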
5/26/2023
Evaluation of ChatGPT for information for cancer patients: answering questions regarding misconceptions
Reporter: Stephen J. Williams, Ph.D.
The National Cancer Institute (NCI) maintains an excellent and comprehensive website for patient information, complete with common questions and answers that patients may have regarding their cancer type and therapy. Their page, “Common Cancer Myths and Misconceptions”, is meant to alleviate a cancer patient’s fears by providing correct information that answers any misconceptions or misinformation a patient might have concerning their treatment and prognosis. The site is well visited but, in many instances, patients are overwhelmed by the load of information. A chatbot may therefore ease the patient’s difficulty in interacting with the site and extracting the correct information they need. A paper in JNCI Cancer Spectrum, “Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information”, used the questions on this page as prompts to ChatGPT and tested the output for correct information given common queries. The results showed that the answers supplied by ChatGPT were accurate as analyzed by a team of cancer experts. ChatGPT may therefore provide an excellent chatbot experience for new cancer patients and help alleviate fears and misconceptions concerning their tumor type.
Paper citation: Johnson SB, King AJ, Warner EL, Aneja S, Kann BH, Bylund CL. Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information. JNCI Cancer Spectr. 2023 Mar 1;7(2):pkad015. doi: 10.1093/jncics/pkad015. PMID: 36929393; PMCID: PMC10020140.
Abstract
Data about the quality of cancer information that chatbots and other artificial intelligence systems provide are limited. Here, we evaluate the accuracy of cancer information on ChatGPT compared with the National Cancer Institute’s (NCI’s) answers by using the questions on the “Common Cancer Myths and Misconceptions” web page. The NCI’s answers and ChatGPT answers to each question were blinded, and then evaluated for accuracy (accurate: yes vs no). Ratings were evaluated independently for each question, and then compared between the blinded NCI and ChatGPT answers. Additionally, word count and Flesch-Kincaid readability grade level for each individual response were evaluated. Following expert review, the percentage of overall agreement for accuracy was 100% for NCI answers and 96.9% for ChatGPT outputs for questions 1 through 13 (κ = −0.03, standard error = 0.08). There were few noticeable differences in the number of words or the readability of the answers from NCI or ChatGPT. Overall, the results suggest that ChatGPT provides accurate information about common cancer myths and misconceptions.
The queries are given below as Table 1 in their paper (source: https://pubmed.ncbi.nlm.nih.gov/36929393/)
Table 1.
Cancer myths and misconceptions questions, answers, and descriptive information (WC = word count; FKG = Flesch-Kincaid grade level; answers marked "(omitted)" were not reproduced in this excerpt)

| Question | NCI answer | NCI WC | NCI FKG | ChatGPT answer | ChatGPT WC | ChatGPT FKG |
| --- | --- | --- | --- | --- | --- | --- |
| Is cancer a death sentence? | (omitted) | 133 | 15.8 | Cancer is not necessarily a death sentence. The prognosis for people with cancer depends on many factors, including the type and stage of the cancer, the person’s overall health, and the effectiveness of treatment. Some types of cancer can be cured, while others may be managed through treatments such as surgery, chemotherapy, radiation, or targeted therapies. In some cases, people with cancer may live many years with the disease, while in other cases, cancer may be more aggressive and may progress quickly. It is important to work with a healthcare team to determine the best treatment plan based on the specifics of your situation. | 104 | 11.4 |
| Will eating sugar make my cancer worse? | No. Although research has shown that cancer cells consume more sugar (glucose) than normal cells, no studies have shown that eating sugar will make your cancer worse or that, if you stop eating sugar, your cancer will shrink or disappear. However, a high-sugar diet may contribute to excess weight gain, and obesity is associated with an increased risk of developing several types of cancer. For more information, see the NCI fact sheet on Obesity and Cancer. | 76 | 12.7 | (omitted) | 234 | 13.7 |
| Do artificial sweeteners cause cancer? | No. Researchers have conducted studies on the safety of the artificial sweeteners (sugar substitutes) saccharin (Sweet ‘N Low®, Sweet Twin®, NectaSweet®); cyclamate; aspartame (Equal®, NutraSweet®); acesulfame potassium (Sunett®, Sweet One®); sucralose (Splenda®); and neotame and found no evidence that they cause cancer in humans. All of these artificial sweeteners except for cyclamate have been approved by the Food and Drug Administration for sale in the United States. For more information, see the NCI fact sheet on Artificial Sweeteners and Cancer. | 80 | 9.5 | (omitted) | 157 | 14.9 |
| Is cancer contagious? | (omitted) | 174 | 13.9 | Cancer is not a contagious disease, meaning it cannot be transmitted from person to person like a cold or the flu. Cancer is a disease in which abnormal cells divide and grow uncontrollably, forming tumors. There are many different types of cancer, and each type can have a different cause. Some types of cancer are caused by genetic factors, while others may be caused by exposure to certain substances, such as tobacco smoke or radiation. Some types of cancer may also be caused by infections, such as the human papillomavirus (HPV), which can cause certain types of cancer, including cervical cancer and some types of head and neck cancer. However, these infections are not transmitted through casual contact and are not contagious in the same way that infectious diseases are. | 130 | 11.2 |
| Does my attitude—positive or negative—determine my risk of getting or surviving cancer? | To date, there is no convincing scientific evidence that links a person’s “attitude” to their risk of developing or dying from cancer. If you have cancer, it’s normal to feel sad, angry, or discouraged sometimes and positive or upbeat at other times. People with a positive attitude may be more likely to maintain social connections and stay active, and physical activity and emotional support may help you cope with your cancer. For more information, see the NCI fact sheet on Stress and Cancer. | 83 | 11.2 | (omitted) | 167 | 20.2 |
| Can cancer surgery or a tumor biopsy cause cancer to spread in the body? | The chance that surgery will cause cancer to spread to other parts of the body is extremely low. Following standard procedures, surgeons use special methods and take many steps to prevent cancer cells from spreading during biopsies or surgery to remove tumors. For example, if they must remove tissue from more than one area of the body, they use different surgical tools for each area. For information about how cancer spreads in the body, see our page on Metastatic Cancer. | 80 | 11.0 | (omitted) | 215 | 13.4 |
| Will cancer get worse if exposed to air? | No. Exposure to air will not make tumors grow faster or cause cancer to spread to other parts of the body. For information about how cancer spreads in the body, see our page on Metastatic Cancer. | 36 | 8.8 | (omitted) | 198 | 12.8 |
| Do cell phones cause cancer? | No, not according to the best studies completed so far. Cancer is caused by genetic mutations, and cell phones emit a type of low-frequency energy that does not damage genes. For more information, see the NCI fact sheet on Cell Phones and Cancer Risk. | 44 | 7.5 | (omitted) | 196 | 15.3 |
| Do power lines cause cancer? | No, not according to the best studies completed so far. Power lines emit both electric and magnetic energy. The electric energy emitted by power lines is easily shielded or weakened by walls and other objects. The magnetic energy emitted by power lines is a low-frequency form of radiation that does not damage genes. For more information, see the NCI fact sheet on Electromagnetic Fields and Cancer. | 66 | 13.2 | (omitted) | 180 | 13.7 |
| Are there herbal products that can cure cancer? | No. Although some studies suggest that alternative or complementary therapies, including some herbs, may help patients cope with the side effects of cancer treatment, no herbal products have been shown to be effective for treating cancer. In fact, some herbal products may be harmful when taken during chemotherapy or radiation therapy because they may interfere with how these treatments work. Cancer patients should talk with their doctor about any complementary and alternative medicine products—including vitamins and herbal supplements—they may be using. For more information, see Complementary and Alternative Medicine. | 91 | 15.0 | (omitted) | 154 | 15.5 |
| If someone in my family has cancer, am I likely to get cancer, too? | (omitted) | 128 | 13.5 | (omitted) | 164 | 17.4 |
| If no one in my family has had cancer, does that mean I’m risk-free? | No. Based on the most recent data, about 38% of men and women will be diagnosed with cancer at some point during their lives. Most cancers are caused by genetic changes that occur throughout a person’s lifetime as a natural result of aging and exposure to environmental factors, such as tobacco smoke and radiation. Other factors, such as what kind of food you eat, how much you eat, and whether you exercise, may also influence your risk of developing cancer. For more information, see Cancer Causes and Risk Factors. | 90 | 11.8 | (omitted) | 157 | 18.7 |
| Do antiperspirants or deodorants cause breast cancer? | No. The best studies so far have found no evidence linking the chemicals typically found in antiperspirants and deodorants with changes in breast tissue. For more information, see the NCI fact sheet on Antiperspirants/Deodorants and Breast Cancer. | 38 | 12.6 | (omitted) | 188 | (omitted) |
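The two metrics reported in the table and abstract are straightforward to reproduce. The Flesch-Kincaid grade level is 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, and inter-rater agreement is summarized with Cohen's kappa. Below is a minimal sketch: the syllable count uses a crude vowel-group heuristic, so values will differ slightly from published tools, and the rater vectors are hypothetical.

```python
import re
from sklearn.metrics import cohen_kappa_score

def flesch_kincaid_grade(text: str) -> float:
    """FKG = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    # Crude syllable estimate: count groups of consecutive vowels per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n = max(1, len(words))
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59

sample = ("Cancer is not necessarily a death sentence. "
          "The prognosis depends on many factors.")
print(f"FKG: {flesch_kincaid_grade(sample):.1f}")

# Hypothetical blinded accuracy ratings (1 = accurate) from two raters
rater_a = [1, 1, 1, 0, 1]
rater_b = [1, 1, 1, 1, 1]
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
```

Note that heavily skewed ratings can drive kappa toward zero even when raw agreement is high, which is consistent with the near-zero kappa reported above alongside 96.9 to 100 percent agreement.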
5/20/2023
Synchronous Bilateral Breast Cancer: A Case Report Piloting and Evaluating the Implementation of the AI-Powered Large Language Model (LLM) ChatGPT
Reporter: Stephen J. Williams, Ph.D.
A report by Naik et al. (2023) compares case reports of a rare type of breast cancer, synchronous bilateral breast cancer, generated either by ChatGPT alone or by a combination of ChatGPT and human curators, in order to increase the accuracy and speed of case report generation. Clinically, synchronous bilateral breast cancer is defined as the presence of invasive or in situ breast cancer in both breasts diagnosed within 6 months of each other. The diagnosis is difficult, as s-BBC presents at different stages and with different histopathological characteristics, including prognostic factors. The authors present the case report of s-BBC along with the treatment, prognosis, and clinical management. Details of clinical MRI images as well as molecular profiles of the cancer are given. To generate the report, the authors supplied ChatGPT with ten PubMed references as part of a query to assist in generating an AI-authored case report.
As credit is given in the acknowledgements:
Acknowledgments
The manuscript includes input from the AI-powered large language model (LLM) ChatGPT (OpenAI, San Francisco, CA), although a significant (<50%) component of the text is manually (human) generated. Text generated by ChatGPT versus human input, as well as text analysis via GPTzero (OpenAI, San Francisco, CA) of the three queries to ChatGPT and one query via domain-trained LLM (Bloom), as well as author impressions of the writing process, are available in the appendix.
Abstract
Primary breast carcinoma is the most common cancer type in women, and although bilateral synchronous breast cancers (s-BBC) remain quite rare, the reported incidence may increase with the adoption of more sensitive imaging modalities. Here, we present a case of histomorphological and clinically distinct s-BBC, together with a discussion of clinical management decisions, prognosis, and treatment standards and how these relate to outcomes vis-à-vis more established standards in unifocal breast carcinoma. The case report also constitutes a pilot and formal evaluation of a large language model (LLM) of ChatGPT as a tool to aid in generating a single patient case report.
You can read the full report here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10183235/
Below are Dr. Gurda’s comments, which are found at the conclusion of the paper:
Author impressions (Dr. Gurda)
Unfortunately, there is no such thing as a “free lunch” and not everything is as easy as it seems. The discussion sections generated by my queries to ChatGPT seemed to make sense, and superficially, the text generated appeared sensible, but the numbers and the citations produced were entirely fictitious. The authors and the journals existed; the article titles and numbers made general sense but were apparently pulled out of “thin air”, or rather generated by a confluence of text in the training set. As an example, ChatGPT text reads:
“In a retrospective study of 166 patients with s-BBC who underwent BCS, the 10-year overall survival rate was 77%, and the disease-free survival rate was 67%.
The study found that the presence of lobular carcinoma in situ, an extensive intraductal component, and lymph node involvement were significant predictors of worse outcomes (1). (1) Sun J, Huo L, Xie C, et al. The prognosis of synchronous bilateral breast cancer: a single-center experience. PLoS One. 2015;10(6):e0128343.”
It sounds credible, but this study does not exist. The authors exist, the journal exists, but the article and, most importantly, the data cited do not. Overall, for the purposes of generating a discussion section, ChatGPT at this point in its development (early 2023) appears to mostly have the characteristics of a ‘confidence man’ — superficially accurate and strong, but the substance rings hollow. Nonetheless, as stated elsewhere, I found ChatGPT and other LLMs to be useful in generating a skeletonized outline, organizing my thoughts about the subject in a logical manner (even if I had to gather the evidence for the arguments being made in the discussion ‘manually’ later), and in general as a motivation to get started and/or overcome writer’s block. Lastly, here is the full text of the article: what was generated by ChatGPT (verbatim) is highlighted, and manual human entry and edits are in standard text (Figures 4, 5):

Figure 3: ChatGPT queries for generation of report: training data ten Pubmed articles. Source: Naik HR, Prather AD, Gurda GT. Synchronous Bilateral Breast Cancer: A Case Report Piloting and Evaluating the Implementation of the AI-Powered Large Language Model (LLM) ChatGPT. Cureus. 2023 Apr 14;15(4):e37587. doi: 10.7759/cureus.37587. PMID: 37193434; PMCID: PMC10183235.
Note that the authors had to manually add in references. ChatGPT, as discussed in last week’s article, does not return its source of information.

Fig 4. ChatGPT-generated (yellow highlight) and human-generated (nonhighlighted) parts of the case report for s-BBC. Note that many annotated features, like references and figure numbers, had to be manually curated.
Source: Naik HR, Prather AD, Gurda GT. Synchronous Bilateral Breast Cancer: A Case Report Piloting and Evaluating the Implementation of the AI-Powered Large Language Model (LLM) ChatGPT. Cureus. 2023 Apr 14;15(4):e37587. doi: 10.7759/cureus.37587. PMID: 37193434; PMCID: PMC10183235.
Also note that ChatGPT could not determine which figure in a paper, or which data, should be referred to in the AI-generated case report. This also had to be added manually by the authors.
And last to note:
The authors (not AI) had obtained consent or consent waivers from all study participants. Note how human involvement is needed for proper regulatory compliance. It is probable that an AI-generated report would not have included such a statement or consent acquisition, which underscores the initial question:
Is AI and ChatGPT, used in clinical reporting, HIPAA compliant?
Naik HR, Prather AD, Gurda GT. Synchronous Bilateral Breast Cancer: A Case Report Piloting and Evaluating the Implementation of the AI-Powered Large Language Model (LLM) ChatGPT. Cureus. 2023 Apr 14;15(4):e37587. doi: 10.7759/cureus.37587. PMID: 37193434; PMCID: PMC10183235.
5/06/2023
Can ChatGPT Revolutionize How Cancer Patients Retrieve Relevant Information? A Critique of a head-to-head Comparison with Google Search
Reporter: Stephen J. Williams, Ph.D.
A recent paper in JNCI Cancer Spectrum, “Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift”, discusses the utility of ChatGPT versus Google Search as a source of information for cancer patients on their disease and treatment [1]. The authors compared Google Search results with ChatGPT results on a series of queries related to cancer, cancer risk, and outcomes, specifically questions comparable to those frequently asked by patients. However, contrary to the paradigm shift suggested by the title, a more nuanced analysis of these results suggests that each format has its pros and cons as a source of reliable patient information, and that ChatGPT is in its infancy with respect to its ability to supply simple yet comprehensive information.
Both tools were asked questions such as:
- What are the most common cancer types?
- What exactly is cancer?
The authors reported the main response from Google and the top 5 responses from ChatGPT. Google returned “snippets”, or one-to-two-sentence answers, for example that breast, prostate, and lung cancer are the most common cancers. These snippets from Google always included the source, usually as a referenced web link. ChatGPT’s answers were more nuanced, providing contextualization according to the patient’s characteristics (for example, sex or age). When asked “What is cancer?”, ChatGPT, over 5 returned responses, showed variability in supplemental information such as causes, treatments, and symptoms. However, unlike Google Search, ChatGPT did not return web links.
This unreferenced material would be a hindrance to the professional, who requires a hyperlink to a source in order to investigate sources and verify additional related information. The authors suggest the disruptive nature of ChatGPT is its ability to formulate responses and advice to complex questions, rather than simple queries.
This divergence between utility for a patient versus professional can be seen when analyzing the responses to more specific questions related to discrete medical terminology.
The authors asked ChatGPT and Google these three questions (a sketch of how such queries might be batched programmatically follows the list):
- Is coughing a sign of lung cancer
- What is the prognosis of metastatic pancreatic cancer
- Does pembrolizumab cause fever, and should I go to the hospital
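The comparison in the paper was done by hand, but the same three questions could be batched against a chat model programmatically. A minimal sketch with the openai Python client follows; the model name is an illustrative choice, and an OPENAI_API_KEY is assumed to be set in the environment.

```python
# Minimal sketch of batching the three patient-style questions against a
# chat model. Illustrative only: model choice is an assumption, and
# OPENAI_API_KEY is assumed in the environment (pip install openai).
from openai import OpenAI

QUESTIONS = [
    "Is coughing a sign of lung cancer?",
    "What is the prognosis of metastatic pancreatic cancer?",
    "Does pembrolizumab cause fever, and should I go to the hospital?",
]

client = OpenAI()
for q in QUESTIONS:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": q}],
    )
    print(q, "->", reply.choices[0].message.content, sep="\n")
```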
Let’s look at the differences in response:
The first notable difference is in supplemental information and sources. While ChatGPT does not give the source of its information, Google returns informative and authoritative web links:
Is coughing a sign of lung cancer? → 7,510,000 web links from Google
Google response: “The main symptoms of lung cancer include a cough that does not go away after 3 weeks; a long-standing cough that gets worse; chest infections that keep coming back”
Note: this is short but informative enough for a patient; it carries less scientific detail, but the link to the NHS is sufficient.
ChatGPT responses: Among its many responses is the statement “Coughing can sometimes be a sign of lung cancer, but…”, followed by descriptions of other diseases that show coughing as a symptom. None of the responses gives a duration of symptoms like the Google response does.
What is the prognosis of metastatic pancreatic cancer? → 29,500,000 web links from Google
This was more telling as to the difference in clinical relevance between Google and ChatGPT.
Google response snippet: “Stage IV pancreatic cancer has a five-year survival rate of 1%. The average patient diagnosed with late-stage pancreatic cancer will live for about 1 year after diagnosis”, with a web link from Johns Hopkins. The information is clinically relevant, as it uses an authoritative source of medical information which notes that metastatic disease is referred to as stage IV. In addition, Google, using the Johns Hopkins source, gives correct and up-to-date information on PROGNOSTIC values. However, look at the ChatGPT responses:
ChatGPT responses:
Response 1: “Metastatic pancreatic cancer is a very serious and difficult to treat form of the disease. The prognosis for people with metastatic pancreatic cancer is generally poor, with most people surviving less than six months after being diagnosed. However, it is important to remember that each person is different, and some people may be able to live longer with the disease. Treatment options, such as chemotherapy and radiation, can help to extend life expectancy and improve quality of life. It is important to work with a healthcare team to develop a treatment plan that is tailored to your individual needs.”
Response 2: “The prognosis for metastatic pancreatic cancer is generally poor. This type of cancer has a tendency to spread quickly and is often difficult to treat. In most cases, the cancer has already spread to other parts of the body by the time it is diagnosed, making it difficult to completely remove the cancer. The average life expectancy for people with this type of cancer is about six months, although some people may live longer with treatment. However, treatment options for metastatic pancreatic cancer are limited and may not be effective in all cases. It is important to talk with a doctor about the specific prognosis for an individual’s case.”
Response 3: “The prognosis of metastatic pancreatic cancer, also known as stage 4 pancreatic cancer, is generally not very good. This is because by the time pancreatic cancer has reached this advanced stage, it has typically spread to other parts of the body and is more difficult to treat. Most people with stage 4 pancreatic cancer survive for less than six months after being diagnosed, and only a small percentage of people live for more than one year. However, it is important to remember that every person is different, and some people may respond better to treatment than others. It is always best to discuss your specific situation with your doctor.”
Note that ChatGPT takes a more conversational approach and tries to pitch the language at a less technical reading level (useful for patients); however, it takes about five tries before ChatGPT gives prognostic values (putting prognosis at 6 months, with no five-year survival rate, which most clinicians report), and it talks more about treatment, what most clinicians would refer to as predictive information.
Does pembrolizumab cause fever, and should I go to the hospital? → 119,000 web links from Google
Note that a Google search returns the manufacturer’s website on Keytruda side effects at https://www.keytruda.com/side-effects/; however, ChatGPT’s first response is rather disappointing:
“I’m sorry, but I am not able to browse the internet and therefore do not have access to information about the effects of pembrolizumab. I would strongly recommend speaking with a healthcare professional if you are experiencing fever and are concerned about your health. They will be able to provide you with personalized medical advice based on your specific situation.”
For a patient chatbot, this would be very frustrating. Other answers say that pembrolizumab is a type of cancer treatment called an immune checkpoint inhibitor and that fever is a side effect which should be reported to your doctor, but this is only the third possible response from ChatGPT.
“Pembrolizumab is a type of cancer treatment known as an immune checkpoint inhibitor. It can cause side effects such as fever, but it is important to discuss any concerns with your doctor. If you are experiencing fever or other severe symptoms, you should contact your doctor immediately and follow their advice. They will be able to provide you with personalized recommendations based on your individual situation. In some cases, it may be necessary to go to the hospital for further evaluation and treatment.”
The fourth response lists pembrolizumab as a type of cancer therapy called immunotherapy, which is a more understandable term for the patient.
In summary, while Google returns very informative web links, ChatGPT does have the capacity to relate information in a more conversational tone and with less technical verbiage, more suitable for patients. However, it could be frustrating for patients to have to go through multiple iterations to obtain a suitable answer, and no references or source information are given. For the clinician, ChatGPT seems to have more learning and training to do before it reaches a level suitable for clinical use. ChatGPT’s ability to relate to patients, though, could be a benefit.
Link for Source article: https://academic.oup.com/jncics/article/7/2/pkad010/7049531
Reference
- Hopkins AM, Logan JM, Kichenadasse G, Sorich MJ: Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift. JNCI Cancer Spectrum 2023, 7(2).
4/23/2023
ChatGPT and Potential Healthcare Implications of Large Language Models – a talk from Dr. George Shih, M.D. FACR
This talk was given at the March NCI Imaging and Informatics Community Webinar (IICW), where Weill Cornell Medicine’s Dr. George Shih spoke about ChatGPT and large language models.
ChatGPT is an AI chatbot, built on top of large language models, with the potential to impact many aspects of healthcare. Specifically, Dr. Shih discussed:
examples of ChatGPT in healthcare, including current limitations.
potential implications of ChatGPT and large language models for researchers, providers, patients, and more.
how ChatGPT and AI’s evolution over time might impact users.
Both the Center for Biomedical Informatics and Information Technology and the Cancer Imaging Program organize the monthly NCI IICW. During the first Monday of every month, this event features scientific presentations and project updates. To receive updates on future topics and to access previous presentations and recordings, visit the IICW webpage.
Dr. Shih focuses his talk on the use of ChatGPT for analyzing DICOM images in radiology. ChatGPT did a reasonable job at report generation and even at coding in Python; however, many radiologists are still hesitant to use it, as ChatGPT has problems with some medical terminology. Dr. Shih believes that will be addressed over time. One interesting example was a case study given to ChatGPT with five choices of further evaluations to suggest. ChatGPT, much like multiple radiologists, suggested an answer not given in the list of choices; surprisingly, ChatGPT did not use any of the choices given to it but drew on the literature to find the best suggestion.
However, he does state that ChatGPT is not good at giving citations or referencing the literature correctly. Note the very good explanation of how GPT-3 works (OpenAI built ChatGPT on top of it, and with human instructors it became ChatGPT-4): the basis of ChatGPT is an algorithm that predicts the next word, much like your phone does when you start typing a word.
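To make that idea concrete, here is a toy next-word predictor built from bigram counts on a hypothetical mini-corpus. It is only a conceptual sketch: real models like GPT use transformer networks over subword tokens, not raw word counts.

```python
# Toy illustration of "predicting the next word." Real LLMs are transformer
# networks over subword tokens; this bigram counter is only a conceptual
# sketch, and the corpus is hypothetical.
from collections import Counter, defaultdict

corpus = "the patient has a tumor the patient has a fever the tumor is benign".split()

nxt = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    nxt[w1][w2] += 1  # count how often w2 follows w1

def predict(word: str) -> str:
    """Return the most frequent continuation observed in the corpus."""
    return nxt[word].most_common(1)[0][0] if nxt[word] else "<unk>"

print(predict("the"))    # 'patient' (seen twice, vs 'tumor' once)
print(predict("fever"))  # 'the' (the only continuation observed)
```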
Watch the video of this talk
4/13/2023
The utility of ChatGPT for cancer treatment information
ABSTRACT
The use of large language models (LLMs) such as ChatGPT for medical question-answering is becoming increasingly popular. However, there are concerns that these models may generate and amplify medical misinformation. Because cancer patients frequently seek to educate themselves through online resources, some individuals will likely use ChatGPT to obtain cancer treatment information. This study evaluated the performance and robustness of ChatGPT in providing breast, prostate, and lung cancer treatment recommendations that align with National Comprehensive Cancer Network (NCCN) guidelines. Four prompt templates were created to explore how differences in how the query is posed impacts response. ChatGPT output was scored by 3 oncologists and a 4th oncologist adjudicated in cases of disagreement. ChatGPT provided at least one NCCN-concordant recommendation for 102/104 (98%) prompts. However, 35/102 (34.3%) of these also included a recommendation that was at least partially non-concordant with NCCN guidelines. Responses varied based on prompt type. In conclusion, ChatGPT did not perform well at reliably and robustly providing cancer treatment recommendations. Patients and clinicians should be aware of the limitations of ChatGPT and similar technologies for self-education.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The authors thank the Woods Foundation for their generous support of this work.
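As a sketch of the prompt-sensitivity design described in the abstract (the same diagnosis posed through several templates, with each output scored for NCCN concordance), consider the following. Every template and function here is a hypothetical stand-in: in the study, concordance was judged by three oncologists, with a fourth adjudicating disagreements.

```python
# Sketch of a prompt-template sensitivity check. All templates and functions
# are hypothetical stand-ins, not the study's actual prompts or scoring.
TEMPLATES = [
    "What is the treatment for {dx}?",
    "What is the recommended treatment for {dx} according to NCCN guidelines?",
    "As a clinician, what treatment would you recommend for {dx}?",
    "List the standard treatment options for {dx}.",
]

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat-model API call.
    return "Options include surgery, chemotherapy, and radiation."

def is_concordant(response: str) -> bool:
    # Hypothetical stand-in for expert review against NCCN guidelines.
    return "surgery" in response.lower()

def concordance_rate(diagnoses) -> float:
    results = [
        is_concordant(query_llm(t.format(dx=dx)))
        for dx in diagnoses
        for t in TEMPLATES
    ]
    return sum(results) / len(results)  # fraction of concordant prompt/diagnosis pairs

print(concordance_rate(["stage II breast cancer", "stage I lung cancer"]))
```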

