Next Generation Sequencing (NGS) | Leaders in Pharmaceutical Business Intelligence Group, LLC, Doing Business As LPBI Group, Newton, MA

APOE was a marker for distinguishing long-term from short-term survivors among ovarian cancer patients; the markers were in the stroma
there is spatial communication between the tumor and the underlying stroma
it is imperative to understand how your multiomics equipment images a tumor area before it laser-captures the tissue and sends it to the MS system; you can lose a lot of tissue and information because of differences in resolution
many of these multiomics systems are validated for clinical use in the EU but not in the US
multiomics spatial analysis allows you to image protein, metabolite, and mRNA expression in the three-dimensional environment of the tumor (tumor cells and stroma)
they are making a human tumor atlas
they described a patient with a tumor who went home during COVID and received the vaccine, becoming ill afterward; when the patient came back to have the tumor checked, it had greatly regressed. Before vaccination the tumor was immunologically cold; after the COVID vaccine, any remaining tumor showed extensive infiltration of immune cells

4:40-4:55

Aruna Ayer, PhD, VP, Multiomics, Innovation and Scientific Affairs, BD Biosciences

BD Biosciences multiomics platform is modular, and more omics levels can be added to the platform
for example someone wanted to look at T cells
people have added CRISPR screens on the omics platform
most people are using single cell spatial omics
they have a FACS on their platform too so you can look at single cell spatial omics and sort different cellular populations
very comparable to the 10X Genomics platform
their proteomics is another layer you can add on their platform; however, with spatial proteomics you can get high background noise or be limited to a small panel of biomarkers
Their OMICS Protein One panels are optimized for biology and tumor type.
get high quality multiomics data and proteomics data but in a 3D spatial format
developed Cellismo Data Visualization software tool

4:55-5:10

Harsha Gowda, PhD, Senior Principal Scientist, Director, Research & Lab Operations, Signios Bio

Signios Biosciences (Signios Bio) is the US-based arm of MedGenome, a global leader in genetic testing services, genomics research, and drug discovery solutions.

Signios Bio is a multiomics and bioinformatics company dedicated to revealing the intricate signals within biological data. We leverage the power of multiomics—integrating data from genomics, transcriptomics, proteomics, epigenomics, metabolomics, and microbiomics—to gain a comprehensive understanding of disease biology. Our AI-powered bioinformatics platform allows us to efficiently analyze these complex datasets, uncovering hidden patterns and accelerating the development of new therapies and diagnostics.

Through the integration of cutting-edge multiomics technologies, advanced bioinformatics, and the expertise of world-class scientists, we enable researchers and clinicians with comprehensive, end-to-end solutions to improve drug discovery and development and advance precision medicine.

As part of MedGenome, we have access to real-world evidence (RWE) from global research networks across the US, Europe, Asia, Africa, Middle East, and Latin America. This access enables us to work with our partners to uncover insights that can lead to new biomarkers and drug targets, ensuring that precision medicine is inclusive and effective for all.

https://www.signiosbio.com

their platform can do high throughput analysis of patient tumors (like gallbladder cancer) analyzing mutational spectrum with high dimensionality
they can integrate genomic and transcriptomics data to reveal multiple pathways affected in patient data
have used their platform to investigate spatial omics in lung cancer


Real Time Coverage Afternoon Session on Precision Oncology: Advancing Precision Medicine Annual Conference, Philadelphia PA November 1 2024

Posted on November 1, 2024


Reporter: Stephen J. Williams, Ph.D.

Unlocking the Next Quantum Leap in Precision Medicine – A Town Hall Discussion (CME Eligible)

Co-Chairs

Amanda Paulovich, Professor, Aven Foundation Endowed Chair
Fred Hutchinson Cancer Center

Susan Monarez, Deputy Director, ARPA-H

Henry Rodriguez, NCI/NIH

Eric Schadt, Pathos

Ezra Cohen, Tempus

Jennifer Leib, Innovation Policy Solutions

Nick Seddon, Optum Genomics

Giselle Sholler, Penn State Hershey Children’s Hospital

Janet Woodcock, formerly FDA

Amanda Paulovich: Frustrated by the variability in cancer therapy results, she decided to help improve cancer diagnostics

We have plateaued in relying on single-gene, single-protein companion diagnostics
She considers that regulatory, economic, and cultural factors are hindering innovation, leaving the science far ahead of the clinical side of diagnostics
Diagnostic research is not as well funded as drug discovery
Biomarkers, the foundation of the new personalized medicine, should be at the forefront; she recommended reading The Tipping Point by Malcolm Gladwell
FDA is constrained by statutory mandates

Eric Schadt

Pathos

Multiple companies are chasing different components of a precision medicine strategy, including those involved in AI
He is helping companies create mind maps and knowledge graphs and build more predictive systems
Population screening will use high-dimensional genomic data to determine risk in various population groups; however, 60% of genomic data has no reported ancestry
He founded Sema4, but many of these companies are losing money on genomic diagnostics
So the market is not monetizing properly
Barriers to progress: arbitrary evidence thresholds for payers, big variation across health care system, regulatory framework

Giselle Sholler, Beat Childhood Cancer Consortium

Consortium of university doctors in pediatrics
They had a molecular tumor board to look at the omics data
Showed an example of a choroid plexus tumor treated successfully with multiple precision medicines versus standard chemotherapy
Challenges: understanding the differences among genomic tests (WES, NGS, transcriptome, etc.)
Precision medicine needs to be incorporated into medical education: fellowships, residency
She spends hours with the insurance companies providing more and more evidence to justify reimbursements
She says getting that evidence is a challenge; biomedical information needs to be better CURATED

Dr. Ezra Cohen, Tempus

HPV-associated head and neck cancer has a good prognosis; can use cetuximab and radiation
$2 billion investment at Tempus in an AI-driven algorithm to integrate all omics; LLMs were used too

Dr. Janet Woodcock

Our theoretical problem with precision and personalized medicine is that we are trained to think of the average patient
The I-SPY 2 trial was a Bayesian trial; the COVID trial was a platform trial
She said there should be NIH-sponsored adaptive biomarker platform trials

This event will be covered by the LPBI Group on Twitter. Follow on

using the following meeting hashtags

#AdvancingPM #precisionmedicine


The Use of ChatGPT in the World of BioInformatics and Cancer Research and Development of BioGPT by MIT

Posted on June 12, 2023


Curator: Stephen J. Williams, Ph.D.

Chatbots are being used in multiple interdisciplinary areas of research and medicine, so it was a natural progression to incorporate artificial intelligence (AI), natural language processing (NLP), and chatbot technology like ChatGPT into bioinformatic analysis.

“In domains like informatics, management and marketing, media and communication science, languages and philosophy, psychology and sociology, engineering, design, and human-computer interaction, the fast expanding body of chatbot study is clearly interdisciplinary.”

The field of bioinformatics is a natural fit for incorporating this technology. The curated information below shows some example uses of this technology in bioinformatics related to cancer research.

The text in this reference source was actually generated by ChatGPT (https://chat.openai.com/chat)

Source: https://omicstutorials.com/using-chatgpt-in-bioinformatics-and-biomedical-research/

How ChatGPT can be used in bioinformatics research?
ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model that was designed to generate human-like text in a conversational setting. It is not directly related to bioinformatics, which is the field of study that deals with the storage, retrieval, and analysis of biological data, particularly in the context of genetics and genomics. However, ChatGPT could potentially be used in a bioinformatics context as a tool for generating natural language descriptions of genetic or genomic data, or for generating responses to queries about such data.

For example, a researcher could use ChatGPT to generate descriptions of genetic variants or gene expression patterns based on input data. This could be useful for generating summaries of results for reports or papers, or for generating explanations of complex concepts for non-expert audiences. ChatGPT could also be used to generate responses to queries about genetic or genomic data, such as “What is the function of gene X?” or “What are the potential consequences of variant Y?”

It’s worth noting that ChatGPT is just one of many tools and techniques that can be used in bioinformatics, and it is unlikely to be the most suitable or effective option in every situation. There are many specialized tools and resources available for working with biological data, and it is often necessary to use a combination of these tools to fully analyze and understand the data.

Generating descriptions of genetic or genomic data: ChatGPT could be used to generate natural language descriptions of genetic or genomic data based on input data. For example, suppose a researcher has a dataset containing information about gene expression levels in different tissues. The researcher could use ChatGPT to generate a description of the data, such as:
“Gene X is highly expressed in the liver and kidney, with moderate expression in the brain and heart. Gene Y, on the other hand, shows low expression in all tissues except for the lung, where it is highly expressed.”
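As a minimal sketch of how expression data might be handed to a chatbot for this kind of description, the Python below bins raw values into coarse labels and builds a text prompt. The function names, the thresholds, and the example values are all assumptions made for illustration; the resulting prompt would still have to be sent to whichever chat model is being used.

```python
from typing import Dict


def bin_level(value: float, low: float = 10.0, high: float = 100.0) -> str:
    """Map an expression value to a coarse label; the thresholds are arbitrary."""
    if value >= high:
        return "high"
    if value >= low:
        return "moderate"
    return "low"


def build_prompt(expression: Dict[str, Dict[str, float]]) -> str:
    """Turn {gene: {tissue: value}} into a prompt asking for a plain-language summary."""
    rows = []
    for gene, tissues in expression.items():
        for tissue, value in tissues.items():
            rows.append(f"{gene}\t{tissue}\t{bin_level(value)}")
    return (
        "Summarize the following gene expression table in two or three plain "
        "sentences. Use only the values given.\n"
        "gene\ttissue\tlevel\n" + "\n".join(rows)
    )


# Hypothetical values, chosen only to mirror the Gene X / Gene Y example above.
example = {
    "GeneX": {"liver": 250.0, "kidney": 180.0, "brain": 40.0},
    "GeneY": {"lung": 300.0, "liver": 2.0},
}
prompt = build_prompt(example)
print(prompt)
```

Binning the values first keeps the prompt short and makes it easier to check the generated sentence against the input table.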

Thus ChatGPT, at its simplest level, could be used to ask general questions like “What is the function of gene product X?” and could give a reasonable response without the scientist having to browse through even highly curated databases like GeneCards, UniProt, or GenBank. One could even ask “What are potential interactors of Gene X, validated by yeast two hybrid?” without going to the curated InterActome databases or using expensive software like Genie.

Summarizing results: ChatGPT could be used to generate summaries of results from genetic or genomic studies. For example, a researcher might use ChatGPT to generate a summary of a study that found an association between a particular genetic variant and a particular disease. The summary might look something like this:
“Our study found that individuals with the variant form of gene X are more likely to develop disease Y. Further analysis revealed that this variant is associated with changes in gene expression that may contribute to the development of the disease.”

It’s worth noting that ChatGPT is just one tool that could potentially be used in these types of applications, and it is likely to be most effective when used in combination with other bioinformatics tools and resources. For example, a researcher might use ChatGPT to generate a summary of results, but would also need to use other tools to analyze the data and confirm the findings.

ChatGPT is a variant of the GPT (Generative Pre-training Transformer) language model that is designed for open-domain conversation. It is not specifically designed for generating descriptions of genetic variants or gene expression patterns, but it can potentially be used for this purpose if you provide it with a sufficient amount of relevant training data and fine-tune it appropriately.

To use ChatGPT to generate descriptions of genetic variants or gene expression patterns, you would first need to obtain a large dataset of examples of descriptions of genetic variants or gene expression patterns. You could use this dataset to fine-tune the ChatGPT model on the task of generating descriptions of genetic variants or gene expression patterns.

Here’s an example of how you might use ChatGPT to generate a description of a genetic variant:

First, you would need to pre-process your dataset of descriptions of genetic variants to prepare it for use with ChatGPT. This might involve splitting the descriptions into individual sentences or phrases, and encoding them using a suitable natural language processing (NLP) library or tool.

Next, you would need to fine-tune the ChatGPT model on the task of generating descriptions of genetic variants. This could involve using a tool like Hugging Face’s Transformers library to load the ChatGPT model and your pre-processed dataset, and then training the model on the task of generating descriptions of genetic variants using an appropriate optimization algorithm.

Once the model has been fine-tuned, you can use it to generate descriptions of genetic variants by providing it with a prompt or seed text and asking it to generate a response. For example, you might provide the model with the prompt “Generate a description of a genetic variant associated with increased risk of breast cancer,” and ask it to generate a response. The model should then generate a description of a genetic variant that is associated with increased risk of breast cancer.

It’s worth noting that generating high-quality descriptions of genetic variants or gene expression patterns is a challenging task, and it may be difficult to achieve good results using a language model like ChatGPT without a large amount of relevant training data and careful fine-tuning.
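The pre-processing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sentence splitter is naive, the prompt wording and the `VARIANT_A` entry are hypothetical, and the prompt/completion record layout is one common convention rather than anything the source prescribes. Note also that, as far as is publicly known, ChatGPT's own weights are not distributed for local fine-tuning, so in practice an open model would have to stand in for the fine-tuning step.

```python
import json
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def to_training_records(variant_descriptions: Dict[str, str]) -> List[dict]:
    """Build one prompt/completion pair per sentence of each description."""
    records = []
    for variant, description in variant_descriptions.items():
        for sentence in split_sentences(description):
            records.append(
                {
                    "prompt": f"Describe the genetic variant {variant}.",
                    "completion": sentence,
                }
            )
    return records


# Placeholder text only; a real dataset would hold curated variant descriptions.
descriptions = {
    "VARIANT_A": "This variant is a hypothetical missense change. "
    "It is used here only as a placeholder.",
}
records = to_training_records(descriptions)
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line
print(jsonl)
```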

To train a language model like ChatGPT to extract information about specific genes or diseases from research papers, you would need to follow these steps:

Gather a large dataset of research papers that contain information about the specific genes or diseases you are interested in. This dataset should be diverse and representative of the types of papers you want the model to be able to extract information from.

Preprocess the text data in the research papers by tokenizing the text and creating a vocabulary. You may also want to consider lemmatizing or stemming the text to reduce the dimensionality of the dataset.

Train the language model on the preprocessed text data. You may want to fine-tune a pre-trained model such as ChatGPT on your specific dataset, or you can train a new model from scratch.
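The tokenizing, stemming, and vocabulary-building step above can be sketched as follows. This is a toy illustration, not what a production model uses: modern language models rely on learned subword tokenizers, and the `crude_stem` function here is a hypothetical stand-in for a real stemmer such as Porter's.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric tokens, allowing internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def crude_stem(token: str) -> str:
    """Strip a few common English suffixes; illustrative only."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_vocab(docs: List[str], min_count: int = 1) -> Dict[str, int]:
    """Map each stemmed token seen at least min_count times to an integer id."""
    counts = Counter(crude_stem(t) for doc in docs for t in tokenize(doc))
    kept = sorted(word for word, count in counts.items() if count >= min_count)
    return {word: idx for idx, word in enumerate(kept)}


vocab = build_vocab(["BRCA1 mutations increase risk."])
print(vocab)
```

Raising `min_count` drops rare tokens, which is one simple way to reduce the dimensionality mentioned above.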

ChatGPT could also be useful for sequence analysis

A few examples of sequence analysis where ChatGPT could be useful include:

Protein structure
Identifying functional regions of a protein
Predicting protein-protein interactions
Identifying protein homologs
Generating Protein alignments

All this could be done without having access to UNIX servers or proprietary software or knowing GCG coding

ChatGPT in biomedical research
There are several potential ways that ChatGPT or other natural language processing (NLP) models could be applied in biomedical research:

Text summarization: ChatGPT or other NLP models could be used to summarize large amounts of text, such as research papers or clinical notes, in order to extract key information and insights more quickly.

Data extraction: ChatGPT or other NLP models could be used to extract structured data from unstructured text sources, such as research papers or clinical notes. For example, the model could be trained to extract information about specific genes or diseases from research papers, and then used to create a database of this information for further analysis.

Literature review: ChatGPT or other NLP models could be used to assist with literature review tasks, such as identifying relevant papers, extracting key information from papers, or summarizing the main findings of a group of papers.

Predictive modeling: ChatGPT or other NLP models could be used to build predictive models based on large amounts of text data, such as electronic health records or research papers. For example, the model could be trained to predict the likelihood of a patient developing a particular disease based on their medical history and other factors.

It’s worth noting that while NLP models like ChatGPT have the potential to be useful tools in biomedical research, they are only as good as the data they are trained on, and it is important to carefully evaluate the quality and reliability of any results generated by these models.
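To make the data extraction idea concrete, here is a small Python sketch that turns free text into structured gene and disease records. It is deliberately a dictionary-matching baseline, not an NLP model: the gene and disease lists are hypothetical inputs, and a model-based extractor would replace the matching step.

```python
import re
from typing import Dict, List


def extract_comentions(
    text: str, genes: List[str], diseases: List[str]
) -> List[Dict[str, str]]:
    """Return one record per gene/disease pair mentioned in the same sentence."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        # Whole-word match for gene symbols, plain substring match for diseases.
        found_genes = [
            g for g in genes if re.search(rf"\b{re.escape(g.lower())}\b", lowered)
        ]
        found_diseases = [d for d in diseases if d.lower() in lowered]
        for gene in found_genes:
            for disease in found_diseases:
                records.append(
                    {"gene": gene, "disease": disease, "sentence": sentence}
                )
    return records


text = (
    "APOE was reported as a marker in ovarian cancer. "
    "A second sentence mentions nothing relevant."
)
hits = extract_comentions(text, ["APOE", "TP53"], ["ovarian cancer", "lung cancer"])
print(hits)
```

Keeping the source sentence in each record makes it possible to check every extracted pair against the original text, which matters given the caveat above about result quality.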

ChatGPT in text mining of biomedical data
ChatGPT could potentially be used for text mining in the biomedical field in a number of ways. Here are a few examples:

Extracting information from scientific papers: ChatGPT could be trained on a large dataset of scientific papers in the biomedical field, and then used to extract specific pieces of information from these papers, such as the names of compounds, their structures, and their potential uses.

Generating summaries of scientific papers: ChatGPT could be used to generate concise summaries of scientific papers in the biomedical field, highlighting the main findings and implications of the research.

Identifying trends and patterns in scientific literature: ChatGPT could be used to analyze large datasets of scientific papers in the biomedical field and identify trends and patterns in the data, such as emerging areas of research or common themes among different papers.

Generating questions for further research: ChatGPT could be used to suggest questions for further research in the biomedical field based on existing scientific literature, by identifying gaps in current knowledge or areas where further investigation is needed.

Generating hypotheses for scientific experiments: ChatGPT could be used to generate hypotheses for scientific experiments in the biomedical field based on existing scientific literature and data, by identifying potential relationships or associations that could be tested in future research.

PLEASE WATCH VIDEO

In this video, a bioinformatician describes the ways he uses ChatGPT to increase his productivity in writing bioinformatic code and conducting bioinformatic analyses.

He describes a series of uses of ChatGPT in his day to day work as a bioinformatician:

Using ChatGPT as a search engine: He finds more useful and relevant results than with a standard Google or Yahoo search. This saves time, as one does not have to pore through multiple pages to find information. One caveat, however, is that ChatGPT does NOT return sources, as highlighted in previous postings on this page. This capability is probably why Microsoft invested heavily in OpenAI, in order to incorporate ChatGPT into its Bing search engine as well as its Office suite programs.

ChatGPT to help with coding projects: Bioinformaticians can spend many hours searching for and adapting openly available code to run certain functions, such as determining the G/C content of DNA (although much UNIX-based code has already been established for these purposes). One can use ChatGPT to find such code and then to help debug it.

ChatGPT to document and add coding comments: When writing code it is useful to add comments periodically to assist other users to determine how the code works and also how the program flow works as well, including returned variables.
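As an illustration of the kind of small, commented utility discussed above, here is a minimal Python sketch of a G/C content function. It is not the code shown in the video; the function name and the choice to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored rather than counted in the total.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


print(gc_content("ATGCGC"))
```

The docstring and the explicit error for empty input are the sort of documentation and edge-case handling a chatbot can be asked to add or review.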

One of the comments was interesting and directed one to use BIOGPT instead of ChatGPT

@tzvi7989

1 month ago (edited)

0:54 oh dear. You cannot use chatgpt like that in Bioinformatics as it is rn without double checking the info from it. You should be using biogpt instead for paper summarisation. ChatGPT goes for human-like responses over precise information recal. It is quite good for debugging though and automating boring awkward scripts

So what is BIOGPT?

BioGPT https://github.com/microsoft/BioGPT

The BioGPT model was proposed in BioGPT: generative pre-trained transformer for biomedical text generation and mining by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. BioGPT is a domain-specific generative pre-trained Transformer language model for biomedical text generation and mining. BioGPT follows the Transformer language model backbone, and is pre-trained on 15M PubMed abstracts from scratch.

The abstract from the paper is the following:

Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.
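For readers unfamiliar with the F1 scores quoted in the abstract, the sketch below shows how precision, recall, and F1 could be computed for relation extraction by comparing predicted triples with gold-standard triples. It assumes exact matching of whole triples and uses made-up drug names; the paper's own evaluation protocol may differ in its details.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def precision_recall_f1(
    predicted: Iterable[Triple], gold: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of relation triples."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical triples for illustration only.
gold = [("drugA", "interacts_with", "drugB"), ("drugC", "interacts_with", "drugD")]
predicted = [("drugA", "interacts_with", "drugB"), ("drugA", "interacts_with", "drugD")]
p, r, f1 = precision_recall_f1(predicted, gold)
print(p, r, f1)
```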

Tips:

BioGPT is a model with absolute position embeddings so it’s usually advised to pad the inputs on the right rather than the left.
BioGPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next token in a sequence. Leveraging this feature allows BioGPT to generate syntactically coherent text as it can be observed in the run_generation.py example script.
The model can take the past_key_values (for PyTorch) as input, which is the previously computed key/value attention pairs. Using this (past_key_values or past) value prevents the model from re-computing pre-computed values in the context of text generation. For PyTorch, see past_key_values argument of the BioGptForCausalLM.forward() method for more information on its usage.
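The tips above can be put together in a short, hedged Python sketch of text generation with the Hugging Face Transformers library. It assumes the `transformers` and `torch` packages are installed (the tokenizer may need additional packages) and that the checkpoint is published on the Hub as `microsoft/biogpt`. The helper names `generation_settings` and `generate_text` are hypothetical, and the first call downloads the model weights.

```python
MODEL_NAME = "microsoft/biogpt"  # assumed Hub name of the pre-trained checkpoint


def generation_settings(max_new_tokens: int = 40) -> dict:
    """Keyword arguments passed to generate(); use_cache reuses past key/values."""
    return {"max_new_tokens": max_new_tokens, "do_sample": False, "use_cache": True}


def generate_text(prompt: str, max_new_tokens: int = 40) -> str:
    """Continue a prompt with BioGPT using greedy decoding."""
    # Imported lazily so the helpers above can be used without the heavy packages.
    import torch
    from transformers import BioGptForCausalLM, BioGptTokenizer

    tokenizer = BioGptTokenizer.from_pretrained(MODEL_NAME)
    tokenizer.padding_side = "right"  # absolute position embeddings: pad on the right
    model = BioGptForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, **generation_settings(max_new_tokens))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(generate_text("Bicalutamide is"))
```

With `use_cache` enabled, each decoding step reuses the previously computed key/value attention pairs instead of recomputing them, which is the behavior the last tip describes.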

This model was contributed by kamalkraj. The original code can be found here.

This repository contains the implementation of BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. BioGPT is hosted on GitHub under the Microsoft organization and was developed by Microsoft researchers; the “MIT” associated with the project refers to its MIT license (see below). It is written in Python.

License

BioGPT is MIT-licensed. The license applies to the pre-trained models as well.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

As of right now this does not seem to be fully open access, as a sign-up appears to be required!

We provide our pre-trained BioGPT model checkpoints along with fine-tuned checkpoints for downstream tasks, available both through URL download as well as through the Hugging Face 🤗 Hub.

Model	Description	URL	🤗 Hub
BioGPT	Pre-trained BioGPT model checkpoint	link	link
BioGPT-Large	Pre-trained BioGPT-Large model checkpoint	link	link
BioGPT-QA-PubMedQA-BioGPT	Fine-tuned BioGPT for question answering task on PubMedQA	link
BioGPT-QA-PubMedQA-BioGPT-Large	Fine-tuned BioGPT-Large for question answering task on PubMedQA	link
BioGPT-RE-BC5CDR	Fine-tuned BioGPT for relation extraction task on BC5CDR	link
BioGPT-RE-DDI	Fine-tuned BioGPT for relation extraction task on DDI	link
BioGPT-RE-DTI	Fine-tuned BioGPT for relation extraction task on KD-DTI	link
BioGPT-DC-HoC	Fine-tuned BioGPT for document classification task on HoC	link

As of this date (6/12/2023) there are three papers referencing BioGPT on PubMed (abstracts and links below)

BioGPT: generative pre-trained transformer for biomedical text generation and mining.

Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H, Liu TY. Brief Bioinform. 2022 Nov 19;23(6):bbac409. doi: 10.1093/bib/bbac409. PMID: 36156661

We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. …Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fl …

Here they show the advantages of BioGPT over GPT-2 in an example of text generation

Table 7. Example of generated text

Input: Bicalutamide

GPT-2: Bicalutamide, a synthetic bicalutamide derivative, was found to increase the cellular proliferation of C. elegans cells. This effect was not observed with the other synthetic bicalutamide derivatives. This is in line with our previous work which showed that bicalutamide increases the proliferation of C. elegans cells but not in the presence of synthetic bicalutamide.

BioGPT: Bicalutamide (Casodex) is an androgen receptor (AR) antagonist approved for the treatment of metastatic castration-resistant prostate cancer (mCRPC) in patients who have progressed on or are ineligible for docetaxel chemotherapy, as well as for the treatment of early-stage prostate cancer in men who have not progressed on or are ineligible for docetaxel chemotherapy, as well as for the treatment of metastatic castration-sensitive prostate cancer (mCSPC) in men who have not progressed on or are ineligible for docetaxel chemotherapy.

Input: Janus kinase 3 (JAK-3)

GPT-2: Janus kinase 3 (JAK-3) mediates the conversion of glucose to glycogen in skeletal muscle. The increase in glucose uptake by skeletal muscle is believed to be associated with a reduction in muscle protein breakdown and with the release of ketone bodies, which contribute to the metabolism of glucose.

BioGPT: Janus kinase 3 (JAK-3) is a member of the Janus kinase (JAK) family of non-receptor tyrosine kinases and plays an important role in the regulation of cell proliferation, differentiation, survival, migration and angiogenesis.

Input: Apricitabine

GPT-2: Apricitabine, duloxetine and citalopram) were investigated in patients with a history of treatment-resistant depression (n = 10). We found that treatment with the three drugs significantly decreased depressive symptoms, but not depressive symptoms over the next 12 weeks. We conclude that antidepressant drugs that reduce depressive symptoms and reduce depressive symptoms over time may be beneficial in patients with recurrent depression.

BioGPT: Apricitabine is an oral prodrug of 5-aza-2’-deoxycytidine (5-aza-CdR), a DNA methyltransferase (DNMT) inhibitor, which has been approved by the US Food and Drug Administration (FDA) for the treatment of myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML) in combination with low-dose cytarabine (Ara-C) and granulocyte colony-stimulating factor (G-CSF) for patients with intermediate-2 or high-risk MDS or AML.

Input: Xylazine

GPT-2: Xylazine, the ‘active ingredient’ in ‘bath salts’ (see above) has been linked to numerous deaths, including those of former University of California, Berkeley, professor Andrew Anglin, who posted a video of himself having a bath filled with it.

BioGPT: Xylazine is an alpha 2-adrenoceptor agonist which has been used as a sedative and analgesic in veterinary medicine for many years, but its effects on the cardiovascular system have not been extensively studied in the dog, and its effects on the central nervous system (CNS) have not been well characterized in the dog, despite the fact that xylazine has been widely used as a sedative and analgesic in veterinary medicine for more than 30 years.

Input: Psoralen

GPT-2: Psoralen, the city of the spiritualist, and had been given to him by God, who wished to show him the way of salvation, and to draw him back to the true life of Christ, and to bring him into communion with his fellow-man.

BioGPT: Psoralen photochemotherapy (PUVA) is a well-established treatment for psoriasis, but its use is limited by the risk of skin cancer, particularly squamous cell carcinoma (SCC) of the head and neck (H & N), which is the most common site of skin cancer in the United States (US)

Input: CP-673451

GPT-2: CP-673451) and 4-5-6-3-1-2-3-4-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50…

BioGPT: CP-673451 is a potent, selective, and orally active inhibitor of human neutrophil elastase (HNE) and human cathepsin G (CatG) with in vitro and in vivo anti-inflammatory activity in a variety of animal models of inflammation and in a model of acute lung injury (ALI) in the rat induced by intratracheal instillation of lipopolysaccharide (LPS) and tumor necrosis factor-alpha (TNF-alpha), a model of acute lung injury (ALI) in which neutrophils play an important role.

Input: BIIB-021

GPT-2: BIIB-021A-2-1.pdf. Wright, C. (2002). ‘The War on Terror: A Legal Theory of Terrorist Attacks,’ in L.E.R. Wright, S.R. Warrick and J.H. Rothman (Eds.), The War on Terror: Essays in Honor of Michael T. Klare (New York: Oxford University Press), 143-57.

BioGPT: BIIB-021 is a novel, orally active, non-peptide bradykinin B2 receptor antagonist with potent and long-lasting anti-inflammatory activity in animal models of acute and chronic inflammation and in a rat model of adjuvant-induced arthritis (AIA), an animal model of rheumatoid arthritis (RA) and in a rat model of collagen-induced arthritis (CIA), an animal model of collagen-induced arthritis (CIA), in which arthritis is induced by immunization with bovine type II collagen (CII).

Note how BioGPT's completions are more descriptive and stay on biomedical topics, unlike those of GPT-2!

EGFI: drug-drug interaction extraction and generation with fusion of enriched entity and sentence information.

Huang L, Lin J, Li X, Song L, Zheng Z, Wong KC. Brief Bioinform. 2022 Jan 17;23(1):bbab451. doi: 10.1093/bib/bbab451. PMID: 34791012

The rapid growth in literature accumulates diverse and yet comprehensive biomedical knowledge hidden to be mined such as drug interactions. However, it is difficult to extract the heterogeneous knowledge to retrieve or even discover the latest and novel knowledge in an efficient manner. To address such a problem, we propose EGFI for extracting and consolidating drug interactions from large-scale medical literature text data. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI encompasses the language model BioBERT which has been comprehensively pretrained on biomedical corpus. In particular, we propose the multihead self-attention mechanism and packed BiGRU to fuse multiple semantic information for rigorous context modeling. In the generation part, EGFI utilizes another pretrained language model BioGPT-2 where the generation sentences are selected based on filtering rules.

Results: We evaluated the classification part on ‘DDIs 2013’ dataset and ‘DTIs’ dataset, achieving the F1 scores of 0.842 and 0.720 respectively. Moreover, we applied the classification part to distinguish high-quality generated sentences and verified with the existing ground truth to confirm the filtered sentences. The generated sentences that are not recorded in DrugBank and DDIs 2013 dataset demonstrated the potential of EGFI to identify novel drug relationships.

Availability: Source code are publicly available at https://github.com/Layne-Huang/EGFI.
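As a reminder of what the F1 scores quoted in this abstract measure: F1 is the harmonic mean of precision and recall. The short Python sketch below is only an illustration. The counts are hypothetical and are not taken from the EGFI paper, and the function name is my own.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2PR / (P + R) from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # With no true positives, precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted interactions that are real
    recall = tp / (tp + fn)     # fraction of real interactions that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only for illustration.
example_f1 = f1_score(tp=80, fp=15, fn=15)
print(round(example_f1, 3))
```

A classifier that finds most true interactions but also raises many false alarms will have high recall and low precision, and F1 penalizes that imbalance.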

GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information.

Jin Q, Yang Y, Chen Q, Lu Z. ArXiv. 2023 May 16:arXiv:2304.09667v3. Preprint. PMID: 37131884 Free PMC article.

While large language models (LLMs) have been successfully applied to various tasks, they still face challenges with hallucinations. Augmenting LLMs with domain-specific tools such as database utilities can facilitate easier and more precise access to specialized knowledge. In this paper, we present GeneGPT, a novel method for teaching LLMs to use the Web APIs of the National Center for Biotechnology Information (NCBI) for answering genomics questions. Specifically, we prompt Codex to solve the GeneTuring tests with NCBI Web APIs by in-context learning and an augmented decoding algorithm that can detect and execute API calls. Experimental results show that GeneGPT achieves state-of-the-art performance on eight tasks in the GeneTuring benchmark with an average score of 0.83, largely surpassing retrieval-augmented LLMs such as the new Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as GPT-3 (0.16) and ChatGPT (0.12). Our further analyses suggest that: (1) API demonstrations have good cross-task generalizability and are more useful than documentations for in-context learning; (2) GeneGPT can generalize to longer chains of API calls and answer multi-hop questions in GeneHop, a novel dataset introduced in this work; (3) Different types of errors are enriched in different tasks, providing valuable insights for future improvements.
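To give a feel for the kind of NCBI Web API call that the abstract describes, here is a minimal Python sketch that builds a search URL for the NCBI E-utilities `esearch` endpoint. This is not GeneGPT's code. The helper name, the example query, and the choice of parameters are my own assumptions for illustration, and the sketch only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an E-utilities esearch URL (hypothetical helper, not part of GeneGPT)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption for illustration): look up BRCA1 in the Gene database.
url = esearch_url("gene", "BRCA1[gene] AND human[orgn]")
print(url)
```

A tool-using LLM in the GeneGPT style would emit a call like this, have it executed, and read the returned identifiers back into its context before answering.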

PLEASE WATCH THE FOLLOWING VIDEOS ON BIOGPT

This one entitled

Microsoft’s BioGPT Shows Promise as the Best Biomedical NLP

gives a good general description of this new MIT/Microsoft project and its usefulness in scanning 15 million articles on PubMed while returning ChatGPT-like answers.

Please note one of the comments which is VERY IMPORTANT

@rufus9322

2 months ago

bioGPT is difficult for non-developers to use, and Microsoft researchers seem to default that all users are proficient in Python and ML.

Much like Microsoft Azure, BioGPT seems to be meant for developers with advanced programming skills. It seems odd to be paying programmers high salaries when one or two Key Opinion Leaders from the medical field might suffice, but I am sure Microsoft will figure this out.
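To make the commenter's point concrete, the sketch below shows roughly what is involved in prompting BioGPT from Python. It assumes the Hugging Face `transformers` library (plus its dependencies such as PyTorch and `sacremoses`) is installed and that the `microsoft/biogpt` checkpoint can be downloaded. The function name and default settings are my own, not an official recipe.

```python
def generate_with_biogpt(prompt: str, max_new_tokens: int = 40, seed: int = 42) -> str:
    """Generate a continuation of `prompt` with BioGPT (illustrative sketch only)."""
    # Imported lazily so this file still loads where transformers is not installed.
    from transformers import pipeline, set_seed

    set_seed(seed)  # make sampling repeatable
    generator = pipeline("text-generation", model="microsoft/biogpt")
    result = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return result[0]["generated_text"]


# Usage (downloads the model on first run):
# print(generate_with_biogpt("Psoralen is"))
```

Even this short version presumes a working Python environment and a sizeable model download, which is exactly the barrier for non-developers noted above.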

ALSO VIEW VIDEO

This is a talk from Microsoft on BioGPT

The Human Genome Gets Fully Sequenced: A Simplistic Take on Century Long Effort

Posted in Big Data, Bio Instrumentation in Experimental Life Sciences Research, BioIT: BioInformatics, BioIT: BioInformatics, NGS, Clinical & Translational, Pharmaceutical R&D Informatics, Clinical Genomics, Cancer Informatics, Biological Networks, Gene Regulation and Evolution, Disease Biology, Genome Biology, Next Generation Sequencing (NGS), Nobel Prize Winners, Single-cell sequencing, Variation in human protein-coding regions, tagged AAAS, Craig Venter, DNA Sequencing, genetic variants, Human Genome Project, Jennifer Doudna, junk DNA, telomere consortium on June 14, 2022

Curator: Stephen J. Williams, PhD

Article ID #295: The Human Genome Gets Fully Sequenced: A Simplistic Take on Century Long Effort. Published on 6/14/2022

WordCloud Image Produced by Adam Tubman

Ever since Rosalind Franklin's painstaking work to deduce the structure of DNA, and the concurrent work of Francis Crick and James Watson, who modeled how its basic building blocks fit together, DNA has been regarded as the basic unit of heredity and life, with the “Central Dogma” (DNA to RNA to protein) at its core. These discoveries of the mid-twentieth century helped drive a transformational shift in biological experimentation, from protein isolation and characterization, to cloning protein-encoding genes, to characterizing how genes are expressed temporally, spatially, and contextually.

Rosalind Franklin, whose crystallographic data led to the determination of DNA structure. Shown as a 1953 Time cover, as Time Person of the Year.

Dr Francis Crick and James Watson in front of their model structure of DNA

Up to this point (the 1970s to mid-1980s), genetic information was thought to be rather static. The goal was still to understand and characterize protein structure and function, while knowledge of the underlying genetic information mattered mainly for efforts such as linkage analysis of genetic defects and as a source of tools for the rapidly developing field of molecular biology. But the development of those molecular biology tools, including DNA cloning, sequencing, and synthesis, gave scientists the idea that a complete record of the human genome might be possible and worth the effort.

How the Human Genome Project Expanded our View of Genes, Genetic Material, and Biological Processes

From the Human Genome Project Information Archive

Source: https://web.ornl.gov/sci/techresources/Human_Genome/project/hgp.shtml

History of the Human Genome Project

The Human Genome Project (HGP) refers to the international 13-year effort, formally begun in October 1990 and completed in 2003, to discover all the estimated 20,000-25,000 human genes and make them accessible for further biological study. Another project goal was to determine the complete sequence of the 3 billion DNA subunits (bases in the human genome). As part of the HGP, parallel studies were carried out on selected model organisms such as the bacterium E. coli and the mouse to help develop the technology and interpret human gene function. The DOE Human Genome Program and the NIH National Human Genome Research Institute (NHGRI) together sponsored the U.S. Human Genome Project.

Please see the following for goals, timelines, and funding for this project

History of the Project

HGP Goals and Corresponding Completion Dates
Budget History of the U.S. Human Genome Project
Timeline: Major Events in the Human Genome Project

It is interesting to note that several pieces of government legislation are credited with enabling the funding of such a massive project, including:

Project Enabling Legislation

The Atomic Energy Act of 1946 (P.L. 79-585) provided the initial charter for a comprehensive program of research and development related to the utilization of fissionable and radioactive materials for medical, biological, and health purposes.
The Atomic Energy Act of 1954 (P.L. 83-706) further authorized the AEC “to conduct research on the biologic effects of ionizing radiation.”
The Energy Reorganization Act of 1974 (P.L. 93-438) provided that responsibilities of the Energy Research and Development Administration (ERDA) shall include “engaging in and supporting environmental, biomedical, physical, and safety research related to the development of energy resources and utilization technologies.”
The Federal Non-nuclear Energy Research and Development Act of 1974 (P.L. 93-577) authorized ERDA to conduct a comprehensive non-nuclear energy research, development, and demonstration program to include the environmental and social consequences of the various technologies.
The DOE Organization Act of 1977 (P.L. 95-91) mandated the Department “to assure incorporation of national environmental protection goals in the formulation and implementation of energy programs; and to advance the goal of restoring, protecting, and enhancing environmental quality, and assuring public health and safety,” and to conduct “a comprehensive program of research and development on the environmental effects of energy technology and program.”

It should also be emphasized that the project was funded not JUST through the NIH but also through the Department of Energy.

Project Sponsors

The U.S. Department of Energy funded its Human Genome Program through their Office of Biological and Environmental Research. (genome@science.doe.gov).
The U.S. National Institutes of Health funded its program through the National Human Genome Research Institute (NHGRI).

For a great read on Dr. Craig Venter, including interviews with the scientist, see Dr. Larry Bernstein’s excellent post The Human Genome Project.

By 2003 we had gained much information about the structure of DNA, genes, exons, and introns, which gave us more insight into the diversity of genetic material, the underlying protein-coding genes, and many of the regulatory elements of gene expression. However, much uninvestigated material lay dispersed between genes, the so-called “junk DNA”, and as of 2003 little was known about its function. In addition, there were two other problems:

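The reference DNA came from only a small number of individuals (the public reference is a mosaic assembled from several donors, while the competing Celera effort, led by Craig Venter, relied largely on Venter's own DNA)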
Multiple gaps in the DNA sequence existed, and needed to be filled in

It is important to note that transcriptomic and proteomic studies have revealed a tremendous diversity of proteins. Although only about 20,000 to 25,000 coding genes exist, the human proteome contains about 600,000 proteoforms (due to alternative splicing, posttranslational modifications, etc.).

This expansion of proteoforms, via alternative splicing into isoforms and gene duplication into paralogs, has been shown to have major effects on, for example, cellular signaling pathways (1).
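A quick back-of-the-envelope calculation shows what these figures imply per gene. The Python sketch below simply divides the proteoform estimate quoted above by the low and high gene counts; the function name is my own, and the result is an average that says nothing about how proteoforms are distributed across individual genes.

```python
PROTEOFORMS = 600_000             # estimate quoted in the text above
GENE_COUNTS = (20_000, 25_000)    # low and high estimates of coding genes


def proteoforms_per_gene(proteoforms: int, genes: int) -> float:
    """Average number of proteoforms per coding gene."""
    return proteoforms / genes


averages = {genes: proteoforms_per_gene(PROTEOFORMS, genes) for genes in GENE_COUNTS}
for genes, avg in averages.items():
    print(f"{genes:,} genes -> about {avg:.0f} proteoforms per gene on average")
```

In other words, on these estimates each coding gene gives rise to roughly 24 to 30 distinct protein forms on average.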

More recently, however, it has been reported that the FULL human genome has been sequenced, completed, and verified. This was the focus of a recent issue of the journal Science.

Source: https://www.science.org/doi/10.1126/science.abj6987

Abstract

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

The current human reference genome was released by the Genome Reference Consortium (GRC) in 2013 and most recently patched in 2019 (GRCh38.p13) (1). This reference traces its origin to the publicly funded Human Genome Project (2) and has been continually improved over the past two decades. Unlike the competing Celera effort (3) and most modern sequencing projects based on “shotgun” sequence assembly (4), the GRC assembly was constructed from sequenced bacterial artificial chromosomes (BACs) that were ordered and oriented along the human genome by means of radiation hybrid, genetic linkage, and fingerprint maps. However, limitations of BAC cloning led to an underrepresentation of repetitive sequences, and the opportunistic assembly of BACs derived from multiple individuals resulted in a mosaic of haplotypes. As a result, several GRC assembly gaps are unsolvable because of incompatible structural polymorphisms on their flanks, and many other repetitive and polymorphic regions were left unfinished or incorrectly assembled (5).

Fig. 1. Summary of the complete T2T-CHM13 human genome assembly.
(A) Ideogram of T2T-CHM13v1.1 assembly features. For each chromosome (chr), the following information is provided from bottom to top: gaps and issues in GRCh38 fixed by CHM13 overlaid with the density of genes exclusive to CHM13 in red; segmental duplications (SDs) (42) and centromeric satellites (CenSat) (30); and CHM13 ancestry predictions (EUR, European; SAS, South Asian; EAS, East Asian; AMR, ad-mixed American). Bottom scale is measured in Mbp. (B and C) Additional (nonsyntenic) bases in the CHM13 assembly relative to GRCh38 per chromosome, with the acrocentrics highlighted in black (B) and by sequence type (C). (Note that the CenSat and SD annotations overlap.) RepMask, RepeatMasker. (D) Total nongap bases in UCSC reference genome releases dating back to September 2000 (hg4) and ending with T2T-CHM13 in 2021. Mt/Y/Ns, mitochondria, chrY, and gaps.

Note in Figure 1D the growth in total nongap bases across reference genome releases from 2000 to 2021.

Also very important is the ability to determine all the paralogs, isoforms, areas of potential epigenetic regulation, gene duplications, and transposable elements that exist within the human genome.
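To put the numbers from the abstract in proportion, the small Python sketch below relates the "nearly 200 million base pairs" of newly introduced sequence and the "remaining 8%" to the 3.055 billion base pair assembly. It is plain arithmetic on the quoted figures, with 200 million used as a rounded stand-in for "nearly 200 million"; the variable names are my own.

```python
GENOME_BP = 3.055e9         # size of T2T-CHM13 quoted in the abstract
NEW_BP = 200e6              # "nearly 200 million base pairs" (rounded, an approximation)
REMAINING_FRACTION = 0.08   # "the remaining 8% of the genome"

# Share of the complete assembly made up of newly introduced sequence.
new_fraction = NEW_BP / GENOME_BP

# Number of bases that 8% of the assembly corresponds to.
remaining_bp = REMAINING_FRACTION * GENOME_BP

print(f"New sequence: {new_fraction:.1%} of the assembly")
print(f"8% of the assembly: about {remaining_bp / 1e6:.0f} million bases")
```

The newly added sequence alone is therefore about 6.5% of the assembly, somewhat less than the 8% the consortium set out to address.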

Analyses and resources

A number of companion studies were carried out to characterize the complete sequence of a human genome, including comprehensive analyses of centromeric satellites (30), segmental duplications (42), transcriptional (49) and epigenetic profiles (29), mobile elements (49), and variant calls (25). Up to 99% of the complete CHM13 genome can be confidently mapped with long-read sequencing, opening these regions of the genome to functional and variational analysis (23) (fig. S38 and table S14). We have produced a rich collection of annotations and omics datasets for CHM13—including RNA sequencing (RNA-seq) (30), Iso-seq (21), precision run-on sequencing (PRO-seq) (49), cleavage under targets and release using nuclease (CUT&RUN) (30), and ONT methylation (29) experiments—and have made these datasets available via a centralized University of California, Santa Cruz (UCSC), Assembly Hub genome browser (54).

To highlight the utility of these genetic and epigenetic resources mapped to a complete human genome, we provide the example of a segmentally duplicated region of the chromosome 4q subtelomere that is associated with facioscapulohumeral muscular dystrophy (FSHD) (55). This region includes FSHD region gene 1 (FRG1), FSHD region gene 2 (FRG2), and an intervening D4Z4 macrosatellite repeat containing the double homeobox 4 (DUX4) gene that has been implicated in the etiology of FSHD (56). Numerous duplications of this region throughout the genome have complicated past genetic analyses of FSHD.

The T2T-CHM13 assembly reveals 23 paralogs of FRG1 spread across all acrocentric chromosomes as well as chromosomes 9 and 20 (Fig. 5A). This gene appears to have undergone recent amplification in the great apes (57), and approximate locations of FRG1 paralogs were previously identified by FISH (58). However, only nine FRG1 paralogs are found in GRCh38, hampering sequence-based analysis.

Future of the human reference genome

The T2T-CHM13 assembly adds five full chromosome arms and more additional sequence than any genome reference release in the past 20 years (Fig. 1D). This 8% of the genome has not been overlooked because of a lack of importance but rather because of technological limitations. High-accuracy long-read sequencing has finally removed this technological barrier, enabling comprehensive studies of genomic variation across the entire human genome, which we expect to drive future discovery in human genomic health and disease. Such studies will necessarily require a complete and accurate human reference genome.

CHM13 lacks a Y chromosome, and homozygous Y-bearing CHMs are nonviable, so a different sample type will be required to complete this last remaining chromosome. However, given its haploid nature, it should be possible to assemble the Y chromosome from a male sample using the same methods described here and supplement the T2T-CHM13 reference assembly with a Y chromosome as needed.

Extending beyond the human reference genome, large-scale resequencing projects have revealed genomic variation across human populations. Our reanalyses of the 1KGP (25) and SGDP (42) datasets have already shown the advantages of T2T-CHM13, even for short-read analyses. However, these studies give only a glimpse of the extensive structural variation that lies within the most repetitive regions of the genome assembled here. Long-read resequencing studies are now needed to comprehensively survey polymorphic variation and reveal any phenotypic associations within these regions.

Although CHM13 represents a complete human haplotype, it does not capture the full diversity of human genetic variation. To address this bias, the Human Pangenome Reference Consortium (59) has joined with the T2T Consortium to build a collection of high-quality reference haplotypes from a diverse set of samples. Ideally, all genomes could be assembled at the quality achieved here, but automated T2T assembly of diploid genomes presents a difficult challenge that will require continued development. Until this goal is realized, and any human genome can be completely sequenced without error, the T2T-CHM13 assembly represents a more complete, representative, and accurate reference than GRCh38.

This paper was the focus of a Time article and the basis for naming its lead authors to the 2022 Time 100 list of most influential people.

From TIME

The Human Genome Is Finally Fully Sequenced

Source: https://time.com/6163452/human-genome-fully-sequenced/

The first human genome was mapped in 2001 as part of the Human Genome Project, but researchers knew it was neither complete nor completely accurate. Now, scientists have produced the most completely sequenced human genome to date, filling in gaps and correcting mistakes in the previous version.

The sequence is the most complete reference genome for any mammal so far. The findings from six new papers describing the genome, which were published in Science, should lead to a deeper understanding of human evolution and potentially reveal new targets for addressing a host of diseases.

A more precise human genome

“The Human Genome Project relied on DNA obtained through blood draws; that was the technology at the time,” says Adam Phillippy, head of genome informatics at the National Institutes of Health’s National Human Genome Research Institute (NHGRI) and senior author of one of the new papers. “The techniques at the time introduced errors and gaps that have persisted all of these years. It’s nice now to fill in those gaps and correct those mistakes.”

“We always knew there were parts missing, but I don’t think any of us appreciated how extensive they were, or how interesting,” says Michael Schatz, professor of computer science and biology at Johns Hopkins University and another senior author of the same paper.

The work is the result of the Telomere to Telomere consortium, which is supported by NHGRI and involves genetic and computational biology experts from dozens of institutes around the world. The group focused on filling in the 8% of the human genome that remained a genetic black hole from the first draft sequence. Since then, geneticists have been trying to add those missing portions bit by bit. The latest group of studies identifies about an entire chromosome’s worth of new sequences, representing 200 million more base pairs (the letters making up the genome) and 1,956 new genes.

NOTE: In 2001 many scientists postulated that there were as many as 100,000 human coding genes; however, we now understand that there are about 20,000 to 25,000. This does not, however, take into account the additional diversity that arises from alternative splicing, gene duplications, SNPs, and chromosomal rearrangements.

Scientists were also able to sequence the long stretches of DNA that contained repeated sequences, which genetic experts originally thought were similar to copying errors and dismissed as so-called “junk DNA”. These repeated sequences, however, may play roles in certain human diseases. “Just because a sequence is repetitive doesn’t mean it’s junk,” says Eichler. He points out that critical genes are embedded in these repeated regions—genes that contribute to machinery that creates proteins, genes that dictate how cells divide and split their DNA evenly into their two daughter cells, and human-specific genes that might distinguish the human species from our closest evolutionary relatives, the primates. In one of the papers, for example, researchers found that primates have different numbers of copies of these repeated regions than humans, and that they appear in different parts of the genome.

“These are some of the most important functions that are essential to live, and for making us human,” says Eichler. “Clearly, if you get rid of these genes, you don’t live. That’s not junk to me.”

Deciphering what these repeated sections mean, if anything, and how the sequences of previously unsequenced regions like the centromeres will translate to new therapies or better understanding of human disease, is just starting, says Deanna Church, a vice president at Inscripta, a genome engineering company who wrote a commentary accompanying the scientific articles. Having the full sequence of a human genome is different from decoding it; she notes that currently, of people with suspected genetic disorders whose genomes are sequenced, about half can be traced to specific changes in their DNA. That means much of what the human genome does still remains a mystery.

The investigators who led the Telomere-to-Telomere Consortium were named to the Time 100 list of the most influential people of 2022.

Michael Schatz, Karen Miga, Evan Eichler, and Adam Phillippy

Illustration by Brian Lutz for Time (Source Photos: Will Kirk—Johns Hopkins University; Nick Gonzales—UC Santa Cruz; Patrick Kehoe; National Human Genome Research Institute)

BY JENNIFER DOUDNA

MAY 23, 2022 6:08 AM EDT

Ever since the draft of the human genome became available in 2001, there has been a nagging question about the genome’s “dark matter”—the parts of the map that were missed the first time through, and what they contained. Now, thanks to Adam Phillippy, Karen Miga, Evan Eichler, Michael Schatz, and the entire Telomere-to-Telomere Consortium (T2T) of scientists that they led, we can see the full map of the human genomic landscape—and there’s much to explore.

In the scientific community, there wasn’t a consensus that mapping these missing parts was necessary. Some in the field felt there was already plenty to do using the data in hand. In addition, overcoming the technical challenges to getting the missing information wasn’t possible until recently. But the more we learn about the genome, the more we understand that every piece of the puzzle is meaningful.

I admire the T2T group’s willingness to grapple with the technical demands of this project and their persistence in expanding the genome map into uncharted territory. The complete human genome sequence is an invaluable resource that may provide new insights into the origin of diseases and how we can treat them. It also offers the most complete look yet at the genetic script underlying the very nature of who we are as human beings.

Doudna is a biochemist and winner of the 2020 Nobel Prize in Chemistry

Source: https://time.com/collection/100-most-influential-people-2022/6177818/evan-eichler-karen-miga-adam-phillippy-michael-schatz/

International Award for Human Genome Project

Cracking the Genome – Inside the Race to Unlock Human DNA – quotes in newspapers

The Human Genome Project

Junk DNA and Breast Cancer

A Perspective on Personalized Medicine

Additional References

P. Scalia, A. Giordano, C. Martini, S. J. Williams, Isoform- and Paralog-Switching in IR-Signaling: When Diabetes Opens the Gates to Cancer. Biomolecules 10, (Nov 30, 2020).

#TUBiol5227: Biomarkers & Biotargets: Genetic Testing and Bioethics

Posted in and Bioethics, Bio-Ethics, BioBanking, BioIT: BioInformatics, BioIT: BioInformatics, NGS, Clinical & Translational, Pharmaceutical R&D Informatics, Clinical Genomics, Cancer Informatics, Biological Networks, Biomarkers & Medical Diagnostics, BioSimilars, Biotechnology, BioTechnology - Venture Creation, Cancer and Current Therapeutics, Cancer Genomics, Clinical Diagnostics, Clinical Genomics, Conference Coverage with Social Media, Ethics and Leadership, FDA, FDA Regulatory Affairs, Federal Budget Appropriations, Health Economics and Outcomes Research, Health Law Policy, Healthcare costs and reimbursement, Intellectual Property, Innovations, Commercialization, Investment in technological breakthrough, interventional oncology, Next Generation Sequencing (NGS), Patent Law in Biotech, Patents, Patient Experience, Personal Health Applications: Tech Innovations serves HealhCare, Personalized and Precision Medicine & Genomic Research, Personalized Medicine Coalition, Precision Cancer Medicine, Regulated Clinical Trials: Design, Methods, Components and IRB related issues, Voices of Patients and Healthcare Providers, tagged #TUBiol5227, bioethics, biomarkers, Drug development, FDA, healthcare, insurers, legal, Patient, payers, physician-patient, Regulatory on September 28, 2021

Curator: Stephen J. Williams, Ph.D.

The advent of direct-to-consumer (DTC) genetic testing, and the rapid rise in both its popularity and the number of companies offering it, has created urgent and unique bioethical challenges in this niche of the marketplace. At first, most DTC companies, such as 23andMe and Ancestry.com, offered non-clinical or non-FDA-approved genetic testing as a way for consumers to draw casual inferences from their DNA sequence and the presence of known genes linked to disease risk, or to get a glimpse of their familial background. However, many issues arose, including legal, privacy, medical, and bioethical issues. Below are some articles that explain and discuss many of these problems with the DTC genetic testing market, as well as some alternatives that may exist.

‘Direct-to-Consumer (DTC) Genetic Testing Market to hit USD 2.5 Bn by 2024’ by Global Market Insights

This post links to the market analysis of the DTC market (https://www.gminsights.com/pressrelease/direct-to-consumer-dtc-genetic-testing-market). Below are the highlights of the report.

As you can see, this market segment appears to want to expand into the nutritional consulting business as well as into targeted biomarkers for specific diseases.

Rising incidence of genetic disorders across the globe will augment the market growth

Increasing prevalence of genetic disorders will propel the demand for direct-to-consumer genetic testing and will augment industry growth over the projected timeline. Increasing cases of genetic diseases such as breast cancer, achondroplasia, colorectal cancer and other diseases have elevated the need for cost-effective and efficient genetic testing avenues in the healthcare market.

For instance, according to the World Cancer Research Fund (WCRF), in 2018, over 2 million new cases of cancer were diagnosed across the globe. Also, breast cancer is stated as the second most commonly occurring cancer. Availability of superior quality and advanced direct-to-consumer genetic testing has drastically reduced the mortality rates in people suffering from cancer by providing vigilant surveillance data even before the onset of the disease. Hence, the aforementioned factors will propel the direct-to-consumer genetic testing market over the forecast timeline.

DTC Genetic Testing Market By Technology

Nutrigenomic Testing will provide robust market growth

The nutrigenomic testing segment was valued at over USD 220 million in 2019, and its market will witness tremendous growth over 2020-2028. The growth of the market segment is attributed to increasing research activities related to nutritional aspects. Moreover, obesity is another major factor that will boost the demand for direct-to-consumer genetic testing market.

Nutrigenomics testing enables professionals to recommend nutritional guidance and personalized diet to obese people and help them to keep their weight under control while maintaining a healthy lifestyle. Hence, above mentioned factors are anticipated to augment the demand and adoption rate of direct-to-consumer genetic testing through 2028.

Browse key industry insights spread across 161 pages with 126 market data tables & 10 figures & charts from the report, “Direct-To-Consumer Genetic Testing Market Size By Test Type (Carrier Testing, Predictive Testing, Ancestry & Relationship Testing, Nutrigenomics Testing), By Distribution Channel (Online Platforms, Over-the-Counter), By Technology (Targeted Analysis, Single Nucleotide Polymorphism (SNP) Chips, Whole Genome Sequencing (WGS)), Industry Analysis Report, Regional Outlook, Application Potential, Price Trends, Competitive Market Share & Forecast, 2020 – 2028” in detail along with the table of contents:
https://www.gminsights.com/industry-analysis/direct-to-consumer-dtc-genetic-testing-market

Targeted analysis techniques will drive the market growth over the foreseeable future

Based on technology, the DTC genetic testing market is segmented into whole genome sequencing (WGS), targeted analysis, and single nucleotide polymorphism (SNP) chips. The targeted analysis market segment is projected to witness around 12% CAGR over the forecast period. The segmental growth is attributed to the recent advancements in genetic testing methods that have revolutionized the detection and characterization of genetic codes.

Targeted analysis is mainly utilized to determine any defects in genes that are responsible for a disorder or a disease. Also, growing demand for personalized medicine amongst the population suffering from genetic diseases will boost the demand for targeted analysis technology. As the technology is relatively cheaper, it is a highly preferred method used in direct-to-consumer genetic testing procedures. These advantages of targeted analysis are expected to enhance the market growth over the foreseeable future.

Over-the-counter segment will experience a notable growth over the forecast period

The over-the-counter distribution channel is projected to witness around 11% CAGR through 2028. The segmental growth is attributed to the ease of purchasing a test kit for consumers living in rural areas of developing countries. Consumers prefer the over-the-counter channel because the kits sold through it are directly examined by regulatory agencies, which makes them safer to use, thereby driving market growth over the forecast timeline.

Favorable regulations provide lucrative growth opportunities for direct-to-consumer genetic testing

The European direct-to-consumer genetic testing market held around 26% share in 2019 and was valued at around USD 290 million. The regional growth is due to elevated government spending on healthcare to provide easy access to genetic testing avenues. Furthermore, European regulatory bodies are working on improving the regulations that govern direct-to-consumer genetic testing methods. Hence, the above-mentioned factors will play a significant role in the market growth.
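As a rough check on the figures above, a regional value and share imply a total market size, and a compound annual growth rate (CAGR) can be compounded forward. The sketch below is illustrative arithmetic only: the function names are hypothetical, and the USD 100 million base used for the projection is an assumption made for illustration, not a figure from the report.

```python
def implied_total(region_value: float, region_share: float) -> float:
    """Total market size implied by one region's value and its share."""
    return region_value / region_share


def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return value * (1.0 + cagr) ** years


# Europe: about USD 290 million at about 26% share in 2019 (from the text).
global_2019 = implied_total(290.0, 0.26)
print(f"Implied global market, 2019: USD {global_2019:,.0f} million")

# Hypothetical base of USD 100 million growing at 12% a year for 8 years.
# The base value is an assumption; only the 12% rate comes from the text.
print(f"Illustrative projection: USD {project(100.0, 0.12, 8):,.1f} million")
```

Because the share and value are both quoted as "around", the implied total is an approximation rather than a reported figure.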

Focus of market players on introducing innovative direct-to-consumer genetic testing devices will offer several growth opportunities

Eminent players operating in the direct-to-consumer genetic testing market include Ancestry, Color Genomics, Living DNA, Mapmygenome, Easy DNA, FamilytreeDNA (Gene By Gene), Full Genome Corporation, Helix OpCo LLC, Identigene, Karmagenes, MyHeritage, Pathway Genomics, Genesis Healthcare, and 23andMe. These market players have undertaken various business strategies to enhance their financial stability and help them evolve as leading companies in the direct-to-consumer genetic testing industry.

For example, in November 2018, Helix launched a new genetic testing product, the DNA discovery kit, which allows customers to delve into their ancestry. This development expanded the firm’s product portfolio, thereby propelling industry growth in the market.

The following posts discuss bioethical issues related to genetic testing and personalized medicine from a clinician’s and scientist’s perspective:

Question: Each of these articles discusses certain bioethical issues, although each focuses on personalized medicine and treatment. Given your understanding of the robust process involved in validating clinical biomarkers and the current state of the DTC market, how could DTC testing results misinform patients and create mistrust in the physician-patient relationship?

Personalized Medicine, Omics, and Health Disparities in Cancer: Can Personalized Medicine Help Reduce the Disparity Problem?

Diversity and Health Disparity Issues Need to be Addressed for GWAS and Precision Medicine Studies

Genomics & Ethics: DNA Fragments are Products of Nature or Patentable Genes?

The following posts discuss the bioethical concerns of genetic testing from a patient’s perspective:

Ethics Behind Genetic Testing in Breast Cancer: A Webinar by Laura Carfang of survivingbreastcancer.org

Ethical Concerns in Personalized Medicine: BRCA1/2 Testing in Minors and Communication of Breast Cancer Risk

23andMe Product can be obtained for Free from a new app called Genes for Good: UMich’s Facebook-based Genomics Project

Question: If you are developing a targeted treatment with a companion diagnostic, what bioethical concerns would you address during the drug development process to ensure fair, equitable and ethical treatment of all patients, in trials as well as post market?

Articles on Genetic Testing, Companion Diagnostics and Regulatory Mechanisms

Centers for Medicare & Medicaid Services announced that the federal healthcare program will cover the costs of cancer gene tests that have been approved by the Food and Drug Administration

Real Time Coverage @BIOConvention #BIO2019: Genome Editing and Regulatory Harmonization: Progress and Challenges

New York Times vs. Personalized Medicine? PMC President: Times’ Critique of Streamlined Regulatory Approval for Personalized Treatments ‘Ignores Promising Implications’ of Field

Live Conference Coverage @Medcitynews Converge 2018 Philadelphia: Early Diagnosis Through Predictive Biomarkers, NonInvasive Testing

Protecting Your Biotech IP and Market Strategy: Notes from Life Sciences Collaborative 2015 Meeting

Question: What type of regulatory concerns should one have during the drug development process in regards to use of biomarker testing? From the last article on Protecting Your IP how important is it, as a drug developer, to involve all payers during the drug development process?

Thriving Vaccines and Research: Weizmann Institute Coronavirus Research Development

Posted in Academic Publishing, Advanced Drug Manufacturing Technology, Biomarkers & Medical Diagnostics, Clinical Diagnostics, coronavirus, Coronavirus Gene Expression, COVID-19, COVID-19, Diagnostics and Lab Tests, Disease Biology, Disease Biology, Small Molecules in Development of Therapeutic Drugs, Drug Development Process, Health in Israel, Human Immune System in Health and in Disease, Infectious Disease Immunodiagnostics, Mechanisms of infection by SARS-CoV-2, Mutant Gene Expression, New Drug Approval, Next Generation Sequencing (NGS), number of asymptomatic infections, Patient-centered Medicine, Proteosome, SAR-Cov-2 a vasculotropic (blood vessels) RNA Virus, SARS-CoV-2, SARS-CoV-2 circulating variants, SARS-CoV-2 Viral Variants, SARS-COV2 Hijacking the Complement and Coagulation Systems, Small Molecules in Development of Therapeutic Drugs, T-cell response to SARS-CoV-2 infection, Translational Research, Treatment Protocols for COVID-19, Vaccinology, Viral diseases, Virology, virology, Virology, Virtual Drug Molecule Screening targeting SAR-CoV-2 proteins, tagged antiviral vaccine, Coronavirus Vaccines, COVID-19 SARS-CoV-2, Drug development, HLA, mapping the genome, mRNA vaccines, next generation sequencing, Nsp1, Pandemic, Protein profiling, RT-PCR, Untranslated regions, viral protein, Virus replication, Weizmann Coronavirus Research on May 4, 2021

Reporter: Amandeep Kaur, B.Sc., M.Sc.

In early February, Prof. Eran Segal wrote in one of his tweets, “We say with caution, the magic has started.”

The article reported that Prof. Segal’s statement reflected the decline in COVID-19 cases, severe infections, and hospitalizations brought about by the rapid vaccination campaign throughout Israel. In another tweet, Prof. Segal urged the country to remain cautious, noting that there is still a long way to go and that the search for scientific solutions continues.

COVID-19 Vaccine
Image source: https://www.jhsph.edu/covid-19/articles/a-top-vaccine-expert-answers-important-questions-about-a-covid-19-vaccine.html

A daylong webinar entitled “COVID-19: The epidemic that rattles the world” was an initiative by the Weizmann Institute to share scientific knowledge about the infection among Israeli institutions and scientists. Prof. Gideon Schreiber and Dr. Ron Diskin organized the event with the support of the Weizmann Coronavirus Response Fund and the Israel Society for Biochemistry and Molecular Biology. Speakers were invited from the Hebrew University of Jerusalem, Tel-Aviv University, the Israel Institute for Biological Research (IIBR), and Kaplan Medical Center. They addressed the molecular structure and infection biology of the virus, treatments and medications for COVID-19, and the positive and negative effects of the pandemic.

The article reported that with the emergence of the pandemic, scientists at Weizmann started more than 60 projects to explore the virus from a range of perspectives. Funds raised by communities worldwide for the Weizmann Coronavirus Response Fund supported scientists and investigators in elucidating the chemistry, physics and biology behind SARS-CoV-2 infection.

Prof. Avi Levy, the coordinator of the Weizmann Institute’s coronavirus research efforts, mentioned “The vaccines are here, and they will drastically reduce infection rates. But the coronavirus can mutate, and there are many similar infectious diseases out there to be dealt with. All of this research is critical to understanding all sorts of viruses and to preempting any future pandemics.”

The following are a few important projects, with recent updates reported in the article.

Mapping a hijacker’s methods

Dr. Noam Stern-Ginossar studied how the virus invades healthy cells and hijacks the cell’s systems to divide and reproduce. The article reported that viruses take over the genetic translation system, mainly the ribosomes, to produce viral proteins. Dr. Stern-Ginossar used an approach known as ‘ribosome profiling’ to create a map of the translational events taking place across the viral genome, which in turn maps the full repertoire of viral proteins produced inside the host.

She and her team joined forces with Weizmann’s de Botton Institute for Protein Profiling and researchers at IIBR to understand the coronavirus’s hijacking instructions and to develop tools for treatment and therapies. The scientists generated a high-resolution map of the coding regions in the SARS-CoV-2 genome using ribosome-profiling techniques, which allowed them to quantify the expression of vital zones along the virus genome that regulate the translation of viral proteins. The study, published in Nature in January, explains the hijacking process and reports that the virus produces more instructions, in the form of viral mRNA, than the host and thus dominates the host cell’s translation process. The researchers also clarified that it is a misconception that the virus forces the host cell to translate viral mRNA more efficiently than the host’s own mRNA; rather, it is the high level of viral translation instructions that causes the hijacking. This study provides valuable insights for the development of effective vaccines and drugs against COVID-19.
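The distinction drawn above, between translating viral mRNA more efficiently and simply having more of it, can be made concrete with a quantity commonly derived from ribosome profiling: translation efficiency, the ratio of ribosome footprint reads to mRNA reads for a transcript. The sketch below uses invented read counts (not data from the study) and hypothetical names throughout; it only shows how viral transcripts could take a large share of translation while their efficiency stays no higher than that of host transcripts.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    origin: str          # "host" or "virus"
    mrna_reads: float    # RNA-seq reads (abundance)
    footprints: float    # ribosome-protected fragment reads


def translation_efficiency(t: Transcript) -> float:
    """Ribosome footprints per mRNA read."""
    return t.footprints / t.mrna_reads


def share_of_translation(transcripts, origin: str) -> float:
    """Fraction of all ribosome footprints that fall on one origin."""
    total = sum(t.footprints for t in transcripts)
    return sum(t.footprints for t in transcripts if t.origin == origin) / total


# Hypothetical counts, chosen only to illustrate the argument.
sample = [
    Transcript("host_gene_A", "host", mrna_reads=1_000, footprints=500),
    Transcript("host_gene_B", "host", mrna_reads=2_000, footprints=1_000),
    Transcript("viral_N", "virus", mrna_reads=9_000, footprints=4_500),
]

for t in sample:
    print(f"{t.name}: TE = {translation_efficiency(t):.2f}")
print(f"viral share of footprints: {share_of_translation(sample, 'virus'):.0%}")
```

Real analyses normalize read counts for transcript length and sequencing depth before taking the ratio; that step is left out here to keep the example short.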

Like chutzpah, some things don’t translate

Prof. Igor Ulitsky and his team worked on the untranslated regions of the viral genome. The article reported that not all parts of the viral transcript are translated into protein; some instead play an important but still unknown role in protein production and infection. These regions may affect the molecular environment of the translated zones. The Ulitsky group set out to characterize how the genetic sequence of regions that are not translated into protein directly or indirectly affects the stability and efficiency of the translated sequences.

Initially, the scientists created a library of about 6,000 untranslated sequence regions to study their functions. In collaboration with Dr. Noam Stern-Ginossar’s lab, Ulitsky’s team focused on the Nsp1 protein and on how such regions affect Nsp1 production, which in turn enhances virulence. After solving some technical difficulties, the researchers developed a new, more authentic protocol that includes infecting cells with variants from the initial library. Within a few months, the researchers expect to obtain a more detailed map of how specific sequences of the untranslated regions affect the stability of Nsp1 protein production.

The landscape of elimination

The article reported that the body’s immune system relies on two main components for identifying and fighting infections: HLA (human leukocyte antigen) molecules and T cells. HLA molecules are proteins on the cell surface that bring peptide fragments from inside the infected cell to the surface. T cells of the immune system recognize these peptide fragments and destroy the infected cell. Samuels’ group asked how the body’s surveillance system recognizes the appropriate virus-derived peptide and acts on it. They isolated and analyzed the ‘HLA peptidome’, the complete set of peptides bound to HLA proteins, from SARS-CoV-2 infected cells.

After the analysis of infected cells, they found 26 class-I and 36 class-II HLA peptides, which are present in 99% of the population around the world. Two peptides from HLA class-I were commonly present on the cell surface, and two other peptides were derived from rare coronavirus proteins, which means that these specific coronavirus peptides were marked for easy detection. Among the identified peptides, two were novel discoveries and seven others had previously been shown to induce an immune response. These results will help in developing new vaccines against new coronavirus variants.

Gearing up ‘chain terminators’ to battle the coronavirus

Prof. Rotem Sorek and his lab discovered a family of enzymes within bacteria that produce novel antiviral molecules. These small molecules manufactured by bacteria act as ‘chain terminators’ to fight viruses invading the bacteria. The study, published in Nature in January, reported that these molecules cause a chemical reaction that halts the virus’s ability to replicate. The new molecules are modified derivatives of nucleotides that integrate into the replicating virus at the molecular level and obstruct the works.

Prof. Sorek and his group hypothesize that these new molecules could serve as potential antiviral drugs, since chain termination is a mechanism already used by antiviral drugs in recent clinical treatment. Yeda Research and Development has licensed these novel small molecules to a company that will test their antiviral activity against SARS-CoV-2. Such discoveries provide evidence that the bacterial immune system is a potential repository of many natural antiviral molecules.

Resolving borderline diagnoses

Currently, real-time reverse transcription polymerase chain reaction (RT-PCR) is the primary method, used extensively for diagnosing COVID-19 patients around the globe. Besides its benefits, RT-PCR has problems: false negative and false positive results, and a limited ability to detect new mutations in the virus and emerging variants in the population worldwide. Prof. Eran Elinav’s lab and Prof. Ido Amit’s lab are working collaboratively to develop a massively parallel, next-generation sequencing technique that tests more effectively and precisely than RT-PCR. This technique can characterize emerging mutations in SARS-CoV-2, co-occurring viral, bacterial and fungal infections, and response patterns in the human host.

The scientists identified viral variants and distinctive host signatures that help to differentiate infected from non-infected individuals, and patients with mild symptoms from those with severe symptoms.

At Hadassah-Hebrew University Medical Center, Profs. Elinav and Amit are performing trials of the pipeline to test its accuracy in borderline cases, where RT-PCR gives ambiguous or incorrect results. For proper diagnosis and patient stratification, the researchers calibrated their severity-prediction matrix. Collectively, the scientists are working to develop a reliable system that resolves borderline RT-PCR cases, identifies new virus variants with known and new mutations, and uses data from the human host to distinguish patients who need close observation and extensive treatment from those who have mild complications and can be managed conservatively.

Moon shot consortium refining drug options

The ‘Moon shot’ consortium was launched almost a year ago to develop a novel antiviral drug against SARS-CoV-2. It is led by Dr. Nir London of the Department of Chemical and Structural Biology at Weizmann, Prof. Frank von Delft of Oxford University and the UK’s Diamond Light Source synchrotron facility.

To advance the series of novel molecules from conception to evidence of antiviral activity, the scientists gathered support, guidance, expertise and resources from researchers around the world within a year. The article reported that the researchers have built an alternative template for drug discovery: a fully transparent process that avoids the hindrance of intellectual property and red tape.

The new molecules discovered by the scientists inhibit a protease, a SARS-CoV-2 protein that plays an important role in virus replication. The team collaborated with the Israel Institute for Biological Research and several other labs across the globe to demonstrate the efficacy of the molecules not only in vitro but also against live virus.

Further research is under way, including assays of the safety and efficacy of these potential drugs in living models. The first trial in mice started in March. In addition, further compounds are being optimized and nominated for preclinical testing as candidate drugs.

Source: https://www.weizmann.ac.il/WeizmannCompass/sections/features/the-vaccines-are-here-and-research-abounds

Other related articles were published in this Open Access Online Scientific Journal, including the following:

Identification of Novel genes in human that fight COVID-19 infection

Reporter: Amandeep Kaur, B.Sc., M.Sc. (ept. 5/2021)

https://pharmaceuticalintelligence.com/2021/04/19/identification-of-novel-genes-in-human-that-fight-covid-19-infection/

Fighting Chaos with Care, community trust, engagement must be cornerstones of pandemic response

Reporter: Amandeep Kaur, B.Sc., M.Sc. (ept. 5/2021)

https://pharmaceuticalintelligence.com/2021/04/13/fighting-chaos-with-care/

T cells recognize recent SARS-CoV-2 variants

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2021/03/30/t-cells-recognize-recent-sars-cov-2-variants/

Need for Global Response to SARS-CoV-2 Viral Variants

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2021/02/12/need-for-global-response-to-sars-cov-2-viral-variants/

Mechanistic link between SARS-CoV-2 infection and increased risk of stroke using 3D printed models and human endothelial cells

Reporter: Adina Hazan, PhD

https://pharmaceuticalintelligence.com/2020/12/28/mechanistic-link-between-sars-cov-2-infection-and-increased-risk-of-stroke-using-3d-printed-models-and-human-endothelial-cells/

Complex rearrangements and oncogene amplification revealed by long-read DNA and RNA sequencing of a breast cancer cell line

Posted in Cancer - General, CANCER BIOLOGY & Innovations in Cancer Therapy, Cancer Genomics, Genomic Testing: Methodology for Diagnosis, Next Generation Sequencing (NGS), Single Cell Genomics, tagged 10X Genomics Pacific Biosciences, breast cancer, cancer variants, chromosomal abberation, disease variants, fusion genes, gene fusions, long read sequencing, mutational spectrum, next gen sequencing (NGS), oncogenes, sequencing methodology, Whole genome sequencing on August 14, 2019

Reporter: Stephen J. Williams, PhD

In a Genome Research report, Marie Nattestad et al. [1] sequenced the SK-BR-3 breast cancer cell line using a long-read, single-molecule sequencing protocol to develop one of the most detailed maps of structural variation in a cancer genome to date. The authors detected over 20,000 variants with this sequencing modality, most of which would have been missed by short-read sequencing. In addition, a complex sequence of nested duplications and translocations was found surrounding the ERBB2 (HER2) oncogene, while full-length transcriptomic analysis revealed novel gene fusions within the nested genomic variants. The authors suggest that combining long-read genome and transcriptome sequencing results in more comprehensive coverage of tumor gene variants and “sheds new light on the complex mechanisms involved in cancer genome evolution.”

Genomic instability is a hallmark of cancer [2] and leads to numerous genetic variations such as:

Copy number variations
Chromosomal alterations
Gene fusions
Deletions
Gene duplications
Insertions
Translocations

Efforts such as The Cancer Genome Atlas [3] and the International Cancer Genome Consortium (2010) use short-read sequencing technology to detect and analyze thousands of commonly occurring mutations. However, short-read technology has high false-positive and false-negative rates (as high as 50% [4]) for detecting less common structural variations. In addition, short reads cannot detect variations in close proximity to each other or on the same molecule, and therefore underestimate the number of variations.

Methods: The authors used a long-read sequencing technology from Pacific Biosciences (SMRT) to analyze the mutational and structural variation in the SK-BR-3 breast cancer cell line. A split read and within-read mapping approach was used to detect variants of different types and sizes. In general, long-reads have better alignment qualities than short reads, resulting in higher quality mapping. Transcriptomic analysis was performed using Iso-Seq.
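To illustrate the idea behind split-read variant detection, the sketch below assumes each long read has already been aligned and is represented as an ordered list of aligned segments. This is not the authors’ pipeline. A read whose consecutive segments land on different chromosomes, or far apart on the same chromosome, is reported as a candidate breakpoint. The `Segment` type, the 10 kb threshold, and the example coordinates are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    chrom: str
    start: int   # reference start of the aligned piece
    end: int     # reference end of the aligned piece


def candidate_breakpoints(
    segments: List[Segment], min_gap: int = 10_000
) -> List[Tuple[Segment, Segment, str]]:
    """Compare consecutive aligned pieces of one read, in read order."""
    calls = []
    for left, right in zip(segments, segments[1:]):
        if left.chrom != right.chrom:
            calls.append((left, right, "inter-chromosomal"))
        elif abs(right.start - left.end) >= min_gap:
            calls.append((left, right, "long-range intra-chromosomal"))
    return calls


# Hypothetical read split across chr17 and chr8, then jumping within chr8.
read = [
    Segment("chr17", 1_000_000, 1_004_000),
    Segment("chr8", 5_000_000, 5_003_000),
    Segment("chr8", 5_250_000, 5_252_000),
]

for left, right, kind in candidate_breakpoints(read):
    print(f"{left.chrom}:{left.end} -> {right.chrom}:{right.start} ({kind})")
```

A production caller would also track strand, mapping quality, and the number of independent reads supporting each breakpoint; those are omitted here.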

Results: Using the SMRT long-read sequencing technology from Pacific Biosciences, the authors were able to obtain 71.9× sequencing coverage with an average read length of 9.8 kb for the SK-BR-3 genome.

A few notes:

The most amplified regions included the locus spanning the ERBB2 oncogene (33.6 copies), the MYC locus (38 copies), the EGFR locus (7 copies) and BCAS1 (16.8 copies)
The locus 8q24.12, which contains the SNTB1 gene, had the most amplifications at 69.2 copies
Long-read sequencing showed more insertions than deletions, which suggests that the lengths of low-complexity regions are underestimated in the human reference genome
Found 1,493 long read variants, 603 of which were between different chromosomes
Using Iso-Seq in conjunction with the long-read platform, they detected 1,692,379 isoforms (93%) mapping to the reference genome and 53 putative gene fusions (for 39 of which they found genomic evidence)

A table modified from the paper on the gene fusions is given below:

Table 1. Gene fusions with RNA evidence from Iso-Seq and DNA evidence from SMRT DNA sequencing where the genomic path is found using SplitThreader from Sniffles variant calls. Note link in table is GeneCard for each gene.

#	Gene 1	Gene 2	SplitThreader path: distance (bp)	SplitThreader path: number of variants	SplitThreader path: chromosomes in path	Previously observed in references
1	KLHDC2	SNTB1	9837	3	14|17|8	Asmann et al. (2011) as only a 2-hop fusion
2	CYTH1	EIF3H	8654	2	17|8	Edgren et al. (2011); Kim and Salzberg (2011); RNA only, not observed as 2-hop
3	CPNE1	PREX1	1777	2	20	Found and validated as 2-hop by Chen et al. 2013
4	GSDMB	TATDN1	0	1	17|8	Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
5	LINC00536	PVT1	0	1	8	No
6	MTBP	SAMD12	0	1	8	Validated by Edgren et al. (2011)
7	LRRFIP2	SUMF1	0	1	3	Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011)
8	FBXL7	TRIO	0	1	5	No
9	ATAD5	TLK2	0	1	17	No
10	DHX35	ITCH	0	1	20	Validated by Edgren et al. (2011)
11	LMCD1-AS1	MECOM	0	1	3	No
12	PHF20	RP4-723E3.1	0	1	20	No
13	RAD51B	SEMA6D	0	1	14|15	No
14	STAU1	TOX2	0	1	20	No
15	TBC1D31	ZNF704	0	1	8	Edgren et al. (2011); Kim and Salzberg (2011); Chen et al. (2013); validated by Edgren et al. (2011); Chen et al. (2013)

SplitThreader found two different paths for the RAD51B-SEMA6D gene fusion and for the LINC00536-PVT1 gene fusion. Number of Iso-Seq reads refers to full-length HQ-filtered reads. Alignments of SMRT DNA sequence reads supporting each of these gene fusions are shown in Supplemental Note S2.
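Table 1 lends itself to a quick tally. The sketch below transcribes three of its columns (gene pair, chromosomes in the SplitThreader path, and whether the fusion was previously reported) and counts the fusions that had not been reported before and those whose genomic path crosses more than one chromosome. The tuple layout and variable names are illustrative choices, not part of the paper.

```python
# (gene_1, gene_2, chromosomes in path, previously reported?)
FUSIONS = [
    ("KLHDC2", "SNTB1", "14|17|8", True),
    ("CYTH1", "EIF3H", "17|8", True),
    ("CPNE1", "PREX1", "20", True),
    ("GSDMB", "TATDN1", "17|8", True),
    ("LINC00536", "PVT1", "8", False),
    ("MTBP", "SAMD12", "8", True),
    ("LRRFIP2", "SUMF1", "3", True),
    ("FBXL7", "TRIO", "5", False),
    ("ATAD5", "TLK2", "17", False),
    ("DHX35", "ITCH", "20", True),
    ("LMCD1-AS1", "MECOM", "3", False),
    ("PHF20", "RP4-723E3.1", "20", False),
    ("RAD51B", "SEMA6D", "14|15", False),
    ("STAU1", "TOX2", "20", False),
    ("TBC1D31", "ZNF704", "8", True),
]

# Fusions marked "No" in the last column of the table.
novel = [f"{a}-{b}" for a, b, _, seen in FUSIONS if not seen]
# Fusions whose path lists more than one chromosome.
multi_chrom = [f"{a}-{b}" for a, b, path, _ in FUSIONS if "|" in path]

print(f"{len(novel)} of {len(FUSIONS)} fusions not previously reported:")
print("  " + ", ".join(novel))
print(f"{len(multi_chrom)} fusions with a path across several chromosomes:")
print("  " + ", ".join(multi_chrom))
```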

References

1. Nattestad M, Goodwin S, Ng K, Baslan T, Sedlazeck FJ, Rescheneder P, Garvin T, Fang H, Gurtowski J, Hutton E et al: Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line. Genome Research 2018, 28(8):1126-1135.
2. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100(1):57-70.
3. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, Xie M, Zhang Q, McMichael JF, Wyczalkowski MA et al: Mutational landscape and significance across 12 major cancer types. Nature 2013, 502(7471):333-339.
4. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH et al: An integrated map of structural variation in 2,504 human genomes. Nature 2015, 526(7571):75-81.

Narrative Building for the Future of LPBI Group: List of Talking Points

Posted in BioMed e-Books e-Series by LPBI Group, Genome Biology, LPBI Group, e-Scientific Media, DFP, R&D-M3DP, R&D-Drug Discovery, US Patents: SOPs and Team Management, Next Generation Sequencing (NGS) on June 30, 2019

Exchange between Gail and Aviva

On Tuesday, June 25, 2019, 11:43:27 AM EDT, Aviva Lev-Ari <AvivaLev-Ari@alum.berkeley.edu> wrote:

https://www.terarecon.com/blog/beyond-the-screen-episode-6-next-generation-ai-companies-providing-physicians-a-starting-point-in-ai?utm_campaign=AuntMinnie%20June%202019

HOW can we get Kevin Landwher of terarecon.com to create a Podcast for LPBI Group IP Assets, including a section on our forthcoming Genomics, Volume 2

https://pharmaceuticalintelligence.com/biomed-e-books/genomics-orientations-for-personalized-medicine/volume-two-genomics-methodologies-ngs-bioinformatics-simulations-and-the-genome-ontology/

In response to this question we are in discussion on POINTS #1,2,3,4

From: Gail Thornton <gailsthornton@yahoo.com>

Reply-To: Gail Thornton <gailsthornton@yahoo.com>

Date: Sunday, June 30, 2019 at 8:38 AM

To: Aviva Lev-Ari <aviva.lev-ari@comcast.net>

Cc: Aviva Lev-Ari <AvivaLev-Ari@alum.berkeley.edu>, Rick Mandahl <rmandahl@gmail.com>, Amnon Danzig <amnon.danzig@gmail.com>

Subject: Please AUDIT PODCAST —>>>>>>>> Beyond the Screen Episode 6: Next Generation AI Companies Providing Physicians a Starting Point in AI

Aviva:

These videos from terarecon.com typically focus on one topic (not many as you’ve described below).

If there are too many topics proposed to this company, they will not be interested.

My recommendation is for you to finalize Genomics, volume 2, and let’s see the story we have about that specific topic.

Gail

On Saturday, June 29, 2019, 03:56:08 PM EDT, Aviva Lev-Ari <aviva.lev-ari@comcast.net> wrote:

POINT #1 for VIDEO coverage – Focus on Genomics, Volume 2

After 7/15, Prof. Feldman will be back in the US, starting to work on Part 5 in Genomics, Volume 2. We will Skype to discuss what to include in 5.1, 5.2, 5.3, 5.4

On 7/15, I am submitting my work on creation of Parts 1,2,3,4,6

Dr. Williams and Dr. Saha are working already on Part 7&8.

Below you have abbreviated eTOCs.

Go to URL of the Book to see what I placed already inside this book.

Dr. Williams and Prof. Feldman will compose

Preface

Introduction to Volume 2

Volume Summary

Epilogue

Based on these four parts and the eTOCs you will have ample content for the video, which may start with the epitome of our book creation: Genomics Volume 2 (you interview the three Editors why it is Epitome)

POINT #2 or #3 or #4 for VIDEOs to Focus on coverage for Marketing LPBI Group

by DESCRIPTION of what was accomplished

Venture history/background
Venture milestones: all posts in the Journal with the Title
“We celebrate …..
5-6 Titles like that, I may add two more
Site Statistics
Book articles cumulative views (Article Scoring System: Data Extract)
section on BioMed e-Series
section on List of Conference covered in Real Time
FIT Team input to Venture Valuation: top 5 or top 10 Factors in consensus
the 3D graphs on Opportunity Maps: Gail, Rick, Amnon, Aviva – each explains their own outcome
section on Pipeline

Video on What is the Ideal Solution for the FUTURE of LPBI Group

Interviews with All FIT Members

For POINT #1:

To build the narrative for a VIDEO dedicated to Genomics, Volume Two and a Marketing campaign for it as a NEW BOOK on NGS, the Narrative will use content extracts to build a CASE for

Why GENOMICS Volume 2 – is the Epitome of all BioMed e-Series???????

forthcoming Genomics, Volume 2

https://pharmaceuticalintelligence.com/biomed-e-books/genomics-orientations-for-personalized-medicine/volume-two-genomics-methodologies-ngs-bioinformatics-simulations-and-the-genome-ontology/

Aviva completed Parts 1,2,3,4,6,

[5 is by Prof. Feldman]

[7,8 are by Scientists on FIT]:

Latest in Genomics Methodologies for Therapeutics:

Gene Editing, NGS & BioInformatics,

Simulations and the Genome Ontology

2019

Volume Two

Prof. Marcus W. Feldman, PhD, Editor

Prof. Stephen J. Williams, PhD, Editor

And

Aviva Lev-Ari, PhD, RN, Editor

https://pharmaceuticalintelligence.com/biomed-e-books/genomics-orientations-for-personalized-medicine/volume-two-genomics-methodologies-ngs-bioinformatics-simulations-and-the-genome-ontology/

Abbreviated eTOCs

Part 1: NGS

1.1 The Science

1.2 Technologies and Methodologies

1.3 Clinical Aspects

1.4 Business and Legal

Part 2: CRISPR for Gene Editing and DNA Repair

2.1 The Science

2.2 Technologies and Methodologies

2.3 Clinical Aspects

2.4 Business and Legal

Part 3: AI in Medicine

3.1 The Science

3.2 Technologies and Methodologies

3.3 Clinical Aspects

3.4 Business and Legal

3.5 Latest in Machine Learning (ML) Algorithms harnessed for Medical Diagnosis: Pattern Recognition & Prediction of Disease Onset

Part 4: Single Cell Genomics

4.1 The Science

4.2 Technologies and Methodologies

4.3 Clinical Aspects

4.4 Business and Legal

Part 5: Evolution Biology Genomics Modeling @Feldman Lab, Stanford University – Written and Curated by Prof. Marc Feldman

5.1

5.2

5.3

5.4

Part 6: Simulation Modeling in Genomics

6.1 Mutation Analysis – Gene Encoding

6.2 Mitochondrial Variations

6.3 Variant Analysis

6.4 Variant Detection in Hereditary Cancer Genes

6.5 Immuno-Informatics

6.6 RNA Sequencing

6.7 Complex Insertions and Deletions

6.8 Evolutionary Biology

6.9 Simulation Programs

6.10 A comparison of tools for the simulation of genomic next-generation sequencing data

Part 7: Applications of Genomics: Genotypes, Phenotypes and Complex Diseases

7.1 Genome-wide associations with complex diseases (GWAS)

7.2 Non-coding DNA and phenotypes—including diseases like cancer

7.3 Epigenomic associations with phenotypes including cancer

7.4 Rare variants and diseases

7.5 Population-level genomics and the meaning of group differences

7.6 Targeting drugs for complex diseases

Part 8: Epigenomics and Genomic Regulation

8.1 Genomic controls on epigenomics

8.2 The ENCODE project and gene regulation

8.3 Small interfering RNAs and gene expression

8.4 Epigenomics in cancer

8.5 Environmental epigenomics

Simulation Tools of Genomic Next Generation Sequencing Data: Comparative Analysis & Genetic Simulation Resources

Posted in BioIT: BioInformatics, NGS, Clinical & Translational, Pharmaceutical R&D Informatics, Clinical Genomics, Cancer Informatics, Next Generation Sequencing (NGS), Simulation Modeling in NGS on May 31, 2019

Reporting: Aviva Lev-Ari, PhD, RN

INTRODUCTION

What is next generation sequencing?

Behjati S, Tarpey PS.

Arch Dis Child Educ Pract Ed. 2013 Dec;98(6):236-8. doi: 10.1136/archdischild-2013-304340. Epub 2013 Aug 28. Review.

PMID: 23986538

Free PMC Article

Computational pan-genomics: status, promises and challenges.

Computational Pan-Genomics Consortium.

Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089. Review.

PMID: 27769991

Free PMC Article

Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

Dahlö M, Scofield DG, Schaal W, Spjuth O.

Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy028.

PMID: 29659792

Free PMC Article

NGS IN THE CLINIC

[Clinical Applications of Next-Generation Sequencing].

Rebollar-Vega RG, Arriaga-Canon C, de la Rosa-Velázquez IA.

Rev Invest Clin. 2018;70(4):153-157. doi: 10.24875/RIC.18002544.

PMID: 30067721

Free Article

Clinical Genomics: Challenges and Opportunities.

Vijay P, McIntyre AB, Mason CE, Greenfield JP, Li S.

Crit Rev Eukaryot Gene Expr. 2016;26(2):97-113. doi: 10.1615/CritRevEukaryotGeneExpr.2016015724. Review.

PMID: 27480773

Free PMC Article

Next-generation sequencing in the clinic: promises and challenges.

Xuan J, Yu Y, Qing T, Guo L, Shi L.

Cancer Lett. 2013 Nov 1;340(2):284-95. doi: 10.1016/j.canlet.2012.11.025. Epub 2012 Nov 19. Review.

PMID: 23174106

Free PMC Article

The Future of Whole-Genome Sequencing for Public Health and the Clinic.

Allard MW.

J Clin Microbiol. 2016 Aug;54(8):1946-8. doi: 10.1128/JCM.01082-16. Epub 2016 Jun 15.

PMID: 27307454

Free PMC Article

Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists.

Roy S, Coldren C, Karunamurthy A, Kip NS, Klee EW, Lincoln SE, Leon A, Pullambhatla M, Temple-Smolkin RL, Voelkerding KV, Wang C, Carter AB.

J Mol Diagn. 2018 Jan;20(1):4-27. doi: 10.1016/j.jmoldx.2017.11.003. Epub 2017 Nov 21. Review.

PMID: 29154853

MUTATION ANALYSIS – GENE ENCODING

Next-Generation Sequencing and Mutational Analysis: Implications for Genes Encoding LINC Complex Proteins.

Nagy PL, Worman HJ.

Methods Mol Biol. 2018;1840:321-336. doi: 10.1007/978-1-4939-8691-0_22.

PMID: 30141054

Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML.

Nat Rev Genet. 2011 Jun 17;12(7):499-510. doi: 10.1038/nrg3012. Review.

PMID: 21681211

Best practices for evaluating mutation prediction methods.

Rogan PK, Zou GY.

Hum Mutat. 2013 Nov;34(11):1581-2. doi: 10.1002/humu.22401. Epub 2013 Sep 10. No abstract available.

PMID: 23955774

MITOCHONDRIAL VARIATIONS

mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

Vellarikkal SK, Dhiman H, Joshi K, Hasija Y, Sivasubbu S, Scaria V.

Hum Mutat. 2015 Apr;36(4):419-24. doi: 10.1002/humu.22767.

PMID: 25677119

VARIANT ANALYSIS

A survey of tools for variant analysis of next-generation genome sequencing data.

Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z.

Brief Bioinform. 2014 Mar;15(2):256-78. doi: 10.1093/bib/bbs086. Epub 2013 Jan 21.

PMID: 23341494

Free PMC Article

Variant callers for next-generation sequencing data: a comparison study.

Liu X, Han S, Wang Z, Gelernter J, Yang BZ.

PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.

PMID: 24086590

Free PMC Article

VARIANT DETECTION IN HEREDITARY CANCER GENES

ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

Lopez-Doriga A, Feliubadaló L, Menéndez M, Lopez-Doriga S, Morón-Duran FD, del Valle J, Tornero E, Montes E, Cuesta R, Campos O, Gómez C, Pineda M, González S, Moreno V, Capellá G, Lázaro C.

Hum Mutat. 2014 Mar;35(3):271-7.

PMID: 24227591

Development and analytical validation of a 25-gene next generation sequencing panel that includes the BRCA1 and BRCA2 genes to assess hereditary cancer risk.

Judkins T, Leclair B, Bowles K, Gutin N, Trost J, McCulloch J, Bhatnagar S, Murray A, Craft J, Wardell B, Bastian M, Mitchell J, Chen J, Tran T, Williams D, Potter J, Jammulapati S, Perry M, Morris B, Roa B, Timms K.

BMC Cancer. 2015 Apr 2;15:215. doi: 10.1186/s12885-015-1224-y.

PMID: 25886519

Free PMC Article

Clinical Applications of Next-Generation Sequencing in Cancer Diagnosis.

Sabour L, Sabour M, Ghorbian S.

Pathol Oncol Res. 2017 Apr;23(2):225-234. doi: 10.1007/s12253-016-0124-z. Epub 2016 Oct 8. Review.

PMID: 27722982

Studying cancer genomics through next-generation DNA sequencing and bioinformatics.

Doyle MA, Li J, Doig K, Fellowes A, Wong SQ.

Methods Mol Biol. 2014;1168:83-98. doi: 10.1007/978-1-4939-0847-9_6. Review.

PMID: 24870132

IMMUNOINFORMATICS

Immunoinformatics and epitope prediction in the age of genomic medicine.

Backert L, Kohlbacher O.

Genome Med. 2015 Nov 20;7:119. doi: 10.1186/s13073-015-0245-0. Review.

PMID: 26589500

Free PMC Article

IgSimulator: a versatile immunosequencing simulator.

Safonova Y, Lapidus A, Lill J.

Bioinformatics. 2015 Oct 1;31(19):3213-5. doi: 10.1093/bioinformatics/btv326. Epub 2015 May 25.

PMID: 26007226

Computational genomics tools for dissecting tumour-immune cell interactions.

Hackl H, Charoentong P, Finotello F, Trajanoski Z.

Nat Rev Genet. 2016 Jul 4;17(8):441-58. doi: 10.1038/nrg.2016.67. Review.

PMID: 27376489

RNA SEQUENCING

SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines.

Audoux J, Salson M, Grosset CF, Beaumeunier S, Holder JM, Commes T, Philippe N.

BMC Bioinformatics. 2017 Sep 29;18(1):428. doi: 10.1186/s12859-017-1831-5.

PMID: 28969586

Free PMC Article

COMPLEX INSERTIONS AND DELETIONS

INDELseek: detection of complex insertions and deletions from next-generation sequencing data.

Au CH, Leung AY, Kwong A, Chan TL, Ma ES.

BMC Genomics. 2017 Jan 5;18(1):16. doi: 10.1186/s12864-016-3449-9.

PMID: 28056804

Free PMC Article

EVOLUTIONARY BIOLOGY

The State of Software for Evolutionary Biology.

Darriba D, Flouri T, Stamatakis A.

Mol Biol Evol. 2018 May 1;35(5):1037-1046. doi: 10.1093/molbev/msy014. Review.

PMID: 29385525

Free PMC Article

SIMULATION PROGRAMS

A comparison of tools for the simulation of genomic next-generation sequencing data.

Escalona M, Rocha S, Posada D.

Nat Rev Genet. 2016 Aug;17(8):459-469. doi: 10.1038/nrg.2016.57. Epub 2016 Jun 20.

PMID: 27320129

PMCID: PMC5224698

Systematic review of next-generation sequencing simulators: computational tools, features and perspectives.

Zhao M, Liu D, Qu H.

Brief Funct Genomics. 2017 May 1;16(3):121-128. doi: 10.1093/bfgp/elw012. Review.

PMID: 27069250

A comparison of tools for the simulation of genomic next-generation sequencing data

Merly Escalona,¹ Sara Rocha,¹ and David Posada¹,²

The publisher’s final edited version of this article is available at Nat Rev Genet

This article has been corrected. See Nat Rev Genet. 2018 October 03.

Online Summary

There is a large number of tools for the simulation of genomic data for all currently available NGS platforms, with partially overlapping functionality. Here we review 23 of these tools, highlighting their distinct functionalities, requirements and potential applications.

The parameterization of these simulators is often complex. The user may decide between using existing sets of parameter values, called profiles, or re-estimating them from their own data.

Parameters that can be modulated in these simulations include the effects of the PCR amplification of the libraries, read features and quality scores, base call errors, variation of sequencing depth across the genomes and the introduction of genomic variants.

Several types of genomic variants can be introduced in the simulated reads, such as SNPs, indels, inversions, translocations, copy-number variants and short-tandem repeats.

Reads can be generated from single or multiple genomes, and with distinct ploidy levels. NGS data from metagenomic communities can be simulated given an “abundance profile” that reflects the proportion of taxa in a given sample.

Many of the simulators have not been formally described and/or tested in dedicated publications. We encourage the formal publication of these tools and the realization of comprehensive, comparative benchmarkings.

Choosing among the different genomic NGS simulators is not easy. Here we provide a guidance tree to help users choose a suitable tool for their specific interests.
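As an illustration of the "abundance profile" idea mentioned in the summary, the sketch below splits a fixed number of reads among taxa in proportion to their relative abundance. The taxon names, the proportions and the function `allocate_reads` are hypothetical and not taken from any of the reviewed tools; a real metagenomic simulator would probably also account for genome length, which this sketch ignores.

```python
import random

def allocate_reads(abundance, total_reads, seed=0):
    """Draw a per-taxon read count from a relative-abundance profile."""
    rng = random.Random(seed)
    taxa = list(abundance)
    weights = [abundance[t] for t in taxa]
    counts = {t: 0 for t in taxa}
    # Each read is assigned to one taxon with probability proportional to its weight
    for taxon in rng.choices(taxa, weights=weights, k=total_reads):
        counts[taxon] += 1
    return counts

# Hypothetical three-taxon community (illustrative values only)
profile = {"taxon_A": 0.6, "taxon_B": 0.3, "taxon_C": 0.1}
counts = allocate_reads(profile, total_reads=10_000)
print(counts)
```

The per-taxon counts would then be passed to whatever read generator is used for each genome.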

Abstract

Computer simulation of genomic data has become increasingly popular for assessing and validating biological models or to gain understanding about specific datasets. Multiple computational tools for the simulation of next-generation sequencing (NGS) data have been developed in recent years, which could be used to compare existing and new NGS analytical pipelines. Here we review 23 of these tools, highlighting their distinct functionality, requirements and potential applications. We also provide a decision tree for the informed selection of an appropriate NGS simulation tool for the specific question at hand.

Image source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/

An overview of current NGS technologies

The most popular NGS technologies on the market are Illumina’s sequencing by synthesis, which is probably the most widely used platform at present¹⁷, Roche’s 454 pyrosequencing (454), SOLiD sequencing-by-ligation (SOLiD), IonTorrent semiconductor sequencing¹⁸ (IonTorrent), Pacific Biosciences’s (PacBio) single molecule real-time sequencing¹⁹, and Oxford Nanopore Technologies (Nanopore) single-cell DNA template strand sequencing. These strategies can differ, for example, regarding the type of reads they produce or the kind of sequencing errors they introduce (Table 1). Only two of the current technologies (Illumina and SOLiD) are capable of producing all three sequencing read types: single end, paired end and mate pair. Read length is also dependent on the machine and the kit used; in platforms like Illumina, SOLiD, or IonTorrent it is possible to specify the number of desired base pairs per read. According to the sequencing run type selected it is possible to obtain reads with maximum lengths of 75 bp (SOLiD), 300 bp (Illumina) or 400 bp (IonTorrent). On the other hand, in platforms like 454, Nanopore or PacBio, information is only given about the mean and maximum read length that can be obtained, with average lengths of 700 bp, 10 kb and 15 kb and maximum lengths of 1 kb, 10 kb and 15 kb, respectively. Error rates vary depending on the platform from ≤1% in Illumina to ~30% in Nanopore. Further overviews and comparisons of NGS strategies can be found in ⁵,²⁰–²².

Table 1

Main characteristics of current NGS technologies.

Technology	Single-read	Paired-end	Mate-pair	Maximum Read Length	Quality Scores	Error Rates	References
Illumina	X	X	X	300 bp	> Q30	0.0034 – 1%	⁶⁵
SOLiD	X	X	X	75 bp	> Q30	0.01 – 1%	⁶⁶
IonTorrent	X	X		400 bp	~ Q20	1.78%	²²
454	X	X		~700 bp (up to 1 Kb)	> Q20	1.07 – 1.7%	⁵⁹,⁶⁷
Nanopore	X			5.4 – 10 Kb	NAY	10 – 40%	⁶⁸–⁷²
PacBio	X			~15 Kb (up to 40 Kb)	< Q10	5 – 10%	²²,⁷³–⁷⁵
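The quality scores in Table 1 are Phred values. Under the standard Phred definition, Q = -10 · log10(p), where p is the probability that a base call is wrong, so Q20 corresponds to a 1% error probability and Q30 to 0.1%. The short sketch below converts in both directions; the function names are illustrative. Note that the quality and error-rate columns of the table come from different sources and need not agree exactly under this formula.

```python
import math

def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality score implied by an error probability (p > 0)."""
    return -10 * math.log10(p)

for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error(q):.4f}")
```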

Simulation parameters

The existing sequencing platforms use distinct protocols that result in datasets with different characteristics¹. Many of these attributes can be taken into account by the simulators (Fig. 2), although there is not a single tool that incorporates all possible variations. The main characteristics of the 23 simulators considered here are summarized in Tables 2 and 3. These tools differ in multiple aspects, such as sequencing technology, input requirements or output format, but maintain several common aspects. With some exceptions, all programs need a reference sequence, multiple parameter values indicating the characteristics of the sequencing experiment to be simulated (read length, error distribution, type of variation to be generated, if any, etc.) and/or a profile (a set of parameter values, conditions and/or data used for controlling the simulation), which can be provided by the simulator or estimated de novo from empirical data. The outcome will be aligned or unaligned reads in different standard file formats, such as FASTQ, FASTA or BAM. An overview of the NGS data simulation process is represented in Fig. 3. In the following sections we delve into the different steps involved.
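To make the inputs and outputs described above concrete, here is a minimal sketch of a single-end read simulator: it takes a reference sequence, a read length and a uniform substitution error rate, and writes FASTQ text. It is a toy under stated assumptions (uniform errors, substitutions only, one constant Phred+33 quality value per read) and does not reproduce the model of any of the 23 reviewed tools; all names are hypothetical.

```python
import math
import random

BASES = "ACGT"

def simulate_reads(reference, n_reads, read_length, error_rate, seed=0):
    """Yield (name, sequence, quality) tuples for single-end reads."""
    rng = random.Random(seed)
    # Constant quality implied by the error rate, clamped to the range 0..41
    q = 41 if error_rate <= 0 else max(0, min(41, round(-10 * math.log10(error_rate))))
    qual = chr(q + 33) * read_length
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        read = list(reference[start:start + read_length])
        for pos, base in enumerate(read):
            # Substitute the base with a different one at the given error rate
            if rng.random() < error_rate:
                read[pos] = rng.choice([b for b in BASES if b != base])
        yield f"read_{i}_pos_{start}", "".join(read), qual

def to_fastq(records):
    """Format records as four-line FASTQ entries."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)

ref_rng = random.Random(1)
reference = "".join(ref_rng.choice(BASES) for _ in range(500))  # toy reference
fastq = to_fastq(simulate_reads(reference, n_reads=5, read_length=50, error_rate=0.01))
print(fastq)
```

Recording the true start position in the read name is a common convenience when the reads are later used to check a mapper, since the expected alignment is then known.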

Figure 2

General overview of the sequencing process and steps that can be parameterized in the simulations.

NGS simulators try to imitate the real sequencing process as closely as possible by considering all the steps that could influence the characteristics of the reads. a | NGS simulators do not take into account the effect of the different DNA extraction protocols in the resulting data. However, they can consider whether the sample we want to sequence includes one or more individuals, from the same or different organisms (e.g., pool-sequencing, metagenomics). Pools of related genomes can be simulated by replicating the reference sequence and introducing variants on the resulting genomes. Some tools can also simulate metagenomes with distinct taxa abundance. b | Simulators can try to mimic the length range of DNA fragmentation (empirically obtained by sonication or digestion protocols) or assume a fixed amplicon length. c | Library preparation involves ligating sequencing-platform-dependent adaptors and/or barcodes to the selected DNA fragments (inserts). Some simulators can control the insert size, and produce reads with adaptors/barcodes. d | Most NGS techniques include an amplification step for the preparation of libraries. Several simulators can take this step into account (for example, by introducing errors and/or chimaeras), with the possibility of specifying the number of reads per amplicon. e | Sequencing runs imply a decision about coverage, read length, read type (single-end, paired-end, mate-pair) and a given platform (with their specific errors and biases). Simulators exist for the different platforms, and they can use particular parameter profiles, often estimated from real data.
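Two of the decisions in panels b and e can be sketched in a few lines: how many reads a target coverage implies, and how a paired-end read pair is cut from a fragment. The first uses the usual relation coverage = (number of reads × read length) / genome length. The second assumes a forward-reverse orientation in which the second read is the reverse complement of the fragment's far end, and a normally distributed fragment size; both are simplifying assumptions, and all function names are hypothetical.

```python
import math
import random

def reads_for_coverage(genome_length, read_length, coverage):
    """Number of reads needed to reach a mean depth of `coverage`."""
    return math.ceil(coverage * genome_length / read_length)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def sample_fragment(reference, mean_size, sd_size, rng):
    """Cut one fragment whose length is drawn from a normal distribution."""
    size = max(1, min(len(reference), round(rng.gauss(mean_size, sd_size))))
    start = rng.randrange(len(reference) - size + 1)
    return reference[start:start + size]

def paired_end_from_fragment(fragment, read_length):
    """Read 1 from the fragment start, read 2 from the opposite strand at its end."""
    forward = fragment[:read_length]
    reverse = reverse_complement(fragment[-read_length:])
    return forward, reverse

print(reads_for_coverage(5_000_000, 150, 30))
print(paired_end_from_fragment("AACCGGTTAC", 4))
```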

Figure 3

General overview of NGS simulation.

The simulation process begins with the input of a reference sequence (most cases) and simulation parameters. Some of the parameters can be given via a profile, that is estimated (by the simulator or other tools) from other reads or alignments. The outcome of this process may be reads (with or without quality information) or genome alignments in different formats.
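One simple form of "profile" that can be estimated from existing reads is the mean base quality at each read position. The sketch below computes it from FASTQ text, assuming unwrapped four-line records and Phred+33 encoding; real simulators typically estimate richer profiles (for example, full per-position quality distributions or error types), so this is only an illustration with a hypothetical function name.

```python
from collections import defaultdict

def mean_quality_profile(fastq_text, offset=33):
    """Mean Phred quality per read position from four-line FASTQ records."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    lines = fastq_text.strip().splitlines()
    for i in range(3, len(lines), 4):  # every fourth line holds the qualities
        for pos, char in enumerate(lines[i]):
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [totals[p] / counts[p] for p in sorted(totals)]

# Two toy reads of different length
example = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nI5+\n"
profile = mean_quality_profile(example)
print(profile)  # [40.0, 30.0, 25.0, 40.0]
```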

CONCLUSIONS

NGS is having a big impact in a broad range of areas that benefit from genetic information, from medical genomics, phylogenetic and population genomics, to the reconstruction of ancient genomes, epigenomics and environmental barcoding. These applications include approaches such as de novo sequencing, resequencing, target sequencing or genome reduction methods. In all cases, caution is necessary in choosing a proper sequencing design and/or a reliable analytical approach for the specific biological question of interest. The simulation of NGS data can be extremely useful for planning experiments, testing hypotheses, benchmarking tools and evaluating particular results. Given a reference genome or dataset, for instance, one can play with an array of sequencing technologies to choose the best-suited technology and parameters for the particular goal, possibly optimizing time and costs. Yet, this is still not the standard practice and researchers often base their choices on practical considerations like technology and money availability. As shown throughout this Review, simulation of NGS data from known genomes or transcriptomes can be extremely useful when evaluating assembly, mapping, phasing or genotyping algorithms (e.g. ²,⁷,¹⁰,¹³,⁶⁴), exposing their advantages and drawbacks under different circumstances.

Altogether, current NGS simulators consider most, if not all, of the important features regarding the generation of NGS data. However, they are not problem-free. The different simulators are largely redundant, implementing the same or very similar procedures. In our opinion, many are poorly documented and can be difficult to use for non-experts, and some of them are no longer maintained. Most importantly, for the most part they have not been benchmarked or validated. Remarkably, among the 23 tools considered here, only 13 have been described in dedicated application notes, 3 have been mentioned as add-ons in the methods section of bigger articles, and 5 have never been referenced in a journal. Indeed, peer-reviewed publication of these tools in dedicated articles would be highly desirable. While this would not definitively guarantee quality, at least it would encourage authors to reach minimum standards in terms of validation, benchmarking, and documentation. Collaborative efforts like the Assemblathon (e.g. ²⁷) or iEvo (http://www.ievobio.org/) might also be a source of inspiration. Meanwhile, we hope that the decision tree presented in Fig. 1 helps users make appropriate choices.
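The benchmarking use case in the conclusions rests on the fact that simulated data come with a known truth set. A minimal sketch, assuming variants are represented as (chromosome, position, reference allele, alternate allele) tuples and compared by exact match, is shown below; the variant values are made up for illustration, and real comparisons usually need variant normalization first.

```python
def precision_recall(truth, called):
    """Precision and recall of called variants against the simulated truth set."""
    truth, called = set(truth), set(called)
    true_positives = len(truth & called)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical variants introduced by a simulator and reported by a caller
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")}
called = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr2", 77, "T", "C")}
p, r = precision_recall(truth, called)
print(f"precision={p:.2f} recall={r:.2f}")
```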

SOURCE

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/

REFERENCES

Systematic benchmarking of omics computational tools

Serghei Mangul, Lana S. Martin, Brian L. Hill, Angela Ka-Mei Lam, Margaret G. Distler, Alex Zelikovsky, Eleazar Eskin, Jonathan Flint

Nat Commun. 2019; 10: 1393. Published online 2019 Mar 27. doi: 10.1038/s41467-019-09406-4

PMCID: PMC6437167

Long fragments achieve lower base quality in Illumina paired-end sequencing

Ge Tan, Lennart Opitz, Ralph Schlapbach, Hubert Rehrauer

Sci Rep. 2019; 9: 2856. Published online 2019 Feb 27. doi: 10.1038/s41598-019-39076-7

PMCID: PMC6393434

sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs

Apostolos Dimitromanolakis, Jingxiong Xu, Agnieszka Krol, Laurent Briollais

BMC Bioinformatics. 2019; 20: 26. Published online 2019 Jan 15. doi: 10.1186/s12859-019-2611-1

PMCID: PMC6332552

Analysis validation has been neglected in the Age of Reproducibility

Kathleen E. Lotterhos, Jason H. Moore, Ann E. Stapleton

PLoS Biol. 2018 Dec; 16(12): e3000070. Published online 2018 Dec 10. doi: 10.1371/journal.pbio.3000070

PMCID: PMC6301703

Enterovirus D68 – The New Polio?

Hayley Cassidy, Randy Poelman, Marjolein Knoester, Coretta C. Van Leer-Buter, Hubert G. M. Niesters

Front Microbiol. 2018; 9: 2677. Published online 2018 Nov 13. doi: 10.3389/fmicb.2018.02677

PMCID: PMC6243117

Genetic Simulation Resources and the GSR Certification Program

Bo Peng, Man Chong Leong, Huann-Sheng Chen, Melissa Rotunno, Katy R Brignole, John Clarke, Leah E Mechanic

Bioinformatics. 2019 Feb 15; 35(4): 709–710. Published online 2018 Aug 7. doi: 10.1093/bioinformatics/bty666

PMCID: PMC6378936

Currently embargoed: free in PMC on Feb 15, 2020

Simulating Illumina metagenomic data with InSilicoSeq

Hadrien Gourlé, Oskar Karlsson-Lindsjö, Juliette Hayer, Erik Bongcam-Rudloff

Bioinformatics. 2019 Feb 1; 35(3): 521–522. Published online 2018 Jul 19. doi: 10.1093/bioinformatics/bty630

PMCID: PMC6361232

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model

Ze-Gang Wei, Shao-Wu Zhang

BMC Bioinformatics. 2018; 19: 177. Published online 2018 May 22. doi: 10.1186/s12859-018-2208-0

PMCID: PMC5964698

DeepSimulator: a deep simulator for Nanopore sequencing

Yu Li, Renmin Han, Chongwei Bi, Mo Li, Sheng Wang, Xin Gao

Bioinformatics. 2018 Sep 1; 34(17): 2899–2908. Published online 2018 Apr 6. doi: 10.1093/bioinformatics/bty223

PMCID: PMC6129308

Xome-Blender: A novel cancer genome simulator

Roberto Semeraro, Valerio Orlandini, Alberto Magi

PLoS One. 2018; 13(4): e0194472. Published online 2018 Apr 5. doi: 10.1371/journal.pone.0194472

PMCID: PMC5886411

Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets

Soroush Samadian, Jeff P. Bruce, Trevor J. Pugh

PLoS Comput Biol. 2018 Mar; 14(3): e1006080. Published online 2018 Mar 28. doi: 10.1371/journal.pcbi.1006080

PMCID: PMC5891060

Environmental and Host Effects on Skin Bacterial Community Composition in Panamanian Frogs

Brandon J. Varela, David Lesbarrères, Roberto Ibáñez, David M. Green

Front Microbiol. 2018; 9: 298. Published online 2018 Feb 22. doi: 10.3389/fmicb.2018.00298

PMCID: PMC5826957

Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome

Fedor M. Naumenko, Irina I. Abnizova, Nathan Beka, Mikhail A. Genaev, Yuriy L. Orlov

BMC Genomics. 2018; 19(Suppl 3): 92. Published online 2018 Feb 9. doi: 10.1186/s12864-018-4475-6

PMCID: PMC5836841

HgtSIM: a simulator for horizontal gene transfer (HGT) in microbial communities

Weizhi Song, Kerrin Steensen, Torsten Thomas

PeerJ. 2017; 5: e4015. Published online 2017 Nov 8. doi: 10.7717/peerj.4015

PMCID: PMC5681852

Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes

Haibao Tang, Ewen F. Kirkness, Christoph Lippert, William H. Biggs, Martin Fabani, Ernesto Guzman, Smriti Ramakrishnan, Victor Lavrenko, Boyko Kakaradov, Claire Hou, Barry Hicks, David Heckerman, Franz J. Och, C. Thomas Caskey, J. Craig Venter, Amalio Telenti

Am J Hum Genet. 2017 Nov 2; 101(5): 700–715. Published online 2017 Nov 2. doi: 10.1016/j.ajhg.2017.09.013

PMCID: PMC5673627

Simulating the dynamics of targeted capture sequencing with CapSim

Minh Duc Cao, Devika Ganesamoorthy, Chenxi Zhou, Lachlan J M Coin

Bioinformatics. 2018 Mar 1; 34(5): 873–874. Published online 2017 Oct 28. doi: 10.1093/bioinformatics/btx691

PMCID: PMC6192212

Next-generation sequencing applications in clinical bacteriology

Yair Motro, Jacob Moran-Gilad

Biomol Detect Quantif. 2017 Dec; 14: 1–6. Published online 2017 Oct 23. doi: 10.1016/j.bdq.2017.10.002

PMCID: PMC5727008

A multi-scenario genome-wide medical population genetics simulation framework

Jacquiline W Mugo, Ephifania Geza, Joel Defo, Samar S M Elsheikh, Gaston K Mazandu, Nicola J Mulder, Emile R Chimusa

Bioinformatics. 2017 Oct 1; 33(19): 2995–3002. Published online 2017 Jun 24. doi: 10.1093/bioinformatics/btx369

PMCID: PMC5870573

Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads

Ryan R. Wick, Louise M. Judd, Claire L. Gorrie, Kathryn E. Holt

PLoS Comput Biol. 2017 Jun; 13(6): e1005595. Published online 2017 Jun 8. doi: 10.1371/journal.pcbi.1005595

PMCID: PMC5481147

NanoSim: nanopore sequence read simulator based on statistical characterization

Chen Yang, Justin Chu, René L Warren, Inanç Birol

Gigascience. 2017 Apr; 6(4): 1–6. Published online 2017 Feb 24. doi: 10.1093/gigascience/gix010

PMCID: PMC5530317

Accelerating Clinical Next-Generation Sequencing: Navigating the Path to Reimbursement

Posted in Diagnostic Immunology, Diagnostics and Lab Tests, Liquid Biopsy: Circulating Tumor Cells in Urine and Blood, Next Generation Sequencing (NGS) on January 23, 2018

Reporter: Aviva Lev-Ari, PhD, RN

Session at PMWC 2018 Silicon Valley

http://www.pmwcintl.com/sessionthemes-accelerating-clinical-next-generation-sequencing-2018sv/
