Curated | Leaders in Pharmaceutical Business Intelligence Group, LLC, Doing Business As LPBI Group, Newton, MA

Archive for the ‘Curated’ Category

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

Posted in Academic Publishing, Art Inspires Science, Article Type, Artificial Intelligence - Breakthroughs in Theories and Technologies, Artificial Intelligence Applications in Health Care, Artificial Intelligence in Health Care - Tools & Innovations, Artificial Intelligence in Medicine - Applications in Therapeutics, Authored, Curated, Deep Learning, Deep Learning in Pathology, IP Development by LPBI Group Team, IP Development by LPBI Group Team & Other Organizations, LPBI Group, e-Scientific Media, DFP, R&D-M3DP, R&D-Drug Discovery, US Patents: SOPs and Team Management, Machine Learning, Medical Imaging Technology, Medical Imaging Technology, Image Processing/Computing, MRI, CT, Nuclear Medicine, Ultra Sound, Natural Language Processing (NLP), Scientific report, Ultra Sound on November 22, 2025

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

Curator: Aviva Lev-Ari, PhD, RN

We researched the topic of AI Initiatives in Big Pharma in the following article:

Authentic Relevance of LPBI Group’s Portfolio of IP as Proprietary Training Data Corpus for AI Initiatives at Big Pharma

https://pharmaceuticalintelligence.com/2025/11/15/authentic-relevance-of-lpbi-groups-portfolio-of-ip-as-proprietary-training-data-corpus-for-ai-initiatives-at-big-pharma/

We are publishing a Series of Five articles that demonstrate the Authentic Relevance of Five of the Ten Digital IP Asset Classes in LPBI Group’s Portfolio of IP for AI Initiatives at Big Pharma.

For the Ten IP Asset Classes in LPBI Group’s Portfolio, See

https://pharmaceuticalintelligence.com/portfolio-of-ip-assets/

The following Five Digital IP Asset classes are positioned as Proprietary Training Data and Inference for Foundation Models in Health care.

https://pharmaceuticalintelligence.com/healthcare-foundation-model/

This corpus comprises a live repository of domain knowledge: expert-written clinical interpretations of scientific findings, codified in the following five Digital IP Asset Classes:

• IP Asset Class I: Journal: PharmaceuticalIntelligence.com

6,250 scientific articles (70% curations and creative expert opinions; 30% scientific reports).

2.4MM views, equivalent to $50MM if each article download were paid at a market rate of $30.

PharmaceuticalIntelligence.com Journal – Projecting the Annual Rate of Article Views

https://pharmaceuticalintelligence.com/vision/pharmaceuticalintelligence-com-journal-projecting-the-annual-rate-of-article-views/

• IP Asset Class II: 48 e-Books: English Edition & Spanish Edition.

152,000 pages downloaded under pay-per-view. The largest number of downloads for one e-Publisher (LPBI)

https://www.amazon.com/s?k=Aviva+Lev-Ari&i=digital-text&rh=n%3A133140011&ref=nb_sb_noss

• IP Asset Class III: 100 e-Proceedings and 50 Tweet Collections of Top Biotech and Medical Global Conferences, 2013-2025

https://pharmaceuticalintelligence.com/press-coverage/part-three-conference-eproceedings-deliverables-social-media-analytics/

• IP Asset Class V: 7,500 Biological Images in our Digital Art Media Gallery, as prior art. The Media Gallery resides in WordPress.com Cloud of LPBI Group’s Web site

PharmaceuticalIntelligence.com

• IP Asset Class X: +300 Audio Podcasts: Interviews with Scientific Leaders

https://pharmaceuticalintelligence.com/biomed-audio-podcast-library-lpbi-group/

BECAUSE THE ABOVE ASSETS ARE DIGITAL ASSETS they are ready for use as Proprietary TRAINING DATA and INFERENCE for AI Foundation Models in HealthCare.

Expert‑curated healthcare corpus mapped to a living ontology, already packaged for immediate model ingestion and suitable for safe pre-training, evals, fine‑tuning and inference. If healthcare domain data is on your roadmap, this is a rare, defensible asset.

The titles of the five articles, one per Digital IP Asset Class matched to AI Initiatives in Big Pharma, are:

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class I: PharmaceuticalIntelligence.com Journal, 2.5MM Views, 6,250 Scientific articles and Live Ontology

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-data-training-and-inference-by-lpbi-groups-ip-asset-class-i-pharmaceuticalintelligence-com-journal-2-5mm-views-6250-scientific-article/

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class II: 48 e-Books: English Edition & Spanish Edition. 152,000 pages downloaded under pay-per-view

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-ii-48-e-books-english-edition-spanish-edition-152000/

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class III: 100 e-Proceedings and 50 Tweet Collections of Top Biotech and Medical Global Conferences, 2013-2025

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-iii-100-e-proceedings-and-50-tweet-collections-of-top-biotech/

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-v-7500-biological-images-in-lpbi-groups-digital-art/

Conclusions by @Grok

Conclusions and Implications

Digital IP Class V’s image gallery is a “treasure trove” ready for Big Pharma AI, establishing prior art while powering multimodal breakthroughs. Technical Implications: Enables visual-enhanced models for disease detection and R&D acceleration. Business Implications: Supports $500M investments with ethical, diverse data for partnerships; licensing potential for grants/webinars. Unique Insight: As embedded prior art, these visuals create a “moat” in multimodal AI—extending series from text to imagery for holistic Pharma companies inference. Promotional with links to gallery/IP portfolio. Caps the series by adding visual depth to textual assets.

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class X: +300 Audio Podcasts Library: Interviews with Scientific Leaders

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-x-300-audio-podcasts-library-interviews-with-scientific-leaders/

In the series of five articles listed above, we present the key AI Initiatives in Big Pharma as generated in response to our prompt to @Grok on 11/18/2025:

What are PFIZER’s AI INITIATIVES?

@Grok Response:

x.com/i/grok/share/0ol5VOJsEYs11baXq4xkzNb0h

Generative AI tools that save scientists up to 16,000 hours annually in literature searches and data analysis.

Drug Discovery and Development Acceleration Pfizer uses AI, supercomputing, and ML to streamline R&D timelines

Clinical Trials and Regulatory Efficiency AI:

– Predictive Regulatory Tools

– Decentralize Trials

– inventory management

Disease Detection and Diagnostics:

– ATTR-CM Initiative

– Rare diseases

Generative AI and Operational Tools:

– Charlie Platform

– Scientific Data Cloud AWS powered ML on centralized data

– Amazon’s SageMaker /Bedrock for Manufacturing efficiency

– Global Health Grants:

Pfizer Foundation’s AI Learning Lab for equitable access to care and tools for community care

Partnerships and Education

– Collaborations: IMI Big Picture for 3M – sample disease database

– AI in Pharma AIPM Symposium: Drug discovery and Precision Medicine

– Webinars of AI for biomedical data integration

– Webinar on AI in Manufacturing

Strategic Focus:

– $500M R&D reinvestment by 2026 targets AI for Productivity

– Part of $7.7B cost savings

– Ethical AI, diverse DBs

– Global biotech advances: China’s AI in CRISPR

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

The Left Column was written by @Grok

The Right Column was written by Aviva Lev-Ari, PhD, RN

AI Initiative at Big Pharma i.e., Pfizer	Biological Images selected by Experts embedded in original Text (Prior Art)
Generative AI tools that save scientists up to 16,000 hours annually in literature searches and data analysis.
Drug Discovery and Development Acceleration Pfizer uses AI, supercomputing, and ML to streamline R&D timelines	Gallery of ~8,000 Biological images and captions is a Treasure TROVE
Clinical Trials and Regulatory Efficiency AI: -Predictive Regulatory Tools -Decentralize Trials -inventory management
Disease Detection and Diagnostics: – ATTR-CM Initiative – Rare diseases	Gallery of ~8,000 Biological images and captions is a Treasure TROVE
Generative AI and Operational Tools: – Charlie Platform – Scientific Data Cloud AWS powered ML on centralized data – Amazon’s SageMaker/Bedrock for Manufacturing efficiency – Global Health Grants: Pfizer Foundation’s AI Learning Lab for equitable access to care and tools for community care	Ingest into Charlie Platform the Media Gallery for generation of Medical article drafts
Partnerships and Education – Collaborations: IMI Big Picture for 3M – sample disease database – AI in Pharma AIPM Symposium: Drug discovery and Precision Medicine – Webinars of AI for biomedical data integration – Webinar on AI in Manufacturing
Strategic Focus: – $500M R&D reinvestment by 2026 targets AI for Productivity – Part of $7.7B cost savings – Ethical AI, diverse DBs – Global biotech advances: China’s AI in CRISPR

CONCLUSIONS: The Voice of Dr. Stephen J. Williams PhD

PENDING

Article Summary by @Grok of the Article Title:

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

Publication Date: November 22, 2025

Author/Curator: Aviva Lev-Ari, PhD, RN

(Curator; Posted by 2012pharmaceutical)

@Grok SOURCE

https://x.com/i/grok?conversation=1994759142053908719

Overview: Fifth in LPBI Group’s five-article series on AI-ready digital IP assets for pharma. This piece spotlights IP Asset Class V (7,500 expert-selected biological images in the Digital Art Media Gallery) as proprietary training data and “prior art” for multimodal AI foundation models in healthcare. Leveraging a November 18, 2025, Grok prompt on Pfizer’s AI efforts, it maps the gallery to pharma applications, emphasizing visual data’s role in enhancing generative AI for diagnostics, drug discovery, and article drafting. Unlike text-heavy prior classes, this focuses on image-caption pairs for ingestion into platforms like Charlie, positioning them as a “treasure trove” for ethical, diverse AI training.

Main Thesis and Key Arguments

Core Idea: LPBI’s 7,500 biological images (with captions) serve as defensible, expert-curated prior art and training data for Big Pharma AI, enabling multimodal inference that combines visuals with clinical insights—outpacing generic datasets by injecting human-selected domain knowledge.
Value Proposition: The ~8,000-image gallery (actual 7,500 noted) is a ready-to-ingest visual corpus for platforms like Pfizer’s Charlie, generating medical drafts and accelerating R&D. Valued within the series’ $50MM-equivalent portfolio; unique as embedded prior art in original texts, supporting ethical AI with diverse, ontology-mapped visuals.
Broader Context: Part of ten IP classes, with five (I, II, III, V, X) AI-primed; complements text assets (e.g., 6,250 articles, 48 e-books) by adding multimodal depth. Highlights live ontology for semantic integration, contrasting open-source data with proprietary, safe-for-healthcare inputs.

AI Initiatives in Big Pharma (Focus on Pfizer)

Reuses the Grok prompt highlights, presented in a verbatim table:

Initiative Category	Description
Generative AI Tools	Generative AI tools that save scientists up to 16,000 hours annually in literature searches and data analysis.
Drug Discovery Acceleration	Drug Discovery and Development Acceleration Pfizer uses AI, supercomputing, and ML to streamline R&D timelines.
Clinical Trials & Regulatory Efficiency	Clinical Trials and Regulatory Efficiency AI: -Predictive Regulatory Tools -Decentralize Trials -inventory management.
Disease Detection & Diagnostics	Disease Detection and Diagnostics: – ATTR-CM Initiative – Rare diseases.
Generative AI & Operational Tools	Generative AI and Operational Tools: – Charlie Platform – Scientific Data Cloud AWS powered ML on centralized data – Amazon’s SageMaker /Bedrock for Manufacturing efficiency – Global Health Grants: Pfizer Foundation’s AI Learning Lab for equitable access to care and tools for community care.
Partnerships & Education	Partnerships and Education – Collaborations: IMI Big Picture for 3M – sample disease database – AI in Pharma AIPM Symposium: Drug discovery and Precision Medicine – Webinars of AI for biomedical data integration – Webinar on AI in Manufacturing.
Strategic Focus	Strategic Focus: – $500M R&D reinvestment by 2026 targets AI for Productivity – Part of $7.7B cost savings – Ethical AI, diverse DBs – Global biotech advances: China’s AI in CRISPR.

Mapping to LPBI’s Proprietary Data

Core alignment table (verbatim extraction, linking Pfizer initiatives to Class V assets):

AI Initiative at Big Pharma i.e., Pfizer	Biological Images selected by Experts embedded in original Text (Prior Art)
Generative AI Tools (16,000 hours saved)	(No specific mapping provided.)
Drug Discovery Acceleration	Gallery of ~8,000 Biological images and captions is a Treasure TROVE.
Clinical Trials & Regulatory Efficiency	(No specific mapping provided.)
Disease Detection & Diagnostics (ATTR-CM, rare diseases)	Gallery of ~8,000 Biological images and captions is a Treasure TROVE.
Generative AI & Operational Tools (Charlie, AWS, etc.)	Ingest into Charlie Platform the Media Gallery for generation of Medical article drafts.
Partnerships & Education (IMI, AIPM, webinars)	(No specific mapping provided.)
Strategic Focus ($500M reinvestment, ethics)	(No specific mapping provided.)

Methodologies and Frameworks

AI Training Pipeline: Ingest image-caption pairs into Charlie/AWS platforms for pre-training (multimodal foundation models), fine-tuning (e.g., diagnostics visuals), and inference (e.g., draft generation). Use living ontology for semantic tagging; prior art embedding ensures IP defensibility.
Productivity Model: Amplifies Pfizer’s 16,000-hour savings and $7.7B efficiencies by fueling generative tools with expert visuals; ethical emphasis on diverse DBs for global equity (e.g., AI Learning Lab).
Insights: References Grok prompt as real-time sourcing; quotes series-wide vision of assets as “codified digital treasures” for safe healthcare AI. Predicts revolution in visual inference for rare diseases/ATTR-CM.
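
The ingestion step named in the pipeline above can be illustrated with a minimal sketch, assuming each gallery image is stored with its caption and a list of ontology tags. The record layout, field names, file paths, sample caption, and the `build_manifest` helper are all hypothetical; nothing here reflects an actual Charlie or AWS interface.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ImageCaptionPair:
    """One gallery item: an image file, its expert caption, and ontology tags."""
    image_path: str                                     # hypothetical file path
    caption: str                                        # expert-written caption
    ontology_tags: list = field(default_factory=list)   # hypothetical tag labels


def build_manifest(pairs):
    """Serialize pairs to JSON Lines, skipping items that have no caption."""
    lines = []
    for pair in pairs:
        if not pair.caption.strip():
            continue  # an image without a caption is not a usable training pair
        lines.append(json.dumps(asdict(pair), ensure_ascii=False))
    return "\n".join(lines)


# Made-up items for illustration only.
example_pairs = [
    ImageCaptionPair("gallery/img_0001.jpg", "Illustrative caption text", ["cardiology"]),
    ImageCaptionPair("gallery/img_0002.jpg", "   "),  # dropped: empty caption
]
manifest = build_manifest(example_pairs)
```

One JSON record per line is a common interchange shape for image-caption data; whether a given platform accepts it is an assumption that would need to be checked against that platform's documentation.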

Conclusions and Implications

Digital IP Class V’s image gallery is a “treasure trove” ready for Big Pharma AI, establishing prior art while powering multimodal breakthroughs. Technical Implications: Enables visual-enhanced models for disease detection and R&D acceleration. Business Implications: Supports $500M investments with ethical, diverse data for partnerships; licensing potential for grants/webinars. Unique Insight: As embedded prior art, these visuals create a “moat” in multimodal AI—extending series from text to imagery for holistic Pharma companies inference. Promotional with links to gallery/IP portfolio. Caps the series by adding visual depth to textual assets.


The Human Proteome Map Completed

Posted in Big Data, BioIT: BioInformatics, Cancer - General, Cancer and Current Therapeutics, Cancer Genomics, Cancer Informatics, cancer metabolism, Cell Biology, Signaling & Cell Circuits, Curated, Genomic Expression, Inflammasome, KRAS Mutation, Metastasis Process, Proteins, Proteomics, TP53 - Germline mutations, Transcriptomics, tagged bioinformatic tools, Cancer, Cancer Genomics, Cancer immunology, oncoproteomics, Proteomics on August 28, 2014

The Human Proteome Map Completed

Reporter and Curator: Larry H. Bernstein, MD, FCAP

UPDATED 6/02/2024

The genetic, pharmacogenomic, and immune landscapes associated with protein expression across human cancers.

Source: Chen C, Liu Y, Li Q, Zhang Z, Luo M, Liu Y, Han L. The Genetic, Pharmacogenomic, and Immune Landscapes Associated with Protein Expression across Human Cancers. Cancer Res. 2023 Nov 15;83(22):3673-3680. doi: 10.1158/0008-5472.CAN-23-0758. PMID: 37548539; PMCID: PMC10843800.

Abstract

Proteomics is a powerful approach that can rapidly enhance our understanding of cancer development. Detailed characterization of the genetic, pharmacogenomic, and immune landscape in relation to protein expression in cancer patients could provide new insights into the functional roles of proteins in cancer. By taking advantage of the genotype data from The Cancer Genome Atlas (TCGA) and protein expression data from The Cancer Proteome Atlas (TCPA), we characterized the effects of genetic variants on protein expression across 31 cancer types and identified approximately 100,000 protein quantitative trait loci (pQTL). Among these, over 8000 pQTL were associated with patient overall survival. Furthermore, characterization of the impact of protein expression on more than 350 imputed anticancer drug responses in patients revealed nearly 230,000 significant associations. In addition, approximately 21,000 significant associations were identified between protein expression and immune cell abundance. Finally, a user-friendly data portal, GPIP (https://hanlaboratory.com/GPIP), was developed featuring multiple modules that enable researchers to explore, visualize, and browse multidimensional data. This detailed analysis reveals the associations between the proteomic landscape and genetic variation, patient outcome, the immune microenvironment, and drug response across cancer types, providing a resource that may offer valuable clinical insights and encourage further functional investigations of proteins in cancer.

Introduction

Functional proteomics is a powerful approach that helps us understand cancer pathophysiology and identify potential therapeutic strategies (1). Functional protein analysis using reverse-phase protein arrays (RPPA) has already proven highly effective in studying large numbers of TCGA samples, especially when integrated with genomic, transcriptomic, and clinical information (2). Previous works demonstrated that a QTL mapping approach is effective to understand the genetic basis of multiple molecular features in human diseases (3). Identifying the sequence determinants of protein levels (pQTLs) may guide the search for causal genes and facilitate understanding the underlying mechanisms of human diseases. However, it remains challenging to further understand the functional roles of protein expression in cancers. For example, it is unclear whether proteins are associated with drug response and/or immune features in patients. In this study, we systematically investigated the effects of genetic variants on protein expression and characterized the impact of protein expression on imputed drug responses and immune cell abundances from different sources (Fig. 1). To facilitate broad access of these data for the biomedical research community, we developed a user-friendly database, GPIP (https://hanlaboratory.com/GPIP). We expect this study to have a significant clinical impact on the future development of protein-based targeted therapies.

Figure 1.

Impact of genetic variants on protein expression.

(A) Workflow of GPIP to identify pQTLs and survival-associated pQTLs. (B) The number of pQTLs identified for each cancer type. (C) Association between CYCLINB1 protein expression level and rs12576855 in LUAD patients. (D) Association between CYCLINB1 protein expression level and rs2722796 in LGG patients. (E) The number of survival-associated pQTLs identified for each cancer type. (F) Kaplan–Meier plot showing the association between rs10918659 (pQTL of HER2_pY1248) genotypes and overall survival times of STAD patients. (G) Kaplan–Meier plot showing the association between rs13158796 (pQTL of HER2_pY1248) genotypes and overall survival times of STAD patients.

Identification of protein–drug associations

To investigate potential associations between protein expression and drug response, we calculated the Spearman rank correlation between protein expression data and drug response from DrVAEN and cancerRxTissue. These two datasets employed distinct predictive models that integrated omics data from CCLE and drug response data from GDSC to predict drug response in TCGA samples (Fig. 2A) (4,5). Associations with |Rs| > 0.3 and FDR < 0.05 were considered significant in each cancer type.
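
The selection rule above (|Rs| > 0.3 and FDR < 0.05) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the text does not say which FDR procedure was used, so Benjamini-Hochberg is assumed here, and p-values are taken as inputs computed elsewhere. The function names and the sample labels are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's Rs: Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_pairs(results, rs_cutoff=0.3, fdr_cutoff=0.05):
    """results: (label, rs, p_value) tuples for one cancer type."""
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [
        label
        for (label, rs, _), q in zip(results, fdr)
        if abs(rs) > rs_cutoff and q < fdr_cutoff
    ]
```

Applying the correction within each cancer type, as `significant_pairs` does for one list of results, mirrors the phrase "in each cancer type"; correcting across all cancer types at once would be a different, stricter choice.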

Figure 2.

Exploring the pharmacogenomics of protein in human cancer.

(A) Workflow of GPIP to identify drug-associated proteins. (B) The number of protein-drug response pairs identified from DrVAEN (left) and cancerRxTissue (right) for each cancer type. (C) Visualization of the associations between proteins and drugs (DrVAEN) within and across different cancer signaling pathways. Blue links represent associations within a single pathway, while orange links represent associations across pathways. (D) Enrichment analysis of drug target pathways among significant protein-drug response pairs. The color represents the log2 (odds ratio) of Fisher’s exact test. The size represents the FDR value.

Identification of protein–immune cell associations

To examine the relationship between protein expression and immune cell abundance, we used the Spearman rank correlation coefficient to calculate the associations between protein expression data and immune cell abundance data from TIMER, CIBERSORT, ImmuCellAI, and ImmuneCellGSVA (Fig. 3). These datasets utilized different methods to evaluate immune cell abundance by leveraging immune gene signatures as a proxy (6–9). We considered correlations with |Rs| > 0.3 and FDR < 0.05 as significant associations.

Figure 3.

Exploring the immune landscapes of protein in human cancer.

(A) Workflow of GPIP to identify immune cell-associated proteins. (B) The number of protein-immune cell pairs identified from ImmuneCellsGSVA (purple), ImmuCellAI (yellow), TIMER (red) and CIBERSORT (green) for each cancer type. (C) The top 10 proteins with the highest number of significantly associated immune cell types in HNSC. The color represents the Rs between protein expression and immune cell abundance (ImmuneCellGSVA). The size represents the FDR value. (D) Association between PREX1 expression and imputed MDSC abundance in HNSC patients.

Database construction

GPIP was developed using Python Flask-RESTful API frameworks (https://flask-restful.readthedocs.io/), AngularJS (https://angularjs.org), and Bootstrap (https://getbootstrap.com/). The database for GPIP was implemented using the NoSQL database program MongoDB (https://www.mongodb.com/). The user-friendly interface of the GPIP web application was served through the Apache HTTP Server, allowing users to access the database and perform queries and analysis through a web browser.

Data availability

All results generated in this study can be found in the GPIP database (https://hanlaboratory.com/GPIP). Publicly available data generated by others were used by the authors in this study: The genotype data and clinical data were obtained from The Cancer Genome Atlas (TCGA) data portal at https://tcga-data.nci.nih.gov/tcga/. The reverse-phase protein array (RPPA) protein expression data was obtained from The Cancer Proteome Atlas (TCPA) data portal at https://www.tcpaportal.org/. The imputed pharmacogenomic data were obtained from DrVAEN at https://bioinfo.uth.edu/drvaen/ and cancerRxTissue at https://manticore.niehs.nih.gov/cancerRxTissue/. The immune-cell infiltration data were obtained from Tumor Immune Estimation Resource (TIMER) at http://timer.cistrome.org/, Immune Cell Abundance Identifier (ImmuCellAI) at http://bioinfo.life.hust.edu.cn/ImmuCellAI/, and CIBERSORT at https://cibersort.stanford.edu/.

A comprehensive data portal

We developed a user-friendly data portal, GPIP (https://hanlaboratory.com/GPIP), to facilitate visualizing, searching, and browsing of our results by the biomedical research community (Fig. 4A). GPIP contains four main modules: Protein-QTLs, Survival-QTLs, Drug Response, and Immune Infiltration (Fig. 4B). Querying can be easily performed by selecting cancer type, protein, drug, immune cell abundance, or entering the SNP ID of interest (Fig. 4C). For example, in the Protein-QTLs and Survival-QTLs modules, users can search for pQTLs by selecting a cancer type (e.g., LUAD) and entering a protein name (e.g., CYCLINB1) or an SNP ID (e.g., rs12576855). In the Drug Response module, users can search for protein-drug response associations by selecting a data source for imputed drug response (e.g., DrVAEN) and selecting an anticancer drug (e.g., Talazoparib) or a protein (e.g., PARP1). In the Immune Infiltration module, users can search for protein-immune infiltration pairs by selecting a data source for imputed immune cell abundance (e.g., ImmuneCellsGSVA), and selecting an immune cell type (e.g., Activated B cell) or a protein (e.g., PDL1). In addition, on the bottom of the main page, we developed a cancer type module where users can click on a specific cancer type (e.g., BLCA) to search for related information across all 4 modules (Fig. 4D). The search results for each module include a table listing the related information (Fig. 4E). Clicking the “Details” button for a result item generates a box plot in the Protein-QTLs module (Fig. 4F), a Kaplan–Meier plot in the Survival-QTLs module (Fig. 4G), and a scatter plot in the Drug Response and Immune Infiltration modules (Fig. 4H, I). Our database provides a valuable resource for cancer research and will be of great interest to the research community.
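
As a rough illustration of the Protein-QTLs search described above (a cancer type plus a protein name or an SNP ID), the sketch below builds a MongoDB-style filter document. GPIP's actual schema and API are not given in the text, so the field names, the `build_pqtl_query` and `matches` helpers, and the rule that at least one of protein or SNP ID is required are assumptions. The two sample records reuse the CYCLINB1 examples named in the Figure 1 caption.

```python
def build_pqtl_query(cancer_type, protein=None, snp_id=None):
    """Build a MongoDB-style filter document for a Protein-QTLs search.

    The field names ("cancer_type", "protein", "snp_id") are hypothetical.
    """
    if protein is None and snp_id is None:
        raise ValueError("provide a protein name or an SNP ID")
    query = {"cancer_type": cancer_type}
    if protein is not None:
        query["protein"] = protein
    if snp_id is not None:
        query["snp_id"] = snp_id
    return query


def matches(document, query):
    """Equality-only matching, mimicking how such a filter selects documents."""
    return all(document.get(key) == value for key, value in query.items())


# Stand-in records for a pQTL collection (values from the Figure 1 caption).
records = [
    {"cancer_type": "LUAD", "protein": "CYCLINB1", "snp_id": "rs12576855"},
    {"cancer_type": "LGG", "protein": "CYCLINB1", "snp_id": "rs2722796"},
]
query = build_pqtl_query("LUAD", protein="CYCLINB1")
hits = [doc for doc in records if matches(doc, query)]
```

With a MongoDB driver such as PyMongo, a filter of this shape would typically be passed to `collection.find`; the in-memory `matches` helper only stands in for that call so the example runs without a database.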

Figure 4.

Content and interface of GPIP.

(A) GPIP homepage and browser bar. (B) The four main modules of GPIP. (C) Search boxes in the pQTLs module. (D) Search boxes in the cancer type-specific search module. (E) An example of resulting list in the pQTL module. (F) An example of boxplot for the pQTLs module result. (G) An example of Kaplan–Meier plot for the Survival protein-QTLs module result. (H) An example of scatter plot for the Drug Response module result. (I) An example of scatter plot for the Immune Infiltration module result.

Discussion

Proteomics plays a crucial role in identifying potential therapeutic strategies and understanding cancer pathophysiology (2). In this study, we investigated the effects of genetic variants on protein expression and characterized the impact of protein expression on imputed drug responses and immune cell abundances across human cancers. We also developed the user-friendly data portal, GPIP, to provide access to these results. Our study provides a comprehensive analysis of protein expression in different cancer types and their association with drug response and immune cell abundance.

Identifying genetic variants associated with cancer has revolutionized our understanding of the disease and holds promise for improved diagnosis and treatment. In GPIP, we identified ~100,000 pQTLs across 31 cancer types and 8.8% of them were found to be associated with patient survival (Fig. 1). These genetic variants hold significant promise for unraveling the underlying biological mechanisms of disease progression and response to treatments. For example, a survival-associated pQTL may help to identify a genetic variant that controls the expression of a protein crucial for tumor growth or immune response, thus impacting patient survival. Our results suggest that pQTLs have the potential to serve as prognostic biomarkers and aid in the development of precision medicine.

Despite the promising implications, it is crucial to consider potential limitations of pQTL identification. One limitation is the small number of tumor samples in rare cancers, which limits statistical power and the detection of significant pQTLs. For example, only 8 proteins with pQTLs were found in CHOL, likely due to the small sample size (Table S1). Additionally, we observed that some cancer types with large sample sizes yielded only a small number of pQTLs (e.g., BRCA), possibly due to the data quality of protein abundance. Tumors originating from different tissues may have variations in protein extraction quality or protein measurement accuracy (3). Furthermore, cancer type heterogeneity can impact pQTL identification, as tumors from different tissues exhibit distinct protein expression profiles and genetic landscapes. Addressing these limitations is necessary to ensure valid and reliable results.

Protein expression levels in tumors can impact response of cancer cells to therapeutic drugs due to their role as targets of drug action, with alterations in expression potentially modifying drug sensitivity or resistance. In GPIP, we utilized the imputed drug response and protein expression data in TCGA patients to identify the potential associations between protein expression and drug response (Fig. 2). Our results revealed that certain proteins were significantly associated with drug sensitivity or resistance, suggesting that protein expression levels could potentially be used as biomarkers to predict drug response in cancer patients. Recent studies have shown that the impact of genetic variants on drug response can be mediated through protein-protein interaction (PPI) networks (19,20). Integrating genetic variants and PPI to further understand the associations between protein expression and drug response may provide further insights.

The protein expression level in tumors is crucial in the context of tumor immune microenvironment and immunotherapy, as it might impact immune cell abundance and response, and potentially improve the efficacy of immunotherapy. In GPIP, we examined the association between protein expression levels and imputed immune cell abundance across multiple cancer types. Our study identified ~21,000 significant correlations between proteins and immune cell types, highlighting the potential role of protein expression levels in shaping the tumor immune microenvironment (Fig. 3). Our results offer a promising avenue for future research to understand the interplay between protein expression and the tumor immune microenvironment, leading to personalized immunotherapy strategies and better treatment outcomes for cancer patients.

In summary, GPIP is a comprehensive and multifaceted data platform designed to aid functional and clinical research on protein in cancer patients. As more relevant datasets become available, we will continually update GPIP to ensure its relevance and usefulness to the research community.

Significance:

Comprehensive characterization of the relationship between protein expression and the genetic, pharmacogenomic, and immune landscape of tumors across cancer types provides a foundation for investigating the role of protein expression in cancer development and treatment.

Researchers Produce First Map of Human Proteome, and Reveal New Significance in The Human Proteome

Two international teams have
independently produced the first drafts of the human proteome. These curated
catalogs of the proteins expressed in most non-diseased human tissues and
organs can be used as a baseline to better understand changes that occur in
disease states. Their findings were published today (May 29) in Nature.

Both teams uncovered new complexities of the human genome, identifying novel
proteins from regions of the genome previously thought to be non-coding.

“The real breakthrough with these two projects is the comprehensive coverage of
more than 80 percent of the expected human proteome,” said Hanno Steen, director
of proteomics at Boston Children’s Hospital, who was not involved in the work.

The human proteome map provides a catalog of proteins expressed in nondiseased tissues and organs to use as a baseline in understanding changes that occur in disease.

Given the growing importance of proteins in medical laboratory testing,

pathologists will want to know that drafts of the complete human proteome
have been released to the public.

Experts are comparing this to the first complete map of the human genome,

and this information is expected to enable rapid advances
in understanding transcriptomics and metabolomics.

Map of Human Proteome Expected to Advance Medical Science

“Housekeeping genes” that are expressed in all tissues and cell types

have been thought to be involved in basic cellular functions.

Two teams developing a Human Proteome Map

detected proteins encoded by 2,350 genes
across all human cells and tissues.

The corresponding housekeeping proteins comprised
about 75% of total protein mass.

These include

histones,
ribosomal proteins,
metabolic enzymes, and
cytoskeletal proteins.

The two international teams produced

the first drafts of the human proteome,
a catalog of proteins expressed in most
nondiseased human tissues and organs.

The evidence suggests there is translation from DNA regions

that were not thought to be translated—including
more than 400 translated long, intergenic non-coding RNAs (lincRNAs)—
found by the Küster team—and
193 new proteins—uncovered by the Pandey team.

This proteome map can be used as a baseline to understand

changes that occur in the disease state

These studies are part of the Human Proteome Project,

an international effort by the Human Proteome Organization
to revolutionize our understanding of the human proteome
by coordinating research at laboratories around the world directed
at mapping the entire human proteome.

This new information about the human proteome

is expected to trigger rapid advances in medical science
and a better understanding of the underlying causes of human diseases.

One Study Team Was at Johns Hopkins University

In one study, which was headed by Akhilesh Pandey, M.D.,
at Johns Hopkins University in Baltimore,
and colleague Harsha Gowda, Ph.D.,
of the Institute of Bioinformatics in Bangalore, India,
the research team used an advanced form of mass spectrometry to analyze proteins
to create the human proteome map,

according to a report published in NIH Research Matters.

The research team examined

30 normal human tissue and cell types:
17 adult tissues,
7 fetal tissues and
6 blood cell types.

Samples from three people per tissue type

were processed through several steps.

The protein fragments, or peptides, were analyzed on

high-resolution Fourier-transform mass spectrometers.

The amino acid sequences were

then compared to known sequences.
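One simplified way to picture the comparison step is to cut known protein sequences into the peptides an enzyme would produce, index them, and look up each observed peptide. The sketch below does this under stated assumptions: the trypsin-style cleavage rule (cut after K or R unless followed by P), the protein names, and the sequences are all hypothetical, and real search software matches measured spectra against theoretical ones rather than comparing strings.

```python
import re

def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (simplified rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def build_index(proteins):
    """Map each peptide to the set of known proteins that contain it."""
    index = {}
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            index.setdefault(peptide, set()).add(name)
    return index

# Hypothetical reference sequences standing in for a protein database.
known_proteins = {
    "PROT_1": "MKTAYRPLLK",
    "PROT_2": "AAKRPGGRSS",
}
index = build_index(known_proteins)

# Hypothetical peptides read off the instrument; "QQQ" matches nothing known.
observed = ["TAYRPLLK", "SS", "QQQ"]
matches = {pep: sorted(index.get(pep, [])) for pep in observed}
for pep, prots in matches.items():
    print(pep, "->", prots or "no match")
```

Peptides with no match are the interesting cases in this context, since they are candidates for sequences missing from the reference.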

Their results were published in the May 28, 2014, issue of Nature.

The resulting draft map of the human proteome includes

proteins encoded by more than 17,000 genes,
noted the Research Matters article.

Among these are hundreds of proteins from regions

previously thought to be non-coding.

This study also provided a new understanding of

how genes are expressed.

For example, almost 200 genes begin in locations

other than those predicted based on genetic sequence.

“The fact that 193 of the proteins came from DNA sequences

predicted to be non-coding means that
we don’t fully understand how cells read DNA,
since the sequences code for proteins.”

This study also produced the Human Proteome Map,

an interactive online portal.

This can be accessed at this link.

The study data will soon be accessible through

the National Center for Biotechnology Information.

Germany’s ProteomicsDB Analyzed a Mix of Available and New Tissue Data

The other study was conducted by a team led by Bernhard Küster
of the Technische Universität München in Germany.

Küster and his colleagues created a

searchable,
public database called
ProteomicsDB.

This database contains 92% of the

estimated 19,629 human proteins,

noted The Scientist article.
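As a quick check of scale, the two figures above can be combined to estimate how many proteins the database covers. This is only arithmetic on the rounded numbers quoted here, so the result is approximate and is not a count reported by the study; the variable names are illustrative.

```python
# Figures quoted in the text above; 92% is itself a rounded value.
estimated_proteins = 19_629
coverage = 0.92

covered = round(estimated_proteins * coverage)
missing = estimated_proteins - covered
print(f"about {covered:,} proteins covered, about {missing:,} not yet observed")
```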

Küster’s team also used mass spectrometry

to analyze human tissue samples.

This team’s approach differed from Johns Hopkins’ in that

it compiled about 60% of the information
in the ProteomicsDB database

by using existing raw mass spec (MS) data
from databases and colleagues’ contributions.

To fill data gaps, the Küster lab generated its own
MS data after analyzing

60 human tissues,
13 body fluids, and
147 cancer cell lines.

High-resolution public data

were selected and computationally processed
for strict quality control.

The database for ProteomicsDB is

public and searchable.

It can be accessed at this link.

German Study Added New Insights to Transcription Process

Comparing the ratio of protein to mRNA levels for every protein globally,

the Küster lab found that the translation rate
is a constant feature of each mRNA transcript.

The proteomics community has viewed

transcriptome and proteome data as two sides of a coin.

But this analysis shows that at least, at steady state,

once the ratio for an mRNA/protein pair has been calculated,

protein levels can be determined
just from specific mRNA levels.
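The idea can be sketched in a few lines: estimate one protein-to-mRNA ratio per gene from tissues where both were measured, then multiply a new mRNA level by that ratio. The sketch assumes the ratio is summarized as the median across tissues; the gene name, tissue names, and numbers are hypothetical, and the prediction only makes sense at steady state, as the text notes.

```python
from statistics import median

def fit_ratios(mrna, protein):
    """Estimate one protein-to-mRNA ratio per gene (median across tissues)."""
    ratios = {}
    for gene, by_tissue in mrna.items():
        per_tissue = [
            protein[gene][tissue] / level
            for tissue, level in by_tissue.items()
            if level > 0 and tissue in protein.get(gene, {})
        ]
        if per_tissue:
            ratios[gene] = median(per_tissue)
    return ratios

def predict_protein(ratios, gene, mrna_level):
    """Predict a protein level from an mRNA level using the fitted ratio."""
    return ratios[gene] * mrna_level

# Hypothetical measurements in arbitrary units.
mrna = {"GENE_A": {"liver": 10.0, "lung": 20.0, "heart": 5.0}}
protein = {"GENE_A": {"liver": 1000.0, "lung": 2100.0, "heart": 480.0}}

ratios = fit_ratios(mrna, protein)
predicted = predict_protein(ratios, "GENE_A", 8.0)
print(f"ratio = {ratios['GENE_A']}, predicted protein level = {predicted}")
```

Using the median rather than the mean keeps a single outlying tissue from dominating the per-gene ratio.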

Proteomics researchers in Toronto and in Boston commented on the
importance of the findings, even calling them a “new paradigm” because of

the fixed ratio of protein to mRNA.

This is quite in keeping with what we have been learning

with respect to homeostasis.

In 2003, the Human Genome Project created a

draft map of the human genome—
all the genes in the human body.

Genomics has since driven many advances in medical science.

This was progress from the classic discovery of Watson and Crick.

The classical dogma holds that
DNA makes RNA makes protein,
with no constraints placed on this.

But the cell functions in contact with other cells,

immersed in interstitial fluid,
maintaining cationic and anionic balance
and mitochondrial energy balance, while ubiquitin systems interact
and protein interacts with the chromatin and transcriptional RNA.

So the restriction that has been discovered has credence,

and the classical diagram has to be redrawn.

Deeper Knowledge of Proteome to Improve Diagnostics and Therapeutics

The real breakthrough in the two projects is

the comprehensive coverage of more than 80% of
the expected human proteome.

These studies indicate that to get to

a deep level of proteome coverage,
many different tissue types must be probed.

The studies are complementary.

The Hopkins group provided a survey of human proteins from a single source, which allows for easy comparisons within their data.
The ProteomicsDB effort connected new information with existing data.

A deeper knowledge of the human proteome could help

fill the gap between genomes and phenotypes.

As this occurs, it has the potential to transform

the way diagnostics and therapeutics are developed,
enhancing overall biomedical research and healthcare,

it was noted in a report presented to scientific leaders at an NIH workshop

on advances in proteomics and its applications.

A draft map of the human proteome,
the set of all proteins in the human body, has now been completed.

It opens another window to cell function.

It has been ASSUMED that

genes control the most basic functions of the cell,
including what proteins to make and when,
but we have assumed far too much in assigning
full control to the genome.

Researchers have identified more than 20,000 protein-coding genes.

However, scientific understanding of the proteome has

lagged behind that of the genome,
partly because of the proteome’s complexities.

The relationship between genes and proteins isn’t a simple matter of

one gene coding for one protein.

Stretches of DNA can be read and translated

into proteins in different ways.

Proteins are also more difficult to sequence than genes.

The importance of these latest studies to pathologists and Ph.D.s working

in molecular diagnostics laboratories is that
this information will expedite further research into the human proteome.

Such research is expected to lead to

novel methods of diagnosis and complex
“multi-analyte” clinical laboratory tests that
look for multiple proteins in a single assay.

“The prevalent view was that information transfer was from genome to transcriptome to proteome.
What these efforts show is that it’s a two-way road— proteomics can be used to annotate the genome.
The importance is that, using these datasets, we can improve the annotation of the genome and the
algorithms that predict transcription and translation,” said Steen. “The genomics field can now hugely
benefit from proteomics data.”

Wilhelm et al., “Mass-spectrometry-based draft of the human proteome,”
Nature, http://dx.doi.org/10.1038/nature13319, 2014.

M.S. Kim et al., “A draft map of the human proteome,”
Nature, http://dx.doi.org/10.1038/nature13302, 2014.
