
Archive for the ‘Curated’ Category

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

Curator: Aviva Lev-Ari, PhD, RN

We had researched the topic of AI Initiatives in Big Pharma in the following article:

  • Authentic Relevance of LPBI Group’s Portfolio of IP as Proprietary Training Data Corpus for AI Initiatives at Big Pharma

https://pharmaceuticalintelligence.com/2025/11/15/authentic-relevance-of-lpbi-groups-portfolio-of-ip-as-proprietary-training-data-corpus-for-ai-initiatives-at-big-pharma/

 

We are publishing a Series of Five articles that demonstrate the Authentic Relevance of Five of the Ten Digital IP Asset Classes in LPBI Group’s Portfolio of IP for AI Initiatives at Big Pharma.

  • For the Ten IP Asset Classes in LPBI Group’s Portfolio, See

https://pharmaceuticalintelligence.com/portfolio-of-ip-assets/

The following Five Digital IP Asset classes are positioned as Proprietary Training Data and Inference for Foundation Models in Health care.
This Corpus comprises a Live Repository of Domain Knowledge: Expert-Written Clinical Interpretations of Scientific Findings, codified in the following five Digital IP Asset Classes:
• IP Asset Class I: Journal: PharmaceuticalIntelligence.com
6,250 scientific articles (70% curations and creative expert opinions; 30% scientific reports).
2.4MM views, valued at the equivalent of $50MM if each article download were paid at the market rate of $30.

https://pharmaceuticalintelligence.com/vision/pharmaceuticalintelligence-com-journal-projecting-the-annual-rate-of-article-views/

 

 

• IP Asset Class II: 48 e-Books: English Edition & Spanish Edition.
152,000 pages downloaded under pay-per-view. The largest number of downloads for one e-Publisher (LPBI)
• IP Asset Class III: 100 e-Proceedings and 50 Tweet Collections of Top Biotech and Medical Global Conferences, 2013-2025

• IP Asset Class V: 7,500 Biological Images in our Digital Art Media Gallery, as prior art. The Media Gallery resides in WordPress.com Cloud of LPBI Group’s Web site

 

• IP Asset Class X: +300 Audio Podcasts: Interviews with Scientific Leaders
BECAUSE THE ABOVE ASSETS ARE DIGITAL ASSETS they are ready for use as Proprietary TRAINING DATA and INFERENCE for AI Foundation Models in HealthCare.
Expert‑curated healthcare corpus mapped to a living ontology, already packaged for immediate model ingestion and suitable for safe pre-training, evals, fine‑tuning and inference. If healthcare domain data is on your roadmap, this is a rare, defensible asset.
The article titles for the five Digital IP Asset Classes matched to AI Initiatives in Big Pharma, one article per IP Asset Class, are:
  • AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class I: PharmaceuticalIntelligence.com Journal, 2.5MM Views, 6,250 Scientific articles and Live Ontology

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-data-training-and-inference-by-lpbi-groups-ip-asset-class-i-pharmaceuticalintelligence-com-journal-2-5mm-views-6250-scientific-article/

 

  • AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class II: 48 e-Books: English Edition & Spanish Edition. 152,000 pages downloaded under pay-per-view

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-ii-48-e-books-english-edition-spanish-edition-152000/

 

  • AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class III: 100 e-Proceedings and 50 Tweet Collections of Top Biotech and Medical Global Conferences, 2013-2025

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-iii-100-e-proceedings-and-50-tweet-collections-of-top-biotech/

 

  • AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-v-7500-biological-images-in-lpbi-groups-digital-art/

Conclusions by @Grok
Conclusions and Implications
Digital IP Class V’s image gallery is a “treasure trove” ready for Big Pharma AI, establishing prior art while powering multimodal breakthroughs. Technical Implications: Enables visual-enhanced models for disease detection and R&D acceleration. Business Implications: Supports $500M investments with ethical, diverse data for partnerships; licensing potential for grants/webinars. Unique Insight: As embedded prior art, these visuals create a “moat” in multimodal AI—extending series from text to imagery for holistic Pharma companies inference. Promotional with links to gallery/IP portfolio. Caps the series by adding visual depth to textual assets.
  • AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class X: +300 Audio Podcasts Library: Interviews with Scientific Leaders

https://pharmaceuticalintelligence.com/2025/11/22/ai-initiatives-in-big-pharma-grog-prompt-proprietary-training-data-and-inference-by-lpbi-groups-ip-asset-class-x-300-audio-podcasts-library-interviews-with-scientific-leaders/

 

In the series of five articles listed above, we present the key AI Initiatives in Big Pharma as generated in response to our prompt to @Grok on 11/18/2025:

  • What are PFIZER’s AI INITIATIVES?

@Grok Response:

x.com/i/grok/share/0ol5VOJsEYs11baXq4xkzNb0h

Generative AI tools that save scientists up to 16,000 hours annually in literature searches and data analysis.
Drug Discovery and Development Acceleration Pfizer uses AI, supercomputing, and ML to streamline R&D timelines
Clinical Trials and Regulatory Efficiency AI:

-Predictive Regulatory Tools

-Decentralize Trials

-inventory management

Disease Detection and Diagnostics:

–       ATTR-CM Initiative

–       Rare diseases

Generative AI and Operational Tools:

–       Charlie Platform

–       Scientific Data Cloud AWS powered ML on centralized data

–       Amazon’s SageMaker /Bedrock for Manufacturing efficiency

–       Global Health Grants:

Pfizer Foundation’s AI Learning Lab for equitable access to care and tools for community care

Partnerships and Education

–       Collaborations: IMI Big Picture for 3M – sample disease database

–       AI in Pharma AIPM Symposium: Drug discovery and Precision Medicine

–       Webinars of AI for biomedical data integration

–       Webinar on AI in Manufacturing

Strategic Focus:

–       $500M R&D reinvestment by 2026 targets AI for Productivity

–       Part of $7.7B cost savings

–       Ethical AI, diverse DBs

–       Global biotech advances: China’s AI in CRISPR

 

  • AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

    The Left Column was written @Grok

    The Right Column was written by Aviva Lev-Ari, PhD, RN

AI Initiative at Big Pharma

i.e., Pfizer

Biological Images selected by Experts embedded in original Text (Prior Art)

Generative AI tools that save scientists up to 16,000 hours annually in literature searches and data analysis.
Drug Discovery and Development Acceleration Pfizer uses AI, supercomputing, and ML to streamline R&D timelines
Gallery of ~8,000 Biological images and captions is a Treasure TROVE
Clinical Trials and Regulatory Efficiency AI:

-Predictive Regulatory Tools

-Decentralize Trials

-inventory management

Disease Detection and Diagnostics:

–       ATTR-CM Initiative

–       Rare diseases

Gallery of ~8,000 Biological images and captions is a Treasure TROVE
Generative AI and Operational Tools:

–       Charlie Platform

–       Scientific Data Cloud AWS powered ML on centralized data

–       Amazon’s SageMaker/Bedrock for Manufacturing efficiency

–       Global Health Grants:

Pfizer Foundation’s AI Learning Lab for equitable  access to care and tools for community care

Ingest into Charlie Platform the Media Gallery for generation of Medical article drafts
Partnerships and Education

–       Collaborations: IMI Big Picture for 3M – sample disease database

–       AI in Pharma AIPM Symposium: Drug discovery and Precision Medicine

–       Webinars of AI for biomedical data integration

–       Webinar on AI in Manufacturing

Strategic Focus:

–       $500M R&D reinvestment by 2026 targets AI for Productivity

–       Part of $7.7B cost savings

–       Ethical AI, diverse DBs

–       Global biotech advances: China’s AI in CRISPR

 

CONCLUSIONS: The Voice of Dr. Stephen J. Williams PhD

PENDING

Article Summary by @Grok of the Article Title:

AI Initiatives in Big Pharma @Grok prompt & Proprietary Training Data and Inference by LPBI Group’s IP Asset Class V: 7,500 Biological Images in LPBI Group’s Digital Art Media Gallery, as Prior Art

Publication Date: November 22, 2025

Author/Curator: Aviva Lev-Ari, PhD, RN
(Curator; Posted by 2012pharmaceutical)
@Grok SOURCE

Overview: Fifth in LPBI Group’s five-article series on AI-ready digital IP assets for pharma. This piece spotlights IP Asset Class V—7,500 expert-selected biological images in the Digital Art Media Gallery—as proprietary training data and “prior art” for multimodal AI foundation models in healthcare. Leveraging a November 18, 2025, Grok prompt on Pfizer’s AI efforts, it maps the gallery to pharma applications, emphasizing visual data’s role in enhancing generative AI for diagnostics, drug discovery, and article drafting. Unlike text-heavy prior classes, this focuses on image-caption pairs for ingestion into platforms like Charlie, positioning them as a “treasure trove” for ethical, diverse AI training.

Main Thesis and Key Arguments

  • Core Idea: LPBI’s 7,500 biological images (with captions) serve as defensible, expert-curated prior art and training data for Big Pharma AI, enabling multimodal inference that combines visuals with clinical insights—outpacing generic datasets by injecting human-selected domain knowledge.
  • Value Proposition: The ~8,000-image gallery (actual 7,500 noted) is a ready-to-ingest visual corpus for platforms like Pfizer’s Charlie, generating medical drafts and accelerating R&D. Valued within the series’ $50MM-equivalent portfolio; unique as embedded prior art in original texts, supporting ethical AI with diverse, ontology-mapped visuals.
  • Broader Context: Part of ten IP classes, with five (I, II, III, V, X) AI-primed; complements text assets (e.g., 6,250 articles, 48 e-books) by adding multimodal depth. Highlights live ontology for semantic integration, contrasting open-source data with proprietary, safe-for-healthcare inputs.

AI Initiatives in Big Pharma (Focus on Pfizer)

Reuses the Grok prompt highlights, presented in a verbatim table:

Initiative Category | Description
Generative AI Tools | Generative AI tools that save scientists up to 16,000 hours annually in literature searches and data analysis.
Drug Discovery Acceleration | Drug Discovery and Development Acceleration Pfizer uses AI, supercomputing, and ML to streamline R&D timelines.
Clinical Trials & Regulatory Efficiency | Clinical Trials and Regulatory Efficiency AI: -Predictive Regulatory Tools -Decentralize Trials -inventory management.
Disease Detection & Diagnostics | Disease Detection and Diagnostics: – ATTR-CM Initiative – Rare diseases.
Generative AI & Operational Tools | Generative AI and Operational Tools: – Charlie Platform – Scientific Data Cloud AWS powered ML on centralized data – Amazon’s SageMaker /Bedrock for Manufacturing efficiency – Global Health Grants: Pfizer Foundation’s AI Learning Lab for equitable access to care and tools for community care.
Partnerships & Education | Partnerships and Education – Collaborations: IMI Big Picture for 3M – sample disease database – AI in Pharma AIPM Symposium: Drug discovery and Precision Medicine – Webinars of AI for biomedical data integration – Webinar on AI in Manufacturing.
Strategic Focus | Strategic Focus: – $500M R&D reinvestment by 2026 targets AI for Productivity – Part of $7.7B cost savings – Ethical AI, diverse DBs – Global biotech advances: China’s AI in CRISPR.

Mapping to LPBI’s Proprietary Data

Core alignment table (verbatim extraction, linking Pfizer initiatives to Class V assets):

AI Initiative at Big Pharma i.e., Pfizer | Biological Images selected by Experts embedded in original Text (Prior Art)
Generative AI Tools (16,000 hours saved) | (No specific mapping provided.)
Drug Discovery Acceleration | Gallery of ~8,000 Biological images and captions is a Treasure TROVE.
Clinical Trials & Regulatory Efficiency | (No specific mapping provided.)
Disease Detection & Diagnostics (ATTR-CM, rare diseases) | Gallery of ~8,000 Biological images and captions is a Treasure TROVE.
Generative AI & Operational Tools (Charlie, AWS, etc.) | Ingest into Charlie Platform the Media Gallery for generation of Medical article drafts.
Partnerships & Education (IMI, AIPM, webinars) | (No specific mapping provided.)
Strategic Focus ($500M reinvestment, ethics) | (No specific mapping provided.)

Methodologies and Frameworks

  • AI Training Pipeline: Ingest image-caption pairs into Charlie/AWS platforms for pre-training (multimodal foundation models), fine-tuning (e.g., diagnostics visuals), and inference (e.g., draft generation). Use living ontology for semantic tagging; prior art embedding ensures IP defensibility.
  • Productivity Model: Amplifies Pfizer’s 16,000-hour savings and $7.7B efficiencies by fueling generative tools with expert visuals; ethical emphasis on diverse DBs for global equity (e.g., AI Learning Lab).
  • Insights: References Grok prompt as real-time sourcing; quotes series-wide vision of assets as “codified digital treasures” for safe healthcare AI. Predicts revolution in visual inference for rare diseases/ATTR-CM.
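
As an illustration of the ingestion step described in the pipeline bullet above, the sketch below packages image-caption pairs as JSON Lines records, a format many training pipelines accept. The field names (`image`, `caption`, `tags`), the file paths, and the idea of attaching ontology tags to each record are assumptions made for this example; the article does not specify how the Media Gallery is actually packaged.

```python
import json


def to_manifest_line(image_path, caption, ontology_tags):
    """Serialize one image-caption pair as a single JSON Lines record."""
    record = {
        "image": image_path,                  # hypothetical field name
        "caption": caption.strip(),           # hypothetical field name
        "tags": sorted(set(ontology_tags)),   # de-duplicated, stable order
    }
    return json.dumps(record, ensure_ascii=False)


def write_manifest(pairs, out_path):
    """Write (image_path, caption, tags) tuples to a .jsonl manifest file."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for image_path, caption, tags in pairs:
            fh.write(to_manifest_line(image_path, caption, tags) + "\n")
```

One record per line keeps the manifest streamable, so a loader can read pairs without holding the whole gallery index in memory.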

Conclusions and Implications

Digital IP Class V’s image gallery is a “treasure trove” ready for Big Pharma AI, establishing prior art while powering multimodal breakthroughs. Technical Implications: Enables visual-enhanced models for disease detection and R&D acceleration. Business Implications: Supports $500M investments with ethical, diverse data for partnerships; licensing potential for grants/webinars. Unique Insight: As embedded prior art, these visuals create a “moat” in multimodal AI—extending series from text to imagery for holistic Pharma companies inference. Promotional with links to gallery/IP portfolio. Caps the series by adding visual depth to textual assets.

The Human Proteome Map Completed

Reporter and Curator: Larry H. Bernstein, MD, FCAP

UPDATED 6/02/2024

The genetic, pharmacogenomic, and immune landscapes associated with protein expression across human cancers.

Source: Chen C, Liu Y, Li Q, Zhang Z, Luo M, Liu Y, Han L. The Genetic, Pharmacogenomic, and Immune Landscapes Associated with Protein Expression across Human Cancers. Cancer Res. 2023 Nov 15;83(22):3673-3680. doi: 10.1158/0008-5472.CAN-23-0758. PMID: 37548539; PMCID: PMC10843800.

Abstract

Proteomics is a powerful approach that can rapidly enhance our understanding of cancer development. Detailed characterization of the genetic, pharmacogenomic, and immune landscape in relation to protein expression in cancer patients could provide new insights into the functional roles of proteins in cancer. By taking advantage of the genotype data from The Cancer Genome Atlas (TCGA) and protein expression data from The Cancer Proteome Atlas (TCPA), we characterized the effects of genetic variants on protein expression across 31 cancer types and identified approximately 100,000 protein quantitative trait loci (pQTL). Among these, over 8000 pQTL were associated with patient overall survival. Furthermore, characterization of the impact of protein expression on more than 350 imputed anticancer drug responses in patients revealed nearly 230,000 significant associations. In addition, approximately 21,000 significant associations were identified between protein expression and immune cell abundance. Finally, a user-friendly data portal, GPIP (https://hanlaboratory.com/GPIP), was developed featuring multiple modules that enable researchers to explore, visualize, and browse multidimensional data. This detailed analysis reveals the associations between the proteomic landscape and genetic variation, patient outcome, the immune microenvironment, and drug response across cancer types, providing a resource that may offer valuable clinical insights and encourage further functional investigations of proteins in cancer.

Introduction

Functional proteomics is a powerful approach that helps us understand cancer pathophysiology and identify potential therapeutic strategies. Functional protein analysis using reverse-phase protein arrays (RPPA) has already proven highly effective in studying large numbers of TCGA samples, especially when integrated with genomic, transcriptomic, and clinical information. Previous works demonstrated that a QTL mapping approach is effective to understand the genetic basis of multiple molecular features in human diseases. Identifying the sequence determinants of protein levels (pQTLs) may guide the search for causal genes and facilitate understanding the underlying mechanisms of human diseases. However, it remains challenging to further understand the functional roles of protein expression in cancers. For example, it is unclear whether proteins are associated with drug response and/or immune features in patients. In this study, we systematically investigated the effects of genetic variants on protein expression and characterized the impact of protein expression on imputed drug responses and immune cell abundances from different sources (Fig. 1). To facilitate broad access of these data for the biomedical research community, we developed a user-friendly database, GPIP (https://hanlaboratory.com/GPIP). We expect this study to have a significant clinical impact on the future development of protein-based targeted therapies.

Fig. 1. Impact of genetic variants on protein expression.

A Workflow of GPIP to identify pQTLs and survival-associated pQTLs. B The number of pQTLs identified for each cancer type. C Association between CYCLINB1 protein expression level and rs12576855 in LUAD patients. D Association between CYCLINB1 protein expression level and rs2722796 in LGG patients. E The number of survival-associated pQTLs identified for each cancer type. F Kaplan–Meier plot showing the association between rs10918659 (pQTL of HER2_pY1248) genotypes and overall survival times of STAD patients. G Kaplan–Meier plot showing the association between rs13158796 (pQTL of HER2_pY1248) genotypes and overall survival times of STAD patients.
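
To make the idea of a pQTL concrete, here is a minimal sketch that regresses a protein's expression level on genotype dosage (0, 1, or 2 copies of the alternate allele) and reports the slope and the fraction of variance explained. This excerpt does not state which statistical model the authors used, so ordinary least squares is an assumption chosen for illustration, and the function name `pqtl_effect` is hypothetical.

```python
def pqtl_effect(dosages, expression):
    """Least-squares slope and r^2 of protein expression on genotype dosage."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx                       # expression change per allele copy
    r_squared = (sxy * sxy) / (sxx * syy)   # variance explained by genotype
    return slope, r_squared
```

A real analysis would also adjust for covariates and correct for the number of SNP-protein pairs tested; the sketch omits both.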

Identification of protein–drug associations

To investigate potential associations between protein expression and drug response, we calculated the Spearman rank correlation between protein expression data and drug response from DrVAEN and cancerRxTissue. These two datasets employed distinct predictive models that integrated omics data from CCLE and drug response data from GDSC to predict drug response in TCGA samples (Fig. 2A). Associations with |Rs| > 0.3 and FDR < 0.05 were considered significant in each cancer type.
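
For readers unfamiliar with the statistic, the following sketch computes Spearman's Rs in plain Python as the Pearson correlation of the two rank vectors, with tied values sharing their average rank. It illustrates the coefficient only; it is not the authors' code, and it does not compute the p-values that an FDR threshold would require.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's Rs: Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks enter the calculation, any strictly increasing relationship gives Rs = 1 even when it is nonlinear. The function raises `ZeroDivisionError` if either input is constant.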

Fig. 2. Exploring the pharmacogenomics of protein in human cancer.

A Workflow of GPIP to identify Drug-associated proteins. B The number of protein-drug response pairs identified from DrVAEN (left) and cancerRxTissue (right) for each cancer type. C Visualization of the associations between proteins and drugs (DrVAEN) within and across different cancer signaling pathways. Blue links represent associations within a single pathway, while orange links represent associations cross pathways. D Enrichment analysis of drug target pathways among significant protein-drug response pairs. The color represents the log2 (odds ratio) of Fisher’s exact test. The size represents the FDR value.

Identification of protein–immune cell associations

To examine the relationship between protein expression and immune cell abundance, we utilized Spearman rank correlation coefficient to calculate the associations between protein expression data and immune cell abundance data from TIMER, CIBERSORT, ImmuneCellAI, and ImmuneCellGSVA (Fig. 3). These datasets utilized different methods to evaluate immune cell abundance by leveraging immune gene signatures as a proxy. We considered correlations with |Rs| > 0.3 and FDR < 0.05 as significant associations.
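
The two-part significance rule (|Rs| > 0.3 and FDR < 0.05) can be sketched as below. The excerpt does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the record layout `(label, rs, p_value)` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_associations(records, rs_cutoff=0.3, fdr_cutoff=0.05):
    """Keep (label, rs, p) records passing both the |Rs| and FDR cutoffs."""
    fdr = benjamini_hochberg([p for _, _, p in records])
    return [
        (label, rs, q)
        for (label, rs, _), q in zip(records, fdr)
        if abs(rs) > rs_cutoff and q < fdr_cutoff
    ]
```

Requiring both cutoffs drops pairs that are statistically significant but weakly correlated, which matters when thousands of samples make even small correlations pass an FDR test.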

Fig. 3. Exploring the immune landscapes of protein in human cancer.

A Workflow of GPIP to identify Immune cell-associated proteins. B The number of protein-drug response pairs identified from ImmuneCellsGSVA (purple), ImmuCellAI (yellow), TIMER (red) and CIBERSORT (green) for each cancer type. C The top 10 proteins with the highest number of significantly associated immune cell types in HNSC. The color represents the Rs between protein expression and immune cell abundance (ImmuneCellGSVA). The size represents the FDR value. D Association between PREX1 expression and imputed MDSC abundance in HNSC patients.

Database construction

GPIP was developed using Python Flask-RESTful API frameworks (https://flask-restful.readthedocs.io/), AngularJS (https://angularjs.org), and Bootstrap (https://getbootstrap.com/). The database for GPIP was implemented using the NoSQL database program MongoDB (https://www.mongodb.com/). The user-friendly interface of the GPIP web application was served through the Apache HTTP Server, allowing users to access the database and perform queries and analysis through a web browser.

Data availability

All results generated in this study can be found in GPIP database, (https://hanlaboratory.com/GPIP). Publicly available data generated by others were used by the authors in this study: The genotype data and clinical data were obtained from The Cancer Genome Atlas (TCGA) data portal at https://tcga-data.nci.nih.gov/tcga/. The reverse-phase protein array (RPPA) protein expression data was obtained from The Cancer Proteome Atlas (TCPA) data portal at https://www.tcpaportal.org/. The imputed pharmacogenomic data were obtained from DrVAEN at https://bioinfo.uth.edu/drvaen/ and cancerRxTissue at https://manticore.niehs.nih.gov/cancerRxTissue/. The immune-cell infiltration data were obtained from Tumor Immune Estimation Resource (TIMER) at http://timer.cistrome.org/, Immune Cell Abundance Identifier (ImmuCellAI) at http://bioinfo.life.hust.edu.cn/ImmuCellAI/, and CIBERSORT at https://cibersort.stanford.edu/.

A comprehensive data portal

We developed a user-friendly data portal, GPIP (https://hanlaboratory.com/GPIP), to facilitate visualizing, searching, and browsing of our results by the biomedical research community (Fig. 4A). GPIP contains four main modules: Protein-QTLs, Survival-QTLs, Drug Response, and Immune Infiltration (Fig. 4B). Querying can be easily performed by selecting cancer type, protein, drug, immune cell abundance, or entering the SNP ID of interest (Fig. 4C). For example, in the Protein-QTLs and Survival-QTLs modules, users can search for pQTLs by selecting a cancer type (e.g., LUAD) and entering a protein name (e.g., CYCLINB1) or an SNP ID (e.g., rs12576855). In the Drug Response module, users can search for protein-drug response associations by selecting a data source for imputed drug response (e.g., DrVAEN) and selecting an anticancer drug (e.g., Talazoparib) or a protein (e.g., PARP1). In the Immune Infiltration module, users can search for protein-immune infiltration pairs by selecting a data source for imputed immune cell abundance (e.g., ImmuneCellsGSVA), and selecting an immune cell type (e.g., Activated B cell) or a protein (e.g., PDL1). In addition, on the bottom of the main page, we developed a cancer type module where users can click on a specific cancer type (e.g., BLCA) to search for related information across all 4 modules (Fig. 4D). The search results for each module include a table listing the related information (Fig. 4E). Clicking the “Details” button for a result item generates a box plot in the Protein-QTLs module (Fig. 4F), a Kaplan–Meier plot in the Survival-QTLs module (Fig. 4G), and a scatter plot in the Drug Response and Immune Infiltration modules (Fig. 4H,I). Our database provides a valuable resource for cancer research and will be of great interest to the research community.
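
The search-box behavior described above can be pictured as building a document filter from whichever fields the user fills in. The sketch below is only a guess at the shape of such a query: the field names (`cancer_type`, `protein`, `snp_id`) are hypothetical, the in-memory `records` list stands in for the database, and nothing here reflects GPIP's actual schema or code. The example values come from the text and the Fig. 1 caption.

```python
def build_pqtl_filter(cancer_type, protein=None, snp_id=None):
    """Build an equality filter dict from the search-box selections."""
    query = {"cancer_type": cancer_type}
    if protein:
        query["protein"] = protein
    if snp_id:
        query["snp_id"] = snp_id
    return query


def matches(document, query):
    """True when every queried field equals the document's value."""
    return all(document.get(field) == value for field, value in query.items())


# Stand-in records using the examples named in the article (schema is assumed).
records = [
    {"cancer_type": "LUAD", "protein": "CYCLINB1", "snp_id": "rs12576855"},
    {"cancer_type": "LGG", "protein": "CYCLINB1", "snp_id": "rs2722796"},
]

query = build_pqtl_filter("LUAD", protein="CYCLINB1")
hits = [doc for doc in records if matches(doc, query)]
```

A plain dict of field-value pairs is also the form a MongoDB equality filter takes, which is why the sketch models the query that way.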

Fig. 4. Content and interface of GPIP.

A GPIP homepage and browser bar. B The four main modules of GPIP. C Search boxes in the pQTLs module. D Search boxes in the cancer type-specific search module. E An example of resulting list in the pQTL module. F An example of boxplot for the pQTLs module result. G An example of Kaplan–Meier plot for the Survival protein-QTLs module result. H An example of scatter plot for the Drug Response module result. I An example of scatter plot for the Immune Infiltration module result.

Discussion

Proteomics plays a crucial role in identifying potential therapeutic strategies and understanding cancer pathophysiology. In this study, we investigated the effects of genetic variants on protein expression and characterized the impact of protein expression on imputed drug responses and immune cell abundances across human cancers. We also developed the user-friendly data portal, GPIP, to provide access to these results. Our study provides a comprehensive analysis of protein expression in different cancer types and their association with drug response and immune cell abundance.

Identifying genetic variants associated with cancer has revolutionized our understanding of the disease and holds promise for improved diagnosis and treatment. In GPIP, we identified ~100,000 pQTLs across 31 cancer types and 8.8% of them were found to be associated with patient survival (Fig. 1). These genetic variants hold significant promise for unraveling the underlying biological mechanisms of disease progression and response to treatments. For example, a survival-associated pQTL may help to identify a genetic variant that controls the expression of a protein crucial for tumor growth or immune response, thus impacting patient survival. Our results suggest that pQTLs have the potential to serve as prognostic biomarkers and aid in the development of precision medicine.
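
As a quick consistency check on the figures quoted above, 8.8% of roughly 100,000 pQTLs works out to about 8,800, which agrees with the abstract's "over 8000" survival-associated pQTLs. Both inputs are the rounded values given in the text, so the result is approximate.

```python
total_pqtls = 100_000       # "~100,000 pQTLs", rounded figure from the text
survival_fraction = 0.088   # "8.8% of them"

# Approximate count of survival-associated pQTLs implied by the two figures.
survival_pqtls = round(total_pqtls * survival_fraction)
print(survival_pqtls)
```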

Despite the promising implications, it is crucial to consider potential limitations of pQTL identification. One limitation is the small number of tumor samples in rare cancers, which limits statistical power and the detection of significant pQTLs. For example, only 8 proteins with pQTLs were found in CHOL, likely due to the small sample size (Table S1). Additionally, we observed that some cancer types with large sample sizes identified only a small number of pQTLs (e.g., BRCA), possibly due to the data quality of protein abundance. Tumors originating from different tissues may have variations in protein extraction quality or protein measurement accuracy. Furthermore, cancer type heterogeneity can impact pQTL identification, as tumors from different tissues exhibit distinct protein expression profiles and genetic landscapes. Addressing these limitations is necessary to ensure valid and reliable results.

Protein expression levels in tumors can impact response of cancer cells to therapeutic drugs due to their role as targets of drug action, with alterations in expression potentially modifying drug sensitivity or resistance. In GPIP, we utilized the imputed drug response and protein expression data in TCGA patients to identify the potential associations between protein expression and drug response (Fig. 2). Our results revealed that certain proteins were significantly associated with drug sensitivity or resistance, suggesting that protein expression levels could potentially be used as biomarkers to predict drug response in cancer patients. Recent studies have shown that the impact of genetic variants on drug response can be mediated through protein-protein interaction (PPI) networks. Integrating genetic variants and PPI to further understand the associations between protein expression and drug response may provide further insights.

The protein expression level in tumors is crucial in the context of tumor immune microenvironment and immunotherapy, as it might impact immune cell abundance and response, and potentially improve the efficacy of immunotherapy. In GPIP, we examined the association between protein expression levels and imputed immune cell abundance across multiple cancer types. Our study identified ~21,000 significant correlations between proteins and immune cell types, highlighting the potential role of protein expression levels in shaping the tumor immune microenvironment (Fig. 3). Our results offer a promising avenue for future research to understand the interplay between protein expression and the tumor immune microenvironment, leading to personalized immunotherapy strategies and better treatment outcomes for cancer patients.

In summary, GPIP is a comprehensive and multifaceted data platform designed to aid functional and clinical research on protein in cancer patients. As more relevant datasets become available, we will continually update GPIP to ensure its relevance and usefulness to the research community.

Significance:

Comprehensive characterization of the relationship between protein expression and the genetic, pharmacogenomic, and immune landscape of tumors across cancer types provides a foundation for investigating the role of protein expression in cancer development and treatment.

Researchers Produce First Map of Human Proteome, and Reveal New
Significance in The Human Proteome

HAHNE, TECHNISCHE UNIVERSITÄT MÜNCHEN
Two international teams have
independently produced the first drafts of the human proteome. These curated
catalogs of the proteins expressed in most non-diseased human tissues and
organs can be used as a baseline to better understand changes that occur in
disease states. Their findings were published today (May 29) in Nature.

Both teams uncovered new complexities of the human genome, identifying novel
proteins from regions of the genome previously thought to be non-coding.

“The real breakthrough with these two projects is the comprehensive coverage of
more than 80 percent of the expected human proteome,” said Hanno Steen, director
of proteomics at Boston Children’s Hospital, who was not involved in the work.

The human proteome map provides a catalog of proteins expressed in nondiseased tissues and organs to use as baseline in understanding changes that occur in disease

Given the growing importance of proteins in medical laboratory testing,
experts are comparing this to the first complete map of the human genome

  • and this information provides for rapid advances
  • in understanding transcriptomics and metabolomics

Map of Human Proteome Expected to Advance Medical Science

“Housekeeping genes” that are expressed in all tissues and cell types

  • have been thought to be involved in basic cellular functions.

Two teams developing a Human Proteome Map

  • detected proteins encoded by 2,350 genes
  • across all human cells and tissues.

The corresponding housekeeping proteins comprised
about 75% of total protein mass.

  •  histones,
  • ribosomal proteins,
  • metabolic enzymes, and
  • cytoskeletal proteins

The two international teams produced

  • the first drafts of the human proteome,
  • a catalog of proteins expressed in most
  • nondiseased human tissues and organs.

The evidence suggests there is translation from DNA regions

  • that were not thought to be translated—including
  • more than 400 translated long, intergenic non-coding RNAs (lincRNAs)—
    found by the Küster team—and
  • 193 new proteins—uncovered by the Pandey team.

This proteome map can be used as a baseline to understand

  • changes that occur in the disease state

These studies are part of the Human Proteome Project,

  1. an international effort by the Human Proteome Organization
  2. to revolutionize our understanding of the human proteome
  3. by coordinating research at laboratories around the world directed
  4. at mapping the entire human proteome.

This new information about the human proteome

  • is expected to trigger rapid advances in medical science
  • and a better understanding of the underlying causes of human diseases.

One Study Team Was at Johns Hopkins University

  • In one study, which was headed by Akhilesh Pandey, M.D.,
    at Johns Hopkins University in Baltimore,
  • and colleague Harsha Gowda, Ph.D.,
    of the Institute of Bioinformatics in Bangalore, India,
  • the research team used an advanced form of mass spectrometry to analyze proteins
  • to create the human proteome map,

according to a report published in NIH Research Matters.

The research team examined

  1. 30 normal human tissue and cell types:
  2. 17 adult tissues,
  3. 7 fetal tissue and
  4. 6 blood cell types.

Samples from three people per tissue type

  • were processed through several steps.

The protein fragments, or peptides, were analyzed by mass spectrometry.

The amino acid sequences were

  • then compared to known sequences.
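The comparison step above can be illustrated with a toy sketch. It assumes a trypsin-style cleavage rule (cut after K or R unless followed by P), which the text does not specify, and it matches peptide strings exactly, whereas real pipelines match mass spectra against predicted fragments with dedicated search engines. The sequences and the function names (`tryptic_digest`, `build_peptide_index`, `match_peptides`) are hypothetical and invented for illustration only.

```python
import re

# Toy sequences invented for this example; they are not real human proteins.
TOY_PROTEINS = {
    "toy_A": "MKTAYIAKQISFVKSHFSR",
    "toy_B": "GGSTAYIAKPLLNDR",
}

def tryptic_digest(protein_seq, min_len=5):
    """Split a protein after K or R, except when the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein_seq)
    # Drop short fragments (and the empty string left by a terminal K/R).
    return [p for p in pieces if len(p) >= min_len]

def build_peptide_index(proteins):
    """Map each in-silico peptide to the set of proteins that yield it."""
    index = {}
    for name, seq in proteins.items():
        for peptide in tryptic_digest(seq):
            index.setdefault(peptide, set()).add(name)
    return index

def match_peptides(observed, index):
    """Look up each observed peptide; unmatched peptides map to an empty list."""
    return {pep: sorted(index.get(pep, ())) for pep in observed}

if __name__ == "__main__":
    index = build_peptide_index(TOY_PROTEINS)
    print(match_peptides(["TAYIAK", "XXXXXX"], index))
```

A peptide that matches no known sequence, like the second one in the usage line, is the kind of observation that would prompt a closer look at regions annotated as non-coding.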

Their results were published in the May 29, 2014, issue of Nature.

The resulting draft map of the human proteome includes

  • proteins encoded by more than 17,000 genes,
  • noted the Research Matters article.

Among these are hundreds of proteins from regions

  • previously thought to be non-coding.

This study also provided a new understanding of

  • how genes are expressed.

For example, almost 200 genes begin in locations

  • other than those predicted based on genetic sequence.

“The fact that 193 of the proteins came from DNA sequences

  • predicted to be non-coding means that
  • we don’t fully understand how cells read DNA,
  • since the sequences code for proteins.”

This study also produced the Human Proteome Map,

  • an interactive online portal.

This can be accessed at this link.

The study data will soon be accessible through

Germany’s ProteomicsDB Analyzed a Mix of Available and New Tissue Data

The other study was conducted by a team led by Bernhard Küster
of the Technische Universität München in Germany.

Küster and his colleagues created a database called ProteomicsDB.

This database contains 92% of the

  • estimated 19,629 human proteins,

noted The Scientist article.

Küster’s team also used mass spectrometry

  • to analyze human tissue samples.

This team’s approach differed from Johns Hopkins’ in that

  • it compiled about 60% of the information
  • in the ProteomicsDB database
  1. by using existing raw mass spec (MS) data
  2. from databases and colleagues’ contributions.

To fill data gaps, the Küster lab generated its own
MS data after analyzing

  1. 60 human tissues,
  2. 13 body fluids, and
  3. 147 cancer cell lines.

High-resolution public data

  • was selected and computationally processed
  • for strict quality

The database for ProteomicsDB is

  • public and searchable.

It can be accessed at this link.

German Study Added New Insights to Transcription Process

Comparing the ratio of protein to mRNA levels for every protein globally,

  • the Küster lab found that the translation rate
  • is a constant feature of each mRNA transcript. 

The proteomics community has viewed

  • transcriptome and proteome data as two sides of a coin.

But this analysis shows that at least, at steady state,

  • once the ratio for an mRNA/protein pair has been calculated,
  1. protein levels can be determined
  2. just from specific mRNA levels.
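The steady-state relationship described above amounts to a single multiplication per gene. The sketch below assumes the ratio for one mRNA/protein pair is estimated as the median of per-tissue ratios, which is one simple choice and not necessarily the procedure used in the study. All numbers are made up, in arbitrary units, and the function names are hypothetical.

```python
from statistics import median

def fit_ratio(mrna_levels, protein_levels):
    """Estimate one gene's protein-to-mRNA ratio as the median across tissues."""
    ratios = [p / m for m, p in zip(mrna_levels, protein_levels) if m > 0]
    if not ratios:
        raise ValueError("need at least one tissue with nonzero mRNA")
    return median(ratios)

def predict_protein(ratio, mrna_level):
    """Steady-state estimate: protein level = ratio * mRNA level."""
    return ratio * mrna_level

if __name__ == "__main__":
    # Hypothetical measurements for one gene in three tissues.
    mrna = [10.0, 20.0, 40.0]
    protein = [50.0, 110.0, 190.0]
    ratio = fit_ratio(mrna, protein)
    # Predict the protein level in a fourth tissue from its mRNA level alone.
    print(ratio, predict_protein(ratio, 30.0))
```

Using the median rather than the mean keeps a single outlying tissue from dominating the estimated ratio.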

Proteomics researchers in Toronto and in Boston commented on the
importance of the findings, even calling them a “new paradigm,” because of

  • the fixed ratio of protein to mRNA

This is quite in keeping with what we have been learning

  • with respect to homeostasis.

In 2003, the Human Genome Project created a

  • draft map of the human genome—
  • all the genes in the human body.

Genomics has since driven many advances in medical science.

This was progress from the classic discovery of Watson and Crick –

  • the classical dogma holds that
  • DNA makes RNA makes protein.
  • no constraints are placed on this

But the cell is functioning in contact with other cells,

  • immersed in interstitial fluid
  • maintaining cationic and anionic balance
  • and mitochondrial energy balance and ubiquitin systems interact
  • and protein interacts with the chromatin and transcriptional RNA

So the restriction that has been discovered has credence,

  • the classical diagram has to be redrawn

Deeper Knowledge of Proteome to Improve Diagnostics and Therapeutics

The real breakthrough in the two projects is:

  • the comprehensive coverage of more than 80% of
  • the expected human proteome,

These studies indicate that to get to

  • a deep level of proteome coverage,
  • many different tissue types must be probed.

The studies are complementary.

  1. The Hopkins group provided a survey of human proteins from a single source, which allows for easy comparisons within their data.
  2. The ProteomicsDB effort connected new information with existing data.

A deeper knowledge of the human proteome could help

  • fill the gap between genomes and phenotypes.

As this occurs, it has the potential to transform

  • the way diagnostics and therapeutics are developed,
  •  enhancing overall biomedical research and healthcare,

it was noted in a report presented to scientific leaders at a NIH workshop

  • on advances in proteomics and its applications.

Having completed a draft map of the human proteome—
the set of all proteins in the human body

  • It opens another window to cell function.

It has been ASSUMED –

  • genes control the most basic functions of the cell,
  • including what proteins to make and when.
  • but we have assumed far too much in assigning
    full control to the genome

Researchers have identified more than 20,000 protein-coding genes.

However, scientific understanding of the proteome has

  • lagged behind that of the genome,
  • partly because of the proteome’s complexities.

The relationship between genes and proteins isn’t a simple matter of

  • one gene coding for one protein.

Stretches of DNA can be read and translated

  • into proteins in different ways.

Proteins are also more difficult to sequence than genes.

The importance of these latest studies to pathologists and Ph.D.s working

  • in molecular diagnostics laboratories is that
  • this information will expedite further research into the human proteome.

Such research is expected to lead to

  • novel methods of diagnosis and complex
  • “multi-analyte” clinical laboratory tests that
  • look for multiple proteins in a single assay.

“The prevalent view was that information transfer was from genome to transcriptome to proteome.
What these efforts show is that it’s a two-way road— proteomics can be used to annotate the genome.
The importance is that, using these datasets, we can improve the annotation of the genome and the
algorithms that predict transcription and translation,” said Steen. “The genomics field can now hugely
benefit from proteomics data.”

Wilhelm et al., “Mass-spectrometry-based draft of the human proteome,”
Nature, http://dx.doi.org:/10.1038/nature13319, 2014

M.S. Kim et al. “A draft map of the human proteome,”
Nature,  http://dx.doi.org:/10.1038/nature13302, 2014.

Tags: proteomics, noncoding RNA, human research, human proteome project, human genetics and genomics

http://www.the-scientist.com/?articles.view/articleNo/40083/title/Human-Proteome-Mapped/

 

Patricia Kirk

by Harrison Wein, Ph.D.

by Anna Azvolinsky

Related Information:

Revealing The Human Proteome

Human Proteome Mapped

The human proteome – a scientific opportunity for transforming diagnostics, therapeutics, and healthcare

Reference: A draft map of the human proteome.
Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Donahue CA, Gowda H, Pandey A.
Nature. 2014 May 29;509(7502):575-81. http://dx.doi.org:/10.1038/nature13302. PMID: 24870542

Funding: NIH’s National Institute of General Medical Sciences (NIGMS), National Cancer Institute (NCI),
and National Heart, Lung, and Blood Institute (NHLBI); the Sol Goldman Pancreatic Cancer Research Center;
India’s Council of Scientific and Industrial Research; and Wellcome Trust/DBT India Alliance.

http://nihprod.cit.nih.gov/researchmatters/june2014/06092014proteome.htm