Archive for the ‘Total error’ Category

More on the Performance of High Sensitivity Troponin T and Amino Terminal Pro BNP in Diabetes

Writer and Curator: Larry H. Bernstein, MD, FCAP


UPDATED on 8/7/2018

Siemens’ high-sensitivity Troponin I (TnIH) assays got FDA clearance for use in diagnosing acute myocardial infarction. (Cardiovascular Business) The first high-sensitivity Troponin T test was cleared last year, as MedPage Today reported.



This is the final up-to-date review of the status of hs troponin T (or I) with or without the combined use of the Brain Type Natriuretic Peptide or its Amino Terminal peptide precursor. In addition, a new identification of the role of the Atrial Natriuretic Peptide has been reported with respect to arrhythmogenic activity. On the one hand, the diagnostic value of the NT-proBNP has been seen as disappointing, in part because of the question of what information is gained by the test in overt known congestive heart failure, and in part because of uncertainty about following the test during a short hospital stay. At least, this is the view of this reviewer. However, in the last several years there has been an emphasis on the value this test adds to prediction of adverse outcomes. In addition, there has been a hidden variable that has much to do with the original reference values that were established for age ranges, without any consideration of pathophysiology that might affect the values within those ranges, leading one to consider values in an aging population as normal that might well be high. Why is this? Aging patients are more likely to have hypertension, and also the onset of type-2 diabetes mellitus, with cardiovascular disease consequences. Type-2 diabetes mellitus (T2DM), for instance, is associated with insulin resistance and also fat gain with generation of adipokines, but there is also hyalinization of the insulin-forming beta-cells of the pancreas, and there is hyalinization of glomeruli (glomerulosclerosis) and afferent arteriolonephrosclerosis, with expected decline in glomerular filtration rate and hypertension as well. Of course, this is also associated with hepatosteatosis. Nevertheless, a reference range is established that takes none of this pathophysiology into account. While a more reasonable approach has been pointed out, there has been no followup in the literature.

On the other hand, there has been much confusion over the restandardization of a high sensitivity troponin I or T test (hs-Tn I or T). The reference range declines precipitously, and there is a good identification of patients who are for the most part disease free, but there is no delineation of patients who are at high risk of acute coronary syndrome with plaque rupture vs a host of other cardiovascular conditions. These have no relationship to plaque rupture, but may be serious and require further evaluation. The question then becomes whether to admit for a hospital stay, to refer to clinic after an evaluation in the ICU without admission, or to do an extensive evaluation in the emergency department overnight before release for followup. There is still another dimension of this that has to do with prediction of outcomes using hs-Tn(s) with or without the natriuretic peptides. Another matter that is not for discussion in this article is the underutilization of hs-CRP. Originally used as a marker of sepsis in the 1970s, it has come to be tied in with identification of an ongoing inflammatory condition. Therefore, the existence of a known inflammatory condition in the family of autoimmune diseases, with one exception, might make it unnecessary.

The discussion is broken into three parts:

Part 1.   New findings on the troponins.
Part 2.  The use of combined hs-Tn with a natriuretic peptide (NT-proBNP)
Part 3.  Atrial natriuretic peptide

Part 1.    New findings on the troponins.

Troponin: more lessons to learn

C Liebetrau, HM Nef, and CW Hamm*
Germany; (German Centre for Cardiovascular Research), partner site
RheinMain, Bad Nauheim, Germany; and University of Giessen, Medizinische
European Heart Journal editorial; refers to ‘Risk stratification in patients with acute chest pain
using three high-sensitivity cardiac troponin assays’,
by P. Haaf et al.

Troponin entered our diagnostic armamentarium 20 years ago and,
unlike any other biomarker,

  • is going through constant expansion in its application.

Troponin started out as a marker of risk in unstable angina, then was used

  • as gold standard for risk stratification and therapy guiding in acute coronary syndrome
  •  served further to redefine myocardial infarction, and
  • has also become a risk factor in apparently healthy subjects.

The recently introduced high-sensitivity cardiac troponin (hs-cTn) assays

  • have not only expanded the potential of troponins, but
  • have also resulted in a certain amount of confusion
    • among unprepared users.

After many years troponins were accepted as the gold standard in

  • patients with chest pain by
  • classifying them into troponin-positive and
    • troponin-negative patients.

The new generation of hs-cTn assays has

  • improved the accuracy at the lower limit of detection and
  • provided incremental diagnostic information especially
    • in the early phase of myocardial infarction.

Moreover, low levels of measurable troponins

  • unrelated to ACS have been associated with
    • an adverse long-term outcome.

Several studies demonstrated that

  • these low levels of cardiac troponin measurable
    • only by hs-Tn assays
  • are able to predict mortality in patients with ACS
  • as well as patients with assumed
    • stable coronary artery disease.

Furthermore, hs-cTn has the potential

  • to play a role in the care of patients
    • undergoing non-cardiac surgery.

The additional determination of hs-cTn

  • improves risk stratification beyond
  • established risk scores, supporting both diagnosis and
  • prognosis prediction in chest pain patients.

The daily clinical challenge in using the highly sensitive assays is to

  • interpret the troponin concentrations, especially
  • in patients with concomitant diseases that,
    • independently of myocardial ischaemia,
  • influence cardiac troponin concentrations
    (e.g. chronic kidney disease, or stroke).

The troponin test has lost its ‘pregnancy test’ quality among its different users.
Different opinions exist on

  • how to interpret a change in hs-cTn levels, as opposed to the simple ‘positive–negative’ interpretation,
  • which makes reaching a diagnosis more complex than before.

This uncertainty has probably fostered the paradigm that

  • serial measurements of troponins are necessary, and has also
    • boosted the number of diagnoses of ACS and
    • invasive diagnostic procedures in some locations.

This is more than understandable, with acute chest pain using

  • three high-sensitivity cardiac troponins with their respective baseline value
    • before the diagnosis of acute myocardial infarction (AMI) can be made.

What is a relevant change in concentrations compatible with acute myocardial necrosis and

  • what is only biological variation for the specific biomarker and assay?

Changes in serial measurements between 20% and 200% have been debated, and
the discussion is ongoing. Furthermore, it has been proposed that

  • absolute changes in cardiac troponin concentrations 
    • have a higher diagnostic accuracy for AMI
  • compared with relative changes, and

it might be helpful in distinguishing AMI from other causes of cardiac troponin elevation.
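The distinction between absolute and relative change can be made concrete with a minimal sketch. The function and the sample values below are hypothetical and serve only to show how the two criteria can disagree; the 50% relative cutoff is simply one point inside the debated 20% to 200% range, and the absolute cutoff is an arbitrary illustration, not a recommendation from the editorial.

```python
from dataclasses import dataclass


@dataclass
class SerialChange:
    absolute: float      # same units as the inputs (e.g. ng/L)
    relative_pct: float  # percent change from the baseline value


def serial_change(baseline: float, followup: float) -> SerialChange:
    """Return the absolute and relative change between two troponin values."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative change")
    absolute = followup - baseline
    return SerialChange(absolute=absolute, relative_pct=100.0 * absolute / baseline)


# Hypothetical values, chosen only to show that the criteria can disagree.
low = serial_change(baseline=4.0, followup=8.0)     # small absolute, large relative rise
high = serial_change(baseline=60.0, followup=72.0)  # larger absolute, small relative rise

RELATIVE_CUTOFF_PCT = 50.0  # assumption: one point within the debated 20%-200% range
ABSOLUTE_CUTOFF = 7.0       # assumption: arbitrary illustrative cutoff

for label, change in (("low baseline", low), ("high baseline", high)):
    meets_relative = change.relative_pct >= RELATIVE_CUTOFF_PCT
    meets_absolute = change.absolute >= ABSOLUTE_CUTOFF
    print(label, change.absolute, change.relative_pct, meets_relative, meets_absolute)
```

At a low baseline a small absolute rise already doubles the value, while at a high baseline a larger absolute rise is only a modest percentage, which is why the choice of criterion matters most near the detection limit.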

Do we obtain any helpful directives from experts and guidelines for our daily practice?
Foreseeing this dilemma, the 2011 European Society of Cardiology (ESC) Guidelines

  • on non ST-elevation ACS acted.
  • Minor elevations of  troponins were accepted as hs-cTn values in the ‘grey zone’.

This was and still is the rule, but

  • the ESC provided a general algorithm on how to manage patients with limited data.

The ‘Study Group on Biomarkers in Cardiology’ suggested

  • a rise of 50% from the baseline value at low concentrations.

However, this group of experts could also not find a substitute for the missing data

  • needed to validate the proposed recommendation.

The story is just too complex:

  • different troponin assays with
  • different epitope targets,
  • different patient populations,
  • different sampling protocols,
  • different follow-up lengths, and much more.

Therefore, any study that helps us to see better through the fog is welcome here.

Haaf et al. have now presented the results of their study of

  • different hs-cTn assays
    (hs-cTnT, Roche Diagnostics; hs-cTnI, Beckman-Coulter; and  hs-cTnI, Siemens)

    • with respect to the outcome of patients with acute chest pain.

The authors examined 1117 consecutive patients presenting with acute chest pain
[340 patients with ACS (30.5%)] from the Advantageous Predictors of Acute Coronary Syndrome
Evaluation (APACE) study. Blood was collected

  • directly on admission and
  • serially thereafter at 2, 3, and 6h.

Eighty-two patients (7.3%) died during the 2-year follow-up. The main finding of the study is that

  1. hs-cTnT predicts mortality more accurately than the hs-cTnI assays, 
  2. that a single measurement is sufficient
  3. challenges causes of cardiac troponin elevation.

These results of APACE remain in contrast to recent findings from a GUSTO IV cohort of 1335 patients with ACS (Table 1).

Table 1. Studies investigating high-sensitivity troponins for long-term prognosis

Variable                                               APACE (n = 1117)              GUSTO IV (n = 1335)               PEACE (n = 3567)

Patient cohort                                         Unstable                      Unstable                          Stable
Blood sampling                                         On admission, 1, 2, 3, 6 h    48 h after study randomization    Before randomization
No. of patients with hs-cTnT ≥ detection limit         883 (79.1%)                   UKN                               UKN
No. of patients with hs-cTnT ≥ 99th percentile         401 (35.9%)                   1015 (76%)                        395 (10.9%)
No. of patients with hs-cTnI (Abbott) ≥ detection limit  UKN                         UKN                               3567 (98.5%)
No. of patients with hs-cTnI (Abbott) ≥ 99th percentile  UKN                         988 (74%)                         105 (2.9%)
No. of patients with NSTEMI                            170 (15.2%)                   100 (100%)                        0 (0%)
Follow-up                                              24 months                     12 months                         5.2 years
Non-fatal AMI                                          UKN                           UKN                               209 (5.9%)
Mortality or primary endpoint                          82 (7.3%)                     119 (8.9%)                        203 (5.7%)
Key findings                                           cTnT better than cTnI;        cTnI = cTnT                       cTnI better than cTnT
                                                       single cTn sample sufficient

AMI, acute myocardial infarction; cTn, cardiac troponin; NSTEMI, non-ST-elevation myocardial infarction; UKN, unknown

NSTEMI as defined in the GUSTO IV trial:
  1. one or more episodes of angina lasting ≥ 5 min,
  2. within 24 h of admission, and
  3. either a positive cardiac TnT or I test
    (above the upper limit of normal for the local assay; during the years 1999 and 2000)
  4. or ≥ 0.5 mm of transient or persistent ST-segment depression.

In the GUSTO IV cohort, the prognostic capacity of four different sensitive cardiac troponin assays was compared:

  1. hs-cTnT (Roche Diagnostics),
  2. cTnI (Abbott Diagnostics),
  3. hs-cTnI (Abbott Diagnostics), and
  4. Acc-cTnI (Beckman-Coulter).

In total, 119 patients (8.9%) died during the 1-year follow-up. Looking at their

  • receiver operating characteristic curve (ROC) analyses,
  • there were only negligible differences
    • in the area under the curves between the assays.

Contrasting results have also been reported in patients (n = 3623)

  • with stable coronary artery disease and preserved systolic left ventricular function

from the PEACE trial (Table1).

During a median follow-up period of 5.2 years,

  • there were 203 (5.6%) cardiovascular deaths or
  • first hospitalization for heart failure.

Concentrations of hs-cTnI (Abbott Diagnostics) at or above

  • the limit of detection of the assay were measured in 3567 patients (98.5%), but
  • concentrations of hs-cTnI at or above the gender-specific 99th percentile
    • were found in only 105 patients (2.9%).

This study revealed that

  • there was a strong and graded association
  • between increasing quartiles of hs-cTnI concentrations and
  • the risk for cardiovascular death or heart failure.

Hs-cTnI provided incremental prognostication information

  • over conventional risk markers and
  • other established cardiovascular biomarkers,
  • including hs-cTnT.

In contrast to the APACE results, only hs-cTnI, but

  • not hs-cTnT, was significantly
  • associated with the risk for AMI.

Is there a real difference between cardiac troponin T and cardiac troponin I

  • in predicting long term prognosis?

The question arises of whether there is a true clinically relevant

  • difference between cTnT and cTnI.

Given the biochemical and analytical differences, the two

  • troponins display rather similar serum profiles during AMI.

While minor biological differences between cTnT and cTnI are

  • apparently not relevant for diagnosis
  • and clinical management in the acute setting of ACS.

This is a provocative theory, but appears premature in our opinion.
Above all, the results of the current study appear

  • too inconsistent to allow such conclusions.

In the present study, hs-cTnT (Roche Diagnostics) outperformed

  • hs-cTnI (Siemens and Beckman-Coulter) in terms of
  • very long term prediction of cardiovascular death and
    • heart failure in stable patients.

We don’t know how hs-cTnI from Abbott Diagnostics

  • performs in the APACE cohort.

The number of patients and endpoints provided

  • by the APACE registry are rather low.
  • The results could, therefore, be a chance finding.

It is far too early to favour one high sensitivity assay over the other. The findings need confirmation.

Implications for clinical practice

There is no doubt that high-sensitivity assays

  • are the analytical method of choice
    • in terms of risk stratification in patients with ACS.

What is new?
A single measurement of hs-cTn seems to be adequate

  • for long-term risk stratification in patients without AMI.

However, the question of which troponin might be preferable

  • for long-term risk stratification remains unanswered.

Part 2. Ability of high-sensitivity cTnT and NT-proBNP to predict cardiovascular events and death in patients with T2DM

Hillis GS; Welsh P; Chalmers J; Perkovic V; Chow CK; Li Q; Jun M; Neal B; Zoungas S; Poulter N; Mancia G; Williams B; Sattar N; Woodward M
Diabetes Care.  2014; 37(1):295-303 (ISSN: 1935-5548)


Current methods of risk stratification in patients with

  • type 2 diabetes are suboptimal.

The current study assesses the ability of

  • N-terminal pro-B-type natriuretic peptide (NT-proBNP) and
  • high-sensitivity cardiac troponin T (hs-cTnT)

to improve the prediction of cardiovascular events and death in patients with type 2 diabetes.


A nested case-cohort study was performed in 3,862 patients who participated in the Action in Diabetes and Vascular Disease: Preterax and Diamicron Modified Release Controlled Evaluation (ADVANCE) trial.


Seven hundred nine (18%) patients experienced a

  • major cardiovascular event

(composite of cardiovascular death, nonfatal myocardial infarction, or nonfatal stroke) and

  • 706 (18%) died during a median of 5 years of follow-up.

In Cox regression models, adjusting for all established risk predictors,

  • the hazard ratio for cardiovascular events for NT-proBNP was 1.95 per 1 SD increase (95% CI 1.72, 2.20) and
  • the hazard ratio for hs-cTnT was 1.50 per 1 SD increase (95% CI 1.36, 1.65). The hazard ratios for death were
    • 1.97 (95% CI 1.73, 2.24) and
    • 1.52 (95% CI 1.37, 1.67), respectively.
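A hazard ratio quoted "per 1 SD increase" can be rescaled to other increments, and its confidence interval implies a standard error on the log scale. The sketch below assumes the usual log-linear Cox model (the log hazard is linear in the standardized marker) and a symmetric 95% interval on the log scale; the helper names are hypothetical and the abstract does not state these details, so treat the outputs as illustrative arithmetic only.

```python
import math


def hr_for_k_sd(hr_per_sd: float, k: float) -> float:
    """Hazard ratio for a k-SD increase, assuming log-hazard is linear in the marker."""
    return hr_per_sd ** k


def se_of_log_hr(ci_low: float, ci_high: float, z: float = 1.96) -> float:
    """Standard error of log(HR) implied by a symmetric (on the log scale) 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


# Values reported in the abstract for cardiovascular events.
nt_probnp_hr, nt_probnp_ci = 1.95, (1.72, 2.20)
hs_ctnt_hr, hs_ctnt_ci = 1.50, (1.36, 1.65)

print("NT-proBNP, 2 SD:", hr_for_k_sd(nt_probnp_hr, 2))
print("hs-cTnT, 2 SD:", hr_for_k_sd(hs_ctnt_hr, 2))
print("SE log(HR), NT-proBNP:", se_of_log_hr(*nt_probnp_ci))
print("SE log(HR), hs-cTnT:", se_of_log_hr(*hs_ctnt_ci))
```

Under these assumptions a 2 SD increase corresponds to the square of the per-SD hazard ratio, which is why per-SD estimates from markers with different distributions can be compared directly.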

The addition of either marker improved 5-year risk classification for cardiovascular events
(net reclassification index in continuous model,

  • 39% for NT-proBNP and 46% for hs-cTnT).
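For readers unfamiliar with the continuous (category-free) net reclassification index, a minimal sketch follows. It uses the standard definition, the net proportion of events whose predicted risk moves up plus the net proportion of non-events whose predicted risk moves down. The six patients are invented for illustration and have no connection to the ADVANCE data.

```python
from typing import Sequence


def continuous_nri(old_risk: Sequence[float],
                   new_risk: Sequence[float],
                   event: Sequence[bool]) -> float:
    """Category-free NRI: any upward or downward movement in predicted risk counts."""
    up_e = down_e = n_e = 0  # counts among patients with an event
    up_n = down_n = n_n = 0  # counts among patients without an event
    for old, new, had_event in zip(old_risk, new_risk, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical predicted 5-year risks before and after adding a biomarker.
old = [0.10, 0.20, 0.30, 0.10, 0.20, 0.30]
new = [0.15, 0.25, 0.25, 0.05, 0.15, 0.35]
had_event = [True, True, True, False, False, False]

nri = continuous_nri(old, new, had_event)
print(f"continuous NRI = {nri:.1%}")
```

Because the index is a sum of two net proportions it ranges from -2 to 2, so a value reported as a percentage is not the fraction of patients who were reclassified.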

Likewise, both markers greatly improved the accuracy with which the 5-year risk of death was predicted.
The combination of both markers provided optimal risk discrimination.


NT-proBNP and hs-cTnT appear to greatly improve the accuracy with which the

  • risk of cardiovascular events or death can be estimated in patients with type 2 diabetes.

PreMedline Identifier: 24089534

Part 3. M-Atrial Natriuretic Peptide

M-Atrial Natriuretic Peptide and Nitroglycerin in a Canine Model of Experimental Acute Hypertensive Heart Failure:
Differential Actions of 2 cGMP Activating Therapeutics.

Paul M McKie, Alessandro Cataliotti, Tomoko Ichiki, S Jeson Sangaralingham, Horng H Chen, John C Burnett
Journal of the American Heart Association 01/2014; 3(1):e000206.
Source: PubMed


Systemic hypertension is a common characteristic in

  • acute heart failure (HF).

This increasingly recognized phenotype

  • is commonly associated with renal dysfunction and
  • there is an unmet need for renal enhancing therapies.

In a canine model of HF and acute vasoconstrictive hypertension

  • we characterized and compared the cardiorenal actions of M-atrial natriuretic peptide (M-ANP),
    a novel particulate guanylyl cyclase (pGC) activator, and
  • nitroglycerin, a soluble guanylyl cyclase (sGC) activator.

HF was induced by rapid RV pacing (180 beats per minute) for 10 days. On day 11, hypertension was induced by continuous angiotensin II
infusion. We characterized the cardiorenal and humoral actions

  • prior to,
  • during, and
  • following intravenous infusions of
  1. M-ANP (n=7),
  2. nitroglycerin (n=7),
  3. and vehicle (n=7) infusion.

Mean arterial pressure (MAP) was reduced by

  1. M-ANP (139±4 to 118±3 mm Hg, P<0.05) and
  2. nitroglycerin (137±3 to 116±4 mm Hg, P<0.05);

similar findings were recorded for

  1. pulmonary wedge pressure (PCWP) with M-ANP (12±2 to 6±2 mm Hg, P<0.05)
  2. and nitroglycerin (12±1 to 6±1 mm Hg, P<0.05).

M-ANP enhanced renal function with significant increases (P<0.05) in

  • glomerular filtration rate (38±4 to 53±5 mL/min),
  • renal blood flow (132±18 to 236±23 mL/min), and
  • natriuresis (11±4 to 689±37 mEq/min) and
  • also inhibited aldosterone activation (32±3 to 23±2 ng/dL, P<0.05), whereas

nitroglycerin had no significant (P>0.05) effects on these renal parameters or aldosterone activation.

Our results advance

the differential cardiorenal actions of

  • pGC (M-ANP) and sGC (nitroglycerin) mediated cGMP activation.

These distinct renal and aldosterone modulating actions make

M-ANP an attractive therapeutic for HF with concomitant hypertension, where

  • renal protection is a key therapeutic goal.


Risk of Bias in Translational Science

Author: Larry H. Bernstein, MD, FCAP


Curator: Aviva Lev-Ari, PhD, RN


Assessment of risk of bias in translational science

Andre Barkhordarian1, Peter Pellionisz2, Mona Dousti1, Vivian Lam1, Lauren Gleason1, Mahsa Dousti1, Josemar Moura3 and Francesco Chiappelli1,4*

1Oral Biology & Medicine, School of Dentistry, UCLA, Evidence-Based Decisions Practice-Based Research Network, Los Angeles, USA

2Pre-medical program, UCLA, Los Angeles, CA

3School of Medicine, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

4Evidence-Based Decisions Practice-Based Research Network, UCLA School of Dentistry, Los Angeles, CA

Journal of Translational Medicine 2013, 11:184

This is an Open Access article distributed under the terms of the Creative Commons Attribution License


Risk of bias in translational medicine may take one of three forms:

  1. a systematic error of methodology as it pertains to measurement or sampling (e.g., selection bias),
  2. a systematic defect of design that leads to estimates of experimental and control groups, and of effect sizes that substantially deviate from true values (e.g., information bias), and
  3. a systematic distortion of the analytical process, which results in a misrepresentation of the data with consequential errors of inference (e.g., inferential bias).

Risk of bias can seriously adulterate the internal and the external validity of a clinical study, and, unless it is identified and systematically evaluated, can seriously hamper the process of comparative effectiveness and efficacy research and analysis for practice. The Cochrane Group and the Agency for Healthcare Research and Quality have independently developed instruments for assessing the meta-construct of risk of bias. The present article begins to discuss this dialectic.


As recently discussed in this journal [1], translational medicine is a rapidly evolving field. In its most recent conceptualization, it consists of two primary domains:

  • translational research proper and
  • translational effectiveness.

This distinction arises from a cogent articulation of the fundamental construct of translational medicine in particular, and of translational health care in general.

The Institute of Medicine’s Clinical Research Roundtable conceptualized the field as being composed by two fundamental “blocks”:

  • one translational “block” (T1) was defined as “…the transfer of new understandings of disease mechanisms gained in the laboratory into the development of new methods for diagnosis, therapy, and prevention and their first testing in humans…”, and
  • the second translational “block” (T2) was described as “…the translation of results from clinical studies into everyday clinical practice and health decision making…” [2].

These are clearly two distinct facets of one meta-construct, as outlined in Figure 1. As signaled by others, “…Referring to T1 and T2 by the same name—translational research—has become a source of some confusion. The 2 spheres are alike in name only. Their goals, settings, study designs, and investigators differ…” [3].


Figure 1. Schematic representation of the meta-construct of translational health care in general, and translational medicine in particular, which consists of two fundamental constructs: the T1 “block” (as per Institute of Medicine’s Clinical Research Roundtable nomenclature), which represents the transfer of new understandings of disease mechanisms gained in the laboratory into the development of new methods for diagnosis, therapy, and prevention as well as their first testing in humans, and the T2 “block”, which pertains to translation of results from clinical studies into everyday clinical practice and health decision making [3]. The two “blocks” are inextricably intertwined because they jointly strive toward patient-centered research outcomes (PCOR) through the process of comparative effectiveness and efficacy research/review and analysis for clinical practice (CEERAP). The domain of each construct is distinct, since the “block” T1 is set in the context of a laboratory infrastructure within a nurturing academic institution, whereas the setting of “block” T2 is typically community-based (e.g., patient-centered medical/dental home/neighborhoods [4]; “communities of practice” [5]).

For the last five years at least, the Federal responsibilities for “block” T1 and T2 have been clearly delineated. The National Institutes of Health (NIH) predominantly concerns itself with translational research proper – the bench-to-bedside enterprise (T1); the Agency for Healthcare Research and Quality (AHRQ) focuses on the result-translation enterprise (T2). Specifically: “…the ultimate goal [of AHRQ] is research translation—that is, making sure that findings from AHRQ research are widely disseminated and ready to be used in everyday health care decision-making…” [6]. The terminology of translational effectiveness has emerged as a means of distinguishing the T2 block from T1.

Therefore, the bench-to-bedside enterprise pertains to translational research, and the result-translation enterprise describes translational effectiveness. The meta-construct of translational health care (viz., translational medicine) thus consists of these two fundamental constructs:

  • translational research and
  • translational effectiveness,

which have distinct purposes, protocols and products, while both converging on the same goal of new and improved means of

  • individualized patient-centered diagnostic and prognostic care.

It is important to note that the U.S. Patient Protection and Affordable Care Act (PPACA, 23 March 2010) has created an environment that facilitates the pursuit of translational health care because it emphasizes patient-centered outcomes research (PCOR). That is to say, it fosters the transaction between translational research (i.e., “block” T1)(TR) and translational effectiveness (i.e., “block” T2)(TE), and favors the establishment of communities of practice-research interaction. The latter, now recognized as practice-based research networks, incorporate three or more clinical practices in the community into

  • a community of practices network coordinated by an academic center of research.

Practice-based research networks may be a third “block” (T3)(PBTN) in translational health care and they could be conceptualized as a stepping-stone, a go-between bench-to-bedside translational research and result-translation translational effectiveness [7]. Alternatively, practice-based research networks represent the practical entities where the transaction between

  • translational research and translational effectiveness can most optimally be undertaken.

It is within the context of the practice-based research network that the process of bench-to-bedside can best seamlessly proceed, and it is within the framework of the practice-based research network that

  • the best evidence of results can be most efficiently translated into practice and
  • be utilized in evidence-based clinical decision-making, viz. translational effectiveness.

Translational effectiveness

As noted, translational effectiveness represents the translation of the best available evidence in the clinical practice to ensure its utilization in clinical decisions. Translational effectiveness fosters evidence-based revisions of clinical practice guidelines. It also encourages

  • effectiveness-focused,
  • patient-centered and
  • evidence-based clinical decision-making.

Translational effectiveness rests not only on the expertise of the clinical staff and the empowerment of patients, caregivers and stakeholders, but also, and

  • most importantly on the best available evidence [8].

The pursuit of the best available evidence is the foundation of

  • translational effectiveness and more generally of
  • translational medicine in evidence-based health care.

The best available evidence is obtained through a systematic process driven by

  • a research question/hypothesis that is articulated about clearly stated criteria that pertain to the
  • patient (P), the interventions (I) under consideration (C), for the sought clinical outcome (O), within a given timeline (T) and clinical setting (S).
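The PICOTS criteria listed above amount to a small structured record, which can be sketched as follows. The class and field names are hypothetical; the C slot is labeled here as the comparator, the usual reading of the acronym, which is an assumption layered on the source's wording. The example question borrows its content from the diabetes study summarized earlier, with the setting deliberately left blank to show how an incomplete question is detected.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Picots:
    patient: str
    intervention: str
    comparator: str  # assumption: "C" read as comparator
    outcome: str
    timeline: str
    setting: str

    def missing(self) -> List[str]:
        """Names of criteria that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def as_question(self) -> str:
        """Render the criteria as a single research question."""
        return (f"In {self.patient}, does {self.intervention}, compared with "
                f"{self.comparator}, improve {self.outcome} within "
                f"{self.timeline} in {self.setting}?")


question = Picots(
    patient="adults with type 2 diabetes",
    intervention="adding NT-proBNP and hs-cTnT to a risk model",
    comparator="established risk predictors alone",
    outcome="prediction of cardiovascular events",
    timeline="5 years",
    setting="",  # left blank on purpose
)

print(question.missing())
```

Checking for empty criteria before searching the literature is one simple way to enforce the "clearly stated criteria" the text calls for.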

PICOTS is tested on the appropriate bibliometric sample, with tools of measurements designed to establish the level (e.g., CONSORT) and the quality of the evidence. Statistical and meta-analytical inferences, often enhanced by analyses of clinical relevance [9], converge into the formulation of the consensus of the best available evidence. Its dissemination to all stakeholders is key to increase their health literacy in order to ensure their full participation

  • in the utilization of the best available evidence in clinical decisions, viz., translational effectiveness.

To be clear, translational effectiveness – and, in the perspective discussed above, translational health care – is anchored on obtaining the best available evidence,

  • which emerges from the highest quality research,
  • which in turn is obtained when errors are minimized.

In an early conceptualization [10], errors in research were presented as

  • those situations that threaten the internal and the external validity of a research study –

that is, conditions that impede either the study’s reproducibility, or its generalization. In point of fact, threats to internal and external validity [10] represent specific aspects of systematic errors (i.e., bias) in the

  • research design,
  • methodology and
  • data analysis.

Thence emerged a branch of science that seeks to

  • understand,
  • control and
  • reduce risk of bias in research.

Risk of bias and the best available evidence

It follows that the best available evidence comes from research with the fewest threats to internal and to external validity – that is to say, the fewest systematic errors: the lowest risk of bias. Quality of research, as defined in the field of research synthesis [11], has become synonymous with

  • low bias and contained risk of bias [12-15].

Several years ago, the Cochrane group embarked on a new strategy for assessing the quality of research studies by examining potential sources of bias. Certain original areas of potential bias in research were identified, which pertain to

(a) the sampling and the sample allocation process, to measurement, and to other related sources of errors (reliability of testing),

(b) design issues, including blinding, selection and drop-out, and design-specific caveats, and

(c) analysis-related biases.

A Risk of Bias tool was created (Cochrane Risk of Bias), which covered six specific domains:

1. selection bias,

2. performance bias,

3. detection bias,

4. attrition bias,

5. reporting bias, and

6. other research protocol-related biases.

Assessments were made within each domain by one or more items specific for certain aspects of the domain. Each item was scored in two distinct steps:

1. the support for judgment was intended to provide a succinct free-text description of the domain being queried;

2. each item was scored high, low, or unclear risk of material bias (defined here as “…bias of sufficient magnitude to have a notable effect on the results or conclusions…” [16]).

It was advocated that assessments across items in the tool should be critically summarized for each outcome within each report. These critical summaries were to inform the investigator so that the primary meta-analysis could be performed either

  • only on studies at low risk of bias, or for
  • the studies stratified according to risk of bias [16].
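The two options above (restrict the primary meta-analysis to low-risk studies, or stratify by risk) can be sketched in a few lines. The six domain names follow the list given earlier; the rule used to roll domain judgments up into one study-level judgment (worst domain wins) is an assumption made for illustration, not a rule taken from the Cochrane tool, and the study names are invented.

```python
from enum import Enum
from typing import Dict, List


class Risk(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"


DOMAINS = ("selection", "performance", "detection", "attrition", "reporting", "other")


def overall_risk(judgments: Dict[str, Risk]) -> Risk:
    """Study-level judgment; assumption: the worst domain determines the result."""
    missing = [d for d in DOMAINS if d not in judgments]
    if missing:
        raise ValueError(f"missing domain judgments: {missing}")
    values = [judgments[d] for d in DOMAINS]
    if any(v is Risk.HIGH for v in values):
        return Risk.HIGH
    if any(v is Risk.UNCLEAR for v in values):
        return Risk.UNCLEAR
    return Risk.LOW


def stratify(studies: Dict[str, Dict[str, Risk]]) -> Dict[Risk, List[str]]:
    """Group study names by their study-level risk of bias."""
    strata: Dict[Risk, List[str]] = {risk: [] for risk in Risk}
    for name, judgments in studies.items():
        strata[overall_risk(judgments)].append(name)
    return strata


all_low = {d: Risk.LOW for d in DOMAINS}
studies = {
    "study_A": dict(all_low),
    "study_B": {**all_low, "attrition": Risk.UNCLEAR},
    "study_C": {**all_low, "detection": Risk.HIGH, "reporting": Risk.UNCLEAR},
}

strata = stratify(studies)
primary_analysis = strata[Risk.LOW]  # option 1: only studies at low risk of bias
print(primary_analysis, strata[Risk.UNCLEAR], strata[Risk.HIGH])
```

Option 2 in the text corresponds to running the meta-analysis separately on each entry of `strata`.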

This is a form of acceptable sampling analysis designed to yield increased homogeneity of meta-analytical outcomes [17]. Alternatively, the homogeneity of the meta-analysis can be further enhanced by means of the more direct quality-effects meta-analysis inferential model [18].

Clearly, one among the major drawbacks of the Cochrane Risk of Bias tool is

  • the subjective nature of its assessment protocol.

In an effort to correct for this inherent weakness of the instrument, the Cochrane group produced

  • detailed criteria for making judgments about the risk of bias from each individual item [16], and
  • a recommendation that judgments be made independently by at least two people, with any discrepancies resolved by discussion [16].
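The two-rater procedure can be illustrated with a short sketch that lists the items needing discussion and summarizes agreement. Cohen's kappa is used here as a familiar chance-corrected agreement statistic; it is not named in the source, and the ratings are invented.

```python
from collections import Counter
from typing import List, Sequence


def discrepancies(rater_a: Sequence[str], rater_b: Sequence[str]) -> List[int]:
    """Indices of items on which the two raters disagree."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical item-level judgments from two independent raters.
rater_a = ["low", "low", "high", "unclear", "low", "high"]
rater_b = ["low", "high", "high", "unclear", "low", "low"]

to_discuss = discrepancies(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b)
print(to_discuss, round(kappa, 3))
```

The indices in `to_discuss` are the items that would go to consensus discussion, while kappa gives a single number for how far the instrument's subjectivity affected the first pass.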

This approach to increase the reliability of measurement in research synthesis protocols

  • is akin to that described by us [19,20] and by AHRQ [21].

In an effort to aid clinicians and patients in making effective health care related decisions, AHRQ developed an alternative Risk of Bias instrument to enable systematic evaluation of evidence reporting [22]. The AHRQ Risk of Bias instrument was created to monitor four primary domains:

1. risk of bias: design, methodology, analysis (scoring: low, medium, high)

2. consistency: extent of similarity in effect sizes across studies within a bibliome (scoring: consistent, inconsistent, unknown)

3. directness: unidirectional link between the interventions of interest and the sought outcome, as opposed to multiple links in a causal chain (scoring: direct, indirect)

4. precision: extent of certainty for estimate of effect with respect to the outcome (scoring: precise, imprecise)

In addition, four secondary domains were identified:

a. Dose response association: pattern of a larger effect with greater exposure (Present/Not Present/Not Applicable or Not Tested)

b. Confounders: consideration of confounding variables (Present/Absent)

c. Strength of association: likelihood that the observed effect is large enough that it cannot have occurred solely as a result of bias from potential confounding factors (Strong/Weak)

d. Publication bias

The AHRQ Risk of Bias instrument is also designed to yield an overall grade of the estimated risk of bias in quality reporting:

•Strength of Evidence Grades (scored as high – moderate – low – insufficient)

This global assessment, in addition to incorporating the assessments above, also rates:

–major benefit

–major harm

–jointly benefits and harms

–outcomes most relevant to patients, clinicians, and stakeholders

The AHRQ Risk of Bias instrument suffers from the same two major limitations as the Cochrane tool:

1. lack of formal psychometric validation, as with most other tools in the field [21], and

2. providing a subjective and not quantifiable assessment.

To begin the process of engaging in a systematic dialectic of the two instruments in terms of their respective construct and content validity, it is necessary

  • to validate each for reliability and validity either by means of the classic psychometric theory or generalizability (G) theory, which allows
  • the simultaneous estimation of multiple sources of measurement error variance (i.e., facets)
  • while generalizing the main findings across the different study facets.

G theory is particularly useful in clinical care analysis of this type, because it permits the assessment of the reliability of clinical assessment protocols:

  • the reliability and minimal detectable changes across varied combinations of these facets are then simply calculated [23], but
  • it is recommended that G theory determination follow classic theory psychometric assessment.

Therefore, we have commenced a process of revising the AHRQ Risk of Bias instrument by rendering questions in primary domains quantifiable (scaled 1–4),

  • which established the intra-rater reliability (r = 0.94, p < 0.05), and
  • the criterion validity (r = 0.96, p < 0.05) for this instrument (Figure 2).



Figure 2. Proportion of shared variance in criterion validity (A) and inter-rater reliability (B) in the AHRQ Risk of Bias instrument revised as described.
Two raters were trained and standardized [20] with the revised AHRQ Risk of Bias and with the R-Wong instrument, which has been previously validated [24]. Each rater independently produced ratings on a sample of research reports with both instruments on two separate occasions, 1–2 months apart. The Pearson correlation coefficient was used to compute the respective associations. The figure shows Venn diagrams to illustrate the intersection between the two sets of data used in each correlation. The overlap between the sets in each panel represents the proportion of shared variance for that correlation. The percent of unexplained variance is given in the insert of each panel.
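
The proportion of shared variance for a correlation is the square of the Pearson coefficient, and the unexplained variance is its complement. As a worked illustration only, and assuming the coefficients reported in the text (r = 0.94 for reliability, r = 0.96 for criterion validity) are the ones depicted in the panels, the arithmetic would be:

```latex
\begin{align*}
r_{\text{reliability}}^{2} &= 0.94^{2} = 0.8836 \approx 88.4\%\ \text{shared},
  & 1 - r_{\text{reliability}}^{2} &= 0.1164 \approx 11.6\%\ \text{unexplained},\\
r_{\text{criterion}}^{2} &= 0.96^{2} = 0.9216 \approx 92.2\%\ \text{shared},
  & 1 - r_{\text{criterion}}^{2} &= 0.0784 \approx 7.8\%\ \text{unexplained}.
\end{align*}
```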

A similar revision of the Cochrane Risk of Bias tool may also yield promising validation data. G theory validation of both tools will follow. Together, these results will enable a critical and systematic dialectical comparison of the Cochrane and the AHRQ Risk of Bias measures.


The critical evaluation of the best available evidence is essential to patient-centered care, because biased research findings are fundamentally invalid and potentially harmful to the patient. Depending upon the tool of measurement, the validity of an instrument in a study is obtained by means of criterion validity through correlation coefficients. Criterion validity refers to the extent to which a measure agrees with, or predicts the value of, another measure or quality based on a previously well-established criterion. There are other domains of validity, such as construct validity and content validity, that are more descriptive than quantitative. Reliability, however, describes the consistency of a measure, that is, the extent to which a measurement is repeatable. It is commonly assessed quantitatively by correlation coefficients. Inter-rater reliability is rendered as a Pearson correlation coefficient between two independent readers, and establishes equivalence of ratings produced by independent observers or readers. Intra-rater reliability is determined by repeated measurement performed by the same rater (reader) at two different points in time, to assess the correlation or strength of association of the two sets of scores.
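
To make the two reliability coefficients concrete, the following Python sketch computes a Pearson correlation between two readers (inter-rater) and between two occasions for the same reader (intra-rater). The scores are invented for illustration, and the function name `pearson_r` and the variable names are hypothetical; nothing here reproduces the data behind the reported coefficients.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical total scores given to six reports (illustrative data only).
rater_a_time1 = [12, 15, 9, 14, 11, 16]
rater_b_time1 = [11, 15, 10, 13, 12, 16]
rater_a_time2 = [12, 14, 9, 15, 11, 16]

inter_rater = pearson_r(rater_a_time1, rater_b_time1)  # two readers, same occasion
intra_rater = pearson_r(rater_a_time1, rater_a_time2)  # same reader, two occasions
print(f"inter-rater r = {inter_rater:.3f}, shared variance = {inter_rater ** 2:.1%}")
print(f"intra-rater r = {intra_rater:.3f}, shared variance = {intra_rater ** 2:.1%}")
```

Squaring either coefficient gives the proportion of shared variance between the two sets of scores.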

To establish the reliability of research quality assessment tools it is necessary, as we previously noted [20]:

•a) to train multiple readers in sharing a common view for the cognitive interpretation of each item. Readers must possess declarative knowledge (a factual form of information known to be static in nature), that is, a certain depth of knowledge and understanding of the facts about which they are reviewing the literature. They must also have procedural knowledge (also known as imperative knowledge, which can be directly applied to a task), in this case a clear understanding of the fundamental concepts of research methodology, design, analysis and inference.

•b) to train the readers to read and evaluate the quality of a set of papers independently and blindly. They must also be trained to self-monitor and self-assess their skills for the purpose of ensuring quality control.

•c) to refine the process until the inter-rater correlation coefficient and Cohen coefficient of agreement are about 0.9 (about 81% shared variance). This establishes that the degree of attained agreement among well-trained readers is beyond chance.

•d) to obtain independent and blind reading assessments from readers on reports under study.

•e) to compute the mean and standard deviation of scores for each question across the reports, and to repeat the process if the coefficient of variation is greater than 5% (i.e., the aim is less than 5% error among the readers on each question).
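
Steps (c) and (e) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: ratings are treated as categories on the 1–4 scale, the coefficient of variation is taken over the readers' mean scores for each question (one possible reading of step (e)), and all data and names (`cohen_kappa`, `coefficient_of_variation`, `reader_means`) are hypothetical.

```python
import statistics
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two readers, corrected for chance."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both readers must rate the same non-empty set of reports")
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the readers' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both readers use one single category")
    return (observed - expected) / (1 - expected)

def coefficient_of_variation(scores):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(scores) / statistics.mean(scores)

# Hypothetical ratings (scale 1-4) by two readers on ten reports, for one question.
reader_a = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
reader_b = [4, 3, 3, 2, 4, 1, 2, 3, 3, 2]
kappa = cohen_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.2f}; target of about 0.9 reached: {kappa >= 0.9}")

# Hypothetical mean score per reader for each question (three readers).
reader_means = {"Q1": [3.0, 3.1, 2.9], "Q2": [2.0, 2.6, 3.1]}
for question, means in reader_means.items():
    cv = coefficient_of_variation(means)
    action = "repeat training" if cv > 5 else "acceptable"
    print(f"{question}: CV = {cv:.1f}% ({action})")
```

Kappa is used here rather than raw percent agreement because it discounts the agreement two readers would reach by chance alone.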

The quantification provided by instruments validated in such a manner to assess the quality and the relative lack of bias in the research evidence allows for the analysis of the scores by means of the acceptable sampling protocol. Acceptance sampling is a statistical procedure that uses statistical sampling to determine whether a given lot, in this case evidence gathered from an identified set of published reports, should be accepted or rejected [12,25]. Acceptable sampling of the best available evidence can be obtained by:

•convention: accept the top 10 percentile of papers based on the score of the quality of the evidence (e.g., low Risk of Bias);

•confidence interval (CI95): accept the papers whose scores fall at or beyond the upper confidence limit at 95%, obtained with mean and variance of the scores of the entire bibliome;

•statistical analysis: accept the papers that sustain sequential repeated Friedman analysis.
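
The first two acceptance rules lend themselves to a short Python sketch. It assumes that a higher score means better quality (lower risk of bias), and that the upper 95% confidence limit is that of the mean score under a normal approximation (z = 1.96); the source does not specify either detail. Report labels, scores, and function names are hypothetical.

```python
import math
import statistics

def accept_top_decile(scores):
    """Convention rule: keep the reports ranked in the top 10 percent by score."""
    n_keep = max(1, math.ceil(len(scores) / 10))
    cutoff = sorted(scores.values(), reverse=True)[n_keep - 1]
    return {name for name, score in scores.items() if score >= cutoff}

def accept_above_ci95(scores, z=1.96):
    """CI95 rule: keep reports at or beyond the upper 95% confidence limit of the mean."""
    values = list(scores.values())
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    upper_limit = mean + z * std_err
    return {name for name, score in scores.items() if score >= upper_limit}

# Hypothetical quality scores for a bibliome of ten reports.
scores = {
    "R01": 22, "R02": 18, "R03": 15, "R04": 20, "R05": 12,
    "R06": 17, "R07": 19, "R08": 14, "R09": 16, "R10": 13,
}
print("top decile:", sorted(accept_top_decile(scores)))
print("above CI95:", sorted(accept_above_ci95(scores)))
```

With these made-up scores the convention rule retains a single report, while the CI95 rule retains three, which shows how the choice of rule changes the size of the accepted sample.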

To be clear, the Friedman test is a non-parametric counterpart of the repeated-measures analysis of variance, based on ranks within blocks. The analysis follows the 4-E process outlined below:

•establishing a significant Friedman outcome, which indicates significant differences in scores among the individual reports being tested for quality;

•examining marginal means and standard deviations to identify inconsistencies, and to identify the uniformly strong reports across all the domains tested by the quality instrument;

•excluding those reports that show quality weakness or bias;

•executing the Friedman analysis again, and repeating the 4-E process as many times as necessary, in a statistical process akin to hierarchical regression, to eliminate the evidence reports that exhibit egregious weakness, based on the analysis of the marginal values, and to retain only the group of reports that harbor homogeneously strong evidence.
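
The 4-E cycle can be sketched in Python as below. Several choices are assumptions rather than the authors' protocol: the instrument items act as blocks and the reports as the groups being compared, the statistic omits the correction for ties, one report (the one with the lowest marginal mean) is excluded per cycle, and the cycle stops when the test is no longer significant or only two reports remain. The chi-square tail probability is computed with a simple series so that the sketch needs only the standard library; all names and scores are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 (lowest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (series for the lower tail)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def friedman(scores):
    """Friedman statistic and p-value; scores maps report -> list of item scores."""
    names = list(scores)
    k, n = len(names), len(scores[names[0]])
    rank_sums = [0.0] * k
    for item in range(n):
        row = [scores[name][item] for name in names]  # rank reports within one item
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    q = 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    return q, chi2_sf(q, k - 1)

def four_e(scores, alpha=0.05):
    """Repeat establish / examine / exclude / execute until reports look homogeneous."""
    kept = dict(scores)
    while len(kept) > 2:
        _, p_value = friedman(kept)                       # establish
        if p_value >= alpha:
            break
        means = {name: sum(v) / len(v) for name, v in kept.items()}  # examine
        del kept[min(means, key=means.get)]               # exclude the weakest report
    return kept                                           # loop = execute again

# Hypothetical scores (scale 1-4) of five reports on six instrument items.
example = {
    "A": [4, 4, 3, 4, 4, 3], "B": [4, 3, 4, 4, 3, 4], "C": [3, 4, 4, 3, 4, 4],
    "D": [2, 1, 2, 1, 2, 1], "E": [1, 2, 1, 2, 1, 2],
}
q, p = friedman(example)
print(f"Friedman Q = {q:.2f}, p = {p:.4f}")
print("retained reports:", sorted(four_e(example)))
```

For real analyses a tested routine such as `scipy.stats.friedmanchisquare` would be preferable to this hand-rolled version; the sketch is only meant to show the shape of the iterative exclusion.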

Taken together, and considering the domain and the structure of both tools, expectations are that these analyses will confirm that these instruments are two related entities, each measuring distinct aspects of bias. We anticipate that future research will establish that both tools assess complementary sub-constructs of one and the same archetype meta-construct of research quality.


  1. Jiang F, Zhang J, Wang X, Shen X: Important steps to improve translation from medical research to health policy.

    J Trans Med 2013, 11:33.

  2. Sung NS, Crowley WF Jr, Genel M, Salber P, Sandy L, Sherwood LM, Johnson SB, Catanese V, Tilson H, Getz K, Larson EL, Scheinberg D, Reece EA, Slavkin H, Dobs A, Grebb J, Martinez RA, Korn A, Rimoin D: Central challenges facing the national clinical research enterprise.

    JAMA 2003, 289:1278-1287.

  3. Woolf SH: The meaning of translational research and why it matters.

    JAMA 2008, 299(2):211-213.

  4. Chiappelli F: From translational research to translational effectiveness: the “patient-centered dental home” model.

    Dental Hypotheses 2011, 2:105-112.

  5. Maida C: Building communities of practice in comparative effectiveness research. In Comparative effectiveness and efficacy research and analysis for practice (CEERAP): applications for treatment options in health care. Edited by Chiappelli F, Brant X, Cajulis C. Heidelberg: Springer–Verlag; 2012.

    Chapter 1

  6. Agency for Healthcare Research and Quality: Budget estimates for appropriations committees, fiscal year (FY) 2008: performance budget submission for congressional justification.

    Performance budget overview 2008. Accessed 11 May 2013


  7. Westfall JM, Mold J, Fagnan L: Practice-based research—“blue highways” on the NIH roadmap.

    JAMA 2007, 297:403-406.

  8. Chiappelli F, Brant X, Cajulis C: Comparative effectiveness and efficacy research and analysis for practice (CEERAP) applications for treatment options in health care. Heidelberg: Springer–Verlag; 2012.

  9. Dousti M, Ramchandani MH, Chiappelli F: Evidence-based clinical significance in health care: toward an inferential analysis of clinical relevance.

    Dental Hypotheses 2011, 2:165-177.

  10. Campbell D, Stanley J: Experimental and quasi-experimental designs for research. Chicago, IL: Rand-McNally; 1963.

  11. Littell JH, Corcoran J, Pillai V: Research synthesis reports and meta-analysis. New York, NY: Oxford University Press; 2008.

  12. Chiappelli F: The science of research synthesis: a manual of evidence-based research for the health sciences. Hauppauge NY: NovaScience Publisher, Inc; 2008.

  13. Higgins JPT, Green S: Cochrane handbook for systematic reviews of interventions version 5.0.1. Chichester, West Sussex, UK: John Wiley & Sons. The Cochrane collaboration; 2008.

  14. CRD: Systematic Reviews: CRD’s guidance for undertaking reviews in health care. National Institute for Health Research (NIHR). University of York, UK: Center for reviews and dissemination; 2009.

  15. McDonald KM, Chang C, Schultz E: Closing the quality Gap: revisiting the state of the science. Summary report. U.S. Department of Health & Human Services. AHRQ, Rockville, MD: Summary report. AHRQ publication No. 12(13)-E017; 2013.
