Critical Appraisal Dictionary

This page contains the 50 most commonly used epidemiological terms. Understanding these terms will really help demystify critical appraisal for you. Each answer will give you a definition and an example.

Absolute Risk Reduction

A measure of effect. Incidence of outcome in control group minus incidence of outcome in intervention group.

Example:

Accuracy

How often is a test truly positive or negative, as a proportion of all tests.

Also (A+D) / (A+B+C+D). This is not a very useful test characteristic.

Tests are best interpreted when the sensitivity and specificity are reported.
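The accuracy formula can be sketched in Python as follows. The letters are not defined on this page, so the layout used here is an assumption inferred from the other formulas in this dictionary: A = true positives, B = false positives, C = false negatives, D = true negatives. The counts are invented purely for illustration.

```python
def accuracy(a, b, c, d):
    """Proportion of all tests that are correct: (A + D) / (A + B + C + D)."""
    return (a + d) / (a + b + c + d)


# Hypothetical counts, invented for illustration only.
# Test 1 misses 30 of the 50 patients who have the disease.
test_1 = accuracy(a=20, b=5, c=30, d=45)
# Test 2 misses only 5 of the 50, but gives many more false positives.
test_2 = accuracy(a=45, b=30, c=5, d=20)

print(f"Test 1 accuracy: {test_1:.2f}")
print(f"Test 2 accuracy: {test_2:.2f}")
```

Both hypothetical tests have the same accuracy even though they behave very differently, which is why accuracy on its own says little about ruling a disease in or out.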

Example

The Ottawa ankle rule has 65% accuracy for detecting ankle fractures.

This doesn’t tell you whether the test is sufficiently sensitive to rule in or rule out a fracture.

Bias

A systematic difference in the way a study is conducted that leads to a less valid result.

Example

A trial of Non-Invasive Ventilation was done to see if it improved mortality in patients with Acute Cardiac Pulmonary Oedema. The controls had one set of measurements recorded by existing nursing staff. The cases had a consultant cardiologist and intensivist next to them and underwent several measurements. The cases had a better survival. Here, there are several biases. Firstly, the patient and the person recording the data are aware of the treatment allocation. Secondly, there are other systematic differences, other than the Non-Invasive Ventilation, that might account for a better outcome. More measurements were made on the cases than the controls.

Link

http://radiology.rsna.org/content/238/3/780.full

Bias-ascertainment

A bias that results when measurements on the subjects are performed differently.

Example

A trial of Non-Invasive Ventilation was done to see if it improved mortality in patients with Acute Cardiac Pulmonary Oedema. In this trial, the patients who got NIV had the data recorded by an automated monitoring machine, while the controls had data taken from the nursing and medical records. This means that any differences between the groups may be due to the different ways they were measured.

Bias-expectation

A bias that results when the people recording the data are influenced by their own knowledge.

Example

A trial of Non-Invasive Ventilation was done to see if it improved oxygenation in patients with Acute Cardiac Pulmonary Oedema. The researchers could see whether a patient had an NIV mask on or not, and this might influence how they record saturations.

Link

http://radiology.rsna.org/content/238/3/780.full

Bias-follow up

A bias that results when the losses to follow up differ between the two groups.

Example

A trial of Non-Invasive Ventilation was done to see if it improved symptoms in patients with Acute Cardiac Pulmonary Oedema. They followed up all the patients after two weeks. One group had many more losses to follow up. This might be because all these patients had died. See also Intention to treat analysis.

Bias-inclusion

A bias that results when subjects are included into groups not at random.

Example

A trial of Non-Invasive Ventilation was done to see if it improved mortality in patients with Acute Cardiac Pulmonary Oedema. They assigned patients to NIV or Standard treatment on alternate days. People who present on Mondays tend to be sicker, so they are not comparing like with like.

Link

http://radiology.rsna.org/content/238/3/780.full

Bias-spectrum

A bias that results when the subjects under study are from the extreme ends of the patient spectrum. This inflates sensitivities and specificities.

Example

A study reported a questionnaire to identify harmful alcohol drinking. They recruited end stage cirrhotics and Olympic triathletes. They found that their questionnaire had excellent sensitivity and specificity, but when the scale was used in an emergency department on a different population, it was less good.

Link

http://radiology.rsna.org/content/238/3/780.full

Case Control Study

A type of observational research study design. Subjects who have the outcome are compared to controls and then the researchers look back to see which had the exposures.

(The exposures are unknown at the beginning, but the outcomes are known.) There must be a time delay, however short, between the recording of the exposures and the outcome, otherwise this is a cross-sectional study.

Example

Researchers wanted to identify risk factors for developing necrotising fasciitis. They compared cases of necrotising fasciitis against controls who had a discharge diagnosis of cellulitis. They then looked back through the hospital notes, blood results and so on to find what features were more likely in people with necrotising fasciitis.

Cohort Study

A type of observational research study design. Subjects have exposures recorded about them at the beginning and are then followed up over time to see who develops the outcome of interest. (The exposures are known at the beginning, but the outcomes are unknown.) There must be a time delay, however short, between the recording of the exposures and the outcome, otherwise this is a cross-sectional study.

Note, a cohort study can be prospective, or retrospective.

Example

The Canadian C-Spine study collected data about patients with suspected cervical spine fractures and followed them up to see which features were associated with fractures. This is a cohort study.

References and Links

The Canadian C-Spine Rule for Radiography in Alert and Stable Trauma Patients. JAMA. 2001;286(15):1841-1848. doi:10.1001/jama.286.15.1841.

http://jama.jamanetwork.com/article.aspx?articleid=194296

Confidence Interval

A range of uncertainty or ‘margin of error’ around an estimate. Usually 95% confidence intervals are reported. Loosely, this means that the authors are 95% sure that the true effect lies between the two confidence limits.

Example

A study reported that the treatment led to a relative risk of 0.67 (95% Confidence Interval 0.52-0.73). This indicates that the benefit was somewhere between these two numbers and, because the confidence interval does not cross 1, that it was statistically significant.

Confounding

A confounding variable is one which is associated with both the exposure and outcome of interest. This can lead to inappropriate conclusions about whether an association exists.

Example

A paper reported better survival rates after myocardial infarction in hospital A, compared to hospital B. However, the patients who went to hospital A were younger than those who went to hospital B. In this case, age (the confounding variable) is associated with the outcome of interest (good outcome) and also the variable of interest (either hospital). See also multivariate analysis.

CONSORT statement

A standard way of reporting randomised controlled trials. It stands for Consolidated Standards of Reporting Trials.

Link

http://www.consort-statement.org/index.aspx?o=1280&m=5&y=2008

Effectiveness

How well an intervention works in the real world.

Example

A trial of Non-Invasive Ventilation was done to see if it improved mortality in patients with Acute Cardiac Pulmonary Oedema. The study was performed on breathless patients in several emergency departments. The patients were attended to by the usual medical and nursing team on duty at that time.

Efficacy

How well an intervention works under ideal conditions.

Example

A trial of Non-Invasive Ventilation was done to see if it improved mortality in patients with Acute Cardiac Pulmonary Oedema. The study was performed on carefully selected patients in an academic teaching hospital coronary care unit. The patients were attended to by a Consultant cardiologist, Consultant respiratory physician and a research nurse. These are clearly ideal circumstances and you might hesitate to apply the findings here to your own patients.

Explanatory

An explanatory study is concerned with finding out how an intervention works.

Example

A trial of Non-Invasive Ventilation was done to see if it improved myocardial contractility on echo in patients with Acute Cardiac Pulmonary Oedema.

Grade of Recommendation

A Guideline makes recommendations. The Grade of a recommendation is a measure of how much evidence supports this recommendation. See also the Levels of Evidence.

Example

NICE prepared a guideline on head injuries. They stated that all head injured patients with a GCS of less than 8 should have a pre-alert made by the paramedics. There is no real evidence behind this, but it reflects a consensus of sensible experts. This was a Grade D recommendation.

Link

http://www.sicsebm.org.uk/jargon.htm

Heterogeneity

Heterogeneity is the amount of variation in the results of trials included in a systematic review. The less heterogeneity there is, the more likely the result is to be true. This can be assessed statistically in two ways. Cochran’s Q test indicates significant heterogeneity if its p-value is less than 0.05. An I² value ranges from 0 to 1.0 (often reported as 0% to 100%); smaller numbers indicate less heterogeneity.

Example

The Forest plot shows a meta-analysis of randomised controlled trials of giving dexamethasone to patients with severe migraine. This is a homogeneous result, as all the studies are on the same side of the line. The I² score is very low and the test for heterogeneity does not find a statistically significant result.

The Forest plot below shows a very heterogeneous group of studies, with many studies being either side of the line.

Link

http://www.sicsebm.org.uk/jargon.htm

Intention to Treat

An analytical technique to mitigate the effect of cross overs and losses to follow-up. Subjects are analysed in the group that they were originally allocated to, not where they ended up.

Example

A trial of Non-Invasive Ventilation was done to see if it improved mortality in patients with Acute Cardiac Pulmonary Oedema, compared to standard therapy. Initial results suggested that the patients who received NIV did much worse, with more deaths. What had happened was that patients who were initially allocated to standard treatment and who were getting worse were then given NIV by their treating doctors. When an intention to treat analysis was used, there was no difference between the two groups.

JADAD

The JADAD scale is a score of the methodological quality of a trial. A perfectly conducted double blind randomised controlled trial scores 5. A biased and unblinded trial would score 0.

Example

Early Goal Directed Therapy for Severe Sepsis, by Rivers et al (http://content.nejm.org/cgi/content/short/345/19/1368), scored 3. It lost points because this was an unblinded randomised controlled trial.

Kappa

A measure of concordance between observers. 1 is perfect agreement and 0 is agreement no better than chance. The higher the Kappa value, the better the agreement.

Example

Two paediatric radiologists were asked to say whether 10 chest x-rays were normal or not. The Kappa value was 0.3 indicating substantial disagreement.
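As a rough sketch of how a Kappa value is worked out, the Python below assumes Cohen's kappa for two observers (the page does not say which kappa statistic is meant). The ratings are invented for illustration and are not the data behind the example above.

```python
def cohen_kappa(rater_1, rater_2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_1)
    observed = sum(1 for x, y in zip(rater_1, rater_2) if x == y) / n
    categories = set(rater_1) | set(rater_2)
    # Agreement expected by chance, from each rater's own category frequencies.
    expected = sum(
        (rater_1.count(c) / n) * (rater_2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical reports on 10 chest x-rays (N = normal, A = abnormal).
radiologist_1 = ["N", "N", "N", "N", "N", "A", "A", "A", "A", "A"]
radiologist_2 = ["N", "N", "N", "A", "A", "N", "A", "A", "A", "A"]

kappa = cohen_kappa(radiologist_1, radiologist_2)
print(f"Kappa: {kappa:.2f}")
```

In this made-up data the two radiologists agree on 7 of 10 films, but half of that agreement would be expected by chance alone, so the Kappa value is well below 0.7.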

Level of Evidence

A way of deciding how strong the evidence supporting a treatment or investigation is. This can apply to both interventions and diagnosis.

Example

A meta-analysis of high quality randomised controlled trials provides stronger evidence of whether something works or not than a case series.

The table below shows the order suggested by the Centre for Evidence Based Medicine.

Link

http://www.cebm.net/index.aspx?o=1025

Likelihood Ratio of Negative Test

This is a useful measure to find out how good a test is at excluding disease.

It is calculated by working out (1 – sensitivity) / specificity.

This results in a number less than one. The smaller the number, the better the test is at ruling out a disease. Ratios from one to 0.2 are pretty useless, 0.2-0.1 can be useful and ratios less than 0.1 are very useful.

The advantage of likelihood ratios is that they do not change with disease prevalence, unlike positive and negative predictive values. This means that you can apply results from a study with a different disease prevalence and make a judgement as to how useful it would be in your own population.

The likelihood ratio of a positive skull x-ray (visible fracture) is 7.7 for detecting an intracranial injury. This means that an intracranial injury is much more likely in a head injured patient who has a skull fracture. Here a skull x-ray is useful for ruling in an intracranial injury, but as you know, skull x-rays are not good at ruling out an intracranial injury.

Link

http://www.medicine.ox.ac.uk/bandolier/band80/b80-4.html

Likelihood Ratio of Positive Test

This is a measure of how good a test is at ruling in a disease.

It is worked out by calculating:

Sensitivity / (1 – specificity)

This results in a number greater than one. The bigger the number, the better the test is at ruling in a disease. Ratios from one to five are pretty useless, five to ten can be useful and ratios greater than ten are very useful.
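The two likelihood ratio formulas can be sketched in Python. The function name is made up for this illustration, and the inputs are the Monospot figures quoted elsewhere in this dictionary (80% sensitive, 99% specific).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR of a positive test, LR of a negative test)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Monospot figures as quoted in this dictionary.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.80, specificity=0.99)
print(f"LR of a positive test: {lr_pos:.1f}")
print(f"LR of a negative test: {lr_neg:.2f}")
```

On the rules of thumb given here, a positive ratio of about 80 is very useful for ruling in, while a negative ratio of about 0.2 is only borderline for ruling out.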

Meta-analysis

A mathematical process of combining together the results from studies answering the same question. A meta-analysis of well conducted randomised controlled trials is generally regarded as the strongest form of evidence. A meta-analysis is more than just adding the results together. The maths can be quite complex.
A fixed effects meta-analysis is performed if the results are homogeneous (see figure 1), or a random effects meta-analysis where the results are heterogeneous (see figure 2).
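To give a flavour of the maths, here is a minimal sketch of a fixed effects meta-analysis using inverse-variance weighting, one common approach. It also returns Cochran's Q and I² as measures of heterogeneity. The three trials, their log relative risks and their standard errors are all hypothetical; real analyses would normally use a dedicated statistics package.

```python
import math


def fixed_effect_meta_analysis(estimates, standard_errors):
    """Inverse-variance pooled estimate, its standard error, Cochran's Q and I-squared."""
    weights = [1 / se ** 2 for se in standard_errors]  # more precise trials count for more
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical trials: log relative risks and their standard errors.
log_rr = [-0.40, -0.30, -0.50]
se = [0.20, 0.10, 0.25]

pooled, pooled_se, q, i_squared = fixed_effect_meta_analysis(log_rr, se)
print(f"Pooled relative risk: {math.exp(pooled):.2f}")
print(f"Cochran's Q: {q:.2f}, I-squared: {i_squared:.2f}")
```

The trial with the smallest standard error dominates the pooled result, which is the sense in which a meta-analysis is more than just adding the results together.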

Example

A meta-analysis was done to see if implantable defibrillators prevented death in patients with non-ischaemic cardiomyopathy.

Link

http://jama.ama-assn.org/content/292/23/2874.full

Negative Predictive Value

If a patient has a negative test, how likely is it that they do not have the disease? This is worked out by calculating D / (C+D).

This is useful because it allows you to judge how this test will perform in your population. It is closely related to the post test probability of a negative test: the chance that the patient still has the disease after a negative test is 1 – negative predictive value.

Example

A patient presents with a possible DVT. You perform a Wells score and decide that the patient has a high pre-test probability. The chance, or pre-test probability, that this patient has a DVT is 30%. If the d-dimer comes back negative, the post test probability of a DVT is 10%, so the negative predictive value is 90%.
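The step from pre-test to post test probability can be sketched in Python by converting the probability to odds, multiplying by a likelihood ratio and converting back. The likelihood ratio of 0.26 for a negative d-dimer is a hypothetical value chosen only so that the sketch lands near the 10% in the example; it is not taken from a study.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post test probability using a likelihood ratio."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pre-test probability of 30% from the example; LR of 0.26 is an assumed value.
probability_of_dvt = post_test_probability(0.30, 0.26)
print(f"Chance of DVT after a negative test: {probability_of_dvt:.0%}")
print(f"Chance of no DVT after a negative test: {1 - probability_of_dvt:.0%}")
```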

Number Needed to Harm

The inverse of the absolute risk difference when the intervention is worse than the control, that is, the inverse of the absolute risk increase.

Example

100 major trauma patients receive ‘superclot’ and 100 receive placebo. 20 in the treatment group get a pulmonary embolism, compared with 10 in the placebo group. The absolute risk increase is 20/100 – 10/100 = 10/100 (or 10%). The number needed to harm is 1/0.1 = 10. This means that you need to treat 10 patients with ‘superclot’ to cause one pulmonary embolism.

Number Needed to Treat

The inverse of the Absolute Risk Reduction. This is useful for deciding how many patients you need to treat to prevent one outcome.

Example

100 major trauma patients receive ‘superclot’ and 100 receive placebo. 20 die in the treatment group and 30 die in the placebo group. The absolute risk reduction is 30/100 – 20/100 = 10/100 (or 10%). The number needed to treat is 1/0.1 = 10. That is, you need to treat 10 patients with ‘superclot’ to save one life.
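The arithmetic in this example can be written as a short Python sketch. The function names are made up for illustration; the numbers are those of the example.

```python
def absolute_risk_reduction(events_control, n_control, events_treated, n_treated):
    """Risk in the control group minus risk in the treatment group."""
    return events_control / n_control - events_treated / n_treated


def number_needed_to_treat(arr):
    """The inverse of the absolute risk reduction."""
    return 1 / arr


# 30 of 100 die on placebo, 20 of 100 die on treatment.
arr = absolute_risk_reduction(30, 100, 20, 100)
nnt = number_needed_to_treat(arr)
print(f"Absolute risk reduction: {arr:.0%}")
print(f"Number needed to treat: {nnt:.0f}")
```

The same arithmetic gives a number needed to harm when the outcome is more common in the treatment group; the subtraction is simply done the other way round.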

Observational Study

A type of research study where no intervention that might alter the outcome status of a patient is performed. The picture shows some commonly used study designs in clinical research.

Example

The Canadian C-Spine study collected data about patients with suspected cervical spine fractures and followed them up to see which features were associated with fractures. This is an observational study, as the researcher didn’t do anything to the patients.

Odds Ratio

A measure of effect from case control studies. This is ad / bc. It is also the output from a logistic regression model, which can be used in other study designs. An Odds ratio of 1 means no effect.
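A minimal Python sketch of the ad / bc calculation follows. The table layout is an assumption, since the page does not define the letters: a = cases with the exposure, b = controls with the exposure, c = cases without it, d = controls without it. The counts are invented for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts, invented for illustration only.
# a = 18 cases exposed,   b = 10 controls exposed
# c = 12 cases unexposed, d = 30 controls unexposed
result = odds_ratio(a=18, b=10, c=12, d=30)
print(f"Odds ratio: {result:.1f}")
```

An odds ratio above 1 means the exposure is more common among cases than controls; exactly 1 means no association.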

Example 1

Some researchers tried to identify what features were useful to distinguish between cellulitis and necrotising fasciitis. They compared necrotising fasciitis (cases) to cellulitis (controls). They found that inability to weight bear had an Odds ratio of 4.5 for necrotising fasciitis.

Example 2

A randomised controlled trial allocated major trauma patients to receive ‘superclot’ or placebo. There were significant differences in the baseline characteristics, so the authors did a regression analysis to control for this. They found that superclot use was associated with an Odds ratio of death of 0.67. They concluded that superclot was a good treatment.

Positive Predictive Value

If a patient has a positive test, how likely is it that they have the gold standard diagnosis? Also A / (A+B). This is also sometimes termed ‘post test probability of a positive test’.
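Predictive values depend on how common the disease is. The Python sketch below shows this for a hypothetical test that is 90% sensitive and 90% specific, applied at two assumed prevalences. All of the numbers are invented for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test, two different populations.
ppv_common, npv_common = predictive_values(0.90, 0.90, prevalence=0.50)
ppv_rare, npv_rare = predictive_values(0.90, 0.90, prevalence=0.05)
print(f"Prevalence 50%: PPV {ppv_common:.2f}, NPV {npv_common:.2f}")
print(f"Prevalence 5%:  PPV {ppv_rare:.2f}, NPV {npv_rare:.2f}")
```

The positive predictive value falls sharply when the disease is rare, even though the test itself has not changed.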

Power

The chance of finding a difference between study groups if one truly exists. Also 1 – the type 2 error rate.

Example

A randomised controlled trial was described as having a power of 80%. This means that there is an 80% chance of detecting a difference between groups, if such a difference truly existed. The smaller the difference between groups, the larger the study has to be to find a difference.
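The link between the size of the difference and the size of the study can be sketched with a standard large-sample approximation for comparing two proportions. This is one of several formulas in use, and the event rates below are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a difference in two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    difference = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# Hypothetical mortality rates: 30% on control, 20% or 25% on treatment.
n_big_difference = sample_size_per_group(0.30, 0.20)
n_small_difference = sample_size_per_group(0.30, 0.25)
print(f"Per group for a 10% difference: {n_big_difference}")
print(f"Per group for a 5% difference:  {n_small_difference}")
```

Halving the difference you want to detect roughly quadruples the number of patients needed.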

Pragmatic

A pragmatic study is concerned with the overall effectiveness of an intervention, regardless of the mechanism.

Example

A trial of Non-Invasive Ventilation was done to see if it improved mortality in patients with Acute Cardiac Pulmonary Oedema. The primary outcome was survival at 30 days. Here the outcome is pragmatic, rather than explanatory.

Pre-Test Probability

The chance a patient has a disease before you do a test. Before any clinical assessment, this is the same as the disease prevalence in the population being tested.

Example

A patient presents with a possible DVT. You perform a Wells score and decide that the patient has a high pre-test probability. The chance, or pre-test probability, that this patient has a DVT is 30%.

Precision

This is a measure of the amount of random error around an estimate. A precise estimate is one with narrow confidence intervals.

Example 1

A trial found that NIV was better than standard treatment at preventing death. The relative risk of death was 0.8 (95% Confidence Interval 0.78-0.82). Here the estimate is very precise.

Example 2

A smaller study found that NIV was better than standard treatment at preventing death. The relative risk of death was 0.8 (95% Confidence Interval 0.2-0.99). Here the confidence interval is wider and this estimate is less precise.
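How study size drives precision can be sketched by calculating a 95% confidence interval for a relative risk with a common large-sample method based on the log of the relative risk. The trial counts are hypothetical and are not those of the two examples above.

```python
import math


def relative_risk_with_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Relative risk with an approximate 95% confidence interval (log method)."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Same death rates (20% versus 30%) in a small and a large hypothetical trial.
small_trial = relative_risk_with_ci(20, 100, 30, 100)
large_trial = relative_risk_with_ci(200, 1000, 300, 1000)
for name, (rr, lower, upper) in [("Small", small_trial), ("Large", large_trial)]:
    print(f"{name} trial: RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Both trials give the same relative risk, but only the larger one has an interval narrow enough to exclude 1.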

Prevalence

The total number of cases who have a disease at a given time, divided by the at risk population.

Example

The prevalence of acute coronary syndrome in patients attending the emergency department with chest pain was 20%.

Quality

‘Quality’ as applied to research studies

An assessment of how well bias has been minimised in a study.

Example

There are a variety of scores for this.

The JADAD score is used to grade randomised controlled trials, 5 being the perfect, double blind randomised controlled trial and 0 being awful.

The QUADAS scale is used to grade diagnostic studies.

Receiver Operating Characteristic (ROC) Curve

A graphical plot of sensitivity against 1-specificity.

This is useful for describing how good a diagnostic test is.

The greater the area under the curve, the better the diagnostic test is.
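As a minimal sketch, the Python below builds the points of a ROC curve by trying every possible cut-off for a test result, then measures the area under the curve with the trapezium rule. The six test scores and diagnoses are invented for illustration.

```python
def roc_points(scores, has_disease):
    """Return (1 - specificity, sensitivity) pairs, one for each possible cut-off."""
    positives = sum(has_disease)
    negatives = len(has_disease) - positives
    points = [(0.0, 0.0)]
    for cut_off in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, d in zip(scores, has_disease) if d and s >= cut_off)
        false_pos = sum(1 for s, d in zip(scores, has_disease) if not d and s >= cut_off)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezium rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test scores and whether each patient truly has the disease.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
has_disease = [True, True, False, True, False, False]

auc = area_under_curve(roc_points(scores, has_disease))
print(f"Area under the curve: {auc:.2f}")
```

An area of 1.0 would be a perfect test and 0.5 is no better than tossing a coin.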

Example

Regression

A statistical technique where all known confounders are held constant so that an association between variables can be examined.

Example

You want to compare the survival rates after major trauma between hospital A and B. Confounding variables might include age, co-morbidity, severity of injury and so on. These variables will need to be adjusted for in a regression model to examine any association between survival and hospital site.

Relative Risk Ratio

A measure of effect.

Incidence of outcome in intervention group divided by the incidence of outcome in control group.

Example

100 patients receive a treatment and 100 receive placebo.

20 die in the treatment group and 30 die in the placebo group.

The relative risk of death is (20/100) / (30/100) = 0.67.

Reliability

How repeatable a study is.

Example

A series of small randomised trials were done to see if dexamethasone reduced further headache in patients with migraine. All of the trials had similar results, so these were reliable. This is not the same as validity.

Link

http://www.bmj.com/cgi/reprint/bmj.39566.806725.BEv1

Sensitivity

How many true positives does the test pick up, as a proportion of all people who have the disease? Also A / (A+C).

Example

The Ottawa ankle rule has 99.9% sensitivity for detecting an ankle fracture. This means that the Ottawa ankle rule is positive in 99.9% of people who have an ankle fracture. This means that you can use this to rule out an ankle fracture, if the test is negative. (A sensitive test that is negative can rule out a condition: SNOUT.)

Sensitivity Analysis

An analytical technique to mitigate the effect of outliers. The values obtained from extreme outliers are removed and the analysis is re-run.

Example

An observational study was done to see whether patients with major head injury did better on a general ICU or neurosurgical ICU. Initial results suggested that those on the general ICU did much worse. When those who died within 24 hours were excluded, the effect was not as strong. This was because there were more unsurvivable cases going to the general ICU, having been turned down by the neurosurgical ICU.

Specificity

How many true negatives does the test pick up, as a proportion of all people who do not have the disease? Also D / (B+D).
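Sensitivity and specificity can both be worked out from the same 2x2 table, as this Python sketch shows. The layout is an assumption inferred from the formulas in this dictionary: A = true positives, B = false positives, C = false negatives, D = true negatives. The counts are invented for illustration.

```python
def sensitivity(a, c):
    """True positives as a proportion of everyone with the disease: A / (A + C)."""
    return a / (a + c)


def specificity(b, d):
    """True negatives as a proportion of everyone without the disease: D / (B + D)."""
    return d / (b + d)


# Hypothetical study: 100 people with the disease and 100 without.
a, b, c, d = 80, 1, 20, 99

print(f"Sensitivity: {sensitivity(a, c):.0%}")
print(f"Specificity: {specificity(b, d):.0%}")
print(f"False positive rate: {1 - specificity(b, d):.0%}")
```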

Example 1

The Ottawa ankle rule has 31% specificity for ankle fractures.

This means that the Ottawa ankle rule is positive in 69% of people who have no fracture.

A low specificity means that there are lots of false positives.

Example 2

The Monospot test is 99% specific for Infectious Mononucleosis, but only 80% sensitive.

This means that a positive Monospot test allows you to be very confident that someone has Infectious Mononucleosis (high specificity).

However, a patient could have a negative Monospot test and still have Infectious Mononucleosis (low sensitivity).

A very specific test which is positive allows you to rule a case in. This is sometimes termed ‘SPIN’.

Systematic Review

A scientific experiment where studies examining a specific question are systematically searched for, appraised and graded to come to a conclusion. This differs from a narrative review, where a respected expert puts opinions forward.

Example

Is paracetamol better than ibuprofen at reducing fever in children with fever? A systematic review of randomised controlled trials.

Type 1 Error

Finding a difference when, in truth, there is none.

A small trial found NIV was better than CPAP at reducing mortality in patients with acute pulmonary oedema. Larger, better trials showed that this finding was not true. In this case, the small study reported a type 1 error.

Type 2 Error

Failing to find a difference when, in truth, there is one.

Example

A small trial found that there was no difference between two painkillers, A and B, for patients with broken limbs. Larger, better trials showed that painkiller A was better than painkiller B. In this case, the small study reported a type 2 error.

Validity

How likely is a result from a study to be true.

Example

A case report suggested that recombinant factor VIIa might save lives in major haemorrhage. Subsequent randomised trials showed that the overall mortality after major haemorrhage was unchanged regardless of whether factor VIIa or placebo was provided. In this case, the randomised controlled trial is providing a more valid answer than a case report.

Validity External

How well can the findings from a study be applied to another population? Also known as ‘generalisability’.

Example

A randomised trial of early goal directed therapy, compared to standard treatment, for severe sepsis found that it reduced mortality dramatically (http://content.nejm.org/cgi/content/short/345/19/1368). The control group had a much higher mortality than most other hospitals. It is not clear whether this can be applied to other hospitals. The external validity of this is low.

Validity Internal

How true a study result is. In other words, how well was confounding and bias dealt with?

A randomised trial of early goal directed therapy, compared to standard treatment, for severe sepsis found that it reduced mortality dramatically. The trial was conducted rigorously, with little bias. The internal validity is high.
