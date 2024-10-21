Forest Plot Generation in R - Tilburg Science Hub (2024)

Overview

The goal of this building block is to provide a guide to creating basic forest plots, comparing the results of different studies, and assessing the significance of their pooled effect.

Forest plots are a visual representation that summarises findings from various scientific studies investigating a common research question. They are widely used in meta-analysis, a type of statistical analysis that combines and examines results from a number of independent studies. More practically, forest plots take a statistic that is common to such a set of studies and report each study's value of that statistic. This makes it possible to compare the individual results and to assess the significance of the overall pooled summary effect.

Among the benefits of forest plots, we find:

  • clear and concise visual representation of results;
  • effect size and confidence interval comparison across different studies;
  • overall, a useful tool to evaluate the consistency and strength of evidence, identify potential sources of bias, and make informed judgments about the effect of interventions or exposures.

Forest Plots in R

One of the most popular R packages for forest plots is forestploter. Compared to other packages (e.g., forestplot), forestploter focuses entirely on forest plots, which it treats as tables. Moreover, it lets you control graphical parameters through a theme and display confidence intervals spread across multiple columns and divided by groups.

Generate Dataset

The code blocks below show how to generate a dataset and create a basic layout for a forest plot.

  1. Load the required packages and generate a simulated dataset.

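```r
library(forestploter)  # forest plots drawn as tables
library(dplyr)         # select() and the %>% pipe

# Simulated data: participants per group, standardised mean difference (smd)
# and the bounds of its 95% confidence interval for five studies
data <- data.frame(
  Study = c("Study A", "Study B", "Study C", "Study D", "Study E"),
  Group1 = c(200, 150, 180, 250, 120),
  Group2 = c(150, 170, 160, 240, 130),
  smd = c(0.51, 1.27, -0.54, 0.81, 0.87),
  CI_Lower = c(-0.25, -0.74, -1.6, 0.55, -0.1),
  CI_Upper = c(1.25, 1.4, -0.1, 2.1, 1.3)
)
```
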
  2. Calculate standard errors and weights, which are used to derive the pooled standard error and the overall pooled effect.

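```r
# Standard error recovered from the width of each 95% CI
data$se <- (data$CI_Upper - data$CI_Lower) / (2 * 1.96)

# Inverse-variance weights, normalised to sum to 1
data$Weight <- 1 / data$se^2
data$Weight <- data$Weight / sum(data$Weight)

# Pooled (weighted) effect, then weights expressed as percentages
pooled_effect <- round(sum(data$smd * data$Weight), 2)
data$Weight <- round(100 * data$Weight, 2)

# Total sample size per study and pooled standard error
data$n <- data$Group1 + data$Group2
n_studies <- nrow(data)
pooled_se <- sqrt(sum((data$n - 1) * data$se^2) / (sum(data$n) - n_studies))
```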
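In formula form, the calculations above are as follows, writing θ_i for the SMD of study i, L_i and U_i for its CI bounds, n_i for Group1 + Group2 and k for the number of studies (this notation is introduced here for illustration only). The pooled effect is the fixed-effect inverse-variance estimate, and the Weight column reports 100·w_i after rounding.

```latex
\begin{aligned}
\mathrm{SE}_i &= \frac{U_i - L_i}{2 \times 1.96}, &
w_i &= \frac{1/\mathrm{SE}_i^{2}}{\sum_{j=1}^{k} 1/\mathrm{SE}_j^{2}}, \\
\hat{\theta} &= \sum_{i=1}^{k} w_i\,\theta_i, &
\mathrm{SE}_{\text{pooled}} &= \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)\,\mathrm{SE}_i^{2}}{\sum_{i=1}^{k} n_i - k}}
\end{aligned}
```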
  3. Calculate the confidence interval for the pooled summary effect, add an empty column in which the confidence intervals will be drawn, and add a final summary column showing each effect size and its CI as text.

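```r
# 95% CI for the pooled effect
z_score <- qnorm(0.975)
lower_bound <- round(pooled_effect - z_score * pooled_se, 2)
upper_bound <- round(pooled_effect + z_score * pooled_se, 2)

# Blank column that forest() will use as the drawing area for the CIs
data$` ` <- paste(rep(" ", 30), collapse = " ")

# Text summary column, e.g. "0.51 [-0.25, 1.25]"
data$`SMD (95% CI)` <- paste(data$smd, " [", data$CI_Lower, ", ", data$CI_Upper, "]", sep = "")
```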
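The pooled confidence interval depends on which pooled standard error is plugged in. The walkthrough uses an (n - 1)-weighted average of the study variances; the textbook standard error of the fixed-effect inverse-variance estimate is instead sqrt(1 / Σ 1/SE_i²). The sketch below is a comparison under that assumption, not part of the original walkthrough: it computes both intervals for the simulated data, and `pooled_ci` is a hypothetical helper name.

```r
# Simulated data from the walkthrough (only the columns needed here)
smd      <- c(0.51, 1.27, -0.54, 0.81, 0.87)
ci_lower <- c(-0.25, -0.74, -1.6, 0.55, -0.1)
ci_upper <- c(1.25, 1.4, -0.1, 2.1, 1.3)
n        <- c(200, 150, 180, 250, 120) + c(150, 170, 160, 240, 130)  # Group1 + Group2

# Hypothetical helper: symmetric (Wald-type) CI around an estimate for a given SE
pooled_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est - z * se, upper = est + z * se)
}

se <- (ci_upper - ci_lower) / (2 * 1.96)  # study SEs recovered from the CI width
w  <- 1 / se^2                            # inverse-variance weights (not normalised)
pooled_effect <- sum(w * smd) / sum(w)    # same value as with normalised weights

# (n - 1)-weighted formula used in the walkthrough
se_walkthrough <- sqrt(sum((n - 1) * se^2) / (sum(n) - length(n)))
# Standard error of the inverse-variance weighted mean
se_iv <- sqrt(1 / sum(w))

round(rbind(walkthrough      = pooled_ci(pooled_effect, se_walkthrough),
            inverse_variance = pooled_ci(pooled_effect, se_iv)), 2)
```

With these simulated data the inverse-variance interval is noticeably narrower and lies entirely above zero, so the choice of pooled standard error matters for the significance of the pooled effect.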
  4. Reorder the columns and keep only those needed for the final forest plot. Generate a totals row, append it to the dataset, and make any adjustments needed so that the variables display neatly.

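```r
# Keep the columns needed for the plot, in display order
data <- data %>%
  select(Study, Group1, Group2, smd, CI_Lower, CI_Upper, se, ` `, Weight, `SMD (95% CI)`)

# Totals row: one value per column, in the same order as above
totals <- c(" ", sum(data$Group1), sum(data$Group2), pooled_effect,
            lower_bound, upper_bound, pooled_se, " ", sum(data$Weight),
            paste(pooled_effect, " [", lower_bound, ", ", upper_bound, "]", sep = ""))

# Append the totals row (this turns the numeric columns into character)
data <- rbind(data, totals)
data$Weight <- paste(data$Weight, "%")
data[nrow(data), 1] <- "Overall"
```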
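paste() prints numbers with as few digits as needed, so bounds such as -1.6 or 1.4 appear with one decimal while others have two. If uniform two-decimal labels are preferred, a minimal sketch using sprintf() could build them instead; `format_ci` is a hypothetical helper, not part of forestploter or dplyr.

```r
# Hypothetical helper: "est [lower, upper]" with a fixed number of decimals
format_ci <- function(est, lower, upper, digits = 2) {
  fmt <- sprintf("%%.%df", digits)                      # e.g. "%.2f"
  sprintf(paste0(fmt, " [", fmt, ", ", fmt, "]"), est, lower, upper)
}

format_ci(c(0.51, 1.27), c(-0.25, -0.74), c(1.25, 1.4))
# [1] "0.51 [-0.25, 1.25]" "1.27 [-0.74, 1.40]"
```

For example, format_ci(data$smd, data$CI_Lower, data$CI_Upper) could replace the paste() call that builds the SMD (95% CI) column, as long as it runs before rbind() turns those columns into character.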

Create Forest Plot Theme

  1. Generate and customise forest plot theme.

```r
tm <- forest_theme(base_size = 10,
                   # Graphical parameters of confidence intervals
                   ci_pch = 15,
                   ci_col = "#0e8abb",
                   ci_fill = "red",
                   ci_alpha = 1,
                   ci_lty = 1,
                   ci_lwd = 2,
                   ci_Theight = 0.2,
                   # Graphical parameters of reference line
                   refline_lwd = 1,
                   refline_lty = "dashed",
                   refline_col = "grey20",
                   # Graphical parameters of vertical line
                   vertline_lwd = 1,
                   vertline_lty = "dashed",
                   vertline_col = "grey20",
                   # Graphical parameters of diamond shaped summary CI
                   summary_fill = "#006400",
                   summary_col = "#006400")
```
Tip

Type help(forest_theme) in your R terminal for more info about the forest_theme() function arguments.

Generate Forest Plot

  1. Once the dataset and theme are set, generate the forest plot.

```r
# The totals row added with rbind() turned these columns into character;
# convert them back to numeric before plotting
data$CI_Upper <- as.numeric(data$CI_Upper)
data$CI_Lower <- as.numeric(data$CI_Lower)
data$smd <- as.numeric(data$smd)
data$se <- as.numeric(data$se)

# Forest plot: show Study, Group1, Group2, the blank column, Weight and
# SMD (95% CI); the CIs are drawn in the 4th displayed column (the blank one)
pt <- forest(data[, c(1:3, 8:10)],
             est = data$smd,
             lower = data$CI_Lower,
             upper = data$CI_Upper,
             sizes = 0.8,
             is_summary = c(rep(FALSE, nrow(data) - 1), TRUE),  # last row as a diamond
             ci_column = 4,
             ref_line = 0,
             arrow_lab = c("Favours Group 1", "Favours Group 2"),
             xlim = c(-2, 2),
             ticks_at = c(-2, -1, 0, 1, 2),
             xlab = "Standardised Mean Difference",
             theme = tm)

plot(pt)
```

Tip

Type help(forest) in your R terminal for more info about the forest() function arguments.

This is what the final output should look like:

[Figure: final forest plot]

Interpret Forest Plots

The following is an explanation of how to interpret the figure above.

  • Study: the source of the results (one row per study).
  • Group 1/2: number of participants in each group of the study. Typically, the two groups are the treatment and control groups.
  • Squares (red): effect size of the individual studies. In this example, the effect size is the standardised mean difference between the means of the two groups. Other possible effect sizes are the mean difference, the odds ratio, or the hazard ratio.
  • Horizontal (blue) lines: 95% confidence intervals (CI). Informally, a CI gives the range of effect sizes compatible with the data; strictly, if a study were repeated many times, 95% of the intervals constructed this way would contain the true effect size. The wider the CI, the less precise the study.
  • Diamond (green): pooled summary effect of all the studies included in the meta-analysis. The centre of the diamond marks the pooled effect, while its left and right tips mark the bounds of its 95% confidence interval.
  • Dashed line: this is the line of no effect, drawn at the value where, for the chosen effect size, there is no difference between the two groups. If the effect size is a difference, the line of no effect is at 0; if it is a ratio, it is at 1. This line is very useful for interpreting the results: if a 95% CI crosses the line, the result is NOT statistically significant at the 5% level. In this case, the pooled summary effect is not significant.
  • Weight: a study's weight is proportional to its precision (the inverse of its squared standard error) and represents the influence of each individual study on the pooled effect size. More practically, as the standard error of a study's estimate increases, its weight decreases. An alternative approach is to use weights that increase with sample size.
  • SMD (95% CI): summary of effect size and confidence intervals in the graphical representation.
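The rule from the line-of-no-effect bullet can also be checked numerically: a 95% CI indicates significance at the 5% level only if it lies entirely on one side of the line. A minimal sketch using the simulated bounds; `excludes_null` is a hypothetical helper, and `null = 1` would be used for ratio measures such as odds or hazard ratios.

```r
# Hypothetical helper: TRUE when a CI lies entirely on one side of the null value
excludes_null <- function(lower, upper, null = 0) {
  lower > null | upper < null
}

study    <- c("Study A", "Study B", "Study C", "Study D", "Study E")
ci_lower <- c(-0.25, -0.74, -1.6, 0.55, -0.1)
ci_upper <- c(1.25, 1.4, -0.1, 2.1, 1.3)

setNames(excludes_null(ci_lower, ci_upper), study)
```

For the simulated data, only Study C and Study D have intervals that do not cross 0.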

Contributed by Matteo Zicari

FAQs

What do forest plots provide a visualization of?

A forest plot is an essential tool to summarize information on individual studies, give a visual suggestion of the amount of study heterogeneity, and show the estimated common effect, all in one figure.
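The heterogeneity that a forest plot suggests visually is often quantified with Cochran's Q and the I² statistic. These are not computed in the walkthrough above; the sketch below is a minimal fixed-effect version using inverse-variance weights and the simulated data, with `heterogeneity` as a hypothetical helper name.

```r
# Hypothetical helper: Cochran's Q and I^2 from effect sizes and standard errors
heterogeneity <- function(est, se) {
  w      <- 1 / se^2                        # inverse-variance weights
  pooled <- sum(w * est) / sum(w)           # fixed-effect pooled estimate
  q      <- sum(w * (est - pooled)^2)       # Cochran's Q
  df     <- length(est) - 1
  i2     <- if (q > 0) max(0, (q - df) / q) else 0  # share of variation beyond chance
  c(Q = q, df = df, p = pchisq(q, df, lower.tail = FALSE), I2 = i2)
}

smd <- c(0.51, 1.27, -0.54, 0.81, 0.87)
se  <- (c(1.25, 1.4, -0.1, 2.1, 1.3) - c(-0.25, -0.74, -1.6, 0.55, -0.1)) / (2 * 1.96)
round(heterogeneity(smd, se), 3)
```

I² is the share of the variability in effect estimates attributed to heterogeneity rather than chance; it is set to 0 whenever Q does not exceed its degrees of freedom.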

How do you know if a forest plot is statistically significant?

The statistical significance of a pooled estimate can be detected by visual inspection of the diamond (if the diamond width includes the line of no effect, there is no statistical difference between the two groups) or checking the p-value in the last row of a forest plot, “Test for overall effect” (P < 0.05 indicates a ...
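The "Test for overall effect" row is usually a z-test of the pooled estimate against the value of no effect, Z = estimate / SE. A minimal sketch, assuming the pooled effect and the pooled standard error are computed as in the walkthrough above; `overall_effect_test` is a hypothetical helper.

```r
# Hypothetical helper: two-sided z-test of a pooled estimate against a null value
overall_effect_test <- function(est, se, null = 0) {
  z <- (est - null) / se
  c(Z = z, P = 2 * pnorm(-abs(z)))
}

# Pooled effect and pooled SE recomputed from the simulated data
smd <- c(0.51, 1.27, -0.54, 0.81, 0.87)
se  <- (c(1.25, 1.4, -0.1, 2.1, 1.3) - c(-0.25, -0.74, -1.6, 0.55, -0.1)) / (2 * 1.96)
n   <- c(350, 320, 340, 490, 250)  # Group1 + Group2 per study
w   <- 1 / se^2
pooled_effect <- sum(w * smd) / sum(w)
pooled_se     <- sqrt(sum((n - 1) * se^2) / (sum(n) - length(n)))

overall_effect_test(pooled_effect, pooled_se)
```

With a two-sided test at the 5% level, P < 0.05 corresponds to a 95% CI that does not cross the line of no effect, so this test and the visual check of the diamond agree.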

Why is a forest plot called a forest plot?

The name refers to the forest of lines produced. In September 1990, Richard Peto joked that the plot was named after a breast cancer researcher called Pat Forrest and as a result the name has sometimes been spelled "forrest plot".

What is the effect size of a forest plot?

A forest plot is a visual way to summarise the results of a meta-analysis. The effect size is the statistic compared across the studies; in the example above it is the standardised mean difference between the two groups, shown as a red square for each individual study.

What can you learn from a forest plot?

In a systematic review of epidemiological studies, the forest plot shows whether the exposure (e.g., smoking, alcohol, inhaling high levels of air pollutants, obesity) is likely to be a cause of the outcome measure (e.g., development of a medical condition).

What is the aim of the forest plot?

The forest plot enables a straightforward comparison of the results of dozens of studies by presenting the results of a meta-analysis in a simple and understandable visual format, making it a useful element for researchers, healthcare professionals, and policymakers who want to make well-informed decisions based on the ...

What does the diamond represent in a forest plot?

The diamond at the bottom of the forest plot shows the result when all the individual studies are combined together and averaged. The horizontal points of the diamond are the limits of the 95% confidence intervals and are subject to the same interpretation as any of the other individual studies on the plot.

How to save a forest plot in R?

The easiest way to save a forest plot is to draw it on a file graphics device instead of the screen. Just as sink() redirects text output from the console to a text file, graphics-device functions such as png() and pdf() redirect plots from the plot pane to an image file.
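A minimal sketch of that approach using base R's pdf() device; `save_forest_pdf` is a hypothetical helper, and the default width and height (in inches) are arbitrary. Because the walkthrough draws the forest() result with plot(pt), the same helper should work for that object.

```r
# Hypothetical helper: draw any plottable object into a PDF file instead of the screen
save_forest_pdf <- function(p, file, width = 8, height = 4) {
  grDevices::pdf(file, width = width, height = height)  # open the file device (inches)
  on.exit(grDevices::dev.off())                         # always close it, even on error
  plot(p)                                               # draw onto the file device
  invisible(file)
}

# Usage with the plot object from the walkthrough:
# save_forest_pdf(pt, "forest_plot.pdf", width = 10, height = 4)
```

png() can be used the same way, but its width and height are in pixels by default.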

What is a meta forest plot?

In Stata, meta forestplot summarizes meta-analysis data in a graphical format. It reports individual effect sizes and the overall effect size (ES), their confidence intervals (CIs), heterogeneity statistics, and more.
