Package: gtsummary

First we will load the necessary packages and import the Titanic dataset from the “titanic” package, then take a quick look at the variables to decide which ones to keep for our summary table, or “Table 1”.


library(tidyverse)
library(skimr)
library(titanic)
library(janitor)
library(gtsummary)

titanic_train %>% 
  glimpse()

Rows: 891
Columns: 12
$ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Survived    <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1…
$ Pclass      <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3…
$ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Fl…
$ Sex         <chr> "male", "female", "female", "female", "male", "male", "mal…
$ Age         <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, …
$ SibSp       <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0…
$ Parch       <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0…
$ Ticket      <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "37…
$ Fare        <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,…
$ Cabin       <chr> "", "C85", "", "C123", "", "", "E46", "", "", "", "G6", "C…
$ Embarked    <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S"…
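glimpse() shows that Age marks missing values with NA, while character columns such as Cabin store missing entries as empty strings (""). Before choosing which variables to keep, it can help to count both kinds of missingness. The sketch below is an optional base-R check that is not part of the original workflow.

```r
library(titanic)

# True NA values per column (only numeric columns such as Age use NA here)
na_counts <- colSums(is.na(titanic_train))

# Empty strings per column; character columns use "" instead of NA,
# and NA comparisons are dropped with na.rm = TRUE
blank_counts <- colSums(titanic_train == "", na.rm = TRUE)

data.frame(na = na_counts, blank = blank_counts)
```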

Manipulating the data

“PassengerId”, “Name”, “Ticket” and “Cabin” are identifiers or free-text fields that add little to a summary table, and “Parch” (the number of parents/children aboard) isn’t needed for this table, so let’s remove all five before creating our table.

Before passing the data to the tbl_summary() function from gtsummary, we will likely want to rename our variables and factor levels. Also, SibSp (the number of siblings/spouses aboard) has many levels, several containing only a handful of passengers, so we will collapse it into fewer levels.
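One detail worth knowing about the collapsing step: forcats::fct_lump_n() keeps the n most *frequent* levels, not the n smallest values. Relabelling the lumped level as ">=3" is only accurate when 0, 1 and 2 happen to be the three most common values, which the output table confirms is the case for SibSp here. The sketch below uses a small hypothetical vector where that assumption fails, and shows fct_other() as one way to keep named levels explicitly.

```r
library(forcats)

# Hypothetical sibling counts in which the value 4 is more common than 2 or 3
sibsp <- factor(c(0, 0, 0, 0, 1, 1, 1, 4, 4, 2, 3))

# fct_lump_n() ranks levels by frequency, so it keeps 0, 1 and 4 here
lumped <- fct_lump_n(sibsp, n = 3)
levels(lumped)

# fct_other() keeps exactly the levels named in `keep`,
# so a ">=3" label for everything else is always correct
kept <- fct_other(sibsp, keep = c("0", "1", "2"), other_level = ">=3")
levels(kept)
table(kept)
```

With this toy data, fct_lump_n() would put 2 into "Other" while keeping 4, so renaming "Other" to ">=3" would mislabel the groups; fct_other() avoids relying on the frequency ordering.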


df_titanic <- titanic::titanic_train %>%
  select(-c(PassengerId, Name, Ticket, Parch, Cabin)) %>% # Remove unwanted variables
  mutate(
    # Convert SibSp to a factor, keep its 3 most frequent levels and lump the rest into "Other"
    SibSp = factor(SibSp),
    SibSp = fct_lump_n(SibSp, n = 3),
    SibSp = fct_recode(SibSp, ">=3" = "Other"), # Rename the lumped level
    Sex = fct_recode(Sex, # Rename levels of categorical variables
      "Female" = "female",
      "Male" = "male"
    ),
    Embarked = fct_recode(Embarked,
      "Cherbourg" = "C",
      "Southampton" = "S",
      "Cobh" = "Q"
    )
  ) %>%
  rename( # Rename variables for display in the table
    "Passenger class" = Pclass,
    "Number of siblings" = SibSp,
    "Ticket price" = Fare,
    "City of embarkation" = Embarked
  )
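The blank row under “City of embarkation” in the table comes from two passengers whose Embarked value is an empty string rather than NA: fct_recode() keeps "" as a level of its own, so it shows up as an unnamed category. A minimal sketch of one fix, assuming we are happy to treat "" as missing, is to convert it with dplyr::na_if() before recoding. With that change, tbl_summary() should report these passengers as missing (as it already does for Age) instead of as a blank level.

```r
library(dplyr)
library(forcats)
library(titanic)

embarked <- titanic_train %>%
  mutate(
    # Treat empty strings as missing before turning the column into a factor
    Embarked = na_if(Embarked, ""),
    Embarked = fct_recode(Embarked,
      "Cherbourg" = "C",
      "Southampton" = "S",
      "Cobh" = "Q"
    )
  ) %>%
  pull(Embarked)

levels(embarked)      # no "" level any more
sum(is.na(embarked))  # the two passengers are now NA
```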

Creating the table

Let’s also split the table into two columns by passenger survival and add p-values from tests comparing the two groups.


df_titanic %>% 
  tbl_summary( # Summary table function
    by = Survived # Grouping variable (0 = died, 1 = survived)
  ) %>% 
  add_p() %>% # Add p-values; the test is chosen automatically based on variable type
  bold_labels() %>% 
  italicize_levels() %>% 
  bold_p() %>% # Bold p-values below 0.05 (the default threshold)
  modify_header( # Change the headings above the grouped columns; {n} is filled in with the group size
    stat_1 = "**Died**, N = {n}",
    stat_2 = "**Survived**, N = {n}"
  ) %>% 
  modify_caption("**Table 1. Baseline characteristics of passengers on The Titanic**") # Add a title
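An alternative to spelling out the column headings in modify_header() is to give Survived descriptive labels in the data itself, so that tbl_summary(by = Survived) uses those labels in its default headers. The sketch below only shows the relabelling step; applying it inside the mutate() call of df_titanic is an assumption about how you might integrate it.

```r
library(dplyr)
library(titanic)

# Map the 0/1 outcome to readable labels once, in the data
survival <- titanic_train %>%
  mutate(Survived = factor(Survived,
                           levels = c(0, 1),
                           labels = c("Died", "Survived"))) %>%
  pull(Survived)

table(survival)  # group sizes that will appear in the headers
```

Keeping the labels in the data means the headers cannot drift out of sync with the group sizes if the data change.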

**Table 1. Baseline characteristics of passengers on The Titanic**
Characteristic	Died, N = 549¹	Survived, N = 342¹	p-value²
Passenger class			<0.001
1	80 (15%)	136 (40%)
2	97 (18%)	87 (25%)
3	372 (68%)	119 (35%)
Sex			<0.001
Female	81 (15%)	233 (68%)
Male	468 (85%)	109 (32%)
Age	28 (21, 39)	28 (19, 36)	0.2
Unknown	125	52
Number of siblings			<0.001
0	398 (72%)	210 (61%)
1	97 (18%)	112 (33%)
2	15 (2.7%)	13 (3.8%)
>=3	39 (7.1%)	7 (2.0%)
Ticket price	10 (8, 26)	26 (12, 57)	<0.001
City of embarkation			<0.001
	0 (0%)	2 (0.6%)
Cherbourg	75 (14%)	93 (27%)
Cobh	47 (8.6%)	30 (8.8%)
Southampton	427 (78%)	217 (63%)
¹ n (%); Median (IQR)
² Pearson's Chi-squared test; Wilcoxon rank sum test; Fisher's exact test
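To see where the p-values come from, the tests named in the footnote can be run directly in base R. This is a sketch for checking, assuming gtsummary's default of a Wilcoxon rank-sum test for continuous variables and a Pearson chi-squared test without continuity correction for categorical variables with adequate expected counts.

```r
library(titanic)

# Continuous variables: Wilcoxon rank-sum tests by survival
# (rows with missing Age are dropped by the formula interface)
wilcox_fare <- wilcox.test(Fare ~ Survived, data = titanic_train)
wilcox_age  <- wilcox.test(Age ~ Survived, data = titanic_train)

# Categorical variable: Pearson's chi-squared test on the Sex-by-Survived table
chisq_sex <- chisq.test(table(titanic_train$Sex, titanic_train$Survived),
                        correct = FALSE)

c(fare = wilcox_fare$p.value,
  age  = wilcox_age$p.value,
  sex  = chisq_sex$p.value)
```

The Fare and Sex p-values should fall below 0.001 and the Age p-value near 0.2, matching the rounded values in the table.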