Package: gtsummary

First, we load the necessary packages and import the Titanic dataset from the “titanic” package. A quick look at the variables shows which ones we want to keep for our summary table, or “Table 1”.

library(tidyverse)
library(skimr)
library(titanic)
library(janitor)
library(gtsummary)

titanic_train %>% 
  glimpse()
Rows: 891
Columns: 12
$ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Survived    <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1…
$ Pclass      <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3…
$ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Fl…
$ Sex         <chr> "male", "female", "female", "female", "male", "male", "mal…
$ Age         <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, …
$ SibSp       <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0…
$ Parch       <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0…
$ Ticket      <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "37…
$ Fare        <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,…
$ Cabin       <chr> "", "C85", "", "C123", "", "", "E46", "", "", "", "G6", "C…
$ Embarked    <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S"…

Manipulating the data

“PassengerId”, “Name”, “Ticket”, “Parch”, and “Cabin” either carry no useful information for a summary table or are not self-explanatory, so let’s remove them before creating our table.

Before passing the data to the “tbl_summary” function, we will likely want to rename our variables and factor levels. The siblings variable (“SibSp”) also has many levels, some of them likely with few observations, so we will collapse it into fewer levels.
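To see the collapsing step in isolation, here is a minimal sketch on a made-up vector (`sibsp_toy` is hypothetical and not part of the dataset). Note that `fct_lump_n()` keeps the most frequent levels, not the smallest values, so a label like ">=3" is only accurate when the three most frequent values happen to be 0, 1 and 2.

```r
library(forcats)

# Made-up sibling counts, for illustration only
sibsp_toy <- factor(c(0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 5))

# Keep the 3 most frequent levels; all remaining levels become "Other"
sibsp_lumped <- fct_lump_n(sibsp_toy, n = 3)

# Rename the lumped level to something more descriptive
sibsp_lumped <- fct_recode(sibsp_lumped, ">=3" = "Other")

# Count the observations in each collapsed level
table(sibsp_lumped)
```

It is worth checking the level counts of the real variable first (for example with `table()`), so that the new label matches what was actually lumped together.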

df_titanic <- titanic::titanic_train %>%
  select(-c(PassengerId, Name, Ticket, Parch, Cabin)) %>% # Removing unwanted variables
  mutate( # First converting SibSp into a factor variable and then collapsing it to 3 levels and an "other" level
    SibSp = factor(SibSp), 
    SibSp = fct_lump_n(SibSp, n = 3),
    SibSp = fct_recode(SibSp, ">=3" = "Other"), # Renaming the new level
    Sex = fct_recode(Sex, # Renaming levels of categorical variables
      "Female" = "female",
      "Male" = "male"
    ),
    Embarked = fct_recode(Embarked,
                          "Cherbourg" = "C",
                          "Southampton" = "S",
                          "Cobh" = "Q")
    ) %>% 
  rename( # Renaming variables
    "Passenger class" = Pclass,
    "Number of siblings" = SibSp,
    "Ticket price" = Fare,
    "City of embarkation" = Embarked
  )
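One detail the recoding leaves untouched: “Embarked” is an empty string for a couple of passengers, and `fct_recode()` keeps that as its own unnamed level. A possible way to handle it, sketched here as an optional extra step rather than as part of the original workflow, is to convert empty strings to `NA` with `dplyr::na_if()` before creating the factor (`embarked_clean` is a hypothetical name).

```r
library(dplyr)
library(titanic)

# Turn empty strings into proper missing values
embarked_clean <- titanic_train %>%
  mutate(Embarked = na_if(Embarked, ""))

# Number of passengers with no recorded port of embarkation
sum(is.na(embarked_clean$Embarked))
```

With missing values coded as `NA`, “tbl_summary” should by default report them in a separate “Unknown” row, as it does for “Age”, instead of as an unlabelled category.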

Creating the table

Let’s split the table into two columns by passenger survival and add p-values comparing the two groups.

df_titanic %>% 
  tbl_summary( # Table summary function
    by = Survived, # Grouping variable
  ) %>% 
  add_p() %>% # Add p-values (test chosen automatically based on variable type)
  bold_labels() %>% 
  italicize_levels() %>% 
  bold_p() %>% # Any significant p-values will be bold
  modify_header( # Changing the headings above the grouped columns
     stat_1 = "**Died**, N = 549",  
     stat_2 = "**Survived**, N = 342",  
  ) %>% 
  modify_caption("**Table 1. Baseline characteristics of passengers on The Titanic**") # Adding a title
Table 1. Baseline characteristics of passengers on The Titanic

Characteristic           Died, N = 549¹      Survived, N = 342¹      p-value²
Passenger class                                                      <0.001
  1                      80 (15%)            136 (40%)
  2                      97 (18%)            87 (25%)
  3                      372 (68%)           119 (35%)
Sex                                                                  <0.001
  Female                 81 (15%)            233 (68%)
  Male                   468 (85%)           109 (32%)
Age                      28 (21, 39)         28 (19, 36)             0.2
  Unknown                125                 52
Number of siblings                                                   <0.001
  0                      398 (72%)           210 (61%)
  1                      97 (18%)            112 (33%)
  2                      15 (2.7%)           13 (3.8%)
  >=3                    39 (7.1%)           7 (2.0%)
Ticket price             10 (8, 26)          26 (12, 57)             <0.001
City of embarkation                                                  <0.001
  (blank)                0 (0%)              2 (0.6%)
  Cherbourg              75 (14%)            93 (27%)
  Cobh                   47 (8.6%)           30 (8.8%)
  Southampton            427 (78%)           217 (63%)

¹ n (%); Median (IQR)
² Pearson's Chi-squared test; Wilcoxon rank sum test; Fisher's exact test
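The group sizes in the column headers are typed in by hand, so they would go stale if the data were filtered. As a hedged alternative, `modify_header()` accepts glue-style placeholders such as `{n}` for the column's group size. The sketch below uses a reduced set of variables to keep it short; `tbl_dynamic` is a hypothetical name.

```r
library(dplyr)
library(titanic)
library(gtsummary)

tbl_dynamic <- titanic_train %>%
  select(Survived, Sex, Fare) %>%
  tbl_summary(by = Survived) %>%
  modify_header(
    stat_1 = "**Died**, N = {n}",     # {n} is filled in with the group size
    stat_2 = "**Survived**, N = {n}"
  )

# Inspect the header labels gtsummary stored for the two groups
tbl_dynamic$table_styling$header %>%
  filter(column %in% c("stat_1", "stat_2")) %>%
  pull(label)
```

The labels “Died” and “Survived” still rely on knowing that `Survived` is coded 0 and 1, so recoding it as a labelled factor beforehand may be the more robust choice.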