Tables are often essential for organizing and summarizing your data, especially with categorical variables. When you create a table in R, it is stored as a specific type of object (of class “table”), which is essentially a matrix (or, more generally, an array) of counts with labeled rows and columns. This may seem strange, since datasets are stored as data frames, but it makes tables easy to work with: many of the tools covered in the previous tutorials, such as [row, column] indexing, work on tables as well. In this chapter, we will discuss how to create various types of tables and how to use various statistical methods to analyze tabular data. Throughout the chapter, the AOSI dataset will be used.

6.2 Creating Basic Tables: table() and xtabs()

A contingency table is a tabulation of counts and/or percentages for one or more variables. In R, these tables can be created using table() along with some of its variations. To use table(), simply pass in the variables you want to tabulate, separated by commas. Note that table() does not have a data= argument like many other functions do (e.g., ggplot2 functions), so you must reference each variable using dataset$variable. By default, missing values are excluded from the counts; if you want a count for these missing values, you must specify the argument useNA = "ifany" or useNA = "always". With "ifany", an <NA> category is added only when the variables actually contain missing values, while "always" adds it even when its count is zero. The examples below show how to use this function.

aosi_data <- read.csv("Data/cross-sec_aosi.csv", stringsAsFactors = FALSE, na.strings = ".")

# Table for gender
table(aosi_data$Gender)

## 
## Female   Male 
##    235    352 

# Table for study site
table(aosi_data$Study_Site)

## 
## PHI SEA STL UNC 
## 149 152 145 141 

# Two-way table for gender and study site
table(aosi_data$Gender, aosi_data$Study_Site)

##         
##          PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88

# Notice order matters: 1st variable is row variable, 2nd variable is column variable

# Let's try adding in the useNA argument
table(aosi_data$Gender, aosi_data$Study_Site, useNA = "ifany")

##         
##          PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88

table(aosi_data$Gender, aosi_data$Study_Site, useNA = "always")

##         
##          PHI SEA STL UNC <NA>
##   Female  55  67  60  53    0
##   Male    94  85  85  88    0
##   <NA>     0   0   0   0    0

# Let's save one of these tables to use for later examples
table_ex <- table(aosi_data$Gender, aosi_data$Study_Site)
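Because Gender and Study_Site happen to have no missing values, useNA = "ifany" above gives the same counts as the default, and "always" only adds a row and column of zeros. The sketch below uses a small, made-up data frame (toy, hypothetical and not part of the AOSI data) that does contain an NA, to show how the settings differ.

```r
# Hypothetical toy data (not the AOSI dataset): one Gender value is missing
toy <- data.frame(
  Gender = c("Female", "Male", NA, "Male"),
  Site   = c("PHI", "SEA", "SEA", "PHI"),
  stringsAsFactors = FALSE
)

# Default (useNA = "no"): the NA is silently dropped, so only 3 of 4 rows are counted
tab_default <- table(toy$Gender)

# "ifany": an <NA> category appears because Gender actually has a missing value
tab_ifany <- table(toy$Gender, useNA = "ifany")

# Site has no missing values, so "ifany" adds nothing here...
site_ifany <- table(toy$Site, useNA = "ifany")

# ...while "always" adds an <NA> category with a count of 0
site_always <- table(toy$Site, useNA = "always")

tab_default
tab_ifany
site_ifany
site_always
```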

Now let’s add row and column labels to the gender by study site table. For a table object, these labels are referred to as “dimnames” (i.e., dimension names), which can be accessed using the dimnames() function. This is similar to using the names() function with lists, except that our table has multiple dimensions, each of which can have its own set of names. For a table, dimnames are stored as a list, with each list entry holding the group labels for the variable corresponding to that dimension. The name of each list entry specifies the label displayed for that dimension in the table. By default, these names are blank, which is why the default table has no row and column labels. We can change this by setting these names with names() applied to dimnames().

dimnames(table_ex)

## [[1]]
## [1] "Female" "Male"  
## 
## [[2]]
## [1] "PHI" "SEA" "STL" "UNC"

# We see the group labels. Note that each set of group labels is unnamed (nothing appears next to [[1]] and [[2]]).
# This is seen more clearly by accessing these names explicitly using names()
names(dimnames(table_ex))

## [1] "" ""

# Now, let's change these names and see how the table changes
names(dimnames(table_ex)) <- c("Gender", "Site")
names(dimnames(table_ex))

## [1] "Gender" "Site"

table_ex

##         Site
## Gender   PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88

# Now the row and column labels appear, making the table easier to understand
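As an alternative to setting names(dimnames()) after the fact, table() also uses the names of its arguments as dimension names, so the labels can be supplied directly in the call. The sketch below uses small hypothetical vectors (gender, site) in place of the AOSI variables; it also shows that a table can be indexed by group label, much like a matrix.

```r
# Hypothetical toy vectors standing in for aosi_data$Gender and aosi_data$Study_Site
gender <- c("Female", "Male", "Male", "Female", "Male")
site   <- c("PHI", "SEA", "PHI", "UNC", "SEA")

# Argument names (Gender, Site) become the dimnames names, i.e., the row/column labels
tab_named <- table(Gender = gender, Site = site)
tab_named

names(dimnames(tab_named))  # "Gender" "Site"

# Individual cells can be pulled out by group label, as with a matrix
tab_named["Male", "SEA"]    # 2
```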

It is also common to view these tabulations as proportions (or percentages). This can be done using prop.table(), which, unlike table(), takes a table object as its argument rather than the variables of interest. Any changes made to the dimnames of the table object are kept when applying prop.table(), and the output of prop.table() is also an object of class table.

# 2-way proportion table
prop_table_ex <- prop.table(table_ex)
prop_table_ex

##         Site
## Gender          PHI        SEA        STL        UNC
##   Female 0.09369676 0.11413969 0.10221465 0.09028961
##   Male   0.16013629 0.14480409 0.14480409 0.14991482
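By default, prop.table() divides every cell by the grand total, as above. Its margin argument (part of base R) instead computes proportions within rows (margin = 1) or within columns (margin = 2), which is often more useful when comparing groups. A minimal sketch, using hypothetical toy vectors rather than the AOSI data:

```r
# Hypothetical toy vectors (not the AOSI data)
gender <- c("Female", "Male", "Male", "Female", "Male")
site   <- c("PHI", "SEA", "PHI", "UNC", "SEA")
tab <- table(Gender = gender, Site = site)

overall  <- prop.table(tab)              # cells divided by the grand total (5)
row_prop <- prop.table(tab, margin = 1)  # within each gender: each row sums to 1
col_prop <- prop.table(tab, margin = 2)  # within each site: each column sums to 1

round(row_prop, 2)
round(col_prop, 2)
```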

A second way of creating contingency tables is the xtabs() function from the stats package. The stats package is included with R and attached by default in every session, so calling library(stats) is not strictly necessary, though it does no harm. The function xtabs() creates an object of class xtabs (which is also a table), and you will notice that the output of xtabs() and table() is nearly identical. xtabs() has the following advantages: 1) row and column labels are included automatically, set to the variable names, and 2) there is a data= argument, which means you only have to reference the variable names. With xtabs(), you do not list the variables of interest separated by commas. Instead, you use formula notation, ~variable1 + variable2 + …, where variable1 and variable2 are the names of the variables of interest. You can add more than two variables (hence the …). See below for the two-way gender and site example.

library(stats)
table_ex_xtabs <- xtabs(~ Gender + Study_Site, data = aosi_data)
table_ex_xtabs

##         Study_Site
## Gender   PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88
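The … in the formula means xtabs() is not limited to two variables. The sketch below builds a three-way table from a small made-up data frame; Group is a hypothetical third variable, not an AOSI column. ftable() (also in stats) prints a multi-way table in a flattened, two-dimensional layout, and the class check shows that an xtabs object is also a table, which is why functions such as prop.table() accept it.

```r
library(stats)  # attached by default; shown for clarity

# Hypothetical toy data frame; Group is a made-up third variable
toy <- data.frame(
  Gender = c("Female", "Male", "Male", "Female", "Male", "Male"),
  Site   = c("PHI", "SEA", "PHI", "UNC", "SEA", "PHI"),
  Group  = c("A", "B", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Two-way and three-way tables via formula notation
two_way   <- xtabs(~ Gender + Site, data = toy)
three_way <- xtabs(~ Gender + Site + Group, data = toy)

class(two_way)     # "xtabs" "table"
dim(three_way)     # 2 3 2: genders x sites x groups
ftable(three_way)  # flattened display of the three-way table
```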

To create a table of proportions using xtabs(), first create the table of counts using xtabs(), and then apply prop.table() to this table object, just as was done with table().

One useful function when creating tables of proportions is round(). As seen with the previous table of proportions, prop.table() does not round its results, so the output shows many decimal places. The round() function works on any numeric R object, including vectors, matrices, and tables. The first argument is the object whose values you want to round, and the second argument (digits) is the number of decimal places to round to.

prop_table_ex_xtabs <- prop.table(table_ex_xtabs)
prop_table_ex_xtabs

##         Study_Site
## Gender          PHI        SEA        STL        UNC
##   Female 0.09369676 0.11413969 0.10221465 0.09028961
##   Male   0.16013629 0.14480409 0.14480409 0.14991482

prop_table_ex_xtabs <- round(prop_table_ex_xtabs, 2)
prop_table_ex_xtabs

##         Study_Site
## Gender    PHI  SEA  STL  UNC
##   Female 0.09 0.11 0.10 0.09
##   Male   0.16 0.14 0.14 0.15

prop_table_ex <- round(prop_table_ex, 2)
prop_table_ex

##         Site
## Gender    PHI  SEA  STL  UNC
##   Female 0.09 0.11 0.10 0.09
##   Male   0.16 0.14 0.14 0.15
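Since prop.table() returns proportions, a table of percentages can be obtained by multiplying by 100 before rounding. A minimal sketch with hypothetical toy vectors (not the AOSI data); arithmetic and round() keep the table class, so the result still prints with its row and column labels.

```r
# Hypothetical toy vectors (not the AOSI data)
gender <- c("Female", "Male", "Male", "Female", "Male")
site   <- c("PHI", "SEA", "PHI", "UNC", "SEA")
tab <- table(Gender = gender, Site = site)

# Proportions -> percentages, rounded to one decimal place
pct <- round(100 * prop.table(tab), digits = 1)
pct

class(pct)  # still "table", so row/column labels are kept
```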

Lastly, we discuss how to add margin totals to your table. Whether using table() or xtabs(), a simple way to add all margin totals to your table is the function addmargins() from the stats package. Pass your table or xtabs object as the first argument to addmargins(), and a new table that includes these margin totals is returned. This also works with tables of proportions, but note that addmargins() simply sums the values it is given: because the proportion tables below were rounded before the margins were added, their totals carry the rounding error (e.g., the grand total is 0.98 rather than 1).

table_ex <- addmargins(table_ex)
table_ex_xtabs <- addmargins(table_ex_xtabs)
prop_table_ex <- addmargins(prop_table_ex)
prop_table_ex_xtabs <- addmargins(prop_table_ex_xtabs)
table_ex

##         Site
## Gender   PHI SEA STL UNC Sum
##   Female  55  67  60  53 235
##   Male    94  85  85  88 352
##   Sum    149 152 145 141 587

table_ex_xtabs

##         Study_Site
## Gender   PHI SEA STL UNC Sum
##   Female  55  67  60  53 235
##   Male    94  85  85  88 352
##   Sum    149 152 145 141 587

prop_table_ex

##         Site
## Gender    PHI  SEA  STL  UNC  Sum
##   Female 0.09 0.11 0.10 0.09 0.39
##   Male   0.16 0.14 0.14 0.15 0.59
##   Sum    0.25 0.25 0.24 0.24 0.98

prop_table_ex_xtabs

##         Study_Site
## Gender    PHI  SEA  STL  UNC  Sum
##   Female 0.09 0.11 0.10 0.09 0.39
##   Male   0.16 0.14 0.14 0.15 0.59
##   Sum    0.25 0.25 0.24 0.24 0.98
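The grand total of 0.98 in the proportion tables above comes from adding margins to proportions that were already rounded. The sketch below compares the two orderings on a tiny hypothetical dataset, chosen so that each nonzero cell is exactly 1/3 and the difference is easy to see. Computing the margins first and rounding last keeps the totals consistent with the unrounded proportions.

```r
# Hypothetical toy data chosen so that each nonzero cell is exactly 1/3
gender <- c("Female", "Male", "Male")
site   <- c("PHI", "PHI", "SEA")
tab <- table(Gender = gender, Site = site)

# Round first, then add margins: the totals inherit the rounding error
round_then_sum <- addmargins(round(prop.table(tab), 2))

# Add margins first, then round: totals come from the exact proportions
sum_then_round <- round(addmargins(prop.table(tab)), 2)

round_then_sum
sum_then_round
```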

Many packages provide more advanced tools for creating and customizing contingency tables. We will cover some of them in Chapter 9, though table() and xtabs() should suffice for exploratory analyses.