R base functions
table
table builds a contingency table: a count of how many times each value (or each combination of values) occurs in one or more categorical variables. prop.table takes a table, such as the output of table, and returns each count divided by the total, so the entries become proportions that sum to 1.
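To make the difference between counts and proportions concrete, here is a minimal sketch on small made-up vectors (party and state are hypothetical, not taken from the election data). It shows a one-way table, its proportions, and a two-way table where prop.table's margin argument computes proportions within each row.

```r
# Hypothetical toy vectors (not taken from the election data)
party <- c("DEM", "REP", "DEM", "IND", "DEM", "REP")
state <- c("CA",  "NY",  "CA",  "TX",  "NY",  "NY")

# One-way table: how many times each party code occurs
counts <- table(party)
print(counts)
# party
# DEM IND REP
#   3   1   2

# Each count divided by the total (6), so the values sum to 1
props <- prop.table(counts)
print(props)

# Two-way contingency table: counts for each party/state combination
two_way <- table(party, state)
print(two_way)

# margin = 1 computes proportions within each row (each party)
row_props <- prop.table(two_way, margin = 1)
print(row_props)
```

With margin = 2 the proportions would instead be computed within each column (each state).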
Examples
Which value appears most often in the "STATE" column of itcont1980.txt?
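library(data.table)
# Read the pipe-delimited FEC file; quote="" turns off quote handling so stray quote characters do not break parsing
myDF <- fread("/anvil/projects/tdm/data/election/itcont1980.txt", quote="")
# The file has no header row, so assign the column names ourselves
names(myDF) <- c("CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM", "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE", "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT", "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID")
# Count each STATE value, sort the counts in decreasing order, and keep only the top entry
head(sort(table(myDF$STATE), decreasing=TRUE), n=1)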
CA 3706
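Because the file above lives on the Anvil cluster, the same sort-then-head idiom can be tried on a small made-up vector instead. In this sketch, states is a hypothetical stand-in for myDF$STATE. It also shows which.max as an alternative that returns only the name of the most frequent value; note that which.max returns the first maximum when there is a tie.

```r
# Hypothetical stand-in for myDF$STATE (the real data lives on Anvil)
states <- c("CA", "NY", "TX", "NY", "CA", "NY")

# Sort the counts in decreasing order and keep the first entry
top_state <- head(sort(table(states), decreasing = TRUE), n = 1)
print(top_state)

# Alternative: the name of the most frequent value (first one wins on ties)
most_common <- names(which.max(table(states)))
print(most_common)
```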
paste and paste0
paste converts its arguments to character strings and concatenates them element by element. Its sep argument (default sep = " ") sets the string placed between the pieces being joined.
paste0 works like paste with sep = "", so the strings are joined with no characters in between.
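A short sketch of how sep changes the result, using hypothetical city and state vectors. The last line also shows that a length-1 argument such as "ID_" is recycled across the longer vector.

```r
# Hypothetical toy vectors
city  <- c("HOUSTON", "DALLAS")
state <- c("TX", "TX")

with_space <- paste(city, state)              # default sep = " "
with_comma <- paste(city, state, sep = ", ")  # custom separator
no_space   <- paste0(city, state)             # equivalent to sep = ""
ids        <- paste0("ID_", 1:3)              # the length-1 "ID_" is recycled

print(with_space)  # [1] "HOUSTON TX" "DALLAS TX"
print(with_comma)  # [1] "HOUSTON, TX" "DALLAS, TX"
print(no_space)    # [1] "HOUSTONTX" "DALLASTX"
print(ids)         # [1] "ID_1" "ID_2" "ID_3"
```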
Examples
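Use the paste function to join the "CITY" and "STATE" columns, then find the top 5 city-and-state locations where donations were made. The solution asks for 6 entries because one of the most frequent values is ", ", which comes from rows where both CITY and STATE are blank.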
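# Combine city and state into one label (e.g. "HOUSTON, TX"), count the labels, and show the 6 most frequent
head(sort(table(paste(myDF$CITY, myDF$STATE, sep=", ")), decreasing=TRUE), n=6)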
NEW YORK, NY     13862
,                11582
HOUSTON, TX      10146
DALLAS, TX        6438
WASHINGTON, DC    5890
LOS ANGELES, CA   5866
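An alternative to requesting one extra row is to drop blank locations before counting. This is a sketch on a hypothetical toy data frame (df stands in for myDF), assuming missing cities and states are stored as empty strings, which the ", " entry in the output above suggests.

```r
# Hypothetical stand-in for myDF; "" marks a missing city or state
df <- data.frame(
  CITY  = c("HOUSTON", "", "HOUSTON", "DALLAS", "", "DALLAS", "HOUSTON"),
  STATE = c("TX",      "", "TX",      "TX",     "", "TX",     "TX"),
  stringsAsFactors = FALSE
)

locations <- paste(df$CITY, df$STATE, sep = ", ")
print(table(locations))   # includes a ", " entry for the two blank rows

# Keep only rows where both CITY and STATE are non-empty before counting
has_both <- df$CITY != "" & df$STATE != ""
top <- head(sort(table(locations[has_both]), decreasing = TRUE), n = 5)
print(top)
```

If the real data instead stored missing values as NA, the filter would need is.na() checks rather than comparisons with "".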