Data can be enormous and hard to understand when observed in its raw, unprocessed likeness. For this purpose, statistics was created to help us organize and analyze data. First, values are organized in tables; then, data are graphed in different displays.
Tables are a necessary step to start analyzing data, but it may fail to highlight essential features with data. The graphical displays are visually attractive, easy to read, and see important patterns of the distribution. For categorical variables, the choices are limited. Bar graphs and pie charts are the most common displays when looking at data in chart-like formats. 🪑
Let's say we just finished sending out a survey to our AP Statistics class on how stressful being a student is as a hypothetical occupation with our categories being "very," "somewhat," and "none" (not stressful at all). We, then, have collected 30 responses. How do we organize these responses and make sense of them? 🤔
One way to do so is to “pile” the data by counting the number of data values in each category of interest. From there, we can organize these counts into a frequency table, which records the totals and the category names.
At this point, you might wonder: "Wait, what's a frequency table?"
A frequency table for qualitative data lists all categories in one column and the number of elements that belong to each of the categories on the next column. Tallies (e.g., ||||) can be used to number the raw data. The frequency table has the following look:
Source: Prem S. Mann. Introductory Statistics. John Wiley and Sons Inc. 2010
The variable is stress on job, which assumes three categories; very, somewhat, and none. Since there is some order, stress on job can be ranked as an ordinal variable. The frequency table always reports the sum of the frequencies that makes up our sample. 🧮
The concept of frequency tables can be extended using relative frequencies and percentages.
Relative frequency of a category = Frequency of that cat category / Sum of all frequencies
Percentage = Relative frequency * 100
With these quantities in mind, a relative frequency table is similar, but it gives the percentages for each category instead of counts. Based on the relative frequency and percentage distributions of stress on job, we can state that the 33.3% of the employees answered that their jobs are very stressful. Another way to interpret the data is by combining the groups "very" and "somewhat" stressed and report that 80% of the employees answered that jobs are very or somewhat stressful. ✅
The image below shows the difference between frequency and relative frequency. Unlike frequencies which typically add up to whatever total count of items/respondents there are, tThe sum of the relative frequencies should be 1.00 or close to 1.00 if the relative frequencies have been rounded. Similarly, the sum of the percentages is always 100 or close 100 if the percentages have been rounded as well.
Source: Basic Mathematics
For example, consider the following data set: 2, 3, 3, 5, 5, 5, 7, 7, 9. A frequency table for this data set would look like this:
Value ---- Frequency
2 ---- 1
3 ---- 2
5 ---- 3
7 ---- 2
9 ---- 1
For example, the relative frequencies for the data set above would be:
Value ---- Relative Frequency
2 ---- 0.1
3 ---- 0.2
5 ---- 0.3
7 ---- 0.2
9 ---- 0.1