Student-Teacher Evaluation Analysis

This document provides an overview of the methods used to produce the figures and values in the blog post shown here.

It relies predominantly on the pandas and seaborn Python libraries. Note that, by convention, these libraries are imported as pd and sns, respectively, so any reference to those namespaces indicates a call to the corresponding library.

Background

Tidy Data

Each of these libraries, and especially seaborn, relies on the data being represented in a tidy format. Briefly, the tidy standard (also known as a data matrix) insists that each observation occupies its own row and that each feature/variable of an observation occupies its own column. When this simple format is enforced, graphing libraries can offer more user-friendly and terse programming interfaces by making assumptions about the arrangement of the incoming data. Many excellent examples of tidy (and messy) datasets are available in a pioneering paper by Hadley Wickham, developer of the renowned ggplot2 package in R (Wickham, 2014). Typically, there are multiple measures that could be considered an "observation" and used to denote a distinct row; the appropriate dimension can often be found by close inspection of the dataset and of the relationships you want to examine.

As an example, my company uses imaging techniques to capture oxygenation biomarkers in patients' feet. When we look at this data, should we consider a patient to be a single observation, such that each patient occupies their own row? That seems logical. However, if a patient occupies a single row, our features must encode the left and right foot of that patient, e.g. 'Left Foot Biomarker 1', 'Right Foot Biomarker 1', 'Left Foot Biomarker 2', etc. This organization is useful if we want to compare properties of the left foot to the right foot for each patient. For most applications, though, it makes more sense to consider a single limb as the base unit of observation, so that the features need not be associated with a direction. With this organization, we introduce a new feature - 'Direction' - which takes the values 'Left' or 'Right', and the other biomarkers can drop their explicit association with direction. In practice, it sometimes makes sense to keep multiple copies of the dataframe in memory, each organized according to a different base unit of observation.

We will be working with predominantly tidy data throughout this tutorial, where each observation/row is a section of a course; that is, one instance of a course taught by one instructor. Our data will arrive in this tidy format, but know that the pandas library ships with many tools for converting messy data to a tidy format (see the pd.wide_to_long, pd.melt, and pd.DataFrame.unstack functions for more info).
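To make that reshaping concrete, here is a minimal sketch using pd.melt on a made-up miniature of the foot-biomarker table described above (the patient IDs, values, and column names are all hypothetical), converting a patient-per-row table into a limb-per-row one:

```python
import pandas as pd

# Hypothetical "messy" table: one row per patient, direction baked into column names
wide = pd.DataFrame({
    'Patient': ['A', 'B'],
    'Left Biomarker 1': [0.61, 0.58],
    'Right Biomarker 1': [0.64, 0.55],
})

# Tidy it: one row per limb, with an explicit 'Direction' column
tidy = (wide
        .melt(id_vars='Patient', var_name='Measure', value_name='Biomarker 1')
        .assign(Direction=lambda d: d['Measure'].str.split().str[0])  # 'Left'/'Right'
        .drop(columns='Measure'))
print(tidy)
```

Each of the four limbs now occupies its own row, so a plotting call can simply be told to color or facet by Direction.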

The first step is to read in the datasets. Both will be read in as pandas DataFrame objects, which you can think of as analogous to spreadsheets or large tables: the main student-reviews dataset as df_reviews, and the gender information (obtained from Gender API based on instructor first names) as df_gender. Instantiating these DataFrame objects allows us to use the methods (functions tied to classes) associated with the DataFrame class to operate on our data.

A Note on Method-Chaining

Although DataFrames (the chief abstraction provided by pandas) can be mutated in place (by passing the inplace=True keyword argument to many methods), this approach is slated for deprecation and negates much of the beauty of pandas. The example below displays two equivalent operations, one using in-place operations and the other using a functional approach called method chaining.

In-place operations:
df.reset_index(inplace=True)
df.sort_values(by='Name', inplace=True)
df.drop_duplicates(inplace=True)
df.rename(columns={'Name': 'Instructor'}, inplace=True)

Method chaining:
df = (df.reset_index()
        .sort_values(by='Name')
        .drop_duplicates()
        .rename(columns={'Name': 'Instructor'}))

By chaining commands together, arguments are kept close to their respective function calls. Furthermore, this approach implicitly informs anyone reading the code that the intermediate states of the df object within the method chain are irrelevant. In this way, custom user-defined method chains can be understood as serial functions which perform some task, letting you write pandas code that is terser and clearer. A better example is shown below, borrowed generously from here.

Wrapped functions:
tumble_after(
    broke(
        fell_down(
            fetch(
                went_up(jack_jill, "hill"),
                "water"),
            "jack"),
        "crown"),
    "jill")

Imperative, treating objects as immutable:
on_hill = went_up(jack_jill, 'hill')
with_water = fetch(on_hill, 'water')
fallen = fell_down(with_water, 'jack')
broken = broke(fallen, 'crown')
after = tumble_after(broken, 'jill')

Method-chained:
(jack_jill
    .went_up("hill")
    .fetch("water")
    .fell_down("jack")
    .broke("crown")
    .tumble_after("jill"))

As we work through these analyses, many examples will include method chaining with pandas.

A Note on Imports

In Python, it's idiomatic to place all of your import statements at the top of the script. In this notebook, I've deliberately not done that, so that each import statement stays close to the code which calls the imported module. This approach was taken to increase the clarity and readability of the examples.


Analysis

Loading in the Data

In [1]:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt # matplotlib will be used to generate figures that seaborn fills with content

# IPython.display will let us nicely render code as markdown
from IPython.display import display, Markdown
# Assign the function dm to render markdown to standard output
dm = lambda x: display(Markdown(x))

# Read in the Aggregated dataset
df_reviews = pd.read_csv('2019-10-06-Aggregated-Reviews.csv', index_col=False)

# Read in the instructor gender information
df_gender = pd.read_csv('instructors_with_gender.csv', index_col=False)

Now we come to an issue. We need to join the two dataframes in order to enforce our tidy data standards and to use tidy tools (like seaborn) to explore the dataset. We can accomplish this easily via the pd.DataFrame.merge method. To better inspect the join, we'll first sort each DataFrame by last name via the pd.DataFrame.sort_values method, so the previews line up. To display each DataFrame, we'll use the pd.DataFrame.head method, which returns the first n rows. We'll also print out the pd.DataFrame.columns attribute, which contains the names of the columns that can be operated upon in the DataFrame.

In [4]:
df_gender = df_gender.sort_values(by='Last Name')
dm('##### df_gender') # Recall that dm was assigned to the function display markdown, so it is basically a print statement now
display(df_gender[['Last Name', 'First Name', 'ga_gender']].head(5)) # Print the first 5 rows
print(f'Columns: {df_gender.columns.values}') # .values returns an np.array from the .columns attribute 

df_reviews = df_reviews.sort_values(by='Instructor Last Name')
dm('<br />') # Insert a line break between the two table previews
dm('##### df_reviews')
display(df_reviews[['Instructor Last Name', 'Instructor First Name', 'Instructor ID']].drop_duplicates().head(5)) # Slice columns and print the first 5 rows
print(f'Columns: {df_reviews.columns.values}') # .values returns an np.array from the .columns attribute 
df_gender
Last Name First Name ga_gender
757 Abbas June female
3079 Abbott Gabrielle female
2035 Abbott Braden male
2913 Abousleiman Younane male
3119 Abraham Eric male
Columns: ['First Name' 'Last Name' 'ga_first_name' 'ga_gender' 'ga_accuracy'
 'ga_samples']


df_reviews
Instructor Last Name Instructor First Name Instructor ID
43846 Abbas June 18447871
22979 Abbott Braden 1214093536
27276 Abbott Gabrielle 1704333194
25471 Abdallah Raef 564469350
25581 Abdeddine Ali 767991961
Columns: ['Unnamed: 0' 'Avg Course Rating' 'Avg Department Rating'
 'Avg Instructor Rating In Section' 'College Code' 'Course Enrollment'
 'Course Number' 'Course Rank in Department in Semester' 'Course Title'
 'Instructor Enrollment' 'Instructor First Name' 'Instructor ID'
 'Instructor Last Name' 'SD Course Rating' 'SD Department Rating'
 'SD Instructor Rating In Section' 'Subject Code' 'Term Code'
 'course_uuid']

Take a moment to note all of the columns of each dataframe, and to recognize the breadth of analysis opportunities this affords.

To create the DataFrame we'll use for analysis, we can merge df_gender into df_reviews via the First Name and Last Name columns. But, as we can see, the equivalent columns in df_reviews are named Instructor First Name and Instructor Last Name, so we pass some additional arguments to the .merge() method to account for this distinction.

In [4]:
df = df_reviews.merge(df_gender, left_on=['Instructor First Name', 'Instructor Last Name'], right_on=['First Name', 'Last Name'], how='inner')
display(df[['Instructor Last Name', 'Instructor First Name', 'Instructor ID', 'ga_gender']]\
        .sort_values(by='Instructor Last Name')\
        .drop_duplicates().head(5)) # Slice columns then use df.head() method to output the first 5 rows
Instructor Last Name Instructor First Name Instructor ID ga_gender
0 Abbas June 18447871 female
26 Abbott Braden 1214093536 male
18 Abbott Gabrielle 1704333194 female
39 Abousleiman Younane 1908424646 male
58 Abraham Eric 961195025 male

You can see from the above that we have successfully merged our two dataframes into the df variable; note that the genders and IDs match the individual tables above.

Now, what can we do with this tidy and merged dataset? That's where seaborn comes in.

Gender Differences In Student Ratings

Seaborn knows how to fetch and display data from a tidy dataframe, like the one(s) we have loaded in and the df we created from the .merge() method. To show how this accelerates the visualization process, consider the example below.

In [5]:
print(f'Before filtering to ensure gender accuracy, the dataset had {len(df)} entries.\n')
df = df[df['ga_accuracy']>90] # filter the df by ga_accuracy to ensure that we are confident in the gender of all of our professors
print(f'After filtering to ensure gender accuracy, the dataset has {len(df)} entries.\n\n')
sns.reset_defaults() # See other seaborn styles here: https://seaborn.pydata.org/tutorial/aesthetics.html

# Create a larger axes for the plot
fig, ax = plt.subplots(2,1, figsize=(10,8))
fullscale = sns.boxplot(x='Avg Instructor Rating In Section', y='ga_gender', data=df, ax=ax[0])
zoomed = sns.boxplot(x='Avg Instructor Rating In Section', y='ga_gender', data=df, ax=ax[1])
ax[1].set_xlim([4.2,4.5])
plt.show()
Before filtering to ensure gender accuracy, the dataset had 41039 entries.

After filtering to ensure gender accuracy, the dataset has 36232 entries.


It looks like, for our dataset at the very least, male professors are rated more highly than female professors. But how much higher are they rated? What's our sample size like? How have these biases varied over time? And is the difference significant? To answer these questions, which any statistician or data scientist would pose, we can use a combination of pandas, seaborn, and scipy, another common Python package regularly used in scientific computing.

What is the magnitude of the difference, and what's our sample size?

To answer this, we'll use the pd.DataFrame.groupby method to separate the two groups, then the pd.DataFrame.aggregate method to compute some metrics. With the magic of pandas, we can answer both of these questions (and more) in one line of code.

In [6]:
display(df.groupby('ga_gender').aggregate({'Avg Instructor Rating In Section': ['count', 'mean', 'median', 'std']}))
Avg Instructor Rating In Section
count mean median std
ga_gender
female 13570 4.245746 4.345661 0.555673
male 22662 4.252018 4.366666 0.593590

We note that men teach ~62% of the course sections at OU, and that the mean scores are actually extremely similar ($\Delta\approx0.006$) between male and female professors.
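As a quick sanity check on that ~62% figure, the share can be read straight off the grouped counts; a minimal sketch on a stand-in Series (hard-coding the section counts from the table above, since the full df isn't reproduced here):

```python
import pandas as pd

# Stand-in for the grouped output above: sections taught per gender
counts = pd.Series({'male': 22662, 'female': 13570}, name='sections')

# Normalize against the total to get each gender's share of sections
share = counts / counts.sum()
print(round(share['male'], 3))  # ~0.625, i.e. ~62%
```

On the real DataFrame, `df['ga_gender'].value_counts(normalize=True)` would produce the same proportions in one call.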

How much have these gender effects changed over time?

To answer this, let's pull out another column using the pd.Series.apply method. Note that a pd.Series is simply a column in a pd.DataFrame, such that df['Term Code'] returns a pd.Series object. Think of the pd.Series.apply method as analogous to an Excel operation where you write a formula in a top cell and drag it down to apply to all of your rows. To pull out this Year Series/column, note that the Term Code column takes the format YYYYSS, where YYYY is the year and SS the semester (10: Fall, 20: Spring, 30: Summer). We'll pull out the first 4 digits and assign them to the Year column.

In [7]:
 # Note that our lambda function converts the int to string, then slices off the first 4 characters and converts these back to int type
df['Year'] = df['Term Code'].apply(lambda x: int(str(x)[:4]))

Now, we can plot the average ratings of male and female professors by year.

In [8]:
df_by_year = df.groupby(['Year', 'ga_gender']).aggregate({'Avg Instructor Rating In Section': 'mean'}).reset_index()
display(df_by_year.set_index('Year').round(2).T) # .T transposes the table
fig, ax = plt.subplots(1, figsize=(10,4))
bar = sns.barplot(x='Year', y='Avg Instructor Rating In Section', hue='ga_gender', data=df, ax=ax)
ax.set_ylim([4,4.5])
plt.show()
Avg Instructor Rating In Section by Year and ga_gender:

Year    2010  2011  2012  2013  2014  2015  2016  2017  2018  2019
female  4.17  4.35  4.25  4.21  4.25  4.23  4.25  4.23  4.26  4.27
male    4.19  4.32  4.28  4.26  4.31  4.22  4.21  4.24  4.26  4.30

Things are less apparent from this viewpoint. Although men have higher mean ratings in the majority of years (2010, 2012, 2013, 2014, 2017, and 2019, with a tie in 2018), men are by no means consistently or exclusively rated higher than women.

Are the gender differences significant?

We can use the Python package scipy to conduct a two-sample Student's t-test to compare the means of the two groups.

In [9]:
from scipy import stats
male_ratings = df[df['ga_gender']=='male']['Avg Instructor Rating In Section']
female_ratings = df[df['ga_gender']=='female']['Avg Instructor Rating In Section']
dm('Based on an independent two-sample t-test for all tested instructor ratings, the p-value for the test is '\
   f'{round(stats.ttest_ind(male_ratings, female_ratings)[1], 2)}, '\
   'meaning we cannot reject the null hypothesis that the male and female group ratings are equivalent.')

Based on an independent two-sample t-test for all tested instructor ratings, the p-value for the test is 0.32, meaning we cannot reject the null hypothesis that the male and female group ratings are equivalent.

There are many more comparisons between variables we could conduct, but let's move on to further analysis and demonstrate some other functionality in seaborn and pandas.


Differences by College

We'll take a chance to assess how the ratings for each college have varied over time. However, the colleges are stored as codes rather than full names. To recover the full college name, we can use the pd.Series.map method to apply the following dictionary, which converts college codes to college names:

In [10]:
from pprint import pprint  # pprint stands for pretty print; Just outputs nicely formatted text

college_code_mapper = {'College of Architecture': 'CoA',
    'College of Arts and Sciences': 'CoAaS',
    'College of Atmospheric & Geographic Sciences': 'CoA&GS',
    'College of Continuing Education - Department of Aviation': 'CoCE-DoA',
    'Michael F. Price College of Business': 'MFPCoB',
    'Melbourne College of Earth and Energy': 'MCoEaE',
    'Jeannine Rainbolt College of Education': 'JRCoE',
    'Gallogly College of Engineering': 'GCoE',
    'Weitzenhoffer Family College of Fine Arts': 'WFCoFA',
    'Honors College': 'HC', 'College of International Studies': 'CoIS',
    'Gaylord College of Journalism and Mass Communication': 'GCoJaMC',
    'College of Professional and Continuing Studies': 'CoPaCS',
    'University College': 'UC', 'Center for Independent and Distance Learning': 'CfIaDL',
    'Expository Writing Program': 'EWP', 'ROTC - Air Force': 'R-AF'}

# Invert the mapper
code_college_mapper = {v:k for k,v in college_code_mapper.items()} # dict comprehensions are great
pprint(code_college_mapper)

df['College Name'] = df['College Code'].map(code_college_mapper)
{'CfIaDL': 'Center for Independent and Distance Learning',
 'CoA': 'College of Architecture',
 'CoA&GS': 'College of Atmospheric & Geographic Sciences',
 'CoAaS': 'College of Arts and Sciences',
 'CoCE-DoA': 'College of Continuing Education - Department of Aviation',
 'CoIS': 'College of International Studies',
 'CoPaCS': 'College of Professional and Continuing Studies',
 'EWP': 'Expository Writing Program',
 'GCoE': 'Gallogly College of Engineering',
 'GCoJaMC': 'Gaylord College of Journalism and Mass Communication',
 'HC': 'Honors College',
 'JRCoE': 'Jeannine Rainbolt College of Education',
 'MCoEaE': 'Melbourne College of Earth and Energy',
 'MFPCoB': 'Michael F. Price College of Business',
 'R-AF': 'ROTC - Air Force',
 'UC': 'University College',
 'WFCoFA': 'Weitzenhoffer Family College of Fine Arts'}
In [11]:
fig, ax = plt.subplots(1, figsize = (12, 4))
# Plot the lines
lines = sns.lineplot(x='Year', y='Avg Department Rating', markers=True, hue='College Name', ci=None, data=df, ax=ax)
# Plot the regression lines
reg = sns.regplot(x='Year', y='Avg Department Rating', color='k', label='Linear Regression', ci=95, data=df, scatter=False, ax=ax)
lg = plt.legend(bbox_to_anchor=(1.05, 1), loc=2)
plt.show()
In [12]:
# Compute the linear regression information via stats package
x = df['Year'].values
y = df['Avg Department Rating'].values
slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
# Display our results
dm('With this plot, we can get a good indication of '
   'the overall temporal changes in ratings by department via the black line. '
   f'We note that the slope of the change (in rating points per year) is {round(slope, 3)}, and that, '
   f'although we are confident in this trend ($p<10^{{-30}}$), it represents only about $R^2=${round(100*r_value**2, 2)}%'
   ' of the variation observed in the dataset. Visually, it appears that much more of the variance can be attributed'
   ' to differences between the colleges. However, with this busy plot it is tough to distinguish these colleges.'
   ' With the next graph, we will break up these curves to visually distinguish the colleges.')

With this plot, we can get a good indication of the overall temporal changes in ratings by department via the black line. We note that the slope of the change (in rating points per year) is 0.01, and that, although we are confident in this trend ($p<10^{-30}$), it represents only about $R^2=$0.48% of the variation observed in the dataset. Visually, it appears that much more of the variance can be attributed to differences between the colleges. However, with this busy plot it is tough to distinguish these colleges. With the next graph, we will break up these curves to visually distinguish the colleges.

Do the number of courses offered by a college impact its ratings?

In [13]:
# We've used these functions before; See if you can follow the operations.
collegecounts = df.drop_duplicates(subset=['course_uuid']).groupby('College Name')\
                  .aggregate({'course_uuid':'count', 'Avg Department Rating': 'mean'})\
                  .rename(columns={'course_uuid':'NumberofCoursesInCollege', 'Avg Department Rating': 'Avg College Rating'})
display(collegecounts.sort_values(by='NumberofCoursesInCollege').round(2))
df = df.merge(collegecounts, left_on='College Name', right_index=True).reset_index()

dm('Now, we can plot the smaller departments and the larger departments in their respective graphs. We will '\
   'use the `pd.DataFrame.query` method to separate the df into three groups for plotting.')
NumberofCoursesInCollege Avg College Rating
College Name
Center for Independent and Distance Learning 8 3.93
Expository Writing Program 79 4.18
University College 88 4.44
College of International Studies 171 4.33
Melbourne College of Earth and Energy 220 4.13
Honors College 262 4.35
Gaylord College of Journalism and Mass Communication 267 4.00
College of Professional and Continuing Studies 378 4.39
Michael F. Price College of Business 405 4.19
Gallogly College of Engineering 508 4.19
College of Architecture 560 4.25
Jeannine Rainbolt College of Education 561 4.19
Weitzenhoffer Family College of Fine Arts 1130 4.48
College of Arts and Sciences 2895 3.92

Now, we can plot the smaller departments and the larger departments in their respective graphs. We will use the pd.DataFrame.query method to separate the df into three groups for plotting.

In [14]:
fig, ax = plt.subplots(3,1, figsize = (12, 10), sharex=True)
for index, query in enumerate(['NumberofCoursesInCollege < 200', '500 > NumberofCoursesInCollege > 200', 'NumberofCoursesInCollege > 500']):
    lp = sns.lineplot(x='Year', y='Avg Department Rating', hue='College Name', size='NumberofCoursesInCollege', data=df.query(query), ax=ax[index]) # Add width
    ax[index].set_title(f'Colleges with {query}')
    ax[index].legend(bbox_to_anchor=(1.02, 1), loc=2)
    ax[index].set_ylim([3.4, 4.6])
plt.show()