Education Analytics of an Online Classroom

February, 2015

Back in 2015, while I was a sophomore at the Stanford Online High School (OHS), I wrote a series of programs for extracting and analyzing data from recordings of my online classes. I'm not going to publish much detail here, because I don't think my classmates would appreciate having their classroom behavior publicly scrutinized, but I can say that I found some general results, including that:

I was also able to generate tons of class-specific statistics. For example, I could generate the network showing which students "agreed" with which others in the text chat (names anonymized):

And word clouds for the text chat in each course:

(I like how the most common word in the STEM courses is usually "oh", while the most common word in the more humanities-oriented courses is usually something more abstract like "think".)
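The counts behind a word cloud like this are simple to compute. Here is a minimal sketch, not the original code: the function name `top_words`, the tokenizing regex, and the sample messages are all made up for illustration.

```python
import re
from collections import Counter


def top_words(messages, n=3):
    """Return the n most common lowercase words across chat messages."""
    words = []
    for text in messages:
        # Crude tokenizer (an assumption): runs of letters and apostrophes.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(words).most_common(n)


# Made-up sample messages, not real class data.
sample = ["Oh, I see", "oh that makes sense", "I think so"]
print(top_words(sample, 1))
```

A real word cloud would probably also drop very common filler words before counting, but that choice is left out here.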

I also generated plots that break down the text chat by student for each course, so you can see how much each student participated over time:
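The data behind a breakdown like this can be sketched as word counts per student per time bucket. This is only an illustration under assumptions: the `(seconds, student, text)` message tuples, the 10-minute bucket size, and the function name are invented here, not taken from the original programs.

```python
from collections import defaultdict


def words_per_student_by_bucket(messages, bucket_seconds=600):
    """Count words typed by each student in each fixed-width time bucket.

    `messages` is an iterable of (seconds_into_class, student, text) tuples,
    a hypothetical shape chosen for this sketch.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seconds, student, text in messages:
        bucket = int(seconds // bucket_seconds)
        counts[bucket][student] += len(text.split())
    # Convert to plain dicts, ordered by bucket, for easy plotting.
    return {bucket: dict(per_student) for bucket, per_student in sorted(counts.items())}


# Made-up, anonymized sample data.
sample_messages = [
    (30, "Student A", "hello there"),
    (650, "Student B", "oh"),
    (700, "Student A", "I think so"),
]
breakdown = words_per_student_by_bucket(sample_messages)
print(breakdown)
```

Each bucket's counts could then be drawn as one slice of a stacked area or bar chart.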

Feel free to contact me if you want more information or examples.


OHS classes are held synchronously over Adobe Connect, and (at least while I was there) every class was recorded so students could review earlier classes, or view classes that they could not attend for scheduling reasons. This is a screenshot of a typical OHS class recording:

This is probably pretty familiar to anyone who has seen a classroom in 2020, so I won't spend too much time explaining it. The main differences between OHS classrooms (while I was there) and most Zoom classrooms are that 1) most teachers didn't require students to enable their cameras when they weren't speaking, and 2) the text chat was extremely important. On the latter point, one of my classes averaged over 1000 words per hour in the chat.

I noticed that the class recordings were not videos (as many Zoom recordings are now). Instead, they were re-rendered locally, meaning the raw data had to be transmitted to the client in some way. I discovered a few undocumented APIs that made it possible to download that raw data in the form of XML files. However, these XML files were clearly not made for analysis; they were more like an incredibly messy animation-by-animation guide to re-rendering the class. With some work, I managed to reverse engineer the format and extract text chat logs, microphone talk times, hand raises, slide changes, etc.
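To give a feel for the extraction step, here is a minimal sketch of pulling chat messages out of an event-style XML file. The schema below (`<event>` elements with `type`, `time`, and `sender` attributes) is entirely invented for illustration; the real recording files were far messier and are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a recording file (not the real format).
SAMPLE_XML = """<recording>
  <event type="chat" time="1200" sender="Student A">oh, that makes sense</event>
  <event type="hand_raise" time="1500" sender="Student B"/>
  <event type="chat" time="2100" sender="Student B">I think so too</event>
</recording>"""


def extract_chat(xml_text):
    """Return (time, sender, text) tuples for chat events, sorted by time."""
    root = ET.fromstring(xml_text)
    chat = []
    for event in root.iter("event"):
        if event.get("type") != "chat":
            continue  # skip hand raises, slide changes, and so on
        text = (event.text or "").strip()
        chat.append((int(event.get("time")), event.get("sender"), text))
    return sorted(chat)


chat_log = extract_chat(SAMPLE_XML)
for time, sender, text in chat_log:
    print(time, sender, text)
```

The same loop could collect other event types, such as hand raises, by filtering on a different `type` value.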