Education at the Beginning of the Pandemic

Alamsyah Hanza
7 min read · Jan 7, 2022

The impact of the Covid-19 pandemic has been a constant topic since 2020. The pandemic has disrupted many different fields, and education is no exception in any country.

The pandemic pushed many students to start learning at home, many teachers to learn new teaching tools, and many parents to guide their children through the process. In response, many institutions wrote their own rules and moved everything online. It was not an easy process, since learning is generally expected to be more effective offline.

Working from this Kaggle competition, I would like to go through what happened in 2020 as online learning changed in the US. This analysis is a good starting point for public policy on online learning, especially for cities with low bandwidth.

The Description

This online learning analysis uses a data source containing 22 million rows of online learning activity across 8,647 learning products over one year. The data was collected from 233 unique districts in 15 areas, dominated by Connecticut, Utah, Massachusetts, and Illinois. The data source contains two main metrics:

1. Pct Access: the percentage of students in the district who have at least one page-load event for a given product on a given day
2. Engagement: the total page-load events per one thousand students for a given product on a given day
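To make the two definitions concrete, here is a minimal sketch that derives both metrics from a raw page-load log. The published data is already aggregated, so the `events` log, the `daily_metrics` helper, and the district size are all hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict


def daily_metrics(events, enrolled_students):
    """Return {(day, product): (pct_access, engagement_index)}.

    events: iterable of (day, product, student_id), one tuple per page load
            (a hypothetical raw log, not part of the dataset).
    enrolled_students: number of students in the district (assumed known).
    """
    page_loads = defaultdict(int)
    active_students = defaultdict(set)
    for day, product, student_id in events:
        key = (day, product)
        page_loads[key] += 1
        active_students[key].add(student_id)

    metrics = {}
    for key, loads in page_loads.items():
        # Share of students with at least one page load, as a percentage.
        pct_access = 100 * len(active_students[key]) / enrolled_students
        # Page loads per one thousand students.
        engagement_index = 1000 * loads / enrolled_students
        metrics[key] = (pct_access, engagement_index)
    return metrics


# Made-up events for a district of 200 students.
events = [
    ("2020-03-02", "product_a", "s1"),
    ("2020-03-02", "product_a", "s1"),
    ("2020-03-02", "product_a", "s2"),
    ("2020-03-02", "product_b", "s3"),
]
metrics = daily_metrics(events, enrolled_students=200)
```

In this toy case two of 200 students opened product_a three times in total, which gives a Pct Access of 1.0 and an Engagement of 15.0 for that day.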

The data is definitely not complete. Here are some notes to keep in mind about the data source:

1. 57 districts have no information on state, locale, or pct_black.
2. 20 learning products have no sector.
3. Many engagement index values are null, and these relate to zero or null pct access.
4. In the districts data, 'county_connections_ratio' is not very informative since the column mostly contains `[0.18, 1[`.
5. There are 8,647 products in the engagement list, but only 372 have details (most of them from Google).
6. Fortunately, those 372 products with details cover half of the engagement data.
7. Most of the products are in the PreK-12 sector with LC functions.

Therefore, external data is needed to fill in this information. Also, among the non-null data, the majority of districts (65.9%) fall in the [0, 0.2[ range for pct_black/Hispanic. In addition, there is still a small amount of engagement where pct_access = 0, which means the zero value is just small access rounded down.
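The rounding explanation can be checked with simple arithmetic. The sketch below assumes both metrics are stored with two decimals (an assumption about the data, not something confirmed here) and uses a made-up district size.

```python
# Hypothetical district: one student out of 25,000 opens a product three times.
students = 25_000
active_students = 1
page_loads = 3

# Assumption: both metrics are rounded to two decimals when published.
pct_access = round(100 * active_students / students, 2)
engagement_index = round(1000 * page_loads / students, 2)

print(pct_access, engagement_index)
```

Here the true access share is 0.004%, which rounds to 0.0, while the engagement index of 0.12 survives the rounding. That is the pattern noted above: zero access with a small non-zero engagement.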

The Adaptation

First, I try to picture the digital connectivity and engagement of 2020 online learning in the US. I do this by comparing engagement with Covid cases in one plot and dividing the year into four periods.

Self-Generated Trends

Those four periods, or phases, of 2020 are based on the Covid case counts. The orange line shows the total Covid cases for all areas in the US, while the blue line shows the total Covid cases for the areas in the data source. The areas that contribute to the blue line are referred to as Blue Areas. Here is the detailed chronology from the plot:

Phase 1 (before 2020-02-10): This is pre-Covid for the Blue Areas. Online engagement during this phase is still stable, low on weekends and high on weekdays. The small number of Covid cases in this phase does not make online learning increase or decrease significantly. In addition, lockdown had not yet been implemented in the Blue Areas.

Phase 2 (2020-02-10 to 2020-06-14): The first high wave starts around April, but online learning engagement changes drastically around February 2020. This is probably the result of state intervention pushing schools to start thinking about alternatives to offline learning. The highest median engagement occurs around the start of the high case counts. After the peak in mid-April, cases in the Blue Areas start decreasing along with online learning engagement.

Phase 3 (2020-06-14 to 2020-09-07): After reaching their lowest point in June 2020, cases in the Blue Areas slowly increase (not as sharply as in other areas). Interestingly, learning engagement keeps decreasing, even below Phase 1 levels, regardless of how Covid cases move. The hypothesis is that online learning engagement does not really correlate with Covid cases but is driven more by state rules. Also, after two or three months without offline learning, many districts made the most of this offline learning period by decreasing online activity.

Phase 4 (after 2020-09-07): In the last phase, the second wave of Covid-19 has started and schools begin engaging online again (perhaps due to state policies). However, most schools show no sign of panic, since engagement is not as high as during Phase 2. This could mean one of two things: schools are not following lockdown policies properly, or schools can now handle online learning calmly. In the latter case, schools are adapting: they are more ready for online learning in terms of tools and applications and in how they communicate everything to parents and students.
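The phase split above can be sketched in a few lines of plain Python. The three boundary dates come from the text. Treating each boundary date as the first day of the next phase is my assumption, and the engagement values are made up purely to show the mechanics.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from statistics import median

# Boundary dates from the text. Assumption: a boundary date opens the next phase.
BOUNDARIES = [date(2020, 2, 10), date(2020, 6, 14), date(2020, 9, 7)]


def phase_of(day):
    """Map a calendar date to a phase number from 1 to 4."""
    return bisect_right(BOUNDARIES, day) + 1


def median_engagement_by_phase(rows):
    """rows: iterable of (date, engagement) pairs, e.g. one pair per day."""
    by_phase = defaultdict(list)
    for day, engagement in rows:
        by_phase[phase_of(day)].append(engagement)
    return {phase: median(values) for phase, values in sorted(by_phase.items())}


# Illustrative numbers only, not taken from the dataset.
rows = [
    (date(2020, 1, 15), 100.0),
    (date(2020, 1, 16), 120.0),
    (date(2020, 4, 15), 300.0),
    (date(2020, 7, 1), 80.0),
    (date(2020, 10, 1), 200.0),
]
summary = median_engagement_by_phase(rows)
```

Using the median per phase keeps the comparison robust to the weekday and weekend swings described in Phase 1.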

The Groups

The second analysis looks at the relationship between student engagement with online learning platforms and locale type (Rural, Suburb, Town, etc.), demographic race, and internet speed. One simple way to present this relationship is to cluster the 233 areas based on student engagement with online learning platforms. Three is the optimal number of clusters, using three features: number of active days, engagement, and number of unique products. However, this technique may lose some information and be biased by outliers.

Based on the features, we can describe the clusters as follows:

  • 0 → high active days, high product (the Explorer) → 150 areas
  • 1 → high active days, low product (the Loyal) → 59 areas
  • 2 → low active days, low engagement (the Rookie) → 24 areas
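The clustering step is not spelled out above, so the following is only a sketch of one common approach: standardize the three features, then run k-means with k = 3. The choice of k-means, the farthest-first initialization, and the six toy districts are all assumptions made for illustration.

```python
import math


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def init_centers(points, k):
    """Deterministic farthest-first start: each new center is the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Toy districts: (active_days, engagement, unique_products). Made-up values.
districts = [
    (240, 300, 900), (250, 320, 950),   # many active days, many products
    (245, 280, 300), (235, 260, 280),   # many active days, few products
    (60, 40, 150), (50, 30, 120),       # few active days, low engagement
]
labels = kmeans(standardize(districts), k=3)
```

Standardizing first matters because the three features live on very different scales. Cluster means are also pulled by extreme districts, which is the outlier bias mentioned above.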

Since locale type, internet speed, and demographic race are nominal columns, we use the Theil's U score to measure the degree of association between two nominal variables.
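For reference, Theil's U (the uncertainty coefficient) of X given Y is U(X|Y) = (H(X) - H(X|Y)) / H(X), where H is the Shannon entropy. Below is a minimal sketch of that formula. The function names and sample labels are mine, and returning 1.0 when X has zero entropy is a convention I assume here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) in nats."""
    total = len(values)
    return -sum((n / total) * math.log(n / total) for n in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y): the uncertainty left in x once y is known."""
    total = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    return -sum((n / total) * math.log(n / y_counts[y_val])
                for (_, y_val), n in xy_counts.items())


def theils_u(x, y):
    """U(X | Y) in [0, 1]: the fraction of x's uncertainty explained by y."""
    h_x = entropy(x)
    if h_x == 0:  # x is constant, so there is nothing left to explain (convention)
        return 1.0
    return (h_x - conditional_entropy(x, y)) / h_x


# Made-up labels: locale fully determines the cluster here.
clusters = [0, 0, 1, 1]
locales = ["Rural", "Rural", "City", "City"]
u_dependent = theils_u(clusters, locales)

# Made-up labels: speed group tells us nothing about the cluster.
speed_groups = ["grp0", "grp1", "grp0", "grp1"]
u_independent = theils_u(clusters, speed_groups)
```

Unlike a correlation coefficient, the score is asymmetric: U(cluster | locale) and U(locale | cluster) can differ, so it is worth stating which direction is reported.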

Relation Clusters with internet speed and locale type

First, consider the association between the clusters and the internet speed group (grp0 is the lowest). The Theil's U score shows a weak relationship, but the graph shows that cluster 0 has relatively low speed, while cluster 1 is the opposite. The hypothesis is that, because of low internet speed, areas in cluster 0 keep trying alternatives so that students can follow the class.

Additionally, the locale type in cluster 1 is dominated by Rural areas. This indicates that areas with low population density are more likely to stay loyal to certain products, which are not necessarily good for them. There are two possibilities:

  1. These rural areas have studied and compared the products used by other Cities or Towns before making a decision. OR
  2. These rural areas just picked several products from limited options, and they work fine for students and parents.

Either way, Rural areas must have better guidance to select the best product for their areas.

Relation Clusters with demographic race

On the other hand, cluster 2 mainly has an unknown locale type, with a relatively high percentage of Black/Hispanic students and relatively low internet speed. This cluster has fewer active days and lower engagement. For the most part, its engagement increased in the second and third weeks of August 2020 (weeks 33 and 34). There are two scenarios:

  1. These areas were just hit by the second wave of Covid-19, so high engagement only appears in August 2020. OR
  2. These areas did not have enough support or guidance during the first wave.

cluster 2 engagement

As a note, cluster 2 has relatively lower bandwidth than cluster 0. This may limit the product options these areas can explore.

The Conclusion

As we saw in the Adaptation section, the beginning is always harder than what comes later. Schools had to decide which tools to use, students and parents had to adapt, and teachers had to fit all of this into their curriculum. After several rounds of trial and error, schools came through the disruption: they have chosen the applications that work best for them and are more ready for online learning.

In the process, however, some schools face limitations, starting with internet bandwidth and a lack of information about online learning applications. This clearly slows down the adaptation from offline to online learning.

In short, disruption in the education field is hard. It affects not only schools; parents must also understand the disruption. Governments must provide better technology and communication within their countries so that no students are left behind.