Identifying student behavior in MOOCs using Machine Learning: Goals and challenges


Vanessa Faria de Souza
Gabriela Perry


This paper presents the results of a literature review carried out to identify prevalent research goals and challenges in predicting student behavior in MOOCs using Machine Learning. The results allowed recognizing three goals, including 1. Student Classification and 2. Dropout Prediction. Regarding the challenges, five items were identified: 1. Incompatibility of virtual learning environments (AVAs), 2. Complexity of data manipulation, 3. Class Imbalance Problem, 4. Influence of External Factors and 5. Difficulty in manipulating data by untrained personnel.




How to Cite
de Souza, V. F., & Perry, G. (2019). Identifying student behavior in MOOCs using Machine Learning: Goals and challenges. International Journal for Innovation Education and Research, 7(3), 30-39.
Author Biographies

Vanessa Faria de Souza, Universidade Federal do Rio Grande do Sul, Brazil

Graduate Program of Informatics in Education

Gabriela Perry, Universidade Federal do Rio Grande do Sul, Brazil

Graduate Program of Informatics in Education

