

Understanding student engagement in online education is crucial for optimizing learning outcomes. This paper introduces the ECLIPSE dataset (Extended Classroom Learning Insights via Prolonged Student Engagement), comprising 10,110 annotated images from 55-minute, 30-minute, and 20-minute online lectures. Annotations cover four affective states: engagement, boredom, confusion, and frustration. ECLIPSE enables the investigation of learner attention dynamics over extended periods, overcoming the limitations of short-duration datasets. We establish benchmarks for ECLIPSE using EfficientNet, Vision Transformer (ViT), Residual Attention Network, and GLAMOR-Net. We propose NeuralGaze, a novel framework that integrates Neural Cellular Automata (NCA) with self-attention mechanisms and achieves higher accuracy in engagement level assessment than basic single-frame models. Furthermore, we introduce CG-SwT, a content-guided Swin Transformer model, which significantly outperforms the baseline ViT model on the ECLIPSE dataset, with F1-score improvements of 21.12%, 12.5%, 16.77%, and 15.41% for engagement, boredom, frustration, and confusion, respectively. Our methods also surpass existing single-frame engagement prediction baselines on the EngageNet and DAiSEE datasets by significant margins (7.4% and 6.2%, respectively). The code and dataset will be made publicly available.