Improving students’ performance and retention rates is a primary concern for higher education institutions. Large volumes of student data are mined with Educational Data Mining techniques to uncover hidden knowledge about students’ learning behaviour, in particular to detect the early symptoms of at-risk students. However, data containing noise, outliers and irrelevant information can lead to inaccurate results. This study aims to develop a robust student performance prediction model for higher education institutions by identifying the features of student data that have the potential to improve prediction results, and by comparing ensemble learning techniques, after preprocessing the data and optimizing the hyperparameters, to identify the most suitable one. Data are collected from two different systems, the student information system and the e-learning system, for undergraduate students from the Faculty of Engineering at a Malaysian public university. The dataset comprises 4,413 student instances. The process follows six data mining phases: data collection, data integration, data pre-processing (such as cleaning, normalization and transformation), feature selection, pattern extraction and, finally, model optimization and evaluation. The machine learning techniques used to build the prediction models are Decision Tree, Support Vector Machine and Artificial Neural Network; the ensemble learning techniques are Random Forest, Bagging, Stacking, Majority Vote and two Boosting variants, AdaBoost and XGBoost. The hyperparameters of the ensemble learning techniques are optimized to obtain better performance. The results show that Majority Vote, applied to the combined features of students’ behaviour from the e-learning system and the student information system, produced better results than the other ensemble methods.
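To illustrate the Majority Vote idea named above, the following is a minimal sketch of hard voting over the labels predicted by several base classifiers. It is not the study's implementation: the abstract does not say whether hard or soft voting was used, the function name `majority_vote`, the labels "pass" and "at-risk", the example predictions, and the tie-breaking rule (the earliest model's label among the tied labels wins) are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label predictions from several models by hard (majority) voting.

    predictions_per_model: list of equal-length lists, one list per base model.
    Ties are broken in favour of the label voted by the earliest model
    (an assumed rule, chosen only to keep the sketch deterministic).
    """
    if not predictions_per_model:
        raise ValueError("at least one model's predictions are required")
    n_instances = len(predictions_per_model[0])
    if any(len(p) != n_instances for p in predictions_per_model):
        raise ValueError("all models must predict the same number of instances")

    combined = []
    # zip(*...) groups the votes that all models cast for the same instance.
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # First label (in model order) that reaches the highest vote count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical predictions for four students from three base models
# (Decision Tree, Support Vector Machine, Artificial Neural Network).
dt_pred = ["pass", "at-risk", "pass", "at-risk"]
svm_pred = ["pass", "pass", "pass", "at-risk"]
ann_pred = ["at-risk", "at-risk", "pass", "pass"]

ensemble_pred = majority_vote([dt_pred, svm_pred, ann_pred])
print(ensemble_pred)
```

Each instance receives the label that most base models agree on, so a single model's error on a student is outvoted whenever the other two models are correct.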