This work aims to develop a generalized and optimized path loss model that covers rural, suburban, urban, and urban high-rise environments over different frequencies, for use in Heterogeneous Ultra-Dense Networks in 5G. Five machine learning algorithms were tested on four combined datasets totaling 12,369 samples, and the hyper-parameters of each algorithm were automatically optimized using Bayesian optimization, HyperBand, and Asynchronous Successive Halving (ASHA). For the Bayesian optimization, three surrogate models (Gaussian Process, Tree-structured Parzen Estimator, and Random Forest) were considered. To the best of our knowledge, few works address automatic hyper-parameter optimization for path loss prediction, and none use the aforementioned optimization techniques. Differentiation among the various environments was achieved by assigning clutter height values based on ITU Radiocommunication Sector (ITU-R) Recommendation P.452-16. We also included the elevation of the transmitting antenna position as a feature to capture its effect on path loss. The best-performing machine learning model was K-Nearest Neighbors (KNN), achieving mean Coefficient of Determination (R2), mean Mean Absolute Error (MAE), and mean Root Mean Squared Error (RMSE) values of 0.7713, 4.8860 dB, and 6.8944 dB, respectively, obtained over 100 different samplings of the train and test sets. The results show that machine learning can be used to develop path loss models that are valid over a range of distances, frequencies, antenna heights, and environment types. HyperBand produced the hyper-parameter configurations with the highest accuracy for most of the algorithms.
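To illustrate the budget-based search idea behind HyperBand and ASHA (both build on successive halving; ASHA runs the eliminations asynchronously across workers), the following is a minimal, self-contained sketch of synchronous successive halving. The objective `toy_loss`, the hyper-parameter `k` (suggestive of a KNN neighbor count), and all numeric values are hypothetical stand-ins, not the paper's actual setup: many configurations are evaluated cheaply, and only the top fraction survives to receive a larger budget.

```python
import random


def toy_loss(config, budget):
    # Hypothetical validation error: lowest near k = 8, with a noise-like
    # term that shrinks as the evaluation budget grows. This is a stand-in
    # for training a real model with a partial budget (e.g. fewer epochs).
    k = config["k"]
    return (k - 8) ** 2 / 10 + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=3, rounds=3):
    """Keep the best 1/eta of configurations each round, multiplying
    the per-configuration budget by eta after every elimination."""
    budget = min_budget
    survivors = list(configs)
    for _ in range(rounds):
        survivors.sort(key=lambda c: toy_loss(c, budget))
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


random.seed(0)
# 27 random candidates so the eta=3 schedule shrinks 27 -> 9 -> 3 -> 1.
candidates = [{"k": random.randint(1, 30)} for _ in range(27)]
best = successive_halving(candidates)
print("best configuration:", best)
```

HyperBand extends this by running several such brackets with different trade-offs between the number of configurations and the starting budget, which hedges against objectives where cheap evaluations are misleading.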