When training models to learn the relationship between two or more variables, we expect the resulting estimators to reflect previously established knowledge about that relationship. In some domains, such as healthcare, models cannot be deployed in practice unless their predictions respect this knowledge. In this study we focus on Assisted Reproduction Technology (ART), the subspecialty of gynecology concerned with treating human infertility, where the goal of any treatment is the delivery of a healthy newborn. A common ART treatment is In Vitro Fertilization (IVF), in which embryos are generated in vitro from collected sperm and oocytes, and those most likely to give rise to a healthy pregnancy are selected and transferred to the patient's uterus. IVF has a success rate of approximately 30% per cycle; to compensate for this low rate, a common practice so far has been to transfer two embryos simultaneously, aiming to increase the chances of a favorable outcome. While this practice increases overall live birth rates, it has also led to an alarmingly high rate of twin and triplet births, which are associated with a four times higher risk of perinatal mortality and with increased obstetric complications. Our objective is to predict the chances of both pregnancy (P) and multiple pregnancy (MP) following either single embryo transfer (SET) or double embryo transfer (DET), thereby facilitating an informed decision on how many embryos to transfer. From the existing literature, it is known that: (1) increasing the number of embryos transferred cannot decrease the chances of P or of MP; (2) the chances of MP cannot be higher than the chances of P; and (3) the chances of pregnancy are highly correlated with age, embryo stage, and embryo quality. Using a dataset generated from an existing observational study, we trained several state-of-the-art classifiers to predict P and MP given SET and DET. All classifiers achieved promising AUC scores.
However, Random Forest and Gradient Boosting predicted negative differences in chances in many instances when the number of embryos was increased, infringing the first constraint. Logistic Regression always predicted positive differences, but in some instances it infringed the second constraint, predicting higher chances of MP than of P. Moreover, it showed little to no variation across ages or embryo stages, violating the third constraint. Conventional Machine Learning models therefore struggle to reflect the real-world outcomes of using DET versus SET in specific patients. More informative variables could help, but it is already worrisome that variables as important as age and embryo stage do not produce any variation in the predictions, and that when models do show variation, in many cases they predict decreasing chances of success with more embryos. We conclude that new and different approaches are needed to correctly model this scenario and, likely, many others resembling it.
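As an illustration of how the first two constraints listed above could be checked against a model's output, the following is a minimal sketch and not the procedure used in the study. It assumes that each patient has four predicted probabilities (P and MP under SET and under DET) and reads constraint (1) as requiring that neither P nor MP decreases when moving from SET to DET. The function name `count_violations`, the tolerance `tol`, and the toy numbers are hypothetical and chosen only for the example; constraint (3) concerns variation across age and embryo stage and is not covered here.

```python
import numpy as np


def count_violations(p_set, p_det, mp_set, mp_det, tol=1e-9):
    """Count patients whose predictions break constraint (1) or (2).

    Hypothetical helper: inputs are per-patient predicted probabilities of
    pregnancy (p_*) and multiple pregnancy (mp_*) under SET and DET.
    """
    p_set, p_det, mp_set, mp_det = (
        np.asarray(a, dtype=float) for a in (p_set, p_det, mp_set, mp_det)
    )

    # Constraint (1), as assumed here: DET must not lower P or MP versus SET.
    breaks_c1 = (p_det < p_set - tol) | (mp_det < mp_set - tol)

    # Constraint (2): MP cannot exceed P for the same transfer strategy.
    breaks_c2 = (mp_set > p_set + tol) | (mp_det > p_det + tol)

    return {
        "constraint_1": int(breaks_c1.sum()),
        "constraint_2": int(breaks_c2.sum()),
        "n_patients": int(p_set.size),
    }


# Toy predictions for three patients (made-up numbers, not study data).
p_set = [0.30, 0.25, 0.40]
p_det = [0.45, 0.20, 0.50]   # second patient: P drops under DET
mp_set = [0.01, 0.02, 0.01]
mp_det = [0.10, 0.05, 0.55]  # third patient: MP exceeds P under DET

result = count_violations(p_set, p_det, mp_set, mp_det)
print(result)
```

In this toy input, the second patient breaks constraint (1) and the third breaks constraint (2). Counting violations per constraint, rather than only reporting AUC, makes this kind of failure visible even when the discrimination metrics look good.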