Predicting depth and recognizing relations between 2D images and 3D models is an essential component of understanding the 3D geometry of a scene. Reconstructing a depth image from a single RGB image remains a challenge: a depth image encodes only the shape of objects, whereas an intensity image depends on viewpoint, texture, and lighting conditions. In this paper, we propose a deep learning model based on a multi-generative network to predict a depth image from a single RGB image. We train the multi-generative network with adversarial learning on depth images rendered from 3D CAD models corresponding to the objects appearing in real images. Moreover, the model is trained to optimize the Structural Similarity (SSIM) and Scale-Invariant Error (SI) measures; using SSIM and SI as loss functions improves performance compared to the simpler Mean Squared Error (MSE). The proposed model is evaluated on the PASCAL 3D+ dataset. It produces images with more photo-realistic detail while preserving object boundaries. The experimental results show that the proposed model yields a 4% improvement over a standard GAN and a GAN with a reconstruction loss.
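To make the two training objectives concrete, the following is a minimal NumPy sketch of the standard formulas: global SSIM (Wang et al.) and the scale-invariant log error (Eigen et al.). The combined weighting `alpha` is a hypothetical illustration; the paper does not specify how the two terms are balanced, and in practice SSIM is computed over local windows rather than globally.

```python
import numpy as np

def ssim(pred, target, c1=0.01**2, c2=0.03**2):
    # Global SSIM over the whole image, assuming inputs normalized to [0, 1].
    # Per-window SSIM with a Gaussian filter is the usual practical choice.
    mu_x, mu_y = pred.mean(), target.mean()
    var_x, var_y = pred.var(), target.var()
    cov_xy = ((pred - mu_x) * (target - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def scale_invariant_error(pred, target, lam=0.5):
    # Scale-invariant log error: penalizes relative depth errors while
    # discounting a global scale offset (the lam * mean-of-sums term).
    d = np.log(pred) - np.log(target)
    n = d.size
    return (d**2).mean() - lam * (d.sum()**2) / (n**2)

def combined_loss(pred, target, alpha=0.85):
    # Hypothetical combination: 1 - SSIM turns the similarity measure into
    # a loss; alpha is an assumed weighting, not taken from the paper.
    return alpha * (1.0 - ssim(pred, target)) \
        + (1.0 - alpha) * scale_invariant_error(pred, target)
```

For a perfect prediction both terms vanish: SSIM is 1 and the scale-invariant error is 0, so the combined loss is 0, which is the behavior a well-formed loss should exhibit.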