Learning effective feature representations and similarity measures is critical to the performance of a content-based image retrieval (CBIR) system. Although various techniques have been proposed, this remains one of the most challenging problems in CBIR, mainly because of the “semantic gap” between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. One of the most important advances in machine learning is “deep learning”, which attempts to model high-level abstractions in data by employing deep architectures composed of multiple non-linear transformations. CBIR can be improved by using state-of-the-art deep learning techniques to learn feature representations and similarity measures. Deep neural networks (DNNs) have recently shown strong performance on image classification.
In this paper we take another step and present several previously proposed methods for object detection and semantic segmentation that can be used for CBIR. In addition, we propose an approach based on a well-known deep CNN architecture, GoogLeNet. One of the most important approaches to object detection is RCNN (Regions with CNN features). The core idea of RCNN is to generate multiple object proposals, extract features from each proposal using a CNN, and then classify each candidate window with a category-specific linear SVM. We present the idea of RCNN and the methods that improve on it, including multi-stage and deformable deep CNNs. The other important family of approaches to object detection is based on DNN-based regression towards an object mask. In this approach, one or more CNNs can be defined to detect multiple objects. We present the idea of DNN-based regression and compare it with RCNN. Finally, we propose a CBIR system that uses a deep CNN. In our proposed system, we take the output of the last convolutional layer as the image features and find similar images based on these feature maps.
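The retrieval step described above can be illustrated with a minimal sketch. It assumes that the feature maps of the last convolutional layer have already been extracted for every image and are available as nested lists (channels × rows × columns). The toy values stand in for real GoogLeNet activations. Cosine similarity is also an assumption made here for illustration, since the text above does not fix a particular similarity measure, and the names `flatten`, `cosine_similarity`, and `retrieve` are hypothetical helpers.

```python
import math


def flatten(feature_maps):
    """Flatten feature maps (channels x rows x cols) into a single vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_maps, database, top_k=3):
    """Rank database images by similarity of their feature maps to the query.

    `database` maps an image name to its feature maps. The similarity
    measure (cosine) is an assumption of this sketch, not the paper's choice.
    """
    query_vec = flatten(query_maps)
    scored = [
        (cosine_similarity(query_vec, flatten(maps)), name)
        for name, maps in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Toy feature maps: 2 channels of 2x2 activations per image (made-up values).
database = {
    "cat_1": [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]]],
    "cat_2": [[[0.9, 0.1], [0.0, 1.0]], [[0.4, 0.6], [0.0, 0.0]]],
    "car_1": [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [0.7, 0.3]]],
}
query = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]]]
results = retrieve(query, database, top_k=2)
```

In a real system the flattened vectors would be much longer, so one would typically precompute and store them once per database image rather than flattening at query time.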
Deep learning requires extremely large computational resources to train a multi-layer neural network. GPUs are often used to train deep neural networks because the primary computational steps match their SIMD-style nature and because they provide much more raw computing capability than traditional CPU cores. In this paper we explain the role of GPUs and Caffe in achieving high-performance computing for deep learning.
Overall, our goal in this paper is to explain various deep neural network architectures, the RCNN approach to object detection, the DNN-based regression approach to object detection, high-performance computing for deep learning using GPUs, and our proposed CBIR system using a deep CNN.