Nowadays, concept detection from multimedia data is considered as an emerging topic due to its applicability to various applications in both academia and industry. However, there are some inevitable challenges including the high volume and variety of multimedia data as well as its skewed distribution. To cope with these challenges, in this paper, a novel framework is proposed to integrate two correlation-based methods, Feature-Correlation Maximum Spanning Tree (FC-MST) and Negative-based Sampling (NS), with a well-known deep learning algorithm called Convolutional Neural Network (CNN). First, FC-MST is introduced to select the most relevant low-level features, which are extracted from multiple modalities, and to decide the input layer dimension of the CNN. Second, NS is adopted to improve the batch sampling in the CNN. Using NUS-WIDE image data set as a web-based application, the experimental results demonstrate the effectiveness of the proposed framework for semantic concept detection, comparing to other well-known classifiers.