Visual dictionary learning can determine sparse representations of input images in a data-driven manner using over-complete bases. Sparsity provides robustness to distractors and resistance to over-fitting, two valuable attributes of a competent classification solution. Its data-driven nature is comparable to that of deep convolutional neural networks, which elegantly blend global and local information through progressively more specific filter layers with increasingly large receptive fields. One shortcoming of dictionary learning is that it does not explicitly select and focus on important regions; instead, it generates responses either on a uniform grid of patches or over the entire image. To address this, we present an object-aware dictionary learning framework that systematically incorporates region proposals and deep features to improve the discriminative power of the combined classifier. Rather than extracting a dictionary from all fixed-size image windows, our method concentrates on a small set of object candidates, which enables consolidation of semantic information. We formulate this as an optimization problem with a new objective function and propose an iterative solver. Our results on benchmark datasets demonstrate the effectiveness of our method, which outperforms state-of-the-art dictionary learning and deep learning based image classification approaches.
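To make the sparse-coding idea behind dictionary learning concrete, the sketch below encodes a signal over an over-complete dictionary with a toy orthogonal matching pursuit (OMP) solver. This is an illustrative sketch only, not the paper's objective or solver: the dictionary here is random rather than learned, and all dimensions and variable names are assumptions chosen for the example.

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: select up to k atoms
    (unit-norm columns of D) to approximate signal x; returns the
    sparse coefficient vector a with at most k nonzeros."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit coefficients on the selected atoms by least squares.
        sub = D[:, support]
        coef, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))         # over-complete: 256 atoms, 64-dim signals
D /= np.linalg.norm(D, axis=0)             # normalize atoms to unit length
x = D[:, [3, 97]] @ np.array([1.5, -0.7])  # signal synthesized from two atoms
a = omp(D, x, k=2)                         # sparse code: at most 2 active atoms
print(np.flatnonzero(a))                   # indices of the active atoms
```

In a full dictionary learning pipeline, such a sparse-coding step alternates with a dictionary update so the atoms themselves adapt to the data; the proposed framework replaces the uniform-grid input to this process with features from object candidates.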