If you spend some time browsing, there are some examples already available for Python SIFT/SURF bag of words (BoW) classifier in the internet. They use clustering (usually K-Means) to build dictionary of visual vocabularies (usually with sklearn
or cv2
clustering library) of SIFT/SURF features. However, most of the sample codes that I found can’t properly handle big number(> 100) of vocabularies/clusters, while some papers (such as this one) shows best result are achieved using 2000+ clusters.
Building visual dictionary using cv2.BOWKMeansTrainer
is super slow when using > 100 clusters. While using sklearn.cluster.KMeans
solves the speed issue, it requires huge amount of memory (8 GB of RAM is still insufficient to handle > 400 clusters). That’s where klearn.cluster.MiniBatchKMeans
comes into picture.
import cv2 import numpy as np import progressbar from sklearn.cluster import MiniBatchKMeans ... def build_dictionary(xfeatures2d, dir_names, file_paths, dictionary_size): print('Computing descriptors..') desc_list = [] num_files = len(file_paths) bar = progressbar.ProgressBar(maxval=num_files).start() for i in range(num_files): p = file_paths[i] image = cv2.imread(p) gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) kp, dsc = xfeatures2d.detectAndCompute(gray, None) desc_list.extend(dsc) bar.update(i) bar.finish() print('Creating BoW dictionary using K-Means clustering with k={}..'.format(dictionary_size)) dictionary = MiniBatchKMeans(n_clusters=dictionary_size, batch_size=100, verbose=1) dictionary.fit(np.array(desc_list)) return dictionary ... # usage example sift = cv2.xfeatures2d.SIFT_create() dir_names = ['class1', 'class2', ...] file_paths = ['/data/class1/1.jpg', '/data/class1/2.jpg', ..., '/data/class2/1.jpg', '/data/class2/2.jpg', ...] dictionary_size = 2800 dictionary = build_dictionary(sift, dir_names, file_paths, dictionary_size)
Using the code above, I was able to complete whole process of raw data (2000+ images) preprocessing, building SIFT dictionary, cross-validating 6 classifiers within 52 minutes (still quite long, but acceptable ?). While using SURF, it takes around 45 minutes. The complete main files for both experiments can be found in here and here.