Image Classification
Posted By : Webtunix Solution
Image classification is the process of categorizing and labeling groups of pixels or vectors within an image according to specific rules. The categorization rule can be devised using one or more spectral or textural characteristics.
In this post we apply several image classification methods to an image dataset and compare their accuracy to find the best-performing one. A classifier takes input images and outputs a class for each one, for example whether a disease is present or not. Here the classes are the male and female labels of a gender dataset.
Import the required libraries for image classification
In [24]:
import numpy as np
import pprint
pp=pprint.PrettyPrinter(indent=4)
import joblib
from skimage.io import imread
from skimage.transform import resize
import os
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.image as mpimg
from skimage.feature import hog
Define a function that resizes all the images to the same size and builds a dictionary that stores the image data, labels, and filenames
In [11]:
def resize_all(src, pklname, include, width=250, height=None):
    # Default to square images when no height is given
    height = height if height is not None else width
    data = dict()
    data['description'] = 'resize({0}*{1}) gender image in rgb'.format(int(width), int(height))
    data['label'] = []
    data['filename'] = []
    data['data'] = []
    pklname = f'{pklname}_{width}*{height}px.pxl'
    # Each subdirectory of src is one class; read only those listed in include
    for subdir in os.listdir(src):
        if subdir in include:
            print(subdir)
            currentpath = os.path.join(src, subdir)
            for file in os.listdir(currentpath):
                if file[-3:] in {'jpg', 'png'}:
                    im = imread(os.path.join(currentpath, file))
                    # resize expects the output shape as (rows, cols)
                    im = resize(im, (height, width))
                    data['label'].append(subdir[:])
                    data['filename'].append(file)
                    data['data'].append(im)
    # Save the collected images, labels, and filenames to disk
    joblib.dump(data, pklname)
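The check on the last three characters of the filename in resize_all skips files such as photo.JPG or photo.jpeg. A minimal sketch of a more tolerant check follows. The helper name is_image_file and the set of accepted extensions are assumptions for illustration, not part of the original code.

```python
import os

# Assumed set of accepted extensions; adjust to the dataset at hand.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}

def is_image_file(filename):
    """Return True if filename ends with an accepted image extension (case-insensitive)."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in IMAGE_EXTENSIONS

print(is_image_file('face_001.JPG'))
print(is_image_file('notes.txt'))
```

Inside resize_all, the condition on file[-3:] could then be swapped for a call to this helper.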
Set the directory of the image dataset
In [12]:
data_path='/home/webtunix/Downloads/image classification archive /gender/train'
os.listdir(data_path)
Out[12]:
['male', 'female']
Call the resize_all function with the parameter values (width, base name, include)
In [13]:
base_name="Gender data"
width=224
include={'male',"female"}
resize_all(src=data_path,pklname=base_name,width=width,include=include)
male
female
Load the dictionary 'data' saved by resize_all, print information about the dataset, and use Counter to count the samples for each label
In [14]:
from collections import Counter
data=joblib.load(f'{base_name}_{width}*{width}px.pxl')
print("Number of sample:",len(data['data']))
print("Keys:",list(data.keys()))
print("description:",data['description'])
print("image shape:",data['data'][0].shape)
print('lables:',np.unique(data['label']))
Counter(data['label'])
Number of sample: 3491
Keys: ['description', 'label', 'filename', 'data']
description: resize(224*224) gender image in rgb
image shape: (224, 224, 3)
lables: ['female' 'male']
Out[14]:
Counter({'male': 1744, 'female': 1747})
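The two counts are almost equal, which matters when reading the accuracy scores later: on a balanced two-class dataset, always guessing the most frequent label is right about half the time. A small sketch that turns the counts printed above into shares and a majority-class baseline (computed on the whole dataset, not on the test split):

```python
from collections import Counter

# Label counts as reported in the output above.
label_counts = Counter({'male': 1744, 'female': 1747})

total = sum(label_counts.values())
shares = {label: count / total for label, count in label_counts.items()}

# Accuracy obtained by always predicting the most frequent label.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f'{label}: {share:.2%}')
print(f'majority-class baseline: {majority_baseline:.2%}')
```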
Show the first image sample of each unique label (male, female)
In [19]:
labels = np.unique(data['label'])
fig, axes = plt.subplots(1, len(labels))
fig.set_size_inches(15,4)
fig.tight_layout()
for ax, label in zip(axes, labels):
    idx = data['label'].index(label)
    ax.imshow(data['data'][idx])
    ax.axis('off')
    ax.set_title(label)
Convert the dictionary values to NumPy arrays and split them into training and test sets with train_test_split
In [20]:
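# Caution: skimage's resize returns floats in [0, 1], so the uint8 cast truncates nearly
# every pixel to 0. The outputs below were produced with this cast; omit dtype to avoid it.
X = np.array(data['data'],dtype='uint8')
y = np.array(data['label'])
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=42)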
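One detail in this cell is worth checking: by default skimage's resize returns floating-point images scaled to [0, 1], and casting such values straight to uint8 truncates them toward zero. The sketch below uses a synthetic array as a stand-in for a resized image (an assumption, since the real images are not available here) to show the effect and one possible remedy.

```python
import numpy as np

# Stand-in for a resized image: floats evenly spread over [0, 1].
float_img = np.linspace(0.0, 1.0, 48).reshape(4, 4, 3)

# Direct cast, as in the cell above: fractional values are truncated toward zero.
naive = np.array(float_img, dtype='uint8')

# Rescale to [0, 255] before casting to keep the intensity information.
rescaled = np.round(float_img * 255).astype('uint8')

print(np.unique(naive))
print(rescaled.min(), rescaled.max())
```

Keeping the float arrays as they are (no dtype argument) is another option, since the grayscale and HOG steps accept float images.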
Define a function that plots the percentage of male and female samples in the training and test arrays.
In [21]:
def plot_bar(y, loc='left', relative=True):
    width = 0.35
    # Shift the bars left or right so train and test sit side by side
    if loc == 'left':
        n = -0.5
    elif loc == 'right':
        n = 0.5
    # Count the samples per label
    unique, counts = np.unique(y, return_counts=True)
    sorted_index = np.argsort(unique)
    unique = unique[sorted_index]
    if relative:
        counts = 100*counts[sorted_index]/len(y)
        ylabel_text = '% count'
    else:
        counts = counts[sorted_index]
        ylabel_text = 'count'
    xtemp = np.arange(len(unique))
    plt.bar(xtemp + n*width, counts, align='center', alpha=.7, width=width)
    plt.xticks(xtemp, unique, rotation=45)
    plt.xlabel('gender')
    plt.ylabel(ylabel_text)
plt.suptitle('relative amount of photos per label')
plot_bar(y_train, loc='left')
plot_bar(y_test, loc='right')
plt.legend(['train ({0} photos)'.format(len(y_train)),'test ({0} photos)'.format(len(y_test))]);
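The numbers behind the bars can also be inspected without plotting. A short sketch, using a hypothetical helper label_percentages and toy label arrays in place of y_train and y_test:

```python
import numpy as np

def label_percentages(y):
    """Return a dict mapping each unique label to its share of y in percent."""
    unique, counts = np.unique(y, return_counts=True)
    return {str(label): 100 * count / len(y) for label, count in zip(unique, counts)}

# Toy label arrays standing in for y_train and y_test.
toy_train = np.array(['male', 'female', 'female', 'male'])
toy_test = np.array(['female', 'male', 'female', 'female'])

print(label_percentages(toy_train))
print(label_percentages(toy_test))
```

If the two dictionaries differ a lot, passing stratify=y to train_test_split is one way to keep the label shares equal in both splits.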
Define an RGB2GrayTransformer that converts an array of RGB images to grayscale, and a HogTransformer that computes HOG (histogram of oriented gradients) features for each image. HOG is a feature extraction method that describes the shapes in an image.
In [25]:
from sklearn.base import BaseEstimator, TransformerMixin
from skimage.color import rgb2gray
class RGB2GrayTransformer(BaseEstimator, TransformerMixin):
    """Convert an array of RGB images to grayscale."""
    def __init__(self):
        pass
    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn
        return self
    def transform(self, X, y=None):
        return np.array([rgb2gray(img) for img in X])
class HogTransformer(BaseEstimator, TransformerMixin):
    """Compute HOG features for each image in an array."""
    def __init__(self, y=None, orientations=9,
                 pixels_per_cell=(8, 8),
                 cells_per_block=(3, 3), block_norm='L2-Hys'):
        self.y = y
        self.orientations = orientations
        self.pixels_per_cell = pixels_per_cell
        self.cells_per_block = cells_per_block
        self.block_norm = block_norm
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        def local_hog(X):
            return hog(X,
                       orientations=self.orientations,
                       pixels_per_cell=self.pixels_per_cell,
                       cells_per_block=self.cells_per_block,
                       block_norm=self.block_norm)
        return np.array([local_hog(img) for img in X])
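Because both classes follow scikit-learn's fit/transform convention, they can also be chained in a Pipeline. The sketch below shows the idea with a hypothetical FlattenTransformer standing in for the HOG step and random arrays standing in for images, so it runs without scikit-image. It is an illustration of the pattern, not the setup used in this post.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class FlattenTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the HOG step: flatten each image to a 1-D vector."""
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        return np.array([np.ravel(img) for img in X])

rng = np.random.default_rng(0)
toy_train = rng.random((6, 8, 8))  # six random 8x8 "grayscale images"
toy_test = rng.random((2, 8, 8))

pipeline = Pipeline([
    ('flatten', FlattenTransformer()),
    ('scale', StandardScaler()),
])

# fit_transform learns the scaling on the training data only;
# transform reuses those statistics for the test data.
train_prepared = pipeline.fit_transform(toy_train)
test_prepared = pipeline.transform(toy_test)
print(train_prepared.shape, test_prepared.shape)
```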
Create an instance of each transformer and convert X_train step by step: apply the RGB2Gray transformation, then the HOG transformation, then standard scaling. The transformers fitted on the training set are reused to transform X_test.
In [40]:
from sklearn import svm
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import StandardScaler, Normalizer
import skimage
grayify = RGB2GrayTransformer()
hogify = HogTransformer(
    pixels_per_cell=(14, 14),
    cells_per_block=(2,2),
    orientations=9,
    block_norm='L2-Hys'
)
scalify = StandardScaler()
X_train_gray = grayify.fit_transform(X_train)
X_train_hog = hogify.fit_transform(X_train_gray)
X_train_prepared = scalify.fit_transform(X_train_hog)
print(X_train_prepared.shape)
X_test_gray = grayify.transform(X_test)
X_test_hog = hogify.transform(X_test_gray)
X_test_prepared = scalify.transform(X_test_hog)
(2792, 8100)
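The second number in the printed shape can be checked by hand. With 224x224 images and 14x14 pixels per cell there are 16x16 cells; blocks of 2x2 cells that slide one cell at a time give 15x15 blocks; each block holds 2*2 cells with 9 orientation bins each. A small sketch of that arithmetic (hog_feature_length is a hypothetical helper, not a scikit-image function):

```python
def hog_feature_length(image_shape, pixels_per_cell, cells_per_block, orientations):
    """Length of the HOG vector for one grayscale image, assuming blocks slide one cell at a time."""
    cells_y = image_shape[0] // pixels_per_cell[0]
    cells_x = image_shape[1] // pixels_per_cell[1]
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    return blocks_y * blocks_x * cells_per_block[0] * cells_per_block[1] * orientations

print(hog_feature_length((224, 224), (14, 14), (2, 2), 9))  # 8100
```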
Apply an SVM classifier to predict the test samples and print the accuracy score. Support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
In [41]:
svm_clf = svm.SVC()
svm_clf.fit(X_train_prepared, y_train)
y_pred = svm_clf.predict(X_test_prepared)
print("SVM Classifier\n",np.array(y_pred == y_test)[:50])
print('')
print('SVM Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred == y_test)/len(y_test))
SVM Classifier
[ True True True False True False True True True True True False
True False True True True False True True False True False False
False True False True True False True True False False True True
True True True True True True True False True False True False
True True]
SVM Percentage correct(%Accuracy of data):  57.93991416309013
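The percentage is simply the share of test samples whose prediction matches the true label. The sketch below restates that computation with a hypothetical helper accuracy_percent on toy labels, and then checks the figure above: the test set holds 3491 - 2792 = 699 samples, so 57.94% corresponds to 405 correct predictions.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Percentage of positions where the prediction equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return 100 * np.mean(y_true == y_pred)

# Toy labels for illustration only.
toy_true = ['male', 'female', 'female', 'male']
toy_pred = ['male', 'female', 'male', 'male']
print(accuracy_percent(toy_true, toy_pred))  # 75.0

# 3491 samples minus 2792 training samples leaves 699 test samples.
n_test = 3491 - 2792
print(100 * 405 / n_test)
```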
Apply a KNN classifier to predict the test samples and print the accuracy score. K-nearest neighbors is a classification algorithm that determines which group a data point belongs to by looking at the data points around it.
In [33]:
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train_prepared, y_train)
y_pred1 = neigh.predict(X_test_prepared)
print("KNN Classifier\n",np.array(y_pred1 == y_test)[:50])
print('')
print('KNN Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred1 == y_test)/len(y_test))
KNN Classifier
[ True True True False True False True True False True True False
True False True True True False False True False True False True
False True False True True False True True False False True True
True True False True True True True False True False True False
False True]
KNN Percentage correct(%Accuracy of data):  55.36480686695279
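To make the "looking at the data points around it" idea concrete, here is a from-scratch sketch of the voting step on toy 2-D points. It assumes Euclidean distance and a plain majority vote. The function knn_predict and the data are hypothetical and much simpler than scikit-learn's KNeighborsClassifier.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D points in two well-separated groups (hypothetical data).
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array(['female', 'female', 'female', 'male', 'male', 'male'])

print(knn_predict(X_toy, y_toy, np.array([0.3, 0.3])))  # female
print(knn_predict(X_toy, y_toy, np.array([4.8, 5.1])))  # male
```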
Apply a LightGBM classifier to predict the test samples and print the accuracy score. LightGBM is a gradient boosting framework that is also used for classification. Its documentation lists the following advantages: faster training speed and higher efficiency, lower memory usage, better accuracy, support for parallel and GPU learning, and the ability to handle large-scale data.
In [32]:
import lightgbm as lgb
lgb_clf = lgb.LGBMClassifier()
lgb_clf.fit(X_train_prepared, y_train)
y_pred2 = lgb_clf.predict(X_test_prepared)
print("LightGB Classifier\n",np.array(y_pred2 == y_test)[:30])
print('')
print('LightGBM Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred2 == y_test)/len(y_test))
LightGB Classifier
[ True True True False True False True True False True True False
True False True False True False True True False True False False
False True False True True False]
LightGBM Percentage correct(%Accuracy of data):  55.79399141630901
Apply a CatBoost classifier to predict the test samples and print the accuracy score. CatBoost is a gradient boosting library that converts categorical values into numbers using various statistics on combinations of categorical features and combinations of categorical and numerical features. Here it is trained for only 5 iterations.
In [35]:
from catboost import CatBoostClassifier
cbc_clf = CatBoostClassifier(iterations=5, learning_rate=0.1,)
cbc_clf.fit(X_train_prepared, y_train,verbose=False)
y_pred3 = cbc_clf.predict(X_test_prepared)
print("CatBoost Classifier\n",np.array(y_pred3 == y_test)[:30])
print('')
print('CatBoostClassifier Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred3 == y_test)/len(y_test))
CatBoost Classifier
[ True True True False True False True True True True True False
True False True True True False False True False True False True
False True False True True False]
CatBoostClassifier Percentage correct(%Accuracy of data):  57.51072961373391
Apply an XGBoost classifier to predict the test samples and print the accuracy score. XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. Here it is applied to a prediction problem involving unstructured data (images), with the HOG features as input.
In [37]:
from xgboost import XGBClassifier
xg_clf = XGBClassifier()
xg_clf.fit(X_train_prepared, y_train)
y_pred4 = xg_clf.predict(X_test_prepared)
print("XGBoost Classifier\n",np.array(y_pred4 == y_test)[:30])
print('')
print('XGBoostClassifier Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred4 == y_test)/len(y_test))
[13:56:06] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
XGBoost Classifier
[ True True True False True False True True False True True False
True False True True True False False True False True False False
False True False True True False]
XGBoostClassifier Percentage correct(%Accuracy of data):  56.36623748211731
Apply a decision tree classifier to predict the test samples and print the accuracy score. This classifier creates the classification model by building a decision tree. Each node in the tree specifies a test on an attribute, and each branch descending from that node corresponds to one of the possible values for that attribute.
In [38]:
from sklearn.tree import DecisionTreeClassifier
dt_clf = DecisionTreeClassifier(random_state=0)
dt_clf.fit(X_train_prepared, y_train)
y_pred5 = dt_clf.predict(X_test_prepared)
print("Decision Tree Classifier\n",np.array(y_pred5 == y_test)[:30])
print('')
print('Decision Tree Classifier Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred5 == y_test)/len(y_test))
Decision Tree Classifier
[ True False False True False True False False False False True True
False True True False False True False True True False True True
True False True False False True]
Decision Tree Classifier Percentage correct(%Accuracy of data):  50.07153075822604
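The "test on an attribute" at a single node can be sketched in a few lines. The example below picks, for one numeric feature, the threshold that minimizes the weighted Gini impurity of the two resulting groups. Gini impurity is the default criterion of scikit-learn's DecisionTreeClassifier, but the functions gini and best_threshold and the toy data are hypothetical simplifications, not scikit-learn's implementation.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Return the threshold on one feature that minimizes the weighted Gini impurity of both sides."""
    best_t, best_score = None, np.inf
    values = np.unique(feature)
    # Candidate thresholds: midpoints between consecutive distinct values.
    for t in (values[:-1] + values[1:]) / 2:
        left, right = labels[feature <= t], labels[feature > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy feature values and labels (hypothetical data).
feature = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
labels = np.array(['female', 'female', 'female', 'male', 'male', 'male'])
print(best_threshold(feature, labels))
```

A full tree repeats this search over all features and then recurses into each side of the chosen split.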
Conclusion
In this post we applied several classification methods to an image dataset and compared their accuracy scores. Image classification means assigning an input image one label from a fixed set of categories. Its use cases include medical image analysis, identifying objects for autonomous cars, and face detection for security purposes. Of the classifiers tried here, the SVM classifier had the best accuracy score (57.94%), followed by CatBoost (57.51%), XGBoost (56.37%), LightGBM (55.79%), KNN (55.36%), and the decision tree (50.07%).
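For a quick side-by-side view, the scores reported in the outputs above can be collected and ranked. The values below are copied from those outputs and rounded to two decimals; nothing is recomputed.

```python
# Accuracy scores (in percent) copied from the outputs above, rounded to two decimals.
accuracies = {
    'SVM': 57.94,
    'KNN': 55.36,
    'LightGBM': 55.79,
    'CatBoost': 57.51,
    'XGBoost': 56.37,
    'Decision Tree': 50.07,
}

# Sort from best to worst score.
ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f'{name:<14}{score:6.2f}')

best_name = ranked[0][0]
print('best:', best_name)
```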