Image Classification

Image Classification

Image classification is the process of categorizing and labeling groups of pixels or vectors within an image based on specific rules. The categorization law can be devised using one or more spectral or textural characteristics.

Perform Image classification method to find the most accuracyof predicted image dataset. The image classification accepts the given input images and produces output classification for identifying whether the disease is present or not.

Import required library for image classification

In [24]:

import numpy as np
import pprint
pp
=pprint.PrettyPrinter(indent=4)
import joblib
from skimage.io import imread
from skimage.transform import resize
import os
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.image as mpimg
from skimage.feature import hog

Define a function to resize all the images size and create adictionary to store all the images data values and labes

In [11]:

def resize_all(src,pklname,include,width=250,height=None):
    height=height if height is not None else width
    data
=dict()
    data['description']='resize({0}*{1}) gender image in rgb'.format(int(width),int(height))
    data['label']=[]
    data['filename']=[]
    data['data']=[]
    pklname=f'{pklname}_{width}*{height}px.pxl'
    for subdir in os.listdir(src):
        if subdir in include:
            print(subdir)
            currentpath=os.path.join(src,subdir)
            for file in os.listdir(currentpath):
                if file[-3:] in {'jpg','png'}:
                    im=imread(os.path.join(currentpath,file))
                    im=resize(im,(width,height))
                    data['label'].append(subdir[:])
                    data['filename'].append(file)
                    data['data'].append(im)
        joblib.dump(data,pklname)

Give directory of image dataset

In [12]:

data_path='/home/webtunix/Downloads/image classification archive /gender/train'
os.listdir(data_path)

Out[12]:

['male', 'female']

Calling the resize_all function by goving some parameter values (width, base name, include)

In [13]:

base_name="Gender data"
width=224
include={'male',"female"}
resize_all(src=data_path,pklname=base_name,width=width,include=include)
male
female

Print information of dataset stored in list of dictionary 'data' and use counter to count the sample of each label of dictionary

In [14]:

from collections import Counter
data
=joblib.load(f'{base_name}_{width}*{width}px.pxl')
print("Number of sample:",len(data['data']))
print("Keys:",list(data.keys()))
print("description:",data['description'])
print("image shape:",data['data'][0].shape)
print('lables:',np.unique(data['label']))
Counter(data['label'])

Number of sample: 3491
Keys: ['description', 'label', 'filename', 'data']
description: resize(224*224) gender image in rgb
image shape: (224, 224, 3)
lables: ['female' 'male']

Out[14]:

Counter({'male': 1744, 'female': 1747})

perform a operation to show the unique classify image sample of each label(male, female)

In [19]:

labels = np.unique(data['label'])
fig, axes = plt.subplots(1, len(labels))
fig.set_size_inches(15,4)
fig.tight_layout()
for ax, label in zip(axes, labels):
    idx = data['label'].index(label)
    ax.imshow(data['data'][idx])
    ax.axis('off')
    ax.set_title(label)

Convert the dictionary value in an array list and apply test and train split to test and train the dataset

In [20]:

X = np.array(data['data'],dtype='uint8')
y = np.array(data['label'])
from sklearn.model_selection import train_test_split
X_train
, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,shuffle=True,random_state=42,)

Define a function to plot a graph of uniquly percentage count the male and female data sample in test and train array of list.

In [21]:

def plot_bar(y, loc='left', relative=True):
    width = 0.35
    if loc == 'left':
        n
= -0.5
    elif loc == 'right':
        n
= 0.5
    unique, counts = np.unique(y, return_counts=True)
    sorted_index = np.argsort(unique)
    unique = unique[sorted_index]
    if relative:
        counts = 100*counts[sorted_index]/len(y)
        ylabel_text = '% count'
    else:
        counts = counts[sorted_index]
        ylabel_text = 'count'
    xtemp = np.arange(len(unique))
    plt.bar(xtemp + n*width, counts, align='center', alpha=.7, width=width)
    plt.xticks(xtemp, unique, rotation=45)
    plt.xlabel('equipment type')
    plt.ylabel(ylabel_text)
plt.suptitle('relative amount of photos per type')
plot_bar(y_train, loc='left')
plot_bar(y_test, loc='right')
plt.legend(['train ({0} photos)'.format(len(y_train)),'test ({0} photos)'.format(len(y_test))]);

perform RGB2GRAY transformer to Convert an array of RGB images to grayscale,return an array of transformation and Calculates hog features for each img that is a feature extraction method for detection sample shape

In [25]:

from sklearn.base import BaseEstimator, TransformerMixin
class RGB2GrayTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        return np.array([skimage.color.rgb2gray(img) for img in X])
class HogTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, y=None, orientations=9,
                 pixels_per_cell=(8, 8),
                 cells_per_block=(3, 3), block_norm='L2-Hys'):
        self.y = y
        self
.orientations = orientations
        self
.pixels_per_cell = pixels_per_cell
        self
.cells_per_block = cells_per_block
        self
.block_norm = block_norm
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        def local_hog(X):
            return hog(X,
                       orientations=self.orientations,
                       pixels_per_cell=self.pixels_per_cell,
                       cells_per_block=self.cells_per_block,
                       block_norm=self.block_norm)
        try:
            return np.array([local_hog(img) for img in X])
        except:
           
return np.array([local_hog(img) for img in X])

create an instance of each transformer and call fit_transform on each transform converting X_train step by step by calling hog transformation and BGR2Gray transformation

In [40]:

from sklearn import svm
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import StandardScaler, Normalizer
import skimage
grayify
= RGB2GrayTransformer()
hogify = HogTransformer(
    pixels_per_cell=(14, 14),
    cells_per_block=(2,2),
    orientations=9,
    block_norm='L2-Hys'
)
scalify = StandardScaler()
X_train_gray = grayify.fit_transform(X_train)
X_train_hog = hogify.fit_transform(X_train_gray)
X_train_prepared = scalify.fit_transform(X_train_hog)
print(X_train_prepared.shape)
X_test_gray = grayify.transform(X_test)
X_test_hog = hogify.transform(X_test_gray)
X_test_prepared = scalify.transform(X_test_hog)
(2792, 8100)

Apply SVM classifier to predict the data sample and print their accuracy score

that are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.

In [41]:

svm_clf = svm.SVC()
svm_clf.fit(X_train_prepared, y_train)
y_pred = svm_clf.predict(X_test_prepared)
print("SVM Classifier\n",np.array(y_pred == y_test)[:50])
print('')
print('SVM Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred == y_test)/len(y_test))
SVM Classifier
 [ True  True  True False  True False  True  True  True  True  True False
  True False  True  True  True False  True  True False  True False False
 False  True False  True  True False  True  True False False  True  True
  True  True  True  True  True  True  True False  True False  True False
  True  True]

SVM Percentage correct(%Accuracy of data): 
 57.93991416309013

Apply KNN classifier to predict the data sample and print their accuracy score

that is a data classification algorithm that attempts to determine what group a data point is

in by looking at the data points around it.

In [33]:

from sklearn.neighbors import KNeighborsClassifier
neigh
= KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train_prepared, y_train)
y_pred1 = neigh.predict(X_test_prepared)
print("KNN Classifier\n",np.array(y_pred1 == y_test)[:50])
print('')
print('KNN Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred1 == y_test)/len(y_test))
KNN Classifier
 [ True  True  True False  True False  True  True False  True  True False
  True False  True  True  True False False  True False  True False  True
 False  True False  True  True False  True  True False False  True  True
  True  True False  True  True  True  True False  True False  True False
 False  True]

KNN Percentage correct(%Accuracy of data): 
 55.36480686695279

Apply Light GBM classifier to predict the data sample and print their accuracy score

that is also using for classification and have following advantages: Faster training speed and higher efficiency. Lower memory usage. Better accuracy. Support of parallel and GPU learning. Capable of handling large-scale data

In [32]:

import lightgbm as lgb
lgb_clf
= lgb.LGBMClassifier()
lgb_clf.fit(X_train_prepared, y_train)
y_pred2 = lgb_clf.predict(X_test_prepared)
print("LightGB Classifier\n",np.array(y_pred2 == y_test)[:30])
print('')
print('LightGBM Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred2 == y_test)/len(y_test))
LightGB Classifier
 [ True  True  True False  True False  True  True False  True  True False
  True False  True False  True False  True  True False  True False False
 False  True False  True  True False]

LightGBM Percentage correct(%Accuracy of data): 
 55.79399141630901

Apply CatBoost classifier to predict the data sample and print their accuracy score that converts categorical values into numbers using various statistics on combinations of categorical features and combinations of categorical and numerical features

In [35]:

from catboost import CatBoostClassifier
cbc_clf = CatBoostClassifier(iterations=5, learning_rate=0.1,)
cbc_clf.fit(X_train_prepared, y_train,verbose=False)
y_pred3 = cbc_clf.predict(X_test_prepared)
print("CatBoost Classifier\n",np.array(y_pred3 == y_test)[:30])
print('')
print
('CatBoostClassifier Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred3 == y_test)/len(y_test))
CatBoost Classifier
 [ True  True  True False  True False  True  True  True  True  True False
  True False  True  True  True False False  True False  True False  True
 False  True False  True  True False]

CatBoostClassifier Percentage correct(%Accuracy of data): 
 57.51072961373391

Apply XGBoost classifier to predict the data sample and print their accuracy score that is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.)

In [37]:

from xgboost import XGBClassifier
xg_clf = XGBClassifier()
xg_clf.fit(X_train_prepared, y_train)
y_pred4 = xg_clf.predict(X_test_prepared)
print("XGBoost Classifier\n",np.array(y_pred4 == y_test)[:30])
print('')
print('XGBoostClassifier Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred4 == y_test)/len(y_test))
[13:56:06] WARNING: ../src/learner.cc:1095: Starting in XGBoost 
1.3.0, the default evaluation metric used with the objective
 'binary:logistic' was changed from 'error' to 'logloss'. E
xplicitly set eval_metric if you'd like to restore the old behavior.
XGBoost Classifier
 [ True  True  True False  True False  True  True False  True  True False
  True False  True  True  True False False  True False  True False False
 False  True False  True  True False]

XGBoostClassifier Percentage correct(%Accuracy of data): 
 56.36623748211731

Apply DecisionTree classifier to predict the data sample and print their accuracy score that creates the classification model by building a decision tree. Each node in the tree specifies a test on an attribute, each branch descending from that node corresponds to one of the possible values for that attribute.

In [38]:

from sklearn.tree import DecisionTreeClassifier
dt_clf
= DecisionTreeClassifier(random_state=0)
dt_clf.fit(X_train_prepared, y_train)
y_pred5 = dt_clf.predict(X_test_prepared)
print("Decision Tree Classifier\n",np.array(y_pred5 == y_test)[:30])
print('')
print('Decision Tree Classifier Percentage correct(%Accuracy of data): ', 100*np.sum(y_pred5 == y_test)/len(y_test))
Decision Tree Classifier
 [ True False False  True False  True False False False False  True  True
 False  True  True False False  True False  True  True False  True  True
  True False  True False False  True]

Decision Tree Classifier Percentage correct
(%Accuracy of data):  50.07153075822604

Conclusion

Perform classification method on image dataset to predict the best accuracy score and learn What is Image Classification and its use cases so Image Classification means assigning an input image, one label from a fixed set of categories and it is uses such as medical image analysis, identifying objects in autonomous cars, face detection for security purpose, etc. After applying different classification method I discovered tha SVM Classifier having best accuracy score.

 

 

Improve your Business Analytics with our training data.

Better data is the key for the better products. We train you data for Machine Learning and better business analytics. We can annotate, collect, evaluate and translate any type of data in any language.