Question

This is a machine learning model in Python, built with scikit-learn, to classify handwritten Arabic letters. There are two files: the training data and the test data. The code is available, and we need to optimize it so that, in box number 6 where the model is cross-validated, the accuracy is in the high 80s or low 90s. We should tune the hyperparameters and improve the pipeline as needed. Anything from scikit-learn may be used, but nothing beyond it.
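
As a point of reference, a cross-validated pipeline in scikit-learn can be set up as below. This is a minimal sketch, not the original notebook's code: the actual data files and box contents are not shown, so it uses scikit-learn's bundled `load_digits` images as a stand-in for the Arabic character data, and the scaler, PCA, and SVC steps and their settings are assumptions chosen for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images flattened to 64 pixel features.
# The real task would load the Arabic character training file here instead.
X, y = load_digits(return_X_y=True)

# Scale pixels, reduce dimensionality, then classify with an RBF-kernel SVM.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.95)),  # keep enough components for 95% of variance
    ("svc", SVC(kernel="rbf", C=10, gamma="scale")),
])

# Stratified 5-fold CV keeps class proportions equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

For classifiers, `cross_val_score` already uses stratified folds when given an integer `cv`; building `StratifiedKFold` explicitly just adds shuffling with a fixed seed so runs are reproducible.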

With the code as it is, the model accuracy is 79%.

The goal is to modify the code so that the model reaches an accuracy in the high 80s or low 90s.

Box 3 of the code contains the hyperparameters that need to be tuned and the pipeline that may need to be modified. A voting model can be used to reach higher accuracy.
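
A voting ensemble can be built with scikit-learn's `VotingClassifier`. The sketch below is hedged: the choice of base models (SVC, k-nearest neighbors, random forest) and their settings are assumptions, and scikit-learn's bundled `load_digits` images stand in for the Arabic character data, which is not included here.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): replace with the Arabic character training set.
X, y = load_digits(return_X_y=True)

# Scale-sensitive models get their own scaler; the random forest does not need one.
svc = make_pipeline(StandardScaler(), SVC(C=10, probability=True, random_state=0))
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Soft voting averages predicted class probabilities, so every
# base model must implement predict_proba (hence probability=True on SVC).
voting = VotingClassifier(
    estimators=[("svc", svc), ("knn", knn), ("rf", forest)],
    voting="soft",
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(voting, X, y, cv=cv)
print(f"Voting CV accuracy: {scores.mean():.3f}")
```

Switching to `voting="hard"` (majority vote on predicted labels) removes the need for `probability=True` on the SVC and fits faster; which variant scores better on the real data would have to be checked with cross-validation.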

We need to improve the model's accuracy over that of the existing code.

Info about the dataset: The dataset is composed of 16,800 characters written by 60 participants aged between 19 and 40 years; 90% of the participants are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times on two forms. The forms were scanned at a resolution of 300 dpi. The dataset is partitioned into two sets: a training set (13,440 characters, i.e. 480 images per class) and a test set (3,360 characters, i.e. 120 images per class). The writers of the training set and the test set are mutually exclusive. Writers were assigned to the test set in random order to make sure that the test-set writers did not all come from a single institution (to ensure variability in the test set).
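
The split sizes in the description are consistent with each other, as the arithmetic below checks. The number of classes is not stated explicitly; it follows from 13,440 / 480 = 28, matching the 28 letters from 'alef' to 'yeh'. Assuming every writer contributed all ten repetitions of every letter, the per-class counts also imply 48 training writers and 12 test writers.

```python
# Figures taken from the dataset description.
n_writers = 60
repeats = 10                      # each letter written ten times per writer
train_total, test_total = 13_440, 3_360
per_class_train, per_class_test = 480, 120

# Number of classes implied by the training split (the 28 Arabic letters).
n_classes = train_total // per_class_train

assert n_writers * n_classes * repeats == train_total + test_total == 16_800
assert test_total // n_classes == per_class_test

# Assumption: every writer contributed all ten repetitions of every letter.
train_writers = per_class_train // repeats
test_writers = per_class_test // repeats
print(n_classes, train_writers, test_writers)  # -> 28 48 12
```

Because writers do not overlap between the two files, the test file measures performance on unseen handwriting, while cross-validation folds drawn from the training file can share writers; cross-validation accuracy may therefore come out somewhat higher than the final test accuracy.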

Goal: build an image classifier that classifies handwritten Arabic characters using scikit-learn. The model accuracy has to be in the high 80s (e.g. 89%) or low 90s (e.g. 92%).

This task is all about tuning the hyperparameters and the model pipeline.
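
Hyperparameters and pipeline settings can be tuned together with `GridSearchCV`, which cross-validates every combination in a grid. This is a sketch under assumptions: the pipeline steps and grid values are illustrative, not taken from box 3 of the original code, and scikit-learn's `load_digits` stands in for the Arabic character data.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): replace with the Arabic character training set.
X, y = load_digits(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC(kernel="rbf")),
])

# Step parameters are addressed as "<step name>__<parameter name>".
param_grid = {
    "pca__n_components": [0.90, 0.95],  # fraction of variance to keep
    "svc__C": [1, 10],
    "svc__gamma": ["scale", 0.01],
}

# 2 x 2 x 2 = 8 combinations, each scored with 3-fold stratified CV.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

With the default `refit=True`, `search.best_estimator_` is refit on all of the training data and can then be evaluated on the test file. Grid size grows multiplicatively, so a coarse grid followed by a finer one around the best values is usually cheaper; `RandomizedSearchCV` is an alternative for larger search spaces. Note that `best_score_` is measured on the same data used to choose the parameters, so it tends to be slightly optimistic.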