scikit-learn: Expected 2D array, got 1D array instead


I am currently trying to train my dataset and is currently trying to follow the steps at the link. Link here

I keep getting errors like

ValueError: Expected 2D array, got 1D array instead: array=[0. 0. 0. and so on..]. 

This is the code that I am trying to test and train using sci-kit learn:

import numpy as np
from sklearn.svm import LinearSVC
import os
import cv2
import joblib

# Generate training set
TRAIN_PATH = 'Try/'
list_folder = os.listdir(TRAIN_PATH)
trainset = []
for folder in list_folder:
    flist = os.listdir(os.path.join(TRAIN_PATH, folder))
    for f in flist:
        im = cv2.imread(os.path.join(TRAIN_PATH, folder, f))
        im = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY )
        im = cv2.resize(im, (36,36))
        trainset.append(im)
# Labeling for trainset
train_label = []
for i in range(0,1): #in the dataset i currently have a 0 folder and a 1 folder
    temp = 400*[i] #400 images in 1 folder
    train_label += temp

# Generate testing set
TEST_PATH = 'Test/'
list_folder = os.listdir(TEST_PATH)
testset = []
test_label = []
for folder in list_folder:
    flist = os.listdir(os.path.join(TEST_PATH, folder))
    for f in flist:
        im = cv2.imread(os.path.join(TEST_PATH, folder, f))
        im = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY )
        im = cv2.resize(im, (36,36))
        testset.append(im)
        test_label.append(folder)
trainset = np.reshape(trainset, (800, -1)) #800 because total number of images

# Create an linear SVM object
clf = LinearSVC()

# Perform the training
clf.fit(train_label, testset)
print("Training finished successfully")

# Testing
testset = np.reshape(testset, (len(testset), -1))
y = clf.predict(testset)
print("Testing accuracy: " + str(clf.score(testset, test_label)))

This is the error message:

 Traceback (most recent call last):   File
 "C:\Users\hp\AppData\Local\Programs\Python\Python37-32\Thesis\Test.py",
 line 43, in <module>
     clf.fit(train_label, testset)   File "C:\Users\hp\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\svm\classes.py",
 line 229, in fit
     accept_large_sparse=False)   File "C:\Users\hp\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py",
 line 756, in check_X_y
     estimator=estimator)   File "C:\Users\hp\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py",
 line 552, in check_array
     "if it contains a single sample.".format(array)) ValueError: Expected 2D array, got 1D array instead: array=[0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or
 array.reshape(1, -1) if it contains a single sample.

Answer

Hard to tell with such little information, but it might have to do with the parameters clf.fit() is receiving. It looks like you are feeding it the label train_label before the actual training data X.

If you have a look at the documentation, the order is as follows:

fit(X, y[, sample_weight]) Fit the SVM model according to the given training data.

Use instead:

clf.fit(testset, train_label)