Wednesday, October 28, 2015

Numpy Array Size, and ValueError

I wanted to capture this for others. I'm not sure of the *correct* solution, but passing a very large array to sklearn's KMeans appears to cause a problem:

Traceback (most recent call last):
  File "..\School\GeorgiaTech\Assignment_3\live_pca.py", line 167, in <module>
    k_means_results('Live No Feature Selection', [X,y], [X_test, y_test], colormap = False)
  File "..\School\GeorgiaTech\Assignment_3\live_pca.py", line 60, in k_means_results
    fit_results = k_means.fit(X)
  File "C:\Python27\lib\site-packages\sklearn\cluster\k_means_.py", line 785, in fit
    X = self._check_fit_data(X)
  File "C:\Python27\lib\site-packages\sklearn\cluster\k_means_.py", line 755, in _check_fit_data
    X = check_array(X, accept_sparse='csr', dtype=np.float64)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 344, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence

When I cut my input down substantially (it had roughly 246 features and 3,500 rows), the code begins to run correctly. Another dataset with the same setup, except with 6 features instead of 246 and fewer rows, gives me no problems.

Good luck

UPDATE
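Open your CSV input in Excel and move to the rightmost column, then go one more column over. Hit Ctrl and the down arrow. If you have incorrect data, the cursor will land on extra elements sitting past the last real column, which means some rows are longer than the others. This applies if you load your data from a CSV like I did.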
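
The same check can be done without Excel. Below is a minimal sketch, assuming Python 3 and a plain comma-separated file. The function name find_ragged_rows is made up for illustration and is not part of NumPy or sklearn. It counts the fields in every row and reports the rows whose count differs from the most common one. Rows of uneven length are a typical cause of this ValueError, since NumPy cannot pack them into a rectangular float array.

```python
import csv
from collections import Counter


def find_ragged_rows(path, delimiter=","):
    """Hypothetical helper: return (expected_width, bad_rows).

    expected_width is the most common number of fields per row, and
    bad_rows is a list of (row_number, width) pairs (1-based) for every
    row whose field count differs from it.
    """
    with open(path, newline="") as handle:
        widths = [len(row) for row in csv.reader(handle, delimiter=delimiter)]

    if not widths:
        # Empty file: nothing to compare against.
        return 0, []

    # Treat the most frequent field count as the "correct" width.
    expected = Counter(widths).most_common(1)[0][0]
    bad_rows = [
        (number, width)
        for number, width in enumerate(widths, start=1)
        if width != expected
    ]
    return expected, bad_rows


# Example usage (the file name is a placeholder):
# expected, bad_rows = find_ragged_rows("my_input.csv")
# for number, width in bad_rows:
#     print("row %d has %d fields, expected %d" % (number, width, expected))
```

If bad_rows comes back empty, every row has the same number of fields and the problem is probably somewhere else, for example a non-numeric value in one of the cells.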