Integer variable for range "not found on sheet" - python

I continue to have issues with my code (I am self-taught, so I'm far from an expert in Python or xlwings). My method is as follows:
I copy a 4-column set of data (of varying length) to a 2D array.
The data is then compared against a newly found set of data, and the matches are removed from the copied array.
The contents of the columns that were assigned to the original 2d array are then cleared of their contents.
The same array (with the matches deleted), is replaced at the same starting cell as the previous array.
In short, I copy the columns, remove the matches, clear the original cells, and then paste the new contents at the same starting cell as the original columns. (Since the length of the column is not fixed, I assign this cell row to a variable "sht2Row".)
However, about 10% of the time, I get the following error:
104 sht2.range((5,3),(sht2Row,10)).clear_contents()
105 time.sleep(0.5)
--> 106 sht2.range((5,12),(sht2Row,16)).clear_contents()
107 time.sleep(0.5)
108 sht2.range("C5").value=BoughtData
~\anaconda3\lib\site-packages\xlwings\main.py in range(self, cell1, cell2)
862 raise ValueError("Second range is not on this sheet")
863 cell2 = cell2.impl
--> 864 return Range(impl=self.impl.range(cell1, cell2))
865
866 @property
~\anaconda3\lib\site-packages\xlwings\_xlwindows.py in range(self, arg1, arg2)
635 xl2 = self.xl.Range(arg2)
636
--> 637 return Range(xl=self.xl.Range(xl1, xl2))
638
639 @property
~\anaconda3\lib\site-packages\xlwings\_xlwindows.py in __call__(self, *args, **kwargs)
64 while True:
65 try:
---> 66 v = self.__method(*args, **kwargs)
67 if isinstance(v, (CDispatch, CoClassBaseClass, DispatchBaseClass)):
68 return COMRetryObjectWrapper(v)
~\AppData\Local\Temp\gen_py\3.8\00020813-0000-0000-C000-000000000046x0x1x9.py in Range(self, Cell1, Cell2)
47370 # The method Range is actually a property, but must be used as a method to correctly pass the arguments
47371 def Range(self, Cell1=defaultNamedNotOptArg, Cell2=defaultNamedOptArg):
> 47372 ret = self._oleobj_.InvokeTypes(197, LCID, 2, (9, 0), ((12, 1), (12, 17)),Cell1 , Cell2)
47374 if ret is not None:
com_error: (-2147352567, 'Exception occurred.', (0, None, None, None,
0, -2146827284), None)
Does anyone know why this would occur? In the past, I occasionally received this error because I was working on code that talks to these cells, but lately I usually am not even at the computer and still receive it. I don't understand this. If it is some issue with sht2Row, then why doesn't it occur the first time I use the variable? Regardless, how is it possible to get a "Second range is not on this sheet" error when sht2Row is just an integer? I am really at a loss as to why this would only occasionally occur and why it would say that the range is not on the sheet.
This is my first time using this site. I normally try to figure my errors out on my own but this one has me stumped...
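One thing that may help when reading the traceback: the arrow points at line 864 of xlwings' main.py, not at the `raise ValueError("Second range is not on this sheet")` on line 862, which is only shown as surrounding context. The exception actually raised is the `com_error` at the bottom. A minimal sketch that decodes the inner code from that tuple is below; it uses only integer arithmetic and does not identify the cause, it only shows which numeric error Excel reported.

```python
# Inner error code copied from the com_error tuple in the traceback above.
hresult = -2146827284

# COM error codes are signed 32-bit HRESULTs; view the value as unsigned.
unsigned = hresult & 0xFFFFFFFF

# Standard HRESULT layout: bits 16-26 are the facility, bits 0-15 the code.
facility = (unsigned >> 16) & 0x7FF
code = unsigned & 0xFFFF

print(hex(unsigned), facility, code)  # 0x800a03ec 10 1004
```

Code 1004 is Excel's generic "application-defined or object-defined error", so the failure comes from Excel rejecting the `Range` call itself. Logging the value and type of `sht2Row` immediately before line 106 would be one way to check whether it is ever something other than the integer you expect; that is a suggestion for narrowing things down, not a confirmed diagnosis.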

I keep getting the error message ValueError: Wrong number of items passed 2, placement implies 1 [duplicate]

I am receiving the error:
ValueError: Wrong number of items passed 3, placement implies 1, and I am struggling to figure out where, and how I may begin addressing the problem.
I don't really understand the meaning of the error, which is making it difficult for me to troubleshoot. I have also included the block of code that is triggering the error in my Jupyter Notebook.
The data is tough to attach, so I am not looking for anyone to try to re-create this error for me. I am just looking for some feedback on how I could address it.
KeyError Traceback (most recent call last)
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
1944 try:
-> 1945 return self._engine.get_loc(key)
1946 except KeyError:
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()
KeyError: 'predictedY'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\internals.py in set(self, item, value, check)
3414 try:
-> 3415 loc = self.items.get_loc(item)
3416 except KeyError:
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()
KeyError: 'predictedY'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-95-476dc59cd7fa> in <module>()
26 return gp, results
27
---> 28 gp_dailyElectricity, results_dailyElectricity = predictAll(3, 0.04, trainX_dailyElectricity, trainY_dailyElectricity, testX_dailyElectricity, testY_dailyElectricity, testSet_dailyElectricity, 'Daily Electricity')
<ipython-input-95-476dc59cd7fa> in predictAll(theta, nugget, trainX, trainY, testX, testY, testSet, title)
8
9 results = testSet.copy()
---> 10 results['predictedY'] = predictedY
11 results['sigma'] = sigma
12
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
2355 else:
2356 # set column
-> 2357 self._set_item(key, value)
2358
2359 def _setitem_slice(self, key, value):
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
2422 self._ensure_valid_index(value)
2423 value = self._sanitize_column(key, value)
-> 2424 NDFrame._set_item(self, key, value)
2425
2426 # check if we are modifying a copy
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in _set_item(self, key, value)
1462
1463 def _set_item(self, key, value):
-> 1464 self._data.set(key, value)
1465 self._clear_item_cache()
1466
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\internals.py in set(self, item, value, check)
3416 except KeyError:
3417 # This item wasn't present, just insert at end
-> 3418 self.insert(len(self.items), item, value)
3419 return
3420
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\internals.py in insert(self, loc, item, value, allow_duplicates)
3517
3518 block = make_block(values=value, ndim=self.ndim,
-> 3519 placement=slice(loc, loc + 1))
3520
3521 for blkno, count in _fast_count_smallints(self._blknos[loc:]):
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
2516 placement=placement, dtype=dtype)
2517
-> 2518 return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
2519
2520 # TODO: flexible with index=None and/or items=None
C:\Users\brennn1\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\internals.py in __init__(self, values, placement, ndim, fastpath)
88 raise ValueError('Wrong number of items passed %d, placement '
89 'implies %d' % (len(self.values),
---> 90 len(self.mgr_locs)))
91
92 @property
ValueError: Wrong number of items passed 3, placement implies 1
My code is as follows:
def predictAll(theta, nugget, trainX, trainY, testX, testY, testSet, title):
    gp = gaussian_process.GaussianProcess(theta0=theta, nugget=nugget)
    gp.fit(trainX, trainY)
    predictedY, MSE = gp.predict(testX, eval_MSE=True)
    sigma = np.sqrt(MSE)
    results = testSet.copy()
    results['predictedY'] = predictedY
    results['sigma'] = sigma
    print("Train score R2:", gp.score(trainX, trainY))
    print("Test score R2:", sklearn.metrics.r2_score(testY, predictedY))
    plt.figure(figsize=(9, 8))
    plt.scatter(testY, predictedY)
    plt.plot([min(testY), max(testY)], [min(testY), max(testY)], 'r')
    plt.xlim([min(testY), max(testY)])
    plt.ylim([min(testY), max(testY)])
    plt.title('Predicted vs. observed: ' + title)
    plt.xlabel('Observed')
    plt.ylabel('Predicted')
    plt.show()
    return gp, results

gp_dailyElectricity, results_dailyElectricity = predictAll(3, 0.04, trainX_dailyElectricity, trainY_dailyElectricity, testX_dailyElectricity, testY_dailyElectricity, testSet_dailyElectricity, 'Daily Electricity')
In general, the error ValueError: Wrong number of items passed 3, placement implies 1 suggests that you are attempting to put too many pigeons in too few pigeonholes. In this case, the value on the right-hand side of the assignment
results['predictedY'] = predictedY
is trying to put 3 "things" into a container that allows only one. Because the left side is a single dataframe column, which can hold any number of rows, the excess must be along the other dimension: the value has too many columns.
Here, it appears you are using sklearn for modeling, which is where gaussian_process.GaussianProcess() is coming from (I'm guessing, but correct me and revise the question if this is wrong).
Now, you generate predicted values for y here:
predictedY, MSE = gp.predict(testX, eval_MSE = True)
However, as we can see from the documentation for GaussianProcess, predict() returns two items. The first is y, which is array-like (emphasis mine). That means it can have more than one dimension, or, to be concrete for thick-headed people like me, it can have more than one column: it can return shape (n_samples, n_targets), which, depending on your data, could be (1000, 3) (just to pick numbers). Thus, your predictedY might have 3 columns.
If so, when you try to put something with three "columns" into a single dataframe column, you are passing 3 items where only 1 would fit.
Not sure if this is relevant to your question but it might be relevant to someone else in the future: I had a similar error. Turned out that the df was empty (had zero rows) and that is what was causing the error in my command.
Another cause of this error is when you apply a function on a DataFrame where there are two columns with the same name.
Starting with pandas 1.3.x it's not allowed to fill objects (e.g. like an eagertensor from an embedding) into columns.
https://github.com/pandas-dev/pandas/blame/master/pandas/core/internals/blocks.py
This ValueError (Wrong number of items passed N, placement implies 1) occurs when you pass more items than the target can hold. For example:
df['First_Name', 'Last_Name'] = df['Full_col'].str.split(' ', expand = True)
In the above code, I'm trying to split Full_col into two columns named First_Name and Last_Name. I get the error because, instead of a list of column names, I'm passing a single key (the tuple ('First_Name', 'Last_Name')), so pandas tries to fit both result columns into one.
To avoid this, wrap the column names in a list (note the double brackets):
df[['First_Name', 'Last_Name']] = df['Full_col'].str.split(' ', expand = True)
Just adding this as an answer: nesting methods and misplacing closed brackets will also throw this error, ex:
march15_totals= march15_t.assign(sum_march15_t=march15_t[{"2021-03-15","2021-03-16","2021-03-17","2021-03-18","2021-03-19","2021-03-20","2021-03-21"}]).sum(axis=1)
Versus the (correct) version:
march15_totals= march15_t.assign(sum_march15_t=march15_t[{"2021-03-15","2021-03-16","2021-03-17","2021-03-18","2021-03-19","2021-03-20","2021-03-21"}].sum(axis=1))
This is probably common sense to most of you but I was quite puzzled until I realized my mistake.
I got this error when I was trying to convert a one-column dataframe, df, into a Series, pd.Series(df).
I resolved this with
pd.Series(df.values.flatten())
The problem was that the values in the dataframe were lists:
my_col
0 ['a']
1 ['b']
2 ['c']
3 ['d']
When I was printing the dataframe it wasn't showing the brackets which made it hard to track down.
for i in range(100):
    try:
        # Your code here
        break
    except:
        continue
This one worked for me.

I cant import the ucf 101 dataset (torchvision), 'list index out of range' error

dataset = torchvision.datasets.UCF101(r'my_directory', annotation_path=r'my_directory2', frames_per_clip=16, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=transforms.Compose([transforms.ToTensor()]), _precomputed_metadata=None, num_workers=1, _video_width=64, _video_height=64, _video_min_dimension=0, _audio_samples=0)
This line works, the problem is when I try to do any kind of operation using 'dataset', in particular this one:
data = torch.utils.data.DataLoader(dataset, batch_size=512, shuffle=True)
I get an error, so I can't work on the video data because I can't use the DataLoader. The error is:
IndexError: list index out of range
Complete error message:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-16-dc3631481cb3> in <module>
----> 1 data = torch.utils.data.DataLoader(dataset, batch_size=512, shuffle=True)
~\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context)
211 else: # map-style
212 if shuffle:
--> 213 sampler = RandomSampler(dataset)
214 else:
215 sampler = SequentialSampler(dataset)
~\anaconda3\lib\site-packages\torch\utils\data\sampler.py in __init__(self, data_source, replacement, num_samples)
90 "since a random permute will be performed.")
91
---> 92 if not isinstance(self.num_samples, int) or self.num_samples <= 0:
93 raise ValueError("num_samples should be a positive integer "
94 "value, but got num_samples={}".format(self.num_samples))
~\anaconda3\lib\site-packages\torch\utils\data\sampler.py in num_samples(self)
98 # dataset size might change at runtime
99 if self._num_samples is None:
--> 100 return len(self.data_source)
101 return self._num_samples
102
~\anaconda3\lib\site-packages\torchvision\datasets\ucf101.py in __len__(self)
96
97 def __len__(self):
---> 98 return self.video_clips.num_clips()
99
100 def __getitem__(self, idx):
~\anaconda3\lib\site-packages\torchvision\datasets\video_utils.py in num_clips(self)
241 Number of subclips that are available in the video list.
242 """
--> 243 return self.cumulative_sizes[-1]
244
245 def get_clip_location(self, idx):
IndexError: list index out of range
Problem:
This problem occurs when you run your code on Windows, because Windows paths use a backslash ("\") instead of a forward slash ("/").
As you can see in the code:
https://github.com/pytorch/vision/blob/7b9d30eb7c4d92490d9ac038a140398e0a690db6/torchvision/datasets/ucf101.py#L94
This line reads the file path from the label file as “action/video_name” and joins it to the “root” path with a backslash, so the full path becomes “root\action/video_name”. Such paths don't match the entries in the video list at line 97, so the indices variable ends up as an empty list.
Solution:
Two possible solutions are:
Replace the forward slashes “/” in the label files with backslashes “\”.
Override the _select_fold(…) function of class UCF101 and fix the path separators inside the function.
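The mismatch described above can be reproduced without torchvision. The sketch below uses the standard-library `ntpath` module, which applies Windows path rules on any platform; the file names and the `scanned_paths` list are hypothetical stand-ins for the label file entry and the video list, not torchvision's actual data structures.

```python
import ntpath  # Windows path semantics, importable on any OS

root = "root"
# Hypothetical entry as it might appear in a split/label file (forward slash).
label_entry = "ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi"
# Hypothetical path as a directory scan on Windows would report it.
scanned_paths = [r"root\ApplyEyeMakeup\v_ApplyEyeMakeup_g01_c01.avi"]

# Joining with Windows rules yields mixed separators: root\ApplyEyeMakeup/v_...
joined = ntpath.join(root, label_entry)
# normpath rewrites every "/" as "\" under Windows rules.
normalized = ntpath.normpath(joined)

matches_raw = [i for i, p in enumerate(scanned_paths) if p == joined]
matches_fixed = [i for i, p in enumerate(scanned_paths) if p == normalized]
print(matches_raw, matches_fixed)  # [] [0]
```

An empty match list is what leaves the dataset with zero clips, which is why `len(dataset)` fails. Normalizing the joined path (for example inside an overridden `_select_fold`) is one way to make the comparison succeed.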

Error while using sum() in Python SFrame

I'm new to python and I'm performing a basic EDA analysis on two similar SFrames. I have a dictionary as two of my columns and I'm trying to find out if the max values of each dictionary are the same or not. In the end I want to sum up the Value_Match column so that I can know how many values match but I'm getting a nasty error and I haven't been able to find the source. The weird thing is I have used the same methodology for both the SFrames and only one of them is giving me this error but not the other one.
I have tried calculating max_func in different ways as given here but the same error has persisted : getting-key-with-maximum-value-in-dictionary
I have checked for any possible NaN values in the column but didn't find any of them.
I have been stuck on this for a while and any help will be much appreciated. Thanks!
Code:
def max_func(d):
    v = list(d.values())
    k = list(d.keys())
    return k[v.index(max(v))]

sf['Max_Dic_1'] = sf['Dic1'].apply(max_func)
sf['Max_Dic_2'] = sf['Dic2'].apply(max_func)
sf['Value_Match'] = sf['Max_Dic_1'] == sf['Max_Dic_2']
sf['Value_Match'].sum()
Error :
RuntimeError Traceback (most recent call last)
<ipython-input-70-f406eb8286b3> in <module>()
----> 1 x = sf['Value_Match'].sum()
2 y = sf.num_rows()
3
4 print x
5 print y
C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\data_structures\sarray.pyc in sum(self)
2216 """
2217 with cython_context():
-> 2218 return self.__proxy__.sum()
2219
2220 def mean(self):
C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\cython\context.pyc in __exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let exception propagate
RuntimeError: Runtime Exception. Exception in python callback function evaluation:
ValueError('max() arg is an empty sequence',):
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
In order to debug this problem, you have to look at the stack trace. On the last line we see:
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
Python is saying that you are trying to calculate the maximum of a list with no elements. That happens when the dictionary is empty, so one of your SFrames probably contains an empty dictionary {}.
The question is what to do when the dictionary is empty. You might decide to return None in that case.
In any case, the code you wrote is more complicated than it needs to be. A simpler and more efficient version would be:
def max_func(d):
    if d:
        return max(d, key=d.get)
    else:
        # or return something else if there is no element in the dictionary
        return None
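As a quick check of that function, here is a small sketch run on plain Python dictionaries instead of an SFrame column. The sample `rows` list is made up for illustration and includes one empty dictionary, the case that triggered the original error.

```python
def max_func(d):
    """Return the key with the largest value, or None for an empty dict."""
    if d:
        return max(d, key=d.get)
    return None

# Made-up sample data; the middle entry is the problematic empty dict.
rows = [{"a": 1, "b": 3}, {}, {"x": 2.5, "y": 0.1}]
result = [max_func(d) for d in rows]
print(result)  # ['b', None, 'x']
```

Whether None is the right placeholder depends on how you want empty dictionaries to count when you compare the two columns afterwards.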

using pool.map to apply function to list of strings in parallel?

I have a large list of http user agent strings (taken from a pandas dataframe) that I am trying to parse using the python implementation of ua-parser. I can parse the list fine when only using a single thread, but based on some preliminary speed testing, it'd take me well over 10 hours to run the whole dataset.
I am trying to use pool.map() to decrease processing time but can't quite seem to figure out how to get it to work. I've read about a dozen 'tutorials' that I found online and have searched SO (likely a duplicate of some sort, as there are a lot of similar questions), but none of the dozens of attempts have worked for one reason or another. I'm assuming/hoping it's an easy fix.
Here is what I have so far:
import multiprocessing as mp
from ua_parser import user_agent_parser

http_str = df['user_agents'].tolist()

def uaparse(http_str):
    for i, item in enumerate(http_str):
        return user_agent_parser.Parse(http_str[i])

pool = mp.Pool(processes=10)
parsed = pool.map(uaparse, range(0, len(http_str)))
Right now I'm seeing the following error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-25-701fbf58d263> in <module>()
7
8 pool = mp.Pool(processes=10)
----> 9 results = pool.map(uaparse, range(0,len(http_str)))
/home/ubuntu/anaconda/lib/python2.7/multiprocessing/pool.pyc in map(self, func, iterable, chunksize)
249 '''
250 assert self._state == RUN
--> 251 return self.map_async(func, iterable, chunksize).get()
252
253 def imap(self, func, iterable, chunksize=1):
/home/ubuntu/anaconda/lib/python2.7/multiprocessing/pool.pyc in get(self, timeout)
565 return self._value
566 else:
--> 567 raise self._value
568
569 def _set(self, i, obj):
TypeError: 'int' object is not iterable
Thanks in advance for any assistance/direction you can provide.
It seems like all you need is:
http_str = df['user_agents'].tolist()
pool = mp.Pool(processes=10)
parsed = pool.map(user_agent_parser.Parse, http_str)
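The reason this works is that `pool.map` calls the function once per element of the iterable. In the question, each call received a single integer from `range(...)`, and `enumerate()` on an integer raises the `'int' object is not iterable` error. A minimal sketch of the per-element behavior follows. It assumes a hypothetical `fake_parse` in place of `user_agent_parser.Parse`, and it uses the thread-backed `ThreadPool` (which exposes the same `map` API) only so the example runs anywhere; for CPU-bound parsing you would use `mp.Pool` as in the answer.

```python
from multiprocessing.pool import ThreadPool  # same map() API as mp.Pool

def fake_parse(ua):
    """Hypothetical stand-in for user_agent_parser.Parse: takes ONE string."""
    return {"string": ua, "family": ua.split("/")[0]}

# Made-up user agent strings for illustration.
agents = ["Mozilla/5.0", "curl/7.68.0", "Wget/1.20"]

with ThreadPool(processes=2) as pool:
    # map() hands each string to fake_parse separately and keeps input order.
    parsed = pool.map(fake_parse, agents)

families = [p["family"] for p in parsed]
print(families)  # ['Mozilla', 'curl', 'Wget']
```

So the worker function should accept one item and return one result; the looping is done by the pool.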
