Pandas Reduction Error when trying to update dataframe - python

I need to update some of the data in my dataframe, in the same sense as an UPDATE query in SQL. My current code is as follows:
import pandas

df = pandas.read_csv('filee.csv')  # load trades from csv file

def updateDataframe(row):
    if row['Name'] == "Joe":
        return "Black"
    else:
        return row

df['LastName'] = df.apply(updateDataframe, axis=1)
However, it returns the following error:
Traceback (most recent call last):
File "test.py", line 11, in <module>
df['LastName'] = df.apply(updateDataframe,axis=1)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2038, in __setitem__
self._set_item(key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2085, in _set_item
NDFrame._set_item(self, key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 582, in _set_item
self._data.set(key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 1459, in set
_set_item(self.items[loc], value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 1454, in _set_item
block.set(item, arr)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 176, in set
self.values[loc] = value
ValueError: output operand requires a reduction, but reduction is not enabled
How do I resolve this? Or is there a better way to accomplish what I am trying to do?

@Jeff has a good, concise solution to your problem in the comments above, but if you want to fix the error in your own code, try the following:
For the file filee.csv with the following contents:
Name,LastName
Andy,Blue
Joe,Smith
After the else, you need to return a LastName string rather than the whole row object, as shown below:
import pandas

df = pandas.read_csv('filee.csv')  # load trades from csv file

def updateDataframe(row):
    if row['Name'] == "Joe":
        return "Black"
    else:
        return row['LastName']

df['LastName'] = df.apply(updateDataframe, axis=1)
print df
results in the following output:
Name LastName
0 Andy Blue
1 Joe Black
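As for a better way: for a simple conditional update like this, a vectorized boolean-mask assignment avoids apply entirely. A minimal sketch (not necessarily the implementation Jeff gave in the comments):

import pandas

df = pandas.read_csv('filee.csv')
# Only rows where Name is "Joe" are touched; every other LastName is left as it is.
df.loc[df['Name'] == "Joe", 'LastName'] = "Black"
print(df)

On the sample file above this gives the same Name/LastName output as the apply version.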

Related

if df condition is met add x value to variable

What I want to do is: if a df condition is met, add the x value to a variable. For example:
local_bid = 0
df.loc[["Entity"] == "Keyword"]
then
local_bid = df["Bid"]
I tried the following:
df.loc[["Entity"] == "Keyword", local_bid] = df["Bid"]
but it didn't work:
Traceback (most recent call last):
File "/home/shaumne/Desktop/zorba/Sp_limpr.py", line 17, in <module>
s =limpr.loc[["Entity"] == "Keyword", local_bid] = limpr["Bid"]
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 818, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1703, in _setitem_with_indexer
key, _ = convert_missing_indexer(idx)
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 2585, in convert_missing_indexer
raise KeyError("cannot use a single bool to index into setitem")
KeyError: 'cannot use a single bool to index into setitem'
You need to compare the column df["Entity"], not the list ["Entity"]:
local_bid = 0
df.loc[df["Entity"] == "Keyword", local_bid] = df["Bid"]

Pandas AttributeError: 'DataFrame' object has no attribute 'Timestamp'

So I want to get the monthly sum with my script, but I always get an AttributeError, which I don't understand. The column Timestamp does indeed exist in my combined_csv. I know for sure that this line is causing the problem, since I tested all of my other code before.
AttributeError: 'DataFrame' object has no attribute 'Timestamp'
I'd appreciate any help I can get - thanks.
import os
import glob
import pandas as pd
# set working directory
os.chdir("Path to CSVs")
# find all csv files in the folder
# use glob pattern matching -> extension = 'csv'
# save result in list -> all_filenames
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
# print(all_filenames)
# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f, sep=';') for f in all_filenames])
# Format CSV
# Transform Timestamp column into datetime
combined_csv['Timestamp'] = pd.to_datetime(combined_csv.Timestamp)
# Read out first entry of every day of every month
combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first']))
# To get the yield of day i have to subtract day 2 HtmDht_Energy - day 1 HtmDht_Energy
combined_csv["dailyYield"] = combined_csv["first"] - combined_csv["first"].shift()
# combined_csv.reset_index()
# combined_csv.index.set_names(["year", "month"], inplace=True)
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv.Timestamp.dt.year, combined_csv.Timestamp.dt.month]).sum()
Output of combined_csv.columns
Index(['Timestamp', 'teHst0101', 'teHst0102', 'teHst0103', 'teHst0104',
'teHst0105', 'teHst0106', 'teHst0107', 'teHst0201', 'teHst0202',
'teHst0203', 'teHst0204', 'teHst0301', 'teHst0302', 'teHst0303',
'teHst0304', 'teAmb', 'teSolFloHexHst', 'teSolRetHexHst',
'teSolCol0501', 'teSolCol1001', 'teSolCol1501', 'vfSol', 'prSolRetSuc',
'rdGlobalColAngle', 'gSolPump01_roActual', 'gSolPump02_roActual',
'gHstPump03_roActual', 'gHstPump04_roActual', 'gDhtPump06_roActual',
'gMB01_isOpened', 'gMB02_isOpened', 'gCV01_posActual',
'gCV02_posActual', 'HtmDht_Energy', 'HtmDht_Flow', 'HtmDht_Power',
'HtmDht_Volume', 'HtmDht_teFlow', 'HtmDht_teReturn', 'HtmHst_Energy',
'HtmHst_Flow', 'HtmHst_Power', 'HtmHst_Volume', 'HtmHst_teFlow',
'HtmHst_teReturn', 'teSolColDes', 'teHstFloDes'],
dtype='object')
Traceback:
When I select it with
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv['Timestamp'].dt.year, combined_csv['Timestamp'].dt.month]).sum()
Traceback (most recent call last):
File "D:\Users\wink\PycharmProjects\csvToExcel\main.py", line 28, in <module>
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv['Timestamp'].dt.year, combined_csv['Timestamp'].dt.month]).sum()
File "D:\Users\wink\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "D:\Users\wink\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'Timestamp'
Traceback with Mustafa's solution:
Traceback (most recent call last):
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3862, in reindexer
value = value.reindex(self.index)._values
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\util\_decorators.py", line 312, in wrapper
return func(*args, **kwargs)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4176, in reindex
return super().reindex(**kwargs)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\generic.py", line 4811, in reindex
return self._reindex_axes(
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4022, in _reindex_axes
frame = frame._reindex_index(
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4038, in _reindex_index
new_index, indexer = self.index.reindex(
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 2492, in reindex
target = MultiIndex.from_tuples(target)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 175, in new_meth
return meth(self_or_cls, *args, **kwargs)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 531, in from_tuples
arrays = list(lib.tuples_to_object_array(tuples).T)
File "pandas\_libs\lib.pyx", line 2527, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\winklerm\PycharmProjects\csvToExcel\main.py", line 28, in <module>
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv.Timestamp.dt.year, combined_csv.Timestamp.dt.month]).sum()
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
self._set_item(key, value)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3888, in _sanitize_column
value = reindexer(value).T
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3870, in reindexer
raise TypeError(
TypeError: incompatible index of inserted column with frame index
This line makes the Timestamp column the index of the combined_csv:
combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first']))
and therefore you get an error when you try to access .Timestamp.
The remedy is to reset_index, so instead of the above line, you can try this:
combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first'])).reset_index()
which takes the Timestamp column out of the index and back into the normal columns, so you can then access it.
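Here is a minimal sketch on made-up data (column names borrowed from the question) showing the effect of reset_index:

import pandas as pd

combined_csv = pd.DataFrame({
    "Timestamp": pd.date_range("2021-01-01", periods=6, freq="12H"),
    "HtmDht_Energy": [10, 11, 13, 16, 20, 25],
})

daily = combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first']).reset_index()
print(daily.columns)  # Index(['Timestamp', 'first'], dtype='object')

# Timestamp is an ordinary column again, so the .dt accessor works for the monthly grouping:
monthly = daily.groupby([daily['Timestamp'].dt.year, daily['Timestamp'].dt.month])['first'].sum()
print(monthly)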
Side note:
combined_csv["dailyYield"] = combined_csv["first"] - combined_csv["first"].shift()
is equivalent to
combined_csv["dailyYield"] = combined_csv["first"].diff()

Pandas Dataframe: "ValueError: could not convert string to float" when trying to change value of second column entry

I have a short piece of code in which I want to be able to change values of a CSV. My code is the following:
import pandas as pd
import os

if os.path.exists('annotation.csv'):
    df = pd.read_csv('annotation.csv')

label = 'unknown'
img_number = 99

if str(df.at[img_number, 'label_2']) == 'nan':
    df.at[img_number, 'label_2'] = label
else:
    continue
My file looks like this:
label_1,label_2
,
,
,
,
,
(and so on)
I am able to change 'label_1' with df.at[img_number, 'label_1'], but if I try to replace 'label_2' the same way, I get the following error.
Traceback (most recent call last):
File "D:/develop/mbuchwald/machinelearning/Organ_Annotation/test.py", line 11, in <module>
df.at[img_number, 'label_2'] = label
File "C:\Users\mbuchwald\AppData\Local\Continuum\anaconda3\envs\newEnv\lib\site-packages\pandas\core\indexing.py", line 2159, in __setitem__
self.obj._set_value(*key, takeable=self._takeable)
File "C:\Users\mbuchwald\AppData\Local\Continuum\anaconda3\envs\newEnv\lib\site-packages\pandas\core\frame.py", line 2582, in _set_value
engine.set_value(series._values, index, value)
File "pandas\_libs\index.pyx", line 124, in pandas._libs.index.IndexEngine.set_value
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.set_value
File "pandas/_libs/src\util.pxd", line 150, in util.set_value_at
File "pandas/_libs/src\util.pxd", line 142, in util.set_value_at_unsafe
ValueError: could not convert string to float: 'unknown'
Does anyone have a clue? I can't figure it out. Thank you!

using replace for a column of a pandas df TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'str'

How should I fix this?
import pandas as pd
csv_file = 'sample.csv'
count = 1
my_filtered_csv = pd.read_csv(csv_file, usecols=['subDirectory_filePath', 'expression'])
emotion_map = { '0':'6', '1':'3', '2':'4', '3':'5', '4':'2', '5':'1', '6':'0'}
my_filtered_csv['expression'] = my_filtered_csv['expression'].replace(emotion_map)
print(my_filtered_csv)
Error is:
Traceback (most recent call last):
File "/Users/mona/CS585/project/affnet/emotion_map.py", line 11, in <module>
my_filtered_csv['expression'] = my_filtered_csv['expression'].replace(emotion_map)
File "/Users/mona/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 3836, in replace
limit=limit, regex=regex)
File "/Users/mona/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 3885, in replace
regex=regex)
File "/Users/mona/anaconda/lib/python3.6/site-packages/pandas/core/internals.py", line 3259, in replace_list
masks = [comp(s) for i, s in enumerate(src_list)]
File "/Users/mona/anaconda/lib/python3.6/site-packages/pandas/core/internals.py", line 3259, in <listcomp>
masks = [comp(s) for i, s in enumerate(src_list)]
File "/Users/mona/anaconda/lib/python3.6/site-packages/pandas/core/internals.py", line 3247, in comp
return _maybe_compare(values, getattr(s, 'asm8', s), operator.eq)
File "/Users/mona/anaconda/lib/python3.6/site-packages/pandas/core/internals.py", line 4619, in _maybe_compare
raise TypeError("Cannot compare types %r and %r" % tuple(type_names))
TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'str'
Process finished with exit code 1
A few lines of the CSV file look like this:
,subDirectory_filePath,expression
0,689/737db2483489148d783ef278f43f486c0a97e140fc4b6b61b84363ca.jpg,1
1,392/c4db2f9b7e4b422d14b6e038f0cdc3ecee239b55326e9181ee4520f9.jpg,0
2,468/21772b68dc8c2a11678c8739eca33adb6ccc658600e4da2224080603.jpg,0
3,944/06e9ae8d3b240eb68fa60534783eacafce2def60a86042f9b7d59544.jpg,1
4,993/02e06ee5521958b4042dd73abb444220609d96f57b1689abbe87c024.jpg,8
5,979/f675c6a88cdef99a6d8b0261741217a0319387fcf1571a174f99ac81.jpg,6
6,637/94b769d8e880cbbea8eaa1350cb8c094a03d27f9fef44e1f4c0fb2ae.jpg,9
7,997/b81f843f08ce3bb0c48b270dc58d2ab8bf5bea3e2262e50bbcadbec2.jpg,6
8,358/21a32dd1c1ecd57d3e8964621c911df1c0b3348a4ae5203b4a243230.JPG,9
Changing the emotion_map to the following fixed the problem:
emotion_map = { 0:6, 1:3, 2:4, 3:5, 4:2, 5:1, 6:0}
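A minimal sketch with a couple of made-up rows shows why the integer keys work: read_csv parses the expression column as int64, so the string keys of the original mapping never match it (and, in the pandas version from the traceback, even trigger the comparison TypeError), whereas integer keys map cleanly:

import pandas as pd

my_filtered_csv = pd.DataFrame({
    "subDirectory_filePath": ["a.jpg", "b.jpg", "c.jpg"],
    "expression": [1, 0, 6],  # int64, as read_csv would infer
})

emotion_map = {0: 6, 1: 3, 2: 4, 3: 5, 4: 2, 5: 1, 6: 0}
my_filtered_csv['expression'] = my_filtered_csv['expression'].replace(emotion_map)
print(my_filtered_csv['expression'].tolist())  # [3, 6, 0]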
Another possibility that can also cause this error: you have already run this code once and the data has already been replaced. To solve this, go back and load the data set again.

KeyError: 'cannot use a single bool to index into setitem'

I am trying to set values of the column 'barony2' in the DataFrame df whenever its 'add_suburb' column contains the 'Townland' value of the current ref_file row from iterrows, using the following code:
import pandas as pd

df = pd.read_csv('a.csv')
ref_file = pd.read_csv('b.csv')

for index, row in ref_file.iterrows():
    cond = str(row['Townland']) in df.add_suburb
    df.loc[cond, 'barony2'] = str(row['Barony'])
I get KeyError: 'cannot use a single bool to index into setitem'. Any help is highly appreciated.
Traceback (most recent call last):
File "C:/Users/saiki/PycharmProjects/webcrawl/edit_address2.py", line 46, in <module>
import_encounters2()
File "C:/Users/saiki/PycharmProjects/webcrawl/edit_address2.py", line 20, in import_encounters2
df.loc[cond, 'barony2'] = str(row['Barony'])
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 132, in __setitem__
self._setitem_with_indexer(indexer, value)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 263, in _setitem_with_indexer
key, _ = convert_missing_indexer(idx)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1824, in convert_missing_indexer
raise KeyError("cannot use a single bool to index into setitem")
KeyError: 'cannot use a single bool to index into setitem'
