I am trying to set the values of the column 'barony2' in data frame 'df' wherever its 'add_suburb' column contains the 'Townland' value of the current row while iterating over ref_file, using the following code:
df = pd.read_csv('a.csv')
ref_file = pd.read_csv('b.csv')
for index, row in ref_file.iterrows():
    cond = str(row['Townland']) in df.add_suburb
    df.loc[cond, 'barony2'] = str(row['Barony'])
I get KeyError: 'cannot use a single bool to index into setitem'. Any help is highly appreciated.
Traceback (most recent call last):
File "C:/Users/saiki/PycharmProjects/webcrawl/edit_address2.py", line 46, in <module>
import_encounters2()
File "C:/Users/saiki/PycharmProjects/webcrawl/edit_address2.py", line 20, in import_encounters2
df.loc[cond, 'barony2'] = str(row['Barony'])
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 132, in __setitem__
self._setitem_with_indexer(indexer, value)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 263, in _setitem_with_indexer
key, _ = convert_missing_indexer(idx)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1824, in convert_missing_indexer
raise KeyError("cannot use a single bool to index into setitem")
KeyError: 'cannot use a single bool to index into setitem'
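One possible fix, as a minimal sketch: `str(...) in df.add_suburb` evaluates to a single Python bool, because `in` on a Series tests the index labels rather than the values, and `.loc` rejects a lone bool as a row selector. Building a row-wise boolean mask with `Series.str.contains` avoids that. The two sample frames below are made up for illustration, and `regex=False` / `na=False` are assumptions about the intended matching (literal substring, missing values never match).

```python
import pandas as pd

# Hypothetical stand-ins for a.csv and b.csv.
df = pd.DataFrame({
    "add_suburb": ["Ballymore, Dublin", "Kilbride", None],
    "barony2": ["", "", ""],
})
ref_file = pd.DataFrame({
    "Townland": ["Ballymore", "Kilbride"],
    "Barony": ["Uppercross", "Rathdown"],
})

for _, row in ref_file.iterrows():
    # One True/False per row of df, instead of a single bool.
    mask = df["add_suburb"].str.contains(str(row["Townland"]), regex=False, na=False)
    df.loc[mask, "barony2"] = str(row["Barony"])

print(df)
```

Rows whose 'add_suburb' is missing or matches no townland keep their original 'barony2' value.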
Related
What I want to do is: if a condition on df is met, assign a value to a variable.
Example:
local_bid = 0
df.loc[["Entity"] == "Keyword"]
then
local_bid = df["Bid"]
I tried
df.loc[["Entity"] == "Keyword", local_bid] = df["Bid"]
but it didn't work:
Traceback (most recent call last):
File "/home/shaumne/Desktop/zorba/Sp_limpr.py", line 17, in <module>
s =limpr.loc[["Entity"] == "Keyword", local_bid] = limpr["Bid"]
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 818, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1703, in _setitem_with_indexer
key, _ = convert_missing_indexer(idx)
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 2585, in convert_missing_indexer
raise KeyError("cannot use a single bool to index into setitem")
KeyError: 'cannot use a single bool to index into setitem'
You need to compare the column df["Entity"], not the list ["Entity"] (comparing a list to a string gives a single False, which is what triggers the error):
local_bid = 0
df.loc[df["Entity"] == "Keyword", local_bid] = df["Bid"]
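If the goal is to read the matching "Bid" values into a variable rather than write them into a new column, something like the following sketch may be closer to what was asked. That reading of the question is an assumption, and the sample data and the `total_bid` name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Entity": ["Keyword", "Campaign", "Keyword"],
    "Bid": [0.5, 1.0, 0.75],
})

# Boolean mask: one True/False per row.
mask = df["Entity"] == "Keyword"

# Series holding the Bid values of the matching rows only.
local_bid = df.loc[mask, "Bid"]

# Reduce to a single number if that is what is needed.
total_bid = local_bid.sum()
print(local_bid.tolist(), total_bid)
```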
I want to get the monthly sum with my script, but I always get an AttributeError, which I don't understand. The column Timestamp does exist in my combined_csv. I know for sure that this line is causing the problem, since I tested all of my other code before.
AttributeError: 'DataFrame' object has no attribute 'Timestamp'
I'd appreciate any help I can get. Thanks!
import os
import glob
import pandas as pd
# set working directory
os.chdir("Path to CSVs")
# find all csv files in the folder
# use glob pattern matching -> extension = 'csv'
# save result in list -> all_filenames
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
# print(all_filenames)
# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f, sep=';') for f in all_filenames])
# Format CSV
# Transform Timestamp column into datetime
combined_csv['Timestamp'] = pd.to_datetime(combined_csv.Timestamp)
# Read out first entry of every day of every month
combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first']))
# To get the yield of day i have to subtract day 2 HtmDht_Energy - day 1 HtmDht_Energy
combined_csv["dailyYield"] = combined_csv["first"] - combined_csv["first"].shift()
# combined_csv.reset_index()
# combined_csv.index.set_names(["year", "month"], inplace=True)
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv.Timestamp.dt.year, combined_csv.Timestamp.dt.month]).sum()
Output of combined_csv.columns
Index(['Timestamp', 'teHst0101', 'teHst0102', 'teHst0103', 'teHst0104',
'teHst0105', 'teHst0106', 'teHst0107', 'teHst0201', 'teHst0202',
'teHst0203', 'teHst0204', 'teHst0301', 'teHst0302', 'teHst0303',
'teHst0304', 'teAmb', 'teSolFloHexHst', 'teSolRetHexHst',
'teSolCol0501', 'teSolCol1001', 'teSolCol1501', 'vfSol', 'prSolRetSuc',
'rdGlobalColAngle', 'gSolPump01_roActual', 'gSolPump02_roActual',
'gHstPump03_roActual', 'gHstPump04_roActual', 'gDhtPump06_roActual',
'gMB01_isOpened', 'gMB02_isOpened', 'gCV01_posActual',
'gCV02_posActual', 'HtmDht_Energy', 'HtmDht_Flow', 'HtmDht_Power',
'HtmDht_Volume', 'HtmDht_teFlow', 'HtmDht_teReturn', 'HtmHst_Energy',
'HtmHst_Flow', 'HtmHst_Power', 'HtmHst_Volume', 'HtmHst_teFlow',
'HtmHst_teReturn', 'teSolColDes', 'teHstFloDes'],
dtype='object')
Traceback when I select the column with brackets instead:
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv['Timestamp'].dt.year, combined_csv['Timestamp'].dt.month]).sum()
Traceback (most recent call last):
File "D:\Users\wink\PycharmProjects\csvToExcel\main.py", line 28, in <module>
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv['Timestamp'].dt.year, combined_csv['Timestamp'].dt.month]).sum()
File "D:\Users\wink\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "D:\Users\wink\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'Timestamp'
Traceback with Mustafa's solution:
Traceback (most recent call last):
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3862, in reindexer
value = value.reindex(self.index)._values
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\util\_decorators.py", line 312, in wrapper
return func(*args, **kwargs)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4176, in reindex
return super().reindex(**kwargs)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\generic.py", line 4811, in reindex
return self._reindex_axes(
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4022, in _reindex_axes
frame = frame._reindex_index(
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4038, in _reindex_index
new_index, indexer = self.index.reindex(
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 2492, in reindex
target = MultiIndex.from_tuples(target)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 175, in new_meth
return meth(self_or_cls, *args, **kwargs)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 531, in from_tuples
arrays = list(lib.tuples_to_object_array(tuples).T)
File "pandas\_libs\lib.pyx", line 2527, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\winklerm\PycharmProjects\csvToExcel\main.py", line 28, in <module>
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv.Timestamp.dt.year, combined_csv.Timestamp.dt.month]).sum()
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
self._set_item(key, value)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3888, in _sanitize_column
value = reindexer(value).T
File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3870, in reindexer
raise TypeError(
TypeError: incompatible index of inserted column with frame index
This line makes the Timestamp column the index of the combined_csv:
combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first']))
so Timestamp is no longer a column, and you get an error when you try to access it via .Timestamp.
The remedy is to call reset_index; instead of the line above, you can try this:
combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first'])).reset_index()
which moves Timestamp from the index back into a regular column, so you can access it again.
Side note:
combined_csv["dailyYield"] = combined_csv["first"] - combined_csv["first"].shift()
is equivalent to
combined_csv["dailyYield"] = combined_csv["first"].diff()
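On the second traceback ("incompatible index of inserted column with frame index"): the result of groupby(...).sum() is indexed by the group keys, so it cannot be assigned directly as a column of the daily frame. A minimal sketch of one way around this uses transform("sum"), which returns a result aligned with the original rows. The sample data is made up, and grouping by `dt.to_period("M")` is a swapped-in shortcut for the year/month pair in the question, not the original code.

```python
import pandas as pd

# Hypothetical readings standing in for the concatenated CSV files.
raw = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-30 00:00", "2021-01-30 12:00",
        "2021-01-31 00:00", "2021-02-01 00:00", "2021-02-02 00:00",
    ]),
    "HtmDht_Energy": [100, 105, 110, 125, 145],
})

# First reading of every day; reset_index keeps Timestamp as a column.
daily = (
    raw.resample("D", on="Timestamp")["HtmDht_Energy"]
    .agg(["first"])
    .round()
    .reset_index()
)

# Daily yield is the difference between consecutive first readings.
daily["dailyYield"] = daily["first"].diff()

# transform("sum") broadcasts each month's total back onto its rows.
month = daily["Timestamp"].dt.to_period("M")
daily["monthlySum"] = daily.groupby(month)["dailyYield"].transform("sum")
print(daily)
```

If one row per month is wanted instead, `daily.groupby(month)["dailyYield"].sum()` gives that as a separate Series.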
Is there a way to force pandas to write an empty DataFrame to an HDF file?
import pandas as pd
df = pd.DataFrame(columns=['x','y'])
df.to_hdf('temp.h5', 'xxx')
df2 = pd.read_hdf('temp.h5', 'xxx')
Output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 389, in read_hdf
return store.select(key, auto_close=auto_close, **kwargs)
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 740, in select
return it.get_result()
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1518, in get_result
results = self.func(self.start, self.stop, where)
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 733, in func
columns=columns)
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2986, in read
idx=i), start=_start, stop=_stop)
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2575, in read_index
_, index = self.read_index_node(getattr(self.group, key), **kwargs)
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 2676, in read_index_node
data = node[start:stop]
File ".../Python-3.6.3/lib/python3.6/site-packages/tables/vlarray.py", line 675, in __getitem__
return self.read(start, stop, step)
File ".../Python-3.6.3/lib/python3.6/site-packages/tables/vlarray.py", line 811, in read
listarr = self._read_array(start, stop, step)
File "tables/hdf5extension.pyx", line 2106, in tables.hdf5extension.VLArray._read_array (tables/hdf5extension.c:24649)
ValueError: cannot set WRITEABLE flag to True of this array
Writing with format='table':
import pandas as pd
df = pd.DataFrame(columns=['x','y'])
df.to_hdf('temp.h5', 'xxx', format='table')
df2 = pd.read_hdf('temp.h5', 'xxx')
Output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 389, in read_hdf
return store.select(key, auto_close=auto_close, **kwargs)
File ".../Python-3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 722, in select
raise KeyError('No object named {key} in the file'.format(key=key))
KeyError: 'No object named xxx in the file'
Pandas version: 0.24.2
Thank you for your help!
Putting an empty DataFrame into an HDFStore in fixed format should work (you may need to check the versions of other packages, e.g. tables):
import pandas as pd
import tables
# Versions
pd.__version__
tables.__version__
# DF
df = pd.DataFrame(columns=['x','y'])
df
# Dump in fixed format
with pd.HDFStore('temp.h5') as store:
    store.put('df', df, format='f')
    print('Read:')
    store.select('df')
>>> '0.24.2'
>>> '3.5.1'
>>> x y
>>>
>>> Read:
>>> x y
PyTables itself does not allow this (or at least it did not), but pandas has a workaround for the fixed format.
As discussed in the same GitHub issue, there has been some effort to fix this behavior for the table format as well. The solution still seems to be up in the air, though; that was the state at the end of March.
I have a pandas data frame and a function that appends a row to the existing data frame:
df = pd.DataFrame()
df.columns = ['A', 'Line', 'B']
# add a new row at the end of non-indexed df
def addRow(self, colData, colNames):
    l = len(df)
    colList = []
    for x in colData:
        colList.append(str(x))
    new_record = pd.DataFrame([tuple(colList)])
    new_record['Line'] = new_record['Line'].astype(int)
I am getting the following error:
Traceback (most recent call last):
File
line 207, in addRow
new_record[colName] = new_record[colName].astype(int)
File "site-packages/pandas/core/generic.py", line 3054, in astype
raise_on_error=raise_on_error, **kwargs)
File "python-3.6.1/linux_x86_64/lib/python3.6/site-packages/pandas/core/internals.py", line 3168, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "core/internals.py", line 3035, in apply
applied = getattr(b, f)(**kwargs)
File "site-packages/pandas/core/internals.py", line 462, in astype
values=values, **kwargs)
File "internals.py", line 505, in _astype
values = _astype_nansafe(values.ravel(), dtype, copy=True)
File "ckages/pandas/types/cast.py", line 534, in _astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/lib.pyx", line 983, in pandas.lib.astype_intsafe
(pandas/lib.c:16816)
File "pandas/src/util.pxd", line 74, in util.set_value_at (pandas/lib.c:69655)
ValueError: invalid literal for int() with base 10: ''
Can someone help me solve this ValueError?
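I agree with Sai in the comments: a value such as 'A' (or the empty string in your traceback) cannot be converted to an integer. Use pd.to_numeric with errors='coerce' instead, which turns unparseable values into NaN: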
new_record['Line'] = pd.to_numeric(new_record['Line'], errors='coerce')
Lol, Now you love pandas :-)
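To illustrate what errors='coerce' does with the empty string from the traceback, here is a small sketch. The sample rows are made up, and the final cast to the nullable "Int64" dtype is an optional extra for keeping integers alongside missing values, not part of the answer above.

```python
import pandas as pd

# Hypothetical records; the first one has an empty 'Line' value.
new_record = pd.DataFrame(
    [("x", "", "y"), ("a", "12", "b")],
    columns=["A", "Line", "B"],
)

# '' cannot be parsed, so it becomes NaN; the column ends up as float64.
new_record["Line"] = pd.to_numeric(new_record["Line"], errors="coerce")

# Optional: the nullable integer dtype keeps 12 as an integer and NaN as <NA>.
new_record["Line"] = new_record["Line"].astype("Int64")
print(new_record)
```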
I need to update some of the data in my dataframe, in the same way as an UPDATE query in SQL. My current code is as follows:
import pandas
df = pandas.read_csv('filee.csv') # load trades from csv file
def updateDataframe(row):
    if row['Name'] == "Joe":
        return "Black"
    else:
        return row
df['LastName'] = df.apply(updateDataframe,axis=1)
However, it returns the following error:
Traceback (most recent call last):
File "test.py", line 11, in <module>
df['LastName'] = df.apply(updateDataframe,axis=1)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2038, in __setitem__
self._set_item(key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2085, in _set_item
NDFrame._set_item(self, key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 582, in _set_item
self._data.set(key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 1459, in set
_set_item(self.items[loc], value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 1454, in _set_item
block.set(item, arr)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 176, in set
self.values[loc] = value
ValueError: output operand requires a reduction, but reduction is not enabled
How do I resolve this? Or is there a better way to accomplish what I am trying to do?
@Jeff has a good, concise implementation of your problem in the comments above, but if you want to fix the error in your code, try the following.
For the file filee.csv with the following contents:
Name,LastName
Andy,Blue
Joe,Smith
After the else, you need to return a Last Name string rather than a row object, as shown below:
import pandas
df = pandas.read_csv('filee.csv') # load trades from csv file
def updateDataframe(row):
    if row['Name'] == "Joe":
        return "Black"
    else:
        return row['LastName']
df['LastName'] = df.apply(updateDataframe,axis=1)
print(df)
results in the following output:
   Name LastName
0  Andy     Blue
1   Joe    Black
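For comparison, the same update can be written without apply, using a boolean mask with .loc, which reads much like UPDATE ... SET ... WHERE in SQL. This is only a sketch on the sample data above (built inline here instead of read from filee.csv), and not necessarily the implementation from the comments.

```python
import pandas as pd

# Same contents as the sample filee.csv, built inline.
df = pd.DataFrame({"Name": ["Andy", "Joe"], "LastName": ["Blue", "Smith"]})

# UPDATE df SET LastName = 'Black' WHERE Name = 'Joe'
df.loc[df["Name"] == "Joe", "LastName"] = "Black"
print(df)
```

The mask selects the rows and the column label selects what to overwrite; all other rows are left untouched.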