Unexpected Python KeyError - python

I have loaded a CSV file into a Pandas dataframe:
import pandas as pd
Name ID Sex M_Status DaysOff
Joe 3 M S 1
NaN NaN NaN NaN 2
NaN NaN NaN NaN 3
df = pd.read_csv('People.csv')
This data will then be loaded into an HTML file.
test = """
HTML code
"""
Now for preparing data for the HTML file:
df1 = df.filter(['Name','ID','Sex','M_Status','DaysOff'])
file = ""
for i, rows in df1.iterrows():
name = (df1['Name'][i])
id = (df1['ID'][i])
sex = (df1['Sex'][i])
m_status = (df1['M_Status'][i])
days_off = (df1['DaysOff'][i])
with open(f"personInfo{i}.html", "w") as file:
file.write(test.format(name,id,sex,m_status,days_off))
file.close()
And the error:
KeyError: 'days_off'
Note: This error is occurs within the for loop.
Can anyone see where I'm going wrong? This error is generated when you try to grab data from a column which doesn't match the name, or if the column doesn't have that header namne. However, it does.
Error Information:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in
get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'days_off'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-16-35e6b916521b> in <module>
1 name = (df1['Name'][i])
2 id = (df1['ID'][i])
3 sex = (df1['Sex'][i])
4 m_status = (df1['M_Status'][i])
----> 5 days_off = (df1['DaysOff'][i])
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in
__getitem__(self, key)
2925 if self.columns.nlevels > 1:
2926 return self._getitem_multilevel(key)
-> 2927 indexer = self.columns.get_loc(key)
2928 if is_integer(indexer):
2929 indexer = [indexer]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in
get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return
self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method,
tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'days_off'

Just a hunch, but your error message suggests that you are trying to access your dataframe column with the key days_off, when it should be DaysOff. I don't see any place in the code you provided where this happens, but I would double-check your source code file to make sure that you are using the right key name.

I've resolved this and what a stupid error it was!
Basically there was a space at the end of the header name.
What Python wanted/was expecting:
days_off = (df1['DaysOff '][i])
whereas I was giving it:
days_off = (df1['DaysOff'][i])
Very stupid human error. Thanks to all that looked into it though

Related

Pandas df: retrieving the records that have cell value == float not working. What am I doing wrong?

I have this code and I can't figure out what to do to retrieve the row that I want.
I am trying to retrieve the row that has device_id=16384035. I tried both float and integer and also string there (since it tells me the column is object) but none worked
print(s_devices['Device ID'])
print(s_devices.columns)
print(s_devices.iloc[0,1])
print(type(s_devices.iloc[0,1]))
print(s_devices[['Device ID']==float(16384035)])
The above prints the below
0 16384035.0
1 17177488.0
2 16384036.0
3 17177489.0
4 0
...
534 17746358.0
535 17746359.0
536 17746360.0
537 17746361.0
538 17746362.0
Name: Device ID, Length: 539, dtype: object
Index(['Serial Number', 'Device ID', 'Device Name'.....some keys removed from here],
dtype='object')
16384035.0
<class 'float'>
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/tmp/ipykernel_540/2993650396.py in <module>
3 print(s_devices.iloc[0,1])
4 print(type(s_devices.iloc[0,1]))
----> 5 print(s_devices[['Device ID']==float(16384035)])
~/.local/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3453 if self.columns.nlevels > 1:
3454 return self._getitem_multilevel(key)
-> 3455 indexer = self.columns.get_loc(key)
3456 if is_integer(indexer):
3457 indexer = [indexer]
~/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: False
You're indexing it wrong:
s_devices[['Device ID']==float(16384035)]
You create a boolean Series with the equality and index the DataFrame with it:
msk = s_devices['Device ID']==float(16384035)
out = s_devices[msk]
or as a one-liner:
out = s_devices[s_devices['Device ID']==float(16384035)]
Note that ['Device ID']==float(16384035) evaluates to False. So what you're trying to do is: s_devices[False] which raises the KeyError since there's no column named False.

Not able to display the column of a dataframe

When I am trying to print a single column of my data set it is showing errors
KeyError Traceback (most recent call
last) ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in
get_loc(self, key, method, tolerance) 2645 try:
-> 2646 return self._engine.get_loc(key) 2647 except KeyError:
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call
last) in
----> 1 data['Label']
~\anaconda3\lib\site-packages\pandas\core\frame.py in
getitem(self, key) 2798 if self.columns.nlevels > 1: 2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key) 2801 if is_integer(indexer): 2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in
get_loc(self, key, method, tolerance) 2646 return
self._engine.get_loc(key) 2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key)) 2649
indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
data['Label']
It could be possible that the column name is having trailing spaces. Just try to print the column names & verify.
print(data.columns)
or try to print the columns after
data.columns = data.columns.str.strip()
If you have DataFrame and would like to access or select a specific few rows/columns from that DataFrame, you can use square brackets.
Now suppose that you want to select a column from the data(as per your question) DataFrame.
data["Label"]
But if you are unaware of the columns. You can get a column list and then display column data.
columns = data.columns.values.tolist()
data[columns[index]]

KeyError while tring to write an additional field in Python Pandas dataframe

I want to add a calculated field 'Score' in dataframe positions_deposits.
When I run the following operation on pandas dataframe positions_deposits,
for i in range(len(positions_deposits)):
<Read some values from the dataframe which would be passed to a function in the next line>
Score = RAG_function (Amber_threshold, Red_threshold, Type_threshold, Values)
positions_deposits['Score'].loc[i] = Score
I get the following error. Can you please guide me through what error I am making and how to resolve it?
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Score'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-201-7d0481b84aa4> in <module>
6 Values = positions_deposits['Values'].loc[i]
7 # Score = RAG_function (Amber_threshold, Red_threshold, Type_threshold, Values)
----> 8 positions_deposits["Score"].loc[i] = RAG_function (Amber_threshold, Red_threshold, Type_threshold, Values)
9
10 # print("Score is %i.00" %Score)
~/.local/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
2904 if self.columns.nlevels > 1:
2905 return self._getitem_multilevel(key)
-> 2906 indexer = self.columns.get_loc(key)
2907 if is_integer(indexer):
2908 indexer = [indexer]
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'Score'
Please note: if I print(Score), there is no error. It means the function, RAG_function is getting executed but the dataframe is failing.
Thanks!
You'll probably want to read up on how .loc and .iloc work. But having said that, there is another way which is better:
import pandas
import random
df = pandas.DataFrame([{"A": random.randint(0,100), "B": random.randint(0,100)} for _ in range(100)])
def rag_function(row):
A = row["A"]
B = row["B"]
return A * B
df["Score"] = df.apply(rag_function, axis=1)
NOTE: I don't have your RAG_function so I've created some random function. The idea is that you apply this function to every row in the dataframe.

Money Flow Index keyerror

I've obtained the historical values from an example stock (Apple in this case) and was following an example I saw online, however, when their code succeeded mine failed because of some keyerror?
Could anyone tell/show me what's wrong and hopefully how to fix it? Error is:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-60-d89407b24e87> in <module>
3
4 for i in range(1, len(typical_price)):
----> 5 if typical_price[i] > typical_price[i-1]:
6 positive_flow.append(money_flow[i-1])
7 negative_flow.append(0)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
Code is:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
import pandas_datareader as pdr
stocks = ['AAPL']
data_close = pdr.get_data_yahoo(stocks, start='2020-01-01')['Close']
data_high = pdr.get_data_yahoo(stocks, start='2020-01-01')['High']
data_low = pdr.get_data_yahoo(stocks, start='2020-01-01')['Low']
data_volume = pdr.get_data_yahoo(stocks, start='2020-01-01')['Volume']
typical_price = (data_close + data_high + data_low)/3;
money_flow = typical_price * data_volume;
positive_flow = []
negative_flow = []
for i in range(1, len(typical_price)):
if typical_price[i] > typical_price[i-1]:
positive_flow.append(money_flow[i-1])
negative_flow.append(0)
elif typical_price[i] < typical_price[i-1]:
positive_flow.append(0)
negative_flow.append(money_flow[i-1])
else:
positive_flow.append(0)
negative_flow.append(0)
Error appears when I run the final part of the code where I try to retrieve the positive and negative moneyflow for my MFI algorithm.
Use iloc indexing
for i in range(1, len(typical_price)):
if typical_price.iloc[i].item() > typical_price.iloc[i-1].item():
positive_flow.append(money_flow.iloc[i-1])
negative_flow.append(0)
elif typical_price.iloc[i].item() < typical_price.iloc[i-1].item():
positive_flow.append(0)
negative_flow.append(money_flow.iloc[i-1])
else:
positive_flow.append(0)
negative_flow.append(0)
typical_price and money_flow probably has datetime as index not integers. If you want access row by integer-location then you can use iloc

Why i am getting this error keyword:Borough

I am a beginner in Python.I merged two columnsAfter that i tried to change 'not assigned' value of a column with another column value. I cant do that. If I use premodified dataframe then I can change.
I scraped a table from a page then modifying the data in that dataframe.
import pandas as pd
import numpy as np
import requests
pip install lxml
toronto_url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
toronto_df1.head()
toronto_df1.drop(toronto_df1.loc[toronto_df1['Borough']=="Not assigned"].index, inplace=True)
toronto_df2=toronto_df1.groupby(['Postcode','Borough'],sort=False).agg(lambda x: ','.join(x))
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
This is the code i have used.
I expect to change the neighbourhood value with borough value.
I got this error.
KeyError Traceback (most recent call
last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in
get_loc(self, key, method, tolerance) 2656 try:
-> 2657 return self._engine.get_loc(key) 2658 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Borough'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call
last) 9 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in
get_loc(self, key, method, tolerance) 2657 return
self._engine.get_loc(key) 2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key)) 2660
indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in
pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Borough'
Reason of your keyerror is Neighbourhood is not column, but index level, solution is add reset_index:
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
#boolean indexing
toronto_df1 = toronto_df1.loc[toronto_df1['Borough']!="Not assigned"]
toronto_df2 = toronto_df1.groupby(['Postcode','Borough'],sort=False)['Neighbourhood'].agg(','.join).reset_index()
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
Or parameter as_index=False to groupby:
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
#boolean indexing
toronto_df1 = toronto_df1.loc[toronto_df1['Borough']!="Not assigned"]
toronto_df2=toronto_df1.groupby(['Postcode','Borough'],sort=False, as_index=False)['Neighbourhood'].agg(','.join)
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']

Categories

Resources