KeyError altering columns in dataframe

I'm trying to modify columns in a DataFrame and I keep getting a KeyError. I'm working on an analysis of Chicago crime data.
For example, when I try to run
ds["DATE OF OCCURRENCE"] = pd.to_datetime([ds["DATE OF OCCURRENCE"]], format="%m/%d/%Y %I:%M:%S %p")
KeyError
Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
Complete code:
import pandas as pd
url="https://data.cityofchicago.org/api/views/x2n5-8w5q/rows.csv?accessType=DOWNLOAD"
df= pd.read_csv(url)
ds = df.copy()
ds["DATE OF OCCURRENCE"] = pd.to_datetime([ds["DATE OF OCCURRENCE"]], format="%m/%d/%Y %I:%M:%S %p")
This is the Error:
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'DATE OF OCCURRENCE'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'DATE OF OCCURRENCE'

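Your column has been renamed, so you need to use Crime_Date. Also select the column with a single pair of [] so that pd.to_datetime receives a Series rather than a list containing a Series: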
ds["Crime_Date"] = pd.to_datetime(ds["Crime_Date"], format="%m/%d/%Y %I:%M:%S %p")
EDIT:
The column name contains extra spaces, so the key must match the header exactly (check it with print(ds.columns.tolist())):
ds["DATE OF OCCURRENCE"] = pd.to_datetime(ds["DATE OF OCCURRENCE"], format="%m/%d/%Y %I:%M:%S %p")
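Both points can be tried out on a small frame. This is only a minimal sketch: the sample rows and the doubled space in the header are assumptions made for illustration, not the real Chicago data.

```python
import pandas as pd

# Hypothetical sample; the header deliberately contains a doubled space.
ds = pd.DataFrame({
    "DATE  OF OCCURRENCE": ["01/15/2020 09:30:00 AM", "02/01/2020 11:45:10 PM"],
})

# repr() makes otherwise invisible whitespace in the headers visible.
print([repr(c) for c in ds.columns])

# Collapse runs of whitespace and trim the ends so the key is predictable.
ds.columns = ds.columns.str.replace(r"\s+", " ", regex=True).str.strip()

# Pass the Series itself (single brackets), not a list wrapping the Series.
ds["DATE OF OCCURRENCE"] = pd.to_datetime(
    ds["DATE OF OCCURRENCE"], format="%m/%d/%Y %I:%M:%S %p"
)
print(ds.dtypes)
```

Normalising the headers once, right after loading, means the rest of the script can use the cleaned name without worrying about how the source file spelled it.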

Related

Not able to display the column of a dataframe

When I try to print a single column of my data set, I get the following error:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
in
----> 1 data['Label']
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
data['Label']

The column name may have leading or trailing spaces. Print the column names to verify:
print(data.columns)
or strip the whitespace from the names before accessing the column:
data.columns = data.columns.str.strip()
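As a small illustration of that check, the sketch below uses a made-up frame whose "Label " header carries a trailing space (the data and the variable name padded are assumptions for the example). It lists the headers that change when stripped, then cleans them.

```python
import pandas as pd

# Hypothetical frame: the header "Label " carries a trailing space.
data = pd.DataFrame({"Text": ["a", "b"], "Label ": [0, 1]})

# List every column whose name changes when surrounding whitespace is removed.
padded = [c for c in data.columns if c != c.strip()]
print(padded)

# After stripping, the plain key works.
data.columns = data.columns.str.strip()
print(data["Label"].tolist())
```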

If you have a DataFrame and want to select specific rows or columns from it, you can use square brackets.
For example, to select a single column from the data DataFrame in your question:
data["Label"]
If you don't know the column names, get them as a list and then select a column by its position in that list (index here is an integer position that you choose):
columns = data.columns.values.tolist()
data[columns[index]]

Money Flow Index keyerror

I've obtained historical prices for an example stock (Apple in this case) and was following an example I found online. Their code succeeded, but mine fails with a KeyError.
Could anyone show me what's wrong and, hopefully, how to fix it? The error is:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-60-d89407b24e87> in <module>
3
4 for i in range(1, len(typical_price)):
----> 5 if typical_price[i] > typical_price[i-1]:
6 positive_flow.append(money_flow[i-1])
7 negative_flow.append(0)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
Code is:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
import pandas_datareader as pdr
stocks = ['AAPL']
data_close = pdr.get_data_yahoo(stocks, start='2020-01-01')['Close']
data_high = pdr.get_data_yahoo(stocks, start='2020-01-01')['High']
data_low = pdr.get_data_yahoo(stocks, start='2020-01-01')['Low']
data_volume = pdr.get_data_yahoo(stocks, start='2020-01-01')['Volume']
typical_price = (data_close + data_high + data_low)/3;
money_flow = typical_price * data_volume;
positive_flow = []
negative_flow = []
for i in range(1, len(typical_price)):
    if typical_price[i] > typical_price[i-1]:
        positive_flow.append(money_flow[i-1])
        negative_flow.append(0)
    elif typical_price[i] < typical_price[i-1]:
        positive_flow.append(0)
        negative_flow.append(money_flow[i-1])
    else:
        positive_flow.append(0)
        negative_flow.append(0)
Error appears when I run the final part of the code where I try to retrieve the positive and negative moneyflow for my MFI algorithm.

Use iloc indexing
for i in range(1, len(typical_price)):
    if typical_price.iloc[i].item() > typical_price.iloc[i-1].item():
        positive_flow.append(money_flow.iloc[i-1])
        negative_flow.append(0)
    elif typical_price.iloc[i].item() < typical_price.iloc[i-1].item():
        positive_flow.append(0)
        negative_flow.append(money_flow.iloc[i-1])
    else:
        positive_flow.append(0)
        negative_flow.append(0)
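The traceback passes through DataFrame.__getitem__, so typical_price is a DataFrame, and typical_price[i] looks for a column labelled i rather than for row i. Its index probably also holds dates, not integers. To access rows by integer position, use iloc.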
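The same positional logic can also be written without an explicit loop. The following is a rough sketch on made-up numbers (the prices, volumes and dates are assumptions, not real AAPL data), and it works on Series rather than on the one-column DataFrames from the question. It keeps the question's choice of appending the previous row's money flow.

```python
import pandas as pd

# Hypothetical prices and volumes indexed by date (not real AAPL data).
dates = pd.date_range("2020-01-01", periods=5, freq="D")
typical_price = pd.Series([10.0, 11.0, 10.5, 10.5, 12.0], index=dates)
volume = pd.Series([100, 200, 150, 120, 300], index=dates)
money_flow = typical_price * volume

# Positional access works regardless of what the index labels are.
print(typical_price.iloc[1])

# Change versus the previous row; the first row has no predecessor, so skip it.
change = typical_price.diff().iloc[1:]
# The loop in the question appends money_flow[i-1], so shift the flow by one row.
previous_flow = money_flow.shift(1).iloc[1:]

# Keep the flow where the price rose (or fell), and put 0 everywhere else.
positive_flow = previous_flow.where(change > 0, 0).tolist()
negative_flow = previous_flow.where(change < 0, 0).tolist()
print(positive_flow)
print(negative_flow)
```

If the data arrives as a one-column DataFrame, selecting that column by name first yields a Series on which the code above applies.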

Product Classification loading image error in python

Recently I was working on a product classification project. I have a pre-classified dataset 'train' with 41 folders, one for each product category, and a CSV file listing each image name and its category.
I also have a 'test' dataset with a bunch of unclassified products. The goal of the project is to classify those pictures and output a CSV file with "name" and "category".
I was using Google Colab for this project. After I had successfully mounted and loaded all the files and was ready to read the training images, I got an error. Below is my code:
train_image = []
for i in tqdm(range(train.shape[0])):
    img = image.load_img('content/train/train/'+train[i].astype('str')+'.jpg', target_size=(28,28,1))
    img = image.img_to_array(img)
    img = img/255
    train_image.append(img)
X = np.array(train_image)
And this is the error I get:
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
This is what my training directory looks like in Google Colab:
--content
↳train
↳train
↳0
↳1
....
↳41
How can I eliminate this error?

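OK, I get it: the problem was the format of the path passed to load_img. The path has to be built from the category and filename columns of each row: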
img = image.load_img(r'train/train/' + str(train_df['category'][i]) + '/' + train_df['filename'][i], target_size=(28,28))
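The path construction can be checked on its own before any image is loaded. This is a minimal sketch under assumptions: the rows are invented, and the layout train/train/<category>/<filename> is inferred from the directory listing in the question. Iterating with itertuples avoids indexing the DataFrame with an integer, which is what raised KeyError: 0.

```python
import os
import pandas as pd

# Hypothetical rows standing in for the training CSV.
train_df = pd.DataFrame({
    "filename": ["a.jpg", "b.jpg"],
    "category": [0, 41],
})

# Assumed layout: train/train/<category>/<filename>
base_dir = os.path.join("train", "train")

# itertuples yields one row at a time, so no integer is looked up as a column name.
paths = [
    os.path.join(base_dir, str(row.category), row.filename)
    for row in train_df.itertuples(index=False)
]
print(paths)
```

Each entry of paths could then be handed to the image loader in place of the concatenated string.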

Unexpected Python KeyError

I have loaded a CSV file into a Pandas dataframe:
import pandas as pd
Name ID Sex M_Status DaysOff
Joe 3 M S 1
NaN NaN NaN NaN 2
NaN NaN NaN NaN 3
df = pd.read_csv('People.csv')
This data will then be loaded into an HTML file.
test = """
HTML code
"""
Now for preparing data for the HTML file:
df1 = df.filter(['Name','ID','Sex','M_Status','DaysOff'])
file = ""
for i, rows in df1.iterrows():
    name = (df1['Name'][i])
    id = (df1['ID'][i])
    sex = (df1['Sex'][i])
    m_status = (df1['M_Status'][i])
    days_off = (df1['DaysOff'][i])
    with open(f"personInfo{i}.html", "w") as file:
        file.write(test.format(name,id,sex,m_status,days_off))
        file.close()
And the error:
KeyError: 'days_off'
Note: this error occurs within the for loop.
Can anyone see where I'm going wrong? This error is raised when you try to read a column using a name that doesn't match any header. However, the header is there.
Error Information:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'days_off'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-16-35e6b916521b> in <module>
1 name = (df1['Name'][i])
2 id = (df1['ID'][i])
3 sex = (df1['Sex'][i])
4 m_status = (df1['M_Status'][i])
----> 5 days_off = (df1['DaysOff'][i])
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2925 if self.columns.nlevels > 1:
2926 return self._getitem_multilevel(key)
-> 2927 indexer = self.columns.get_loc(key)
2928 if is_integer(indexer):
2929 indexer = [indexer]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'days_off'

Just a hunch, but your error message suggests that you are trying to access your dataframe column with the key days_off, when it should be DaysOff. I don't see any place in the code you provided where this happens, but I would double-check your source code file to make sure that you are using the right key name.

I've resolved this and what a stupid error it was!
Basically there was a space at the end of the header name.
What Python wanted/was expecting:
days_off = (df1['DaysOff '][i])
whereas I was giving it:
days_off = (df1['DaysOff'][i])
Very stupid human error. Thanks to all that looked into it though
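One way to catch this kind of mismatch earlier is to check the expected headers right after loading. The sketch below is only an illustration: require_columns is a hypothetical helper (not a pandas function) and the CSV text is made up to reproduce the trailing space.

```python
import difflib
import io

import pandas as pd


def require_columns(df, names):
    """Raise a descriptive KeyError if an expected column is missing (hypothetical helper)."""
    missing = [n for n in names if n not in df.columns]
    if missing:
        # Suggest the closest existing header for each missing name.
        hints = {n: difflib.get_close_matches(n, list(df.columns), n=1) for n in missing}
        raise KeyError(f"missing columns and closest headers: {hints!r}")


# Made-up CSV text; note the space after "DaysOff" in the header row.
csv_text = "Name,ID,Sex,M_Status,DaysOff \nJoe,3,M,S,1\n"
df = pd.read_csv(io.StringIO(csv_text))

try:
    require_columns(df, ["Name", "DaysOff"])
except KeyError as exc:
    message = exc.args[0]
    print(message)

# Stripping the headers makes the expected key valid.
df.columns = df.columns.str.strip()
require_columns(df, ["Name", "DaysOff"])
```

Printing the suggestion with its quotes makes the stray space visible, which is hard to spot in a plain listing of the columns.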

Why am I getting this error: KeyError: 'Borough'

I am a beginner in Python. I merged two columns, and after that I tried to replace the 'Not assigned' values in one column with the values from another column, but I can't. If I use the dataframe from before the modification, the replacement works.
I scraped a table from a web page and am modifying the data in the resulting dataframe.
import pandas as pd
import numpy as np
import requests
pip install lxml
toronto_url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
toronto_df1.head()
toronto_df1.drop(toronto_df1.loc[toronto_df1['Borough']=="Not assigned"].index, inplace=True)
toronto_df2=toronto_df1.groupby(['Postcode','Borough'],sort=False).agg(lambda x: ','.join(x))
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
This is the code I used.
I expect the Neighbourhood value to be replaced with the Borough value.
I got this error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Borough'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
9 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Borough'

The reason for the KeyError is that after the groupby, Borough (like Postcode) is no longer a column but an index level. One solution is to add reset_index:
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
#boolean indexing
toronto_df1 = toronto_df1.loc[toronto_df1['Borough']!="Not assigned"]
toronto_df2 = toronto_df1.groupby(['Postcode','Borough'],sort=False)['Neighbourhood'].agg(','.join).reset_index()
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
Or pass as_index=False to groupby:
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
#boolean indexing
toronto_df1 = toronto_df1.loc[toronto_df1['Borough']!="Not assigned"]
toronto_df2=toronto_df1.groupby(['Postcode','Borough'],sort=False, as_index=False)['Neighbourhood'].agg(','.join)
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
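The effect of the group keys moving into the index can be seen on a few invented rows. This is a minimal sketch, assuming a table with the same three column names as the scraped one; the values are made up.

```python
import pandas as pd

# Made-up rows in the shape of the scraped table.
df = pd.DataFrame({
    "Postcode": ["M1A", "M1A", "M2B"],
    "Borough": ["North York", "North York", "Downtown"],
    "Neighbourhood": ["Alpha", "Beta", "Not assigned"],
})

# Default groupby: the keys become index levels, not columns.
grouped = df.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"].agg(",".join)
print(list(grouped.index.names))

# reset_index turns the keys back into ordinary columns.
flat = grouped.reset_index()
mask = flat["Neighbourhood"] == "Not assigned"
flat.loc[mask, "Neighbourhood"] = flat.loc[mask, "Borough"]
print(flat)
```

Once Borough is a regular column again, the boolean assignment from the question works as intended.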
