Why am I getting this error, KeyError: 'Borough'? - Python

I am a beginner in Python. I merged two columns, and after that I tried to replace the 'Not assigned' values of one column with the values of another column, but I can't do that. If I use the dataframe from before the modification, the replacement works.
I scraped a table from a page and am then modifying the data in that dataframe.
import pandas as pd
import numpy as np
import requests
# pd.read_html needs an HTML parser such as lxml (install once with: pip install lxml)
toronto_url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
toronto_df1.head()
toronto_df1.drop(toronto_df1.loc[toronto_df1['Borough']=="Not assigned"].index, inplace=True)
toronto_df2=toronto_df1.groupby(['Postcode','Borough'],sort=False).agg(lambda x: ','.join(x))
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
This is the code I used.
I expect the Neighbourhood value to be replaced with the Borough value.
I got this error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Borough'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
9 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Borough'

The reason for your KeyError is that, after the groupby, Borough is no longer a column but an index level (only Neighbourhood remains a column). The solution is to add reset_index:
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
#boolean indexing
toronto_df1 = toronto_df1.loc[toronto_df1['Borough']!="Not assigned"]
toronto_df2 = toronto_df1.groupby(['Postcode','Borough'],sort=False)['Neighbourhood'].agg(','.join).reset_index()
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
Or pass the parameter as_index=False to groupby:
toronto_df1= pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
#boolean indexing
toronto_df1 = toronto_df1.loc[toronto_df1['Borough']!="Not assigned"]
toronto_df2=toronto_df1.groupby(['Postcode','Borough'],sort=False, as_index=False)['Neighbourhood'].agg(','.join)
toronto_df2.loc[toronto_df2['Neighbourhood'] == "Not assigned", 'Neighbourhood'] = toronto_df2['Borough']
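To see why the original code fails without hitting the network, here is a minimal sketch on a small hand-made frame. The rows are made-up sample data, not the real Wikipedia table; only the column names come from the question. It shows that the group keys end up in the index and that reset_index turns them back into columns.

```python
import pandas as pd

# Made-up sample rows that mimic the shape of the scraped table.
toronto_df1 = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M5A", "M5A", "M7A"],
    "Borough": ["North York", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Harbourfront",
                      "Regent Park", "Not assigned"],
})

# Without reset_index the group keys become index levels, not columns.
grouped = toronto_df1.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"].agg(",".join)
index_names = list(grouped.index.names)

# reset_index moves Postcode and Borough back into ordinary columns.
toronto_df2 = grouped.reset_index()

# Replace "Not assigned" neighbourhoods with the borough of the same row.
mask = toronto_df2["Neighbourhood"] == "Not assigned"
toronto_df2.loc[mask, "Neighbourhood"] = toronto_df2.loc[mask, "Borough"]
print(toronto_df2)
```

Printing grouped.index.names (or toronto_df2.columns) is a quick way to check where a key lives before indexing with it.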

Related

Not able to display the column of a dataframe

When I try to print a single column of my data set, it shows this error:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
----> 1 data['Label']
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Label'
data['Label']
The column name may have leading or trailing spaces. Print the column names to verify:
print(data.columns)
or strip the whitespace from the names and then print the columns again:
data.columns = data.columns.str.strip()
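As a small illustration of the check above, the sketch below builds a frame whose header is padded with spaces. The frame contents are invented for the example; only the column name Label comes from the question. Using repr on each name makes hidden whitespace visible.

```python
import pandas as pd

# Invented data; the padded " Label " header reproduces the problem.
data = pd.DataFrame({" Label ": [0, 1, 1], "Text": ["a", "b", "c"]})

# repr() shows the quotes, so stray spaces become obvious.
raw_names = [repr(c) for c in data.columns]
print(raw_names)

# The exact name "Label" does not exist yet, so the lookup raises KeyError.
try:
    data["Label"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Strip the whitespace from every column name, then the lookup works.
data.columns = data.columns.str.strip()
labels = data["Label"].tolist()
```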
If you have a DataFrame and would like to select specific rows or columns from it, you can use square brackets.
To select a single column from the data DataFrame in your question:
data["Label"]
If you do not know the column names, you can get the list of columns and then display a column by its position:
columns = data.columns.values.tolist()
data[columns[index]]  # index is the integer position of the column you want

Money Flow Index keyerror

I've obtained the historical values for an example stock (Apple in this case) and was following an example I saw online. However, while their code succeeded, mine failed with a KeyError.
Could anyone tell/show me what's wrong and hopefully how to fix it? Error is:
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-60-d89407b24e87> in <module>
3
4 for i in range(1, len(typical_price)):
----> 5 if typical_price[i] > typical_price[i-1]:
6 positive_flow.append(money_flow[i-1])
7 negative_flow.append(0)
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1
Code is:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
import pandas_datareader as pdr
stocks = ['AAPL']
data_close = pdr.get_data_yahoo(stocks, start='2020-01-01')['Close']
data_high = pdr.get_data_yahoo(stocks, start='2020-01-01')['High']
data_low = pdr.get_data_yahoo(stocks, start='2020-01-01')['Low']
data_volume = pdr.get_data_yahoo(stocks, start='2020-01-01')['Volume']
typical_price = (data_close + data_high + data_low) / 3
money_flow = typical_price * data_volume
positive_flow = []
negative_flow = []
for i in range(1, len(typical_price)):
    if typical_price[i] > typical_price[i-1]:
        positive_flow.append(money_flow[i-1])
        negative_flow.append(0)
    elif typical_price[i] < typical_price[i-1]:
        positive_flow.append(0)
        negative_flow.append(money_flow[i-1])
    else:
        positive_flow.append(0)
        negative_flow.append(0)
Error appears when I run the final part of the code where I try to retrieve the positive and negative moneyflow for my MFI algorithm.
Use iloc indexing:
for i in range(1, len(typical_price)):
    if typical_price.iloc[i].item() > typical_price.iloc[i-1].item():
        positive_flow.append(money_flow.iloc[i-1])
        negative_flow.append(0)
    elif typical_price.iloc[i].item() < typical_price.iloc[i-1].item():
        positive_flow.append(0)
        negative_flow.append(money_flow.iloc[i-1])
    else:
        positive_flow.append(0)
        negative_flow.append(0)
typical_price and money_flow are DataFrames (the traceback goes through frame.py __getitem__), so typical_price[i] looks for a column labelled i, and their index is probably datetime rather than integers. If you want to access a row by integer location, you can use iloc.
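The sketch below reproduces the situation with made-up prices instead of downloaded data, assuming a single-column frame with a datetime index like the one the question's code would produce for ['AAPL']. It first shows the KeyError from label-based lookup, then runs the loop on plain Series with iloc so every appended value is a scalar.

```python
import pandas as pd

# Made-up prices on a datetime index; the single "AAPL" column mirrors
# what selecting ['Close'] for a list of tickers would return.
idx = pd.date_range("2020-01-01", periods=4, freq="D")
typical_price = pd.DataFrame({"AAPL": [10.0, 11.0, 10.5, 10.5]}, index=idx)
money_flow = typical_price * 100

# On a DataFrame, [1] means "the column named 1", which does not exist.
try:
    typical_price[1]
    raised = False
except KeyError:
    raised = True

# Work on the one column as a Series and index by position with iloc.
prices = typical_price["AAPL"]
flows = money_flow["AAPL"]
positive_flow, negative_flow = [], []
for i in range(1, len(prices)):
    if prices.iloc[i] > prices.iloc[i - 1]:
        positive_flow.append(flows.iloc[i - 1])
        negative_flow.append(0)
    elif prices.iloc[i] < prices.iloc[i - 1]:
        positive_flow.append(0)
        negative_flow.append(flows.iloc[i - 1])
    else:
        positive_flow.append(0)
        negative_flow.append(0)
```

Selecting the column first is a design choice for the sketch: it keeps the comparisons scalar without needing .item() on every row.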

Product Classification loading image error in python

Recently I was working on a product classification project. I have a pre-classified dataset 'train' with 41 folders, one for each product category, and a CSV file listing each image name and its category.
I also have a 'test' dataset with a bunch of unclassified products. The goal of the project is to classify those pictures and output a CSV file with "name" and "category".
I am using Google Colab for this project. After successfully loading and mounting all the files, I tried to read the training images and got an error. Below is my code:
train_image = []
for i in tqdm(range(train.shape[0])):
    img = image.load_img('content/train/train/'+train[i].astype('str')+'.jpg', target_size=(28,28,1))
    img = image.img_to_array(img)
    img = img/255
    train_image.append(img)
X = np.array(train_image)
And this is the error I get:
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
This is what my training directory looks like in Google Colab:
--content
↳train
↳train
↳0
↳1
....
↳41
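What should I do to eliminate that error?
OK, I got it: the problem was the format of the path passed to load_img. The path has to be built from the category folder and the file name of each row: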
img = image.load_img(r'train/train/' + str(train_df['category'][i]) + '/' + train_df['filename'][i], target_size=(28,28))
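The path construction can be checked on its own before any image is loaded. The sketch below is only an illustration: the two rows are invented, and base_dir is a hypothetical location matching the train/train layout described in the question. It reads the filename and category columns row by row instead of indexing the DataFrame with an integer, which is what raised KeyError: 0.

```python
import os
import pandas as pd

# Invented rows; the column names follow the accepted fix above.
train_df = pd.DataFrame({"filename": ["a1.jpg", "b7.jpg"], "category": [0, 41]})

# Hypothetical base directory; adjust to where the data is mounted.
base_dir = os.path.join("train", "train")

# One path per row: <base_dir>/<category>/<filename>
image_paths = [
    os.path.join(base_dir, str(row.category), row.filename)
    for row in train_df.itertuples(index=False)
]
print(image_paths)
```

Each entry of image_paths could then be handed to the image loader in place of the concatenated string.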

KeyError altering columns in dataframe

I'm trying to change or otherwise work with columns and I'm getting a KeyError. I'm working on a Chicago crime data analysis.
For example, when I try to run
ds["DATE OF OCCURRENCE"] = pd.to_datetime([ds["DATE OF OCCURRENCE"]], format="%m/%d/%Y %I:%M:%S %p")
KeyError
Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
Complete code:
import pandas as pd
url="https://data.cityofchicago.org/api/views/x2n5-8w5q/rows.csv?accessType=DOWNLOAD"
df= pd.read_csv(url)
ds = df.copy()
ds["DATE OF OCCURRENCE"] = pd.to_datetime([ds["DATE OF OCCURRENCE"]], format="%m/%d/%Y %I:%M:%S %p")
This is the Error:
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'DATE OF OCCURRENCE'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'DATE OF OCCURRENCE'
Your column has been renamed, so you need Crime_Date. Also select the column with only one pair of [] so that to_datetime receives a Series:
ds["Crime_Date"] = pd.to_datetime(ds["Crime_Date"], format="%m/%d/%Y %I:%M:%S %p")
EDIT:
There are some extra spaces in the column name, so it has to be written exactly as it appears in ds.columns:
ds["DATE OF OCCURRENCE"] = pd.to_datetime(ds["DATE OF OCCURRENCE"], format="%m/%d/%Y %I:%M:%S %p")
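One way to avoid typing the exact spacing is to normalise the header names first. The following is a minimal sketch under that assumption, using two invented date strings and a header with a doubled space; it does not download the Chicago data set.

```python
import pandas as pd

# Invented rows; the doubled space in the header reproduces the mismatch.
ds = pd.DataFrame({
    "DATE  OF OCCURRENCE": ["01/15/2020 08:30:00 PM", "02/01/2020 11:05:00 AM"],
})

# Collapse runs of whitespace to one space and trim both ends of every name.
ds.columns = ds.columns.str.replace(r"\s+", " ", regex=True).str.strip()

# A single pair of [] passes a Series, which to_datetime parses element-wise.
ds["DATE OF OCCURRENCE"] = pd.to_datetime(
    ds["DATE OF OCCURRENCE"], format="%m/%d/%Y %I:%M:%S %p"
)
print(ds.dtypes)
```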

Unexpected Python KeyError

I have loaded a CSV file into a Pandas dataframe:
import pandas as pd
Name ID Sex M_Status DaysOff
Joe 3 M S 1
NaN NaN NaN NaN 2
NaN NaN NaN NaN 3
df = pd.read_csv('People.csv')
This data will then be loaded into an HTML file.
test = """
HTML code
"""
Now for preparing data for the HTML file:
df1 = df.filter(['Name','ID','Sex','M_Status','DaysOff'])
file = ""
for i, rows in df1.iterrows():
    name = (df1['Name'][i])
    id = (df1['ID'][i])
    sex = (df1['Sex'][i])
    m_status = (df1['M_Status'][i])
    days_off = (df1['DaysOff'][i])
    with open(f"personInfo{i}.html", "w") as file:
        file.write(test.format(name,id,sex,m_status,days_off))
    file.close()
And the error:
KeyError: 'days_off'
Note: This error occurs within the for loop.
Can anyone see where I'm going wrong? This error is generated when you try to grab data from a column whose name doesn't match, or if the column doesn't have that header name. However, it does.
Error Information:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'days_off'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-16-35e6b916521b> in <module>
1 name = (df1['Name'][i])
2 id = (df1['ID'][i])
3 sex = (df1['Sex'][i])
4 m_status = (df1['M_Status'][i])
----> 5 days_off = (df1['DaysOff'][i])
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2925 if self.columns.nlevels > 1:
2926 return self._getitem_multilevel(key)
-> 2927 indexer = self.columns.get_loc(key)
2928 if is_integer(indexer):
2929 indexer = [indexer]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'days_off'
Just a hunch, but your error message suggests that you are trying to access your dataframe column with the key days_off, when it should be DaysOff. I don't see any place in the code you provided where this happens, but I would double-check your source code file to make sure that you are using the right key name.
I've resolved this and what a stupid error it was!
Basically there was a space at the end of the header name.
What Python wanted/was expecting:
days_off = (df1['DaysOff '][i])
whereas I was giving it:
days_off = (df1['DaysOff'][i])
Very stupid human error. Thanks to all that looked into it though
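For anyone hitting the same thing, here is a small sketch of the trailing-space case read from CSV text held in memory. The CSV content is invented and stands in for People.csv. It cleans the header once with rename and then reads the values from each row yielded by iterrows, rather than indexing the frame again inside the loop.

```python
import io
import pandas as pd

# Invented CSV text; note the space after "DaysOff" in the header row.
csv_text = "Name,ID,Sex,M_Status,DaysOff \nJoe,3,M,S,1\nAnn,4,F,M,2\n"
df = pd.read_csv(io.StringIO(csv_text))
had_trailing_space = "DaysOff " in df.columns

# Strip whitespace from every header so 'DaysOff' matches exactly.
df = df.rename(columns=str.strip)

# iterrows yields (index, row); read the fields straight from the row.
rows = [(row["Name"], int(row["DaysOff"])) for _, row in df.iterrows()]
print(rows)
```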
