KeyError(Key) when using append with defaultdict - python

I am getting the following error when I try to append to a dictionary created with defaultdict(list). From my understanding, defaultdict is supposed to prevent a KeyError.
raise KeyError(key) from err
KeyError: 'id'
The following is my code:
weather_data = defaultdict(list)
m = len(_ids)
date = str(date.today())
i = 0
while i < m:
    url = ("https://api.openweathermap.org/data/2.5/weather?id=%s&units=%s&appid=%s") % (
        _ids.loc[i], 'imperial', weather_key)
    payload = r.get(url).json()
    payload_from_json = pd.json_normalize(payload)
    weather_data[date].append(date)
    weather_data['id'].append(payload_from_json['id'])
    weather_data['weather'].append(payload_from_json['weather'])
    weather_data['base'].append(payload_from_json['base'])
    weather_data['visibility'].append(payload_from_json['visibility'])
    weather_data['dt'].append(payload_from_json['dt'])
    weather_data['name'].append(payload_from_json['name'])
    weather_data['cod'].append(payload_from_json['cod'])
    weather_data['coord.lon'].append(payload_from_json['coord.lon'])
    weather_data['coord.lat'].append(payload_from_json['coord.lat'])
    weather_data['main.temp'].append(payload_from_json['main.temp'])
    weather_data['main.feels_like'].append(payload_from_json['main.feels_like'])
    weather_data['main.temp_min'].append(payload_from_json['main.temp_min'])
    weather_data['main.temp_max'].append(payload_from_json['main.temp_max'])
    weather_data['main.pressure'].append(payload_from_json['main.pressure'])
    weather_data['main.humidity'].append(payload_from_json['main.humidity'])
    weather_data['wind.speed'].append(payload_from_json['wind.speed'])
    weather_data['wind.deg'].append(payload_from_json['wind.deg'])
    weather_data['clouds.all'].append(payload_from_json['clouds.all'])
    weather_data['sys.type'].append(payload_from_json['sys.type'])
    weather_data['sys.id'].append(payload_from_json['sys.id'])
    weather_data['sys.country'].append(payload_from_json['sys.country'])
    weather_data['sys.sunrise'].append(payload_from_json['sys.sunrise'])
    weather_data['sys.sunset'].append(payload_from_json['sys.sunset'])
    i = i + 1
print(weather_data)
Here is the traceback error - can someone tell me how to interpret this:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'coord.lon'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 40, in <module>
weather_data['coord.lon'].append(payload_from_json['coord.lon'])
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2902, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err

[EDIT]
Your weather_data is a defaultdict, but payload_from_json is not: it is the pandas DataFrame returned by pd.json_normalize. So the error was raised by payload_from_json, not by weather_data.
You can fix this by using get to access the key:
weather_data['id'].append(payload_from_json.get('id'))
If you don't want to include junk data, you can add a check before appending:
if payload_from_json.get('id') is not None:
    weather_data['id'].append(payload_from_json.get('id'))
Also, you can add some default value like this:
weather_data['id'].append(payload_from_json.get('id', 'missing'))
or
weather_data['id'].append(payload_from_json.get('id', ''))
or with None, which is already the default:
weather_data['id'].append(payload_from_json.get('id', None))
In your specific problem, this should work:
weather_data = defaultdict(list)
m = len(_ids)
date = str(date.today())
i = 0
while i < m:
    url = ("https://api.openweathermap.org/data/2.5/weather?id=%s&units=%s&appid=%s") % (
        _ids.loc[i], 'imperial', weather_key)
    payload = r.get(url).json()
    payload_from_json = pd.json_normalize(payload)
    weather_data[date].append(date)
    weather_data['id'].append(payload_from_json.get('id'))
    weather_data['weather'].append(payload_from_json.get('weather'))
    weather_data['base'].append(payload_from_json.get('base'))
    weather_data['visibility'].append(payload_from_json.get('visibility'))
    weather_data['dt'].append(payload_from_json.get('dt'))
    weather_data['name'].append(payload_from_json.get('name'))
    weather_data['cod'].append(payload_from_json.get('cod'))
    weather_data['coord.lon'].append(payload_from_json.get('coord.lon'))
    weather_data['coord.lat'].append(payload_from_json.get('coord.lat'))
    weather_data['main.temp'].append(payload_from_json.get('main.temp'))
    weather_data['main.feels_like'].append(payload_from_json.get('main.feels_like'))
    weather_data['main.temp_min'].append(payload_from_json.get('main.temp_min'))
    weather_data['main.temp_max'].append(payload_from_json.get('main.temp_max'))
    weather_data['main.pressure'].append(payload_from_json.get('main.pressure'))
    weather_data['main.humidity'].append(payload_from_json.get('main.humidity'))
    weather_data['wind.speed'].append(payload_from_json.get('wind.speed'))
    weather_data['wind.deg'].append(payload_from_json.get('wind.deg'))
    weather_data['clouds.all'].append(payload_from_json.get('clouds.all'))
    weather_data['sys.type'].append(payload_from_json.get('sys.type'))
    weather_data['sys.id'].append(payload_from_json.get('sys.id'))
    weather_data['sys.country'].append(payload_from_json.get('sys.country'))
    weather_data['sys.sunrise'].append(payload_from_json.get('sys.sunrise'))
    weather_data['sys.sunset'].append(payload_from_json.get('sys.sunset'))
    i += 1
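To see the difference between the two lookups in isolation, here is a minimal sketch. The payload is a made-up error response (an assumption, not output from the real API), and the `wanted` list is only for illustration. It stores the scalar value rather than the whole column, which may or may not be what you want.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical payload: an error response that has no "id" or "coord" fields.
payload = {"cod": "404", "message": "city not found"}
payload_from_json = pd.json_normalize(payload)

weather_data = defaultdict(list)

# Illustrative subset of columns; missing ones become None instead of raising.
wanted = ["id", "cod", "coord.lon"]
for column in wanted:
    series = payload_from_json.get(column)  # DataFrame.get returns None if the column is absent
    value = series.iloc[0] if series is not None else None
    weather_data[column].append(value)

# Square-bracket access on the DataFrame is what raises the KeyError.
try:
    payload_from_json["coord.lon"]
    raised = False
except KeyError:
    raised = True

print(dict(weather_data))
print("KeyError from DataFrame lookup:", raised)
```

The defaultdict never raises here; only the DataFrame lookup does, which matches the frame.py `__getitem__` frame in the traceback.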

Related

iterate a dataframe

I'm trying to iterate over a dataframe to run MongoDB queries from a list and save each query result in a CSV file. The connection works with no errors, but when I iterate it only creates the first file (0.csv) and I get an error for the second row of the dataframe.
This is my code:
sql = [
    ('tran','transactions',{"den": "00100002773060"}),
    ('tran','Data',{'name': 'john'}),
]
df = pd.DataFrame(sql, columns = ["database", "entity", "sql"])
for i in range(len(df)):
    database = df.iloc[i]["database"]
    entity=df.iloc[i]["entity"]
    myquery=df.iloc[i]["sql"]
    collection = client[database][entity]
    try:
        mydoc = list(collection.find(myquery))
        if len(mydoc) > 0:
            df = pd.DataFrame(mydoc)
            df.pop("_id")
            df.to_csv(str(i) + '.csv')
            print("file saved")
    except:
        print("error on file")
and this is the error:
Traceback (most recent call last):
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'database'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "getSql.py", line 12, in <module>
database = df.iloc[i]["database"]
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/series.py", line 958, in __getitem__
return self._get_value(key)
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
raise KeyError(key) from err
KeyError: 'database'
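From what I can see, you are reassigning your df variable inside the loop here:
df = pd.DataFrame(mydoc)
so on the second iteration df no longer has a "database" column. Renaming the inner variable should fix it.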
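A minimal sketch of the rename, without a real MongoDB connection: `fake_find` is a hypothetical stand-in for `collection.find`, and `result_df` and `results` are names chosen only for this example.

```python
import pandas as pd

queries = [
    ("tran", "transactions", {"den": "00100002773060"}),
    ("tran", "Data", {"name": "john"}),
]
df = pd.DataFrame(queries, columns=["database", "entity", "sql"])


def fake_find(query):
    """Hypothetical stand-in for collection.find(); returns documents as dicts."""
    return [{"_id": 1, **query}]


results = {}
for i in range(len(df)):
    row = df.iloc[i]  # df is never reassigned, so this keeps working on every pass
    docs = fake_find(row["sql"])
    if docs:
        # A separate name for the query result leaves the driving DataFrame intact.
        result_df = pd.DataFrame(docs).drop(columns="_id")
        results[i] = result_df

print(sorted(results))
```

In the real script, `result_df.to_csv(str(i) + '.csv')` would go where the result is stored in `results`.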

saving coordinates from Dataframe as Polygons (shapely.geometry) AttributeError

I want to create a Polygon from a list of coordinates:
import pandas as pd
from shapely.geometry import Point, Polygon
data = pd.read_csv('path.csv', sep=';')
the data is in the following format
Suburb  features_geometry_x  features_geometry_y
1       50.941840            6.9595637
1       50.941845            6.9595698
3       50.94182             6.9595632
4       50.9418837           6.9595958
with several rows for suburb 1, 3 and 4
#create a polygon
I = data.loc[data['Suburb'] == 1]
I['coordinates'] = list(zip(I['features_geometry_x'], I['features_geometry_y']))
poly_i = Polygon(I['coordinates'])
the code above works fine but if I do the same thing for suburb 3 and 4 it yields the following error:
L = data.loc[data['Suburb'] == 3]
L['coordinates'] = list(zip(L['features_geometry_x'], L['features_geometry_y']))
poly_l = Polygon(L['coordinates'])
File "shapely/speedups/_speedups.pyx", line 252, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute '__array_interface__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/folders/j6/wgg72kmx145f3krf14nzjfq40000gn/T/ipykernel_4092/214655495.py", line 3, in
poly_l = Polygon(Lindenthal['coordinates'])
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 261, in __init__
ret = geos_polygon_from_py(shell, holes)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 539, in geos_polygon_from_py
ret = geos_linearring_from_py(shell)
File "shapely/speedups/_speedups.pyx", line 344, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 942, in __getitem__
return self._get_value(key)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
Please help :)
I think the issue here is that you need more than one data point to create a polygon (at least three, in fact), whereas your suburbs 3 and 4 each have only a single point.
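A possible additional factor, judging from the KeyError: 0 at the end of the traceback: a filtered Series keeps its original row labels, so a lookup of label 0 fails when the subset does not start at row 0. The sketch below uses made-up coordinates (not the asker's file) and only pandas. Converting to a plain list before passing the values to Polygon is an assumption about a workaround, not something confirmed in this thread.

```python
import pandas as pd

# Made-up coordinates for illustration; suburb 3 occupies row labels 3..5.
data = pd.DataFrame({
    "Suburb": [1, 1, 1, 3, 3, 3],
    "features_geometry_x": [50.0, 50.0, 50.1, 51.0, 51.0, 51.1],
    "features_geometry_y": [6.0, 6.1, 6.0, 7.0, 7.1, 7.0],
})

subset = data.loc[data["Suburb"] == 3].copy()
subset["coordinates"] = list(
    zip(subset["features_geometry_x"], subset["features_geometry_y"])
)

coords_series = subset["coordinates"]

# The Series kept labels 3, 4, 5, so a lookup by label 0 raises KeyError.
try:
    coords_series[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# A plain list is indexed by position, so element 0 is always the first point.
coords_list = coords_series.tolist()
first_point = coords_list[0]

print(label_lookup_failed, first_point)
```

With a list in hand, something like `Polygon(coords_list)` would no longer depend on the row labels, provided there are enough points to form a polygon.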

Pandas To_Excel parsing problem - outputting only 1 file

Hello I have working code like this:
import pandas as pd
from pandas.io.json import json_normalize
import json
import warnings
warnings.filterwarnings('ignore')
with open('yieldfull.json') as file:
    data = json.load(file)
df_json = json_normalize(data)
df_json_stripped = data[0]
platform_dict = df_json_stripped['result']
platform_names = []
for key in platform_dict:
    platform_names.append(key)
for name in platform_names:
    if name == 'Autofarm':
        vault_name_df = json_normalize(pd.DataFrame(dict([(k , pd.Series(v)) for k,v in df_json['result.'+name+'.LPVaults.vaults'].items()]))[0])['name']
        current_token_0 = json_normalize(pd.DataFrame(dict([(k , pd.Series(v)) for k,v in df_json['result.'+name+'.LPVaults.vaults'].items()]))[0])['LPInfo.currentToken0']
        current_token_1 = json_normalize(pd.DataFrame(dict([(k , pd.Series(v)) for k,v in df_json['result.'+name+'.LPVaults.vaults'].items()]))[0])['LPInfo.currentToken1']
        df_json = pd.DataFrame({'Vault_Name':vault_name_df, 'Current_Token_0':current_token_0 , 'Current_Token_1':current_token_1})
        df_json.to_excel('Output_'+name+'.xlsx', index = False)
        platform_names.remove(name)
    elif name == 'Acryptos':
        vault_name_df = json_normalize(pd.DataFrame(dict([(k , pd.Series(v)) for k,v in df_json['result.'+name+'.vaults.vaults'].items()]))[0])['name']
        price_USD = json_normalize(pd.DataFrame(dict([(k , pd.Series(v)) for k,v in df_json['result.'+name+'.vaults.vaults'].items()]))[0])['priceInUSDDepositToken']
        current_token_0 = json_normalize(pd.DataFrame(dict([(k , pd.Series(v)) for k,v in df_json['result.'+name+'.vaults.vaults'].items()]))[0])['currentTokens']
        deposited_token = json_normalize(pd.DataFrame(dict([(k, pd.Series(v)) for k,v in df_json['result.'+name+'.vaults.vaults'].items()]))[0])['depositedTokens']
        df_json = pd.DataFrame({'Vault_Name':vault_name_df, 'Price_USD':price_USD, 'Current_Token_0':current_token_0, 'Deposited_Token':deposited_token})
        df_json.to_excel('Output_'+name+'.xlsx', index = False)
    else:
        pass
Problem is: If I leave it like this it only outputs for first if. When I comment out that if section it will successfully output elif, but I can't get it to output 2 files whatever I do. Any ideas?
Error I'm getting for Acryptos:
Traceback (most recent call last):
File "C:\Users\Adam\PycharmProjects\Scrapy_Things\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'result.Acryptos.vaults.vaults'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/Adam/PycharmProjects/Scrapy_Things/yieldwatch/yieldwatch/spiders/JsonExcel.py", line 27, in <module>
vault_name_df = json_normalize(pd.DataFrame(dict([(k , pd.Series(v)) for k,v in df_json['result.'+name+'.vaults.vaults'].items()]))[0])['name']
File "C:\Users\Adam\PycharmProjects\Scrapy_Things\venv\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\Adam\PycharmProjects\Scrapy_Things\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'result.Acryptos.vaults.vaults'
But if I comment out Autofarm and only process the branch for Acryptos, it outputs the Excel file just fine.
Please remove the following line from your code, because removing items from a list while iterating over it causes the loop to skip elements:
platform_names.remove(name)
Debug code:
platform_names=['Autofarm','Acryptos']
for name in platform_names:
    if name == 'Autofarm':
        print("Autofarm")
        #platform_names.remove(name) # remove this line
    elif name == "Acryptos":
        print("Acryptos")
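To illustrate why removing during iteration is risky, here is a small pure-Python sketch. The third platform name "Other" is made up for the example.

```python
# Mutating the list while looping over it: the element after the removed one is skipped.
platform_names = ["Autofarm", "Acryptos", "Other"]
visited = []
for name in platform_names:
    visited.append(name)
    if name == "Autofarm":
        platform_names.remove(name)

# Looping over a copy keeps every element in the iteration.
platform_names_safe = ["Autofarm", "Acryptos", "Other"]
visited_safe = []
for name in list(platform_names_safe):
    visited_safe.append(name)
    if name == "Autofarm":
        platform_names_safe.remove(name)

print(visited)       # "Acryptos" is missing here
print(visited_safe)  # all three names appear
```

If the list really has to shrink during the loop, iterating over a copy as in the second half is one common way to do it.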
You initially created
df_json = json_normalize(data)
and then, inside the loop, you overwrite it:
df_json = pd.DataFrame({'Vault_Name':vault_name_df, 'Current_Token_0':current_token_0 , 'Current_Token_1':current_token_1})
df_json.to_excel('Output_'+name+'.xlsx', index = False)
After the Autofarm branch runs, df_json no longer contains the 'result.Acryptos.vaults.vaults' column, so use a different variable name inside the loop and it will be okay.

Problem accessing pandas data that is represented with commas?

I have the following lines:
data = pd.read_csv("file.csv", sep=";", encoding='ISO-8859-1', engine = 'python')
test = str(data['information'])
I'm trying to access a CSV column that contains data in a cell like so: "1000,10500,2500"
I get an error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Vastuualue'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/erik.ilonen/Desktop/Projekti_csv_data/Toinen_testiohjelma/toinen_datan_kasittely_ohjelma.py", line 12, in <module>
test = str(dataAlkuperainen['Vastuualue'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'information'
Your separator is not right.
sep should be a comma, not a semicolon, so use sep="," instead of sep=";".
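A quick way to check this kind of problem is to look at the column names pandas actually parsed. The sketch below uses a small in-memory CSV whose contents are made up for the example (the real file is not shown in the question).

```python
import io

import pandas as pd

# Hypothetical file contents: comma-delimited, with one quoted cell containing commas.
csv_text = 'name,information\nalice,"1000,10500,2500"\n'

# With the wrong separator the whole header line becomes a single column name.
wrong = pd.read_csv(io.StringIO(csv_text), sep=";")
# With the right separator the quoted cell stays together as one string.
right = pd.read_csv(io.StringIO(csv_text), sep=",")

print(list(wrong.columns))
print(list(right.columns))
print(right.loc[0, "information"])
```

Printing `data.columns` right after `read_csv` shows immediately whether the column you are asking for exists under that exact name.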

Key error message when calculating variables using pandas and yfinance

I'm trying to calculate some variables from yfinance data using the column df['Close'].
But I'm getting this error, which I have not seen before. Here is the code:
import os
import pandas as pd
import plotly.graph_objects as go
symbols = 'AAPL'
for filename in os.listdir('datasets/'):
    #print(filename)
    symbol = filename.split('.')[0]
    #print(symbol)
    df = pd.read_csv('datasets/{}'.format(filename))
    if df.empty:
        continue
    df['20_sma'] = df['Close'].rolling(window=20).mean()
    df['stddev'] = df['Close'].rolling(window=20).std()
    df['lowerband'] = df['20_sma'] - (2* df['stddev'])  # lower band is below the moving average
    df['upperband'] = df['20_sma'] + (2* df['stddev'])  # upper band is above the moving average
    if symbol in symbols:
        print(df)
and here is the error message:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Close'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/Kit/Documents/TTM_squeezer/squeeze.py", line 16, in <module>
df['20_sma'] = df['Close'].rolling(window=20).mean()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: 'Close'
It seems like the 'Close' column is causing this error, but I just can't figure out why.
Many thanks
It turns out there was an error in the process that saved the local file.
Case closed, thanks all.
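For files that may have been saved incorrectly, one option is to check for the expected column before computing anything. This is only a sketch: the function name `add_bollinger_bands` and the sample frames are made up for the example.

```python
import pandas as pd


def add_bollinger_bands(df, window=20, column="Close"):
    """Return a copy with band columns, or None if the expected column is missing."""
    if column not in df.columns:
        return None
    out = df.copy()
    out["20_sma"] = out[column].rolling(window=window).mean()
    out["stddev"] = out[column].rolling(window=window).std()
    out["upperband"] = out["20_sma"] + 2 * out["stddev"]
    out["lowerband"] = out["20_sma"] - 2 * out["stddev"]
    return out


# Hypothetical inputs: one well-formed frame and one from a badly saved file.
good = pd.DataFrame({"Close": [float(x) for x in range(1, 26)]})
bad = pd.DataFrame({"unexpected": [1, 2, 3]})

good_result = add_bollinger_bands(good)
bad_result = add_bollinger_bands(bad)

print(good_result.tail(1))
print("bad file skipped:", bad_result is None)
```

In the loop over `datasets/`, a `None` result could be used to `continue` past the broken file, ideally printing `df.columns` so the malformed header is visible.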
