iterate a dataframe

iterate a dataframe - python

I'm trying to iterate a dataframe to call queries in mongodb from a list and save each query in a csv file. I have the connection with no errors, but when I iterate it just creates the frist file (0.csv) and I have an error for the second row of the dataframe.
This is my code:
sql = [
('tran','transactions',{"den": "00100002773060"}),
('tran','Data',{'name': 'john'}),
]
df = pd.DataFrame(sql, columns = ["database", "entity", "sql"])
for i in range(len(df)):
database = df.iloc[i]["database"]
entity=df.iloc[i]["entity"]
myquery=df.iloc[i]["sql"]
collection = client[database][entity]
try:
mydoc = list(collection.find(myquery))
if len(mydoc) > 0:
df = pd.DataFrame(mydoc)
df.pop("_id")
df.to_csv(str(i) + '.csv')
print("file saved")
except:
print("error on file")
and this the error
Traceback (most recent call last):
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'database'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "getSql.py", line 12, in <module>
database = df.iloc[i]["database"]
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/series.py", line 958, in __getitem__
return self._get_value(key)
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
raise KeyError(key) from err
KeyError: 'database'

from what I can see here you are changing your df variable here
df = pd.DataFrame(mydoc)
probably just rename it

Related

saving coordinates from Dataframe as Polygons (shapely.geometry) AttributeError

I want to create a Polygon from a list of coordinates:
import pandas as pd
from shapely.geometry import Point, Polygon
data = pd.read_csv('path.csv', sep=';')
the data is in the following format
Suburb
features_geometry_x
features_geometry_y
1
50.941840
6.9595637
1
50.941845
6.9595698
3
50.94182
6.9595632
4
50.9418837
6.9595958
with several rows for suburb 1, 3 and 4
#create a polygon
I = data.loc[data['Suburb'] == 1]
I['coordinates'] = list(zip(I['features_geometry_x'], I['features_geometry_y']))
poly_i = Polygon(I['coordinates'])
the code above works fine but if I do the same thing for suburb 3 and 4 it yields the following error:
L = data.loc[data['Suburb'] == 3]
L['coordinates'] = list(zip(L['features_geometry_x'], L['features_geometry_y']))
poly_l = Polygon(L['coordinates'])
File "shapely/speedups/_speedups.pyx", line 252, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in getattr
return object.getattribute(self, name)
AttributeError: 'Series' object has no attribute 'array_interface'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/folders/j6/wgg72kmx145f3krf14nzjfq40000gn/T/ipykernel_4092/214655495.py", line 3, in
poly_l = Polygon(Lindenthal['coordinates'])
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 261, in init
ret = geos_polygon_from_py(shell, holes)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 539, in geos_polygon_from_py
ret = geos_linearring_from_py(shell)
File "shapely/speedups/_speedups.pyx", line 344, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 942, in getitem
return self._get_value(key)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
Please help :)

I think the issue here is that you need more than one data point to create a polygon where as your suburb 2 and 3 each got only a single point.

Problem accessing pandas data that is represented with commas?

I have line as follows:
data = pd.read_csv("file.csv", sep=";", encoding='ISO-8859-1', engine = 'python')
test = str(data['information'])
I'm trying to access csv column that contains data in a cell like so: "1000,10500,2500"
I get an error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Vastuualue'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/erik.ilonen/Desktop/Projekti_csv_data/Toinen_testiohjelma/toinen_datan_kasittely_ohjelma.py", line 12, in <module>
test = str(dataAlkuperainen['Vastuualue'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'information'

Your separator is not right.
sep should be comma not semicolon, so use sep="," instead of sep=";".

KeyError(Key) when using append with defaultdict

I am getting the following error when I am trying to append to a dictionary using defaultdict(list). From my understanding, defaultdict is suppose to prevent a keyerror.
raise KeyError(key) from err
KeyError: 'id'
The following is my code:
weather_data = defaultdict(list)
m = len(_ids)
date = str(date.today())
i = 0
while i < m:
url = ("https://api.openweathermap.org/data/2.5/weather?id=%s&units=%s&appid=%s") %
(_ids.loc[i], 'imperial', weather_key)
payload = r.get(url).json()
payload_from_json = pd.json_normalize(payload)
weather_data[date].append(date)
weather_data['id'].append(payload_from_json['id'])
weather_data['weather'].append(payload_from_json['weather'])
weather_data['base'].append(payload_from_json['base'])
weather_data['visibility'].append(payload_from_json['visibility'])
weather_data['dt'].append(payload_from_json['dt'])
weather_data['name'].append(payload_from_json['name'])
weather_data['cod'].append(payload_from_json['cod'])
weather_data['coord.lon'].append(payload_from_json['coord.lon'])
weather_data['coord.lat'].append(payload_from_json['coord.lat'])
weather_data['main.temp'].append(payload_from_json['main.temp'])
weather_data['main.feels_like'].append(payload_from_json['main.feels_like'])
weather_data['main.temp_min'].append(payload_from_json['main.temp_min'])
weather_data['main.temp_max'].append(payload_from_json['main.temp_max'])
weather_data['main.pressure'].append(payload_from_json['main.pressure'])
weather_data['main.humidity'].append(payload_from_json['main.humidity'])
weather_data['wind.speed'].append(payload_from_json['wind.speed'])
weather_data['wind.deg'].append(payload_from_json['wind.deg'])
weather_data['clouds.all'].append(payload_from_json['clouds.all'])
weather_data['sys.type'].append(payload_from_json['sys.type'])
weather_data['sys.id'].append(payload_from_json['sys.id'])
weather_data['sys.country'].append(payload_from_json['sys.country'])
weather_data['sys.sunrise'].append(payload_from_json['sys.sunrise'])
weather_data['sys.sunset'].append(payload_from_json['sys.sunset'])
i = i + 1
print(weather_data)
Here is the traceback error - can someone tell me how to interpret this:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'coord.lon'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 40, in <module>
weather_data['coord.lon'].append(payload_from_json['coord.lon'])
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2902, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err

[EDIT]
your weather_data is your default dict, but payload_from_json not. So your error was raised by payload_from_json.
You can fix this by using get to access the key:
weather_data['id'].append(payload_from_json.get('id'))
if you dont want to include junk data, you can add some verifications before append:
if payload_from_json.get('id') is not None:
weather_data['id'].append(payload_from_json.get('id'))
Also, you can add some default value like this:
weather_data['id'].append(payload_from_json.get('id', 'missing'))
or
weather_data['id'].append(payload_from_json.get('id', ''))
or by default:
weather_data['id'].append(payload_from_json.get('id', None))
In your specific problem, this should work:
weather_data = defaultdict(list)
m = len(_ids)
date = str(date.today())
i = 0
while i < m:
url = ("https://api.openweathermap.org/data/2.5/weather?id=%s&units=%s&appid=%s") %
(_ids.loc[i], 'imperial', weather_key)
payload = r.get(url).json()
payload_from_json = pd.json_normalize(payload)
weather_data[date].append(date)
weather_data['id'].append(payload_from_json.get('id'))
weather_data['weather'].append(payload_from_json.get('weather'))
weather_data['base'].append(payload_from_json.get('base'))
weather_data['visibility'].append(payload_from_json.get('visibility'))
weather_data['dt'].append(payload_from_json.get('dt'))
weather_data['name'].append(payload_from_json.get('name'))
weather_data['cod'].append(payload_from_json.get('cod'))
weather_data['coord.lon'].append(payload_from_json.get('coord.lon'))
weather_data['coord.lat'].append(payload_from_json.get('coord.lat'))
weather_data['main.temp'].append(payload_from_json.get('main.temp'))
weather_data['main.feels_like'].append(payload_from_json.get('main.feels_like'))
weather_data['main.temp_min'].append(payload_from_json.get('main.temp_min'))
weather_data['main.temp_max'].append(payload_from_json.get('main.temp_max'))
weather_data['main.pressure'].append(payload_from_json.get('main.pressure'))
weather_data['main.humidity'].append(payload_from_json.get('main.humidity'))
weather_data['wind.speed'].append(payload_from_json.get('wind.speed'))
weather_data['wind.deg'].append(payload_from_json.get('wind.deg'))
weather_data['clouds.all'].append(payload_from_json.get('clouds.all'))
weather_data['sys.type'].append(payload_from_json.get('sys.type'))
weather_data['sys.id'].append(payload_from_json.get('sys.id'))
weather_data['sys.country'].append(payload_from_json.get('sys.country'))
weather_data['sys.sunrise'].append(payload_from_json.get('sys.sunrise'))
weather_data['sys.sunset'].append(payload_from_json.get('sys.sunset'))
i += 1

Key error message when calculating variables using pandas and yfinance

trying to calculate some variables from yfinance from the column df['Close'].
But im getting this error which i have not seen before. and heres are the code:
import os
import pandas as pd
import plotly.graph_objects as go
symbols = 'AAPL'
for filename in os.listdir('datasets/'):
#print(filename)
symbol = filename.split('.')[0]
#print(symbol)
df = pd.read_csv('datasets/{}'.format(filename))
if df.empty:
continue
df['20_sma'] = df['Close'].rolling(window=20).mean()
df['stddev'] = df['Close'].rolling(window=20).std()
df['lowerband'] = df['20_sma'] + (2* df['stddev'])
df['upperband'] = df['20_sma'] - (2* df['stddev'])
if symbol in symbols:
print(df)
and heres are the error message:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Close'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/Kit/Documents/TTM_squeezer/squeeze.py", line 16, in <module>
df['20_sma'] = df['Close'].rolling(window=20).mean()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: 'Close'
Seems like the 'Close' column has contributed to this error but i just cant figure out why?
Many thanks

turns out there was an error in the process where the local file was saved
case closed, thanks all

Reading a file with pandas and use correlation coefficients on two columns

I have a file like following with no header
0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
I want to read this file using pandas dataframe and want to perform correlation coefficient between the second and the third column.
I am trying like following:
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', header=None)
df['1'].corr(df['2'])
It pops up with a huge error message. Am I not treating the columns properly? Any suggestion or hint?
Error message
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3063, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2685, in __getitem__
return self._getitem_column(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2692, in _getitem_column
return self._get_item_cache(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
values = self._data.get(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3065, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'

You will have to specify separator which is space while reading file. Then use position to access the columns. Below code should work.
df = pd.read_csv('test.txt', sep=' ', header=None)
df[1].corr(df[2])

Roy what is the file extension? is it .csv ? if it is you should add it to the end of fileName like pd.read_csv('COLVAR_hbondnohead.csv', header=None)

You don't have columns named 1 and 2, So, you have to create those columns first.
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', header=None)
df1 = df.reindex(columns=['1','2', '3'])
then
df1['2'].corr(df1['3'])

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

iterate a dataframe - python

from what I can see here you are changing your df variable here df = pd.DataFrame(mydoc) probably just rename it

Related

saving coordinates from Dataframe as Polygons (shapely.geometry) AttributeError

Problem accessing pandas data that is represented with commas?

KeyError(Key) when using append with defaultdict

Key error message when calculating variables using pandas and yfinance

Reading a file with pandas and use correlation coefficients on two columns

Categories

Resources