saving coordinates from Dataframe as Polygons (shapely.geometry) AttributeError - python

I want to create a Polygon from a list of coordinates:
import pandas as pd
from shapely.geometry import Point, Polygon
data = pd.read_csv('path.csv', sep=';')
The data is in the following format:
Suburb  features_geometry_x  features_geometry_y
1       50.941840            6.9595637
1       50.941845            6.9595698
3       50.94182             6.9595632
4       50.9418837           6.9595958
with several rows for each of suburbs 1, 3 and 4.
#create a polygon
I = data.loc[data['Suburb'] == 1]
I['coordinates'] = list(zip(I['features_geometry_x'], I['features_geometry_y']))
poly_i = Polygon(I['coordinates'])
the code above works fine but if I do the same thing for suburb 3 and 4 it yields the following error:
L = data.loc[data['Suburb'] == 3]
L['coordinates'] = list(zip(L['features_geometry_x'], L['features_geometry_y']))
poly_l = Polygon(L['coordinates'])
File "shapely/speedups/_speedups.pyx", line 252, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute '__array_interface__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/folders/j6/wgg72kmx145f3krf14nzjfq40000gn/T/ipykernel_4092/214655495.py", line 3, in <module>
poly_l = Polygon(Lindenthal['coordinates'])
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 261, in __init__
ret = geos_polygon_from_py(shell, holes)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 539, in geos_polygon_from_py
ret = geos_linearring_from_py(shell)
File "shapely/speedups/_speedups.pyx", line 344, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 942, in __getitem__
return self._get_value(key)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
Please help :)

I think the issue here is that you need at least three points to create a polygon, whereas in the sample shown suburbs 3 and 4 each have only a single point.
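A minimal sketch of how to check this, using only the four sample rows from the question. The helper name ring_coordinates is hypothetical. It also illustrates a second likely factor suggested by the KeyError: 0 in the traceback: a filtered Series keeps its original index labels, so looking up label 0 fails for any suburb whose rows do not start at row 0. Converting to a plain list of tuples sidesteps that lookup.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "Suburb": [1, 1, 3, 4],
    "features_geometry_x": [50.941840, 50.941845, 50.94182, 50.9418837],
    "features_geometry_y": [6.9595637, 6.9595698, 6.9595632, 6.9595958],
})


def ring_coordinates(frame, suburb):
    """Hypothetical helper: one suburb's points as a plain list of (x, y) tuples."""
    rows = frame.loc[frame["Suburb"] == suburb]
    return list(zip(rows["features_geometry_x"], rows["features_geometry_y"]))


# A filtered Series keeps its original index labels, so label 0 is missing here.
subset = data.loc[data["Suburb"] == 3, "features_geometry_x"]
try:
    subset[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print("label 0 present for suburb 3:", label_zero_found)

for suburb in (1, 3, 4):
    coords = ring_coordinates(data, suburb)
    enough = len(coords) >= 3  # a polygon ring needs at least three points
    print(suburb, len(coords), "ok" if enough else "too few points")
```

Once a suburb has at least three tuples, the list returned by the helper can be passed straight to Polygon from shapely.geometry instead of the DataFrame column.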

Related

iterate a dataframe

I'm trying to iterate over a dataframe to run MongoDB queries from a list and save each query result in a CSV file. The connection works without errors, but when I iterate, only the first file (0.csv) is created and I get an error for the second row of the dataframe.
This is my code:
sql = [
    ('tran','transactions',{"den": "00100002773060"}),
    ('tran','Data',{'name': 'john'}),
]
df = pd.DataFrame(sql, columns = ["database", "entity", "sql"])
for i in range(len(df)):
    database = df.iloc[i]["database"]
    entity=df.iloc[i]["entity"]
    myquery=df.iloc[i]["sql"]
    collection = client[database][entity]
    try:
        mydoc = list(collection.find(myquery))
        if len(mydoc) > 0:
            df = pd.DataFrame(mydoc)
            df.pop("_id")
            df.to_csv(str(i) + '.csv')
            print("file saved")
    except:
        print("error on file")
and this is the error:
Traceback (most recent call last):
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'database'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "getSql.py", line 12, in <module>
database = df.iloc[i]["database"]
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/series.py", line 958, in __getitem__
return self._get_value(key)
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "/home/r/Desktop/table_csv/entorno_virtual/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
raise KeyError(key) from err
KeyError: 'database'
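From what I can see, you are overwriting your df variable inside the loop:
df = pd.DataFrame(mydoc)
After the first iteration, df holds the query result instead of the list of queries, so df.iloc[i]["database"] raises KeyError: 'database'. Just give the inner DataFrame a different name (for example, result_df).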
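A minimal sketch of the renamed loop. The fetch function is a hypothetical stand-in for list(client[database][entity].find(query)) so the example runs without a MongoDB connection, and the returned document is invented. The results are kept in a dict here; in the real script you would call result_df.to_csv(name) instead.

```python
import pandas as pd

queries = [
    ("tran", "transactions", {"den": "00100002773060"}),
    ("tran", "Data", {"name": "john"}),
]
df = pd.DataFrame(queries, columns=["database", "entity", "sql"])


def fetch(database, entity, query):
    """Hypothetical stand-in for list(client[database][entity].find(query))."""
    return [{"_id": 1, "entity": entity, **query}]


results = {}
for i, row in enumerate(df.itertuples(index=False)):
    docs = fetch(row.database, row.entity, row.sql)
    if docs:
        # Use a separate name so the query table in df is never overwritten.
        result_df = pd.DataFrame(docs).drop(columns="_id")
        results[f"{i}.csv"] = result_df  # real script: result_df.to_csv(f"{i}.csv")
        print("file saved", i)
```

Because df is never reassigned, every row of the query table is still available on the second and later iterations.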

Problem accessing pandas data that is represented with commas?

I have the following lines:
data = pd.read_csv("file.csv", sep=";", encoding='ISO-8859-1', engine = 'python')
test = str(data['information'])
I'm trying to access a CSV column whose cells contain data like "1000,10500,2500".
I get an error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Vastuualue'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/erik.ilonen/Desktop/Projekti_csv_data/Toinen_testiohjelma/toinen_datan_kasittely_ohjelma.py", line 12, in <module>
test = str(dataAlkuperainen['Vastuualue'])
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'information'
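Your separator is not right. The KeyError means pandas found no column with that name, which typically happens when the whole header line is read as a single column.
sep should be a comma, not a semicolon, so use sep="," instead of sep=";". You can confirm this by printing data.columns.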
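A small sketch of how to check the separator, assuming a comma-separated file with a quoted multi-value cell. The two-line sample text and the id column are invented for illustration; only the column name information and the cell value come from the question.

```python
import io

import pandas as pd

# Invented sample standing in for file.csv.
raw = 'id,information\n1,"1000,10500,2500"\n'

wrong = pd.read_csv(io.StringIO(raw), sep=";")
right = pd.read_csv(io.StringIO(raw), sep=",")

# With the wrong separator the header collapses into one column name.
print(list(wrong.columns))
print(list(right.columns))

# The quoted cell arrives as one string and can be split afterwards.
values = [int(v) for v in right.loc[0, "information"].split(",")]
print(values)
```

If the printed column list contains a single long name with the delimiter inside it, the sep argument does not match the file.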

Pandas error when using resample and groupby to apply function

I am new to Python. I used to code in R and work with the period_apply function, so I tried the following approaches in Python.
First, I do not understand what the errors are trying to tell me.
Second, I do not understand why I only get errors with groupby if I include the first row of the data, yet with resample I get an error whether I include the first row or not.
Third, how do I resolve this problem? Please do not tell me to skip the first row, because I work with a much bigger dataset.
Data
                         Best_Bid  Best_Ask
Timestamp
2019-05-02 11:59:59.602   29636.0   29638.0
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
2019-05-02 12:59:00.033       NaN       NaN
{'Best_Bid': {Timestamp('2019-05-02 11:59:59.602000'): 29636.0,
Timestamp('2019-05-02 12:59:00.033000'): nan},
'Best_Bid_Q': {Timestamp('2019-05-02 11:59:59.602000'): 4.0,
Timestamp('2019-05-02 12:59:00.033000'): nan},
'Best_Ask': {Timestamp('2019-05-02 11:59:59.602000'): 29638.0,
Timestamp('2019-05-02 12:59:00.033000'): nan}}
And I am trying to apply the function below (I know I could have just done .agg({'Best_Bid':['last']}), but this is a simplified version of my original code).
Function
def func(x):
    best_bid = (x['Best_Bid'])[-1]
    best_ask = (x['Best_Ask'])[-1]
    return pd.Series([best_bid,best_ask], index=['bbbid', 'aaask'])
groupby and grouper
If I skip the first row and run it, things work fine.
df.iloc[1:,:].groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)
bbbid aaask
Timestamp
2019-05-02 12:59:59.999899904 NaN NaN
However, if I include the first row, I get the following error.
df.groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 471, in pandas._libs.index.DatetimeEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 997, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1004, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: -1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-191-85550b07b869>", line 1, in <module>
df.iloc[371448:371455,0:3].groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).apply(func)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\groupby.py", line 735, in apply
result = self._python_apply_general(f)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\groupby.py", line 751, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, self._selected_obj, self.axis)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\ops.py", line 206, in apply
res = f(group)
File "<ipython-input-104-c57c7e2b6885>", line 2, in func
best_bid = (x['Best_Bid'])[-1]
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 651, in get_value
value = Index.get_value(self, series, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4411, in get_value
return libindex.get_value_at(s, key)
File "pandas\_libs\index.pyx", line 44, in pandas._libs.index.get_value_at
File "pandas\_libs\index.pyx", line 45, in pandas._libs.index.get_value_at
File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas\_libs\util.pxd", line 89, in pandas._libs.util.validate_indexer
IndexError: index out of bounds
Resample
I got the following error regardless of including the first row or not.
df.resample(rule='180S',closed='right',label='right',base=-0.0001).agg(func)
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4411, in get_value
return libindex.get_value_at(s, key)
File "pandas\_libs\index.pyx", line 44, in pandas._libs.index.get_value_at
File "pandas\_libs\index.pyx", line 45, in pandas._libs.index.get_value_at
File "pandas\_libs\util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas\_libs\util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 651, in get_value
value = Index.get_value(self, series, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4419, in get_value
raise e1
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 473, in pandas._libs.index.DatetimeEngine.get_loc
File "pandas\_libs\index.pyx", line 479, in pandas._libs.index.DatetimeEngine._date_check_type
KeyError: 'Best_Bid'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas\_libs\tslibs\conversion.pyx", line 520, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
File "pandas\_libs\tslibs\parsing.pyx", line 228, in pandas._libs.tslibs.parsing.parse_datetime_string
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\dateutil\parser\_parser.py", line 649, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: Best_Bid
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 660, in get_value
return self.get_value_maybe_box(series, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 675, in get_value_maybe_box
key = Timestamp(key)
File "pandas\_libs\tslibs\timestamps.pyx", line 418, in pandas._libs.tslibs.timestamps.Timestamp.__new__
File "pandas\_libs\tslibs\conversion.pyx", line 292, in pandas._libs.tslibs.conversion.convert_to_tsobject
File "pandas\_libs\tslibs\conversion.pyx", line 523, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
ValueError: could not convert string to Timestamp
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-190-d2caa0c5152a>", line 1, in <module>
df.iloc[371448:371455,0:3].resample(rule='180S',closed='right',label='right',base=-0.0001).agg(func)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\resample.py", line 285, in aggregate
result = self._groupby_and_aggregate(how, grouper, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\resample.py", line 359, in _groupby_and_aggregate
result = grouped._aggregate_item_by_item(how, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 1172, in _aggregate_item_by_item
result[item] = colg.aggregate(func, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 269, in aggregate
result = self._aggregate_named(func, *args, **kwargs)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\groupby\generic.py", line 452, in _aggregate_named
output = func(group, *args, **kwargs)
File "<ipython-input-104-c57c7e2b6885>", line 2, in func
best_bid = (x['Best_Bid'])[-1]
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\testUser\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\datetimes.py", line 662, in get_value
raise KeyError(key)
KeyError: 'Best_Bid'
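From what I can tell, the groupby error comes from (x['Best_Bid'])[-1]. With groupby(...).apply(func), x is the DataFrame for one 180-second bin. Between the first row (11:59:59) and the remaining rows (12:59:00) there are bins with no rows at all, so x['Best_Bid'] is empty there: the label lookup fails with KeyError: -1 and the positional fallback fails with IndexError: index out of bounds. Skipping the first row removes the empty bins, which is why that case works.
With resample(...).agg(func), the function is called once per column, so x is already a single Series and x['Best_Bid'] raises KeyError: 'Best_Bid' regardless of which rows you include.
I don't have your dataset in front of me to work with, but I would try taking the last value per bin directly to see if it works. Note that .last() returns the last non-NaN value in each bin.
gdf = df.groupby(pd.Grouper(freq='180S',closed='right',label='right',base=-0.0001)).last()
print(gdf['Best_Bid'].iloc[-1],gdf.index[-1])
print(gdf['Best_Ask'].iloc[-1],gdf.index[-1])
Now this code can definitely be simplified, but it should work for all rows and it will be much faster than the .apply method on a large dataset.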
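If the custom function has to stay, one option is to guard against empty bins and index by position. The sketch below rebuilds the seven sample rows from the question; it is an assumption-laden illustration, not the asker's original code. It writes the frequency as "3min" (the same 180 seconds) and omits the base argument, since base is not available in newer pandas releases.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
index = pd.to_datetime(
    ["2019-05-02 11:59:59.602"] + ["2019-05-02 12:59:00.033"] * 6
)
df = pd.DataFrame(
    {"Best_Bid": [29636.0] + [np.nan] * 6, "Best_Ask": [29638.0] + [np.nan] * 6},
    index=index,
)
df.index.name = "Timestamp"


def func(x):
    """Return the last bid/ask of a bin, or NaN when the bin has no rows."""
    if x.empty:
        return pd.Series([np.nan, np.nan], index=["bbbid", "aaask"])
    # .iloc[-1] is always positional, unlike [-1] on a DatetimeIndex.
    return pd.Series(
        [x["Best_Bid"].iloc[-1], x["Best_Ask"].iloc[-1]],
        index=["bbbid", "aaask"],
    )


result = df.groupby(pd.Grouper(freq="3min", closed="right", label="right")).apply(func)
print(result.dropna())
```

Unlike .last(), this keeps NaN when the final row of a bin is NaN, which matches what the original func was trying to return.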

Printing multiple columns in Pandas (Python)

I'm new to Python and the Pandas module, but I can't seem to get this to work.
I'm using a CSV file containing the month and rainfall for Singapore.
Below is my code:
df = pd.read_csv('rainfall-monthly-total.csv')
print ((df['total_rainfall'])[df.total_rainfall == df['total_rainfall'].max()])
print ((df['month'])[df.total_rainfall == df['total_rainfall'].max()])
print ((df['total_rainfall', 'month'])[df.total_rainfall == df['total_rainfall'].max()])
The first two statements work fine. But something is wrong with the third and I can't find out why. Below is the output.
"/Users/xxxx/PycharmProjects/Phyton for Finance/venv/bin/python" "/Users/xxxx/PycharmProjects/Phyton for Finance/Panda Tutorial.py"
299 765.9
Name: total_rainfall, dtype: float64
299 2006-12
Name: month, dtype: object
Traceback (most recent call last):
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('total_rainfall', 'month')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/xxxx/PycharmProjects/Phyton for Finance/Panda Tutorial.py", line 16, in <module>
print ((df['total_rainfall', 'month'])[df.total_rainfall == df['total_rainfall'].max()])
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('total_rainfall', 'month')
Process finished with exit code 1
I'm using PyCharm with python 3.7.
How do I get python to print out both columns for that particular month?
Try this:
print ((df[['total_rainfall', 'month']])[df.total_rainfall == df['total_rainfall'].max()])
You need to convert single square brackets to double:
['total_rainfall', 'month']
TO
[['total_rainfall', 'month']]
You need to pass a list of the columns you want to print, so use df.loc to filter your data frame with the condition and select the columns in one step:
print(df.loc[df.total_rainfall == df['total_rainfall'].max(), ['total_rainfall', 'month']])
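A self-contained sketch of the difference between the two indexing forms. Only the 2006-12 row (765.9) comes from the question's output; the other two rows are invented so the example runs without the CSV file.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["2006-10", "2006-11", "2006-12"],  # first two rows are invented
    "total_rainfall": [100.2, 250.0, 765.9],
})

is_max = df["total_rainfall"] == df["total_rainfall"].max()

# df['a', 'b'] looks up a single column whose name is the tuple ('a', 'b').
try:
    df["total_rainfall", "month"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# A list of names selects several columns; .loc filters rows at the same time.
wettest = df.loc[is_max, ["total_rainfall", "month"]]
print(wettest)
```

The double brackets in df[['total_rainfall', 'month']] are simply a list inside the indexing operator, which is why they select both columns.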

Reading a file with pandas and use correlation coefficients on two columns

I have a file like the following, with no header:
0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
I want to read this file into a pandas dataframe and compute the correlation coefficient between the second and the third column.
This is what I am trying:
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', header=None)
df['1'].corr(df['2'])
It pops up with a huge error message. Am I not treating the columns properly? Any suggestion or hint?
Error message
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3063, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2685, in __getitem__
return self._getitem_column(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2692, in _getitem_column
return self._get_item_cache(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
values = self._data.get(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3065, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'
You have to specify the separator, which is a space, when reading the file. Then access the columns by their integer position labels. The code below should work.
df = pd.read_csv('test.txt', sep=' ', header=None)
df[1].corr(df[2])
Roy, what is the file extension? Is it .csv? If it is, you should add it to the end of the file name, like pd.read_csv('COLVAR_hbondnohead.csv', header=None)
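You don't have columns named '1' and '2'. With header=None the columns are labelled with the integers 0, 1 and 2, so you have to name them first. Note that reindex would only create empty (NaN) columns, so assign the labels instead, and pass the separator so the file is split into three columns.
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', sep=' ', header=None)
df1 = df.set_axis(['1', '2', '3'], axis=1)
then
df1['2'].corr(df1['3'])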
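A runnable sketch using the three rows from the question, read from an in-memory string instead of the file. It uses sep=r"\s+" rather than a single space, an assumption made so that runs of spaces or tabs in the real file would also be handled.

```python
import io

import pandas as pd

# The three sample rows from the question.
raw = """0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
"""

# header=None labels the columns with the integers 0, 1, 2.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(list(df.columns))

# Pearson correlation between the second and third columns.
r = df[1].corr(df[2])
print(r)
```

For the real file, replace io.StringIO(raw) with the path 'COLVAR_hbondnohead'.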
