KeyError: 1.0 after renaming columns of dataframe - python

Following script:
import pandas as pd
import numpy as np
import math
A = pd.DataFrame(np.array([[1,2,3,4],[5,6,7,8]]))
Floor1 = math.floor(A.min()[1]/2)*2
names = np.array([ 0. , 0.635, 1.27 , 1.905])
A.columns = names
Floor2 = math.floor(A.min()[1]/2)*2
Floor1 is being executed correctly, Floor2 which is done with the same df but with renamed columns isn't. I get a key error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 385, in pandas._libs.hashtable.Float64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 392, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Desktop\Python\untitled0.py", line 13, in <module>
Floor2 = math.floor(A.min()[1]/2)*2
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\numeric.py", line 449, in get_value
loc = self.get_loc(k)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\numeric.py", line 508, in get_loc
return super().get_loc(key, method=method, tolerance=tolerance)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 385, in pandas._libs.hashtable.Float64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 392, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0
I know, there is a similar question: After rename column get keyerror
But I didn't really get the answer and - more important - how to solve it.

Before renaming if you get the columns of A using list(A.columns), you'll see that you'll get the list [0,1,2,3]. So, you can index using the key 1. However, after renaming, you can no longer index with key 1 because the column names have changed.

If you are using A.min(), you are finding minimum value in axis=0 by default that is along columns.
When changing the column names, you cannot access index '1' as there is no index with the name '1' in the columns.
If Your intension is finding the minimum in a row, you can use A.min(axis=1).
You can write the code like this.
import pandas as pd
import numpy as np
import math
A = pd.DataFrame(np.array([[1,2,3,4],[5,6,7,8]]))
Floor1 = math.floor(A.min(axis=1)[1]/2)*2
names = np.array([ 0. , 0.635, 1.27 , 1.905])
A.columns = names
Floor2 = math.floor(A.min(axis=1)[1]/2)*2
Thank you

Related

Getting Error While Mapping Data using Dictionary

I'm reading multiple files using this code block. Sometimes the column names in the file and col_map dictionary differ and code through the error. If the column name in the file is Label/Name and the col_map value is Label/, then I will throw the error. I'm looking for a wildcard kind of approach. It can name a partial value match. Then it should map the value.
If the Column name contains Label/, it should map the values.
Errors:
File "backup.py", line 27, in
mapping_function(df)
File "backup.py", line 24, in mapping_function
_data[i] = data[col_map[i]]
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2927, in getitem
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Label/Studio/Network/Developer/Publisher'
import pandas as pd
df=pd.read_csv('test.txt',sep=' ')
print(df.columns)
### Columns Name #########
# Label/Name,Item Title,Quantity
col_map = {
"start_date":None,
"end_date":None,
"product_label":"Label/",
"product_title":"Item Title",
"product_sku":None,
"quantity":"Quantity"
}
def mapping_function(data):
_data = {}
for i in col_map:
if col_map[i] is not None:
_data[i] = data[col_map[i]]
mapping_function(df)

saving coordinates from Dataframe as Polygons (shapely.geometry) AttributeError

I want to create a Polygon from a list of coordinates:
import pandas as pd
from shapely.geometry import Point, Polygon
data = pd.read_csv('path.csv', sep=';')
the data is in the following format
Suburb
features_geometry_x
features_geometry_y
1
50.941840
6.9595637
1
50.941845
6.9595698
3
50.94182
6.9595632
4
50.9418837
6.9595958
with several rows for suburb 1, 3 and 4
#create a polygon
I = data.loc[data['Suburb'] == 1]
I['coordinates'] = list(zip(I['features_geometry_x'], I['features_geometry_y']))
poly_i = Polygon(I['coordinates'])
the code above works fine but if I do the same thing for suburb 3 and 4 it yields the following error:
L = data.loc[data['Suburb'] == 3]
L['coordinates'] = list(zip(L['features_geometry_x'], L['features_geometry_y']))
poly_l = Polygon(L['coordinates'])
File "shapely/speedups/_speedups.pyx", line 252, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in getattr
return object.getattribute(self, name)
AttributeError: 'Series' object has no attribute 'array_interface'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/folders/j6/wgg72kmx145f3krf14nzjfq40000gn/T/ipykernel_4092/214655495.py", line 3, in
poly_l = Polygon(Lindenthal['coordinates'])
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 261, in init
ret = geos_polygon_from_py(shell, holes)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/shapely/geometry/polygon.py", line 539, in geos_polygon_from_py
ret = geos_linearring_from_py(shell)
File "shapely/speedups/_speedups.pyx", line 344, in shapely.speedups._speedups.geos_linearring_from_py
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 942, in getitem
return self._get_value(key)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/Jojo/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 0
Please help :)
I think the issue here is that you need more than one data point to create a polygon where as your suburb 2 and 3 each got only a single point.

Printing mutiple columns in Pandas (Python)

I'm new to Python and the Pandas module, but I can't seem to get this to work.
This is my code. I'm using a csv file containing the month and rainfall for Singapore.
Below is my code: 0
df = pd.read_csv('rainfall-monthly-total.csv')
print ((df['total_rainfall'])[df.total_rainfall == df['total_rainfall'].max()])
print ((df['month'])[df.total_rainfall == df['total_rainfall'].max()])
print ((df['total_rainfall', 'month'])[df.total_rainfall == df['total_rainfall'].max()])
The first two statements work fine. But something is wrong with the third and I can't find out why. Below is the output.
"/Users/xxxx/PycharmProjects/Phyton for Finance/venv/bin/python" "/Users/xxxx/PycharmProjects/Phyton for Finance/Panda Tutorial.py"
299 765.9
Name: total_rainfall, dtype: float64
299 2006-12
Name: month, dtype: object
Traceback (most recent call last):
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('total_rainfall', 'month')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/xxxx/PycharmProjects/Phyton for Finance/Panda Tutorial.py", line 16, in <module>
print ((df['total_rainfall', 'month'])[df.total_rainfall == df['total_rainfall'].max()])
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/Users/xxxx/PycharmProjects/Phyton for Finance/venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('total_rainfall', 'month')
Process finished with exit code 1
I'm using PyCharm with python 3.7.
How do I get python to print out both columns for that particular month?
Try this:
print ((df[['total_rainfall', 'month']])[df.total_rainfall == df['total_rainfall'].max()]
You need to convert single square brackets to double:
['total_rainfall', 'month']
TO
[['total_rainfall', 'month']]
Easy. You need to use use a list of columns you want to print. so use df.loc to filter your data frame with conditions:
print(df.loc[df.total_rainfall == df['total_rainfall'].max(), ['total_rainfall', 'month']])

Python, Panda.read_excel Problem reading multiple sets of Data from one sheet

I have an Excel sheet with two sets of data (picture). I want to plot those with matplotlib in Python and import them with pandas. I narrowed down my script to make it quicker to read.
script:
import matplotlib.pyplot as plt
import pandas as pd
Tabelle = pd.read_excel("C:\\Users\\alexk\\Dropbox\\WW\\WW Master\\1. Semester\\WW2\\WW2 Kernfachpraktikum\\KFP2\\Ergebnisse.xlsx","Tabelle1")
x = Tabelle["Number1"]
y = Tabelle["Value1"]
x2=Tabelle["Number2"]
y2=Tabelle["Value2"]
plt.bar(x, y)
plt.bar(x2,y2)
plt.show()
End of script.
In the script it's possible to plot x and y when x2 and y2 are hashtagged out. When I want to read/plot/whatever x2 and y2 I get an error.
Error code:
Traceback (most recent call last):
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\indexes\base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Number2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/alexk/PycharmProjects/Vickers.py", line 8, in <module>
x2=Tabelle["Number2"]
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "C:\Users\alexk\PycharmProjects\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Number2'
Process finished with exit code 1
End of error code.
With other excel sheets this process worked fine. What am I missing? Does it have something to do with the excel sheet?

Reading a file with pandas and use correlation coefficients on two columns

I have a file like following with no header
0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
I want to read this file using pandas dataframe and want to perform correlation coefficient between the second and the third column.
I am trying like following:
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', header=None)
df['1'].corr(df['2'])
It pops up with a huge error message. Am I not treating the columns properly? Any suggestion or hint?
Error message
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3063, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2685, in __getitem__
return self._getitem_column(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2692, in _getitem_column
return self._get_item_cache(key)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
values = self._data.get(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/sbhakat/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3065, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc
KeyError: '1'
You will have to specify separator which is space while reading file. Then use position to access the columns. Below code should work.
df = pd.read_csv('test.txt', sep=' ', header=None)
df[1].corr(df[2])
Roy what is the file extension? is it .csv ? if it is you should add it to the end of fileName like pd.read_csv('COLVAR_hbondnohead.csv', header=None)
You don't have columns named 1 and 2, So, you have to create those columns first.
import pandas as pd
df = pd.read_csv('COLVAR_hbondnohead', header=None)
df1 = df.reindex(columns=['1','2', '3'])
then
df1['2'].corr(df1['3'])

Categories

Resources