I was looking at this answer by Roman Pekar on using apply. I initially copied the code exactly and it worked fine. Then I used it on my df3, which is created from a csv file, and I got a KeyError. I checked the datatypes: the columns I am using are int64, so that is okay, and I don't have nulls. Once this works I will make the function more complex. How do I get this working?
def fxy(x, y):
    return x * y
df3 = pd.read_csv(path + 'test_data.csv', usecols=[0,1,2])
print(df3.dtypes)
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']))
Traceback:
Traceback (most recent call last):
File "f:\...\my_file.py", line 54, in <module>
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']))
File "C:\...\frame.py", line 8833, in apply
return op.apply().__finalize__(self, method="apply")
File "C:\...\apply.py", line 727, in apply
return self.apply_standard()
File "C:\...\apply.py", line 851, in apply_standard
results, res_index = self.apply_series_generator()
File "C:\...\apply.py", line 867, in apply_series_generator
results[i] = self.f(v)
File "f:\...\my_file.py", line 54, in <lambda>
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']))
File "C:\...\series.py", line 958, in __getitem__
return self._get_value(key)
File "C:\...\series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "C:\...\range.py", line 389, in get_loc
raise KeyError(key)
KeyError: 'Len'
I don't see a way to attach the csv file. Below is a sample of df3; if I save it with Excel as "CSV (Comma delimited) (*.csv)", I get the same results.
ID  Len  Width
A   170  4
B   362  5
C   12   15
D   42   7
E   15   3
F   46   49
G   71   74
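I think you are missing axis=1 in apply. By default apply passes each column to the function, so x['Len'] is looked up in the row index and fails; with axis=1 each row is passed instead: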
df3['Area'] = df3.apply(lambda x: fxy(x['Len'], x['Width']), axis=1)
But in your case, you can just do:
df3['Area'] = df3['Len'] * df3['Width']
print(df3)
# Output
  ID  Len  Width  Area
0  A  170      4   680
1  B  362      5  1810
2  C   12     15   180
3  D   42      7   294
4  E   15      3    45
5  F   46     49  2254
6  G   71     74  5254
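To make the difference concrete, here is a minimal sketch that reproduces the failure and both fixes. The small df3 built inline is a hypothetical stand-in for the first rows of the asker's csv, not the real file.

```python
import pandas as pd


def fxy(x, y):
    return x * y


# Hypothetical stand-in for the first rows of test_data.csv
df3 = pd.DataFrame({"ID": ["A", "B", "C"],
                    "Len": [170, 362, 12],
                    "Width": [4, 5, 15]})

# Default axis=0: the lambda receives each column as a Series whose
# index is 0, 1, 2, so the label 'Len' does not exist in it.
try:
    df3.apply(lambda x: fxy(x["Len"], x["Width"]))
    raised = False
except KeyError:
    raised = True

# axis=1: the lambda receives each row as a Series indexed by column name.
area_apply = df3.apply(lambda x: fxy(x["Len"], x["Width"]), axis=1)

# Vectorised equivalent, with no Python-level loop over rows.
area_vec = df3["Len"] * df3["Width"]
```

Both area_apply and area_vec hold the same products; the vectorised form is usually the better choice when the function is plain arithmetic.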
I'm cycling through points in a GeoDataFrame by index, comparing index 0 and 1, then 1 and 2, then 2 and 3, and so on. The purpose is to compare the two points: if they occupy the same location, skip them; otherwise draw a line between them and summarize some stats. I figured that if I computed the distance between the two points and got 0, that pair could be skipped. My approach was to pass the two points, as a single GeoDataFrame, into a function that returns the distance. They are in a projected CRS with units in metres.
def getdist(pt_pair):
    shift_pt = pt_pair.shift()
    return pt_pair.distance(shift_pt)[1]
When I pass my pairs of points to the function, the first two calls return 0.0, the next returns nan, and then I get this error.
Traceback (most recent call last):
File "C:/.../PycharmProjects/.../vessel_track_builder.py", line 33, in <module>
print(getdist(set_pts))
File "C:/.../PycharmProjects/.../vessel_track_builder.py", line 19, in getdist
if math.isnan(mdist1.distance(shift_pt)[1]):
File "C:\OSGEO4~1\apps\Python37\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\OSGEO4~1\apps\Python37\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 997, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1004, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1
Process finished with exit code 1
I thought this might be an error in the point geometry, so I added a check to the function that returns 0 when the distance is nan.
def getdist(pt_pair):
    shift_pt = pt_pair.shift()
    if math.isnan(pt_pair.distance(shift_pt)[1]):
        return 0
    else:
        return pt_pair.distance(shift_pt)[1]
The result is 0.0, 0, then the aforementioned error.
I added a print statement of my geodataframes but didn't see anything out of the ordinary.
index ... MMSI MONTH geometry
0 92 ... 123 4 POINT (2221098.494 1668358.870)
1 39 ... 123 4 POINT (2221098.494 1668358.870)
[2 rows x 12 columns]
index ... MMSI MONTH geometry
1 39 ... 456 4 POINT (2221098.494 1668358.870)
2 3231 ... 456 4 POINT (2221098.494 1668358.870)
[2 rows x 12 columns]
index ... MMSI MONTH geometry
2 3231 ... 789 4 POINT (2221098.494 1668358.870)
3 1032 ... 789 4 POINT (2221098.494 1668358.870)
I tried it on some test data with simple points and it went through them fine, so I am wondering whether there is something wrong with how I am passing the GeoDataFrame to the function. Since I am trying to compare each point to the one after it, I am using the index to keep the order. Could that be the issue?
for mmsi in points_gdf.MMSI.unique():
    track_pts = points_gdf[(points_gdf.MMSI == mmsi)].sort_values(['POSITION_UTC_DATE']).reset_index()
    print(track_pts.shape[0])
    for index, row in track_pts.iterrows():
        if index + 1 < track_pts.shape[0]:
            set_pts = track_pts[(track_pts.index == index) | (track_pts.index == index + 1)]
            print(set_pts)
            print(getdist(set_pts))
        else:
            sys.exit()
I notice an index column header in the printout. When I look at the data in QGIS there is no index column; the first column is OBJECTID, and the data is stored in a file geodatabase. Could the index column be causing the issue?
Instead of looping through each pair of points, do this once:
dist_to_next_point = track_pts.distance(track_pts.shift()).dropna()
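The KeyError: 1 in the question is consistent with [1] being a label lookup rather than a positional one: a pair sliced from rows 2 and 3 keeps the labels 2 and 3, so no label 1 exists. The extra index column in the printout is only the old index that reset_index() keeps as a column. A minimal sketch of the label-versus-position difference, using a plain pandas Series with made-up distances as a stand-in for the result of distance():

```python
import pandas as pd

# Hypothetical distances standing in for pt_pair.distance(shift_pt);
# the labels 0..3 mimic track_pts after reset_index().
dist = pd.Series([0.0, 5.0, 0.0, 7.5], index=[0, 1, 2, 3])

# Slicing a pair keeps the original labels (here 2 and 3).
pair = dist[(dist.index == 2) | (dist.index == 3)]

try:
    pair[1]                      # label lookup: there is no label 1
    label_lookup_ok = True
except KeyError:
    label_lookup_ok = False

# Positional lookup returns the second element of any pair.
second_by_position = pair.iloc[1]
```

Using .iloc[1] inside getdist would avoid the error, although the single vectorised call above removes the need for the per-pair function altogether.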
I get a DataFrame error when trying to print two columns using the single-bracket form df[ _ , _ ]. The code is below.
#Data Frames code
import numpy as np
import pandas as pd
randArr = np.random.randint(0,100,20).reshape(5,4)
df =pd.DataFrame(randArr,np.arange(101,106,1),['PDS','Algo','SE','INS'])
print(df['PDS','SE'])
errors:
Traceback (most recent call last):
File "C:\Users\subro\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('PDS', 'SE')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\Education\4th year\1st sem\Machine Learning Lab\1st Lab\python\pandas\pdDataFrame.py", line 11, in <module>
print(df['PDS','SE'])
File "C:\Users\subro\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\subro\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: ('PDS', 'SE')
Do you mean to do this? Passing the column names with the columns= keyword keeps the constructor call explicit, and you need double square brackets, df[[ ]], to select several columns at once:
import numpy as np
import pandas as pd
randArr = np.random.randint(0,100,20).reshape(5,4)
df = pd.DataFrame(randArr, columns=['PDS', 'SE', 'ABC', 'CDE'])
print(df)
print(df[['PDS','SE']])
Output:
PDS SE ABC CDE
0 56 77 82 42
1 17 12 84 46
2 34 9 19 12
3 19 88 34 19
4 51 54 9 94
PDS SE
0 56 77
1 17 12
2 34 9
3 19 88
4 51 54
Use print(df[['PDS','SE']]) instead of print(df['PDS','SE']).
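The reason is that df['PDS','SE'] hands pandas the single tuple key ('PDS', 'SE'), which it tries to find as one column label. A short sketch, reusing the asker's constructor arguments with explicit keywords, shows the failing form next to two working ones:

```python
import numpy as np
import pandas as pd

rand_arr = np.random.randint(0, 100, 20).reshape(5, 4)
df = pd.DataFrame(rand_arr,
                  index=np.arange(101, 106),
                  columns=["PDS", "Algo", "SE", "INS"])

# A comma inside single brackets builds the tuple ('PDS', 'SE'),
# which is not a column label in this frame.
try:
    df["PDS", "SE"]
    tuple_key_ok = True
except KeyError:
    tuple_key_ok = False

# A list of labels selects several columns and returns a DataFrame.
subset = df[["PDS", "SE"]]

# .loc with a list of labels is an equivalent, more explicit spelling.
also_subset = df.loc[:, ["PDS", "SE"]]
```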
I have the petrole_price dataset and I'm checking whether today's price is greater than yesterday's. Rows where that is true should form a subset.
petrole_price
Country Today Yesterday
0 India 120 117
1 US 90 92
2 UAE 32 31
3 Russia 70 69
4 UK 55 55
When I execute the code below I get a KeyError: 'Today'.
petrole_price = petrole_price[petrole_price['Today'] > petrole_price['Yesterday']]
Here is the entire error:
petrole_price = petrole_price[petrole_price['Today'] > petrole_price['Yesterday']]
File "/home/tgphamifm/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/tgphamifm/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'Today'
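The traceback only shows that no column is labelled exactly 'Today'; it does not say why. One common cause, assumed here purely for illustration, is stray whitespace in the header names. A minimal sketch with a hypothetical frame built to have that problem (the padded headers are an assumption, not the asker's actual data):

```python
import pandas as pd

# Hypothetical frame whose headers carry stray spaces (an assumption).
petrole_price = pd.DataFrame({
    "Country": ["India", "US", "UAE", "Russia", "UK"],
    " Today": [120, 90, 32, 70, 55],
    "Yesterday ": [117, 92, 31, 69, 55],
})

# Printing the labels as a list shows their quotes, so padding is visible.
print(petrole_price.columns.tolist())

# Strip the padding, then the original comparison works.
petrole_price.columns = petrole_price.columns.str.strip()
higher = petrole_price[petrole_price["Today"] > petrole_price["Yesterday"]]
```

If the printed labels already look clean, the cause lies elsewhere, for example in how the file was read.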
I have the following dataframe, dfgeo:
x y z zt n k pv geometry dist
0 6574878.210 4757530.610 1152.588 1 8 4 90 POINT (6574878.210 4757530.610) 0.000000
1 6574919.993 4757570.314 1174.724 0 POINT (6574919.993 4757570.314) 57.638760
2 6575020.518 4757665.839 1177.339 0 POINT (6575020.518 4757665.839) 138.673362
3 6575239.548 4757873.972 1160.156 1 8 4 90 POINT (6575239.548 4757873.972) 302.148120
4 6575351.603 4757980.452 1202.418 0 POINT (6575351.603 4757980.452) 154.577856
5 6575442.780 4758067.093 1199.297 0 POINT (6575442.780 4758067.093) 125.777217
6 6575538.217 4758157.782 1192.914 1 8 4 90 POINT (6575538.217 4758157.782) 131.653772
7 6575594.625 4758240.033 1217.442 0 POINT (6575594.625 4758240.033) 99.735096
8 6575738.820 4758450.289 1174.477 0 POINT (6575738.820 4758450.289) 254.950551
9 6575850.937 4758613.772 1123.852 1 8 4 90 POINT (6575850.937 4758613.772) 198.234490
10 6575984.323 4758647.118 1131.761 0 POINT (6575984.323 4758647.118) 137.491020
11 6576204.312 4758702.115 1119.407 0 POINT (6576204.312 4758702.115) 226.759410
12 6576303.976 4758727.031 1103.064 0 POINT (6576303.976 4758727.031) 102.731300
13 6576591.496 4758798.910 1114.06 0 POINT (6576591.496 4758798.910) 296.368590
14 6576736.965 4758835.277 1120.285 1 8 4 90 POINT (6576736.965 4758835.277) 149.945952
I am trying to group by zt values and summarize the dist column. I have tried this:
def summarize(group):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(
        D=('dist', 'sum')
    )
dfzp=dfgeo.apply(summarize)
But I get the following error on the last line of code:
s = group['zt'].eq(1).cumsum()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 135, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index_class_helper.pxi", line 109, in pandas._libs.index.Int64Engine._check_type
KeyError: 'zt'
Any help in resolving this would be appreciated.
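If you need to pass the whole DataFrame to the function, call it directly:
dfzp=summarize(dfgeo)
Or use DataFrame.pipe:
dfzp=dfgeo.pipe(summarize)
DataFrame.apply, by contrast, calls the function once per column, or once per row if axis=1, so group['zt'] is looked up inside a single column and raises the KeyError.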
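As a rough illustration, the sketch below runs the asker's summarize function on a tiny made-up frame with only the zt and dist columns (the values are invented for the example) and contrasts apply with pipe:

```python
import pandas as pd


def summarize(group):
    # Each zt == 1 starts a new segment; cumsum turns that into a segment id.
    s = group["zt"].eq(1).cumsum()
    return group.groupby(s).agg(D=("dist", "sum"))


# Made-up sample with the two columns the function needs.
dfgeo = pd.DataFrame({"zt": [1, 0, 0, 1, 0],
                      "dist": [0.0, 57.0, 138.0, 302.0, 154.0]})

# apply hands summarize one column at a time, so 'zt' is not a valid key.
try:
    dfgeo.apply(summarize)
    apply_ok = True
except KeyError:
    apply_ok = False

# pipe hands summarize the whole DataFrame in a single call.
dfzp = dfgeo.pipe(summarize)
```

Here dfzp has one row per segment, with D holding the summed dist for that segment.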
I've loaded a csv file and it prints correctly, but I get an error when drawing a boxplot from a Series.
Here is how I load and print the data:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data2 = pd.read_csv(...)
print(data2)
ax = sns.boxplot(x=data2['2'])
plt.show()
and my data is formatted as follows:
0 1 2 3 4 5 6 7 ... 29 30 31 32 33 34 35 36
0 2016-06-06 04:07:42 0 26.0 0 1 101 0 0 ... 0 0 0 0 0 0 0
1 2016-06-08 12:34:10 0 25.0 0 1 101 0 0 ... 0 0 0 0 0 0 0
....
I want to draw a boxplot of column 2 (the values 26.0, 25.0, ...), but I got this error:
Traceback (most recent call last):
File "D:\Python-Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 129, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: '2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/work/fLUTE/Solve-52/练习/sns练习/boxplot.py", line 16, in
ax = sns.boxplot(x=data2['2'])
File "D:\Python-Anaconda\lib\site-packages\pandas\core\frame.py", line 2927, in getitem
indexer = self.columns.get_loc(key)
File "D:\Python-Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 129, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: '2'
When changing
ax = sns.boxplot(x=data2['2'])
to
ax = sns.boxplot(x=data2[2])
another error occurs:
TypeError: cannot perform reduce with flexible type
First, change ax = sns.boxplot(x=data2['2']) to ax = sns.boxplot(x=data2[2]), because the column labels are integers, not strings.
Second, convert the column to a numeric type before plotting: data2[2] = data2[2].astype(float)
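Both steps can be checked without plotting. The sketch below is only an illustration: the two-row csv text is invented, and header=None with dtype=str is an assumption chosen to reproduce integer column labels and a text-typed column.

```python
import io

import pandas as pd

# Invented two-row sample; header=None makes pandas label columns 0, 1, 2.
csv_text = "2016-06-06 04:07:42,0,26.0\n2016-06-08 12:34:10,0,25.0\n"
data2 = pd.read_csv(io.StringIO(csv_text), header=None, dtype=str)

labels = data2.columns.tolist()   # integer labels, not strings

# The string key '2' is not among the integer labels.
try:
    data2["2"]
    string_key_ok = True
except KeyError:
    string_key_ok = False

# Convert the text column to floats so numeric reductions can run on it.
data2[2] = data2[2].astype(float)
values = data2[2].tolist()
```

If some cells cannot be parsed as numbers, pd.to_numeric(data2[2], errors='coerce') is an alternative that turns them into NaN instead of raising.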