pandas 'Series' object has no attribute 'find' - python

I am trying to make a simple plot of some data and am getting the following error. Any help is very much appreciated:
AttributeError: 'Series' object has no attribute 'find'
Versions: Python 3, matplotlib 2.0.2, pandas 0.20.3, jupyter 1.0.0.
Code:
import pandas as pd
import matplotlib.pyplot as plt
pd_hr_data = pd.read_csv("/Users/pc/Downloads/HR_comma_sep.csv")
#print(pd_hr_data['average_montly_hours'],pd_hr_data['sales'])
take_ten_data = pd_hr_data[0:19]
x = take_ten_data['average_montly_hours'].astype(int)
y = take_ten_data['sales'].astype(str)
print(type(x[0]))
print(type(y[0]))
#print(x,y) ---- this gives me all the 20 values
#print(type(y[0]))
plt.plot(x,y)
plt.show()
Output / error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in ()
9 #print(type(y[0]))
10
---> 11 plt.plot(x,y)
12 plt.show()
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/pyplot.py
in plot(*args, **kwargs)
3315 mplDeprecation)
3316 try:
-> 3317 ret = ax.plot(*args, **kwargs)
3318 finally:
3319 ax._hold = washold
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/__init__.py
in inner(ax, *args, **kwargs)
1896 warnings.warn(msg % (label_namer, func.__name__),
1897 RuntimeWarning, stacklevel=2)
-> 1898 return func(ax, *args, **kwargs)
1899 pre_doc = inner.__doc__
1900 if pre_doc is None:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/axes/_axes.py
in plot(self, *args, **kwargs)
1404 kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
1405
-> 1406 for line in self._get_lines(*args, **kwargs):
1407 self.add_line(line)
1408 lines.append(line)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/axes/_base.py
in _grab_next_args(self, *args, **kwargs)
405 return
406 if len(remaining) <= 3:
--> 407 for seg in self._plot_args(remaining, kwargs):
408 yield seg
409 return
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/axes/_base.py
in _plot_args(self, tup, kwargs)
355 ret = []
356 if len(tup) > 1 and is_string_like(tup[-1]):
--> 357 linestyle, marker, color = _process_plot_format(tup[-1])
358 tup = tup[:-1]
359 elif len(tup) == 3:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/axes/_base.py
in _process_plot_format(fmt)
92 # handle the multi char special cases and strip them from the
93 # string
---> 94 if fmt.find('--') >= 0:
95 linestyle = '--'
96 fmt = fmt.replace('--', '')
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/generic.py
in __getattr__(self, name)
3079 if name in self._info_axis:
3080 return self[name]
-> 3081 return object.__getattribute__(self, name)
3082
3083 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'find'

I think you can use DataFrame.plot and define x and y by column names, because it better supports plotting non-numeric values:
take_ten_data = pd_hr_data[0:19]
x = take_ten_data['average_montly_hours'].astype(int)
y = take_ten_data['sales'].astype(str)
take_ten_data.plot(x='average_montly_hours', y='sales')
# also works without the x=/y= keywords, but is less readable:
#take_ten_data.plot('average_montly_hours','sales')
plt.show()
Sample:
import pandas as pd
import matplotlib.pyplot as plt
take_ten_data = pd.DataFrame({'average_montly_hours':[3,10,12], 'sales':[10,20,30]})
x = take_ten_data['average_montly_hours'].astype(int)
y = take_ten_data['sales'].astype(str)
take_ten_data.plot(x='average_montly_hours', y='sales')
plt.show()
But if all values are numeric, plain plt.plot works fine (convert string columns to numbers first if necessary):
take_ten_data = pd.DataFrame({'average_montly_hours':[3,10,12], 'sales':['10','20','30']})
x = take_ten_data['average_montly_hours'].astype(int)
#convert to int if necessary
y = take_ten_data['sales'].astype(int)
plt.plot(x,y)
plt.show()
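A minimal sketch of that last idea, assuming a column of numbers stored as strings. The DataFrame contents are made up, and pd.to_numeric is swapped in for astype(int) because it can coerce unparseable entries to NaN instead of raising:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: 'sales' holds numbers stored as strings.
df = pd.DataFrame({"average_montly_hours": [3, 10, 12],
                   "sales": ["10", "20", "30"]})

x = df["average_montly_hours"].astype(int)
# Convert the strings to numbers; errors="coerce" maps bad entries to NaN.
y = pd.to_numeric(df["sales"], errors="coerce")

fig, ax = plt.subplots()
# Hand matplotlib plain NumPy arrays rather than Series.
(line,) = ax.plot(x.to_numpy(), y.to_numpy())
plt.close(fig)
```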

The following worked for me; I hope it helps. The issue was mixing different data types in the plot.
import pandas as pd
import matplotlib.pyplot as plt
pd_hr_data = pd.read_csv("/Users/pc/Downloads/HR_comma_sep.csv")
take_ten_data = pd_hr_data[0:4]
y = take_ten_data['average_montly_hours'].astype(int)
x = [1,2,3,4]  # this can be autogenerated from the series size, e.g. range(1, len(y) + 1)
names = take_ten_data['sales']
plt.bar(x,y, align='center')
#plt.plot(x,y)  # use this instead if you want a line plot
plt.xticks(x, names)
plt.show()
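The same bar-chart approach with the x positions generated from the data length, as the comment in the code suggests. This is a sketch only: the four rows below are hypothetical stand-ins for the HR data, and ax.set_xticks/ax.set_xticklabels are used as the object-oriented equivalents of plt.xticks(x, names):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the first four rows of the HR data.
take_ten_data = pd.DataFrame({
    "average_montly_hours": [150, 260, 270, 220],
    "sales": ["dept_a", "dept_b", "dept_c", "dept_d"],
})

y = take_ten_data["average_montly_hours"].astype(int)
names = take_ten_data["sales"]
# Bar positions 1..N derived from the number of rows instead of typed by hand.
x = list(range(1, len(y) + 1))

fig, ax = plt.subplots()
bars = ax.bar(x, y, align="center")  # numeric heights only
ax.set_xticks(x)
ax.set_xticklabels(names)            # the strings go on the tick labels
plt.close(fig)
```

Keeping the numbers in the plotted data and the strings on the tick labels is what avoids mixing data types in a single plot call.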

Related

ValueError: x and y must have the same first dimension

Let me quickly brief you first. I am working with a .txt file with 5400 data points. Each is a 16-second average over a 24-hour period (24 h × 3600 s/h = 86400 s; 86400 / 16 = 5400). In short, this is the average magnetic field strength in the z direction for an inbound particle field, courtesy of the Advanced Composition Experiment satellite. The data are publicly available here. When I try to plot, I get the error
ValueError: x and y must have same first dimension
So I created a NumPy linspace of 5400 points spaced 16 units apart. I did this because I thought my dimensions didn't match the array I had defined previously. But now I am sure these two arrays have the same dimension, and yet it still gives back that ValueError. The code is as follows:
First try (without the linspace):
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
Bz = np.loadtxt(r"C:\Users\Schmidt\Desktop\Project\Data\ACE\MAG\ACE_MAG_Data_20151202_GSM.txt", dtype = bytes).astype(float)
Start_ACE = dt.date(2015,12,2)
Finish_ACE = dt.date(2015,12,2)
dt_Mag = 16
time_Mag = np.arange(Start_ACE, Finish_ACE, dt_Mag)
plt.subplot(3,1,1)
plt.plot(time_Mag, Bz)
plt.title('Bz 2015 12 02')
Second Try (with linspace):
import numpy as np
import matplotlib.pyplot as plt
Bz = np.loadtxt(r"C:\Users\Schmidt\Desktop\Project\Data\ACE\MAG\ACE_MAG_Data_20151202_GSM.txt", dtype = bytes).astype(float)
Mag_time = np.linspace(0,5399,16, dtype = float)
plt.subplot(3,1,1)
plt.plot(Mag_time, Bz)
plt.title('Bz 2015 12 02')
Other than a dimension mismatch, I don't know what else could be holding this plot back.
Full traceback:
ValueError Traceback (most recent call last)
<ipython-input-68-c5dc0bdf5117> in <module>()
1 plt.subplot(3,1,1)
----> 2 plt.plot(Mag_time, Bz)
3 plt.title('Bz 2015 12 02')
C:\Users\Schmidt\Anaconda3\lib\site-packages\matplotlib\pyplot.py in plot(*args, **kwargs)
3152 ax.hold(hold)
3153 try:
-> 3154 ret = ax.plot(*args, **kwargs)
3155 finally:
3156 ax.hold(washold)
C:\Users\Schmidt\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1809 warnings.warn(msg % (label_namer, func.__name__),
1810 RuntimeWarning, stacklevel=2)
-> 1811 return func(ax, *args, **kwargs)
1812 pre_doc = inner.__doc__
1813 if pre_doc is None:
C:\Users\Schmidt\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in plot(self, *args, **kwargs)
1422 kwargs['color'] = c
1423
-> 1424 for line in self._get_lines(*args, **kwargs):
1425 self.add_line(line)
1426 lines.append(line)
C:\Users\Schmidt\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _grab_next_args(self, *args, **kwargs)
384 return
385 if len(remaining) <= 3:
--> 386 for seg in self._plot_args(remaining, kwargs):
387 yield seg
388 return
C:\Users\Schmidt\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _plot_args(self, tup, kwargs)
362 x, y = index_of(tup[-1])
363
--> 364 x, y = self._xy_from_xy(x, y)
365
366 if self.command == 'plot':
C:\Users\Schmidt\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _xy_from_xy(self, x, y)
221 y = _check_1d(y)
222 if x.shape[0] != y.shape[0]:
--> 223 raise ValueError("x and y must have same first dimension")
224 if x.ndim > 2 or y.ndim > 2:
225 raise ValueError("x and y can be no greater than 2-D")
ValueError: x and y must have same first dimension
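The problem was the choice of array-creation function. np.linspace(0, 5399, 16) creates 16 evenly spaced points between 0 and 5399, not 5400 points spaced 16 apart, so its length never matched Bz. Instead of linspace, I should have used arange, whose third argument is the step:
Mag_time = np.arange(0, 86400, 16, dtype=float)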
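To see why the linspace version could not match Bz, compare the lengths directly. The timestamp version with np.datetime64 is an addition, not part of the answer; it assumes the samples start at midnight on 2015-12-02:

```python
import numpy as np

# What the second try built: 16 evenly spaced points between 0 and 5399.
wrong = np.linspace(0, 5399, 16)

# What was intended: one sample every 16 s over 86400 s (24 h), end excluded.
seconds = np.arange(0, 86400, 16, dtype=float)

# The same axis as real timestamps (assumption: data start at midnight).
times = np.arange(np.datetime64("2015-12-02T00:00:00"),
                  np.datetime64("2015-12-03T00:00:00"),
                  np.timedelta64(16, "s"))

print(len(wrong), len(seconds), len(times))  # 16 5400 5400
```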

Matplotlib errorbar fails to read a pandas data frame

I have two pandas data frames that I am trying to plot as values and error bars, but Python complains with an error I cannot understand. I have tested a colleague's nearly identical code, and it appears that the fact that I run Python 3.5 while he uses 2.7 is the source of the error: when I run his code on my computer (Python 3.5), I get the same error message.
Below is a subset of my troubling code:
# Use pandas to combine the three white spruce data sets
trees = [white_spruce_1,white_spruce_2,white_spruce_3]
ntrees = pd.concat(trees) # Concatenate the list into one DataFrame
spruce_stat = ntrees.groupby("Wvl") # Group the rows by wavelength
mean_spruce = spruce_stat.mean()
std_spruce = spruce_stat.std()
#mean_spruce.head()
mean_spruce['wvl']=mean_spruce.index
mean_spruce.head()
     Chan.#  Rad. (Ref.)  Rad. (Target)  Tgt./Ref. %
Wvl
350       0            0       0.000014     0.686176
351       0            0       0.000015     0.707577
std_spruce.head()
     Chan.#  Rad. (Ref.)  Rad. (Target)  Tgt./Ref. %
Wvl
350       0            0       0.000014     0.686176
351       0            0       0.000015     0.707577
plt.errorbar(mean_spruce['wvl'],mean_spruce['Tgt./Ref. %'], xerr = None, yerr = std_spruce['Rad. (Ref.)'])
Below is the error message I receive:
KeyError Traceback (most recent call last)
<ipython-input-52-13352d94b09c> in <module>()
2 #plt.errorbar(mean_spruce['wvl'],mean_spruce['Tgt./Ref. %'], xerr = None,yerr=std_spruce['Tgt./Ref. %'],c='k',ecolor='r', elinewidth=0.5, errorevery=5)
3 #plt.errorbar( x, y, xerr = None , yerr = sd_white_spruce['Tgt./Ref. %'],c = 'green', ecolor = 'red', capsize = 0,elinewidth = 0.5, errorevery = 5 )
----> 4 plt.errorbar(mean_spruce['wvl'],mean_spruce['Tgt./Ref. %'], xerr = None, yerr = std_spruce['Rad. (Ref.)'])# ,c = 'green', ecolor = 'red', capsize = 0,elinewidth = 0.5, errorevery = 5)
5
C:\Users\mike\Anaconda3\lib\site-packages\matplotlib\pyplot.py in errorbar(x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, barsabove, lolims, uplims, xlolims, xuplims, errorevery, capthick, hold, data, **kwargs)
2828 xlolims=xlolims, xuplims=xuplims,
2829 errorevery=errorevery, capthick=capthick, data=data,
-> 2830 **kwargs)
2831 finally:
2832 ax.hold(washold)
C:\Users\mike\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1809 warnings.warn(msg % (label_namer, func.__name__),
1810 RuntimeWarning, stacklevel=2)
-> 1811 return func(ax, *args, **kwargs)
1812 pre_doc = inner.__doc__
1813 if pre_doc is None:
C:\Users\mike\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in errorbar(self, x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, barsabove, lolims, uplims, xlolims, xuplims, errorevery, capthick, **kwargs)
2961 # Check for scalar or symmetric, as in xerr.
2962 if len(yerr) > 1 and not ((len(yerr) == len(y) and not (
-> 2963 iterable(yerr[0]) and len(yerr[0]) > 1))):
2964 raise ValueError("yerr must be a scalar, the same "
2965 "dimensions as y, or 2xN.")
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
555 def __getitem__(self, key):
556 try:
--> 557 result = self.index.get_value(self, key)
558
559 if not np.isscalar(result):
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
3882
3883 k = _values_from_object(key)
-> 3884 loc = self.get_loc(k)
3885 new_values = _values_from_object(series)[loc]
3886
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\index.py in get_loc(self, key, method, tolerance)
3940 pass
3941 return super(Float64Index, self).get_loc(key, method=method,
-> 3942 tolerance=tolerance)
3943
3944 #property
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\index.py in get_loc(self, key, method, tolerance)
1757 'backfill or nearest lookups')
1758 key = _values_from_object(key)
-> 1759 return self._engine.get_loc(key)
1760
1761 indexer = self.get_indexer([key], method=method,
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3979)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Float64HashTable.get_item (pandas\hashtable.c:9556)()
pandas\hashtable.pyx in pandas.hashtable.Float64HashTable.get_item (pandas\hashtable.c:9494)()
KeyError: 0.0
Thanks for the help
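The problem is a discrepancy between pandas indexing and the indexing used inside matplotlib's functions. As the traceback shows, errorbar evaluates yerr[0]; because std_spruce is indexed by wavelength (a Float64Index), pandas treats 0 as a label rather than a position, and there is no label 0.0, hence KeyError: 0.0. One way to resolve it, albeit not an elegant one, is to create dummy copies with a plain integer index just for plotting. In your case:
mean_spruce_dummy = mean_spruce.copy()
mean_spruce_dummy.index = np.arange(0, len(mean_spruce))
std_spruce_dummy = std_spruce.copy()
std_spruce_dummy.index = np.arange(0, len(std_spruce))
In principle, this discrepancy is resolved in newer versions of pandas.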
I'm seeing a similar error in Python 2.7. My solution is to access the underlying data directly; this should work for you:
x = mean_spruce['wvl'].values
y = mean_spruce['Tgt./Ref. %'].values
yerr = std_spruce['Rad. (Ref.)'].values
plt.errorbar(x, y, yerr=yerr)
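A small sketch of why KeyError: 0.0 appears and why .values fixes it, using a hypothetical Series indexed by wavelength in place of std_spruce['Rad. (Ref.)']. It uses .to_numpy(), the newer spelling of .values in recent pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for std_spruce['Rad. (Ref.)']: indexed by wavelength.
yerr = pd.Series([0.01, 0.02, 0.03], index=[350.0, 351.0, 352.0])

# The old errorbar code evaluated yerr[0]. With a float index, pandas treats
# 0 as a label, and there is no label 0.0, so this raises KeyError.
try:
    yerr[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .to_numpy() drops the index, so [0] is an ordinary positional lookup.
yerr_array = yerr.to_numpy()
first = yerr_array[0]
```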

AssertionError using Basemap and Pandas

I'm trying to follow the tutorial here:
http://nbviewer.ipython.org/github/ehmatthes/intro_programming/blob/master/notebooks/visualization_earthquakes.ipynb#install_standard
However, I am using pandas instead of Python's built-in csv module. My code is as follows:
import numpy as np
import pandas as pd
from mpl_toolkits.basemap import Basemap
eq_data = pd.read_csv('earthquake_data.csv')
map2 = Basemap(projection='robin'
, resolution='l'
, area_thresh=1000.0
, lat_0=0
, lon_0=0)
map2.drawcoastlines()
map2.drawcountries()
map2.fillcontinents(color = 'gray')
map2.drawmapboundary()
map2.drawmeridians(np.arange(0, 360, 30))
map2.drawparallels(np.arange(-90, 90, 30))
x,y = map2(eq_data['longitude'].values, eq_data['latitude'].values)
map2.plot(x,y, marker='0', markercolor='red', markersize=6)
This produces an AssertionError but with no description:
AssertionError Traceback (most recent call last)
<ipython-input-64-d3426e1f175d> in <module>()
14 x,y = map2(range(20), range(20))#eq_data['longitude'].values, eq_data['latitude'].values)
15
---> 16 map2.plot(x,y, marker='0', markercolor='red', markersize=6)
c:\Python27\lib\site-packages\mpl_toolkits\basemap\__init__.pyc in with_transform(self, x, y, *args, **kwargs)
540 # convert lat/lon coords to map projection coords.
541 x, y = self(x,y)
--> 542 return plotfunc(self,x,y,*args,**kwargs)
543 return with_transform
544
c:\Python27\lib\site-packages\mpl_toolkits\basemap\__init__.pyc in plot(self, *args, **kwargs)
3263 ax.hold(h)
3264 try:
-> 3265 ret = ax.plot(*args, **kwargs)
3266 except:
3267 ax.hold(b)
c:\Python27\lib\site-packages\matplotlib\axes.pyc in plot(self, *args, **kwargs)
4135 lines = []
4136
-> 4137 for line in self._get_lines(*args, **kwargs):
4138 self.add_line(line)
4139 lines.append(line)
c:\Python27\lib\site-packages\matplotlib\axes.pyc in _grab_next_args(self, *args, **kwargs)
315 return
316 if len(remaining) <= 3:
--> 317 for seg in self._plot_args(remaining, kwargs):
318 yield seg
319 return
c:\Python27\lib\site-packages\matplotlib\axes.pyc in _plot_args(self, tup, kwargs)
303 ncx, ncy = x.shape[1], y.shape[1]
304 for j in xrange(max(ncx, ncy)):
--> 305 seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
306 ret.append(seg)
307 return ret
c:\Python27\lib\site-packages\matplotlib\axes.pyc in _makeline(self, x, y, kw, kwargs)
255 **kw
256 )
--> 257 self.set_lineprops(seg, **kwargs)
258 return seg
259
c:\Python27\lib\site-packages\matplotlib\axes.pyc in set_lineprops(self, line, **kwargs)
198 raise TypeError('There is no line property "%s"' % key)
199 func = getattr(line, funcName)
--> 200 func(val)
201
202 def set_patchprops(self, fill_poly, **kwargs):
c:\Python27\lib\site-packages\matplotlib\lines.pyc in set_marker(self, marker)
851
852 """
--> 853 self._marker.set_marker(marker)
854
855 def set_markeredgecolor(self, ec):
c:\Python27\lib\site-packages\matplotlib\markers.pyc in set_marker(self, marker)
231 else:
232 try:
--> 233 Path(marker)
234 self._marker_function = self._set_vertices
235 except ValueError:
c:\Python27\lib\site-packages\matplotlib\path.pyc in __init__(self, vertices, codes, _interpolation_steps, closed, readonly)
145 codes[-1] = self.CLOSEPOLY
146
--> 147 assert vertices.ndim == 2
148 assert vertices.shape[1] == 2
149
AssertionError:
I thought the problem was due to the pandas update that no longer allows passing a Series the way you used to be able to, as described here:
Runtime error using python basemap and pyproj?
But as you can see, I adjusted my code for this and it didn't fix the problem. At this point I am lost.
I am using Python 2.7.6, pandas 0.15.2, and basemap 1.0.7 on Windows Server 2012 x64.
There were two problems with my code. First, map2.plot passes its arguments on to matplotlib's Axes.plot, so the marker must be 'o' (the lowercase letter), not '0' (zero). Second, there is no markercolor property; it is called color. Adding linestyle='' draws only the markers instead of connecting the points with a line. The code below should work.
import numpy as np
import pandas as pd
from mpl_toolkits.basemap import Basemap
eq_data = pd.read_csv('earthquake_data.csv')
map2 = Basemap(projection='robin'
, resolution='l'
, area_thresh=1000.0
, lat_0=0
, lon_0=0)
map2.drawcoastlines()
map2.drawcountries()
map2.fillcontinents(color = 'gray')
map2.drawmapboundary()
map2.drawmeridians(np.arange(0, 360, 30))
map2.drawparallels(np.arange(-90, 90, 30))
x,y = map2(eq_data['longitude'].values, eq_data['latitude'].values)
map2.plot(x,y, marker='o', color='red', markersize=6, linestyle='')
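The corrected keyword arguments can be checked on a plain matplotlib Axes. This sketch assumes, as the traceback suggests, that Basemap.plot forwards them unchanged to Axes.plot, so no Basemap install is needed; the coordinates are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# A plain Axes stands in for the Basemap instance here.
fig, ax = plt.subplots()
(line,) = ax.plot([10, 20, 30], [5, 15, 25],
                  marker="o",     # lowercase letter o = circle; "0" is not a marker
                  color="red",    # Line2D has no 'markercolor' property
                  markersize=6,
                  linestyle="")   # markers only, no connecting line
plt.close(fig)
```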

python plot error when reading .csv with pandas: 'Series' object has no attribute 'find'

I am trying to read a few .csv files and do something like plot(x, y) with this code:
import numpy as np
import pandas
from matplotlib import pyplot as plt
%matplotlib inline
colnames = ['X','Y']
xfmr_x_y_file = pandas.read_csv('AMI_X_Y.csv', names=colnames)
gnode_x_y_file = pandas.read_csv('AMI_GNODE_X_Y.csv', names=colnames)
node_x_y_file = pandas.read_csv('AMI_NODE_X_Y.csv', names=colnames)
EX_XFMR_X_meas = (xfmr_x_y_file.X)
EX_XFMR_Y_meas = (xfmr_x_y_file.Y)
DB_GNODE_X_meas = (gnode_x_y_file.X)
DB_GNODE_Y_meas = (gnode_x_y_file.Y)
DB_NODE_X_meas = (node_x_y_file.X)
DB_NODE_Y_meas = (node_x_y_file.Y)
plt.plot(EX_XFMR_X_meas[1:],EX_XFMR_Y_meas[1:],label='XFMR')
plt.title('TUR117')
plt.xlabel('X')
plt.ylabel('Y')
plt.gcf().set_size_inches(18, 6)
#plt.savefig('TUR117.png')#,dpi=300
plt.show()
But it is generating a weird error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-24-0428c97a1c49> in <module>()
17 DB_NODE_Y_meas = (node_x_y_file.Y)
18
---> 19 plt.plot(EX_XFMR_X_meas[1:],EX_XFMR_Y_meas[1:],label='XFMR')
20 plt.title('TUR117')
21 plt.xlabel('X')
C:\Program Files (x86)\ActivePython 2.7.8\lib\site-packages\matplotlib\pyplot.pyc in plot(*args, **kwargs)
2985 ax.hold(hold)
2986 try:
-> 2987 ret = ax.plot(*args, **kwargs)
2988 draw_if_interactive()
2989 finally:
C:\Program Files (x86)\ActivePython 2.7.8\lib\site-packages\matplotlib\axes.pyc in plot(self, *args, **kwargs)
4135 lines = []
4136
-> 4137 for line in self._get_lines(*args, **kwargs):
4138 self.add_line(line)
4139 lines.append(line)
C:\Program Files (x86)\ActivePython 2.7.8\lib\site-packages\matplotlib\axes.pyc in _grab_next_args(self, *args, **kwargs)
315 return
316 if len(remaining) <= 3:
--> 317 for seg in self._plot_args(remaining, kwargs):
318 yield seg
319 return
C:\Program Files (x86)\ActivePython 2.7.8\lib\site-packages\matplotlib\axes.pyc in _plot_args(self, tup, kwargs)
274 ret = []
275 if len(tup) > 1 and is_string_like(tup[-1]):
--> 276 linestyle, marker, color = _process_plot_format(tup[-1])
277 tup = tup[:-1]
278 elif len(tup) == 3:
C:\Program Files (x86)\ActivePython 2.7.8\lib\site-packages\matplotlib\axes.pyc in _process_plot_format(fmt)
97 # handle the multi char special cases and strip them from the
98 # string
---> 99 if fmt.find('--') >= 0:
100 linestyle = '--'
101 fmt = fmt.replace('--', '')
C:\Program Files (x86)\ActivePython 2.7.8\lib\site-packages\pandas\core\generic.pyc in __getattr__(self, name)
1934 return self[name]
1935 raise AttributeError("'%s' object has no attribute '%s'" %
-> 1936 (type(self).__name__, name))
1937
1938 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'find'
If I simply do plt.plot(EX_XFMR_X[1:]), it plots fine, so for some reason it seems unable to handle the plt.plot(x, y) form. Has anyone faced this problem before? Is there something I am not doing right?
I had a similar problem, and the plt.plot(x, y) form is indeed not respected in this case. Both of your inputs, EX_XFMR_X_meas[1:] and EX_XFMR_Y_meas[1:], are still pandas.Series objects, so plt.plot(x, y) takes the Series' index as x and the Series' values as y. If the values of the first variable represent x and those of the second represent y, do:
plt.plot(EX_XFMR_X_meas[1:].values, EX_XFMR_Y_meas[1:].values, label='XFMR')
which passes them as numpy.ndarray objects.
I guess the weird error arises because plt.plot() does not know what to do with the second pandas.Series as input: the traceback shows matplotlib handing it to _process_plot_format as if it were a format string, which then calls fmt.find('--') on it.
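The [1:] slicing in the question suggests that each CSV file starts with a header row. Assuming that, here is a sketch of what happens with names= alone, and of header=0 (a swapped-in alternative, not what the answer proposes), which keeps the columns numeric so no slicing is needed. The in-memory CSV text is made up:

```python
import io
import pandas as pd

# Hypothetical CSV content with a header row.
csv_text = "X,Y\n1.5,2.0\n3.0,4.5\n"
colnames = ["X", "Y"]

# names= without header=0 keeps the header line as a data row,
# so both columns end up holding strings instead of numbers.
as_strings = pd.read_csv(io.StringIO(csv_text), names=colnames)

# header=0 tells pandas to replace that first line with `names`,
# so the columns are parsed as numbers.
as_numbers = pd.read_csv(io.StringIO(csv_text), names=colnames, header=0)

x = as_numbers["X"].to_numpy()
y = as_numbers["Y"].to_numpy()
```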

KeyError when plotting a sliced pandas dataframe with datetimes

I get a KeyError when I try to plot a slice of a pandas DataFrame column with datetimes in it. Does anybody know what could cause this?
I managed to reproduce the error in a small self-contained example (which you can also view here: http://nbviewer.ipython.org/3714142/):
import numpy as np
from pandas import DataFrame
import datetime
from pylab import *
test = DataFrame({'x' : [datetime.datetime(2012,9,10) + datetime.timedelta(n) for n in range(10)],
'y' : range(10)})
Now if I plot:
plot(test['x'][0:5])
there is no problem, but when I plot:
plot(test['x'][5:10])
I get the KeyError below (and the error message is not very helpful to me). This only happens with datetime columns, not with other columns (as far as I have seen); e.g. plot(test['y'][5:10]) is not a problem.
The error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-7-aa076e3fc4e0> in <module>()
----> 1 plot(test['x'][5:10])
C:\Python27\lib\site-packages\matplotlib\pyplot.pyc in plot(*args, **kwargs)
2456 ax.hold(hold)
2457 try:
-> 2458 ret = ax.plot(*args, **kwargs)
2459 draw_if_interactive()
2460 finally:
C:\Python27\lib\site-packages\matplotlib\axes.pyc in plot(self, *args, **kwargs)
3846 lines = []
3847
-> 3848 for line in self._get_lines(*args, **kwargs):
3849 self.add_line(line)
3850 lines.append(line)
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _grab_next_args(self, *args, **kwargs)
321 return
322 if len(remaining) <= 3:
--> 323 for seg in self._plot_args(remaining, kwargs):
324 yield seg
325 return
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _plot_args(self, tup, kwargs)
298 x = np.arange(y.shape[0], dtype=float)
299
--> 300 x, y = self._xy_from_xy(x, y)
301
302 if self.command == 'plot':
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _xy_from_xy(self, x, y)
215 if self.axes.xaxis is not None and self.axes.yaxis is not None:
216 bx = self.axes.xaxis.update_units(x)
--> 217 by = self.axes.yaxis.update_units(y)
218
219 if self.command!='plot':
C:\Python27\lib\site-packages\matplotlib\axis.pyc in update_units(self, data)
1277 neednew = self.converter!=converter
1278 self.converter = converter
-> 1279 default = self.converter.default_units(data, self)
1280 #print 'update units: default=%s, units=%s'%(default, self.units)
1281 if default is not None and self.units is None:
C:\Python27\lib\site-packages\matplotlib\dates.pyc in default_units(x, axis)
1153 'Return the tzinfo instance of *x* or of its first element, or None'
1154 try:
-> 1155 x = x[0]
1156 except (TypeError, IndexError):
1157 pass
C:\Python27\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
374 def __getitem__(self, key):
375 try:
--> 376 return self.index.get_value(self, key)
377 except InvalidIndexError:
378 pass
C:\Python27\lib\site-packages\pandas\core\index.pyc in get_value(self, series, key)
529 """
530 try:
--> 531 return self._engine.get_value(series, key)
532 except KeyError, e1:
533 if len(self) > 0 and self.inferred_type == 'integer':
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1479)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1374)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2498)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2460)()
KeyError: 0
HYRY explained why you get the KeyError.
To plot with slices using matplotlib you can do:
In [157]: plot(test['x'][5:10].values)
Out[157]: [<matplotlib.lines.Line2D at 0xc38348c>]
In [158]: plot(test['x'][5:10].reset_index(drop=True))
Out[158]: [<matplotlib.lines.Line2D at 0xc37e3cc>]
Plotting x and y in one go with pandas 0.7.3:
In [161]: test[5:10].set_index('x')['y'].plot()
Out[161]: <matplotlib.axes.AxesSubplot at 0xc48b1cc>
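Instead of calling plot(test["x"][5:10]), you can call the plot method of the Series object:
test["x"][5:10].plot()
The reason: test["x"][5:10] is a Series with an integer index running from 5 to 9. matplotlib's plot() tries to get element 0 of it (x[0] in dates.default_units in the traceback); pandas treats that as a label lookup, and since there is no label 0, it raises KeyError.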
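A minimal sketch of the label-versus-position behavior described above, using an integer Series instead of the datetime column (a simplification for brevity; the indexing rule is the same):

```python
import pandas as pd

s = pd.Series(range(10))  # default integer index 0..9
part = s[5:10]            # keeps the original labels 5..9

# With an integer index, [] looks up labels, and `part` has no label 0.
try:
    part[0]
    missing = False
except KeyError:
    missing = True

# reset_index(drop=True) renumbers the labels from 0, so [0] works again.
renumbered = part.reset_index(drop=True)
first = renumbered[0]

# .iloc is always positional, regardless of the index labels.
by_position = part.iloc[0]
```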
I encountered this error with pd.groupby in Pandas 0.14.0 and solved it with df = df[df['col']!= 0].reset_index()
