KeyError when using for loop on dataframe to plot histograms - python

I have a dataframe similar to:
df = pd.DataFrame({'Date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-05', '2016-01-08', '2016-01-08', '2016-02-01'], 'Count': [1, 2, 2, 3, 2, 0, 2]})
and I am trying to plot a histogram of Count for each unique Date
I've tried:
for date in df.Date.unique():
    plt.hist([df[df.Date == '%s' %(date)]['Count']])
    plt.title('%s' %(date))
which results in
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-17-971a1cf07250> in <module>()
1 for date in df.Date.unique():
----> 2 plt.hist([df[df.Date == '%s' %(date)]['Count']])
3 plt.title('%s' %(date))
c:~\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, data, **kwargs)
2963 histtype=histtype, align=align, orientation=orientation,
2964 rwidth=rwidth, log=log, color=color, label=label,
-> 2965 stacked=stacked, data=data, **kwargs)
2966 finally:
2967 ax.hold(washold)
c:~\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1816 warnings.warn(msg % (label_namer, func.__name__),
1817 RuntimeWarning, stacklevel=2)
-> 1818 return func(ax, *args, **kwargs)
1819 pre_doc = inner.__doc__
1820 if pre_doc is None:
c:~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5925
5926 # basic input validation
-> 5927 flat = np.ravel(x)
5928
5929 input_empty = len(flat) == 0
c:~\anaconda3\lib\site-packages\numpy\core\fromnumeric.py in ravel(a, order)
1482 return asarray(a).ravel(order=order)
1483 else:
-> 1484 return asanyarray(a).ravel(order=order)
1485
1486
c:~\anaconda3\lib\site-packages\numpy\core\numeric.py in asanyarray(a, dtype, order)
581
582 """
--> 583 return array(a, dtype, copy=False, order=order, subok=True)
584
585
c:~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
581 key = com._apply_if_callable(key, self)
582 try:
--> 583 result = self.index.get_value(self, key)
584
585 if not lib.isscalar(result):
c:~\anaconda3\lib\site-packages\pandas\indexes\base.py in get_value(self, series, key)
1978 try:
1979 return self._engine.get_value(s, k,
-> 1980 tz=getattr(series.dtype, 'tz', None))
1981 except KeyError as e1:
1982 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3332)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3035)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6610)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6554)()
KeyError: 0
But when I try to simply print it, there is no problem:
for date in df.Date.unique():
    print([df[df.Date == '%s' %(date)]['Count']])
[0 1
1 2
2 2
3 3
Name: Count, dtype: int64]
[4 2
5 0
Name: Count, dtype: int64]
[6 2
Name: Count, dtype: int64]
What is the issue with calling plt.hist on my dataframe the way that I have it here?

Essentially, you have one pair of square brackets too many in your code.
plt.hist([series]) # <- wrong
plt.hist(series) # <- correct
In the first case matplotlib tries to plot a histogram of a one-element list whose single element is a Series rather than a number. That won't work.
Removing the brackets and supplying the series directly works fine:
for date in df.Date.unique():
    plt.hist(df[df.Date == '%s' %(date)]['Count'])
    plt.title('%s' %(date))
Note that this draws all histograms in the same plot, which may not be what you want. If not, consider the much shorter alternative:
df.hist(by="Date")
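If separate plots are wanted but df.hist(by="Date") is too coarse, the loop can be kept with one figure per date. The following is only a sketch under assumptions: the Agg backend and the figures dictionary are additions for illustration, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-05',
             '2016-01-08', '2016-01-08', '2016-02-01'],
    'Count': [1, 2, 2, 3, 2, 0, 2],
})

figures = {}  # hypothetical helper: keeps one figure per date
for date in df.Date.unique():
    counts = df.loc[df.Date == date, 'Count']  # a Series, not a list holding a Series
    fig, ax = plt.subplots()                   # fresh figure for each date
    ax.hist(counts)
    ax.set_title(date)
    figures[date] = fig
```

Each histogram now lives in its own figure, so nothing is drawn on top of anything else.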

You're passing a list containing a Series, which is what causes the problem here. You could iterate over a groupby object instead and plot each group separately.
gps = df.groupby('Date').Count
_, axes = plt.subplots(nrows=gps.ngroups)
for (_, g), ax in zip(gps, axes):
    g.plot.hist(ax=ax)
plt.show()
Take a look at the Visualisation docs if you need more sugar in your graph.

Related

Pandas 0.19: Attribute error "unknown property color_cycle" still exists while performing boxplot

Unlike what has been said here, the AttributeError: Unknown property color_cycle is still persistent in the newest Pandas version (0.19.0-1).
In my case, I have a dataframe similar to this, although much longer (3,000,000 rows):
A B C
1 05010001 17 1
2 05020001 5 1
3 05020002 11 1
4 05020003 2 1
5 05030001 86 1
6 07030001 84 2
7 07030002 10 2
8 08010001 16 3
For some reason, if I implement this very example, there are no errors. In my actual case, executing the simple
df.boxplot(by='C')
triggers this mess:
AttributeError Traceback (most recent call last)
<ipython-input-51-5c645348f82f> in <module>()
----> 1 df.boxplot(by='C')
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\frame.pyc in boxplot(self, column, by, ax, fontsize, rot, grid, figsize, layout, return_type, **kwds)
5514
5515 columns = list(data.dtype.names)
-> 5516 arrays = [data[k] for k in columns]
5517 return arrays, columns
5518 else:
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\plotting.pyc in boxplot(data, column, by, ax, fontsize, rot, grid, figsize, layout, return_type, **kwds)
2687 ax : Matplotlib axes object, optional
2688 fontsize : int or string
-> 2689 rot : label rotation angle
2690 figsize : A tuple (width, height) in inches
2691 grid : Setting this to True will show the grid
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\plotting.pyc in _grouped_plot_by_column(plotf, data, columns, by, numeric_only, grid, figsize, ax, layout, return_type, **kwargs)
3091 >>> df = pandas.DataFrame(data, columns=list('ABCD'), index=index)
3092 >>>
-> 3093 >>> grouped = df.groupby(level='lvl1')
3094 >>> boxplot_frame_groupby(grouped)
3095 >>>
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tools\plotting.pyc in plot_group(keys, values, ax)
2659 create a figure with the default figsize, causing the figsize=parameter to
2660 be ignored.
-> 2661 """
2662 if ax is None and len(plt.get_fignums()) > 0:
2663 ax = _gca()
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\__init__.pyc in inner(ax, *args, **kwargs)
1810 warnings.warn(msg % (label_namer, func.__name__),
1811 RuntimeWarning, stacklevel=2)
-> 1812 return func(ax, *args, **kwargs)
1813 pre_doc = inner.__doc__
1814 if pre_doc is None:
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes\_axes.pyc in boxplot(self, x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap, usermedians, conf_intervals, meanline, showmeans, showcaps, showbox, showfliers, boxprops, labels, flierprops, medianprops, meanprops, capprops, whiskerprops, manage_xticks)
3321 meanline=meanline, showfliers=showfliers,
3322 capprops=capprops, whiskerprops=whiskerprops,
-> 3323 manage_xticks=manage_xticks)
3324 return artists
3325
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes\_axes.pyc in bxp(self, bxpstats, positions, widths, vert, patch_artist, shownotches, showmeans, showcaps, showbox, showfliers, boxprops, whiskerprops, flierprops, medianprops, capprops, meanprops, meanline, manage_xticks)
3650 boxes.extend(dopatch(box_x, box_y, **final_boxprops))
3651 else:
-> 3652 boxes.extend(doplot(box_x, box_y, **final_boxprops))
3653
3654 # draw the whiskers
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes\_axes.pyc in doplot(*args, **kwargs)
3564 if vert:
3565 def doplot(*args, **kwargs):
-> 3566 return self.plot(*args, **kwargs)
3567
3568 def dopatch(xs, ys, **kwargs):
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\__init__.pyc in inner(ax, *args, **kwargs)
1810 warnings.warn(msg % (label_namer, func.__name__),
1811 RuntimeWarning, stacklevel=2)
-> 1812 return func(ax, *args, **kwargs)
1813 pre_doc = inner.__doc__
1814 if pre_doc is None:
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes\_axes.pyc in plot(self, *args, **kwargs)
1422 kwargs['color'] = c
1423
-> 1424 for line in self._get_lines(*args, **kwargs):
1425 self.add_line(line)
1426 lines.append(line)
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes\_base.pyc in _grab_next_args(self, *args, **kwargs)
384 return
385 if len(remaining) <= 3:
--> 386 for seg in self._plot_args(remaining, kwargs):
387 yield seg
388 return
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes\_base.pyc in _plot_args(self, tup, kwargs)
372 ncx, ncy = x.shape[1], y.shape[1]
373 for j in xrange(max(ncx, ncy)):
--> 374 seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
375 ret.append(seg)
376 return ret
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes\_base.pyc in _makeline(self, x, y, kw, kwargs)
278 default_dict = self._getdefaults(None, kw, kwargs)
279 self._setdefaults(default_dict, kw, kwargs)
--> 280 seg = mlines.Line2D(x, y, **kw)
281 self.set_lineprops(seg, **kwargs)
282 return seg
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\lines.pyc in __init__(self, xdata, ydata, linewidth, linestyle, color, marker, markersize, markeredgewidth, markeredgecolor, markerfacecolor, markerfacecoloralt, fillstyle, antialiased, dash_capstyle, solid_capstyle, dash_joinstyle, solid_joinstyle, pickradius, drawstyle, markevery, **kwargs)
365 # update kwargs before updating data to give the caller a
366 # chance to init axes (and hence unit support)
--> 367 self.update(kwargs)
368 self.pickradius = pickradius
369 self.ind_offset = 0
C:\Users\B4058846\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\artist.pyc in update(self, props)
854 func = getattr(self, 'set_' + k, None)
855 if func is None or not six.callable(func):
--> 856 raise AttributeError('Unknown property %s' % k)
857 func(v)
858 changed = True
AttributeError: Unknown property color_cycle
And I am left with a blank 4-pane plot, when there should have been just one with 5 boxplot columns:
How to fix this?
I've verified myself that one needs to have pandas 0.19.0-1 together with matplotlib 1.5.1-8 to not experience this error.

Pie chart with pandas & matplot lib

I'm trying to plot a pie chart of the titanic survivor data. I have been trying to plot this as a pie chart but I keep getting a KeyError 0. How can I fix this?
figure(1, figsize=(6,6))
ax = axes([0.1, 0.1, 0.8, 0.8])
s_survival = (titanic_data.Survived[titanic_data.Embarked == 'S'][titanic_data.Survived == 1].value_counts()
) / survivors.sum()
c_survival = (titanic_data.Survived[titanic_data.Embarked == 'C'][titanic_data.Survived == 1].value_counts()
) / survivors.sum()
q_survival = (titanic_data.Survived[titanic_data.Embarked == 'Q'][titanic_data.Survived == 1].value_counts()
) / survivors.sum()
labels = ['s_survival', 'c_survival', 'q_survival']
sizes = [s_survival, c_survival, q_survival]
pie(sizes, explode=None, labels=labels, autopct='%1.1f%%', shadow=False, startangle=90)
title('Survivor Percentage from Embarked Port')
Any advice would be greatly appreciated.
StackTrace :
KeyError Traceback (most recent call last)
<ipython-input-18-a8c68ae3422f> in <module>()
11 sizes = [s_survival, c_survival, q_survival]
12 # add a list zip here
---> 13 plt.pie(sizes, explode=None, labels=labels, autopct='%1.1f%%', shadow=False, startangle=90)
14 title('Survivor Percentage from Embarked Port')
//anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in pie(x, explode, labels, colors, autopct, pctdistance, shadow, labeldistance, startangle, radius, counterclock, wedgeprops, textprops, center, frame, hold, data)
3135 radius=radius, counterclock=counterclock,
3136 wedgeprops=wedgeprops, textprops=textprops, center=center,
-> 3137 frame=frame, data=data)
3138 finally:
3139 ax.hold(washold)
//anaconda/lib/python2.7/site-packages/matplotlib/__init__.pyc in inner(ax, *args, **kwargs)
1810 warnings.warn(msg % (label_namer, func.__name__),
1811 RuntimeWarning, stacklevel=2)
-> 1812 return func(ax, *args, **kwargs)
1813 pre_doc = inner.__doc__
1814 if pre_doc is None:
//anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in pie(self, x, explode, labels, colors, autopct, pctdistance, shadow, labeldistance, startangle, radius, counterclock, wedgeprops, textprops, center, frame)
2546 """
2547
-> 2548 x = np.asarray(x).astype(np.float32)
2549
2550 sx = float(x.sum())
//anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
472
473 """
--> 474 return array(a, dtype, copy=False, order=order)
475
476 def asanyarray(a, dtype=None, order=None):
//anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
558 def __getitem__(self, key):
559 try:
--> 560 result = self.index.get_value(self, key)
561
562 if not lib.isscalar(result):
//anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_value(self, series, key)
1909 try:
1910 return self._engine.get_value(s, k,
-> 1911 tz=getattr(series.dtype, 'tz', None))
1912 except KeyError as e1:
1913 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3234)()
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:2931)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)()
pandas/hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6527)()
pandas/hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6465)()
KeyError: 0
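One possible reading of this traceback: each entry in sizes is a Series returned by value_counts(), indexed by the value 1, so when NumPy asks for element 0 pandas does a label lookup and fails. A minimal sketch that passes plain floats instead follows. The tiny titanic_data frame here is a made-up stand-in, and it assumes survivors.sum() in the question was meant to be the total number of survivors.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for titanic_data; only the two columns used in the question.
titanic_data = pd.DataFrame({
    'Embarked': ['S', 'S', 'C', 'Q', 'S', 'C', 'Q', 'S'],
    'Survived': [1, 0, 1, 1, 1, 0, 0, 1],
})

survived = titanic_data[titanic_data.Survived == 1]
total = len(survived)  # assumed meaning of survivors.sum() in the question

# Each entry is a plain float, so there is no pandas index for NumPy to look up.
sizes = [float((survived.Embarked == port).sum()) / total for port in ['S', 'C', 'Q']]
labels = ['s_survival', 'c_survival', 'q_survival']

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
ax.set_title('Survivor Percentage from Embarked Port')
```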

Keyerror in matplotlib when the column clearly exists

I'm trying to plot using the following code:
df.plot(kind='scatter',x='branch', y='retention', s=df['active_users']*200)
Which gives me the following error:
KeyError Traceback (most recent call last)
<ipython-input-17-e43e5aeff662> in <module>()
3 df = Flexbooks[Flexbooks['schoolyearsemester'] == StartSem][Flexbooks['branch'] != 'OTHE'][Flexbooks['branch'] != 'SSCI'][Flexbooks['branch'] != 'EM1'][Flexbooks['branch'] != 'EM2'][Flexbooks['branch'] != 'EM3'][Flexbooks['branch'] != 'EM4'][Flexbooks['branch'] != 'EM5'][Flexbooks['branch'] != 'SATP'][Flexbooks['branch'] != 'MORE'][Flexbooks['branch'] != 'SPEL'][Flexbooks['branch'] != 'ENG'][Flexbooks['branch'] != 'ENGG'][Flexbooks['branch'] != 'NANO'][Flexbooks['branch'] != 'TECH'][Flexbooks['branch'] != 'HIST'][Flexbooks['branch'] != 'WRIT'][Flexbooks['branch'] != 'ASTR'][Flexbooks['branch'] != 'EXAP']
4
----> 5 df.plot(kind='scatter',x='branch', y='retention', s=df['active_users']*200)
6 #df.plot.scatter(x='branch', y='retention', s=df['active_users']*200)
E:\Anaconda2\lib\site-packages\pandas\tools\plotting.pyc in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
3669 fontsize=fontsize, colormap=colormap, table=table,
3670 yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 3671 sort_columns=sort_columns, **kwds)
3672 __call__.__doc__ = plot_frame.__doc__
3673
E:\Anaconda2\lib\site-packages\pandas\tools\plotting.pyc in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
2554 yerr=yerr, xerr=xerr,
2555 secondary_y=secondary_y, sort_columns=sort_columns,
-> 2556 **kwds)
2557
2558
E:\Anaconda2\lib\site-packages\pandas\tools\plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
2382 plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
2383
-> 2384 plot_obj.generate()
2385 plot_obj.draw()
2386 return plot_obj.result
E:\Anaconda2\lib\site-packages\pandas\tools\plotting.pyc in generate(self)
985 self._compute_plot_data()
986 self._setup_subplots()
--> 987 self._make_plot()
988 self._add_table()
989 self._make_legend()
E:\Anaconda2\lib\site-packages\pandas\tools\plotting.pyc in _make_plot(self)
1556 else:
1557 label = None
-> 1558 scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
1559 label=label, cmap=cmap, **self.kwds)
1560 if cb:
E:\Anaconda2\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1967 return self._getitem_multilevel(key)
1968 else:
-> 1969 return self._getitem_column(key)
1970
1971 def _getitem_column(self, key):
E:\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key)
1974 # get column
1975 if self.columns.is_unique:
-> 1976 return self._get_item_cache(key)
1977
1978 # duplicate columns & possible reduce dimensionality
E:\Anaconda2\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
1089 res = cache.get(item)
1090 if res is None:
-> 1091 values = self._data.get(item)
1092 res = self._box_item_values(item, values)
1093 cache[item] = res
E:\Anaconda2\lib\site-packages\pandas\core\internals.pyc in get(self, item, fastpath)
3209
3210 if not isnull(item):
-> 3211 loc = self.items.get_loc(item)
3212 else:
3213 indexer = np.arange(len(self.items))[isnull(self.items)]
E:\Anaconda2\lib\site-packages\pandas\core\index.pyc in get_loc(self, key, method, tolerance)
1757 'backfill or nearest lookups')
1758 key = _values_from_object(key)
-> 1759 return self._engine.get_loc(key)
1760
1761 indexer = self.get_indexer([key], method=method,
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3979)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12265)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12216)()
KeyError: 'branch'
I can confidently say that the column 'branch' exists, so why the KeyError?
I know this problem exists when the column is a datetime, but this column is a string.
Any help would be appreciated.

Matplotlib errorbar fails to read a pandas data frame

I have two pandas data frames that I am trying to plot as values and error bars, but Python raises an error I cannot understand. I tested a colleague's nearly identical code and suspected that the difference in versions (I run Python 3.5 while he uses 2.7) was the source of the error. When I run his code on my computer (Python 3.5), I get the same error message.
Below is a subset of the code that is causing trouble:
# Using pandas to combine the three white spruce data sets
trees = [white_spruce_1,white_spruce_2,white_spruce_3]
ntrees = pd.concat(trees) # Concatenate the list into a single pandas object
spruce_stat = ntrees.groupby("Wvl") # Group the combined data by wavelength
mean_spruce = spruce_stat.mean()
std_spruce = spruce_stat.std()
#mean_spruce.head()
mean_spruce['wvl']=mean_spruce.index
mean_spruce.head()
Chan.# Rad. (Ref.) Rad. (Target) Tgt./Ref. %
Wvl
350 0 0 0.000014 0.686176
351 0 0 0.000015 0.707577
std_spruce.head()
Chan.# Rad. (Ref.) Rad. (Target) Tgt./Ref. %
Wvl
350 0 0 0.000014 0.686176
351 0 0 0.000015 0.707577
plt.errorbar(mean_spruce['wvl'],mean_spruce['Tgt./Ref. %'], xerr = None, yerr = std_spruce['Rad. (Ref.)'])
Below is the error message I receive:
KeyError Traceback (most recent call last)
<ipython-input-52-13352d94b09c> in <module>()
2 #plt.errorbar(mean_spruce['wvl'],mean_spruce['Tgt./Ref. %'], xerr = None,yerr=std_spruce['Tgt./Ref. %'],c='k',ecolor='r', elinewidth=0.5, errorevery=5)
3 #plt.errorbar( x, y, xerr = None , yerr = sd_white_spruce['Tgt./Ref. %'],c = 'green', ecolor = 'red', capsize = 0,elinewidth = 0.5, errorevery = 5 )
----> 4 plt.errorbar(mean_spruce['wvl'],mean_spruce['Tgt./Ref. %'], xerr = None, yerr = std_spruce['Rad. (Ref.)'])# ,c = 'green', ecolor = 'red', capsize = 0,elinewidth = 0.5, errorevery = 5)
5
C:\Users\mike\Anaconda3\lib\site-packages\matplotlib\pyplot.py in errorbar(x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, barsabove, lolims, uplims, xlolims, xuplims, errorevery, capthick, hold, data, **kwargs)
2828 xlolims=xlolims, xuplims=xuplims,
2829 errorevery=errorevery, capthick=capthick, data=data,
-> 2830 **kwargs)
2831 finally:
2832 ax.hold(washold)
C:\Users\mike\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1809 warnings.warn(msg % (label_namer, func.__name__),
1810 RuntimeWarning, stacklevel=2)
-> 1811 return func(ax, *args, **kwargs)
1812 pre_doc = inner.__doc__
1813 if pre_doc is None:
C:\Users\mike\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in errorbar(self, x, y, yerr, xerr, fmt, ecolor, elinewidth, capsize, barsabove, lolims, uplims, xlolims, xuplims, errorevery, capthick, **kwargs)
2961 # Check for scalar or symmetric, as in xerr.
2962 if len(yerr) > 1 and not ((len(yerr) == len(y) and not (
-> 2963 iterable(yerr[0]) and len(yerr[0]) > 1))):
2964 raise ValueError("yerr must be a scalar, the same "
2965 "dimensions as y, or 2xN.")
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
555 def __getitem__(self, key):
556 try:
--> 557 result = self.index.get_value(self, key)
558
559 if not np.isscalar(result):
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
3882
3883 k = _values_from_object(key)
-> 3884 loc = self.get_loc(k)
3885 new_values = _values_from_object(series)[loc]
3886
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\index.py in get_loc(self, key, method, tolerance)
3940 pass
3941 return super(Float64Index, self).get_loc(key, method=method,
-> 3942 tolerance=tolerance)
3943
3944 #property
C:\Users\mike\Anaconda3\lib\site-packages\pandas\core\index.py in get_loc(self, key, method, tolerance)
1757 'backfill or nearest lookups')
1758 key = _values_from_object(key)
-> 1759 return self._engine.get_loc(key)
1760
1761 indexer = self.get_indexer([key], method=method,
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3979)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Float64HashTable.get_item (pandas\hashtable.c:9556)()
pandas\hashtable.pyx in pandas.hashtable.Float64HashTable.get_item (pandas\hashtable.c:9494)()
KeyError: 0.0
Thanks for the help
The problem is a mismatch between pandas' label-based indexing and the positional indexing that matplotlib's internal functions use (here, yerr[0]). One way to work around it, although not an elegant one, is to create a dummy dataframe just for plotting. In your case:
mean_spruce_dummy = mean_spruce
mean_spruce_dummy.columns = np.arange(0, len(mean_spruce))
In principle, this discrepancy is resolved in newer versions of pandas.
I'm seeing a similar error in Python 2.7. My solution is to access the underlying data directly. This should work for you:
x = mean_spruce['wvl'].values
y = mean_spruce['Tgt./Ref. %'].values
yerr = std_spruce['Rad. (Ref.)'].values
plt.errorbar(x, y, yerr=yerr)
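As a self-contained illustration of the same idea, here is a sketch with made-up numbers. The two small frames are hypothetical stand-ins for mean_spruce and std_spruce, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for mean_spruce and std_spruce, indexed by wavelength.
wvl = pd.Index([350.0, 351.0, 352.0], name='Wvl')
mean_spruce = pd.DataFrame({'Tgt./Ref. %': [0.686, 0.708, 0.721]}, index=wvl)
std_spruce = pd.DataFrame({'Tgt./Ref. %': [0.010, 0.012, 0.011]}, index=wvl)

# .values drops the float index, so matplotlib's yerr[0] is a positional lookup
# on a NumPy array rather than a label lookup on a Series.
x = mean_spruce.index.values
y = mean_spruce['Tgt./Ref. %'].values
yerr = std_spruce['Tgt./Ref. %'].values

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=yerr)
```

Because the arrays carry no index, it no longer matters that the wavelength labels start at 350 rather than 0.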

KeyError when plotting a sliced pandas dataframe with datetimes

I get a KeyError when I try to plot a slice of a pandas DataFrame column with datetimes in it. Does anybody know what could cause this?
I managed to reproduce the error in a little self contained example (which you can also view here: http://nbviewer.ipython.org/3714142/):
import numpy as np
from pandas import DataFrame
import datetime
from pylab import *
test = DataFrame({'x' : [datetime.datetime(2012,9,10) + datetime.timedelta(n) for n in range(10)],
                  'y' : range(10)})
Now if I plot:
plot(test['x'][0:5])
there is no problem, but when I plot:
plot(test['x'][5:10])
I get the KeyError below (and the error message is not very helpful to me). As far as I can tell, this only happens with datetime columns, not with other columns. E.g. plot(test['y'][5:10]) is not a problem.
The error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-7-aa076e3fc4e0> in <module>()
----> 1 plot(test['x'][5:10])
C:\Python27\lib\site-packages\matplotlib\pyplot.pyc in plot(*args, **kwargs)
2456 ax.hold(hold)
2457 try:
-> 2458 ret = ax.plot(*args, **kwargs)
2459 draw_if_interactive()
2460 finally:
C:\Python27\lib\site-packages\matplotlib\axes.pyc in plot(self, *args, **kwargs)
3846 lines = []
3847
-> 3848 for line in self._get_lines(*args, **kwargs):
3849 self.add_line(line)
3850 lines.append(line)
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _grab_next_args(self, *args, **kwargs)
321 return
322 if len(remaining) <= 3:
--> 323 for seg in self._plot_args(remaining, kwargs):
324 yield seg
325 return
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _plot_args(self, tup, kwargs)
298 x = np.arange(y.shape[0], dtype=float)
299
--> 300 x, y = self._xy_from_xy(x, y)
301
302 if self.command == 'plot':
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _xy_from_xy(self, x, y)
215 if self.axes.xaxis is not None and self.axes.yaxis is not None:
216 bx = self.axes.xaxis.update_units(x)
--> 217 by = self.axes.yaxis.update_units(y)
218
219 if self.command!='plot':
C:\Python27\lib\site-packages\matplotlib\axis.pyc in update_units(self, data)
1277 neednew = self.converter!=converter
1278 self.converter = converter
-> 1279 default = self.converter.default_units(data, self)
1280 #print 'update units: default=%s, units=%s'%(default, self.units)
1281 if default is not None and self.units is None:
C:\Python27\lib\site-packages\matplotlib\dates.pyc in default_units(x, axis)
1153 'Return the tzinfo instance of *x* or of its first element, or None'
1154 try:
-> 1155 x = x[0]
1156 except (TypeError, IndexError):
1157 pass
C:\Python27\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
374 def __getitem__(self, key):
375 try:
--> 376 return self.index.get_value(self, key)
377 except InvalidIndexError:
378 pass
C:\Python27\lib\site-packages\pandas\core\index.pyc in get_value(self, series, key)
529 """
530 try:
--> 531 return self._engine.get_value(series, key)
532 except KeyError, e1:
533 if len(self) > 0 and self.inferred_type == 'integer':
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1479)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1374)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2498)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2460)()
KeyError: 0
HYRY explained why you get the KeyError.
To plot with slices using matplotlib you can do:
In [157]: plot(test['x'][5:10].values)
Out[157]: [<matplotlib.lines.Line2D at 0xc38348c>]
In [158]: plot(test['x'][5:10].reset_index(drop=True))
Out[158]: [<matplotlib.lines.Line2D at 0xc37e3cc>]
x, y plotting in one go with 0.7.3
In [161]: test[5:10].set_index('x')['y'].plot()
Out[161]: <matplotlib.axes.AxesSubplot at 0xc48b1cc>
Instead of calling plot(test["x"][5:10]), you can call the plot method of Series object:
test["x"][5:10].plot()
The reason: test["x"][5:10] is a Series object whose integer index runs from 5 to 9. plot() tries to get the element with index 0, which does not exist, and that causes the error.
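The label lookup can be seen without matplotlib at all. The sketch below rebuilds the frame from the question and uses .iloc and .loc explicitly so the positional and label-based steps are unambiguous; the variable names sliced, found, and renumbered are only illustrative.

```python
import datetime
import pandas as pd

test = pd.DataFrame({
    'x': [datetime.datetime(2012, 9, 10) + datetime.timedelta(n) for n in range(10)],
    'y': range(10),
})

sliced = test['x'].iloc[5:10]   # same rows as test['x'][5:10] in the question
labels = list(sliced.index)     # the slice keeps its original labels

# Looking up label 0 fails because the labels start at 5.
try:
    sliced.loc[0]
    found = True
except KeyError:
    found = False

first_by_position = sliced.iloc[0]          # positional access still works
renumbered = sliced.reset_index(drop=True)  # labels restart at 0
```

After reset_index(drop=True), label 0 exists again, which is why passing the renumbered Series (or .values) to plot() avoids the KeyError.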
I encountered this error with pd.groupby in Pandas 0.14.0 and solved it with df = df[df['col']!= 0].reset_index()
