KeyError when plotting a sliced pandas dataframe with datetimes - python

I get a KeyError when I try to plot a slice of a pandas DataFrame column with datetimes in it. Does anybody know what could cause this?
I managed to reproduce the error in a small self-contained example (which you can also view here: http://nbviewer.ipython.org/3714142/):
import numpy as np
from pandas import DataFrame
import datetime
from pylab import *
test = DataFrame({'x': [datetime.datetime(2012, 9, 10) + datetime.timedelta(n) for n in range(10)],
                  'y': range(10)})
Now if I plot:
plot(test['x'][0:5])
there is no problem, but when I plot:
plot(test['x'][5:10])
I get the KeyError below (and the error message is not very helpful to me). This only happens with datetime columns, not with other columns (as far as I have seen). E.g. plot(test['y'][5:10]) is not a problem.
The error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-7-aa076e3fc4e0> in <module>()
----> 1 plot(test['x'][5:10])
C:\Python27\lib\site-packages\matplotlib\pyplot.pyc in plot(*args, **kwargs)
2456 ax.hold(hold)
2457 try:
-> 2458 ret = ax.plot(*args, **kwargs)
2459 draw_if_interactive()
2460 finally:
C:\Python27\lib\site-packages\matplotlib\axes.pyc in plot(self, *args, **kwargs)
3846 lines = []
3847
-> 3848 for line in self._get_lines(*args, **kwargs):
3849 self.add_line(line)
3850 lines.append(line)
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _grab_next_args(self, *args, **kwargs)
321 return
322 if len(remaining) <= 3:
--> 323 for seg in self._plot_args(remaining, kwargs):
324 yield seg
325 return
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _plot_args(self, tup, kwargs)
298 x = np.arange(y.shape[0], dtype=float)
299
--> 300 x, y = self._xy_from_xy(x, y)
301
302 if self.command == 'plot':
C:\Python27\lib\site-packages\matplotlib\axes.pyc in _xy_from_xy(self, x, y)
215 if self.axes.xaxis is not None and self.axes.yaxis is not None:
216 bx = self.axes.xaxis.update_units(x)
--> 217 by = self.axes.yaxis.update_units(y)
218
219 if self.command!='plot':
C:\Python27\lib\site-packages\matplotlib\axis.pyc in update_units(self, data)
1277 neednew = self.converter!=converter
1278 self.converter = converter
-> 1279 default = self.converter.default_units(data, self)
1280 #print 'update units: default=%s, units=%s'%(default, self.units)
1281 if default is not None and self.units is None:
C:\Python27\lib\site-packages\matplotlib\dates.pyc in default_units(x, axis)
1153 'Return the tzinfo instance of *x* or of its first element, or None'
1154 try:
-> 1155 x = x[0]
1156 except (TypeError, IndexError):
1157 pass
C:\Python27\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
374 def __getitem__(self, key):
375 try:
--> 376 return self.index.get_value(self, key)
377 except InvalidIndexError:
378 pass
C:\Python27\lib\site-packages\pandas\core\index.pyc in get_value(self, series, key)
529 """
530 try:
--> 531 return self._engine.get_value(series, key)
532 except KeyError, e1:
533 if len(self) > 0 and self.inferred_type == 'integer':
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1479)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1374)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2498)()
C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2460)()
KeyError: 0

HYRY explained why you get the KeyError.
To plot with slices using matplotlib you can do:
In [157]: plot(test['x'][5:10].values)
Out[157]: [<matplotlib.lines.Line2D at 0xc38348c>]
In [158]: plot(test['x'][5:10].reset_index(drop=True))
Out[158]: [<matplotlib.lines.Line2D at 0xc37e3cc>]
Plotting x against y in one go, with 0.7.3:
In [161]: test[5:10].set_index('x')['y'].plot()
Out[161]: <matplotlib.axes.AxesSubplot at 0xc48b1cc>

Instead of calling plot(test["x"][5:10]), you can call the plot method of the Series object:
test["x"][5:10].plot()
The reason: test["x"][5:10] is a Series whose integer index runs from 5 to 9. plot() tries to look up index 0 on it, which does not exist, and that raises the KeyError.
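The explanation above can be checked without matplotlib. The following is a minimal sketch, assuming a recent pandas; it uses .iloc for the slice (the same rows as test['x'][5:10]) and the variable names are only illustrative:

```python
import datetime

import pandas as pd

test = pd.DataFrame({
    "x": [datetime.datetime(2012, 9, 10) + datetime.timedelta(n) for n in range(10)],
    "y": list(range(10)),
})

# Same rows as test['x'][5:10]; the slice keeps the original labels 5..9.
sliced = test["x"].iloc[5:10]
labels = list(sliced.index)

# With an integer index, sliced[0] is a label lookup, and label 0 is gone.
try:
    sliced[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# Positional access, or a fresh 0-based index, avoids the label lookup.
first_by_position = sliced.iloc[0]
first_after_reset = sliced.reset_index(drop=True)[0]
```

This is why passing .values or the result of reset_index(drop=True) to plot() works: in both cases position 0 is addressable again.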

I encountered this error with pd.groupby in Pandas 0.14.0 and solved it with df = df[df['col']!= 0].reset_index()

Related

Why am I getting an error while using a lambda within apply?

Can anyone help me understand why the following gives an error?
import numpy as np
from pydataset import data
mtcars = data('mtcars')
mtcars.apply(['mean', lambda x: max(x)-min(x), lambda x: np.percentile(x, 0.15)])
I am trying to create a data frame with the mean, max-min and 15th Percentile for all columns of the dataset mtcars.
Error Message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py in agg_list_like(obj, arg, _axis)
674 try:
--> 675 return concat(results, keys=keys, axis=1, sort=False)
676 except TypeError as err:
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
284 """
--> 285 op = _Concatenator(
286 objs,
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
369 )
--> 370 raise TypeError(msg)
371
TypeError: cannot concatenate object of type '<class 'float'>'; only Series and DataFrame objs are valid
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-645-51b8f1de1855> in <module>
----> 1 mtcars.apply(['mean', lambda x: max(x)-min(x), lambda x: np.percentile(x, 0.15)])
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766 kwds=kwds,
7767 )
-> 7768 return op.get_result()
7769
7770 def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/apply.py in get_result(self)
145 # pandas\core\apply.py:144: error: "aggregate" of "DataFrame" gets
146 # multiple values for keyword argument "axis"
--> 147 return self.obj.aggregate( # type: ignore[misc]
148 self.f, axis=self.axis, *self.args, **self.kwds
149 )
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
7576 result = None
7577 try:
-> 7578 result, how = self._aggregate(func, axis, *args, **kwargs)
7579 except TypeError as err:
7580 exc = TypeError(
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in _aggregate(self, arg, axis, *args, **kwargs)
7607 result = result.T if result is not None else result
7608 return result, how
-> 7609 return aggregate(self, arg, *args, **kwargs)
7610
7611 agg = aggregate
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py in aggregate(obj, arg, *args, **kwargs)
584 # we require a list, but not an 'str'
585 arg = cast(List[AggFuncTypeBase], arg)
--> 586 return agg_list_like(obj, arg, _axis=_axis), None
587 else:
588 result = None
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py in agg_list_like(obj, arg, _axis)
651 colg = obj._gotitem(col, ndim=1, subset=selected_obj.iloc[:, index])
652 try:
--> 653 new_res = colg.aggregate(arg)
654 except (TypeError, DataError):
655 pass
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in aggregate(self, func, axis, *args, **kwargs)
3972 func = dict(kwargs.items())
3973
-> 3974 result, how = aggregate(self, func, *args, **kwargs)
3975 if result is None:
3976
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py in aggregate(obj, arg, *args, **kwargs)
584 # we require a list, but not an 'str'
585 arg = cast(List[AggFuncTypeBase], arg)
--> 586 return agg_list_like(obj, arg, _axis=_axis), None
587 else:
588 result = None
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py in agg_list_like(obj, arg, _axis)
683 result = Series(results, index=keys, name=obj.name)
684 if is_nested_object(result):
--> 685 raise ValueError(
686 "cannot combine transform and aggregation operations"
687 ) from err
ValueError: cannot combine transform and aggregation operations
But, the following works:
mtcars.apply(['mean', lambda x: max(x)-min(x)])
Both type(mtcars.apply(lambda x: np.percentile(x, 0.15))) and type(mtcars.apply(lambda x: max(x)-min(x))) give a pandas Series. So why does the problem occur only with the percentile?
Thanks

Reading the answer by @James, my guess is that you need to write the custom function so that it is applied to the whole Series rather than to each element. Maybe someone else who is more familiar with the underlying pandas code can chip in:
def min_max(x):
    return max(x)-min(x)
def perc(x):
    return x.quantile(0.15)
mtcars.agg(['mean',min_max,perc])
               mpg     cyl        disp        hp      drat       wt      qsec      vs       am    gear    carb
mean     20.090625  6.1875  230.721875  146.6875  3.596563  3.21725  17.84875  0.4375  0.40625  3.6875  2.8125
min_max  23.500000  4.0000  400.900000  283.0000  2.170000  3.91100   8.40000  1.0000  1.00000  2.0000  7.0000
perc     14.895000  4.0000  103.485000   82.2500  3.070000  2.17900  16.24300  0.0000  0.00000  3.0000  1.0000
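A small sketch of the same idea follows. It assumes a made-up two-column DataFrame (hypothetical data, used here instead of mtcars so that pydataset is not needed). It also shows a detail that is easy to miss: np.percentile expects a percentage between 0 and 100, whereas Series.quantile expects a fraction between 0 and 1, so np.percentile(x, 0.15) is not the 15th percentile. One plausible reading of the original error, not confirmed here, is that np.percentile also accepts a single number, so pandas can evaluate that lambda element by element and treat it as a transform.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mtcars so the sketch runs without pydataset.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})


def min_max(x):
    # Range of a whole column; the built-in max() fails on a single float,
    # so this can only be evaluated on the Series as a whole.
    return max(x) - min(x)


def perc(x):
    # 15th percentile; Series.quantile takes a fraction between 0 and 1.
    return x.quantile(0.15)


# One row per function, one column per original column.
summary = df.agg(["mean", min_max, perc])

# np.percentile takes a percentage between 0 and 100, so 15 (not 0.15)
# is the value that matches quantile(0.15).
same_as_quantile = np.percentile(df["a"], 15)
much_smaller = np.percentile(df["a"], 0.15)
```

Named functions also give readable row labels (min_max, perc) in the result, which lambdas do not.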

loc API of xarray gives KeyError: "not all values found in index"

I am having issues with the loc API of xarray. I am trying to select data that satisfy a certain condition. Below should be a reproducible example:
import xarray as xr
da = xr.tutorial.load_dataset('air_temperature').air
mask = 100*xr.ones_like(da[0])
da[0].loc[da[0]<mask]
but this gives the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-238-75b6b6f544d4> in <module>
1 da = xr.tutorial.load_dataset('air_temperature').air
2 mask = 100*xr.ones_like(da[0])
----> 3 da[0].loc[da[0]<mask]
~/miniconda3/envs/ensemble/lib/python3.7/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
194 labels = indexing.expanded_indexer(key, self.data_array.ndim)
195 key = dict(zip(self.data_array.dims, labels))
--> 196 return self.data_array.sel(**key)
197
198 def __setitem__(self, key, value) -> None:
~/miniconda3/envs/ensemble/lib/python3.7/site-packages/xarray/core/dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
1045 method=method,
1046 tolerance=tolerance,
-> 1047 **indexers_kwargs
1048 )
1049 return self._from_temp_dataset(ds)
~/miniconda3/envs/ensemble/lib/python3.7/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
1998 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
1999 pos_indexers, new_indexes = remap_label_indexers(
-> 2000 self, indexers=indexers, method=method, tolerance=tolerance
2001 )
2002 result = self.isel(indexers=pos_indexers, drop=drop)
~/miniconda3/envs/ensemble/lib/python3.7/site-packages/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
390
391 pos_indexers, new_indexes = indexing.remap_label_indexers(
--> 392 obj, v_indexers, method=method, tolerance=tolerance
393 )
394 # attach indexer's coordinate to pos_indexers
~/miniconda3/envs/ensemble/lib/python3.7/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
259 coords_dtype = data_obj.coords[dim].dtype
260 label = maybe_cast_to_coords_dtype(label, coords_dtype)
--> 261 idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
262 pos_indexers[dim] = idxr
263 if new_idx is not None:
~/miniconda3/envs/ensemble/lib/python3.7/site-packages/xarray/core/indexing.py in convert_label_indexer(index, label, index_name, method, tolerance)
191 indexer = get_indexer_nd(index, label, method, tolerance)
192 if np.any(indexer < 0):
--> 193 raise KeyError("not all values found in index %r" % index_name)
194 return indexer, new_index
195
KeyError: "not all values found in index 'lat'"
Since the mask is identical to the xarray.DataArray da in terms of dimensions and coordinates, I don't think this error makes sense... Am I missing something or is this possibly a bug?
Thank you in advance for your help.
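The traceback suggests that .loc treats the boolean values as coordinate labels to look up along 'lat', which would explain the KeyError. For selecting by a condition, one commonly used alternative is DataArray.where. The sketch below is only an illustration: it uses a small made-up array (hypothetical values and coordinates) instead of the tutorial dataset, so it runs without a download.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x2 array standing in for da[0] from the tutorial dataset.
da = xr.DataArray(
    np.array([[250.0, 90.0], [80.0, 300.0]]),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0]},
)
mask = 100 * xr.ones_like(da)

# where() keeps the shape and coordinates and fills non-matching cells with NaN.
below = da.where(da < mask)
n_kept = int(below.notnull().sum())

# If only the matching values are needed, index the underlying NumPy array.
flat_values = da.values[(da < mask).values]
```

where() preserves the grid, which is usually what later plotting or reduction steps expect; the NumPy route gives a flat array and drops the coordinates.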

Dask groupby with multiple columns issue

I have the following dataframe, created using the dataframe.from_delayed method, that has the following columns:
_id hour_timestamp http_method total_hits username hour weekday.
Some details on the source dataframe:
hits_rate_stats._meta.dtypes
_id                       object
hour_timestamp    datetime64[ns]
http_method               object
total_hits                object
username                  object
hour                       int64
weekday                    int64
dtype: object
meta index:
RangeIndex(start=0, stop=0, step=1)
When I execute the following code
my_df_grouped = my_df.groupby(['username', 'http_method', 'weekday', 'hour'])
my_df_grouped.total_hits.sum().reset_index().compute()
I get the following exception:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-b24b24fc86db> in <module>()
----> 1 hits_rate_stats_grouped.total_hits.sum().reset_index().compute()
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/dask/base.pyc in compute(self, **kwargs)
141 dask.base.compute
142 """
--> 143 (result,) = compute(self, traverse=False, **kwargs)
144 return result
145
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/dask/base.pyc in compute(*args, **kwargs)
390 postcomputes = [a.__dask_postcompute__() if is_dask_collection(a)
391 else (None, a) for a in args]
--> 392 results = get(dsk, keys, **kwargs)
393 results_iter = iter(results)
394 return tuple(a if f is None else f(next(results_iter), *a)
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/distributed/client.pyc in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, **kwargs)
2039 secede()
2040 try:
-> 2041 results = self.gather(packed, asynchronous=asynchronous)
2042 finally:
2043 for f in futures.values():
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/distributed/client.pyc in gather(self, futures, errors, maxsize, direct, asynchronous)
1476 return self.sync(self._gather, futures, errors=errors,
1477 direct=direct, local_worker=local_worker,
-> 1478 asynchronous=asynchronous)
1479
1480 @gen.coroutine
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/distributed/client.pyc in sync(self, func, *args, **kwargs)
601 return future
602 else:
--> 603 return sync(self.loop, func, *args, **kwargs)
604
605 def __repr__(self):
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/distributed/utils.pyc in sync(loop, func, *args, **kwargs)
251 e.wait(10)
252 if error[0]:
--> 253 six.reraise(*error[0])
254 else:
255 return result[0]
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/distributed/utils.pyc in f()
235 yield gen.moment
236 thread_state.asynchronous = True
--> 237 result[0] = yield make_coro()
238 except Exception as exc:
239 logger.exception(exc)
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/tornado/gen.pyc in run(self)
1053
1054 try:
-> 1055 value = future.result()
1056 except Exception:
1057 self.had_exception = True
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/tornado/concurrent.pyc in result(self, timeout)
236 if self._exc_info is not None:
237 try:
--> 238 raise_exc_info(self._exc_info)
239 finally:
240 self = None
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/tornado/gen.pyc in run(self)
1061 if exc_info is not None:
1062 try:
-> 1063 yielded = self.gen.throw(*exc_info)
1064 finally:
1065 # Break up a reference to itself
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/distributed/client.pyc in _gather(self, futures, errors, direct, local_worker)
1354 six.reraise(type(exception),
1355 exception,
-> 1356 traceback)
1357 if errors == 'skip':
1358 bad_keys.add(key)
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/dask/dataframe/core.pyc in apply_and_enforce()
3354 return meta
3355 c = meta.columns if isinstance(df, pd.DataFrame) else meta.name
-> 3356 return _rename(c, df)
3357 return df
3358
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/dask/dataframe/core.pyc in _rename()
3391 # deep=False doesn't doesn't copy any data/indices, so this is cheap
3392 df = df.copy(deep=False)
-> 3393 df.columns = columns
3394 return df
3395 elif isinstance(df, (pd.Series, pd.Index)):
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/pandas/core/generic.pyc in __setattr__()
3625 try:
3626 object.__getattribute__(self, name)
-> 3627 return object.__setattr__(self, name, value)
3628 except AttributeError:
3629 pass
pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/pandas/core/generic.pyc in _set_axis()
557
558 def _set_axis(self, axis, labels):
--> 559 self._data.set_axis(axis, labels)
560 self._clear_item_cache()
561
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/pandas/core/internals.pyc in set_axis()
3072 raise ValueError('Length mismatch: Expected axis has %d elements, '
3073 'new values have %d elements' %
-> 3074 (old_len, new_len))
3075
3076 self.axes[axis] = new_labels
ValueError: Length mismatch: Expected axis has 5 elements, new values have 2 elements
When I do my_df_grouped.count().reset_index().compute() it works as it should and when I do my_df_grouped.sum().reset_index().compute() I get
/home/avlach/virtualenvs/enorasys_sa_v2/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper()
2830 raise ValueError('No group keys passed!')
2831 else:
-> 2832 raise ValueError('multiple levels only valid with '
2833 'MultiIndex')
2834
ValueError: multiple levels only valid with MultiIndex
Reproducing locally with dummy data doesn't give me these errors. What could be going wrong?
EDIT:
It seems it is losing the multi-index. If I do this:
total_hits = my_df_grouped.total_hits.sum()
total_hits._meta.index = pd.MultiIndex(levels=[[],[],[],[],], labels=[[],[],[],[]], names=['username', 'http_method', 'weekday', 'hour'])

ValueError when using .diff() with dask dataframe

I have a large time series data set which I want to process with Dask.
Apart from a few other columns, there is a column called 'id' which identifies individuals, a column transc_date which identifies the date, and a column transc_time which identifies the time at which an individual made a transaction.
The data is sorted using:
df = df.map_partitions(lambda x: x.sort_values(['id', 'transc_date', 'transc_time'], ascending=[True, True, True]))
transc_time is of type int and transc_date is of type datetime64.
I want to create a new column which gives me for each individual the number of days since the last transaction. For this I created the following function:
def get_diff_since_last_trans(df, plot=True):
    df['diff_last'] = df.map_overlap(lambda x: x.groupby('id')['transc_date'].diff(), before=10, after=10)
    diffs = df[['id', 'diff_last']].groupby(['id']).agg('max')['diff_last'].dt.days.compute()
    if plot:
        sns.distplot(diffs.values, kde = False, rug = False)
    return diffs
When I try this function on a small subset of the data (200k rows) it works as intended. But when I use it on the full data set I get the ValueError below.
I first dropped all ids which have fewer than 10 occurrences. transc_date does not contain NaNs; it only contains datetime64 entries.
Any idea what's going wrong?
ValueError Traceback (most recent call last)
<ipython-input-12-551d7256f328> in <module>()
1 a = get_diff_first_last_trans(df, plot=False)
----> 2 b = get_diff_since_last_trans(df, plot=False)
3 plot_trans_diff(a,b)
<ipython-input-10-8f83d4571659> in get_diff_since_last_trans(df, plot)
12 def get_diff_since_last_trans(df, plot=True):
13 df['diff_last'] = df.map_overlap(lambda x: x.groupby('id')['transc_date'].diff(), before=10, after=10)
---> 14 diffs = df[['id', 'diff_last']].groupby(['id']).agg('max')['diff_last'].dt.days.compute()
15 if plot:
16 sns.distplot(diffs.values, kde = False, rug = False)
~/venv/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
133 dask.base.compute
134 """
--> 135 (result,) = compute(self, traverse=False, **kwargs)
136 return result
137
~/venv/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
331 postcomputes = [a.__dask_postcompute__() if is_dask_collection(a)
332 else (None, a) for a in args]
--> 333 results = get(dsk, keys, **kwargs)
334 results_iter = iter(results)
335 return tuple(a if f is None else f(next(results_iter), *a)
~/venv/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, **kwargs)
1997 secede()
1998 try:
-> 1999 results = self.gather(packed, asynchronous=asynchronous)
2000 finally:
2001 for f in futures.values():
~/venv/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct, asynchronous)
1435 return self.sync(self._gather, futures, errors=errors,
1436 direct=direct, local_worker=local_worker,
-> 1437 asynchronous=asynchronous)
1438
1439 @gen.coroutine
~/venv/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
590 return future
591 else:
--> 592 return sync(self.loop, func, *args, **kwargs)
593
594 def __repr__(self):
~/venv/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
252 e.wait(1000000)
253 if error[0]:
--> 254 six.reraise(*error[0])
255 else:
256 return result[0]
~/venv/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
691 if value.__traceback__ is not tb:
692 raise value.with_traceback(tb)
--> 693 raise value
694 finally:
695 value = None
~/venv/lib/python3.6/site-packages/distributed/utils.py in f()
236 yield gen.moment
237 thread_state.asynchronous = True
--> 238 result[0] = yield make_coro()
239 except Exception as exc:
240 logger.exception(exc)
~/venv/lib/python3.6/site-packages/tornado/gen.py in run(self)
1053
1054 try:
-> 1055 value = future.result()
1056 except Exception:
1057 self.had_exception = True
~/venv/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
236 if self._exc_info is not None:
237 try:
--> 238 raise_exc_info(self._exc_info)
239 finally:
240 self = None
~/venv/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)
~/venv/lib/python3.6/site-packages/tornado/gen.py in run(self)
1061 if exc_info is not None:
1062 try:
-> 1063 yielded = self.gen.throw(*exc_info)
1064 finally:
1065 # Break up a reference to itself
~/venv/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
1313 six.reraise(type(exception),
1314 exception,
-> 1315 traceback)
1316 if errors == 'skip':
1317 bad_keys.add(key)
~/venv/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
690 value = tp()
691 if value.__traceback__ is not tb:
--> 692 raise value.with_traceback(tb)
693 raise value
694 finally:
~/venv/lib/python3.6/site-packages/dask/dataframe/rolling.py in overlap_chunk()
30 parts = [p for p in (prev_part, current_part, next_part) if p is not None]
31 combined = pd.concat(parts)
---> 32 out = func(combined, *args, **kwargs)
33 if prev_part is None:
34 before = None
<ipython-input-10-8f83d4571659> in <lambda>()
11
12 def get_diff_since_last_trans(df, plot=True):
---> 13 df['diff_last'] = df.map_overlap(lambda x: x.groupby('id')['transc_date'].diff(), before=10, after=10)
14 diffs = df[['id', 'diff_last']].groupby(['id']).agg('max')['diff_last'].dt.days.compute()
15 if plot:
~/venv/lib/python3.6/site-packages/pandas/core/groupby.py in wrapper()
737 *args, **kwargs)
738 except (AttributeError):
--> 739 raise ValueError
740
741 return wrapper
ValueError:

AssertionError using Basemap and Pandas

I'm trying to follow the tutorial here:
http://nbviewer.ipython.org/github/ehmatthes/intro_programming/blob/master/notebooks/visualization_earthquakes.ipynb#install_standard
However, I am using pandas instead of the built in csv module for python. My code is as follows:
import pandas as pd
eq_data = pd.read_csv('earthquake_data.csv')
map2 = Basemap(projection='robin'
, resolution='l'
, area_thresh=1000.0
, lat_0=0
, lon_0=0)
map2.drawcoastlines()
map2.drawcountries()
map2.fillcontinents(color = 'gray')
map2.drawmapboundary()
map2.drawmeridians(np.arange(0, 360, 30))
map2.drawparallels(np.arange(-90, 90, 30))
x,y = map2(eq_data['longitude'].values, eq_data['latitude'].values)
map2.plot(x,y, marker='0', markercolor='red', markersize=6)
This produces an AssertionError but with no description:
AssertionError Traceback (most recent call last)
<ipython-input-64-d3426e1f175d> in <module>()
14 x,y = map2(range(20), range(20))#eq_data['longitude'].values, eq_data['latitude'].values)
15
---> 16 map2.plot(x,y, marker='0', markercolor='red', markersize=6)
c:\Python27\lib\site-packages\mpl_toolkits\basemap\__init__.pyc in with_transform(self, x, y, *args, **kwargs)
540 # convert lat/lon coords to map projection coords.
541 x, y = self(x,y)
--> 542 return plotfunc(self,x,y,*args,**kwargs)
543 return with_transform
544
c:\Python27\lib\site-packages\mpl_toolkits\basemap\__init__.pyc in plot(self, *args, **kwargs)
3263 ax.hold(h)
3264 try:
-> 3265 ret = ax.plot(*args, **kwargs)
3266 except:
3267 ax.hold(b)
c:\Python27\lib\site-packages\matplotlib\axes.pyc in plot(self, *args, **kwargs)
4135 lines = []
4136
-> 4137 for line in self._get_lines(*args, **kwargs):
4138 self.add_line(line)
4139 lines.append(line)
c:\Python27\lib\site-packages\matplotlib\axes.pyc in _grab_next_args(self, *args, **kwargs)
315 return
316 if len(remaining) <= 3:
--> 317 for seg in self._plot_args(remaining, kwargs):
318 yield seg
319 return
c:\Python27\lib\site-packages\matplotlib\axes.pyc in _plot_args(self, tup, kwargs)
303 ncx, ncy = x.shape[1], y.shape[1]
304 for j in xrange(max(ncx, ncy)):
--> 305 seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
306 ret.append(seg)
307 return ret
c:\Python27\lib\site-packages\matplotlib\axes.pyc in _makeline(self, x, y, kw, kwargs)
255 **kw
256 )
--> 257 self.set_lineprops(seg, **kwargs)
258 return seg
259
c:\Python27\lib\site-packages\matplotlib\axes.pyc in set_lineprops(self, line, **kwargs)
198 raise TypeError('There is no line property "%s"' % key)
199 func = getattr(line, funcName)
--> 200 func(val)
201
202 def set_patchprops(self, fill_poly, **kwargs):
c:\Python27\lib\site-packages\matplotlib\lines.pyc in set_marker(self, marker)
851
852 """
--> 853 self._marker.set_marker(marker)
854
855 def set_markeredgecolor(self, ec):
c:\Python27\lib\site-packages\matplotlib\markers.pyc in set_marker(self, marker)
231 else:
232 try:
--> 233 Path(marker)
234 self._marker_function = self._set_vertices
235 except ValueError:
c:\Python27\lib\site-packages\matplotlib\path.pyc in __init__(self, vertices, codes, _interpolation_steps, closed, readonly)
145 codes[-1] = self.CLOSEPOLY
146
--> 147 assert vertices.ndim == 2
148 assert vertices.shape[1] == 2
149
AssertionError:
I thought the problem was caused by the pandas update that no longer allows passing a Series the way you used to, as described here:
Runtime error using python basemap and pyproj?
But as you can see, I adjusted my code for this and it didn't fix the problem. At this point I am lost.
I am using Python 2.7.6, pandas 0.15.2, and basemap 1.0.7 on Windows Server 2012 x64.

There are two problems with my code. First, the plot function of the map2 object is inherited from matplotlib, so the marker attribute cannot be '0'; it needs to be 'o'. Second, there is no markercolor attribute; it is called color. The code below should work.
import numpy as np
import pandas as pd
from mpl_toolkits.basemap import Basemap
eq_data = pd.read_csv('earthquake_data.csv')
map2 = Basemap(projection='robin'
, resolution='l'
, area_thresh=1000.0
, lat_0=0
, lon_0=0)
map2.drawcoastlines()
map2.drawcountries()
map2.fillcontinents(color = 'gray')
map2.drawmapboundary()
map2.drawmeridians(np.arange(0, 360, 30))
map2.drawparallels(np.arange(-90, 90, 30))
x,y = map2(eq_data['longitude'].values, eq_data['latitude'].values)
map2.plot(x,y, marker='o', color='red', markersize=6, linestyle='')
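The two keyword fixes can be tried in plain matplotlib, without Basemap or the earthquake data. This is a minimal sketch under that assumption; the coordinates are made up and the Agg backend is chosen only so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Made-up points; marker='o' (letter o) draws circles, linestyle='' hides the line.
(line,) = ax.plot([0, 1, 2], [2, 0, 1], marker="o", color="red", markersize=6, linestyle="")

# Line2D has no 'markercolor' property; the marker-specific ones are
# markerfacecolor and markeredgecolor, while 'color' sets the overall colour.
has_markercolor = hasattr(line, "set_markercolor")
has_markerfacecolor = hasattr(line, "set_markerfacecolor")
marker_used = line.get_marker()
color_used = line.get_color()

plt.close(fig)
```

If the marker fill and edge should differ from the line colour, markerfacecolor and markeredgecolor can be passed to plot() in the same way.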
